

By some accounts, data lakes appear poised to supplant data warehouses as the center of gravity of modern analytics systems, particularly with today’s sophisticated data virtualization capabilities. But with the advent of cloud data warehouses that separate compute and storage, companies should take a hard look at their data lakes.
That’s the message that Fivetran CEO and co-founder George Fraser delivered during his keynote address for the “Modern Data Stack Conference 2020,” his company’s virtual conference that took place two weeks ago.
“In my opinion, data lakes are not part of the modern data stack. Data lakes are legacy,” Fraser said. “There are organizational [and] quasi-political reasons why people adopt data lakes. But there are no longer technical reasons for adopting data lakes.”
Fraser is not advocating that companies (virtually) rip out their data lakes, like Amazon Web Services’ S3 or Microsoft Azure’s Data Lake Storage, and replace them with a cloud data warehouse, such as Snowflake’s offering or Google Cloud’s BigQuery.
“There are a lot of people who have data lakes because they inherited them, and that’s a perfectly valid reason to keep them going,” he said. “If you have a working system, you should keep using it. You shouldn’t take it out just for the sake of change.”
But if you were to build a new system from scratch, Fraser has a list of reasons why you should select a data warehouse rather than a data lake to live at the center of your modern data stack.

The first reason is cost. Data lakes grew in popularity because they carried a sizable cost advantage over data warehouses, Fraser said. Staging data in a data lake, which is essentially a massive key-value store accessible via a REST API, was much more affordable than storing data in a massively parallel processing (MPP) column-oriented relational database.
But with the separation of storage and compute in modern cloud data warehouses, the cost of storage has come down considerably, while simultaneously freeing customers to scale up processing to meet sudden changes in demand for massive SQL workloads. That essentially eliminated cost as a good reason for going with the data lake. And without the cost advantage, the technical shortcomings of analyzing data stored in a data lake compared to a data warehouse begin to rear their heads, Fraser said.
“Data warehouses that separate compute from storage have all of the advantages of data lakes and more,” Fraser continued. “They give you the kind of user management [you expect]. They give you even better performance [than data lakes] because they can do optimizations by controlling both the storage format and the compute format…Data warehouses are just fundamentally more user friendly than data lakes are.”
Fivetran develops software designed to load data from source systems into cloud data warehouses. That process is sometimes called ETL (extract, transform, and load), except that Fivetran has adopted the ELT method, whereby the transformation of the data occurs in the data warehouse after it is loaded (and Fivetran itself doesn’t really do much of the “T” anyway).
Armed with a cloud data warehouse, Fivetran, and a dedicated transformation tool, such as the Data Build Tool (dbt) offering from Fishtown Analytics (which, like Fivetran, is funded by Andreessen Horowitz), companies are well-equipped to meet the demands of modern data analytics, Fraser argued.
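The ELT pattern described above can be illustrated with a minimal sketch. It is not Fivetran's or dbt's actual implementation: an in-memory SQLite database stands in for the cloud data warehouse, and the table names (`raw_orders`, `daily_revenue`), the sample rows, and the function names are all hypothetical. The point is only the ordering: raw data is loaded untouched, and the transformation is a SQL statement that runs inside the database.

```python
import sqlite3

# Hypothetical raw records, standing in for rows extracted from a source system.
RAW_ORDERS = [
    ("1001", "2020-10-01", "19.99", "shipped"),
    ("1002", "2020-10-01", "5.00", "cancelled"),
    ("1003", "2020-10-02", "42.50", "shipped"),
]


def load_raw(conn, rows):
    """Load step: copy source rows as-is, with no cleanup or typing."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_orders "
        "(order_id TEXT, order_date TEXT, amount TEXT, status TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def transform_in_warehouse(conn):
    """Transform step: a SQL statement executed inside the database itself."""
    conn.execute("DROP TABLE IF EXISTS daily_revenue")
    conn.execute(
        """
        CREATE TABLE daily_revenue AS
        SELECT order_date,
               ROUND(SUM(CAST(amount AS REAL)), 2) AS revenue
        FROM raw_orders
        WHERE status = 'shipped'
        GROUP BY order_date
        """
    )
    conn.commit()


def run_elt():
    """Run load, then transform, and return the modeled table's rows."""
    conn = sqlite3.connect(":memory:")  # stand-in for a cloud data warehouse
    load_raw(conn, RAW_ORDERS)
    transform_in_warehouse(conn)
    rows = conn.execute(
        "SELECT order_date, revenue FROM daily_revenue ORDER BY order_date"
    ).fetchall()
    conn.close()
    return rows
```

Because the raw table is kept, the transformation can be rewritten and rerun later without extracting the data from the source again, which is one practical appeal of doing the "T" last.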
The popularity of cloud data warehouses manifested itself last month with Snowflake’s massive IPO, which valued the company at $68 billion. Other cloud data warehouses, including AWS’ Redshift, Microsoft’s Synapse Analytics, Google’s BigQuery, and Databricks’ Unified Data Service (which includes SQL processing as well as support for machine learning), have also seen their popularity rise.
“The data warehouse that you should have at the center of a modern data stack should be based on MPP column-store technology,” Fraser said. The data warehouse “is the part of the modern data stack that really started the revolution. It underwent this extremely technical change that enabled everything else in the modern data stack to happen.”
Fraser said that his views on data lakes may be a bit controversial. They certainly go against what others in the big data community are saying, including the folks at Dremio, which recently updated its data virtualization service to enable large-scale SQL analytics directly against data lakes. A similar federated approach to analytics is enabled by products like Presto, which can power SQL queries against cloud data stores, file systems, databases, and Kafka, as well as Hive, which was built to run atop HDFS.
By keeping analytics and data storage separate, customers can eliminate the need for a data warehouse and all of the ETL (or ELT) that goes along with it, says Dremio co-founder and chief product officer Tomer Shiran.
“All this data movement and the need to create data marts and extracts and aggregation tables and all that creates a huge amount of cost and also a long delay,” Shiran told Datanami. “Anytime you want to change the dashboard or change the data, you have to wait weeks.”
There are multiple camps on the data centralization question, and even advocates of federated approaches admit there are cases where centralization makes sense. Fraser sees a bright future in centralizing data and simplifying data integration and ETL/ELT to the greatest extent possible. With a valuation in excess of $1 billion, Fivetran is gaining speed.
Despite the focus on cloud data warehouses, Fivetran is actively working to support data lakes with its product, Fraser said. “So despite my opinion that they’re not the optimal solution in the world of the modern data stack, we are capable of listening,” he said.
But Fraser didn’t stop with some friendly data lake bashing. He also proclaimed that, thanks to recent advances, cloud data warehouses can replace Kafka in some cases.
“With the emergence of stream processing, particularly in Snowflake, where you can create these tasks and streams to process data incrementally, you can do the kinds of workflows that previously you would need to hire an entire software engineering team and build a stream processing [system] on to a message broker, like Kafka,” he said. “Now you can do [that] with a SQL query inside of a data warehouse.”
It’s not only easier to build a stream processing system with SQL, but it’s also easier to maintain, he said. Data warehouses can’t deliver answers with millisecond latencies, he said, but they work well in use cases that require latencies measured in seconds.
To that end, Fivetran is working to boost the frequency at which it can update a cloud data warehouse. The current minimum update interval is one minute, down considerably from the past: it started at once per day, then moved to one hour, then 15 minutes. The company is working to push latency lower.
“With new features being developed in the data warehouse, latencies of seconds and tens of seconds are fundamentally possible and we’re working hard at Fivetran to keep battling every element of the pipeline, keep battling down that latency number,” Fraser said. “Because we understand that lower latency is going to enable all of these exciting use cases.”
You can view Fraser’s keynote address here.