

(greenbutterfly/Shutterstock)
If you’re a supporter of open data, it’s hard not to feel good about last week’s news around Apache Iceberg. Customers demanded an open storage format, and the two leading providers, Snowflake and Databricks, are delivering it, in a big way.
To recap: Databricks surprised the big data community last Tuesday by throwing its weight behind Apache Iceberg with the announcement of its intent to acquire Tabular, which was founded by former Netflix engineers who created Iceberg.
That announcement came a day after Snowflake unveiled Polaris, a new metadata catalog designed to work with Iceberg, thereby enabling customers to use open query engines with their data. The move furthered Snowflake’s transition from a proudly proprietary cloud data warehouse into an open data platform for analytics and AI.
Members of the open data ecosystem responded with applause. Among the biggest supporters is Dremio, which develops an open-source query engine of the same name, is the main backer of the open metadata catalog Project Nessie, and manages an Iceberg-based lakehouse for customers.
“I think it’s a statement that, in table formats, Iceberg won. I think it’s the realization of that,” said James Rowland-Jones (JRJ), Dremio’s vice president of product management. “It’s also the realization that table format bifurcation, when you are not winning, is not helpful to your business.”
Databricks’ table format, called Delta, was the most-used table format when Dremio surveyed customers on their lakehouse technologies in late 2023. While Delta was number one in terms of total deployments, Iceberg was the leader in terms of planned deployments over the next three years, said Read Maloney, Dremio’s chief marketing officer.
“Who’s driving these changes? It’s customers. Customers are sick of being locked-in, and the only way to do that is to ensure that you’re not only in an open table format, but then you have an open catalog,” Maloney told Datanami in an interview at Snowflake’s Data Cloud Summit in San Francisco last week.
“So now customers own their own storage, they own their own data, they own their own metadata, and then all the vendors in the ecosystem build around that. And the customer now has the ability to say ‘I want that vendor for this, I want that vendor for this,’ and they all work within the common ecosystem,” he said. “The more there’s commonality in the specification around the catalogs, it makes it way easier for everyone to get involved in the ecosystem.”
“We’re listening to customers,” Ron Ortluff, the head of data lake and Iceberg at Snowflake, told Datanami in an interview last week. “That’s kind of the guiding principle.”
The pending launch of Polaris, which Snowflake plans to donate to the open source community within 90 days, means that Snowflake customers soon will be able to query their Iceberg data using any query engine that supports Iceberg’s REST-based catalog API. That list includes Apache Spark, Apache Flink, Presto, Trino, and (soon) Dremio. And of course, they will also be able to query Iceberg data using Snowflake’s fast proprietary SQL engine.
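As a sketch of what that interoperability looks like in practice, the snippet below builds the Spark SQL properties that point a catalog at an Iceberg REST catalog endpoint. The catalog name (`polaris`), endpoint URL, and warehouse name are illustrative placeholders, not a published Snowflake endpoint; the property keys themselves come from Iceberg’s Spark catalog configuration.

```python
# Build Spark SQL properties for an Iceberg REST catalog connection.
# The catalog name, URI, and warehouse below are illustrative only.
def rest_catalog_conf(name: str, uri: str, warehouse: str) -> dict:
    """Return the Spark properties that register `name` as an Iceberg
    catalog backed by a REST catalog server at `uri`."""
    prefix = f"spark.sql.catalog.{name}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",          # use Iceberg's REST catalog client
        f"{prefix}.uri": uri,              # REST endpoint exposed by the catalog
        f"{prefix}.warehouse": warehouse,  # warehouse identifier on that server
    }

conf = rest_catalog_conf("polaris", "https://catalog.example.com/api", "analytics")
for key, value in conf.items():
    print(f"{key}={value}")
```

Because the REST catalog is a common specification rather than an engine-specific interface, the same endpoint could in principle be handed to Flink, Trino, or any other engine that speaks the protocol, which is the interoperability point the vendors are making.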
The momentum behind open data is a sign of the continued decoupling of compute stacks, said Siva Padisetty, the CTO of New Relic, which develops an observability platform.
“After storage and compute became decoupled, all of the layers from storage through analytics began to be similarly unbundled, a process currently taking place with tables,” Padisetty said via email. “Overall, the focus here remains on data stack optimization and how organizations assemble the appropriate storage, table format, and compute engines to process their data use cases in the fastest possible manner.”
The key, Padisetty says, “is maintaining vendor unlock, speed, and agility across compute and storage while solving business use cases in the most cost-effective manner with the gravity of data without multiple copies.”
The value of having a centralized data platform that can handle huge data volumes and maintain performance and security for multiple use cases, such as IT telemetry, data lake, and SQL analytics, is paramount, he said.
“Enterprises get the value add of open-source technology while maintaining centralized data,” Padisetty continued. “The centralization of the use cases is going to happen, and companies should be positioning themselves to address that.”
The folks at Starburst, the commercial outfit behind the open source Trino query engine, are also watching the Iceberg developments closely. Iceberg was originally developed in part to enable Netflix to use Presto, the engine from which Trino was forked, so the growth of Iceberg is a welcome development for the company.
“The benefit to the market and customers is that this competition actually creates openness,” said Justin Borgman, the CEO and chairman of Starburst, which also offers an Iceberg-based lakehouse service. “Starburst is one such beneficiary and can now be considered a strong third option in the Databricks vs. Snowflake debate.”
Borgman is closely watching what comes next, particularly around the metadata catalog. Just as the battle over open table formats ended up being a new source of data silo-ization (which is ironic, since they were created to foster open data), the metadata catalogs are also a potential source of lock-in, as they broker connections between processing engines and the data.
“With Tabular, Databricks’ Unity catalog has the potential to capture a lot more market share, including organizations using either Delta Lake or Iceberg,” Borgman told Datanami via email. “Snowflake’s open-sourcing of Polaris is a way to compete against Databricks by highlighting that while the market is rapidly moving to open storage formats like Iceberg, catalogs like Unity are a new source of lock-in. One could speculate that this will pressure Databricks to eventually open source Unity, but it is too early to know for sure.”
Taken as a whole, however, the news of the past week is very good for customers and supporters of open data. Momentum for open data platforms is building, and it couldn’t come at a better time.
“The Iceberg ecosystem has been growing quickly. I think it’s going to grow even faster on the back of both of these announcements,” Maloney said. “If you’re in the Iceberg community, this is go time in terms of entering the next era.”
Related Items:
What the Big Fuss Over Table Formats and Metadata Catalogs Is All About
Databricks Nabs Iceberg-Maker Tabular to Spawn Table Uniformity
Snowflake Embraces Open Data with Polaris Catalog