
June 4, 2024

Snowflake Gives Cloud Customers What They Need and Want at Summit 2024

Snowflake CEO Sridhar Ramaswamy delivers the keynote at Data Cloud Summit 2024 on Monday, June 3

AI is like candy these days, enticing enterprises with the promise of amazing things to come. But AI doesn’t work without a good solid data foundation. Snowflake seems to understand this, which is why the company is spending time at its Data Cloud Summit today giving customers what they want (AI) as well as what they need (better data), all washed down with extensive enhancements to the developer experience.

AI is all the rage these days, and Snowflake named Sridhar Ramaswamy, who came from AI search vendor Neeva, as CEO to bolster its AI story. Even so, the company knows that it can’t overlook the meat and potatoes of good data management.

To that end, the company made several data-related announcements at Data Cloud Summit today, including the general availability of external tables on Apache Iceberg; the launch of a new Internal Marketplace; the general availability of Universal Search; and the preview of AI-powered object descriptions.

The GA announcement for Iceberg has been a long time coming. Snowflake first talked about its fondness for Iceberg back in February 2022, with the tech preview becoming available later that year. Now Snowflake is rolling out support for external tables in the Iceberg table format. Customers can store their Iceberg tables in AWS, Azure, and Google Cloud.

The GA of Iceberg comes a day after Snowflake unveiled its Polaris data catalog, which is designed to work with Iceberg tables. Polaris will also enable customers to run their choice of query engine on data stored in external Iceberg tables, including Spark, Flink, Trino, Presto, and Dremio, Snowflake said.

Snowflake offers thousands of third-party datasets and apps on Snowflake Marketplace, which has been around in some form since 2019. Customers liked the idea so much that they petitioned Snowflake to let them build their own marketplaces for internal use, and Snowflake responded with Internal Marketplace.

According to Christian Kleinerman, Snowflake’s EVP of product, the Internal Marketplace will allow the various departments of a company to curate and publish data products, including datasets, machine learning models, applications, and other functions. “Anything they need to do to more easily get value out of this data,” Kleinerman said.

Another Snowflake product going GA this week is Universal Search, a new AI-powered search engine based on the Neeva product that Snowflake acquired one year ago–the same deal that brought Ramaswamy to Snowflake.

What’s special about Universal Search, Kleinerman said, is that it works across all of the data that a customer has in Snowflake, including internal tables, external Iceberg tables, data from third-party providers, and data from the Internal Marketplace too.

Snowflake supports multiple data workloads for a variety of personas

“Our goal is to do away with the need for customers to know where to find what, and with a single central experience, have them search, and we will surface a set of data products and data sets that might be helpful to them, whatever the task at hand may be,” he said during a press conference last week.

AI-powered object descriptions, meanwhile, is a new feature that leverages a large language model (LLM) to automatically describe data, including columns, tables, and views. The offering, which will soon be in private preview, will make it easier for customers to find relevant data.

“None of us likes documentation,” Ramaswamy said. “And the only thing we like even less than writing documentation is updating documentation. Language models don’t get bored.”

AI and ML Enhancements

Snowflake also made several AI enhancements today, including updates to Snowflake Cortex AI, the fully managed generative AI service it unveiled in November, as well as new features in Snowflake ML. It also unveiled the capability to fine-tune Cortex systems, a security-focused GenAI system called Cortex Guard, a new offering for extracting information from documents dubbed Document AI, and new MLOps capabilities.

Snowflake Cortex AI provides managed GenAI services

On the Cortex front, Snowflake is teasing the addition of two new GenAI services, Snowflake Cortex Analyst and Snowflake Cortex Search, both of which will be in public preview soon.

“Cortex Analyst is an API that allows our customers to securely build applications for their users so they can ask business questions of their analytical data on Snowflake and get accurate answers,” said Baris Gultekin, Snowflake’s head of AI. “We’ve focused heavily on quality,” he added, noting that it beats GPT-4 in structured data analytics.

Cortex Search, meanwhile, is a fully managed text search solution built for RAG chat bots as well as enterprise search, Gultekin said. The combination of Snowflake’s Arctic and the Cortex Search capability gives customers the tools to “build high-quality chat bots that talk to their data in minutes,” he said.

Cortex Guard, which will soon be generally available, is based on Meta’s Llama Guard and automatically filters and flags harmful content that might appear in a Snowflake customer’s system.

Customers will soon be able to use Document AI, another managed AI capability from Snowflake that enables them to extract information from documents. The software is based on Snowflake Arctic-TILT, the company’s multimodal LLM, which, it notes, outperformed GPT-4 on the DocVQA benchmark test.

Snowflake has a multi-pronged GenAI strategy

Individuals who want to leverage the power of AI without coding may be interested in Snowflake AI & ML Studio. The offering, currently in private preview, is a no-code interactive interface that allows users to test models from a variety of sources, including Google, Meta, Mistral AI, and Reka–as well as Snowflake’s own Arctic model–and build custom search experiences without touching a line of code.

Many LLMs are pretrained, which doesn’t give users the opportunity to improve them. But Snowflake is allowing customers to bolster some of its models with Cortex Fine Tuning. Now in public preview, the serverless function lets customers top off their models with some custom data through the AI & ML Studio. Alternatively, fine-tuning can be done with a SQL function.

Good management of AI and ML models is critical to business success, which is why Snowflake has been investing in MLOps. At Data Cloud Summit 2024, the company is making several pertinent announcements, including the general availability of the Snowflake Model Registry, which allows customers to govern the access and use of AI and ML models.

It also announced the public preview of the Snowflake Feature Store, which will allow customers to better manage the individual features that go into an ML model. Finally, it’s starting a private preview for ML Lineage, which will allow data science teams to trace the usage of features, datasets, and models across the ML lifecycle.

Developer Experience

As if the data and AI/ML enhancements were not enough, Snowflake has also been busy improving the developer experience for its customers. The company prides itself on making it easy for developers, data scientists, and analysts to build things, and the enhancements it’s delivering at Data Cloud Summit–with new Container Services, the Snowflake Notebook, the pandas API, Git integration, a new CLI, observability enhancements, and others–would appear to push that particular ball forward.

Snowflake is adding a distributed pandas API to go along with its DataFrames API for Snowpark

For starters, the company is going GA with Snowpark Container Services. First unveiled earlier this year as a feature for Snowpark, Container Services streamline the management of Python, Java, and Scala apps developed in Snowpark. Container Services are GA on AWS while the public preview is starting for Azure; support for Google Cloud will follow, the company says.

The company unveiled Snowflake Notebooks at a Snow Day in November, and now it’s ready to enter the public preview stage. It will enable customers to write both SQL and Python code, and support functions such as scheduling and integration with Git. It will also integrate with the new Snowflake Copilot, Kleinerman said.

Developers will also be happy to hear that Snowflake is rolling out a public preview of its support for pandas, the very popular Python framework for data science. While pandas is limited to running on a single machine, Snowflake has built a distributed implementation that lets customers scale pandas functions to run against “as much data as they need,” Kleinerman said. “We expect this to be very well received.”
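For readers unfamiliar with the library, the sketch below shows the kind of workflow at issue, written in ordinary single-machine pandas. The column names and figures are invented for illustration, and nothing here uses Snowflake's API; how the distributed version is enabled is not covered in this article.

```python
import pandas as pd

# Hypothetical order data; columns and values are invented for illustration.
orders = pd.DataFrame({
    "region": ["east", "west", "east", "west", "east"],
    "amount": [120.0, 80.0, 200.0, 50.0, 30.0],
})

# A typical pandas workflow: filter rows, group, aggregate, then sort.
large_orders = orders[orders["amount"] >= 50.0]
revenue_by_region = (
    large_orders.groupby("region")["amount"]
    .sum()
    .sort_values(ascending=False)
)

print(revenue_by_region)
```

On a laptop, every step here runs in memory on one machine, which is the limit Kleinerman is referring to.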

Hardcore developers don’t always live in GUIs, which is why the general availability of the new command line interface (CLI) is expected to be a hit with the Snowflake crowd. The CLI will be used to manage CI/CD pipelines. That goes hand in hand with the GA of Snowflake’s new Python API, as well as the Git integration, which is designed to improve how teams collaborate and is entering public preview. Finally, Snowflake is also rolling out a new database change management capability that will provide better tracking of how the Snowflake database evolves.

Snowflake Trail provides observability for data and workloads in Snowflake

Snowflake is also rolling out a new observability solution dubbed Snowflake Trail, which will allow customers to gain more insight into the behavior of Snowpark applications and data pipelines by capturing and storing logs, metrics, and traces.

“We’re introducing the ability to have metrics and traces and logs within Snowpark code, within Snowpark Container Services code, and have all the telemetry land in a table natively in every single Snowflake account,” Kleinerman said.

The solution, which is based on the OpenTelemetry data standard, will allow customers to use other tools, such as Datadog, Grafana, Metaplane, PagerDuty, and Slack, to analyze the data. Snowflake will also partner with Monte Carlo and Observe.
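As a rough illustration of what emitting logs from application code looks like, the sketch below uses only Python's standard `logging` module, with an in-memory handler standing in for wherever the telemetry ends up. The logger name, the `ListHandler` class, and the `load_rows` function are all hypothetical. This is not Snowflake's API, and how Snowflake Trail collects such records is not shown.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical stand-in for a telemetry sink: keeps records in memory."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


# Wire a named logger to the in-memory sink (names are assumptions).
logger = logging.getLogger("pipeline_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
sink = ListHandler()
logger.addHandler(sink)


def load_rows(rows):
    """Drop empty rows, logging what happened along the way."""
    logger.info("starting load of %d rows", len(rows))
    kept = [r for r in rows if r is not None]
    dropped = len(rows) - len(kept)
    if dropped:
        logger.warning("dropped %d empty rows", dropped)
    return kept


result = load_rows([1, None, 3])
for record in sink.records:
    print(record.levelname, record.getMessage())
```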

While the number of announcements and new features at Data Cloud Summit may be large, CEO Ramaswamy is adamant that simplicity is the name of the game for Snowflake.

“We don’t have hundreds of SKUs like some of the big providers have,” Ramaswamy said during the press conference last week. “We have one product. All of the features are available in that one product. We take the trouble to make sure that things work with one another. It places a higher bar on it, but we think ultimately it makes it much easier for our customers…”
