
December 4, 2024

AWS Highlights Amazon SageMaker Updates to Bridge Analytics and AI

LAS VEGAS, Dec. 4, 2024 — At AWS re:Invent, Amazon Web Services, Inc. (AWS) announced the next generation of Amazon SageMaker, unifying the capabilities customers need for fast SQL analytics, petabyte-scale big data processing, data exploration and integration, model development and training, and generative artificial intelligence (AI) into one integrated platform.

The new SageMaker Unified Studio makes it easy for customers to find and access data from across their organization, and it brings together purpose-built AWS analytics, machine learning (ML), and AI capabilities so customers can act on their data with the best tool for the job across common data use cases, assisted by Amazon Q Developer along the way. SageMaker Catalog and built-in governance capabilities ensure the right users can access the right data, models, and development artifacts for the right purpose.

The new SageMaker Lakehouse unifies data across data lakes, data warehouses, operational databases, and enterprise applications, making it easy to access and work with data from within SageMaker Unified Studio and using familiar AI and ML tools or query engines compatible with Apache Iceberg. New zero-ETL integrations with leading Software-as-a-Service (SaaS) applications make it easy to access data from third-party SaaS applications in SageMaker Lakehouse and Amazon Redshift for analytics or ML without complex data pipelines.

“We are seeing a convergence of analytics and AI, with customers using data in increasingly interconnected ways—from historical analytics to ML model training and generative AI applications,” said Swami Sivasubramanian, vice president of Data and AI at AWS. “To support these workloads, many customers already use combinations of our purpose-built analytics and ML tools, such as Amazon SageMaker—the de facto standard for working with data and building ML models—Amazon EMR, Amazon Redshift, Amazon S3 data lakes, and AWS Glue. The next generation of SageMaker brings together these capabilities—along with some exciting new features—to give customers all the tools they need for data processing, SQL analytics, ML model development and training, and generative AI, directly within SageMaker.”

Collaborate and Build Faster with Amazon SageMaker Unified Studio

Today, hundreds of thousands of customers use SageMaker to build, train, and deploy ML models. Many customers also rely on the comprehensive set of purpose-built analytics services from AWS to support a wide range of workloads, including SQL analytics, search analytics, big data processing, and streaming analytics.

Increasingly, customers are not using these tools in isolation; rather, they are using a combination of analytics, ML, and generative AI to derive insights and power new experiences for their users. These customers would benefit from a unified environment that brings together familiar AWS tools for analytics, ML, and generative AI, along with easy access to all of their data and the ability to easily collaborate on data projects with other members of their team or organization.

The next generation of SageMaker includes a new, unified studio that gives customers a single data and AI development environment where users can find and access all of the data in their organization, act on it using the best tool for the job across all types of common data use cases, and collaborate within teams and across roles to scale their data and AI initiatives. SageMaker Unified Studio brings together functionality and tools from the range of standalone “studios,” query editors, and visual tools that customers enjoy today in Amazon Bedrock, Amazon EMR, Amazon Redshift, AWS Glue, and the existing SageMaker Studio.

This makes it easy for customers to access and use these capabilities to discover and prepare data, author queries or code, process data, and build ML models. Amazon Q Developer assists along the way with development tasks such as data discovery, coding, SQL generation, and data integration. For example, a user could ask Amazon Q, “What data should I use to get a better idea of product sales?” or “Generate a SQL query to calculate total revenue by product category.” Users can securely publish and share data, models, applications, and other artifacts with members of their team or organization, accelerating discoverability and usage of data assets.
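To make the second prompt concrete, the kind of SQL such a request would typically yield is a grouped aggregation like the sketch below, run here against a hypothetical sales table in SQLite. The table and column names are illustrative assumptions, not part of SageMaker or Amazon Q.

```python
import sqlite3

# Hypothetical sales table; schema and names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (product_category TEXT, quantity INTEGER, unit_price REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("Books", 3, 10.0), ("Books", 1, 12.0), ("Electronics", 2, 99.5)],
)

# A plausible answer to "total revenue by product category":
query = """
    SELECT product_category,
           SUM(quantity * unit_price) AS total_revenue
    FROM sales
    GROUP BY product_category
    ORDER BY total_revenue DESC
"""
for category, revenue in conn.execute(query):
    print(category, revenue)
```

The point is not the SQL itself but that the assistant produces it from a plain-language request, sparing the user from writing the aggregation by hand.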

With the Amazon Bedrock integrated development environment (IDE) in SageMaker Unified Studio, users can build and deploy generative AI applications quickly and easily using Amazon Bedrock’s selection of high-performing foundation models and tools such as Agents, Guardrails, Knowledge Bases, and Flows. SageMaker Unified Studio comes with data discovery, sharing, and governance capabilities built in, so analysts, data scientists, and engineers can easily search and find the right data they need for their use case, while applying desired security controls and permissions, maintaining access control, and securing their data.

Meet Enterprise Security Needs with Amazon SageMaker Data and AI Governance

The next generation of SageMaker simplifies how organizations discover, govern, and collaborate on data and AI. With SageMaker Catalog, built on Amazon DataZone, administrators can define and implement consistent access policies using a single permission model with granular controls, while data workers across teams can securely discover and access approved data and models enriched with business-context metadata created by generative AI.

Administrators can easily define and enforce permissions across models, tools, and data sources, while customized safeguards help make AI applications secure and compliant. Customers can also safeguard their AI models with data classification, toxicity detection, guardrails, and responsible AI policies within SageMaker.

Reduce Data Silos and Unify Data with Amazon SageMaker Lakehouse

Today, more than one million data lakes are built on Amazon Simple Storage Service (Amazon S3), allowing customers to centralize their data assets and derive value with AWS analytics, AI, and ML tools. Data lakes make it possible for customers to store their data as-is—making it easy to combine data from multiple sources. Customers may have data spread across multiple data lakes, as well as a data warehouse, and would benefit from a simple way to unify all of this data.

SageMaker Lakehouse provides unified access to data stored in Amazon S3 data lakes, Redshift data warehouses, and federated data sources, reducing data silos and making it easy to query data no matter how and where it is physically stored. With this new Apache Iceberg-compatible lakehouse capability, customers can access and work with all of their data from within SageMaker Unified Studio, as well as from familiar AI and ML tools and query engines that support the Apache Iceberg open standard.

Now, customers can use their preferred analytics and ML tools on their data, no matter how and where it is physically stored, to support use cases including SQL analytics, ad-hoc querying, data science, ML, and generative AI. SageMaker Lakehouse provides integrated, fine-grained access controls that are consistently applied across the data in all analytics and AI tools in the lakehouse, enabling customers to define permissions once and securely share data across their organization.
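The "define permissions once" idea can be pictured with a small sketch: a single catalog of grants that every engine (SQL editor, Spark job, notebook) consults before touching a table. All names below are hypothetical; this is not the SageMaker Catalog API, only an illustration of the single-permission-model pattern.

```python
# Minimal sketch of a "define once, enforce everywhere" permission model.
# Roles, table names, and actions are invented for illustration.
GRANTS = {
    ("analyst", "sales.orders"): {"SELECT"},
    ("data_scientist", "sales.orders"): {"SELECT"},
    ("data_scientist", "ml.training_set"): {"SELECT", "INSERT"},
}

def is_allowed(role: str, table: str, action: str) -> bool:
    """Every tool in the lakehouse would run this same check, so a grant
    (or its revocation) takes effect everywhere at once."""
    return action in GRANTS.get((role, table), set())

print(is_allowed("analyst", "sales.orders", "SELECT"))
print(is_allowed("analyst", "ml.training_set", "SELECT"))
```

Because all engines share one grant store, administrators avoid the drift that arises when each tool keeps its own copy of access rules.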

Quickly and Easily Access SaaS Data with New Zero-ETL Integrations

To truly leverage data across their operations, businesses need seamless access to all their data, regardless of its location. That is why AWS has invested in a zero-ETL future, where data integration is no longer a tedious, manual effort, and customers can easily get their data where they need it. This includes zero-ETL integrations for Amazon Aurora MySQL and PostgreSQL, Amazon RDS for MySQL, and Amazon DynamoDB with Amazon Redshift, which help customers quickly and easily access data from popular relational and non-relational databases in Redshift and SageMaker Lakehouse for analytics and ML. In addition to operational databases and data lakes, many customers also have critical enterprise data stored in SaaS applications and would benefit from easy access to this data for analytics and ML.

The new zero-ETL integrations with SaaS applications make it easy for customers to access their data from applications such as Zendesk and SAP in SageMaker Lakehouse and Redshift for analytics and AI. This removes the need for data pipelines, which can be challenging and costly to build, complex to manage, and prone to errors that may delay access to time-sensitive insights. Zero-ETL integrations for SaaS applications incorporate best practices for full data sync, detection of incremental updates and deletes, and target merge operations.
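The "incremental updates and deletes" and "target merge operations" mentioned above are exactly the steps a hand-built pipeline would otherwise have to implement. A minimal sketch of that merge logic, with invented record shapes (nothing here reflects the actual zero-ETL implementation), looks like this:

```python
# Sketch of the merge step a zero-ETL integration automates: applying a batch
# of incremental upserts and deletes from a source SaaS app to a target table.
# Field names ("id", "row", "deleted") are hypothetical.

def merge(target: dict, changes: list) -> dict:
    """target maps primary key -> row; each change is an upsert or a delete."""
    merged = dict(target)
    for change in changes:
        key = change["id"]
        if change.get("deleted"):
            merged.pop(key, None)        # propagate source-side deletes
        else:
            merged[key] = change["row"]  # insert new rows, update existing ones
    return merged

target = {1: {"status": "open"}, 2: {"status": "open"}}
changes = [
    {"id": 2, "row": {"status": "closed"}},  # update
    {"id": 3, "row": {"status": "open"}},    # insert
    {"id": 1, "deleted": True},              # delete
]
print(merge(target, changes))
```

Even this toy version hints at the error-prone details (ordering, delete handling, idempotency) that the managed integrations absorb on the customer's behalf.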

Organizations of all sizes and across industries, including Infosys, Intuit, and Woolworths, are already benefiting from AWS zero-ETL integrations to quickly and easily connect and analyze data without building and managing data pipelines. With the zero-ETL integrations for SaaS applications, for example, online real estate platform idealista will be able to simplify their data extraction and ingestion processes, eliminating the need for multiple pipelines to access data stored in third-party SaaS applications and freeing their data engineering team to focus on deriving actionable insights from data rather than building and managing infrastructure.

The next generation of SageMaker is available today. SageMaker Unified Studio is currently in preview and will be made generally available soon.

See the AWS Blog for details on today’s announcement.


Source: AWS

BigDATAwire