Rockset Looks to Compute-Compute Isolation for Real-Time Advantage
The separation of compute and storage is a bedrock of big data architecture and has enabled nearly infinite scalability in cloud storage. Now a related concept called compute-compute isolation is being introduced to databases used for real-time analytics, with Rockset leading the way.
In the early days of the big data revolution, compute and storage were cohabitants on the same nodes in a cluster. If you wanted to add more storage to your Hadoop cluster, then you would also be adding more compute. Similarly, if you needed more compute to handle tough queries, you would also be adding more storage, thanks to the concept of storage locality adopted to minimize data movement (and the network congestion it brings) in Hadoop.
However, the unrelenting growth of big data meant organizations were buying compute capacity when all they needed was more storage, or vice versa. By separating the compute and storage tiers, organizations gained the capability to scale each resource independently, enabling them to expand clusters to meet their specific storage or compute requirements without wasting money on unneeded resources.
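To make the waste concrete, here is a rough back-of-the-envelope sketch. The node size (10 TB and 16 cores per node) and the workload (400 TB, 64 cores) are hypothetical numbers chosen for illustration, not figures from any vendor.

```python
import math

def coupled_nodes(storage_tb, cores, tb_per_node, cores_per_node):
    """Nodes needed when every node bundles fixed storage and compute.

    The cluster must be sized for whichever resource runs out first.
    """
    for_storage = math.ceil(storage_tb / tb_per_node)
    for_compute = math.ceil(cores / cores_per_node)
    return max(for_storage, for_compute)

def separated_units(storage_tb, cores, tb_per_node, cores_per_node):
    """Storage units and compute units needed when the tiers scale independently."""
    return (math.ceil(storage_tb / tb_per_node),
            math.ceil(cores / cores_per_node))

# Hypothetical storage-heavy workload: 400 TB of data, 64 cores of query load.
TB_PER_NODE, CORES_PER_NODE = 10, 16
nodes = coupled_nodes(400, 64, TB_PER_NODE, CORES_PER_NODE)
idle_cores = nodes * CORES_PER_NODE - 64
storage_units, compute_units = separated_units(400, 64, TB_PER_NODE, CORES_PER_NODE)

print(nodes, idle_cores)             # coupled: node count and cores bought but unused
print(storage_units, compute_units)  # separated: each tier sized on its own
```

Under these assumed numbers, the coupled cluster is sized entirely by storage and carries hundreds of cores it never needs, while the separated design buys only the compute the workload calls for.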
We take the separation of compute and storage as a given in the cloud. Today, customers store massive amounts of data in object stores, such as Microsoft ADLS or AWS S3, and bring specific compute engines to bear on that data as needed. This has also helped to unchain data while spurring development of standalone analytic engines, such as Presto, Trino, and Dremio, as well as helping the rise of table formats, such as Apache Iceberg and Delta Lake.
Real-time analytics databases have also benefited from the separation of compute and storage. This emerging product category serves organizations that need to run a large number of SQL queries on large amounts of streaming data with low latency. Vendors like Rockset, ClickHouse, Imply, and StarTree are leading the development of real-time databases.
Because of the unique computational demands of these products, which must run data ingestion workloads and SQL queries at the same time, an additional step may be required: compute-compute separation.
Rockset co-founder and CEO Venkat Venkataramani says compute-compute separation, which Rockset announced in its cloud analytics database earlier this year, enables Rockset to continue to query data at high speeds while massive amounts of data are simultaneously being loaded into the database, with a guarantee that one will not impact the other.
Compute-compute separation protects against flash floods of data on the ingest side, according to Venkataramani. “If there’s more data [arriving], just scale the ingest compute, and your queries will be completely unaffected by it,” he says. “Your applications will be just as responsive as they were. Whether there’s a flash flood of data or not doesn’t matter.”
Similarly, if there’s a sudden spike of query activities and more analysis happening on the stream of data, the data ingest won’t bog down as a result of more CPUs going toward crunching SQL. That can be critical when responding to an anomaly, such as suspicious activity that could turn out to be a security threat.
“Your query compute blows up, and your entire application becomes not real-time anymore because all the compute is getting hijacked by the queries,” the Datanami 2022 Person to Watch says. “And now you’re not ingesting data in real time, and you have a huge lag exactly when you don’t want that lag. I’m doing a lot of investigation suddenly, and now my blind spot is going from one second to 10 minutes. Those 10 minutes are exactly when I need real-time.”
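The effect described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not a model of Rockset's actual scheduler: one unit of compute processes one event per tick, queries are served first on a shared pool, and all capacities, rates, and function names are hypothetical.

```python
def ingest_backlog(query_demand, ingest_rate, ingest_capacity):
    """Track how many events are waiting to be ingested after each tick.

    query_demand    -- compute units the queries ask for at each tick
    ingest_rate     -- events arriving per tick
    ingest_capacity -- function mapping query demand to the compute
                       units left over for ingest during that tick
    """
    backlog, history = 0, []
    for demand in query_demand:
        backlog = max(0, backlog + ingest_rate - ingest_capacity(demand))
        history.append(backlog)
    return history

TOTAL_UNITS = 10   # hypothetical total compute
INGEST_RATE = 4    # hypothetical events arriving per tick

def shared_pool(demand):
    # One pool: queries are served first, ingest gets whatever is left.
    return max(0, TOTAL_UNITS - demand)

def isolated_pool(demand):
    # Dedicated ingest pool of 5 units; query demand cannot touch it.
    return 5

# A query spike (9 units) in the middle of otherwise light activity.
spike = [3, 3, 9, 9, 9, 3, 3]
shared_lag = ingest_backlog(spike, INGEST_RATE, shared_pool)
isolated_lag = ingest_backlog(spike, INGEST_RATE, isolated_pool)

print(shared_lag)    # backlog builds during the spike, then drains
print(isolated_lag)  # ingest keeps pace throughout
```

In the shared case the backlog grows for as long as the spike lasts, which is the widening "blind spot" in the quote. In the isolated case the ingest pool is untouched. The queries themselves would still be short of compute during the spike, which in this model would be handled by scaling the query pool on its own.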
Having additional compute resources to throw at a flash flood of data or a burst of SQL activity typically requires the organization to be running in the cloud, where it can immediately spin up more compute clusters and dedicate them to one type of compute in Rockset. In theory, compute-compute separation could also work on-prem, but only if the organization is sitting on large amounts of unused compute capacity. Having spare processors and RAM on the backplane that can be activated at a moment’s notice is common in mainframe environments, but it’s not often encountered in industry-standard compute environments.
Venkataramani says this innovation is giving Rockset an edge in the emerging market for real-time analytics databases. “I think compute-compute separation is not an incremental [improvement],” he says. “It’s a leapfrog movement for the entire analytics space.”
“If real-time analytics were a branch of science, we would have won the Nobel Prize for it,” he continues. “I’m not just saying that because we are the ones that have it. I want every real-time database in the world to have the capability… It just makes sense.”