
New Benchmark for Real-Time Analytics Released by Timescale

Real-time analytics pushes the limits on data that distributed hardware and software can deliver. To adequately measure the relative performance of real-time analytics databases, Timescale today released a real-time analytics benchmark dubbed RTABench.
Timescale is a real-time analytics database provider by way of its flagship offering, TimescaleDB, which is a modified version of Postgres that treats time-series data as a first-class data type. The software has been adopted in gaming and other consumer-facing applications that are exposed to fast-changing data and require low-latency responses to many concurrent users.
Those three database capabilities–massive concurrency, low latency, and real-time updates–are largely what separate the new crop of real-time analytics databases from their traditional column-store brethren. While the data warehouses (or data lakes) from vendors like Snowflake and Databricks can adequately handle ad-hoc queries on big data sets, companies with real-time analytics needs often turn to other vendors, such as Timescale, ClickHouse, StarTree, Imply, StarRocks, Materialize, and others.
“Historically, the industry has relied on TPC-H and TPC-DS as the standard benchmarks for evaluating analytical databases,” Timescale wrote in its blog today. “They are designed to simulate business intelligence and decision support systems that run complex, ad-hoc analytical queries across multiple tables on large data sets.”
Timescale notes that ClickHouse launched ClickBench, a real-time analytics benchmark. Several dozen databases have taken the test since it launched in 2022, with the Umbra database currently holding the number one position. TimescaleDB shows five entries in the ClickBench results, where it sits in the bottom 25%.
While ClickBench has received quite a bit of attention, the folks at Timescale weren’t entirely happy with it. The company says that the way ClickBench evaluates databases–by “using a single table of clickstream data, representative of workloads like web analytics, BI, and log aggregation”–isn’t conducive to a fair hearing on the full breadth of real-time analytic workloads.
“It [ClickBench] also favors full-table large scans and large-scale aggregations on denormalized data,” Timescale says in its blog. “Full table scans and large aggregations on a single denormalized table do not effectively represent the query patterns in applications delivering real-time analytics.”
So Timescale developed its own benchmark to better address the real-world workloads that it sees real-time analytics being asked to run. What makes RTABench different is how it handles behind-the-scenes data tasks in real-time analytics databases, such as joins, filters, and pre-aggregations.
For instance, database joins are important to bring together tables storing disparate data, such as event data and metadata, Timescale says. “You need fast joins on fresh data to retrieve related records from multiple tables,” the company writes in the blog.
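To make the join pattern concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample rows are hypothetical and chosen only for illustration; they are not RTABench's actual schema, and SQLite stands in for a real-time analytics database.

```python
import sqlite3

# Hypothetical schema for illustration only; not RTABench's actual tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE order_events (
    order_id INTEGER, customer_id INTEGER, event_type TEXT, event_time TEXT
);
""")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex")],
)
conn.executemany(
    "INSERT INTO order_events VALUES (?, ?, ?, ?)",
    [
        (100, 1, "Created", "2025-01-01T10:00:00"),
        (100, 1, "Shipped", "2025-01-01T12:00:00"),
        (101, 2, "Created", "2025-01-01T11:00:00"),
    ],
)

def events_with_customer(conn):
    """Join each event row to the customer metadata it references."""
    return conn.execute("""
        SELECT e.order_id, c.name, e.event_type
        FROM order_events AS e
        JOIN customers AS c ON c.customer_id = e.customer_id
        ORDER BY e.event_time
    """).fetchall()

rows = events_with_customer(conn)
```

The event table stays narrow and the customer name lives in one place, which is the normalized layout that makes fast joins matter.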
Filtering and indexing are other common database techniques to avoid the dreaded full-table scans. “Databases built for real-time applications must excel at indexing, partitioning, and fast lookups–not just bulk aggregations over large datasets,” Timescale writes.
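The difference between a full-table scan and an indexed lookup can be seen in a small sketch, again using SQLite as a stand-in. The table, the index name idx_events_order, and the data are assumptions made for this example, and the exact wording of the query plan varies between SQLite versions.

```python
import sqlite3

# Hypothetical table for illustration; SQLite stands in for a real-time database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_events (order_id INTEGER, event_type TEXT)")
conn.executemany(
    "INSERT INTO order_events VALUES (?, ?)",
    [(i, "Created") for i in range(1000)],
)

QUERY = "SELECT event_type FROM order_events WHERE order_id = ?"

def query_plan(conn):
    """Return SQLite's description of how it would execute QUERY."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (42,)).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)

plan_before = query_plan(conn)  # no index yet: the planner scans the table
conn.execute("CREATE INDEX idx_events_order ON order_events (order_id)")
plan_after = query_plan(conn)   # the planner can now search via the index

print(plan_before)
print(plan_after)
```

The same selective filter goes from reading every row to a direct lookup once the index exists, which is the behavior the benchmark's filtering queries are meant to exercise.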
Pre-aggregations are another common way to speed up the inevitable queries that will come down the pike. “Existing benchmarks like ClickBench do not benchmark pre-aggregation,” Timescale writes, “but many real-time applications depend on it for sub-second response times.”
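As a rough illustration of the idea, the sketch below keeps a running total per day and customer that is updated as each order arrives, so a later query reads one precomputed value instead of re-scanning the raw rows. The class and method names are hypothetical, and this in-memory version only mimics what a database does with incrementally updated materialized views.

```python
from collections import defaultdict

class DailyRevenueRollup:
    """Toy pre-aggregation: totals are maintained at ingest time."""

    def __init__(self):
        self.raw_orders = []            # (day, customer, amount_cents)
        self.totals = defaultdict(int)  # (day, customer) -> amount_cents

    def ingest(self, day, customer, amount_cents):
        # Store the raw row and update the rollup in the same step.
        self.raw_orders.append((day, customer, amount_cents))
        self.totals[(day, customer)] += amount_cents

    def revenue_by_scan(self, day, customer):
        # Baseline: aggregate over every raw row at query time.
        return sum(a for d, c, a in self.raw_orders if d == day and c == customer)

    def revenue_preaggregated(self, day, customer):
        # Pre-aggregated path: a single lookup, no scan.
        return self.totals.get((day, customer), 0)

rollup = DailyRevenueRollup()
rollup.ingest("2025-03-01", "Acme", 1500)
rollup.ingest("2025-03-01", "Acme", 2500)
rollup.ingest("2025-03-01", "Globex", 700)
rollup.ingest("2025-03-02", "Acme", 300)
```

Both query paths return the same answer; the pre-aggregated one simply moves the work from query time to ingest time.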
To develop RTABench, Timescale started with the open source ClickBench framework, and then modified it with different data and queries. It also designed RTABench to work on normalized data (i.e., data straight from the database), as opposed to the denormalized data that ClickBench uses.
The database that Timescale created for the benchmark contains 171 million order events, about 1,100 customers, more than 9,250 products, and about 10 million historical orders. Timescale then created 40 queries that are designed to test how the database handles common tasks, such as counting the number of departed shipments per day from a specific terminal, finding the last recorded status of a given order, or showing the total revenue generated by each customer in the last 30 days.
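Two of those query shapes might look something like the following sketch. The schema, terminal names, and event types are assumptions made for illustration and are not taken from RTABench's actual queries; SQLite is used only because it ships with Python.

```python
import sqlite3

# Hypothetical event table; not RTABench's real schema or data.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE order_events (
        order_id INTEGER, terminal TEXT, event_type TEXT, event_time TEXT
    )
""")
conn.executemany(
    "INSERT INTO order_events VALUES (?, ?, ?, ?)",
    [
        (1, "Berlin", "Created", "2025-03-01 08:00:00"),
        (1, "Berlin", "Departed", "2025-03-01 09:00:00"),
        (2, "Berlin", "Departed", "2025-03-01 10:00:00"),
        (3, "Berlin", "Departed", "2025-03-02 07:30:00"),
        (4, "London", "Departed", "2025-03-01 11:00:00"),
        (1, "London", "Delivered", "2025-03-03 15:00:00"),
    ],
)

def departed_per_day(conn, terminal):
    """Count departed shipments per day for one terminal."""
    return conn.execute("""
        SELECT date(event_time) AS day, COUNT(*)
        FROM order_events
        WHERE terminal = ? AND event_type = 'Departed'
        GROUP BY day
        ORDER BY day
    """, (terminal,)).fetchall()

def last_status(conn, order_id):
    """Return the most recent event type recorded for an order, or None."""
    row = conn.execute("""
        SELECT event_type FROM order_events
        WHERE order_id = ?
        ORDER BY event_time DESC
        LIMIT 1
    """, (order_id,)).fetchone()
    return row[0] if row else None

berlin_departures = departed_per_day(conn, "Berlin")
order_1_status = last_status(conn, 1)
```

Both queries touch only a small, filtered slice of the event table, in contrast to the full-table aggregations that Timescale says dominate ClickBench.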
“RTABench is a new benchmark we have developed to evaluate databases using query patterns that mirror real-world application workloads–something missing from existing benchmarks,” Timescale says in its blog. “Unlike ClickBench and other benchmarks, RTABench closely reflects the actual needs of real-time analytics applications, measuring key factors such as joins, selective filtering, and pre-aggregations.”
The company decided to leave out several measurements. For instance, while pre-aggregation queries that use incrementally updated materialized views are an important feature of its database, only TimescaleDB and ClickHouse currently support them, so the company left them out. It also left out data ingest and high-concurrency queries.
“These additions would add a lot of complexity, make the benchmark much harder and longer to run, and introduce more variance in the results, making them harder to reproduce and interpret,” the company noted. “We’ve decided to leave those out to make the benchmark easier to use, but we will explore ways to add them while keeping the benchmark simple to run and interpret.”
The company is publishing the results of RTABench tests at rtabench.com. TimescaleDB, ClickHouse, MongoDB, Postgres, and MySQL currently are the only databases that have been tested. The company is openly soliciting people to help with the project. You can read more in the company’s blog post.