InfluxData Revamps InfluxDB with 3.0 Release, Embraces Apache Arrow
InfluxData has announced the release of InfluxDB 3.0, its newly rebuilt database and storage engine for time series analytics.
Previous iterations of InfluxDB were written in Go, a programming language known for its compute resource efficiency. Version 3.0 is written in Rust, the programming language used in the Apache Arrow ecosystem. This ecosystem includes the DataFusion distributed SQL query engine, Flight SQL query engine built on top of Apache Arrow Flight, and Parquet, a columnar storage file format.
This release supports use cases with high cardinality data, or data with a large number of distinct values or categories. This data type can be resource intensive as it requires more storage space and is harder to compress, slower to query, and trickier to incorporate in real-time analytics.
In a previous announcement, InfluxData CTO Paul Dix said the new storage engine represents the next phase of InfluxDB, where the company is bringing metric data and event data time series into a single database core, which he says gives users the ability to create time series on the fly from raw, high-precision event data.
“We decided the new core should be built in Rust because of its many advantages with high-performance systems software. We also decided to build it around the Apache Arrow ecosystem for greater interoperability and collaboration with a much wider group of developers,” he wrote.
InfluxDB 3.0 was developed as the open source project InfluxDB IOx, announced last year. In another blog post, InfluxData VP of Product Rick Spencer calls the Apache Arrow Project’s specification for columnar data the “gold standard for high performance computing for analytics use cases.” He goes on to say that Parquet’s compression achieves orders of magnitude gains in disk space efficiency. He also mentions the company has enhanced DataFusion’s SQL dialect to include key time series functions, along with bringing InfluxQL, InfluxData’s time series query language, forward into DataFusion.
InfluxData claims that compared to previous versions, InfluxDB 3.0 delivers 100x faster queries across high cardinality data for real-time query response, 10x ingest performance, and 10x greater data compression. Users can continuously ingest, transform, and analyze “hundreds of millions” of time series data points with no limitations, the company says.
InfluxDB 3.0 is now the foundation for all InfluxDB products and supports a full range of time series data (metrics, events, and traces) to power use cases around high-cardinality time series data like observability, real-time analytics, and IoT sensor data. Version 3.0 is available now in InfluxDB Cloud Serverless and InfluxDB Cloud Dedicated, a newly announced single tenant version of InfluxDB.
The company also teased two new products coming later this year: InfluxDB 3.0 Clustered, an evolution of InfluxDB Enterprise, and InfluxDB 3.0 Edge, a single node instance for local and edge deployments.
InfluxData raised $81 million in a Series E round in February for a total of $171 million in equity funding. The fresh funds helped to accelerate the development and rollout of this new database engine, the company says.
“InfluxDB 3.0 is a major milestone for InfluxData, developed with cutting edge technologies focused on scale and performance to deliver the future of time series,” said InfluxData CEO Evan Kaplan in a release. “Built on Apache Arrow, the most important ecosystem in data management, InfluxDB 3.0 delivers on our vision to analyze metric, event, and trace data in a single datastore with unlimited cardinality. InfluxDB 3.0 stands as a massive leap forward for both time series and real-time analytics, providing unparalleled speed and infinite scalability to large data sets for the first time.”
Related Items:
It’s About Time for InfluxData
A Peek at the Future of the Open Data Architecture
Voltron Data Releases Enterprise Subscription for Arrow