March 8, 2024

Apache Arrow Announces DataFusion Comet

Ali Azhar

Apache Arrow, a software development platform for building high-performance applications, has announced the donation of the Comet project.

Comet is an Apache Spark plugin that uses Apache Arrow DataFusion to improve query efficiency and shorten query runtime. It does this by optimizing query execution and leveraging hardware accelerators.

With its ability to support multiple analytics engines and accelerate analytical workloads on big data systems, Apache Arrow has become increasingly popular with software developers, data engineers, and data analysts. With Apache Arrow, users of big data processing and analytics engines such as Spark, Drill, and Impala can access data without reformatting it. Comet joins other efforts to accelerate Spark with native columnar engines, such as the Databricks Photon engine and open-source projects such as Spark RAPIDS and Gluten.

Interestingly, Comet was originally implemented at Apple, and the engineers on that project are also contributors to Apache Arrow DataFusion. The Comet project is designed to replace Spark’s JVM-based SQL execution engine and offer better performance for a variety of workloads.

The Comet donation will not result in any major disruption for users, as they can still interact with the same Spark ecosystem, tools, and APIs. Queries will still go through Spark’s SQL planner, task scheduler, and cluster manager. However, execution is delegated to Comet, which is designed to be more efficient than a JVM-based implementation. This means better performance with no change in Spark behavior from the end users’ point of view.
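As a rough illustration of what "same APIs, different execution" can look like in practice, the sketch below builds a `spark-submit` command line that loads Comet as a plugin while leaving the application itself untouched. The configuration keys shown follow the Comet installation guide as we understand it and may differ between releases; the jar path and the `comet_submit_args` helper are hypothetical and exist only for this example.

```python
import shlex

# Hypothetical jar location; substitute the Comet jar built for your Spark version.
COMET_JAR = "/path/to/comet-spark.jar"


def comet_submit_args(app, jar=COMET_JAR, enabled=True):
    """Build a spark-submit argument list that loads Comet as a Spark plugin.

    The config keys are assumptions based on the Comet documentation and
    should be checked against the release actually in use.
    """
    conf = {
        "spark.plugins": "org.apache.spark.CometPlugin",
        "spark.comet.enabled": str(enabled).lower(),
    }
    args = ["spark-submit", "--jars", jar]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app)  # the application script itself is unchanged
    return args


if __name__ == "__main__":
    # Print a shell-safe command line for an existing, unmodified job.
    print(shlex.join(comet_submit_args("job.py")))
```

The point of the sketch is that only the launch configuration changes; the job script passed as the last argument is the same one that would run on stock Spark.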


Comet supports a full implementation of Spark operators and built-in expressions. It also offers a native Parquet implementation for both the reader and the writer. Users can also use the UDF framework to migrate existing UDFs to native code.

Because different applications store data differently, developers often have to manually organize information in memory to speed up processing, which takes extra effort and time. Apache Arrow addresses this by giving data applications a common in-memory format, so they run faster, exchange data with one another easily, and let organizations extract useful insights from their business data more quickly.
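To make the idea of organizing data in memory concrete, here is a minimal sketch that pivots row-oriented records into one contiguous typed buffer per column. It uses only Python's standard `array` module as a stand-in; it is not Arrow's actual format or API, and the `to_columns` helper and sample records are invented for illustration.

```python
from array import array

# Row-oriented layout: each record is a separate object, so the values of a
# single column are scattered across memory.
rows = [
    {"id": 1, "price": 9.5},
    {"id": 2, "price": 12.0},
    {"id": 3, "price": 7.25},
]


def to_columns(records):
    """Pivot records into one contiguous, typed buffer per column."""
    return {
        "id": array("q", (r["id"] for r in records)),        # 64-bit signed ints
        "price": array("d", (r["price"] for r in records)),  # 64-bit floats
    }


columns = to_columns(rows)

# A scan over one column now reads a single contiguous buffer.
total = sum(columns["price"])
print(total)
```

A shared columnar layout of this kind is what lets different engines read the same buffers directly instead of each converting the data into its own representation.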

The co-founder of Apache Arrow, Wes McKinney, was one of Datanami’s People to Watch 2018. In an interview with Datanami that year, McKinney shared that as big data systems continue to grow more mature, he hoped to see “increased ecosystem-spanning collaborations on projects like Arrow to help with platform interoperability and architectural simplification. I believe that this defragmentation, so to speak, will make the whole ecosystem more productive and successful using open source big data technologies.”

With the donation, the Comet project is expected to accelerate its development and grow its community. Given the current momentum toward accelerating Spark through native vectorized execution, Apache believes that open-sourcing Comet will benefit other Spark users.

Applications: Data Management

Tags: Apache Arrow, big data systems, Comet, Spark, Wes McKinney
