Tag: Parquet
How Acceldata Helped T-Mobile’s Data Modernization Strategy
When T-Mobile started migrating some of its data estate from an on-prem Hadoop system to cloud-based data platforms, it found the move liberating. But as it settled into a hybrid-cloud world, T-Mobile realized costs were Read more…
Tabular Plows Ahead with Iceberg Data Service, $26M Round
Apache Iceberg appears to have the inside track to become the de facto standard for big data table formats at this point. And with today’s $26 million round, the company behind the open source project, Tabular, is bette Read more…
Hugging Face and Databricks Streamline Dataset Creation with Spark
Databricks and Hugging Face have unveiled a new integration that will allow users to create a Hugging Face dataset from an Apache Spark dataframe. Databricks has written and committed these Spark changes to the Huggin Read more…
InfluxData Revamps InfluxDB with 3.0 Release, Embraces Apache Arrow
InfluxData has announced the release of InfluxDB 3.0, its newly rebuilt database and storage engine for time series analytics. Previous iterations of InfluxDB were written in Go, a programming language known for its c Read more…
Teradata Unveils New Data Lake, Advanced Analytics Offerings
Teradata today rolled out a pair of new products designed to broaden its appeal to a new generation of users, including a new data lake called VantageCloud Lake that melds the workload management capabilities of its epon Read more…
Cloudera Picks Iceberg, Touts 10x Boost in Impala
Cloudera is now supporting the open source Apache Iceberg table format in its cloud data platform, or lakehouse, the vendor announced yesterday. The move will help to ensure transactional integrity in the big data enviro Read more…
Why the Open Sourcing of Databricks Delta Lake Table Format Is a Big Deal
Databricks introduced Delta back in 2019 as a way to gain transactional integrity with the Parquet data table format for Spark cloud workloads. Over time, Delta evolved to become its own table format and also to become m Read more…
Tabular Seeks to Remake Cloud Data Lakes in Iceberg’s Image
The creators of the table format Apache Iceberg launched a new company this summer called Tabular that’s aiming to remake how companies store data in the cloud. If the company has its way, much of the minutiae of how da Read more…
A Peek at the Future of the Open Data Architecture
Hadoop may have fizzled out as a data platform, but it laid the groundwork for an open data architecture that continues to grow and evolve today, largely in the cloud. We got a peek at the future of this open data archit Read more…
Presto the Future of Open Data Analytics, Foundation Says
The openness of Presto, its adherence to standard SQL, and the ubiquity and performance of modern cloud storage have combined to put Presto in the driver’s seat of the big data analytics stack for the foreseeable futur Read more…
Data Headaches Targeted with a Dose of .BIG
Working with large numbers of files--and large files--remains a roadblock to productivity for data professionals around the world. Now a software startup named Exponam says it has come up with a potential solution to the Read more…
Return of the Living Data
When Google published a paper on its proprietary BigQuery engine about nine years ago, the open source community reproduced the technology as best they could, just as they did with MapReduce and the Google File System, w Read more…
Data Startup Aims to Make S3 ‘Work Like Dropbox’
Quilt Data emerged from stealth today with a new service that aims to make S3 work more like Dropbox, the handy file sharing service. For about $500 per month, Quilt Data allows teams to securely share large files that a Read more…
Celebrating Data Independence
Every company wants the independence to do what they wish with their data. That's one of the first assumptions underlying this whole big data movement. But depending on where and how a business stores its data -- such as Read more…
Big Data File Formats Demystified
So you're filling your Hadoop cluster with reams of raw data, and your data analysts and scientists are champing at the bit to get started. Then the question hits you: How are you going to store all this data so they can Read more…