Pivotal Refreshes Hadoop Offering, Adds In-Memory Processing
As the commercial Hadoop field grows increasingly competitive, providers of this popular big data framework are working to differentiate their offerings. Pivotal, for one, has been honing the technologies developed and acquired by its parent companies as part of its vision to help enterprises realize the full potential of big data.
The cloud and big data company, which was spun out of EMC and VMware in early 2013, today announced Pivotal HD 2.0, an enterprise Hadoop distribution based on Apache Hadoop 2.2 that now includes the YARN (Yet Another Resource Negotiator) resource management capability. The news comes just one year after the debut of Pivotal HD 1.0, an Apache Hadoop distro with native integration of EMC’s Greenplum massively parallel processing (MPP) database.
This refresh of Pivotal’s commercial Hadoop offering includes the integration of GemFire XD, now in broad release. Acquired when VMware purchased GemStone in 2010, the in-memory database is designed to help businesses make prescriptive decisions in real time. Use cases include stock trading, fraud detection, intelligence for energy companies, routing for the telecom industry, and reservations.
By combining Apache Hadoop with the HAWQ query engine and GemFire XD, Pivotal seeks to create a more powerful big data application framework than Hadoop alone.
“When it comes to Hadoop, other approaches in the market have left customers with a mishmash of un-integrated products and processes,” explains Josh Klahr, vice president of product management at Pivotal. “Building on our industry-leading SQL-on-Hadoop offer, HAWQ, Pivotal HD 2.0 is the first platform to fully integrate proven enterprise in-memory technology, Pivotal GemFire XD, with advanced services on Hadoop 2.2 that provide native support for a comprehensive data science toolset. Data driven businesses now have the capabilities they need to gain a massive head start toward developing analytics and applications for more intelligent and innovative products and services.”
The software vendor is positioning Pivotal HD 2.0 as a key component of its Business Data Lake architecture, which seeks to unlock the promise of real-time data and analytics by removing the cost constraints of data storage and movement. The Business Data Lake approach can be likened to Cloudera’s Enterprise Data Hub concept, except that the proprietary HAWQ and GemFire XD components do not yet fully integrate with YARN.
Pivotal HD 2.0 also marks the world’s first enterprise integration of GraphLab, an open-source, graph-based distributed computation framework for advanced analytics and machine learning. GraphLab is sometimes referred to as the Hadoop of graphs.
Another main element of the update is an enhanced HAWQ SQL query engine, the real-time parallel query engine acquired from Greenplum that replaces Hive in the enterprise Hadoop product. With increased MADlib support, HAWQ can now leverage MADlib’s more than 50 in-database algorithms. The update also supports the Parquet columnar storage format as well as various data learning experiments using custom functions for R, Python, and Java-based queries and applications.
More information about the revised commercial Hadoop offering will be forthcoming in a webinar, hosted by Pivotal, to be held on March 27. Pivotal will also use the opportunity to introduce the core components of the Business Data Lake environment.