April 21, 2014

Hortonworks Keen on Cascading-Tez Combo

Alex Woodie

In the future, it will be easier to build big data applications, and they’ll run faster and utilize more real-time data than today’s apps, too. Two vendors working to make that future a reality, Hortonworks and Concurrent, today announced they’ll work together to build and assemble the next generation of Hadoop apps running on YARN, Tez, and Apache Spark.

Hortonworks and Concurrent have been partners for some time. As one of the central Hadoop players, Hortonworks is well aware of Concurrent and its open-source Cascading development framework, which abstracts away the difficult part of writing MapReduce applications with an easy-to-use Java API and library.

Cascading is one of the success stories of first-gen Hadoop apps. Concurrent boasts more than 6,000 commercial deployments of its Cascading framework, and says customers like Nokia, Kohl’s, and Twitter are using it to simplify development of MapReduce apps on Hadoop. The product is being downloaded about 130,000 times per month, putting it on the cusp of big data rock star status.

With the upcoming launch of Cascading 3.0 in June, Concurrent will add support for Tez and Apache Spark, giving customers powerful new options for developing Hadoop applications beyond the MapReduce paradigm. Hortonworks, which already supports Tez with its Hadoop distribution HDP 2.1 and is currently offering a tech preview of Spark, likes where Cascading is headed and wanted to get ahead of customer demands, according to John Kreisa, vice president of corporate strategy for Hortonworks.

“Given the clear adoption patterns we’re seeing with Hadoop around building various data centric apps and the desire to put those apps into production, it made sense to deepen the relationship [with Concurrent],” Kreisa tells Datanami. “We know our customers want to develop apps. We know Cascading is popular–we see it in our user base. So it just made sense for us to take this next step and include it directly in the platform to accelerate the adoption of Hadoop.”

Previously, the two companies worked together to certify the integration and testing of HDP and Cascading, but it was up to customers to obtain the Cascading code and ensure that it worked. Under the expanded pact, Hortonworks will distribute the Cascading software development kit (SDK) as part of HDP and provide level one and level two technical support for customers; Concurrent will provide level three support.

In early June, Hortonworks will include support for the forthcoming release of Cascading 3.0 as a tech preview in the HDP sandbox environment. It will become generally available (GA) in late summer or early fall, says Tim Hall, vice president of product management for Hortonworks.

Hortonworks is a big believer in how Concurrent is building support for Tez into Cascading 3.0. “Tez is a significant leap forward,” Hall says. “It’s one of the critical things Hortonworks has been investing in from the open community for Hadoop, which is moving this from a batch-centric, mostly serialized approach to accessing data on Hadoop–that was MapReduce 1–and shifting this to a mixed workload environment that runs on YARN.”

The recent launch of HDP 2.1, which enabled Hive to use either the legacy MapReduce execution engine or the new Tez engine, is Hortonworks’ contribution. “Concurrent is going to follow that lead and go down that path as well with the Cascading SDK,” Hall says. “We will likely invest in working with the open source community to move some of these other tools from legacy MapReduce 2 to the next generation, which is Tez.”

Cascading 3.0 will also support Apache Spark, the in-memory framework that’s gaining a ton of momentum as yet another replacement for MapReduce. Hortonworks, whose developers largely spearheaded the development of Tez, is taking a bit of a wait-and-see approach regarding Cascading and its Spark prospects.

“One of the interesting things about the Cascading SDK is it does provide some additional libraries on top of the Java API. One of the ones we’re most interested in is Scalding libraries, which provides us a Scala interface,” Hall says. “Obviously having that access point, and seeing what the interest is in the community of Scala and the relationship of that Scalding SDK and how it does or does not work with Spark, will be something we’ll be looking at very closely with our customers.”

Applications: Data Mining, Enterprise Analytics

Technologies: Middleware

Vendors: Hortonworks

Tags: Cascading, Hadoop, mapreduce, Spark, tez, yarn
