Hortonworks Shares Vision of Connected Data Planes
First came the data lake, then the data hub. This week at its annual Hadoop Summit conference, Hortonworks (NASDAQ: HDP) will be sharing its vision of a connected data plane that presents a single interface for analyzing and managing data wherever it is: at rest in Hadoop, in motion on Kafka, in the data center, or in the cloud.
The rise of data-emitting devices connected through wireless LTE networks is creating a new set of opportunities and product requirements for Hortonworks customers, explains Matt Morgan, the company’s vice president of product and alliance marketing.
“Today, big data use cases are requiring cloud connectivity in a way that’s more substantial than we’ve seen in the past,” Morgan says. “Connected data platforms are about this federated data fabric (a data plane is what we’re calling it), where data exists at the edge points all the way through the cloud and in the data center, and is managed uniformly across all of these areas.”
Hortonworks, like all Hadoop distributors, has pivoted toward streaming data analytics as the Internet of Things (IoT) phenomenon picked up steam. Last year’s launch of Hortonworks Data Flow (HDF) gave Hortonworks a seat at the table in the emerging market for streaming data management and streaming analytics.
“We believe that the next big data wave will bring together all of these distributed platforms,” Morgan tells Datanami. “[Hadoop] is still a very big part of the story and a very big area of business. We’re still in the early innings of that. But when we talk about data fabric and connected data platforms, we’re abstracting a layer above that.”
The company’s Hadoop Summit announcements this week revolve around the launch of HDP 2.5, an “extended services release” aimed at bolstering technologies in key areas, as opposed to a “core” release that updates YARN, HDFS, and MapReduce.
HDP 2.5, which is expected to ship in the third quarter, brings several new capabilities that bolster Hortonworks’ data plane theme. This week Hortonworks is announcing:
- new releases of Apache Storm, Apache Hive, and Apache HBase;
- a new release of Apache Atlas, and Atlas integration with Apache Ranger;
- the official launch of Apache Zeppelin, a new data science notebook;
- a new release of the Apache Ambari console;
- an expansion of its partnership with Microsoft (NASDAQ: MSFT);
- and a partnership with OLAP-on-Hadoop vendor AtScale.
The integration of Atlas (used for data governance in Hadoop) and Ranger (used for access control in Hadoop) will protect the integrity of data flowing into the platform through the use of dynamic tagging. According to Morgan, it’s the first technology of its kind, and is far superior to the row- and column-level access controls that are widespread today.
“It gives you a dynamic security blanket that can automatically update itself based on new data that streams in,” he says. “Maybe it’s PII [personally identifiable information]. You can have policies that say you can see Social Security numbers and names, but we’ll never let you see Social Security numbers and names together.”
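To make the idea of a tag-driven policy concrete, here is a minimal Python sketch. It is not the Apache Atlas or Apache Ranger API; the `Column` type, the `is_allowed` function, and tag names such as `PII.SSN` are hypothetical. The assumption is that a governance catalog attaches classification tags to columns, and the policy checks the combination of tags a request touches rather than naming specific columns, which is what lets it cover newly arriving data as soon as that data is tagged.

```python
from dataclasses import dataclass

# Hypothetical sketch of tag-based access control. This is NOT the Apache
# Atlas or Apache Ranger API; tag names such as "PII.SSN" are made up here.


@dataclass(frozen=True)
class Column:
    name: str
    tags: frozenset = frozenset()  # classification tags assigned by a catalog


# Each entry is a set of tags that must never appear together in one result.
FORBIDDEN_COMBINATIONS = [frozenset({"PII.SSN", "PII.NAME"})]


def is_allowed(requested: list) -> bool:
    """Allow the request unless its columns jointly carry a forbidden tag set."""
    present = set()
    for column in requested:
        present |= column.tags
    return not any(combo <= present for combo in FORBIDDEN_COMBINATIONS)


# A column that arrives later is covered as soon as it is tagged,
# with no change to the policy itself.
ssn = Column("ssn", frozenset({"PII.SSN"}))
name = Column("full_name", frozenset({"PII.NAME"}))
tax_id = Column("tax_id", frozenset({"PII.SSN"}))  # hypothetical new column

print(is_allowed([ssn]))           # True
print(is_allowed([name]))          # True
print(is_allowed([ssn, name]))     # False
print(is_allowed([tax_id, name]))  # False
```

Keying the rule on tags instead of column names is the design choice that makes the policy "dynamic" in the sense Morgan describes: the `tax_id` column was never mentioned in the policy, yet it is blocked from being combined with names.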
Cross-component compatibility of data lineage tracking is another new feature in Atlas. Previously, Atlas users could set up lineage for one Hadoop component, like Hive, only to find the lineage didn’t extend to other projects. Now, the data lineage and tracking features of Atlas stay intact no matter how the data is accessed, be it through Hive, Kafka, Sqoop, or Storm.
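Why cross-component lineage matters can be shown with a small graph sketch. This is not Atlas's data model or REST API; the `LineageGraph` class, its `record` and `upstream` methods, and the dataset names are hypothetical. The assumption is that each component (Sqoop, Hive, Storm, and so on) reports the data movements it performs into one shared graph, so tracing a dataset upstream crosses component boundaries instead of stopping at the edge of a single tool.

```python
from collections import defaultdict

# Hypothetical lineage store; NOT Apache Atlas's actual model or API.


class LineageGraph:
    def __init__(self):
        # dataset -> set of (upstream dataset, component that moved the data)
        self._parents = defaultdict(set)

    def record(self, source, target, component):
        """Called by a (hypothetical) per-component hook after moving data."""
        self._parents[target].add((source, component))

    def upstream(self, dataset):
        """Return every (source, target, component) edge upstream of dataset."""
        seen, stack, edges = set(), [dataset], set()
        while stack:
            node = stack.pop()
            for src, comp in self._parents[node]:
                edges.add((src, node, comp))
                if src not in seen:
                    seen.add(src)
                    stack.append(src)
        return edges


g = LineageGraph()
g.record("mysql.orders", "hdfs:/raw/orders", "Sqoop")
g.record("hdfs:/raw/orders", "hive.orders", "Hive")
g.record("hive.orders", "hive.orders_enriched", "Hive")
g.record("kafka:clicks", "hive.orders_enriched", "Storm")

edges = g.upstream("hive.orders_enriched")
print(sorted({comp for _, _, comp in edges}))  # ['Hive', 'Sqoop', 'Storm']
```

If each component kept its own separate graph, the Hive view of `hive.orders_enriched` would end at `hdfs:/raw/orders`; recording all hops in one graph is what lets the trace reach the original MySQL table and the Kafka topic.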
HDP 2.5 also brings the formal launch of Apache Zeppelin. The data science notebook, which Hortonworks unveiled earlier this year, should provide a comfortable place for data scientists to engage with Apache Spark. “It’s about taking Spark out of the pilot stage and going enterprise wide,” Morgan says.
Ambari, the open source operations console, gets spruced up with a new feature called SmartSense that’s designed to give administrators a predictive and proactive view of their infrastructure, “so they can automatically identify problems within their infrastructure before they manifest themselves,” Morgan says.
The new release of Hive, which Hortonworks engineers dubbed Live Long and Process (LLAP; cue the Mr. Spock references), should bring an 80 to 90 performance boost compared to previous versions, Morgan says. “We’re now able to have a conversation about sub-second response time to queries. This is enormously valuable to somebody doing ad hoc analytics,” he says.
The HBase NoSQL database has gotten something that almost everybody will appreciate: support for SQL. “We’ve enabled full SQL access within the HBase layer with connectivity to any ODBC-compliant BI tool,” Morgan says. “That’s enormous value for people who want to standardize around HBase.”
The new release of Storm (version 1.0) introduces new streaming capabilities, including sliding and tumbling window support, which allows customers to take snapshots of a stream. “It also brings new ways to manage back pressure so users don’t lose streaming data, and enables you to do resource-aware scheduling and integration into the operations console,” Morgan says.
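The difference between the two window types can be sketched in a few lines of Python. This is a generic, count-based illustration, not Storm's windowing API; the function names are hypothetical, and Storm's own trigger rules (for example, time-based windows or when the first partial window fires) may differ. A tumbling window splits the stream into non-overlapping chunks, while a sliding window overlaps, advancing by a fixed step each time it emits.

```python
from collections import deque
from typing import Iterable, Iterator, List

# Hypothetical count-based windowing helpers; NOT Apache Storm's API.


def tumbling_windows(stream: Iterable[int], size: int) -> Iterator[List[int]]:
    """Non-overlapping windows: each event lands in exactly one window."""
    window: List[int] = []
    for event in stream:
        window.append(event)
        if len(window) == size:
            yield window
            window = []  # start a fresh window; a trailing partial one is dropped


def sliding_windows(stream: Iterable[int], size: int, slide: int) -> Iterator[List[int]]:
    """Overlapping windows of `size` events, emitted every `slide` events."""
    window = deque(maxlen=size)  # the oldest event falls out automatically
    for count, event in enumerate(stream, start=1):
        window.append(event)
        if count >= size and (count - size) % slide == 0:
            yield list(window)  # copy, so later appends don't alter the snapshot


print(list(tumbling_windows(range(1, 8), size=3)))
# [[1, 2, 3], [4, 5, 6]]
print(list(sliding_windows(range(1, 7), size=3, slide=1)))
# [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
```

Each emitted list is the "snapshot" of the stream that an aggregation, such as a sum or average per window, would then run over.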
The new partnership with AtScale will give data analysts new ways to access data stored in Hadoop. “The idea that you can take Hadoop and open it up to business analysts through a full BI tool from a single vendor enables organizations to maximize the value of that deployment,” Morgan says.
Hortonworks has had a close partnership with Microsoft for years. In fact, Microsoft’s version of Hadoop, called HDInsight, is based on HDP. Now the two vendors are taking their partnership into the cloud, thanks to Hortonworks’ decision to name HDInsight as its premier connected data platform cloud solution partner.
The cloud partnership with Microsoft is a key part of Hortonworks’ connected data plane strategy, Morgan says.
“Whether you’re talking about mobile apps, you’re talking about energy and production, or you’re talking about farming and agriculture, the cloud is now becoming a big piece of the data fabric,” he says. “It’s not beyond the data lake. It’s in addition to a data lake, and in fact it could be many data lakes within the cloud. So connected data platform is embracing this mega trend.”