Tag: mapreduce
Spark Graduates Apache Incubator
As we've touched on before, Hadoop was designed as a batch-oriented system, and its real-time capabilities are still emerging. Those eagerly awaiting this next evolution will be pleased to hear about the graduation of Apache Spark from the Apache Incubator. On Sunday, the Apache Spark Project committee unanimously voted to promote the fast data-processing tool out of the Apache Incubator. Read more…
What Can GPFS on Hadoop Do For You?
The Hadoop Distributed File System (HDFS) is considered a core component of Hadoop, but it’s not an essential one. Lately, IBM has been talking up the benefits of hooking Hadoop up to the General Parallel File System (GPFS). IBM has done the work of integrating GPFS with Hadoop. The big question is, What can GPFS on Hadoop do for you? Read more…
Rethinking Real-Time Hadoop
Hadoop is considered by many to be the best and brightest platform for running big data analytics. Its ability to scale and its vibrant open source community, it is thought, are cementing its place as the center of the analytic data hub of the future. The only problem: Hadoop was envisioned as a batch-oriented system, and its real-time capabilities are still emerging, which has created a gap that fast in-memory NewSQL databases are rushing to fill. Read more…
MapReduce Alternatives in Bioinformatics
Over the last decade, MapReduce has emerged as a popular software framework for processing big data sets on large clusters of commodity hardware. The technique came out of a paper published by Google in 2004, which explains: “Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model...” Read more…
How SQL-on-JSON Analytics Bolstered a Business
Running a multi-channel loyalty and engagement platform requires a lot of data. With 18,000 loyalty programs running across Web, mobile, social, and email channels, the folks at PunchTab are up to their eyeballs in data for customers like Arby's, Green Day, and ConAgra Foods. But when it came to allowing customers to analyze all that data, its NoSQL-based transactional system came to a grinding halt. Read more…
MapR Embraces Co-Existence with Hadoop Update
MapR Technologies today unveiled new products based on the Hadoop version 2 codebase that it says will allow customers to continue to run MapReduce version 1 applications while also reaping the rewards of a post-YARN Hadoop world. The company also announced the capability to run the HP Vertica columnar analytic database directly on its Hadoop stack. Read more…
Has Dirty Data Met Its Match?
One of the dirty little secrets about big data is the amount of manual effort it takes to clean the data before it can be analyzed. You may have the best and brightest data scientists on your team, but unless you liberate them from the drudgeries of digital janitorial work, you aren't getting their best work. Today, the data cleansing startup Trifacta launched its first product aimed at relieving data professionals of the burden posed by traditional data cleansing processes. Read more…
Shining a Light on Hadoop’s ‘Black Box’ Runtime
Let's face it: Writing MapReduce processes is not very fun. That's the main reason that the Cascading framework is gaining such a big following--because it abstracts away the difficult part of MapReduce with an easy-to-use Java API and library. With today's launch of a new product called Driven, the company behind Cascading is enabling users to instrument the data analytic apps developed with Cascading, in pursuit of faster troubleshooting and higher performance. Read more…
Set Your ‘Inner Graphista’ Free, Neo Says
Got a little graph database in you? Then you may be a "graphista" in the eyes of graph database developer Neo Technology, which yesterday unveiled a new release of its Neo4j graph database that it says will usher in the era of mainstream graph processing. Read more…
Google Bypasses HDFS with New Cloud Storage Option
Google Hadoop customers can now run MapReduce jobs directly against data stored in Google Cloud Storage, leaving HDFS out of the big data equation, thanks to a new cloud storage Hadoop connector the Web giant unveiled today. Read more…
Datanami Dishes on ‘Big Data’ Predictions for 2014
This space was going to feature a "Top 10 Big Data Predictions for 2014" story. But considering the large number of such stories currently in circulation, a different tack was in order. Instead, you'll find a selection of pertinent predictions from players in the "big data" software industry, followed by Datanami's opinion as to whether each will be spot on or whether the soothsaying will miss the mark. Read more…
Accelerating Hadoop MapReduce Using an In-Memory Data Grid
Hadoop MapReduce has been widely embraced for analyzing large, static data sets. New technology integrates a stand-alone MapReduce engine into an in-memory data grid, enabling real-time analytics on live, operational data. This shortens analysis time by 20x, from minutes to seconds. Numerous applications can now benefit from real-time MapReduce. Read more…
Reaping the Fruits of Hadoop Labor in 2014
There's been a lot of work poured into Hadoop over the last few years, culminating in the launch of Hadoop version 2 in October. As we head into 2014, commercial Hadoop vendors like Hortonworks and Cloudera will continue to invest in R&D, but you can also expect to see a stronger emphasis on converting that past investment into sales and profits. However, going forward, the business models for these top two Hadoop vendors are diverging. Read more…
Finding Big Data Treasure in the Cloud
Heading into 2014, one of the big data trends that will intensify is the transition toward end-to-end data analytic services hosted in the cloud. One of the promising big data cloud services is Treasure Data, a Silicon Valley company that offers an interesting mix of MapReduce, columnar databases, and intelligent agent technology that's aimed at helping clients get a quick return on their big data investments. Read more…
Cloudera Articulates a ‘Data Hub’ Future for Hadoop
The evolution of Hadoop from an overflow parking lot for data into a field of analytic dreams is unfolding right before our eyes. Among the vendors trying to help the elephant along is Cloudera, which used the Strata + Hadoop World conference this week to lay out its plans to remake Hadoop as a centralized "data hub" for enterprises. The firm also launched betas for its Hadoop 2 distributions, a partnership with the company behind Apache Spark, and a new cloud program for partners. Read more…
HDP 2.0: Rise of the Hadoop Data Lake
Hortonworks became the first Hadoop distributor to ship the new Hadoop version 2 software today when it announced the general availability of Hortonworks Data Platform (HDP) 2.0. The update will enable customers with small Hadoop clusters to upgrade their big data platform into a shared Hadoop service, or a data lake, a Hortonworks executive explains. Read more…
Hortonworks Reaches Out to SAS and Storm
Hortonworks this week revealed a new partnership with SAS that will enable the analytics giant to use its tools to analyze data stored in Hortonworks' Hadoop distribution. It also announced plans to integrate the Apache Storm stream processing engine into its distribution, and to ship a preview by the end of the year. Read more…
GridGain Puts Thrusters On MapReduce With Hadoop Accelerator
In-memory computing has been all the rage in 2013 as vendors far and wide line up to offer businesses the ability to get more speed for their processing dollar. GridGain, a company that has built itself around the in-memory concept, says it's bringing that approach to Hadoop with an accelerator that it claims gives thrusters to the framework. Read more…