mapreduce Archives - Page 4 of 6

Spark Graduates Apache Incubator

As we've touched on before, Hadoop was designed as a batch-oriented system, and its real-time capabilities are still emerging. Those eagerly awaiting this next evolution will be pleased to hear about the graduation of Apache Spark from the Apache Incubator. On Sunday, the Apache Spark Project committee unanimously voted to promote the fast data-processing tool out of the Apache Incubator. Read more…

What Can GPFS on Hadoop Do For You?

The Hadoop Distributed File System (HDFS) is considered a core component of Hadoop, but it’s not an essential one. Lately, IBM has been talking up the benefits of hooking Hadoop up to the General Parallel File System (GPFS). IBM has done the work of integrating GPFS with Hadoop. The big question is, What can GPFS on Hadoop do for you? Read more…

Rethinking Real-Time Hadoop

Hadoop is considered by many to be the best and brightest platform for running big data analytics. Its ability to scale and its vibrant open source community, it is thought, are cementing its place as the center of the analytic data hub of the future. The only problem: Hadoop was envisioned as a batch-oriented system, and its real-time capabilities are still emerging, which has created a gap that fast in-memory NewSQL databases are rushing to fill. Read more…

MapReduce Alternatives in Bioinformatics

Over the last decade, MapReduce has emerged as a popular software framework for processing big data sets on large clusters of commodity hardware. The technique came out of a paper published by Google in 2004, which explains: “Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model...” Read more…
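To make the quoted model concrete, here is a minimal single-process sketch of the map and reduce steps applied to word counting. It is only an illustration under stated assumptions: the names `run_mapreduce`, `map_word_count`, and `reduce_word_count` are hypothetical and are not part of Hadoop or of Google's implementation, and a real framework would distribute the grouping step across a cluster rather than hold it in one dictionary.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_word_count(key: str, value: str) -> Iterator[Tuple[str, int]]:
    """Map: take one (document name, text) pair and emit (word, 1) pairs."""
    for word in value.lower().split():
        yield word, 1


def reduce_word_count(key: str, values: List[int]) -> int:
    """Reduce: merge all intermediate values that share the same key."""
    return sum(values)


def run_mapreduce(
    records: Iterable[Tuple[str, str]],
    map_fn: Callable[[str, str], Iterable[Tuple[str, int]]],
    reduce_fn: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """Hypothetical in-memory driver: map, group by key, then reduce."""
    intermediate: Dict[str, List[int]] = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            # Group intermediate values by their intermediate key.
            intermediate[out_key].append(out_value)
    return {k: reduce_fn(k, vs) for k, vs in intermediate.items()}


# Illustrative input, not taken from any of the stories listed here.
documents = [
    ("doc1", "big data on big clusters"),
    ("doc2", "data on commodity hardware"),
]
counts = run_mapreduce(documents, map_word_count, reduce_word_count)
print(counts)
```

The user supplies only the two small functions. The driver handles grouping by key, and that grouping is the part a framework such as Hadoop spreads across many machines.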

How SQL-on-JSON Analytics Bolstered a Business

Running a multi-channel loyalty and engagement platform requires a lot of data. With 18,000 loyalty programs running across Web, mobile, social, and email channels, the folks at PunchTab are up to their eyeballs in data for customers like Arby's, Green Day, and ConAgra Foods. But when it came to allowing customers to analyze all that data, the company's NoSQL-based transactional system came to a grinding halt. Read more…

MapR Embraces Co-Existence with Hadoop Update

MapR Technologies today unveiled new products based on the Hadoop version 2 codebase that it says will allow customers to continue to run MapReduce version 1 applications while also reaping the rewards of a post-YARN Hadoop world. The company also announced the capability to run the HP Vertica columnar analytic database directly on its Hadoop stack. Read more…

Red Hat Deal Gives Hortonworks Enterprise Clout

The strategic partnership that Hortonworks and Red Hat unveiled today is interesting on several fronts. For starters, there are the joint development projects, most notably the connector that allows Hortonworks' Hadoop distribution to talk natively to Red Hat Storage. Beyond that, the partnership with the $10-billion commercial open source software company provides a glimpse of who Hortonworks might want to be when it grows up. Read more…

Has Dirty Data Met Its Match?

One of the dirty little secrets about big data is the amount of manual effort it takes to clean the data before it can be analyzed. You may have the best and brightest data scientists on your team, but unless you liberate them from the drudgeries of digital janitorial work, you aren't getting their best work. Today, the data cleansing startup Trifacta launched its first product aimed at relieving data professionals of the burden posed by traditional data cleansing processes. Read more…

Shining a Light on Hadoop’s ‘Black Box’ Runtime

Let's face it: Writing MapReduce processes is not very fun. That's the main reason the Cascading framework is gaining such a big following: it abstracts away the difficult part of MapReduce with an easy-to-use Java API and library. With today's launch of a new product called Driven, the company behind Cascading is enabling users to instrument the data analytic apps developed with Cascading, in pursuit of faster troubleshooting and higher performance. Read more…

Set Your ‘Inner Graphista’ Free, Neo Says

Got a little graph database in you? Then you may be a "graphista" in the eyes of graph database developer Neo Technology, which yesterday unveiled a new release of its Neo4j graph database that it says will usher in the era of mainstream graph processing. Read more…

Datanami Dishes on ‘Big Data’ Predictions for 2014

This space was going to feature a "Top 10 Big Data Predictions for 2014" story. But considering the large number of such stories currently in circulation, a different tack was in order. Instead, you'll find a selection of pertinent predictions from players in the "big data" software industry, each followed by Datanami's opinion as to whether it will be spot on or whether the soothsaying will miss the mark. Read more…

Accelerating Hadoop MapReduce Using an In-Memory Data Grid

Reaping the Fruits of Hadoop Labor in 2014

There's been a lot of work poured into Hadoop over the last few years, culminating in the launch of Hadoop version 2 in October. As we head into 2014, commercial Hadoop vendors like Hortonworks and Cloudera will continue to invest in R&D, but you can also expect to see a stronger emphasis on converting that past investment into sales and profits. Going forward, however, the business models for these top two Hadoop vendors are diverging. Read more…

Finding Big Data Treasure in the Cloud

Heading into 2014, one of the big data trends that will intensify is the transition toward end-to-end data analytic services hosted in the cloud. One of the promising big data cloud services is Treasure Data, a Silicon Valley company that offers an interesting mix of MapReduce, columnar databases, and intelligent agent technology that's aimed at helping clients get a quick return on their big data investments. Read more…

Cloudera Articulates a ‘Data Hub’ Future for Hadoop

The evolution of Hadoop from an overflow parking lot for data into a field of analytic dreams is unfolding right before our eyes. Among the vendors trying to help the elephant along is Cloudera, which used the Strata + Hadoop World conference this week to lay out its plans to remake Hadoop as a centralized "data hub" for enterprises. The firm also launched betas for its Hadoop 2 distributions, a partnership with the company behind Apache Spark, and a new cloud program for partners. Read more…

HDP 2.0: Rise of the Hadoop Data Lake

Hortonworks became the first Hadoop distributor to ship the new Hadoop version 2 software today when it announced the general availability of Hortonworks Data Platform (HDP) 2.0. The update will enable customers with small Hadoop clusters to upgrade their big data platform into a shared Hadoop service, or a data lake, a Hortonworks executive explains. Read more…

Hortonworks Reaches Out to SAS and Storm

Hortonworks this week revealed a new partnership with SAS that will enable the analytics giant to use its tools to analyze data stored in Hortonworks' Hadoop distribution. It also announced plans to integrate the Apache Storm stream processing engine into its distribution, and to ship a preview by the end of the year. Read more…

GridGain Puts Thrusters On MapReduce With Hadoop Accelerator

In-memory computing has been all the rage in 2013 as vendors far and wide line up to offer businesses the ability to get more speed for their processing dollar. GridGain, a company that has built itself around the in-memory concept, says it is bringing that concept to Hadoop with an accelerator that, it claims, puts thrusters on the framework. Read more…

