Tag: mapreduce

Yahoo: We Run the Whole Company on Hadoop

Hadoop is absolutely critical to the operations of Yahoo, executives with the company said this week at the Hadoop Summit. While the company, which spun out Hortonworks in 2011, is moving away from “traditional” Hado Read more…

Hortonworks Spins Up a YARN Readiness Program

Hortonworks today launched its YARN Ready Program to help Hadoop application vendors adopt the technology that's at the heart of the modern Hadoop v2 infrastructure. YARN is the key piece of technology that enables mu Read more…

MUJI Finds Retail Mojo with Hybrid Analytics

Japanese retailer MUJI is enjoying the fruits of a hosted big data solution that delivers a more accurate view of its customers and their interests. The solution, which is split between Treasure Data and Amazon Redshift, Read more…

Making Hadoop Relevant to HPC

Despite its proven ability to affordably process large amounts of data, Apache Hadoop and its MapReduce framework are being taken seriously only at a subset of U.S. supercomputing facilities and only by a subset of profe Read more…

Cascading Now Supports Tez–Spark and Storm Up Next

Concurrent, the company behind the open source Cascading framework, today unveiled a major update that will allow its customers to migrate their Hadoop applications from using MapReduce to use the new Apache Tez engine, Read more…

IBM Finds the Need for (SQL on Hadoop) Speed

IBM will be joining Cloudera, Hortonworks, and others in the great SQL-on-Hadoop performance race when it ships Big SQL version 3 next month. In addition to peddling unadulterated speed, IBM will be touting security, dat Read more…

Big Data Confusion Drives MongoDB and Cloudera Together

At first glance, the partnership that Cloudera and MongoDB unveiled today is a bit of a head scratcher. While the two companies are arguably the biggest software vendors in the nascent space, they swim in opposite ends of the big data pool. As it turns out, that's exactly why the companies felt they needed to work together. Read more…

Crossing the Big Data Stream with DataTorrent

Enterprises eager for a competitive edge are turning to in-memory stream processing technologies to help them analyze big data in real time. The Apache Spark and Storm projects have gained lots of momentum in this area, as have some analytic NoSQL databases and in-memory data grids. Another streaming technology worth keeping an eye on is DataTorrent. Read more…

Hortonworks Keen on Cascading-Tez Combo

In the future, it will be easier to build big data applications, and they'll run faster and utilize more real-time data than today's apps, too. Two vendors working to make that future a reality, Hortonworks and Concurrent, today announced they'll work together to build and assemble the next generation of Hadoop apps running on YARN, Tez, and Apache Spark. Read more…

How Fast Data is Driving Analytics on the IoT Superhighway

The promise of big data is morphing into the fast data opportunity. Unless you have the capability to respond to the Internet of Things and the trillions of data points generated by smartphones, sensors, and social media, the business opportunities of fast data can pass you by. Read more…

Facebook Gets Smarter with Graph Engine Optimization

Last fall, the folks on Facebook's engineering team talked about how they employed the Apache Giraph engine to build a graph on the company's Hadoop platform that can host more than a trillion edges. While the Graph Search engine is capable of massive graphing tasks, there were some workloads that remained outside the company's technical capabilities, until now. Read more…

Hortonworks Drives Stinger Home with HDP 2.1

Hortonworks today unveiled a major new release of its Hadoop distribution that puts significant new capabilities into the hands of its customers. The speed and scale of SQL processing in Apache Hive were improved with the final phase of the Stinger initiative, while the additions of Apache Storm and Apache Solr in HDP 2.1 open up new ways for customers to manipulate their data. Security and data governance were bolstered with Apache Knox and Apache Falcon, respectively, while Apache Spark is now available as a tech preview. Read more…

Databricks Moves to Standardize Apache Spark

Databricks, the company behind open source Apache Spark, today rolled out a certification program that creates a Spark standard that big data analytic application developers can write to, and that customers can rely on. It's a smart move by Databricks, which is looking to avoid the forking that has clouded Hadoop's march into the enterprise. Read more…

Picking the Right Tool for Your Big Data Job

There is a lot of debate in the big data space about tools and technology, and which ones are best. Is SQL better than NoSQL? Hadoop or Spark? What about R or Python? Of course, no single tool or technology is the best for all situations, and you would do well to pick the right tool or technology for the job at hand. Read more…

Hadoop and NoSQL Now Data Warehouse-Worthy: Gartner

Not long ago, the rules for what constituted a data warehouse were fairly well defined. The schema was fixed, you could say, and was based primarily on relational database technology designed to process structured data. My, how times have changed. Last week, Gartner for the first time accepted non-relational technologies, including those based on Hadoop and NoSQL, in its annual Magic Quadrant for Data Warehouses report. Read more…

Lessons In Machine Learning From GE Capital

The financial services industry is always on the cutting edge, and so it is with machine learning at GE Capital, the lending and leasing arm of the industrial giant. Read more…

Avoiding Big IT Outlays with Cloud-Based Analytics

If your organization has adopted a big data strategy, chances are good that it's also making big capital outlays to support that strategy. One firm that's hoping to short-circuit the hardware and software investment cycle is GoodData. Today the San Francisco company announced the addition of Hadoop to its cloud-based big data analytics stack, giving it the capability to ingest and store a whole new level of data. Read more…

Top 10 Netflix Tips on Going Cloud-Native with Hadoop

Four years ago, Netflix made the decision to move all of its data processing (everything from NoSQL and Hadoop to HR and billing) into the cloud. While going "cloud native" on Amazon Web Services hasn't been without its challenges, the move has benefited Netflix in multiple and substantial ways. Here are 10 tips from Netflix on making the cloud work. Read more…

A Peek Inside Cisco’s Hadoop Security Machine

The Internet is the ultimate invention of man, a creation that will forever change how humans work, live, and play. But for all the good it's capable of, the Internet has also created a comfortable home for cybercriminals, who use increasingly sophisticated techniques to siphon hundreds of billions of dollars from the global economy. One company that's upping the ante in the battle against cybercriminals is Cisco, which is using a 60-node Hadoop cluster to separate criminal signals from the Internet's noise. Read more…

Hadoop No ‘Sideshow,’ Intel Says

When Intel launched its Hadoop distribution a year ago, there were some in the industry who viewed the move skeptically. Intel's specialty, after all, is developing processors, motherboards, and systems. What does it know about writing and supporting open source software? Apparently, quite a bit. What's more, the company's commitment to open source software may be stronger than you think. Read more…

BigDATAwire