Hadoop Gets a Security Boost from eBay’s Eagle
Security gaps have long been a concern for Hadoop users, prompting a flurry of data encryption approaches and cloud-based tools aimed at locking down the data platform.
Now, hyperscale users like eBay are joining the effort by developing new tools to secure Hadoop in real time. The online shopping site, a big-time user of big data techniques, announced recently that Eagle, an open source data activity-monitoring tool running on Hadoop, has been accepted as an Apache Incubator project.
The e-commerce web site currently uses Hadoop for analytics chores such as clickstream analysis and targeted advertising (the kind that tends to show up minutes after you have searched for a specific product). The online retailer’s Hadoop security approach rests on “four pillars”: access control, perimeter security, data classification and data activity monitoring.
“We believe Eagle is a core component of Hadoop data security,” eBay engineers noted in announcing the Apache Eagle security effort in a recent blog post.
The eBay developers emphasize Apache Eagle’s real-time capabilities since speed is critical in dealing with security breaches. Hence, they designed Apache Eagle “to make sure that the alerts are generated in a sub-second and that the anomalous activity is stopped if it’s a real threat.”
According to the new GitHub site established for Apache Eagle developers, the open source monitoring tool is intended to “instantly identify access to sensitive data, recognize attacks, malicious activities in Hadoop and take actions in real time.”
The eBay developers also stressed scaling, noting that Eagle can be deployed on multiple Hadoop clusters with petabytes of data and up to 800 million user access events per day.
The security tool would also include a machine-learning module. That, the developers said, “provides capabilities to define user activity patterns or user profiles for Hadoop users based on the user behavior in the platform.
“The idea is to provide anomaly detection capability without setting hard thresholds in the system,” they explained.
The user profiles generated by the Eagle system are modeled with machine-learning algorithms intended to detect anomalous activity, such as when a user’s current activity pattern differs from that user’s historical pattern. Eagle currently uses two algorithms for anomaly detection, the developers said: Eigenvalue Decomposition and Density Estimation.
Eagle also uses the Storm framework for near-real-time anomaly detection to determine if current user activities are suspicious or normal with respect to their model.
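To make the density estimation idea concrete, here is a minimal Python sketch of profiling a user and flagging activity that does not fit the profile. It is not Eagle's code. The feature set (hourly counts of reads, deletes and renames), the independent Gaussian per feature, the function names and the sample numbers are all assumptions chosen for illustration. The cutoff is derived from the user's own history rather than set by hand, in the spirit of avoiding hard thresholds.

```python
import math
from statistics import mean, pstdev


def fit_profile(history):
    """Fit one Gaussian (mean, std dev) per feature from a user's past activity.

    history is a list of equal-length numeric vectors. The features used
    here are hypothetical: hourly counts of (reads, deletes, renames).
    """
    columns = list(zip(*history))
    # Floor the standard deviation so a constant feature cannot divide by zero.
    return [(mean(col), max(pstdev(col), 1e-6)) for col in columns]


def log_density(profile, sample):
    """Log of the joint Gaussian density, treating features as independent."""
    total = 0.0
    for (mu, sigma), x in zip(profile, sample):
        total += -0.5 * math.log(2 * math.pi * sigma ** 2)
        total -= (x - mu) ** 2 / (2 * sigma ** 2)
    return total


def fit_cutoff(profile, history):
    """Use the lowest density seen in the user's own history as the cutoff."""
    return min(log_density(profile, row) for row in history)


def is_anomalous(profile, cutoff, sample):
    """Flag a sample that is less likely than anything in the user's history."""
    return log_density(profile, sample) < cutoff


# Made-up history for one user: [reads, deletes, renames] per hour.
history = [
    [120, 2, 1],
    [110, 3, 0],
    [130, 1, 2],
    [125, 2, 1],
    [115, 2, 1],
]
profile = fit_profile(history)
cutoff = fit_cutoff(profile, history)

typical_hour = [118, 2, 1]
mass_delete_hour = [5000, 400, 1]
print(is_anomalous(profile, cutoff, typical_hour))
print(is_anomalous(profile, cutoff, mass_delete_hour))
```

In a streaming deployment, the profile and cutoff would be computed offline from historical logs, and only the cheap `is_anomalous` check would run against each incoming window of activity.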
The eBay developers also said Eagle includes an SQL-like service API to support crunching massive data sets. Eagle also supports HBase for data storage along with a relational database.
The framework is currently being used by eBay to monitor data access activities on a 2,500-node Hadoop cluster. The e-commerce site plans to extend Eagle to other Hadoop clusters encompassing 10,000 nodes by the end of 2015.
Finally, Eagle monitors Hadoop applications, core services and eBay’s entire Hadoop cluster, along with the overall operation of nodes.