Hadoop clusters are popping up everywhere. Almost every large enterprise customer I speak with has already deployed, or is in the process of deploying, Hadoop clusters for generating data-driven business intelligence. Unfortunately, Hadoop was not designed with security in mind, and that can pose a serious problem in this age of intensifying cyber threats. The simple fact is, data is ingested into Hadoop clusters from many sources, and it typically includes sensitive data such as Personally Identifiable Information (PII), Protected Health Information (PHI), and intellectual property (IP) that must be protected.
Yes, that large, data-rich Hadoop repository represents a big, juicy target for cyber criminals because it is a mixing bowl of business-critical data from multiple sources. And perimeter security is of little to no use because attackers have most likely penetrated the perimeter already. What enterprises need to do is take a holistic approach to securing their valuable data, and that entails bringing controls closer to the data itself. For those looking to mitigate data breach risks, a best-practices approach to protecting Big Data implementations should include the following three steps:
- Create a data firewall: Establish policies that grant access to authorized users only. A good policy manager/security system will also detect and block unauthorized access attempts by privileged users (see the ACL sketch after this list).
- Protect the data: Use encryption and centralized key management to protect data in transit between the nodes of the Hadoop cluster as well as data at rest in storage (see the configuration sketch below).
- Gather security intelligence: Audit every access to your data and analyze the log. Doing so yields important security intelligence, such as flagging anomalies like suspiciously frequent access to sensitive data, and can help expose Advanced Persistent Threats (APTs) (see the audit-log sketch below).
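
As a minimal sketch of the first step: HDFS ships with POSIX-style ACLs that can serve as a basic data firewall. The directory path and group name below are hypothetical, and ACL support must first be enabled on the NameNode (dfs.namenode.acls.enabled=true):

```
# Grant a hypothetical analytics group read-only access to a sensitive directory
hdfs dfs -setfacl -m group:analytics:r-x /data/pii

# Lock out everyone not named in the ACL
hdfs dfs -setfacl -m other::--- /data/pii

# Verify the resulting policy
hdfs dfs -getfacl /data/pii
```

Plain ACLs will not stop a rogue HDFS superuser, though; that is where a dedicated policy manager such as Apache Ranger or Apache Sentry comes in, layering centrally administered and audited access policies over the cluster.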
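For the second step, Hadoop provides the building blocks out of the box: wire encryption is a matter of configuration, and HDFS transparent encryption zones handle data at rest with keys held in the Hadoop KMS. A sketch, reusing the hypothetical /data/pii directory and an assumed key name:

```
<!-- core-site.xml: encrypt Hadoop RPC traffic between nodes -->
<property>
  <name>hadoop.rpc.protection</name>
  <value>privacy</value>
</property>

<!-- hdfs-site.xml: encrypt block data moving between clients and DataNodes -->
<property>
  <name>dfs.encrypt.data.transfer</name>
  <value>true</value>
</property>
```

```
# Create a key in the Hadoop KMS, then declare an encryption zone over the
# sensitive directory (which must exist and be empty at creation time)
hadoop key create pii_key
hdfs crypto -createZone -keyName pii_key -path /data/pii
```

Backing the KMS with an enterprise key store is what makes the key management genuinely centralized, keeping keys off the nodes that hold the encrypted blocks.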
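And for the third step, the NameNode writes an audit log (typically hdfs-audit.log) recording each access with the user, source IP, command, and path. Even a quick sketch like this one, again assuming the hypothetical /data/pii path, can surface who is reading sensitive data unusually often:

```
# Count opens of files under the sensitive directory by user; an unexpectedly
# high count for a single account is the kind of anomaly worth investigating
awk '/cmd=open/ && /src=\/data\/pii/ {
       for (i = 1; i <= NF; i++)
         if ($i ~ /^ugi=/) print substr($i, 5)
     }' hdfs-audit.log | sort | uniq -c | sort -rn | head
```

In production you would feed these logs into a SIEM or log-analytics pipeline rather than grep them by hand, but the raw material for detecting APT-style behavior is already being written on every cluster.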
So, what is your organization doing to protect its Hadoop clusters?