Thales Blog

Variety, Volume And Velocity: Advice For A CSO In The Age Of Data Proliferation

March 17, 2015

Sol Cates | Principal Technologist, Data Protection

Data, as we all know, is growing at a rapid pace. When we speak about data proliferation, the enormous amount of data stored by businesses and government agencies, it is imperative to note that data is not homogeneous. It comes in from many sources, such as cloud, big data and legacy environments, and in varied forms. Therein you have the three “V’s” of data proliferation: Velocity, Variety and Volume*. These three variables make managing and securing data a real challenge.

Data is coming in at a faster clip than ever before. The velocity of data production has IT departments grappling to manage it all. Where will all this data live? And not just today’s data, but what about tomorrow’s data? And let’s not forget yesterday’s data. It’s not uncommon for organizations to have versions of data that are 20 years old.

As I noted earlier, data is not consistent or identical; it’s varied. It could be in a big data farm or a data lake. Data is stored in ERP systems and CRM systems. There is also unstructured data, such as video files, secret recipes and contract proposals. All of these are viable sources for organizations to store, analyze and derive value from. But how do you secure all of it, especially when the data is held in a cloud service? When using SaaS products, the data format and the way users interact with it are unique to each provider. Who is responsible for protecting my customer data held in a SaaS-based CRM system? If a security professional is looking at the data, do they understand all the ways that they need to protect it?

Everyone is producing data. Employees are producing data. Your customers are producing data. Machines are producing data. And it’s happening every single minute. EMC recently reported that data volumes can be expected to double every two years, with the greatest growth coming from the vast amounts of new data being produced by intelligent devices and sensors. The problem gets bigger every year, and we are not great at storing this data in a single cohesive way, which means the security professional is left to figure out how to protect all of it. For example, a large bank invested in creating a data lake, then realized that it could not put any regulated data in it because the data lake didn’t meet the standards. If you put in credit card numbers or healthcare information, you must still comply with PCI DSS or HIPAA regulations. No one knew how to put controls around a data lake. And in data lakes, the data is almost always unstructured.
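To put the "double every two years" figure in concrete terms, here is a minimal sketch of the arithmetic. The 100 TB starting point and the function name are assumptions made up for illustration; only the two-year doubling period comes from the EMC figure cited above.

```python
def projected_volume(current_volume: float, years: float,
                     doubling_period_years: float = 2.0) -> float:
    """Project a data volume, assuming it doubles every `doubling_period_years`."""
    return current_volume * 2 ** (years / doubling_period_years)


# Hypothetical starting point: 100 TB of data today.
for years in (2, 4, 10):
    print(f"after {years} years: {projected_volume(100, years):.0f} TB")
```

At that rate, ten years means five doublings, or a 32-fold increase, so whatever controls you choose have to scale along with the data.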

So, if you’re a CSO with a data proliferation problem, what do you do?

Start by asking yourself three questions:

  • What type of data do I have and where is it stored?
  • What data do I care about?
  • How should I protect the data I care about?
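One way to make the first two questions concrete is a simple data inventory. The sketch below is only an illustration: the `DataAsset` fields, the example assets and the rule "regulated or business critical" are all assumptions of mine, and your own definition of "data I care about" may well differ.

```python
from dataclasses import dataclass, field


@dataclass
class DataAsset:
    name: str
    location: str                                   # question 1: where is it stored?
    kind: str                                       # question 1: "structured" or "unstructured"
    regulations: set = field(default_factory=set)   # e.g. {"PCI DSS"} or {"HIPAA"}
    business_critical: bool = False


def assets_to_protect(inventory):
    """Question 2: keep the assets that are regulated or business critical."""
    return [a for a in inventory if a.regulations or a.business_critical]


# Hypothetical inventory, loosely based on the kinds of data mentioned in this post.
inventory = [
    DataAsset("customer records", "SaaS CRM", "structured", {"PCI DSS"}),
    DataAsset("patient claims", "data lake", "unstructured", {"HIPAA"}),
    DataAsset("secret recipes", "file server", "unstructured", business_critical=True),
    DataAsset("marketing videos", "cloud storage", "unstructured"),
]

for asset in assets_to_protect(inventory):
    print(asset.name, "->", asset.location)
```

The short list that comes out of this is the input to the third question: how each of those assets should be protected.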

Once you identify where your data is and what data you care about, you get to the harder question of how to protect it. The good news is that there are options! But be wary of falling into the trap of niche solutions. It’s best to find a platform that can address most of your requirements and security concerns. Use the K.I.S.S. method: Keep It Simple, Stupid. By simplifying your solution, you’ll simplify your problem. Using 20 different products means working with 20 different vendors, and with teams of people who only know how to work with certain products. Sounds like more headaches.

I recommend encryption, tokenization and data masking as part of protecting the data you care about. What do you use?
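For readers who have not seen the difference between data masking and tokenization, here is a toy sketch in Python. It is an assumption-laden illustration, not a description of any product: `mask_card_number` and `TokenVault` are hypothetical names, and the in-memory vault stands in for what would be a hardened, access-controlled service in practice. Encryption is left out because the Python standard library does not ship a general-purpose cipher.

```python
import secrets


def mask_card_number(card_number: str, visible: int = 4) -> str:
    """Data masking: hide all but the last `visible` digits."""
    digits = [c for c in card_number if c.isdigit()]
    hidden = max(len(digits) - visible, 0)
    return "*" * hidden + "".join(digits[hidden:])


class TokenVault:
    """Tokenization: swap a sensitive value for a random token.

    The real value lives only inside the vault (here, a plain dict for illustration).
    """

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the same value always maps to the same token.
        if value not in self._value_to_token:
            token = secrets.token_hex(8)
            self._value_to_token[value] = token
            self._token_to_value[token] = value
        return self._value_to_token[value]

    def detokenize(self, token: str) -> str:
        return self._token_to_value[token]


card = "4111 1111 1111 1111"  # a widely used test card number, not a real account
vault = TokenVault()
token = vault.tokenize(card)

print(mask_card_number(card))           # masked value, safe to display
print(token)                            # random token, safe to store downstream
print(vault.detokenize(token) == card)  # only the vault can map it back
```

Masking is one-way and suited to display, while a token can be exchanged for the original value by whoever controls the vault, which is why the two are usually applied to different parts of the same system.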

* Note that the 3V's concept originates with Doug Laney at Gartner ... Find a post on the 3V's here: