This week I spent some time at the Strata + Hadoop World Big Data conference in New York. The conference is co-sponsored by O’Reilly and Cloudera and had about 3,000 attendees.
The Big Data Hadoop ecosystem feels like the Wild West: sprawling, with many new products and product areas (most quite immature) but few clear leaders in any of them. There were other symptoms that this industry segment is still in the process of “booting up” - a severe shortage of data scientists, analysts and Hadoop experts was one clear indicator, with many companies openly recruiting at the event.
Even so, it was clear at the conference that although Big Data Hadoop is forecast to become a multi-billion dollar enterprise market soon, organizations will probably not perform a “rip and replace” of their existing data warehouse applications and infrastructure; the two solutions are likely to co-exist and complement each other for some time. One example is Hadoop acting as the filter layer for massive data sets, which can then be used more productively in a traditional data warehouse. Another view (one proposed by Teradata) sees Hadoop, Enterprise Data Warehousing, and Discovery platforms as the three core components of a next-generation data architecture. Some do put forward the case that Hadoop is the platform of the future, and that data currently residing in data warehouse systems will inevitably be migrated to it, but from what I saw at the conference, this seems unlikely.
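To make the filter-layer pattern concrete, here is a minimal illustrative sketch (not from any vendor's product): a toy Hadoop Streaming-style filter that strips a raw, noisy event feed down to only the well-formed records and fields a downstream warehouse would actually load. The field names and schema here are purely hypothetical.

```python
# Hypothetical "Hadoop as filter layer" sketch: refine raw events before
# loading them into a traditional data warehouse. All field names are
# illustrative assumptions, not drawn from the conference announcements.
import json

WAREHOUSE_FIELDS = ("user_id", "event", "timestamp")  # hypothetical warehouse schema

def filter_for_warehouse(raw_lines):
    """Keep only well-formed purchase events, projected to warehouse fields."""
    refined = []
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # drop malformed records at the filter layer
        if record.get("event") != "purchase":
            continue  # in this sketch, the warehouse only loads purchases
        refined.append({k: record[k] for k in WAREHOUSE_FIELDS if k in record})
    return refined

raw = [
    '{"user_id": 1, "event": "purchase", "timestamp": "2013-10-29T12:00:00", "debug": "x"}',
    '{"user_id": 2, "event": "page_view", "timestamp": "2013-10-29T12:01:00"}',
    'not json at all',
]
print(filter_for_warehouse(raw))
```

In a real deployment this kind of logic would run across the cluster (e.g. as a MapReduce or streaming job) over data far too large for the warehouse to ingest raw; only the refined subset moves downstream.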
Clearly the headline news of the show was Cloudera’s announcement of its fifth-generation platform for Big Data, Cloudera Enterprise (Cloudera Hadoop V5, or CDH 5), and its positioning of that distribution as the Enterprise Data Hub. The new features include in-memory operation capabilities, snapshotting, NFSv3-based file access, resource management extensions, discovery and more. At the same time, security features continue to creep into the Big Data framework. MapR, for instance, announced a security beta at the show, featuring HTTPS/certificate-based and Kerberos authentication integrated with Active Directory via LDAP.
Data security also came up as a significant issue in a number of my conversations at the conference, especially as enterprises move beyond the exploration phase and begin to seriously deploy Hadoop platforms. There were also several security-focused sessions across multiple conference tracks. Vormetric already has a significant offering here – protecting data at the file system level on Hadoop cluster nodes.
My key takeaways from this conference? As successful as Hadoop is today, we could be just at the start of its relevance and success in enterprise markets. In addition, there is clearly customer demand for enhanced security around the Hadoop platform, and this should continue to increase as more organizations move beyond the exploratory stage and into wider adoption.