Big data refers to a huge volume of data that arrives from many types of sources and streams in at a rate that makes it difficult for traditional data processing methods to store and process. Organizations need to process this data from various sources quickly while maintaining data quality. Combined with analytics, big data can help organizations answer key questions in areas such as predicting customer behavior, reducing cost and wastage, detecting fraud, and determining risk, in a matter of seconds and in near real time wherever feasible.
This helps organizations use reliable underlying information to drive strategy and make intelligent decisions. Traditionally, organizations have used the concept of a “Data Warehouse” to store and transform data before consuming it with Business Intelligence and analytics tools. But with the rise of new data sources such as sensors and the Internet of Things, large amounts of unstructured data, and growing data volumes, the data warehouse struggles to support the new data streams.
A Data Lake complements the Data Warehouse and addresses this drawback by providing a staging area that stores all of the raw data before it is processed. The emphasis is on keeping the raw data available without requiring it to conform to any pre-defined requirement. That way, downstream applications can consume it at any time in the future.
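The idea of storing raw data first and deciding on its structure later can be illustrated with a minimal sketch. It assumes a local temporary directory standing in for the lake, and the helper names `land_raw` and `read_with_schema`, the `sensors` source, and the sample payloads are all hypothetical, chosen only for illustration. A real data lake would typically sit on distributed or object storage rather than a local folder.

```python
import json
import tempfile
from pathlib import Path


def land_raw(lake_dir: Path, source: str, payloads: list[str]) -> Path:
    """Append raw payloads to the lake exactly as received, one per line."""
    target = lake_dir / f"{source}.raw"
    with target.open("a", encoding="utf-8") as fh:
        for payload in payloads:
            fh.write(payload.rstrip("\n") + "\n")
    return target


def read_with_schema(path: Path, fields: list[str]) -> list[dict]:
    """Apply a consumer-chosen set of fields at read time."""
    rows = []
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # the raw line stays in the lake; this consumer ignores it
            if isinstance(record, dict):
                # Missing fields become None instead of rejecting the record.
                rows.append({name: record.get(name) for name in fields})
    return rows


# Hypothetical lake location and sample payloads of differing shapes.
lake = Path(tempfile.mkdtemp(prefix="lake_"))
sensor_file = land_raw(lake, "sensors", [
    '{"device": "t-1", "temp_c": 21.5}',
    '{"device": "t-2", "temp_c": 19.0, "battery": 0.8}',
    'free-text note from a technician',
])

# A downstream application decides later which fields it cares about.
readings = read_with_schema(sensor_file, ["device", "temp_c"])
print(readings)
```

Nothing is validated or reshaped when the data lands, so the unstructured note and the extra `battery` field are both preserved. A different consumer could later read the same file with a different set of fields.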