Every day, around 2.5 quintillion (a 1 followed by 18 zeros) bytes of data are created, and most of that data has been created in the past 2 years! Most of the data available today is unstructured, yet it can provide useful real-time information.
Thus Big Data can be defined as any data exhibiting the 3 Vs: Volume, Variety & Velocity.
Volume – Any data of considerable size, such as data growing into the multi-terabyte or multi-petabyte range, needs processing different from conventional algorithms to maintain speed and accuracy. Thus Volume is the first dimension for any data to qualify as Big Data.
Velocity – Velocity is the second dimension of Big Data. Conventional algorithms can process large volumes of data with a trade-off in time, which can range from a few hours to overnight processing. However, in cases such as national security or real-time proactive action, overnight is not a good option. So when there is a need to process a huge amount of data in a short span of time, the data can be called Big Data.
Variety – The third dimension of Big Data is that the data is a mix of various types & formats. The data can be logs, video, audio, pictures, financial transactions, text messages, etc. Traditional databases mandate that data be represented in rows and columns, but in real life information is generated in all forms of artifacts.
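As a toy illustration of Variety, the sketch below (with hypothetical formats and field names, not taken from any real system) shows three very different records – a server log line, a JSON financial transaction, and a free-form text message – and why only a loose structure, rather than fixed rows and columns, can hold all of them:

```python
import json

# Hypothetical sample inputs in three different formats.
log_line = '2016-05-01 10:32:07 GET /index.html 200'
transaction_json = '{"id": 42, "amount": 19.99, "currency": "USD"}'
text_message = 'Hey, are we still on for lunch?'

def normalize(raw, kind):
    """Map each format into a loose common structure: a type tag plus payload."""
    if kind == 'log':
        date, time, method, path, status = raw.split()
        return {'type': 'log',
                'payload': {'method': method, 'path': path, 'status': int(status)}}
    if kind == 'transaction':
        return {'type': 'transaction', 'payload': json.loads(raw)}
    # Unstructured text: keep it as-is; there is no schema to impose.
    return {'type': 'text', 'payload': raw}

records = [normalize(log_line, 'log'),
           normalize(transaction_json, 'transaction'),
           normalize(text_message, 'text')]
for record in records:
    print(record['type'], '->', record['payload'])
```

The point is not this particular code but the shape of the problem: a relational table would force all three records into one column layout, while a Big Data pipeline typically keeps heterogeneous payloads side by side and interprets them per type.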
When the Volume, Velocity & Variety of data are combined, the data needs to be processed differently from conventional methods. This is where Big Data processing comes into the picture.
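One widely used pattern for this different style of processing is MapReduce, which underlies frameworks such as Apache Hadoop. The following is a minimal in-memory sketch of the idea – map (emit key/value pairs), shuffle (group by key), reduce (aggregate each group) – not a distributed implementation:

```python
from collections import defaultdict
from itertools import chain

# Toy input standing in for a large document collection.
documents = ['big data needs big tools', 'data beats opinion']

def map_phase(doc):
    # Emit (word, 1) for every word in the document.
    return [(word, 1) for word in doc.split()]

def shuffle_phase(pairs):
    # Group all values by key, as the framework's shuffle step would.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Aggregate each group independently (here: word counts).
    return {key: sum(values) for key, values in grouped.items()}

counts = reduce_phase(shuffle_phase(chain.from_iterable(map_phase(d) for d in documents)))
print(counts)  # 'big' and 'data' each appear twice
```

Because the map and reduce phases operate on independent pieces of data, a real framework can spread them across many machines, which is what makes the Volume and Velocity dimensions tractable.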
There are 2 other Vs recommended by SAS for Big Data:
Variability – The Volume, Velocity and Variety can vary depending on the time of day or on a particular event. For example, during festivals the sales data, retail transaction data, and browsing data can peak.
Complexity – There are various sources of data in today's world. In some situations, data from different sources in different formats must be matched up to reach a logical conclusion.