5 v's of Big Data


  • 7/25/2019 5 v's of Big Data


    Sahil Raj

    MBA Business Analytics

    University of Petroleum & Energy Studies (Dehradun)


    Big Data, as its name suggests, is data in great volumes. But there is a lot more to Big Data than volume, and I will try to address all those aspects in this article. It has brought both great opportunity and great change to the technology industry. Data scientists traditionally look at the existing V's, the ones classically used to understand the key variables of any data set. We should also look at other factors that determine the character and value of the voluminous data available.

    Fig.1: http://goranxview.blogspot.in/2011/03/new-buzz-social-media-analytics.html


    Fig.2: Big Data Timeline

    Let us go back and see how Big Data came into the picture. In the early 1960s we had traditional legacy systems. By the 1970s we had moved to mainframes, which kept growing until the 1990s, when personal computers surfaced. In the mid to late 1990s, revolutionary changes occurred in storage as well as computing capabilities. This was when the .com boom arrived: companies like Yahoo, Google, eBay and Amazon came into the picture and started generating huge data streams. After this period, social media came into existence with big names like Orkut, Facebook, LinkedIn and Twitter. All this created a huge surge in the data being generated. It can be understood from the following timeline:

    Legacy Systems (1960s) -> Mainframes (1970s) -> Personal Computers (1990s) -> .com Boom (1990s) -> Social media (2000s)


    Fig.3: http://freepress.intel.com/servlet/JiveServlet/showImage/38-4608-2199/InternetMinuteInfographic.jpg

    As we all know, data is a gold mine of information. That is why companies planned to store it and mine it for important information. But the data generated was not only huge in size, it was also heterogeneous: it came as text, video, audio, pictures, geospatial information and so on. This forced industry bigwigs to come together and develop solutions for storing it, mining it and gaining an advantage from it. These efforts gave birth to the term Big Data and initiated the Hadoop project. To classify which data to call big and which not, some guidelines in terms of Vs were created, which are discussed in this post.

    Whenever we talk about Big Data, we generally come across 3 major Vs used to describe the issues of information overload in our digital world. Let us talk about the 3 existing Vs, which other Vs can be added, and how to deal with some of the problems the Vs raise.

    The Existing Vs

    Analyst Doug Laney first coined the 3 Vs of Big Data. Data scientists traditionally look at these existing Vs, which have classically been used to understand the key variables of any data set. They are:

    1. Volume:

    Every mouse click, like, phone call, text message, web search and purchase transaction is nowadays catalogued and stored by organizations in a cloud of big data. The amount of data created in the digital universe is around one zettabyte, which equals one sextillion (10^21) bytes. This shows the volumes in which data is being created and stored these days, and why it is called big data. With technology spreading ever more widely, this volume is expected to grow. With the Internet of Things being adopted rapidly by industry, the figure is expected to reach brontobytes by the end of this decade. The primary goal of collecting such a large volume of data is to make it useful to companies as well as consumers by optimizing future results.
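
To make the scale of these units concrete, here is a minimal Python sketch of the arithmetic. The unit table and the `convert` helper are illustrative names introduced here, not part of the original article. Note that "brontobyte" is an informal term rather than an official SI unit; the sketch assumes the commonly quoted meaning of 10^27 bytes.

```python
# Decimal byte units as powers of ten.
UNITS = {
    "kilobyte": 10**3,
    "megabyte": 10**6,
    "gigabyte": 10**9,
    "terabyte": 10**12,
    "petabyte": 10**15,
    "exabyte": 10**18,
    "zettabyte": 10**21,   # one sextillion bytes
    "yottabyte": 10**24,
    "brontobyte": 10**27,  # informal unit; 10**27 bytes is an assumption
}


def convert(amount, from_unit, to_unit):
    """Convert an amount between two of the units above (returns a float)."""
    return amount * UNITS[from_unit] / UNITS[to_unit]


one_zb_in_bytes = UNITS["zettabyte"]
zb_per_brontobyte = UNITS["brontobyte"] // UNITS["zettabyte"]

print(f"1 zettabyte = {one_zb_in_bytes:,} bytes")
print(f"1 brontobyte = {zb_per_brontobyte:,} zettabytes")
print(f"1 zettabyte = {convert(1, 'zettabyte', 'terabyte'):,.0f} terabytes")
```

Under that assumption, moving from zettabytes to brontobytes means a millionfold increase in volume.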

    2. Variety:

    In today's multi-faceted internet culture, the great volumes of data are also extremely varied in form. So many variables can be thrown at a company that the true value of the information is often lost in the sea of data. Examples include purchase transactions, website traffic, rewards programs, heat maps, social media conversations, IoT, IT/OT and sensor data.

    3. Velocity:

    More than 90% of the data that we have stored or are using has been generated in the last 10 years or so. This shows how fast data is being generated, and it makes velocity one of the most significant attributes of big data. Information is being created at a faster pace than ever before, and the varied channels of big data increase their output of content every day. Facebook alone has over 1.49 billion users, which gives an idea of the amount of data being generated every moment.
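
The 90% figure can be turned into a rough yearly growth rate. The sketch below is a back-of-the-envelope calculation, not a measurement: it assumes steady exponential growth, and the function name `implied_annual_growth` is introduced here purely for illustration. If 90% of all data is less than 10 years old, the total is 10 times what it was 10 years ago, and the annual rate follows from the tenth root of that factor.

```python
def implied_annual_growth(recent_share, years):
    """Annual growth rate implied if `recent_share` of all data
    was created in the last `years` years.

    Assumes steady exponential growth over the whole period.
    """
    if not 0 < recent_share < 1 or years <= 0:
        raise ValueError("recent_share must be in (0, 1) and years must be positive")
    # Ratio of today's total to the total `years` ago.
    growth_factor = 1 / (1 - recent_share)
    return growth_factor ** (1 / years) - 1


rate = implied_annual_growth(0.90, 10)
print(f"Implied growth: about {rate:.1%} per year")
```

Under these assumptions the stored data grows by roughly a quarter every year.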

    The missing Vs:

    With the passage of time, even these Vs are no longer enough to classify big data. It is time to look beyond them and bring in some new parameters that can help. Two more Vs that can be added to resolve some of the problems are:

    4. Veracity:

    This V concerns the accuracy of the data available to us. It may happen that less than 2% of the data we store is actually useful. We need to understand problems such as inconsistency, incompleteness and missing data, which can occur while data is generated or stored. It may also happen that we are storing data that is not even relevant to what we do.
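
The three problems named above can be illustrated with a small Python sketch. The sample records, the required field list and the age range used as a consistency rule are all made up for illustration; a real data set would need its own rules.

```python
# Hypothetical sample records, one clean and three with different problems.
records = [
    {"user_id": 1, "age": 34, "country": "IN"},
    {"user_id": 2, "age": None, "country": "IN"},  # missing value
    {"user_id": 3, "age": -5, "country": "US"},    # inconsistent value
    {"user_id": 4, "country": "UK"},               # incomplete record
]

REQUIRED = ("user_id", "age", "country")


def issues(record):
    """Return a list of veracity problems found in one record."""
    problems = []
    for field in REQUIRED:
        if field not in record:
            problems.append(f"incomplete: no '{field}' field")
        elif record[field] is None:
            problems.append(f"missing: '{field}' is empty")
    age = record.get("age")
    # Assumed consistency rule: a plausible age lies between 0 and 120.
    if isinstance(age, int) and not 0 <= age <= 120:
        problems.append("inconsistent: age out of range")
    return problems


report = {r["user_id"]: issues(r) for r in records}
usable = [uid for uid, probs in report.items() if not probs]

for uid, probs in report.items():
    print(uid, probs or "ok")
print(f"{len(usable) / len(records):.0%} of records usable")
```

Running checks like these before analysis shows how much of the stored data can actually be trusted.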

    5. Value:

    This can trump all the Vs discussed so far. Every enterprise runs a specific business, and before delving into Big Data initiatives it looks at the return it will get out of them. Unless the data is useful to the company, there is no point in having access to it. So it is very important to understand what value we want to derive from the data. This also lets us decide what to store and what not to store for a specific business.

    In the end, we can represent Big Data through the five Vs (Volume, Velocity, Variety, Veracity and Value), which not only helps us understand it well but also makes it easier to work with.

    Links:

    http://www.enterprisecioforum.com/en/blogs/jdodge/who-came-5-vs-big-data-0

    https://hrboss.com/blog/2014-03-26/missing-vs-big-data-hr-5-v-model-here

    http://davebeulke.com/big-data-impacts-data-management-the-five-vs-of-big-data/

    https://www.linkedin.com/pulse/20140306073407-64875646-big-data-the-5-vs-everyone-must-know

    http://dataconomy.com/seven-vs-big-data/

    https://datafloq.com/read/3vs-sufficient-describe-big-data/166

    http://www.pros.com/big-vs-big-data

    Sources:

    1. Big Data, for Better or Worse: 90% of World's Data Generated over Last Two Years.

    2. New York Stock Exchange Ticks on Data Warehouse Appliances.

    3. The Rising Data Deluge Opportunity.

    Figure: Volume, Velocity, Variety, Veracity, Value
