NiteshRoy Big Data Analytics

DESCRIPTION

Big Data is the new thing! A few facts and figures explaining "How Big is Big Data".

  • Big Data Analytics

    Nitesh A Roy

    2015351, Section: F, E-mail: [email protected], [email protected]

    Introduction

    With the world's total data doubling every 1.2 years, it is almost impossible to store, manage and retrieve this amount of data with little effort. This is where big data analytics comes into the picture. In a world of approximately 7 billion people:

    40 zettabytes of data will be created by 2020,

    6 billion people use cell phones,

    4,261,631,795 Google searches had been recorded by 11:35 pm on 21st August 2015 [1],

    1 TB of data is processed in every trading session of the NYSE,

    most companies in the U.S. have close to 100 terabytes of data stored,

    2.5 quintillion bytes (2.5 × 10^18 bytes) of data are generated every day from consumer transactions, communication devices and online behaviour [2],

    30 billion pieces of content are shared on Facebook every day.

    Source: http://www.scidev.net/global/data/feature/big-data-for-development-facts-and-figures.html

  • With such a huge amount of unstructured data available, the most important thing to do is to restructure it into useful information. When examined properly, this data can:

    uncover hidden patterns,

    explain various market trends, both domestic and global,

    explain customer preferences, likes and dislikes,

    help in determining unknown correlations.

    These analytical findings might lead not only to more effective marketing but also to new revenue opportunities, better customer satisfaction techniques, improved service quality, improved operational productivity and efficiency, competitive advantages over rival organizations and other business benefits.

    Source: http://www.internetlivestats.com/google-search-statistics/

    Concept

    Using the internet on laptops, tablets and other personal devices not only consumes data in the form of internet packets but also generates it at an almost equivalent speed. Any change that we make on the internet, whether through a search engine, online shopping or social networking, is not only recorded in the data warehouses of the websites but also becomes accessible to them for analysis almost simultaneously.

    In terms of Big Data, we the internet users may be termed Data Agents. Data Agents, simply put, are any agents that bring additional information to the data store, either by adding a new set of data or by modifying the existing records with new data.

    In 2014, the world's information totaled over 3.2 zettabytes (3.2 × 10^21 bytes). [1] Taking this as a starting point and applying the Moore's Law hypothesis of exponential development in Information Technology, the total should be somewhere close to 40 trillion gigabytes (40 × 10^21 bytes) by 2020. If this amount of data had to be analyzed manually, it would cost not only an enormous amount of time but also a huge amount of money.
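    The figures above can be cross-checked with a little arithmetic. The sketch below is only an illustration: it assumes decimal (SI) units and assumes that the 40 trillion gigabyte estimate refers to 2020, as in the Introduction. It converts the estimate to zettabytes, derives the doubling time implied by growth from 3.2 to 40 zettabytes, and shows what a strict 1.2-year doubling time would give instead.

```python
import math

# Decimal (SI) unit sizes in bytes.
GIGABYTE = 10**9
ZETTABYTE = 10**21

data_2014_zb = 3.2                          # world data in 2014, in zettabytes
data_2020_bytes = 40 * 10**12 * GIGABYTE    # 40 trillion gigabytes
data_2020_zb = data_2020_bytes // ZETTABYTE

# Assumption: the 40 ZB estimate is for 2020, six years after 2014.
years = 2020 - 2014
growth_factor = data_2020_zb / data_2014_zb

# Doubling time T satisfies 2 ** (years / T) == growth_factor.
implied_doubling_years = years * math.log(2) / math.log(growth_factor)

# For comparison: projection if data doubled every 1.2 years.
projected_zb = data_2014_zb * 2 ** (years / 1.2)

print(f"40 trillion GB = {data_2020_zb} ZB")
print(f"Implied doubling time: {implied_doubling_years:.2f} years")
print(f"Projection with 1.2-year doubling: {projected_zb:.1f} ZB")
```

    Under these assumptions the two figures quoted in this report imply a doubling time of roughly 1.6 years, somewhat slower than the 1.2 years quoted in the Introduction, which would give about 100 zettabytes by 2020. The estimates come from different sources and should be read as orders of magnitude.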

    Fixed and reliable results could be obtained only after a time span of at least 4 to 5 years, and even that assumes that no new data is generated in the meantime. We would also require at least 10 times more servers. At present there are only about 500,000 computer scientists around the globe, and only around 3,000 mathematicians working on data analysis. This is where Big Data Analytics comes into the picture.

  • Source: http://brandequity.economictimes.indiatimes.com/news/digital/big-data-special-infographic-facts-and-figures-that-will-amaze-you/48261745

  • Evolution of Big Data:

    In 2003 Google published a white paper on the Google File System (GFS), a distributed file system built with the sole intention of saving and storing exorbitant amounts of unstructured data. In 2004 Google published another white paper, on the Google MapReduce algorithm, for analyzing this stored, highly unstructured data and synthesizing it into meaningful information. [3]
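    The idea behind MapReduce can be illustrated with a minimal, single-process sketch of the classic word-count example. This is neither Google's implementation nor the Hadoop API; the function names map_phase, shuffle_phase and reduce_phase are hypothetical and chosen only for illustration. In a real cluster, the map and reduce steps would run in parallel on many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce over a list of text documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    groups = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


counts = word_count(["big data is big", "data about data"])
print(counts)
```

    Each document is mapped independently, and each key is reduced independently. This independence is what allows the work to be spread across many machines.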

    The problem with data storage systems before this was that it was extremely difficult to record, analyze and store unstructured data. With the boom in technology and the advent of social media websites like Twitter, Facebook and LinkedIn, the proportion of unstructured data (photographs, text, videos, audio files, etc.) relating to the same user was growing exponentially.

    In 2005 Roger Mougalas from O'Reilly Media coined the term Big Data for the first time, only a year after the company created the term Web 2.0. It refers to a large set of data that is almost impossible to manage and process using traditional business intelligence tools [4].

    2005 is also the year in which Hadoop was created, building on Google's MapReduce. Doug Cutting developed it in association with Yahoo!. Its goal was to index the entire World Wide Web, and nowadays the open-source Hadoop is used by a lot of organizations to crunch through huge amounts of data. Hadoop was among the first software frameworks that helped in structuring such unstructured data for the purpose of business analysis. It was later made an open-source project of the Apache Software Foundation.

    As more and more social networks appeared and Web 2.0 took flight, more and more data was created on a daily basis. Innovative startups slowly started to dig into this massive amount of data, and governments also started working on Big Data projects. In 2009 the Indian government decided to take an iris scan, fingerprint and photograph of all of its 1.2 billion inhabitants. All this data is stored in the largest biometric database in the world.

    In 2010 Eric Schmidt spoke at the Techonomy conference in Lake Tahoe, California, where he stated that "there were 5 exabytes of information created by the entire world between the dawn of civilization and 2003. Now that same amount is created every two days."

    In 2011 the McKinsey report "Big data: The next frontier for innovation, competition, and productivity" stated that by 2018 the USA alone would face a shortage of 140,000 to 190,000 data scientists as well as 1.5 million data managers. This is where the need arose for analytics that were faster, more productive and more efficient, and the practical use of big data analytics boomed globally from 2011 onwards.

    Theory

    Big data analysis is a huge task. Imagine the huge volume of data and the variety of formats, both structured and unstructured, in which it is available. This collected data flows to and fro across the entire organization. With so much data, any number of permutations and combinations of different data types can be made, which makes the task of combining, contrasting and analyzing data to find patterns and other useful business information all the more difficult.

    The first challenge is in breaking down data silos to access all the data an organization stores in different places and often in different systems. A second big data challenge is in creating platforms that can pull in unstructured data as easily as structured data. [5]

    Big data can be analyzed with the software tools commonly used as part of advanced analytics disciplines such

    as predictive analytics, data mining, text analytics and statistical analysis. Mainstream BI software and data

    visualization tools can also play a role in the analysis process. But the semi-structured and unstructured data may not

    fit well in traditional data warehouses based on relational databases.

    Furthermore, data warehouses may not be able to handle the processing demands posed by sets of big data that need

    to be updated frequently or even continually -- for example, real-time data on the performance of mobile applications

    or of oil and gas pipelines. As a result, many organizations looking to collect, process and analyze big data have turned

    to a newer class of technologies that includes Hadoop and related tools such as YARN, MapReduce, Spark, Hive and Pig, as well as NoSQL databases.

    In some cases, Hadoop clusters and NoSQL systems are being used as landing pads and staging areas for data before

    it gets loaded into a data warehouse for analysis, often in a summarized form that is more conducive to relational

    structures. Increasingly though, big data vendors are pushing the concept of a Hadoop data lake that serves as the

    central repository for an organization's incoming streams of raw data. In such architectures, subsets of the data can

    then be filtered for analysis in data warehouses and analytical databases, or it can be analyzed directly in Hadoop

    using batch query tools, stream processing software and SQL on Hadoop technologies that run interactive, ad hoc

    queries written in SQL.[6]
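    The staging pattern described above, in which raw data lands first and is loaded into a warehouse in summarized form, can be sketched in a few lines. This is only an illustration under stated assumptions: a plain Python list stands in for the Hadoop or NoSQL staging area, an in-memory SQLite database stands in for the data warehouse, and the event fields and the table name are hypothetical.

```python
import json
import sqlite3
from collections import defaultdict

# Hypothetical raw, semi-structured events as they might land in a staging area.
RAW_EVENTS = [
    '{"user": "a", "action": "search", "bytes": 120}',
    '{"user": "b", "action": "purchase", "bytes": 300, "item": "book"}',
    '{"user": "a", "action": "search", "bytes": 80}',
    'not valid json',
]


def summarize(raw_lines):
    """Reduce raw events to one row per (user, action), skipping bad records."""
    totals = defaultdict(lambda: [0, 0])  # key -> [event count, total bytes]
    for line in raw_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed record: leave it in the staging area
        key = (event["user"], event["action"])
        totals[key][0] += 1
        totals[key][1] += event.get("bytes", 0)
    return [(u, a, n, b) for (u, a), (n, b) in sorted(totals.items())]


def load_warehouse(rows):
    """Load summarized rows into a relational table (SQLite as a stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE activity "
        "(user_id TEXT, action_type TEXT, event_count INTEGER, total_bytes INTEGER)"
    )
    conn.executemany("INSERT INTO activity VALUES (?, ?, ?, ?)", rows)
    return conn


summary_rows = summarize(RAW_EVENTS)
conn = load_warehouse(summary_rows)

# An ad hoc SQL query against the summarized data.
per_user = conn.execute(
    "SELECT user_id, SUM(event_count) FROM activity "
    "GROUP BY user_id ORDER BY user_id"
).fetchall()
print(per_user)
```

    The raw events keep their irregular shape in the staging area, while the relational table receives only the fixed columns it can handle well. In a data lake architecture, by contrast, the raw events would be kept and queried in place.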

    Applications

    Big Data has been on the minds of individuals and corporations alike since 2012. Evaluating Big Data and its analytics is a huge task. Since 2003, Big Data has been a driving force for many corporate houses, helping them not only to understand their customers but also to find new potential customers and expand business into untrodden territory. It also proves beneficial to government organizations in serving citizens and mitigating fraud. In India, Big Data Analytics was used by one of the leading political parties, the BJP (Bharatiya Janata Party), and its allies in their highly successful campaign for the Indian General Elections of 2014 [10]. Apart from this, the new government policy of linking Aadhaar card numbers to each citizen of India for ease in tax transparency, LPG subsidies and bank KYC not only helps the Government of India keep track of the numerous transactions of each individual but also helps in preparing the next financial and functional five-year plans.

    Last year IBM also helped the Bangalore Water Supply and Sewerage Board (BWSSB) incorporate technologies that create a dashboard, drawing on a command center built on the IBM Intelligent Operations Center (IOC), which monitors, administers and manages the city's water supply networks. The command center monitors water flow in 284 of 784 bulk flow meters in the city and provides a clear, single view of the functioning of all the bulk flow meters, the amount of water transmitted by each of them, the amount of water supplied to individual parts of the distribution system, the level of water in each reservoir or tank, etc. Data from every working flow meter is reported on a single dashboard [7]. With a city population of close to 10 million people and only two major water sources, the Cauvery and Arkavathi rivers, it was almost impossible to meet the per capita water demand. Big Data Analytics helps quench Bangalore's thirst here.

    With net subscriber additions of close to 6 million, India is now the No. 1 telecom market in the world, with close to 25 phones per 100 people. With 250 million mobile connections, India is one of the most competitive markets for the telecom industry. Bharti Airtel has entirely outsourced its billing and 60 other platforms. With 33 thousand mobile towers in India, a huge amount of effort, i.e. 60-70% of its budget and time, goes into maintaining and managing the towers. Airtel therefore partnered with IBM, which enabled it to monitor data in real time and to predict maintenance needs, bringing down the 60-70% effort that was initially consumed in managing this huge volume of data. This strategic partnership with IBM has allowed Airtel to focus more on customers. Airtel, which was predominantly a service provider, has now transformed into a market leader in telecom services, successfully handling 1.5 million new customers per month by integrating its channels and customer-facing processes to provide a more seamless customer experience. Bharti Airtel can now easily move into the new telemedia market, with Big Data Analytics working for it day and night. [8]

    There are many such examples. Jet Airways' entry into the European market, and the carbon footprint problem it raised, was resolved with IBM's technologies using Big Data Analytics. Jet Airways is an Indian airline and the market leader (28 percent market share) in the domestic sector. The problem emerged when Jet Airways wished to extend its services to European countries. The European Union had included aviation in the EU Emissions Trading Scheme as part of its initiative to abate global warming, and non-European carriers also needed to comply with the scheme. Carriers that did not comply would face huge penalties and could even be banned from operating in the EU. Jet Airways India needed a solution that would help it automate and accurately measure its carbon footprint and create reports. Jet Airways India now has an automated calculation and analysis process that provides accurate emission readings and predictions. The system uses rich analytics for annual emissions, regulatory carbon offsets and statistical checks on the emissions. The solution can analyze data at the aircraft level and exclude fuel overused during maintenance activities. It helps the airline ensure that all EU flight emissions are properly calculated and that reporting is accurate and timely, helping achieve regulatory compliance. [9]
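    The aircraft-level calculation described above can be illustrated with a minimal sketch. It does not reproduce the IBM solution, whose internals are not described in the source. The record layout, the aircraft identifiers and the emission factor of 3.15 tonnes of CO2 per tonne of jet fuel are all assumptions made for illustration.

```python
from collections import defaultdict

# Assumed emission factor: tonnes of CO2 per tonne of jet fuel burned.
CO2_PER_TONNE_FUEL = 3.15

# Hypothetical fuel records: (aircraft id, activity, fuel burned in tonnes).
FUEL_RECORDS = [
    ("VT-AAA", "flight", 10.0),
    ("VT-AAA", "maintenance", 0.5),
    ("VT-BBB", "flight", 8.0),
    ("VT-AAA", "flight", 12.0),
]


def emissions_by_aircraft(records, factor=CO2_PER_TONNE_FUEL):
    """Sum flight fuel per aircraft, excluding maintenance, and convert to CO2."""
    fuel = defaultdict(float)
    for aircraft, activity, tonnes in records:
        if activity == "maintenance":
            continue  # fuel used during maintenance is not reported
        fuel[aircraft] += tonnes
    return {aircraft: tonnes * factor for aircraft, tonnes in fuel.items()}


report = emissions_by_aircraft(FUEL_RECORDS)
for aircraft, co2 in sorted(report.items()):
    print(f"{aircraft}: {co2:.1f} t CO2")
```

    Keeping the totals per aircraft makes it possible to run statistical checks on individual aircraft before the figures are aggregated into an annual report.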

    Source: http://www-03.ibm.com/press/in/en/pressrelease/43242.wss

  • Additional Information: (Survey by KPMG published in Livemint on July 23rd, 2014)

    Source: http://www.livemint.com/Industry/bUQo8xQ3gStSAy5II9lxoK/Are-Indian-companies-making-enough-sense-of-Big-Data.html

  • Conclusion:

    Big Data today is exactly what Microsoft Office was in the early 90s. With its huge potential to tap information from around the globe, Big Data Analytics promises a solution to almost every business-related issue in this century. From just an idea for a white paper submission to a full-fledged technology, Big Data is today incorporated in almost every sector and every country. With suggestions for airline bookings, suggestions for similar products at Flipkart, eBay and Jabong, and retail transaction information provided to companies and banks, knowing your customers' needs, wants and demands was never so easy before. From controlling water supplies to understanding a country's economic progress, all the data is around us now. Big Data is the key that shall drive organizations tomorrow.

    References

    [1] http://www.internetlivestats.com/google-search-statistics/

    [2] http://www.ibmbigdatahub.com/sites/default/files/infographic_file/4-Vs-of-big-data.jpg

    [3] https://www.youtube.com/watch?v=ruBo8ss813Y

    [4] https://datafloq.com/read/big-data-history/239

    [5] http://www.webopedia.com/TERM/B/big_data_analytics.html

    [6] http://searchbusinessanalytics.techtarget.com/definition/big-data-analytics

    [7] http://www-03.ibm.com/press/in/en/pressrelease/43242.wss

    [8] http://www-03.ibm.com/press/in/en/pressrelease/43585.wss

    [9] http://www-03.ibm.com/software/businesscasestudies/dk/da/corp?synkey=A624156E90445W76

    [10] http://www.livemint.com/Industry/bUQo8xQ3gStSAy5II9lxoK/Are-Indian-companies-making-enough-sense-of-Big-Data.html