The dawn of big data


DESCRIPTION

Big Data basics

THE DAWN OF BIG DATA

New Rules; New Structures

Neal J. Hannon
University of Kansas
February 9, 2012

Definition

• Big data is a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time. Big data sizes are a constantly moving target, currently ranging from a few dozen terabytes to many petabytes of data in a single data set.

More Data Please…

• In a 2001 research report[14] and related conference presentations, then META Group (now Gartner) analyst, Doug Laney, defined data growth challenges (and opportunities) as being three-dimensional, i.e. increasing volume (amount of data), velocity (speed of data in/out), and variety (range of data types, sources). Gartner continues to use this model for describing big data.[15]

Gartner

• Worldwide information volume is growing at a minimum rate of 59 percent annually, and while volume is a significant challenge in managing big data, business and IT leaders must focus on information volume, variety and velocity.
• Volume
• Variety
• Velocity
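To put the 59 percent figure in perspective, the following is a minimal sketch of the compounding arithmetic. It assumes the rate is constant and compounds once per year; the function names are illustrative and not part of the original slides.

```python
import math

# Gartner's stated minimum annual growth rate (from the slide above).
ANNUAL_GROWTH = 0.59


def volume_after(years, start=1.0, rate=ANNUAL_GROWTH):
    """Relative data volume after `years` of annual compounding (assumed model)."""
    return start * (1 + rate) ** years


def doubling_time(rate=ANNUAL_GROWTH):
    """Years needed for the volume to double at a constant annual rate."""
    return math.log(2) / math.log(1 + rate)


if __name__ == "__main__":
    print(f"Doubling time: {doubling_time():.2f} years")
    for years in (1, 3, 5):
        print(f"After {years} year(s): {volume_after(years):.1f}x the starting volume")
```

Under these assumptions the volume doubles roughly every year and a half and grows about tenfold in five years.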

Volume

• Volume: The increase in data volumes within enterprise systems is caused by transaction volumes and other traditional data types, as well as by new types of data. Too much volume is a storage issue, but too much data is also a massive analysis issue.

Variety

• Variety: IT leaders have always had an issue translating large volumes of transactional information into decisions. Now there are more types of information to analyze, mainly coming from social media and mobile (context-aware). Variety includes tabular data (databases), hierarchical data, documents, e-mail, metering data, video, still images, audio, stock ticker data, financial transactions and more.

Velocity

• Velocity: This involves streams of data, structured record creation, and availability for access and delivery. Velocity means both how fast data is being produced and how fast the data must be processed to meet demand.

Data is becoming the new raw material of business: an economic input almost on a par with capital and labour. “Every day I wake up and ask, ‘how can I flow data better, manage data better, analyse data better?’” says Rollin Ford, the CIO of Wal-Mart.

Source: Data, Data Everywhere, The Economist, February 25, 2010

There were 5 exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days, and the pace is increasing

Eric Schmidt, Google CEO, Techonomy Conference, August 4, 2010

Why now?

Source: Mike Driscoll, CTO Metamarkets: The Three Sexy Skills of Data Scientists (& Data Driven Startups)

Large

• Billions of web clicks: 1 TB+
• Millions of web pages: 10 GB-1 TB
• Thousands of sales figures: <10 GB

Real-time
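The size bands on this slide can be read as a simple classification rule. The sketch below is one way to express it. The tier names "small" and "medium" are assumptions (the slide only labels the top band "Large"), as are the decimal units and the treatment of exactly 1 TB as belonging to the middle band.

```python
# Decimal units are an assumption; the slide does not say GB vs GiB.
GB = 10**9
TB = 10**12


def size_tier(num_bytes):
    """Map a data set size in bytes to one of the slide's three bands."""
    if num_bytes < 10 * GB:
        return "small"   # e.g. thousands of sales figures
    if num_bytes <= TB:
        return "medium"  # e.g. millions of web pages
    return "large"       # e.g. billions of web clicks


if __name__ == "__main__":
    for label, size in [("sales figures", 2 * GB),
                        ("web pages", 300 * GB),
                        ("web clicks", 4 * TB)]:
        print(f"{label}: {size_tier(size)}")
```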

How can big data create value?

• Creating transparency – enabling, for example, the manufacturing sector to integrate “data from R&D, engineering, and manufacturing units to enable concurrent engineering ... (to) significantly cut time to market and improve quality.” This seems much like traditional data warehousing.

• Enabling experimentation – “organizations can collect more accurate and detailed performance data ... to instrument processes and then set up controlled experiments … (which) can enable leaders to manage performance at higher levels.” Super-crunching equals analytics + experiments.

• Innovating new business models – “The emergence of real-time location data has created an entirely new set of location-based services from navigation to pricing property and casualty insurance based on where, and how, people drive their cars.” This affirms Mike Loukides' assertion “that data science enables the creation of data products.”

• Supporting human decision making with automated algorithms – “decision making may never be the same; some organizations are already making better decisions by analyzing entire datasets from customers, employees, or even sensors embedded in products.” The statistical learning world continues to progress.

SAS - unstructured text

• http://www.youtube.com/user/SASsoftware?v=NHAq8jG4FX4&feature=pyv&ad=8557352196&kw=data%20analytics

Pattern Based Strategy

• "The ability to manage extreme data will be a core competency of enterprises that are increasingly using new forms of information (such as text, social and context) to look for patterns that support business decisions in what we call Pattern-Based Strategy," said Yvonne Genovese, vice president and distinguished analyst at Gartner. "Pattern-Based Strategy, as an engine of change, utilizes all the dimensions in its pattern-seeking process. It then provides the basis of the modeling for new business solutions, which allows the business to adapt. The seek-model-and-adapt cycle can then be completed in various mediums, such as social computing analysis or context-aware computing engines."

Tricks of the Trade

• New Architecture

• In Memory Analytics

In-Memory Indexing at SAP

• We have also got enterprise search. We really started doing that back in the 2003/2004 time period, which is also when we started coming out with Business Warehouse Accelerator. That was when Google was just really starting to become Google, and we tried to do the same thing with enterprise data that Google does with website data as far as indexing it. So we also put the indexes in memory, so it sped up even further, and now, if you actually look at HANA, it really is kind of the next evolutionary step in that chain. This is in-memory processing, and this isn’t something just for a specialist. It really is a technology that has matured to a level that it can run the entire business suite and run your entire company in memory and get all those benefits for everything.

• http://docs.media.bitpipe.com/io_10x/io_102428/item_477005/The%20Next%20Chapter%20of%20In-Memory%20Computing_PT_12.22.11.pdf
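The idea of keeping a search index in memory, as described in the quote above, can be illustrated with a toy inverted index. This is only a sketch of the general technique, not SAP's or Google's implementation. The class name, the whitespace tokenizer, and the sample records are all hypothetical.

```python
from collections import defaultdict


class InMemoryIndex:
    """Toy inverted index held entirely in RAM (illustrative only)."""

    def __init__(self):
        self._postings = defaultdict(set)  # token -> ids of documents containing it
        self._docs = {}                    # doc id -> original text

    def add(self, doc_id, text):
        """Store a document and index each lowercase, whitespace-separated token."""
        self._docs[doc_id] = text
        for token in text.lower().split():
            self._postings[token].add(doc_id)

    def search(self, query):
        """Return ids of documents that contain every token in the query."""
        tokens = query.lower().split()
        if not tokens:
            return set()
        result = set(self._postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self._postings.get(token, set())
        return result


if __name__ == "__main__":
    index = InMemoryIndex()
    index.add(1, "Open purchase order for warehouse Kansas")
    index.add(2, "Closed purchase order for warehouse Ohio")
    index.add(3, "Open invoice for customer Kansas")
    print(sorted(index.search("open kansas")))
```

Because lookups are dictionary and set operations on data already in memory, a query avoids disk access altogether, which is the speed-up the quote alludes to.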

For more on HADOOP

• http://www.slideshare.net/PhilippeJulio/hadoop-architecture

Obligatory Questions slide

• Any Questions?
