CITO Research
Advancing the craft of technology leadership

Big Data is for Everyone

Sponsored by QlikView

Contents

Introduction: Why Does Big Data Matter to Me?

Approaches and Barriers to Extracting Big Data Value

Business Discovery Can Help

Big Data At Work

Conclusion: Big Data Is For Everyone

Introduction: Why Does Big Data Matter to Me?

“Big data” refers to a relatively new universe of data being created by web interactions, social media, mobile devices, RFID tags, web logs, smart meters, weather sensors, and virtually anything that generates an electrical pulse.

Big data affects everyone. It’s shaping the conversations we have about our world, turning speculation into informed discourse. Behavioral scientists use it to track and control the spread of disease in developing countries. Campaigns use it to win elections. Drivers use it to inform other drivers about congestion and accidents with social traffic applications on mobile phones. Businesses use it to make more informed predictions about who will buy their products and to develop better, more appropriately targeted offers to their customers.

Harnessing big data to tell us about our world and our businesses is the ultimate competitive advantage—and increasingly, the failure to harness big data will be a competitive disadvantage for many businesses.

Embracing big data means you can gain incredibly valuable insights about your organization, your customers, and the world at large, based not only on unstructured data that is outside of the cleansed data in a structured database or data warehouse, but also on data that you don’t and will never own. A mess of data exists out there and it can help you build a more accurate model of your world, but without the right tools, it’s just that: a mess. There has been much discussion of the various platforms and mechanisms needed to process big data, but comparatively little discussion of how businesspeople can actually explore big data and gain insight.

CITO Research has endeavored to find other methods and tools that businesspeople can access so that big data is used to its full potential. With the right tools in hand, big data can help you create a richer model of your organization and the wider world, recognize events you would not have discovered otherwise, and deliver a view of trends that can help you establish a competitive edge.

Approaches and Barriers to Extracting Big Data Value

Big data truly can change the way we do business, in the same way that the invention of the microscope changed the way we understand matter. Technologist David Weinberger explained it this way: before the microscope, we could never have imagined that water was filled with molecules and microorganisms or that blood was made up of cells. Examining big data has the same effect on our concept of business information. What seems like an incoherent flow of digits and words can reveal itself to be a rich archetype that points to behaviors and trends never before visible.

The old model of sentiment analysis was the Nielsen report—structured data that was delivered by a trusted third party, collected in a standard way from a heavily vetted, if limited, group of subjects. But just as no consumer marketer would stop at the old shopping-mall focus group in determining whether a new ad campaign is working, no TV network today would stop at the Nielsen ratings to find out how programs (and the ads placed in them) are trending. Now, sentiment analysis is an amalgam of tweets, social media posts, web page clicks, and media downloads, in addition to traditional reporting methods. The world has changed.

Capturing and Storing Is Only the Beginning

Often, big data is discussed as if it automatically creates value. It’s true that big data has great potential for creating a more detailed model of the business, such as tracing a customer’s path through a store and analyzing post-sale sentiments expressed in social media to create better offers. But to date there has been a lot of talk about how it can be captured and stored and very little about the practical ways businesses can exploit it.

For example, the Hadoop Distributed File System (HDFS) has become popular shorthand for talking about big data technology because it provides an affordable way to capture massive amounts of unstructured data on commodity hardware. The big advantage is that there’s plenty of capacity, and you don’t need a specific plan for the data at the time you store it, as you did with previous generations of data warehouses.

But capturing and storing big data won’t make it valuable. Additional tools are needed to explore and analyze it. As it turns out, the simplicity of Hadoop stops with capturing data. Because Hadoop requires queries to be written for a specific programming model, MapReduce, and because it processes large batches of data at once, it’s not particularly conducive to the on-the-fly questions business users have every day. Just as with a traditional relational database, you have to know the questions you want to ask in advance. But how can you? The real promise of big data is that it contains information that isn’t part of the typical well-structured view of the business world. It’s all about “not knowing what you don’t know.”
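To make the batch-oriented nature of MapReduce concrete, the following is a minimal in-memory sketch of the map, shuffle, and reduce phases in plain Python. It is an illustration only and does not use Hadoop's actual API; the function names and the sample log lines are hypothetical.

```python
from collections import defaultdict


def map_phase(records):
    """Emit a (word, 1) pair for every word in every record."""
    for record in records:
        for word in record.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def run_job(records):
    """Run the whole batch: the question is fixed before any data is read."""
    return reduce_phase(shuffle(map_phase(records)))


# Hypothetical sample data, invented for this illustration.
log_lines = [
    "checkout error timeout",
    "checkout ok",
    "search ok",
]
counts = run_job(log_lines)
```

Note that the question ("how often does each word occur?") is baked into the job itself. Asking a different question means writing and running another job over the whole batch, which is the limitation the paragraph above describes.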

The Big Data Bottleneck

Big data analysis is fast becoming yet another bottlenecked process, where a select few data scientists have access to all the tools and business users must make requests and wait for the “priesthood” of experts to deliver results. All of that waiting means it’s less likely you’ll be able to respond quickly to the signals contained in the data, especially since much of it is being created in short bursts in real time, which demands a rapid response, not a report. This is not a modern vision of the potential of data. In fact, it sounds similar to the old mainframe world, where the business had to wait for the “lab coats” to ask the great machine in the big room what was going on.

Nor is it an optimal use of the capabilities of the data scientists themselves. The hard number-crunching of data science works best when constructing algorithms that will generate repeatable and hopefully profitable outcomes, but it’s a waste of talent and money when it comes to the ad-hoc questions business users usually have.

The large “stack” vendors that have always supplied enterprise data systems have their solutions, but they’re too complex and expensive for the end user, requiring a significant commitment of IT personnel and a dedicated hardware and software configuration just to run them, much less to ask a question. At the other end of the spectrum, visualization tools can present an appealing view of data that has already been explored and processed, but they lack the depth of functionality needed to explore structured and unstructured data at the same time and deliver meaningful associations.

Business Discovery Can Help

Business Discovery is a form of self-service BI for decision makers who need to understand everything happening in and around their business. Unlike traditional BI tools, in which predefined reports and dashboards are fairly static and limited to simple filters, selections, and drill-downs, Business Discovery gives people the freedom and flexibility to explore big data from anywhere, at any time. It’s robust and secure, yet app-driven, light, social, and mobile, and it facilitates collaborative decision making.

Imagine a platform that delivers BI in the form of apps, much like smartphone apps that are lightweight and designed to fill one area of need, and that allows users to answer question after question about the data just by clicking, without having to create new visualizations from scratch. Everyone can tailor their insights to meet their own needs. Anyone can start asking questions and gaining insight right away and can provide their input to others either in real time or on their own time, annotated on the data as it was when they saw it. For global teams, collaborating both synchronously and asynchronously is vital.

[Figure: Business Discovery and sources of big data. Recoverable labels: Sources of Big Data; Distill and aggregate large datasets; Advanced analytical techniques; Real-time monitoring and analysis; Advanced application development; User-driven application development; Interactive exploration and discovery; Interactive visualization; Load datasets in memory or create live connections. The chart values 79%, 11%, 5%, 3%, and 2% appear without legible labels.]

With big data, the data itself and the structure of that data are both constantly changing, often in unexpected ways. At the same time, no enterprise will be throwing out its carefully manicured, structured databases. To reveal actionable insights, a BI tool must be able to simultaneously query structured, unstructured, and real-time sources, which will comprise an increasingly large portion of the data generated and consumed by businesses. With Business Discovery, this is not only possible, but intuitive and productive.

Once processed, data is then presented in an associative experience, in which every data point is associated with every other data point. In past reports, we’ve compared it to a fiber-optic spider web, where everything is connected. Pulling on one thread, or making a selection, lights up the related elements in other fields, showing you new paths through the data and revealing new kinds of connections. That means user-driven analytical applications can be built on the fly, to ask questions that occur as data arrives. If sales are flagging, it’s crystal-clear where and why, and that means immediate, corrective action can be taken. Users can even identify analytical gaps in previous, traditional database queries.
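The idea of a selection "lighting up" related values in other fields can be sketched in a few lines of Python. This is a simplified illustration of the concept, not a description of how any particular Business Discovery product implements it. The function name `associated_values` and the sample sales rows are hypothetical.

```python
def associated_values(rows, selection):
    """Return, for each unselected field, the values that co-occur
    with the current selection."""
    # Keep only the rows that agree with every selected field value.
    matching = [
        row for row in rows
        if all(row.get(field) == value for field, value in selection.items())
    ]
    fields = {field for row in rows for field in row}
    return {
        field: sorted({row[field] for row in matching if field in row})
        for field in fields
        if field not in selection
    }


# Hypothetical sample data, invented for this illustration.
sales = [
    {"region": "East", "product": "Widget", "channel": "Web"},
    {"region": "East", "product": "Gadget", "channel": "Store"},
    {"region": "West", "product": "Widget", "channel": "Store"},
]

# "Pull on one thread": select a region and see what it is linked to.
linked = associated_values(sales, {"region": "East"})
```

Adding a second entry to the selection, for example a product, narrows the linked values further, which mirrors clicking through question after question.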

Big Data At Work

The world is replete with exciting examples of how big data is being mined for previously unimaginable insights.

Big Data Can Improve Public Health

More than 5 petabytes of data are generated by mobile phones each day. Such figures are usually expressed in terms of the burden of processing all that data, but there’s another story. Phone records can do an immense amount of social good. A project by Nathan Eagle, professor at Harvard’s School of Public Health and the MIT Media Lab, is compiling millions of phone records to help development groups decide where in Kenya to launch a malaria eradication campaign and help healthcare providers pinpoint cholera outbreaks in Rwanda. 1

Using algorithms, the team can extrapolate mobility patterns and behavioral data, which are then transmitted to organizations such as the United Nations and the World Bank. The key is to think of people like particles. Anomalies in movement patterns can indicate something is awry.

For example, a collective narrowing of the range of movement of a village population in Rwanda could indicate a cholera outbreak, as people are hobbled and restricted from routine activities. (Naturally, such assumptions must be confirmed with additional information: in one case, restricted movement in an area was read as a sign of a cholera outbreak when in fact a flood had washed out roads and kept people from going very far.) Conversely, mobility into and out of a known malaria zone could indicate that there is too much migration to effectively apply eradication techniques and that other measures, such as quarantines, should be explored in service of public health.

Passive data collection is only part of the picture, of course. Nothing beats empirical information entered by people on the ground. But mobile phones can be of assistance here, too. Eagle founded a company called Jana, which collects individual survey information and mass-crowdsourced data. In exchange for answering survey questions on 2.1 million mobile phones in Africa, Asia, and Latin America, customers receive free minutes, a valuable commodity in developing countries. The same tools normally used for marketing are reapplied for behavioral science, and for the betterment of human health.
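As a rough illustration of how a narrowing range of movement might be flagged, the sketch below compares each day's average travel range against a baseline period and marks days that fall far below it. This is not the project's actual algorithm. The function name, the threshold of two standard deviations, and the sample figures are all assumptions made for the example.

```python
from statistics import mean, stdev


def narrowing_days(daily_range_km, baseline_days, threshold=2.0):
    """Return indexes of days whose range falls more than `threshold`
    standard deviations below the baseline mean."""
    baseline = daily_range_km[:baseline_days]
    mu = mean(baseline)
    sigma = stdev(baseline)
    flagged = []
    for day, value in enumerate(daily_range_km[baseline_days:], start=baseline_days):
        # Only a drop below the baseline counts as "narrowing".
        if sigma > 0 and (mu - value) / sigma > threshold:
            flagged.append(day)
    return flagged


# Hypothetical average daily range of movement for one village, in km.
ranges = [12.0, 11.5, 12.5, 12.0, 11.0, 13.0, 12.0, 11.8, 4.0, 3.5]
flagged = narrowing_days(ranges, baseline_days=7)
```

A flag raised this way is only a prompt for investigation. As the flood example shows, the same signal can have several causes.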

1 Kevin Fitchard, “Can cell phone data cure society’s ills?” GigaOm, March 11, 2012, http://gigaom.com/2012/03/11/10-ways-big-data-is-changing-everything/8/.

Winner of the 2012 US Elections: Big Data

Whatever your political affiliation, it was impossible to deny that there was one clear winner in the 2012 US elections: big data. While pollsters and campaigns have always collected data, in 2012 the sophisticated use of social media and the refusal to leave any assumption unverified were unprecedented.

The Obama campaign mounted a multipronged engagement strategy that encouraged supporters to volunteer personal information and comments, post photos and videos, donate funds, and, importantly, galvanize others. The result: 33 million Facebook “likes,” 240,000 YouTube subscribers, and 246 million YouTube page views.

By comparison, the Romney campaign attracted 23,700 subscribers and 26 million page views. The Obama campaign didn’t stop there. Its volunteer system, using open-source software on the Amazon Web Services cloud, was able to rank names in call lists according to “persuadability.” Seventy-five percent of the data covered basics such as gender, age, address, and voting record, but 25 percent of the consumer data collected allowed the campaign to predict who was likely to make a donation online, who would do so by mail, or who would become a volunteer. The campaign also ran 66,000 computer simulations each night in order to find the optimum breakdown of campaign efforts.

[Figure: How the Obama Campaign leveraged big data in 2012. Consolidate: merge all databases and add social media, polls, Democratic voter files for swing states. Experts: hire analytics team 5x larger than 2008 campaign. Model: detailed models for swing state voters, help predict who will give and volunteer. Target: test and place ads based on results. Crunch: run election 66,000 times every night; allocate resources based on results. Expand: mobile app users give 4x more; expand program. Measure everything.]
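The nightly simulations mentioned above can be pictured as a Monte Carlo exercise: run the election many times with randomized state outcomes and count how often a candidate reaches a majority. The sketch below is a generic illustration under that assumption, not the campaign's actual model. The function name, the fictional states, and their vote counts and win probabilities are all invented for the example.

```python
import random


def simulate_elections(states, runs, seed=0):
    """Estimate the share of simulated elections won.

    `states` maps a state name to (electoral_votes, win_probability).
    """
    rng = random.Random(seed)  # fixed seed so repeated runs are comparable
    total_votes = sum(votes for votes, _ in states.values())
    needed = total_votes // 2 + 1  # simple majority of all votes at stake
    wins = 0
    for _ in range(runs):
        # Each state is won independently with its own probability.
        won = sum(votes for votes, p in states.values() if rng.random() < p)
        if won >= needed:
            wins += 1
    return wins / runs


# Fictional states with made-up vote counts and win probabilities.
swing_states = {
    "State A": (29, 0.52),
    "State B": (18, 0.60),
    "State C": (13, 0.48),
    "State D": (10, 0.55),
}
win_share = simulate_elections(swing_states, runs=66_000)
```

Rerunning the simulation after changing one state's probability gives a rough sense of where extra effort would matter most, which is the kind of resource question the paragraph describes.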

What did this drive the campaign to do differently? Using big data, campaign researchers were able to identify macro trends and figure out how to optimize large-scale tactics, like sending more email from Michelle Obama, as well as using demographic data to determine the best way to sway voters in crucial swing states.

The use of social media and big data wasn’t limited to the campaigns. Debate viewers were able to see real-time sentiment changes among male and female undecided voters, as soon as the presidential candidates answered a question. Polling aggregators and data-focused journalists, such as Drew Linzer and Nate Silver, were able to call the election for President Barack Obama with incredible accuracy. Linzer had predicted a 332 electoral-vote win for Obama in June 2012, and that is exactly the number Obama won in November. 2

Traditional “talking heads” and propagandistic ad campaigns failed to mold reality into their image. The idea that opinionated Washington insiders with years of experience could call elections on a “hunch” was completely turned on its head by using big data in new ways. In fact, even the Obama campaign’s data analysts ran betting pools about what targeting strategies would work best. Invariably, even these data insiders were wrong and the data drove the next round of decisions on how to allocate campaign resources.

2 Mike Lynch, “Barack Obama’s Big Data won the US election,” Computerworld, November 13, 2012, http://www.computerworld.com/s/article/9233587/Barack_Obama_39_s_Big_Data_won_the_US_election.

Conclusion: Big Data Is For Everyone

Big data is changing the world. Practically everything we do can be recorded. The potential to improve public health, win elections, map the human genome, and cut down on wasteful processes is only the beginning. To borrow Cotton Inc.’s tag line, big data really is the “fabric of our lives.” Whoever explores it more deeply and aggressively first will have that much greater an insight into its commercial, social, and scientific potential and will be able to make decisions that change the course of our lives. Whoever hesitates will be left behind.

If big data is to provide its promised value, it can’t wait for experts. It can’t take weeks to generate a report, when data from the Web and social media is constantly changing. We can’t spend all of our time and effort capturing, storing, and cleansing data, without thinking about that critical “last mile,” where the user interrogates the data. And we can’t neglect that much of the value of unstructured big data from new sources will come from correlation with the standardized, structured enterprise data businesses have carefully been collecting and managing for decades. If big data is truly to be liberated from bottlenecks, it must be exposed and explored in an intuitive, user-friendly way.

Sophisticated, yet easy-to-use methods are required to harness big data’s full potential, for every user in every organization. CITO Research has determined that Business Discovery, especially its ability to simultaneously query real-time and historical databases, will play a major role in delivering big data in a way that is useful to everyone.

This paper was created by CITO Research and sponsored by QlikView.

CITO Research

CITO Research is a source of news, analysis, research, and knowledge for CIOs, CTOs, and other IT and business professionals. CITO Research engages in a dialogue with its audience to capture technology trends that are harvested, analyzed, and communicated in a sophisticated way to help practitioners solve difficult business problems.

Visit us at http://www.citoresearch.com
