© 2012 IBM Corporation. Bringing Big Data to the Enterprise. Dipl.Ing. Wolfgang Nimfuehr, Information Agenda Executive Consultant, Big Data Tiger Team, IBM Software Group Europe. 7 June 2012. [email protected]

EDF2012 Wolfgang Nimfuehr - Bringing Big Data to the Enterprise


Page 1: EDF2012   Wolfgang Nimfuehr - Bringing Big Data to the Enterprise

© 2012 IBM Corporation

Bringing Big Data to the Enterprise

Dipl.Ing. Wolfgang Nimfuehr

Information Agenda Executive Consultant
Big Data Tiger Team
IBM Software Group Europe

7 June 2012

[email protected]

Page 2: EDF2012   Wolfgang Nimfuehr - Bringing Big Data to the Enterprise


Legal Disclaimer

© IBM Corporation 2012. All Rights Reserved.

The information contained in this publication is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this publication, it is provided AS IS without warranty of any kind, express or implied. In addition, this information is based on IBM’s current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this publication or any other materials. Nothing contained in this publication is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software.

References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. Product release dates and/or capabilities referenced in this presentation may change at any time at IBM’s sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results.

Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.

Page 3: EDF2012   Wolfgang Nimfuehr - Bringing Big Data to the Enterprise


The Information Explosion in Data and Real World Events

44x as much data and content over the coming decade: 800,000 petabytes in 2009, 35 zettabytes in 2020

80% of the world’s data is unstructured

Organizations Need Deeper Insights

• 1 in 3 business leaders frequently make decisions based on information they don’t trust, or don’t have
• 1 in 2 business leaders say they don’t have access to the information they need to do their jobs
• 83% of CIOs cited “Business intelligence and analytics” as part of their visionary plans to enhance competitiveness
• 60% of CEOs need to do a better job capturing and understanding information rapidly in order to make swift business decisions
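As a quick check of the 44x figure, the ratio of the two data volumes quoted on this slide can be computed directly. This is a minimal sketch that assumes decimal (SI) units, 1 PB = 10^15 bytes and 1 ZB = 10^21 bytes; the variable names are illustrative and not taken from the slides.

```python
# Decimal (SI) storage units, expressed in bytes (an assumption; the slide
# does not say whether decimal or binary units are meant).
PETABYTE = 10**15
ZETTABYTE = 10**21

data_2009 = 800_000 * PETABYTE  # 2009 volume quoted on the slide
data_2020 = 35 * ZETTABYTE      # 2020 projection quoted on the slide

growth = data_2020 / data_2009
print(f"Growth factor 2009 to 2020: {growth:.2f}x")
```

Under these assumptions the ratio is 43.75, which the slide rounds to 44x.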

Page 4: EDF2012   Wolfgang Nimfuehr - Bringing Big Data to the Enterprise


Challenge: Study a Large Volume and Variety of Data to Find New Insights

Identify criminals and threats from disparate video, audio, and data feeds

Make risk decisions and detect fraud based on real-time transactional data

Predict weather patterns to plan optimal wind turbine usage, and optimize capital expenditure on asset placement

Support medical diagnostics

Detect life-threatening conditions

Multi-channel customer sentiment and experience analysis

Page 5: EDF2012   Wolfgang Nimfuehr - Bringing Big Data to the Enterprise


Leveraging Big Data Analytics can improve Experience

Information Management Capabilities

Internal Data
• Event triggers
• Customer profitability analysis
• Complaint data
• Voice to text data
• Transactional data
• Policy & procedure data
• Relationship / risk data
• Product profitability data
• Email correspondents
• Company website logs

External Data
• Web logs
• Twitter feeds
• Facebook chats
• YouTube video
• Blogs/postings
• Appraisal data
• Credit bureau data

Big Data Analytics Hub

Natural Language

Dashboards, Data Scientist, Call Center, Client Mgr

Page 6: EDF2012   Wolfgang Nimfuehr - Bringing Big Data to the Enterprise


On Feb 16 2011 the IBM Watson system won Jeopardy!

Can we design a computing system that rivals a human’s ability to answer questions posed in natural language, interpreting meaning and context and retrieving, analyzing and understanding vast amounts of information in real-time?

Page 7: EDF2012   Wolfgang Nimfuehr - Bringing Big Data to the Enterprise


The IBM Watson project started in 2007

“IBM is not in the entertainment business. But we are in the business of technology and pushing frontiers.”

David Shepler, IBM Research Program Manager

• Project started in 2007, led by David Ferrucci

• Initial goal: create a system able to process natural language & extract knowledge faster than any other computer or human

• Jeopardy! was chosen because it’s a huge challenge for a computer to find the questions to such “human” answers under time pressure

• Watson was NOT online!

• Watson weighs the probability of his answer being right – doesn’t ring the buzzer if he’s not confident enough

• Which questions Watson got wrong is almost as interesting as which he got right!

Page 8: EDF2012   Wolfgang Nimfuehr - Bringing Big Data to the Enterprise


Different Types of Evidence: Keyword Evidence

Question: In May 1898 Portugal celebrated the 400th anniversary of this explorer’s arrival in India.

Passage: In May, Gary arrived in India after he celebrated his anniversary in Portugal.

Keyword matching between question and passage:
• In May 1898 / In May
• celebrated / celebrated
• 400th anniversary / anniversary
• Portugal / in Portugal
• arrival in / arrived in
• India / India
• explorer / Gary

Evidence suggests “Gary” is the answer, BUT the system must learn that keyword matching may be weak relative to other types of evidence
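The weakness of keyword evidence can be made concrete with a small overlap score. The sketch below is only an illustration, not Watson's actual scorer: the stopword list, the tokenizer and the function names are assumptions, and the second passage is the Vasco da Gama sentence shown on the following slide.

```python
import re

# Hypothetical stopword list, chosen only for this example.
STOPWORDS = {"in", "the", "of", "this", "his", "he", "after", "s"}

def keywords(text):
    """Lowercase the text, split it into word tokens and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {token for token in tokens if token not in STOPWORDS}

def keyword_score(question, passage):
    """Fraction of the question's keywords that also occur in the passage."""
    question_keywords = keywords(question)
    return len(question_keywords & keywords(passage)) / len(question_keywords)

question = ("In May 1898 Portugal celebrated the 400th anniversary "
            "of this explorer's arrival in India.")
gary = ("In May, Gary arrived in India after he celebrated "
        "his anniversary in Portugal.")
gama = "On 27th May 1498, Vasco da Gama landed in Kappad Beach."

print("Gary passage:", round(keyword_score(question, gary), 2))
print("da Gama passage:", round(keyword_score(question, gama), 2))
```

With this scoring the misleading "Gary" passage shares five keywords with the question while the correct passage shares only one, which is why keyword overlap alone has to be weighted against deeper evidence.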

Page 9: EDF2012   Wolfgang Nimfuehr - Bringing Big Data to the Enterprise


Different Types of Evidence: Deeper Evidence

Question: In May 1898 Portugal celebrated the 400th anniversary of this explorer’s arrival in India.

Passage: On 27th May 1498, Vasco da Gama landed in Kappad Beach.

Deeper matching between question and passage:
• Temporal Reasoning (Date Math): May 1898, 400th anniversary / 27th May 1498
• Statistical Paraphrasing (Paraphrases): arrival in / landed in
• GeoSpatial Reasoning (Geo-KB): India / Kappad Beach
• explorer / Vasco da Gama

Stronger evidence can be much harder to find and score.

The evidence is still not 100% certain.

• Search far and wide
• Explore many hypotheses
• Find and judge evidence
• Many inference algorithms
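The temporal reasoning step on this slide amounts to date arithmetic: a 400th anniversary celebrated in May 1898 points to an event in May 1498. A minimal sketch of that check follows; the function name and signature are hypothetical and are not part of DeepQA.

```python
from datetime import date

def anniversary_matches(event_date, celebration_year, celebration_month, nth):
    """True if the nth anniversary of event_date falls in the given month and year."""
    return (event_date.year + nth == celebration_year
            and event_date.month == celebration_month)

# Landing date taken from the passage on the slide.
landing = date(1498, 5, 27)

# Question: "In May 1898 Portugal celebrated the 400th anniversary ..."
print(anniversary_matches(landing, 1898, 5, 400))
```

A passage that only says "In May" gives such a check nothing to work with, which is one reason this kind of evidence counts as stronger than a keyword match.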

Page 10: EDF2012   Wolfgang Nimfuehr - Bringing Big Data to the Enterprise


DeepQA: Massively Parallel Probabilistic Evidence-Based Architecture

Pipeline: Question → Question & Topic Analysis → Question Decomposition → Hypothesis Generation → Hypothesis and Evidence Scoring → Synthesis → Final Confidence Merging & Ranking → Answer & Confidence

Hypothesis Generation and Hypothesis and Evidence Scoring run many times in parallel (. . .)

Scale:
• Multiple interpretations
• 100s of sources
• 100s of possible answers
• 1000’s of pieces of evidence
• 100,000’s of scores from many simultaneous text analysis algorithms
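The last stages of the pipeline, merging evidence scores into a confidence, ranking the candidates and answering only when confident enough, can be sketched as follows. This is an illustration under stated assumptions, not the DeepQA implementation: the logistic combination, the weights, the bias, the threshold and the evidence scores are all hypothetical values chosen for the example.

```python
import math

# Hypothetical weights: keyword overlap counts for little, deeper evidence for more.
WEIGHTS = {"keyword": 0.5, "temporal": 2.0, "geospatial": 1.5, "paraphrase": 1.5}
BIAS = -2.0
BUZZ_THRESHOLD = 0.5  # hypothetical: answer only above this confidence

def confidence(evidence):
    """Merge per-type evidence scores into one confidence between 0 and 1."""
    z = BIAS + sum(WEIGHTS[name] * score for name, score in evidence.items())
    return 1.0 / (1.0 + math.exp(-z))

def rank(candidates):
    """Return (confidence, answer) pairs, highest confidence first."""
    scored = [(confidence(evidence), answer) for answer, evidence in candidates.items()]
    return sorted(scored, reverse=True)

# Illustrative evidence scores for the two candidate answers from the slides.
candidates = {
    "Gary": {"keyword": 0.56, "temporal": 0.0, "geospatial": 0.0, "paraphrase": 0.0},
    "Vasco da Gama": {"keyword": 0.11, "temporal": 1.0, "geospatial": 1.0, "paraphrase": 1.0},
}

ranking = rank(candidates)
best_confidence, best_answer = ranking[0]
should_buzz = best_confidence >= BUZZ_THRESHOLD
print(best_answer, round(best_confidence, 2), should_buzz)
```

The threshold mirrors the earlier remark that Watson does not ring the buzzer unless it is confident enough; in a real system the weights would be learned from training questions rather than set by hand.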

Page 11: EDF2012   Wolfgang Nimfuehr - Bringing Big Data to the Enterprise


Maximum Benefit Requires Combining Deep and Reactive Analytics

[Chart: Data Scale (Kilo, Mega, Giga, Tera, Peta, Exa) plotted against Decision Frequency (Occasional, Frequent, Real-time; yr, mo, wk, day, hr, min, sec, ms, µs)]

• Traditional Data Warehouse and Business Intelligence
• Predictive Analytics: 100,000 records/sec, 6B/day; 10 ms/decision; 6 PB for Deep Analytics
• DeepQA: 100s GB for Deep Analytics; 3 sec/decision; 1 PB training corpus
• Smart Traffic: 250K GPS probes/sec; 630K segments/sec; 2 ms/decision, 4K vehicles
• Real-time Optimization: 100,000 updates/sec, 5 ms/decision; round-trip automation; 10 PB for Deep Analytics

Feedback loop:
• Reactive Analytics (fast): Reality → Observations → Actions
• Deep Analytics (deep): History → Hypotheses → Predictions
• Integration connects the two

Page 12: EDF2012   Wolfgang Nimfuehr - Bringing Big Data to the Enterprise


Big Data use cases across all industries

Utilities
• Weather impact analysis on power generation
• Transmission monitoring
• Smart grid management

Retail
• 360° View of the Customer
• Click-stream analysis
• Real-time promotions

Law Enforcement
• Real-time multimodal surveillance
• Situational awareness
• Cyber security detection

Transportation
• Weather and traffic impact on logistics and fuel consumption

Financial Services
• Fraud detection
• Risk management
• 360° View of the Customer

IT
• Transition log analysis for multiple transactional systems
• Cybersecurity

Health & Life Sciences
• Epidemic early warning system
• ICU monitoring
• Remote healthcare monitoring

Telecommunications
• CDR processing
• Churn prediction
• Geomapping / marketing
• Network monitoring

Page 13: EDF2012   Wolfgang Nimfuehr - Bringing Big Data to the Enterprise


Monetizing Relationships - not just Transactions

Data sources: Social Network, Public Database, Calling Network, combined into a Merged Network

Amy Bearn
32, married, mother of 3, accountant
Telco Score: 91
CPG Score: 76
Fashion Score: 88

Telco company: How valuable is Amy to my mobile phone network? How likely is she to switch carriers? How many other customers will follow?

Telco Retail: How valuable is Amy to my retail sales? Who does she influence? What do they spend?

Page 14: EDF2012   Wolfgang Nimfuehr - Bringing Big Data to the Enterprise


Sample: Big Data 360° Lead Generation

Social Media based 360-degree Consumer Profiles

Personal Attributes
• Identifiers: name, address, age, gender, occupation…
• Interests: sports, pets, cuisine…
• Life Cycle Status: marital, parental

Products Interests
• Personal preferences of products
• Product purchase history
• Suggestions on products & services

Life Events
• Life-changing events: relocation, having a baby, getting married, getting divorced, buying a house…

Timely Insights
• Intent to buy various products
• Current location
• Sentiment on products, services, campaigns
• Incidents damaging reputation
• Customer satisfaction/attrition

Relationships
• Personal relationships: family, friends and roommates…
• Business relationships: co-workers and work/interest network…

Example posts

Monetizable intent to buy products:
• I need a new digital camera for my food pictures, any recommendations around 300?
• What should I buy?? A mini laptop with Windows 7 OR a Apple MacBook!??!

Life Events:
• Looks like we'll be moving to New Orleans sooner than I thought.
• College: Off to Stanford for my MBA! Bbye chicago!

Location announcements:
• I'm at Starbucks Parque Tezontle http://4sq.com/fYReSj

Intent to buy a house:
• I'm thinking about buying a home in Buckingham Estates per a recommendation. Anyone have advice on that area? #atx #austinrealestate #austin
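One simple way to picture how posts like the examples on this slide could be tagged is a set of pattern rules. The sketch below is an assumption made for illustration: the rule labels, the regular expressions and the `classify` function are hypothetical, and a production system would use the extraction, cleansing and normalization described in the deck rather than a handful of patterns.

```python
import re

# Hypothetical rules: (label, pattern). Chosen to fit the sample posts only.
RULES = [
    ("intent to buy a house", re.compile(r"\bbuying a (home|house)\b", re.I)),
    ("life event: relocation", re.compile(r"\b(moving to|off to)\b", re.I)),
    ("location announcement", re.compile(r"\bI'm at\b", re.I)),
    ("intent to buy a product", re.compile(r"\b(i need a new|what should i buy)\b", re.I)),
]

def classify(post):
    """Return the labels of all rules whose pattern occurs in the post."""
    return [label for label, pattern in RULES if pattern.search(post)]

posts = [
    "I'm thinking about buying a home in Buckingham Estates per a recommendation.",
    "Looks like we'll be moving to New Orleans sooner than I thought.",
    "College: Off to Stanford for my MBA! Bbye chicago!",
    "I'm at Starbucks Parque Tezontle",
    "I need a new digital camera for my food pictures, any recommendations around 300?",
    "What should I buy?? A mini laptop with Windows 7 OR a Apple MacBook!??!",
]

for post in posts:
    print(classify(post), "<-", post)
```

Rules this literal miss paraphrases and misfire on promotional messages or wishful thinking, which is why the attributes still need to be extracted, cleansed, normalized and categorized.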

Page 15: EDF2012   Wolfgang Nimfuehr - Bringing Big Data to the Enterprise


Sample: Big Data 360° Lead Generation

• Micro-segmentation of consumers by hobbies
• Micro-segmentation of product intents by occupation
• Real-time product intents enriched with consumer attributes
• Real-time tracking by micro-segmentation
• Integration across Social Media sites

Entries contain promotional messages, wishful thinking, questions, etc.

For many of the attributes we need to extract, cleanse, normalize and categorize

Page 16: EDF2012   Wolfgang Nimfuehr - Bringing Big Data to the Enterprise


Sample: Institutional Risk Application

Comprehensive view of publicly traded companies and related people based on regulatory filings

Extract

Integrate

Page 17: EDF2012   Wolfgang Nimfuehr - Bringing Big Data to the Enterprise


Requirements for a Big Data Solution Platform

Analyze Information in Motion
• Streaming data analysis
• Large volume data bursts & ad-hoc analysis

Analyze a Variety of Information
• Novel analytics on a broad set of mixed information that could not be analyzed before
• Multiple relational & non-relational data types and schemas

Discover & Experiment
• Ad-hoc analytics, data discovery & experimentation

Analyze Extreme Volumes of Information
• Cost-efficiently process and analyze petabytes of information
• Manage & analyze high volumes of structured, relational data

Manage & Plan
• Enforce data structure, integrity and control to ensure consistency for repeatable queries

Page 18: EDF2012   Wolfgang Nimfuehr - Bringing Big Data to the Enterprise


IBM Big Data Platform for Ingest, Data and Analytics

New analytic applications drive the requirements for a big data platform

• Integrate and manage the full variety, velocity and volume of data

• Apply advanced analytics to information in its native form

• Visualize all available data for ad-hoc analysis

• Development environment for building new analytic applications

• Workload optimization and scheduling

• Security and Governance

Analytic Applications: BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics

IBM Big Data Platform: Visualization & Discovery, Application Development, Systems Management, Accelerators, Hadoop System, Stream Computing, Data Warehouse, Information Integration & Governance

Page 19: EDF2012   Wolfgang Nimfuehr - Bringing Big Data to the Enterprise


Big Data Capabilities

Big Data Challenges and the IBM Big Data Solutions that address them

SQL Data
• High volume of structured data
• Valuable information
• Compute intensive analytics
• Low latency response on queries
• Business Intelligence and Analytics
• Understanding the customer through segmentation and analysis
Solution: IBM Netezza, an analytic appliance for high speed, advanced analytics on large structured data sets

NoSQL Data
• Very high volumes (TBs to PBs) of unstructured data
• Exploration and discovery
• Text, Entity and Social Media Analytics
Solution: IBM BigInsights, Hadoop-based processing for analytics on variety and volumes of data

Streaming
• Real time processing
• Detect failure patterns
• High volume, low latency processing
• Scoring and decision analytics
Solution: IBM Streams, low latency analytics for streaming data

Page 20: EDF2012   Wolfgang Nimfuehr - Bringing Big Data to the Enterprise


Based on open source & IBM technologies

Distinguishing characteristics

• Built-in analytics enhances business knowledge

• Enterprise software integration complements and extends existing capabilities

• Production-ready platform with tooling for analysts, developers, and administrators speeds time-to-value and simplifies development/maintenance

IBM advantage

• Combination of software, hardware, services and advanced research

InfoSphere BigInsights: Analytical platform for Big Data at-rest

[Diagram: the Analytic Applications and IBM Big Data Platform stack, with the same components as listed on page 18]

Page 21: EDF2012   Wolfgang Nimfuehr - Bringing Big Data to the Enterprise


InfoSphere BigInsights: Embrace and Extend Hadoop

[Architecture diagram; labels are listed in extraction order, so their grouping into layers is not preserved]

Layer labels: Storage, Application, Data Sources/Connectors, Analytics, Interface

Components shown: HDFS, HBase, GPFS-SNC *), AdaptiveMR, ZooKeeper, Avro, Pig, Hive, Jaql, MapReduce, Flume, JDBC, Netezza, BoardReader, DB2, Streams, Web Crawler, Oozie, ML Analytics *), Text Analytics, Lucene, R, CSV/XML/JSON, DataStage, SPSS, IBM LZO Compression, BigSheets, BigIndex, FLEX

Access: Management Console (browser based), Development Tooling (Eclipse Plug-Ins), REST API (for applications)

Legend: Open Source, IBM

*) future release

Page 22: EDF2012   Wolfgang Nimfuehr - Bringing Big Data to the Enterprise


BigSheets: A visual tool for data manipulation and prototyping

• Ad-hoc analytics for LOB user

• Analyze a variety of data - unstructured and structured

• Spreadsheet metaphor for exploring/visualizing data

• Browser-based

Page 23: EDF2012   Wolfgang Nimfuehr - Bringing Big Data to the Enterprise


Text Analytics

Pre-configured text annotators ready for distributed processing on Big Data

Support for native languages including double-byte

Turns disparate words into measurable insights

• Identify positive or negative sentiment, NLP-based analytics, define variables, macros and rules.
• Physically assemble data, standardize formats, address auto-identify language, process punctuation and non-grammatical characters, standardize spelling.
• Part-of-speech identification, standard and customized extraction dictionaries, proper noun identification, concept categorization, synonyms, exclusions, multi-terms, regular expressions, fuzzy-matching
• Iterative classification using automated and manual techniques. Concept derivation & inclusion, semantic networks and co-occurrence rules
• Reporting/Monitoring social commentary, combination w/structured data, clustering, associated concepts, correlated concepts, auto-classification of documents, sites, posts.

Page 24: EDF2012   Wolfgang Nimfuehr - Bringing Big Data to the Enterprise


Perspective: The Vestas Wind library

• Public wind data is available on 284 km x 284 km grids (2.5° LAT/LONG)
• More data means more accurate and richer models (adding hundreds of variables)
  - Vestas wind library at 2.5 PB: to grow to over 6 PB in the near-term
  - Granularity 27 km x 27 km grids: driving to 9x9, 3x3 to 10 m x 10 m simulations
• Reduced turbine placement identification from weeks to hours
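To see why finer granularity drives the wind library from petabytes toward more petabytes, it helps to count how many fine grid cells cover one coarse cell. The sketch below uses only the grid sizes quoted on this slide; the function name is hypothetical, and the calculation ignores the extra variables per cell that the slide also mentions.

```python
def cell_multiplier(coarse_km, fine_km):
    """How many fine grid cells cover the area of one coarse grid cell."""
    return (coarse_km / fine_km) ** 2

# Grid sizes from the slide: 284 km public data, then 27 km, 9 km, 3 km and 10 m.
for fine_km in (27, 9, 3, 0.01):
    cells = cell_multiplier(284, fine_km)
    print(f"{fine_km} km cells: about {cells:,.0f} per 284 km cell")
```

Each threefold refinement of the grid edge multiplies the number of cells by nine, so storage and compute needs grow much faster than the resolution itself.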

Page 25: EDF2012   Wolfgang Nimfuehr - Bringing Big Data to the Enterprise


Built to analyze data in motion

• Multiple concurrent input streams

• Massive scalability

Process and analyze a variety of data

• Structured, unstructured content, video, audio

• Advanced analytic operators

InfoSphere Streams: Analytical platform for Big Data in-motion

[Diagram: the Analytic Applications and IBM Big Data Platform stack, with the same components as listed on page 18]

Page 26: EDF2012   Wolfgang Nimfuehr - Bringing Big Data to the Enterprise


InfoSphere Streams: Massively Scalable Stream Analytics

Linear Scalability
• Clustered deployments – unlimited scalability

Automated Deployment
• Automatically optimize operator deployment across clusters

Performance Optimization
• JVM Sharing – minimize memory use
• Fuse operators on same cluster
• Telco client – 25 Million messages per second

Analytics on Streaming Data
• Analytic accelerators for a variety of data types
• Optimized for real-time performance

[Diagram labels: Streaming Data Sources, Source Adapters, Analytic Operators, Sink Adapters, Streams Runtime, Deployments, Automated and Optimized Deployment, Visualization, Streams Studio IDE]
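The source, operator and sink structure in this diagram can be pictured with plain Python generators. This is only a sketch of the processing style and makes no use of the InfoSphere Streams runtime or the Streams Processing Language; the function names, the readings and the threshold are hypothetical.

```python
from collections import deque

def source(readings):
    """Stand-in for a source adapter: emits one value at a time."""
    for reading in readings:
        yield reading

def moving_average(stream, window_size):
    """Analytic operator: average over the most recent `window_size` values."""
    window = deque(maxlen=window_size)
    for value in stream:
        window.append(value)
        yield sum(window) / len(window)

def sink(stream, threshold):
    """Stand-in for a sink adapter: collects positions where the average exceeds the threshold."""
    return [index for index, average in enumerate(stream) if average > threshold]

# Hypothetical sensor readings with a burst in the middle.
readings = [10, 12, 11, 30, 35, 33, 12]
alerts = sink(moving_average(source(readings), window_size=3), threshold=20)
print(alerts)
```

Each value is processed as it arrives and only a small window is kept in memory, which is the property that lets analytics on data in motion keep up with high message rates.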

Page 27: EDF2012   Wolfgang Nimfuehr - Bringing Big Data to the Enterprise


University of Ontario Institute of Technology

• Use case
  – Neonatal infant monitoring
  – Predict infection in ICU 24 hours in advance
• Solutions
  – 120 children monitored: 120K msg/sec, billion msg/day
  – Trials expanding to include hospitals in US and China

Stream-based Distributed Interoperable Healthcare Infrastructure

[Diagram labels: Sensor Network, Event Pre-processor, Analysis Framework, Solutions (Applications)]

Page 28: EDF2012   Wolfgang Nimfuehr - Bringing Big Data to the Enterprise


Without a Big Data Platform You Code…

Multithreading, Custom SQL and Scripts, Performance Optimization, Debug, Application Management, Event Handling, Connectors, Checkpointing, Security, HA, Accelerators and Toolkits

Streams provides development, deployment, runtime, and infrastructure services

“TerraEchos developers can deliver applications 45% faster due to the agility of Streams Processing Language…” – Alex Philip, CEO and President, TerraEchos

Over 100 sample applications and toolkits, including industry-focused toolkits with 300+ functions and operators

Page 29: EDF2012   Wolfgang Nimfuehr - Bringing Big Data to the Enterprise


IBM is Committed to Innovation

• $16B+ in acquisitions since 2005

• 10,000+ technical professionals

• ~8000 dedicated consultants

• 27,000+ business partner certifications

• 8 Analytics Solutions Centers

• 100 analytics-based research assets; almost 300 researchers

Selected SW Acquisitions, 2005 to 2012

* TeaLeaf, Varicent, Vivisimo pending acquisition close

IBM Research: Almaden, Austin, Melbourne, Sao Paulo, Beijing, Haifa, Delhi, Ireland, Yamato, Watson, Zurich

“Watson is going to revolutionize many, many industries and it will fundamentally change the way we interact with computers & machines.”

John Kelly, SVP & Head of IBM Research

Page 30: EDF2012   Wolfgang Nimfuehr - Bringing Big Data to the Enterprise


bigdatauniversity.com/

ibm.com/software/data/bigdata/

Making Learning Easy and Fun

ibm.com/software/data/infosphere/biginsights/

youtube.com/user/ibmbigdata

Page 31: EDF2012   Wolfgang Nimfuehr - Bringing Big Data to the Enterprise


Dipl.Ing.Wolfgang Nimführ

Information Agenda Executive Consultant
Big Data Tiger Team
IBM Software Group Europe

IBM Austria
Obere Donaustrasse 95
A1020 Vienna

Tel [email protected]

Questions & Answers