© 2012 IBM Corporation
Bringing Big Data to the Enterprise
Dipl.-Ing. Wolfgang Nimfuehr
Information Agenda Executive Consultant
Big Data Tiger Team
IBM Software Group Europe
7 June 2012
Legal Disclaimer
© IBM Corporation 2012. All Rights Reserved.
The information contained in this publication is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this publication, it is provided AS IS without warranty of any kind, express or implied. In addition, this information is based on IBM’s current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this publication or any other materials. Nothing contained in this publication is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software.
References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. Product release dates and/or capabilities referenced in this presentation may change at any time at IBM’s sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results.
Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
The Information Explosion in Data and Real World Events
• 44x as much data and content over the coming decade: 800,000 petabytes in 2009, 35 zettabytes in 2020
• 80% of the world’s data is unstructured
Organizations Need Deeper Insights
• 1 in 3 business leaders frequently make decisions based on information they don’t trust, or don’t have
• 1 in 2 business leaders say they don’t have access to the information they need to do their jobs
• 83% of CIOs cited “Business intelligence and analytics” as part of their visionary plans to enhance competitiveness
• 60% of CEOs need to do a better job capturing and understanding information rapidly in order to make swift business decisions
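The 44x growth figure can be checked from the two data volumes quoted on this slide. The short sketch below assumes decimal (SI) prefixes, which the slide does not state explicitly.

```python
# Assumption: decimal (SI) prefixes, so 1 zettabyte = 1,000,000 petabytes.
PB_PER_ZB = 10**6

data_2009_pb = 800_000          # 2009 volume from the slide, in petabytes
data_2020_pb = 35 * PB_PER_ZB   # 2020 volume from the slide, converted to petabytes

growth_factor = data_2020_pb / data_2009_pb
print(f"{growth_factor:.2f}x")  # 43.75x, which the slide rounds to 44x
```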
Challenge: Study a Large Volume and Variety of Data to Find New Insights
Identify criminals and threats from disparate video, audio, and data feeds
Make risk decisions and detect fraud based on real-time transactional data
Predict weather patterns to plan optimal wind turbine usage, and optimize capital expenditure on asset placement
Support medical diagnostics
Detect life-threatening conditions
Multi-channel customer sentiment and experience analysis
Leveraging Big Data Analytics can improve Experience
Information Management Capabilities
Internal Data
• Event triggers
• Customer profitability analysis
• Complaint data
• Voice-to-text data
• Transactional data
• Policy & procedure data
• Relationship / risk data
• Product profitability data
• Email correspondents
• Company website logs
External Data
• Web logs
• Twitter feeds
• Facebook chats
• YouTube video
• Blogs/postings
• Appraisal data
• Credit bureau data
Big Data Analytics Hub, Natural Language
Dashboards, Data Scientist, Call Center, Client Mgr
On Feb 16 2011 the IBM Watson system won Jeopardy!
Can we design a computing system that rivals a human’s ability to answer questions posed in natural language, interpreting meaning and context and retrieving, analyzing and understanding vast amounts of information in real-time?
IBM Watson: the project started in 2007
“IBM is not in the entertainment business. But we are in the business of technology and pushing frontiers.”
David Shepler, IBM Research Program Manager
• Project started in 2007, led by David Ferrucci
• Initial goal: create a system able to process natural language & extract knowledge faster than any other computer or human
• Jeopardy! was chosen because it’s a huge challenge for a computer to find the questions to such “human” answers under time pressure
• Watson was NOT online!
• Watson weighs the probability of his answer being right – doesn’t ring the buzzer if he’s not confident enough
• Which questions Watson got wrong is almost as interesting as which he got right!
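The buzzer behaviour described above (answer only when confident enough) can be illustrated with a minimal sketch. The function name `should_buzz`, the threshold value and the candidate confidences are all hypothetical. The slides do not say how Watson actually sets its threshold.

```python
def should_buzz(candidates, threshold=0.5):
    """Return the top-ranked answer if its confidence clears the threshold, else None.

    candidates: list of (answer, confidence) pairs with confidence in [0, 1].
    threshold:  hypothetical cut-off; the real system's value is not given in the slides.
    """
    if not candidates:
        return None
    answer, confidence = max(candidates, key=lambda pair: pair[1])
    return answer if confidence >= threshold else None


# Illustrative (made-up) confidences.
print(should_buzz([("Vasco da Gama", 0.93), ("Gary", 0.12)]))  # buzzes with the top answer
print(should_buzz([("Gary", 0.31)]))                           # stays silent: None
```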
Different Types of Evidence: Keyword Evidence
Clue: In May 1898 Portugal celebrated the 400th anniversary of this explorer’s arrival in India.
Passage: In May, Gary arrived in India after he celebrated his anniversary in Portugal.
Keyword matching between clue and passage:
• In May 1898 / In May
• celebrated / celebrated
• 400th anniversary / anniversary
• Portugal / in Portugal
• arrival in / arrived in
• India / India
• explorer / Gary
Evidence suggests “Gary” is the answer, BUT the system must learn that keyword matching may be weak relative to other types of evidence.
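To see why keyword evidence alone is misleading here, the sketch below scores passages by simple keyword overlap with the clue. The stop-word list and the `keyword_overlap` function are illustrative assumptions, not Watson's actual scorer. The "Gary" passage shares far more keywords with the clue than a passage about the true answer does.

```python
import re

# Hypothetical stop-word list, kept small for illustration.
STOPWORDS = {"the", "of", "this", "in", "on", "he", "his", "after", "a", "s"}

def keywords(text):
    """Lower-case the text, split it into alphanumeric tokens and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}

def keyword_overlap(clue, passage):
    """Fraction of the clue's keywords that also occur in the passage."""
    clue_kw = keywords(clue)
    return len(clue_kw & keywords(passage)) / len(clue_kw)

CLUE = "In May 1898 Portugal celebrated the 400th anniversary of this explorer's arrival in India."
GARY = "In May, Gary arrived in India after he celebrated his anniversary in Portugal."
VASCO = "On 27th May 1498, Vasco da Gama landed in Kappad Beach."

print(keyword_overlap(CLUE, GARY))   # 5 of 9 clue keywords match
print(keyword_overlap(CLUE, VASCO))  # only 1 of 9 matches ("may")
```

Under this naive measure the wrong passage wins, which is the point the slide makes about the weakness of keyword matching.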
Different Types of Evidence: Deeper Evidence
Clue: In May 1898 Portugal celebrated the 400th anniversary of this explorer’s arrival in India.
Passage: On 27th May 1498, Vasco da Gama landed in Kappad Beach
Evidence linking clue and passage:
• Temporal Reasoning (Date Math): May 1898, 400th anniversary / 27th May 1498
• Statistical Paraphrasing (Paraphrases): arrival in / landed in
• GeoSpatial Reasoning (Geo-KB): India / Kappad Beach
• explorer / Vasco da Gama
What this requires:
• Search far and wide
• Explore many hypotheses
• Find and judge evidence
• Many inference algorithms
Stronger evidence can be much harder to find and score.
The evidence is still not 100% certain.
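The temporal-reasoning step on this slide amounts to date arithmetic: a 400th anniversary celebrated in May 1898 points back to an event in May 1498. A minimal sketch follows. The function name and signature are hypothetical and say nothing about how DeepQA implements its date math.

```python
from datetime import date

def supports_anniversary(event: date, celebrated_year: int,
                         celebrated_month: int, nth: int) -> bool:
    """True if `event` falls in the month and year implied by an nth anniversary."""
    return event.year == celebrated_year - nth and event.month == celebrated_month

# Passage: "On 27th May 1498, Vasco da Gama landed in Kappad Beach"
landing = date(1498, 5, 27)

# Clue: "In May 1898 Portugal celebrated the 400th anniversary ..."
print(supports_anniversary(landing, celebrated_year=1898, celebrated_month=5, nth=400))  # True
```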
DeepQA: Massively Parallel Probabilistic Evidence-Based Architecture
Processing stages:
• Question
• Question & Topic Analysis
• Question Decomposition
• Hypothesis Generation
• Hypothesis and Evidence Scoring
• Synthesis
• Final Confidence Merging & Ranking
• Answer & Confidence
Scale at each step:
• Multiple interpretations
• 100s of sources
• 100s of possible answers
• 1000s of pieces of evidence
• 100,000s of scores from many simultaneous text analysis algorithms
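The last stages of this architecture (evidence scoring, then confidence merging and ranking) can be sketched as follows. This is an illustration under stated assumptions, not the DeepQA implementation: the logistic merge, the scorer names and all weights and scores are hypothetical.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Hypothesis:
    answer: str
    scores: dict = field(default_factory=dict)  # evidence scorer name -> score in [0, 1]

def merge_confidence(scores, weights, bias=0.0):
    """Merge per-scorer evidence into one confidence with a logistic function (assumed)."""
    z = bias + sum(weights[name] * value for name, value in scores.items())
    return 1.0 / (1.0 + math.exp(-z))

def rank(hypotheses, weights):
    """Return (confidence, answer) pairs sorted from most to least confident."""
    ranked = [(merge_confidence(h.scores, weights), h.answer) for h in hypotheses]
    return sorted(ranked, reverse=True)

# Hypothetical weights: deeper evidence counts for more than keyword matching.
WEIGHTS = {"keyword": 0.5, "temporal": 3.0, "geospatial": 2.0}

candidates = [
    Hypothesis("Gary", {"keyword": 0.56, "temporal": 0.0, "geospatial": 0.0}),
    Hypothesis("Vasco da Gama", {"keyword": 0.11, "temporal": 1.0, "geospatial": 1.0}),
]

best_confidence, best_answer = rank(candidates, WEIGHTS)[0]
print(best_answer)  # Vasco da Gama
```

In a real system the weights would be learned from training questions rather than set by hand.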
Maximum Benefit Requires Combining Deep and Reactive Analytics
[Chart: data scale (Kilo, Mega, Giga, Tera, Peta, Exa) plotted against decision frequency (Occasional, Frequent, Real-time; yr, mo, wk, day, hr, min, sec … ms, µs)]
• Traditional Data Warehouse and Business Intelligence
• Predictive Analytics: 100,000 records/sec, 6B/day; 10 ms/decision; 6 PB for Deep Analytics
• DeepQA: 100s GB for Deep Analytics; 3 sec/decision; 1 PB training corpus
• Smart Traffic: 250K GPS probes/sec; 630K segments/sec; 2 ms/decision, 4K vehicles
• Real-time Optimization: 100,000 updates/sec, 5 ms/decision; round-trip automation; 10 PB for Deep Analytics
[Diagram: feedback loop integrating Reactive Analytics (fast: Observations, Reality → Actions) with Deep Analytics (deep: History, Hypotheses → Predictions)]
Big Data use cases across all industries
Utilities
• Weather impact analysis on power generation
• Transmission monitoring
• Smart grid management
Retail
• 360° View of the Customer
• Click-stream analysis
• Real-time promotions
Law Enforcement
• Real-time multimodal surveillance
• Situational awareness
• Cyber security detection
Transportation
• Weather and traffic impact on logistics and fuel consumption
Financial Services
• Fraud detection
• Risk management
• 360° View of the Customer
IT
• Transition log analysis for multiple transactional systems
• Cybersecurity
Health & Life Sciences
• Epidemic early warning system
• ICU monitoring
• Remote healthcare monitoring
Telecommunications
• CDR processing
• Churn prediction
• Geomapping / marketing
• Network monitoring
Monetizing Relationships - not just Transactions
[Diagram: Social Network, Public Database and Calling Network combined into a Merged Network]
Amy Bearn: 32, married, mother of 3, accountant
Telco Score: 91, CPG Score: 76, Fashion Score: 88
Telco company
• How valuable is Amy to my mobile phone network?
• How likely is she to switch carriers?
• How many other customers will follow?
Retail
• How valuable is Amy to my retail sales?
• Who does she influence?
• What do they spend?
Sample: Big Data 360° Lead Generation
Social Media based 360-degree Consumer Profiles
Personal Attributes
• Identifiers: name, address, age, gender, occupation…
• Interests: sports, pets, cuisine…
• Life Cycle Status: marital, parental
Products Interests
• Personal preferences of products
• Product purchase history
• Suggestions on products & services
Life Events
• Life-changing events: relocation, having a baby, getting married, getting divorced, buying a house…
Timely Insights
• Intent to buy various products
• Current location
• Sentiment on products, services, campaigns
• Incidents damaging reputation
• Customer satisfaction/attrition
Relationships
• Personal relationships: family, friends and roommates…
• Business relationships: co-workers and work/interest network…
Example posts
Monetizable intent to buy products
• I need a new digital camera for my food pictures, any recommendations around 300?
• What should I buy?? A mini laptop with Windows 7 OR a Apple MacBook!??!
Life Events
• Intent to buy a house: I'm thinking about buying a home in Buckingham Estates per a recommendation. Anyone have advice on that area? #atx #austinrealestate #austin
• Looks like we'll be moving to New Orleans sooner than I thought.
• College: Off to Stanford for my MBA! Bbye chicago!
Location announcements
• I'm at Starbucks Parque Tezontle http://4sq.com/fYReSj
Sample: Big Data 360° Lead Generation
• Micro-segmentation of consumers by hobbies
• Micro-segmentation of product intents by occupation
• Real-time product intents enriched with consumer attributes
• Real-time tracking by micro-segmentation
• Integration across Social Media sites
• Entries contain promotional messages, wishful thinking, questions, etc
• For many of the attributes we need to extract, cleanse, normalize and categorize
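The "extract, cleanse, normalize and categorize" step can be sketched with a few hand-written rules applied to the sample posts from the previous slide. The rule set, the labels and the function names are hypothetical. A production system would use far richer text analytics than three regular expressions.

```python
import re

# Hypothetical rules; each label maps to a pattern that signals it.
INTENT_RULES = {
    "intent_to_buy": re.compile(r"\b(i need a new|what should i buy|thinking about buying)\b", re.I),
    "relocation": re.compile(r"\b(moving to|off to)\b", re.I),
    "location": re.compile(r"^i'm at\b", re.I),
}

def normalize(post):
    """Cleanse a post: drop URLs and hashtags, then collapse whitespace."""
    post = re.sub(r"https?://\S+", "", post)
    post = re.sub(r"#\w+", "", post)
    return re.sub(r"\s+", " ", post).strip()

def categorize(post):
    """Return the sorted list of labels whose rule matches the normalized post."""
    text = normalize(post)
    return sorted(label for label, rule in INTENT_RULES.items() if rule.search(text))

print(categorize("I need a new digital camera for my food pictures, any recommendations around 300?"))
print(categorize("Looks like we'll be moving to New Orleans sooner than I thought."))
print(categorize("I'm at Starbucks Parque Tezontle http://4sq.com/fYReSj"))
```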
Sample: Institutional Risk Application
Comprehensive view of publicly traded companies and related people based on regulatory filings
Extract
Integrate
Requirements for a Big Data Solution Platform
Analyze Information in Motion
• Streaming data analysis
• Large volume data bursts & ad-hoc analysis
Analyze a Variety of Information
• Novel analytics on a broad set of mixed information that could not be analyzed before
• Multiple relational & non-relational data types and schemas
Discover & Experiment
• Ad-hoc analytics, data discovery & experimentation
Analyze Extreme Volumes of Information
• Cost-efficiently process and analyze petabytes of information
• Manage & analyze high volumes of structured, relational data
Manage & Plan
• Enforce data structure, integrity and control to ensure consistency for repeatable queries
IBM Big Data Platform for Ingest, Data and Analytics
New analytic applications drive the requirements for a big data platform
• Integrate and manage the full variety, velocity and volume of data
• Apply advanced analytics to information in its native form
• Visualize all available data for ad-hoc analysis
• Development environment for building new analytic applications
• Workload optimization and scheduling
• Security and Governance
Analytic Applications
• BI / Reporting
• Exploration / Visualization
• Functional App
• Industry App
• Predictive Analytics
• Content Analytics
IBM Big Data Platform
• Visualization & Discovery
• Application Development
• Systems Management
• Accelerators
• Hadoop System
• Stream Computing
• Data Warehouse
• Information Integration & Governance
Big Data Capabilities
SQL Data
Big Data Challenges
• High volume of structured data
• Valuable information
• Compute intensive analytics
• Low latency response on queries
• Business Intelligence and Analytics
• Understanding the customer through segmentation and analysis
IBM Big Data Solution: IBM Netezza, an analytic appliance for high speed, advanced analytics on large structured data sets
NoSQL Data
Big Data Challenges
• Very high volumes (TBs to PBs) of unstructured data
• Exploration and discovery
• Text, Entity and Social Media Analytics
IBM Big Data Solution: IBM BigInsights, Hadoop-based processing for analytics on variety and volumes of data
Streaming
Big Data Challenges
• Real time processing
• Detect failure patterns
• High volume, low latency processing
• Scoring and decision analytics
IBM Big Data Solution: IBM Streams, low latency analytics for streaming data
InfoSphere BigInsights
Analytical platform for Big Data at-rest
Based on open source & IBM technologies
Distinguishing characteristics
• Built-in analytics enhances business knowledge
• Enterprise software integration complements and extends existing capabilities
• Production-ready platform with tooling for analysts, developers, and administrators speeds time-to-value and simplifies development/maintenance
IBM advantage
• Combination of software, hardware, services and advanced research
InfoSphere BigInsights
Embrace and Extend Hadoop
• Management Console (browser based)
• Development Tooling (Eclipse plug-ins)
• REST API (for applications)
• Interface: BigSheets
• Analytics: ML Analytics *), Text Analytics, Lucene, R, BigIndex
• Application: AdaptiveMR, MapReduce, Pig, Hive, Jaql, Oozie, ZooKeeper, Avro, FLEX
• Storage: HDFS, HBase, GPFS-SNC *), IBM LZO Compression
• Data Sources/Connectors: Flume, JDBC, Netezza, BoardReader, DB2, Streams, Web Crawler, CSV/XML/JSON, DataStage, SPSS
Legend: Open Source; IBM
*) future release
BigSheets: a visual tool for data manipulation and prototyping
• Ad-hoc analytics for LOB user
• Analyze a variety of data - unstructured and structured
• Spreadsheet metaphor for exploring/ visualizing data
• Browser-based
Text Analytics
• Pre-configured text annotators ready for distributed processing on Big Data
• Support for native languages including double-byte
• Turns disparate words into measurable insights
Capabilities:
• Physically assemble data, standardize formats, address auto-identify language, process punctuation and non-grammatical characters, standardize spelling.
• Part-of-speech identification, standard and customized extraction dictionaries, proper noun identification, concept categorization, synonyms, exclusions, multi-terms, regular expressions, fuzzy-matching
• Identify positive or negative sentiment, NLP-based analytics, define variables, macros and rules.
• Iterative classification using automated and manual techniques. Concept derivation & inclusion, semantic networks and co-occurrence rules
• Reporting/Monitoring social commentary, combination with structured data, clustering, associated concepts, correlated concepts, auto-classification of documents, sites, posts.
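Two of the capabilities listed on this slide, extraction dictionaries and fuzzy matching, can be sketched with Python's standard `difflib` module. This is not the BigInsights text analytics engine. The dictionary contents, the function name and the similarity cutoff are illustrative assumptions.

```python
import difflib
import re

# Hypothetical extraction dictionary of product terms.
PRODUCT_DICTIONARY = ["camera", "laptop", "macbook"]

def extract_products(text, cutoff=0.8):
    """Return dictionary terms that match a token in the text, tolerating misspellings."""
    found = []
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # get_close_matches returns the best dictionary entries whose similarity
        # ratio to the token is at least `cutoff` (at most n of them).
        match = difflib.get_close_matches(token, PRODUCT_DICTIONARY, n=1, cutoff=cutoff)
        if match and match[0] not in found:
            found.append(match[0])
    return found

print(extract_products("What should I buy?? A mini laptop with Windows 7 OR a Apple MacBook!??!"))
print(extract_products("I need a new digital camra"))  # misspelling still resolves to "camera"
```

A lower cutoff catches more misspellings but also produces more false matches, so the value would need tuning on real data.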
Perspective: The Vestas Wind Library
• Public wind data is available on 284 km x 284 km grids (2.5° lat/long)
• More data means more accurate and richer models (adding hundreds of variables)
  - Vestas wind library at 2.5 PB: to grow to over 6 PB in the near-term
  - Granularity 27 km x 27 km grids: driving to 9x9, 3x3 to 10 m x 10 m simulations
• Reduced turbine placement identification from weeks to hours
InfoSphere Streams
Analytical platform for Big Data in-motion
Built to analyze data in motion
• Multiple concurrent input streams
• Massive scalability
Process and analyze a variety of data
• Structured, unstructured content, video, audio
• Advanced analytic operators
InfoSphere Streams
Massively Scalable Stream Analytics
Linear Scalability
• Clustered deployments – unlimited scalability
Automated Deployment
• Automatically optimize operator deployment across clusters
Performance Optimization
• JVM Sharing – minimize memory use
• Fuse operators on same cluster
• Telco client – 25 Million messages per second
Analytics on Streaming Data
• Analytic accelerators for a variety of data types
• Optimized for real-time performance
[Diagram: Streaming Data Sources feed Source Adapters, Analytic Operators and Sink Adapters on the Streams Runtime, with Automated and Optimized Deployment, Visualization, and the Streams Studio IDE]
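The source adapter, analytic operator and sink arrangement in this diagram can be sketched as a chain of Python generators. This is only an analogy. Streams applications are written in the Streams Processing Language, and the operator names, window size and threshold below are hypothetical.

```python
from collections import deque

def source(readings):
    """Source adapter: emit readings one at a time, as if they arrived from a feed."""
    for reading in readings:
        yield reading

def sliding_mean(stream, size):
    """Analytic operator: mean over the most recent `size` readings."""
    window = deque(maxlen=size)
    for value in stream:
        window.append(value)
        yield sum(window) / len(window)

def alert_sink(stream, threshold):
    """Sink: collect the positions at which the windowed mean exceeds the threshold."""
    return [i for i, mean in enumerate(stream) if mean > threshold]

readings = [1, 1, 1, 10, 10, 10]
alerts = alert_sink(sliding_mean(source(readings), size=3), threshold=5)
print(alerts)  # [4, 5]
```

Each operator processes one item at a time and keeps only a bounded window in memory, which is the property that lets stream analytics handle unbounded input.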
University of Ontario Institute of Technology
Use case
• Neonatal infant monitoring
• Predict infection in ICU 24 hours in advance
Solutions
• 120 children monitored: 120K msg/sec, billion msg/day
• Trials expanding to include hospitals in US and China
Stream-based Distributed Interoperable Health care Infrastructure
[Diagram: Sensor Network, Event Pre-processor, Analysis Framework, Solutions (Applications)]
Without a Big Data Platform You Code…
• Multithreading
• Custom SQL and Scripts
• Performance Optimization
• Debug
• Application Management
• Event Handling
• Connectors
• Check Pointing
• Security
• HA
• Accelerators and Toolkits
Streams provides development, deployment, runtime, and infrastructure services
“TerraEchos developers can deliver applications 45% faster due to the agility of Streams Processing Language…”
– Alex Philip, CEO and President, TerraEchos
Over 100 sample applications and toolkits, including industry-focused toolkits with 300+ functions and operators
IBM is Committed to Innovation
• $16B+ in acquisitions since 2005
• 10,000+ technical professionals
• ~8000 dedicated consultants
• 27,000+ business partner certifications
• 8 Analytics Solutions Centers
• 100 analytics-based research assets; almost 300 researchers
Selected SW Acquisitions, 2005 to 2012
* TeaLeaf, Varicent, Vivisimo: pending acquisition close
IBM Research: Almaden, Austin, Melbourne, Sao Paulo, Beijing, Haifa, Delhi, Ireland, Yamato, Watson, Zurich
“Watson is going to revolutionize many, many industries and it will fundamentally change the way we interact with computers & machines.”
John Kelly, SVP & Head of IBM Research
bigdatauniversity.com/
ibm.com/software/data/bigdata/
Making Learning Easy and Fun
ibm.com/software/data/infosphere/biginsights/
youtube.com/user/ibmbigdata
Dipl.-Ing. Wolfgang Nimführ
Information Agenda Executive Consultant
Big Data Tiger Team
IBM Software Group Europe
IBM Austria
Obere Donaustrasse 95
A-1020 Vienna
Questions & Answers