© 2008 IBM Corporation 04/22/23
IBM T. J. Watson Research Center
Stream Computing: A New Paradigm To Gain Insight and Value
Nagui Halim
System S Team
System S at IBM
News/weather
Text, data feeds
Market feeds
System S is a high performance computing platform designed to host a new class of stream analytic applications
Designed for high ingest volumes and to adapt to changing data, needs, and capability
System S is an operational prototype, with a stable core that serves as the base for pilots and for systems and stream computing research
4 Making Sense of the Clutter | System S © 2008 IBM Corporation
System S Functional Overview
High Volume, Structured & Unstructured Streaming Data Sources: image; audio, voice, VoIP; video, TV, financial news; radio, police scanners; web traffic, email, chat; GPS data; financial transaction data; satellite data; sensors, badge swipes, …
Input Connectors
Output Connectors
High performance scalability infrastructure
continuous processing of streaming data
Scheduler
Job Manager
Result Data Delivery / Visualization
Workflow Development Tooling
IDE
Workflow Assembly
Component Repository
Component Generation
Data Source Management
Heterogeneous, Multi-scale and/or Commodity Hardware
Secure, Privacy Preserving Using Certified Downgraders
System S Analytic Processing Building Blocks
Classifiers, Annotators, Correlators, Filters, Aggregators
Correlate Transform
Annotator
Classifier
Filter
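To make the building-block idea concrete, here is a minimal sketch of how filters, annotators, classifiers, and aggregators might be chained over a stream. It uses plain Python generators and is not the System S programming model; the function names, the trade fields, and the tumbling-window aggregator are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator

Item = Dict[str, object]  # hypothetical tuple format: a plain dict per stream element


def filter_op(stream: Iterable[Item], keep: Callable[[Item], bool]) -> Iterator[Item]:
    """Filter: pass through only the items for which `keep` is true."""
    for item in stream:
        if keep(item):
            yield item


def annotate_op(stream: Iterable[Item], annotate: Callable[[Item], Item]) -> Iterator[Item]:
    """Annotator: attach derived fields to each item."""
    for item in stream:
        yield {**item, **annotate(item)}


def classify_op(stream: Iterable[Item], classify: Callable[[Item], str]) -> Iterator[Item]:
    """Classifier: attach a label to each item."""
    for item in stream:
        yield {**item, "label": classify(item)}


def aggregate_op(stream: Iterable[Item], key: str, value: str, window: int) -> Iterator[Dict[str, float]]:
    """Aggregator: per-key sum over a tumbling window of `window` items."""
    sums: Dict[str, float] = defaultdict(float)
    count = 0
    for item in stream:
        sums[item[key]] += item[value]
        count += 1
        if count == window:
            yield dict(sums)
            sums = defaultdict(float)
            count = 0


def demo() -> list:
    # Hypothetical market-feed items, invented for this example.
    trades = [
        {"symbol": "IBM", "price": 100.0, "qty": 10},
        {"symbol": "XYZ", "price": 5.0, "qty": 0},
        {"symbol": "IBM", "price": 101.0, "qty": 20},
        {"symbol": "ABC", "price": 50.0, "qty": 4},
        {"symbol": "ABC", "price": 51.0, "qty": 6},
    ]
    valid = filter_op(trades, lambda t: t["qty"] > 0)
    annotated = annotate_op(valid, lambda t: {"notional": t["price"] * t["qty"]})
    labeled = classify_op(annotated, lambda t: "large" if t["notional"] >= 1000 else "small")
    return list(aggregate_op(labeled, "symbol", "notional", window=2))
```

Because each block consumes and produces an iterator, items flow through one at a time and the pipeline never needs the whole input in memory, which is the property a continuous stream requires.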
[Figure: runtime architecture]
System S Runtime Services
Transport: System S Data Fabric
Processing Element Containers (five shown)
Hardware nodes: X86 Box, X86 Blade (seven shown), Cell Blade, FPGA Blade
Optimizing scheduler assigns operators to processing nodes, and continually manages resource allocation
Runs on commodity hardware – from single node to blade centers to high performance multi-rack clusters
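The slide does not describe how the optimizing scheduler works. As a rough illustration of what assigning operators to processing nodes involves, the sketch below uses a simple greedy heuristic: place the most expensive operator first on the node with the most free capacity. The function name, cost units, and node names are hypothetical, and the real System S scheduler should be assumed to be considerably more sophisticated.

```python
import heapq
from typing import Dict


def place_operators(costs: Dict[str, float], capacity: Dict[str, float]) -> Dict[str, str]:
    """Greedy placement: heaviest operator first, onto the node with the most free capacity.

    `costs` maps operator name to estimated load; `capacity` maps node name to
    available load. Both are illustrative units, not System S metrics.
    """
    # heapq is a min-heap, so store negated free capacity to pop the emptiest node first.
    heap = [(-cap, node) for node, cap in capacity.items()]
    heapq.heapify(heap)
    placement: Dict[str, str] = {}
    for op, cost in sorted(costs.items(), key=lambda kv: (-kv[1], kv[0])):
        neg_free, node = heapq.heappop(heap)
        free = -neg_free
        if cost > free:
            raise ValueError(f"no node has room for operator {op!r}")
        placement[op] = node
        heapq.heappush(heap, (-(free - cost), node))
    return placement


# Hypothetical operators and nodes, invented for this example.
example_costs = {"filter": 1.0, "classify": 3.0, "aggregate": 2.0}
example_nodes = {"x86-blade-1": 4.0, "cell-blade-1": 4.0}
example_placement = place_operators(example_costs, example_nodes)
```

Continual resource management could be approximated by calling `place_operators` again whenever measured costs or available nodes change, although a production scheduler would also need to weigh the cost of moving running operators.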
[Figure: runtime architecture with specialized hardware]
System S Runtime Services
Transport: System S Data Fabric
Processing Element Containers (five shown)
Hardware nodes: X86 Box, X86 Blade (six shown), Cell Blade, FPGA Blade, Blue Gene
Adapts to changes in resources, workload, data rates
Capable of exploiting specialized hardware
8 System S © 2007 IBM Corporation
Overview, Beacon Institute for Rivers and Estuaries
Nonprofit organization, based in Beacon, NY
Patterned after Woods Hole Oceanographic Institution
Formed 2000 by Gov. Pataki
Mission: “To create a global center for interdisciplinary research, policy-making and education regarding rivers, estuaries and their connection with society.”
$30M capital, additional $12M this year + program funds
90% NY State funding
Balance from NSF and private donors
Evolution/locations
Troy: Office, Research
Beacon: HQ, Harbor (pier for research vessel), Multi-use building, Research Center
Palisades: Columbia’s Lamont-Doherty Earth Observatory
Manhattan: Research pier
Center for Advanced Environmental Technology (40,000 ft²)
Core: An advanced sensor-based environment
Autonomous Microbial Genosensor
Solar-powered Autonomous Underwater Vehicle
Conductivity
Temperature
Turbidity
pH/ORP
Chlorophyll
Sontek-YSI Array
Open and scalable network: bearer network agnostic
Heterogeneous: physical, chemical, biological, radiological
Multiple deployment platforms: fixed, mobile
End-to-end middleware: device management, security
S&D/Research
2007 FOAK Program IBM Confidential © 2007 IBM Corporation
FSS Industry Point of View
The financial markets industry is growing quickly while experiencing rapid electronification and automation.
Speed and transparency will increase dramatically. To survive firms will specialize, and compete based on technology.
Sources: IBM Institute for Business Value “FM2015 – The Trader is Dead, Long Live the Trader”; IBM / EIU Macro Model, 2007; SIAC, OPRA, and NASDAQ courtesy the TABB Group
[Figure: Average Daily Trade Volumes vs. Headcount, 1994-2004 (Millions of Shares; Number of Employees ('000); Volume per Employee)]
Average Daily Trade Volume (millions): 10 year CAGR 19%
Security brokers and services personnel ('000): 10 year CAGR 3%
Volume of daily shares traded per employee: 10 year CAGR 15%

[Figure: Financial System Depth, 1995-2025 (Investable Assets, $ Trillions); series: Currency / Deposits Size, Securities Size]
                       1995    2005    2015    2025
Securities % of GDP    219%    305%    393%    523%
Securities CAGR        N/A     8%      9%      9%
Deposits CAGR          N/A     5%      7%      6%
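The three growth rates quoted for the trade-volume chart (19% for volume, 3% for headcount, 15% for volume per employee) can be cross-checked with a short calculation. When a ratio's numerator and denominator grow at compound rates, the ratio itself grows at (1 + numerator rate) / (1 + denominator rate) - 1. The helper names below are hypothetical, and the only inputs are the rounded percentages from the slide.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between a start and an end value."""
    return (end / start) ** (1.0 / years) - 1.0


def ratio_growth(numerator_rate: float, denominator_rate: float) -> float:
    """Growth rate of a ratio whose numerator and denominator each grow at a compound rate."""
    return (1.0 + numerator_rate) / (1.0 + denominator_rate) - 1.0


# Rounded rates as printed on the slide.
volume_rate = 0.19
headcount_rate = 0.03
implied_per_employee_rate = ratio_growth(volume_rate, headcount_rate)
```

The implied rate comes out at roughly 15.5%, which is in line with the quoted 15% given that all three published figures are rounded.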
IBM Research
StreamSight Visualization – Market Making
StreamSight Visualization – Market Making Full Scale
M1 Rs Processing Pipeline: Wafer Processing
Raw Wafer
Processed Wafer
Diagram labels: SiCOH1,1; SiCOH1,2; SiCOH6,6; Anneal1; Anneal2; Anneal4; CMP1,1; CMP1,2; CMP13,2
Photoresist and Etch to create structures
Deposit Metal (Cu) in structure
Use Chemical and Mechanical Polishing to planarize surface
m1 Rs Processing Pipeline: Instrumentation
Raw Wafer
Processed Wafer
Diagram labels: SiCOH1,1; SiCOH1,2; SiCOH6,6; Anneal1; Anneal2; Anneal4; CMP1,1; CMP1,2; CMP13,2
Process Data
Oxide Thickness, Refractive Index, Anneal Duration
Pad Hrs, Dresser Hrs
Slurry Compos.
Defect Data
Test Data
m1 Rs value, Yield
FDC Summary Statistics
Trace Data
Other Data: event, sensor, alarm, tool log, control job, process job
Data Warehouse
Data Availability Time
Statistical Process Control (SPC): Identify tool/product drift and automatically shut down recipe/tool
Fault Detection and Classification (FDC): Multivariate monitoring for real-time process fault detection and classification
Advanced Process Control (APC): Feedback and feed forward controls to compensate for variations in incoming material and prior level processing
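As a small illustration of the drift detection that SPC refers to, the sketch below computes classic Shewhart-style control limits (mean plus or minus three standard deviations of a baseline) and flags measurements that fall outside them. The slide does not say which SPC rules are used in the fab, so this rule, the function names, and the sample readings are all assumptions.

```python
import statistics
from typing import List, Sequence, Tuple


def control_limits(baseline: Sequence[float], k: float = 3.0) -> Tuple[float, float]:
    """Lower and upper control limits: baseline mean -/+ k sample standard deviations."""
    mean = statistics.fmean(baseline)
    sigma = statistics.stdev(baseline)
    return mean - k * sigma, mean + k * sigma


def out_of_control(samples: Sequence[float], limits: Tuple[float, float]) -> List[int]:
    """Indices of samples that fall outside the control limits."""
    lower, upper = limits
    return [i for i, x in enumerate(samples) if x < lower or x > upper]


# Hypothetical readings, invented for this example (not fab data).
baseline_readings = [10.0, 10.2, 9.8, 10.1, 9.9]
new_readings = [10.1, 9.7, 10.6, 10.0, 9.4]
limits = control_limits(baseline_readings)
flagged = out_of_control(new_readings, limits)
```

In a streaming setting the same check would run on each measurement as it arrives, so a tool or recipe could be stopped at the first flagged index instead of after a batch report.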
Two-Class Decision Tree
Confusion Matrix
True Label    Predicted Bad-OK    Predicted Good-Exc    Accuracy
Bad-OK        52                  6                     89.65%
Good-Exc      7                   51                    87.9%
Built Decision Tree
[Figure: decision tree with Y/N branches and leaves labeled Bad-OK or Good-Exc]
90% prediction accuracy
Prediction accuracy with tool based operating thresholds ~10%
Sensitivity varies across FDC values
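The accuracy figures in the confusion matrix can be reproduced from the four counts. The sketch below assumes that rows are true labels and columns are predicted labels, which is consistent with the per-row accuracies on the slide (52/58 and 51/58). The function names are hypothetical.

```python
from typing import Dict

# Counts from the slide; outer key = true label, inner key = predicted label (assumed layout).
confusion: Dict[str, Dict[str, int]] = {
    "Bad-OK": {"Bad-OK": 52, "Good-Exc": 6},
    "Good-Exc": {"Bad-OK": 7, "Good-Exc": 51},
}


def per_class_accuracy(matrix: Dict[str, Dict[str, int]]) -> Dict[str, float]:
    """Fraction of each true class that was predicted correctly."""
    return {label: row[label] / sum(row.values()) for label, row in matrix.items()}


def overall_accuracy(matrix: Dict[str, Dict[str, int]]) -> float:
    """Fraction of all samples on the diagonal of the matrix."""
    correct = sum(matrix[label][label] for label in matrix)
    total = sum(sum(row.values()) for row in matrix.values())
    return correct / total


class_accuracy = per_class_accuracy(confusion)
total_accuracy = overall_accuracy(confusion)
```

The overall figure is 103 correct out of 116, or about 88.8%, which matches the roughly 90% prediction accuracy quoted on the slide.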
Century
[Figure: Century architecture diagram; labels recovered from the diagram are listed below]
Client and devices: Client; Device Adapter; Device Agent; Sensor Data; SODA; Event Preprocessor; DB Agent
CENTURY Server: Administrator Portal; Patient Registration Service; Patient Info; Application Registration Service; Application Info; Interoperability Container (HIE Adapter); PE Repository; Registration System; RegSPE; Enrollment Trigger
External Data Access Manager: EMR Data Plug-in; Other Plug-ins
System S SPC: Source PE; DeMux; Filter; Sink PE; Analysis Framework; Stream Element Engine; SDO2SE; JDL from IDE
Analysis Jobs: Angina Pectoris; Well-Being; alert; QRS; FA; RR; SP; BP; AR; AP; PT; SPA; BPA; EP; WB; WT; WTA
Numbered output labels: O11; O13; O23; O24; O25; O30; O33; O40; O42; O45; O51; O70; O71; O95
Numbered input labels: I2; I8; I9; I10; I15; I21; I41; I45; I49; I50; I52; I56; I67; I79; I80; I83; I89; I96; I97
Provenance: Data Provenance Manager; TVC Accessor; TVC Rule; Data Provenance Query Manager; Process Provenance Query Manager; Dynamic Provenance Storage Manager; Dynamic Provenance; Stream Element Provenizer; Provenance Server; Provenance Query Service; Provenance Cache
Event management and delivery: Event Management Service; Event Storage Manager; Event Store; Event Store Query Service; Remote Access Manager; Subscription Service; Subscription Data; Delivery System; WAS; Event Delivery; APP; GUI
QoI Management: QoI Manager; QoI Data; DB Agent
18 Market Insights & Business Development
Solution positioning based on processing needs (indicative positioning)
[Figure: positioning chart; axes, segments, and plotted applications are listed below]
Event complexity (diversity): structured, unstructured
Analytics complexity (event correlation and pattern matching)
Decision latency: ms, s, m, h
Human decisions, Automated decisions
Predictive processing (pattern matching and inferencing)
Segment 1: exception detection
Segment 2: operational monitoring
Segment 3: high performance processing
Segment 4: adaptive BPM
Real-time information delivery
Automated trading
Telecom billing
Industrial process control
Lease management system
Retail inventory optimization
Battlespace command & control
Call center monitoring (cross sale)
Salesforce enablement
Baggage handling
Clickstream analysis
Real-time game monitoring
Cross-sales
Health monitoring
Artwork safety
Early warning system for energy trading
Health records screening
Fraud detection & prevention
Telecom network security
Geospatial tracking
Multisource monitoring
Capital market surveillance
Database monitoring
Card fraud detection & prevention
Astrophysical data mining
Asset tracking
Telco QoS & SLA monitoring
Trade desk monitoring
Liquidity management system
Retail goods receipt
Location based services
Risk management in energy trading
Manufacturing process control
Shop floor monitoring
Risk analytics platform
Online hotel booking
Call center monitoring (quality)
Sensor based water mgt
Vision
• Stream Computing is a new computing paradigm that opens up entirely new ways of conducting science and business
• System S is a prototype platform that enables new insights to be gained from large volumes of complex data through sophisticated on-the-fly analysis
• The new insights can drive value to organizations by giving them more accurate answers more quickly
• System S is one element of an overall solution framework that will include other elements such as databases, messaging, and modeling
• The quantities and types of data that organizations can take advantage of will increase by orders of magnitude over time; new computational paradigms are necessary to drive new value from this information