IBM Research © 2008 IBM Corporation – All Rights Reserved

System S – High-Performance Stream Computing Platform

Olivier Verscheure, IBM T.J. Watson Research Center


Page 1:

IBM Research

© 2008 IBM Corporation – All Rights Reserved

System S – High-Performance Stream Computing Platform

Olivier Verscheure, IBM T.J. Watson Research Center

Page 2:

Outline

System S Overview

System S for Energy Trading

System S for Astronomy

Page 3:

Stream processing will be everywhere

Page 4:

What is Stream Processing?

[Figure labels: Data Sources, data, Stream Processing System, Database/data warehouse]

Process data as it is continuously generated

Extracting and organizing information and intelligence

Minimizing time to react

Page 5:

Without Stream Processing?

[Figure labels: Data Sources, data, Database/data warehouse, Transaction processing, Batch processing]

Process data as it is continuously generated

Extracting and organizing information and intelligence

Minimizing time to react

Page 6:

What Makes a Stream Processing System?

[Figure labels: operator, stream, data packet, Stream Processing System, Sensor Network, data, Database/data warehouse]

Process data as it is continuously generated

Extracting and organizing information and intelligence

Minimizing time to react

Page 7:

What Makes a Stream Processing System?

Stream Processing System

Tooling: Developer UI, Composition UI, Analyst UI

Hardware Platform: Servers, networks, storage, operating system, file system

Runtime Environment: Job management, resource management, content routing, programming model, object store

Application: Interconnection of operators

Page 8:

Continuous Event and Stream Processing

Analysis Complexity

Time Sensitivity

Event/Data Volume & Diversity

High Volume

Complex Analysis

Time Sensitive

Stream Processing enables…

– high message/data rates,

– low (msec-secs) latency,

– advanced analysis

Today’s Complex Event Processing (CEP) solutions target…

– 10K messages/sec,

– secs-minutes latency,

– rules-based analysis

Page 9:

System S Stream Processing

New stream computing paradigm

Pull information from anywhere in real time

Ultra-low latency, ultra-high throughput

Scalable

Page 10:

System S: A Closer Look

This notional System S application…

• Calculates VWAP
• Calculates P/E, based on earnings from EDGAR
• Refines earnings based on encumbrances identified in newsfeeds

System S continually adapts to new inputs, new modalities

Analytics may be a combination of provided and user-developed/legacy operators

System S applications can seamlessly process structured (event) and unstructured data
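
VWAP (volume-weighted average price) is the sum of price times volume divided by the total volume traded. The following is a minimal Python sketch of a running VWAP, assuming trades arrive as (price, volume) pairs; the function name and the running-sum design are illustrative assumptions and are not taken from System S.

```python
def running_vwap(trades):
    """Yield the volume-weighted average price after each (price, volume) trade."""
    notional = 0.0  # running sum of price * volume
    volume = 0.0    # running sum of traded volume
    for price, size in trades:
        notional += price * size
        volume += size
        if volume > 0:
            yield notional / volume


# Hypothetical trades, used only to exercise the function.
trades = [(10.0, 100), (11.0, 300), (12.0, 100)]
vwaps = list(running_vwap(trades))
```

Keeping only two running sums means each new trade updates the result in constant time, which suits a per-tuple streaming operator.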

Page 11:

SPADE Building Blocks: Classifiers, Annotators, Correlators, Filters, Aggregators

Correlate Transform

Annotator

Segmenter

Classifier

Filter

Edge Adapters

Page 12:

Application Programming

Consumable

Reusable set of operators

Connectors to external static or streaming data sources and sinks

Source Adapters, Sink Adapters, Operator Repository

SPADE: Stream processing dataflow scripting language

MARIO: Automated Application Composition

Platform Optimized Compilation

Page 13:

SPADE

SPADE (Stream Processing Application Declarative Engine) is an intermediate language for streaming applications.

– Simplifies design of applications used by System S

– Hides complexities of

• manipulating data streams (e.g., contains generic language support for data types and building block operations)

• fanning out applications to distributed heterogeneous nodes

• transporting data through diverse computer infrastructures (ingesting external data, routing intermediate results, looping in feedback, branching, outputting the results, ...)

Page 14:

Basic Promises of SPADE

SPADE is easy to use

– Programmers provide descriptions of stream-based data processing tasks using SPADE’s intermediate language

– SPADE’s query engine comes up with an execution plan, builds it, and hands it off to System S runtime for deployment

SPADE enables rapid application development

– Customizable operators – do not require low-level coding

– Support for user defined operators and legacy code

SPADE is high performance

– Optimized code generation

Page 15:

A simple example

[Application]
SourceSink trace

[Typedefs]
typespace sourcesink

typedef id_t Integer
typedef timestamp_t Long

[Program]
// virtual schema declaration
vstream Sensor (id : id_t, location : Double, light : Float, temperature : Float, timestamp : timestamp_t)

// a source stream is generated by a Source operator – in this case tuples come from an input file
stream SenSource ( schemaFor(Sensor) )
  := Source( ) [ "file:///SenSource.dat" ] {}

// this intermediate stream is produced by an Aggregate operator, using the SenSource stream as input
stream SenAggregator ( schemaFor(Sensor) )
  := Aggregate( SenSource <count(100),count(1)> ) [ id . location ]
     { Any(id), Any(location), Max(light), Min(temperature), Avg(timestamp) }

// this intermediate stream is produced by a functor operator
stream SenFunctor ( id: Integer, location: Double, message: String )
  := Functor( SenAggregator ) [ log(temperature,2.0)>6.0 ]
     { id, location, "Node "+toString(id)+ " at location "+toString(location) }

// result management is done by a sink operator – in this case produced tuples are sent to a socket
Null := Sink( SenFunctor ) [ "cudp://192.168.0.144:5500/" ] {}

[Figure: Source → Aggregate → Functor → Sink]
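
For readers unfamiliar with SPADE, the Aggregate and Functor stages of this example can be approximated in Python. This is a sketch under stated assumptions, not a translation of SPADE semantics: it assumes `<count(100),count(1)>` means a 100-tuple window that slides by one tuple, it ignores the per-group windowing implied by `[ id . location ]`, and it reads `log(temperature,2.0)` as a base-2 logarithm. The function names and the sample readings are hypothetical.

```python
import math
from collections import deque
from statistics import mean


def aggregate(readings, size=100, slide=1):
    """Count-based sliding window: once `size` readings are buffered,
    emit one summary tuple for every `slide` new readings."""
    window = deque(maxlen=size)
    pending = 0  # readings seen since the last emitted summary
    for reading in readings:
        window.append(reading)
        pending += 1
        if len(window) == size and pending >= slide:
            pending = 0
            yield {
                "id": window[0]["id"],              # Any(id)
                "location": window[0]["location"],  # Any(location)
                "light": max(r["light"] for r in window),
                "temperature": min(r["temperature"] for r in window),
                "timestamp": mean(r["timestamp"] for r in window),
            }


def functor(summaries):
    """Keep summaries whose base-2 log of temperature exceeds 6 and attach a message."""
    for s in summaries:
        if s["temperature"] > 0 and math.log(s["temperature"], 2.0) > 6.0:
            yield {
                "id": s["id"],
                "location": s["location"],
                "message": f"Node {s['id']} at location {s['location']}",
            }


# Hypothetical sensor readings standing in for the contents of SenSource.dat.
readings = [
    {"id": 1, "location": 2.5, "light": float(i), "temperature": 100.0 + i, "timestamp": i}
    for i in range(102)
]
summaries = list(aggregate(readings))
results = list(functor(aggregate(readings)))
```

Chaining generators mirrors the dataflow style of the SPADE program: each stage consumes the stream produced by the previous one without materializing it.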

Page 16:

Performance optimization and scalability

Split/Aggregate/Join architectural pattern
– High-ingest rate input stream must be split
– Aggregate: model creation
– Join: correlation

Operator Fusion
– Fine-granularity operators
– From small parts, make a big one that fits

Code generation
– Actual code must match the underlying runtime environment
  • Number of cores
  • Interconnect characteristics
  • Architecture-specific instructions

Compiler-based optimization
– Driven by automatic profiling
– Driven by incremental learning of application characteristics

[Figure labels: Logical app view, Physical app view]

Page 17:

Operator Fusion - Illustration

[Figure panels: One PE per Operator; Fuse all except sources and sinks; A truly random partitioning]

SPADE compiler can generate optimized operator grouping schemes
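
The idea behind fusion can be sketched in a few lines of Python: several fine-grained operators are combined into one callable, so that tuples pass between them by function call rather than over a transport between processing elements. This illustrates the concept only and is not how the SPADE compiler implements fusion; the `fuse` helper and the three toy operators are hypothetical.

```python
def fuse(*operators):
    """Combine per-tuple operators into one callable that runs them back to back."""
    def fused(item):
        for op in operators:
            item = op(item)
            if item is None:  # an operator drops a tuple by returning None
                return None
        return item
    return fused


def scale(x):
    return x * 2


def keep_large(x):
    return x if x > 5 else None


def label(x):
    return f"value={x}"


# One fused unit instead of three separately scheduled operators.
pipeline = fuse(scale, keep_large, label)
outputs = [pipeline(x) for x in (1, 3, 4)]
```

The trade-off the slide points at is granularity: a fused group avoids transport overhead but must fit on one node, so the grouping has to match the available resources.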

Page 18:

[Figure: System S runtime stack. Hardware: X86 Box, X86 Blades, Cell Blade, FPGA Blade. Software layers: Operating System; Transport; System S Data Fabric; System S Runtime Services; Processing Element Containers]

Optimizing scheduler assigns operators to processing nodes, and continually manages resource allocation

Runs on commodity hardware – from single node to blade centers to high performance multi-rack clusters

Page 19:

[Figure: System S runtime stack. Hardware: BG nodes, X86 Blades, Cell Blade, FPGA Blade. Software layers: Operating System; Transport; System S Data Fabric; System S Runtime Services; Processing Element Containers]

Optimizing scheduler assigns operators to processing nodes, and continually manages resource allocation

Adapts to changes in resources, workload, data rates

Capable of exploiting specialized hardware

Runs on commodity hardware – from single node to blade centers to high performance multi-rack clusters

Page 20:

Distributed operation

[Figure labels: Site A, Site B, Site C]

Page 21:

Advantages of Stream Processing as Parallelization Model

Automated, Optimized Composition (SPADE, MARIO)

Source Adapters

Sink Adapters

Operator Repository

Automated, Optimized Deploy and Management (Scheduler)

Operator and data source profiling for better resource management

Automated composition

– Query optimization over well-known operators

– Inquiry optimization using semantic tagging of operators and data sources

Reuse of operators across stored and live data

– MapReduce is a similar programming model, with storage as the transport

Streams as first class entity

– Explicit task and data parallelism

– More intuitive approach to multi-core exploitation

Page 22:

System S Pilots

Trading Advantage: Identification of and response to opportunities in real-time market data

Government: Detect & respond to phenomena based on large volumes of structured and unstructured information

Manufacturing (Semiconductor Solutions): Improving the quality of semiconductor wafers with dynamic tuning of manufacturing tools

Astrophysics: World’s largest and first fully digital radio observatory for astrophysics, space and earth sciences, and radio research

5 Million events / sec, millisecond latency

2.5K events / sec, 10 msec latency

1.5 Million events / sec

Page 23:

Sneak Preview: IBM InfoSphere Streams

Business Process Management

Data Warehouse

Applications

Business Intelligence

Page 24:

Outline

System S Overview

System S for Energy Trading

System S for Astronomy

Page 25:

The Energy Trading Scenario using Stream Computing

Sample application showing power of Stream Computing

– Only one of many possible applications/services

Weather conditions and events drive pricing of energy futures

– natural events interfering with energy supply

– announcements, news stories, …

Energy traders today struggle to integrate info from multiple sources

– cannot get it in real time, to inform their trade decisions

– they see 8 screens, integrate manually via a spreadsheet

IBM Stream Computing assembles, deploys applications

– integrates diverse sources of data

– provides timely correlations, analyses

Page 26:

Illustrative Example: Fear and Opportunity in the Gulf

News Flash: Hurricane Dean Upgraded to Category 5; Path Projected through Gulf; Oil Stocks Uniformly Down

Oil company stocks down, based on early fears.

Same story, viewed 2 hours later. The story’s the same, but… the live tickers tell a different story

– Others are up, showing recovery even before the storm hits

– Properties with significant assets in path are still down

– More affected companies have been identified (a bit late, no?)

• If you saw it coming, because you watched for more primal data…
  • Like hurricane path predictions from NOAA
  • Or even weather satellite and/or sensor data
  • Real-time equities trade data

• If you’ve been accumulating intelligence on the location (and value) of company assets that could be in the path…

• If you could apply such analysis before the news cycle…

• You could take advantage – in both directions…

Page 27:

Recommendations Based on Hurricane Forecast

[Figure labels: System S platform; Web Zero platform; Capture weather sensor data, analyze predicted hurricane path; Estimate impact on portfolios; Capture market data (high volume); Compute portfolio market indicators (low latency); Make recommendations and notify; DHTML result rendering]

Real-time projections of hurricane path

Dynamically updated risk assessment for assets in projected path

Correlate combined risk and trade VWAP to determine buy/sell recommendations

Page 28:

Outline

System S Overview

System S for Energy Trading

System S for Astronomy

– Past & current projects

Page 29:

Past Projects

Outlier detection from single tripole

Decomposing combined DOA’s from single tripole

– SPADE UDOP’s

– Linking against Lapack and Blas libraries

– About 50 non-trivial processing elements

– Being optimized by SPADE team now

Convolutional resampling (tConvolve) on System S

– Mostly built-in operators (soon built-in operators only)

– Fully parametrizable using Perl; e.g., # of w planes

– Does scale very well!

Page 30:

Outlier detection from single tripole

Receive 3D electric field

Demultiplex 3D electric field
– Each UDP packet contains multiplexed electric fields

Compute intensity $I(t) = |E_x(t)|^2 + |E_y(t)|^2 + |E_z(t)|^2$

Detect outliers
– Outlier detected if $I(t) > m_I(t-N:t-1) + T\,\sigma_I(t-N:t-1)$ or $I(t) < m_I(t-N:t-1) - T\,\sigma_I(t-N:t-1)$, where $m_I$ and $\sigma_I$ are the mean and standard deviation of the intensity over the previous $N$ samples and $T$ is a threshold factor

Visualize detected outliers in Matlab in real-time
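
The intensity and outlier steps above can be sketched in Python as follows. This is an illustrative sketch, not the SPADE application from the talk: the window length `n`, the threshold factor `threshold`, the function names, and the sample values are assumptions, and the test flags a sample that falls outside the band of mean plus or minus threshold times standard deviation computed over the previous `n` samples.

```python
import math
from collections import deque


def intensity(ex, ey, ez):
    """Field intensity |Ex|^2 + |Ey|^2 + |Ez|^2 for (possibly complex) components."""
    return abs(ex) ** 2 + abs(ey) ** 2 + abs(ez) ** 2


def detect_outliers(intensities, n=10, threshold=3.0):
    """Yield (t, I) for samples outside mean +/- threshold * std of the previous n samples."""
    history = deque(maxlen=n)
    for t, value in enumerate(intensities):
        if len(history) == n:
            m = sum(history) / n                     # window mean of I
            m2 = sum(v * v for v in history) / n     # window mean of I^2
            sigma = math.sqrt(max(m2 - m * m, 0.0))  # guard against rounding below zero
            if value > m + threshold * sigma or value < m - threshold * sigma:
                yield t, value
        history.append(value)


# Hypothetical intensity samples with one obvious spike at index 11.
samples = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 1.1, 0.9, 1.0, 1.02, 9.0, 1.0]
outliers = list(detect_outliers(samples))
```

No sample is tested until the window has filled, which plays the same role as discarding the first few sequence numbers before statistics are meaningful.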

Page 31:

SPADE Flow of Operators

[Figure: operator flow graph for outlier detection. Stages and labels: UDPSource; DataDemux; Field intensity (|Ex|^2, |Ey|^2, |Ez|^2, Barrier; output: Seq, I); Windowed statistics (^2; two Aggregate operators, Avg 10c,1c; Barrier; $m_I$; $m_{I^2}$; $\sqrt{m_{I^2} - m_I^2}$; $m_I - T\,\sigma_I$; $m_I + T\,\sigma_I$; output: U, L); Outlier detection (Barrier; Filter out Seq<10; test of I against U and L, emitting I, {U, L}; Filter out empty lists); UDPsink; Filesink]

Page 32:

Field Intensity

[Figure: field intensity stage. Labels: Source, DataDemux, |Ex|^2, |Ey|^2, |Ez|^2, Barrier; output: Seq, I]

Pages 33–37:

[Figure, repeated on each of these five slides: field intensity stage. Labels: Source, DataDemux, |Ex|^2, |Ey|^2, |Ez|^2, Barrier; output: Seq, I]

Page 38:

SPADE Flow of Operators

[Figure: the operator flow graph for outlier detection, repeated from page 31]

Page 39:

Windowed Statistics

[Figure: windowed statistics stage. Labels: ^2; two Aggregate operators (Avg 10c,1c); Barrier; $m_I$; $m_{I^2}$; $\sqrt{m_{I^2} - m_I^2}$; $m_I - T\,\sigma_I$; $m_I + T\,\sigma_I$; input: $m_I$, I; output: U, L]
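
The square-root node in this stage matches the standard identity that expresses a standard deviation through two averages, which would explain why the graph contains two Avg aggregates, one over $I$ and one over $I^2$. Written out, with $m_I$ and $m_{I^2}$ denoting the window means of $I$ and $I^2$, and with $\sigma_I$ as a reconstructed symbol (an assumption, since the extracted labels lost it):

```latex
\sigma_I = \sqrt{m_{I^2} - m_I^2}, \qquad
U = m_I + T\,\sigma_I, \qquad
L = m_I - T\,\sigma_I
```

Here $U$ and $L$ are the upper and lower bounds passed on to the outlier detection stage.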

Pages 40–42:

[Figure, repeated on each of these three slides: windowed statistics stage. Labels: ^2; two Aggregate operators (Avg 10c,1c); Barrier; $m_I$; $m_{I^2}$; $\sqrt{m_{I^2} - m_I^2}$; $m_I - T\,\sigma_I$; $m_I + T\,\sigma_I$; input: $m_I$, I; output: U, L]

Page 43:

SPADE Flow of Operators

[Figure: the operator flow graph for outlier detection, repeated from page 31]

Page 44:

Outlier Detection

[Figure: outlier detection stage. Inputs: Intensity; U, L. Labels: Barrier; Filter out Seq<10; test of I against U and L, emitting I, {U, L}; Filter out empty lists; UDPsink; Filesink]

Pages 45–46:

[Figure, repeated on each of these two slides: outlier detection stage. Inputs: Intensity; U, L. Labels: Barrier; Filter out Seq<10; test of I against U and L, emitting I, {U, L}; Filter out empty lists; UDPsink; Filesink]

Page 47:

Decomposing combined DOA’s from single tripole

Page 48:

Direction of Arrival (DOA)

Simple case

– Two orthogonal waves

Page 49:

Pseudo-Orbital Momentum

Two-wave case, same frequencies

Page 50:

Pseudo-Orbital Momentum

Two-wave case, same phases

Page 51:

Decomposing the combined DOA

Consider n = 1..N incident waves at time t

Can we possibly estimate DOA(n) for all N?

… from a single 3D sensor???

Page 52:

Signal Processing to the Rescue

Let’s get back to wave equations

x-component: $(E_x)_n(t) = (A_x)_n \, e^{i\varphi_{x,n}} \, e^{i\omega_n t}$

Estimate $\omega_n$ for all n = 1..N from x-component

– Matrix pencil method!

– Requires N timestamps only (minimum)

Plug estimates back in {x,y,z}-components

– System of 3xN linear equations

Retrieve estimates of $(A_{\{x,y,z\}})_n \, e^{i\varphi_{\{x,y,z\},n}}$

Estimate DOA(n) for all n = 1..N

• E.g., $(V_x)_n = (A_y)_n e^{i\varphi_{y,n}} (A_z^*)_n e^{-i\varphi_{z,n}} - (A_z)_n e^{i\varphi_{z,n}} (A_y^*)_n e^{-i\varphi_{y,n}}$
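
The last step can be illustrated in Python. Writing each recovered amplitude-and-phase term as one complex number per axis, the x-component shown on the slide is the x-component of the cross product of the field with its complex conjugate. The sketch below assumes the y and z components follow by cyclic permutation (the slide gives only x) and takes the direction along the imaginary part of that vector, up to a sign convention that the slides do not state. The function names are hypothetical.

```python
def pseudo_vector(ex, ey, ez):
    """Return V = E x conj(E) for complex field components (ex, ey, ez)."""
    cx, cy, cz = ex.conjugate(), ey.conjugate(), ez.conjugate()
    return (
        ey * cz - ez * cy,  # x-component, as on the slide
        ez * cx - ex * cz,  # y-component, assumed by cyclic permutation
        ex * cy - ey * cx,  # z-component, assumed by cyclic permutation
    )


def direction(ex, ey, ez):
    """Unit vector along the imaginary part of V, or None if V vanishes."""
    v = [component.imag for component in pseudo_vector(ex, ey, ez)]
    norm = sum(c * c for c in v) ** 0.5
    if norm == 0.0:
        return None  # linearly polarized wave: V carries no direction
    return tuple(c / norm for c in v)


# A circularly polarized wave in the x-y plane yields an axis along z.
axis = direction(1 + 0j, 1j, 0j)
```

V is purely imaginary by construction, so only its imaginary part is used; a linearly polarized wave gives V = 0, which is why that case returns `None`.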

Page 53:

Summary: Simplified Processing Flow Graph

[Figure: processing flow graph. Labels: UDPSource; DataDemux; for each of the X, Y, and Z components: Hankel Construct, Pick Top Signals (SVD), Frequency estimates; Denoise frequency estimates; Least Square Solver for each of X, Y, and Z; outputs: Separated waves, V vector]

Page 54:

Sample Spade Code

Page 55:

Convolutional resampling (tConvolve)

45 antennas, 30 beamformers

ASKAP System:

– 1% of SKA system

– Operational in 2012

– 8h observation produces 2.3TB!

Goal: Evaluate System S as the Central Processing Platform of the Australian SKA Pathfinder (ASKAP)

Page 56:

SKA Processing Graph

66 PEs on 20 nodes

Heavy Convolutional PEs: main computation (1000-2200 MIPS, 20-42 Mbit/sec)

Page 57:

Scalability

10000 samples randomly generated

5+ times reduction in gridding time


Page 58:

Current Projects

Software imaging with cleaning

– Joint algorithmic/software/hardware optimization

– … in collaboration with Tim Cornwell et al.

Astronomical Signature Clustering

– … in collaboration with Bo Thide, Jan Bergman, Lars K Daldorff