If you can't read please download the document
Upload
mead
View
21
Download
5
Embed Size (px)
DESCRIPTION
Scalable Autonomic Streaming Middleware for Real-Time Processing of Massive Data Flows Ricardo Jimenez-Peris Universidad Politecnica de Madrid Project Coordinator. Project Data. Start: February 2008. Duration: 3 years. Partners: UPM – Spain ( coord .). FORTH - Greece. - PowerPoint PPT Presentation
Citation preview
PowerPoint Presentation
Scalable Autonomic Streaming Middleware for Real-Time Processing of Massive Data Flows
Ricardo Jimenez-PerisUniversidad Politecnica de MadridProject
Coordinator
Project Data
Start: February 2008.Duration: 3 years.Partners:UPM Spain
(coord.).FORTH - Greece.TU Dresden - Germany.Telefonica -
Spain.Exodus - Greece.Epsilon - Italy.
Background
Data streaming is a new paradigm developed in the database
community to process large data flows in memory in an online
fashion.It allows to perform continuous queries over flowing
data.Most existing platforms are centralized, and a few
distributed, and perform 1-2 orders of magnitude better than
relational DBs.
Background: Data Streaming Operators
Background: Data Streaming Query
Scope
Many potential applications in Internet today require to process
huge amounts of information in an online fashion:Mitigation of DDoS
attacks.Spam filtering.Processing the output of sensor
networks.Detecting fraud in cellular telephony.Financial
applications.QoS monitoring for enforcing SLAs.Real time data
mining.Etc.
Objectives
Stream aims at developing a highly scalable middleware
infrastructure to process massive data flows in real time.The
innovation lies in the sheer scale targeted by the project 1-2
orders of magnitude higher than current technology.
Innovation
Parallelizing data streaming operators:Currently a query operator
can be deployed on a single site and it has to process the full
data flow thus becoming the bottleneck.Stream is developing
distributed versions of query operators that enable to run
individual query operators in a cluster of sites.
Innovation: Parallel Data Streaming
Innovation
Exploiting leading edge high performance networks and IO systems:
Reaching 40 gbs for both networking and IO.This results in high
throughput communication among sites and very low latency.Low cost
storage system:1 PC controlling 40 disks.
Architecture
Data Mining Layer
Parallel Data Streaming Layer
Data Streaming Layer
High Performance IO & Storage Layer
Autonomic Controller Layer
Innovation
Self-healing:Able to tolerate failures Novel approach.Able to
online recover new nodes.Self-configuring:Dynamic load
balancing.Self-provisioning:Nodes are added and removed as needed
depending on the load.
Expected Outcome
Highly scalable and autonomic infrastructure to process massive
data flows.2 orders of magnitude more scalable than current
distributed data streaming platforms.Application to 3 different
markets:Telco: Fighting fraud in cellular telephony.Services:
Real-time checking of SLAs fulfillment.Financial/banking: Detection
of laundry financial operations/Fraud detection in credit card
payments/Real time data warehousing.
Current Status
Month 8 of the project.Prototypes of all layers (except automic
controller foreseen for the 2nd year).Cluster with 50 nodes
interconnected with Myrinet10G setup.First tests of parallel data
streaming exhibiting high scalability.Prototypes of IO and storage
tiers in advanced state.
Questions?