1
HDFS ABSTRACT PROTEUS mission is to investigate and develop ready-to-use scalable online machine learning algorithms and real-time interactive visual analytics to deal with extremely large data sets and data streams. The foundation is the use of an optimized implementation of combined batch and streaming processing and building around this later scalable real time processes. New algorithms and techniques will form a library to be integrated into an enhanced version of Apache Flink. PROTEUS addresses fundamental challenges related to the scalability and responsiveness of analytics capabilities. The requirements are dened by a steelmaking industrial use case, but the techniques developed are exible and portable to other data stream-based domains. CONTRIBUTIONS The project will provide the following specic original contributions: · New strategies for real-time hybrid computation, batch data and data streams. · Real-time scalable machine learning for massive, high-velocity and complex data streams analytics. · Real-time interactive visual analytics for Big Data. Implementation the new advances on top of Apache Flink. · Real-world industrial validation of the technology developed. PROJECT DESCRIPTION PROTEUS presents three key technology components (hybrid computation model for both data-at-rest and data-in-motion, scalable online machine learning and real-time interactive visual analytics) integrated into Apache Flink, and will demonstrate the solution for specic problems in an industrial setting: steelmaking. The core innovations and value of PROTEUS are based on a new integrated processing engine able to apply complex analytics techniques at scale for batch data (data-at-rest) and data streams (data-in-motion) in a hybrid-merge mode. This predictive engine will be able to provide real-time predictions while self-adapts continuously to learn more complex and rened learning models. Moreover, visual analytics will be scalable with decreasing latency (interactive) demands using a novel incremental approach that represents the information (both data-in-motion and incremental process of batch data) as data streams. This project is funded by the European Union (Horizon 2020, Ref: 687691) SCALABLE ONLINE MACHINE LEARNING FOR PREDICTIVE ANALYTICS AND REAL-TIME INTERACTIVE VISUALIZATION INTEGRATED HYBRID PROCESSING ENGINE SCALABLE ONLINE MACHINE LEARNING LIBRARY REAL-TIME INTERACTIVE VISUAL ANALYTICS VISUALIZATION LAYER INCREMENTAL ANALYTICS ENGINE DATA COLLECTOR PREDICTIVE ANALYTICS REAL TIME INTERACTIVE DECLARATIVE LANGUAGE CLASSIFICATION CLUTERING PREDICCTION DETECTION SCALABLE BASIC STREAM SKETCHES FAST UPDATEABLE STATE MODEL SYSTEM PROGRAMMING APIs MEMORY MANAGEMENT Managed Free JVM HEAP OPTIMIZATION Plan B Plan A DATA PROCESSING Join Sort Reduce Iterate DISTRIBUTION RUNTIME STREAMING BATCH Apache The PROTEUS impact is manifold: · strategic, by reducing the gap and dependency from the US technology, empowering the EU Big Data platform Apache Flink; · economic, by fostering the development of new skills and opportunities towards economic growth; · industrial, by demonstrating the outcome on an industrial operational setting, and · scientic, by developing original hybrid and streaming analytic architectures that enable scalable online machine learning strategies and advanced interactive visualization techniques. IMPACT INDUSTRIAL VALIDATION WEBSOCKET BATCH DATA INCREMENTAL ANALYTICS ENGINE DATA COLLECTOR VISUALIZATION LAYER DATA STREAM INCREMENTAL ANALYTICS ENGINE SVG CANVAS GRAPHS GIS HIERARCHY AGGRUPATION MACHINE LEARNING MODELS REAL-TIME INTERACTIVE VISUAL ANALYTICS HOT STRIP MILL PROCESS FLATNESS MAPS SENSOR DATA TIME SERIES HISTORICAL DATA FLATNESS PREDICTION 700.000 REGISTERS FOR EACH VARIABLE 500 GB TIMES SERIES AND FLATNESS MAPS SCALABLE ONLINE MACHINE LEARNING ENGINE FOR BIG DATA REAL-TIME RESULTS INTEGRATED IN APACHE FLINK 32-500 ms. STREAM DATA GENERATION 7870 VARIABLES STRUCTURE AND UNSTRUCTURED DATA INDUSTRIAL VALIDATION DOWNLOAD PROTEUS brochure.PDF COORDINATED BY 11 00 01 10 00 01 11 10 01 00 10 11 01 Apache Apache 11 01 10 01 00 01 00 01 00 01 00 01 1 0 1 1 0 1 0 5 10 15 20 25 30 2 6 4 8 W 25 W 26 11 SA 10 FR 9 TH 8 WE 5 SU 7 TU 6 MO 4 SA 3 FR 2 TH 1 WE 31 TU 30 MO 29 SU Lane 0 Lane 1 Lane 2 Lane 3 Lane 4 Lane 5 Lane 6 JUN 16 ITEM ID: 50 21:00 22:00 29 JUN 23:00 00:00 01:00 02:00 03:00 04:00 05:00 01:47 ITEM ID: 50 Franja 1 Franja 2 Franja 3 Franja 4 Franja 5 Franja 6 Franja 7 1 0 1 1 0 0 0 0 1 https://github.com/proteus-h2020/proteic PROTEIC

SCALABLE ONLINE MACHINE LEARNING FOR PREDICTIVE … · HDFS ABSTRACT PROTEUS mission is to investigate and develop ready-to-use scalable online machine learning algorithms and real-time

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: SCALABLE ONLINE MACHINE LEARNING FOR PREDICTIVE … · HDFS ABSTRACT PROTEUS mission is to investigate and develop ready-to-use scalable online machine learning algorithms and real-time

HDFS

ABSTRACT

PROTEUS mission is to investigate and develop ready-to-use scalable online machine learning algorithms and real-time interactive visual analytics to deal with extremely large data sets and data streams.

The foundation is the use of an optimized implementation of combined batch and streaming processing and building around this later scalable real time processes. New algorithms and techniques will form a library to be integrated into an enhanced version of Apache Flink.

PROTEUS addresses fundamental challenges related to the scalability and responsiveness of analytics capabilities. The requirements are defined by a steelmaking industrial use case, but the techniques developed are flexible and portable to other data stream-based domains.

CONTRIBUTIONS

The project will provide the following specific original contributions:

· New strategies for real-time hybrid computation, batch data and data streams.

· Real-time scalable machine learning for massive, high-velocity and complex data streams analytics.

· Real-time interactive visual analytics for Big Data. Implementation the new advances on top of Apache Flink.

· Real-world industrial validation of the technology developed.

PROJECT DESCRIPTION

PROTEUS presents three key technology components (hybrid computation model for both data-at-rest and data-in-motion, scalable online machine learning and real-time interactive visual analytics) integrated into Apache Flink, and will demonstrate the solution for specific problems in an industrial setting: steelmaking.

The core innovations and value of PROTEUS are based on a new integrated processing engine able to apply complex analytics techniques at scale for batch data (data-at-rest) and data streams (data-in-motion) in a hybrid-merge mode. This predictive engine will be able to provide real-time predictions while self-adapts continuously to learn more complex and refined learning models.

Moreover, visual analytics will be scalable with decreasing latency (interactive) demands using a novel incremental approach that represents the information (both data-in-motion and incremental process of batch data) as data streams.

This project is funded by the European Union(Horizon 2020, Ref: 687691)

SCALABLE ONLINE MACHINE LEARNINGFOR PREDICTIVE ANALYTICS ANDREAL-TIME INTERACTIVE VISUALIZATION

INTEGRATED HYBRID PROCESSING ENGINE

SCALABLE ONLINE MACHINE LEARNING LIBRARY REAL-TIME INTERACTIVE VISUAL ANALYTICS

VISUALIZATIONLAYER

INCREMENTALANALYTICS

ENGINE

DATACOLLECTOR

PREDICTIVEANALYTICS REAL

TIME

INTERACTIVE

DECLARATIVELANGUAGE

CLASSIFICATIONCLUTERING

PREDICCTIONDETECTION

SCALABLE BASIC STREAM

SKETCHES

FAST UPDATEABLE STATE MODEL

SYSTEM

PROGRAMMING APIs

MEMORY MANAGEMENT

Managed FreeJVM HEAP

OPTIMIZATION

Plan BPlan A

DATA PROCESSING

Join

Sort

Reduce

Iterate

DISTRIBUTION RUNTIME

STREAMING

BATCH

Apache

The PROTEUS impact is manifold:

· strategic, by reducing the gap and dependency from the US technology, empowering the EU Big Data platform Apache Flink;

· economic, by fostering the development of new skills and opportunities towards economic growth;

· industrial, by demonstrating the outcome on an industrial operational setting, and

· scientific, by developing original hybrid and streaming analytic architectures that enable scalable online machine learning strategies and advanced interactive visualization techniques.

IMPACT

INDUSTRIAL VALIDATION

WEBSOCKET

BATCHDATA

INCREMENTALANALYTICS ENGINE

DATACOLLECTOR

VISUALIZATIONLAYER

DATASTREAM

INCREMENTALANALYTICSENGINE

SVG CANVAS

GRAPHS GIS

HIERARCHY AGGRUPATION

MACHINE LEARNING MODELS

REAL-TIME INTERACTIVE VISUAL ANALYTICS

HOT STRIP MILL PROCESS

FLATNESSMAPS

SENSORDATA

TIMESERIES

HISTORICALDATA

FLATNESS PREDICTION

700.000REGISTERS FOR EACH VARIABLE

500 GBTIMES SERIES AND FLATNESS MAPS

SCALABLE ONLINE MACHINE

LEARNING ENGINE FOR BIG DATA

REAL-TIME RESULTS

INTEGRATED IN APACHE

FLINK

32-500 ms.STREAM DATA GENERATION

7870VARIABLES

STRUCTURE AND UNSTRUCTURED

DATA

INDUSTRIAL VALIDATION

DOWNLOADPROTEUS brochure.PDF

COORDINATED BY

1 10 00 11 00 00 11 1

1 00 1

0 01 0

1 10 1

Apache

Apache

1 10 1

1 00 1

0 00 1

0 00 1

0 00 1

0 00 1

1 0 1

1

0

1

0

5

10

15

20

25

30

2 64 8

W 25 W 26

11SA

10FR

9TH

8WE

5SU

7TU

6MO

4SA

3FR

2TH

1WE

31TU

30MO

29SU

Lane 0

Lane 1

Lane 2

Lane 3

Lane 4

Lane 5

Lane 6

JUN 16

ITEM ID: 50

21:00 22:00

29 JUN

23:00 00:00 01:00 02:00 03:00 04:00 05:00

01:47ITEM ID: 50

Franja 1Franja 2

Franja 3

Franja 4

Franja 5

Franja 6

Franja 7

101

10

0

0

0

1

https://github.com/proteus-h2020/proteic

PROTEIC