DICE Project

J.I. Requeno, J. Merseguer, S. Bernardi
Universidad de Zaragoza, Spain

DICE Horizon 2020 Project, Grant Agreement no. 644869
http://www.dice-h2020.eu
Funded by the Horizon 2020 Framework Programme of the European Union


DICE Project

o DICE - Developing Data-Intensive Cloud Applications with Iterative Quality Enhancements

o Horizon 2020 Research & Innovation Action
  o Quality-Aware Development for Big Data applications
  o Feb 2015 - Jan 2018, 4M Euros budget
  o 9 partners (Academia & SMEs), 7 EU countries


Motivation

o Software market rapidly shifting to Big Data
  o 32% compound annual growth rate in EU through 2016
  o 35% of Big Data projects are successful [CapGemini 2015]

o ICT-9 call focused on SW quality assurance (QA)
  o ISTAG: call to define environments “for understanding the consequences of different implementation alternatives (e.g. quality, robustness, performance, maintenance, evolvability, ...)”

o QA evolving too slowly compared to the technology trends (Big Data, Cloud, DevOps ...)
  o DICE aims at closing the gap
  o Still crucial for competitiveness!


Quality Dimensions

o Reliability
  o Availability
  o Fault-tolerance

o Efficiency
  o Performance
  o Costs

o Safety & Privacy
  o Verification (e.g., deadlines)
  o Data protection


Some Challenges in Big Data…

o Lack of quality-aware development for Big Data
  o How to describe Big Data technologies in MDE
    o Spark, Hadoop/MapReduce, Storm, Cassandra, ...
    o Cloud storage, auto-scaling, private/public/hybrid, ...

o Today no QA toolchain can help reasoning on data-intensive applications
  o What if I double memory?
  o What if I parallelize the application more?


Performance Analysis of Apache Storm Applications using SPNs

José Merseguer

Universidad de Zaragoza, Spain


Context

o Apache Storm
  o Distributed real-time computation system for processing large volumes of high-velocity data

o Real-time data-processing stream applications
  o E.g., customization of searches, sentiment analysis in social networks (Big Data)


The Problem

o Capgemini Research
  o Only 13% of companies have achieved full-scale production on Big Data technologies

o Storm-specific problems
  o Low-latency processing: highly demanding performance requirements
  o Youthfulness of the technology


The Need

o Urgent need for novel, performance-oriented software engineering methodologies and tools capable of dealing with the complexity of such a new environment


Our Proposal

o Assessment of performance requirements
  o While configuring Storm designs for specific execution contexts, i.e., multi-user private or public cloud infrastructures


Our Proposal …

o We have developed a quality-driven framework for Storm
  o UML modelling of Storm applications
    o We propose a novel UML profile: a domain-specific modelling language
  o Transformation of the UML Storm models into a Stochastic Petri Net performance model
  o Simulation of the performance model
  o Getting performance results from the simulation


Benefits

o Benefits of our proposal
  o Predict the behaviour of the application for future demands (e.g., response time, throughput or utilization)
  o Assess the impact of stress situations on some performance parameters
  o Detect performance bottlenecks
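
As a rough illustration of what "utilization" and "bottleneck" mean here, the sketch below applies the Utilization Law from operational analysis (utilization = throughput × service time, divided by the number of threads). This is a swapped-in back-of-envelope technique, not the SPN-based analysis proposed in these slides, and the bolt names and all numbers are hypothetical.

```python
def utilization(arrival_rate_hz: float, exec_time_s: float, parallelism: int) -> float:
    """Per-thread utilization from the Utilization Law: U = X * S / m."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    return arrival_rate_hz * exec_time_s / parallelism


# Hypothetical bolts: (tuples arriving per second, seconds per tuple, threads).
bolts = {
    "bolt_1": (100.0, 0.004, 1),
    "bolt_2": (100.0, 0.012, 2),
    "bolt_3": (50.0, 0.006, 1),
}

loads = {name: utilization(*params) for name, params in bolts.items()}

# The most heavily utilized bolt is the first candidate for a bottleneck.
bottleneck = max(loads, key=loads.get)

for name, u in loads.items():
    print(f"{name}: {u:.0%} utilized")
print("bottleneck candidate:", bottleneck)
```

Doubling the parallelism of a bolt halves its estimated utilization in this simple model, which is the kind of "what if" question a performance model is meant to answer before deployment.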


Modelling Storm Applications

o A Storm application is designed as a DAG (directed acyclic graph)
  o Two kinds of nodes:
    o Spouts, sources of information that inject streams of data into the topology
    o Bolts, which process input data and produce results
  o parallelism, number of concurrent threads executing the same task (spout or bolt)

[Figure: example topology with nodes spout_1, spout_2, bolt_1, bolt_2 and bolt_3; one node is annotated parallelism=2]


Modelling Storm Applications

o Edges define the connections for the transmission of data from one node to another:
  o weight, number of tuples the next bolt requires for emitting a new message
  o grouping, the way a message is propagated to and handled by the receiving nodes (all, shuffle, subset)

[Figure: the same topology with edges annotated weight = 5, weight = 4, weight = 2 and grouping = all; one node is annotated parallelism=2]
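
The node and edge attributes listed on these slides can be captured in a small data structure. The following is a minimal sketch, not part of the DICE tooling: the class names are hypothetical, and although the node names are taken from the figure, the edge list (who connects to whom, and which annotation sits on which edge or node) is an assumption, since it cannot be recovered from the slide text.

```python
from dataclasses import dataclass, field
from graphlib import CycleError, TopologicalSorter  # Python 3.9+

GROUPINGS = {"all", "shuffle", "subset"}  # the three groupings named on the slide


@dataclass
class Node:
    name: str
    kind: str             # "spout" or "bolt"
    parallelism: int = 1  # concurrent threads executing this task


@dataclass
class Edge:
    src: str
    dst: str
    weight: int = 1       # tuples the receiving bolt needs before it emits
    grouping: str = "shuffle"


@dataclass
class Topology:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, name, kind, parallelism=1):
        if kind not in ("spout", "bolt"):
            raise ValueError(f"unknown node kind: {kind}")
        self.nodes[name] = Node(name, kind, parallelism)

    def add_edge(self, src, dst, weight=1, grouping="shuffle"):
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be declared first")
        if self.nodes[dst].kind == "spout":
            raise ValueError("spouts are sources and take no input")
        if grouping not in GROUPINGS:
            raise ValueError(f"unknown grouping: {grouping}")
        self.edges.append(Edge(src, dst, weight, grouping))

    def processing_order(self):
        """Return node names in topological order; raises CycleError if cyclic."""
        sorter = TopologicalSorter({name: set() for name in self.nodes})
        for edge in self.edges:
            sorter.add(edge.dst, edge.src)  # dst depends on src
        return list(sorter.static_order())

    def is_dag(self):
        try:
            self.processing_order()
        except CycleError:
            return False
        return True


# Node names come from the figure; the edges and annotations are assumed.
topo = Topology()
topo.add_node("spout_1", "spout")
topo.add_node("spout_2", "spout")
topo.add_node("bolt_1", "bolt")
topo.add_node("bolt_2", "bolt", parallelism=2)
topo.add_node("bolt_3", "bolt")
topo.add_edge("spout_1", "bolt_1", weight=5)
topo.add_edge("spout_2", "bolt_2", weight=4)
topo.add_edge("bolt_1", "bolt_3", weight=2, grouping="all")
topo.add_edge("bolt_2", "bolt_3")

order = topo.processing_order()
print(order)
```

Checking acyclicity up front is a cheap sanity test: any design that fails `is_dag()` is not a valid Storm topology in the sense used here.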


Storm Concepts for Performance


UML Modelling for Storm

[Figure: the example topology (spout_1, spout_2, bolt_1, bolt_2, bolt_3) shown as a DAG and as a UML Activity Diagram; connections are labelled synchronous or asynchronous]


A UML Profile for Storm

Storm Concept      Stereotype             Tag            MARTE inheritance
Bolt               <<StormBolt>>                         <<GaStep>>
  exec. time                              hostDemand
  parallelism                             parallelism
Spout              <<StormSpout>>                        <<GaStep>>
  emission rate                           avgEmitRate
Stream             <<StormStreamStep>>                   <<GaStep>>
  weight                                  numTuples
  grouping                                grouping
Scheduling                                resMult        <<GaExecHost>>
                                          capacity       <<GaCommHost>>


UML Modelling for Storm


UML Modelling for Storm

Definition of a <<StormSpout>> (Value Specification Language from MARTE):

{ parallelism = $n0,
  avgEmitRate = (exp = $sp_1, unit = Hz, statQ = mean, source = est)
}


UML Modelling for Storm

Definition of a performance metric:

{ utilization = (exp = $uti, unit = %, statQ = mean, source = calc) }


UML Modelling for Storm


Transformation of the Storm

o For evaluation of the metrics, we need to transform the Storm design into a performance model
  o Generalized Stochastic Petri Net (GSPN)

o We propose a set of transformation patterns
  o Each pattern takes as input a part of the Storm design and produces a GSPN subnet
  o We compose the pieces
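
The pattern-and-compose idea can be sketched as follows. This is only an illustration of the mechanism (one function per design element, each returning a subnet, merged by fusing elements with the same name). The subnets below are deliberately simplified and hypothetical: they are not the transformation patterns of the DICE tool, which appear in these slides only as figures, and every class, function and naming convention here is an assumption.

```python
from dataclasses import dataclass, field
from functools import reduce


@dataclass
class Subnet:
    places: set = field(default_factory=set)
    transitions: dict = field(default_factory=dict)  # name -> firing rate (Hz)
    arcs: set = field(default_factory=set)           # (source, target, multiplicity)
    marking: dict = field(default_factory=dict)      # place -> initial tokens

    def merge(self, other):
        """Union two subnets; elements with the same name are fused."""
        return Subnet(
            self.places | other.places,
            {**self.transitions, **other.transitions},
            self.arcs | other.arcs,
            {**self.marking, **other.marking},
        )


def spout_pattern(name, emit_rate_hz):
    # A timed transition with no input place: it fires at the emission rate.
    return Subnet(transitions={f"t_{name}": emit_rate_hz})


def bolt_pattern(name, exec_time_s, parallelism):
    # Idle threads are tokens; executing borrows one thread and returns it.
    idle, run = f"idle_{name}", f"t_{name}"
    return Subnet(
        places={idle},
        transitions={run: 1.0 / exec_time_s},
        arcs={(idle, run, 1), (run, idle, 1)},
        marking={idle: parallelism},
    )


def stream_pattern(src, dst, weight):
    # A queue place between producer and consumer; the consumer needs
    # `weight` tuples from the queue to fire once.
    queue = f"q_{src}_{dst}"
    return Subnet(
        places={queue},
        arcs={(f"t_{src}", queue, 1), (queue, f"t_{dst}", weight)},
        marking={queue: 0},
    )


def compose(subnets):
    """Merge all subnets into one net, starting from an empty net."""
    return reduce(Subnet.merge, subnets, Subnet())


# Hypothetical two-node design: one spout feeding one bolt.
gspn = compose([
    spout_pattern("spout_1", emit_rate_hz=10.0),
    bolt_pattern("bolt_1", exec_time_s=0.05, parallelism=2),
    stream_pattern("spout_1", "bolt_1", weight=5),
])
print(sorted(gspn.places))
print(sorted(gspn.transitions))
```

Because the stream pattern refers to the transitions `t_spout_1` and `t_bolt_1` by name, merging it with the node subnets is what connects the pieces; no pattern needs to know the whole design.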


Examples of patterns


Final GSPN


Implementation

o We have implemented in Eclipse (Papyrus modelling environment):
  o The Storm profile
  o The transformation patterns (using QVT) to PNML
  o Transformation to GreatSPN (using Acceleo)
  o The evaluation of performance metrics (throughput, response time, utilization)

o You can download:
  o [8] DICE Consortium. DICE Simulation Tool, 2017. URL: https://github.com/dice-project/DICE-Simulation/


Validation

o First: We used our tool
  o 1) to model the application
  o 2) to transform it into a GSPN
  o 3) to get results

o Second: We deployed the Storm application in a real cluster and got results from the Storm monitoring tool

o Third: We compared results (relative error)
  o Metric: Utilization of the bolts
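
The comparison in the third step can be expressed as a relative error between the predicted and the measured utilization of each bolt. The sketch below assumes the usual definition, |predicted − measured| / |measured|; the slides do not spell out the formula, and the values used are made-up placeholders, not results from the validation.

```python
def relative_error(predicted: float, measured: float) -> float:
    """Relative error of a model prediction against a measured value."""
    if measured == 0:
        raise ValueError("measured value must be non-zero")
    return abs(predicted - measured) / abs(measured)


# Hypothetical utilizations (fraction of time busy), NOT results from the slides.
predicted = {"bolt_1": 0.42, "bolt_2": 0.61}  # e.g. from simulating the GSPN
measured = {"bolt_1": 0.40, "bolt_2": 0.65}   # e.g. from the cluster's monitoring

for bolt in predicted:
    err = relative_error(predicted[bolt], measured[bolt])
    print(f"{bolt}: relative error {err:.1%}")
```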


Thanks

www.dice-h2020.eu


High-Level Objectives

o Tackling skill shortage and steep learning curves
  o Data-aware methods, models, and tools

o Shorter time to market for Big Data applications
  o Cost reduction, without sacrificing product quality

o Decrease development and testing costs
  o Select optimal architectures that can meet SLAs

o Reduce number and severity of quality incidents
  o Iterative refinement of application design


… in a DevOps fashion

o Software development methods are evolving

o DevOps closes the gap between Dev and Ops
  o From agile development to agile delivery
  o Lean release cycles with automated tests and tools
  o Deep modelling of systems is the key to automation

[Figure: Business, Dev and Ops, with spans labelled Agile Development and DevOps]


DevOps in DICE: Measurement

[Figure: Dev and Ops around a data-intensive application (DIA Node 1, DIA Node 2, Users) with MySQL, NoSQL and S3 storage. Labels: Deployment & CI (jenkins, chef), release, performance unit tests, monitoring and incident report]


DevOps in DICE: Early-stage MDE

[Figure: Dev and Ops around a data-intensive application (DIA Node 1, DIA Node 2, Users) with MySQL, NoSQL and S3 storage. Labels: Deployment & CI (jenkins, chef), release, performance unit tests, monitoring and incident report, early-stage quality assessment]


DevOps in DICE: Enhancement

[Figure: Dev and Ops around a data-intensive application (DIA Node 1, DIA Node 2, Users) with MySQL, NoSQL and S3 storage. Labels: Deployment & CI (jenkins, chef), release, performance unit tests, incident report & model correlation, continuous quality engineering (“shared system view” via MDE), continuous monitoring and enhancement]


DICE Integrated Solution

[Figure labels: DICE IDE; Domain Models; Platform-Indep. Model; Architecture Model; Platform-Specific Model; Platform Description; QA Models; DICE; MARTE; Data Awareness; Deployment & Continuous Integration; Continuous Monitoring; Continuous Enhancement; Data Intensive Application]


Bringing QA and DevOps together

[Figure labels: DICE; Requirements; SLAs; Compare Alternatives; Load testing; Cost Tradeoffs; Monitoring; Capacity Management; Incident Analysis; Deployment; Profiling; SPE; Testing; APM; Regression; Bottleneck Identification; Root Cause Analysis; Feedbacks; User behaviour; Adaptation]


Quality-Aware MDE

• UML MARTE profile, UML DAM profile, Palladio, …

[Figure labels: Failure Probability; Usage Profile; System Behaviour]


Quality-Aware MDE

[Figure labels: Domain Models; Platform-Indep. Model; Architecture Model; Platform-Specific Model; Platform Description; QA Models; MARTE; Code stub generation; Simulation Tools; Cost Optimization Tools; Data Intensive Application]


Year 1 Milestones

Milestone: Baseline and Requirements - July 2015 [COMPLETED]
Deliverables:
• State of the art analysis
• Requirement specification
• Dissemination, communication, collaboration and standardisation report
• Data management plan

Milestone: Architecture Definition - January 2016
Deliverables:
• Design and quality abstractions
• DICE simulation tools
• DICE verification tools
• Monitoring and data warehousing tools
• DICE delivery tools
• Architecture definition and integration plan
• Exploitation plan


Demonstrators

Case study: Distributed data-intensive media system (ATC)
Domain:
• News & Media
• Social media
Features & Challenges:
• Large-scale software
• Data velocities
• Data volumes
• Data granularity
• Multiple data sources and channels
• Privacy

Case study: Big Data for e-Government (Netfective)
Domain:
• E-Gov application
Features & Challenges:
• Data volumes
• Legacy data
• Data consolidation
• Data stores
• Privacy
• Forecasting and data analysis

Case study: Geo-fencing (Prodevelop)
Domain:
• Maritime sector
Features & Challenges:
• Vessel movements
• Safety requirements
• Streaming & CEP
• Geographical information