
DICE Horizon 2020 Project, Grant Agreement no. 644869
http://www.dice-h2020.eu
Funded by the Horizon 2020 Framework Programme of the European Union

DICE Project

J.I. Requeno, J. Merseguer, S. Bernardi

Universidad de Zaragoza, Spain

DICE Project

o DICE - Developing Data-Intensive Cloud Applications with Iterative Quality Enhancements

o Horizon 2020 Research & Innovation Action
  Quality-Aware Development for Big Data applications
  Feb 2015 - Jan 2018, 4M Euros budget
  9 partners (Academia & SMEs), 7 EU countries

Motivation

o Software market rapidly shifting to Big Data
  32% compound annual growth rate in the EU through 2016
  35% of Big Data projects are successful [Capgemini 2015]

o ICT-9 call focused on SW quality assurance (QA)
  ISTAG: call to define environments "for understanding the consequences of different implementation alternatives (e.g. quality, robustness, performance, maintenance, evolvability, ...)"

o QA evolving too slowly compared to the technology trends (Big Data, Cloud, DevOps, ...)
  DICE aims at closing the gap
  Still crucial for competitiveness!

Quality Dimensions

o Reliability: availability, fault-tolerance
o Efficiency: performance, costs
o Safety & Privacy: verification (e.g., deadlines), data protection

Some Challenges in Big Data…

o Lack of quality-aware development for Big Data
o How to describe Big Data technologies in MDE
  o Spark, Hadoop/MapReduce, Storm, Cassandra, ...
  o Cloud storage, auto-scaling, private/public/hybrid, ...
o Today no QA toolchain can help reasoning on data-intensive applications
  o What if I double the memory?
  o What if I parallelize the application more?


Performance Analysis of Apache Storm Applications using SPNs

José Merseguer

Universidad de Zaragoza, Spain

Context

o Apache Storm
  o Distributed real-time computation system for processing large volumes of high-velocity data
o Real-time stream-processing applications
  o E.g., customization of searches, sentiment analysis in social networks (Big Data)

The Problem

o Capgemini Research
  o Only 13% of companies have achieved full-scale production on Big Data technologies
o Storm-specific problems
  o Low-latency processing: highly demanding performance requirements
  o Youthfulness of the technology

The Need

o Urgent need for novel, performance-oriented software engineering methodologies and tools capable of dealing with the complexity of such a new environment

Our Proposal

o Assessment of performance requirements
  o While configuring Storm designs for specific execution contexts, i.e., multi-user private or public cloud infrastructures

Our Proposal …

o We have developed a quality-driven framework for Storm:
  o UML modelling of Storm applications
    o We propose a novel UML profile (a domain-specific modelling language)
  o Transformation of the UML Storm models into a Stochastic Petri Net performance model
  o Simulation of the performance model
  o Performance results obtained from the simulation

Benefits

o Benefits of our proposal:
  o Predict the behaviour of the application under future demands (e.g., response time, throughput or utilization)
  o Assess the impact of stress situations on performance parameters
  o Detect performance bottlenecks

Modelling Storm Applications

o A Storm application is designed as a DAG
o Two kinds of nodes:
  o Spouts: sources of information that inject streams of data into the topology
  o Bolts: process input data and produce results
o parallelism: number of concurrent threads executing the same task (spout or bolt)

[Figure: example DAG with spout_1 and spout_2 feeding bolt_1 and bolt_2, which feed bolt_3; one node is annotated with parallelism = 2]

Modelling Storm Applications

o Edges define the connections for the transmission of data from one node to another:
  o weight: number of tuples the next bolt requires for emitting a new message
  o grouping: the way a message is propagated to and handled by the receiving nodes (all, shuffle, subset)

[Figure: the example DAG annotated with weight = 5, weight = 4, weight = 2, grouping = all, parallelism = 2]
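To make the model concrete, the topology above can be sketched as a small data structure. This is our own illustrative Python, not part of the DICE tooling, and the exact assignment of weights and groupings to particular edges is an assumption read off the figure labels:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    kind: str               # "spout" or "bolt"
    parallelism: int = 1    # concurrent threads executing the task

@dataclass
class Edge:
    source: str
    target: str
    weight: int = 1             # tuples the next bolt requires before emitting
    grouping: str = "shuffle"   # "all", "shuffle" or "subset"

@dataclass
class Topology:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, name, kind, parallelism=1):
        self.nodes[name] = Node(name, kind, parallelism)

    def add_edge(self, source, target, weight=1, grouping="shuffle"):
        self.edges.append(Edge(source, target, weight, grouping))

# The DAG of the example (weight/grouping placement is assumed)
topo = Topology()
topo.add_node("spout_1", "spout")
topo.add_node("spout_2", "spout")
topo.add_node("bolt_1", "bolt", parallelism=2)
topo.add_node("bolt_2", "bolt")
topo.add_node("bolt_3", "bolt")
topo.add_edge("spout_1", "bolt_1", weight=5)
topo.add_edge("spout_2", "bolt_2", weight=4, grouping="all")
topo.add_edge("bolt_1", "bolt_3", weight=2)
topo.add_edge("bolt_2", "bolt_3")
```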

Storm Concepts for Performance


UML Modelling for Storm

[Figure: the Storm DAG (spouts and bolts) mapped onto a UML Activity Diagram; stream connections are modelled as synchronous or asynchronous edges]

A UML Profile for Storm

Storm Concept   Stereotype            Tag (Storm property)           MARTE inheritance
Bolt            <<StormBolt>>         hostDemand (exec. time),       <<GaStep>>
                                      parallelism (parallelism)
Spout           <<StormSpout>>        avgEmitRate (emission rate)    <<GaStep>>
Stream          <<StormStreamStep>>   numTuples (weight),            <<GaStep>>
                                      grouping (grouping)
Scheduling                            resMult                        <<GaExecHost>>
                                      capacity                       <<GaCommHost>>
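The table can be read mechanically, e.g., by a model-to-model transformation that looks up which tag carries each Storm property. A minimal sketch of that lookup (our own names, not the DICE implementation):

```python
# Storm concept -> profile stereotype, inherited MARTE stereotype, and
# the tag that annotates each Storm property (taken from the table above)
PROFILE = {
    "Bolt":   {"stereotype": "StormBolt",       "inherits": "GaStep",
               "tags": {"exec. time": "hostDemand", "parallelism": "parallelism"}},
    "Spout":  {"stereotype": "StormSpout",      "inherits": "GaStep",
               "tags": {"emission rate": "avgEmitRate"}},
    "Stream": {"stereotype": "StormStreamStep", "inherits": "GaStep",
               "tags": {"weight": "numTuples", "grouping": "grouping"}},
}

def tag_for(concept, prop):
    """Return the profile tag that carries a given Storm property."""
    return PROFILE[concept]["tags"][prop]
```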


UML Modelling for Storm

Definition of a <<StormSpout>> (Value Specification Language from MARTE):

{ parallelism = $n0,
  avgEmitRate = ( exp = $sp_1, unit = Hz, statQ = mean, source = est ) }

UML Modelling for Storm

Definition of a performance metric:

{ utilization = ( exp = $uti, unit = %, statQ = mean, source = calc ) }

Transformation of the Storm Design

o For evaluation of the metrics, we need to transform the Storm design into a performance model
  o Generalized Stochastic Petri Net (GSPN)
o We propose a set of transformation patterns:
  o Each pattern takes as input a part of the Storm design and produces a GSPN subnet
  o We compose the pieces
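To give intuition for what the composed GSPN computes, here is a toy continuous-time simulation of a single spout-to-bolt pattern: a timed transition deposits tokens into a stream place, and whenever `weight` tokens are available a bolt thread consumes them for an exponentially distributed service time. This is our own illustrative sketch, not the DICE transformation or the GreatSPN engine; all names and parameter values are ours:

```python
import random

def simulate_pattern(spout_rate, weight, service_time, parallelism,
                     horizon=10_000.0, seed=42):
    """Estimate bolt utilization for one spout->bolt GSPN pattern.

    The spout is a timed transition firing at rate `spout_rate`;
    whenever `weight` tokens sit in the stream place, an immediate
    transition hands them to one of `parallelism` bolt threads, which
    holds them for an exponential service time (mean `service_time`)."""
    rng = random.Random(seed)
    tokens = 0                          # marking of the stream place
    busy_until = [0.0] * parallelism    # completion time of each thread
    busy_time = 0.0
    t = rng.expovariate(spout_rate)     # first spout firing
    while t < horizon:
        tokens += 1
        if tokens >= weight:
            tokens -= weight
            # pick the thread that frees up earliest (queueing allowed)
            i = min(range(parallelism), key=lambda k: busy_until[k])
            start = max(t, busy_until[i])
            service = rng.expovariate(1.0 / service_time)
            busy_until[i] = start + service
            busy_time += service
        t += rng.expovariate(spout_rate)
    return busy_time / (horizon * parallelism)

# One bolt thread serving every 5th tuple: offered load ~ (rate/weight)*S
util = simulate_pattern(spout_rate=10.0, weight=5, service_time=0.3,
                        parallelism=1)
```

Doubling the parallelism in this sketch roughly halves the per-thread utilization, which is exactly the kind of "what if I parallelize more?" question the performance model answers.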

Examples of patterns

Final GSPN

Implementation

o We have implemented in Eclipse (Papyrus Modelling environment):
  o The Storm profile
  o The transformation patterns (using QVT) to PNML
  o The transformation to GreatSPN (using Acceleo)
  o The evaluation of performance metrics (throughput, response time, utilization)
o You can download it:
  [8] DICE Consortium. DICE Simulation Tool, 2017. URL: https://github.com/dice-project/DICE-Simulation/

Validation

o First: we used our tool
  1) to model the application
  2) to transform it into a GSPN
  3) to get results
o Second: we deployed the Storm application in a real cluster and got results from the Storm monitoring tool
o Third: we compared the results (relative error)
  o Metric: utilization of the bolts
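The comparison in the third step reduces to a per-bolt relative-error computation. A sketch with hypothetical utilization figures (the numbers below are invented for illustration, not the paper's results):

```python
def relative_error(measured, predicted):
    """Relative error of the model prediction w.r.t. the measurement."""
    return abs(measured - predicted) / measured

# Hypothetical values: utilization measured by the Storm monitoring
# tool vs. utilization predicted by the GSPN simulation
measured  = {"bolt_1": 0.62, "bolt_2": 0.48}
predicted = {"bolt_1": 0.58, "bolt_2": 0.51}

errors = {bolt: relative_error(measured[bolt], predicted[bolt])
          for bolt in measured}
```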

Thanks

www.dice-h2020.eu

High-Level Objectives

o Tackling skill shortage and steep learning curves
  Data-aware methods, models, and tools
o Shorter time to market for Big Data applications
  Cost reduction, without sacrificing product quality
o Decrease development and testing costs
  Select optimal architectures that can meet SLAs
o Reduce number and severity of quality incidents
  Iterative refinement of application design

… in a DevOps fashion

o Software development methods are evolving
o DevOps closes the gap between Dev and Ops
  From agile development to agile delivery
  Lean release cycles with automated tests and tools
  Deep modelling of systems is the key to automation

[Figure: Agile Development spans Business and Dev; DevOps spans Dev and Ops]

DevOps in DICE: Measurement

[Figure: the DevOps loop in DICE. Dev (jenkins, chef; performance unit tests) releases, via Deployment & CI, to Ops (DIA Node 1 and DIA Node 2 over MySQL, NoSQL and S3, serving users); monitoring and incident reports flow back to Dev.]

DevOps in DICE: Early-stage MDE

[Figure: the same DevOps loop, with early-stage quality assessment added on the Dev side before release.]

DevOps in DICE: Enhancement

[Figure: the same DevOps loop, extended with continuous quality engineering (a "shared system view" via MDE), incident report and model correlation, and continuous monitoring and enhancement.]

DICE Integrated Solution

[Figure: the DICE integrated solution. Domain Models and QA Models feed an Architecture Model (DICE and MARTE profiles); Platform-Independent and Platform-Specific Models, with a Platform Description, drive Deployment & Continuous Integration of the Data-Intensive Application from the DICE IDE; Data Awareness, Continuous Monitoring and Continuous Enhancement close the loop.]

Bringing QA and DevOps together

[Figure: DICE at the centre, bridging SPE activities on the Dev side (requirements, SLAs, comparing alternatives, load testing, cost tradeoffs, profiling, testing) and APM activities on the Ops side (monitoring, capacity management, incident analysis, deployment, regression, bottleneck identification, root cause analysis, feedbacks, user behaviour, adaptation).]

Quality-Aware MDE

• UML MARTE profile, UML DAM profile, Palladio, …

[Figure: quality-aware models capture failure probability, usage profile and system behaviour.]

Quality-Aware MDE

[Figure: the quality-aware MDE chain. Domain Models and QA Models feed an Architecture Model (MARTE); Platform-Independent and Platform-Specific Models, with a Platform Description, support code stub generation for the Data-Intensive Application; Simulation Tools and Cost Optimization Tools analyse the models.]

Year 1 Milestones

Milestone: Baseline and Requirements - July 2015 [COMPLETED]
Deliverables:
• State of the art analysis
• Requirement specification
• Dissemination, communication, collaboration and standardisation report
• Data management plan

Milestone: Architecture Definition - January 2016
Deliverables:
• Design and quality abstractions
• DICE simulation tools
• DICE verification tools
• Monitoring and data warehousing tools
• DICE delivery tools
• Architecture definition and integration plan
• Exploitation plan

Demonstrators

Case study: Distributed data-intensive media system (ATC)
Domain: News & Media; Social media
Features & Challenges: Large-scale software; Data velocities; Data volumes; Data granularity; Multiple data sources and channels; Privacy

Case study: Big Data for e-Government (Netfective)
Domain: E-Gov application
Features & Challenges: Data volumes; Legacy data; Data consolidation; Data stores; Privacy; Forecasting and data analysis

Case study: Geo-fencing (Prodevelop)
Domain: Maritime sector
Features & Challenges: Vessel movements; Safety requirements; Streaming & CEP; Geographical information
