In model-driven software engineering (MDE), model queries are a core technology behind many tool- and transformation-specific challenges such as design rule validation, model synchronization, view maintenance and simulation. As software models rapidly grow in size and complexity, traditional MDE tools frequently face scalability issues that decrease the productivity of engineers and increase development costs. Incremental graph queries offer a graph pattern based language for capturing queries. Furthermore, the result set of a query is cached and incrementally maintained upon model changes to provide instantaneous query response time. In this talk, a brief overview is first given of the EMF-IncQuery framework (an official Eclipse subproject). We then discuss how to evaluate incremental queries over a distributed cloud infrastructure (to scale up from a single-node tool to a cluster of nodes) deployed over popular database back-ends (such as Cassandra, 4store and Neo4j). We present our first benchmarking experiments with IncQuery-D to highlight that distributed incremental model queries can perform significantly better than the native query technologies of the underlying database back-end, especially for complex queries.
Budapest University of Technology and Economics
Department of Measurement and Information Systems
Distributed Incremental Model Queries over the Cloud: Engineering and Deployment Challenges
Dániel Varró
Budapest University of Technology and Economics
Fault Tolerant Systems Research Group
Outline of the Talk
Motivation & Background:
• Validation of design rules
• Graph pattern matching
Incremental Model Queries: The EMF-IncQuery framework
• Language - Execution
Distributed Incremental Model Queries (IncQuery-D)
• Architecture -
Performance Benchmarks
• Distributed model load
• Incremental query evaluation
Main Contributors
o István Ráth (lead)
o Ákos Horváth
o Gábor Bergmann
o Ábel Hegedüs
o Zoltán Ujhelyi
o Benedek Izsó
o Gábor Szárnyas
o Csaba Debreceni
o Dénes Harmath
o József Makai
o Dániel Stein
SCALABLE MODEL DRIVEN ENGINEERING
Scalable MDE: The MONDO Project
Models and Languages
• Large and heterogeneous
• Construction
• Visualization
Queries and Transformations
• Executed over large models
• Incremental
• Lazy
• Parallel
Collaboration
• Offline (SVN)
• Online (Gdocs)
• Many collaborators
• Secure access
Persistent Storage
• Efficient
• Secure
• Interoperability
Case studies:
• validate solutions through real case studies
• guided by industrial advisory board
Prototype tools:
• open source software
• open benchmarks
Academic Partners:
• Univ. York (UK), Univ. Autónoma Madrid (ES), ARMINES (FR), BME (HU)
Industrial Partners:
• The Open Group (UK), Uninova (PT), Softeam (FR), Soft-Maint (FR), IKERLAN (ES)
MOTIVATION FOR INCREMENTAL MODEL QUERIES
Motivation: Early validation of design rules
SystemSignalGroup design rule (from AUTOSAR)
o A SystemSignal and its group must be in the same IPdu
o Challenge: find violations quickly in large models
o New difficulties
• reverse navigation
• complex manual solution
AUTOSAR:
• standardized SW architecture of the automotive industry
• now supported by modern modeling tools
Design Rule/Well-formedness constraint:
• every valid car architecture must respect it
• designers are immediately notified if it is violated
Challenge:
• >500 design rules in AUTOSAR tools
• >1 million elements in AUTOSAR models
• models are constantly evolved by designers
Domain-Specific Modeling Languages
Abstract
Meta-model
Model
«type»
Validation of Well-formedness Constraints
Meta-model
Model
pattern switchWOSignal(sw) {
    Switch(sw);
    neg find switchHasSignal(sw);
}

pattern switchHasSignal(sw) {
    Switch(sw);
    Signal(sig);
    Signal.mountedTo(sig, sw);
}
Query
Modify
User
Result
Domain-specific modeling languages
Model sizes in practice
Models with 10M+ elements are common:
o Car industry
o Avionics
o Source code analysis
Models evolve and change continuously
Validation can take hours

Application          Model size
System models        10^8
Sensor data          10^9
Geospatial models    10^12

Source: Markus Scheidgen, How Big are Models – An Estimation, 2012.
MODEL QUERIES AND GRAPH PATTERN MATCHING
What is a model query?
For a programmer:
o A piece of code that searches for parts of the model
For the scientist:
o Query = set of constraints that have to be satisfied by (parts of) the (graph) model
o Result = set of model element tuples that satisfy the constraints of the query
o Match = binding of constraint variables to model elements
A query engine supports
o the definition & execution of model queries
Query(A,B) ∧cond_i(A_i,B_i)
• all tuples of model elements a,b
• satisfying the query condition
• along the match A=a and B=b
• parameters A,B can be input/output
Graph Pattern Matching for Queries
Match:
o m: L → G (graph morphism)
o CSP:
• Variables: Nodes of L
• Constraints: Edges of L
• Domain values: G
o Complexity: |G|^|L|
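The CSP reading of pattern matching can be made concrete with a deliberately naive matcher. The sketch below is an illustration only: the function name `match_pattern`, the dictionary-based graph encoding and the toy railway model are assumptions, not part of any IncQuery API. It enumerates every assignment of pattern variables to model nodes, which is exactly where the |G|^|L| bound comes from. Matches are plain (not necessarily injective) morphisms.

```python
from itertools import product


def match_pattern(pattern_nodes, pattern_edges, graph_nodes, graph_edges):
    """Enumerate all matches m: L -> G by brute force.

    pattern_nodes: dict variable -> required type      (nodes of L)
    pattern_edges: list of (label, src_var, trg_var)   (edges of L)
    graph_nodes:   dict node id -> type                (nodes of G)
    graph_edges:   set of (label, src_id, trg_id)      (edges of G)
    """
    variables = list(pattern_nodes)
    matches = []
    # Each variable ranges over all model nodes: |G|^|L| candidate bindings.
    for candidate in product(graph_nodes, repeat=len(variables)):
        binding = dict(zip(variables, candidate))
        types_ok = all(graph_nodes[binding[v]] == t
                       for v, t in pattern_nodes.items())
        edges_ok = all((label, binding[s], binding[t]) in graph_edges
                       for label, s, t in pattern_edges)
        if types_ok and edges_ok:
            matches.append(binding)
    return matches


# Hypothetical toy model: one route with one switch position, switch and sensor.
graph_nodes = {"r1": "Route", "sp1": "SwitchPosition",
               "sw1": "Switch", "s1": "Sensor"}
graph_edges = {("switchPosition", "r1", "sp1"),
               ("switch", "sp1", "sw1"),
               ("sensor", "sw1", "s1")}
pattern_nodes = {"route": "Route", "sp": "SwitchPosition",
                 "switch": "Switch", "sensor": "Sensor"}
pattern_edges = [("switchPosition", "route", "sp"),
                 ("switch", "sp", "switch"),
                 ("sensor", "switch", "sensor")]

matches = match_pattern(pattern_nodes, pattern_edges, graph_nodes, graph_edges)
```

With 4 model nodes and 4 pattern variables the loop already inspects 256 candidate bindings to find a single match, which motivates search plans and caching.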
[Figure: pattern graph L over route: Route, sp: SwitchPosition, switch: Switch and sensor: Sensor, connected by the edges switchPosition, switch, sensor and routeDefinition, matched against a model graph G whose switch positions are labelled straight and left]
All sensors with a switch that belongs to a route must directly be linked to the same route.
Graph Pattern Matching (Local Search)
Search Plan:
o Select the first node to be matched
o Define an ordering on graph pattern edges
Search is restarted from scratch each time
[Figure: search plan numbering the elements of the pattern over route: Route, sp: SwitchPosition, switch: Switch and sensor: Sensor from 0 to 4]
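A search plan can be illustrated with a small backtracking matcher. This is a sketch under assumptions: the names `local_search`, `node_types` and `out_edges` are hypothetical, only forward navigation along edges is supported, and target types are assumed to be implied by the edge labels. The plan fixes the first node to bind and the order in which pattern edges are traversed. Each step either extends the binding with a new variable or checks an edge between two variables that are already bound.

```python
def local_search(first_var, first_type, plan, node_types, out_edges):
    """Yield all matches following a search plan.

    plan:      ordered list of (src_var, label, trg_var); src_var must be
               bound by the time its step is reached
    out_edges: dict (source node, label) -> list of target nodes
    """
    def extend(binding, step):
        if step == len(plan):
            yield dict(binding)
            return
        src, label, trg = plan[step]
        for target in out_edges.get((binding[src], label), ()):
            if trg in binding:
                # Check step: both ends are bound, the edge must exist.
                if binding[trg] == target:
                    yield from extend(binding, step + 1)
            else:
                # Extend step: bind a new variable, then backtrack.
                binding[trg] = target
                yield from extend(binding, step + 1)
                del binding[trg]

    for node, node_type in node_types.items():
        if node_type == first_type:
            yield from extend({first_var: node}, 0)


# Hypothetical toy model; r2 is a route without switch positions.
node_types = {"r1": "Route", "r2": "Route", "sp1": "SwitchPosition",
              "sw1": "Switch", "s1": "Sensor"}
out_edges = {("r1", "switchPosition"): ["sp1"],
             ("sp1", "switch"): ["sw1"],
             ("sw1", "sensor"): ["s1"],
             ("r1", "routeDefinition"): ["s1"]}
plan = [("route", "switchPosition", "sp"),
        ("sp", "switch", "switch"),
        ("switch", "sensor", "sensor"),
        ("route", "routeDefinition", "sensor")]

found = list(local_search("route", "Route", plan, node_types, out_edges))
```

Nothing is remembered between two calls, so every re-evaluation after a model change repeats the whole traversal.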
Incremental Graph Pattern Matching
Main idea: trade more space for less time
o Cache matches of patterns
o Instantly retrieve match (if valid)
o Update caches upon model changes
o Notify about relevant changes
Approaches:
o TREAT, LEAPS, RETE, …
o Tools: VIATRA, GROOVE, MoTE, TCore
[Figure: cached match table with the columns route, sp, switch and sensor and a partial row r1, sp1, sw1]
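The caching idea can be sketched for the switchWOSignal pattern shown earlier (a switch with no mounted signal). The class below is a hand-written illustration, not how an incremental engine is actually implemented; the class name, its methods and the listener callback shape are all assumptions. It keeps the match set in memory, updates it on each elementary model change, and notifies listeners only when a match appears or disappears.

```python
class SwitchWithoutSignalCache:
    """Keeps the match set of switchWOSignal up to date on every change."""

    def __init__(self):
        self.signals_of = {}   # switch -> set of signals mounted to it
        self.matches = set()   # cached result set: switches without a signal
        self.listeners = []    # callbacks receiving (kind, switch)

    def _notify(self, kind, switch):
        for listener in self.listeners:
            listener(kind, switch)

    def add_switch(self, sw):
        if sw in self.signals_of:
            return
        self.signals_of[sw] = set()
        self.matches.add(sw)
        self._notify("appeared", sw)

    def mount_signal(self, sig, sw):
        signals = self.signals_of[sw]
        if not signals:
            # First signal on this switch: the match disappears.
            self.matches.discard(sw)
            self._notify("disappeared", sw)
        signals.add(sig)

    def unmount_signal(self, sig, sw):
        signals = self.signals_of[sw]
        signals.discard(sig)
        if not signals and sw not in self.matches:
            # Last signal removed: the match reappears.
            self.matches.add(sw)
            self._notify("appeared", sw)


cache = SwitchWithoutSignalCache()
events = []
cache.listeners.append(lambda kind, sw: events.append((kind, sw)))

cache.add_switch("sw1")
cache.add_switch("sw2")
cache.mount_signal("sig1", "sw1")
after_mount = set(cache.matches)
cache.unmount_signal("sig1", "sw1")
after_unmount = set(cache.matches)
```

Reading `cache.matches` costs nothing beyond the lookup, and each update touches only the switch affected by the change, at the price of keeping `signals_of` and `matches` in memory.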
Batch vs. Live Query Scenarios
Batch query (pull / request-driven):
1. Designer selects a query
2. One/All matches are calculated
3. Rule is applied on one/all matches
4. All Steps 1-3 are redone if model changes
Query results obtained upon designer demand
Live query (push / event-driven):
1. Model is loaded
2. Rule system is loaded
3. Calculate full match set
4. Model is changed (rules fired or designer updates)
5. Iterate Steps 3 and 4 until rule system is stopped
Query results are always available for designer
INCREMENTAL MODEL QUERIES: THE EMF-INCQUERY PROJECT
Definition
• Declarative graph query language
• Transitive closure, Negative cond., etc.
• Compositional, reusable
Execution
• Incremental evaluation
• Cache result set
• Maintain incrementally upon model change
Features
• Derived features
• On-the-fly validation
• View generation, Notifications, Soft links, Databinding
EMF-IncQuery: An Open Source Eclipse Project
http://eclipse.org/incquery
The IncQuery (IQ) Graph Query Language
IQ: declarative query language
o Attribute constraints
o Local + global queries
o Compositionality + Reusability
o Recursion, Negation
o Transitive Closure
o Syntax: DATALOG style

pattern routeSensor(sensor: Sensor) = {
    TrackElement.sensor(switch, sensor);
    Switch(switch);
    SwitchPosition.switch(sp, switch);
    SwitchPosition(sp);
    Route.switchPosition(route, sp);
    Route(route);
    neg find head(route, sensor);
}

pattern head(R, Sen) = {
    Route.routeDefinition(R, Sen);
}

ModelQuery(A,B):
• tuples of model elements A, B
• satisfying the query condition
• enumerate 1 / all instances
• A,B can be input or output
TOOL DEMO: INCQUERY Development Tools
Query Explorer
Pattern Editor
Queries are applied and updated on-the-fly
• Works with most EMF editors out-of-the-box
• Reveals matches as selection
Incremental Query Evaluation by RETE
AUTOSAR well-formedness validation rule
[Figure: Rete network over the instance model, which contains a valid and an invalid model fragment; the input nodes Communication channel, Logical signal, Mapping and Physical signal feed the worker nodes (join, join, antijoin), which produce the result set]
1. Fill the input nodes
2. Fill the worker nodes
3. Read the result set
4. Modify the model
5. Propagate the changes
6. Read the changes in the result set (deltas)
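The fill, propagate and read-delta steps of this slide can be sketched with a toy Rete network. The code below is an illustration under assumptions: it is not the EMF-IncQuery implementation, all class names are hypothetical, and it uses the railway routeSensor pattern (two joins and one antijoin) instead of the AUTOSAR rule of the figure. Tuples flow through the network as signed deltas (+1 insert, -1 delete), and every node stores partial results.

```python
from collections import Counter

def key_of(tup, indices):
    return tuple(tup[i] for i in indices)

class Node:
    """Forwards (sign, tuple) deltas to the input slots of its children."""
    def __init__(self):
        self.children = []  # list of (child node, input slot)
    def emit(self, sign, tup):
        for child, slot in self.children:
            child.receive(slot, sign, tup)

class Input(Node):
    """Input node: indexes the tuples of one edge type."""
    def __init__(self):
        super().__init__()
        self.memory = set()
    def update(self, sign, tup):
        if sign > 0 and tup not in self.memory:
            self.memory.add(tup)
            self.emit(+1, tup)
        elif sign < 0 and tup in self.memory:
            self.memory.remove(tup)
            self.emit(-1, tup)

class Join(Node):
    """Worker node: emits left + right for tuples that agree on the key."""
    def __init__(self, left_key, right_key):
        super().__init__()
        self.keys = (left_key, right_key)
        self.memory = (set(), set())
    def receive(self, slot, sign, tup):
        own, other = self.memory[slot], self.memory[1 - slot]
        (own.add if sign > 0 else own.discard)(tup)
        for partner in list(other):
            left, right = (tup, partner) if slot == 0 else (partner, tup)
            if key_of(left, self.keys[0]) == key_of(right, self.keys[1]):
                self.emit(sign, left + right)

class AntiJoin(Node):
    """Worker node: passes left tuples with no right tuple on the same key."""
    def __init__(self, left_key, right_key):
        super().__init__()
        self.keys = (left_key, right_key)
        self.left = set()
        self.blockers = Counter()  # key -> number of right tuples
    def receive(self, slot, sign, tup):
        if slot == 0:
            (self.left.add if sign > 0 else self.left.discard)(tup)
            if self.blockers[key_of(tup, self.keys[0])] == 0:
                self.emit(sign, tup)
            return
        key = key_of(tup, self.keys[1])
        was_free = self.blockers[key] == 0
        self.blockers[key] += sign
        if was_free != (self.blockers[key] == 0):
            # The key flipped between blocked and free: retract or restore.
            for left in list(self.left):
                if key_of(left, self.keys[0]) == key:
                    self.emit(-sign, left)

class Production(Node):
    """Production node: holds the result set and the deltas it received."""
    def __init__(self):
        super().__init__()
        self.result, self.deltas = set(), []
    def receive(self, slot, sign, tup):
        (self.result.add if sign > 0 else self.result.discard)(tup)
        self.deltas.append((sign, tup))

# Input nodes, one per edge type of routeSensor.
switch_position, switch_edge = Input(), Input()    # (route, sp), (sp, switch)
sensor_edge, route_definition = Input(), Input()   # (switch, sensor), (route, sensor)
join1 = Join((1,), (0,))         # rows: (route, sp, sp, switch)
join2 = Join((3,), (0,))         # rows: (route, sp, sp, switch, switch, sensor)
anti = AntiJoin((0, 5), (0, 1))  # neg find head(route, sensor)
production = Production()
wiring = [(switch_position, join1, 0), (switch_edge, join1, 1),
          (join1, join2, 0), (sensor_edge, join2, 1),
          (join2, anti, 0), (route_definition, anti, 1), (anti, production, 0)]
for parent, child, slot in wiring:
    parent.children.append((child, slot))

switch_position.update(+1, ("r1", "sp1"))
switch_edge.update(+1, ("sp1", "sw1"))
sensor_edge.update(+1, ("sw1", "s1"))
violations_before = {row[5] for row in production.result}
route_definition.update(+1, ("r1", "s1"))  # repair: link the sensor to the route
violations_after = {row[5] for row in production.result}
```

The repair touches only the antijoin and the production node, so the work done is proportional to the change rather than to the model size. The memories in the join and antijoin nodes are the space that is traded for this.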
Performance of EMF-INCQUERY
Incremental graph queries based on Rete
Built for the Eclipse Modeling Framework
[Figure: runtime vs. model size for batch queries and incremental queries]
Runtime is proportional to the size of the modification.
[Figure: memory consumption vs. model size for incremental queries and batch queries, with the memory limit marked]
Storing partial results
Selected Applications (EMF-IncQuery)
Toolchain for IMA configs
• Complex traceability
• Query driven views
• Abstract models by derived objects
MATLAB-EMF Bridge
• Connect to Matlab Simulink model
• Export: Matlab2EMF
• Change model in EMF
• Re-import: EMF2Matlab
Gesture recognition
• Live models (refreshed 25 frame/s)
• Complex event processing
Detection of bad code smells
• Experiments on open source Java projects
• Local search vs. Incremental vs. Native Java code
Design Space Exploration
• Rules for operations
• Complex structural constraints (as GP)
• Hints and guidance
• Potentially infinite state space
Known Users
• Itemis (developer)
• Embraer
• Thales
• ThyssenKrupp
• CERN
INCQUERY-D: DISTRIBUTED INCREMENTAL MODEL QUERIES
Goals of INCQUERY-D
Objectives
o Distributed incremental pattern matching
o Adaptation of EMF-INCQUERY’s tooling to graph DBs
o Executed over cloud infrastructure (COTS hardware)
Achieve scalability by avoiding memory bottleneck
o Sharding separately
• Data
• Indexers
• Query network
o In memory:
• Index + Query
Assumptions
• All Rete nodes fit on a server node
• Indexers can be filled efficiently
• Modification size ≪ model size
• The application requires the complete result set of the query (as opposed to just one match)
Dimensions of Scalability
Infrastructure
o Number of machines
o Available memory / CPU
o Network performance
o Number of concurrent users
Model
o Model size
o Model characteristics
Queries
o Number of queries
o Query complexity
Metrics
From EMF-INCQUERY to INCQUERY-D
[Figure: EMF-INCQUERY architecture: transactions update the in-memory EMF model (in-memory storage), on top of which sit the indexer layer (indexing) and the Rete net (production network)]
Production network
• Stores intermediate query results
• Propagates changes
INCQUERY-D Architecture
[Figure: Server 0 to Server 3, each hosting one database shard (Database shard 0 to 3); transactions reach INCQUERY-D, which consists of the model access adapter over distributed persistent storage, the indexer layer (distributed indexer: distributed indexing, notification) and the Rete net (distributed query evaluation network)]
Distributed production network
• Each intermediate node can be allocated to a different host
• Remote internode communication
[Figure: INCQUERY-D deployed over Server 0 to Server 3 with Database shards 0 to 3, which take the place of the in-memory EMF model; one Indexer per server feeds the Join, Join and Antijoin nodes]
Communication middleware: Akka
Storage back-ends: Triple store (4store), Document DB (Mongo), RDF over Column family (Cumulus)
Termination Protocol in INCQUERY-D
[Figure: update messages travel from the Indexers through the Join and Antijoin nodes distributed over Server 0 to Server 3]
A stack is added to each update message
• Registers the Rete nodes the message passes through
When a production node is reached, an ACK message is sent back
User retrieves query result
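The stack-and-ACK idea can be sketched with a simulated message queue. The slides give no protocol details, so everything below is an assumption made for illustration: the `Network` class, the linear chain of nodes (no fan-out, no messages absorbed by a join) and the single-process queue standing in for remote messaging. The point it shows is that the query result is only considered ready once every update has been acknowledged from the production node.

```python
from collections import deque


class Network:
    """Simulated chain of Rete nodes with in-flight update messages."""

    def __init__(self, chain):
        self.chain = chain       # node names; the last one is the production node
        self.queue = deque()     # in-flight messages: (position, payload, stack)
        self.pending = set()     # updates that have not been acknowledged yet
        self.ack_routes = {}     # payload -> route the ACK travelled back on

    def send_update(self, payload):
        self.pending.add(payload)
        self.queue.append((0, payload, []))

    def step(self):
        """Deliver one in-flight message to the next node."""
        position, payload, stack = self.queue.popleft()
        stack = stack + [self.chain[position]]  # register the visited node
        if position == len(self.chain) - 1:
            # Production node reached: the ACK unwinds the stack.
            self.ack_routes[payload] = list(reversed(stack))
            self.pending.discard(payload)
        else:
            self.queue.append((position + 1, payload, stack))

    def result_ready(self):
        return not self.pending


net = Network(["indexer", "join", "antijoin", "production"])
net.send_update("u1")
ready_at_start = net.result_ready()
for _ in range(3):
    net.step()
ready_after_three_steps = net.result_ready()
net.step()
ready_after_four_steps = net.result_ready()
```

A real deployment would also have to handle messages that fan out to several successors or produce no output at a join, which this sketch leaves out.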
IncQuery-D Architectural Layers
High-Level Query Lang: Gremlin, Cypher, SPARQL, IQPL (IncQuery)
• Global queries
• Complex navigations
Low-Level Query Lang: Distributed Indexers (MONDIX), SPARQL
• Efficient element access by indices
• Local queries
Distributed Graph DB: Cayley, Titan, 4store
• Can be transparent (via indexers)
• Integrates popular graph storages
Native Storage: MongoDB, Cassandra, 4store
• Efficient NoSQL storages
• Triple stores
Storage Format: RDF, XMI / Ecore, Property Graphs
• Standardized data formats
• Popular interchange formats
Summary: Key Components of IncQuery-D
Distributed Model Storage
• Adaptable to different back-end storages
• Agnostic to graph representation
• Triple stores (RDF), EMF, property graph
Model Access Adapter
• Surrogate key to identify distributed elements
• Graph manipulation API
• Change notifications
Distributed Indexer
• Type-instance indices, etc.
• Stored on multiple servers
• Protects against exceeding memory limits
Distributed Query Evaluator
• Distributed RETE network
• Distributed termination protocol
• Constructed and deployed by coordinator node
Decouple and separately distribute Storage, Indexer and Query layers
USAGE PHASES
(1) Loading a Query
[Figure: usage phases Load Query, Load Model, Update Model and Request Result; loading a query constructs, allocates and deploys the RETE network on the cloud infrastructure]
Construct RETE
• From EMF-IncQuery specs
• Should incorporate infrastructure constraints
Deploy RETE
• Managed by a coordinator node
• Intelligent sharding of RETE nodes
(2) Loading a Model
[Figure: the model is loaded into model shards through the Model Access Adapter; the deployed RETE network maintains the result set]
Load model
• Model traversal
• Init indexers
• Network communication
(3) Updating a Model
Model manipulation
• Update messages
• Create / Delete
(4) Requesting Query Result
Evaluate query
• Process incoming messages
• Propagate along RETE network
Retrieve results
• instantly
(5) Monitoring and Reconfiguration
[Figure: a Monitor & Manage component is added next to the RETE network on the cloud infrastructure]
Visualized on a web-based dashboard
• OS metrics
• JVM metrics
• Rete metrics
• Akka metrics
DEPLOYMENT PROCESS FOR DISTRIBUTED RETE
RETE Deployment Process
Steps: Query Language, Query Predicates, RETE Structure, Platform Description, Allocation / Mapping, Deployment Descriptor

pattern routeSensor(sensor: Sensor) = {
    TrackElement.sensor(switch, sensor);
    Switch(switch);
    SwitchPosition.switch(sp, switch);
    SwitchPosition(sp);
    Route.switchPosition(route, sp);
    Route(route);
    neg find head(route, sensor);
}

pattern head(R, Sen) = {
    Route.routeDefinition(R, Sen);
}
Tooling: RDF Pattern Language
EMF-IncQuery syntax (check expression in Xbase, compiles to Java):

import "http://www.semanticweb.org/ontologies/2011/1/TrainRequirementOntology.owl"

pattern posLength(Segment, SegmentLength) {
    Segment(Segment);
    Segment.Segment_length(Segment, SegmentLength);
    check(SegmentLength <= 0);
}

RDF-IncQuery syntax (check expression in Javascript):

Vocabulary <railway.rdf>
base <http://www.semanticweb.org/ontologies/2011/1/TrainRequirementOntology.owl#>

pattern posLength(Segment, SegmentLength) {
    Segment(Segment);
    Segment_length(Segment, SegmentLength);
    check("SegmentLength <= 0");
}

[Figure: graph pattern with the node segment: Segment and the attribute constraint segment.length ≤ 0]
RETE Deployment Process
Construct language-independent constraints
Resolution of
o syntactic sugar
o type information
Variables: route, sp, switch
Parameter: sensor
Constraints:
• Edge: SwitchPosition.switch
• Edge: TrackElement.sensor
• Edge: Route.switchPosition
• Negation: head
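One way to picture the language-independent predicate form of routeSensor is as a small data structure. The sketch below is an assumption for illustration: the class names `Edge`, `Negation` and `QueryPredicates` and the `unbound` helper are hypothetical and do not reflect the internal representation used by IncQuery-D. The variables, the parameter and the constraints are copied from this slide, and the edge endpoints come from the routeSensor pattern.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    label: str
    source: str
    target: str


@dataclass(frozen=True)
class Negation:
    pattern: str
    arguments: tuple


@dataclass
class QueryPredicates:
    parameters: list
    variables: list
    constraints: list = field(default_factory=list)

    def unbound(self):
        """Names used in constraints but declared neither as parameter nor variable."""
        declared = set(self.parameters) | set(self.variables)
        used = set()
        for constraint in self.constraints:
            if isinstance(constraint, Edge):
                used.update([constraint.source, constraint.target])
            else:
                used.update(constraint.arguments)
        return used - declared


route_sensor = QueryPredicates(
    parameters=["sensor"],
    variables=["route", "sp", "switch"],
    constraints=[
        Edge("SwitchPosition.switch", "sp", "switch"),
        Edge("TrackElement.sensor", "switch", "sensor"),
        Edge("Route.switchPosition", "route", "sp"),
        Negation("head", ("route", "sensor")),
    ],
)
undeclared = route_sensor.unbound()
```

Because this form no longer depends on the surface syntax, the same structure could be produced from either the EMF or the RDF dialect before the Rete structure is built.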
RETE Deployment Process
Construct RETE structure (platform independently)
Optimizations:
o Model statistics
o Expected usage profile
[Figure: RETE structure with three join nodes]
RETE Deployment Process
Architecture model (Cloud infrastructure)
o Virtual Machines
• Memory limits
• CPU speed
• Storage capacity
o Communication Channels
• Bandwidth
Specified by a textual DSL (Xtext)
[Figure: four machines numbered 1 to 4]
RETE Deployment Process

Machine    Allocated Nodes
1          In1, In2, Join2
2          In3
3          In4
4          Join1, Join3

[Figure: the RETE nodes In1 to In4 and Join1 to Join3 mapped onto machines 1 to 4]
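How an allocation such as the one in this table might be computed can be sketched with a first-fit heuristic under memory limits. This is not the allocation strategy of IncQuery-D, which the slides do not describe: the function `allocate`, the heuristic and all memory figures below are made up for illustration, so the resulting mapping differs from the table.

```python
def allocate(rete_nodes, machines):
    """First-fit decreasing allocation.

    rete_nodes: dict node name -> estimated memory need (MB, hypothetical)
    machines:   dict machine id -> memory limit (MB, hypothetical)
    Returns a dict machine id -> list of allocated node names.
    """
    free = dict(machines)
    allocation = {machine: [] for machine in machines}
    # Place the most memory-hungry nodes first.
    for name, need in sorted(rete_nodes.items(), key=lambda item: -item[1]):
        for machine in free:
            if free[machine] >= need:
                free[machine] -= need
                allocation[machine].append(name)
                break
        else:
            raise ValueError(f"no machine can host {name}")
    return allocation


# Made-up memory estimates for the seven nodes of the example network.
rete_nodes = {"In1": 300, "In2": 300, "In3": 800, "In4": 800,
              "Join1": 400, "Join2": 200, "Join3": 500}
machines = {1: 1000, 2: 1000, 3: 1000, 4: 1000}

allocation = allocate(rete_nodes, machines)
```

A production allocator would presumably also weigh CPU speed and channel bandwidth from the platform description, for instance to keep heavily communicating nodes on the same machine.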
RETE Deployment Process
Configuration scripts for
o Deployment
o Communication middleware
Derived by automated code generation
o Using Eclipse technology: EMF-IncQuery + Xtend
DISTRIBUTED PERFORMANCE BENCHMARKS
The Train Benchmark
Model validation workload:
o User edits the model
o Instant validation of well-formedness constraints
o Model is repaired accordingly
Scenario:
o Load
o Check
o Edit
o Re-Check
Models:
o Randomly generated
o Close to real world instances
o Following different metrics
o Customized distributions
o Low number of violations
Queries:
o Two simple queries (<2 objects, attributes)
o Two complex queries (4-7 joins, negation, etc.)
o Validated match sets
[Figure: the instance model passes through the phases Read, Check, Edit and ReCheck, with Edit and ReCheck repeated 100x; the phases are labelled batch validation and incremental validation]
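The Load, Check, Edit, Re-Check scenario can be sketched as a small timing harness. This is not the Train Benchmark code: the function `run_scenario`, the toy segment model and the repair rule are assumptions for illustration. The check mirrors the posLength query (segments whose length is not positive) and is re-evaluated from scratch, which corresponds to batch validation.

```python
import time


def run_scenario(load, check, edit, rounds=3):
    """Run Load and Check once, then Edit and Re-Check `rounds` times.

    Returns (timings, violations): a list of (phase, seconds) pairs and
    the list of check results in the order they were obtained.
    """
    timings = []

    def timed(phase, action, *args):
        start = time.perf_counter()
        value = action(*args)
        timings.append((phase, time.perf_counter() - start))
        return value

    model = timed("Load", load)
    violations = [timed("Check", check, model)]
    for _ in range(rounds):
        timed("Edit", edit, model)
        violations.append(timed("Re-Check", check, model))
    return timings, violations


def load():
    # Hypothetical model: segment name -> length.
    return {"seg1": 120, "seg2": 0, "seg3": -5, "seg4": 80}


def check(model):
    # Batch validation: scan the whole model on every call.
    return sorted(name for name, length in model.items() if length <= 0)


def edit(model):
    # Repair the first violating segment, if any.
    bad = check(model)
    if bad:
        model[bad[0]] = 1


timings, violations = run_scenario(load, check, edit, rounds=2)
phases = [phase for phase, _ in timings]
```

Swapping `check` for a lookup in an incrementally maintained result set would turn the same harness into the incremental variant of the scenario.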
Evaluation of distributed scalability
Extensions to previous work (single workstation)
o Generation of large instance models
o Distributed, parallel loading of models
o Distributed transformation and validation

                                 Benchmark             Distributed benchmark
Model size                       1K – 13M              1K – 88M
Load method                      Batch                 Distributed, parallel
Transformation and validation    Single workstation    Multiple servers

Batch graph scenario
• Load and first validation: load the graph to the databases and execute the query
• Transformation: query the graph and delete some elements
• Revalidation: execute the query
Incremental scenario – IncQuery-D
• Load and first validation: load the graph to the databases, initialize the Rete net and retrieve the results
• Transformation: incrementally query the graph and delete some elements, propagate the changes
• Revalidation: retrieve the results from the Rete net
[Figure: a GraphML model is loaded into the DB shards and the Rete net, which yield the result set across the phases Load and first validation, Transformation and Revalidation]
Benchmark environment
• Private cloud
• Different DBMSs
• Query
o The DBMS’s own query language (SPARQL, Gremlin)
o IncQuery-D
Load and first validation
[Figure: runtime [s] (log scale, 1 to 10000) vs. model size [million elements] (0.03 to 55.75) for 4store, Titan and the IncQuery-D configurations]
55M model: approx. 15 minutes
Rete network’s initialization overhead pays off
Model modification
[Figure: runtime [s] (log scale, 1 to 1000) vs. model size [million elements] (0.03 to 55.75) for 4store, Titan and the IncQuery-D configurations]
1. Elementary model query
2. Model modification
– Query from the Rete network’s indexer
– Propagation of modifications is fast
2 orders of magnitude
Revalidation
[Figure: runtime [s] (log scale, up to 1000) vs. model size [million elements] (0.03 to 55.75) for 4store, Titan and the IncQuery-D configurations, with the memory limit marked]
Sub-second response time for models with 88M elements
Different characteristics
Benchmarking Conclusions
Memory consumption
o Single workstation: 13M model, 4 GB
o Cloud of four servers: 55M model, <4×8 GB
Runtime
o Same order of magnitude and similar characteristics to the single workstation tool
INCQUERY-D is scalable and significantly more efficient for query evaluation than the native query engines in 4store, Titan and Neo4j
Conclusions