IncQuery-D: Distributed Incremental Model Queries over the Cloud: Engineering and Deployment Challenges

Budapest University of Technology and Economics, Department of Measurement and Information Systems


Dániel Varró

Budapest University of Technology and Economics

Fault Tolerant Systems Research Group

Outline of the Talk

Motivation & Background:
• Validation of design rules
• Graph pattern matching

Incremental Model Queries: The EMF-IncQuery framework
• Language - Execution

Distributed Incremental Model Queries (IncQuery-D)
• Architecture -

Performance Benchmarks
• Distributed model load
• Incremental query evaluation

Main Contributors
o István Ráth (lead)
o Ákos Horváth
o Gábor Bergmann
o Ábel Hegedüs
o Zoltán Ujhelyi
o Benedek Izsó
o Gábor Szárnyas
o Csaba Debreceni
o Dénes Harmath
o József Makai
o Dániel Stein

SCALABLE MODEL DRIVEN ENGINEERING

Scalable MDE: The MONDO Project

Models and Languages
• Large and heterogeneous
• Construction
• Visualization

Queries and Transformations
• Executed over large models
• Incremental
• Lazy
• Parallel

Collaboration
• Offline (SVN)
• Online (Gdocs)
• Many collaborators
• Secure access

Persistent Storage
• Efficient
• Secure
• Interoperability

Case studies:
• validate solutions through real case studies
• guided by industrial advisory board

Prototype tools:
• open source software
• open benchmarks

Academic Partners:
• Univ. York (UK), Univ. Autónoma Madrid (ES), ARMINES (FR), BME (HU)

Industrial Partners:
• The Open Group (UK), Uninova (PT), Softeam (FR), Soft-Maint (FR), IKERLAN (ES)

MOTIVATION FOR INCREMENTAL MODEL QUERIES

Motivation: Early validation of design rules

SystemSignalGroup design rule (from AUTOSAR)
o A SystemSignal and its group must be in the same IPdu
o Challenge: find violations quickly in large models
o New difficulties
  • reverse navigation
  • complex manual solution

AUTOSAR:
• standardized SW architecture of the automotive industry
• now supported by modern modeling tools

Design Rule/Well-formedness constraint:
• each valid car architecture needs to respect it
• designers are immediately notified if it is violated

Challenge:
• >500 design rules in AUTOSAR tools
• >1 million elements in AUTOSAR models
• models are constantly evolved by designers

Domain-Specific Modeling Languages

[Figure: a meta-model and a model connected by «type» links; extracted labels: Abstract, Meta-model, Model]

Validation of Well-formedness Constraints

pattern switchWOSignal(sw) {
  Switch(sw);
  neg find switchHasSignal(sw);
}

pattern switchHasSignal(sw) {
  Switch(sw);
  Signal(sig);
  Signal.mountedTo(sig, sw);
}

[Figure: the user queries and modifies the model (an instance of the meta-model) and receives the result]
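The two patterns above can be read as set operations: switchWOSignal keeps every Switch that has no match in switchHasSignal. A minimal Python sketch of that reading follows. The element identifiers and the set-based model representation are assumptions made for illustration, not part of EMF-IncQuery.

```python
# Hypothetical toy model: all element ids below are made up for illustration.
switches = {"sw1", "sw2", "sw3"}
signals = {"sig1", "sig2"}
# Signal.mountedTo(sig, sw) edges stored as (signal, switch) pairs.
mounted_to = {("sig1", "sw1"), ("sig2", "sw3")}


def switch_has_signal():
    """Matches of switchHasSignal(sw): switches with at least one mounted signal."""
    return {sw for (sig, sw) in mounted_to if sig in signals and sw in switches}


def switch_wo_signal():
    """Matches of switchWOSignal(sw): Switch(sw) minus the negated sub-pattern."""
    return switches - switch_has_signal()


violations = switch_wo_signal()
print(sorted(violations))
```

Here the negative pattern call (neg find) becomes a set difference, which is also how an antijoin behaves in a Rete-style evaluation.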

Model sizes in practice

Models with 10M+ elements are common:
o Car industry
o Avionics
o Source code analysis

Models evolve and change continuously

Application          Model size
System models        10^8
Sensor data          10^9
Geospatial models    10^12

Source: Markus Scheidgen, How Big are Models – An Estimation, 2012.

Validation can take hours

MODEL QUERIES AND GRAPH PATTERN MATCHING

What is a model query?

For a programmer:
o A piece of code that searches for parts of the model

For the scientist:
o Query = set of constraints that have to be satisfied by (parts of) the (graph) model
o Result = set of model element tuples that satisfy the constraints of the query
o Match = binding of constraint variables to model elements

A query engine supports
o the definition & execution of model queries

Query(A,B) = ∧_i cond_i(A_i, B_i)
• all tuples of model elements a, b
• satisfying the query condition
• along the match A=a and B=b
• parameters A, B can be input / output

Graph Pattern Matching for Queries

Match:
o m: L → G (graph morphism)
o CSP:
  • Variables: Nodes of L
  • Constraints: Edges of L
  • Domain values: G
o Complexity: |G|^|L|

Example constraint: All sensors with a switch that belongs to a route must directly be linked to the same route.

[Figure: pattern graph L with nodes route: Route, sp: SwitchPosition, switch: Switch, sensor: Sensor and edges switchPosition, switch, sensor, routeDefinition, matched against an instance graph G whose switch positions are labeled straight and left]
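To make the CSP reading concrete, here is a minimal sketch of exhaustive pattern matching: every pattern node is a variable, the graph nodes are the domain values, and every pattern edge is a constraint. The function name, the dictionary-based graph encoding and the tiny railway instance are assumptions for illustration. In the worst case the loop visits |G|^|L| candidate bindings; filtering domains by type only shrinks that bound.

```python
from itertools import product


def match_pattern(pattern_nodes, pattern_edges, graph_nodes, graph_edges):
    """Enumerate all morphisms m: L -> G by exhaustive search.

    pattern_nodes: dict variable -> type
    pattern_edges: list of (source_var, label, target_var)
    graph_nodes:   dict node_id -> type
    graph_edges:   set of (source_id, label, target_id)
    """
    variables = list(pattern_nodes)
    # Domain of each variable: the graph nodes that have the required type.
    domains = [
        [n for n, t in graph_nodes.items() if t == pattern_nodes[v]]
        for v in variables
    ]
    matches = []
    for candidate in product(*domains):
        binding = dict(zip(variables, candidate))
        # Every pattern edge must be mapped onto an existing graph edge.
        if all((binding[s], label, binding[t]) in graph_edges
               for s, label, t in pattern_edges):
            matches.append(binding)
    return matches


# Positive part of the route/sensor constraint (hypothetical instance data).
L_nodes = {"route": "Route", "sp": "SwitchPosition",
           "switch": "Switch", "sensor": "Sensor"}
L_edges = [("route", "switchPosition", "sp"),
           ("sp", "switch", "switch"),
           ("switch", "sensor", "sensor")]
G_nodes = {"r1": "Route", "sp1": "SwitchPosition", "sp2": "SwitchPosition",
           "sw1": "Switch", "sw2": "Switch", "sen1": "Sensor", "sen2": "Sensor"}
G_edges = {("r1", "switchPosition", "sp1"), ("r1", "switchPosition", "sp2"),
           ("sp1", "switch", "sw1"), ("sp2", "switch", "sw2"),
           ("sw1", "sensor", "sen1"), ("sw2", "sensor", "sen2"),
           ("r1", "routeDefinition", "sen1")}

matches = match_pattern(L_nodes, L_edges, G_nodes, G_edges)
# A match violates the rule if the sensor is not linked to the same route.
violations = sorted(m["sensor"] for m in matches
                    if (m["route"], "routeDefinition", m["sensor"]) not in G_edges)
```

The sketch does not enforce injectivity and restarts from scratch on every call, which is exactly the cost that search plans and incremental caching try to reduce.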

Graph Pattern Matching (Local Search)

Search Plan:
o Select the first node to be matched
o Define an ordering on graph pattern edges

Search is restarted from scratch each time

[Figure: the route/sensor pattern graph with search plan steps numbered 0 to 4; switch positions labeled straight and left]

Incremental Graph Pattern Matching

Main idea: trade more space for less time
o Cache matches of patterns
o Instantly retrieve match (if valid)
o Update caches upon model changes
o Notify about relevant changes

Approaches:
o TREAT, LEAPS, RETE, …
o Tools: VIATRA, GROOVE, MoTE, TCore

[Figure: cached match table with columns route, sp, switch, sensor; first row: r1, sp1, sw1; switch positions labeled straight and left]

Batch vs. Live Query Scenarios

Batch query (pull / request-driven):
1. Designer selects a query
2. One/All matches are calculated
3. Rule is applied on one/all matches
4. All Steps 1-3 are redone if model changes
Query results are obtained upon designer demand

Live query (push / event-driven):
1. Model is loaded
2. Rule system is loaded
3. Calculate full match set
4. Model is changed (rules fired or designer updates)
5. Iterate Steps 3 and 4 until rule system is stopped
Query results are always available for designer

INCREMENTAL MODEL QUERIES: THE EMF-INCQUERY PROJECT

EMF-IncQuery: An Open Source Eclipse Project

Definition
• Declarative graph query language
• Transitive closure, Negative cond., etc.
• Compositional, reusable

Execution
• Incremental evaluation
• Cache result set
• Maintain incrementally upon model change

Features
• Derived features
• On-the-fly validation
• View generation, Notifications, Soft links, Databinding

http://eclipse.org/incquery

The IncQuery (IQ) Graph Query Language

IQ: declarative query language
o Attribute constraints
o Local + global queries
o Compositionality + Reusability
o Recursion, Negation
o Transitive Closure
o Syntax: DATALOG style

pattern routeSensor(sensor: Sensor) = {
  TrackElement.sensor(switch, sensor);
  Switch(switch);
  SwitchPosition.switch(sp, switch);
  SwitchPosition(sp);
  Route.switchPosition(route, sp);
  Route(route);
  neg find head(route, sensor);
}

pattern head(R, Sen) = {
  Route.routeDefinition(R, Sen);
}

ModelQuery(A,B):
• tuples of model elements A, B
• satisfying the query condition
• enumerate 1 / all instances
• A, B can be input or output

[Figure: pattern graph with nodes switch: Switch, sensor: Sensor and edges switchPosition, switch, sensor, routeDefinition]

TOOL DEMO: INCQUERY Development Tools

Query Explorer

Pattern Editor

Queries are applied & updated on-the-fly

• Works with most EMF editors out-of-the-box

• Reveals matches as selection

Incremental Query Evaluation by RETE

AUTOSAR well-formedness validation rule

1. Fill the input nodes
2. Fill the worker nodes
3. Read the result set
4. Modify the model
5. Propagate the changes
6. Read the changes in the result set (deltas)

[Figure: Rete network for the rule. The instance model, containing a valid and an invalid model fragment, fills the input nodes (Communication channel, Logical signal, Mapping, Physical signal); the worker nodes (join, join, antijoin) feed the result set]
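The six steps above can be illustrated with a small in-process Rete sketch. It is not the EMF-IncQuery implementation: the class names, the "+"/"-" delta encoding and the key functions are assumptions, and the network below encodes the routeSensor railway pattern rather than the AUTOSAR rule of the figure. Set semantics are assumed, so each tuple is inserted at most once.

```python
class Node:
    """Base class: a node forwards deltas to one child on a given input side."""
    def __init__(self, child=None, side=None):
        self.child, self.side = child, side

    def emit(self, sign, tup):
        self.child.update(sign, tup, self.side)


class Production:
    """Holds the result set and records the deltas that reach it."""
    def __init__(self):
        self.results, self.deltas = set(), []

    def update(self, sign, tup, side=None):
        (self.results.add if sign == "+" else self.results.discard)(tup)
        self.deltas.append((sign, tup))


class Join(Node):
    """Caches both inputs and emits concatenated tuples whose keys agree."""
    def __init__(self, lkey, rkey, child, side=None):
        super().__init__(child, side)
        self.key = {"L": lkey, "R": rkey}
        self.mem = {"L": set(), "R": set()}

    def update(self, sign, tup, side):
        (self.mem[side].add if sign == "+" else self.mem[side].discard)(tup)
        other = "R" if side == "L" else "L"
        for o in list(self.mem[other]):
            if self.key[other](o) == self.key[side](tup):
                left, right = (tup, o) if side == "L" else (o, tup)
                self.emit(sign, left + right)


class AntiJoin(Join):
    """Emits left tuples that have no partner on the right input."""
    def has_right(self, k):
        return any(self.key["R"](r) == k for r in self.mem["R"])

    def update(self, sign, tup, side):
        k = self.key[side](tup)
        if side == "L":
            (self.mem["L"].add if sign == "+" else self.mem["L"].discard)(tup)
            if not self.has_right(k):
                self.emit(sign, tup)
            return
        before = self.has_right(k)
        (self.mem["R"].add if sign == "+" else self.mem["R"].discard)(tup)
        after = self.has_right(k)
        if before != after:
            # A new right partner retracts left tuples; a vanished one restores them.
            for l in list(self.mem["L"]):
                if self.key["L"](l) == k:
                    self.emit("-" if after else "+", l)


# Network: (SwitchPosition.switch JOIN TrackElement.sensor) JOIN Route.switchPosition,
# then ANTIJOIN Route.routeDefinition on (route, sensor).
prod = Production()
anti = AntiJoin(lambda t: (t[0], t[5]), lambda t: t, child=prod)
join2 = Join(lambda t: t[1], lambda t: t[0], child=anti, side="L")
join1 = Join(lambda t: t[1], lambda t: t[0], child=join2, side="R")

# Steps 1-3: fill the inputs and read the result set (ids are hypothetical).
join1.update("+", ("sp1", "sw1"), "L")    # SwitchPosition.switch(sp, switch)
join1.update("+", ("sw1", "sen1"), "R")   # TrackElement.sensor(switch, sensor)
join2.update("+", ("r1", "sp1"), "L")     # Route.switchPosition(route, sp)
initial = set(prod.results)

# Steps 4-6: modify the model; only the delta is propagated.
anti.update("+", ("r1", "sen1"), "R")     # Route.routeDefinition(route, sensor)
```

After the last call the violation disappears from prod.results and prod.deltas ends with the corresponding "-" entry, without re-evaluating the joins.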

Performance of EMF-INCQUERY

Incremental graph queries based on Rete
Built for the Eclipse Modeling Framework

[Figure: runtime vs. model size for batch queries and incremental queries]

Runtime of incremental queries is proportional to the size of the modification.

[Figure: memory consumption vs. model size for incremental queries and batch queries, with the memory limit marked]

Storing partial results

Selected Applications (EMF-IncQuery)

Toolchain for IMA configs
• Complex traceability
• Query driven views
• Abstract models by derived objects

MATLAB-EMF Bridge
• Connect to Matlab Simulink model
• Export: Matlab2EMF
• Change model in EMF
• Re-import: EMF2Matlab

Gesture recognition
• Live models (refreshed 25 frames/s)
• Complex event processing

Detection of bad code smells
• Experiments on open source Java projects
• Local search vs. Incremental vs. Native Java code

Design Space Exploration
• Rules for operations
• Complex structural constraints (as GP)
• Hints and guidance
• Potentially infinite state space

Known Users
• Itemis (developer)
• Embraer
• Thales
• ThyssenKrupp
• CERN

INCQUERY-D: DISTRIBUTED INCREMENTAL MODEL QUERIES

Goals of INCQUERY-D

Objectives
o Distributed incremental pattern matching
o Adaptation of EMF-INCQUERY’s tooling to graph DBs
o Executed over cloud infrastructure (COTS hardware)

Achieve scalability by avoiding memory bottleneck
o Sharding separately
  • Data
  • Indexers
  • Query network
o In memory:
  • Index + Query

Assumptions
• Each individual Rete node fits on a server node
• Indexers can be filled efficiently
• Modification size ≪ model size
• The application requires the complete result set of the query (as opposed to just one match)

Dimensions of Scalability

Infrastructure
o Number of machines
o Available memory / CPU
o Network performance
o Number of concurrent users

Model
o Model size
o Model characteristics

Queries
o Number of queries
o Query complexity

Metrics

From EMF-INCQUERY to INCQUERY-D

[Figure: EMF-INCQUERY on a single machine. Transactions modify the in-memory EMF model (in-memory storage); the indexer layer (indexing) feeds the Rete net]

Production network
• Stores intermediate query results
• Propagates changes

INCQUERY-D Architecture

[Figure: Servers 0 to 3, each hosting one database shard (shards 0 to 3) and receiving transactions. Layers: distributed persistent storage; model access adapter with distributed indexing and notification; distributed indexer (indexer layer); distributed query evaluation network (Rete net)]

Distributed production network
• Each intermediate node can be allocated to a different host
• Remote internode communication

INCQUERY-D Architecture

[Figure: the same four-server deployment with concrete nodes. Four Indexer nodes in the indexer layer feed two Join nodes and an Antijoin node, which communicate via Akka. Storage back-ends: Triple store (4store), Document DB (Mongo), RDF over Column family (Cumulus)]

Termination Protocol in INCQUERY-D

• A stack is added to each update message
  • It registers the Rete nodes the message passes through
• When a production node is reached, an ACK message is sent back
• The user then retrieves the query result

[Figure: Servers 0 to 3 with database shards 0 to 3; an update message travels from the Indexer nodes through the Join and Antijoin nodes, and the ACK travels back]
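The stack-and-ACK idea can be sketched as a small single-process simulation. This is only an illustration under assumptions: the function name, the dictionary encoding of the Rete topology and the node names are hypothetical, and the real system exchanges these messages between Akka actors on different servers.

```python
from collections import deque


def propagate_update(edges, source, production):
    """Deliver one update message and return the routes its ACKs travel.

    edges: dict mapping each Rete node to the nodes it forwards to.
    Every in-flight message carries a stack of the nodes visited so far.
    """
    in_flight = deque([(source, [source])])
    ack_routes = []
    while in_flight:
        node, stack = in_flight.popleft()
        if node == production:
            # The ACK retraces the recorded stack back to the sender.
            ack_routes.append(stack[::-1])
            continue
        for successor in edges.get(node, []):
            in_flight.append((successor, stack + [successor]))
    return ack_routes


# Hypothetical topology: one indexer feeding two joins that meet at an antijoin.
edges = {
    "indexer1": ["join1", "join2"],
    "join1": ["antijoin"],
    "join2": ["antijoin"],
    "antijoin": ["production"],
}
routes = propagate_update(edges, "indexer1", "production")
```

In this reading, the coordinator would consider the propagation finished, and the result set safe to read, once an ACK has come back for every message copy that was sent.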

IncQuery-D Architectural Layers

High-Level Query Lang
• Gremlin, Cypher
• SPARQL
• IQPL (IncQuery)
Notes: global queries; complex navigations

Low-Level Query Lang
• Distributed Indexers (MONDIX)
• SPARQL
Notes: efficient element access by indices; local queries

Distributed Graph DB
• Cayley
• Titan
• 4store
Notes: can be transparent (via indexers); integrates popular graph storages

Native Storage
• MongoDB
• Cassandra
• 4store
Notes: efficient NoSQL storages; triple stores

Storage Format
• RDF
• XMI / Ecore
• Property Graphs
Notes: standardized data formats; popular interchange formats

Summary: Key Components of IncQuery-D

Distributed Model Storage
• Adaptable to different back-end storages
• Agnostic to graph representation
• Triple stores (RDF), EMF, Property graph

Model Access Adapter
• Surrogate key to identify distributed elements
• Graph manipulation API
• Change notifications

Distributed Indexer
• Type-instance indices, etc.
• Stored on multiple servers
• Protects against exceeding memory limits

Distributed Query Evaluator
• Distributed RETE network
• Distributed termination protocol
• Constructed and deployed by coordinator node

Decouple and separately distribute Storage, Indexer and Query layers

USAGE PHASES

(1) Loading a Query

Construct RETE
• From EMF-IncQuery specs
• Should incorporate infrastructure constraints

Deploy RETE
• Managed by a coordinator node
• Intelligent sharding of RETE nodes

[Figure: workflow with the steps Load Query, Construct RETE, Allocate RETE and Deploy RETE, producing the RETE Network on the cloud infrastructure; the later phases Load Model, Update Model and Request Result are shown inactive]

(2) Loading a Model

Load model
• Model traversal
• Init indexers
• Network communication

[Figure: the same workflow with the model shards loaded into the RETE network, which starts to maintain the result set]

(3) Updating a Model

Model manipulation
• Update messages
• Create / Delete

[Figure: the same workflow; updates to the model shards are propagated to the RETE network, which maintains the result set]

(4) Requesting Query Result

Evaluate query
• Process incoming messages
• Propagate along RETE network

Retrieve results
• instantly

[Figure: the same workflow, extended with the Evaluate Query step on top of the maintained result set]

(5) Monitoring and Reconfiguration

Monitor & Manage
• OS metrics
• JVM metrics
• Rete metrics
• Akka metrics

Visualized on a web-based dashboard

[Figure: the same workflow, extended with the Monitor & Manage step over the cloud infrastructure]

DEPLOYMENT PROCESS FOR DISTRIBUTED RETE

RETE Deployment Process

[Figure: deployment pipeline with the stages Query Language, Query Predicates, RETE Structure, Platform Description, Allocation / Mapping and Deployment Descriptor]

pattern routeSensor(sensor: Sensor) = {
  TrackElement.sensor(switch, sensor);
  Switch(switch);
  SwitchPosition.switch(sp, switch);
  SwitchPosition(sp);
  Route.switchPosition(route, sp);
  Route(route);
  neg find head(route, sensor);
}

pattern head(R, Sen) = {
  Route.routeDefinition(R, Sen);
}

[Figure: pattern graph with nodes switch: Switch, sensor: Sensor and edges switchPosition, switch, sensor, routeDefinition]

Tooling: RDF Pattern Language

[Figure: pattern graph with node segment: Segment and attribute constraint segment.length ≤ 0]

RDF-IncQuery syntax (check expression in Javascript):

Vocabulary <railway.rdf>
base <http://www.semanticweb.org/ontologies/2011/1/TrainRequirementOntology.owl#>

pattern posLength(Segment, SegmentLength) {
  Segment(Segment);
  Segment_length(Segment, SegmentLength);
  check("SegmentLength <= 0");
}

EMF-IncQuery syntax (check expression in Xbase, which compiles to Java):

import "http://www.semanticweb.org/ontologies/2011/1/TrainRequirementOntology.owl"

pattern posLength(Segment, SegmentLength) {
  Segment(Segment);
  Segment.Segment_length(Segment, SegmentLength);
  check(SegmentLength <= 0);
}

RETE Deployment Process

Construct language-independent constraints

Resolution of
o syntactic sugar
o type information

Variables: route, sp, switch
Parameter: sensor

Constraints
• Edge: SwitchPosition.switch
• Edge: TrackElement.sensor
• Edge: Route.switchPosition
• Negation: head

RETE Deployment Process

Construct RETE structure (platform independently)

Optimizations:
o Model statistics
o Expected usage profile

[Figure: RETE structure with three join nodes]

RETE Deployment Process

Architecture model (Cloud infrastructure)
o Virtual Machines
  • Memory limits
  • CPU speed
  • Storage capacity
o Communication Channels
  • Bandwidth

Specified by a textual DSL (Xtext)

[Figure: four virtual machines, numbered 1 to 4]

RETE Deployment Process

Machine    Allocated Nodes
1          In1, In2, Join2
2          In3
3          In4
4          Join1, Join3

[Figure: RETE nodes In1 to In4 and Join1 to Join3 mapped onto machines 1 to 4]
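An allocation like the one in the table has to respect the memory limits given in the architecture model. The following sketch checks that for a candidate mapping. The allocation itself is taken from the table; the per-node memory estimates, the per-machine limits and the function name are hypothetical values chosen only to make the example run.

```python
# Allocation table from the slide: machine -> Rete nodes placed on it.
allocation = {
    1: ["In1", "In2", "Join2"],
    2: ["In3"],
    3: ["In4"],
    4: ["Join1", "Join3"],
}

# Hypothetical numbers (not from the talk): estimated node footprint in GB.
node_memory_gb = {"In1": 1, "In2": 1, "In3": 3, "In4": 3,
                  "Join1": 2, "Join2": 1, "Join3": 2}
# Hypothetical memory limit of each virtual machine in GB.
machine_limit_gb = {1: 4, 2: 4, 3: 4, 4: 4}


def overloaded_machines(allocation, node_memory_gb, machine_limit_gb):
    """Return the machines whose allocated nodes exceed their memory limit."""
    return sorted(
        machine
        for machine, nodes in allocation.items()
        if sum(node_memory_gb[n] for n in nodes) > machine_limit_gb[machine]
    )


problems = overloaded_machines(allocation, node_memory_gb, machine_limit_gb)
```

An empty list means the mapping is feasible with respect to memory; CPU speed and channel bandwidth from the architecture model would need analogous checks or an optimizing allocator.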

RETE Deployment Process

Configuration scripts for
o Deployment
o Communication middleware

Derived by automated code generation
o Using Eclipse technology: EMF-IncQuery + Xtend

DISTRIBUTED PERFORMANCE BENCHMARKS

The Train Benchmark

Model validation workload:
o User edits the model
o Instant validation of well-formedness constraints
o Model is repaired accordingly

Scenario:
o Load
o Check
o Edit
o Re-Check

Models:
o Randomly generated
o Close to real world instances
o Following different metrics
o Customized distributions
o Low number of violations

Queries:
o Two simple queries (<2 objects, attributes)
o Two complex queries (4-7 joins, negation, etc.)
o Validated match sets

[Figure: benchmark phases Read, Check, Edit, ReCheck over the instance model, with a 100x repetition; labels: Batch validation, Incremental validation]

Evaluation of distributed scalability

Extensions to previous work (single workstation)
o Generation of large instance models
o Distributed, parallel loading of models
o Distributed transformation and validation

                                 Benchmark             Distributed benchmark
Model size                       1K – 13M              1K – 88M
Load method                      Batch                 Distributed, parallel
Transformation and validation    Single workstation    Multiple servers

Batch graph scenario
• Load and first validation: load the graph to the databases and execute the query
• Transformation: query the graph and delete some elements
• Revalidation: execute the query

Incremental scenario – IncQuery-D
• Load and first validation: load the graph to the databases and initialize the Rete net and retrieve the results
• Transformation: incrementally query the graph and delete some elements, propagate the changes
• Revalidation: retrieve the results from the Rete net

[Figure: a GraphML model is loaded into the DB shards; the Rete net and the result set are shown for the phases Load and first validation, Transformation and Revalidation]
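The difference between the two scenarios can be sketched with the posLength check (segment length ≤ 0) as the query. The dictionary-based model, the class name and the element ids are assumptions for illustration; the real benchmark runs against database shards and a distributed Rete net.

```python
def batch_validate(model):
    """Batch scenario: re-execute the query over the whole model."""
    return {seg for seg, length in model.items() if length <= 0}


class IncrementalValidator:
    """Incremental scenario: keep the result set and update it per change."""

    def __init__(self, model):
        # Load and first validation: one full evaluation initializes the cache.
        self.results = batch_validate(model)

    def on_change(self, seg, new_length):
        """Propagate one change; new_length is None when the segment is deleted."""
        if new_length is not None and new_length <= 0:
            self.results.add(seg)
        else:
            self.results.discard(seg)


# Hypothetical model: segment id -> length.
model = {"seg1": 120, "seg2": 0, "seg3": -5}
validator = IncrementalValidator(model)

# Transformation: delete one violating element and repair another.
del model["seg3"]
validator.on_change("seg3", None)
model["seg2"] = 50
validator.on_change("seg2", 50)

# Revalidation: batch recomputes, incremental just reads the cached set.
batch_result = batch_validate(model)
incremental_result = validator.results
```

Both scenarios return the same result set; the batch cost grows with the model, while the incremental cost depends only on the changes propagated.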

Benchmark environment

• Private cloud
• Different DBMSs
• Query
  o The DBMS’s own query language (SPARQL, Gremlin)
  o IncQuery-D

Load and first validation

[Figure: runtime [s] (log scale, 1 to 10000) vs. model size [million elements] (0.03 to 55.75); legend entries as extracted: 4store, IncQuery-D, Titan, IncQuery-D, 4store]

• 55M model: approx. 15 minutes
• Rete network’s initialization overhead pays off

Model modification

[Figure: runtime [s] (log scale, 1 to 1000) vs. model size (0.03 to 55.75); annotation: 2 orders of magnitude]

1. Elementary model query
2. Model modification

– Query from the Rete network’s indexer
– Propagation of modifications is fast

Revalidation

[Figure: runtime [s] (log scale, 0.00 to 1000.00) vs. model size (0.03 to 55.75); annotations: memory limit, Different characteristics]

Sub-second response time for models with 88M elements

Benchmarking Conclusions

Memory consumption
o Single workstation: 13M model, 4 GB
o Cloud of four servers: 55M model, <4×8 GB

Runtime
o Same order of magnitude and similar characteristics to the single workstation tool

INCQUERY-D is scalable and significantly more efficient for query evaluation than the native query engines in 4store, Titan and Neo4j

Conclusions

Software
