Hindawi, Complexity, Volume 2018, Article ID 5870987, 15 pages
https://doi.org/10.1155/2018/5870987

Research Article

A Conceptual Approach to Complex Model Management with Generalized Modelling Patterns and Evolutionary Identification

Sergey V. Kovalchuk,1 Oleg G. Metsker,1 Anastasia A. Funkner,1 Ilia O. Kisliakovskii,1 Nikolay O. Nikitin,1 Anna V. Kalyuzhnaya,1 Danila A. Vaganov,1,2 and Klavdiya O. Bochenina1

1 ITMO University, Saint Petersburg, Saint Petersburg, Russia
2 University of Amsterdam, Amsterdam, Netherlands

Correspondence should be addressed to Sergey V. Kovalchuk; [email protected]

Received 1 June 2018; Accepted 17 September 2018; Published 1 November 2018

Guest Editor: Rafael Gómez-Bombarelli

Copyright © 2018 Sergey V. Kovalchuk et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Complex systems' modeling and simulation are powerful ways to investigate a multitude of natural phenomena, providing extended knowledge on their structure and behavior. However, enhanced modeling and simulation require integration of various data and knowledge sources, models of various kinds (data-driven models, numerical models, simulation models, etc.), and intelligent components in one composite solution. The growing complexity of such a composite model calls for specific approaches to its management. This need grows when the model itself becomes a complex system. One of the important aspects of complex model management is dealing with uncertainty of various kinds (context, parametric, structural, and input/output) to control the model. When the system being modeled or the modeling requirements change over time, specific methods and tools are needed to perform modeling and application procedures (metamodeling operations) automatically. To support automatic building and management of complex models, we propose a general evolutionary computation approach which enables management of complexity and uncertainty of various kinds. The approach is based on an evolutionary investigation of the model phase space to identify the best model structure and parameters. Examples from different areas (healthcare, hydrometeorology, and social network analysis) were elaborated with the proposed approach and solutions.

1. Introduction

Today the area of modeling and simulation of complex systems is evolving rapidly. A complex system [1] is usually characterized by a large number of elements, complex long-distance interaction between elements, and multiscale variety. One of the results of the area's development is the growing complexity of the models used for investigation of complex systems. As a result, a contemporary model of a complex system could easily be characterized by the same features as a natural complex system. Usually, the complexity of a model is considered in close relation to the complexity of the modeled system. Nevertheless, in many cases, the complexity of a model does not mimic the complexity of the system under investigation (at least not exactly). This leads to additional issues in managing a complex model during identification, calibration, data assimilation, verification, validation, and application. One of the core reasons for these issues is uncertainty of various kinds [2, 3] arising at the system, data, and model levels. In addition, complexity grows further in multidisciplinary models and in models which incorporate additional complex and/or third-party submodels. From the application point of view, complex models are often difficult to support and integrate with a practical solution because of the low level of automation and the high modeling skills needed to support and adapt a model to changing conditions.

On the other hand, evolutionary approaches have recently become popular for solving various types of model-centered operations such as model identification [4], equation-free methods [5], ensemble management [6], data assimilation [7], and others. Evolutionary computation (EC) provides the ability to implement automatic optimization and dynamic adaptation
of the system within a complex state space. Still, most of the solutions remain tightly coupled to a particular application and modeled system.

Figure 1: Basic concepts of complex modeling on model ($M$), data ($D$), and system ($S$) layers. (Image not reproduced. The diagram relates parameters $\Xi$, functional characteristics $\Phi$, and structure $\Sigma$ across the layers $S$, $D$, and $M$ through operators $\Gamma^{A}_{L}$; its legend distinguishes F-models, DD-models, A-models, complicated transitions, and expert knowledge.)

In the current research, we are trying to develop a unified conceptual and technological approach to support core operations with a complex model by distinguishing concepts and operations on the model, data, and system levels. We consider a combination of EC and data-driven approaches as a tool for building intelligent solutions that manage (and lower) uncertainty more precisely and systematically and that provide the required level of automation, adaptability, and extensibility.

2. Conceptual Basis

The proposed approach is based on several key ideas aimed at extending uncertainty management in complex system modeling and simulation.

(1) Disjoint consideration of model, data, and system in terms of structure, behavior, and quality is aimed at a system-level review of the modeling and simulation process and distinguishes between the uncertainties of various kinds originating from different levels [8].

(2) Intelligent technologies such as data mining, process mining, machine learning, and knowledge-based approaches are to be employed to fill the gap in the automation of modeling and simulation. Key sources for the development of such solutions include formalization of various knowledge within composite solutions [9] and data-driven technologies to support the identification of model components.

(3) EC approaches are widespread in modeling and simulation of complex systems [8, 10]. We believe that systematization of this process, with separate consideration of spaces for a system (with its subsystems) and a model (with its submodels), could enhance such solutions significantly.

(4) The aim of the approach's development is twofold. First, it is aimed at automation of modeling operations to extend the functionality of possible model-based applications. Second, working with a combination of EC and intelligent data-driven technologies could be considered as an additional knowledge source for system and model analysis.

The rest of this section considers the conceptual basis of the proposed approach, with a special focus on the role of EC algorithms and data-driven intelligent technologies in building and exploiting complex models.

2.1. Core Concepts. To distinguish between the main modeling concepts and operations, we propose a conceptual framework (see Figure 1) for consideration of key processes and operations during modeling of a complex system. The framework may be considered as a generalization and extension of a framework [11, 12] previously defined and used by the authors for ensemble-based simulation. The current research extends the concept beyond ensemble-based simulation. It is mainly focused on complex modeling in general, with identification of key model management procedures and important artifacts which can be used for model development and application.

The proposed framework considers three main layers of complex systems' modeling, namely, model ($M$), data ($D$), and system ($S$). The main operations (arrows on the diagram) within the framework are defined within three concepts: quantitative parameters ($\Xi$), functional characteristics ($\Phi$), and structure ($\Sigma$). We denote operations by $\Gamma^{A}_{L}$, where $A$ and $L$ stand for the concept and layer (respectively) involved in the operation. Transitions between concepts and between layers are denoted by $A_1 \to A_2$ and $L_1 \to L_2$, respectively; e.g., operator $\Gamma^{\Xi}_{S \to D}$ reflects observation of quantitative parameters, and operator $\Gamma^{\Xi}_{D \to M}$ stands for basic data assimilation. Also, a set of operators may refer to a single modeling operation; e.g., operators $\Gamma^{\Phi \to \Xi}_{M}$ and $\Gamma^{\Sigma \to \Xi}_{M}$ are often implemented within a single monolithic model. Mainly, operators are related to a specific submodel within a complex model. We consider three key classes of models. F-models are usually classical continuous models developed with knowledge of a system. DD-models are data-driven models based on analysis of available data sets with corresponding techniques (statistics, data mining, process mining, etc.). A-models are mainly intelligent components of a system, usually based on machine learning or knowledge-based approaches. We also consider EC-based components as belonging to the A-models class.

A key problem within complex system modeling and simulation is the absent, or at least significantly limited, possibility to observe the structure and functional characteristics of the system directly (operators $\Gamma^{\Phi}_{S \to D}$ and $\Gamma^{\Sigma}_{S \to D}$). The general solution usually includes implicit substitution of these operators with the expertise of the modeler (operators $\Gamma^{\Phi}_{S \to M}$ and $\Gamma^{\Sigma}_{S \to M}$). Still, the more complex the system under investigation and the model are, the more limited those operations are. To overcome this issue, additional DD-models are involved (operators $\Gamma^{\Xi \to \Sigma}_{D}$ and $\Gamma^{\Xi \to \Phi}_{D}$ for mining in available data; $\Gamma^{\Phi \to \Xi}_{D \to M}$ for extended discovery of model parameters for various functional characteristics). Also, A-models are employed to extend expert knowledge in the discovery of $M$-layer concepts with either formalized knowledge or knowledge discovered in data with machine learning approaches (operators $\Gamma^{\Xi \to \Sigma}_{D \to M}$, $\Gamma^{\Xi \to \Phi}_{D \to M}$, $\Gamma^{\Sigma}_{D \to M}$, and $\Gamma^{\Phi}_{D \to M}$ for direct discovery of structure and functional characteristics, and operators $\Gamma^{\Sigma \to \Phi}_{D}$, $\Gamma^{\Phi \to \Sigma}_{D}$, $\Gamma^{\Sigma \to \Phi}_{M}$, and $\Gamma^{\Phi \to \Sigma}_{M}$ for interconnection of discovered characteristics in available data and within the used model). In the proposed approach, primary attention is paid to these kinds of solutions, where DD- and A-models enhance the complex modeling process with an additional level of automation, adaptation, and knowledge provision.
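The operator notation of this section can be made concrete with a small record type. The sketch below is only an illustration under our own assumptions: the class name `Operator`, its field names, and the ASCII labels (`Xi`, `Phi`, `Sigma`) are hypothetical and do not come from the framework itself.

```python
from dataclasses import dataclass
from typing import Optional

CONCEPTS = {"Xi", "Phi", "Sigma"}  # parameters, functional characteristics, structure
LAYERS = {"S", "D", "M"}           # system, data, model


@dataclass(frozen=True)
class Operator:
    """Hypothetical record for an operator Gamma^{A1->A2}_{L1->L2}."""
    concept_from: str
    layer_from: str
    concept_to: Optional[str] = None  # None: the operator stays within one concept
    layer_to: Optional[str] = None    # None: the operator stays within one layer

    def __post_init__(self):
        # Reject labels outside the three concepts and three layers.
        for concept in (self.concept_from, self.concept_to):
            if concept is not None and concept not in CONCEPTS:
                raise ValueError(f"unknown concept: {concept}")
        for layer in (self.layer_from, self.layer_to):
            if layer is not None and layer not in LAYERS:
                raise ValueError(f"unknown layer: {layer}")

    def label(self) -> str:
        """Render the operator in the superscript/subscript form used in the text."""
        sup = self.concept_from + (f"->{self.concept_to}" if self.concept_to else "")
        sub = self.layer_from + (f"->{self.layer_to}" if self.layer_to else "")
        return f"Gamma^{{{sup}}}_{{{sub}}}"


# Operators named in the text, expressed with the hypothetical record:
observation = Operator("Xi", "S", layer_to="D")             # observation of parameters
assimilation = Operator("Xi", "D", layer_to="M")            # basic data assimilation
structure_mining = Operator("Xi", "D", concept_to="Sigma")  # mining structure in data

print(observation.label(), assimilation.label(), structure_mining.label())
```

Keeping the concept transition and the layer transition as separate optional fields mirrors the notation, in which either index may denote a single element or a transition.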

2.2. Complex Modeling Patterns. Considering the defined conceptual framework, we identify several patterns of modeling and simulation of a complex system (see Figure 2). The patterns are defined as combinations within the context of the framework described previously (3 layers, 3 concepts). An essential idea of the proposed patterns is the systematization of complex model management approaches with combinations of expertise, intelligent solutions (A-models), DD-models, and EC.

The patterns extend the operators described in Section 2.1 for model building with operators for model application (i.e., modelling and simulation) and results analysis (e.g., assessing model quality) required for automated model identification. These additional operators are denoted by ${\Gamma'}^{A}_{L}$, with similar notation for indices.

P1. Regular modeling of a system (Figure 2(a)) is a basic pattern usually applied to discover new knowledge on the system under investigation. A model is built using (a) the expertise of the modeler for identification of structure and functional characteristics of the model ($\Gamma^{\Phi}_{S \to M}$ and $\Gamma^{\Sigma}_{S \to M}$); (b) available input data, usually representing quantitative parameters of a system, considered as a static input of the model or as a source for data assimilation (DA) via operators $\Gamma^{\Xi}_{S \to D}$ and $\Gamma^{\Xi}_{D \to M}$. Results of model application (${\Gamma'}^{\Xi}_{M \to D}$, ${\Gamma'}^{\Sigma}_{M \to D}$, and ${\Gamma'}^{\Phi}_{M \to D}$) could be considered from a descriptive (mainly structural or quantitative characteristics) or predictive (often forecasting or other functional characteristics) point of view. The obtained results are analyzed in comparison to available information about the investigated system (${\Gamma'}^{\Xi}_{D \to S}$, ${\Gamma'}^{\Sigma}_{D \to S}$, and ${\Gamma'}^{\Phi}_{D \to S}$), forming an optimization loop which can be considered within the scope of all three concepts. Two factors limit this pattern when it is applied to complex system modeling and simulation. First, working with complex structural and functional characteristics of the model requires a high level of expertise, which limits extensibility and automation of model operation. Second, performing optimization in a loop with most algorithms requires multiple runs of a model. As a result, computationally intensive models are limited in optimization-based operations (identification, calibration, etc.) for performance reasons.

P2. Data-driven modeling (Figure 2(b)) provides an extension to the modeling operation by describing the relationship between data attributes. Application of data-driven models may be considered as a replacement of the actual "full" model: (a) providing information about the structure of the system and model with data mining (DM) and process mining (PM) techniques ($\Gamma^{\Xi \to \Sigma}_{D}$); (b) generating surrogate models for functional characteristics ($\Gamma^{\Xi \to \Phi}_{D}$); (c) providing estimation of investigated parameters with machine learning (ML) algorithms and models (${\Gamma'}^{\Xi \to \Xi}_{D}$). In contrast to the previous pattern, data-driven models usually operate quickly (although training the model could require significant time). Still, such models have lower quality than the original "full" models. Nevertheless, combining this pattern with others provides significant enhancement in functionality and performance; e.g., data-driven models can be used in the optimization loop (see the previous pattern).
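To illustrate how a data-driven surrogate can take part in the optimization loop of P1, the following minimal sketch trains a cheap surrogate on a handful of runs of a "full" model and then searches on the surrogate only. Everything in it is an assumption made for illustration: `full_model` is a toy stand-in for an expensive simulation, and piecewise-linear interpolation is just one simple choice of surrogate.

```python
from bisect import bisect_right

full_runs = 0  # counts how many times the expensive model is executed


def full_model(theta):
    """Toy stand-in for an expensive simulation; returns an error score."""
    global full_runs
    full_runs += 1
    return (theta - 0.3) ** 2 + 0.1


def build_surrogate(xs, ys):
    """Return a piecewise-linear interpolant through the sampled runs (xs sorted)."""
    def surrogate(theta):
        # Locate the segment that contains theta, clamped to the sampled range.
        i = min(max(bisect_right(xs, theta) - 1, 0), len(xs) - 2)
        t = (theta - xs[i]) / (xs[i + 1] - xs[i])
        return ys[i] + t * (ys[i + 1] - ys[i])
    return surrogate


# A few expensive runs provide the training data for the surrogate.
train_x = [i / 10 for i in range(11)]
train_y = [full_model(x) for x in train_x]
surrogate = build_surrogate(train_x, train_y)

# The optimization loop evaluates only the cheap surrogate.
candidates = [i / 1000 for i in range(1001)]
best = min(candidates, key=surrogate)

# One confirmation run of the full model for the selected candidate.
confirmed = full_model(best)
print(best, confirmed, full_runs)
```

In this toy setting the loop evaluates 1001 candidates while running the full model only for the training samples and one confirmation, which is the trade-off between speed and quality described for P2.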

P3. Ensemble-based modeling (Figure 2(c)) extends P1 for working with sets of objects (models, data sets, and states) reflecting uncertainty, variability, or alternative solutions (e.g., models). Previously [11], we identified 5 classes of ensembles (see E1-E5 in Figure 2(c)): decomposition ensemble, alternative models ensemble, data-driven ensemble, parameter diversity ensemble, and metaensemble. All these patterns can be applied within the context of the proposed framework. Still, an extension of ensemble structure increases the structural complexity of the model and thus leads to the need for additional (automatic) control procedures. Moreover, the performance issues of P1 become even worse in ensemble modeling.

Figure 2: Complex modeling patterns: (a) regular modeling; (b) data-driven modeling; (c) ensemble-based modeling; (d) data-driven support of complex modeling; (e) EC in hybrid complex modeling; (f) evolutionary space discovery in hybrid complex modeling. (Image not reproduced.)

P4. One of the key ideas of the proposed approach is the implementation of data-driven analysis of model states, structure, and behavior. To implement it within the conceptual framework, we propose a pattern for data-driven complex modeling (Figure 2(d)). It includes identification and prediction of a model structure through DM and PM techniques ($\Gamma^{\Xi \to \Sigma}_{D \to M}$) and generation of surrogate models for injection into the complex model ($\Gamma^{\Xi \to \Phi}_{D \to M}$). In addition, it is possible to use data-driven techniques to predict the quality of the considered model and use it for model optimization (${\Gamma'}^{\Xi}_{D \to S}$, ${\Gamma'}^{\Xi \to \Sigma}_{D \to S}$, and ${\Gamma'}^{\Xi \to \Phi}_{D \to S}$).

P5. A key pattern for EC implementation is presented in Figure 2(e). Here, EC is used to identify a model structure ($\Gamma^{\Xi \to \Sigma}_{D \to M}$) and surrogate submodels ($\Gamma^{\Xi \to \Phi}_{D \to M}$) with consideration of a population of models. As a result, the modeling result is also (as in P3) presented in multiple instances, which may be analyzed, filtered, and evolved within consecutive iterations over changing time (with processing of incoming observations of the system) or within a single timestamp (with fixed observation data).

P6. Finally, the last presented pattern (Figure 2(f)) is aimed at investigation of the system phase space using DD-models and/or EC to reflect the unobservable landscape for estimation of model positioning, assessing its quality in inferring (sub)optimal structural (${\Gamma'}^{\Xi \to \Sigma}_{D \to S}$ and ${\Gamma'}^{\Sigma}_{M \to D}$) and functional (${\Gamma'}^{\Xi \to \Phi}_{D \to S}$ and ${\Gamma'}^{\Phi}_{M \to D}$) characteristics of the actual system.

These patterns could be easily combined to obtain better results within a specific application. Of special interest from the point of view of EC are the patterns where a set of model (or submodel) instances is considered (P5, P6). It is possible to consider ensemble-based techniques (P3) in the fashion of EC, but within our approach we prefer to consider an ensemble as a composite model with several submodels. In that case, ensemble management refers to the concept of complex model structure.

Figure 3: Artifacts and procedures within a typical composite solution. (Image not reproduced.)

Several important goals may be reached within the presented patterns:

(i) automation of complex model management with intelligent solutions, DD-models, and EC;

(ii) optimization of model structure and application under defined limitations in precision and performance;

(iii) enhanced ways of domain knowledge discovery for applications and general investigation of a system.

2.3. Composite Solution Development. The proposed structure of core concepts and patterns may be applied in various ways to form a solution which combines operators either implemented originally within the solution or implemented as external model calls. Figure 3 shows the essential elements (artifacts and procedures) in a typical composite solution within the proposed conceptual layers ($S$, $D$, and $M$). The $S$-layer includes the actual system's state, which can be assessed through the observation procedure and described by explicit domain knowledge. The $D$-layer includes datasets divided into observation data and simulation/modeling data, with procedures for data processing and data assimilation. Finally, the $M$-layer includes a set of available basic models $M_1, \ldots, M_N$, which may be identified and calibrated with available data, yielding tuned models $M'_1, \ldots, M'_N$ as a result. Here, the essential elements are model composition (which may be performed either automatically or by the modeler) and application of the model.

The key benefit of the approach is the application of a combination of EC, data-driven, and intelligent procedures to manage the whole composite solution, including data processing, modeling, and simulation, to lower uncertainty in $\Sigma \times \Xi \times \Phi$. Within the shown structure, these procedures may be applied

(i) to rank and select alternative models;

(ii) to support model identification, calibration, composition, and application;

(iii) to manage artifacts on various conceptual layers in a systematic way;

(iv) to infer implicit knowledge from available data and explicitly presented domain knowledge.

The shown example gives a brief view of composite solution development, while the particular details may differ depending on the application. The key procedures within the proposed composite solution, namely, the implementation of intelligent procedures to support model identification and the systematic management of the composite model, are considered in Sections 2.4 and 2.5.

2.4. Evolutionary Model Identification. When implementing evolution of models within a complex modeling task, structure, functional, and quantitative parameters are usually considered as the genotype, whereas model output (data layer) is considered as the phenotype. Within the proposed approach, we can adapt the definition of basic EC operations within genotype-phenotype mapping [13]:

(i) epigenesis as model application: $f_1 : S \times M \to D$;

(ii) selection: $f_2 : S \times D \to D$;

(iii) genotype survival: $f_3 : S \times D \to M$;

(iv) mutation: $f_4 : M \to M$.

In addition, we consider quality assessment, usually treated as fitness for selection and survival (or, in more complex algorithms, for controlling other operations like mutation):

(i) data quality: $q_d : S \times D \to Q_d$;

(ii) model quality: $q_m : S \times M \to Q_m$.

Here, $Q_d$ and $Q_m$ are often considered as $\mathbb{R}^N$ with some quantitative quality metrics. Model quality is usually considered through data quality, i.e., $q_m \sim q_d(s, f_1(s, m))$, but within our approach this separation is important because we additionally introduce supporting operations with data-driven procedures: in complex modeling, many of these functions (first of all, $f_1$, $q_m$, and $q_d$) are significantly difficult to apply directly (some of these issues are considered in relation to the patterns). Data-driven operations (first of all, for $f_1$ and $q_m$) can be introduced as substitutes for the previously introduced basic operations (see also patterns P2, P4, and P6):

(i) epigenesis as DD-model application: $f^{d}_{1} : S \times M \to D$;

(ii) model generation: $g^{d} : S \times D \to M$;

(iii) model quality prediction: $q^{d}_{m} : S \times M \to Q_m$;

(iv) space discovery: $w^{d} : S \times D \to S$.

Operation $w^{d}$ could be used as an intelligent extension within selection or survival operations ($f_2$ and $f_3$). It becomes especially important in case of a lack of knowledge about the system's structure or functional characteristics. Operation $g^{d}$, at the same time, could be used as a part of the mutation operation $f_4$ (or of initial population generation). With this extension, we can implement enhanced versions of EC algorithms (e.g., genetic algorithms, evolution strategies, and evolutionary programming) with data-driven operations to overcome, or at least to reduce, complex modeling issues.
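The operations above can be read as function signatures. The sketch below is a toy instance under stated assumptions, not the implementation used in the paper: the genotype is reduced to a single number, the model is the linear map $y = m x$, selection and survival ($f_2$, $f_3$) are collapsed into one ranking step, and the optional `quality` argument of `evolve` is a hypothetical hook where a predicted quality $q^{d}_{m}$ could replace the direct evaluation.

```python
import random
from typing import List

System = List[float]  # s: observations standing in for the system layer S
Model = float         # m: genotype, here a single quantitative parameter
Data = List[float]    # d: phenotype, the model output on the data layer D


def f1(s: System, m: Model) -> Data:
    """Epigenesis f1: S x M -> D. Toy model y = m * x on x = 0, 1, 2, ..."""
    return [m * x for x in range(len(s))]


def q_d(s: System, d: Data) -> float:
    """Data quality q_d, here a mean squared error (lower is better)."""
    return sum((a - b) ** 2 for a, b in zip(s, d)) / len(s)


def q_m(s: System, m: Model) -> float:
    """Model quality assessed through data quality: q_m ~ q_d(s, f1(s, m))."""
    return q_d(s, f1(s, m))


def f4(m: Model, rng: random.Random, sigma: float = 0.2) -> Model:
    """Mutation f4: M -> M, a Gaussian perturbation of the genotype."""
    return m + rng.gauss(0.0, sigma)


def evolve(s, population, generations, rng, quality=q_m):
    """Elitist loop: parents and mutated offspring compete in one ranking step.

    Passing a cheaper `quality` callable mimics data-driven quality prediction.
    """
    for _ in range(generations):
        offspring = [f4(m, rng) for m in population]
        pool = population + offspring           # new list, inputs are not mutated
        pool.sort(key=lambda m: quality(s, m))  # best (lowest error) first
        population = pool[:len(population)]
    return population


rng = random.Random(42)
observations = [2.0 * x for x in range(10)]
initial = [rng.uniform(-5.0, 5.0) for _ in range(8)]
final = evolve(observations, initial, generations=50, rng=rng)
print(final[0], q_m(observations, final[0]))
```

Because parents stay in the pool, the best quality in the population cannot get worse from one generation to the next; this elitism is a design choice of the sketch, not a requirement of the approach.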

2.5. Model Management Approach and Algorithm. By model management we mean operations with models within problem domain solution development and application. This includes identification, calibration, DA, optimization, prediction, and forecasting. To systematize model management in the presented patterns, we propose an approach for explicit consideration of spaces $S$, $D$, and $M$ within hybrid modeling with EC and DD-modeling. To summarize complex modeling procedures within the approach, we developed a high-level algorithm which includes a series of steps to be implemented within the context of complex model management.

Step 1 (space discovery). This step identifies the description of the phase space (in most cases, $S$) in case of a lack of knowledge or for automation purposes. For example, the step could be applied in the discovery of the system state space or model structure. The space description may include (a) distance metrics, (b) proximity structure (e.g., graph, clustering, hierarchy, and density), and (c) a positioning function. One of the possible ways to perform this step is to apply DM and EC algorithms to available data (see pattern P6).

Step 2 (identification of supplementary functions). Data-driven functions ($\Phi$) are applied to work in model evolution with consideration of the space (landscape) representation as available information.

Step 3 (evolutionary processing of a set of models). This step is described by a combination of basic EC operations (population initialization, epigenesis, selection, mutation, and survival) with supplementary functions. The form of the combination depends on (a) the selected EC algorithm, (b) application requirements and restrictions, and (c) model-based issues (e.g., performance, quality of surrogate models, etc.).

Step 4 (assimilation of updated data and knowledge). This step is applied for automatic adaptation purposes and implements a DA algorithm. DA can be applied to (a) the set of models, (b) EC operations (e.g., affecting the selection function), (c) supplementary functions (as they are mainly data-driven), and (d) the phase space description (if the descriptive structure is identified from changed data and/or knowledge).

The steps can be repeated in various combinations depending on the application and implemented pattern. Also, the steps are general and could be implemented in various ways. Several examples are provided in Section 3.
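The four steps can be sketched as a loop over incoming batches of observations. The code below is a deliberately simple, deterministic illustration under our own assumptions: the function names (`manage`, `discover_space`, `fit_quality`, `evolve`, `assimilate`) are hypothetical, the "space" is reduced to an interval, the supplementary function is a distance to the batch mean, and assimilation is a plain nudging toward the latest observation. It fixes one ordering of the steps, whereas the text allows various combinations.

```python
def manage(models, batches, discover_space, fit_quality, evolve, assimilate):
    """Run Steps 1-4 once per incoming batch of observations (one possible ordering)."""
    space = None
    for batch in batches:
        space = discover_space(batch, space)     # Step 1: space discovery
        quality = fit_quality(batch, space)      # Step 2: supplementary function
        models = evolve(models, space, quality)  # Step 3: evolutionary processing
        models = assimilate(models, batch)       # Step 4: assimilation of new data
    return models, space


def discover_space(batch, previous):
    """Toy space description: the interval covered by all observations so far."""
    lo, hi = min(batch), max(batch)
    if previous is not None:
        lo, hi = min(lo, previous[0]), max(hi, previous[1])
    return (lo, hi)


def fit_quality(batch, space):
    """Toy data-driven quality estimate: distance to the batch mean (lower is better)."""
    target = sum(batch) / len(batch)
    return lambda m: abs(m - target)


def evolve(models, space, quality, step=0.5):
    """Deterministic variation (+/- step, clipped to the space), then keep the best."""
    lo, hi = space
    pool = []
    for m in models:
        pool += [m, min(hi, m + step), max(lo, m - step)]
    return sorted(pool, key=quality)[:len(models)]


def assimilate(models, batch, weight=0.5):
    """Toy stand-in for DA: nudge every model toward the latest observation."""
    latest = batch[-1]
    return [m + weight * (latest - m) for m in models]


batches = [[1.0, 1.2, 0.8], [2.0, 2.2, 1.8]]
final_models, final_space = manage(
    [0.0, 5.0], batches, discover_space, fit_quality, evolve, assimilate
)
print(final_models, final_space)
```

Passing the four procedures as arguments keeps the loop independent of how each step is realized, which matches the statement that the steps are general and can be implemented in various ways.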

2.6. Available Building Blocks of a Composite Solution. EC offers a flexible and robust way to identify complex model structures within a complex landscape, with possible adaptation to changing conditions and system states (including new states without prior observation). A significant additional benefit is the ability to manage alternative solutions simultaneously, with possible switching between them and various combinations of them depending on current needs. Still, within the task of model identification and management, EC (like many metaheuristics) has certain drawbacks which require additional steps to implement the approach under particular conditions:

(i) high computational cost due to the multiple runs of a model;

(ii) low reproducibility and interpretability of obtained results due to the randomized nature of the search procedure;

(iii) complicated tuning of hyperparameters for better EC convergence;

(iv) indistinct definition of genotype boundaries;

(v) complicated mapping of genotype to phenotype space.

To overcome these issues the proposed approach involvestwo options First the intelligent procedures may be used totune EC hyperparameters (P5) predict features of genotype-phenotype mapping boundaries etc (P4) and discoverinterpretable states and filters (for system data and model)to control convergence and adaptation of population (P2P4 and P5) with interpretable and reproducible (throughthe defined control procedure) Second the composite modelmay use various approachesmethods and elements to obtainbetter quality and performance of the solution

(i) surrogate models (P2, P4, P5), which may increase performance (for example, within preliminary and intermediate optimization steps)

(ii) ensemble models (P3), which may be considered as an interpretable and controllable population

(iii) interpretation and formal inference using explicit domain-specific knowledge and results of data mining to feed procedures of EC and infer parameters in both models and EC

(iv) controllable space decomposition (P6) with predictive models for possible areas and directions of population migration in EC to explicitly lower uncertainty and obtain additional interpretability

Finally, an essential feature of the proposed approach is a holistic analysis of a composite solution, with possible coevolution of models (submodels within a composite model) and data processing procedures.

3. Application Examples

This section presents several practical examples where the proposed approach, its patterns, or some of their elements were applied. The examples were intentionally selected from diverse problem domains to examine the generality of the approach. The considered problems are developed in separate projects which are at various stages. Problem 1 (ensemble metocean simulation) was investigated in a series of projects (see, e.g., [11, 14, 15]). Within this research, we are trying to extend model calibration and DA with EC techniques to develop more flexible and accurate multimodel ensembles. Problem 2 (clinical pathways (CPs) modelling) is important in several ongoing projects aimed at model-based decision support in healthcare (see, e.g., [16-18]). The proposed approach plays an important role by enabling deeper analysis of clinical pathways in various scenarios (interactive analysis of available CPs with identification of clusters of similar patients, DA in predictive modelling of ongoing cases, etc.). Finally, Problem 3 shows very early results of a recently started project in online social network analysis.

3.1. Problem 1: Evolution in Models for Metocean Simulation. Environmental simulation systems usually contain several numerical models serving different purposes (complementary simulation processes, improving the reliability of a system by producing alternative results, etc.). Each model can typically be described by a large number of quantitative parameters and functional characteristics that should be adjusted by an expert or by intelligent automated methods (e.g., EC). Alternative models inside the environmental simulation system can be joined in an ensemble according to a complex modeling pattern based on evolutionary computing (a combination of the P3 and P5 patterns). In the current case study, we introduce an example illustrating the ensemble concept in the forms of an alternative models ensemble, a parameter diversity ensemble, and a metaensemble. To identify the parameters of the proposed ensembles, the least squares method (in the case of model linearity) or optimization methods (in the case of nonlinearity) can be used. As we need to take into account not only the functional space Φ and the space of parameters Ξ for a single model but also the optimal coexistence of models in the system (i.e., Σ), evolutionary and coevolutionary approaches seem to be an applicable technique for this task. It is worth mentioning that the coevolutionary approach can be applied to independent model realizations with an ensemble as the connecting element. In this case, the parameters (weights) of the ensemble can be estimated separately from the coevolution procedure, either in a constant form or dynamically. As a case study of complex environmental modeling, we design an ensemble model that consists of the SWAN (http://swanmodel.sourceforge.net) model for ocean wave simulation driven by two different surface forcings: NCEP (https://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.html) and ERA Interim (https://www.ecmwf.int/en/forecasts/datasets/archive-datasets/reanalysis-datasets/era-interim). Thus, different implementations of the SWAN model were connected in the form of an alternative models ensemble with least-squares-calculated coefficients defining the structure of the complex model. Two parameters, wind drag and whitecapping rate (WCR), were calibrated using evolutionary and coevolutionary algorithms implementing $\Gamma^{\Xi\to\Phi}_{D\to M}$ in P5 (for a detailed sensitivity analysis of SWAN, see [19]). The coevolutionary case can be represented in the form of a parameter diversity ensemble, where each population constructs an ensemble of alternative model results with different parameters. Also, we can add ensemble weights to the model parameter diversity and obtain a metaensemble that can be identified within the coevolutionary approach.
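The least-squares step that combines alternative model implementations into one ensemble can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names, the intercept term, and the synthetic series standing in for SWAN+ERA and SWAN+NCEP outputs are all hypothetical.

```python
import numpy as np

def fit_ensemble_weights(member_forecasts, observations):
    """Least-squares coefficients (intercept + one weight per ensemble member).

    member_forecasts: array of shape (n_times, n_members)
    observations:     array of shape (n_times,)
    """
    design = np.column_stack([np.ones(len(observations)), member_forecasts])
    coeffs, *_ = np.linalg.lstsq(design, observations, rcond=None)
    return coeffs  # coeffs[0] is the intercept, coeffs[1:] are member weights

def ensemble_forecast(member_forecasts, coeffs):
    """Apply fitted coefficients to member forecasts."""
    return coeffs[0] + member_forecasts @ coeffs[1:]

def rmse(a, b):
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))

# Synthetic stand-ins (hypothetical data, not measurements from the paper).
rng = np.random.default_rng(0)
truth = 2.0 + np.sin(np.linspace(0.0, 6.0, 200))          # "observed" wave height
swan_era = truth + rng.normal(0.0, 0.3, truth.size)        # member 1
swan_ncep = 0.8 * truth + rng.normal(0.0, 0.3, truth.size) # member 2 (biased)
members = np.column_stack([swan_era, swan_ncep])

coeffs = fit_ensemble_weights(members, truth)
combined = ensemble_forecast(members, coeffs)
print("member RMSE:", rmse(swan_era, truth), rmse(swan_ncep, truth))
print("ensemble RMSE:", rmse(combined, truth))
```

On the fitting data the combined forecast cannot be worse than either member, because each single member is a special case of the fitted linear combination; out-of-sample behaviour is a separate question.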

For model identification and verification, measurements from several wave stations in the Kara Sea were used. The fitness function is the mean error (RMSE) over all wave stations. For verification of the results, the MAE (mean absolute error) and DTW (dynamic time warping) metrics were used.
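The three metrics named above can be written down compactly. The sketch below assumes equally sampled series per station and uses the textbook dynamic-programming form of DTW with an absolute-difference local cost; the function names and the choice of local cost are illustrative assumptions, since the text does not specify them.

```python
import numpy as np

def rmse(simulated, observed):
    """Root-mean-square error between two equally long series."""
    s, o = np.asarray(simulated, float), np.asarray(observed, float)
    return float(np.sqrt(np.mean((s - o) ** 2)))

def mae(simulated, observed):
    """Mean absolute error between two equally long series."""
    s, o = np.asarray(simulated, float), np.asarray(observed, float)
    return float(np.mean(np.abs(s - o)))

def dtw_distance(a, b):
    """Classic O(n*m) dynamic time warping with |x - y| as local cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i, j] = local + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

def fitness(simulated_by_station, observed_by_station):
    """Fitness of one parameter set: RMSE averaged over all stations."""
    return float(np.mean([rmse(s, o)
                          for s, o in zip(simulated_by_station, observed_by_station)]))
```

Unlike RMSE and MAE, DTW tolerates small phase shifts between the simulated and observed series, which is why it is a useful complementary verification metric for wave time series.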

Figure 4(a) shows the surface (landscape) of RMSE in the space of the considered parameters (drag and WCR) for the implementations SWAN+ERA and SWAN+NCEP. It can be seen that the evolutionary results converge to the minimum of the error landscape. The landscape was obtained by running the model with all parameter variants from a full 30x30 grid (i.e., 900 runs), while the evolutionary algorithm converged in 5 generations with 10 individuals (parameter sets) in the population (50 runs), which requires 18 times fewer model runs. The convergence of coevolution for the SWAN+ERA+NCEP case is presented in Figure 5.

Figure 4: Metocean simulation: (a) error landscape (RMSE, m, versus drag and log(WCR)) for wave height simulation results using ERA and NCEP reanalysis as input data and (b) Pareto frontier (log(RMSE) versus drag and WCR; coNCEP, coERA, coEnsemble) of coevolution results for all generations.

Figure 5: Coevolution convergence of the parameter diversity ensemble for metocean models (RMSE, m, versus generation).
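The run-count comparison between exhaustive grid search and a small evolutionary search can be reproduced on a toy problem. The sketch below is only an illustration: the quadratic error surface, the parameter bounds, the mutation scale, and the (mu + lambda) selection scheme are all assumptions and stand in for real SWAN runs and for the evolutionary algorithm actually used.

```python
import numpy as np

LOW = np.array([0.5, -20.0])   # assumed bounds for (drag, log WCR)
HIGH = np.array([3.0, 0.0])

def toy_error(drag, log_wcr):
    """Hypothetical smooth error surface with its minimum (0.5) at (1.5, -10)."""
    return 0.5 + (drag - 1.5) ** 2 + 0.01 * (log_wcr + 10.0) ** 2

class CountingModel:
    """Wraps an error function and counts how many times it is evaluated."""
    def __init__(self, fn):
        self.fn, self.runs = fn, 0
    def __call__(self, params):
        self.runs += 1
        return float(self.fn(*params))

def grid_search(model, n=30):
    drags = np.linspace(LOW[0], HIGH[0], n)
    wcrs = np.linspace(LOW[1], HIGH[1], n)
    return min(model((d, w)) for d in drags for w in wcrs)

def evolve(model, pop_size=10, generations=5, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.uniform(LOW, HIGH, size=(pop_size, 2))
    errors = np.array([model(p) for p in pop])          # generation 1
    history = [float(errors.min())]
    sigma = 0.1 * (HIGH - LOW)                           # per-parameter mutation scale
    for _ in range(generations - 1):
        parents = pop[np.argsort(errors)[: pop_size // 2]]
        picks = parents[rng.integers(0, len(parents), size=pop_size)]
        children = np.clip(picks + rng.normal(0.0, sigma, size=picks.shape), LOW, HIGH)
        child_errors = np.array([model(c) for c in children])
        merged = np.vstack([pop, children])
        merged_errors = np.concatenate([errors, child_errors])
        keep = np.argsort(merged_errors)[:pop_size]      # (mu + lambda) survivors
        pop, errors = merged[keep], merged_errors[keep]
        history.append(float(errors.min()))
    return pop[np.argmin(errors)], history

grid_model = CountingModel(toy_error)
grid_best = grid_search(grid_model)
ea_model = CountingModel(toy_error)
best_params, history = evolve(ea_model)
print(grid_model.runs, ea_model.runs)  # 900 50
```

Because survivors are chosen from parents and children together, the best error in `history` never increases from one generation to the next; how close it gets to the grid optimum in 50 runs depends on the landscape and the mutation scale.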

Although the error landscapes for the pair of implementations SWAN+ERA and SWAN+NCEP are close to each other, separate evolution does not consider optimization of the ensemble result. For this purpose, we apply a coevolutionary approach that produces a set of Pareto-optimal solutions for each generation. Figure 4(b) shows that the error of each model in the ensemble is significant (coNCEP and coERA for the models alone), but the error of the whole ensemble (coEnsemble) converges to the minimum very fast.
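Selecting the Pareto-optimal solutions of a generation amounts to a non-dominated filter over the objective vectors. A minimal sketch for minimization follows; the pairs of error values are hypothetical and only illustrate the mechanics.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Return the non-dominated points (all objectives are minimized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (error of model 1, error of model 2) pairs for one generation.
generation = [(1.0, 2.0), (2.0, 1.0), (1.5, 1.5), (2.0, 2.0), (1.0, 2.5)]
print(pareto_front(generation))  # [(1.0, 2.0), (2.0, 1.0), (1.5, 1.5)]
```

The quadratic cost of this filter is negligible for populations of the size discussed here (tens of individuals per generation).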

The obtained result can be analyzed from the uncertainty reduction point of view. Optimization of model parameters helps to reduce parameter uncertainty, which can be estimated through the error function. When we apply an ensemble approach to the evolutionary optimized results, we can also speak of reducing the uncertainty connected with the input data sources (NCEP and ERA). Moreover, the metaensemble approach allowed reduction of the uncertainty connected with the ensemble parameters.


Figure 6: Graph-based representation of processes space in healthcare (interactive view). (Demonstration available at https://www.youtube.com/watch?v=EH74f1w6EeY.)

Summarizing the results of the metocean case study, we can note that the EC approach is significantly more efficient (up to 120 times) than grid search, without accuracy losses. According to this experimental study, the quality of the ensemble with evolutionary optimized models is similar to the results of the grid search: the MAE metric is equal to 0.24 m and the DTW metric to 51. Also, the coevolutionary approach provides a 10% accuracy gain compared with the results of separate evolution of the model implementations, but this is still similar to the ensemble result with evolutionary optimized models. Nevertheless, the coevolutionary approach allowed a 200-fold acceleration. Within the context of the proposed approach, the space Φ was investigated using a defined structure of the model in space Σ for the purpose of model calibration.

3.2. Problem 2: Modeling Health Care Process. Modeling of healthcare processes usually involves enormous uncertainty and variability, even when modeling a single disease. One of the ways to identify a model of such a process is PM [20]. Still, direct implementation of PM methods does not remove a major part of the uncertainty. Within the current research, we applied the proposed approach for identification purposes, both in the analysis of historical cases and in the prediction of the development of a single process. Here we consider processes of providing health care in acute coronary syndrome (ACS) cases; ACS is usually considered one of the major causes of death in the world. We used a set of 3434 ACS cases collected during 2010-2015 in Almazov National Medical Research Centre, one of the leading cardiological centers in Russia. The data set contains the electronic health records of these patients with all registered events and patient characteristics.

To simplify consideration of the multidimensional space of possible processes ($\Gamma'^{\,\Xi\to\Sigma}_{D\to S}\,\Gamma^{\Xi}_{S\to D}$ for analysis of Σ on layer $S$), we introduced a graph-based representation of this space, with vertices representing cases and edges representing proximity of cases. Analysis of such a structure enables easy discovery of common cases (e.g., as communities in the graph). Such discovery enables explicit, interpretable structuring of the space and representation of the further landscape for EC in terms of the P6 pattern. Moreover, direct interactive investigation of a visual representation of such a structure (see Figure 6) provides significant insights for medical researchers.
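The idea of a case graph can be illustrated with a small sketch. It assumes that each case is encoded as a string of event labels, that proximity is measured by edit distance, and that connected components serve as a crude stand-in for community detection; the text does not specify the actual proximity measure or community algorithm, and the example strings are invented.

```python
from itertools import combinations

def edit_distance(a, b):
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]

def proximity_graph(cases, max_distance):
    """Vertices are case indices; an edge joins cases within max_distance."""
    graph = {i: set() for i in range(len(cases))}
    for i, j in combinations(range(len(cases)), 2):
        if edit_distance(cases[i], cases[j]) <= max_distance:
            graph[i].add(j)
            graph[j].add(i)
    return graph

def connected_components(graph):
    """Groups of mutually reachable cases (simplified notion of a community)."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(graph[node] - component)
        seen |= component
        components.append(sorted(component))
    return components

# Hypothetical event-label sequences, one string per case.
cases = ["AFIE", "AFNE", "AFIFE", "DNFE", "DNFEE"]
print(connected_components(proximity_graph(cases, max_distance=1)))
# [[0, 1, 2], [3, 4]]
```

For thousands of cases the pairwise distance computation dominates the cost, so a real implementation would likely need indexing or sampling; that is outside the scope of this sketch.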

We have developed an evolutionary algorithm for pattern identification and clustering in such a representation, with two criteria to be optimized (see Figure 7). Here, processes were represented by sequences of labels (symbols) denoting key events in the PM model. Typical patterns were then selected from the Pareto frontier. The convergence process is demonstrated in Figure 8 (the 10 best individuals from the Pareto frontier according to the integral criterion were selected). As a result, this solution may refer to the P5 pattern and operator $\Gamma^{\Xi\to\Sigma}_{D\to M}$ while discovering the model structure. Figure 9 shows an example of a typical process model (i.e., a structural characteristic of the model) for one of the identified clusters. A detailed description of the approach, algorithms, and results on CP discovery, clustering, and analysis, including a performance comparison of three versions of the CP discovery algorithm, can be found in [10]. An important outcome of the approach in this application is the interpretability of the clusters and identified patterns. For example, 10 clusters and the corresponding CPs were interpreted by cardiologists from Almazov National Medical Research Centre. The obtained interpretation and further discovery and application of the CP structure are presented in [17]. Another important benefit of such space structure discovery is lowering the uncertainty of a patient's treatment trajectory by hierarchical positioning of an evolving process (selection of a cluster and selection of a position within the cluster). For example, the discrete-event simulation model described in [17] provides a more appropriate length-of-stay distribution when simulating with the discovered classes of CPs (the Kolmogorov-Smirnov statistic decreased by 51%, from 0.255 to 0.124).

Figure 7: Pareto frontier for CP patterns discovery (length of pattern (normed) versus number of non-arranged sequences (normed)).

Figure 8: Evolutionary convergence during CP pattern discovery (normed sum error versus generation).

Figure 9: Example of process model showing transfers between hospital's departments (Cluster 7).

Figure 10: Evolution of synthetic CPs: (a) CP population convergence (distance in symbols versus step, clusters 0-6); (b) evolution of possible CP, showing historical CPs, synthetic CPs, and the target CP (demonstration available at https://www.youtube.com/watch?v=twvfX9zKsY8); (c) number of synthetic CPs by step; (d) % of CPs in correct cluster by step.

Further, we propose an algorithm to dynamically generate possible developments of a healthcare process using the identified graph-based space representation, with evolutionary strategies assimilating incoming data (events) within a case ($\Gamma'^{\,\Sigma}_{M\to D}$ in P5 and $\Gamma^{\Xi\to\Sigma}_{D}$ in P2). We consider the convergence (Figure 10(a)) of the introduced synthetic continuations of the processes to the right class (identified clusters of typical cases were used), with mapping to the graph-based space representation with proximity measures (Figure 10(b)). As a result, the appearance of new CP events decreases the number of synthetic CPs and increases the percentage of CPs positioned in the correct cluster (see an example in Figure 10(c) and Figure 10(d), respectively). This enables interpretable positioning and lowers uncertainty in predicting further CP development for a particular patient.
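The effect described above (fewer synthetic CPs and a growing share in the correct cluster as events arrive) can be illustrated with a deliberately simplified sketch. It replaces the evolutionary strategy with plain prefix filtering, and the population, event strings, and cluster labels are invented for illustration only.

```python
def assimilate(population, observed_prefix):
    """Keep only synthetic pathways consistent with the events observed so far."""
    return [cp for cp in population if cp["events"].startswith(observed_prefix)]

def share_in_cluster(population, cluster):
    """Fraction of the remaining synthetic pathways assigned to the given cluster."""
    if not population:
        return 0.0
    return sum(cp["cluster"] == cluster for cp in population) / len(population)

# Hypothetical synthetic continuations with their (precomputed) cluster labels.
population = [
    {"events": "AFIE", "cluster": 0},
    {"events": "AFNE", "cluster": 0},
    {"events": "AFID", "cluster": 0},
    {"events": "ANIE", "cluster": 1},
    {"events": "DNFE", "cluster": 2},
]
target_events, target_cluster = "AFIE", 0

sizes, shares = [], []
for step in range(len(target_events) + 1):
    alive = assimilate(population, target_events[:step])   # events seen up to this step
    sizes.append(len(alive))
    shares.append(share_in_cluster(alive, target_cluster))

print(sizes)   # [5, 4, 3, 2, 1]
print(shares)  # [0.6, 0.75, 1.0, 1.0, 1.0]
```

In the approach described in the text the population is regenerated and mutated between steps rather than only filtered, so the curves in Figure 10(c) and 10(d) need not be monotone as they are in this toy example.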

Here, a combination of patterns P2, P5, and P6 in the implementation of the proposed algorithm (see Section 2.5) enables interactive investigation of the process space and data assimilation into a population of possible continuations of a single process while it evolves. This solution can be applied in exploratory modeling and simulation of patient flow processing, as well as in decision support in specialized medical centers.

3.3. Problem 3: Mining Social Media. Nowadays, social media analysis (which began with static network models emphasizing the topology of connections between users) strives to explore dynamic behavioral patterns of individuals, which can be recovered from their digital traces on the web. The prediction of social media activities requires combining analytical and data-driven models as well as identifying the optimal structure and parameters of these models according to the available data. Here we show an example of a problem in this field involving evolutionary identification of a model.

Figure 11: Distribution of posts, reposts, and comments on personal walls of subscribers of the bank community (fraction of users (log) versus activity count (log)).

Figure 12: Example of process model (a) with expanded cycles (Cluster 2) and (b) with collapsed cycles (Cluster 1).

A digital trace of a user in an online social network (OSN) is a sequence (chain) of observed activities separated by time gaps. Each OSN supports different types of "hidden" and observable activities. For example, in the largest Russian social network, vk.com (further denoted as VK), a user has a personal page (wall) with three types of activities: post (P), when the user makes a record by himself or herself; repost (R), when the user copies the record of another user or community to his or her wall; and comment (C), when the user comments on a record on his or her wall. Figure 11 illustrates the distribution of these activities for subscribers of a large Russian bank community in VK. The collected dataset consists of the last 100 (or fewer if unavailable) entries (posts or reposts) and the comments on these entries for 8K user walls in the period from January 2017 to December 2017. Comments are much less common than posts and reposts. The distributions of posts and reposts are similar, but there is a group of "spreaders" with a significant number of reposts.

We applied the technique described in Section 3.2 to analyze the processes. Still, the considered process has a significantly different structure. By default, it is continuous with random repetition of events, while the healthcare process in ACS cases is finite and has a more "strong" structure. Figure 12 shows a typical process structure identified with the EC-based approach and visualized with expanded cycles (a) and with collapsed cycles (b). The second one could be considered more relevant than the first one, which is significantly affected by the length of the selected history. It is natural to consider it as a random process or state-transition model. In that case, the three identified clusters (characterized by various frequencies of transitions) could be interpreted as typical behavior models.

Figure 13: Example of process model for cluster 3 (1120 users) using n-gram analysis.

Table 1: Mean of activities' combinations for users' clusters.

Cluster  Size  CCC   CCP   CCR   CPP   CPR   CRR   PPP    PPR    PRR    RRR
1        5238  0.52  0.74  0.24  1     0.72  0.42  6.68   6.4    7.56   8.53
2        2110  0.11  0.13  0.16  0.16  0.36  0.59  2.09   3.95   12.32  60.3
3        1120  0.91  1.38  0.14  3.62  0.75  0.18  50.02  11.78  5      2.77
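Treating a user's trace as a state-transition model means estimating how often one activity type follows another. A minimal sketch, assuming first-order transitions over the labels P, R, and C and an invented example trace:

```python
from collections import Counter, defaultdict

def transition_probabilities(sequence):
    """Estimate first-order transition probabilities from one activity sequence."""
    counts = defaultdict(Counter)
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1
    return {
        state: {nxt: n / sum(successors.values()) for nxt, n in successors.items()}
        for state, successors in counts.items()
    }

# Hypothetical trace: P = post, R = repost, C = comment.
trace = "PPRPPC"
print(transition_probabilities(trace))
# {'P': {'P': 0.5, 'R': 0.25, 'C': 0.25}, 'R': {'P': 1.0}}
```

Clusters of users can then be compared by their transition frequencies, which is the interpretation of "typical behavior models" suggested in the text.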

N-gram analysis is often used to detect patterns in people's behaviors [21, 22]. It is based on counting frequencies of combinations or sequences of activities. We collected all sorted 3-grams (so-called 3-sets) for each user's sequence to analyze the frequency of event combinations. As a result, three clusters of vectors built from 3-set chains were identified with the k-means clustering method. Figure 13 shows all combinations and transitions between them for cluster 3 as an example. Using Figure 12 and Table 1, it is possible to see that cluster 3 includes users who often make new records (P) and sometimes comment on records (C). So cluster 3 mostly consists of "bloggers". Cluster 2 includes "spreaders" who frequently copy other records (R). The biggest cluster, cluster 1, consists of people who make new records and copy other ones equally, but less intensively compared with the other clusters. That may be considered typical behavior for an OSN user. N-gram analysis allows detecting typical behavioral patterns and obtaining process models for social media activities using chains of different lengths as input data. Thus, this type of data-driven modeling is more appropriate for studying continuous processes. Figure 14 shows a graph-based representation of the process space with all users' patterns.
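The construction of 3-set feature vectors can be sketched as follows. The function names and the example trace are illustrative assumptions; the resulting fixed-order vectors are what a k-means implementation would take as input, and the clustering step itself is not shown.

```python
from collections import Counter
from itertools import combinations_with_replacement

# All sorted 3-grams over the labels C, P, R, in the column order of Table 1.
THREE_SETS = ["".join(c) for c in combinations_with_replacement("CPR", 3)]

def three_set_counts(sequence):
    """Count sorted 3-grams ("3-sets") in a sequence of activity labels."""
    return Counter("".join(sorted(sequence[i:i + 3]))
                   for i in range(len(sequence) - 2))

def three_set_vector(sequence):
    """Fixed-length feature vector of 3-set counts for one user."""
    counts = three_set_counts(sequence)
    return [counts[key] for key in THREE_SETS]

trace = "PRPPC"  # hypothetical user trace
print(THREE_SETS)
# ['CCC', 'CCP', 'CCR', 'CPP', 'CPR', 'CRR', 'PPP', 'PPR', 'PRR', 'RRR']
print(three_set_vector(trace))
# [0, 0, 0, 1, 0, 0, 0, 2, 0, 0]
```

Sorting each 3-gram discards the order of events inside the window, so "PRP" and "RPP" both count towards "PPR"; this keeps the feature space at ten dimensions for three activity types.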

This subsection provides very early results. The next steps in applying the proposed approach to this problem include extending the process model structure (a) with temporal labeling (gaps between events); (b) by considering the process within a sliding time window to get more structured processes; (c) by linking the model with causal inference; and (d) by introducing DM techniques for EC positioning of ongoing processes in the model space. We believe that these extensions could enhance the discovery of model structure (P4) and provide deeper insight into social media activity.

4. Conclusion and Future Work

Figure 14: Graph-based representation of processes' space for three clusters in social media activity.

The development of the proposed approach is still an ongoing project. We aim for further systematization and detailing of the proposed concepts, methods, and algorithms, as well as a more comprehensive and deeper implementation of EC-based applications. Further work includes the following directions:

(i) dualization on the role of data-driven and intelligent operations in the proposed approach and described patterns

(ii) extended analysis of various EC techniques applicable within the approach

(iii) investigation of EC-based discovery for models of complex systems with lacking or inconsistent observations

(iv) detailed formalization of expertise and knowledge-based methods within the approach

(v) extending the approach with interactive user-centered modelling and phase space analysis

(vi) development of a multilayered approach for decision support and control of the system and process $S$, available data $D$, and complex model $M$

Data Availability

The data used in the examples presented in Section 3 were initially obtained within complementary projects performed by the authors of the paper. The data are available from the corresponding author upon request, after an explicit statement of the purpose and plan of the requested data usage, to check for possible violation of the corresponding projects' rules.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This paper presents an extension and further development of the work [23]. This research is financially supported by the Russian Scientific Foundation, Agreement 14-11-00823 (15.07.2014).

References

[1] N. Boccara, Modeling Complex Systems, Graduate Texts in Physics, Springer, New York, Second edition, 2010.

[2] H. McManus and D. Hastings, "A framework for understanding uncertainty and its mitigation and exploitation in complex systems," IEEE Engineering Management Review, vol. 34, no. 3, pp. 81-94, 2006.

[3] W. Walker, P. Harremoes, J. Rotmans et al., "Defining uncertainty: a conceptual basis for uncertainty management in model-based decision support," Integrated Assessment, vol. 4, no. 1, pp. 5-17, 2003.

[4] J. Yan and J. R. Deller, "NARMAX model identification using a set-theoretic evolutionary approach," Signal Processing, vol. 123, pp. 30-41, 2016.

[5] I. G. Kevrekidis, C. W. Gear, and G. Hummer, "Equation-free: The computer-aided analysis of complex multiscale systems," AIChE Journal, vol. 50, no. 7, pp. 1346-1355, 2004.

[6] H. Ihshaish, A. Cortes, and M. A. Senar, "Parallel Multi-level Genetic Ensemble for Numerical Weather Prediction Enhancement," Procedia Computer Science, vol. 9, pp. 276-285, 2012.

[7] G. Dumedah, "Formulation of the Evolutionary-Based Data Assimilation and its Implementation in Hydrological Forecasting," Water Resources Management, vol. 26, no. 13, pp. 3853-3870, 2012.

[8] V. V. Kashirin, A. A. Lantseva, S. V. Ivanov, S. V. Kovalchuk, and A. Boukhanovsky, "Evolutionary simulation of complex networks' structures with specific functional properties," Journal of Applied Logic, vol. 24, no. part A, pp. 39-49, 2017.

[9] S. V. Kovalchuk, P. A. Smirnov, K. V. Knyazkov, A. S. Zagarskikh, and A. V. Boukhanovsky, "Knowledge-Based Expressive Technologies Within Cloud Computing Environments," in Practical Applications of Intelligent Systems, vol. 279 of Advances in Intelligent Systems and Computing, pp. 1-11, Springer, Berlin, Germany, 2014.

[10] A. A. Funkner, A. N. Yakovlev, and S. V. Kovalchuk, "Towards evolutionary discovery of typical clinical pathways in electronic health records," Procedia Computer Science, vol. 119, pp. 234-244, 2017.

[11] S. V. Kovalchuk and A. V. Boukhanovsky, "Towards Ensemble Simulation of Complex Systems," Procedia Computer Science, vol. 51, pp. 532-541, 2015.

[12] S. V. Kovalchuk, A. V. Krikunov, K. V. Knyazkov, and A. V. Boukhanovsky, "Classification issues within ensemble-based simulation: application to surge floods forecasting," Stochastic Environmental Research and Risk Assessment, vol. 31, no. 5, pp. 1183-1197, 2017.

[13] D. Fogel, "Phenotypes, genotypes, and operators in evolutionary computation," in Proceedings of the 1995 IEEE International Conference on Evolutionary Computation, p. 193, Perth, WA, Australia.

[14] S. V. Ivanov, S. V. Kovalchuk, and A. V. Boukhanovsky, "Workflow-based Collaborative Decision Support for Flood Management Systems," Procedia Computer Science, vol. 18, pp. 2213-2222, 2013.

[15] A. Gusarov, A. Kalyuzhnaya, and A. Boukhanovsky, "Spatially adaptive ensemble optimal interpolation of in-situ observations into numerical vector field models," in Proceedings of the 6th International Young Scientist Conference on Computational Science, YSC 2017, pp. 325-333, Finland, November 2017.

[16] A. V. Krikunov, E. V. Bolgova, E. Krotov, T. M. Abuhay, A. N. Yakovlev, and S. V. Kovalchuk, "Complex data-driven predictive modeling in personalized clinical decision support for Acute Coronary Syndrome episodes," in Proceedings of the International Conference on Computational Science, ICCS 2016, pp. 518-529, USA, June 2016.

[17] S. V. Kovalchuk, A. A. Funkner, O. G. Metsker, and A. N. Yakovlev, "Simulation of patient flow in multiple healthcare units using process and data mining techniques for model identification," Journal of Biomedical Informatics, vol. 82, pp. 128-142, 2018.

[18] A. Yakovlev, O. Metsker, S. Kovalchuk, and E. Bologova, "Prediction of in-hospital mortality and length of stay in acute coronary syndrome patients using machine-learning methods," Journal of the American College of Cardiology, vol. 71, no. 11, p. A242, 2018.

[19] A. Nikishova, A. Kalyuzhnaya, A. Boukhanovsky, and A. Hoekstra, "Uncertainty quantification and sensitivity analysis applied to the wind wave model SWAN," Environmental Modelling & Software, vol. 95, pp. 344-357, 2017.

[20] E. Rojas, J. Munoz-Gama, M. Sepulveda, and D. Capurro, "Process mining in healthcare: A literature review," Journal of Biomedical Informatics, vol. 61, pp. 224-236, 2016.

[21] T. Sinha, P. Jermann, N. Li, and P. Dillenbourg, "Your click decides your fate: Inferring Information Processing and Attrition Behavior from MOOC Video Clickstream Interactions," in Proceedings of the EMNLP 2014 Workshop on Analysis of Large Scale Social Interaction in MOOCs, pp. 3-14, Doha, Qatar, October 2014.

[22] C. Marceau, "Characterizing the Behavior of a Program Using Multiple-Length N-Grams," Defense Technical Information Center, 2005.

[23] S. V. Kovalchuk, O. G. Metsker, A. A. Funkner et al., "Towards management of complex modeling through a hybrid evolutionary identification," in Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 255-256, Kyoto, Japan, July 2018.



Figure 1: Basic concepts of complex modeling on model ($M$), data ($D$), and system ($S$) layers (concepts: parameters Ξ, functional characteristics Φ, structure Σ; operators $\Gamma^{A}_{L}$; legend: F-models, DD-models, A-models, complicated transitions, expert knowledge).

of the system within a complex state space. Still, most of the solutions are tightly related to the application and the modeling system.

Within the current research, we are trying to develop a unified conceptual and technological approach to support core operations with a complex model by distinguishing concepts and operations on the model, data, and system levels. We consider a combination of EC and data-driven approaches as a tool for building intelligent solutions for more precise and systematic management (and lowering) of uncertainty and for providing the required level of automation, adaptability, and extensibility.

2. Conceptual Basis

The proposed approach is based on several key ideas aimed at extending uncertainty management in complex system modeling and simulation.

(1) Disjoint consideration of model, data, and system in terms of structure, behavior, and quality is aimed toward a system-level review of the modeling and simulation process and distinguishes between the uncertainties of various kinds originating from different levels [8].

(2) Intelligent technologies like data mining, process mining, machine learning, and knowledge-based approaches are to be employed to fill the gap in automation of modeling and simulation. Key sources for the development of such solutions include formalization of various knowledge within composite solutions [9] and data-driven technologies to support the identification of model components.

(3) EC approaches are widespread in modeling and simulation of complex systems [8, 10]. We believe that systematization of this process, with separate consideration of spaces for a system (with its subsystems) and a model (with its submodels), could enhance such solutions significantly.

(4) The aim of the approach's development is twofold. First, it is aimed towards automation of modeling operations to extend the functionality of possible model-based applications. Second, working with a combination of EC and intelligent data-driven technologies could be considered as an additional knowledge source for system and model analysis.

Further, this section considers the conceptual basis of the proposed approach, with a special focus on the role of EC algorithms and data-driven intelligent technologies for building and exploiting complex models.

2.1. Core Concepts. To distinguish between the main modeling concepts and operations, we propose a conceptual framework (see Figure 1) for consideration of key processes and operations during modeling of a complex system. The framework may be considered as a generalization and extension of a framework [11, 12] previously defined and used by the authors for ensemble-based simulation. The current research extends the concept beyond ensemble-based simulation. It is mainly focused on complex modeling in general, with identification of key model management procedures and important artifacts which can be used for model development and application.

The proposed framework considers three main layers of complex systems' modeling, namely, model ($M$), data ($D$), and system ($S$). Main operations (arrows on the diagram) within the framework are defined within three concepts: quantitative parameters ($\Xi$), functional characteristics ($\Phi$), and structure ($\Sigma$). We denote operations by $\Gamma^{A}_{L}$, where $A$ and $L$ stand for the concepts and layers (respectively) involved in the operation. Transitions between concepts and between layers are denoted by $A_1 \to A_2$ and $L_1 \to L_2$, respectively; e.g., operator $\Gamma^{\Xi}_{S \to D}$ reflects observation of quantitative parameters, and operator $\Gamma^{\Xi}_{D \to M}$ stands for basic data assimilation. Also, a set of operators may refer to a single modeling operation; e.g., operators $\Gamma^{\Phi \to \Xi}_{M}$ and $\Gamma^{\Sigma \to \Xi}_{M}$ are often implemented within a single monolithic model. Mainly, operators are related to a specific submodel within a complex model. We consider three key classes of models. F-models are usually classical continuous models developed with knowledge of a system. DD-models are data-driven models based on analysis of available data sets with corresponding techniques (statistics, data mining, process mining, etc.). A-models are mainly intelligent components of a system, usually based on machine learning or knowledge-based approaches. We also consider EC-based components as belonging to the A-models class.
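The operator notation can be made concrete with a small data structure. The following is only an illustrative sketch: the class name `Operator`, its fields, and the plain-text label format are assumptions introduced here and are not part of the original framework.

```python
from dataclasses import dataclass
from typing import Optional

CONCEPTS = {"Xi", "Phi", "Sigma"}  # quantitative parameters, functional characteristics, structure
LAYERS = {"S", "D", "M"}           # system, data, model


@dataclass(frozen=True)
class Operator:
    """Hypothetical encoding of one operator: a concept (or transition) on a layer (or transition)."""
    concept_from: str
    layer_from: str
    concept_to: Optional[str] = None  # None: no transition between concepts
    layer_to: Optional[str] = None    # None: the operator stays on one layer
    primed: bool = False              # True for application / result-analysis operators

    def __post_init__(self):
        for concept in (self.concept_from, self.concept_to):
            if concept is not None and concept not in CONCEPTS:
                raise ValueError(f"unknown concept: {concept}")
        for layer in (self.layer_from, self.layer_to):
            if layer is not None and layer not in LAYERS:
                raise ValueError(f"unknown layer: {layer}")

    def label(self) -> str:
        """Render the operator in a plain-text form of the notation."""
        concept = self.concept_from + (f"->{self.concept_to}" if self.concept_to else "")
        layer = self.layer_from + (f"->{self.layer_to}" if self.layer_to else "")
        prime = "'" if self.primed else ""
        return f"Gamma{prime}^({concept})_({layer})"


observation = Operator("Xi", "S", layer_to="D")             # observation of quantitative parameters
assimilation = Operator("Xi", "D", layer_to="M")            # basic data assimilation
structure_mining = Operator("Xi", "D", concept_to="Sigma")  # mining structure in available data
```

Such an encoding merely makes explicit which concept and layer transitions an operator involves; it says nothing about how the operator is implemented.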

A key problem within complex system modeling and simulation is related to the absent or at least significantly limited possibility to observe the structure and functional characteristics of the system directly (operators $\Gamma^{\Phi}_{S \to D}$ and $\Gamma^{\Sigma}_{S \to D}$). The general solution usually includes implicit substitution of these operators with the expertise of the modeler (operators $\Gamma^{\Phi}_{S \to M}$ and $\Gamma^{\Sigma}_{S \to M}$). Still, the more complex the system under investigation and the model are, the more limited those operations are. To overcome this issue, additional DD-models are involved (operators $\Gamma^{\Xi \to \Sigma}_{D}$ and $\Gamma^{\Xi \to \Phi}_{D}$ for mining in available data, and $\Gamma^{\Phi \to \Xi}_{D \to M}$ for extended discovery of model parameters for various functional characteristics). Also, A-models are employed to extend expert knowledge in discovery of $M$-layer concepts with either formalized knowledge or knowledge discovered in data with machine learning approaches (operators $\Gamma^{\Xi \to \Sigma}_{D \to M}$, $\Gamma^{\Xi \to \Phi}_{D \to M}$, $\Gamma^{\Sigma}_{D \to M}$, and $\Gamma^{\Phi}_{D \to M}$ for direct discovery of structure and functional characteristics, and operators $\Gamma^{\Sigma \to \Phi}_{D}$, $\Gamma^{\Phi \to \Sigma}_{D}$, $\Gamma^{\Sigma \to \Phi}_{M}$, and $\Gamma^{\Phi \to \Sigma}_{M}$ for interconnection of discovered characteristics in available data and within the used model). In the proposed approach, primary attention is paid to these kinds of solutions, where DD- and A-models enhance the complex modeling process with an additional level of automation, adaptation, and knowledge provision.

2.2. Complex Modeling Patterns. Considering the defined conceptual framework, we identify several patterns of modeling and simulation of a complex system (see Figure 2). The patterns are defined as combinations in the context of the framework described previously (3 layers, 3 concepts). An essential idea of the proposed patterns is systematization of complex model management approaches with combinations of expertise, intelligent solutions (A-models), DD-models, and EC.

The patterns extend the operators described in Section 2.1 for model building with operators for model application (i.e., modelling and simulation) and results analysis (e.g., assessing model quality) required for automated model identification. These additional operators are denoted by $\Gamma'^{A}_{L}$, with similar notation for indices.

P1. Regular modeling of a system (Figure 2(a)) is a basic pattern usually applied to discover new knowledge on the system under investigation. A model is built using (a) expertise of the modeler for identification of structure and functional characteristics of the model ($\Gamma^{\Phi}_{S \to M}$ and $\Gamma^{\Sigma}_{S \to M}$) and (b) available input data, usually representing quantitative parameters of a system, considered as a static input of the model or as a source for data assimilation (DA) via operators $\Gamma^{\Xi}_{S \to D}$ and $\Gamma^{\Xi}_{D \to M}$. Results of model application ($\Gamma'^{\Xi}_{M \to D}$, $\Gamma'^{\Sigma}_{M \to D}$, and $\Gamma'^{\Phi}_{M \to D}$) could be considered from a descriptive (mainly structural or quantitative characteristics) or predictive (often forecasting or other functional characteristics) point of view. The obtained results are analyzed in comparison to available information about the investigated system ($\Gamma'^{\Xi}_{D \to S}$, $\Gamma'^{\Sigma}_{D \to S}$, and $\Gamma'^{\Phi}_{D \to S}$), forming an optimization loop which can be considered within the scope of all three concepts. Certain limitations of this pattern, when applied to complex system modeling and simulation, are introduced by two factors. First, working with complex structural and functional characteristics of the model requires a high level of expertise, which limits the extensibility and automation of model operation. Second, performing optimization in a loop requires multiple runs of a model with most algorithms. As a result, computationally intensive models have limitations in optimization-based operations (identification, calibration, etc.) due to performance reasons.

P2. Data-driven modeling (Figure 2(b)) provides an extension to the modeling operation, describing the relationship between data attributes. Application of data-driven models may be considered as a replacement of the actual "full" model by (a) providing information about the structure of the system and model with data mining (DM) and process mining (PM) techniques ($\Gamma^{\Xi \to \Sigma}_{D}$), (b) generating surrogate models for functional characteristics ($\Gamma^{\Xi \to \Phi}_{D}$), and (c) providing estimation of investigated parameters with machine learning (ML) algorithms and models ($\Gamma'^{\Xi \to \Xi}_{D}$). In contrast to the previous pattern, data-driven models usually operate quickly (although it could require significant time to train the model). Still, such models have lower quality than the original "full" models. Nevertheless, combining this pattern with others provides significant enhancement in functionality and performance; e.g., data-driven models can be used in the optimization loop (see the previous pattern).

P3. Ensemble-based modeling (Figure 2(c)) extends P1 for working with sets of objects (models, data sets, and states) reflecting uncertainty, variability, or alternative solutions (e.g., models). Previously [11], we identified 5 classes of ensembles (see E1-E5 in Figure 2(c)): decomposition ensemble, alternative models ensemble, data-driven ensemble, parameter diversity ensemble, and metaensemble. All these patterns can be applied within the context of the proposed framework. Still, an extension of the ensemble structure increases the structural complexity of the model and thus leads to the need for additional (automatic) control procedures. Moreover, the performance issues of P1 become even worse in ensemble modeling.

Figure 2: Complex modeling patterns: (a) regular modeling; (b) data-driven modeling; (c) ensemble-based modeling; (d) data-driven support of complex modeling; (e) EC in hybrid complex modeling; (f) evolutionary space discovery in hybrid complex modeling.

P4. One of the key ideas of the proposed approach is an implementation of data-driven analysis of model states, structure, and behavior. To implement it within the conceptual framework, we propose a pattern for data-driven complex modeling (Figure 2(d)). It includes identification and prediction of a model structure through DM and PM techniques ($\Gamma^{\Xi \to \Sigma}_{D \to M}$) and generation of surrogate models for injection into the complex model ($\Gamma^{\Xi \to \Phi}_{D \to M}$). In addition, it is possible to use data-driven techniques to predict the quality of the considered model and use it for model optimization ($\Gamma'^{\Xi}_{D \to S}$, $\Gamma'^{\Xi \to \Sigma}_{D \to S}$, and $\Gamma'^{\Xi \to \Phi}_{D \to S}$).

P5. A key pattern for EC implementation is presented in Figure 2(e). Here, EC is used to identify a model structure ($\Gamma^{\Xi \to \Sigma}_{D \to M}$) and surrogate submodels ($\Gamma^{\Xi \to \Phi}_{D \to M}$) with consideration of a population of models. As a result, the modeling result is also (as in P3) presented in multiple instances, which may be analyzed, filtered, and evolved within consecutive iterations over changing time (with processing of incoming observations of the system) or within a single timestamp (with fixed observation data).

P6. Finally, the last presented pattern (Figure 2(f)) is aimed at investigation of the system phase space using DD-models and/or EC to reflect the unobservable landscape for estimation of model positioning and assessment of its quality in inferring (sub)optimal structural ($\Gamma'^{\Xi \to \Sigma}_{D \to S}$ and $\Gamma'^{\Sigma}_{M \to D}$) and functional ($\Gamma'^{\Xi \to \Phi}_{D \to S}$ and $\Gamma'^{\Phi}_{M \to D}$) characteristics of the actual system.

These patterns could be easily combined to obtain better results within a specific application. From the point of view of EC, the patterns where a set of model (or submodel) instances is considered (P5, P6) are of special interest. It is possible to consider ensemble-based techniques (P3) in the fashion of EC, but within our approach we prefer consideration of an ensemble as a composite model with several submodels. In that case, ensemble management refers to the concept of complex model structure.

Figure 3: Artifacts and procedures within a typical composite solution.

Several important goals may be reached within the presented patterns:

(i) automation of complex model management with intelligent solutions, DD-models, and EC

(ii) optimization of model structure and application under defined limitations in precision and performance

(iii) enhanced ways of domain knowledge discovery for applications and general investigation of a system

2.3. Composite Solution Development. The proposed structure of core concepts and patterns may be applied in various ways to form a solution which combines operators either implemented originally within the solution or implemented as external model calls. Figure 3 shows the essential elements (artifacts and procedures) in a typical composite solution within the proposed conceptual layers ($S$, $D$, and $M$). The $S$-layer includes the actual system's state, which can be assessed through the observation procedure and described by explicit domain knowledge. The $D$-layer includes datasets divided into observation data and simulation/modeling data, with procedures for data processing and data assimilation. Finally, the $M$-layer includes a set of available basic models $M_1, \ldots, M_N$, which may be identified and calibrated with available data, giving tuned models $M'_1, \ldots, M'_N$ as a result. Here, the essential elements are model composition (which may be performed either automatically or by the modeler) and application of the model.

The key benefit of the approach is the application of a combination of EC, data-driven, and intelligent procedures to manage the whole composite solution, including data processing, modeling, and simulation, to lower uncertainty in $\Sigma \times \Xi \times \Phi$. Within the shown structure, these procedures may be applied

(i) to rank and select alternative models

(ii) to support model identification, calibration, composition, and application

(iii) to manage artifacts on various conceptual layers in a systematic way

(iv) to infer implicit knowledge from available data and explicitly presented domain knowledge

The shown example gives a brief view of composite solution development, while the particular details may differ depending on a particular application. The key procedures within the proposed composite solution, namely, the implementation of intelligent procedures to support model identification and the systematic management of the composite model, are considered in Sections 2.4 and 2.5.

2.4. Evolutionary Model Identification. When implementing evolution of models within a complex modeling task, structure, functional, and quantitative parameters are usually considered as the genotype, whereas model output (data layer) is considered as the phenotype. Within the proposed approach, we can adapt the definition of basic EC operations within genotype-phenotype mapping [13]:

(i) epigenesis as model application $f_1: S \times M \to D$

(ii) selection $f_2: S \times D \to D$

(iii) genotype survival $f_3: S \times D \to M$

(iv) mutation $f_4: M \to M$

In addition, we consider quality assessment, usually treated as fitness for selection and survival (or, in more complex algorithms, for controlling other operations like mutation):

(i) data quality $q_d: S \times D \to Q_d$

(ii) model quality $q_m: S \times M \to Q_m$

Here, $Q_d$ and $Q_m$ are often considered as $\mathbb{R}^N$ with some quantitative quality metrics. Model quality is usually considered through data quality, i.e., $q_m \sim q_d(s, f_1(s, m))$. Within our approach, however, this separation is considered important because we additionally introduce supporting operations with data-driven procedures: in complex modeling, many of these functions (first of all, $f_1$, $q_m$, and $q_d$) are difficult to apply directly (some of these issues are considered in relation to the patterns). Data-driven operations can be introduced as substitutes for the previously introduced basic operations (first of all, $f_1$ and $q_m$; see also patterns P2, P4, and P6):

(i) epigenesis as DD-model application $f^d_1: S \times M \to D$

(ii) model generation $g^d: S \times D \to M$

(iii) model quality prediction $q^d_m: S \times M \to Q_m$

(iv) space discovery $w^d: S \times D \to S$

Operation $w^d$ could be used as an intelligent extension within selection or survival operations ($f_2$ and $f_3$). It becomes especially important in case of lack of knowledge of the system's structure or functional characteristics. Operation $g^d$, at the same time, could be used as a part of the mutation operation $f_4$ (or initial population generation). Having this extension, we can implement enhanced versions of EC algorithms (e.g., genetic algorithms, evolution strategies, and evolutionary programming) with data-driven operations to overcome, or at least to lower, complex modeling issues.
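To illustrate how the basic operations fit together, the following minimal sketch runs a simple elitist evolutionary loop on a toy linear model. Everything in it is an assumption made for illustration (the toy system `y = 2x + 1`, the function names, the Gaussian mutation, and the merged selection and survival step); it is not the implementation used in the case studies. The `quality` argument marks the place where a data-driven substitute for model quality could be plugged in.

```python
import random

# Toy "system": observations y = 2x + 1; the true parameters are unknown to the search.
XS = [0.0, 1.0, 2.0, 3.0, 4.0]
OBSERVED = [2.0 * x + 1.0 for x in XS]


def f1_apply(model):
    """Epigenesis f1: run the model (genotype) to obtain its output on the data layer."""
    a, b = model
    return [a * x + b for x in XS]


def q_d(observed, simulated):
    """Data quality q_d: RMSE between observations and model output (lower is better)."""
    n = len(observed)
    return (sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n) ** 0.5


def q_m(model):
    """Model quality expressed through data quality: q_m ~ q_d(s, f1(s, m))."""
    return q_d(OBSERVED, f1_apply(model))


def f4_mutate(model, rng, sigma=0.3):
    """Mutation f4: M -> M, Gaussian perturbation of every parameter."""
    return tuple(p + rng.gauss(0.0, sigma) for p in model)


def evolve(generations=40, pop_size=20, seed=0, quality=q_m):
    """Elitist loop; `quality` may be replaced by a cheaper data-driven predictor."""
    rng = random.Random(seed)
    population = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(pop_size)]
    for _ in range(generations):
        offspring = [f4_mutate(m, rng) for m in population]
        # Selection f2 and survival f3 are merged here: rank by quality, keep the best.
        ranked = sorted(population + offspring, key=quality)
        population = ranked[:pop_size]
    return population[0]


best = evolve()
```

In a computationally intensive setting, `f1_apply` is the expensive call, which is why replacing `quality` with a surrogate is attractive.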

2.5. Model Management Approach and Algorithm. By model management we mean operations with models within problem domain solution development and application. This includes identification, calibration, DA, optimization, prediction, and forecasting. To systematize model management in the presented patterns, we propose an approach for explicit consideration of spaces $S$, $D$, and $M$ within hybrid modeling with EC and DD-modeling. To summarize complex modeling procedures within the approach, we developed a high-level algorithm which includes a series of steps to be implemented within the context of complex model management.

Step 1 (space discovery). This step identifies the description of the phase space (in most cases, $S$) in case of lack of knowledge or for automation purposes. For example, the step could be applied in the discovery of system state space or model structure. The space description may include (a) distance metrics, (b) proximity structure (e.g., graph, clustering, hierarchy, and density), and (c) a positioning function. One of the possible ways to perform this step is an application of DM and EC algorithms to available data (see pattern P6).

Step 2 (identification of supplementary functions). Data-driven functions ($\Phi$) are applied to work in model evolution, with the space (landscape) representation considered as available information.

Step 3 (evolutionary processing of a set of models). This step is described by a combination of basic EC operations (population initialization, epigenesis, selection, mutation, and survival) with supplementary functions. The form of the combination depends on (a) the selected EC algorithm, (b) application requirements and restrictions, and (c) model-based issues (e.g., performance, quality of surrogate models, etc.).

Step 4 (assimilation of updated data and knowledge). This step is applied for automatic adaptation purposes and implements a DA algorithm. DA can be applied to (a) the set of models, (b) EC operations (e.g., affecting the selection function), (c) supplementary functions (as they are mainly data-driven), and (d) the phase space description (if the descriptive structure is identified from changed data and/or knowledge).

The steps can be repeated in various combinations depending on the application and the implemented pattern. Also, the steps are general and could be implemented in various ways. Several examples are provided in Section 3.
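The four steps can be arranged, for instance, as a loop over incoming batches of observations. The sketch below is a deliberately simplified assumption: a "model" is a single number, the discovered space is just the observed value range, and the supplementary function is a distance to the data mean. All function names are hypothetical and only show how the steps may be chained.

```python
import random
from statistics import mean


def discover_space(data):
    """Step 1: describe the search space; here simply the observed value range."""
    return min(data), max(data)


def fit_support(space, data):
    """Step 2: build a data-driven quality function used during evolution."""
    target = mean(data)
    return lambda m: abs(m - target)


def evolve_step(population, space, quality, rng):
    """Step 3: mutate, clip to the discovered space, keep the best individuals."""
    low, high = space
    children = [min(high, max(low, m + rng.gauss(0.0, 0.5))) for m in population]
    return sorted(population + children, key=quality)[:len(population)]


def manage(batches, population, generations=30, seed=0):
    """Repeat the steps for every batch of newly arrived observations."""
    rng = random.Random(seed)
    data = []
    for batch in batches:
        data = data + list(batch)            # Step 4: assimilate updated data
        space = discover_space(data)         # Step 1
        quality = fit_support(space, data)   # Step 2
        for _ in range(generations):         # Step 3
            population = evolve_step(population, space, quality, rng)
    return population, data


final_population, seen = manage([[1.0, 2.0, 3.0], [10.0, 11.0]], [0.0, 0.0, 0.0, 0.0])
```

Here assimilation is placed at the start of each cycle; as noted above, the order and combination of steps may differ between applications.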

2.6. Available Building Blocks of a Composite Solution. EC offers a flexible and robust solution to identify complex model structures within a complex landscape, with possible adaptation towards changing conditions and the system's state (including new states without prior observation). A significant additional benefit is the ability to manage alternative solutions simultaneously, with possible switching between them and various combinations of them depending on the current needs. Still, within the task of model identification and management, EC (and also many metaheuristics) has certain drawbacks which require additional steps to implement the approach within particular conditions:

(i) high computational cost due to the multiple runs of a model

(ii) low reproducibility and interpretability of obtained results due to the randomized nature of the search procedure

(iii) complicated tuning of hyperparameters for better EC convergence

(iv) indistinct definition of genotype boundaries

(v) complicated mapping of genotype to phenotype space

To overcome these issues, the proposed approach involves two options. First, the intelligent procedures may be used to tune EC hyperparameters (P5); to predict features of genotype-phenotype mapping, boundaries, etc. (P4); and to discover interpretable states and filters (for system, data, and model) to control convergence and adaptation of the population (P2, P4, and P5) in an interpretable and reproducible (through the defined control procedure) way. Second, the composite model may use various approaches, methods, and elements to obtain better quality and performance of the solution:

(i) surrogate models (P2, P4, P5), which may increase performance (for example, within preliminary and intermediate optimization steps)

(ii) ensemble models (P3), which may be considered as an interpretable and controllable population

(iii) interpretation and formal inference using explicit domain-specific knowledge and results of data mining to feed procedures of EC and infer parameters in both models and EC

(iv) controllable space decomposition (P6) with predictive models for possible areas and directions of population migration in EC to explicitly lower uncertainty and obtain additional interpretability

Finally, an essential feature of the proposed approach is a holistic analysis of a composite solution with possible coevolution of models (submodels within a composite model) and data processing procedures.

3. Application Examples

This section presents several practical examples where the proposed approach, patterns, or some of their elements were applied. The examples were intentionally selected from diverse problem domains to consider the generality of the approach. The considered problems are developed in separate projects which are at various stages. Problem 1 (ensemble metocean simulation) was investigated in a series of projects (see, e.g., [11, 14, 15]). Within this research, we are trying to extend model calibration and DA with EC techniques to develop more flexible and accurate multimodel ensembles. Problem 2 (clinical pathways (CPs) modelling) is important in several ongoing projects aimed at model-based decision support in healthcare (see, e.g., [16–18]). The proposed approach plays an important role by enabling deeper analysis of clinical pathways in various scenarios (interactive analysis of available CPs with identification of clusters of similar patients, DA in predictive modelling of ongoing cases, etc.). Finally, Problem 3 shows very early results of a recently started project in online social network analysis.

3.1. Problem 1: Evolution in Models for Metocean Simulation. Environmental simulation systems usually contain several numerical models serving different purposes (complementary simulation processes, improving the reliability of a system by providing alternative results, etc.). Each model can typically be described by a large number of quantitative parameters and functional characteristics that should be adjusted by an expert or using intelligent automated methods (e.g., EC). Alternative models inside the environmental simulation system can be joined in an ensemble according to a complex modeling pattern based on evolutionary computing (a combination of patterns P3 and P5). In the current case study, we introduce an example illustrating the ensemble concept in the forms of the alternative models ensemble, the parameter diversity ensemble, and the metaensemble. For identification of the parameters of the proposed ensembles, the least squares method (in the case of model linearity) or optimization methods (in the case of nonlinearity) can be used. As we need to take into account not only the functional space $\Phi$ and the space of parameters $\Xi$ for a single model but also the optimal coexistence of models in the system (i.e., $\Sigma$), evolutionary and coevolutionary approaches seem to be applicable techniques for this task. It is worth mentioning that the coevolutionary approach can be applied to independent model realizations through an ensemble as a connecting element. In this case, parameters (weights) in the ensemble can be estimated separately from the coevolution procedure, in a constant form or dynamically. As a case study of complex environmental modeling, we design an ensemble model that consists of the SWAN (http://swanmodel.sourceforge.net) model for ocean wave simulation based on two different surface forcings, by NCEP (https://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.html) and ERA Interim (https://www.ecmwf.int/en/forecasts/datasets/archive-datasets/reanalysis-datasets/era-interim). Thus, different implementations of the SWAN model were connected in the form of an alternative models ensemble with least-squares-calculated coefficients defining the structure of the complex model. Two parameters, wind drag and whitecapping rate (WCR), were calibrated using evolutionary and coevolutionary algorithms implementing $\Gamma^{\Xi \to \Phi}_{D \to M}$ in P5 (for a detailed sensitivity analysis of SWAN, see [19]). The coevolutionary approach can be represented in the form of a parameter diversity ensemble, where each population is constructed as an ensemble of alternative model results with different parameters. We can also add ensemble weights to the model parameter diversity and obtain a metaensemble that can be identified within the coevolutionary approach.
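As an illustration of least-squares-calculated ensemble coefficients, the sketch below solves the normal equations for a two-member linear ensemble. It is an assumption-laden toy: the function name `ensemble_weights`, the absence of an intercept term, and the sample series are all hypothetical and not taken from the study.

```python
def ensemble_weights(m1, m2, observed):
    """Least-squares weights (w1, w2) minimizing sum((w1*m1 + w2*m2 - observed)^2)."""
    s11 = sum(a * a for a in m1)
    s22 = sum(b * b for b in m2)
    s12 = sum(a * b for a, b in zip(m1, m2))
    r1 = sum(a * y for a, y in zip(m1, observed))
    r2 = sum(b * y for b, y in zip(m2, observed))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("model outputs are collinear; weights are not unique")
    # Cramer's rule for the 2x2 normal equations.
    return (r1 * s22 - r2 * s12) / det, (r2 * s11 - r1 * s12) / det


# Hypothetical wave-height series from two model implementations and "observations".
model_a = [1.0, 2.0, 3.0, 4.0]
model_b = [2.0, 1.0, 0.0, 3.0]
observations = [0.6 * a + 0.4 * b for a, b in zip(model_a, model_b)]
w_a, w_b = ensemble_weights(model_a, model_b, observations)
```

With more ensemble members, the same normal equations would typically be solved with a linear algebra routine rather than by hand.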

In the process of model identification and verification, measurements from several wave stations in the Kara Sea were used. The fitness function represents the mean error (RMSE) over all wave stations. For verification of the results, the MAE (mean absolute error) and DTW (dynamic time warping) metrics were used.
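For reference, the three metrics can be sketched in a few lines. The definitions below are the standard textbook ones; the exact variants used in the study (for example, the local cost or windowing in DTW, or how stations are averaged in the fitness) are not specified in the text, so the `fitness` helper in particular is an assumption.

```python
import math


def rmse(observed, simulated):
    """Root-mean-square error between two equally long series."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / len(observed))


def mae(observed, simulated):
    """Mean absolute error between two equally long series."""
    return sum(abs(o - s) for o, s in zip(observed, simulated)) / len(observed)


def dtw(a, b):
    """Classic dynamic time warping distance with |x - y| as the local cost."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]


def fitness(stations):
    """Assumed fitness: RMSE averaged over stations given as (observed, simulated) pairs."""
    return sum(rmse(o, s) for o, s in stations) / len(stations)
```

Unlike RMSE and MAE, DTW tolerates small time shifts between the simulated and observed series, which is why it is a useful complementary verification metric.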

Figure 4(a) represents the surface (landscape) of RMSE in the space of the announced parameters (drag and WCR) for implementations SWAN+ERA and SWAN+NCEP. It can be seen that the evolutionary-obtained results converge to the minimum of the error landscape. The landscape was obtained by running the model with all parameter variants from a full 30×30 grid (i.e., 900 runs), while the evolutionary algorithm converged in 5 generations with 10 individuals (parameter sets) in the population (50 runs), which allows performing identification two orders faster. The convergence of coevolution for the SWAN+ERA+NCEP case is presented in Figure 5.

Figure 4: Metocean simulation: (a) error landscape for wave height simulation results using ERA and NCEP reanalysis as input data and (b) Pareto frontier of coevolution results for all generations.

Figure 5: Coevolution convergence of diversity parameters ensemble for metocean models.

Although the error landscapes for the pair of implementations SWAN+ERA and SWAN+NCEP are close to each other, separate evolution does not consider optimization of the ensemble result. For this purpose, we apply a coevolutionary approach that produces the set of Pareto-optimal solutions for each generation. Figure 4(b) shows that the error of each model in the ensemble is significant (coNCEP and coERA for the models alone), but the error of the whole ensemble (coEnsemble) converges to the minimum very fast.
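Selecting Pareto-optimal solutions amounts to filtering out dominated individuals. The following minimal sketch assumes that every individual is scored by a tuple of errors to be minimized; the objective values in the example are invented for illustration.

```python
def dominates(p, q):
    """True if p is no worse than q in every objective and strictly better in one (minimization)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def pareto_front(points):
    """Return the non-dominated points, preserving their input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (error of model 1, error of model 2) pairs for five individuals.
scores = [(1, 5), (2, 2), (3, 3), (5, 1), (4, 4)]
front = pareto_front(scores)
```

This quadratic-time filter is adequate for small populations such as those used here; larger populations usually call for non-dominated sorting.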

The obtained result can be analyzed from the uncertainty reduction point of view. Optimization of model parameters helps to reduce parameter uncertainty, which can be estimated through the error function. When we apply an ensemble approach to evolutionary optimized results, however, we can also speak of reducing the uncertainty connected with input data sources (NCEP and ERA). Moreover, the metaensemble approach allowed reduction of the uncertainty connected with ensemble parameters.


Figure 6: Graph-based representation of processes space in healthcare (interactive view). (Demonstration available at https://www.youtube.com/watch?v=EH74f1w6EeY.)

Summarizing the results of the metocean case study, we can note that the EC approach shows a significant efficiency gain, up to 120 times compared with grid search, without accuracy losses. According to this experimental study, the quality of the ensemble with evolutionary optimized models is similar to the results of the grid search: the MAE metric is equal to 0.24 m and the DTW metric to 51. We can also mention that the coevolutionary approach provides a 10% accuracy gain compared with the results of single evolution of model implementations, but this is still similar to the ensemble result with evolutionary optimized models. Nevertheless, the coevolutionary approach allowed achieving a 200 times acceleration. Within the context of the proposed approach, space $\Phi$ was investigated using a defined structure of the model in space $\Sigma$ for the purpose of model calibration.

3.2. Problem 2: Modeling Health Care Process. Modeling healthcare processes is usually related to enormous uncertainty and variability, even when modeling a single disease. One of the ways to identify a model of such a process is PM [20]. Still, direct implementation of PM methods does not remove a major part of the uncertainty. Within the current research, we applied the proposed approach for identification purposes, both in the analysis of historical cases and in the prediction of single process development. Here, we consider processes of providing health care in acute coronary syndrome (ACS) cases, ACS being usually considered one of the major causes of death in the world. We used a set of 3434 ACS cases collected during 2010-2015 in Almazov National Medical Research Centre, one of the leading cardiological centers in Russia. The data set contains electronic health records of these patients with all registered events and characteristics of a patient.

To simplify consideration of the multidimensional space of possible processes ($\Gamma'^{\Xi \to \Sigma}_{D \to S} \Gamma^{\Xi}_{S \to D}$ for analysis of $\Sigma$ on layer $S$), we introduced a graph-based representation of this space, with vertices representing cases and edges representing proximity of cases. Analysis of such a structure enables easy discovery of common cases (e.g., as communities in the graph). Such discovery enables explicit, interpretable structuring of the space and representation of a further landscape for EC in terms of pattern P6. Moreover, direct interactive investigation of a visual representation of such a structure (see Figure 6) provides significant insights for medical researchers.
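A graph of this kind can be sketched as follows. The sketch rests on several assumptions that are not taken from the study: cases are encoded as strings of event labels, proximity is measured by edit distance with a fixed threshold, and "communities" are approximated by connected components. The example sequences are invented.

```python
def edit_distance(a, b):
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def proximity_graph(cases, threshold):
    """Vertices are case indices; an edge links cases whose distance is within the threshold."""
    edges = {i: set() for i in range(len(cases))}
    for i in range(len(cases)):
        for j in range(i + 1, len(cases)):
            if edit_distance(cases[i], cases[j]) <= threshold:
                edges[i].add(j)
                edges[j].add(i)
    return edges


def components(edges):
    """Connected components as a crude stand-in for community detection."""
    seen, groups = set(), []
    for start in edges:
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            group.append(node)
            for nxt in edges[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(group))
    return groups


cases = ["AFNIFE", "AFNIFED", "AFNFE", "IDDE", "IDDEE"]
groups = components(proximity_graph(cases, threshold=1))
```

For thousands of cases, the all-pairs distance computation becomes the bottleneck, so a practical implementation would need indexing or sampling.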

We have developed evolutionary-based algorithm forpatterns identification and clustering in such representationwith two criteria to be optimized (see Figure 7) Hereprocesses were represented by a sequence of labels (symbols)denoting key events in PM model Typical patterns werethen selected for Pareto frontier The convergence process isdemonstrated in Figure 8 (10 best individuals from Paretofrontier according to the integral criterion were selected) Asa result this solution may refer to P5 pattern and operatorΓΞ997888rarrΣ119863997888rarr119872

while discovering model structure. Figure 9 shows an example of a typical process model (i.e., a structural characteristic of the model) for one of the identified clusters. A detailed description of the approach, algorithms, and results on CP discovery, clustering, and analysis, including a performance comparison of three versions of the CP discovery algorithm, can be found in [10]. An important outcome of the approach in this application is the interpretability of the clusters and identified patterns. For example, 10 clusters and the corresponding CPs were interpreted by cardiologists from Almazov National Medical Research Centre. The obtained interpretation and its further development and application with the CP structure are presented in [17]. Another important benefit of such space structure discovery is lowering the uncertainty of a patient's treatment trajectory by hierarchical positioning of an evolving process (selection of a cluster and selection of a position within the cluster). For example, the discrete-event simulation model described in [17] provides a more appropriate length of stay distribution within simulation with discovered classes of CPs (the Kolmogorov-Smirnov statistic decreased by 51%, from 0.255 to 0.124).

Figure 7: Pareto frontier for CP patterns discovery (horizontal axis: number of non-arranged sequences, normed; vertical axis: length of pattern, normed).

Figure 8: Evolutionary convergence during CP pattern discovery (horizontal axis: generation; vertical axis: normed sum error).

Figure 9: Example of process model showing transfers between hospital's departments (cluster 7).

Figure 10: Evolution of synthetic CPs: (a) CP population convergence; (b) evolution of possible CP (demonstration available at https://www.youtube.com/watch?v=twvfX9zKsY8); (c) number of synthetic CPs; (d) percentage of CPs in correct cluster.
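The Kolmogorov-Smirnov comparison reported above can be illustrated with a minimal sketch. The function name `ks_statistic` and all sample values are hypothetical and chosen only for illustration; they are not the data of [17]. The sketch computes the two-sample statistic as the largest gap between two empirical distribution functions.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    # Both empirical CDFs are step functions, so the largest gap
    # is always reached at one of the sample points.
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


# Hypothetical lengths of stay in days (illustrative values only).
observed = [3, 4, 4, 5, 6, 7, 9, 12]
simulated_single_class = [5, 5, 6, 6, 6, 7, 7, 8]
simulated_with_clusters = [3, 4, 5, 5, 6, 8, 9, 11]

print(ks_statistic(observed, simulated_single_class))
print(ks_statistic(observed, simulated_with_clusters))
```

A smaller statistic means the simulated distribution is closer to the observed one. The reported change from 0.255 to 0.124 corresponds to a relative decrease of (0.255 - 0.124)/0.255, i.e., about 51%.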

Further, we propose an algorithm to dynamically generate possible developments of a healthcare process using the identified graph-based space representation with evolutionary strategies assimilating incoming data (events) within a case ($\Gamma'^{\Sigma}_{M \to D}$ in P5 and $\Gamma^{\Xi \to \Sigma}_{D}$ in P2). We consider the convergence (Figure 10(a)) of the introduced synthetic continuations of the processes to the right class (identified clusters of typical cases were used), with mapping to the graph-based space representation with proximity measures (Figure 10(b)). As a result, the appearance of the CP's events decreases the number of synthetic CPs and increases the percentage of CPs positioned in the correct cluster (see an example in Figure 10(c) and Figure 10(d), correspondingly). This enables interpretable positioning and lowers uncertainty in predicting the further development of the CP for a particular patient.
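The effect of assimilating incoming events on a population of synthetic CPs can be sketched as follows. This is a deliberately simplified illustration, not the authors' algorithm: it replaces the evolutionary strategy and the proximity measures with a plain prefix filter, and the department codes, cluster labels, and function names are hypothetical.

```python
def assimilate(population, observed):
    """Keep synthetic CPs consistent with the events observed so far."""
    n = len(observed)
    return [(seq, cluster) for seq, cluster in population if seq[:n] == observed]


def share_in_cluster(population, cluster):
    """Fraction of the population assigned to the given cluster."""
    if not population:
        return 0.0
    return sum(1 for _, c in population if c == cluster) / len(population)


# Hypothetical synthetic CPs: (sequence of department codes, cluster label).
population = [
    (("A", "I", "F", "E"), 0),
    (("A", "I", "N", "E"), 1),
    (("A", "F", "N", "E"), 1),
    (("A", "I", "F", "D"), 0),
    (("N", "I", "F", "E"), 2),
]
target = ("A", "I", "F", "E")  # the evolving case; its correct cluster is 0

history = []
for step in range(len(target) + 1):
    survivors = assimilate(population, target[:step])
    history.append((step, len(survivors), share_in_cluster(survivors, 0)))

for row in history:
    print(row)  # (step, number of synthetic CPs, share in correct cluster)
```

In this toy setting, each new event shrinks the population while the share of CPs in the correct cluster grows, which is the qualitative behavior described for Figure 10(c) and Figure 10(d).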

Here, a combination of patterns P2, P5, and P6 in the implementation of the proposed algorithm (see Section 2.5) enables interactive investigation of the process space and data assimilation into a population of possible continuations of a single process as it evolves. This solution can be applied in exploratory modeling and simulation of patient flow processing, as well as in decision support in specialized medical centers.

3.3. Problem 3: Mining Social Media. Nowadays, social media analysis (which began with static network models emphasizing the topology of connections between users) strives to explore dynamic behavioral patterns of individuals, which can be recovered from their digital traces on the web. The prediction of social media activities requires combining analytical and data-driven models, as well as identifying the optimal structure and parameters of these models according to the available data. Here, we show an example of a problem in this field involving evolutionary identification of a model.

Figure 11: Distribution of posts, reposts, and comments on personal walls of subscribers of the bank community (horizontal axis: activity count, log scale; vertical axis: fraction of users, log scale).

Figure 12: Example of process model (a) with expanded cycles (cluster 2) and (b) with collapsed cycles (cluster 1).

A digital trace of a user in an online social network (OSN) is a sequence (chain) of observed activities separated by time gaps. Each OSN supports different types of "hidden" and observable activities. For example, in the largest Russian social network, vk.com (further denoted as VK), a user has a personal page (wall) with three types of activities: post (P), when a user makes a record by himself or herself; repost (R), when the user copies the record of another user or community to his or her wall; and comment (C), when the user comments on a record on his or her wall. Figure 11 illustrates the distribution of these activities for subscribers of a large Russian bank community in VK. The collected dataset consists of the last 100 (or fewer, if unavailable) entries (posts or reposts) and the comments on these entries for 8K user walls in the period from January 2017 to December 2017. Comments are much less common than posts and reposts. The distributions of posts and reposts are similar, but there is a group of "spreaders" with a significant number of reposts.

We applied the technique described in Section 3.2 to analyze the processes. Still, the considered process has a significantly different structure. By default, it is continuous with random repetition of events, while the healthcare process in ACS cases is finite and has a "stronger" structure. Figure 12 shows a typical process structure identified with the EC-based approach and visualized with expanded cycles (a) and with collapsed cycles (b). The second one could be considered more relevant than the first one, which is significantly affected by the length of the selected history. It is natural to consider it as a random process or state-transition model. In that case, the three identified clusters (characterized by various frequencies of transitions) could be interpreted as typical behavior models.

Figure 13: Example of process model for cluster 3 (1120 users) using n-gram analysis.

Table 1: Mean of activities' combinations for users' clusters.

Cluster  Size  CCC   CCP   CCR   CPP   CPR   CRR   PPP    PPR    PRR    RRR
1        5238  0.52  0.74  0.24  1     0.72  0.42  6.68   6.4    7.56   8.53
2        2110  0.11  0.13  0.16  0.16  0.36  0.59  2.09   3.95   12.32  60.3
3        1120  0.91  1.38  0.14  3.62  0.75  0.18  50.02  11.78  5      2.77
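The state-transition interpretation mentioned above can be sketched by estimating first-order transition frequencies from a single activity chain. The function name and the example chain are hypothetical; the P, R, and C codes follow the activity types defined in the text.

```python
from collections import Counter


def transition_frequencies(sequence):
    """Estimate first-order transition probabilities from one activity chain."""
    pair_counts = Counter(zip(sequence, sequence[1:]))
    # Every element except the last one is the source of exactly one transition.
    source_counts = Counter(sequence[:-1])
    return {
        (src, dst): count / source_counts[src]
        for (src, dst), count in pair_counts.items()
    }


chain = "PPRPCPRR"  # hypothetical wall history: posts, reposts, one comment
probs = transition_frequencies(chain)
for (src, dst), p in sorted(probs.items()):
    print(f"{src} -> {dst}: {p:.2f}")
```

Users whose transition tables are similar could then be grouped together, which matches the idea of clusters characterized by different transition frequencies.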

N-gram analysis is often used to detect patterns in people's behavior [21, 22]. It is based on counting the frequencies of combinations or sequences of activities. We collected all sorted 3-grams (so-called 3-sets) for each user's sequence to analyze the frequency of event combinations. As a result, three clusters of vectors with 3-set chains were identified with the k-means clustering method. Figure 13 shows all combinations and transitions between them for cluster 3 as an example. Using Figure 12 and Table 1, it is possible to see that cluster 3 includes users who often make new records (P) and sometimes comment on records (C). So, cluster 3 mostly consists of "bloggers." Cluster 2 includes "spreaders," who copy other records (R) frequently. The biggest cluster, cluster 1, consists of people who make new records and copy other ones equally, but less intensively compared to the other clusters. That may be considered typical behavior for an OSN user. N-gram analysis allows detecting typical behavioral patterns and obtaining process models for social media activities using chains of different lengths as input data. Thus, this type of data-driven modeling is more appropriate for researching continuous processes. Figure 14 shows a graph-based representation of the process space with all users' patterns.
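The construction of 3-set frequency vectors described above can be sketched as follows. The function names and the example chain are hypothetical; the column order produced by `combinations_with_replacement` matches the ten combinations listed in Table 1. The k-means step itself is omitted here.

```python
from collections import Counter
from itertools import combinations_with_replacement

# The ten possible 3-sets over the activities C, P, and R:
# CCC, CCP, CCR, CPP, CPR, CRR, PPP, PPR, PRR, RRR.
THREE_SETS = ["".join(c) for c in combinations_with_replacement("CPR", 3)]


def three_set_counts(sequence):
    """Count sorted 3-grams ("3-sets") in a chain of activities."""
    return Counter(
        "".join(sorted(sequence[i:i + 3])) for i in range(len(sequence) - 2)
    )


def feature_vector(sequence):
    """Fixed-length vector of 3-set counts, suitable as clustering input."""
    counts = three_set_counts(sequence)
    return [counts[key] for key in THREE_SETS]


chain = "PPRPCPRR"  # hypothetical activity chain of one user
print(THREE_SETS)
print(feature_vector(chain))
```

Sorting each 3-gram discards the order of events inside the window, so the vector describes which combinations of activities co-occur rather than their exact sequence.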

This subsection provides very early results. The next steps in applying the proposed approach to this problem include an extension of the process model structure with (a) temporal labeling (gaps between events), (b) consideration of the process within a sliding time window to get more structured processes, (c) linking of the model with causal inference, and (d) introduction of DM techniques for EC positioning of ongoing processes in the model space. We believe that these extensions could enhance the discovery of model structure (P4) and provide deeper insight into social media activity.

4. Conclusion and Future Work

Figure 14: Graph-based representation of processes' space for three clusters in social media activity.

The development of the proposed approach is still an ongoing project. We aim for further systematization and detailing of the proposed concepts, methods, and algorithms, as well as a more comprehensive and deeper implementation of EC-based applications. Further work includes the following directions:

(i) dualization on the role of data-driven and intelligent operations in the proposed approach and described patterns;

(ii) extended analysis of various EC techniques applicable within the approach;

(iii) investigation of EC-based discovery for models of complex systems with lacking or inconsistent observations;

(iv) detailed formalization of expertise and knowledge-based methods within the approach;

(v) extending the approach with interactive user-centered modelling and phase space analysis;

(vi) development of a multilayered approach for decision support and control of the system and process $S$, available data $D$, and complex model $M$.

Data Availability

The data used in the examples presented in Section 3 were initially obtained within complementary projects performed by the authors of the paper. The data are available from the corresponding author upon request, after an explicit statement of the purpose and plan of the requested data usage, to check for possible violation of the corresponding projects' rules.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This paper presents an extension and further development of the work [23]. This research is financially supported by The Russian Scientific Foundation, Agreement 14-11-00823 (15.07.2014).

References

[1] N. Boccara, Modeling Complex Systems, Graduate Texts in Physics, Springer, New York, Second edition, 2010.

[2] H. McManus and D. Hastings, "A framework for understanding uncertainty and its mitigation and exploitation in complex systems," IEEE Engineering Management Review, vol. 34, no. 3, pp. 81–94, 2006.

[3] W. Walker, P. Harremoes, J. Rotmans et al., "Defining uncertainty: a conceptual basis for uncertainty management in model-based decision support," Integrated Assessment, vol. 4, no. 1, pp. 5–17, 2003.

[4] J. Yan and J. R. Deller, "NARMAX model identification using a set-theoretic evolutionary approach," Signal Processing, vol. 123, pp. 30–41, 2016.

[5] I. G. Kevrekidis, C. W. Gear, and G. Hummer, "Equation-free: The computer-aided analysis of complex multiscale systems," AIChE Journal, vol. 50, no. 7, pp. 1346–1355, 2004.

[6] H. Ihshaish, A. Cortes, and M. A. Senar, "Parallel Multi-level Genetic Ensemble for Numerical Weather Prediction Enhancement," Procedia Computer Science, vol. 9, pp. 276–285, 2012.

[7] G. Dumedah, "Formulation of the Evolutionary-Based Data Assimilation and its Implementation in Hydrological Forecasting," Water Resources Management, vol. 26, no. 13, pp. 3853–3870, 2012.

[8] V. V. Kashirin, A. A. Lantseva, S. V. Ivanov, S. V. Kovalchuk, and A. Boukhanovsky, "Evolutionary simulation of complex networks' structures with specific functional properties," Journal of Applied Logic, vol. 24, part A, pp. 39–49, 2017.

[9] S. V. Kovalchuk, P. A. Smirnov, K. V. Knyazkov, A. S. Zagarskikh, and A. V. Boukhanovsky, "Knowledge-Based Expressive Technologies Within Cloud Computing Environments," in Practical Applications of Intelligent Systems, vol. 279 of Advances in Intelligent Systems and Computing, pp. 1–11, Springer, Berlin, Germany, 2014.

[10] A. A. Funkner, A. N. Yakovlev, and S. V. Kovalchuk, "Towards evolutionary discovery of typical clinical pathways in electronic health records," Procedia Computer Science, vol. 119, pp. 234–244, 2017.

[11] S. V. Kovalchuk and A. V. Boukhanovsky, "Towards Ensemble Simulation of Complex Systems," Procedia Computer Science, vol. 51, pp. 532–541, 2015.

[12] S. V. Kovalchuk, A. V. Krikunov, K. V. Knyazkov, and A. V. Boukhanovsky, "Classification issues within ensemble-based simulation: application to surge floods forecasting," Stochastic Environmental Research and Risk Assessment, vol. 31, no. 5, pp. 1183–1197, 2017.

[13] D. Fogel, "Phenotypes, genotypes, and operators in evolutionary computation," in Proceedings of the 1995 IEEE International Conference on Evolutionary Computation, p. 193, Perth, WA, Australia.

[14] S. V. Ivanov, S. V. Kovalchuk, and A. V. Boukhanovsky, "Workflow-based Collaborative Decision Support for Flood Management Systems," Procedia Computer Science, vol. 18, pp. 2213–2222, 2013.

[15] A. Gusarov, A. Kalyuzhnaya, and A. Boukhanovsky, "Spatially adaptive ensemble optimal interpolation of in-situ observations into numerical vector field models," in Proceedings of the 6th International Young Scientist Conference on Computational Science, YSC 2017, pp. 325–333, Finland, November 2017.

[16] A. V. Krikunov, E. V. Bolgova, E. Krotov, T. M. Abuhay, A. N. Yakovlev, and S. V. Kovalchuk, "Complex data-driven predictive modeling in personalized clinical decision support for Acute Coronary Syndrome episodes," in Proceedings of the International Conference on Computational Science, ICCS 2016, pp. 518–529, USA, June 2016.

[17] S. V. Kovalchuk, A. A. Funkner, O. G. Metsker, and A. N. Yakovlev, "Simulation of patient flow in multiple healthcare units using process and data mining techniques for model identification," Journal of Biomedical Informatics, vol. 82, pp. 128–142, 2018.

[18] A. Yakovlev, O. Metsker, S. Kovalchuk, and E. Bologova, "Prediction of in-hospital mortality and length of stay in acute coronary syndrome patients using machine-learning methods," Journal of the American College of Cardiology, vol. 71, no. 11, p. A242, 2018.

[19] A. Nikishova, A. Kalyuzhnaya, A. Boukhanovsky, and A. Hoekstra, "Uncertainty quantification and sensitivity analysis applied to the wind wave model SWAN," Environmental Modelling & Software, vol. 95, pp. 344–357, 2017.

[20] E. Rojas, J. Munoz-Gama, M. Sepulveda, and D. Capurro, "Process mining in healthcare: A literature review," Journal of Biomedical Informatics, vol. 61, pp. 224–236, 2016.

[21] T. Sinha, P. Jermann, N. Li, and P. Dillenbourg, "Your click decides your fate: Inferring Information Processing and Attrition Behavior from MOOC Video Clickstream Interactions," in Proceedings of the EMNLP 2014 Workshop on Analysis of Large Scale Social Interaction in MOOCs, pp. 3–14, Doha, Qatar, October 2014.

[22] C. Marceau, "Characterizing the Behavior of a Program Using Multiple-Length N-Grams," Defense Technical Information Center, 2005.

[23] S. V. Kovalchuk, O. G. Metsker, A. A. Funkner et al., "Towards management of complex modeling through a hybrid evolutionary identification," in Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 255-256, Kyoto, Japan, July 2018.


artifacts which can be used for model development and application.

The proposed framework considers three main layers of complex systems' modeling, namely, model ($M$), data ($D$), and system ($S$). Main operations (arrows on the diagram) within the framework are defined within three concepts: quantitative parameters ($\Xi$), functional characteristics ($\Phi$), and structure ($\Sigma$). We denote operations by $\Gamma^{A}_{L}$, where $A$ and $L$ stand for the concepts and layers (respectively) involved in the operation. Transitions between concepts and between layers are denoted by $A_1 \to A_2$ and $L_1 \to L_2$, respectively; e.g., operator $\Gamma^{\Xi}_{S \to D}$ reflects observation of quantitative parameters, and operator $\Gamma^{\Xi}_{D \to M}$ stands for basic data assimilation. Also, a set of operators may refer to a single modeling operation; e.g., operators $\Gamma^{\Phi \to \Xi}_{M}$ and $\Gamma^{\Sigma \to \Xi}_{M}$ are often implemented within a single monolithic model. Mainly, operators are related to a specific submodel within a complex model. We consider three key classes of models. F-models are usually classical continuous models developed with knowledge of a system. DD-models are data-driven models based on analysis of available data sets with corresponding techniques (statistics, data mining, process mining, etc.). A-models are mainly intelligent components of a system, usually based on machine learning or knowledge-based approaches. Also, we consider EC-based components as belonging to the A-models class.

A key problem within complex system modeling and simulation is related to the absent, or at least significantly limited, possibility to observe the structure and functional characteristics of the system directly (operators $\Gamma^{\Phi}_{S \to D}$ and $\Gamma^{\Sigma}_{S \to D}$). The general solution usually includes implicit substitution of these operators with the expertise of the modeler (operators $\Gamma^{\Phi}_{S \to M}$ and $\Gamma^{\Sigma}_{S \to M}$). Still, the more complex the system under investigation and the model are, the more limited those operations are. To overcome this issue, additional DD-models are involved (operators $\Gamma^{\Xi \to \Sigma}_{D}$ and $\Gamma^{\Xi \to \Phi}_{D}$ for mining in available data, and $\Gamma^{\Phi \to \Xi}_{D \to M}$ for extended discovery of model parameters for various functional characteristics). Also, A-models are employed to extend expert knowledge in the discovery of $M$-layer concepts with either formalized knowledge or knowledge discovered in data with machine learning approaches (operators $\Gamma^{\Xi \to \Sigma}_{D \to M}$, $\Gamma^{\Xi \to \Phi}_{D \to M}$, $\Gamma^{\Sigma}_{D \to M}$, and $\Gamma^{\Phi}_{D \to M}$ for direct discovery of structure and functional characteristics, and operators $\Gamma^{\Sigma \to \Phi}_{D}$, $\Gamma^{\Phi \to \Sigma}_{D}$, $\Gamma^{\Sigma \to \Phi}_{M}$, and $\Gamma^{\Phi \to \Sigma}_{M}$ for interconnection of discovered characteristics in available data and within the used model). In the proposed approach, primary attention is paid to these kinds of solutions, where DD- and A-models enhance the complex modeling process with an additional level of automation, adaptation, and knowledge provision.

2.2. Complex Modeling Patterns. Considering the defined conceptual framework, we identify several patterns of modeling and simulation of a complex system (see Figure 2). The patterns are defined as combinations in the context of the framework described previously (3 layers, 3 concepts). An essential idea of the proposed patterns is the systematization of complex model management approaches with combinations of expertise, intelligent solutions (A-models), DD-models, and EC.

The patterns extend the operators described in Section 2.1 for model building with operators for model application (i.e., modelling and simulation) and results analysis (e.g., assessing model quality) required for automated model identification. These additional operators are denoted by $\Gamma'^{A}_{L}$, with similar notation for indices.

P1. Regular modeling of a system (Figure 2(a)) is a basic pattern usually applied to discover new knowledge on the system under investigation. A model is built using (a) the expertise of the modeler for identification of structure and functional characteristics of the model ($\Gamma^{\Phi}_{S \to M}$ and $\Gamma^{\Sigma}_{S \to M}$) and (b) available input data, usually representing quantitative parameters of a system, considered as a static input of the model or a source for data assimilation (DA) via operators $\Gamma^{\Xi}_{S \to D}$ and $\Gamma^{\Xi}_{D \to M}$. Results of model application ($\Gamma'^{\Xi}_{M \to D}$, $\Gamma'^{\Sigma}_{M \to D}$, and $\Gamma'^{\Phi}_{M \to D}$) could be considered from a descriptive (mainly structural or quantitative characteristics) or predictive (often forecasting or other functional characteristics) point of view. The obtained results are analyzed in comparison to available information about the investigated system ($\Gamma'^{\Xi}_{D \to S}$, $\Gamma'^{\Sigma}_{D \to S}$, and $\Gamma'^{\Phi}_{D \to S}$), forming an optimization loop which can be considered within the scope of all three concepts. Two factors limit the application of this pattern to complex system modeling and simulation. First, working with complex structural and functional characteristics of the model requires a high level of expertise, which limits the extensibility and automation of model operation. Second, performing optimization in a loop with most algorithms requires multiple runs of a model. As a result, computationally intensive models have limitations in optimization-based operations (identification, calibration, etc.) due to performance reasons.

P2. Data-driven modeling (Figure 2(b)) provides an extension to the modeling operation by describing the relationship between data attributes. Application of data-driven models may be considered as a replacement of the actual "full" model by (a) providing information about the structure of the system and model with data mining (DM) and process mining (PM) techniques ($\Gamma^{\Xi \to \Sigma}_{D}$), (b) generating surrogate models for functional characteristics ($\Gamma^{\Xi \to \Phi}_{D}$), and (c) providing estimation of investigated parameters with machine learning (ML) algorithms and models ($\Gamma'^{\Xi \to \Xi}_{D}$). In contrast to the previous pattern, data-driven models usually operate quickly (although training the model could require significant time). Still, such models have lower quality than the original "full" models. Nevertheless, combining this pattern with others provides significant enhancement in functionality and performance; e.g., data-driven models can be used in the optimization loop (see the previous pattern).
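The use of a data-driven model inside an optimization loop can be sketched as follows. The "full" model, the quadratic interpolant used as a surrogate, and all names are hypothetical stand-ins chosen for illustration; real surrogates would typically be trained ML models.

```python
def expensive_model(x):
    """Stand-in for a costly simulation run (hypothetical)."""
    expensive_model.calls += 1
    return (x - 2.0) ** 2 + 1.0


expensive_model.calls = 0


def fit_quadratic_surrogate(xs, ys):
    """Return a quadratic interpolant through three (x, y) samples."""
    (x0, x1, x2), (y0, y1, y2) = xs, ys

    def surrogate(x):
        # Lagrange form of the interpolating polynomial.
        return (
            y0 * (x - x1) * (x - x2) / ((x0 - x1) * (x0 - x2))
            + y1 * (x - x0) * (x - x2) / ((x1 - x0) * (x1 - x2))
            + y2 * (x - x0) * (x - x1) / ((x2 - x0) * (x2 - x1))
        )

    return surrogate


# Train the surrogate on three runs of the full model.
train_x = (0.0, 1.0, 4.0)
train_y = tuple(expensive_model(x) for x in train_x)
surrogate = fit_quadratic_surrogate(train_x, train_y)

# Search over many candidates using only the cheap surrogate.
candidates = [i / 10 for i in range(41)]
best_x = min(candidates, key=surrogate)
best_y = expensive_model(best_x)  # a single confirming run of the full model

print(best_x, best_y, expensive_model.calls)
```

The search evaluates 41 candidates while the full model runs only four times. The trade-off noted in the text still applies: the result is only as good as the surrogate.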

P3. Ensemble-based modeling (Figure 2(c)) extends P1 for working with sets of objects (models, data sets, and states) reflecting uncertainty, variability, or alternative solutions (e.g., models). Previously [11], we identified 5 classes of ensembles (see E1-E5 in Figure 2(c)): decomposition ensemble, alternative models ensemble, data-driven ensemble, parameter diversity ensemble, and metaensemble. All these patterns can be applied within the context of the proposed framework. Still, an extension of the ensemble structure increases the structural complexity of the model and thus leads to the need for additional (automatic) control procedures. Moreover, the performance issues of P1 become even worse in ensemble modeling.

Figure 2: Complex modeling patterns: (a) regular modeling; (b) data-driven modeling; (c) ensemble-based modeling; (d) data-driven support of complex modeling; (e) EC in hybrid complex modeling; (f) evolutionary space discovery in hybrid complex modeling.

P4. One of the key ideas of the proposed approach is the implementation of data-driven analysis of model states, structure, and behavior. To implement it within the conceptual framework, we propose a pattern for data-driven complex modeling (Figure 2(d)). It includes identification and prediction of a model structure through DM and PM techniques ($\Gamma^{\Xi \to \Sigma}_{D \to M}$) and generation of surrogate models for injection into the complex model ($\Gamma^{\Xi \to \Phi}_{D \to M}$). In addition, it is possible to use data-driven techniques to predict the quality of the considered model and use it for model optimization ($\Gamma'^{\Xi}_{D \to S}$, $\Gamma'^{\Xi \to \Sigma}_{D \to S}$, and $\Gamma'^{\Xi \to \Phi}_{D \to S}$).

P5. A key pattern for EC implementation is presented in Figure 2(e). Here, EC is used to identify a model structure ($\Gamma^{\Xi \to \Sigma}_{D \to M}$) and surrogate submodels ($\Gamma^{\Xi \to \Phi}_{D \to M}$) with consideration of a population of models. As a result, the modeling result is also (as in P3) presented in multiple instances, which may be analyzed, filtered, and evolved within consecutive iterations over changing time (with processing of incoming observations of the system) or within a single timestamp (with fixed observation data).

P6. Finally, the last presented pattern (Figure 2(f)) is aimed at investigation of the system phase space using DD-models and/or EC to reflect the unobservable landscape for estimation of model positioning, assessing its quality in inferring (sub)optimal structural ($\Gamma'^{\Xi \to \Sigma}_{D \to S}$ and $\Gamma'^{\Sigma}_{M \to D}$) and functional ($\Gamma'^{\Xi \to \Phi}_{D \to S}$ and $\Gamma'^{\Phi}_{M \to D}$) characteristics of the actual system.

These patterns could be easily combined to obtain better results within a specific application. Of special interest from the point of view of EC are the patterns where a set of model (or submodel) instances is considered (P5, P6). It is possible to consider ensemble-based techniques (P3) in the fashion of EC, but within our approach, we prefer to consider an ensemble as a composite model with several submodels. In that case, ensemble management refers to the concept of complex model structure.

Figure 3: Artifacts and procedures within a typical composite solution.

Several important goals may be reached within the presented patterns:

(i) automation of complex model management with intelligent solutions, DD-models, and EC;

(ii) optimization of model structure and application under defined limitations in precision and performance;

(iii) enhanced ways of domain knowledge discovery for applications and general investigation of a system.

2.3. Composite Solution Development. The proposed structure of core concepts and patterns may be applied in various ways to form a solution which combines operators either implemented originally within the solution or implemented as external model calls. Figure 3 shows the essential elements (artifacts and procedures) in a typical composite solution within the proposed conceptual layers ($S$, $D$, and $M$). The $S$-layer includes the actual system's state, which can be assessed through the observation procedure and described by explicit domain knowledge. The $D$-layer includes datasets divided into observation data and simulation/modeling data, with procedures for data processing and data assimilation. Finally, the $M$-layer includes a set of available basic models $M_1, \ldots, M_N$, which may be identified and calibrated with available data, giving tuned models $M'_1, \ldots, M'_N$ as a result. Here, the essential elements are model composition (which may be performed either automatically or by the modeler) and application of the model.

The key benefit of the approach is the application of a combination of EC, data-driven, and intelligent procedures to manage the whole composite solution, including data processing, modeling, and simulation, to lower uncertainty in $\Sigma \times \Xi \times \Phi$. Within the shown structure, these procedures may be applied

(i) to rank and select alternative models;

(ii) to support model identification, calibration, composition, and application;

(iii) to manage artifacts on various conceptual layers in a systematic way;

(iv) to infer implicit knowledge from available data and explicitly presented domain knowledge.

The shown example gives a brief view of composite solution development, while the details may differ depending on a particular application. The key procedures within the proposed composite solution, namely, the implementation of intelligent procedures to support model identification and the systematic management of the composite model, are considered in Sections 2.4 and 2.5.

2.4. Evolutionary Model Identification. When implementing evolution of models within a complex modeling task, structure, functional, and quantitative parameters are usually considered as the genotype, whereas model output (data layer) is considered as the phenotype. Within the proposed approach, we can adapt the definition of basic EC operations within genotype-phenotype mapping [13]:

(i) epigenesis as model application, $f_1: S \times M \to D$;

(ii) selection, $f_2: S \times D \to D$;

(iii) genotype survival, $f_3: S \times D \to M$;

(iv) mutation, $f_4: M \to M$.

In addition, we consider quality assessment, usually treated as fitness for selection and survival (or, in more complex algorithms, for controlling other operations like mutation):

(i) data quality, $q_d: S \times D \to Q_d$;

(ii) model quality, $q_m: S \times M \to Q_m$.

Here, $Q_d$ and $Q_m$ are often considered as $\mathbb{R}^N$ with some quantitative quality metrics. Model quality is usually considered through data quality, i.e., $q_m \sim q_d(s, f_1(s, m))$, but within our approach, this separation is considered important because, in addition, we introduce supporting operations with data-driven procedures, as in complex modeling many of these functions (first of all, $f_1$, $q_m$, and $q_d$) are significantly difficult to apply directly (some of these issues are considered in relation to the patterns). Data-driven operations (first of all, $f_1$ and $q_m$) can be introduced as substitutes for the previously introduced basic operations (see also patterns P2, P4, and P6):
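The basic operations and the relation $q_m \sim q_d(s, f_1(s, m))$ can be sketched with a toy evolutionary loop. Everything concrete here is an assumption made for illustration: the system is a set of samples of y = 3x, a model is a single slope parameter, selection ($f_2$) and genotype survival ($f_3$) are merged into one ranking step, and the function names mirror the symbols in the text rather than any existing library.

```python
import random

# s: observations of the system (here, samples of y = 3x); m: a slope.
SYSTEM = [(x, 3.0 * x) for x in range(1, 6)]


def f1(s, m):
    """Epigenesis: apply model m and produce output data (phenotype)."""
    return [m * x for x, _ in s]


def q_d(s, d):
    """Data quality: negative mean squared error against observations."""
    return -sum((y_hat - y) ** 2 for y_hat, (_, y) in zip(d, s)) / len(s)


def q_m(s, m):
    """Model quality expressed through data quality: q_d(s, f1(s, m))."""
    return q_d(s, f1(s, m))


def f4(m, rng, scale=0.3):
    """Mutation: perturb the model parameter."""
    return m + rng.gauss(0.0, scale)


def evolve(s, generations=40, size=20, keep=5, seed=0):
    rng = random.Random(seed)
    population = [rng.uniform(-10.0, 10.0) for _ in range(size)]
    for _ in range(generations):
        # Selection (f2) and genotype survival (f3), merged for brevity:
        # rank models by the quality of their outputs and keep the best.
        survivors = sorted(population, key=lambda m: q_m(s, m), reverse=True)[:keep]
        offspring = [f4(rng.choice(survivors), rng) for _ in range(size - keep)]
        population = survivors + offspring
    return max(population, key=lambda m: q_m(s, m))


best = evolve(SYSTEM)
print(best, q_m(SYSTEM, best))
```

Because the survivors are carried over unchanged, the best quality in the population never decreases from one generation to the next.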

(i) epigenesis as DD-model application, $f^{d}_{1}: S \times M \to D$;

(ii) model generation, $g^{d}: S \times D \to M$;

(iii) model quality prediction, $q^{d}_{m}: S \times M \to Q_m$;

(iv) space discovery, $w^{d}: S \times D \to S$.

Operation $w^{d}$ could be used as an intelligent extension within the selection or survival operations ($f_2$ and $f_3$). It becomes especially important when knowledge of the system's structure or functional characteristics is lacking. Operation $g^{d}$, at the same time, could be used as a part of the mutation operation $f_4$ (or of initial population generation). Having this extension, we can implement enhanced versions of EC algorithms (e.g., genetic algorithms, evolution strategies, and evolutionary programming) with data-driven operations to overcome, or at least to lower, complex modeling issues.
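A data-driven substitute for model quality assessment ($q^{d}_{m}$) can be sketched as a predictor trained on models that have already been evaluated. The nearest-neighbour predictor, the archive values, and the function names below are hypothetical choices for illustration; any regression model could play the same role.

```python
def predict_quality(archive, m, k=2):
    """Predict the quality of model m from the k nearest evaluated models."""
    nearest = sorted(archive, key=lambda item: abs(item[0] - m))[:k]
    return sum(quality for _, quality in nearest) / len(nearest)


def prescreen(archive, candidates, n_keep):
    """Keep the candidates with the best predicted quality."""
    ranked = sorted(
        candidates, key=lambda m: predict_quality(archive, m), reverse=True
    )
    return ranked[:n_keep]


# Hypothetical archive of (model parameter, measured quality) pairs.
archive = [
    (0.0, -9.0), (1.0, -4.0), (2.0, -1.0),
    (3.0, 0.0), (4.0, -1.0), (5.0, -4.0),
]
candidates = [0.4, 2.6, 3.2, 4.9]
shortlist = prescreen(archive, candidates, n_keep=2)
print(shortlist)
```

Only the shortlisted candidates would then be passed to the full model, which reduces the number of expensive runs per generation.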

2.5. Model Management Approach and Algorithm. By model management we mean operations with models within problem-domain solution development and application. This includes identification, calibration, DA, optimization, prediction, and forecasting. To systematize model management in the presented patterns, we propose an approach for explicit consideration of the spaces $S$, $D$, and $M$ within hybrid modeling with EC and DD-modeling. To summarize complex modeling procedures within the approach, we developed a high-level algorithm that includes a series of steps to be implemented within the context of complex model management.

Step 1 (space discovery). This step identifies the description of the phase space (in most cases $S$) when knowledge is lacking or for automation purposes. For example, the step could be applied to discover the system state space or the model structure. The space description may include (a) distance metrics, (b) proximity structure (e.g., graph, clustering, hierarchy, and density), and (c) a positioning function. One possible way to perform this step is to apply DM and EC algorithms to the available data (see pattern P6).

Step 2 (identification of supplementary functions). Data-driven functions ($\Phi$) are applied to work in model evolution, with the space (landscape) representation taken into account as available information.

Step 3 (evolutionary processing of a set of models). This step is described by a combination of basic EC operations (population initialization, epigenesis, selection, mutation, and survival) with supplementary functions. The form of the combination depends on (a) the selected EC algorithm, (b) application requirements and restrictions, and (c) model-based issues (e.g., performance, quality of surrogate models, etc.).

Step 4 (assimilation of updated data and knowledge). This step is applied for automatic adaptation purposes and implements a DA algorithm. DA can be applied to (a) the set of models, (b) EC operations (e.g., affecting the selection function), (c) supplementary functions (as they are mainly data-driven), and (d) the phase space description (if the descriptive structure is identified from changed data and/or knowledge).

The steps can be repeated in various combinations depending on the application and the implemented pattern. Also, the steps are general and could be implemented in various ways. Several examples are provided in Section 3.
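One possible arrangement of the four steps is sketched below for a toy model with a single scalar parameter. The function name `manage_models`, the callbacks `observe` and `run_model`, the bound heuristic in Step 1, and the elitist selection scheme are all hypothetical choices made for this illustration; the approach itself does not prescribe them.

```python
import random
from typing import Callable, List


def manage_models(
    observe: Callable[[], List[float]],
    run_model: Callable[[float], List[float]],
    cycles: int = 3,
    generations: int = 5,
    pop_size: int = 10,
    seed: int = 0,
) -> float:
    """Toy loop over Steps 1-4; returns the best parameter found."""
    rng = random.Random(seed)
    population: List[float] = []
    for _ in range(cycles):
        # Step 4 (on repeated cycles): assimilate updated observations.
        data = observe()
        # Step 1: crude space description, here only parameter bounds.
        lo, hi = 0.0, max(abs(x) for x in data) + 1.0

        # Step 2: supplementary data-driven function (quality against data).
        def quality(m: float) -> float:
            out = run_model(m)
            return sum((a - b) ** 2 for a, b in zip(out, data)) / len(data)

        # Step 3: evolutionary processing of the set of models.
        if not population:
            population = [rng.uniform(lo, hi) for _ in range(pop_size)]
        for _ in range(generations):
            parents = sorted(population, key=quality)[: pop_size // 2]   # selection
            children = [
                min(hi, max(lo, p + rng.gauss(0.0, 0.1 * (hi - lo))))    # mutation
                for p in parents
            ]
            population = sorted(parents + children, key=quality)         # survival
    return population[0]


obs = lambda: [2.0 * t for t in range(5)]
model = lambda m: [m * t for t in range(5)]
best = manage_models(obs, model)
```

The population is kept between cycles, so newly assimilated data reshape the quality function while the evolved set of models is reused rather than rebuilt.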

2.6. Available Building Blocks of a Composite Solution. EC offers a flexible and robust way to identify complex model structures within a complex landscape, with possible adaptation to changing conditions and system states (including new states without prior observation). A significant additional benefit is the ability to manage alternative solutions simultaneously, with possible switching between them and various combinations of them depending on current needs. Still, within the task of model identification and management, EC (like many metaheuristics) has certain drawbacks that require additional steps to implement the approach under particular conditions:

(i) high computational cost due to multiple runs of a model;

(ii) low reproducibility and interpretability of the obtained results due to the randomized nature of the search procedure;

(iii) complicated tuning of hyperparameters for better EC convergence;

(iv) indistinct definition of genotype boundaries;

(v) complicated mapping of genotype to phenotype space.

To overcome these issues, the proposed approach involves two options. First, intelligent procedures may be used to tune EC hyperparameters (P5); to predict features of the genotype-phenotype mapping, boundaries, etc. (P4); and to discover interpretable states and filters (for system, data, and model) that control convergence and adaptation of the population (P2, P4, and P5) in an interpretable and reproducible way (through the defined control procedure). Second, the composite model may use various approaches, methods, and elements to obtain better quality and performance of the solution:

(i) surrogate models (P2, P4, P5), which may increase performance (for example, within preliminary and intermediate optimization steps);

(ii) ensemble models (P3), which may be considered as an interpretable and controllable population;

(iii) interpretation and formal inference using explicit domain-specific knowledge and results of data mining to feed the procedures of EC and infer parameters in both models and EC;

(iv) controllable space decomposition (P6) with predictive models for possible areas and directions of population migration in EC to explicitly lower uncertainty and obtain additional interpretability.

Finally, an essential feature of the proposed approach is a holistic analysis of a composite solution with possible coevolution of models (submodels within a composite model) and data processing procedures.

3. Application Examples

This section presents several practical examples where the proposed approach, patterns, or some of their elements were applied. The examples were intentionally selected from diverse problem domains to illustrate the generality of the approach. The considered problems are being developed in separate projects, which are at various stages. Problem 1 (ensemble metocean simulation) was investigated in a series of projects (see, e.g., [11, 14, 15]). Within this research, we are trying to extend model calibration and DA with EC techniques to develop more flexible and accurate multimodel ensembles. Problem 2 (clinical pathways (CPs) modelling) is important in several ongoing projects aimed at model-based decision support in healthcare (see, e.g., [16–18]). The proposed approach plays an important role by enabling deeper analysis of clinical pathways in various scenarios (interactive analysis of available CPs with identification of clusters of similar patients, DA in predictive modelling of ongoing cases, etc.). Finally, Problem 3 shows very early results of a recently started project in online social network analysis.

3.1. Problem 1: Evolution in Models for Metocean Simulation. Environmental simulation systems usually contain several numerical models serving different purposes (complementary simulation processes, improving the reliability of a system by producing alternative results, etc.). Each model can typically be described by a large number of quantitative parameters and functional characteristics that should be adjusted by an expert or by intelligent automated methods (e.g., EC). Alternative models inside the environmental simulation system can be joined into an ensemble according to a complex modeling pattern based on evolutionary computing (a combination of patterns P3 and P5). In the current case study, we introduce an example illustrating the ensemble concept in the forms of an alternative models ensemble, a parameter diversity ensemble, and a metaensemble. To identify the parameters of the proposed ensembles, the least squares method (in the case of model linearity) or optimization methods (in the case of nonlinearity) can be used. As we need to take into account not only the functional space $\Phi$ and the space of parameters $\Xi$ for a single model but also the optimal coexistence of models in the system (i.e., $\Sigma$), evolutionary and coevolutionary approaches seem to be applicable techniques for this task. It is worth mentioning that the coevolutionary approach can be applied to independent model realizations with an ensemble as the connecting element. In this case, the parameters (weights) of the ensemble can be estimated separately from the coevolution procedure, either in a constant form or dynamically. As a case study of complex environmental modeling, we designed an ensemble model that consists of the SWAN model (http://swanmodel.sourceforge.net) for ocean wave simulation driven by two different surface forcings, NCEP (https://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.html) and ERA Interim (https://www.ecmwf.int/en/forecasts/datasets/archive-datasets/reanalysis-datasets/era-interim). Thus, different implementations of the SWAN model were connected in the form of an alternative models ensemble with least-squares-calculated coefficients defining the structure of the complex model. Two parameters, wind drag and whitecapping rate (WCR), were calibrated using evolutionary and coevolutionary algorithms implementing $\Gamma^{\Xi \to \Phi}_{D \to M}$ in P5 (for a detailed sensitivity analysis of SWAN, see [19]). The coevolutionary case can be represented in the form of a parameter diversity ensemble, where each population is constructed as an ensemble of alternative model results with different parameters. We can also add ensemble weights to the model parameter diversity and obtain a metaensemble that can be identified within the coevolutionary approach.
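The least-squares estimation of ensemble coefficients for two alternative model implementations can be sketched as follows. This is an illustration under stated assumptions, not the code used in the study: the function name `ensemble_weights` is hypothetical, the ensemble is taken to be a linear combination without an intercept, and the input series are made-up numbers standing in for the outputs of the two SWAN implementations and for the observations.

```python
from typing import List, Tuple


def ensemble_weights(a: List[float], b: List[float], obs: List[float]) -> Tuple[float, float]:
    """Least-squares weights (w_a, w_b) minimising sum((w_a*a + w_b*b - obs)**2)."""
    saa = sum(x * x for x in a)
    sbb = sum(y * y for y in b)
    sab = sum(x * y for x, y in zip(a, b))
    sao = sum(x * o for x, o in zip(a, obs))
    sbo = sum(y * o for y, o in zip(b, obs))
    # Solve the 2x2 normal equations by Cramer's rule.
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("model outputs are collinear; weights are not unique")
    w_a = (sao * sbb - sbo * sab) / det
    w_b = (sbo * saa - sao * sab) / det
    return w_a, w_b


# Made-up series: outputs of two model implementations and matching observations.
model_a = [1.0, 2.0, 3.0, 4.0]
model_b = [1.0, 0.0, 1.0, 0.0]
observed = [0.7 * x + 0.3 * y for x, y in zip(model_a, model_b)]
w_a, w_b = ensemble_weights(model_a, model_b, observed)
```

Because the weights are obtained in closed form, they can be recomputed cheaply for every candidate parameter set, which is what allows them to be estimated separately from the coevolution procedure.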

In the process of model identification and verification, measurements from several wave stations in the Kara Sea were used. The fitness function represents the mean error (RMSE) over all wave stations. For verification of the results, the MAE (mean absolute error) and DTW (dynamic time warping) metrics were used.
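The three metrics mentioned above can be written compactly as follows. The sketch assumes that the fitness is the plain average of per-station RMSE values and that DTW uses the absolute difference as the local cost with the classic unconstrained recurrence; the station names and series are hypothetical.

```python
from typing import Dict, List


def rmse(sim: List[float], obs: List[float]) -> float:
    """Root mean square error between simulated and observed series."""
    return (sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs)) ** 0.5


def mae(sim: List[float], obs: List[float]) -> float:
    """Mean absolute error between simulated and observed series."""
    return sum(abs(s - o) for s, o in zip(sim, obs)) / len(obs)


def dtw(a: List[float], b: List[float]) -> float:
    """Dynamic time warping distance with absolute difference as local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def fitness(sim: Dict[str, List[float]], obs: Dict[str, List[float]]) -> float:
    """Fitness as RMSE averaged over stations (assumed aggregation)."""
    return sum(rmse(sim[name], obs[name]) for name in obs) / len(obs)


simulated = {"station_A": [1.0, 2.0], "station_B": [0.0, 0.0]}
measured = {"station_A": [1.0, 4.0], "station_B": [3.0, 4.0]}
score = fitness(simulated, measured)
```

Unlike RMSE and MAE, DTW tolerates shifts along the time axis, which is why it is useful as an additional verification metric for wave series.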

Figure 4(a) shows the surface (landscape) of RMSE in the space of the announced parameters (drag and WCR) for the implementations SWAN+ERA and SWAN+NCEP. It can be seen that the evolutionary-obtained results converge to the minimum of the error landscape. The landscape was obtained by running the model with all parameter variants from the full 30x30 grid (i.e., 900 runs), while the evolutionary algorithm converged in 5 generations with 10 individuals (parameter sets) in the population (50 runs), which allows identification to be performed two orders faster. The convergence of coevolution for the SWAN+ERA+NCEP case is presented in Figure 5.

Figure 4: Metocean simulation: (a) error landscape for wave height simulation results using ERA and NCEP reanalysis as input data (axes: drag, log(WCR), RMSE (m)) and (b) Pareto frontier of coevolution results for all generations (axes: drag, WCR, log(RMSE); series: coNCEP, coERA, coEnsemble).

Figure 5: Coevolution convergence of diversity parameters ensemble for metocean models (RMSE (m) versus generation).
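The difference in the number of model runs between the two strategies can be reproduced on a toy problem. In the sketch below, `toy_rmse` is a hypothetical smooth error landscape that replaces the actual SWAN runs, the parameter ranges are assumptions read off the figure axes, and the evolutionary scheme (truncation selection with Gaussian mutation) is only one of many possible variants.

```python
import random
from typing import Tuple

DRAG = (1.0, 3.0)        # assumed parameter range, for illustration only
LOG_WCR = (-20.0, 0.0)   # assumed parameter range, for illustration only


def toy_rmse(drag: float, log_wcr: float) -> float:
    """Hypothetical error landscape with a single minimum of 0.5 at (1.5, -10)."""
    return 0.5 + (drag - 1.5) ** 2 + 0.01 * (log_wcr + 10.0) ** 2


def grid_search(n: int = 30) -> Tuple[float, int]:
    """Evaluate the full n x n grid; returns (best error, number of runs)."""
    runs, best = 0, float("inf")
    for i in range(n):
        for j in range(n):
            drag = DRAG[0] + (DRAG[1] - DRAG[0]) * i / (n - 1)
            log_wcr = LOG_WCR[0] + (LOG_WCR[1] - LOG_WCR[0]) * j / (n - 1)
            best = min(best, toy_rmse(drag, log_wcr))
            runs += 1
    return best, runs


def clamp(x: float, bounds: Tuple[float, float]) -> float:
    return min(bounds[1], max(bounds[0], x))


def evolve(generations: int = 5, pop_size: int = 10, seed: int = 1) -> Tuple[float, int]:
    """Simple evolutionary search; returns (best error, number of runs)."""
    rng = random.Random(seed)
    pop = [(rng.uniform(*DRAG), rng.uniform(*LOG_WCR)) for _ in range(pop_size)]
    runs, best = 0, float("inf")
    for _ in range(generations):
        scored = sorted((toy_rmse(d, w), (d, w)) for d, w in pop)
        runs += len(pop)
        best = min(best, scored[0][0])
        parents = [ind for _, ind in scored[: pop_size // 2]]
        pop = [
            (clamp(d + rng.gauss(0.0, 0.1), DRAG), clamp(w + rng.gauss(0.0, 1.0), LOG_WCR))
            for d, w in (rng.choice(parents) for _ in range(pop_size))
        ]
    return best, runs


grid_best, grid_runs = grid_search()
ec_best, ec_runs = evolve()
```

With the default settings the grid requires 900 evaluations and the evolutionary search 50, matching the run counts quoted in the text; the accuracy reached on the toy landscape says nothing about the real model.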

Although the error landscapes for the pair of implementations SWAN+ERA and SWAN+NCEP are close to each other, separate evolution does not consider optimization of the ensemble result. For this purpose, we apply the coevolutionary approach, which produces a set of Pareto-optimal solutions for each generation. Figure 4(b) shows that the error of each model in the ensemble is significant (coNCEP and coERA for the models alone), but the error of the whole ensemble (coEnsemble) converges to the minimum very fast.
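Selecting the Pareto-optimal solutions of a generation amounts to discarding every candidate that is dominated by another one. A minimal sketch is given below, assuming that all objectives are errors to be minimised; the function name and the example points are illustrative.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """True if p is no worse than q in every objective and better in at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Illustrative (error of model 1, error of model 2) pairs for one generation.
candidates = [(1.0, 5.0), (2.0, 2.0), (3.0, 3.0), (5.0, 1.0)]
front = pareto_front(candidates)
```

In this example the candidate (3.0, 3.0) is removed because (2.0, 2.0) is better in both objectives, while the other three remain as mutually incomparable trade-offs.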

The obtained result can be analyzed from the uncertainty reduction point of view. Optimization of model parameters helps to reduce parameter uncertainty, which can be estimated through the error function. When we apply an ensemble approach to evolutionary-optimized results, we can also speak of reducing the uncertainty connected with the input data sources (NCEP and ERA). Moreover, the metaensemble approach allowed reduction of the uncertainty connected with the ensemble parameters.

Figure 6: Graph-based representation of processes space in healthcare (interactive view). (Demonstration available at https://www.youtube.com/watch?v=EH74f1w6EeY.)

Summarizing the results of the metocean case study, we can note that the EC approach is significantly more efficient (up to 120 times) than grid search, without accuracy losses. According to this experimental study, the quality of the ensemble with evolutionary-optimized models is similar to the results of the grid search: the MAE metric is equal to 0.24 m and the DTW metric to 51. We can also mention that the coevolutionary approach provides a 10% accuracy gain compared with the results of single evolution of the model implementations, but this is still similar to the ensemble result with evolutionary-optimized models. Nevertheless, the coevolutionary approach allowed a 200-fold acceleration to be achieved. Within the context of the proposed approach, the space $\Phi$ was investigated using a defined structure of the model in the space $\Sigma$ for the purpose of model calibration.

3.2. Problem 2: Modeling Health Care Process. Modeling healthcare processes usually involves enormous uncertainty and variability, even when modeling a single disease. One of the ways to identify a model of such a process is PM [20]. Still, direct implementation of PM methods does not remove a major part of the uncertainty. Within the current research, we applied the proposed approach for identification purposes, both in the analysis of historical cases and in the prediction of the development of a single process. Here, we consider processes of providing health care in acute coronary syndrome (ACS) cases, ACS being usually considered one of the major causes of death in the world. We used a set of 3434 ACS cases collected during 2010–2015 in the Almazov National Medical Research Centre, one of the leading cardiological centers in Russia. The data set contains electronic health records of these patients with all registered events and characteristics of a patient.

To simplify consideration of the multidimensional space of possible processes (${\Gamma'}^{\Xi \to \Sigma}_{D \to S}$, $\Gamma^{\Xi}_{S \to D}$ for analysis of $\Sigma$ on layer $S$), we introduced a graph-based representation of this space, with vertices representing cases and edges representing proximity of cases. Analysis of such a structure enables easy discovery of common cases (e.g., as communities in the graph). Such discovery enables explicit, interpretable structuring of the space and representation of the further landscape for EC in terms of pattern P6. Moreover, direct interactive investigation of a visual representation of such a structure (see Figure 6) provides significant insights for medical researchers.
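The graph-based representation can be illustrated with a small sketch in which cases are strings of event labels, proximity is measured by edit distance, and groups of common cases are found as connected components. These are simplifying assumptions: the actual proximity measure and community detection method are not specified here, connected components are only a crude stand-in for communities, and the case strings and threshold are made up.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two event sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def proximity_edges(cases: List[str], max_dist: int) -> List[Tuple[int, int]]:
    """Connect two cases (vertices) when their sequences are close enough."""
    return [
        (i, j)
        for i in range(len(cases))
        for j in range(i + 1, len(cases))
        if edit_distance(cases[i], cases[j]) <= max_dist
    ]


def components(n: int, edges: List[Tuple[int, int]]) -> List[List[int]]:
    """Connected components of the proximity graph (stand-in for communities)."""
    adj = {i: set() for i in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    seen, result = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            v = stack.pop()
            comp.append(v)
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    stack.append(u)
        result.append(sorted(comp))
    return result


cases = ["AFIE", "AFNE", "AFIN", "DDDD", "DDDE"]  # made-up event sequences
edges = proximity_edges(cases, max_dist=1)
groups = components(len(cases), edges)
```

The resulting groups give an explicit partition of the case space, which is the kind of structure that can later serve as a landscape for evolutionary search.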

We have developed an evolutionary-based algorithm for pattern identification and clustering in such a representation, with two criteria to be optimized (see Figure 7). Here, processes were represented by sequences of labels (symbols) denoting key events in the PM model. Typical patterns were then selected from the Pareto frontier. The convergence process is demonstrated in Figure 8 (the 10 best individuals from the Pareto frontier according to the integral criterion were selected). As a result, this solution may refer to pattern P5 and the operator $\Gamma^{\Xi \to \Sigma}_{D \to M}$ while discovering model structure. Figure 9 shows an example of a typical process model (i.e., a structural characteristic of the model) for one of the identified clusters. A detailed description of the approach, algorithms, and results on CP discovery, clustering, and analysis, including a performance comparison of three versions of CP discovery algorithms, can be found in [10]. An important outcome of the approach in this application is the interpretability of the clusters and identified patterns. For example, 10 clusters and the corresponding CPs were interpreted by cardiologists from the Almazov National Medical Research Centre. The obtained interpretation and further discovery and application of the CP structure are presented in [17]. Another important benefit of such space structure discovery is lowering the uncertainty of a patient’s treatment trajectory through hierarchical positioning of an evolving process (selection of a cluster and selection of a position within the cluster). For example, the discrete-event simulation model described in [17] provides a more appropriate length-of-stay distribution when simulating with the discovered classes of CPs (the Kolmogorov-Smirnov statistic decreased by 51% (from 0.255 to 0.124)).

Figure 7: Pareto frontier for CP patterns discovery (axes: number of non-arranged sequences (normed) and length of pattern (normed)).

Figure 8: Evolutionary convergence during CP pattern discovery (normed sum error versus generation).

Figure 9: Example of process model showing transfers between hospital’s departments (Cluster 7).

Figure 10: Evolution of synthetic CPs: (a) CP population convergence (distance in symbols versus step, clusters 0 to 6); (b) evolution of possible CP, showing historical CPs, synthetic CPs, and the target CP (demonstration available at https://www.youtube.com/watch?v=twvfX9zKsY8); (c) number of synthetic CPs versus step; (d) % of CPs in correct cluster versus step.
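The relative decrease of the Kolmogorov-Smirnov statistic quoted for the length-of-stay distribution follows directly from the two reported values:

```latex
\[
\frac{0.255 - 0.124}{0.255} = \frac{0.131}{0.255} \approx 0.514 ,
\]
```

that is, a reduction of about 51%.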

Further, we propose an algorithm to dynamically generate possible developments of a process in healthcare using the identified graph-based space representation, with evolutionary strategies assimilating incoming data (events) within a case (${\Gamma'}^{\Sigma}_{M \to D}$ in P5 and $\Gamma^{\Xi \to \Sigma}_{D}$ in P2). We consider convergence (Figure 10(a)) of the introduced synthetic continuations of the processes to the right class (identified clusters of typical cases were used), with mapping to the graph-based space representation with proximity measures (Figure 10(b)). As a result, the appearance of new CP events decreases the number of synthetic CPs and increases the percentage of CPs positioned in the correct cluster (see an example in Figures 10(c) and 10(d), respectively). This enables interpretable positioning and lowers uncertainty in predicting the further development of a CP for a particular patient.
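The effect of assimilating incoming events on a population of synthetic CPs can be shown with a deliberately simplified sketch. It replaces the evolutionary strategy with a plain consistency filter: a synthetic CP survives only if it starts with the events observed so far. The function names, the CP strings, and the cluster labels are hypothetical.

```python
from typing import Dict, List


def assimilate(population: List[str], observed_prefix: str) -> List[str]:
    """Keep only synthetic CPs consistent with the events observed so far."""
    return [cp for cp in population if cp.startswith(observed_prefix)]


def percent_in_cluster(population: List[str], cluster_of: Dict[str, int], target: int) -> float:
    """Percentage of the remaining synthetic CPs positioned in the target cluster."""
    hits = sum(1 for cp in population if cluster_of[cp] == target)
    return 100.0 * hits / len(population)


# Made-up synthetic CPs with made-up cluster labels; the true CP is "AFNE" (cluster 1).
cluster_of = {"AFIE": 0, "AFNE": 1, "ANIE": 0, "DFIE": 2}
population = list(cluster_of)
true_cp = "AFNE"

sizes, shares = [], []
for step in range(4):
    alive = assimilate(population, true_cp[:step])
    sizes.append(len(alive))
    shares.append(percent_in_cluster(alive, cluster_of, target=1))
```

As more events of the case arrive, the number of remaining synthetic CPs shrinks and the share positioned in the correct cluster grows, which mirrors the qualitative behavior described for Figures 10(c) and 10(d).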

Here, a combination of patterns P2, P5, and P6 in the implementation of the proposed algorithm (see Section 2.5) enables interactive investigation of the process space and data assimilation into a population of possible continuations of a single process as it evolves. This solution can be applied in exploratory modeling and simulation of patient flow processing as well as in decision support in specialized medical centers.

3.3. Problem 3: Mining Social Media. Nowadays, social media analysis (which began with static network models emphasizing the topology of connections between users) strives to explore dynamic behavioral patterns of individuals, which can be recovered from their digital traces on the web. The prediction of social media activities requires combining analytical and data-driven models as well as identifying the optimal structure and parameters of these models according to the available data. Here, we show an example of a problem in this field involving evolutionary identification of a model.

Figure 11: Distribution of posts, reposts, and comments on personal walls of subscribers of bank community (fraction of users (log) versus activity count (log); series: repost, post, comment).

Figure 12: Example of process model (a) with expanded cycles (Cluster 2) and (b) with collapsed cycles (Cluster 1).

A digital trace of a user in an online social network (OSN) is a sequence (chain) of observed activities separated by time gaps. Each OSN supports different types of “hidden” and observable activities. For example, in the largest Russian social network, vk.com (further denoted as VK), a user has a personal page (wall) with three types of activities: post (P), when a user makes a record by himself or herself; repost (R), when the user copies a record of another user or community to his or her wall; and comment (C), when the user comments on a record on his or her wall. Figure 11 illustrates the distribution of these activities for subscribers of a large Russian bank community in VK. The collected dataset consists of the last 100 (or fewer if unavailable) entries (posts or reposts) and the comments on these entries for 8K user walls in the period January 2017–December 2017. Comments are much less common than posts and reposts. The distributions of posts and reposts are similar, but there is a group of “spreaders” with a significant number of reposts.

We applied the technique described in Section 3.2 to analyze the processes. Still, the considered process has a significantly different structure. By default, it is continuous with random repetition of events, while the healthcare process in ACS cases is finite and has a more “strong” structure. Figure 12 shows a typical process structure identified with the EC-based approach and visualized with expanded cycles (a) and with collapsed cycles (b). The second one could be considered more relevant than the first one, which is significantly affected by the length of the selected history. It is natural to consider it as a random process or a state-transition model. In that case, the three identified clusters (characterized by various frequencies of transitions) could be interpreted as typical behavior models.

Figure 13: Example of process model for cluster 3 using n-gram analysis (Cluster 3, 1120 users).

Table 1: Mean of activities’ combinations for users’ clusters.

Cluster  Size  CCC   CCP   CCR   CPP   CPR   CRR   PPP    PPR    PRR    RRR
1        5238  0.52  0.74  0.24  1     0.72  0.42  6.68   6.4    7.56   8.53
2        2110  0.11  0.13  0.16  0.16  0.36  0.59  2.09   3.95   12.32  60.3
3        1120  0.91  1.38  0.14  3.62  0.75  0.18  50.02  11.78  5      2.77
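Treating a user's activity chain as a state-transition model means estimating, for each activity, how often it is followed by each other activity. A minimal sketch is given below; the function name and the two example chains are hypothetical, and the activities are encoded as P, R, and C as in the text.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def transition_probabilities(chains: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Estimate P(next activity | current activity) from observed chains."""
    counts = defaultdict(Counter)
    for chain in chains:
        for cur, nxt in zip(chain, chain[1:]):
            counts[cur][nxt] += 1
    return {
        cur: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for cur, row in counts.items()
    }


# Two made-up activity chains: P = post, R = repost, C = comment.
probs = transition_probabilities(["PPRP", "RRC"])
```

Clusters of users can then be compared by their transition frequencies, which corresponds to the interpretation of clusters as typical behavior models.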

N-gram analysis is often used to detect patterns in people’s behavior [21, 22]. It is based on counting the frequencies of combinations or sequences of activities. We collected all sorted 3-grams (so-called 3-sets) for each user’s sequence to analyze the frequency of event combinations. As a result, three clusters of vectors with 3-set chains were identified with the k-means clustering method. Figure 13 shows all combinations and the transitions between them for cluster 3 as an example. Using Figure 12 and Table 1, it is possible to see that cluster 3 includes users who often make new records (P) and sometimes comment on records (C). So, cluster 3 mostly consists of “bloggers.” Cluster 2 includes “spreaders,” who frequently copy other records (R). The biggest cluster, cluster 1, consists of people who make new records and copy other ones equally, but less intensively than users in the other clusters. That may be considered typical behavior for an OSN user. N-gram analysis allows detecting typical behavioral patterns and obtaining process models for social media activities using chains of different lengths as input data. Thus, this type of data-driven modeling is more appropriate for studying continuous processes. Figure 14 shows a graph-based representation of the process space with all users’ patterns.
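Counting sorted 3-grams (3-sets) for a single user can be sketched as follows. The sketch assumes a sliding window of three consecutive activities whose internal order is ignored, and it orders the resulting feature vector as in the columns of Table 1; the function names and the example chain are illustrative, and the subsequent k-means clustering is not shown.

```python
from collections import Counter
from itertools import combinations_with_replacement
from typing import List

# The ten possible 3-sets over C, P, R: CCC, CCP, CCR, CPP, CPR, CRR, PPP, PPR, PRR, RRR.
THREE_SETS = ["".join(c) for c in combinations_with_replacement("CPR", 3)]


def three_sets(chain: str) -> Counter:
    """Count sorted 3-grams: each window of 3 activities with order inside ignored."""
    return Counter("".join(sorted(chain[i:i + 3])) for i in range(len(chain) - 2))


def feature_vector(chain: str) -> List[int]:
    """Frequencies of the ten 3-sets, usable as input for clustering."""
    counts = three_sets(chain)
    return [counts[name] for name in THREE_SETS]


vector = feature_vector("PRPPC")  # made-up activity chain
```

For the chain "PRPPC" the windows PRP and RPP both map to the 3-set PPR and the window PPC maps to CPP, so only these two components of the vector are nonzero.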

This subsection provides very early results. The next step in applying the proposed approach to this problem includes an extension of the process model structure with (a) temporal labeling (gaps between events), (b) consideration of the process within a sliding time window to get more structured processes, (c) linking of the model with causal inference, and (d) introduction of DM techniques for EC positioning of ongoing processes in the model space. We believe that these extensions could enhance discovery of the model structure (P4) and provide deeper insight into social media activity.

4. Conclusion and Future Work

Figure 14: Graph-based representation of processes’ space for three clusters in social media activity.

The development of the proposed approach is still an ongoing project. We aim at further systematization and detailing of the proposed concepts, methods, and algorithms, as well as more comprehensive and deeper implementation of EC-based applications. Further work includes the following directions:

(i) dualization on the role of data-driven and intelligent operations in the proposed approach and described patterns;

(ii) extended analysis of various EC techniques applicable within the approach;

(iii) investigation of EC-based discovery for models of complex systems with lacking or inconsistent observations;

(iv) detailed formalization of expertise and knowledge-based methods within the approach;

(v) extension of the approach with interactive user-centered modelling and phase space analysis;

(vi) development of a multilayered approach for decision support and control of the system and process $S$, available data $D$, and complex model $M$.

Data Availability

The data used in the examples presented in Section 3 were initially obtained within complementary projects performed by the authors of the paper. The data are available from the corresponding author upon request, after an explicit statement of the purpose and plan of the requested data usage, to check for possible violation of the corresponding projects’ rules.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This paper presents an extension and further development of the work [23]. This research is financially supported by the Russian Scientific Foundation, Agreement 14-11-00823 (15.07.2014).

References

[1] N. Boccara, Modeling Complex Systems, Graduate Texts in Physics, Springer, New York, 2nd edition, 2010.

[2] H. McManus and D. Hastings, “A framework for understanding uncertainty and its mitigation and exploitation in complex systems,” IEEE Engineering Management Review, vol. 34, no. 3, pp. 81–94, 2006.

[3] W. Walker, P. Harremoes, J. Rotmans et al., “Defining uncertainty: a conceptual basis for uncertainty management in model-based decision support,” Integrated Assessment, vol. 4, no. 1, pp. 5–17, 2003.

[4] J. Yan and J. R. Deller, “NARMAX model identification using a set-theoretic evolutionary approach,” Signal Processing, vol. 123, pp. 30–41, 2016.

[5] I. G. Kevrekidis, C. W. Gear, and G. Hummer, “Equation-free: The computer-aided analysis of complex multiscale systems,” AIChE Journal, vol. 50, no. 7, pp. 1346–1355, 2004.

[6] H. Ihshaish, A. Cortes, and M. A. Senar, “Parallel Multi-level Genetic Ensemble for Numerical Weather Prediction Enhancement,” Procedia Computer Science, vol. 9, pp. 276–285, 2012.

[7] G. Dumedah, “Formulation of the Evolutionary-Based Data Assimilation and its Implementation in Hydrological Forecasting,” Water Resources Management, vol. 26, no. 13, pp. 3853–3870, 2012.

[8] V. V. Kashirin, A. A. Lantseva, S. V. Ivanov, S. V. Kovalchuk, and A. Boukhanovsky, “Evolutionary simulation of complex networks’ structures with specific functional properties,” Journal of Applied Logic, vol. 24, no. part A, pp. 39–49, 2017.

[9] S. V. Kovalchuk, P. A. Smirnov, K. V. Knyazkov, A. S. Zagarskikh, and A. V. Boukhanovsky, “Knowledge-Based Expressive Technologies Within Cloud Computing Environments,” in Practical Applications of Intelligent Systems, vol. 279 of Advances in Intelligent Systems and Computing, pp. 1–11, Springer, Berlin, Germany, 2014.

[10] A. A. Funkner, A. N. Yakovlev, and S. V. Kovalchuk, “Towards evolutionary discovery of typical clinical pathways in electronic health records,” Procedia Computer Science, vol. 119, pp. 234–244, 2017.

[11] S. V. Kovalchuk and A. V. Boukhanovsky, “Towards Ensemble Simulation of Complex Systems,” Procedia Computer Science, vol. 51, pp. 532–541, 2015.

[12] S. V. Kovalchuk, A. V. Krikunov, K. V. Knyazkov, and A. V. Boukhanovsky, “Classification issues within ensemble-based simulation: application to surge floods forecasting,” Stochastic Environmental Research and Risk Assessment, vol. 31, no. 5, pp. 1183–1197, 2017.

[13] D. Fogel, “Phenotypes, genotypes, and operators in evolutionary computation,” in Proceedings of the 1995 IEEE International Conference on Evolutionary Computation, p. 193, Perth, WA, Australia.

[14] S. V. Ivanov, S. V. Kovalchuk, and A. V. Boukhanovsky, “Workflow-based Collaborative Decision Support for Flood Management Systems,” Procedia Computer Science, vol. 18, pp. 2213–2222, 2013.

[15] A. Gusarov, A. Kalyuzhnaya, and A. Boukhanovsky, “Spatially adaptive ensemble optimal interpolation of in-situ observations into numerical vector field models,” in Proceedings of the 6th International Young Scientist Conference on Computational Science, YSC 2017, pp. 325–333, Finland, November 2017.

[16] A. V. Krikunov, E. V. Bolgova, E. Krotov, T. M. Abuhay, A. N. Yakovlev, and S. V. Kovalchuk, “Complex data-driven predictive modeling in personalized clinical decision support for Acute Coronary Syndrome episodes,” in Proceedings of the International Conference on Computational Science, ICCS 2016, pp. 518–529, USA, June 2016.

[17] S. V. Kovalchuk, A. A. Funkner, O. G. Metsker, and A. N. Yakovlev, “Simulation of patient flow in multiple healthcare units using process and data mining techniques for model identification,” Journal of Biomedical Informatics, vol. 82, pp. 128–142, 2018.

[18] A. Yakovlev, O. Metsker, S. Kovalchuk, and E. Bologova, “Prediction of in-hospital mortality and length of stay in acute coronary syndrome patients using machine-learning methods,” Journal of the American College of Cardiology, vol. 71, no. 11, p. A242, 2018.

[19] A. Nikishova, A. Kalyuzhnaya, A. Boukhanovsky, and A. Hoekstra, “Uncertainty quantification and sensitivity analysis applied to the wind wave model SWAN,” Environmental Modelling & Software, vol. 95, pp. 344–357, 2017.

[20] E. Rojas, J. Munoz-Gama, M. Sepulveda, and D. Capurro, “Process mining in healthcare: A literature review,” Journal of Biomedical Informatics, vol. 61, pp. 224–236, 2016.

[21] T. Sinha, P. Jermann, N. Li, and P. Dillenbourg, “Your click decides your fate: Inferring Information Processing and Attrition Behavior from MOOC Video Clickstream Interactions,” in Proceedings of the EMNLP 2014 Workshop on Analysis of Large Scale Social Interaction in MOOCs, pp. 3–14, Doha, Qatar, October 2014.

[22] C. Marceau, “Characterizing the Behavior of a Program Using Multiple-Length N-Grams,” Defense Technical Information Center, 2005.

[23] S. V. Kovalchuk, O. G. Metsker, A. A. Funkner et al., “Towards management of complex modeling through a hybrid evolutionary identification,” in Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 255-256, Kyoto, Japan, July 2018.


Figure 2: Complex modeling patterns: (a) regular modeling; (b) data-driven modeling; (c) ensemble-based modeling; (d) data-driven support of complex modeling; (e) EC in hybrid complex modeling; (f) evolutionary space discovery in hybrid complex modeling.

ensemble, alternative models ensemble, data-driven ensemble, parameter diversity ensemble, and metaensemble. All these patterns can be applied within the context of the proposed framework. Still, an extension of the ensemble structure increases the structural complexity of the model and thus leads to the need for additional (automatic) control procedures. Moreover, the performance issues of P1 become even worse in ensemble modeling.

P4. One of the key ideas of the proposed approach is the implementation of data-driven analysis of model states, structure, and behavior. To implement it within the conceptual framework, we propose a pattern for data-driven complex modeling (Figure 2(d)). It includes identification and prediction of a model structure through DM and PM techniques ($\Gamma^{\Xi \to \Sigma}_{D \to M}$) and generation of surrogate models for injection into the complex model ($\Gamma^{\Xi \to \Phi}_{D \to M}$). In addition, it is possible to use data-driven techniques to predict the quality of the considered model and use it for model optimization ($\Gamma'^{\Xi}_{D \to S}$, $\Gamma'^{\Xi \to \Sigma}_{D \to S}$, and $\Gamma'^{\Xi \to \Phi}_{D \to S}$).

P5. A key pattern for EC implementation is presented in Figure 2(e). Here, EC is used to identify a model structure ($\Gamma^{\Xi \to \Sigma}_{D \to M}$) and surrogate submodels ($\Gamma^{\Xi \to \Phi}_{D \to M}$) with consideration of a population of models. As a result, the modeling result is also (as in P3) presented in multiple instances, which may be analyzed, filtered, and evolved within subsequent iterations over changing time (with processing of incoming observations of the system) or within a single timestamp (with fixed observation data).

P6. Finally, the last presented pattern (Figure 2(f)) is aimed at investigating the system phase space using DD-models and/or EC to reflect the unobservable landscape for estimation of model positioning, assessing its quality in inferring (sub-)optimal structural ($\Gamma'^{\Xi \to \Sigma}_{D \to S}$ and $\Gamma'^{\Sigma}_{M \to D}$) and functional ($\Gamma'^{\Xi \to \Phi}_{D \to S}$ and $\Gamma'^{\Phi}_{M \to D}$) characteristics of the actual system.

These patterns could be easily combined to obtain better results within a specific application. From the point of view of EC, the patterns where a set of model (or submodel) instances is considered (P5, P6) are of special interest. It is possible to consider ensemble-based techniques (P3) in the fashion of EC, but within our approach we prefer to consider an ensemble as a composite model with several submodels. In that case, ensemble management refers to the concept of complex model structure.

Figure 3: Artifacts and procedures within a typical composite solution.

Several important goals may be reached within the presented patterns:

(i) automation of complex model management with intelligent solutions, DD-models, and EC;

(ii) optimization of model structure and application under defined limitations in precision and performance;

(iii) enhanced ways of domain knowledge discovery for applications and general investigation of a system.

2.3. Composite Solution Development. The proposed structure of core concepts and patterns may be applied in various ways to form a solution that combines operators implemented natively within the solution or as external model calls. Figure 3 shows the essential elements (artifacts and procedures) in a typical composite solution within the proposed conceptual layers ($S$, $D$, and $M$). The $S$-layer includes the actual system's state, which can be assessed through the observation procedure and described by explicit domain knowledge. The $D$-layer includes datasets divided into observation data and simulation/modeling data, with procedures for data processing and data assimilation. Finally, the $M$-layer includes a set of available basic models $M_1, \ldots, M_N$, which may be identified and calibrated with available data, yielding tuned models $M'_1, \ldots, M'_N$. Here, the essential elements are model composition (which may be performed either automatically or by the modeler) and application of the model.

The key benefit of the approach is the application of a combination of EC, data-driven, and intelligent procedures to manage the whole composite solution, including data processing, modeling, and simulation, in order to lower uncertainty in $\Sigma \times \Xi \times \Phi$. Within the shown structure, these procedures may be applied

(i) to rank and select alternative models;

(ii) to support model identification, calibration, composition, and application;

(iii) to manage artifacts on various conceptual layers in a systematic way;

(iv) to infer implicit knowledge from available data and explicitly presented domain knowledge.

The shown example gives a brief view of composite solution development, while the details may differ depending on a particular application. The key procedures within the proposed composite solution, namely, the implementation of intelligent procedures to support model identification and the systematic management of the composite model, are considered in Sections 2.4 and 2.5.

2.4. Evolutionary Model Identification. When implementing evolution of models within a complex modeling task, structure, functional, and quantitative parameters are usually considered as the genotype, whereas model output (data layer) is considered as the phenotype. Within the proposed approach, we can adapt the definition of basic EC operations within genotype-phenotype mapping [13]:

(i) epigenesis as model application $f_1: S \times M \to D$;

(ii) selection $f_2: S \times D \to D$;

(iii) genotype survival $f_3: S \times D \to M$;

(iv) mutation $f_4: M \to M$.

In addition, we consider quality assessment, usually treated as fitness, for selection and survival (or, in more complex algorithms, for controlling other operations such as mutation):

(i) data quality $q_d: S \times D \to Q_d$;

(ii) model quality $q_m: S \times M \to Q_m$.

Here, $Q_d$ and $Q_m$ are often considered as $\mathbb{R}^N$ with some quantitative quality metrics. Model quality is usually considered through data quality, i.e., $q_m \sim q_d(s, f_1(s, m))$, but within our approach this separation is considered important because, in addition, we introduce supporting operations with data-driven procedures: in complex modeling, many of these functions (first of all, $f_1$, $q_m$, and $q_d$) are difficult to apply directly (some of these issues are considered in relation to the patterns). Data-driven operations (first of all, $f_1$ and $q_m$) can be introduced as substitutes for the previously introduced basic operations (see also patterns P2, P4, and P6):

(i) epigenesis as DD-model application $f^d_1: S \times M \to D$;

(ii) model generation $g^d: S \times D \to M$;

(iii) model quality prediction $q^d_m: S \times M \to Q_m$;

(iv) space discovery $w^d: S \times D \to S$.

Operation $w^d$ could be used as an intelligent extension within the selection or survival operations ($f_2$ and $f_3$). It becomes especially important in case of lack of knowledge of the system's structure or functional characteristics. Operation $g^d$, at the same time, could be used as a part of the mutation operation $f_4$ (or initial population generation). With this extension, we can implement enhanced versions of EC algorithms (e.g., genetic algorithms, evolution strategies, and evolutionary programming) with data-driven operations to overcome, or at least to lower, complex modeling issues.
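To make the operation signatures above more concrete, the following minimal Python sketch instantiates them on a toy problem. Everything specific in it is an assumption made for illustration only and is not taken from the paper: the "system" is a list of (x, y) observations, a "model" is a pair (a, b) in y = a·x + b, selection and survival are collapsed into simple truncation selection, and the function names merely mirror the symbols $f_1$, $f_4$, $q_d$, and $q_m$. The `quality` argument marks the place where a data-driven predictor such as $q^d_m$ could be substituted for the direct evaluation.

```python
import random
from typing import Callable, List, Tuple

Model = Tuple[float, float]        # genotype: (a, b) in y = a * x + b (toy assumption)
Obs = List[Tuple[float, float]]    # observed system state: (x, y) pairs


def f1(s: Obs, m: Model) -> List[float]:
    """Epigenesis: apply model m at the observed inputs (S x M -> D)."""
    a, b = m
    return [a * x + b for x, _ in s]


def q_d(s: Obs, d: List[float]) -> float:
    """Data quality (S x D -> Q_d): RMSE of model output against observations."""
    return (sum((y - yh) ** 2 for (_, y), yh in zip(s, d)) / len(s)) ** 0.5


def q_m(s: Obs, m: Model) -> float:
    """Model quality expressed through data quality: q_d(s, f1(s, m))."""
    return q_d(s, f1(s, m))


def f4(m: Model, rng: random.Random, sigma: float = 0.3) -> Model:
    """Mutation (M -> M): Gaussian perturbation of both parameters."""
    return (m[0] + rng.gauss(0.0, sigma), m[1] + rng.gauss(0.0, sigma))


def evolve(s: Obs, quality: Callable[[Obs, Model], float] = q_m,
           pop_size: int = 20, generations: int = 40, seed: int = 0) -> Model:
    rng = random.Random(seed)
    pop = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection and survival (f2, f3) are collapsed into truncation
        # selection: the better half survives unchanged (elitism).
        pop.sort(key=lambda m: quality(s, m))
        parents = pop[: pop_size // 2]
        pop = parents + [f4(rng.choice(parents), rng) for _ in parents]
    return min(pop, key=lambda m: quality(s, m))


# Hypothetical observations generated from y = 2x + 1.
observations = [(float(x), 2.0 * x + 1.0) for x in range(10)]
best = evolve(observations)
```

Because the better half of the population is always kept, the quality of `best` can never be worse than that of the best initial individual; how close it gets to (2, 1) depends on the mutation scale and the number of generations.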

2.5. Model Management Approach and Algorithm. By model management, we mean operations with models within the development and application of a problem-domain solution. This includes identification, calibration, DA, optimization, prediction, and forecasting. To systematize model management in the presented patterns, we propose an approach for explicit consideration of spaces $S$, $D$, and $M$ within hybrid modeling with EC and DD-modeling. To summarize complex modeling procedures within the approach, we developed a high-level algorithm which includes a series of steps to be implemented within the context of complex model management.

Step 1 (space discovery). This step identifies the description of the phase space (in most cases, $S$) in case of lack of knowledge or for automation purposes. For example, the step could be applied in the discovery of the system state space or model structure. The space description may include (a) distance metrics, (b) proximity structure (e.g., graph, clustering, hierarchy, and density), and (c) a positioning function. One of the possible ways to perform this step is to apply DM and EC algorithms to available data (see pattern P6).

Step 2 (identification of supplementary functions). Data-driven functions ($\Phi$) are applied to work in model evolution, with the space (landscape) representation considered as available information.

Step 3 (evolutionary processing of a set of models). This step is described by a combination of basic EC operations (population initialization, epigenesis, selection, mutation, and survival) with supplementary functions. The form of the combination depends on (a) the selected EC algorithm, (b) application requirements and restrictions, and (c) model-based issues (e.g., performance, quality of surrogate models, etc.).

Step 4 (assimilation of updated data and knowledge). This step is applied for automatic adaptation purposes and implements a DA algorithm. DA can be applied to (a) the set of models, (b) EC operations (e.g., affecting the selection function), (c) supplementary functions (as they are mainly data-driven), and (d) the phase space description (if the descriptive structure is identified from changed data and/or knowledge).

The steps can be repeated in various combinations depending on the application and the implemented pattern. Also, the steps are general and could be implemented in various ways. Several examples are provided in Section 3.
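One possible way to read Steps 1–4 as a control loop is sketched below in Python. This is only an illustrative skeleton under strong simplifying assumptions, not the authors' implementation: the four callables and the toy stand-ins (`toy_space`, `toy_quality`, `toy_evolve`) are hypothetical, a "model" is reduced to a single number, and data assimilation is reduced to rerunning Steps 1–3 on every incoming batch of observations.

```python
def manage_models(batches, discover_space, fit_supplementary, evolve, population):
    """Run Steps 1-3 once per incoming data batch; the loop itself plays the role of Step 4."""
    for batch in batches:                               # Step 4: assimilate new data
        space = discover_space(batch)                   # Step 1: space discovery
        quality = fit_supplementary(batch, space)       # Step 2: supplementary functions
        population = evolve(population, quality)        # Step 3: evolutionary processing
    return population


# Toy stand-ins (hypothetical): a "model" is a single number that should
# track the mean of the most recent batch of observations.
def toy_space(batch):
    """Describe the phase space by the observed value range."""
    return (min(batch), max(batch))


def toy_quality(batch, space):
    """Build a data-driven quality function (lower is better)."""
    mean = sum(batch) / len(batch)
    low, high = space
    # Models outside the discovered region are penalized.
    return lambda m: abs(m - mean) + (0.0 if low <= m <= high else 10.0)


def toy_evolve(pop, quality):
    best = sorted(pop, key=quality)[: max(1, len(pop) // 2)]      # selection
    children = [m + step for m in best for step in (-0.5, 0.5)]   # mutation
    return sorted(best + children, key=quality)[: len(pop)]       # survival


result = manage_models([[6.0, 7.0, 8.0]] * 20, toy_space, toy_quality,
                       toy_evolve, [0.0, 4.0, 8.0, 12.0])
```

In this toy run the population settles on the batch mean after a few iterations; in a real composite solution each callable would wrap a considerably heavier procedure (DM/PM for space discovery, surrogate fitting, and a full EC algorithm).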

2.6. Available Building Blocks of a Composite Solution. EC offers a flexible and robust way to identify complex model structures within a complex landscape, with possible adaptation to changing conditions and system states (including new states without prior observation). A significant additional benefit is the ability to manage alternative solutions simultaneously, with possible switching between them and various combinations of them depending on the current needs. Still, within the task of model identification and management, EC (and also many metaheuristics) has certain drawbacks that require additional steps to implement the approach under particular conditions:

(i) high computational cost due to multiple runs of a model;

(ii) low reproducibility and interpretability of obtained results due to the randomized nature of the search procedure;

(iii) complicated tuning of hyperparameters for better EC convergence;

(iv) indistinct definition of genotype boundaries;

(v) complicated mapping of genotype to phenotype space.

To overcome these issues, the proposed approach involves two options. First, the intelligent procedures may be used to tune EC hyperparameters (P5); to predict features of genotype-phenotype mapping, boundaries, etc. (P4); and to discover interpretable states and filters (for system, data, and model) to control convergence and adaptation of the population (P2, P4, and P5) with interpretable and reproducible results (through the defined control procedure). Second, the composite model may use various approaches, methods, and elements to obtain better quality and performance of the solution:

(i) surrogate models (P2, P4, P5), which may increase performance (for example, within preliminary and intermediate optimization steps);

(ii) ensemble models (P3), which may be considered as an interpretable and controllable population;

(iii) interpretation and formal inference using explicit domain-specific knowledge and results of data mining to feed procedures of EC and infer parameters in both models and EC;

(iv) controllable space decomposition (P6) with predictive models for possible areas and directions of population migration in EC to explicitly lower uncertainty and obtain additional interpretability.

Finally, an essential feature of the proposed approach is a holistic analysis of a composite solution with possible coevolution of models (submodels within a composite model) and data processing procedures.

3. Application Examples

This section presents several practical examples where the proposed approach, patterns, or some of their elements were applied. The examples were intentionally selected from diverse problem domains to demonstrate the generality of the approach. The considered problems are developed in separate projects which are at various stages. Problem 1 (ensemble metocean simulation) was investigated in a series of projects (see, e.g., [11, 14, 15]). Within this research, we are trying to extend model calibration and DA with EC techniques to develop more flexible and accurate multimodel ensembles. Problem 2 (clinical pathways (CPs) modelling) is important in several ongoing projects aimed at model-based decision support in healthcare (see, e.g., [16–18]). The proposed approach plays an important role by enabling deeper analysis of clinical pathways in various scenarios (interactive analysis of available CPs with identification of clusters of similar patients, DA in predictive modelling of ongoing cases, etc.). Finally, Problem 3 shows very early results from a recently started project in online social network analysis.

3.1. Problem 1: Evolution in Models for Metocean Simulation. Environmental simulation systems usually contain several numerical models serving different purposes (complementary simulation processes, improving the reliability of a system by providing alternative results, etc.). Each model can typically be described by a large number of quantitative parameters and functional characteristics that should be adjusted by an expert or using intelligent automated methods (e.g., EC). Alternative models inside the environmental simulation system can be joined in an ensemble according to the complex modeling pattern based on evolutionary computing (a combination of the P3 and P5 patterns). In the current case study, we introduce an example illustrating the ensemble concept in the forms of the alternative models ensemble, parameter diversity ensemble, and metaensemble. To identify the parameters of the proposed ensembles, the least squares method (in the case of model linearity) or optimization methods (in the case of nonlinearity) can be used. As we need to take into account not only the functional space $\Phi$ and the space of parameters $\Xi$ for a single model but also the optimal coexistence of models in the system (i.e., $\Sigma$), evolutionary and coevolutionary approaches seem to be applicable techniques for this task. It is worth mentioning that the coevolutionary approach can be applied to independent model realizations, with the ensemble serving as a connecting element. In this case, the parameters (weights) of the ensemble can be estimated separately from the coevolution procedure, either in a constant form or dynamically.

As a case study of complex environmental modeling, we design an ensemble model that consists of the SWAN (http://swanmodel.sourceforge.net) model for ocean wave simulation based on two different surface forcings, by NCEP (https://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.html) and ERA Interim (https://www.ecmwf.int/en/forecasts/datasets/archive-datasets/reanalysis-datasets/era-interim). Thus, different implementations of the SWAN model were connected in the form of an alternative models ensemble with least-squares-calculated coefficients defining the structure of the complex model. Two parameters, wind drag and whitecapping rate (WCR), were calibrated using evolutionary and coevolutionary algorithms implementing $\Gamma^{\Xi \to \Phi}_{D \to M}$ in P5 (for a detailed sensitivity analysis of SWAN, see [19]). The coevolutionary approach can be represented in the form of a parameter diversity ensemble, where each population is constructed as an ensemble of alternative model results with different parameters. We can also add ensemble weights to the model parameter diversity and obtain a metaensemble that can be identified within the coevolutionary approach.
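The least-squares identification of ensemble coefficients mentioned above can be illustrated for the two-member case (for example, one SWAN run per forcing). The sketch below is an assumption-laden illustration rather than the authors' code: it solves the 2×2 normal equations directly, uses no intercept term, and the function name `two_member_weights` and the sample numbers are hypothetical.

```python
from typing import Sequence, Tuple


def two_member_weights(a: Sequence[float], b: Sequence[float],
                       y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares weights (w_a, w_b) minimizing sum((w_a*a + w_b*b - y)^2)."""
    saa = sum(x * x for x in a)
    sbb = sum(x * x for x in b)
    sab = sum(p * q for p, q in zip(a, b))
    say = sum(p * q for p, q in zip(a, y))
    sby = sum(p * q for p, q in zip(b, y))
    det = saa * sbb - sab * sab
    if det == 0:
        raise ValueError("member outputs are collinear; weights are not unique")
    # Cramer's rule for the normal equations
    # [saa sab; sab sbb] [w_a; w_b] = [say; sby].
    return ((say * sbb - sby * sab) / det, (sby * saa - say * sab) / det)


# Hypothetical outputs of two ensemble members and matching "observations".
member_a = [1.0, 2.0, 3.0, 4.0]
member_b = [2.0, 1.0, 4.0, 3.0]
observed = [0.3 * p + 0.7 * q for p, q in zip(member_a, member_b)]
w_a, w_b = two_member_weights(member_a, member_b, observed)
```

For more than two members, a general linear least-squares solver would be the natural replacement for the closed-form 2×2 solution.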

In the process of model identification and verification, measurements from several wave stations in the Kara Sea were used. The fitness function represents the mean error (RMSE, root-mean-square error) for all wave stations. For verification of the results, the MAE (mean absolute error) and DTW (dynamic time warping) metrics were used.
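For reference, the three metrics named above can be written down in a few lines of Python. This is a generic sketch, not the authors' evaluation code: the DTW variant shown is the classic dynamic-programming formulation with absolute difference as the local cost and no window constraint, and averaging the per-station RMSE in `fitness` is an assumption about how "mean error for all wave stations" is aggregated.

```python
import math


def rmse(sim, obs):
    """Root-mean-square error between simulated and observed series."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))


def mae(sim, obs):
    """Mean absolute error between simulated and observed series."""
    return sum(abs(s - o) for s, o in zip(sim, obs)) / len(obs)


def dtw(a, b):
    """Dynamic time warping distance with |x - y| as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],       # step in a only
                                 cost[i][j - 1],       # step in b only
                                 cost[i - 1][j - 1])   # step in both
    return cost[n][m]


def fitness(sim_by_station, obs_by_station):
    """Mean of per-station RMSE (aggregation rule assumed for illustration)."""
    errors = [rmse(sim_by_station[k], obs_by_station[k]) for k in obs_by_station]
    return sum(errors) / len(errors)
```

Unlike RMSE and MAE, DTW tolerates small phase shifts between simulated and observed wave series, which is presumably why it is used as an additional verification metric.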

Figure 4(a) represents the surface (landscape) of RMSE in the space of the considered parameters (drag and WCR) for the implementations SWAN+ERA and SWAN+NCEP. It can be seen that the evolutionary-obtained results converge to the minimum of the error landscape. The landscape was obtained by running the model with all parameter variants from the full 30×30 grid (i.e., 900 runs), while the evolutionary algorithm converged in 5 generations with 10 individuals (parameter sets) in the population (50 runs), which allows performing identification two orders of magnitude faster. The convergence of coevolution for the SWAN+ERA+NCEP case is presented in Figure 5.

Figure 4: Metocean simulation: (a) error landscape for wave height simulation results using ERA and NCEP reanalysis as input data and (b) Pareto frontier of coevolution results for all generations.

Figure 5: Coevolution convergence of diversity parameters ensemble for metocean models.

Although the error landscapes for the pair of implementations SWAN+ERA and SWAN+NCEP are close to each other, separate evolution does not consider optimization of the ensemble result. For this purpose, we apply a coevolutionary approach that produces a set of Pareto-optimal solutions for each generation. Figure 4(b) shows that the error of each model in the ensemble is significant (coNCEP and coERA for the models alone), but the error of the whole ensemble (coEnsemble) converges to the minimum very fast.
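The notion of a set of Pareto-optimal solutions used above can be made explicit with a small, generic filter. The sketch assumes that every objective is minimized (as with error metrics) and that each candidate is a tuple of objective values; it illustrates the concept only and is not the selection procedure used in the study.

```python
def pareto_front(points):
    """Return the points that are not dominated by any other point (minimization)."""
    def dominates(p, q):
        # p dominates q if it is no worse in every objective
        # and strictly better in at least one.
        return (all(a <= b for a, b in zip(p, q))
                and any(a < b for a, b in zip(p, q)))

    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (error of model 1, error of model 2) pairs.
candidates = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 5)]
front = pareto_front(candidates)
```

Here `(3, 4)` is dropped because `(2, 3)` is better in both objectives, and `(5, 5)` is dropped because `(1, 5)` is at least as good in both and better in one. The pairwise check is quadratic in the number of candidates, which is acceptable for population sizes of the order used in this case study.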

The obtained result can be analyzed from the uncertainty reduction point of view. Optimization of model parameters helps to reduce parameter uncertainty, which can be estimated through the error function. But when we apply an ensemble approach to evolutionary optimized results, it is appropriate to talk about reducing the uncertainty connected with the input data sources (NCEP and ERA) as well. Moreover, the metaensemble approach allowed reducing the uncertainty connected with ensemble parameters.

Figure 6: Graph-based representation of processes space in healthcare (interactive view). (Demonstration available at https://www.youtube.com/watch?v=EH74f1w6EeY.)

Summarizing the results of the metocean case study, we can note that the EC approach is up to 120 times more efficient than grid search without accuracy losses. According to this experimental study, the quality of the ensemble with evolutionary optimized models is similar to the results of the grid search: the MAE metric is equal to 0.24 m and the DTW metric to 51. We can also mention that the coevolutionary approach provides a 10% accuracy gain compared with the results of single evolution of model implementations, but this is still similar to the ensemble result with evolutionary optimized models. Nevertheless, the coevolutionary approach allowed achieving a 200-fold acceleration. Within the context of the proposed approach, space $\Phi$ was investigated using a defined structure of the model in space $\Sigma$ for the purpose of model calibration.

3.2. Problem 2: Modeling Health Care Process. Modeling of healthcare processes is usually associated with enormous uncertainty and variability, even when modeling a single disease. One of the ways to identify a model of such a process is PM [20]. Still, direct implementation of PM methods does not remove a major part of the uncertainty. Within the current research, we applied the proposed approach for identification purposes both in the analysis of historical cases and in the prediction of the development of a single process. Here, we consider processes of providing health care in acute coronary syndrome (ACS) cases; ACS is usually considered one of the major causes of death in the world. We used a set of 3434 ACS cases collected during 2010–2015 in Almazov National Medical Research Centre, one of the leading cardiological centers in Russia. The dataset contains electronic health records of these patients with all registered events and characteristics of a patient.

To simplify consideration of the multidimensional space of possible processes ($\Gamma'^{\Xi \to \Sigma}_{D \to S}$, $\Gamma^{\Xi}_{S \to D}$ for analysis of $\Sigma$ on layer $S$), we introduced a graph-based representation of this space, with vertices representing cases and edges representing proximity of cases. Analysis of such a structure enables easy discovery of common cases (e.g., as communities in the graph). Such discovery enables explicit, interpretable structuring of the space and representation of the further landscape for EC in terms of the P6 pattern. Moreover, direct interactive investigation of a visual representation of such a structure (see Figure 6) provides significant insights for medical researchers.
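A graph of this kind (cases as vertices, proximity as edges) can be sketched as follows. The proximity measure actually used in the study is not specified in this section, so the example makes several assumptions for illustration: each case is a string of event labels, proximity is Levenshtein edit distance with a hypothetical threshold `max_dist`, and connected components serve as a crude stand-in for community detection.

```python
def edit_distance(a, b):
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def proximity_graph(cases, max_dist):
    """Adjacency sets: cases i and j are linked if their distance <= max_dist."""
    edges = {i: set() for i in range(len(cases))}
    for i in range(len(cases)):
        for j in range(i + 1, len(cases)):
            if edit_distance(cases[i], cases[j]) <= max_dist:
                edges[i].add(j)
                edges[j].add(i)
    return edges


def components(edges):
    """Connected components, used here as a simple proxy for communities."""
    seen, groups = set(), []
    for start in edges:
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            v = stack.pop()
            group.append(v)
            for u in edges[v]:
                if u not in seen:
                    seen.add(u)
                    stack.append(u)
        groups.append(sorted(group))
    return groups


# Hypothetical cases encoded as sequences of event labels.
cases = ["AFIE", "AFNE", "AFIED", "DNDN", "DNDE"]
groups = components(proximity_graph(cases, max_dist=1))
```

With these five toy cases and `max_dist=1`, the first three cases form one group and the last two form another; a dedicated community-detection algorithm would be needed for graphs where groups are connected but unevenly dense.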

We have developed an evolutionary-based algorithm for pattern identification and clustering in such a representation, with two criteria to be optimized (see Figure 7). Here, processes were represented by sequences of labels (symbols) denoting key events in the PM model. Typical patterns were then selected from the Pareto frontier. The convergence process is demonstrated in Figure 8 (the 10 best individuals from the Pareto frontier according to the integral criterion were selected). As a result, this solution may refer to the P5 pattern and the operator $\Gamma^{\Xi \to \Sigma}_{D \to M}$ while discovering the model structure. Figure 9 shows an example of a typical process model (i.e., a structural characteristic of the model) for one of the identified clusters. A detailed description of the approach, algorithms, and results on CP discovery, clustering, and analysis, including a performance comparison of three versions of CP discovery algorithms, can be found in [10]. An important outcome of the approach in this application is the interpretability of the clusters and identified patterns. For example, 10 clusters and the corresponding CPs were interpreted by cardiologists from Almazov National Medical Research Centre. The obtained interpretation and further discovery and application of the CP structure are presented in [17]. Another important benefit of such space structure discovery is lowering the uncertainty of a patient's treatment trajectory by hierarchical positioning of an evolving process (selection of a cluster and selection of a position within the cluster). For example, the discrete-event simulation model described in [17] provides a more appropriate length-of-stay distribution within simulation with discovered classes of CPs (the Kolmogorov-Smirnov statistic decreased by 51%, from 0.255 to 0.124).

Figure 7: Pareto frontier for CP patterns discovery.

Figure 8: Evolutionary convergence during CP pattern discovery.

Figure 9: Example of process model showing transfers between hospital's departments.

Figure 10: Evolution of synthetic CPs: (a) CP population convergence; (b) evolution of possible CP (demonstration available at https://www.youtube.com/watch?v=twvfX9zKsY8); (c) number of synthetic CPs; (d) % of CPs in correct cluster.

Furthermore, we propose an algorithm to dynamically generate the possible development of a process in healthcare using the identified graph-based space representation, with evolutionary strategies assimilating incoming data (events) within a case ($\Gamma'^{\Sigma}_{M \to D}$ in P5 and $\Gamma^{\Xi \to \Sigma}_{D}$ in P2). We consider convergence (Figure 10(a)) of the introduced synthetic continuations of the processes to the right class (identified clusters of typical cases were used), with mapping to the graph-based space representation with proximity measures (Figure 10(b)). As a result, the appearance of the CP's events decreases the number of synthetic CPs and increases the percentage of CPs positioned in the correct cluster (see an example in Figure 10(c) and Figure 10(d), respectively). This enables interpretable positioning and uncertainty lowering in predicting further development of the CP for a particular patient.

Here, a combination of patterns P2, P5, and P6 in the implementation of the proposed algorithm (see Section 2.5) enables interactive investigation of the processes space and data assimilation into a population of possible continuations of a single process as it evolves. This solution can be applied in exploratory modeling and simulation of patient flow processing, as well as in decision support in specialized medical centers.

3.3. Problem 3: Mining Social Media. Nowadays, social media analysis (which began with static network models emphasizing the topology of connections between users) strives to explore dynamic behavioral patterns of individuals, which can be recovered from their digital traces on the web. The prediction of social media activities requires combining analytical and data-driven models, as well as identifying the optimal structure and parameters of these models according to the available data. Here, we show an example of a problem in this field involving evolutionary identification of a model.

Figure 11: Distribution of posts, reposts, and comments on personal walls of subscribers of bank community.

Figure 12: Example of process model (a) with expanded cycles and (b) with collapsed cycles.

A digital trace of a user in an online social network (OSN) is a sequence (chain) of observed activities separated by time gaps. Each OSN supports different types of “hidden” and observable activities. For example, in the largest Russian social network, vk.com (hereafter denoted as VK), a user has a personal page (wall) with three types of activities: post (P), when a user makes a record by himself or herself; repost (R), when the user copies a record of another user or community to his or her wall; and comment (C), when the user comments on a record on his or her wall. Figure 11 illustrates the distribution of these activities for subscribers of a large Russian bank community in VK. The collected dataset consists of the last 100 (or fewer, if unavailable) entries (posts or reposts) and comments on the entries for 8K user walls in the period January 2017–December 2017. Comments are much less common than posts and reposts. The distributions of posts and reposts are similar, but there is a group of “spreaders” with a significant number of reposts.

We applied the technique described in Section 3.2 to analyze the processes. Still, the considered process has a significantly different structure. By default, it is continuous with random repetition of events, while the healthcare process in ACS cases is finite and has a more “strong” structure. Figure 12 shows a typical process structure identified with the EC-based approach and visualized with expanded cycles (a) and with collapsed cycles (b). The second one could be considered more relevant than the first one, which is significantly affected by the length of the selected history. It is natural to consider it as a random process or state-transition model. In that case, the three identified clusters (characterized by various frequencies of transitions) could be interpreted as typical behavior models.

Figure 13: Example of process model for cluster 3 using n-gram analysis.

Table 1: Mean of activities' combinations for users' clusters.

Cluster  Size  CCC  CCP  CCR  CPP  CPR  CRR  PPP   PPR   PRR   RRR
1        5238  052  074  024  1    072  042  668   64    756   853
2        2110  011  013  016  016  036  059  209   395   1232  603
3        1120  091  138  014  362  075  018  5002  1178  5     277

N-grams analysis is often used to detect patterns in peo-plersquos behaviors [21 22] N-grams analysis is based on countingfrequencies of combinations or sequences of activities Wecollected all sorted 3-grams (so called 3-sets) for each userrsquossequence to analyze the frequency of event combinationsAs a result three clusters of vectors with 3-sets chains wereidentified with k-means clustering method Figure 13 showsall combinations and transitions between them for cluster 3as an example Using Figure 12 and Table 1 it is possible to seethat cluster 3 includes users who oftenmake new records (P)and sometimes comment records (C) So cluster 3 mostlyconsists of ldquobloggersrdquo Cluster 2 includes ldquospreadersrdquo whocopy other records (R) frequently And the biggest cluster1 consists of people who make new records and copy otherones equally but less intensively comparing to other clusters

That may be considered typical behavior for a user of an OSN. N-gram analysis allows detecting typical behavioral patterns and obtaining process models for social media activities using chains of different lengths as input data. Thus, this type of data-driven modeling is more appropriate for researching continuous processes. Figure 14 shows a graph-based representation of the process space with all users' patterns.
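The 3-set counting described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `three_set_counts` and the encoding of a user's history as a string over the symbols C, P, and R are assumptions made for the example. Each sliding window of three events is sorted, so that, for instance, PRC and CPR fall into the same 3-set.

```python
from collections import Counter
from itertools import combinations_with_replacement

ACTIVITIES = "CPR"  # C = comment, P = new record, R = copied record

# All sorted 3-grams over the alphabet: CCC, CCP, ..., RRR (10 in total).
THREE_SETS = ["".join(t) for t in combinations_with_replacement(ACTIVITIES, 3)]


def three_set_counts(sequence):
    """Count sorted 3-grams (3-sets) in one user's activity sequence.

    Returns a vector ordered as THREE_SETS; such vectors could then be
    passed to a clustering method such as k-means (not shown here).
    """
    counts = Counter(
        "".join(sorted(sequence[i:i + 3])) for i in range(len(sequence) - 2)
    )
    return [counts[s] for s in THREE_SETS]


if __name__ == "__main__":
    # Hypothetical user history: three new records, one copy, one comment.
    print(dict(zip(THREE_SETS, three_set_counts("PPPRC"))))
```

Sorting each window discards the order of events inside it, which keeps the feature vector at 10 dimensions instead of 27 for ordered 3-grams.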

This subsection provides very early results. The next step in applying the proposed approach to this application includes an extension of the process model structure with (a) temporal labeling (gaps between events), (b) consideration of the process within a sliding time window to get more structured processes, (c) linking of the model with causal inference, and (d) introduction of DM techniques for EC positioning of ongoing processes in model space. We believe that these extensions could enhance the discovery of model structure (P4) and provide deeper insight into social media activity.

Figure 14: Graph-based representation of processes' space for three clusters in social media activity.

4. Conclusion and Future Work

The development of the proposed approach is still an ongoing project. We aim for further systematization and detailing of the proposed concepts, methods, and algorithms, as well as a more comprehensive and deeper implementation of EC-based applications. Further development includes the following directions:

(i) dualization on the role of data-driven and intelligent operations in the proposed approach and described patterns;

(ii) extended analysis of various EC techniques applicable within the approach;

(iii) investigation of EC-based discovery for models of complex systems with lacking or inconsistent observations;

(iv) detailed formalization of expertise and knowledge-based methods within the approach;

(v) extending the approach with interactive user-centered modelling and phase space analysis;

(vi) development of a multilayered approach for decision support and control of system and process $S$, available data $D$, and complex model $M$.

Data Availability

The data used in the examples presented in Section 3 were initially obtained within complementary projects performed by the authors of the paper. The data are available from the corresponding author upon request, after an explicit statement of the purpose and plan of the requested data usage, to check for possible violation of the corresponding projects' rules.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This paper presents an extension and further development of the work [23]. This research is financially supported by The Russian Scientific Foundation, Agreement 14-11-00823 (15.07.2014).

References

[1] N. Boccara, Modeling Complex Systems, Graduate Texts in Physics, Springer, New York, Second edition, 2010.

[2] H. McManus and D. Hastings, "A framework for understanding uncertainty and its mitigation and exploitation in complex systems," IEEE Engineering Management Review, vol. 34, no. 3, pp. 81–94, 2006.

[3] W. Walker, P. Harremoes, J. Rotmans et al., "Defining uncertainty: a conceptual basis for uncertainty management in model-based decision support," Integrated Assessment, vol. 4, no. 1, pp. 5–17, 2003.

[4] J. Yan and J. R. Deller, "NARMAX model identification using a set-theoretic evolutionary approach," Signal Processing, vol. 123, pp. 30–41, 2016.

[5] I. G. Kevrekidis, C. W. Gear, and G. Hummer, "Equation-free: The computer-aided analysis of complex multiscale systems," AIChE Journal, vol. 50, no. 7, pp. 1346–1355, 2004.

[6] H. Ihshaish, A. Cortes, and M. A. Senar, "Parallel Multi-level Genetic Ensemble for Numerical Weather Prediction Enhancement," Procedia Computer Science, vol. 9, pp. 276–285, 2012.

[7] G. Dumedah, "Formulation of the Evolutionary-Based Data Assimilation and its Implementation in Hydrological Forecasting," Water Resources Management, vol. 26, no. 13, pp. 3853–3870, 2012.

[8] V. V. Kashirin, A. A. Lantseva, S. V. Ivanov, S. V. Kovalchuk, and A. Boukhanovsky, "Evolutionary simulation of complex networks' structures with specific functional properties," Journal of Applied Logic, vol. 24, no. part A, pp. 39–49, 2017.

[9] S. V. Kovalchuk, P. A. Smirnov, K. V. Knyazkov, A. S. Zagarskikh, and A. V. Boukhanovsky, "Knowledge-Based Expressive Technologies Within Cloud Computing Environments," in Practical Applications of Intelligent Systems, vol. 279 of Advances in Intelligent Systems and Computing, pp. 1–11, Springer, Berlin, Germany, 2014.

[10] A. A. Funkner, A. N. Yakovlev, and S. V. Kovalchuk, "Towards evolutionary discovery of typical clinical pathways in electronic health records," Procedia Computer Science, vol. 119, pp. 234–244, 2017.

[11] S. V. Kovalchuk and A. V. Boukhanovsky, "Towards Ensemble Simulation of Complex Systems," Procedia Computer Science, vol. 51, pp. 532–541, 2015.

[12] S. V. Kovalchuk, A. V. Krikunov, K. V. Knyazkov, and A. V. Boukhanovsky, "Classification issues within ensemble-based simulation: application to surge floods forecasting," Stochastic Environmental Research and Risk Assessment, vol. 31, no. 5, pp. 1183–1197, 2017.

[13] D. Fogel, "Phenotypes, genotypes, and operators in evolutionary computation," in Proceedings of the 1995 IEEE International Conference on Evolutionary Computation, p. 193, Perth, WA, Australia.

[14] S. V. Ivanov, S. V. Kovalchuk, and A. V. Boukhanovsky, "Workflow-based Collaborative Decision Support for Flood Management Systems," Procedia Computer Science, vol. 18, pp. 2213–2222, 2013.

[15] A. Gusarov, A. Kalyuzhnaya, and A. Boukhanovsky, "Spatially adaptive ensemble optimal interpolation of in-situ observations into numerical vector field models," in Proceedings of the 6th International Young Scientist Conference on Computational Science, YSC 2017, pp. 325–333, Finland, November 2017.

[16] A. V. Krikunov, E. V. Bolgova, E. Krotov, T. M. Abuhay, A. N. Yakovlev, and S. V. Kovalchuk, "Complex data-driven predictive modeling in personalized clinical decision support for Acute Coronary Syndrome episodes," in Proceedings of the International Conference on Computational Science, ICCS 2016, pp. 518–529, USA, June 2016.

[17] S. V. Kovalchuk, A. A. Funkner, O. G. Metsker, and A. N. Yakovlev, "Simulation of patient flow in multiple healthcare units using process and data mining techniques for model identification," Journal of Biomedical Informatics, vol. 82, pp. 128–142, 2018.

[18] A. Yakovlev, O. Metsker, S. Kovalchuk, and E. Bologova, "Prediction of in-hospital mortality and length of stay in acute coronary syndrome patients using machine-learning methods," Journal of the American College of Cardiology, vol. 71, no. 11, p. A242, 2018.

[19] A. Nikishova, A. Kalyuzhnaya, A. Boukhanovsky, and A. Hoekstra, "Uncertainty quantification and sensitivity analysis applied to the wind wave model SWAN," Environmental Modelling & Software, vol. 95, pp. 344–357, 2017.

[20] E. Rojas, J. Munoz-Gama, M. Sepulveda, and D. Capurro, "Process mining in healthcare: A literature review," Journal of Biomedical Informatics, vol. 61, pp. 224–236, 2016.

[21] T. Sinha, P. Jermann, N. Li, and P. Dillenbourg, "Your click decides your fate: Inferring Information Processing and Attrition Behavior from MOOC Video Clickstream Interactions," in Proceedings of the EMNLP 2014 Workshop on Analysis of Large Scale Social Interaction in MOOCs, pp. 3–14, Doha, Qatar, October 2014.

[22] C. Marceau, "Characterizing the Behavior of a Program Using Multiple-Length N-Grams," Defense Technical Information Center, 2005.

[23] S. V. Kovalchuk, O. G. Metsker, A. A. Funkner et al., "Towards management of complex modeling through a hybrid evolutionary identification," in Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 255-256, Kyoto, Japan, July 2018.

Figure 3: Artifacts and procedures within a typical composite solution.

P6). It is possible to consider ensemble-based techniques (P3) in the fashion of EC, but within our approach we prefer to consider an ensemble as a composite model with several submodels. In that case, ensemble management refers to the concept of complex model structure.

Several important goals may be reached within the presented patterns:

(i) automation of complex model management with intelligent solutions, DD-models, and EC;

(ii) optimization of model structure and application under defined limitations in precision and performance;

(iii) enhanced ways of domain knowledge discovery for applications and general investigation of a system.

2.3. Composite Solution Development. The proposed structure of core concepts and patterns may be applied in various ways to form a solution that combines operators originally implemented within the solution or implemented as external model calls. Figure 3 shows the essential elements (artifacts and procedures) of a typical composite solution within the proposed conceptual layers ($S$, $D$, and $M$). The $S$-layer includes the actual system's state, which can be assessed through the observation procedure and described by explicit domain knowledge. The $D$-layer includes datasets divided into observation data and simulation/modeling data, with procedures for data processing and data assimilation. Finally, the $M$-layer includes a set of available basic models $M_1, \ldots, M_N$, which may be identified and calibrated with available data, yielding tuned models $M'_1, \ldots, M'_N$ as a result. Here, the essential elements are model composition (which may be performed either automatically or by the modeler) and application of the model.

The key benefit of the approach is the application of a combination of EC, data-driven, and intelligent procedures to manage the whole composite solution, including data processing, modeling, and simulation, to lower uncertainty in $\Sigma \times \Xi \times \Phi$. Within the shown structure, these procedures may be applied

(i) to rank and select alternative models;

(ii) to support model identification, calibration, composition, and application;

(iii) to manage artifacts on various conceptual layers in a systematic way;

(iv) to infer implicit knowledge from available data and explicitly presented domain knowledge.

The shown example gives a brief view of composite solution development, while the particular details may differ depending on the application. The key procedures within the proposed composite solution, namely, the implementation of intelligent procedures to support model identification and the systematic management of the composite model, are considered in Sections 2.4 and 2.5.

2.4. Evolutionary Model Identification. When implementing evolution of models within a complex modeling task, structure, functional, and quantitative parameters are usually considered as the genotype, whereas model output (data layer) is considered as the phenotype. Within the proposed approach, we can adapt the definition of basic EC operations within genotype-phenotype mapping [13]:

(i) epigenesis as model application: $f_1: S \times M \to D$;

(ii) selection: $f_2: S \times D \to D$;

(iii) genotype survival: $f_3: S \times D \to M$;

(iv) mutation: $f_4: M \to M$.

In addition, we consider quality assessment, usually treated as fitness for selection and survival (or, in more complex algorithms, for controlling other operations such as mutation):

(i) data quality: $q_d: S \times D \to Q_d$;

(ii) model quality: $q_m: S \times M \to Q_m$.

Here $Q_d$ and $Q_m$ are often considered as $\mathbb{R}^N$ with some quantitative quality metrics. Model quality is usually considered through data quality, i.e., $q_m \sim q_d(s, f_1(s, m))$, but within our approach this separation is important because we additionally introduce supporting operations with data-driven procedures: in complex modeling, many of these functions (first of all, $f_1$, $q_m$, and $q_d$) are significantly difficult to apply directly (some of these issues are considered in relation to the patterns). Data-driven operations (first of all, $f_1$ and $q_m$) can be introduced as substitutes for the previously introduced basic operations (see also patterns P2, P4, and P6):

(i) epigenesis as DD-model application: $f_1^d: S \times M \to D$;

(ii) model generation: $g^d: S \times D \to M$;

(iii) model quality prediction: $q_m^d: S \times M \to Q_m$;

(iv) space discovery: $w^d: S \times D \to S$.

Operation $w^d$ could be used as an intelligent extension within selection or survival operations ($f_2$ and $f_3$). It becomes especially important in the case of a lack of knowledge of the system's structure or functional characteristics. Operation $g^d$, at the same time, could be used as a part of the mutation operation $f_4$ (or initial population generation). With this extension, we can implement enhanced versions of EC algorithms (e.g., genetic algorithms, evolution strategies, and evolutionary programming) with data-driven operations to overcome, or at least to lower, complex modeling issues.
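To make the roles of the basic operators concrete, the following minimal sketch evolves a toy linear model. Everything specific in it is an assumption for illustration: the system $s$ is a dictionary with inputs `xs` and observations `obs`, a model $m$ is a parameter pair $(a, b)$, the data quality is an RMSE, and selection and survival are collapsed into a simple truncation of the joined parent and offspring populations. It is not the authors' algorithm, only one possible instance of the mapping described above.

```python
import random


def f1(s, m):
    """Epigenesis: apply model m = (a, b) to the system inputs, giving data."""
    a, b = m
    return [a * x + b for x in s["xs"]]


def q_d(s, d):
    """Data quality: RMSE between simulated data d and observations (lower is better)."""
    obs = s["obs"]
    return (sum((p - o) ** 2 for p, o in zip(d, obs)) / len(obs)) ** 0.5


def f4(m, rng, sigma=0.3):
    """Mutation: Gaussian perturbation of the quantitative parameters."""
    return tuple(p + rng.gauss(0.0, sigma) for p in m)


def evolve(s, pop_size=20, generations=40, seed=0):
    """Return the best model found and the best error after each generation."""
    rng = random.Random(seed)
    population = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        offspring = [f4(m, rng) for m in population]
        candidates = population + offspring
        # Model quality is assessed through data quality: q_m ~ q_d(s, f1(s, m)).
        candidates.sort(key=lambda m: q_d(s, f1(s, m)))
        # Selection (f2) and genotype survival (f3), collapsed into truncation.
        population = candidates[:pop_size]
        history.append(q_d(s, f1(s, population[0])))
    return population[0], history


if __name__ == "__main__":
    system = {"xs": list(range(10)), "obs": [2 * x + 1 for x in range(10)]}
    best, history = evolve(system)
    print(best, history[-1])
```

Because parents compete with their offspring, the best error never increases between generations. A data-driven substitute such as $q_m^d$ would replace the `q_d(s, f1(s, m))` call in the sort key with a cheaper prediction.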

2.5. Model Management Approach and Algorithm. By model management we mean operations with models within problem domain solution development and application. This includes identification, calibration, DA, optimization, prediction, and forecasting. To systematize model management in the presented patterns, we propose an approach for explicit consideration of spaces $S$, $D$, and $M$ within hybrid modeling with EC and DD-modeling. To summarize complex modeling procedures within the approach, we developed a high-level algorithm which includes a series of steps to be implemented within the context of complex model management.

Step 1 (space discovery). This step identifies the description of the phase space (in most cases $S$) in the case of a lack of knowledge or for automation purposes. For example, the step could be applied in the discovery of the system state space or model structure. The space description may include (a) distance metrics, (b) proximity structure (e.g., graph, clustering, hierarchy, and density), and (c) a positioning function. One of the possible ways to perform this step is to apply DM and EC algorithms to the available data (see pattern P6).

Step 2 (identification of supplementary functions). Data-driven functions ($\Phi$) are applied to work in model evolution, with the space (landscape) representation considered as available information.

Step 3 (evolutionary processing of a set of models). This step is described by a combination of basic EC operations (population initialization, epigenesis, selection, mutation, and survival) with supplementary functions. The form of the combination depends on (a) the selected EC algorithm, (b) application requirements and restrictions, and (c) model-based issues (e.g., performance, quality of surrogate models, etc.).

Step 4 (assimilation of updated data and knowledge). This step is applied for automatic adaptation purposes and implements a DA algorithm. DA can be applied to (a) the set of models, (b) EC operations (e.g., affecting the selection function), (c) supplementary functions (as they are mainly data-driven), and (d) the phase space description (if the descriptive structure is identified from changed data or/and knowledge).

The steps can be repeated in various combinations depending on the application and the implemented pattern. Also, the steps are general and could be implemented in various ways. Several examples are provided in Section 3.
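One possible arrangement of the four steps is sketched below as a loop over incoming data batches. All names (`discover_space`, `fit_supplementary`, `evolve_step`, `manage_model`) and the toy problem (scalar models pulled towards the mean of each batch) are hypothetical placeholders chosen to keep the sketch runnable; a real application would substitute its own space description, data-driven functions, and EC algorithm.

```python
import random


def discover_space(batch):
    """Step 1: describe the search space; here, just bounds taken from the data."""
    return min(batch), max(batch)


def fit_supplementary(batch, space):
    """Step 2: build a data-driven helper; here, a quality function (lower is better).

    The space description is accepted but unused in this toy version.
    """
    target = sum(batch) / len(batch)
    return lambda m: abs(m - target)


def evolve_step(population, space, quality, rng):
    """Step 3: one generation of mutation, clipping to the space, and truncation."""
    lo, hi = space
    offspring = [m + rng.gauss(0.0, 0.5) for m in population]
    candidates = [min(max(m, lo), hi) for m in population + offspring]
    return sorted(candidates, key=quality)[:len(population)]


def manage_model(data_stream, population, generations=10, seed=0):
    """Step 4: assimilate each new batch by repeating Steps 1 to 3 on it."""
    rng = random.Random(seed)
    for batch in data_stream:
        space = discover_space(batch)
        quality = fit_supplementary(batch, space)
        for _ in range(generations):
            population = evolve_step(population, space, quality, rng)
    return population


if __name__ == "__main__":
    stream = [[0, 1, 2, 3], [10, 11, 12]]
    print(manage_model(stream, [0.0, 5.0, -3.0]))
```

Rebuilding the space and the quality function for every batch is the simplest form of adaptation; as noted in Step 4, assimilation could instead update only some of these elements.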

2.6. Available Building Blocks of a Composite Solution. EC offers a flexible and robust way to identify complex model structures within a complex landscape, with possible adaptation towards changing conditions and system states (including new states without prior observation). A significant additional benefit is the ability to manage alternative solutions simultaneously, with possible switching between them and various combinations of them depending on the current needs. Still, within the task of model identification and management, EC (like many metaheuristics) has certain drawbacks which require additional steps to implement the approach within particular conditions:

(i) high computational cost due to multiple runs of a model;

(ii) low reproducibility and interpretability of obtained results due to the randomized nature of the search procedure;

(iii) complicated tuning of hyperparameters for better EC convergence;

(iv) indistinct definition of genotype boundaries;

(v) complicated mapping of genotype to phenotype space.

To overcome these issues, the proposed approach involves two options. First, intelligent procedures may be used to tune EC hyperparameters (P5), predict features of genotype-phenotype mapping, boundaries, etc. (P4), and discover interpretable states and filters (for system, data, and model) to control convergence and adaptation of the population (P2, P4, and P5) in an interpretable and reproducible way (through the defined control procedure). Second, the composite model may use various approaches, methods, and elements to obtain better quality and performance of the solution:

(i) surrogate models (P2, P4, P5), which may increase performance (for example, within preliminary and intermediate optimization steps);

(ii) ensemble models (P3), which may be considered as an interpretable and controllable population;

(iii) interpretation and formal inference using explicit domain-specific knowledge and results of data mining to feed procedures of EC and infer parameters in both models and EC;

(iv) controllable space decomposition (P6) with predictive models for possible areas and directions of population migration in EC to explicitly lower uncertainty and obtain additional interpretability.

Finally, an essential feature of the proposed approach is a holistic analysis of a composite solution with possible coevolution of models (submodels within a composite model) and data processing procedures.

3. Application Examples

This section presents several practical examples where the proposed approach, patterns, or some of their elements were applied. The examples were intentionally selected from diverse problem domains to demonstrate the generality of the approach. The considered problems are developed in separate projects which are at various stages. Problem 1 (ensemble metocean simulation) was investigated in a series of projects (see, e.g., [11, 14, 15]). Within this research, we are trying to extend model calibration and DA with EC techniques to develop more flexible and accurate multimodel ensembles. Problem 2 (clinical pathways (CPs) modelling) is important in several ongoing projects aimed at model-based decision support in healthcare (see, e.g., [16–18]). The proposed approach plays an important role by enabling deeper analysis of clinical pathways in various scenarios (interactive analysis of available CPs with identification of clusters of similar patients, DA in predictive modelling of ongoing cases, etc.). Finally, Problem 3 shows very early results of a recently started project in online social network analysis.

3.1. Problem 1: Evolution in Models for Metocean Simulation. Environmental simulation systems usually contain several numerical models serving different purposes (complementary simulation processes, improving the reliability of a system by providing alternative results, etc.). Each model can typically be described by a large number of quantitative parameters and functional characteristics that should be adjusted by an expert or using intelligent automated methods (e.g., EC). Alternative models inside the environmental simulation system can be joined in an ensemble according to the complex modeling pattern based on evolutionary computing (a combination of patterns P3 and P5). In the current case study, we introduce an example illustrating the ensemble concept in the forms of an alternative models ensemble, a parameter diversity ensemble, and a metaensemble. To identify the parameters of the proposed ensembles, the least squares method (in the case of model linearity) or optimization methods (in the case of nonlinearity) can be used. As we need to take into account not only the functional space $\Phi$ and the space of parameters $\Xi$ for a single model but also the optimal coexistence of models in the system (i.e., $\Sigma$), evolutionary and coevolutionary approaches seem to be applicable techniques for this task. It is worth mentioning that the coevolutionary approach can be applied to independent model realizations with an ensemble as a connecting element. In this case, parameters (weights) in the ensemble can be estimated separately from the coevolution procedure, in a constant form or dynamically.

As a case study of complex environmental modeling, we designed an ensemble model that consists of the SWAN (http://swanmodel.sourceforge.net) model for ocean wave simulation driven by two different surface forcings, NCEP (https://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.html) and ERA Interim (https://www.ecmwf.int/en/forecasts/datasets/archive-datasets/reanalysis-datasets/era-interim). Thus, different implementations of the SWAN model were connected in the form of an alternative models ensemble with least-squares-calculated coefficients defining the structure of the complex model. Two parameters, wind drag and whitecapping rate (WCR), were calibrated using evolutionary and coevolutionary algorithms implementing $\Gamma^{\Xi \to \Phi}_{D \to M}$ in P5 (for a detailed sensitivity analysis of SWAN, see [19]). The coevolutionary approach can be represented in the form of a parameter diversity ensemble, where each population is constructed as an ensemble of alternative model results with different parameters. Also, we can add ensemble weights to the model parameter diversity and get a metaensemble that can be identified within the coevolutionary approach.
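For the linear case, the least-squares coefficients of a two-member ensemble can be obtained in closed form from the normal equations. The sketch below is an assumption-laden illustration: the function name `ensemble_weights`, the absence of an intercept term, and the toy series standing in for two model outputs (for example, SWAN driven by two forcings) are all choices made for the example rather than details given in the text.

```python
def ensemble_weights(a, b, obs):
    """Least-squares weights (w1, w2) minimizing sum((w1*a + w2*b - obs)**2).

    a, b: outputs of two alternative models; obs: observations.
    Solves the 2x2 normal equations by Cramer's rule.
    """
    saa = sum(x * x for x in a)
    sbb = sum(x * x for x in b)
    sab = sum(x * y for x, y in zip(a, b))
    say = sum(x * y for x, y in zip(a, obs))
    sby = sum(x * y for x, y in zip(b, obs))
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("model outputs are collinear; weights are not unique")
    w1 = (say * sbb - sby * sab) / det
    w2 = (sby * saa - say * sab) / det
    return w1, w2


if __name__ == "__main__":
    model_a = [1.0, 2.0, 3.0, 4.0]
    model_b = [2.0, 1.0, 4.0, 3.0]
    observed = [0.3 * x + 0.7 * y for x, y in zip(model_a, model_b)]
    print(ensemble_weights(model_a, model_b, observed))
```

With more than two ensemble members, the same normal equations would normally be solved with a linear algebra library instead of by hand.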

In the process of model identification and verification, measurements from several wave stations in the Kara Sea were used. The fitness function represents the mean error (RMSE) for all wave stations. For verification of the results, the MAE (mean absolute error) and DTW (dynamic time warping) metrics were used.
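The three metrics named above can be written compactly as follows. This is a textbook sketch, not the code used in the study: the DTW variant shown uses the absolute difference as the local cost and no window constraint, which is an assumption, since the text does not specify these details.

```python
import math


def rmse(pred, obs):
    """Root-mean-square error between two equally long series."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def mae(pred, obs):
    """Mean absolute error between two equally long series."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def dtw(x, y):
    """Dynamic time warping distance with |x_i - y_j| as the local cost."""
    inf = float("inf")
    n, m = len(x), len(y)
    # cost[i][j] is the best cumulative cost of aligning x[:i] with y[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


if __name__ == "__main__":
    simulated = [1.0, 2.0, 3.0]
    measured = [1.0, 2.0, 2.0, 3.0]
    print(dtw(simulated, measured))
```

Unlike RMSE and MAE, DTW tolerates time shifts between the simulated and measured series, which is why it is useful as a complementary verification metric.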

Figure 4(a) shows the surface (landscape) of RMSE in the space of the mentioned parameters (drag and WCR) for the implementations SWAN+ERA and SWAN+NCEP. It can be seen that the evolutionary-obtained results converge to the minimum of the error landscape. The landscape was obtained by running the model with all parameter variants from the full 30x30 grid (i.e., 900 runs), while the evolutionary algorithm converged in 5 generations with 10 individuals (parameter sets) in the population (50 runs), which allows performing identification two orders faster. The convergence of coevolution for the SWAN+ERA+NCEP case is presented in Figure 5.

Figure 4: Metocean simulation: (a) error landscape for wave height simulation results using ERA and NCEP reanalysis as input data and (b) Pareto frontier of coevolution results for all generations.

Figure 5: Coevolution convergence of diversity parameters ensemble for metocean models.

Although the error landscapes for the pair of implementations SWAN+ERA and SWAN+NCEP are close to each other, separate evolution does not consider optimization of the ensemble result. For this purpose, we apply the coevolutionary approach, which produces a set of Pareto-optimal solutions for each generation. Figure 4(b) shows that the error of each model in the ensemble is significant (coNCEP and coERA for the models alone), but the error of the whole ensemble (coEnsemble) converges to the minimum very fast.
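Extracting a set of Pareto-optimal solutions from a population can be sketched as below. The representation of each solution as a tuple of errors to be minimized is an assumption for illustration; the text does not state which objectives form the frontier in Figure 4(b).

```python
def dominates(p, q):
    """True if p is no worse than q in every objective and better in at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def pareto_front(points):
    """Return the non-dominated points (all objectives are minimized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical (error of model 1, error of model 2) pairs.
    errors = [(1, 5), (2, 2), (3, 3), (5, 1), (4, 4)]
    print(pareto_front(errors))
```

This pairwise check is quadratic in the population size, which is acceptable for populations of tens of individuals such as those mentioned in this case study.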

The obtained result can be analyzed from the uncertainty reduction point of view. Optimization of model parameters helps to reduce parameter uncertainty, which can be estimated through the error function. But when we apply an ensemble approach to evolutionary optimized results, it is appropriate to talk about reduction of the uncertainty connected with the input data sources (NCEP and ERA) as well. Moreover, the metaensemble approach allowed reduction of the uncertainty connected with the ensemble parameters.

Figure 6: Graph-based representation of processes space in healthcare (interactive view). (Demonstration available at https://www.youtube.com/watch?v=EH74f1w6EeY.)

Summarizing the results of the metocean case study, we can note that the EC approach shows a significant efficiency gain, up to 120 times compared with grid search, without accuracy losses. According to this experimental study, the quality of the ensemble with evolutionary optimized models is similar to the results of the grid search: the MAE metric is equal to 0.24 m and the DTW metric to 51. Also, the coevolutionary approach provides a 10% accuracy gain compared with the results of single evolution of model implementations, but this is still similar to the ensemble result with evolutionary optimized models. Nevertheless, the coevolutionary approach allowed a 200 times acceleration to be achieved. Within the context of the proposed approach, space $\Phi$ was investigated using the defined structure of the model in space $\Sigma$ for the purpose of model calibration.

3.2. Problem 2: Modeling Health Care Process. Modeling healthcare processes usually involves enormous uncertainty and variability, even when modeling a single disease. One of the ways to identify a model of such a process is PM [20]. Still, direct implementation of PM methods does not remove a major part of the uncertainty. Within the current research, we applied the proposed approach for identification purposes both in the analysis of historical cases and in the prediction of single process development. Here we consider processes of providing health care in acute coronary syndrome (ACS) cases; ACS is usually considered one of the major causes of death in the world. We used a set of 3434 ACS cases collected during 2010–2015 in Almazov National Medical Research Centre, one of the leading cardiological centers in Russia. The data set contains electronic health records of these patients with all registered events and characteristics of a patient.

To simplify consideration of the multidimensional space of possible processes (${\Gamma'}^{\Xi \to \Sigma}_{D \to S} \Gamma^{\Xi}_{S \to D}$ for analysis of $\Sigma$ on layer $S$), we introduced a graph-based representation of this space, with vertices representing cases and edges representing proximity of cases. Analysis of such a structure enables easy discovery of common cases (e.g., as communities in the graph). Such discovery enables explicit, interpretable structuring of the space and representation of the further landscape for EC in terms of pattern P6. Moreover, direct interactive investigation of a visual representation of such a structure (see Figure 6) provides significant insights for medical researchers.
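A graph of this kind can be sketched with cases encoded as strings of event symbols. The choices below are assumptions for illustration only: proximity is measured by edit (Levenshtein) distance, an edge is added when the distance does not exceed a threshold, and connected components serve as a crude stand-in for community detection. The event strings in the usage example are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two event sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def proximity_graph(cases, max_dist):
    """Adjacency sets: vertices are case indices, edges link close cases."""
    edges = {i: set() for i in range(len(cases))}
    for i in range(len(cases)):
        for j in range(i + 1, len(cases)):
            if edit_distance(cases[i], cases[j]) <= max_dist:
                edges[i].add(j)
                edges[j].add(i)
    return edges


def components(edges):
    """Connected components of the graph, each as a sorted list of vertices."""
    seen, groups = set(), []
    for start in edges:
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:
            v = stack.pop()
            group.append(v)
            for u in edges[v] - seen:
                seen.add(u)
                stack.append(u)
        groups.append(sorted(group))
    return groups


if __name__ == "__main__":
    cases = ["AFIE", "AFNE", "AFE", "DNDN", "DNDD"]  # hypothetical pathways
    print(components(proximity_graph(cases, max_dist=1)))
```

For thousands of cases, the all-pairs distance computation becomes the bottleneck, so a real implementation would likely need indexing or sampling.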

We have developed evolutionary-based algorithm forpatterns identification and clustering in such representationwith two criteria to be optimized (see Figure 7) Hereprocesses were represented by a sequence of labels (symbols)denoting key events in PM model Typical patterns werethen selected for Pareto frontier The convergence process isdemonstrated in Figure 8 (10 best individuals from Paretofrontier according to the integral criterion were selected) Asa result this solution may refer to P5 pattern and operatorΓΞ997888rarrΣ119863997888rarr119872

while discovering model structure Figure 9 shows anexample of typical process model (ie structural characteris-tic of the model) for one of the identified clusters Detaileddescription of the approach algorithms and results on CPsdiscovering clustering and analysis including comparison ofthree version of CP discovery algorithms with performancecomparison can be found in [10] An important outcomeof the approach being applied in this application is inter-pretability of the clusters and identified patterns For example10 clusters and corresponding CPs obtained interpretationby cardiologists from Almazov National Medical ResearchCentre The obtained interpretation and further discoveringand application with CP structure are presented in [17]Another important benefit given by such space structurediscovering is lowering uncertainty of patientrsquos treatmenttrajectory by a hierarchical positioning of an evolved process(selection of a cluster and selection of position withinthe cluster) For example discrete-event simulation model

10 Complexity

00 02 04 06 08 10 Number of non-arranged sequences (normed)

12

35

30

20

15

10

25

05

00

minus05

Leng

th o

f pat

tern

(nor

med

)

0

0

1

1

2

2

3

3

4

4

5

5

AFIFNEAFNIFEIFDNEAFENIFEENIFEDNFIEFEAFNIFEDINIFENDDFFEIDNFEEDFIFEAEFNINFDEDFNEIDFDFNEFDIEDNFNEIFDDNIEFFDEAFANIFADIFEFNIEFDEDEINFDEFDEDFFDNIFENDFIEIDAFEEDEIFEDNA

Figure 7 Pareto frontier for CP patterns discovery

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 300

Generation

10

08

06

04

02

Nor

med

sum

erro

r

Figure 8 Evolutionary convergence during CP pattern discovery

Entrance289 AD

289Discharge

died13 transferred9

home119 cure138OR

170

IC104

CC13

CD2

AD OD

4

CD

48

IC225

104 12

1

12

2 9

CD1111

230

2 39

14

172

Cluster 7

Figure 9 Example of process model showing transfers between hospitalrsquos departments

Complexity 11

Cluster 0Cluster 1Cluster 2

Cluster 3Cluster 4

Cluster 5Cluster 6

0 2 4 6 8 10 12 14

50

100

150

200

250

0

Step

Dist

ance

sym

bols

(a)

Historical CPs Synthetic CPs Target CP

(b)

0

20

40

60

80

100

Num

ber o

f syn

thet

ic C

Ps

Step0 1 2 3 4 5 6 7 8 9

(c)

Step0 1 2 3 4 5 6 7 8 9

o

f CPs

in co

rrec

t clu

ster

100

90

80

70

60

50

40

30

20

(d)

Figure 10 Evolution of synthetic CPs (a) CP population convergence (b) evolution of possible CP (demonstration available at httpswwwyoutubecomwatchv=twvfX9zKsY8) (c) number of synthetic CPs (d) of CPs in correct cluster

described in [17] provides a more appropriate length-of-stay distribution within the simulation when the discovered classes of CPs are used (the Kolmogorov-Smirnov statistic decreased by 51%, from 0.255 to 0.124).
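As a quick check of the figures quoted above, the relative decrease follows directly from the two reported values of the Kolmogorov-Smirnov statistic. The symbols $D_{\mathrm{KS}}^{\mathrm{before}}$ and $D_{\mathrm{KS}}^{\mathrm{after}}$ are introduced here only for illustration and do not appear in the original text.

```latex
\[
\frac{D_{\mathrm{KS}}^{\mathrm{before}} - D_{\mathrm{KS}}^{\mathrm{after}}}{D_{\mathrm{KS}}^{\mathrm{before}}}
  = \frac{0.255 - 0.124}{0.255}
  = \frac{0.131}{0.255}
  \approx 0.51
\]
```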

Further, we propose an algorithm that dynamically generates possible developments of a healthcare process using the identified graph-based space representation, with evolutionary strategies assimilating incoming data (events) within a case ($\Gamma'^{\Sigma}_{M \to D}$ in P5 and $\Gamma^{\Xi \to \Sigma}_{D}$ in P2). We consider the convergence (Figure 10(a)) of the introduced synthetic continuations of the processes to the right class (identified clusters of typical cases were used), with mapping to the graph-based space representation with proximity measures (Figure 10(b)). As a result, the appearance of new CP events decreases the number of synthetic CPs and increases the percentage of CPs positioned in the correct cluster (see an example in Figure 10(c) and Figure 10(d), respectively). This enables interpretable positioning and lowers the uncertainty in predicting the further development of a CP for a particular patient.

Here, a combination of patterns P2, P5, and P6 in the implementation of the proposed algorithm (see Section 2.5) enables interactive investigation of the process space and data assimilation into a population of possible continuations of a single process as it evolves. This solution can be applied in exploratory modeling and simulation of patient flow processing, as well as in decision support in specialized medical centers.

3.3. Problem 3: Mining Social Media. Nowadays, social media analysis (which began with static network models emphasizing the topology of connections between users) strives to explore dynamic behavioral patterns of individuals, which can be recovered from their digital traces on the web. The prediction of social media activities requires combining analytical and data-driven models, as well as identifying the optimal structure and parameters of these models according to the available data. Here we show an example of a problem in this field involving evolutionary identification of a model.

Figure 11: Distribution of posts, reposts, and comments on personal walls of subscribers of bank community (horizontal axis: activity count (log); vertical axis: fraction of users (log)).

Figure 12: Example of process model (a) with expanded cycles (cluster 2) and (b) with collapsed cycles (cluster 1).

A digital trace of a user in an online social network (OSN) is a sequence (chain) of observed activities separated by time gaps. Each OSN supports different types of "hidden" and observable activities. For example, in the largest Russian social network, vk.com (further denoted as VK), a user has a personal page (wall) with three types of activities: post (P), when a user makes a record by himself or herself; repost (R), when the user copies the record of another user or community to his or her wall; and comment (C), when the user comments on a record on his or her wall. Figure 11 illustrates the distribution of these activities for subscribers of a large Russian bank community in VK. The collected dataset consists of the last 100 (or fewer if unavailable) entries (posts or reposts), and the comments on those entries, for 8K user walls in the period January 2017–December 2017. Comments are much less common than posts and reposts. The distributions of posts and reposts are similar, but there is a group of "spreaders" with a significant number of reposts.

We applied the technique described in Section 3.2 to analyze the processes. Still, the considered process has a significantly different structure. By default, it is continuous with random repetition of events, while the healthcare process in ACS cases is finite and has a more "strong" structure. Figure 12 shows a typical process structure identified with the EC-based approach

Figure 13: Example of process model for cluster 3 (1120 users) using n-gram analysis.

Table 1: Mean of activities' combinations for users' clusters.

Cluster  Size  CCC   CCP   CCR   CPP   CPR   CRR   PPP    PPR    PRR    RRR
1        5238  0.52  0.74  0.24  1     0.72  0.42  6.68   6.4    7.56   8.53
2        2110  0.11  0.13  0.16  0.16  0.36  0.59  2.09   3.95   12.32  60.3
3        1120  0.91  1.38  0.14  3.62  0.75  0.18  50.02  11.78  5      2.77

and visualized with expanded cycles (a) and with collapsed cycles (b). The second one can be considered more relevant than the first, which is significantly affected by the length of the selected history. It is natural to consider it as a random process or state-transition model. In that case, the three identified clusters (characterized by various frequencies of transitions) can be interpreted as typical behavior models.

N-gram analysis is often used to detect patterns in people's behavior [21, 22]. It is based on counting the frequencies of combinations or sequences of activities. We collected all sorted 3-grams (so-called 3-sets) for each user's sequence to analyze the frequency of event combinations. As a result, three clusters of vectors with 3-set chains were identified with the k-means clustering method. Figure 13 shows all combinations and the transitions between them for cluster 3 as an example. Using Figure 12 and Table 1, it is possible to see that cluster 3 includes users who often make new records (P) and sometimes comment on records (C). So cluster 3 mostly consists of "bloggers". Cluster 2 includes "spreaders", who copy other records (R) frequently. The biggest cluster, cluster 1, consists of people who make new records and copy other ones equally, but less intensively compared to the other clusters. That may be considered typical behavior for an OSN user. N-gram analysis allows detecting typical behavioral patterns and obtaining process models for social media activities using chains of different lengths as input data. Thus, this type of data-driven modeling is more appropriate for researching continuous processes. Figure 14 shows a graph-based representation of the process space with all users' patterns.
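The construction of the 3-set vectors described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `three_set_vector` and the sample trace are hypothetical, and the k-means clustering step applied to the resulting vectors is not shown. The ten unordered combinations of C, P, and R come out in the same order as the columns of Table 1.

```python
from collections import Counter
from itertools import combinations_with_replacement

ACTIVITIES = "CPR"  # comment, post, repost (labels taken from the text)

# All 10 unordered 3-sets over {C, P, R}: CCC, CCP, CCR, ..., RRR.
THREE_SETS = ["".join(c) for c in combinations_with_replacement(ACTIVITIES, 3)]

def three_set_vector(sequence):
    """Count sorted 3-grams (3-sets) in one user's activity sequence."""
    counts = Counter(
        "".join(sorted(sequence[i:i + 3])) for i in range(len(sequence) - 2)
    )
    return [counts[key] for key in THREE_SETS]

# Hypothetical user trace: mostly posts, one repost, one comment.
vector = three_set_vector("PPRPPCPP")
print(dict(zip(THREE_SETS, vector)))
```

Sorting each window before counting makes the representation insensitive to the order of events inside a 3-gram, which is what distinguishes a 3-set from an ordinary 3-gram.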

This subsection provides very early results. The next steps in applying the proposed approach here include extending the process model structure (a) with temporal labeling (gaps between events), (b) by considering the process within a sliding time window to get more structured processes, (c) by linking the model with causal inference, and (d) by introducing DM techniques for EC positioning of ongoing processes in the model space. We believe that these extensions could enhance the discovery of model structure (P4) and provide deeper insight into social media activity.

4. Conclusion and Future Work

Figure 14: Graph-based representation of processes' space for three clusters in social media activity.

The development of the proposed approach is still an ongoing project. We aim for further systematization and detailing of the proposed concepts, methods, and algorithms, as well as a more comprehensive and deeper implementation of EC-based applications. Further work includes the following directions:

(i) dualization on the role of data-driven and intelligent operations in the proposed approach and described patterns;

(ii) extended analysis of various EC techniques applicable within the approach;

(iii) investigation of EC-based discovery for models of complex systems with lacking or inconsistent observations;

(iv) detailed formalization of expertise and knowledge-based methods within the approach;

(v) extending the approach with interactive user-centered modelling and phase space analysis;

(vi) development of a multilayered approach for decision support and control of system and process $S$, available data $D$, and complex model $M$.

Data Availability

The data used in the examples presented in Section 3 were initially obtained within complimentary projects performed by the authors of the paper. The data are available from the corresponding author upon request, after an explicit statement of the purpose and plan of the requested data usage, to check for possible violation of the corresponding projects' rules.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This paper presents an extension and further development of the work [23]. This research is financially supported by the Russian Scientific Foundation, Agreement 14-11-00823 (15.07.2014).

References

[1] N. Boccara, Modeling complex systems, Graduate Texts in Physics, Springer, New York, Second edition, 2010.

[2] H. McManus and D. Hastings, "A framework for understanding uncertainty and its mitigation and exploitation in complex systems," IEEE Engineering Management Review, vol. 34, no. 3, pp. 81–94, 2006.

[3] W. Walker, P. Harremoes, J. Rotmans et al., "Defining uncertainty: a conceptual basis for uncertainty management in model-based decision support," Integrated Assessment, vol. 4, no. 1, pp. 5–17, 2003.

[4] J. Yan and J. R. Deller, "NARMAX model identification using a set-theoretic evolutionary approach," Signal Processing, vol. 123, pp. 30–41, 2016.

[5] I. G. Kevrekidis, C. W. Gear, and G. Hummer, "Equation-free: The computer-aided analysis of complex multiscale systems," AIChE Journal, vol. 50, no. 7, pp. 1346–1355, 2004.

[6] H. Ihshaish, A. Cortes, and M. A. Senar, "Parallel Multi-level Genetic Ensemble for Numerical Weather Prediction Enhancement," Procedia Computer Science, vol. 9, pp. 276–285, 2012.

[7] G. Dumedah, "Formulation of the Evolutionary-Based Data Assimilation and its Implementation in Hydrological Forecasting," Water Resources Management, vol. 26, no. 13, pp. 3853–3870, 2012.

[8] V. V. Kashirin, A. A. Lantseva, S. V. Ivanov, S. V. Kovalchuk, and A. Boukhanovsky, "Evolutionary simulation of complex networks' structures with specific functional properties," Journal of Applied Logic, vol. 24, no. part A, pp. 39–49, 2017.

[9] S. V. Kovalchuk, P. A. Smirnov, K. V. Knyazkov, A. S. Zagarskikh, and A. V. Boukhanovsky, "Knowledge-Based Expressive Technologies Within Cloud Computing Environments," in Practical Applications of Intelligent Systems, vol. 279 of Advances in Intelligent Systems and Computing, pp. 1–11, Springer, Berlin, Germany, 2014.

[10] A. A. Funkner, A. N. Yakovlev, and S. V. Kovalchuk, "Towards evolutionary discovery of typical clinical pathways in electronic health records," Procedia Computer Science, vol. 119, pp. 234–244, 2017.

[11] S. V. Kovalchuk and A. V. Boukhanovsky, "Towards Ensemble Simulation of Complex Systems," Procedia Computer Science, vol. 51, pp. 532–541, 2015.

[12] S. V. Kovalchuk, A. V. Krikunov, K. V. Knyazkov, and A. V. Boukhanovsky, "Classification issues within ensemble-based simulation: application to surge floods forecasting," Stochastic Environmental Research and Risk Assessment, vol. 31, no. 5, pp. 1183–1197, 2017.

[13] D. Fogel, "Phenotypes, genotypes, and operators in evolutionary computation," in Proceedings of the 1995 IEEE International Conference on Evolutionary Computation, p. 193, Perth, WA, Australia.

[14] S. V. Ivanov, S. V. Kovalchuk, and A. V. Boukhanovsky, "Workflow-based Collaborative Decision Support for Flood Management Systems," Procedia Computer Science, vol. 18, pp. 2213–2222, 2013.

[15] A. Gusarov, A. Kalyuzhnaya, and A. Boukhanovsky, "Spatially adaptive ensemble optimal interpolation of in-situ observations into numerical vector field models," in Proceedings of the 6th International Young Scientist Conference on Computational Science, YSC 2017, pp. 325–333, Finland, November 2017.

[16] A. V. Krikunov, E. V. Bolgova, E. Krotov, T. M. Abuhay, A. N. Yakovlev, and S. V. Kovalchuk, "Complex data-driven predictive modeling in personalized clinical decision support for Acute Coronary Syndrome episodes," in Proceedings of the International Conference on Computational Science, ICCS 2016, pp. 518–529, USA, June 2016.

[17] S. V. Kovalchuk, A. A. Funkner, O. G. Metsker, and A. N. Yakovlev, "Simulation of patient flow in multiple healthcare units using process and data mining techniques for model identification," Journal of Biomedical Informatics, vol. 82, pp. 128–142, 2018.

[18] A. Yakovlev, O. Metsker, S. Kovalchuk, and E. Bologova, "Prediction of in-hospital mortality and length of stay in acute coronary syndrome patients using machine-learning methods," Journal of the American College of Cardiology, vol. 71, no. 11, p. A242, 2018.

[19] A. Nikishova, A. Kalyuzhnaya, A. Boukhanovsky, and A. Hoekstra, "Uncertainty quantification and sensitivity analysis applied to the wind wave model SWAN," Environmental Modelling & Software, vol. 95, pp. 344–357, 2017.

[20] E. Rojas, J. Munoz-Gama, M. Sepulveda, and D. Capurro, "Process mining in healthcare: A literature review," Journal of Biomedical Informatics, vol. 61, pp. 224–236, 2016.

[21] T. Sinha, P. Jermann, N. Li, and P. Dillenbourg, "Your click decides your fate: Inferring Information Processing and Attrition Behavior from MOOC Video Clickstream Interactions," in Proceedings of the EMNLP 2014 Workshop on Analysis of Large Scale Social Interaction in MOOCs, pp. 3–14, Doha, Qatar, October 2014.

[22] C. Marceau, "Characterizing the Behavior of a Program Using Multiple-Length N-Grams," Defense Technical Information Center, 2005.

[23] S. V. Kovalchuk, O. G. Metsker, A. A. Funkner et al., "Towards management of complex modeling through a hybrid evolutionary identification," in Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 255-256, Kyoto, Japan, July 2018.


considered as phenotype. Within the proposed approach, we can adapt the definitions of the basic EC operations within genotype-phenotype mapping [13]:

(i) epigenesis as model application, $f_1 : S \times M \to D$;

(ii) selection, $f_2 : S \times D \to D$;

(iii) genotype survival, $f_3 : S \times D \to M$;

(iv) mutation, $f_4 : M \to M$.

In addition, we consider quality assessment, usually treated as fitness for selection and survival (or, in more complex algorithms, for controlling other operations such as mutation):

(i) data quality, $q_d : S \times D \to Q_d$;

(ii) model quality, $q_m : S \times M \to Q_m$.

Here $Q_d$ and $Q_m$ are often considered as $\mathbb{R}^N$ with some quantitative quality metrics. Model quality is usually considered through data quality, i.e., $q_m \sim q_d(s, f_1(s, m))$, but within our approach this separation is important because we additionally introduce supporting operations with data-driven procedures: in complex modeling, many of these functions (first of all $f_1$, $q_m$, and $q_d$) are significantly difficult to apply directly (some of these issues are considered in relation to the patterns). Data-driven operations (first of all for $f_1$ and $q_m$) can be introduced as substitutes for the previously introduced basic operations (see also patterns P2, P4, and P6):

(i) epigenesis as DD-model application, $f^d_1 : S \times M \to D$;

(ii) model generation, $g^d : S \times D \to M$;

(iii) model quality prediction, $q^d_m : S \times M \to Q_m$;

(iv) space discovery, $w^d : S \times D \to S$.

Operation $w^d$ could be used as an intelligent extension within the selection or survival operations ($f_2$ and $f_3$). It becomes especially important when knowledge of the system's structure or functional characteristics is lacking. Operation $g^d$, in turn, could be used as a part of the mutation operation $f_4$ (or of initial population generation). With this extension, we can implement enhanced versions of EC algorithms (e.g., genetic algorithms, evolution strategies, and evolutionary programming) with data-driven operations to overcome, or at least to lower, complex modeling issues.
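To make the signatures above concrete, the following minimal sketch instantiates the basic operations on a toy problem. Everything in it is an assumption made for illustration: the system state is a dictionary of inputs and observations, a model is a (slope, intercept) pair, mean squared error stands in for the quality metric, and selection ($f_2$) and survival ($f_3$) are merged into one elitist ranking. The data-driven substitutes ($f^d_1$, $g^d$, $q^d_m$, $w^d$) are not shown.

```python
import random

# Toy stand-ins (all hypothetical): the system state s holds inputs and
# matching observations, a model m is a (slope, intercept) pair, and
# data d is the list of model outputs.

def f1(s, m):
    """Epigenesis, S x M -> D: apply model m to the system inputs."""
    slope, intercept = m
    return [slope * x + intercept for x in s["inputs"]]

def q_d(s, d):
    """Data quality, S x D -> Q_d: mean squared error (lower is better)."""
    return sum((o - y) ** 2 for o, y in zip(s["observed"], d)) / len(d)

def q_m(s, m):
    """Model quality expressed through data quality: q_m ~ q_d(s, f1(s, m))."""
    return q_d(s, f1(s, m))

def f4(m, rng, sigma=0.1):
    """Mutation, M -> M: Gaussian perturbation of the model parameters."""
    return tuple(p + rng.gauss(0.0, sigma) for p in m)

def generation(s, population, rng):
    """One generation: mutate, then keep the best models among parents
    and offspring (selection and survival merged into one ranking)."""
    offspring = [f4(m, rng) for m in population]
    ranked = sorted(population + offspring, key=lambda m: q_m(s, m))
    return ranked[:len(population)]

rng = random.Random(0)
s = {"inputs": [0.0, 1.0, 2.0, 3.0], "observed": [1.0, 3.0, 5.0, 7.0]}
population = [(rng.uniform(0, 4), rng.uniform(0, 4)) for _ in range(10)]
best_before = min(q_m(s, m) for m in population)
for _ in range(30):
    population = generation(s, population, rng)
best_after = q_m(s, population[0])
print(best_before, best_after)
```

Because parents compete with their offspring, the best error in this sketch can never increase from one generation to the next; a non-elitist scheme would not give that guarantee.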

2.5. Model Management Approach and Algorithm. By model management we mean operations with models within problem domain solution development and application. This includes identification, calibration, DA, optimization, prediction, and forecasting. To systematize model management in the presented patterns, we propose an approach for explicit consideration of the spaces $S$, $D$, and $M$ within hybrid modeling with EC and DD-modeling. To summarize complex modeling procedures within the approach, we developed a high-level algorithm that includes a series of steps to be implemented within the context of complex model management.

Step 1 (space discovery). This step identifies the description of the phase space (in most cases $S$) when knowledge is lacking or for automation purposes. For example, the step could be applied in the discovery of system state space or model structure. The space description may include (a) distance metrics, (b) proximity structure (e.g., graph, clustering, hierarchy, and density), and (c) a positioning function. One possible way to perform this step is to apply DM and EC algorithms to the available data (see pattern P6).

Step 2 (identification of supplementary functions). Data-driven functions ($\Phi$) are applied to work in model evolution, with the space (landscape) representation taken into account as available information.

Step 3 (evolutionary processing of a set of models). This step is described by a combination of basic EC operations (population initialization, epigenesis, selection, mutation, and survival) with supplementary functions. The form of the combination depends on (a) the selected EC algorithm, (b) application requirements and restrictions, and (c) model-based issues (e.g., performance, quality of surrogate models, etc.).

Step 4 (assimilation of updated data and knowledge). This step is applied for automatic adaptation purposes and implements a DA algorithm. DA can be applied to (a) the set of models, (b) EC operations (e.g., affecting the selection function), (c) supplementary functions (as they are mainly data-driven), and (d) the phase space description (if the descriptive structure is identified from changed data and/or knowledge).

The steps can be repeated in various combinations depending on the application and the implemented pattern. Also, the steps are general and could be implemented in various ways. Several examples are provided in Section 3.
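One possible ordering of the four steps can be sketched as a small driver loop. This is only an illustration under stated assumptions: the function `manage_models`, its callable arguments, and the toy usage below are all hypothetical, and the text explicitly allows other combinations of the steps.

```python
def manage_models(data_stream, initial_models, assimilate, discover_space,
                  fit_supplementary, evolve, generations_per_batch=5):
    """Run Steps 4, 1, 2, 3 once per incoming batch of data.

    All callables are supplied by the caller; none of them is defined by
    the paper in this form.
    """
    data, space, models = [], None, list(initial_models)
    for batch in data_stream:
        data = assimilate(data, batch)            # Step 4: assimilate new data
        space = discover_space(data)              # Step 1: space discovery
        helpers = fit_supplementary(data, space)  # Step 2: supplementary functions
        for _ in range(generations_per_batch):    # Step 3: evolutionary processing
            models = evolve(models, data, space, helpers)
    return models, space

# Toy usage: "models" are numbers pulled towards the running mean of the data.
models, space = manage_models(
    data_stream=[[1, 2, 3], [10]],
    initial_models=[0.0, 8.0],
    assimilate=lambda data, batch: data + list(batch),
    discover_space=lambda data: (min(data), max(data)),
    fit_supplementary=lambda data, space: sum(data) / len(data),
    evolve=lambda models, data, space, target: [m + 0.5 * (target - m) for m in models],
)
print(models, space)
```

Passing the steps as callables keeps the loop independent of any particular EC algorithm or DA scheme, which mirrors the statement that the steps are general.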

2.6. Available Building Blocks of a Composite Solution. EC offers a flexible and robust solution for identifying complex model structures within a complex landscape, with possible adaptation to changing conditions and the system's state (including new states without prior observation). A significant additional benefit is the ability to manage alternative solutions simultaneously, with possible switching and various combinations of them depending on the current needs. Still, within the task of model identification and management, EC (like many metaheuristics) has certain drawbacks which require additional steps to implement the approach within particular conditions:

(i) high computational cost due to the multiple runs of a model;

(ii) low reproducibility and interpretability of the obtained results due to the randomized nature of the search procedure;

(iii) complicated tuning of hyperparameters for better EC convergence;

(iv) indistinct definition of genotype boundaries;

(v) complicated mapping of genotype to phenotype space.

To overcome these issues, the proposed approach involves two options. First, intelligent procedures may be used to tune EC hyperparameters (P5), predict features of genotype-phenotype mapping, boundaries, etc. (P4), and discover interpretable states and filters (for system, data, and model) to control convergence and adaptation of the population (P2, P4, and P5) in an interpretable and reproducible way (through the defined control procedure). Second, the composite model may use various approaches, methods, and elements to obtain better quality and performance of the solution:

(i) surrogate models (P2, P4, P5), which may increase performance (for example, within preliminary and intermediate optimization steps);

(ii) ensemble models (P3), which may be considered as an interpretable and controllable population;

(iii) interpretation and formal inference using explicit domain-specific knowledge and results of data mining to feed procedures of EC and infer parameters in both models and EC;

(iv) controllable space decomposition (P6) with predictive models for possible areas and directions of population migration in EC to explicitly lower uncertainty and obtain additional interpretability.

Finally, an essential feature of the proposed approach is a holistic analysis of a composite solution with possible coevolution of models (submodels within a composite model) and data processing procedures.

3. Application Examples

This section presents several practical examples where the proposed approach, patterns, or some of their elements were applied. The examples were intentionally selected from diverse problem domains to examine the generality of the approach. The considered problems are developed in separate projects which are at various stages. Problem 1 (ensemble metocean simulation) was investigated in a series of projects (see, e.g., [11, 14, 15]). Within this research, we are trying to extend model calibration and DA with EC techniques to develop more flexible and accurate multimodel ensembles. Problem 2 (clinical pathways (CPs) modelling) is important in several ongoing projects aimed at model-based decision support in healthcare (see, e.g., [16–18]). The proposed approach plays an important role by enabling deeper analysis of clinical pathways in various scenarios (interactive analysis of available CPs with identification of clusters of similar patients, DA in predictive modelling of ongoing cases, etc.). Finally, Problem 3 shows very early results of a recently started project in online social network analysis.

3.1. Problem 1: Evolution in Models for Metocean Simulation. Environmental simulation systems usually contain several numerical models serving different purposes (complementary simulation processes, improving the reliability of a system by providing alternative results, etc.). Each model can typically be described by a large number of quantitative parameters and functional characteristics that should be adjusted by an expert or by intelligent automated methods (e.g., EC). Alternative models inside the environmental simulation system can be joined in an ensemble according to a complex modeling pattern based on evolutionary computing (a combination of the P3 and P5 patterns). In the current case study, we introduce an example illustrating the ensemble concept in the forms of an alternative models ensemble, a parameter diversity ensemble, and a metaensemble. To identify the parameters of the proposed ensembles, the least squares method (in the case of model linearity) or optimization methods (in the case of nonlinearity) can be used. As we need to take into account not only the functional space $\Phi$ and the space of parameters $\Xi$ for a single model but also the optimal coexistence of models in the system (i.e., $\Sigma$), evolutionary and coevolutionary approaches seem to be applicable techniques for this task. It is worth mentioning that the coevolutionary approach can be applied to independent model realizations through an ensemble as a connecting element. In this case, the parameters (weights) in the ensemble can be estimated separately from the coevolution procedure, in a constant form or dynamically. As a case study of complex environmental modeling, we design an ensemble model that consists of the SWAN (http://swanmodel.sourceforge.net/) model for ocean wave simulation based on two different surface forcings, by NCEP (https://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.html) and ERA Interim (https://www.ecmwf.int/en/forecasts/datasets/archive-datasets/reanalysis-datasets/era-interim). Thus, different implementations of the SWAN model were connected in the form of an alternative models ensemble with least-squares-calculated coefficients defining the structure of the complex model. Two parameters, wind drag and whitecapping rate (WCR), were calibrated using evolutionary and coevolutionary algorithms implementing $\Gamma^{\Xi \to \Phi}_{D \to M}$ in P5 (for a detailed sensitivity analysis of SWAN, see [19]). The coevolutionary approach can be represented in the form of a parameter diversity ensemble, where each population is constructed as an ensemble of alternative model results with different parameters. Also, we can add ensemble weights to the model parameter diversity and get a metaensemble that can be identified within the coevolutionary approach.
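The least-squares combination of two alternative model outputs mentioned above can be illustrated with a small pure-Python sketch. The series below are invented numbers, not SWAN output, and the two-member, no-intercept form of the ensemble is an assumption made to keep the normal equations solvable by hand.

```python
import math

def least_squares_weights(a, b, y):
    """Weights (wa, wb) minimising sum((wa*a_i + wb*b_i - y_i)^2),
    obtained from the 2x2 normal equations by Cramer's rule."""
    saa = sum(x * x for x in a)
    sbb = sum(x * x for x in b)
    sab = sum(x * z for x, z in zip(a, b))
    say = sum(x * t for x, t in zip(a, y))
    sby = sum(x * t for x, t in zip(b, y))
    det = saa * sbb - sab * sab
    if det == 0:
        raise ValueError("ensemble members are collinear")
    return (say * sbb - sby * sab) / det, (sby * saa - say * sab) / det

def rmse(x, y):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)) / len(x))

# Hypothetical wave heights (m): observations and two model variants.
observed = [1.0, 1.4, 2.1, 1.8, 1.2]
swan_era = [0.8, 1.2, 1.9, 1.5, 1.0]
swan_ncep = [1.3, 1.7, 2.2, 2.2, 1.5]

w_era, w_ncep = least_squares_weights(swan_era, swan_ncep, observed)
ensemble = [w_era * a + w_ncep * b for a, b in zip(swan_era, swan_ncep)]
print(w_era, w_ncep)
print(rmse(swan_era, observed), rmse(swan_ncep, observed), rmse(ensemble, observed))
```

On the fitting data, the least-squares ensemble cannot have a larger RMSE than either member, since each member alone corresponds to a particular choice of weights.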

In the process of model identification and verification, measurements from several wave stations in the Kara Sea were used. The fitness function represents the root-mean-square error (RMSE) over all wave stations. For verification of the results, the MAE (mean absolute error) and DTW (dynamic time warping) metrics were used.
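For reference, the three metrics named above can be written compactly as follows. This is a generic sketch: the exact DTW variant (local cost, windowing, normalization) used in the study is not specified in the text, so the classic unconstrained form with absolute-difference cost is assumed, and the two series are invented.

```python
import math

def rmse(a, b):
    """Root-mean-square error between two equally long series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def mae(a, b):
    """Mean absolute error between two equally long series."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def dtw(a, b):
    """Classic dynamic time warping distance with |x - y| local cost."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # stretch b
                                    cost[i][j - 1],      # stretch a
                                    cost[i - 1][j - 1])  # advance both
    return cost[len(a)][len(b)]

observed = [1.0, 2.0, 3.0, 2.0, 1.0]
delayed = [1.0, 1.0, 2.0, 3.0, 2.0]   # same shape, shifted by one step
print(rmse(observed, delayed), mae(observed, delayed), dtw(observed, delayed))
```

The example shows why DTW complements the pointwise metrics: a pure time shift is penalised at four of the five points by RMSE and MAE, while DTW only pays for the unmatched end of the series.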

Figure 4(a) represents the surface (landscape) of RMSE in the space of the mentioned parameters (drag and WCR) for the implementations SWAN+ERA and SWAN+NCEP. It can be seen that the evolutionary-obtained results converge to the minimum of the error landscape. The landscape was obtained by running the model with all parameter variants

Figure 4: Metocean simulation: (a) error landscape for wave height simulation results using ERA and NCEP reanalysis as input data (RMSE (m) over drag and log(WCR)) and (b) Pareto frontier of coevolution results for all generations.

Figure 5: Coevolution convergence of diversity parameters ensemble for metocean models (horizontal axis: generation; vertical axis: RMSE (m)).

from the full 30x30 grid (i.e., 900 runs), while the evolutionary algorithm converged in 5 generations with 10 individuals (parameter sets) in the population (50 runs), which allows identification to be performed up to two orders of magnitude faster. The convergence of coevolution for the SWAN+ERA+NCEP case is presented in Figure 5.
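The two evaluation budgets quoted above (900 grid runs versus 5 generations of 10 individuals) can be reproduced on a toy landscape. The sketch below is not the SWAN calibration: the error function, parameter ranges, mutation scales, and the elitist scheme are all assumptions chosen only to show how the run counts arise.

```python
import random

class CountingModel:
    """Wraps an error function and counts how many times it is run."""
    def __init__(self, fn):
        self.fn = fn
        self.runs = 0

    def __call__(self, drag, log_wcr):
        self.runs += 1
        return self.fn(drag, log_wcr)

def toy_error(drag, log_wcr):
    """Hypothetical error landscape with a single minimum at (1.5, -10)."""
    return (drag - 1.5) ** 2 + 0.01 * (log_wcr + 10.0) ** 2

def grid_search(model, n=30):
    """Evaluate the model on a full n x n grid of parameter values."""
    drags = [0.5 + 2.5 * i / (n - 1) for i in range(n)]
    wcrs = [-20.0 + 20.0 * j / (n - 1) for j in range(n)]
    return min((model(d, w), (d, w)) for d in drags for w in wcrs)

def evolve(model, rng, pop_size=10, generations=5):
    """Elitist evolution: each generation costs pop_size model runs."""
    pop = [(rng.uniform(0.5, 3.0), rng.uniform(-20.0, 0.0)) for _ in range(pop_size)]
    scored = [(model(*p), p) for p in pop]
    for _ in range(generations - 1):
        children = [(d + rng.gauss(0.0, 0.2), w + rng.gauss(0.0, 2.0))
                    for _err, (d, w) in scored]
        scored = sorted(scored + [(model(*c), c) for c in children])[:pop_size]
    return min(scored)

grid_model, es_model = CountingModel(toy_error), CountingModel(toy_error)
grid_best = grid_search(grid_model)
es_best = evolve(es_model, random.Random(1))
print(grid_model.runs, es_model.runs)   # 900 and 50 model runs
print(grid_best, es_best)
```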

Although the error landscapes for the pair of implementations SWAN+ERA and SWAN+NCEP are close to each other, separate evolution does not consider optimization of the ensemble result. For this purpose, we apply the coevolutionary approach, which produces a set of Pareto-optimal solutions for each generation. Figure 4(b) shows that the error of each model in the ensemble is significant (coNCEP and coERA for the models alone), but the error of the whole ensemble (coEnsemble) converges to the minimum very fast.
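The notion of a Pareto-optimal set used above can be made concrete with a short filter over candidate solutions. This is a generic sketch, assuming two error criteria that are both minimised; the candidate values are invented and do not come from the experiments.

```python
def pareto_front(points):
    """Return the points not dominated by any other point.

    A point q dominates p when q is no worse in both criteria and differs
    from p; both criteria are assumed to be minimised.
    """
    front = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and q != p for q in points
        )
        if not dominated:
            front.append(p)
    return front

# Hypothetical (error of model A, error of model B) pairs for one generation.
candidates = [(0.1, 0.9), (0.3, 0.5), (0.4, 0.6), (0.8, 0.2), (0.9, 0.9)]
print(pareto_front(candidates))
```

In the example, (0.4, 0.6) is dominated by (0.3, 0.5) and (0.9, 0.9) by (0.1, 0.9), so three candidates remain on the frontier.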

The obtained result can be analyzed from the uncertainty reduction point of view. Optimization of model parameters helps to reduce parameter uncertainty, which can be estimated through the error function. But when we apply an ensemble approach to evolutionarily optimized results, it is appropriate to speak of a reduction of the uncertainty connected with the input data sources (NCEP and ERA) as well. Moreover, the metaensemble approach allowed a reduction of the uncertainty connected with the ensemble parameters.

Figure 6: Graph-based representation of processes space in healthcare (interactive view). (Demonstration available at https://www.youtube.com/watch?v=EH74f1w6EeY.)

Summarizing the results of the metocean case study, we can note that the EC approach shows significant efficiency gains (up to 120 times) compared with grid search, without accuracy losses. According to this experimental study, the quality of the ensemble with evolutionarily optimized models is similar to the results of the grid search: the MAE metric is equal to 0.24 m and the DTW metric to 51. Also, the coevolutionary approach provides a 10% accuracy gain compared with the results of single evolution of model implementations, but this is still similar to the ensemble result with evolutionarily optimized models. Nevertheless, the coevolutionary approach allowed a 200-fold acceleration to be achieved. Within the context of the proposed approach, space $\Phi$ was investigated using the defined structure of the model in space $\Sigma$ for the purpose of model calibration.

3.2. Problem 2: Modeling Health Care Process. Modeling healthcare processes usually involves enormous uncertainty and variability, even when modeling a single disease. One way to identify a model of such a process is PM [20]. Still, direct implementation of PM methods does not remove a major part of the uncertainty. Within the current research, we applied the proposed approach for identification purposes both in the analysis of historical cases and in the prediction of single process development. Here we consider processes of providing health care in acute coronary syndrome (ACS) cases, which is usually considered one of the major causes of death in the world. We used a set of 3434 ACS cases collected during 2010-2015 in Almazov National Medical Research Centre, one of the leading cardiological centers in Russia. The data set contains electronic health records of these patients with all registered events and characteristics of a patient.

To simplify consideration of the multidimensional space of possible processes ($\Gamma'^{\Xi \to \Sigma}_{D \to S}$, $\Gamma^{\Xi}_{S \to D}$ for analysis of $\Sigma$ on layer $S$), we introduced a graph-based representation of this space, with vertices representing cases and edges representing the proximity of cases. Analysis of such a structure enables easy discovery of common cases (e.g., as communities in the graph). Such discovery enables explicit, interpretable structuring of the space and representation of a further landscape for EC in terms of the P6 pattern. Moreover, direct interactive investigation of a visual representation of such a structure (see Figure 6) provides significant insights for medical researchers.
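A graph of this kind can be sketched in a few lines. The example below is an illustration under several assumptions: cases are encoded as strings of event labels, proximity is measured with Levenshtein distance, an edge is drawn when the distance does not exceed a chosen threshold, and connected components serve as a crude stand-in for community detection. The sequences and the threshold are invented.

```python
def edit_distance(a, b):
    """Levenshtein distance between two event-label sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,             # deletion
                           cur[j - 1] + 1,          # insertion
                           prev[j - 1] + (x != y))) # substitution
        prev = cur
    return prev[-1]

def proximity_graph(cases, max_distance):
    """Cases are vertices; edges link cases within max_distance."""
    graph = {i: set() for i in range(len(cases))}
    for i in range(len(cases)):
        for j in range(i + 1, len(cases)):
            if edit_distance(cases[i], cases[j]) <= max_distance:
                graph[i].add(j)
                graph[j].add(i)
    return graph

def components(graph):
    """Connected components, used here in place of community detection."""
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            node = stack.pop()
            group.append(node)
            for nxt in graph[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        result.append(sorted(group))
    return result

# Hypothetical cases encoded as sequences of event labels.
cases = ["AFIE", "AFNE", "AFIED", "DNFE", "DNFEE"]
print(components(proximity_graph(cases, max_distance=1)))
```

With a threshold of 1, the first three cases form one group and the last two another, which is the kind of grouping of common cases the graph representation is meant to expose.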

We have developed an evolutionary-based algorithm for pattern identification and clustering in such a representation, with two criteria to be optimized (see Figure 7). Here, processes were represented by a sequence of labels (symbols) denoting key events in the PM model. Typical patterns were then selected on the Pareto frontier. The convergence process is demonstrated in Figure 8 (the 10 best individuals from the Pareto frontier according to the integral criterion were selected). As a result, this solution may refer to the P5 pattern and operator $\Gamma^{\Xi \to \Sigma}_{D \to M}$ while discovering model structure. Figure 9 shows an example of a typical process model (i.e., a structural characteristic of the model) for one of the identified clusters. A detailed description of the approach, algorithms, and results on CP discovery, clustering, and analysis, including a performance comparison of three versions of the CP discovery algorithm, can be found in [10]. An important outcome of applying the approach here is the interpretability of the clusters and identified patterns. For example, 10 clusters and the corresponding CPs were interpreted by cardiologists from Almazov National Medical Research Centre. The obtained interpretation and the further discovery and application of the CP structure are presented in [17]. Another important benefit of such space structure discovery is lowering the uncertainty of a patient's treatment trajectory by hierarchical positioning of an evolving process (selection of a cluster and selection of a position within the cluster). For example, the discrete-event simulation model

10 Complexity

00 02 04 06 08 10 Number of non-arranged sequences (normed)

12

35

30

20

15

10

25

05

00

minus05

Leng

th o

f pat

tern

(nor

med

)

0

0

1

1

2

2

3

3

4

4

5

5

AFIFNEAFNIFEIFDNEAFENIFEENIFEDNFIEFEAFNIFEDINIFENDDFFEIDNFEEDFIFEAEFNINFDEDFNEIDFDFNEFDIEDNFNEIFDDNIEFFDEAFANIFADIFEFNIEFDEDEINFDEFDEDFFDNIFENDFIEIDAFEEDEIFEDNA

Figure 7 Pareto frontier for CP patterns discovery

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 300

Generation

10

08

06

04

02

Nor

med

sum

erro

r

Figure 8 Evolutionary convergence during CP pattern discovery

Entrance289 AD

289Discharge

died13 transferred9

home119 cure138OR

170

IC104

CC13

CD2

AD OD

4

CD

48

IC225

104 12

1

12

2 9

CD1111

230

2 39

14

172

Cluster 7

Figure 9 Example of process model showing transfers between hospitalrsquos departments

Complexity 11

Cluster 0Cluster 1Cluster 2

Cluster 3Cluster 4

Cluster 5Cluster 6

0 2 4 6 8 10 12 14

50

100

150

200

250

0

Step

Dist

ance

sym

bols

(a)

Historical CPs Synthetic CPs Target CP

(b)

0

20

40

60

80

100

Num

ber o

f syn

thet

ic C

Ps

Step0 1 2 3 4 5 6 7 8 9

(c)

Step0 1 2 3 4 5 6 7 8 9

o

f CPs

in co

rrec

t clu

ster

100

90

80

70

60

50

40

30

20

(d)

Figure 10 Evolution of synthetic CPs (a) CP population convergence (b) evolution of possible CP (demonstration available at httpswwwyoutubecomwatchv=twvfX9zKsY8) (c) number of synthetic CPs (d) of CPs in correct cluster

described in [17] provides a more appropriate length of staydistribution within simulation with discovered classes ofCPs (Kolmogorov-Smirnov statistics decreased by 51 (from0255 to 0124)
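The Kolmogorov-Smirnov statistic quoted for the length-of-stay distributions is the largest gap between two empirical cumulative distribution functions. The sketch below is a minimal pure-Python version of the two-sample statistic applied to made-up length-of-stay samples (the sample values and variable names are assumptions for illustration only); the last lines simply recompute the relative reduction from the two values reported in the text.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value <= x, then compare the CDFs.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


# Hypothetical lengths of stay (days): observed vs. two simulated variants.
observed_los = [2, 3, 5, 8, 13]
simulated_with_classes = [2, 4, 5, 9, 12]
simulated_without_classes = [10, 11, 12, 14, 15]

d_with = ks_statistic(observed_los, simulated_with_classes)
d_without = ks_statistic(observed_los, simulated_without_classes)

# Relative reduction implied by the values reported in the text.
reduction = (0.255 - 0.124) / 0.255
print(round(100 * reduction))
```

A smaller statistic means the simulated distribution follows the observed one more closely, which is the sense in which the decrease from 0.255 to 0.124 is an improvement.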

Further, we propose an algorithm to dynamically generate possible developments of the process in healthcare using the identified graph-based space representation, with evolutionary strategies assimilating incoming data (events) within a case (${\Gamma'}^{\Sigma}_{M \to D}$ in P5 and $\Gamma^{\Xi \to \Sigma}_{D}$ in P2). We consider the convergence (Figure 10(a)) of the introduced synthetic continuations of the processes to the right class (identified clusters of typical cases were used), with mapping to the graph-based space representation with proximity measures (Figure 10(b)). As a result, the appearance of the CP's events decreases the number of synthetic CPs and increases the percentage of CPs positioned in the correct cluster (see an example in Figure 10(c) and Figure 10(d), respectively). This enables interpretable positioning and lowers uncertainty in predicting the further development of a CP for a particular patient.
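The effect described above, a shrinking population of synthetic CPs and a growing share in the correct cluster as events arrive, can be mimicked with a deliberately simplified filter. The sketch below is not the evolutionary strategy of the paper: it assumes hypothetical synthetic CPs encoded as strings with hypothetical cluster labels, and it merely discards candidates that contradict the events observed so far.

```python
def assimilate(population, observed_prefix):
    """Keep only synthetic CPs consistent with the events observed so far."""
    return [cp for cp in population if cp.startswith(observed_prefix)]


def share_in_cluster(population, cluster_of, target_cluster):
    """Fraction of surviving synthetic CPs assigned to the target cluster."""
    if not population:
        return 0.0
    hits = sum(1 for cp in population if cluster_of[cp] == target_cluster)
    return hits / len(population)


# Hypothetical synthetic CPs mapped to hypothetical cluster labels.
cluster_of = {"AFIE": 0, "AFIN": 0, "AFND": 1, "ANDD": 1, "DNDE": 2}
population = list(cluster_of)
target_cp, target_cluster = "AFIE", 0

sizes, shares = [], []
for step in range(len(target_cp) + 1):
    # At each step one more event of the target CP becomes known.
    survivors = assimilate(population, target_cp[:step])
    sizes.append(len(survivors))
    shares.append(share_in_cluster(survivors, cluster_of, target_cluster))

print(sizes)
print(shares)
```

An evolutionary variant would additionally mutate and recombine the survivors at each step instead of only filtering a fixed set.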

Here a combination of patterns P2, P5, and P6 in the implementation of the proposed algorithm (see Section 2.5) enables interactive investigation of the process space and data assimilation into a population of possible continuations of a single process as it evolves. This solution can be applied in exploratory modeling and simulation of patient flow, as well as in decision support in specialized medical centers.

3.3. Problem 3: Mining Social Media. Nowadays, social media analysis (which began with static network models emphasizing the topology of connections between users) strives to explore dynamic behavioral patterns of individuals, which can be recovered from their digital traces on the web. The prediction of social media activities requires combining analytical and data-driven models, as well as identifying the optimal structure and parameters of these models according to the available data. Here we show an example of a problem in this field involving evolutionary identification of a model.

Figure 11: Distribution of posts, reposts, and comments on personal walls of subscribers of bank community (x-axis: activity count (log); y-axis: fraction of users (log)).

Figure 12: Example of process model (a) with expanded cycles (cluster 2) and (b) with collapsed cycles (cluster 1).

A digital trace of a user in an online social network (OSN) is a sequence (chain) of observed activities separated by time gaps. Each OSN supports different types of "hidden" and observable activities. For example, in the largest Russian social network, vk.com (hereafter denoted as VK), a user has a personal page (wall) with three types of activities: post (P), when a user makes a record by himself or herself; repost (R), when the user copies the record of another user or community to his or her wall; and comment (C), when the user comments on a record on his or her wall. Figure 11 illustrates the distribution of these activities for subscribers of a large Russian bank community in VK. The collected dataset consists of the last 100 (or fewer, if unavailable) entries (posts or reposts) and the comments on these entries for 8K user walls in the period January 2017–December 2017. Comments are much less common than posts and reposts. The distributions of posts and reposts are similar, but there is a group of "spreaders" with a significant number of reposts.

We applied the technique described in Section 3.2 to analyze the processes. Still, the considered process has a significantly different structure. By default, it is continuous with random repetition of events, while the healthcare process in ACS cases is finite and has a more "strong" structure. Figure 12 shows a typical process structure identified with the EC-based approach and visualized with expanded cycles (a) and with collapsed cycles (b). The second one could be considered more relevant than the first one, which is significantly affected by the length of the selected history. It is natural to consider it as a random process or state-transition model. In that case, the three identified clusters (characterized by various frequencies of transitions) could be interpreted as typical behavior models.

Figure 13: Example of process model for cluster 3 (1120 users) using n-gram analysis.

Table 1: Mean of activities' combinations for users' clusters.

Cluster  Size  CCC   CCP   CCR   CPP   CPR   CRR   PPP    PPR    PRR    RRR
1        5238  0.52  0.74  0.24  1     0.72  0.42  6.68   6.4    7.56   8.53
2        2110  0.11  0.13  0.16  0.16  0.36  0.59  2.09   3.95   12.32  60.3
3        1120  0.91  1.38  0.14  3.62  0.75  0.18  50.02  11.78  5      2.77

N-gram analysis is often used to detect patterns in people's behavior [21, 22]. It is based on counting frequencies of combinations or sequences of activities. We collected all sorted 3-grams (so-called 3-sets) for each user's sequence to analyze the frequency of event combinations. As a result, three clusters of vectors with 3-set chains were identified with the k-means clustering method. Figure 13 shows all combinations and transitions between them for cluster 3 as an example. Using Figure 12 and Table 1, it is possible to see that cluster 3 includes users who often make new records (P) and sometimes comment on records (C). So cluster 3 mostly consists of "bloggers." Cluster 2 includes "spreaders," who copy other records (R) frequently. And the biggest cluster, cluster 1, consists of people who make new records and copy other ones equally but less intensively compared to the other clusters. That may be considered typical behavior for an OSN user. N-gram analysis allows detecting typical behavioral patterns and obtaining process models for social media activities using chains of different lengths as input data. Thus, this type of data-driven modeling is more appropriate for researching continuous processes. Figure 14 shows a graph-based representation of the process space with all users' patterns.
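Counting sorted 3-grams can be illustrated as follows. The sketch assumes that a user's trace is encoded as a string over the labels P, R, and C and that a 3-set is obtained by sorting the three labels in each sliding window; the trace itself is made up. The resulting vector uses the same ten combinations that appear as columns in Table 1.

```python
from collections import Counter
from itertools import combinations_with_replacement

# The ten possible 3-sets over C, P, R in alphabetical order: CCC, CCP, ..., RRR.
THREE_SETS = ["".join(c) for c in combinations_with_replacement("CPR", 3)]


def count_three_sets(trace):
    """Count sorted 3-grams (3-sets) over a sliding window of three activities."""
    counts = Counter()
    for i in range(len(trace) - 2):
        window = trace[i:i + 3]
        counts["".join(sorted(window))] += 1
    return counts


def to_vector(counts):
    """Fixed-order feature vector suitable for a clustering method."""
    return [counts.get(name, 0) for name in THREE_SETS]


# Hypothetical activity trace of one user.
trace = "PPRPCPP"
counts = count_three_sets(trace)
vector = to_vector(counts)
print(THREE_SETS)
print(vector)
```

Vectors of this form, one per user, are the kind of input that a k-means implementation would then group into clusters.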

This subsection provides very early results. The next steps in applying the proposed approach to this problem include (a) extending the process model structure with temporal labeling (gaps between events), (b) considering the process within a sliding time window to obtain more structured processes, (c) linking the model with causal inference, and (d) introducing DM techniques for EC positioning of ongoing processes in the model space. We believe that these extensions could enhance discovery of model structure (P4) and provide deeper insight into social media activity.

Figure 14: Graph-based representation of processes' space for three clusters in social media activity.

4. Conclusion and Future Work

The development of the proposed approach is still an ongoing project. We aim for further systematization and detailing of the proposed concepts, methods, and algorithms, as well as more comprehensive and deeper implementation of EC-based applications. Further work includes the following directions:

(i) dualization on the role of data-driven and intelligent operations in the proposed approach and described patterns;

(ii) extended analysis of various EC techniques applicable within the approach;

(iii) investigation of EC-based discovery for models of complex systems with lacking or inconsistent observations;

(iv) detailed formalization of expertise and knowledge-based methods within the approach;

(v) extending the approach with interactive user-centered modelling and phase space analysis;

(vi) development of a multilayered approach for decision support and control of the system and process $S$, available data $D$, and complex model $M$.

Data Availability

The data used in the examples presented in Section 3 were initially obtained within complementary projects performed by the authors of the paper. The data are available from the corresponding author upon request, after an explicit statement of the purpose and plan of the requested data usage, to check for possible violation of the corresponding projects' rules.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This paper presents an extension and further development of the work [23]. This research is financially supported by The Russian Scientific Foundation, Agreement 14-11-00823 (15.07.2014).

References

[1] N. Boccara, Modeling Complex Systems, Graduate Texts in Physics, Springer, New York, Second edition, 2010.

[2] H. McManus and D. Hastings, "A framework for understanding uncertainty and its mitigation and exploitation in complex systems," IEEE Engineering Management Review, vol. 34, no. 3, pp. 81–94, 2006.

[3] W. Walker, P. Harremoes, J. Rotmans et al., "Defining uncertainty: a conceptual basis for uncertainty management in model-based decision support," Integrated Assessment, vol. 4, no. 1, pp. 5–17, 2003.

[4] J. Yan and J. R. Deller, "NARMAX model identification using a set-theoretic evolutionary approach," Signal Processing, vol. 123, pp. 30–41, 2016.

[5] I. G. Kevrekidis, C. W. Gear, and G. Hummer, "Equation-free: The computer-aided analysis of complex multiscale systems," AIChE Journal, vol. 50, no. 7, pp. 1346–1355, 2004.

[6] H. Ihshaish, A. Cortes, and M. A. Senar, "Parallel Multi-level Genetic Ensemble for Numerical Weather Prediction Enhancement," Procedia Computer Science, vol. 9, pp. 276–285, 2012.

[7] G. Dumedah, "Formulation of the Evolutionary-Based Data Assimilation and its Implementation in Hydrological Forecasting," Water Resources Management, vol. 26, no. 13, pp. 3853–3870, 2012.

[8] V. V. Kashirin, A. A. Lantseva, S. V. Ivanov, S. V. Kovalchuk, and A. Boukhanovsky, "Evolutionary simulation of complex networks' structures with specific functional properties," Journal of Applied Logic, vol. 24, part A, pp. 39–49, 2017.

[9] S. V. Kovalchuk, P. A. Smirnov, K. V. Knyazkov, A. S. Zagarskikh, and A. V. Boukhanovsky, "Knowledge-Based Expressive Technologies Within Cloud Computing Environments," in Practical Applications of Intelligent Systems, vol. 279 of Advances in Intelligent Systems and Computing, pp. 1–11, Springer, Berlin, Germany, 2014.

[10] A. A. Funkner, A. N. Yakovlev, and S. V. Kovalchuk, "Towards evolutionary discovery of typical clinical pathways in electronic health records," Procedia Computer Science, vol. 119, pp. 234–244, 2017.

[11] S. V. Kovalchuk and A. V. Boukhanovsky, "Towards Ensemble Simulation of Complex Systems," Procedia Computer Science, vol. 51, pp. 532–541, 2015.

[12] S. V. Kovalchuk, A. V. Krikunov, K. V. Knyazkov, and A. V. Boukhanovsky, "Classification issues within ensemble-based simulation: application to surge floods forecasting," Stochastic Environmental Research and Risk Assessment, vol. 31, no. 5, pp. 1183–1197, 2017.

[13] D. Fogel, "Phenotypes, genotypes, and operators in evolutionary computation," in Proceedings of the 1995 IEEE International Conference on Evolutionary Computation, p. 193, Perth, WA, Australia.

[14] S. V. Ivanov, S. V. Kovalchuk, and A. V. Boukhanovsky, "Workflow-based Collaborative Decision Support for Flood Management Systems," Procedia Computer Science, vol. 18, pp. 2213–2222, 2013.

[15] A. Gusarov, A. Kalyuzhnaya, and A. Boukhanovsky, "Spatially adaptive ensemble optimal interpolation of in-situ observations into numerical vector field models," in Proceedings of the 6th International Young Scientist Conference on Computational Science, YSC 2017, pp. 325–333, Finland, November 2017.

[16] A. V. Krikunov, E. V. Bolgova, E. Krotov, T. M. Abuhay, A. N. Yakovlev, and S. V. Kovalchuk, "Complex data-driven predictive modeling in personalized clinical decision support for Acute Coronary Syndrome episodes," in Proceedings of the International Conference on Computational Science, ICCS 2016, pp. 518–529, USA, June 2016.

[17] S. V. Kovalchuk, A. A. Funkner, O. G. Metsker, and A. N. Yakovlev, "Simulation of patient flow in multiple healthcare units using process and data mining techniques for model identification," Journal of Biomedical Informatics, vol. 82, pp. 128–142, 2018.

[18] A. Yakovlev, O. Metsker, S. Kovalchuk, and E. Bologova, "Prediction of in-hospital mortality and length of stay in acute coronary syndrome patients using machine-learning methods," Journal of the American College of Cardiology, vol. 71, no. 11, p. A242, 2018.

[19] A. Nikishova, A. Kalyuzhnaya, A. Boukhanovsky, and A. Hoekstra, "Uncertainty quantification and sensitivity analysis applied to the wind wave model SWAN," Environmental Modelling & Software, vol. 95, pp. 344–357, 2017.

[20] E. Rojas, J. Munoz-Gama, M. Sepulveda, and D. Capurro, "Process mining in healthcare: A literature review," Journal of Biomedical Informatics, vol. 61, pp. 224–236, 2016.

[21] T. Sinha, P. Jermann, N. Li, and P. Dillenbourg, "Your click decides your fate: Inferring Information Processing and Attrition Behavior from MOOC Video Clickstream Interactions," in Proceedings of the EMNLP 2014 Workshop on Analysis of Large Scale Social Interaction in MOOCs, pp. 3–14, Doha, Qatar, October 2014.

[22] C. Marceau, "Characterizing the Behavior of a Program Using Multiple-Length N-Grams," Defense Technical Information Center, 2005.

[23] S. V. Kovalchuk, O. G. Metsker, A. A. Funkner et al., "Towards management of complex modeling through a hybrid evolutionary identification," in Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 255-256, Kyoto, Japan, July 2018.


(v) complicated mapping of genotype to phenotype space.

To overcome these issues, the proposed approach involves two options. First, intelligent procedures may be used to tune EC hyperparameters (P5); predict features of genotype-phenotype mapping, boundaries, etc. (P4); and discover interpretable states and filters (for system, data, and model) to control convergence and adaptation of the population (P2, P4, and P5) in an interpretable and reproducible (through the defined control procedure) manner. Second, the composite model may use various approaches, methods, and elements to obtain better quality and performance of the solution:

(i) surrogate models (P2, P4, P5), which may increase performance (for example, within preliminary and intermediate optimization steps);

(ii) ensemble models (P3), which may be considered as an interpretable and controllable population;

(iii) interpretation and formal inference using explicit domain-specific knowledge and results of data mining to feed procedures of EC and infer parameters in both models and EC;

(iv) controllable space decomposition (P6) with predictive models for possible areas and directions of population migration in EC to explicitly lower uncertainty and obtain additional interpretability.

Finally, an essential feature of the proposed approach is a holistic analysis of a composite solution with possible coevolution of models (submodels within a composite model) and data processing procedures.

3. Application Examples

This section presents several practical examples where the proposed approach, patterns, or some of their elements were applied. The examples were intentionally selected from diverse problem domains to examine the generality of the approach. The considered problems are developed in separate projects, which are at various stages. Problem 1 (ensemble metocean simulation) was investigated in a series of projects (see, e.g., [11, 14, 15]). Within this research, we are trying to extend model calibration and DA with EC techniques to develop more flexible and accurate multimodel ensembles. Problem 2 (clinical pathways (CPs) modelling) is important in several ongoing projects aimed at model-based decision support in healthcare (see, e.g., [16–18]). The proposed approach plays an important role by enabling deeper analysis of clinical pathways in various scenarios (interactive analysis of available CPs with identification of clusters of similar patients, DA in predictive modelling of ongoing cases, etc.). Finally, Problem 3 shows very early results of a recently started project in online social network analysis.

3.1. Problem 1: Evolution in Models for Metocean Simulation. Environmental simulation systems usually contain several numerical models serving different purposes (complementary simulation processes, improving the reliability of a system by providing alternative results, etc.). Each model can typically be described by a large number of quantitative parameters and functional characteristics that should be adjusted by an expert or by intelligent automated methods (e.g., EC). Alternative models inside the environmental simulation system can be joined in an ensemble according to the complex modeling pattern based on evolutionary computing (a combination of the P3 and P5 patterns). In the current case study, we introduce an example illustrating the ensemble concept in the forms of an alternative models ensemble, a parameter diversity ensemble, and a metaensemble. To identify the parameters of the proposed ensembles, the least squares method (in the case of model linearity) or optimization methods (in the case of nonlinearity) can be used. As we need to take into account not only the functional space $\Phi$ and the space of parameters $\Xi$ for a single model but also the optimal coexistence of models in the system (i.e., $\Sigma$), evolutionary and coevolutionary approaches seem to be applicable techniques for this task. It is worth mentioning that the coevolutionary approach can be applied to independent model realizations through an ensemble as a connecting element. In this case, the parameters (weights) of the ensemble can be estimated separately from the coevolution procedure, in a constant form or dynamically.

As a case study of complex environmental modeling, we designed an ensemble model that consists of the SWAN (http://swanmodel.sourceforge.net) model for ocean wave simulation driven by two different surface forcings, from NCEP (https://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.html) and ERA Interim (https://www.ecmwf.int/en/forecasts/datasets/archive-datasets/reanalysis-datasets/era-interim). Thus, different implementations of the SWAN model were connected in the form of an alternative models ensemble with least-squares-calculated coefficients defining the structure of the complex model. Two parameters, wind drag and whitecapping rate (WCR), were calibrated using evolutionary and coevolutionary algorithms implementing $\Gamma^{\Xi \to \Phi}_{D \to M}$ in P5 (for a detailed sensitivity analysis of SWAN, see [19]). The coevolutionary approach can be represented in the form of a parameter diversity ensemble, where each population is constructed as an ensemble of alternative model results with different parameters. We can also add ensemble weights to the model parameter diversity and obtain a metaensemble that can be identified within the coevolutionary approach.
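The least-squares coefficients of an alternative models ensemble can be illustrated for two members. The sketch below is a minimal example under stated assumptions: the two series of simulated wave heights (named swan_ncep and swan_era purely for illustration) and the observations are made up, and the weights are found by solving the 2x2 normal equations directly. It is not the authors' implementation.

```python
def ensemble_weights(x1, x2, obs):
    """Least-squares weights (w1, w2) minimizing sum((obs - w1*x1 - w2*x2)**2)."""
    a11 = sum(a * a for a in x1)
    a12 = sum(a * b for a, b in zip(x1, x2))
    a22 = sum(b * b for b in x2)
    b1 = sum(a * o for a, o in zip(x1, obs))
    b2 = sum(b * o for b, o in zip(x2, obs))
    det = a11 * a22 - a12 * a12
    if det == 0:
        raise ValueError("model outputs are collinear; weights are not unique")
    # Cramer's rule for the 2x2 normal equations.
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det


# Hypothetical wave heights (m) from two model implementations and observations.
swan_ncep = [1.0, 2.0, 3.0, 4.0]
swan_era = [2.0, 1.0, 4.0, 3.0]
observed = [1.5, 1.5, 3.5, 3.5]

w_ncep, w_era = ensemble_weights(swan_ncep, swan_era, observed)
ensemble = [w_ncep * a + w_era * b for a, b in zip(swan_ncep, swan_era)]
print(w_ncep, w_era)
```

With more than two members the same normal equations would be solved with a linear algebra routine rather than by hand.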

In the process of model identification and verification, measurements from several wave stations in the Kara Sea were used. The fitness function represents the mean error (RMSE) for all wave stations. For verification of the results, the MAE (mean absolute error) and DTW (dynamic time warping) metrics were used.
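For reference, the three metrics named above can be written compactly. The sketch below uses textbook definitions with an absolute-difference local cost for DTW and no window constraint; whether the study used exactly these variants is an assumption, and the two series are made up.

```python
import math


def rmse(obs, sim):
    """Root mean square error between equally long series."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def mae(obs, sim):
    """Mean absolute error between equally long series."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


def dtw(a, b):
    """Dynamic time warping distance with absolute difference as local cost."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[-1][-1]


# Hypothetical wave heights (m): the simulation lags the observation by one step.
observed = [1.0, 2.0, 3.0, 2.0]
simulated = [1.0, 1.0, 2.0, 3.0]

print(rmse(observed, simulated), mae(observed, simulated), dtw(observed, simulated))
```

In this toy case the pointwise metrics penalize the one-step lag at three of the four points, while DTW absorbs most of it through warping, which is why it is a useful complement for time-shifted series.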

Figure 4(a) represents the surface (landscape) of RMSE in the space of the parameters mentioned above (drag and WCR) for the implementations SWAN+ERA and SWAN+NCEP. It can be seen that the evolutionary-obtained results converge to the minimum of the error landscape. The landscape was obtained by running the model with all parameter variants from the full 30x30 grid (i.e., 900 runs), while the evolutionary algorithm converged in 5 generations with 10 individuals (parameter sets) in the population (50 runs), which allows performing identification two orders faster. The convergence of coevolution for the SWAN+ERA+NCEP case is presented in Figure 5.

Figure 4: Metocean simulation: (a) error landscape for wave height simulation results using ERA and NCEP reanalysis as input data and (b) Pareto frontier of coevolution results for all generations.

Figure 5: Coevolution convergence of diversity parameters ensemble for metocean models (x-axis: generation; y-axis: RMSE (m)).
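The difference in the number of model runs between the two strategies can be reproduced on a toy problem. In the sketch below everything except the run budget (a 30x30 grid versus 5 generations of 10 individuals) is an assumption: the error surface is a hypothetical quadratic bowl standing in for a SWAN run, the parameter ranges are arbitrary, and the evolutionary scheme is a generic elitist one with Gaussian mutation rather than the algorithm used in the study.

```python
import random


def toy_error(drag, log_wcr):
    """Hypothetical smooth error surface standing in for one model run."""
    return (drag - 1.5) ** 2 + 0.01 * (log_wcr + 10.0) ** 2


def grid_search(n=30):
    """Evaluate every node of an n x n grid and count the runs."""
    runs, best = 0, None
    for i in range(n):
        for j in range(n):
            drag = 0.5 + 2.5 * i / (n - 1)
            log_wcr = -20.0 + 20.0 * j / (n - 1)
            err = toy_error(drag, log_wcr)
            runs += 1
            if best is None or err < best:
                best = err
    return best, runs


def evolve(generations=5, pop_size=10, seed=0):
    """Elitist evolutionary search; each generation costs pop_size runs."""
    rng = random.Random(seed)
    runs = 0

    def evaluate(ind):
        nonlocal runs
        runs += 1
        return toy_error(*ind)

    pop = [(rng.uniform(0.5, 3.0), rng.uniform(-20.0, 0.0)) for _ in range(pop_size)]
    scored = sorted((evaluate(ind), ind) for ind in pop)
    history = [scored[0][0]]
    for _ in range(generations - 1):
        parents = [ind for _, ind in scored[: pop_size // 2]]
        children = []
        for _ in range(pop_size):
            drag, log_wcr = rng.choice(parents)
            # Gaussian mutation, clipped to the (assumed) parameter ranges.
            children.append((min(3.0, max(0.5, drag + rng.gauss(0, 0.2))),
                             min(0.0, max(-20.0, log_wcr + rng.gauss(0, 2.0)))))
        scored = sorted(scored + [(evaluate(c), c) for c in children])[:pop_size]
        history.append(scored[0][0])
    return scored[0][0], runs, history


grid_best, grid_runs = grid_search()
ea_best, ea_runs, history = evolve()
print(grid_runs, ea_runs, grid_runs / ea_runs)
```

The sketch only compares run counts (900 against 50 here); wall-clock gains in a real system depend on how the runs are scheduled and on the cost of each simulation.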

Although the error landscapes for the pair of implementations SWAN+ERA and SWAN+NCEP are close to each other, separate evolution does not consider optimization of the ensemble result. For this purpose, we apply a coevolutionary approach that produces a set of Pareto-optimal solutions for each generation. Figure 4(b) shows that the error of each model in the ensemble is significant (coNCEP and coERA for the models alone), but the error of the whole ensemble (coEnsemble) converges to the minimum very fast.
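Selecting Pareto-optimal solutions from a generation amounts to discarding every candidate that is dominated by another one. The sketch below shows this with all objectives minimized; the error triples (for the two models alone and for the ensemble) are made-up numbers, not results from the study.

```python
def pareto_front(points):
    """Return the points not dominated by any other (all objectives minimized)."""
    front = []
    for p in points:
        # q dominates p if it is no worse in every objective and differs from p.
        dominated = any(
            q != p and all(q_k <= p_k for q_k, p_k in zip(q, p))
            for q in points
        )
        if not dominated:
            front.append(p)
    return front


# Hypothetical errors per candidate: (model A alone, model B alone, ensemble).
candidates = [
    (1.4, 1.2, 0.9),
    (1.2, 1.4, 0.9),
    (1.5, 1.5, 1.0),
    (1.3, 1.3, 0.8),
]

front = pareto_front(candidates)
print(front)
```

Here the third candidate is dropped because the first one is at least as good in every objective, while the remaining three represent different trade-offs.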

The obtained result can be analyzed from the point of view of uncertainty reduction. Optimization of model parameters helps to reduce parameter uncertainty, which can be estimated through the error function. But when we apply an ensemble approach to evolutionary optimized results, it is appropriate to speak of reducing the uncertainty connected with the input data sources (NCEP and ERA) as well. Moreover, the metaensemble approach allowed reducing the uncertainty connected with ensemble parameters.

Figure 6: Graph-based representation of processes space in healthcare (interactive view). (Demonstration available at https://www.youtube.com/watch?v=EH74f1w6EeY.)

Summarizing the results of the metocean case study, we note that the EC approach is significantly more efficient, up to 120 times, compared with grid search, without accuracy losses. According to this experimental study, the quality of the ensemble with evolutionary optimized models is similar to the results of the grid search: the MAE metric is equal to 0.24 m and the DTW metric to 51. We can also mention that the coevolutionary approach provides a 10% accuracy gain compared with the results of single evolution of model implementations, but this is still similar to the ensemble result with evolutionary optimized models. Nevertheless, the coevolutionary approach allowed achieving a 200-fold acceleration. Within the context of the proposed approach, the space $\Phi$ was investigated using a defined structure of the model in space $\Sigma$ for the purpose of model calibration.

32 Problem 2 Modeling Health Care Process Modelinghealthcare processes are usually related to the enormousuncertainty and variability even when modeling single dis-ease One of the ways to identify a model of such processis PM [20] Still direct implementation of PM methodsdoes not remove a major part of the uncertainty Withincurrent research we applied the proposed approach foridentification purposes both in the analysis of historicalcases and prediction of single process development Here weconsider processes of providing health care in acute coronarysyndrome (ACS) cases which is usually considered as one ofthe major death causes in the world We used a set of 3434ACS cases collected during 2010-2015 in Almazov NationalMedical Research Centre one of the leading cardiologicalcenters in Russia The data set contains electronic healthrecords of these patients with all registered events andcharacteristics of a patient

To simplify consideration of multidimensional space ofpossible processes (Γ1015840Ξ997888rarrΣ

119863997888rarr119878ΓΞ119878997888rarr119863

for analysis of Σ on layer 119878)

we introduced graph-based representation of this space withvertices representing cases and edges representing proximityof cases Analysis of such structure enables easy discoveringof common cases (eg as communities in graph) Suchdiscovering enables explicit interpretable structuring of thespace and representation of further landscape for EC in termsof P6 pattern Moreover direct interactive investigation ofvisual representation of such structure (see Figure 6) providessignificant insights for medical researchers

We have developed evolutionary-based algorithm forpatterns identification and clustering in such representationwith two criteria to be optimized (see Figure 7) Hereprocesses were represented by a sequence of labels (symbols)denoting key events in PM model Typical patterns werethen selected for Pareto frontier The convergence process isdemonstrated in Figure 8 (10 best individuals from Paretofrontier according to the integral criterion were selected) Asa result this solution may refer to P5 pattern and operatorΓΞ997888rarrΣ119863997888rarr119872

while discovering model structure Figure 9 shows anexample of typical process model (ie structural characteris-tic of the model) for one of the identified clusters Detaileddescription of the approach algorithms and results on CPsdiscovering clustering and analysis including comparison ofthree version of CP discovery algorithms with performancecomparison can be found in [10] An important outcomeof the approach being applied in this application is inter-pretability of the clusters and identified patterns For example10 clusters and corresponding CPs obtained interpretationby cardiologists from Almazov National Medical ResearchCentre The obtained interpretation and further discoveringand application with CP structure are presented in [17]Another important benefit given by such space structurediscovering is lowering uncertainty of patientrsquos treatmenttrajectory by a hierarchical positioning of an evolved process(selection of a cluster and selection of position withinthe cluster) For example discrete-event simulation model

10 Complexity

00 02 04 06 08 10 Number of non-arranged sequences (normed)

12

35

30

20

15

10

25

05

00

minus05

Leng

th o

f pat

tern

(nor

med

)

0

0

1

1

2

2

3

3

4

4

5

5

AFIFNEAFNIFEIFDNEAFENIFEENIFEDNFIEFEAFNIFEDINIFENDDFFEIDNFEEDFIFEAEFNINFDEDFNEIDFDFNEFDIEDNFNEIFDDNIEFFDEAFANIFADIFEFNIEFDEDEINFDEFDEDFFDNIFENDFIEIDAFEEDEIFEDNA

Figure 7 Pareto frontier for CP patterns discovery

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 300

Generation

10

08

06

04

02

Nor

med

sum

erro

r

Figure 8 Evolutionary convergence during CP pattern discovery

Entrance289 AD

289Discharge

died13 transferred9

home119 cure138OR

170

IC104

CC13

CD2

AD OD

4

CD

48

IC225

104 12

1

12

2 9

CD1111

230

2 39

14

172

Cluster 7

Figure 9 Example of process model showing transfers between hospitalrsquos departments

Complexity 11

Cluster 0Cluster 1Cluster 2

Cluster 3Cluster 4

Cluster 5Cluster 6

0 2 4 6 8 10 12 14

50

100

150

200

250

0

Step

Dist

ance

sym

bols

(a)

Historical CPs Synthetic CPs Target CP

(b)

0

20

40

60

80

100

Num

ber o

f syn

thet

ic C

Ps

Step0 1 2 3 4 5 6 7 8 9

(c)

Step0 1 2 3 4 5 6 7 8 9

o

f CPs

in co

rrec

t clu

ster

100

90

80

70

60

50

40

30

20

(d)

Figure 10 Evolution of synthetic CPs (a) CP population convergence (b) evolution of possible CP (demonstration available at httpswwwyoutubecomwatchv=twvfX9zKsY8) (c) number of synthetic CPs (d) of CPs in correct cluster

described in [17] provides a more appropriate length of staydistribution within simulation with discovered classes ofCPs (Kolmogorov-Smirnov statistics decreased by 51 (from0255 to 0124)

Furthermore, we propose an algorithm to dynamically generate possible developments of a healthcare process using the identified graph-based space representation, with evolutionary strategies assimilating incoming data (events) within a case ($\Gamma'^{\Sigma}_{M \to D}$ in P5 and $\Gamma^{\Xi \to \Sigma}_{D}$ in P2). We consider convergence (Figure 10(a)) of the introduced synthetic continuations of the processes to the right class (identified clusters of typical cases were used), with mapping to the graph-based space representation with proximity measures (Figure 10(b)). As a result, the appearance of the CP's events decreases the number of synthetic CPs and increases the percentage of CPs positioned in the correct cluster (see an example in Figure 10(c) and Figure 10(d), respectively). This enables interpretable positioning and lowers uncertainty in predicting the further development of a CP for a particular patient.
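The effect described above (fewer synthetic CPs and better cluster positioning as events arrive) can be sketched in a simplified form. The Python code below is not the authors' algorithm: it replaces the evolutionary operators with plain prefix filtering, and it assumes that proximity is measured by edit distance between sequences of event labels. All names (`edit_distance`, `filter_population`, `nearest_cluster`) and all sequences are hypothetical.

```python
def edit_distance(s, t):
    """Levenshtein distance between two event-label sequences."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def filter_population(population, observed_prefix):
    """Keep only synthetic CPs consistent with the events seen so far."""
    return [cp for cp in population if cp.startswith(observed_prefix)]


def nearest_cluster(cp, typical_patterns):
    """Return the cluster whose typical pattern is closest to cp."""
    return min(typical_patterns,
               key=lambda c: edit_distance(cp, typical_patterns[c]))


# Hypothetical typical patterns per cluster and a synthetic population.
typical = {0: "AFIE", 1: "AFNIFE", 2: "ADE"}
population = ["AFIE", "AFNIFE", "AFNE", "ADE", "ADIE"]

after_two_events = filter_population(population, "AF")
after_three_events = filter_population(population, "AFN")
cluster_of_candidate = nearest_cluster("AFNE", typical)
```

Each new observed event shrinks the set of admissible continuations, which mirrors the decreasing number of synthetic CPs in Figure 10(c).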

Here, a combination of patterns P2, P5, and P6 in the implementation of the proposed algorithm (see Section 2.5) enables interactive investigation of the process space and data assimilation into a population of possible continuations of a single process as it evolves. This solution can be applied in exploratory modeling and simulation of patient flow, as well as for decision support in specialized medical centers.

3.3. Problem 3: Mining Social Media. Nowadays, social media analysis (which began with static network models emphasizing the topology of connections between users) strives to explore dynamic behavioral patterns of individuals, which can be recovered from their digital traces on the web. The prediction of social media activities requires combining analytical and data-driven models as well as identifying the optimal structure and parameters of these models according to the available data. Here we show an example of a problem in this field involving evolutionary identification of a model.

Figure 11: Distribution of posts, reposts, and comments on personal walls of subscribers of bank community (fraction of users (log) by activity count (log)).

Figure 12: Example of process model (a) with expanded cycles (Cluster 2) and (b) with collapsed cycles (Cluster 1).

A digital trace of a user in an online social network (OSN) is a sequence (chain) of observed activities separated by time gaps. Each OSN supports different types of “hidden” and observable activities. For example, in the largest Russian social network, vk.com (hereafter denoted as VK), a user has a personal page (wall) with three types of activities: post (P), when a user creates a record themselves; repost (R), when the user copies a record of another user or community to his or her wall; and comment (C), when the user comments on a record on his or her wall. Figure 11 illustrates the distribution of these activities for subscribers of a large Russian bank community in VK. The collected dataset consists of the last 100 (or fewer, if unavailable) entries (posts or reposts) and the comments on these entries for 8K user walls in the period January 2017–December 2017. Comments are much less common than posts and reposts. The distributions of posts and reposts are similar, but there is a group of “spreaders” with a significant number of reposts.

We applied the technique described in Section 3.2 to analyze the processes. Still, the considered process has a significantly different structure. By default, it is continuous with random repetition of events, while the healthcare process in ACS cases has a finite and more “strong” structure. Figure 12 shows a typical process structure identified with the EC-based approach

Figure 13: Example of process model for cluster 3 (1120 users) using n-gram analysis.

Table 1: Mean of activities' combinations for users' clusters.

Cluster  Size  CCC   CCP   CCR   CPP   CPR   CRR   PPP    PPR    PRR    RRR
1        5238  0.52  0.74  0.24  1     0.72  0.42  6.68   6.4    7.56   8.53
2        2110  0.11  0.13  0.16  0.16  0.36  0.59  2.09   3.95   12.32  60.3
3        1120  0.91  1.38  0.14  3.62  0.75  0.18  50.02  11.78  5      2.77

and visualized with expanded cycles (a) and with collapsed cycles (b). The second one could be considered more relevant than the first one, which is significantly affected by the length of the selected history. It is natural to consider it as a random process or state-transition model. In that case, the three identified clusters (characterized by various frequencies of transitions) could be interpreted as typical behavior models.
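The state-transition reading of the process can be made concrete with a small sketch. The Python code below estimates first-order transition frequencies between activity labels from a set of chains; it is an illustration under the assumption of a first-order model, and the function name `transition_probabilities` and the chains are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate first-order transition probabilities between activity labels."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count every pair of consecutive activities in the chain.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(followers.values())
                for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


# Hypothetical activity chains: P = post, R = repost, C = comment.
chains = ["PPRP", "PRRR", "PPC"]
probs = transition_probabilities(chains)
```

Clusters of users would then differ in these transition frequencies, which is what allows them to be interpreted as distinct behavior models.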

N-gram analysis is often used to detect patterns in people's behavior [21, 22]. It is based on counting the frequencies of combinations or sequences of activities. We collected all sorted 3-grams (so-called 3-sets) for each user's sequence to analyze the frequency of event combinations. As a result, three clusters of vectors with 3-set chains were identified with the k-means clustering method. Figure 13 shows all combinations and transitions between them for cluster 3 as an example. Using Figure 12 and Table 1, it is possible to see that cluster 3 includes users who often make new records (P) and sometimes comment on records (C). So, cluster 3 mostly consists of “bloggers.” Cluster 2 includes “spreaders” who copy other records (R) frequently. The biggest cluster, cluster 1, consists of people who make new records and copy other ones equally often but less intensively than users in the other clusters. That may be considered typical behavior for an OSN user. N-gram analysis allows detecting typical behavioral patterns and obtaining process models for social media activities using chains of different lengths as input data. Thus, this type of data-driven modeling is more appropriate for researching continuous processes. Figure 14 shows a graph-based representation of the process space with all users' patterns.
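The 3-set counting and clustering steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a sliding window of length 3 over each chain, and it uses a bare-bones Lloyd's k-means seeded with the first k vectors instead of a library routine. The names `three_set_vector` and `k_means` and the user chains are hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement

LABELS = "CPR"  # comment, post, repost
# All 10 sorted 3-sets: CCC, CCP, CCR, CPP, CPR, CRR, PPP, PPR, PRR, RRR.
THREE_SETS = ["".join(c) for c in combinations_with_replacement(LABELS, 3)]


def three_set_vector(sequence):
    """Count sorted 3-grams (3-sets) over a sliding window of length 3."""
    counts = Counter("".join(sorted(sequence[i:i + 3]))
                     for i in range(len(sequence) - 2))
    return [counts[s] for s in THREE_SETS]


def k_means(vectors, k, iterations=20):
    """Plain Lloyd's algorithm; the first k vectors seed the centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2
                                  for x, y in zip(v, centroids[c])))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels


# Hypothetical user chains: two "blogger"-like and two "spreader"-like users.
users = ["PPPPPP", "RRRRRR", "PPPPPC", "RRRRRP"]
vectors = [three_set_vector(u) for u in users]
labels = k_means(vectors, k=2)
```

Sorting each window before counting is what turns ordered 3-grams into order-insensitive 3-sets, so that, for example, PRP and PPR contribute to the same column.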

This subsection provides very early results. The next steps in applying the proposed approach here include extending the process model structure by (a) temporal labeling (gaps between events); (b) considering a process within a sliding time window to get more structured processes; (c) linking the model with causal inference; and (d) introducing DM techniques for EC positioning of ongoing processes in model space. We believe that these extensions could enhance discovery of model structure (P4) and provide deeper insight into social media activity.

4. Conclusion and Future Work

The development of the proposed approach is still an ongoing project. We aim for further systematization and detailing of the proposed concepts, methods, and algorithms, as well as a more comprehensive and deeper implementation of EC-based applications. Further work includes the following directions:

Figure 14: Graph-based representation of processes' space for three clusters in social media activity.

(i) dualization on the role of data-driven and intelligent operations in the proposed approach and described patterns;

(ii) extended analysis of various EC techniques applicable within the approach;

(iii) investigation of EC-based discovery for models of complex systems with lacking or inconsistent observations;

(iv) detailed formalization of expertise and knowledge-based methods within the approach;

(v) extending the approach with interactive user-centered modelling and phase space analysis;

(vi) development of a multilayered approach for decision support and control of the system and process $S$, available data $D$, and complex model $M$.

Data Availability

The data used in the examples presented in Section 3 were initially obtained within complementary projects performed by the authors of the paper. The data are available from the corresponding author upon request, after an explicit statement of the purpose and plan of the requested data usage, to check for possible violation of the corresponding projects' rules.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This paper presents an extension and further development of the work [23]. This research is financially supported by The Russian Scientific Foundation, Agreement 14-11-00823 (15.07.2014).

References

[1] N. Boccara, Modeling Complex Systems, Graduate Texts in Physics, Springer, New York, Second edition, 2010.

[2] H. McManus and D. Hastings, “A framework for understanding uncertainty and its mitigation and exploitation in complex systems,” IEEE Engineering Management Review, vol. 34, no. 3, pp. 81–94, 2006.

[3] W. Walker, P. Harremoes, J. Rotmans et al., “Defining uncertainty: a conceptual basis for uncertainty management in model-based decision support,” Integrated Assessment, vol. 4, no. 1, pp. 5–17, 2003.

[4] J. Yan and J. R. Deller, “NARMAX model identification using a set-theoretic evolutionary approach,” Signal Processing, vol. 123, pp. 30–41, 2016.

[5] I. G. Kevrekidis, C. W. Gear, and G. Hummer, “Equation-free: The computer-aided analysis of complex multiscale systems,” AIChE Journal, vol. 50, no. 7, pp. 1346–1355, 2004.

[6] H. Ihshaish, A. Cortes, and M. A. Senar, “Parallel Multi-level Genetic Ensemble for Numerical Weather Prediction Enhancement,” Procedia Computer Science, vol. 9, pp. 276–285, 2012.

[7] G. Dumedah, “Formulation of the Evolutionary-Based Data Assimilation and its Implementation in Hydrological Forecasting,” Water Resources Management, vol. 26, no. 13, pp. 3853–3870, 2012.

[8] V. V. Kashirin, A. A. Lantseva, S. V. Ivanov, S. V. Kovalchuk, and A. Boukhanovsky, “Evolutionary simulation of complex networks' structures with specific functional properties,” Journal of Applied Logic, vol. 24, no. part A, pp. 39–49, 2017.

[9] S. V. Kovalchuk, P. A. Smirnov, K. V. Knyazkov, A. S. Zagarskikh, and A. V. Boukhanovsky, “Knowledge-Based Expressive Technologies Within Cloud Computing Environments,” in Practical Applications of Intelligent Systems, vol. 279 of Advances in Intelligent Systems and Computing, pp. 1–11, Springer, Berlin, Germany, 2014.

[10] A. A. Funkner, A. N. Yakovlev, and S. V. Kovalchuk, “Towards evolutionary discovery of typical clinical pathways in electronic health records,” Procedia Computer Science, vol. 119, pp. 234–244, 2017.

[11] S. V. Kovalchuk and A. V. Boukhanovsky, “Towards Ensemble Simulation of Complex Systems,” Procedia Computer Science, vol. 51, pp. 532–541, 2015.


[12] S. V. Kovalchuk, A. V. Krikunov, K. V. Knyazkov, and A. V. Boukhanovsky, “Classification issues within ensemble-based simulation: application to surge floods forecasting,” Stochastic Environmental Research and Risk Assessment, vol. 31, no. 5, pp. 1183–1197, 2017.

[13] D. Fogel, “Phenotypes, genotypes, and operators in evolutionary computation,” in Proceedings of the 1995 IEEE International Conference on Evolutionary Computation, p. 193, Perth, WA, Australia.

[14] S. V. Ivanov, S. V. Kovalchuk, and A. V. Boukhanovsky, “Workflow-based Collaborative Decision Support for Flood Management Systems,” Procedia Computer Science, vol. 18, pp. 2213–2222, 2013.

[15] A. Gusarov, A. Kalyuzhnaya, and A. Boukhanovsky, “Spatially adaptive ensemble optimal interpolation of in-situ observations into numerical vector field models,” in Proceedings of the 6th International Young Scientist Conference on Computational Science, YSC 2017, pp. 325–333, Finland, November 2017.

[16] A. V. Krikunov, E. V. Bolgova, E. Krotov, T. M. Abuhay, A. N. Yakovlev, and S. V. Kovalchuk, “Complex data-driven predictive modeling in personalized clinical decision support for Acute Coronary Syndrome episodes,” in Proceedings of the International Conference on Computational Science, ICCS 2016, pp. 518–529, USA, June 2016.

[17] S. V. Kovalchuk, A. A. Funkner, O. G. Metsker, and A. N. Yakovlev, “Simulation of patient flow in multiple healthcare units using process and data mining techniques for model identification,” Journal of Biomedical Informatics, vol. 82, pp. 128–142, 2018.

[18] A. Yakovlev, O. Metsker, S. Kovalchuk, and E. Bologova, “Prediction of in-hospital mortality and length of stay in acute coronary syndrome patients using machine-learning methods,” Journal of the American College of Cardiology, vol. 71, no. 11, p. A242, 2018.

[19] A. Nikishova, A. Kalyuzhnaya, A. Boukhanovsky, and A. Hoekstra, “Uncertainty quantification and sensitivity analysis applied to the wind wave model SWAN,” Environmental Modelling & Software, vol. 95, pp. 344–357, 2017.

[20] E. Rojas, J. Munoz-Gama, M. Sepulveda, and D. Capurro, “Process mining in healthcare: A literature review,” Journal of Biomedical Informatics, vol. 61, pp. 224–236, 2016.

[21] T. Sinha, P. Jermann, N. Li, and P. Dillenbourg, “Your click decides your fate: Inferring Information Processing and Attrition Behavior from MOOC Video Clickstream Interactions,” in Proceedings of the EMNLP 2014 Workshop on Analysis of Large Scale Social Interaction in MOOCs, pp. 3–14, Doha, Qatar, October 2014.

[22] C. Marceau, “Characterizing the Behavior of a Program Using Multiple-Length N-Grams,” Defense Technical Information Center, 2005.

[23] S. V. Kovalchuk, O. G. Metsker, A. A. Funkner et al., “Towards management of complex modeling through a hybrid evolutionary identification,” in Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 255-256, Kyoto, Japan, July 2018.




[14] S V Ivanov S V Kovalchuk and A V BoukhanovskyldquoWorkflow-based Collaborative Decision Support for FloodManagement Systemsrdquo Procedia Computer Science vol 18 pp2213ndash2222 2013

[15] A Gusarov A Kalyuzhnaya and A Boukhanovsky ldquoSpatiallyadaptive ensemble optimal interpolation of in-situ observationsinto numerical vector field modelsrdquo in Proceedings of the6th International Young Scientist Conference on ComputationalScience YSC 2017 pp 325ndash333 Finland November 2017

[16] A V Krikunov E V Bolgova E Krotov T M AbuhayA N Yakovlev and S V Kovalchuk ldquoComplex data-drivenpredictive modeling in personalized clinical decision supportfor Acute Coronary Syndrome episodesrdquo in Proceedings of theInternational Conference on Computational Science ICCS 2016pp 518ndash529 USA June 2016

[17] S V Kovalchuk A A Funkner O G Metsker and A NYakovlev ldquoSimulation of patient flow in multiple healthcareunits using process and data mining techniques for modelidentificationrdquo Journal of Biomedical Informatics vol 82 pp128ndash142 2018

[18] A Yakovlev O Metsker S Kovalchuk and E BologovaldquoPrediction of in-hospital mortality and length of stay in acutecoronary syndrome patients using machine-learningmethodsrdquoJournal of the American College of Cardiology vol 71 no 11 pA242 2018

[19] A Nikishova A Kalyuzhnaya A Boukhanovsky and A Hoek-stra ldquoUncertainty quantification and sensitivity analysis appliedto the wind wave model SWANrdquo Environmental Modelling ampSo13ware vol 95 pp 344ndash357 2017

[20] E Rojas J Munoz-Gama M Sepulveda and D CapurroldquoProcess mining in healthcare A literature reviewrdquo Journal ofBiomedical Informatics vol 61 pp 224ndash236 2016

[21] T Sinha P Jermann N Li and P Dillenbourg ldquoYour clickdecides your fate Inferring Information Processing and Attri-tion Behavior from MOOC Video Clickstream Interactionsrdquoin Proceedings of the EMNLP 2014 Workshop on Analysis ofLarge Scale Social Interaction in MOOCs pp 3ndash14 Doha QatarOctober 2014

[22] C Marceau ldquoCharacterizing the Behavior of a Program UsingMultiple-Length N-Gramsrdquo Defense Technical InformationCenter 2005

[23] S V Kovalchuk O G Metsker A A Funkner et al ldquoTowardsmanagement of complex modeling through a hybrid evolu-tionary identificationrdquo in Proceedings of the the Genetic andEvolutionary Computation Conference Companion pp 255-256Kyoto Japan July 2018

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom


Figure 6: Graph-based representation of processes space in healthcare (interactive view; demonstration available at https://www.youtube.com/watch?v=EH74f1w6EeY).

Summarizing the results of the metocean case study, we can note that the EC approach is up to 120 times more efficient than grid search, without accuracy losses. According to this experimental study, the quality of the ensemble with evolutionary optimized models is similar to the results of the grid search: the MAE metric is equal to 0.24 m and the DTW metric to 51. The coevolutionary approach provides a 10% accuracy gain compared with the results of single evolution of model implementations, but this is still similar to the ensemble result with evolutionary optimized models. Nevertheless, the coevolutionary approach achieved a 200-fold acceleration. Within the context of the proposed approach, the space Φ was investigated using a defined structure of the model in the space Σ for the purpose of model calibration.

3.2. Problem 2: Modeling Health Care Process. Modeling healthcare processes usually involves enormous uncertainty and variability, even when modeling a single disease. One way to identify a model of such a process is PM [20]. Still, direct implementation of PM methods does not remove a major part of the uncertainty. Within the current research, we applied the proposed approach for identification purposes both in the analysis of historical cases and in the prediction of single process development. Here we consider processes of providing health care in acute coronary syndrome (ACS) cases; ACS is usually considered one of the major causes of death in the world. We used a set of 3434 ACS cases collected during 2010–2015 in Almazov National Medical Research Centre, one of the leading cardiological centers in Russia. The data set contains electronic health records of these patients with all registered events and characteristics of a patient.

To simplify consideration of the multidimensional space of possible processes (${\Gamma'}^{\Xi\to\Sigma}_{D\to S}\,\Gamma^{\Xi}_{S\to D}$ for analysis of Σ on layer $S$), we introduced a graph-based representation of this space, with vertices representing cases and edges representing proximity of cases. Analysis of such a structure makes it easy to discover common cases (e.g., as communities in the graph). Such discovery enables explicit, interpretable structuring of the space and representation of a further landscape for EC in terms of the P6 pattern. Moreover, direct interactive investigation of a visual representation of such a structure (see Figure 6) provides significant insights for medical researchers.
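
The graph-based representation described above can be illustrated with a minimal sketch. It assumes that each case is a string of event labels, that proximity is measured by edit distance, and that connected components serve as a crude stand-in for communities; none of these choices is stated in the text, and the example cases and the threshold are hypothetical.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def proximity_graph(cases, max_distance):
    """Adjacency sets: an edge links two cases whose distance <= max_distance."""
    graph = {i: set() for i in range(len(cases))}
    for i, j in combinations(range(len(cases)), 2):
        if edit_distance(cases[i], cases[j]) <= max_distance:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def connected_components(graph):
    """Groups of mutually reachable cases (assumed proxy for communities)."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Hypothetical cases: each letter denotes a key event of a process.
cases = ["AFIE", "AFNE", "AFIFE", "DNDN", "DNDE"]
graph = proximity_graph(cases, max_distance=1)
groups = connected_components(graph)
```

With these hypothetical inputs, the first three cases form one group and the last two form another. A real study would likely replace the component search with a proper community detection method.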

Figure 7: Pareto frontier for CP pattern discovery (x-axis: number of non-arranged sequences (normed); y-axis: length of pattern (normed)).

Figure 8: Evolutionary convergence during CP pattern discovery (x-axis: generation; y-axis: normed sum error).

Figure 9: Example of process model showing transfers between hospital's departments (Cluster 7).

Figure 10: Evolution of synthetic CPs: (a) CP population convergence; (b) evolution of possible CP (demonstration available at https://www.youtube.com/watch?v=twvfX9zKsY8); (c) number of synthetic CPs; (d) % of CPs in correct cluster.

We have developed an evolutionary algorithm for pattern identification and clustering in such a representation, with two criteria to be optimized (see Figure 7). Here, processes were represented by a sequence of labels (symbols) denoting key events in the PM model. Typical patterns were then selected from the Pareto frontier. The convergence process is demonstrated in Figure 8 (the 10 best individuals from the Pareto frontier according to the integral criterion were selected). As a result, this solution may refer to the P5 pattern and the operator $\Gamma^{\Xi\to\Sigma}_{D\to M}$ while discovering model structure. Figure 9 shows an example of a typical process model (i.e., a structural characteristic of the model) for one of the identified clusters. A detailed description of the approach, algorithms, and results on CP discovery, clustering, and analysis, including a performance comparison of three versions of the CP discovery algorithm, can be found in [10]. An important outcome of the approach in this application is the interpretability of the clusters and identified patterns. For example, 10 clusters and the corresponding CPs were interpreted by cardiologists from Almazov National Medical Research Centre. The obtained interpretation and further discovery and application of the CP structure are presented in [17]. Another important benefit of such space structure discovery is lowering the uncertainty of a patient's treatment trajectory through hierarchical positioning of an evolving process (selection of a cluster and selection of a position within the cluster). For example, the discrete-event simulation model described in [17] provides a more appropriate length-of-stay distribution when simulating with the discovered classes of CPs (the Kolmogorov-Smirnov statistic decreased by 51%, from 0.255 to 0.124).
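
The selection of patterns on a Pareto frontier with two criteria can be sketched as follows. This is only an illustration under stated assumptions: both criteria are treated as costs to be minimized (a criterion to be maximized would be negated first), the integral criterion is taken to be an equal-weight sum, and the candidate scores are hypothetical. The actual criteria and their directions are described in [10].

```python
def dominates(p, q):
    """True if p is no worse than q on every criterion and better on one."""
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))


def pareto_front(scores):
    """Indices of non-dominated score tuples (all criteria minimized)."""
    return [i for i, p in enumerate(scores)
            if not any(dominates(q, p)
                       for j, q in enumerate(scores) if j != i)]


def best_by_integral(scores, front, k):
    """Top-k front members by an equal-weight sum of criteria (assumed)."""
    return sorted(front, key=lambda i: sum(scores[i]))[:k]


# Hypothetical candidates: (share of non-arranged sequences, length cost).
scores = [(0.1, 0.9), (0.4, 0.4), (0.9, 0.1), (0.5, 0.5), (0.4, 0.8)]
front = pareto_front(scores)
top = best_by_integral(scores, front, k=2)
```

Here the last two candidates are dominated by the second one, so only the first three remain on the front, and the second candidate has the lowest sum.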

Further, we propose an algorithm to dynamically generate possible developments of a healthcare process using the identified graph-based space representation, with evolutionary strategies assimilating incoming data (events) within a case (${\Gamma'}^{\Sigma}_{M\to D}$ in P5 and $\Gamma^{\Xi\to\Sigma}_{D}$ in P2). We consider convergence (Figure 10(a)) of the introduced synthetic continuations of the processes to the right class (identified clusters of typical cases were used), with mapping to the graph-based space representation with proximity measures (Figure 10(b)). As a result, the appearance of the CP's events decreases the number of synthetic CPs and increases the percentage of CPs positioned in the correct cluster (see an example in Figure 10(c) and Figure 10(d), correspondingly). This enables interpretable positioning and lowers uncertainty in predicting the further development of the CP for a particular patient.
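
The effect of assimilating incoming events on a population of synthetic CPs can be illustrated with a deliberately simplified sketch. A plain prefix filter stands in here for the evolutionary strategy of the paper, and the candidate pathways, their cluster labels, and the function names are all hypothetical.

```python
def assimilate(population, observed):
    """Track the candidate population as events of one case arrive.

    population: list of (sequence, cluster) candidate pathways.
    observed:   the event string of the ongoing case.
    Yields (step, remaining candidates) after each observed event.
    """
    alive = list(population)
    for step in range(1, len(observed) + 1):
        prefix = observed[:step]
        # Keep only candidates consistent with the events seen so far.
        alive = [(seq, c) for seq, c in alive if seq.startswith(prefix)]
        yield step, alive


def share_in_cluster(candidates, cluster):
    """Fraction of candidates in the given cluster (0.0 if none are left)."""
    if not candidates:
        return 0.0
    return sum(c == cluster for _, c in candidates) / len(candidates)


# Hypothetical synthetic pathways with cluster labels.
population = [("AFIE", 0), ("AFNE", 0), ("AFID", 1), ("ADNE", 2), ("AFIFE", 0)]
history = [(step, len(alive), share_in_cluster(alive, 0))
           for step, alive in assimilate(population, "AFIE")]
```

In this toy run the number of surviving candidates shrinks with every observed event, and by the last event all remaining candidates belong to the target cluster, which mirrors the qualitative behavior reported for Figures 10(c) and 10(d).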

Here, a combination of patterns P2, P5, and P6 in the implementation of the proposed algorithm (see Section 2.5) enables interactive investigation of the process space and data assimilation into a population of possible continuations of a single process as it evolves. This solution can be applied in exploratory modeling and simulation of patient flow processing, as well as in decision support in specialized medical centers.

3.3. Problem 3: Mining Social Media. Nowadays, social media analysis (which began with static network models emphasizing the topology of connections between users) strives to explore dynamic behavioral patterns of individuals, which can be recovered from their digital traces on the web. The prediction of social media activities requires combining analytical and data-driven models, as well as identifying the optimal structure and parameters of these models according to the available data. Here we show an example of a problem in this field involving evolutionary identification of a model.

Figure 11: Distribution of posts, reposts, and comments on personal walls of subscribers of bank community (x-axis: activity count (log); y-axis: fraction of users (log); series: Repost, Post, Comment).

Figure 12: Example of process model (a) with expanded cycles (Cluster 2) and (b) with collapsed cycles (Cluster 1).

A digital trace of a user in an online social network (OSN) is a sequence (chain) of observed activities separated by time gaps. Each OSN supports different types of "hidden" and observable activities. For example, in vk.com, the largest Russian social network (hereafter VK), a user has a personal page (wall) with three types of activities: post (P), when a user creates a record; repost (R), when the user copies a record of another user or community to his or her wall; and comment (C), when the user comments on a record on his or her wall. Figure 11 illustrates the distribution of these activities for subscribers of a large Russian bank community in VK. The collected dataset consists of the last 100 entries (or fewer if unavailable; posts or reposts) and the comments on those entries for 8K user walls in the period January 2017–December 2017. Comments are much less common than posts and reposts. The distributions of posts and reposts are similar, but there is a group of "spreaders" with a significant number of reposts.

Figure 13: Example of process model for cluster 3 (1120 users) using n-gram analysis.

Table 1: Mean of activities' combinations for users' clusters.

Cluster  Size  CCC   CCP   CCR   CPP   CPR   CRR   PPP    PPR    PRR    RRR
1        5238  0.52  0.74  0.24  1     0.72  0.42  6.68   6.4    7.56   8.53
2        2110  0.11  0.13  0.16  0.16  0.36  0.59  2.09   3.95   12.32  60.3
3        1120  0.91  1.38  0.14  3.62  0.75  0.18  50.02  11.78  5      2.77

We applied the technique described in Section 3.2 to analyze the processes. Still, the considered process has a significantly different structure. By default, it is continuous, with random repetition of events, while the healthcare process in ACS cases is finite and has a "stronger" structure. Figure 12 shows a typical process structure identified with the EC-based approach and visualized with expanded cycles (a) and with collapsed cycles (b). The second representation could be considered more relevant than the first, which is significantly affected by the length of the selected history. It is natural to consider the process as a random process or a state-transition model. In that case, the three identified clusters (characterized by different transition frequencies) could be interpreted as typical behavior models.
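
Reading a user's trace as a state-transition model amounts to estimating transition frequencies between activities. The following minimal sketch assumes a first-order model over the three activity labels; the traces and the function name are hypothetical and are not taken from the collected dataset.

```python
from collections import Counter, defaultdict


def transition_probabilities(chains):
    """Estimate P(next activity | current activity) from activity chains."""
    counts = defaultdict(Counter)
    for chain in chains:
        # Count every pair of consecutive activities in the chain.
        for current, following in zip(chain, chain[1:]):
            counts[current][following] += 1
    return {state: {nxt: n / sum(nexts.values()) for nxt, n in nexts.items()}
            for state, nexts in counts.items()}


# Hypothetical traces: P = post, R = repost, C = comment.
chains = ["PPRPC", "RRRP", "PRRR"]
probs = transition_probabilities(chains)
```

Clusters of users could then be compared by such transition tables, in the spirit of the interpretation given above.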

N-gram analysis is often used to detect patterns in people's behavior [21, 22]. It is based on counting frequencies of combinations or sequences of activities. We collected all sorted 3-grams (so-called 3-sets) for each user's sequence to analyze the frequency of event combinations. As a result, three clusters of vectors with 3-set chains were identified with the k-means clustering method. Figure 13 shows all combinations and transitions between them for cluster 3 as an example. Using Figure 12 and Table 1, it is possible to see that cluster 3 includes users who often make new records (P) and sometimes comment on records (C). So, cluster 3 mostly consists of "bloggers." Cluster 2 includes "spreaders," who copy other records (R) frequently. The biggest cluster, cluster 1, consists of people who make new records and copy other ones equally, but less intensively compared to the other clusters. That may be considered typical behavior for an OSN user. N-gram analysis allows detecting typical behavioral patterns and obtaining process models for social media activities using chains of different lengths as input data. Thus, this type of data-driven modeling is more appropriate for studying continuous processes. Figure 14 shows a graph-based representation of the process space with all users' patterns.
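
The counting of sorted 3-grams (3-sets) can be sketched as follows. The sketch assumes that 3-grams are taken over a sliding window of consecutive activities, which the text does not specify; the example trace is hypothetical. The ten resulting combinations match the columns of Table 1.

```python
from collections import Counter
from itertools import combinations_with_replacement

ACTIVITIES = "CPR"  # comment, post, repost
# The 10 possible sorted 3-grams ("3-sets"): CCC, CCP, ..., RRR.
THREE_SETS = ["".join(t) for t in combinations_with_replacement(ACTIVITIES, 3)]


def three_set_vector(chain):
    """Count sorted 3-grams over a sliding window of an activity chain."""
    counts = Counter("".join(sorted(chain[i:i + 3]))
                     for i in range(len(chain) - 2))
    return [counts[s] for s in THREE_SETS]


# Hypothetical trace of one user.
vector = three_set_vector("PPRPCP")
```

One such vector per user could then be passed to any k-means implementation to obtain clusters of the kind summarized in Table 1.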

This subsection provides very early results. The next steps in applying the proposed approach to this application include extending the process model structure (a) with temporal labeling (gaps between events); (b) by considering the process within a sliding time window to get more structured processes; (c) by linking the model with causal inference; and (d) by introducing DM techniques for EC positioning of ongoing processes in the model space. We believe that these extensions could enhance the discovery of model structure (P4) and provide deeper insight into social media activity.

4. Conclusion and Future Work

Figure 14: Graph-based representation of processes' space for three clusters in social media activity.

The development of the proposed approach is still an ongoing project. We aim for further systematization and detailing of the proposed concepts, methods, and algorithms, as well as more comprehensive and deeper implementation of EC-based applications. Further development includes the following directions:

(i) dualization on the role of data-driven and intelligent operations in the proposed approach and described patterns;

(ii) extended analysis of various EC techniques applicable within the approach;

(iii) investigation of EC-based discovery for models of complex systems with lacking or inconsistent observations;

(iv) detailed formalization of expertise and knowledge-based methods within the approach;

(v) extending the approach with interactive user-centered modelling and phase space analysis;

(vi) development of a multilayered approach for decision support and control of the system and process $S$, available data $D$, and complex model $M$.

Data Availability

The data used in the examples presented in Section 3 were initially obtained within complimentary projects performed by the authors of the paper. The data are available from the corresponding author upon request, after an explicit statement of the purpose and plan of the requested data usage, to check for possible violation of the corresponding projects' rules.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This paper presents an extension and further development of the work [23]. This research is financially supported by The Russian Scientific Foundation, Agreement 14-11-00823 (15.07.2014).

References

[1] N. Boccara, Modeling Complex Systems, Graduate Texts in Physics, Springer, New York, Second edition, 2010.

[2] H. McManus and D. Hastings, "A framework for understanding uncertainty and its mitigation and exploitation in complex systems," IEEE Engineering Management Review, vol. 34, no. 3, pp. 81–94, 2006.

[3] W. Walker, P. Harremoes, J. Rotmans et al., "Defining uncertainty: a conceptual basis for uncertainty management in model-based decision support," Integrated Assessment, vol. 4, no. 1, pp. 5–17, 2003.

[4] J. Yan and J. R. Deller, "NARMAX model identification using a set-theoretic evolutionary approach," Signal Processing, vol. 123, pp. 30–41, 2016.

[5] I. G. Kevrekidis, C. W. Gear, and G. Hummer, "Equation-free: The computer-aided analysis of complex multiscale systems," AIChE Journal, vol. 50, no. 7, pp. 1346–1355, 2004.

[6] H. Ihshaish, A. Cortes, and M. A. Senar, "Parallel Multi-level Genetic Ensemble for Numerical Weather Prediction Enhancement," Procedia Computer Science, vol. 9, pp. 276–285, 2012.

[7] G. Dumedah, "Formulation of the Evolutionary-Based Data Assimilation and its Implementation in Hydrological Forecasting," Water Resources Management, vol. 26, no. 13, pp. 3853–3870, 2012.

[8] V. V. Kashirin, A. A. Lantseva, S. V. Ivanov, S. V. Kovalchuk, and A. Boukhanovsky, "Evolutionary simulation of complex networks' structures with specific functional properties," Journal of Applied Logic, vol. 24, no. part A, pp. 39–49, 2017.

[9] S. V. Kovalchuk, P. A. Smirnov, K. V. Knyazkov, A. S. Zagarskikh, and A. V. Boukhanovsky, "Knowledge-Based Expressive Technologies Within Cloud Computing Environments," in Practical Applications of Intelligent Systems, vol. 279 of Advances in Intelligent Systems and Computing, pp. 1–11, Springer, Berlin, Germany, 2014.

[10] A. A. Funkner, A. N. Yakovlev, and S. V. Kovalchuk, "Towards evolutionary discovery of typical clinical pathways in electronic health records," Procedia Computer Science, vol. 119, pp. 234–244, 2017.

[11] S. V. Kovalchuk and A. V. Boukhanovsky, "Towards Ensemble Simulation of Complex Systems," Procedia Computer Science, vol. 51, pp. 532–541, 2015.

[12] S. V. Kovalchuk, A. V. Krikunov, K. V. Knyazkov, and A. V. Boukhanovsky, "Classification issues within ensemble-based simulation: application to surge floods forecasting," Stochastic Environmental Research and Risk Assessment, vol. 31, no. 5, pp. 1183–1197, 2017.

[13] D. Fogel, "Phenotypes, genotypes, and operators in evolutionary computation," in Proceedings of the 1995 IEEE International Conference on Evolutionary Computation, p. 193, Perth, WA, Australia.

[14] S. V. Ivanov, S. V. Kovalchuk, and A. V. Boukhanovsky, "Workflow-based Collaborative Decision Support for Flood Management Systems," Procedia Computer Science, vol. 18, pp. 2213–2222, 2013.

[15] A. Gusarov, A. Kalyuzhnaya, and A. Boukhanovsky, "Spatially adaptive ensemble optimal interpolation of in-situ observations into numerical vector field models," in Proceedings of the 6th International Young Scientist Conference on Computational Science, YSC 2017, pp. 325–333, Finland, November 2017.

[16] A. V. Krikunov, E. V. Bolgova, E. Krotov, T. M. Abuhay, A. N. Yakovlev, and S. V. Kovalchuk, "Complex data-driven predictive modeling in personalized clinical decision support for Acute Coronary Syndrome episodes," in Proceedings of the International Conference on Computational Science, ICCS 2016, pp. 518–529, USA, June 2016.

[17] S. V. Kovalchuk, A. A. Funkner, O. G. Metsker, and A. N. Yakovlev, "Simulation of patient flow in multiple healthcare units using process and data mining techniques for model identification," Journal of Biomedical Informatics, vol. 82, pp. 128–142, 2018.

[18] A. Yakovlev, O. Metsker, S. Kovalchuk, and E. Bologova, "Prediction of in-hospital mortality and length of stay in acute coronary syndrome patients using machine-learning methods," Journal of the American College of Cardiology, vol. 71, no. 11, p. A242, 2018.

[19] A. Nikishova, A. Kalyuzhnaya, A. Boukhanovsky, and A. Hoekstra, "Uncertainty quantification and sensitivity analysis applied to the wind wave model SWAN," Environmental Modelling & Software, vol. 95, pp. 344–357, 2017.

[20] E. Rojas, J. Munoz-Gama, M. Sepulveda, and D. Capurro, "Process mining in healthcare: A literature review," Journal of Biomedical Informatics, vol. 61, pp. 224–236, 2016.

[21] T. Sinha, P. Jermann, N. Li, and P. Dillenbourg, "Your click decides your fate: Inferring Information Processing and Attrition Behavior from MOOC Video Clickstream Interactions," in Proceedings of the EMNLP 2014 Workshop on Analysis of Large Scale Social Interaction in MOOCs, pp. 3–14, Doha, Qatar, October 2014.

[22] C. Marceau, "Characterizing the Behavior of a Program Using Multiple-Length N-Grams," Defense Technical Information Center, 2005.

[23] S. V. Kovalchuk, O. G. Metsker, A. A. Funkner et al., "Towards management of complex modeling through a hybrid evolutionary identification," in Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 255–256, Kyoto, Japan, July 2018.


Page 10: A Conceptual Approach to Complex Model Management with ...downloads.hindawi.com/journals/complexity/2018/5870987.pdf · Parameters Functional characteristics Structure S D M Layers

10 Complexity

00 02 04 06 08 10 Number of non-arranged sequences (normed)

12

35

30

20

15

10

25

05

00

minus05

Leng

th o

f pat

tern

(nor

med

)

0

0

1

1

2

2

3

3

4

4

5

5

AFIFNEAFNIFEIFDNEAFENIFEENIFEDNFIEFEAFNIFEDINIFENDDFFEIDNFEEDFIFEAEFNINFDEDFNEIDFDFNEFDIEDNFNEIFDDNIEFFDEAFANIFADIFEFNIEFDEDEINFDEFDEDFFDNIFENDFIEIDAFEEDEIFEDNA

Figure 7 Pareto frontier for CP patterns discovery

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 300

Generation

10

08

06

04

02

Nor

med

sum

erro

r

Figure 8 Evolutionary convergence during CP pattern discovery

Entrance289 AD

289Discharge

died13 transferred9

home119 cure138OR

170

IC104

CC13

CD2

AD OD

4

CD

48

IC225

104 12

1

12

2 9

CD1111

230

2 39

14

172

Cluster 7

Figure 9 Example of process model showing transfers between hospitalrsquos departments

Complexity 11

Cluster 0Cluster 1Cluster 2

Cluster 3Cluster 4

Cluster 5Cluster 6

0 2 4 6 8 10 12 14

50

100

150

200

250

0

Step

Dist

ance

sym

bols

(a)

Historical CPs Synthetic CPs Target CP

(b)

0

20

40

60

80

100

Num

ber o

f syn

thet

ic C

Ps

Step0 1 2 3 4 5 6 7 8 9

(c)

Step0 1 2 3 4 5 6 7 8 9

o

f CPs

in co

rrec

t clu

ster

100

90

80

70

60

50

40

30

20

(d)

Figure 10 Evolution of synthetic CPs (a) CP population convergence (b) evolution of possible CP (demonstration available at httpswwwyoutubecomwatchv=twvfX9zKsY8) (c) number of synthetic CPs (d) of CPs in correct cluster

described in [17] provides a more appropriate length of staydistribution within simulation with discovered classes ofCPs (Kolmogorov-Smirnov statistics decreased by 51 (from0255 to 0124)

Furtherly we propose an algorithm to dynamically gen-erate possible development of the process in healthcare usingidentified graph-based space representation with evolution-ary strategies assimilating incoming data (events) within acase (Γ1015840Σ

119872997888rarr119863in P5 and ΓΞ997888rarrΣ

119863in P2) We consider conver-

gence (Figure 10(a)) of the introduced synthetic continuationof the processes to the right class (identified clusters oftypical cases were used) with mapping to the graph-basedspace representation with proximity measures (Figure 10(b))As a result the appearance of the CPrsquos events decreasesthe number of synthetic CPs and increases percentage ofCPs positioned in the correct cluster (see an example inFigure 10(c) and Figure 10(d) correspondingly) This enablesinterpretable positioning and uncertainty lowering in pre-dicting further CPrsquos development for a particular patient

Here a combination of patterns P2 P5 and P6 in theimplementation of the proposed algorithm (see Section 25)enables interactive investigation of processes space and dataassimilation into a population of possible continuations ofa single process during its evolving This solution can beapplied in exploratory modeling and simulation of patientflow processing as well as decision support in specializedmedical centers

33 Problem 3 Mining Social Media Nowadays socialmedia analysis (that began with static network modelsemphasizing a topology of connections between users) strivesto explore dynamic behavioral patterns of individuals whichcan be recovered from their digital traces on the web Theprediction of social media activities requires combininganalytical and data-driven models as well as identifying theoptimal structure and parameters of these models accordingto the available data Herewe show an example of the problemin this field involving evolutionary identification of a model

12 Complexity

1

01

001

0001

100

Frac

tion

of u

sers

(log

)8 9

12 3 4 5 6 7 89 2 3 4 5 6 7 8 9

10 100

Activity count (log)

RepostPostComment

Figure 11 Distribution of posts reposts and comments on personal walls of subscribers of bank community

Post Repost50

Comment8End

28

1

37

Post124

2

5 Repost

Post

6Entrance198

86

112

4

24

9651

47 Comment

3

21

2

Repost30

1

3Post

1 19

1411

Repost

4 4

Cluster 2

(a)

Entrance1156

Post

905 Repost

251

330125809

Comment1450End

753

5802

4249

124350

1305216 770

53

Cluster 1

(b)

Figure 12 Example of process model (a) with expanded cycles and (b) with collapsed cycles

A digital trace of a user in an online social network (OSN) is a sequence (chain) of observed activities separated by time gaps. Each OSN supports different types of "hidden" and observable activities. For example, in the largest Russian social network, vk.com (hereafter denoted as VK), a user has a personal page (wall) with three types of activities: post (P), when a user creates a record of his or her own; repost (R), when the user copies a record of another user or community to his or her wall; and comment (C), when the user comments on a record on his or her wall. Figure 11 illustrates the distribution of these activities for subscribers of a large Russian bank community in VK. The collected dataset consists of the last 100 (or fewer, if unavailable) entries (posts or reposts) and the comments on these entries for 8K user walls in the period January 2017-December 2017. Comments are much less common than posts and reposts. The distributions of posts and reposts are similar, but there is a group of "spreaders" with a significant number of reposts.

Figure 13: Example of process model for cluster 3 (1120 users) using n-gram analysis.

Table 1: Mean of activities' combinations for users' clusters.

Cluster  Size  CCC   CCP   CCR   CPP   CPR   CRR   PPP    PPR    PRR    RRR
1        5238  0.52  0.74  0.24  1     0.72  0.42  6.68   6.4    7.56   8.53
2        2110  0.11  0.13  0.16  0.16  0.36  0.59  2.09   3.95   12.32  60.3
3        1120  0.91  1.38  0.14  3.62  0.75  0.18  50.02  11.78  5      2.77

We applied the technique described in Section 3.2 to analyze the processes. Still, the considered process has a significantly different structure. By default, it is continuous with random repetition of events, while the healthcare process in ACS cases has a finite and more "strong" structure. Figure 12 shows a typical process structure identified with the EC-based approach and visualized with expanded cycles (a) and with collapsed cycles (b). The second one could be considered more relevant than the first one, which is significantly affected by the length of the selected history. It is natural to consider it as a random process or state-transition model. In that case, the three identified clusters (characterized by various frequencies of transitions) could be interpreted as typical behavior models.
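To make the state-transition view concrete, the sketch below estimates transition frequencies between activities from a set of activity chains. It is only an illustration under stated assumptions: the chains are invented, and `transition_frequencies` is a hypothetical helper, not part of the EC-based identification itself.

```python
from collections import Counter, defaultdict


def transition_frequencies(chains):
    """Estimate P(next activity | current activity) from activity chains."""
    counts = defaultdict(Counter)
    for chain in chains:
        # Count every pair of consecutive activities in the chain.
        for current, nxt in zip(chain, chain[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


# Hypothetical chains (assumption): P = post, R = repost, C = comment.
chains = ["PPRP", "RRRP", "PCPP"]
model = transition_frequencies(chains)
for state in sorted(model):
    print(state, {k: round(v, 2) for k, v in sorted(model[state].items())})
```

Clusters of users could then be compared by how these per-state frequencies differ, which is the sense in which each cluster may be read as a typical behavior model.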

N-gram analysis is often used to detect patterns in people's behaviors [21, 22]. It is based on counting frequencies of combinations or sequences of activities. We collected all sorted 3-grams (so-called 3-sets) for each user's sequence to analyze the frequency of event combinations. As a result, three clusters of vectors with 3-set chains were identified with the k-means clustering method. Figure 13 shows all combinations and transitions between them for cluster 3 as an example. Using Figure 12 and Table 1, it is possible to see that cluster 3 includes users who often make new records (P) and sometimes comment on records (C). So, cluster 3 mostly consists of "bloggers". Cluster 2 includes "spreaders" who copy other records (R) frequently. The biggest cluster, cluster 1, consists of people who make new records and copy other ones equally, but less intensively compared to the other clusters. That may be considered typical behavior for an OSN user. N-gram analysis allows detecting typical behavioral patterns and obtaining process models for social media activities using chains of different lengths as input data. Thus, this type of data-driven modeling is more appropriate for researching continuous processes. Figure 14 shows a graph-based representation of the process space with all users' patterns.
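The 3-set counting step can be sketched as follows. Each window of three consecutive activities is sorted, so that, for example, PRC and CPR are counted as the same combination CPR; this yields the ten combinations listed in Table 1. The sliding-window choice, the function name, and the example chain are assumptions made for illustration. The resulting vectors could then be passed to any k-means implementation.

```python
from collections import Counter
from itertools import combinations_with_replacement

ACTIVITIES = "CPR"  # comment, post, repost
# The ten possible 3-sets in alphabetical order: CCC, CCP, ..., RRR.
THREE_SETS = ["".join(c) for c in combinations_with_replacement(ACTIVITIES, 3)]


def three_set_vector(chain):
    """Count sorted 3-grams (3-sets) over a sliding window of the chain."""
    counts = Counter(
        "".join(sorted(chain[i:i + 3])) for i in range(len(chain) - 2)
    )
    # Fixed column order, so vectors of different users are comparable.
    return [counts[s] for s in THREE_SETS]


chain = "PPRPCPRR"  # hypothetical activity chain of one user (assumption)
print(THREE_SETS)
print(three_set_vector(chain))
```

Sorting inside each window deliberately discards the order of the three activities, so the vector describes which combinations occur rather than the exact sequence.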

This subsection provides very early results. The next steps in applying the proposed approach to this problem include extending the process model structure by (a) temporal labeling (gaps between events), (b) considering the process within a sliding time window to get more structured processes, (c) linking the model with causal inference, and (d) introducing DM techniques for EC positioning of ongoing processes in model space. We believe that these extensions could enhance discovery of model structure (P4) and provide deeper insight into social media activity.

Figure 14: Graph-based representation of processes' space for three clusters (Cluster 1, Cluster 2, Cluster 3) in social media activity.

4. Conclusion and Future Work

The development of the proposed approach is still an ongoing project. We aim for further systematization and detailing of the proposed concepts, methods, and algorithms, as well as a more comprehensive and deeper implementation of EC-based applications. Further development includes the following directions:

(i) dualization on the role of data-driven and intelligent operations in the proposed approach and described patterns;

(ii) extended analysis of various EC techniques applicable within the approach;

(iii) investigation of EC-based discovery for models of complex systems with lacking or inconsistent observations;

(iv) detailed formalization of expertise and knowledge-based methods within the approach;

(v) extending the approach with interactive user-centered modelling and phase space analysis;

(vi) development of a multilayered approach for decision support and control of the system and process $S$, available data $D$, and complex model $M$.

Data Availability

The data used in the examples presented in Section 3 were initially obtained within complementary projects performed by the authors of the paper. The data are available from the corresponding author upon request, after an explicit statement of the purpose and plan of the requested data usage, to check for possible violation of the corresponding projects' rules.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This paper presents an extension and further development of the work [23]. This research is financially supported by The Russian Scientific Foundation, Agreement 14-11-00823 (15.07.2014).

References

[1] N. Boccara, Modeling Complex Systems, Graduate Texts in Physics, Springer, New York, 2nd edition, 2010.

[2] H. McManus and D. Hastings, "A framework for understanding uncertainty and its mitigation and exploitation in complex systems," IEEE Engineering Management Review, vol. 34, no. 3, pp. 81-94, 2006.

[3] W. Walker, P. Harremoes, J. Rotmans et al., "Defining uncertainty: a conceptual basis for uncertainty management in model-based decision support," Integrated Assessment, vol. 4, no. 1, pp. 5-17, 2003.

[4] J. Yan and J. R. Deller, "NARMAX model identification using a set-theoretic evolutionary approach," Signal Processing, vol. 123, pp. 30-41, 2016.

[5] I. G. Kevrekidis, C. W. Gear, and G. Hummer, "Equation-free: The computer-aided analysis of complex multiscale systems," AIChE Journal, vol. 50, no. 7, pp. 1346-1355, 2004.

[6] H. Ihshaish, A. Cortes, and M. A. Senar, "Parallel Multi-level Genetic Ensemble for Numerical Weather Prediction Enhancement," Procedia Computer Science, vol. 9, pp. 276-285, 2012.

[7] G. Dumedah, "Formulation of the Evolutionary-Based Data Assimilation, and its Implementation in Hydrological Forecasting," Water Resources Management, vol. 26, no. 13, pp. 3853-3870, 2012.

[8] V. V. Kashirin, A. A. Lantseva, S. V. Ivanov, S. V. Kovalchuk, and A. Boukhanovsky, "Evolutionary simulation of complex networks' structures with specific functional properties," Journal of Applied Logic, vol. 24, part A, pp. 39-49, 2017.

[9] S. V. Kovalchuk, P. A. Smirnov, K. V. Knyazkov, A. S. Zagarskikh, and A. V. Boukhanovsky, "Knowledge-Based Expressive Technologies Within Cloud Computing Environments," in Practical Applications of Intelligent Systems, vol. 279 of Advances in Intelligent Systems and Computing, pp. 1-11, Springer, Berlin, Germany, 2014.

[10] A. A. Funkner, A. N. Yakovlev, and S. V. Kovalchuk, "Towards evolutionary discovery of typical clinical pathways in electronic health records," Procedia Computer Science, vol. 119, pp. 234-244, 2017.

[11] S. V. Kovalchuk and A. V. Boukhanovsky, "Towards Ensemble Simulation of Complex Systems," Procedia Computer Science, vol. 51, pp. 532-541, 2015.

[12] S. V. Kovalchuk, A. V. Krikunov, K. V. Knyazkov, and A. V. Boukhanovsky, "Classification issues within ensemble-based simulation: application to surge floods forecasting," Stochastic Environmental Research and Risk Assessment, vol. 31, no. 5, pp. 1183-1197, 2017.

[13] D. Fogel, "Phenotypes, genotypes, and operators in evolutionary computation," in Proceedings of the 1995 IEEE International Conference on Evolutionary Computation, p. 193, Perth, WA, Australia.

[14] S. V. Ivanov, S. V. Kovalchuk, and A. V. Boukhanovsky, "Workflow-based Collaborative Decision Support for Flood Management Systems," Procedia Computer Science, vol. 18, pp. 2213-2222, 2013.

[15] A. Gusarov, A. Kalyuzhnaya, and A. Boukhanovsky, "Spatially adaptive ensemble optimal interpolation of in-situ observations into numerical vector field models," in Proceedings of the 6th International Young Scientist Conference on Computational Science, YSC 2017, pp. 325-333, Finland, November 2017.

[16] A. V. Krikunov, E. V. Bolgova, E. Krotov, T. M. Abuhay, A. N. Yakovlev, and S. V. Kovalchuk, "Complex data-driven predictive modeling in personalized clinical decision support for Acute Coronary Syndrome episodes," in Proceedings of the International Conference on Computational Science, ICCS 2016, pp. 518-529, USA, June 2016.

[17] S. V. Kovalchuk, A. A. Funkner, O. G. Metsker, and A. N. Yakovlev, "Simulation of patient flow in multiple healthcare units using process and data mining techniques for model identification," Journal of Biomedical Informatics, vol. 82, pp. 128-142, 2018.

[18] A. Yakovlev, O. Metsker, S. Kovalchuk, and E. Bologova, "Prediction of in-hospital mortality and length of stay in acute coronary syndrome patients using machine-learning methods," Journal of the American College of Cardiology, vol. 71, no. 11, p. A242, 2018.

[19] A. Nikishova, A. Kalyuzhnaya, A. Boukhanovsky, and A. Hoekstra, "Uncertainty quantification and sensitivity analysis applied to the wind wave model SWAN," Environmental Modelling & Software, vol. 95, pp. 344-357, 2017.

[20] E. Rojas, J. Munoz-Gama, M. Sepulveda, and D. Capurro, "Process mining in healthcare: A literature review," Journal of Biomedical Informatics, vol. 61, pp. 224-236, 2016.

[21] T. Sinha, P. Jermann, N. Li, and P. Dillenbourg, "Your click decides your fate: Inferring Information Processing and Attrition Behavior from MOOC Video Clickstream Interactions," in Proceedings of the EMNLP 2014 Workshop on Analysis of Large Scale Social Interaction in MOOCs, pp. 3-14, Doha, Qatar, October 2014.

[22] C. Marceau, "Characterizing the Behavior of a Program Using Multiple-Length N-Grams," Defense Technical Information Center, 2005.

[23] S. V. Kovalchuk, O. G. Metsker, A. A. Funkner et al., "Towards management of complex modeling through a hybrid evolutionary identification," in Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 255-256, Kyoto, Japan, July 2018.



Figure 10: Evolution of synthetic CPs: (a) CP population convergence (distance in symbols vs. step, for Clusters 0-6); (b) evolution of possible CP, showing historical CPs, synthetic CPs, and the target CP (demonstration available at https://www.youtube.com/watch?v=twvfX9zKsY8); (c) number of synthetic CPs vs. step; (d) % of CPs in correct cluster vs. step.

described in [17] provides a more appropriate length of stay distribution within simulation with discovered classes of CPs (the Kolmogorov-Smirnov statistic decreased by 51%, from 0.255 to 0.124).
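For reference, the two-sample Kolmogorov-Smirnov statistic quoted above is the largest gap between two empirical cumulative distribution functions. The sketch below is a minimal pure-Python illustration; the function name and the sample lengths of stay are invented for the example and are not data from [17].

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    # The largest ECDF gap is attained at one of the observed points.
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


# Hypothetical lengths of stay in days (assumption): observed vs. simulated.
observed = [3, 5, 7, 8, 10, 12]
simulated = [4, 5, 6, 9, 10, 15]
print(ks_statistic(observed, simulated))

# Relative decrease, assuming the quoted values are 0.255 and 0.124.
print(round((0.255 - 0.124) / 0.255 * 100))
```

A smaller statistic means the simulated length-of-stay distribution lies closer to the observed one.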

Furthermore, we propose an algorithm to dynamically generate possible developments of a healthcare process using the identified graph-based space representation, with evolutionary strategies assimilating incoming data (events) within a case ($\Gamma'^{\Sigma}_{M \to D}$ in P5 and $\Gamma^{\Xi \to \Sigma}_{D}$ in P2). We consider convergence (Figure 10(a)) of the introduced synthetic continuations of the processes to the right class (identified clusters of typical cases were used), with mapping to the graph-based space representation with proximity measures (Figure 10(b)). As a result, the appearance of the CP's events decreases the number of synthetic CPs and increases the percentage of CPs positioned in the correct cluster (see an example in Figure 10(c) and Figure 10(d), respectively). This enables interpretable positioning and lowers uncertainty in predicting the further development of the CP for a particular patient.
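The assimilation step described above can be illustrated with a deliberately simplified sketch. It is not the authors' algorithm: instead of evolutionary operators and graph-based proximity measures, it keeps a fixed population of hypothetical synthetic CPs (event strings with a cluster label) and simply discards those that contradict the observed prefix of the target CP. All names and data are assumptions made for illustration.

```python
def assimilate(population, observed_prefix):
    """Keep only synthetic CPs whose events agree with the observed prefix."""
    n = len(observed_prefix)
    return [(cp, cluster) for cp, cluster in population if cp[:n] == observed_prefix]


def share_in_cluster(population, target_cluster):
    """Percentage of the surviving synthetic CPs lying in the target cluster."""
    if not population:
        return 0.0
    hits = sum(1 for _, cluster in population if cluster == target_cluster)
    return 100.0 * hits / len(population)


# Hypothetical synthetic CPs: (sequence of event symbols, cluster label).
population = [
    ("ABCD", 0), ("ABCE", 0), ("ABDE", 1),
    ("ACDE", 1), ("BCDE", 2), ("ABCF", 0),
]
target_cp, target_cluster = "ABCD", 0

history = []
for step in range(len(target_cp) + 1):
    # Each step assimilates one more observed event of the target CP.
    survivors = assimilate(population, target_cp[:step])
    history.append((step, len(survivors), share_in_cluster(survivors, target_cluster)))

for step, size, share in history:
    print(f"step={step} synthetic CPs={size} in correct cluster={share:.0f}%")
```

Even in this toy setting, each new event shrinks the set of candidate continuations and raises the share that falls into the target's cluster, which is the qualitative behavior reported for Figure 10(c) and Figure 10(d).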


Page 12: A Conceptual Approach to Complex Model Management with ...downloads.hindawi.com/journals/complexity/2018/5870987.pdf · Parameters Functional characteristics Structure S D M Layers

12 Complexity

1

01

001

0001

100

Frac

tion

of u

sers

(log

)8 9

12 3 4 5 6 7 89 2 3 4 5 6 7 8 9

10 100

Activity count (log)

RepostPostComment

Figure 11 Distribution of posts reposts and comments on personal walls of subscribers of bank community

Post Repost50

Comment8End

28

1

37

Post124

2

5 Repost

Post

6Entrance198

86

112

4

24

9651

47 Comment

3

21

2

Repost30

1

3Post

1 19

1411

Repost

4 4

Cluster 2

(a)

Entrance1156

Post

905 Repost

251

330125809

Comment1450End

753

5802

4249

124350

1305216 770

53

Cluster 1

(b)

Figure 12 Example of process model (a) with expanded cycles and (b) with collapsed cycles

A digital trace of a user in an online social network (OSN)is a sequence (chain) of observed activities separated withtime gaps Each OSN supports different types of ldquohiddenrdquoand observable activities For example in a largest Russiansocial network vkcom (further is denoted as VK) a userhas a personal page (wall) with three types of activitiespost (P)mdashwhen a user makes a record by himself repost(R)mdashwhen the user copies the record of another user orcommunity to his or her wall and comment (C)mdashwhenthe user comments the record on his or her wall Figure 11illustrates the distribution of these activities for subscribers oflarge Russian bank community in VK The collected dataset

consists of 100 (or less if unavailable) last entries (posts orreposts) and comments for the entries for 8K user walls ina period January 2017ndashDecember 2017 Comments are muchless common than posts and reposts The distributions ofthe posts and reposts are similar but there is a group ofldquospreadersrdquo with a significant number of reposts

We applied the technique described in Section 32 toanalyze the processes Still the considered process has sig-nificantly different structure By default it is continuous withrandom repetition of events while healthcare process in ACScases has finite andmore ldquostrongrdquo structure Figure 12 shows atypical process structure identified with EC-based approach

Complexity 13

CPP152

PPR

187

PRR

79

CRR

82

CCP

56

CCR34

213

1994

87

PPP

1054

660

160

122

67953701

2195

1168

3705

50512

66

2244

2180

RRR

989

49

484654

CCC

736

25056

601

274549

48

60

1013

1999

Cluster 3 1120 users

228

CPR

Figure 13 Example of process model for cluster 3 using n-gram analysis

Table 1 Mean of activitiesrsquo combinations for usersrsquo clusters

Cluster Size CCC CCP CCR CPP CPR CRR PPP PPR PRR RRR1 5238 052 074 024 1 072 042 668 64 756 8532 2110 011 013 016 016 036 059 209 395 1232 6033 1120 091 138 014 362 075 018 5002 1178 5 277

and visualized with expanded cycles (a) and with collapsedcycles (b) The second one could be considered as morerelevant than the first one which is significantly affected bya length of selected history It is natural to consider it as arandom process or state-transition model In that case threeidentified clusters (characterized by various frequencies oftransitions) could be interpreted as typical behavior models

N-grams analysis is often used to detect patterns in peo-plersquos behaviors [21 22] N-grams analysis is based on countingfrequencies of combinations or sequences of activities Wecollected all sorted 3-grams (so called 3-sets) for each userrsquossequence to analyze the frequency of event combinationsAs a result three clusters of vectors with 3-sets chains wereidentified with k-means clustering method Figure 13 showsall combinations and transitions between them for cluster 3as an example Using Figure 12 and Table 1 it is possible to seethat cluster 3 includes users who oftenmake new records (P)and sometimes comment records (C) So cluster 3 mostlyconsists of ldquobloggersrdquo Cluster 2 includes ldquospreadersrdquo whocopy other records (R) frequently And the biggest cluster1 consists of people who make new records and copy otherones equally but less intensively comparing to other clusters

That may be considered typical behavior for an OSN user. N-gram analysis allows detecting typical behavioral patterns and obtaining process models for social media activities using chains of different lengths as input data. Thus, this type of data-driven modeling is more appropriate for researching continuous processes. Figure 14 shows a graph-based representation of the process space with all users’ patterns.

This subsection provides very early results. The next step in applying the proposed approach to this application includes an extension of the process model structure (a) with temporal labeling (gaps between events); (b) considering the process within a sliding time window to get more structured processes; (c) linking the model with causal inference; (d) introduction of DM techniques for EC positioning of ongoing processes in model space. We believe that these extensions could enhance discovery of model structure (P4) and provide deeper insight into social media activity investigation.
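Extensions (a) and (b) can be pictured with a small sketch. Since the text describes them only as planned work, everything here is hypothetical: the function names, the representation of an event as a `(timestamp, activity)` pair sorted by time, and the choice of half-open windows are assumptions for illustration.

```python
from datetime import datetime, timedelta


def label_gaps(events):
    """Pair each activity with the time gap since the previous event.

    The first event has no predecessor, so its gap is None.
    """
    labeled = []
    previous = None
    for timestamp, activity in events:
        gap = None if previous is None else timestamp - previous
        labeled.append((activity, gap))
        previous = timestamp
    return labeled


def sliding_windows(events, width, step):
    """Yield the activities inside successive [start, start + width) windows."""
    if not events:
        return
    start = events[0][0]
    last = events[-1][0]
    while start <= last:
        yield [a for t, a in events if start <= t < start + width]
        start += step


# Hypothetical log: a post, a comment one hour later, a copy 30 hours in.
t0 = datetime(2018, 1, 1)
log = [
    (t0, "P"),
    (t0 + timedelta(hours=1), "C"),
    (t0 + timedelta(hours=30), "R"),
]
gaps = label_gaps(log)
daily = list(sliding_windows(log, timedelta(hours=24), timedelta(hours=24)))
```

With a step smaller than the width, the windows overlap, which gives several shorter and more structured sequences from one continuous log.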

4. Conclusion and Future Work

Figure 14: Graph-based representation of processes’ space for three clusters in social media activity.

The development of the proposed approach is still an ongoing project. We aim for further systematization and detailing of the proposed concepts, methods, and algorithms, as well as more comprehensive and deeper implementation of EC-based applications. Further development includes the following directions:

(i) dualization on the role of data-driven and intelligent operations in the proposed approach and described patterns;

(ii) extended analysis of various EC techniques applicable within the approach;

(iii) investigation of EC-based discovery for models of complex systems with lacking or inconsistent observations;

(iv) detailed formalization of expertise and knowledge-based methods within the approach;

(v) extending the approach with interactive user-centered modelling and phase space analysis;

(vi) development of a multilayered approach for decision support and control of system and process S, available data D, and complex model M.

Data Availability

The data used in the examples presented in Section 3 were initially obtained within complimentary projects performed by the authors of the paper. The data is available from the corresponding author upon request, after explicitly stating the purpose and plan of the requested data usage, to check for possible violation of the corresponding projects’ rules.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This paper presents an extension and further development of the work [23]. This research is financially supported by The Russian Scientific Foundation, Agreement 14-11-00823 (15.07.2014).

References

[1] N. Boccara, Modeling Complex Systems, Graduate Texts in Physics, Springer, New York, Second edition, 2010.

[2] H. McManus and D. Hastings, “A framework for understanding uncertainty and its mitigation and exploitation in complex systems,” IEEE Engineering Management Review, vol. 34, no. 3, pp. 81–94, 2006.

[3] W. Walker, P. Harremoes, J. Rotmans et al., “Defining uncertainty: a conceptual basis for uncertainty management in model-based decision support,” Integrated Assessment, vol. 4, no. 1, pp. 5–17, 2003.

[4] J. Yan and J. R. Deller, “NARMAX model identification using a set-theoretic evolutionary approach,” Signal Processing, vol. 123, pp. 30–41, 2016.

[5] I. G. Kevrekidis, C. W. Gear, and G. Hummer, “Equation-free: The computer-aided analysis of complex multiscale systems,” AIChE Journal, vol. 50, no. 7, pp. 1346–1355, 2004.

[6] H. Ihshaish, A. Cortes, and M. A. Senar, “Parallel Multi-level Genetic Ensemble for Numerical Weather Prediction Enhancement,” Procedia Computer Science, vol. 9, pp. 276–285, 2012.

[7] G. Dumedah, “Formulation of the Evolutionary-Based Data Assimilation and its Implementation in Hydrological Forecasting,” Water Resources Management, vol. 26, no. 13, pp. 3853–3870, 2012.

[8] V. V. Kashirin, A. A. Lantseva, S. V. Ivanov, S. V. Kovalchuk, and A. Boukhanovsky, “Evolutionary simulation of complex networks’ structures with specific functional properties,” Journal of Applied Logic, vol. 24, no. part A, pp. 39–49, 2017.

[9] S. V. Kovalchuk, P. A. Smirnov, K. V. Knyazkov, A. S. Zagarskikh, and A. V. Boukhanovsky, “Knowledge-Based Expressive Technologies Within Cloud Computing Environments,” in Practical Applications of Intelligent Systems, vol. 279 of Advances in Intelligent Systems and Computing, pp. 1–11, Springer, Berlin, Germany, 2014.

[10] A. A. Funkner, A. N. Yakovlev, and S. V. Kovalchuk, “Towards evolutionary discovery of typical clinical pathways in electronic health records,” Procedia Computer Science, vol. 119, pp. 234–244, 2017.

[11] S. V. Kovalchuk and A. V. Boukhanovsky, “Towards Ensemble Simulation of Complex Systems,” Procedia Computer Science, vol. 51, pp. 532–541, 2015.


[12] S. V. Kovalchuk, A. V. Krikunov, K. V. Knyazkov, and A. V. Boukhanovsky, “Classification issues within ensemble-based simulation: application to surge floods forecasting,” Stochastic Environmental Research and Risk Assessment, vol. 31, no. 5, pp. 1183–1197, 2017.

[13] D. Fogel, “Phenotypes, genotypes, and operators in evolutionary computation,” in Proceedings of the 1995 IEEE International Conference on Evolutionary Computation, p. 193, Perth, WA, Australia.

[14] S. V. Ivanov, S. V. Kovalchuk, and A. V. Boukhanovsky, “Workflow-based Collaborative Decision Support for Flood Management Systems,” Procedia Computer Science, vol. 18, pp. 2213–2222, 2013.

[15] A. Gusarov, A. Kalyuzhnaya, and A. Boukhanovsky, “Spatially adaptive ensemble optimal interpolation of in-situ observations into numerical vector field models,” in Proceedings of the 6th International Young Scientist Conference on Computational Science, YSC 2017, pp. 325–333, Finland, November 2017.

[16] A. V. Krikunov, E. V. Bolgova, E. Krotov, T. M. Abuhay, A. N. Yakovlev, and S. V. Kovalchuk, “Complex data-driven predictive modeling in personalized clinical decision support for Acute Coronary Syndrome episodes,” in Proceedings of the International Conference on Computational Science, ICCS 2016, pp. 518–529, USA, June 2016.

[17] S. V. Kovalchuk, A. A. Funkner, O. G. Metsker, and A. N. Yakovlev, “Simulation of patient flow in multiple healthcare units using process and data mining techniques for model identification,” Journal of Biomedical Informatics, vol. 82, pp. 128–142, 2018.

[18] A. Yakovlev, O. Metsker, S. Kovalchuk, and E. Bologova, “Prediction of in-hospital mortality and length of stay in acute coronary syndrome patients using machine-learning methods,” Journal of the American College of Cardiology, vol. 71, no. 11, p. A242, 2018.

[19] A. Nikishova, A. Kalyuzhnaya, A. Boukhanovsky, and A. Hoekstra, “Uncertainty quantification and sensitivity analysis applied to the wind wave model SWAN,” Environmental Modelling & Software, vol. 95, pp. 344–357, 2017.

[20] E. Rojas, J. Munoz-Gama, M. Sepulveda, and D. Capurro, “Process mining in healthcare: A literature review,” Journal of Biomedical Informatics, vol. 61, pp. 224–236, 2016.

[21] T. Sinha, P. Jermann, N. Li, and P. Dillenbourg, “Your click decides your fate: Inferring Information Processing and Attrition Behavior from MOOC Video Clickstream Interactions,” in Proceedings of the EMNLP 2014 Workshop on Analysis of Large Scale Social Interaction in MOOCs, pp. 3–14, Doha, Qatar, October 2014.

[22] C. Marceau, “Characterizing the Behavior of a Program Using Multiple-Length N-Grams,” Defense Technical Information Center, 2005.

[23] S. V. Kovalchuk, O. G. Metsker, A. A. Funkner et al., “Towards management of complex modeling through a hybrid evolutionary identification,” in Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 255-256, Kyoto, Japan, July 2018.


Page 13: A Conceptual Approach to Complex Model Management with ...downloads.hindawi.com/journals/complexity/2018/5870987.pdf · Parameters Functional characteristics Structure S D M Layers

Complexity 13

CPP152

PPR

187

PRR

79

CRR

82

CCP

56

CCR34

213

1994

87

PPP

1054

660

160

122

67953701

2195

1168

3705

50512

66

2244

2180

RRR

989

49

484654

CCC

736

25056

601

274549

48

60

1013

1999

Cluster 3 1120 users

228

CPR

Figure 13 Example of process model for cluster 3 using n-gram analysis

Table 1 Mean of activitiesrsquo combinations for usersrsquo clusters

Cluster Size CCC CCP CCR CPP CPR CRR PPP PPR PRR RRR1 5238 052 074 024 1 072 042 668 64 756 8532 2110 011 013 016 016 036 059 209 395 1232 6033 1120 091 138 014 362 075 018 5002 1178 5 277

and visualized with expanded cycles (a) and with collapsedcycles (b) The second one could be considered as morerelevant than the first one which is significantly affected bya length of selected history It is natural to consider it as arandom process or state-transition model In that case threeidentified clusters (characterized by various frequencies oftransitions) could be interpreted as typical behavior models

N-grams analysis is often used to detect patterns in peo-plersquos behaviors [21 22] N-grams analysis is based on countingfrequencies of combinations or sequences of activities Wecollected all sorted 3-grams (so called 3-sets) for each userrsquossequence to analyze the frequency of event combinationsAs a result three clusters of vectors with 3-sets chains wereidentified with k-means clustering method Figure 13 showsall combinations and transitions between them for cluster 3as an example Using Figure 12 and Table 1 it is possible to seethat cluster 3 includes users who oftenmake new records (P)and sometimes comment records (C) So cluster 3 mostlyconsists of ldquobloggersrdquo Cluster 2 includes ldquospreadersrdquo whocopy other records (R) frequently And the biggest cluster1 consists of people who make new records and copy otherones equally but less intensively comparing to other clusters

That may be considered as a typical behavior for user ofOSN N-grams analysis allows detecting typical behavioralpatterns and obtaining process models for social mediaactivities using chains of different lengths as input data Thusthis type of data-driven modeling is more appropriate toresearch continuous processes Figure 14 shows a graph-basedrepresentation of process space with of all usersrsquo patterns

This subsection provides very early results Next stepwithin application of the proposed approach in this appli-cation includes an extension of process model structure (a)with temporal labeling (gaps between events) (b) consideringprocess within a sliding time window to get more structuredprocesses (c) linking the model with causal inference (d)introduction ofDM techniques for EC positioning of ongoingprocesses in model space We believe that these extensionscould enhance discovery of model structure (P4) and providedeeper insight on social media activity investigation

4 Conclusion and Future Work

Thedevelopment of the proposed approach is still an ongoingproject We aimed for further systematization and detailing

14 Complexity

Cluster 3

Cluster 1

Cluster 2

Figure 14 Graph-based representation of processesrsquo space for three clusters in social media activity

of the proposed concepts methods and algorithms as wellas more comprehensive and deeper implementation of EC-based applications Further work of the development includesthe following directions

(i) dualization on the role of data-driven and intelligentoperations in proposed approach and described pat-terns

(ii) extended analysis of various EC techniques applicablewithin the approach

(iii) investigation on EC-based discovery for models ofcomplex systems with lack or inconsistent observa-tions

(iv) detailed formalization of expertise and knowledge-based methods within the approach

(v) extending the approach with interactive user-centered modelling and phase space analysis

(vi) development of multilayered approach for decisionsupport and control of system and process 119878 availabledata 119863 and complex model119872

Data Availability

The data used in the examples presented in Section 3 wereinitially obtained within complimentary projects performedby the authors of the paper The data is available from thecorresponding author upon request after explicit claiming ofthe purpose and plan of requested data usage to check forpossible violation of the corresponding projectsrsquo rules

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

This paper presents an extension and further developmentof the work [23] This research is financially supported byThe Russian Scientific Foundation Agreement 14-11-00823(15072014)

References

[1] N Boccara Modeling complex systems Graduate Texts inPhysics Springer New York Second edition 2010

[2] HMcManus and D Hastings ldquoA framework for understandinguncertainty and its mitigation and exploitation in complexsystemsrdquo IEEE Engineering Management Review vol 34 no 3pp 81ndash94 2006

[3] W Walker P Harremoes J Rotmans et al ldquoDefining uncer-tainty a conceptual basis for uncertainty management inmodel-based decision supportrdquo Integrated Assessment vol 4no 1 pp 5ndash17 2003

[4] J Yan and J R Deller ldquoNARMAXmodel identification using aset-theoretic evolutionary approachrdquo Signal Processing vol 123pp 30ndash41 2016

[5] I G Kevrekidis C W Gear and G Hummer ldquoEquation-freeThe computer-aided analysis of complex multiscale systemsrdquoAIChE Journal vol 50 no 7 pp 1346ndash1355 2004

[6] H Ihshaish A Cortes and M A Senar ldquoParallel Multi-level Genetic Ensemble for Numerical Weather PredictionEnhancementrdquo Procedia Computer Science vol 9 pp 276ndash2852012

[7] G Dumedah ldquoFormulation of the Evolutionary-Based DataAssimilation and its Implementation in Hydrological Forecast-ingrdquo Water Resources Management vol 26 no 13 pp 3853ndash3870 2012

[8] V V Kashirin A A Lantseva S V Ivanov S V Kovalchuk andA Boukhanovsky ldquoEvolutionary simulation of complex net-worksrsquo structures with specific functional propertiesrdquo Journal ofApplied Logic vol 24 no part A pp 39ndash49 2017

[9] S V Kovalchuk P A Smirnov KV Knyazkov A S Zagarskikhand A V Boukhanovsky ldquoKnowledge-Based Expressive Tech-nologies Within Cloud Computing Environmentsrdquo in PracticalApplications of Intelligent Systems vol 279 of Advances inIntelligent Systems and Computing pp 1ndash11 Springer BerlinGermany 2014

[10] A A Funkner A N Yakovlev and S V Kovalchuk ldquoTowardsevolutionary discovery of typical clinical pathways in electronichealth recordsrdquo Procedia Computer Science vol 119 pp 234ndash244 2017

[11] S V Kovalchuk and A V Boukhanovsky ldquoTowards EnsembleSimulation of Complex Systemsrdquo Procedia Computer Sciencevol 51 pp 532ndash541 2015

Complexity 15

[12] S V Kovalchuk A V Krikunov K V Knyazkov and A VBoukhanovsky ldquoClassification issues within ensemble-basedsimulation application to surge floods forecastingrdquo StochasticEnvironmental Research and Risk Assessment vol 31 no 5 pp1183ndash1197 2017

[13] D Fogel ldquoPhenotypes genotypes and operators in evolution-ary computationrdquo in Proceedings of the 1995 IEEE InternationalConference on Evolutionary Computation p 193 Perth WAAustralia

[14] S V Ivanov S V Kovalchuk and A V BoukhanovskyldquoWorkflow-based Collaborative Decision Support for FloodManagement Systemsrdquo Procedia Computer Science vol 18 pp2213ndash2222 2013

[15] A Gusarov A Kalyuzhnaya and A Boukhanovsky ldquoSpatiallyadaptive ensemble optimal interpolation of in-situ observationsinto numerical vector field modelsrdquo in Proceedings of the6th International Young Scientist Conference on ComputationalScience YSC 2017 pp 325ndash333 Finland November 2017

[16] A V Krikunov E V Bolgova E Krotov T M AbuhayA N Yakovlev and S V Kovalchuk ldquoComplex data-drivenpredictive modeling in personalized clinical decision supportfor Acute Coronary Syndrome episodesrdquo in Proceedings of theInternational Conference on Computational Science ICCS 2016pp 518ndash529 USA June 2016

[17] S V Kovalchuk A A Funkner O G Metsker and A NYakovlev ldquoSimulation of patient flow in multiple healthcareunits using process and data mining techniques for modelidentificationrdquo Journal of Biomedical Informatics vol 82 pp128ndash142 2018

[18] A Yakovlev O Metsker S Kovalchuk and E BologovaldquoPrediction of in-hospital mortality and length of stay in acutecoronary syndrome patients using machine-learningmethodsrdquoJournal of the American College of Cardiology vol 71 no 11 pA242 2018

[19] A Nikishova A Kalyuzhnaya A Boukhanovsky and A Hoek-stra ldquoUncertainty quantification and sensitivity analysis appliedto the wind wave model SWANrdquo Environmental Modelling ampSo13ware vol 95 pp 344ndash357 2017

[20] E Rojas J Munoz-Gama M Sepulveda and D CapurroldquoProcess mining in healthcare A literature reviewrdquo Journal ofBiomedical Informatics vol 61 pp 224ndash236 2016

[21] T Sinha P Jermann N Li and P Dillenbourg ldquoYour clickdecides your fate Inferring Information Processing and Attri-tion Behavior from MOOC Video Clickstream Interactionsrdquoin Proceedings of the EMNLP 2014 Workshop on Analysis ofLarge Scale Social Interaction in MOOCs pp 3ndash14 Doha QatarOctober 2014

[22] C Marceau ldquoCharacterizing the Behavior of a Program UsingMultiple-Length N-Gramsrdquo Defense Technical InformationCenter 2005

[23] S V Kovalchuk O G Metsker A A Funkner et al ldquoTowardsmanagement of complex modeling through a hybrid evolu-tionary identificationrdquo in Proceedings of the the Genetic andEvolutionary Computation Conference Companion pp 255-256Kyoto Japan July 2018

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 14: A Conceptual Approach to Complex Model Management with ...downloads.hindawi.com/journals/complexity/2018/5870987.pdf · Parameters Functional characteristics Structure S D M Layers

14 Complexity

Cluster 3

Cluster 1

Cluster 2

Figure 14 Graph-based representation of processesrsquo space for three clusters in social media activity

of the proposed concepts methods and algorithms as wellas more comprehensive and deeper implementation of EC-based applications Further work of the development includesthe following directions

(i) dualization on the role of data-driven and intelligentoperations in proposed approach and described pat-terns

(ii) extended analysis of various EC techniques applicablewithin the approach

(iii) investigation on EC-based discovery for models ofcomplex systems with lack or inconsistent observa-tions

(iv) detailed formalization of expertise and knowledge-based methods within the approach

(v) extending the approach with interactive user-centered modelling and phase space analysis

(vi) development of multilayered approach for decisionsupport and control of system and process 119878 availabledata 119863 and complex model119872

Data Availability

The data used in the examples presented in Section 3 wereinitially obtained within complimentary projects performedby the authors of the paper The data is available from thecorresponding author upon request after explicit claiming ofthe purpose and plan of requested data usage to check forpossible violation of the corresponding projectsrsquo rules

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

This paper presents an extension and further developmentof the work [23] This research is financially supported byThe Russian Scientific Foundation Agreement 14-11-00823(15072014)

References

[1] N Boccara Modeling complex systems Graduate Texts inPhysics Springer New York Second edition 2010

[2] HMcManus and D Hastings ldquoA framework for understandinguncertainty and its mitigation and exploitation in complexsystemsrdquo IEEE Engineering Management Review vol 34 no 3pp 81ndash94 2006

[3] W Walker P Harremoes J Rotmans et al ldquoDefining uncer-tainty a conceptual basis for uncertainty management inmodel-based decision supportrdquo Integrated Assessment vol 4no 1 pp 5ndash17 2003

[4] J Yan and J R Deller ldquoNARMAXmodel identification using aset-theoretic evolutionary approachrdquo Signal Processing vol 123pp 30ndash41 2016

[5] I G Kevrekidis C W Gear and G Hummer ldquoEquation-freeThe computer-aided analysis of complex multiscale systemsrdquoAIChE Journal vol 50 no 7 pp 1346ndash1355 2004

[6] H Ihshaish A Cortes and M A Senar ldquoParallel Multi-level Genetic Ensemble for Numerical Weather PredictionEnhancementrdquo Procedia Computer Science vol 9 pp 276ndash2852012

[7] G Dumedah ldquoFormulation of the Evolutionary-Based DataAssimilation and its Implementation in Hydrological Forecast-ingrdquo Water Resources Management vol 26 no 13 pp 3853ndash3870 2012

[8] V V Kashirin A A Lantseva S V Ivanov S V Kovalchuk andA Boukhanovsky ldquoEvolutionary simulation of complex net-worksrsquo structures with specific functional propertiesrdquo Journal ofApplied Logic vol 24 no part A pp 39ndash49 2017

[9] S V Kovalchuk P A Smirnov KV Knyazkov A S Zagarskikhand A V Boukhanovsky ldquoKnowledge-Based Expressive Tech-nologies Within Cloud Computing Environmentsrdquo in PracticalApplications of Intelligent Systems vol 279 of Advances inIntelligent Systems and Computing pp 1ndash11 Springer BerlinGermany 2014

[10] A A Funkner A N Yakovlev and S V Kovalchuk ldquoTowardsevolutionary discovery of typical clinical pathways in electronichealth recordsrdquo Procedia Computer Science vol 119 pp 234ndash244 2017

[11] S V Kovalchuk and A V Boukhanovsky ldquoTowards EnsembleSimulation of Complex Systemsrdquo Procedia Computer Sciencevol 51 pp 532ndash541 2015

Complexity 15

[12] S V Kovalchuk A V Krikunov K V Knyazkov and A VBoukhanovsky ldquoClassification issues within ensemble-basedsimulation application to surge floods forecastingrdquo StochasticEnvironmental Research and Risk Assessment vol 31 no 5 pp1183ndash1197 2017

[13] D Fogel ldquoPhenotypes genotypes and operators in evolution-ary computationrdquo in Proceedings of the 1995 IEEE InternationalConference on Evolutionary Computation p 193 Perth WAAustralia

[14] S V Ivanov S V Kovalchuk and A V BoukhanovskyldquoWorkflow-based Collaborative Decision Support for FloodManagement Systemsrdquo Procedia Computer Science vol 18 pp2213ndash2222 2013

[15] A Gusarov A Kalyuzhnaya and A Boukhanovsky ldquoSpatiallyadaptive ensemble optimal interpolation of in-situ observationsinto numerical vector field modelsrdquo in Proceedings of the6th International Young Scientist Conference on ComputationalScience YSC 2017 pp 325ndash333 Finland November 2017

[16] A V Krikunov E V Bolgova E Krotov T M AbuhayA N Yakovlev and S V Kovalchuk ldquoComplex data-drivenpredictive modeling in personalized clinical decision supportfor Acute Coronary Syndrome episodesrdquo in Proceedings of theInternational Conference on Computational Science ICCS 2016pp 518ndash529 USA June 2016

[17] S V Kovalchuk A A Funkner O G Metsker and A NYakovlev ldquoSimulation of patient flow in multiple healthcareunits using process and data mining techniques for modelidentificationrdquo Journal of Biomedical Informatics vol 82 pp128ndash142 2018

[18] A Yakovlev O Metsker S Kovalchuk and E BologovaldquoPrediction of in-hospital mortality and length of stay in acutecoronary syndrome patients using machine-learningmethodsrdquoJournal of the American College of Cardiology vol 71 no 11 pA242 2018

[19] A Nikishova A Kalyuzhnaya A Boukhanovsky and A Hoek-stra ldquoUncertainty quantification and sensitivity analysis appliedto the wind wave model SWANrdquo Environmental Modelling ampSo13ware vol 95 pp 344ndash357 2017

[20] E Rojas J Munoz-Gama M Sepulveda and D CapurroldquoProcess mining in healthcare A literature reviewrdquo Journal ofBiomedical Informatics vol 61 pp 224ndash236 2016

[21] T Sinha P Jermann N Li and P Dillenbourg ldquoYour clickdecides your fate Inferring Information Processing and Attri-tion Behavior from MOOC Video Clickstream Interactionsrdquoin Proceedings of the EMNLP 2014 Workshop on Analysis ofLarge Scale Social Interaction in MOOCs pp 3ndash14 Doha QatarOctober 2014

[22] C Marceau ldquoCharacterizing the Behavior of a Program UsingMultiple-Length N-Gramsrdquo Defense Technical InformationCenter 2005

[23] S V Kovalchuk O G Metsker A A Funkner et al ldquoTowardsmanagement of complex modeling through a hybrid evolu-tionary identificationrdquo in Proceedings of the the Genetic andEvolutionary Computation Conference Companion pp 255-256Kyoto Japan July 2018

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 15: A Conceptual Approach to Complex Model Management with ...downloads.hindawi.com/journals/complexity/2018/5870987.pdf · Parameters Functional characteristics Structure S D M Layers

Complexity 15

[12] S V Kovalchuk A V Krikunov K V Knyazkov and A VBoukhanovsky ldquoClassification issues within ensemble-basedsimulation application to surge floods forecastingrdquo StochasticEnvironmental Research and Risk Assessment vol 31 no 5 pp1183ndash1197 2017

[13] D Fogel ldquoPhenotypes genotypes and operators in evolution-ary computationrdquo in Proceedings of the 1995 IEEE InternationalConference on Evolutionary Computation p 193 Perth WAAustralia

[14] S V Ivanov S V Kovalchuk and A V BoukhanovskyldquoWorkflow-based Collaborative Decision Support for FloodManagement Systemsrdquo Procedia Computer Science vol 18 pp2213ndash2222 2013

[15] A Gusarov A Kalyuzhnaya and A Boukhanovsky ldquoSpatiallyadaptive ensemble optimal interpolation of in-situ observationsinto numerical vector field modelsrdquo in Proceedings of the6th International Young Scientist Conference on ComputationalScience YSC 2017 pp 325ndash333 Finland November 2017

[16] A V Krikunov E V Bolgova E Krotov T M AbuhayA N Yakovlev and S V Kovalchuk ldquoComplex data-drivenpredictive modeling in personalized clinical decision supportfor Acute Coronary Syndrome episodesrdquo in Proceedings of theInternational Conference on Computational Science ICCS 2016pp 518ndash529 USA June 2016

[17] S V Kovalchuk A A Funkner O G Metsker and A NYakovlev ldquoSimulation of patient flow in multiple healthcareunits using process and data mining techniques for modelidentificationrdquo Journal of Biomedical Informatics vol 82 pp128ndash142 2018

[18] A Yakovlev O Metsker S Kovalchuk and E BologovaldquoPrediction of in-hospital mortality and length of stay in acutecoronary syndrome patients using machine-learningmethodsrdquoJournal of the American College of Cardiology vol 71 no 11 pA242 2018

[19] A Nikishova A Kalyuzhnaya A Boukhanovsky and A Hoek-stra ldquoUncertainty quantification and sensitivity analysis appliedto the wind wave model SWANrdquo Environmental Modelling ampSo13ware vol 95 pp 344ndash357 2017

[20] E Rojas J Munoz-Gama M Sepulveda and D CapurroldquoProcess mining in healthcare A literature reviewrdquo Journal ofBiomedical Informatics vol 61 pp 224ndash236 2016

[21] T Sinha P Jermann N Li and P Dillenbourg ldquoYour clickdecides your fate Inferring Information Processing and Attri-tion Behavior from MOOC Video Clickstream Interactionsrdquoin Proceedings of the EMNLP 2014 Workshop on Analysis ofLarge Scale Social Interaction in MOOCs pp 3ndash14 Doha QatarOctober 2014

[22] C. Marceau, “Characterizing the Behavior of a Program Using Multiple-Length N-Grams,” Defense Technical Information Center, 2005.

[23] S. V. Kovalchuk, O. G. Metsker, A. A. Funkner et al., “Towards management of complex modeling through a hybrid evolutionary identification,” in Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 255–256, Kyoto, Japan, July 2018.

