Semantically Enhanced Model Experiment Evaluation Process (SeMEEP)

within the Atmospheric Chemistry Community

• Chris Martin 1,2, Mo Haji 2, Peter Dew 2, Peter Jimack 2, Mike Pilling 1

• 1 School of Chemistry, University of Leeds

• 2 School of Computing, University of Leeds

Outline of the Presentation

• Introduction

• Atmospheric community

• SeMEEP

• ELN Provenance capture

• Conclusion and next stage

Section 1 Overview

• Application domain – atmospheric community

– Reliance on computational models to evaluate data

• Motivation

– Study how to transition from today's ad-hoc processes and practices

– To a sustainable process of

• Gathering, community evaluation and sharing of data & models between scientists

• Minimising changes to the proven working practices of the scientist

• Within world-wide co-laboratories

Related projects

• CombeChem

– Experimental organic chemistry

– From source to long-term data preservation (knowledge)

– Semantically-enabled ELN

– Data-driven workflow

• Collaboratory for Multi-Scale Chemical Science

– Multi-layer chemical model

• myGrid

– Bio-informatics and related areas (semantic pattern matching)

– Reusable semantic workflow using SMD (semantic metadata)

– Data quality

• Karama2

– Weather forecasting: computational modelling

– Data-driven workflow

[Figure: chain of modelling scales: Quantum Chemistry, Thermochemistry, Kinetics, Mechanism, Reacting Flow Simulation]

Section 2 Atmospheric Chemistry

• Seeks to understand the chemical processes (reactions) taking place in the lower atmosphere (e.g. smoke)

• It has significant implications for both:

– Air Quality

– Climate Change

The Master Chemical Mechanism (MCM)

• Data repository of elementary chemical reactions & rate constants

• The mechanism is described by a computational model that is evaluated against experimental data

– Chamber experiments

– Field experiments

[Figure: Methyl glyoxal, 27.11.06. MGLYOX / ppbv (0 to 140) plotted against time / s (0 to 40000); MCMv3.1 model output compared with measured values (calibrated using isoprene)]
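
Evaluating the model against chamber or field measurements can be illustrated with a minimal sketch. It assumes that the model output and the measurements are sampled at the same times, and it uses root-mean-square error and mean bias as comparison metrics. The slides do not say which metrics the MCM evaluation uses, and the concentrations below are invented for illustration.

```python
import math


def rmse(model, measured):
    """Root-mean-square error between two equally sampled series."""
    if len(model) != len(measured):
        raise ValueError("series must have the same length")
    squared = [(m - o) ** 2 for m, o in zip(model, measured)]
    return math.sqrt(sum(squared) / len(squared))


def mean_bias(model, measured):
    """Mean of (model - measured); positive means the model over-predicts."""
    if len(model) != len(measured):
        raise ValueError("series must have the same length")
    return sum(m - o for m, o in zip(model, measured)) / len(model)


# Hypothetical concentrations in ppbv at four common sample times.
model_ppbv = [0.0, 20.0, 45.0, 70.0]
measured_ppbv = [0.0, 18.0, 40.0, 60.0]

print("RMSE / ppbv:", rmse(model_ppbv, measured_ppbv))
print("Mean bias / ppbv:", mean_bias(model_ppbv, measured_ppbv))
```

A summary number like this does not replace the provenance: which mechanism version, parameters and calibration produced each series still has to be recorded alongside it.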

Section 3 SeMEEP

• Today

– Typically, within the atmospheric chemistry community, provenance is recorded in an ad-hoc, unstructured fashion, using a combination of traditional lab-books, word-processing documents and spreadsheets.

• Move to a more sustainable evaluation process that supports the gathering, evaluation and sharing of data and models

• Using semantic metadata

SeMEEP Vision

[Figure: SeMEEP at the centre, connecting scientist(s) with personal ELNs, laboratory database(s), a shared community semantic database, public database(s) and community evaluation (people), with data managers (including a community data manager) between them]

• SeMEEP: semantically-enabled MEEP

– Supports the organisation of information and, critically, records its provenance (for example, to recover secondary data)

Mike Pilling: “SeMEEP approach will radically enhance the effectiveness of a research community to deliver new science”

Requirements for metadata capture for elementary reactions

[Figure: metadata capture for elementary reactions. Labels: Physical Experiment; Raw Data and Metadata; Analysis Process; Process Data, e.g. k(T, p); Publication and Metadata; ELN; Historical Data; Theory (e.g. quantum mechanics); From other labs; IUPAC (kinetics; International Union of Pure and Applied Chemistry); Community evaluation (subjective; may be partial information)]

• Only published data

• Rate constants from several labs

• No access to the raw data

• No access to secondary data

• SeMEEP will provide this.

Current Evaluation Processes for the MCM

Envisioned Evaluation Processes

[Figure: envisioned evaluation process. Labels: Data sources (Laboratory Archive, Community Semantic Database, Scientist’s Personal ELN Archive, Workgroup database); Inputs to the modelling process (benchmark data, model parameter sets, etc.); ELN Capture of the Model Development Provenance (Model Development, Model Execution, Analysis); Links to experimental data and provenance generation processes; Community Evaluation (subjective); SeMEEP; Semantic-enabled ELN]

Section 4 Electronic Lab-Books (ELNs)

• ELNs address the limitations of the current methods of provenance capture.

• Southampton ELN for organic chemistry experiments.

• Benefits to the modeller

– Modelling process can be automatically captured

– Searchable

– Remote access is possible

– Provenance is structured

– Possible to use resolvable references to resources
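
The benefits listed above (structured, searchable provenance with resolvable references) can be illustrated with a minimal sketch. It is not the Southampton ELN or the prototype described in these slides; the `ProvenanceEntry` and `Notebook` names, their fields and the example.org URL are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class ProvenanceEntry:
    """One structured lab-book entry (hypothetical schema)."""
    action: str                      # what the modeller did
    note: str                        # free-text annotation
    resources: List[str] = field(default_factory=list)  # resolvable references (URLs)
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class Notebook:
    """In-memory stand-in for an ELN archive."""

    def __init__(self):
        self.entries: List[ProvenanceEntry] = []

    def record(self, action, note, resources=()):
        entry = ProvenanceEntry(action, note, list(resources))
        self.entries.append(entry)
        return entry

    def search(self, term):
        """Case-insensitive search over actions, notes and references."""
        term = term.lower()
        return [
            e for e in self.entries
            if term in e.action.lower()
            or term in e.note.lower()
            or any(term in r.lower() for r in e.resources)
        ]


notebook = Notebook()
notebook.record(
    "edit reaction",
    "Changed rate coefficient after group discussion",
    ["https://example.org/mechanism/v2"],  # placeholder URL, not a real resource
)
notebook.record("run model", "Baseline run with unchanged parameters")

for hit in notebook.search("rate"):
    print(hit.timestamp.isoformat(), hit.action, hit.resources)
```

Because each entry has named fields rather than free-form handwriting, the same records can be searched, accessed remotely or linked to from a publication.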

Will users attach quality metadata?

• Motivate users:

– By demonstrating the value of provenance in their day-to-day work

• Writing publications

• Managing their data

• Reinterpreting the data

– Management

– Publishers

The Modelling Process - A Three Layer Mapping

[Figure: three-layer mapping of the modelling process]

• Experiment Layer: Experiment Plan, Experiment, Experiment Conclusions

• Modelling Iteration Layer: each Modelling Iteration has an Iteration Plan and an Iteration Conclusion / Plan for Iteration n + 1

• Modelling Layer: Model Development, Model Execution, Analysis, with Model Parameters and Model Output passed between them

– Inputs include: model source code, model output data from previous iterations, external data sources, ...
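
The three-layer mapping can be sketched as a nested data structure in which an experiment contains modelling iterations, each iteration contains modelling-layer steps, and the conclusion of iteration n becomes the plan for iteration n + 1. The class and field names below are assumptions made for illustration and are not taken from the prototype.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModellingStep:
    """Modelling layer: one development, execution or analysis step."""
    kind: str    # "development", "execution" or "analysis"
    detail: str  # e.g. parameters used or output produced


@dataclass
class ModellingIteration:
    """Modelling iteration layer: plan, steps, conclusion."""
    number: int
    plan: str
    steps: List[ModellingStep] = field(default_factory=list)
    conclusion: Optional[str] = None


@dataclass
class Experiment:
    """Experiment layer: overall plan, iterations, overall conclusions."""
    plan: str
    iterations: List[ModellingIteration] = field(default_factory=list)
    conclusions: Optional[str] = None

    def start_iteration(self) -> ModellingIteration:
        # The first iteration inherits the experiment plan; later ones
        # take the previous iteration's conclusion as their plan.
        if self.iterations:
            previous = self.iterations[-1]
            if previous.conclusion is None:
                raise ValueError("conclude the current iteration first")
            plan = previous.conclusion
        else:
            plan = self.plan
        iteration = ModellingIteration(len(self.iterations) + 1, plan)
        self.iterations.append(iteration)
        return iteration


experiment = Experiment(plan="Compare model output with chamber data")
first = experiment.start_iteration()
first.steps.append(ModellingStep("execution", "baseline parameters"))
first.conclusion = "Model over-predicts; revisit one rate coefficient"
second = experiment.start_iteration()
print(second.number, second.plan)
```

Requiring a conclusion before the next iteration starts is one possible design choice; it mirrors the prompts for annotation but is not something the slides prescribe.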

MCM Mechanism being investigated

ELN Process

[Figure: ELN process diagram. Labels: Planning the Scientific Process (Modelling Plan, Ontology); Scientific Process (Mechanism Editing, Model Execution, Model Output Analysis); Automatic Metadata Capture (capture metadata at run time; compare mechanism version n with mechanism version n-1 to generate metadata); User Annotation; Metadata Storage]
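
Comparing mechanism version n with version n-1 to generate metadata can be sketched as a simple diff. The sketch assumes a mechanism can be represented as a mapping from a reaction string to a rate expression string; this representation, the function name and the placeholder reactions are illustrative assumptions, not the MCM file format or the prototype's code.

```python
def diff_mechanisms(previous, current):
    """Return one metadata record per added, removed or edited reaction."""
    changes = []
    for reaction in sorted(previous.keys() | current.keys()):
        old_rate = previous.get(reaction)
        new_rate = current.get(reaction)
        if old_rate is None:
            changes.append({"change": "added", "reaction": reaction,
                            "new_rate": new_rate})
        elif new_rate is None:
            changes.append({"change": "removed", "reaction": reaction,
                            "old_rate": old_rate})
        elif old_rate != new_rate:
            changes.append({"change": "edited", "reaction": reaction,
                            "old_rate": old_rate, "new_rate": new_rate})
    return changes


# Placeholder species and rate expressions, not real chemistry.
version_n_minus_1 = {"A + B -> C": "k1", "C -> D": "k2"}
version_n = {"A + B -> C": "k1_revised", "D -> E": "k3"}

for record in diff_mechanisms(version_n_minus_1, version_n):
    # In an ELN, each record could trigger a prompt asking the user
    # to annotate why the change was made.
    print(record)
```

The automatically generated records say what changed; the user annotation supplies why.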

ELN Screenshots

• Prompts displayed when changing the chemical mechanism:

– Editing a reaction

– Adding a new reaction

ELN Modelling SMD Architecture

[Figure: layered architecture diagram with the following components]

• User Interface: workflow constructor, annotation interface, database query & retrieval

• SMD modelling sub-system

– SMD creation (e.g. data-driven workflow)

– Context ontology (e.g. materials/process)

– 3-level scientific services (model development, execution, analysis)

– Data storage (SMD, model output & analysis)

• Semantic metadata level

– SMD middleware services (e.g. ontology services, query, etc.)

– DL-based reasoner

• Grid fabrics: simulation server
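
The query and retrieval side of the architecture can be illustrated with a minimal sketch that stores semantic metadata as subject-predicate-object triples and matches them against a pattern. Triples are an assumption here: the slides do not state the storage model, and the identifiers and predicate names below are invented.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def match(triples: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


# Hypothetical semantic metadata captured during one modelling run.
store: List[Triple] = [
    ("run:1", "usedMechanism", "mechanism:v2"),
    ("run:1", "producedOutput", "output:1"),
    ("mechanism:v2", "derivedFrom", "mechanism:v1"),
]

# Which mechanism did run:1 use, and what was that mechanism derived from?
for _, _, mechanism in match(store, subject="run:1", predicate="usedMechanism"):
    print(mechanism, match(store, subject=mechanism, predicate="derivedFrom"))
```

A DL-based reasoner would go further than this pattern matching, for example by inferring relationships that are not stored explicitly.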

Evaluation Methodology

• In-depth interviews with members of the atmospheric chemistry model group here at Leeds, covering:

– Demonstration of the prototype

– User testing of the prototype

– Discussion of scenarios involving the use of the prototype (e.g. )

• Analysis

– Interviews recorded and transcribed

– Analysed using techniques from grounded theory

Evaluation

Barriers to adoption:

– Effort required at modelling time for provenance capture

• “[in] your lab book you can write down whatever you want [but with an ELN] it is going to take time to go through the different protocol steps”.

– When asked if they would use an ELN requiring a similar amount of user input to the prototype the response was positive:

• “Yeah, I think it would be a good thing. I don’t think it is too much extra … work.”

– Rather than viewing the prompts for user annotation as an interruption to their normal work, the user recognised the value of being prompted

• “is a good way to do it because otherwise you won’t [record the provenance].”

Evaluation

• Users intuitively grasped the benefits of recording provenance with an ELN, and understood that these benefits would be realised after the time of modelling by a number of stakeholders:

– “if someone else wants to look at … [your provenance], that’s great because the person can see exactly what you have done, where you have been and where to go next. And for yourself, if you are writing up a PhD ... [you can] … see exactly what you’ve done whereas currently you have to rifle through lab-books to see exactly what you have done.”

Section 5 Conclusions and future work

• Outlined SeMEEP and ELN

– User evaluation of the proposed modelling ELN

• Addressed case studies

– IUPAC

– MCM

• Developing a case study with the Geomagnetic community

• User and System issues

– Application of activity theory to capture requirements and user evaluation

– Querying and inference

– Address QoS issues (e.g. security, scalability, dynamic role-based access control)

Questions
