PaNdata Barcelona Meeting Welcome 17-18 September 2009


Page 1: PaNdata Barcelona Meeting Welcome

PaNdata Barcelona Meeting

Welcome 17-18 September 2009

Page 2: PaNdata Barcelona Meeting Welcome

Agenda

• Thursday
  – Ongoing activities
    • everything except the proposal
• Friday
  – Proposal

Page 3: PaNdata Barcelona Meeting Welcome

Thursday

11:00 - 13:00 First Session
• Data Policy - next steps (Michael Wilson – apologies)
• Standards (Mark Koennecke)
  – ESRF use of Nexus (Armando Solé)
13:00 - 15:00 Lunch
15:00 - 17:00 Second Session
• ICAT - feedback from developers meeting and plans (Tom Griffin)
• Software catalogue - status and plans (Jean-Francoise Perrin)

Page 4: PaNdata Barcelona Meeting Welcome

Friday

09:30 - 11:00 Session Three
• Website and Wiki (Michael Gleaves)
• Review Actions
• I3 or CSA
  – Line 1.2.3 “Virtual Research Communities”
  – Line 3.3 “Coordination Actions...”
• Scope for proposal (What’s in scope and what’s out)
11:00 - 11:30 Coffee Break
11:30 - 13:00 Session Four
• Activities/Workpackages
• Preparing the document
• Review actions
13:00 - 14:30 Lunch
Depart.

Page 5: PaNdata Barcelona Meeting Welcome

Policy

Status and next steps

Mgmt Support Users

Implementation

Page 6: PaNdata Barcelona Meeting Welcome

Policy Framework discussion
1. Issue current draft (1 Oct)
2. Agree amongst ourselves (15 Oct) (Michael W)
3. Informal pass through management
4. Consultation with in-house scientists
5. Consultation with users?
6. Revise
7. Go to 2

Page 7: PaNdata Barcelona Meeting Welcome

Policy Framework issues
• Do variables (eg embargo period) have to be standardised to ensure fairness?
• Might it be necessary to have different embargo periods for different experiments?
• Could users define their own embargo period as part of the application/evaluation?
• Put the variables in a table separate from the text
  - State the principle that we are working towards common numbers

Page 8: PaNdata Barcelona Meeting Welcome

Nexus issues
• How do we break down the hurdles into small steps?
• Nexus person
  – What will they do?
  – Can we get someone/somewhere?
• Can applications use Nexus?
• Synchrotron person joining NIAC?
• Mark to: (Christmas latest, probably 31 Oct)
  – Produce meta plan of steps to getting a Nexus person
  – Produce job description for Nexus person
• Pandata Data formats “developers” workshop
  – Presentations on
    • Needs of synchrotrons (Armando)
    • Nexus
  – (Mark, Freddie and Armando to organise?)
• Larger meeting?
  – open (<30 people), @PSI? Spring 2010

Page 9: PaNdata Barcelona Meeting Welcome

ICAT status and plans

• Monthly releases of ICAT
• Put info on wiki about Identity Mgmt project (Rudolf)

Page 10: PaNdata Barcelona Meeting Welcome

SW CAT
• Catalogue for neutron fairly well populated
• Not much for synchrotrons
• Catalogue could be completed
  – Everyone to provide info on s/w (15 Oct)
  – And licenses to Mark/Jean-F (15 Oct)
• But not much more likely without manpower
• Forge exists for pandata (forge.ill.eu)
• Need to register for pan-data.eu
• Do we need it?
• Joint license negotiation (eg MatLab).
• Discussions on alternatives eg Scilab (free clone)

Page 11: PaNdata Barcelona Meeting Welcome

Web/Wiki

• Words for public part of wiki
• Contact Freddie about domain name
• Michael G 15 Oct

Page 12: PaNdata Barcelona Meeting Welcome

Preparing a proposal

• Which line
• Which type of project
• What scope
• What activities/workpackages
• Preparing the document

Page 13: PaNdata Barcelona Meeting Welcome

Which line: 1.2.3 or 3.3?

INFRA-2010-1.2.3: Virtual Research Communities
• Enable an increasing number of users and research communities from all science and engineering disciplines to access and use e-Infrastructures
• Remove the constraints of distance, access and usability as well as the barriers between disciplines for a more effective scientific collaboration and Innovation
• Deployment of e-Infrastructures in research communities to enable multidisciplinary collaboration
• Deployment of end-to-end e-Infrastructure services and tools for integrating and increasing research capacities
• Build user-configured virtual research facilities and test-beds from collection of diverse resources
• Integrate and interlink regional e-Infrastructures

The deployment and further evolution of e-Infrastructures addressing the research infrastructures of the ESFRI-roadmap is particularly encouraged.

Combination of Collaborative projects and Coordination and Support Actions (CP-CSA) (I3)
EUR 23 Million
Oversubscribed
Not specifically data centric

Page 14: PaNdata Barcelona Meeting Welcome

Which line: 1.2.3 or 3.3?

INFRA-2010-3.3: Coordination actions, conferences and studies supporting policy development, including international cooperation, for e-Infrastructures
• Enhance coordination between national and pan-European e-Infrastructure initiatives and programmes
• Strengthen the innovation potential and impact of e-Infrastructures
• Establish a new e-Infrastructures scientific software strategy in Europe in order to reinforce the global position of Europe
• Coordinate a European eco-system of scientific data repositories (preservation and sharing)
• Specific studies on e-Infrastructure related topics
• Dissemination of information on the e-Infrastructure programme and projects

International cooperation, including:
  – Further extension of e-Infrastructures to International Cooperation Partner countries (ICPC)
  – Joint roadmapping of activities with developed countries
  – Promotion of the interoperation between similar infrastructures on the global scale

EUR 10 M
Coordination and Support Actions – CA or SA

Page 15: PaNdata Barcelona Meeting Welcome

CSA

Coordination and Support Actions (CSA)

Support Measures
• Networking
• Coordination or support actions
  – (CSA-CA or CSA-SA)
• Management of the consortium

Financial model
• Reimbursement of indirect costs limited to 7% of the direct costs
  – (less subcontracting and third party contribution) for all participants

Page 16: PaNdata Barcelona Meeting Welcome

CA or SA

Coordination actions are designed to promote and support the networking and co-ordination of research and innovation activities (projects) at national, regional, European or international level over a fixed period

(at least 3 entities)

Support actions are designed to complement the other FP7 funding schemes. For example, they:

• underpin the implementation of the programme
• help in preparations for future Community research and technological development policy activities
• stimulate, encourage and facilitate the participation of SMEs, civil society organisations, small research teams, newly developed and remote research centres, as well as setting up research clusters across Europe
• cover one-off events or single purpose activities
• (at least one entity)

Page 17: PaNdata Barcelona Meeting Welcome

SA or CA

Funding Scheme Purpose
• Support to research activities and policies
• Coordination of research activities and policies

“Target” audience
Infrastructure operators, End-users (researchers in all fields of science and Engineering), Research institutes, Universities, Industry, including SMEs

Activities covered by EU contribution
• Conferences, seminars, workshops, working groups, studies, fact finding, monitoring, strategy development, awards and competitions, working or expert groups, operational support and dissemination, information and communication activities
• Networking, coordination and dissemination activities
• Management of the consortium

Form of reimbursement
• Based on eligible cost unless other forms are foreseen in the work programme

Average duration
• Between 9 and 30 months
• Between 18 and 30 months

Enlargement of partnership within the initial budget
  – NA

Specific characteristics
• No funding of research, development or demonstration
• Normally focused on one specific activity and often one specific event.
• Possibility of one single participant
• In FP6, SA typically had 1-15 participants and total EC contribution of 0.3-3 MEuro
• In FP6, CA typically had 13-26 participants and total EC contribution of 0.5-2 MEuro

Page 18: PaNdata Barcelona Meeting Welcome

CSA Evaluation Criteria

Scientific and technical quality
• Soundness of concept, and quality of objectives
• Contribution to the coordination of high quality research (CA only)
• Quality and effectiveness of the coordination/support action mechanisms and associated workplan

Implementation
• Appropriateness of the management structures and procedures
• Quality and relevant experience of the individual participants
• Quality of the consortium* as a whole (including complementarity, balance)
• Appropriateness of the allocation and justification of the resources to be committed (budget, staff, equipment)

Impact
• Contribution at the European or international level to the expected impacts listed in the workprogramme under the relevant activity
• Appropriateness of measures for spreading excellence, exploiting results and disseminating knowledge through engagement with stakeholders and the public at large

Page 19: PaNdata Barcelona Meeting Welcome

Scope/Activities
• Policy
• Standards
  – Nexus
  – Authentication?
  – Any others?
• Data catalogue?
• Data virtualisation?
• Software catalogue?
• Publication catalogue?
• Analysis (remote/parallel/integration)?

Page 20: PaNdata Barcelona Meeting Welcome

Networking Coordination Dissemination Mgmt

Policy Integration

Standards Events Events

Users Integration

Data Catalogue

Data virtualisation

Software cat

Software integration? Strategy/Roadmap

Publications ?

?

Page 21: PaNdata Barcelona Meeting Welcome

• Roadmap with big vision
• End-to-end integration of data pipeline
  – “From application to publication”
  – Goals
    • Support users doing analysis
      – Federated analysis services
      – Standardised software (accessibility, multiuse, licenses)
      – Open access to software
      – Audit trail, redo, provenance
    • Multiuse of data
      – Exchange of data/preserve
    • Better quality
      – Quicker (real-time) feedback
    • Efficiency
      – Virtualisation of hw/os to minimise dependencies
  – Needs progress on
    • Policy
    • Formats
    • Data volume estimates
    • Large data sets in the future
    • Long term preservation/access/combination
  – Control systems – Common user i/f – different underneath for instrument scientist.
  – Proposal systems? – no!

Page 22: PaNdata Barcelona Meeting Welcome

Analysis
– Objective
  • Analysis in place, in real-time – feedback for experimental tuning
– Current situation
  • Exists in some areas - consultation with users on what is needed in other areas
  • Tools for beamline staff and tools for users (visualisation – instant feedback - for diagnosis of beamline) - Common data format
  • On-line analysis eg for PX or Tomography
– Next step
  • “Standardised” software service
    – For ease for users
    – For efficiency of providers
– Integration of simulation and experimentation
– Presentation (Stephan)

Page 23: PaNdata Barcelona Meeting Welcome

Data formats
• Common data format
  – Preservation (rerunning old analysis)
    • Metadata on format version and sw version
    • Format version control
    • Sw version control and preservation
  – Interoperability
    • Converters
    • Adapt applications (what about new sw?)
    • Cost benefit analysis of adapting software
    • ? (new software uptake model)

Page 24: PaNdata Barcelona Meeting Welcome

Common Data Policy

• Uniform compliance with EU policy
• See earlier slide for steps
• Wider consultation outside consortium
  – Other EU initiatives
  – Other disciplines
• Comparison/consultation with the US
• Feedback to EU policy

Page 25: PaNdata Barcelona Meeting Welcome

Publications

• Link between publication and data
• To enable redoing of analysis
• Meta cat for pubs
• Investigate tech for linking
• Ids for data
• Grey literature (eg PhD data)

Page 26: PaNdata Barcelona Meeting Welcome

Planning

H-J, FrankS, JBic, Mjohnson

Consortium
• Prepare outline for new partners (30 Sept)
• Contact more partners (LLB, FRM2, Polish, EMBL?, ESS?, Bilbao?, - Heinz-J) 15 Oct
• SMEs
• Liaison with Matlab, IDL,

Proposal document
• Prepare skeleton
• Revise partner profiles 15 Oct
• Tele-Meeting schedule weekly teleconfs
• Weekly updates to consortium

Other
• Consult with Brussels (3.3)
  – CA or SA
  – SME and other industry partners.
• eIRG
• Events/meetings
• Espionage

Page 27: PaNdata Barcelona Meeting Welcome

Other

• Next face to face March?
• (hearing March?)

Page 28: PaNdata Barcelona Meeting Welcome

HDF5/NeXus

V.A. Solé – ESRF Software Group

NeXus discussion, Pandata, Sep. 2009

Page 29: PaNdata Barcelona Meeting Welcome

ESRF current situation: SPEC File Format

Advantages
• Simplicity (multiple column ASCII)
• Widespread
• Counters, Motors and MCA in same file

Disadvantages
• Not suited to large datasets (images)

Page 30: PaNdata Barcelona Meeting Welcome

ESRF current situation: ESRF Data Format

Advantages
• Suited to large data sets (images)

Disadvantages
• Not widespread (basically ESRF)
• Incomplete « official » metadata

Page 31: PaNdata Barcelona Meeting Welcome

Needs

• Efficient format to store different data types
  • Keep together counters, images, mca, …

• Compression support

• Widespread support

• Efficient and easy access to the data for visualization and analysis

HDF5

Page 32: PaNdata Barcelona Meeting Welcome

HDF5, why not NeXus?
• What we like about NeXus
  • Well defined classes
  • A lot of endless discussions avoided
• What we do not like about NeXus
  • A lot of endless discussions pending
  • No easy way to implement new needs
  • Can one claim everything is foreseen?
  • Misuse of NeXus groups: A new need should imply a new group
  • Slow reactivity

Page 33: PaNdata Barcelona Meeting Welcome

What do we propose?

• To foresee a new group for unforeseen uses (we have just called it Measurement)
  • It would prevent misuse of already defined groups
  • Common use could lead to definition of new instruments (Ex. MCA)
• Something as simple as grouping by data dimension solves several issues
  • Generic scan (common misuse of NXdata at most synchrotrons)
  • Users getting lost hunting for information
  • Analysis programs would know what to do with little or no intervention
  • A dataset of dimension 200x400x1000:
    • 8.0E+07 scalars in 3D volume?
    • 200x400 spectra of 1000 channels?
    • 200 images of 400x1000 pixels?
• One NXentry would only contain one Measurement group
• Similarly structured groups are desirable to store analysis information
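
The ambiguity of a bare 200x400x1000 dataset can be made concrete with a few lines of Python. This is only an illustrative sketch: the function name `interpretations` and the dictionary keys are invented here and are not part of NeXus, HDF5 or PyMca.

```python
from math import prod

# Shape quoted on the slide: one dataset, three plausible readings.
shape = (200, 400, 1000)


def interpretations(shape):
    """Return the three readings of a 3D shape listed on the slide."""
    n0, n1, n2 = shape
    return {
        "volume": f"{prod(shape):.1E} scalars in a 3D volume",
        "spectra": f"{n0}x{n1} spectra of {n2} channels",
        "images": f"{n0} images of {n1}x{n2} pixels",
    }


for kind, text in interpretations(shape).items():
    print(f"{kind}: {text}")
```

Nothing in the shape alone selects one reading, which is why the proposal groups data by dimension so that analysis programs can tell which one applies.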

Page 34: PaNdata Barcelona Meeting Welcome

What could it look like?

• Advantages

• Simple to implement

• Answers current scientists' demands (keep measurement data together, compression, …)

• Compatible with NeXus if desired (specific NeXus groups can be written at any time with links and the opposite is also true)

• Can be seen/used as an intermediate step for not-yet-defined instruments or uses

NXroot: Top level. One per file.
  NXentry: One group per measurement.
    Measurement: One group per measurement.
      Positioners: One group per Measurement. Ex. All motor positions when the command was issued.
      ScalarData: One group per Measurement. Ex. Scanned motors and counters.
      Spectrum: Several datasets per Measurement. Ex. 1 spectrum dataset per MCA device.
      ImageData: Several datasets per Measurement. Ex. 1 image dataset per CCD device.
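
A minimal sketch of how this layout might be written with h5py follows. Several things are assumptions rather than the ESRF implementation: the entry name `entry_0001`, the motor and counter names, the dataset names `Spectrum_0` and `ImageData_0`, the array sizes, and the choice to store spectra and images as datasets directly under Measurement (the slide does not say whether they sit in sub-groups).

```python
import h5py
import numpy as np


def write_measurement(h5file, entry_name="entry_0001"):
    """Write one NXentry containing a single Measurement group."""
    h5file.attrs["NX_class"] = "NXroot"  # top level, one per file
    entry = h5file.create_group(entry_name)
    entry.attrs["NX_class"] = "NXentry"

    meas = entry.create_group("Measurement")

    # Positioners: all motor positions when the command was issued.
    positioners = meas.create_group("Positioners")
    positioners.create_dataset("theta", data=12.0)  # hypothetical motor

    # ScalarData: scanned motors and counters, one 1D dataset each.
    scalars = meas.create_group("ScalarData")
    scalars.create_dataset("theta", data=np.linspace(12.0, 13.0, 11))
    scalars.create_dataset("monitor", data=np.arange(11))

    # One spectrum dataset per MCA device, one image dataset per CCD device.
    # Dataset names are invented; gzip addresses the compression requirement.
    meas.create_dataset("Spectrum_0",
                        data=np.zeros((11, 1024), dtype="int32"),
                        compression="gzip")
    meas.create_dataset("ImageData_0",
                        data=np.zeros((11, 64, 64), dtype="uint16"),
                        compression="gzip")
    return entry


# In-memory file (core driver, nothing saved to disk) to keep the sketch free of side effects.
with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as f:
    write_measurement(f)
    names = []
    f.visit(names.append)  # collect every group and dataset path
    print("\n".join(names))
```

Because every measurement has the same shape of tree, a browser or analysis program can locate counters, spectra and images without instrument-specific knowledge.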

Page 35: PaNdata Barcelona Meeting Welcome

Current Status

• Analysis tools have to be ready for NeXus/HDF5 prior to the format implementation

• Scripts to convert from Specfile to HDF5 written
  • TODO: Add a set of EDF files to a particular scan of an HDF5 file

• Python module for HDF5/NeXus file contents browsing written

• Python support implemented using h5py and in collaboration with CHESS

• Full support for 1D data visualization and analysis incorporated into PyMca
  • TODO: Apply PyMca 2D, 3D and 4D visualization capabilities to HDF5/NeXus files
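
The first half of a Specfile-to-HDF5 conversion can be sketched in plain Python. This is not the ESRF conversion script: `parse_scan` and the sample scan text are invented for illustration, only the `#S` (scan header) and `#L` (column labels, separated by two spaces) lines are handled, and the result is a nested dictionary mirroring the Measurement/ScalarData layout rather than an actual HDF5 file.

```python
# Invented sample: a three-point scan with three columns.
SAMPLE = """\
#S 1  ascan  th 0 1 2 1
#N 3
#L Theta  Monitor  Detector
0.0 1000 12
0.5 1001 15
1.0 999 11
"""


def parse_scan(text):
    """Parse one SPEC-style scan into a dict mirroring Measurement/ScalarData."""
    labels, rows, command = [], [], ""
    for line in text.splitlines():
        if line.startswith("#S "):
            command = line[3:].strip()
        elif line.startswith("#L "):
            # Column labels are separated by two spaces.
            labels = [lab.strip() for lab in line[3:].split("  ") if lab.strip()]
        elif line and not line.startswith("#"):
            rows.append([float(value) for value in line.split()])
    # Turn the row-oriented ASCII table into one list per column label.
    columns = {lab: [row[i] for row in rows] for i, lab in enumerate(labels)}
    return {"command": command, "Measurement": {"ScalarData": columns}}


scan = parse_scan(SAMPLE)
print(scan["Measurement"]["ScalarData"]["Detector"])
```

Each column list would then become one 1D dataset under ScalarData in the HDF5 file.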

Page 36: PaNdata Barcelona Meeting Welcome

PyMca HDF5/NeXus

HDF5 Support
Collaboration with D. Dale, CHESS

SOLEIL NeXus Data courtesy of J.A. Sans and G. Martínez

Page 37: PaNdata Barcelona Meeting Welcome

PyMca Visualization

Data courtesy of P. Cloetens

Data courtesy of J.A. Sans and G. Martínez

Data courtesy of A. Díaz

PyMca Object3D Module
Up to 4D visualization

Page 38: PaNdata Barcelona Meeting Welcome

We need your experience

How do you deal with instruments saving the data in files with proprietary formats?
- Do you include the file names somewhere in the relevant instrument field?
- Do you convert the format to include it in the final file?

How do you deal with data originating from several computers?
- Is it the sequencer that reads and writes the data?
- Is everything buffered and written by a particular server?
- Is concurrent access to the file possible?

Page 39: PaNdata Barcelona Meeting Welcome

ESRF conclusions so far

• HDF5 will be supported

• Analysis codes must be able to deal natively with HDF5 prior to deployment

• NeXus groups we can use will be used

• We consider it an error to use NeXus groups for things other than what they were intended for. A standard that is not respected is no longer a standard.

• Our analysis codes will support NeXus in its HDF5 version (but feel free to add XML)