NA61/NA49 virtualisation: status and plans
Dag Toppe Larsen, CERN, 08.10.2012
08.10.2012 NA61/NA49 meeting, CERN 2
Outline
  Quick reminder of CERNVM and installation
  Tasks
  Each task in detail
  Roadmap
  Input needed
CERNVM
  CERNVM is a Linux distribution
    Designed specifically for virtual machines (VMs)
    Based on SLC (currently SLC5)
    Compressed image size ~300 MB
    Both 32-bit and 64-bit versions
  Additional software
    “Standard” software via Conary package manager
    Experiment software via CVMFS
  Contextualisation: images adapted to experiment requirements during boot
  Data preservation: all images are permanently preserved
CVMFS
  Distributed read-only file system for CERNVM (i.e. the same role as AFS has for LXPLUS)
    Can also be used by “real” machines (e.g. LXPLUS, grid)
  Files compressed and distributed via HTTP
    Global availability
    Central server, site replication via standard HTTP proxies
  Files decompressed and cached on (CERNVM) computer
    Can run without Internet access if all needed files are cached
  Mainly for experiment software, but also other “static” data (e.g. calibration data)
  Each experiment has a repository to store all versions of software
  Common software (e.g. ROOT) available from SFT repository
Data preservation
  As technology evolves, no longer possible to run legacy software on modern platforms
  Must be preserved and accessible:
    Experiment data
    Experiment software
    Operating environment (operating system, libraries, compilers, hardware)
  Just preserving data and software is not enough
  Virtualisation may preserve operating environment
CERNVM data preservation
  “Solution”:
    Experiment data stored on Castor
    Experiment software versions stored on CVMFS
      HTTP “lasting” technology
    Operating environments stored as CERNVM image versions
  Thus, a legacy version of CERNVM can be started as a VM, running a legacy version of experiment software
  Forward-looking approach (we start preserving now)
CernVM for development
  CernVM makes it possible to run the production version of the legacy software/Shine on a laptop without a local install
  Also possible to compile Shine from SVN on CernVM “out of the box” once the proper NA61 environment is set up
CernVM installation on laptop
  Install a hypervisor of your choice, e.g. VirtualBox: https://www.virtualbox.org/
  Download a matching CernVM desktop image: http://cernvm.cern.ch/portal/downloads
  Open http://<ipaddress>:8004 in your web browser (user=admin, password=password)
  Select NA61 and PH-SFT software repositories
  Reboot
  You are now ready to use NA61 software in CernVM on your laptop!
  More information: http://cernvm.cern.ch/portal/cvmconfiguration, https://twiki.cern.ch/twiki/bin/viewauth/NA61/NewOFInstallation (CernVM section)
Tasks
  Make experiment software available
  Facilitate batch processing
  Validate outputs
  On-demand virtual clusters
  Reference cloud cluster
  Data (re)production scripts
  Production reconstruction
  Data production web interface
Make experiment software available
  NA61/NA49 software must be available on CVMFS for CernVM to process data
  NA61
    Legacy software chain installed
      Changes to be fed back to SVN
    SHINE software installed
      ROOT and other dependencies provided via CVMFS
      SVN checkout compiles “out of the box”
    Using 32-bit CernVM image
  NA49
    Software has been installed
Facilitate batch processing
  LXPLUS uses PBS batch system
  CernVM uses Condor batch system
  “Philosophical” differences
    PBS has one job script per job
    Condor has common job description file with parameters for each job
  Existing PBS scripts have been ported to Condor
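The “one common job description file” idea can be sketched as follows. This is only an illustration, not the ported NA61 scripts: the function name, the executable name and the chunk labels are hypothetical, and only standard Condor submit keywords (universe, executable, arguments, output, error, log, queue) are used.

```python
def condor_submit_description(executable, chunks, log_dir="logs"):
    """Build ONE Condor submit description that covers all jobs.

    Settings shared by every job are written once at the top; each job
    then only overrides its own parameters before a `queue` statement.
    """
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        f"log = {log_dir}/jobs.log",
    ]
    for chunk in chunks:
        lines += [
            f"arguments = {chunk}",            # per-job parameter
            f"output = {log_dir}/{chunk}.out",
            f"error = {log_dir}/{chunk}.err",
            "queue",                           # submit one job with the values above
        ]
    return "\n".join(lines) + "\n"
```

Calling `condor_submit_description("reconstruct.sh", ["chunk-001", "chunk-002"])` returns the text of a single file describing two jobs, whereas the PBS approach would need two separate job scripts.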
Output validation – status
  Run 8688 has been processed on both CernVM/CVMFS and LXPLUS/AFS, using software version v2r7g
  According to analysis by Grzegorz, there are relatively small discrepancies
    This is despite the gap TPC not running on CernVM/CVMFS, even though the same set-up file works on LXBATCH/CVMFS
  When the bug has been found, the CernVM/CVMFS, LXBATCH/CVMFS and LXBATCH/AFS comparison should be repeated
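A comparison of this kind could be sketched as below. The slides do not say how the analysis was done, so this is an assumption: each production is reduced to a dictionary of named summary numbers (the quantity names and the tolerance are hypothetical), and anything missing or differing beyond a relative tolerance is reported.

```python
import math


def compare_summaries(reference, candidate, rel_tol=1e-3):
    """Return {name: (reference_value, candidate_value)} for every quantity
    that is missing on one side or differs by more than rel_tol."""
    discrepancies = {}
    for name in sorted(set(reference) | set(candidate)):
        ref = reference.get(name)
        cand = candidate.get(name)
        if ref is None or cand is None:
            discrepancies[name] = (ref, cand)   # present on one side only
        elif not math.isclose(ref, cand, rel_tol=rel_tol):
            discrepancies[name] = (ref, cand)   # numerically different
    return discrepancies
```

A quantity that only one platform produces (as would happen for a detector that did not run) shows up with `None` on the other side, which keeps it distinct from a small numerical difference.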
On-demand virtual clusters
  A cluster may need VMs of different configurations, depending on type of jobs
    Memory, CernVM version, experiment SW, etc.
  Thus, need for dynamic creation/destruction of virtual clusters
  Created command-line script for creating virtual clusters
    Later to be controlled by data production web interface
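The need for differently configured VMs can be illustrated with a small planning sketch. It is not the command-line script mentioned above; the `VMConfig` fields simply mirror the parameters listed on the slide, and the `jobs_per_vm` packing rule is an assumption made for illustration.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class VMConfig:
    """Configuration a job requires from its VM (fields taken from the slide)."""
    memory_mb: int
    cernvm_version: str
    software_version: str


def plan_cluster(job_configs, jobs_per_vm=1):
    """Return {VMConfig: number of VMs to create} for the pending jobs.

    Jobs with identical requirements share VMs of that configuration;
    jobs_per_vm (hypothetical) is how many jobs one VM handles.
    """
    counts = Counter(job_configs)
    return {cfg: math.ceil(n / jobs_per_vm) for cfg, n in counts.items()}
```

A real script would then create the planned VMs through the cloud interface and destroy them once the jobs have finished.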
Test production reconstruction
  To run on private cloud and LXCLOUD
    The private cloud currently has more resources, but LXCLOUD is the final target, so it is important to test on it
  Data can currently be processed “by hand”
  Have tested the (re)production scripts, some modifications needed
  Output should be compared/validated against the output from normal LXBATCH production
  Once this is successful, request more LXCLOUD resources
Private reference cloud cluster
  The virtual machines require a cluster of physical hosts
  A reference cloud cluster has been created
    Private cloud
    Currently 24 cores
  Set-up may be replicated on other sites wishing to provide cloud/CernVM resources
Cloud cluster
  The virtual machines require a cluster of physical hosts
  An LXCLOUD cloud cluster has been created
    Provided by CERN IT
    New service, currently “experimental”
    Currently allocated 4 virtual machines
  May be expanded to include more VMs
    Will push for this once the complete processing chain is ready
Data processing web interface
  A web interface for processing of the data is to be created
    Interface to bookkeeping system to extract runs/chunks belonging to reactions
    List all existing raw/processed data with status (e.g. software versions used for processing)
    Easy selection of data for (re)processing with selected OS and software version
    A virtual on-demand cluster is created
    After processing, data written back to Castor
  Using EC2 interface for the cloud management
    Allows for great flexibility of processing site
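The selection step of such an interface could look roughly like the sketch below. The record layout is an assumption (the bookkeeping system's real schema is not given in the slides), and the reaction labels in the usage note are made up.

```python
from dataclasses import dataclass, field


@dataclass
class RunRecord:
    """One run as the interface might list it (hypothetical layout)."""
    run: int
    reaction: str
    processed_with: set = field(default_factory=set)  # software versions already used


def select_for_processing(records, reaction, software_version):
    """Run numbers of the given reaction that have not yet been
    processed with software_version, in ascending order."""
    return sorted(
        r.run
        for r in records
        if r.reaction == reaction and software_version not in r.processed_with
    )
```

For example, with one run of a reaction already processed with v2r7g and a second run of the same reaction still unprocessed, selecting that reaction and v2r7g returns only the second run.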
Data processing scripts
  Created script for submitting reaction for processing
    Input:
      Reaction name
      Software version
      Global key (CernVM version)
    Needs some “tuning” (e.g. better creation of set-up files from global key)
    Needs some improvement of job description files (include SHOE formats, PSD data)
  Created script for resubmitting failed jobs
    Failed jobs identified from:
      Non-existing/empty/small output DSPACK, SHOE, ROOT files
      Failed/exited/terminated chunks/events
    After resubmitting a fixed number of times (3?), give up
    Mostly working OK, but a small number of false positives (short runs with only 1 or 2 “empty” events)
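The resubmission logic described above can be sketched as follows. This is a simplified illustration rather than the actual script: `submit` and `outputs_for` are hypothetical callables, submission is assumed to block until the job has finished, and the size threshold is an arbitrary placeholder. Only the file-based failure check is shown.

```python
import os

MAX_RESUBMITS = 3  # the slide suggests "3?"; treat as a tunable assumption


def output_failed(path, min_bytes=1024):
    """True if the output file is missing, empty, or smaller than min_bytes."""
    try:
        return os.path.getsize(path) < min_bytes
    except OSError:  # file does not exist or cannot be read
        return True


def process_with_retries(job, submit, outputs_for, max_resubmits=MAX_RESUBMITS):
    """Submit a job, resubmitting while any of its outputs looks failed.

    Returns the number of resubmissions that were needed, or None if the
    job still fails after max_resubmits resubmissions (give up).
    """
    for resubmits in range(max_resubmits + 1):
        submit(job)
        if not any(output_failed(p) for p in outputs_for(job)):
            return resubmits
    return None
```

A fixed size threshold is exactly what produces the false positives noted on the slide: a short run with only 1 or 2 “empty” events legitimately yields a small file.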
Data processing web interface & scripts
  Data processing web interface is a front-end to the data processing scripts
    Reaction list from bookkeeping system
    Reaction run list from bookkeeping system
    Software list from CVMFS directory tree
    Global key list from local database?
  User selects data and parameters, and clicks “process”.
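Obtaining the software list from the CVMFS directory tree could be as simple as listing the version directories. This is a minimal sketch that assumes each software version lives in its own subdirectory of some repository path; the real NA61 repository layout is not given in the slides, so the path is left to the caller.

```python
from pathlib import Path


def list_software_versions(repo_root):
    """Return the sorted names of the directories directly under repo_root.

    Assumes (hypothetically) one subdirectory per installed software
    version; returns an empty list if repo_root is not available.
    """
    root = Path(repo_root)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())
```

Because CVMFS is mounted as an ordinary read-only file system, no special client library is needed for this step.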
Roadmap
Task                          | Status/done                             | Remaining                     | Expected
NA61 software installation    | OK                                      | Gap TPC not running           | November?
NA49 software installation    | OK                                      | Data validation               | November?
Facilitate batch system       | OK                                      | OK                            | November?
Validate outputs              | In progress                             | Rerun after fixing gap TPC    | November?
On-demand virtual cluster     | OK                                      | OK                            | OK
Production reconstruction     | Cluster ready                           | Some improvements to scripts  | October
Reference cloud cluster       | OK                                      | Documentation                 | November
Data processing web interface | Created scripts for data (re)processing | Create web interface          | November
Next steps
  Parallel task 1
    Understand source of gap TPC not running
    Rerun validation
  Parallel task 2
    Finalise data processing scripts
    Run large-scale processing using scripts from command line
    Request larger LXCLOUD
    Transfer to NA61
  Parallel task 3
    Create web interface
    Test web interface
Input needed
  NA49 validation
  NA61 gap TPC
  Please keep virtualisation (CernVM/CVMFS) in mind when making plans ...