INFSO-RI-508833
Enabling Grids for E-sciencE
www.eu-egee.org
Logging and Bookkeeping and Job Provenance Services
Ludek Matyska (CESNET)
on behalf of the JRA1 IT-CZ cluster
Talk Outline
• Logging and bookkeeping (L&B)
  – General overview and main features
  – Deployment
  – Expected use
• Job Provenance (JP)
  – Motivation
  – Overview and relationship with L&B
  – Expected use
• Conclusion
Logging and Bookkeeping
• Motivation
  – Keep track of Grid jobs
• General overview
  – Capture job control flow
  – Provide job state information
  – Just in time or short-term post mortem analysis
  – Support user generated events
Features
• L&B events mark important points in the control flow of a job
  – Submission
  – Transfer between components
  – Matchmaking and brokerage results
  – Starting/finishing job execution
  – Events generated directly by the user
      Only during the actual job execution
• Events are delivered in a non-blocking but reliable way
• Job state is computed by a fault-tolerant state machine
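The idea of a state machine that tolerates late or missing events can be illustrated with a minimal sketch. The event kinds and state names below are hypothetical placeholders chosen to mirror the list above; they are not the actual L&B event types or job states. The sketch assumes a simple linear job flow and takes the most advanced state implied by any event seen so far.

```python
from dataclasses import dataclass

# Hypothetical states in flow order, and the state each event kind implies.
# Illustrative names only, not the real L&B event or state names.
STATE_ORDER = ["SUBMITTED", "WAITING", "READY", "RUNNING", "DONE"]
EVENT_TO_STATE = {
    "submission": "SUBMITTED",
    "transfer": "WAITING",
    "match": "READY",
    "exec_start": "RUNNING",
    "exec_finish": "DONE",
}


@dataclass(frozen=True)
class Event:
    job_id: str
    kind: str


def job_state(events):
    """Return the most advanced state implied by the events seen so far.

    Unknown kinds (for example user events) are ignored, and the result
    does not depend on arrival order, so a delayed or lost event cannot
    move a job backwards.
    """
    rank = -1
    for ev in events:
        state = EVENT_TO_STATE.get(ev.kind)
        if state is not None:
            rank = max(rank, STATE_ORDER.index(state))
    return STATE_ORDER[rank] if rank >= 0 else "UNKNOWN"
```

For example, if the "exec_start" event arrives before the "submission" event, the computed state is still RUNNING. A real state machine would also need to handle failures, resubmission and cancellation, which this sketch leaves out.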
User interaction
• Implicit:
  – Submitting a job
• Explicit
  – Logging events during job execution
  – Querying the bookkeeping server
      Predefined set of common queries
  – Directly available through the UI
• Public API to access bookkeeping server
  – More general, for complex queries
  – User can register to receive a notification about job state
• Both reject “dangerous” queries
• Support for aggregated information about DAGs
User events
• Users can store events in the bookkeeping DB
  – Non-blocking reliable mechanism for passing job related information
• Information is available through the L&B querying mechanism
  – Through the UI or public API
• Still asynchronous
  – Events from the same CE will usually arrive in correct order
  – Internal and user issued timestamps may help
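Because delivery is asynchronous, a consumer may want to re-order user events before interpreting them. The following is a minimal sketch of one way to do that with timestamps. The `UserEvent` record and its field names are assumptions made for illustration, not the L&B data model, and the sketch assumes that user-issued and internally assigned timestamps come from comparable clocks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UserEvent:
    name: str
    arrival_ts: float                # assumed: assigned internally on delivery
    user_ts: Optional[float] = None  # assumed: optional timestamp set by user code


def in_logical_order(events):
    """Sort by the user-issued timestamp when present, else by arrival time.

    Arrival time is the tie-breaker, so events with equal primary keys
    keep their delivery order.
    """
    def key(ev):
        primary = ev.user_ts if ev.user_ts is not None else ev.arrival_ts
        return (primary, ev.arrival_ts)

    return sorted(events, key=key)
```

If the clocks on different hosts are not synchronized, mixing the two kinds of timestamp in one key can still produce a wrong order, so this is best applied to events from a single source.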
L&B deployment
• EGEE
  – Around 50 production installations of bookkeeping servers
  – Over 20 000 jobs per day on average
  – Over 60 GB of data since January 2005
• Other projects using EDG or EGEE middleware
  – LCG
  – CrossGrid
L&B Use
• Provision of job state
  – Including notification
  – Feed into R-GMA
• Provision of more detailed info about job flow
• Debugging
  – Transfer between components, failure trace
• Statistics (JRA2)
  – Time of submission, execution start and end
  – Matchmaking results, reasons for no match found
  – Failures
• End user events
  – E.g. visualization of progress of job execution
Job Provenance
• Motivation
  – The information about jobs has long-term value
      E.g. to repeat the submission of a job executed a year ago
  – The information about job control flow and job execution environment complements job results
      E.g. to be able to reliably resubmit a job
• Job Provenance
  – Preserve information about Grid jobs
  – Allow data-mining in this information
  – Assist job re-submission
JP Gathered Data
• Data from L&B
• Job inputs
  – The input sandbox
  – No copies of files in remote storage
      However, file/collection identification is available
• Execution track
  – Data (“measurements”) from CE
      Installed software versions, environment, …
  – Accounting data (DGAS)
• User annotations
• Scalability
  – Record volatile data only
Primary Data in JP
• Job is the primary entity
• Minimal set of core attributes:
  – JobID, owner, registration time
• Short data items: tags
  – “key = value” pairs
• Bulk data: uploaded files
JP Job Attributes
• A way to provide a generic unified view on any job data
  – Multivalued
  – Format: “namespace:key = value”
  – Namespaces may have defined schema
• User annotations are mapped directly to Job Attributes
• File-type specific plugins
  – Process bulk files
• Job Attributes used both for internal handling and user queries
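The “namespace:key = value” format and the multivalued property can be made concrete with a small sketch. The parser and the `JobAttributes` container below are hypothetical helpers written for illustration, not part of the JP code base. Splitting the name at the last colon is an assumption made here so that a namespace may itself contain colons.

```python
from collections import defaultdict


def parse_attribute(line):
    """Split 'namespace:key = value' into (namespace, key, value)."""
    name, sep, value = line.partition("=")
    if not sep:
        raise ValueError(f"missing '=' in attribute: {line!r}")
    # Assumption: the key follows the last colon of the name.
    namespace, sep, key = name.strip().rpartition(":")
    if not sep or not namespace or not key:
        raise ValueError(f"expected 'namespace:key' in: {line!r}")
    return namespace, key, value.strip()


class JobAttributes:
    """Multivalued attribute store for one job (illustrative only)."""

    def __init__(self):
        self._values = defaultdict(list)

    def add(self, line):
        namespace, key, value = parse_attribute(line)
        self._values[(namespace, key)].append(value)

    def get(self, namespace, key):
        # Return a copy so callers cannot modify the stored values.
        return list(self._values.get((namespace, key), []))
```

Adding "user:color = red" and then "user:color = blue" leaves both values retrievable under the same attribute, which is what "multivalued" means in this sketch.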
JP Main Components
• Primary storage
  – Where the data are stored “forever”
• Index server
JP Primary Storage
• Gather and store data
• Process “bulk files” on demand to extract attributes
• Interaction with users:
  – Annotate
  – Retrieve job attributes, download files
  – Always keyed by JobID only
      Performance and scalability
• Web service control interface
• gsiftp for file transfer
JP Index Server
• To provide scalability for access
• Created and configured for a particular purpose
  – Set of Primary servers to register with
  – Conditions on jobs to retrieve
      Job from VO A submitted after January 1st, 2006
  – List of attributes to collect
• Only fraction of data from Primary storage
• Incremental feed from Primary storage
  – Batch feed also available (e.g. after a crash)
• Complex user queries
  – May refer only to the IS configured attributes
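How an Index Server configuration narrows the feed can be sketched as a simple filter. Everything here is an illustrative assumption: the `IndexConfig` fields, the job record layout and the `feed` function are hypothetical and do not reflect the actual JP interfaces. The example condition follows the one on the slide, jobs from VO A submitted after January 1st, 2006.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class IndexConfig:
    vo: str                    # condition: virtual organisation of the job
    submitted_after: datetime  # condition: lower bound on submission time
    attributes: tuple          # attributes to collect, and the only queryable ones


def feed(config, jobs):
    """Build an index holding only matching jobs and configured attributes.

    Each job is assumed to be a dict with 'job_id', 'vo', 'submitted'
    and an 'attrs' mapping.
    """
    index = {}
    for job in jobs:
        if job["vo"] != config.vo:
            continue
        if job["submitted"] <= config.submitted_after:
            continue
        index[job["job_id"]] = {
            name: job["attrs"][name]
            for name in config.attributes
            if name in job["attrs"]
        }
    return index
```

Because the index keeps only the configured attributes, a query on any other attribute cannot be answered from it, which matches the restriction in the last bullet. An incremental feed would call the same filter on each newly arrived job rather than on the whole list.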
JP – Current Status
• Prototype implementation
  – Included in gLite 1.5
  – Limited IS configuration
  – Supported files:
      L&B and input sandbox
• Plans
  – Available from GUI
  – Complex authorization (VOMS based)
  – Support for re-submission of jobs
Conclusion
• Job-centric monitoring approach
  – Users and their jobs
  – User-specific data (annotations)
  – Infrastructure-specific information
• Logging and bookkeeping: production
  – Information gathered and provided while the job is within the Grid
  – Generic interfaces (including web service interface)
  – Security from scratch (VOMS authorization)
• Job Provenance: prototype
  – Permanent job-related information storage
  – Data-mining over complex job sets