Acknowledgements...
• Work presented is the output from two Global Grid Forum Working Groups
– Usage Record Working Group (UR-WG)
– Resource Usage Service Working Group (RUS-WG)
• I was involved mainly in the RUS-WG
• Work was funded through the UK e-Science Markets for Computational Services (MCS) Project
• The recent implementation of the RUS, and much of the material presented today, are from John Ainsworth, University of Manchester.
Accounting on the Grid?
Q. Why is it different from HPC Center accounting?
A. Like accounting for an HPC Center, we need to track usage on more than one machine, but:
– users have single sign-on: need to work with X509 Distinguished Names...
– ...so usernames may differ
– Also, some machines are at (and run by) different organizations
How do we do this? (1)
• We know that different batch systems produce different accounting records
– As many formats as batch systems (similar content)
– But aggregating these directly is hard
• Also, need to cope with single sign-on (X509)
• So first, we create a standard accounting record representation (Usage Record)
• Defined by the GGF UR-WG
• This is defined as an XML Schema. The spec. and XML Schema are at:
– http://www.psc.edu/~lfm/Grid/UR-WG/
• The work of this group is nearly completed
• Specification is now stable
Example Usage Record
<UsageRecord xmlns="http://www.gridforum.org/2003/ur-wg"
xmlns:urwg="http://www.gridforum.org/2003/ur-wg" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RecordIdentity urwg:recordId="JSS-UNIQUE-ID" urwg:createTime="2003-08-13T18:56:56Z" />
<JobIdentity>
<GlobalJobId>green147989</GlobalJobId>
<LocalJobId>147989</LocalJobId>
</JobIdentity>
<UserIdentity>
<LocalUserId>wwmarko</LocalUserId>
<ds:KeyInfo xmlns="http://www.w3.org/2000/09/xmldsig#" xmlns:ds="http://www.w3.org/2000/09/xmldsig#">
<X509Data>
<X509SubjectName>CN=john ainsworth, L=MC, OU=Manchester, O=eScience, C=UK</X509SubjectName>
</X509Data>
</ds:KeyInfo>
</UserIdentity>
<JobName>------</JobName>
<Status>completed</Status>
<TimeDuration urwg:type="cpuTimeRequested">PT1800S</TimeDuration>
<TimeDuration urwg:type="wallTimeRequested">PT1800S</TimeDuration>
<TimeInstant urwg:type="timeSubmitted">2004-11-29T06:47:30</TimeInstant>
<Processors>1</Processors>
<ProjectName>cs5015</ProjectName>
<Host>green</Host>
<CpuDuration>PT0.0S</CpuDuration>
<WallDuration>PT1S</WallDuration>
<StartTime>2004-11-29T06:48:33</StartTime>
<EndTime>2004-11-29T06:48:34</EndTime>
<MachineName>green</MachineName>
<SubmitHost>wren</SubmitHost>
<Queue>normal</Queue>
<Resource urwg:description="quoteReference">contract1234</Resource>
<Resource urwg:description="contractNumber">escience</Resource>
</UsageRecord>
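To show how a consumer might read such a record, here is a minimal sketch using Python's standard library. The `summarise` function and the trimmed `SAMPLE` document are illustrative assumptions, not part of the UR specification or the RUS implementation; only the namespaces and element names come from the example above.

```python
import xml.etree.ElementTree as ET

UR_NS = "http://www.gridforum.org/2003/ur-wg"
DS_NS = "http://www.w3.org/2000/09/xmldsig#"
NS = {"ur": UR_NS, "ds": DS_NS}

# Trimmed copy of the example record (illustrative only).
SAMPLE = """<UsageRecord xmlns="http://www.gridforum.org/2003/ur-wg"
    xmlns:urwg="http://www.gridforum.org/2003/ur-wg">
  <RecordIdentity urwg:recordId="JSS-UNIQUE-ID" urwg:createTime="2003-08-13T18:56:56Z"/>
  <UserIdentity>
    <LocalUserId>wwmarko</LocalUserId>
    <ds:KeyInfo xmlns:ds="http://www.w3.org/2000/09/xmldsig#">
      <ds:X509Data>
        <ds:X509SubjectName>CN=john ainsworth, L=MC, OU=Manchester, O=eScience, C=UK</ds:X509SubjectName>
      </ds:X509Data>
    </ds:KeyInfo>
  </UserIdentity>
  <MachineName>green</MachineName>
  <WallDuration>PT1S</WallDuration>
</UsageRecord>"""


def summarise(xml_text):
    """Pull out the fields that matter for Grid accounting (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    identity = root.find("ur:RecordIdentity", NS)
    return {
        # Namespaced attributes are addressed as {namespace-uri}name.
        "record_id": identity.get(f"{{{UR_NS}}}recordId"),
        "local_user": root.findtext("ur:UserIdentity/ur:LocalUserId", namespaces=NS),
        "dn": root.findtext(".//ds:X509SubjectName", namespaces=NS),
        "machine": root.findtext("ur:MachineName", namespaces=NS),
        "wall": root.findtext("ur:WallDuration", namespaces=NS),
    }


summary = summarise(SAMPLE)
```

The point of the sketch is that the X509 subject name, not the local user id, is the identity that stays the same across machines.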
How do we do this? (2)
• Next, we need somewhere to store the records
• Something that we can push records into, and pull them back out of
• So next, we define a standard Web Service interface (Resource Usage Service)
• Defined by the GGF RUS-WG
• Service interface is based on “plain” Web Services, i.e. it is compliant with the WS-I Basic Profile 1.0
• This is defined as WSDL with XML Schema. The spec. is being updated prior to going to the GGF Editor
– http://www-unix.gridforum.org/mail_archive/rus-wg/maillist.html
• The work of this group is nearly completed
• Specification is now stable
How do I work this?
• Specs are all very well, but what about running it?
• There is an implementation of the RUS
• Also a record spooler for uploading records
• Built at ESNW in Manchester
• Will be maintained by LeSC in London
• Will receive continued support through the UK’s Open Middleware Infrastructure Institute (OMII)
• Current version is downloadable:
– http://www.sve.man.ac.uk/Research/AtoZ/MCS/RUS/
How do I generate records?
• This is the trickiest part...
• To some extent, this is scheduler specific
• Platform LSF can generate UR format directly
• For OpenPBS/PBSPro, you can use SourceForge’s PBSAccounting
– http://pbsaccounting.sourceforge.net
• Complex part is getting the X509 Distinguished Name into the record (for Grid jobs)
• Need to tweak Globus jobmanagers
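As a rough illustration of the conversion step, the sketch below turns one already-parsed batch accounting entry into a Usage Record element. Everything about the input is an assumption: the `job` dictionary keys, the `dn_map` lookup from local user to Distinguished Name, and the host-plus-job-id record identifier are hypothetical stand-ins, and this is not how LSF, PBSAccounting or the Globus jobmanagers actually do it.

```python
import xml.etree.ElementTree as ET

UR_NS = "http://www.gridforum.org/2003/ur-wg"
DS_NS = "http://www.w3.org/2000/09/xmldsig#"
ET.register_namespace("urwg", UR_NS)
ET.register_namespace("ds", DS_NS)


def ur(tag):
    """Qualified name in the Usage Record namespace."""
    return f"{{{UR_NS}}}{tag}"


def ds(tag):
    """Qualified name in the XML Signature namespace."""
    return f"{{{DS_NS}}}{tag}"


def to_usage_record(job, dn_map):
    """Build a UsageRecord element from a hypothetical batch-log dictionary."""
    root = ET.Element(ur("UsageRecord"))
    # Assumed identifier scheme: host name followed by the local job id.
    record_id = f"{job['host']}{job['job_id']}"
    ET.SubElement(root, ur("RecordIdentity"), {ur("recordId"): record_id})
    job_identity = ET.SubElement(root, ur("JobIdentity"))
    ET.SubElement(job_identity, ur("LocalJobId")).text = job["job_id"]
    user = ET.SubElement(root, ur("UserIdentity"))
    ET.SubElement(user, ur("LocalUserId")).text = job["user"]
    # The DN is only known for Grid jobs; local jobs get no KeyInfo element.
    dn = dn_map.get(job["user"])
    if dn is not None:
        key_info = ET.SubElement(user, ds("KeyInfo"))
        x509 = ET.SubElement(key_info, ds("X509Data"))
        ET.SubElement(x509, ds("X509SubjectName")).text = dn
    ET.SubElement(root, ur("WallDuration")).text = f"PT{job['wall_seconds']}S"
    ET.SubElement(root, ur("MachineName")).text = job["host"]
    return root


job = {"job_id": "147989", "user": "wwmarko", "host": "green", "wall_seconds": 1}
dn_map = {"wwmarko": "CN=john ainsworth, L=MC, OU=Manchester, O=eScience, C=UK"}
record = to_usage_record(job, dn_map)
xml_text = ET.tostring(record, encoding="unicode")
```

In practice the lookup table is the hard part: the DN has to be captured when the Grid job is submitted, which is why the jobmanagers need changing.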
Implementation Info
[Figure: architecture diagram with components labelled Web Service Container, RUS Web Service Application, Service Interface, Access control list, XMLDB API and XML Database]
Service Interface (1)
• Write Operations
– insertUsageRecords(UsageRecord[]), replaceUsageRecords(RecordAndId[])
– deleteRecords(XpathQuery), deleteSpecificRecord(RecordId[])
– modifyUsageRecordPart, updateUsageRecordPart (not implemented)
• Read Operations
– extractRecords(XpathQuery), extractSpecificRecords(RecordId[])
– extractUsageByGlobalUserId, extractUsageByMachineName, extractUsageBySubmitHost,
Service Interface (2)
• Management
– retrieveConfiguration
– updateConfiguration
• Faults
– RUSProcessingFault
– RUSUserNotAuthorised
– RUSInputFault
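To give a feel for what an XPath-driven read such as extractRecords(XpathQuery) selects, here is a local stand-in that runs a query over an in-memory collection. It is only a sketch: `make_store` and `extract_records` are hypothetical helpers, the `store` wrapper element and the project name cs6001 are made up, and the query uses the limited XPath subset supported by Python's ElementTree rather than the XML database behind the real service.

```python
import xml.etree.ElementTree as ET

UR_NS = "http://www.gridforum.org/2003/ur-wg"
NS = {"urwg": UR_NS}


def make_store(rows):
    """Build a stand-in collection holding one UsageRecord per (machine, project) row."""
    store = ET.Element("store")
    for machine, project in rows:
        record = ET.SubElement(store, f"{{{UR_NS}}}UsageRecord")
        ET.SubElement(record, f"{{{UR_NS}}}MachineName").text = machine
        ET.SubElement(record, f"{{{UR_NS}}}ProjectName").text = project
    return store


def extract_records(store, xpath_query):
    """Return the records matching the query, relative to the collection element."""
    return store.findall(xpath_query, NS)


store = make_store([("green", "cs5015"), ("green", "cs6001"), ("wren", "cs5015")])

# Select every record whose MachineName is exactly "green".
on_green = extract_records(store, "urwg:UsageRecord[urwg:MachineName='green']")
projects = [r.findtext("urwg:ProjectName", namespaces=NS) for r in on_green]
```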
Security Model
• Role based security
– Specified through access control file (XML) (Cached)
– Administrator
  • Unrestricted read/write authorization
– ResourceManager
  • Restricted read/write authorization
  • Requires a ResourceDescription to specify the resources for which the RM has permission
  • ResourceTypes are urwg:MachineName, urwg:SubmitHost, urwg:ProjectName and Domain
  • Authorization for a record determined by logical AND between different ResourceTypes, logical OR within values of same ResourceType
– All other users denied both read and write access
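The AND/OR rule is easier to see in code. The following is a minimal sketch of the decision, assuming a record is reduced to a dictionary of ResourceType to value and a ResourceDescription to a dictionary of ResourceType to permitted values. The `authorise` function, both dictionary shapes, and the choice to deny a ResourceManager with an empty description are assumptions for illustration, not the implementation's access control format.

```python
def authorise(role, record, resource_description=None):
    """Decide whether a caller with the given role may read or write a record."""
    if role == "Administrator":
        return True  # unrestricted read/write
    if role == "ResourceManager":
        if not resource_description:
            # Assumption: a ResourceManager without a ResourceDescription gets nothing.
            return False
        # AND across ResourceTypes: every listed type must match.
        # OR within a type: any one of its permitted values is enough.
        return all(
            record.get(resource_type) in permitted
            for resource_type, permitted in resource_description.items()
        )
    return False  # all other users are denied


description = {
    "urwg:MachineName": {"green", "wren"},
    "urwg:ProjectName": {"cs5015"},
}
record = {"urwg:MachineName": "green", "urwg:ProjectName": "cs5015"}
other_project = {"urwg:MachineName": "green", "urwg:ProjectName": "physics"}

allowed = authorise("ResourceManager", record, description)
denied = authorise("ResourceManager", other_project, description)
```

Here the first record is permitted because its machine is one of the two listed and its project matches; the second fails on the project even though the machine matches.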
Configuration
• Mandatory Record Elements
– A record must contain these elements for it to be valid for this RUS
– Resolves the “everything is optional” problem inherent in the Usage Record specification
RUSUsageRecord
• Internal wrapper around UsageRecord
• Adds elements
– RUSId
– RecordHistory
  • Audit trail of record insertion and modification
  • Records who and when in StoredBy and ModifiedBy elements
InsertUsageRecords
• Check user authorization for record
• Validate record against schema
• Check mandatory elements are present
• Check the record is not a duplicate
• Insert into database
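The five steps can be sketched as a single function. This is an illustration under several assumptions: the store is a plain dictionary rather than an XML database, `caller_may_write` stands in for the access control decision, the `MANDATORY` tuple is a made-up configuration, duplicates are detected by recordId, and "validation" only checks well-formedness and the root element because the Python standard library has no XML Schema validator. The exception names are hypothetical Python stand-ins, loosely echoing the service's faults and error codes.

```python
import xml.etree.ElementTree as ET

UR_NS = "http://www.gridforum.org/2003/ur-wg"
MANDATORY = ("MachineName", "WallDuration")  # hypothetical RUS configuration


class NotAuthorised(Exception):
    pass


class InvalidRecord(Exception):
    pass


class DuplicateRecord(Exception):
    pass


def insert_usage_record(store, xml_text, caller_may_write):
    """Run the insert checks in order and return the stored record's id."""
    # 1. Authorization (decision made elsewhere and passed in as a flag).
    if not caller_may_write:
        raise NotAuthorised("caller may not write this record")
    # 2. Stand-in for schema validation: well-formed XML with the right root.
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        raise InvalidRecord(str(exc)) from exc
    if root.tag != f"{{{UR_NS}}}UsageRecord":
        raise InvalidRecord("root element is not a UsageRecord")
    # 3. Mandatory elements configured for this RUS.
    missing = [name for name in MANDATORY if root.find(f"{{{UR_NS}}}{name}") is None]
    if missing:
        raise InvalidRecord("missing mandatory elements: " + ", ".join(missing))
    # 4. Duplicate check, keyed on the record identifier.
    identity = root.find(f"{{{UR_NS}}}RecordIdentity")
    record_id = identity.get(f"{{{UR_NS}}}recordId") if identity is not None else None
    if record_id is None:
        raise InvalidRecord("record has no recordId")
    if record_id in store:
        raise DuplicateRecord(record_id)
    # 5. Insert.
    store[record_id] = xml_text
    return record_id
```

Ordering the cheap checks first means a rejected record never touches the database.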
Implementation notes
• Started with WS-Security, but moved to TLS
– More widely available
• Extended set of error codes
– Added InvalidRecord and DuplicateRecord (used in response for insert and replace)
• Database stores each record as a document
– Xindice single document size limitation
• Developed web-based query client
• Developed a Perl usage record spooler
Test Machine Specification
Item | Test Server 1 | Test Server 2
Processor type and speed | Intel 3.06 GHz | Intel 3.00 GHz
No. of processors | 2 | 1
RAM | 4 GB | 512 MB
Operating system | Redhat Enterprise 3 | Fedora Core 1
Disks | 2 x 120 GB | 1 x 25 GB
Container | Sun Java System Application Server Platform Edition 8.0.0_01 | Sun Java System Application Server Platform Edition 8.0.0_01
XML Database | Xindice 1.1b4 | Xindice 1.1b4
[Figures: four plots of average record insertion time (s) against number of records in the database, each showing the data set and a linear fit]
• Test Server 1 (0 to 3,000,000 records): y = 2E-08x + 0.0487
• Test Server 1 (Restricted Data Set): y = 7E-09x + 0.0565
• Test Server 2 (0 to 1,000,000 records): y = 3E-07x + 0.0213
• Test Server 2 (Restricted Data Set): y = 2E-07x + 0.0504
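To make the linear fits concrete, the sketch below evaluates them at a chosen database size. It assumes each equation belongs to the panel title nearest to it in the slides (2E-08x + 0.0487 and 7E-09x + 0.0565 for Test Server 1 and its restricted data set, 3E-07x + 0.0213 and 2E-07x + 0.0504 for Test Server 2 and its restricted data set); the `FITS` table and `predicted_insertion_time` are illustrative names only.

```python
# (slope in seconds per stored record, intercept in seconds), as read off the plots
FITS = {
    "Test Server 1": (2e-08, 0.0487),
    "Test Server 1 (Restricted Data Set)": (7e-09, 0.0565),
    "Test Server 2": (3e-07, 0.0213),
    "Test Server 2 (Restricted Data Set)": (2e-07, 0.0504),
}


def predicted_insertion_time(label, n_records):
    """Average insertion time in seconds predicted by the linear fit y = slope * x + intercept."""
    slope, intercept = FITS[label]
    return slope * n_records + intercept


at_one_million = {label: predicted_insertion_time(label, 1_000_000) for label in FITS}
```

At one million stored records the fits give roughly 0.069 s per insertion on Test Server 1 and roughly 0.32 s on Test Server 2, so the slope, not the intercept, is what separates the two machines.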