Accounting for the Grid: Usage Records and a Resource Usage Service


Acknowledgements...

• Work presented is the output from two Global Grid Forum Working Groups
  – Usage Record Working Group (UR-WG)
  – Resource Usage Service Working Group (RUS-WG)
• I was involved mainly in the RUS-WG
• Work was funded through the UK e-Science Markets for Computational Services (MCS) Project

• The recent implementation of the RUS, and much of the material presented today, are from John Ainsworth, University of Manchester.

Accounting on the Grid?

Q. Why is it different from HPC Center accounting?

A. As with accounting for an HPC Center, we need to track usage on more than one machine, but:

– users have single sign-on, so we need to work with X509 Distinguished Names...
– ...and usernames may differ from machine to machine
– also, some machines are at (and run by) different organizations

How do we do this? (1)

• We know that different batch systems produce different accounting records
  – As many formats as batch systems (with similar content)
  – But aggregating these directly is hard
• We also need to cope with single sign-on (X509)
• So first, we create a standard accounting record representation (Usage Record)

• Defined by the GGF UR-WG
• This is defined as an XML Schema. The spec. and XML Schema are at:
  – http://www.psc.edu/~lfm/Grid/UR-WG/

• The work of this group is nearly complete
• The specification is now stable

Example Usage Record

<UsageRecord xmlns="http://www.gridforum.org/2003/ur-wg"
    xmlns:urwg="http://www.gridforum.org/2003/ur-wg" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

<RecordIdentity urwg:recordId="JSS-UNIQUE-ID" urwg:createTime="2003-08-13T18:56:56Z" />

<JobIdentity>

<GlobalJobId>green147989</GlobalJobId>

<LocalJobId>147989</LocalJobId>

</JobIdentity>

<UserIdentity>

<LocalUserId>wwmarko</LocalUserId>

<ds:KeyInfo xmlns="http://www.w3.org/2000/09/xmldsig#" xmlns:ds="http://www.w3.org/2000/09/xmldsig#">

<X509Data>

<X509SubjectName>CN=john ainsworth, L=MC, OU=Manchester, O=eScience, C=UK</X509SubjectName>

</X509Data>

</ds:KeyInfo>

</UserIdentity>

<JobName>------</JobName>

<Status>completed</Status>

<TimeDuration urwg:type="cpuTimeRequested">PT1800S</TimeDuration>

<TimeDuration urwg:type="wallTimeRequested">PT1800S</TimeDuration>

<TimeInstant urwg:type="timeSubmitted">2004-11-29T06:47:30</TimeInstant>

<Processors>1</Processors>

<ProjectName>cs5015</ProjectName>

<Host>green</Host>

<CpuDuration>PT0.0S</CpuDuration>

<WallDuration>PT1S</WallDuration>

<StartTime>2004-11-29T06:48:33</StartTime>

<EndTime>2004-11-29T06:48:34</EndTime>

<MachineName>green</MachineName>

<SubmitHost>wren</SubmitHost>

<Queue>normal</Queue>

<Resource urwg:description="quoteReference">contract1234</Resource>

<Resource urwg:description="contractNumber">escience</Resource>

</UsageRecord>
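To show how the namespaced structure of a record is consumed, here is a minimal sketch that uses Python's standard `xml.etree.ElementTree` to read a trimmed copy of the example record above. The `summarise` helper and the fields it picks are illustrative choices, not part of the UR specification, and the duration parser only handles the seconds-only `PTnS` form used in this example rather than general `xs:duration` values.

```python
import re
import xml.etree.ElementTree as ET

NS = {"ur": "http://www.gridforum.org/2003/ur-wg", "ds": "http://www.w3.org/2000/09/xmldsig#"}
UR = "{" + NS["ur"] + "}"  # Clark notation, needed for namespaced attributes such as urwg:type

# Trimmed copy of the example record; KeyInfo keeps its xmldsig default namespace.
RECORD = """<UsageRecord xmlns="http://www.gridforum.org/2003/ur-wg"
    xmlns:urwg="http://www.gridforum.org/2003/ur-wg">
  <RecordIdentity urwg:recordId="JSS-UNIQUE-ID" urwg:createTime="2003-08-13T18:56:56Z"/>
  <UserIdentity>
    <LocalUserId>wwmarko</LocalUserId>
    <ds:KeyInfo xmlns="http://www.w3.org/2000/09/xmldsig#" xmlns:ds="http://www.w3.org/2000/09/xmldsig#">
      <X509Data>
        <X509SubjectName>CN=john ainsworth, L=MC, OU=Manchester, O=eScience, C=UK</X509SubjectName>
      </X509Data>
    </ds:KeyInfo>
  </UserIdentity>
  <TimeDuration urwg:type="cpuTimeRequested">PT1800S</TimeDuration>
  <TimeDuration urwg:type="wallTimeRequested">PT1800S</TimeDuration>
  <WallDuration>PT1S</WallDuration>
  <MachineName>green</MachineName>
</UsageRecord>"""


def seconds(duration: str) -> float:
    """Convert a seconds-only duration such as 'PT1800S' or 'PT0.0S' to float seconds."""
    match = re.fullmatch(r"PT(\d+(?:\.\d+)?)S", duration.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    return float(match.group(1))


def summarise(xml_text: str) -> dict:
    root = ET.fromstring(xml_text)
    ident = root.find("ur:RecordIdentity", NS)
    return {
        "recordId": ident.get(UR + "recordId"),
        "localUser": root.findtext("ur:UserIdentity/ur:LocalUserId", namespaces=NS),
        "dn": root.findtext("ur:UserIdentity/ds:KeyInfo/ds:X509Data/ds:X509SubjectName", namespaces=NS),
        "machine": root.findtext("ur:MachineName", namespaces=NS),
        "wallSeconds": seconds(root.findtext("ur:WallDuration", namespaces=NS)),
        # TimeDuration repeats; its urwg:type attribute says which quantity it holds.
        "requested": {td.get(UR + "type"): seconds(td.text)
                      for td in root.findall("ur:TimeDuration", NS)},
    }


print(summarise(RECORD))
```

Note that the Distinguished Name lives in the XML-Signature namespace, so a consumer has to qualify that part of the path differently from the UR elements around it.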

How do we do this? (2)

• Next, we need somewhere to store the records
• Something that we can push records into, and pull them back out of
• So we now define a standard Web Service interface (Resource Usage Service)

• Defined by the GGF RUS-WG
• The service interface is based on “plain” Web Services, i.e. it is compliant with the WS-I Basic Profile 1.0
• This is defined as WSDL with XML Schema. The spec. is being updated prior to going to the GGF Editor
  – http://www-unix.gridforum.org/mail_archive/rus-wg/maillist.html

• The work of this group is nearly complete
• The specification is now stable

How do I work this?

• Specs are all very well, but what about running it?

• There is an implementation of the RUS
• Also a record spooler for uploading records
• Built at ESNW in Manchester
• Will be maintained by LeSC in London
• Will receive continued support through the UK’s Open Middleware Infrastructure Institute (OMII)

• Current version is downloadable:
  – http://www.sve.man.ac.uk/Research/AtoZ/MCS/RUS/

How do I generate records?

• This is the trickiest part...

• To some extent, this is scheduler-specific
• Platform LSF can generate UR format directly
• For OpenPBS/PBSPro, you can use SourceForge’s PBSAccounting
  – http://pbsaccounting.sourceforge.net

• The complex part is getting the X509 Distinguished Name into the record (for Grid jobs)
• This requires tweaking the Globus jobmanagers
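As an illustration of what a converter has to do, here is a sketch that turns one PBS job-end (`E`) accounting line into a Usage Record with `xml.etree.ElementTree`. It assumes the usual PBS log layout (`date;type;job id;key=value ...`, with epoch-second timestamps and HH:MM:SS durations); the sample line, the `pbs-` record-ID scheme and the exit-status-to-Status mapping are hypothetical, and this is not PBSAccounting's actual code. Because PBS does not log the X509 Distinguished Name, the DN is passed in separately, standing in for whatever a tweaked Globus jobmanager would record.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from typing import Optional

UR = "{http://www.gridforum.org/2003/ur-wg}"
DS = "{http://www.w3.org/2000/09/xmldsig#}"
ET.register_namespace("urwg", UR[1:-1])
ET.register_namespace("ds", DS[1:-1])


def hms_seconds(hms: str) -> int:
    """PBS reports durations as HH:MM:SS."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def iso_utc(epoch: str) -> str:
    """PBS reports instants as Unix epoch seconds."""
    return datetime.fromtimestamp(int(epoch), tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def pbs_end_to_ur(line: str, dn: Optional[str] = None) -> ET.Element:
    """Convert one PBS job-end ('E') accounting line into a UsageRecord element."""
    _stamp, rtype, job_id, attrs = line.split(";", 3)
    if rtype != "E":
        raise ValueError("only job-end (E) records carry final usage")
    kv = dict(item.split("=", 1) for item in attrs.split())

    rec = ET.Element(UR + "UsageRecord")
    ET.SubElement(rec, UR + "RecordIdentity", {
        UR + "recordId": "pbs-" + job_id,  # hypothetical ID scheme
        UR + "createTime": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    })
    job = ET.SubElement(rec, UR + "JobIdentity")
    ET.SubElement(job, UR + "GlobalJobId").text = job_id
    ET.SubElement(job, UR + "LocalJobId").text = job_id.split(".")[0]
    user = ET.SubElement(rec, UR + "UserIdentity")
    ET.SubElement(user, UR + "LocalUserId").text = kv["user"]
    if dn is not None:  # the Grid identity has to be supplied from outside PBS
        x509 = ET.SubElement(ET.SubElement(user, DS + "KeyInfo"), DS + "X509Data")
        ET.SubElement(x509, DS + "X509SubjectName").text = dn
    # Simplified mapping: a zero exit status counts as "completed".
    ET.SubElement(rec, UR + "Status").text = "completed" if kv.get("Exit_status") == "0" else "failed"
    ET.SubElement(rec, UR + "CpuDuration").text = f"PT{hms_seconds(kv['resources_used.cput'])}S"
    ET.SubElement(rec, UR + "WallDuration").text = f"PT{hms_seconds(kv['resources_used.walltime'])}S"
    ET.SubElement(rec, UR + "StartTime").text = iso_utc(kv["start"])
    ET.SubElement(rec, UR + "EndTime").text = iso_utc(kv["end"])
    ET.SubElement(rec, UR + "MachineName").text = kv["exec_host"].split("/")[0]
    ET.SubElement(rec, UR + "Queue").text = kv["queue"]
    return rec


# Hypothetical log line, constructed to match the example record above.
LINE = ("11/29/2004 06:48:34;E;147989.green;user=wwmarko queue=normal "
        "start=1101710913 end=1101710914 exec_host=green/0 "
        "resources_used.cput=00:00:00 resources_used.walltime=00:00:01 Exit_status=0")
rec = pbs_end_to_ur(LINE, dn="CN=john ainsworth, L=MC, OU=Manchester, O=eScience, C=UK")
print(ET.tostring(rec, encoding="unicode"))
```

The awkward step is the `dn` argument: everything else can be recovered from the batch system's own log, which is why the DN has to be captured at job submission time.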

Implementation Info

[Figure: architecture diagram with the components Application, Service Interface, RUS Web Service, Web Service Container, Access control list, XMLDB API and XML Database]

Service Interface (1)

• Write Operations
  – insertUsageRecords(UsageRecord[]), replaceUsageRecords(RecordAndId[])
  – deleteRecords(XpathQuery), deleteSpecificRecord(RecordId[])
  – modifyUsageRecordPart, updateUsageRecordPart (not implemented)

• Read Operations
  – extractRecords(XpathQuery), extractSpecificRecords(RecordId[])
  – extractUsageByGlobalUserId, extractUsageByMachineName, extractUsageBySubmitHost, ...
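Several of these operations can be illustrated with a toy, in-memory stand-in. This is a sketch only: the class, the snake_case method names, the `rus-N` identifier format and the choice to return whole records from the machine-name query are all assumptions, and ElementTree's limited XPath subset stands in for the full XPath queries that extractRecords and deleteRecords accept. The real RUS sits behind a WSDL interface with an XML database underneath.

```python
import itertools
import xml.etree.ElementTree as ET
from typing import Dict, List

UR = "{http://www.gridforum.org/2003/ur-wg}"


class InMemoryRUS:
    """Toy stand-in; method names loosely mirror the RUS operations."""

    def __init__(self) -> None:
        self._records: Dict[str, ET.Element] = {}
        self._ids = itertools.count(1)

    def insert_usage_records(self, records: List[ET.Element]) -> List[str]:
        ids = []
        for rec in records:
            rid = f"rus-{next(self._ids)}"  # hypothetical RUS ID format
            self._records[rid] = rec
            ids.append(rid)
        return ids

    def extract_specific_records(self, ids: List[str]) -> List[ET.Element]:
        return [self._records[i] for i in ids if i in self._records]

    def extract_records(self, path: str) -> List[ET.Element]:
        # ElementTree XPath subset, evaluated relative to each <UsageRecord> root.
        return [rec for rec in self._records.values() if rec.find(path) is not None]

    def extract_usage_by_machine_name(self, name: str) -> List[ET.Element]:
        # Assumes `name` contains no single quote.
        return self.extract_records(f"{UR}MachineName[.='{name}']")

    def delete_records(self, path: str) -> int:
        doomed = [rid for rid, rec in self._records.items() if rec.find(path) is not None]
        for rid in doomed:
            del self._records[rid]
        return len(doomed)

    def delete_specific_record(self, ids: List[str]) -> None:
        for rid in ids:
            self._records.pop(rid, None)


def make(machine: str) -> ET.Element:
    rec = ET.Element(UR + "UsageRecord")
    ET.SubElement(rec, UR + "MachineName").text = machine
    return rec


rus = InMemoryRUS()
ids = rus.insert_usage_records([make("green"), make("wren")])
print(len(rus.extract_usage_by_machine_name("green")))
rus.delete_specific_record([ids[0]])
print(len(rus.extract_usage_by_machine_name("green")))
```

The query-based operations (extractRecords, deleteRecords) and the ID-based ones (extractSpecificRecords, deleteSpecificRecord) are two views of the same store; the convenience extractUsageBy... calls are just fixed queries.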

Service Interface (2)

• Management
  – retrieveConfiguration
  – updateConfiguration

• Faults
  – RUSProcessingFault
  – RUSUserNotAuthorised
  – RUSInputFault

Security Model

• Role-based security
  – Specified through an access control file (XML), which is cached
  – Administrator
    • Unrestricted read/write authorization
  – ResourceManager
    • Restricted read/write authorization
    • Requires a ResourceDescription to specify the resources for which the RM has permission
    • ResourceTypes are urwg:MachineName, urwg:SubmitHost, urwg:ProjectName and Domain
    • Authorization for a record is determined by a logical AND between different ResourceTypes and a logical OR within the values of the same ResourceType
  – All other users are denied both read and write access
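The AND/OR rule for Resource Managers can be made concrete with a small sketch. Assumptions: a record is reduced to a dictionary of ResourceType values (how Domain is derived from a record is not stated here, so it is simply another key), a ResourceType that the ResourceDescription does not mention places no constraint, and a record lacking a constrained value is denied. The `Principal` type and role strings are illustrative, not the RUS access control file format.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Principal:
    dn: str
    role: str  # "Administrator", "ResourceManager"; any other role is denied
    # ResourceDescription: ResourceType -> permitted values (ResourceManager only)
    resources: Dict[str, Set[str]] = field(default_factory=dict)


def authorised(who: Principal, record: Dict[str, str]) -> bool:
    if who.role == "Administrator":
        return True  # unrestricted
    if who.role != "ResourceManager" or not who.resources:
        return False  # everyone else, or an RM without a ResourceDescription
    # AND across ResourceTypes; OR across the permitted values of each type.
    return all(record.get(rtype) in allowed for rtype, allowed in who.resources.items())


rm = Principal("CN=example rm", "ResourceManager",
               {"MachineName": {"green", "wren"}, "ProjectName": {"cs5015"}})
print(authorised(rm, {"MachineName": "green", "ProjectName": "cs5015", "SubmitHost": "wren"}))  # True
print(authorised(rm, {"MachineName": "green", "ProjectName": "other"}))  # False
```

Using sets for the per-type values keeps the OR a single membership test, and `all(...)` expresses the AND across types directly.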

Configuration: Mandatory Record Elements

A record must contain these elements for it to be valid for this RUS.

This resolves the “everything is optional” problem inherent in the Usage Record specification.
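A mandatory-element check can be as simple as the sketch below. The `MANDATORY` list is a hypothetical configuration (the slide does not say which elements this RUS requires); paths are written relative to `<UsageRecord>` and each step is qualified with the UR namespace before lookup.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List

UR = "{http://www.gridforum.org/2003/ur-wg}"

# Hypothetical per-deployment configuration.
MANDATORY = ["RecordIdentity", "JobIdentity/GlobalJobId", "UserIdentity/LocalUserId",
             "MachineName", "WallDuration", "EndTime"]


def missing_elements(record: ET.Element, mandatory: Iterable[str] = MANDATORY) -> List[str]:
    """Return the configured paths that are absent from the record."""
    missing = []
    for path in mandatory:
        qualified = "/".join(UR + step for step in path.split("/"))
        if record.find(qualified) is None:
            missing.append(path)
    return missing


rec = ET.fromstring('<UsageRecord xmlns="http://www.gridforum.org/2003/ur-wg">'
                    '<JobIdentity><GlobalJobId>green147989</GlobalJobId></JobIdentity>'
                    '<MachineName>green</MachineName></UsageRecord>')
print(missing_elements(rec))  # ['RecordIdentity', 'UserIdentity/LocalUserId', 'WallDuration', 'EndTime']
```

Returning the list of missing paths, rather than a bare boolean, gives the service something concrete to put in an error response.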

RUSUsageRecord

• Internal wrapper around UsageRecord
• Adds elements
  – RUSId
  – RecordHistory
• Audit trail of record insertion and modification
• Records who and when in StoredBy and ModifiedBy elements
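The wrapper can be sketched with ElementTree as below. Only the element names RUSUsageRecord, RUSId, RecordHistory, StoredBy and ModifiedBy come from the slide; leaving them un-namespaced, storing "who" and "when" as `who`/`when` attributes, and the helper names are assumptions about a layout the slide does not show.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def now_iso() -> str:
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def wrap(record: ET.Element, rus_id: str, stored_by: str) -> ET.Element:
    """Wrap a UsageRecord and start its audit trail with a StoredBy entry."""
    wrapper = ET.Element("RUSUsageRecord")
    ET.SubElement(wrapper, "RUSId").text = rus_id
    history = ET.SubElement(wrapper, "RecordHistory")
    # Hypothetical layout: who = caller's DN, when = UTC timestamp.
    ET.SubElement(history, "StoredBy", {"who": stored_by, "when": now_iso()})
    wrapper.append(record)
    return wrapper


def record_modification(wrapper: ET.Element, modified_by: str, new_record: ET.Element) -> None:
    """Swap in a new UsageRecord and append a ModifiedBy entry to the history."""
    old = wrapper[2]  # children are: RUSId, RecordHistory, UsageRecord
    wrapper.remove(old)
    wrapper.append(new_record)
    wrapper.find("RecordHistory").append(
        ET.Element("ModifiedBy", {"who": modified_by, "when": now_iso()}))


w = wrap(ET.Element("UsageRecord"), "rus-1", "CN=john ainsworth")
record_modification(w, "CN=some administrator", ET.Element("UsageRecord"))
print(ET.tostring(w, encoding="unicode"))
```

Keeping the history inside the stored document means the audit trail travels with the record rather than living in a separate log.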

InsertUsageRecords

• Check user authorization for record
• Validate record against schema
• Check mandatory elements are present
• Check the record is not a duplicate
• Insert into database
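The insertion steps above can be sketched as a single loop. Assumptions: schema validation is reduced to checking the root element, because the Python standard library has no XML Schema validator (a real implementation would validate against the UR-WG schema); a duplicate means an already-stored `urwg:recordId`; the authorization check is passed in as a callable; and the per-record status strings are illustrative, with InvalidRecord and DuplicateRecord borrowed from the error codes named in the implementation notes.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

UR = "{http://www.gridforum.org/2003/ur-wg}"


def insert_usage_records(xml_records: List[str], store: Dict[str, ET.Element],
                         is_authorised: Callable[[ET.Element], bool],
                         mandatory: List[str]) -> List[str]:
    statuses = []
    for text in xml_records:
        try:
            rec = ET.fromstring(text)  # the record must be read before it can be checked
        except ET.ParseError:
            statuses.append("InvalidRecord")
            continue
        if not is_authorised(rec):  # 1. authorization for this record
            statuses.append("NotAuthorised")
            continue
        if rec.tag != UR + "UsageRecord":  # 2. stand-in for schema validation
            statuses.append("InvalidRecord")
            continue
        if any(rec.find(UR + name) is None for name in mandatory):  # 3. mandatory elements
            statuses.append("InvalidRecord")
            continue
        ident = rec.find(UR + "RecordIdentity")
        key = None if ident is None else ident.get(UR + "recordId")
        if key is None:
            statuses.append("InvalidRecord")
            continue
        if key in store:  # 4. duplicate check
            statuses.append("DuplicateRecord")
            continue
        store[key] = rec  # 5. insert
        statuses.append("OK")
    return statuses


def ur(record_id: str, machine: str) -> str:
    return ('<UsageRecord xmlns="http://www.gridforum.org/2003/ur-wg" '
            'xmlns:urwg="http://www.gridforum.org/2003/ur-wg">'
            f'<RecordIdentity urwg:recordId="{record_id}"/>'
            f'<MachineName>{machine}</MachineName></UsageRecord>')


def green_only(rec: ET.Element) -> bool:
    return rec.findtext(UR + "MachineName") == "green"


store: Dict[str, ET.Element] = {}
print(insert_usage_records([ur("a", "green"), ur("a", "green"), ur("b", "wren"), "<oops"],
                           store, green_only, ["RecordIdentity", "MachineName"]))
# ['OK', 'DuplicateRecord', 'NotAuthorised', 'InvalidRecord']
```

Reporting a status per record lets one bad record in a batch be rejected without failing the records around it.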

Implementation notes

• Started with WS-Security, but moved to TLS
  – TLS is more widely available
• Extended set of error codes
  – Added InvalidRecord and DuplicateRecord (used in responses to insert and replace)
• Database stores each record as a separate document
  – Because of Xindice’s single document size limitation
• Developed a web-based query client
• Developed a Perl usage record spooler

Test Machine Specification

                           Test Server 1          Test Server 2
Processor type and speed   Intel 3.06 GHz         Intel 3.00 GHz
No. of processors          2                      1
RAM                        4 GB                   512 MB
Operating system           Red Hat Enterprise 3   Fedora Core 1
Disks                      2 x 120 GB             1 x 25 GB
Container                  Sun Java System Application Server Platform Edition 8.0.0_01 (both servers)
XML Database               Xindice 1.1b4          Xindice 1.1b4

[Figures: average record insertion time (s) against number of records in the database, each showing the data set and a linear fit]

• Test Server 1: y = 2E-08x + 0.0487 (x axis to 3,000,000 records)
• Test Server 1 (Restricted Data Set): y = 7E-09x + 0.0565 (x axis to 3,000,000 records)
• Test Server 2: y = 3E-07x + 0.0213 (x axis to 1,000,000 records)
• Test Server 2 (Restricted Data Set): y = 2E-07x + 0.0504 (x axis to 1,000,000 records)
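As a rough reading of these fits, the sketch below evaluates each printed line at 1,000,000 records, which lies within the plotted range for both servers. The coefficients are the rounded values shown on the slides, so the results are indicative only, and a linear fit says nothing about behaviour outside the measured range.

```python
# Linear fits as printed: average insertion time (s) = slope * records + intercept.
FITS = {
    "Test Server 1": (2e-08, 0.0487),
    "Test Server 1 (restricted)": (7e-09, 0.0565),
    "Test Server 2": (3e-07, 0.0213),
    "Test Server 2 (restricted)": (2e-07, 0.0504),
}


def predicted_seconds(name: str, records: int) -> float:
    slope, intercept = FITS[name]
    return slope * records + intercept


for name in FITS:
    print(f"{name}: {predicted_seconds(name, 1_000_000):.4f} s at 1,000,000 records")
```

On the restricted data sets this gives about 0.06 s per insert on Test Server 1 against about 0.25 s on Test Server 2 at one million stored records, and the Server 2 slope is roughly 30 times steeper.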

Final Comments

• This is mature work, which is being deployed in the UK, at Manchester and at other sites

• The work is based on emerging standards from the Grid community

• The implementation has a future, including development and support

• Any questions?