Resource Usage Monitoring and Accounting

Adrian Jackson, Stephen Booth
EPCC
s.booth@epcc.ed.ac.uk
+44 131 650 5746

GridSafe AHM 2009

Introduction

• Resource usage accounting has long been standard practice on high-end compute resources.

• Historically less common on smaller systems, where it was easier to apportion costs locally.
  – This is becoming less viable:
    – FEC costing
    – Grid computing (users no longer local)
    – Virtualisation

GridSAFE

• JISC-funded project to build a general-purpose accounting/monitoring solution.
  – http://gridsafe.forge.nesc.ac.uk/
  – Builds on the accounting subsystem of the SAFE user administration system used by HPCx/HECToR.

• Challenges:
  – Need to work with a wide variety of different local policies.
  – Need to work with both grids and local HPC resources.

• One solution won’t fit all potential users.
  – Build a kit of parts.
  – Pre-built solutions for common deployment scenarios.

• Key aims:
  – Modular design: individual functions can be deployed independently.
  – Behaviour can be customised using plug-ins to implement different service policies.

End Users

• End users are interested in accounting for their own use.
  – Compare the efficiency of different systems.
  – Compare the cost effectiveness of different systems.
  – Check resources available.

• Often interested in individual jobs as well as overall totals.

Resource Providers

• Need to gather the raw accounting data.
  – Format depends on the underlying technology.

• Need to apply local policies:
  – Charges
  – Discounts
  – Where to charge

• Usage data may be useful for purposes other than accounting.
  – Analysing queue wait times.
  – Job size profiles.
  – May want to keep some of this data private.

Research groups/Virtual organisations

• Research groups/VOs need to manage their resources across all available platforms.
  – Ideally have all information available in a single place.

• Where all resources reside within a single grid, this can be provided by grid-level accounting.

• Resources may come from multiple grids or independent resource providers.

Overview

Grid-SAFE core

• Java code with data stored in a MySQL database.
  – Normally run within a Tomcat container.

• UsageRecords are treated as a collection of properties.

• Highly customisable:
  – Code does not mandate a single format.
  – Can choose which of the available properties to store in the database.
  – Can add new properties for site-local concepts.
  – Easily extendable to new types of data:
    – Storage accounting
    – Allocation tracking
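
To make the "collection of properties" idea concrete, here is a minimal sketch. It is written in Python for brevity, whereas Grid-SAFE itself is Java, and it does not reproduce the Grid-SAFE API: the class `UsageRecord`, the function `project_for_storage`, and all property names are hypothetical, chosen only to illustrate storing a configurable subset of properties and adding a site-local one.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable


@dataclass
class UsageRecord:
    """A usage record held as an open-ended bag of named properties (hypothetical)."""
    properties: Dict[str, Any] = field(default_factory=dict)

    def get(self, name: str, default: Any = None) -> Any:
        return self.properties.get(name, default)

    def with_property(self, name: str, value: Any) -> "UsageRecord":
        # Return a new record rather than mutating the existing one.
        return UsageRecord({**self.properties, name: value})


def project_for_storage(record: UsageRecord, stored: Iterable[str]) -> Dict[str, Any]:
    """Keep only the properties this site has chosen to persist."""
    return {name: record.get(name) for name in stored if name in record.properties}


# Hypothetical site configuration: three general properties plus one site-local one.
STORED_PROPERTIES = ["user", "machine", "wall_seconds", "cost_centre"]

record = UsageRecord({"user": "alice", "machine": "hpc1",
                      "wall_seconds": 3600, "queue": "long"})
record = record.with_property("cost_centre", "physics")  # site-local concept

row = project_for_storage(record, STORED_PROPERTIES)
print(row)
```

Because no fixed record format is mandated in this sketch, the `queue` property is parsed but simply not stored, while `cost_centre` is stored even though no input format defines it.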

Accounting code

• Plug-in parser modules handle different types of input data:
  – OGF-UR
  – SGE
  – PBS
  – EGEE JobManager
  – Etc.

• Plug-in policy modules augment these, allowing site-local customisation.
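
The parser/policy split can be sketched as follows. This is an illustration in Python, not Grid-SAFE code: the registry `PARSERS`, the decorator `register_parser`, the policy `charge_by_walltime`, and the rate of 2.0 units per wall-clock hour are all assumptions. The input line is a simplified PBS-style accounting line, not a complete treatment of the PBS log format.

```python
from typing import Any, Callable, Dict, Iterable

Record = Dict[str, Any]
PARSERS: Dict[str, Callable[[str], Record]] = {}


def register_parser(fmt: str):
    """Register a parser plug-in under a format name (hypothetical mechanism)."""
    def decorator(func: Callable[[str], Record]) -> Callable[[str], Record]:
        PARSERS[fmt] = func
        return func
    return decorator


@register_parser("pbs-like")
def parse_pbs_like(line: str) -> Record:
    # Simplified layout: "timestamp;type;job_id;key=value key=value ..."
    timestamp, record_type, job_id, message = line.split(";", 3)
    record: Record = {"timestamp": timestamp, "record_type": record_type, "job_id": job_id}
    for item in message.split():
        key, _, value = item.partition("=")
        record[key] = value
    return record


def charge_by_walltime(record: Record) -> Record:
    # Hypothetical site policy: 2.0 units per wall-clock hour, charged to the job's group.
    hours, minutes, seconds = (int(p) for p in record["resources_used.walltime"].split(":"))
    wall_hours = hours + minutes / 60 + seconds / 3600
    return {**record, "charge": wall_hours * 2.0, "charged_to": record["group"]}


def load(fmt: str, line: str, policies: Iterable[Callable[[Record], Record]]) -> Record:
    """Parse one line with the chosen parser, then apply each policy in order."""
    record = PARSERS[fmt](line)
    for policy in policies:
        record = policy(record)
    return record


LINE = ("03/10/2009 12:00:00;E;1234.hpc1;"
        "user=alice group=physics resources_used.walltime=02:30:00")
result = load("pbs-like", LINE, [charge_by_walltime])
print(result["user"], result["charged_to"], result["charge"])
```

Keeping parsing and policy separate means a site could swap the charging rule without touching the parser, and a new input format only needs a new parser.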

Reporting Portal

• Grid-SAFE uses XML templates to define reports.
  – Can generate unified reports over multiple data tables containing different types of data.
  – Tables/charts.
  – Parameterised reports (e.g. to select user or project).

• Supports reports in multiple formats:
  – PDF, HTML, CSV

• Performance of report generation is a particular issue.
  – Utilise the database effectively.
  – Use aggregate tables for high-throughput systems.
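
The aggregate-table idea can be illustrated with a small sketch. It uses Python's built-in `sqlite3` in place of MySQL so that it is self-contained, and the table and column names (`job_record`, `daily_usage`, `username`, `day`, `wall_seconds`) are hypothetical rather than the Grid-SAFE schema. Reports read the small pre-summed table instead of scanning every job record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE job_record (username TEXT, day TEXT, wall_seconds INTEGER);
CREATE TABLE daily_usage (username TEXT, day TEXT, jobs INTEGER, wall_seconds INTEGER,
                          PRIMARY KEY (username, day));
""")

# Raw per-job records (illustrative data only).
jobs = [
    ("alice", "2009-09-01", 3600),
    ("alice", "2009-09-01", 1800),
    ("bob",   "2009-09-01", 600),
    ("alice", "2009-09-02", 7200),
]
conn.executemany("INSERT INTO job_record VALUES (?, ?, ?)", jobs)

# Rebuild the aggregate table: one row per user per day.
conn.execute("DELETE FROM daily_usage")
conn.execute("""
INSERT INTO daily_usage (username, day, jobs, wall_seconds)
SELECT username, day, COUNT(*), SUM(wall_seconds)
FROM job_record
GROUP BY username, day
""")

# A report query now touches the aggregate table only.
report = conn.execute("""
SELECT username, SUM(jobs), SUM(wall_seconds)
FROM daily_usage
GROUP BY username
ORDER BY username
""").fetchall()
print(report)
```

On a high-throughput system the aggregate would more likely be updated incrementally as records arrive; the full rebuild here just keeps the sketch short.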

Sample report

Web Services

• Web service interface for access by other services.

• Web service interfaces use OGF-UR XML as the common interchange format.

• RUPI – Resource Usage Publishing Interface
  – Interface for uploading usage records to a remote repository.
  – Currently an OGF-RUS-WG proposal.

• RUQI – Resource Usage Query Interface
  – Interface for running queries on a remote repository.
  – Aim to submit to OGF-RUS-WG.
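
As an illustration of using OGF-UR XML as an interchange format, the sketch below writes a record to XML and reads it back using Python's standard library. The namespace URI and element names follow my understanding of the OGF Usage Record 1.0 format and should be checked against the specification; only a handful of fields are covered, and the RUPI/RUQI operations themselves are not modelled. The helper names and the property keys are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed OGF-UR 1.0 namespace; verify against the specification before relying on it.
URF_NS = "http://schema.ogf.org/urf/2003/09/urf"
ET.register_namespace("urf", URF_NS)


def q(tag: str) -> str:
    """Qualify a tag or attribute name with the usage-record namespace."""
    return f"{{{URF_NS}}}{tag}"


def to_usage_record_xml(props: dict) -> str:
    root = ET.Element(q("JobUsageRecord"))
    ET.SubElement(root, q("RecordIdentity")).set(q("recordId"), props["record_id"])
    job = ET.SubElement(root, q("JobIdentity"))
    ET.SubElement(job, q("LocalJobId")).text = props["job_id"]
    user = ET.SubElement(root, q("UserIdentity"))
    ET.SubElement(user, q("LocalUserId")).text = props["user"]
    ET.SubElement(root, q("MachineName")).text = props["machine"]
    # Wall time as an ISO 8601 duration expressed in seconds.
    ET.SubElement(root, q("WallDuration")).text = f"PT{props['wall_seconds']}S"
    return ET.tostring(root, encoding="unicode")


def from_usage_record_xml(text: str) -> dict:
    ns = {"urf": URF_NS}
    root = ET.fromstring(text)
    duration = root.find("urf:WallDuration", ns).text
    return {
        "record_id": root.find("urf:RecordIdentity", ns).get(q("recordId")),
        "job_id": root.find("urf:JobIdentity/urf:LocalJobId", ns).text,
        "user": root.find("urf:UserIdentity/urf:LocalUserId", ns).text,
        "machine": root.find("urf:MachineName", ns).text,
        # Only handles the "PT<seconds>S" form produced above.
        "wall_seconds": int(duration[2:-1]),
    }


props = {"record_id": "hpc1-1234", "job_id": "1234.hpc1", "user": "alice",
         "machine": "hpc1", "wall_seconds": 3600}
xml_text = to_usage_record_xml(props)
round_trip = from_usage_record_xml(xml_text)
print(xml_text)
```

A shared format like this is what would let a publishing interface and a query interface be implemented independently of the local batch system.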

Grid level accounting

• Grid accounting is not a solved problem.
  – We are aiming to contribute useful technology, not to dictate a solution.

• Different grids are pursuing different architectures:
  – EGEE/NGS hierarchical model.
    – Data published up a tree of repositories.
  – DEISA distributed model.
    – Resource providers run local repositories and control access to data.
    – Accounting operations query multiple repositories.

• Some commonality:
  – OGF-UR format generally accepted as the common data interchange format.

• Combination of RUPI/RUQI can be used to implement either model.
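
A toy model shows how one publish operation and one query operation can cover both architectures. The `Repository` class below is a hypothetical in-memory stand-in written in Python: `publish` and `query` play the roles of RUPI and RUQI but are not those interfaces, and access control is not modelled.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]


class Repository:
    """In-memory stand-in for a usage-record repository (hypothetical)."""

    def __init__(self, name: str, parent: Optional["Repository"] = None):
        self.name = name
        self.parent = parent
        self.records: List[Record] = []

    def publish(self, records: List[Record]) -> None:
        # Publish-style upload; a repository with a parent forwards the data up the tree.
        self.records.extend(records)
        if self.parent is not None:
            self.parent.publish(records)

    def query(self, predicate: Callable[[Record], bool]) -> List[Record]:
        # Query-style lookup against this repository only.
        return [r for r in self.records if predicate(r)]


def federated_query(repos: List[Repository],
                    predicate: Callable[[Record], bool]) -> List[Record]:
    """Distributed model: ask each repository in turn and merge the answers."""
    results: List[Record] = []
    for repo in repos:
        results.extend(repo.query(predicate))
    return results


site_a_data = [{"user": "alice", "wall_seconds": 100}, {"user": "bob", "wall_seconds": 50}]
site_b_data = [{"user": "alice", "wall_seconds": 200}]


def is_alice(record: Record) -> bool:
    return record["user"] == "alice"


# Hierarchical model: sites publish up to a grid-level repository, which is queried once.
grid = Repository("grid")
Repository("site-a", parent=grid).publish(site_a_data)
Repository("site-b", parent=grid).publish(site_b_data)
hierarchical = grid.query(is_alice)

# Distributed model: sites keep their data locally and the client queries each one.
local_a, local_b = Repository("site-a"), Repository("site-b")
local_a.publish(site_a_data)
local_b.publish(site_b_data)
distributed = federated_query([local_a, local_b], is_alice)

print(sum(r["wall_seconds"] for r in hierarchical),
      sum(r["wall_seconds"] for r in distributed))
```

Both routes return the same records here; the difference is where the data lives and who decides what may be queried.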

• Actively looking for sites to use the software

• Don’t need to use everything

• http://gridsafe.forge.nesc.ac.uk/

s.booth@epcc.ed.ac.uk