Grid Discovery and Monitoring Systems Laura Pearlman USC/Information Sciences Institute With materials from Ben Clifford and others from the Globus Project Team


Page 1: Grid Discovery and Monitoring Systems

Grid Discovery and Monitoring Systems

Laura Pearlman, USC/Information Sciences Institute

With materials from Ben Clifford and others from the Globus Project Team

Page 2: Grid Discovery and Monitoring Systems

Outline

• Overview of information systems
• Some real implementations
  • Globus MDS2 / BDII
  • Globus MDS4
  • Inca
  • GMA / R-GMA

Page 3: Grid Discovery and Monitoring Systems

Discovery and Monitoring

• Discovery: finding resources that exist, at any moment, possibly meeting some criteria
  • E.g., “find Linux boxes with Java 1.5 installed”
• Monitoring: determining the state of one or more resources
  • E.g., “how much memory is free on machine X?”
• “Monitoring” and “Discovery” information sometimes overlap
  • “find me machines with 2G memory” vs. “how much memory does Machine X have?”
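The difference between the two kinds of question can be sketched in a few lines. The inventory, its attribute names, and both functions are hypothetical; a real system would answer these questions through a query interface rather than a Python dictionary.

```python
# Hypothetical inventory: host name -> attributes (all values invented for illustration).
RESOURCES = {
    "hostA": {"os": "linux", "java": "1.5", "memory_gb": 2, "free_memory_gb": 0.5},
    "hostB": {"os": "linux", "java": "1.4", "memory_gb": 4, "free_memory_gb": 3.0},
    "hostC": {"os": "solaris", "java": "1.5", "memory_gb": 2, "free_memory_gb": 1.0},
}


def discover(predicate):
    """Discovery: return the names of all resources that meet some criteria."""
    return sorted(name for name, attrs in RESOURCES.items() if predicate(attrs))


def monitor(name, attribute):
    """Monitoring: return one piece of state for one known resource."""
    return RESOURCES[name][attribute]


# "find Linux boxes with Java 1.5 installed"
linux_with_java15 = discover(lambda a: a["os"] == "linux" and a["java"] == "1.5")

# "how much memory is free on machine X?"
free_on_a = monitor("hostA", "free_memory_gb")
```

The overlap shows up in `memory_gb`: it can serve as a discovery criterion or be read back as the state of a single host.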

Page 4: Grid Discovery and Monitoring Systems

Examples of Useful Information

• Characteristics of a compute resource
  • Software available, networks connected to, load, type of CPU, disk space
• Characteristics of a network
  • Bandwidth and latency, protocols
• Information about a service
  • Contact info, version number, etc.

Page 5: Grid Discovery and Monitoring Systems

Who uses this information?

• Individual users, trying to pick the ‘best’ resource
• Brokers or workflow systems trying to find suitable resources
• VO administrators who want to know the state of every resource.
• System administrators may use this information, but probably also have local site monitoring systems in place

Page 6: Grid Discovery and Monitoring Systems

What Interfaces are Needed?

• Graphic and command-line interfaces for individual users and administrators
• Programmatic interfaces for brokers, workflow systems, etc.
• Asynchronous notifications for administrators
  • “send me mail when we’re almost out of disk space”

Page 7: Grid Discovery and Monitoring Systems

Monitoring/Discovery Problems in Grids

• Dynamic in nature
  • VOs come and go
  • Resources join and leave VOs
  • Resources change status and fail
• Geographically distributed users
• Geographically distributed resources
• Heterogeneous implementations

Page 8: Grid Discovery and Monitoring Systems

Grid Information: Facts of Life

• Information is always old
• Distributed state is hard to obtain
• Components will fail
• We must deal with this gracefully
• Scalability and overhead
• Many different usage scenarios

Page 9: Grid Discovery and Monitoring Systems

Resource Discovery/Monitoring

• Distributed users and resources
• Variable resource status
• Variable grouping

[Figure: resources (R) and dispersed users (?) across VO-A and VO-B, connected by a network]

Page 10: Grid Discovery and Monitoring Systems

Resource Discovery/Monitoring

• Some resources have failed
• A network partition has occurred
• Still, some work can get done…

[Figure: the same resources (R) and dispersed users (?) across VO-A and VO-B, connected by a network]

Page 11: Grid Discovery and Monitoring Systems

Scalability

• Large numbers
  • Many resources
  • Many users
• Independence
  • Resources shouldn’t affect one another
  • VOs shouldn’t affect one another
• Graceful degradation of service
  • “As much function as possible”
  • Tolerate partitions, prune failures

Page 12: Grid Discovery and Monitoring Systems

Failure Scenarios

• User is disconnected
• Resource fails or is disconnected
• Discovery service fails or is disconnected
• Network partition

Page 13: Grid Discovery and Monitoring Systems

When a user is disconnected

• This should not adversely affect other users
• Some state (such as the user’s subscriptions) may need to be cleaned up.
• Some systems use soft-state to deal with this issue:
  • Subscriptions are valid for a limited time and must be periodically refreshed
  • If the user does not come back in time to refresh the subscription, it will be removed automatically.
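A minimal sketch of the soft-state idea, assuming a single fixed lifetime for all subscriptions. The class name, method names, and the injectable clock are illustrative and not taken from any of the systems discussed here.

```python
import time


class SoftStateRegistry:
    """Subscriptions expire unless refreshed within `lifetime` seconds."""

    def __init__(self, lifetime, clock=time.monotonic):
        self.lifetime = lifetime
        self.clock = clock      # injectable so the behaviour can be tested
        self._expires = {}      # subscriber id -> expiry time

    def subscribe(self, subscriber):
        """Create a subscription, or extend an existing one."""
        self._expires[subscriber] = self.clock() + self.lifetime

    # Refreshing is the same operation as subscribing: it resets the expiry time.
    refresh = subscribe

    def sweep(self):
        """Remove every subscription whose lifetime has run out."""
        now = self.clock()
        expired = [s for s, t in self._expires.items() if t <= now]
        for s in expired:
            del self._expires[s]
        return expired

    def active(self):
        """Return the subscribers that are still registered."""
        self.sweep()
        return sorted(self._expires)
```

No explicit "unsubscribe" is needed when a user disappears: a subscription that is never refreshed is dropped by the next sweep.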

Page 14: Grid Discovery and Monitoring Systems

When a resource disappears

Monitoring services should indicate that the resource is no longer there

Discovery services should stop advertising the resource

Neither of these can be guaranteed to happen instantaneously.

Page 15: Grid Discovery and Monitoring Systems

When a discovery service dies

Users cannot discover new resources. They may have old information cached; this data is still useful, although it degrades in quality/usefulness.

Users can contact the resources directly and determine their status.

Some implementations allow for mirroring of discovery services.

Page 16: Grid Discovery and Monitoring Systems

When the network is partitioned

This could be seen as a generalization of some of the previous scenarios: all of them can be modelled as appropriate network partitions.

If there is a discovery service in a user’s partition, the user should be able to discover resources in that partition.

Page 17: Grid Discovery and Monitoring Systems

Information Systems

• We sometimes refer to Discovery and Monitoring as “Information Systems”
  • This is misleading, as we’re not including general-purpose database systems
• Discovery and Monitoring information is:
  • Often stale as soon as it’s reported
  • Sometimes inconsistent
  • Often updated by running probes, either on demand or periodically

Page 18: Grid Discovery and Monitoring Systems

Discovery Services

• Used to locate monitoring services with information about resources.
• May cache some resource data
  • May even cache enough resource data to act as a monitoring system.
• Generally involve a database-like query interface
  • Languages like LDAP, XPath, SQL
• Usually a relatively small number (maybe even just one, or one with a mirror) are deployed in a VO.

Page 19: Grid Discovery and Monitoring Systems

Two Models for Discovery Services

[Figure: two models, one with a separate Discovery Service alongside Monitoring Services, and one with a combined Monitoring & Discovery Service alongside Monitoring Services]

Page 20: Grid Discovery and Monitoring Systems

Monitoring Services

• Used to monitor the state of a resource
• Service interface usually involves database-like queries
  • With languages like LDAP, XPath, SQL
• Often also provides for asynchronous notification
• Typically also includes a back-end provider interface
  • Allows locally-written scripts, programs, etc. to collect information for the monitoring service
• Typically deployed on each host that houses a resource.

Page 21: Grid Discovery and Monitoring Systems

How Different Implementations Differ

• Overall architecture
  • Are monitoring and discovery separate?
• Wire protocol
  • LDAP, Web Services, custom
• Query Language
  • LDAP, XPath, SQL
• Caching Strategies
• Schemas
  • Really more a deployment issue

Page 22: Grid Discovery and Monitoring Systems

MDS2 / BDII history

• MDS2 was developed as part of the Globus Toolkit
  • It’s now superseded by MDS4, which has a different architecture.
• BDII is a reimplementation of MDS2 by EGEE, and is still in use.

Page 23: Grid Discovery and Monitoring Systems

MDS2 Architecture Overview

• The Grid Resource Information Service (GRIS) collects information about a local resource and responds to requests for that information
  • Uses pluggable information providers
• The Grid Index Information Service (GIIS) aggregates information from various GRIS servers
• Users may query the GIIS for aggregated information or query the GRIS servers directly.
• GIIS servers may be arranged hierarchically.

Page 24: Grid Discovery and Monitoring Systems

MDS2 Architecture

[Figure: three GRIS servers, each with two information providers (IP), and three GIIS servers]

Page 25: Grid Discovery and Monitoring Systems

MDS2 GIIS

• Grid Index Information Service (GIIS) servers aggregate information from GRIS servers and other GIIS servers.
  • These other servers register themselves to the GIIS server.
  • Registrations must be periodically refreshed
• GIIS servers cache information (results from previous queries).
• If a GIIS server receives a query for which there is no fresh cached information, it forwards the query to its registered servers.
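The cache-or-forward behaviour, together with expiring registrations, can be sketched as follows. This is a toy model and not MDS2 code: the class, its parameters, and the use of plain Python callables in place of LDAP servers are all assumptions made for illustration.

```python
class IndexServer:
    """Toy aggregate server: answers from cache, else forwards to registrants."""

    def __init__(self, cache_ttl, registration_ttl, clock):
        self.cache_ttl = cache_ttl
        self.registration_ttl = registration_ttl
        self.clock = clock
        self._registered = {}   # name -> (query function, registration expiry)
        self._cache = {}        # query -> (results, time fetched)

    def register(self, name, query_fn):
        """Register a lower-level server, or refresh its registration."""
        self._registered[name] = (query_fn, self.clock() + self.registration_ttl)

    def query(self, q):
        now = self.clock()
        cached = self._cache.get(q)
        if cached is not None and now - cached[1] < self.cache_ttl:
            return cached[0]    # fresh enough: answer without forwarding
        # Drop registrations that were not refreshed in time.
        self._registered = {
            name: (fn, expiry)
            for name, (fn, expiry) in self._registered.items()
            if expiry > now
        }
        # Forward the query to every server that is still registered.
        results = []
        for fn, _ in self._registered.values():
            results.extend(fn(q))
        self._cache[q] = (results, now)
        return results
```

Because a lower-level "server" is just something that answers queries, another `IndexServer` can register its own `query` method, which gives the hierarchical arrangement.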

Page 26: Grid Discovery and Monitoring Systems

MDS2 GRIS

A Grid Resource Information Server (GRIS):

• Runs on each host that has resources to be monitored.
• Accepts requests for information about local resources
  • May come from users or GIIS servers
• Runs a local “information provider” to collect and format the information
  • Unless the requested information is cached and relatively fresh
• Caches the information and replies to the request

Page 27: Grid Discovery and Monitoring Systems

MDS2 Query Language

Both the GIIS and GRIS servers use LDAP as the service protocol and query language.

Page 28: Grid Discovery and Monitoring Systems

LDAP Basics

• Hierarchical data model
• Each entry has a distinguished name and a set of attribute/value pairs
• Distinguished name (DN)
  • Is a collection of name-value pairs
  • Must be unique
  • Determines the entry’s place in the hierarchy
    • Each entry’s DN must include its parent’s DN
• Queries
  • Can search on attributes or DNs
  • Results can include children (or not) or include only certain attributes.
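The data model can be imitated with an in-memory dictionary keyed by DN. This sketch does not speak the LDAP protocol or use an LDAP library, and the entries and attribute names are invented rather than taken from the MDS2 schema.

```python
# Each key is a DN: the entry's own name-value pair followed by its parent's DN.
ENTRIES = {
    "o=grid": {"objectclass": "organization"},
    "ou=cluster1,o=grid": {"objectclass": "cluster"},
    "hn=hostA,ou=cluster1,o=grid": {"objectclass": "host", "os": "linux", "cpus": "4"},
    "hn=hostB,ou=cluster1,o=grid": {"objectclass": "host", "os": "solaris", "cpus": "2"},
}


def parent_dn(dn):
    """Strip the leading name-value pair to get the parent's DN (None at the root)."""
    _, _, parent = dn.partition(",")
    return parent or None


def search(base, scope="sub", where=None, attrs=None):
    """Return {dn: attributes} for entries in scope that match `where`.

    scope="base" looks only at `base`; scope="sub" also includes its descendants.
    `attrs` limits which attributes are returned for each entry.
    """
    results = {}
    for dn, entry in ENTRIES.items():
        in_scope = dn == base or (scope == "sub" and dn.endswith("," + base))
        if not in_scope:
            continue
        if where and any(entry.get(k) != v for k, v in where.items()):
            continue
        results[dn] = {k: v for k, v in entry.items() if attrs is None or k in attrs}
    return results
```

For example, `search("o=grid", where={"os": "linux"}, attrs=["cpus"])` searches the whole tree on an attribute and returns only the `cpus` attribute of each match.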

Page 29: Grid Discovery and Monitoring Systems

MDS4 Overview

• MDS4 is a redesign of MDS
• The MDS4 Index Service acts as both a monitoring and discovery service.
  • Uses WSRF standard resource property queries as its query interface.
• A second monitoring service, the MDS4 Trigger Service, examines aggregated information and takes action when certain conditions are met.
  • E.g., “send email when a remote system appears to be down”.
• MDS4 uses WSRF standards for its query and registration interfaces.

Page 30: Grid Discovery and Monitoring Systems

WS-Resource Review

• A WS-Resource is a Web Service that exposes internal state as Resource Properties
  • Each Resource Property is an XML element of arbitrary complexity
• Each WS-Resource has a Resource Property Document
  • An XML document that includes all its Resource Properties
• Example: The WS-GRAM service advertises information about its associated queues and clusters as a resource property.

Page 31: Grid Discovery and Monitoring Systems

Retrieving Resource Properties

• GetResourceProperty
  • Gets a single named resource property
• GetMultipleResourceProperties
  • Gets a set of named resource properties
• QueryResourceProperty
  • Returns the results of a query against a resource’s resource property set
• Subscription/notification
  • Clients subscribe and get periodic or occasional notifications

Page 32: Grid Discovery and Monitoring Systems

What this means…

• Standard requests can be used to get state information from any WS-Resource.
• This means that every WS-Resource is also a monitoring service!
  • But not necessarily monitoring anything (i.e., providing any interesting state)
• We sometimes want information from sources other than WS-Resources
  • Non-WSRF services
  • General system information
  • Catalogues of installed software

Page 33: Grid Discovery and Monitoring Systems

Service Groups Review

• A service group is a service that represents a group of other services or resources
• Service groups contain Service Group Entries (SGEs), which consist of:
  • The address of the SGE itself,
  • The address of the Service Group that the SGE belongs to, and
  • A Content element consisting of arbitrarily-formatted data
• SGEs are created via the Service Group Add request

Page 34: Grid Discovery and Monitoring Systems

The MDS4 Index Service

• Acts as a Discovery Service
  • Gathers information from other WS-Resources
    • Including other Index Servers
• Acts as a Monitoring Service
  • Caches all the information it gathers
  • Also has a pluggable interface for Information Providers
    • Programs or Java classes that gather information

Page 35: Grid Discovery and Monitoring Systems

An MDS4 Index Deployment

[Figure: example deployment with several Index services, GRAM and RFT services, and information providers (IP)]

Page 36: Grid Discovery and Monitoring Systems

The MDS4 Index Data Model

• The Index Service keeps its data as a Service Group
• Registering a new resource to be monitored is accomplished by adding a service group entry to the service group.
• The data in each SGE contains both:
  • Configuration information
    • E.g., “query the X resource property from server Y”
  • and the actual collected data.

Page 37: Grid Discovery and Monitoring Systems

Index Data Model (simplified)

[Figure: the Index Service Group contains SGEs; an SGE shows SG EPR, SGE EPR, and Content; Content shows Config and Data; further labels include GLUECE, Queue (Name, State), Cluster (Name, OS), RP EPR, and GetRP]

Page 38: Grid Discovery and Monitoring Systems

Data Model continued

• In the Index Service data model, data is grouped with its configuration information
• The “same” data can appear in two different places in the tree, if it was acquired from two different information sources.
  • E.g., information about a host’s load average from two different GRAM servers running on that host.
• It is relatively easy to find where each piece of data came from.

Page 39: Grid Discovery and Monitoring Systems

How the Index Updates its Data

Periodically, the Index Service examines each SGE in its Service Group

If the SGE’s registration has expired and not been renewed, it is destroyed.

Otherwise, the Index looks at the Config part of the SGE content, gathers data as specified by that config information,

and updates the data in the Data part of the SGE content

Data is updated periodically, not on demand.
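One pass of this update cycle might look like the sketch below. The `Entry` class and the shape of its `config` (a dictionary holding a `fetch` callable) are hypothetical stand-ins for the SGE content; the real Index Service stores XML and gathers data through WSRF requests or information providers.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Toy service group entry: config says how to gather, data holds the result."""
    config: dict                  # hypothetical shape: {"fetch": callable}
    expires: float                # time at which the registration lapses
    data: dict = field(default_factory=dict)


def update_cycle(service_group, now):
    """One periodic pass: destroy expired entries, refresh the rest."""
    for key in list(service_group):
        entry = service_group[key]
        if entry.expires <= now:
            # Registration expired and was not renewed.
            del service_group[key]
        else:
            # Gather data as the entry's own config specifies.
            entry.data = entry.config["fetch"]()
```

A query that arrives between passes sees whatever the last pass stored, which is why the data can be somewhat out of date.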

Page 40: Grid Discovery and Monitoring Systems

Querying the Index Service

• The Index Service advertises its service group as a resource property
  • You can fetch the whole thing with GetRP or GetMultipleRPs
  • Most people use QueryRP to query it.
• QueryRP allows you to specify a dialect and a query
  • Currently, only XPath is supported as a dialect

Page 41: Grid Discovery and Monitoring Systems

XPath Queries

• Search an XML document and return some subset of the XML entities.
• If an entity is included in the results, it’s included in its entirety
  • Unlike LDAP, no way to leave out attributes or children
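A small illustration using Python's standard `xml.etree.ElementTree`, which implements only a subset of XPath. The document and its element names are invented for the example and are not the actual MDS4 or GLUE schema.

```python
import xml.etree.ElementTree as ET

# Invented document; element names are illustrative only.
DOC = ET.fromstring("""
<Index>
  <Entry>
    <Queue><Name>short</Name><State>open</State></Queue>
    <Cluster><Name>c1</Name><OS>linux</OS></Cluster>
  </Entry>
  <Entry>
    <Queue><Name>long</Name><State>closed</State></Queue>
    <Cluster><Name>c2</Name><OS>linux</OS></Cluster>
  </Entry>
</Index>
""")

# Select every Queue element that has a State child with the text "open".
open_queues = DOC.findall(".//Queue[State='open']")

# Each match comes back whole, with all of its children.
first = open_queues[0]
child_tags = [child.tag for child in first]
```

The query selects which elements to return, but it cannot trim a matching `Queue` down to just its `Name`; the client has to do that itself.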

Page 42: Grid Discovery and Monitoring Systems

MDS4 Trigger Service

• A second monitoring service in MDS4
• The Index is geared more towards queries intended for resource location and selection.
• The Trigger service is intended to alert people to problems.
  • Can be configured to take action (e.g., send mail to an administrator) when issues arise.

Page 43: Grid Discovery and Monitoring Systems

MDS4 Trigger Service

• Maintains information in a service group, like the Index Service
  • SGE config information also includes an XPath query and an action
  • The action is the name of a program to run.
• Periodically, the trigger service looks at each SGE in its service group:
  • It evaluates the SGE’s XPath query against the SGE’s data.
  • If the query returns true, it runs the program specified by the action.
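The periodic check can be sketched as below. Several things here are assumptions made to keep the example self-contained: entries are plain dictionaries, "the query returns true" is approximated as "the query matches at least one element", and the action is a Python callable rather than the name of an external program.

```python
import xml.etree.ElementTree as ET


def run_triggers(entries):
    """One periodic pass: fire each entry's action if its query matches its data."""
    fired = []
    for entry in entries:
        data = ET.fromstring(entry["data"])
        # Any match counts as "true" in this sketch.
        if data.findall(entry["query"]):
            # The real service runs a named program; a callable stands in for it here.
            entry["action"](entry["name"])
            fired.append(entry["name"])
    return fired
```

An entry whose data is `<Data><Host><Status>down</Status></Host></Data>` and whose query is `.//Host[Status='down']` would fire on every pass until the data changes.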

Page 44: Grid Discovery and Monitoring Systems

MDS4 WebMDS

• Provides a simple HTTP interface to query an MDS Index Service
  • Really, to query resource properties of any WS-Resource
• Optionally applies XSLT transforms to the query results.
• Designed as a user interface, to be used with a web browser
  • But some people are using it to provide a REST-like interface to MDS4.

Page 45: Grid Discovery and Monitoring Systems

INCA

• Monitoring system developed at SDSC
• Users define tests for Inca to run.
• Inca runs them and stores the results in a database.
• Users can view the results on a web page.
• Can be configured to send mail if tests fail, etc.
• Can run tests using the user’s credentials

Page 46: Grid Discovery and Monitoring Systems

From the Inca 2.1 User’s Guide, http://inca.sdsc.edu/releases/2.1/guide/userguide.html

Page 47: Grid Discovery and Monitoring Systems

Inca Query Interface

• Uses an SQL database internally
• End-users can query using a web page or receive notifications via email.
• A web-services interface is also available
  • Uses a custom query language
• Overall a nice monitoring/testing framework
• Not designed as a discovery service

Page 48: Grid Discovery and Monitoring Systems

GMA (Grid Monitoring Architecture)

• Proposed architecture with three components:
  • Producers produce information
  • Consumers consume information
  • Directories keep track of what information is available
    • That is, which producers can be queried, not the actual data


Diagram from “A Grid Monitoring Architecture”, B. Tierney et al., http://www-didc.lbl.gov/GGF-PERF/GMA-WG/papers/GWD-GP-16-2.pdf

Page 49: Grid Discovery and Monitoring Systems

R-GMA

• Relational Grid Monitoring Architecture
• Implements the GMA model
  • Except that users never interact with the directory service (called a “registry” in R-GMA)
  • A consumer service does that instead, and users query the consumer service.
• Uses SQL as its query language.

Page 50: Grid Discovery and Monitoring Systems

An R-GMA Query

Diagram from “R-GMA: Architectural Design” at http://www.r-gma.org/arch-consumers.html

• Client sends SQL query to Consumer Service
• Consumer Service contacts registry for list of producers to contact
• Consumer Service queries producers and buffers results
• Client retrieves results from Consumer Service
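The four steps can be walked through with a toy model. None of this is R-GMA code: the class names are invented, each producer keeps its rows in an in-memory SQLite table, and the client names the table explicitly so the sketch does not need to parse SQL.

```python
import sqlite3


class Producer:
    """Toy producer that holds its rows in an in-memory SQLite table."""

    def __init__(self, table, rows):
        self.table = table
        self.db = sqlite3.connect(":memory:")
        self.db.execute(f"CREATE TABLE {table} (host TEXT, cpu_load REAL)")
        self.db.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)

    def query(self, sql):
        return self.db.execute(sql).fetchall()


class Registry:
    """Knows which producers publish which table, but holds no data itself."""

    def __init__(self):
        self._producers = {}

    def register(self, producer):
        self._producers.setdefault(producer.table, []).append(producer)

    def producers_for(self, table):
        return list(self._producers.get(table, []))


class ConsumerService:
    """Sits between the client and the registry/producers."""

    def __init__(self, registry):
        self.registry = registry
        self._buffer = []

    def submit(self, table, sql):
        """Steps 1-3: accept the query, look up producers, query each, buffer rows."""
        for producer in self.registry.producers_for(table):
            self._buffer.extend(producer.query(sql))

    def retrieve(self):
        """Step 4: hand the buffered rows to the client and clear the buffer."""
        rows, self._buffer = self._buffer, []
        return rows
```

The client only ever calls `submit` and `retrieve` on the consumer service, matching the point that users never talk to the registry directly.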

Page 51: Grid Discovery and Monitoring Systems

For More Information

• Globus: http://www.globus.org
• Inca: http://inca.sdsc.edu
• R-GMA: http://www.r-gma.org
• XML / XPath / XSLT: http://www.w3c.org