Grid Services
Presented by Karan Bhatia
2
Hype Curve
3
Overview
• Grid Computing Background
– Definition
– Opportunities
– Markets
• Technical Challenges
– Security Infrastructure
– Resource Management
– Service Interoperability
• Summary
4
Grid Computing is …
• “Co-ordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations.” [Foster, Kesselman, Tuecke]
– Co-ordinated - multiple resources working in concert, e.g. disk & CPU, or instruments & database, etc.
– Resources - compute cycles, databases, files, application services, instruments.
– Problem solving - focus on solving scientific problems
– Dynamic - environments that are changing in unpredictable ways
– Virtual Organization - resources spanning multiple organizations and administrative domains, security domains, and technical domains
5
Grid Computing is … (Industry)
• “about finding distributed, underutilized compute resources (systems, desktops, storage) and provisioning those resources to users or applications requiring them.” [The Grid Report, Clabby Analytics]
– Distributed - all the resources lying around in departments or server rooms.
– Underutilized - typical utilization of “big iron” is 5 to 10%. Organizations save money by increasing utilization versus purchasing new resources.
– Resources - servers and server cycles, applications, data resources
– Provisioning - predict and schedule resource use depending on load.
6
Types of Grids…
• Compute Grids
– SETI@home, Entropia, United Devices, Condor
• Data Grids
– Storage Resource Broker (SRB), Avaki, BIRN, GEON
• Collaboration Grids
– Instrumentation (telescience), applications
• Enterprise Grids
– Majority of commercial interest
• Partner Grids
– B2B, Academic/Govt Grids
• Service Grids
– “Utility” Computing, “On Demand”, pervasive, autonomic, etc.
7
A Grid is …
• “the next generation Internet,”
• “all about free cycles à la SETI@home,”
• “a distributed object system,”
• “a new programming model,”
• “a replacement for high performance computing,”
8
Example… TeleScience Grid
[Figure: the TeleScience Grid links imaging instruments, computational resources, large-scale databases, data acquisition and analysis, and advanced visualization]
9
Grid Resources - Networks
10
Grid Resources - Compute
11
Top 500.org
12
13
Another Grid Example … Google
• Queries
– 150 M queries/day (2000/s)
– 100 countries
– 3.3 B documents
• Hardware
– 15,000 Linux systems in 6 data centers
– 15 Tflop/s and 1000 TB total capacity
– 40-80 1U/2U servers/cabinet
– 100 Mb/s Ethernet switches per cabinet with gigabit uplinks
– Growth from 4000 systems (18 M queries/day)
14
Grid Resources - Data
• SDSC Resources
– HPSS:
   • SDSC's central long-term data storage system
   • one of the world's largest IBM High Performance Storage System (HPSS) units
   • currently holds more than a petabyte (a million gigabytes) of data in approximately 21 million files
   • has the capacity to store six petabytes of data; files are added at an average rate of 10,000 gigabytes per month
– Storage-Area Network (SAN):
   • a 72-processor Sun Microsystems SunFire 15K high-end server and 11 Brocade switches (1,400 ports)
   • 225,000 gigabytes of networked disk storage for data-oriented applications
• 1 TB of data = $2500
15
Protein Data Bank (PDB)
16
Putting it all together… TeraGrid
17
Grid Market
18
Grid Companies
• IBM
– “on demand” solutions
• Sun Microsystems
– N1 initiative
• Oracle
– 10g
• Dell
• HP
– “utility” computing
• Platform Computing
– LSF, metaclustering
• United Devices
– Desktop grids
• DataSynapse
• Akamai
• Google?
• Sony Online Entertainment?
• Where’s Microsoft?
19
Grid Organizations
• Global Grid Forum (GGF)
• Organization for the Advancement of Structured Information Standards (OASIS)
• Distributed Management Task Force (DMTF)
• World Wide Web Consortium (W3C)
• Globus Alliance
• NSF Middleware Initiative (NMI)
• NASA IPG
• DOE Science Grid
• EU DataGrid
• NSF TeraGrid
20
Technical Challenges for Grid Computing
21
Challenges: Security
• Grids traverse organizational boundaries
– Different administration domains have different authentication mechanisms
– Resources have different use agreements and sharing priorities
• Single sign-on
– Multiple passwords difficult to manage
• Rights delegation
• Trust
– Authentication of users
– Authorization of users
– Resource access
22
Security
• Public Key Infrastructure
– Public key A.public
– Private key A.private
• Supports Encryption
– Message to B:
   • m’ = F(m, B.public), send m’ to B
   • B receives m’ and recovers m = F’(m’, B.private)
• Digital Signatures
– Signed message to B:
   • m’ = (m, F(m, A.private))
– Receiver uses A.public to verify that m’ is from A and has not been tampered with
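The two uses of a key pair can be sketched with textbook RSA, where F and F’ are both modular exponentiation. This is a toy illustration only: the primes are tiny, there is no padding, and the function names (`encrypt`, `decrypt`, `sign`, `verify`) are invented for this example rather than taken from any Grid toolkit.

```python
import hashlib

# Toy RSA key pairs as (exponent, modulus). Real keys are far larger.
A_PUBLIC, A_PRIVATE = (17, 3233), (2753, 3233)   # sender A: primes 61 and 53
B_PUBLIC, B_PRIVATE = (17, 2773), (157, 2773)    # receiver B: primes 47 and 59


def encrypt(m, public_key):
    """Encrypt integer m (smaller than the modulus) with the receiver's public key."""
    e, n = public_key
    return pow(m, e, n)


def decrypt(c, private_key):
    """Recover the message with the receiver's private key."""
    d, n = private_key
    return pow(c, d, n)


def _digest(message, n):
    # Hash the message and reduce it so it fits under the toy modulus.
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def sign(message, private_key):
    """Sign a byte string with the sender's private key."""
    d, n = private_key
    return pow(_digest(message, n), d, n)


def verify(message, signature, public_key):
    """Check a signature with the sender's public key."""
    e, n = public_key
    return pow(signature, e, n) == _digest(message, n)


if __name__ == "__main__":
    secret = 65
    ciphertext = encrypt(secret, B_PUBLIC)            # only B can read this
    print(decrypt(ciphertext, B_PRIVATE) == secret)

    request = b"run job 14"
    signature = sign(request, A_PRIVATE)              # only A can produce this
    print(verify(request, signature, A_PUBLIC))
```

Encryption uses the receiver's public key so that only the receiver can read the message; signing uses the sender's private key so that anyone holding the sender's public key can check where the message came from.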
23
Grid Security Infrastructure (GSI)
• A central concept in GSI authentication is the certificate.
• Every user and service on the Grid is identified via a certificate, a text file containing the following information:
– a subject name identifying the person or object that the certificate represents,
– the public key belonging to the subject,
– the identity of a Certificate Authority (CA) that has signed the certificate to certify that the public key and the identity both belong to the subject,
– the digital signature of the named CA.
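As a rough sketch, the four certificate fields and the CA's check can be modelled as below. The `Certificate` class, the `issue` and `is_valid` helpers, the subject string, and the toy RSA numbers are all assumptions made for illustration; real GSI certificates are X.509 files handled by a cryptographic library.

```python
import hashlib
from dataclasses import dataclass

# Toy CA key pair as (exponent, modulus); far too small for real use.
CA_PUBLIC, CA_PRIVATE = (17, 3233), (2753, 3233)


@dataclass(frozen=True)
class Certificate:
    subject: str        # person or object the certificate represents
    public_key: tuple   # public key belonging to the subject
    issuer: str         # identity of the CA that signed the certificate
    signature: int      # CA's digital signature over subject and key


def _digest(subject, public_key, modulus):
    # Hash the fields the CA vouches for, reduced to fit the toy modulus.
    data = f"{subject}|{public_key}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % modulus


def issue(subject, public_key, ca_name, ca_private):
    """CA binds a subject name to a public key by signing both."""
    d, n = ca_private
    signature = pow(_digest(subject, public_key, n), d, n)
    return Certificate(subject, public_key, ca_name, signature)


def is_valid(cert, ca_public):
    """Anyone holding the CA's public key can check the binding."""
    e, n = ca_public
    return pow(cert.signature, e, n) == _digest(cert.subject, cert.public_key, n)


if __name__ == "__main__":
    cert = issue("/O=Grid/CN=Alice", (17, 2773), "Example CA", CA_PRIVATE)
    print(cert.issuer, is_valid(cert, CA_PUBLIC))
```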
24
Proxy Certificate
• A proxy consists of a new certificate with a new public and private key.
• The new certificate contains the owner's identity modified slightly to indicate that it is a proxy.
• The new certificate is signed by the owner rather than a CA.
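– The owner, not a CA, is the issuer, so trust chains from the CA through the owner's certificate to the proxy. (Strictly, this is not a self-signed certificate: the proxy is signed with the owner's key, not its own.)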
• The certificate also includes a time notation after which the proxy should no longer be accepted by others.
• Proxies have limited lifetimes in order to minimize the security vulnerability.
• Because the proxy isn't valid for very long, it doesn't have to be kept quite as secure as the owner's private key.
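The lifetime rule can be sketched as follows. The `ProxyCertificate` class, the `/CN=proxy` suffix, and the 12-hour default are assumptions chosen for illustration, and the signature check is left out to keep the focus on identity and expiry.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ProxyCertificate:
    subject: str          # owner's identity, modified to mark it as a proxy
    issuer: str           # the owner, rather than a CA
    not_after: datetime   # time after which the proxy must be rejected


def make_proxy(owner_subject, now, lifetime=timedelta(hours=12)):
    """Create a short-lived proxy for the owner (lifetime is an assumed default)."""
    return ProxyCertificate(
        subject=f"{owner_subject}/CN=proxy",
        issuer=owner_subject,
        not_after=now + lifetime,
    )


def is_acceptable(proxy, now):
    """Accept only proxies derived from their issuer that have not expired."""
    derived_from_owner = proxy.subject.startswith(proxy.issuer + "/")
    return derived_from_owner and now <= proxy.not_after


if __name__ == "__main__":
    start = datetime(2004, 1, 1, 9, 0, tzinfo=timezone.utc)
    proxy = make_proxy("/O=Grid/CN=Alice", start)
    print(is_acceptable(proxy, start + timedelta(hours=1)))
    print(is_acceptable(proxy, start + timedelta(hours=13)))
```

A stolen proxy is only useful until `not_after`, which is why it can be stored with less protection than the owner's long-lived private key.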
25
Mutual Authentication
[Figure: mutual authentication exchange]
26
Additional Challenges
• Certificate Management
– MyProxy
• Role-based Access Control
– CAS, VOMS
• Authorization services
• Integration with applications & Portals
27
Challenges: Resource Management
• Resources loosely-coupled
– Higher network latencies
– Planned and unplanned disruptions
• How to provide QoS guarantees?
• Case Study: Entropia Desktop Grids
– Additional trust/security issues
29
Entropia 1: GIMPS
• Over 1.5 billion CPU hours served
• 300,000+ machines, over 4 years operational
• Every PC and hardware config imaginable (proc, memory, disk, etc.)
• Every networking hookup imaginable
• Found 35th, 36th, 37th, 38th, and 39th Mersenne Primes
30
Entropia 2: FightAids@home
• Sept 2000 launch
• Internet-based
• 54,657 total machines
• 10,770,506 total hours of computation
• 27,881 peak billions of calculations/sec
31
Entropia 3: DCGrid
• Enterprise focus
– Tremendous resources available in enterprise
– Complements other HPC resources
• Computing Platform
– Arbitrary application (open scheduling model)
– Security, unobtrusiveness, manageability guaranteed
• Focus on
– Pharmaceuticals, Chemicals, and Materials
– Financial Services
32
DCGrid Architecture
35
Server vs. Desktop Grids
• Server environment
– Fixed IP, always connected
– Always-on operation
– Moderate number of systems (10’s – 100’s)
– Dedicated use, trusted systems
• Desktop environment
– Dynamic, temporary IP, intermittent connection
– Off evenings, off weekends, off lunch
– Large numbers of systems (100’s – 1000’s - ?)
– Shared resources, potentially untrusted users
• These differences give rise to desktop Grid challenges
36
Typical PC-Grid Environment
[Figure: time series over hours 552 to 720; vertical axis from 0 to 700]
37
PC-Grid Challenges
• Provide a stable compute environment for apps
– Isolate app from variable desktop environment
• Operate in environment of dynamic use
– Unobtrusiveness and Fault Tolerance are key!
• Provide simple application integration
– Support ANY Application without modification
• Provide centralized management console
– Zero additional management costs
38
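[Figure: Entropia architecture in three layers: Job Management (Job Manager), Resource Scheduling (Subjob Scheduler), and Physical Node Management (Node Manager). Numbered workflow steps connect the end-user to the Entropia clients; arrows are labelled computation, resource, and resource description.]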
39
Stable Compute Environment
• Entropia Proprietary Sandbox
– Binary-level protection
– System virtualization (registry, file system, network)
• Open Scheduling Infrastructure
– Intelligent scheduling (match resources to subjob requirements)
– Manage subjob redundancy/fault tolerance
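Matching resources to subjob requirements, with redundant copies for fault tolerance, might look like the sketch below. It is not Entropia's scheduler: the `Client` and `Subjob` fields, the first-fit policy, and the default redundancy of 2 are assumptions picked to keep the example small.

```python
from dataclasses import dataclass


@dataclass
class Client:
    name: str
    memory_mb: int
    disk_mb: int


@dataclass
class Subjob:
    name: str
    min_memory_mb: int
    min_disk_mb: int


def schedule(subjobs, clients, redundancy=2):
    """Give each subjob up to `redundancy` distinct clients that meet its needs."""
    free = list(clients)
    plan = {}
    for job in subjobs:
        eligible = [
            c for c in free
            if c.memory_mb >= job.min_memory_mb and c.disk_mb >= job.min_disk_mb
        ]
        chosen = eligible[:redundancy]   # first fit; extra copies tolerate failures
        for client in chosen:
            free.remove(client)          # a client runs one subjob at a time
        plan[job.name] = [c.name for c in chosen]
    return plan


if __name__ == "__main__":
    clients = [
        Client("pc1", 256, 1000),
        Client("pc2", 512, 2000),
        Client("pc3", 1024, 4000),
        Client("pc4", 128, 500),
    ]
    subjobs = [Subjob("dock-1", 512, 1000), Subjob("dock-2", 128, 100)]
    print(schedule(subjobs, clients))
```

Running the same subjob on two desktops means the result survives if one owner reclaims or switches off a machine mid-run, at the cost of duplicated cycles.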
40
Manage Dynamic Use
• PC primary use must be respected!
• Entropia Proprietary Sandbox
– Guaranteed to run at idle priority
– Limit application capability
– Monitor page faults, network access
• Management
– Provide time-of-use windows
– Different levels of unobtrusiveness
• Gathers 95+ % of cycles
41
Application Integration
• Support any Win32 binary
– Language neutral (C, C++, Fortran, Java, C#, etc.)
– Compiler/library neutral
[Figure: Application Preparation Tools and Run Applications (qsub, qstat, …) around the Open Grid Platform, with App A, App B, App C and Client 1, Client 2, …]
42
Manageability
43
Application Performance
[Figure: four plots of throughput against number of clients, with vertical axes in sequences per hour, packets per hour, and compounds per hour, for the applications GOLD, AUTODOCK, HMMER, and DOCK. One panel compares Entropia with a 1-CPU SGI and a 1-CPU Sun alongside a linear fit to the Entropia results.]
44
Scheduling Performance
[Figure: Job 14 Nodes (94 clients); client ID (0 to 100) plotted against time (0 to 21,600 secs)]
45
Challenges: Service Interoperability
• Trying to force homogeneity on users is futile. Everyone has their own preferences, sometimes even dogma.
• The Internet provides the model…
46
Typical Application
[Figure: layered diagram of a typical Grid application]
• Users work with client applications
• Application services organize VOs & enable access to other services
• Collective services aggregate &/or virtualize resources
• Resources implement standard access & management interfaces
• Components shown: Web Browser, Data Viewer Tool, Chat Tool, Simulation Tool, Telepresence Monitor, Web Portal, Registration Service, Credential Repository, Certificate Authority, Data Catalog, Compute Server (×2), Database service (×3), Camera (×2)
47
Typical Application
• Implementations are provided by a mix of
– Application-specific code
– “Off the shelf” tools and services
– Tools and services from the Globus Toolkit
– Tools and services from the Grid community (compatible with GT)
• Glued together by…
– Application development
– System integration
48
How it Really Happens (without the Grid)
[Figure: the same layered diagram (client applications, application services, collective services, resources) with callouts A through E. Components shown: Web Browser, Data Viewer Tool, Chat Tool, Simulation Tool, Telepresence Monitor, Web Portal, Registration Service, Credential Repository, Certificate Authority, Data Catalog, Compute Server (×2), Database service (×3), Camera (×2)]
• Component sources: 0 Grid Community, 0 Globus Toolkit, 13 Off the Shelf, 9 Application Developer
49
How it Really Happens (with the Grid)
[Figure: the same layered diagram (client applications, application services, collective services, resources) rebuilt with Grid components. Components shown: Web Browser, Data Viewer Tool, Simulation Tool, Telepresence Monitor, Portal, portlet, MyProxy, Certificate Authority, Globus MCS/RLS, Globus Index Service, Globus GRAM (×2), Globus DAI (×3), Compute Server (×2), Database service (×3), Camera (×2)]
• Component sources: 4 Grid Community, 4 Globus Toolkit, 9 Off the Shelf, 2 Application Developer
50
Theory -> Practice
51
What You Get in the Globus Toolkit
• OGSI (3.x)/WSRF (4.x) Core Implementation
– Used to develop and run OGSA-compliant Grid Services (Java, C/C++)
• Basic Grid Services
– Popular among current Grid users, common interfaces to the most typical services; includes both OGSA and non-OGSA implementations
• Developer APIs
– C/C++ libraries and Java classes for building Grid-aware applications and tools
• Tools and Examples
– Useful tools and examples based on the developer APIs
52
Components in Globus Toolkit 3.0
• Security: GSI, WS-Security
• Data Management: RFT (OGSI), RLS, WU GridFTP
• WS Core: Java WS Core (OGSI), OGSI C Bindings
• Information Services: MDS2, WS-Index (OGSI)
• Resource Management: Pre-WS GRAM, WS GRAM (OGSI)
53
Components in Globus Toolkit 3.2
• Security: GSI, WS-Security, CAS (OGSI), SimpleCA
• Data Management: RFT (OGSI), RLS, OGSI-DAI, WU GridFTP, XIO
• WS Core: Java WS Core (OGSI), OGSI C Bindings, OGSI Python Bindings (contributed), pyGlobus (contributed)
• Information Services: MDS2, WS-Index (OGSI)
• Resource Management: Pre-WS GRAM, WS GRAM (OGSI)
54
Planned Components in GT 4.0
• Security: GSI, WS-Security, CAS (WSRF), SimpleCA, Authz Framework
• Data Management: RFT (WSRF), RLS, OGSI-DAI, New GridFTP, XIO
• WS Core: Java WS Core (WSRF), C WS Core (WSRF), pyGlobus (contributed)
• Information Services: MDS2, WS-Index (WSRF)
• Resource Management: Pre-WS GRAM, WS-GRAM (WSRF), CSF (contribution)
55
Grid and Web Services Convergence
The definition of WSRF means that the Grid and Web services communities can move forward on a common base.
Grid Services Example
• (from the Sotomayor tutorial)
• MathService API:
– add(int x)
– subtract(int x)
– getValue()
Note 1: How is this different from
– Web Services?
– CORBA?
– COM/DCOM?
Note 2: This is too simple! What about
– co-ordination/workflows
– personalization
– presentation
– security
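The three operations describe a stateful service: each instance keeps a running value that add and subtract modify. A plain Python stand-in is sketched below; it mirrors only the API shape, not the tutorial's actual Grid service implementation, and the starting value of 0 is an assumption.

```python
class MathService:
    """Stateful service instance holding one running integer value."""

    def __init__(self):
        self._value = 0   # assumed initial value

    def add(self, x: int) -> None:
        """Add x to the stored value."""
        self._value += x

    def subtract(self, x: int) -> None:
        """Subtract x from the stored value."""
        self._value -= x

    def get_value(self) -> int:
        """Return the stored value."""
        return self._value


if __name__ == "__main__":
    service = MathService()
    service.add(5)
    service.subtract(2)
    print(service.get_value())   # prints 3
```

Because the value lives in the instance, two clients need two instances to avoid sharing state, which is the motivation for factories and lifetime management in a Grid service.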
OGSI (or what is a grid service?)
• Using web service infrastructure
– MathService is defined by WSDL (like IDL)
<?xml version="1.0" encoding="UTF-8"?>
...
<types>
  <xsd:schema targetNamespace="http://www.gt3tutorial.org/namespaces/0.2/core/gwsdl/Math"
      attributeFormDefault="qualified"
      elementFormDefault="qualified"
      xmlns="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="add">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="value" type="xsd:int"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
    <xsd:element name="addResponse">
      <xsd:complexType/>
    </xsd:element>
    ...
</types>
<message name="AddInputMessage">
  <part name="parameters" element="tns:add"/>
</message>
<message name="AddOutputMessage">
  <part name="parameters" element="tns:addResponse"/>
</message>
...
<gwsdl:portType name="MathPortType" extends="ogsi:GridService">
  <operation name="add">
    <input message="tns:AddInputMessage"/>
    <output message="tns:AddOutputMessage"/>
    <fault name="Fault" message="ogsi:FaultMessage"/>
  </operation>
  <operation name="subtract">
    <input message="tns:SubtractInputMessage"/>
    <output message="tns:SubtractOutputMessage"/>
    <fault name="Fault" message="ogsi:FaultMessage"/>
  </operation>
  <operation name="getValue">
    <input message="tns:GetValueInputMessage"/>
    <output message="tns:GetValueOutputMessage"/>
    <fault name="Fault" message="ogsi:FaultMessage"/>
  </operation>
</gwsdl:portType>
</definitions>
Basic Concepts
The GridService PortType
• a “grid service” is a web service that implements the GridService PortType
<portType name="GridService">
  <operation name="setServiceData"> [snip] </operation>
  <operation name="destroy"> [snip] </operation>
  <operation name="requestTerminationAfter"> [snip] </operation>
  <operation name="requestTerminationBefore"> [snip] </operation>
  <operation name="findServiceData"> [snip] </operation>
</portType>
<gwsdl:portType name="GridService">
  <sd:serviceData maxOccurs="unbounded" minOccurs="1" modifiable="false" mutability="constant" name="interface" nillable="false" type="xsd:QName"/>
  <sd:serviceData maxOccurs="unbounded" minOccurs="0" modifiable="false" mutability="mutable" name="serviceDataName" nillable="false" type="xsd:QName"/>
  <sd:serviceData maxOccurs="1" minOccurs="1" modifiable="false" mutability="mutable" name="factoryLocator" nillable="true" type="ogsi:LocatorType"/>
  <sd:serviceData maxOccurs="unbounded" minOccurs="0" modifiable="false" mutability="extendable" name="gridServiceHandle" nillable="false" type="ogsi:HandleType"/>
  <sd:serviceData maxOccurs="unbounded" minOccurs="1" modifiable="false" mutability="mutable" name="gridServiceReference" nillable="false" type="ogsi:ReferenceType"/>
  <sd:serviceData maxOccurs="unbounded" minOccurs="1" modifiable="false" mutability="static" name="findServiceDataExtensibility" nillable="false" type="ogsi:OperationExtensibilityType"/>
  <sd:serviceData maxOccurs="unbounded" minOccurs="1" modifiable="false" mutability="static" name="setServiceDataExtensibility" nillable="false" type="ogsi:OperationExtensibilityType"/>
  <sd:serviceData maxOccurs="1" minOccurs="1" modifiable="false" mutability="mutable" name="terminationTime" nillable="false" type="ogsi:TerminationTimeType"/>
  <sd:staticServiceDataValues>
    <ogsi:findServiceDataExtensibility inputElement="ogsi:queryByServiceDataNames"/>
    <ogsi:setServiceDataExtensibility inputElement="ogsi:setByServiceDataNames"/>
    <ogsi:setServiceDataExtensibility inputElement="ogsi:deleteByServiceDataNames"/>
  </sd:staticServiceDataValues>
</gwsdl:portType>
GridService PortType
• FindServiceData()
• QueryByServiceDataNames()
• GetServiceData()
• SetByServiceDataNames()
• DeleteByServiceDataNames()
• RequestTerminationAfter()
• RequestTerminationBefore()
• Destroy()
Capabilities of a Grid Service
• 2-level naming (GSH vs. GSR)
• Factories
• Lifetime management
• Service Data Elements
• Event Notification
• ServiceGroups
GSH versus GSR
• A GSH (Grid Service Handle) is a unique name for a Grid Service Instance
• A GSR (Grid Service Reference) is a perhaps temporary mechanism to access the Grid Service Instance
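The split between a stable name and a changeable access mechanism can be sketched as a lookup table. The `HandleResolver` class and both URLs are hypothetical; they illustrate the idea only and do not reproduce the OGSI handle resolution interface.

```python
class HandleResolver:
    """Maps permanent handles (GSH) to whatever reference (GSR) is current."""

    def __init__(self):
        self._table = {}

    def bind(self, gsh: str, gsr: str) -> None:
        """Record or replace the current reference for a handle."""
        self._table[gsh] = gsr

    def resolve(self, gsh: str) -> str:
        """Return the reference a client should use right now."""
        return self._table[gsh]


if __name__ == "__main__":
    resolver = HandleResolver()
    handle = "http://example.org/handles/math/42"            # never changes

    resolver.bind(handle, "http://node-a.example.org:8080/math/42")
    print(resolver.resolve(handle))

    # The instance moves to another host: same handle, new reference.
    resolver.bind(handle, "http://node-b.example.org:8080/math/42")
    print(resolver.resolve(handle))
```

Clients store the handle and re-resolve it when a reference stops working, so a service can move or change its binding without breaking them.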
Factories
• Create new instances of services dynamically
• Individualized Instances
• lifetime management techniques
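A factory combined with soft-state lifetime management might be sketched as below. Times are plain numbers of seconds, and the class names, the handle format, and the rule that a keep-alive can only push the termination time later are simplifying assumptions rather than the OGSI-defined behaviour.

```python
import itertools


class Instance:
    """One dynamically created service instance with an expiry time."""

    def __init__(self, handle, termination_time):
        self.handle = handle
        self.termination_time = termination_time


class Factory:
    """Creates individual instances and reclaims the ones nobody keeps alive."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.instances = {}

    def create_service(self, now, lifetime):
        """Create a new instance that expires `lifetime` seconds from `now`."""
        handle = f"instance-{next(self._ids)}"
        self.instances[handle] = Instance(handle, now + lifetime)
        return handle

    def request_termination_after(self, handle, new_time):
        """Keep-alive: move the termination time later, never earlier."""
        instance = self.instances[handle]
        instance.termination_time = max(instance.termination_time, new_time)

    def sweep(self, now):
        """Destroy every instance whose termination time has passed."""
        expired = [h for h, i in self.instances.items() if i.termination_time <= now]
        for handle in expired:
            del self.instances[handle]
        return expired


if __name__ == "__main__":
    factory = Factory()
    kept = factory.create_service(now=0, lifetime=60)
    abandoned = factory.create_service(now=0, lifetime=60)
    factory.request_termination_after(kept, 300)   # client still interested
    print(factory.sweep(now=100))                  # only the abandoned instance expires
```

With this approach a crashed client cannot leak instances: anything that is not renewed is eventually swept.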
Service Data Elements
• Generalized State
– useful for describing capability
– Get/Set model similar to javaBeans Properties
• Can specify initial values in WSDL
• Integrated with Notification mechanism
Service Data Elements: GridService
• Interface
• ServiceDataName
• FactoryLocator
• GridServiceHandle
• GridServiceReference
• TerminationTime
Notifications
• Source
– implements NotificationSourcePortType
– sends a notification message (XML Element) to Sinks
• Sink
– implements NotificationSinkPortType
– sends a notification subscription request to source
– causes a GridService Instance of portType NotificationSubscription to be created
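The source/sink exchange can be sketched in plain Python. The classes below are stand-ins for the three port types, with method names chosen for this example; messages are Python tuples instead of XML elements, and subscriptions never expire.

```python
class Subscription:
    """Stand-in for the NotificationSubscription instance the source creates."""

    def __init__(self, sink, name):
        self.sink = sink
        self.name = name   # service data name the sink cares about


class Source:
    """Holds service data and notifies subscribed sinks when it changes."""

    def __init__(self):
        self.service_data = {}
        self.subscriptions = []

    def subscribe(self, sink, name):
        """Handle a subscription request by creating a subscription object."""
        subscription = Subscription(sink, name)
        self.subscriptions.append(subscription)
        return subscription

    def set_service_data(self, name, value):
        """Update a value and push a notification to interested sinks."""
        self.service_data[name] = value
        for subscription in self.subscriptions:
            if subscription.name == name:
                subscription.sink.deliver_notification(name, value)


class Sink:
    """Receives notification messages from sources."""

    def __init__(self):
        self.received = []

    def deliver_notification(self, name, value):
        self.received.append((name, value))


if __name__ == "__main__":
    source, sink = Source(), Sink()
    source.subscribe(sink, "value")
    source.set_service_data("value", 5)   # delivered to the sink
    source.set_service_data("other", 1)   # no subscriber, so nothing is sent
    print(sink.received)
```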
ServiceGroups
• A grid service that maintains information about other grid services
• Can be used to implement a classic registry model
• Can be used for dataset replication
• A grid service can belong to more than one ServiceGroup
• Membership in a ServiceGroup can be homogeneous or heterogeneous
• ServiceGroup portTypes are optional
Grid Services: Summary
• Extends Web Services to support Transient Services
– WSDL 1.2 expected to include extensions
• Requires support for factories, lifetime management, soft-state management, and notifications
• Java implementation pretty solid
– Security implementation still shaky
69
Other Challenges
• Developing user interfaces
• Data Management
• Scheduling/co-scheduling of resources
• Failure management
• Application development
• Performance
• Many others…
70
What I hope you got from this talk
• Grid Computing is about
– Co-ordinated use of different resources
– Provisioning resources for increased utilization
– Scaling to large numbers of resources, services and users
• Many systems being built
• Many Applications being developed