Grid Computing - Xiao Nong
Chapter 5: Grid Data Management Technology
Contents
• Overview of data grid technology
• Data management in Globus
• OGSA-DAI
Section 1: Overview of Data Grid Technology
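What is a data grid?
• From the perspective of scientific research: it allows geographically distributed groups to perform complex, compute-intensive analysis and processing of petabytes (or terabytes) of scientific data
  Multiple data providers
  Optimal data movement across networks
  Seamless, secure data access
  Sound access control mechanisms and sophisticated usage patterns
  Guaranteed data access
• Analogous to the electric power grid
  • Multiple power generators
  • Complex transmission networks with switching
  • Simple usage interface: plug and play
  • Guaranteed supply: meeting of demands
  • Complex cost function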
From 1993 to 2004
• Enormous data volumes: petabytes
  For an increasing number of communities, the gating step is not collection but analysis
• Ubiquitous Internet: 100+ million hosts
  Collaboration and resource sharing are the norm
• Ultra-high-speed networks: 10+ Gb/s
  Global optical networks
• Enormous computing power: 100+ Top/s
  Moore's law gives us all supercomputers
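Result: the emergence of global knowledge communities
• Teams organized around common goals
  Communities: "virtual organizations"
• Diverse membership and capabilities
  Heterogeneity is a strength, not a weakness
• Distributed geographically and across administrative domains
  No single site or institution has all the capabilities and resources
• Adapting to such an environment is a function of the system
  Adjusting membership, reassigning responsibilities and resources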
The emergence of global knowledge communities
The emergence of global knowledge communities is often data-driven: for example, astronomy
No. & sizes of data sets as of mid-2002, grouped by wavelength
• 12 waveband coverage of large areas of the sky
• Total about 200 TB data
• Largest catalogs near 1B objects
Data and images courtesy Alex Szalay, Johns Hopkins
Data integration is a fundamental challenge
[Figure: data sources and services linked through registries (R), resource managers (RM), security services, and policy services]
• Discovery: many sources of data, services, computation
• Registries organize services of interest to a community
• Access: data integration activities may require access to, and exploration of, data at many locations
• Exploration and analysis may involve complex, multi-step workflows
• Resource management is needed to ensure progress and arbitrate competing demands
• Security and policy must underlie access and management decisions
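Why do we need data grids?
• Data is distributed
• Computation is performed remotely
• Distributed computing
• Large-scale data movement
• Cross-domain data sharing
• Large-scale data storage
• Access to multiple data collections
• Sharing of information resources among dynamic, distributed communities
  Establishing, negotiating, managing, and evolving multi-organization federations
  Coordinating, managing, provisioning, and monitoring workflows and the resources they require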
Data grid requirements - 1
• Seamless access
• Scalability in size and number
• Guaranteed quality of delivery
  Fault tolerance, load sharing, consistency maintenance
• Handling heterogeneity and diversity
  Platforms and systems, vendors, types of storage, types of services, types of processes and users
• Controllable data movement
  Demand-driven data placement
  Caching, archiving, versions and locks
  Third-party data movement
  Parallel data transfer
Data grid requirements - 2
• Data structure and representation
  The DGMS (data grid management system) must support access to all data types defined in its data structures and representations
  • Numeric data at the highest precision
  • Text data in any format, structure, language, and coding system
  • Multimedia data in any standard or user-defined binary format
Data grid requirements - 3
• Support for automated authentication
  Multiple authentication realms with single sign-on
  Uniform user name space
• Authorization and access control
  Seamless one-stop authorization
  Roles and tickets: inheritance and longevity
  Possible: the data owner can grant and revoke access permissions and delegate authority
  Flexible: combinations of restrictions and levels of granularity
  Effective: grant and revoke all types of privileges dynamically
  Easy: provide facilities or tools to owners
Data grid requirements - 4
• Virtual data organization
  Data location independence, uniform data name space, persistent identifiers, collection hierarchy
• Tight integration with metadata: data publication
  • Register and deregister data resources dynamically
  • Registration and deregistration can be propagated to sites holding replicas
  • Automation of metadata definition, publication, and description
Data grid requirements - 5
• Tight integration with metadata: data discovery
  • Complex querying and browsing
    System, user-defined, domain-specific, application
    Access control for metadata
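Data grid requirements - 6
• Data services
  Web accessibility (HTTP GET, WSDL, SOAP)
  Data access and APIs
  • Locating the physical location of data
  Building search rules and match criteria from parameters
  Building queries, distributed queries, or heterogeneous federated queries
  Extracting data from multiple different data sources and assembling it into a single logical data set
  Proxy operations (security/access considerations)
  Bulk operations (batch)
  Interactive and asynchronous operations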
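Data grid requirements - 7
• Data management operations
  Manage resources in a heterogeneous environment that spans multiple domains, guaranteeing round-the-clock (24x7x52) high availability
  The DGMS must ensure that data resource and data resource content catalogues/registries are always available and that the definitions in them are current, accurate, and consistent
  The DGMS must maintain the referential integrity of DGMS data resources
  Ensure the dynamic consistency of replicated data resource catalogues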
Data grid requirements - 8
• Virtual administration
  Single-point administration
  Autonomous local control
  Multiple levels of administration
  • Roles and responsibilities
  Policy management
  • Distributed caching, archiving, replication, and data placement
  • Locking, pinning, backup
  • Data movement
  • Preferences, priorities
  Administration
  • Auditing, quotas, pricing
Data grid requirements - 9
• Storage
  Hierarchical storage systems, tapes, disks, SAN, NAS, NFS, databases, FTP servers, HTTP servers, WSDL services, …
  Integration on device characteristics
  Storage bricks
  • Distributed cluster storage
• Network
  Characteristics
  NWS
  Guaranteed service
Architecture of data grid management
• Heterogeneity Transparency
• Location Transparency
• Name Transparency
• Distribution Transparency
• Replication Transparency
• Ownership & Costing Transparency
Open Grid Architecture
[Figure: architecture diagram. Boxes: Application; Data Handling Systems; Storage Resources; Remote Procedure Execution; Data Model Management; Storage System Description; Information Discovery; Dynamic Info Discovery]
Open Grid Architecture
[Figure: the same diagram (Application; Data Handling Systems; Storage Resources; Remote Procedure Execution; Data Model Management; Storage System Description; Information Discovery; Dynamic Info Discovery) annotated with example systems]
• DPSS, DFS, NFS, HPSS, ADSM, DMF, Unitree, NASstore, DB2, Oracle, Informix, Sybase, O2, ObjectStore, Objectivity
• Armada, D'agents, FEL, ADR, GRAM, SRB (e.g., filtering)
• LDAP, database, flat file, object database
• Condor, GASS, NILE, SRB, I-2 caching, ADR
• GloPerf, Netlogger, NWS
• DTD, ADR, object class
Open Grid Architecture
[Figure: the same diagram (Application; Data Handling Systems; Storage Resources; Remote Procedure Execution; Data Model Management; Storage System Description; Information Discovery; Dynamic Info Discovery) with two "glue" API layers added]
• API that provides "glue" to underlying storage, QoS, etc. [GASS, IBP, SRB]
• API that provides "glue" to underlying data handling systems (security, scheduling, QoS, access protocol, data format/model, adaptivity, info discovery, location control)
• + authentication, + authorization
• DPSS, DFS, NFS, HPSS, ADSM, DMF, Unitree, NASstore, DB2, Oracle, Informix, Sybase, O2, ObjectStore, Objectivity
• Armada, D'agents, FEL, ADR, GRAM, SRB (e.g., filtering)
• Condor, GASS, NILE, SRB, I-2 caching, ADR
• GloPerf, Netlogger, NWS
• DTD, ADR, object class
• LDAP, database, flat file, object database
The conceptual space of the data grid
[Figure: concept map. Labels: storage systems, local storage manager, data handling system, storage access protocol, storage properties, authentication protocol, remote proxies, caches, compute platforms, physical object, containers, replicas, data model, collection, sub-collection, logical object, management metadata, global namespace, descriptive metadata, replica metadata, ACLs, network metadata, networks, resource metadata, resources, data, service, protocols, metadata]
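2.2 Virtual data sets
Collection A {
  subCollection a {
    File 1
    File 2
  }
  subCollection b { }
  File 3
  File 4
  File 5
  …
}
[Figure: Collection A, with sub-collections a and b and files 1 to 5, drawn over heterogeneous physical sources: NTFS (C:\, D:\, …), NFS (M1://bin/, M2://proc/, …), Sybase (Table X, …), Ext3 (/bin, /usr, …); also labeled MDS]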
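The collection structure above can be sketched in a few lines of Python. This is only an illustration, assuming that a virtual data set is a tree of collections whose leaves map logical file names to physical locations; the `Collection` class, its methods, and every location string below are hypothetical and do not come from any real data grid API.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """A logical collection: named files plus nested sub-collections."""
    name: str
    files: dict = field(default_factory=dict)           # logical file name -> physical location
    subcollections: dict = field(default_factory=dict)  # sub-collection name -> Collection

    def add_file(self, logical_name, physical_location):
        self.files[logical_name] = physical_location

    def add_subcollection(self, name):
        child = Collection(name)
        self.subcollections[name] = child
        return child

    def resolve(self, path):
        """Map a logical path such as 'a/File 1' to its physical location."""
        head, _, rest = path.partition("/")
        if rest:
            return self.subcollections[head].resolve(rest)
        return self.files[head]


# Hypothetical physical locations, loosely modelled on the storage types in the figure.
coll = Collection("Collection A")
sub_a = coll.add_subcollection("a")
coll.add_subcollection("b")                      # an empty sub-collection
sub_a.add_file("File 1", "ntfs://hostA/C:/data/f1")
sub_a.add_file("File 2", "nfs://M1/bin/f2")
coll.add_file("File 3", "sybase://dbhost/TableX")
coll.add_file("File 4", "ext3://hostB/usr/f4")

print(coll.resolve("a/File 2"))
print(coll.resolve("File 3"))
```

The user only ever sees the logical path; which storage system holds the bytes is a detail of the mapping.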
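Structural view of the data grid
[Figure: replica selection workflow]
• The application passes attribute definitions to the metadata catalog and obtains a logical collection and logical file names
• The replica catalog maps these logical names to multiple locations (replica locations 1, 2, 3 on tape library, disk cache, and disk array)
• Replica selection chooses among the locations using performance information and predictions from MDS and NWS
• The selected replica is accessed via gsiftp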
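The replica selection step can be sketched as follows. This is a minimal illustration, assuming that the replica catalog is a plain mapping from a logical file name to candidate URLs and that a forecasting service such as NWS has already supplied a predicted bandwidth per host. The function names, host names, and numbers are all made up for the example.

```python
def host_of(url):
    """Extract the host part of a URL such as 'gsiftp://host/path'."""
    return url.split("://", 1)[1].split("/", 1)[0]


def select_replica(replica_catalog, predicted_mbps, logical_name):
    """Return the replica URL whose host has the highest predicted bandwidth."""
    candidates = replica_catalog[logical_name]
    return max(candidates, key=lambda url: predicted_mbps.get(host_of(url), 0.0))


# Hypothetical catalog contents and bandwidth predictions (MB/s).
replica_catalog = {
    "lfn:run42.dat": [
        "gsiftp://tape.site1.example/run42.dat",
        "gsiftp://cache.site2.example/run42.dat",
        "gsiftp://array.site3.example/run42.dat",
    ]
}
predicted_mbps = {
    "tape.site1.example": 5.0,
    "cache.site2.example": 80.0,
    "array.site3.example": 40.0,
}

best = select_replica(replica_catalog, predicted_mbps, "lfn:run42.dat")
print(best)
```

A production selector would also weigh factors such as storage latency (tape versus disk) and access rights; predicted bandwidth alone is used here to keep the idea visible.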
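File-based data grid systems: the grid file system
Grid file system: background
• Massive data in a grid may be stored in any format on any storage system
• Much of this large-volume data is still kept in files
• Massive data is distributed and spans multiple administrative domains
• To access distributed massive data files conveniently and uniformly, a standard mechanism for describing and organizing files is needed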
Grid file system services
• A grid file system federates and shares virtualized data from the file systems in a grid
  Virtual hierarchical namespace with access permissions and metadata
  Reliable POSIX-like I/O interfaces for the grid file system
[Figure: Grid File System services. A virtual directory tree rooted at /grid (directories ggf, jp, aist, gtrc; files file1 to file4) is mapped through replica services and data services onto physical files]
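Grid file system: research topics
• The Grid File System (GFS) working group is developing two specifications:
  Grid file system directory service
  Grid file system service architecture
• Grid file system directory service specification: describes and manages
  The file system data namespace
  Access control mechanisms
  Metadata definition and management
  Metadata information services
• Grid file system service architecture specification
  Extends VFDS and File Access Services
  Provides reliable POSIX-like I/O interfaces for the Grid File System
  • Virtual pathname, virtual filename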
Grid file system architecture
[Figure: layered architecture. Layers shown:]
• Applications (astronomy, physics, life science, business apps, …), browser, NFS/CIFS, …
• Grid File System Service (POSIX-like interface)
• Virtual Directory Service (management of virtualization): hierarchical logical name space + ACL + metadata
• Data Services (coordinated with other groups)
• Data sources
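Requirements for a grid file system
• Logical hierarchical namespace
• Uniform storage interface
• Replica management
• Data access and transfer
• Latency management
• Metadata management
• Security
• Optimization and performance improvement
• APIs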
Logical hierarchical namespace
• Logical name space
• Hierarchical
• POSIX operations on logical names
• Soft links
• File collections under a single logical name
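Soft links on logical names can be illustrated with a small sketch. It assumes that the namespace is a flat table from logical path to either a file entry or a link entry; the `LogicalNamespace` class, its method names, and the example paths and URLs are hypothetical and are not taken from the GFS specifications.

```python
class LogicalNamespace:
    """A toy logical namespace with file entries and soft links."""

    def __init__(self):
        # logical path -> ("file", physical_url) or ("link", target_logical_path)
        self.entries = {}

    def create(self, path, physical_url):
        self.entries[path] = ("file", physical_url)

    def symlink(self, target, link_path):
        self.entries[link_path] = ("link", target)

    def stat(self, path, max_hops=8):
        """Follow soft links until a file entry is reached."""
        for _ in range(max_hops):
            kind, value = self.entries[path]
            if kind == "file":
                return value
            path = value                      # follow the link
        raise OSError("too many levels of soft links: " + path)


ns = LogicalNamespace()
ns.create("/grid/jp/aist/file1", "gsiftp://storage.example/vol0/file1")
ns.symlink("/grid/jp/aist/file1", "/grid/ggf/latest")

print(ns.stat("/grid/ggf/latest"))
```

The hop limit mirrors how POSIX systems guard against link cycles; the value 8 is arbitrary.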
Uniform storage interface
• Access to
  File systems
  Database objects
• Interface to storage middleware
• A common interface mechanism for files and databases (a contested issue)
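Replica management
• Distributed/hierarchical replica catalog
• Replica creation and management
• Consistency management
• Load balancing among replicas
• Replication of partial file/object content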
Data access and transfer
• GridFTP support
• Other transfer mechanisms: user-selectable?
• Reliable transfer
• Parallel I/O
Latency management
• Streaming
• Disk caching
• Pre-fetching
• Remote I/O proxies
• Bulk operations
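Two of these techniques, disk caching and pre-fetching, can be combined in a short sketch. It assumes block-granular access and a least-recently-used eviction policy, neither of which is prescribed by the slide. `BlockCache` and `remote_read` are hypothetical names, and `remote_read` merely stands in for a slow wide-area read.

```python
from collections import OrderedDict


class BlockCache:
    """LRU cache of file blocks with simple sequential pre-fetching."""

    def __init__(self, fetch_block, capacity=4, prefetch=1):
        self.fetch_block = fetch_block    # callable: block index -> bytes (the slow remote read)
        self.capacity = capacity          # maximum number of cached blocks
        self.prefetch = prefetch          # how many following blocks to fetch ahead
        self.blocks = OrderedDict()       # block index -> data, oldest first
        self.remote_reads = 0

    def _load(self, index):
        if index in self.blocks:
            self.blocks.move_to_end(index)        # mark as most recently used
            return
        self.blocks[index] = self.fetch_block(index)
        self.remote_reads += 1
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)       # evict the least recently used block

    def read(self, index):
        self._load(index)
        data = self.blocks[index]
        for ahead in range(1, self.prefetch + 1):
            self._load(index + ahead)             # pre-fetch the blocks that follow
        return data


def remote_read(index):
    # Stand-in for a wide-area read of one block.
    return ("block%d" % index).encode()


cache = BlockCache(remote_read)
first = cache.read(0)     # fetches block 0 and pre-fetches block 1
second = cache.read(1)    # served from the cache; pre-fetches block 2
print(cache.remote_reads)
```

For a sequential reader, every request after the first is answered locally while the next block is fetched in the background of the access pattern, which is the latency-hiding effect the slide refers to.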
Metadata management
• Metadata catalog
  Hierarchical
  Distributed
  Federation
• Metadata to be maintained
  File level (size, creation/modification/access time, creator, …)
  Storage metadata
  Access control metadata
  Provenance metadata
• Metadata consistency
Security
• GSI authentication
• Other authentication mechanisms
• Access control lists?
• Ownership
Optimization
• Optimized replica selection
• Bulk operations
• Pre-created service instances
• Other optimization techniques and approaches
APIs
• File API (POSIX semantics)
• Object level API
• Web service API
AVAKI Data Grid
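AVAKI products
• The Legion system: from university to commercial company
• Legion is grid system software, one of the two best-known systems alongside Globus
  Developed at the University of Virginia
  Based on an object model
  Less open and less layered than Globus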
AVAKI Grid Software - Compute and Data Grid Capabilities
[Figure: user departments, IT departments, enterprise, and partner sites with desktops, servers, clusters, queuing systems, shared data sources, and shared output, used by enterprise users and partner users]
• Unifies compute, data and application resources
• Single, global namespace
• Secure access
• Simplified administration
• Failure detection and restart
AVAKI Data Grid
[Figure: user departments, IT departments, enterprise, and partner sites with desktops, servers, clusters, queuing systems, shared data sources, and shared output, used by enterprise users and partner users]
• Federates multiple data sources and provides data access through a local virtual file system (DAS, NAS, SAN)
• Data is accessed through standard interfaces
• Data is cached locally
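Avaki Data Grid - Data Mapped to the Global Namespace
[Figure: user departments, IT departments, enterprise, and partner sites, with Linux, Solaris, and Windows 2000 hosts]
• Maps the files and directories at each data source location to data grid directories and user-defined names
• Defines a grid name space that is independent of physical location (three-level name space)
• A unified view of data across platforms, locations, firewalls, administrative domains, and data owners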
Avaki Data Grid - Access Data
[Figure: enterprise users and partner users reach shared data sources through AVAKI Data Access Servers, each holding a cached copy]
• Access data using the standard NFS protocol and Avaki commands
• Access by user-defined names
• Access based on specified privileges
• Single log-on for shared data access
• Aggressively caches data locally
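Design ideas
• Clients access data through the standard NFS interface
  Because a global view and global naming are used, the client runs modified NFS client software
• The data service system intercepts requests that use the NFS access protocol, analyzes the data operation, and performs name resolution, data location, and protocol conversion, for example to access a local file system or a CIFS file system
• Each file storage system carries out the operation in its own file protocol and returns the result to the client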
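The request path described above (intercept, resolve the name, locate the data, convert the protocol) can be sketched as a small dispatcher. This is an illustration under stated assumptions, not Avaki's implementation: the class names, the name table layout, and the example paths are all hypothetical, and the back ends return strings instead of performing real I/O.

```python
class LocalFsBackend:
    """Stand-in for access to a local file system."""

    def read(self, path):
        return "local-fs read of " + path


class CifsBackend:
    """Stand-in for access to a CIFS share."""

    def read(self, path):
        return "CIFS read of " + path


class DataAccessServer:
    """Receives reads addressed by grid name and forwards them to the right back end."""

    def __init__(self, backends, name_table):
        self.backends = backends        # protocol name -> backend object
        self.name_table = name_table    # grid name -> (protocol, physical path)

    def handle_read(self, grid_name):
        # Name resolution and data location: grid name -> protocol and physical path.
        protocol, physical_path = self.name_table[grid_name]
        # Protocol conversion: hand the request to the back end that speaks that protocol.
        backend = self.backends[protocol]
        return backend.read(physical_path)


server = DataAccessServer(
    backends={"localfs": LocalFsBackend(), "cifs": CifsBackend()},
    name_table={
        "/grid/research/seq1": ("localfs", "/export/data/seq1"),
        "/grid/partner/seq2": ("cifs", "//fileserver/share/seq2"),
    },
)

print(server.handle_read("/grid/research/seq1"))
print(server.handle_read("/grid/partner/seq2"))
```

The client sees only grid names; adding a new storage type means registering one more back end, with no change on the client side.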
Comparison of data grid and NFS sharing
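AVAKI 2.5 Data Grid Benefits
• No application changes are needed: users access data the way they typically do
• Simple wide-area data access, regardless of geographic location, administrative domain, or platform
• Provides consistent access to the latest appropriate data
• Eliminates the need for users to create and manage multiple copies of data
• Caches remote data for high-performance access
• Protects data with fine-grained security
• Simple data management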
Industry Problem: Increasing Cost and Complexity of Life Science Data-sharing
• Over 400 public life sciences databases
• 8x growth in genomics data in the last 18 months. And this is just the beginning …
  Proteomics data: 1,000x multiplier
  Glycomics, new small molecule efforts
• Increasing scope of data diversity
  Annotations (interactions)
  Organism-specific
  Molecule-specific (protein, sugar)
  Data-type-specific (gene expression)
• Increasingly complex data interrelationships
[Figure: GenBank growth has continued to trend sharply upwards, as have many other classes of biological data]
[Figure: increasingly complex interrelationships between biological research databases today (LION graphic)]
Life Sciences Data Management
[Figure: public databases, research and bioinformatics groups, a pharmaceutical company, and external partners at several locations exchange sequence data (SEQ_1 to SEQ_3) between applications over varying media: tape, FTP, CD, web portal]
• Multiple research groups, domains
• Each department or site acquires and manages its own data
• Coherence issues among researchers
• Bandwidth costs
• Multiple FTEs allocated to data management efforts
Data Management Solution - Using Avaki Data Grid
[Figure: partner data, public data, and internal data at central IT and enterprise/partner sites, each site with an Avaki data cache]
• Multiple data sources
• One authoritative copy
• Consistent data across sites
• Automated process eliminates manual and duplicated effort
• Write-through cache supports sharing user-created data
• "Data Access Problem" solved
Resource Availability and Access Problem
[Figure: enterprise and partner sites with servers, workstations, and Linux, Solaris, and WIN2000 clusters (nodes 1, 2, …, n), each behind its own queuing system]
• Some resources are at maximum capacity while other resources are underutilized
• Different user interfaces
• Multiple queuing systems
• Multiple log-ons and complex UID management
• Policy and security needs make sharing difficult
Resource Availability and Access Solution - Using Avaki Compute Grid
[Figure: the same servers, workstations, clusters, and queuing systems, each under local usage policies, reached through a single Avaki log-on and command set]
• Load balanced across resources for improved utilization
• Single log-on to run jobs
• Single user interface
• Single set of commands to access all resources
• Usage policies make sharing easy and secure
LHC Computing Grid Project - LCG
What is CERN?
• CERN is the world's largest particle physics centre, funded by 20 European member states
The special tools for particle physics are:
• ACCELERATORS: huge machines able to speed up particles to very high energies before colliding them into other particles
• DETECTORS: massive instruments which register the particles produced when the accelerated particles collide
LHC data (simplified)
• 40 million collisions per second
• After filtering, 100 collisions of interest per second
• A megabyte of digitised information for each collision = recording rate of 0.1 gigabytes/sec
• 10^11 collisions recorded each year = 10 petabytes/year of data
[Figure: the four LHC experiments: CMS, LHCb, ATLAS, ALICE]
• 1 Megabyte (1 MB): a digital photo
• 1 Gigabyte (1 GB) = 1000 MB: a DVD movie
• 1 Terabyte (1 TB) = 1000 GB: world annual book production
• 1 Petabyte (1 PB) = 1000 TB: 10% of the annual production by LHC experiments
• 1 Exabyte (1 EB) = 1000 PB: world annual information production
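The arithmetic behind these figures can be checked with a short calculation. The sketch below uses the decimal units given on the slide (1 GB = 1000 MB, and so on) and takes the collision counts as parameters. The function names are invented for the example. Because the yearly collision count lost its formatting in extraction, the sketch also computes how many 1 MB collisions a 10 PB yearly volume implies, so the reader can compare.

```python
MB_PER_GB = 1000           # decimal units, as on the slide
MB_PER_PB = 1000 ** 3      # 1 PB = 1000 TB = 10^6 GB = 10^9 MB


def recording_rate_gb_per_s(collisions_per_s, mb_per_collision):
    """Data rate written to storage, in GB/s."""
    return collisions_per_s * mb_per_collision / MB_PER_GB


def yearly_volume_pb(collisions_per_year, mb_per_collision):
    """Yearly data volume in PB for a given number of recorded collisions."""
    return collisions_per_year * mb_per_collision / MB_PER_PB


def collisions_for_volume(volume_pb, mb_per_collision):
    """Number of recorded collisions that a given yearly volume corresponds to."""
    return volume_pb * MB_PER_PB / mb_per_collision


reduction = 40_000_000 // 100                 # filtering keeps 1 collision in 400,000
rate = recording_rate_gb_per_s(100, 1)        # 100 collisions/s at 1 MB each
implied = collisions_for_volume(10, 1)        # collisions implied by 10 PB/year at 1 MB each

print(reduction, rate, implied)
```

At 1 MB per collision, 10 PB per year corresponds to 10^10 recorded collisions, so the yearly count and the yearly volume on the slide should be read with that relationship in mind.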
Expected LHC computing needs
[Figure: three charts of estimated capacity at CERN, 1998 to 2010: disk capacity (TeraBytes, axis 0 to 7000); mass storage (PetaBytes, axis 0 to 140, LHC vs. other experiments); CPU capacity (K SI95, axis 0 to 6,000, compared with Moore's law based on 2000 data, with a "today" marker)]
• Networking: 10-40 Gb/s to all big centres
• Data: ~15 petabytes a year
• Processing: ~100,000 of today's PCs
Computing at CERN today
• High-throughput computing based on reliable "commodity" technology
• More than 1500 dual-processor PCs
• More than 3 petabytes of data on disk (10%) and tapes (90%)
Nowhere near enough!
Computing for LHC
• Problem: even with the computer centre upgrade, CERN can only provide a fraction of the necessary resources
• Solution: computing centres, which were isolated in the past, will now be connected, uniting the computing resources of particle physicists around the world using GRID technologies!
Europe: ~270 institutes, ~4500 users
Elsewhere: ~200 institutes, ~1600 users
LHC Computing Grid Project
• The LCG Project is a collaboration of
  The LHC experiments
  The Regional Computing Centres
  Physics institutes
  .. working together to prepare and deploy the computing environment that will be used by the experiments to analyse the LHC data
• This includes support for applications
  Provision of common tools, frameworks, environment, data persistency
• .. and the development and operation of a computing service
  Exploiting the resources available to LHC experiments in computing centres, physics institutes and universities around the world
  Presenting this as a reliable, coherent environment for the experiments
LCG-1 components (schematic)
• Hardware: computing cluster, network resources, data storage (closed system (?); HPSS, CASTOR, …)
• System software: operating system, local scheduler, file system (RedHat Linux; NFS, …; PBS, Condor, LSF, …)
• "Passive" services: user access, security, data transfer, information schema (VDT (Globus, GLUE))
• "Active" services: global scheduler, data management, information system (EU DataGrid)
• High level services: user interfaces, applications (LCG, experiments)
Work Load Management System
[Figure: the UI submits jobs to the RB node, which contains the network server, workload manager, match-maker/broker, job adapter, job controller (CondorG), log monitor, and RB storage; the RB node consults the replica catalog and the information service (CE and SE characteristics and status) and reports to logging & bookkeeping]
Job status: submitted, waiting, ready, scheduled, running, done, cleared
Stage annotations in the figure: arrived on RB, matching, job adapter, on CE, processed, output back, user done
• Input sandbox is what you take with you to the node
• Output sandbox is what you get back
• Failed jobs are resubmitted
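The job status sequence can be modelled as a simple state machine. The sketch below assumes that a job moves through the listed states strictly in order, and that a failed job returns to "waiting" so the broker can match it again. The slide only says that failed jobs are resubmitted, so that second rule is an assumption made for illustration, and the `Job` class is hypothetical.

```python
JOB_STATES = ["submitted", "waiting", "ready", "scheduled", "running", "done", "cleared"]


class Job:
    """Tracks one job's status as it moves through the workload management system."""

    def __init__(self):
        self.state = JOB_STATES[0]
        self.history = [self.state]

    def _set(self, state):
        self.state = state
        self.history.append(state)

    def advance(self):
        """Move to the next status in the listed order."""
        position = JOB_STATES.index(self.state)
        if position == len(JOB_STATES) - 1:
            raise ValueError("job is already cleared")
        self._set(JOB_STATES[position + 1])

    def fail(self):
        """Assumption: a failed job goes back to 'waiting' to be matched again."""
        if self.state in ("done", "cleared"):
            raise ValueError("a finished job cannot fail")
        self._set("waiting")


job = Job()
for _ in range(4):
    job.advance()            # submitted -> waiting -> ready -> scheduled -> running
job.fail()                   # the run fails; the job is resubmitted
while job.state != "cleared":
    job.advance()

print(job.history)
```

Keeping the history makes the resubmission visible: the job passes through "running" twice before it is cleared.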
LCG-1 Information System
[Figure: the GRIS of each CE and SE registers with its site GIIS (sites A to D); site GIISes register with primary and secondary regional GIISes (regions A1, A2, B1, B2); BDII LDAP servers query the regional GIISes and serve the RB. Inside the BDII, one LDAP tree (/dataCurrent/..) serves queries while the other (/dataNew/..) is populated, followed by swap and restart]
• Every site GIIS registers with more than one regional GIIS
• BDII switches between regional GIISes in case one fails
• Stale information problem handled by repopulating one LDAP tree while serving from another
• Switch made transparent by switching off the TCP port during swaps (takes about 0.5 sec every 10 min)
• System can scale by adding more regions
• Reliability: more secondary GIISes per region
• Every site with RBs has a BDII
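The "repopulate one tree while serving from another" idea is a form of double buffering, which can be sketched independently of LDAP. In the illustration below, plain dictionaries stand in for the two LDAP trees, and `fetch_snapshot` stands in for querying the regional GIISes. The class and all names are hypothetical, and the brief port shutdown during the swap is not modelled.

```python
class DoubleBufferedDirectory:
    """Serves queries from one snapshot while the other is being rebuilt."""

    def __init__(self, fetch_snapshot):
        self.fetch_snapshot = fetch_snapshot   # callable returning a fresh dict of entries
        self.current = fetch_snapshot()        # the tree being served
        self.staging = {}                      # the tree being repopulated

    def query(self, key):
        return self.current.get(key)

    def refresh(self):
        # Populate the idle tree; queries keep hitting self.current meanwhile.
        self.staging = self.fetch_snapshot()
        # Swap the roles of the two trees.
        self.current, self.staging = self.staging, self.current


# Hypothetical information source: CE2 appears between two polls.
snapshots = iter([
    {"CE1": "up"},
    {"CE1": "up", "CE2": "up"},
])
directory = DoubleBufferedDirectory(lambda: next(snapshots))

before = directory.query("CE2")    # not yet known
directory.refresh()
after = directory.query("CE2")     # visible after the swap

print(before, after)
```

Clients never observe a half-built tree, because the served snapshot is replaced only once the new one is complete.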
Moving Data, Finding Data
• GridFTP (very basic tool)
  Version of parallel FTP with GSI security (gsiftp)
• Interface to storage system via SRM (Storage Resource Manager)
  Handles things like migrating data to and from MSS, file pinning, etc.
  Abstract interface to storage subsystems
• EDG-RLS (Replica Location Service)
  Keeps track of where files are
  Composed of two catalogues
  • Replica Metadata Catalog (RMC) and Local Replica Catalog (LRC)
  • Provides mappings between logical file names and locations of the data (SURL)
  • Design distributed, currently one instance per VO at CERN
• EDG-RM (Replica Manager): moving files, creating replicas
• Transparent access to files by user via GFAL (Grid File Access Library)
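The two-catalogue lookup can be sketched with two dictionaries. The sketch assumes that the RMC maps a logical file name to a unique identifier and that the LRC maps that identifier to one or more SURLs; this division of labour is an assumption stated for the example rather than something the slide spells out. The function names, file names, and hosts are invented.

```python
# Hypothetical catalogue contents.
rmc = {"lfn:higgs-candidates.root": "guid-0001"}      # logical file name -> identifier
lrc = {                                               # identifier -> storage URLs (SURLs)
    "guid-0001": [
        "srm://se1.example.org/data/higgs-candidates.root",
    ]
}


def list_replicas(lfn, rmc, lrc):
    """Return every known SURL for a logical file name."""
    guid = rmc[lfn]
    return list(lrc.get(guid, []))


def register_replica(lfn, surl, rmc, lrc):
    """Record a new physical copy of an already registered logical file."""
    guid = rmc[lfn]
    lrc.setdefault(guid, []).append(surl)


register_replica(
    "lfn:higgs-candidates.root",
    "srm://se2.example.org/data/higgs-candidates.root",
    rmc,
    lrc,
)
replicas = list_replicas("lfn:higgs-candidates.root", rmc, lrc)
print(replicas)
```

Separating the name mapping from the location mapping means that copying a file to a new site touches only the location catalogue; the logical name that users and jobs refer to stays the same.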
Data Management in LCG-2
• Components:
  Grid File Access Library (GFAL)
  Replica Location Service
  Replica management tools
  GridFTP
  SRM as interface to storage
• And … dCache etc. as disk-pool managers
• Strategy
  Integrate GFAL with SRM and RLS, and interface to Castor and Enstore/dCache
  Deploy SRM as standard MSS interface (tape/disk)
• SRM is a (first?) real grid interface to a service - LCG will push this view
Goal: provide transparent (location-independent) access to storage via a POSIX I/O layer
GFAL - Functional View
[Figure: a physics application (directly, via POOL, or via VFS) calls POSIX I/O on the Grid File Access Library (GFAL). GFAL contains a replica catalog client, an SRM client, an information services client, and I/O back ends (local file I/O, rfio I/O, dCap I/O, root I/O, each offering open(), read(), etc.). These reach the RC services, SRM service, MDS, local disk, and, over wide area access, the rfio, dCap, root I/O, and gridFTP services and the MSS service]
What's Next? - The EU EGEE (Enabling Grids for E-science in Europe) Project
2004-2007
EGEE Middleware Activity (I)
• Hardening and re-engineering of existing middleware functionality
• Key services:
  Resource access
  Data management (CERN)
  Information collection and accounting (UK)
  Resource brokering (Italy)
  Quality assurance (France)
  Grid security (Northern Europe)
  Middleware integration (CERN)
  Middleware testing (CERN)
Summary
• The LHC computing challenge requires grid computing (or similar) solutions
• LCG has made progress deploying a prototype grid computing environment
• Federating grid infrastructures is becoming a necessity to provide access to resources needed by applications
• But: the technology is still very immature – have to focus on making the underlying technology robust and production quality
• There is still a lot to be done!