CERN
LCG2 development planning
Zdenek Sekera/IT-GD-CT

Outline (1)
• Release process
– certification testbed, activities
– monthly releases, procedures
– “Aug” release, content, destination, release note
• Different O/S support
– HW & SW setup
– RH7.3
– SLC3 for IA32 & IA64
– others
• “September” release
– RH7.3 fixes, features
– Special software: dCache, Tank&Spark, accounting
– New: SLC3 release, installation manual, full interoperability with RH7.3

Outline (2)
• Data Management
– File Catalog
– DPM (Disk Pool Management)
• LCG and EGEE
– migration from LCG2 to EGEE?
– preproduction testbed

Release process – C&T testbed
• The RH7.3 C&T testbed is constantly evolving to reflect our activities
– new BDIIs
– dCache
– added an SLC3 cluster of WNs
• We have a mini-SLC3 testbed running
– this will evolve into a fully featured SLC3 TB; we are waiting for machines from FIO
• When appropriate we will connect the RH7.3 and SLC3 TBs to certify full interoperability

Certification & Testing Testbed
[Figure: node layout of the Certification & Testing Testbed, Cluster_1 through Cluster_7. Each cluster groups RB, BDII, CE, SE, WN and UI nodes. Labels recoverable from the diagram: SEs backed by dCache and Castor, a dCache pool node, CE_5 (Condor), CE_6 (LSF), an SL3 cluster with SL3 WNs (Cluster_7), an RLS node (rlscert02, Oracle), MyProxy, MON and plain GRIS nodes. Several clusters are marked "NO home sharing".]

Certification, Testing and Release Cycle
[Figure: flowchart of the certification, testing and release cycle, spanning the certification testbed and deployment. Labels recoverable from the diagram: inputs from the LCG C&T section (add features, fix problems), EGEE (fix problems, new releases) and VDT (fix problems, new releases); transmit problems; Integrate; Basic Functionality Tests; Run C&T test suites and site test suites; Run Certification Matrix; "errors?" decision points with yes/no branches leading to "fix problems"; Release Candidate tagged; Experiments Integration Testbed; candidate not acceptable; Release pre-deployment; certified release tagged; deployment feedback; General Release.]
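The certification cycle on this slide can be read as a series of gates that a candidate must pass before it is tagged. The following is a minimal sketch of that reading, not the C&T section's actual tooling: the stage names are taken from the figure labels, while the function `certify`, the callback `has_errors` and the `max_rounds` limit are hypothetical. It also simplifies the flow by restarting from integration after any failure.

```python
from typing import Callable, List, Optional, Tuple

# Stage names follow the flowchart labels; their order is our reading of the figure.
STAGES: List[str] = [
    "basic functionality tests",
    "C&T and site test suites",
    "certification matrix",
    "experiments integration testbed",
]


def certify(
    candidate: str,
    has_errors: Callable[[str, str, int], bool],
    max_rounds: int = 5,
) -> Tuple[bool, List[str]]:
    """Push a candidate through every stage; on an error, fix and re-integrate.

    has_errors(candidate, stage, round_no) is a hypothetical hook that
    reports whether the given stage found errors in the given round.
    """
    log: List[str] = []
    for round_no in range(1, max_rounds + 1):
        log.append(f"round {round_no}: integrate {candidate}")
        # Find the first stage that reports errors in this round, if any.
        failed: Optional[str] = next(
            (stage for stage in STAGES if has_errors(candidate, stage, round_no)),
            None,
        )
        if failed is None:
            log.append("certified release tagged")
            return True, log
        log.append(f"errors in {failed}: fix problems")
    log.append("candidate not acceptable")
    return False, log
```

For example, `certify("rc", lambda c, s, r: r == 1 and s == "certification matrix")` fails the matrix in the first round and is tagged in the second.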

Release process – monthly releases
• We took the decision (about 5 months ago) to release monthly
– smaller, more maintainable increments
– hopefully more predictable
– easier to manage
– CT release note
• We (the CT section) release for GD-GIS (Grid Infrastructure), not really for the public
– GIS releases to the public
– last verifications of the release, independent from CT
– adding some wrappers if needed
– GIS may skip a CT release if it is judged “too internal”, without any visible benefit to sites. This may happen when most of the changes in the release are internal, perhaps to prepare for a bigger change in the future.
• GIS releases for the public
– GIS decides the version number etc.

Release process – “Aug” release
• CT distributes an email giving an overview of the release, with an attached Release Note that describes all changes in detail (examples of both from the August release are attached to the agenda of this meeting)
– internally to the GD group
– to LCG management
– to FIO

Different O/S support
• At the EGEE meeting in Cork we asked which “other” O/Ses sites would be interested in
– no input from CICs or ROCs
• We are collaborating closely with the Irish group porting LCG2 to IRIX, AIX and perhaps other Unixes
• Internally we have decided to go ahead with the SLC3 port because it is used by big labs (CERN, FNAL, …)
• We have added SLC3 WNs to the RH7.3 TB to test basic interoperability
• We have installed a full SLC3 mini-TB; it works, with lots of rough edges, in particular during installation
• We are waiting for a number of machines from FIO to complete the SLC3 testbed (~40 machines) for full certification
• When SLC3 is certified, we’ll connect it to the RH7.3 TB and certify full interoperability

Different O/S support – “other”
• support SLC3 in two flavors
– IA32, first
– IA64, asap. The IA64 port is mostly being done by OpenLAB; we’ll integrate their changes into the CVS tree. The build machine is almost ready. When the SLC3 IA64 port is certified, the IA64 TB will be connected to the RH7.3 + SLC3 IA32 TBs for certification.
• the port to other Unixes (IRIX, AIX, ??) is being done by the Irish group. When they are ready we’ll investigate how to certify the port and its interoperability with the other ports (remember, we do not have the necessary HW).
• we would be very interested to know what other ports are needed; we are waiting for a summary of requirements from CICs and ROCs
– do we need Fedora Core?
– others?
– who can tell us what is needed?

“Sep” release
• After every release we’ll get together to review all new and outstanding bugs and requirements, and broadly define the expected contents of the next release
• For the September release:
– full support of RH7.3 will continue, bug fixes
– we’ll try again to integrate dCache; a number of reported problems have apparently been fixed, and we need to verify this
– certify the accounting package we received from GOC
– certify Tank&Spark, the experiments’ software installation tool
– site GIIS replaced by BDII (already happening)
– moving to Torque and the Maui scheduler, replacing PBS
– full SLC3 support

After “Sep” releases
• In the pipeline for later
– new info provider (generic info provider with caching, to avoid sites occasionally dropping out of the BDII)
– should find a solution to avoid the BDII restarting every 2 mins
– WP1 still chasing some bugs (e.g. error 155)
– WP1 BDII “travelling” with the job (through an env variable); this will solve some RM problems, needs GFAL cooperation
– DPM (disk pool manager), SRM 1.1 + 2.1 support
– File Catalogs: many features requested by experiments, big performance improvements (see the CHEP talk by J.-P. Baud and J. Casey)
– lcgutils: improve error messages, introduce more retries and timeouts
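The lcgutils item (more retries and timeouts, better error messages) can be illustrated with a generic retry helper. This is a minimal sketch under stated assumptions, not lcgutils code: the name `call_with_retries` and all its parameters are hypothetical. The `deadline` is an overall time budget checked between attempts; it does not interrupt a call that is already running.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(
    op: Callable[[], T],
    retries: int = 3,
    delay: float = 0.5,
    deadline: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call op() up to retries + 1 times, doubling the pause between attempts.

    Gives up early if the next pause would exceed the overall deadline
    (in seconds), and reports the attempt count and the last error.
    """
    start = time.monotonic()
    last_error: Optional[BaseException] = None
    attempt = 0
    for attempt in range(1, retries + 2):
        try:
            return op()
        except Exception as exc:  # a real tool would retry only transient errors
            last_error = exc
        out_of_time = time.monotonic() - start + delay > deadline
        if attempt > retries or out_of_time:
            break
        sleep(delay)
        delay *= 2  # exponential backoff
    raise RuntimeError(
        f"gave up after {attempt} attempt(s): {last_error}"
    ) from last_error
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; the final error message carries both the attempt count and the underlying cause.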

Migration to EGEE software
• Discussing possible scenarios
– replacing modules with EGEE versions whenever feasible
  • the new RB is supposed to support both push & pull models; could we run the old one (push) in parallel with the new one? We would need a new CE, …
– running both sets of software (modules) in parallel
– or do we need to separate them on different testbeds and merge them later?
• Installing the EGEE pre-production testbed
– to test EGEE software
– to understand the needs for migration
– later, to run pre-release versions of EGEE software before installing them in the production environment

EGEE Certification, Testing and Release Cycle
[Figure: flowchart of the EGEE certification, testing and release cycle. Labels recoverable from the diagram: Development & Integration and Unit & Functional Testing (JRA1), producing a Dev tag; Certification Testing and Services: Integrate, Basic Functionality Tests, Run tests (C&T suites, site suites), Run Certification Matrix, Release candidate tag, Certified release tag; application software installation and experiments integration (LHC experiments, medical, other TBD); Deployment Preparation, Deployment release tag; Deploy (SA1); Release Pre-production; Production tag; Production.]