
LCG2 development planning


Page 1: LCG2 development planning

CERN

LCG2 development planning

Zdenek Sekera/IT-GD-CT

Page 2: LCG2 development planning

Outline (1)

• Release process
  – certification testbed, activities
  – monthly releases, procedures
  – “Aug” release, content, destination, release note

• Different O/S support
  – HW & SW setup
  – RH7.3
  – SLC3 for IA32 & IA64
  – others

• “September” release
  – RH7.3 fixes, features
  – Special software: dCache, Tank&Spark, accounting
  – New: SLC3 release, installation manual, full interoperability with RH7.3

Page 3: LCG2 development planning

Outline (2)

• Data Management
  – File Catalog
  – DPM (Disk Pool Management)

• LCG and EGEE
  – migration from LCG2 to EGEE?
  – preproduction testbed

Page 4: LCG2 development planning

Release process – C&T testbed

• The RH7.3 C&T testbed is constantly evolving to reflect our activities
  – new BDIIs
  – dCache
  – added SLC3 cluster of WNs

• We have a mini-SLC3 testbed running
  – this will evolve into a fully featured SLC3 TB, waiting for machines from FIO

• When appropriate we will connect both RH7.3 and SLC3 TBs to certify the full interoperability

Page 5: LCG2 development planning

Certification & Testing Testbed

[Diagram: layout of the Certification & Testing testbed, organised as Cluster_1 to Cluster_7. Node roles shown: RB, BDII, CE, SE, WN, UI, MON, MyProxy, PlainGris and RLS (Oracle). Some SEs are labelled dCache or Castor and there is a dCache pool node; CE_5 is labelled Condor and CE_6 is labelled LSF; Cluster_7 (CE_7, UI_7, SE, RB + BDII, WN) appears next to the labels “SL3 clust” and “sl3 wn”. Several groups of nodes are marked “NO home sharing”.]

Page 6: LCG2 development planning

Certification, Testing and Release Cycle

[Flowchart: inputs come from the LCG C&T section (add features, fix problems), EGEE (fix problems, new releases) and VDT (fix problems, new releases). On the certification testbed the boxes are: Integrate, Basic Functionality Tests, Run C&T test suites / site test suites, Run Certification Matrix, Release Candidate tagged. Each step has an “errors?” decision with a “yes” branch to “fix problems” (problems are transmitted back to the providers) and a “no” branch to the next step. The tagged candidate goes to the Experiments Integration Testbed, which can return “candidate not acceptable”. Remaining labels: certified release tagged, Release, Pre-deployment, Deployment, General Release, deployment feedback.]
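The cycle on this slide is essentially a staged pipeline with a fix-and-retry loop at each "errors?" decision. The following is a minimal sketch of that idea only, not the actual C&T tooling: the function name `run_cycle`, the `fix` callback, the `max_fix_rounds` limit and the choice to retry the same stage (rather than restart from integration) are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# A stage is a (name, check) pair. Names mirror the boxes in the diagram;
# the check functions are hypothetical stand-ins for real test suites.
Stage = Tuple[str, Callable[[dict], bool]]


def run_cycle(build: dict,
              stages: List[Stage],
              fix: Callable[[dict, str], None],
              max_fix_rounds: int = 5) -> List[str]:
    """Run the stages in order; when a stage reports errors, call fix()
    and re-run that same stage (an assumption, see the text above)."""
    log: List[str] = []
    for name, check in stages:
        rounds = 0
        while not check(build):
            if rounds == max_fix_rounds:
                # Corresponds to the "candidate not acceptable" exit.
                raise RuntimeError(f"candidate not acceptable at stage {name!r}")
            rounds += 1
            log.append(f"{name}: errors found, fix round {rounds}")
            fix(build, name)
        log.append(f"{name}: passed")
    log.append("certified release tagged")
    return log


if __name__ == "__main__":
    # Toy build state: each fix round closes one open bug.
    build = {"open_bugs": 2}
    stages = [
        ("Integrate", lambda b: True),
        ("Basic functionality tests", lambda b: b["open_bugs"] < 2),
        ("C&T and site test suites", lambda b: b["open_bugs"] == 0),
        ("Certification matrix", lambda b: b["open_bugs"] == 0),
    ]

    def close_one_bug(b: dict, stage: str) -> None:
        b["open_bugs"] -= 1

    for line in run_cycle(build, stages, close_one_bug):
        print(line)
```

Bounding the number of fix rounds is a design choice of the sketch: it gives the loop an explicit "candidate not acceptable" outcome instead of cycling indefinitely.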

Page 7: LCG2 development planning

Release process – monthly releases

• We took the decision (about 5 months ago) to release monthly
  – smaller, more maintainable increments
  – hopefully more predictable
  – easier to manage
  – CT release note

• We (the CT section) release for GD-GIS (Grid Infrastructure), not really for the public
  – GIS releases to the public
  – last verifications of the release, independent from CT
  – adding some wrappers if needed
  – GIS may skip a CT release if it is judged “too internal”, without any visible benefit to sites. This may happen when most of the changes in the release were internal, perhaps to prepare for a bigger change in the future.

• GIS releases for the public
  – decides the version number etc.

Page 8: LCG2 development planning

Release process – “Aug” release

• CT distributes an email giving an overview of the release and attaches the Release Note, which describes all changes in detail (examples of both from the August release are attached to the agenda of this meeting)
  – internally to the GD group
  – to LCG management
  – to FIO

Page 9: LCG2 development planning

Different O/S support

• At the EGEE meeting in Cork we asked which “other” O/Ses sites would be interested in
  – no input from CICs or ROCs

• We are collaborating closely with the Irish group porting LCG2 to IRIX, AIX and perhaps other Unixes

• Internally we have taken the decision to go ahead with the SLC3 port because it is used by big labs (CERN, FNAL, …)

• We have added SLC3 WNs to the RH7.3 TB to test basic interoperability

• We have installed a full SLC3 mini-TB; it works, but there are lots of rough edges, in particular during installation

• Waiting for a number of machines from FIO to complete the SLC3 testbed (~40 machines) for full certification

• When SLC3 is certified, we’ll connect it to the RH7.3 TB and certify the full interoperability

Page 10: LCG2 development planning

Different O/S support – “other”

• support SLC3 in two flavors
  – IA32 first
  – IA64 asap. The IA64 port is mostly being done by OpenLAB; we’ll integrate their changes into the CVS tree, and the build machine is almost ready. When the SLC3 IA64 port is certified, the IA64 TB will be connected to the RH7.3 + SLC3 IA32 TBs for certification.

• the port to other Unixes (IRIX, AIX, ??) is being done by the Irish group. When they are ready we’ll investigate how to certify the port and its interoperability with the other ports in due time (remember, we do not have the necessary HW).

• we would be very interested to know what other ports are needed; waiting for a summary of requirements from CICs and ROCs
  – do we need Fedora Core?
  – others?
  – who can tell us what is needed?
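The plan above certifies each platform on its own testbed first and only then connects testbeds to certify interoperability. As a rough illustration of that ordering, the sketch below lists which cross-platform pairings become testable once a given set of platforms has been certified. The platform names come from the slides; the function name `interoperability_pairs` and the pairwise reading of "full interoperability" are assumptions for illustration.

```python
from itertools import combinations
from typing import Iterable, List, Tuple

# Platforms named in the slides, in the order they are expected to arrive.
PLATFORMS = ["RH7.3", "SLC3 IA32", "SLC3 IA64"]


def interoperability_pairs(certified: Iterable[str]) -> List[Tuple[str, str]]:
    """Return the cross-platform pairs that can be tested, i.e. pairs in
    which both platforms have already passed their own certification."""
    certified_set = set(certified)
    ready = [p for p in PLATFORMS if p in certified_set]
    return list(combinations(ready, 2))


if __name__ == "__main__":
    print(interoperability_pairs(["RH7.3"]))
    print(interoperability_pairs(["RH7.3", "SLC3 IA32"]))
    print(interoperability_pairs(PLATFORMS))
```

Each additional port adds one pairing per already-certified platform, which is one reason a summary of the ports actually needed matters before committing hardware.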

Page 11: LCG2 development planning

“Sep” release

• After every release we’ll get together, review all new and outstanding bugs and requirements, and broadly define the expected contents of the next release

• For the September release:
  – full support of RH7.3 will continue, bug fixes
  – we’ll try again to integrate dCache; a number of reported problems have apparently been fixed, but we need to verify this
  – certify the accounting package we received from GOC
  – certify Tank&Spark, the experiments’ software installation tool
  – site GIIS replaced by BDII (already happening)
  – moving to Torque and the Maui scheduler, replacing PBS
  – full SLC3 support

Page 12: LCG2 development planning

After “Sep” releases

• In the pipeline for later
  – new info provider (generic info provider with caching, to avoid sites occasionally dropping out of the BDII)
  – should find a solution to avoid the BDII restarting every 2 mins
  – WP1 still chasing some bugs (e.g. error 155)
  – WP1 BDII “travelling” with the job (through an env variable); this will solve some RM problems, needs GFAL cooperation
  – DPM (disk pool manager), SRM 1.1 + 2.1 support
  – File Catalogs: many features requested by experiments, big performance improvements (see CHEP talk by J.-P. Baud and J. Casey)
  – lcgutils: improve error messages, introduce more retries and timeouts
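The "generic info provider with caching" item can be pictured as a wrapper that keeps the last good answer and serves it when a fresh query fails, so that one slow or failed query does not make a site vanish from the information system. This is a minimal sketch of that caching idea only, not the LCG info provider: the class name `CachingInfoProvider`, the `max_age` parameter and the dictionary payload are hypothetical.

```python
import time
from typing import Callable, Dict, Optional


class CachingInfoProvider:
    """Wrap a possibly flaky query and fall back to the last good result
    for up to max_age seconds when a fresh query raises an exception."""

    def __init__(self,
                 query: Callable[[], Dict[str, str]],
                 max_age: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._query = query          # hypothetical callable returning site info
        self._max_age = max_age      # how long a cached answer may be served
        self._clock = clock          # injectable so tests can control time
        self._cached: Optional[Dict[str, str]] = None
        self._cached_at = 0.0

    def get(self) -> Dict[str, str]:
        try:
            fresh = self._query()
        except Exception:
            # Serve the cached answer only while it is still recent enough;
            # otherwise let the failure propagate to the caller.
            if (self._cached is not None
                    and self._clock() - self._cached_at <= self._max_age):
                return self._cached
            raise
        self._cached = fresh
        self._cached_at = self._clock()
        return fresh
```

The age limit matters: without it a site that is really down would keep being published from the cache indefinitely.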

Page 13: LCG2 development planning

Migration to EGEE software

• Discussing possible scenarios
  – replacing modules with EGEE versions whenever feasible
      • the new RB is supposed to support both push & pull models; could we run the old (push) in parallel with the new? Need new CE, …
  – running both softwares (modules) in parallel
  – or do we need to separate both on different testbeds and merge them together later?

• installing the EGEE pre-production testbed
  – to test EGEE software
  – to understand the needs for migration
  – later, to run pre-release versions of EGEE software before installing it in the production environment

Page 14: LCG2 development planning

EGEE Certification, Testing and Release Cycle

[Flowchart: labelled phases are Development & Integration / Unit & Functional Testing (JRA1, Dev Tag), Certification Testing (Integrate, Basic Functionality Tests, Run tests: C&T suites and Site suites, Run Certification Matrix, Release candidate tag, Certified release tag), Apps SW Installation (LHC expts, Medical, Other TBD; expts integration), Deployment Preparation (Deployment release tag) and Services (SA1, Deploy, Pre-production, Production, Production tag).]