
October 10-14, 2005

HEPiX Fall 2005 at SLAC

Site Report
Roberto Gomezel
INFN


Outline of Presentation
• New Computing Committee
• Computing Environment
• Security
• Services
• Network
• AFS
• INFN Farms
• Tier1@CNAF


New Computing and Networking Committee
• Last June the previous Computing and Networking Committee's mandate expired, so a new one was formed
• Mauro Morandin (INFN-Padova) is the new chairman of the committee, and some members have been replaced
• The committee has been charged with the following explicit mission:
  – To coordinate the implementation of computing farms, with particular regard to LHC Tier-1, Tier-2 and Tier-3
  – To participate in national and international coordination committees focused on topics related to CNC interests
  – To promote innovation and technological coordination of computing and networking at INFN sites
  – To coordinate and finance technological development and maintenance of computing resources


Computing Environment and Security
• Most boxes are PCs running Linux or Windows
• Mac OS boxes live on
• VPNs available at many sites
  – Cisco and Netscreen boxes using IPsec
  – SSL VPNs are currently used by some sites
    • Interesting results at LNF using the Cisco VPN Concentrator
• Network security
  – Dedicated firewall machines at just a few sites
  – Otherwise implemented with access lists on the router connected to the WAN

INFN Site Report – R.Gomezel


Desktop
• PCs running Linux, Windows and Mac OS
• SL and SLC are equally used
• A few sites use the CASPUR BigBox release
• Some units are taking advantage of outsourced support for the Windows desktop environment because of a lack of personnel

INFN Site Report – R.Gomezel


Backup
• Tape libraries used:
  – IBM Magstar – used only at LNF
  – DLT, LTO-2 – widespread
  – LTO-3 will naturally replace LTO-2 drives in the near future
• Backup tools:
  – IBM Tivoli – widely used
  – HP OmniBack – widely used
  – Atempo Time Navigator – just a few sites
  – Home-grown tools – widespread

INFN Site Report – R.Gomezel


Wireless LAN
• Access points running the 802.11a/b/g standards
• All sites use wireless connectivity during meetings or conferences
• Most of them use it to provide connectivity to laptop computers
• A specific working group keeps investigating in order to provide a common solution to the security issues
  – To go beyond access permission based on secure port filtering (MAC address), which is very weak
    • 802.1X is a good solution, but it is not yet implemented and working well on all platforms in use
  – To investigate the 802.16 (WiMAX) standard

INFN Site Report – R.Gomezel


E-mail
• Mail Transfer Agent
  – Sendmail – widespread and the most used (70%)
  – Postfix – a few sites (30%); a higher share than in the last report, confirming the trend reported last year
• Hardware and OS
  [Pie chart: mail-server hardware/OS mix – Alpha 14%, Solaris 9%, Intel/Linux 60%, Intel/BSD 17%]

INFN Site Report – R.Gomezel


E-mail user agent
• All INFN sites provide an HTTP (webmail) user agent
  – IMP
  – SquirrelMail (its increased use is due to its light footprint and good response time)
  – Others:
    • IMHO, Open WebMail, Cyrus+Roxen…
• Other commonly used mail user agents:
  – Pine, Internet Explorer, Mozilla, Thunderbird…

INFN Site Report – R.Gomezel


E-mail antispam
• The last Computing and Networking Committee decided to subscribe to a nationwide license for Sophos as the common tool to reduce junk e-mail and to provide antivirus control
• Some sites used RAV or SpamAssassin
• By the end of this year every site is expected to move to Sophos, not only for the message-filtering functionality but also as the antivirus tool for PCs
• Only authorized mail relays are allowed to send and receive mail for a specific site
• An increasing number of sites are filtering outbound connections on port 25 to prevent users from unknowingly sending viruses

INFN Site Report – R.Gomezel


INFN network
• LAN backbone networks mainly based on Gigabit Ethernet
  – 10 Gigabit Ethernet switches used in computing farms
• The INFN WAN is completely integrated into the GARR network, which provides backbone connectivity at 54 Gbps
  – Typical POP access bandwidth for INFN sites: 34 Mbps, 155 Mbps, 622 Mbps and Gigabit Ethernet
  – The CNAF Tier-1 will be connected at 10 Gbps soon
  – A few small research groups are still connected via multiple 2 Mbps links because of the lack of an efficient telecommunication infrastructure
  – Access to GEANT2: N × 10 Gbps links soon

INFN Site Report – R.Gomezel


AFS
• INFN sites keep using AFS services to share data and software across sites
• Local cells have completely moved, or are moving, to Linux boxes running OpenAFS software
• The migration of the INFN.IT authentication servers from Kerberos IV to Kerberos V was completed last June
  – A Kerberos V master server has been installed on a Linux machine: k5.infn.it
  – The former 3 AFS authentication servers (CNAF, Naples and Rome) have been reconfigured as Kerberos V slave servers
• The K5 working group is now testing the use of trust-relationship authentication between different INFN cells

INFN Site Report – R.Gomezel


INFN Site Farm: update
• Many sites are configuring and integrating their computing facilities and local experiment-specific farms into a single computing farm
• Widespread deployment of SAN infrastructure to connect storage systems and computing units
  – The GPFS file system is becoming the most widely adopted, as an efficient way of providing a cluster file system and volume manager
  – The increasing usage allows people to get support from other sites when problems arise
  – The Tier-1, however, is evaluating a move to Lustre because of the lack of support from IBM for GPFS in a heterogeneous environment
• There is increasing use of LSF as the tool for submitting jobs to the computing farms using different queues
  – Server license hosted at CNAF – Tier-1
  – Newly joining sites can take advantage of the growing experience coming from the Tier-1 and other units such as Padua, Pisa and Catania

INFN Site Report – R.Gomezel


Storage WG
• The last CNC promoted the creation of a storage working group
• This group has been working since March 2005
• Main tasks:
  – To evaluate the opportunity of using Fibre Channel technology as the common infrastructure for the computing facility at each site
  – To investigate the most common distributed file systems available, evaluating performance and reliability
    • With particular regard to the startup of the next Tier-2 farms
  – To keep in touch with the HEPiX Storage task force activity
  – To take into account the impact of GRID requirements on the storage file system
• First status report at the CNC meeting next week

INFN Site Report – R.Gomezel


TIER-1@CNAF Status Report: Introduction
• Location: INFN-CNAF, Bologna (Italy)
  – One of the main nodes of the GARR network
• Computing facility for the INFN HNEP community
  – Participating in the LCG, EGEE and INFNGRID projects
• Multi-experiment Tier-1
  – LHC experiments
  – VIRGO
  – CDF
  – BABAR
  – AMS, MAGIC, ARGO, PAMELA, …
• Resources assigned to experiments on a yearly basis

CNAF Tier-1 Report – L.Dell’Agnello


Infrastructure
• Hall in the basement (-2nd floor): ~1000 m² of total space
  – Easily accessible by lorries from the road
  – Not suitable for office use (remote control)
• Electric power
  – 220 V single-phase (computers)
    • 4 × 16 A PDUs needed for 3.0 GHz Xeon racks
  – 380 V three-phase for other devices (tape libraries, air conditioning, etc.)
  – UPS: 800 kVA (~640 kW) – see the conversion note after this list
    • Needs a separate room (conditioned and ventilated)
  – Electric generator: 1250 kVA (~1000 kW)
• Up to 160 racks (~100 with 3.0 GHz Xeons); expansion under evaluation
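Note: the kVA-to-kW figures above are consistent with an assumed power factor of about 0.8, which the slide does not state explicitly: $P = S\cos\varphi$, i.e. $800\ \mathrm{kVA} \times 0.8 = 640\ \mathrm{kW}$ for the UPS and $1250\ \mathrm{kVA} \times 0.8 = 1000\ \mathrm{kW}$ for the generator.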

CNAF Tier-1 Report – L.Dell’Agnello


HW Resources (1/2)
• CPU:
  – 700 dual-processor boxes, 2.4–3 GHz (+70 servers)
  – 150 new dual-processor Opteron boxes, 2.6 GHz
    • 1300 kSI2k total
    • Decommissioning: ~100 WNs (~150 kSI2k) moved to a test farm
  – Each CPU connected via FE to a rack switch with 2 × Gb uplinks to the core switch
• Disk:
  – FC, IDE, SCSI and NAS technologies
  – 470 TB raw (~430 TB FC-SATA)
  – Disk servers connected via GE to the core switch

CNAF Tier-1 Report – L.Dell’Agnello


HW Resources (2/2)
• Tapes:
  – STK L180: 18 TB
  – STK 5500:
    • 6 LTO-2 drives with 2000 tapes → 400 TB
    • 2 9940B drives with 800 tapes → 200 TB
• Networking:
  – 30 rack switches: 46 FE UTP + 2 GE FO each
  – 2 core switches: 96 GE FO + 120 GE FO + 4 × 10 GE
  – Foreseen backbone upgrade to 10 Gbps
  – 3 × 1 Gbps links to the WAN (upgrade to 10 Gbps ongoing)
    • 1 Gbps production link
    • 10 Gbps Service Challenge (LHCOPN) link

CNAF Tier-1 Report – L.Dell’Agnello


Farm status
• SLC 3.0.5 / LCG 2.6 installed on the farm
  – Installation via quattor (contact: …@cern.ch)
    • Deployed an upgrade to 500 nodes in one day
  – Standard configuration of WNs for all experiments
• Migration from Torque+Maui to LSF (v6.1) last spring
  – LSF farm running successfully
  – Fair-share model for resource access
    • 1 queue per experiment (at least) – see the submission sketch after this list
  – Special MPI queue on dedicated resources (InfiniBand)
  – Progressive inclusion of the CDF farm into the general one
• Access to resources centrally managed with Kerberos (authc) and LDAP (authz)
  – Group-based authorization
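As an illustration of the one-queue-per-experiment setup, the sketch below shows how a job could be submitted to the LSF farm; it is a minimal sketch assuming only the standard bsub options (-q, -J, -o), and the queue name, job name and script path are hypothetical examples, not details taken from the slides.

# Minimal sketch: submit a job to a per-experiment LSF queue by wrapping bsub.
# The queue name, job name and script path below are hypothetical examples.
import subprocess

def submit_to_lsf(queue, script, job_name):
    """Submit `script` to the given LSF queue and return bsub's output."""
    cmd = [
        "bsub",
        "-q", queue,               # per-experiment queue
        "-J", job_name,            # job name, visible in bjobs output
        "-o", job_name + ".out",   # file collecting the job's output
        script,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()

if __name__ == "__main__":
    print(submit_to_lsf("cdf", "./run_analysis.sh", "cdf_test_job"))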

CNAF Tier-1 Report – L.Dell’Agnello


Access to the batch system
[Diagram: access to the batch system – "legacy" non-Grid access and Grid access; user interfaces (UIs), the Grid, a CE, the LSF batch system, worker nodes WN1 … WNn and an SE are shown.]

CNAF Tier-1 Report – L.Dell’Agnello


Authorization with LDAP
[Diagram: INFN LDAP directory tree rooted at c=it, o=infn. A public view contains ou=afs (the infn.it AFS cell) with the generic CNAF users and groups visible to any INFN user; a private view under the CNAF branch (ou=cnaf) contains ou=people, ou=people-nologin, ou=group, ou=role and ou=automount, holding the user, group, role, automount and no-login entries used for authorization. A minimal lookup sketch follows below.]
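To make the group-based authorization concrete, the following is a minimal sketch of a group-membership lookup against a tree of this shape using python-ldap; the server URI, base DN, attribute name (memberUid) and username are illustrative assumptions rather than details taken from the slide.

# Hedged sketch: look up the groups a user belongs to in an LDAP tree
# shaped like the one above. Server URI, base DN, attribute names and
# the example username are illustrative assumptions.
import ldap

LDAP_URI = "ldap://ldap.cnaf.infn.it"        # hypothetical server
GROUP_BASE = "ou=group,ou=cnaf,o=infn,c=it"  # group branch of the private view

def groups_for_user(uid):
    """Return the names (cn) of the groups listing `uid` as a member."""
    conn = ldap.initialize(LDAP_URI)
    conn.simple_bind_s()                     # anonymous bind, read-only query
    # Assume POSIX-style group entries carrying a memberUid attribute.
    results = conn.search_s(
        GROUP_BASE,
        ldap.SCOPE_SUBTREE,
        "(memberUid=%s)" % uid,
        ["cn"],
    )
    conn.unbind_s()
    return [attrs["cn"][0].decode() for dn, attrs in results]

if __name__ == "__main__":
    print(groups_for_user("mrossi"))         # hypothetical username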

CNAF Tier-1 Report – L.Dell’Agnello


Storage status
• Physical access to the main storage (FAStT900) via SAN
  – Level-1 disk servers connected via FC
    • Usually also in a GPFS cluster
      – Ease of administration
      – Load balancing and redundancy
      – Lustre under evaluation
  – There can be level-2 disk servers connected to the storage only via GPFS
    • LCG and FC dependencies on the OS are decoupled
• WNs are not members of the GPFS cluster (no scalability to a large number of WNs)
  – Storage is available to WNs via rfio, xrootd (BABAR only), gridftp/SRM or NFS (software distribution only)
• CASTOR HSM system (SRM interface)
  – STK library with 6 LTO-2 and 2 9940B drives (+4 to install)
    • 1200 LTO-2 (200 GB) tapes
    • 680 9940B (200 GB) tapes

CNAF Tier-1 Report – L.Dell’Agnello