Tier1A Status Martin Bly 28 April 2003

Page 1: Tier1A Status

Tier1A Status

Martin Bly, 28 April 2003

Page 2: Tier1A Status

CPU Farm

• Older hardware:
  – 108 dual-processor systems (450MHz, 600MHz and 1GHz)
  – 156 dual-processor 1400MHz PIII

• Recent delivery:
  – 80 dual 2.66GHz P4 Xeon
  – 533MHz FSB, 2GB memory

• Next delivery expected in the summer

Page 3: Tier1A Status

Operating Systems

• Operating Systems:
  – Redhat 6.2 service will close in May
  – Redhat 7.2 service has been in production for Babar for 6 months
  – New Redhat 7.3 service now available for LHC/other experiments

• Increasing demands for security updates becoming problematic.

Page 4: Tier1A Status

Disk Farm (last year)

• Last year – 26 servers, each with 2 external RAID arrays – 1.7TB disk per server:
  – Excellent performance, well balanced system
  – Problems with a bad batch of Maxtor drives – many failures and high error rate – all 620 drives now replaced by Maxtor
  – Still outstanding problems with Accusys controller failing to eject bad drives from RAID set

Page 5: Tier1A Status

Disk Farm (this year)

• Recent upgrade to disk farm:
  – 11 dual P4 servers (with PCI-X), each with 2 Infortrend IFT-6300 arrays
  – 12 Maxtor 200GB DiamondMax Plus 9 drives per array

• Not yet in production – but a few snags:
  – The originally tendered drive, the Maxtor MaxLine Plus II, was found not to exist
  – Infortrend array has 2TB limit per RAID set – some (10%) wasted space!

• Contact Nick White ([email protected]) for more info
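The "some (10%) wasted space" figure can be checked with a little arithmetic. The sketch below is only one plausible reading: it assumes each 12-drive array is a single RAID 5 set (one drive's worth of parity) and that the 2TB limit is 2000GB in decimal units. The slides state neither the RAID level nor the units, so both are assumptions, and the function name is hypothetical.

```python
def raid_set_waste(drives_per_array, drive_gb, parity_drives, limit_gb):
    """Return (usable_gb, wasted_gb, wasted_fraction) for one RAID set.

    Assumption: the whole array forms one RAID set, so any capacity
    beyond the controller's per-set limit cannot be used.
    """
    data_drives = drives_per_array - parity_drives
    usable_gb = data_drives * drive_gb
    wasted_gb = max(0, usable_gb - limit_gb)
    return usable_gb, wasted_gb, wasted_gb / usable_gb


# 12 x 200GB drives per array (from the slide); RAID 5 and a decimal
# 2000GB limit are assumptions made for this illustration.
usable, wasted, fraction = raid_set_waste(12, 200, 1, 2000)
print(f"usable {usable} GB, wasted {wasted} GB ({fraction:.1%})")
# prints: usable 2200 GB, wasted 200 GB (9.1%)
```

Under those assumptions about 9% of each array is lost, which is in line with the roughly 10% quoted on the slide.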

Page 6: Tier1A Status

New Projects

• Basic fabric performance monitoring (ganglia)

• Resource CPU accounting (based on PBS accounts/mysql)

• New CA in production

• New batch scheduler (MAUI)

• Deploy new helpdesk (May)

Page 7: Tier1A Status

Ganglia Monitoring

• Urgently needed live performance and utilisation monitoring
  – RAL Ganglia Monitoring (live)
  – RAL Ganglia Monitoring (Static)

• Scalable solution based on multicast

• Very rapidly deployable - reasonable support on all Tier1A Hardware

• See: http://ganglia.sourceforge.net/
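As an illustration of how Ganglia data can be consumed, a gmond daemon normally sends an XML dump of the cluster state to any client that connects to its TCP port (8649 by default). The sketch below is a minimal reader, not the RAL setup: the host name in the usage note, the function names and the trimmed sample document are all hypothetical, and a real dump carries many more attributes.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host, port=8649, timeout=5.0):
    """Read the XML dump gmond sends on connect; returns raw bytes."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection when the dump ends
                break
            chunks.append(data)
    return b"".join(chunks)


def parse_metrics(xml_data):
    """Map host name -> {metric name: value string} from a gmond dump."""
    root = ET.fromstring(xml_data)
    return {
        host.get("NAME"): {m.get("NAME"): m.get("VAL") for m in host.iter("METRIC")}
        for host in root.iter("HOST")
    }


# Hypothetical, heavily trimmed sample in the general shape of a gmond dump.
SAMPLE_XML = """<GANGLIA_XML SOURCE="gmond">
  <CLUSTER NAME="example-farm">
    <HOST NAME="node01">
      <METRIC NAME="load_one" VAL="0.42" UNITS=""/>
      <METRIC NAME="cpu_num" VAL="2" UNITS="CPUs"/>
    </HOST>
    <HOST NAME="node02">
      <METRIC NAME="load_one" VAL="1.97" UNITS=""/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""

metrics = parse_metrics(SAMPLE_XML)
print(metrics["node01"]["load_one"])  # prints: 0.42
```

Against a live daemon the same parser could be fed with `parse_metrics(fetch_gmond_xml("gmond.example.org"))`, where the host name is a placeholder.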

Page 8: Tier1A Status
Page 9: Tier1A Status

PBS Accounting Software

• Need to keep track of system CPU and disk usage.

• Home grown PBS accounting package (Derek Ross):
  – Upload PBS and disk stats into MySQL
  – Process with Perl DBI script
  – Serve via Apache

• http://www.gridpp.rl.ac.uk/stats

• Contact Derek ([email protected]) for more info.
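The upload-and-process step can be sketched roughly as follows. This is not Derek Ross's package: it uses Python with an in-memory SQLite database as a stand-in for Perl DBI and MySQL, the table layout and function names are hypothetical, and the sample log lines are invented. It relies only on the usual PBS accounting record shape (`date time;type;job id;key=value ...`), where `E` records carry `resources_used.cput` and `resources_used.walltime`.

```python
import sqlite3

# Invented sample records in the usual PBS accounting log shape.
SAMPLE_LOG = """\
04/28/2003 10:15:00;E;101.pbs.example;user=alice group=babar queue=prod resources_used.cput=01:00:00 resources_used.walltime=01:30:00
04/28/2003 11:00:00;S;102.pbs.example;user=bob group=lhcb queue=prod
04/28/2003 12:40:00;E;102.pbs.example;user=bob group=lhcb queue=prod resources_used.cput=00:20:00 resources_used.walltime=01:40:00
04/28/2003 13:05:00;E;103.pbs.example;user=carol group=babar queue=prod resources_used.cput=00:30:30 resources_used.walltime=00:45:00
"""


def hms_to_seconds(hms):
    """Convert an HH:MM:SS string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def parse_end_record(line):
    """Return (job_id, group, cpu_s, wall_s) for a job-end record, else None."""
    parts = line.strip().split(";", 3)
    if len(parts) != 4 or parts[1] != "E":
        return None
    job_id, message = parts[2], parts[3]
    # Simplification: assumes no attribute value contains a space.
    fields = dict(item.split("=", 1) for item in message.split() if "=" in item)
    return (
        job_id,
        fields.get("group", "unknown"),
        hms_to_seconds(fields.get("resources_used.cput", "00:00:00")),
        hms_to_seconds(fields.get("resources_used.walltime", "00:00:00")),
    )


def cpu_seconds_by_group(log_text):
    """Load job-end records into SQLite and total the CPU seconds per group."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE jobs (job_id TEXT PRIMARY KEY, grp TEXT, cpu_s INTEGER, wall_s INTEGER)"
    )
    rows = [rec for rec in map(parse_end_record, log_text.splitlines()) if rec]
    db.executemany("INSERT INTO jobs VALUES (?, ?, ?, ?)", rows)
    return dict(db.execute("SELECT grp, SUM(cpu_s) FROM jobs GROUP BY grp").fetchall())


print(cpu_seconds_by_group(SAMPLE_LOG))
```

Storing wall clock seconds alongside CPU seconds keeps both measures available, which matters when the scheduler and the accounting disagree about which one to charge for.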

Page 10: Tier1A Status

MAUI/PBS

• Maui scheduler has been in production for last 3 months.

• Allows extremely flexible scheduling with many features. But …
  – Not all of it works – we have done much work with developers for fixes.
  – Major problem – MAUI schedules on wall clock time – not CPU time. Had to bodge it!!

Page 11: Tier1A Status

New Helpdesk Software

• Old helpdesk mail based/unfriendly.

• With additional staff, urgently need to deploy new solution.

• Expect new system to be based on free software – probably Request Tracker

• Hope that deployed system will also meet needs of Testbed and may also satisfy Tier 2 sites.

• Expect deployment by end of May.

• http://requestracker.gridpp.rl.ac.uk/ (Static)

Page 12: Tier1A Status

Outstanding Issues/worries

• We have to run many distinct services. For example, FERMI Linux, RH 6.2/7.2/7.3, EDG testbeds, LCG …

• Farm management is getting very complex. We need better tools and automation.

• Security is becoming a big concern again.