CASPUR Site Report Andrei Maslennikov Lead - Systems Karlsruhe, May 2005

CASPUR Site Report

Andrei Maslennikov, Lead - Systems

Karlsruhe, May 2005

Contents

• Update on central computers
• Storage news
• Linux
• Projects 2005

Central computers

IBM SMP:
- 3 frames with 80 POWER-4 CPUs at 1.1 GHz and 144 GB of RAM
- 1 legacy frame with 64 POWER-3 CPUs at 375 MHz and 64 GB of RAM, being decommissioned
- AIX 5.2 ML4/5
- Very stable, all CPUs are heavily used
- Under lease until 2006; will probably upgrade to POWER-5 in 2005-6

HP SMP:
- New 32-CPU EV7 system arrived to replace 8 4-CPU ES45 nodes
- 32 CPUs at 1.15 GHz, RAM: 64 GB, Tru64 5.1B+
- Last of the Mohicans but extremely fast
- We will soon become one of the last places on Earth running Tru64

Itanium-2 SMP:
- 1 single-CPU, 5 dual-processor and 1 quad nodes (900 MHz, 1 GHz, 1.5 GHz)
- RH ES 3 on one node; all others run the CERN CEL3/AS3 build for ia64

Opteron SMP:
- New: 4 quad 2.2 GHz and 25 dual 2.0 GHz nodes
- SuSE 9 Professional on all nodes
- More units will be coming soon, with Infiniband (turning back to MPI)

NEC SX-6i:
- New 8-way unit, 64 GB of RAM, heavily used
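The per-platform CPU counts itemized on this slide can be tallied with a short script. This is only an illustrative sketch: the names `CENTRAL_COMPUTERS` and `cpu_totals` are hypothetical, the 80 POWER-4 CPUs are taken as the total across the three frames, and the NEC "8-way unit" is assumed to mean 8 CPUs.

```python
# CPU groups as itemized on the "Central computers" slide.
# Assumptions (not stated explicitly on the slide):
#   - 80 POWER-4 CPUs is the total over the 3 frames
#   - the NEC "8-way unit" is counted as 8 CPUs
CENTRAL_COMPUTERS = {
    "IBM SMP": [80, 64],                    # POWER-4 frames, legacy POWER-3 frame
    "HP SMP": [32],                         # single EV7 system
    "Itanium-2 SMP": [1 * 1, 5 * 2, 1 * 4], # single, dual-processor, quad nodes
    "Opteron SMP": [4 * 4, 25 * 2],         # quad nodes, dual nodes
    "NEC SX-6i": [8],                       # one 8-way unit
}


def cpu_totals(inventory):
    """Return a dict mapping each platform to its summed CPU count."""
    return {platform: sum(groups) for platform, groups in inventory.items()}


if __name__ == "__main__":
    totals = cpu_totals(CENTRAL_COMPUTERS)
    for platform, count in totals.items():
        print(f"{platform}: {count} CPUs")
    print(f"All platforms: {sum(totals.values())} CPUs")
```

The tally uses only the node counts listed on this slide, so it reflects the itemized hardware rather than any other figure quoted elsewhere in the talk.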

Common glue

We are “heterogeneous” but learned how to live with it:

- Single sign-on: K5 (Heimdal)
- Home directories: AFS on all platforms
- Large shared files: NFS on all platforms
- Batch system: SGEEE on all platforms
- Windows: AFS where needed, AD/K5 where needed

Storage news

Purchased IBM SANFS (StorTank):
- 2 MDS, 8 TB (3 x FAStT100)
- Metadata on IFT A16F-G1A2 (this saved a lot of money)
- Local area for POWER-4 cluster, but will test an Opteron port
- In semi-production since February, not yet backed up
- Backup: will most probably use our stager with the archiving option

Purchased 4 new IFT SATA/FC arrays (G2221):
- Speed doubled with respect to G1A2 (more info tomorrow)
- Will be putting these units in production shortly

Tapes: some upgrades
- Replaced LTO1s with LTO2s, doubled the number of slots in the 3584 robot
- Will replace remaining LTO1 drives with LTO3s in 3Q
- Data migration in progress

SAN: migration from Brocade to Qlogic
- Several services moved, hope to finish before August

CASPUR: principal resources in 2005

- IBM: 150 CPUs (375, 1100 MHz)
- HP: 32 CPUs (1.15 GHz)
- Itanium2: 15 CPUs (0.9-1.5 GHz)
- Opteron: 70 CPUs (2-2.5 GHz)
- NEC SX-6i: 8 CPUs
- FC tape systems: 60 / 120 TB
- FC RAID systems: 50 TB
- StorTank: 8 TB
- AFS: 6 TB
- NFS: 10 TB
- Digital Library: 16 TB
- Data Movers
- TSM Backup
- AFS Backup
- IPSAN

Linux

CASPUR BigBox distro (since 1998):
- Currently shipping ES3, fully compatible with RHEL 3
- Bought several official RedHat licenses for reference machines
- These machines are used for RPM builds and for consistency checking
- Will release ES4 for i386 and x86_64 (June 2005)

Developed a taste for SuSE:
- May be a good candidate for servers (solid 2.6, XFS etc.)
- Good for comparison and debugging, many features in common with RH
- Preferred by our Application Sector, on Opterons. Key argument: the ongoing fruitful collaboration between AMD and SuSE
- Talks now in progress with Novell; will probably purchase a site license for SLES 9
- But: no plans to build a SuSE-based distro

Some projects, 2005

Technology tracking (in collaboration with CERN and other centers) - 1 FTE
- New storage devices
- New software solutions in the field of storage
- Tested in 2005: 300+ KEuro worth of hardware

Staging IIa / Tape Dispatcher - 1.5 FTE
- Virtual Library implemented
- Tuning, cleaning, new features
- Will appear on SourceForge before the end of 2005

AFS/OSD (in collaboration with CERN and RZ Garching) - 2.3 FTE
- Implementation of an Object-based Storage Device (OSD) in accordance with T10 specs
- OSD integration with AFS