CASPUR Site Report
Andrei Maslennikov
Lead - Systems
Karlsruhe, May 2005
A.Maslennikov - Karlsruhe 2005 2
Contents
• Update on central computers
• Storage news
• Linux
• Projects 2005
Central computers
IBM SMP:
- 3 frames with 80 POWER-4 CPUs at 1.1 GHz and 144 GB of RAM
- 1 legacy frame with 64 POWER-3 CPUs at 375 MHz and 64 GB of RAM (being decommissioned)
- AIX 5.2 ML4/5
- Very stable, all CPUs are heavily used
- Under lease until 2006; will probably upgrade to POWER-5 in 2005-6
HP SMP:
- New 32-CPU EV7 system arrived to replace eight 4-CPU ES45 nodes
- 32 CPUs at 1.15 GHz, RAM: 64 GB, Tru64 5.1B+
- Last of the Mohicans, but extremely fast
- We will soon become one of the last places on Earth running Tru64
Itanium-2 SMP:
- 1 single-CPU, 5 dual-CPU and 1 quad-CPU nodes (900 MHz, 1 GHz, 1.5 GHz)
- RH ES 3 on one node, all others run CERN CEL3/AS3 Build for ia64
Opteron SMP:
- New: 4 quad-CPU nodes at 2.2 GHz, 25 dual-CPU nodes at 2.0 GHz
- SuSE 9 Professional on all nodes
- More units will be coming soon, with InfiniBand (turning back to MPI)
NEC SX-6i:
- New 8-way unit, 64 GB of RAM
- Heavily used
Common glue
We are “heterogeneous” but learned how to live with it:
- Single sign-on: K5 (Heimdal)
- Home directories: AFS on all platforms
- Large shared files: NFS on all platforms
- Batch system: SGEEE on all platforms
- Windows: AFS where needed, AD/K5 where needed
Storage news
Purchased IBM SANFS (StorTank):
- 2 MDS, 8 TB (3 x FASTT100)
- Metadata on IFT A16F-G1A2 (this saved a lot of money)
- Local area for the POWER-4 cluster, but we will test an Opteron port
- In semi-production since February, not yet backed up
- Backup: will most probably use our stager with the archiving option
Purchased 4 new IFT SATA/FC arrays (G2221):
- Speed doubled with respect to G1A2 (more info tomorrow)
- Will be putting these units in production shortly
Tapes: some upgrades
- Replaced LTO1s with LTO2s, doubled the number of slots in the 3584 robot
- Will replace remaining LTO1 drives with LTO3s in 3Q
- Data migration in progress
SAN: migration from Brocade to QLogic
- Several services moved, hope to finish before August
CASPUR: principal resources in 2005
Itanium2 – 15 CPUs (0.9-1.5 GHz)
HP - 32 CPUs (1.15GHz)
FC tape systems - 60 / 120 TB
FC RAID systems - 50 TB
AFS - 6TB
NFS - 10 TB
Data Movers
Digital Library 16TB
TSM Backup
NEC SX-6i – 8 CPUs
Opteron – 70 CPUs (2-2.5 GHz)
StorTank - 8 TB
IP SAN
AFS Backup
IBM - 150 CPUs (375, 1100 MHz)
Linux
CASPUR BigBox distro (since 1998):
- Currently shipping ES3, fully compatible with RHEL 3
- Bought several official RedHat licenses for reference machines
- These machines are used for RPM builds, and for consistency checking
- Will release ES4 for i386 and x86_64 (June 2005)
Developed a taste for SuSE:
- May be a good candidate for servers (solid 2.6, XFS etc.)
- Good for comparison and debugging, many features in common with RH
- Preferred by our Application Sector, on Opterons. Key argument: the ongoing fruitful collaboration between AMD and SuSE
- Talks with Novell are in progress; we will probably purchase a site license for SLES 9
- But: no plans to build a SuSE-based distro
Some projects, 2005
Technology tracking (in collaboration with CERN and other centers) – 1 FTE
- New storage devices
- New software solutions in the field of storage
- Tested in 2005: 300+ KEuro worth of hardware
Staging IIa / Tape Dispatcher – 1.5 FTE
- Virtual Library implemented
- Tuning, cleaning, new features
- Will appear on SourceForge before the end of 2005
AFS/OSD (in collaboration with CERN and RZ Garching) – 2.3 FTE
- Implementation of an Object-based Storage Device (OSD) in accordance with T10 specs
- OSD integration with AFS