John Bland, Robert Fay


John Bland, Robert Fay

Liverpool HEP - Site Report
May 2012

Current Hardware - Users

Desktops
- ~100 desktops: Scientific Linux 5.5, Windows 7+XP, legacy systems
- Minimum spec of 2.66GHz Q8400, 4GB RAM + TFT monitor
- Recently upgraded, clean installs, single platform
- Opportunistic batch usage (~60 cores)

Laptops
- ~60 laptops: mixed architecture
- Windows+VM, MacOS+VM, netbooks

Printers
- Samsung and Brother desktop printers
- Various HP model heavy-duty group printers

Current Hardware - Tier 3 Batch

Tier3 Batch Farm
- Software repository (0.5TB), storage (3TB scratch, 13TB bulk)
- medium64 and short64 queues consist of 9 64-bit SL5 nodes (2x L5420, 2GB/core)
- Plus desk64 for the aforementioned desktop usage, and all64
- 2 of the 9 SL5 nodes can also be used interactively
- 5 older interactive nodes (dual 32-bit Xeon 2.4GHz, 2GB/core)
- Using Torque/PBS/Maui + fairshares
- Used for general, short analysis jobs
- Grid jobs are occasionally run opportunistically on this cluster (not much recently due to steady local usage)
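A typical short analysis job on a Torque/PBS farm like this would be submitted with a directive-style job script. A minimal sketch, assuming the short64 queue named above (the script contents and resource limits are illustrative, not the site's actual jobs):

```shell
#!/bin/sh
# Write a hypothetical PBS job script targeting the short64 queue.
cat > analysis.pbs <<'EOF'
#PBS -q short64
#PBS -l nodes=1:ppn=1,mem=2gb,walltime=01:00:00
#PBS -N short-analysis
cd "$PBS_O_WORKDIR"
./run_analysis            # placeholder for the user's analysis binary
EOF
# On the farm itself this would be submitted with: qsub analysis.pbs
echo "job script written: analysis.pbs"
```

Maui's fairshare then decides scheduling order across users, which matches the "general, short analysis jobs" usage described above.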

Current Hardware - Servers
- ~40 core servers (HEP + Tier2)
- Some rack Gigabit switches
- 1 high-density Force10 switch (400 ports)
- Moving to 10Gb
- Console access via KVMoIP (when it works) + IPMI
- KVM installed

LCG Servers
- Most services on virtual machines (lcg-CE, CREAM*2, sBDII, APEL, Torque, ARGUS, test cluster)
- Still more to come (UI)

KVM
- Redundant KVM servers for load sharing and testing

Current Hardware - Virtual
- Heavy-duty KVM servers for grid and HEP services

[Diagram: production and testing KVM hosts (HVM1 24 CPU, HGVM3 24 CPU, HGVM1 8 CPU, HGVM2 8 CPU) with shared storage over NFS/iSCSI]

Current Hardware - Nodes

HAMMER
- Mixed nodes:
  - 7 x dual L5420
  - 64 x dual E5620
  - 16 x dual X5650
- Lots of room to work in
- IPMI+KVM makes life so much easier
- Also hot-swap drives and motherboards
- IPMI monitoring not entirely trustworthy
  - Sensor reading spikes
  - Occasionally needs rebooting

Configuration and Deployment
- Kickstart used for OS installation and basic post-install
  - Used with PXE boot for some servers and all desktops
- Puppet used for post-kickstart node installation (glite-WN, YAIM etc)
  - Also used for keeping systems up to date and rolling out packages
  - And used on desktops for software and mount points
- Custom local "testnode" script to periodically check node health and software status
  - Nodes put offline/online automatically
- Keep local YUM repo mirrors, updated when required, no surprise updates
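The automatic offline/online behaviour can be sketched as a small health-check function. This is a hypothetical reconstruction, not the site's actual testnode script; the thresholds and the checks (scratch-disk usage, free swap) are assumptions:

```shell
#!/bin/sh
# Hypothetical sketch of a "testnode"-style health check: evaluate a couple
# of node health indicators and print the Torque action that would be taken.
check_node() {
    disk_used_pct=$1   # % used on the scratch partition
    swap_free_mb=$2    # free swap in MB
    if [ "$disk_used_pct" -ge 95 ] || [ "$swap_free_mb" -lt 100 ]; then
        echo "OFFLINE"   # would run: pbsnodes -o -N "health check failed" <node>
    else
        echo "ONLINE"    # would run: pbsnodes -c <node>
    fi
}
check_node 50 2048   # healthy node -> ONLINE
check_node 97 2048   # full scratch disk -> OFFLINE
```

Running such a check from cron on each node, and driving `pbsnodes` from the result, keeps unhealthy nodes from accepting jobs without manual intervention.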

HEP Network
- Grid cluster is on a sort-of separate subnet (138.253.178/24)
  - Shares some of this with local HEP systems
  - Most of these addresses may be freed up with local LAN reassignments
- Monitored by Cacti/weathermap, Ganglia, sFlow/ntop (when it works), Snort (sort of)
- Grid site behind local bridge/firewall, 2G to CSD, 1G to Janet
  - Shared with other University traffic
  - Upgrades to 10G for WAN soon (now in progress!)
- Grid LAN under our control; everything outside our firewall controlled by our Computing Services Department (CSD)
- We continue to look forward to collaborating with CSD to make things work

Network Monitoring
- Ganglia on all worker nodes and servers
- Cacti used to monitor building switches and core Force10 switch
  - Throughput and error readings + weathermap
- Ntop monitors core Force10 switch, but still unreliable
- sFlowTrend tracks total throughput and biggest users, stable
- LanTopolog tracks MAC addresses and building network topology
- arpwatch monitors ARP traffic (changing IP/MAC address pairings)

Monitoring - Cacti
[Screenshots: Cacti graphs and weathermap]

Current Hardware - Network

- Getting low on cables
- Network capacity our biggest upcoming problem
- Moving to 10Gb. Hurrah!
- Bonding was getting ridiculous as density increased
  - But worked very well

HEP Network Topology
[Diagram: old 1G network. 192.168 research VLANs (1500 MTU) alongside the 9000 MTU cluster LAN; servers, storage, nodes and the E600 chassis on 1-10G links; 2G towards the WAN]

New 10Gb Kit
- Switches: 3 x S4810 core switches, 10 x S55 rack switches
- Optics: 4 x LR, 2 x SR
- NICs: 40 x SolarFlare dual-port low-latency 10Gb
- Boxes: 3 x Dell R610, 1 x Dell R710

New 10G Network
[Diagram: 3 x S4810 core switches with 80G interlinks; S55 rack switches with 30G uplinks to the nodes; servers and storage on the core; 20G towards the WAN; 2G legacy links; capacity reserved for future growth]

Initial Testing
- Jumbo frames still of benefit to bandwidth and CPU usage
- Packet offloading turned on makes a bigger difference, but should be on by default anyway
- 10Gb switches now installed in racks
- Relocation of servers upcoming
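The jumbo-frame benefit comes from amortising fixed per-packet costs (headers, interrupts) over a larger payload. A sketch of the tuning commands (commented out; the interface name is an assumption) plus a rough packet-count comparison:

```shell
#!/bin/sh
# Enable jumbo frames and segmentation/receive offloads on a 10G NIC
# (commands commented out; "eth2" is an assumed interface name):
#   ip link set dev eth2 mtu 9000
#   ethtool -K eth2 tso on gso on gro on

# Rough comparison: packets needed to carry 1 GB of payload at each MTU.
for mtu in 1500 9000; do
    echo "MTU $mtu: $(( 1000000000 / mtu )) packets per GB"
done
```

Six times fewer packets means proportionally fewer per-packet header bytes and fewer interrupts to service, which is where the CPU-usage saving noted above comes from.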

Storage
- Majority of file stores using hardware RAID6
  - Mix of 3ware and Areca SATA controllers and Adaptec SAS/SATA
  - Arrays monitored with 3ware/Areca software and Nagios plugins
- Software RAID1 system disks on all servers
  - A few RAID10s for bigger arrays, and RAID0 on WNs
- Now have ~550TB RAID storage in total
  - Getting to be a lot of spinning disks (~700 enterprise drives in WNs, RAID and servers)
  - Keep many local spares
- Older servers were upgraded 1TB -> 2TB
  - Trickle-down of 1TB -> 0.75TB -> 0.5TB -> 0.25TB upgrades
- Also beefed up some server local system/data disks
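A software RAID1 system-disk mirror of the kind mentioned above is typically built with mdadm. A sketch, with the commands commented out since device names are assumptions, plus the RAID6 capacity arithmetic (two disks per array go to parity):

```shell
#!/bin/sh
# Mirror the system disk across two drives (hypothetical device names):
#   mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
#   mdadm --detail --scan >> /etc/mdadm.conf

# Usable capacity of a hardware RAID6 array: (disks - 2) * disk size.
raid6_usable_tb() { echo $(( ($1 - 2) * $2 )); }
raid6_usable_tb 12 2   # e.g. a 12 x 2TB shelf -> 20 TB usable
```

With ~550TB usable across many such arrays, the two-disk parity overhead is the price of surviving a double drive failure, which matters at ~700 spinning disks.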

Central Clusters
- Spent last few years trying to hook up UKI-NORTHGRID-LIV-HEP to NWGRID over at the Computing Services Department (CSD)
- Never really worked
  - Too many problems with OS versions
  - SGE awkward
  - Shared environment and configuration by proxy is painful
- New £1 million procurement now in progress
  - We're on the procurement committee
  - Physics requirements included from the outset
  - Steering and technical committees will be formed
  - Looks promising

Security

Network security
- University firewall filters off-campus traffic
- Local HEP firewalls to filter on-campus traffic
- Monitoring of LAN devices (and blocking of MAC addresses on switch)
- Single SSH gateway, DenyHosts
- Snort and BASE (still need to refine rules to be useful, too many alerts)

Physical security
- Secure cluster room with swipe-card access
- Laptop cable locks (occasionally some laptops stolen from building)
- Promoting use of encryption for sensitive data
- Parts of HEP building publicly accessible
  - We had some (non-cluster) kit stolen

Logging
- Server system logs backed up daily, stored for 1 year
- Auditing logged MAC addresses to find rogue devices
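The MAC-address audit can be done with standard tools: diff the set of MACs seen on the LAN (e.g. from arpwatch or switch logs) against a known-device list. A minimal sketch with made-up file names and addresses:

```shell
#!/bin/sh
# Hypothetical MAC audit: any address seen on the wire but absent from the
# known-device list is a rogue-device candidate.
printf 'aa:bb:cc:00:00:01\naa:bb:cc:00:00:02\n' | sort > known.txt
printf 'aa:bb:cc:00:00:02\nde:ad:be:ef:00:01\n' | sort > seen.txt
# comm -13: suppress lines unique to known.txt and lines in both,
# leaving only MACs that were seen but are not known.
comm -13 known.txt seen.txt
```

In practice the seen.txt side would be fed from the arpwatch database or switch MAC tables mentioned elsewhere in this report, and hits would trigger the per-port MAC blocking noted above.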

Plans and Issues

Research Computing
- Research now more of an emphasis
- Various projects in the pipeline
- GPGPU work
  - Tesla box (dual Tesla M2070s)
  - Looking forward to MIC
- Cluster room cooling still very old, regular failures
  - A bit more slack after the MAP2 removal, but failures increasing
  - However, central Facilities Management maintenance has improved
- University data centre strategy continues to evolve
  - There were plans to move CSD kit into the room, but this is not happening as things stand
- While we're on the subject of cooling, we had an incident...

It's Getting Hot in Here
- One Saturday morning, a large compressor in the building basement blew a fuse
  - This took out the main building power
  - Our air-conditioning is on the main building power
  - But our cluster power is not
- Ambient temperature hit ~60°C before building power was restored
  - A bit warmer than Death Valley and Africa at their hottest
- We had some measures in place
  - Some systems shut down
- But there were some problems
  - External network link went down with building power failure
  - Need to automatically switch off systems more aggressively
- Aftermath
  - Some hard drives failed
  - Some insist on telling us about this one time it got really hot (smartctl health check return code has bit 5 set, indicating a passed check but an attribute has been at or below threshold in the past)
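That smartctl behaviour follows from its exit status being a bitmask (documented in smartctl(8)): bit 5 means the drive passes its health check now, but some attribute was at or below threshold in the past, exactly what an overheating incident leaves behind. A small decoder sketch:

```shell
#!/bin/sh
# Decode bit 5 of a smartctl exit status: "passed now, but an attribute
# was at/below threshold in the past" (per smartctl(8)).
smart_past_threshold() {
    if [ $(( ($1 >> 5) & 1 )) -eq 1 ]; then
        echo "attribute was below threshold in the past"
    else
        echo "no past-threshold flag"
    fi
}
# Typical use on a real system (not run here):
#   smartctl -H /dev/sda; smart_past_threshold $?
smart_past_threshold 32   # only bit 5 set
smart_past_threshold 0
```

Checking individual bits rather than treating any non-zero exit as failure keeps Nagios-style disk checks from paging on drives that merely remember a bad day.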