DRI Grant impact at the smaller sites
Pete Gronbech, GridPP29, Oxford, September 2012
Target Areas
• Internal cluster networking
• Cluster to JANET interconnect
• Resilience and redundancy
Cluster Networking
• Most sites' clusters have been interconnected at 1Gb/s.
• As storage servers grew from ~20TB to 40TB, and even larger 36-bay units with usable capacities of ~70TB appeared, the network links had to be increased to cope with the number of simultaneous connections from worker nodes.
• Many sites decided to use trunked or bonded 1Gb/s links, working on the basis of roughly one 1Gb/s link per 10TB.
• This no longer scales for the very large servers, which need 6 bonded links.
• The cost of gigabit networking starts to look high when you have to divide the number of ports on a switch by 6 (see the sketch below).
• 10Gbit switch prices are coming down.
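A minimal sketch of that rule-of-thumb arithmetic, in Python; the server capacities and the 48-port switch size are illustrative assumptions, not figures from the sites:

```python
import math

def bonded_links_needed(capacity_tb: float, tb_per_link: float = 10.0) -> int:
    """Links suggested by the 'one 1Gb/s link per 10TB' rule of thumb."""
    return math.ceil(capacity_tb / tb_per_link)

for capacity_tb in (20, 40, 70):  # server sizes quoted on the slide
    print(f"{capacity_tb}TB server -> {bonded_links_needed(capacity_tb)} x 1Gb/s links")

# Why gigabit stops making sense: at 6 bonds per large server, a
# 48-port gigabit switch (assumed size) serves only 48 // 6 = 8 servers.
print(f"Servers per 48-port gigabit switch at 6 bonds each: {48 // 6}")
```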
DRI Grant
• Has allowed the sites to make the jump to 10Gbit switches in the cluster earlier than they would otherwise have planned.
• Has allowed some degree of future-proofing by providing enough ports to cover expected cluster expansion over the next few years.
• Replacing bonded gigabit with 10Gbit simplifies and tidies up the cabling and configuration (less to go wrong, hopefully).
Campus connectivity
• Many Grid clusters had 1 or 2Gbit/s connections to the campus WAN.
• Many sites have used grant funding to install routers that connect to the campus backbone at 10Gbit.
• If the campus backbone is made up of 10Gbit links, the danger is that the grid cluster could saturate some of them, blocking other traffic to the JANET connection.
• Links therefore have to be doubled up on the route to the campus router.
• The JANET connection to the university has to be increased, or the Grid link capped, to allow both Grid and campus traffic to flow unhindered (see the sketch below).
• The alternative is to install a bypass link directly to the JANET router.
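A back-of-the-envelope check of the saturation risk, with made-up traffic figures, showing why links get doubled, capped, or bypassed:

```python
# All figures are illustrative assumptions, not measured campus traffic.
backbone_link_gbps = 10.0    # single shared campus backbone link
grid_peak_gbps = 10.0        # grid cluster now uplinked at 10Gbit
other_campus_gbps = 3.0      # assumed background campus demand

total = grid_peak_gbps + other_campus_gbps
if total > backbone_link_gbps:
    print(f"{total}Gb/s of demand on a {backbone_link_gbps}Gb/s link: "
          "double the link, cap the grid traffic, or bypass to JANET.")
```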
Resilience
• Where network upgrades cost less than anticipated, some funds were used to upgrade critical service nodes or infrastructure.
• Storage server head nodes, caching servers, UPSs and improved firewalls were the items chosen by different institutes.
• All sites were allocated some funds to purchase monitoring nodes; the original intention was to run Gridmon, but the plan changed to PerfSonar (see the sketch below).
• The end result is that the Grid clusters at the sites are in a much stronger position than before and will provide a robust service.
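The monitoring nodes run perfSONAR's scheduled bandwidth tests; as a stand-in, here is a hedged sketch that wraps a plain iperf3 client test in Python (the endpoint name is a placeholder, not a real host):

```python
import json
import subprocess

def throughput_gbps(server: str, seconds: int = 10) -> float:
    """Run an iperf3 client test against `server`, return mean Gb/s sent."""
    result = subprocess.run(
        ["iperf3", "-c", server, "-t", str(seconds), "-J"],  # -J = JSON output
        capture_output=True, text=True, check=True,
    )
    report = json.loads(result.stdout)
    return report["end"]["sum_sent"]["bits_per_second"] / 1e9

# Placeholder host, not a real perfSONAR endpoint:
print(f"{throughput_gbps('ps-test.example.ac.uk'):.2f} Gb/s")
```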
Careful planning required!
Site | Cluster networking switches | Campus switches | Comments
Birmingham | Dell Force 10 S4810 and S60 switches plus NICs | Fibres provided to bypass the campus backbone to a separate 10Gbit JANET connection | Main Grid cluster will have a 10Gbit connection to the PP part of the shared cluster
Bristol | Cisco switches | Fibres and connections |
Brunel | Cisco Nexus 5596UP 10GE switches, and Cisco 3750E 1Gbit switches allowing 2-channel bonds to WNs | 4Gb/s of the 10Gbit JANET connection | CMS site; a 72TB cache server purchased to act as a buffer between pool nodes and WNs
Cambridge | Dell 8024F switches used to provide 10Gbps to SE head and pool nodes | Full 10Gbps connectivity to JANET via new fibres |
Durham | HP 5412 gigabit switches with 8x 10Gbps ports | |
ECDF | IBM BNT G8264R | Cisco? |
Glasgow | Extreme Summit X670V & X460-48t | | Mainly concentrated on cluster infrastructure
Imperial | 10Gb/s infrastructure for storage | Connects to the 40Gb/s college connection |
Lancaster | Force 10 Z9000 & S4810 | |
Liverpool | Force 10 S4810 & S55, SolarFlare NICs | |
Manchester | Dell 8024 | 10Gbit campus connection |
Oxford | Force 10 S4810 | Cisco 4900M campus LAN switches | 10Gbit JANET link throttled to 5Gbps
Edinburgh
Glasgow
Lancaster
That Network Upgrade...
The mad-scramble network uplift plan for Lancaster took a three-pronged approach:
1. Upgrade and shanghai the University's backup link: 10G (mostly) just for us.
2. Increase connectivity to the campus backbone, and thus between the two "halves" of the grid cluster and the local HEP cluster.
3. Add capacity for 10G networking to our cluster using a pair of Z9000 core switches and half a dozen S4810 rack switches.
This frees up some of the current switches, which can be retasked to improve the HEP cluster networking.
Liverpool
Oxford
QMUL
RHUL
• Now have 2x 1Gb/s trunked links to JANET; the second link was added on 7th March but could not be utilised until the old 1Gb/s firewall was replaced.
• Network upgraded from a stack of 8x Dell PC6248 (1Gb/s) switches to a 2x Force 10 S4810 10Gb/s spine, with the PC6248s attached as leaves by 2x 10Gb/s to each F10 (oversubscription sketch below).
• The old 1Gb/s firewall is out of warranty/support and is to be replaced soon with a Juniper SRX650 (7Gb/s max).
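A quick oversubscription check for that spine/leaf layout, assuming each PC6248 leaf has 48 node-facing 1Gb/s ports (an assumption; the slide does not say):

```python
# Illustrative arithmetic only; the port counts below are assumptions.
leaf_down_gbps = 48 * 1.0        # 48x 1Gb/s worker-node-facing ports
leaf_up_gbps = 2 * 2 * 10.0      # 2x 10Gb/s uplinks to each of 2 S4810s

print(f"Leaf oversubscription: {leaf_down_gbps / leaf_up_gbps:.2f}:1")
# ~1.2:1, close to non-blocking for worker-node traffic.
```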
Sheffield
Sussex
• 4x 36-port InfiniBand switches
• IB switches arranged in a fat-tree topology (port arithmetic sketched below)
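Port arithmetic for one plausible fat-tree arrangement of the four 36-port switches (the leaf/spine split is an assumption; the slide only names the topology):

```python
# Assumed arrangement: 2 leaf + 2 spine switches, 36 ports each.
ports_per_switch = 36
leaves, spines = 2, 2
down_per_leaf = ports_per_switch // 2            # 18 ports face compute nodes
up_per_leaf = ports_per_switch - down_per_leaf   # 18 ports face the spines

nodes = leaves * down_per_leaf
print(f"{nodes} nodes at full bisection bandwidth, "
      f"{up_per_leaf} uplinks per leaf shared across {spines} spines")
```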
Common Themes
• Well-planned cluster networking, balanced and future-proof.
• A vast improvement on the ad hoc, cost-limited designs they replaced.
• The upgrades have brought tangible benefits.
FTS Transfer Rates
[Plots: FTS transfer rates to Oxford and from Oxford]
Benefits
• In August 2012, transfers of files to Oxford hit the 5Gbit/s rate cap for several hours (see the arithmetic below).
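For scale, the arithmetic behind that headline (the duration is an assumed stand-in for "several hours"):

```python
cap_gbps = 5.0     # JANET link rate cap at Oxford
hours = 4.0        # assumed value for "several hours"

terabytes = cap_gbps / 8 * 3600 * hours / 1000   # Gb/s -> GB/s -> TB
print(f"{hours:.0f}h at the {cap_gbps}Gb/s cap ~= {terabytes:.1f}TB moved")
```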
Performance Tuning / Future
• Now need to concentrate on improving FTS transfers to the remaining slow sites (a triage sketch follows below).
• Good monitoring is required, both locally and nationally.
• PerfSonar is being installed across the sites (see next talk).
• Work with JANET and site networking teams to increase JANET connectivity where required.
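A sketch of that triage step: rank measured per-site transfer rates against a target. The site names and rates here are hypothetical examples, not real FTS data:

```python
target_gbps = 2.0  # assumed per-site goal
rates_gbps = {"SiteA": 4.7, "SiteB": 0.6, "SiteC": 1.1}  # hypothetical

slow_sites = sorted(
    (site for site, rate in rates_gbps.items() if rate < target_gbps),
    key=rates_gbps.get,  # slowest first
)
print("Investigate first:", slow_sites)
```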