KIT – University of the State of Baden-Württemberg and National Laboratory of the Helmholtz Association
STEINBUCH CENTRE FOR COMPUTING - SCC
www.kit.edu
Preview of a Novel Architecture for Large Scale Storage
Andreas Petzold, Christoph-Erdmann Pfeiler, Jos van Wezel
Steinbuch Centre for Computing
CHEP 2013, Amsterdam
Storage Management Systems at GridKa
dCache for ATLAS, CMS, LHCb
- 6 PB disk-only
- 3 PB tape buffers
- 287 pools on 58 servers
- agnostic to underlying storage technology

Scalla xrootd for ALICE
- 2.7 PB disk-only/tape buffer
- 15 servers
- agnostic to underlying storage technology
[Diagram: the storage stack, top to bottom: SRM → Storage Management System → File System → Linux → Storage Controller → Disks]
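One way to read the stack: each layer only talks to the layer directly below it, which is what keeps the management system agnostic to the storage technology underneath. The following toy sketch (my own illustration, not GridKa code) just traces a request down the layers.

# Toy illustration (not GridKa code): a request descends the stack one
# layer at a time; swapping a lower layer does not affect the upper ones.
STACK = ["SRM", "Storage Management System", "File System",
         "Linux", "Storage Controller", "Disks"]

def descend(request):
    for upper, lower in zip(STACK, STACK[1:]):
        print(f"{upper} -> {lower}: {request}")

descend("read example.root")  # the file name is a made-up example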
Current GridKa Disk Storage Technologies
9 x DDN S2A9900
- 150 enclosures, 9000 disks, 796 LUNs
- SAN: Brocade DCX

1 x DDN SFA10K
- 10 enclosures, 600 disks

1 x DDN SFA12K
- 5 enclosures, 360 disks
Current GridKa Tape Storage Technologies
2 x Oracle/Sun/STK SL8500
- 2 x 10088 slots
- 22 LTO5 and 16 LTO4 drives

1 x IBM TS3500
- 5800 slots
- 24 LTO4 drives

1 x GRAU XL
- 5376 slots
- 16 LTO3 and 8 LTO4 drives
GridKa Storage Units
Servers
- connect directly to storage, or via a SAN (a SAN is not strictly required)

Cluster filesystem (GPFS)
- connects several servers
- filesystem visible on all nodes
- predictable IO throughput
- nice storage virtualisation layer

Currently evaluating alternative filesystems: XFS, ZFS, BTRFS, EXT4

Interconnects: 10G Ethernet, Fibre Channel
Novel Storage Solutions for GridKa
Expect a large resource increase in 2015
- chance to look at new solutions during LHC LS1
- simplification in operations and deployment required

Solution 1: DataDirect Networks SFA12K-E
- server VMs run embedded in the storage controller

Solution 2: Rausch Netzwerktechnik BigFoot
- more conventional setup; server directly connected to local disks
Shortening Long IO Paths
[Figure: IO paths from the application ("begin {do_io} ... end") on the worker node down to the disks]

Traditional storage at GridKa:
application → worker node NIC → switch → router → server NIC → server HBA → SAN switch → HBA → storage controller → SAS/FC fabric → disks

DDN SFA12K (current, InfiniBand-attached):
application → worker node NIC → switch → router → server NIC → server IB HCA → IB HCA → storage controller → SAS/FC fabric → disks

Previewed storage at GridKa:
- BigFoot: application → worker node NIC → switch → router → server NIC → server → SAS/FC fabric → disks
- DDN SFA12K-E: application → worker node NIC → switch → router → NIC → storage controller → SAS/FC fabric → disks

The components between server and storage (HBAs, SAN switch, IB HCAs) are marked as potentially redundant network elements; the previewed setups remove them from the IO path.
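A minimal sketch of the idea behind the figure: summing per-hop latencies along each path shows how dropping the SAN stage shortens the IO path. Only the DCX figure (2.1 μs per traversal) and the SSD service time (4 μs) come from the latency slides below; the other hop costs are invented placeholders.

# Rough sketch (not from the slides): sum assumed per-hop latencies along
# the two IO paths above. Only "san_switch" (Brocade DCX, 2.1 us) and "ssd"
# (4 us at 250k IOPS) are from the talk; the rest are made-up placeholders.
HOP_LATENCY_US = {
    "lan": 50.0,          # worker node -> server over Ethernet (assumed)
    "hba": 1.0,           # per HBA traversal (assumed)
    "san_switch": 2.1,    # Brocade DCX, per traversal (from the slides)
    "controller": 10.0,   # storage controller (assumed)
    "sas_fc": 1.0,        # SAS/FC fabric (assumed)
    "ssd": 4.0,           # SSD service time at 250,000 IOPS (from the slides)
}

TRADITIONAL = ["lan", "hba", "san_switch", "hba", "controller", "sas_fc", "ssd"]
PREVIEWED   = ["lan", "controller", "sas_fc", "ssd"]   # SFA12K-E style path

for name, path in [("traditional", TRADITIONAL), ("previewed", PREVIEWED)]:
    total = sum(HOP_LATENCY_US[h] for h in path)
    print(f"{name}: {len(path)} hops, ~{total:.1f} us per IO")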
DDN SFA12K-E Architecture
- 10 GE NIC dedicated to each VM
- 8 VMs per SFA pair
- DDN VM driver: DMA access to storage

[Figure: worker nodes attach via 10 GE NICs to four VMs per controller (VM1 running xrootd); each VM uses the sfablk driver against the SFA OS cache, which reaches the SSDs through IO bridges and a SAS switch; the layout is mirrored on the second controller of the pair]
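An illustrative model of that layout; the class and attribute names are my own and only encode the facts on this slide, not DDN's actual API.

# Illustrative model (assumptions, not DDN's API) of the SFA12K-E layout:
# a controller pair hosts up to 8 VMs, each with a dedicated 10 GE NIC and
# direct (DMA) block access through the vendor's sfablk driver.
from dataclasses import dataclass, field

@dataclass
class VM:
    name: str
    nic: str                      # dedicated 10 GE NIC (slide)
    block_driver: str = "sfablk"  # DDN VM driver, DMA access to storage (slide)

@dataclass
class SFAPair:
    vms: list = field(default_factory=list)
    MAX_VMS = 8                   # 8 VMs per SFA pair (slide)

    def add_vm(self, name):
        if len(self.vms) >= self.MAX_VMS:
            raise RuntimeError("an SFA pair hosts at most 8 VMs")
        vm = VM(name=name, nic=f"10GE-{len(self.vms)}")
        self.vms.append(vm)
        return vm

pair = SFAPair()
xrootd_vm = pair.add_vm("xrootd-data-server")  # e.g. VM1 runs xrootd (slide)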
BigFoot Architecture
- 10 GE NIC

[Figure: worker nodes attach via 10 GE to the BigFoot server's NIC; the server's OS and cache drive a SAS HBA/RAID controller that is directly connected to the local SSDs]
Thoughts on Effect of SAN and Long Distance Latency
IO is different for reads and writes
- writes can be done asynchronously, but if you do them synchronously (as NFS does) you have to wait for the ACK
- workloads are read-mostly, and reads are synchronous
- reading huge files linearly defeats the caches in servers

SSD speeds
- example: enterprise MLC SSD, Violin Memory, Fusion-io drive
- 250,000 IOPS (lower estimate) at 4 KiB per IO → 1/250,000 s = 4 μs per IO

SAN latency
- DCX director: 2.1 μs per traversal → 4.2 μs round trip
- not accounting for the HBA or the fibre length (300 m = 1 μs)
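The arithmetic behind this slide and the next, worked through: adding the SAN round trip to the device service time gives the reduced IOPS figures quoted below. The device numbers for the SSD and the SAN are from the slides; the ~5 ms magnetic-disk service time is an assumed typical value.

# Worked version of the slides' arithmetic. SSD_US and SAN_ROUND_TRIP_US are
# from the talk; DISK_US (~5 ms per IO for a magnetic disk) is my assumption.
def effective_iops(device_us, extra_us=0.0):
    """IOPS once per-IO latency overhead is added to the device service time."""
    return 1e6 / (device_us + extra_us)

SAN_ROUND_TRIP_US = 4.2   # 2 x 2.1 us through the DCX director
SSD_US = 4.0              # 250,000 IOPS -> 4 us per IO
DISK_US = 5000.0          # ~5 ms per IO, assumed typical magnetic disk

print(effective_iops(SSD_US))                     # 250000.0
print(effective_iops(SSD_US, SAN_ROUND_TRIP_US))  # ~121951, as on the next slide
print(effective_iops(DISK_US, SAN_ROUND_TRIP_US)) # ~199.8, i.e. negligible impact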
Thoughts on Effect of SAN and Long Distance Latency (cont.)
- Effect on high-speed SSDs: 121,951 IOPS instead of 250,000 IOPS
- similar for disk controller caches when used for VMs or data servers
- Negligible impact of the SAN for magnetic disks: 199 IOPS
- Putting the SSD next to the server can therefore increase the number of IOPS; this assumption will be tested in the coming weeks
Expected Benefits
Reduced latency
- HBA and SAN: 2.1 μs (e.g. Brocade DCX, blade to blade)
- storage controller
- improved IO rates

Reduced power
- server and controller HBAs, SAN switch no longer needed
- ~300-400 W ≈ ~600 Euro/server/year (see the estimate below)

Reduced investment
- 500-600 €/HBA, 400-600 €/switch port = 1800-2400 €

Improved MTBF
- fewer components
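A back-of-the-envelope check of these savings. The electricity price and the dual-path reading of the investment figure are my assumptions, not from the slides.

# Sanity check of the savings above. The power price and the dual-path
# assumption are mine; the W and Euro ranges are from the slide.
HOURS_PER_YEAR = 24 * 365
EUR_PER_KWH = 0.18                    # assumed electricity price

for watts in (300, 400):              # power saved per server (slide)
    eur = watts / 1000 * HOURS_PER_YEAR * EUR_PER_KWH
    print(f"{watts} W -> {eur:.0f} Euro/server/year")  # ~473-631, matching ~600

# The 1800-2400 Euro range is consistent with two HBAs plus two switch ports
# per server, i.e. a redundant dual-path setup (my interpretation).
low  = 2 * 500 + 2 * 400   # 1800
high = 2 * 600 + 2 * 600   # 2400
print(low, high)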
Possible Drawbacks
Loss of flexibility
- without a SAN, storage building blocks are larger
- limited server access to storage blocks
- storage systems are only connected via the LAN

VMs inside the storage controller (DDN SFA12K-E)
- competition for resources
- the limited number of VMs limits the "server/TB" ratio
- loss of redundancy

Simple server-attached storage (BigFoot)
- limited by the simple hardware controller
- hardware administration doesn't scale to 100s of boxes
- no redundancy
Glimpse at Performance
Preliminary performance evaluation
- IOZONE testing with 30-100 parallel threads on an XFS filesystem (a sketch of such a run follows below)
- xrootd data server in a VM; performance similar to IOZONE
- out-of-the-box settings, no tuning
- performance below expectations, reasons still to be understood
- ZFS tested on BigFoot

[Chart: IOZONE throughput results]
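The exact IOZONE invocation is not given on the slide; the sketch below shows a typical throughput-mode run matching the description. The thread count is from the slide; the mount point, record size, and file size are assumptions.

# Hypothetical sketch of an IOZONE run like the one described above;
# mount point, record size and file size are assumptions, not the
# original GridKa settings.
import subprocess

MOUNT = "/mnt/xfs-test"    # assumed XFS mount point
THREADS = 30               # the slide reports 30-100 parallel threads

files = [f"{MOUNT}/iozone.{i}" for i in range(THREADS)]
cmd = [
    "iozone",
    "-i", "0",             # test 0: sequential write/rewrite
    "-i", "1",             # test 1: sequential read/reread
    "-r", "1m",            # record size (assumed)
    "-s", "4g",            # file size per thread (assumed)
    "-t", str(THREADS),    # throughput mode with N parallel threads
    "-F", *files,          # one test file per thread
]
subprocess.run(cmd, check=True)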
Conclusions
Storage extension at GridKa requires an expensive upgrade of the GridKa disk SAN, or a novel storage solution
- tight integration of server and storage looks promising
- many possible benefits, but further evaluation is required:
  - fewer components
  - less power consumption
  - less complexity
- performance needs to be understood together with the vendors
- more tests with other vendors in the near future