CERN/IT/DB
A Strawman Model for using Oracle for LHC Physics Data
Jamie Shiers, IT-DB, CERN
Overview
Focus on scalability & deployment aspects
Implicit assumption that OCCI / OTT can provide needed functionality
Learn from experience with Objectivity/DB deployment in LAN & WAN
Basic Concepts
• Oracle Database refers to datafiles & server processes on a single system or cluster
• User applications can access as many Oracle Databases as required
• Different roles / schema / transaction boundaries etc all supported out of the box
• Oracle deployed today at 1-100TB level
LHC Datatypes / Volumes
• RAW: 1PB / year
• ESD: ~100TB / year
• AOD: ~10TB / year
• TAG: ~100GB-1TB / year
LHC Datatypes & Oracle
• RAW: 1PB/yr → ~1 ‘DB’ / month
• ESD: ~100TB/yr → ~1 ‘DB’ / year
• AOD: ~10TB/yr → ~1 ‘DB’
• TAG: ~100GB-1TB/yr → ~1 ‘DB’ combined with AOD
• Maybe possible to soften these to ~1 ‘DB’ for all ESD
• Would there be a strong advantage?
• Different ‘DB’s have different access patterns, access control, schema, … etc.
• Navigation between DBs fully supported (links)
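The size of one ‘DB’ under this mapping follows from simple division. The sketch below is illustrative only: it uses the yearly volumes quoted on the slide, takes 1PB as 1000TB, and the names `YEARLY_VOLUME_TB`, `DBS_PER_YEAR` and `db_size_tb` are invented for this example. AOD and TAG are left out because the slide assigns them a single ‘DB’ with no period.

```python
# Yearly data volumes in TB, as quoted on the slide (1 PB taken as 1000 TB).
YEARLY_VOLUME_TB = {"RAW": 1000.0, "ESD": 100.0}

# Number of 'DB's opened per year in the strawman mapping:
# RAW ~1 'DB' / month, ESD ~1 'DB' / year.
DBS_PER_YEAR = {"RAW": 12, "ESD": 1}


def db_size_tb(datatype: str) -> float:
    """Approximate size in TB of a single 'DB' of the given datatype."""
    return YEARLY_VOLUME_TB[datatype] / DBS_PER_YEAR[datatype]


if __name__ == "__main__":
    for datatype in DBS_PER_YEAR:
        print(f"{datatype}: ~{db_size_tb(datatype):.0f} TB per 'DB'")
```

Under these assumptions both RAW and ESD ‘DB’s come out at roughly 100TB or a little less, which is the size considered in the rest of the talk.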
A 100TB Oracle DB
• Single machine or cluster?
• Oracle stresses “Real Application Clusters” with Oracle 9i – set of commodity systems vs ‘datacenter’ style server
• Today’s Objy servers have ~1TB / disk accessible through 1 network connection
• Scale to cluster of O(10) systems with O(100TB) disk?
• Seems plausible…
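As a rough check on the scaling question, the order-of-magnitude figures can be turned into a per-node number. This is a sketch only: it assumes the data is spread evenly over the nodes (not stated on the slide), and the constant and function names are made up for the example.

```python
# Order-of-magnitude figures from the slide.
CLUSTER_NODES = 10          # O(10) systems
CLUSTER_DISK_TB = 100.0     # O(100TB) disk in total
OBJY_SERVER_DISK_TB = 1.0   # ~1TB behind one network connection today


def disk_per_node_tb(total_tb: float, nodes: int) -> float:
    """Disk each node serves if data is spread evenly (an assumption)."""
    return total_tb / nodes


per_node = disk_per_node_tb(CLUSTER_DISK_TB, CLUSTER_NODES)
scale_up = per_node / OBJY_SERVER_DISK_TB

if __name__ == "__main__":
    print(f"~{per_node:.0f} TB per node, ~{scale_up:.0f}x today's Objy server")
```

With these inputs each node would serve about ten times the disk of one of today's Objy servers.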
Oracle Confidential 7
Cluster Architecture
Clustered Database Servers
Mirrored Disk Subsystem
High Speed Switch or Interconnect
Hub or Switch Fabric
Network
Centralized Management Console
Storage Area Network
Low Latency Interconnect: VIA or Proprietary
Drive and Exploit Industry Advances in Clustering
Users
No Single Point Of Failure
Cache Fusion
• Full Cache Fusion
• Cache-to-cache data shipping
• Shared cache eliminates slow I/O
• Enhanced IPC
• Allows Flexible and Transparent Deployment
[Diagram labels: Users, Shared Cache, Cache Fusion]
O.R.A.C.
• Certified Intel configurations from a number of vendors…
• COMPAQ: PIII Xeon 700MHz, 4P, 4GB
• FastTango: Oracle 9i cluster on Linux
• Obtaining information from these and other vendors on suitable evaluation configurations…
Oracle Deployment
DAQ cluster: current data – no history
export tablespaces to RAW cluster
to/from MSS
ESD cluster: 1/year? 1?
AOD/TAG 1 total?
to RCs to/from RCs
reconstruct ‘shift’ analysis
100TB cluster testbed
• BT have ~80TB Oracle DB today
• Visit arranged for July 31
• Other VLDB sites will also be visited e.g. Deutsche Telekom (DB2), DOCOMO, …
Why Cluster?
Separate DBs
• Simple, no cluster h/w or s/w
• Individual nodes (DBs) can be maintained independently
• Need additional layer to find DB
• Machines serving inactive data idle
• Each node is a single point of failure
Cluster
• Additional complexity, cost
• Entire cluster must be upgraded together
• No additional s/w layer
• All nodes used all of the time(?)
• Shared cache
• Reliability increases with additional nodes
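The reliability contrast between the two options can be illustrated with a toy availability model. This is a sketch under strong assumptions that are not in the slides: nodes fail independently, each is up with the same probability, and in the cluster case any surviving node can serve all the data. Failures of shared disks or the interconnect are ignored, and the 0.99 figure is purely hypothetical.

```python
def all_nodes_up(node_availability: float, nodes: int) -> float:
    """Separate DBs: probability that every DB is reachable at the same time.

    Each node is a single point of failure for its own data, so the full
    dataset is only available when all nodes are up.
    """
    return node_availability ** nodes


def any_node_up(node_availability: float, nodes: int) -> float:
    """Cluster: probability that at least one node is up.

    Assumes any surviving node can serve all data (shared storage).
    """
    return 1.0 - (1.0 - node_availability) ** nodes


if __name__ == "__main__":
    a = 0.99  # hypothetical per-node availability
    for n in (1, 2, 10):
        print(n, round(all_nodes_up(a, n), 4), round(any_node_up(a, n), 6))
```

In this model, adding nodes lowers the chance that all separate DBs are reachable at once, while it raises the chance that a cluster can still serve requests.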
[Figure: Size of the Largest RDBMS in Commercial Use for DSS. Terabytes: 3, 50, 100; years: 1996, 2000, 2005 (projected by respondents). Source: Database Scalability Program 2000]
Decision Support (2000)
Company                  DB Size* (TB)  DBMS Partner  Server Partner  Storage Partner
SBC                      10.50          NCR           NCR             LSI
First Union Nat. Bank    4.50           Informix      IBM             EMC
Dialog                   4.25           Proprietary   Amdahl          EMC
Telecom Italia (DWPT)    3.71           IBM           IBM             Hitachi
FedEx Services           3.70           NCR           NCR             EMC
Office Depot             3.08           NCR           NCR             EMC
AT & T                   2.83           NCR           NCR             LSI
SK C&C                   2.54           Oracle        HP              EMC
NetZero                  2.47           Oracle        Sun             EMC
Telecom Italia (DA)      2.32           Informix      Siemens         TerraSystems
*Database size = sum of user data + summaries and aggregates + indexes
Transaction Processing (2000)
Company                  DB Size* (TB)  DBMS Partner  Server Partner  Storage Partner
Telstra                  10.36          IBM           IBM, Hitachi    IBM
British Telecom          8.45           CA            IBM             EMC
United Parcel Service    7.88           IBM           IBM             EMC
Experian                 3.14           IBM           Hitachi         EMC
US Customs Service       2.70           CA            IBM             Hitachi
Korea Telecom (KT ICIS)  2.26           Oracle        Compaq          StorageTek
Dacom System Tech.       1.80           Oracle        Pyramid         Seagate
CheckFree                1.35           IBM           IBM             IBM
Centrelink               1.27           CCA           IBM             IBM
LG TelCom                1.13           Oracle        HP              EMC
*Database size = sum of user data + summaries and aggregates + indexes
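To put the ~100TB ‘DB’ in context, it can be compared with the largest entries in the two 2000 surveys. The sketch below only divides numbers that appear in this talk; the names `TARGET_DB_TB`, `LARGEST_2000_TB` and `gap_factor` are invented for the example.

```python
# Size of the 'DB' discussed in this talk, in TB.
TARGET_DB_TB = 100.0

# Largest database in each 2000 survey table (SBC and Telstra), in TB.
LARGEST_2000_TB = {
    "Decision Support": 10.50,
    "Transaction Processing": 10.36,
}


def gap_factor(target_tb: float, largest_tb: float) -> float:
    """How many times larger the target is than the largest surveyed DB."""
    return target_tb / largest_tb


if __name__ == "__main__":
    for survey, size_tb in LARGEST_2000_TB.items():
        print(f"{survey}: target is ~{gap_factor(TARGET_DB_TB, size_tb):.1f}x larger")
```

Both ratios are a little under a factor of ten.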
Summary
• ~100TB DBs (in the Oracle sense) will be fully supported by mainstream vendors on LHC timescales
• The gap between our requirements & those of commercial firms is narrowing fast