Real-time Data Access Monitoring in Distributed, Multi Petabyte Systems

Tofigh Azemoon, Jacek Becla, Andrew Hanushevsky, Massimiliano Turri
SLAC National Accelerator Laboratory
March 6, 2009

Soon a typical running HEP experiment will look like this
• 100s of users
• 10s of thousands of batch nodes
• 1000s of data servers
• 10s of millions of files containing 10s of PB of data
+ geographically distributed, crossing many time zones

Mission Statement
• Provide a real-time overall view of system performance
• Respond to detailed queries
  - to identify bottlenecks
  - to optimize the system
  - to aid in planning system expansion

The SLAC ¼ PB "kan" Cluster
[Diagram labels: Clients; kan001, kan002, kan003, kan004, kan059; bbr-rdr03, bbr-rdr04; Data Servers (320 TB); Managers]

Monitored Objects
[Diagram labels: client, file, job, session, user, type_2, type_1, server]

File classes to monitor aggregate values for groups of files
BaBar Examples:
• type_1 (dataType): PR, SP, PRskims, SPskims
  /store/PR/R22/AllEvents/0006/70/22.0.3/AllEvents_00067045_22.0.3V03.02E.root
  /store/SP/R22/000998/200406/22.0.3/SP_000998_068468.01.root
  /store/PRskims/R22/22.1.1c/IsrIncExc/79/IsrIncExc_57978.01.root
  /store/SPskims/R22/22.1.1c/Tau1N/001235/200212/Tau1N_001235_49553.01.root
• type_2 (skims): IsrIncExc, Tau1N
File path -> getFileType -> (type_1 value, type_2 value)
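
The mapping from a file path to a (type_1 value, type_2 value) pair can be sketched as follows. This is only an illustration, not the actual getFileType code: the function name get_file_type and the assumed path layout (type_1 is the component right after /store, type_2 is the fourth component after /store in skim trees) are inferred from the four BaBar example paths and nothing else.

```python
from typing import Optional, Tuple


def get_file_type(path: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (type_1, type_2) for a BaBar-style path under /store.

    Assumed layout (inferred from the example paths only):
      /store/<type_1>/<release>/...                     for PR and SP
      /store/<type_1>/<release>/<version>/<type_2>/...  for PRskims and SPskims
    """
    parts = [p for p in path.split("/") if p]
    if len(parts) < 2 or parts[0] != "store":
        return None, None  # not under /store: leave unclassified
    type_1 = parts[1]
    type_2 = None
    # Assumption: only skim trees carry a skim name.
    if type_1.endswith("skims") and len(parts) > 5:
        type_2 = parts[4]
    return type_1, type_2
```

Grouping files by these two values is what lets aggregate quantities be reported per data type or per skim instead of per file.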

Xrootd Server
• Highly scalable server
• POSIX-like access to files
• Load balancing
• Transparent recovery from server crashes
• Fault tolerant
• Very low latency

Monitoring Implementation in xrootd
• Minimal impact on client requests
• Robustness in multimode failure
• Precision & specificity of collected data
• Real time scalability

• Use UDP datagrams
  - Data servers insulated from monitoring
  - But packets can get lost
• Outsource client serialization
• Low bounded resource usage
• Use of time buckets
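
The trade-off of UDP datagrams (the data server never waits on the monitoring side, but packets can get lost) can be illustrated with a small sketch. The one-byte sequence number used here to detect loss, the packet layout, and all function names are assumptions made for the example; they are not taken from the slides or from the xrootd wire format.

```python
import socket
import struct

# Hypothetical packet layout: 1-byte sequence number, then an opaque payload.
HEADER = struct.Struct("!B")


def encode_packet(seq: int, payload: bytes) -> bytes:
    """Prefix the payload with a sequence number that wraps at 256."""
    return HEADER.pack(seq % 256) + payload


def decode_packet(packet: bytes):
    """Split a received datagram into (sequence number, payload)."""
    (seq,) = HEADER.unpack_from(packet)
    return seq, packet[HEADER.size:]


def send_report(sock: socket.socket, collector_addr, seq: int, payload: bytes) -> None:
    """Fire and forget: no acknowledgement, no retry, no error raised to the caller."""
    try:
        sock.sendto(encode_packet(seq, payload), collector_addr)
    except OSError:
        pass  # monitoring must not disturb client requests


def count_lost(prev_seq: int, seq: int) -> int:
    """Datagrams missing between two consecutively received sequence numbers."""
    return (seq - prev_seq - 1) % 256


if __name__ == "__main__":
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setblocking(False)
    # The collector address is a placeholder for this example.
    send_report(sock, ("127.0.0.1", 9931), 0, b"open fileId=1")
    sock.close()
```

On the collector side, a nonzero count_lost result only says that records are missing; they cannot be recovered, which is the price paid for insulating the data servers.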

RT data
• Start Session: sessionId, user, PId, client, server, start T
• Staging: stageId, user, PId, client, file path, stage T, duration, server
• Open File: fileId, user, PId, client, server, file path, open T
• Close File: fileId, bytes read, bytes written, close T
• End Session: sessionId, duration, end T
+ Xrootd restart time for each server
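
Because an Open File record and its Close File record share a fileId, per-access quantities can be derived by pairing them. The sketch below shows one way to do that; the class names, the reduced field set (PId, client and server are left out), and the use of integer seconds for timestamps are assumptions for illustration, not the collector's actual data model.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Union


@dataclass
class OpenFile:
    file_id: int
    user: str
    path: str
    open_t: int  # seconds (assumed unit)


@dataclass
class CloseFile:
    file_id: int
    bytes_read: int
    bytes_written: int
    close_t: int  # seconds (assumed unit)


@dataclass
class FileAccess:
    user: str
    path: str
    seconds_open: int
    bytes_read: int
    bytes_written: int


def pair_accesses(events: Iterable[Union[OpenFile, CloseFile]]) -> List[FileAccess]:
    """Match each close record with the open record carrying the same fileId."""
    open_files: Dict[int, OpenFile] = {}
    done: List[FileAccess] = []
    for ev in events:
        if isinstance(ev, OpenFile):
            open_files[ev.file_id] = ev
            continue
        opened = open_files.pop(ev.file_id, None)
        if opened is None:
            continue  # the open record never arrived; skip the orphan close
        done.append(FileAccess(opened.user, opened.path,
                               ev.close_t - opened.open_t,
                               ev.bytes_read, ev.bytes_written))
    return done
```

Open records that never see a matching close stay in open_files; a real system would need a timeout policy for them.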

Single Site Monitoring

Multi-site Monitoring

COMPONENTS
• Collector/Decoder (C++)
• MySQL database (5.0)
• Database Applications (Perl, Perl DBI)
  - Create
  - Load
  - Prepare
  - Upgrade
  - Reload
  - Backup
• Web application (JSP3)
  - DB access via JDBC

• Data servers
  - xrootd enabled
• Database Server
  - Hosting DB & running DB application
• Web Server
  - Tomcat

For security reasons, DB & Web servers are on different hosts.

Configuration File
  dbName: xrdmon_kan_v005
  MySQLUser: xrdmon
  webUser: reader
  MySQLSocket: /tmp/mysql.sock
  baseDir: /u1/xrdmon/allSites
  ctrPort: 9931
  thisSite: SLAC
  fileType: dataType 100
  fileType: skim 500
  site: 1 SLAC PST8PDT 2005-06-13 00:00:00
  site: 2 RAL WET 2005-08-08 10:14:00
  site: 2 CCIN2P3 CET 2006-10-16 00:00:00
  site: 3 CNAF CET 2006-12-18 00:00:00
  site: 3 GRIDKA CET 2006-10-16 00:00:00
  site: 3 UVIC PST8PDT 2007-05-04 21:00:00
  backupInt: SLAC 1 DAY
  backupIntDef: 1 DAY
  fileCloseWaitTime: 10 MINUTE
  maxJobIdleTime: 15 MINUTE
  maxSessionIdleTime: 12 HOUR
  maxConnectTime: 70 DAY
  closeFileInt: 15 MINUTE
  closeIdleSessionInt: 1 HOUR
  closeLongSessionInt: 1 DAY
  nTopPerfRows: 20
  yearlyStats: ON
  allYearsStats: OFF
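
A configuration file in this "key: value" form, where some keys such as site and fileType repeat, could be read with a few lines of code. The sketch below is an assumption about how such a file might be parsed; the real database applications are written in Perl, and the function names and the mapping of MINUTE, HOUR and DAY to time intervals are introduced here only for illustration.

```python
from collections import defaultdict
from datetime import timedelta
from typing import Dict, List

# Unit names as they appear in the configuration; the mapping is an assumption.
UNITS = {"MINUTE": "minutes", "HOUR": "hours", "DAY": "days"}


def parse_config(text: str) -> Dict[str, List[str]]:
    """Collect 'key: value' lines; repeated keys keep every value in order."""
    config: Dict[str, List[str]] = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split at the first colon only, so timestamps in the value survive.
        key, _, value = line.partition(":")
        config[key.strip()].append(value.strip())
    return dict(config)


def parse_interval(value: str) -> timedelta:
    """Turn a value such as '10 MINUTE' or '70 DAY' into a timedelta."""
    count, unit = value.split()
    return timedelta(**{UNITS[unit]: int(count)})
```

For example, parse_interval applied to the maxSessionIdleTime value yields a 12 hour interval that a cleanup job could compare against a session's last activity.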

Basic View
[Plot panels: users, jobs, unique files, all files]

Top Performers Table

User Information

Skim Information

Statistics (BaBar, September 2007)
• DB size: > 40 GB
• # tables: > 200
  - many with 10s of millions of rows
  - largest table > 132,000,000 rows
• # jobs recorded > 30,000,000
• At peak times
  - over 100 concurrent users
  - running 10s of thousands of jobs

Future Developments & Expansions
• DB and RT Data Backup
• File and User Filtering
• Staging Monitoring
  - Never enough disk to hold entire data sample
  - Disk uses power even when files are not accessed
• Fraction of Data Accessed
  - For each file type
  - In specific time intervals
  - …
• Multi Experiment Monitoring
  - Many experiments sharing computing resources