So, Jung-ki
Distributed Computing System LAB
School of Computer Science and Engineering
Seoul National University
Implementation of Package Management in a Cluster Environment
So, Jung-ki (SNU DCS Lab)
Introduction Related Work Design Evaluation Conclusion
2 / 20
Introduction (1/2)
Supercomputer: high-performance processors / high network bandwidth
Expensive system, but a Beowulf system is cost-effective
Motivation: focus on cluster systems
Cluster management system: manual method / add-on method / integrated method
Registry: central repository of information about all aspects of the computer
Introduction (2/2)
Challenge: the integrated method has low availability and reliability
Computation nodes can't be managed separately
When a failure occurs, the system can't be rejuvenated
Goal (using Registry): improve the availability and reliability of the integrated method
The administrator can manage a cluster system easily
The cluster system can be restored from a backup snapshot
Supercomputer
Number of systems by architecture and year (chart data; each year totals 500):
Architecture        1993  1995  1997  1999  2001  2003  2005
Constellation          0     0    12    25   118   140    79
Cluster                0     0     1     6    32   149   304
MPP                  119   219   270   247   319   211   117
SMP                  249   241   215   222    31     0     0
SIMD                  35    11     2     0     0     0     0
Single Processor      97    29     0     0     0     0     0
Chart annotation: 60.8% (equal to the Cluster share in 2005, 304 of 500)
Domestic Supercomputer
Quantity: 14
Cluster: 4
MPP: 4
Constellation: 6
※ SNU: 2 (51/413)
Cluster Management System
Manual approach: the system administrator brings up the entire system manually
Add-on method: bring up a frontend node, then add cluster packages
Examples: OSCAR / Warewulf / OpenMosix
Integrated method: cluster packages are installed and configured during the initial installation
Examples: Rocks / Scyld
Cluster Management System
Software Stack (diagram), layers shown:
Linux Kernel
Linux Environment
HPC Device Drivers
Job Scheduling and Launching
Cluster software management
Cluster State management / Monitoring
Message passing / communication Layer
Parallel code / Grid / computer lab …
Diagram legend: OS (Linux), SGE, Application, HPC
Rocks Overview
Identity: system to build and manage a Linux cluster
Free: open source project
Goal: make clusters easy
Philosophy: computation nodes are 100% automatically installed
Roll: set of packages
Graph / Kickstart
Runs on heterogeneous system architectures
Doesn't attempt to incrementally update software
Rocks system
Architecture (diagram): the front-end node connects to the internet through eth1 and to the local network through eth0; four compute nodes attach to the local network through their eth0 interfaces.
What is the Registry?
Central repository of information about all aspects of the computer
Hardware, OS, application, and user information
Function
Retrieve system information
Update / add / delete software
Back up & restore the system
Advantage
Makes it easier for applications to access system information
Stores large amounts of structured data (system info)
Registry Design
Relations shown in the schema diagram (each has ID as primary key):
Nodes: ID (primary key), Name, Membership, CPUs, Rack, Rank, Comment
Network: ID (primary key), Node, MAC, IP, Gateway, Name, Device, Module
Package: ID (primary key), Node, Name, Version, Release, Install
Aliases: ID (primary key), Node, Name
Memberships: ID (primary key), Name, Appliance, Distribution
Appliances: ID (primary key), Name, Graph, Node
Distribution: ID (primary key), Name, Release, Lang
Diagram legend: Original Relational Schema / Appended Relation / H/W information / S/W information
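A minimal sketch of how the Nodes and Package relations could be declared and joined. It is an illustration only: the slides name MySQL as the database, while this sketch uses Python's built-in sqlite3 as a stand-in, and the column types, the foreign key from Package.Node to Nodes.ID, the lowercase table names, and the helper names build_registry and packages_on are all assumptions, not taken from the slides.

```python
import sqlite3

# Column names follow the slide; column types and the foreign key are assumptions.
SCHEMA = """
CREATE TABLE nodes (
    id         INTEGER PRIMARY KEY,
    name       TEXT,
    membership INTEGER,
    cpus       INTEGER,
    rack       INTEGER,
    "rank"     INTEGER,
    comment    TEXT
);
CREATE TABLE packages (
    id        INTEGER PRIMARY KEY,
    node      INTEGER REFERENCES nodes(id),
    name      TEXT,
    version   TEXT,
    "release" TEXT,
    install   TEXT
);
"""


def build_registry() -> sqlite3.Connection:
    """Create an empty in-memory registry (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def packages_on(conn: sqlite3.Connection, node_name: str) -> list:
    """Return (name, version, release) for every package recorded for a node."""
    rows = conn.execute(
        'SELECT p.name, p.version, p."release" '
        "FROM packages AS p JOIN nodes AS n ON p.node = n.id "
        "WHERE n.name = ? ORDER BY p.name",
        (node_name,),
    )
    return rows.fetchall()


conn = build_registry()
conn.execute("INSERT INTO nodes (id, name, cpus) VALUES (1, 'compute-0-0', 1)")
conn.execute(
    'INSERT INTO packages (node, name, version, "release") '
    "VALUES (1, 'amanda', '2.4.5', '2')"
)
print(packages_on(conn, "compute-0-0"))  # [('amanda', '2.4.5', '2')]
```

Keeping the node reference in each package row is what lets a per-node query like this answer "what is installed where" from the frontend alone.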
Strategy of management
Rocks setup: minimum modification
Take advantage of the original Rocks system: deploy the cluster system easily
Modify related source code: insert-ethers, kickstart.cgi, Kpp, Kgen, Rgen
Running system: apply package modification
Package management program: add / update / delete packages
DB consistency management program
Collection Method Rgen
Registry variables
Package variables
Appended component
Modification Method
Message sequence between the frontend node and a compute node:
1: check command & registry
2: transmit rpm file & command
3: perform command
4: return result (command execution)
5: update registry or handle error
Package modification flow after a command is inserted:
Insert command → check cmd → retrieve registry
[ registry-on and update cmd ] → Update package
[ registry-on and delete cmd ] → Delete package
[ registry-off and add cmd ] → Add package
[ else ]
Modify registry
Packages table: package name / version / release; modified on add / delete / update
Instruction (frontend node to compute nodes): add / update / delete
add -c=compute-0-0 -i=amanda-2.4.5-2.i386
add -c=all -i=all
del -c=compute-0-0 -i=amanda-2.4.5-2.i386
del -c=all -i=all
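The command check on this slide can be sketched roughly as follows. This is a hedged illustration, not the program from the talk: the names Command, parse_command, and decide are hypothetical, the "update" keyword is assumed (the slide only shows add and del examples), treating the [ else ] branch as a rejection is an assumption, and expanding "all" to every node or package is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    action: str   # "add", "del", or (assumed keyword) "update"
    node: str     # value of -c=
    package: str  # value of -i=


def parse_command(line: str) -> Command:
    """Parse a line such as 'add -c=compute-0-0 -i=amanda-2.4.5-2.i386'."""
    # Slides often turn '-' into an en dash; normalize before parsing.
    parts = line.replace("\u2013", "-").split()
    action = parts[0]
    options = dict(part.lstrip("-").split("=", 1) for part in parts[1:])
    return Command(action, options["c"], options["i"])


def decide(cmd: Command, registry: set) -> str:
    """Pick the package action from the command and the registry state.

    `registry` is a set of (node, package) pairs already recorded.
    """
    registered = (cmd.node, cmd.package) in registry
    if registered and cmd.action == "update":
        return "update package"
    if registered and cmd.action == "del":
        return "delete package"
    if not registered and cmd.action == "add":
        return "add package"
    return "reject"  # assumed meaning of the [ else ] branch


registry = {("compute-0-0", "amanda-2.4.5-2.i386")}
cmd = parse_command("del -c=compute-0-0 -i=amanda-2.4.5-2.i386")
print(decide(cmd, registry))  # delete package
```

Checking the registry before anything is transmitted means an add for an already registered package, or a delete for an unknown one, is caught on the frontend without contacting the compute node.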
Registry consistency
Setup time
When the frontend node removes / updates a computation node
Dependency: change node table → change package table
Modify Kickstart.cgi / kgen
Apply cascading table changes
※ MySQL does not support the transaction property
Running system
Package install / delete / update
Compute node rpm information = frontend node’s registry DB
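The two consistency rules on this slide can be sketched as below. This is an illustration under assumptions, not the consistency program from the talk: remove_node and registry_drift are hypothetical names, the registry is modeled with plain Python structures instead of MySQL tables, and deleting the dependent package rows before the node row is one possible ordering chosen here because, without transactions, an interruption then leaves a node without package rows rather than package rows without a node.

```python
def remove_node(nodes: dict, packages: list, node_id: int) -> list:
    """Cascade a node removal: drop its package rows first, then the node row.

    Returns the remaining package rows; `nodes` is modified in place.
    """
    remaining = [row for row in packages if row["node"] != node_id]
    nodes.pop(node_id, None)
    return remaining


def registry_drift(installed: set, registered: set) -> dict:
    """Compare a compute node's rpm list with the frontend registry rows."""
    return {
        "missing_from_registry": sorted(installed - registered),
        "stale_in_registry": sorted(registered - installed),
    }


nodes = {1: "compute-0-0", 2: "compute-0-1"}
packages = [
    {"node": 1, "name": "amanda-2.4.5-2.i386"},
    {"node": 2, "name": "amanda-2.4.5-2.i386"},
]
packages = remove_node(nodes, packages, 1)
print(nodes)     # {2: 'compute-0-1'}
print(packages)  # [{'node': 2, 'name': 'amanda-2.4.5-2.i386'}]

drift = registry_drift(
    installed={"amanda-2.4.5-2.i386", "amanda-client-2.4.5-2.i386"},
    registered={"amanda-2.4.5-2.i386"},
)
print(drift["missing_from_registry"])  # ['amanda-client-2.4.5-2.i386']
```

An empty result in both drift lists corresponds to the condition stated on the slide: the compute node's rpm information equals the frontend node's registry DB.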
Experiment Setup
Network: public Ethernet
Frontend node: Rocks.snu.ac.kr, CPU 800 MHz, RAM 768 MB, HDD 40 GB
Compute nodes (14): Compute-0-(1~14), CPU 850 MHz, RAM 1 GB, HDD 10 GB
Experiment Data
name         volume   capacity
amanda            3     468 KB
HPC              53     117 MB
Rocks roll      479     1.5 GB
Original Rocks Evaluation
Per-node time (sec), chart data:
compute node   transmit   service
 1                  676      1104
 2                  703      1138
 3                  703      1140
 4                  664      1088
 5                  711      1148
 6                  668      1102
 7                  684      1120
 8                  690      1127
 9                  708      1144
10                  708      1135
11                  669      1096
12                  671       976
13                  689       993
14                  689      1004
average service time: 18min 14sec
average transmit time: 11min 28sec
Chart annotations: Network card / DHCP request
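As a small worked check, the averages reported on this slide can be recomputed from the per-node values in the chart. The assignment of the two series to "transmit" and "service" is inferred from which average each one reproduces; the helper name as_min_sec is hypothetical.

```python
from statistics import mean

# Per-node times in seconds, compute nodes 1 to 14, as read from the chart.
transmit_sec = [676, 703, 703, 664, 711, 668, 684, 690, 708, 708, 669, 671, 689, 689]
service_sec = [1104, 1138, 1140, 1088, 1148, 1102, 1120, 1127, 1144, 1135, 1096, 976, 993, 1004]


def as_min_sec(seconds: float) -> str:
    """Format a duration in seconds as 'XXmin YYsec', rounded to the nearest second."""
    total = round(seconds)
    return f"{total // 60}min {total % 60}sec"


print(as_min_sec(mean(transmit_sec)))  # 11min 28sec
print(as_min_sec(mean(service_sec)))   # 18min 14sec
```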
Amanda Packages Evaluation
Per-node time (millisec), chart data; values are listed in the order of the chart labels, taken here to be compute nodes 1 to 14:
compute node   install amanda packages   delete amanda packages
 1                               6413                     5023
 2                               6450                     5369
 3                               7931                     6735
 4                               6393                     5659
 5                               6636                     5831
 6                               6598                     5727
 7                               7735                     5633
 8                               6608                     5127
 9                               6197                     5589
10                               5905                     5342
11                               6194                     5600
12                               6205                     5244
13                               7283                     5864
14                               6228                     5282
average install time: 6.62 sec
average delete time: 5.57 sec
HPC Roll Evaluation
Per-node time (sec), chart data; values are listed in the order of the chart labels, taken here to be compute nodes 1 to 14:
compute node   install hpc packages   delete hpc packages
 1                             212                    75
 2                             205                    84
 3                             233                    74
 4                             188                    78
 5                             205                    81
 6                             201                    83
 7                             206                    82
 8                             203                    78
 9                             211                    80
10                             196                    75
11                             195                    77
12                             197                    76
13                             206                    80
14                             195                    75
average install time: 3min 38sec
average delete time: 1min 18sec
Conclusion
The Registry benefits the cluster system
Availability and reliability are improved by using the Registry
The administrator can manage cluster systems easily
Cluster systems can be restored from backup snapshots
Q & A
Questions or comments?
Thank you!