8/8/2019 Crib 2009
http://slidepdf.com/reader/full/crib-2009 1/49
MIT/EAPS & Mech.Eng.C. Evangelinos ([email protected])
CriB 2010 Seminar Series
Scientific Computing on the Cloud:
Many Task Computing and other opportunities
Constantinos Evangelinos, Chris Hill (MIT Earth, Atmospheric and Planetary Sciences)
Pierre F. J. Lermusiaux, Patrick J. Haley Jr., Jinshan Xu (MIT Mechanical Engineering)
Outline
● Many Task Computing
● ESSE as an MTC application
● ESSE on clusters, grids and Amazon EC2
● Amazon EC2 for HPC?
● Amazon EC2 for education
● Conclusions
Motivation
● Could cloud computing be in our future for climate (ocean and coupled climate) models?
– Can it be useful for more than EP or Map-Reduce type applications?
– Are the days of having to purchase, install and maintain personal clusters coming to an end?
– Could grant money buy cloud cycles some day?
– Can it be used for HPC instruction?
– Can it be used for Geosciences education?
● What about HPC performance in a virtual machine environment?
– Issues and middleware
Many Task Computing
● Loose definition by Foster et al.: high-performance computations comprising multiple distinct activities, coupled via (for example) file system operations or message passing. Tasks may be small or large, uniprocessor or multiprocessor, compute-intensive or data-intensive. The set of tasks may be static or dynamic, homogeneous or heterogeneous, loosely or tightly coupled. The aggregate number of tasks, quantity of computing, and volumes of data may be extremely large.
● What it is not:
– Plain MPMD (unless one speaks of dynamic/heterogeneous)
– Workflow (only part of the story)
– Capacity computing
– High Throughput computing
– Embarrassingly parallel computing
● Instead of the metric jobs/day, the metric is tasks per second or per hour.
DA Motivation
● Improve the forecasting capabilities of ocean data assimilation and related fields via increased access to parallelism
● Move the existing computational framework to a more modern, non-site-specific setup
● Test the opportunities for executing massive task-count workflows on distributed clusters, Grid and Cloud platforms
● Provide an external outlet to handle peak demand for compute resources during live experiments in the field
Ocean Data Assimilation
dx = M(x, t) dt + dη;  M the model operator
y_k^o = H(x_k, t_k) + ε_k;  H the measurement operator
min_x J(x_k, y_k^o; dη, ε_k, Q(t), R_k);  J the objective function

Model errors are assumed Brownian: dη ~ N(0, Q(t)), with E{dη(t) dη(t)^T} = Q(t) dt.
In fact the models are forced by processes with noise correlated in space and time (meteo).
Measurement errors follow white Gaussian statistics: ε_k ~ N(0, R_k).
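As a toy numerical rendering of the equations above (all operators, sizes and covariances here are invented for illustration; this is not the ocean model):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, dt = 8, 3, 0.01               # toy state size, observation count, time step

def M(x, t):
    # Stand-in model operator: simple linear damping.
    return -0.5 * x

H = rng.standard_normal((m, n))     # stand-in measurement operator
Q = 0.1 * np.eye(n)                 # model-error covariance Q(t)
R = 0.01 * np.eye(m)                # measurement-error covariance R_k

x = rng.standard_normal(n)
for k in range(100):
    # dx = M(x, t) dt + d_eta, with E{d_eta d_eta^T} = Q dt
    d_eta = rng.multivariate_normal(np.zeros(n), Q * dt)
    x = x + M(x, k * dt) * dt + d_eta

# y_k^o = H(x_k, t_k) + eps_k, with eps_k ~ N(0, R_k)
eps = rng.multivariate_normal(np.zeros(m), R)
y_obs = H @ x + eps
```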
Ocean Acoustics
Estimates of the ocean temperature and salinity fields (and their uncertainties) are necessary for calculating acoustic fields and their uncertainties.
Sound-propagation studies often focus on vertical sections. Time is fixed and an acoustic broadband transmission loss (TL) field is computed for each ocean realization.
A sound source of specific frequency, location and depth is chosen. The coupled physical-acoustical covariance P for the section is computed, non-dimensionalized and used for assimilation of hydrographic and TL data.
Acoustic climatology maps
● Underwater acoustics transmission-loss variability predictions in a 56 x 33 km area northeast of Taiwan.
● 2D propagation over 15 km distance at 31x31 = 961 grid points x 8 directions.
● Each job is a short 3-minute acoustics 2D ray-propagation problem.
● Distributed on 100 dual-core compute nodes; sped up more than 100 times in a real-time experiment (SGE overhead of scheduling short jobs).
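The task counts above imply roughly the following back-of-envelope arithmetic (using only the figures quoted on this slide):

```python
# 31x31 grid points, 8 bearings, ~3 minutes per 2D ray-propagation job.
points, directions, minutes_per_job = 31 * 31, 8, 3
tasks = points * directions            # independent acoustics jobs
serial_min = tasks * minutes_per_job   # single-core time

cores = 100 * 2                        # 100 dual-core nodes
ideal_min = serial_min / cores         # perfect packing, no scheduler overhead
print(tasks, serial_min, round(ideal_min, 1))
```

The ideal speedup on 200 cores would be 200x; the observed "more than 100 times" reflects the SGE overhead of scheduling many 3-minute jobs.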
[Figure: mean transmission loss, TL STD over depth and TL STD over bearing over a 77 km x 65 km region; TL spans ~55-65 dB and STD ~0.1-3 dB, with annotations for the effect of internal tides and of steep bathymetry.]
Canyon Nx2D acoustics modeling
– OMAS moving sound source
Bathymetry of Mien Hua Canyon
AOSN-II Monterey Bay
Error Subspace Statistical Estimation
ESSE Surf. Temp. Error Standard Deviation Forecasts for AOSN-II
[Figure: forecast panels for Aug 12, 13 and 14 (start of upwelling, first upwelling period) and Aug 24, 27 and 28 (end of relaxation, second upwelling period).]
Leonard and Ramp, Lead PIs
Serial and Parallel ESSE workflows
The ESSE workflow engine
● Is actually (for historical and practical reasons) a heavily modified C-shell script (master)!
– Catches signals to kill all remaining jobs
● Grid Engine, Condor and PBS variants
– Submits and tracks singleton jobs
● Or uses job arrays for scalability
– Further variants depending on I/O strategy:
● Separate pert singletons?
● Input/output to shared or local disk (or mixed)?
● Shared directories store files with the execution status of each of the singleton scripts
● Singletons need the perturbation number: tricks!
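A common trick for getting the perturbation number into each singleton is to derive it from the scheduler's array-task index (SGE's $SGE_TASK_ID). The driver sketch below is a hypothetical Python rendering of the master-script logic, not the actual C-shell code; the qsub flags and status-file layout are illustrative:

```python
import pathlib
import subprocess
import time

def submit_ensemble(n, status_dir, submit=None):
    """Submit n singletons as one job array, then wait on shared status files.

    Each array task would read its perturbation number from $SGE_TASK_ID
    and touch <status_dir>/<number>.done when it finishes. `submit` is
    injectable so the polling logic can be exercised without a scheduler.
    """
    status = pathlib.Path(status_dir)
    status.mkdir(exist_ok=True)
    if submit is None:
        submit = lambda: subprocess.run(
            ["qsub", "-t", f"1-{n}", "singleton.csh"], check=True)
    submit()
    while True:
        done = {int(p.stem) for p in status.glob("*.done")}
        if len(done) >= n:
            return sorted(done)
        time.sleep(1)   # poll the shared status directory
```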
Multi-level parallelism in ESSE
● Nested ocean model runs (HOPS) are run in parallel
– Limited parallelism
– 2 or 3 levels
– bi-directional
● SVD calculation is based on parallelizable LAPACK routines
● Convergence check calculation also
ESSE and ocean acoustics
● As things stand, ESSE is used to provide the necessary temperature and salinity information for sound-propagation studies.
● The ESSE framework can also be extended to acoustic data assimilation. With significantly more compute power one can compute the whole "acoustic climate" in a 3D region:
– providing TL for any source and receiver locations in the region as a function of time and frequency,
– by running multiple independent tasks for different sources/frequencies/slices at different times.
Canyon Nx2D acoustics modeling
● Acoustics transmission-loss difference over 6 hours (internal tides or other uncertainties)
● In the future, incorporating this with ESSE for uncertainty estimation; the computational cost will be 1800 (directions) x 15 locations x HUNDREDS of cases.
Ocean DA/ESSE/acoustics: MTC
● A minimum of hundreds to thousands (and with increased fidelity tens of thousands) of ocean model runs (tens of minutes or more), preceded by an equal number of IC perturbations (secs)
● File I/O intensive, both for reading and writing
● Concurrent reads to forcing files etc.
● Thousands of short acoustics runs (mins)
● Future directions for ESSE will generate even more tasks:
– dynamic path sampling for observing assets
– combined physical-acoustical ESSE
“Real-time” experiments
Notable differences
From many parameter sweeps and other MTC apps:
● there is a hard deadline associated with the execution of the ensemble workflow, as a forecast needs to be timely;
● the size of the ensemble is dynamically adjusted according to the convergence of the ESSE workflow, which is not a DAG;
● individual ensemble members are not significant (and their results can be ignored if unavailable); what is important is the statistical coverage of the ensemble;
● the full resulting dataset of the ensemble member forecast is required, not just a small set of numbers; ICs are different for each ensemble member;
● individual forecasts within an ensemble, especially in the case of interdisciplinary interactions and nested meshes, can be parallel programs themselves.
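The dynamic (non-DAG) nature of the workflow can be sketched as a loop that keeps adding members in batches until an ensemble-convergence criterion is met; the criterion, batch size and member model below are purely illustrative, not the ESSE ones:

```python
import numpy as np

def run_member(rng, n_state=50):
    # Stand-in for one (possibly parallel) ensemble-member forecast.
    return rng.standard_normal(n_state)

rng = np.random.default_rng(1)
members, prev_spread = [], None
while True:
    # Grow the ensemble by one batch of members.
    members.extend(run_member(rng) for _ in range(100))
    spread = float(np.std(np.stack(members)))
    # Illustrative convergence test: the ensemble spread has stopped changing.
    if prev_spread is not None and abs(spread - prev_spread) / prev_spread < 0.05:
        break
    if len(members) >= 2000:       # hard cap: the forecast deadline
        break
    prev_spread = spread
print(len(members))
```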
And their implications
● Deadline: use any Advance Reservation capabilities available
● Dynamic: means that the actual total compute and data requirements for the forecast are not known beforehand and change dynamically
● Dropped members: suggests that failures (due to software or hardware problems) are not catastrophic and can be tolerated. Moreover, runs that have not finished (or even started) by the forecast deadline can be safely ignored provided they do not collectively represent a systematic hole in the statistical coverage.
● I/O needs: mean that relatively high data storage and network bandwidth constraints will be placed on the underlying infrastructure
● Parallel ensemble members: mean that the compute requirements will not be insignificant either.
Ocean DA on local clusters
● Local Opteron cluster
– Opteron 250 2.4 GHz (4GB RAM) compute nodes (single gigabit network connection)
– Opteron 2380 2.5 GHz (24GB RAM) head node
– 18TB of shared disk (NFS) over 10Gbit Ethernet
– 200Gbit switch backplane
– Grid Engine and Condor co-existing
● Tried both Grid Engine and Condor versions of the ESSE workflows. Test 600-member ensemble:
– I/O optimizations (all local dirs): 86 down to 77 mins
– SGE 10-20% faster than Condor
● without heroic tuning of the latter
Ocean DA on the Teragrid
● Extensive use of sshfs to share directories for checking the state of runs etc.
● Remote job submissions (over (gsi)ssh)
– part of the driver and modified singletons
● Or Condor-C and Glide-in, with care, if root
● Condor-G will not scale
● Or Personal Condor & MyCluster

System   cores   pert    pemodel
ORNL     2       67.83   1823.99
Purdue   4       6.25    1107.4
local    2       6.21    1531.33
Advantages of the Teragrid
● Enormous numbers of theoretically available cores and very large sizes for storage
– Condor pool supposedly 14-27k cores (~1800)
● Shared high-speed parallel filesystems
● High-speed connections to the home cluster
● Suites of Grid software for remote file access and job submission, control etc.
– Mixed blessing...
● Free, after writing the proposal to convince Teragrid to get the SUs...
Disadvantages of the Teragrid
● Very large heterogeneity in hardware, O/S and paths (to scratch disks etc.) requiring mods to the singleton code; user confusion.
● Without advance reservations one cannot be guaranteed not to have to use multiple Teragrid sites to reach the desired number of processors within the deadline.
– Backfilling can help, but per-user job limits also limit the usability of a single Teragrid site
– Schedulers favor large processor-count runs
– Complicated tricks to submit many jobs as one
● Teragrid MPPs not always suitable for scripts
● Careful fetching of results back to home (congestion)
Ocean DA on the Cloud
● We have been experimenting with the use of Cloud computing for more traditional HPC usage, including parallel runs of I/O-intensive data-parallel ocean models such as MITgcm.
● Given the limitations seen in network performance, it was natural to investigate the usability of Amazon EC2 for MTC applications such as ESSE.
Cloud Modes of usage
● Stand-alone (batch) on-demand EC2 cluster
– Torque or SGE (all-in-the-cloud or remote submits)
● Augmented local cluster with EC2 nodes
– We have a Torque setup
– Used recipes for SGE setup
– Condor use of EC2 too restrictive
– MyCluster dynamic SGE or Condor merged clusters
– Commercial (Univa UniCloud, Sun Cloud Adapter in Hedeby/SDM) for fully dynamic provisioning
● Experimentation with parallel filesystems: PVFS2/GlusterFS/FhGFS
Serial pert/pemodel performance

System      cores   pert    pemodel
m1.small    0.5     13.53   2850.14
m1.large    2       9.33    1817.13
m1.xlarge   4       9.14    1860.81
c1.medium   2       9.8     1008.11
c1.xlarge   8       6.67    1030.42
m2.2xlarge  4       3.39    779.77
m2.4xlarge  8       3.35    790.86

● m1.xxxx AMIs are using Opteron processors
● A binary optimized with the Pathscale compilers was used
● All cores were loaded
● I/O is to local disk (EBS is slower; so is the NFS used for the centrally coordinating directory of the run)
● Total runtime is reported
● Better than 2.5x speedup from m1.small to c1.medium
● Nehalems (m2.xxxxx) not the best option for price/perf.
Advantages of the Cloud
● For all intents and purposes the response is immediate. Currently a request for a virtual EC2 cluster gets satisfied on demand, without having to worry about queue times and backfill slots.
● The use of virtual machines allows for deploying the same environment as the home cluster. This provides for a very clean integration of the two clusters.
● Having the same software environment also results in no need to rebuild (and in most cases revalidate) executables. This means that last-minute changes (because of model build-time parameter tuning) can be used ASAP instead of having to go through a build-test-deploy cycle on each remote platform.
● EC2 allows our virtual clusters to scale at will (default limit: 20)
● Since the remote machines are under our complete control, scheduling software and policies etc. are tuned to our needs.
Cost analysis
● Cost-wise, for example, an ESSE calculation with 1.5GB of input data and 960 ensemble members each sending back 11MB (for a total of 10.56GB) would cost:
– 1.5 (GB) x $0.10 + 10.56 (GB) x $0.17 for the data
– 2 (hr) x 20 x $0.68 for the computations
– For a total of $29.15
● Use of reserved instances would drop pricing for the cpu usage by more than a factor of 3.
● Compare that to the cost of overprovisioning your local cluster resources to handle the peak load required a few times a year.
Disadvantages of the Cloud
● Inhomogeneity needs to be kept in mind or it will bite you
● Any extra security issues need to be worked out.
● EC2 usage needs to be paid directly to Amazon. Amazon charges by the hour, like a cell phone: 1 hour and 1 second counts as 2 hours. Charges for data movement in and out of EC2.
● The performance of virtual machines is less than that of "bare metal", the difference being more pronounced when it comes to I/O.
● No persistent large parallel filesystem. One can be constructed on demand (just like the virtual clusters), but the Gigabit Ethernet connectivity used throughout Amazon EC2, alongside the randomization of instance placement, means that parallel performance of the filesystem is not up to par. Horror stories...
● Unlike national and state supercomputing facilities, Amazon's connections to the home cluster are bound to be slower and result in file-transfer delays.
Future work directions
● Reimplement the workflow engine.
– Considering Swift; other options? Nimrod?
● Generalize the ESSE work-engine away:
– Use with other ocean models (MITgcm, ROMS)
● Expand production use of ESSE:
– Heterogeneous sites on the Teragrid
– Open Science Grid
– MPPs with sufficient support: Blue Gene/P?
● Expand uses for ESSE (and number of tasks):
– ESSE for Acoustics
– ESSE for adaptive sampling
Which sampling on Aug 26 optimally reduces uncertainties on Aug 27?
4 candidate tracks, overlaid on surface T forecast for Aug 26
ESSE forecasts after DA of each track
[Figure timeline: Aug 24 → Aug 26 → Aug 27; IC (nowcast) and DA, a 2-day ESSE forecast, and ESSE forecasts for Tracks 1-4 after DA 1-4.]
Best predicted relative error reduction: track 1
● Based on nonlinear error covariance evolution
● For every choice of adaptive strategy, an ensemble is computed
Memory Bandwidth
[Chart: per-thread memory bandwidth (GB/s), 1 thread vs N threads, for m1.small, c1.medium and an Opteron 1.4GHz system; readings of 5.4, 2.6/2.8 and 5.3/5.6 respectively.]
● The small-instance memory bandwidth appears to be equal to the full memory bandwidth expected from such a platform, despite the 50% cpu-time throttler; not entirely unexpected for memory bandwidth.
● The faster CPU in the c1.medium instance does considerably worse.
● In fact an original 1st-gen 1.4 GHz Opteron system also does worse (the DDR2 memory in the m1.small instance should help).
● This suggests that for memory-bandwidth-limited applications the small instance may be the most efficient.
● The increase of memory bandwidth with the c1.medium instance suggests that the 2 cores are not on the same die. This would be an Amazon policy.
Serial Performance

System     Class A   Class W   EP (A)   EP (W)
m1.small   132       149       6.66     6.73
c1.medium  312       357       15.59    15.04
ratio      2.36      2.4       2.34     2.23

● NAS NPB serial (geometric mean of all tests except EP) in Mop/s
● Compiled with system gcc (generic flags)
● A single instance running in the c1.medium case (no memory bandwidth contention)
● The 1:2.5 theoretical ratio becomes 1:2.3
● Still, the price ratio is 1:2
● When loading both cores in the c1.medium, the resulting ratio depends on the memory vs cpu utilization characteristics of the individual benchmark
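Using only the numbers above plus the quoted 1:2 price ratio, the price/performance argument can be made explicit:

```python
# NPB Mop/s from the table; c1.medium is priced at twice an m1.small.
mops = {"m1.small": {"A": 132, "W": 149}, "c1.medium": {"A": 312, "W": 357}}
price_ratio = 2.0

for cls in ("A", "W"):
    perf_ratio = mops["c1.medium"][cls] / mops["m1.small"][cls]
    # Mop/s per dollar favors c1.medium whenever perf_ratio > price_ratio.
    print(cls, round(perf_ratio, 2), perf_ratio > price_ratio)
```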
I/O performance

Serial IOR read and write bandwidth (128KB requests, 1MB block size, 100 segments), with fsync, in MB/s:

            m1.small   c1.medium   c1.medium (2 cores)
/tmp write  168        284         264
/tmp read   1626       1882        2399
NFS write   40         42          41
NFS read    73         63          63

Serial NAS NPB 3.3 BT-IO (Fortran I/O), in MB/s:

              m1.small   c1.medium   c1.medium (2 cores)
Class A /tmp  17         29          28
Class W /tmp  16         29          28
Class A NFS   15         26          21
Class W NFS   14         30          26
MPI performance

                 unidirectional BW (MB/s)   bidirectional BW (MB/s)   latency (us)
LAM              57.85                      81.98                     81.2
GridMPI          54.6                       77.07                     83.46
MPICH2 nemesis   15.72                      26.08                     300
MPICH2 sock      58.49                      83.42                     85.87
OpenMPI          16.44                      17.99                     300
LAM/ACES         117.64                     198.59                    35.83
MPI performance cont.
Coupled climate simulation
● The MITgcm (MIT General Circulation Model) is a numerical model designed for the study of the atmosphere, ocean, and climate.
– MPI (+OpenMP) code base, Fortran 77(/90) with some C bits; very portable
– Memory-bandwidth intensive but not entirely memory bound; also I/O intensive for climate applications.
● Coupled ocean-coupler-(atmosphere-land-ice) model on a ~2.8° cubed sphere (6 32x32 faces)
– MPMD mode, 3 binaries, up to 6+6+1 processes in a standard configuration.
ECCO-GODAE
● 1-degree MITgcm ocean simulation (including sea-ice) that computes costs with respect to misfits to observational data. Automatic differentiation.
– Followed by an optimization step that generally will not fit on EC2 nodes (large memory)
– Loops over; so a lot of data transfer involved.
● 32, 60 or 64 processor runs usually.
● Very I/O intensive (60-120 or more GB of input data, 25-200 or more GB of output data that need to be kept, more in terms of intermediate files).
● Per-process I/O useful but bothersome.
● Ensembles of forward runs less I/O demanding (MTC at large scale?)
Modes of usage
● Stand-alone (interactive) on-demand EC2 cluster
● Stand-alone (batch) on-demand EC2 cluster
– Torque or SGE
● Augmented local cluster with EC2 nodes
– We have a Torque setup
– Used recipes for SGE setup.
● Project Hedeby
● Parallel filesystems: PVFS2/GlusterFS/FhGFS
● Inhomogeneity needs to be kept in mind
● Security issues need to be worked out.
Optimizing compiler issues
● Two high-performance compilers can be deployed without licensing issues for academics and may perform better: Open64 and Sun Studio 12.
– The latter provides an 11.5% performance boost for the geometric mean of the tests (up to 25% for MG).
– The MPI runtime may need to be rebuilt for the new compiler every time
● To use the Intel, Absoft or PGI compilers, one can employ a local virtual machine with a valid software license, using the same OS and middleware as the virtual cluster, and then run the executables on the EC2 cluster.

[Chart: NPB geometric mean in Mop/s by compiler (gcc 4.1, open64 4.1, Pathscale 2.5, PGI 6.1, Absoft 10, Intel 9.1, Studio 12). Class W: 148.83, 150.22, 151.11, 159.46, 149.91, 150.97, 165.91. Class A: 131.57, 139.01, 141.25, 143.82, 139.02, 146.]
The economics of Clouds
● So can we move to an all-cloud option for our HPC needs?
– The enticement: no more worrying about hardware maintenance, upgrades, network administration, possibly system administration (using pre-configured clusters), leading to lower costs.
– At the same time, "virtual" clusters retain part of the "cluster"-hugging mentality of some users.
– And at the institute level:
– No need to worry about building/renovating/retrofitting datacenters
– And, most importantly in days of increasing energy costs, you don't see electricity bills anymore
– The carbon impact becomes someone else's problem.
An exercise
● Part of an effort at MIT for investigating future needs:
– $0.68/hour for a 2-cpu, 8-core Xeon instance on Amazon EC2 (cheapest option offering fullest flexibility currently available)
– Cost of a 158-rack, 2U, low-density, 21-nodes-per-rack equivalent is 158 x 21 x $0.68 = $2256.24 per hour.
– Using reserved instances it is 158 x 21 x $0.24 = $796.3 per hour.
– Assuming an 85% utilization, that amounts to 2256.24 x 24 x 365 x 0.85 = $16.8 million per year, ~7 times our expected electricity bill for a highly efficient datacenter. With reserved instances: 2800/3 x 158 x 21 + 2654.4 x 24 x 365 x 0.24 = $8.7 million per annum.
– With the cost of building a datacenter, the cloud costs more after 4 (9) years (or less for more racks).
– But sporadic use is very well suited economically to the use of clouds.
– Gigabit Ethernet limitations for large instance counts.
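The annual on-demand figure above follows directly from the hourly rate; a break-even helper is sketched with the datacenter build cost left as a free parameter, since the slide does not state that number:

```python
# 158 racks x 21 nodes x $0.68/hour, 85% utilization (the slide's assumptions).
hourly_on_demand = 158 * 21 * 0.68
annual_cloud = hourly_on_demand * 24 * 365 * 0.85
print(round(annual_cloud / 1e6, 1))     # -> 16.8 (million dollars per year)

def breakeven_years(build_cost, annual_own):
    # Years until cumulative cloud spend exceeds building and running your own.
    return build_cost / (annual_cloud - annual_own)
```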
Education
● Using cloud computing for geosciences education.
– Multi-tiered approach (client-server emphasis)
– Up to one Amazon EC2 instance per student (full cpu power for each student if needed)
– VNC or other remote visualization approach
– Menu/forms-driven models
– Web interface integrating course material with demonstrations
– Simulations mimicking experiments run in class
● MPI/OpenMP class taught at MIT (IAP 2008-10)
– EC2 and/or VMware image
Educational uses
● The opportunity to host all of ESSE's computational needs on EC2 allows for a vision of ocean DA for education.
● CITE (Cloud-computing Infrastructure and Technology for Education): an NSF STCI project.
Virtual teaching environment
LCML/LEGEND
● LCML (Legacy Computing Markup Language) is an XML Schema-based framework for encapsulating the build-time and run-time configuration of legacy binaries, alongside constraints.
● It was implemented for ocean/climate models but designed for general applications that use Makefiles, imake, cmake, autoconf etc. to set up their build-time configuration (not ant).
● LEGEND is a Java-based validating GUI generator that parses LCML files describing an application and produces a GUI for the user to build and run the model.
LEGEND in action