• Problem is to compute: f(latitude, longitude, elevation, time) -> temperature, pressure, humidity, wind velocity
• Approach:
– Discretize the domain, e.g., a measurement point every 10 km
– Devise an algorithm to predict weather at time t+1 given t
• Uses:
– Predict major events, e.g., El Nino
– Use in setting air emissions standards
Source: http://www.epm.ornl.gov/chammp/chammp.html
Weather Forecasting
An accurate long-range forecast requires a huge number of cells and hence a huge amount of computation.
Case Study: Global Climate Modelling
The earth's surface is approximately 5 x 10^8 km^2.
Consider one cell per square km with 15 levels (ground level up to 14 km high).
6 data values per cell (updated once every minute): humidity, temperature, wind, latitude, longitude and height.
Throughput: 3 Gigabytes of data per second
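The throughput figure can be checked with a few lines of arithmetic. The sketch below is only a back-of-envelope check: the slide does not say how large each data value is, so 4 bytes per value (single precision) is an assumption, chosen because it reproduces the quoted 3 Gigabytes per second. All constant and function names are illustrative.

```python
# Back-of-envelope check of the data rate quoted for the case study.
SURFACE_KM2 = 5e8        # earth's surface, one cell per square km (from the slide)
LEVELS = 15              # ground level up to 14 km high (from the slide)
VALUES_PER_CELL = 6      # humidity, temperature, wind, latitude, longitude, height
UPDATE_PERIOD_S = 60     # every value is updated once a minute
BYTES_PER_VALUE = 4      # ASSUMPTION: single-precision values; not stated in the slide


def throughput_bytes_per_second(bytes_per_value=BYTES_PER_VALUE):
    """Bytes that must be produced per second to refresh every cell each minute."""
    cells = SURFACE_KM2 * LEVELS
    bytes_per_update = cells * VALUES_PER_CELL * bytes_per_value
    return bytes_per_update / UPDATE_PERIOD_S


if __name__ == "__main__":
    rate = throughput_bytes_per_second()
    print(f"cells: {SURFACE_KM2 * LEVELS:.2e}")
    print(f"throughput: {rate / 1e9:.1f} GB/s")
```

With 8-byte (double-precision) values the same calculation gives twice the rate, so the quoted figure depends on the assumed value size.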
Global Climate Modeling Computation
• One piece is modeling the fluid flow in the atmosphere
– Solve Navier-Stokes problem
• Roughly 100 Flops per grid point with 1 minute timestep
• Computational requirements:
– To match real-time, need 5 x 10^11 flops in 60 seconds = 8 Gflop/s
– Weather prediction (7 days in 24 hours): 56 Gflop/s
– Climate prediction (50 years in 30 days): 4.8 Tflop/s
– To use in policy negotiations (50 years in 12 hours): 288 Tflop/s
• To double the grid resolution, computation is at least 8x
• State of the art models require integration of atmosphere, ocean, sea-ice, land models, plus possibly carbon cycle, geochemistry and more
• Current models are coarser than this
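The computational requirements above all follow from one scaling rule: required rate = real-time rate x (simulated time / wall-clock time). The sketch below reproduces the quoted figures under two assumptions that are inferred from the numbers rather than stated in the slides: the real-time rate is rounded to 8 Gflop/s before scaling, and a year is taken as 360 days. The function names are illustrative.

```python
# Sketch reproducing the compute-rate figures for global climate modeling.
FLOPS_PER_TIMESTEP = 5e11      # from the slide: work for one 1-minute step of the grid
TIMESTEP_S = 60                # from the slide: 1 minute timestep
ROUNDED_REAL_TIME_RATE = 8e9   # ASSUMPTION: slide rounds 5e11 / 60 (about 8.33e9) to 8 Gflop/s
DAYS_PER_YEAR = 360            # ASSUMPTION: matches the quoted 4.8 and 288 Tflop/s


def real_time_rate():
    """Flop/s needed to simulate exactly as fast as real time (unrounded)."""
    return FLOPS_PER_TIMESTEP / TIMESTEP_S


def required_rate(simulated_days, wallclock_days, base_rate=ROUNDED_REAL_TIME_RATE):
    """Flop/s needed to simulate `simulated_days` within `wallclock_days`."""
    return base_rate * simulated_days / wallclock_days


def refinement_cost_factor(refinement, dims=3):
    """Growth in work when resolution improves by `refinement` in each of `dims`
    dimensions. ASSUMPTION: work is proportional to the number of grid points."""
    return refinement ** dims


if __name__ == "__main__":
    print(f"real time (unrounded): {real_time_rate() / 1e9:.2f} Gflop/s")
    print(f"weather, 7 days in 24 h: {required_rate(7, 1) / 1e9:.0f} Gflop/s")
    print(f"climate, 50 years in 30 days: "
          f"{required_rate(50 * DAYS_PER_YEAR, 30) / 1e12:.1f} Tflop/s")
    print(f"policy, 50 years in 12 h: "
          f"{required_rate(50 * DAYS_PER_YEAR, 0.5) / 1e12:.0f} Tflop/s")
    print(f"doubling resolution: {refinement_cost_factor(2)}x work")
```

The factor of 8 for doubled resolution is a lower bound under the grid-point assumption; if the timestep must also shrink with the cell size, the cost grows faster, which is consistent with the slide's "at least 8x".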
Weather Forecasting
Computer Visualisation of a Hurricane
High Resolution Climate Modeling on NERSC-3 – P. Duffy, et al., LLNL
Protein Folding
One of the major challenges in molecular biology. Proteins perform over a thousand different jobs. (As enzymes they accelerate reactions. They also carry oxygen and act as antibodies to fight disease.)
Before proteins can go to work they must fold into the correct shape. (The string of amino acids in the protein twists and folds to form the final protein.)
Scientists are using supercomputers to discover the rules that describe why a string of amino acids folds into a particular protein.
Protein Folding
Researchers at the Pittsburgh Computing Center tracked the folding of a small protein (300 amino acids) in water (~32000 atoms).
Folding time: 1 millisecond
#FLOPS required: 3 x 10^22
With a PetaFLOP computer the simulation would take a year.
IBM are funding a $100M project called The Blue Gene Project to build a 1 PetaFLOP/s computer (PetaScale Computing – 10^15 Flops/s)
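The "would take a year" estimate follows directly from dividing the operation count by the machine rate. A minimal check, assuming the machine sustains its full 10^15 flop/s for the whole run (an idealization; the names are illustrative):

```python
# Check of the run-time estimate for the protein folding simulation.
TOTAL_FLOP = 3e22          # from the slide: operations needed for 1 ms of folding
PETAFLOP_RATE = 1e15       # from the slide: 1 PetaFLOP/s machine
SECONDS_PER_DAY = 86_400


def simulation_days(total_flop=TOTAL_FLOP, rate=PETAFLOP_RATE):
    """Wall-clock days needed, ASSUMING the peak rate is sustained throughout."""
    return total_flop / rate / SECONDS_PER_DAY


if __name__ == "__main__":
    days = simulation_days()
    print(f"{days:.0f} days, about {days / 365:.2f} years")
```

The result is a little under one year, which matches the slide's rounded figure.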
The Production of Toy Story
• 140,000 frames rendered for full-length feature film.
• 10,000 seconds required to render each frame.
• ~ 10^17 operations
• Operations were distributed over dozens of Sun workstations, ~ 10 MIPS ( millions of instructions per second ) per Sun.
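The rendering numbers can be combined into a rough total. The sketch below multiplies frames by seconds per frame and by the ~10 MIPS rate; it assumes the 10,000 seconds per frame is time on a single workstation, which the slide does not state. The workstation count in the usage lines is hypothetical, since the slide only says "dozens".

```python
# Rough totals for the Toy Story rendering workload.
FRAMES = 140_000                  # from the slide
SECONDS_PER_FRAME = 10_000        # from the slide; ASSUMED to be single-workstation time
INSTRUCTIONS_PER_SECOND = 10e6    # ~10 MIPS per Sun workstation (from the slide)
SECONDS_PER_DAY = 86_400


def total_cpu_seconds():
    """Total workstation-seconds spent rendering all frames."""
    return FRAMES * SECONDS_PER_FRAME


def total_instructions():
    """Instructions executed, ASSUMING each workstation runs at its 10 MIPS rating."""
    return total_cpu_seconds() * INSTRUCTIONS_PER_SECOND


def wallclock_days(workstations):
    """Elapsed days if frames are spread evenly over `workstations` machines."""
    return total_cpu_seconds() / workstations / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"CPU seconds: {total_cpu_seconds():.2e}")
    print(f"instructions: {total_instructions():.2e}")
    # 48 is a HYPOTHETICAL workstation count; the slide only says "dozens".
    print(f"elapsed with 48 workstations: {wallclock_days(48):.0f} days")
```

Under these assumptions the instruction count comes out near 1.4 x 10^16, roughly an order of magnitude below the slide's ~10^17 operations. The slide does not say how operations are counted, so the gap may simply reflect a different unit of work.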
What is Parallel Architecture?
A parallel computer is a collection of processing elements that cooperate to solve large problems fast.
• Some broad issues:
– Resource Allocation:
• how large a collection?
• how powerful are the elements?
• how much memory?
– Data access, Communication and Synchronization
• how do the elements cooperate and communicate?
• how are data transmitted between processors?
• what are the abstractions and primitives for cooperation?
– Performance and Scalability
• how does it all translate into performance?
• how does it scale?
Why Study Parallel Architecture?
Parallelism:
• Provides alternative to faster clock for performance
• Applies at all levels of system design
• Is a fascinating perspective from which to view architecture
• Is increasingly central in information processing
Architectural Trends
Greatest trend in VLSI generation is increase in parallelism
– Up to 1985: bit level parallelism: 4-bit -> 8-bit -> 16-bit
• slows after 32-bit
• adoption of 64-bit now under way, 128-bit far (not performance issue)
• great inflection point when 32-bit micro and cache fit on a chip
– Mid 80s to mid 90s: instruction level parallelism
• pipelining and simple instruction sets, + compiler advances (RISC)
• on-chip caches and functional units => superscalar execution
• greater sophistication: out of order execution, speculation, prediction
– to deal with control transfer and latency problems
– Next step: thread level parallelism
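To make bit level parallelism concrete, the sketch below adds two 32-bit numbers using an adder of a given word width and counts how many word-sized additions are needed. It is an illustrative software model only, not a description of any particular processor, and the function name is made up for this example.

```python
# Illustrative model of bit level parallelism: a wider word needs fewer add steps.
def add_with_word_size(a, b, total_bits=32, word_bits=8):
    """Add two unsigned `total_bits`-wide integers with a `word_bits`-wide adder.

    Returns (sum modulo 2**total_bits, number of word-sized additions used).
    """
    mask = (1 << word_bits) - 1
    carry = 0
    result = 0
    steps = 0
    for shift in range(0, total_bits, word_bits):
        # Add one word-sized slice of each operand plus the incoming carry.
        partial = ((a >> shift) & mask) + ((b >> shift) & mask) + carry
        result |= (partial & mask) << shift
        carry = partial >> word_bits
        steps += 1
    return result & ((1 << total_bits) - 1), steps


if __name__ == "__main__":
    a, b = 0x12345678, 0x0FEDCBA9
    for width in (4, 8, 16, 32):
        total, steps = add_with_word_size(a, b, word_bits=width)
        print(f"{width:2d}-bit adder: {steps} step(s), sum = {total:#010x}")
```

Every width produces the same sum, but the step count falls from 8 to 1 as the word grows from 4 to 32 bits. Once the word covers the common operand size there is little left to gain, which is one way to read "slows after 32-bit".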
[Figure: two plots. Fraction of total cycles (%) vs. number of instructions issued (0 to 6+); speedup vs. instructions issued per cycle (0 to 15).]
How far will ILP go?
• Infinite resources and fetch bandwidth, perfect branch prediction and renaming
– real caches and non-zero miss latencies