Large scale simulations of astrophysical turbulence
Axel Brandenburg (Nordita, Copenhagen)
Wolfgang Dobler (Univ. Calgary)
Anders Johansen (MPIA, Heidelberg)
Antony Mee (Univ. Newcastle)
Nils Haugen (NTNU, Trondheim)
etc.
(...just google for Pencil Code)
2
Overview
• History: as many versions as there are people??
• Example of a cost-effective MPI code
– Ideal for Linux clusters
– Pencil formulation (advantages, headaches)
– (Radiation: as a 3-step process)
• How to manage the contributions of 20+ people
– Development issues, CVS maintenance
• Numerical issues
– High-order schemes, tests
• Peculiarities on big Linux clusters
– Online data processing/visualization
3
Pencil Code
• Started in Sept. 2001 with Wolfgang Dobler
• High order (6th order in space, 3rd order in time)
• Cache & memory efficient
• MPI, can run PacxMPI (across countries!)
• Maintained/developed by many people (CVS!)
• Automatic validation (over night or any time)
• Max resolution so far 1024³, 256 procs
4
Range of applications
• Isotropic turbulence
– MHD (Haugen), passive scalar (Käpylä), cosmic rays (Snod, Mee)
• Stratified layers
– Convection, radiative transport (T. Heinemann)
• Shearing box
– MRI (Haugen), planetesimals, dust (A. Johansen), interstellar (A. Mee)
• Sphere embedded in box
– Fully convective stars (W. Dobler), geodynamo (D. McMillan)
• Other applications and future plans
– Homochirality (models of origins of life, with T. Multamäki)
– Spherical coordinates
5
Pencil formulation
• In CRAY days: worked with full chunks f(nx,ny,nz,nvar)
– Now, on SGI, nearly 100% cache misses
• Instead work with f(nx,nvar), i.e. one nx-pencil
• No cache misses, negligible work space, just 2N
– Can keep all components of derivative tensors
• Communication before sub-timestep
• Then evaluate all derivatives, e.g. call curl(f,iA,B)
– Vector potential A=f(:,:,:,iAx:iAz), B=B(nx,3)
6
A few headaches
• All operations must be combined
– curl(curl), max5(smooth(divu)) must be done in one go
– Out-of-pencil exceptions possible
• rms and max values for monitoring
– call max_name(b2,i_bmax,lsqrt=.true.)
– call sum_name(b2,i_brms,lsqrt=.true.)
• Similar routines for toroidal average, etc.
• Online analysis (spectra, slices, vectors)
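The pencil loop and the monitoring calls can be pictured with a small sketch. It is written in Python rather than the code's Fortran, and every name in it (`pencil_diagnostics`, `uniform`, the nested-list layout) is a hypothetical stand-in, not the Pencil Code API. It only illustrates the idea: visit one x-pencil at a time, form b² on that pencil, accumulate the maximum and the sum, and take the square root once at the end, which is presumably what `lsqrt=.true.` requests.

```python
import math

def pencil_diagnostics(bx, by, bz):
    """Return (bmax, brms) by sweeping the grid one x-pencil at a time.

    bx, by, bz are nested lists indexed [iz][iy][ix] (a hypothetical layout).
    Only one pencil of work space (b2) exists at any moment.
    """
    nz, ny, nx = len(bx), len(bx[0]), len(bx[0][0])
    b2_max = 0.0  # running maximum of b^2
    b2_sum = 0.0  # running sum of b^2
    for iz in range(nz):
        for iy in range(ny):
            # work array for a single pencil of length nx
            b2 = [bx[iz][iy][ix] ** 2 + by[iz][iy][ix] ** 2 + bz[iz][iy][ix] ** 2
                  for ix in range(nx)]
            b2_max = max(b2_max, max(b2))
            b2_sum += sum(b2)
    npoints = nx * ny * nz
    # square root taken once, after the accumulation
    return math.sqrt(b2_max), math.sqrt(b2_sum / npoints)


def uniform(value, nx, ny, nz):
    """Build a constant field on an nx*ny*nz grid (test helper)."""
    return [[[value] * nx for _ in range(ny)] for _ in range(nz)]


# Uniform field b = (3, 4, 0): both the maximum and the rms of |b| equal 5.
bmax, brms = pencil_diagnostics(uniform(3.0, 4, 2, 2),
                                uniform(4.0, 4, 2, 2),
                                uniform(0.0, 4, 2, 2))
print(bmax, brms)
```

The point of the design is that the diagnostics never need a second full-size array: the only temporary is one pencil of length nx.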
7
CVS maintained
• pserver (password protected, port 2301)
– non-public (ci/co, 21 people)
– public (check-out only, 127 registered users)
• Set of 15 test problems in the auto-test
– Nightly auto-test (different machines, web)
• Before check-in: run auto-test yourself
• mpi, or the nompi dummy module for single-processor machines (or use LAM/MPI on laptops)
8
Switch modules
• magnetic or nomagnetic (e.g. just hydro)
• hydro or nohydro (e.g. kinematic dynamo)
• density or nodensity (burgulence)
• entropy or noentropy (e.g. isothermal)
• radiation or noradiation (solar convection, discs)
• dustvelocity or nodustvelocity (planetesimals)
• Coagulation, reaction equations
• Homochirality (reaction-diffusion-advection equations)
9
Features, problems
• Namelist (can freely introduce new params)
• Upgrades forgotten on no-modules (auto-test)
• SGI namelist problem (see pencil FAQs)
10
Pencil Code check-ins
11
High-order schemes
• Alternative to spectral or compact schemes
– Efficiently parallelized, no transpose necessary
– No restriction on boundary conditions
– Curvilinear coordinates possible (except for singularities)
• 6th order central differences in space
• Non-conservative scheme
– Allows use of logarithmic density and entropy
– Copes well with strong stratification and temperature contrasts
12
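(i) High-order spatial schemes
$$ f'_i = \frac{-f_{i-3} + 9 f_{i-2} - 45 f_{i-1} + 45 f_{i+1} - 9 f_{i+2} + f_{i+3}}{60\,\delta x} $$
$$ f''_i = \frac{2 f_{i-3} - 27 f_{i-2} + 270 f_{i-1} - 490 f_i + 270 f_{i+1} - 27 f_{i+2} + 2 f_{i+3}}{180\,\delta x^2} $$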
Main advantage: low phase errors
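A minimal sketch of the two 6th-order stencils, assuming a periodic grid and the standard central-difference weights (-1, 9, -45, 0, 45, -9, 1)/60 and (2, -27, 270, -490, 270, -27, 2)/180. The function names are illustrative and not taken from the Pencil Code. Halving the mesh spacing should reduce the error by roughly 2⁶ = 64 if the order is right.

```python
import math

def deriv6(f, dx):
    """First derivative, 6th-order central differences, periodic grid."""
    n = len(f)
    return [(-f[i - 3] + 9 * f[i - 2] - 45 * f[i - 1]
             + 45 * f[(i + 1) % n] - 9 * f[(i + 2) % n] + f[(i + 3) % n]) / (60 * dx)
            for i in range(n)]

def deriv6_2(f, dx):
    """Second derivative, 6th-order central differences, periodic grid."""
    n = len(f)
    return [(2 * f[i - 3] - 27 * f[i - 2] + 270 * f[i - 1] - 490 * f[i]
             + 270 * f[(i + 1) % n] - 27 * f[(i + 2) % n] + 2 * f[(i + 3) % n])
            / (180 * dx * dx)
            for i in range(n)]

def max_errors(n):
    """Maximum errors of both stencils for f = sin(x) on n points in [0, 2*pi)."""
    dx = 2 * math.pi / n
    x = [i * dx for i in range(n)]
    f = [math.sin(xi) for xi in x]
    e1 = max(abs(d - math.cos(xi)) for d, xi in zip(deriv6(f, dx), x))
    e2 = max(abs(d + math.sin(xi)) for d, xi in zip(deriv6_2(f, dx), x))
    return e1, e2

e1_coarse, e2_coarse = max_errors(16)
e1_fine, e2_fine = max_errors(32)
# Both ratios should be close to 2**6 = 64 for a 6th-order scheme.
print(e1_coarse / e1_fine, e2_coarse / e2_fine)
```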
13
Wavenumber characteristics
14
Higher order – less viscosity
15
Less viscosity – also in shocks
16
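(ii) High-order temporal schemes
$$ w_i = \alpha_i w_{i-1} + \delta t\, F(t_{i-1}, u_{i-1}), \qquad u_i = u_{i-1} + \beta_i w_i $$
with $u_0 = u^{(n)}$ and $u^{(n+1)} = u_3$.
Main advantage: low amplitude errors
3rd order: $(\alpha_1, \alpha_2, \alpha_3) = (0, -2/3, -1)$, $(\beta_1, \beta_2, \beta_3) = (1/3, 1, 1/2)$
2nd order: $(\alpha_1, \alpha_2) = (0, -1/2)$, $(\beta_1, \beta_2) = (1/2, 1)$
1st order: $\alpha_1 = 0$, $\beta_1 = 1$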
2N-RK3 scheme (Williamson 1980)
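A minimal sketch of the low-storage (2N) update w_i = α_i w_{i-1} + δt F(u_{i-1}), u_i = u_{i-1} + β_i w_i. It assumes the 3rd-order coefficients α = (0, -2/3, -1) and β = (1/3, 1, 1/2), with the minus signs restored by hand, and a right-hand side with no explicit time dependence. The function names are illustrative. The convergence test at the end is there to check that assumption: halving the time step should reduce the error by about 2³ = 8.

```python
import math

# Coefficients as read off the slide, with the minus signs of alpha restored
# (an assumption, checked below by measuring the convergence order).
ALPHA = (0.0, -2.0 / 3.0, -1.0)
BETA = (1.0 / 3.0, 1.0, 0.5)

def rk3_2n_step(rhs, u, dt):
    """Advance u by one time step; only u and w are stored (hence '2N')."""
    w = 0.0
    for alpha, beta in zip(ALPHA, BETA):
        w = alpha * w + dt * rhs(u)
        u = u + beta * w
    return u

def integrate(rhs, u0, t_end, nsteps):
    """Integrate du/dt = rhs(u) from 0 to t_end in nsteps equal steps."""
    u, dt = u0, t_end / nsteps
    for _ in range(nsteps):
        u = rk3_2n_step(rhs, u, dt)
    return u

def decay(u):
    """Test problem du/dt = -u, exact solution exp(-t)."""
    return -u

err_coarse = abs(integrate(decay, 1.0, 1.0, 10) - math.exp(-1.0))
err_fine = abs(integrate(decay, 1.0, 1.0, 20) - math.exp(-1.0))
# The error ratio should be near 2**3 = 8 for a 3rd-order scheme.
print(err_coarse, err_fine, err_coarse / err_fine)
```

Only two arrays per variable are needed (u and w), which is why this family of schemes suits memory-bound codes.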
17
Shock tube testShock tube test
18
Hydromagnetic turbulence and subgrid scale models?
• Want to shorten diffusive subrange
– Waste of resources
• Want to prolong inertial range
• Smagorinsky (LES), hyperviscosity, …
– Focus on essential physics (i.e. inertial range)
• Reasons to be worried about hyperviscosity
– Shallower spectra
– Wrong amplitudes of resulting large scale fields
$\nu\nabla^2 \to \nu_2\nabla^4$
19
Simulations at 512³
With hyperdiffusivity
Normal diffusivity
Biskamp & Müller (2000)
20
The bottleneck is a physical effect
Porter, Pouquet, & Woodward (1998) using PPM, 1024³ meshpoints
Kaneda et al. (2003) on the Earth Simulator, 4096³ meshpoints
(dashed: Pencil Code with 1024³)
compensated spectrum
21
Bottleneck effect: 1D vs 3D spectra
Compensated spectra (1D vs 3D)
22
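Relation to ‘laboratory’ 1D spectra
$$ E_{3D}(k) = 4\pi k^2 |\mathbf{u}_{\mathbf{k}}|^2, \qquad E_{1D}(k_z) = 2 \iint |\mathbf{u}(k_x,k_y,k_z)|^2 \, dk_x \, dk_y $$
$$ E_{1D}(k_z) = 4\pi \int_0^\infty |\mathbf{u}(k_\perp,k_z)|^2 \, k_\perp \, dk_\perp = 4\pi \int_{k_z}^\infty |\mathbf{u}_{\mathbf{k}}|^2 \, k \, dk = \int_{k_z}^\infty \frac{E_{3D}(k)}{k} \, dk $$
with $k^2 = k_\perp^2 + k_z^2$.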
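Taking the relation between the two spectra to be $E_{1D}(k_z) = \int_{k_z}^\infty E_{3D}(k)\,k^{-1}\,dk$ (the form these formulas lead to), a pure power law gives a quick worked case. The constant $C$ and the exponent $n$ are illustrative symbols, not from the slides.

```latex
E_{3D}(k) = C\,k^{-n}
\quad\Longrightarrow\quad
E_{1D}(k_z) = \int_{k_z}^{\infty} C\,k^{-n-1}\,dk = \frac{C}{n}\,k_z^{-n},
\qquad n > 0 .
```

For n = 5/3 the 1D spectrum has the same slope but 3/5 of the amplitude. Because E_1D at a given k_z is an integral over all k ≥ k_z, a localized bump in E_3D is spread out in E_1D.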
23
Hyperviscous, Smagorinsky, normal
Inertial range unaffected by artificial diffusion
Haugen & Brandenburg (PRE, astro-ph/0402301)
height of bottleneck increased
onset of bottleneck at same position
24
256 processor run at 1024³
25
Structure function exponents
agrees with She-Leveque third moment
26
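Helical dynamo saturation with hyperdiffusivity
For ordinary hyperdiffusion ($\eta_2 k^4$): $k_1^3 B^2 = k_f^3 b^2$
Compare normal diffusion: $k_1 B^2 = k_f b^2$
Ratio $B^2/b^2$: 125 instead of 5
$$ \frac{d}{dt}\langle \mathbf{A}\cdot\mathbf{B} \rangle = -2\eta\,\langle \mathbf{J}\cdot\mathbf{B} \rangle $$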
27
Slow-down explained by magnetic helicity conservation
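$$ \frac{d}{dt}\langle \mathbf{A}\cdot\mathbf{B} \rangle = -2\eta\,\langle \mathbf{J}\cdot\mathbf{B} \rangle $$
$$ k_m^{-1}\,\frac{d}{dt} B_m^2 = -2\eta k_m B_m^2 + 2\eta k_f b_f^2 $$
($\eta$: molecular value!!)
$$ B_m^2 = b_f^2\,\frac{k_f}{k_m}\left[ 1 - e^{-2\eta k_m^2 (t - t_s)} \right] $$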
28
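MHD equations
Induction equation (magnetic vector potential):
$$ \frac{\partial \mathbf{A}}{\partial t} = \mathbf{u} \times \mathbf{B} - \eta\,\mathbf{J} $$
Momentum and continuity equations:
$$ \frac{D\mathbf{u}}{Dt} = -c_s^2 \nabla \ln\rho + \frac{1}{\rho}\,\mathbf{J} \times \mathbf{B} + \nu \left( \nabla^2 \mathbf{u} + \tfrac{1}{3} \nabla\nabla\cdot\mathbf{u} \right) + \mathbf{f} $$
$$ \frac{D \ln\rho}{Dt} = -\nabla\cdot\mathbf{u} $$
with $\mathbf{B} = \nabla \times \mathbf{A}$ and $\mathbf{J} = \nabla \times \mathbf{B}$.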
29
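Vector potential
• B = curl A, advantage: div B = 0
• J = curl B = curl(curl A) = curl² A
• Not a disadvantage: consider Alfvén waves
B-formulation:
$$ \frac{\partial u}{\partial t} = B_0 \frac{\partial b}{\partial z}, \quad \text{and} \quad \frac{\partial b}{\partial t} = B_0 \frac{\partial u}{\partial z} $$
A-formulation:
$$ \frac{\partial u}{\partial t} = B_0 \frac{\partial^2 a}{\partial z^2}, \quad \text{and} \quad \frac{\partial a}{\partial t} = B_0 u $$
2nd derivative once is better than 1st derivative twice!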
30
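Comparison of A and B methods
A method:
$$ \frac{\partial u}{\partial t} = B_0 \frac{\partial^2 a}{\partial z^2} + \nu \frac{\partial^2 u}{\partial z^2}, \quad \text{and} \quad \frac{\partial a}{\partial t} = B_0 u + \eta \frac{\partial^2 a}{\partial z^2} $$
B method:
$$ \frac{\partial u}{\partial t} = B_0 \frac{\partial b}{\partial z} + \nu \frac{\partial^2 u}{\partial z^2}, \quad \text{and} \quad \frac{\partial b}{\partial t} = B_0 \frac{\partial u}{\partial z} + \eta \frac{\partial^2 b}{\partial z^2} $$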
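The claim that one second derivative is better than two first derivatives can be checked on the stencils themselves. The sketch below assumes the standard 6th-order central-difference weights and evaluates how each operator acts on a Fourier mode exp(ikx); the function names are illustrative. At the Nyquist wavenumber (k δx = π) the first-derivative stencil returns zero, so applying it twice cannot see the grid-scale zigzag at all, whereas the second-derivative stencil still does.

```python
import math

def keff_first(theta):
    """Effective k*dx of the 6th-order first-derivative stencil (theta = k*dx)."""
    return (45 * math.sin(theta) - 9 * math.sin(2 * theta)
            + math.sin(3 * theta)) / 30

def keff2_second(theta):
    """Effective (k*dx)**2 of the 6th-order second-derivative stencil."""
    return (490 - 540 * math.cos(theta) + 54 * math.cos(2 * theta)
            - 4 * math.cos(3 * theta)) / 180

for theta in (0.25 * math.pi, 0.5 * math.pi, math.pi):
    twice = keff_first(theta) ** 2   # first derivative applied twice
    once = keff2_second(theta)       # second derivative applied once
    print(f"k*dx={theta:.3f}  exact={theta ** 2:.3f}  "
          f"1st-twice={twice:.3f}  2nd-once={once:.3f}")
```

At k δx = π the exact value is π² ≈ 9.87; the second-derivative stencil gives 1088/180 ≈ 6.04, while the first derivative applied twice gives zero up to rounding.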
31
Wallclock time versus processor #
nearly linear scaling
100 Mb/s shows limitations
1 - 10 Gb/s: no limitation
32
Sensitivity to layout on Linux clusters
yproc x zproc    relative speed
4 x 32           1
8 x 16           3 times slower
16 x 8           17 times slower
Gigabit uplink; 100 Mbit link only
24 procs per hub
33
Why this sensitivity to layout?
[Figure: processor numbers laid out 16 per row]
All processors need to communicate with processors outside the group of 24
34
Use exactly 4 columns
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
16 17 18 19
20 21 22 23
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
Only 2 x 4 = 8 processors need to communicate outside the group of 24
→ optimal use of the speed ratio between the 100 Mb Ethernet switch and the 1 Gb uplink
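The counting argument can be reproduced with a short sketch. Its assumptions are not from the slides: ranks are numbered with y running fastest (rank = iy + nprocy*iz), consecutive ranks fill the 24-port hubs in order, and both directions are periodic. The function name `ranks_talking_outside` is hypothetical.

```python
def ranks_talking_outside(nprocy, nprocz, hub_size=24, hub=0):
    """Count ranks on one hub that have a y- or z-neighbour on another hub.

    Assumptions (not from the slides): rank = iy + nprocy*iz, consecutive
    ranks fill the hubs in order, and both directions are periodic.
    """
    count = 0
    first = hub * hub_size
    last = min(first + hub_size, nprocy * nprocz)
    for rank in range(first, last):
        iy, iz = rank % nprocy, rank // nprocy
        neighbours = [
            (iy - 1) % nprocy + nprocy * iz,        # left neighbour in y
            (iy + 1) % nprocy + nprocy * iz,        # right neighbour in y
            iy + nprocy * ((iz - 1) % nprocz),      # lower neighbour in z
            iy + nprocy * ((iz + 1) % nprocz),      # upper neighbour in z
        ]
        if any(nb // hub_size != hub for nb in neighbours):
            count += 1
    return count

for nprocy, nprocz in ((4, 32), (8, 16), (16, 8)):
    print(nprocy, "x", nprocz, "->", ranks_talking_outside(nprocy, nprocz))
```

Under these assumptions the first hub has 8, 16 and 24 of its 24 processors talking across the slow uplink for the 4 x 32, 8 x 16 and 16 x 8 layouts, which matches the ordering of the measured slow-downs.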
35
Fragmentation over many switches
36
Pre-processed data for animations
37
Ma=3 supersonic turbulence
38
Animation of B vectors
39
Animation of energy spectra
Very long run at 512³ resolution
40
MRI turbulence
MRI = magnetorotational instability
256³, w/o hypervisc., t = 600 = 20 orbits
512³, w/o hypervisc., t = 60 = 2 orbits
41
Fully convective star
42
Geodynamo simulation
43
Homochirality: competition of left/right
Reaction-diffusion equation
[Reaction-diffusion equations for X and Y; not recoverable from the extracted text]
44
Conclusions
• Subgrid scale modeling can be unsafe (some problems)
– shallower spectra, longer time scales, different saturation amplitudes (in helical dynamos)
• High order schemes
– Low phase and amplitude errors
– Need less viscosity
• 100 Mb link close to bandwidth limit
• Comparable to and now faster than Origin
• 2x faster with Gb switch
• 100 Mb switches with Gb uplink +/- optimal
45
Transfer equation & parallelization
Analytic Solution:
Ray direction
Intrinsic Calculation
Processors
$$ \frac{dI}{d\tau} = -I + S $$
46
The Transfer Equation & Parallelization
Analytic Solution:
Ray direction
Communication
Processors
47
The Transfer Equation & Parallelization
Analytic Solution:
Ray direction
Processors
Intrinsic Calculation
48
Current implementation
• Plasma composed of H and He
• Only hydrogen ionization
• Only H⁻ opacity, calculated analytically
– No need for look-up tables
• Ray directions determined by grid geometry
– No interpolation is needed
49
Convection with radiation