Large scale simulations of astrophysical turbulence

Preview:

DESCRIPTION

Large scale simulations of astrophysical turbulence. Axel Brandenburg (Nordita, Copenhagen) Wolfgang Dobler (Univ. Calgary) Anders Johansen (MPIA, Heidelberg) Antony Mee (Univ. Newcastle) Nils Haugen (NTNU, Trondheim) etc. (...just google for Pencil Code ). Overview. - PowerPoint PPT Presentation

Citation preview

Large scale simulations of Large scale simulations of astrophysical turbulenceastrophysical turbulence

Axel Brandenburg (Nordita, Copenhagen)

Wolfgang Dobler (Univ. Calgary)

Anders Johansen (MPIA, Heidelberg)

Antony Mee (Univ. Newcastle)

Nils Haugen (NTNU, Trondheim)

etc.

(...just google for Pencil Code)

2

OverviewOverview• History: as many versions as there are people??• Example of a cost effective MPI code

– Ideal for linux clusters– Pencil formulation (advantages, headaches)– (Radiation: as a 3-step process)

• How to manage the contributions of 20+ people– Development issues, cvs maintainence

• Numerical issues– High-order schemes, tests

• Peculiarities on big linux clusters– Online data processing/visualization

3

Pencil CodePencil Code

• Started in Sept. 2001 with Wolfgang Dobler

• High order (6th order in space, 3rd order in time)

• Cache & memory efficient

• MPI, can run PacxMPI (across countries!)

• Maintained/developed by many people (CVS!)

• Automatic validation (over night or any time)

• Max resolution so far 10243 , 256 procs

4

Range of applicationsRange of applications

• Isotropic turbulence– MHD (Haugen), passive scalar (Käpylä), cosmic rays (Snod, Mee)

• Stratified layers– Convection, radiative transport (T. Heinemann)

• Shearing box– MRI (Haugen), planetesimals, dust (A. Johansen), interstellar (A. Mee)

• Sphere embedded in box– Fully convective stars (W. Dobler), geodynamo (D. McMillan)

• Other applications and future plans– Homochirality (models of origins of life, with T. Multamäki)– Spherical coordinates

5

Pencil formulationPencil formulation

• In CRAY days: worked with full chunks f(nx,ny,nz,nvar)

– Now, on SGI, nearly 100% cache misses

• Instead work with f(nx,nvar), i.e. one nx-pencil• No cache misses, negligible work space, just 2N

– Can keep all components of derivative tensors

• Communication before sub-timestep• Then evaluate all derivatives, e.g. call curl(f,iA,B)

– Vector potential A=f(:,:,:,iAx:iAz), B=B(nx,3)

6

A few headachesA few headaches

• All operations must be combined– Curl(curl), max5(smooth(divu)) must be in one go– out-of-pencil exceptions possible

• rms and max values for monitoring– call max_name(b2,i_bmax,lsqrt=.true.)– call sum_name(b2,i_brms,lsqrt=.true.)

• Similar routines for toroidal average, etc• Online analysis (spectra, slices, vectors)

7

CVS maintainedCVS maintained

• pserver (password protected, port 2301)– non-public (ci/co, 21 people)– public (check-out only, 127 registered users)

• Set of 15 test problems in the auto-test– Nightly auto-test (different machines, web)

• Before check-in: run auto-test yourself• Mpi and nompi dummy module for single

processor machine (or use lammpi on laptops)

8

Switch modulesSwitch modules

• magnetic or nomagnetic (e.g. just hydro)• hydro or nohydro (e.g. kinematic dynamo)• density or nodensity (burgulence)• entropy or noentropy (e.g. isothermal)• radiation or noradiation (solar convection, discs)• dustvelocity or nodustvelocity (planetesimals)• Coagulation, reaction equations• Homochirality (reaction-diffusion-advection equations)

9

Features, problemsFeatures, problems

• Namelist (can freely introduce new params)

• Upgrades forgotten on no-modules (auto-test)

• SGI namelist problem (see pencil FAQs)

10

Pencil Code check-insPencil Code check-ins

11

High-order schemesHigh-order schemes

• Alternative to spectral or compact schemes– Efficiently parallelized, no transpose necessary

– No restriction on boundary conditions

– Curvilinear coordinates possible (except for singularities)

• 6th order central differences in space• Non-conservative scheme

– Allows use of logarithmic density and entropy

– Copes well with strong stratification and temperature contrasts

12

(i) High-order spatial schemes(i) High-order spatial schemes

x

fffffff iiiiii

i 60

945459 321123'

2321123''

180

227270490270272

x

ffffffff iiiiiii

i

Main advantage: low phase errors

13

Wavenumber characteristicsWavenumber characteristics

14

Higher order – less viscosityHigher order – less viscosity

15

Less viscosity – also in Less viscosity – also in shocksshocks

16

(ii) High-order temporal schemes(ii) High-order temporal schemes

),( 111 iiiii utFtww

Main advantage: low amplitude errors

iiii wuu 1

3)()(

0 , uuuu nn

2/1 ,1 ,3/1

1 ,3/2 ,0

321

321

1 ,2/1

2/1 ,0

21

21

1

0

1

1

3rd order

2nd order

1st order

2N-RK3 scheme (Williamson 1980)

17

Shock tube testShock tube test

18

Hydromagnetic turbulence Hydromagnetic turbulence and subgrid scale models?and subgrid scale models?

• Want to shorten diffusive subrange– Waste of resources

• Want to prolong inertial range• Smagorinsky (LES), hyperviscosity, …

– Focus of essential physics (ie inertial range)

• Reasons to be worried about hyperviscosity– Shallower spectra– Wrong amplitudes of resulting large scale fields

42

2

19

Simulations at 512Simulations at 51233

Withhyperdiffusivity

Normaldiffusivity

Biskamp & Müller (2000)

20

The bottleneckThe bottleneck: is a physical effect: is a physical effect

Porter, Pouquet, & Woodward (1998) using PPM, 10243 meshpoints

Kaneda et al. (2003) on the Earth simulator, 40963 meshpoints

(dashed: Pencil-Code with 10243 )

compensated spectrum

21

Bottleneck effect: Bottleneck effect: 1D vs 1D vs 3D3D spectra spectra

Compensated spectra

(1D vs 3D)

22

Relation to ‘laboratory’ 1D spectraRelation to ‘laboratory’ 1D spectra2222

3 )(4)( kuku kdkE kD yxkyxkE zzD d d ),,(2)(

2

1 u

kkkkkkkzk

z d )(4d ),(42

0

2

uu

kk

E

zk

D d 3

0zk

222zkkk

23

Hyperviscous, Smagorinsky, normalHyperviscous, Smagorinsky, normal

Inertial range unaffected by artificial diffusionHau

gen

& B

rand

enbu

rg (

PR

E, a

stro

-ph/

0402

301)

height of bottleneck increased

onset of bottleneck at same position

24

256 processor run at 1024256 processor run at 102433

25

Structure function exponentsStructure function exponents

agrees with She-Leveque third moment

26

Helical dynamo saturation with Helical dynamo saturation with hyperdiffusivityhyperdiffusivity

23231 f

bB kk

for ordinaryhyperdiffusion

42k

221 f

bB kk ratio 125 instead of 5

BJBA 2d

d

t

27

Slow-down explained by magnetic helicity conservation

2f

2m

21m 22 bBB kk

dt

dk

molecular value!!

BJBA 2dt

d

)(2

m

f22 s2m1 ttke

k

k bB

28

MHD equationsMHD equations

JBuA

t

fuuBJu

)(1

lnD

D3122

sc

t

utD

lnD

AB

BJ

Induction

Equation:

Magn.Vectorpotential

Momentum andContinuity eqns

29

Vector potentialVector potential

• B=curlA, advantage: divB=0• J=curlB=curl(curlA) =curl2A• Not a disadvantage: consider Alfven waves

z

uB

t

b

z

bB

t

u

00 and ,

uBt

a

z

aB

t

u02

2

0 and ,

B-formulation

A-formulation 2nd der onceis better than1st der twice!

30

Comparison of A and B methodsComparison of A and B methods

2

2

02

2

2

2

0 and ,z

auB

t

a

z

u

z

aB

t

u

2

2

02

2

0 and ,z

b

z

uB

t

b

z

u

z

bB

t

u

31

Wallclock time versus processor #

nearly linearScaling

100 Mb/s showslimitations

1 - 10 Gb/sno limitation

32

Sensitivity to layout onSensitivity to layout onLinux clustersLinux clusters

yprox x zproc

4 x 32 1 (speed)

8 x 16 3 times slower

16 x 8 17 times slower

Gigabituplink 100 Mbit

link only

24 procsper hub

33

Why this sensitivity to layout?

0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5

6 7 8 9 0 1 2 3 4

All processors need to communicatewith processors outside to group of 24

34

Use exactly 4 columns

0 1 2 3

4 5 6 7

8 9 10 11

12 13 14 15

16 17 18 19

20 21 22 23

0 1 2 3

4 5 6 7

8 9 10 11

12 13 14 15

Only 2 x 4 = 8 processors need to communicate outside the group of 24 optimal use of speed ratio between 100 Mb ethernet switch and 1 Gb uplink

35

Fragmentation over many switches

36

Pre-processed data for animationsPre-processed data for animations

37

Ma=3 supersonic turbulenceMa=3 supersonic turbulence

38

Animation of B vectorsAnimation of B vectors

39

Animation of energy spectraAnimation of energy spectra

Very long run at 5123 resolution

40

MRI turbulenceMRI turbulenceMRI = magnetorotational instabilityMRI = magnetorotational instability

2563

w/o hypervisc.t = 600 = 20 orbits

5123

w/o hypervisc.t = 60 = 2 orbits

41

Fully convective starFully convective star

42

Geodynamo simulationGeodynamo simulation

43

Homochirality: competition of left/rightHomochirality: competition of left/rightReaction-diffusion equationReaction-diffusion equation

YrYrYY

XrXrXX222

222

~/

~/

44

ConclusionsConclusions• Subgrid scale modeling can be unsafe (some problems)

– shallower spectra, longer time scales, different saturation amplitudes (in helical dynamos)

• High order schemes– Low phase and amplitude errors

– Need less viscosity

• 100 MB link close to bandwidth limit• Comparable to and now faster than Origin• 2x faster with GB switch• 100 MB switches with GB uplink +/- optimal

45

Transfer equation & parTransfer equation & paraallelizationllelization

Analytic Solution:

Ray direction

Intrinsic Calculation

Processors

SII

d

d

46

The Transfer Equation & The Transfer Equation & ParParaallelizationllelization

Analytic Solution:

Ray direction

Communication

Processors

47

The Transfer Equation & The Transfer Equation & ParallelizationParallelization

Analytic Solution:

Ray direction

Processors

Intrinsic Calculation

48

Current implementationCurrent implementation

• Plasma composed of H and He

• Only hydrogen ionization

• Only H- opacity, calculated analytically

No need for look-up tables

• Ray directions determined by grid geometry

No interpolation is needed

49

Convection with radiationConvection with radiation

Recommended