Magneto-hydrodynamics simulation in astrophysics
by
Bijia Pang
A thesis submitted in conformity with the requirements
for the degree of Doctor of Philosophy
Graduate Department of Physics
University of Toronto

Copyright © 2011 by Bijia Pang
Abstract
Magneto-hydrodynamics simulation in astrophysics
Bijia Pang
Doctor of Philosophy
Graduate Department of Physics
University of Toronto
2011
Magnetohydrodynamics (MHD) studies the dynamics of an electrically conducting fluid
under the influence of a magnetic field. Many astrophysical phenomena are related to
MHD, and computer simulations are used to model these dynamics. In this thesis, we
conduct MHD simulations of non-radiative black hole accretion as well as fast magnetic
reconnection. By performing large scale three dimensional parallel MHD simulations on
supercomputers and using a deformed-mesh algorithm, we were able to conduct very high
dynamical range simulations of black hole accretion of Sgr A* at the Galactic Center.
We find a generic set of solutions, and make specific predictions for currently feasible
observations of rotation measure (RM). The magnetized accretion flow is subsonic and
lacks outward convection flux, making the accretion rate very small and having a density
slope of around −1. There is no tendency for the flows to become rotationally supported,
and the slow time variability of the RM is a key quantitative signature of this accretion
flow.
We also provide a constructive numerical example of fast magnetic reconnection in a
three-dimensional periodic box. Reconnection is initiated by a strong, localized perturba-
tion to the field lines and the solution is intrinsically three-dimensional. Approximately
30% of the magnetic energy is released in an event which lasts about one Alfven time,
but only after a delay during which the field lines evolve into a critical configuration. In
the co-moving frame of the reconnection regions, reconnection occurs through an X-like
point, analogous to the Petschek reconnection. The dynamics appear to be driven by
global flows rather than local processes.
In addition to issues pertaining to physics, we present results on the acceleration of
MHD simulations using heterogeneous computing systems [83]. We have implemented
the MHD code on a variety of heterogeneous and multi-core architectures (multi-core x86,
Cell, Nvidia and ATI GPU) using different languages (FORTRAN, C, Cell, CUDA and
OpenCL). Initial performance results for these systems are presented, and we conclude
that substantial gains in performance over traditional systems are possible. In particular,
it is possible to extract a greater percentage of peak theoretical performance from some
heterogeneous systems when compared to x86 architectures.
Acknowledgements
It is a pleasure to thank the many people who made this thesis possible.
First I would like to thank my supervisor, Prof. Ue-Li Pen, for his enthusiastic
teaching and inspiring guiding throughout my Ph.D. study.
I want to thank my committee members, Prof. Christopher D. Matzner, Prof.
Stephen W. Morris, Prof. Sabine Stanley, and Prof. Ralph E. Pudritz for their in-
teresting questions and helpful suggestions for the thesis. Specially, I am grateful to
Prof. Matzner, who has devoted a lot of time on my project. Discussion with him always
enlightens me on the research.
I want to thank Kiyoshi Masui and Joachim Harnois-Deraps for editing my draft, and
Gregory Paciga for correcting my presentation slides.
I also want to thank my friends, Xingxing Xing, Bin Guo, Sing-Leung Cheung, Chao
Zhuang, Nan Chen, Jing Wang, Lu Wang, MinXue Liu, Xiaomin Du, Xingyu Liu, and
Jun Hong Liang, who made my life not that boring during my Ph.D. study.
Finally, and most importantly, I want to thank my parents, Li Qin and Li Pang. I
thank them for bringing me to this colourful world, raising me, supporting me, and loving
me. I dedicate this thesis to them.
Contents
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Thesis projects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.1 Black hole accretion . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.2 Fast magnetic reconnection . . . . . . . . . . . . . . . . . . . . . 9
1.2.3 Accelerate MHD simulation . . . . . . . . . . . . . . . . . . . . . 12
1.3 MHD equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.4 The properties of MHD . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.4.1 Frozen-in effect . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.4.2 Magnetic energy and stress . . . . . . . . . . . . . . . . . . . . . . 16
1.5 Tools for the research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.6 Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2 Black hole accretion 20
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.1.1 Constraining the accretion flow . . . . . . . . . . . . . . . . . . . 21
2.2 Simulation detail . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.2.1 Physical setup and dimensionless physical parameters . . . . . . . 24
2.2.2 Grid setup and numerical parameters . . . . . . . . . . . . . . . . 26
2.3 Simulations and results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.3.1 Character of saturated accretion flows . . . . . . . . . . . . . . . 29
2.4 Rotation measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.6 Observational Probes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3 Fast magnetic reconnection 48
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.2 Simulation setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.2.1 Physical setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.2.2 Numerical setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.3 Simulation results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.3.1 Global fast magnetic reconnection . . . . . . . . . . . . . . . . . . 52
3.3.2 What happens on the current sheet? . . . . . . . . . . . . . . . . 58
3.3.3 What happens globally? . . . . . . . . . . . . . . . . . . . . . . . 59
3.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.5 Ideal vs resistive MHD . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4 Accelerate MHD 71
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.2 The algorithms of MHD . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.3 Implementation on heterogeneous systems . . . . . . . . . . . . . . . . . 74
4.3.1 x86 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.3.2 Cell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.3.3 Nvidia GPU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.3.4 ATI GPU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.4 Comparative Results and Discussion . . . . . . . . . . . . . . . . . . . . 83
4.4.1 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.4.2 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
5 Conclusion 91
A Rotation measure constraint on accretion flow 101
B Inner boundary conditions 105
B.0.1 Magnetic field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
B.0.2 Density and pressure . . . . . . . . . . . . . . . . . . . . . . . . . 108
C Supporting Movie for black hole accretion 110
Bibliography 110
List of Tables
2.1 Simulations described in this paper. Columns: Run number; Maximum
resolution relative to the Bondi radius; Radial dynamic range within RB;
grid expansion factor within RB; effective resolution at RB; magnetization
parameter; rotation parameter; range of simulation times over which flow
properties were measured; mean mass accretion rate over this period; and
typical density power law slope (ρ ∝ r^-k) over this period. . . . . . . . . 29
4.1 Performance on the multi-core x86 for different box sizes; timings in mil-
liseconds. x86(1) refers to single-core performance; x86(8) to all eight cores. 76
4.2 Cell performance while using PPE or varying numbers of SPEs for different
box sizes; timings in milliseconds. . . . . . . . . . . . . . . . . . . . . . 78
4.3 x86 vs NVidia GPU performance for different box sizes; timings in mil-
liseconds. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.4 x86 vs ATI GPU performance for different box sizes; timings in millisec-
onds. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.5 Performance comparison for different architectures; timings in millisec-
onds. N-GPU represents Fermi; A-GPU represents ATI HD5870; peak
Gflops represents theoretical peak floating-point performance; peak GB/s
represents the theoretical on-chip bandwidth. . . . . . . . . . . . . . . . 85
List of Figures
1.1 X-ray image of Sgr A*. The luminosity of the supermassive black hole is
10^8 times dimmer than simple theoretical predictions. NASA/CXC/MIT/F.K.
Baganoff et al. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Image of Submillimeter Array (SMA). Successful measurements of RM
have been done by [56] using Submillimeter Array in 2006. Image courtesy
SMA. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3 Image of a solar flare. The time scale of the solar flare is 10^5 times faster
than the theoretical (Sweet-Parker) model predicts. Courtesy of NASA/SDO
and the AIA, EVE, and HMI science teams. . . . . . . . . . . . . . . . . 10
1.4 Geometry of Sweet-Parker reconnection. The flows enter the thin recon-
nection region from above and below, and exit horizontally in the two other
directions. The speed of magnetic reconnection is limited by the ratio
L/δ. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.5 Comparison between a conventional supercomputer and a heterogeneous
platform. On the left is a picture of the SciNet supercomputer, which has
a price-to-performance ratio of $100,000 per teraflop. On the right is my
desktop computer (with ATI GPUs inside), which has a price-to-performance
ratio of $400 per teraflop. Programming on a heterogeneous platform
takes more time and effort. . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.1 2D slice of the simulation for the 600^3 box at 15 Bondi times. Colour
represents the entropy, and arrows represent the magnetic field vector. The
right panel is the equatorial plane (yz), while the left panel is a perpendicular
slice (xy). White circles represent the Bondi radius (r_B = 1000). The
fluid is slowly moving, in a state of magnetically frustrated convection. A
movie of this flow is available in the supporting information section of the
electronic edition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.2 Density versus radius. The dotted line represents the density profile for
the Bondi solution, which is the steepest plausible slope at k = 1.5. The
dashed line represents the density scaling for CDAF solution, which is the
shallowest proposed slope with k = 0.5. The solid line is the density profile
from one of our simulations, which is intermediate to the two. . . . . . . 32
2.3 log(β), entropy and radial velocity versus radius. The dashed line vr/cs
represents the radial velocity in units of Mach number. The dots vr/cms
represent the radial velocity in units of magnetosonic Mach number. The
solid line is the entropy, and we see the entropy inversion which leads to
the slow, magnetically frustrated convection. Inside the inner boundary,
the sound speed is lowered, leading to the lower entropy. The + symbols
show the plasma β, which measures the magnetic field strength. . . . . . 33
2.4 Rotation measure vs time (in units of tB). We chose Rrel = 17, corre-
sponding to Rrel/RB=0.068. Six lines represent three axes: upper set is X
(centered at +3), center is Y (centered at 0) and lower is Z (centered at
-3), with positive and negative directions drawn as solid and dashed lines,
respectively. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.5 PDF of RM in Figure 2.4. The dashed line represents a Gaussian distribu-
tion. The horizontal axis has been normalized by the standard deviation
in figure 2.4, σRM = 0.63. . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.6 The rotation measure integrand ρBr vs radius and time. The central dark
bar represents the inner boundary; the vertical axis is the Z axis. The
horizontal axis is time, in units of tB. Greyscale represents sign(Br)(ρ|Br|)^{1/4},
which was scaled to be more visually accessible. The coherence time is
longer at large radii and at late times. Several Bondi times are needed to
achieve the steady-state regime. . . . . . . . . . . . . . . . . . . . . . . 39
2.7 Autocorrelation for Figure 2.4. X axis represents time lags; Y axis rep-
resents autocorrelation for different Rin. The dotted, dashed, dashed-dot
and solid lines correspond to Rin = 43, 34, 26, 17 respectively. . . . . . . 40
2.8 RM coherence time τ as a function of the inner truncation radius Rrel;
points refer to Rrel = 17, 26, 34 and 43. The bootstrap error of 0.17 dex
is based on the six data points, two for each coordinate direction, at each
Rrel. The normalization for Rrel = RB is log10(tlags/tB) = 2.15. . . . . . 41
3.1 Numerical setup: the sphere in the centre of the box represents the region
of the rotational perturbation. The upper-left inset shows the rotational
perturbation viewed in the YZ plane. . . . . . . . . . . . . . . . . . . . . 53
3.2 Reconnection for different initial conditions. The total magnetic energy
is an indication of reconnection. The dash-dot line has non-zero mean
magnetic field perturbation, and the reconnected field asymptotes to a
slightly different value. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.3 Reconnection for different resolutions. . . . . . . . . . . . . . . . . . . . 55
3.4 Reconnection for different resolutions near the reconnection point. This
plot recenters Figure 3.3 to the time of maximum magnetic energy release,
and scales the horizontal and vertical axes to the fractional energy release
and mean Alfvén time. . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.5 2D snapshot during reconnection, with the current as the background colour. 57
3.6 Geometry of the Petschek solution . . . . . . . . . . . . . . . . . . . . . 60
3.7 Snapshot of magnetic field lines on the background of current, and snapshot
of both magnetic and velocity field lines, and B^2 at 37 CT . . . . . . . . 60
3.8 Snapshot of magnetic field lines on the background of current, and snapshot
of both magnetic and velocity field lines, and B^2 at 39 CT . . . . . . . . 61
3.9 Snapshot of magnetic field lines on the background of current, and snapshot
of both magnetic and velocity field lines, and B^2 at 41 CT . . . . . . . . 61
3.10 Snapshot of magnetic field lines on the background of current, and snapshot
of both magnetic and velocity field lines, and B^2 at 0 CT for 400 cells . . 62
3.11 Snapshot of magnetic field lines on the background of current, and snapshot
of both magnetic and velocity field lines, and B^2 at 10 CT for 400 cells . . 62
3.12 Snapshot of magnetic field lines on the background of current, and snapshot
of both magnetic and velocity field lines, and B^2 at 38 CT for 400 cells . . 63
3.13 Snapshot of magnetic field lines on the background of current, and snapshot
of both magnetic and velocity field lines, and B^2 at 40 CT for 400 cells . . 63
3.14 Snapshot of magnetic field lines on the background of current, and snapshot
of both magnetic and velocity field lines, and B^2 at 42 CT for 400 cells . . 69
3.15 Geometry of global configuration . . . . . . . . . . . . . . . . . . . . . . 70
4.1 Time vs box size for GPU comparison. X axis represents the length of the
box; Y axis represents the time for one time step; timings in milliseconds.
Dotted diamond is OpenCL on ATI; dashed circle is OpenCL on Nvidia;
dash-dot asterisk is CUDA on Nvidia; solid is a linear fit with slope = 3. 88
5.1 Atacama Large Millimeter/Submillimeter Array (ALMA). ALMA has much
higher sensitivity and higher resolution compared with current sub-millimeter
telescopes. Image courtesy ALMA (ESO/NAOJ/NRAO). . . . . . . . . 92
5.2 The three-dimensional simulation box for fast magnetic reconnection. Fast
magnetic reconnection is a three-dimensional effect, and the global
geometry determines the reconnection. . . . . . . . . . . . . . . . . . . . 93
5.3 Roadmap for Nvidia GPU. DP represents double precision. FLOPS rep-
resents FLoating point Operations per Second, which is a measure for
computing performance. X axis represents the time; Y axis represents the
computing performance. Tesla, Fermi, Kepler, and Maxwell are the family
names of successive generations of GPUs from Nvidia. . . . . . . . . . . 97
A.1 The logarithm of the relativistic RM factor, log10 F (k, kT ). The true RM
integral is modified by a factor F (k, kT ) relative to an estimate in which
the nonrelativistic formula is used, but the inner bound of integration is
set to the radius Rrel at which electrons become relativistic; see equation
A.1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
B.1 Vacuum solution of the magnetic field is calculated in the central region.
The field lines outside of the central region show the boundary condition. 108
Chapter 1
Introduction
1.1 Motivation
Computer technology has been developing rapidly during the last decades and it has
revolutionized practically all aspects of human life. Scientists control satellites using
computers to observe the universe; financial departments use computers to estimate
economic growth; transportation companies use computers to guide trains and monitor
air traffic; computers allow people to pay their bills and order tickets online; children
play computer games and watch videos; people use electronic mail to contact each other
and video-conference makes long distances meetings possible.
While we now use computers for a variety of tasks, they were originally designed for
computing, that is, to perform simple mathematical operations. Today we continue to
use them for such purposes. Thanks to the technology revolution, a typical personal
computer can currently calculate a simple operation (e.g. 1+1=2) 10 billion times in
just one second. Furthermore, the fastest computer cluster to date has a theoretical
performance of 1 petaflops (FLoating point Operations Per Second), which means that
it performs this simple operation a quadrillion times in one single second!
Obviously, we do not just use computers to calculate ‘1+1=2’. To utilize their full
power, people now take advantage of computers to do far more complicated numerical
simulations in different areas, including physics, chemistry, biology, economics, and
engineering.
A computer simulation tries to solve a mathematical model, which is usually expressed
as equations. Different output behaviours depend on the model and on the input
data, and the results of the simulations allow users to set up an experiment and predict
its results at very low cost, since the only apparatus needed is the computer. The
mathematical equations are first translated into programming languages (i.e. code). The
results are then calculated by computers and presented using graphs, videos or other
readable formats for analysis.
Computer simulations are commonly applied in fluid dynamics, a field known as
computational fluid dynamics (CFD). CFD simulations have become a very important tool
for engineers because of their low cost. In aerodynamics, for example, one can study
how air flow affects objects by constructing a wind tunnel for testing. However,
building such an apparatus costs a great deal of money and time. A good example is HiMAT
(Highly Maneuverable Aircraft Technology), the experimental NASA (National
Aeronautics and Space Administration) aircraft [25], which was designed to test
high maneuverability for the next generation of fighter planes. NASA found that their
wind tunnel tests of HiMAT suffered from a drag problem related to the wings. Redesigning
the aircraft model would have cost about $150,000 and caused an unacceptable time delay.
However, they were able to redesign it using a computer simulation, costing only $6,000.
The situation is worse in astronomy, since in many fields an experiment is almost
impossible. For example, there is strong evidence that a supermassive black hole sits at
our Galactic centre. The distance between the black hole and the Earth is about 10^20
meters, such that it takes more than 2.6×10^4 years for light emitted in its neighbourhood
to reach us. We know that black holes have very strong gravitational potentials, making
it difficult for nearby objects to avoid falling in. However, due to the gas dynamics in
the surrounding region, not all of the matter falls into the black hole. How then can one
investigate how much gas finally falls in? The region of gravitational influence of the
black hole at the Galactic centre has a length scale of 10^14 meters, and it is impossible
to reproduce this huge system experimentally on Earth. The only currently available way
to study the system is to use computers to mimic the black hole in a simulation. Equations
represent the evolution of the gas dynamics, and numerical parameter constraints
represent the black hole and all the underlying physics. Both of these are
translated into computer languages, and after long simulation runs, we can show
how the matter around such a black hole evolves and use data analysis to find the
percentage of the gas that finally falls into the black hole.
In this thesis, we apply magnetohydrodynamics (MHD), an extension of fluid
dynamics (i.e., additional terms are added to the equations of fluid dynamics), to study
phenomena in the universe.
Magnetohydrodynamics (MHD) studies the dynamics of an electrically conducting
fluid under the influence of a magnetic field. If there is no magnetic field present, the
problem reduces to traditional fluid dynamics. However, in most astrophysical settings,
the fluids are highly conductive and observed to be magnetized. Electro-motive forces
generated by magnetic fields will modify the flow, which will in turn affect the field. As
a result, one has to solve both the hydrodynamics and electromagnetics simultaneously.
The interactions between the field and the motion are a source of great interest and many
challenges in the MHD domain.
To simulate MHD in astrophysics is not a simple task, due to its complexity and the
large scales of astrophysical objects. As a result, a highly efficient simulation code and
powerful supercomputers are needed. Fortunately both of these resources are available
here, allowing us to perform this research.
1.2 Thesis projects
The majority of matter in the universe exists in the form of plasma. Plasma is
electrically conducting, and can be treated as a fluid if length scales are much larger than
the mean free path of the particles. Additionally, it is widely believed that magnetic fields
are present throughout the universe; for instance, magnetic fields are inferred from
polarization measurements. Therefore, magnetohydrodynamics can be applied in the field
of astrophysics. Compared to laboratory MHD, astrophysical objects have such large
sizes that the electric currents are generated by self-induction rather than by electrical
resistance [28]. A consequence of the highly conducting fluids in astrophysical
environments is that ideal MHD can be safely adopted.
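The safety of the ideal-MHD (perfectly conducting) approximation can be illustrated with a magnetic Reynolds number estimate, R_m = LV/η: when R_m ≫ 1, the resistive term in the induction equation is negligible. The numbers below are rough illustrative assumptions (a Bondi-scale length, a thermal-speed velocity, and a nominal microscopic diffusivity), not values taken from this thesis:

```python
# Order-of-magnitude check that ideal MHD holds at astrophysical scales.
# All inputs are illustrative assumptions, not thesis data.
L = 1.0e14      # length scale in metres (roughly the Bondi-scale region quoted above)
V = 5.5e5       # velocity scale in m/s (~ sound speed near the Bondi radius)
eta = 1.0e3     # assumed magnetic diffusivity in m^2/s (hot-plasma ballpark)

R_m = L * V / eta   # magnetic Reynolds number
print(f"R_m ~ {R_m:.1e}")   # vastly greater than 1: resistivity is negligible
```

Even with a diffusivity orders of magnitude larger than assumed here, R_m remains enormous, which is why ideal MHD is adopted throughout this work.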
1.2.1 Black hole accretion
A supermassive black hole, with mass M_BH ≃ 4.3×10^6 M_⊙ [37], is situated at the Galactic
centre, near the compact radio source Sgr A* [60] (Figure 1.1). This black hole subtends
the largest angular size of any known black hole, and the current best estimate
for the distance to the Galactic centre is R_0 = 8.33 ± 0.35 kpc [37]. This means that
one arcsecond corresponds to only 0.04 parsec (i.e. ∼1.2×10^17 cm). Therefore, in
comparison with other astrophysical objects, which are much further away, finer details
can be resolved through observational means so as to constrain the fluid dynamics, which
also proves helpful for theoretical modeling.
Figure 1.1: X-ray image of Sgr A*. The luminosity of the supermassive black hole is
10^8 times dimmer than simple theoretical predictions. NASA/CXC/MIT/F.K. Baganoff
et al.

What is curious about this black hole is that it has a very low luminosity, despite
being embedded in a huge amount of matter (e.g. n_e ≃ 130 cm^-3 at the accretion radius,
which is 0.06 pc). This is paradoxical because, as matter accretes onto it due to the strong
gravitational field, a large amount of energy would be released, which should then be
a source of radiation [14]. The bolometric luminosity is determined by the product of the
accretion rate and the radiative efficiency. Taking the accretion rate of the Bondi solution
[19], which describes spherical accretion under the influence of a gravitational potential,
and a typical radiative efficiency (∼10%) [33], the theoretical luminosity is about
10^41 erg s^-1. On the other hand, the reported luminosity is about 10^33 erg s^-1 [13; 35].
Therefore, there exists a discrepancy between theory and observation of 8 orders of magnitude.
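The size of this discrepancy is easy to verify with one line of arithmetic, L = ε Ṁ c²; the sketch below redoes the estimate using the Bondi rate and 10% efficiency quoted in the text, with approximate CGS constants:

```python
# Rough check of the theoretical luminosity L = efficiency * Mdot * c^2 (CGS units).
Msun = 2.0e33               # solar mass in grams (approximate)
yr = 3.15e7                 # seconds per year
c = 3.0e10                  # speed of light in cm/s

Mdot = 1.4e-5 * Msun / yr   # Bondi accretion rate quoted in the text, in g/s
eff = 0.10                  # typical radiative efficiency [33]

L_theory = eff * Mdot * c**2
print(f"L_theory ~ {L_theory:.1e} erg/s")        # of order 1e41 erg/s
print(f"ratio to observed 1e33 erg/s: {L_theory / 1e33:.0e}")
```

The result is of order 10^41 erg s^-1, roughly 10^8 times the observed luminosity, matching the discrepancy stated above.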
While it is still unclear how to calculate the radiative efficiency, the accretion rate
can be constrained by the properties of linear polarization [76]. Because the
magnetized plasma has an anisotropic index of refraction, the position angle of linearly
polarized light rotates by an amount that depends on frequency,

θ = RM λ^2 (1.1)
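Equation (1.1) is also how RM is measured in practice: one observes the position angle at two or more wavelengths and fits the slope against λ². A minimal sketch with made-up numbers (the wavelengths and RM value here are illustrative, not data from this thesis):

```python
# Recover a rotation measure from synthetic position angles, theta = RM * lambda^2.
# All numbers are illustrative; real measurements must also resolve n*pi ambiguities.
RM_true = 5.0e5                 # rad/m^2, of order the Sgr A* measurements
lam1, lam2 = 1.3e-3, 0.87e-3    # observing wavelengths in metres (sub-mm band)

theta1 = RM_true * lam1**2      # position angles predicted by eq. (1.1), noiseless
theta2 = RM_true * lam2**2

# The slope of theta versus lambda^2 returns the RM.
RM_fit = (theta1 - theta2) / (lam1**2 - lam2**2)
print(f"RM = {RM_fit:.3e} rad/m^2")
```

With more than two wavelengths one fits a straight line of θ against λ², which also exposes any deviation from the λ² law.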
where RM is the rotation measure. In a nonrelativistic plasma, the RM is defined as
[80; 34]

RM = (8.1 × 10^5) ∫ n_e B · dl  rad m^-2 (1.2)

where n_e is the electron density in units of cm^-3, dl is the path length element in units
of parsec, and B is the magnetic field in units of Gauss. In the case of an ultrarelativistic
thermal plasma, the RM is suppressed by a factor of log(γ)/2γ^2 [76], with γ = kT_e/m_e c^2.
It can be shown that the rotation measure depends only on the electron density
and the magnetic field along the light path from the source to the observer. As
a result, once the RM value is measured, the electron density (i.e. the accretion rate)
can be determined under certain assumptions for the electron density and magnetic field.
This can be done by assuming a power-law density n(r) to obtain the expression for the
electron density, and taking the condition of equipartition between magnetic, kinetic
and gravitational energy [59] to obtain the expression for the magnetic field. The value of
the rotation measure was observed to be around 10^5 rad m^-2 [56; 55; 58], by the Submillimeter
Array (Figure 1.2) [56], and by the Berkeley-Illinois-Maryland Association (BIMA) Array
[55; 21]. This number leads to an accretion rate of 10^-9 − 10^-7 M_⊙ yr^-1, which is
significantly smaller than the Bondi solution (∼10^-5 M_⊙ yr^-1).
With the aforementioned constraint on the accretion rate, we can now revisit previous
accretion models. To begin, the Bondi solution is a spherically symmetric accretion
model under a gravitational field [19]. Matter inside the capture radius, r_B = GM/c_s0^2,
falls into the centre due to gravity. The expression for the mass accretion rate is
4π λ_c r_B^2 ρ c_s0, with λ_c = 0.25. Taking the observational data from [14], r_B = 0.06 pc,
n_e = 130 n_f^{-1/2} cm^-3, and c_s0 ≈ 550 km s^-1 at the Bondi radius, we find an accretion rate
of Ṁ_Bondi ∼ 1.4 × 10^-5 M_⊙ yr^-1. Clearly, this value is too large and does not agree with
the constraint from rotation measures.
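As a cross-check, the quoted rate follows from Ṁ = 4π λ_c r_B^2 ρ c_s0 with simple CGS arithmetic. The sketch below assumes a pure-hydrogen mass density ρ ≈ m_p n_e with the filling factor set to one, so it reproduces only the order of magnitude; the exact prefactor depends on the assumed composition:

```python
import math

# Order-of-magnitude Bondi accretion rate, Mdot = 4*pi*lambda_c*r_B^2*rho*c_s0 (CGS).
pc = 3.086e18          # parsec in cm
m_p = 1.67e-24         # proton mass in g
Msun = 2.0e33          # solar mass in g (approximate)
yr = 3.15e7            # seconds per year

lambda_c = 0.25
r_B = 0.06 * pc        # Bondi radius quoted from [14]
n_e = 130.0            # electron density in cm^-3 (filling factor set to 1 here)
rho = m_p * n_e        # pure-hydrogen mass density: an assumption of this sketch
c_s0 = 550.0e5         # sound speed at the Bondi radius in cm/s

Mdot = 4 * math.pi * lambda_c * r_B**2 * rho * c_s0    # g/s
Mdot_solar = Mdot * yr / Msun
print(f"Mdot ~ {Mdot_solar:.1e} Msun/yr")   # order 1e-5, consistent with the text
```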
Figure 1.2: Image of the Submillimeter Array (SMA). Successful measurements of RM
were made by [56] using the Submillimeter Array in 2006. Image courtesy SMA.

Obviously, the Bondi solution is insufficient, as it does not include magnetic fields or
Keplerian rotation, and is also a one-dimensional model. Many other models were later
designed for this low-luminosity accretion flow, including Advection-Dominated
Accretion Flow (ADAF) [65; 63; 27; 61; 72; 64], adiabatic inflow-outflow solution
(ADIOS) [18], Convection-Dominated Accretion Flows (CDAF) [62; 77], Convection-
Dominated Bondi Flow (CDBF) [42], Thermal conductive flow [45; 85], and Stellar wind
[54]. In Advection-Dominated Accretion Flow (ADAF) [65], the entropy generated by
the accretion cannot radiate out of the disk surface and must advect into the black hole
with hot ions. The ions and electrons interact only through inefficient Coulomb collisions,
resulting in a very low radiative efficiency [66; 64]. Unfortunately, the accretion rate in
this model is once again inconsistent with the RM values. An Adiabatic inflow-outflow
solution (ADIOS) [18] was proposed in which a large fraction of the released energy will
be driven away by wind and only a small amount of mass will fall into the centre of the
black hole. This process leads to a much smaller accretion rate compared to the Bondi
solution, however, there is no simulation that can produce this effect. ADAF is convec-
tively unstable [41; 40; 89], and later Convection-Dominated Accretion Flows (CDAF)
[62; 77] and Convection-Dominated Bondi Flow (CDBF) [42] were proposed in which
convection plays an important role in the flow. In both models, the matter in a spherical
shell circulates indefinitely in convective eddies, causing the accretion rate to be very
small. Thermal conduction can be important if the conduction time of the plasma is
shorter than the electron cooling time [45; 85]; these simulations showed that thermal
conduction transports energy outward, resulting in a reduced accretion rate.
Stellar winds from the stars near the black hole may provide the direct matter input
responsible for the low luminosity of Sgr A* [54]; these simulations derived an
accretion rate of ∼10^-8 M_⊙ yr^-1, which is comparable to the observations. However, a
large amount of mass could still accrete onto the black hole even if the stars do not
contribute.
An additional flow was proposed by Pen et al. [70], who presented a very subsonic flow
referred to as magnetically frustrated convection, in which the flow is quasi-hydrostatically
supported by thermal pressure. Pen et al. [70] conducted a 1400^3-zone MHD
simulation, in which they found the density slope to be n ∼ 0.72. There is only a very small
inward energy flux, and the buoyant motions are resisted by magnetic shear stresses.
Many simulations have been performed for black hole accretion flows, but all suffered
from various difficulties, for example: non-conservation during magnetic reconnection,
reduced dimensionality (one- or two-dimensional simulations instead of three), boundary
problems, poor dynamic range, or insufficient run time to achieve a stable result.
Here, we address these problems and continue the effort on magnetically frustrated
convection through the use of our three-dimensional, large-scale MHD simulations. Our
simulations have an expanding Cartesian grid (i.e. the grid spacing increases as one
moves away from the centre), which achieves a larger box with a smaller number of grid
points. An inner boundary at the centre represents the black hole, which removes
mass and energy at each time step. The box edge is placed far enough from the Bondi
radius (r_B = GM/c_s0^2) to minimize the outer-boundary effects that afflicted previous
simulations. The simulations begin with a static flow (apart from an optional Keplerian
rotation), to which a uniform
magnetic field is subsequently added. Our simulations are able to run long enough to
attain a stable result.
1.2.2 Fast magnetic reconnection
Magnetic reconnection is a process in which magnetic field lines reconnect, causing
topological rearrangements of the field lines and the conversion of magnetic energy into
other forms of energy (e.g. heat, kinetic energy). Some energy-release events in
astrophysics, such as solar flares, are believed to be driven by magnetic reconnection.
However, theoretical predictions of the reconnection speed are far too slow when compared
to observations. For example, the Sweet-Parker model [90; 68], which describes opposing
magnetic fields interacting in a thin current sheet, implies a time scale for solar flare
(Figure 1.3) reconnection that is 10^5 times slower than observed [29].
In the Sweet-Parker model (Figure 1.4), two oppositely directed inflows push oppositely
directed magnetic fields toward a neutral line, causing magnetic reconnection
in a thin region of thickness 2δ and length 2L (L ≫ δ). The conversion of
magnetic energy increases the pressure, and the magnetic tension force expels the inflowing
plasma. Mass conservation in steady reconnection requires that the inward mass flux
equal the outward mass flux:

v_in L = v_out δ    (1.3)

As a result, even though the outflow plasma can be accelerated to the Alfven speed
(v_A = B/√(4πρ)), the inflow speed remains very small because of the large ratio L/δ in
astrophysical settings, limiting the reconnection rate.
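The severity of this bottleneck is easy to quantify: in resistive MHD the Sweet-Parker scaling is v_in/v_A = S^(−1/2), where S = L v_A/η is the Lundquist number. The following sketch uses order-of-magnitude solar-corona numbers; the values of L, v_A, and η below are illustrative assumptions, not quantities from this thesis:

```python
import math

def sweet_parker_inflow(L_cm, v_A_cms, eta_cm2s):
    """Sweet-Parker reconnection: v_in / v_A = S**-0.5,
    where S = L * v_A / eta is the Lundquist number."""
    S = L_cm * v_A_cms / eta_cm2s
    return v_A_cms / math.sqrt(S), S

# Illustrative solar-corona numbers (assumed, order of magnitude only):
L = 1e9          # current-sheet length, cm
v_A = 1e8        # Alfven speed, cm/s
eta = 1e4        # Spitzer-like magnetic diffusivity, cm^2/s

v_in, S = sweet_parker_inflow(L, v_A, eta)
t_alfven = L / v_A            # Alfven crossing time, s
t_sp = L / v_in               # Sweet-Parker reconnection time, s

print(f"S = {S:.1e}, v_in/v_A = {v_in / v_A:.1e}")
print(f"reconnection is ~{t_sp / t_alfven:.0e} x slower than Alfvenic")
```

With these assumed numbers the reconnection time exceeds the Alfven time by more than a factor of 10^6, in line with the many-orders-of-magnitude discrepancy quoted above.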
To solve this problem, Petschek later proposed the X-point reconnection configuration
[71], in which standing shock waves direct the reconnection outflow. Instead of a thin
current sheet, a much shorter diffusion region of length L′ is invoked, and the reconnection
rate is increased by a factor of √(L/L′), approaching the Alfven speed. However,
two-dimensional simulations cannot reproduce Petschek's fast reconnection [16]. Later,
Priest et al. [74] pointed out that
Figure 1.3: Image of a solar flare. The observed time scale of solar flares is 10^5 times
shorter than the theoretical (Sweet-Parker) model predicts. Courtesy of NASA/SDO and
the AIA, EVE, and HMI science teams.
Figure 1.4: Geometry of Sweet-Parker reconnection. The flows enter the thin reconnection
region from above and below, and exit horizontally in the two opposite directions.
The speed of magnetic reconnection is limited by the ratio L/δ.
the boundary conditions are crucial for the occurrence of fast magnetic reconnection.
However, no ideal MHD simulation has yet realized fast reconnection, except with
artificially enhanced local resistivity [82] or with collisionless MHD [15]. Unfortunately,
these cases still cannot explain fast reconnection in astrophysical environments, such as
solar flares, which require ideal collisional MHD.
Both the Sweet-Parker and Petschek models are two-dimensional, whereas magnetic
reconnection in nature occurs in three-dimensional space, so the previous two-dimensional
simulations may be limited. Furthermore, Petschek's configuration suffers from a
geometrical problem: it emphasizes the significance of the microscopic X-point for
fast reconnection, but ignores the fact that most of the energy and mass reside in the
global flow.
Here, we use our three-dimensional ideal MHD simulations to address these issues.
The simulation starts with two oppositely aligned magnetic fields in a periodic box.
An initial perturbation is added at the centre of the box, between the two opposing
field regions. Consequently, we were able to demonstrate three-dimensional fast magnetic
reconnection.
1.2.3 Accelerating MHD simulations
Computational simulations use discretization to model real-world physics problems.
Usually, a higher grid resolution yields more accurate results. Additionally,
astrophysical objects have large length scales, so a large simulation box
is needed. Both high resolution and a large simulation box require greater computing
power. Clusters of linked computers and massive parallelization have therefore been
used to process large amounts of data simultaneously. However, simply accumulating
computers is problematic because of the high monetary cost, high power
consumption, and large space requirements.
Thanks to improvements in technology, computers are becoming cheaper and
faster. Furthermore, with heterogeneous systems now being developed [83], it is becoming
possible to achieve higher performance by exploiting these new architectures.
Heterogeneous systems typically have a single controlling processor and many com-
puting cores. Examples include Cell processors from IBM, and graphics processing units
(GPU) from both Nvidia and ATI. The controlling processor is responsible for task
assignment, while the computing cores do the calculation work. As there are many cores
on one heterogeneous platform, calculations can be greatly sped up. Furthermore, heterogeneous
platforms are inexpensive, consume less power, and occupy less space,
yielding an excellent price-to-performance ratio.
To demonstrate these economic benefits, we make a rough comparison between the Scinet
supercomputer and the GPUs in my desktop computer (Figure 1.5). The Scinet supercomputer
has 30,240 cores, totalling 306 TFlops of performance. It cost about 30 million Canadian
dollars, or roughly $100,000 per teraflop. My desktop contains two ATI HD5870
GPUs, which have about 5 TFlops of combined performance. The total cost is about $2,000,
giving $400 per teraflop. This ratio is very impressive.

Figure 1.5: Comparison between a conventional supercomputer and a heterogeneous
platform. On the left is the Scinet supercomputer, with a price-to-performance
ratio of about $100,000 per teraflop. On the right is my desktop computer (ATI GPUs
inside), with a price-to-performance ratio of $400 per teraflop. Programming
on a heterogeneous platform takes more time and effort.

To port the simulations to a heterogeneous system, the programmers have to rewrite
the code; however, given the attractive price-to-performance ratio, it is worth doing so.
Here, we show our progress in programming MHD simulations on three different
heterogeneous systems (i.e. the Cell/B.E. [1], Nvidia GPUs, and ATI GPUs). As different
heterogeneous platforms use different programming languages, we port our FORTRAN
MHD code to C, Cell SDK, CUDA and OpenCL. We present the speed-ups obtained on the
different heterogeneous platforms as a guide for the future acceleration of MHD
simulations.
1.3 MHD equations
Hannes Alfven introduced the concept of MHD in 1942 [11]. The governing MHD
equations combine the Euler equations with Maxwell's equations, modified
to represent the interaction between the magnetic field and the fluid motion.
Because a continuum description is used for the electromagnetism, the mean free path of
the electrons is assumed to be small in comparison with the Larmor radius, which is the
radius of curvature of the electrons' orbits in the magnetic field [50]. The Larmor radius
is inversely proportional to the magnetic field, and the mean free path is inversely
proportional to the density. As a result, MHD cannot be applied in a rarefied medium or
in a very strong magnetic field. We are also more concerned with the conducting fluid
than with electromagnetic oscillations; assuming the fields vary slowly, Maxwell's
displacement current is ignored. Furthermore, only ideal MHD is considered here, so
diffusion, viscosity, heat conduction, and resistivity are neglected.
The equations of ideal magnetohydrodynamics are listed below [88]. The equation of
continuity is the same as in fluid dynamics [49],

∂ρ/∂t + ∇·(ρu) = 0    (1.4)

where u denotes the velocity and ρ the density.
The equation of motion is modified by the inclusion of the Lorentz force,

∂u/∂t + (u·∇)u = −(1/ρ)∇p + (1/4πρ)(∇×B)×B    (1.5)

where p represents the gas pressure and B the magnetic field. The last term is the
Lorentz force, j_e × B/c = (∇×B)×B/4π.
We write the equation of heat transfer in the form of conservation of energy, with
additional electromagnetic terms in both the density and the flux,

∂/∂t ( ρu^2/2 + ρε + B^2/8π ) = −∇·[ ρu( u^2/2 + ε + p/ρ ) + (1/4π) B×(u×B) ]    (1.6)

where ε is the internal energy per unit mass. The energy density includes the magnetic
energy B^2/8π, and the energy flux includes the Poynting vector cE×H/4π, whose
dissipative part has already been neglected in the equation.
To relate the pressure, density and temperature, the equation of state is used,
p = p(ρ, T ) (1.7)
The electromagnetic aspect also needs to be included in the MHD equations. The
equations that describe the electromagnetic field in a moving conductor are:
∇ · B = 0 (1.8)
∂B/∂t = ∇× (u× B) (1.9)
Equations 1.4 to 1.9 are thus the equations of ideal magnetohydrodynamics.
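Because Equation 1.4 is in conservative (divergence) form, it discretizes naturally as a finite-volume update in which whatever leaves one cell enters its neighbour, so with periodic boundaries the total mass is preserved to machine precision. A minimal first-order upwind sketch in Python (illustrative only, not the thesis code):

```python
import math

def advect_density(rho, u, dx, dt, steps):
    """Finite-volume update of d(rho)/dt + d(rho*u)/dx = 0 on a
    periodic 1D grid with constant velocity u > 0 (first-order upwind)."""
    n = len(rho)
    rho = list(rho)
    for _ in range(steps):
        flux = [u * rho[i] for i in range(n)]              # F_{i+1/2} = u * rho_i
        rho = [rho[i] - dt / dx * (flux[i] - flux[i - 1])  # flux[-1] wraps (periodic)
               for i in range(n)]
    return rho

n = 64
dx = 1.0 / n
u = 0.5
dt = 0.5 * dx / u                                          # CFL number 0.5
rho0 = [1.0 + 0.3 * math.sin(2.0 * math.pi * i * dx) for i in range(n)]
rho1 = advect_density(rho0, u, dx, dt, steps=100)

mass0, mass1 = sum(rho0) * dx, sum(rho1) * dx
print(f"total mass drift after 100 steps: {abs(mass1 - mass0):.2e}")
```

The interface fluxes cancel in pairs when summed over the periodic grid, so the mass drift is pure floating-point roundoff; the same telescoping argument underlies the conservation properties of the TVD code described later.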
1.4 The properties of MHD
1.4.1 Frozen-in effect
By combining the equation of continuity with the evolution equation for the magnetic
field (i.e. Equation 1.9), we obtain the following formula,

d/dt ( B/ρ ) = ( ∂/∂t + u·∇ ) ( B/ρ ) = ( (B/ρ)·∇ ) u    (1.10)
This formula effectively represents the frozen-in effect of ideal MHD, explained in detail
below [50]. Imagine an element of length δl on a fluid line, which represents a line
that moves with the fluid particles. If the velocity at one end of the element is u, then
the velocity at the other end can be expressed as u + (δl·∇)u. The length of the element
therefore changes by dt (δl·∇)u over a time interval dt, which means that

d/dt (δl) = (δl·∇) u    (1.11)
This expression is exactly the same as Equation 1.10 with δl replaced by B/ρ. Therefore
the two vectors remain parallel if their initial directions are the same, and the ratio of
their lengths does not change. Due to the frozen-in effect, particles an infinitesimal
distance apart always move along the same magnetic field line. This is a characteristic
of ideal MHD.
1.4.2 Magnetic energy and stress
The energy of a magnetic field is B^2/2µ per unit volume. The total energy
W_M can be defined as [28]:

W_M = ∫ B^2/(2µ) dτ    (1.12)
Together with Equation 1.9, the rate of change of the magnetic energy in ideal MHD is:

dW_M/dt = µ^−1 ∫ B·[∇×(u×B)] dτ    (1.13)

which, since the divergence term integrates to a vanishing surface term, can be further
simplified as:

dW_M/dt = µ^−1 ∫ { ∇·[(u×B)×B] + (u×B)·(∇×B) } dτ = −∫ u·(j×B) dτ    (1.14)
Therefore, the change of magnetic energy is due to the work done by the magnetic
force j×B. More specifically, this magnetic force can be expressed as:

j×B = −∇( B^2/2µ ) + (B·∇) B/µ = −∇( B^2/2µ ) + ∇·( BB/µ )    (1.15)

The term B^2/2µ represents a hydrostatic pressure wherever there is a magnetic field
gradient, and the term B^2/µ represents the tension along magnetic flux tubes.
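Equation 1.15 can be verified numerically for a concrete field. For the illustrative choice B = (B_0 cos(ky), 0, 0) in units with µ = 1 (an assumption for this sketch), the y-component of (∇×B)×B should equal the pressure-gradient term, the tension term vanishing because B varies only across itself:

```python
import math

B0, k, h = 2.0, 3.0, 1e-6

def Bx(y):                      # B = (Bx(y), 0, 0)
    return B0 * math.cos(k * y)

def ddy(f, y):                  # central finite difference
    return (f(y + h) - f(y - h)) / (2.0 * h)

def lorentz_y(y):
    """y-component of j x B = (curl B) x B, with mu = 1."""
    jz = -ddy(Bx, y)            # (curl B)_z = -dBx/dy
    return jz * Bx(y)

def pressure_tension_y(y):
    """y-component of -grad(B^2/2) + (B.grad)B; the tension term is
    zero here because B points in x but varies only with y."""
    return -ddy(lambda s: 0.5 * Bx(s) ** 2, y)

y = 0.4
lhs, rhs = lorentz_y(y), pressure_tension_y(y)
print(f"j x B = {lhs:.6f},  -grad(B^2/2) + tension = {rhs:.6f}")
```

Both evaluations agree to finite-difference accuracy, confirming the decomposition of the Lorentz force into magnetic pressure and tension.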
1.5 Tools for the research
Software – Simulation code
The MHD code we are using was written by Ue-li Pen in 2003, and later expanded
by Phil Arras, ShingKwong Wong, Hugh Merz, Matthias Liebendorfer, Stephen Green,
and Bijia Pang.
The code [69] is a three-dimensional, second-order accurate (in space and time),
high-resolution total variation diminishing (TVD) MHD parallel code. Kinetic, thermal, and
magnetic energy are conserved, and the divergence of the magnetic field is kept at zero
by flux-constrained transport. There is no explicit magnetic or viscous dissipation in
the code; the TVD constraints result in non-linear viscosity and resistivity on the grid
scale.
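The practical meaning of the TVD constraint can be seen in a miniature example: a second-order MUSCL scheme with a minmod slope limiter keeps the total variation of an advected profile from growing, and the limiting acts like a non-linear viscosity on the grid scale. A 1D linear-advection sketch (illustrative only, far simpler than the MHD code itself):

```python
def minmod(a, b):
    """Slope limiter: zero at extrema, else the smaller-magnitude slope."""
    if a * b <= 0.0:
        return 0.0
    return a if abs(a) < abs(b) else b

def tvd_advect(q, c, steps):
    """Linear advection on a periodic grid at CFL number c (0 < c <= 1),
    using MUSCL reconstruction with minmod limiting (second order)."""
    n = len(q)
    q = list(q)
    for _ in range(steps):
        slope = [minmod(q[i] - q[i - 1], q[(i + 1) % n] - q[i]) for i in range(n)]
        # upwind interface value q_{i+1/2} for u > 0
        qf = [q[i] + 0.5 * (1.0 - c) * slope[i] for i in range(n)]
        q = [q[i] - c * (qf[i] - qf[i - 1]) for i in range(n)]
    return q

def total_variation(q):
    return sum(abs(q[i] - q[i - 1]) for i in range(len(q)))

q0 = [1.0 if 10 <= i < 20 else 0.0 for i in range(50)]    # square pulse
q1 = tvd_advect(q0, c=0.5, steps=200)
print(f"TV before: {total_variation(q0):.3f}, TV after: {total_variation(q1):.3f}")
```

The flux-form update also conserves the sum of q exactly on the periodic grid, mirroring the conservation properties quoted above for the MHD code.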
The code is parallelized with MPI [46] and OpenMP, and is therefore suited to large-scale
simulations. Furthermore, the code is exceptionally fast. By combining it with
powerful computer clusters, we can run some of the largest and longest MHD simulations
to date.
Hardware – Computer clusters
If you attempt to run large-box simulations on your desktop machine, you will never
graduate. Fortunately, we have access to two powerful computer clusters, each consisting
of hundreds to thousands of linked computers. One is SunnyVale, at the Canadian
Institute for Theoretical Astrophysics (CITA). The other is Scinet, home to the most
powerful supercomputer in Canada. Access to these two clusters gives us the opportunity
to perform high-performance parallel computing for MHD simulations.
SunnyVale is the Beowulf cluster of the Canadian Institute for Theoretical Astrophysics
(CITA) [2]. It consists of 200 Dell PE1950 compute nodes; each node contains two
quad-core Intel Xeon E5310 processors at 1.60 GHz, 4 GB of RAM, two GigE network
interfaces, and a 40 GB disk. As a result, there are 1,600 cores in total.
Scinet [3] hosts the most powerful supercomputer in Canada to date. It has 306
TFlops of theoretical peak performance, which earned it rank No. 16 on the June 2009
TOP500 list. Scinet has two large clusters, GPC and TCS: the former is a very large
x86-based commodity cluster, designed for a wide variety of serial and parallel
applications; the latter is for jobs with large numbers of processes/threads, or needing
large memory and a low-latency interconnect. The GPC contains 30,240 Intel
Xeon E5540 'Nehalem' (2.53 GHz) cores, with 8 cores and 16 GB of
RAM per node; the interconnect is a hybrid of GigE and InfiniBand.
In addition to computer clusters, a wide variety of heterogeneous platforms [83] are
also available for our research on accelerating MHD simulations.[1]

[1] More details about heterogeneous platforms are provided in Chapter 4.
CITA provides many Nvidia GPUs, ranging from the early low-end 'Quadro' series
to the latest 'Fermi' family. Moreover, two dual ATI HD 5870 GPUs are also available
for testing. Scinet also provides the newest Nvidia GPUs, and a cluster of Cell blades
is available.
1.6 Contribution
Chapters 2 to 4 are three technical papers.
The authors for the black hole accretion section include Bijia Pang, Ue-Li Pen,
Christopher D. Matzner, Stephen Green and Matthias Liebendorfer. This section has
been submitted to Monthly Notices of the Royal Astronomical Society. Ue-Li Pen and
Christopher D. Matzner were involved in this project for several years; both provided
extremely valuable suggestions and devoted a great deal of time to the subject.
Christopher D. Matzner also spent a considerable amount of time editing the
manuscript. Stephen Green contributed the non-uniform grid and the inner-boundary
setting of the code. I received the MPI version of the code from Matthias Liebendorfer
several years ago, and he provided tremendous assistance in my early efforts to
familiarize myself with the code.
Regarding the section on fast magnetic reconnection, the authors are Bijia Pang,
Ue-Li Pen and Ethan T. Vishniac; it was published in Physics of Plasmas. Ue-Li Pen
and Ethan T. Vishniac contributed a great deal of time to the discussion and the writing
of the draft. Ethan T. Vishniac is an expert on MHD reconnection and, despite his busy
schedule, travelled to Toronto frequently to meet with us during this project.
Prof. Vishniac also spent a considerable amount of time editing the manuscript.
The MHD acceleration section was authored by Bijia Pang, Ue-Li Pen and Michael
Perrone. It has been posted on the arXiv.org e-print archive, and we plan to submit it
to a computer science conference. Ue-Li Pen contributed to the discussion and the
writing of the draft; Michael Perrone was the manager of the multi-core department
at the IBM T. J. Watson Research Center when I was there, and helped with the
discussion.
Chapter 2
Black hole accretion
2.1 Introduction
The radio source Sgr A* at the Galactic centre (GC) is now accepted to be a supermassive
black hole [M_BH ≃ 4.3 × 10^6 M⊙: 37], accreting hot gas from its environment [n_e ≃
130 cm^−3, k_B T ≃ 2 keV at 1 arc second: 14]. Interest in the Sgr A* accretion flow is
stimulated by its remarkably low luminosity; by its similarity to other low-luminosity
AGN; by circumstantial evidence for past episodes of bright X-ray emission [79, but see
98] and nearby star formation [53]; and foremost, by its status as an outstanding physical
puzzle.
Supermassive black holes are enigmatic in many respects; for the GC black hole
(GCBH) the enigma is sharpened by a wealth of observational constraints, which permit
detailed, sensitive and spatially resolved studies of its accretion dynamics. Within a
naive model such as Bondi flow, matter would flow inward at the dynamical rate from
its gravitational sphere of influence, which at ∼ 1″ is resolved by Chandra. Converted to
radiation with an efficiency ηc^2, the resulting luminosity would exceed what is actually
observed by a factor ∼ 10^5 (η/0.1). This wide discrepancy between expectation and
observation has stimulated numerous theoretical explanations, including convection [62;
77], outflow [18], domination by individual stars' winds [54], and conduction [91; 45; 85;
87].
2.1.1 Constraining the accretion flow
Because many of its parameters are uncertain, the central density and accretion rate of
the GCBH flow are not strongly constrained by its emission spectrum [78]; the most
stringent constraints come from observations of the rotation measure [76], now known to
be roughly −5.4 × 10^5 rad m^−2 [58]. Interpreting this as arising within a quasi-spherical
flow with magnetic fields in rough equipartition with gas pressure, and adopting the
typical assumption that magnetic fields do not reverse rapidly, we derive a gas density
n_H ∼ 10^5.5 cm^−3 (R_S/R_rel)^1/2 at the radius R_rel which dominates the RM integral, namely
where electrons become relativistic; see §A for more detail. If this radius is about 10^2
Schwarzschild radii (10^2 R_S), as in the spectral models of [78], then a comparison between
this density and conditions at the Bondi radius R_B ≃ 0.053 pc indicates a density power
law ρ ∝ r^−k with k = 1.1−1.3; the derived value is rather insensitive to the black
hole mass, the degree of equipartition, and the precise radius at which electrons become
relativistic. (If rapid conduction causes electrons to be nonrelativistic at all radii, the
implied slope falls to 0.8.)
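The statement that R_rel dominates the RM integral can be sketched quantitatively with the standard rotation-measure integral, RM = 0.81 ∫ n_e B_∥ dl (rad m^−2 for n_e in cm^−3, B in µG, dl in pc). For a k ≈ 1 density profile with an equipartition field also falling roughly as r^−1 (both profile shapes are assumptions of this sketch), the integrand scales as r^−2, so the integral is dominated by the smallest radii:

```python
def rm_integrand(r, r_in):
    """Assumed k = 1 profiles: n_e and the (equipartition) field B both
    scale as r^-1 inside the Bondi radius; normalized at r_in."""
    return (r / r_in) ** -1 * (r / r_in) ** -1

def rm_integral(r_in, r_out, npts=20000):
    """RM = 0.81 * integral(n_e * B_par dl): rad m^-2 for n_e in cm^-3,
    B in microgauss, path in pc.  Profiles are normalized, so only the
    shape of the integral matters here (trapezoid rule, log-spaced)."""
    total = 0.0
    for i in range(npts):
        r0 = r_in * (r_out / r_in) ** (i / npts)
        r1 = r_in * (r_out / r_in) ** ((i + 1) / npts)
        total += 0.5 * (rm_integrand(r0, r_in) + rm_integrand(r1, r_in)) * (r1 - r0)
    return 0.81 * total

R_rel, R_B = 1.0, 1000.0   # R_B/R_rel ~ 10^3 for R_rel ~ 10^2 R_S, R_B ~ 10^5 R_S
full = rm_integral(R_rel, R_B)
inner = rm_integral(R_rel, 10.0 * R_rel)
print(f"first decade above R_rel contributes {inner / full:.0%} of the RM")
```

Roughly 90% of the RM accumulates within the first decade of radius above the inner cutoff, which is why the measured RM probes conditions near R_rel rather than near R_B.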
An independent but weak constraint on k comes from recent multi-wavelength
observations of flares in the emission from Sgr A*. Yusef-Zadeh et al. [97] favor an
interpretation in which these flares originate within regions where electrons have been
transiently heated and accelerated; using equipartition arguments, they estimate a
magnetic field strength B ∼ 13−15 G at 4−10 Schwarzschild radii, implying a total pressure
P > 20 dyn cm^−2 at those radii. Because P ∝ r^−(k+1), a comparison to the conditions at
R_B requires k > 0.6−0.8. This constraint could be violated if the emitting regions were
sufficiently over-pressured relative to the surrounding gas; however, the subsonic rate of
expansion inferred by [96] suggests this is not the case.
The density power law k is an important diagnostic, both because it allows one to
estimate the mass accretion rate onto the black hole, and because k takes definite values
within proposed classes of accretion flows. Bondi [19] accretion and ADAFs [advection-
dominated accretion flows, 65], in which gas undergoes a modified free fall, imply k =
3/2 and have long been ruled out [10] by limits on the rotation measure [20]. CDAFs
[convection-dominated accretion flows, 62; 77] and related flows like CDBFs [convection-
dominated Bondi flows, 42], in which convection carries a finite outward luminosity, all
have k = 1/2 outside some small radius: otherwise, convection becomes supersonic [38].
Three classes of flows are known to have intermediate values, 1/2 < k < 3/2, as
suggested by the observations. One of these is the ADIOS [advection-dominated inflow-
outflow solutions, 18], in which mass is lost via a wind from all radii within a rotating
ADAF; however these flows appear to require that low angular momentum material has
been removed from the axis. Another is a class of conductive flows, in which heat is
carried outward by electrons and stifles accretion at large radii [45]. A third consists of
flows which lack any significant outward convective or conductive luminosity [38], but are
nevertheless hydrostatic rather than infalling; this behavior is seen within some numerical
simulations in which magnetized gas is accreted, such as those of Igumenshchev et al. [43]
and Pen et al. [70], who termed the flow “magnetically-frustrated convection”.
We are concerned with the last flow class, as it is physically simple, realizable within
simulations, and consistent with observational constraints. Whether it is physically rel-
evant depends on the strength of conduction in the accretion flow, a question we return
to in § 2.5. Although it is of interest, previous simulations do not suffice to make any
quantitative comparisons between it and the Sgr A* accretion flow. Igumenshchev et
al. [43] have already discussed several shortcomings which afflicted prior numerical work,
such as (1) a lack of energy conservation during magnetic reconnection and (2) simulation
durations too short to capture steady states or secular trends. There are a number of
other roadblocks: (3) Dynamical range: RB is 105 Schwarzschild radii, but the largest
simulations yet done have only a factor of ∼ 102 separating their inner and outer bound-
aries; (4) Resolution: numerical solutions are rarely close enough to the continuum limit
to allow turbulent phenomena to be predicted with confidence; (5) Outer boundary con-
ditions: although matter is presumably fed into the accretion flow by stellar winds from
the nuclear star cluster [36], the flow structure and magnetization of this gas is not well
constrained; (6) Inner boundary conditions: the hole interacts with the flow in a manner
which is not fully characterized, and which is likely to dominate the energetics; (7) Mass
injection: stars within RB produce fresh wind material, which have the potential to affect
the final solution [54]; and (8) Plasma physics: close to the black hole, the flow is only
weakly collisional, leading to effects such as anisotropic pressure and conduction, which
may alter the nature of fluid instabilities and the character of heat transport [85]. Po-
tential deviations from ideal MHD become stronger as one approaches the event horizon,
and are discussed further in section 2.5.
In this paper we describe a numerical parameter survey designed to partially overcome
difficulties (1)-(5) in the above list, while making an educated guess regarding (7) and
leaving (6) and (8) to future work. Specifically, we conduct three dimensional, explicitly
energy conserving simulations to the point of saturation – often tens of dynamical times
at RB. We vary the dynamical range and resolution in order to gather information about
the astrophysical limits of these parameters, although they lie beyond our numerical
reach. We push numerical outer boundaries far enough from RB to minimize their effect
on the flow, and we vary the conditions exterior to RB in order to gauge the importance
of magnetization and rotation in the exterior fluid. Our simulations obey ideal MHD, but
are viscous and resistive on the grid scale for numerical reasons; we make no attempt to
capture non-ideal plasma effects. We do not account for stellar mass injection within the
simulation volume. Our gravity is purely Newtonian, and at its base we have a region of
accretion and reconnection rather than a black hole (although we are currently pursuing
relativistic simulations to overcome this limitation). Our numerical approach is described
more thoroughly below.
By varying the conditions of gas outside RB and by varying the allocation of grid
zones within RB we are able to disentangle, to some degree, physical and numerical
factors within our results. We also compute integrated quantities related to the value
and time evolution of RM, and draw conclusions regarding the importance of RM(t) as
a powerful discriminant between physical models.
We reiterate that our simulations have two simplifications which could substantially
change the behaviour. 1. Our black hole boundary condition is Newtonian. Since the
deepest potential dominates the dynamics and energy of the flow, a change in this as-
sumption might alter the solution. 2. We assume ideal MHD to hold. As one approaches
the black hole, the Coulomb collision rate is insufficient to guarantee LTE. Plasmas can
thermalize through other plasma processes, but if these fail, strong non-ideal effects could
dominate and lead to rapid conduction. These effects are both strongest at small radii,
potentially modifying the extrapolation to the actual physical parameters. We address
these issues in more detail in section 2.5.
2.2 Simulation detail
2.2.1 Physical setup and dimensionless physical parameters
We wish our simulations to be reasonably realistic with regard to the material which
accretes onto the black hole, but also easily described by a few physical and numerical
parameters. We therefore do not treat the propagation and shocking of individual stellar
winds or turbulent motions, but take the external medium to be initially of constant
density ρ0 and adiabatic sound speed cs0, and imbued with a characteristic magnetic field
B0 and characteristic rotational angular momentum j0 (but no other initial velocity). A
Keplerian gravity field −GM/r2 accelerates material toward a central “black hole” of
mass M surrounded by a central accretion zone. The Bondi accretion radius is therefore
R_B = GM / c_s0^2 .    (2.1)
We adopt the Bondi time t_B = R_B/c_s0 as our basic time unit; this is 100 years for
the adopted conditions at Sgr A*. All of the initial flow quantities evolve during the
course of the simulation, and we run for many Bondi times in order to allow the accretion
flow to settle into a final state quite different from our initial conditions. From the above
dimensional quantities we define several dimensionless physical parameters.
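These scales can be checked against the observational numbers quoted earlier (M_BH ≃ 4.3 × 10^6 M⊙, kT ≃ 2 keV). In the sketch below, the mean molecular weight µ and the use of the isothermal sound speed are our own illustrative assumptions; they roughly reproduce R_B ≃ 0.05 pc and t_B ≃ 100 yr:

```python
import math

# CGS constants and assumed Sgr A* conditions (M and kT from the text;
# mu and the isothermal sound speed are assumptions for this estimate).
G = 6.674e-8                  # cm^3 g^-1 s^-2
Msun = 1.989e33               # g
m_p = 1.673e-24               # g
keV = 1.602e-9                # erg
pc = 3.086e18                 # cm
yr = 3.156e7                  # s

M = 4.3e6 * Msun
kT = 2.0 * keV
mu = 0.62                     # assumed mean molecular weight
cs = math.sqrt(kT / (mu * m_p))   # isothermal sound speed; an adiabatic
                                  # factor gamma would shrink R_B by ~40%

R_B = G * M / cs ** 2
t_B = R_B / cs
print(f"R_B ~ {R_B / pc:.3f} pc, t_B ~ {t_B / yr:.0f} yr")
```

The result lands within tens of percent of the quoted R_B ≃ 0.053 pc and t_B = 100 yr, with the residual difference controlled by the assumed µ and adiabatic factor.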
The adiabatic index is γ = 5/3; the initial plasma-β parameter, or ratio of gas to
magnetic pressure, is

β_0 = 8πγρ_0 c_s0^2 / B_0^2 ;    (2.2)

we consider models with β_0 = (1, 10, 100, 1000, ∞) to capture a wide range of plausible
magnetizations. In our main sequence of simulations we adopt a uniform magnetic field
B_0.[1]
The initial velocity field is v_0 = (j_0 × r)/r, where r is the separation from the black
hole. The specific vector angular momentum is thus j_0 at the rotational equator, with
solid-body rotation on spherical shells away from the equator. A dimensionless rotation
parameter is therefore

R_K / R_B = ( j_0 c_s0 / GM )^2 ;    (2.3)

here R_K = j_0^2/(GM) is the Keplerian circularization radius of the equatorial inflow. (Our
flows never do circularize at R_K, both because angular momentum transport alters the
distribution of j, and because gas pressure can never be neglected.)
We impose mass accretion and magnetic field reconnection within a zone of char-
acteristic radius Rin, described below, which introduces the dynamic range parameter
[1] We also investigated scenarios with Gaussian random field components, in which the dominant
wavelengths were some multiple of R_B; however, we abandoned these, as such fields decay on an Alfven
crossing time, confounding our attempts to quantify the accretion flow, and we did not wish to add a
turbulent driver to maintain a steady state.
RB/Rin. Because it sets the separation between small and large scales and the maximum
depth of the potential well, this ratio has a strong influence on flow properties. One of
our goals is to test how well the flow quantities at high dynamic range can be predicted
from simulations done at lower dynamic range, as the dynamic range appropriate to Sgr
A* is beyond what we can simulate.
2.2.2 Grid setup and numerical parameters
We employ a fixed, variable-spacing Cartesian mesh in which the grid spacing increases
with distance away from the black hole. To simplify our boundary conditions, we hold
the spacing fixed within the inner accretion zone and near the outer boundary. The total
box size is 4000^3 in units of the minimum grid spacing; however, this is achieved within a
numerical grid of only 300^3 to 600^3 zones. Our grid geometry allows for a large number
of long-duration runs to be performed at respectable values of the dynamic range, while
avoiding coordinate singularities and resolution boundaries. These advantages come at
the cost of introducing an anisotropy into the grid resolution; however we have tested the
code for conservation of angular momentum and preservation of magnetosonic waves, and
found it to be comparable in accuracy to fixed-grid codes with the same resolution. Our
grid expansion factor s = δdxi/dxi takes one value for xi < RB and another, larger value
for xi > RB; this allows us to devote most of our computational effort to the accretion
region of interest, while also pushing the (periodic) outer boundary conditions far away
from this region. The inner expansion factor sin is therefore an important numerical
parameter, related to both the grid’s resolution and its anisotropy where we care most
about the flow.
Within our inner accretion region, magnetic fields are reconnected (relaxed to the
vacuum solution consistent with the external field, see appendix B) and mass and heat
are adjusted (invariably, removed) so that the sound speed and Alfven velocity both
match the Keplerian velocity at RB. The accretion zone is a cube, whose width we hold
fixed at 15 in units of the local (uniform) grid separation, so we define Rin = 7.5 dxmin
(but note, the volume of this region is equivalent to a sphere of radius 9.3dxmin.) We
consider it too costly to vary the numerical parameter Rin/dxmin.
Our grid geometry imposes a local dimensionless resolution parameter

ℜ ≡ r / max_i(dx_i)    (2.4)

(the maximum being over coordinate directions), which depends both on radius and on
angle within the simulation volume. At the inner boundary ℜ ≃ 7.5−9.3; ℜ increases
to nearly s_in^−1 ≃ 10^2 at R_B, then decreases toward s_out^−1 in the exterior region. In §2.3 we
report the effective resolution at the Bondi radius, ℜ_B = ℜ(R_B), along with our results.
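The reach of such a variable-spacing mesh can be illustrated with a one-dimensional toy model. The 15-cell uniform inner region and the expansion factor s_in ≈ 0.013 below are taken from the runs described in Table 2.1; the larger outer expansion factor beyond R_B is omitted for simplicity:

```python
def expanding_grid_halfwidth(n_cells, n_uniform, s):
    """Half-width (in units of dx_min) spanned by n_cells cells along one
    axis: the first n_uniform cells are uniform (dx = 1), after which each
    cell is a factor (1 + s) wider than the last, a sketch of the
    variable-spacing Cartesian mesh."""
    width, dx = 0.0, 1.0
    for i in range(n_cells):
        if i >= n_uniform:
            dx *= 1.0 + s
        width += dx
    return width

# With a modest expansion factor, 300 cells per half-axis reach an order
# of magnitude farther than a uniform grid with the same cell count:
uniform = expanding_grid_halfwidth(300, 300, 0.0)        # 300 dx_min
expanded = expanding_grid_halfwidth(300, 15, 0.013)
print(f"uniform: {uniform:.0f} dx_min, expanded: {expanded:.0f} dx_min")
```

This tenfold gain in reach per axis at fixed zone count is what lets a 300^3 to 600^3 grid cover a 4000^3 dx_min box while concentrating resolution near the accretion zone.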
2.3 Simulations and results
Our suite of simulations is described in Table 2.1, along with some selected results. We
independently varied the magnetization, rotation, and dynamic range of the flow, as well
as the effective resolution at RB. In order to suppress the lingering effects of our initial
conditions, we ran each simulation for long enough that a total mass equivalent to all the
matter initially within RB was eventually accreted, before assessing the flow structure.
Because most of our runs exhibited a significant suppression of the mass accretion rate Ṁ
relative to the Bondi value, this constraint required us to simulate for many t_B (typically
20 t_B). This requirement put strenuous demands on our simulations (each of which
required ∼ 3 weeks to complete), and will be a serious limitation on any future simulations
performed at higher dynamic range.
Run | R_B/dx_min | R_B/R_in | 1+s_in | ℜ_B | β_0 | R_K/R_B | t_sim/t_B | Ṁ/Ṁ_Bondi | k_eff (2)
1 | 500 | 67 | 1.023 | 40.15 | ∞ | 0 | 8 | 1.02 | 1.5047
2 | 250 | 33 | 1.013 | 59.29 | ∞ | 0 | 3 | 1.10 | 1.5273
3 | 125 | 17 | 1.013 | 48.11 | 100 | 0 | 6-20 | 0.49 | 1.2482
4 | 250 | 33 | 1.013 | 59.29 | 100 | 0 | 6-20 | 0.31 | 1.1650
5 | 500 | 67 | 1.023 | 40.15 | 100 | 0 | 6-20 | 0.22 | 1.1399
6 | 1000 | 133 | 1.0315 | 30.82 | 100 | 0 | 6-10 | 0.16 | 1.1253
7 | 250 | 33 | 1.013 | 59.29 | 1 | 0 | 6-20 | 0.15 | 0.9574
8 | 250 | 33 | 1.013 | 59.29 | 10 | 0 | 6-20 | 0.26 | 1.1147
9 | 250 | 33 | 1.013 | 59.29 | 1000 | 0 | 6-20 | 0.40 | 1.2379
10 | 250 | 33 | 1.013 | 59.29 | 100 | 0.1 | 6-20 | 0.289 | 1.1450
11 | 250 | 33 | 1.013 | 59.29 | 100 | 0.5 | 6-20 | 0.286 | 1.1420
12 | 250 | 33 | 1.013 | 59.29 | 100 | 1.0 | 6-20 | 0.31 | 1.1650
13 (3) | 62.5 | 33 | 1.06 | 14.24 | 100 | 0 | 6-20 | 0.30 | 1.1557
14 (4) | 125 | 33 | 1.037 | 28.94 | 100 | 0 | 6-20 | 0.33 | 1.1829
15 | 250 | 33 | 1.013 | 59.29 | ∞ | 0.1 | 6-20 | 0.615 | 1.3610
16 | 250 | 33 | 1.013 | 59.29 | ∞ | 0.5 | 6-20 | 0.621 | 1.3637
17 | 250 | 33 | 1.013 | 59.29 | ∞ | 1.0 | 6-20 | 0.759 | 1.4211
18 | 250 | 33 | 1.013 | 59.29 | 1000 | 0.1 | 6-20 | 0.400 | 1.2379
19 (5) | 250 | 33 | 1.013 | 59.29 | 1000 | 0.1 | 6-20 | 0.469 | 1.2835
20 | 250 | 33 | 1.013 | 59.29 | 100 | 0.1 | 6-20 | 0.300 | 1.1557
21 | 250 | 33 | 1.013 | 59.29 | 10 | 0.1 | 6-20 | 0.233 | 1.0834
22 | 250 | 33 | 1.013 | 59.29 | 1 | 0.1 | 6-20 | 0.188 | 1.0220
23 | 250 | 33 | 1.013 | 59.29 | 100 | 0 | 6-20 | 0.340 | 1.1915
24 (6) | 500 | 67 | 1.0315 | 31.65 | 100 | 0.1 | 6-20 | 0.18 | 1.2434
25 (7) | 1000 | 58.9 | 1.015 | 64 | 100 | 0.1 | 6-20 | 0.19 | 1.0925

(2) Values are taken from Equation 2.5.
(3) Case of 75^3 grid resolution.
(4) Case of 150^3 grid resolution.
(5) Runs 19-23: B field is along the [0 0 1] axis.
(6) Runs 24-25: B field is along the [1 2 0] axis.
(7) Case of 600^3 grid resolution.
Chapter 2. Black hole accretion 29
Table 2.1: Simulations described in this paper. Columns:
Run number; Maximum resolution relative to the Bondi
radius; Radial dynamic range within RB; grid expansion
factor within RB; effective resolution at RB; magnetiza-
tion parameter; rotation parameter; range of simulation
times over which flow properties were measured; mean
mass accretion rate over this period; and typical density
power law slope (ρ ∝ r−k) over this period.
2.3.1 Character of saturated accretion flows
Figure 2.1 shows 2D slices from the simulation in our highest-resolution 600^3 box at 15 Bondi times (case 25); movies are also available in various formats at http://www.cita.utoronto.ca/~pen/MFAF/blackhole_movie/index.html. The remaining figures are all based on case 10, which is the most representative of the whole set of simulations. Figures 2.2 and 2.3 display spherically-averaged properties: Figure 2.2 shows the spherically-averaged density of the run, while Figure 2.3 shows the spherically-averaged radial velocity, β, and entropy (normalized to the Bondi entropy). The entropy inversion is clearly visible; it leads to the slow, magnetically frustrated convection.

We draw several general conclusions from the runs listed in Table 2.1:

- In the presence of magnetic fields, the flow develops a super-adiabatic temperature gradient and flattens to k ∼ 1. Gas pressure remains the dominant source of support at all radii, although magnetic forces are always significant at the inner radius.

- Mass accretion diminishes with increasing dynamic range, taking values M ≃ (2−4) MB (Rin/RB)^{3/2−k}.
- Even significant rotation at the Bondi radius has only a minor impact on the mass
accretion rate, as the flows do not develop rotationally supported inner regions.
- Our results depend only weakly on the effective resolution ℜB.
- In the absence of magnetic fields and rotation, a Bondi flow develops. ([70] further
demonstrated a reversion to Bondi inflow if magnetic fields are suddenly eliminated;
we have not repeated this experiment.)
2.3.1.1 Lack of rotational support
The non-rotating character of the flow casts some doubt on models which depend on
equatorial inflow and axial outflow. Our nonrelativistic simulations cannot rule out an
axial outflow from a spinning black hole, but they certainly show no tendency to develop
rotational support in their inner regions, even after many tens of dynamical times. In
a rotating run, angular momentum is important at first in preventing the accretion of matter from the equator. Axial, low-j material does accrete, but some of it shocks and drives an outflow along the equator (as reported by [70] and [75]). After a few tB this quadrupolar flow disappears, leaving behind a nearly hydrostatic, slowly rotating envelope which persists for our entire simulation time, i.e. tens of tB. We attribute
the persistence of this rotational profile to magnetic braking, as the Alfven crossing time
of the envelope is always shorter than its accretion time. Magnetic fields thus play a role
here which is rather different than in simulations which start from a rotating torus, where
the magneto-rotational instability is the controlling phenomenon; the critical distinction
is the presence of low-angular-momentum gas.
Unlike compact object disks, which accrete high-angular-momentum material and are
guaranteed to cool in a fraction of their viscous time, the GCBH feeds upon low-angular-
momentum matter, and its accretion envelope cannot cool. For both of these reasons
Figure 2.1: 2D slices of the 600^3 simulation at 15 Bondi times. Colour represents the entropy, and arrows represent the magnetic field vector. The right panel is the equatorial plane (yz), while the left panel is a perpendicular slice (xy). White circles represent the Bondi radius (rB = 1000). The fluid is slowly moving, in a state of magnetically frustrated convection. A movie of this flow is available in the supporting information section of the electronic edition.
[Figure 2.2 plot: log10(ρ) versus log10(R/RB), with arrows marking the inner boundary and the Bondi radius.]
Figure 2.2: Density versus radius. The dotted line represents the density profile of the Bondi solution, the steepest plausible slope, with k = 1.5. The dashed line represents the density scaling of the CDAF solution, the shallowest proposed slope, with k = 0.5. The solid line is the density profile from one of our simulations, which is intermediate between the two.
[Figure 2.3 plot: log10(β), radial velocity, and entropy versus log10(R/RB), with arrows marking the inner boundary and the Bondi radius.]
Figure 2.3: log(β), entropy, and radial velocity versus radius. The dashed line represents the radial velocity vr/cs in units of the sonic Mach number; the dots represent vr/cms, the radial velocity in units of the magnetosonic Mach number. The solid line is the entropy; the entropy inversion which leads to the slow, magnetically frustrated convection is visible. Inside the inner boundary the sound speed is lowered, leading to the lower entropy. The + symbols show the plasma β.
it is not surprising to discover a thick, slowly rotating accretion envelope rather than a
thin accretion disk. We stress that global simulations, which resolve the Bondi radius
and beyond and continue for many dynamical times, are required to capture the physical
processes which determine the nature of the flow.
2.3.1.2 Dependence on parameters; Richardson extrapolation
We now investigate whether our results for the accretion rate can be distilled into a single,
approximate expression. It is clear from the results in Table 2.1 that rotation affects the
accretion rate in a non-monotonic fashion. However, as we have just noted, rotation plays a minor role in our final results, so we are justified in fitting only the non-rotating runs. Rather than M/MBondi, we fit an effective density slope keff, defined by
    M/MBondi = (Rin/RB)^{3/2−keff}.   (2.5)
There are three major variables: the magnitude of the ambient magnetic field (β0), the radial dynamic range (RB/Rin), and the resolution of the Bondi scale (ℜB). Our fit is

    keff = 1.50 − 0.56 β0^{−0.098} + 6.51 (RB/Rin)^{−1.4} − 0.11 ℜB^{−0.48};   (2.6)
all seven numerical coefficients and exponents were optimized against the 25 runs in Table 2.1. The form of equation (2.6) is significantly better than others we tested, including forms involving log(RB/Rin) and log(ℜB). It predicts the entries in Table 2.1 to within a root-mean-square error of only 0.017.
Somewhat unexpectedly, this nonlinear fit to our simulation output recovers the Bondi
solution in the continuum, unmagnetized limit (keff → 3/2 as β0 → ∞, RB/Rin → ∞,
and ℜB → ∞). Moreover, the form of the expression allows us to extrapolate, in the manner of Richardson extrapolation, to the conditions we expect to be relevant to Sgr A*: ℜB → ∞, RB/Rin ∼ 10^5, and β0 ∼ 1−5; this gives keff ∼ 0.94−1.0.
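As a sanity check, the fit and the quoted extrapolation can be reproduced in a few lines; a minimal Python sketch (the coefficients are those of equation 2.6, and the Sgr A* inputs are the ones quoted above):

```python
import numpy as np

def k_eff(beta0, rb_over_rin, res_b):
    """Effective density slope from the fit of equation (2.6).

    beta0       : ambient magnetization parameter (np.inf = unmagnetized)
    rb_over_rin : radial dynamic range R_B / R_in
    res_b       : effective resolution at the Bondi radius (np.inf = continuum)
    """
    return (1.50
            - 0.56 * beta0 ** -0.098
            + 6.51 * rb_over_rin ** -1.4
            - 0.11 * res_b ** -0.48)

# Continuum, unmagnetized limit recovers the Bondi slope k = 3/2:
print(k_eff(np.inf, np.inf, np.inf))    # -> 1.5

# Extrapolation to Sgr A* conditions (R_B/R_in ~ 1e5, resolved Bondi scale):
print(k_eff(1.0, 1e5, np.inf))          # ~ 0.94
print(k_eff(5.0, 1e5, np.inf))          # ~ 1.02
```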
It is encouraging that this result lies in the vicinity of observational constraints,
lending additional credence to the notion that Sgr A* is surrounded by a “magnetically-frustrated” accretion flow. We must recall, however, that this is only an extrapolation
based on simulations which lack potentially important physics such as a relativistic in-
ner boundary and a non-ideal plasma. The absence of an imposed outward convective
luminosity is likely to be the essential element which allows for a lower value of k.
2.4 Rotation measure
The magnitude of RM constrains the density of the inner accretion flow, thereby also
constraining the mass accretion rate and power law index k. Future observations should
provide time series of RM(t), a rich data set which encodes important additional infor-
mation about the nature of the flow. Our goal in this section will be to characterize RM
variability within our own simulations sufficiently well to distinguish them from other
proposed flow classes.
We pause first to consider why RM should vary at all. The rotation of polarization is determined by an integral (eq. A.1, [86]) proportional to ∫ ne B · dl over the zone of nonrelativistic electrons. The integral is typically dominated by conditions
at Rrel, the radius where kTe = me c^2. Even if ne is reasonably constant, B likely will
change in magnitude and direction as the flow evolves. Given that the dynamical time
at Rrel is under a day, any strongly convective flow should exhibit significant day-to-day
fluctuations in RM; measurements by [58] appear to rule this out. Rotational support
also implies rapid RM fluctuations unless B is axisymmetric. In the highly subsonic
flow of magnetically-frustrated convection, however, RM may vary on much longer time
scales.
Two proposals have been advanced in which RM(t) would be roughly constant.
Within their simulations of thick accretion disks, [84] show that trapping of poloidal
flux lines leads to a rather steady value of RM for observers whose lines of sight are
out of the disk plane. [85] point to the constancy of RM in the steady, radial magnetic
configuration which develops due to the saturation of the magneto-thermal instability
(in the presence of anisotropic electron conduction). We suspect that noise at the dy-
namical frequency is to be expected in both these scenarios, which need not exist in a
magnetically frustrated flow. We also note that both scenarios lead to systematically low
values of RM for a given accretion rate, and therefore imply somewhat higher densities
than we inferred from a spherical model; this may be observationally testable.
Our calculation of RM(t) is based on case 10 in Table 2.1. In Figure 2.4 we plot
RM(t) against an analytical estimate of its magnitude. For this purpose we estimate RM
as,
    RM ≡ [e^3 / (2π m_e^2 c^4)] ∫_{Rrel}^{RB} n_e B dr,   (2.7)

integrated along radial rays (two per coordinate axis) through the simulation volume. We neglect the difference between this expression and one which accounts for the relativistic nature of electrons within Rrel. We therefore normalize RM to the estimate RMest,

    RMest = [e^3 / (2 c^4 m_e^2)] [G M Rrel μe ne(Rrel)^3 / (11π)]^{1/2},   (2.8)
given by equation (A.5) with F (k, kT ) → 1, 〈cos(θ)〉 → 1/2, β → 10, and k → 1. Because
we do not calculate electron temperature within our simulations, we have the freedom to
vary Rrel and to probe the dependence of coherence time on this parameter. In practice
we chose Rrel = (17, 26, 34, 43)δxmin in order to separate this radius from the accretion
zone (7.5 δxmin) and the Bondi radius (250 δxmin in this case). Figure 2.4 illustrates RM(t) along each coordinate axis for the case Rrel = 17 δxmin. As this figure shows, RM changes slowly and its amplitude agrees with our estimate RMest. In our simulations, we
can measure the full PDF, shown in figure 2.5.
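The integral of equation (2.7) is straightforward to evaluate numerically along a discretized ray; a minimal cgs sketch, with purely illustrative power-law profiles (the radii, densities, and fields below are hypothetical, not taken from the simulations):

```python
import numpy as np

# Physical constants in cgs units.
e_cgs = 4.803e-10      # electron charge [esu]
m_e   = 9.109e-28      # electron mass [g]
c     = 2.998e10       # speed of light [cm/s]

# Prefactor of equation (2.7); in cgs it evaluates to ~2.63e-17.
prefactor = e_cgs**3 / (2.0 * np.pi * m_e**2 * c**4)

def rotation_measure(r, n_e, B_par):
    """Trapezoidal evaluation of RM = prefactor * int n_e B_par dr (eq. 2.7).

    r     : radii [cm], increasing from R_rel to R_B
    n_e   : electron density at each radius [cm^-3]
    B_par : line-of-sight magnetic field at each radius [G]
    """
    f = n_e * B_par
    return prefactor * 0.5 * np.sum((f[1:] + f[:-1]) * np.diff(r))

# Illustrative (hypothetical) k = 1 profiles: n_e ~ r^-1 and B ~ r^-1.
r   = np.logspace(13.0, 16.5, 2000)                 # cm
n_e = 1.0e7 * (r / r[0]) ** -1.0
B   = 1.0e-2 * (r / r[0]) ** -1.0
rm_rad_per_m2 = rotation_measure(r, n_e, B) * 1e4   # rad cm^-2 -> rad m^-2
```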
We can ask how well a single measurement of RM constrains the characteristic RM,
say the ensemble-averaged root-mean-square value RMrms. This is a question of how
well a standard deviation is measured from a single observation. From figure 2.5 we
see that the distribution in our simulations is roughly Gaussian with standard deviation
[Figure 2.4 plot: RM/RMest versus t/tB.]

Figure 2.4: Rotation measure versus time (in units of tB), for Rrel = 17 δxmin, corresponding to Rrel/RB = 0.068. Six lines represent the three axes: the upper set is X (centered at +3), the center is Y (centered at 0), and the lower is Z (centered at -3), with positive and negative directions drawn as solid and dashed lines, respectively.
[Figure 2.5 plot: RM distribution versus (RM/RMest)/σRM.]

Figure 2.5: PDF of the RM in Figure 2.4. The dashed line represents a Gaussian distribution. The horizontal axis has been normalized by the standard deviation in Figure 2.4, σRM = 0.63.
Figure 2.6: The rotation measure integrand ρBr versus radius and time. The central dark bar represents the inner boundary; the vertical axis is the Z axis. The horizontal axis is time, in units of tB. Greyscale represents sign(Br) (ρ|Br|)^{1/4}, scaled to be more visually accessible. The coherence time is longer at large radii and at late times. Several Bondi times are needed to reach the steady-state regime.
[Figure 2.7 plot: autocorrelation versus log10(tlags/tB).]

Figure 2.7: Autocorrelation of the RM(t) curves of Figure 2.4. The horizontal axis is the time lag; the vertical axis is the autocorrelation for different Rrel. The dotted, dashed, dash-dotted, and solid lines correspond to Rrel = 43, 34, 26, and 17 δxmin, respectively.
[Figure 2.8 plot: log10(tlags/tB) versus log10(Rrel/RB); the straight line is the best fit with fixed slope 2.]
Figure 2.8: RM coherence time τ as a function of the inner truncation radius Rrel; points refer to Rrel = 17, 26, 34, and 43 δxmin. The bootstrap error of 0.17 dex is based on the six data points, two for each coordinate direction, at each Rrel. The normalization for Rrel = RB is log10(tlags/tB) = 2.15.
σRM = 0.63 RMest. One needs to apply Bayes’ Theorem to infer the variance of a Gaussian from N independent measurements:

    ΔRMrms = (2/N)^{1/2} σRM.   (2.9)
To date, no sign change in RM has been observed, suggesting that we have only one independent measurement. Estimating RMrms from a single data point requires a Bayesian inversion. Using the distribution from our simulation with a flat prior, the 95% confidence interval for the ensemble characteristic RM given this one data point spans two orders of magnitude!
In other words, if in fact RMrms = 5.4 × 10^6, it is not very surprising that we have observed RM ≃ −5.4 × 10^5. The maximum likelihood estimate is RMrms = RM. The 95% upper bound is RMrms = 33 RM, and the lower bound is RMrms = 0.33 RM. More data are essential to constrain this very large uncertainty.
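The inversion can be sketched numerically. Here we assume a Jeffreys (1/σ) prior as one concrete reading of the flat prior, so the resulting interval is illustrative and need not reproduce the quoted 0.33-33 bounds exactly:

```python
import numpy as np

def sigma_interval_one_point(x, level=0.95):
    """Credible interval for the standard deviation sigma of a zero-mean
    Gaussian given a single draw x, under an assumed Jeffreys prior
    p(sigma) ~ 1/sigma (one reading of a 'flat' prior)."""
    sig = np.logspace(-3, 3, 200000) * abs(x)
    # log posterior = log prior + log likelihood = -2 log(sigma) - x^2/(2 sigma^2)
    logp = -2.0 * np.log(sig) - x**2 / (2.0 * sig**2)
    p = np.exp(logp - logp.max())
    cdf = np.cumsum(p * np.gradient(sig))
    cdf /= cdf[-1]
    lo = sig[np.searchsorted(cdf, (1.0 - level) / 2.0)]
    hi = sig[np.searchsorted(cdf, 1.0 - (1.0 - level) / 2.0)]
    return lo, hi

lo, hi = sigma_interval_one_point(1.0)
# The interval spans roughly two orders of magnitude (about 0.45 to 32 here),
# and the likelihood alone is maximized at sigma = |x|, i.e. RM_rms = RM.
```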
A visual description of the RM integrand through the flow is shown in Figure 2.6. The variability time scale is shorter at small radii and shorter at the beginning of the simulation. Simulations lasting many Bondi times, with boundaries many Bondi radii away, are necessary to see the characteristic flow patterns.
To be more quantitative, we plot in Figure 2.7 the autocorrelation of RM(t) for
different Rrel. We define the coherence time τ to be the lag at which the autocorrelation
of RM falls to 0.5.
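A minimal sketch of this diagnostic (the sine-wave input below is only a sanity check, not simulation data):

```python
import numpy as np

def coherence_time(x, dt=1.0):
    """Lag at which the normalized autocorrelation of x first falls below 0.5."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    acf = np.correlate(x, x, mode="full")[x.size - 1:]   # lags 0..N-1
    acf = acf / acf[0]
    below = np.nonzero(acf < 0.5)[0]
    return below[0] * dt if below.size else np.nan

# Sanity check: the autocorrelation of sin(w t) is ~cos(w tau), which first
# falls to 0.5 at tau = T/6 (T = period).
T = 200                                   # samples per period (arbitrary)
t = np.arange(20 * T)
tau = coherence_time(np.sin(2.0 * np.pi * t / T))
# tau is close to T/6 ~ 33 samples
```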
The actual RM radius Rrel is not resolved in our simulations. In order to extrapolate to physically interesting regimes, we fit a trend to our limited dynamic range. The characteristic variability time scale is set by the flow speed, so τ ∝ Rrel^3 ρ(Rrel)/M. For our characteristic value k ∼ 1, we have τ ∝ Rrel^2, which we fit to our coherence time, as shown in Figure 2.8.
For density profiles shallower than Bondi, the characteristic RM time scale τ is significantly longer than the dynamical time, ∼ (Rrel/RB)^{3/2} tB. In our fit, τ is given by the accretion time

    τ ∼ 20 (Rrel/RB)^2 (Rin/RB)^{−1/2} tB,   (2.10)
with a relatively large dimensionless prefactor. This indicates a coherence time of order
one year for the conditions at Sgr A*. The actual value of Rrel is uncertain by a factor
of 6, so the expected range could be two months to a year.
This is sufficiently distinct from one day that the difference between frustrated and dynamical flows should be readily apparent once observations span year-long baselines. We will discuss this point further in Section 2.6 below.
2.5 Discussion
In this section we wish to revisit several of the physical processes which are missing from
the current numerical simulations: stellar winds from within RB, the transport of en-
ergy and momentum by nearly collisionless electrons, and the inner boundary conditions
imposed by a central black hole.
Stellar wind input. Our simulations account for the accretion of matter from outside
the Bondi radius inferred from X-ray observations, but not for the direct input of matter
from individual stars in the vicinity of the black hole. [54] raises the possibility that
individual stars may in fact dominate the accretion flow. The wind from a single star at
radius r dominates the flow when its momentum output Mw vw satisfies

    Mw vw > 4π r^2 p(r) → 3.3 (10^{−5} M⊙ yr^{−1}) (1000 km s^{−1}),   (2.11)

where the evaluation is for a model consistent with RM constraints, in which the density follows nH ≃ 10^{7.3} (r/RS)^{−1} cm^{−3} and the pressure follows p ≃ 10^{4} (r/RS)^{−2} dyn cm^{−2}; note that the criterion is independent of radius for k = 1. The required momentum output, equivalent to 10^{6.2} L⊙/c, is well above the wind force of any of the OB stars observed within RB. While stars within RB add fresh matter faster than it is accreted by the
hole, we can be confident that no single star dominates the flow. If the density slope were substantially shallower, as for example in a CDAF with k = 1/2, stellar winds would be a more important factor.
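The numbers quoted above can be checked to order of magnitude in a few lines; we assume a 4 × 10^6 M⊙ black hole, which is not stated in this section:

```python
import numpy as np

# cgs constants and an assumed 4e6 Msun black hole (an illustrative
# assumption -- the mass is not given in this section).
G    = 6.674e-8          # cm^3 g^-1 s^-2
c    = 2.998e10          # cm/s
Msun = 1.989e33          # g
yr   = 3.156e7           # s
R_S  = 2 * G * (4.0e6 * Msun) / c**2        # Schwarzschild radius, ~1.2e12 cm

def confinement_force(r):
    """4 pi r^2 p(r) for the quoted k = 1 profile p ~ 1e4 (r/R_S)^-2 dyn cm^-2.
    Independent of r, as the text asserts."""
    p = 1.0e4 * (r / R_S) ** -2.0
    return 4.0 * np.pi * r**2 * p            # dyn

# Wind momentum output on the right-hand side of (2.11):
mdot_v = 3.3 * (1.0e-5 * Msun / yr) * 1.0e8  # dyn, ~2e29

# confinement_force(r) ~ 1.8e29 dyn at every radius, comparable to mdot_v.
```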
Collisionless transport. In the context of a dilute plasma where Coulomb collisions
are rare, electron thermal conduction has the potential to profoundly alter the flow pro-
file. The importance of this effect depends on the electrons’ ability to freely stream down
their temperature gradient [85], despite the wandering and mirroring induced by an in-
homogeneous magnetic field. The field must be weak for the magneto-thermal instability
to develop, yet weak fields are less resistant to tangling. Thermal conduction is expected to be strongest in the deep interior of the flow. If electrons actually free stream
inside of 1000 Schwarzschild radii, the electrons could be non-relativistic all the way to
the emission region, changing the interpretation of the RM. This would favour even shal-
lower density profiles, for example the CDAF models. In such a model, the RM might
be expected to vary on time scales of minutes, which appears inconsistent with current
data. If, on the other hand, the free streaming length is short on the inside, it more likely
places the fluid in an ideal regime for the range of radii in our simulations. We therefore
remain agnostic as to the role of thermal conduction in hot accretion flows, although it
remains a primary caveat of the current study. Observations of the time variability of RM will substantially improve our understanding.
Black hole inner boundary. Our current inner boundary conditions do not resemble
a black hole very closely, apart from the fact that they also allow gas to accrete. As the
inner region dominates the energetics of the flow, we consider it critical to learn how the
black hole modifies our results. We are currently engaged in a follow-up study with a
relativistic inner boundary, to be described in a future paper.
2.6 Observational Probes
RM can be measured by several techniques. Currently, efforts have concentrated at high frequencies, ν ∼ 200−300 GHz [57], where the polarization angle varies slowly with frequency. Accurate measurements over long time baselines allow discrimination between models. At high frequencies, the SMA and ALMA would allow a steady synoptic monitoring program. The full two-point correlation function in time extends the measurement space by one dimension.
At lower frequencies, high spectral resolution is needed to resolve the winding rate, which is now tractable with broad-band, high-resolution instruments such as the EVLA and ATCA/CABB. The higher winding rate would allow a much more sensitive measurement of small changes in the RM, which would also be a discriminant between models. The challenge here is that the polarization fraction drops significantly with frequency, requiring a more accurate instrumental polarization model. On the other hand, the very characteristic λ^2 dependence of RM should allow a robust rejection of instrumental effects.
At lower frequencies, the spatial extent of the emission region is also expected to increase. When the emission region approaches the rotation measure screen, one expects depolarization. Direct polarized VLBI imaging could shed light on this matter, though it is complicated by interstellar scattering, which also increases the apparent angular size. The changing emission location as a function of frequency may complicate the RM inferences [32; 24], and can skew the actual value of the inferred RM, resulting in an underestimate. The sign of RM would generically be a more robust quantity, and looking for changes in the sign of RM could be a proxy for the correlation function.
A separate approach is to use other polarized sources as probes of the flow. One candidate population is pulsars. At the Galactic Center, interstellar scattering (Lazio
2000) smears out the pulses, making them difficult to detect directly, but the pulse-averaged flux should still be present. Over a pulsar's orbit around the black hole, one can measure the time variation of the RM, probing the spatial RM variations in the accretion flow. Some pulsars, such as the Crab, exhibit giant pulses, which could still be visible despite the scattering-delay smearing; these could be used to measure the dispersion measure (DM) along the orbit. The GMRT at 610 MHz would have optimal sensitivity for detecting the non-pulsed emission from pulsars, and would be able to deconfuse them from the dominant synchrotron emission using rotation measure synthesis [22].
2.7 Summary
A series of new, large-dynamical-range, secular MHD simulations is presented to help understand the low luminosity of the supermassive black hole in the Galactic Center. These are the first global 3-D MHD simulations which avoid imposing boundary conditions at outer radii, impose ingoing boundaries only at the interior, and run for many Bondi times. We confirm a class of magnetically frustrated accretion flows whose bulk properties are only weakly dependent on physical and numerical parameters, including resolution, rotation, and magnetic fields. No significant rotational support or outward flow is observed in our simulations. An extrapolation formula is proposed, and the resulting accretion rate is consistent with observational data.
A promising probe of the nature of the accretion flow is the rotation measure and its time variability. In this comparison, the dominant free parameter is the electron temperature. We argued that over the plausible range, from thermal to adiabatic, the relativistic radius Rrel varies from 40 to 250 Schwarzschild radii. The RM variations in the simulations are intermittent, requiring many measurements to determine this last free parameter.
We propose that temporal rotation measure variations are a generic prediction to
distinguish between the wide variety of theoretical models currently under consideration,
ranging from CDAF through ADIOS to ADAF. RM is dominated by the radius at which
electrons turn relativistic, when the flow is still very subrelativistic, and is thus much
further out than the Schwarzschild radius. Most models, other than the ones found in our simulations, involve rapidly flowing plasmas with Mach numbers near unity. These generically result in rapid RM variations on time scales of hours to weeks (or, in special cases, an infinite time scale). In contrast, our simulations predict variability on time scales of weeks to years. A major uncertainty in this prediction is the poorly constrained standard deviation of the RM, which long-term RM monitoring is required to quantify.
Future observations of RM time variability, or spatially resolved measurements using
pulsars, will provide valuable information.
Chapter 3
Fast magnetic reconnection
3.1 Introduction
In ideal hydrodynamics, irreversible processes, such as shock waves and vorticity recon-
nection, occur at dynamical speeds, independent of microscopic viscosity parameters.
Weak solutions describe these irreversible discontinuous solutions of the Euler equations.
While smooth flows conserve entropy and vorticity, the infinitesimal discontinuity sur-
faces generate entropy and reconnect vorticity. This can also be understood as a limiting
case starting with finite viscosity, where these surfaces have a finite width.
The ideal limit of MHD poses a new class of problems in dissipative processes.
If two opposing field lines sit nearby, a state of higher entropy can be reached by
reconnecting the field lines, and converting their magnetic energy into fluid entropy. In
the presence of resistivity, this process occurs on the resistive time scale of some relevant length scale. This statement exaggerates the problem somewhat: extensive theoretical research on magnetic reconnection ([17], [73]) has shown that scales intermediate between the size of a system and the resistive scale can be important. Nevertheless, in many astrophysical settings, simple models for reconnection give time scales that are very long, while reconnection is observed or inferred to occur on much shorter time scales; for solar flares, more than 10^10 times faster than the theory predicts [29]. This has led to the suggestion that magnetic reconnection in the limit of vanishing resistivity might also approach a weak (discontinuous) solution, occurring at a finite speed which is insensitive to the value of the resistivity.
The problem is best illustrated by the Sweet-Parker configuration ([90], [68]), in which opposing magnetic fields interact in a thin current sheet, the reconnection layer. This unmagnetized layer becomes a barrier to further reconnection. In a finite reconnection region, fluid can escape the reconnection region at Alfvenic speeds. Because the reconnection region is thin, the reconnection speed is reduced from the Alfven speed by the ratio of the current sheet width to the transverse system size. In the Sweet-Parker model this factor is the inverse square root of the Lundquist number VA L/η, with η the plasma resistivity. The predicted sheet widths are typically extremely thin.
Petschek proposed a fast magnetic reconnection solution ([71]) based on the idea that
magnetic reconnection happens in a much smaller diffusive region, called the X-point,
instead of a thin sheet. The global structure is determined by the log of the Lundquist
number, and stationary shocks allow the fluid to convert magnetic energy to entropy.
However, Biskamp’s simulations ([16]) showed that Petschek’s solution is unstable when the Ohmic resistivity becomes very small. In these two-dimensional incompressible resistive MHD simulations, plasma and magnetic flux were injected and ejected across the boundary, and the boundary condition was changed during the simulation to eliminate the boundary current layer. Considering the current sheet formed in the simulation, however, the computational domain may not have been big enough. After reproducing different scaling simulation results ([16], [52]), Priest and Forbes [74] pointed out that it is the boundary conditions that determine what happens (including the instability in Biskamp’s Petschek-like simulations), and that sufficiently free boundary conditions can make fast reconnection happen. However, no self-consistent simulation of fast reconnection has been reported, except with artificially enhanced local resistivity [82].
Reconciling the observed fast reconnection with its absence in simulations leads to two possible resolutions: 1) ideal MHD is not the correct set of equations, and long-range collisionless effects are required; or 2) the assumptions made about the reconnection regions are too restrictive, including two-dimensionality and the choice of boundary conditions.
In exploring the first possibility, it was found that integrating the MHD equations with the Hall term, or using a kinetic description ([15]), made it possible to obtain fast reconnection. However, this offers no help for collisional systems, which exhibit fast magnetic reconnection whether or not the Hall term is present; moreover, enhanced local resistivity is not generic in astrophysical environments, which mostly involve highly conducting fluids.
For the second possibility, we note that Lazarian & Vishniac (LV99) [51] proposed a
model of fast magnetic reconnection with low amplitude turbulence. Subsequent simu-
lation results [48] support this model. They found that the reconnection rate depends
on the amplitude of the fluctuations and the injection scale, and that Ohmic resistivity
and anomalous resistivity do not affect the reconnection rate. The result that only the
characteristics of turbulence determine the reconnection speed provides a good fit for
reconnection in astrophysical systems.
LV99 offered a solution to fast magnetic reconnection in collisional systems with turbulence. In this paper, we consider a different problem: whether fast reconnection is still possible without turbulence. We present an example of fast magnetic reconnection in an ideal three-dimensional MHD simulation in the absence of turbulence, exploring a different aspect: 3-D effects and boundary conditions. Traditionally, simulations have searched for stationary 2-D solutions, or scaling solutions. In the case of fast reconnection, the geometry changes on an Alfven time, so these assumptions might not be applicable. Specifically, we bypass the choice of boundary condition by using a periodic box.
The primary constructive fast reconnection solution, the Petschek solution, has some peculiar aspects. The global geometry of the flow, and the reconnection speed, depend on the details of a microscopic X-point. This X-point involves only an infinitesimal amount of matter and energy, so it seems rather surprising that this tiny volume could affect the global flow. Instead, one might expect the global flow of the system, which dominates the energy, to be the controlling factor. We will see that this is particularly important in our simulations.
3.2 Simulation setup
3.2.1 Physical setup
The purpose of the simulation is to study magnetic reconnection and its dynamics. We
start by dividing the volume in two, with each subvolume containing a uniform magnetic
field. In a periodic volume, this results in two current sheets where reconnection can
occur. An initial perturbation is added to trigger the reconnection.
3.2.2 Numerical setup
We have a reference setup, and vary numerical parameters relative to that. Initially the
upper and lower halves of the simulation volume are filled with uniform magnetic fields
whose directions differ by 135 degrees (Figure 3.1). The magnitude of the magnetic field
is the same for every cell, and β, the ratio of gas pressure to magnetic pressure, is set to
one.
There is a rotational perturbation at the interface of the magnetic fields, at the center
of the box, inside a sphere of radius 0.05 relative to the box size. The rotation axis
is nearly along the X axis, with a small deviation used to break any residual
symmetry. We use constant specific angular momentum at the equator, with solid-body
rotation on shells, generated by the same initial condition generator as [70]. The
rotational speed is set equal to the sound speed at a radius of 0.02, corresponding to
0.4 times the sound speed at the sphere's equatorial surface.
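The relation between the two quoted speeds follows from the constant specific angular momentum. The following is an illustrative sketch (the variable names and the exact profile are our own reconstruction, not the actual initial-condition generator of [70]):

```python
import math

# Illustrative sketch: constant specific angular momentum l0 = v_phi * r at
# the equator, with solid-body rotation on each spherical shell.
c_s = 1.0        # initial sound speed, code units
r_match = 0.02   # radius where the rotation speed equals the sound speed
r_sphere = 0.05  # radius of the perturbed sphere, relative to the box size

l0 = c_s * r_match                 # specific angular momentum (constant)

def v_equator(r):
    """Equatorial rotation speed of the shell at spherical radius r."""
    return l0 / r

def omega(r):
    """Solid-body angular velocity of the shell at radius r."""
    return v_equator(r) / r

# The two speeds quoted in the text: c_s at r = 0.02, and 0.4 c_s at the
# sphere's equatorial surface r = 0.05.
assert math.isclose(v_equator(r_match), c_s)
assert math.isclose(v_equator(r_sphere), 0.4 * c_s)
```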
We also tried adding a localized magnetic field perturbation: a random Gaussian
magnetic field, with β = 1 and a correlation length of half the box size, was added in the
same region as the rotational perturbation. Since the only dissipation is numerical, on
the grid scale, a translational velocity [93] was added to all cells in the box to increase
the numerical diffusion. The reference value of the translational velocity is equal to the
sound speed, and we measure time in units of the crossing time (CT), the box size
divided by the initial sound speed. Varying the translational velocity by a factor of 2 up
or down does not change the results. At the beginning, the Alfvén speed is the same as
the sound speed.
Different resolutions were tested, from 50³ cells to 800³ cells.
3.3 Simulation results
3.3.1 Global fast magnetic reconnection
We use the total magnetic energy as a global diagnostic of the system. Figure 3.2 shows
the evolution of the magnetic energy. The generic feature is a sudden drop of magnetic
energy, occurring within an Alfvénic box crossing time, during which much of the magnetic
energy is dissipated. The onset of this event depends on numerical parameters. Due to
symmetries in the code, in the absence of any initial perturbation the initial conditions
would be maintained indefinitely.
We can see that when there is no forced diffusion and no initial perturbation, the
magnetic energy is almost stationary. When diffusion is added, the magnetic energy
decays gradually throughout the simulation.
When explicit velocity perturbations are present, all the simulations show a sudden
decrease of magnetic energy, which indicates fast magnetic reconnection. The common
property is that they all have some initial perturbation, either rotational or a strong
localized field perturbation; the background diffusion only affects how early reconnection
happens. To make sure this fast reconnection is not a resolution effect, we ran different
resolutions, from a 50³ box to an 800³ box (Figure 3.3). All show
fast reconnection, and the resolution only affects the time elapsed before fast reconnec-
Figure 3.1: Numerical setup: the sphere at the center of the box represents the region of
the rotational perturbation. The upper-left panel shows the rotational perturbation viewed in the YZ plane.
[Figure 3.2 plot: log10(B²/2p0) versus t/CT. Legend: dash-dot: diffusion and field perturbation; dashed: no diffusion or perturbation; dotted: rotational perturbation only; solid: diffusion only; asterisks: diffusion and rotational perturbation.]
Figure 3.2: Reconnection for different initial conditions. The total magnetic energy
is an indicator of reconnection. The dash-dot line has a non-zero mean magnetic field
perturbation, so the reconnected field asymptotes to a slightly different value.
[Figure 3.3 plot: log10(B²/2p0) versus t/CT. Legend: dash-dot: 800³ cells; dashed: 400³; dotted: 200³; solid: 100³; asterisks: 50³.]
Figure 3.3: Reconnection for different resolutions.
[Figure 3.4 plot: (B²/B²_rec) − 1 versus (t − t_rec)/ct_rec, the slope of the magnetic energy drop versus time. Legend: dash-dot: 800³ cells; dashed: 400³; dotted: 200³; solid: 100³; asterisks: 50³.]
Figure 3.4: Reconnection for different resolutions near the reconnection point. This plot
recenters Figure 3.3 to the time of maximum magnetic energy release, and rescales the
horizontal and vertical axes by the mean Alfvén time and the fractional energy release, respectively.
Figure 3.5: 2D snapshot during reconnection, with the current magnitude as the background color.
tion happens, though the details of how the delay depends on resolution are still unclear.
To show the energy drop in more detail, we plot the evolution of the magnetic energy
near the reconnection point (±2 CT) in Figure 3.4. We see a rapid reconnection event
in which roughly 30% of the magnetic energy is released in one Alfvénic crossing time,
at a rate that does not depend substantially on resolution; this is clearly fast reconnection
by any reasonable criterion.
Figure 3.5 shows a rough two-dimensional snapshot of the current (∝ ∇ × B) during
fast reconnection, with color representing the current magnitude. There are clearly
regions of high current where reconnection occurs, which we analyze at higher
resolution below.
3.3.2 What happens on the current sheet?
There are regions with large currents, where the reconnection should happen. We now
use the high resolution run (800³ cells) to investigate what exactly happens there. We
show snapshots close to the current sheet to see how the flow evolves and what the
magnetic field geometry looks like near the current sheet. We subtract the average value
of both the magnetic field and the velocity in the region close to the current sheet, which
places us in the frame comoving with the fluid. The mean magnetic field does not
participate in the dynamics of reconnection, so its removal allows us to see the dynamics
more clearly.
We present snapshots of three different times during the reconnection: the beginning,
the middle, and the end. Each snapshot contains three panels: the upper left shows the
current magnitude as the background color, with white lines representing magnetic field
lines; the lower left shows both the magnetic field (blue dashed) and the velocity field
(red solid); and the right shows the corresponding magnetic energy. Figure 3.7 shows
the beginning; Figure 3.8 the middle; and Figure 3.9 the end.
The snapshot of the magnetic and velocity field lines in Figure 3.7 resembles Figure
3.6 [71], the geometry of Petschek's solution for fast magnetic reconnection. The X-point,
which is the reconnection region, is small and lies at the center. The tangent of the angle
α represents the ratio of inflow to outflow speed.
3.3.3 What happens globally?
We show the long-term, global 2D evolution of both the velocity field lines and the
magnetic field for the 400³ simulation, from the beginning until reconnection completes
(Figures 3.10 to 3.14). These plots are analogous to those in the previous section: the
left panel is a snapshot of both magnetic and velocity field lines; the center panel shows
the magnetic field lines with current as the background color; and the corresponding
magnetic energy is included on the right. At the beginning, the magnetic field lines are
opposed and there is no velocity field. The initial rotational perturbation then induces
two reconnected regions with closed magnetic field loops, one at each interface. The
closed loops are fed by a slow X-point at each interface. Since there is a mean field
perpendicular to the plotted surface, these loops are actually twists in the perpendicular
magnetic field. In the bulk region between the interfaces, the parallel magnetic fields are
not yet much disturbed by the perturbation.
In Figure 3.12 we see the loops move into the X-point of the opposing loop, and
strong interactions occur. The fluid forms two large circular cells, offset from the
magnetic loops. The energy driving the fluid flow comes from the reconnection energy
of the magnetic field, and this flow pattern in turn enhances the reconnection by driving
fluid through the X-point.
We illustrate the fast reconnection flows in Figure 3.15. Blue dashed circles with arrows
represent the magnetic loops. The red field lines with arrows represent the velocity field.
The two big black X's in the global frame mark the X-points for reconnection. Because
we are using periodic boundary conditions, we extend the simulation
Figure 3.6: Geometry of the Petschek solution.
Figure 3.7: Snapshot of magnetic field lines on a background of current, snapshot of
both magnetic and velocity field lines, and B² at 37 CT.
Figure 3.8: Snapshot of magnetic field lines on a background of current, snapshot of
both magnetic and velocity field lines, and B² at 39 CT.
Figure 3.9: Snapshot of magnetic field lines on a background of current, snapshot of
both magnetic and velocity field lines, and B² at 41 CT.
Figure 3.10: Snapshot of magnetic field lines on a background of current, snapshot of
both magnetic and velocity field lines, and B² at 0 CT for 400³ cells.
Figure 3.11: Snapshot of magnetic field lines on a background of current, snapshot of
both magnetic and velocity field lines, and B² at 10 CT for 400³ cells.
Figure 3.12: Snapshot of magnetic field lines on a background of current, snapshot of
both magnetic and velocity field lines, and B² at 38 CT for 400³ cells.
Figure 3.13: Snapshot of magnetic field lines on a background of current, snapshot of
both magnetic and velocity field lines, and B² at 40 CT for 400³ cells.
box picture in two other directions, to make the global flow easier to understand. Red
solid lines represent the velocity field in the real box, and red dash-dot lines represent
the field lines in the extended boxes.
Reconnection is a local process in the global flow field. To see this, we need to boost
into the comoving frame. Take the right magnetic twist for example: in the global frame,
the flow on the right all moves downwards, with the magnetic twist moving at the highest
speed. The X-point is like a saddle point for the flow: the fluid converges vertically and
diverges horizontally. In the X-point frame, setting the velocity at B to zero, A moves
down and C moves up, which supports the conditions for reconnection.
3.4 Discussion
To summarize, we have found a global flow pattern which reinforces X-point reconnection,
and the resulting fast reconnection in turn drives the global flow pattern. The basic
picture is two dimensional. We did find that a pure 2-D simulation does not show this
fast reconnection. This is easy to understand, since the reconnected field loops are loaded
with matter, and would require resistivity to dissipate. In 3-D, these loops are twists
which are unstable to a range of instabilities, allowing the field loops to collapse. So
three basic ingredients are needed: 1. a global flow which keeps the field lines outside
the X-point at a large opening angle, allowing the reconnected fluid to escape and
avoiding the Sweet-Parker time scale; 2. reconnection energy which drives this global
flow; 3. a three-dimensional instability which allows closed (reconnected) field lines to
collapse, releasing the energy stored in the field.
The problem described here has two geometric dimensionless parameters: the two axis
ratios of the periodic box. In addition, there are a number of numerical parameters. We
have varied them to study their effects.
Extending the box in the Y direction (the separation between reconnection regions) shuts
off this instability, as might be expected: no global flows are possible if the two
interaction regions are too far separated. We found the threshold to be Y < 1.2Z. In
the other direction, there appears to be no limit as Y ≪ Z: increasing the size
of the Z dimension does not diminish this instability. There is also a dependence on X
(the extent along the field symmetry axis). Shortening it to one grid cell protects the topology
of the field loops, and reconnection is not observed in such effectively 2-D simulations.
We varied the initial conditions to test whether the fast reconnection is sensitive
to the initial setup. After changing the angle between the opposing magnetic fields (from
just over 90 degrees to 180 degrees), the strength of the rotational perturbation, and the
axis of the rotational perturbation, we found that the fast reconnection still appeared. The
boundary conditions were kept periodic, and the evolution of the fluid dynamics was
similar across the different initial conditions.
The fast reconnection happens at the two interfaces of the straight magnetic field at
the same time, with a magnetic twist moving towards each interface. The twists do not
collide with the magnetic field head-on, but are slightly separated in the transverse
direction. This special geometry helps the magnetic reconnection happen fast: as each
magnetic twist pushes the field lines, it also affects the velocity field on the other side,
which helps to increase the outflow speed. Looking back at the Sweet-Parker solution
([90], [68]), the main problem is that the current sheet is so thin that even if the outflow
is accelerated to the Alfvén speed, the mass flux of the outflow is still small, which slows
down the reconnection. Petschek's configuration [71] resolves this problem with a
small reconnection region and a finite opening angle for the outflow. In our simulation the
speed of the outflow is further increased by the feedback between the two reconnection
regions.
The solar flare reconnection time scale is of order the Alfvén time scale [29], i.e.
seconds to minutes.
If only magnetic diffusivity (η) is present, the diffusive time is τ_D = L²/η, with L
the characteristic length. Taking the values from [29], L = 1000 km and η = 10⁻³ m² s⁻¹,
gives τ_D ≈ 10¹⁵ s.
Sweet-Parker’s thin current sheet proposed a reconnection time as τSP = L/(VAi/R1/2mi ),
with Rmi = LυAi/η. This makes the reconnection time about 105 Alfven times.
Petschek’s configuration has a reconnection time as τP = L/(αυA), with α is between
0.01 and 0.1 and Alfven speed ∼ 100km/s, and this makes the time scale as 100−1000s.
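These order-of-magnitude estimates can be checked with a few lines of arithmetic (a back-of-the-envelope sketch using the values quoted above from [29]; it is not part of the simulation code):

```python
import math

# Time-scale estimates from the text: L = 1000 km, eta = 1e-3 m^2/s,
# v_A ~ 100 km/s (values from [29]).
L = 1.0e6      # characteristic length, m
eta = 1.0e-3   # magnetic diffusivity, m^2/s
v_A = 1.0e5    # Alfven speed, m/s

tau_D = L**2 / eta                # purely diffusive time
tau_A = L / v_A                   # Alfven crossing time
tau_P_fast = L / (0.1 * v_A)      # Petschek, alpha = 0.1
tau_P_slow = L / (0.01 * v_A)     # Petschek, alpha = 0.01

assert math.isclose(tau_D, 1e15)        # ~1e15 s, as in the text
assert math.isclose(tau_A, 10.0)        # ~10 s, same order as observed 20-60 s
assert math.isclose(tau_P_fast, 100.0)  # Petschek range: 100 s ...
assert math.isclose(tau_P_slow, 1000.0) # ... to 1000 s
```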
Our fast reconnection time is of order the Alfvén time, τ_A = L/v_A, which is the same
order as the observed time scales of 20−60 s [29]. Furthermore, in contrast to LV99, no
turbulence is needed or added in our simulations. Our fast magnetic reconnection time
scale is qualitatively similar to the energy release time scale for solar flares.
3.5 Ideal vs resistive MHD
Our code solves the equations of ideal MHD, without explicit resistivity. Ideal MHD,
like the ideal Euler equations, admits weak solutions in which the flow becomes
discontinuous. The conserved quantities remain conserved across these discontinuities,
but most derived conserved quantities do not. For example, entropy increases across
shock fronts, vorticity can be generated, and vortex lines as well as magnetic field lines
can reconnect. These features are all understood from analytical theory.
The TVD algorithm is designed to capture these discontinuous effects by effectively
introducing non-linear viscous and resistive terms on the grid scale, while enforcing the
conservation equations and preventing the numerical oscillations that arise from
differentiating discontinuous solutions.
Our TVD code solves the ideal MHD equations by computing the conservation laws
across cell boundaries, and using the TVD scheme to control oscillations. While there is
no explicit resistivity or viscosity, the solution does capture shocks and discontinuities,
which in the end generates entropy, vorticity and reconnection. This class of algorithms
has a single free numerical parameter, which is the resolution. When we study conver-
gence, we are testing if the results depend on the numerical parameter of cell size. At
each resolution we are solving the ideal MHD equations.
Other approaches could also be taken. One could add sufficient resistivity, diffusivity
and viscosity that shocks and discontinuities do not form on the grid scale, and test
the dependence of the results on changes in resistivity and viscosity. There are at least
four numerical and physical parameters that can be varied independently, corresponding to
the Reynolds number, Prandtl number, Schmidt number, resolution, etc. One would like to
see whether the solution converges to the ideal MHD limit for any ratio of these parameters,
as their dimensionful counterparts go to zero. This is a high dimensional space, and
numerically challenging to explore. In this work, we explore only a subset of this space
in the context of ideal MHD.
Physically, this corresponds to the question of whether all systems always exhibit
fast reconnection, independent of micro-physics. As a first step, we offer a constructive
existence exploration, without addressing this broader question.
3.6 Summary
We present evidence for fast magnetic reconnection in a global three dimensional ideal
magnetohydrodynamics simulation without any sustained external driving. These global
simulations are self-contained, and do not rely on specified boundary conditions. We have
quantified ranges in parameter space where fast reconnection is generic. The reconnection
is Petschek-like, and fast, meaning that ∼30% of the magnetic energy is released in one
Alfvén time.
This example of fast reconnection relies on two interacting reconnection
regions in a periodic box. It is an intrinsically three dimensional effect. Our interpretation
is that the Petschek-like X-point angles are not determined by microscopic properties at
an infinitesimal boundary, which carries negligible energy, but rather by the global flow
far away from the X-point. Whether or not such configurations are natural in an open
system remains to be seen.
Figure 3.14: Snapshot of magnetic field lines on a background of current, snapshot of
both magnetic and velocity field lines, and B² at 42 CT for 400³ cells.
Figure 3.15: Geometry of global configuration
Chapter 4
Accelerate MHD
4.1 Introduction
The magnetohydrodynamics equations are nonlinear, and cannot in general be solved
analytically. Thanks to the increasing power of computers, three dimensional simulations
can be used to model these equations numerically. Numerical simulations are crucial
both for understanding the theory of such fluids, and for directing real world
experiments. However, in order to achieve realistic representations of real world problems,
numerical experiments must push computational hardware resources to their limits.
Driven by considerations of compute power per watt and per dollar, new architectures
are being adopted for these calculations. In particular, we examine here heterogeneous
systems, which consist of two different kinds of processors: one or more general-purpose
conventional processors which control the overall computation, and specialized, usually
multi-core, processor units to which the numerically intensive computing is offloaded [83].
There are currently several heterogeneous platforms in use; we focus here on the
Cell/B.E. [1] and graphics processing units (GPUs), and use a multi-core x86 system
for comparison.
The comparison between these systems is not straightforward, because they are typically
programmed using different platform-specific languages that may also affect the
performance. The Open Computing Language (OpenCL) [4] is a cross-platform application
programming interface (API) designed for heterogeneous systems, including GPUs and
Cell. Released by the Khronos Group, it ameliorates this problem to some degree; we
will see here that OpenCL can perform as well as CUDA on an Nvidia GPU, which makes
OpenCL a good choice for heterogeneous programming, though it is more complicated
to use than CUDA.
We discuss our reference implementation of a solver for the MHD equations in Section
4.2; in Section 4.3 we discuss our implementation on several architectures. We summarize
our results and discuss future work in Section 4.4; in Section 4.5 we conclude.
4.2 The algorithms of MHD
There are many algorithms for solving these equations, which we will not attempt to
review here. We follow the approach of [69], as the conciseness of its implementation
lends itself to re-implementation for the different architectures, and its memory-access
patterns are an excellent match to the heterogeneous architectures discussed here.
The method is a second-order accurate (in space and time) high-resolution total
variation diminishing (TVD) [39] scheme. The kinetic, thermal, and magnetic energy
are conserved identically and there is no explicit magnetic or viscous dissipation. The
TVD constraints result in non-linear viscosity and resistivity on the grid scale. The TVD
constraint allows the capture of shocks for compressible flows, where the flow becomes
discontinuous.
The code solves the magnetohydrodynamics equations in a finite-difference, finite-volume
scheme. Unlike the differential form of the MHD equations listed in Chapter 1, the
integral (flux-conservation) form is used in the calculation. The MHD equations as solved
in our code are:
∂_t ρ + ∇·(ρv) = 0 (4.1)

∂_t(ρv) + ∇·(ρvv + P_* δ − bb) = 0 (4.2)

∂_t e + ∇·[(e + P_*)v − b(b·v)] = 0 (4.3)

∂_t b = ∇×(v×b) (4.4)

∇·b = 0 (4.5)

Here, for numerical convenience, the magnetic field b is normalized by a factor of √(4π).
P_* is the total pressure, equal to the sum of the gas pressure p and the magnetic
pressure b²/2; ρ and e are the mass and total energy densities, where the latter is the
sum of the kinetic energy (ρv²/2), internal energy (p/(γ − 1)), and magnetic energy (b²/2).
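As a concrete reading of these definitions, a minimal sketch (the variable names and the choice γ = 5/3 are our own illustration, not taken from the production code):

```python
import math

def total_pressure(p, b2):
    """P* = gas pressure + magnetic pressure b^2/2 (b2 = |b|^2)."""
    return p + 0.5 * b2

def total_energy(rho, v2, p, b2, gamma=5.0 / 3.0):
    """e = kinetic + internal + magnetic energy density (v2 = |v|^2)."""
    return 0.5 * rho * v2 + p / (gamma - 1.0) + 0.5 * b2

# For the beta = 1 setup of Chapter 3 (gas pressure = magnetic pressure),
# the total pressure is twice the gas pressure:
p, b2 = 1.0, 2.0          # b^2/2 = 1, so beta = p / (b^2/2) = 1
assert total_pressure(p, b2) == 2.0 * p
# at rest: e = p/(gamma - 1) + b^2/2 = 1.5 + 1.0
assert math.isclose(total_energy(rho=1.0, v2=0.0, p=1.0, b2=2.0), 2.5)
```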
The code solves the magnetic component and the fluid dynamics separately. The former
is solved by a two-dimensional advection-constraint step [69], while for the latter, a
monotone upwind scheme for conservation laws (MUSCL) is used for a one-dimensional
fluid advection update [92]. The time step is set by the Courant-Friedrichs-Lewy
(CFL) constraint, which ensures that the fastest wave cannot travel more than one
grid spacing in a single time step. The approach is 'dimensionally split' in the sense that
updates are first made along the x direction, then y, and then z; memory transposes are
used to reorient the grid between each sweep. This both greatly simplifies the numerical
kernel (which only has to be implemented once) and ensures regular memory access for
each sweep.
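The transpose bookkeeping can be illustrated with a toy sketch (NumPy, our own illustration; the placeholder kernel is not the actual MUSCL update, the point is only that one kernel written for the first axis serves all three sweeps):

```python
import numpy as np

def sweep_x(u):
    # placeholder 1D update along axis 0 (periodic shift by one cell);
    # stands in for the real MUSCL kernel, which also acts only on axis 0
    return np.roll(u, 1, axis=0)

def step(u):
    u = sweep_x(u)                      # x sweep
    u = sweep_x(u.transpose(1, 2, 0))   # y sweep: transpose brings y to front
    u = sweep_x(u.transpose(1, 2, 0))   # z sweep: transpose brings z to front
    return u.transpose(1, 2, 0)         # restore (x, y, z) ordering

u = np.random.rand(4, 4, 4)
v = step(u)
assert v.shape == u.shape               # grid orientation is restored
assert np.isclose(u.sum(), v.sum())     # the placeholder kernel is conservative
```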
The dimensional splitting reduces the fluid update to one-dimensional dynamics:

∂_t u + ∇_x F = 0. (4.6)

This is discretized into finite volumes, ensuring conservation. The fluxes are calculated
using MUSCL, a first-order upwind scheme with a second-order TVD (Van Leer limiter)
correction. Time integration is performed using a second-order Runge-Kutta scheme. To
solve the complex upwind problem involved with the momentum and energy fluxes,
relaxing TVD [44] is used for the Euler equations.
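A scalar analogue of this construction can be written in a few lines (our own illustration, for linear advection u_t + a u_x = 0 with a > 0 on a periodic grid; the production code applies the analogous limited-upwind construction to the relaxed Euler fluxes):

```python
import numpy as np

def van_leer(r):
    # van Leer limiter: phi(r) = (r + |r|) / (1 + |r|)
    return (r + np.abs(r)) / (1.0 + np.abs(r))

def muscl_step(u, c):
    """One periodic MUSCL update for u_t + a u_x = 0; c = a dt/dx."""
    du_m = u - np.roll(u, 1)               # u_i - u_{i-1}
    du_p = np.roll(u, -1) - u              # u_{i+1} - u_i
    safe = np.where(du_p == 0.0, 1.0, du_p)
    r = np.where(du_p == 0.0, 0.0, du_m / safe)
    u_face = u + 0.5 * van_leer(r) * du_p  # limited state at interface i+1/2
    flux = c * u_face                      # upwind flux (a > 0)
    return u - (flux - np.roll(flux, 1))   # conservative update

u = 2.0 + np.sin(2.0 * np.pi * np.arange(64) / 64.0)
total0 = u.sum()
for _ in range(64):
    u = muscl_step(u, 0.4)
assert np.isclose(u.sum(), total0)   # exact conservation (to roundoff)
assert u.min() >= 0.99               # TVD: no spurious undershoots
```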
The magnetic update is reduced to a two-dimensional advection-constraint step consistent
with Equation 4.4 that also ensures the constraint given by Equation 4.5. In constrained
transport [30], one stores the magnetic flux at the cell faces, which can then be used to
accurately maintain a zero divergence of the magnetic field.
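The divergence-preserving property of the staggering can be demonstrated in two dimensions with a few lines (our own sketch: the EMF here is random noise, because the point is that the discrete curl telescopes to zero in the discrete divergence, whatever the EMF):

```python
import numpy as np

def div_b(bx, by, dx=1.0, dy=1.0):
    # discrete divergence on a periodic grid; bx lives on x-faces, by on y-faces
    return ((np.roll(bx, -1, axis=0) - bx) / dx +
            (np.roll(by, -1, axis=1) - by) / dy)

def ct_update(bx, by, ez, dt, dx=1.0, dy=1.0):
    # 2D constrained transport: dBx/dt = -dEz/dy, dBy/dt = +dEz/dx,
    # with the edge-centred EMF ez differenced along cell edges
    bx = bx - dt * (np.roll(ez, -1, axis=1) - ez) / dy
    by = by + dt * (np.roll(ez, -1, axis=0) - ez) / dx
    return bx, by

rng = np.random.default_rng(0)
n = 16
bx = rng.standard_normal((n, n))
by = rng.standard_normal((n, n))
d0 = div_b(bx, by)
bx, by = ct_update(bx, by, rng.standard_normal((n, n)), dt=0.1)
# the update changes B but leaves the discrete divergence untouched
assert np.allclose(div_b(bx, by), d0)
```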
In addition, [69] proposed not storing all the computed electromotive forces (EMFs),
instead applying the individual pieces of the EMF during the advection-constraint steps.
This saves a significant amount of memory and also reduces unnecessary memory
access. As a result, the code is very memory efficient, and transposing the grid in memory
between sweeps ensures short strides along sweep directions and thus low memory-access
latency. One must remain aware of the grid-imposed data dependencies of the method,
however. The one-dimensional fluid update stencil is a standard 7-point stencil requiring
data from all 4 neighbouring 'pencils' in the direction of a sweep; the magnetic update,
to ensure the consistency of the magnetic field constraint, additionally needs the adjacent
'pencils' to be updated by the flux.
4.3 Implementation on heterogeneous systems
Heterogeneous systems have processors with different roles. As a result, a new memory
system design is needed, which poses a challenge for programmers.
In this section, we discuss our implementation and performance results for the
MHD scheme described above on different platforms: multi-core x86, Cell/B.E., an Nvidia
GPU and an ATI GPU. Each platform has a corresponding language or library: OpenMP
for multi-core x86, native Cell programming for the Cell blade, CUDA for the Nvidia GPU,
and OpenCL for the ATI GPU.
In all cases, we implement the full 3D version of the method described above. Our
performance tests measure the time taken to evolve a 3D domain of varying
size (16³, 32³, 64³, and 128³ zones) by one timestep (only the evolution step; no extra
memory transfer is included); by varying the size of the domain we can see the effects of
overheads such as memory transfer. Note that all calculations here are performed
in single precision to make comparisons more readily meaningful. All timing data
are in milliseconds.
4.3.1 x86
As a basis of comparison, we first examine the performance of the original FORTRAN
code on a multi-core x86 architecture. We use two Intel Xeon E5506 CPUs @ 2.13 GHz,
each with 4 processor cores, for this experiment.
Parallelization is done with OpenMP. Programming with OpenMP is straightforward: the
programmer only needs to add directives to the loops and the API partitions the
loops automatically. The original version of the code under consideration here already
had OpenMP parallelization, incurring only minimal overhead in code length and
complexity. The parallelization is done over 2D slabs, with parallelization occurring over
the outermost loop in the solvers.
Result

Data for different box sizes are provided in Table 4.1, with the numbers inside the
brackets indicating the number of cores. For problem sizes larger than 16³, a steady 6.7
times speedup is achieved.
4.3.2 Cell
The Cell Broadband Engine (Cell/B.E.) [1] is a collaboration of Sony, Toshiba and IBM.
It was originally designed for a gaming machine, Sony's PlayStation 3; however,
it is also a good candidate for high performance computing due to its specialized multi-
Table 4.1: Performance on the multi-core x86 for different box sizes; timings in
milliseconds. x86(1) refers to single-core performance; x86(8) to 8 cores.

                           Domain size
Architecture       16³     32³     64³    128³
x86(1)            17.8     140    1096    8770
x86(8)             4.0    20.7   163.6    1315
speedup (8:1)      4.4     6.7     6.7     6.7
core architecture. Cell/B.E.'s design, a combination of one Power Processor Element
(PPE) and eight Synergistic Processing Elements (SPEs), aims to overcome three walls:
the power wall, the memory wall and the frequency wall [12].

The PPE is a 3.2 GHz PowerPC-like processor, and is used to control the eight 3.2 GHz
SPEs, which are used for data intensive computing. An SPE can perform four single-precision
floating-point operations in a single clock cycle. With dual pipelines, this gives
3.2 × 4 × 2 = 25.6 Gflops peak single-precision performance per SPE [12]. There
are three levels of memory: the PPE's main storage, the SPE's 256 kB SRAM local
storage, and the SPE's 128-bit, 128-entry unified register file. It is the programmer's job
to handle Direct Memory Access (DMA) to transfer data between the PPE and the SPEs.
The transfer is performed over the Element Interconnect Bus (EIB), a high speed internal
bus with 204.8 GB/s peak data bandwidth [26].
Cell processors have a high-level C-like programming language.
Parallelization/Partition
In the first stage of the parallelization, the PPE assigns threads and memory to
the SPEs and performs synchronization. Once the calculation begins, the PPE is no
longer involved, and all work is done by the SPEs. DMA is used to
transfer data between main memory and local storage. Since the PPE is not used during
the calculation, the signal-notification channel is used for synchronization. One SPE is
assigned as the master. Once a synchronization point is reached, the slave SPEs send a
message to the master SPE. Upon receiving all the messages, the master releases the
slaves using a binary synchronization tree.
The fluid updates are performed along one-dimensional pencils of grid points (e.g. along
the X direction), transferred separately to each SPE for calculation; this makes best use
of the fairly modest 256 kB of local storage on each SPE. By ensuring the domain sides
are always multiples of 4, the starting address of each transfer is correctly aligned. We
follow the update order of the FORTRAN version for the fluid part. For the magnetic
update, any pencil that sits in one SPE has to update the pencils next to it (in both the
Y and Z directions). As a result, we separate the magnetic update into four
sub-functions. The intermediate values to be updated by the next pencil are sent back to
the PPE; after the synchronization of the previous function, the values are sent to the
SPEs to finish the update.
Implementing the grid transpose efficiently requires some care. For every DMA transfer,
the starting address has to be aligned to 16 bytes. To achieve higher performance,
the data for each transfer should approach 16 kB. For the regular memory accesses
involved in the fluid and magnetic updates this is straightforward, but balancing these
constraints for the non-contiguous memory access of the transpose is more difficult. As a
result, DMA lists, commands that cause execution of a list of transfer requests, are
used for this task. For every SPE, there is a 16³ cube of data elements per component
transferred by DMA list. The incoming lists hold the starting addresses of two-dimensional
planes of data and the data size. After the transpose inside the SPEs, the outgoing
lists hold the starting addresses of the transposed planes and the same data
size. Because the size is a multiple of 4, with at least single precision (4 bytes) per
element, the starting address is always a multiple of 16 bytes.
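The alignment argument can be spelled out (a sketch with our own offset formula for row starts in a contiguous single-precision array; not code from the SPE implementation):

```python
# If the domain side n is a multiple of 4 and elements are 4-byte floats,
# the byte offset of every row (and every n*n plane) in a contiguous array
# is a multiple of 16, satisfying the DMA alignment requirement.
ELEM = 4  # bytes per single-precision float

def row_offset(row, n, elem=ELEM):
    return row * n * elem  # start of a contiguous row of n elements

for n in (16, 32, 64, 128):           # sides are multiples of 4
    assert all(row_offset(r, n) % 16 == 0 for r in range(n))
```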
Optimization
Further performance gains can be achieved by taking advantage of SIMD capabilities
Table 4.2: Cell performance using the PPE or varying numbers of SPEs, for different
box sizes; timings in milliseconds.

                               Domain size
Architecture            16³     32³     64³     128³
PPE                      52     448    3745    32300
1 SPE                  22.3   163.8    1257     9901
4 SPE                   6.5    43.8     327     2607
16 SPE                  3.5      14     112      864
speedup (16 SPE:PPE)   14.9    32.0    33.4     37.4
speedup (16 SPE:1 SPE)  6.4    11.7    11.2     11.5
of the SPEs, and overlapping communication and computation.
To exploit the SIMD capabilities of the SPEs, our code's data structures are arranged
as a structure-of-arrays (SOA): the different components of the fluid and magnetic fields
are stored in separate arrays. Each single-precision SIMD operation then processes one
component of four adjacent cells at once.
Since the SPEs have no cache, double buffering is used to overlap communication with
computation and hide the memory latency between the PPE and the SPEs.
Result
Timing data for different box sizes are provided in Table 4.2; the last two rows give
the speed-up of 16 SPEs relative to the PPE alone and to a single SPE.
4.3.3 Nvidia GPU
Graphics Processing Units (GPUs) were originally developed for 3D graphics rendering,
but their naturally parallel architecture also suits high performance computing. Current
GPUs use unified shaders for rendering, and these shaders serve as the 'cores' for GPU
computing.
We use a Tesla C2050 in our tests. It has 448 cores at 1.15GHz (one FMA operation,
i.e. two flops, per clock cycle), partitioned into 14 streaming multiprocessors (SMs).
This gives a single-precision peak performance of 448 × 1.15 × 2 = 1030.4 Gflops [5].
There are three levels of memory: 3GB of GPU global memory; 64kB of on-chip memory per
SM, shared between shared memory and L1 cache in a configurable split of 48kB/16kB or
16kB/48kB; and 32768 32-bit registers per SM. The bandwidth between global memory and
on-chip memory is 144 GB/s, while the CPU and GPU are connected by PCI-e, with 8GB/s of
bandwidth.
For this architecture, we re-implement the MHD solver in the Compute Unified
Device Architecture (CUDA), a high-level C-like language that can program any Nvidia
GPU from the G80 generation onward.
Parallelization/Partition
On this architecture, the CPU initializes the work, assigns threads and memory,
and performs the necessary synchronization. To minimize the impact of the relatively low
PCI-e bandwidth, no further data transfer is performed after the initialized data has
been copied to GPU global memory. There remains, however, the long latency (several
hundred cycles) of fetching data from the card's global memory to the arithmetic units,
which must be hidden by oversubscribing the cores.
CUDA uses SIMT (Single Instruction, Multiple Thread): every thread in the same block
executes the same instruction at the same time. SIMT differs from SIMD in that the width
(the number of threads) is not fixed, which in turn affects the number of registers
(a fixed pool per SM) available to each thread.
In our implementation, each CUDA thread block is assigned one one-dimensional
pencil, and the corresponding data is copied into the block's shared memory. Each thread
within the block corresponds to one zone. Synchronization is provided within a block;
when synchronization among blocks is needed, we return all
blocks back to CPU control by ending the CUDA kernel.
To further reduce global-memory latency, we stagger the magnetic update: odd-indexed
blocks are updated first, then even-indexed ones. This avoids reading and writing
intermediate fluxes to global memory and yields roughly a 10% speed-up.
Finally, the CUDA SDK provides transpose examples, which we adapt for our purposes.
Because the memory transpose is three dimensional, the tile size is limited by the shared
memory per block. In our simulation, only 8³ grid points of a single component of the
fluid or magnetic field are transposed at a time; we found this the best balance between
shared-memory usage and transpose size.
Optimization
We can further improve the performance on this architecture by being aware of the
underlying memory architecture, and choosing block sizes to maximize occupancy.
Because of the size of the stencil, and the structure of the magnetic field update,
adjacent cells are needed for evolving any zone. Repeated access to global memory is
avoided by using shared memory in CUDA to cache the needed values. We did not use
constant, texture or pinned memory, as there is no large amount of ‘read-only’ data which
could benefit from being stored here.
The global memory accesses of the updates are automatically coalesced thanks to the
memory transposes, so no special work is needed here.
A further concern is occupancy: keeping each SM as fully occupied with thread
blocks as possible. Occupancy is the ratio of active warps to the maximum number of
warps an SM supports. Increasing occupancy does not directly guarantee good performance,
but low occupancy certainly hides memory latency poorly. Three factors affect occupancy:
threads per block, shared-memory usage, and register usage. Empirically, we found that
organizing thread blocks by pencils and assigning between 128 and 192 threads (and thus
zones) per block maximizes performance. Further improvement in occupancy is limited by
Table 4.3: x86 vs Nvidia GPU performance for different box sizes; timings in milliseconds.

                             Domain size
Architecture             16³     32³     64³     128³
x86 (1 core)             17.8    140     1096    8770
Nvidia (CUDA)            1.3     2.3     8.8     64
Nvidia (OpenCL)          1.5     2.5     9.3     65
Nvidia (CUDA, double)    1.9     3.7     17.9    136
Speedup (CUDA:x86)       13.1    61      125     137
register usage in the fluid evolution and shared-memory usage in the magnetic evolution.
Results and comparison with previous work
Timing data for different domain sizes are provided in Table 4.3. For sufficiently large
domains, we achieve a factor of 100 speed-up over a single x86 core. For the reader's
interest, we also ran the test in double precision on the Tesla C2050; the
double-precision performance is roughly half the single-precision performance, the same
ratio as Nvidia's quoted peak rates.
For this architecture, other work can be used to gauge the efficiency of our
implementation. Two other groups ([94], [81]) have used CUDA to implement Pen's [69]
TVD code for MHD or pure hydrodynamics. In [94], CUDA was used for MHD, achieving a
speed-up of 84 times in 3D on a GTX 295 (480 cores) over an Intel Core i7 965 at
3.20GHz. Our speed-up of 137 with a Tesla C2050 over a Xeon E5506 at 2.13GHz is
comparable.
In [81], a relaxing TVD scheme was used for three-dimensional hydrodynamics, together
with adaptive mesh refinement (AMR) and a multi-level relaxation scheme, running on a
multi-GPU cluster. Since this setup differs significantly from ours, no direct
comparison is presented here; they report a speed-up of 12.19 for one GPU.
4.3.4 ATI GPU
The ATI GPU uses superscalar shader cores, a variation on SIMD. Each superscalar unit
contains one 4D vector unit and one 1D scalar unit, so in a single cycle it can issue
one 4D operation and one 1D operation. To compensate for its weaker scalar performance,
many more cores are placed on the chip; the Radeon HD 5800 series, for example, has
1600 cores.
We used an ATI HD 5870 for our simulation. Its 1600 shader cores at 0.85 GHz are
organized into 20 SIMD units of 80 cores each. A 4D+1D core can perform two
single-precision operations per clock cycle, giving the HD 5870 a single-precision peak
performance of 1600 × 0.85 × 2 = 2720 Gflops [6]. There are three levels of memory:
1GB of GPU global memory, 32kB of shared memory per block (i.e. SIMD unit), and 16384
128-bit registers per block. The bandwidth between global memory and on-chip memory is
153.6 GB/s, while the CPU-GPU connection is PCI-e, as for the Nvidia card.
For this architecture, we re-implement the MHD solver using OpenCL.
Parallelization/Partition/Optimization
The parallelization is similar to that for the Nvidia GPU, except that we vectorize
the code to obtain maximum performance. ATI GPUs contain both SIMD and SIMT units, and
each thread operates on its own data, so cross-grid calculation is not possible. We
therefore use an array-of-structures (AOS) layout, in contrast to the SOA layout used on
Cell: the first four of the five fluid components are stored as a 'float4' and the last
as a 'float', while the magnetic components are packaged as a 'float4' with the fourth
element unused. The memory transpose uses two subroutines, one for the first four fluid
components and one for the fifth fluid component together with the magnetic components.
Otherwise, there are few differences in parallelization and optimization between CUDA
and OpenCL.
Result
Table 4.4: x86 vs ATI GPU performance for different box sizes; timings in milliseconds.

                          Domain size
Architecture          16³     32³     64³     128³
x86 (1 core)          17.8    140     1096    8770
ATI GPU               10      26      37      128
Speedup (ATI:x86)     1.78    5.4     29.6    68.5
Timing data for different box sizes are provided in Table 4.4.
4.4 Comparative Results and Discussion
4.4.1 Results
We compare results across architectures using four criteria:

1. Code speed-up: the speed-up on the heterogeneous architecture relative to a single
x86 core;

2. Fractional speed-up: the ratio of the code speed-up (heterogeneous to single-core
x86) to the theoretical peak-performance ratio (heterogeneous to single-core x86);

3. Floating-point operations per second (FLOPS) fraction: the ratio of achieved FLOPS
to the theoretical peak for each architecture;

4. Bandwidth fraction: the ratio of achieved data transfer (reads plus writes) to the
theoretical on-chip bandwidth.
All these values are measured with the respective native language for each platform:
OpenMP for multi-core x86, the Cell SDK for the QS22, CUDA for the Nvidia GPU, and
OpenCL for ATI. OpenCL results are also provided as a cross-architecture reference.
We count the total number of operations in one time step of our FORTRAN version,
including the CFL, fluid, and magnetic updates. For a single cell in one time step
(ignoring O(n²) terms) there are 466 additions, 598 subtractions, 1174 multiplications,
125 divisions, and 3 square roots. Since divisions and square roots are a small fraction
of the total, we follow [67] and count each as 1 flop for simplicity. The FORTRAN code
therefore performs 4.62 Giga floating-point operations per time step for a 128³ box,
where one time step comprises 1 CFL function, 6 fluid updates, and 6 magnetic updates.
Combining the code run times with this value, one can calculate the achieved FLOPS on
each architecture.
We similarly count the total data read and written in one time step of the FORTRAN
version, including the CFL, fluid, and magnetic updates and the memory transposes. In
single precision, a single cell in one time step incurs 11 float reads in the CFL
function, 10 float reads and 5 float writes in each fluid update, 14 float reads and
6 float writes in each magnetic update, and 8 float reads and 8 float writes in each
memory transpose. The FORTRAN code therefore moves 2.23 GBytes per time step for a 128³
box (1.46 GBytes read and 0.77 GBytes written), where one time step comprises 1 CFL
function, 6 fluid and 6 magnetic updates, and 4 memory transposes. Combining the code
run times with this value, one can calculate the achieved bandwidth on each architecture.
Table 4.5 presents the comparison across architectures for a box size of 128³, for
both the respective native language and OpenCL. Code speed-up, fractional speed-up, and
FLOPS fraction are included, along with the theoretical single-precision peak
performance, memory bandwidth, and our measured power consumption in watts. No OpenCL
data are given for a single x86 core or for Cell: OpenCL treats a multi-core x86 as a
heterogeneous system and uses all available compute units, and our OpenCL code does not
yet run on the Cell cluster, possibly due to limitations of the beta release of OpenCL
on Cell. Power usage for a single core is not available because the Xeon is a multi-core
processor.
It can be seen that CUDA on the Nvidia GPU achieves the best result in both code and
fractional speed-up.
Table 4.5: Performance comparison for different architectures; timings in milliseconds.
N-GPU is the Nvidia Fermi (Tesla C2050); A-GPU is the ATI HD 5870; peak Gflops is the
theoretical peak single-precision floating-point performance; peak GB/s is the
theoretical on-chip bandwidth.

Architecture          x86(1)   x86(8)   Cell    N-GPU   A-GPU
Respective time       8770     1315     864     64      128
OpenCL time           N/A      6435     N/A     65      128
Peak Gflops           17       136      409.6   1030    2720
Peak GB/s             19.2     19.2     204.8   144     153.6
Power (Watts)         N/A      170      440     550     360
Code speed-up         1.0      6.7      10.2    137     68.5
Fractional speed-up   1.0      0.83     0.42    2.0     0.43
FLOPS fraction        3.1%     2.6%     1.3%    7.0%    1.3%
Bandwidth fraction    1.3%     8.8%     1.3%    24.2%   11.3%
4.4.2 Discussion
Speed-up and fractional parameters on different architectures:
The code speed-up quantifies the total performance gain on each architecture. The
fractional speed-up additionally accounts for the theoretical peak-performance ratio,
and thus reflects the programmer's optimization effort relative to the original code.
The FLOPS fraction tells us how much of the peak FLOPS is achieved, and the bandwidth
fraction what percentage of the peak bandwidth the code uses; comparing the two
fractions indicates which resource is the performance bottleneck. Our results indicate
that CUDA on the Nvidia GPU is a good starting point for heterogeneous computing: it
reaches up to 137 times code speed-up and a 2.0 fractional speed-up¹, and it also has
the highest FLOPS and bandwidth fractions, showing that it uses its compute capability
and bandwidth efficiently.
More detail on CUDA and OpenCL on GPUs:
The Nvidia GPU has scalar shader cores and high computational efficiency. The ATI GPU
has many more cores, giving it a much higher peak floating-point rate, but lower
computational efficiency; this may reflect the difficulty of mapping the algorithm
efficiently onto the 4D+1D vector core design. Since OpenCL also runs on the Nvidia GPU,
we make some further comparisons.
The comparison of CUDA on Nvidia, OpenCL on Nvidia, and OpenCL on ATI is shown in
Figure 4.1. The X axis is the box side length and the Y axis the time for one time step
in milliseconds, both on log scales. CUDA on Nvidia performs well at small box sizes,
OpenCL on ATI catches up to CUDA as the box size grows, and OpenCL on Nvidia performs
almost the same as CUDA on Nvidia. We did not simulate box sizes larger
1. These numbers depend in part on the programmer's optimization effort; we also note
that neither the CPU code nor the CUDA code is fully vectorized.
than 144³ due to the memory limit on the ATI card.
Summary of the code
The code is a three-dimensional finite-difference, finite-volume code. Dimensional
splitting reduces the three-dimensional problem to a sequence of one-dimensional
updates. With the help of matrix transposes, each dimension can be updated separately
with linear memory access; this built-in memory coalescing is very helpful for
heterogeneous computing. Each one-dimensional update fits into one block, and with SIMT
on the GPU this yields a large speed-up. Any cell in a one-dimensional update depends
only on its neighbouring cells, and this simple dependency simplifies the algorithm. The
code is memory bound, and shared memory is used to avoid redundant data fetching.
Insights for the programmer
We provide several insights here for potential heterogeneous programmers:
• Memory management: the control-compute structure of a heterogeneous system comes
with a correspondingly complicated three-level (global-shared-local) memory hierarchy.
Managing it well is crucial for performance: hide the latency of on-chip bandwidth
(e.g. double or multiple buffering on Cell, more active warps on GPUs), use shared
memory to avoid redundant data fetching, and coalesce memory accesses for efficient
reading.
• Maximize computation on the GPU: a 100x speed-up was achieved in our simulation,
but note that almost all computation stays on the GPU, so the low PCI-e bandwidth does
not affect the result. Indeed, our preliminary MPI + CUDA results show that CPU and
PCI-e communication reduces the speed considerably.
• Programming effort: in the author's view, CUDA is the simplest of the three
programming environments, the Cell SDK the most difficult, and OpenCL in between but
quite similar to CUDA. Considering that cross-platform OpenCL
Figure 4.1: Time vs. box size for the GPU comparison (log-log). The X axis is the box
side length; the Y axis is the time for one time step, in milliseconds. Dotted diamonds:
OpenCL on ATI; dashed circles: OpenCL on Nvidia; dash-dotted asterisks: CUDA on Nvidia;
the solid line is a linear fit with slope 3.
performs almost the same as CUDA on the Nvidia GPU, it may be a good idea to start with
CUDA and then port to OpenCL.
Future work
• Cell: only SIMD and double buffering are included in our implementation; more could
be done to exploit the power of the Cell/B.E.
• Nvidia GPU: the register restriction on the fluid update and the shared-memory
restriction on the magnetic update limit occupancy; reorganizing these algorithms might
speed up the code.
• ATI GPU: the ATI SIMD units suffer from low efficiency in vectorized computation;
we will investigate further to exploit the 2.7 Tflops peak of the ATI GPU.
• MPI: we will develop an MPI version of the code for use on GPU clusters.
4.5 Summary
We presented magneto-hydrodynamics simulations on heterogeneous systems: the Cell/B.E.
and Nvidia and ATI GPUs. These systems share a similar structure: a control processor
for task management and many compute-intensive processors for calculation.
Correspondingly, the memory system is complicated, which is a challenge for programmers.
We presented results on the different architectures for comparison; speed-ups of 10, 137,
and 68 times were achieved for Cell, the Nvidia GPU, and the ATI GPU respectively. CUDA
on the Nvidia GPU has the best performance in both code and fractional speed-up, and the
ATI GPU improves with larger simulation sizes. Notably, CUDA and OpenCL perform
similarly on the Nvidia GPU. The 2.0 fractional speed-up for CUDA on the Nvidia GPU
shows that a greater percentage of theoretical peak performance was achieved than on the
x86 architecture.
These performance numbers were obtained with an algorithm translated directly from a
CPU code; designing algorithms with heterogeneous architectures in mind may improve
performance further.
Chapter 5
Conclusion
Black hole accretion Here, we present several new, large-dynamical-range MHD
simulations of black hole accretion in the Galactic Center. This is the first
three-dimensional large-scale MHD simulation that does not encounter problems with the
outer boundary and runs long enough to achieve a stable state. The simulation is
designed to help explain the low luminosity of the supermassive black hole, and it
confirms the class of magnetically frustrated accretion flows. Multiple physical and
numerical parameters are tested, including the magnetic field strength, rotation, the
ratio of the Bondi radius to the inner boundary radius, and resolution. An extrapolation
formula for the accretion rate based on these free parameters is proposed, which appears
consistent with the observational data. The accretion rate is very small, the density
slope is around −1, and the accretion flow is subsonic with no outward flux or
rotational support.
The rotation measure (RM) is an efficient tool for exploring the characteristics of the
accretion flow; its value is closely tied to the radius at which the electrons become
relativistic. We argue that this radius varies from 40 to 250 Schwarzschild radii
between the thermal and adiabatic limits, so more observations are needed to determine
it. We also propose that the variability of the RM can be an effective constraint
Figure 5.1: Atacama Large Millimeter/Submillimeter Array (ALMA). ALMA has much
higher sensitivity and higher resolution compared with current sub-millimeter telescopes.
Image courtesy ALMA (ESO/NAOJ/NRAO).
for the accretion models. Our subsonic, magnetically frustrated accretion flow without
rotational support implies a slowly varying RM, with a time scale of months to a year in
the simulations. By contrast, the models from ADAF to CDAF through ADIOS all involve
fast-flowing plasma, and some also have rotational support, implying rapid RM variation
on time scales of hours to weeks. To measure the RM variability accurately, a time
series of data points is needed.
Several groups have already detected RM values, using the Submillimeter Array [56] and
the Berkeley-Illinois-Maryland Association (BIMA) array [55; 21]. Furthermore, the
Atacama Large Millimeter/submillimeter Array (ALMA), with much higher sensitivity and
resolution, will come into operation in 2012 (Figure 5.1). This development will be
extremely helpful for RM observations: if a slowly varying RM is confirmed, our model
can be distinguished from the others.
Fast magnetic reconnection Here, we present the first global three-dimensional ideal
MHD simulation of fast magnetic reconnection. Fast reconnection is an intrinsically
three-dimensional effect (Figure 5.2)¹, in contrast to the two-dimensional picture of
Sweet-Parker
1. A three-dimensional movie of the evolution of the magnetic field lines can be seen at
http://www.cita.utoronto.ca/~bpang/mhd_simulation_in_astrophysics/long_term_01boxV1rK2.wmv
Figure 5.2: The three-dimensional simulation box for fast magnetic reconnection. Fast
reconnection is a three-dimensional effect, and the global geometry determines the
reconnection.
and Petschek's models. Fast reconnection is determined by the global geometry rather
than by the micro-physics of the X-point in Petschek's solution, and it does not rely on
specific boundary conditions, external driving, or anomalous resistivity. This
Petschek-like reconnection is self-contained and generic: about 30% of the magnetic
energy is released in one Alfven time, which qualifies as fast reconnection.
The reconnection is initiated by a strong localized perturbation to the field lines in
a periodic box, and the two reconnection regions interact with each other, helping the
reconnection occur rapidly. We conclude that Petschek-like X-point reconnection is not
determined by the microphysics at the infinitesimal boundary, where no energy resides,
but by the global flow far from the X-point. In addition, we simulated two-dimensional
reconnection and found no fast reconnection, supporting the conclusion that fast
reconnection is indeed a three-dimensional
effect.
Accelerate MHD We present the first and broadest speed comparison of MHD simulation
across heterogeneous systems. By porting the FORTRAN MHD code to multi-core x86, Cell,
and Nvidia and ATI GPUs, we show that the Nvidia GPU performs best in both code and
fractional speed-up; in fact, more than a 100-fold speed-up is achieved.
The code speed-ups on the different architectures are: 6.7 for multi-core x86, 10 for
Cell, 137 for the Nvidia GPU, and 68 for the ATI GPU. Taking theoretical peak
performance into account, the fractional speed-ups are: 0.83 for multi-core x86, 0.42
for Cell, 2.0 for the Nvidia GPU, and 0.43 for the ATI GPU.
The Nvidia GPU achieves the best performance, with a factor of 100 speed-up, and its
fractional speed-up of 2.0 shows that a larger percentage of theoretical peak
performance was reached than on x86. The ATI GPU also achieves nearly a 70-fold
speed-up, but given its large claimed theoretical peak, its fractional speed-up of 0.43
is poor. Cell achieves a 10-fold speed-up, more than the 6.7 from OpenMP on x86;
however, porting a program to Cell may not be worthwhile given the cost of a Cell blade
and the difficulty of programming with the Cell SDK. CUDA and OpenCL perform equally
well on the Nvidia GPU, suggesting it may be a good idea to program in OpenCL to exploit
its cross-platform portability.
As our work shows, a heterogeneous system usually consists of one controlling processor
that manages task assignment and many computing processors that perform the heavy
parallel calculation; this is what enables the heterogeneous platform to accelerate.
This structure requires an equally complicated memory system: the main data stay in the
controlling processor's memory and are transferred to local memory for calculation.
Because local memory is limited, the calculations must be organized carefully to avoid a
communication bottleneck. The key to programming heterogeneous systems is to keep
parallelism in mind at all times; unlike a parallel job on an ordinary computer cluster,
one must now partition both the data and the algorithms across the computing cores on
the platform.
Future of MHD simulation In this thesis we have presented examples of simulations in
scientific computing. Using supercomputers, we attempt to reproduce physical phenomena
and propose physical explanations. A great number of problems were solved in the
process, but new complications arose.
A common concern is the accuracy of the simulations: how closely a simulation can model
reality, and whether the results are adequate. At the same time, simulations are
expected to deliver results as quickly as possible.
Our MHD simulations are an instructive example. We have a fast MHD code and extremely
powerful supercomputers, yet to model the black hole accretion with a 300³ deformed-mesh
grid box, which achieves an effective box size of 4000³, at least three weeks are needed
on a 216-CPU parallel job to obtain a long-term, stable result. Even this simulation
only reaches a ratio RB/Rin of 100, while the real-world ratio is about 10000, so the
result cannot fully represent the real dynamics. Although we obtained an expression for
the density slope from our simulation results, and the formula fits the observational
data very well, it is based on extrapolation². An alternative would be to enlarge the
ratio toward the real value, but the update time step is inversely proportional to the
grid size; the time step would become so small that running such a large simulation to a
stable state would be almost impossible. We are left with no choice but to run the
feasible simulation and accept results that are not completely

2. Extrapolation over large scales is very common in astrophysics, even though it is not
entirely accurate.
satisfactory.
Similar situations occur in other fields, prompting questions such as: is the assumed
mathematical model correct? Is the algorithm suitable for the problem? Is the
discretization fine enough to represent the problem? Increasing the accuracy can greatly
increase the cost of calculation: more complicated models, higher-order algorithms, and
finer grids all require more computation.
Here we reach an impasse: it appears very difficult to achieve adequate results on a
satisfying time scale. Is a solution possible?
The solution is actually quite simple: increase the speed of the simulation, so that a
more complicated calculation can be completed in an affordable time. Five to ten years
ago, two-dimensional simulations were mainstream; now, more complicated
three-dimensional simulations can be performed easily, thanks to growing computing
power and hence simulation speed.
Here we have provided examples of fast computing through parallelization on a
supercomputer, and demonstrated an impressive speed-up using heterogeneous systems. We
can foresee an even greater speed-up from combining both: parallel computing on
heterogeneous systems, comprising host nodes, GPUs, and interconnects. The idea is not
new; five years ago, scientists at the State University of New York [31] built a 30-node
GPU cluster and sped up their simulation by 4.6 times compared with a traditional CPU
cluster, even though their GPUs were far less powerful and could only be programmed in
Cg [7]. GPUs have become much more powerful in a relatively short time: Nvidia has
developed GPUs focused on high performance computing, e.g. Tesla and Fermi. The Nvidia
GPU roadmap is shown in Figure 5.3; the projected double-precision performance will
double in the next two years and quadruple by 2013. Nvidia has also developed CUDA,
designed for data-parallel programming, which increases the
Figure 5.3: Roadmap for Nvidia GPUs. DP denotes double precision; FLOPS (floating-point
operations per second) is a measure of computing performance. The X axis is time; the Y
axis is computing performance. Tesla, Fermi, Kepler, and Maxwell are the family names of
successive generations of Nvidia GPUs.
ease of programming.
Presently in Ontario, SHARCNET [8], one of seven high-performance computing consortia in
Canada, has a GPU cluster containing 11 Nvidia Tesla S1070 GPU servers with a
single-precision peak performance of over 40 Tflops. Each GPU server has 4 GPUs and 16GB
of global memory and is connected to 2 HP DL160G5 CPU servers; the CPU-CPU connections
are via 4X DDR InfiniBand. CITA is also seeking to acquire a new GPU cluster.
Meanwhile, we have some preliminary results on GPU clusters. Using our original FORTRAN
MPI code, with the magnetic and fluid update subroutines replaced by CUDA versions, we
ran the simulation on a mini GPU cluster of 2 Tesla C1060 GPU nodes and obtained a
speed-up of 3.4 compared with OpenMP on a conventional CPU cluster. GPU clusters offer
further benefits over traditional CPU clusters: GPUs are cheaper, occupy less space, and
consume less power, which also lowers cooling demands.
However, these techniques are not yet mature and many problems remain. The most
pertinent issue is communication, which took more than half of the running time in our
GPU cluster test. Increasing the efficiency of communication would not only raise the
speed but also lower the power consumption: [47] showed that global memory accesses
(communication) consume more power than on-chip register or shared-memory accesses.
Much like increasing the speed in computer simulation, the simple solution for parallel
heterogeneous computing is to increase the efficiency of the communication.
There is a multitude of methods for addressing this problem, and they are proving successful. In a GPU cluster, communication involves both GPU-CPU and CPU-CPU connections. To increase its efficiency, programmers need to minimize the I/O between the host and the GPU, in order to keep the GPUs computing. Efficient hardware design is also needed: for example, InfiniBand [9] between CPUs, and better connections between GPU and CPU. PCI-e 3.0, due at the end of 2010, doubles the bandwidth of the current PCI-e 2.0, which will improve the communication speed.
Here we make a simple, rough estimate for the future CITA GPU cluster. The expected cluster will have 360 Fermi family GPUs from Nvidia. Each Fermi delivers about 1 Tflops of computing power, which translates into a theoretical peak performance of about 0.4 Pflops. However, compared with a traditional CPU cluster, it costs only one tenth as much and consumes one twentieth of the power, which is very attractive. Based on the results of our preliminary test and the improved communication from PCI-e 3.0,
this GPU cluster should achieve a 10-20 times speed-up over a CPU cluster on a per-node basis (assuming a one-to-one CPU-GPU structure for the GPU cluster). This means that, running the same program on the same number of nodes, we can expect at least a factor of 10 speed-up at lower cost and power consumption. For example, it took about three weeks to finish the black hole accretion run using 27 nodes of the CPU cluster, but it would require only about two days on a GPU cluster with the same number of nodes. We will be
able to try larger simulations; for example, RB/Rin = 1000 would reach a satisfyingly stable result in three to four months with only 27 nodes on the GPU cluster, whereas five or more years would be needed for 27 nodes on a CPU cluster. If all 360 GPU nodes could be used, we would be able to simulate the real black hole (RB/Rin = 10000) in about three years. This may appear long, but it is at least within the realm of possibility.
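The arithmetic behind these projections is elementary; the following sketch simply rechecks the figures quoted above (the 10x speed-up is the conservative end of our estimated range, and all inputs are the estimates stated in the text):

```python
# Back-of-envelope check of the GPU cluster projections quoted above.
n_gpus = 360
tflops_per_fermi = 1.0                  # ~1 Tflops per Fermi GPU
peak_pflops = n_gpus * tflops_per_fermi / 1000.0
assert abs(peak_pflops - 0.36) < 1e-12  # quoted as roughly 0.4 Pflops

speedup = 10                            # conservative end of the 10-20x range
cpu_run_days = 3 * 7                    # three weeks on 27 CPU nodes
gpu_run_days = cpu_run_days / speedup
assert abs(gpu_run_days - 2.1) < 1e-12  # "about two days" on 27 GPU nodes
```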
As we approached the completion of this thesis, we became aware that a GPU supercomputer, Tianhe-1, officially became the fastest supercomputer in the world on October 28, 2010. Its 2.5 Pflops performance surpassed the former number one CPU supercomputer (Jaguar) by 40 percent, securing first place in the coming TOP500 list. This supercomputer contains over 7,000 Fermi GPUs, a very good example supporting our view that great speed-ups will be achieved by parallel computing on heterogeneous systems.
We have witnessed the development of computer simulations, from the realization of simple to complicated mathematical models. We have witnessed the development of conventional computer clusters, from several linked office computers to the former number one conventional supercomputer, Jaguar, with a theoretical peak performance of 1.75 Pflops. We have also witnessed the development of heterogeneous systems, including the hardware (for example, Nvidia Fermi and ATI HD 5870) and the software (for example, the programming languages CUDA and OpenCL; subroutine and template libraries such as CUBLAS and CUFFT; and compiler-based approaches such as PyCUDA and RapidMind). Most recently, we witnessed the success of Tianhe-1, a GPU cluster that surpassed all conventional CPU supercomputers.

For all of these reasons, we believe that parallel heterogeneous computing will play a much more important role in scientific research in the future.
Summary. We present three-dimensional, large scale magnetohydrodynamics simulations of black hole accretion in the Galactic centre. Our simulations are significant because they are free of outer boundary problems and reach a stable state over long runs. The subsonic, magnetically frustrated accretion flow predicts a low, slowly variable rotation measure. Furthermore, we present the first fast magnetic reconnection in three-dimensional ideal magnetohydrodynamics simulations: the reconnection is an intrinsically three-dimensional effect, and about 30% of the magnetic energy is released in one Alfven time. Finally, we present the first and widest speed comparison of magnetohydrodynamics simulations across heterogeneous systems; CUDA on Nvidia GPUs performs best, achieving more than a one-hundred-fold speed-up.
Appendix A
Rotation measure constraint on
accretion flow
In traversing the accretion flow, linearly polarized radio waves of wavelength λ are rotated by RM λ² radians, where
\[
\mathrm{RM} = \frac{e^3}{2\pi m_e^2 c^4} \int n_e\, f(T_e)\, B \cos\theta \, dl . \tag{A.1}
\]
Here f(T_e) is a ratio of modified Bessel functions, f(T_e) = K_0(m_e c²/k_B T_e)/K_2(m_e c²/k_B T_e) [86], which suppresses RM by a factor ∝ T_e^{-2} wherever electrons are relativistic. The integral here covers the entire path from source to observer; θ is the angle between B and the line of sight. This expression is appropriate for the frequencies at which RM has been observed; at lower frequencies, where propagation is "superadiabatic" [23], cos θ → ±|cos θ|.
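The limiting behaviour of f(T_e) is easy to verify numerically. A minimal sketch, assuming SciPy's integer-order modified Bessel functions are available (θ_e below denotes the dimensionless temperature k_B T_e/m_e c², our notation):

```python
from scipy.special import kn  # modified Bessel functions of the second kind, K_n

def f_suppression(theta_e):
    """f(T_e) = K0(1/theta_e) / K2(1/theta_e), theta_e = k_B T_e / (m_e c^2)."""
    x = 1.0 / theta_e
    return kn(0, x) / kn(2, x)

# Nonrelativistic electrons (theta_e << 1): essentially no suppression.
assert 0.8 < f_suppression(0.05) < 1.0
# Relativistic electrons (theta_e >> 1): strong suppression of the RM.
assert f_suppression(10.0) < 0.05
```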
We adopt a power-law solution with negligible rotational support in which ρ ∝ r^{-k}, and the total pressure P ∝ r^{-k_P} with k_P = k + 1; moreover we take T_e ∝ r^{-k_T} for the relativistic electrons. The hydrostatic equation dP/dr = -GMρ/r² becomes
\[
P = P_g + P_B = \frac{GM}{(k+1)} \frac{\rho}{r} , \tag{A.2}
\]
and with P_g = βP_B = βB²/(8π) and ρ = n_e μ_e (where μ_e = 1.2 m_p is the mass per electron),
\[
B = \left[ \frac{8\pi}{(\beta+1)(k+1)} \frac{GM \mu_e n_e}{r} \right]^{1/2} . \tag{A.3}
\]
So long as k > 1/3 (so that RM converges at large radii) and k < (1 + 4k_T)/3 (so that it converges inward as well), the RM integral is dominated by radii around R_rel. Taking a radial line of sight (dl → dr), we write
\[
\int_0^\infty n_e f(T_e) B \cos\theta \, dr
= F(k,k_T) \int_{R_{\rm rel}}^\infty n_e B \cos\theta \, dr
= \frac{2}{3k-1}\, \langle\cos\theta\rangle\, F(k,k_T)\, \left[ n_e B r \right]_{R_{\rm rel}} , \tag{A.4}
\]
where ⟨cos θ⟩ encapsulates the difference between the true integral and what it would have been if θ = 0 all along the path, and F(k, k_T) encapsulates the difference between a smooth cutoff and a sharp one. We plot F(k, k_T) in Figure A.1; it is of order unity except as k_T approaches (3k - 1)/4. All together,
\[
\mathrm{RM} = \frac{4 e^3 G M}{m_e^2 c^5}\,
\frac{\langle\cos\theta\rangle\, F(k,k_T)}{3k-1}
\left[ \frac{\mu_e\, n_e(R_{\rm rel})^3}{\pi (k+1)(\beta+1)}\,
\frac{R_{\rm rel}}{R_S} \right]^{1/2} . \tag{A.5}
\]
To estimate n_e(R_rel) from RM, one must make assumptions about the uncertain parameters β, ⟨cos θ⟩, k_T, and R_rel/R_S; then k can be derived self-consistently from the observed n_e(R_B) and RM. Our fiducial values of these parameters are 10, 0.5, 0.5, and 100, respectively, of which we consider the last to be the most uncertain. We now discuss each in turn.
Although the magnetization parameter β could conceivably take a very wide range
of values, we consistently find β ≃ 10 in our simulations, with some tendency for β to
decrease inward. We consider it unlikely for the flow to be much less magnetized, given
the magnetization of the galactic center and the fact that weak fields are enhanced in
most of the flow models under consideration.
Figure A.1: The logarithm of the relativistic RM factor, log10 F(k, k_T). The true RM integral is modified by a factor F(k, k_T) relative to an estimate in which the nonrelativistic formula is used but the inner bound of integration is set to the radius R_rel at which electrons become relativistic; see equation (A.1).

If B wanders little in the region where the integrand is large (a zone of width ∼ R_rel around R_rel), and is randomly oriented relative to the line of sight, then ⟨cos θ⟩ ≃ cos θ(R_rel), typically 1/2 in absolute value. If the field were purely radial, ⟨cos θ⟩ would be unity.
Conversely, if B reverses frequently in this region (the number of reversals N_r is large), then ⟨cos θ⟩_rms ≃ 1/(2√(N_r + 1)) will be small. However, N_r cannot be too large, or magnetic forces are unbalanced. We gauge its maximum value by equating the square of the buoyant growth rate, N² = [(3 - 2k)/5] GM/r³, against the square of the Alfven frequency, N_r² v_A²/r². Noting that v_A² = GM/[2(β + 1)(k + 1)r], we find N_r² ≃ (2/5)(β + 1)(k + 1)(3 - 2k). For β = 10 and k = 1 this implies ⟨cos θ⟩_rms ≃ 0.25: a very minor suppression. We can therefore be confident that ⟨cos θ⟩ = 0.5 to within a factor of 2, unless β ≫ 10 for some reason.
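As a sanity check, this estimate is simple enough to evaluate directly (a sketch; the variable names are ours):

```python
import math

# Maximum number of field reversals before magnetic forces are unbalanced:
# N_r^2 ~ (2/5)(beta + 1)(k + 1)(3 - 2k), at the fiducial beta = 10, k = 1.
beta, k = 10.0, 1.0
Nr2 = 0.4 * (beta + 1.0) * (k + 1.0) * (3.0 - 2.0 * k)
# RMS suppression of the line-of-sight factor: <cos θ>_rms ~ 1/(2 sqrt(N_r + 1))
cos_rms = 1.0 / (2.0 * math.sqrt(math.sqrt(Nr2) + 1.0))

assert abs(Nr2 - 8.8) < 1e-9     # N_r ~ 3 reversals at most
assert abs(cos_rms - 0.25) < 0.01  # a very minor suppression
```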
The precise value of k_T is not important unless it approaches or falls below the minimum value (3k - 1)/4. If electron conduction is very strong this is unavoidable, as rapid transport implies k_T ≃ 0; however, in this case the relativistic region disappears, as discussed below. Alternately, if relativistic electrons are trapped and adiabatic, T_e ∝ ρ^{1/3} and k_T = k/3; avoiding k_T < (3k - 1)/4 then requires k ≤ 3/5, which can only be realized within the CDAF model. Finally, if electrons remain strongly coupled to ions, k_T = 1 and we only require k < 5/3.
The location at which electrons become relativistic, R_rel, is quite uncertain. Models such as those of [95], in which electrons are heated while advecting inward, predict R_rel ≃ 10² R_S. The maximum conceivable R_rel corresponds to adiabatic compression of the electrons, inward from the radius at which they decouple from ions; this yields about 500/(1 + k) R_S. If conduction is very strong, however, electrons should remain cold throughout the flow; in this case we should replace R_rel/R_S → 1 and F(k, k_T) → 1 in equation (A.5).
Adopting our fiducial values for the other variables, and taking F(k, k_T) → 1 for lack of knowledge regarding k_T, we may solve for the self-consistent value of k that connects the density at R_B with the n_e(R_rel) derived from equation (A.5). We find k → (0.90, 1.23, 1.32) for R_rel/R_S → (200, 100, 1), respectively. As noted in the text, the current small set of RM measurements allows a two order of magnitude range in RM_est, and k ∼ 1 is consistent with the data. Longer monitoring of the RM time variability and amplitude will improve the constraints.
Appendix B
Inner boundary conditions
The inner boundary conditions were determined by first solving for the vacuum solution
of the magnetic field inside the entire inner boundary cube. Then inside the largest
possible sphere within this cube, matter and energy were removed.
To simplify the programming, we put the entire inner boundary region on one node.
This meant that the grid had to be divided over an odd number of nodes in each Cartesian
direction.
B.0.1 Magnetic field
In order to determine the vacuum magnetic field solution, we use the following two
Maxwell equations for zero current:
\[
\nabla \cdot \mathbf{B} = 0 , \tag{B.1}
\]
\[
\nabla \times \mathbf{B} = 0 . \tag{B.2}
\]
Equation (B.2) enables us to write B = ∇φ for some scalar function φ. Combining this with (B.1), we obtain Laplace's equation
\[
\nabla^2 \phi = 0 , \tag{B.3}
\]
which we solve with Neumann boundary conditions (the normal derivative n · ∇φ speci-
fied) given by B · n on the boundary of the cube.
Since the MHD code stores the values of B on the left-hand cell faces, we must solve
for φ in cell centers and then take derivatives to get the value of B on the cell boundary.
Let the inner boundary cube be of side length N , consisting of cells numbered 1, . . . , N
in all three directions. In order to simplify the problem we set B · n = 0 on five of the
six faces of the cube, and find the contribution to φ from one face at a time.
Suppose B · n = 0 on all of the faces except the i = N + 1 face (i.e., B_x^{N+1,j,k} can be non-zero). The Laplace equation (B.3) with Neumann boundary conditions only has a solution if the net flux of field into the cube is zero. Since all of the boundary conditions are zero except on the i = N + 1 face, that face must have zero net flux through it. Defining
\[
\bar{B}_x^{N+1} = \frac{1}{N^2} \sum_{j=1}^{N} \sum_{k=1}^{N} B_x^{N+1,j,k} \tag{B.4}
\]
to be the average of B_x on the i = N + 1 face, and letting b_x^{N+1,j,k} = B_x^{N+1,j,k} - \bar{B}_x^{N+1}, we can use b_x^{N+1,j,k} as the boundary condition and add \bar{B}_x^{N+1} back in later.
We use separation of variables to solve for φ. Set φ^{ijk} = X^i Y^j Z^k, substitute into (B.3), and rearrange to get
\[
\frac{X^{i+1} - 2X^i + X^{i-1}}{X^i}
+ \frac{Y^{j+1} - 2Y^j + Y^{j-1}}{Y^j}
+ \frac{Z^{k+1} - 2Z^k + Z^{k-1}}{Z^k} = 0 . \tag{B.5}
\]
Now let
\[
\frac{Y^{j+1} - 2Y^j + Y^{j-1}}{Y^j} = -\eta^2 , \quad \text{and} \tag{B.6}
\]
\[
\frac{Z^{k+1} - 2Z^k + Z^{k-1}}{Z^k} = -\omega^2 . \tag{B.7}
\]
Solving equations (B.6) and (B.7) with the boundary conditions,
\[
Y_m^j = \cos\frac{m\pi(j - \frac{1}{2})}{N} , \quad
Z_n^k = \cos\frac{n\pi(k - \frac{1}{2})}{N} , \tag{B.8}
\]
\[
\eta_m^2 = 4\sin^2\frac{m\pi}{2N} , \quad
\omega_n^2 = 4\sin^2\frac{n\pi}{2N} . \tag{B.9}
\]
Substituting (B.6), (B.7), and (B.9) into (B.5) and solving yields
\[
X_{mn}^i = \cosh\frac{\alpha_{mn}\pi(i - \frac{1}{2})}{N} , \tag{B.10}
\]
where
\[
\alpha_{mn} = \frac{2N}{\pi}\,\mathrm{arcsinh}\sqrt{\sin^2\frac{n\pi}{2N} + \sin^2\frac{m\pi}{2N}} . \tag{B.11}
\]
Finally, putting this all together,
\[
\phi^{ijk} = \sum_{m=0}^{N-1} \sum_{n=0}^{N-1} A_{mn}
\cos\frac{m\pi(j - \frac{1}{2})}{N}
\cos\frac{n\pi(k - \frac{1}{2})}{N}
\cosh\frac{\alpha_{mn}\pi(i - \frac{1}{2})}{N} , \tag{B.12}
\]
where we define A_{00} = 0.
To determine the coefficients A_{mn} we add in the final boundary condition (i = N + 1), and get
\[
A_{mn} = \frac{4}{N^2}\,
\frac{1}{2\sinh(\alpha_{mn}\pi)\sinh(\alpha_{mn}\pi/2N)}
\sum_{j=1}^{N} \sum_{k=1}^{N} b_x^{N+1,j,k}
\cos\frac{m\pi(j - \frac{1}{2})}{N}
\cos\frac{n\pi(k - \frac{1}{2})}{N} . \tag{B.13}
\]
A similar calculation may be performed for the case when the i = 1 boundary has
non-zero field. After finding the contribution from each face, store their sum in φ.
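The single-face construction above can be sketched compactly. The following NumPy translation of equations (B.8)-(B.13) is illustrative only, not the thesis' FORTRAN implementation; it treats one non-zero face with zero-mean (zero net flux) data, makes the standard cosine-transform weights for the m = 0 or n = 0 modes explicit where equation (B.13) quotes the generic factor 4/N², and checks that the one-sided difference of the resulting potential (cf. equation (B.15)) reproduces the prescribed boundary field:

```python
import numpy as np

N = 8
rng = np.random.default_rng(1)
b = rng.standard_normal((N, N))
b -= b.mean()                      # zero net flux through the face, cf. eq. (B.4)

j = np.arange(1, N + 1)
cosmode = lambda m: np.cos(m * np.pi * (j - 0.5) / N)   # cosine modes, eq. (B.8)

i = np.arange(1, N + 2)            # cell centers i = 1..N plus one layer beyond
phi = np.zeros((N + 1, N, N))
for m in range(N):
    for n in range(N):
        if m == 0 and n == 0:
            continue               # A_00 = 0
        alpha = (2 * N / np.pi) * np.arcsinh(np.sqrt(
            np.sin(n * np.pi / (2 * N))**2
            + np.sin(m * np.pi / (2 * N))**2))          # eq. (B.11)
        wm, wn = (2.0 if m else 1.0), (2.0 if n else 1.0)  # DCT weights
        A = (wm * wn / N**2) * np.sum(b * np.outer(cosmode(m), cosmode(n))) \
            / (2 * np.sinh(alpha * np.pi)
               * np.sinh(alpha * np.pi / (2 * N)))      # eq. (B.13)
        phi += A * np.einsum('i,j,k->ijk',
                             np.cosh(alpha * np.pi * (i - 0.5) / N),  # eq. (B.10)
                             cosmode(m), cosmode(n))

# Normal field on the i = N+1 face via the one-sided difference of eq. (B.15):
# it should reproduce the prescribed boundary data b.
Bx_face = phi[N] - phi[N - 1]
assert np.allclose(Bx_face, b)
```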
To deal with the subtracted face-average fields, let
\[
\phi_0^{ijk} = \bar{B}_x^{1}\, i + \bar{B}_y^{1}\, j + \bar{B}_z^{1}\, k
+ \frac{\bar{B}_x^{N+1} - \bar{B}_x^{1}}{2N}\,(i^2 - j^2)
+ \frac{\bar{B}_x^{N+1} + \bar{B}_y^{N+1} - \bar{B}_x^{1} - \bar{B}_y^{1}}{2N}\,(j^2 - k^2) , \tag{B.14}
\]
where the bars denote face averages defined as in equation (B.4), and add this to φ. φ_0 is the potential of a cube in which each face carries the uniform magnetic field given by the average of the magnetic field on the corresponding face of the inner boundary cube.
Figure B.1: Vacuum solution of the magnetic field, calculated in the central region. The field lines outside of the central region show the boundary condition.
To find B, set
\[
B_x^{ijk} = \phi^{ijk} - \phi^{i-1,j,k} , \tag{B.15}
\]
\[
B_y^{ijk} = \phi^{ijk} - \phi^{i,j-1,k} , \tag{B.16}
\]
\[
B_z^{ijk} = \phi^{ijk} - \phi^{i,j,k-1} . \tag{B.17}
\]
In Figure B.1 we used the magnetic field solver with a boundary condition consisting
of field going in one side and out an adjacent side of the box. This boundary condition
tests both the φ0 component of the solution (since faces have non-zero net flux) as well
as the Fourier series component (since faces have non-constant magnetic field).
B.0.2 Density and pressure
Inside the largest sphere that can be inscribed within the inner boundary cube, we adjust the density and pressure so that the Alfven speed and the sound speed are both equal to the circular speed. We accomplish this by setting
\[
\rho = \frac{B^2 r}{G M_{\rm BH}} , \tag{B.18}
\]
\[
p = \frac{G M_{\rm BH}\, \rho}{\gamma r} . \tag{B.19}
\]
We then reduce p by a factor of 10. To ensure stability, ρ and p are assigned minimum values of 0.1 times the average value of ρ outside the sphere, and 0.001, respectively.
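As a concrete illustration, the reset reads as follows in hypothetical code units (GM_BH = 1 and γ = 5/3 are our assumed values here; the simulation's actual unit choices may differ):

```python
# Sketch of the inner-sphere density/pressure reset, eqs. (B.18)-(B.19),
# in hypothetical code units.
GM_BH = 1.0
gamma = 5.0 / 3.0

def inner_state(B2, r, rho_avg_outside):
    rho = B2 * r / GM_BH                   # Alfven speed equals the circular speed
    p = 0.1 * GM_BH * rho / (gamma * r)    # sound speed matched, then p -> 0.1 p
    rho = max(rho, 0.1 * rho_avg_outside)  # density floor for stability
    p = max(p, 1e-3)                       # pressure floor for stability
    return rho, p

rho, p = inner_state(B2=1.0, r=1.0, rho_avg_outside=0.5)
assert rho == 1.0 and abs(p - 0.06) < 1e-12
```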
Appendix C
Supporting Movie for black hole
accretion
Animation of the magnetically frustrated convection simulation.

The qualitative behavior of the accretion flow is best illustrated in the form of a movie. This movie shows case 25. The raw simulation used 600³ grid cells. The Bondi radius is at 1000 grid units, where one grid unit is the smallest central grid spacing. The full box size is 8000³ grid units. Colour represents the entropy, and arrows represent the magnetic field vector. The right side shows the equatorial plane (yz); the left side shows a perpendicular plane (xy). The moving white circles represent the flow of an unmagnetized Bondi solution, starting at the Bondi radius. On average, the fluid is slowly moving inward, in a state of magnetically frustrated convection. Other formats can be found at http://www.cita.utoronto.ca/~pen/MFAF/blackhole_movie/index.html.
Bibliography
[1] http://www.research.ibm.com/cell/.
[2] http://wiki.cita.utoronto.ca/mediawiki/index.php/Sunnyvale.
[3] http://www.scinet.utoronto.ca/.
[4] http://www.khronos.org/opencl/.
[5] http://developer.download.nvidia.com/compute/cuda/3_1/toolkit/docs/NVIDIA_CUDA_C_ProgrammingGuide
[6] http://www.amd.com/us/products/desktop/graphics/ati-radeon-hd-5000/hd-
5870/Pages/ati-radeon-hd-5870-specifications.aspx.
[7] http://developer.nvidia.com/page/cg_main.html.
[8] https://www.sharcnet.ca/.
[9] http://en.wikipedia.org/wiki/InfiniBand.
[10] E. Agol. Sagittarius A* Polarization: No Advection-dominated Accretion Flow,
Low Accretion Rate, and Nonthermal Synchrotron Emission. ApJL, 538:L121–L124,
August 2000.
[11] H. Alfven. Existence of Electromagnetic-Hydrodynamic Waves. Nature, 150:405–
406, October 1942.
[12] A. Arevalo, R.M. Matinata, M. Pandian, E. Peri, K. Ruby, F. Thomas, and C. Al-
mond. Programming the Cell Broadband Engine Architecture: Examples and Best
Practices. IBM Red-Books, 2008.
[13] F. K. Baganoff, M. W. Bautz, W. N. Brandt, G. Chartas, E. D. Feigelson, G. P.
Garmire, Y. Maeda, M. Morris, G. R. Ricker, L. K. Townsley, and F. Walter. Rapid
X-ray flaring from the direction of the supermassive black hole at the Galactic Cen-
tre. Nature, 413:45–48, September 2001.
[14] F. K. Baganoff, Y. Maeda, M. Morris, M. W. Bautz, W. N. Brandt, W. Cui, J. P.
Doty, E. D. Feigelson, G. P. Garmire, S. H. Pravdo, G. R. Ricker, and L. K. Towns-
ley. Chandra X-Ray Spectroscopic Imaging of Sagittarius A* and the Central Parsec
of the Galaxy. ApJ, 591:891–915, July 2003.
[15] J. Birn, J. F. Drake, M. A. Shay, B. N. Rogers, R. E. Denton, M. Hesse,
M. Kuznetsova, Z. W. Ma, A. Bhattacharjee, A. Otto, and P. L. Pritchett. Geospace
Environmental Modeling (GEM) magnetic reconnection challenge. JGR, 106:3715–
3720, March 2001.
[16] D. Biskamp. Magnetic Reconnection Via Current Sheets (Invited paper). In M. A.
Dubois, D. Gresellon, and M. N. Bussac, editors, Magnetic Reconnection and Tur-
bulence, pages 19–+, 1986.
[17] D. Biskamp. Magnetic Reconnection in Plasmas. November 2000.
[18] R. D. Blandford and M. C. Begelman. On the fate of gas accreting at a low rate on
to a black hole. MNRAS, 303:L1–L5, February 1999.
[19] H. Bondi. On spherically symmetrical accretion. MNRAS, 112:195, 1952.
[20] G. C. Bower, D. C. Backer, J.-H. Zhao, M. Goss, and H. Falcke. The Linear Polarization of Sagittarius A*. I. VLA Spectropolarimetry at 4.8 and 8.4 GHz. ApJ, 521:582–586, August 1999.
[21] G. C. Bower, H. Falcke, M. C. Wright, and D. C. Backer. Variable Linear Polarization
from Sagittarius A*: Evidence of a Hot Turbulent Accretion Flow. ApJL, 618:L29–
L32, January 2005.
[22] M. A. Brentjens and A. G. de Bruyn. Faraday rotation measure synthesis. A&A,
441:1217–1228, October 2005.
[23] A. E. Broderick and R. D. Blandford. Understanding the Geometry of Astrophysical
Magnetic Fields. ApJ, 718:1085–1099, August 2010.
[24] A. E. Broderick and J. C. McKinney. Parsec-scale Faraday Rotation Measures
from General Relativistic MHD Simulations of Active Galactic Nuclei Jets. ArXiv
e-prints, June 2010.
[25] P.E. Ceruzzi. Beyond the limits: flight enters the computer age. The MIT Press,
1989.
[26] T. Chen, R. Raghavan, JN Dale, and E. Iwata. Cell broadband engine architecture
and its first implementation: a performance view. IBM Journal of Research and
Development, 51(5):559–572, 2007.
[27] X. Chen, M. A. Abramowicz, and J.-P. Lasota. Advection-dominated Accretion:
Global Transonic Solutions. ApJ, 476:61–+, February 1997.
[28] T. G. Cowling. Magnetohydrodynamics. 1976.
[29] K. P. Dere, J.-D. F. Bartoe, G. E. Brueckner, J. Ewing, and P. Lund. Explosive
events and magnetic reconnection in the solar atmosphere. JGR, 96:9399–9407, June
1991.
[30] C. R. Evans and J. F. Hawley. Simulation of magnetohydrodynamic flows - A constrained transport method. ApJ, 332:659–677, September 1988.
[31] Z. Fan, F. Qiu, A. Kaufman, and S. Yoakum-Stover. GPU cluster for high perfor-
mance computing. In Proceedings of the 2004 ACM/IEEE conference on Supercom-
puting, page 47. IEEE Computer Society, 2004.
[32] V. L. Fish, S. S. Doeleman, A. E. Broderick, A. Loeb, and A. E. E. Rogers. Detecting
Changing Polarization Structures in Sagittarius A* with High Frequency VLBI. ApJ,
706:1353–1363, December 2009.
[33] J. Frank, A. King, and D. Raine. Accretion power in astrophysics. 1992.
[34] F. F. Gardner and J. B. Whiteoak. The Polarization of Cosmic Radio Waves.
ARA&A, 4:245–+, 1966.
[35] R. Genzel, R. Schodel, T. Ott, A. Eckart, T. Alexander, F. Lacombe, D. Rouan,
and B. Aschenbach. Near-infrared flares from accreting gas around the supermassive
black hole at the Galactic Centre. Nature, 425:934–937, October 2003.
[36] R. Genzel, R. Schodel, T. Ott, F. Eisenhauer, R. Hofmann, M. Lehnert, A. Eckart,
T. Alexander, A. Sternberg, R. Lenzen, Y. Clenet, F. Lacombe, D. Rouan, A. Ren-
zini, and L. E. Tacconi-Garman. The Stellar Cusp around the Supermassive Black
Hole in the Galactic Center. ApJ, 594:812–832, September 2003.
[37] S. Gillessen, F. Eisenhauer, S. Trippe, T. Alexander, R. Genzel, F. Martins, and
T. Ott. Monitoring Stellar Orbits Around the Massive Black Hole in the Galactic
Center. ApJ, 692:1075–1109, February 2009.
[38] A. Gruzinov. 1/2 Law for Non-Radiative Accretion Flow. ArXiv Astrophysics e-
prints, April 2001.
[39] A. Harten. High resolution schemes for hyperbolic conservation laws. Journal of
computational physics, 135(2):260–278, 1997.
[40] I. V. Igumenshchev and M. A. Abramowicz. Rotating accretion flows around black
holes: convection and variability. MNRAS, 303:309–320, February 1999.
[41] I. V. Igumenshchev, X. Chen, and M. A. Abramowicz. Accretion discs around
black holes: two-dimensional, advection-cooled flows. MNRAS, 278:236–250, Jan-
uary 1996.
[42] I. V. Igumenshchev and R. Narayan. Three-dimensional Magnetohydrodynamic
Simulations of Spherical Accretion. ApJ, 566:137–147, February 2002.
[43] I. V. Igumenshchev, R. Narayan, and M. A. Abramowicz. Three-dimensional Mag-
netohydrodynamic Simulations of Radiatively Inefficient Accretion Flows. ApJ,
592:1042–1059, August 2003.
[44] S. Jin and Z. Xin. The relaxation schemes for systems of conservation laws in arbitrary space dimensions. Comm. Pure Appl. Math, 48:235–277, 1995.
[45] B. M. Johnson and E. Quataert. The Effects of Thermal Conduction on Radiatively
Inefficient Accretion Flows. ApJ, 660:1273–1281, May 2007.
[46] R. Kaeppeli, S. C. Whitehouse, S. Scheidegger, U.-L. Pen, and M. Liebendoerfer. FISH: A 3D parallel MHD code for astrophysical applications. ArXiv e-prints, October 2009.
[47] V.V. Kindratenko, J.J. Enos, G. Shi, M.T. Showerman, G.W. Arnold, J.E. Stone,
J.C. Phillips, and W. Hwu. GPU clusters for high-performance computing. In Pro-
ceedings on the IEEE cluster2009 workshop on parallel programming on accelerator
clusters (PPAC09), pages 1–8, 2009.
[48] G. Kowal, A. Lazarian, E. T. Vishniac, and K. Otmianowska-Mazur. Numerical
Tests of Fast Reconnection in Weakly Stochastic Magnetic Fields. ArXiv e-prints,
March 2009.
[49] L. D. Landau and E. M. Lifshitz. Fluid mechanics. 1959.
[50] L. D. Landau and E. M. Lifshitz. Electrodynamics of continuous media. 1960.
[51] A. Lazarian and E. T. Vishniac. Reconnection in a Weakly Stochastic Field. ApJ,
517:700–718, June 1999.
[52] L. C. Lee and Z. F. Fu. Multiple X line reconnection. I - A criterion for the transition
from a single X line to a multiple X line reconnection. JGR, 91:6807–6815, June
1986.
[53] Y. Levin and A. M. Beloborodov. Stellar Disk in the Galactic Center: A Remnant
of a Dense Accretion Disk? ApJL, 590:L33–L36, June 2003.
[54] A. Loeb. Direct feeding of the black hole at the Galactic Centre with radial gas
streams from close-in stellar winds. MNRAS, 350:725–728, May 2004.
[55] J.-P. Macquart, G. C. Bower, M. C. H. Wright, D. C. Backer, and H. Falcke. The Ro-
tation Measure and 3.5 Millimeter Polarization of Sagittarius A*. ApJL, 646:L111–
L114, August 2006.
[56] D. P. Marrone, J. M. Moran, J.-H. Zhao, and R. Rao. Interferometric Measurements
of Variable 340 GHz Linear Polarization in Sagittarius A*. ApJ, 640:308–318, March
2006.
[57] D. P. Marrone, J. M. Moran, J.-H. Zhao, and R. Rao. The Submillimeter Polarization
of Sgr A*. Journal of Physics Conference Series, 54:354–362, December 2006.
[58] D. P. Marrone, J. M. Moran, J.-H. Zhao, and R. Rao. An Unambiguous Detection
of Faraday Rotation in Sagittarius A*. ApJL, 654:L57–L60, January 2007.
[59] F. Melia. An accreting black hole model for Sagittarius A. ApJL, 387:L25–L28,
March 1992.
[60] F. Melia and H. Falcke. The Supermassive Black Hole at the Galactic Center.
ARA&A, 39:309–352, 2001.
[61] K. E. Nakamura, M. Kusunose, R. Matsumoto, and S. Kato. Optically Thin,
Advection-Dominated Two-Temperature Disks. PASJ, 49:503–512, August 1997.
[62] R. Narayan, I. V. Igumenshchev, and M. A. Abramowicz. Self-similar Accretion
Flows with Convection. ApJ, 539:798–808, August 2000.
[63] R. Narayan, S. Kato, and F. Honma. Global Structure and Dynamics of Advection-
dominated Accretion Flows around Black Holes. ApJ, 476:49–+, February 1997.
[64] R. Narayan, R. Mahadevan, J. E. Grindlay, R. G. Popham, and C. Gammie.
Advection-dominated accretion model of Sagittarius A*: evidence for a black hole
at the Galactic center. ApJ, 492:554–568, January 1998.
[65] R. Narayan and I. Yi. Advection-dominated accretion: A self-similar solution. ApJL,
428:L13–L16, June 1994.
[66] R. Narayan, I. Yi, and R. Mahadevan. Explaining the spectrum of Sagittarius A* with a model of an accreting black hole. Nature, 374:623–625, April 1995.
[67] L. Nyland, M. Harris, and J. Prins. Fast n-body simulation with CUDA. GPU gems,
3:677–695, 2007.
[68] E. N. Parker. Sweet’s Mechanism for Merging Magnetic Fields in Conducting Fluids.
JGR, 62:509–520, December 1957.
[69] U.-L. Pen, P. Arras, and S. Wong. A Free, Fast, Simple, and Efficient Total Variation
Diminishing Magnetohydrodynamic Code. ApJS, 149:447–455, December 2003.
[70] U.-L. Pen, C. D. Matzner, and S. Wong. The Fate of Nonradiative Magnetized Ac-
cretion Flows: Magnetically Frustrated Convection. ApJL, 596:L207–L210, October
2003.
[71] H. E. Petschek. Magnetic Field Annihilation. NASA Special Publication, 50:425–+,
1964.
[72] R. Popham and C. F. Gammie. Advection-dominated Accretion Flows in the Kerr
Metric. II. Steady State Global Solutions. ApJ, 504:419–+, September 1998.
[73] E. Priest and T. Forbes. Magnetic Reconnection. June 2000.
[74] E. R. Priest and T. G. Forbes. Does fast magnetic reconnection exist? JGR,
97:16757–+, November 1992.
[75] D. Proga and M. C. Begelman. Accretion of Low Angular Momentum Material
onto Black Holes: Two-dimensional Magnetohydrodynamic Case. ApJ, 592:767–
781, August 2003.
[76] E. Quataert and A. Gruzinov. Constraining the Accretion Rate onto Sagittarius A*
Using Linear Polarization. ApJ, 545:842–846, December 2000.
[77] E. Quataert and A. Gruzinov. Convection-dominated Accretion Flows. ApJ,
539:809–814, August 2000.
[78] E. Quataert and R. Narayan. Spectral Models of Advection-dominated Accretion
Flows with Winds. ApJ, 520:298–315, July 1999.
[79] M. G. Revnivtsev, E. M. Churazov, S. Y. Sazonov, R. A. Sunyaev, A. A. Lutovinov,
M. R. Gilfanov, A. A. Vikhlinin, P. E. Shtykovsky, and M. N. Pavlinsky. Hard X-ray
view of the past activity of Sgr A* in a natural Compton mirror. A&A, 425:L49–L52,
October 2004.
[80] G. B. Rybicki and A. P. Lightman. Radiative processes in astrophysics. 1979.
[81] Hsi-Yu Schive, Yu-Chih Tsai, and Tzihong Chiueh. GAMER: a GPU-Accelerated
Adaptive Mesh Refinement Code for Astrophysics. Astrophys. J. Suppl., 186:457–
484, 2010.
[82] M. Scholer. Undriven magnetic reconnection in an isolated current sheet. JGR,
94:8805–8812, July 1989.
[83] A. Shan. Heterogeneous processing: a strategy for augmenting moore’s law. Linux
Journal, 2006(142):7, 2006.
[84] P. Sharma, E. Quataert, and J. M. Stone. Faraday Rotation in Global Accretion
Disk Simulations: Implications for Sgr A*. ApJ, 671:1696–1707, December 2007.
[85] P. Sharma, E. Quataert, and J. M. Stone. Spherical accretion with anisotropic
thermal conduction. MNRAS, 389:1815–1827, October 2008.
[86] R. V. Shcherbakov. Propagation Effects in Magnetized Transrelativistic Plasmas.
ApJ, 688:695–700, November 2008.
[87] R. V. Shcherbakov and F. K. Baganoff. Inflow-Outflow Model with Conduction and
Self-Consistent Feeding for Sgr A*. ArXiv e-prints, April 2010.
[88] F. Shu. Physics of Astrophysics, Vol. II: Gas Dynamics. University Science Books,
1991.
[89] J. M. Stone, J. E. Pringle, and M. C. Begelman. Hydrodynamical non-radiative
accretion flows in two dimensions. MNRAS, 310:1002–1016, December 1999.
[90] P. A. Sweet. The Neutral Point Theory of Solar Flares. In B. Lehnert, editor, Elec-
tromagnetic Phenomena in Cosmical Physics, volume 6 of IAU Symposium, pages
123–+, 1958.
[91] T. Tanaka and K. Menou. Hot Accretion with Conduction: Spontaneous Thermal
Outflows. ApJ, 649:345–360, September 2006.
[92] H. Trac and U.-L. Pen. A Primer on Eulerian Computational Fluid Dynamics for Astrophysics. PASP, 115:303–321, March 2003.
[93] H. Trac and U.-L. Pen. A moving frame algorithm for high Mach number hydrody-
namics. New Astronomy, 9:443–465, July 2004.
[94] H.C. Wong, U.H. Wong, X. Feng, and Z. Tang. Magnetohydrodynamics simulations
on graphics processing units. Imprint, 2009.
[95] F. Yuan, E. Quataert, and R. Narayan. Nonthermal Electrons in Radiatively In-
efficient Accretion Flow Models of Sagittarius A*. ApJ, 598:301–312, November
2003.
[96] F. Yusef-Zadeh, H. Bushouse, M. Wardle, and 11 coauthors. Simultaneous Multi-
Wavelength Observations of Sgr A* during 2007 April 1-11. ArXiv e-prints, July
2009.
[97] F. Yusef-Zadeh, H. Bushouse, M. Wardle, C. Heinke, D. A. Roberts, C. D. Dowell,
A. Brunthaler, M. J. Reid, C. L. Martin, D. P. Marrone, D. Porquet, N. Grosso,
K. Dodds-Eden, G. C. Bower, H. Wiesemeyer, A. Miyazaki, S. Pal, S. Gillessen,
A. Goldwurm, G. Trap, and H. Maness. Simultaneous Multi-Wavelength Observa-
tions of Sgr A* During 2007 April 1-11. ApJ, 706:348–375, November 2009.
[98] F. Yusef-Zadeh, M. Muno, M. Wardle, and D. C. Lis. The Origin of Diffuse X-Ray
and γ-Ray Emission from the Galactic Center Region: Cosmic-Ray Particles. ApJ,
656:847–869, February 2007.