
Meep: A flexible free-software package for electromagnetic simulations by the FDTD method

Ardavan F. Oskooi (∗, a, c), David Roundy (b), Mihai Ibanescu (a, c, d), Peter Bermel (c), J. D. Joannopoulos (a, c, d), Steven G. Johnson (∗∗, a, c, e)

(a) Center for Materials Science & Engineering, Massachusetts Institute of Technology, Cambridge MA 02139

(b) Department of Physics, Oregon State University, Corvallis OR 97331

(c) Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge MA 02139

(d) Department of Physics, Massachusetts Institute of Technology, Cambridge MA 02139

(e) Department of Mathematics, Massachusetts Institute of Technology, Cambridge MA 02139

Abstract

This paper describes Meep, a popular free implementation of the finite-difference time-domain (FDTD) method for simulating electromagnetism. In particular, we focus on aspects of implementing a full-featured FDTD package that go beyond standard textbook descriptions of the algorithm, or ways in which Meep differs from typical FDTD implementations. These include pervasive interpolation and accurate modeling of subpixel features, advanced signal processing, support for nonlinear materials via Padé approximants, and flexible scripting capabilities.

PACS: 02.70.Bf; 82.20.Wt; 03.50.De; 87.64.Aa.

Key words: computational electromagnetism; FDTD; Maxwell solver.

Program Summary

Program title: Meep
Program summary URL: http://ab-initio.mit.edu/meep
Licensing provisions: GNU GPL
No. of lines in distributed program, including test data, etc: 58000
No. of bytes in distributed program, including test data, etc: 734K
Distribution format: tar.gz
Programming language: C++
Computer: any computer with a Unix-like system and a C++ compiler; optionally exploits additional free software packages: GNU Guile [1], libctl interface library [2], HDF5 [3], MPI message-passing interface [4], and Harminv filter-diagonalization [5]. Developed on 2.8 GHz Intel Core 2 Duo.
Operating system: any Unix-like system; developed under Debian GNU/Linux 5.0.2
RAM: problem dependent (roughly 100 bytes per pixel/voxel)
Classification: 10 Electrostatics and Electromagnetics

∗ Corresponding author
∗∗ Principal corresponding author


Preprint submitted to Elsevier January 8, 2010

published in Computer Physics Communications, vol. 181, pp. 687-702 (2010)

External routines/libraries: optionally exploits additional free software packages: GNU Guile [1], libctl interface library [2], HDF5 [3], MPI message-passing interface [4], and Harminv filter-diagonalization [5] (which requires LAPACK and BLAS linear-algebra software [6]).
Nature of problem: classical electrodynamics
Solution method: finite-difference time-domain (FDTD) method
Running time: problem dependent (typically about 10 ns per pixel per timestep)

References:

1. GNU Guile, http://www.gnu.org/software/guile
2. Libctl, http://ab-initio.mit.edu/libctl
3. M. Folk, R.E. McGrath, N. Yeager, HDF: An update and future directions, in: Proc. 1999 Geoscience and Remote Sensing Symposium (IGARSS), Hamburg, Germany, vol. 1, IEEE Press, 273–275, 1999.
4. T.M. Forum, MPI: A Message Passing Interface, in: Supercomputing ’93, Portland, OR, 878–883, 1993.
5. Harminv, http://ab-initio.mit.edu/harminv
6. LAPACK, http://www.netlib.org/lapack/lug


1. Introduction

One of the most common computational tools in classical electromagnetism is the finite-difference time-domain (FDTD) algorithm, which divides space and time into a regular grid and simulates the time evolution of Maxwell’s equations [1, 2, 3, 4, 5]. This paper describes our free, open-source implementation of the FDTD algorithm: Meep (an acronym for MIT Electromagnetic Equation Propagation), available online at http://ab-initio.mit.edu/meep. Meep is full-featured, including, for example: arbitrary anisotropic, nonlinear, and dispersive electric and magnetic media; a variety of boundary conditions including symmetries and perfectly matched layers (PML); distributed-memory parallelism; Cartesian (1d/2d/3d) and cylindrical coordinates; and flexible output and field computations. It also includes some unusual features, such as advanced signal processing to analyze resonant modes, accurate subpixel averaging, a frequency-domain solver that exploits the time-domain code, complete scriptability, and integrated optimization facilities. Here, rather than review the well-known FDTD algorithm itself (which is thoroughly covered elsewhere), we focus on the particular design decisions that went into the development of Meep whose motivation may not be apparent from textbook FDTD descriptions, the tension between abstraction and performance in FDTD implementations, and the unique or unusual features of our software.

Why implement yet another FDTD program? Literally dozens of commercial FDTD software packages are available for purchase, but the needs of research often demand the flexibility provided by access to the source code (and relaxed licensing constraints to speed porting to new clusters and supercomputers). Our interactions with other photonics researchers suggest that many groups end up developing their own FDTD code to serve their needs (our own groups have used at least three distinct in-house FDTD implementations over the past 15 years), a duplication of effort that seems wasteful. Most of these are not released to the public, and the handful of other free-software FDTD programs that could be downloaded when Meep was first released in 2006 were not nearly full-featured enough for our purposes. Since then, Meep has been cited in over 100 journal publications and has been downloaded over 10,000 times, reaffirming the demand for such a package.

FDTD algorithms are, of course, only one of many numerical tools that have been developed in computational electromagnetism, and may perhaps seem primitive in light of other sophisticated techniques, such as finite-element methods (FEMs) with high-order accuracy and/or adaptive unstructured meshes [6, 7, 8], or even radically different approaches such as boundary-element methods (BEMs) that discretize only interfaces between homogeneous materials rather than volumes [9, 10, 11, 12]. Each tool, of course, has its strengths and weaknesses, and we do not believe that any single one is a panacea. The nonuniform unstructured grids of FEMs, for example, have compelling advantages for metallic structures where micrometer wavelengths may be paired with nanometer skin depths. On the other hand, this flexibility comes at a price of substantial software complexity, which may not be worthwhile for dielectric devices at infrared wavelengths (such as in integrated optics or fibers) where the refractive index (and hence the typical resolution required) varies by less than a factor of four between materials, while small features such as surface roughness can be accurately handled by perturbative techniques [13]. BEMs, based on integral-equation formulations of electromagnetism, are especially powerful for scattering problems involving small objects in a large volume, since the volume need not be discretized and no artificial “absorbing boundaries” are needed. On the other hand, BEMs have a number of limitations: they may still require artificial absorbers for interfaces extending to infinity (such as input/output waveguides) [14]; any change to the Green’s function (such as introduction of anisotropic materials, imposition of periodic or symmetry boundary conditions, or a switch from three to two dimensions) requires re-implementation of large portions of the software (e.g. singular panel integrations and fast solvers) rather than purely local changes as in FDTD or FEM; continuously varying (as opposed to piecewise-constant) materials are inefficient; and solution in the time domain (rather than frequency domain, which is inadequate for nonlinear or active systems in which frequency is not conserved) with BEM requires an expensive solver that is nonlocal in time as well as in space [11]. And then, of course, there are specialized tools that solve only a particular type of electromagnetic problem, such as our own MPB software that only computes eigenmodes (e.g. waveguide modes) [15], which are powerful and robust within their domain but are not a substitute for a general-purpose Maxwell simulation. FDTD has the advantages of simplicity, generality, and robustness: it is straightforward to implement the full time-dependent Maxwell equations for nearly arbitrary materials (including nonlinear, anisotropic, dispersive, and time-varying materials) and a wide variety of boundary conditions, one can quickly experiment with new physics coupled to Maxwell’s equations (such as populations of excited atoms for lasing [16, 17, 18, 19, 20]), and the algorithm is easily parallelized to run on clusters or supercomputers. This simplicity is especially attractive to researchers whose primary concern is investigating new interactions of physical processes, and for whom programmer time and the training of new students are far more expensive than computer time.

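The starting point for any FDTD solver is the time-derivative parts of Maxwell’s equations, which in their simplest form can be written:

$$\frac{\partial \mathbf{B}}{\partial t} = -\nabla \times \mathbf{E} - \mathbf{J}_B \tag{1}$$

$$\frac{\partial \mathbf{D}}{\partial t} = +\nabla \times \mathbf{H} - \mathbf{J}, \tag{2}$$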

where (respectively) E and H are the macroscopic electric and magnetic fields, D and B are the electric displacement and magnetic induction fields [21], J is the electric-charge current density, and $\mathbf{J}_B$ is a fictitious magnetic-charge current density (sometimes convenient in calculations, e.g. for magnetic-dipole sources). In time-domain calculations, one typically solves the initial-value problem where the fields and currents are zero for t < 0, and then nonzero values evolve in response to some currents J(x, t) and/or $\mathbf{J}_B(\mathbf{x}, t)$. (In contrast, a frequency-domain solver assumes a time dependence of $e^{-i\omega t}$ for all currents and fields, and solves the resulting linear equations for the steady-state response or eigenmodes [22, app. D].) We prefer to use dimensionless units $\varepsilon_0 = \mu_0 = c = 1$. From our perspective, this choice emphasizes both the scale invariance of Maxwell’s equations [22, chap. 2] and also the fact that the most meaningful quantities to calculate are almost always dimensionless ratios (such as scattered power over incident power, or wavelength over some characteristic lengthscale). The user can pick any desired unit of distance a (either an SI unit such as a = 1 µm or some typical lengthscale of a given problem), and all distances are given in units of a, all times in units of a/c, and all frequencies in units of c/a. In a linear dispersionless medium, the constitutive relations are D = εE and B = µH, where ε and µ are the relative permittivity and permeability (possibly tensors); the case of nonlinear and/or dispersive media (including conductivities) is discussed further in Sec. 4.
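As a small illustration of the dimensionless units described above, the following sketch converts between quantities in units of a and SI values. The helper names and the example values (a = 1 µm, frequency 0.4 c/a) are hypothetical choices for illustration and are not part of Meep’s interface.

```python
C_SI = 299_792_458.0  # speed of light in vacuum, in m/s


def frequency_to_hz(f_dimless, a_meters):
    """Convert a frequency given in units of c/a to Hz."""
    return f_dimless * C_SI / a_meters


def time_to_seconds(t_dimless, a_meters):
    """Convert a time given in units of a/c to seconds."""
    return t_dimless * a_meters / C_SI


def vacuum_wavelength_in_a(f_dimless):
    """Vacuum wavelength, in units of a, for a frequency in units of c/a."""
    return 1.0 / f_dimless


a = 1e-6  # assumed unit of distance: a = 1 micrometer
f = 0.4   # assumed frequency in units of c/a
print(frequency_to_hz(f, a))      # physical frequency in Hz
print(vacuum_wavelength_in_a(f))  # wavelength in units of a
print(time_to_seconds(100.0, a))  # 100 time units expressed in seconds
```

Because only ratios enter, the same dimensionless result applies to any choice of a; only the conversion back to SI depends on it.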

The remainder of the paper is organized as follows. In Sec. 2, we discuss the discretization and coordinate system; in addition to the standard Yee discretization [1], this raises the question of how exactly the grid is described and divided into “chunks” for parallelization, PML, and other purposes. Section 3 describes a central principle of Meep’s design, pervasive interpolation providing (as much as possible) the illusion of continuity in the specification of sources, materials, outputs, and so on. This led to the development of several techniques unique to Meep, such as a scheme for subpixel material averaging designed to eliminate the first-order error usually associated with averaging techniques or stairstepping of interfaces. In Sec. 4, we describe and motivate our techniques for implementing nonlinear and dispersive materials, including a slightly unusual method to implement nonlinear materials using a Padé approximant that eliminates the need to solve cubic equations for every pixel. Section 5 describes how typical computations are performed in Meep, such as memory-efficient transmission spectra or sophisticated analysis of resonant modes via harmonic inversion. This section also describes how we have adapted the time-domain code, almost without modification, to solve frequency-domain problems with much faster convergence to the steady-state response than merely time-stepping. The user interface of Meep is discussed in Sec. 6, explaining the considerations that led us to a scripting interface (rather than a GUI or CAD interface). Section 7 describes some of the tradeoffs between performance and generality in this type of code and the specific compromises chosen in Meep. Finally, we make some concluding remarks in Sec. 8.

2. Grids and Boundary Conditions

The starting point for the FDTD algorithm is the discretization of space and time into a grid. In particular, Meep uses the standard Yee grid discretization, which staggers the electric and magnetic fields in time and in space, with each field component sampled at different spatial locations offset by half a pixel, allowing the time and space derivatives to be formulated as center-difference approximations [23]. This much is common to nearly every FDTD implementation and is described in detail elsewhere [1]. In order to parallelize Meep, to efficiently support simulations with symmetries, and to efficiently store auxiliary fields only in certain regions (for PML absorbing layers), Meep further divides the grid into chunks that are joined together into an arbitrary topology via boundary conditions. (In the future, different chunks may have different resolutions to implement a nonuniform grid [24, 25, 26, 27].) Furthermore, we distinguish two coordinate systems: one consisting of integer coordinates on the Yee grid, and one of continuous coordinates in “physical” space that are interpolated as necessary onto the grid (see Sec. 3). This section describes those concepts as they are implemented in Meep, as they form a foundation for the remaining sections and the overall design of the Meep software.
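The staggering in space and time can be illustrated with a minimal 1d sketch of the standard Yee leapfrog scheme in vacuum (ε = µ = 1, c = 1). This is a textbook-style toy, not Meep’s implementation: the function name, the resolution of 20 pixels per unit length, the Gaussian initial pulse, the Courant factor of 0.5, and the perfect-electric-conductor end walls are all assumptions made for illustration.

```python
import math


def yee_1d(num_cells=200, steps=80, courant=0.5):
    """Leapfrog 1d Yee scheme for (Ey, Hz) in vacuum with PEC end walls."""
    dx = 1.0 / 20            # assumed pixel size (20 pixels per unit length)
    dt = courant * dx        # time step, stable in 1d for courant <= 1
    center = num_cells / 2
    # E is sampled at integer grid points i*dx; H at half-integer points (i+0.5)*dx.
    e = [math.exp(-(((i - center) * dx) / 0.5) ** 2) for i in range(num_cells + 1)]
    e[0] = e[-1] = 0.0       # PEC walls: tangential E vanishes at both ends
    h = [0.0] * num_cells    # H is stored half a time step away from E
    for _ in range(steps):
        # dH/dt = -dE/dx, centered difference around each H point
        for i in range(num_cells):
            h[i] -= dt / dx * (e[i + 1] - e[i])
        # dE/dt = -dH/dx, centered difference around each interior E point
        for i in range(1, num_cells):
            e[i] -= dt / dx * (h[i] - h[i - 1])
    return e, h


e, h = yee_1d()
print(max(abs(v) for v in e))  # the initial pulse splits into two counter-propagating halves
```

Each update uses only the two nearest samples of the other field, which is why the half-pixel and half-timestep offsets make both derivatives centered differences.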

2.1. Coordinates and grids

The two spatial coordinate systems in Meep are described by the vec, a continuous vector in $\mathbb{R}^d$ (in d dimensions), and the ivec, an integer-valued vector in $\mathbb{Z}^d$ describing locations on the Yee grid. If n is an ivec, the corresponding vec is given by 0.5∆x n, where ∆x is the spatial resolution (the same along x, y, and z); that is, the integer coordinates in an ivec correspond to half-pixels, as shown in the right panel of Fig. 1. This is to represent locations on the spatial Yee grid, which offsets different field components in space by half a pixel as shown (in 2d) in the right panel of Fig. 1. In 3d, the $E_x$ and $D_x$ components are sampled at ivecs (2ℓ + 1, 2m, 2n), $E_y$ and $D_y$ are sampled at ivecs (2ℓ, 2m + 1, 2n), and so on; $H_x$ and $B_x$ are sampled at ivecs (2ℓ, 2m + 1, 2n + 1), $H_y$ and $B_y$ are sampled at ivecs (2ℓ + 1, 2m, 2n + 1), and so on. In addition to these grids for the different field components, we also occasionally refer to the centered grid, at odd ivecs (2ℓ + 1, 2m + 1, 2n + 1) corresponding to the “center” of each pixel. (The origin of


[Figure 1 (image not reproduced). Left panel: the computational cell divided into four chunks, numbered 1 to 4. Right panel: one chunk of the Yee grid, with x and y axes labeled by ivec coordinates 0 to 11 and a legend distinguishing owned from not-owned points.]

Figure 1: The computational cell is divided into chunks (left) that have a one-pixel overlap (gray regions). Each chunk (right) represents a portion of the Yee grid, partitioned into owned points (chunk interior) and not-owned points (gray regions around the chunk edges) that are determined from other chunks and/or via boundary conditions. Every point in the interior of the computational cell is owned by exactly one chunk, the chunk responsible for timestepping that point.

the coordinate systems is an arbitrary ivec that can be set by the user, but is typically the center of the computational volume.) The philosophy of Meep, as described in Sec. 3, is that as much as possible the user should be concerned only with continuous physical coordinates (vecs), and the interpolation/discretization onto ivecs occurs internally as transparently as possible.
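The mapping between the two coordinate systems can be sketched in a few lines. The function and table names below are hypothetical and are not Meep’s C++ API; the entries for $E_z$ and $H_z$ are filled in by cyclic permutation of the locations listed above (the text’s “and so on”), which is an assumption. D shares the locations of E, and B those of H.

```python
def ivec_to_vec(n, dx):
    """Continuous coordinates of an integer Yee-grid location: 0.5 * dx * n."""
    return tuple(0.5 * dx * c for c in n)


# Parity (0 = even, 1 = odd) of the ivec coordinates of each component in 3d.
YEE_PARITY = {
    "Ex": (1, 0, 0), "Ey": (0, 1, 0), "Ez": (0, 0, 1),
    "Hx": (0, 1, 1), "Hy": (1, 0, 1), "Hz": (1, 1, 0),
    "centered": (1, 1, 1),
}


def components_at(n):
    """Names of the grids (if any) that have a sample at ivec n."""
    parity = tuple(c % 2 for c in n)
    return [name for name, p in YEE_PARITY.items() if p == parity]


print(components_at((3, 4, 6)))       # an (odd, even, even) location
print(components_at((1, 1, 1)))       # a pixel center
print(ivec_to_vec((3, 4, 6), dx=0.1)) # half-pixel units scaled by the pixel size
```

Note that an all-even ivec carries no field component in this layout, so `components_at((0, 0, 0))` returns an empty list.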

2.2. Grid chunks and owned points

An FDTD simulation must occur within a finite volume of space, the computational cell, terminated with some boundary conditions and possibly by absorbing PML regions as described below. This (rectilinear) computational cell, however, is further subdivided into convex rectilinear chunks. On a parallel computer, for example, different chunks may be stored at different processors. In order to simplify the calculations for each chunk, we employ the common technique of padding each chunk with extra “boundary” pixels that store the boundary values [28] (shown as gray regions in Fig. 1). This means that the chunks are overlapping in the interior of the computational cell, where the overlaps require communication to synchronize the values.

More precisely, the grid points in each chunk are partitioned into owned and not-owned points. The not-owned points are determined by communication with other chunks and/or by boundary conditions. The owned points are time-stepped within the chunk, independently of the other chunks (and possibly in parallel), and every grid point inside the computational cell is owned by exactly one chunk.

The question then arises: how do we decide which points within the chunk are owned? In order for a grid point to be owned, the chunk must contain all the information necessary for timestepping that point (once the not-owned points have been communicated). For example, for a $D_y$ point (2ℓ, 2m + 1, 2n) to be owned, the $H_z$ points at (2ℓ ± 1, 2m + 1, 2n) must both be in the chunk in order to compute ∇ × H for timestepping D at that point. This means that the $D_y$ points along the left (minimum-x) edge of the chunk (as shown in the right panel of Fig. 1) cannot be owned: there is no $H_z$ point to the left of them. An additional dependency is imposed by the case of anisotropic media: if there is an $\varepsilon_{xy}$ coupling $E_x$ to $D_y$, then updating $E_x$ at (2ℓ + 1, 2m, 2n) requires the four $D_y$ values at (2ℓ + 1 ± 1, 2m ± 1, 2n) (these are the surrounding $D_y$ values, as seen in the right panel of Fig. 1). This means that the $E_x$ (and $D_x$) points along the right (maximum-x) edge of the chunk (as shown in the right panel of Fig. 1) cannot be owned either: there is no $D_y$ point to the right of them. Similar considerations apply to ∇ × E (for timestepping B) and anisotropic µ.

All of these considerations result in the shaded-gray region of Fig. 1 (right) being not-owned. That is, if the chunk intersects k + 1 pixels along a given direction starting at an ivec coordinate of 0 (e.g. k = 5 in Fig. 1), the endpoint ivec coordinates 0 and 2k + 1 are not-owned and the interior coordinates from 1 to 2k (inclusive) are owned.
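The ownership rule along one direction can be checked with a short sketch. The helper below is hypothetical (not Meep’s data structure), and the placement of a second chunk starting at ivec coordinate 2k is an assumption chosen so that the two chunks overlap by one pixel, as in Fig. 1.

```python
def chunk_points(start, k):
    """Owned and not-owned ivec coordinates along one direction for a chunk
    that intersects k + 1 pixels and begins at ivec coordinate `start`."""
    owned = set(range(start + 1, start + 2 * k + 1))  # start+1 .. start+2k
    not_owned = {start, start + 2 * k + 1}            # the two endpoints
    return owned, not_owned


k = 5                                  # as in Fig. 1
a_owned, a_halo = chunk_points(0, k)   # chunk A: coordinates 0..11
b_owned, b_halo = chunk_points(2 * k, k)  # chunk B (assumed): coordinates 10..21

print(sorted(a_owned), sorted(a_halo))
print(sorted(b_owned), sorted(b_halo))
# Each chunk's interior endpoint is owned by its neighbor, and no point is owned twice.
print(a_owned & b_owned)
```

With this layout, the right endpoint of chunk A (coordinate 11) is owned by chunk B, and the left endpoint of chunk B (coordinate 10) is owned by chunk A, so those values are obtained by copying between chunks.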

2.3. Boundary conditions and symmetries

All of the not-owned points in a chunk must be determined by boundary conditions of some sort. The simplest boundary conditions are when the not-owned points are owned by some other chunk, in which case the values are simply copied from that chunk (possibly requiring communication on a multiprocessor system) each time they are updated. In order to minimize communications overhead, all communications between two chunks are batched into a single message (by copying the relevant not-owned points to/from a contiguous buffer) rather than sending one message per point to be copied.

At the edges of the computational cell, some user-selected boundary condition must be imposed. For example, one can use perfect electric or magnetic conductors where the relevant electric/magnetic-field components are set to zero at the boundaries. One can also use Bloch-periodic boundary conditions, where the fields on one side of the computational cell are copied from the other side of the computational cell, optionally multiplied by a complex phase factor $e^{i k_i \Lambda_i}$, where $k_i$ is the propagation constant in the ith direction and $\Lambda_i$ is the length of the computational cell in the same direction. Meep does not implement any absorbing boundary conditions; absorbing boundaries are, instead, handled by an artificial material, perfectly matched layers (PML), placed adjacent to the boundaries [1].
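A 1d sketch of the Bloch-periodic copy may make the phase factor concrete. The array layout (one not-owned point at each end, owned points in between), the function name, and the sign convention f(x + Λ) = e^{ikΛ} f(x) are assumptions for illustration rather than a description of Meep’s internals.

```python
import cmath


def apply_bloch_1d(field, k, cell_length):
    """Fill the two not-owned end points of a 1d list of complex field values
    from the owned points one period away, using the factor exp(±i k Λ)."""
    phase = cmath.exp(1j * k * cell_length)
    out = list(field)
    out[0] = out[-2] / phase   # point one period to the left of the last owned point
    out[-1] = out[1] * phase   # point one period to the right of the first owned point
    return out


# Check against a plane wave exp(i k x), which satisfies the Bloch condition exactly.
num_owned, dx, k = 8, 0.25, 1.3
exact = [cmath.exp(1j * k * j * dx) for j in range(num_owned + 2)]
trial = [0j] + exact[1:-1] + [0j]                 # end points not yet known
filled = apply_bloch_1d(trial, k, num_owned * dx)  # Λ = num_owned * dx
print(abs(filled[0] - exact[0]), abs(filled[-1] - exact[-1]))
```

Setting k = 0 reduces this to ordinary periodic boundary conditions, where the end points are plain copies.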

Bloch-periodic boundary conditions are useful in periodic systems [22], but this is only one example of a useful symmetry that may be exploited via boundary conditions. One may also have mirror and rotational symmetries. For example, if the materials and the field sources have a mirror symmetry, one can cut the computational costs in two by storing chunks only in half the computational cell and applying mirror boundary conditions to obtain the not-owned pixels adjacent to the mirror plane. As a more unusual example, consider an S-shaped structure as in Fig. 2, which has no mirror symmetry but is symmetric under 180-degree rotation, called $C_2$ symmetry [29]. Meep can exploit this case as well (assuming the current sources have the same symmetry), storing only half of the computational cell as in Fig. 2 and inferring the not-owned values along the dashed line by a 180-degree rotation. (In the simple case where the stored region is a single chunk, this means that the not-owned points are determined by owned points in the same chunk, requiring copies, possibly with sign flips.) Depending on the sources, of course, the fields can be even or odd under mirror flips or $C_2$ rotations [22], so the user can specify an additional sign flip for the transformation of the vector fields (and pseudovector H and B fields, which incur an additional sign flip under mirror reflections [21, 22]). Meep also supports fourfold rotation symmetry ($C_4$), where the field can be multiplied by factors of 1, i, −1, or −i under each 90-degree rotation [29]. (Other rotations, such as threefold or sixfold, are not supported because they do not preserve the Cartesian Yee grid.) In 2d, the xy plane is itself a mirror plane (except in the presence of anisotropic materials) and the symmetry decouples TE modes (with fields $E_x$,


[Figure 2 (image not reproduced). An S-shaped structure with its $C_2$ rotation center marked; the right panel labels the half of the cell that is stored and the half that is not stored.]

Figure 2: Meep can exploit mirror and rotational symmetries, such as the 180-degree ($C_2$) rotational symmetry of the S-shaped structure in this schematic example. Although Meep maintains the illusion that the entire structure is stored and simulated, internally only half of the structure is stored (as shown at right), and the other half is inferred by rotation. The rotation gives a boundary condition for the not-owned grid points along the dashed line.

$E_y$, and $H_z$) from TM modes ($H_x$, $H_y$, and $E_z$) [22]; in this case Meep only allocates those fields for which the corresponding sources are present.

A central principle of Meep is that symmetry optimizations be transparent to the user once the desired symmetries are specified. Meep maintains the illusion that the entire computational cell is computed: for example, the fields in the entire computational cell can still be queried or exported to a file, flux planes and similar computations can still extend anywhere within the computational cell, and so on. The fields in the non-stored regions are simply computed behind the scenes (without ever allocating memory for them) by transforming the stored chunks as needed. A key enabling factor for maintaining this illusion efficiently is the loop-in-chunks abstraction employed by the Meep code, described in Sec. 7.

Meep also supports continuous rotational symmetry around a given axis, where the structure is invariant under rotations and the fields transform as $e^{im\phi}$ for some m [22], but this is implemented separately by providing the option to simulate Maxwell’s equations in the (r, z) plane with cylindrical coordinates, for which operators like ∇× change form.

3. Interpolation and the illusion of continuity

A core design philosophy of Meep is to provide the illusion of continuous space and time, masking the underlying discretization from the user as much as possible. There are two components to this approach: the inputs and the outputs. Continuously varying inputs, such as the geometry, materials, and the source currents, lead to continuously varying outputs, as in the example of Fig. 3. Similarly, the value of any field (or any function of the fields) can be output at any point in space or integrated over any region. Furthermore, the effects of these inputs and the resulting outputs must converge as quickly as possible to the exact solution as the resolution increases. In this section, we discuss how this illusion of continuity is implemented for field outputs, current inputs, and geometry/materials.

Any field component (or any combinations such as flux, energy, and user-defined functions) can be evaluated at any point in space. In general, this requires interpolation from the Yee grid. Since the underlying FDTD center-difference algorithm has second-order accuracy, we linearly interpolate fields as needed (which also has second-order accuracy for smooth functions). Similarly, we provide an interface to integrate any function of the fields over any convex rectilinear


[Figure 3 (image not reproduced). Plot of frequency (2πc/a), from 0.38 to 0.415, versus the s1 size parameter (units of a), from 0.6 to 0.8, with curves labeled “no smoothing”, “subpixel smoothing”, and “exact”; an inset shows the ellipse with semi-axis s1 in a cell of period a.]

Figure 3: A key principle of Meep is that continuously varying inputs yield continuously varying outputs. Here, an eigenfrequency of a photonic crystal varies continuously with the eccentricity of a dielectric rod, accomplished by subpixel smoothing of the material parameters, whereas the nonsmoothed result is “stairstepped.” Specifically, the plot shows a TE eigenfrequency of a 2d square lattice (period a) of dielectric ellipses (ε = 12) in air versus one semi-axis diameter of the ellipse (in gradations of 0.005a) for no smoothing (red squares, resolution of 20 pixels/a), subpixel smoothing (blue circles, resolution of 20 pixels/a) and “exact” results (black line, no smoothing at resolution of 200 pixels/a).

region (boxes, planes, or lines), and the integral is computed by integrating the linear interpolation of the fields within the integration region. This is straightforward, but there are two subtleties due to the staggered Yee grid. First, computation of quantities like E × H that mix different field components requires an additional interpolation: first, the fields are interpolated onto the centered grid (Sec. 2), then the integrand is computed, and then the linear interpolation of the integrand is integrated over the specified region. Second, the computation of quantities like E × H mixes two fields that are stored at different times: H is stored at times (n − 0.5)∆t, while E is stored at times n∆t [1]. Simply using these time-offset fields together is only first-order accurate. If second-order accuracy is desired, Meep provides the option to temporarily synchronize the electric and magnetic fields: the magnetic fields are saved to a backup array, stepped by ∆t, and they are averaged with the backup array to obtain the magnetic fields at n∆t with $O(\Delta t^2)$ accuracy. (The fields are restored from backup before resuming timestepping.) This restores second-order accuracy at the expense of an extra half a timestep’s computation, which is usually negligible because such field computations are rarely required at every timestep of a simulation; see Sec. 5 for how Meep performs typical transmission simulations and other calculations efficiently.
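The gain from synchronizing the fields can be seen in a toy calculation. In the sketch below, an analytic function sin(t) stands in for a time-stepped magnetic field (an assumption made so that the exact value is known); the function names are hypothetical. It compares using the stored value at (n − 0.5)∆t directly with averaging the values at (n ± 0.5)∆t.

```python
import math


def h_exact(t):
    """Stand-in for a magnetic-field value as a smooth function of time."""
    return math.sin(t)


def sync_errors(dt, t=1.0):
    """Errors in estimating H at time t from samples half a step away."""
    h_minus = h_exact(t - 0.5 * dt)  # value stored by the leapfrog scheme
    h_plus = h_exact(t + 0.5 * dt)   # value after one temporary step of dt
    unsynced = abs(h_minus - h_exact(t))
    synced = abs(0.5 * (h_minus + h_plus) - h_exact(t))
    return unsynced, synced


u1, s1 = sync_errors(0.1)
u2, s2 = sync_errors(0.05)
print(u1 / u2)  # close to 2: first-order error in dt
print(s1 / s2)  # close to 4: second-order error in dt
```

Halving ∆t roughly halves the unsynchronized error but reduces the averaged error by about a factor of four, consistent with the first-order and second-order behavior described above.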

The conceptually reversed process is required for specifying sources: the current density is specified at some point (for dipole sources) or in some region (for distributed current sources) in continuous space, and then must be restricted to a corresponding current source on the Yee grid. Meep performs this restriction using exactly the same code (the loop-in-chunks abstraction of Sec. 7) and the same weights as the interpolation procedure above. Mathematically, we


[Figure 4 (image not reproduced). Left panel: grid values $f_1$, $f_2$, $f_3$, $f_4$ at spacing ∆x, with $f = 0.32 f_1 + 0.48 f_2 + 0.08 f_3 + 0.12 f_4$ at an interior point. Right panel: a source J at the same point restricted to $J_1 = 0.32J$, $J_2 = 0.48J$, $J_3 = 0.08J$, $J_4 = 0.12J$.]

Figure 4: Left: a bilinear interpolation of values $f_{1,2,3,4}$ on the grid (red) to the value f at an arbitrary point. Right: the reverse process is restriction, taking a value J at an arbitrary point (e.g. a current source) and converting it into values on the grid. Restriction can be viewed as the transpose of interpolation and uses the same coefficients.

are exploiting a well-known concept (originating in multigrid methods) that restriction can be defined as the transpose of interpolation [30]. This is illustrated by a 2d example in Fig. 4. Suppose that the bilinear interpolation f (blue) of four grid points (red) is $f = 0.32 f_1 + 0.48 f_2 + 0.08 f_3 + 0.12 f_4$, which can be viewed as multiplying a vector of those fields by the row-vector [0.32, 0.48, 0.08, 0.12]. Conversely, if we place a point-dipole current source J (blue) at the same point, it is restricted on the grid (red) to values $J_1 = 0.32J$, $J_2 = 0.48J$, $J_3 = 0.08J$, and $J_4 = 0.12J$ as shown in Fig. 4, corresponding to multiplying J by the column vector $[0.32, 0.48, 0.08, 0.12]^T$.¹ Such a restriction has the property of preserving the sum (integral) of the currents, and typically leads to second-order convergence of the resulting fields as the resolution increases (see below). An example of the utility of this continuous restriction process is shown in Fig. 5 via the phenomenon of Cerenkov radiation [31]: a point charge q moving at a constant velocity v with a magnitude 1.05c/n exceeding the phase velocity c/n in the medium emits a shockwave-like radiation pattern, and this can be directly modelled in Meep by a continuously moving current source J = −vqδ(x − vt) [32]. In contrast, pixelizing the motion into discrete jumps to the nearest grid point leads to visible numerical artifacts in the radiation, as seen in the right panel of Fig. 5.
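The transpose relationship can be made concrete with the weights of the example above. In this sketch the function names are hypothetical, and the fractional offsets (0.6, 0.2) of the point from grid point 1 are inferred from the quoted weights rather than stated in the text; the ordering of the four grid points (1 and 2 along one row, 3 and 4 along the next) is likewise an assumption.

```python
def bilinear_weights(tx, ty):
    """Weights of the four surrounding grid points for a point at fractional
    offsets (tx, ty), each in [0, 1], measured from grid point 1."""
    return [(1 - tx) * (1 - ty), tx * (1 - ty), (1 - tx) * ty, tx * ty]


def interpolate(values, weights):
    """Row vector of weights times the vector of grid values."""
    return sum(w * f for w, f in zip(weights, values))


def restrict(source, weights):
    """Column vector of weights times the point source amplitude."""
    return [w * source for w in weights]


w = bilinear_weights(0.6, 0.2)     # approximately [0.32, 0.48, 0.08, 0.12]
grid_values = [1.0, 2.0, 3.0, 4.0] # arbitrary example values f1..f4
source = 2.5                       # arbitrary example amplitude J
grid_currents = restrict(source, w)

print(w)
print(sum(grid_currents))  # the sum of the restricted currents equals the source amplitude
# Transpose property: (restricted J) . f equals J * (interpolated f)
print(sum(j * f for j, f in zip(grid_currents, grid_values)),
      source * interpolate(grid_values, w))
```

Because the weights sum to one, the restricted currents always add up to the original amplitude, which is the sum-preserving property noted above.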

All of the second-order accuracy of FDTD and the above interpolations is generally spoiled to only first-order, however, if one directly discretizes a discontinuous material boundary [33, 35]. Moreover, directly discretizing a discontinuity in ε or µ leads to “stairstepped” interfaces that can only be varied in discrete jumps of one pixel at a time. Both of these problems are solved in Meep by using an appropriate subpixel smoothing of ε and µ: before discretizing, discontinuities are smoothed into continuous transitions over a distance of one pixel ∆x, using a carefully designed averaging procedure. Any subpixel smoothing technique will achieve the goal of continuously varying results as the geometry is continuously varied. In the case of Meep this is illustrated by Fig. 3: in a 2d photonic crystal (square lattice of dielectric rods), the lowest TE-polarization eigenfrequency (computed as in Sec. 5) varies continuously with the eccentricity of the elliptical rods for subpixel averaging, whereas the nonaveraged discontinuous discretization produces a stairstepped discontinuous eigenfrequency. On the other hand, most subpixel

¹ Technically, for a dipole-current source given by a delta function with amplitude I, the corresponding current density is $J = I/\Delta x^d$ in d dimensions.


[Figure 5 (image not reproduced). Two panels titled “smooth motion” and “pixelized motion”, for a charge moving with v = 1.05 c/n (0.35 pixels/∆t).]

Figure 5: Cerenkov radiation emitted by a point charge moving at a speed v = 1.05c/n exceeding the phase velocity of light in a homogeneous medium of index n = 1.5. Thanks to Meep’s interpolation (or technically restriction), the smooth motion of the source current (left panel) can be expressed as continuously varying currents on the grid, whereas the nonsmooth pixelized motion (no interpolation) (right panel) reveals high-frequency numerical artifacts of the discretization (counter-propagating wavefronts behind the moving charge).

smoothing techniques will not increase the accuracy of FDTD; on the contrary, smoothing discontinuous interfaces changes the structure, and generally introduces additional error into the simulation [33]. In order to design an accurate smoothing technique, we exploited recent results in perturbation theory that show how a particular subpixel smoothing can be chosen to yield zero first-order error [13, 33, 34, 36]. The results are shown in Fig. 6 and Fig. 7: for both computation of the eigenfrequencies (of an anisotropic photonic crystal) in Fig. 6 and the scattering loss from a bump on a strip waveguide in Fig. 7, the errors in Meep’s results decrease quadratically [$O(\Delta x^2)$], whereas doing no averaging leads to erratic linear convergence [$O(\Delta x)$]. Furthermore, Fig. 6 compares to other subpixel-averaging schemes, including the obvious strategy of simply averaging ε within each pixel [37], and shows that they lead to first-order convergence no better than no averaging at all.

The subpixel averaging is discussed in more detail elsewhere [33, 34, 36], so we only briefly summarize it here. In order for the smoothing to yield zero first-order perturbation, the smoothing scheme must be anisotropic. Even if the initial interface is between isotropic materials, one obtains a tensor ε (or µ) which uses the mean ε for fields parallel to the interface and the harmonic mean (inverse of mean of ε⁻¹) for fields perpendicular to the interface; this was initially proposed heuristically [38] and later shown to be justified via perturbation theory [13, 33]. (If the initial materials are anisotropic, a more complicated formula is needed [34, 36].) The key point is that, even if the physical structure consists entirely of isotropic materials, the discretized structure will use anisotropic materials. Stable simulation of anisotropic media requires an FDTD variant recently proposed in Ref. 39.
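As a rough illustration of the averaging rule just described, the following Python sketch builds the effective tensor for one pixel cut by a flat interface between two isotropic materials. The function name, its arguments, and the projection-matrix form (harmonic mean along the interface normal, arithmetic mean in the tangential directions) are assumptions made for this example; they are not taken from Meep's source code.

```python
def smoothed_epsilon(eps_a, eps_b, fill_a, normal):
    """Effective permittivity tensor for a pixel cut by a flat interface.

    eps_a, eps_b : isotropic permittivities on the two sides
    fill_a       : fraction of the pixel occupied by eps_a (between 0 and 1)
    normal       : interface normal (need not be normalized)
    All names here are hypothetical, chosen for this sketch.
    """
    norm = sum(c * c for c in normal) ** 0.5
    n = [c / norm for c in normal]
    mean = fill_a * eps_a + (1.0 - fill_a) * eps_b
    harmonic = 1.0 / (fill_a / eps_a + (1.0 - fill_a) / eps_b)
    dim = len(n)
    # eps = harmonic * P + mean * (1 - P), where P = n n^T projects onto the normal
    return [[mean * (1.0 if i == j else 0.0) + (harmonic - mean) * n[i] * n[j]
             for j in range(dim)] for i in range(dim)]

# Interface normal along x: the tensor is diagonal.
eps_axis = smoothed_epsilon(12.0, 1.0, 0.5, (1.0, 0.0))
# Interface normal along the diagonal: off-diagonal entries appear.
eps_diag = smoothed_epsilon(12.0, 1.0, 0.5, (1.0, 1.0))
```

With the normal along a grid axis the tensor stays diagonal, while a tilted interface produces off-diagonal entries, which is why the discretized structure needs an anisotropic FDTD update even when the physical materials are isotropic.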

There are a few limitations to this subpixel averaging. First, the case of perfect metals requires a different approach [40, 41] that is not yet implemented in Meep. Although Meep does not yet implement subpixel averaging for dispersive materials, there is numerical evidence that


[Figure 6 plot: relative error in ω versus resolution (units of pixels/λ), log-log axes; curves: no averaging, mean, mean inverse, Meep, perfect linear, perfect quadratic; insets: Ex for band 1 and band 13 in the xy plane.]

Figure 6: Appropriate subpixel averaging can increase the accuracy of FDTD with discontinuous materials [33, 34]. Here, relative error ∆ω/ω (comparing to the “exact” ω0 from a planewave calculation [15]) for an eigenmode calculation (as in Sec. 5) for a cubic lattice (period a) of 3d anisotropic-ε ellipsoids (right inset) versus spatial resolution (units of pixels per vacuum wavelength λ), for a variety of subpixel smoothing techniques. Straight lines for perfect linear (black dashed) and perfect quadratic (black solid) convergence are shown for reference. Most curves are for the first eigenvalue band (left inset shows Ex in xy cross-section of unit cell), with vacuum wavelength λ = 5.15a. Hollow squares show Meep’s method for band 13 (middle inset), with λ = 2.52a. Meep’s method for bands 1 and 13 is shown for resolutions up to 100 pixels/a.

similar accuracy improvements are obtained in that case by the same technique [42], and we suspect that a similar derivation can be applied (using the unconjugated form of perturbation theory for the complex-symmetric Maxwell equations in reciprocal media with losses [43]). Second, once the smoothing eliminates the first-order error, the presence of sharp corners (associated with field singularities) introduces an error intermediate between first- and second-order [33], which we hope to address in future work. Third, the fields directly on the interface are still at best first-order accurate even with subpixel smoothing; however, these localized errors are equivalent to currents that radiate zero power to first order [36, 44]. The improved accuracy from smoothing is therefore obtained for fields evaluated off of the interface, as in the scattered flux integrated over a surface away from the interface (Fig. 7), for nonlocal properties like resonant frequencies and eigenfrequencies (Fig. 6), and for overall integrals of fields and energies [to which the interface contributes only O(∆x) of the integration domain and hence first-order errors on the interface have a second-order effect].


[Figure 7 plot: relative error in scattered power versus resolution (units of pixels/a), log-log axes; curves: subpixel smoothing, no smoothing, perfect linear, perfect quadratic; inset labels: a, bump.]

Figure 7: The relative error in the scattered power from a small semicircular bump in a dielectric waveguide (ε = 12), excited by a point-dipole source in the waveguide (geometry and fields shown in inset), as a function of the computational resolution. Appropriate subpixel smoothing of the dielectric interfaces leads to roughly second-order [O(∆x²)] convergence (red squares), whereas the unsmoothed structure has only first-order convergence (blue circles).

4. Materials

Time-dependent methods for electromagnetism, given their generality, allow for the simulation of a broad range of material systems. Certain classes of materials, particularly active and nonlinear materials which do not conserve frequency, are ideally suited for modeling by such methods. Materials are represented in Maxwell’s equations (1) and (2) via the relative permittivity ε(x) and permeability µ(x), which in general depend on position, frequency (material dispersion) and the fields themselves (nonlinearities). Meep currently supports arbitrary anisotropic material tensors, anisotropic dispersive materials (Lorentz–Drude models and conductivities, both magnetic and electric), and nonlinear materials (both second- and third-order nonlinearities), which taken together permit investigations of a wide range of physical phenomena. The implementation of these materials in Meep is mostly based on standard techniques [1], so we will focus here on two places where Meep differs from the usual approach. For nonlinearities, we use a Padé approximant to avoid solving cubic equations at each step. For PML absorbing media in cylindrical coordinates, we only use a “quasi-PML” [46] based on a Cartesian PML, but explain why its performance is comparable to a true PML while requiring less computational effort.

4.1. Nonlinear materials

Optical nonlinearities arise when large field intensities induce changes in the local ε or µ to produce a number of interesting effects: temporal and spatial soliton propagation, optical bistability, self-focusing of optical beams, second- and third-harmonic generation, and many other


[Figure 8 plots (two panels): field convergence ~ |E_(L+1) − E_L|² versus absorber length L/λ, log-log axes; curves: linear, quadratic, cubic, with reference slopes 1/L⁶, 1/L⁸, 1/L¹⁰; insets: quasi-PML geometry in (r, z) and Cartesian PML geometry, each with PML thickness L.]

Figure 8: The performance of a quasi-PML in the radial direction (cylindrical coordinates, left panel) at a resolution of 20 pixels/λ is nearly equivalent to that of a true PML (in Cartesian coordinates, right panel). The plot shows the difference in the electric field Ez (insets) from a point source between simulations with PML thickness L and L+1, which is a simple proxy for the PML reflections [45]. The different curves are for PML conductivities that turn on as (x/L)^d for d = 1, 2, 3 in the PML, leading to different rates of convergence of the reflection [45].

effects [47, 48]. Such materials are usually described by a power-series expansion of D in terms of E and various susceptibilities. In many common materials, or when considering phenomena in a sufficiently narrow bandwidth (such as the resonantly enhanced nonlinear effects [49] well-suited to FDTD calculations), these nonlinear susceptibilities can be accurately approximated via nondispersive (instantaneous) effects [50]. Meep supports instantaneous isotropic (or diagonal anisotropic) nonlinear effects of the form:

$$ D_i - P_i = \varepsilon^{(1)} E_i + \chi_i^{(2)} E_i^2 + \chi_i^{(3)} |\mathbf{E}|^2 E_i, \qquad (3) $$

where $\varepsilon^{(1)}$ represents all the linear nondispersive terms and $P_i$ is a dispersive polarization $\mathbf{P} = \chi^{(1)}_{\mathrm{dispersive}}(\omega)\mathbf{E}$ from dispersive materials such as Lorentz media [1]. (A similar equation relates B and H.) Implementing this equation directly, however, would require one to solve a cubic equation at each time step [1, sec. 9.6], since D is updated from ∇ × H before updating E from D.

However, eq. (3) is merely a power series approximation for the true material response, valid for sufficiently small field intensities, so it is not necessary to insist that it be solved exactly. Instead, we approximate the solution of eq. (3) by a Padé approximant [51], which matches the “exact” cubic solution to high-order accuracy by the rational function:

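$$ E_i = \left[ \frac{1 + \left( \frac{\chi^{(2)}}{[\varepsilon^{(1)}]^2} \tilde{D}_i \right) + 2 \left( \frac{\chi^{(3)}}{[\varepsilon^{(1)}]^3} \|\tilde{\mathbf{D}}\|^2 \right)}{1 + 2 \left( \frac{\chi^{(2)}}{[\varepsilon^{(1)}]^2} \tilde{D}_i \right) + 3 \left( \frac{\chi^{(3)}}{[\varepsilon^{(1)}]^3} \|\tilde{\mathbf{D}}\|^2 \right)} \right] [\varepsilon^{(1)}]^{-1} \tilde{D}_i, \qquad (4) $$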

where $\tilde{D}_i = D_i - P_i$. For the case of isotropic $\varepsilon^{(1)}$ and $\chi^{(2)} = 0$, so that we have a purely Kerr ($\chi^{(3)}$) material, this matches the “exact” cubic E to $O(D^7)$ error. With $\chi^{(2)} \neq 0$, the error is $O(D^4)$.

For more complicated dispersive nonlinear media or for arbitrary anisotropy in $\chi^{(2)}$ or $\chi^{(3)}$, one approach that Meep may implement in the future is to incorporate the nonlinear terms in the auxiliary differential equations for a Lorentz medium [1].
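The error orders quoted for eq. (4) can be checked numerically in the scalar (1d) case. The Python sketch below is only an illustration under stated assumptions: it takes P = 0, uses a Newton iteration as the “exact” reference solution of eq. (3), and all function names are hypothetical rather than part of Meep.

```python
def pade_field(d_i, d_norm2, eps1, chi2=0.0, chi3=0.0):
    """Eq. (4): rational approximation of E_i from D~_i and ||D~||^2."""
    a2 = chi2 / eps1**2 * d_i
    a3 = chi3 / eps1**3 * d_norm2
    return (1 + a2 + 2 * a3) / (1 + 2 * a2 + 3 * a3) * d_i / eps1

def exact_field(d, eps1, chi2=0.0, chi3=0.0, iterations=50):
    """Newton solve of eps1*E + chi2*E^2 + chi3*E^3 = d (scalar case, P = 0)."""
    e = d / eps1
    for _ in range(iterations):
        residual = eps1 * e + chi2 * e**2 + chi3 * e**3 - d
        slope = eps1 + 2 * chi2 * e + 3 * chi3 * e**2
        e -= residual / slope
    return e

def pade_error(d, **material):
    """Absolute difference between the Pade value and the Newton reference."""
    return abs(pade_field(d, d * d, **material) - exact_field(d, **material))

kerr = dict(eps1=1.0, chi3=1.0)   # purely Kerr material
quad = dict(eps1=1.0, chi2=1.0)   # purely second-order material

# Halving D should shrink the error by roughly 2**7 (Kerr) or 2**4 (chi2).
kerr_ratio = pade_error(0.1, **kerr) / pade_error(0.05, **kerr)
quad_ratio = pade_error(0.02, **quad) / pade_error(0.01, **quad)
```

The two ratios are close to, but not exactly, 128 and 16, because the next terms in the series still contribute at these field strengths.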

4.2. Absorbing boundary layers: PML, pseudo-PML, and quasi-PML

A perfectly matched layer (PML) is an artificial absorbing medium that is commonly used to truncate computational grids for simulating wave equations (e.g. Maxwell’s equations), and is designed to have the property that interfaces between the PML and adjacent media are reflectionless in the exact wave equation [1]. There are various interchangeable formulations of PML for FDTD methods [1], which are all equivalent to a coordinate stretching of Maxwell’s equations into complex spatial coordinates; Meep implements a version of the uniaxial PML (UPML), expressing the PML as an effective dispersive anisotropic ε and µ [1]. Meep provides support for arbitrary user-specified PML absorption profiles (which have an important influence on reflections due to discretization error and other effects) for a given round-trip reflection (describing the strength of the PML in terms of the amplitude of light passing through the PML, reflecting off the edge of the computational cell, and propagating back) [45]. For the case of periodic media such as photonic crystals, the medium is not analytic and the premise of PML’s reflectionless property is violated; in this case, a “PML” material overlapped with the photonic crystal is only a “pseudo-PML” that is reflectionless only in the limit of a sufficiently thick and gradual absorber, and control over the absorption profile is important [45].

For the radial direction in cylindrical coordinates, a true PML can be derived by coordinate-stretching, but it requires more storage and computational effort than the Cartesian UPML [52, 53], as well as increasing code complexity. Instead, we chose to implement a quasi-PML [46], which simply consists of using the Cartesian UPML materials as an approximation for the true radial PML. This approximation becomes more and more accurate as the outer radius of the computational cell increases, because the implicit curvature of the PML region decreases with radius and approaches the Cartesian case. Furthermore, one must recall that every PML has reflections once space is discretized [1], which can be mitigated by gradually turning on the PML absorption over a finite-thickness PML layer. The quasi-PML approximation is likewise mitigated by the same gradual absorption profile, and the only question is that of the constant factor in the reflection convergence: how thick does the quasi-PML need to be to achieve low reflections, compared to a true PML? Figure 8 shows that, for a typical calculation, the performance of the quasi-PML in cylindrical coordinates (left) is comparable to that of a true PML in Cartesian coordinates (right). Here, we plot a measure of the reflection from the PML as a function of the PML absorber length L, for a fixed round-trip reflection [45], using as a measure of the reflection the “field convergence” factor: the difference between the E field at a given point for simulations with PML absorber lengths L and L + 1. The PML conductivity σ(x) is turned on gradually as (x/L)^d for d = 1, 2, 3, and it can be shown that this leads to reflections that decrease as 1/L^(2d+2) and field-convergence factors that decrease as 1/L^(2d+4) [45]. Precisely


these decay rates are observed in Fig. 8, with similar constant coefficients. As the resolution is increased (approaching the exact wave equations), the constant coefficient in the Cartesian PML plot will decrease (approaching zero reflection), while the quasi-PML’s constant coefficient will saturate at some minimum (corresponding to its finite reflectivity in the exact wave equation for a fixed L). This difference seems of little practical concern, however, because the reflection from a one-wavelength thick quasi-PML at a moderate resolution (20 pixels/λ) is already so low.
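To make the idea of a polynomial absorption profile at fixed round-trip reflection concrete, here is a small Python sketch. It rests on a simplifying assumption that is not stated in the text: in suitably normalized units, the amplitude of a normally incident wave decays as the exponential of minus the integral of σ on each pass through the layer. The function names and the target value 1e-16 are hypothetical choices for the example.

```python
import math

def pml_sigma(x, length, degree, round_trip_reflection):
    """Polynomial profile sigma(x) = sigma0 * (x / length)**degree.

    Assumption (not from the text): the amplitude decays as
    exp(-integral of sigma) per pass, so a round trip gives
    exp(-2 * sigma0 * length / (degree + 1)); sigma0 is set from that.
    """
    sigma0 = -(degree + 1) * math.log(round_trip_reflection) / (2.0 * length)
    return sigma0 * (x / length) ** degree

def round_trip(length, degree, target, samples=2000):
    """Midpoint-rule check that the profile reproduces the target reflection."""
    dx = length / samples
    total = sum(pml_sigma((k + 0.5) * dx, length, degree, target) * dx
                for k in range(samples))
    return math.exp(-2.0 * total)

# Same round-trip reflection for linear, quadratic, and cubic turn-on.
checks = {d: round_trip(length=2.0, degree=d, target=1e-16) for d in (1, 2, 3)}
```

Under this normalization, a thicker layer at fixed round-trip reflection has a smaller peak conductivity and therefore a more gradual turn-on, which is the mechanism behind the decreasing discretization reflections discussed above.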

5. Enabling typical computations

Simulating Maxwell’s equations in the time domain enables the investigation of problems inherently involving multiple frequencies, such as nonlinearities and active media. However, it is also well adapted to solving frequency domain problems since it can solve large bandwidths at once, for example analyzing resonant modes or computing transmission/reflection spectra. In this section, we describe techniques Meep uses to efficiently compute scattering spectra and resonant modes in the time domain. Furthermore, we describe how the time domain method can be adapted to a purely frequency domain solver while sharing almost all of the underlying code.

5.1. Computing flux spectra

A principal task of computational time-domain tools is the investigation of transmission or scattering spectra from arbitrary structures, where one wants to compute the transmitted or scattered power in a particular direction as a function of the frequency of incident light. One can solve for the power at many frequencies in a single time-domain simulation by Fourier transforming the response to a short pulse. Specifically, for a given surface S, one wishes to compute the integral of the Poynting flux:

$$ P(\omega) = \Re \int_S \mathbf{E}_\omega(\mathbf{x})^* \times \mathbf{H}_\omega(\mathbf{x}) \cdot d\mathbf{A}, \qquad (5) $$

where Eω and Hω are the fields produced by a source at frequency ω, and ℜ denotes the real part of the expression. The basic idea, in time-domain, is to use a short-pulse source (covering a wide bandwidth including all frequencies of interest), and compute Eω and Hω from the Fourier transforms of E(t) and H(t). There are several different ways to compute these Fourier transforms. For example, one could store the electric and magnetic fields throughout S over all times and at the end of the simulation perform a discrete-time Fourier transform (DTFT) of the fields:

$$ \mathbf{E}_\omega = \sum_n e^{i\omega n \Delta t} \, \mathbf{E}(n\Delta t) \, \Delta t, \qquad (6) $$

for all frequencies (ω) of interest, possibly exploiting a fast Fourier transform (FFT) algorithm. Such an approach has the following computational cost: for a simulation having T timesteps, F ≪ T frequencies to compute, N_S fields in the flux region and N pixels in the entire computational cell, this approach requires Θ(N + N_S T) storage and Θ(NT + T log T) time (using an FFT-based chirp-z algorithm [54]).² The difficulty with this approach is that if a long simulation (large T) is required to obtain a high frequency resolution by the usual uncertainty relation [56], then the Θ(N_S T) storage requirements for the fields E(t) and H(t) at each point in S become

2 Here, Θ has the usual meaning of an asymptotic tight bound [55].


[Figure 9 plot: relative error in Q versus total time (units of optical periods), log-log axes; curves: filter diagonalization, curve fitting; inset: photonic-crystal cavity surrounded by PML.]

Figure 9: Relative error in the quality factor Q for a photonic-crystal resonant cavity (inset, period a) with Q ∼ 10⁶, versus simulation time in units of optical periods of the resonance. Blue circles: filter-diagonalization method. Red squares: least-squares fit of energy in cavity to a decaying exponential. Filter-diagonalization requires many fewer optical periods than the decay time Q, whereas curve fitting requires a simulation long enough for the fields to decay significantly.

excessive. Instead, Meep accumulates the DTFT summation of the fields at every point in S as the simulation progresses; once the time stepping has terminated, eq. (5) can be evaluated using these Fourier-transformed fields.³ The computational cost of this approach is Θ(N + N_S F) storage [much less than Θ(N_S T) if F ≪ T] and Θ(NT + N_S F T) time. Although our current approach works well, another possible approach that we have been considering is to use Padé approximation: one stores the fields at every timestep on S, but instead of using the DTFT one constructs a Padé approximant to extrapolate the infinite-time DTFT from a short time series [57]. This requires Θ(N + N_S T) storage (but T is potentially much smaller) and O(NT + T log² T) time [58].
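The accumulation strategy can be sketched in a few lines of Python. This is a minimal single-point illustration, not Meep's implementation: the generator `pulse` is a hypothetical stand-in for a time-stepped field, and the function and variable names are chosen for the example only.

```python
import cmath
import math

def accumulate_dtft(field_stream, omegas, dt):
    """Running sum of exp(i*w*n*dt) * E(n*dt) * dt, following eq. (6).

    field_stream yields one sample per time step, so only len(omegas)
    complex accumulators are kept, never the full time series.
    """
    sums = [0j] * len(omegas)
    for n, sample in enumerate(field_stream):
        for k, w in enumerate(omegas):
            sums[k] += cmath.exp(1j * w * n * dt) * sample * dt
    return sums

def pulse(steps, dt, w0=2 * math.pi * 0.15, width=20.0):
    """Stand-in for a simulated field: a Gaussian-modulated sinusoid."""
    t0 = 5 * width
    for n in range(steps):
        t = n * dt
        yield math.exp(-((t - t0) / width) ** 2 / 2) * math.cos(w0 * t)

dt = 0.05
steps = 4000
omegas = [2 * math.pi * f for f in (0.10, 0.15, 0.20)]
spectrum = accumulate_dtft(pulse(steps, dt), omegas, dt)
power = [abs(value) ** 2 for value in spectrum]
```

Storage here scales with the number of frequencies F rather than the number of time steps T, while the work per step scales with F, mirroring the Θ(N_S F) storage and Θ(N_S F T) time terms quoted above.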

5.2. Analyzing resonant modes

Another major goal of time-domain simulations is analysis of resonant phenomena, specifically by determining the resonant frequency ω0 and the quality factors Q (i.e., the number of optical cycles 2π/ω0 for the field to decay by e^(−2π)) of one or more resonant modes. One straightforward and common approach to compute ω0 and Q is by computing the DTFT of the field at some point in the cavity in response to a short pulse [1]: ω0 is then the center of a peak in the DTFT and 1/Q is the fractional width of the peak at half maximum. The problem with this approach is that the Fourier uncertainty relation (equivalently, spectral leakage from the finite

3 It is tempting to instead accumulate the Fourier transform of the Poynting flux at each time, but this is not correct since the flux is not a linear function of the fields.


time window [56]) means that resolving the peak in this way requires a simulation much longer than Q/ω0 (problematic for structures that may have very high Q, even 10⁹ or higher [59]). Alternatively, one can perform a least squares fit of the field time-series within the cavity to an exponentially decaying sinusoid, but this leads to an ill-conditioned, non-convex, nonlinear fitting problem (and is especially difficult if more than one resonant mode may be present). If only a single resonant mode is present, one can perform a least-squares fit of the energy in the cavity to a decaying exponential in order to determine Q, but a long simulation is still required to accurately resolve a large Q (as shown below). A more accurate and efficient approach, requiring only a short simulation even for very large Q values, is the technique of filter diagonalization originally developed for NMR spectroscopy, which transforms the time-series data into a small eigenproblem that is solved for all resonant frequencies and quality factors at once (even for multiple overlapping resonances) [60]. Chapter 16 of Ref. 1 compared the DFT peak-finding method with filter-diagonalization by attempting to resolve two near-degenerate modes in a microcavity, and demonstrated the latter’s ability to accurately resolve closely-spaced peaks with as much as a factor of five times fewer timesteps. In our own work, we have used filter diagonalization to compute quality factors of 10⁸ or more using simulations only a few hundred optical cycles in length [59]. We quantify the ability of filter diagonalization to resolve a large Q ∼ 10⁶ in Fig. 9, comparing the relative error in Q versus simulation time for filter diagonalization and the least-squares energy-fit method above. (The specific cavity is formed by a missing rod in a two-dimensional photonic crystal consisting of a square lattice of dielectric rods in air with period a, radius 0.2a, and ε = 12 [22].) Figure 9 demonstrates that filter diagonalization is able to identify the quality factor using almost an order of magnitude fewer time steps than the curve fitting method. (Another possible technique to identify resonant modes uses Padé approximants, which can also achieve high accuracy from a short simulation [57, 61].)
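For reference, the simpler curve-fitting alternative (a least-squares fit of the cavity energy to a decaying exponential) can be sketched as follows in Python. The sketch assumes the common convention that the stored energy decays as exp(−ω0 t/Q), uses synthetic noise-free data, and all names and parameter values are hypothetical; it does not implement filter diagonalization.

```python
import math

def fit_quality_factor(times, energies, omega0):
    """Least-squares line through log(energy) versus time.

    Assumes energy ~ exp(-omega0 * t / Q), so that Q = -omega0 / slope.
    """
    logs = [math.log(u) for u in energies]
    n = len(times)
    t_mean = sum(times) / n
    l_mean = sum(logs) / n
    slope = (sum((t - t_mean) * (l - l_mean) for t, l in zip(times, logs))
             / sum((t - t_mean) ** 2 for t in times))
    return -omega0 / slope

omega0 = 2 * math.pi * 0.3      # hypothetical resonance, in units of c/a
true_q = 500.0                  # hypothetical quality factor
times = [0.5 * n for n in range(400)]
energies = [math.exp(-omega0 * t / true_q) for t in times]
estimated_q = fit_quality_factor(times, energies, omega0)
```

With clean synthetic data the fit recovers Q, but for a very large Q the energy changes very little over a short run, so the slope is small and easily swamped by other modes or noise; this is consistent with the long simulation times that the curve-fitting method needs in Fig. 9.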

5.3. Frequency-domain solver

A common electromagnetic problem is to find the fields that are produced in a geometry in response to a source at a single frequency ω. In principle, the solution of such problems need not involve time at all, but involve solving a linear equation purely in the frequency domain [22, appendix D]; this can be achieved by many methods, such as finite-element methods [6, 7, 8], boundary-element methods [9, 10, 11, 12], or finite-difference frequency-domain methods [62]. However, if one already has a full-featured parallel FDTD solver, it is attractive to exploit that solver for frequency-domain problems when they arise. The most straightforward approach is to simply run a simulation with a constant-frequency source: after a long time, when all transient effects from the source turn-on have disappeared, the result is the desired frequency-domain response. The difficulty with this approach is that a very long simulation may be required, especially if long-lived resonant modes are present at nearby frequencies (in which case a time ≫ Q/ω is required to reach steady state). Instead, we show how the FDTD time-step can be used to directly plug a frequency-domain problem into an iterative linear solver, finding the frequency-domain response in the equivalent of many fewer timesteps while exploiting the FDTD code almost without modification.

The central component of any FDTD algorithm is the time step: an operation that advances the field by ∆t in time. In order to extract a frequency-domain problem from this operation, we first express the timestep as an abstract linear operation: if $f^n$ represents all of the fields (electric and magnetic) at time step n, then (in a linear time-invariant structure) the time step operation can be expressed in the form:

$$ f^{n+1} = T_0 f^n + s^n, \qquad (7) $$


[Figure 10 plot: RMS error in fields versus time steps or matrix-vector products, log-log axes; curves: frequency domain, time domain; inset: point source in vacuum surrounded by PML.]

Figure 10: Root-mean-square error in fields in response to a constant-frequency point source in vacuum (inset), for frequency-domain solver (red squares, adapted from Meep time-stepping code) vs. time-domain method (blue circles, running until transients decay away).

where $T_0$ is the timestep operator with no sources and $s^n$ are the source terms (currents) from that time step. Now, suppose that one has a time-harmonic source $s^n = e^{-i\omega n\Delta t} s$ and wishes to solve for the resulting time-harmonic (steady state) fields $f^n = e^{-i\omega n\Delta t} f$. Substituting these into eq. (7), we obtain the following linear equation for the field amplitudes f:

$$ \left( T_0 - e^{-i\omega\Delta t} \right) f = -s. \qquad (8) $$

This can then be solved by an iterative method, and the key fact is that iterative methods for Ax = b only require one to supply a function that multiplies the linear operator A by a vector [63]. Here, A is represented by $T_0 - e^{-i\omega\Delta t}$ and hence one can simply use a standard iterative method by calling the unmodified timestep function from FDTD to provide the linear operator. To obtain the proper right-hand sides, one merely needs to execute a single timestep (7), with sources, starting from zero field f = 0. Since in general this linear operator is not Hermitian (especially in the presence of PML absorbing regions), we employ the BiCGSTAB-L algorithm (a generalization of the stabilized biconjugate gradient algorithm, where increasing the integer parameter L trades off increased storage for faster convergence) [64, 65].
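The construction of eq. (8) from a black-box timestep function can be illustrated with a toy model in Python. Everything below is an assumption made for the sketch: `timestep` is a small lossy linear map standing in for an FDTD step, and the solver is a plain textbook BiCGSTAB written out by hand, not the BiCGSTAB-L variant that the text says Meep uses.

```python
import cmath

def timestep(f, s, gain=0.8, coupling=0.05):
    """Toy stand-in for an FDTD step, f -> T0 f + s (lossy and linear)."""
    n = len(f)
    out = []
    for j in range(n):
        neighbours = f[j - 1] + f[(j + 1) % n]
        rotated = cmath.exp(-0.3j * (j + 1)) * f[j]
        out.append(gain * (rotated + coupling * neighbours) + s[j])
    return out

def dot(a, b):
    return sum(x.conjugate() * y for x, y in zip(a, b))

def bicgstab(matvec, b, tol=1e-10, maxiter=200):
    """Plain BiCGSTAB; it only needs a function computing A times a vector."""
    x = [0j] * len(b)
    r, r_hat = list(b), list(b)
    rho = alpha = omega = 1 + 0j
    v, p = [0j] * len(b), [0j] * len(b)
    b_norm = abs(dot(b, b)) ** 0.5
    for _ in range(maxiter):
        rho_new = dot(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        v = matvec(p)
        alpha = rho_new / dot(r_hat, v)
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        if abs(dot(s, s)) ** 0.5 <= tol * b_norm:
            break
        t = matvec(s)
        omega = dot(t, s) / dot(t, t)
        x = [xi + omega * si for xi, si in zip(x, s)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        rho = rho_new
        if abs(dot(r, r)) ** 0.5 <= tol * b_norm:
            break
    return x

w, dt = 1.0, 0.1
phase = cmath.exp(-1j * w * dt)
source = [1.0 + 0j] + [0j] * 7
zeros = [0j] * 8

def apply_A(x):
    # A x = T0 x - exp(-i*w*dt) x, built from the unmodified timestep function
    stepped = timestep(x, zeros)
    return [ti - phase * xi for ti, xi in zip(stepped, x)]

# One step from zero field returns s; eq. (8) needs -s on the right-hand side.
rhs = [-value for value in timestep(zeros, source)]
f_freq = bicgstab(apply_A, rhs)

# Reference: brute-force time stepping until the transients have decayed.
f_time = list(zeros)
n_steps = 400
for n in range(n_steps):
    f_time = timestep(f_time, [phase ** n * value for value in source])
f_time = [value / phase ** n_steps for value in f_time]
```

In this toy problem the iterative solve needs a handful of operator applications, while the time-stepping reference needs hundreds of steps for its transients to decay; the gap would be expected to grow when the system has weakly damped modes.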

This technique means that all of the features implemented in our time-domain solver (not only arbitrary materials, subpixel averaging, and other physical features, but also parallelization, visualization, and user-interface features) are immediately available as a frequency-domain solver. To demonstrate the performance of this frequency-domain solver over the straightforward


[Figure 11 plot: RMS error in fields versus time steps or matrix-vector products, log-log axes; curves: frequency domain (L=10), frequency domain (L=2), time domain (width=175 periods), time domain (width=0); inset: ring resonator surrounded by PML.]

Figure 11: Root-mean-square error in fields in response to a constant-frequency point source exciting one of several resonant modes of a dielectric ring resonator (inset, ε = 11.56), for frequency-domain solver (red squares, adapted from Meep time-stepping code) vs. time-domain method (magenta triangles, running until transients decay away). Green diamonds show frequency-domain BiCGSTAB-L solver for five times more storage, accelerating convergence. Blue circles show time-domain method for a more gradual turn-on of source, which avoids exciting long-lived resonances at other frequencies.

approach of simply running a long simulation until transients have disappeared, we computed the root-mean-square error in the field as a function of the number of time steps (or evaluations of $T_0$ by BiCGSTAB-L) for two typical simulations. The first simulation, shown in Fig. 10, consists of a point source in vacuum surrounded by PML (inset). The frequency-domain solver (red squares) shows rapid, near-exponential convergence, while the error in the time-domain method (blue circles) decreases far more gradually (in fact, only polynomially). A much more challenging problem is to obtain the frequency-domain response of a cavity (ring resonator) with multiple long-lived resonant modes: in the time domain, these modes require a long simulation (∼ Q) to reach steady state, whereas in the frequency domain the resonances correspond to poles (near-zero eigenvalues of A) that increase the condition number and hence slow convergence [63]. Figure 11 shows the results for a ring resonator cavity with multiple closely-spaced resonant modes, excited at one of the resonant frequencies (inset). Although both frequency- and time-domain methods take longer to converge than for the non-resonant case of Fig. 10, the advantage of the frequency-domain’s exponential convergence is even more clear. The convergence is accelerated in frequency domain by using L = 10 (green diamonds) rather than L = 2 (at the expense of more storage). In time domain, the convergence is limited by the decay of high-Q modes at other frequencies, and the impact of these modes can be reduced by turning on


the constant-frequency source more gradually (magenta triangles, hyperbolic-tangent turn-on of the source over 175 optical periods).

This is by no means the most sophisticated possible frequency-domain solver. For example, we currently do not use any preconditioner for the iterative scheme [63]. In two dimensions, a sparse-direct solver may be far more efficient than an iterative scheme [63]. The key point, however, is that programmer time is much more expensive than computer time, and this technique allows us to obtain substantial improvements in solving frequency-domain problems with only minimal changes to an existing FDTD program.

6. User interface and scripting

In designing the style of user interaction in Meep, we were guided by two principles. First, in research or design one hardly ever needs just one simulation; one almost always performs a whole series of simulations for a class of related problems, exploring the parameter dependencies of the results, optimizing some output as a function of the input parameters, or looking at the same geometry under a sequence of different stimuli. Second, there is the Unix philosophy: “Write programs that do one thing and do it well” [66]. Meep should perform electromagnetic simulations, while for additional functionality it should be combined with other programs and libraries via standard interfaces like files and scripts.

Both of these principles argue against the graphical CAD-style interface common in commercial FDTD software. First, while graphical interfaces provide a quick and attractive route to setting up a single simulation, they are not so convenient for a series of related simulations. One commonly encounters problems where the size/position of certain objects is determined by the size/position of other objects, where the number of objects is itself a parameter (such as a photonic-crystal cavity surrounded by a variable number of periods [22]), where the length of the simulation is controlled by a complicated function of the fields, where one output is optimized as a function of some parameter, and many other situations that become increasingly cumbersome to express via a set of graphical tools and dialog boxes. Second, we don’t want to write a mediocre CAD program: if we wanted to use a CAD program, we would use a professional-quality one, export the design to a standard interchange format, and write a conversion program to turn this format into what Meep expects. The most flexible and self-contained interface is, instead, to allow the user to control the simulation via an arbitrary program. Meep allows this style of interaction at two levels: via a low-level C++ interface, and via a standard high-level scripting language (Scheme) implemented by an external library (GNU Guile). The potential slowness of the scripting language is irrelevant because all of the expensive parts of the FDTD calculation are implemented in C/C++.

The high-level scripting interface to Meep is documented in detail, with several tutorials, on the Meep web page (http://ab-initio.mit.edu/meep), so we restrict ourselves to a single short example in order to convey the basic flavor. This example, in Fig. 12, computes the (2d) fields in response to a point source located within a dielectric waveguide. We first set the size of the computational cell to 16 × 8 (via geometry-lattice, so-called because it determines the lattice vectors in the periodic case); recall that the interpretation of the unit of distance is arbitrary and up to the user (it could be 16 µm × 8 µm, in which case the frequency units are c/µm, or 16 mm × 8 mm with frequency units of c/mm, or any other convenient distance unit). Let us call this arbitrary unit of distance a. Then we specify the geometry within the cell as a list of geometric objects like blocks, cylinders, etcetera (in this case a single block defining the waveguide with ε = 12), or optionally by an arbitrary user-defined function ε(x, y) (and


; computational cell & materials
(set! geometry-lattice (make lattice (size 16 8 no-size)))
(set! geometry (list
                (make block (center 0 0) (size infinity 1)
                      (material (make dielectric (epsilon 12))))))
(set! pml-layers (list (make pml (thickness 1.0))))

; current source
(set! sources (list
               (make source
                 (src (make continuous-src (frequency 0.15)))
                 (component Ez) (center -7 0))))

; run & output (HDF5 files, viewed with a plotting program)
(set! resolution 10)
(run-until 200
           (at-beginning output-epsilon)
           (at-end output-efield-z))

Figure 12: A simple Meep example showing the Ez field in a dielectric waveguide (ε = 12) from a point source at a given frequency. A plot of the resulting field (blue/white/red = positive/zero/negative) is in the background, and in the foreground is the input file in the high-level scripting interface (the Scheme language).

µ, etcetera). A layer of PML is then specified around the boundaries with thickness 1; this layer lies inside the computational cell and overlaps the waveguide, which is necessary in order to absorb waveguide modes when they reach the edge of the cell. We add a point source, in this case an electric-current source J in the z direction (sources of arbitrary spatial profile can also be specified). The time-dependence of the source is a sharp turn-on to a continuous-wave source cos(ωt) at the beginning of the simulation; gradual turn-ons, Gaussian pulses, or arbitrary user-specified functions of time can also be specified. The frequency is 0.15 in units of c/a, corresponding to a vacuum wavelength λ = a/0.15 (e.g. λ ≈ 6.67 µm if a = 1 µm). We set the resolution to 10 pixels per unit distance (10 pixels/a), so that the entire computational cell is 160 × 80 pixels, and then run for 200 time units (units of a/c), corresponding to 200 × 0.15 = 30 optical periods. We output the dielectric function at the beginning, and the Ez field at the end.
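The unit conversions in this walkthrough can be reproduced with a few lines of Python. The variable names are chosen for this sketch only, and a = 1 µm is just the example value mentioned above.

```python
frequency = 0.15          # source frequency, units of c/a
cell = (16, 8)            # cell size, units of a
resolution = 10           # pixels per unit distance a
run_time = 200            # simulation time, units of a/c
a_microns = 1.0           # example choice of the length unit: a = 1 µm

wavelength_microns = a_microns / frequency           # vacuum wavelength
pixels = tuple(size * resolution for size in cell)   # grid dimensions
optical_periods = run_time * frequency                # periods simulated
pixels_per_wavelength = resolution / frequency        # vacuum sampling density
```

Because every quantity is expressed in terms of a, changing the interpretation of a (for example to 1 mm) rescales the physical wavelength without touching the script.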

In keeping with the Unix philosophy, Meep is not a plotting program; instead, it outputs fields and related data to the standard HDF5 format for scientific datasets [67], which can be read by many other programs and visualized in various ways. (We also provide a way to effectively “pipe” the HDF5 output to an external program within Meep: for example, to output the HDF5 file, convert it immediately to an image with a plotting program, and then delete the HDF5 file; this is especially useful for producing animations consisting of hundreds of frames.)

Another important technique to maintain flexibility is that of higher-order functions [68]: wherever it is practical, our functions take functions as arguments instead of (or in addition to) numbers. Thus, for example, instead of specifying special input codes for all possible source distributions in space and time, we simply allow a user-defined function to be used. More subtly, the arguments output-epsilon and output-efield-z to the run-until function in Fig. 12 are actually functions themselves: we allow the user to pass arbitrary “step functions” to run-until that are called after every FDTD timestep and which can perform arbitrary computations on the fields as desired (or halt the computation if a desired condition is reached). The output-efield-z is simply a predefined step function that outputs Ez. These step functions can be modified by transformation functions like at-end, which take step functions as arguments and return a new step function that only calls the original step functions at specified times (at the end of the simulation, or the beginning, or at certain intervals, for example). In this way, great flexibility in the output and computations is achieved. One can, for example, output a given field component only at certain time intervals after a given time, and only within a certain subvolume or slice of the computational cell, simply by composing several of these transformations. One can even output an arbitrary user-defined function of the fields instead of predetermined components.
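The composition of step functions described above can be illustrated with a small Python analogue. This is a sketch, not Meep's Scheme interface: `ToySim`, `run_until`, `at_beginning`, `at_end`, `at_every`, `after_time`, and `record` are hypothetical stand-ins, and the convention that step functions are also called once at time zero is an assumption made here so that a "beginning" modifier has something to act on.

```python
class ToySim:
    """Toy stand-in for a simulation: it only tracks the timestep count."""

    def __init__(self, dt):
        self.dt = dt
        self.step_count = 0

    @property
    def time(self):
        return self.step_count * self.dt

    def advance(self):
        self.step_count += 1


def run_until(sim, end_time, *step_funcs):
    """Call each step function at t = 0 (assumption) and after every timestep."""
    n_steps = round(end_time / sim.dt)
    for f in step_funcs:
        f(sim, n_steps == 0)
    for n in range(1, n_steps + 1):
        sim.advance()
        for f in step_funcs:
            f(sim, n == n_steps)      # second argument flags the final call


def at_beginning(*funcs):
    def wrapped(sim, last):
        if sim.step_count == 0:
            for f in funcs:
                f(sim, last)
    return wrapped


def at_end(*funcs):
    def wrapped(sim, last):
        if last:
            for f in funcs:
                f(sim, last)
    return wrapped


def after_time(t0, *funcs):
    def wrapped(sim, last):
        if sim.time >= t0:
            for f in funcs:
                f(sim, last)
    return wrapped


def at_every(interval, *funcs):
    next_time = [None]                # mutable cell captured by the closure

    def wrapped(sim, last):
        if next_time[0] is None or sim.time >= next_time[0] - 1e-9:
            for f in funcs:
                f(sim, last)
            next_time[0] = sim.time + interval
    return wrapped


def record(log, tag):
    """A 'step function' that just notes when it was called."""
    def step(sim, last):
        log.append((tag, sim.time))
    return step


log = []
run_until(ToySim(dt=0.5), 5,
          at_beginning(record(log, "epsilon")),
          at_end(record(log, "ez")),
          after_time(2, at_every(1, record(log, "slice"))))
print(log)
```

Each modifier takes step functions and returns a new step function, so modifiers nest freely; restricting output to a subvolume would be one more wrapper of the same shape.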

There is an additional subtlety when it comes to field output, because of the Yee lattice in which different field components are stored at different points; presented in this way to the user, it would be difficult to perform post-processing involving multiple field components, or even to compare plots of different field components. As mentioned in Sec. 3 and again in Sec. 7.2, therefore, the field components are automatically interpolated from the Yee grid onto a fixed “centered” grid in each pixel when exported to a file.
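A minimal sketch of such an interpolation in two dimensions follows. It assumes the standard Yee layout (Ex at (i+½, j), Ey at (i, j+½), Ez at (i, j), pixel centers at (i+½, j+½)); Meep's internal indexing may differ, and the helper names `to_centered` and `sample` are hypothetical.

```python
def to_centered(field, offset, nx, ny):
    """Average a Yee-grid component onto the nx-by-ny grid of pixel centers.

    offset = (ox, oy), each 0 or 0.5: field[i][j] lives at (i + ox, j + oy).
    Along an axis where the component already sits at the half-integer
    position, no averaging is needed; otherwise two neighbours are averaged.
    """
    ox, oy = offset
    out = [[0.0] * ny for _ in range(nx)]
    for i in range(nx):
        i_nbrs = (i,) if ox == 0.5 else (i, i + 1)
        for j in range(ny):
            j_nbrs = (j,) if oy == 0.5 else (j, j + 1)
            samples = [field[a][b] for a in i_nbrs for b in j_nbrs]
            out[i][j] = sum(samples) / len(samples)   # equal weights
    return out


def sample(f, offset, n_i, n_j):
    """Tabulate f(x, y) at the Yee locations of a component (test helper)."""
    ox, oy = offset
    return [[f(i + ox, j + oy) for j in range(n_j)] for i in range(n_i)]


nx, ny = 3, 2                               # number of pixels


def linear(x, y):
    return 2 * x + 3 * y                    # averaging is exact for this


ex = sample(linear, (0.5, 0.0), nx, ny + 1)
ey = sample(linear, (0.0, 0.5), nx + 1, ny)
ez = sample(linear, (0.0, 0.0), nx + 1, ny + 1)

for name, comp, off in (("Ex", ex, (0.5, 0.0)),
                        ("Ey", ey, (0.0, 0.5)),
                        ("Ez", ez, (0.0, 0.0))):
    print(name, to_centered(comp, off, nx, ny))
```

After this step all three components are defined at the same points, so quantities that combine them can be formed pointwise.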

Although at a simplistic level the input format can just be considered as a file format with a lot of parentheses, because Scheme is a full-fledged programming language one can control the simulation in essentially arbitrary ways. Not only can one write loops and use arithmetic to define the geometry and the relationships between the objects or perform parameter sweeps, but we also expose external libraries for multivariable optimization, integration, root-finding, and other tasks in order that they can be coupled with simulations.

Parallelism is completely transparent to the user: exactly the same input script is fed to the parallel version of Meep (written with the MPI message-passing standard for distributed-memory parallelism [69]) as to the serial version, and the distribution of the data across processors and the collection of results is handled automatically.

7. Abstraction versus performance

In an FDTD simulation, essentially just one thing has to be fast: inner loops over all the grid points or some large fraction thereof. Everything else is negligible in terms of computation time (but not programmer time!), so it can use high-level abstractions without penalty. For example, the use of a Scheme interpreter as the user interface has no performance consequences for a typical computation, because the inner loops are not written in Scheme.⁴ For these inner loops, however, there is a distinct tension between abstraction (or simplicity) and performance, and in this section we discuss some of the tradeoffs that result from this tension and the choices that have been made in Meep.

⁴ The exception to this rule is when the user supplies a Scheme function and asks that it be evaluated for every grid point, for example to integrate some function of the fields. If this is done frequently during the simulation, it is slow; in these circumstances, however, the user can replace the Scheme function with one written in C/C++ if needed. This is rare because most such functions that might be used frequently during a simulation, such as energy or flux, are already supplied in C/C++ within Meep.

The primacy of inner loops means that some popular principles of abstraction must be discarded. A few years ago, a colleague of ours attempted to write a new FDTD program in textbook object-oriented C++ style: every pixel in the grid would be an object, and every type of material would be a subclass overriding the necessary timestepping and field-access operations. Timestepping would consist of looping over the grid, calling some “step” method of each object, so that objects of different materials (magnetic, dielectric, nonlinear, etcetera) would dynamically apply the corresponding field-update procedures. The result of this noble experiment was a working program but a performance failure, many times slower than the aging Fortran software it was intended to replace: the performance overhead of object dereferencing, virtual method dispatch, and function calls in the inner loop overwhelmed all other considerations. In Meep, each field’s components are stored as simple linear arrays of floating-point numbers in row-major (C) order (parallel-array data structures worthy of Fortran 66), and there are separate inner loops for each type of material (more on this below). In a simple experiment on a 2.8 GHz Intel Core 2 CPU, merely moving the if statements for the different material types into these inner loops decreased Meep’s performance by a factor of two in a typical 3d calculation and by a factor of six in 2d (where the calculations are simpler and hence the overhead of the conditionals is more significant). The cost of the conditionals, including the cost of mispredicted branches and subsequent pipeline stalls [70] along with the frustration of compiler unrolling and vectorization, easily overwhelmed the small cost of computing, e.g., ∇ × H at a single point.
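The structural difference between the two loop organizations can be sketched as follows. This is an illustration only: the two "material types" and the semi-implicit lossy update rule are assumptions chosen for the example, not Meep's actual update code, and an interpreted language such as Python will not reproduce the timings quoted above. The point is where the decision is made, per grid point or once per region.

```python
DIELECTRIC, CONDUCTOR = "dielectric", "conductor"


def update_with_inner_branch(d, curl_h, material, sigma, dt):
    """Material test evaluated at every grid point (the slow organization)."""
    for i in range(len(d)):
        if material[i] == DIELECTRIC:
            d[i] = d[i] + dt * curl_h[i]
        else:
            s = 0.5 * sigma[i] * dt
            d[i] = ((1 - s) * d[i] + dt * curl_h[i]) / (1 + s)


def update_with_separate_loops(d, curl_h, regions, sigma, dt):
    """Material test evaluated once per region; each inner loop is branch-free."""
    for start, stop, kind in regions:
        if kind == DIELECTRIC:
            for i in range(start, stop):
                d[i] = d[i] + dt * curl_h[i]
        else:
            for i in range(start, stop):
                s = 0.5 * sigma[i] * dt
                d[i] = ((1 - s) * d[i] + dt * curl_h[i]) / (1 + s)


# A 12-point grid split into three regions of uniform material type.
regions = [(0, 5, DIELECTRIC), (5, 9, CONDUCTOR), (9, 12, DIELECTRIC)]
material = [kind for start, stop, kind in regions for _ in range(start, stop)]
sigma = [0.3 if kind == CONDUCTOR else 0.0 for kind in material]
curl_h = [0.1 * i for i in range(12)]

d_branch = [1.0] * 12
d_split = [1.0] * 12
update_with_inner_branch(d_branch, curl_h, material, sigma, dt=0.05)
update_with_separate_loops(d_split, curl_h, regions, sigma, dt=0.05)
print(max(abs(a - b) for a, b in zip(d_branch, d_split)))
```

Both functions compute the same values; the second simply hoists the conditional out of the loop body, which in a compiled language leaves an inner loop that the compiler can unroll and vectorize.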

7.1. Timestepping and cache tradeoffs

One of the dominant factors in performance on modern computer systems is not arithmetic, but memory: random memory access is far slower than arithmetic, and the organization of memory into a hierarchy of caches is designed to favor locality of access [70]. That is, one should organize the computation so that as much work as possible is done with a given datum once it is fetched (temporal locality) and so that subsequent data that are read or written are located nearby in memory (spatial locality). The optimal strategies to exploit both kinds of locality, however, appear to lead to sacrifices of abstraction and code simplicity so severe that we have chosen instead to sacrifice some potential performance in the name of simplicity.

As it is typically described, the FDTD algorithm has very little temporal locality: the field at each point is advanced in time by ∆t, and then is not modified again until all the fields at every other point in the computational cell have been advanced. In order to gain temporal locality, one must employ asynchronous timestepping: essentially, points in small regions of space are advanced several steps in time before advancing points far away, since over a short time interval the effects of far-away points cannot be felt. A detailed analysis of the characteristics of this problem, as well as a beautiful “cache-oblivious” algorithm that automatically exploits a cache of any size for grids of any dimensionality, is described in Ref. 71. On the other hand, an important part of Meep’s usability is the abstraction that the user can perform arbitrary computations or output using the fields in any spatial region at any time, which seems incompatible with the fields at different points in space being out-of-sync until a predetermined end of the computation. The bookkeeping difficulty of reconciling these two viewpoints led us to reject the asynchronous approach, despite its potential benefits.

However, there may appear to be at least a small amount of temporal locality in the synchronous FDTD algorithm: first B is advanced from ∇ × E, then H is computed from B and µ, then D is advanced from ∇ × H, then E is computed from D and ε. Since most fields are used at least once after they are advanced, surely the updates of the different fields can be merged into a single loop, for example advancing D at a point and then immediately computing E at the same point (the D field need not even be stored). Furthermore, since by merging the updates one is accessing several fields around the same point at the same time, perhaps one can gain spatial locality by interleaving the data, say by storing an array of (E, H, ε, µ) tuples instead of separate arrays. Meep does not do either of these things, however, for two reasons, the first of which is more fundamental. As is well-known, one cannot easily merge the B and H updates with the D and E updates at the same point, because the discretized ∇ × operation is nonlocal (involves multiple grid points). This is why one normally updates H everywhere in space before updating D from ∇ × H, because in computing ∇ × H one uses the values of H at different grid points and all of them must be in sync. A similar reasoning, however, applies to updating E from D and H from B, once the possibility of anisotropic materials is included: because the Yee grid stores different field components at different locations, any accurate handling of off-diagonal susceptibilities must also inevitably involve fields at multiple points (as in Ref. 39). To handle this, D must be stored explicitly and the update of E from D must take place after D has been updated everywhere, in a separate loop. And since each field is updated in a separate loop, the spatial-locality motivation to merge the field data structures rather than using parallel arrays is largely removed.
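The synchronous update order with one loop per field can be sketched in one dimension. This is a minimal illustration, not Meep's code: it assumes units with c = 1, fields Ez at integer grid points and Hy at half-integer points, isotropic ε and µ, perfectly conducting end walls, and the function name `yee_step_1d` is hypothetical. In this isotropic case the D and E loops could be merged; they are kept separate to mirror the structure that anisotropic materials require.

```python
import math


def yee_step_1d(e, h, b, d, eps_inv, mu_inv, dt, dx):
    """Advance all fields by one timestep, one full-grid loop per field."""
    n = len(h)                      # h and b have n entries, e and d have n + 1
    for i in range(n):              # B advanced from the curl of E
        b[i] += dt * (e[i + 1] - e[i]) / dx
    for i in range(n):              # H computed from B and mu
        h[i] = mu_inv[i] * b[i]
    for i in range(1, n):           # D advanced from the curl of H (interior)
        d[i] += dt * (h[i] - h[i - 1]) / dx
    for i in range(1, n):           # E computed from D and epsilon
        e[i] = eps_inv[i] * d[i]


# Demonstration: a Gaussian pulse hitting an epsilon = 12 slab.
n = 100
dx = 0.1
dt = 0.5 * dx                       # Courant factor 0.5
eps = [12.0 if 60 <= i < 80 else 1.0 for i in range(n + 1)]
eps_inv = [1.0 / x for x in eps]
mu_inv = [1.0] * n
e = [math.exp(-((i - 30) / 5.0) ** 2) for i in range(n + 1)]
e[0] = e[n] = 0.0                   # conducting walls
d = [eps[i] * e[i] for i in range(n + 1)]
b = [0.0] * n
h = [0.0] * n

for _ in range(200):
    yee_step_1d(e, h, b, d, eps_inv, mu_inv, dt, dx)
print(max(abs(x) for x in e))
```

Note that the D loop reads h[i] and h[i - 1], so every H value must already be up to date before any D value is advanced; this is the nonlocality of the discretized curl that prevents merging the H and D updates.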

Of course, not all simulations involve anisotropic materials (although they appear even in many simulations with nominally isotropic materials thanks to the subpixel averaging discussed in Sec. 3), but this leads to the second practical problem with merging the E and D (or H and B) update loops: the combinatorial explosion of the possible material cases. The update of D from ∇ × H must handle 16 possible cases, each of which is a separate loop (see above for the cost of putting conditionals inside the loops): with or without PML (4 cases, depending upon the number of PML conductivities and their orientation relative to the field), with or without conductivity, and with the derivative of two H components (3d) or only one H component (2d TE polarization). The update of E from D involves 12 cases: with or without PML (2 cases, distinct from those in the D update), the number of off-diagonal ε⁻¹ components (3 cases: 0, 1, or 2), and with or without nonlinearity (2 cases). If we attempted to join these into a single loop, we would have 16 × 12 = 192 cases, a code-maintenance headache. (Note that the multiplicity of PML cases comes from the fact that, including the corners of the computational cell, we might have 0 to 3 directions of PML, and the orientation of the PML directions relative to a given field component matters greatly.)
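The case counts quoted above are products of independent options, which can be enumerated directly. In this sketch the option labels (such as "pml-variant-1") are placeholders invented for illustration; only the number of choices per axis is taken from the text.

```python
from itertools import product

# Independent options for the D update (from curl H).
d_update_axes = {
    "pml": ("none", "pml-variant-1", "pml-variant-2", "pml-variant-3"),
    "conductivity": (False, True),
    "curl_terms": (2, 1),            # two H derivatives (3d) or one (2d TE)
}

# Independent options for the E update (from D).
e_update_axes = {
    "pml": (False, True),
    "offdiagonal_eps_inv_terms": (0, 1, 2),
    "nonlinear": (False, True),
}


def enumerate_cases(axes):
    """List every combination of options, one tuple per specialized loop."""
    return list(product(*axes.values()))


d_cases = enumerate_cases(d_update_axes)
e_cases = enumerate_cases(e_update_axes)
merged_cases = list(product(d_cases, e_cases))

print(len(d_cases), len(e_cases), len(merged_cases))
```

Keeping the loops separate means maintaining 16 + 12 = 28 specialized loops rather than 192, which is the practical argument made in the paragraph above.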

The performance penalty of separate E and D (or H and B) updates appears to be modest. Even if one assumes that, by somehow merging the loops, the time to compute E = ε⁻¹D could become zero, benchmarking the relative time spent in this operation indicates that a typical 3d transmission calculation would be accelerated by only around 30% (and less in 2d).

7.2. The loop-in-chunks abstraction

Finally, let us briefly mention a central abstraction that, while not directly visible to end-users of Meep, is key to the efficiency and maintainability of large portions of the software (field output, current sources, flux/energy computations and other field integrals, and so on). The purpose of this abstraction is to mask the complexity of the partitioning of the computational cell into overlapping chunks connected by symmetries, communication, and other boundary conditions as described in Sec. 2.

Consider the output of the fields at a given timestep to an HDF5 datafile. Meep provides a routine get-field-pt that, given a point in space, interpolates it onto the Yee grid and returns a desired field component at that point. In addition to interpolation, this routine must also transform the point onto a chunk that is actually stored (using rotations, periodicity, etcetera) and communicate the data from another processor if necessary. If the point is on a boundary between two chunks, the interpolation process may involve multiple chunks, multiple rotations etcetera, and communications from multiple processors. Because this process involves only a single point, it is not easily parallelizable. Now, to output the fields everywhere in some region to a file, one approach is to simply call get-field-pt for every point in a grid for that region and output the results, but this turns out to be tremendously slow because of the repeated transformations and communications for every single point. We nevertheless want to interpolate fields for output rather than dumping the raw Yee grid, because it is much easier for post-processing if the different field components are interpolated onto the same grid; also, to maintain transparency of features like symmetry one would like to be able to output the whole computational cell (or an arbitrary subset) even if only a part of it is stored. Almost exactly the same problems arise for integrating things like flux E × H or energy or user-defined functions of the fields (noting that functions combining multiple field components require interpolation), and also for implementing volume (or line, or surface) sources which must be projected onto the grid in some arbitrary volume.

One key to solving this difficulty is to realize that, when the field in some volume V is needed (for output, integration, and so on), the rotations, communications, etcetera for points in V are identical for all the points in the intersection of V with some chunk (or one of its rotations/translations). The second is to realize that, when interpolation is needed, there is a particular grid for which interpolation is easy: for owned points of the centered grid (Sec. 2) lying at the center of each pixel, it is always possible to interpolate from fields on any Yee grid without any inter-chunk communication and by a simple equal-weight averaging of at most 2^d points in d dimensions.

The combination of these two observations leads to the loop-in-chunks abstraction. Given a (convex rectilinear) volume V and a given grid (either centered, or one of the Yee-field grids), it computes the intersection of all the chunks and their rotations/translations with V. For each intersection it invokes a caller-specified function, passing the portion of the chunk, the necessary rotations (etc.) of the fields, and interpolation weights (if needed, for the boundary of V). That function then processes the specified portion of the chunk (for example, outputting it to the corresponding portion of a file, or integrating the desired fields). All of this can proceed in parallel (with each processor considering only those chunks stored locally). This is (relatively) fast because the rotations, interpolations, and so on are computed only once per chunk intersection, while the inner loop over all grid points in each chunk can be as tight as necessary. Moreover, all of the rather complicated and error-prone logic involved in computing V’s intersection with the chunks (e.g., special care is required to ensure that each conceptual grid point is processed exactly once despite chunk overlaps and symmetries) is localized to one place in the source code; field output, integration, sources, and other functions of the fields are isolated from this complexity.
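The control flow of this abstraction can be sketched in a heavily simplified form. The sketch below assumes a 2d integer grid, chunks whose owned regions are non-overlapping half-open boxes, and no symmetries, rotations, interpolation weights, or inter-processor communication; the names `intersect`, `make_chunk`, `loop_in_chunks`, and `integrate` are hypothetical and do not correspond to Meep's source.

```python
def intersect(a, b):
    """Intersection of two half-open boxes (x0, x1, y0, y1), or None if empty."""
    x0, x1 = max(a[0], b[0]), min(a[1], b[1])
    y0, y1 = max(a[2], b[2]), min(a[3], b[3])
    if x0 < x1 and y0 < y1:
        return (x0, x1, y0, y1)
    return None


def make_chunk(owned, f):
    """A chunk stores field values only for the grid points it owns."""
    x0, x1, y0, y1 = owned
    data = [[f(x, y) for y in range(y0, y1)] for x in range(x0, x1)]
    return {"owned": owned, "data": data}


def loop_in_chunks(chunks, volume, chunk_func):
    """Invoke chunk_func once per non-empty chunk/volume intersection."""
    calls = 0
    for chunk in chunks:
        box = intersect(chunk["owned"], volume)
        if box is not None:
            chunk_func(chunk, box)   # all per-intersection setup happens here
            calls += 1
    return calls


def integrate(chunks, volume):
    """Sum the field over the volume using a tight per-chunk inner loop."""
    acc = {"sum": 0, "points": 0}

    def add_portion(chunk, box):
        ox, _, oy, _ = chunk["owned"]
        for x in range(box[0], box[1]):
            row = chunk["data"][x - ox]
            for y in range(box[2], box[3]):
                acc["sum"] += row[y - oy]
                acc["points"] += 1

    calls = loop_in_chunks(chunks, volume, add_portion)
    return acc["sum"], acc["points"], calls


def field(x, y):
    return x + 10 * y


# An 8 x 6 grid divided among three chunks.
chunks = [make_chunk((0, 4, 0, 6), field),
          make_chunk((4, 8, 0, 3), field),
          make_chunk((4, 8, 3, 6), field)]
volume = (2, 7, 1, 5)

print(integrate(chunks, volume))
```

The caller-supplied function sees only a rectangular portion of one chunk, so output, integration, and source projection can share `loop_in_chunks` and differ only in the function they pass in.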

8. Concluding remarks

We have reviewed in this paper a number of the unusual implementation details of Meep that distinguish our software package from standard textbook FDTD methods. We began with a discussion of the fundamental structural unit of chunks that constitute the Yee grid and enable parallelization; we provided an overview of Meep’s core design philosophy of creating an illusion of continuous space and time for inputs and outputs; we explained and motivated the somewhat unusual design intricacies of nonlinear materials and PMLs; we discussed important aspects of Meep’s computational methods for flux spectra and resonant modes; and we demonstrated the formulation of a frequency-domain solver requiring only minimal modifications to the underlying time-stepping algorithm. In addition to the inner workings of Meep’s internal structure, we reviewed how such features are accessible to users via an external scripting interface.

We believe that a free/open-source, full-featured FDTD package like Meep can play a vital role in enabling new research in electromagnetic phenomena. Not only does it provide a low barrier to entry for standard FDTD simulations, but the simplicity of the FDTD algorithm combined with access to the source code offers an easy route to investigate new physical phenomena coupled with electromagnetism. For example, we have colleagues working on coupling multi-level atoms to electromagnetism within Meep for modeling lasing and saturable absorption, adapting published techniques from our and other groups [16, 17, 18, 19, 20], but also including new physics such as diffusion of excited gases. Other colleagues have modified Meep for modeling gyromagnetic media in order to design new classes of “one-way” waveguides [72]. Meep is even being used to simulate the quantum phenomena of Casimir forces (from quantum vacuum fluctuations, which can be computed from classical Green’s functions) [73, 74]; in fact, this was possible without any modifications of the Meep code due to the flexibility of Meep’s scripting interface. We hope that other researchers, with the help of the understanding of Meep’s architecture that this paper provides, will be able to adapt Meep to future phenomena that we have not yet envisioned.

Acknowledgements

This work was supported in part by the Materials Research Science and Engineering Center program of the National Science Foundation under Grant Nos. DMR-9400334 and DMR-0819762, by the Army Research Office through the Institute for Soldier Nanotechnologies under contract DAAD-19-02-D0002, and also by Dr. Dennis Healy of DARPA MTO under award N00014-05-1-0700 administered by the Office of Naval Research. We are also grateful to A. W. Rodriguez and A. P. McCauley for their efforts to generalize Meep for quantum-Casimir problems, S. L. Chua for his work on dispersive and multi-level materials, and Y. Chong for early support in Meep for gyrotropic media.

References

[1] A. Taflove, S. C. Hagness, Computational Electrodynamics: The Finite-Difference Time-Domain Method, Artech, Norwood, MA, 3rd edn., 2005.

[2] K. S. Kunz, R. J. Luebbers, The Finite-Difference Time-Domain Method for Electromagnetics, CRC Press, Boca Raton, 1993.

[3] D. M. Sullivan, Electromagnetic Simulation Using the FDTD Method, Wiley–IEEE Press, New York, 2000.

[4] A. Elsherbeni, V. Demir, The Finite Difference Time Domain Method for Electromagnetics: With MATLAB Simulations, SciTech, Raleigh, NC, 2009.

[5] W. Yu, R. Mittra, T. Su, Y. Liu, X. Yang, Parallel Finite-Difference Time-Domain Method, Artech House, Norwood, MA, 2006.

[6] M. N. Sadiku, Numerical Techniques in Electromagnetics, CRC, 2nd edn., 2000.

[7] J. Jin, The Finite Element Method in Electromagnetics, Wiley–IEEE Press, 2nd edn., 2002.

[8] K. Yasumoto (Ed.), Electromagnetic Theory and Applications for Photonic Crystals, CRC, 2005.

[9] S. Rao, D. Wilton, A. Glisson, Electromagnetic scattering by surfaces of arbitrary shape, IEEE Trans. Anten. Prop. 30 (3) (1982) 409–418.

[10] K. Umashankar, A. Taflove, S. Rao, Electromagnetic scattering by arbitrary shaped three-dimensional homogeneous lossy dielectric objects, IEEE Trans. Anten. Prop. 34 (6) (1986) 758–766.

[11] M. Bonnet, Boundary Integral Equation Methods for Solids and Fluids, Wiley, 1999.

[12] W. C. Chew, J.-M. Jin, E. Michielssen, J. Song (Eds.), Fast and Efficient Algorithms in Computational Electromagnetics, Artech House, 2000.


[13] S. G. Johnson, M. Ibanescu, M. A. Skorobogatiy, O. Weisberg, J. D. Joannopoulos, Y. Fink, Perturbation theory for Maxwell’s equations with shifting material boundaries, Phys. Rev. E 65 (2002) 066611.

[14] L. Zhang, J. Lee, A. Farjadpour, J. White, S. Johnson, A novel boundary element method with surface conductive absorbers for 3-D analysis of nanophotonics, Microwave Symposium Digest, 2008 IEEE MTT-S International (2008) 523–526.

[15] S. G. Johnson, J. D. Joannopoulos, Block-iterative frequency-domain methods for Maxwell’s equations in a planewave basis, Opt. Express 8 (3) (2001) 173–190.

[16] R. W. Ziolkowski, J. M. Arnold, D. M. Gogny, Ultrafast pulse interactions with two-level atoms, Phys. Rev. A 52 (4) (1995) 3082–3094.

[17] A. S. Nagra, R. A. York, FDTD analysis of wave propagation in nonlinear absorbing and gain media, IEEE Trans. Anten. Prop. 46 (3) (1998) 334–340.

[18] S.-H. Chang, A. Taflove, Finite-difference time-domain model of lasing action in a four-level two-electron atomic system, Opt. Express 12 (16) (2004) 3827–3833.

[19] Y. Huang, S.-T. Ho, Computational model of solid-state, molecular, or atomic media for FDTD simulation based on a multi-level multi-electron system governed by Pauli exclusion and Fermi–Dirac thermalization with application to semiconductor photonics, Opt. Express 14 (8) (2006) 3569–3587.

[20] P. Bermel, E. Lidorikis, Y. Fink, J. D. Joannopoulos, Active materials embedded in photonic crystals and coupled to electromagnetic radiation, Phys. Rev. B 73 (165125).

[21] J. D. Jackson, Classical Electrodynamics, Wiley, New York, 3rd edn., 1998.

[22] J. D. Joannopoulos, S. G. Johnson, R. D. Meade, J. N. Winn, Photonic Crystals: Molding the Flow of Light, Princeton Univ. Press, 2nd edn., 2008.

[23] K. S. Yee, Numerical solution of initial boundary value problems involving Maxwell’s Equations in isotropic media, IEEE Trans. Anten. Prop. 14 (3) (1966) 302–307.

[24] M. J. Berger, J. Oliger, Adaptive Mesh Refinement for Hyperbolic Partial Differential Equations, J. Comp. Phys. 53 (1984) 484–512.

[25] I. S. Kim, W. J. R. Hoefer, A Local Mesh Refinement Algorithm for the Time Domain-Finite Difference Method Using Maxwell’s Curl Equations, IEEE Trans. Microwave Theory Tech. 38 (6) (1990) 812–815.

[26] S. S. Zivanovic, K. S. Yee, K. K. Mei, A Subgridding Method for the Time-Domain Finite-Difference Method to Solve Maxwell’s Equations, IEEE Trans. Microwave Theory Tech. 39 (3) (1991) 471–479.

[27] M. Okoniewski, E. Okoniewska, M. A. Stuchly, Three-Dimensional Subgridding Algorithm for FDTD, IEEE Trans. Anten. Prop. 45 (3) (1997) 422–429.

[28] C. Lin, L. Snyder, Principles of Parallel Programming, Addison Wesley, 2008.

[29] T. Inui, Y. Tanabe, Y. Onodera, Group Theory and Its Applications in Physics, Springer-Verlag Terlos, 1996.

[30] U. Trottenberg, C. W. Oosterlee, A. Schuller, Multigrid, Academic Press, 2000.

[31] L. Landau, L. Pitaevskii, E. Lifshitz, Electrodynamics of Continuous Media, Butterworth-Heinemann, 2nd edn., 1984.

[32] C. Luo, M. Ibanescu, S. G. Johnson, J. D. Joannopoulos, Cerenkov Radiation in Photonic Crystals, Science 299 (2003) 368–371.

[33] A. Farjadpour, D. Roundy, A. Rodriguez, M. Ibanescu, P. Bermel, J. Joannopoulos, S. Johnson, G. Burr, Improving accuracy by sub-pixel smoothing in the finite-difference time domain, Opt. Lett. 31 (2006) 2972–2974.

[34] A. F. Oskooi, C. Kottke, S. G. Johnson, Accurate finite-difference time-domain simulation of anisotropic media by subpixel smoothing, Opt. Lett. 34 (18) (2009) 2778–2780.

[35] A. Ditkowski, K. Dridi, J. S. Hesthaven, Convergent Cartesian grid methods for Maxwell’s equations in complex geometries, J. Comp. Phys. 170 (2001) 39–80.

[36] C. Kottke, A. Farjadpour, S. G. Johnson, Perturbation theory for anisotropic dielectric interfaces, and application to subpixel smoothing of discretized numerical methods, Phys. Rev. E 77 (036611).

[37] S. Dey, R. Mittra, A conformal finite-difference time-domain technique for modeling cylindrical dielectric resonators, IEEE Trans. Microwave Theory Tech. 47 (9) (1999) 1737–1739.

[38] R. D. Meade, A. M. Rappe, K. D. Brommer, J. D. Joannopoulos, O. L. Alerhand, Accurate theoretical analysis of photonic band-gap materials, Phys. Rev. B 48 (1993) 8434–8437, erratum: S. G. Johnson, ibid. 55, 15942 (1997).

[39] G. Werner, J. Cary, A stable FDTD algorithm for non-diagonal anisotropic dielectrics, J. Comp. Phys. 226 (2007) 1085–1101.

[40] P. Mezzanotte, L. Roselli, R. Sorrentino, A Simple Way to Model Curved Metal Boundaries in FDTD Algorithm Avoiding Staircase Approximation, IEEE Microwave Guided Wave Lett. 5 (8) (1995) 267–269.

[41] J. Anderson, M. Okoniewski, S. S. Stuchly, Practical 3-D Contour/Staircase Treatment of Metals in FDTD, IEEE Microwave Guided Wave Lett. 6 (3) (1996) 146–148.

[42] A. Deinega, I. Valuev, Subpixel smoothing for conductive and dispersive media in the finite-difference time-domain method, Opt. Lett. 32 (23) (2007) 3429–3431.

[43] P. Leung, S. Liu, K. Young, Completeness and time-independent perturbation of the quasinormal modes of an absorptive and leaky cavity, Phys. Rev. A 49 (1994) 3982–3989.

[44] S. G. Johnson, M. L. Povinelli, M. Soljacic, A. Karalis, S. Jacobs, J. D. Joannopoulos, Roughness losses and volume-current methods in photonic-crystal waveguides, Appl. Phys. B 81 (2005) 283–293.

[45] A. F. Oskooi, L. Zhang, Y. Avniel, S. G. Johnson, The failure of perfectly matched layers, and towards their redemption by adiabatic absorbers, Opt. Express 16 (15) (2008) 11376–11392.

[46] Q. H. Liu, J. Q. He, Quasi-PML for waves in cylindrical coordinates, Microwave and Optical Tech. Lett. 19 (2) (1998) 107–111.

[47] N. Bloembergen, Nonlinear Optics, W. A. Benjamin, New York, 1965.

[48] G. P. Agrawal, Nonlinear Fiber Optics, Academic Press, San Diego, 3rd edn., 2001.

[49] A. Rodriguez, M. Soljacic, J. D. Joannopoulos, S. G. Johnson, χ(2) and χ(3) harmonic generation at a critical power in inhomogeneous doubly resonant cavities, Opt. Express 15 (12) (2007) 7303–7318.

[50] R. W. Boyd, Nonlinear Optics, Academic Press, London, UK, 1992.

[51] G. A. Baker, Jr., P. Graves-Morris, Padé Approximants, Cambridge University Press, 2nd edn., 1996.

[52] F. L. Teixeira, W. C. Chew, Systematic Derivation of Anisotropic PML Absorbing Media in Cylindrical and Spherical Coordinates, IEEE Microwave Guided Wave Lett. 7 (11) (1997) 371–373.

[53] J.-Q. He, Q.-H. Liu, A nonuniform cylindrical FDTD algorithm with improved PML and quasi-PML absorbing boundary conditions, IEEE Trans. Geoscience Remote Sensing 37 (2) (1999) 1066–1072.

[54] D. H. Bailey, P. N. Swarztrauber, A Fast Method for the Numerical Evaluation of Continuous Fourier and Laplace Transforms, SIAM J. Sci. Comput. 15 (5) (1994) 1105–1110.

[55] T. H. Cormen, C. E. Leiserson, R. L. Rivest, C. Stein, Introduction to Algorithms, MIT Press, 3rd edn., 2009.

[56] A. V. Oppenheim, R. W. Schafer, Discrete-Time Signal Processing, Prentice Hall, 3rd edn., 2009.

[57] W.-H. Guo, W.-J. Li, Y.-Z. Huang, Computation of resonant frequencies and quality factors of cavities by FDTD technique and Padé approximation, IEEE Microwave and Wireless Comp. Lett. 11 (5) (2001) 223–225.

[58] S. Cabay, D.-K. Choi, Algebraic computations of scaled Padé fractions, SIAM J. Comput. 15 (1) (1986) 243–270.

[59] A. Rodriguez, M. Ibanescu, J. D. Joannopoulos, S. G. Johnson, Disorder-immune confinement of light in photonic-crystal cavities, Opt. Lett. 30 (2005) 3192–3194.

[60] V. A. Mandelshtam, H. S. Taylor, Harmonic inversion of time signals and its applications, J. Chem. Phys. 107 (17) (1997) 6756–6769.

[61] S. Dey, R. Mittra, Efficient computation of resonant frequencies and quality factors of cavities via a combination of the finite-difference time-domain technique and the Padé approximation, IEEE Microwave Guided Wave Lett. 8 (12) (1998) 415–417.

[62] A. Christ, H. L. Hartnagel, Three-Dimensional Finite-Difference Method for the Analysis of Microwave-Device Embedding, IEEE Trans. Microwave Theory Tech. 35 (8) (1987) 688–696.

[63] R. Barrett, M. Berry, T. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, H. V. der Vorst, Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, SIAM, Philadelphia, PA, 1994.

[64] G. L. G. Sleijpen, D. R. Fokkema, BiCGSTAB(L) for linear equations involving unsymmetric matrices with complex spectrum, Elec. Trans. on Num. Analysis 1 (1993) 11–32.

[65] G. L. G. Sleijpen, H. A. van der Vorst, D. R. Fokkema, BiCGstab(L) and other Hybrid Bi-CG Methods, Num. Algorithms 7 (1994) 75–109.

[66] P. H. Salus, A Quarter Century of UNIX, Addison-Wesley, Reading, MA, 1994.

[67] M. Folk, R. E. McGrath, N. Yeager, HDF: An update and future directions, in: Proc. 1999 Geoscience and Remote Sensing Symposium (IGARSS), Hamburg, Germany, vol. 1, IEEE Press, 273–275, 1999.

[68] H. Abelson, G. J. Sussman, Structure and Interpretation of Computer Programs, MIT Press, Cambridge, MA, 1985.

[69] The MPI Forum, MPI: A Message Passing Interface, in: Supercomputing ’93, Portland, OR, 878–883, 1993.

[70] J. L. Hennessy, D. A. Patterson, Computer Architecture: A Quantitative Approach, Elsevier, San Francisco, CA, 3rd edn., 2003.

[71] M. Frigo, V. Strumpen, The memory behavior of cache oblivious stencil computations, J. Supercomputing 39 (2) (2007) 93–112.

[72] Z. Wang, Y. D. Chong, J. D. Joannopoulos, M. Soljacic, Reflection-Free One-Way Edge Modes in a Gyromagnetic Photonic Crystal, Phys. Rev. Lett. 100 (2008) 013905.

[73] A. W. Rodriguez, A. P. McCauley, J. D. Joannopoulos, S. G. Johnson, Casimir forces in the time domain: Theory, Phys. Rev. A 80 (2009) 012115.

[74] A. P. McCauley, A. W. Rodriguez, J. D. Joannopoulos, S. G. Johnson, Casimir forces in the time domain: II. Applications, arXiv.org e-Print archive (2009) arXiv:0906.5170.
