Continuous Measurement and Stochastic Methods in Quantum

University of New Mexico
UNM Digital Repository

Physics & Astronomy ETDs Electronic Theses and Dissertations

7-11-2013

Continuous Measurement and Stochastic Methods in Quantum Optical Systems

Robert Cook


This Dissertation is brought to you for free and open access by the Electronic Theses and Dissertations at UNM Digital Repository. It has been accepted for inclusion in Physics & Astronomy ETDs by an authorized administrator of UNM Digital Repository. For more information, please contact disc@unm.edu.

Recommended Citation: Cook, Robert. "Continuous Measurement and Stochastic Methods in Quantum Optical Systems." (2013). https://digitalrepository.unm.edu/phyc_etds/12

Robert Lawrence Cook
Candidate

Physics and Astronomy
Department

This dissertation is approved, and it is acceptable in quality and form for publication:

Approved by the Dissertation Committee:

Ivan H. Deutsch, Chairperson

Carlton M. Caves

Sudhakar Prasad

Terry A. Loring

Continuous Measurement and Stochastic Methods in Quantum Optical Systems

B.S., University of California Santa Cruz, 2003

DISSERTATION

Submitted in Partial Fulfillment of the

Requirements for the Degree of

Doctor of Philosophy

Physics

The University of New Mexico

Albuquerque, New Mexico

May, 2013

Dedication

Kylie, I promise we’ll take a walk when this is all over.

Acknowledgments

First of all I’d like to thank my most recent and final advisor, Ivan Deutsch. When I started at UNM 10 years ago I had no idea what I wanted to study, only that a master’s program seemed better than a job at the latest flying-Starbucks. It was your undergraduate quantum mechanics lectures that showed me how strange and rich the quantum world can be, and they ultimately set me on the path to where I am today. I will be forever grateful for your help and guidance through the bumpier parts of my graduate career. I also have to thank Brad Chase. Without you this dissertation would have taken a very different form. Prior to reading the epic works of van Handel et al. I would never have guessed that I’d become an advocate for mathematical formalism. To Ben Baragiola, I thank you for your friendship, enthusiasm and willingness to talk through a problem. And to Heather Partner, I will always be grateful for your support and camaraderie on the roller coaster ride that started at Los Alamos, ran through UNM and ended in Sandia.

In my latest academic home of Room 30, I need to thank Carlos Riofrío for your friendship, warmth and immediate inclusion into the Deutsch group, Josh Combes for your shared enthusiasm for QSDEs, Leigh Norris for your kind-hearted adoption of the luckiest goldfish on the planet, and Vaibhav Madhok for just being Vaibhav. To the rest of the Deutsch group (Bob Keating, Charlie Baldwin, and Krittika Goya), thanks for listening to me prattle on in group meeting about stochastic calculus and statistical estimation. I hope I didn’t bore you too much. In the greater quantum information group I need to thank Professors Carl Caves and Andrew Landahl, and current and former CQuIC students Jonas Anderson, Chris Cesare, Seth Merkel, Iris Reichenbach, Alexandre Tacla, Zhang Jiang, Matthias Lang, and Shashank Pandey. I must also thank Vicky Bird for feeding us so well during arXiv review. From my short tenure at Sandia National Labs I need to thank Cort Johnson, Dan Stick, Todd Barrick, Dave Moehring, Francisco Benito, Peter Schwindt, Yuan-Yu Jau, Mike Mangan, Tom Hamilton, and Grant Biedermann for their help and support as I learned that cryogenic experiments are not for me. I will never forget the time spent working with Roy Keyes, Tom Jones, Thomas Loyd, and Paul Martin. While we may not have gotten a lot done, we had a whole lot of fun doing it. To Laura Zschaechner, thanks for being a good friend and a shoulder to cry on. And finally I’d like to thank my parents and family for their love and support.

Continuous Measurement and Stochastic Methods in Quantum Optical Systems

B.S., University of California Santa Cruz, 2003

Ph.D., Physics, University of New Mexico, 2013

Abstract

This dissertation studies the statistics and modeling of a quantum system probed by a coherent laser field. We focus on an ensemble of qubits dispersively coupled to a traveling-wave light field. The first research topic explores the quantum measurement statistics of a quasi-monochromatic laser probe. We identify the shortest timescale on which successive measurements approximately commute. Our model predicts that, for a probe in the near infrared, noncommuting measurement effects become apparent on subpicosecond timescales.

The second dissertation topic seeks an approximation to a conditional master equation that maps identical product states to identical product states. Through a technique known as projection filtering, we find such an equation for an ensemble of qubits experiencing a diffusive measurement of a collective angular momentum projection, in addition to global rotations. We then test the quality of the approximation through numerical simulations. This measurement model is known to be entangling, and without the rotations we find poor agreement between the exact and approximate predictions. However, in the presence of strong randomized rotations, the approximation reproduces the exact expectation values to within 95% accuracy.

The final topic applies the projection filter to the problem of state reconstruction. We find an initial state estimate based on a single continuous measurement of an identically prepared atomic ensemble. Given the ability to make a continuous collective measurement while simultaneously applying time-varying controls, it is possible to find an accurate estimate based upon a single measurement realization. Previous experiments implementing this method found high-fidelity estimates but were ultimately limited by decoherence. Here we explore the fundamental limits of this protocol by studying an idealized model for pure qubits, which is limited only by measurement backaction. Backaction makes the measurement statistics a nonlinear function of the initial state. Via the projection filter, we find an efficiently computed approximation to the log-likelihood function. Using the exact dynamics to produce simulated measurements, we then numerically search for a maximum likelihood estimate based on the approximate expression. We ultimately find that our estimation technique nearly achieves an average fidelity bound set by an optimum

Contents

1 Introduction
1.0.1 A note on quantum foundations
1.1 An executive summary
1.1.1 Quantum optics and quantum stochastic differential equations
1.1.2 Classical and quantum probability theory
1.1.3 Projection filtering for qubit ensembles
1.1.4 Qubit state reconstruction

2 Quantum Optics and Quantum Stochastic Differential Equations
2.1 Quantum Stochastic Process in Optical Fields
2.1.1 Free space quantization
2.2 Wave Packets, Fock Space and Stochastic Processes
2.2.1 Wave packets
2.2.2 Weyl operators
2.2.3 Fock space
2.2.4 A basis independent expression for the wave packet inner product
2.2.5 Fock space and stochastic processes
2.2.6 Localized wave packets and stochastic processes
2.3 Paraxial Envelopes and Measurable Pulses
2.3.1 Paraxial wave packets in the time domain
2.3.2 The measurable subspace
2.4 The one-dimensional limit
2.5 Quantum Wiener processes and the continuous-time decomposition
2.5.1 The continuous-time tensor decomposition
2.5.2 The quantum Wiener process
2.5.3 The units of quantum noise
2.6 Systems Interacting with Quantum Noise
2.6.1 Quantum white noise in paraxial wave packets
2.6.2 The scattering process
2.6.3 The limiting stochastic propagator
2.6.4 A simple 1D example
2.7 The Faraday Interaction
2.7.1 The quadratic Faraday interaction

3 Classical and Quantum Probability Theory
3.1 Classical Probability Theory
3.1.1 Stochastic processes and random variables
3.1.2 Expectation values, the conditional expectation, and measurability
3.1.3 Special processes: time-adaption and martingales
3.1.4 The Wiener process
3.2 Quantum Probability Theory
3.2.1 Embedding the quantum into the classical
3.2.2 Quantum probability
3.2.3 The quantum conditional expectation
3.2.4 The conditional expectation and generalized measurements
3.3 Quantum Filtering Theory
3.4 The Conditional Master Equation
3.4.1 The innovation process
3.4.2 The Ito correction in the conditional master equation
3.4.3 The conditional Schrödinger equation

4 Projection Filtering for Qubit Ensembles
4.1 Introduction
4.1.1 An introduction to differential projections
4.1.2 The conditional master equation
4.2 Differential Manifolds
4.2.1 Tangent spaces
4.2.2 Riemannian Metrics and orthogonal projections
4.2.3 Differentials on abstract manifolds
4.2.4 Stochastic calculus on differential manifolds
4.3 The Bloch Sphere as a Riemannian Manifold
4.3.1 Projecting the unconditional master equation
4.4 Projections in the tensor product submanifold
4.4.1 The metric in spherical coordinates
4.4.2 Calculating collective operator inner products
4.4.3 The spherical projection of the CME
4.5 The Projection Filter
4.5.1 Special cases for the projection filter
4.6 Simulations and Performance
4.6.1 Simulation parameters
4.6.2 Spin squeezing comparisons
4.6.3 Squeezing simulations
4.6.4 Projection filter simulations

5 Qubit State Reconstruction
5.1 Previous reconstruction results
5.2 The Estimation Procedure
5.3 The Model
5.3.1 Observability and randomized controls
5.4 The Likelihood Function
5.4.1 The reconstruction procedure
5.4.2 Coupled CMEs and filter stability
5.4.3 Backaction in continuous quantum measurement
5.5 Numeric Simulations
5.5.1 Simulation parameters
5.5.2 Results and discussions

6 Summary and Outlook
6.1 Quantum optics and quantum stochastic differential equations
6.2 Classical and quantum probability theory
6.3 Projection filtering for qudit ensembles
6.4 Qubit State Reconstruction

A Paraxial Optics

B Classical Stochastic Calculus
B.1 Ito Calculus
B.1.1 The Ito conversion

C Quantum Stochastic Calculus
C.1 The Quantum Stochastic Unitary
C.1.1 Unitary evolution

D The Quantum Wong-Zakai Theorem
D.1 Quantum white noise
D.2 The quantum Wong-Zakai theorem
D.2.1 Quantum stochastic calculus and operator ordering
D.2.2 Gauge freedom in the Ito correction
D.3 The Limiting Propagator

Chapter 1

Introduction

Within the past three decades, the ability to engineer individual quantum systems into highly nonclassical states has become a reality. The fundamental technology that enabled these revolutionary experiments is the coherent laser, with its ability to address specific electronic transitions in matter. A quasi-monochromatic laser can also exert optical forces on motional degrees of freedom. These forces were initially used to laser cool and trap atoms, but they can also transfer coherent electronic superpositions to superpositions of the external motional state, enabling atom interferometers [1] and highly nonclassical states of trapped ions [2, 3].

In addition to providing control of an atomic system at the quantum level, the same laser systems can be used to measure the atomic state. The simplest detection method is resonance fluorescence, in which laser light resonant with a single transition is applied to an atom, which scatters photons only if the resonant level is occupied. If the internal state is in a superposition of the resonant level and an additional off-resonant ‘dark’ state, the presence or absence of scattered light provides the experimenter with information about the internal state of the system.

The quantum nature of the atom-light interaction carries over to off-resonant applications. In a low-intensity regime, a free-space laser whose carrier frequency is significantly detuned from an atomic transition predominantly induces a state-dependent energy shift without significantly exciting that transition. The specific form of the interaction also depends upon the polarization of the exciting laser. In an appropriate parameter regime, the resulting Hamiltonian dominates over the decoherence caused by absorption and subsequent emission, producing a controllable coupling between the atoms and the polarization of the probe laser [5]. This coupling affects both the atomic and the polarization quantum states. Because the laser is a traveling wave, this is a fundamentally open quantum system, and the outgoing light carries some information about the atomic state.
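Why a large detuning favors the coherent coupling over decoherence can be seen, roughly, from textbook two-level-atom formulas. This is a simplifying assumption: the alkali atoms and polarization-dependent interaction considered in this dissertation involve many levels. With Rabi frequency Ω, detuning Δ = ω_laser − ω_atom (sign convention assumed), and linewidth Γ, the far-detuned ground-state light shift is Ω²/(4Δ), while the photon scattering rate falls off as ΓΩ²/(4Δ²), so their ratio grows roughly as Δ/Γ. The function names below are hypothetical.

```python
def scattering_rate(rabi, detuning, gamma):
    """Steady-state photon scattering rate of a driven two-level atom.

    Standard formula R = (gamma/2) * s / (1 + s + (2*detuning/gamma)**2)
    with on-resonance saturation parameter s = 2*rabi**2/gamma**2.
    All rates are angular frequencies in the same (arbitrary) units.
    """
    s = 2.0 * rabi**2 / gamma**2
    return 0.5 * gamma * s / (1.0 + s + (2.0 * detuning / gamma) ** 2)


def light_shift(rabi, detuning):
    """Far-detuned (|detuning| >> rabi, gamma) ground-state AC Stark shift.

    Assumed convention: detuning = omega_laser - omega_atom, so a red-detuned
    laser (detuning < 0) lowers the ground-state energy.
    """
    return rabi**2 / (4.0 * detuning)


def coherence_ratio(rabi, detuning, gamma):
    """Ratio of the coherent energy shift to the incoherent scattering rate."""
    return abs(light_shift(rabi, detuning)) / scattering_rate(rabi, detuning, gamma)


if __name__ == "__main__":
    gamma, rabi = 1.0, 1.0
    for detuning in (10.0, 100.0, 1000.0):
        ratio = coherence_ratio(rabi, detuning, gamma)
        print(f"detuning = {detuning:7.1f} gamma -> shift/scattering = {ratio:10.2f}")
```

Increasing the detuning tenfold raises the ratio roughly tenfold, at the cost of a weaker absolute shift, which is the trade-off behind the "appropriate parameter regime" mentioned above.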

Quantum mechanics is at its core a probabilistic theory in which the wave function is a tool for computing the probability of observing experimental events. Upon receipt of a measurement outcome, an accurate description of the quantum system must reflect this new information. This is true both for an idealized projective measurement and for indirect measurements like those described above. However, there are significant differences between state detection via resonance fluorescence and via off-resonant polarization spectroscopy.

In fluorescence detection the vast majority of the scattered light is ultimately lost, either because only a small fraction of the possible emission directions is observed or because of losses in the detection apparatus. It takes considerable experimental effort to collect as little as 5% of the total light scattered by a single trapped ion [6, 7]. A single ion must therefore scatter many photons before an experimenter can discriminate a bright state from a dark state with reasonable confidence. Consequently, after a relatively short time, an ion prepared in a superposition of bright and dark states has likely scattered several photons that were lost to the experimenter. Honest scientists would be forced to admit that, while they are still uncertain of the measurement outcome, they are quite certain that any coherence between the bright and dark states has been destroyed.
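A back-of-the-envelope count makes "a lot of light" concrete. The sketch assumes each scattered photon is detected independently with probability η (taken as the 5% collection figure above), that there is no background light, and that a bright ion is misread as dark only if no photon is detected. Requiring a misreading probability of at most ε then needs N ≥ ln ε / ln(1 − η) scattered photons. The function names are hypothetical.

```python
import math


def photons_to_scatter(collection_efficiency, miss_probability):
    """Smallest number of scattered photons N such that a bright ion yields
    zero detected photons with probability at most `miss_probability`.

    Assumes independent detection with probability `collection_efficiency`
    per photon and no background counts.
    """
    return math.ceil(math.log(miss_probability) / math.log(1.0 - collection_efficiency))


def expected_lost(collection_efficiency, n_scattered):
    """Mean number of scattered photons that never reach the detector."""
    return n_scattered * (1.0 - collection_efficiency)


if __name__ == "__main__":
    eta, eps = 0.05, 0.01  # 5% collection, 1% chance of misreading bright as dark
    n = photons_to_scatter(eta, eps)
    print(f"scatter {n} photons, of which about {expected_lost(eta, n):.1f} are lost")
```

Under these assumptions about 90 photons must be scattered, and roughly 85 of them carry away information that the experimenter never sees.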

In the off-resonant scheme almost all of the probe light can be collected, so a clever experimentalist has access to nearly all of the available information. After a short interaction time the measured state will in general change, but only in proportion to the amount of information gained. The point is that, armed with a complete measurement record, it is possible to track the evolution of the state from an initial superposition to a final projected outcome. This kind of measurement is known as a weak quantum nondemolition (QND) measurement, and it has been demonstrated in several experiments involving ensembles of monatomic gases in various parameter regimes. One important consequence of this kind of measurement is that the projective outcome is a highly nonclassical state with strong quantum coherence among all of the atoms in the ensemble. While the experimental reality of photon scattering ultimately prevents the system from reaching this eigenstate, an intermediate squeezed state, in which the uncertainty of the measured observable is reduced below the standard quantum limit, has been observed on several occasions [8–10].
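In a linearized Gaussian picture, which is an assumption valid only while the state stays close to a spin coherent state, a QND measurement of a conserved collective observable J_z reduces its conditional variance exactly like a Bayesian update of a Gaussian prior. The sketch assumes a record whose time integral is Y(t) = 2√κ t J_z + W(t), with κ a hypothetical measurement strength and W a Wiener process; the conditional variance is then V(t) = V_0/(1 + 4κV_0 t). For n qubits in a spin coherent state V_0 = n/4, so any t > 0 gives a conditional variance below the standard quantum limit. The Monte Carlo check below uses hypothetical function names.

```python
import random
import statistics


def conditional_variance(v0, kappa, t):
    """Posterior variance of J_z after integrating the record for time t."""
    return v0 / (1.0 + 4.0 * kappa * v0 * t)


def conditional_mean(y_total, v0, kappa, t):
    """Posterior mean of J_z given the integrated record Y (zero-mean prior)."""
    return 2.0 * kappa**0.5 * y_total * conditional_variance(v0, kappa, t)


def simulate_errors(n_atoms, kappa, t, trials, seed=1):
    """Estimation errors J_z - E[J_z | Y] for records drawn from the assumed model."""
    rng = random.Random(seed)
    v0 = n_atoms / 4.0  # spin coherent state variance (standard quantum limit)
    errors = []
    for _ in range(trials):
        jz = rng.gauss(0.0, v0**0.5)  # "true" value drawn from the Gaussian prior
        y_total = 2.0 * kappa**0.5 * t * jz + rng.gauss(0.0, t**0.5)
        errors.append(jz - conditional_mean(y_total, v0, kappa, t))
    return errors


if __name__ == "__main__":
    n, kappa, t = 100, 1.0, 0.1
    errs = simulate_errors(n, kappa, t, trials=20000)
    print("predicted conditional variance:", conditional_variance(n / 4.0, kappa, t))
    print("empirical error variance:      ", statistics.pvariance(errs))
    print("standard quantum limit n/4:    ", n / 4.0)
```

This captures only the Gaussian (squeezing) stage; it says nothing about the eventual approach to a J_z eigenstate, which requires the nonlinear treatment discussed in this dissertation.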

This dissertation focuses on modeling a quantum atomic system interacting with a quantized traveling-wave optical probe when that field is also measured continuously in time. The most interesting quantum effects occur in the idealized case with no loss of light and a noise-free measurement, which is the only case considered here. The progression from an initial superposition state to a final measurement eigenstate is neither a time-stationary process nor a linear transformation, so a model capable of tracking the full transition must be both time-adaptive and nonlinear. Finding such a mathematical description is not a trivial exercise, but it is one that has been studied extensively.
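The nonlinearity is already visible for a single qubit. As a sketch under assumed conventions (measurement strength κ and record dy = 2√κ⟨σ_z⟩dt + dW, neither taken from this dissertation), the conditional expectation z = ⟨σ_z⟩ for an ideal QND measurement of σ_z obeys the Ito equation dz = 2√κ(1 − z²)(dy − 2√κ z dt). In terms of the integrated record Y(t) its solution is z(t) = tanh(artanh z_0 + 2√κ Y(t)). The update is nonlinear in the data, each trajectory is driven toward ±1, and the fraction of trajectories ending near +1 reproduces the Born rule (1 + z_0)/2. Function names and parameter values are illustrative.

```python
import math
import random


def conditional_z(z0, kappa, y_total):
    """Filtered <sigma_z> given the integrated measurement record Y(t)."""
    return math.tanh(math.atanh(z0) + 2.0 * math.sqrt(kappa) * y_total)


def simulate_final_z(z0, kappa, t_final, trials, seed=7):
    """Sample records with the exact statistics of a QND sigma_z measurement
    and return the filtered <sigma_z> at t_final for each record."""
    rng = random.Random(seed)
    finals = []
    for _ in range(trials):
        # The eigenvalue is +1 with Born-rule probability (1 + z0)/2.
        eigenvalue = 1.0 if rng.random() < 0.5 * (1.0 + z0) else -1.0
        # Given the eigenvalue, Y(t) = 2 sqrt(kappa) * eigenvalue * t + W(t).
        y_total = 2.0 * math.sqrt(kappa) * eigenvalue * t_final + rng.gauss(0.0, math.sqrt(t_final))
        finals.append(conditional_z(z0, kappa, y_total))
    return finals


if __name__ == "__main__":
    z0 = 0.2
    finals = simulate_final_z(z0, kappa=1.0, t_final=5.0, trials=5000)
    frac_up = sum(z > 0 for z in finals) / len(finals)
    print("fraction collapsing to +1:", frac_up, "(Born rule:", 0.5 * (1 + z0), ")")
    print("mean final <sigma_z>:", sum(finals) / len(finals), "(initial:", z0, ")")
```

The mean of the filtered expectation stays at z_0 (the filter is a martingale) even though every individual trajectory is driven to an eigenstate.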

The ultimate objective is to apply this continuous measurement model to the problem of quantum state tomography. Constructing an estimate of an arbitrary quantum state from experimental data is very resource intensive. Specifying an arbitrary quantum state of a d-dimensional system requires at most d² − 1 real parameters, and for each parameter, N uncorrelated projective measurements generally give an accuracy that scales as 1/√N. Through an alternative protocol proposed by Silberfarb et al., these inefficiencies can be largely sidestepped by applying a weak continuous measurement plus a well-chosen dynamical control collectively to an ensemble of identically prepared systems [11]. If the control drives the system in such a way as to make the measurement informationally complete, then a single measurement record encodes information about all d² − 1 parameters, albeit with varying levels of certainty and noise corruption.
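The resource cost of conventional tomography can be sketched under a simplifying assumption: each real parameter is estimated independently from the relative frequency of a two-outcome projective measurement, so its standard error is √(p(1 − p)/N). Real tomography schemes share data across measurement settings, so this is only an order-of-magnitude illustration. For d = 16, the dimension of the Cs ground-state hyperfine manifold, there are 255 parameters. Function names are hypothetical.

```python
import math
import random


def num_parameters(d):
    """Real parameters in a d-dimensional density matrix (Hermitian, unit trace)."""
    return d * d - 1


def standard_error(p, n_shots):
    """Binomial standard error of a frequency estimate of probability p."""
    return math.sqrt(p * (1.0 - p) / n_shots)


def empirical_spread(p, n_shots, repeats, seed=3):
    """Spread of frequency estimates over many repeated simulated experiments."""
    rng = random.Random(seed)
    estimates = [sum(rng.random() < p for _ in range(n_shots)) / n_shots for _ in range(repeats)]
    mean = sum(estimates) / repeats
    return math.sqrt(sum((e - mean) ** 2 for e in estimates) / repeats)


if __name__ == "__main__":
    print("parameters for d = 16:", num_parameters(16))
    for n_shots in (100, 400, 1600):
        # Quadrupling the number of shots only halves the error.
        print(n_shots, standard_error(0.5, n_shots), empirical_spread(0.5, n_shots, 500))
```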

In particular, we consider an atomic ensemble prepared in an identical tensor product state ρ_tot = ρ_0^⊗n that evolves under a known Hamiltonian while simultaneously coupled to a traveling-wave probe via a collective degree of freedom. A continuous measurement of this probe then generates a measurement record that is strongly correlated with the evolution of the system. With sufficient signal to noise, a statistical estimate of an unknown initial system state will in general have high fidelity with the true initial condition. Using the weak measurement generated by an off-resonant probe, this state reconstruction procedure has been implemented in the laboratory, allowing reconstruction of the full hyperfine (d = 16) ground-state manifold of a laser-cooled ensemble of neutral Cs atoms [12, 13]. However, these experiments were performed in a parameter regime where the information lost to the environment dominated over any measurement-induced backaction. Chap. 5 explores how this procedure performs in the opposite regime, where decoherence is negligible and we retain a complete measurement record. Arriving at this result requires several intermediate steps, in particular a detailed understanding of how a classical statistical estimate is made and how it is applied to a quantum system.
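What "identical product state" and "collective degree of freedom" mean can be checked by brute force for a few qubits. The sketch assumes a pure single-qubit state ρ_0 with Bloch angles (θ, φ) and the collective operator J_z = ½ Σ_i σ_z^(i). For ρ_0^⊗n the collective moments follow from the single-qubit Bloch vector: ⟨J_z⟩ = (n/2)cos θ and Var(J_z) = (n/4)sin²θ. Function names are hypothetical, and n is kept small because the matrices grow as 2^n.

```python
from functools import reduce

import numpy as np


def qubit_state(theta, phi):
    """Pure qubit density matrix with Bloch vector at polar angle theta, azimuth phi."""
    psi = np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])
    return np.outer(psi, psi.conj())


def collective_jz(n):
    """J_z = (1/2) * sum_i sigma_z^(i) acting on n qubits."""
    sz = np.diag([1.0, -1.0])
    eye = np.eye(2)
    total = np.zeros((2**n, 2**n))
    for i in range(n):
        ops = [sz if k == i else eye for k in range(n)]
        total += 0.5 * reduce(np.kron, ops)
    return total


def jz_moments(rho0, n):
    """Mean and variance of J_z in the identical product state rho0^(tensor n)."""
    rho_tot = reduce(np.kron, [rho0] * n)
    jz = collective_jz(n)
    mean = np.trace(rho_tot @ jz).real
    second = np.trace(rho_tot @ jz @ jz).real
    return mean, second - mean**2


if __name__ == "__main__":
    theta, phi, n = 1.0, 0.3, 4
    mean, var = jz_moments(qubit_state(theta, phi), n)
    print(mean, n / 2 * np.cos(theta))
    print(var, n / 4 * np.sin(theta) ** 2)
```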

The estimation of a possibly random signal from an observation corrupted by unwanted noise is known as filtering, and it originates in the work of Wiener [14], where the signal was assumed to have time-stationary statistics. For a linear system with additive Gaussian noise, the optimum estimate of a continuous nonstationary signal was computed by Kalman and Bucy [15]; their filter is an indispensable tool in engineering, classical signal processing, and control. Not surprisingly, estimating a nonlinear signal is significantly more challenging than estimating a linear one. Nonlinear classical filtering began with Stratonovich [16] and was later expressed in the useful language of Ito calculus by Kushner [17]. Important contributions were made by Kallianpur and Striebel [18] and by Zakai [19], steps that are particularly useful in formulating a quantum analog. Nonlinear filtering theory has an extremely wide range of applications, including GPS-based navigation, optimal stochastic control, financial portfolio optimization, audio and image noise removal and enhancement, speech recognition, and weather prediction [20]. For each application there is a rich body of literature with a wealth of numerical and approximation methods, allowing for practical and, in some cases, real-time implementations.
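The Kalman-Bucy filter is formulated in continuous time. As a minimal illustration of the same linear-Gaussian idea, the sketch below swaps in the standard discrete-time scalar Kalman filter for the assumed model x_{k+1} = a x_k + w_k, y_k = x_k + v_k with Gaussian noises of variances q and r. The parameter values and function names are hypothetical.

```python
import random


def kalman_filter(observations, a, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for x_{k+1} = a*x_k + w_k, y_k = x_k + v_k.

    w_k ~ N(0, q) is process noise and v_k ~ N(0, r) is measurement noise.
    Returns the filtered means and variances after each observation.
    """
    x_est, p_est = x0, p0
    means, variances = [], []
    for y in observations:
        # Predict: propagate the estimate through the linear dynamics.
        x_pred = a * x_est
        p_pred = a * a * p_est + q
        # Update: blend prediction and observation using the Kalman gain.
        gain = p_pred / (p_pred + r)
        x_est = x_pred + gain * (y - x_pred)
        p_est = (1.0 - gain) * p_pred
        means.append(x_est)
        variances.append(p_est)
    return means, variances


def simulate(steps, a, q, r, seed=11):
    """Generate a hidden signal and its noisy observations from the same model."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)  # initial state drawn from the filter's prior N(0, 1)
    states, observations = [], []
    for _ in range(steps):
        x = a * x + rng.gauss(0.0, q**0.5)
        states.append(x)
        observations.append(x + rng.gauss(0.0, r**0.5))
    return states, observations


def mean_squared_error(xs, ys):
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


if __name__ == "__main__":
    a, q, r = 0.95, 0.1, 1.0
    states, obs = simulate(5000, a, q, r)
    means, variances = kalman_filter(obs, a, q, r)
    print("raw observation MSE:  ", mean_squared_error(obs, states))
    print("filtered MSE:         ", mean_squared_error(means, states))
    print("filter's own variance:", variances[-1])
```

Each update uses only the current estimate and the newest observation, which is the Markovian structure that makes such filters practical in real time.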

One of the fundamental tools that makes continuous-time classical estimation possible is stochastic calculus. In the same way that a global function can be built up from an integral over local infinitesimals, a random signal can be constructed from random increments. For the filtering problem to be remotely tractable, the random increments must originate from an uncorrelated noise source; that is, the fundamental noise injected into an otherwise deterministic system is assumed to be uncorrelated white noise. Under this assumption the filtering equation is Markovian, meaning that the estimate is updated using only its current value and the latest measurement increment. Classically, a white noise approximation is often well justified when the input noise is actually the aggregate effect of a large number of uncorrelated sources. The canonical example is a particle undergoing Brownian motion. Each impulse, a collision with a background molecule, imparts a small amount of momentum to the larger particle. For times longer than the time between collisions, the net displacement over an interval is uncorrelated with the displacement over earlier, nonoverlapping intervals. In this case not only are the collisions uncorrelated, but the particle’s displacement is also Gaussian distributed, with a variance that grows in proportion to time. Brownian motion is the classic example of a system driven by Gaussian white noise, but because of the central limit theorem such models are ubiquitous in systems with continuous trajectories.
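The linear growth of the variance can be reproduced with a crude random-walk model. The simplifying assumption here is that each collision displaces the particle by a fixed step ±δ with equal probability, independently of all other collisions; real collisions transfer momentum, and this picture applies only on timescales longer than the velocity correlation time. The function name is hypothetical.

```python
import random
import statistics


def random_walk_positions(n_collisions, n_particles, step=1.0, seed=5):
    """Net displacement of each particle after n_collisions random kicks.

    Simplifying assumption: each collision displaces the particle by +step or
    -step with equal probability, independently of all other collisions.
    """
    rng = random.Random(seed)
    return [
        sum(rng.choice((-step, step)) for _ in range(n_collisions))
        for _ in range(n_particles)
    ]


if __name__ == "__main__":
    for n in (25, 100, 400):
        positions = random_walk_positions(n, 2000)
        # The variance grows linearly with the number of collisions (i.e., with time).
        print(n, statistics.pvariance(positions))
```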

Working with white noise directly adds another layer of sophistication to an already mathematically challenging topic [21]. Among other problems, because white noise is defined to have completely uncorrelated fluctuations at every instant, it is necessarily discontinuous: continuity from one point to the next would imply correlations. To build a mathematical framework that is both useful and provably consistent, the hard-learned lesson is to frame the problem not in terms of the white noise itself but in terms of its integral, which is at least continuous¹ [22]. The most widely used representation of an integral over Gaussian white noise (in other words, a mathematical model for Brownian motion) is the Wiener process. Chap. 3 reviews many of its defining properties, which we later leverage to construct a maximum likelihood estimate of an initial quantum state from a polarimetry measurement.
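Two standard properties of the Wiener process can be checked numerically: W(T) has variance T, and the sum of squared increments (the quadratic variation) converges to T as the time grid is refined, the origin of the Ito rule dW² = dt. The sketch builds paths from independent N(0, dt) increments; the function names are hypothetical.

```python
import math
import random


def wiener_path(t_final, n_steps, rng):
    """Sample W(t) on a uniform grid using independent N(0, dt) increments."""
    dt = t_final / n_steps
    path = [0.0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path


def quadratic_variation(path):
    """Sum of squared increments; converges to t_final as the grid is refined."""
    return sum((b - a) ** 2 for a, b in zip(path, path[1:]))


if __name__ == "__main__":
    rng = random.Random(2)
    for n_steps in (10, 1000, 100000):
        print(n_steps, quadratic_variation(wiener_path(1.0, n_steps, rng)))
```

Unlike a smooth function, whose quadratic variation vanishes as the grid is refined, a Wiener path keeps accumulating squared increments at rate one.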

Beyond a successful model for integrated white noise, a classical filter needs to be able to manipulate these integrals within a full-fledged calculus. The subtlety of dealing with a randomized integral is that different limiting approximations lead to fundamentally different stochastic calculi. The two most common forms of stochastic integration are the Stratonovich integral and the Ito integral, which have different calculus rules and statistical properties [22]. Both forms of integration are used here, and a brief review is presented in Appendix B. One drawback of formulating a model with a stochastic integral is that its predictions can depend dramatically on which kind of integral is used. At one level, the choice of integral is no different from the many other approximations one makes in formulating a statistical model of a given physical system.

¹The calculus of randomized distributions characterizing white noise is referred back to the integrated expressions anyway [21].
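The dependence on the choice of integral shows up in the simplest case, ∫₀ᵀ W dW. Evaluating the integrand at the left endpoint of each interval (Ito) converges to (W(T)² − T)/2, while averaging the two endpoints (Stratonovich) gives W(T)²/2 exactly, because the sum telescopes. The sketch below, with a hypothetical function name, computes both Riemann-type sums on the same sampled path.

```python
import math
import random


def stochastic_integrals(t_final=1.0, n_steps=100000, seed=4):
    """Return W(T) and the Ito and Stratonovich sums for the integral of W dW."""
    rng = random.Random(seed)
    dt = t_final / n_steps
    w = 0.0
    ito = 0.0
    strat = 0.0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        ito += w * dw                 # integrand at the left endpoint (Ito)
        strat += (w + 0.5 * dw) * dw  # average of both endpoints (Stratonovich)
        w += dw
    return w, ito, strat


if __name__ == "__main__":
    w, ito, strat = stochastic_integrals()
    print("Ito sum:", ito, " vs (W^2 - T)/2 =", 0.5 * (w * w - 1.0))
    print("Stratonovich sum:", strat, " vs W^2/2 =", 0.5 * w * w)
```

The two answers differ by T/2, a finite amount that does not vanish as the grid is refined; this is the Ito correction that appears when converting between the two calculi.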

The lessons learned in developing a stochastic calculus for classical systems have also been applied to quantum models. In the mid-1980s Hudson and Parthasarathy developed an operator-valued quantum version of the classical Ito calculus [23]. This quantum Ito calculus is indispensable in modeling open quantum optical systems and has been applied not only to continuous quantum measurement [24, 25] but also to quantum control (see [26] for an overview and introduction).
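For reference, the core of the Hudson–Parthasarathy calculus is a multiplication table for the annihilation, creation, and gauge (number) process increments dA_t, dA_t^†, and dΛ_t. The table below is the standard form for a vacuum (Fock) input field, quoted from general treatments rather than from this dissertation, whose notation may differ. It plays the role of the classical rule dW² = dt, together with the quantum Ito product rule:

```latex
dA_t\, dA_t^\dagger = dt, \qquad
dA_t\, d\Lambda_t = dA_t, \qquad
d\Lambda_t\, dA_t^\dagger = dA_t^\dagger, \qquad
d\Lambda_t\, d\Lambda_t = d\Lambda_t,
\\[4pt]
\text{all other products (e.g. } dA_t^\dagger dA_t,\ dA_t\, dA_t,\ dt\, dA_t\text{) vanish},
\qquad
d(XY) = (dX)\,Y + X\,(dY) + (dX)(dY).
```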

The Ito integral is based upon the assumption that the noise is completely uncorrelated, so that the integral over times [t0, t1) is independent of the integral over times [t1, t2) for any times 0 ≤ t0 < t1 < t2, no matter how small the differences. For an actual Brownian particle this is only an approximation, as the particle’s velocity remains correlated over times shorter than the interval between collisions. The quantum Ito integral makes an analogous assumption: the operators representing such integrals over nonoverlapping time intervals commute, no matter how short the intervals. Before applying Hudson and Parthasarathy’s formalism to the laser-probed system, Chap. 2 investigates on what timescales such an approximation applies, given that the resulting operators must also be consistent with a quasi-monochromatic description of a traveling-wave light field.

The similarities between classical filtering and a quantum system subject to an indirect measurement should not be ignored. In a classical setting one seeks an estimate of an unobserved system state consistent with a noisy measurement. In a quantum system the fundamental goal is to predict the results of future measurements accurately and consistently with past measurements. The unobserved atomic system is then the estimated quantity, and the measured probe provides the noisy data. To fully exploit this similarity and apply the techniques developed for classical systems, several fundamental questions about the nature of quantum statistics and classical probability must be addressed.

The development of a continuous-time filter for a quantum system was pioneered by Belavkin starting in the early 1980s [24, 27–29]. These mathematically rigorous results developed and applied a deep relation between the algebraic and commutative properties of operators on a Hilbert space and the expression of classical stochastic processes. In elementary quantum theory, experimental observables are represented by Hermitian operators, and the random outcome of a measurement corresponds to an eigenvalue of the measured operator. The connection is then made through two insights. The first is to associate operators with random variables. In classical probability all random variables are consistent, in the sense that they all agree on which underlying outcome of the system occurred, no matter in what order they are queried. For quantum systems it is a hard-learned fact that only commuting operators return consistent results. Thus, the second insight is that in order to use a sequence of measurements for statistical inference, all of the measured operators must commute. Additionally, any operator whose statistics we wish to infer must also commute with all measurements to date. The utility of considering sets of commuting operators for statistical inference is better known in the physics community as the defining property of QND measurement [30]. Working within these limitations, noncommutativity is no longer an issue, leading to a genuine and useful mapping between quantum measurements and classical probabilities.
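The consistency requirement can be checked directly for small systems. The NumPy sketch below, with hypothetical function names, computes the joint outcome probabilities of two sequential projective measurements in both orders. For commuting operators (σ_z on two different qubits) the joint statistics do not depend on the order, so they define a single classical joint distribution. For noncommuting operators (σ_z and σ_x on one qubit) the order matters, and no such joint distribution exists.

```python
import numpy as np


def projectors(op):
    """Spectral projectors of a Hermitian operator, keyed by (rounded) eigenvalue."""
    vals, vecs = np.linalg.eigh(op)
    projs = {}
    for val, vec in zip(np.round(vals, 10), vecs.T):
        projs.setdefault(val, np.zeros(op.shape, dtype=complex))
        projs[val] = projs[val] + np.outer(vec, vec.conj())
    return projs


def sequential_joint(psi, first, second):
    """P(a, b) for measuring `first` and then `second` projectively on |psi>."""
    joint = {}
    for a, pa in projectors(first).items():
        for b, pb in projectors(second).items():
            amp = pb @ pa @ psi
            joint[(a, b)] = float(np.vdot(amp, amp).real)
    return joint


def order_matters(psi, op1, op2):
    """True if the joint statistics depend on the measurement order."""
    p12 = sequential_joint(psi, op1, op2)
    p21 = sequential_joint(psi, op2, op1)
    return any(not np.isclose(p12[(a, b)], p21[(b, a)]) for (a, b) in p12)


if __name__ == "__main__":
    sz = np.array([[1, 0], [0, -1]], dtype=complex)
    sx = np.array([[0, 1], [1, 0]], dtype=complex)
    eye = np.eye(2, dtype=complex)
    up = np.array([1, 0], dtype=complex)
    pair = np.array([0.6, 0, 0, 0.8], dtype=complex)
    print("sigma_z vs sigma_x:", order_matters(up, sz, sx))
    print("sigma_z(1) vs sigma_z(2):", order_matters(pair, np.kron(sz, eye), np.kron(eye, sz)))
```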

A mapping between the quantum and classical descriptions of probability can be more than just a guiding principle. Through a formal isomorphism between commuting quantum operators and the language of classical filtering theory, all of the above classical results can be readily applied. The quantum filter developed by Belavkin is nothing more than a noncommutative analog of the classical Kushner-Stratonovich equation of nonlinear filtering [25]. Chap. 3 reviews how a mapping between quantum operators and a formal classical probability model is made. The purpose of this review is twofold. The first is to provide the necessary background for a quantum filter. The second is to show how the algebraic language of classical probability theory can be applied to quantum systems, thereby gaining new insight and intuition into the quantum/classical divide. When the quantum and classical descriptions coincide, nearly 50 years of engineering experience can either be applied immediately or adapted with some modifications. Finally, this chapter shows how the quantum filter is equivalent to a generalized measurement by making a unitary extension to a higher-dimensional Hilbert space.

Chap. 4 applies one such method to an ensemble of identical spins undergoing a polarimetry measurement under idealized conditions. Brigo et al. applied the methods of differential geometry to simplify a classical filter [31, 32]. This method of differential projections was adapted to quantum systems by van Handel and Mabuchi, who simplified a continuous quantum measurement of a strongly driven atom-cavity system by projecting it onto a manifold of states in which the cavity has a Gaussian Q-function [33]. The method has subsequently been applied to other cavity QED systems [34–36], to collective spin systems in a linearized-Gaussian regime [37], and to finding low-rank approximations to general master equations in Lindblad form [38]. Chap. 4 computes the orthogonal projection of the dynamics of an ensemble of n qubits onto the manifold of identical separable states of the form ρ^⊗n and numerically compares the accuracy of this approximation with the complete evolution.

Using the projection filter as a computational tool, Chap. 5 essentially turns the problem of quantum state tomography into a classical parameter estimation problem, where the classical parameters are the pointing angles of a spin coherent state constructed from the initial n qubits. The parameters are estimated by numerically computing a maximum likelihood estimate based upon a polarimetry measurement in a regime where the measurement statistics are strongly affected by quantum backaction. By including only the conditional effects present in the projection filter, we achieve an average reconstruction fidelity that nearly saturates the optimum bound attainable by any generalized measurement scheme [39].

1.0.1 A note on quantum foundations

Any work that addresses the quantum world, and in particular quantum measurement, eventually encounters some issue rooted in the foundations of quantum mechanics and the various interpretations one could assume. This dissertation does not address quantum mechanical foundations in any meaningful way and attempts to remain agnostic about the reality of a quantum state or even the existence of a more fundamental theory. Wherever possible we take a statistical perspective and implicitly assume that the simple models we construct may not be error free, in the sense that they do not capture the whole reality of a given experiment.

When considering quantum state tomography, we compare the conditional evolution of an ensemble of initial conditions and then select the state that maximizes a likelihood function. In our numerical simulations, no member of that ensemble corresponds with arbitrary precision to the initial condition used to simulate the measurement record. So in one sense the conditional state we calculate will always be incorrect. However, in a field where the ontology of a quantum state is still debated, we take a conservative position and will not presume to know that any conditional state is the true conditional state. Instead we take only the stance that what we calculate is a quantum state that best predicts any future measurement in a manner that is consistent with past results and the assumptions of the model.

We identify this state through the framework of quantum probability theory, a formalism that is less well known to physicists working in quantum information theory. The final object that we calculate is ultimately no different from what is given by the usual stochastic Schrödinger/master equations used in quantum optics. The purpose of working with quantum probability theory is that it illustrates an immediate connection to classical probability and estimation theory. In the classical setting, an estimator is a tool used to predict or estimate some quantity given a series of measurements. The stochastic Schrödinger equation is in a very real sense a quantum estimator. We would rather not comment on whether it is estimating the state of the system because it lacks knowledge of a theory extending beyond standard quantum mechanics, or whether it represents a fundamental limit beyond which no more information exists. In effect, we assume that the quantum state is simply a tool for making predictions about a quantum system.

1.1 An executive summary

This is a terse summary of the fundamental results of this dissertation, presented

in the same order as the subsequent chapters. This is not intended to be a gentle

introduction to the material and assumes a strong familiarity with the background

material. We encourage an interested but nonexpert reader not to struggle too

hard trying to comprehend this section and instead consult the main text and the

associated appendices.

If any single global thesis can be applied to the entirety of this work, it is that classical probabilistic methods are useful and that with some care they can be adapted

to quantum systems. The previous introduction discussed how to connect stochastic

calculus and nonlinear filtering theory to a quantum system continuously probed by

an optical field. Chaps. 2 and 3 provide a physical and mathematical foundation for

this connection while Chaps. 4 and 5 apply it to the specific problem of efficiently

estimating an initial qubit state. Chap. 6 discusses possible directions this work

could take. In addition to this main matter, we include several appendices providing

background material such as a review of paraxial optics (Appendix A), stochastic

differential equations (Appendix B), quantum stochastic differential equations (Appendix C), and the quantum Wong-Zakai theorem (Appendix D).

1.1.1 Quantum optics and quantum stochastic differential equations

Chap. 2 shows how a second quantized picture of classical traveling wave packets reproduces the mathematical structure necessary for defining a formal quantum Ito stochastic calculus. It also identifies the timescales for which a quasi-monochromatic field can be approximated as generating quantum white noise. This regime is independent of any system coupling or measurement apparatus and applies to a large family of states, including highly nonclassical states such as multimode Fock states.

The specific model we consider is the second quantization of quasi-monochromatic wave packets [40–42], where the single particle Hilbert space is the space of coherent state amplitudes for an associated classical field. We assume a paraxial model with a factorization between a carrier plane wave $\exp(-i\omega_0(t - z/c))$, a spatial mode function $u^{(+)}_T(x, y, z)$, and a longitudinal envelope function $f(t - z/c)$. The quasi-monochromatic approximation means that the longitudinal function must satisfy the inequality

$$|f(t)| \gg \frac{1}{\omega_0}\left|\frac{\partial}{\partial t} f(t)\right| \gg \frac{1}{\omega_0^2}\left|\frac{\partial^2}{\partial t^2} f(t)\right|.$$

We ultimately seek creation and annihilation operators that are simultaneously quasi-monochromatic and consistent with a quantum white noise approximation. To do so, we define $a^\dagger[f(0)]$ to be the operator that creates a single quantum in a given spatial mode $u^{(+)}_T(x, y, z)$, with an envelope function $f$, referenced to some point along the optical axis. The operator $a[f(t)]$ annihilates a quantum in a similar mode, but one that has experienced free propagation for a time $t$. We derive the unequal time commutation relation

$$\left[a[f(t_1)], a^\dagger[f(t_2)]\right] \propto e^{-i\omega_0(t_2 - t_1)}\left((f \star f)(t_2 - t_1) - \frac{i}{\omega_0}\left(\frac{df}{dt} \star f\right)(t_2 - t_1)\right), \tag{1.1}$$

where $g \star f$ is the cross-correlation function of $g$ and $f$, and the proportionality constant is simply a scaling factor that can be absorbed into the definition of $f$.

Physically it is entirely reasonable that if a classical envelope is no longer temporally

correlated then the associated field operators should commute. To the best of our

knowledge this is a new result in the characterization of quantized fields.

The canonical definition of quantum white noise is the existence of creation and annihilation operators satisfying $[a(t), a^\dagger(t')] \propto \delta(t - t')$. Therefore, for quasi-monochromatic light to be consistent with a white noise approximation, not only must $(f \star f)(t_2 - t_1) \to \delta(t_2 - t_1)$ in a suitable limit, but also $\frac{1}{\omega_0}\left(\frac{df}{dt} \star f\right)(t_2 - t_1) \to 0$.
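To make the second condition concrete, the sketch below evaluates both correlation functions in Eq. (1.1) for a Gaussian envelope $f(t) = e^{-t^2/2\sigma^2}$. The envelope shape, the grid, and the convention $(g \star f)(\tau) = \int g(t) f(t + \tau)\, dt$ for real functions are our assumptions, chosen for illustration. For this envelope the derivative term relative to the leading term is $e^{-1/2}/(\sqrt{2}\,\omega_0\sigma)$, so it is suppressed whenever $\omega_0 \sigma \gg 1$, which is the quasi-monochromatic regime.

```python
import numpy as np

def cross_correlation(g, f, t):
    """Riemann-sum estimate of (g ⋆ f)(tau) = ∫ g(t) f(t + tau) dt for real samples on a uniform grid t."""
    dt = t[1] - t[0]
    corr = np.correlate(f, g, mode="full") * dt      # entry i corresponds to the lag (i - (len(g) - 1)) * dt
    lags = (np.arange(corr.size) - (g.size - 1)) * dt
    return lags, corr

sigma = 1.0                                          # envelope duration (hypothetical units)
t = np.linspace(-10.0, 10.0, 4001)
f = np.exp(-t**2 / (2 * sigma**2))                   # Gaussian envelope f(t)
dfdt = np.gradient(f, t)

lags, ff = cross_correlation(f, f, t)                # (f ⋆ f)(tau)
_, dff = cross_correlation(dfdt, f, t)               # (df/dt ⋆ f)(tau)

# Closed forms for the Gaussian envelope, used as a check on the numerics
ff_exact = np.sqrt(np.pi) * sigma * np.exp(-lags**2 / (4 * sigma**2))
dff_exact = lags / (2 * sigma**2) * ff_exact

for omega0_sigma in (10.0, 100.0, 1000.0):
    omega0 = omega0_sigma / sigma
    ratio = np.max(np.abs(dff)) / (omega0 * np.max(ff))
    print(f"omega0*sigma = {omega0_sigma:7.1f}: relative size of derivative term = {ratio:.2e}")
```

The printed ratio falls off as $1/(\omega_0\sigma)$, so narrowing the envelope toward a delta-correlated limit only remains consistent with Eq. (1.1) as long as the envelope still spans many carrier periods.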

Starting from this approximation to white noise (in a rotating frame), we then use the limiting white noise operators and a recent theorem by Gough [43], reviewed in Appendix D, to consider the dispersive Faraday interaction in an idealized regime where the possibility of multiple scattering events is nonnegligible. The limiting object is a quantum stochastic Ito equation for the propagator that describes the joint unitary evolution of the field and the atomic system. Using well-known results in quantum stochastics, we write down the equivalent master equation in Lindblad form.

1.1.2 Classical and quantum probability theory

Chap. 3 is a mathematical review of well-known results from classical and quantum probability theory, which serves as a foundation for the novel work in later chapters. This review is written with physicists in mind and attempts to explain and justify the concepts while omitting the proofs. The end goal is to understand how the conditional master equation results from a mapping between sets of commuting operators and a classical probability space. This is a critical point, as the driving noise in the conditional master equation is not a Wiener process but is instead the random outcome of a continuous quantum limited measurement. The resulting classical stochastic process $\{y_t\}_{t\geq 0}$ is a Brownian motion only when (i) the measurement is of a field quadrature in the vacuum state and (ii) there is no system coupling to that quadrature, i.e., the measurement carries no system information.

The second objective of this chapter is to emphasize the general power of this technique and to discuss how the language of classical probability theory can be used to identify semiclassical subspaces embedded in a quantum system. In order to do so in a relatively self-contained manner, we review the basic elements of the triple $(\Omega, \mathcal{F}, \mathbb{P})$ forming a classical probability space. The infinite dimensional example we focus on is the space of sample paths of a Brownian motion, for which we explicitly describe the relevant σ-algebra. In order to introduce the quantum conditional expectation, we first review the classical conditional expectation and, more generally, how expectation values are computed in the measure theoretic framework. We then introduce the concepts of time-adapted processes and martingales, as both are crucial in the quantum case. The review of classical probability theory concludes by discussing the Wiener process and the Wiener measure over the space of continuous functions.
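As a numerical companion to the Wiener process review, the sketch below samples discretized Brownian paths and checks the defining statistics: zero mean, $\mathbb{E}[W_s W_t] = \min(s, t)$, and a discrete quadratic variation $\sum_i (\Delta W_i)^2$ that concentrates at $T$. The time step, path count, and seed are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
T, steps, paths = 1.0, 500, 5000
dt = T / steps

# A Wiener process has independent Gaussian increments with mean 0 and variance dt.
dW = rng.normal(0.0, np.sqrt(dt), size=(paths, steps))
W = np.cumsum(dW, axis=1)                    # W[:, j] is the path value at time (j + 1) * dt

mean_WT = W[:, -1].mean()                    # sample estimate of E[W_T] = 0
var_WT = W[:, -1].var()                      # sample estimate of E[W_T^2] = T
cov_half = np.mean(W[:, steps // 2 - 1] * W[:, -1])   # E[W_s W_T] = min(s, T) with s = T/2
qv = np.sum(dW**2, axis=1)                   # discrete quadratic variation of each path
print(mean_WT, var_WT, cov_half, qv.mean(), qv.std())
```

The spread of `qv` across paths shrinks like $\sqrt{2/\text{steps}}$, which is the numerical face of the statement that the quadratic variation of a Wiener path is $T$ almost surely.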

Having established classical probability theory on a firm footing, we then discuss the quantum analog. We explicitly show how one identifies a classical probability space from a set of mutually commuting observables by taking $\Omega$ as the set of possible eigenvalues, $\mathcal{F}$ as the σ-algebra generated by those eigenvalues, and the probability measure $\mathbb{P}$ as the quantum expectation of the associated projectors under the state ρ. From this semiclassical description we then introduce the noncommutative analog, where one omits a sample space of compatible outcomes, identifies the σ-algebra with a ∗-algebra of operators (or a von Neumann algebra in the infinite dimensional case), and identifies the probability measure with a valid quantum state ρ. We then explain the power of generating sub-∗-algebras from sets of operators, focusing on the important object of the commutant. We specifically explain how the commutant is the largest space of operators that we can condition on a sequence of commuting observations and how it contains noncommuting elements. Armed with that description, we identify the properties of the quantum conditional expectation and provide an explicit construction showing how it is in correspondence with the generalized measurements found in quantum information theory. From the discussion of the quantum conditional expectation, we then state the resulting quantum filter as it is generated from the observation process $Y_t = U_t^\dagger (A_t + A_t^\dagger) U_t$ under vacuum expectation.
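The construction of $(\Omega, \mathcal{F}, \mathbb{P})$ from commuting observables can be checked on a small example. In this sketch the two observables $\sigma_z \otimes \sigma_z$ and $\sigma_x \otimes \sigma_x$ and the random two-qubit state are our own illustrative choices: $\Omega$ is the set of joint eigenvalue pairs, and $\mathbb{P}(\omega) = \mathrm{Tr}(\rho\, \Pi_\omega)$ uses the joint spectral projectors. Because the observables commute, these numbers form a genuine probability distribution whose moments reproduce the quantum expectation values.

```python
import numpy as np
from itertools import product

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

A = np.kron(sz, sz)            # two commuting observables, each with spectrum {+1, -1}
B = np.kron(sx, sx)

def spectral_projector(op, eigval):
    """Projector onto the eigval eigenspace of an observable whose square is the identity."""
    return (np.eye(op.shape[0]) + eigval * op) / 2

# A random two-qubit density operator (hypothetical test state)
rng = np.random.default_rng(3)
G = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
rho = G @ G.conj().T
rho /= np.trace(rho)

# Sample space Omega = joint eigenvalue pairs (a, b); P(omega) = Tr(rho Pi_a^A Pi_b^B)
P = {}
for a, b in product((1, -1), repeat=2):
    Pi = spectral_projector(A, a) @ spectral_projector(B, b)
    P[(a, b)] = np.trace(rho @ Pi).real

mean_A = sum(a * p for (a, _), p in P.items())
mean_AB = sum(a * b * p for (a, b), p in P.items())
print(P, mean_A, np.trace(rho @ A).real)
```

Replacing `B` with a noncommuting observable such as `np.kron(sx, np.eye(2))` breaks this construction, since the product of spectral projectors is then no longer a projector.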

While the quantum filter is an elegant expression for a conditional operator, it rarely closes to a finite set of quantum stochastic differential equations. Rather, it is more useful to work with an effectively semiclassical equation, the conditional master equation. Here we use the term semiclassical not to imply a suboptimal approximation but to indicate that the quantum measurement process $\{Y_t\}_{t\geq 0}$ (a family of operators) is treated as a classical stochastic process $\{y_t\}_{t\geq 0}$ (a family of classical random variables defined on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$). The probability measure $\mathbb{P}$ matches the statistics of $\{y_t\}_{t\geq 0}$ to the quantum measurement statistics of $\{Y_t\}_{t\geq 0}$. While this mapping “demotes” the measurement operators to a classical process, it still treats the system quantum mechanically by propagating a density operator $\{\rho_t\}_{t\geq 0}$. Generally the statistics of $\{y_t\}_{t\geq 0}$ depend upon a quantum system expectation value, so this is a semiclassical rather than a fully classical probability model. This system density operator matches the quantum conditional expectation by enforcing the equality

$$\pi_t(X)\big|_{Y_t = y_t} = \mathrm{Tr}(\rho_t X) \tag{1.2}$$

for every system operator $X$ and time $t$.

The quantum filter is derived in terms of a quantum Ito equation, and so the resulting semiclassical conditional master equation is a matrix-valued classical Ito equation. Chap. 4 requires it to be expressed in terms of a Stratonovich integral, and so we derive the associated correction term here. The chapter closes by finding a conditional Schrödinger equation that corresponds to the more general master equation in the case of pure states. This equation is useful for numerical simulation, as propagating a complex vector is more efficient than propagating a complex matrix.
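The Ito to Stratonovich conversion is easiest to see in a scalar analog, which we use here in place of the matrix-valued correction map derived in Chap. 3. For $dx = b(x) \circ dW$ the equivalent Ito equation gains the drift $\tfrac{1}{2} b(x) b'(x)$. With the illustrative choice $b(x) = b x$, the Stratonovich solution is $e^{b W_T}$. The sketch integrates the corrected Ito form with Euler-Maruyama and compares it against the uncorrected reading of the same equation.

```python
import numpy as np

def euler_maruyama(drift, diffusion, x0, dW, dt):
    """Integrate the Ito SDE dx = drift(x) dt + diffusion(x) dW along the given increments."""
    x = x0
    for dw in dW:
        x = x + drift(x) * dt + diffusion(x) * dw
    return x

b, T, N = 0.5, 1.0, 20000
dt = T / N
rng = np.random.default_rng(0)
dW = rng.normal(0.0, np.sqrt(dt), N)
W_T = dW.sum()

# Stratonovich SDE dx = b x ∘ dW has the exact solution x_T = exp(b W_T).
strat_exact = np.exp(b * W_T)
# Ito form of the same SDE: add (1/2) b(x) b'(x) = (1/2) b^2 x to the drift.
x_corrected = euler_maruyama(lambda x: 0.5 * b**2 * x, lambda x: b * x, 1.0, dW, dt)
# Reading dx = b x dW as an Ito equation instead gives exp(b W_T - b^2 T / 2).
x_naive = euler_maruyama(lambda x: 0.0, lambda x: b * x, 1.0, dW, dt)
print(strat_exact, x_corrected, x_naive, np.exp(b * W_T - 0.5 * b**2 * T))
```

The two readings differ by the deterministic factor $e^{-b^2 T/2}$, which is why the correction term must be tracked explicitly before a Stratonovich-only operation such as projection is applied.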

1.1.3 Projection filtering for qubit ensembles

Chap. 4 derives an approximate form of the conditional master equation for an ensemble of n qubits under the assumption that the state remains nearly an identical separable state. The approximation is made through a technique known as projection filtering, developed to reduce the dimension of a classical filtering equation by formulating the space of solutions as a Riemannian manifold and then making an orthogonal projection onto a lower dimensional manifold. The lower dimensional manifold that we wish to project onto is the space of density matrices that can be written as $\varrho = \rho^{\otimes n}$ for some valid single qubit state ρ. The appeal of the projection filtering technique is that it is algorithmic in nature: after identifying the desired manifold and choosing a metric, finding the optimal projection reduces to a problem of matrix algebra. Due to the simplicity of the qubit, we are able to solve for this projection analytically.

A third of this chapter reviews the fundamentals of differential geometry, focusing on the mapping between qubit states and the Bloch ball. We refer to the set of valid quantum states for a $d < \infty$ dimensional quantum system as $S(d)$ and to the three-dimensional unit ball as $B$. The metric we use is the trace inner product $\langle A, B\rangle_\varrho = \mathrm{Tr}(A^\dagger B)$ for $A, B \in T_\varrho S(d)$. From the standard mapping between points in the Bloch ball and qubit states, $\rho : B \subset \mathbb{R}^3 \to S(2)$,

$$\rho(x) = \frac{1}{2}\left(\mathbb{1} + x_i \sigma_i\right), \tag{1.3}$$

we identify a basis $D_i \equiv \frac{1}{2}\sigma_i$ (i.e., $D_i = \partial \rho(x)/\partial x_i$) for the tangent space $T_{\rho(x)}S(2)$. The resulting trace inner product induces a Euclidean metric on $B$,

$$\langle D_i, D_j \rangle_\rho = \frac{1}{4}\mathrm{Tr}(\sigma_i \sigma_j) = \frac{1}{2}\delta_{ij}. \tag{1.4}$$

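The manifold we ultimately want to consider is the set of density operators

$$\mathcal{P} \equiv \left\{ \rho(x)^{\otimes n} : x \in B \right\} \subset S(2^n). \tag{1.5}$$

Any derivative we define for $\varrho \in \mathcal{P}$ must distribute over the tensor product structure, and so we identify the tangent space

$$T_{\varrho(x)}\mathcal{P} = \mathrm{span}\left\{ D_i(x) = \sum_{\ell=1}^{n} \rho(x)^{\otimes \ell-1} \otimes \frac{1}{2}\sigma_i \otimes \rho(x)^{\otimes n-\ell} \right\}. \tag{1.6}$$

For $n \neq 1$, the metric on the Bloch ball induced from the trace inner product is no longer Euclidean. Instead it is given by the matrix

$$g_{ij}(x) = \mathrm{Tr}\left(D_i(x) D_j(x)\right) = \frac{n}{2^n}\left(1 + |x|^2\right)^{n-1}\delta_{ij} + \frac{n(n-1)}{2^n}\left(1 + |x|^2\right)^{n-2} x_k x_\ell\, \delta_{ki}\delta_{\ell j}. \tag{1.7}$$

This metric is nevertheless isotropic, which can be seen by converting to spherical coordinates. The resulting line element is

$$ds^2 = \frac{n}{2^n}\left(1 + r^2\right)^{n-1}\left(\frac{1 + n r^2}{1 + r^2}\,dr^2 + r^2\, d\theta^2 + r^2 \sin^2\theta\, d\phi^2\right). \tag{1.8}$$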
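The metric (1.7) and the line element (1.8) can be checked by brute force. This sketch builds the tangent vectors $D_i(x)$ of Eq. (1.6) explicitly with Kronecker products, forms the Gram matrix under the trace inner product, and compares it with the closed form. The test point $x$ and the range of $n$ are arbitrary choices. The eigenvalues of $g$ split into a doubly degenerate angular value and a radial value larger by the factor $(1 + n r^2)/(1 + r^2)$, as in Eq. (1.8).

```python
import numpy as np
from functools import reduce

PAULI = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def rho(x):
    """Single qubit state rho(x) = (1 + x_i sigma_i) / 2, Eq. (1.3)."""
    return 0.5 * (np.eye(2) + sum(xi * s for xi, s in zip(x, PAULI)))

def tangent_basis(x, n):
    """D_i(x) = sum_l rho^{⊗(l-1)} ⊗ sigma_i / 2 ⊗ rho^{⊗(n-l)}, Eq. (1.6)."""
    r = rho(x)
    return [sum(reduce(np.kron, [0.5 * s if k == l else r for k in range(n)]) for l in range(n))
            for s in PAULI]

def metric_numeric(x, n):
    """Gram matrix Tr(D_i D_j) of the tangent vectors (they are Hermitian)."""
    D = tangent_basis(x, n)
    return np.array([[np.trace(Di @ Dj).real for Dj in D] for Di in D])

def metric_formula(x, n):
    """Closed form of Eq. (1.7)."""
    x = np.asarray(x, dtype=float)
    q = 1 + x @ x
    return (n / 2**n) * q**(n - 1) * np.eye(3) + (n * (n - 1) / 2**n) * q**(n - 2) * np.outer(x, x)

x = np.array([0.3, -0.2, 0.5])
for n in (1, 2, 3, 4):
    print(n, np.max(np.abs(metric_numeric(x, n) - metric_formula(x, n))))
```

For `n = 1` the formula reduces to $\tfrac{1}{2}\delta_{ij}$, recovering the Euclidean metric of Eq. (1.4).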

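With this non-Euclidean metric, we then wish to apply to each term in the conditional master equation the projection map $\Pi_{\mathcal{P}} : T_\varrho S(2^n) \to T_\varrho \mathcal{P}$, defined as

$$\Pi_{\mathcal{P}}(X) = g^{ij}(x)\, \langle D_j(x), X\rangle_\varrho\, D_i(x), \tag{1.9}$$

where $g^{ij}(x)$ is the inverse of the metric $g_{ij}(x)$.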
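Eq. (1.9) is an orthogonal projection with respect to the trace inner product, which this sketch verifies numerically. We take $g^{ij}$ to be the matrix inverse of $g_{ij}$, and the traceless Hermitian test vector $X$, the point $x$, and $n = 3$ are illustrative choices. The checks are that applying the projection twice changes nothing, that tangent vectors $D_k$ are left unchanged, and that the residual $X - \Pi_{\mathcal{P}}(X)$ is orthogonal to every $D_j$.

```python
import numpy as np
from functools import reduce

PAULI = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def rho(x):
    return 0.5 * (np.eye(2) + sum(xi * s for xi, s in zip(x, PAULI)))

def tangent_basis(x, n):
    """D_i(x) of Eq. (1.6)."""
    r = rho(x)
    return [sum(reduce(np.kron, [0.5 * s if k == l else r for k in range(n)]) for l in range(n))
            for s in PAULI]

def project(X, x, n):
    """Pi_P(X) = g^{ij} <D_j, X> D_i with <A, B> = Tr(A^† B); g^{ij} applied via a linear solve."""
    D = tangent_basis(x, n)
    g = np.array([[np.trace(Di.conj().T @ Dj).real for Dj in D] for Di in D])
    c = np.array([np.trace(Dj.conj().T @ X) for Dj in D])       # <D_j, X>
    coeffs = np.linalg.solve(g, c)                               # g^{ij} <D_j, X>
    return sum(a * Di for a, Di in zip(coeffs, D)), coeffs

rng = np.random.default_rng(8)
n, x = 3, np.array([0.1, 0.4, -0.3])
G = rng.normal(size=(2**n, 2**n)) + 1j * rng.normal(size=(2**n, 2**n))
X = G + G.conj().T
X -= np.trace(X) / 2**n * np.eye(2**n)        # traceless Hermitian, i.e. a tangent vector of S(2^n)
PX, coeffs = project(X, x, n)
PPX, _ = project(PX, x, n)
print(np.allclose(PPX, PX), np.round(coeffs.real, 4))
```

Solving $g\,a = c$ rather than forming $g^{-1}$ explicitly is a standard numerical choice and gives the same coefficients.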

A general unconditioned master equation written in Lindblad form is

$$\frac{d}{dt}\varrho = -i[H, \varrho] + \mathcal{D}[L](\varrho) \tag{1.10}$$

for some Hamiltonian $H$ and jump operator $L$. As the master equation describes a valid quantum evolution, the right-hand side of this equation must describe a vector in the tangent space $T_\varrho S(2^n)$. Applying the projector $\Pi_{\mathcal{P}}$ to the general master equation results in a new master equation, describing the evolution of a modified state $\varrho|_{\mathcal{P}}$,

$$\frac{d}{dt}\varrho|_{\mathcal{P}} = g^{ij}(x)\left(\langle D_j(x), -i[H, \varrho(x)]\rangle_\varrho + \langle D_j(x), \mathcal{D}[L](\varrho)\rangle_\varrho\right) D_i(x). \tag{1.11}$$

This new equation is guaranteed both to produce a valid quantum evolution and to constrain the state to remain in $\mathcal{P}$. Performing this projection in the case of a conditional master equation is essentially no different, with one caveat due to the subtle nature of stochastic integration.
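For a single qubit with the jump operator $L = \sqrt{\kappa}\,\tfrac{1}{2}\sigma_z$ used later in this section, the dissipator in Eq. (1.10) is pure dephasing. This sketch assumes the standard form $\mathcal{D}[L](\rho) = L\rho L^\dagger - \tfrac{1}{2}\{L^\dagger L, \rho\}$ and evaluates it on an arbitrary Bloch vector. The transverse components decay at rate $\kappa/2$ while $z$ is untouched, which is consistent with the $-\tfrac{1}{2}\kappa x$ and $-\tfrac{1}{2}\kappa y$ terms in Eq. (1.18), since $\gamma(r) = 0$ when $n = 1$.

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def dissipator(L, rho):
    """Lindblad dissipator D[L](rho) = L rho L^† - (1/2){L^† L, rho}."""
    LdL = L.conj().T @ L
    return L @ rho @ L.conj().T - 0.5 * (LdL @ rho + rho @ LdL)

def bloch(op):
    """Components Tr(sigma_i op) of an operator."""
    return np.array([np.trace(op @ s).real for s in (sx, sy, sz)])

kappa = 2.0
L = np.sqrt(kappa) * 0.5 * sz
x = np.array([0.3, -0.5, 0.6])                     # arbitrary Bloch vector
rho = 0.5 * (np.eye(2) + x[0] * sx + x[1] * sy + x[2] * sz)
dxdt = bloch(dissipator(L, rho))
print(dxdt)                                         # equals (-kappa x / 2, -kappa y / 2, 0)
```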

Converting the derivative $\frac{d}{dt}\varrho|_{\mathcal{P}}$ into a differential form $d\varrho|_{\mathcal{P}}$ involves no ambiguity in interpretation, as the differential

$$d\varrho|_{\mathcal{P}} = a_i(x) D_i(x)\, dt \tag{1.12}$$

describes a valid mapping between the tangent space $T_t\mathbb{R}^+$ and the tangent space $T_{\varrho(x)}\mathcal{P}$. However, interpreting a general stochastic differential in terms of a differential form is problematic because the explicit path-wise derivative generally does not exist. Even if one were to solve the stochastic differential equation

$$d\varrho|_{\mathcal{P}} = B(x_t)\, dw_t \tag{1.13}$$

for $B(x_t) \in T_{\varrho(x_t)}\mathcal{P}$, there is no a priori reason to assume that the resulting solution will remain in $\mathcal{P}$. In developing the projection filtering technique, Brigo et al. found that a solution to a general Ito equation decidedly does not satisfy this property [31]. The problem is that the drift induced by the second order nature of the Ito rule causes the solution to leave $\mathcal{P}$ even when the integrand is in the proper tangent space. The saving grace is that the orthogonal projection method does constrain the solution when the original equation is written in Stratonovich form. The bottom line is that in order to project the conditional master equation onto the tangent space $T_\varrho\mathcal{P}$, it must first be written as a Stratonovich integral. Chap. 3 calculates the proper conversion factor, generating the Ito correction map $I_c[L](\varrho)$ for a general measurement operator $L$.
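A two-dimensional toy shows the failure mode described above. Here the unit circle plays the role of $\mathcal{P}$ (our choice of example), and the integrand $J x$ (a rotation generator) is always tangent to it. Read as an Ito equation, $dx = J x\, dW$ gives $d|x|^2 = |x|^2 dt$, so $|x_T|^2 = e^T$ and the solution leaves the circle. Read as a Stratonovich equation, integrated with the Heun predictor-corrector scheme, the solution is the rotation by angle $W_T$ and stays on the circle.

```python
import numpy as np

J = np.array([[0.0, -1.0], [1.0, 0.0]])     # J x is tangent to the unit circle at x

rng = np.random.default_rng(2)
T, N = 1.0, 10000
dt = T / N
dW = rng.normal(0.0, np.sqrt(dt), N)

x_ito = np.array([1.0, 0.0])
x_strat = np.array([1.0, 0.0])
for dw in dW:
    # Euler-Maruyama: converges to the Ito SDE dx = J x dW
    x_ito = x_ito + (J @ x_ito) * dw
    # Heun predictor-corrector: converges to the Stratonovich SDE dx = J x ∘ dW
    pred = x_strat + (J @ x_strat) * dw
    x_strat = x_strat + 0.5 * (J @ x_strat + J @ pred) * dw

W_T = dW.sum()
exact_strat = np.array([np.cos(W_T), np.sin(W_T)])     # rotation by angle W_T
print(np.linalg.norm(x_ito)**2, np.exp(T))              # Ito solution drifts off the circle
print(np.linalg.norm(x_strat), np.linalg.norm(x_strat - exact_strat))
```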

The conditional master equation is given by the Stratonovich equation

$$d\varrho = -i[H_{\mathrm{tot}}, \varrho]\,dt + \mathcal{D}[L_{\mathrm{tot}}](\varrho)\,dt + I_c[L_{\mathrm{tot}}](\varrho)\,dt + \mathcal{H}[L_{\mathrm{tot}}](\varrho)\, dv_t, \tag{1.14}$$

where the maps $\mathcal{D}[L](\cdot)$, $\mathcal{H}[L](\cdot)$, and $I_c[L](\cdot)$ are given in Eqs. (4.5), (4.6), and (4.10), respectively. The subscript tot indicates that these operators act on the joint Hilbert space of all n qubits.

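The projections of each term are computed relatively generally, but under the assumption that the operators $H_{\mathrm{tot}}$ and $L_{\mathrm{tot}}$ act identically and independently on each qubit in the ensemble. This means that for the single qubit operators $H$ and $L$ the joint operators are

$$H_{\mathrm{tot}} = \sum_{\ell=1}^{n} \mathbb{1}^{\otimes \ell-1} \otimes H \otimes \mathbb{1}^{\otimes n-\ell}, \tag{1.15}$$

$$L_{\mathrm{tot}} = \sum_{\ell=1}^{n} \mathbb{1}^{\otimes \ell-1} \otimes L \otimes \mathbb{1}^{\otimes n-\ell}. \tag{1.16}$$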
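Eqs. (1.15) and (1.16) are straightforward to build numerically. In this sketch, `collective` is a hypothetical helper name and `n = 4` and the control values are arbitrary. With $L = \sqrt{\kappa}\,\tfrac{1}{2}\sigma_z$ the joint operator is $\sqrt{\kappa}\, J_z$, whose spectrum runs from $-n/2$ to $n/2$ with binomial multiplicities, and $H = \tfrac{1}{2}\mathbf{f}\cdot\boldsymbol{\sigma}$ becomes $\mathbf{f}\cdot\mathbf{J}$.

```python
import numpy as np
from functools import reduce

def collective(op, n):
    """Sum_l 1^{⊗(l-1)} ⊗ op ⊗ 1^{⊗(n-l)}: identical, independent action on each qubit."""
    eye = np.eye(op.shape[0], dtype=complex)
    return sum(reduce(np.kron, [op if k == l else eye for k in range(n)]) for l in range(n))

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

n, kappa = 4, 1.0
Jx, Jy, Jz = (collective(0.5 * s, n) for s in (sx, sy, sz))
L_tot = collective(np.sqrt(kappa) * 0.5 * sz, n)        # gives sqrt(kappa) J_z
f = np.array([0.2, -0.1, 0.4])                           # hypothetical control values f^i at one instant
H_tot = collective(0.5 * (f[0] * sx + f[1] * sy + f[2] * sz), n)

print(np.round(np.linalg.eigvalsh(Jz), 10))              # -n/2, ..., n/2 with binomial multiplicities
```

The dense $2^n \times 2^n$ representation is only practical for small `n`. It is shown here to make the tensor structure explicit, not as an efficient simulation strategy.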

From the general expressions we also specialize to the example of $L = \sqrt{\kappa}\,\frac{1}{2}\sigma_z$ and $H = \frac{1}{2}\left(f^1(t)\sigma_x + f^2(t)\sigma_y + f^3(t)\sigma_z\right)$ for a constant rate κ and deterministic real-valued control fields $f^i(t)$. This specialized example corresponds to an idealized model of a dispersive measurement of a collective angular momentum component in the presence of a time-varying but uniform magnetic field. The final expression we calculate for this example, which we call the projection filter, is a system of coupled Ito stochastic differential equations for the single particle Bloch vector components $(x_t, y_t, z_t)$,

$$\begin{aligned} dx_t &= a_1(x, t)\, dt - \sqrt{\kappa}\, x z\, dv_t,\\ dy_t &= a_2(x, t)\, dt - \sqrt{\kappa}\, y z\, dv_t,\\ dz_t &= a_3(x, t)\, dt + \sqrt{\kappa}\,(1 - z^2)\, dv_t. \end{aligned} \tag{1.17}$$

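The deterministic integrands $a_i(x, t)$ are

$$\begin{aligned} a_1(x, t) &= f^2(t)\, z - f^3(t)\, y - \tfrac{1}{2}\kappa x + \kappa\,\gamma(r)\, x z^2,\\ a_2(x, t) &= f^3(t)\, x - f^1(t)\, z - \tfrac{1}{2}\kappa y + \kappa\,\gamma(r)\, y z^2,\\ a_3(x, t) &= f^1(t)\, y - f^2(t)\, x - \kappa (n-1)\left(\frac{1 - r^2}{1 + r^2}\right) z - \kappa\,\gamma(r)\, z^3, \end{aligned} \tag{1.18}$$

with the function

$$\gamma(r) \equiv (1 - r^2)\left(\frac{n(n+1)}{2(1 + n r^2)} - \frac{1}{1 + r^2}\right). \tag{1.19}$$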

Finally, the stochastic increment $dv_t$ is the innovation process, calculated from the measurement process $y_t$ (no relation to the Bloch vector component) with differential

$$dv_t = dy_t - n\sqrt{\kappa}\, z_t\, dt. \tag{1.20}$$

The nonlinear function γ(r) has two important zeros that simplify the projection filter dramatically. The first is that when n = 1, γ(r) = 0 for every value of r. Furthermore, it is easy to verify that for n = 1 the projection filter equations are identical to the conditional Bloch vector equations one obtains directly from the conditional master equation. In other words, the projected space is the whole manifold of solutions, $\mathcal{P} = S(2)$. The second zero is γ(1) = 0 for any n. The n- and r-dependent terms in $a_3(x, t)$ also vanish for r = 1, meaning that for any n the projected pure state evolution is essentially identical to the evolution of a single qubit state. The only remaining n dependence is that the innovation process requires the expected measurement outcome to be scaled by a factor of n.
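Both zeros of γ(r) are easy to confirm numerically. The sketch implements Eq. (1.19) directly, together with an equivalent factored form that we obtained by rearranging it, $\gamma(r) = (n-1)(1-r^2)(n+2+nr^2)/\left(2(1+r^2)(1+nr^2)\right)$. The factored form makes the $n = 1$ zero manifest and shows that $\gamma \geq 0$ on the Bloch ball. The grids in $r$ and $n$ are arbitrary.

```python
import numpy as np

def gamma(r, n):
    """gamma(r) as written in Eq. (1.19)."""
    return (1 - r**2) * (n * (n + 1) / (2 * (1 + n * r**2)) - 1 / (1 + r**2))

def gamma_factored(r, n):
    """Algebraically equivalent form: (n-1)(1-r^2)(n+2+n r^2) / (2 (1+r^2)(1+n r^2))."""
    return (n - 1) * (1 - r**2) * (n + 2 + n * r**2) / (2 * (1 + r**2) * (1 + n * r**2))

r = np.linspace(0.0, 1.0, 11)
print(gamma(r, 1))                    # identically zero for n = 1
print(gamma(1.0, np.arange(1, 6)))    # zero at r = 1 for every n
print(gamma(r, 25))                   # nonnegative and largest for mixed states (small r)
```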

Three elements make this projection filter tractable for obtaining an analytic expression. The first is the isotropic nature of the trace inner product metric, which dramatically simplifies the calculation. The second is that the Pauli matrices, together with the identity, form a simple basis for 2 × 2 complex matrices and share the same spectrum ±1. The third is the assumption that the joint operators act identically and independently on each qubit. This allows terms that would in general involve ensemble averages to be expressed through identical single particle values.

The final element of Chap. 4 is a series of numerical experiments testing the

accuracy and performance of the projection filter against exact simulations for initial

pure spin coherent states. The rate κ sets the measurement timescale, and so all

times in the simulations are expressed in units of κ⁻¹, effectively setting κ = 1. Each

simulation ran for a fixed time $t = 0.2\,\kappa^{-1}$. As the quality of the projection filter

should explicitly depend upon n, these simulations test the qubit numbers n =

1, 25, 50, 75, 100. The average performance data included a sample of ν = 100

isotropically sampled initial qubit states with a single noise realization for each initial

state.

In addition to testing the performance as a function of n, we also test two different control functions $f^i(t)$. The first is $f^i(t) = 0$ for all t and i. This corresponds to a QND measurement of the z projection of the total angular momentum formed by the qubit ensemble, $J_z$, and is known to produce spin squeezing. We find that an initial state of 50 qubits prepared in the $J_x = +n/2$ eigenstate produced ≈ 10 dB of squeezing in one measurement duration. A squeezed state is inherently not a product state, and so it serves as a worst case scenario for the projection filter and provides a lower bound on its performance.
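The scale of the quoted squeezing can be estimated independently, because with no control field the measurement is QND and the exact conditional state stays diagonal in the $J_z$ basis. This sketch rests on two assumptions beyond the text. First, the integrated record is $y_t = 2\sqrt{\kappa}\, m_0 t + W_t$, with $m_0$ drawn from the initial $J_z$ distribution. Second, the weights update as $|c_m|^2 \exp(2\sqrt{\kappa}\, m\, y_t - 2\kappa m^2 t)$. It reports the reduction of $\mathrm{Var}(J_z)$ relative to the coherent-state value $n/4$. That figure of merit may differ from the squeezing parameter used in Chap. 4, so it should only be compared at the level of magnitude.

```python
import numpy as np
from math import lgamma

def coherent_x_weights(n):
    """m values and log |<m|+x>|^2 for the J_x = +n/2 coherent state in the J_z basis."""
    k = np.arange(n + 1)                               # number of qubits in the +z state
    log_binom = np.array([lgamma(n + 1) - lgamma(j + 1) - lgamma(n - j + 1) for j in k])
    return k - n / 2, log_binom - n * np.log(2.0)

def conditional_jz_variance(n, kappa, t, rng):
    """Var(J_z) after an ideal QND measurement of J_z for a time t (no control field)."""
    m, log_prior = coherent_x_weights(n)
    prior = np.exp(log_prior)
    m0 = rng.choice(m, p=prior / prior.sum())          # eigenvalue the record is drawn around
    y = 2 * np.sqrt(kappa) * m0 * t + rng.normal(0.0, np.sqrt(t))   # integrated record y_t
    log_post = log_prior + 2 * np.sqrt(kappa) * m * y - 2 * kappa * m**2 * t
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    mean = post @ m
    return post @ (m - mean) ** 2

n, kappa, t = 50, 1.0, 0.2
rng = np.random.default_rng(1)
var0 = n / 4                                            # coherent-state variance of J_z
var_t = conditional_jz_variance(n, kappa, t, rng)
print(f"J_z variance reduction: {10 * np.log10(var0 / var_t):.1f} dB")
```

In the Gaussian limit the posterior precision is $4/n + 4\kappa t$, which for these parameters corresponds to roughly 10 dB, consistent in magnitude with the value quoted above.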

The zero field measurement is compared to the case of a strong randomized control sequence. Chap. 5 uses the projection filter in an algorithm to reconstruct the initial condition of an SCS from a continuous measurement of $J_z$, characterized by the rate κ. In order to obtain information about observables other than $J_z$, an external control Hamiltonian must be applied. For reasons discussed in Sec. 5.3.1, this takes the form of a sequence of global π/2 rotations, where each rotation is about an axis $\mathbf{n}$ independently sampled from a uniform distribution. Fully characterizing the control amplitude $f(t)$ requires specifying the amplitude and duration of each pulse, as a larger Larmor frequency is needed to enact the same rotation in a shorter time. For simplicity, we fix $f(t)$ to have a constant magnitude, and so for a pulse duration τ the control field is given by

$$f(t) = \frac{\pi}{2\tau} \sum_{m \geq 1} \chi_{[m-1, m)}(t/\tau)\, \mathbf{n}_m, \tag{1.21}$$

where $\chi_{[a,b)}(t)$ is the indicator function for the interval $[a, b)$ and the $\mathbf{n}_m$ are i.i.d. unit vectors drawn from an isotropic distribution.
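A piecewise constant control of this form can be generated and checked as follows. Isotropic axes come from normalizing Gaussian vectors. The rotation convention $dx/dt = f \times x$ is read off from the $f$-dependent terms of Eq. (1.18), for example $f^2 z - f^3 y = (f \times x)_x$. The pulse count, the pulse duration `tau`, and the helper names are hypothetical. Each pulse is integrated exactly with Rodrigues' formula, so a vector perpendicular to $\mathbf{n}_m$ is rotated by exactly π/2.

```python
import numpy as np

def isotropic_axes(num, rng):
    """i.i.d. unit vectors from the isotropic distribution (normalized Gaussian vectors)."""
    v = rng.normal(size=(num, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def control_field(t, tau, axes):
    """Piecewise constant f(t) = (pi / (2 tau)) n_m on [(m-1) tau, m tau); zero after the last pulse."""
    m = int(np.floor(t / tau))
    if m < 0 or m >= len(axes):
        return np.zeros(3)
    return (np.pi / (2 * tau)) * axes[m]

def rotate(x, f, dt):
    """Exact solution of dx/dt = f × x over a step dt with constant f (Rodrigues' formula)."""
    omega = np.linalg.norm(f)
    if omega == 0.0:
        return x
    k, th = f / omega, omega * dt
    return x * np.cos(th) + np.cross(k, x) * np.sin(th) + k * (k @ x) * (1 - np.cos(th))

rng = np.random.default_rng(5)
tau = 0.005                       # hypothetical pulse duration in units of 1/kappa
axes = isotropic_axes(40, rng)
x = np.array([1.0, 0.0, 0.0])     # initial Bloch vector
for m in range(len(axes)):
    x = rotate(x, control_field((m + 0.5) * tau, tau, axes), tau)
print(x, np.linalg.norm(x))       # the pulses rotate the Bloch vector without changing its length
```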

The accuracy of the projection filter was tested by how well it reproduces the conditional expectation values of the collective angular momentum components $J_i$, as well as by the squared overlap between the exact state and the equivalent spin coherent state made from an ensemble of n identical pure qubits. The time-dependent results are presented in Figures 4.4 and 4.5. The RMS errors in the conditional expectation values were generally independent of the number of qubits, likely because for pure states the projection filter dynamics are essentially independent of n. With the randomized controls, the RMS error was ≲ 5% of the total spin length in all three expectation values. In the absence of a control field, the $J_x$ and $J_y$ errors increased roughly linearly, also reaching the 5% level, while the $J_z$ error increased noticeably, with a final value in the ∼5–10% range. The poorer performance is attributable to the effect the spin squeezing has on the mean values.

The squared overlap between the exact state and the equivalent spin coherent state exhibits a strong dependence upon both the number of qubits and the control fields. In the uncontrolled case this metric decreased monotonically for all n > 1, dropping to 0.75 for n = 25 and 0.48 for n = 100. This is in stark contrast to the simulations including the randomized controls. While the resulting average fidelity was noticeably poorer for large n, the minimum value was > 0.8 for all n. Although the trend was toward poorer fidelities at longer times, the decrease was not monotonic, implying that the control waveform could be optimized to maximize the average overlap with the spin coherent state and thereby minimize the information lost by performing the projection.
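The squared overlap used as the figure of merit is simple to evaluate for states in the symmetric subspace. This sketch assumes that representation (Dicke basis, with component $k$ the amplitude of the normalized Dicke state with $k$ qubits in $|1\rangle$) and writes a spin coherent state as $(\cos\tfrac{\theta}{2}|0\rangle + e^{i\phi}\sin\tfrac{\theta}{2}|1\rangle)^{\otimes n}$. As a check, it compares against the closed form $|\langle \theta_1,\phi_1|\theta_2,\phi_2\rangle|^2 = \left((1 + \hat{n}_1\cdot\hat{n}_2)/2\right)^n$. An exact conditional state stored in the same basis could replace the second coherent state.

```python
import numpy as np
from math import comb

def spin_coherent_state(theta, phi, n):
    """(cos(theta/2)|0> + e^{i phi} sin(theta/2)|1>)^{⊗n} expressed in the normalized Dicke basis."""
    k = np.arange(n + 1)
    binom = np.array([comb(n, j) for j in k], dtype=float)
    return np.sqrt(binom) * np.cos(theta / 2) ** (n - k) * (np.exp(1j * phi) * np.sin(theta / 2)) ** k

def bloch_vector(theta, phi):
    return np.array([np.sin(theta) * np.cos(phi), np.sin(theta) * np.sin(phi), np.cos(theta)])

n = 25
a = spin_coherent_state(0.4, 1.1, n)
b = spin_coherent_state(1.3, -0.7, n)
overlap = abs(np.vdot(a, b)) ** 2
closed_form = ((1 + bloch_vector(0.4, 1.1) @ bloch_vector(1.3, -0.7)) / 2) ** n
print(overlap, closed_form)
```

The $n$-th power in the closed form is why the overlap with a product state falls off quickly with qubit number once the state develops even modest correlations.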

We hypothesize that the state remains closer to a product state because the randomized controls tend to mix the squeezed and anti-squeezed components, leading to a near-zero average effect. Not only does the mean spin rotate, but the orientation of the squeezing ellipse also rotates. As the rotation axes are chosen from a uniform distribution, the squeezed component is just as likely as the anti-squeezed component to be oriented along the measurement axis. At any given time, the uncertainty in the $J_z$ component is equally likely to be above or below the uncertainty of an equivalent spin coherent state. Therefore it is difficult for any significant squeezing to develop, which keeps the exact state closer to the product state description.

1.1.4 Qubit state reconstruction

Chap. 5 describes how to use the quantum filtering formalism to construct a tomographic estimate of an unknown initial quantum state from an ensemble of identical copies experiencing a joint continuous measurement. We make a maximum likelihood estimate (MLE) of the initial state based upon the statistics of a single continuous measurement realization. The purpose of this work is to extend previous results [11–13], which used a continuous measurement for quantum state tomography, into a regime where the quantum backaction significantly affects the measurement statistics. In an idealized numerical study, we find that such an estimate can nearly saturate an optimum reconstruction bound. Much is known about the fundamental quantum limits of reconstructing pure qubit states from a finite number of measurements. Massar and Popescu showed that, given n copies of a pure qubit state, it is possible to find a generalized measurement that achieves the highest possible average fidelity between the estimated state and the correct initial state [39]. The average is taken not just over measurement outcomes but also over an unbiased set of possible input states. The optimum average fidelity bound is simply $\langle F \rangle_{\mathrm{opt}} = (n+1)/(n+2)$.
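For orientation, the sketch tabulates the Massar-Popescu bound for the ensemble sizes used in this work. It then checks the $n = 1$ case by Monte Carlo. We assume a simple strategy for that check: measure $\sigma_z$ and guess the corresponding eigenstate, which gives average fidelity $(1 + \langle z^2\rangle)/2 = 2/3$ over isotropic inputs and so attains the bound. The sample size and seed are arbitrary.

```python
import numpy as np

def optimal_average_fidelity(n):
    """Massar-Popescu bound <F>_opt = (n + 1) / (n + 2) for n copies of a pure qubit state."""
    return (n + 1) / (n + 2)

for n in (1, 25, 50, 75, 100):
    print(n, round(optimal_average_fidelity(n), 4))

# Monte Carlo check for n = 1 with a sigma_z measurement and an eigenstate guess
rng = np.random.default_rng(4)
v = rng.normal(size=(200000, 3))
z = v[:, 2] / np.linalg.norm(v, axis=1)                   # z-component of isotropic Bloch vectors
outcome_up = rng.random(z.size) < (1 + z) / 2             # Born rule for the sigma_z outcome
fidelity = np.where(outcome_up, (1 + z) / 2, (1 - z) / 2) # |<guess|psi>|^2 = (1 + g·n) / 2
print(fidelity.mean())
```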

We consider here an idealized model of an ensemble of n qubits identically coupled

to a single traveling wave quantum light field via a linearized Faraday interaction.

Under certain approximations discussed in Sec. 2.7, a measurement of the orthogonal quadrature contains information about the collective angular momentum variable $J_z$, with a coupling rate κ. The ensemble is assumed to be prepared in a pure spin coherent state characterized by the unknown polar angles (θ, φ). However, the initial qubit state is not a QND variable, meaning that $\rho(\theta, \phi)^{\otimes n}$ does not commute with the fundamental interaction Hamiltonian. The implication is that there is no consistent method for inverting the forward-time dynamics to arrive at a conditional expression for the initial state.

To circumvent this problem, we instead map the quantum state estimation problem to a parameter estimation problem and find an MLE of (θ, φ). A single continuous measurement realization of a noncommuting output quadrature results in a stochastic process $\{y_t\}_{t\geq 0}$ that contains information about the atomic ensemble. Because of this information, its statistics depend parametrically upon the unknown angles. While an MLE based upon a single data point would perform quite poorly, we find that a conditional estimate based upon the entire trajectory performs quite well when the measurement is informationally complete. To ensure informational completeness, a known time-varying control Hamiltonian is applied to the system, thereby mixing all spin projections with the measurement axis. Riofrío et al. found that an efficient and unbiased control policy is to choose a set of operations capable of generating any single particle state and then randomly vary the magnitude of each control in a piecewise constant way [13]. Here we use a similar control policy by including in the model a uniform magnetic field with a constant field strength that rotates the collective angular momentum vector by π/2 in a period τ about randomly chosen rotation axes. In a fixed final time, we find that 40 rotations provide enough information to obtain high fidelity estimates.

In the semiclassical probability space induced by a measurement realization, the appropriate probability measure $\mathbb{P}$ has a parametric dependence on the initial angles (θ, φ). This dependence is best seen by considering not the conditional statistics of $\{y_t\}_{t\geq 0}$ but instead the calculated innovation process $\{v_t\}_{t\geq 0}$. For the measurement model considered here, this process is given by

$$v_t = y_t - 2\sqrt{\kappa}\int_0^t ds\, \mathrm{Tr}\left(J_z\, \rho_s(\theta, \phi)\right), \tag{1.22}$$

where $\rho_s(\theta, \phi)$ is the system density operator calculated via the conditional master equation, assuming that the initial condition is given by the angles (θ, φ). The innovation process $v_t$ is shown in Sec. 3.4.1 to have the statistics of a Wiener process if $\mathrm{Tr}(J_z \rho_s(\theta, \phi))$ corresponds to the exact quantum conditional expectation of the Heisenberg picture operator $U_t^\dagger J_z U_t$. If this correspondence cannot be made because $\rho_s(\theta, \phi)$ used an incorrect initial condition, then $v_t$ will not have the statistics of a Wiener process for every measurement realization. Here we use this fact to find the MLE for (θ, φ): we seek the initial condition that makes the innovation process most likely to be a Wiener process. As the conditional master equation is nonlinear, we resort to a mixture of numerical and analytical methods for finding an approximation to the true likelihood function.

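The Wiener measure gives the probability that a Wiener process sampled at times $\{t_i : i = 1, \ldots, n\}$ lies within the associated intervals $I_i = (a_i, b_i)$, and is given by the integral

$$P(v_{t_i} \in I_i) = \int_{a_1}^{b_1} dv_1 \cdots \int_{a_n}^{b_n} dv_n \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi \Delta t_i}} \exp\left(-\frac{(v_i - v_{i-1})^2}{2\Delta t_i}\right), \tag{1.23}$$

with $\Delta t_i = t_i - t_{i-1}$ and $v_0 = 0$.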

Because this is a Gaussian probability density, the MLE coincides with the least squares estimate. For an equally spaced mesh, $\Delta t_i = \Delta t$, both the normalization factor and the denominator of the exponent are irrelevant for the purposes of computing an MLE. Discretizing Eq. (1.22), the increments are $v_i - v_{i-1} = \Delta y_i - 2\sqrt{\kappa}\,\Delta t\,\mathrm{Tr}(J_z\rho_{t_{i-1}}(\theta,\phi))$, so maximizing the likelihood function is equivalent to minimizing the quadratic variation,

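$$ \mathrm{QV}(v_t) \equiv \sum_{i=1}^{n}\Big(\Delta y_i - 2\sqrt{\kappa}\,\Delta t\,\mathrm{Tr}\big(J_z\,\rho_{t_{i-1}}(\theta,\phi)\big)\Big)^2. \tag{1.24} $$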

The minimization of this function with respect to (θ, φ) is computed numerically, because we are unable to find an analytic solution to the conditional master equation and the dependence on the initial condition is nonlinear.
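The Python below is a minimal sketch of how the cost in Eq. (1.24) can be evaluated on a discretized record. It simulates increments $\Delta y_i$ and scores candidate angles. The function `toy_mean_jz` is a hypothetical stand-in for $\mathrm{Tr}(J_z\rho_t(\theta,\phi))$. It is a backaction-free precession of a spin-n/2 coherent state about the y axis, not the conditional master equation or the projection filter used here. All parameter values (`omega`, `kappa`, the step size and the seed) are illustrative assumptions.

```python
import numpy as np

def toy_mean_jz(t, theta, phi, n_qubits, omega=2.0):
    """Hypothetical stand-in for Tr(J_z rho_t(theta, phi)).

    Backaction-free precession of a spin-n/2 coherent state about the y axis at
    angular frequency omega; it is NOT the conditional master equation.
    """
    return 0.5 * n_qubits * (np.cos(theta) * np.cos(omega * t)
                             - np.sin(theta) * np.cos(phi) * np.sin(omega * t))

def quadratic_variation(dy, t_left, dt, theta, phi, kappa, n_qubits):
    """Cost of Eq. (1.24) for a candidate initial condition (theta, phi).

    t_left holds the left endpoints t_{i-1} of the measurement increments dy_i.
    """
    predicted = 2.0 * np.sqrt(kappa) * dt * toy_mean_jz(t_left, theta, phi, n_qubits)
    return float(np.sum((dy - predicted) ** 2))

# Simulated record: dy_i = 2 sqrt(kappa) <J_z> dt + dW_i, with dW_i ~ N(0, dt).
rng = np.random.default_rng(seed=0)
kappa, n_qubits, dt, steps = 1.0, 50, 1e-3, 2000
t_left = dt * np.arange(steps)
true_angles = (0.3, 1.1)
dy = (2.0 * np.sqrt(kappa) * dt * toy_mean_jz(t_left, *true_angles, n_qubits)
      + rng.normal(0.0, np.sqrt(dt), size=steps))

qv_true = quadratic_variation(dy, t_left, dt, *true_angles, kappa, n_qubits)
qv_wrong = quadratic_variation(dy, t_left, dt, 2.0, 0.2, kappa, n_qubits)
print(f"QV at true angles: {qv_true:.3f}, at wrong angles: {qv_wrong:.3f}")
```

At the true angles the residuals are just the noise increments, so the cost is close to the record duration `steps * dt`. A wrong initial condition adds the squared signal mismatch on top of that.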

Every evaluation of this function requires a full numerical integration of the conditional master equation, and numerically integrating an exact conditional pure state is computationally intensive. Here we test spin ensembles of 25 ≤ n ≤ 100 qubits, which require of order n complex numbers to fully describe the relevant conditional dynamics. It is computationally feasible to integrate the exact equation once to generate a simulated measurement record, but each measurement record required a total of 500 evaluations of the quadratic variation cost function. We therefore found it infeasible to use an exact expression for computing QV(v_t) and instead sought a reasonably accurate approximation: the projection filter developed in Chap. 4. Under conditions identical to the dynamics used here, the projection filter matches the expectation value Tr(J_z ρ_t) to within 95% accuracy of the exact value while propagating only 3 real numbers for any number of qubits. Minimizing with respect to the approximate filter made the generation of the simulated measurement record the limiting computational step. In a higher dimensional space one could approach the problem with a gradient descent algorithm, but here we find it more efficient to make a dense Monte Carlo sampling of the entire Bloch sphere² and then select the most likely sample.
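The dense-sampling strategy can be sketched as follows, assuming samples are drawn uniformly in area on the Bloch sphere ($\cos\theta$ uniform on $[-1,1]$, φ uniform on $[0, 2\pi)$). The cost used here, `toy_cost`, is a hypothetical placeholder (one minus the cosine of the angle to a target direction). In the estimation problem it would be QV($v_t$) evaluated with the projection filter. The two-stage mixed-then-pure sampling mentioned in the footnote is not reproduced.

```python
import numpy as np

def sample_bloch_sphere(num_samples, rng):
    """Angles (theta, phi) distributed uniformly in area over the Bloch sphere."""
    theta = np.arccos(1.0 - 2.0 * rng.uniform(size=num_samples))   # cos(theta) uniform on [-1, 1]
    phi = 2.0 * np.pi * rng.uniform(size=num_samples)
    return theta, phi

def dense_bloch_search(cost, num_samples, rng):
    """Evaluate cost(theta, phi) at every sample and return the minimizing sample."""
    theta, phi = sample_bloch_sphere(num_samples, rng)
    costs = np.array([cost(t, p) for t, p in zip(theta, phi)])
    best = int(np.argmin(costs))
    return theta[best], phi[best]

def bloch_vector(theta, phi):
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])

# Hypothetical cost for illustration only: 1 - cos(angle) to a target direction.
# In the estimation problem this would be QV(v_t) evaluated with the projection filter.
target = bloch_vector(1.0, 2.5)

def toy_cost(theta, phi):
    return 1.0 - float(bloch_vector(theta, phi) @ target)

rng = np.random.default_rng(seed=1)
theta_hat, phi_hat = dense_bloch_search(toy_cost, 20000, rng)
print(theta_hat, phi_hat, toy_cost(theta_hat, phi_hat))
```

Because each cost evaluation is independent, this search parallelizes trivially. It also cannot get trapped in a local minimum, which is the practical advantage over gradient descent on a two-dimensional parameter space.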

To understand what role backaction plays in limiting the reconstruction fidelity, we compare the performance of the projection filter estimate to one that completely ignores the conditional dynamics. Instead of propagating a conditional state, this estimate only considers the Hamiltonian dynamics generated by the magnetic field rotations. In other words, by solving the Heisenberg equation of motion,

$$ \frac{d}{dt}\sigma_z(t) = +i\left[\tfrac{1}{2} f^j(t)\,\sigma_j(t),\ \sigma_z(t)\right] \tag{1.25} $$

for the controls $f^i(t)$ given in Eq. (1.21), the expectation value $\sqrt{\kappa}\, n\, \mathrm{Tr}(\sigma_z(t)\rho(\theta,\phi))$ reproduces the expected signal while ignoring the backaction. We compute this in the Heisenberg picture so that we can solve for the dynamical observables once and then apply that solution to any initial condition.

²Due to an issue involving numerical stability, we start with a uniform sampling of mixed states and then make a subsequent, smaller sample of pure states. See Sec. 5.4.2.
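Below is a minimal sketch of the "solve once, reuse for every initial condition" idea. It rests on assumptions made only for this illustration: each control segment is a π/2 rotation about a uniformly random axis, the initial single-qubit state is pure with Bloch angles (θ, φ), and the signal is sampled only at segment boundaries. Because $H = \tfrac12 f^j\sigma_j$ generates a rotation $R(t)$ of the Bloch vector, $\sigma_z(t) = \mathbf{c}(t)\cdot\boldsymbol{\sigma}$ with $\mathbf{c}(t) = R(t)^T\hat{z}$. The coefficients are therefore computed once and dotted with any initial Bloch vector. All function names are hypothetical, and a direct 2×2 Schrödinger-picture evolution is included as an independent cross-check.

```python
import numpy as np

Z_HAT = np.array([0.0, 0.0, 1.0])
PAULI = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def rotation_matrix(axis, angle):
    """Rodrigues formula: right-handed rotation by `angle` about the direction `axis`."""
    n = np.asarray(axis, dtype=float) / np.linalg.norm(axis)
    cross = np.array([[0.0, -n[2], n[1]],
                      [n[2], 0.0, -n[0]],
                      [-n[1], n[0], 0.0]])          # cross @ v == np.cross(n, v)
    return np.eye(3) + np.sin(angle) * cross + (1.0 - np.cos(angle)) * (cross @ cross)

def heisenberg_sigma_z_coeffs(axes, angle=np.pi / 2):
    """Rows c_m with sigma_z(t_m) = c_m . (sigma_x, sigma_y, sigma_z) after m control segments."""
    total = np.eye(3)
    coeffs = [Z_HAT.copy()]
    for axis in axes:
        total = rotation_matrix(axis, angle) @ total
        coeffs.append(total.T @ Z_HAT)
    return np.array(coeffs)

def backaction_free_signal(coeffs, theta, phi, kappa, n_qubits):
    """sqrt(kappa) * n * Tr(sigma_z(t_m) rho(theta, phi)) for a pure single-qubit initial state."""
    r0 = np.array([np.sin(theta) * np.cos(phi), np.sin(theta) * np.sin(phi), np.cos(theta)])
    return np.sqrt(kappa) * n_qubits * (coeffs @ r0)

def schrodinger_sigma_z(axes, theta, phi, angle=np.pi / 2):
    """Independent check: evolve the state with U = exp(-i angle n.sigma / 2) segment by segment."""
    psi = np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])
    values = [np.vdot(psi, PAULI[2] @ psi).real]
    for axis in axes:
        n = np.asarray(axis, dtype=float) / np.linalg.norm(axis)
        n_sigma = n[0] * PAULI[0] + n[1] * PAULI[1] + n[2] * PAULI[2]
        psi = (np.cos(angle / 2) * np.eye(2) - 1j * np.sin(angle / 2) * n_sigma) @ psi
        values.append(np.vdot(psi, PAULI[2] @ psi).real)
    return np.array(values)

rng = np.random.default_rng(seed=3)
axes = rng.normal(size=(40, 3))                 # 40 isotropically distributed rotation axes
coeffs = heisenberg_sigma_z_coeffs(axes)        # solved once ...
kappa, n_qubits = 1.0, 25
signal_a = backaction_free_signal(coeffs, 0.4, 1.0, kappa, n_qubits)    # ... reused here
signal_b = backaction_free_signal(coeffs, 2.1, -0.7, kappa, n_qubits)   # ... and here
expected_a = np.sqrt(kappa) * n_qubits * schrodinger_sigma_z(axes, 0.4, 1.0)
```

After the coefficients are computed, each new candidate (θ, φ) costs only one matrix-vector product. This is what makes the backaction-free estimator cheap to scan over the Bloch sphere.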

The results of the numerical experiments are given in Fig. 5.2. Averaged over ν = 1000 trials for n = (25, 40, 55, 70, 85, 100) qubits, the estimate based on the projection filter nearly achieves the optimum fidelity bound of (n + 1)/(n + 2). The difference between the optimum bound and the numerical averages never exceeded 0.21%, a deviation that is likely statistically significant but not attributable to any fundamental Monte Carlo sampling errors. In comparison, the backaction-free estimator performed significantly worse, especially for larger numbers of qubits. This suggests that including the conditional dynamics is indeed important in this idealized scenario.
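For reference, the optimum bound quoted above can be tabulated for the qubit numbers used. The short sketch below only evaluates the formula $(n+1)/(n+2)$ from the text and does not derive it. It shows that the remaining headroom, $1/(n+2)$, shrinks as n grows.

```python
def optimal_fidelity_bound(n):
    """Optimal average reconstruction fidelity (n + 1)/(n + 2) for n qubits, as quoted in the text."""
    return (n + 1) / (n + 2)

for n in (25, 40, 55, 70, 85, 100):
    bound = optimal_fidelity_bound(n)
    # 1 - bound = 1/(n + 2): the headroom any estimator can still gain shrinks as n grows.
    print(f"n = {n:3d}: bound = {bound:.5f}, 1 - bound = {1 - bound:.5f}")
```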

The discrepancy between the projection filter estimate and the backaction-free estimate is likely due to a bias that develops when all measurement effects are ignored. This can be seen in Figures 5.3 and 5.4, which plot the average reconstruction fidelity as a function of the measurement duration. The filtering-based estimate shows a monotonic rise in the average fidelity, which then saturates at a level only slightly below the optimum bound; as n increases, this saturation occurs at earlier times. In contrast, the estimate based only on Hamiltonian evolution does not show a monotonic increase in reconstruction fidelity. For n = 55, 70, 85, 100 the average fidelity reaches a maximum and then decreases significantly as more data are collected. For n = 25, 40 a decrease might also have occurred had the simulation continued for longer times.

When backaction is ignored, the assumption of pure unitary evolution implies that no coherence is lost during the course of the measurement. Suppose that at time t the randomized controls returned the collective spin to its original orientation. The backaction-free estimator would then "weight" the data received at that time just as much as the data obtained at time t = 0. In comparison, the filtering-based estimate accounts for the fact that, although the rotations may have canceled, the expected signal at time t is not what it was at time t = 0, precisely because of the conditional effects. By not including this information, the unitary estimate is biased away from the optimum estimate.

Chapter 2

Quantum Optics and Quantum Stochastic Differential Equations

The objective of this chapter is to identify how the formalism of quantum stochastic differential equations is implemented in the context of quantum optics. We begin by showing how the second quantization of classical quasi-monochromatic traveling wave packets gives the natural structure necessary for defining the quantum Ito integral. We then show under what conditions a wave packet operator can be treated as generating a localized field operator, which is necessary for a Markov approximation. From that localized structure, we review how this defines a quantum Wiener process and how it relates to the quantum white noise formalism usually presented in quantum optics. With a wave packet description of quantum white noise, we then review how a system coupling to these operators generates a quantum stochastic differential equation for the propagator. Generating this equation is intimately related to the operator ordering of the field operators, which is in turn related to defining different kinds of stochastic equations. Here we review this connection and how the propagator is derived. Finally, we apply this result to the Faraday Hamiltonian and discuss the interaction in the limits of both strong and weak number coupling.


2.1 Quantum Stochastic Process in Optical Fields

A classical stochastic process is most generally defined as a family of random variables $\{x_t\}_{t\ge 0}$ indexed by time t. A quantum stochastic process is then a family of operators $\{X_t\}_{t\ge 0}$, also indexed by time. This definition allows for a more general structure than simply an operator that depends on time. A common example of a time-dependent quantum operator is a Heisenberg picture operator, X(t), acting on some Hilbert space H with its dynamics given by a unitary transformation. A quantum stochastic process, by contrast, allows for something more general: the spectrum of $X_t$ could be time-dependent, and even the Hilbert space upon which it acts nontrivially could change continuously in time.

A concrete and pertinent example is a continuous wave laser that is switched on at time $t_0 = 0$ and then switched off at some later time t > 0. Consider a stationary observer located a distance d = cτ away from the laser who begins counting photons with a perfectly efficient detector at time $t_0$. At any time 0 < s < τ this observer clearly will not yet have observed any of the laser light, so, in the absence of any corrupting background, the probability of observing anything must be zero. The point of this example is that in the time interval [0, s] any observations must be modeled as projectors acting on a part of Fock space that is independent of the part displaced by the laser.

It is perfectly reasonable to use a Schrödinger picture description in which the observer makes projective measurements on a volume of the electromagnetic field and the free-field Hamiltonian propagates the state of the field from the laser position to the detector. In Sec. 3.2, a mapping between a set of commuting observables and a classical probability space is developed, in which there is a one-to-one correspondence between classical random variables and commuting operators. To utilize these tools of classical probability theory, it is most natural to work in a Heisenberg picture, where the states remain fixed and the unitary evolution is applied to the observables of interest. A Heisenberg picture formulation of a continuous measurement requires a mathematical structure that can cope with two facts: as time progresses, a stationary observer will measure an ever-increasing set of operators, and these operators act upon different parts of the field's Hilbert space. This family of operators is the quintessential example of a quantum stochastic process.

In classical stochastic calculus, the Wiener process is the fundamental random process from which the Ito integral is constructed and from which other processes are defined. In the quantum setting, we require equally fundamental operator-valued noise processes from which to construct other processes. But as we are seeking a description of a continuous optical measurement, those processes should arise from the quantized electromagnetic field. The next section reviews the canonical quantization of the free electromagnetic field to identify the operator nature of the quantized field.

Throughout this chapter we will be discussing both classical and quantized elements of the electromagnetic field. To make this distinction, the classical vector fields for the vector potential, electric field, etc. will be denoted as $\mathcal{A}(\mathbf{x},t)$, $\mathcal{E}(\mathbf{x},t)$, and their quantized operator expressions as $\mathbf{A}(\mathbf{x},t)$, $\mathbf{E}(\mathbf{x},t)$. We will quantize the free space electromagnetic field following the classic text by Cohen-Tannoudji et al., and use SI units [44]. For reference, the spatial Fourier transform of a function $f(\mathbf{x},t)$ is defined as

$$ f(\mathbf{k},t) \equiv \int_{\mathbb{R}^3}\frac{d^3x}{\sqrt{(2\pi)^3}}\, e^{-i\mathbf{k}\cdot\mathbf{x}}\, f(\mathbf{x},t) \tag{2.1} $$

and the inverse transform is

$$ f(\mathbf{x},t) \equiv \int_{\mathbb{R}^3}\frac{d^3k}{\sqrt{(2\pi)^3}}\, e^{+i\mathbf{k}\cdot\mathbf{x}}\, f(\mathbf{k},t). \tag{2.2} $$
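The symmetric $(2\pi)^{-3/2}$ normalization in Eqs. (2.1) and (2.2) makes the transform unitary. The one-dimensional sketch below checks this numerically with a direct Riemann sum, our own choice of quadrature. With this convention $e^{-x^2/2}$ maps to $e^{-k^2/2}$, and the $L^2$ norm is preserved (Parseval). The grid sizes are illustrative assumptions.

```python
import numpy as np

def unitary_ft_1d(f_values, x, k):
    """1D analog of Eq. (2.1), f(k) = (2 pi)^(-1/2) * integral dx exp(-i k x) f(x), as a Riemann sum."""
    dx = x[1] - x[0]
    phases = np.exp(-1j * np.outer(k, x))
    return phases @ f_values * dx / np.sqrt(2.0 * np.pi)

x = np.linspace(-10.0, 10.0, 1001)
k = np.linspace(-10.0, 10.0, 1001)
gaussian = np.exp(-x ** 2 / 2.0)
transform = unitary_ft_1d(gaussian, x, k)

# With the symmetric (2 pi)^(-1/2) normalization, exp(-x^2/2) is mapped to exp(-k^2/2) ...
max_error = np.max(np.abs(transform - np.exp(-k ** 2 / 2.0)))
# ... and the L2 norm is preserved (Parseval), which is what makes the transform unitary.
norm_x = np.sum(np.abs(gaussian) ** 2) * (x[1] - x[0])
norm_k = np.sum(np.abs(transform) ** 2) * (k[1] - k[0])
print(max_error, norm_x, norm_k)
```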

2.1.1 Free space quantization

When rigorously quantizing the free space electromagnetic field, one begins by defining a scalar Lagrangian functional of the vector potential $\mathcal{A}(\mathbf{x},t)$, whose minimization with respect to variations in $\mathcal{A}$ reproduces Maxwell's equations [44]. The field Hamiltonian is

$$ H = \int d^3x \left( \frac{|\Pi|^2}{2\varepsilon_0} + \frac{1}{2\mu_0}\,|\nabla\times\mathcal{A}|^2 \right) \tag{2.3} $$

with the variable Π conjugate to the vector potential being

$$ \Pi = \varepsilon_0\frac{\partial}{\partial t}\mathcal{A} = -\varepsilon_0\,\mathcal{E}. \tag{2.4} $$

(The final equality with the electric field assumes that there are no free charges.) From this classical Hamiltonian, the connection with quantum mechanics is made by noting that it is the Hamiltonian of a continuous set of harmonic oscillators with canonical variables $\{\mathcal{A}, \Pi\}$. Quantization then promotes these variables to operators obeying canonical commutation relations.

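By choosing to work in the Coulomb gauge, $\nabla\cdot\mathcal{A} = 0$, it is easily shown that if $\mathcal{A}(\mathbf{k},t)$ is the Fourier transform of $\mathcal{A}(\mathbf{x},t)$, then

$$ \mathbf{k}\cdot\mathcal{A}(\mathbf{k},t) = 0. \tag{2.5} $$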

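This constraint results in two free polarization components (labeled by q ∈ {1, 2}) for each Fourier component, defining the vectors $\mathbf{e}_q(\mathbf{k})$, which satisfy the properties

$$ \mathbf{k}\cdot\mathbf{e}_q(\mathbf{k}) = 0, \tag{2.6a} $$

$$ \mathbf{e}^*_q(\mathbf{k})\cdot\mathbf{e}_{q'}(\mathbf{k}) = \delta_{qq'}, \quad\text{and} \tag{2.6b} $$

$$ \sum_q e^*_{qi}(\mathbf{k})\, e_{qj}(\mathbf{k}) = \delta_{ij} - k_i k_j/|\mathbf{k}|^2 \tag{2.6c} $$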

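where i, j refer to the Cartesian components. In terms of the real space operators, the components of the quantized fields $\mathbf{A}(\mathbf{x})$ and $\mathbf{E}(\mathbf{x})$ satisfy the commutation relations

$$ [A_i(\mathbf{x}), A_j(\mathbf{x}')] = [E_i(\mathbf{x}), E_j(\mathbf{x}')] = 0, \tag{2.7} $$

$$ [A_i(\mathbf{x}), -\varepsilon_0 E_j(\mathbf{x}')] = i\hbar\, \delta^T_{ij}(\mathbf{x}-\mathbf{x}'), \tag{2.8} $$

where $\delta^T_{ij}(\mathbf{x}-\mathbf{x}')$ is the transverse delta function, defined as

$$ \delta^T_{ij}(\mathbf{x}-\mathbf{x}') \equiv \int \frac{d^3k}{(2\pi)^3}\, e^{+i\mathbf{k}\cdot(\mathbf{x}-\mathbf{x}')}\left(\delta_{ij} - \frac{k_i k_j}{|\mathbf{k}|^2}\right). \tag{2.9} $$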
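The transverse projector $\delta_{ij} - k_ik_j/|\mathbf{k}|^2$ appearing in Eqs. (2.6c) and (2.9) is just the completeness relation of the two polarization vectors. The sketch below constructs one admissible (real, linear) choice of $\mathbf{e}_q(\mathbf{k})$ for an arbitrary $\mathbf{k}$ and checks Eqs. (2.6a)–(2.6c). It does the same for the complex circular combinations $(\mathbf{e}_1 \pm i\mathbf{e}_2)/\sqrt{2}$. The helper-vector construction is our own assumption, since the text does not fix a basis.

```python
import numpy as np

def linear_polarization_basis(k):
    """Two real unit vectors orthogonal to k and to each other (one admissible choice of e_q(k))."""
    k_hat = np.asarray(k, dtype=float) / np.linalg.norm(k)
    helper = np.array([1.0, 0.0, 0.0]) if abs(k_hat[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    e1 = np.cross(k_hat, helper)
    e1 /= np.linalg.norm(e1)
    e2 = np.cross(k_hat, e1)          # unit length because k_hat and e1 are orthonormal
    return np.array([e1, e2], dtype=complex)

def check_polarization_properties(k, basis):
    """Residuals of Eqs. (2.6a)-(2.6c) for a 2 x 3 array whose rows are e_1(k), e_2(k)."""
    transversality = basis @ k                                   # (2.6a): k . e_q = 0
    gram = basis.conj() @ basis.T                                # (2.6b): e_q^* . e_q' = delta
    completeness = sum(np.outer(e.conj(), e) for e in basis)     # (2.6c) left-hand side
    projector = np.eye(3) - np.outer(k, k) / (k @ k)             # delta_ij - k_i k_j / |k|^2
    return (np.max(np.abs(transversality)),
            np.max(np.abs(gram - np.eye(2))),
            np.max(np.abs(completeness - projector)))

k = np.array([0.3, -1.2, 2.0])
linear = linear_polarization_basis(k)
# Circular polarizations (e_1 +/- i e_2)/sqrt(2) are an equally valid complex choice.
circular = np.array([linear[0] + 1j * linear[1], linear[0] - 1j * linear[1]]) / np.sqrt(2.0)
print(check_polarization_properties(k, linear))
print(check_polarization_properties(k, circular))
```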

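In reciprocal space, the vector potential is most suitably expressed in terms of the annihilation ($a_q(\mathbf{k})$) and creation ($a^\dagger_q(\mathbf{k})$) operators associated with the polarization vectors $\mathbf{e}_q(\mathbf{k})$. They obey the commutation relations

$$ [a_q(\mathbf{k}), a_{q'}(\mathbf{k}')] = [a^\dagger_q(\mathbf{k}), a^\dagger_{q'}(\mathbf{k}')] = 0 \quad\text{and} \tag{2.10} $$

$$ [a_q(\mathbf{k}), a^\dagger_{q'}(\mathbf{k}')] = \delta_{qq'}\,\delta(\mathbf{k}-\mathbf{k}'). \tag{2.11} $$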

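The Schrödinger picture operators for the vector potential, electric field, and magnetic field are then given by

$$ \mathbf{A}(\mathbf{x}) = \sum_q \int \frac{d^3k}{(2\pi)^{3/2}} \sqrt{\frac{\hbar}{2\varepsilon_0 c|\mathbf{k}|}}\, e^{i\mathbf{k}\cdot\mathbf{x}}\, \mathbf{e}^*_q(\mathbf{k})\, a_q(\mathbf{k}) + \text{h.c.}, \tag{2.12} $$

$$ \mathbf{E}(\mathbf{x}) = i\sum_q \int \frac{d^3k}{(2\pi)^{3/2}} \sqrt{\frac{\hbar c|\mathbf{k}|}{2\varepsilon_0}}\, e^{i\mathbf{k}\cdot\mathbf{x}}\, \mathbf{e}^*_q(\mathbf{k})\, a_q(\mathbf{k}) + \text{h.c.}, \quad\text{and} \tag{2.13} $$

$$ \mathbf{B}(\mathbf{x}) = i\sum_q \int \frac{d^3k}{(2\pi)^{3/2}} \sqrt{\frac{\hbar}{2\varepsilon_0 c|\mathbf{k}|}}\, e^{i\mathbf{k}\cdot\mathbf{x}}\, \mathbf{k}\times\mathbf{e}^*_q(\mathbf{k})\, a_q(\mathbf{k}) + \text{h.c.} \tag{2.14} $$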

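Substituting these expressions into the Hamiltonian results in the simplified form

$$ H_f = \tfrac{1}{2}\hbar c\sum_q \int d^3k\, |\mathbf{k}| \left( a_q(\mathbf{k})a^\dagger_q(\mathbf{k}) + a^\dagger_q(\mathbf{k})a_q(\mathbf{k}) \right). \tag{2.15} $$

We will follow the standard practice of discarding any vacuum energy contributions and write

$$ H_f = \hbar c\sum_q \int d^3k\, |\mathbf{k}|\, a^\dagger_q(\mathbf{k})a_q(\mathbf{k}). \tag{2.16} $$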

2.2 Wave Packets, Fock Space and Stochastic Processes

For a single simple harmonic oscillator, the state space is spanned by a complete basis of states labeling the number of quanta in the oscillator. In the free-space EM field, we instead have a continuous distribution of oscillators, each representing a plane wave Fourier component with one of two orthogonal polarization states. This continuous nature means that the field operators are unbounded in two ways. For a given k we can have an arbitrarily large number of quanta, and a single quantum can have an unbounded amount of energy if we allow pure plane wave states with arbitrarily large wave numbers |k|. The solution to these problems is to consider only states of light for which these operators return finite quantities.

Notice that in Eqs. (2.12)–(2.16) the plane wave operators $a_q(\mathbf{k})$ and $a^\dagger_q(\mathbf{k})$ act as operator-valued integral kernels, combined with various weighting functions to form the physically relevant operators. Their singular commutation relation $[a_q(\mathbf{k}), a^\dagger_{q'}(\mathbf{k}')] = \delta_{qq'}\delta(\mathbf{k}-\mathbf{k}')$ implies that they are only well defined in the context of an integral, where the Dirac delta function is well behaved. The point is that the states on which the operators $\mathbf{E}$, $\mathbf{B}$ and $H_f$ return finite values should not be considered as a set of distinct plane wave oscillators, but instead in terms of continuous functions defined over ranges of Fourier components. We will refer to these distributions as wave packets, in that by constructing a properly weighted distribution over plane waves one arrives at a localized pulse, or packet, of waves that propagates in some direction. Rather than initially discussing wave packets in terms of single quanta, it is easier to first define wave packet states in terms of semiclassical states of light that generalize the coherent state of a single mode harmonic oscillator.

2.2.1 Wave packets

A semiclassical wave packet identifies those states of light that reproduce coherent

classical radiation when one takes expectation values of the quantized operators

A(x), E(x), etc. This relationship has been identified by many authors, e.g. Deutsch

[40], Garrison and Chiao [41], Smith and Raymer [42]. We review these results here,

focusing on the physical interpretation for the wave packet distributions.

The single-mode coherent state, ψ = |α⟩, is characterized by the complex amplitude α; its mean photon number is given by |α|², and it is an eigenstate of the annihilation operator, a|α⟩ = α|α⟩. We have seen that in the canonical quantization of the free field, each plane wave and transverse polarization vector has its own annihilation operator $a_q(\mathbf{k})$, and so a corresponding coherent state of light requires a complex vector-valued function $\mathbf{g}(\mathbf{k})$. The coherent state ψ[g] satisfies the equation

$$ a_q(\mathbf{k})\,\psi[\mathbf{g}] = g_q(\mathbf{k})\,\psi[\mathbf{g}]. \tag{2.17} $$

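Having hypothesized the existence of the states ψ[g], we would like to see how the coherent amplitude function g(k) relates to physical quantities in expectation. Taking the expectation value of the (vacuum-energy-removed) Hamiltonian, Eq. (2.16), we easily see that

$$ \langle H_f\rangle_{\psi[\mathbf{g}]} = \hbar c\sum_q\int d^3k\, |\mathbf{k}|\, |g_q(\mathbf{k})|^2. \tag{2.18} $$

An equally simple calculation results in

$$ \langle \mathbf{A}(\mathbf{x})\rangle_{\psi[\mathbf{g}]} = \sum_q \int \frac{d^3k}{(2\pi)^{3/2}} \sqrt{\frac{\hbar}{2\varepsilon_0 c|\mathbf{k}|}}\, e^{i\mathbf{k}\cdot\mathbf{x}}\, g_q(\mathbf{k})\, \mathbf{e}^*_q(\mathbf{k}) + \text{c.c.}, \tag{2.19} $$

$$ \langle \mathbf{E}(\mathbf{x})\rangle_{\psi[\mathbf{g}]} = i\sum_q \int \frac{d^3k}{(2\pi)^{3/2}} \sqrt{\frac{\hbar c|\mathbf{k}|}{2\varepsilon_0}}\, e^{i\mathbf{k}\cdot\mathbf{x}}\, g_q(\mathbf{k})\, \mathbf{e}^*_q(\mathbf{k}) + \text{c.c.} \tag{2.20} $$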

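It is possible to invert these two equations and so express g(k) in terms of the spatial Fourier transform of a classical vector potential, $\mathcal{A}(\mathbf{k},t)$. Performing this inversion, we find that

$$ \mathbf{g}(\mathbf{k}) = \sqrt{\frac{\varepsilon_0}{2\hbar c|\mathbf{k}|}} \left( c|\mathbf{k}|\,\mathcal{A}(\mathbf{k},0) + i\,\frac{\partial}{\partial t}\mathcal{A}(\mathbf{k},t)\Big|_{t=0} \right). \tag{2.21} $$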

When quantizing the field, Eq. (2.21) and its adjoint are nothing more than the

“normal variables” that are in classical correspondence to aq(k) and a†q(k) [44].

It is common in optics to relate the physical classical fields $\mathcal{A}$, $\mathcal{E}$, and $\mathcal{B}$ to a unitless mode function. In terms of the vector potential this results in the ansatz

$$ \mathcal{A}(\mathbf{x},t) = A_0\left( \mathbf{u}^{(+)}(\mathbf{x},t) + \mathbf{u}^{(-)}(\mathbf{x},t) \right), \tag{2.22} $$

where $A_0$ is a real constant and $\mathbf{u}^{(+)}(\mathbf{x},t)$ is a complex unitless mode function. Because the vector potential is required to be real, we have the relation

$$ \mathbf{u}^{(-)}(\mathbf{x},t) = \mathbf{u}^{(+)*}(\mathbf{x},t). \tag{2.23} $$

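As $\mathbf{u}^{(+)}(\mathbf{x},t)$ is unitless, its integral

$$ v \equiv \int d^3x\, |\mathbf{u}^{(+)}(\mathbf{x},t)|^2 \tag{2.24} $$

has units of volume and is referred to as the mode volume of the field. Taking the spatial Fourier transform of Eq. (2.22), we have

$$ \mathcal{A}(\mathbf{k},t) = A_0\left( \mathbf{u}^{(+)}(\mathbf{k},t) + \mathbf{u}^{(-)}(\mathbf{k},t) \right). \tag{2.25} $$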

Separating $\mathbf{u}^{(+)}(\mathbf{x},t)$ and $\mathbf{u}^{(-)}(\mathbf{x},t)$ distinguishes the positive and negative frequency components, respectively. For a free field, then,

$$ \mathbf{u}^{(+)}(\mathbf{k},t) = \mathbf{u}^{(+)}(\mathbf{k},0)\, e^{-ic|\mathbf{k}|t}. \tag{2.26} $$

The Fourier space version of Eq. (2.23) is

$$ \mathbf{u}^{(-)}(\mathbf{k},t) = \mathbf{u}^{(+)*}(-\mathbf{k},t). \tag{2.27} $$

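To simplify the expression for g(k) given in Eq. (2.21), we need to compute the time derivative of $\mathcal{A}(\mathbf{k},t)$. Substituting Eq. (2.26) into Eq. (2.25) and computing the derivative, we have

$$ \frac{\partial}{\partial t}\mathcal{A}(\mathbf{k},t) = -ic|\mathbf{k}|\,A_0\left( \mathbf{u}^{(+)}(\mathbf{k},t) - \mathbf{u}^{(+)*}(-\mathbf{k},t) \right). \tag{2.28} $$

Substituting this expression into Eq. (2.21), the terms involving $\mathbf{u}^{(+)*}(-\mathbf{k},t)$ cancel, which leads to

$$ \mathbf{g}(\mathbf{k},t) = A_0\sqrt{\frac{2\varepsilon_0 c|\mathbf{k}|}{\hbar}}\; \mathbf{u}^{(+)}(\mathbf{k},t). \tag{2.29} $$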

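Rather than carrying the vector potential constant $A_0$, which usually contains information about the overall intensity of the field, it is useful to relate it back to the magnitude of g. We first define the characteristic wave number $k_1$ as

$$ k_1 \equiv \int d^3k\, |\mathbf{k}|\, \frac{|\mathbf{u}^{(+)}(\mathbf{k},0)|^2}{v}. \tag{2.30} $$

If we regard $v^{-1}|\mathbf{u}^{(+)}(\mathbf{k},0)|^2$ as a normalized distribution in reciprocal space (it integrates to one by Parseval's theorem and Eq. (2.24)), then $k_1$ is the mean magnitude of the wave vector. With this definition,

$$ \|\mathbf{g}\|^2 = \frac{2\varepsilon_0 c\, k_1 v}{\hbar}\, A_0^2. \tag{2.31} $$

Inverting this relationship results in

$$ \mathbf{g}(\mathbf{k},t) = \|\mathbf{g}\| \sqrt{\frac{|\mathbf{k}|}{k_1}}\, \frac{\mathbf{u}^{(+)}(\mathbf{k},0)}{\sqrt{v}}\, e^{-ic|\mathbf{k}|t}. \tag{2.32} $$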

It is worth noting that Eq. (2.32) has units of root volume and that ‖g‖ now acts as a unitless scaling factor. This final formula shows the fundamental relationship between a distribution over coherent state amplitudes g(k) and the positive frequency Fourier components of the mode function $\mathbf{u}^{(+)}(\mathbf{k},0)$. In one sense this has simply been an algebraic exercise, expressing one distribution over spatial frequencies in terms of another. The real utility of this expression is that the mode function $\mathbf{u}^{(+)}(\mathbf{x},t)$ has practical significance, because it describes the spatial and temporal properties of a propagating laser beam.
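As a numerical sanity check of Eq. (2.32), the sketch below works in a one-dimensional analog ($d^3k \to dk$) with a hypothetical Gaussian envelope $u^{(+)}(k,0)$. It computes $v$ and $k_1$ from Eqs. (2.24) and (2.30), builds g(k, t), and confirms that its norm reproduces the chosen scale ‖g‖. The envelope, units with c = 1 and all numbers are illustrative assumptions.

```python
import numpy as np

# Hypothetical 1D illustration: d^3k -> dk and a Gaussian positive-frequency envelope.
k0, s = 50.0, 5.0                        # carrier wave number and spectral width (arbitrary units)
k = np.linspace(k0 - 8 * s, k0 + 8 * s, 16001)
dk = k[1] - k[0]
u_k = np.exp(-(k - k0) ** 2 / (4 * s ** 2))   # |u|^2 has standard deviation s

v = np.sum(np.abs(u_k) ** 2) * dk                       # mode volume via Parseval, Eq. (2.24)
k1 = np.sum(np.abs(k) * np.abs(u_k) ** 2) * dk / v      # mean wave number, Eq. (2.30)

g_norm, c, t = 3.0, 1.0, 0.7              # chosen ||g||, units with c = 1, arbitrary time
g = (g_norm * np.sqrt(np.abs(k) / k1) * u_k / np.sqrt(v)
     * np.exp(-1j * c * np.abs(k) * t))   # Eq. (2.32)

norm_squared = np.sum(np.abs(g) ** 2) * dk
print(k1, norm_squared)                   # k1 sits at k0; norm_squared reproduces g_norm**2
```

The phase $e^{-ic|\mathbf{k}|t}$ drops out of the norm, so ‖g‖ is conserved under free propagation, as expected for a free field.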

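Finally, we express the expected energy in a wave packet state in terms of the envelope function. Substituting Eq. (2.32) into Eq. (2.18) results in

$$ \langle H_f\rangle_{\psi[\mathbf{g}]} = \hbar c\,\|\mathbf{g}\|^2 \int d^3k\, \frac{|\mathbf{k}|^2}{k_1}\, \frac{|\mathbf{u}^{(+)}(\mathbf{k},0)|^2}{v}. \tag{2.33} $$

Similarly to the mean wave number $k_1$, we can define a root-mean-square wave number

$$ k_2 \equiv \left( \int d^3k\, |\mathbf{k}|^2\, \frac{|\mathbf{u}^{(+)}(\mathbf{k},0)|^2}{v} \right)^{1/2}, \tag{2.34} $$

so that

$$ \langle H_f\rangle_{\psi[\mathbf{g}]} = \hbar c\,\|\mathbf{g}\|^2\, \frac{(k_2)^2}{k_1}. \tag{2.35} $$

If, in addition, $|\mathbf{u}^{(+)}(\mathbf{k},0)|^2$ is a sharply peaked function centered at some large wave vector $\mathbf{k}_0$, then $|\mathbf{k}_0| \approx k_1 \approx k_2$. In this case the average energy is

$$ \langle H_f\rangle_{\psi[\mathbf{g}]} \approx \hbar\omega_0\,\|\mathbf{g}\|^2, \tag{2.36} $$

where $\omega_0 = c|\mathbf{k}_0|$.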
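The approximation $|\mathbf{k}_0| \approx k_1 \approx k_2$ can be made quantitative for a simple case. The sketch below assumes a one-dimensional Gaussian $|u^{(+)}(k,0)|^2/v$ with mean $k_0$ and standard deviation $s$, supported at $k > 0$. For this distribution $k_1 = k_0$ and $k_2^2 = k_0^2 + s^2$, so the energy per quantum in Eq. (2.35), $\hbar c\,k_2^2/k_1$, exceeds $\hbar\omega_0$ by a relative amount $(s/k_0)^2$. The Gaussian form and the widths are illustrative assumptions.

```python
import numpy as np

def wave_number_moments(k0, s, points=20001):
    """k1 (Eq. 2.30) and k2 (Eq. 2.34) for a hypothetical 1D Gaussian |u(k,0)|^2 / v.

    The normalized distribution has mean k0 and standard deviation s.
    """
    k = np.linspace(k0 - 10 * s, k0 + 10 * s, points)
    dk = k[1] - k[0]
    weight = np.exp(-(k - k0) ** 2 / (2 * s ** 2))
    weight /= np.sum(weight) * dk          # plays the role of |u^(+)(k,0)|^2 / v
    k1 = np.sum(np.abs(k) * weight) * dk
    k2 = np.sqrt(np.sum(k ** 2 * weight) * dk)
    return k1, k2

k0 = 1.0
ratios = {}
for s in (0.1, 0.01):
    k1, k2 = wave_number_moments(k0, s)
    # <H_f> / (hbar c ||g||^2) = k2**2 / k1 by Eq. (2.35); for this Gaussian it equals k0 + s**2 / k0.
    ratios[s] = k2 ** 2 / k1
    print(f"s/k0 = {s / k0}: k1 = {k1:.6f}, k2 = {k2:.6f}, k2^2/k1 = {ratios[s]:.6f}")
```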

2.2.2 Weyl operators

Assuming the existence of the semiclassical states is only a first step; the real utility comes from finding the family of operators that generates these states. For the simple harmonic oscillator, the coherent state with amplitude α is generated by the unitary displacement operator

$$ D_{\rm sho}(\alpha) = \exp\left(\alpha a^\dagger - \alpha^* a\right) \quad\text{with}\quad |\alpha\rangle = D_{\rm sho}(\alpha)|0\rangle. \tag{2.37} $$

Writing (2.37) in terms of its generator Υ(α),

$$ D_{\rm sho}(\alpha) = \exp\left(-i\Upsilon(\alpha)\right), \tag{2.38} $$

we find that

$$ \Upsilon(\alpha) = i\left(\alpha a^\dagger - \alpha^* a\right). \tag{2.39} $$
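The single-mode statement $|\alpha\rangle = D_{\rm sho}(\alpha)|0\rangle$ can be checked numerically on a truncated Fock space. The sketch below assumes a truncation dimension of 40, which is ample for |α| ≈ 1. It exponentiates the anti-Hermitian generator of Eq. (2.37) through a Hermitian eigendecomposition, to avoid depending on a separate matrix-exponential routine. It then compares the result with the number-state amplitudes of Eq. (2.56). The function names are our own.

```python
import math
import numpy as np

def displacement(alpha, dim):
    """Truncated D_sho(alpha) = exp(alpha a^dag - alpha^* a), Eq. (2.37).

    The generator is anti-Hermitian, so -i * generator is Hermitian and the matrix
    exponential can be taken through numpy's Hermitian eigendecomposition.
    """
    a = np.diag(np.sqrt(np.arange(1, dim)), k=1).astype(complex)   # a|n> = sqrt(n)|n-1>
    hermitian = -1j * (alpha * a.conj().T - np.conj(alpha) * a)
    evals, evecs = np.linalg.eigh(hermitian)
    return evecs @ np.diag(np.exp(1j * evals)) @ evecs.conj().T

def coherent_coefficients(alpha, dim):
    """Number-state amplitudes of |alpha>, Eq. (2.56), truncated to dim terms."""
    n = np.arange(dim)
    norms = np.sqrt(np.array([math.factorial(m) for m in range(dim)], dtype=float))
    return np.exp(-abs(alpha) ** 2 / 2) * alpha ** n / norms

dim, alpha = 40, 0.8 + 0.5j
vacuum = np.zeros(dim, dtype=complex)
vacuum[0] = 1.0
displaced = displacement(alpha, dim) @ vacuum
reference = coherent_coefficients(alpha, dim)
print(np.max(np.abs(displaced - reference)))
```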

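Note that as $g_q(\mathbf{k})$ is a distribution of coherent amplitudes over all plane wave modes, we make the correspondence

$$ \alpha^* a \;\to\; g^*_q(\mathbf{k})\, a_q(\mathbf{k}). \tag{2.40} $$

But as this is a pointwise weighting over each plane wave, we define the total field operators a[g] and a†[g] to be

$$ a[\mathbf{g}] \equiv \sum_q\int d^3k\, g^*_q(\mathbf{k})\, a_q(\mathbf{k}), \tag{2.41} $$

$$ a^\dagger[\mathbf{g}] \equiv \sum_q\int d^3k\, g_q(\mathbf{k})\, a^\dagger_q(\mathbf{k}). \tag{2.42} $$

These are sometimes called smeared creation and annihilation operators, as they have been spread over a range of k values. By applying the commutation relations (2.10) and (2.11), it is easy to see that

$$ \left[a[\mathbf{f}], a^\dagger[\mathbf{g}]\right] = \int d^3k\, \mathbf{f}^*(\mathbf{k})\cdot\mathbf{g}(\mathbf{k}). \tag{2.43} $$

An important property that we will use is that, by the linearity of the integral over $d^3k$, for complex coefficients $c_1$ and $c_2$,

$$ c_1\, a^\dagger[\mathbf{f}] + c_2\, a^\dagger[\mathbf{g}] = a^\dagger[c_1\mathbf{f} + c_2\mathbf{g}], \tag{2.44} $$

$$ c_1\, a[\mathbf{f}] + c_2\, a[\mathbf{g}] = a[c_1^*\mathbf{f} + c_2^*\mathbf{g}]. \tag{2.45} $$

In other words, a†[·] is linear in its argument and a[·] is antilinear. The continuous analog of the displacement operator, called a Weyl operator, is

$$ W[\mathbf{g}] \equiv \exp\left(a^\dagger[\mathbf{g}] - a[\mathbf{g}]\right), \tag{2.46} $$

and the coherent state ψ[g] is given by

$$ \psi[\mathbf{g}] = W[\mathbf{g}]\,|\emptyset\rangle. \tag{2.47} $$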

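Because $[a[\mathbf{g}], a^\dagger[\mathbf{g}]] = \int d^3k\,|\mathbf{g}(\mathbf{k})|^2$ is a c-number, applying the Zassenhaus formula to the Weyl operator shows that

$$ W[\mathbf{g}] = \exp\left(a^\dagger[\mathbf{g}]\right) \exp\left(-a[\mathbf{g}]\right) \exp\left(-\tfrac{1}{2}\int d^3k\, |\mathbf{g}(\mathbf{k})|^2\right). \tag{2.48} $$

Because $a[\mathbf{g}]|\emptyset\rangle = 0$ for any g, this implies that

$$ \psi[\mathbf{g}] = \exp\left(-\tfrac{1}{2}\int d^3k\, |\mathbf{g}(\mathbf{k})|^2\right) \exp\left(a^\dagger[\mathbf{g}]\right)|\emptyset\rangle. \tag{2.49} $$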

When proving limits involving sequences of coherent states, it is often more convenient to work with unnormalized state vectors. Therefore, it is common to define an exponential vector

$$ e[\mathbf{g}] \equiv \exp\left(a^\dagger[\mathbf{g}]\right)|\emptyset\rangle. \tag{2.50} $$

One particularly useful relationship that we will apply repeatedly is

$$ a[\mathbf{f}]\,\psi[\mathbf{g}] = \int d^3k\, \mathbf{f}^*(\mathbf{k})\cdot\mathbf{g}(\mathbf{k})\; \psi[\mathbf{g}], \tag{2.51} $$

i.e., ψ[g] is an eigenstate of any smeared annihilation operator a[f], regardless of the smearing function f. If the functions f and g are orthogonal, however, that eigenvalue is zero.

2.2.3 Fock space

A number of useful relations can be derived involving the Weyl displacement operators and the exponential vectors. Before doing so, it is necessary to introduce some of the formal and algebraic properties of second quantization and Fock spaces. Note that if we take f and g to be any square integrable complex functions, then the right hand side of (2.43) forms an inner product on a Hilbert space of wave packets. We will denote the inner product as

$$ \langle \mathbf{g}, \mathbf{f}\rangle \equiv \sum_q\int d^3k\, g^*_q(\mathbf{k})\, f_q(\mathbf{k}) \tag{2.52} $$

and the Hilbert space of wave packets as

$$ \mathfrak{h} \equiv \{\mathbf{g}(\mathbf{k}) : \langle\mathbf{g},\mathbf{g}\rangle < \infty\}. \tag{2.53} $$

A Fock space $\mathcal{F}$ is a total Hilbert space describing an unknown and possibly unbounded number of particles, each represented by a state in a single particle Hilbert space $\mathfrak{h}$. If a single particle is described by $\mathfrak{h}$, then the full Hilbert space of two such particles is the tensor product of two copies of $\mathfrak{h}$; likewise, for three particles there is a threefold product. We will denote the joint space of n particles as $\mathfrak{h}^{\otimes n} = \mathfrak{h}\otimes\mathfrak{h}\otimes\cdots\otimes\mathfrak{h}$, with n factors. In a total space with an indeterminate number of particles, the subspaces containing different numbers of particles are mutually orthogonal, so the total space is the direct sum over these subspaces. If we take the space for zero particles to be the complex numbers, $\mathfrak{h}^{\otimes 0} = \mathbb{C}$, then the full Fock space is given by

$$ \mathcal{F}_{\rm full}(\mathfrak{h}) = \bigoplus_{n=0}^{\infty} \mathfrak{h}^{\otimes n}. \tag{2.54} $$

The reason for the notation $\mathcal{F}_{\rm full}(\mathfrak{h})$ is that if $\mathfrak{h}$ is a Hilbert space of bosonic particles, then only states that are symmetric under particle exchange are physical. We denote the symmetric subspace of $\mathfrak{h}^{\otimes n}$ by $\mathfrak{h}^{\otimes_s n}$, so the symmetric Fock space is given by

$$ \mathcal{F}_{\rm sym}(\mathfrak{h}) = \bigoplus_{n=0}^{\infty} \mathfrak{h}^{\otimes_s n}. \tag{2.55} $$

We are strictly interested in bosonic particles, so throughout this document, when we refer to $\mathcal{F}(\cdot)$, we mean the symmetric Fock space.

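For a single simple harmonic oscillator, the coherent state |α⟩ is expressed in terms of the number states |n⟩ as

$$ |\alpha\rangle = \sum_{n=0}^{\infty} \frac{\alpha^n}{\sqrt{n!}}\, e^{-|\alpha|^2/2}\, |n\rangle. \tag{2.56} $$

The equivalent expression for the wave packet state ψ[f] is

$$ \psi[\mathbf{f}] = e^{-\|\mathbf{f}\|^2/2} \bigoplus_{n=0}^{\infty} \frac{\mathbf{f}^{\otimes n}}{\sqrt{n!}}. \tag{2.57} $$

From the relation $\langle \mathbf{f}^{\otimes n}, \mathbf{g}^{\otimes n}\rangle = \langle\mathbf{f},\mathbf{g}\rangle^n$, we have

$$ \langle\psi[\mathbf{f}]|\psi[\mathbf{g}]\rangle = \exp\left(-\tfrac{1}{2}\left(\|\mathbf{f}\|^2 + \|\mathbf{g}\|^2\right) + \langle\mathbf{f},\mathbf{g}\rangle\right), \tag{2.58} $$

or equivalently

$$ \langle e[\mathbf{f}]|e[\mathbf{g}]\rangle = e^{\langle\mathbf{f},\mathbf{g}\rangle}. \tag{2.59} $$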
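Equation (2.58) can be checked numerically by assuming that f and g lie in the span of a finite number of orthonormal wave packets, modeled here as five discrete modes. In that case the Fock space factorizes, and ψ[f] becomes a tensor product of single-mode coherent states $|f_j\rangle$. The overlap is then a product of single-mode overlaps, which the sketch below compares with the closed form. The mode count, the truncation dimension and the random amplitudes are illustrative assumptions.

```python
import math
import numpy as np

def coherent_coefficients(alpha, dim):
    """Truncated number-state amplitudes of the single-mode coherent state |alpha>, Eq. (2.56)."""
    n = np.arange(dim)
    norms = np.sqrt(np.array([math.factorial(m) for m in range(dim)], dtype=float))
    return np.exp(-abs(alpha) ** 2 / 2) * alpha ** n / norms

def product_coherent_overlap(f, g, dim=40):
    """<psi[f]|psi[g]> for a wave packet expanded in a finite orthonormal mode basis.

    In such a basis psi[f] is a tensor product of single-mode coherent states |f_j>,
    so the overlap factorizes into a product of single-mode overlaps.
    """
    overlap = 1.0 + 0.0j
    for f_j, g_j in zip(f, g):
        overlap *= np.vdot(coherent_coefficients(f_j, dim), coherent_coefficients(g_j, dim))
    return overlap

def closed_form_overlap(f, g):
    """Eq. (2.58), with the discrete inner product <f, g> = sum_j conj(f_j) g_j."""
    return np.exp(-0.5 * (np.vdot(f, f).real + np.vdot(g, g).real) + np.vdot(f, g))

rng = np.random.default_rng(seed=4)
f = 0.4 * (rng.normal(size=5) + 1j * rng.normal(size=5))
g = 0.4 * (rng.normal(size=5) + 1j * rng.normal(size=5))
print(product_coherent_overlap(f, g), closed_form_overlap(f, g))
```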

A number of useful properties involving the Weyl displacement operators and the exponential vectors are the following:

• The Weyl operators obey the composition law

$$ W[\mathbf{g}]\,W[\mathbf{f}] = \exp\left(-\tfrac{1}{2}\left(\langle\mathbf{g},\mathbf{f}\rangle - \langle\mathbf{f},\mathbf{g}\rangle\right)\right) W[\mathbf{g}+\mathbf{f}]. \tag{2.60} $$

• The action of the Weyl operator on an exponential vector is

$$ W[\mathbf{g}]\,e[\mathbf{f}] = e^{-\langle\mathbf{g},\mathbf{f}\rangle - \|\mathbf{g}\|^2/2}\, e[\mathbf{f}+\mathbf{g}]. \tag{2.61} $$

• The linear span of all the exponential vectors (and equivalently of the coherent states) is dense in the symmetric Fock space $\mathcal{F}(\mathfrak{h})$, meaning that any state in $\mathcal{F}(\mathfrak{h})$ can be approximated arbitrarily well by finite linear combinations of exponential vectors [45].

• Written in terms of the single particle inner product, the exponential vector e[g] is an eigenvector of the annihilation operator a[f], with

$$ a[\mathbf{f}]\,e[\mathbf{g}] = \langle\mathbf{f},\mathbf{g}\rangle\, e[\mathbf{g}]. \tag{2.62} $$
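The composition law (2.60) and the action (2.61) can be spot-checked in the single-mode case, where ⟨g, f⟩ reduces to $g^*f$. The sketch below uses truncated displacement matrices of dimension 60 and applies both sides of (2.60) to the lowest five number states. These states sit far from the truncation edge, which is why the truncation error is negligible. The single-mode exponential vector is $e[f] = e^{|f|^2/2}D(f)|0\rangle$, following Eqs. (2.49) and (2.50). The amplitudes and dimension are illustrative assumptions.

```python
import numpy as np

def displacement(alpha, dim):
    """Truncated single-mode Weyl operator D(alpha) = exp(alpha a^dag - alpha^* a)."""
    a = np.diag(np.sqrt(np.arange(1, dim)), k=1).astype(complex)
    hermitian = -1j * (alpha * a.conj().T - np.conj(alpha) * a)   # exp(generator) = exp(i * hermitian)
    evals, evecs = np.linalg.eigh(hermitian)
    return evecs @ np.diag(np.exp(1j * evals)) @ evecs.conj().T

def exponential_vector(f, dim):
    """Single-mode e[f] = exp(f a^dag)|0> = exp(|f|^2 / 2) D(f)|0>, cf. Eqs. (2.49)-(2.50)."""
    vacuum = np.zeros(dim, dtype=complex)
    vacuum[0] = 1.0
    return np.exp(abs(f) ** 2 / 2) * (displacement(f, dim) @ vacuum)

dim = 60
f, g = 0.6 + 0.3j, -0.4 + 0.5j                   # single-mode amplitudes; <g, f> = conj(g) * f
low_states = np.eye(dim, dtype=complex)[:, :5]   # |0>, ..., |4>, far from the truncation edge

# Eq. (2.60): W[g] W[f] = exp(-(<g, f> - <f, g>)/2) W[g + f]
lhs = displacement(g, dim) @ displacement(f, dim) @ low_states
phase = np.exp(-0.5 * (np.conj(g) * f - np.conj(f) * g))
rhs = phase * (displacement(f + g, dim) @ low_states)

# Eq. (2.61): W[g] e[f] = exp(-<g, f> - |g|^2 / 2) e[f + g]
lhs_ev = displacement(g, dim) @ exponential_vector(f, dim)
rhs_ev = np.exp(-np.conj(g) * f - abs(g) ** 2 / 2) * exponential_vector(f + g, dim)

print(np.max(np.abs(lhs - rhs)), np.max(np.abs(lhs_ev - rhs_ev)))
```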

2.2.4 A basis independent expression for the wave packet inner product

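An alternative to expressing $\|\mathbf{g}(\mathbf{k},t)\|^2$ in terms of the characteristic parameters v, $k_1$, etc. is to relate it to a basis independent expression involving the physical (classical) fields $\mathcal{E}$ and $\mathcal{A}$. From Eq. (2.29), with $\mathcal{A}^{(+)} \equiv A_0\mathbf{u}^{(+)}$, we can see that

$$ \mathbf{g}(\mathbf{k},t) = \sqrt{\frac{2\varepsilon_0 c|\mathbf{k}|}{\hbar}}\; \mathcal{A}^{(+)}(\mathbf{k},t) \tag{2.63} $$

and that

$$ \|\mathbf{g}(\mathbf{k},t)\|^2 = \frac{2\varepsilon_0}{\hbar}\int d^3k\; c|\mathbf{k}|\; \mathcal{A}^{(+)*}(\mathbf{k},t)\cdot\mathcal{A}^{(+)}(\mathbf{k},t). \tag{2.64} $$

The presence of the factor $c|\mathbf{k}|$ makes this expression inherently tied to the k basis and not immediately expressible in terms of real space quantities. However, in the Coulomb gauge $\mathcal{E} = -\partial_t\mathcal{A}$ and $\mathcal{A}^{(+)}(\mathbf{k},t) = \mathcal{A}^{(+)}(\mathbf{k},0)e^{-ic|\mathbf{k}|t}$. Together these give the equality

$$ c|\mathbf{k}|\,\mathcal{A}^{(+)}(\mathbf{k},t) = -i\,\mathcal{E}^{(+)}(\mathbf{k},t). \tag{2.65} $$

Substituting this relation into Eq. (2.64),

$$ \|\mathbf{g}(\mathbf{k},t)\|^2 = \frac{2i\varepsilon_0}{\hbar}\int d^3k\; \mathcal{E}^{(+)*}(\mathbf{k},t)\cdot\mathcal{A}^{(+)}(\mathbf{k},t). \tag{2.66} $$

This expression is basis independent, in the sense that we can take the inverse transforms (using Parseval's theorem) to arrive at

$$ \|\mathbf{g}\|^2 = \frac{2i\varepsilon_0}{\hbar}\int d^3x\; \mathcal{E}^{(+)*}(\mathbf{x},t)\cdot\mathcal{A}^{(+)}(\mathbf{x},t). \tag{2.67} $$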

In [42], Smith and Raymer derive a Dirac quantization scheme for a photon wave function, equivalent to the more standard expressions reviewed in Sec. 2.1.1. In that work they assume that for each polarization vector q there exists a countable set of complete scalar orthonormal wave packets $\{g_{jq}(\mathbf{k})\}$, which therefore satisfy the properties

$$ \sum_j g_{jq}(\mathbf{k})^*\, g_{jq}(\mathbf{k}') = \delta(\mathbf{k}-\mathbf{k}') \quad\text{and}\quad \int d^3k\; g_{jq}(\mathbf{k})^*\, g_{j'q}(\mathbf{k}) = \delta_{jj'}. \tag{2.68} $$

They then observe that the classical electric fields $\mathcal{E}^{(+)}_{jq}(\mathbf{x},t)$ corresponding to these wave packets via Eq. (2.20) are no longer orthogonal under a real space overlap integral, precisely because of the weighting factor of $\sqrt{|\mathbf{k}|}$:

$$ \int d^3x\; \mathcal{E}^{(+)*}_{jq}(\mathbf{x},t)\cdot\mathcal{E}^{(+)}_{j'q'}(\mathbf{x},t) \neq \delta_{jj'}\delta_{qq'}. \tag{2.69} $$

They also observe that if one instead considers the overlap with the vector potential, then orthogonality is preserved, owing to the cancellation of the factors of $\sqrt{|\mathbf{k}|}$. This is precisely the statement that if $\langle\mathbf{g}_j, \mathbf{g}_{j'}\rangle = \delta_{jj'}$, then

$$ \langle\mathbf{g}_j,\mathbf{g}_{j'}\rangle = \frac{2i\varepsilon_0}{\hbar}\int d^3x\; \mathcal{E}^{(+)*}_j(\mathbf{x},t)\cdot\mathcal{A}^{(+)}_{j'}(\mathbf{x},t) = -\frac{2i\varepsilon_0}{\hbar}\int d^3x\; \mathcal{A}^{(+)*}_j(\mathbf{x},t)\cdot\mathcal{E}^{(+)}_{j'}(\mathbf{x},t) = \delta_{jj'}. \tag{2.70} $$

In quantizing a photon's wave function, Smith and Raymer take $\mathcal{E}^{(+)}(\mathbf{x},t)$ to be the fundamental single particle wave functions. They further observe that, to preserve orthogonality in the real space inner product, the dual vectors are not $\mathcal{E}^{(+)*}(\mathbf{x},t)$ but are instead proportional to $\mathcal{A}^{(+)*}(\mathbf{x},t)$.

In this work we will continue to view the single particle vectors as the wave packet functions g rather than the associated classical electric field. This is for two reasons. First, it is mathematically convenient that the vector dual to the wave packet g is simply its complex conjugate. Second, we continue to treat g in analogy with the simple harmonic oscillator's coherent state, with the vector potential $\mathcal{A}$ and the electric field $\mathcal{E}$ in correspondence with the X and P quadratures.

2.2.5 Fock space and stochastic processes

By structuring the Hilbert space of the free EM field as a Fock space defined over a single particle Hilbert space $\mathfrak{h}$, we can now define quantum stochastic processes and a quantum stochastic calculus. Consider again the example of a coherent laser pulse propagating towards a photon counter. A quantum description of a traveling wave laser pulse is a coherent wave packet state ψ[g], where g(k, t) is related to the classical field by Eq. (2.29). Imagine a perfect space-fixed detector that returns a voltage directly proportional to the total energy in a given classical wave packet g(k, t). Furthermore, imagine that this detector is activated during the interval $[t_0, t_1]$, after which the voltage is read. If the "entirety" of an incident wave packet g could be absorbed in that time, then the detector should be modeled as making a projective measurement on the part of Fock space containing ψ[g]. Depending on the details of the detector, it could equally well have recorded pulses that were sufficiently similar to g in magnitude, temporal profile, or carrier frequency. For instance, a 100% efficient detector with a linear response should be able to measure pulses with 2 µW of average laser power just as well as a pulse with 200 mW of power. To model a physical measurement as a Hermitian operator acting on some Fock space, we therefore need to define the set of possible wave packets that the detector could have completely measured. Fig. 2.1 shows a schematic in which a paraxial laser pulse is focused onto a gated photo-detector.

In the second quantization formalism we can give a mathematical chain from classical wave packets to field operators. In the time interval $[t_0, t_1]$ a fixed detector could projectively measure some set {g} of incident wave packets, and the linear span of these wave packets forms a subspace $\mathfrak{h}_{[t_0,t_1]} \subset \mathfrak{h}$. In turn, $\mathfrak{h}_{[t_0,t_1]}$ defines a subspace $\mathcal{F}(\mathfrak{h}_{[t_0,t_1]}) \subset \mathcal{F}(\mathfrak{h})$. Furthermore, there exist operators $O_{[t_0,t_1]}$ that act nontrivially on coherent states $\psi[\mathbf{g}] \in \mathcal{F}(\mathfrak{h}_{[t_0,t_1]})$ but as the identity on any $\psi[\mathbf{g}_\perp]$ with $\mathbf{g}_\perp$ orthogonal to $\mathfrak{h}_{[t_0,t_1]}$. An operator $X_{[t_0,t_1]}$ identified by this procedure then defines a quantum stochastic process, by considering the family of operators $\{X_{[t_0,t]} : t_0 \le t < \infty\}$.

[Figure 2.1 (Paraxial Measurement Model): schematic of an incident envelope and temporal pulse passing through a collection lens and color filter onto a detector, with the detected subspace $\mathfrak{h}_{[t_0,t_1]}$ marked along a time axis.]

Figure 2.1: A Model of a Paraxial Measurement. A photo-detector is positioned relative to an optical system which defines a paraxial beam with a characteristic wavelength and mode profile. Here the optical system is defined simply by a focusing lens and color filter, with the beam schematically indicated by lines of constant intensity. The detector is activated between times $t_0$ and $t_1$, which corresponds, at time t = 0, to a pulse localized in space in the region $\Delta z = c(t_1 - t_0)$. For a perfect detector with a linear response, the integrated output current will be proportional to the total pulse energy and can be modeled as making a projective measurement on the sub-Fock space of pulses localized between these times.

The requirement that $X_{[t_0,t]}$ acts trivially on coherent states $\psi[\mathbf{g}_\perp]$ defines a process that is time-adapted, in direct analogy with a time-adapted classical stochastic process (see Sec. 3.1.1 for the definition of a time-adapted classical stochastic process). Turning this qualitative procedure into a mathematically sound object requires explicitly constructing $\mathfrak{h}_{[t_0,t_1]}$. That in turn requires defining what it means to measure the entirety of a wave packet in a finite time interval. While naïvely this may seem trivial, in practice it intersects with the problem of defining a localizable photon in quantum field theory. We illustrate why this is an issue next.

2.2.6 Localized wave packets and stochastic processes

While the canonical quantization of the free field is most easily performed in the

Fourier domain, the mathematical structure of the second quantized Fock space

F (h) is generally basis independent. The operators a[g] and a†[f ] can be related to

the coherent states ψ[h] without any reference to the fact that the wave packets are

originally defined with respect to k. Any unitary transformation of g is an equally valid expression of the wave packet state, in that the Hilbert space of wave packets h = {g : ⟨g, g⟩ < ∞} is basis independent. The only element that depends upon g being defined in the Fourier domain is its relationship to the spatial profile of the mode function u(+)(x, t). But as we have defined the wave packets in the Fourier domain, it is not immediately apparent what effect the constraint ‖g‖² < ∞ has on u(+)(x, t). One drastic result of this constraint is that it prohibits one from defining fields that are strictly localized in space [41, 46].

To see why this is true, consider a one-dimensional case where we wish to define a square pulse of spatial length L (duration L/c), with a carrier frequency ω0 = c k0. The mode function for such a pulse is

$$u^{(+)}_{\rm loc}(z,t) = \chi_{[0,L]}(z-ct)\,\exp\!\big(+ik_0(z-ct)\big). \tag{2.71}$$

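Taking the spatial Fourier transform shows that

$$\tilde u^{(+)}_{\rm loc}(k,t) = \frac{L}{\sqrt{2\pi}}\,\mathrm{sinc}\!\left(\tfrac{1}{2}L(k-k_0)\right)\exp\!\left(-\tfrac{i}{2}(k-k_0)L - ickt\right), \tag{2.72}$$

where sinc(x) = sin(x)/x, and from Eq. (2.29) we then have g_loc(k, t) ∝ √|k| ũ(+)loc(k, t). The √|k| factor makes all the difference: if we try to calculate the norm, we find

$$\|g_{\rm loc}(k,t)\|^2 \propto \int_{-\infty}^{\infty} dk\,|k|\,\mathrm{sinc}^2\!\left(\tfrac{1}{2}L(k-k_0)\right) = \infty. \tag{2.73}$$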

The failure of this calculation stems from the fact that the indicator function χ[a,b] is discontinuous. In the Fourier domain this discontinuity shows up as a slowly decaying tail, since sinc²(½L(k − k0)) falls off only as 1/k²; the extra factor of |k| turns this into a 1/|k| integrand and hence a logarithmic divergence. This example points to a nonlocalizable property of photon wave packets. This nonlocal property of a photon wave function has been studied by many authors, with various definitions for a photon's wave function; see [42] for a pertinent discussion. While this example alone does not show that every localized wave packet suffers from this or a similar problem, this is indeed the case. In [46], Bialynicki-Birula proves that the energy density of a photon can be localized no better than an exponential function, exp(−f(r)), where f(r) grows slightly slower than a linear function in r.
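The logarithmic divergence in Eq. (2.73) can be seen numerically. The sketch below assumes illustrative values L = 1 and k0 = 10 in arbitrary units (not values from the text) and evaluates the integral over a finite cutoff [−K, K] with the trapezoid rule. A convergent integral would level off as K grows; here each decade of cutoff instead adds roughly 4 ln(10)/L², the signature of a 1/|k| tail.

```python
import numpy as np

L, k0 = 1.0, 10.0   # hypothetical pulse length and carrier wave number (arbitrary units)


def integrand(k):
    """|k| * sinc^2(L (k - k0) / 2), with sinc(x) = sin(x) / x as in Eq. (2.73)."""
    # np.sinc(x) is the normalized sinc, sin(pi x) / (pi x), hence the 1/(2 pi)
    return np.abs(k) * np.sinc(L * (k - k0) / (2 * np.pi)) ** 2


def truncated_norm(K, dk=0.01):
    """Trapezoid-rule estimate of the integral restricted to [-K, K]."""
    k = np.arange(-K, K + dk / 2, dk)
    y = integrand(k)
    return dk * (y.sum() - 0.5 * (y[0] + y[-1]))


cutoffs = [1e2, 1e3, 1e4]
values = [truncated_norm(K) for K in cutoffs]
increments = np.diff(values)   # each close to 4 ln(10) / L**2, about 9.2
print(values, increments)
```

The growth does not stop at any finite cutoff, which is the numerical face of the statement that the sharply localized pulse has no normalizable wave packet.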

At this point, we must make some approximation that admits localized states of light. Ultimately this means that we will relax the measurement window to include functions that are localized up to exponentially damped tails. However, we gain physical insight by considering a temporal rescaling, so that on a timescale that is long compared to an optical period, a smoothly varying function can appear to be a localized discontinuous function. We will also show that through this rescaling one obtains all the familiar approximations in quantum optics, namely the Markov, quasi-monochromatic, and rotating wave approximations. It also provides a gateway for defining quantum white noise and, equivalently, the necessary conditions for applying a quantum Wong-Zakai theorem to arrive at a physical realization of quantum stochastic calculus.

2.3 Paraxial Envelopes and Measurable Pulses

Before constructing temporally localized wave packets, we first must address the spatial/temporal decomposition indicated in Fig. 2.1. An excellent mathematical description of the focused collection of light by a series of thin lenses is a paraxial model, in which a coherent plane wave propagates along the optical axis but is spatially and temporally modulated by a slowly varying envelope function. This envelope function describes how the plane wave is localized to the optical axis as well as how the phase fronts are distorted by the optical system [47, 48]. Appendix A reviews the derivation of the paraxial wave equation, which describes the propagation of a slowly varying envelope, and computes its spatial Fourier transform.

A paraxial and quasi-monochromatic wave is characterized by the complex function

$$U^{(+)}(\mathbf{x},t) = f(t_r)\,u^{(+)}_T(\mathbf{x}_T,z)\,e^{-i\omega_0 t_r}. \tag{2.74}$$

Here e_z is the axis of propagation, x_T denotes the remaining transverse coordinates, and t_r = t − z/c is the retarded time. The paraxial mode function u(+)T(xT, z) describes how the carrier wave (with angular frequency ω0 and wave number k0) is modulated as it propagates along the optical axis. Note that this is a time-independent quantity. Any nontrivial time dependence is given by the temporal envelope function f(tr), which decouples from the paraxial mode u(+)T. The only requirement is that f(tr) be slowly varying, |df/dt| ≪ ω0|f|, ensuring that the full solution is quasi-monochromatic.

The problem of finding the space of measurable wave packets now translates into

finding the temporal envelopes f(tr) that “fit” in the measurement window [t0, t1].

Identifying the appropriate spatial mode function u(+)T(xT, z) for a given optical system is simply a problem of classical optics and is well modeled by a Hermite-Gaussian mode function [48]. Here we are only concerned with the fact that such a function exists, is well defined, and has a given “transverse area”. In free space, conservation of energy requires that the total power passing through a plane transverse to the optical axis be conserved, which manifests through the property that for any paraxial mode we have

$$\int_{\mathbb{R}^2} d^2k_T\,\big|\tilde u^{(+)}_T(\mathbf{k}_T,z)\big|^2 \equiv \sigma_T \tag{2.75}$$

and that σT is independent of z for any finite z. While the distribution of energy in the transverse plane can vary due to diffraction, the total power passing through an infinite transverse plane will be conserved. This transverse area can be combined with the square-integrated temporal duration

$$\tau \equiv \int dt\,|f(t)|^2, \tag{2.76}$$

to construct the total mode volume

$$v = c\,\tau\,\sigma_T. \tag{2.77}$$

We have already shown how a wave packet g is related to the spatial Fourier transform of a classical vector potential, A(+)(k, t), and how that can be expressed in terms of a unitless mode function u(+)(x, t). The spatial Fourier transform of the paraxial mode function is given by

$$\tilde U^{(+)}(\mathbf{k},t) = c\,\tilde f\big(\omega(\mathbf{k})-\omega_0\big)\,\tilde u^{(+)}_T(\mathbf{k}_T,0)\,e^{-i\omega(\mathbf{k})t}, \tag{2.78}$$

where f̃ is the temporal Fourier transform of the pulse envelope, ũ(+)T(kT, 0) is the spatial transform of the mode function with respect to the transverse coordinates xT (evaluated at z = 0), and ω(k) is the frequency, approximated paraxially as

$$\omega(\mathbf{k}) \equiv c\,|\mathbf{k}| \approx c\left(k_z + \frac{|\mathbf{k}_T|^2}{2k_0}\right). \tag{2.79}$$

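Eq. (2.29) relates g(k, t) to A(+)(k, t), and so

$$g(\mathbf{k},t) = A_0\sqrt{\frac{2\varepsilon_0\,\omega(\mathbf{k})}{\hbar}}\;c\,\tilde f\big(\omega(\mathbf{k})-\omega_0\big)\,\tilde u^{(+)}_T(\mathbf{k}_T)\,e^{-i\omega(\mathbf{k})t}. \tag{2.80}$$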

In Sec. 2.2.1, we eliminated the constant A0 in favor of an expression in terms of characteristic parameters, namely the mode volume v and the characteristic wave number k1. Here we abandon k1 in favor of a frequency ω1 = ck1, which is equal to

$$\omega_1 = \frac{1}{v}\int d^3k\;\omega(\mathbf{k})\,\Big|c\,\tilde f\big(\omega(\mathbf{k})-\omega_0\big)\,\tilde u^{(+)}_T(\mathbf{k}_T)\Big|^2. \tag{2.81}$$

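Calculating this integral is easiest with the change of variables (kT, kz) → (kT, ω(k)), from which we find that

$$\omega_1 = \int\frac{d^2k_T}{\sigma_T}\,\big|\tilde u^{(+)}_T(\mathbf{k}_T)\big|^2\int\frac{d\omega(\mathbf{k})}{\tau}\,\omega(\mathbf{k})\,\big|\tilde f\big(\omega(\mathbf{k})-\omega_0\big)\big|^2. \tag{2.82}$$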

Implicit in the paraxial approximation is the requirement that |ũ(+)T(kT)|² → 0 as |kT|² → ∞. This fall-off implies that we can treat the term c|kT|²/2k0 in ω(k) as a finite and independent offset to the dω(k) integral, and need not consider how ω(k) converges as kz → −∞ with |kT|² → ∞. We can then make another change of variables ω(k) → ν + ω0 so that

$$\omega_1 = \int\frac{d^2k_T}{\sigma_T}\,\big|\tilde u^{(+)}_T(\mathbf{k}_T)\big|^2\int\frac{d\nu}{\tau}\,(\omega_0+\nu)\,\big|\tilde f(\nu)\big|^2. \tag{2.83}$$

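When f(t) is real-valued, it is simple to show that |f̃(ν)|² is an even function of ν and therefore has zero mean. In that case we have

$$\omega_1 = \omega_0\int\frac{d\nu}{\tau}\,\big|\tilde f(\nu)\big|^2 + \int\frac{d\nu}{\tau}\,\nu\,\big|\tilde f(\nu)\big|^2 = \omega_0, \tag{2.84}$$

where the transverse integral in Eq. (2.83) equals one by Eq. (2.75), and ∫dν |f̃(ν)|² = τ by Parseval's theorem and Eq. (2.76).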

as one would intuitively expect. In the case of a complex-valued f(t), |f̃(ν)|² will not in general have zero mean. In this more general case,

$$\omega_1 = \omega_0\left(1 + \frac{1}{\omega_0\,\tau}\int d\nu\,\nu\,\big|\tilde f(\nu)\big|^2\right). \tag{2.85}$$

For the one-dimensional localized pulse, Sec. 2.2.6 demonstrated that the second integral is infinite. By writing the Fourier transforms as explicit integrals and then integrating by parts, we can show that

$$\int d\nu\,\nu\,\big|\tilde f(\nu)\big|^2 = -i\int dt\,\left(\frac{d}{dt}f^*(t)\right)f(t). \tag{2.86}$$

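By the slowly varying envelope approximation we require that |df(t)/dt| ≪ ω0|f(t)|, so for the quasi-monochromatic regime to hold, the correction term (ω0τ)⁻¹∫dν ν|f̃(ν)|² in Eq. (2.85), evaluated via Eq. (2.86), must be small, giving ω1 ≈ ω0. Under this approximation we find that

$$g(\mathbf{k},t) \approx \|g\|\sqrt{\frac{\omega(\mathbf{k})\,c}{\omega_0\,\tau\,\sigma_T}}\;\tilde f\big(\omega(\mathbf{k})-\omega_0\big)\,\tilde u^{(+)}_T(\mathbf{k}_T,0)\,e^{-i\omega(\mathbf{k})t}. \tag{2.87}$$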
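As a numerical sanity check of Eqs. (2.85) and (2.86), the sketch below uses a hypothetical Gaussian envelope h(t) = exp(−t²/2) in dimensionless units, chosen only for illustration, together with a frequency-shifted copy f(t) = h(t)e^{−iΔt}. It evaluates the first spectral moment in the frequency domain, assuming the transform convention f̃(ν) = (2π)^{−1/2}∫dt f(t)e^{+iνt} implied by Eq. (2.104), and in the time domain via the right-hand side of Eq. (2.86). For the real envelope the moment vanishes, so ω1 = ω0; for the shifted envelope the correction ω1 − ω0 = τ⁻¹∫dν ν|f̃(ν)|² equals Δ.

```python
import numpy as np

# Time and frequency grids (dimensionless units; illustrative choices)
t = np.linspace(-15.0, 15.0, 1201)
dt = t[1] - t[0]
nu = np.linspace(-10.0, 10.0, 1001)
dnu = nu[1] - nu[0]


def fourier(f):
    """f~(nu) = (2 pi)^(-1/2) * integral dt f(t) exp(+i nu t), by a Riemann sum."""
    return np.exp(1j * np.outer(nu, t)) @ f * dt / np.sqrt(2.0 * np.pi)


def duration(f):
    """tau = integral dt |f(t)|^2, Eq. (2.76)."""
    return np.sum(np.abs(f) ** 2) * dt


def moment_freq(f):
    """integral d nu  nu |f~(nu)|^2, the left-hand side of Eq. (2.86)."""
    return np.sum(nu * np.abs(fourier(f)) ** 2) * dnu


def moment_time(f):
    """-i integral dt (d f*/dt) f(t), the right-hand side of Eq. (2.86)."""
    dfdt = np.gradient(f, dt)   # second-order central differences
    return (-1j * np.sum(np.conj(dfdt) * f) * dt).real


delta = 2.0                               # hypothetical frequency offset
h = np.exp(-t**2 / 2)                     # real envelope
f_shift = h * np.exp(-1j * delta * t)     # complex, frequency-shifted envelope

shift_real = moment_freq(h) / duration(h)             # omega1 - omega0 for h
shift_freq = moment_freq(f_shift) / duration(f_shift)
shift_time = moment_time(f_shift) / duration(f_shift)
print(shift_real, shift_freq, shift_time)  # approximately 0, 2, 2
```

The time-domain estimate carries a small finite-difference error from `np.gradient`, which is why it agrees with Δ less tightly than the frequency-domain sum.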

2.3.1 Paraxial wave packets in the time domain

Even in a quasi-monochromatic regime, the wave packet g is still tied to the Fourier basis due to the factor of √ω(k). We just showed that when f(t) is real-valued or very slowly varying, this factor plays no role in calculating ‖g‖². Here we would like to express the inner product between wave packets that share the same spatial mode in terms of real-space coordinates, in order to observe what effect this “nonlocal” factor has on their temporal distinguishability. If we are able to make the approximation ω(k) ≈ ω0 for a family of wave packets, then there will be a simple unitary relationship between wave packets in the real and Fourier domains. The goal of this section is to identify this family.

Consider the two wave packets g1(k, t) and g2(k, t) that share the same paraxial

mode function u(+)T (xT , z) and carrier frequencies, but have differing temporal profiles

f1(tr) and f2(tr). For simplicity we will assume that the corrective factor of Eq. (2.86)

is small and so Eq. (2.87) is valid for each wave packet. If we then calculate the

unequal time inner product, we have that

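$$\langle g_1(\mathbf{k},t_1), g_2(\mathbf{k},t_2)\rangle = \frac{\|g_1\|\,\|g_2\|}{\sqrt{\tau_1\tau_2}}\int d^3k\,\frac{\omega(\mathbf{k})\,c}{\omega_0\,\sigma_T}\,\big|\tilde u^{(+)}_T(\mathbf{k}_T,0)\big|^2\,\tilde f_1^*\big(\omega(\mathbf{k})-\omega_0\big)\,\tilde f_2\big(\omega(\mathbf{k})-\omega_0\big)\,e^{-i\omega(\mathbf{k})(t_2-t_1)}. \tag{2.88}$$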

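By again making the change of variables (kT, kz) → (kT, ν) with ν = ω(k) − ω0, we are able to integrate out the transverse degrees of freedom to arrive at

$$\langle g_1(\mathbf{k},t_1), g_2(\mathbf{k},t_2)\rangle = \frac{\|g_1\|\,\|g_2\|}{\sqrt{\tau_1\tau_2}}\int d\nu\,\frac{\omega_0+\nu}{\omega_0}\,\tilde f_1^*(\nu)\,\tilde f_2(\nu)\,e^{-i(\omega_0+\nu)(t_2-t_1)}. \tag{2.89}$$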

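In analogy with the convolution theorem, it is easy to show that

$$\int d\nu\,\tilde f_1^*(\nu)\,\tilde f_2(\nu)\,e^{-i\nu t} = \int ds\,f_1^*(s)\,f_2(s+t) = (f_1\star f_2)(t), \tag{2.90}$$

where (f1 ⋆ f2)(t) is the cross-correlation function between f1 and f2 evaluated at time t. Furthermore, by repeating the integration-by-parts transformation from Eq. (2.86), we have that

$$\int d\nu\,\nu\,\tilde f_1^*(\nu)\,\tilde f_2(\nu)\,e^{-i\nu t} = -i\left(\frac{df_1}{dt}\star f_2\right)(t). \tag{2.91}$$

Combining these two facts,

$$\langle g_1(\mathbf{k},t_1), g_2(\mathbf{k},t_2)\rangle = \frac{\|g_1\|\,\|g_2\|}{\sqrt{\tau_1\tau_2}}\,e^{-i\omega_0(t_2-t_1)}\left[(f_1\star f_2)(t_2-t_1) - \frac{i}{\omega_0}\left(\frac{df_1}{dt}\star f_2\right)(t_2-t_1)\right]. \tag{2.92}$$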

While previously we were able to show that for zero delay and a real-valued envelope

the second term would be identically zero, this is clearly not the case for different

wave packets. However, due to the slowly varying envelope approximation we know

that this must be a small correction. Ignoring this correction results in

$$\langle g_1(\mathbf{k},t_1), g_2(\mathbf{k},t_2)\rangle \approx \frac{\|g_1\|\,\|g_2\|}{\sqrt{\tau_1\tau_2}}\,e^{-i\omega_0(t_2-t_1)}\,(f_1\star f_2)(t_2-t_1). \tag{2.93}$$
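The cross-correlation identity of Eq. (2.90), which underlies Eq. (2.93), is easy to check numerically. The sketch below assumes two hypothetical envelopes in dimensionless units (a Gaussian f1 and a displaced, wider, frequency-shifted Gaussian f2), uses the transform convention f̃(ν) = (2π)^{−1/2}∫dt f(t)e^{+iνt} of Eq. (2.104), and compares the frequency-domain integral with the time-domain cross-correlation at a few delays.

```python
import numpy as np

s = np.linspace(-15.0, 15.0, 1201)    # time grid
ds = s[1] - s[0]
nu = np.linspace(-12.0, 12.0, 1201)   # frequency grid
dnu = nu[1] - nu[0]


def f1(t):
    return np.exp(-t**2 / 2)


def f2(t):
    # hypothetical envelope: displaced, wider, and carrying a small frequency offset
    return np.exp(-(t - 1.0)**2 / 4) * np.exp(-0.5j * t)


def fourier(func):
    """f~(nu) = (2 pi)^(-1/2) * integral dt f(t) exp(+i nu t), by a Riemann sum."""
    return np.exp(1j * np.outer(nu, s)) @ func(s) * ds / np.sqrt(2.0 * np.pi)


F1, F2 = fourier(f1), fourier(f2)


def freq_side(delay):
    """integral d nu conj(f1~) f2~ exp(-i nu delay): left-hand side of Eq. (2.90)."""
    return np.sum(np.conj(F1) * F2 * np.exp(-1j * nu * delay)) * dnu


def time_side(delay):
    """(f1 star f2)(delay) = integral ds conj(f1(s)) f2(s + delay)."""
    return np.sum(np.conj(f1(s)) * f2(s + delay)) * ds


for delay in (-2.0, 0.0, 0.7, 2.5):
    print(delay, freq_side(delay), time_side(delay))
```

Both envelopes and their transforms are negligible at the grid edges, so the two Riemann sums agree to high precision.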

Eq. (2.93) shows that the overlap between the two wave packets is proportional to the cross-correlation function of the temporal envelopes. Physically this is an extremely satisfying result: if we have the two (paraxial) field operators a[g1(k, t1)] and a†[g2(k, t2)], then

$$\big[a[g_1(\mathbf{k},t_1)],\,a^\dagger[g_2(\mathbf{k},t_2)]\big] \propto (f_1\star f_2)(t_2-t_1), \tag{2.94}$$

meaning that field operators for uncorrelated temporal envelopes commute! Furthermore, if we can construct a wave packet ϕ(k, t) whose temporal envelope ϕ(tr) is (approximately) delta correlated in time, then

$$\big[a[\phi(\mathbf{k},t)],\,a^\dagger[\phi(\mathbf{k},t')]\big] \propto \delta(t'-t). \tag{2.95}$$

This is significant because it is the defining feature of quantum white noise, which is discussed in Sec. 2.6. Before turning to that, we apply the results of this section to defining h[t0,t1].

2.3.2 The measurable subspace

In the ideal situation, the measurable wave packets are those defined on the paraxial mode u(+)T(xT, z) with envelope functions f(t) such that f(s) = 0 for all s ∉ [t0, t1]. Unfortunately, because of the problem of localization, no such physical wave packets exist. If we allow for discontinuous functions, then for any function g(t),

$$g_{[t_0,t_1]}(t) \equiv \chi_{[t_0,t_1]}(t)\,g(t) \tag{2.96}$$

is clearly zero for any t ∉ [t0, t1] and therefore would be an element of h[t0,t1]. In order to define approximately localized temporal envelopes, we need an approximate form of the indicator function χ[t0,t1](t), i.e. a smooth cutoff function. A common choice for such a function is to convolve χ[t0,t1](t) with a smooth, positive, normalized distribution function ϕ(σ)(t), where σ represents the degree of localization. As a concrete example, if ϕ(σ)(t) is a mean-zero normalized Gaussian with variance σ², then

$$\chi^{(\sigma)}_{[t_0,t_1]}(t) \equiv \big(\phi^{(\sigma)}*\chi_{[t_0,t_1]}\big)(t) = \int ds\,\frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left(-\frac{(t-s)^2}{2\sigma^2}\right)\chi_{[t_0,t_1]}(s) = \frac{1}{2}\left[\mathrm{erf}\!\left(\frac{t-t_0}{\sqrt{2}\,\sigma}\right) - \mathrm{erf}\!\left(\frac{t-t_1}{\sqrt{2}\,\sigma}\right)\right]. \tag{2.97}$$

Note that because lim_{σ→0} ϕ(σ)(t) = δ(t), we also have lim_{σ→0} χ(σ)[t0,t1](t) = χ[t0,t1](t).
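The closed form in Eq. (2.97), including its overall factor of 1/2, can be checked against a direct numerical convolution. This sketch is illustrative only: the interval [t0, t1] = [0, 1] and the width σ = 0.05 are arbitrary dimensionless choices, not values from the text. It also shows the σ → 0 limit recovering the sharp indicator function.

```python
import math
import numpy as np


def smoothed_indicator(t, t0, t1, sigma):
    """Closed form of Eq. (2.97): Gaussian-smoothed indicator of [t0, t1]."""
    w = math.sqrt(2.0) * sigma
    return 0.5 * (math.erf((t - t0) / w) - math.erf((t - t1) / w))


def smoothed_indicator_numeric(t, t0, t1, sigma, n=20001):
    """Trapezoid-rule evaluation of integral ds phi_sigma(t - s) chi_[t0,t1](s)."""
    s = np.linspace(t0, t1, n)          # chi = 1 on [t0, t1] and 0 elsewhere
    ds = s[1] - s[0]
    phi = np.exp(-(t - s)**2 / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)
    return ds * (phi.sum() - 0.5 * (phi[0] + phi[-1]))


t0, t1, sigma = 0.0, 1.0, 0.05
for t in (-0.1, 0.0, 0.03, 0.5, 0.98, 1.2):
    print(t, smoothed_indicator(t, t0, t1, sigma),
          smoothed_indicator_numeric(t, t0, t1, sigma))

# sigma -> 0 recovers the sharp indicator function
print(smoothed_indicator(0.5, t0, t1, 1e-9), smoothed_indicator(1.5, t0, t1, 1e-9))
```

At the window edges the smoothed indicator takes the value 1/2, which is where the missing prefactor would be most visible.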

Again, to maintain a quasi-monochromatic field, f(t) must be slowly varying. This statement is quantified by the relation (1/ω0)|∂f/∂t| ≪ |f|, and in terms of σ this means

$$\sigma\,\omega_0 \gg 1. \tag{2.98}$$

In quantum optical systems a carrier frequency of ω0 = 2π × 370 THz is not uncommon, and has a corresponding wavelength of λ0 = 810 nm. A common use for a laser at this wavelength is to generate optical pulses as short as 5 fs in duration [49]. If we take this to be a minimum but physically realizable timescale, then we would have (σω0)⁻¹ ∼ 0.08. While a wave packet of this duration is still relatively slowly varying, it is likely that the correction to ω1 from Eq. (2.86) would be a nonnegligible contribution, as would other higher order effects. A convenient limit would be to set σ such that (σω0)⁻¹ ∼ 10⁻³, meaning that for near infrared wavelengths σ ∼ 0.1 ps. While this sets a physically realistic smoothing width, it does not say when χ(σ)[t0,t1](t) is a good approximation to an actual indicator function, as this requires a comparison between the smoothing width and the overall duration. Fig. 2.2 illustrates this distinction by plotting χ(σ)[0,τ](t) with σ = 0.1 ps and a series of durations, 0.5 ps ≤ τ ≤ 100 ps. For visual comparison each indicator is plotted in scaled units of τ. Simple inspection shows that for intervals on the scale of τ ≳ 10 ps, a smoothing width of 0.1 ps makes an excellent approximation to the truly discontinuous function. Note that as t extends beyond the interval [0, τ], χ(σ)[0,τ](t) decays like an error function scaled by σ, and this extent is independent of τ for τ ≫ σ. For example, for both τ = 10 ps and τ = 0.1 ps, χ(σ)[0,τ](τ + 3σ) ≈ 10⁻³.
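The order-of-magnitude estimates in the preceding paragraph can be reproduced with a few lines of arithmetic. This sketch takes the quoted carrier frequency and smoothing widths at face value and evaluates Eq. (2.97) for the tail value. It gives (σω0)⁻¹ ≈ 0.086 for σ = 5 fs and about 4 × 10⁻³ for σ = 0.1 ps, so the latter choice sits at the 10⁻³ scale in order of magnitude.

```python
import math

C = 299_792_458.0                    # speed of light, m/s
omega0 = 2 * math.pi * 370e12        # carrier angular frequency, rad/s
lambda0 = 2 * math.pi * C / omega0   # carrier wavelength, m (about 810 nm)


def smoothed_indicator(t, t0, t1, sigma):
    """Eq. (2.97): Gaussian-smoothed indicator function of [t0, t1]."""
    w = math.sqrt(2.0) * sigma
    return 0.5 * (math.erf((t - t0) / w) - math.erf((t - t1) / w))


# Slowly varying parameter (sigma * omega0)^-1 for the two widths quoted in the text
ratio_5fs = 1.0 / (5e-15 * omega0)
ratio_100fs = 1.0 / (0.1e-12 * omega0)

# Tail of the smoothed window three widths past its upper edge, sigma = 0.1 ps
sigma = 0.1e-12
tail = {tau: smoothed_indicator(tau + 3 * sigma, 0.0, tau, sigma)
        for tau in (10e-12, 0.1e-12)}

print(f"lambda0 = {lambda0 * 1e9:.0f} nm")
print(f"(sigma*omega0)^-1: {ratio_5fs:.3f} (5 fs), {ratio_100fs:.1e} (0.1 ps)")
print(tail)   # both values close to 1.3e-3
```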

With this function in hand we can now identify a space of wave packets that can be approximately projectively measured in the time window [t0, t1]. For any valid envelope function f(t), we can define a localized version

$$f^{(\sigma)}_{[t_0,t_1]}(t) \equiv \chi^{(\sigma)}_{[t_0,t_1]}(t)\,f(t). \tag{2.99}$$

Clearly these functions are good temporal envelopes and approximately fit in h[t0,t1].

Figure 2.2: Approximations for a localized pulse. Shown here are a series of slowly varying temporal envelopes. Each envelope is a unit pulse centered at zero, with a variable duration τ, convolved with a Gaussian smoother with σ = 0.1 ps. The pulse duration τ ranges from 0.5 ps to 100 ps. Each pulse is plotted versus t in units of τ.

Note that we can actually enlarge the space of valid wave packets by observing that, with an appropriate σ, χ(σ)[t0,t1](t) is itself a valid temporal envelope. Therefore, for any integrable function f(t) whose support is contained in the interval [t0, t1], we can define a smooth version via convolution

$$f^{(\sigma)}(t) \equiv \big(\phi^{(\sigma)}*f\big)(t). \tag{2.100}$$

Any wave packet whose temporal envelope is defined in this way is then approximately measurable in the time interval [t0, t1]. This can be formalized by defining the set of functions

$$S^{(\sigma)}_{[t_0,t_1]} \equiv \left\{\big(\phi^{(\sigma)}*f\big)(t) \;:\; \int dt\,|f(t)| < \infty,\ \mathrm{supp}(f)\subseteq[t_0,t_1]\right\} \tag{2.101}$$

and the set of wave packets

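$$s^{(\sigma)}_{[t_0,t_1]} = \mathrm{span}\left\{\sqrt{\omega(\mathbf{k})\,c}\;\frac{\tilde f\big(\omega(\mathbf{k})-\omega_0\big)}{\sqrt{|t_1-t_0|}}\;\frac{\tilde u^{(+)}_T(\mathbf{k}_T,0)}{\sqrt{\sigma_T}}\;e^{-i\omega(\mathbf{k})t} \;:\; f\in S^{(\sigma)}_{[t_0,t_1]}\right\} \tag{2.102}$$

We then have the limit

$$h_{[t_0,t_1]} = \lim_{\substack{\sigma\to 0\\ \omega_0\sigma\to\infty}} s^{(\sigma)}_{[t_0,t_1]}. \tag{2.103}$$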

2.4 The one-dimensional limit

In the previous sections we showed that if there are two quasi-monochromatic wave packets g1 and g2 defined on the same paraxial mode but with differing temporal envelopes f1 and f2, then the commutator between a[g1] and a†[g2] is proportional to the cross-correlation function f1 ⋆ f2. Furthermore, the proportionality is independent of the details of the paraxial mode. This suggests that moving to a simplified, one-dimensional model is both appropriate and fruitful. This section makes this connection and relates it to the standard representations of quantum white noise.

The end of Sec. 2.3.1 suggested defining a wave packet ϕ(k, t) whose temporal envelope ϕ(tr) is delta correlated in time. In defining the smoothed set of wave packets s(σ)[t0,t1], Sec. 2.3.2 took any integrable function defined on the interval [t0, t1], f[t0,t1](t), and convolved it with a Gaussian distribution ϕ(σ)(t) to obtain an envelope consistent with the quasi-monochromatic approximation. To move to a one-dimensional model, we will factor f[t0,t1](t) out of the field operator and define an operator-valued density a[ϕ(σ)(t)]. We will shortly show that the commutator of this density is approximately a delta function in time.

Deriving this factorization is not difficult. It begins by noting that, as the Fourier transform of a convolution is proportional to the product of the Fourier transforms, we have

$$\tilde f^{(\sigma)}(\nu) = \sqrt{2\pi}\,\tilde f(\nu)\,\tilde\phi^{(\sigma)}(\nu) = \left(\int ds\,f(s)\,e^{+i\nu s}\right)\tilde\phi^{(\sigma)}(\nu). \tag{2.104}$$

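Substituting this expression into the definition of g(σ)(k, t), as written in Eq. (2.87), we have

$$g^{(\sigma)}(\mathbf{k},t) = \frac{\|g\|}{\sqrt{\tau}}\int ds\,f(s)\,e^{-i\omega_0 s}\,\phi^{(\sigma)}(\mathbf{k},t-s), \tag{2.105}$$

where

$$\phi^{(\sigma)}(\mathbf{k},t) \equiv \sqrt{\frac{\omega(\mathbf{k})\,c}{\omega_0\,\sigma_T}}\;\tilde\phi^{(\sigma)}\big(\omega(\mathbf{k})-\omega_0\big)\,\tilde u^{(+)}_T(\mathbf{k}_T,0)\,e^{-i\omega(\mathbf{k})t}. \tag{2.106}$$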

By the anti-linear nature of the annihilation operator a[g], we are able to bring the integral over s out of the operator to write

$$a[g^{(\sigma)}(\mathbf{k},t)] = \frac{\|g\|}{\sqrt{\tau}}\int ds\,f^*(s)\,e^{+i\omega_0 s}\,a[\phi^{(\sigma)}(t-s)]. \tag{2.107}$$

Note that a[g(σ)(k, t)] and a[ϕ(σ)(t − s)] have different units: the former is unitless while the latter has units of 1/√time, ultimately because ϕ(σ)(t) is a density over time. Eq. (2.107) has the following physical implications. First, the integral is the point-wise weighting of an annihilation operator by a complex amplitude, completely akin to the original analogy with a one-dimensional simple harmonic oscillator, α∗a ↔ f∗(t) a(t). Second, the complex weighting function in general matches the phase of the carrier wave, resulting in the explicit appearance of the e^{+iω0 s} factor. Third, the function f(s)/√τ and the operator a[ϕ(σ)(t − s)] have the same units, which are those of white noise. The final implication follows from the commutator

$$\big[a[\phi^{(\sigma)}(t)],\,a^\dagger[\phi^{(\sigma)}(t')]\big] = e^{-i\omega_0(t'-t)}\,\big(\phi^{(\sigma)}\star\phi^{(\sigma)}\big)(t'-t) + O\big((\sigma\omega_0)^{-1}\big) \tag{2.108}$$

together with the limit lim_{σ→0} (ϕ(σ) ⋆ ϕ(σ))(t′ − t) = δ(t′ − t), which give

$$\lim_{\substack{\sigma\to 0\\ \omega_0\sigma\to\infty}}\big[a[g_1^{(\sigma)}(\mathbf{k},t)],\,a^\dagger[g_2^{(\sigma)}(\mathbf{k},t)]\big] = \frac{\|g_1\|\,\|g_2\|}{\sqrt{\tau_1\tau_2}}\int ds\,f_1^*(s)\,f_2(s). \tag{2.109}$$

This implies that if we have two square integrable functions, h1(t) and h2(t), then these functions can count as members of a single particle Hilbert space, h′ = L²(R). Furthermore, we can define a Fock space F(h′) and ultimately the field operators a[h1] and a[h2]. If

$$h_1(t) \cong \frac{\|g_1\|}{\sqrt{\tau_1}}\,f_1(t), \tag{2.110}$$

then we can draw the formal equivalence

$$a[h_1] \cong \lim_{\substack{\sigma\to 0\\ \omega_0\sigma\to\infty}} a[g_1^{(\sigma)}]. \tag{2.111}$$
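The σ → 0 limit behind Eq. (2.109) can be illustrated numerically. For a unit-area Gaussian ϕ(σ) of width σ, the autocorrelation ϕ(σ) ⋆ ϕ(σ) is a unit-area Gaussian of variance 2σ², so the smoothed overlap ∫∫ds ds′ f1*(s) f2(s′)(ϕ(σ) ⋆ ϕ(σ))(s − s′) should approach ∫ds f1*(s) f2(s) as σ shrinks. In this sketch the carrier phases are omitted because they cancel in this combination, the O((σω0)⁻¹) term of Eq. (2.108) is neglected, and the envelopes f1, f2 are hypothetical dimensionless choices.

```python
import numpy as np

dt = 0.005
t = np.arange(-10.0, 10.0 + dt / 2, dt)

# Hypothetical envelopes (dimensionless); carrier phases omitted since they cancel
f1 = np.exp(-t**2 / 2)
f2 = np.exp(-(t - 0.5)**2 / 2) * np.exp(-1j * t)


def autocorr_kernel(sigma):
    """(phi star phi)(x) for a unit-area Gaussian phi of width sigma:
    a unit-area Gaussian of variance 2 sigma^2, sampled on an odd-length grid."""
    half = int(np.ceil(8 * sigma / dt))
    x = dt * np.arange(-half, half + 1)
    return np.exp(-x**2 / (4 * sigma**2)) / np.sqrt(4 * np.pi * sigma**2)


def smoothed_overlap(sigma):
    """integral ds ds' conj(f1(s)) f2(s') (phi star phi)(s - s')."""
    smoothed_f2 = np.convolve(f2, autocorr_kernel(sigma), mode="same") * dt
    return np.sum(np.conj(f1) * smoothed_f2) * dt


exact = np.sum(np.conj(f1) * f2) * dt          # integral ds conj(f1(s)) f2(s)
sigmas = [0.5, 0.1, 0.02]
errors = [abs(smoothed_overlap(sig) - exact) for sig in sigmas]
print(errors)   # shrinks as sigma -> 0
```

For small σ the residual scales roughly like σ², since convolving with a unit-area Gaussian of variance 2σ² changes a smooth function by about σ² times its second derivative.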

While the statistical aspects of quantum light are interesting in their own right, the real fun begins when that light is coupled to another quantum system. Sec. 2.6 shows how, when a system couples through an interaction Hamiltonian to operators similar to a[ϕ(σ)(t)] and a†[ϕ(σ)(t)], the limiting object can be written in terms of a quantum stochastic process on the joint Hilbert space Hsys ⊗ F(h′). Furthermore it

discusses how this relates to the standard expressions in quantum optics involving

simpler models of quantum white noise. Before including the system however, we

will show what is gained by taking this discontinuous limit and how it is useful for

defining quantum stochastic processes.

2.5 Quantum Wiener processes and the continuous-time decomposition

Quantum stochastic integrals were first defined mathematically by Hudson and Parthasarathy in 1984. There they formulated a quantum version of an Ito-type stochastic integral where the fundamental differentials, in correspondence to the classical Wiener process and other jump processes, are operators acting on a bosonic Fock space [23]. Independently, Gardiner and Collett formulated a physical description of quantum white noise operators, where creation and annihilation operators are associated with excitations in a bosonic heat bath, which are then used as driving noise sources in a quantum Langevin equation [50]. This second formulation is the best known in the quantum optics community (see, e.g., the well written reference [51]) but is less amenable to directly applying the filtering techniques of classical probability theory. The picture of a heat bath does not immediately induce a picture of a traveling flow of information from a probe system to a detector. Rather, it instills a picture of a system immersed in a stationary and chaotic environment, and it is unclear what it means quantum mechanically to “measure the bath”. While one certainly could, and often does, construct a large-scale flow in the bath running from the system to an independent observer, such a construction ultimately resembles a wave packet description.

If one instead explicitly includes time in the description of the environment, as Hudson and Parthasarathy do, then the statistical properties necessary for defining a quantum Wiener process and a quantum Ito integral, namely the ability to construct time-adapted processes, follow directly. We will shortly review how this is

done, but first note that in contrast to relying on the system to dictate how the

bath is modeled, this represents a more axiomatic approach in that the statistical

properties of the bath are postulated independently from the system. Now clearly the

physics of the entire system-probe-measurement combination will dictate whether or

not this is an appropriate model. The purpose of this section is to show what is

gained in this formulation.

Like the Gardiner and Collett formulation, this formulation begins by assuming a bosonic Fock space; here, however, the Fock space is assumed to be the second quantization of a single particle Hilbert space

$$h \equiv L^2(\mathbb{R}^+)\otimes h' \tag{2.112}$$

where L²(R+) represents the Hilbert space of square integrable functions defined on the positive real line (representing time) and h′ is an auxiliary Hilbert space. Almost all formulations immediately assume that h′ is a finite, d-dimensional space, so that every g(t) is effectively a complex-vector-valued function, i.e. g : R+ → C^d. In this case the inner product between two single particle vectors is

$$\langle \mathbf{f},\mathbf{g}\rangle = \int_0^\infty dt\,\mathbf{f}^*(t)\cdot\mathbf{g}(t) < \infty. \tag{2.113}$$

From this single particle Hilbert space, the symmetric Fock space F(h), exponential vectors e[g], and Weyl displacement operators W[g] are defined exactly as in Sec. 2.2.3. Most importantly, we can define the annihilation and creation operators

$$A^i_t \equiv a[\chi_{[0,t]}\,e_i] \tag{2.114}$$

$$A^{i\dagger}_t \equiv a^\dagger[\chi_{[0,t]}\,e_i] \tag{2.115}$$

where {e_i} is an orthonormal basis of C^d. These have the commutation relation

$$\big[A^i_s,\,A^{j\dagger}_t\big] = \int_0^\infty ds'\,\chi_{[0,s]}(s')\,\chi_{[0,t]}(s')\;e_i\cdot e_j = \delta_{i,j}\,\min(s,t). \tag{2.116}$$

These processes serve as two of the building blocks of quantum stochastic calculus and are in analogy with a d-dimensional Wiener process. To show this last analogy, we must first discuss how we specifically include time in h.

2.5.1 The continuous-time tensor decomposition

In basic quantum mechanics there is an intimate connection between statistical independence and a tensor product structure. When a complete system is described by the tensor product of two Hilbert spaces and the total state is a product state, ρ = ρ1 ⊗ ρ2, then both systems can be considered statistically independent. Specifically, in this case two operators X1 and X2 acting on the individual Hilbert spaces are statistically independent in the sense that

$$\langle X_1\otimes X_2\rangle_\rho = \langle X_1\rangle_{\rho_1}\,\langle X_2\rangle_{\rho_2}. \tag{2.117}$$
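Eq. (2.117), and its failure for correlated states, can be checked with small matrices. The sketch below is an illustration with randomly generated operators rather than anything specific to the field: it verifies the factorization for a product state of a qubit and a qutrit, and shows that it fails for an entangled Bell state, whose reduced states are maximally mixed.

```python
import numpy as np

rng = np.random.default_rng(seed=1)


def random_density(d):
    """Random d x d density matrix (positive semidefinite, unit trace)."""
    m = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = m @ m.conj().T
    return rho / np.trace(rho)


def random_hermitian(d):
    m = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (m + m.conj().T) / 2


def expect(rho, op):
    return np.trace(rho @ op).real


# Product state: <X1 (x) X2> = <X1><X2>
rho1, rho2 = random_density(2), random_density(3)
X1, X2 = random_hermitian(2), random_hermitian(3)
joint = expect(np.kron(rho1, rho2), np.kron(X1, X2))
product = expect(rho1, X1) * expect(rho2, X2)

# Entangled Bell state (|00> + |11>)/sqrt(2): the factorization fails
psi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
rho_bell = np.outer(psi, psi.conj())
sz = np.diag([1.0, -1.0])
r = rho_bell.reshape(2, 2, 2, 2)       # indices (i, j, k, l) for |ij><kl|
rho_a = np.einsum("ijkj->ik", r)       # partial trace over the second qubit
rho_b = np.einsum("ijil->jl", r)       # partial trace over the first qubit
bell_joint = expect(rho_bell, np.kron(sz, sz))         # 1 (up to rounding)
bell_product = expect(rho_a, sz) * expect(rho_b, sz)   # 0
```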

In Sec. 2.2.5 we introduced the notion of a time-adapted quantum stochastic process, where a quantum operator O[0,t] was time-adapted if it acted as the identity on any coherent state ψ[g⊥] where g⊥ is excluded from the time interval [0, t], i.e. g⊥(s) = 0 for 0 ≤ s ≤ t. We defined two mutually orthogonal spaces of wave packets, h[0,t] and h⊥[0,t], which in turn have their associated Fock spaces F(h[0,t]) and F(h⊥[0,t]). The classical definition of a time-adapted stochastic process is that the process is statistically independent of all events in the future. In light of the connection between statistical independence on the one hand and a tensor product structure on the other, it seems reasonable to have

$$F(h) \cong F(h_{[0,t]})\otimes F(h^\perp_{[0,t]}) \tag{2.118}$$

where ≅ indicates a unitary equivalence. But if h[0,t] represents all wave packets localized to [0, t], then it also seems reasonable to conclude that h⊥[0,t] = h(t,∞).

In this section we will show that this tensor product decomposition is possible not only for any single time t but also for any sequence of n ordered times {t_n} = {t_i : 0 < t1 < · · · < tn < ∞}. This is called the continuous-time tensor decomposition and is the relation

$$F(h) \cong F(h_{[0,t_1)})\otimes F(h_{[t_1,t_2)})\otimes\cdots\otimes F(h_{[t_n,\infty)}). \tag{2.119}$$

The proof of this statement is outlined in Lemma 2.1 (which is essentially Proposition 19.6 of [52]).

Now if Eq. (2.119) is true, then for any partitioning of time, no matter how fine, this Fock space decomposes into a tensor product between the various partitions. Furthermore, if we have operators O[ti,ti+1) that are each adapted to the interval [ti, ti+1), then the expectation values factorize,

$$\Big\langle\prod_i O_{[t_i,t_{i+1})}\Big\rangle_{\psi[g]} = \prod_i\big\langle O_{[t_i,t_{i+1})}\big\rangle_{\psi[g_{[t_i,t_{i+1})}]}. \tag{2.120}$$

In other words, if both the operators and the state respect the continuous-time decomposition, then those operators will be statistically independent over non-overlapping time intervals.

Why is this important? Well, Sec. 3.1.4 reviews the basic properties of a Wiener process and shows that its defining feature is that its increments over non-overlapping time intervals are statistically independent, each being a mean-zero Gaussian random variable of variance t_{i+1} − t_i. Therefore, any quantum analog of a Wiener process must also respect the continuous-time decomposition. The fact that the classical Wiener process satisfies the Markov and martingale properties is a direct consequence of this independence [53]. Sec. 3.1.1 reviews the definition of these two properties and how they relate to taking conditional expectation values. Additionally, the Ito stochastic integral (see Appendix B) is defined in such a way that the integral ∫x_t dw_t is also a martingale, and that if x_t is Markovian then so is the integral. For a quantum stochastic integral to also have these desirable properties, a necessary criterion is that a process {X_t}_{t≥0} be statistically independent of all future events. In the next section we return to the operators A^i_t and A^{i†}_t and show how they can be used to construct a quantum Wiener process, but first we include a proof of the continuous-time decomposition.

Lemma 2.1. Given the single particle Hilbert space h = L²(R+) ⊗ C^d and an ordered sequence of times {t_n} = {t_i ∈ R+ : 0 < t1 < · · · < tn < ∞}, the symmetric Fock space satisfies the unitary equivalence F(h) ≅ F(h[0,t1)) ⊗ F(h[t1,t2)) ⊗ · · · ⊗ F(h[tn,∞)), where h[ti,ti+1) = L²([ti, ti+1)) ⊗ C^d.

Sketch of Proof. For any vector g ∈ h we can define the projection of g onto a time interval via

$$g_{[t_0,t_1]}(t) \equiv \chi_{[t_0,t_1]}(t)\,g(t) \tag{2.121}$$

and for all n times we have

$$g(t) = g_{[0,t_1)}(t) + g_{[t_1,t_2)}(t) + \cdots + g_{[t_n,\infty)}(t). \tag{2.122}$$

As this is true for any element of h, we have the natural decomposition

$$h \cong h_{[0,t_1)}\oplus h_{[t_1,t_2)}\oplus\cdots\oplus h_{[t_n,\infty)}, \tag{2.123}$$

where h[ti,ti+1) is the space of square integrable C^d-valued functions defined on the interval [ti, ti+1). Because of this decomposition, proving that F(h) satisfies the tensor decomposition now means proving the unitary equivalence (with t0 ≡ 0 and t_{n+1} ≡ ∞)

$$F\Big(\bigoplus_{i=0}^{n} h_{[t_i,t_{i+1})}\Big) \cong \bigotimes_{i=0}^{n} F\big(h_{[t_i,t_{i+1})}\big). \tag{2.124}$$

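This is easily shown by first noting that

$$\big\langle g_{[t_i,t_{i+1})},\,f_{[t_j,t_{j+1})}\big\rangle = \delta_{i,j}\,\big\langle g_{[t_i,t_{i+1})},\,f_{[t_i,t_{i+1})}\big\rangle. \tag{2.125}$$

This implies that for the exponential vectors e[g] and e[f] (with t0 ≡ 0 and t_{n+1} ≡ ∞),

$$\langle e[g]|e[f]\rangle = \exp(\langle g,f\rangle) = \prod_{i=0}^{n}\exp\big(\langle g_{[t_i,t_{i+1})},\,f_{[t_i,t_{i+1})}\rangle\big). \tag{2.126}$$

If we define the transformation V : F(h) → F(h[0,t1)) ⊗ · · · ⊗ F(h[tn,∞)) such that

$$V e[g] = e[g_{[0,t_1)}]\otimes\cdots\otimes e[g_{[t_n,\infty)}], \tag{2.127}$$

then the inner product between two transformed vectors is

$$\langle V e[g]|V e[f]\rangle = \prod_{i=0}^{n} e^{\langle g_{[t_i,t_{i+1})},\,f_{[t_i,t_{i+1})}\rangle} = e^{\langle g,f\rangle}. \tag{2.128}$$

This shows that V preserves the inner products between exponential vectors. Because the exponential vectors are total in the symmetric Fock space (their linear span is dense), and their images under V are the product exponential vectors, which are likewise total in the tensor product, V extends linearly to a unitary transformation on all of F(h).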
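The key step of the proof, Eqs. (2.125) to (2.128), can be mirrored on a discretized time axis. This sketch is only an illustration under stated assumptions: it truncates the half line to [0, T) with T = 5, so the last interval stands in for [tn, ∞), and it picks two hypothetical C²-valued functions. It checks that restrictions to different intervals are orthogonal and that exp⟨g, f⟩ factorizes over the partition, which is the statement that V preserves inner products of exponential vectors.

```python
import numpy as np

T, n_grid = 5.0, 5000                    # truncated time axis [0, T)
dt = T / n_grid
t = dt * np.arange(n_grid)

# Two hypothetical C^2-valued single particle vectors g(t), f(t)
g = np.stack([np.exp(-t) * np.cos(3 * t), 0.5 * np.sin(t)], axis=1).astype(complex)
f = np.stack([np.exp(-(t - 2.0)**2), 1j * np.exp(-t / 2)], axis=1)


def inner(a, b):
    """<a, b> = integral dt a*(t) . b(t), Eq. (2.113), by a Riemann sum."""
    return np.sum(np.conj(a) * b) * dt


def restrict(h, lo, hi):
    """h_[lo, hi)(t) = chi_[lo, hi)(t) h(t), as in Eq. (2.121)."""
    mask = (t >= lo) & (t < hi)
    return h * mask[:, None]


times = [0.0, 1.0, 2.5, 4.0, T]          # t0 = 0 < t1 < t2 < t3, with T standing in for infinity
pieces_g = [restrict(g, lo, hi) for lo, hi in zip(times[:-1], times[1:])]
pieces_f = [restrict(f, lo, hi) for lo, hi in zip(times[:-1], times[1:])]

# Eq. (2.125): restrictions to different intervals are orthogonal
cross = max(abs(inner(pieces_g[i], pieces_f[j]))
            for i in range(len(pieces_g)) for j in range(len(pieces_f)) if i != j)

# Eqs. (2.126) and (2.128): exp<g, f> factorizes over the partition
lhs = np.exp(inner(g, f))
rhs = np.prod([np.exp(inner(gi, fi)) for gi, fi in zip(pieces_g, pieces_f)])
print(cross, lhs, rhs)
```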

2.5.2 The quantum Wiener process

In [25], Bouten et al. give an elegant derivation of how the quadratures

$$Q^i_t \equiv A^i_t + A^{i\dagger}_t \quad\text{and}\quad P^i_t \equiv i\big(A^{i\dagger}_t - A^i_t\big) \tag{2.129}$$

have the statistics of Wiener processes when the field is in the vacuum state. For the sake of completeness we reproduce this derivation here.

In Sec. 2.2.2, we introduced a[g] and a†[g] through the generators of the coherent state ψ[g], via the relation ψ[g] = W[g]|∅⟩ = exp(a†[g] − a[g])|∅⟩. The argument of this displacement operator defines a Hermitian generator

$$\Upsilon[g] \equiv i\big(a^\dagger[g] - a[g]\big) \tag{2.130}$$

so that exp(a†[g] − a[g]) = exp(−iΥ[g]).

For a classical random variable x,

$$\phi_x(\kappa) \equiv \mathbb{E}\big(\exp(i\kappa x)\big) \tag{2.131}$$

is the characteristic function for that random variable and therefore characterizes its statistics. The Weyl operator exp(−iΥ[g]) is nearly equivalent to the characteristic function, up to the constant κ and a minus sign. Through the anti-linear property of a[g] we have a[λg] = λ∗a[g], while a†[λg] = λa†[g]; so for λ = −κ with κ real, Υ[−κg] = −κΥ[g] and exp(−iΥ[−κg]) = exp(+iκΥ[g]). Converting this operator into a true characteristic function simply means taking an expectation value with respect to the field state. If the field is in a coherent state ψ[f], then

$$\phi_{\Upsilon[g]}(\kappa) \equiv \big\langle \exp(+i\kappa\Upsilon[g])\big\rangle_{\psi[f]}, \tag{2.132}$$

which characterizes the statistics of the operator Υ[g]. In terms of the Weyl displacement operators this means that

$$\phi_{\Upsilon[g]}(\kappa) = \langle\psi[f]|\,W[-\kappa g]\,|\psi[f]\rangle = e^{-\|f\|^2}\,\langle e[f]|\,W[-\kappa g]\,|e[f]\rangle. \tag{2.133}$$

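Eq. (2.61) relates the action of the Weyl operator to the exponential vector, showing that this simplifies to

$$\begin{aligned}\phi_{\Upsilon[g]}(\kappa) &= \exp\!\big(-\|f\|^2 + \kappa\langle g,f\rangle - \kappa^2\|g\|^2/2\big)\,\langle e[f]|e[-\kappa g + f]\rangle\\ &= \exp\!\big(-\|f\|^2 + \kappa\langle g,f\rangle - \kappa^2\|g\|^2/2 - \kappa\langle f,g\rangle + \|f\|^2\big)\\ &= \exp\!\big(i\kappa\,2\,\mathrm{Im}\langle g,f\rangle - \kappa^2\|g\|^2/2\big).\end{aligned} \tag{2.134}$$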

The final line is recognizable as the characteristic function of a Gaussian random variable of mean 2 Im⟨g, f⟩ and variance ‖g‖². Note that when g = χ[0,t) e_j,

$$\Upsilon[\chi_{[0,t)}\,e_j] = i\big(A^{j\dagger}_t - A^j_t\big) = P^j_t \tag{2.135}$$

$$\Upsilon[-i\,\chi_{[0,t)}\,e_j] = A^{j\dagger}_t + A^j_t = Q^j_t. \tag{2.136}$$

In either case ‖g‖² = t, so both operators have variance t, regardless of the coherent amplitude f of the underlying state.
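Eq. (2.134) can be checked directly for a single mode, where h is one-dimensional, g and f are complex numbers, a[g] = g*a, and ⟨g, f⟩ = g*f. The sketch below works under that assumption: it builds a truncated Fock space with N = 60 levels (a numerical choice that is adequate only for the modest amplitudes used here), prepares the coherent state ψ[f] = W[f]|∅⟩, and compares ⟨exp(iκΥ[g])⟩ with the Gaussian characteristic function exp(2iκ Im⟨g, f⟩ − κ²|g|²/2).

```python
import numpy as np
from scipy.linalg import expm

N = 60                                                       # Fock-space truncation
a = np.diag(np.sqrt(np.arange(1, N)), k=1).astype(complex)   # annihilation operator
ad = a.conj().T                                              # creation operator
vac = np.zeros(N, dtype=complex)
vac[0] = 1.0


def weyl(h):
    """Single-mode Weyl operator W[h] = exp(a^dag[h] - a[h]) = exp(h a^dag - h* a)."""
    return expm(h * ad - np.conj(h) * a)


def upsilon(g):
    """Hermitian generator Upsilon[g] = i (a^dag[g] - a[g]), Eq. (2.130)."""
    return 1j * (g * ad - np.conj(g) * a)


def char_numeric(kappa, g, f):
    psi = weyl(f) @ vac                                      # coherent state psi[f]
    return np.vdot(psi, expm(1j * kappa * upsilon(g)) @ psi)


def char_predicted(kappa, g, f):
    inner = np.conj(g) * f                                   # <g, f> for a single mode
    return np.exp(2j * kappa * inner.imag - kappa**2 * abs(g)**2 / 2)


cases = [(0.3, 0.7, 0.0), (-1.2, 0.4 - 0.5j, 0.8 + 0.3j), (1.0, 1j, 0.5)]
for kappa, g, f in cases:
    print(kappa, g, f, char_numeric(kappa, g, f), char_predicted(kappa, g, f))
```

The third case, g = i, probes Υ[i] = −(a† + a), i.e. the Q quadrature of Eq. (2.136) up to sign, while real g probes P.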

When the field is in vacuum ($f = 0$), both quadratures are mean-zero Gaussian random variables with variance $t$. Lemma 2.1 also shows that the operator $\Upsilon[\chi_{[s,t)}\, e_i] = P_t^i - P_s^i$ is a generator of displacements in the Fock space $\mathcal{F}(h_{[s,t)})$. It therefore commutes with any generator for states in $\mathcal{F}(h_{[0,s)})$. The vacuum factorizes with respect to this continuous tensor product decomposition, so increments over disjoint intervals are independent. Therefore the quantum stochastic processes $\{Q_t^i\}_{t\ge0}$ and $\{P_t^i\}_{t\ge0}$ each have, in vacuum expectation, the statistics of a Wiener process.
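In vacuum, the increments of a single quadrature over disjoint intervals are independent Gaussians, so that quadrature can be sampled as a classical Wiener process. The sketch below is restricted to one quadrature because $Q_t$ and $P_t$ do not commute. It uses a hypothetical helper that builds paths from independent $N(0, dt)$ increments. It then checks the variance $t$, the covariance $\min(s, t)$, and the independence of disjoint increments.

```python
import numpy as np

def sample_vacuum_quadrature(n_paths, n_steps, t_final, seed=0):
    """Sample paths of one vacuum quadrature Q_t on (0, t_final] as cumulative
    sums of independent N(0, dt) increments, i.e. a standard Wiener process."""
    rng = np.random.default_rng(seed)
    dt = t_final / n_steps
    increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    times = dt * np.arange(1, n_steps + 1)
    return times, np.cumsum(increments, axis=1)

times, Q = sample_vacuum_quadrature(n_paths=20_000, n_steps=100, t_final=2.0)
k_s, k_t = 24, 99                                 # times[k_s] = 0.5, times[k_t] = 2.0
print(Q[:, k_t].mean(), Q[:, k_t].var())          # approximately 0 and t = 2.0
print(np.mean(Q[:, k_s] * Q[:, k_t]))             # approximately min(s, t) = 0.5
print(np.mean(Q[:, k_s] * (Q[:, k_t] - Q[:, k_s])))  # approximately 0
```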

2.5.3 The units of quantum noise

An extremely observant reader might be concerned about the quadrature definitions of $Q_t$ and $P_t$ given in Eq. (2.129). The issue lies in the units and physical interpretation of the single particle wave functions. In the strictest sense of second quantization, the normalized vectors $f \in L^2(\mathbb{R}^+)\otimes\mathbb{C}^d$ are single particle wave functions whose squared modulus is the probability density for observing the particle at some point in its domain. For $f$ to be square normalized, $\int dt\, |f|^2 = 1$, it must have units of $1/\sqrt{\text{time}}$. Consequently the commutator $[a[f], a^\dagger[f]] = \|f\|^2$ is unitless, and so are the operators $a[f]$ and $a^\dagger[f]$. However, as written, the quadratures $Q_t^j$ and $P_t^j$ have the commutation relation

$$\left[Q_t^j, P_t^j\right] = 2i\,t, \qquad (2.137)$$

which clearly has units of time. The resolution is to recognize that when defining $A_t$ we should really be considering the field operators relative to some characteristic rate $\gamma$. Because $\sqrt{\gamma}$ is real, the anti-linearity of $a[f]$ reduces to linearity here, and

$$a[\sqrt{\gamma}\,\chi_{[0,t]}] = \sqrt{\gamma}\,A_t. \qquad (2.138)$$

The point of defining $A_t$ in this way is that, regardless of the magnitude of $\gamma t$, the dimensionless scaled quadrature $\sqrt{\gamma}\,Q_t$ still has the statistics of a Brownian motion, now with diffusion rate $\gamma$.
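The sketch below shows the role of $\gamma$ by sampling the dimensionless quadrature $\sqrt{\gamma}\,Q_t$ in vacuum. The numerical values of $\gamma$ and $t$ are illustrative assumptions. The point is only that the variance is $\gamma t$, a pure number, so $\gamma$ acts as a diffusion rate and only the product $\gamma t$ matters.

```python
import numpy as np

def scaled_quadrature_samples(gamma, t, n_samples, seed=1):
    """Samples of sqrt(gamma) * Q_t in vacuum, where Q_t ~ N(0, t) with t in seconds.
    The product is dimensionless, with variance gamma * t."""
    rng = np.random.default_rng(seed)
    q_t = rng.normal(0.0, np.sqrt(t), size=n_samples)
    return np.sqrt(gamma) * q_t

gamma = 2 * np.pi * 10e6      # illustrative rate, 1/s
t = 50e-9                     # illustrative duration, s
x = scaled_quadrature_samples(gamma, t, 100_000)
print(gamma * t, x.var())     # both approximately pi

# Same gamma * t with a ten times slower rate: same dimensionless statistics
y = scaled_quadrature_samples(gamma / 10, 10 * t, 100_000, seed=2)
print(y.var())                # approximately pi again
```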

The objective of Sec. 2.3 was to identify the scales on which a quasi-monochromatic field can be treated as statistically independent over disjoint increments of time. Fig. 2.2 showed the scaling of a smoothed characteristic function $\chi^{(\sigma)}_{[0,\tau]}(t)$ for a fixed smoothing variance and a variable duration $\tau$. Smoothing limited the derivative to at most order $1/\sigma$, and for $\tau \sim 10^3\sigma$ this had little effect on the visual appearance of the smoothed function. Note that the actual correction term to the inner product in Eq. (2.92) compared the rate of change of the temporal envelope to the carrier frequency $\omega_0$. The smoothing distribution serves only to limit this derivative. If $\gamma$ represents the rate of diffusion, then on timescales $\tau \sim 1/\gamma$ any such pulse should appear to have a discontinuous derivative. In other words, in order to treat an optical field as generating a quantum Wiener process we must have

$$\gamma \ll \sigma^{-1} \ll \omega_0. \qquad (2.139)$$

In any realistic application, the physics of the system typically sets the values of $\gamma$ and $\omega_0$. In an atomic physics context the carrier frequency is usually near a dipole-allowed optical transition, giving $\omega_0 \sim 2\pi \times 100$ THz. For such a transition the measurement timescale is on the order of the excited-state lifetime, $t_{\text{decay}} \sim 10$ ns, so that typically $\gamma \sim 2\pi \times 10$ MHz. This leaves 7 orders of magnitude between these two scales. In many atomic systems this value is actually an upper bound on the measurement rate. One typically uses off-resonant light, which leads to a significantly slower diffusion rate. Within this dissertation, Chap. 5 applies a quantum stochastic treatment to an idealized model of the Faraday interaction, where $\gamma$ is reduced by a factor inversely proportional to the detuning. This will be discussed in detail in Sec. 2.7. Before considering this specific model we review how to transition from a smooth deterministic Schrodinger equation to one involving quantum stochastic integrals with respect to $A_t$ and $A_t^\dagger$.
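The separation of scales in Eq. (2.139) can be checked with the representative numbers quoted above. The sketch takes σ to be the geometric mean of $1/\omega_0$ and $1/\gamma$. That choice is a hypothetical illustration, not a prescription from the text. It simply shows that a smoothing time exists with more than three orders of magnitude of margin on each side.

```python
import math

omega0 = 2 * math.pi * 100e12   # carrier frequency, rad/s (2*pi x 100 THz)
gamma = 2 * math.pi * 10e6      # diffusion/measurement rate, rad/s (2*pi x 10 MHz)

orders = math.log10(omega0 / gamma)
print(orders)                   # about 7 orders of magnitude

# Hypothetical smoothing time halfway (logarithmically) between the two scales
sigma = 1.0 / math.sqrt(gamma * omega0)
margin_low = gamma * sigma              # gamma relative to 1/sigma
margin_high = 1.0 / (sigma * omega0)    # 1/sigma relative to omega0
print(margin_low, margin_high)          # both about 3.2e-4
```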

2.6 Systems Interacting with Quantum Noise

Much of this chapter has alluded to coupling a quantum system of interest to a traveling wave field and describing the resulting evolution in terms of a quantum stochastic process. In the quantum optics literature, systems coupled to delta-commuting field operators have been discussed since the work of Gardiner and Collett, if not before [50]. The field operators are typically defined as the Fourier transform of one-dimensional operators quantizing a continuous spectrum of harmonic oscillators. That is, starting from the operators $a(\omega)$ and $a^\dagger(\omega)$ with $[a(\omega), a^\dagger(\omega')] = \delta(\omega - \omega')$,

$$a(t) \equiv \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} d\omega\, e^{+i\omega t}\, a(\omega). \qquad (2.140)$$

It then follows that $[a(t), a^\dagger(t')] = \delta(t' - t)$. From these operators one typically forms the noncommuting quadratures $q(t) = a^\dagger(t) + a(t)$ and $p(t) = i a^\dagger(t) - i a(t)$. If the field is in vacuum, then $\langle q(t)\rangle_\emptyset = 0$ and $\langle q(t)q(t')\rangle_\emptyset = \delta(t - t')$. In other words, in vacuum expectation $q(t)$ has the statistics of white noise, and the same is true for $p(t)$ and any rotated combination of the two. This is why the field operator $a(t)$ is typically called quantum white noise.
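A discrete analog of Eq. (2.140) shows why the Fourier-transformed operators inherit delta commutators. If $b_j = \sum_m U_{jm} a_m$ with $[a_m, a_k^\dagger] = \delta_{mk}$, then $[b_j, b_k^\dagger] = (UU^\dagger)_{jk}$, which equals $\delta_{jk}$ exactly when $U$ is unitary. The sketch below uses a unitary DFT matrix as a stand-in for the continuous kernel, an assumption made for illustration. For contrast, it also builds a band-limited version whose commutator is no longer a Kronecker delta, loosely mirroring the smoothed operators $a(\lambda, t)$ with $\lambda > 0$.

```python
import numpy as np

def unitary_dft_matrix(n):
    """U_{jm} = exp(+2 pi i j m / n) / sqrt(n): a discrete stand-in for the
    kernel e^{+i omega t} / sqrt(2 pi) in Eq. (2.140)."""
    j, m = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return np.exp(2j * np.pi * j * m / n) / np.sqrt(n)

def commutator_matrix(U):
    """For b_j = sum_m U_{jm} a_m with [a_m, a_k^dag] = delta_{mk},
    the c-number commutators are [b_j, b_k^dag] = (U U^dag)_{jk}."""
    return U @ U.conj().T

def bandlimited_commutator(n, keep):
    """Same construction, but only frequency indices with |m| < keep (mod n)
    are retained, which smears the commutator over neighbouring times."""
    U = unitary_dft_matrix(n)
    P = np.diag([1.0 if (m < keep or m > n - keep) else 0.0 for m in range(n)])
    return U @ P @ U.conj().T

n = 16
print(np.allclose(commutator_matrix(unitary_dft_matrix(n)), np.eye(n)))  # True
Cb = bandlimited_commutator(n, keep=4)
print(np.round(Cb[0, :4].real, 3))   # spread over neighbouring sites, not a delta
```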

A system is typically introduced to the problem through a linear interaction Hamiltonian. After a couple of approximations very much in line with our assumed separation of timescales, the interaction Hamiltonian reads [51]

$$H_{\text{int}}(\lambda, t) = i\hbar\sqrt{\gamma}\left(a^\dagger(\lambda, t)\, c - a(\lambda, t)\, c^\dagger\right), \qquad (2.141)$$

where $c$ is a generic system operator and $a(\lambda, t)$ is an operator that limits to $a(t)$ as $\lambda \to 0$.¹ In arriving at $H_{\text{int}}(\lambda, t)$, a transformation to an interaction picture was made in which all time dependence is associated with either $a(\lambda, t)$ or its adjoint. In this interaction picture, the joint state of the system and field evolves under a unitary propagator $U(\lambda, t)$, which satisfies the equation

$$\frac{d}{dt}U(\lambda, t) = -\frac{i}{\hbar}H_{\text{int}}(\lambda, t)\,U(\lambda, t). \qquad (2.142)$$

It is well known that for a general time-dependent Hamiltonian the resulting propagator is given by a time-ordered exponential

$$U(\lambda, t) = \vec{\mathcal{T}}\exp\left(-\frac{i}{\hbar}\int_0^t ds\, H_{\text{int}}(\lambda, s)\right). \qquad (2.143)$$

¹Actually, Gardiner and Collett consider the limit in the frequency domain, and so they consider a bandwidth θ that approaches infinity. For our purposes it is more convenient to consider λ → 0, but the spirit is the same.

The ultimate goal is to interpret the operator $\lim_{\lambda\to0}U(\lambda, t) \equiv U_t$ as a solution to an equivalent quantum stochastic differential equation. Appendix C reviews the mathematical background needed to fully discuss these kinds of equations in the language of quantum stochastic processes acting on a second-quantized Fock space.
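Why the time ordering in Eq. (2.143) matters can be seen in a small finite-dimensional sketch. It uses a hypothetical qubit Hamiltonian $H(t) = \cos t\,\sigma_x + \sin t\,\sigma_z$ with $\hbar = 1$, whose values at different times do not commute. The code approximates the time-ordered exponential as a product of short-time propagators, with later times to the left. It checks that the product converges as the step shrinks and remains unitary. It also shows that the product differs at order one from the exponential of the time integral, which ignores ordering. This is only an analogy for the field case, where $H_{\text{int}}(\lambda, t)$ also acts on the Fock space.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

def expm_hermitian(H, tau):
    """exp(-i H tau) for Hermitian H, via eigendecomposition (hbar = 1)."""
    w, V = np.linalg.eigh(H)
    return V @ np.diag(np.exp(-1j * w * tau)) @ V.conj().T

def h_toy(t):
    """Hypothetical Hamiltonian whose values at different times do not commute."""
    return np.cos(t) * SX + np.sin(t) * SZ

def time_ordered_propagator(h, t_final, n_steps):
    """Approximate T exp(-i int_0^t ds h(s)) as a product of short-time
    propagators, with later times multiplied on the left (midpoint rule)."""
    dt = t_final / n_steps
    U = np.eye(2, dtype=complex)
    for k in range(n_steps):
        U = expm_hermitian(h((k + 0.5) * dt), dt) @ U
    return U

t_final = 2.0
U_coarse = time_ordered_propagator(h_toy, t_final, 200)
U_fine = time_ordered_propagator(h_toy, t_final, 2000)
# Exponential of the time integral of h, ignoring time ordering
h_integral = np.sin(t_final) * SX + (1 - np.cos(t_final)) * SZ
U_unordered = expm_hermitian(h_integral, 1.0)

print(np.linalg.norm(U_coarse - U_fine))      # small: the product has converged
print(np.linalg.norm(U_fine - U_unordered))   # order one: ordering matters
```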

In the textbook formulation of quantum noise, Gardiner and Zoller define a form of quantum Ito calculus that is explicitly tied to the statistical properties of the field state and considers only a small family of Gaussian states [51]. Conversely, the quantum Ito calculus defined by Hudson and Parthasarathy has an Ito rule that is independent of the field state, and its proof of convergence holds over a large domain of possibly correlated system-field states [23]. The chief distinction between the two formulations is that the quantum optics derivation is based on a specific model, one that does not initially assume the structure necessary for the more abstract version. From the point of view of a physicist modeling a quantum system from an interaction Hamiltonian and the Schrodinger equation, it is not at all clear how and when such a model fits into the abstract quantum Ito calculus. Hudson and Parthasarathy gave criteria for when a quantum stochastic differential equation describes a unitary process. However, they did not specify how one should arrive at such a process from a Schrodinger equation and an approximating principle. This is exactly what the quantum optics derivation provides, though without making the final connection to the state-independent Ito calculus. In 1990, Accardi et al. established this connection by showing that a unitary propagator generated from a linear Hamiltonian similar to Eq. (2.141) converges to an Ito integral as specified by Hudson and Parthasarathy. They also showed that this convergence holds in “the weak sense of matrix elements,” which means the following.

Suppose we are given two coherent states with smoothed wave packets $\psi[g_1^{(\sigma)}]$ and $\psi[g_2^{(\sigma)}]$, and that as $\sigma \to 0$ these become the equivalent discontinuous coherent states $\psi[g'_1]$ and $\psi[g'_2]$. Furthermore, suppose we are given a stochastic process $X_t^{(\sigma)}$ that contains both system and field operators. Then $X_t^{(\sigma)} \to X_t$ in the weak sense of matrix elements if, for arbitrary system state vectors $\phi_1$ and $\phi_2$,

$$\lim_{\substack{\sigma\to0\\ \sigma\omega_0\to\infty}} \left\langle \phi_1\otimes\psi[g_1^{(\sigma)}]\right| X_t^{(\sigma)} \left|\phi_2\otimes\psi[g_2^{(\sigma)}]\right\rangle = \left\langle \phi_1\otimes\psi[g'_1]\right| X_t \left|\phi_2\otimes\psi[g'_2]\right\rangle. \qquad (2.144)$$

It is worth noting that Accardi et al. show that the limit holds for the bare propagator $U(\lambda, t)$ as well as for the Heisenberg evolution of a system operator, i.e., $U^\dagger(\lambda, t)\,X\,U(\lambda, t)$.

We mention this here because as long as the total system-field state $\rho_{\text{tot}}$ can be represented in terms of the matrix elements

$$\lim_{\substack{\sigma\to0\\ \sigma\omega_0\to\infty}} \left\langle \phi_1\otimes\psi[g_1^{(\sigma)}]\right| \rho_{\text{tot}} \left|\phi_2\otimes\psi[g_2^{(\sigma)}]\right\rangle,$$

the quantum stochastic representation is appropriate. These states are much richer than mean-zero Gaussian states. They can represent entangled states between the system and the quasi-monochromatic field, as well as nonclassical field states such as superpositions between modes and even single photon states. However, not every field state is included, since the matrix elements must admit the discontinuous yet quasi-monochromatic limit above.² In other words, the total state $\rho_{\text{tot}}$ must be compatible with the approximations that make the stochastic representation possible to begin with.

²For defining a reasonable calculus it is also required that the wave packet amplitudes take on a large but finite maximum value.

Here we are also interested in moving beyond an interaction Hamiltonian that is linear in $a^\dagger(\lambda, t)$ and $a(\lambda, t)$. The reason is that the Faraday interaction is fundamentally quadratic in the field operators, since it describes the scattering of light from one polarization state to another (see Sec. 2.7). Fortunately, in 2006 Gough extended the results of Accardi et al. to include scattering, or conservation, interactions. In classical stochastic calculus, the result governing the conversion between a smooth ordinary differential equation and a stochastic differential equation is the Wong-Zakai theorem. Not surprisingly, the conversion between a smooth Schrodinger equation and a quantum stochastic differential equation for the propagator is called the quantum Wong-Zakai theorem in

The specific Hamiltonian that the quantum Wong-Zakai theorem considers, and the one that we will use here, is the interaction Hamiltonian (summing over repeated indices with $i, j = 1, \ldots, d$)

$$H_{\text{int}}(\lambda, t) = \hbar\left(E_{ij}\, a_i^\dagger(\lambda, t)\, a_j(\lambda, t) + E_{i0}\, a_i^\dagger(\lambda, t) + E_{0j}\, a_j(\lambda, t) + E_{00}\right), \qquad (2.145)$$

where $\{E_{\alpha\beta} : \alpha, \beta = 0, \ldots, d\}$ are bounded operators acting on a system Hilbert space $\mathcal{H}_{\text{sys}}$. Each term in $H_{\text{int}}(\lambda, t)$ physically represents the following:

• E00 is an operator acting solely on system degrees of freedom, independent of

the bosonic modes, with units of frequency, e.g. it could be what remains of

the free system Hamiltonian after transforming to an interaction picture.

• Ei0 is a system operator that accompanies the creation of an excitation in the

ith bosonic mode centered at time t. A canonical example would be an operator

proportional to an atomic lowering operator, with units of 1/√time.

• E0j is a complementary process where, at time t, an excitation in the jth mode

is removed.

• Eij is a unitless system operator weighting an instantaneous scattering of

quanta from the jth mode to the ith. When i = j this can be interpreted

as a system coupling to the number of quanta in that mode at time t.

Note that since this Hamiltonian is required to be self-adjoint, the system operators must satisfy the constraint $E_{\alpha\beta} = E_{\beta\alpha}^\dagger$.

Adding the quadratic term has an interesting and slightly unexpected effect on the physics. Again, the term $E_{ij}\, a_i^\dagger(\lambda, t)\, a_j(\lambda, t)$ represents the instantaneous transfer of a photon from mode $j$ to mode $i$. However, for $\lambda > 0$, $a_i^\dagger(\lambda, t)$ has temporal extent, so it is possible for the system to interact again with the scattered quanta. If the magnitude of $E_{ij}$ is small then the probability of re-interaction may also be small, but we have yet to impose any such constraint. Fortunately, Gough solved the whole problem of converting the equation $\frac{d}{dt}U(\lambda, t) = -\frac{i}{\hbar}H_{\text{int}}(\lambda, t)U(\lambda, t)$ to a quantum stochastic differential equation, including the possibility of multiple scattering events. Some of the details of this derivation are reviewed in Appendix D.

Intimately related to this conversion is the relationship between the formal quantum Ito integral and the operator ordering of the constituent field operators. There is a fundamental connection between the rules of quantum Ito calculus (see Appendix C) and whether an iterated integral containing a sequence of field operators is time ordered or normally ordered. This connection was also formalized by Gough in [43], which Appendix D reviews. The bottom line is that the limit of the time-ordered exponential in Eq. (2.143) can be interpreted as the solution to an equivalent quantum Ito stochastic integral only after it is put into normal order, with all of the annihilation operators to the right of the creation operators. The effects of multiple scattering events become mathematically apparent when converting from the time-ordered solution to a normally ordered form.

Before we can fully write down the resulting propagator we must address two important issues. The first is to concretely link the operator $a_i(\lambda, t)$ to the wave packet theory introduced in this chapter. The second is to consider a kind of field operator wholly different from those we have considered up to this point.

2.6.1 Quantum white noise in paraxial wave packets

Several different derivations lead to bosonic operators $a(\lambda, t)$ and $a^\dagger(\lambda, t)$ whose limit $\lim_{\lambda\to0}a(\lambda, t)$ is called quantum white noise. For instance, Gardiner and Zoller use a wide-bandwidth limit. They assume that the interaction Hamiltonian $H_{\text{int}}$ is initially specified in the frequency domain and that the system couples preferentially to frequencies centered at a large transition frequency $\omega_0$. They then assume that the coupling between the system and the operators $a(\omega)$ is nearly flat in a frequency band centered at $\omega_0$. When this flat coupling band is sufficiently wide, the system effectively interacts with an operator representing a white spectrum, which is therefore delta correlated [51]. Accardi et al. take a different approach by considering a weak-coupling/long-time limit. The weak coupling implies that on short “optical” timescales the system-field interaction can be treated perturbatively. On longer “mesoscopic” times, however, the aggregate effect is nontrivial and the field fluctuations develop a diffusive character. Through a subtle rescaling of time, a field operator $a(\lambda, t)$ emerges that is delta commuting as $\lambda \to 0$ [55]. Rather than fixing a specific system-field interaction, this work has focused on integrating the language of second quantization and stochastic processes with a realistic description of classical optics, so neither model fully fits our needs. Instead, we seek a description that is not tied to a specific model but captures the spirit of each.

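Consider the time-ordered exponential

$$U(\lambda, t) = \vec{\mathcal{T}}\exp\left(-\frac{i}{\hbar}\int_0^t ds\, H_{\text{int}}(\lambda, s)\right), \qquad (2.146)$$

where $H_{\text{int}}(\lambda, t)$ is given in Eq. (2.145). Expanding the exponential to just two terms shows that

$$
\begin{aligned}
U(\lambda, t) &= 1 - \frac{i}{\hbar}\int_0^t ds\, H_{\text{int}}(\lambda, s) + \ldots \\
&= 1 - iE_{ij}\int_0^t ds\, a_i^\dagger(\lambda, s)\, a_j(\lambda, s) - iE_{i0}\int_0^t ds\, a_i^\dagger(\lambda, s) \\
&\quad - iE_{0j}\int_0^t ds\, a_j(\lambda, s) - iE_{00}\int_0^t ds + \ldots.
\end{aligned}
\qquad (2.147)
$$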

Now consider the creation term $E_{i0}\, a_i^\dagger(\lambda, s)$. Physically, it creates an excitation in the $i$th field mode, localized in a time interval near $s$, while applying the system operator $E_{i0}$. Suppose the joint system is in a pure product state $|\psi_i\rangle|\emptyset\rangle$ and that $|\psi_i\rangle$ happens to be an eigenstate of $E_{i0}$ with eigenvalue $h_i$. Then the time integral over this term acting on this state gives

$$-iE_{i0}\int_0^t ds\, a_i^\dagger(\lambda, s)|\psi_i\rangle|\emptyset\rangle = -ih_i\int_0^t ds\, a_i^\dagger(\lambda, s)|\psi_i\rangle|\emptyset\rangle. \qquad (2.148)$$

This integral looks almost like a smoothed creation operator, acting on vacuum, for a wave packet with temporal envelope $h_i(s) = -ih_i\,\chi_{[0,t]}(s)$. Specifically, Eq. (2.107) gives the expression for the one-dimensional smooth wave packet $a[g^{(\sigma)}(k, t)]$. Taking the adjoint of that equation results in

$$a^\dagger[g^{(\sigma)}(k, t)] = \frac{\|g\|}{\sqrt{\tau}}\int_0^\infty ds\, f(s)\, e^{-i\omega_0 s}\, a^\dagger[\phi^{(\sigma)}(t - s)]. \qquad (2.149)$$

The major distinction between this operator and the integral $-ih_i\int_0^t ds\, a_i^\dagger(\lambda, s)$ is the presence of the carrier phase $e^{-i\omega_0 s}$. To interpret the integral $-iE_{i0}\int_0^t ds\, a_i^\dagger(\lambda, s)$ as creating a system-dependent extended single photon state, the interaction must therefore be inherently phase modulated at the carrier frequency $\omega_0$. This is exactly the assumption Gardiner and Collett make when they take the system to interact with the bath at a large characteristic frequency. Therefore, without specifying a detailed interaction model, we can say that

$$a_i^\dagger(\lambda, s) \cong e^{-i\omega_0 s}\, a^\dagger[\phi_i^{(\sigma)}(-s)], \qquad (2.150)$$

$$a_j(\lambda, s) \cong e^{+i\omega_0 s}\, a[\phi_j^{(\sigma)}(-s)], \qquad (2.151)$$

for some system characteristic frequency $\omega_0$ and smoothing wave packets $\phi_i^{(\sigma)}(k, s)$ and $\phi_j^{(\sigma)}(k, s)$. The presence of the time reversal might be a little puzzling at first. It arises simply because this wave packet was defined with respect to a convolution, which always time reverses one of the two functions. One possible way to include a set of $d$ distinct modes is to assume the model considers a set of $d$ paraxial spatial mode functions $u_i^{(+)}(x_T, z)$, which satisfy the orthogonality relation $\int d^2x_T\, u_i^{(+)*}(x_T, z)\cdot u_j^{(+)}(x_T, z) = \delta_{ij}\,\sigma_T$.

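To complete this discussion we should identify what the parameter $\lambda$ means in the wave packet context. We can explicitly compute the unequal-time commutator from Eq. (2.92), remembering that for the smoothing kernel $\|g\|/\sqrt{\tau} = 1$. This results in

$$
\begin{aligned}
\left[a_i(\lambda, t), a_j^\dagger(\lambda, t')\right] &= e^{+i\omega_0(t - t')}\left\langle \phi_i^{(\sigma)}(-t), \phi_j^{(\sigma)}(-t')\right\rangle \\
&= \delta_{ij}\left(\phi^{(\sigma)} \star \phi^{(\sigma)}(t - t') - i\frac{1}{\omega_0}\frac{d\phi^{(\sigma)}}{dt} \star \phi^{(\sigma)}(t - t')\right).
\end{aligned}
\qquad (2.152)
$$

Thus $\lambda$ is simply a parameter representing the formal limit in which, as $\lambda \to 0$, $\sigma \to 0$ and $(\sigma\omega_0)^{-1} \to 0$.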
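The two limits hidden in $\lambda \to 0$ can be made concrete for one assumed kernel. The sketch below takes $\phi^{(\sigma)}$ to be a normalized Gaussian density of width σ, reads ⋆ as the cross-correlation written in the code, and uses a prefactor $1/\omega_0$ on the derivative term. All three are assumptions for illustration; the kernel of Sec. 2.3 may differ. Under these assumptions $\phi \star \phi(0) = 1/(2\sigma\sqrt{\pi})$ grows as σ shrinks, so $\phi\star\phi$ approaches a delta function. The derivative correction, relative to that peak, falls off as $e^{-1/2}/(\sqrt{2}\,\sigma\omega_0)$.

```python
import numpy as np

def gaussian_kernel(t, sigma):
    """Assumed smoothing kernel: a normalized Gaussian density of width sigma."""
    return np.exp(-t**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

def gaussian_kernel_deriv(t, sigma):
    """Time derivative of gaussian_kernel."""
    return -t / sigma**2 * gaussian_kernel(t, sigma)

def star(f, g, tau, sigma, n=4001):
    """Cross-correlation (f * g)(tau) = int ds f(s) g(s + tau), trapezoid rule."""
    s = np.linspace(-12 * sigma, 12 * sigma, n)
    y = f(s, sigma) * g(s + tau, sigma)
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(s))

def relative_correction(sigma, omega0):
    """Peak of |(1/omega0)(dphi/dt * phi)| divided by (phi * phi)(0).
    For the Gaussian kernel the peak sits at tau = sqrt(2) sigma."""
    peak = abs(star(gaussian_kernel_deriv, gaussian_kernel, np.sqrt(2) * sigma, sigma))
    return peak / omega0 / star(gaussian_kernel, gaussian_kernel, 0.0, sigma)

omega0 = 1.0
for sigma in (10.0, 100.0, 1000.0):
    # correction shrinks like 1/(sigma * omega0)
    print(sigma * omega0, relative_correction(sigma, omega0))
```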

2.6.2 The scattering process

Up until now, the only kind of field operator we have considered is a creation (or annihilation) operator associated with a given single particle state $g$. While these operators are vitally important, they leave out a whole other class of field operators. Eq. (2.147) expanded the time-ordered exponential for $U(\lambda, t)$ to first order, which included the integral $\int_0^t ds\, a_i^\dagger(\lambda, s)\, a_j(\lambda, s)$. We will now show that this operator is quite different from the product of two smeared wave packet operators, particularly in the limit $\lambda \to 0$.

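Consider the exponential vectors $e[f^{(\lambda)}]$ and $e[h^{(\lambda)}]$, with $f, h \in L^2(\mathbb{R}^+)\otimes\mathbb{C}^d$, defined by

$$|e[f^{(\lambda)}]\rangle \equiv \exp\left(\int_0^\infty dt\, f_i(t)\, a_i^\dagger(\lambda, t)\right)|\emptyset\rangle. \qquad (2.153)$$

If $\int_0^t ds\, a_i^\dagger(\lambda, s)\, a_j(\lambda, s)$ were somehow equivalent to the product $a^\dagger[g]\,a[g]$ as $\lambda \to 0$ for some wave packet $g$, we would have the eigenvalue relationship

$$\lim_{\lambda\to0}\left\langle e[f^{(\lambda)}]\right| a^\dagger[g^{(\lambda)}]\,a[g^{(\lambda)}] \left|e[h^{(\lambda)}]\right\rangle = \langle f, g\rangle\langle g, h\rangle\langle e[f]|e[h]\rangle. \qquad (2.154)$$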

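We can explicitly show that this is not the case. To simplify the notation, we define the function

$$c_{ij}(\lambda, t - s) \equiv \left[a_i(\lambda, t), a_j^\dagger(\lambda, s)\right], \qquad (2.155)$$

which has the property that

$$\lim_{\lambda\to0}c_{ij}(\lambda, t - s) = \delta_{ij}\,\delta(t - s). \qquad (2.156)$$

Explicit calculation then shows that

$$
\begin{aligned}
\left\langle e[f^{(\lambda)}]\right| \int_0^t ds\, a_i^\dagger(\lambda, s)\, a_j(\lambda, s) \left|e[h^{(\lambda)}]\right\rangle = {}&\int_0^t ds \int_0^\infty ds_1\, f_\ell^*(s_1)\, c_{i\ell}^*(\lambda, s - s_1) \\
&\times \int_0^\infty ds_2\, c_{jk}(\lambda, s - s_2)\, h_k(s_2)\, \langle e[f^{(\lambda)}]|e[h^{(\lambda)}]\rangle.
\end{aligned}
\qquad (2.157)
$$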

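Then in the discontinuous limit,

$$\lim_{\lambda\to0}\left\langle e[f^{(\lambda)}]\right| \int_0^t ds\, a_i^\dagger(\lambda, s)\, a_j(\lambda, s) \left|e[h^{(\lambda)}]\right\rangle = \int_0^t ds\, f_i^*(s)\, h_j(s)\, \langle e[f]|e[h]\rangle. \qquad (2.158)$$

In terms of an inner product on single particle wave vectors this is equal to

$$\lim_{\lambda\to0}\left\langle e[f^{(\lambda)}]\right| \int_0^t ds\, a_i^\dagger(\lambda, s)\, a_j(\lambda, s) \left|e[h^{(\lambda)}]\right\rangle = \left\langle f, \chi_{[0,t]}\, e_i e_j \cdot h\right\rangle \langle e[f]|e[h]\rangle. \qquad (2.159)$$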

This is clearly not an eigenvalue relationship involving a product of wave packets. Rather, it is the second quantization of an operator acting on the wave packets themselves [45, 52]. The specific operator here is multiplication by the indicator function $\chi_{[0,t]}(s)$ combined with the dot product into the dyad of basis vectors $e_i e_j$. In the language of the Hudson and Parthasarathy formulation of QSDEs, this operator is a scattering or conservation process and is denoted $\Lambda_t^{ij}$. While these processes can be derived without reference to a limiting integral, for our purposes we simply take this to be a definition,

$$\Lambda_t^{ij} \equiv \lim_{\lambda\to0}\int_0^t ds\, a_i^\dagger(\lambda, s)\, a_j(\lambda, s). \qquad (2.160)$$

Appendix C reviews the notation and manipulation of QSDEs in terms of the quantum Ito differentials $dA_t^i$, $dA_t^{j\dagger}$, and $d\Lambda_t^{ij}$.
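The contrast between Eqs. (2.154) and (2.159) is easiest to see at the level of single particle operators. There, $a^\dagger[g]\,a[g]$ corresponds to the rank-one operator $|g\rangle\langle g|$. By contrast, $\Lambda_t^{ij}$ corresponds to multiplication by $\chi_{[0,t]}$ together with the dyad $e_i e_j$. The sketch below discretizes $L^2(\mathbb{R}^+)\otimes\mathbb{C}^d$ on a midpoint time grid, an illustrative assumption. It checks that the matrix element $\langle f, M h\rangle$ reproduces $\int_0^t ds\, f_i^*(s)\, h_j(s)$. It also shows that $M$ has rank equal to the number of grid points in $[0, t]$, far from rank one. The helper names are hypothetical.

```python
import numpy as np

n, d = 400, 2                           # time grid points on [0, 2), number of modes
ds = 2.0 / n
s = ds * (np.arange(n) + 0.5)           # midpoint grid
t, i, j = 1.0, 0, 1                     # Lambda_t^{ij}, zero-based mode labels

def scattering_element(f, h):
    """Right side of Eq. (2.158): int_0^t ds f_i(s)^* h_j(s)."""
    mask = s < t
    return np.sum(np.conj(f[mask, i]) * h[mask, j]) * ds

def single_particle_operator():
    """Matrix of chi_[0,t](s) e_i e_j on L2 x C^d, flattened as index k*d + mode."""
    M = np.zeros((n * d, n * d))
    for k in range(n):
        if s[k] < t:
            M[k * d + i, k * d + j] = 1.0    # move mode-j amplitude at s_k into mode i
    return M

rng = np.random.default_rng(3)
f = rng.normal(size=(n, d)) + 1j * rng.normal(size=(n, d))
h = rng.normal(size=(n, d)) + 1j * rng.normal(size=(n, d))
M = single_particle_operator()

lhs = np.vdot(f.ravel(), M @ h.ravel()) * ds      # <f, chi e_i e_j . h>
print(np.isclose(lhs, scattering_element(f, h)))  # True
print(np.linalg.matrix_rank(M))                   # 200, versus rank 1 for |g><g|
```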

2.6.3 The limiting stochastic propagator

With a firm connection between the interaction Hamiltonian with scattering terms and the wave packet theory, we are now able to express the limiting propagator in terms of an Ito-form quantum stochastic differential equation. Appendix C shows that the most general quantum stochastic integral usually considered defines a process

$$U_t = U_0 + \int_0^t d\Lambda_s^{ij}\, F_s^{ij} + \int_0^t dA_s^{i\dagger}\, F_s^{i0} + \int_0^t dA_s^{j}\, F_s^{0j} + \int_0^t ds\, F_s^{00}. \qquad (2.161)$$

Sec. C.1 also shows what constraints must be placed on the operators $F_s^{\alpha\beta}$ in order for $U_t$ to be a unitary process. As any unitary can be written as $\exp(-iA_t)$ for some generator $A_t$, it is more convenient to work with the operators $G_s^{\alpha\beta}$, defined so that

$$F_s^{\alpha\beta} = G_s^{\alpha\beta}\, U_s, \qquad (2.162)$$

rather than with $F_s^{\alpha\beta}$ directly. The unitarity constraints written in terms of $G_s^{\alpha\beta}$ are given in Eq. (C.19).

The bottom-line result of the quantum Wong-Zakai theorem is that the operators $G_s^{\alpha\beta}$ are expressible in terms of the system operators $E_{\alpha\beta}$ defining $H_{\text{int}}(\lambda, t)$ in Eq. (2.145) and a matrix of constants $\kappa_{ij}$. As the $E_{\alpha\beta}$ are assumed to be time independent, $G_s^{\alpha\beta} = G_0^{\alpha\beta}$, so we will omit the time index and demote the superscripts to subscripts. While the constants $\kappa_{ij}$ are in general complex, the simplest of all cases is $\kappa_{ij} = \frac{1}{2}\delta_{ij}$. This choice is also well motivated for our problem, so we will use it here (see Appendix D.2.2). With these simplifications, the propagator $U_t$ is expressible as a quantum stochastic Ito integral, which solves the recursive QSDE

$$dU_t = G_{ij}\, U_t\, d\Lambda_t^{ij} + G_{i0}\, U_t\, dA_t^{i\dagger} + G_{0j}\, U_t\, dA_t^{j} + G_{00}\, U_t\, dt. \qquad (2.163)$$

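The limiting coefficients $G_{\alpha\beta}$ are

$$G_{\alpha\beta} = -iE_{\alpha\beta} - \frac{1}{2}E_{\alpha i}\left[\left(1 + \frac{i}{2}\mathbf{E}\right)^{-1}\right]_{ij}E_{j\beta}, \qquad (2.164)$$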

where $i$ and $j$ start from 1 and we have defined $\mathbf{E}$ as the matrix of operators $E_{ij}$. The inverse of this matrix appears precisely because multiple scattering events are possible. A Neumann series is the operator-valued generalization of a geometric series: for an operator $A$, $\sum_{n=0}^\infty A^n = (1 - A)^{-1}$. The series converges to this inverse whenever $\|A\| < 1$, while the inverse itself exists whenever $1 - A$ is invertible. The equivalent Ito coefficients generate a Neumann series of operators in which $A = -\frac{i}{2}\mathbf{E}$ and $A^n$ represents $n$ quantum scattering events between the modes. The limiting coefficient then involves the $i, j$ component of the operator/matrix inverse $(1 + \frac{i}{2}\mathbf{E})^{-1}$. For an intuitive physical picture, the coefficients $G_{\alpha\beta}$ can be interpreted in the following way.

Each coefficient $G_{\alpha\beta}$ can be roughly thought of as a right-to-left acting transformation on the system that depends on how the system couples through the field. The original, direct couplings $E_{\alpha\beta}$ are still present, as shown by the first term in Eq. (2.164). In addition to the direct coupling, there are the effects of coupling through the various modes. For example, the second part of the $G_{i0}$ coefficient shows that a photon can be created in the $i$th mode not only by direct excitation, represented by the $-iE_{i0}$ term. It can also be created by first exciting the $j$th mode, then scattering any number of times, and finally being emitted into the $i$th. The same goes for $G_{00}$ and $G_{ij}$, except that these either leave the field unchanged or transfer a quantum from mode $j$ to mode $i$.
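The matrix structure of Eq. (2.164) can be explored numerically if the $E_{\alpha\beta}$ are replaced by commuting c-numbers. That is a simplifying assumption, since genuine system operators need not commute, so the sketch only illustrates the algebra. It assembles a random Hermitian $(d+1)\times(d+1)$ array, so that $E_{\alpha\beta} = E_{\beta\alpha}^*$, and rescales the scattering block so that $\|\frac{i}{2}\mathbf{E}\| < 1$. It then checks that the Neumann series reproduces $(1 + \frac{i}{2}\mathbf{E})^{-1}$. It also checks three standard unitarity conditions in their usual scattering-matrix form, with $S = 1 + G_{ij}$ and $L_i = G_{i0}$: $S$ is unitary, $\mathrm{Re}\,G_{00} = -\frac{1}{2}|L|^2$, and $G_{0j} = -(L^\dagger S)_j$. That this matches the form of Eq. (C.19) is an assumption here.

```python
import numpy as np

def g_coefficients(E_full):
    """Eq. (2.164) with commuting (c-number) stand-ins for E_{alpha beta}.
    E_full is Hermitian, shape (d+1, d+1); index 0 labels the 'no mode' slot."""
    d = E_full.shape[0] - 1
    E = E_full[1:, 1:]                                  # scattering block E_ij
    M = np.linalg.inv(np.eye(d) + 0.5j * E)             # (1 + iE/2)^{-1}
    return -1j * E_full - 0.5 * E_full[:, 1:] @ M @ E_full[1:, :]

def neumann_sum(A, n_terms):
    """Partial sum of the Neumann series sum_{n < n_terms} A^n."""
    total = np.zeros_like(A)
    term = np.eye(A.shape[0], dtype=A.dtype)
    for _ in range(n_terms):
        total = total + term
        term = term @ A
    return total

rng = np.random.default_rng(2)
d = 2
X = rng.normal(size=(d + 1, d + 1)) + 1j * rng.normal(size=(d + 1, d + 1))
E_full = 0.5 * (X + X.conj().T)                         # Hermitian stand-in
E_full[1:, 1:] *= 0.5 / np.linalg.norm(E_full[1:, 1:], 2)   # ||E|| = 0.5

G = g_coefficients(E_full)
A = -0.5j * E_full[1:, 1:]                              # A = -(i/2) E
S = np.eye(d) + G[1:, 1:]                               # scattering matrix
L = G[1:, 0]                                            # emission coefficients G_{i0}

print(np.allclose(neumann_sum(A, 60), np.linalg.inv(np.eye(d) - A)))  # True
print(np.allclose(S.conj().T @ S, np.eye(d)))                         # S unitary
print(np.isclose(G[0, 0].real, -0.5 * np.vdot(L, L).real))            # Re G00 = -|L|^2/2
print(np.allclose(G[0, 1:], -L.conj() @ S))                           # G_0j = -(L^dag S)_j
```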

2.6.4 A simple 1D example

Nearly the simplest nontrivial example of the quantum Wong-Zakai theorem is the case $d = 1$ with

$$E_{11} = 0, \quad E_{10} = i\sqrt{\gamma}\,D, \quad E_{01} = -i\sqrt{\gamma}\,D^\dagger, \quad E_{00} = H_{\text{sys}}, \qquad (2.165)$$

where $D$ and $H_{\text{sys}}$ are system operators. In other words, these are the coefficients for a total Hamiltonian

$$H_{\text{int}}(\lambda, t) = \hbar\left(i\sqrt{\gamma}\,D\,a^\dagger(\lambda, t) - i\sqrt{\gamma}\,D^\dagger a(\lambda, t) + H_{\text{sys}}\right). \qquad (2.166)$$

This is an extremely common model in quantum optics. An atomic dipole operator $D$ couples at rate $\gamma$ to a quantized quasi-monochromatic electric field with “white noise” creation operator $a^\dagger(\lambda, t)$. $H_{\text{sys}}$ is the remaining system operator, which includes any residual detuning of the field mode from the system transition frequency and any externally applied controls. The quantum Wong-Zakai theorem states that this pre-limit Hamiltonian generates a propagator with the coefficients

$$G_{11} = 0, \quad G_{10} = \sqrt{\gamma}\,D, \quad G_{01} = -\sqrt{\gamma}\,D^\dagger, \quad G_{00} = -iH_{\text{sys}} - \frac{1}{2}\gamma\,D^\dagger D. \qquad (2.167)$$

This results in the propagator $U_t$ satisfying the QSDE

$$dU_t = \left(\sqrt{\gamma}\,D\,dA_t^\dagger - \sqrt{\gamma}\,D^\dagger dA_t - iH_{\text{sys}}\,dt - \frac{1}{2}\gamma\,D^\dagger D\,dt\right)U_t. \qquad (2.168)$$
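The 1D coefficients can be reproduced from Eq. (2.164) for a concrete, hypothetical choice: a two-level atom with $D = \sigma_-$ and $H_{\text{sys}} = \frac{\Delta}{2}\sigma_z$, with $\hbar = 1$ and illustrative values of $\gamma$ and $\Delta$. Because $E_{11} = 0$, the inverse in Eq. (2.164) is just 1, and $G_{00}$ picks up $-\frac{1}{2}E_{01}E_{10} = -\frac{1}{2}\gamma D^\dagger D$. The sketch also uses the fact that $dA_t$ and $dA_t^\dagger$ have zero vacuum expectation in Eq. (2.168), so the vacuum-averaged propagator obeys $d\langle U_t\rangle/dt = G_{00}\langle U_t\rangle$. The excited-state survival probability should then decay as $e^{-\gamma t}$.

```python
import numpy as np

gamma, delta = 2.0, 0.3                          # hypothetical rate and detuning (hbar = 1)
# Two-level system in the basis (|e>, |g>)
sigma_minus = np.array([[0, 0], [1, 0]], dtype=complex)   # |g><e|
sigma_z = np.diag([1.0, -1.0]).astype(complex)
D = sigma_minus
H_sys = 0.5 * delta * sigma_z

# Eq. (2.165): d = 1 with E_11 = 0
E10 = 1j * np.sqrt(gamma) * D
E01 = -1j * np.sqrt(gamma) * D.conj().T
E00 = H_sys

# Eq. (2.164) with E_11 = 0, so (1 + iE/2)^{-1} = 1
G10 = -1j * E10
G01 = -1j * E01
G00 = -1j * E00 - 0.5 * E01 @ E10

def expm_general(A, t):
    """exp(A t) via eigendecomposition; assumes A is diagonalizable."""
    w, V = np.linalg.eig(A)
    return V @ np.diag(np.exp(w * t)) @ np.linalg.inv(V)

# Vacuum-averaged propagator <U_t> = exp(G00 t)
t = 0.8
U_avg = expm_general(G00, t)
print(abs(U_avg[0, 0])**2, np.exp(-gamma * t))   # excited-state survival, both ~0.2019
```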

In the next section we apply the quantum Wong-Zakai theorem to the much more interesting case of the Faraday interaction, where $d = 2$ and $E_{ij} \neq 0$.

2.7 The Faraday Interaction

The Faraday interaction is physically based on an optical field propagating in a polarizable medium. In classical optics it appears as a magneto-optical effect, in which the polarization of a linearly polarized probe is rotated by an amount proportional to the component of the magnetic field parallel to the direction of propagation. At a macroscopic level, it is modeled in terms of the energy shift of a polarizable particle induced by an oscillating electric field. Such an energy shift can be easily implemented and precisely controlled by applying a quasi-monochromatic laser to a rarefied monatomic gas. The laser’s effect on a single atom can be treated as a perturbation to its atomic ground states if the laser drive is in a “low saturation” regime. Specifically, the laser’s carrier frequency must be close to, but significantly detuned from, a ground state transition. Furthermore, the intensity must be low enough that the number of excited atoms is negligible. Not surprisingly, the details of the ground state atomic structure affect both the magnitude and direction of the induced polarization, and often lead to effects beyond a simple linear rotation of the probe polarization. The derivation of this kind of interaction is elegantly presented by Deutsch and Jessen in [5]. Here we consider the simplest setting, where the atomic ground state is that of a spin 1/2 particle. Such a ground state is experimentally realizable if the atom has a single valence electron and negligible hyperfine structure, or if two valence electrons form a spin singlet ground state and the nucleus has a total spin I = 1/2.

In either the classical or quantum mechanical setting, the polarizability Hamiltonian for an atom located at position $r_a$ is given by

$$H_{\text{polar}} = -\mathbf{E}^{(-)}(r_a)\cdot\overleftrightarrow{\alpha}\cdot\mathbf{E}^{(+)}(r_a), \qquad (2.169)$$

where $\overleftrightarrow{\alpha}$ is the polarizability tensor. Quantum mechanically, $\overleftrightarrow{\alpha}$ is an operator acting solely on the atomic ground states. It is worth noting that in any real system there will be additional decoherence due to spontaneous emission, which we ignore in this treatment. We will note shortly that the strength of the coherent interaction is proportional to $\Gamma/\Delta$, where $\Gamma$ is the excited state decay rate and $\Delta$ is the probe detuning from the atomic resonance. One can additionally show that the incoherent photon scattering is proportional to $(\Gamma/\Delta)^2$. When $\Gamma/\Delta$ is small, $(\Gamma/\Delta)^2$ is smaller still, and so the decoherence is often ignored.

The connection to the quantum Wong-Zakai theorem is that $H_{\text{polar}}$ is quadratic in the field operators. As $\mathbf{E}^{(+)}(r_a) \propto a$ and $\mathbf{E}^{(-)}(r_a) \propto a^\dagger$, the components of $\overleftrightarrow{\alpha}$ will be identifiable with the operators $E_{ij}$. By ignoring spontaneous emission we can also restrict our attention to a single quasi-monochromatic paraxial mode. In principle the atom couples to any electric field at its location, including the quantized vacuum. Here, however, we will be applying a coherent displacement in a definite mode $u^{(+)}(x_T, z)$ with an envelope function $f(t)$. As we have seen, this envelope function is expressible in terms of a convolution with the smoothing function $\phi^{(\sigma)}(t)$ (see Sec. 2.6.1), and so the relevant field operators are $a(\lambda, t)$ and its adjoint. In other words, we are simply using this as an example for applying the theoretical machinery developed in this chapter.

Before finally writing down $H_{\text{int}}(\lambda, t)$, we make one more extremely useful but only marginally justifiable approximation. We assume that the spatial distribution of the atoms is irrelevant, so that all atoms in the ensemble can be treated as existing at the same location in space. From the point of view of the slowly varying envelope, this is a reasonable assumption if the extent of the gas along the direction of propagation is at most on the order of $c\sigma$. If the longitudinal extent of the atoms becomes significant, then $H_{\text{int}}$ would have to treat atoms at the beginning of the gas differently from atoms at the end. In addition to having a spatiotemporal dependence, any realistic paraxial beam will have some intensity variation in both $x_T$ and $z$. Suppose $u^{(+)}(x_T, z)$ is a standard Hermite-Gaussian beam. Then the transverse and longitudinal intensity can be treated as approximately constant if the gas has a transverse area that is small compared to the beam's characteristic area $\sigma_T$. However, a beam of fixed input power with a large transverse area has a relatively low intensity at any given point compared to a beam with a smaller $\sigma_T$. Ultimately this means that the more uniform the probe is, the weaker the overall interaction will be.

With all of the above caveats and assumptions, the Faraday interaction Hamiltonian is

$$H_{\text{int}}(\lambda, t) = \hbar\chi_0\,J_z\left(a_r^\dagger(\lambda, t)\,a_r(\lambda, t) - a_l^\dagger(\lambda, t)\,a_l(\lambda, t)\right), \qquad (2.170)$$

with the operators and constants defined as follows. For a spin 1/2 ground state, the single atom polarizability $\overleftrightarrow{\alpha}$ is diagonal in the circular polarization basis $e_r$ and $e_l$. Up to an irrelevant global energy shift, $\overleftrightarrow{\alpha} \propto \sigma_z(e_r^* e_r - e_l^* e_l)$, where $\sigma_z$ is the Pauli $z$ operator and the proportionality constant depends upon the specifics of the atomic physics. If all of the atoms sit at the same location in space, computing the Hamiltonian for the whole ensemble reduces to summing the individual polarizabilities. This in turn reduces to summing all of the $\sigma_z$ operators. It is well known that for $N$ spin 1/2 particles, a collective pseudo-spin $\mathbf{J}$ can be defined whose components ($i = x, y, z$) are

$$J_i = \sum_{n=1}^{N}\frac{1}{2}\sigma_i^{(n)}. \qquad (2.171)$$
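The collective spin can be built explicitly for a few atoms. The sketch below assumes the normalization $J_i = \frac{1}{2}\sum_n \sigma_i^{(n)}$ and $N = 3$, both chosen for illustration. It checks the spin commutation relation $[J_x, J_y] = iJ_z$ and shows that the spectrum of $J_z$ runs from $-N/2$ to $N/2$. The size of $J_z$ therefore grows with the number of atoms, which is why a large ensemble can amplify the small single-atom coupling.

```python
import numpy as np
from functools import reduce

PAULI = {
    "x": np.array([[0, 1], [1, 0]], dtype=complex),
    "y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def collective_spin(N, axis):
    """J_axis = (1/2) sum_n sigma_axis^(n) on the 2^N-dimensional space of N spin-1/2 particles."""
    J = np.zeros((2**N, 2**N), dtype=complex)
    for n in range(N):
        factors = [np.eye(2, dtype=complex)] * N
        factors[n] = PAULI[axis]          # sigma acting on particle n only
        J += 0.5 * reduce(np.kron, factors)
    return J

N = 3
Jx, Jy, Jz = (collective_spin(N, axis) for axis in "xyz")
print(np.allclose(Jx @ Jy - Jy @ Jx, 1j * Jz))   # True: spin commutation relation
print(np.linalg.eigvalsh(Jz))                     # -1.5, three -0.5, three 0.5, 1.5
```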

The details of finding the proportionality constant that expresses the total interaction as Eq. (2.170) are given in [5], but the dimensionless constant $\chi_0$ has an exceedingly simple form,

χ0 =σ0

2∆(2.172)

where $\sigma_0$ is the resonant scattering cross-section for the given transition. This constant can be viewed as the probability that a single atom absorbs and re-emits a photon into the paraxial beam. Making all the above approximations valid (e.g., assuming that the intensity is nearly constant and that spontaneous emission is negligible) requires both $\sigma_0 \ll \sigma_T$ and $\Gamma \ll \Delta$, meaning that $\chi_0 \ll 1$.

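When proving the convergence of $U(\lambda, t)$ in the sense of matrix elements, $\lim_{\lambda\to0}\langle e[f^{(\lambda)}]|U(\lambda, t)|e[h^{(\lambda)}]\rangle$, Gough used the relationship

$$\lim_{\lambda\to0}\langle e[f^{(\lambda)}]|U(\lambda, t)|e[h^{(\lambda)}]\rangle = \lim_{\lambda\to0}\langle\emptyset|\tilde{U}(\lambda, t)|\emptyset\rangle, \qquad (2.173)$$

where $\tilde{U}$ is the propagator with the replacements

$$a_i(\lambda, t) \to a_i(\lambda, t) + h_i(t) \quad\text{and}\quad a_i^\dagger(\lambda, t) \to a_i^\dagger(\lambda, t) + f_i^*(t). \qquad (2.174)$$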

As we are applying this limit when the field has a coherent displacement, it is sufficient to work with these displaced versions and pretend that the field is in vacuum. The heart of the Faraday interaction is the rotation of a linearly polarized input. We therefore assume that our displacement is linearly polarized and that the equivalent amplitude function $f(t) \in L^2\otimes\mathbb{C}^2$ represents the displacement in the electric field. With these two constraints we have

$$f_r(t) = f_l^*(t) = \frac{i}{\sqrt{2}}f_0(t), \qquad (2.175)$$

where $f_0(t)$ is real valued. In typical experiments involving the Faraday interaction, the driving displacement is switched on to a constant value for the duration of the experiment. In this case $\max(f_0) = \sqrt{N_L/\tau}$, where $N_L$ is the average number of photons in a pulse of duration $\tau$.

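For this displacement, the effective vacuum Hamiltonian is

$$H_{\rm int}(\lambda,t) = \hbar\,\frac{\chi_0}{3}J_z\Big[\big(a_r^\dagger(\lambda,t) + f_r^*(t)\big)\big(a_r(\lambda,t) + f_r(t)\big) - \big(a_l^\dagger(\lambda,t) + f_l^*(t)\big)\big(a_l(\lambda,t) + f_l(t)\big)\Big]. \qquad (2.176)$$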

It is useful to define the function

$$\kappa(t) = \left(\frac{\chi_0}{3}\,f_0(t)\right)^2. \qquad (2.177)$$

When f0(t) is held at a constant level, we have a characteristic rate κ = (χ0/3)² NL/τ.

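Expanding the interaction in terms of this definition gives

$$H_{\rm int}(\lambda,t) = \hbar\,\frac{\chi_0}{3}J_z\big(a_r^\dagger(\lambda,t)a_r(\lambda,t) - a_l^\dagger(\lambda,t)a_l(\lambda,t)\big) + i\hbar\sqrt{\frac{\kappa(t)}{2}}\,J_z\big(a_r^\dagger(\lambda,t) + a_l^\dagger(\lambda,t) - a_r(\lambda,t) - a_l(\lambda,t)\big). \qquad (2.178)$$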
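As a consistency check on the expansion leading to Eq. (2.178), the sketch below multiplies out the displaced products as ordinary complex numbers. It assumes an overall prefactor (χ0/3)Jz, which is the normalization that makes √κ(t) = χ0 f0(t)/3 in Eq. (2.177), takes fr = fl* = i f0/√2 from Eq. (2.175), and uses f0 > 0. The function name `displaced_coefficients` and the parameter values are illustrative only. The constant term cancels and each linear coefficient equals ±i√(κ/2).

```python
import cmath
import math

def displaced_coefficients(chi0, f0):
    """Expand (chi0/3) Jz [(a_r^dag + f_r^*)(a_r + f_r) - (a_l^dag + f_l^*)(a_l + f_l)]
    with f_r = f_l^* = i f0 / sqrt(2), returning the c-number coefficients
    (per hbar * Jz) of the constant, a_r^dag, a_l^dag, a_r and a_l terms."""
    g = chi0 / 3.0
    f_r = 1j * f0 / math.sqrt(2.0)
    f_l = f_r.conjugate()                              # linear polarization, Eq. (2.175)
    constant = g * (abs(f_r) ** 2 - abs(f_l) ** 2)     # f_r^* f_r - f_l^* f_l
    c_rdag = g * f_r                                   # from a_r^dag f_r
    c_ldag = -g * f_l                                  # from -a_l^dag f_l
    c_r = g * f_r.conjugate()                          # from f_r^* a_r
    c_l = -g * f_l.conjugate()                         # from -f_l^* a_l
    return constant, c_rdag, c_ldag, c_r, c_l

chi0, f0 = 0.05, 40.0                       # illustrative values only
kappa = (chi0 * f0 / 3.0) ** 2              # Eq. (2.177)
const, c_rdag, c_ldag, c_r, c_l = displaced_coefficients(chi0, f0)
target = 1j * math.sqrt(kappa / 2.0)        # i sqrt(kappa/2), as in Eq. (2.178)
print(abs(const), cmath.isclose(c_rdag, target), cmath.isclose(c_r, -target))  # 0.0 True True
```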

Sec. 2.6 discusses the operators Eij as representing the scattering of field quanta in a system-dependent way. In addition, the limiting coefficients show that when Eij takes on relatively large values, multiple scattering events become possible. In the practical regime that led to χ0 ≪ 1, the probability of multiple scattering is small unless Jz takes on extremely large values. Note that even when χ0 ≪ 1 there can still be a significant interaction if f0(t) ≫ 1, since the second, linear term then dominates. Also note that (a_r(λ, t) + a_l(λ, t))/√2 = a_h(λ, t), i.e. the annihilation operator for horizontal polarization. Dropping the quadratic terms in favor of the terms with a large displacement, we obtain an approximately linear interaction with a single horizontally polarized mode,

$$H_{\rm int}(\lambda,t) \approx i\hbar\sqrt{\kappa(t)}\,J_z\big(a_h^\dagger(\lambda,t) - a_h(\lambda,t)\big). \qquad (2.179)$$

From the 1-D example this means that G10 = √κ(t) Jz, and so if we add a time-dependent system control Hamiltonian Hc(t) to this expression we obtain the propagator

$$dU_t = \Big(\sqrt{\kappa(t)}\,J_z\,dA_t^{h\dagger} - \sqrt{\kappa(t)}\,J_z\,dA_t^{h} - \tfrac{1}{2}\kappa(t)J_z^2\,dt - iH_c(t)\,dt\Big)U_t. \qquad (2.180)$$

This is the propagator that we consider in Chaps. 4 and 5.

It is worth noting that the Faraday interaction has been applied to several different continuous measurement models in the QSDE formalism, with varying levels of initial assumptions [56, 57]. In [56], the free field was assumed to be well modeled by a QSDE, and a Faraday-like interaction was derived by adiabatically eliminating an excited state as well as an artificial cavity mode. This left many questions unanswered, as the derivation was made strictly in a single-mode picture and did not address the fundamental two-mode structure of the Faraday interaction. In contrast, [57] considered a scattering interaction; however, that work took the fundamental scattering interaction before the displacement and simply substituted the scattering processes dΛ^{rr}_t and dΛ^{ll}_t for the white noise operators. That model failed to account for the effect of normally ordering the field operators in obtaining the proper Ito correction. In the language of the quantum Wong-Zakai theorem, the propagator initially considered in [57] should have been interpreted as a quantum Stratonovich equation and not a quantum Ito equation.

2.7.1 The quadratic Faraday interaction

It would be a shame to discuss the full solution to the quantum Wong-Zakai theorem and not give an example that retains the scattering interaction. The Faraday interaction is a prime candidate for this, in a regime where the drive is weak, f0(t) ∼ 1, but some effect is still visible. From Eq. (2.178) we can identify

$$\begin{aligned} E_{rr} = -E_{ll} &= \frac{\chi_0}{3}J_z, & E_{r0} = E_{l0} &= i\sqrt{\frac{\kappa(t)}{2}}\,J_z,\\ E_{0r} = E_{0l} &= -i\sqrt{\frac{\kappa(t)}{2}}\,J_z, & E_{00} &= 0. \end{aligned} \qquad (2.181)$$

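We can substitute these operators into the coefficients Gαβ in Eq. (2.164). After some algebraic simplification we find

$$\begin{aligned} G_{rr} = G_{ll}^\dagger &= \frac{-i\frac{\chi_0}{3}J_z}{1 + i\frac{\chi_0}{6}J_z},\\ G_{r0} = -G_{0r} &= \frac{\sqrt{\kappa(t)/2}\,J_z}{1 + i\frac{\chi_0}{6}J_z},\\ G_{l0} = -G_{0l} &= \frac{\sqrt{\kappa(t)/2}\,J_z}{1 - i\frac{\chi_0}{6}J_z},\\ G_{00} &= -\frac{\kappa(t)}{2}\,\frac{J_z^2}{1 + \left(\frac{\chi_0}{6}J_z\right)^2}. \end{aligned} \qquad (2.182)$$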

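App. C reviews the usual formulation of the propagator dUt in terms of the operators Sij, Li and H. Some more simple algebra shows that

$$\begin{aligned} S_{rr} = S_{ll}^\dagger &= \frac{1 - i\frac{\chi_0}{6}J_z}{1 + i\frac{\chi_0}{6}J_z}, & S_{rl} = S_{lr} &= 0,\\ L_r = L_l^\dagger &= \frac{\sqrt{\kappa(t)/2}\,J_z}{1 + i\frac{\chi_0}{6}J_z}, & H &= 0. \end{aligned} \qquad (2.183)$$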

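For dUt to be a unitary process, Sij must form a unitary matrix of operators, i.e. (S†)ij Sjk = Sij (S†)jk = δik with summation over j. This is clearly satisfied here: S is diagonal, and Srr has the form (1 − iX)/(1 + iX) with X a Hermitian function of Jz, which is unitary. So finally the Faraday interaction generates the propagator Ut, which solves the QSDE

$$dU_t = \Big((S_{rr}-1)\,d\Lambda_t^{rr} + (S_{rr}^\dagger-1)\,d\Lambda_t^{ll} + L_r\big(dA_t^{r\dagger} - dA_t^{r}\big) + L_r^\dagger\big(dA_t^{l\dagger} - dA_t^{l}\big) - L_r^\dagger L_r\,dt\Big)U_t \qquad (2.184)$$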

with the initial value U0 = 1. In the small-χ0 limit we were able to write the propagator in terms of the linearly polarized field operators A^h_t and A^{h†}_t. That is not possible here, because the right and left polarization modes have different atomic coupling operators. This may induce some system-dependent ellipticity in the probe laser, although more analysis is clearly needed.
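Because every coefficient in Eqs. (2.182) and (2.183) is a function of Jz alone, the algebra can be spot-checked eigenvalue by eigenvalue. The sketch below uses hypothetical Jz eigenvalues and parameter values. It checks that Srr has unit modulus, that Srr = 1 + Grr (the dΛ^{rr}_t coefficient in Eq. (2.184)), that Lr† Srr = Lr (in the standard (S, L, H) form the annihilation term carries −L†S, so this is why Eq. (2.184) shows −Lr dA^r_t), and that G00 = −½(Lr†Lr + Ll†Ll) with H = 0.

```python
import numpy as np

def faraday_coefficients(m, chi0, kappa):
    """Eigenvalue-wise coefficients from Eqs. (2.182)-(2.183).

    m: array of Jz eigenvalues. Every operator here is a function of Jz,
    so in the Jz eigenbasis each one is a diagonal array of numbers."""
    a = chi0 / 6.0
    denom = 1.0 + 1j * a * m
    G_rr = -1j * (chi0 / 3.0) * m / denom
    G_00 = -(kappa / 2.0) * m**2 / (1.0 + (a * m) ** 2)
    S_rr = (1.0 - 1j * a * m) / denom
    L_r = np.sqrt(kappa / 2.0) * m / denom
    return G_rr, G_00, S_rr, L_r

m = np.arange(-3.0, 4.0)                 # hypothetical Jz eigenvalues
G_rr, G_00, S_rr, L_r = faraday_coefficients(m, chi0=0.4, kappa=2.0)
L_l = np.conj(L_r)                       # L_l = L_r^dagger

print(np.allclose(np.abs(S_rr), 1.0))                                # S_rr is unitary
print(np.allclose(S_rr, 1.0 + G_rr))                                 # S_rr = 1 + G_rr
print(np.allclose(np.conj(L_r) * S_rr, L_r))                         # L_r^dag S_rr = L_r
print(np.allclose(G_00, -0.5 * (np.abs(L_r)**2 + np.abs(L_l)**2)))   # G_00 = -(1/2) sum L^dag L
```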

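The power of writing the propagator in terms of the (S, L, H) parameters is that a large number of results have already been computed for general coefficients, and these can simply be applied here. As an example, suppose we wish to compute the unconditioned master equation of the atomic system, assuming that the displaced field is in the vacuum, i.e. that apart from the coherent drive laser the field is in its vacuum state. The master equation is then given in Lindblad form with jump operators Lr and Ll. Specifically, the system density operator ρ(t) is the solution to

$$\begin{aligned} \frac{d\rho}{dt} &= \mathcal{D}[L_r](\rho) + \mathcal{D}[L_l](\rho)\\ &= L_r\rho L_r^\dagger - \tfrac{1}{2}L_r^\dagger L_r\rho - \tfrac{1}{2}\rho L_r^\dagger L_r + L_l\rho L_l^\dagger - \tfrac{1}{2}L_l^\dagger L_l\rho - \tfrac{1}{2}\rho L_l^\dagger L_l\\ &= \frac{\kappa(t)}{2}\left(\frac{J_z}{1+i\frac{\chi_0}{6}J_z}\,\rho\,\frac{J_z}{1-i\frac{\chi_0}{6}J_z} + \frac{J_z}{1-i\frac{\chi_0}{6}J_z}\,\rho\,\frac{J_z}{1+i\frac{\chi_0}{6}J_z} - \frac{J_z^2}{1+\left(\frac{\chi_0}{6}J_z\right)^2}\,\rho - \rho\,\frac{J_z^2}{1+\left(\frac{\chi_0}{6}J_z\right)^2}\right). \end{aligned} \qquad (2.185)$$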

Here we see yet again that when χ0 → 0 we recover the standard dissipative master equation with measurement operator √κ(t) Jz. Also note that when the system is prepared in an eigenstate of Jz, or in a mixture of such eigenstates, it does not evolve in time: every operator in Eq. (2.185) is then a function of Jz that commutes with ρ, and the four terms cancel.
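Both limits quoted above can be checked numerically. The sketch below assumes Jz is represented as a diagonal matrix of hypothetical eigenvalues, builds Lr from Eq. (2.183) with Ll = Lr†, and evaluates the right-hand side of Eq. (2.185) with plain NumPy. The helper names `dissipator` and `faraday_rhs` are illustrative only.

```python
import numpy as np

def dissipator(L, rho):
    """Lindblad dissipator D[L](rho) = L rho L^dag - (L^dag L rho + rho L^dag L) / 2."""
    Ld = L.conj().T
    return L @ rho @ Ld - 0.5 * (Ld @ L @ rho + rho @ Ld @ L)

def faraday_rhs(m, rho, chi0, kappa):
    """Right-hand side of Eq. (2.185) with Jz = diag(m) and L_l = L_r^dag."""
    a = chi0 / 6.0
    L_r = np.diag(np.sqrt(kappa / 2.0) * m / (1.0 + 1j * a * m))
    L_l = L_r.conj().T
    return dissipator(L_r, rho) + dissipator(L_l, rho)

m = np.arange(-2.0, 3.0)                            # hypothetical Jz eigenvalues
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5)) + 1j * rng.normal(size=(5, 5))
rho = A @ A.conj().T
rho /= np.trace(rho)                                # random full-rank density matrix

rho_diag = np.diag(np.diag(rho))                    # a mixture of Jz eigenstates
print(np.allclose(faraday_rhs(m, rho_diag, 0.3, 1.0), 0.0))        # no evolution
print(np.allclose(faraday_rhs(m, rho, 0.0, 1.0),
                  dissipator(np.diag(m).astype(complex), rho)))    # chi0 -> 0 gives D[sqrt(kappa) Jz], kappa = 1
print(abs(np.trace(faraday_rhs(m, rho, 0.3, 1.0))) < 1e-12)        # trace preserved
```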

Chapter 3

Classical and Quantum Probability Theory

This chapter serves two purposes. The first and primary intention is to present a

number of known results from classical and quantum probability theory, which will

serve as a foundation for the novel work in later chapters. Those results rely on a

detailed knowledge of the (classical) statistical properties of the Wiener process and

so we review them here. Additionally, we need to know how to extract a classical

Wiener process from a fundamentally quantum system. The procedure of identifying

a stochastic process embedded in a quantum system is one useful application of a

more general mapping between quantum systems and classical probability theory.

The second purpose of this chapter is to emphasize the power of this technique and to

discuss how the language of classical probability theory can be used to identify certain

symmetries that might exist in a quantum system. In order to do this coherently,

we also review some of the basic elements of classical probability theory.

By working in the language of classical probability theory, the tools of nearly 80

years of classical mathematical analysis can be applied to quantum problems, with

one important example being a continuous-time quantum filter. We will not rederive


it here, but merely discuss its origins, limitations and various formulations. A particularly

important form is the conditional master equation, an equation that is in some

sense semiclassical and can be viewed as being generated by the quantum-to-classical

mapping. Here we use the term semiclassical in the sense that the measurement

record is modeled as a real valued classical stochastic process whose statistics are

given by a quantum system expectation (see Sec. 3.4). Chaps. 4 and 5 work

exclusively with this equation, albeit in three different variations. The final topic of

this chapter is to show how these various forms are derived.

3.1 Classical Probability Theory

A physics Ph.D. program does not generally include a course in measure theory or

axiomatic probability theory. Most physics problems only consider a handful of discrete or real-valued random variables, so applying a full measure-theoretic framework is unnecessary. However, in some instances, working only with a probability density

function becomes either intractable or conceptually problematic. One example is

when one is attempting to understand the behavior of a random function defined

over continuous time. In principle, this requires describing an uncountable number

of random variables, one for each possible time, where the density function at a given

time could be highly correlated with past (and maybe even future) times.

Furthermore, when adding the possibility of statistical inference to the picture,

defining individual density functions becomes even more convoluted. Consider trying

to estimate the history of the random variable xt based upon a continuous observation

of a nonlinear function of x, e.g. f(xt) = sin(xt). Writing down joint and marginal density functions for xt and f(xt) is not particularly straightforward, as they are

clearly distinct objects but are hardly independent. In the long run, a much more

efficient way of doing business is to decouple the notions of random events and their

associated probabilities from the specifics of any one random variable. By finding a

way to associate xt, f(xt), and maybe even a third random variable y to the same

underlying structure of events, we can then calculate the probability associated with

those events, independent of the specifics of x, y or f(x). This decoupling is achieved by invoking some of the structure found in measure theory.

An axiomatized probability model contains three elements, usually written as the

triple (Ω, F, P), with Ω a sample space, F a σ-algebra of events, and P a probability measure over those events [22, 53, 58]. We will now discuss each element

including specific examples. Ultimately, we are interested in describing diffusive

measurements and so we will focus on the example of Brownian motion. Brownian

motion is the canonical example for a system experiencing unforced diffusion and the

Wiener process is the most widely used mathematical model for such a system. Chap.

2 already encountered an instance of a Wiener process, in the vacuum statistics of

the quadrature At + A†t .

The first element of a probability space, Ω, is called the sample space and describes

the set of all possible outcomes of the model. For a system with a discrete number of outcomes, such as the flip of a coin or the roll of a die, Ω is simply the set of all possible outcomes: for the coin Ω = {heads, tails}, and for the die Ω = {1, 2, 3, 4, 5, 6}. In addition to these discrete examples, the sample space can also be uncountably infinite. For a Brownian particle moving in d dimensions, the sample space is the space of all possible trajectories. As a particle's trajectory must be a continuous real-valued function, Ω is the set of all continuous functions of time [58],

$$\Omega = \{\omega(t) : \mathbb{R}_+ \to \mathbb{R}^d,\ \omega \text{ continuous}\}. \qquad (3.1)$$

The next element of the probability model, F , is a σ-algebra over the sample

space. This represents any “sensible” question we can ask about the various out-

comes. Each object in the algebra represents such a question and is called an event.

In this formalism, probabilities are computed not from the individual outcomes in

Ω, but instead from the events in F. The reason for this distinction is to exclude pathological cases that arise when working with uncountable sets, which is the same reason measure theory was developed. When the sample space Ω is uncountably infinite, one can find highly pathological sets that lead to paradoxical results. For instance, by splitting the unit ball into just 5 disjoint subsets, one can reassemble them, simply through translations and rotations, into two identical

copies of that ball [59]. It would be problematic for a probability model to consider

these kinds of sets, as one could then double the probability for picking a point in

the unit ball simply by doubling the ball. Identifying the elements of F with sensible

questions means that we are excluding these kinds of pathologies.

In the discrete case the sensible questions are things like “Did the die land showing an even number?”, “Did it land showing the number 6?”, or even “Did it land showing any number 1 through 6?”. Mathematically, these questions correspond to sets of the underlying outcomes, namely {2, 4, 6}, {6}, and {1, 2, 3, 4, 5, 6} = Ω, respectively. The σ-algebra F is the set of these sets, representing every possible question (event) we can ask about the system. For a finite, discrete set of outcomes, F is usually the power set, i.e. the set of all subsets of Ω. Formally, a σ-algebra is defined as follows [53]. A σ-algebra F is a collection of subsets of Ω satisfying the following three properties¹:

1. If a countable collection of sets {A_n}_{n∈ℕ} is contained in F, then ∪_n A_n ∈ F.

2. If A is a set in F, then its complement, A^c, is also in F.

3. F must contain the space Ω and therefore, by the second property, its complement, the empty set ∅.
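For a finite sample space the three defining properties can be checked by brute force. The sketch below, an illustration built on the die example, confirms that the power set of Ω = {1, ..., 6} is a σ-algebra and that a collection missing a complement is not. The helper names are hypothetical.

```python
from itertools import combinations

OMEGA = frozenset(range(1, 7))  # outcomes of a die roll

def power_set(omega):
    """All subsets of a finite sample space, as frozensets."""
    items = sorted(omega)
    return {frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)}

def is_sigma_algebra(F, omega):
    """Check properties 1-3 for a finite collection F of subsets of omega.

    For a finite F, any countable union involves only finitely many distinct
    members, so closure under pairwise unions is enough for property 1."""
    if omega not in F:                              # property 3
        return False
    if any(omega - A not in F for A in F):          # property 2
        return False
    return all(A | B in F for A in F for B in F)    # property 1

F = power_set(OMEGA)
print(len(F), is_sigma_algebra(F, OMEGA))   # 64 True
```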

¹The “σ” in σ-algebra stands for “countable” [58].

For the case of Brownian motion, F is the σ-algebra of all “cylinder sets”, which are defined in the following way [58]. For real-valued random variables, probabilities are given in terms of intervals. The probability that a random variable x with probability density p(x) takes a value in the interval [a, b] is given by the integral ∫_a^b dx p(x). Here the event is the interval [a, b], which is an element of the Borel σ-algebra, B. This is essentially the set of all intervals, open and closed, over the real line; strictly speaking, B is the smallest σ-algebra containing all such intervals.

However, at any given time, a d-dimensional Brownian motion will take on values

in ℝ^d. In order to ask if a trajectory landed in some interval, I, we must also specify

an associated time, t, for that measurement. A basic cylinder set is then specified

by both a time and an interval. The actual set, C(t; I), is the set of all Brownian

trajectories that are in I at time t,

$$C(t; I) = \{\omega \in \Omega : \omega(t) \in I\}. \qquad (3.2)$$

A trivial example is the set C(t; ℝ^d) = Ω, i.e. all continuous trajectories have a value in ℝ^d at any time t. A nontrivial example in one dimension is the set of all trajectories that lie between a = −10 µm and b = 5 µm at time t = 5 ms.

In addition, questions that involve multiple times are also sensible. It is per-

fectly reasonable to ask, “what one-dimensional trajectories are in I1 = (a1, b1)

at time t1 and in I2 = [a2, b2) at time t2 > t1?” This is also a cylinder set,

C(t1, t2; I1, I2). A helpful image is that the cylinder set C(t1, t2, ..., tn; I1, I2, ..., In) defines the set of trajectories that successfully navigate the “slalom” defined by these intervals and these times.
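The slalom picture can be sketched numerically. The code below assumes a standard Wiener process (x0 = 0, unit diffusion, dimensionless units rather than µm and ms), approximates it on a time grid by summing independent Gaussian increments, and flags which sampled paths lie in a cylinder set. The helper names and the interval encoding (lo, hi, closed_lo, closed_hi) are hypothetical conveniences, and each requested time is snapped to the nearest grid point.

```python
import numpy as np

def sample_wiener_paths(n_paths, t_max, n_steps, rng):
    """Discretized standard Wiener paths: x_0 = 0, independent N(0, dt) increments."""
    dt = t_max / n_steps
    increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    start = np.zeros((n_paths, 1))
    paths = np.concatenate([start, np.cumsum(increments, axis=1)], axis=1)
    grid = np.linspace(0.0, t_max, n_steps + 1)
    return grid, paths

def in_cylinder(paths, grid, times, intervals):
    """Mask of sampled paths in C(t_1, ..., t_n; I_1, ..., I_n).

    Each interval is (lo, hi, closed_lo, closed_hi); a path must pass through
    every interval at the corresponding time (nearest grid point)."""
    mask = np.ones(paths.shape[0], dtype=bool)
    for t, (lo, hi, closed_lo, closed_hi) in zip(times, intervals):
        k = int(np.argmin(np.abs(grid - t)))
        x = paths[:, k]
        above = (x >= lo) if closed_lo else (x > lo)
        below = (x <= hi) if closed_hi else (x < hi)
        mask &= above & below
    return mask

rng = np.random.default_rng(1)
grid, paths = sample_wiener_paths(20_000, 1.0, 100, rng)
one_gate = in_cylinder(paths, grid, [1.0], [(-1.0, 1.0, False, False)])
two_gates = in_cylinder(paths, grid, [0.5, 1.0],
                        [(-0.5, 0.5, False, False), (-1.0, 1.0, False, False)])
print(one_gate.mean(), two_gates.mean())
```

Adding gates can only shrink the set of admissible trajectories, so the two-gate estimate is never larger than the one-gate estimate.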

The σ-algebra we will use for analyzing Brownian motion is the σ-algebra gener-

ated by the cylinder sets defined by all countable sequences of times and all open sets

of ℝ^d at those times [58]. Note that the issue of discussing an uncountably infinite number of random variables is avoided by defining the cylinder sets for a countable number of times. In fact, asking questions about an uncountable number of events is ultimately identified as “unreasonable”, as it allows for the introduction of pathological possibilities. Here we are only interested in describing the continuous sample paths of a Brownian particle, which means that we can safely consider a countable number of events, e.g. times defined by a sequence of rational numbers.

The final element of a probability space is the probability measure, P. It defines

the probability of observing the events in F. Mathematically, P is a function that takes sets of Ω (elements of F) and maps them to real numbers between zero and one, P : F → [0, 1]. For a measure to be a probability measure we must have:

1. The probability of something happening must be one, P(Ω) = 1, and the probability of nothing happening must be zero, P(∅) = 0.

2. The probability of the union of a countable number of disjoint events in F must be additive,

$$\mathbb{P}\Big(\bigcup_n A_n\Big) = \sum_n \mathbb{P}(A_n) \quad \text{if } A_n \cap A_m = \emptyset \text{ for } A_n, A_m \in \mathcal{F} \text{ and } n \neq m. \qquad (3.3)$$

The requirement that a probability measure be countably additive generalizes the familiar statement that if A and B are disjoint (mutually exclusive), then the probability of observing A or B is the sum of the two probabilities.
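To see why disjointness, rather than independence, is the relevant condition, the sketch below uses exact rational arithmetic with a uniform measure on a fair die, an assumption made for illustration. The events “even” and {1, 2} are independent but not disjoint, and their probabilities do not add.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))

def P(event):
    """Uniform probability measure on a fair die: P(A) = |A| / 6."""
    return Fraction(len(frozenset(event) & OMEGA), len(OMEGA))

even, low = frozenset({2, 4, 6}), frozenset({1, 2})
odd_one = frozenset({1})

# Disjoint events: additivity (3.3) holds.
print(P(even | odd_one) == P(even) + P(odd_one))   # True
# Independent but overlapping events: P(A ∩ B) = P(A) P(B) ...
print(P(even & low) == P(even) * P(low))           # True
# ... yet their probabilities do not add.
print(P(even | low), P(even) + P(low))             # 2/3 5/6
```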

Shortly we will discuss the probability of observing a given cylinder set when the trajectories in those sets represent unforced Brownian motion.

3.1.1 Stochastic processes and random variables

From a well-constructed probability space we now need to see how random variables

fit into the measure theoretic context. Much more can be said on this topic than we

can include here, so an interested reader is encouraged to consult [22, 53, 58, 60].

Chap. 5 requires a reasonable understanding of the statistical properties of a one-

dimensional Brownian motion, the Wiener process, and so we will focus on that

example here.

Abstractly, a random variable f is a function that maps elements of Ω to another

space, usually the real numbers. Placing a $50 bet that a coin toss will land heads

is an example of a random variable. Another example of a random variable is the indicator function χA(ω) for any event A ∈ F. Chap. 2 already found many uses for an indicator function, which in a probabilistic context is a random variable defined by

$$\chi_A(\omega) = \begin{cases} 1 & \text{if } \omega \in A,\\ 0 & \text{if } \omega \notin A. \end{cases} \qquad (3.4)$$

Such a random variable is deceptively simple but also extremely useful. One of its primary uses is to relate set operations in F to algebraic operations on random variables. It is easy to show that the random variable y(ω) = χA(ω) χB(ω) equals 1 only when ω ∈ A ∩ B, and that x(ω) = χA(ω) + χB(ω) − χA(ω) χB(ω) equals 1 exactly when ω ∈ A ∪ B (the product term removes the double count on A ∩ B and vanishes when A and B are disjoint), meaning

$$\chi_{A\cup B}(\omega) = \chi_A(\omega) + \chi_B(\omega) - \chi_A(\omega)\,\chi_B(\omega), \qquad (3.5)$$

$$\chi_{A\cap B}(\omega) = \chi_A(\omega)\,\chi_B(\omega). \qquad (3.6)$$
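These identities are easy to check exhaustively on a finite sample space. The sketch below uses a die as a stand-in sample space (an illustrative assumption) and also shows that the plain sum χA + χB reaches 2 on A ∩ B when the events overlap.

```python
OMEGA = range(1, 7)  # sample space of a die roll (illustrative)

def indicator(A):
    """Return the indicator function chi_A as a callable on outcomes."""
    return lambda w: 1 if w in A else 0

A, B = {2, 4, 6}, {4, 5, 6}        # overlapping events
chi_A, chi_B = indicator(A), indicator(B)
chi_union, chi_inter = indicator(A | B), indicator(A & B)

for w in OMEGA:
    # Eq. (3.5): inclusion-exclusion for indicators
    assert chi_union(w) == chi_A(w) + chi_B(w) - chi_A(w) * chi_B(w)
    # Eq. (3.6): the intersection is a product
    assert chi_inter(w) == chi_A(w) * chi_B(w)

print(chi_A(4) + chi_B(4))  # 2: the naive sum double counts w = 4
```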

For the case of 1D diffusion, one of the most important random variables is parameterized by time and simply returns the value of the trajectory at that time. For all times t ≥ 0, we define the function xt : Ω → ℝ such that

xt(ω) = ω(t). (3.7)

This definition might seem a bit pedantic, but note that the trivial random variable y(ω) = ω is not real-valued: ω describes the entire trajectory, not just its value at one specific time. From xt(ω) a whole host of other random variables can be defined through functional composition. One could place a $50 bet on whether or not a diffusive particle will be more than +5 µm from its starting point at time t = 1 ms. That bet is the composition b(x1ms(ω)), where b maps real numbers to ±50.

The random variable xt in Eq. (3.7) only gives a snapshot of a trajectory at that

time. In order to describe the trajectory dynamically in time as a random variable,

there is the notion of a stochastic process. Most generally, a stochastic process is a family of random variables $\{x_t\}_{t\in I}$ indexed by some parameter t, almost always representing time. Typically we take time to start at 0 and either let it continue off towards infinity or, when convenient, stop at some finite time. When discussing the process as a whole we use the notation $\{x_t\}_{t\ge 0}$, with xt the random variable at time t. Before discussing a couple of important types of processes, we should know how to compute the probability that a random variable takes values in a given range.

The previous section showed that a probability measure P acts on elements of F

and returns probabilities. To compute the probability that a $50 bet b wins, we need to identify the set of outcomes on which the function b : Ω → {±50} evaluates to +50, i.e. the pre-image b⁻¹(+50). Because b is a function acting on Ω, we can consider this inverse map b⁻¹. If a given random variable can take on a continuum of values, we can again run into the pesky problem of having uncountable numbers of things. The solution is, once more, to consider only sensible sets of outcomes for any given random variable. For the random variable xt from Eq. (3.7), we take its inverse map xt⁻¹ to act only on elements of the Borel σ-algebra, B. When we ask for the probability of observing certain values of a random variable x, we must ask for the probability of observing sets or intervals in the range of x.

For every “reasonable” interval that x maps to, there must be a corresponding

element A ∈ F in order for us to be able to calculate the probability of that under-

lying event. Such a random variable is called measurable. If a random variable x is

not measurable, then there is little we can say about it when its outcomes lead to

unreasonable questions. In other words, a nonmeasurable random variable has an

inverse that generates sets not in F . Faced with this possibility we can either ignore

such questions and pray they never occur or redefine the probability space in order

to make these sets measurable. A nontrivial example of this problem is the following. Suppose we had a random variable yt that returns the value 1 whenever the sample path exhibits a discontinuous jump in the time interval [0, t), and zero otherwise. The question “what is the probability of yt returning 1?” corresponds to P(yt⁻¹(1)). If our probability space is constructed only of continuous functions, then we technically can't answer this question, as the pre-image yt⁻¹(1) asks for the set of functions that have a discontinuity for times 0 ≤ s < t, which is not an element of F.

By defining random variables as measurable functions, we can easily relate the statistics of multiple random variables to each other through their inverse maps. Consider the stochastic process $\{x_t\}_{t\ge 0}$ defined by Eq. (3.7). Then $x_{t_1}$ and $x_{t_2}$ are two random variables taking on values in the real number line. Suppose we wish to calculate the probability of observing $x_{t_1}$ in the interval (a, b) and $x_{t_2}$ in the interval (c, d]. Individually, we have

$$x_{t_1}^{-1}\big((a,b)\big) = C\big(t_1;(a,b)\big) \in \mathcal{F}, \qquad (3.8)$$

$$x_{t_2}^{-1}\big((c,d]\big) = C\big(t_2;(c,d]\big) \in \mathcal{F}. \qquad (3.9)$$

The joint probability of these two events is simply the probability of the intersection of these two sets,

$$\mathbb{P}\Big(C\big(t_1;(a,b)\big) \cap C\big(t_2;(c,d]\big)\Big) = \mathbb{P}\Big(C\big(t_1,t_2;(a,b),(c,d]\big)\Big). \qquad (3.10)$$
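Although the Wiener measure has not yet been specified, a numerical sketch of Eq. (3.10) helps show why the joint probability must be computed from the intersection rather than as a product of marginals. The code below assumes a standard Wiener process started at zero, so that x_{t1} ~ N(0, t1) and x_{t2} − x_{t1} ~ N(0, t2 − t1) independently. The function name and sample sizes are illustrative choices.

```python
import numpy as np

def joint_and_marginals(t1, t2, a, b, c, d, n=200_000, seed=0):
    """Monte Carlo estimates of P(C(t1;(a,b)) ∩ C(t2;(c,d])) and the two marginals,
    assuming a standard Wiener process with x_0 = 0."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(0.0, np.sqrt(t1), n)                # x_{t1} ~ N(0, t1)
    x2 = x1 + rng.normal(0.0, np.sqrt(t2 - t1), n)      # add an independent increment
    in1 = (x1 > a) & (x1 < b)                           # open interval (a, b)
    in2 = (x2 > c) & (x2 <= d)                          # half-open interval (c, d]
    return (in1 & in2).mean(), in1.mean(), in2.mean()

joint, p1, p2 = joint_and_marginals(0.5, 1.0, 0.5, np.inf, 0.5, np.inf)
print(joint, p1 * p2)   # the joint probability clearly exceeds the product
```

Because x_{t1} and x_{t2} share the increment up to t1, they are positively correlated, and the intersection is noticeably more likely than independence would suggest.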

3.1.2 Expectation values, the conditional expectation, and measurability

The most fundamental operation one performs with random variables is computing

their expectation values. If the random variable z(ω) takes on a finite number of values, $\{z^{(i)} : i = 1,\dots,n\}$, then calculating the expectation value of z is no different than in the non-measure-theoretic context,

$$\mathbb{E}(z) \equiv \sum_{i=1}^{n} z^{(i)}\,\mathbb{P}\big(z = z^{(i)}\big). \qquad (3.11)$$

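The expectation value of z is the average of all its outcomes, weighted by how likely they are to occur. Note that writing $\mathbb{P}(z = z^{(i)})$ is shorthand for finding the event $A_i \equiv z^{-1}(z^{(i)})$ and computing its probability,

$$\mathbb{P}\big(z = z^{(i)}\big) \equiv \mathbb{P}(A_i) = \mathbb{P}\big(\{\omega \in \Omega : z(\omega) = z^{(i)}\}\big). \qquad (3.12)$$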
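For a finite sample space, Eqs. (3.11) and (3.12) translate directly into code: group outcomes into the level sets Ai and weight each value by P(Ai). The sketch below assumes a fair die with exact rational probabilities; `expectation_by_level_sets` is a hypothetical helper name.

```python
from fractions import Fraction

OMEGA = range(1, 7)          # fair die (assumption)
P_POINT = Fraction(1, 6)     # P({w}) for each outcome

def expectation_by_level_sets(z):
    """E(z) = sum_i z_i P(A_i) with A_i = z^{-1}(z_i), following Eqs. (3.11)-(3.12)."""
    total = Fraction(0)
    for value in {z(w) for w in OMEGA}:                 # distinct values z^(i)
        A_i = [w for w in OMEGA if z(w) == value]       # the event z^{-1}(z^(i))
        total += value * P_POINT * len(A_i)             # z^(i) P(A_i)
    return total

print(expectation_by_level_sets(lambda w: w % 3))   # 1
print(expectation_by_level_sets(lambda w: w))       # 7/2
```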

In addition to these simple random variables, we need to formulate expectation

values for random variables that can take a continuum of values. This is done

by defining a measure theoretic version of a standard Riemann integral, called the

Lebesgue integral. One path for this construction is to make an approximation for

xt that takes on a finite number of values. The expectation value of such a discrete approximation is easily computed through Eq. (3.11). Then, by taking a suitable limit in which the approximating values become ever more finely spaced, we obtain the proper expectation value. At this point the procedure is a bit vague, but as we have not yet specified a measure for Brownian motion, it is difficult to be more specific. When discussing the Wiener process, we will be able to be more precise.
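One standard way to make this limit concrete is the dyadic simple-function construction: for a nonnegative g, take s_n = min(n, ⌊2^n g⌋/2^n), whose expectation is a finite sum of the form (3.11) and increases to E(g). The sketch below applies it to g(x) = x² under the assumption, not yet justified in the text, that x is Gaussian with mean 0 and variance t (the Wiener marginal), so the exact answer is t. Level-set probabilities come from `math.erf`; the helper names are hypothetical.

```python
import math

def prob_abs_between(lo, hi, t):
    """P(lo <= |x| < hi) for x ~ N(0, t): an assumed Gaussian (Wiener) marginal."""
    s = math.sqrt(2.0 * t)
    return math.erf(hi / s) - math.erf(lo / s)

def simple_expectation_x2(n, t=1.0):
    """E(s_n) for the n-th dyadic simple approximation s_n = min(n, floor(2^n x^2) / 2^n).

    Each level set {k/2^n <= x^2 < (k+1)/2^n} contributes (k/2^n) P(level set),
    exactly as in Eq. (3.11); the top level {x^2 >= n} contributes n P(x^2 >= n)."""
    step = 2.0 ** -n
    total = 0.0
    for k in range(n * 2 ** n):
        lo, hi = k * step, (k + 1) * step
        total += lo * prob_abs_between(math.sqrt(lo), math.sqrt(hi), t)
    total += n * (1.0 - math.erf(math.sqrt(n) / math.sqrt(2.0 * t)))
    return total

for n in (2, 4, 8, 12):
    print(n, simple_expectation_x2(n))   # increases towards E(x^2) = t = 1
```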

Probability theory gets a lot more lively when, instead of simple expectation values, we consider conditional quantities. When working with simple random variables, finding conditional expectation values is no harder in a probability space than in a standard context. No matter how sophisticated the framework, Bayes' rule still applies, in that for two events A1 and A2 we have

$$\mathbb{P}(A_1 \cap A_2) = \mathbb{P}(A_1|A_2)\,\mathbb{P}(A_2). \qquad (3.13)$$

Whenever P(A2) ≠ 0 we can invert this to find the conditional probability of A1 given A2,

$$\mathbb{P}(A_1|A_2) = \frac{\mathbb{P}(A_1 \cap A_2)}{\mathbb{P}(A_2)}. \qquad (3.14)$$

We emphasize that we are calculating the probability of an event A1 occurring,

conditional on the event A2. While A2 could correspond to the pre-image of a

single value of a simple random variable, it could also correspond to another random

variable taking on a range of values or even the intersection between the outcomes

of two random variables. All of these different possibilities correspond to the same

underlying event and thus carry the same information.

Turning this conditional probability into a conditional expectation value is simply

a matter of weighting the outcomes of one random variable by the conditional probabilities. We illustrate this with a quick example. Consider two simple random variables z(ω) and y(ω) with values $\{z^{(1)},\dots,z^{(n)}\}$ and $\{y^{(1)},\dots,y^{(m)}\}$. For each variable and each outcome, find the corresponding events, $A_i = z^{-1}(z^{(i)})$ and $B_j = y^{-1}(y^{(j)})$. Then the conditional expectation value of z given that $y = y^{(j)}$ is

$$\mathbb{E}\big(z \,\big|\, y = y^{(j)}\big) = \sum_{i=1}^{n} z^{(i)}\,\mathbb{P}\big(z = z^{(i)} \,\big|\, y = y^{(j)}\big) = \sum_{i=1}^{n} z^{(i)}\,\frac{\mathbb{P}(A_i \cap B_j)}{\mathbb{P}(B_j)}. \qquad (3.15)$$

If one has to compute this conditional expectation value by hand, hopefully the events Ai and Bj are relatively simple and computing the probability of their intersection is relatively straightforward. But even if this is not the case, by adding the

structure that events are sets and probabilities are measures on events, computing

conditional quantities does not require defining a new probability structure like a

joint density function for each random variable we want to consider.

Sometimes it is convenient to write the conditional expectation not directly in terms of the events Ai and Bj, but instead in terms of a regular expectation value and an indicator function. This method will be useful when z and y are not simple random variables, and also when we want to abstract away y and instead think about conditioning on some abstract event B. A nice property of indicator functions is that, because they take the value 1 only on a single event, we can write

$$\mathbb{E}(\chi_{A_i}) = \mathbb{P}(A_i), \qquad (3.16)$$

and in particular,

$$\mathbb{P}(A_i \cap B_j) = \mathbb{E}(\chi_{A_i \cap B_j}) = \mathbb{E}(\chi_{A_i}\,\chi_{B_j}). \qquad (3.17)$$

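As the expectation value is a linear operation, we have

$$\mathbb{E}(z|B_j) = \sum_{i=1}^{n} z^{(i)}\,\frac{\mathbb{E}(\chi_{A_i}\chi_{B_j})}{\mathbb{E}(\chi_{B_j})} = \frac{\mathbb{E}\Big(\big(\sum_i z^{(i)}\chi_{A_i}\big)\,\chi_{B_j}\Big)}{\mathbb{E}(\chi_{B_j})} = \frac{\mathbb{E}(z\,\chi_{B_j})}{\mathbb{E}(\chi_{B_j})}. \qquad (3.18)$$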

The last equality here is particularly useful, as it holds even when z isn’t a simple random variable. When continuous-time and continuous-valued random variables are involved, it should not be surprising that explicitly computing conditional quantities by finding the underlying sets and computing their intersections is often impractical. While working with the indicator function χBj makes some substantial simplifications, an explicit computation is often still impractical. Instead, taking an indirect route is often fruitful, and one of the prime tools for doing so is what is called the conditional expectation.
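Eqs. (3.15) and (3.18) can be compared directly on a small example. The sketch below assumes a fair die, takes z(ω) = ω and conditions on the parity y(ω) = ω mod 2, computing the same conditional expectation value once from the sets Ai ∩ Bj and once from the indicator form E(zχB)/E(χB). The helper names are hypothetical.

```python
from fractions import Fraction

OMEGA = range(1, 7)          # fair die (assumption)
P_POINT = Fraction(1, 6)

def E(f):
    """Expectation of a random variable f on the die."""
    return sum(P_POINT * f(w) for w in OMEGA)

def cond_value_sets(z, y, y_j):
    """E(z | y = y_j) from Eq. (3.15): sum_i z_i P(A_i ∩ B_j) / P(B_j)."""
    B_j = {w for w in OMEGA if y(w) == y_j}
    total = Fraction(0)
    for z_i in {z(w) for w in OMEGA}:
        A_i = {w for w in OMEGA if z(w) == z_i}
        total += z_i * (P_POINT * len(A_i & B_j)) / (P_POINT * len(B_j))
    return total

def cond_value_indicator(z, y, y_j):
    """E(z | y = y_j) from Eq. (3.18): E(z chi_B) / E(chi_B)."""
    chi_B = lambda w: 1 if y(w) == y_j else 0
    return E(lambda w: z(w) * chi_B(w)) / E(chi_B)

z = lambda w: w          # the face value
y = lambda w: w % 2      # parity: 0 for even, 1 for odd
print(cond_value_sets(z, y, 0), cond_value_indicator(z, y, 0))   # 4 4
print(cond_value_sets(z, y, 1), cond_value_indicator(z, y, 1))   # 3 3
```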

Moving from a conditional expectation value to a conditional expectation is only one step more complicated. If one is prepared to compute the conditional expectation value for every outcome of y, i.e. has computed all of the sets Bj and knows which have zero probability, one can write down a random variable that in some sense computes all of the conditional expectation values at once. The conditional expectation of z given y, written E(z|y), is a random variable that takes on the value $\mathbb{E}(z\,|\,y = y^{(j)})$ whenever $y = y^{(j)}$. This works as follows. For every outcome $y^{(j)}$ we have an underlying event Bj. Whenever P(Bj) ≠ 0 we can define the random variable

$$\mathbb{E}(z|y) \equiv \sum_j \mathbb{E}\big(z \,\big|\, y = y^{(j)}\big)\,\chi_{B_j}(\omega). \qquad (3.19)$$

Whenever P(Bj) = 0 we can give the conditional expectation value any value we wish, safe in the knowledge that the probability of obtaining such an arbitrary value is zero. A lot can be said about this object, but one of its most important properties is that it can be viewed as a reasonable estimate for z given information about y. More specifically, suppose you wanted to find a least-mean-square estimate for the value z would return when the outcome you receive, ω, results in $y(\omega) = y^{(j)}$. Generally speaking, there are many different ω for any value $y^{(j)}$, and the set of ω that gives this value may return multiple values for z. It turns out that if you want to make any estimate for a random variable z conditioned on the events given by another random variable y, then the estimate must have the form

$$\hat{z}(\omega) = \sum_{j=1}^{m} a_j\,\chi_{B_j}(\omega), \qquad (3.20)$$

where Bj are the events generated by the various outcomes of y and aj are constants. It should not be too surprising that the constants giving the least-mean-square estimate of z given y turn out to be the conditional expectation values, $a_j = \mathbb{E}(z\,|\,y = y^{(j)})$.
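The least-mean-square property is easy to check on a finite example. Assuming a fair die, z(ω) = ω² and y(ω) = ω mod 3 (all illustrative choices), the sketch below builds E(z|y) as in Eq. (3.19) and compares its mean-square error with that of other y-measurable estimates of the form (3.20), obtained by shifting one aj at a time. For this uniform measure, E(z|y = y^(j)) is just the average of z over Bj, and each shift by δ raises the error by exactly P(Bj)δ².

```python
from fractions import Fraction

OMEGA = range(1, 7)          # fair die (assumption)
P_POINT = Fraction(1, 6)

def conditional_expectation(z, y):
    """E(z|y) as a random variable, Eq. (3.19): on each B_j it equals the
    average of z over B_j (valid for this uniform measure)."""
    table = {}
    for y_j in {y(w) for w in OMEGA}:
        B_j = [w for w in OMEGA if y(w) == y_j]
        table[y_j] = sum(Fraction(z(w)) for w in B_j) / len(B_j)
    return lambda w: table[y(w)]

def y_measurable(y, a):
    """Estimate of the form (3.20): sum_j a_j chi_{B_j}, with a mapping y-values to a_j."""
    return lambda w: a[y(w)]

def mse(z, estimate):
    """Mean-square error E((z - estimate)^2)."""
    return sum(P_POINT * (z(w) - estimate(w)) ** 2 for w in OMEGA)

z = lambda w: w * w
y = lambda w: w % 3
ce = conditional_expectation(z, y)
best = {y(w): ce(w) for w in OMEGA}        # the constants a_j = E(z | y = y_j)
print(best[0], mse(z, ce))                 # 45/2 465/4

worse = []
for key in best:                           # shift one a_j at a time
    for delta in (Fraction(-1), Fraction(1, 2), Fraction(3)):
        a = dict(best)
        a[key] += delta
        worse.append(mse(z, y_measurable(y, a)) - mse(z, ce))
print(min(worse) > 0)                      # True: every shifted estimate does worse
```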

Any random variable that can be written in the form of Eq. (3.20) is called a y-measurable random variable. Note that y itself and any function of y can be written like this. The sets Bj can be used to construct a σ-algebra by taking all (countable) unions and complements; any random variable of the form (3.20) is then measurable with respect to that σ-algebra. The σ-algebra formed by these sets, written as Y or sometimes σ{y}, is called the σ-algebra generated by y. It is not difficult to show that with multiple random variables x, y, etc., one can form a σ-algebra σ{x, y, ...} generated by all of them, simply by taking unions and complements of all of their various events.
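On a finite space the generated σ-algebra can be built explicitly: the atoms are the sets on which the random variables are jointly constant, and σ{...} consists of all unions of atoms. The sketch below assumes a fair-die sample space with illustrative choices x(ω) = ω mod 2 and y(ω) = ω mod 3; `generated_sigma_algebra` is a hypothetical helper.

```python
from itertools import combinations

OMEGA = frozenset(range(1, 7))   # die outcomes (illustrative)

def generated_sigma_algebra(*rvs):
    """sigma{rv_1, ..., rv_k} on a finite space: all unions of the atoms on
    which the random variables are jointly constant."""
    atoms = {}
    for w in OMEGA:
        atoms.setdefault(tuple(rv(w) for rv in rvs), set()).add(w)
    atoms = [frozenset(a) for a in atoms.values()]
    return {frozenset().union(*combo)
            for r in range(len(atoms) + 1)
            for combo in combinations(atoms, r)}

x = lambda w: w % 2
y = lambda w: w % 3
sigma_y = generated_sigma_algebra(y)
sigma_xy = generated_sigma_algebra(x, y)
print(len(sigma_y), len(sigma_xy), sigma_y <= sigma_xy)   # 8 64 True
```

Here y has three atoms, {1, 4}, {2, 5} and {3, 6}, giving 2³ events, while x and y together separate every outcome, so σ{x, y} is the full power set and contains σ{y}.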

All of the above concepts, joint probabilities, the conditional expectation, gener-

ating σ-algebras from random variables, etc., can be extended to continuous random

variables by taking appropriate limits. More often than not, one tries to avoid taking

an actual limit, and instead looks for a random variable that is y-measurable and

satisfies the basic properties of the conditional expectation. For general reference, here

are some of those properties:

1. The conditional expectation is linear,

$$\mathbb{E}(\alpha x + \beta z\,|\,y) = \alpha\,\mathbb{E}(x|y) + \beta\,\mathbb{E}(z|y) \qquad (3.21)$$

for constants α and β.

2. The conditional expectation is consistent with usual expectation values,

$$\mathbb{E}\big(\mathbb{E}(x|y)\big) = \mathbb{E}(x). \qquad (3.22)$$

3. The conditional expectation of any y-measurable random variable x is x itself,

$$\mathbb{E}(x|y) = x. \qquad (3.23)$$

4. If z and y are independent, then the conditional expectation is just the expectation value of z (times the “identity”),

$$\mathbb{E}(z|y) = \mathbb{E}(z)\,\chi_\Omega(\omega). \qquad (3.24)$$

5. For any y-measurable random variable y′, the conditional expectation satisfies

$$\mathbb{E}\big(\mathbb{E}(z|y)\,y'\big) = \mathbb{E}(z\,y'). \qquad (3.25)$$

While this last property could be inferred from the second and third, we include it here because it is often what is used to “guess” the conditional expectation without going through a bare-bones construction.
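Properties (3.21) to (3.25) can all be spot-checked with exact arithmetic on a finite space. The sketch below assumes a fair die and defines E(z|y) through level-set averages, as in Eq. (3.19). The random variables used in the checks are arbitrary illustrative choices, including an indicator of {1, 2}, which happens to be independent of the parity ω mod 2.

```python
from fractions import Fraction

OMEGA = range(1, 7)          # fair die (assumption)
P_POINT = Fraction(1, 6)

def E(f):
    """Expectation on the die."""
    return sum(P_POINT * f(w) for w in OMEGA)

def cond_exp(z, y):
    """E(z|y) as a random variable, built from level-set averages (uniform measure)."""
    table = {}
    for y_j in {y(w) for w in OMEGA}:
        B_j = [w for w in OMEGA if y(w) == y_j]
        table[y_j] = sum(Fraction(z(w)) for w in B_j) / len(B_j)
    return lambda w: table[y(w)]

x = lambda w: 3 * w
z = lambda w: w * w
y = lambda w: w % 2                        # parity
indep = lambda w: 1 if w in (1, 2) else 0  # independent of parity on a fair die
y_prime = lambda w: (y(w) + 1) ** 2        # a y-measurable random variable

checks = {
    "(3.21)": all(cond_exp(lambda w: 2 * x(w) + 5 * z(w), y)(w)
                  == 2 * cond_exp(x, y)(w) + 5 * cond_exp(z, y)(w) for w in OMEGA),
    "(3.22)": E(cond_exp(z, y)) == E(z),
    "(3.23)": all(cond_exp(y_prime, y)(w) == y_prime(w) for w in OMEGA),
    "(3.24)": all(cond_exp(indep, y)(w) == E(indep) for w in OMEGA),
    "(3.25)": E(lambda w: cond_exp(z, y)(w) * y_prime(w)) == E(lambda w: z(w) * y_prime(w)),
}
print(all(checks.values()))   # True
```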

Much, much more can be said about the conditional expectation. The derivation of classical and quantum filtering theory is based upon computing a conditional expectation of some unobserved process xt based upon measurements of a correlated process yt. Unfortunately, we are unable to go into detail here, but the interested reader is encouraged to seek out a number of good references on the subject, some of which are [22, 25, 37, 53].

3.1.3 Special processes - time-adaption and martingales

Having discussed random variables and stochastic processes in terms of a classical

probability space (Ω,F ,P), we now need to introduce a couple of important and

useful processes. Chap. 2 already introduced the concept of a time-adapted process.

A stochastic process $\{x_t\}_{t\ge 0}$ is time-adapted when it depends only on events defined in the present or past and not on the future. Having introduced the concept of a measurable function and a σ-algebra over the cylinder sets C(t1, t2, ...; I1, I2, ...), we can easily give a precise meaning to a time-adapted process. A stochastic process $\{x_t\}_{t\ge 0}$ is time-adapted when each random variable xt is measurable with respect to the σ-algebra Ft generated by the cylinder sets C(t1, ..., tn; I1, ..., In) with all tk ≤ t. (Sometimes a more general definition avoids these specific cylinders and simply uses an increasing family of σ-algebras F0 ⊂ Fs ⊂ Ft ⊂ F for 0 ≤ s ≤ t, called a filtration [53]. A process adapted to this filtration is called Ft-adapted.) In the context of statistical estimation, working with time-adapted processes is essential, as these are the processes that do not depend on any future events. Within the class of time-adapted processes there is an additional type of stochastic process with special and simplifying characteristics in terms of its conditional statistics, called a martingale.

A martingale is an important kind of stochastic process that plays a crucial role in classical probability theory. More importantly for our purposes, martingales will play a crucial role in significantly simplifying a quantum conditional master equation, as we will see in Sec. 3.4.1. They are used to represent fair betting games, where no amount of past information is helpful in predicting future increments. The defining property is that the conditional expectation of any future value of the process is simply its current value: you expect to leave a fair casino with the same amount of money as you had when you entered2. In essence, a martingale is a random process where the conditional mean of any future increment is zero [22, 53, 60]. To illustrate this property, consider flipping a fair coin N times. A typical sequence ω may be something like

ω = {H, T, H, H, H, H, T, T, H, . . . }. (3.26)

2In contrast to a real casino.

For any sequence ω we can define a random variable x_n equal to the number of heads minus the number of tails seen in the first n flips, so that for this sequence

x = {1, 0, 1, 2, 3, 4, 3, 2, 3, . . . }. (3.27)

In the case of a fair coin, x_n is a martingale.

To see why, note that in each flip the coin is equally likely to land heads or tails, so that for any n we have the expectation value

E(x_n) = 0. (3.28)

In this specific realization, after the first four flips x4 is not 0 but is in fact 2. However, because future flips are independent of the past, we should not expect to see any more heads than tails from then on. This means that, conditioned upon the first four outcomes, we should not expect x_n for n ≥ 4 to average 0; instead it should average around 2. In other words

E(xm − xn| x1, x2 . . . xn) = 0 for all m ≥ n. (3.29)

This is the fundamental property of a martingale which is usually written as,

E(xm| x1, x2 . . . xn) = xn for all m ≥ n. (3.30)
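The martingale property (3.30) can be checked by brute force. This Python/NumPy sketch conditions on the observed prefix H, T, H, H from Eq. (3.26), simulates many independent fair-coin continuations, and averages x_m. The number of paths, the horizon, and the seed are arbitrary choices, so the averages agree with 2 only up to Monte Carlo error.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Observed prefix H, T, H, H (+1 = heads, -1 = tails), so x_4 = 2.
prefix = np.array([+1, -1, +1, +1])
x4 = int(prefix.sum())

# Many independent fair-coin continuations of that prefix.
n_paths, n_future = 100_000, 20
future = rng.choice([-1, 1], size=(n_paths, n_future))
x_m = x4 + np.cumsum(future, axis=1)        # x_5, ..., x_24 on every path

# Monte Carlo estimate of E(x_m | x_1, ..., x_4) for m = 5, ..., 24.
cond_mean = x_m.mean(axis=0)
print(x4, np.round(cond_mean[[0, 9, 19]], 2))   # each entry should be close to 2
```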

Now imagine that the coin is not in fact fair. Say that the probability for heads is PH = 2/5 and the probability for tails is PT = 3/5. In this case x_n is not a martingale, as we would instead expect x_n to trend negative; in other words, E(x_m| x1, x2 . . . xn) < x_n for m > n. But while x_n may not be a martingale, it is sometimes possible to construct from it another random variable that is a martingale. Here each flip changes x_n by +1 or −1 with mean PH − PT = −1/5, so subtracting this predictable drift gives the martingale M_n = x_n + n/5. A process that can be decomposed in this way into a martingale plus a finite-variation drift term is called a semi-martingale, and semi-martingales define the class of processes that can be used as integrators in an Itô integral. The fact that this is possible will play a crucial role in finding a maximum likelihood estimate in Chap. 5.
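For the biased coin, each flip changes x_n by +1 or −1 with mean P_H − P_T = −1/5, so the compensated process M_n = x_n − n(P_H − P_T) = x_n + n/5 should be a martingale even though x_n is not. A Python/NumPy sketch of that check, reusing the prefix H, T, H, H (sample sizes and seed are our own choices):

```python
import numpy as np

rng = np.random.default_rng(seed=2)
p_heads = 2 / 5
drift = p_heads - (1 - p_heads)               # mean change of x_n per flip, = -1/5

prefix = np.array([+1, -1, +1, +1])           # observed H, T, H, H
x4, n0 = int(prefix.sum()), len(prefix)
n_paths, n_future = 100_000, 20
steps = np.where(rng.random((n_paths, n_future)) < p_heads, 1, -1)
x_m = x4 + np.cumsum(steps, axis=1)           # x_5, ..., x_24
m = n0 + np.arange(1, n_future + 1)           # the indices 5, ..., 24

# x_m drifts downward, but the compensated process M_m = x_m - m * drift does not.
M_m = x_m - m * drift
M_4 = x4 - n0 * drift
print(x_m.mean(axis=0)[-1], x4 + (m[-1] - n0) * drift)   # both near -2
print(M_m.mean(axis=0)[-1], M_4)                          # both near 2.8
```

The only ingredient is the Doob-style split of each step into its conditional mean (the drift) and a zero-mean remainder; the remainder accumulates into the martingale.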

3.1.4 The Wiener process

One of the most important processes one can consider is the Wiener process, a mathematical model for Brownian motion named for Norbert Wiener. In addition to elegantly describing diffusive motion, the Wiener process is used to model nearly any system driven by white noise. Chap. 2 already found an application outside of diffusive motion, namely the statistics of the quadratures Qt and Pt under vacuum expectation. This section reviews the properties of the Wiener process, including how it relates to the classical probability space introduced in Sec. 3.1.

The defining characteristics of a Wiener process are twofold:

1. A Wiener process makes a continuous trajectory in time, with probability one.

2. A Wiener process has increments that are independent, mean zero, Gaussian

distributed random variables with a variance given by the increment’s time

duration.

The first property is self-explanatory, while the second is a bit more involved and requires some explanation. Consider the stochastic process {w_t}_{t≥0}. For times 0 < s < t, we can define the random variables a = w_s − w_0 and b = w_t − w_s. If {w_t}_{t≥0} is a Wiener process, then a and b are statistically independent, a is a mean zero Gaussian random variable with variance s, and b is a mean zero Gaussian random variable with variance t − s. Sec. 3.1.1 found that if a random process has statistically independent increments and each increment is mean zero, then it is a martingale, so the Wiener process is also a martingale. There are many more interesting and sometimes nonintuitive properties that one can calculate for a Wiener process, see [58, Chap. 2]. Two such facts are that (i) a Wiener process is nowhere differentiable with probability one and (ii) if at some time t a Wiener process takes on the value w(t) = a, then it will take on that value an infinite number of times in every interval [t, t + ∆t].
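The second defining property can be checked numerically. This Python/NumPy sketch builds discretized Wiener paths by summing independent Gaussian increments of variance dt (a simple discretization; the step size, the number of paths, and the times s = 0.3, t = 1 are our own choices) and estimates the means and variances of a and b and their covariance.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
n_paths, n_steps, dt = 10_000, 200, 0.005      # discretized paths on [0, 1]
dw = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
w = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dw, axis=1)], axis=1)  # w_0 = 0

s_idx, t_idx = 60, 200                          # grid indices of s = 0.3 and t = 1.0
a = w[:, s_idx] - w[:, 0]                       # increment over [0, s]
b = w[:, t_idx] - w[:, s_idx]                   # increment over [s, t]

# Theory: means 0, variances s = 0.3 and t - s = 0.7, covariance 0 (independence).
print(a.mean(), b.mean(), a.var(), b.var(), np.mean(a * b))
```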

The statistical properties of a Wiener process are deceptively simple, and yet exceedingly rich. The second defining property allows us to connect this simple statement to a nontrivial probability measure over the space of continuous functions. While we just used the times 0 < s < t to demonstrate what this property means, there is nothing stopping us from using a countable sequence of times 0 < t1 < t2 < . . . . We then know that for a Wiener process the random variables ∆w_i ≡ w_{t_i} − w_{t_{i−1}} are all independent mean zero Gaussian random variables with variances ∆t_i ≡ t_i − t_{i−1}. For each time t_i we can calculate the probability that the Wiener process lies in the interval I_i = (a_i, b_i). This ultimately turns out to be

$$P(w_{t_i} \in I_i) = \int_{a_1}^{b_1} dw_1 \int_{a_2}^{b_2} dw_2 \cdots \prod_i \left( \frac{1}{\sqrt{2\pi \Delta t_i}}\, e^{-\frac{(w_i - w_{i-1})^2}{2\Delta t_i}} \right), \tag{3.31}$$

where w_0 ≡ 0 and t_0 ≡ 0. This expression is known as Wiener’s discrete path integral [58]. Notice that by picking the sequence of times 0 < t1 < t2 < . . . and the intervals I_i = (a_i, b_i), we just defined a cylindrical set C(t1, t2, . . . ; I1, I2, . . . ). Eq. (3.31) is the probability for a continuous trajectory to lie within this set, and therefore we can use these integrals to define a probability measure on the space of continuous functions ω : R+ → R. Not surprisingly, this is called the Wiener measure,

$$P_W\big(C(t_1, t_2, \ldots; I_1, I_2, \ldots)\big) = P(w_{t_i} \in I_i). \tag{3.32}$$

It is worth noting that under this measure all trajectories that do not have ω(0) = 0 are given zero probability, i.e.,

$$P_W\big(\{\omega \in \Omega : \omega(0) = 0\}\big) = 1. \tag{3.33}$$

This brings us back to the probability space (Ω,F,P) defined over the continuous functions ω(t) with the σ-algebra F over the cylindrical sets. If the probability measure P is the Wiener measure as defined above, then the stochastic process {w_t(ω) = ω(t) : 0 ≤ t < ∞} is a Wiener process.
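For two times, Eq. (3.31) is an ordinary double integral, so it can be compared against direct sampling of the independent Gaussian increments. The Python/NumPy sketch below assumes t1 = 0.5, t2 = 1.5, I1 = (0, 1), and I2 = (−0.5, 2) (arbitrary choices), evaluates the path integral with a midpoint rule, and checks it against Monte Carlo; `gauss` is a helper defined here, not a library function.

```python
import numpy as np


def gauss(x, var):
    """Zero-mean Gaussian density with variance var."""
    return np.exp(-x**2 / (2 * var)) / np.sqrt(2 * np.pi * var)


t1, t2 = 0.5, 1.5
(a1, b1), (a2, b2) = (0.0, 1.0), (-0.5, 2.0)

# Eq. (3.31) for two times (w_0 = 0), evaluated with a midpoint rule on an n x n grid.
n = 800
h1, h2 = (b1 - a1) / n, (b2 - a2) / n
w1 = a1 + (np.arange(n) + 0.5) * h1
w2 = a2 + (np.arange(n) + 0.5) * h2
W1, W2 = np.meshgrid(w1, w2, indexing="ij")
p_path_integral = np.sum(gauss(W1, t1) * gauss(W2 - W1, t2 - t1)) * h1 * h2

# Monte Carlo: sample the independent increments w_t1 - w_0 and w_t2 - w_t1 directly.
rng = np.random.default_rng(seed=4)
n_samples = 400_000
w_t1 = rng.normal(0.0, np.sqrt(t1), n_samples)
w_t2 = w_t1 + rng.normal(0.0, np.sqrt(t2 - t1), n_samples)
inside = (a1 < w_t1) & (w_t1 < b1) & (a2 < w_t2) & (w_t2 < b2)
p_monte_carlo = inside.mean()

print(p_path_integral, p_monte_carlo)
```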

3.2 Quantum Probability Theory

These same concepts can also be applied to quantum theory either directly or with

some modification. The mathematics of quantum stochastic calculus and noncom-

mutative probability theory is a broad and detailed subject, one that is beyond

our scope. Reasonable introductions with an emphasis on filtering can be found in

[25, 61], with more detailed treatments in [23, 45, 52]. However, a certain amount of review is necessary in order to address the physical implications of the formalism.

Before discussing the truly quantum nature of noncommutative probability theory,

we will discuss its similarities with the classical theory.

3.2.1 Embedding the quantum into the classical

This section reviews how to construct a classical probability space from a set of mutually commuting quantum observables. The purpose of this review is twofold. First, the quantum filtering problem relies upon this kind of mapping. The continuous measurements we will be making are described by a set of mutually commuting operators that grows in time. The eigenvalues that we receive will be viewed as (read: mapped to) classical random variables on a classical probability space. The second reason for this review is to emphasize its limitations. In Chap. 5 the quantum filter is used to estimate an unknown initial state of a qubit. A natural tool in classical systems is the smoother, which is an estimate of an unobserved system at some past time, given measurements up to some current time. However, naïvely applying this classical technique violates a necessary condition that allows for the classical mapping.

In classical probability theory we found that random variables could be viewed

as functions mapping elements of the sample space to real numbers. At its most

practical level, quantum theory is used to predict the outcomes of experiments where

the measured observables are represented as Hermitian operators acting upon some

underlying Hilbert space. The first step in bringing classical probability theory to

the quantum is to formulate an analogy between classical random variables and

Hermitian operators.

classical ↔ quantum

x(ω) ↔ X

This is a natural analogy, as the basic operation in classical probability is to calculate

the expectation values of random variables.

The next association is that in the classical theory we have the probability measure P to calculate expectation values, while in the quantum theory we have the system state ρ. This analogy is best illustrated in a discrete example where the classical sample space is a set of d realizations, Ω = {ω1, . . . , ωd}. The σ-algebra F for this space is then the power set of Ω, and the probability measure P is completely described by the probabilities of the singleton events p_i = P({ω_i}). The classical expectation value of a simple random variable x(ω) in this space is then

$$E(x) = \sum_{i=1}^{d} x(\omega_i)\, P(\{\omega_i\}). \tag{3.34}$$

In the quantum case a system described by a Hilbert space H of dimension d is equipped with a positive, unit-trace density matrix ρ. Expectation values of operators X acting on H are, of course, calculated as

E(X) = Tr(ρX) (3.35)

which will sometimes also be notated as 〈X〉. Thus we have the correspondence

classical ↔ quantum

E(x) ↔ Tr(ρX).
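When ρ and X are diagonal in the same basis, Eq. (3.35) reduces exactly to Eq. (3.34) with p_i = ρ_ii. The following Python/NumPy sketch, with made-up numbers, checks this and also shows that coherences of ρ in the eigenbasis of X leave 〈X〉 unchanged, which anticipates the spectral mapping developed below.

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])             # P({omega_i}) for a three-point sample space
x = np.array([1.0, -2.0, 4.0])            # values x(omega_i)
E_classical = np.sum(x * p)               # Eq. (3.34)

rho = np.diag(p)                           # state diagonal in the eigenbasis of X
X = np.diag(x)
E_quantum = np.trace(rho @ X).real         # Eq. (3.35)

# Adding coherences in the eigenbasis of X changes rho but not <X>:
# only the diagonal of rho (the "classical" probabilities) enters Tr(rho X).
psi = np.ones(3) / np.sqrt(3)
rho_coh = 0.8 * rho + 0.2 * np.outer(psi, psi)
E_coh = np.trace(rho_coh @ X).real
print(E_classical, E_quantum, E_coh, np.sum(x * np.diag(rho_coh)))
```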

Rather than this loose analogy, a formal equivalence is possible where certain aspects

of quantum theory can be embedded into a classical probability space.

While the classical probability space has a fixed, albeit sometimes abstract, set of realizations Ω, identifying such a set in quantum mechanics is problematic. In the spirit of deterministic classical physics, the sample space Ω most often represents locally realistic fates of the system. The probability of observing certain events is given by the probability measure P, which acts on subsets of Ω. The utility of probability theory is that an event A = {ω ∈ Ω | a ≤ x(ω) ≤ b} has a concrete meaning for multiple random variables, not just x but also f(x). In addition, there could be another random variable y that takes on values c ≤ y ≤ d whenever x is observed in the interval [a, b], so that both correspond to the same underlying event A. Bell’s theorem makes clear that this locally realistic interpretation of Ω is not consistent with quantum mechanics.

A less ambitious task is to find a classical probability representation that is capable of describing the joint statistics of compatible observables. Compatibility of two observables X and Y means that [X, Y] = 0 and, more importantly, that they share a set of eigenvectors {|e_i〉}. For each eigenvector we have the projector P_i = |e_i〉〈e_i|, and the operators X and Y have the spectral decompositions

$$X = \sum_i x_i\, |e_i\rangle\langle e_i| \quad\text{and}\quad Y = \sum_i y_i\, |e_i\rangle\langle e_i|. \tag{3.36}$$

Note that the eigenvalues x_i and y_i need not be distinct, since X and Y may have degenerate eigenspaces.

In a d-dimensional system there are at most d distinct, mutually orthogonal projectors associated with a set of commuting operators C = {X, Y, Z, . . .}. If we label these projectors P_λ = |λ〉〈λ|, then

$$X = \sum_{\lambda=1}^{d} x_\lambda P_\lambda, \tag{3.37}$$

$$E(X) = \sum_{\lambda=1}^{d} x_\lambda \operatorname{Tr}(\rho P_\lambda). \tag{3.38}$$

The mapping between discrete quantum mechanics and classical probability is to associate the set of labels for the projectors with the sample space of a classical probability space. Then we have Ω = {λ_i : i = 1, . . . , d}, and F is simply the power set of Ω. From this assignment, the probability measure is simply the quantum expectation value of the associated projectors. For example, the probability for the event {λ1, λ2} is

$$P(\{\lambda_1, \lambda_2\}) = \operatorname{Tr}(\rho P_{\{\lambda_1,\lambda_2\}}) = \operatorname{Tr}(\rho\, |\lambda_1\rangle\langle\lambda_1|) + \operatorname{Tr}(\rho\, |\lambda_2\rangle\langle\lambda_2|). \tag{3.39}$$

In classical probability the simplest of simple random variables are the indicator functions χ_F (for every set F ∈ F), which correspond to the projectors in the X ↦ x(ω) mapping. This procedure is formalized in Theorem 2.4 of [25] and is summarized in Table 3.1.

Classical      Quantum
Ω              {λ1, λ2, . . .}
F              {{λ1}, {λ1, λ2}, . . .}
P({λi})        Tr(ρ Pλi)
x(ω)           X

Table 3.1: The spectral mapping between a set of commuting observables and a classical probability space.
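The construction summarized in Table 3.1 can be sketched numerically. Assuming a random common eigenbasis in d = 4 and hand-picked (partly degenerate) eigenvalues for X and Y, the Python/NumPy code below forms the classical measure P({λ}) = Tr(ρP_λ) and checks that it reproduces 〈X〉, the joint moment 〈XY〉, and the event probability of Eq. (3.39).

```python
import numpy as np

rng = np.random.default_rng(seed=5)
d = 4
# Random orthonormal basis {|lambda>} shared by the commuting observables X and Y.
Q, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
P = [np.outer(Q[:, k], Q[:, k].conj()) for k in range(d)]     # projectors P_lambda

x_vals = np.array([1.0, 1.0, 2.0, 3.0])     # degenerate eigenvalues are allowed
y_vals = np.array([0.0, 5.0, 5.0, 0.0])
X = sum(xv * Pk for xv, Pk in zip(x_vals, P))
Y = sum(yv * Pk for yv, Pk in zip(y_vals, P))
assert np.allclose(X @ Y, Y @ X)

# A random density matrix: positive and trace one.
G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = G @ G.conj().T
rho /= np.trace(rho).real

# Classical measure on Omega = {lambda_1, ..., lambda_d}: P({lambda}) = Tr(rho P_lambda).
prob = np.array([np.trace(rho @ Pk).real for Pk in P])

E_X_classical = np.sum(x_vals * prob)                  # Eq. (3.38)
E_X_quantum = np.trace(rho @ X).real
E_XY_classical = np.sum(x_vals * y_vals * prob)        # joint statistics of X and Y
E_XY_quantum = np.trace(rho @ X @ Y).real
P_event = np.trace(rho @ (P[0] + P[1])).real           # event {lambda_1, lambda_2}, Eq. (3.39)
print(prob.sum(), E_X_classical, E_X_quantum, P_event, prob[0] + prob[1])
```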

The above discussion also extends to infinite dimensional Hilbert spaces and operators with continuous spectra [25]. For any “normal” operator3 A, which may take on all values in R, there exists a spectral decomposition of A such that

$$A = \int_{\mathbb{R}} a\, P(da), \tag{3.40}$$

where P(da) is the spectral measure, also called a projection-valued measure, associated with A taking on values in the interval da. Explicitly constructing P(da) is often a little tricky, especially if one must first identify any vectors ψ in Hilbert space for which φ ≡ Aψ is not well behaved. (Hermitian operators whose eigenvalues span the entire real line, so-called unbounded operators, exhibit these kinds of problems. A trick for dealing with this case is to compute the spectral measure for the bounded operator T = (A + i1)^{−1}. This works because any function f(A) commutes with A, and therefore they share the same projectors. When T takes on the complex value λ, the corresponding value for A is a = λ^{−1} − i.) Once armed with a spectral measure for A, we can then find an equivalent classical probability model whose sample space consists of labels for the possible values of A.

3A normal operator is one that commutes with its adjoint and so has a spectral decomposition. It can be written in terms of commuting Hermitian and anti-Hermitian parts.
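The (A + i1)^{−1} trick can be seen in finite dimensions, with the caveat that every finite matrix is bounded, so this only illustrates the eigenvalue bookkeeping, not the analytic difficulty that motivates it. In the Python/NumPy sketch below, A is a random Hermitian matrix (our assumption); T is diagonalized and each eigenvalue λ is mapped back via a = λ^{−1} − i.

```python
import numpy as np

rng = np.random.default_rng(seed=6)
d = 5
G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
A = (G + G.conj().T) / 2                   # Hermitian (finite-dimensional, hence bounded) stand-in
T = np.linalg.inv(A + 1j * np.eye(d))      # T = (A + i 1)^(-1)

# Diagonalize T and map each eigenvalue lambda back to a = 1/lambda - i.
lam, V = np.linalg.eig(T)
a_from_T = 1.0 / lam - 1j

print(np.sort(a_from_T.real))
print(np.linalg.eigvalsh(A))               # same values, obtained directly
# T's eigenvectors are eigenvectors of A: A V = V diag(a).
assert np.allclose(A @ V, V * a_from_T)
assert np.allclose(a_from_T.imag, 0.0, atol=1e-9)
```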

Regardless of whether or not the associated operators are unbounded, we emphasize that this spectral mapping is only applicable to a subspace of operators that all commute with the underlying projectors. While this may seem to indicate that the mapping is severely limited, in practice it is extremely useful for describing ancilla-assisted measurements. If one is interested in computing conditional expectation values for operators that commute with the projectors defining the classical space, then the quantum-to-classical mapping is still applicable.

3.2.2 Quantum probability

The spectral mapping to a classical probability space lacks a representation that is independent of the specific choice of projectors. Furthermore, Bell’s theorem shows that there are no locally realistic sample spaces consistent with quantum mechanics. The first step in discussing quantum mechanics in a probability theoretic framework is therefore to omit the sample space Ω [62]. At the level of making practical calculations, the sample space provided an underlying structure for associating random variables with the probability measure. By observing a random outcome x(ω) = a, we were able to see which event this corresponds to and then calculate the probability for that event given the measure P. In other words, we identify the set of possible realizations that are compatible with this observation and then evaluate the probability of this event.

In quantum theory the underlying Hilbert space of the system, H, provides this necessary structure. By making the association between Hermitian operators and the results of experiments, we already have the necessary mapping between random variables and probabilities. In the above spectral mapping from quantum to classical, we associated events with collections of possible eigenvalues, and so, even in the infinite dimensional case, the probability of observing an event is given by the expectation value of the sum of the corresponding orthogonal projectors. In the fully quantum case, we need to consider all possible projections, not just those projections that commute.

The mathematical object that is guaranteed to contain all possible projections is a ∗-algebra (read “star”-algebra) of operators4. Therefore the quantum counterpart of the σ-algebra F of a classical probability space is a ∗-algebra 𝒜 of operators.

A ∗-algebra of operators acting on a Hilbert space H is defined as a set of operators 𝒜 such that

1. 𝒜 contains all complex linear combinations of its elements. For all A, B ∈ 𝒜 we also have C = c1A + c2B ∈ 𝒜 for any complex coefficients c1 and c2.

2. 𝒜 contains all adjoints of its elements. A ∈ 𝒜 implies that A† ∈ 𝒜.

3. 𝒜 contains all products of its elements. A, B ∈ 𝒜 implies AB ∈ 𝒜.

4. 𝒜 contains the identity 1.

4The ∗ in the name comes from the mathematical convention of using ∗ to represent an operator adjoint rather than †.

In the finite dimensional case where H = C^n, the largest ∗-algebra acting on H is simply M_n, the space of all complex n × n matrices. However, this algebraic structure is not introduced merely out of a love of mathematical formalism. In the same way that a set of classical random variables generates a σ-algebra, σ{x, y, z, . . .} (see Sec. 3.1.2), a set of operators generates a ∗-algebra. For example, the above spectral mapping means that there is a ∗-algebra of operators generated by the set of commuting projectors {P_λ}. Because the ∗-algebra generated from a set of commuting Hermitian operators is itself commutative, a commutative ∗-algebra plays the role of the set of events in a classical probability space.
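As an illustration of a generated commutative ∗-algebra, the following Python/NumPy sketch takes two commuting Hermitian matrices X and Y with degenerate spectra (hand-picked, in a random shared eigenbasis) and computes the dimension of the span of the products X^aY^b. That dimension equals the number of distinct joint eigenspaces, i.e., of distinct labels λ, rather than the Hilbert space dimension d = 4.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(seed=7)
d = 4
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))        # shared real orthonormal eigenbasis
X = Q @ np.diag([1.0, 1.0, 2.0, 2.0]) @ Q.T
Y = Q @ np.diag([0.0, 0.0, 0.0, 5.0]) @ Q.T
assert np.allclose(X @ Y, Y @ X)

# For commuting Hermitian X and Y, the *-algebra they generate is spanned by the
# monomials X^a Y^b (a = b = 0 gives the identity); low degrees suffice here.
mp = np.linalg.matrix_power
monomials = [mp(X, a) @ mp(Y, b) for a, b in product(range(d), repeat=2)]
dim = np.linalg.matrix_rank(np.array([M.ravel() for M in monomials]), tol=1e-8)

# Joint eigenvalue pairs are (1,0), (1,0), (2,0), (2,5): three distinct joint eigenspaces,
# so the algebra is 3-dimensional, spanned by the three joint-eigenspace projectors.
print(dim)   # 3

# Every element of this algebra commutes with every other.
assert all(np.allclose(M @ N, N @ M) for M in monomials for N in monomials)
```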

When H is infinite dimensional, defining a suitable ∗-algebra (or sub-∗-algebra) becomes much trickier. This is particularly true when trying to show that any limiting sequence of operators in the ∗-algebra is still in the algebra. As one might imagine, taking limits of unbounded operators becomes problematic, as a sequence of operators might converge when acting on one set of vectors but diverge when acting on another. The details of how one resolves these issues are beyond our scope. Suffice it to say, the solution is to first consider only bounded operators (while keeping the T = (A + i1)^{−1} trick in mind) and then include the limits of all the sequences of operators in the algebra that converge on a class of well-defined states. The technical name for such an algebra is a von Neumann algebra [25]. Generally, though, one hardly ever needs to apply this kind of construction directly.

One additional concept that is useful, particularly for discussing a quantum conditional expectation, is that of a commutant. Suppose you are given a set of operators S and you want to know which operators commute with every element of S. That set is called the commutant and is denoted S′. To see why this idea is important, consider the following. Suppose that you are able to measure the operators Y1, Y2, . . . , Yn and that all of these operators commute with each other. We just showed how to form a commutative von Neumann algebra from this set, but one might wonder if that is the whole story in a quantum-to-classical mapping. The answer turns out to be no. The wiggle room comes from the fact that the number of distinct joint eigenspaces of the operators Y1, Y2, . . . , Yn may be smaller than the dimension of the underlying Hilbert space, i.e., some joint eigenspaces may be degenerate. In that case, you can find operators A and B where [A, B] ≠ 0 and still have [A, Yi] = [B, Yi] = 0.

One example is a two qubit system. Suppose you only measure σz on one qubit but leave the other one alone. Then the projectors |+1〉〈+1| ⊗ 1 and |−1〉〈−1| ⊗ 1 form the singleton events in the classical probability model. But clearly any operator on the second qubit commutes with these projectors, and so there is more in this system than is wholly representable in a classical probability model. Nevertheless, because operators on the second qubit do commute with these projectors, it is possible to form a quantum conditional expectation of them conditioned upon the measurement of the first qubit. The commutant gives you the set of all possible operators that can be mapped onto a classical probability space through a quantum conditional expectation. We will briefly discuss this mapping next.
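The two-qubit example can be checked numerically. The Python/NumPy sketch below computes the dimension of a commutant as the null space of the linear maps A ↦ SA − AS (the helper `commutant_dim` is our own), and confirms that 1 ⊗ σx and 1 ⊗ σz lie in the commutant of the projectors |±1〉〈±1| ⊗ 1 without commuting with each other.

```python
import numpy as np


def commutant_dim(ops):
    """Dimension of the commutant {A : [A, S] = 0 for all S in ops}.

    Uses row-major vectorization, for which vec(S A) = (S kron 1) vec(A) and
    vec(A S) = (1 kron S^T) vec(A); the commutant is the joint null space.
    """
    d = ops[0].shape[0]
    eye = np.eye(d)
    K = np.vstack([np.kron(S, eye) - np.kron(eye, S.T) for S in ops])
    return d * d - np.linalg.matrix_rank(K)


I2 = np.eye(2)
sx = np.array([[0.0, 1.0], [1.0, 0.0]])
sz = np.diag([1.0, -1.0])
P_plus = np.kron(np.diag([1.0, 0.0]), I2)     # |+1><+1| (x) 1
P_minus = np.kron(np.diag([0.0, 1.0]), I2)    # |-1><-1| (x) 1

# The commutative algebra spanned by P_plus, P_minus is 2-dimensional, but its
# commutant is 8-dimensional: operators P_+ (x) B_+ + P_- (x) B_- with B_± arbitrary.
print(commutant_dim([P_plus, P_minus]))       # 8

# Two operators on the second qubit: both commute with the projectors, not with each other.
A, B = np.kron(I2, sx), np.kron(I2, sz)
in_commutant = all(np.allclose(M @ P, P @ M) for M in (A, B) for P in (P_plus, P_minus))
print(in_commutant, np.allclose(A @ B, B @ A))   # True False
```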

3.2.3 The quantum conditional expectation

In a classical probability space, if we are given a random variable y, or more generally a set of random variables {y_s : 0 ≤ s ≤ t}, the distinct outcomes of those variables form a set of events. From these events we are able to take unions and complements to make a σ-algebra, 𝒴 = σ{y_s : 0 ≤ s ≤ t}, representing the questions we can answer about the model given the observation of the random variables {y_s : 0 ≤ s ≤ t}. Then for every event A_i ∈ 𝒴 (assuming P(A_i) ≠ 0) we are able to compute the conditional expectation value of any random variable x via Eq. (3.18),

$$E(x|A_i) = \frac{E(x\,\chi_{A_i})}{E(\chi_{A_i})}. \tag{3.41}$$

In mapping quantum mechanics onto classical probability theory everything still applies, as long as the operators {Y_s : 0 ≤ s ≤ t} all mutually commute. If we have a projection P_λ generated from these operators, we can define a quantum conditional expectation value,

$$E(X|\lambda) \equiv \frac{\langle X P_\lambda\rangle}{\langle P_\lambda\rangle} = \frac{\operatorname{Tr}(\rho X P_\lambda)}{\operatorname{Tr}(\rho P_\lambda)}. \tag{3.42}$$

This equation shows why it is crucial that X and P_λ commute for this expression to make sense as a classical analogy. Not only do we need [X, P_λ] = 0 in order for X to be block diagonalized by the labels λ, but, for this expression to be interpretable as a conditional expectation value, we also need 〈X P_λ〉 = 〈P_λ X〉 for every λ. Classically, we went from a conditional expectation value to a conditional expectation by multiplying by the “projector” onto the event A_i, see Eq. (3.19). The same is true in the quantum case, as can be seen in a finite dimensional example.

From a set of mutually commuting observables Y1, Y2, . . . Yn, we can form a commutative ∗-algebra 𝒴 that is spanned by the orthogonal projectors {P_λ}. These projectors form a resolution of the identity, so that Σ_λ P_λ = 1. For any operator X in the commutant 𝒴′ of 𝒴, the quantum conditional expectation is defined as [25]

$$E(X|\mathcal{Y}) \equiv \sum_\lambda \frac{\langle X P_\lambda\rangle}{\langle P_\lambda\rangle}\, P_\lambda. \tag{3.43}$$

The quantum conditional expectation has a number of properties in common with the classical conditional expectation. Specifically,

$$E\big(E(X|\mathcal{Y})\big) = E(X), \tag{3.44}$$

$$E(Y_1 X Y_2|\mathcal{Y}) = Y_1\, E(X|\mathcal{Y})\, Y_2 \quad \forall\, Y_1, Y_2 \in \mathcal{Y}, \tag{3.45}$$

$$E(X|\mathcal{Y}) = E(X)\,1 \quad \forall\, X \text{ independent of } \mathcal{Y}. \tag{3.46}$$

The quantum conditional expectation also has some operator-specific properties,

$$E(1|\mathcal{Y}) = 1, \tag{3.47}$$

$$E(X^\dagger|\mathcal{Y}) = E(X|\mathcal{Y})^\dagger, \tag{3.48}$$

$$E(X^\dagger X|\mathcal{Y}) \ge 0. \tag{3.49}$$

Finally, an extremely important property that also carries over from the classical conditional expectation is that, for all Y ∈ 𝒴 and X ∈ 𝒴′,

$$E\big(E(X|\mathcal{Y})\, Y\big) = E(XY). \tag{3.50}$$

This property is as important here as it is in the classical case, because it is often used to identify5 what the conditional expectation should be when it is intractable to find an explicit representation for the P_λ. This is particularly true in the infinite dimensional case, where Eq. (3.50) is taken as the defining characteristic of the conditional expectation. In other words, if you can find an operator Z ∈ 𝒴 that satisfies E(Z Y) = E(X Y) for all Y ∈ 𝒴, then Z is the conditional expectation of X given 𝒴, Z = E(X|𝒴) [25].
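For the two-qubit example, Eq. (3.43) can be implemented directly and the properties (3.44), (3.45), and (3.47) to (3.50) checked numerically. This Python/NumPy sketch rests on our own assumptions: a random full-rank state ρ, an arbitrary non-Hermitian X chosen to lie in the commutant of the projectors |±1〉〈±1| ⊗ 1, and hypothetical helper names `cond_exp` and `E`. Property (3.46) is not tested here.

```python
import numpy as np

rng = np.random.default_rng(seed=9)
I2 = np.eye(2)
sx = np.array([[0.0, 1.0], [1.0, 0.0]])
sz = np.diag([1.0, -1.0])
projs = [np.kron(np.diag([1.0, 0.0]), I2), np.kron(np.diag([0.0, 1.0]), I2)]  # P_lambda


def cond_exp(X, projs, rho):
    """Eq. (3.43); outcomes with <P_lambda> = 0 are skipped."""
    out = np.zeros(X.shape, dtype=complex)
    for P in projs:
        p = np.trace(rho @ P)
        if abs(p) > 1e-12:
            out += np.trace(rho @ X @ P) / p * P
    return out


def E(X):
    """Expectation value in the (global) state rho."""
    return np.trace(rho @ X)


# Random full-rank two-qubit state.
G = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
rho = G @ G.conj().T
rho /= np.trace(rho).real

# X is in the commutant: diagonal on qubit 1 tensored with anything on qubit 2.
X = np.kron(sz, sx) + 0.5j * np.kron(I2, sz)
Y1 = 2.0 * projs[0] - 3.0 * projs[1]          # elements of the algebra spanned by the P_lambda
Y2 = 0.5 * projs[0] + 4.0 * projs[1]
cX = cond_exp(X, projs, rho)

assert np.isclose(E(cX), E(X))                                                    # (3.44)
assert np.allclose(cond_exp(Y1 @ X @ Y2, projs, rho), Y1 @ cX @ Y2)               # (3.45)
assert np.allclose(cond_exp(np.eye(4), projs, rho), np.eye(4))                    # (3.47)
assert np.allclose(cond_exp(X.conj().T, projs, rho), cX.conj().T)                 # (3.48)
assert np.all(np.linalg.eigvalsh(cond_exp(X.conj().T @ X, projs, rho)) > -1e-12)  # (3.49)
assert np.isclose(E(cX @ Y1), E(X @ Y1))                                          # (3.50)
print(np.round(cX, 3))
```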

3.2.4 The conditional expectation and generalized measurements

Before considering a specific example of a continuous-time quantum conditional expectation, we briefly pause to discuss the connection between the quantum conditional expectation and generalized measurements, as traditionally formulated in quantum mechanics. We argue here that these two ideas are essentially equivalent, and we specifically show that any measurement given in terms of a countable set of distinct Kraus operators {M_m} is equally well represented in terms of a quantum system mapped to a classical probability model. In particular, the posterior state ρ|m is a Schrödinger-picture version of a quantum conditional expectation value.

A general quantum measurement on a Hilbert space H is specified by a set of Kraus measurement operators {M_m}, where the indices m label the outcomes of the measurement [63]. The measurement operators are required to satisfy the completeness relation

$$\sum_m M_m^\dagger M_m = 1. \tag{3.51}$$

5i.e. guess and check.

The completeness relation means that {M_m†M_m} is a valid POVM, and in particular that under the state ρ the expectation values Tr(ρ M_m†M_m) define a probability measure for a sample space Ω = {1, . . . , m, . . .}. Upon receiving the outcome m, a mixed state ρ updates to the posterior state ρ|m via the map

$$\rho|_m \equiv \frac{M_m \rho M_m^\dagger}{\operatorname{Tr}(\rho M_m^\dagger M_m)}. \tag{3.52}$$
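As a concrete instance of Eqs. (3.51) and (3.52), consider a two-outcome weak σz measurement of a qubit. The Kraus operators below, with strength ε = 0.6, and the initial state are our own illustrative choices; the Python/NumPy sketch checks completeness, computes the outcome probabilities, and forms the posterior states.

```python
import numpy as np

eps = 0.6   # hypothetical measurement strength, 0 < eps < 1
M = [np.diag(np.sqrt([(1 + eps) / 2, (1 - eps) / 2])),
     np.diag(np.sqrt([(1 - eps) / 2, (1 + eps) / 2]))]
assert np.allclose(sum(Mm.conj().T @ Mm for Mm in M), np.eye(2))   # completeness, Eq. (3.51)

theta = 0.3                                     # hypothetical initial pure state
psi = np.array([np.cos(theta), np.sin(theta)])
rho = np.outer(psi, psi.conj())
sz = np.diag([1.0, -1.0])

probs = np.array([np.trace(rho @ Mm.conj().T @ Mm).real for Mm in M])   # POVM probabilities
posteriors = [Mm @ rho @ Mm.conj().T / p for Mm, p in zip(M, probs)]     # Eq. (3.52)
sz_post = np.array([np.trace(r @ sz).real for r in posteriors])

print(probs, np.trace(rho @ sz).real, sz_post)
```

Because these particular Kraus operators commute with σz, the probability-weighted average of the posterior 〈σz〉 equals the prior value, while each individual outcome shifts it up or down.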

Our claim is that there exists a Heisenberg-picture formulation in which the use of this posterior state is replaced by a conditional expectation. Proving this is not difficult if we use the fact that any generalized measurement is equivalent to a projective measurement performed on an ancillary system after an entangling unitary operation [63]. The equivalent Heisenberg/quantum probability picture is then to evolve all of the operators with the entangling unitary and then calculate a conditional expectation value for a post-interaction projector.

Specifically, every measurement outcome can be modeled as a state in an ancillary system with Hilbert space H_A, which has basis vectors {|m〉} corresponding to the outcomes of the measurement. Clearly, for this to make sense, the dimension of H_A must be at least as large as the number of measurement outcomes. The entangling unitary operator U then maps a fiducial pure state |0〉 of the ancilla to the basis vectors |m〉 and, when doing so, applies the operator M_m to the system. Thus, there always exists a unitary U such that for every system state vector ψ

$$U|\psi\rangle|0\rangle = \sum_{m'} M_{m'}|\psi\rangle|m'\rangle. \tag{3.53}$$

Applying the projector 1 ⊗ |m〉〈m| to the post-unitary state results in

$$(1 \otimes |m\rangle\langle m|)\, U|\psi\rangle|0\rangle = \sum_{m'} M_{m'}|\psi\rangle|m\rangle\langle m|m'\rangle = M_m|\psi\rangle|m\rangle. \tag{3.54}$$

In other words, by applying the projector |m〉〈m| to the post-interaction state, we have applied the measurement operator M_m to the system and projected the ancilla into the measurement eigenstate |m〉. For a general system state ρ, the probability of obtaining the outcome m is then given by

$$P(m) = \operatorname{Tr}\Big( U(\rho \otimes |0\rangle\langle 0|)U^\dagger\, (1 \otimes |m\rangle\langle m|) \Big) = \operatorname{Tr}\Big( (\rho \otimes |0\rangle\langle 0|)\, U^\dagger (1 \otimes |m\rangle\langle m|)\, U \Big). \tag{3.55}$$

Applying a unitary transformation to an operator does not change its spectrum, so a unitarily evolved projector is still a projector; in this case it is one that no longer acts solely on the ancilla.

The quantum probability description of the generalized measurement with operators {M_m} uses a Heisenberg-picture version of the “purification” of that measurement. Specifically, the commuting set of operators that we condition on is simply the set of unitarily evolved projectors

$$P_m \equiv U^\dagger (1 \otimes |m\rangle\langle m|)\, U.$$

It is not difficult to show that the expectation value of the system operator X in the posterior state ρ|m is equal to the conditional expectation value of the operator U†(X ⊗ 1)U conditioned on the projector P_m, under the joint state ρ ⊗ |0〉〈0|. In other words, we have the equality

$$\operatorname{tr}_{\mathrm{sys}}(\rho|_m X) = \frac{\operatorname{Tr}\big(\rho \otimes |0\rangle\langle 0|\; U^\dagger (X \otimes 1) U\, P_m\big)}{\operatorname{Tr}\big(\rho \otimes |0\rangle\langle 0|\; P_m\big)} = E\big(U^\dagger (X \otimes 1) U \,\big|\, P_m\big). \tag{3.56}$$
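Equation (3.56) can also be verified numerically. The Python/NumPy sketch below uses an illustrative two-outcome weak σz measurement of a qubit (strength ε = 0.6 and the state are hypothetical choices), builds one particular unitary satisfying Eq. (3.53) by completing the isometry V = Σ_m M_m ⊗ |m〉 to a unitary with a QR decomposition (the completion is not unique, and `dilation_unitary` is our own helper), and compares both sides of Eq. (3.56) for X = σx.

```python
import numpy as np


def dilation_unitary(kraus):
    """Unitary U on H (x) H_A with U|psi>|0> = sum_m M_m|psi>|m>, as in Eq. (3.53).

    Ordering is system (x) ancilla; dim H_A equals the number of Kraus operators.
    """
    d, k = kraus[0].shape[0], len(kraus)
    # Isometry V = sum_m M_m (x) |m>, shape (d*k, d); V^dagger V = 1 by completeness.
    V = sum(np.kron(Mm, np.eye(k)[:, [m]]) for m, Mm in enumerate(kraus))
    Q, _ = np.linalg.qr(V, mode="complete")     # Q[:, d:] spans the complement of range(V)
    cols0 = [j * k for j in range(d)]           # columns acting on |j>|0>
    rest = [c for c in range(d * k) if c not in cols0]
    U = np.zeros((d * k, d * k), dtype=complex)
    U[:, cols0] = V
    U[:, rest] = Q[:, d:]
    return U


eps = 0.6                                         # hypothetical measurement strength
M = [np.diag(np.sqrt([(1 + eps) / 2, (1 - eps) / 2])),
     np.diag(np.sqrt([(1 - eps) / 2, (1 + eps) / 2]))]
U = dilation_unitary(M)
assert np.allclose(U.conj().T @ U, np.eye(4))

psi = np.array([np.cos(0.3), np.sin(0.3)])        # hypothetical system state
joint_in = np.kron(psi, np.array([1.0, 0.0]))     # |psi>|0>
assert np.allclose(U @ joint_in, sum(np.kron(Mm @ psi, np.eye(2)[m]) for m, Mm in enumerate(M)))

rho = np.outer(psi, psi.conj())
rho_joint = np.kron(rho, np.diag([1.0, 0.0]))     # rho (x) |0><0|
X = np.array([[0.0, 1.0], [1.0, 0.0]])            # system observable sigma_x
X_H = U.conj().T @ np.kron(X, np.eye(2)) @ U      # U^dagger (X (x) 1) U

results = []
for m, Mm in enumerate(M):
    proj = np.zeros((2, 2))
    proj[m, m] = 1.0
    P_m = U.conj().T @ np.kron(np.eye(2), proj) @ U                   # Heisenberg-picture projector
    p_direct = np.trace(rho @ Mm.conj().T @ Mm).real                   # Tr(rho M_m^dagger M_m)
    lhs = np.trace(Mm @ rho @ Mm.conj().T @ X).real / p_direct         # tr_sys(rho|m X), Eq. (3.52)
    rhs = (np.trace(rho_joint @ X_H @ P_m) / np.trace(rho_joint @ P_m)).real   # Eq. (3.56)
    results.append((p_direct, np.trace(rho_joint @ P_m).real, lhs, rhs,
                    np.allclose(X_H @ P_m, P_m @ X_H)))
    print(m, p_direct, lhs, rhs)
```

The last entry of each tuple records that U†(X ⊗ 1)U commutes with P_m, which is what makes the right-hand side a legitimate conditional expectation.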

3.3 Quantum Filtering Theory

Quantum filtering theory has a particularly grandiose title, but in actuality it is not much more than what we have already developed here. Bouten et al. wrote an award-winning introduction to quantum filtering and quantum stochastic calculus [25], and this section does little more than quote their final results. The quantum filter is in essence nothing more than the conditional expectation of a system observable X, based upon a light observable, e.g. Q^i_t, after both have interacted through a unitary U_t. The two light measurements that are typically considered are a measurement of an output quadrature, e.g. U_t† Q^i_t U_t, or a direct photon number measurement, e.g. U_t† Λ^{ii}_t U_t. Here we have focused on classical and quantum diffusion, and so we will assume that we are measuring the quadrature Q^i_t. In addition, to simplify the notation, we assume that we are considering a single field mode and will drop the label i. More general expressions are not difficult to derive once the formalism is in place; for examples see [64].

The quantum filter for a time-independent system observable X is written as a time-indexed map π_t(X) and is the conditional expectation of the unitarily evolved operator U_t†XU_t, conditioned on the (continuous) set of measurements of an output process {Y_t}_{t≥0}. In the diffusive case

$$Y_t \equiv U_t^\dagger Q_t U_t = U_t^\dagger (A_t + A_t^\dagger) U_t. \tag{3.57}$$

When U_t describes a general single-mode interaction, it is the solution to the QSDE given in Appendix C, Eq. (C.27). In the 1D case with no scattering interactions we are able to calculate that

$$Y_t = Q_t + \int_0^t U_s^\dagger (L + L^\dagger) U_s\, ds. \tag{3.58}$$

The general expressions for the unitary evolution of any system operator X and of the fundamental field processes A^j_t, A^{i†}_t, and Λ^{ij}_t are given in Appendix C, Sec. C.1.1.

Sec. 2.5.2 showed that under vacuum expectation Q_t has the statistics of a Wiener process. Because of this, one may be tempted to interpret Eq. (3.58) as the time integral of a system operator plus quantum white noise. We urge the reader to avoid this temptation because, as Sec. C.1.1 shows, U_t†(L + L†)U_t is generally a very complicated expression involving integrals with respect to dΛ_t, dA_t, and dA_t†. Y_t is a fully coherent operator acting on the joint Hilbert space H ⊗ F(h[0,t]) and does not generally commute with Q_t.

It is, however, not difficult to show that Y_t commutes with itself at different times, i.e. [Y_t, Y_s] = 0 for any times t and s. Therefore a continuous observation of Y between the times 0 ≤ s ≤ t yields a set of commuting observables {Y_s : 0 ≤ s ≤ t}. This set of observations can then be used to form a commutative von Neumann algebra 𝒴_t. The quantum filter π_t(X) is then given by the conditional expectation

$$\pi_t(X) \equiv E(U_t^\dagger X U_t \,|\, \mathcal{Y}_t). \tag{3.59}$$

Finding an expression for π_t(X) requires implementing the conditional expectation in the form given in Eq. (3.50). Note that, in general, the conditional expectation depends upon the properties of the joint system-field state, and so you will arrive at different filtering equations if the field is in vacuum [25], a coherent state [65], or a state with nonclassical photon statistics [66]. The quadrature measurement of a single mode under vacuum expectation is arguably the simplest of all cases, and it is what we will use exclusively here. The bottom-line result is that the quantum filter for any system operator X is given by the recursive QSDE

$$d\pi_t(X) = \pi_t\big(\mathcal{L}_{00}(X)\big)\, dt + \Big(\pi_t(L^\dagger X + X L) - \pi_t(L^\dagger + L)\, \pi_t(X)\Big)\Big(dY_t - \pi_t(L + L^\dagger)\, dt\Big), \tag{3.60}$$

with the initial condition π0(X) = E(X). This is very analogous to the classical

Kushner-Stratonovich equation of nonlinear filtering [25]. The operator map

L00(X) = +i[H,X] + L†XL− 12L†LX − 1

2X L†L (3.61)

is the 00 Evans-Hudson map (see Sec. C.1.1) and is essentially the Heisenberg-picture version of the Lindblad master equation. A serious drawback of the quantum filter is that, because it is recursive, the resulting system of equations very rarely closes. To propagate Eq. (3.60) for the operator $X$, we must also calculate in parallel the filters for the operators $A = L^\dagger + L$, $B = L^\dagger X + X L$, and $C = \mathcal{L}_{00}(X)$. It is also highly likely that the set of system operators needed is not spanned by these four operators alone: to propagate $\pi_t(A)$, for example, we also need the filter $\pi_t(\mathcal{L}_{00}(A))$, which will itself likely generate more complicated operators. Fortunately, a saving grace is that we can invert this equation to find an effective "noisy" system state $\rho_t$. The equation of motion for $\rho_t$ is the conditional master equation, which we discuss in Sec. 3.4.
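The non-closure can be made concrete with a small sketch in Python/NumPy (our choice). Starting from $X$ and $A = L + L^\dagger$, it repeatedly applies the two maps that appear in Eq. (3.60), namely $\mathcal{L}_{00}$ and $Z \mapsto L^\dagger Z + Z L$. It then counts how many linearly independent operators the hierarchy of filters touches. The example system is a hypothetical choice made only for illustration: a single spin 1 with $H = J_x$, $L = \sqrt{\kappa}\,J_z$, $X = J_z$ and $\kappa = 1$. The helper names are likewise our own.

```python
import numpy as np


def spin_operators(spin):
    """Return (Jx, Jy, Jz) for spin `spin` in the basis m = spin, spin-1, ..., -spin."""
    m = np.arange(spin, -spin - 1, -1)
    jz = np.diag(m).astype(complex)
    # <m+1| J+ |m> = sqrt(s(s+1) - m(m+1)), stored on the first superdiagonal
    jp = np.diag(np.sqrt(spin * (spin + 1) - m[1:] * (m[1:] + 1)), k=1).astype(complex)
    jx = (jp + jp.conj().T) / 2
    jy = (jp - jp.conj().T) / 2j
    return jx, jy, jz


def l00(x, h, l):
    """Evans-Hudson map of Eq. (3.61): i[H,X] + L^dag X L - (L^dag L X + X L^dag L)/2."""
    ld = l.conj().T
    return 1j * (h @ x - x @ h) + ld @ x @ l - 0.5 * (ld @ l @ x + x @ ld @ l)


def closure_dimension(x, h, l, tol=1e-9):
    """Number of linearly independent operators whose filters Eq. (3.60) drags in."""
    basis = []                                   # independent operators found so far
    queue = [x, l + l.conj().T]                  # X and A = L + L^dag
    while queue:
        z = queue.pop(0)
        stacked = np.array([op.ravel() for op in basis + [z]])
        if np.linalg.matrix_rank(stacked, tol=tol) > len(basis):
            basis.append(z)
            queue.append(l00(z, h, l))                # pi_t(Z) needs pi_t(L00(Z))
            queue.append(l.conj().T @ z + z @ l)      # ... and pi_t(L^dag Z + Z L)
    return len(basis)


jx, jy, jz = spin_operators(1)
kappa = 1.0
dim = closure_dimension(jz, h=jx, l=np.sqrt(kappa) * jz)
print(dim)
```

In this example $A \propto X$, so the four operators listed above are only three independent ones. Even so, the count keeps growing past four. For example, $\mathcal{L}_{00}(B) \propto J_y J_z + J_z J_y$ is already a new operator.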

Before doing so, we would like to highlight one important issue that draws a strong distinction between quantum and classical filtering. In the classical case the filter is one of several quantities that one may wish to compute conditioned on an observation process $y_t$. Another is the smoother, which is defined classically as

$$\pi_{s,t}(x) \equiv \mathbb{E}\big(x_s \,\big|\, \{y_{t'} : 0 \le t' \le t\}\big) \quad \text{for } s \le t. \tag{3.62}$$

Classically this is a perfectly well-defined thing to do, as long as $x_s$ is measurable with respect to the $\sigma$-algebra defining the global probability space $(\Omega, \mathcal{F}, \mathbb{P})$. One would then naturally be tempted to define a quantum mechanical smoother,

$$\pi_{s,t}(X) \equiv \mathbb{E}\big(U_s^\dagger X U_s \,\big|\, \mathscr{Y}_t\big) \quad \text{for } s < t. \tag{3.63}$$

Unfortunately, this object is not well defined for a general system operator $X$, because $U_s^\dagger X U_s$ is not, in general, in the commutant of $\mathscr{Y}_t$. To see why, recall that $Y_t = Q_t + \int_0^t U_r^\dagger (L + L^\dagger) U_r\, dr$, which certainly has support on the system Hilbert space via the time integral of $U_r^\dagger (L + L^\dagger) U_r$. There is no guarantee that $[U_s^\dagger X U_s, U_t^\dagger (L + L^\dagger) U_t] = 0$ for arbitrary $X$ and times $t > s$. The reason that $U_t^\dagger X U_t$ is in the commutant of $\mathscr{Y}_t$ is that one can show $U_s^\dagger Q_s U_s = U_t^\dagger Q_s U_t$ for $s \le t$. This property then gives

$$[U_t^\dagger X U_t, Y_s] = [U_t^\dagger X U_t, U_t^\dagger Q_s U_t] = U_t^\dagger [X, Q_s] U_t = 0 \tag{3.64}$$

for $s \le t$, where the last equality holds because $X$ acts on the system while $Q_s$ acts on the field. This means that the post-interaction system operator at time $t$ can be conditioned on past measurements. However, the same "advancement" trick is not available for a system observable at an earlier time $s < t$. The identity only allows the unitary to be advanced to later times, so $Y_r$ with $r > s$ cannot in general be rewritten as $U_s^\dagger Q_r U_s$. Therefore there is no guarantee that Eq. (3.63) is well defined. If one simply threw caution to the wind and carried out a smoothing calculation even though $U_s^\dagger X U_s$ is not in the commutant of $\mathscr{Y}_t$, it is quite possible that conditioning would map positive operators to negative ones, or even Hermitian observables to non-Hermitian operators [25].

Tsang proposed a time-symmetric quantum smoother that performs a smoothing operation for a classical signal imprinted on a quantum system [67]. In that case the smoother computes a conditional estimate of the classical signal, which commutes with both the system and field operators. In Chap. 5 we wish to form an estimate of the system state at the initial time $t = 0$ given measurements up to time $t$. One might be tempted to formulate a quantum smoothing equation $\mathbb{E}(X \,|\, \mathscr{Y}_t)$, but, as we have just shown, such an object is not in general well defined. We therefore have to resort to different methods.

3.4 The Conditional Master Equation

Sec. 3.3 showed how one can form a conditional estimate of system observables based on a measurement of an output light quadrature, using the Heisenberg-picture formalism of quantum probability. A serious drawback is that the filtering equations are recursive and hardly ever close. The remedy is to convert to a randomized Schrödinger picture and work with a conditional master equation (CME).

We know from Sec. 3.3 that every commutative algebra of operators can be mapped to a classical probability space. We also know from the definition of the conditional expectation that the filter $\pi_t(X) = \mathbb{E}(U_t^\dagger X U_t \,|\, \mathscr{Y}_t)$ is an operator in $\mathscr{Y}_t$. Hence, if we construct a classical probability space $(\Omega, \mathcal{F}, \mathbb{P})$ for $\mathscr{Y}_t$, the filter $\pi_t(X)$ is representable as a random variable on that space. Furthermore, in a given experiment, the eigenvalues we obtain from measuring $Y$ form a realization of a classical stochastic process $y_t$ defined on the same probability space.

In practice, this means that we now focus solely on system variables and treat the measurement record $y_t$ as a classical stochastic process. It is in this sense that we call the conditional master equation a semiclassical equation: it treats the output measurements $\{Y_t\}_{t \ge 0}$ as a classical stochastic process while the system undergoes a noisy quantum evolution. In our opinion, it cannot be overemphasized that this process originates as a quantum object, so not every operator commutes with $\mathscr{Y}_t$. In particular, past system observables do not.

With that warning to tread lightly, finding a semiclassical equation for a noisy system state $\rho_t$ is remarkably easy. We begin by enforcing that, for every system operator $X$,⁶

$$\mathrm{Tr}(\rho_t X) \cong \pi_t(X). \tag{3.65}$$

To find an SDE for $\rho_t$, we simply note two things. First, every term of Eq. (3.60) contains a coefficient $\pi_t(Z)$ for some system operator $Z$, and each such coefficient corresponds to $\mathrm{Tr}(\rho_t Z)$. Second, the only quantum stochastic differential in Eq. (3.60) is $dY_t$, which, from Eq. (3.58), satisfies the quantum Ito rule

$$dY_t\, dY_t = dt. \tag{3.66}$$

Therefore, under the semiclassical mapping, $dy_t$ obeys the Ito rule

$$dy_t\, dy_t = dt. \tag{3.67}$$
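The Ito rule (3.67) can be illustrated with a short Python/NumPy sketch, with language and parameters chosen by us. Consider a discretized record $\Delta y_k = a\,\Delta t + \Delta w_k$, where the $\Delta w_k$ are independent Gaussians of variance $\Delta t$. The sum $\sum_k (\Delta y_k)^2$ then approaches the elapsed time, whatever the drift $a$. The constant drift is a hypothetical stand-in for $\mathrm{Tr}((L + L^\dagger)\rho_t)$.

```python
import numpy as np


def quadratic_variation(drift, t_final=1.0, n_steps=100_000, seed=0):
    """Sum of squared increments of dy = drift*dt + dw over [0, t_final]."""
    rng = np.random.default_rng(seed)
    dt = t_final / n_steps
    dw = rng.normal(0.0, np.sqrt(dt), n_steps)   # Wiener increments, variance dt
    dy = drift * dt + dw                         # discretized measurement record
    return np.sum(dy**2)


for a in (0.0, 5.0):
    print(a, quadratic_variation(a))   # both close to t_final = 1.0
```

The drift enters only through the cross terms $a\,\Delta t\,\Delta w_k$ and the terms $a^2 \Delta t^2$. Their sums vanish as $\Delta t \to 0$, which is the discrete counterpart of $dt\,dy_t = 0$ and $dt\,dt = 0$.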

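With these two observations we have

$$\mathrm{Tr}(d\rho_t X) = \mathrm{Tr}\big(\rho_t\, \mathcal{L}_{00}(X)\big)\,dt + \Big(\mathrm{Tr}\big(\rho_t (L^\dagger X + X L)\big) - \mathrm{Tr}\big(\rho_t (L^\dagger + L)\big)\,\mathrm{Tr}(\rho_t X)\Big)\Big(dy_t - \mathrm{Tr}\big(\rho_t (L^\dagger + L)\big)\,dt\Big). \tag{3.68}$$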

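Using the cyclic property of the trace, we can move $\mathcal{L}_{00}$ off of $X$ and onto $\rho_t$ as its adjoint map,

$$\mathrm{Tr}\big(\rho_t\, \mathcal{L}_{00}(X)\big) = \mathrm{Tr}\Big(\big(-i[H_t, \rho_t] + L \rho_t L^\dagger - \tfrac{1}{2} L^\dagger L \rho_t - \tfrac{1}{2} \rho_t L^\dagger L\big) X\Big). \tag{3.69}$$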
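The trace identity (3.69) can be sanity-checked numerically. The hypothetical Python/NumPy sketch below uses a random Hamiltonian, a random non-Hermitian $L$, a random density matrix and a random Hermitian observable $X$. The dimension 4 is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(7)
d = 4                                              # arbitrary Hilbert-space dimension


def random_complex(*shape):
    return rng.normal(size=shape) + 1j * rng.normal(size=shape)


def l00(x, h, l):
    """Heisenberg-picture map of Eq. (3.61), acting on the observable X."""
    ld = l.conj().T
    return 1j * (h @ x - x @ h) + ld @ x @ l - 0.5 * (ld @ l @ x + x @ ld @ l)


def l00_adjoint(rho, h, l):
    """Schrodinger-picture map inside the trace on the right of Eq. (3.69)."""
    ld = l.conj().T
    return -1j * (h @ rho - rho @ h) + l @ rho @ ld - 0.5 * (ld @ l @ rho + rho @ ld @ l)


m = random_complex(d, d)
h = (m + m.conj().T) / 2                           # random Hamiltonian
l = random_complex(d, d)                           # general, non-Hermitian L
m = random_complex(d, d)
rho = m @ m.conj().T
rho /= np.trace(rho)                               # random density matrix
m = random_complex(d, d)
x = (m + m.conj().T) / 2                           # random observable

lhs = np.trace(rho @ l00(x, h, l))
rhs = np.trace(l00_adjoint(rho, h, l) @ x)
print(np.isclose(lhs, rhs))                        # True
```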

⁶ Mathematically, this equivalence may seem strange, as the left-hand side is a scalar-valued random variable while the right-hand side is an operator in $\mathscr{Y}_t$. The equivalence is made through the classical outcome $\omega$ that labels the set of eigenvalues we receive from the measurement.

By transforming the remaining terms in the same way and noting that the resulting identity holds for every system operator $X$, we arrive at the conditional master equation

$$d\rho_t = -i[H_t, \rho_t]\,dt + \mathcal{D}[L](\rho_t)\,dt + \mathcal{H}[L](\rho_t)\,dv_t, \tag{3.70}$$

with the initial condition $\rho_0 = \rho(0)$, where we have made the following definitions. $\mathcal{D}[L](\rho_t)$ is the Lindblad dissipator commonly found in open quantum systems and is defined as

$$\mathcal{D}[L](\rho_t) \equiv L \rho_t L^\dagger - \tfrac{1}{2} L^\dagger L \rho_t - \tfrac{1}{2} \rho_t L^\dagger L. \tag{3.71}$$

$\mathcal{H}[L](\rho_t)$ is the state update map defined as

$$\mathcal{H}[L](\rho_t) \equiv L \rho_t + \rho_t L^\dagger - \mathrm{Tr}\big((L + L^\dagger) \rho_t\big)\, \rho_t. \tag{3.72}$$

This map determines how the state is updated, weighted by the increments of the stochastic process

$$dv_t = dy_t - \mathrm{Tr}\big((L + L^\dagger) \rho_t\big)\, dt. \tag{3.73}$$

The random process $v_t$, called the innovation process, plays an important role because it is the only random contribution to the CME. In the next section we review the proof that, when everything about the measurement $y_t$ is properly specified, $v_t$ is a Wiener process.
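As a minimal sketch, the Ito CME (3.70) can be integrated with a plain Euler-Maruyama step. The illustrative assumptions are a single qubit with $L = \sqrt{\kappa}\,J_z$, $J_z = \sigma_z/2$, a constant Hamiltonian $H = \Omega J_x$, and an initial state $|{+x}\rangle$. The innovation increments are drawn directly as Wiener increments, which is legitimate only when the filter is consistent with the true state (Sec. 3.4.1). Parameter values and function names are hypothetical. Each Euler-Maruyama step preserves the trace and Hermiticity of $\rho_t$, but positivity only approximately, so the time step must be kept small.

```python
import numpy as np


def cme_euler_step(rho, h, l, dt, dv):
    """One Euler-Maruyama step of the Ito CME, Eq. (3.70)."""
    ld = l.conj().T
    commutator = h @ rho - rho @ h
    dissipator = l @ rho @ ld - 0.5 * (ld @ l @ rho + rho @ ld @ l)   # Eq. (3.71)
    mean = np.trace((l + ld) @ rho).real                               # Tr((L + L^dag) rho)
    update = l @ rho + rho @ ld - mean * rho                           # Eq. (3.72)
    return rho + (-1j * commutator + dissipator) * dt + update * dv


def simulate_qubit_cme(kappa=1.0, omega=2.0, t_final=2.0, dt=1e-4, seed=3):
    rng = np.random.default_rng(seed)
    jx = np.array([[0, 1], [1, 0]], dtype=complex) / 2
    jz = np.array([[1, 0], [0, -1]], dtype=complex) / 2
    h, l = omega * jx, np.sqrt(kappa) * jz
    rho = np.array([[1, 1], [1, 1]], dtype=complex) / 2   # |+x><+x| (assumed initial state)
    for _ in range(int(round(t_final / dt))):
        dv = rng.normal(0.0, np.sqrt(dt))                 # innovation assumed Wiener
        rho = cme_euler_step(rho, h, l, dt, dv)
    return rho


rho_T = simulate_qubit_cme()
print(np.trace(rho_T).real, np.allclose(rho_T, rho_T.conj().T))
```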

3.4.1 The innovation process

Here we show that the innovation process $v_t$ transforms $y_t$ into a Wiener process by subtracting off its conditional expected mean. In classical probability, Lévy's theorem is an important result because it gives necessary and sufficient conditions for a given process to be a Wiener process. Roughly stated, if a continuous stochastic process $m_t$ with $m_0 = 0$ is a "local martingale" and obeys the Ito rule $(dm_t)^2 = dt$, then it must be a Wiener process [25]. Martingales are a class of stochastic processes that play a crucial role in classical probability theory (see Sec. 3.1.1). In essence, a martingale is a random process for which the conditional mean of any future increment, given the past, is zero [22, 53].

The proof that $v_t$ is a Wiener process is given in Theorem 7.1 of Ref. [25] and relies on some fundamental properties of the conditional expectation. We quote this result in Lemma 3.1 for two reasons. The first is simply that it is easily shown and is a rather elegant result. The second is that Chap. 5 relies on the fact that $v_t$ is a Wiener process only when $\rho_t$ is "consistent" with the actual statistics of $\{y_t\}_{t \ge 0}$. Here consistency means that the correspondence $\mathrm{Tr}(\rho_t X) \cong \pi_t(X)$ holds in the sense that $\pi_t(X)$ is the conditional expectation of $X$ with respect to $\mathscr{Y}_t$ under the true quantum state. If $\rho_t$ does not exactly match $\pi_t(\cdot)$, because its initial condition is wrong or because of any number of other approximations, then $v_t$ will not generally have the statistics of a Wiener process. See Secs. 5.4 and 5.5 for further discussion.
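The loss of the Wiener property under a wrong initial condition can be seen in a toy Python/NumPy sketch. The setup is entirely our illustrative assumption: a qubit with $L = \sqrt{\kappa}\,J_z$ and $H = 0$ whose true state is the $J_z = +\tfrac12$ eigenstate. The measurement leaves that state unchanged, so every true record is $dy_t = \sqrt{\kappa}\,dt + dw_t$. A CME started from the true state returns innovations $dv_t = dw_t$ exactly. A CME started from the maximally mixed state produces innovations with a positive mean, since $dv_t = 2\sqrt{\kappa}\,(1 - \langle{\uparrow}|\rho_t|{\uparrow}\rangle)\,dt + dw_t$.

```python
import numpy as np


def total_innovation(rho0_filter, kappa=1.0, t_final=5.0, dt=2e-3, n_traj=400, seed=11):
    """Integrated innovation v_T for records generated by a qubit truly in |up>.

    Illustrative assumptions: L = sqrt(kappa) * Jz with Jz = sigma_z / 2, H = 0,
    true record dy = 2 sqrt(kappa) * (1/2) dt + dw.
    """
    rng = np.random.default_rng(seed)
    l = np.sqrt(kappa) * np.diag([0.5, -0.5]).astype(complex)     # Hermitian L
    rho = np.broadcast_to(np.asarray(rho0_filter, dtype=complex), (n_traj, 2, 2)).copy()
    v_total = np.zeros(n_traj)
    for _ in range(int(round(t_final / dt))):
        dw = rng.normal(0.0, np.sqrt(dt), n_traj)
        dy = np.sqrt(kappa) * dt + dw                             # true measurement record
        mean = 2 * np.trace(l @ rho, axis1=1, axis2=2).real       # Tr((L + L^dag) rho)
        dv = dy - mean * dt                                       # innovation, Eq. (3.73)
        lr, rl = l @ rho, rho @ l
        dissipator = l @ rho @ l - 0.5 * (l @ lr + rl @ l)        # Eq. (3.71), L = L^dag
        update = lr + rl - mean[:, None, None] * rho              # Eq. (3.72)
        rho = rho + dissipator * dt + update * dv[:, None, None]
        v_total += dv
    return v_total


v_consistent = total_innovation(np.diag([1.0, 0.0]))   # filter starts in the true state
v_wrong = total_innovation(np.eye(2) / 2)              # filter starts maximally mixed
print(v_consistent.mean(), v_consistent.var())         # roughly 0 and t_final
print(v_wrong.mean())                                  # clearly positive
```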

Lemma 3.1. In vacuum expectation, the quantum stochastic process $M_t \equiv Y_t - \int_0^t \pi_s(L + L^\dagger)\, ds$ is an instance of a quantum Wiener process. That is, its increments over non-overlapping time intervals are independent, mean-zero Gaussian random variables with variances equal to the lengths of the intervals.

Proof. As reviewed in Sec. 3.1.1, a classical martingale satisfies $\mathbb{E}(m_t \,|\, \mathcal{F}_s) = m_s$ for $s \le t$. In the quantum case this is equivalent to showing that $\mathbb{E}(M_t - M_s \,|\, \mathscr{Y}_s) = 0$. The reason is that $M_s \in \mathscr{Y}_s$, and the conditional expectation obeys $\mathbb{E}(K \,|\, \mathscr{Y}_s) = K$ for every $K \in \mathscr{Y}_s$.

By the definition of the conditional expectation, for every $K \in \mathscr{Y}_s$ we have

$$\mathbb{E}\big(\mathbb{E}(M_t - M_s \,|\, \mathscr{Y}_s)\, K\big) = \mathbb{E}\big((M_t - M_s)\, K\big). \tag{3.74}$$

Substituting the definition of $M_t$,

$$\mathbb{E}\big((M_t - M_s) K\big) = \mathbb{E}\big((Y_t - Y_s) K\big) - \mathbb{E}\Big(\int_s^t ds'\, \pi_{s'}(L + L^\dagger)\, K\Big). \tag{3.75}$$

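Notice, however, that $\pi_{s'}(X) = \mathbb{E}(U_{s'}^\dagger X U_{s'} \,|\, \mathscr{Y}_{s'})$, and that $K \in \mathscr{Y}_s \subseteq \mathscr{Y}_{s'}$ for $s' \ge s$. We can therefore again use the definition of the conditional expectation to convert the second term into the expectation of an integral of $U_{s'}^\dagger (L + L^\dagger) U_{s'}$. In Eq. (3.58) we solved for $Y_t$ and found that $Y_t = Q_t + \int_0^t ds'\, U_{s'}^\dagger (L + L^\dagger) U_{s'}$. After substituting that solution, Eq. (3.75) simplifies to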

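$$\mathbb{E}\big((M_t - M_s) K\big) = \mathbb{E}\big((Y_t - Y_s) K\big) - \mathbb{E}\Big(\int_s^t ds'\, U_{s'}^\dagger (L + L^\dagger) U_{s'}\, K\Big) = \mathbb{E}\big((Q_t - Q_s) K\big). \tag{3.76}$$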

Any operator $K \in \mathscr{Y}_s$ acts on the system Hilbert space and on the Fock space associated with light operators defined for times $s' \in [0, s)$. The operator $Q_t - Q_s$ acts on light-field states defined on the time interval $[s, t]$. Because the vacuum state factorizes across these disjoint time intervals, the final expectation value factorizes,

$$\mathbb{E}\big((M_t - M_s) K\big) = \mathbb{E}(Q_t - Q_s)\, \mathbb{E}(K) = 0. \tag{3.77}$$

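This is zero because the quadrature increment $Q_t - Q_s$ has zero mean in vacuum, and so $M_t$ is indeed a martingale with respect to $\{\mathscr{Y}_s\}$. The proof is finished by observing that $dM_t\, dM_t = dt$. Since $M_t$ lies in a commutative algebra, it corresponds to a classical process, and Lévy's theorem then shows that $M_t$ is a (quantum) Wiener process.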

3.4.2 The Ito correction in the conditional master equation

The quantum filter $\pi_t(\cdot)$ is given by an Ito-form quantum stochastic differential equation, and therefore the conditional master equation is a semiclassical Ito equation. In addition to the Ito integral, there is also the Stratonovich integral, for which the rules of ordinary calculus still apply but the statistical properties are more subtle (see Appendix B for their respective definitions). While the two forms of integration are distinct, they are related by a conversion formula that produces the "Ito correction," derived in Appendix B.1.1. In Chap. 4 we are required to work with a conditional master equation written as a Stratonovich integral, and so we derive this conversion here.

For a general measurement operator $L$ and Hamiltonian $H$, the conditional master equation is

$$d\rho_t = -i[H, \rho_t]\,dt + \mathcal{D}[L](\rho_t)\,dt + \mathcal{H}[L](\rho_t)\,dv_t. \tag{3.78}$$

The first two terms are ordinary deterministic integrals, unaffected by the choice of stochastic integral, so they can be set aside. The integrand of the Ito integral is the conditioning map, which for reference is

$$\mathcal{H}[L](\rho_t) = L \rho_t + \rho_t L^\dagger - \mathrm{Tr}\big(L \rho_t + \rho_t L^\dagger\big)\, \rho_t. \tag{3.79}$$

For the remainder of this section we will suppress the parameterizing argument and

simply write H(ρt).

A one-dimensional Ito equation of the kind typically considered in an Ito-Stratonovich conversion has the differential

$$dx_t = b(x_t)\, dw_t, \tag{3.80}$$

for a smooth integrand $b(x)$. When the same integrand is interpreted as a Stratonovich integral, the differential is written as

$$dx_t = b(x_t) \circ dw_t. \tag{3.81}$$

The Ito correction is what results when you enforce that both integrals give the same process $x_t$, and the final result is

$$b(x_t) \circ dw_t = b(x_t)\, dw_t + \tfrac{1}{2}\, \frac{db}{dx}(x_t)\, b(x_t)\, dt. \tag{3.82}$$

The additional term is known as the Ito correction.
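Eq. (3.82) can be checked on a scalar example in Python/NumPy. We assume the linear integrand $b(x) = x$, for which the Stratonovich equation $dx_t = x_t \circ dw_t$ has the exact solution $x_t = x_0 e^{w_t}$. Euler-Maruyama integration of the Ito form plus the correction $\tfrac12 b'(x) b(x)\,dt = \tfrac12 x\,dt$ should reproduce that solution path by path. Dropping the correction should instead give the Ito solution $x_0 e^{w_t - t/2}$. The helper name `euler_maruyama` is our own.

```python
import numpy as np


def euler_maruyama(b, drift, x0, dw, dt):
    """Integrate the Ito SDE dx = drift(x) dt + b(x) dw with the Euler-Maruyama scheme."""
    x = x0
    for inc in dw:
        x = x + drift(x) * dt + b(x) * inc
    return x


rng = np.random.default_rng(5)
t_final, n_steps, x0 = 1.0, 20_000, 1.0
dt = t_final / n_steps
dw = rng.normal(0.0, np.sqrt(dt), n_steps)
w_final = dw.sum()

b = lambda x: x            # integrand b(x)
db = lambda x: 1.0         # its derivative b'(x)

x_ito = euler_maruyama(b, lambda x: 0.0, x0, dw, dt)                    # dx = x dw
x_strat = euler_maruyama(b, lambda x: 0.5 * db(x) * b(x), x0, dw, dt)   # dx = x o dw

print(x_strat / (x0 * np.exp(w_final)))               # close to 1
print(x_ito / (x0 * np.exp(w_final - t_final / 2)))   # close to 1
```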

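Applying this result directly to the conditional master equation would require defining what it means to take the derivative of the map $\mathcal{H}(\rho)$ with respect to $\rho$. Rather than develop a calculus of superoperators, we return to the roots of the relation (see Appendix B.1.1). The Stratonovich integral evaluates the integrand at the midpoint of each time step, so the correction can be written as

$$dI = \mathcal{H}(\rho_t) \circ dv_t - \mathcal{H}(\rho_t)\, dv_t = \Big(\mathcal{H}\big(\rho_t + \tfrac{1}{2} d\rho_t\big) - \mathcal{H}(\rho_t)\Big)\, dv_t. \tag{3.83}$$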

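The map $\mathcal{H}$ is unfortunately not linear in $\rho_t$, so the integrand on the right-hand side is not simply $\mathcal{H}(\tfrac{1}{2} d\rho_t)$. After a little algebra we find

$$dI = \tfrac{1}{2}\Big(L\, d\rho_t + d\rho_t\, L^\dagger - \mathrm{Tr}\big(L \rho_t + \rho_t L^\dagger\big)\, d\rho_t - \mathrm{Tr}\big(L\, d\rho_t + d\rho_t\, L^\dagger\big)\big(\rho_t + \tfrac{1}{2} d\rho_t\big)\Big)\, dv_t. \tag{3.84}$$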

To simplify this expression into its final form, we substitute the Ito expression for $d\rho_t$ and apply the Ito rule $dv_t\, dv_t = dt$, with all other differential products equal to zero. This means that when substituting $d\rho_t$ we need only keep its stochastic term, $\mathcal{H}(\rho_t)\, dv_t$, since any deterministic term produces a product $dt\, dv_t = 0$. Furthermore, any term with two powers of $d\rho_t$ also vanishes, as it results in three powers of $dv_t$. The simplified expression is then

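$$dI = \tfrac{1}{2}\Big(L\, \mathcal{H}(\rho_t) + \mathcal{H}(\rho_t)\, L^\dagger - \mathrm{Tr}\big((L + L^\dagger)\, \mathcal{H}(\rho_t)\big)\, \rho_t - \mathrm{Tr}\big((L + L^\dagger)\, \rho_t\big)\, \mathcal{H}(\rho_t)\Big)\, dt \equiv \mathcal{I}_c[L](\rho_t)\, dt. \tag{3.85}$$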

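Substituting the definition of $\mathcal{H}[L](\rho_t)$, the Ito correction map $\mathcal{I}_c[L](\rho_t)$ simplifies to

$$\mathcal{I}_c[L](\rho_t) = \Big(L \rho_t L^\dagger + \tfrac{1}{2} L^2 \rho_t + \tfrac{1}{2} \rho_t L^{\dagger 2}\Big) - \Big(\langle L^\dagger L \rangle + \tfrac{1}{2} \langle L^2 \rangle + \tfrac{1}{2} \langle L^{\dagger 2} \rangle\Big) \rho_t - \langle L + L^\dagger \rangle \Big(L \rho_t + \rho_t L^\dagger - \langle L + L^\dagger \rangle \rho_t\Big), \tag{3.86}$$

where $\langle X \rangle = \mathrm{Tr}(X \rho_t)$.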
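Because Eq. (3.86) is easy to get wrong by hand, a small numerical consistency check may be useful. The hypothetical Python/NumPy sketch below is not part of the original derivation. It builds $\mathcal{I}_c[L](\rho)$ twice, for a random density matrix and a random non-Hermitian $L$: once from the unexpanded form (3.85) and once from the expanded form (3.86).

```python
import numpy as np

rng = np.random.default_rng(2)
d = 3


def cond_map(rho, l):
    """H[L](rho) = L rho + rho L^dag - Tr((L + L^dag) rho) rho, Eq. (3.72)."""
    ld = l.conj().T
    return l @ rho + rho @ ld - np.trace((l + ld) @ rho) * rho


def ito_correction_unexpanded(rho, l):
    """(1/2)[L H + H L^dag - Tr((L + L^dag) H) rho - Tr((L + L^dag) rho) H], Eq. (3.85)."""
    ld = l.conj().T
    hr = cond_map(rho, l)
    return 0.5 * (l @ hr + hr @ ld
                  - np.trace((l + ld) @ hr) * rho
                  - np.trace((l + ld) @ rho) * hr)


def ito_correction_expanded(rho, l):
    """Expanded Ito correction map, Eq. (3.86)."""
    ld = l.conj().T
    ev = lambda op: np.trace(op @ rho)           # <X> = Tr(X rho)
    s = ev(l + ld)
    return (l @ rho @ ld + 0.5 * l @ l @ rho + 0.5 * rho @ ld @ ld
            - (ev(ld @ l) + 0.5 * ev(l @ l) + 0.5 * ev(ld @ ld)) * rho
            - s * (l @ rho + rho @ ld - s * rho))


m = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = m @ m.conj().T
rho /= np.trace(rho)                             # random density matrix
l = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
print(np.allclose(ito_correction_unexpanded(rho, l), ito_correction_expanded(rho, l)))  # True
```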

Ultimately, the Stratonovich form of the conditional master equation is given by

$$d\rho_t = -i[H_t, \rho_t]\,dt + \mathcal{D}[L](\rho_t)\,dt - \mathcal{I}_c[L](\rho_t)\,dt + \mathcal{H}[L](\rho_t) \circ dv_t. \tag{3.87}$$

3.4.3 The conditional Schrodinger equation

In this chapter we have focused solely on the interpretation of quantum mechanics in terms of probability spaces. That description led to a quantum conditional expectation and a quantum filter, which is described in the Heisenberg picture. From that Heisenberg-picture description we found a conditional master equation (CME). When the state of the system is pure and the dynamics are such that it remains pure, propagating a full density matrix is unnecessary and a conditional Schrödinger equation (CSE) is sufficient. Chap. 5 uses this fact for computational efficiency, and we therefore include the general expression for a CSE based on the CME in Eq. (3.70). The details of the conversion can be found in [37].

The CME, Eq. (3.70), gives the evolution of $\rho_t$ in terms of an Ito differential $d\rho_t$. Any density matrix whose purity is 1 can be represented by the outer product of a normalized state vector in Hilbert space [63],

$$\rho_t = |\psi_t\rangle\langle\psi_t| \quad \text{if and only if} \quad \mathrm{Tr}(\rho_t^2) = 1. \tag{3.88}$$

Furthermore, $|\psi_t\rangle$ is unique up to an arbitrary constant phase. While we have worked quite hard to derive the CME and give it physical meaning, practically speaking it is "nothing" more than a matrix-valued stochastic differential equation defined on a classical probability space. Therefore, one way of moving from a CME to a CSE is to hypothesize a random state vector $|\psi_t\rangle$ satisfying some vector-valued SDE $d|\psi_t\rangle$. One then solves for the coefficients of that SDE such that the differential $d(|\psi_t\rangle\langle\psi_t|)$ equals the CME.

We note that this is not a standard derivation. Typically in quantum optics, one

first derives a stochastic Schrodinger equation via an unraveling of a master equation

that considers photon counting and then takes a diffusive limit [51]. Having already

developed the CME from the quantum filter, it is much simpler to perform the

above calculation, rather than including an independent derivation. The resulting

equations are identical.

The derivation of $d|\psi_t\rangle$ is not difficult, since the only random process that enters Eq. (3.70) is the innovation $v_t$, and it enters linearly. We also know that $v_t$ satisfies the Ito rule $dv_t\, dv_t = dt$. Therefore a reasonable form for $d|\psi_t\rangle$ is

$$d|\psi_t\rangle = A_t |\psi_t\rangle\, dt + B_t |\psi_t\rangle\, dv_t, \tag{3.89}$$

for some time-adapted, but possibly state-dependent, operators $A_t$ and $B_t$. The adjoint of this equation is then

$$d\langle\psi_t| = \langle\psi_t| A_t^\dagger\, dt + \langle\psi_t| B_t^\dagger\, dv_t. \tag{3.90}$$

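We therefore need to solve for $A_t$ and $B_t$ subject to the constraint

$$d\rho_t = \big(d|\psi_t\rangle\big)\langle\psi_t| + |\psi_t\rangle\big(d\langle\psi_t|\big) + \big(d|\psi_t\rangle\big)\big(d\langle\psi_t|\big). \tag{3.91}$$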

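Doing so is not too difficult, and the operators turn out to be

$$A_t = -i H_t - \tfrac{1}{2}\Big(L^\dagger L - 2\langle L^\dagger\rangle L + \langle L^\dagger\rangle \langle L\rangle\Big), \tag{3.92}$$

$$B_t = L - \langle L\rangle. \tag{3.93}$$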
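The claim that Eqs. (3.92) and (3.93) reproduce the CME can be checked numerically. For a pure state $\rho = |\psi\rangle\langle\psi|$, substituting Eq. (3.89) into Eq. (3.91) with $dv_t\,dv_t = dt$ gives $d\rho_t = (A\rho + \rho A^\dagger + B\rho B^\dagger)\,dt + (B\rho + \rho B^\dagger)\,dv_t$. The Python/NumPy sketch below compares these two coefficients with $-i[H,\rho] + \mathcal{D}[L](\rho)$ and $\mathcal{H}[L](\rho)$. The Hamiltonian, the non-Hermitian $L$ and the state are hypothetical random test inputs.

```python
import numpy as np

rng = np.random.default_rng(4)
d = 4


def crandn(*shape):
    return rng.normal(size=shape) + 1j * rng.normal(size=shape)


m = crandn(d, d)
h = (m + m.conj().T) / 2                              # random Hamiltonian
l = crandn(d, d)                                      # random non-Hermitian L
psi = crandn(d)
psi /= np.linalg.norm(psi)                            # random normalized state
rho = np.outer(psi, psi.conj())
ld = l.conj().T
exp_l = psi.conj() @ l @ psi                          # <L>
exp_ld = np.conj(exp_l)                               # <L^dag>

a = -1j * h - 0.5 * (ld @ l - 2 * exp_ld * l + exp_ld * exp_l * np.eye(d))  # Eq. (3.92)
b = l - exp_l * np.eye(d)                                                  # Eq. (3.93)

drift_cse = a @ rho + rho @ a.conj().T + b @ rho @ b.conj().T
noise_cse = b @ rho + rho @ b.conj().T

drift_cme = -1j * (h @ rho - rho @ h) + l @ rho @ ld - 0.5 * (ld @ l @ rho + rho @ ld @ l)
noise_cme = l @ rho + rho @ ld - np.trace((l + ld) @ rho) * rho

print(np.allclose(drift_cse, drift_cme), np.allclose(noise_cse, noise_cme))  # True True
```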

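Typically the operator $L$ is Hermitian, so if we choose our favorite example $L = \sqrt{\kappa}\, J_z$, then

$$d|\psi_t\rangle = \Big(-i H_t - \tfrac{1}{2}\kappa \big(J_z - \langle J_z\rangle\big)^2\Big)|\psi_t\rangle\, dt + \sqrt{\kappa}\,\big(J_z - \langle J_z\rangle\big)|\psi_t\rangle\, dv_t, \tag{3.94}$$

$$dv_t = dy_t - 2\sqrt{\kappa}\, \langle J_z\rangle\, dt. \tag{3.95}$$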

This is the equation used in Chaps. 4 and 5.
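A minimal Python/NumPy sketch of integrating Eq. (3.94) with Euler-Maruyama follows. The state is renormalized after each step to suppress the norm drift introduced by the discretization. The assumptions are for illustration only: a single spin $j = 1$, $H_t = 0$, an equal superposition of $m = 1, 0, -1$ as the initial state, and innovations drawn as Wiener increments, i.e. a consistent filter. Without rotations, continuous measurement of $J_z$ should drive each trajectory toward a $J_z$ eigenstate, so the conditional variance of $J_z$ should become small.

```python
import numpy as np


def cse_step(psi, h, jz, kappa, dt, dv):
    """Euler-Maruyama step of Eq. (3.94), followed by renormalization."""
    mean = np.real(psi.conj() @ jz @ psi)                 # <Jz>
    shifted = jz - mean * np.eye(len(psi))                # Jz - <Jz>
    dpsi = ((-1j * h - 0.5 * kappa * shifted @ shifted) @ psi * dt
            + np.sqrt(kappa) * shifted @ psi * dv)
    psi = psi + dpsi
    return psi / np.linalg.norm(psi)


rng = np.random.default_rng(8)
kappa, dt, t_final = 1.0, 1e-3, 20.0
jz = np.diag([1.0, 0.0, -1.0]).astype(complex)           # spin-1 Jz
h = np.zeros((3, 3), dtype=complex)                      # no Hamiltonian (assumed)
psi = np.ones(3, dtype=complex) / np.sqrt(3)             # equal superposition of m = 1, 0, -1
for _ in range(int(round(t_final / dt))):
    psi = cse_step(psi, h, jz, kappa, dt, rng.normal(0.0, np.sqrt(dt)))

mean = np.real(psi.conj() @ jz @ psi)
var = np.real(psi.conj() @ jz @ jz @ psi) - mean**2
print(round(mean, 3), var)   # mean near one of 1, 0, -1; variance near 0
```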

Chapter 4

Projection Filtering for Qubit Ensembles

This chapter derives an approximate form for the conditional dynamics of an ensemble of $n$ qubits under the assumption that the state remains nearly an identical separable state. We assume that the system undergoes a diffusive measurement of the collective angular momentum operator $J_z$ while simultaneously experiencing strong global rotations. The approximation is made by formulating a projection filter from the exact conditional master equation, using the technique of orthogonal projections from differential geometry. Here we identify the space of identical separable states as a Riemannian manifold and then project the conditional master equation onto its tangent space. We also review the elements of differential geometry that make such a mapping possible. Finally, we test the accuracy of the projection filter numerically by comparing it to simulations of a stochastic Schrödinger equation. We find that it matches the conditional mean spin projections to within a 5% RMS error.


4.1 Introduction

Numerical integration of a conditional master equation is generally a resource-intensive exercise. Specifying a general mixed state of a $d$-dimensional quantum system requires $d^2 - 1$ real parameters. Furthermore, the total dimension of a many-body system grows exponentially: a system of $n$ qubits has a $2^n$-dimensional Hilbert space, requiring $2^{2n} - 1$ parameters. This "curse of dimensionality" is present even in an unconditioned system, and so physicists often search for symmetries that allow a more efficient description. Because the conditional master equation is nonlinear, a number of symmetries that are often preserved by an unconditioned map can no longer be exploited.
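The exponential growth quoted above is easy to tabulate. The hypothetical helper below evaluates $d^2 - 1$ with $d = 2^n$ and, for comparison, the 3 parameters of a single-qubit state.

```python
def density_matrix_parameters(dim):
    """Real parameters needed to specify a general mixed state of dimension `dim`."""
    return dim**2 - 1


for n in (1, 2, 5, 10, 20):
    full = density_matrix_parameters(2**n)     # general n-qubit state: 4**n - 1
    single = density_matrix_parameters(2)      # one qubit (Bloch vector): 3
    print(f"n = {n:2d}: general state {full:>16,d} parameters, single qubit {single}")
```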

A projection filter is a tool that was developed in the context of classical filtering theory; it provides a general method for constraining nonlinear estimators to remain in a lower-dimensional space [31, 32]. Within the past decade these tools have also been applied to quantum systems, specifically to cavity QED systems [33–36], collective spin systems [37], and low-rank approximations of general master equations [38]. The flexibility of the projection method comes from its formulation in the language of differential geometry. In the quantum framework we have a high- (possibly infinite-) dimensional manifold representing the space of possible states. It is often the case that the system is initialized in a state with a large amount of symmetry, which initially allows an efficient, lower-dimensional representation. The projection filter modifies the exact evolution in such a way as to constrain the system to remain in the lower-dimensional submanifold. It does so by projecting the differential onto the tangent space of that submanifold.

Here we focus on an ensemble of $n$ qubits initially prepared in an identical tensor product state. In other words, the total state of the system, $\rho_{\mathrm{tot}}$, is initialized as an $n$-fold tensor product of a single-qubit state $\rho$,

$$\rho_{\mathrm{tot}} = \rho^{\otimes n}. \tag{4.1}$$

Clearly this is a highly symmetric and easily represented state, as a single-qubit state requires only 3 parameters to be specified uniquely. If the master equation acts identically and independently on each qubit, then the total system remains in an identical separable state for all future times. However, for a joint qubit system undergoing a weak, diffusive measurement of the collective angular momentum variable $J_z$, the conditional master equation is generally entangling. In the long-time limit, this kind of measurement most often projects the system into a nonseparable Dicke state.

In this chapter we demonstrate, through numerical simulation, that if the system also undergoes strong, randomized rotations in addition to the collective measurement, then it remains nearly separable. Assuming this is the case, we apply the technique of projection filtering to the conditional master equation so that it maps identical separable states to identical separable states.

4.1.1 An introduction to differential projections

The general technique of differential projections can be understood through the following example. Consider an ordinary scalar function of three variables, $f(x, y, z)$. The chain rule shows that the differential of $f$ is

$$df = \frac{\partial f}{\partial x}\, dx + \frac{\partial f}{\partial y}\, dy + \frac{\partial f}{\partial z}\, dz. \tag{4.2}$$

Suppose that we have a particle with position vector $\mathbf{x}(t) = (x(t), y(t), z(t))$, and at each time $t$ we evaluate $f(x(t), y(t), z(t))$. To have a complete description of $f$ we clearly need to keep track of all three components, because a change in $x$, $y$, or $z$ induces a change in $f$. Now suppose that keeping track of $z$ is too much of a hassle and we are only interested in tracking $x(t)$ and $y(t)$. The question posed by the projection filter is, "how should we modify $f$ so that we only need to track $x$ and $y$?" The answer comes from the fact that if $\partial f / \partial z = 0$ everywhere, then $f$ does not change with $z$, and $z$ can ultimately be ignored. Therefore the modification we should make is to set the gradient of $f$ to point only in the $xy$-plane, i.e. set $\partial f / \partial z = 0$. This modification is the differential projection of $f$. We thus obtain a modified function $f|_{x,y}$, whose differential is simply

$$df|_{x,y} = \frac{\partial f}{\partial x}\, dx + \frac{\partial f}{\partial y}\, dy + 0\, dz. \tag{4.3}$$

The difficulty in forming a projection filter is that $f$ is not usually written in terms of $x, y, z$, but instead in terms of some other set of parameters, $x', y', z'$, or even just $t$. Furthermore, the desired subspace might be some complicated 2D surface with parameters $v$ and $w$. The coordinate directions associated with $v$ and $w$ need not be orthogonal, at least not in the sense that $x$, $y$, and $z$ are orthogonal. The first challenge in developing a projection filter is therefore to give the desired objective a geometric interpretation.

4.1.2 The conditional master equation

Before embarking on a description of the geometry of quantum states, we first collect here the necessary equations from previous chapters, for a single point of reference. Sec. 3.4 found that the state of an atomic system conditioned on a continuous diffusive measurement is represented by the conditional master equation (CME), given by the Ito differential

$$d\rho_t = -i[H, \rho_t]\,dt + \mathcal{D}[L](\rho_t)\,dt + \mathcal{H}[L](\rho_t)\,dw_t. \tag{4.4}$$

(See Appendix B for a review of classical stochastic differential equations.) The

dissipation and conditioning maps, D[L](·) and H[L](·), are parameterized by the

measurement operator L and are defined as

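$$\mathcal{D}[L](\rho) = L \rho L^\dagger - \tfrac{1}{2} L^\dagger L \rho - \tfrac{1}{2} \rho L^\dagger L \tag{4.5}$$

$$\mathcal{H}[L](\rho_t) = L \rho_t + \rho_t L^\dagger - \mathrm{Tr}\big((L + L^\dagger) \rho_t\big)\, \rho_t. \tag{4.6}$$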

We will often omit the parameterizing argument and simply write $\mathcal{D}(\rho_t)$ and $\mathcal{H}(\rho_t)$. Also note that $\hbar$ has been set equal to one, so that the Hamiltonian operator $H$ has units of frequency and the measurement operator $L$ has units of root frequency.

Note that Sec. 3.4 used a slightly different notation, referring to the innovation as $dv_t$ rather than $dw_t$. Sec. 3.4.1 showed that the innovation computed from the measurement record $y_t$ has the statistics of a Wiener process when the initial condition $\rho_0$ coincides with the "true" initial state. In Chap. 5 this will not be the case. Here, however, we assume that the initial condition is known and, in particular, that it can be written as $\rho^{\otimes n}$. Therefore, throughout this chapter we consider the innovation to be a Wiener process and write it as $dw_t$. Sec. 3.1.4 reviews the statistical and defining properties of the Wiener process.

The physical system that we have in mind is the idealized linear Faraday interaction of Sec. 2.7, meaning the measurement operator is

$$L = \sqrt{\kappa}\, J_z, \tag{4.7}$$

where $\kappa$ is a constant rate. In addition to this measurement, we consider applying a uniform but time-varying magnetic field, leading to the Hamiltonian

$$H = f^x(t)\, J_x + f^y(t)\, J_y + f^z(t)\, J_z. \tag{4.8}$$

The control fields $f^i(t)$ are assumed to be real-valued, deterministic functions of time.¹

For reasons made apparent in Sec. 4.2.4, we will also need to work with the Stratonovich form of the CME,

$$d\rho_t = -i[H, \rho_t]\,dt + \mathcal{D}[L](\rho_t)\,dt - \mathcal{I}_c[L](\rho_t)\,dt + \mathcal{H}[L](\rho_t) \circ dw_t. \tag{4.9}$$

¹ In Chap. 5 the control fields are written as $b(t)$; however, in this chapter the coordinates $b^i$ denote the projected coefficients of the stochastic terms, so here we use $f^i(t)$ instead.

The conversion from the Ito form generates the Ito correction map, derived in Sec. 3.4.2, which is

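$$\mathcal{I}_c[L](\rho_t) = L \rho_t L^\dagger + \tfrac{1}{2} L^2 \rho_t + \tfrac{1}{2} \rho_t L^{\dagger 2} - \Big(\langle L^\dagger L\rangle + \tfrac{1}{2}\langle L^2\rangle + \tfrac{1}{2}\langle L^{\dagger 2}\rangle\Big)\rho_t - \langle L + L^\dagger\rangle\Big(L \rho_t + \rho_t L^\dagger - \langle L + L^\dagger\rangle \rho_t\Big). \tag{4.10}$$

Here the expectation value of the operator $X$ has been written as $\langle X\rangle \equiv \mathrm{Tr}(\rho_t X)$.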

4.2 Differential Manifolds

A manifold $\mathcal{M}$ is, most generally, a continuous set of points that can be locally mapped to a $d$-dimensional Euclidean space. In a neighborhood of any point in $\mathcal{M}$, we can define a smooth mapping of the points in that neighborhood to a flat space of dimension $d$. How smooth this mapping needs to be often depends on the author and the context; generally it must be smooth enough that the tools of differential calculus can be applied. The concept of smoothness is quite at odds with the random nature of Brownian motion, as the sample paths of a Wiener process are nowhere differentiable with probability one. Here we will ultimately be considering random trajectories on a differentiable manifold. The two notions are reconciled by observing that, while a diffusive trajectory is nondifferentiable, it is a trajectory in a smooth space. For example, a two-dimensional Brownian motion is not a differentiable curve, but it is defined on a 2D plane, which is smooth.

The specific manifold we need is the space of all valid density operators for $n$ qubits. For a single qubit, the Bloch vector defines a perfectly respectable one-to-one mapping between quantum states and the 3-dimensional Euclidean ball. The conditional master equation then has a representation as a diffusive trajectory within the Bloch ball. Defining an equivalent representation for a $d$-dimensional quantum state is nontrivial and is still the subject of current research. While there does exist an analogous mapping into a $(d^2 - 1)$-dimensional real space, the boundary and smoothness of the resulting set of states are quite complex and not well understood [68]. Here we will only be interested in a geometric representation of states that can be written as $n$ copies of a single-qubit state, so a Bloch vector representation is ultimately sufficient for our purposes.
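The Bloch representation and the identical product state of Eq. (4.1) can be illustrated with a short Python/NumPy sketch. It uses the standard map $\rho = (I + \mathbf{r}\cdot\boldsymbol{\sigma})/2$, $r_i = \mathrm{Tr}(\rho\,\sigma_i)$. It also assumes the collective operator is $J_z = \sum_k \sigma_z^{(k)}/2$, consistent with $J_z = \sigma_z/2$ for one qubit, so that $\langle J_z\rangle = n r_z/2$ in the state $\rho^{\otimes n}$. The Bloch vector values and helper names are hypothetical.

```python
import numpy as np
from functools import reduce

PAULI = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]


def state_from_bloch(r):
    """rho = (I + r . sigma) / 2; a valid state requires |r| <= 1."""
    return 0.5 * (np.eye(2) + sum(ri * s for ri, s in zip(r, PAULI)))


def bloch_from_state(rho):
    """r_i = Tr(rho sigma_i)."""
    return np.array([np.trace(rho @ s).real for s in PAULI])


def collective_jz(n):
    """Jz = sum_k sigma_z^(k) / 2 on n qubits (assumed convention)."""
    eye = np.eye(2)
    return sum(reduce(np.kron, [PAULI[2] / 2 if k == j else eye for k in range(n)])
               for j in range(n))


r = np.array([0.3, 0.4, 0.5])
rho = state_from_bloch(r)
rho_tot = reduce(np.kron, [rho] * 3)                 # Eq. (4.1) with n = 3
print(bloch_from_state(rho))                         # recovers r
print(np.trace(rho_tot @ collective_jz(3)).real)     # n * r_z / 2 = 0.75 (up to rounding)
```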

4.2.1 Tangent spaces

The differential projection we ultimately want to perform requires a deeper understanding of how to define a gradient in a more abstract setting. A key conceptual point is that we associate basis vectors in a $d$-dimensional space with the partial derivatives we can take of a function defined on the manifold. Specifically, a point $p$ in the manifold $\mathcal{M}$ is representable by the coordinates $\{x^1, x^2, \ldots, x^d\}$. A smooth function $f$ on the manifold, evaluated at this point $p$, can therefore also be represented as a function of these coordinates, $f(x^1, x^2, \ldots, x^d)$. At the point $p$, the partial derivative of $f$ with respect to the coordinate $x^i$ defines the rate of change of $f$ as $x^i$ is varied, i.e. it defines a line tangent to $f$ pointing in the direction of $x^i$.

The relation between partial derivatives and vectors can be formed by associating the basis element $e_i$ with the partial derivative operator $\partial/\partial x^i$. Differential geometry is concerned with defining structures that are independent of any given coordinate system. Calculating a partial derivative with respect to a different coordinate system, $\{y^i\}$, is easily accomplished by applying the usual chain rule (with summation over the repeated index $j$ implied),

$$\frac{\partial}{\partial y^i} = \frac{\partial x^j}{\partial y^i}\, \frac{\partial}{\partial x^j}. \tag{4.11}$$

The coordinate-independent quantity here is the space of all possible partial derivatives we could take at this point. At first glance this may seem like a rather large object. However, the chain rule just showed that a partial derivative in one basis is simply a linear combination of partial derivatives in another basis. Therefore the space of all possible partial derivatives is simply the linear span of the partials taken with respect to some basis. This space is called the tangent space of $\mathcal{M}$ at the point $p$, denoted

$$T_p\mathcal{M} = \operatorname{span}\Big\{ \frac{\partial}{\partial x^i}\Big|_p : i = 1, \ldots, d \Big\}. \tag{4.12}$$

Note that the tangent space is a $d$-dimensional vector space, as we are taking linear combinations of $d$ basis vectors. We will often discuss a directional derivative, meaning a derivative taken in the direction of another point in the manifold. Because such a direction need not be aligned with any given coordinate axis, the directional derivative is in general a linear combination of the coordinate partials, i.e. a vector in the tangent space. Another useful bit of jargon is that a choice of tangent vector at every point of the manifold defines a vector field.

4.2.2 Riemannian Metrics and orthogonal projections

The tangent space $T_p\mathcal{M}$ defines the set of all possible partial derivatives one could take at the point $p$. However, it does not describe how those derivatives are related to one another. While in a Cartesian basis we have a sense that $e_x$ is orthogonal to $e_y$, in general it is hard to tell how an arbitrary vector $e_u$ is related to $e_v$. The missing element is a metric, $\langle \cdot, \cdot \rangle_p$, which defines a positive definite inner product between any two tangent vectors at $p$. At each point $p$ we can take the inner product of $e_u$ and $e_v$ to see how they are related. If the space is Euclidean and the coordinates are Cartesian, the metric will report that $e_x$ is orthogonal to $e_y$. For general coordinate vectors on a general manifold, no such orthogonality need hold.

A Riemannian manifold is a manifold $\mathcal{M}$ equipped with a metric that varies smoothly from point to point. In a Euclidean space with Cartesian coordinates, the inner product between two basis vectors does not change from point to point. This is not true in a general space, which leads to much richer geometries. For a basis of vectors $\{e_j\}$ spanning the tangent space $T_p\mathcal{M}$, the metric at that point can be written as a $d \times d$ matrix with components

$$g_{ij}(p) \equiv \langle e_i, e_j \rangle_p. \tag{4.13}$$

In addition to being positive definite, a metric is also symmetric, in that $\langle e_i, e_j \rangle_p = \langle e_j, e_i \rangle_p$.

A metric gives a notion of two vectors being orthogonal, and from that we are able to define an orthogonal projection. This is crucially important, as we wish to project the conditional master equation onto the tangent space of states that are $n$ copies of a single-qubit state. For a Euclidean space, the orthogonal projection of the vector $v = v^1 e_1 + v^2 e_2 + v^3 e_3$ onto the $xy$-plane is trivial to compute, as it simply discards the $e_3$ component. Given a metric on a general manifold, we can make a similar construction.

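Suppose that for a Riemannian manifold $\mathcal{M}$ we have a submanifold $\mathcal{N} \subseteq \mathcal{M}$ of dimension $n \le d$. Without explicitly constructing an orthogonal basis for every tangent space $T_p\mathcal{M}$, we would like to find a map that discards the vector components orthogonal to $T_p\mathcal{N}$. This is easily done given a basis of vectors $\{v_i : i = 1, \ldots, n\}$ that spans $T_p\mathcal{N}$. The metric $\langle \cdot, \cdot \rangle_p$ taken from $\mathcal{M}$ can equally well be applied to $\mathcal{N}$, since $T_p\mathcal{N}$ is a subspace of $T_p\mathcal{M}$. Applying this metric to $\{v_i\}$ gives the $n \times n$ matrix with elements

$$g_{ij}(p) = \langle v_i, v_j \rangle_p. \tag{4.14}$$

As the metric is positive definite and the $v_i$ are linearly independent, this matrix is invertible, and the entries of its inverse are often written as $g^{ij}(p) \equiv \big(g(p)^{-1}\big)_{ij}$. We can now show that the map $\Pi_{\mathcal{N}} : T_p\mathcal{M} \to T_p\mathcal{N}$,

$$\Pi_{\mathcal{N}}(\cdot) = g^{ij}(p)\, \langle v_j, \cdot \rangle_p\, v_i, \tag{4.15}$$

is equivalent to discarding the component of any vector $w \in T_p\mathcal{M}$ that is orthogonal to $T_p\mathcal{N}$.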

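The projection map should act as the identity on any vector $u \in T_pN$. To check this, note that $T_pN = \operatorname{span}\{v_i : i = 1,\dots,n\}$, so $u$ can be written as $u = u^k v_k$ for some coefficients $u^k$. Then

$$
\begin{aligned}
\Pi_N(u) &= g^{ij}(p)\,\langle v_j, u^k v_k\rangle_p\, v_i \\
&= g^{ij}(p)\,u^k g_{jk}(p)\, v_i \\
&= u^k\, g^{ij}(p)\, g_{jk}(p)\, v_i \\
&= u^k \delta^i_k\, v_i = u .
\end{aligned} \tag{4.16}
$$

$\Pi_N$ should also return zero for every vector orthogonal to $T_pN$. This is also easy to check: every $v_\perp$ in the orthogonal complement of $T_pN$ satisfies $\langle v, v_\perp\rangle_p = 0$ for all $v \in T_pN$. Therefore

$$
\Pi_N(v_\perp) = g^{ij}(p)\,\langle v_j, v_\perp\rangle_p\, v_i = 0 . \tag{4.17}
$$

Since $T_pM = T_pN \oplus (T_pN)^\perp$, every $w \in T_pM$ decomposes uniquely as $w = u + v_\perp$ with $u \in T_pN$ and $v_\perp \in (T_pN)^\perp$, and linearity gives $\Pi_N(w) = u$. Hence $\Pi_N$ is the correct mapping.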

Note that Eq. (4.15) requires only the metric $g_{ij}(p)$ on the submanifold $N$ and does not require an explicit representation for tangent vectors outside of this subspace. This is why it is not necessary to find an explicit mapping from the space of $n$-qubit density matrices to a $(4^n - 1)$-dimensional Euclidean space in order to use projection filtering methods. All we need is a valid metric for density matrices and a spanning set of tangent vectors in the submanifold we wish to project onto.
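As a minimal numerical sketch of Eq. (4.15), the snippet below projects vectors of $\mathbb{R}^3$ onto the span of two non-orthogonal vectors under a non-Euclidean metric $\langle a, b\rangle = a^{\mathsf T} G\, b$. The helper `project`, the matrix `G`, and the basis vectors are illustrative assumptions, not part of the original derivation; the code simply checks the two properties established in Eqs. (4.16) and (4.17).

```python
import numpy as np

def project(w, basis, inner):
    """Pi_N(w) = g^{ij} <v_j, w> v_i for a (not necessarily orthogonal) basis of T_pN."""
    k = len(basis)
    g = np.array([[inner(basis[i], basis[j]) for j in range(k)] for i in range(k)])  # g_ij, Eq. (4.14)
    overlaps = np.array([inner(v, w) for v in basis])                               # <v_j, w>
    coeffs = np.linalg.solve(g, overlaps)                                           # g^{ij} <v_j, w>
    return sum(c * v for c, v in zip(coeffs, basis))

# Hypothetical positive-definite metric on R^3 (symmetric and diagonally dominant).
G = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.2],
              [0.0, 0.2, 1.5]])
inner = lambda a, b: a @ G @ b

v1 = np.array([1.0, 0.0, 1.0])
v2 = np.array([0.0, 1.0, 1.0])
basis = [v1, v2]

u = 0.3 * v1 - 1.2 * v2                          # a vector already in T_pN
v_perp = np.linalg.solve(G, np.cross(v1, v2))    # G-orthogonal to both v1 and v2
w = np.array([0.7, -0.4, 2.0])                   # a generic vector in T_pM

print(project(u, basis, inner) - u)      # ~0: identity on T_pN, Eq. (4.16)
print(project(v_perp, basis, inner))     # ~0: annihilates the complement, Eq. (4.17)
```

Solving the linear system with `np.linalg.solve` is equivalent to multiplying by $g^{ij}$ but avoids forming the inverse explicitly.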

4.2.3 Differentials on abstract manifolds

As the conditional master equation is written in terms of stochastic differentials, we must see how a differential operates in a geometric context. In multivariable calculus the fundamental objects are the differentials of the coordinates, e.g. $dx$, $dy$, etc. In a more abstract space it is difficult to intuit what a differential means. For instance, how would one define the differential of a matrix, say the Pauli matrix $\sigma_x$? Would it be the differential of its entries, the differential of its eigenvalues, or perhaps a differential of both its eigenvalues and eigenvectors? The solution to this problem is to consider not the differential of the points themselves, but the differential of a smooth map relating the abstract space to a Euclidean space. The differential in the abstract space is then inferred from the Euclidean differential. This process of inference is called the pullback, since the differential is pulled back through the mapping. Our ultimate goal is to interpret the conditional master equation $d\rho_t$ in the language of differential geometry, and so we need to understand how it relates to a Euclidean mapping.

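Basic multivariable calculus shows that the total differential of a scalar function $f$ is

$$
df = \frac{\partial f}{\partial x^i}\, dx^i . \tag{4.18}
$$

A differential can also be viewed as a linear map acting on tangent vectors. The action of $df$ on the tangent vector $\partial/\partial x^i$ is defined to be

$$
df\!\left(\frac{\partial}{\partial x^i}\right) \equiv \frac{\partial f}{\partial x^i} . \tag{4.19}
$$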

While this may seem a bit obtuse at first, it is actually a very useful concept. To see why, consider the most basic function we can, namely the coordinate function $x^i$. The differential $dx^i$ acts on the basis vector $\partial/\partial x^j$ as

$$
dx^i\!\left(\frac{\partial}{\partial x^j}\right) = \frac{\partial x^i}{\partial x^j} = \delta^i_j . \tag{4.20}
$$

This shows that the coordinate differentials are biorthogonal to the $\partial/\partial x^j$, and can therefore be thought of as dual basis vectors. When defining the tangent space in Sec. 4.2.1 we found that the partial derivatives span that space and that the coordinate transformation coefficients are simply linear expansion coefficients. The same is true for the differential $df$: the $\partial f/\partial x^i$ are the expansion coefficients in a dual space spanned by the basis vectors $dx^i$. This dual space is often called the cotangent space.

We can also define the differential of a function between two spaces. While we just considered the differential of a scalar-valued function $f$, we can equally consider the differential of a vector-, matrix-, or operator-valued function. When $df$ acted on a basis vector $\partial/\partial x^i$ it returned the scalar $\partial f/\partial x^i$, but for a more general mapping the returned value will be something other than a scalar. For a function $\phi : \mathbb{R}^3 \to M$, the differential is a map $d\phi : T_x\mathbb{R}^3 \to T_{\phi(x)}M$. The point is that when a function maps one space into another, its differential maps tangent vectors to tangent vectors. This is best illustrated through a concrete example, which we give in Sec. 4.3.1 after formulating the Bloch vector representation as a Riemannian manifold.
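A minimal sketch of the statement that a differential maps tangent vectors to tangent vectors. For a hypothetical smooth map $\psi(\theta,\varphi) = (\sin\theta\cos\varphi, \sin\theta\sin\varphi, \cos\theta)$ onto the unit sphere (chosen only for illustration, not taken from the text), a central difference approximates $d\psi$ applied to the coordinate vectors $\partial/\partial\theta$ and $\partial/\partial\varphi$. The images are tangent to the sphere and equal the standard spherical unit vector $e_\theta$ and $\sin\theta\, e_\varphi$.

```python
import numpy as np

def sphere(q):
    """Hypothetical smooth map psi: (theta, varphi) -> point on the unit sphere in R^3."""
    theta, varphi = q
    return np.array([np.sin(theta) * np.cos(varphi),
                     np.sin(theta) * np.sin(varphi),
                     np.cos(theta)])

def pushforward(f, q, v, h=1e-6):
    """Central-difference approximation of df_q(v): the image of tangent vector v at q."""
    q = np.asarray(q, dtype=float)
    v = np.asarray(v, dtype=float)
    return (f(q + h * v) - f(q - h * v)) / (2.0 * h)

q = np.array([0.8, 1.3])                        # (theta, varphi)
d_theta = pushforward(sphere, q, [1.0, 0.0])    # image of d/dtheta
d_varphi = pushforward(sphere, q, [0.0, 1.0])   # image of d/dvarphi
```

Each image is a vector in $\mathbb{R}^3$ orthogonal to the point $\psi(q)$, i.e. a tangent vector of the sphere, which is exactly the map $T_q\mathbb{R}^2 \to T_{\psi(q)}S^2$ described above.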

4.2.4 Stochastic calculus on differential manifolds

There seems to be a fundamental inconsistency between a smooth, infinitely differentiable manifold and the nowhere-differentiable path of a Wiener process. From the Wong-Zakai theorem (see Appendix D) we know that if the Wiener process is replaced by smooth approximations, the resulting ordinary differential equations converge to a stochastic differential equation that must be interpreted in the Stratonovich sense. When trying to incorporate stochastic differential equations into the language of differential forms, one approach would be to enforce a smooth approximation, apply the differential-geometric technique, and then take the stochastic limit at the end. However, a skeptical mathematician might wonder whether such a result could be believed, as the end result might depend heavily on how the smooth approximation was made.

At a practical level, the second-order nature of the Ito rule is difficult to reconcile with the notion of constrained motion on a submanifold. A simple example of this is given in [32], which we reproduce here. Consider the ordinary differentials

$$
\begin{aligned}
dx &= dt, \\
dy &= 2t\, dt .
\end{aligned} \tag{4.21}
$$

Starting from the origin, these describe the parabola $y = x^2$, which can also be considered an immersion of a one-dimensional manifold into $\mathbb{R}^2$ (the parameterizing function is $\phi(t) = (t, t^2)$). Furthermore, the coefficients of these equations describe a vector, $(1, 2x)$, tangent to the parabola. Were these equations used to describe the evolution of a system whose initial condition is on the parabola, we would expect the system to remain on this submanifold.

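In contrast, consider the analogous system of Ito stochastic differential equations

$$
\begin{aligned}
dx_t &= dw_t, \\
dy_t &= 2x_t\, dw_t .
\end{aligned} \tag{4.22}
$$

The coefficients still describe a vector, $(1, 2x_t)$, tangent to the parabola $y = x^2$ at the point $(x_t, x_t^2)$. However, a simple application of the Ito rule, $d(w_t^2) = 2w_t\, dw_t + dt$, shows that these SDEs have the solution

$$
\begin{aligned}
x_t &= x_0 + w_t, \\
y_t &= y_0 + 2x_0 w_t + w_t^2 - t .
\end{aligned} \tag{4.23}
$$

So even if the initial point lies on the parabola, e.g. $(x_0, y_0) = (0, 0)$, the solution does not remain on it, since $y_t - x_t^2 = y_0 - x_0^2 - t$, even though the motion is described by a vector field in its tangent space. Conversely, the Stratonovich SDEs

$$
\begin{aligned}
dx_t &= dw_t, \\
dy_t &= 2x_t \circ dw_t
\end{aligned} \tag{4.24}
$$

have the solution

$$
\begin{aligned}
x_t &= x_0 + w_t, \\
y_t &= y_0 + 2x_0 w_t + w_t^2 ,
\end{aligned} \tag{4.25}
$$

which properly describes diffusion on the parabolic manifold.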
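The parabola example can be reproduced numerically. The sketch below is an illustration, not the method of [32]: it drives Eqs. (4.22) and (4.24) with the same Brownian increments, using the Euler-Maruyama scheme (which converges to the Ito solution) and the Heun predictor-corrector scheme (which converges to the Stratonovich solution). The quantity $y_t - x_t^2$ measures the distance from the parabola; the horizon, step count, and seed are arbitrary choices.

```python
import numpy as np

def simulate(T=1.0, steps=10_000, seed=0):
    """Integrate dx = dw, dy = 2x dw from (0, 0) along one Brownian path, two ways."""
    rng = np.random.default_rng(seed)
    dt = T / steps
    dw = rng.normal(0.0, np.sqrt(dt), size=steps)
    x_ito = y_ito = 0.0     # Euler-Maruyama -> Ito interpretation, Eq. (4.22)
    x_str = y_str = 0.0     # Heun -> Stratonovich interpretation, Eq. (4.24)
    for dW in dw:
        # Ito: coefficient 2x evaluated at the start of the step.
        y_ito += 2.0 * x_ito * dW
        x_ito += dW
        # Stratonovich: average the coefficient over the start and predicted end points.
        x_pred = x_str + dW
        y_str += 0.5 * (2.0 * x_str + 2.0 * x_pred) * dW
        x_str = x_pred
    return y_ito - x_ito**2, y_str - x_str**2

ito_gap, strat_gap = simulate()
print(ito_gap)    # close to -T = -1, as predicted by Eq. (4.23)
print(strat_gap)  # zero up to floating-point rounding, as predicted by Eq. (4.25)
```

For this particular example the Heun update preserves $y - x^2$ exactly, since $(x + \Delta w)^2 - x^2 = (2x + \Delta w)\,\Delta w$, while each Euler-Maruyama step lowers it by $\Delta w^2$, whose sum approximates $t$.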

Our ultimate goal is to take a system of stochastic differential equations and modify their coefficients so that the solution remains constrained to a particular submanifold. This example demonstrates that, for the tangent space projection to be effective, we must first express the Ito equation in Stratonovich form.

4.3 The Bloch Sphere as a Riemannian Manifold

In order to describe the space of density matrices in geometric terms, we need to choose a metric. There are infinitely many to choose from, and any results we arrive at will likely depend upon this choice. In the classical projection filtering problem, Brigo et al. choose the Fisher information metric, as it endows information theory with a nontrivial geometry [31]. van Handel and Mabuchi follow this example and use a quantum version of the Fisher information [33]. Later authors chose a different metric, namely the trace inner product [34–37]. While the trace inner product does not have an immediate connection to quantum information theory, it is significantly simpler to work with and, as we will shortly show, under this metric the Bloch sphere for a single qubit is Euclidean. In showing this, we will also formally construct the state space for a single qubit as a Riemannian manifold.

For a Hilbert space of dimension $d$, we follow [68] and refer to the set of all valid density operators as $S(d)$. In the case of a qubit with $d = 2$, we already know that the Bloch sphere is an incredibly useful parametrization of this set. Formally, we define this as the map $\rho : B \subset \mathbb{R}^3 \to S(2)$ such that

$$
\rho(\mathbf{x}) = \tfrac{1}{2}\left(\mathbb{1} + \mathbf{x}\cdot\boldsymbol{\sigma}\right) . \tag{4.26}
$$

Every such matrix has unit trace, and since its eigenvalues are $(1 \pm |\mathbf{x}|)/2$, positive semi-definiteness requires $|\mathbf{x}| \le 1$, implying that $B$ is the unit ball.

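Through the Bloch sphere mapping, we can construct a tangent space for $S(2)$. This is first done by defining a directional derivative on $S(2)$. Consider Bloch vectors $\mathbf{x}$ and $\mathbf{y}$ with $|\mathbf{x}|, |\mathbf{y}| \in (0, 1)$. The derivative of $\rho(\mathbf{x})$ in the direction of $\mathbf{y}$ is defined to be

$$
D_{\mathbf{y}} \equiv \lim_{\lambda\to 0}\frac{\rho(\mathbf{x}+\lambda\mathbf{y}) - \rho(\mathbf{x})}{\lambda} = \frac{\mathbf{y}\cdot\boldsymbol{\sigma}}{2} . \tag{4.27}
$$

Then, assuming the standard Cartesian coordinate system $x^1, x^2, x^3$, we have the basis of tangent vectors

$$
D_i \equiv \tfrac{1}{2}\sigma_i . \tag{4.28}
$$

The tangent space at the point $\rho(\mathbf{x}) \in S(2)$ is then

$$
T_{\rho(\mathbf{x})}S(2) = \operatorname{span}\{D_i : i = 1, 2, 3\} . \tag{4.29}
$$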

Armed with these tangent vectors, we choose, with some foresight, the trace inner product as the metric. For two tangent vectors $D_i$ and $D_j$ we have

$$
g_{ij} = \langle D_i, D_j\rangle_\rho \equiv \operatorname{Tr}\!\left(D_i^\dagger D_j\right) . \tag{4.30}
$$

While this could in general give a complex metric, for the qubit the basis vectors are Hermitian and therefore the metric is real. Also note that, by the cyclic property of the trace, it is symmetric. For the qubit, a simple calculation then gives

$$
\langle D_i, D_j\rangle_\rho = \tfrac{1}{4}\operatorname{Tr}(\sigma_i\sigma_j) = \tfrac{1}{2}\delta_{ij} . \tag{4.31}
$$

Up to a factor of one half, the Bloch sphere is Euclidean under this metric.
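The following sketch checks Eqs. (4.27)–(4.31) numerically: a finite-difference directional derivative of the Bloch map does not depend on the base point $\mathbf{x}$, and the trace inner product of the tangent vectors $D_i = \sigma_i/2$ gives $g_{ij} = \delta_{ij}/2$. The particular vectors and the step size `lam` are arbitrary test values.

```python
import numpy as np

# Pauli matrices sigma_1, sigma_2, sigma_3.
PAULI = np.array([[[0, 1], [1, 0]],
                  [[0, -1j], [1j, 0]],
                  [[1, 0], [0, -1]]], dtype=complex)

def rho(x):
    """Bloch map of Eq. (4.26): rho(x) = (1 + x . sigma) / 2."""
    return 0.5 * (np.eye(2) + np.einsum("i,ijk->jk", x, PAULI))

def directional_derivative(x, y, lam=1e-6):
    """Finite-difference version of the limit in Eq. (4.27)."""
    return (rho(x + lam * y) - rho(x)) / lam

def trace_metric(D):
    """g_ij = Tr(D_i^dagger D_j), Eq. (4.30)."""
    return np.array([[np.trace(Di.conj().T @ Dj) for Dj in D] for Di in D])

y = np.array([0.3, 0.5, -0.2])
D_y_a = directional_derivative(np.array([0.2, -0.1, 0.4]), y)
D_y_b = directional_derivative(np.zeros(3), y)     # different base point, same result
g = trace_metric(0.5 * PAULI)                       # tangent basis D_i = sigma_i / 2
```

Because the Bloch map is affine in $\mathbf{x}$, the finite difference is exact up to rounding, which is the uniformity that makes the single-qubit geometry Euclidean.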

4.3.1 Projecting the unconditional master equation

In this section we work through an example of explicitly expressing an unconditional master equation for a single qubit in terms of a differential form $d\rho$. We will also do so generally, without assuming a Euclidean metric. Most generally, $\rho(t)$ is a map $\rho : \mathbb{R}^+ \to S(2)$: for any time $t$, $\rho(t)$ returns a valid density matrix. As a differential, the master equation is then the map $d\rho : T_t\mathbb{R}^+ \to T_{\rho(t)}S(2)$, specifically

$$
d\rho = -i[H,\rho]\,dt + \mathcal{D}[L](\rho)\,dt , \tag{4.32}
$$

for a general Hamiltonian $H$ and jump operator $L$. Instead of the direct mapping between time and density matrices, we would like to consider this in terms of the Bloch sphere mapping of Eq. (4.26). This can be done if we treat the time dependence as a functional composition, $\rho(t) = \rho(\mathbf{x}(t))$, for a map $\mathbf{x}(t)$ from time to Bloch vectors. From Eq. (4.26), the general expression for $d\rho : T_{\mathbf{x}}\mathbb{R}^3 \to T_{\rho(\mathbf{x})}S(2)$ is

$$
d\rho = \tfrac{1}{2}a_i(\mathbf{x})\,\sigma_i\, dx^i . \tag{4.33}
$$

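To see that this is indeed a map between the two tangent spaces, we can simply calculate its action on the basis vector $\partial/\partial x^j$:

$$
d\rho\!\left(\frac{\partial}{\partial x^j}\right) = \sum_i \tfrac{1}{2}a_i(\mathbf{x})\,\sigma_i\, dx^i\!\left(\frac{\partial}{\partial x^j}\right) = \sum_i \tfrac{1}{2}a_i(\mathbf{x})\,\sigma_i\,\delta^i_j = \tfrac{1}{2}a_j(\mathbf{x})\,\sigma_j , \tag{4.34}
$$

where there is no sum in the final expression. This is clearly in the tangent space $T_{\rho(\mathbf{x})}S(2)$, as it is proportional to $D_j = \tfrac{1}{2}\sigma_j$. Our goal is then to solve for the coefficients $a_i(\mathbf{x})$.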

Any traceless $2\times 2$ matrix can be written as a linear combination of Pauli matrices. As both the commutator $[H,\rho]$ and the map $\mathcal{D}[L](\rho)$ are traceless, both have an expansion in terms of the Pauli matrices. Sec. 4.3 found that the tangent space $T_\rho S(2)$ is also spanned by the Pauli matrices, meaning that $-i[H,\rho]$ and $\mathcal{D}[L](\rho)$ are vectors in this space. Finding the coefficients $a_i(\mathbf{x})$ therefore simply comes down to projecting these maps onto the basis vectors $D_i$. Sec. 4.2.2 established that the general projection map $\Pi_N$ can be written as Eq. (4.15), in terms of the metric and its inverse $g^{ij}(\mathbf{x})$. We are therefore able to write $d\rho(t)$ as

$$
d\rho(\mathbf{x}(t)) = -i\, g^{ij}(\mathbf{x})\,\langle D_i, [H,\rho]\rangle_\rho\, D_j\, dt + g^{ij}(\mathbf{x})\,\langle D_i, \mathcal{D}[L](\rho)\rangle_\rho\, D_j\, dt . \tag{4.35}
$$

Since this is a differential with respect to $dt$ and not $dx^j$, we can define the differentials of the time-dependent coordinates $x^j(t)$,

$$
dx^j = -i\, g^{ij}(\mathbf{x})\,\langle D_i, [H,\rho]\rangle_\rho\, dt + g^{ij}(\mathbf{x})\,\langle D_i, \mathcal{D}[L](\rho)\rangle_\rho\, dt , \tag{4.36}
$$

meaning that

$$
d\rho = D_j\, dx^j . \tag{4.37}
$$
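Equation (4.36) can be evaluated directly with matrices. The sketch below assumes, for illustration only, a Hamiltonian $H = \tfrac{\omega}{2}\sigma_z$ and a dephasing jump operator $L = \sqrt{\Gamma}\,\sigma_z$ (the symbols $\omega$ and $\Gamma$ are hypothetical), and projects the generator onto $D_i = \sigma_i/2$ using $g^{ij} = 2\delta_{ij}$ from Eq. (4.31). The result should reproduce the familiar Bloch equations $\dot x^1 = -\omega x^2 - 2\Gamma x^1$, $\dot x^2 = \omega x^1 - 2\Gamma x^2$, $\dot x^3 = 0$.

```python
import numpy as np

PAULI = np.array([[[0, 1], [1, 0]],
                  [[0, -1j], [1j, 0]],
                  [[1, 0], [0, -1]]], dtype=complex)

def rho(x):
    """Bloch map of Eq. (4.26)."""
    return 0.5 * (np.eye(2) + np.einsum("i,ijk->jk", x, PAULI))

def dissipator(L, r):
    """D[L](rho) = L rho L^dag - (L^dag L rho + rho L^dag L) / 2."""
    LdL = L.conj().T @ L
    return L @ r @ L.conj().T - 0.5 * (LdL @ r + r @ LdL)

def bloch_velocity(x, H, L):
    """dx^j/dt from Eq. (4.36): g^{ij} <D_i, -i[H, rho] + D[L](rho)>."""
    r = rho(x)
    generator = -1j * (H @ r - r @ H) + dissipator(L, r)
    D = 0.5 * PAULI                        # tangent vectors D_i, Eq. (4.28)
    g_inv = 2.0 * np.eye(3)                # inverse of g_ij = delta_ij / 2, Eq. (4.31)
    overlaps = np.array([np.trace(Di.conj().T @ generator) for Di in D])
    return np.real(g_inv @ overlaps)

omega, Gamma = 1.0, 0.2                    # illustrative parameters
H = 0.5 * omega * PAULI[2]
L = np.sqrt(Gamma) * PAULI[2]
x = np.array([0.3, 0.4, 0.5])
v = bloch_velocity(x, H, L)
```

The same function works for any single-qubit $H$ and $L$, since only the projection of Eq. (4.36) is used.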

4.4 Projections in the tensor product submanifold

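Our ultimate goal is to form a projection from a general state over $n$ qubits to the closest $n$-fold tensor product of a single-qubit state. We define $P$ to be the submanifold of $S(2^n)$ which describes the space of all states of the form $\rho(\mathbf{x})^{\otimes n}$. This space has the simple parameterization $\varrho : B \subset \mathbb{R}^3 \to P \subset S(2^n)$ such that

$$
\varrho(\mathbf{x}) \equiv \rho(\mathbf{x})^{\otimes n} = \frac{1}{2^n}\left(\mathbb{1} + \mathbf{x}\cdot\boldsymbol{\sigma}\right)^{\otimes n} . \tag{4.38}
$$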

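We also need to identify the tangent space at each point of the submanifold. Because of the linear nature of the one-qubit map $\rho$, the directional derivative of $\varrho(\mathbf{x})$ with respect to $\mathbf{y}$ is simply

$$
D_{\mathbf{y}} = \frac{\partial}{\partial\lambda}\varrho(\mathbf{x}+\lambda\mathbf{y})\bigg|_{\lambda=0} . \tag{4.39}
$$

A derivative acting on a tensor product must obey the Leibniz rule. The directional derivative of $\rho(\mathbf{x})^{\otimes n}$ in the direction $\mathbf{y}$ must then be

$$
D_{\mathbf{y}}\big(\varrho(\mathbf{x})\big) = \frac{\partial}{\partial\lambda}\rho(\mathbf{x}+\lambda\mathbf{y})^{\otimes n}\bigg|_{\lambda=0} = \sum_{i=1}^{n}\rho(\mathbf{x})^{\otimes(i-1)}\otimes\tfrac{1}{2}\mathbf{y}\cdot\boldsymbol{\sigma}\otimes\rho(\mathbf{x})^{\otimes(n-i)} . \tag{4.40}
$$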

For the single qubit, the directional derivative was uniform over the manifold, which implied a Euclidean geometry for our simple metric. For multiple qubits this is no longer the case, which implies that $P$ has a richer geometry. With a slight abuse of notation, the basis vector associated with the coordinate $x^i$, evaluated at the state $\rho(\mathbf{x})^{\otimes n}$, will be denoted $D_i(\mathbf{x})$ and is given by

$$
D_i(\mathbf{x}) = \sum_{j=1}^{n}\rho(\mathbf{x})^{\otimes(j-1)}\otimes\tfrac{1}{2}\sigma_i\otimes\rho(\mathbf{x})^{\otimes(n-j)} . \tag{4.41}
$$

The tangent space at $\varrho(\mathbf{x})$ is then

$$
T_{\varrho(\mathbf{x})}P = \operatorname{span}\{D_i(\mathbf{x}) : i = x, y, z\} . \tag{4.42}
$$

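The metric on $P$ induced by the trace inner product is now easily calculated. The product of the two basis vectors $D_i$ and $D_j$ is

$$
\begin{aligned}
D_iD_j &= \sum_{p,q=1}^{n}\left(\rho^{\otimes(p-1)}\otimes\tfrac{1}{2}\sigma_i\otimes\rho^{\otimes(n-p)}\right)\left(\rho^{\otimes(q-1)}\otimes\tfrac{1}{2}\sigma_j\otimes\rho^{\otimes(n-q)}\right) \\
&= \sum_{p=1}^{n}(\rho^2)^{\otimes(p-1)}\otimes\tfrac{1}{4}\sigma_i\sigma_j\otimes(\rho^2)^{\otimes(n-p)} \\
&\quad + \sum_{p<q}(\rho^2)^{\otimes(p-1)}\otimes\tfrac{1}{2}\sigma_i\rho\otimes(\rho^2)^{\otimes(q-p-1)}\otimes\tfrac{1}{2}\rho\sigma_j\otimes(\rho^2)^{\otimes(n-q)} \\
&\quad + \sum_{q<p}(\rho^2)^{\otimes(q-1)}\otimes\tfrac{1}{2}\rho\sigma_j\otimes(\rho^2)^{\otimes(p-q-1)}\otimes\tfrac{1}{2}\sigma_i\rho\otimes(\rho^2)^{\otimes(n-p)} .
\end{aligned} \tag{4.43}
$$

The metric coefficient is then

$$
\begin{aligned}
\langle D_i, D_j\rangle_{\varrho(\mathbf{x})} &= \operatorname{Tr}(D_iD_j) \\
&= \frac{n}{4}\operatorname{Tr}(\rho^2)^{n-1}\operatorname{Tr}(\sigma_i\sigma_j) + \frac{n(n-1)}{4}\operatorname{Tr}(\rho^2)^{n-2}\operatorname{Tr}(\rho\sigma_i)\operatorname{Tr}(\rho\sigma_j) \\
&= \frac{n}{2^n}\left(1+|\mathbf{x}|^2\right)^{n-1}\delta_{ij} + \frac{n(n-1)}{2^n}\left(1+|\mathbf{x}|^2\right)^{n-2}x^kx^\ell\,\delta_{ki}\delta_{\ell j} .
\end{aligned} \tag{4.44}
$$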

We will often need to calculate the product of several collective operators and then take the trace. While Eq. (4.43) has a distinct ordering of the tensor products, resulting in the two sums $p < q$ and $q < p$, upon taking the trace this order becomes irrelevant. Thus there are only two relevant cases: $p = q$ and $p \ne q$.
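Equation (4.44) can be checked by brute force for small $n$. This sketch builds the tangent vectors of Eq. (4.41) with Kronecker products, evaluates the trace inner product, and compares the result with the closed form; the Bloch vector and the values of $n$ are arbitrary test choices. The matrices have dimension $2^n$, so this is only a sanity check, not a practical method.

```python
import numpy as np
from functools import reduce

PAULI = np.array([[[0, 1], [1, 0]],
                  [[0, -1j], [1j, 0]],
                  [[1, 0], [0, -1]]], dtype=complex)

def rho(x):
    """Single-qubit Bloch map, Eq. (4.26)."""
    return 0.5 * (np.eye(2) + np.einsum("i,ijk->jk", x, PAULI))

def tangent_vectors(x, n):
    """D_i(x) of Eq. (4.41) for i = 1, 2, 3, as 2^n x 2^n matrices."""
    r = rho(x)
    return [sum(reduce(np.kron, [r] * j + [0.5 * s] + [r] * (n - j - 1)) for j in range(n))
            for s in PAULI]

def metric_numeric(x, n):
    """Trace inner product <D_i, D_j> = Tr(D_i^dagger D_j)."""
    D = tangent_vectors(x, n)
    return np.array([[np.trace(Di.conj().T @ Dj).real for Dj in D] for Di in D])

def metric_closed_form(x, n):
    """Right-hand side of Eq. (4.44)."""
    s = 1.0 + x @ x
    return ((n / 2**n) * s**(n - 1) * np.eye(3)
            + (n * (n - 1) / 2**n) * s**(n - 2) * np.outer(x, x))

x = np.array([0.2, -0.3, 0.5])
G3_numeric, G3_formula = metric_numeric(x, 3), metric_closed_form(x, 3)
```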

4.4.1 The metric in spherical coordinates

The metric as given by Eq. (4.44) has a simple form when written in spherical coordinates. In terms of the Cartesian basis vectors $e_x, e_y, e_z$, the standard spherical basis vectors are defined as

$$
\begin{aligned}
e_r &= \sin\theta\cos\phi\, e_x + \sin\theta\sin\phi\, e_y + \cos\theta\, e_z \\
e_\theta &= \cos\theta\cos\phi\, e_x + \cos\theta\sin\phi\, e_y - \sin\theta\, e_z \\
e_\phi &= -\sin\phi\, e_x + \cos\phi\, e_y .
\end{aligned} \tag{4.45}
$$

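In analogy, we define the associated tangent vectors

$$
\begin{aligned}
D_r(\mathbf{x}) &= \sin\theta\cos\phi\, D_x(\mathbf{x}) + \sin\theta\sin\phi\, D_y(\mathbf{x}) + \cos\theta\, D_z(\mathbf{x}) \\
D_\theta(\mathbf{x}) &= \cos\theta\cos\phi\, D_x(\mathbf{x}) + \cos\theta\sin\phi\, D_y(\mathbf{x}) - \sin\theta\, D_z(\mathbf{x}) \\
D_\phi(\mathbf{x}) &= -\sin\phi\, D_x(\mathbf{x}) + \cos\phi\, D_y(\mathbf{x}) .
\end{aligned} \tag{4.46}
$$

When $\mathbf{x}$ lies in the subset $\{\mathbf{x} \in B : 0 < r < 1,\ 0 < \theta < \pi,\ 0 < \phi < 2\pi\}$, these vector fields form a perfectly valid basis for each tangent space $T_{\varrho(\mathbf{x})}P$.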

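It will also be convenient to define “spherical” Pauli matrices,

$$
\begin{aligned}
\sigma_r &\equiv \sin\theta\cos\phi\,\sigma_x + \sin\theta\sin\phi\,\sigma_y + \cos\theta\,\sigma_z \\
\sigma_\theta &\equiv \cos\theta\cos\phi\,\sigma_x + \cos\theta\sin\phi\,\sigma_y - \sin\theta\,\sigma_z \\
\sigma_\phi &\equiv -\sin\phi\,\sigma_x + \cos\phi\,\sigma_y .
\end{aligned} \tag{4.47}
$$

These operators obey the usual properties of Pauli matrices: for $i, j, k \in \{r, \theta, \phi\}$,

$$
\operatorname{Tr}(\sigma_i) = 0, \tag{4.48a}
$$

$$
\operatorname{Tr}(\sigma_i\sigma_j) = 2\delta_{ij}, \tag{4.48b}
$$

$$
[\sigma_i, \sigma_j] = 2i\,\varepsilon_{ijk}\,\sigma_k, \tag{4.48c}
$$

$$
\sigma_i\sigma_j + \sigma_j\sigma_i = 2\delta_{ij}\mathbb{1} . \tag{4.48d}
$$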

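Furthermore, we have that

$$
D_i(\mathbf{x}) = \sum_j \rho(\mathbf{x})^{\otimes(j-1)}\otimes\tfrac{1}{2}\sigma_i\otimes\rho(\mathbf{x})^{\otimes(n-j)} \tag{4.49}
$$

for both the Cartesian and spherical bases. The state $\rho(\mathbf{x})$ can now be written as

$$
\rho(\mathbf{x}) = \tfrac{1}{2}\left(\mathbb{1} + r\,\sigma_r\right) . \tag{4.50}
$$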

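We can now use the orthogonality of the Pauli matrices, together with the fact that the state $\rho$ is orthogonal to $\sigma_\theta$ and $\sigma_\phi$, to evaluate the inner products between the spherical tangent vectors, and thus write the metric as a matrix in spherical coordinates. From the general expression

$$
\langle D_i, D_j\rangle = \tfrac{1}{4}n\operatorname{Tr}(\rho^2)^{n-1}\operatorname{Tr}(\sigma_i\sigma_j) + \tfrac{1}{4}n(n-1)\operatorname{Tr}(\rho^2)^{n-2}\operatorname{Tr}(\sigma_i\rho)\operatorname{Tr}(\sigma_j\rho) , \tag{4.51}
$$

we have

$$
\begin{aligned}
\langle D_r, D_r\rangle &= \frac{n}{2^n}\left(1+r^2\right)^{n-1}\frac{1+nr^2}{1+r^2}, \\
\langle D_\theta, D_\theta\rangle &= \frac{n}{2^n}\left(1+r^2\right)^{n-1}, \\
\langle D_\phi, D_\phi\rangle &= \frac{n}{2^n}\left(1+r^2\right)^{n-1},
\end{aligned} \tag{4.52}
$$

$$
\langle D_r, D_\theta\rangle = \langle D_r, D_\phi\rangle = \langle D_\theta, D_\phi\rangle = 0 . \tag{4.53}
$$

As a matrix, the metric in spherical coordinates is

$$
G(\mathbf{x}) = \frac{n}{2^n}\left(1+r^2\right)^{n-1}\begin{pmatrix} \dfrac{1+nr^2}{1+r^2} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{4.54}
$$

and its inverse is

$$
G^{-1}(\mathbf{x}) = \frac{2^n}{n}\left(1+r^2\right)^{-(n-1)}\begin{pmatrix} \dfrac{1+r^2}{1+nr^2} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} . \tag{4.55}
$$

Notice that when $n = 1$ we recover the simple Euclidean metric $g_{ij} = \tfrac{1}{2}\delta_{ij}$.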
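The diagonal form (4.54) and its inverse (4.55) can be checked by evaluating the general expression (4.51) with the spherical Pauli matrices of Eq. (4.47). This is a sketch with arbitrary test values of $r$, $\theta$, $\phi$ and $n$; it relies on Eq. (4.51) and single-qubit traces rather than on full $2^n$-dimensional matrices.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]])
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

def spherical_paulis(theta, phi):
    """sigma_r, sigma_theta, sigma_phi of Eq. (4.47)."""
    s_r = np.sin(theta) * np.cos(phi) * SX + np.sin(theta) * np.sin(phi) * SY + np.cos(theta) * SZ
    s_t = np.cos(theta) * np.cos(phi) * SX + np.cos(theta) * np.sin(phi) * SY - np.sin(theta) * SZ
    s_p = -np.sin(phi) * SX + np.cos(phi) * SY
    return [s_r, s_t, s_p]

def metric_from_traces(r, theta, phi, n):
    """Evaluate Eq. (4.51) for i, j in {r, theta, phi} using single-qubit traces."""
    sig = spherical_paulis(theta, phi)
    rho = 0.5 * (np.eye(2) + r * sig[0])           # Eq. (4.50)
    purity = np.trace(rho @ rho).real              # Tr(rho^2) = (1 + r^2) / 2
    t = [np.trace(s @ rho).real for s in sig]      # Tr(sigma_i rho)
    return np.array([[n / 4 * purity**(n - 1) * np.trace(si @ sj).real
                      + n * (n - 1) / 4 * purity**(n - 2) * t[i] * t[j]
                      for j, sj in enumerate(sig)]
                     for i, si in enumerate(sig)])

def metric_closed_form(r, n):
    """G(x) of Eq. (4.54)."""
    prefactor = n / 2**n * (1 + r**2)**(n - 1)
    return prefactor * np.diag([(1 + n * r**2) / (1 + r**2), 1.0, 1.0])

def inverse_closed_form(r, n):
    """G^{-1}(x) of Eq. (4.55)."""
    prefactor = 2**n / n * (1 + r**2)**(-(n - 1))
    return prefactor * np.diag([(1 + r**2) / (1 + n * r**2), 1.0, 1.0])

r, theta, phi, n = 0.6, 1.1, 2.3, 4
G_numeric = metric_from_traces(r, theta, phi, n)
```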

4.4.2 Calculating collective operator inner products

This section contains the detailed calculations necessary for projecting the various components of the conditional and unconditional master equations onto the space of identical separable states. We will first derive the projection for a general conditional master equation of the form

$$
d\rho_{\text{tot}} = -i[H_{\text{tot}}, \rho_{\text{tot}}]\,dt + \mathcal{D}[L_{\text{tot}}](\rho_{\text{tot}})\,dt + \mathcal{I}_c[L_{\text{tot}}](\rho_{\text{tot}})\,dt + \mathcal{H}[L_{\text{tot}}](\rho_{\text{tot}})\,dw_t . \tag{4.56}
$$

The subscript tot specifies that these are operators on the total Hilbert space of $N$ particles. Any single-particle operator $A$ acting on the $n$th particle of the ensemble is denoted $A^{(n)}$ and is given by the tensor product

$$
A^{(n)} \equiv \mathbb{1}^{\otimes(n-1)}\otimes A\otimes\mathbb{1}^{\otimes(N-n)} . \tag{4.57}
$$

The fundamental assumption of this derivation is that the operators $H_{\text{tot}}$ and $L_{\text{tot}}$ act independently and identically on each qubit and may be written as

$$
H_{\text{tot}} = \sum_{n=1}^{N}H^{(n)} = \sum_{n=1}^{N}\mathbb{1}^{\otimes(n-1)}\otimes H\otimes\mathbb{1}^{\otimes(N-n)} , \tag{4.58a}
$$

$$
L_{\text{tot}} = \sum_{n=1}^{N}L^{(n)} = \sum_{n=1}^{N}\mathbb{1}^{\otimes(n-1)}\otimes L\otimes\mathbb{1}^{\otimes(N-n)} . \tag{4.58b}
$$

Furthermore, the tangent vectors $D_i$ at the single-particle state $\rho$ are

$$
D_i = \sum_{n=1}^{N}\rho^{\otimes(n-1)}\otimes\tfrac{1}{2}\sigma_i\otimes\rho^{\otimes(N-n)} . \tag{4.59}
$$

In projecting the collective master equation onto the identical product states, we will need to calculate the product of up to three collective operators and then take the trace. Each collective operator is a sum of single-particle operators, each acting on the $n$th member. When taking the product of two such sums there will be $N$ terms in which both single-particle operators act on the same subsystem, as well as $N(N-1)$ terms in which they act on different subsystems.

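The simplest case involves no collective operators, i.e. simply the overlap between $D_i$ and the state $\varrho$. This is equal to

$$
\langle D_i, \varrho\rangle = \sum_{n=1}^{N}\operatorname{Tr}\!\left(\left(\rho^{\otimes(n-1)}\otimes\tfrac{1}{2}\sigma_i\otimes\rho^{\otimes(N-n)}\right)\varrho\right) = N\operatorname{Tr}(\rho^2)^{N-1}\operatorname{Tr}\!\left(\tfrac{1}{2}\sigma_i\rho\right) . \tag{4.60}
$$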

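The next step up in complexity is to include a single collective operator, $A_{\text{tot}}$. This requires two sums, one from $D_i$ and one from $A_{\text{tot}}$. We then have

$$
\begin{aligned}
\langle D_i, A_{\text{tot}}\varrho\rangle &= \sum_{n,m=1}^{N}\operatorname{Tr}\!\left(\left(\rho^{\otimes(m-1)}\otimes\tfrac{1}{2}\sigma_i\otimes\rho^{\otimes(N-m)}\right)A^{(n)}\rho^{\otimes N}\right) \\
&= \sum_{n=m=1}^{N}\operatorname{Tr}\!\left((\rho^2)^{\otimes(m-1)}\otimes\tfrac{1}{2}\sigma_iA\rho\otimes(\rho^2)^{\otimes(N-m)}\right) \\
&\quad + \sum_{n<m}\operatorname{Tr}\!\left((\rho^2)^{\otimes(n-1)}\otimes\rho A\rho\otimes(\rho^2)^{\otimes(m-n-1)}\otimes\tfrac{1}{2}\sigma_i\rho\otimes(\rho^2)^{\otimes(N-m)}\right) \\
&\quad + \sum_{n>m}\operatorname{Tr}\!\left((\rho^2)^{\otimes(m-1)}\otimes\tfrac{1}{2}\sigma_i\rho\otimes(\rho^2)^{\otimes(n-m-1)}\otimes\rho A\rho\otimes(\rho^2)^{\otimes(N-n)}\right) \\
&= N\operatorname{Tr}(\rho^2)^{N-1}\operatorname{Tr}\!\left(\tfrac{1}{2}\sigma_iA\rho\right) + N(N-1)\operatorname{Tr}(\rho^2)^{N-2}\operatorname{Tr}\!\left(\tfrac{1}{2}\sigma_i\rho\right)\operatorname{Tr}(A\rho^2) .
\end{aligned} \tag{4.61}
$$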

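The cyclic property of the trace shows that upon switching the order of $A_{\text{tot}}$ and $\varrho$ (i.e. calculating $\langle D_i, \varrho A_{\text{tot}}\rangle$ instead), the second term is left unchanged, so

$$
\langle D_i, \varrho A_{\text{tot}}\rangle = N\operatorname{Tr}(\rho^2)^{N-1}\operatorname{Tr}\!\left(\tfrac{1}{2}\sigma_i\rho A\right) + N(N-1)\operatorname{Tr}(\rho^2)^{N-2}\operatorname{Tr}\!\left(\tfrac{1}{2}\sigma_i\rho\right)\operatorname{Tr}(A\rho^2) . \tag{4.62}
$$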

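When calculating the projection of the dissipator terms, we need the product of two collective operators, which leads to a triple sum. For two collective operators $A_{\text{tot}}$ and $B_{\text{tot}}$ we have

$$
\langle D_i, A_{\text{tot}}B_{\text{tot}}\varrho\rangle = \sum_{n,m,l=1}^{N}\operatorname{Tr}\!\left(\left(\rho^{\otimes(n-1)}\otimes\tfrac{1}{2}\sigma_i\otimes\rho^{\otimes(N-n)}\right)A^{(m)}B^{(l)}\rho^{\otimes N}\right) . \tag{4.63}
$$

The previous result shows that there will be five distinct terms, for the cases $n = m = l$; $n \ne m = l$; $n = m \ne l$; $n = l \ne m$; and $n$, $m$, $l$ all different. This expression then simplifies to

$$
\begin{aligned}
\langle D_i, A_{\text{tot}}B_{\text{tot}}\varrho\rangle = {} & N\operatorname{Tr}(\rho^2)^{N-1}\operatorname{Tr}\!\left(\tfrac{1}{2}\sigma_iAB\rho\right) \\
&+ N(N-1)\operatorname{Tr}(\rho^2)^{N-2}\operatorname{Tr}\!\left(\tfrac{1}{2}\sigma_i\rho\right)\operatorname{Tr}(\rho AB\rho) \\
&+ N(N-1)\operatorname{Tr}(\rho^2)^{N-2}\operatorname{Tr}\!\left(\tfrac{1}{2}\sigma_iA\rho\right)\operatorname{Tr}(\rho B\rho) \\
&+ N(N-1)\operatorname{Tr}(\rho^2)^{N-2}\operatorname{Tr}\!\left(\tfrac{1}{2}\sigma_iB\rho\right)\operatorname{Tr}(\rho A\rho) \\
&+ N(N-1)(N-2)\operatorname{Tr}(\rho^2)^{N-3}\operatorname{Tr}\!\left(\tfrac{1}{2}\sigma_i\rho\right)\operatorname{Tr}(\rho A\rho)\operatorname{Tr}(\rho B\rho) .
\end{aligned} \tag{4.64}
$$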

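The terms $\operatorname{Tr}(\rho X\rho)$ can be simplified to $\operatorname{Tr}(X\rho^2)$ but were left in this form to make the structure more explicit. The order of the collective operators can be exchanged, but this will not affect the five-term structure. Thus we calculate that

$$
\begin{aligned}
\langle D_i, A_{\text{tot}}\varrho B_{\text{tot}}\rangle = {} & N\operatorname{Tr}(\rho^2)^{N-1}\operatorname{Tr}\!\left(\tfrac{1}{2}\sigma_iA\rho B\right) \\
&+ N(N-1)\operatorname{Tr}(\rho^2)^{N-2}\operatorname{Tr}\!\left(\tfrac{1}{2}\sigma_i\rho\right)\operatorname{Tr}(\rho A\rho B) \\
&+ N(N-1)\operatorname{Tr}(\rho^2)^{N-2}\operatorname{Tr}\!\left(\tfrac{1}{2}\sigma_iA\rho\right)\operatorname{Tr}(B\rho^2) \\
&+ N(N-1)\operatorname{Tr}(\rho^2)^{N-2}\operatorname{Tr}\!\left(\tfrac{1}{2}\sigma_i\rho B\right)\operatorname{Tr}(A\rho^2) \\
&+ N(N-1)(N-2)\operatorname{Tr}(\rho^2)^{N-3}\operatorname{Tr}\!\left(\tfrac{1}{2}\sigma_i\rho\right)\operatorname{Tr}(A\rho^2)\operatorname{Tr}(B\rho^2) ,
\end{aligned} \tag{4.65}
$$

$$
\begin{aligned}
\langle D_i, \varrho A_{\text{tot}}B_{\text{tot}}\rangle = {} & N\operatorname{Tr}(\rho^2)^{N-1}\operatorname{Tr}\!\left(\tfrac{1}{2}\sigma_i\rho AB\right) \\
&+ N(N-1)\operatorname{Tr}(\rho^2)^{N-2}\operatorname{Tr}\!\left(\tfrac{1}{2}\sigma_i\rho\right)\operatorname{Tr}(AB\rho^2) \\
&+ N(N-1)\operatorname{Tr}(\rho^2)^{N-2}\operatorname{Tr}\!\left(\tfrac{1}{2}\sigma_i\rho A\right)\operatorname{Tr}(B\rho^2) \\
&+ N(N-1)\operatorname{Tr}(\rho^2)^{N-2}\operatorname{Tr}\!\left(\tfrac{1}{2}\sigma_i\rho B\right)\operatorname{Tr}(A\rho^2) \\
&+ N(N-1)(N-2)\operatorname{Tr}(\rho^2)^{N-3}\operatorname{Tr}\!\left(\tfrac{1}{2}\sigma_i\rho\right)\operatorname{Tr}(A\rho^2)\operatorname{Tr}(B\rho^2) .
\end{aligned} \tag{4.66}
$$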

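The final two calculations involving collective operators are the expectation values $\langle A_{\text{tot}}\rangle$ and $\langle A_{\text{tot}}B_{\text{tot}}\rangle$. They are

$$
\langle A_{\text{tot}}\rangle = \operatorname{Tr}(A_{\text{tot}}\varrho) = N\operatorname{Tr}(A\rho) , \tag{4.67}
$$

$$
\langle A_{\text{tot}}B_{\text{tot}}\rangle = \operatorname{Tr}(A_{\text{tot}}B_{\text{tot}}\varrho) = N\operatorname{Tr}(AB\rho) + N(N-1)\operatorname{Tr}(A\rho)\operatorname{Tr}(B\rho) . \tag{4.68}
$$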
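As a brute-force sanity check of the counting argument, the sketch below compares Eqs. (4.61), (4.62) and (4.68) with direct $2^N$-dimensional matrix computations for $N = 3$ and random, non-Hermitian single-qubit operators $A$ and $B$. The helper names `collective`, `tangent` and `inner`, the seed, and the Bloch vector are hypothetical choices for illustration.

```python
import numpy as np
from functools import reduce

PAULI = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def collective(A, N):
    """A_tot = sum_n A^(n), with A^(n) as in Eq. (4.57)."""
    I = np.eye(2)
    return sum(reduce(np.kron, [I] * n + [A] + [I] * (N - n - 1)) for n in range(N))

def tangent(rho, s, N):
    """D_i = sum_n rho^{(n-1)} (x) sigma_i / 2 (x) rho^{(N-n)}, Eq. (4.59)."""
    return sum(reduce(np.kron, [rho] * n + [0.5 * s] + [rho] * (N - n - 1)) for n in range(N))

def inner(D, X):
    """Trace inner product <D, X> = Tr(D^dagger X)."""
    return np.trace(D.conj().T @ X)

rng = np.random.default_rng(1)
N = 3
x = np.array([0.1, 0.5, -0.3])
rho = 0.5 * (np.eye(2) + sum(xi * s for xi, s in zip(x, PAULI)))
varrho = reduce(np.kron, [rho] * N)                        # product state rho^{(x)N}
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
B = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
A_tot, B_tot = collective(A, N), collective(B, N)
tr = np.trace
P = tr(rho @ rho)                                           # purity Tr(rho^2)

lhs_461 = [inner(tangent(rho, s, N), A_tot @ varrho) for s in PAULI]
rhs_461 = [N * P**(N - 1) * tr(0.5 * s @ A @ rho)
           + N * (N - 1) * P**(N - 2) * tr(0.5 * s @ rho) * tr(A @ rho @ rho) for s in PAULI]
lhs_462 = [inner(tangent(rho, s, N), varrho @ A_tot) for s in PAULI]
rhs_462 = [N * P**(N - 1) * tr(0.5 * s @ rho @ A)
           + N * (N - 1) * P**(N - 2) * tr(0.5 * s @ rho) * tr(A @ rho @ rho) for s in PAULI]
lhs_468 = tr(A_tot @ B_tot @ varrho)
rhs_468 = N * tr(A @ B @ rho) + N * (N - 1) * tr(A @ rho) * tr(B @ rho)
```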

The general expressions in Eqs. (4.60)–(4.68) are all we need to calculate the overlap between each term of the conditional master equation and a tangent vector $D_i$. Starting with the Hamiltonian commutator, Eq. (4.61) and its permuted version, Eq. (4.62), give

$$
\langle D_i, [H_{\text{tot}}, \varrho]\rangle = N\operatorname{Tr}(\rho^2)^{N-1}\operatorname{Tr}\!\left(\tfrac{1}{2}\sigma_i[H,\rho]\right) . \tag{4.69}
$$

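For the dissipator term, substituting into Eqs. (4.64)–(4.66) with the appropriate collective operators $L_{\text{tot}}$ or $L^\dagger_{\text{tot}}$ gives

$$
\begin{aligned}
\langle D_i, \mathcal{D}[L_{\text{tot}}](\varrho)\rangle &= \operatorname{Tr}\!\left(D_i\left(L_{\text{tot}}\varrho L^\dagger_{\text{tot}} - \tfrac{1}{2}L^\dagger_{\text{tot}}L_{\text{tot}}\varrho - \tfrac{1}{2}\varrho L^\dagger_{\text{tot}}L_{\text{tot}}\right)\right) \\
&= N\operatorname{Tr}(\rho^2)^{N-1}\operatorname{Tr}\!\left(\tfrac{1}{2}\sigma_i\mathcal{D}[L](\rho)\right) \\
&\quad + N(N-1)\operatorname{Tr}(\rho^2)^{N-2}\operatorname{Tr}\!\left(\tfrac{1}{2}\sigma_i\rho\right)\operatorname{Tr}\!\left(\rho\,\mathcal{D}[L](\rho)\right) \\
&\quad + N(N-1)\operatorname{Tr}(\rho^2)^{N-2}\operatorname{Tr}\!\left(\tfrac{1}{4}\sigma_i[L,\rho]\right)\operatorname{Tr}(L^\dagger\rho^2) \\
&\quad + N(N-1)\operatorname{Tr}(\rho^2)^{N-2}\operatorname{Tr}\!\left(\tfrac{1}{4}\sigma_i[\rho,L^\dagger]\right)\operatorname{Tr}(L\rho^2) .
\end{aligned} \tag{4.70}
$$

Note that the final lines of Eqs. (4.64)–(4.66) are all equal and hence cancel in the dissipator.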

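For the conditioning map $\mathcal{H}[L_{\text{tot}}]$ we again need Eq. (4.61), together with Eq. (4.60) and the single-operator expectation value, Eq. (4.67). Its overlap then reduces to

$$
\begin{aligned}
\langle D_i, \mathcal{H}[L_{\text{tot}}](\varrho)\rangle &= \operatorname{Tr}\!\left(D_i\left(L_{\text{tot}}\varrho + \varrho L^\dagger_{\text{tot}} - \left\langle L_{\text{tot}} + L^\dagger_{\text{tot}}\right\rangle\varrho\right)\right) \\
&= N\operatorname{Tr}(\rho^2)^{N-1}\operatorname{Tr}\!\left(\tfrac{1}{2}\sigma_i(L\rho + \rho L^\dagger)\right) \\
&\quad + N(N-1)\operatorname{Tr}(\rho^2)^{N-2}\operatorname{Tr}\!\left((L+L^\dagger)\rho^2\right)\operatorname{Tr}\!\left(\tfrac{1}{2}\sigma_i\rho\right) \\
&\quad - N^2\operatorname{Tr}(\rho^2)^{N-1}\operatorname{Tr}\!\left((L+L^\dagger)\rho\right)\operatorname{Tr}\!\left(\tfrac{1}{2}\sigma_i\rho\right) .
\end{aligned} \tag{4.71}
$$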

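Finally, for the general Ito correction map, Eq. (3.86),

$$
\begin{aligned}
\langle D_i, \mathcal{I}_c[L_{\text{tot}}](\varrho)\rangle = {} & N\operatorname{Tr}(\rho^2)^{N-1}\operatorname{Tr}\!\left(\tfrac{1}{2}\sigma_i\left(L\rho L^\dagger + \tfrac{1}{2}L^2\rho + \tfrac{1}{2}\rho L^{\dagger 2}\right)\right) \\
&+ N(N-1)\operatorname{Tr}(\rho^2)^{N-2}\operatorname{Tr}\!\left(\tfrac{1}{2}\sigma_i\rho\right)\operatorname{Tr}\!\left(\rho\left(L\rho L^\dagger + \tfrac{1}{2}L^2\rho + \tfrac{1}{2}\rho L^{\dagger 2}\right)\right) \\
&+ N(N-1)\operatorname{Tr}(\rho^2)^{N-2}\operatorname{Tr}\!\left(\tfrac{1}{2}\sigma_i(L\rho + \rho L^\dagger)\right)\operatorname{Tr}\!\left((L+L^\dagger)\rho^2\right) \\
&+ \tfrac{1}{2}N(N-1)(N-2)\operatorname{Tr}(\rho^2)^{N-3}\operatorname{Tr}\!\left((L+L^\dagger)\rho^2\right)^2\operatorname{Tr}\!\left(\tfrac{1}{2}\sigma_i\rho\right) \\
&- N^2\operatorname{Tr}(\rho^2)^{N-1}\left(\left\langle L^\dagger L + \tfrac{1}{2}L^2 + \tfrac{1}{2}L^{\dagger 2}\right\rangle + \tfrac{1}{2}(N-1)\left\langle L + L^\dagger\right\rangle^2\right)\operatorname{Tr}\!\left(\tfrac{1}{2}\sigma_i\rho\right) \\
&- N\left\langle L + L^\dagger\right\rangle\langle D_i, \mathcal{H}[L_{\text{tot}}](\varrho)\rangle .
\end{aligned} \tag{4.72}
$$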

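These expressions simplify in the case of a Hermitian operator $L$ acting on a single-qubit state. Specifically, when $L_{\text{tot}} = J_z$ (so that $L = \tfrac{1}{2}\sigma_z$) and $H$ is the control Hamiltonian in Eq. (4.8), Eqs. (4.69)–(4.72) simplify to

$$
\langle D_i, [H_{\text{tot}}, \varrho]\rangle = \tfrac{1}{4}N\operatorname{Tr}(\rho^2)^{N-1}\operatorname{Tr}\!\left(\sigma_i\left[f^j(t)\sigma_j, \rho\right]\right), \tag{4.73}
$$

$$
\begin{aligned}
\langle D_i, \mathcal{D}[J_z](\varrho)\rangle = {} & \tfrac{1}{8}N\operatorname{Tr}(\rho^2)^{N-1}\left(\operatorname{Tr}(\sigma_i\sigma_z\rho\sigma_z) - N\operatorname{Tr}(\sigma_i\rho)\right) \\
&+ \tfrac{1}{8}N(N-1)\operatorname{Tr}(\rho^2)^{N-2}\operatorname{Tr}(\rho\sigma_z\rho\sigma_z)\operatorname{Tr}(\sigma_i\rho),
\end{aligned} \tag{4.74}
$$

$$
\begin{aligned}
\langle D_i, \mathcal{H}[J_z](\varrho)\rangle = {} & \tfrac{1}{4}N\operatorname{Tr}(\rho^2)^{N-1}\operatorname{Tr}\!\left(\sigma_i(\sigma_z\rho + \rho\sigma_z)\right) \\
&- \tfrac{1}{2}N^2\operatorname{Tr}(\rho^2)^{N-1}\operatorname{Tr}(\sigma_z\rho)\operatorname{Tr}(\sigma_i\rho) \\
&+ \tfrac{1}{2}N(N-1)\operatorname{Tr}(\rho^2)^{N-2}\operatorname{Tr}(\sigma_z\rho^2)\operatorname{Tr}(\sigma_i\rho),
\end{aligned} \tag{4.75}
$$

$$
\begin{aligned}
\langle D_i, \mathcal{I}_c[J_z](\varrho)\rangle = {} & \langle D_i, \mathcal{D}[J_z](\varrho)\rangle \\
&+ \tfrac{1}{4}N(N-1)\operatorname{Tr}(\rho^2)^{N-2}\operatorname{Tr}(\sigma_z\rho^2)\operatorname{Tr}\!\left(\sigma_i(\sigma_z\rho + \rho\sigma_z)\right) \\
&- \tfrac{1}{4}N^2(N-1)\operatorname{Tr}(\rho^2)^{N-1}\operatorname{Tr}(\sigma_z\rho)^2\operatorname{Tr}(\sigma_i\rho) \\
&+ \tfrac{1}{4}N(N-1)(N-2)\operatorname{Tr}(\rho^2)^{N-3}\operatorname{Tr}(\sigma_z\rho^2)^2\operatorname{Tr}(\sigma_i\rho) \\
&- N\operatorname{Tr}(\sigma_z\rho)\,\langle D_i, \mathcal{H}[J_z](\varrho)\rangle .
\end{aligned} \tag{4.76}
$$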

It is worth noting that in Eqs. (4.74)–(4.76) a majority of the terms are proportional to $\operatorname{Tr}(\sigma_i\rho)$. If the state has zero expectation along the $\sigma_i$ axis, the overlap with that tangent vector is greatly simplified. Moreover, any qubit state that is not completely mixed has a Bloch vector pointing along some axis, leaving the orthogonal axes with zero expectation. When the state is aligned with $D_i$, i.e. the Bloch vector is $\mathbf{x} = r\,e_i$, we have $\operatorname{Tr}(\sigma_i\rho) = r$. This suggests that these terms simplify for any state if we choose to work in spherical coordinates.

4.4.3 The spherical projection of the CME

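With the metric inverse in Eq. (4.55) and the inner product expressions in Eqs. (4.73)–(4.76), we can finally calculate the projection coefficients $h^i$, $l^i$, $c^i$, $b^i$. To fully simplify the inner products we need the following relations, which are easily calculated:

$$
\operatorname{Tr}(\sigma_i\rho) = \begin{cases} r & i = r \\ 0 & i = \theta \\ 0 & i = \phi \end{cases}, \tag{4.77a}
$$

$$
\operatorname{Tr}\!\left(\sigma_i(\sigma_z\rho + \rho\sigma_z)\right) = \begin{cases} 2\cos\theta & i = r \\ -2\sin\theta & i = \theta \\ 0 & i = \phi \end{cases}, \tag{4.77b}
$$

$$
\operatorname{Tr}(\sigma_i\sigma_z\rho\sigma_z) = \begin{cases} r\cos 2\theta & i = r \\ -r\sin 2\theta & i = \theta \\ 0 & i = \phi \end{cases}, \tag{4.77c}
$$

$$
\operatorname{Tr}(\sigma_z\rho) = \operatorname{Tr}(\sigma_z\rho^2) = r\cos\theta, \tag{4.77d}
$$

$$
\operatorname{Tr}(\rho\sigma_z\rho\sigma_z) = \tfrac{1}{2}\left(1 + r^2\cos 2\theta\right) . \tag{4.77e}
$$

The azimuthal symmetry of the problem is directly apparent in these expressions, as the $J_z$ projection carries no information about $\phi$.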
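The relations (4.77a)–(4.77e) are easy to confirm numerically. This sketch builds $\rho$ from Eq. (4.50) at an arbitrary point and evaluates each trace with the spherical Pauli matrices of Eq. (4.47); the test values of $r$, $\theta$, $\phi$ are illustrative.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]])
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

def spherical_paulis(theta, phi):
    """sigma_r, sigma_theta, sigma_phi of Eq. (4.47)."""
    s_r = np.sin(theta) * np.cos(phi) * SX + np.sin(theta) * np.sin(phi) * SY + np.cos(theta) * SZ
    s_t = np.cos(theta) * np.cos(phi) * SX + np.cos(theta) * np.sin(phi) * SY - np.sin(theta) * SZ
    s_p = -np.sin(phi) * SX + np.cos(phi) * SY
    return [s_r, s_t, s_p]

def tr(M):
    """Real part of the trace; each trace below is a trace of two Hermitian factors, hence real."""
    return np.trace(M).real

r, theta, phi = 0.7, 0.9, 2.1
sig = spherical_paulis(theta, phi)          # [sigma_r, sigma_theta, sigma_phi]
rho = 0.5 * (np.eye(2) + r * sig[0])        # Eq. (4.50)

rel_a = [tr(s @ rho) for s in sig]                        # Eq. (4.77a)
rel_b = [tr(s @ (SZ @ rho + rho @ SZ)) for s in sig]      # Eq. (4.77b)
rel_c = [tr(s @ SZ @ rho @ SZ) for s in sig]              # Eq. (4.77c)
rel_d = (tr(SZ @ rho), tr(SZ @ rho @ rho))                # Eq. (4.77d)
rel_e = tr(rho @ SZ @ rho @ SZ)                           # Eq. (4.77e)
```

None of the results depend on `phi`, which is the azimuthal symmetry noted above.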

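Substituting the spherical Pauli matrices into Eq. (4.73), the Hamiltonian inner products simplify to

$$
\begin{aligned}
\langle D_r, -i[H,\varrho]\rangle &= 0, \\
\langle D_\theta, -i[H,\varrho]\rangle &= \tfrac{1}{2}n\operatorname{Tr}(\rho^2)^{n-1}\,r\left(f^2(t)\cos\phi - f^1(t)\sin\phi\right), \\
\langle D_\phi, -i[H,\varrho]\rangle &= \tfrac{1}{2}n\operatorname{Tr}(\rho^2)^{n-1}\,r\left(f^3(t)\sin\theta - f^1(t)\cos\theta\cos\phi - f^2(t)\cos\theta\sin\phi\right).
\end{aligned} \tag{4.78}
$$

Physically, applying magnetic fields to a spin ensemble only rotates the magnetization and cannot change its magnitude, a fact confirmed by the vanishing $D_r$ projection.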

The remaining projections we need to calculate are all based on the Jz mea-

surement operator and so contain no information about the φ coordinate. This can

be verified by substituting the Dφ tangent vector into Eqs. (4.74-4.76), which all

evaluate to zero.

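The $D_\theta$ projections are greatly simplified by the fact that $\operatorname{Tr}(\sigma_\theta\rho) = 0$. Using Eq. (4.77c) we find that

$$
\langle D_\theta, \mathcal{D}[J_z](\varrho)\rangle = -\tfrac{1}{8}n\operatorname{Tr}(\rho^2)^{n-1}\,r\sin 2\theta . \tag{4.79}
$$

From Eq. (4.77b) the conditioning product reduces to

$$
\langle D_\theta, \mathcal{H}[J_z](\varrho)\rangle = -\tfrac{1}{2}n\operatorname{Tr}(\rho^2)^{n-1}\sin\theta . \tag{4.80}
$$

Combining these two results in Eq. (4.76), we have

$$
\begin{aligned}
\langle D_\theta, \mathcal{I}_c[J_z](\varrho)\rangle = {} & -\tfrac{1}{8}n\operatorname{Tr}(\rho^2)^{n-1}\,r\sin 2\theta \\
& - \tfrac{1}{4}n(n-1)\operatorname{Tr}(\rho^2)^{n-2}\,r\sin 2\theta \\
& + \tfrac{1}{4}n^2\operatorname{Tr}(\rho^2)^{n-1}\,r\sin 2\theta .
\end{aligned} \tag{4.81}
$$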

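Simplifying the $r$ projections is naturally a more complicated task. However, the dissipator product with $D_r$ is not much more difficult than the $D_\theta$ product. Using $\operatorname{Tr}(\sigma_r\rho) = r$ and substituting Eqs. (4.77c) and (4.77e) into Eq. (4.74), we find

$$
\langle D_r, \mathcal{D}[J_z](\varrho)\rangle = -\tfrac{1}{8}n\operatorname{Tr}(\rho^2)^{n-2}\left(1+nr^2\right)r\sin^2\theta . \tag{4.82}
$$

Evaluating the conditioning map requires Eqs. (4.77b) and (4.77d), which reduce Eq. (4.75) to

$$
\langle D_r, \mathcal{H}[J_z](\varrho)\rangle = -\tfrac{1}{4}n\operatorname{Tr}(\rho^2)^{n-2}\left(1+nr^2\right)\left(r^2-1\right)\cos\theta . \tag{4.83}
$$

Finally, combining these results in the Ito correction product, Eq. (4.76) simplifies to

$$
\begin{aligned}
\langle D_r, \mathcal{I}_c[J_z](\varrho)\rangle = {} & -\tfrac{1}{8}n\operatorname{Tr}(\rho^2)^{n-2}\left(1+nr^2\right)r\sin^2\theta \\
& + \tfrac{1}{2}n(n-1)\operatorname{Tr}(\rho^2)^{n-2}\,r\cos^2\theta \\
& - \tfrac{1}{4}n^2(n-1)\operatorname{Tr}(\rho^2)^{n-1}\,r^3\cos^2\theta \\
& + \tfrac{1}{4}n(n-1)(n-2)\operatorname{Tr}(\rho^2)^{n-3}\,r^3\cos^2\theta \\
& + \tfrac{1}{4}n^2\operatorname{Tr}(\rho^2)^{n-2}\left(1+nr^2\right)\left(r^2-1\right)r\cos^2\theta .
\end{aligned} \tag{4.84}
$$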

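Having now simplified the projections, we can include the inverse metric components of Eq. (4.55) to arrive at the proper projections. The Hamiltonian projections are

$$
\begin{aligned}
h^r(\mathbf{x},t) &= 0, \\
h^\theta(\mathbf{x},t) &= g^{\theta\theta}\langle D_\theta, -i[H,\varrho]\rangle = f^2(t)\,r\cos\phi - f^1(t)\,r\sin\phi, \\
h^\phi(\mathbf{x},t) &= g^{\phi\phi}\langle D_\phi, -i[H,\varrho]\rangle = f^3(t)\,r\sin\theta - f^1(t)\,r\cos\theta\cos\phi - f^2(t)\,r\cos\theta\sin\phi .
\end{aligned} \tag{4.85}
$$

The dissipator projections are

$$
\begin{aligned}
l^r(\mathbf{x}) &= g^{rr}\langle D_r, \kappa\mathcal{D}[J_z](\varrho)\rangle = -\tfrac{1}{2}\kappa\, r\sin^2\theta, \\
l^\theta(\mathbf{x}) &= g^{\theta\theta}\langle D_\theta, \kappa\mathcal{D}[J_z](\varrho)\rangle = -\tfrac{1}{4}\kappa\, r\sin 2\theta, \\
l^\phi(\mathbf{x}) &= 0 .
\end{aligned} \tag{4.86}
$$

The conditioning projections are

$$
\begin{aligned}
b^r(\mathbf{x}) &= g^{rr}\left\langle D_r, \sqrt{\kappa}\,\mathcal{H}[J_z](\varrho)\right\rangle = \sqrt{\kappa}\left(1-r^2\right)\cos\theta, \\
b^\theta(\mathbf{x}) &= g^{\theta\theta}\left\langle D_\theta, \sqrt{\kappa}\,\mathcal{H}[J_z](\varrho)\right\rangle = -\sqrt{\kappa}\sin\theta, \\
b^\phi(\mathbf{x}) &= 0 .
\end{aligned} \tag{4.87}
$$

The Ito correction projections are

$$
\begin{aligned}
c^r(\mathbf{x}) &= g^{rr}\langle D_r, \kappa\,\mathcal{I}_c[J_z](\varrho)\rangle = l^r(\mathbf{x}) + \kappa\,\alpha(r)\,r\cos^2\theta, \\
c^\theta(\mathbf{x}) &= g^{\theta\theta}\langle D_\theta, \kappa\,\mathcal{I}_c[J_z](\varrho)\rangle = l^\theta(\mathbf{x}) + \tfrac{1}{2}\kappa\,\beta(r)\,r\sin 2\theta, \\
c^\phi(\mathbf{x}) &= 0,
\end{aligned} \tag{4.88}
$$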

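where we defined the coefficients

$$
\alpha(r) \equiv n\left(r^2-1\right) + \frac{2(n-1)}{1+nr^2} - \frac{n(n-1)\left(1+r^2\right)}{2\left(1+nr^2\right)}\,r^2 + \frac{2(n-2)(n-1)}{\left(1+r^2\right)\left(1+nr^2\right)}\,r^2 , \tag{4.89}
$$

$$
\beta(r) \equiv n - \frac{2(n-1)}{1+r^2} . \tag{4.90}
$$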

To bring all of this together, recall that the projected conditional master equation is

$$
d\rho_t = \left(h^i + l^i - c^i\right)D_i\,dt + b^iD_i\,dw_t . \tag{4.91}
$$

Furthermore, a general Stratonovich SDE is traditionally written as $dx_t = A(x_t)\,dt + B(x_t)\,dw_t$. To conform to this convention we define the coefficients

$$
\begin{aligned}
a^r(\mathbf{x},t) &\equiv -\kappa\,\alpha(r)\,r\cos^2\theta, \\
a^\theta(\mathbf{x},t) &\equiv h^\theta(\mathbf{x},t) - \tfrac{1}{2}\kappa\,\beta(r)\,r\sin 2\theta, \\
a^\phi(\mathbf{x},t) &\equiv h^\phi(\mathbf{x},t),
\end{aligned} \tag{4.92}
$$

with the Hamiltonian projections $h^i(\mathbf{x},t)$ defined in Eq. (4.85). The projected conditional master equation is finally given by the Stratonovich SDE

$$
d\rho_t = a^i(\mathbf{x},t)\,D_i(\mathbf{x})\,dt + b^i(\mathbf{x})\,D_i(\mathbf{x})\,dw_t \tag{4.93}
$$

with the spherical tangent vectors defined in Eq. (4.46).

4.5 The Projection Filter

From the projected conditional master equation in Eq. (4.93), we would like to find a closed set of easily simulated stochastic differential equations. We now have a nonlinear, matrix-valued SDE that propagates the identical product state closest to the exact conditional state. A single-qubit state is completely characterized by its Bloch vector, so to simulate $\rho_t$ we need only find SDEs for the three Bloch components.

In the notation of quantum filtering theory [25], the filter is a QSDE that propagates the conditional expectation of a given observable,

$$
\pi_t(X) \cong \operatorname{Tr}(\rho_t X) . \tag{4.94}
$$

The filter is generally expressed as a differential, so that for an observable $X$ on $n$ qubits we have

$$
d\pi_t(X) = \operatorname{Tr}(d\rho_t X) = a^i(\mathbf{x},t)\operatorname{Tr}\!\big(D_i(\mathbf{x})X\big)\,dt + b^i(\mathbf{x})\operatorname{Tr}\!\big(D_i(\mathbf{x})X\big)\,dw_t . \tag{4.95}
$$

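The Bloch vector components can of course be identified with the expectation values of the Pauli operators. For $n$ qubits we also have the relation $\operatorname{Tr}(J_i\varrho) = \tfrac{n}{2}\operatorname{Tr}(\sigma_i\rho) = \tfrac{n}{2}x^i$, so to extract SDEs for the Cartesian Bloch components $x^i$ from $d\rho_t$ we simply need to calculate the filtering equations for the operators $2J_i/n$. Thus

$$
dx^i_t = \frac{2}{n}\operatorname{Tr}(d\rho_t J_i) = \frac{2}{n}\Big(a^\alpha(\mathbf{x}_t,t)\operatorname{Tr}\!\big(D_\alpha(\mathbf{x}_t)J_i\big)\,dt + b^\alpha(\mathbf{x}_t)\operatorname{Tr}\!\big(D_\alpha(\mathbf{x}_t)J_i\big)\,dw_t\Big) \tag{4.96}
$$

for $\alpha \in \{r, \theta, \phi\}$ and $i \in \{1, 2, 3\}$. Explicit calculation shows that

$$
\begin{aligned}
\operatorname{Tr}\!\big(D_r(\mathbf{x})J_i\big) &= \tfrac{n}{2}\left(\sin\theta\cos\phi\,\delta_{i1} + \sin\theta\sin\phi\,\delta_{i2} + \cos\theta\,\delta_{i3}\right), \\
\operatorname{Tr}\!\big(D_\theta(\mathbf{x})J_i\big) &= \tfrac{n}{2}\left(\cos\theta\cos\phi\,\delta_{i1} + \cos\theta\sin\phi\,\delta_{i2} - \sin\theta\,\delta_{i3}\right), \\
\operatorname{Tr}\!\big(D_\phi(\mathbf{x})J_i\big) &= \tfrac{n}{2}\left(-\sin\phi\,\delta_{i1} + \cos\phi\,\delta_{i2}\right).
\end{aligned} \tag{4.97}
$$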

These equations show how to find a mixed-coordinate expression for the projection filter, i.e. one that expresses $d\mathbf{x}$ in terms of the variables $r$, $\theta$, $\phi$. We would like to combine these results with the expressions for $a^\alpha$ and $b^\alpha$, Eqs. (4.92) and (4.87), to obtain deterministic and stochastic coefficients expressed in Cartesian coordinates. We seek the functions $a^i(\mathbf{x},t)$ and $b^i(\mathbf{x})$ such that

$$
dx^i_t = a^i(\mathbf{x}_t,t)\,dt + b^i(\mathbf{x}_t)\,dw_t . \tag{4.98}
$$

With the standard conversion between spherical and Cartesian coordinates and Eqs. (4.85), (4.87), (4.92), and (4.97), we can easily find these coefficients.

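In Cartesian coordinates the deterministic Stratonovich coefficients are

$$
\begin{aligned}
a^1(\mathbf{x},t) &= f^2(t)x^3 - f^3(t)x^2 - \kappa\,\frac{\alpha(r)+\beta(r)}{r^2}\,x^1(x^3)^2, \\
a^2(\mathbf{x},t) &= f^3(t)x^1 - f^1(t)x^3 - \kappa\,\frac{\alpha(r)+\beta(r)}{r^2}\,x^2(x^3)^2, \\
a^3(\mathbf{x},t) &= f^1(t)x^2 - f^2(t)x^1 + \kappa\,\beta(r)\,x^3 - \kappa\,\frac{\alpha(r)+\beta(r)}{r^2}\,x^3(x^3)^2,
\end{aligned} \tag{4.99}
$$

and the stochastic coefficients are

$$
\begin{aligned}
b^1(\mathbf{x}) &= -\sqrt{\kappa}\,x^1x^3, \\
b^2(\mathbf{x}) &= -\sqrt{\kappa}\,x^2x^3, \\
b^3(\mathbf{x}) &= \sqrt{\kappa}\left(1-(x^3)^2\right).
\end{aligned} \tag{4.100}
$$

Note that the functions $\alpha(r)$ and $\beta(r)$ actually depend only upon $r^2 = \|\mathbf{x}\|^2$.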
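The conversion from the spherical coefficients to the Cartesian ones can be checked numerically, since by Eq. (4.97) each spherical component simply multiplies the corresponding unit vector $e_\alpha$ of Eq. (4.45). This sketch sets the Hamiltonian drive to zero, evaluates $\alpha(r)$ and $\beta(r)$ as read from Eqs. (4.89) and (4.90), and takes the radial conditioning coefficient as $b^r = \sqrt{\kappa}(1-r^2)\cos\theta$, the sign obtained by multiplying Eq. (4.83) by the positive inverse-metric entry $g^{rr}$. The parameter values are arbitrary.

```python
import numpy as np

def alpha(r, n):
    """alpha(r) of Eq. (4.89)."""
    s = r**2
    return (n * (s - 1) + 2 * (n - 1) / (1 + n * s)
            - n * (n - 1) * (1 + s) * s / (2 * (1 + n * s))
            + 2 * (n - 2) * (n - 1) * s / ((1 + s) * (1 + n * s)))

def beta(r, n):
    """beta(r) of Eq. (4.90)."""
    return n - 2 * (n - 1) / (1 + r**2)

def unit_vectors(theta, phi):
    """Rows e_r, e_theta, e_phi of Eq. (4.45)."""
    return np.array([[np.sin(theta) * np.cos(phi), np.sin(theta) * np.sin(phi), np.cos(theta)],
                     [np.cos(theta) * np.cos(phi), np.cos(theta) * np.sin(phi), -np.sin(theta)],
                     [-np.sin(phi), np.cos(phi), 0.0]])

kappa, n = 0.8, 5
r, theta, phi = 0.6, 1.0, 0.4
E = unit_vectors(theta, phi)
x = r * E[0]                                               # Cartesian Bloch vector

# Spherical components from Eqs. (4.87) and (4.92), with the drive f set to zero.
b_sph = np.sqrt(kappa) * np.array([(1 - r**2) * np.cos(theta), -np.sin(theta), 0.0])
a_sph = kappa * np.array([-alpha(r, n) * r * np.cos(theta)**2,
                          -0.5 * beta(r, n) * r * np.sin(2 * theta), 0.0])
b_cart = b_sph @ E       # sum over alpha of b^alpha e_alpha
a_cart = a_sph @ E
```

The comparison shows that the spherical coefficients map onto Cartesian expressions in which $\alpha$ and $\beta$ enter through the combination $(\alpha+\beta)/r^2$, the same combination that defines $\gamma(r)$ in Eq. (4.104).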

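To complete the derivation we convert these Stratonovich equations back to Ito form. The Ito drift $\tilde a^i$ differs from the Stratonovich drift $a^i$ by the multivariable correction

$$
\Delta a^i(\mathbf{x}) = \tilde a^i(\mathbf{x},t) - a^i(\mathbf{x},t) = \tfrac{1}{2}\,b^j(\mathbf{x})\,\frac{\partial b^i(\mathbf{x})}{\partial x^j} . \tag{4.101}
$$

Substituting $b^i(\mathbf{x})$ into this formula, the Ito corrections simplify to

$$
\begin{aligned}
\Delta a^1(\mathbf{x}) &= \kappa x^1\left((x^3)^2 - \tfrac{1}{2}\right), \\
\Delta a^2(\mathbf{x}) &= \kappa x^2\left((x^3)^2 - \tfrac{1}{2}\right), \\
\Delta a^3(\mathbf{x}) &= \kappa x^3\left((x^3)^2 - 1\right).
\end{aligned} \tag{4.102}
$$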
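The Ito corrections follow from Eq. (4.101) by differentiating the stochastic coefficients of Eq. (4.100). A short sketch that evaluates $\tfrac{1}{2}b^j\,\partial_j b^i$ with a central-difference Jacobian, at an arbitrary test point and for an arbitrary $\kappa$, reproduces Eq. (4.102).

```python
import numpy as np

def b(x, kappa):
    """Stochastic coefficients b^i(x) of Eq. (4.100)."""
    x1, x2, x3 = x
    return np.sqrt(kappa) * np.array([-x1 * x3, -x2 * x3, 1.0 - x3**2])

def ito_correction(x, kappa, h=1e-6):
    """Delta a^i = (1/2) b^j d b^i / d x^j, Eq. (4.101), via central differences."""
    jac = np.empty((3, 3))                   # jac[i, j] = d b^i / d x^j
    for j in range(3):
        e = np.zeros(3)
        e[j] = h
        jac[:, j] = (b(x + e, kappa) - b(x - e, kappa)) / (2 * h)
    return 0.5 * jac @ b(x, kappa)

kappa = 0.7
x = np.array([0.3, -0.2, 0.5])
delta_a = ito_correction(x, kappa)
```

Because each $b^i$ is quadratic in $\mathbf{x}$, the central difference is exact up to rounding error.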

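By adding these terms to the Stratonovich drift coefficients in Eq. (4.99), we find the Ito drift coefficients

$$
\begin{aligned}
\tilde a^1(\mathbf{x},t) &= f^2(t)x^3 - f^3(t)x^2 - \tfrac{1}{2}\kappa x^1 + \kappa\,\gamma(r)\,x^1(x^3)^2, \\
\tilde a^2(\mathbf{x},t) &= f^3(t)x^1 - f^1(t)x^3 - \tfrac{1}{2}\kappa x^2 + \kappa\,\gamma(r)\,x^2(x^3)^2, \\
\tilde a^3(\mathbf{x},t) &= f^1(t)x^2 - f^2(t)x^1 + \kappa\left(\beta(r)-1\right)x^3 + \kappa\,\gamma(r)\,(x^3)^3,
\end{aligned} \tag{4.103}
$$

where we defined the new coefficient function $\gamma(r)$ as

$$
\gamma(r) \equiv 1 - \frac{\alpha(r)+\beta(r)}{r^2} = \left(1-r^2\right)\left(\frac{n(n+1)}{2\left(1+nr^2\right)} - \frac{1}{1+r^2}\right), \tag{4.104}
$$

and the coefficient $\beta(r)$ is given in Eq. (4.90). Note that for any $n$ and $0 \le r \le 1$, the coefficient $\gamma(r)$ is nonnegative. Its zeros, however, correspond to two special cases, which we discuss next.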
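Because Eq. (4.104) packs a fair amount of algebra into one line, here is a quick numerical sketch (arbitrary grids, with $\alpha$ and $\beta$ as read from Eqs. (4.89) and (4.90)) that the two forms of $\gamma(r)$ agree, that $\gamma(r) \ge 0$ on $0 < r \le 1$, and that it vanishes for $n = 1$ and at $r = 1$. The same check confirms $\beta(1) = 1$ and $\alpha(1) = 0$, so together with $b^r$ in Eq. (4.87) the radial coefficients vanish at $r = 1$.

```python
import numpy as np

def alpha(r, n):
    """alpha(r) of Eq. (4.89)."""
    s = r**2
    return (n * (s - 1) + 2 * (n - 1) / (1 + n * s)
            - n * (n - 1) * (1 + s) * s / (2 * (1 + n * s))
            + 2 * (n - 2) * (n - 1) * s / ((1 + s) * (1 + n * s)))

def beta(r, n):
    """beta(r) of Eq. (4.90)."""
    return n - 2 * (n - 1) / (1 + r**2)

def gamma_from_definition(r, n):
    """First form of Eq. (4.104): 1 - (alpha + beta) / r^2."""
    return 1.0 - (alpha(r, n) + beta(r, n)) / r**2

def gamma_closed_form(r, n):
    """Second form of Eq. (4.104)."""
    s = r**2
    return (1 - s) * (n * (n + 1) / (2 * (1 + n * s)) - 1 / (1 + s))

r = np.linspace(0.05, 1.0, 200)   # r = 0 excluded: the first form divides by r^2
ns = range(1, 9)
```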

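Combining these Ito drift coefficients with the stochastic coefficients $b^i(\mathbf{x})$ of Eq. (4.100), the projection filtering equations are

$$
\begin{aligned}
dx^1_t &= \tilde a^1(\mathbf{x}_t,t)\,dt - \sqrt{\kappa}\,x^1_tx^3_t\,dw_t, \\
dx^2_t &= \tilde a^2(\mathbf{x}_t,t)\,dt - \sqrt{\kappa}\,x^2_tx^3_t\,dw_t, \\
dx^3_t &= \tilde a^3(\mathbf{x}_t,t)\,dt + \sqrt{\kappa}\left(1-(x^3_t)^2\right)dw_t .
\end{aligned} \tag{4.105}
$$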
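A minimal Euler-Maruyama sketch of the Ito filter (4.105), using the Ito drift of Eq. (4.103) and the diffusion of Eq. (4.100). The constant drive $f$, the values of $\kappa$, $n$, the step size, and the use of simulated Wiener increments in place of a measured innovation are all illustrative assumptions. The cubic term of $\tilde a^3$ is taken as $+\kappa\gamma(r)(x^3)^3$, which is what adding $\Delta a^3$ from Eq. (4.102) to $a^3$ from Eq. (4.99) gives. Euler-Maruyama does not exactly preserve $|\mathbf{x}| \le 1$, so small steps are advisable.

```python
import numpy as np

def beta(r2, n):
    """beta of Eq. (4.90), written in terms of r^2."""
    return n - 2 * (n - 1) / (1 + r2)

def gamma(r2, n):
    """Closed form of gamma, Eq. (4.104), in terms of r^2."""
    return (1 - r2) * (n * (n + 1) / (2 * (1 + n * r2)) - 1 / (1 + r2))

def ito_drift(x, f, kappa, n):
    """Ito drift of Eq. (4.103) for a constant drive f = (f^1, f^2, f^3)."""
    x1, x2, x3 = x
    r2 = x @ x
    g, bt = gamma(r2, n), beta(r2, n)
    rotation = np.cross(f, x)   # (f^2 x^3 - f^3 x^2, f^3 x^1 - f^1 x^3, f^1 x^2 - f^2 x^1)
    return rotation + kappa * np.array([-0.5 * x1 + g * x1 * x3**2,
                                        -0.5 * x2 + g * x2 * x3**2,
                                        (bt - 1) * x3 + g * x3**3])

def diffusion(x, kappa):
    """Stochastic coefficients of Eq. (4.100)."""
    x1, x2, x3 = x
    return np.sqrt(kappa) * np.array([-x1 * x3, -x2 * x3, 1 - x3**2])

def simulate_filter(x0, f, kappa, n, T=2.0, steps=20_000, seed=0):
    """Euler-Maruyama integration of Eq. (4.105) with simulated Wiener increments."""
    rng = np.random.default_rng(seed)
    dt = T / steps
    x = np.array(x0, dtype=float)
    path = [x.copy()]
    for dw in rng.normal(0.0, np.sqrt(dt), size=steps):
        x = x + ito_drift(x, f, kappa, n) * dt + diffusion(x, kappa) * dw
        path.append(x.copy())
    return np.array(path)

path = simulate_filter(x0=[0.5, 0.0, 0.0], f=np.array([0.0, 0.0, 1.0]), kappa=1.0, n=10)
```

With real data, `dw` would be replaced by the innovation increment built from the measured current, $dv_t = dy_t - \sqrt{\kappa}\,n\,x^3_t\,dt$.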

4.5.1 Special cases for the projection filter

There are two especially interesting special cases for the projection filter. The first is a single qubit, and the second is a pure state. We have already shown that, when n = 1, the metric is simply Euclidean (up to a factor of one half). We therefore expect the projection filtering equations to simplify dramatically for a single qubit. The first thing to notice is that, when n = 1, the β(r) coefficient of Eq. (4.90) simplifies to

$$ \beta(r) = 1 \quad \text{for } n = 1. \tag{4.106} $$

Furthermore, for n = 1 the γ(r) coefficient of Eq. (4.104) simplifies to

$$ \gamma(r) = 0 \quad \text{for } n = 1. \tag{4.107} $$

When we evaluate the projection filter for pure states, we arrive at remarkably similar results. Namely, for any n = 1, 2, . . . we also have

$$ \beta(r=1) = 1 \quad \text{and} \quad \gamma(r=1) = 0. $$

These simplifications mean that, for a (possibly mixed) single qubit or for any pure multi-qubit state, the projection filtering equations are simply

$$ \begin{aligned} dx^1_t &= \bigl(f^2(t)\,x^3_t - f^3(t)\,x^2_t - \tfrac{1}{2}\kappa\,x^1_t\bigr)\,dt - \sqrt{\kappa}\,x^1_t x^3_t\,dw_t,\\ dx^2_t &= \bigl(f^3(t)\,x^1_t - f^1(t)\,x^3_t - \tfrac{1}{2}\kappa\,x^2_t\bigr)\,dt - \sqrt{\kappa}\,x^2_t x^3_t\,dw_t,\\ dx^3_t &= \bigl(f^1(t)\,x^2_t - f^2(t)\,x^1_t\bigr)\,dt + \sqrt{\kappa}\,\bigl(1-(x^3_t)^2\bigr)\,dw_t. \end{aligned} \tag{4.108} $$

It is an interesting exercise to verify that computing the Heisenberg-picture filtering equations for the Pauli operators, πt(σi), yields these very same three coupled SDEs.

The pure-state evaluation shows that, when r = 1, the dynamics of the separable system no longer depend on the total number of qubits: the system evolves simply as n identical copies of a single-qubit state. The only remaining dependence on n is in the innovation process, where $dw_t \to dv_t = dy_t - \sqrt{\kappa}\,n\,x^3_t\,dt$ and $dy_t$ is the integrated measurement current over the interval [t, t + dt].

One final question about the pure-state projection filter is: will the filter remain pure once it becomes pure? In other words, is r = 1 a trap for the system, so that if $r_s = 1$ for some time s, then $r_t = 1$ for all t ≥ s? The physics of the situation dictates that it must, since with no sources of decoherence the state purifies and stays pure. However, this is difficult to see simply by inspecting Eq. (4.108). Were it not the case, it would likely indicate an error in the model. Thankfully the answer is decidedly yes. This can be seen in the spherical-basis representation of dρt, with the Stratonovich coefficients a(x, t) and b(x, t) given in Eqs. (4.92) and (4.87). From these two equations we see that

$$ a^r(x,t) = -\kappa\,\alpha(r)\,r\cos^2\theta \tag{4.109} $$

and that

$$ b^r(x,t) = -\sqrt{\kappa}\,(1-r^2)\cos\theta. \tag{4.110} $$

Substituting r = 1 into Eq. (4.89), we find that α(r = 1) = 0. This means that $a^r(x,t)|_{r=1} = 0$ and $b^r(x,t)|_{r=1} = 0$. Following this logic to its conclusion, we find that

$$ d\rho\big|_{r=1} = a^\theta(x,t)\big|_{r=1}D_\theta\,dt + b^\theta(x,t)\big|_{r=1}D_\theta\,dw_t + a^\phi(x,t)\big|_{r=1}D_\phi\,dt + b^\phi(x,t)\big|_{r=1}D_\phi\,dw_t. \tag{4.111} $$

In other words, evaluated at r = 1, dρ has no component along the tangent vector D_r. The state therefore remains on the submanifold of identical separable states defined by r = 1.
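The same conclusion can be cross-checked directly in Cartesian coordinates, without the spherical basis. The short worked derivation below uses only Eqs. (4.100) and (4.108) and Itô's rule $d(x^ix^i) = 2x^i\,dx^i + dx^i\,dx^i$ with $dw_t^2 = dt$. It gives the evolution of $r^2 = \|x\|^2$. The rotation terms drop out because $x\cdot(f\times x) = 0$.

$$
\begin{aligned}
\sum_i x^i b^i &= -\sqrt{\kappa}\,x^3\bigl[(x^1)^2+(x^2)^2\bigr] + \sqrt{\kappa}\,x^3\bigl(1-(x^3)^2\bigr) = \sqrt{\kappa}\,x^3\,(1-r^2),\\
\sum_i (b^i)^2 &= \kappa\,(x^3)^2\bigl[(x^1)^2+(x^2)^2\bigr] + \kappa\bigl(1-(x^3)^2\bigr)^2,\\
2\sum_i x^i\,(\text{drift of } x^i) &= -\kappa\bigl[(x^1)^2+(x^2)^2\bigr],\\
d(r^2) &= \kappa\,(1-r^2)\bigl(1-(x^3)^2\bigr)\,dt + 2\sqrt{\kappa}\,x^3\,(1-r^2)\,dw_t .
\end{aligned}
$$

Both coefficients carry the factor $(1-r^2)$, so $d(r^2) = 0$ whenever r = 1. For n = 1, where Eq. (4.108) holds at every radius, the drift term is also nonnegative inside the Bloch ball. This is consistent with a single qubit purifying under the measurement.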

4.6 Simulations and Performance

The primary purpose for deriving the projection filter was to find a reduced-dimensional description of a system of qubits undergoing a continuous measurement of the collective angular momentum Jz. However, we know that the end result of a continuous measurement of Jz, absent any other influence, is an eigenstate of Jz, a so-called Dicke state. With the exception of the stretched states, Dicke states are not separable. Even after a relatively short time, the reduction in the uncertainty of the Jz spin component results in a nonclassical spin squeezed state, a phenomenon observed in several experiments [8–10].

In this section we test through numerical simulation how well the projection filter reproduces certain properties of the exact conditional state. We show that, when strong randomized external control fields are added during the collective measurement, the joint state remains highly separable. The approximate description given by the projection filter then reproduces collective expectation values much more faithfully.

Chap. 5 uses the projection filter in an algorithm to reconstruct the initial condition of a SCS from a continuous measurement of Jz, characterized by the rate κ. In order to obtain information about observables other than Jz, an external control Hamiltonian must be applied. For reasons discussed in Sec. 5.3.1, this takes the form of a sequence of global π/2 rotations. Each rotation is about an axis n that is independently sampled from a uniform (isotropic) distribution. We therefore test the performance of the projection filter with this control law in mind.

Fully characterizing the control amplitude f(t) requires specifying the amplitude and duration of each pulse, since a larger Larmor frequency is needed to enact the same rotation in a shorter time. For simplicity, we fix f(t) to have a constant magnitude and vary only its direction. This constrains the π/2 rotations to be square-wave pulses, each of duration τ,

$$ f(t) = \frac{\pi}{2\tau}\sum_{m\ge 1}\chi_{[m-1,m)}(t/\tau)\,\mathbf{n}_m, \tag{4.112} $$

where $\chi_{[a,b)}(t)$ is the indicator function for the interval [a, b) and the $\mathbf{n}_m$ are i.i.d. unit vectors drawn from an isotropic distribution.
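One possible way to generate the control law of Eq. (4.112) is sketched below in Python, under these assumptions:

- The isotropic axes come from normalizing three i.i.d. Gaussian components.
- The amplitude π/(2τ) assumes the Bloch vector rotates as dx/dt = f × x, the convention of the rotation terms in Eq. (4.108). Each pulse of duration τ then rotates the state by π/2.
- The helper names are hypothetical, and setting f = 0 after the last pulse is an illustrative choice.

```python
import math
import random

def random_unit_vector(rng):
    """Isotropic unit vector from three i.i.d. standard normal components."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:
            return [c / norm for c in v]

def make_control(tau, num_pulses, seed=0):
    """Piecewise-constant f(t) of Eq. (4.112): magnitude pi/(2 tau) along n_m
    on [(m-1) tau, m tau); zero outside the pulse train (assumption)."""
    rng = random.Random(seed)
    axes = [random_unit_vector(rng) for _ in range(num_pulses)]
    amplitude = math.pi / (2.0 * tau)

    def f(t):
        m = int(math.floor(t / tau))       # zero-based pulse index
        if m < 0 or m >= num_pulses:
            return [0.0, 0.0, 0.0]
        return [amplitude * c for c in axes[m]]

    return f, axes

tau, t_final = 5e-3, 0.2                    # values quoted in Sec. 4.6.1, units of 1/kappa
num_pulses = int(round(t_final / tau))      # 40 pulses
f, axes = make_control(tau, num_pulses)
```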

To simulate the exact dynamics efficiently we exploit two conserved quantities. The first is that, because the system Hamiltonian Ht and the measurement operator L commute with J², the total angular momentum of the atomic system is conserved. Furthermore, the states we ultimately use are all initialized with maximal projection of angular momentum along some direction. They therefore always possess n/2 units of angular momentum. This allows us to restrict the simulations to a space of dimension d = n + 1. In other words, we simulate a single spin-J system with J = n/2.
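The conservation argument can be checked by brute force for a few qubits. The Python/NumPy sketch below makes three checks:

- It builds the collective operators J_i = ½ Σ_j σ_i^(j) on the full 2ⁿ-dimensional space (n = 4 here, an arbitrary choice).
- It verifies that J² commutes with J_z and J_x, and hence with L and with any control f · J.
- It checks that the largest eigenvalue J(J + 1), with J = n/2, occurs with multiplicity n + 1. That is the dimension of the symmetric sector used in the simulations.

Names are illustrative.

```python
import numpy as np
from functools import reduce

PAULI = {
    "x": np.array([[0, 1], [1, 0]], dtype=complex),
    "y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def collective(axis, n):
    """J_i = (1/2) sum_j sigma_i^(j) acting on n qubits (dimension 2**n)."""
    eye = np.eye(2, dtype=complex)
    total = np.zeros((2**n, 2**n), dtype=complex)
    for j in range(n):
        factors = [PAULI[axis] if k == j else eye for k in range(n)]
        total += reduce(np.kron, factors)
    return 0.5 * total

def comm(A, B):
    return A @ B - B @ A

n = 4
Jx, Jy, Jz = (collective(a, n) for a in "xyz")
J2 = Jx @ Jx + Jy @ Jy + Jz @ Jz

# J^2 commutes with the measured Jz and with any rotation generator.
print(np.abs(comm(J2, Jz)).max(), np.abs(comm(J2, Jx)).max())
# The largest eigenvalue is J(J+1) with J = n/2, i.e. 6 for n = 4.
evals = np.linalg.eigvalsh(J2)
print(evals.max(), (n / 2) * (n / 2 + 1))
```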

The second conservation property we use is that, without any additional sources of decoherence, the conditional master equation maps pure quantum states to pure quantum states. Sec. 3.4.3 discusses the conditional Schrödinger equation (CSE) and how it can be derived from a conditional master equation (CME). Using a CSE generates significant computational savings, since each time step propagates a single complex vector rather than a complex matrix. The general form of the CSE is given in Eq. (3.94). In our case, with L = √κ Jz for real, positive κ, this simplifies to

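$$ d|\psi_t\rangle = \Bigl(-iH_t - \tfrac{1}{2}\kappa\,\bigl(J_z - \langle J_z\rangle\bigr)^2\Bigr)|\psi_t\rangle\,dt + \sqrt{\kappa}\,\bigl(J_z - \langle J_z\rangle\bigr)|\psi_t\rangle\,dw_t. \tag{4.113} $$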
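For concreteness, here is a minimal Python/NumPy sketch of Eq. (4.113) for an uncontrolled run like Fig. 4.1 (n = 50, J = 25). Two simplifications apply:

- It uses an Euler-Maruyama step followed by renormalization. This is a simpler stand-in for the weak second-order predictor-corrector integrator described in Sec. 4.6.1, not a reproduction of it.
- It builds the +e_x SCS from exact binomial coefficients.

Function names, the step size and the random seed are illustrative assumptions.

```python
import numpy as np
from math import comb

def spin_ops(two_j):
    """Jx, Jy, Jz for spin J = two_j/2 in the Jz basis ordered m = J, J-1, ..., -J."""
    J = two_j / 2.0
    m = J - np.arange(two_j + 1)
    # <m+1|J+|m> = sqrt(J(J+1) - m(m+1)); J+ maps basis index k+1 to index k
    jp = np.sqrt(J * (J + 1) - m[1:] * (m[1:] + 1))
    Jp = np.diag(jp, k=1).astype(complex)
    Jx = (Jp + Jp.conj().T) / 2.0
    Jy = (Jp - Jp.conj().T) / 2.0j
    Jz = np.diag(m).astype(complex)
    return Jx, Jy, Jz

def scs_plus_x(two_j):
    """SCS along +e_x: amplitudes 2^(-J) sqrt(binom(2J, J+m))."""
    amps = np.array([comb(two_j, k) for k in range(two_j + 1)], dtype=float)
    return np.sqrt(amps / 2.0**two_j).astype(complex)

def cse_step(psi, H, jz, kappa, dt, dw):
    """One Euler-Maruyama step of Eq. (4.113) (jz = diagonal of Jz), then renormalize."""
    mean = np.real(np.vdot(psi, jz * psi))
    shifted = (jz - mean) * psi
    dpsi = (-1j * (H @ psi) - 0.5 * kappa * (jz - mean) * shifted) * dt \
        + np.sqrt(kappa) * shifted * dw
    psi = psi + dpsi
    return psi / np.linalg.norm(psi)

two_j = 50                                  # n = 50 qubits, J = 25
Jx, Jy, Jz = spin_ops(two_j)
jz = np.real(np.diag(Jz))
psi = scs_plus_x(two_j)
jx0 = np.real(np.vdot(psi, Jx @ psi))       # equals J = 25 up to rounding
kappa, dt, t_final = 1.0, 1e-4, 0.2
H = np.zeros_like(Jx)                       # no control field
rng = np.random.default_rng(1)
for _ in range(int(round(t_final / dt))):
    psi = cse_step(psi, H, jz, kappa, dt, rng.normal(0.0, np.sqrt(dt)))
var_z = np.real(np.vdot(psi, jz**2 * psi)) - np.real(np.vdot(psi, jz * psi))**2
print(jx0, var_z)
```

The measurement here is a QND measurement of J_z. Under a Gaussian approximation the conditional variance obeys $d\langle\Delta J_z^2\rangle = -4\kappa\langle\Delta J_z^2\rangle^2\,dt$, so $\langle\Delta J_z^2\rangle(t) \approx (J/2)/(1+2\kappa J t)$. For these parameters the final variance should therefore be of order one, far below the SCS value J/2 = 12.5.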

4.6.1 Simulation parameters

In the absence of the control Hamiltonian Ht, the one universal timescale in the CSE is set by the measurement strength, i.e., the characteristic time κ⁻¹. All simulations are therefore reported in units of this characteristic time. The total qubit numbers we test range from 25 to 100, so the simulations have collective spin values 12.5 ≤ J ≤ 50. In addition to these collective spins, we compare the projection filtering equations with the exact simulations for a single qubit, checking that they generate the same dynamics.

The remaining parameters, namely the gate duration τ and the fixed terminal simulation time tf, are chosen to match the parameters ultimately used in Chap. 5: τ = 5 × 10⁻³ κ⁻¹ and tf = 0.2 κ⁻¹.

The simulations are implemented in the MATLAB computing environment using a hand-coded, weak second-order predictor-corrector stochastic differential equation integrator. The algorithm is described by Kloeden et al. [69, page 200] and was implemented in MATLAB by Brad Chase for his PhD dissertation [37].

4.6.2 Spin squeezing comparisons

Spin squeezing is a much sought-after and well-studied effect in atomic spin ensembles. The general phenomenon is a reduction of the uncertainty in a spin component transverse to the mean spin direction. Because of the Heisenberg uncertainty relation, this reduction is accompanied by an increase in the uncertainty of the orthogonal quadrature.

The standard example is a collective spin system composed of n qubits, initialized in a SCS pointing along the $+e_x$ direction. This state is clearly an eigenstate of Jx with eigenvalue m_x = J = n/2. It is also easy to show that this state is a minimum-uncertainty state, i.e., it saturates the Heisenberg uncertainty relation

$$ \langle \Delta J_y^2\rangle\,\langle \Delta J_z^2\rangle \ge \tfrac{1}{4}\langle J_x\rangle^2 \tag{4.114} $$

with equal uncertainties $\langle \Delta J_z^2\rangle = \langle \Delta J_y^2\rangle = J/2$. An example of a spin squeezed state is a state that is still mostly polarized along the $e_x$ axis but also contains quantum correlations, so that $\langle \Delta J_z^2\rangle < \langle \Delta J_y^2\rangle$ while Eq. (4.114) is still satisfied.

A spin squeezed state is one of the immediate consequences of a continuous measurement of a collective angular momentum variable, such as Jz [71], for a SCS prepared transverse to the measurement axis. A number of papers have investigated the relation between spin squeezing and various measures of entanglement (see, e.g., [72] and references therein). Yin et al. showed that one particular measure of spin squeezing, $\xi_T^2$, is directly related to the concurrence, a measure of pairwise entanglement [72]. They show that when the concurrence C is greater than zero, indicating entanglement, then $\xi_T^2 < 1$. Conversely, when $\xi_T^2 \ge 1$, C = 0 and the state has no pairwise entanglement. $\xi_T^2$ is defined as follows.

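For the components of angular momentum we can compose the symmetrized correlation and covariance matrices (i, j = x, y, z)

$$ \mathrm{Corr}_{i,j} = \tfrac{1}{2}\langle J_iJ_j + J_jJ_i\rangle, \tag{4.115} $$

$$ \mathrm{Covar}_{i,j} = \mathrm{Corr}_{i,j} - \langle J_i\rangle\langle J_j\rangle. \tag{4.116} $$

From these matrices we can also form the Hermitian matrix

$$ \Gamma = (n-1)\,\mathrm{Covar} + \mathrm{Corr}. \tag{4.117} $$

The squeezing parameter $\xi_T^2$ is then defined as

$$ \xi_T^2 \equiv \frac{\lambda_{\min}}{\langle J^2\rangle - n/2}, \tag{4.118} $$

where λ_min is the minimum eigenvalue of the matrix Γ.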
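The definition in Eqs. (4.115)-(4.118) translates directly into a few lines of Python/NumPy. The sketch below (names hypothetical) evaluates $\xi_T^2$ for a pure state in the spin-J representation and checks two limiting cases. An SCS gives $\xi_T^2 = 1$, since every eigenvalue of Γ equals n²/4 = ⟨J²⟩ − n/2. The m = 0 Dicke state has ⟨J_z²⟩ = 0 and gives $\xi_T^2 = 0$.

```python
import numpy as np
from math import comb

def spin_ops(two_j):
    """Jx, Jy, Jz for spin J = two_j/2 in the Jz basis ordered m = J, ..., -J."""
    J = two_j / 2.0
    m = J - np.arange(two_j + 1)
    jp = np.sqrt(J * (J + 1) - m[1:] * (m[1:] + 1))
    Jp = np.diag(jp, k=1).astype(complex)
    return (Jp + Jp.conj().T) / 2.0, (Jp - Jp.conj().T) / 2.0j, np.diag(m).astype(complex)

def xi2_T(psi, ops, n):
    """Squeezing parameter of Eqs. (4.115)-(4.118) for a pure state psi of n qubits."""
    means = np.array([np.real(np.vdot(psi, A @ psi)) for A in ops])
    corr = np.empty((3, 3))
    for i, A in enumerate(ops):
        for j, B in enumerate(ops):
            corr[i, j] = 0.5 * np.real(np.vdot(psi, (A @ B + B @ A) @ psi))
    covar = corr - np.outer(means, means)
    gamma = (n - 1) * covar + corr
    lam_min = np.linalg.eigvalsh(gamma).min()
    j2 = np.trace(corr)                     # <Jx^2 + Jy^2 + Jz^2>
    return lam_min / (j2 - n / 2.0)

n = 50
ops = spin_ops(n)
scs = np.sqrt(np.array([comb(n, k) for k in range(n + 1)], dtype=float) / 2.0**n).astype(complex)
dicke = np.zeros(n + 1, dtype=complex)
dicke[n // 2] = 1.0                         # Jz eigenstate with m = 0
print(xi2_T(scs, ops, n), xi2_T(dicke, ops, n))
```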

4.6.3 Squeezing simulations

This section presents simulations that benchmark the typical effect of the measurement on the states of interest, as well as how much the control law mitigates these effects. We test five classes of states, composed of n = 1, 25, 50, 75 and 100 qubits. The single-qubit case is included as a control, testing that the numerics produce reasonable results.

As discussed previously, a continuous measurement of Jz is a standard protocol for producing a spin squeezed state. However, a spin coherent state (SCS) prepared along the $e_z$ axis does not squeeze at all, while states prepared in the equatorial plane squeeze the most. Therefore, to demonstrate the maximum amount of quantum correlations a typical measurement realization can produce, a natural choice is a SCS prepared along the $e_x$ axis.

Fig. 4.1 characterizes the typical results of a CSE simulation for a $+e_x$ SCS initial state containing n = 50 qubits (spin J = 25). Figs. 4.1.a-4.1.c show the conditional expectation values of Jx, Jy and Jz as a function of time. Also plotted are 1σ regions of confidence indicating the expected deviations from these mean values.

Figure 4.1: Typical Uncontrolled Evolution. CSE evolution for a $+e_x$ SCS initial state containing n = 50 qubits (J = 25). a-c: The conditional expectation values for Jx, Jy, and Jz vs time, including a 1σ region of confidence. d: The squeezing parameter $\xi_T^2$ in dB vs time. e: The Husimi Q-function $q_\psi(\theta,\phi)$ for the conditional state $|\psi_{t_f}\rangle$ at the final time tf = 0.2 κ⁻¹.

In other words, the grey regions are bounded by the values $\langle J_i\rangle \pm \sqrt{\langle \Delta J_i^2\rangle}$. Fig. 4.1.a shows how ⟨Jx⟩ tends to decrease as the state squeezes around the sphere and how its uncertainty grows. Fig. 4.1.b shows that ⟨Jy⟩ remains zero throughout the measurement, while its variance increases due to the characteristic anti-squeezing. Conversely, Fig. 4.1.c shows the decrease in $\langle\Delta J_z^2\rangle$ due to the squeezing, as well as the deviation of ⟨Jz⟩ from zero as the system evolves towards an eigenstate of Jz. Fig. 4.1.d plots the evolution of the squeezing parameter $\xi_T^2$, in dB, as a function of time (in decibels, $\xi_T^2$ is $10\log_{10}\xi_T^2$). Fig. 4.1.e shows a 3D plot of the Husimi Q-function quasi-probability distribution for the state at the final time tf. The Q-function for a pure spin state ψ with total angular momentum J is defined as

$$ q_\psi(\theta,\phi) \equiv \frac{2J+1}{4\pi}\,\bigl|\langle\theta,\phi|\psi\rangle\bigr|^2, \tag{4.119} $$

where |θ, φ⟩ is a SCS parameterized by the polar angles θ and φ. The constant factor ensures normalization. The color scale of Fig. 4.1.e has been normalized to the maximum possible value (2J + 1)/4π, which is ≈ 4.06 for J = 25. The Q-function reaches a maximum of only ∼2.2, which shows that this squeezed state has poor overlap with SCSs.
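A minimal Python/NumPy sketch of Eq. (4.119) follows. It assumes the common convention $|\theta,\phi\rangle = \sum_m \sqrt{\binom{2J}{J+m}}\cos^{J+m}(\theta/2)\sin^{J-m}(\theta/2)\,e^{i(J-m)\phi}|J,m\rangle$, which points along (sin θ cos φ, sin θ sin φ, cos θ); the phase convention used for the figures may differ. It checks two things: the peak value for a SCS is (2J + 1)/4π, and the Q-function integrates to one. It uses J = 5 for speed, and the function names are hypothetical.

```python
import numpy as np
from math import comb

def scs(two_j, theta, phi):
    """|theta, phi> in the Jz basis ordered m = J, ..., -J (index k = J - m)."""
    k = np.arange(two_j + 1)
    binom = np.array([comb(two_j, int(kk)) for kk in k], dtype=float)
    c, s = np.cos(theta / 2.0), np.sin(theta / 2.0)
    return np.sqrt(binom) * c**(two_j - k) * s**k * np.exp(1j * k * phi)

def q_function(psi, two_j, theta, phi):
    """Husimi Q-function of Eq. (4.119)."""
    overlap = np.vdot(scs(two_j, theta, phi), psi)
    return (two_j + 1) / (4.0 * np.pi) * abs(overlap) ** 2

two_j = 10
psi = scs(two_j, np.pi / 2, 0.0)            # SCS along +e_x
q_peak = q_function(psi, two_j, np.pi / 2, 0.0)

# Midpoint-rule check of the normalization: integral of q sin(theta) dtheta dphi = 1.
nt, nph = 60, 120
dth, dph = np.pi / nt, 2 * np.pi / nph
total = sum(q_function(psi, two_j, (a + 0.5) * dth, (b + 0.5) * dph) * np.sin((a + 0.5) * dth)
            for a in range(nt) for b in range(nph)) * dth * dph
print(q_peak, (two_j + 1) / (4 * np.pi), total)
```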

Figure 4.2: Typical Controlled Evolution. CSE evolution for a $+e_x$ SCS initial state containing n = 50 qubits, with 40 randomized π/2 rotations. a-c: The conditional expectation values for Jx, Jy, and Jz vs time, including a 1σ region of confidence. d: The squeezing parameter $\xi_T^2$ in dB vs time. e: The Husimi Q-function $q_\psi(\theta,\phi)$ for the conditional state $|\psi_{t_f}\rangle$ at the final time tf = 0.2 κ⁻¹.

Fig. 4.2 presents a typical realization for the same initial condition in the presence of the randomized π/2 rotations, with a duration of τ = 5 × 10⁻³ κ⁻¹. By the time tf = 0.2 κ⁻¹, this period yields a total of 40 rotations, one for every horizontal tick mark in Figs. 4.2.a-4.2.d. Figs. 4.2.a-4.2.c show the conditional expectation values ⟨Jx⟩, ⟨Jy⟩ and ⟨Jz⟩ as a function of time. In contrast to the uncontrolled example, these figures show little qualitative difference between the three expectation values. Fig. 4.2.d indicates a significant reduction in the amount of squeezing produced during the measurement compared with the uncontrolled system: $\xi^2_{T,\min} = -3.21$ dB in the presence of controls, while $\xi^2_{T,\min} = -10.1$ dB without them. The 1σ confidence regions indicate that this squeezing is not with respect to a fixed coordinate axis but is rotated among all three. There is therefore a substantial averaging effect that leads to far less squeezing than in the uncontrolled case. In fact, the minimum of $\xi_T^2$ is nearly a factor of 5 larger with the controls ($10^{-0.321} \approx 0.48$ versus $10^{-1.01} \approx 0.098$), so the controlled state is kept much more separable. This separability is reflected in the Q-function of the final state, shown in Fig. 4.2.e. It has a more circular appearance and near-perfect overlap with a spin coherent state, indicated by the maximum value ∼3.7.

The amount of squeezing is significantly reduced because the randomized controls tend to mix the squeezed and anti-squeezed components, so that on average their effects nearly cancel. Not only does the mean spin rotate, but the orientation of the squeezing ellipse also rotates. As the rotation axes are chosen from a uniform distribution, the squeezed component is just as likely as the anti-squeezed component to be oriented along the measurement axis. At any given time, the uncertainty in the Jz component is equally likely to be above or below that of an equivalent spin coherent state. It is therefore difficult for any significant squeezing to develop.

4.6.4 Projection filter simulations

Ultimately we need to assess how well the projection filter performs when it calculates an innovation from a measurement record $y_t$ that is not generated by a separable state. In this case $dw_t$ is actually given by the innovation process $dv_t = dy_t - \sqrt{\kappa}\,n\,x^3_t\,dt$, which in general is not a Wiener process. We make this comparison through two measures. The first is how well the projection filter reproduces the expectation values ⟨Jx⟩, ⟨Jy⟩ and ⟨Jz⟩ of the exact conditional state. The second is the fidelity (squared overlap) between the exact and approximate states. Fig. 4.3 compares the projection filter with the expectation values shown in Figs. 4.1 and 4.2.

Figure 4.3: Projection Filter Tracking example. Comparison between the projection filter predictions and the exact conditional expectation values for the simulations shown in Figs. 4.1 and 4.2. a-c: The conditional expectation values ⟨Ji⟩ for the exact uncontrolled state (blue) with the 1σ regions of confidence (grey). Also shown are the projection filtered values $\frac{n}{2}x^i_t$ (red). d: The squeezing in the exact uncontrolled state (blue) and the squeezing reported by the projection filter (red). e-h: Same as a-d but with the controls now applied.

Figs. 4.3.a-4.3.c and 4.3.e-4.3.g re-plot the true conditional expectation values ⟨Ji⟩, as well as the projection filter values, given simply by $\frac{n}{2}x^i_t$. Fig. 4.3.a shows that, as the uncontrolled state becomes significantly squeezed, ⟨Jx⟩ decreases accordingly. The projection filter is unable to account for this and therefore has a noticeable error. Fig. 4.3.c shows that, after a time t ∼ 0.05 κ⁻¹, the difference between the conditional expectation value ⟨Jz⟩ and the value calculated from the projection filter grows. These differences are in stark contrast to the tracking results in Figs. 4.3.e-4.3.g, where the differences in all three expectation values are almost entirely within the line thicknesses. Figs. 4.3.d and 4.3.h emphasize that the projection filter reports separable states, so its squeezing parameter $\xi_T^2$ remains fixed at 0 dB.

Beyond these two sample trajectories, we also test the quality of the projection filter for a variety of initial states and qubit numbers. This comparison is made in Fig. 4.4, which shows a trial-averaged RMS error between the projection filter and the exact conditional expectation values. We test five different qubit numbers, n ∈ {1, 25, 50, 75, 100}. For each n, the average is taken over ν = 100 input SCSs chosen at random from a uniform distribution over the Bloch sphere. The same Bloch angles are used for each n. For each input state we run a single simulation to compute the three exact conditional expectation values ⟨Ji⟩, as well as the projection filter Bloch components $x^i_t$. We then compute the RMS errors

$$ \mathrm{Err}_{J_i} \equiv \sqrt{\Bigl\langle \Bigl(\tfrac{1}{J}\langle\psi_t|J_i|\psi_t\rangle - x^i_t\Bigr)^2\Bigr\rangle_\nu} \tag{4.120} $$

as a function of time, where $\langle\cdot\rangle_\nu$ denotes the arithmetic mean over the ν trials. Normalizing the exact expectation value by J means that $\mathrm{Err}_{J_i}$ is expressed in units of the total spin length J. With the exception of the single-qubit case (which shows only numerical integration error), the scaled RMS errors are relatively independent of the number of qubits. This is likely because, when the system is in a pure state, the projection filtering equations are independent of n (see Sec. 4.5.1).
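At a single time, Eq. (4.120) amounts to the following Python sketch. The function name and the numbers in the usage line are hypothetical.

```python
import math

def rms_tracking_error(exact_expectations, filter_components, J):
    """Err_{J_i} of Eq. (4.120) at a single time.

    exact_expectations: <psi_t|J_i|psi_t> for each of the nu trials
    filter_components:  projection-filter Bloch components x^i_t for the same trials
    """
    nu = len(exact_expectations)
    squared = [(e / J - x) ** 2 for e, x in zip(exact_expectations, filter_components)]
    return math.sqrt(sum(squared) / nu)

# Hypothetical values for nu = 2 trials with J = 25.
err = rms_tracking_error([25.0, 0.0], [1.0, 0.1], J=25.0)
print(err)   # sqrt((0**2 + 0.1**2) / 2), about 0.0707
```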

However, the presence of the strong randomized controls has a significant effect.

Figure 4.4: Average RMS Tracking error vs time (panel titles "RMS error − No Control" and "RMS error − Control"; time axis from 0 to 0.2 κ⁻¹, error axis from 0 to 0.1). The left column shows $\mathrm{Err}_{J_i}$, with i = x, y, z (in descending order), in the case of no control fields. The right column shows the same in the presence of control fields. The average is over ν = 100 uniformly random Bloch angles with a single noise realization per state. For each Bloch vector, the RMS error is computed for n = 1 (blue), n = 25 (green), n = 50 (red), n = 75 (cyan) and n = 100 (purple) qubits.

Without the controls, Fig. 4.4 shows a nearly linear increase in $\mathrm{Err}_{J_x}$ and $\mathrm{Err}_{J_y}$. In Fig. 4.3.a, the projection filter was unable to track the decrease in the ⟨Jx⟩ component as the uncontrolled state became squeezed and developed significant curvature on the sphere. Were the system initialized in a $+e_y$ spin coherent state, the roles of Jx and Jy would be reversed, with the same behavior. We attribute the increase in $\mathrm{Err}_{J_x}$ and $\mathrm{Err}_{J_y}$ to this effect. In contrast, when the randomized controls are applied, the RMS error is equally distributed across all expectation values and remains ≲ 5% of the total spin length. $\mathrm{Err}_{J_z}$ is also significantly worse in the uncontrolled case. The ∼1% error at time t = 0 in the n = 100 simulations is attributed to using Stirling's approximation to calculate the Jz basis coefficients of the initial SCS.

Figure 4.5: Average Projection Filter Fidelities (panel titles "Projection Filter Fidelity − No Control" and "Projection Filter Fidelity − Control"; horizontal axis κt from 0 to 0.2). These plots show the average fidelity between the exact CSE simulation and the SCS given by the projection filter. The left plot shows the fidelity in the uncontrolled case, and the right plot adds the 40 π/2 gates per simulation. The average is over ν = 100 uniformly random Bloch angles and a single noise realization per state. For each Bloch vector, the average fidelity is computed for n = 1 (blue), n = 25 (green), n = 50 (red), n = 75 (cyan) and n = 100 (purple) qubits.

These results indicate that the projection filter performs well in the presence of rapid, randomized rotations. However, an arbitrary spin state with J > 1/2 contains more information than simply three expectation values. To characterize the general performance, we turn to the second comparison and calculate the average fidelity between the exact state and the SCS given by the projection filter. Fig. 4.5 makes this comparison, averaged over ν = 100 uniformly sampled states, both with and without controls and again for n = 1, 25, 50, 75, 100 qubits. For n > 1 the overall state fidelity degrades as the number of qubits increases, indicating an increase in the squeezing produced during the fixed measurement duration. In the worst case, with n = 100 and no controls applied, the average fidelity reaches a minimum value of ∼0.47. With the controls, however, the fidelity is > 0.80 for every n. The non-monotonic decrease in the controlled fidelity suggests that the specifics of the control law affect this fidelity. It might therefore be possible to optimize the control law so that the average fidelity is maximized.
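The fidelity comparison of Fig. 4.5 requires mapping the projection-filter Bloch vector to a SCS. One way to do this is sketched below in Python/NumPy. It assumes the convention $|\theta,\phi\rangle = \sum_m \sqrt{\binom{2J}{J+m}}\cos^{J+m}(\theta/2)\sin^{J-m}(\theta/2)\,e^{i(J-m)\phi}|J,m\rangle$, a common choice that may not match the one used for the figures. As a check, two n-qubit SCSs separated by an angle Θ are products of n identical qubit states, so their fidelity is $((1+\cos\Theta)/2)^n$. The helper names are hypothetical, and the "exact" state here is just a SCS stand-in.

```python
import numpy as np
from math import comb

def scs(two_j, theta, phi):
    """|theta, phi> in the Jz basis ordered m = J, ..., -J (index k = J - m)."""
    k = np.arange(two_j + 1)
    binom = np.array([comb(two_j, int(kk)) for kk in k], dtype=float)
    c, s = np.cos(theta / 2.0), np.sin(theta / 2.0)
    return np.sqrt(binom) * c**(two_j - k) * s**k * np.exp(1j * k * phi)

def scs_from_bloch(two_j, x):
    """SCS pointing along the projection-filter Bloch vector x (normalized first)."""
    x = np.asarray(x, dtype=float)
    x = x / np.linalg.norm(x)
    theta = np.arccos(np.clip(x[2], -1.0, 1.0))
    phi = np.arctan2(x[1], x[0])
    return scs(two_j, theta, phi)

def fidelity(psi1, psi2):
    """Squared overlap |<psi1|psi2>|^2 of two normalized pure states."""
    return abs(np.vdot(psi1, psi2)) ** 2

n = 50
exact = scs(n, np.pi / 2, 0.0)                    # stand-in for an exact conditional state
angle = 0.2
x_filter = [np.cos(angle), np.sin(angle), 0.0]    # filter Bloch vector rotated by 0.2 rad
F = fidelity(exact, scs_from_bloch(n, x_filter))
print(F, ((1 + np.cos(angle)) / 2) ** n)
```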

Chapter 5

Qubit State Reconstruction

This chapter describes how to use the quantum filtering formalism to construct a tomographic estimate of an unknown initial quantum state from an ensemble of identical copies undergoing a joint continuous measurement. We make a maximum likelihood estimate of this state, based on the statistics of a continuous measurement of an output field quadrature. The purpose of this work is to extend previous results [11–13] into a regime where the quantum backaction significantly affects the measurement statistics.

We consider here the case of an ensemble of n qubits coupled to a single traveling-wave quantum light field. The qubit ensemble is assumed to be in a pure spin coherent state characterized by the unknown polar angles (θ, φ). The quantum state estimation problem is mapped to a parameter estimation problem, which is then approximated by a Monte Carlo sampling algorithm. Numerical experiments show that the ultimate performance of the estimate approaches an optimum fidelity bound found by Massar and Popescu [39]. The deficit in the reconstruction fidelity is attributed to a separability approximation in the Monte Carlo algorithm. This algorithm is compared to, and significantly outperforms, an equivalent “Schrödinger” estimate that ignores the backaction of the measurement. At long times the Schrödinger estimate is shown to be biased away from the true state, indicating the significance of the conditional dynamics and the utility of the quantum filtering framework.

5.1 Previous reconstruction results

A fundamental task in quantum information processing is the ability both to reliably prepare an arbitrary quantum state and to experimentally verify its production. Traditional quantum state estimation relies on an exhaustive tomographic procedure in which the target state is repeatedly prepared and then destructively measured in an informationally complete set of measurement settings. Such a procedure is often extremely time intensive, requiring both a tremendous amount of data and significant post-processing time [2, 3].

In an alternative protocol proposed by Silberfarb et al., these inefficiencies can be largely sidestepped through a weak continuous measurement of an identically prepared ensemble in conjunction with a well-chosen dynamical control [11]. In particular, an atomic ensemble is prepared in an identical tensor product state $\rho_{\rm tot} = \rho_0^{\otimes n}$. It experiences a known Hamiltonian while simultaneously coupled to a traveling-wave probe via a collective degree of freedom. A continuous measurement of this probe then generates a measurement record that is strongly correlated with the evolution of the system. If the dynamics drive the system so as to make the measurement informationally complete, then a statistical estimate of an unknown initial system state should have a high fidelity with the true initial condition.

Such a system naturally arises in the field of laser-cooled atoms, where an ensemble of n atoms is easily assembled and then weakly coupled to an off-resonant probe laser. One can then measure a collective spin state of the ensemble via the amount of polarization rotation induced by the Faraday effect. This protocol has been implemented in several experiments, ultimately reconstructing the full 16-dimensional hyperfine ground state manifold [12, 13]. However, these experiments were performed in a parameter regime where the intrinsically quantum nature of the continuous measurement could be ignored. The state disturbance caused by the nonlinear measurement process, the so-called backaction, was negligibly small compared with the decoherence induced by diffuse light scattering and by inhomogeneities in the control fields.

This work investigates, through theoretical analysis and numerical simulation, the fundamental limits of this protocol. We do so in an idealized model in which the effects of decoherence are absent, so the backaction becomes significant. To avoid unnecessary complications, we also reduce the dimensionality of our fundamental system and consider only pure qubits, initialized in an identical tensor product state $|\psi_{\rm tot}\rangle = |\psi_0\rangle^{\otimes n}$. With a fully quantum model of the atom-light interaction, we formulate a maximum likelihood (ML) estimate of the single-particle initial state, which we denote $|\hat\psi_0\rangle$.

5.2 The Estimation Procedure

When backaction is ignored, the linearity of an unconditioned master equation means that the measurement signal can be considered a linear function of the initial state of the atomic system, ρ0. In an additive white-noise model, the instantaneous polarimetry signal y(t) can be modeled as

$$ y(t) = g\,\mathrm{Tr}\bigl(V(t, O_0)\,\rho_0\bigr) + \text{“white noise”}, \tag{5.1} $$

where g is a measurement gain related to the signal-to-noise ratio, V(t, ·) is the Heisenberg-picture equivalent of a dissipative master equation, and O0 is the initial system coupling observable [13]. The problem of state reconstruction in this model then becomes a constrained linear estimation problem.

In a generalized measurement model, the set of possible outcomes is described by a positive operator-valued measure (POVM) with elements Eα indexed by a discrete outcome α. In a given model of this measurement there exists a (possibly nonunique) decomposition of the POVM into a set of Kraus operators Aα, which satisfy $E_\alpha = A_\alpha^\dagger A_\alpha$ for every outcome α. Upon obtaining the outcome α, a pure state |ψ⟩ updates via the transformation

$$ |\psi\rangle \to \frac{1}{\sqrt{\langle\psi|E_\alpha|\psi\rangle}}\,A_\alpha|\psi\rangle. \tag{5.2} $$

Because of the renormalization factor, this update map is inherently nonlinear in the state vector. Any generalized measurement scheme can be decomposed into a continuous measurement process [73]. Conversely, a continuous measurement process can be modeled as a limiting sequence of weak generalized measurements. In general, then, the nonlinearity of a repeated application of a time-dependent update map means that a measurement sequence is no longer a linear functional of the initial state ρ0.
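The nonlinearity of Eq. (5.2) already shows up in the smallest possible example. The Python sketch below uses a hypothetical weak qubit measurement with diagonal Kraus operators $A_\pm = \sqrt{(1 \pm \epsilon\,\sigma_z)/2}$, so that $E_+ + E_- = 1$. It shows that updating a superposition is not the same as superposing the updated basis states. The value ε = 0.3 and the function names are illustrative.

```python
import math

def kraus_ops(eps):
    """Hypothetical weak qubit measurement A_pm = sqrt((1 pm eps sigma_z)/2).
    Both operators are diagonal in the sigma_z basis; each entry stores that diagonal."""
    plus_diag = [math.sqrt((1 + eps) / 2), math.sqrt((1 - eps) / 2)]
    minus_diag = [math.sqrt((1 - eps) / 2), math.sqrt((1 + eps) / 2)]
    return {"+": plus_diag, "-": minus_diag}

def update(psi, A_diag):
    """State update of Eq. (5.2) for a diagonal Kraus operator; also returns <psi|E|psi>."""
    unnorm = [a * c for a, c in zip(A_diag, psi)]
    prob = sum(abs(c) ** 2 for c in unnorm)
    return [c / math.sqrt(prob) for c in unnorm], prob

A = kraus_ops(eps=0.3)
up, down = [1.0, 0.0], [0.0, 1.0]
plus = [1 / math.sqrt(2), 1 / math.sqrt(2)]

new_plus, p_plus = update(plus, A["+"])
# A linear map would send (|up> + |down>)/sqrt(2) to the same superposition of the
# updated basis states; here those basis states are left unchanged by the update.
new_up, _ = update(up, A["+"])
new_down, _ = update(down, A["+"])
linear_guess = [(u + d) / math.sqrt(2) for u, d in zip(new_up, new_down)]
print(new_plus, linear_guess)   # they differ: the update is nonlinear
```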

Much is known about the fundamental quantum limits of reconstructing pure qubit states from a finite number of measurements. Massar and Popescu showed that, given n copies of a pure qubit state, it is possible to find a generalized measurement that optimizes the average fidelity ⟨F⟩ between the state estimate and the true state, averaged over all possible input states [39]. The fidelity for the pure states ψ1 and ψ2 is

$$ F \equiv \bigl|\langle\psi_1|\psi_2\rangle\bigr|^2. \tag{5.3} $$

(For mixed states this corresponds to the Uhlmann fidelity, but here we are only concerned with pure states.) With this definition the optimum average fidelity bound is simply

$$ \langle F\rangle_{\rm opt} = \frac{n+1}{n+2}. \tag{5.4} $$

They also showed that such a generalized measurement is necessarily a joint measurement involving all n qubits, and that no measurement applied to each qubit separately, in series, can achieve this bound. Later, Bagan et al. found that a generalized measurement scheme achieving this bound is a measurement that is uniform over all possible spin coherent states (SCS) composed of n qubits [74]. Varbanov and Brun gave a constructive proof for a continuous-time stochastic process that reproduces a given generalized measurement. Even so, it is often quite difficult to obtain a closed-form expression for the POVM that a given continuous measurement implements in its entirety. Rather than pursuing this route, we turn to a Monte Carlo sampling framework.
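For orientation, the bound of Eq. (5.4) and the corresponding infidelity 1/(n + 2) can be tabulated with exact rational arithmetic for the qubit numbers used in Chap. 4. The Python sketch below does this; the function name is hypothetical.

```python
from fractions import Fraction

def optimal_average_fidelity(n):
    """Massar-Popescu bound of Eq. (5.4) for n copies of a pure qubit state."""
    return Fraction(n + 1, n + 2)

for n in (1, 25, 50, 100):
    F = optimal_average_fidelity(n)
    print(n, F, float(F), float(1 - F))   # bound and the corresponding infidelity 1/(n+2)
```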

At its most basic level, the initial state estimation problem is a parameter estimation problem: we observe a time-varying signal whose statistics depend parametrically on the initial state of the atomic system. The simplest of all initial state estimation problems is binary state discrimination. In this problem, the initial condition is known to be one of two possibilities, ψa or ψb. Based on a sequence of measurements yt, we then wish to identify which state was most likely to have generated these data.

In our more general problem, we have a data set yt and a detailed model of the dynamical system that generated the data, but know only that the initial state is a SCS. To deal with the continuous nature of this parameter estimation problem we resort to Monte Carlo sampling. We randomly generate a collection of m sample SCSs, $\{\psi_j : j = 1,\dots,m\}$, picked from some prior distribution. In Sec. 5.4 we describe how we choose the prior distribution through a two-step resampling procedure, seeded from a uniform distribution over spherical angles. The space of qubit SCSs is isomorphic to the surface of the sphere. With just a few hundred samples we can therefore cover that space well enough that any discretization error is well below the infidelity implied by the optimum bound ⟨F⟩opt.

Irrespective of how the candidate states are chosen, we have reduced the continuous parameter estimation problem to a much simpler state discrimination problem. We choose the state $|\psi_{j^\ast}\rangle \in \{|\psi_j\rangle\}$ that maximizes the likelihood function $p(y_t\,|\,\psi_j)$. In other words, the ML state $|\hat\psi\rangle$ is defined as

$$ |\hat\psi\rangle = |\psi_{j^\ast}\rangle, \qquad j^\ast = \arg\max_{j=1,\dots,m}\, p(y_t\,|\,\psi_j). \tag{5.5} $$

In order to evaluate the likelihood function, we are still left with the problem of solving the recursive POVM expression or finding an equivalent method for calculating it.

Here we choose to formulate an equivalent expression. Because we are working with a finite set of hypothesis states, we find it more efficient to propagate m (approximate) conditional states from their initial values and to calculate the likelihood of the next increment given the current estimates. This method is discussed in detail in Sec. 5.4.
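The recursive likelihood evaluation described above can be sketched in Python. This is an illustration under stated assumptions, not the procedure of Sec. 5.4:

- Each candidate is represented by a Bloch vector propagated with the pure-state projection filter of Eq. (4.108), with no control field.
- Under each hypothesis, the increments are assumed Gaussian with mean $\sqrt{\kappa}\,n\,x^3\,dt$ (that is, $2\sqrt{\kappa}\langle J_z\rangle dt$ as in Eq. (5.11)) and variance dt.
- The synthetic record is generated for a true $+e_z$ SCS. That state is a J_z eigenstate, so its record is exactly a constant drift plus white noise.

All names and parameter values are hypothetical.

```python
import math
import random

def filter_step(x, dv, kappa, dt):
    """One Euler-Maruyama step of the pure-state projection filter, Eq. (4.108),
    with no control field, driven by the candidate's own innovation dv."""
    x1, x2, x3 = x
    s = math.sqrt(kappa)
    return [x1 - 0.5 * kappa * x1 * dt - s * x1 * x3 * dv,
            x2 - 0.5 * kappa * x2 * dt - s * x2 * x3 * dv,
            x3 + s * (1.0 - x3 * x3) * dv]

def ml_estimate(dys, candidates, n, kappa, dt):
    """Index j* of the candidate with the largest Gaussian log-likelihood."""
    states = [list(c) for c in candidates]
    loglik = [0.0] * len(candidates)
    for dy in dys:
        for j, x in enumerate(states):
            dv = dy - math.sqrt(kappa) * n * x[2] * dt   # innovation under hypothesis j
            loglik[j] -= dv * dv / (2.0 * dt)            # Gaussian log-density, up to a constant
            states[j] = filter_step(x, dv, kappa, dt)
    best = max(range(len(candidates)), key=lambda j: loglik[j])
    return best, loglik

n, kappa, dt, t_final = 100, 1.0, 1e-4, 0.2
rng = random.Random(7)
# Synthetic record for a true +e_z SCS: dy = sqrt(kappa) * n * dt + dw.
dys = [math.sqrt(kappa) * n * dt + rng.gauss(0.0, math.sqrt(dt))
       for _ in range(int(round(t_final / dt)))]
candidates = [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0), (1.0, 0.0, 0.0)]
best, loglik = ml_estimate(dys, candidates, n, kappa, dt)
print(best, loglik)   # expected to select candidate 0 (+e_z)
```

In this toy setting the +e_z candidate stays exactly at x³ = 1, while the equatorial candidate's filter is pulled toward +e_z by the data. That candidate pays a likelihood penalty only during the transient, which is why the sketch uses a comparatively large n.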

5.3 The Model

Sec. 2.7 reviews how the Faraday interaction can be modeled as a collective angular momentum J coupled to a single P quadrature in vacuum. We align our coordinates so that we couple to the $J_z$ projection of angular momentum, with the collective angular momentum operators

$$
J_i \equiv \frac{1}{2}\sum_{j=1}^{n}\sigma_i^{(j)}, \tag{5.6}
$$

where $\sigma_i^{(j)}$ is the ith Pauli operator for the jth qubit and we have set $\hbar = 1$. The coupling rate κ between $J_z$ and P is proportional to the local power in the drive laser field, which in general could be a time-varying quantity. For simplicity, we assume that the laser is operated in a switched mode: it is switched on at time t = 0 to a constant power, and the measurement record ends before it is turned off.
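As a concrete check of Eq. (5.6), the following NumPy sketch builds the collective operators for a few qubits by brute-force tensor products. It is only practical for small n, since the matrices are $2^n \times 2^n$, and the function name `collective_spin` is our own rather than anything from the original simulations. It confirms that the spectrum of $J_z$ runs from $-n/2$ to $n/2$ and that the components obey $[J_x, J_y] = iJ_z$.

```python
import numpy as np
from functools import reduce

# Single-qubit Pauli matrices
PAULI = {
    "x": np.array([[0, 1], [1, 0]], dtype=complex),
    "y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "z": np.array([[1, 0], [0, -1]], dtype=complex),
}

def collective_spin(n, axis):
    """J_axis = (1/2) sum_j sigma_axis^(j) on n qubits (hbar = 1), a 2^n x 2^n matrix."""
    eye = np.eye(2, dtype=complex)
    total = np.zeros((2**n, 2**n), dtype=complex)
    for j in range(n):
        # Pauli operator on qubit j, identity on every other qubit
        factors = [PAULI[axis] if k == j else eye for k in range(n)]
        total += reduce(np.kron, factors)
    return 0.5 * total

n = 4
Jx, Jy, Jz = (collective_spin(n, a) for a in "xyz")
eigs = np.linalg.eigvalsh(Jz)
print(eigs.min(), eigs.max())                    # -n/2 and n/2
print(np.allclose(Jx @ Jy - Jy @ Jx, 1j * Jz))   # True: su(2) commutation relation
```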

In order to make the measurement record informationally complete (or, in the language of filter stability, to make the system observable), we need to add an external control Hamiltonian $H_t$ acting solely on the collective spin system. The form of $H_t$ that makes the system observable is discussed in Sec. 5.3.1. With these ingredients, the system-field interaction is given by the unitary propagator $U_t$, which is the solution to the QSDE

$$
dU_t = \left(\sqrt{\kappa}\,J_z\,dA_t^\dagger - \sqrt{\kappa}\,J_z\,dA_t - \tfrac{1}{2}\kappa J_z^2\,dt - iH_t\,dt\right)U_t, \qquad U_0 = 1. \tag{5.7}
$$

From this stochastic propagator we are able to apply the results of Sec. 3.3 and work with a conditional master equation (CME). For reference, upon receipt of the measurement realization $\{y_t\}_{t\ge0}$, the CME for this model is given by the SDE

$$
d\rho_t = -i[H_t,\rho_t]\,dt + \kappa\,\mathcal{D}[J_z](\rho_t)\,dt + \sqrt{\kappa}\,\mathcal{H}[J_z](\rho_t)\,dv_t \tag{5.8}
$$

with the initial condition $\rho_0 = \rho(0)$, where we have the following definitions. $\mathcal{D}[J_z](\rho_t)$ is the Lindblad map commonly found in open quantum systems and is defined as

$$
\mathcal{D}[J_z](\rho_t) \equiv J_z\rho_tJ_z - \tfrac{1}{2}J_z^2\rho_t - \tfrac{1}{2}\rho_tJ_z^2. \tag{5.9}
$$

$\mathcal{H}[J_z](\rho_t)$ is the state update map defined as

$$
\mathcal{H}[J_z](\rho_t) \equiv J_z\rho_t + \rho_tJ_z - 2\,\mathrm{Tr}(J_z\rho_t)\,\rho_t. \tag{5.10}
$$

This map shows how the state updates, weighted by the innovation process

$$
dv_t = dy_t - 2\sqrt{\kappa}\,\mathrm{Tr}(J_z\rho_t)\,dt. \tag{5.11}
$$
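To make the structure of Eqs. (5.8) to (5.11) concrete, here is a minimal Euler-Maruyama sketch of one CME update, assuming the caller supplies $J_z$ and $H_t$ as matrices (any spin representation) and a measured increment dy. Note this is deliberately the simplest scheme, not the weak second-order predictor-corrector integrator used for the simulations in this chapter, and the function names are hypothetical. Because $\mathrm{Tr}\,\mathcal{D}[J_z](\rho) = 0$ and $\mathrm{Tr}\,\mathcal{H}[J_z](\rho) = 0$ for a normalized state, each step preserves the trace.

```python
import numpy as np

def lindblad_D(L, rho):
    """D[L](rho) = L rho L - (1/2) L^2 rho - (1/2) rho L^2 for Hermitian L, Eq. (5.9)."""
    L2 = L @ L
    return L @ rho @ L - 0.5 * (L2 @ rho + rho @ L2)

def update_H(L, rho):
    """H[L](rho) = L rho + rho L - 2 Tr(L rho) rho for Hermitian L, Eq. (5.10)."""
    return L @ rho + rho @ L - 2.0 * np.trace(L @ rho).real * rho

def cme_euler_step(rho, H, Jz, kappa, dy, dt):
    """One Euler-Maruyama step of the CME, Eq. (5.8), given a record increment dy."""
    dv = dy - 2.0 * np.sqrt(kappa) * np.trace(Jz @ rho).real * dt   # innovation, Eq. (5.11)
    drho = (-1j * (H @ rho - rho @ H) * dt
            + kappa * lindblad_D(Jz, rho) * dt
            + np.sqrt(kappa) * update_H(Jz, rho) * dv)
    rho_new = rho + drho
    return 0.5 * (rho_new + rho_new.conj().T)   # re-Hermitize to suppress round-off

# Usage: a single qubit (n = 1, so Jz = sigma_z / 2) driven by H = 2 Jx.
rng = np.random.default_rng(1)
Jz = 0.5 * np.diag([1.0, -1.0]).astype(complex)
Jx = 0.5 * np.array([[0, 1], [1, 0]], dtype=complex)
H = 2.0 * Jx
rho = np.diag([1.0, 0.0]).astype(complex)        # spin up along z
kappa, dt = 1.0, 1e-4
for _ in range(1000):
    # Self-consistent record dy = dw + 2 sqrt(kappa) Tr(Jz rho) dt, cf. Eq. (5.22b)
    dy = rng.normal(0.0, np.sqrt(dt)) + 2.0 * np.sqrt(kappa) * np.trace(Jz @ rho).real * dt
    rho = cme_euler_step(rho, H, Jz, kappa, dy, dt)
print(np.trace(rho).real)   # stays at 1 up to round-off
```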

5.3.1 Observability and randomized controls.

In reconstructing the full Cs ground-state manifold, Riofrío et al. used a randomized control policy to generate an informationally complete measurement record [13]. Merkel et al. showed that by combining transverse RF magnetic fields and microwave radiation, with fixed magnitudes and time-varying phases, the 16-dimensional ground-state manifold is controllable [75]. In other words, through these fundamental operations it is possible to generate any unitary operation on the ground-state manifold and thereby map any state to any other state.

The connection between controllability and observability is a natural one. Imagine that at time t = 0 the probe couples to the operator $J_z$. In order for the measurement statistics of this probe to depend upon the $J_y$ Bloch component, an external control must at some point rotate the system so that the field couples to the part of Hilbert space spanned by the projectors of $J_y$. If the controls are unable to affect some hidden subspace, then the only other way to learn about that part of Hilbert space is to apply an additional probe. Not every observable system needs to be controllable, however: one can certainly observe a system completely without being able to affect it in an arbitrary way.

The strictest definition of observability is that for any two quantum states $\rho_A \neq \rho_B$, the equality $\mathrm{Tr}(\rho_A P) = \mathrm{Tr}(\rho_B P)$ cannot hold for every projector P in the von Neumann algebra generated by the observation process $\{Y_t\}_{t\ge0}$ [76]; that is, some such projector must distinguish the two states. (See Sec. 3.2.2 for a discussion of von Neumann algebras and quantum stochastic processes.) This definition guarantees that after many trials one will always be able to distinguish $\rho_A$ from $\rho_B$ by looking at the statistics of Y.

However, even if a given system is observable, this does not guarantee that it is well observed in a given measurement realization. In order for the statistics of a single realization to give a high-fidelity estimate, the space of possible initial states, e.g. the space of all spin coherent states, should be well represented throughout the measurement record. If the goal were to measure $J_z$ to a high degree of accuracy, the optimum control policy would be to apply no control at all. However, our objective is to measure every spin coherent state with equal weight, in the hope of achieving the optimum POVM fidelity bound.

Riofrío et al. found that high-fidelity reconstructions were possible by choosing random, piecewise constant phase angles, thereby randomly cycling through a controllable set of operations. Here we choose to implement a control policy that is randomized over a set of generators that rapidly spans the space of spin coherent states. This policy then ensures that these states are well represented in the measurement statistics. To achieve this, the control Hamiltonian $H_t$ is chosen to have the form

$$
H_t = \mathbf{b}(t)\cdot\mathbf{J} = b_x(t)J_x + b_y(t)J_y + b_z(t)J_z, \tag{5.12}
$$

where the control field components $b_i(t)$ are drawn from a random distribution but are fixed before the start of the measurement, i.e., there is no measurement feedback.

For simplicity, we further emulate the control policy of the Cs experiments and fix the magnitude of the control field while varying its direction in a randomized but piecewise constant way. Furthermore, we constrain the magnitude so that for each direction the Bloch vector rotates by π/2. Switching the field direction with a period τ then requires $\|\mathbf{b}(t)\| = \pi/(2\tau)$. With this constraint, the control law is fully defined.

To generate a control waveform with m randomized π/2 gates with period τ, we first generate a set of m unit vectors $\mathbf{e}_i$, each drawn from a uniform distribution over the unit sphere. The control field is then

$$
\mathbf{b}(t) = \frac{\pi}{2\tau}\sum_{i=1}^{m}\chi_{[i-1,i)}(t/\tau)\,\mathbf{e}_i, \tag{5.13}
$$

where $\chi_{[i-1,i)}$ is the indicator function of the interval $[i-1, i)$.
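A short sketch of how Eq. (5.13) might be generated and evaluated, assuming NumPy and sampling uniform directions by normalizing isotropic Gaussian vectors (a standard trick; the original does not say how its directions were drawn). The function names are hypothetical. The check at the end confirms that each gate rotates by $\|\mathbf{b}\|\tau = \pi/2$.

```python
import numpy as np

def random_unit_vectors(m, rng):
    """m directions uniform on the unit sphere (normalized isotropic Gaussians)."""
    v = rng.normal(size=(m, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def control_field(t, directions, tau):
    """Piecewise-constant control b(t) of Eq. (5.13).

    On [(i-1) tau, i tau) the field points along e_i with magnitude pi/(2 tau),
    so each gate rotates the Bloch vector by pi/2. Zero outside [0, m tau).
    """
    i = int(np.floor(t / tau))                  # zero-based gate index
    if i < 0 or i >= len(directions):
        return np.zeros(3)
    return (np.pi / (2.0 * tau)) * directions[i]

rng = np.random.default_rng(0)
tau = 5e-3                        # gate period in units of 1/kappa (value used in Sec. 5.5)
e = random_unit_vectors(40, rng)  # 40 gates span t_f = 0.2 / kappa
b = control_field(0.0123, e, tau)
print(np.linalg.norm(b) * tau)    # rotation angle per gate, pi/2
```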

5.4 The Likelihood Function

In a discrete setting, where the entire measurement record $\{y_t\}_{t\ge0}$ can take only a finite number of values, the likelihood function is simply the probability of receiving the observed values, given a parameter value. The maximum likelihood estimate is then the parameter value that maximizes the probability of obtaining the observed data. When the measurement has a continuum of outcomes, the probability of receiving any specific outcome is in fact zero. However, we can still formulate a likelihood function by instead considering the probability density for the observed value.

Things become a bit more complicated when considering stochastic processes in continuous time. In Chap. 3, we found that the probability measure for a Wiener process was defined by Wiener's discrete path integral. This means that for a sequence of n times $0 = t_0 < \dots < t_i < \dots < t_n = t_f$ we can ask for the probability that the Wiener process evaluated at each time $t_i$ lies within the interval $I_i = (a_i, b_i)$. With $w_0 = 0$ and $\Delta t_i = t_i - t_{i-1}$, the resulting probability is given by the integral

$$
P(w_{t_i}\in I_i,\ i = 1,\dots,n) = \int_{a_1}^{b_1}\!dw_1\int_{a_2}^{b_2}\!dw_2\cdots\int_{a_n}^{b_n}\!dw_n\,\prod_{i=1}^{n}\frac{1}{\sqrt{2\pi\Delta t_i}}\exp\!\left(-\frac{(w_i-w_{i-1})^2}{2\Delta t_i}\right). \tag{5.14}
$$

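If one attempts to take a continuum limit of this expression, one finds something rather peculiar [77]. Focusing on just the product of exponentials, one finds

$$
\lim_{n\to\infty}\prod_{i=1}^{n}\exp\!\left(-\frac{(w_i-w_{i-1})^2}{2\Delta t_i}\right) = \lim_{n\to\infty}\exp\!\left(-\frac{1}{2}\sum_{i=1}^{n}\left(\frac{w_i-w_{i-1}}{\Delta t_i}\right)^{2}\Delta t_i\right) = \exp\!\left(-\frac{1}{2}\int_0^{t_f}ds\left(\frac{dw_s}{ds}\right)^{2}\right). \tag{5.15}
$$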

Were the Wiener process in any way differentiable, this expression might be exceedingly useful. However, with our limited knowledge of stochastic analysis, it merely indicates the subtleties of working with densities of continuous-time, nondifferentiable processes. Attempting to make sense of these kinds of objects led to the formulation of a stochastic calculus of variations, which has proved exceedingly useful for extending the Ito integral to anticipative integrands [78], as well as for a theory of white noise stochastic partial differential equations [21]. We will not follow this path here.
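A quick numerical illustration of why the limit in Eq. (5.15) is ill-behaved, assuming a uniform grid on $[0, t_f]$: for a sampled Wiener path, each term $(w_i - w_{i-1})^2/\Delta t$ is a squared standard normal, so the discretized exponent $\tfrac12\sum_i (w_i - w_{i-1})^2/\Delta t_i$ grows like n/2 as the grid is refined. The exponential factor therefore vanishes in the limit while the normalization $\prod_i (2\pi\Delta t_i)^{-1/2}$ diverges. The helper name is hypothetical.

```python
import numpy as np

def discretized_action(n, tf, rng):
    """(1/2) sum_i (w_i - w_{i-1})^2 / dt for one Wiener path on a uniform n-step grid."""
    dt = tf / n
    dw = rng.normal(0.0, np.sqrt(dt), size=n)   # independent Wiener increments
    return 0.5 * np.sum(dw**2) / dt

rng = np.random.default_rng(0)
for n in (10**2, 10**4, 10**6):
    s = discretized_action(n, tf=1.0, rng=rng)
    print(n, s, s / n)   # s/n stays near 1/2, so s itself diverges as n grows
```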

There is an additional consideration, as we know that $\{y_t\}_{t\ge0}$ is decidedly not a Wiener process. Even with a discrete approximation, we still need to find an expression for the discrete density and how it depends upon the initial system state. In this problem we can make some progress. Sec. 3.4.1 showed that the innovation process $v_t = y_t - 2\sqrt{\kappa}\int_0^t ds\,\mathrm{Tr}(J_z\rho_s)$ is an instance of a Wiener process. More specifically, $v_t$ is a Wiener process when the filtered state $\rho_s$ accurately represents the conditional state of the system. In our Monte Carlo setting we do not have just a single conditional state $\rho_t$; we in fact have a set of m conditional states $\rho^m_t$, as the proper initial condition is unknown. A candidate state $\rho^m_t$ may differ in some aspects from the conditional state we would calculate had we known the true initial condition. For each hypothesis state we therefore obtain a candidate innovation $v^m_t$, a function of the measurement record $y_t$ and the filtered state $\rho^m_t$. It should be clear that not every $v^m_t$ will be an instance of a Wiener process, and the maximum likelihood estimate that we construct hinges on exactly this fact. Rather than computing the unknown and highly complicated statistics of $y_t$, we use the known and simple statistics of the Wiener process $v_t$, and we seek the candidate initial condition that makes $v^m_t$ most resemble a Wiener process.

Stepping back from the mathematics for a moment, converting Eq. (5.14) into an expression for $p(\{y_t\}\,|\,\rho^m)$ via the innovation is deceptively simple. We can write

$$
v_t = v_s + y_t - y_s - 2\sqrt{\kappa}\int_s^t ds'\,\mathrm{Tr}(J_z\rho_{s'}), \tag{5.16}
$$

or in other words,

$$
\Delta v_i \equiv y_{t_i} - y_{t_{i-1}} - 2\sqrt{\kappa}\int_{t_{i-1}}^{t_i} ds\,\mathrm{Tr}(J_z\rho_s). \tag{5.17}
$$

From the nonanticipative construction of the Ito integral, we have, for the smallest possible time differences,

$$
\Delta v_i \approx \Delta y_i - 2\sqrt{\kappa}\,\Delta t_i\,\mathrm{Tr}(J_z\rho_{t_{i-1}}). \tag{5.18}
$$

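The density for $\{y_t\}_{t\ge0}$ is then obtained by simply substituting $\Delta v_i$ into Eq. (5.14); the change of variables from $\Delta v_i$ to $\Delta y_i$ has unit Jacobian, because each $\Delta v_i$ depends on $\Delta y_i$ with unit coefficient and otherwise only on earlier increments. In other words, the likelihood for the increment variables $y_i \equiv \Delta y_i$ is given by

$$
p_n(y_1, y_2, \dots, y_n\,|\,\rho^m) \approx \left(\prod_{i=1}^{n}(2\pi\Delta t_i)^{-1/2}\right)\exp\!\left(-\sum_{i=1}^{n}\frac{\left(y_i - 2\sqrt{\kappa}\,\Delta t_i\,\mathrm{Tr}(J_z\rho^m_{t_{i-1}})\right)^2}{2\,\Delta t_i}\right). \tag{5.19}
$$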

Because the prefactor in Eq. (5.19) does not depend on $\rho^m$, the only way to maximize Eq. (5.19) with respect to the initial condition $\rho^m$ is to minimize the sum in the exponent. If we set all of the time increments to be equal, $\Delta t_i = \Delta t$, we can even factor out the denominator, so that the maximum likelihood estimate becomes a problem of minimizing the sum

$$
\mathrm{QV}(v^m_t) \equiv \sum_{i=1}^{n}(\Delta v^m_i)^2 = \sum_{i=1}^{n}\left(\Delta y_i - 2\sqrt{\kappa}\,\Delta t\,\mathrm{Tr}(J_z\rho^m_{t_{i-1}})\right)^2. \tag{5.20}
$$

This kind of object is called the quadratic variation¹, and Appendix B showed that it is ultimately what gives rise to the rules of Ito calculus. So while it is a relatively delicate mathematical object, it is well defined in the infinitesimal limit. Furthermore, in proving the Ito rules one shows that $\mathrm{QV}(w_t) = t$ with probability one, so that we expect, and observe numerically, that $\mathrm{QV}(v^m_t) \sim t$ for most candidate initial conditions. As is generally the case in Gaussian problems such as ours, maximizing a Gaussian likelihood with fixed variance is equivalent to a least-squares estimate. So in our Monte Carlo search we have

$$
\operatorname*{arg\,max}_m\; p(\{y_t\}\,|\,\rho^m) = \operatorname*{arg\,min}_m\; \mathrm{QV}(v^m_t). \tag{5.21}
$$
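The equivalence in Eq. (5.21) can be checked numerically. The sketch below assumes equal time steps and replaces the filtered drifts $2\sqrt{\kappa}\,\mathrm{Tr}(J_z\rho^m_{t_{i-1}})$ by a handful of hypothetical, precomputed candidate signals (in the chapter these come from propagating each conditional state $\rho^m$). It evaluates both the log of Eq. (5.19) and the quadratic variation of Eq. (5.20), and confirms that they select the same candidate and that $\mathrm{QV}(v_t) \approx t$ for the true one.

```python
import numpy as np

def innovations(dy, drift, dt):
    """Candidate innovation increments, Eq. (5.18); drift[i] is evaluated at the left endpoint."""
    return dy - drift * dt

def log_likelihood(dy, drift, dt):
    """log p_n of Eq. (5.19) for equal time steps dt."""
    dv = innovations(dy, drift, dt)
    return -0.5 * len(dy) * np.log(2 * np.pi * dt) - np.sum(dv**2) / (2 * dt)

def quadratic_variation(dy, drift, dt):
    """QV(v^m_t) of Eq. (5.20): the sum of squared innovation increments."""
    return np.sum(innovations(dy, drift, dt) ** 2)

rng = np.random.default_rng(2)
n_steps, dt = 20_000, 1e-5                 # total time t = n_steps * dt = 0.2
t = np.arange(n_steps) * dt                # left endpoints t_{i-1}
# Hypothetical candidate drifts standing in for 2 sqrt(kappa) Tr(Jz rho^m_t)
candidates = [A * np.cos(2 * np.pi * 25 * t + phi)
              for A in (50.0, 100.0) for phi in (0.0, np.pi / 2, np.pi, 3 * np.pi / 2)]
true_m = 5
dy = rng.normal(0.0, np.sqrt(dt), size=n_steps) + candidates[true_m] * dt   # dy = dw + drift dt

ll = np.array([log_likelihood(dy, d, dt) for d in candidates])
qv = np.array([quadratic_variation(dy, d, dt) for d in candidates])
print(np.argmax(ll), np.argmin(qv))        # the same index, Eq. (5.21)
print(qv[true_m] / (n_steps * dt))         # close to 1: QV(w_t) ~ t for the true candidate
```

Since the log-likelihood is a constant minus $\mathrm{QV}/(2\Delta t)$, the two rankings agree exactly; only the QV needs to be accumulated in practice.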

5.4.1 The reconstruction procedure

The Monte Carlo sampling estimator we have outlined follows this rough procedure:

1. Sample m pure Bloch vectors uniformly from the unit sphere.
2. For each sample state, compute the forward time evolution conditioned on the measurement record $y_t$.
3. Compute the quadratic variation of the innovation process for each conditional state.
4. Select as the estimate the sample state that minimizes the quadratic variation at the final time.

¹Technically, the quadratic variation is defined in the infinitesimal limit [22].

In practice we need to modify this procedure in two respects. The first is that, for reasons involving the stability of Markov filters [76], the above procedure suffers from poor numerical stability when the hypothesis initial condition has very little overlap with the true initial condition. To rectify this problem, we implement a two-step procedure: we first sample m mixed initial conditions and then resample pure states within some solid angle about the direction of the most probable mixed state. This issue is discussed in detail in Sec. 5.4.2.

The second modification stems from the fact that propagating the full conditional Schrödinger equation for a sufficient number of samples requires a large amount of computer time. To fully propagate a spin-J pure state requires 2J + 1 complex numbers. The stochastic integrator we use implements a weak second-order predictor-corrector method (Kloeden et al. [69, page 200]) and empirically requires a time step $\Delta t \sim 10^{-6}\,\kappa^{-1}$ to produce reliable expectation values. For ensembles of mixed qubits it is not sufficient to consider the maximum projection of the collective angular momentum; instead one must consider all possible total angular momentum values that can be constructed from n spin-1/2 particles. This requires a total density matrix of order $n^2 \times n^2$ in size [79].

In Chap. 4 we developed a projection filter that projects the conditional master equation for the collection of n qubits onto the manifold of identical separable states, which greatly reduces the computational demand. We also showed that in the presence of strong randomized control, the projection filter tends to track the exact expectation values with an RMS error of less than 5% of the total spin length. For mixed initial conditions, rather than propagating matrices of dimension $\sim n^2 \times n^2$ for each sample state, the projection filter allows us to track a single mixed Bloch vector, i.e. three real numbers. With these modifications, the Monte Carlo separable least-squares estimate is computed through the pseudocode in Algorithm 5.1.

Algorithm 5.1 A Monte Carlo Separable Least-Squares Estimate

    {r_m} ← m uniformly random Bloch vectors with |r_m| = r_mixed < 1
    for all r_m ∈ {r_m} do
        r^m_t ← integrate Eq. (4.105) with record y_t and initial value r^m_0 = r_m
        QV(v^m_t) ← Σ_i (Δy_i − √κ n x^m_3(t_{i−1}) Δt_i)²
    end for
    r_min ← r_{m′} ∈ {r_m} : QV(v^{m′}_t) = min_m QV(v^m_t)
    {r′_m} ← m random Bloch vectors with |r′_m| = 1 and r′_m · r_min / r_mixed ≥ cos(Θ_max)
    for all r′_m ∈ {r′_m} do
        r^m_t ← integrate Eq. (4.108) with record y_t and initial value r^m_0 = r′_m
        QV(v^m_t) ← Σ_i (Δy_i − √κ n x^m_3(t_{i−1}) Δt_i)²
    end for
    r′_min ← r′_{m′} ∈ {r′_m} : QV(v^{m′}_t) = min_m QV(v^m_t)
    return ρ(r′_min)
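The two sampling stages of Algorithm 5.1 can be sketched with NumPy as follows. This covers only the sampling geometry: the filtering step (Eqs. 4.105 and 4.108) is not reproduced, so the stage-one minimizer `r_min` below is a hypothetical placeholder. Uniform sampling over a spherical cap uses the fact that uniform area measure makes $\cos\theta$ uniform on $[\cos\Theta_{\max}, 1]$. The function names are our own.

```python
import numpy as np

def uniform_sphere(m, rng):
    """m unit vectors uniformly distributed on the sphere (normalized Gaussian vectors)."""
    v = rng.normal(size=(m, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def uniform_cap(m, axis, theta_max, rng):
    """m unit vectors uniform over the spherical cap within angle theta_max of `axis`.

    Uniform area on the cap means cos(theta) is uniform on [cos(theta_max), 1].
    """
    axis = axis / np.linalg.norm(axis)
    cos_t = rng.uniform(np.cos(theta_max), 1.0, size=m)
    sin_t = np.sqrt(1.0 - cos_t**2)
    phi = rng.uniform(0.0, 2.0 * np.pi, size=m)
    # Orthonormal frame (u, w, axis); the helper only needs to be non-parallel to axis
    helper = np.array([1.0, 0.0, 0.0]) if abs(axis[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u = np.cross(axis, helper)
    u /= np.linalg.norm(u)
    w = np.cross(axis, u)
    return (np.outer(sin_t * np.cos(phi), u)
            + np.outer(sin_t * np.sin(phi), w)
            + np.outer(cos_t, axis))

rng = np.random.default_rng(3)
r_mixed, theta_max, m = 0.75, np.deg2rad(45.0), 250
stage1 = r_mixed * uniform_sphere(m, rng)       # mixed Bloch vectors, |r_m| = r_mixed
# Hypothetical placeholder: in Algorithm 5.1, r_min is the stage-1 vector with the
# smallest QV after filtering with the record, which is not reproduced here.
r_min = stage1[0]
stage2 = uniform_cap(m, r_min, theta_max, rng)  # pure-state resample, |r'_m| = 1
print(np.all(stage2 @ (r_min / r_mixed) >= np.cos(theta_max) - 1e-12))  # True
```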

5.4.2 Coupled CMEs and filter stability

For a given hypothesis state $\rho^m$, there is a hidden "true" state $\rho^\star$ that generated the measurement record $y_t$. Using this measurement record to propagate the hypothesis state couples $\rho^m$ to $\rho^\star$, which leads to a coupled set of stochastic differential equations. By definition, the conditional state $\rho^\star_t$ results in an innovation process that is a true Wiener process. Explicitly, $\rho^\star_t$ evolves according to the stochastic differential equations

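$$
d\rho^\star_t = -i[H_t,\rho^\star_t]\,dt + \kappa\,\mathcal{D}[J_z](\rho^\star_t)\,dt + \sqrt{\kappa}\,\mathcal{H}[J_z](\rho^\star_t)\,dw_t, \tag{5.22a}
$$

$$
dy_t = dw_t + 2\sqrt{\kappa}\,\mathrm{Tr}(J_z\rho^\star_t)\,dt, \tag{5.22b}
$$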

where $w_t$ is the correct innovation process and is an unobserved Wiener process. The candidate initial condition $\rho^m_0$ and the measurement record $y_t$ are the elements necessary for propagating $\rho^m_t$ in time via

$$
d\rho^m_t = -i[H_t,\rho^m_t]\,dt + \kappa\,\mathcal{D}[J_z](\rho^m_t)\,dt + \sqrt{\kappa}\,\mathcal{H}[J_z](\rho^m_t)\,dv^m_t, \tag{5.23a}
$$

$$
dv^m_t = dy_t - 2\sqrt{\kappa}\,\mathrm{Tr}(J_z\rho^m_t)\,dt. \tag{5.23b}
$$

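Substituting Eq. (5.22b) into Eq. (5.23b), we can express the evolution of $\rho^m_t$ in terms of the unobserved Wiener process $w_t$:

$$
d\rho^m_t = -i[H_t,\rho^m_t]\,dt + \kappa\,\mathcal{D}[J_z](\rho^m_t)\,dt + \sqrt{\kappa}\,\mathcal{H}[J_z](\rho^m_t)\,dw_t - 2\kappa\,\mathcal{H}[J_z](\rho^m_t)\,\mathrm{Tr}\!\left(J_z(\rho^m_t - \rho^\star_t)\right)dt. \tag{5.24a}
$$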

Note that if at some time t we happen to have $\rho^\star_t = \rho^m_t$, then we also have $d\rho^\star_t = d\rho^m_t$, since the final term of Eq. (5.24a) vanishes and the remaining terms coincide with Eq. (5.22a). The state at any later time τ > t can be written as $\rho_\tau = \rho_t + \int_t^\tau d\rho_{t'}$. Because the differentials are equal whenever the states are equal, we will have $\rho^\star_\tau = \rho^m_\tau$ for every τ > t if $\rho^\star_t = \rho^m_t$.

One possible result of this coupling is that it acts as an attractor, tending to decrease the "distance" between the $J_z$ projections of $\rho^\star_t$ and $\rho^m_t$. This correction effect is known as filter stability: if the filter is able to correct for certain modeling errors, it is stable. The difference between the two initial states $\rho^\star_0$ and $\rho^m_0$ can be viewed as a modeling error, and the convergence $\rho^m_t \to \rho^\star_t$ is a correction of this error. This is a well-studied effect in both the quantum and classical settings; see [76] and references therein.

In [76], van Handel gave explicit criteria for when a quantum filter is stable with respect to an incorrect initial condition. For our purposes these criteria boil down to the following two issues. The first is that the system must be observable, in that the measurement record must be informationally complete. If we did not have a transverse magnetic field, then the measurement statistics would only include information about the eigenstates of $J_z$, and so the system would not be observable. The second issue is that the probability density for $y_t$ calculated with the true state $\rho^\star$ must be absolutely continuous with respect to the density calculated under the guessed state $\rho^m$. This term is borrowed from classical probability theory and embodies the concept that a probability measure $P_B$ is compatible with observations that are actually governed by $P_A$: every event that $P_B$ deems impossible is also impossible under $P_A$. The quantum version is given by the following definition. For $\rho_A$ to be absolutely continuous with respect to $\rho_B$, we must have, for every projector P in the von Neumann algebra generated by $\{Y_t\}_{t\ge0}$, that $\mathrm{Tr}(P\rho_B) = 0$ implies $\mathrm{Tr}(P\rho_A) = 0$. This need not be a two-sided relationship, so $\rho_B$ need not be absolutely continuous with respect to $\rho_A$. These requirements are not just important to the question of filter stability but also apply to the Monte Carlo sampling procedure.

As discussed previously, the observability condition is vital in order to obtain a high-fidelity estimate. However, absolute continuity is also quite important. In a Kraus operator formulation of a continuous measurement, the state after measurement outcome i is updated as

$$
\rho \mapsto \rho|_i = \frac{A_i\rho A_i^\dagger}{\mathrm{Tr}(A_i^\dagger A_i\rho)}. \tag{5.25}
$$

If the denominator $\mathrm{Tr}(A_i^\dagger A_i\rho) = 0$, then the update cannot be made, as it requires dividing by zero. However, $\mathrm{Tr}(A_i^\dagger A_i\rho)$ is also the probability of obtaining outcome i, as calculated according to the state ρ. Therefore, if outcomes really occur with the probabilities assigned by ρ, dividing by zero is not an issue, as an outcome with zero probability never occurs. Now suppose we obtain an outcome i that occurred with probability $p^\star_i = \mathrm{Tr}(A_i^\dagger A_i\rho^\star)$. Furthermore, suppose we try to update a state $\rho^m$ that has the audacity to assert $p^m_i = \mathrm{Tr}(A_i^\dagger A_i\rho^m) = 0$. This results in a crisis of conscience, as there is no way to incorporate this incompatible information into our world view. The condition that $\rho^\star$ must be absolutely continuous with respect to $\rho^m$ means that $p^m_i$ will never be zero without $p^\star_i$ also being zero.
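The failure mode just described is easy to see in a toy Bayesian update. The sketch below implements Eq. (5.25) for a single projective outcome on a qubit, assuming NumPy; the function name and the tolerance are hypothetical choices. The "true" state assigns the outcome probability 1/2, while the candidate asserts probability zero, so the candidate cannot be updated: absolute continuity fails.

```python
import numpy as np

def kraus_update(rho, A, tol=1e-15):
    """Bayesian state update of Eq. (5.25): rho -> A rho A^dag / Tr(A^dag A rho).

    Raises if the state assigns (numerically) zero probability to the observed
    outcome, i.e. when absolute continuity fails for this candidate.
    """
    p = np.trace(A.conj().T @ A @ rho).real
    if p <= tol:
        raise ZeroDivisionError("outcome has zero probability under this state")
    return A @ rho @ A.conj().T / p, p

P_up = np.diag([1.0, 0.0]).astype(complex)        # Kraus operator for the outcome "up"
rho_true = np.diag([0.5, 0.5]).astype(complex)    # assigns p = 1/2 to "up"
rho_cand = np.diag([0.0, 1.0]).astype(complex)    # asserts p = 0 for "up"
post, p = kraus_update(rho_true, P_up)            # well defined
try:
    kraus_update(rho_cand, P_up)
except ZeroDivisionError as err:
    print("cannot update candidate:", err)
```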

In principle, any valid initial spin state could generate a given diffusive measurement record. This can easily be seen by noting that the "true" innovation process, given by

$$
v^\star_t = y_t - 2\sqrt{\kappa}\int_0^t ds\,\mathrm{Tr}(J_z\rho^\star_s), \tag{5.26}
$$

is a Brownian motion. Because $J_z$ has eigenvalues in the range $-n/2 \le m_z \le n/2$, any candidate innovation $v^m_t$ lies within the range

$$
y_t - n\sqrt{\kappa}\,t \le v^m_t \le y_t + n\sqrt{\kappa}\,t. \tag{5.27}
$$

For finite n, κ, and t, it is perfectly possible for a Brownian motion to attain any of these values; it is just increasingly unlikely. Therefore, if the system is observable, the conditional master equation is in principle stable.

In practice, the numerical stability of states conditioned on highly improbable measurements becomes an issue. Preliminary results that did not account for the possibility of unstable trajectories showed a drop of nearly 1% in the average reconstruction fidelity relative to what we ultimately achieve. Investigating the cause of this suboptimal performance showed that the average fidelity was significantly biased by outlier trajectories that gave estimated states nearly orthogonal to the true state. The cause of these outliers was the numerical instability of Monte Carlo sample points with very poor overlap with the true state.

The two-step sampling procedure in Algorithm 5.1 addresses this instability. Its first step uses mixed initial states, and every mixed single-qubit state can be viewed as a convex combination of pure states pointing along opposite directions. That is, for a possibly mixed single-qubit Bloch vector $\mathbf{r}$ with length $0 \le r \le 1$ we have

$$
\rho(\mathbf{r}) = \frac{1+r}{2}\,\rho(\mathbf{e}_r) + \frac{1-r}{2}\,\rho(\mathbf{e}_{-r}), \tag{5.28}
$$

where $\rho(\mathbf{e}_r)$ is a projector onto the pure state pointing in the $\mathbf{e}_r$ direction. This implies that, by using mixed initial vectors, each initial state has some support on the orthogonal spin coherent state $\psi(\mathbf{e}_{-r})^{\otimes n}$. In the numerical simulations presented in Sec. 5.5, an initial mixed-state vector of radius $r_{\rm mixed} = 3/4$ provides enough of a signal to choose an appropriate direction for the pure-state resample, as well as enough orthogonal support for the trajectories to remain stable.
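A small numerical check of Eq. (5.28), assuming NumPy and the standard qubit parametrization $\rho(\mathbf{r}) = (1 + \mathbf{r}\cdot\boldsymbol{\sigma})/2$ (the helper name is ours). It also computes the weight $((1-r)/2)^n$ of the orthogonal spin coherent state in the product state $\rho(\mathbf{r})^{\otimes n}$: tiny for large n, but strictly positive, which is the support the argument above relies on.

```python
import numpy as np

PAULI = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def rho_bloch(r):
    """Single-qubit state (1 + r . sigma)/2 for a Bloch vector r with |r| <= 1."""
    return 0.5 * (np.eye(2) + sum(ri * s for ri, s in zip(r, PAULI)))

r_vec = 0.75 * np.array([0.0, 0.6, 0.8])         # |r| = r_mixed = 3/4
r = np.linalg.norm(r_vec)
e = r_vec / r
mix = 0.5 * (1 + r) * rho_bloch(e) + 0.5 * (1 - r) * rho_bloch(-e)
print(np.allclose(mix, rho_bloch(r_vec)))        # True: Eq. (5.28)

# Overlap of one qubit with the opposite pure state, and the n-qubit weight
overlap = np.trace(rho_bloch(-e) @ rho_bloch(r_vec)).real   # equals (1 - r)/2
n = 25
print(overlap, overlap ** n)                     # small but strictly positive
```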

5.4.3 Backaction in continuous quantum measurement

In order to identify what impact the backaction has on the reconstruction fidelity, we need to construct a similar but backaction-free estimator. A figure of merit commonly used to gauge the importance of backaction is the ratio of the "projection noise" to the "shot-noise". The projection noise describes the fluctuations (i.e. noise) in a given observable if a projective measurement is made. As we are considering a continuous measurement of $J_z$ on a SCS, the relevant projection noise is

$$
\langle\Delta J_z^2\rangle_{\psi^{\otimes n}} = \frac{n}{4}\,\langle\Delta\sigma_z^2\rangle_{\psi} = n\,p_{+1}(1-p_{+1}), \tag{5.29}
$$

where $p_{+1} = |\langle +1|\psi\rangle|^2$ is the probability of observing the individual spin in the +1 eigenstate of $\sigma_z$ [4].

The shot-noise describes the noise added by making a continuous measurement over a finite time. To identify the order of magnitude of this additive noise, note that from Eq. (5.26) we have

$$
y_t = v^\star_t + 2\sqrt{\kappa}\int_0^t ds\,\mathrm{Tr}(\rho^\star_sJ_z), \tag{5.30}
$$

and that $v^\star_t$ is a realization of Brownian motion. We would like to invert this formula to arrive at a random variable whose statistics allow for an estimate of $\mathrm{Tr}(\rho^\star_0J_z)$.

Suppose that we wished to model the system while completely ignoring the theory of continuous quantum measurement, so that for times $0 \le s \le t$, $\rho^\star_s$ evolves according to the Schrödinger equation. If we further assume that $H_t = 0$, we have $\mathrm{Tr}(\rho^\star_sJ_z) = \mathrm{Tr}(\rho^\star_0J_z)$, and so the classical random variable

$$
j_z \equiv \frac{y_t}{2\sqrt{\kappa}\,t} = \mathrm{Tr}(J_z\rho^\star_0) + \frac{1}{2\sqrt{\kappa}\,t}\,v^\star_t \tag{5.31}
$$

is Gaussian distributed with mean $\mathrm{Tr}(J_z\rho^\star_0)$ and $\mathrm{Var}(j_z) = (4\kappa t)^{-1}$. It is this variance that is referred to as the shot-noise added by the probe. Taking the ratio of these two fluctuations, we have

$$
\zeta \equiv \frac{\langle\Delta J_z^2\rangle_{\psi^{\otimes n}}}{\mathrm{Var}(j_z)} = 4n\kappa t\,p_{+1}(1-p_{+1}). \tag{5.32}
$$

If the system is prepared in a SCS with $p_{+1} = 1/2$, then Sec. 4.6.3 showed that we expect the maximum amount of spin squeezing, or equivalently a large amount of bipartite entanglement. In this case $\langle\Delta J_z^2\rangle_{\psi^{\otimes n}}$ takes on its maximum value of n/4, and so $\zeta = n\kappa t$. When $\zeta \gg 1$, one expects a significant contribution of quantum backaction, and therefore the measurement effects must be accounted for [5]. In the uncontrolled spin-squeezing simulations of Sec. 4.6.3, we found $\xi_T^2 \sim 10$ dB for n = 100 and κt = 0.2, so in the absence of strong Hamiltonian controls, ζ = 20 indeed leads to a strongly nonclassical state.
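The numbers in this comparison follow directly from Eqs. (5.29), (5.31), and (5.32). The sketch below, assuming NumPy and hypothetical function names, reproduces ζ = 20 for n = 100, κt = 0.2, and $p_{+1} = 1/2$, and checks $\mathrm{Var}(j_z) = (4\kappa t)^{-1}$ by sampling the Brownian motion at time t, with $H_t = 0$ and $\mathrm{Tr}(J_z\rho^\star_0) = 0$ assumed for illustration.

```python
import numpy as np

def projection_noise(n, p_up):
    """<Delta J_z^2> of the SCS psi^{(x)n}, Eq. (5.29)."""
    return n * p_up * (1.0 - p_up)

def shot_noise_variance(kappa_t):
    """Var(j_z) = 1 / (4 kappa t), from Eq. (5.31)."""
    return 1.0 / (4.0 * kappa_t)

def zeta(n, kappa_t, p_up):
    """Projection-noise to shot-noise ratio, Eq. (5.32)."""
    return projection_noise(n, p_up) / shot_noise_variance(kappa_t)

print(zeta(n=100, kappa_t=0.2, p_up=0.5))        # 20, the value quoted in the text

# Monte Carlo check of Var(j_z) with H_t = 0 and Tr(J_z rho_0) = 0 (assumed)
rng = np.random.default_rng(4)
kappa, t = 1.0, 0.2
v = rng.normal(0.0, np.sqrt(t), size=200_000)     # Brownian motion sampled at time t
jz = v / (2.0 * np.sqrt(kappa) * t)               # Eq. (5.31)
print(jz.var(), shot_noise_variance(kappa * t))   # both close to 1.25
```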

However, the above discussion assumed no controls. It is possible that, with the randomized controls, considering only the Hamiltonian evolution is sufficient to obtain a high-fidelity estimate. To make this comparison we formulate a backaction-free estimator, one that includes only the Hamiltonian in the model for the forward time dynamics. Rather than considering a measurement record where $y_t$ is given by Eq. (5.30), we instead propose a model where

$$
y_t \approx w_t + 2\sqrt{\kappa}\int_0^t ds\,\mathrm{Tr}(J_z\rho^\star(s)), \tag{5.33}
$$

$\rho^\star(t)$ is the solution to the Schrödinger equation

$$
\frac{d}{dt}\rho^\star(t) = -i[H_t,\rho^\star(t)], \tag{5.34}
$$

and $w_t$ is a Wiener process.

To make a fair comparison, this backaction-free estimator is also implemented through a Monte Carlo sampling procedure. We use an algorithm similar to Algorithm 5.1, but with two modifications. The first is that the two-step sampling procedure is unnecessary, because there are no conditional dynamics to cause numerical instability. The second is that because the dynamics are linear and do not depend on the candidate state, the Schrödinger evolution in Eq. (5.34) is most efficiently computed in the Heisenberg picture. In the Heisenberg picture, we simply need to integrate the time evolution of the $J_z$ observable once and then compute its expectation value with each candidate state. Furthermore, in this decoherence-free model the system always remains in a separable state, and so we need only consider the Heisenberg evolution of the single-qubit Pauli operator $\sigma_z$. In other words,

$$
2\sqrt{\kappa}\,\mathrm{Tr}(J_z\rho^m(t)) = \sqrt{\kappa}\,n\,\langle\psi_m|\sigma_z(t)|\psi_m\rangle, \tag{5.35}
$$

where $\sigma_z(t)$ is the solution to the Heisenberg equation of motion

$$
\frac{d}{dt}\sigma_z(t) = +i[H_t,\sigma_z(t)], \qquad \sigma_z(0) = \sigma_z. \tag{5.36}
$$

The pseudocode for the backaction-free estimator is given in Algorithm 5.2.

Algorithm 5.2 A Monte Carlo Backaction-free Estimate

    σ_z(t_i) ← integrate Eq. (5.36) and evaluate at times t_i
    {r_m} ← m uniformly random Bloch vectors with |r_m| = 1
    for all r_m ∈ {r_m} do
        QV(v^m_t) ← Σ_i (Δy_i − √κ n Tr(σ_z(t_{i−1}) ρ(r_m)) Δt_i)²
    end for
    r_min ← r_{m′} ∈ {r_m} : QV(v^{m′}_t) = min_m QV(v^m_t)
    return ρ(r_min)
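A runnable sketch of Algorithm 5.2 under stated assumptions: NumPy, the π/2 gate sequence of Eq. (5.13), and a record simulated from the backaction-free model of Eq. (5.33) itself, so this is a self-consistency check rather than a reproduction of Fig. 5.2. For a qubit with $H_t = \mathbf{b}(t)\cdot\boldsymbol{\sigma}/2$ the Bloch vector obeys $d\mathbf{r}/dt = \mathbf{b}\times\mathbf{r}$, so $\mathbf{r}(t) = R(t)\mathbf{r}_0$ and the Heisenberg operator $\sigma_z(t)$ is represented by the vector $\mathbf{g}(t) = R(t)^{T}\hat{\mathbf{z}}$, computed once and reused for every candidate. All function names are hypothetical.

```python
import numpy as np

def rotation_matrix(axis, angle):
    """Rodrigues formula: right-handed rotation by `angle` about the unit vector `axis`."""
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])     # K @ v == np.cross(axis, v)
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def heisenberg_sigma_z(directions, tau, steps_per_gate):
    """Vectors g_i with <sigma_z(t_i)> = g_i . r_0 under the pi/2 gates of Eq. (5.13)."""
    dt = tau / steps_per_gate
    z_hat = np.array([0.0, 0.0, 1.0])
    R = np.eye(3)
    g = []
    for e in directions:
        step = rotation_matrix(e, (np.pi / (2.0 * tau)) * dt)   # angle |b| dt per step
        for _ in range(steps_per_gate):
            g.append(R.T @ z_hat)                # value at the left endpoint t_{i-1}
            R = step @ R
    return np.array(g), dt

def backaction_free_estimate(dy, g, dt, kappa, n, candidates):
    """Algorithm 5.2: return the candidate Bloch vector with the smallest QV(v^m_t)."""
    drift = np.sqrt(kappa) * n * (g @ candidates.T)             # Eq. (5.35), shape (steps, m)
    qv = np.sum((dy[:, None] - drift * dt) ** 2, axis=0)
    return candidates[np.argmin(qv)], qv

def unit_vectors(k, rng):
    v = rng.normal(size=(k, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

rng = np.random.default_rng(5)
kappa, n, tau = 1.0, 100, 5e-3
g, dt = heisenberg_sigma_z(unit_vectors(40, rng), tau, steps_per_gate=50)  # t_f = 0.2
r0 = unit_vectors(1, rng)[0]                                   # true Bloch vector
# Record simulated under the backaction-free model itself, Eq. (5.33)
dy = rng.normal(0.0, np.sqrt(dt), size=len(g)) + np.sqrt(kappa) * n * (g @ r0) * dt
r_est, qv = backaction_free_estimate(dy, g, dt, kappa, n, unit_vectors(1700, rng))
print(0.5 * (1.0 + r0 @ r_est))                               # fidelity F = (1 + r0 . r_est)/2
```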

5.5 Numeric Simulations

This section presents the results of numerical simulations, comparing Algorithms 5.1 and 5.2 to the optimum POVM bound in Eq. (5.4). The bound $\langle F\rangle_{\rm opt} = (n+1)/(n+2)$ gives the average fidelity of a single POVM, where the average is taken over measurement outcomes as well as over possible input SCS. Therefore, the results of these simulations are reported as an average over an ensemble of ν trials. All results in this section use ν = 1000 trials.

5.5.1 Simulation parameters

For each trial, we choose a single-qubit Bloch vector from a distribution that is uniform over the surface of the unit sphere. We then use this vector to generate SCSs composed of n qubits. These simulations use the qubit numbers n = 25, 40, 55, 70, 85, and 100. Then, for each initial state and each number of qubits, we generate a single measurement realization $y_t$ and use this record to estimate the initial Bloch vector. In total, 6000 measurement records were generated.

Every simulation uses the same control Hamiltonian, where the randomized piecewise constant control vector $\mathbf{b}(t)$ was generated at the start of the simulation. The directions of rotation are again distributed uniformly across the unit sphere, and no attempt was made to select an optimum realization. The parameters that fully constrain the simulation are the measurement strength κ, the final measurement time $t_f$, and the control gate period τ. With no other scales in the problem, we set κ = 1 and express the remaining two parameters in units of $\kappa^{-1}$.

In Chap. 4 we found that the separable approximation is valid in the regime where the randomizing magnetic field strength satisfies $\kappa \ll b_0$. Fixing the strength to generate a π/2 rotation in one gate period τ means that $b_0 = \pi/(2\tau)$, implying $\kappa\tau \ll 1$. We also found that a gate period $\tau = 5\times10^{-3}\,\kappa^{-1}$ gave less than a 5% RMS tracking error for the separable projection filter (see Sec. 4.6) and places $b_0$ two orders of magnitude above κ.

For n = 25 to 100 qubits we find that the reconstruction fidelities have saturated by a time $t \sim t_f = 0.2\,\kappa^{-1}$, which we fix as the final time for every simulation run. With this final time and gate period, each simulation has 40 randomized π/2 rotations.
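As a consistency check, the stated parameters (with κ = 1) fix the control scale and the number of gates by simple arithmetic:

$$
b_0 = \frac{\pi}{2\tau} = \frac{\pi}{2\times\left(5\times10^{-3}\,\kappa^{-1}\right)} \approx 314\,\kappa,
\qquad
\frac{t_f}{\tau} = \frac{0.2\,\kappa^{-1}}{5\times10^{-3}\,\kappa^{-1}} = 40.
$$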

To implement these simulations efficiently, we exploit two conservation properties of the system. The first is that because the total angular momentum operator $J^2$ commutes with the stochastic unitary of Eq. (5.7), the total angular momentum of the system is conserved. This means that by initializing the system in a state of maximum projection of angular momentum (i.e. in a pure SCS) we are initializing the system in the eigenspace with total angular momentum J = n/2. Rather than considering the entire $d = 2^n$ dimensional Hilbert space, we only need to simulate a spin J = n/2 particle and work in its d = 2J + 1 = n + 1 dimensional Hilbert space. The second property is that the conditional master equation we consider here maps pure states to pure states, because it has no additional loss channel. This means that we can integrate a conditional Schrödinger equation rather than a conditional master equation, a substantial saving in computational overhead, as we propagate a d = n + 1 complex vector in time rather than a d × d complex matrix. Together, these two properties make it computationally feasible to generate 1000 measurement records for a system containing 100 qubits.
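The reduction to the J = n/2 subspace means the collective operators can be built directly as $(n+1)\times(n+1)$ matrices from the standard ladder-operator matrix elements $\langle J, m+1|J_+|J, m\rangle = \sqrt{J(J+1) - m(m+1)}$. A NumPy sketch (the function name is ours) that also checks $J^2 = J(J+1)$ and the commutation relations for n = 100:

```python
import numpy as np

def spin_matrices(J):
    """Jx, Jy, Jz for spin J in the basis |J, m>, ordered m = J, J-1, ..., -J."""
    m = J - np.arange(int(round(2 * J)) + 1)           # eigenvalues of Jz
    Jz = np.diag(m).astype(complex)
    # J_+ maps |J, m> to |J, m+1>: entries on the superdiagonal
    Jp = np.diag(np.sqrt(J * (J + 1) - m[1:] * (m[1:] + 1)), k=1).astype(complex)
    Jx = 0.5 * (Jp + Jp.conj().T)
    Jy = -0.5j * (Jp - Jp.conj().T)
    return Jx, Jy, Jz

n = 100
Jx, Jy, Jz = spin_matrices(n / 2)
print(Jz.shape)                                        # (101, 101) rather than 2**100 square
J2 = Jx @ Jx + Jy @ Jy + Jz @ Jz
print(np.allclose(J2, (n / 2) * (n / 2 + 1) * np.eye(n + 1)))   # True
print(np.allclose(Jx @ Jy - Jy @ Jx, 1j * Jz))                  # True
```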

The actual simulations are implemented in the MATLAB computing environment using a hand-coded weak second-order predictor-corrector stochastic differential equation integrator. The algorithm is described in Kloeden et al. [69, page 200] and was implemented in MATLAB by Brad Chase for his PhD dissertation [37].

Monte Carlo Parameters

The Monte Carlo separable estimator used 250 sample states for each part of the two-step estimation. In the initial step, the 250 mixed states produce a sparse but uniform covering of all possible SCS directions. A typical sampling has an average angular separation between adjacent points of ∼6° and a maximum separation of ∼20°. As mentioned above, the mixed-state radius of the Bloch vector used in these simulations is $r_{\rm mixed} = 0.75$. In the second step, we sample 250 pure states that are constrained to be no more than 45° from the most likely mixed-state direction. Example first- and second-step sampling distributions are shown in Fig. 5.1.

Figure 5.1: An example Monte Carlo state sampling distribution. (Left) An example of the initial mixed-state sampling for the Monte Carlo algorithm, with m = 250 and $r_{\rm mixed} = 0.75$. (Right) An example of resampling m = 250 states about the $+\mathbf{e}_x$ axis with $\Theta_{\max} = 45^\circ$.

For the backaction-free comparison, we use a number of samples matching the density of points in the second resampling step. The resampled solid angle covers approximately 15% of the Bloch sphere, meaning that m = 1700 samples cover the whole sphere with roughly the same density of states. This number of samples leads to an average fidelity between nearest neighbors of $\langle F\rangle_{\rm sample} = 0.9994$, meaning that if the true ML estimate falls between two sample points, the infidelity caused by the Monte Carlo sampling will on average be of order $10^{-4}$. This is well below the infidelity implied by the optimum POVM bound for the simulated qubit numbers, and so any loss in fidelity should not be attributable to sampling errors.
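The sample count follows from the fractional solid angle of a spherical cap of half-angle $\Theta_{\max} = 45^\circ$:

$$
\frac{\Omega_{\rm cap}}{4\pi} = \frac{2\pi\,(1-\cos\Theta_{\max})}{4\pi} = \frac{1-\cos 45^\circ}{2} \approx 0.146,
\qquad
m \approx \frac{250}{0.146} \approx 1.7\times10^{3}.
$$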

5.5.2 Results and discussions

[Figure 5.2 plot: average fidelity of reconstruction versus number of qubits n = 25, 40, 55, 70, 85, 100; series: the bound, separable, Schrödinger.]

Figure 5.2: A comparison of numerical reconstructions to the optimum bound. (Color online.) Data points show the average fidelity of single-shot reconstruction as a function of the number of qubits n, averaged over ν = 1000 randomly chosen pure initial states. Blue circles show the separable estimator. Green diamonds show the backaction-free Schrödinger equation estimator. The optimum POVM bound is shown as a dotted line. Error bars show a standard error of $\pm\sqrt{\mathrm{Var}[F]/\nu}$.

Fig. 5.2 shows the results of the numerical simulations. The trial-averaged reconstruction fidelity is plotted as a function of the number of qubits in the system for both the separable estimator (i.e. with backaction) and the Schrödinger-evolved, backaction-free estimator. The fidelity is computed as the squared overlap between the true single-qubit state for that measurement record and the single-qubit state estimate. In other words, if the true qubit state is given by the Bloch vector $\mathbf{r}_0$ and the estimator reports the Bloch vector $\mathbf{r}_m$, then the fidelity of that reconstruction is $F = \tfrac{1}{2}(1 + \mathbf{r}_0\cdot\mathbf{r}_m)$.

In these numerical experiments, the separable Monte Carlo estimator shows a

significant improvement over a simple backaction-free estimator that considers only

the unitary evolution of the state due to the control fields. The discrepancy grows as the number of qubits increases, with the duration of the measurement held fixed.

Furthermore, the separable estimator almost achieves the optimum bound. The

deficit between the bound and the numerical averages never exceeds 0.21%, with an average of 0.16%, which is still above the expected error caused by the discrete Monte

Carlo sampling. A possible source for this deficit could be the separability assumption

in the projection filtering method, which is known to have a non-negligible tracking

error in the Jz expectation value (see Sec. 4.6).

The performance of the backaction-free Schrodinger estimator is best understood by considering not just the estimate of the initial state given the entire measurement record, but instead the family of estimates obtained by using only part of the measurement record.

Estimator Bias

The Monte Carlo estimators take as input a measurement record y containing data for times $t \in [0, t_f]$ and return an estimate of the initial state $\rho_0$. It is just as easy to consider a whole family of estimates computed with only part of the total measurement record, i.e., instead of using the entirety of y we use $y_s$ for $0 < s \le t_f$ in computing

the estimate. Ideally, having more data should only improve the estimate. However,

in order to use the data at times t > s we are required to compute an estimate for

the state of the system at time s. If this estimate is in fact inaccurate, then any

modeling errors might bias the conclusions drawn from future measurements.

Moreover, both of the estimators considered here have modeling errors. The separable estimator uses the projection filtering equations, which explicitly remove any entangling dynamics. The estimator based simply upon the unitary Schrodinger dynamics commits a much greater sin: it completely ignores any effect the measurement has on the system of qubits. Figures 5.3 and 5.4 indicate what effect these modeling errors have on the average reconstruction fidelity.

[Figure 5.3 plot, panel “Separable Estimator”: average reconstruction fidelity vs. measurement duration (0 to 0.2), one trace per qubit number.]

Figure 5.3: Reconstruction fidelity vs. measurement duration. This plot shows the average reconstruction fidelities for the separable filter estimate as a function of the length of the measurement record. Shown are traces for the 6 qubit numbers considered, which are (in order of decreasing reconstruction fidelity) 100, 85, 70, 55, 40, and 25 qubits. The vertical axis is a linear scale, with grid lines indicating the optimum fidelity bound for these same numbers of qubits. The averaging was over ν = 1000 randomly chosen pure initial states.

Fig. 5.3 shows, for the separable filter, the trial-averaged reconstruction fidelities for all 6 qubit numbers plotted against the duration of the measurement record. It is clear from this figure that a larger signal composed of more qubits improves the final fidelity. It also shows that, as the number of qubits increases, the fidelity improves at a faster rate. Furthermore, the modeling error introduced by the separable approximation does not appear to significantly bias the estimate away from an optimum sample state.

[Figure 5.4 plot, panel “Schrodinger Estimator”: average reconstruction fidelity vs. measurement duration (0 to 0.2), one trace per qubit number.]

Figure 5.4: Reconstruction fidelity vs. measurement duration. This plot shows the average reconstruction fidelities for the backaction-free Schrodinger estimate as a function of the length of the measurement record. Shown are traces for the 6 qubit numbers considered, which are (in order of decreasing reconstruction fidelity) 100, 85, 70, 55, 40, and 25 qubits. The vertical axis is a linear scale, with grid lines indicating the optimum fidelity bound for these same numbers of qubits. The averaging was over ν = 1000 randomly chosen pure initial states.

Fig. 5.4 shows the same plot for the backaction-free Schrodinger estimate. Again, a larger number of qubits improves the reconstruction fidelity, with a higher-fidelity estimate at shorter measurement times. However, it also shows that omitting the backaction from the model significantly decreases the reconstruction fidelity at longer measurement times. This bias tends to be more pronounced as the number of qubits increases.

While the peak reconstruction fidelities of the two methods appear comparable (certainly within error bars), the fact that the Schrodinger estimate is biased away from an optimum state shows the importance of including the backaction in the dynamical model.

Chapter 6

Summary and Outlook

We conclude with a summary of each research chapter and a discussion of the possible

avenues this research might take in the future.

6.1 Quantum optics and quantum stochastic differential equations

Chap. 2 derived a relation between quasi-monochromatic traveling wave packets

and the bosonic Fock space necessary for defining a formal quantum Ito stochastic

calculus. We identified a limit where the continuous-time tensor product decom-

position is consistent with a quasi-monochromatic approximation. The limit was

ultimately enforced by convolving any bounded, square-integrable complex function

with a smoothing kernel, constraining the resulting object to be slowly-varying in

time. A suitable white noise limit was identified when the kernel approached a delta function in conjunction with a limit in which its variation over one optical period remained infinitesimal. This produced a separation of

three timescales. The resulting quantum stochastic integral is on the slowest scale,


the delta-correlated smoothing kernel is in the middle, and the fastest is an optical

period.

This version of a quantum white noise limit is new and distinct from two existing

explanations. The first is a static picture of a bosonic heat bath lacking any dynam-

ical flow of information, which does not capture the fundamental propagation of a

traveling wave field [50, 51]. In an alternative description, Accardi et al. derive a

quantum white noise limit through a rescaling argument [54, 55]. If the system-field coupling Hamiltonian has a fundamental interaction strength λ, then by rescaling time as $t \to t/\lambda^2$ and taking the limit $\lambda \to 0$, a quantum white noise operator appears. While this is no doubt mathematically correct, it is in our opinion ad hoc, as requiring time to be not just big, but specifically $1/\lambda^2$-big, is an artificial constraint. Here we do require a white noise limit in a “middle timescale”, meaning that the smoothing envelope approaches a delta function ($\sigma \to 0$) while the timescale of one carrier oscillation tends to zero faster ($(\sigma\omega_0)^{-1} \to 0$). We do not require any

fixed or delicate scaling law. Additionally, in our model the QSDE treatment stands

independently from any system-field interaction, as long as that coupling respects

the above approximations.

After formulating this quantum white noise approximation, we rederived a QSDE description of the propagator. This derivation applies the recently derived

quantum Wong-Zakai theorem [43]. This result describes how a general interaction

involving scattering operations converges to a valid QSDE. This result allows for

a stochastic description of a dispersive Faraday interaction in a regime of a weak

drive but high optical density. A fundamental characteristic of any scattering-based propagator is that it admits the possibility of multiple scattering events in the intermediate timescale; it effectively renormalizes over this effect.

The scattering propagator we derive is similar to a propagator derived by Bouten

and Silberfarb [80]. They derive a polarizability interaction for a 4-level atom starting

from a QSDE expression for a quantum field coupling two ground states via two excited states. They then adiabatically eliminate the excited atomic states under the usual approximations of weak excitation. The resulting propagator contains similar but not identical scattering processes. The difference is that the resulting integrands for the $d\Lambda^{rr}_t$ and $d\Lambda^{ll}_t$ terms contain only single scattering events. Because their derivation starts from a QSDE for a dipole interaction and then eliminates an atomic manifold, the atoms are only capable of making a single scattering transition in one intermediate time

increment. It is unclear at this point which model is more applicable for describing

the underlying physics or if both are equally “wrong”, just in different ways. It

should be noted that both models agree in the limit of a weak forward scattering

rate and negligible spontaneous emission. More analysis is clearly needed to settle

any debate.

6.2 Classical and quantum probability theory

Chap. 3 served as a review of both classical and quantum probability theory. It

ultimately focused on the mapping, via the spectral theorem, between sets of commuting observables and a classical probability space. When a family of operators $\{Y_i\}_{i\in I}$ pairwise commute, the underlying projectors, i.e., the spectral measure $P(d\lambda)$, define the classical probability model $(\Omega, \mathcal{F}, \mathbb{P})$. The sample space Ω is the set of labels λ, $\mathcal{F}$ is the σ-algebra over Ω on which the spectral measure is defined, and the probability measure is given by the quantum expectation value of the projector associated with each element of $\mathcal{F}$,

$$\mathbb{P}(d\lambda) = \mathrm{Tr}\big(\rho\, P(d\lambda)\big).$$
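As a concrete instance of this mapping, consider a single qubit and the observable $\hat{\mathbf{n}}\cdot\boldsymbol{\sigma}$ with $|\hat{\mathbf{n}}| = 1$, whose spectral measure consists of the two projectors $P(\pm 1) = (I \pm \hat{\mathbf{n}}\cdot\boldsymbol{\sigma})/2$. The sketch below (illustrative helper names, with nested tuples standing in for 2×2 matrices) computes $\mathbb{P}(\lambda) = \mathrm{Tr}(\rho P(\lambda))$ and checks that the classical mean of the random variable $a(\lambda) = \lambda$ reproduces the quantum expectation value.

```python
I2 = ((1, 0), (0, 1))
SX = ((0, 1), (1, 0))
SY = ((0, -1j), (1j, 0))
SZ = ((1, 0), (0, -1))


def lin_comb(coeffs, mats):
    """Return sum_k coeffs[k] * mats[k] for 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(c * m[i][j] for c, m in zip(coeffs, mats)) for j in range(2))
        for i in range(2)
    )


def trace_of_product(a, b):
    """Tr(AB) for 2x2 matrices."""
    return sum(a[i][k] * b[k][i] for i in range(2) for k in range(2))


def density_matrix(r):
    """rho = (I + r . sigma) / 2 for a Bloch vector r."""
    return lin_comb((0.5, 0.5 * r[0], 0.5 * r[1], 0.5 * r[2]), (I2, SX, SY, SZ))


def spectral_measure(n):
    """Projectors P(+1), P(-1) = (I +/- n . sigma) / 2 of the observable n . sigma."""
    return {
        s: lin_comb((0.5, 0.5 * s * n[0], 0.5 * s * n[1], 0.5 * s * n[2]), (I2, SX, SY, SZ))
        for s in (+1, -1)
    }


r = (0.3, -0.4, 0.5)   # system state (Bloch vector), arbitrary example
n = (0.0, 0.6, 0.8)    # measured axis, unit length
rho = density_matrix(r)
# Classical probability measure P(lambda) = Tr(rho P(lambda)) on the labels lambda = +1, -1.
prob = {lam: trace_of_product(rho, proj).real for lam, proj in spectral_measure(n).items()}
classical_mean = sum(lam * p for lam, p in prob.items())             # E[a], a(lambda) = lambda
quantum_mean = trace_of_product(rho, lin_comb(n, (SX, SY, SZ))).real  # Tr(rho n . sigma)
print(prob, classical_mean, quantum_mean)
```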

The utility of this mapping is that it allows us to define a classical stochastic

process that is in some sense equivalent to a QND observable. When restricting

our attention to operators that commute with the projectors defining the classical

space, we are able to treat any operator $A = \int_\Omega a(\lambda)\, P(d\lambda)$ in terms of the classical random variable a. When the eigenvectors of A do not form a complete basis in the underlying Hilbert space, there exist operators X that commute with A but

still must be treated quantum mechanically. Rather than being a liability, this is

in fact a feature, as it allows us to simplify a system-probe interaction by treating

the system quantum mechanically and the probe as a classical random variable.

We call the resulting description semiclassical, but it still captures the physics of a system-probe-measurement model when $\{Y_i\}_{i\in I}$ are the measured probe observables. However, a careful physicist should always check that any operator X considered in this model commutes with every $Y_i$.

We then applied this formalism to rederive the conditional master equation when measurements are generated from the observation process $\{Y_t = U_t^\dagger (A_t + A_t^\dagger) U_t\}_{t\ge 0}$ in vacuum expectation. This derivation is initially performed in the Heisenberg picture, and the result is referred to as a quantum filter. In the quantum filtering language the filtered observables $\pi_t(X)$, the conditional expectations of $U_t^\dagger X U_t$ given the observations, are still operators and are linear combinations of the projectors $P(d\lambda)$. When converted to a conditional master equation, we compute a system state $\rho_t$ that is defined on the system Hilbert space only. When taking an expectation value of a system operator X, we have the equality

$$\pi_t(X)\big|_{Y_t = y_t} = \mathrm{Tr}(\rho_t X) \tag{6.1}$$

for every system operator X and time t. We also described how the quantum filter

is equivalent to a purification of any generalized measurement scheme.

When the conditional master equation has complete information, and the ini-

tial condition is a pure state, then it is sufficient to use an equivalent Schrodinger

equation and calculate the random system state vector |ψt〉. Given a correct and

complete description of the system, a conditional Schrodinger equation is identi-

cal to the stochastic Schrodinger equation derived in the quantum optics literature

and the choice of the measurement observables defines the specific unraveling of the

unconditioned master equation.

In a recent paper, Tsang and Caves define a quantum-mechanics-free subsystem as a set of time-dependent operators subject to the constraint that the operators commute for any pair of times [81]. Given the set of Heisenberg-picture operators $\{O_i(t) : i = 1, \dots, n\}$, if $[O_i(t), O_j(t')] = 0$ for all i, j, t, and t′, then they are free of

the laws of quantum mechanics. This is exactly the same idea as the quantum to

classical mapping via the spectral theorem and relies on exactly the same principle.

We mention this result here as it is an example showing active research utilizing the

mapping between quantum and classical structures. The tools of classical probability

theory should have an important role to play in this line of research. In particular,

the concept of a commutant should be invaluable as it describes the set of operators

that are compatible with this subsystem.

The results of Chap. 5 would not have been possible without noting that the

fundamental noise process driving the conditional master equation is not simply a

Wiener process, but is instead the measurement realization itself. With different

initializations, the conditional master equation produces different innovations and

only a few of them will have the statistics of a Wiener process. From a statistical

perspective, the conditional master equation is an estimator that allows us to predict

the outcomes of measurements performed on the system, given an ancilla-coupled measurement record. Furthermore, in classical estimation theory the concepts of robustness and stability play important roles, as it is desirable for an estimator to be robust to modeling imperfections and incorrect initializations. The stability of

the quantum filter shows that in most cases the conditional master equation is able to

correct itself given bad initial information. This supports the case that a conditional

quantum state should be viewed as a quantum analog to a classical estimator. Using

this perspective, it is possible that one might be able to formulate a variant of the

quantum filter that is more robust to modeling errors or corrupting noise in the

measurement signal.
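A toy version of filter stability can be simulated in a few lines. The sketch assumes a single qubit with measurement operator $\sqrt{\kappa}\,\sigma_z$, no Hamiltonian, and the record convention $dy = 2\sqrt{\kappa}\langle\sigma_z\rangle dt + dW$; under these assumptions only $z = \langle\sigma_z\rangle$ enters the record, and it obeys $dz = 2\sqrt{\kappa}(1 - z^2)\, d\widetilde{W}$, where $d\widetilde{W}$ is the filter's innovation. A correctly initialized filter and a deliberately mis-initialized one are fed the same record. This is an illustration only, not the n-qubit dynamics of Chap. 5, and the function name and parameters are hypothetical.

```python
import math
import random


def simulate_filter_pair(z_true0, z_filter0, kappa=1.0, t_final=5.0, dt=1e-3, seed=7):
    """Euler-Maruyama integration of the toy single-qubit filter for z = <sigma_z>.

    Returns the final z of the correctly initialized filter and of the
    mis-initialized filter, both driven by the same measurement record.
    """
    rng = random.Random(seed)
    gain = 2.0 * math.sqrt(kappa)
    z_true, z_filt = z_true0, z_filter0
    for _ in range(int(round(t_final / dt))):
        dw = rng.gauss(0.0, math.sqrt(dt))
        dy = gain * z_true * dt + dw                   # measurement record increment
        innovation = dy - gain * z_filt * dt           # what the mis-initialized filter sees
        z_true += gain * (1.0 - z_true ** 2) * dw      # its innovation is exactly dw
        z_filt += gain * (1.0 - z_filt ** 2) * innovation
        # Clip to the physical range to guard against discretization overshoot.
        z_true = max(-1.0, min(1.0, z_true))
        z_filt = max(-1.0, min(1.0, z_filt))
    return z_true, z_filt


z_true, z_filt = simulate_filter_pair(z_true0=0.6, z_filter0=-0.6)
print(f"true z = {z_true:+.4f}, mis-initialized filter z = {z_filt:+.4f}")
```

Despite starting on the wrong side of the Bloch sphere, the mis-initialized filter is pulled onto the same trajectory by the record, which is the self-correcting behavior described above.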

We saw in Chap. 2 that, for real optical beams, the continuous-time tensor product decomposition is only an approximation, and that on short enough timescales the operators do not strictly commute. Outside of this approximation, it is not immediately clear if one

could formulate a truly commutative space of operators for the purpose of defining

a conditional expectation. As the technology of ultrafast lasers progresses, it might

be possible to experimentally test a regime where subsequent optical measurements

almost commute and the conditional dynamics might then reveal surprising quantum

effects. Doing so would likely require formulating a conditional expectation on a

noncommutative von Neumann algebra, which in some cases is possible [76], but the

classical probabilistic interpretation is lost [25].

Von Neumann argued that a commuting approximation to almost commuting observables was always possible [82]; however, this has been shown not to be the case [83]. In the context of spin chains, Ogata recently established the existence

of commuting approximations for “macroscopic” observations [84]. It is likely that

these mathematically rigorous results will be invaluable in identifying the consistent

information embedded in a sequence of noncommuting observations.

6.3 Projection filtering for qudit ensembles

The methods of differential geometry are a set of powerful and flexible tools, readily

applied to a wide variety of problems. Orthogonal projections are fundamental to

quantum theory. Chap. 4 combines both of these tools to derive an approximation

to the conditional master equation for an ensemble of n qubits, given a diffusive

measurement of the collective angular momentum projection Jz, and in the presence

of strong global rotations. The approximation was based on the ansatz that if the

system was initialized in an identical tensor product state, $\varrho = \rho^{\otimes n}$, then it should remain close to a state $\rho'^{\otimes n}$ for some single-qubit state ρ′. The approximation was made to find a modified evolution that preserves this symmetry. It was formulated by projecting the conditional master equation, acting on the state $\varrho$, onto the tangent space of the manifold of states

$$\mathcal{P} \equiv \big\{\rho^{\otimes n} : \rho \text{ is a valid qubit state}\big\}. \tag{6.2}$$

We worked in a parametrization where the single qubit state is mapped to a vector

defined within the unit Bloch ball. We were able to derive an analytic expression for

a projection filter that describes the diffusion of the Bloch vector in a non-Euclidean

but isotropic space.
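One reason this parametrization is convenient: on the manifold of Eq. (6.2) the collective $J_z$ statistics depend only on the single-qubit Bloch vector, e.g. $\langle J_z\rangle = n r_z/2$ and $\mathrm{Var}(J_z) = n(1 - r_z^2)/4$ for $\rho^{\otimes n}$. A brute-force sketch for small n, summing over all $2^n$ σ_z outcomes, confirms this; it is an illustration of the product-state structure, not part of the projection-filter derivation.

```python
import itertools
import math


def jz_moments(n, z):
    """Mean and variance of J_z = (1/2) sum_i sigma_z^(i) for a product state rho^{(x)n},
    computed by brute force over all 2^n sigma_z outcomes. Only z = <sigma_z> matters."""
    p_up = 0.5 * (1.0 + z)
    mean = second_moment = 0.0
    for outcome in itertools.product((1, -1), repeat=n):
        prob = math.prod(p_up if s == 1 else 1.0 - p_up for s in outcome)
        jz = 0.5 * sum(outcome)
        mean += prob * jz
        second_moment += prob * jz * jz
    return mean, second_moment - mean ** 2


n, z = 6, 0.3
mean, var = jz_moments(n, z)
print(mean, n * z / 2, var, n * (1 - z * z) / 4)
```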

We subsequently tested the quality of the resulting approximation numerically for

pure spin coherent states. These simulations were performed for systems composed of 25 ≤ n ≤ 100 qubits under a variety of conditions. We first made this comparison without an external Hamiltonian, allowing the system to evolve only under the action of the measurement. In an exact description this model produces a significant amount of spin squeezing, which the simulations confirmed. The projection filter tracked the mean expectation values with ∼90% accuracy but, by design, failed to describe the correlations induced by the squeezing. We then performed the same analysis in the presence of a Hamiltonian driving strong randomized rotations. In this case, spin squeezing failed to accumulate significantly, leading to ≳95% agreement between the exact and projected mean expectation

values and an average fidelity > 80% between the projected and exact states for all

qubit numbers tested.

A natural extension of the projection filter is to move beyond qubits and consider

higher spin systems. Unfortunately, the simplicity of the Bloch sphere is lost for

d > 2. There certainly exist $d^2 - 1$ traceless, orthogonal, Hermitian matrices for decomposing a d-dimensional quantum state. The problem is that in attempting to formulate a mapping between valid quantum states and a $(d^2-1)$-dimensional ball, you find that not every point inside the ball, or on its surface, corresponds to a valid quantum state [68]. Although these orthogonal matrices have a number of useful features, they do not share the same spectra, and so the boundary between valid and invalid states is not isotropic. For qubits, we happily ignored any

issues involving the boundary between valid and invalid states. It is likely that for

qudits willful ignorance may lead to disaster. The best course of action may be to

seek a more abstract representation of the state.
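A concrete qutrit example makes the anisotropy visible. Assuming the common Gell-Mann normalization $\rho = (I + \sqrt{3}\,\mathbf{r}\cdot\boldsymbol{\lambda})/3$, under which $\mathrm{Tr}\rho^2 = (1 + 2|\mathbf{r}|^2)/3$ so that unit vectors have unit purity, the sketch below evaluates the two antipodal unit vectors along $\pm\lambda_8$. Both generators used are diagonal, so the eigenvalues are simply the diagonal entries; the function name is illustrative.

```python
import math

# Diagonal Gell-Mann matrices, normalized so that Tr(lambda_i lambda_j) = 2 delta_ij.
LAMBDA_3 = (1.0, -1.0, 0.0)
LAMBDA_8 = (1.0 / math.sqrt(3.0), 1.0 / math.sqrt(3.0), -2.0 / math.sqrt(3.0))


def diagonal_qutrit_spectrum(r3, r8):
    """Eigenvalues of rho = (I + sqrt(3) (r3 lambda_3 + r8 lambda_8)) / 3.

    Only diagonal generators are used, so rho is diagonal and its eigenvalues
    are its diagonal entries."""
    return tuple(
        (1.0 + math.sqrt(3.0) * (r3 * a + r8 * b)) / 3.0
        for a, b in zip(LAMBDA_3, LAMBDA_8)
    )


for r8 in (-1.0, +1.0):   # two antipodal points on the unit sphere
    print(r8, diagonal_qutrit_spectrum(0.0, r8))
```

The vector $-\hat{\mathbf{e}}_8$ gives the pure state with spectrum (0, 0, 1), while $+\hat{\mathbf{e}}_8$ has the same unit length but an eigenvalue of −1/3, so it is not a physical state.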

In addition to moving to a higher spin system, we can also consider correlated

states. One family of correlated states of general interest are spin squeezed states.

Finding a smooth parametrization for pure spin squeezed states is not difficult as the

canonical example of spin squeezing is generated by a specific Hamiltonian [70]. By

composing the one-parameter group of squeezers with the group of SU(d) rotations, it is likely that one can describe the space of pure spin-d squeezed states as a $d^2$-dimensional manifold, barring any issues with linear independence. Whether there is a wieldy metric induced on this space is another question entirely.

An additional complication is the unavoidable fact that for a model to be at

all experimentally useful, it must be able to handle mixed states and decoherence.

Adding single-qubit decoherence to the separable projection filter is a trivial undertaking. Any map that acts identically and independently on each qubit preserves the manifold of identical separable states, so its generator lies in the tangent space by construction. Our simulations considered only pure-state dynamics because generating exact simulations for n ∼ 50 qubits is quite challenging when the total angular momentum is not a conserved quantity. Chase and Geremia derived a simulation technique that requires only of order $n^2$ parameters for exactly propagating n qubits under symmetry-preserving local decoherence [79]. By applying this or a similar method, we expect to be able to extend our numerical tests to include decoherence in the model.

While the algorithmic and “optimal” nature of the projection filter is appealing,

control and systems engineers confronted by nonlinear problems have derived a number of suboptimal but highly successful estimation techniques. These include linearized extended Kalman filters [85], “unscented” Kalman filters [86], Monte Carlo “particle” filters [87], symmetry-preserving filters [88], and so on. Some or all of these techniques may prove useful for partially observed quantum systems, although without a general mapping between quantum observables and classical statistics, none of these tools would be applicable.

6.4 Qubit State Reconstruction

Chap. 5 applied the quantum filtering formalism to construct a tomographic estimate

for an unknown initial quantum state from an ensemble of identical copies experi-

encing a joint continuous measurement. We found a maximum likelihood estimate

of the initial state, based upon the statistics of a single continuous measurement

realization. The purpose of this work was to extend previous results using continuous measurement for quantum state tomography into a regime where the quantum backaction significantly affects the measurement statistics. In a numerical study with ideal conditions, we found that our estimate nearly saturates an optimum bound. This bound, derived by Massar and Popescu, states that given n copies of a pure qubit state and no other prior information, the best achievable average reconstruction fidelity is $\langle F\rangle_{\rm opt} = (n+1)/(n+2)$.
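The bound is simple to tabulate for the qubit numbers simulated in Chap. 5; exact rational arithmetic makes the optimal infidelity $1/(n+2)$ explicit. The helper name is illustrative.

```python
from fractions import Fraction


def massar_popescu_bound(n):
    """Optimal average fidelity (n + 1) / (n + 2) for n copies of an unknown pure qubit."""
    return Fraction(n + 1, n + 2)


for n in (25, 40, 55, 70, 85, 100):
    bound = massar_popescu_bound(n)
    print(f"n = {n:3d}: <F>_opt = {float(bound):.5f}, infidelity = {float(1 - bound):.5f}")
```

For n = 100 the optimal infidelity is 1/102 ≈ 0.0098, which sets the scale against which the 0.16% average deficit of the separable estimator is measured.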

The problem of identifying a tomographic estimate was mapped to a parameter

estimation problem, where the statistics of the measurement record parametrically

depended upon the initial qubit state. We then found that the likelihood function

for the measurement record was ultimately Gaussian, leading to an equivalence be-

tween a maximum likelihood estimate and a least-squares estimate. Maximizing the

likelihood function then ultimately reduced to minimizing the quadratic variation

of an innovation process, computed from the measurement record and a conditional

state estimate. When the conditional state corresponds to a “correct” description, the innovation is a Wiener process, whose quadratic variation over [0, t] equals t, so the minimum value is ∼ t.
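The equivalence between maximum likelihood and least squares can be illustrated on a deliberately small toy model. Assuming a single qubit with measurement operator $\sqrt{\kappa}\,\sigma_z$, no control Hamiltonian, and the record convention $dy = 2\sqrt{\kappa}\langle\sigma_z\rangle dt + dW$, each candidate initial value $z_0 = \langle\sigma_z\rangle$ defines a filter, and its score is the discrete quadratic variation of the resulting innovation. Since each Gaussian increment contributes $-(\text{innovation})^2/(2\,dt)$ to the log-likelihood, the smallest score is the grid-restricted maximum likelihood estimate. Function names and parameter values are hypothetical, and this is far simpler than the controlled n-qubit reconstruction of Chap. 5.

```python
import math
import random


def simulate_record(z0, kappa, dt, steps, rng):
    """Record increments dy = 2 sqrt(kappa) <sigma_z> dt + dW for the toy qubit."""
    gain = 2.0 * math.sqrt(kappa)
    z, record = z0, []
    for _ in range(steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        record.append(gain * z * dt + dw)
        z = max(-1.0, min(1.0, z + gain * (1.0 - z * z) * dw))
    return record


def innovation_quadratic_variation(record, z0, kappa, dt):
    """Sum of squared innovations of the filter started from <sigma_z> = z0."""
    gain = 2.0 * math.sqrt(kappa)
    z, qv = z0, 0.0
    for dy in record:
        innovation = dy - gain * z * dt
        qv += innovation * innovation
        z = max(-1.0, min(1.0, z + gain * (1.0 - z * z) * innovation))
    return qv


kappa, dt, steps = 1.0, 1e-3, 5000             # record duration t_f = 5
record = simulate_record(0.2, kappa, dt, steps, random.Random(3))
candidates = [k / 10 for k in range(-10, 11)]  # grid over the allowed range of z0
scores = {c: innovation_quadratic_variation(record, c, kappa, dt) for c in candidates}
z_ml = min(scores, key=scores.get)             # least squares = maximum likelihood
print(f"ML estimate z0 = {z_ml:+.1f}; score of the true z0: {scores[0.2]:.3f}")
```

The correctly initialized candidate (z0 = 0.2) scores close to the record duration, as expected for a Wiener innovation. For a single copy the likelihood is linear in z0, so the minimizer sits near an end of the grid with the sign of the integrated record, a reminder that one qubit carries very little information about its initial state.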

In order to make a numerical implementation computationally feasible, we ap-

proximated an exact innovation by one computed with the projection filter. As this

reconstruction procedure is tied to the quality of the projection filter, any improve-

ments in its accuracy will almost surely improve the reconstruction fidelity. We

expect that by extending the projection filter to include squeezed states, there will

be near perfect agreement between an innovation computed from the projection filter

and an innovation computed from the exact conditional master equation.

The natural extensions of the projection filter carry over to the case of state reconstruction. The principle of finding a least-squares estimate is system independent, as long as the evolution is still described by a diffusive conditional master equation. However, when moving to qudits, the number of parameters we need to estimate grows

unfavorably with d. It is likely that in the general case, it will no longer be feasible

to simply sample from the compact parameter space and select the most likely candi-

date. Parameter estimation in nonlinear statistical models is a well studied problem

for classical systems and we believe that a classical solution will be adaptable to the

quantum case. One possible avenue to investigate is a statistical importance-sampling and resampling technique, a “particle filter” [87], which has already been adapted to a quantum parameter estimation problem [89].
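A core ingredient of such particle filters is the resampling step, which replaces a weighted cloud of candidate states by an unweighted one. As a minimal sketch (one common choice, not necessarily the scheme used in [89]), systematic resampling draws a single uniform offset and selects particles at evenly spaced points of the cumulative weight distribution; the function name is illustrative.

```python
import random
from itertools import accumulate


def systematic_resample(weights, rng):
    """Return particle indices chosen by systematic resampling.

    weights: nonnegative importance weights (need not be normalized).
    With normalized weights w_j, particle j is selected floor(N w_j) or
    ceil(N w_j) times, and zero-weight particles are never selected.
    """
    n = len(weights)
    total = float(sum(weights))
    cumulative = list(accumulate(w / total for w in weights))
    cumulative[-1] = 1.0                      # guard against round-off in the last sum
    offset = rng.random()                     # one uniform draw in [0, 1)
    indices, j = [], 0
    for k in range(n):
        u = (offset + k) / n                  # evenly spaced points in [0, 1)
        while j < n - 1 and cumulative[j] <= u:
            j += 1
        indices.append(j)
    return indices


weights = [0.1, 0.2, 0.0, 0.3, 0.4]
chosen = systematic_resample(weights, random.Random(2))
print(chosen)
```

Compared with drawing N independent samples, the single shared offset keeps each particle's copy count within one of its expected value, which reduces the extra variance introduced by resampling.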

From our perspective, it is an open question as to whether or not minimizing the

innovation’s quadratic variation is the optimum statistical test to use. In hypothesis

testing, the test based on the ratio of two likelihood functions is known to be the most powerful of all statistical tests (for simple hypotheses this is the Neyman-Pearson lemma). From that fact, one possible method

for improving the reconstruction procedure is to compute the likelihood ratio between

each candidate state and a master equation initialized in the completely mixed state.

In hypothesis testing, this ratio compares the likelihood of the data under your model with the likelihood under a null hypothesis. For a quantum system

the most logical null hypothesis is the completely mixed state. It is possible that

by computing this likelihood ratio we will be able to better discriminate the signal

arising from the initial state from the signal caused by the quantum backaction.
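Such a likelihood ratio is cheap to compute once innovations are available. The sketch below assumes a single-qubit toy model (measurement operator $\sqrt{\kappa}\,\sigma_z$, no Hamiltonian, record $dy = 2\sqrt{\kappa}\langle\sigma_z\rangle dt + dW$), for which the completely mixed null hypothesis corresponds to $z_0 = \langle\sigma_z\rangle = 0$. Modeling each innovation as $N(0, dt)$ gives $\log L = -\mathrm{QV}/(2\,dt) + \text{const}$, so the log-likelihood ratio is a difference of quadratic variations. Names and parameters are hypothetical.

```python
import math
import random


def innovations(record, z0, kappa, dt):
    """Innovations dy - 2 sqrt(kappa) z dt of the toy qubit filter started from z0."""
    gain = 2.0 * math.sqrt(kappa)
    z, out = z0, []
    for dy in record:
        e = dy - gain * z * dt
        out.append(e)
        z = max(-1.0, min(1.0, z + gain * (1.0 - z * z) * e))
    return out


def log_likelihood_ratio(record, z_candidate, kappa, dt, z_null=0.0):
    """log[L(candidate) / L(null)] with each innovation modeled as N(0, dt),
    so that log L = -QV / (2 dt) + const."""
    def qv(z0):
        return sum(e * e for e in innovations(record, z0, kappa, dt))
    return (qv(z_null) - qv(z_candidate)) / (2.0 * dt)


# Synthetic record from a qubit with <sigma_z> = 0.8 evolving under the same toy model.
kappa, dt = 1.0, 1e-3
rng = random.Random(11)
gain, z, record = 2.0 * math.sqrt(kappa), 0.8, []
for _ in range(5000):
    dw = rng.gauss(0.0, math.sqrt(dt))
    record.append(gain * z * dt + dw)
    z = max(-1.0, min(1.0, z + gain * (1.0 - z * z) * dw))

sign = 1.0 if sum(record) > 0 else -1.0
for z_candidate in (0.8 * sign, 0.0, -0.8 * sign):
    print(z_candidate, log_likelihood_ratio(record, z_candidate, kappa, dt))
```

Candidates on the side of the Bloch sphere favored by the record beat the mixed-state null (positive log ratio), while those on the opposite side lose to it.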

It is still an open question as to why this continuous measurement scheme approaches the optimum bound computed by Massar and Popescu. Because the numerical reconstructions perform so well, it is likely that the randomized controls are mapping the

continuous measurement to a uniform measure over all spin coherent states, as this

is known to achieve the optimum bound [74]. Understanding what effective POVM

a given controlled continuous measurement implements will likely be a powerful result in itself. Armed with that knowledge, a clever experimentalist could engineer any number of complicated measurement protocols, not least unambiguous state discrimination [63].
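As a check on the statement that a uniform measure over spin coherent states achieves the bound, its average fidelity can be worked out in a few lines. The derivation assumes the standard covariant POVM on the symmetric subspace of n qubits, that the outcome $\hat{\mathbf n}$ is reported directly as the estimate, and that θ is the angle between $\mathbf{r}_0$ and $\hat{\mathbf n}$:

```latex
\begin{aligned}
E(\hat{\mathbf n})\,d\Omega &= \frac{n+1}{4\pi}\,\big(|\hat{\mathbf n}\rangle\langle\hat{\mathbf n}|\big)^{\otimes n}\,d\Omega,
  \qquad \int d\Omega\, E(\hat{\mathbf n}) = \Pi_{\rm sym},\\
p(\hat{\mathbf n}) &= \frac{n+1}{4\pi}\,u^{n},
  \qquad u \equiv |\langle \hat{\mathbf n}|\psi\rangle|^2 = \tfrac12\big(1 + \mathbf r_0\cdot\hat{\mathbf n}\big),
  \qquad d\Omega = 2\pi\sin\theta\,d\theta \;\to\; 4\pi\,du,\\
\langle F\rangle &= \int d\Omega\; p(\hat{\mathbf n})\,u = (n+1)\int_0^1 u^{n+1}\,du = \frac{n+1}{n+2}.
\end{aligned}
```

The same change of variables confirms normalization, $(n+1)\int_0^1 u^n\,du = 1$.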

Appendix A

Paraxial Optics

This appendix reviews the paraxial wave equation and ultimately calculates the Fourier transform of a paraxial mode function. The paraxial wave equation begins by assuming a quasi-monochromatic solution to the wave equation that takes the form of a rapidly oscillating plane wave, $\exp(+i(k_0 z - \omega_0 t))$, modulated by an envelope function that changes slowly in both space and time. If there is a vector-valued function $U(\mathbf{x}, t)$ satisfying the wave equation

$$\nabla^2 U(\mathbf{x},t) - \frac{1}{c^2}\frac{\partial^2}{\partial t^2} U(\mathbf{x},t) = 0, \tag{A.1}$$

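then we hypothesize a real-valued solution of the form $U(\mathbf{x},t) = U^{(+)}(\mathbf{x},t) + \text{c.c.}$, where

$$U^{(+)}(\mathbf{x},t) = u^{(+)}(\mathbf{x},t)\, e^{+i(k_0 z - \omega_0 t)} \tag{A.2}$$

and $u^{(+)}(\mathbf{x},t)$ is a slowly varying function. Slowly varying is characterized by the inequalities

$$\left|\frac{\partial^2 u^{(+)}}{\partial z^2}\right| \ll k_0 \left|\frac{\partial u^{(+)}}{\partial z}\right| \ll k_0^2 \left|u^{(+)}\right| \tag{A.3a}$$

and

$$\left|\frac{\partial^2 u^{(+)}}{\partial t^2}\right| \ll \omega_0 \left|\frac{\partial u^{(+)}}{\partial t}\right| \ll \omega_0^2 \left|u^{(+)}\right|. \tag{A.3b}$$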


Based upon this assumption one can then neglect the terms in the wave equation that are second derivatives of $u^{(+)}$ with respect to z and t. Using $k_0 = \omega_0/c$, the result is the paraxial wave equation,

$$\nabla_T^2 u^{(+)} + 2ik_0\left(\frac{\partial u^{(+)}}{\partial z} + \frac{1}{c}\frac{\partial u^{(+)}}{\partial t}\right) = 0. \tag{A.4}$$

Here $\nabla_T^2$ denotes the Laplacian with respect to the remaining transverse coordinates x and y, and we will denote the transverse position as $\mathbf{x}_T$. It is often said that the paraxial approximation is valid under the assumption that the envelope function $u^{(+)}$ varies slowly compared to an optical period, $2\pi/\omega_0$.

Under the change of variables $(\mathbf{x}; t) \to (\mathbf{x}_T, z; t - z/c)$ the paraxial wave equation becomes independent of the retarded time $t_r = t - z/c$. Without loss of generality it can be assumed that

$$U^{(+)}(\mathbf{x},t) = f(t_r)\, u^{(+)}_T(\mathbf{x}_T, z)\, e^{-i\omega_0 t_r} \tag{A.5}$$

for any slowly varying f. In this case we have

$$\nabla_T^2 u^{(+)}_T(\mathbf{x}_T, z) = -2ik_0 \frac{\partial}{\partial z} u^{(+)}_T(\mathbf{x}_T, z). \tag{A.6}$$

If we make the replacements $z \to t$ and $k_0 \to m/\hbar$, then Eq. (A.6) is identical to a two-dimensional Schrodinger equation in free space. Furthermore, in the Fourier domain each plane-wave component is an eigenstate of the “Hamiltonian” $-\nabla_T^2/(2k_0)$, with eigenvalue $|\mathbf{k}_T|^2/(2k_0)$, ultimately implying

$$\tilde u^{(+)}_T(\mathbf{k}_T, z) = \tilde u^{(+)}_T(\mathbf{k}_T, 0)\, e^{-i\frac{|\mathbf{k}_T|^2}{2k_0} z}, \tag{A.7}$$

where $\mathbf{k}_T = k_x \mathbf{e}_x + k_y \mathbf{e}_y$. Note that this Fourier transform is with respect to the transverse components only and is still a function of the longitudinal component z.

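Taking the Fourier transform of the full solution, $U(\mathbf{x},t) \to \tilde U(\mathbf{k},t)$,

$$\tilde U^{(+)}(\mathbf{k},t) = \tilde u^{(+)}_T(\mathbf{k}_T, z=0) \int \frac{dz}{\sqrt{2\pi}}\, e^{-ik_z z}\, e^{-i\frac{|\mathbf{k}_T|^2}{2k_0} z}\, f(t - z/c)\, e^{ik_0(z - ct)}. \tag{A.8}$$

If we change variables from taking the spatial transform with respect to z to transforming with respect to the retarded time $t_r$, we find that

$$\tilde U^{(+)}(\mathbf{k},t) = c\, \tilde f\!\left(\frac{c|\mathbf{k}_T|^2}{2k_0} + ck_z - \omega_0\right) \tilde u^{(+)}_T(\mathbf{k}_T, 0)\, e^{-ic\left(\frac{|\mathbf{k}_T|^2}{2k_0} + k_z\right)t}, \tag{A.9}$$

where $\tilde f(\omega)$ is the temporal Fourier transform of f.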

We know that regardless of any approximations the positive-frequency component of a traveling wave solution evolves according to $\tilde U^{(+)}(\mathbf{k},t) = \tilde U^{(+)}(\mathbf{k},0)\, e^{-ic|\mathbf{k}|t}$. Evidently, the paraxial approximation amounts to the approximation

$$\omega(\mathbf{k}) = c|\mathbf{k}| \approx c\left(\frac{|\mathbf{k}_T|^2}{2k_0} + k_z\right). \tag{A.10}$$

This approximation seems slightly at odds with the fact that |k| is strictly nonneg-

ative, while the right-hand side of Eq. (A.10) extends to negative frequencies. This

issue is resolved by the assumption that $u^{(+)}(\mathbf{x},t)$ is a slowly varying function, or equivalently that $\tilde u^{(+)}(\mathbf{k},t)$ is sharply peaked about $\mathbf{k} = 0$. Because of the carrier plane wave, $\tilde U^{(+)}(\mathbf{k},t)$ is then sharply peaked about $\mathbf{k}_0 = k_0\mathbf{e}_z$. While in principle the approximate ω(k) can be negative, these negative components never contribute as long as the paraxial approximation holds. Finally, we find that

$$\tilde U^{(+)}(\mathbf{k},t) = c\, \tilde f\big(\omega(\mathbf{k}) - \omega_0\big)\, \tilde u^{(+)}_T(\mathbf{k}_T, 0)\, e^{-i\omega(\mathbf{k})t}. \tag{A.11}$$
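Both the accuracy of Eq. (A.10) near the carrier and its failure at negative $k_z$ are easy to see numerically. The sketch compares $c|\mathbf{k}|$ with the paraxial form in illustrative units with $c = 1$ and $k_0 = 1$; the function names are hypothetical.

```python
import math


def exact_frequency(k_t, k_z, c=1.0):
    """omega(k) = c |k| for transverse magnitude k_t and longitudinal component k_z."""
    return c * math.hypot(k_t, k_z)


def paraxial_frequency(k_t, k_z, k0, c=1.0):
    """Right-hand side of Eq. (A.10): c (|k_T|^2 / (2 k0) + k_z)."""
    return c * (k_t * k_t / (2.0 * k0) + k_z)


k0 = 1.0
for k_t, k_z in [(0.01, 1.001), (0.1, 1.0), (0.5, 1.0), (0.01, -1.0)]:
    exact = exact_frequency(k_t, k_z)
    approx = paraxial_frequency(k_t, k_z, k0)
    print(f"k_T = {k_t:5.2f}, k_z = {k_z:+6.3f}: c|k| = {exact:.8f}, paraxial = {approx:+.8f}")
```

Near $\mathbf{k}_0$ the two agree to high order, the error grows as $|\mathbf{k}_T|/k_0$ increases, and for $k_z \approx -k_0$ the paraxial frequency is negative while $c|\mathbf{k}|$ is not; such components carry no weight when the spectrum is sharply peaked about $\mathbf{k}_0$.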

Appendix B

Classical Stochastic Calculus

This appendix reviews the derivation of the Stratonovich and Ito integrals as well as

the rules of Ito calculus. We attempt to describe the salient points found in standard

texts while leaving out the proofs.

Stochastic integration, in either the quantum or classical sense, begins from a form of functional integration. The traditional Riemann integral takes a function f(t) and integrates with respect to a small difference in its argument ∆t. In a functional integral, as defined by Stieltjes, the function f(t) is integrated with respect to a small difference in another function g(t), i.e.,

$$\int_0^t f(s)\, dg(s) \equiv \lim_{n\to\infty} \sum_{i=1}^{n} f(t_i^*)\big(g(t_i) - g(t_{i-1})\big), \tag{B.1}$$

where $t_{i-1} \le t_i^* \le t_i$. This limit can be shown to make sense if f and g are reasonably well behaved. One requirement is that g cannot vary “too much” over the integration interval: the total variation of g must be finite [53]. As with the usual definition of a Riemann integral, the convergence of this integral does not depend upon where $t_i^*$ lies in the interval $[t_{i-1}, t_i]$.
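For smooth functions the choice of $t_i^*$ indeed washes out in the limit. A minimal sketch with $f(s) = s$ and $g(s) = s^2$ on [0, 1], for which the integral equals 2/3, evaluating left-point, midpoint and right-point sums; the helper name and mesh size are illustrative.

```python
def stieltjes_sum(f, g, a, b, n, fraction):
    """Riemann-Stieltjes sum of f dg on [a, b] with n equal subintervals.

    The integrand is evaluated at t* = t_{i-1} + fraction * (t_i - t_{i-1}),
    so fraction = 0, 0.5, 1 gives left-point, midpoint and right-point sums."""
    h = (b - a) / n
    total = 0.0
    for i in range(1, n + 1):
        t_prev, t_next = a + (i - 1) * h, a + i * h
        t_star = t_prev + fraction * (t_next - t_prev)
        total += f(t_star) * (g(t_next) - g(t_prev))
    return total


sums = {frac: stieltjes_sum(lambda s: s, lambda s: s * s, 0.0, 1.0, 10_000, frac)
        for frac in (0.0, 0.5, 1.0)}
print(sums, 2 / 3)
```

All three sums differ from 2/3 only by terms that vanish as the mesh is refined, in contrast to the Brownian case discussed next.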

A stochastic integral, the basic building block of a stochastic differential equation (SDE), replaces both functions f and g by stochastic processes. However, in this replacement the problems of working with nondifferentiable functions lead to a more delicate situation. The fact that Brownian motion has a nowhere-differentiable trajectory means that its total variation is infinite, leading to a divergence in a Riemann-Stieltjes limit [22]. Not only does this mean that we are forced to consider a different kind of limit, but the choice of $t_i^*$ makes a dramatic difference to its mathematical and statistical properties. Specifically, if $t_i^* = t_{i-1}$, one arrives at an Ito integral, which has several desirable statistical properties but does not obey the chain rule of ordinary calculus. If, however, $t_i^*$ is taken at the midpoint of the interval, giving a Stratonovich integral, then the rules of calculus are preserved but the statistical properties are more involved. Fortunately, there exists a simple conversion between the two integral definitions. We will discuss all of these issues in greater detail in the following section.
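The divergence of the total variation, and its contrast with the quadratic variation that drives Ito calculus, can be seen on a single simulated path. The sketch (illustrative parameters) samples a Wiener path on [0, 1] on a fine mesh and then sums its increments over successively coarser meshes. Since $\mathbb{E}|\Delta w| = \sqrt{2\Delta t/\pi}$, the total variation over N intervals grows on average like $\sqrt{2N/\pi}$, while the sum of squared increments stays near 1; by the triangle inequality, coarsening the mesh can only decrease the total variation.

```python
import math
import random

rng = random.Random(0)
n_fine = 2 ** 16
dt = 1.0 / n_fine
fine_increments = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n_fine)]

results = []
for block in (2 ** 10, 2 ** 6, 2 ** 2, 1):
    # The same path observed on a coarser mesh: sum consecutive fine increments.
    increments = [sum(fine_increments[i:i + block]) for i in range(0, n_fine, block)]
    total_variation = sum(abs(dw) for dw in increments)
    quadratic_variation = sum(dw * dw for dw in increments)
    results.append((len(increments), total_variation, quadratic_variation))
    print(f"{len(increments):6d} intervals: total variation {total_variation:8.3f}, "
          f"quadratic variation {quadratic_variation:.3f}")
```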

Concretely, the Ito and Stratonovich integrals begin with the following definitions. Consider partitioning the time interval [0, τ] into an increasingly dense mesh of n ordered times $\{t_i\}$, $0 = t_0 < t_1 < \cdots < t_n = \tau$. The Ito integral takes the (time-adapted) process $\{x_t\}_{t\ge 0}$ and defines the integral of the well-behaved function $b(x_t)$, with respect to the Wiener process $w_t$, to be

$$\int_0^\tau b(x_t)\, dw_t \equiv \lim_{n\to\infty} \sum_{i=1}^{n} b(x_{t_{i-1}})\,\big(w_{t_i} - w_{t_{i-1}}\big). \tag{B.2}$$

This limit is then shown to converge, with probability one, to an (almost) unique object.¹ The Stratonovich integral takes a different definition,

$$\int_0^\tau b(x_t) \circ dw_t \equiv \lim_{n\to\infty} \sum_{i=1}^{n} b\!\left(\frac{x_{t_i} + x_{t_{i-1}}}{2}\right)\big(w_{t_i} - w_{t_{i-1}}\big). \tag{B.3}$$

These two definitions arrive at fundamentally different, but not unrelated, integrals.

One integral can be converted to another by using a simple trick, derived in Sec.

B.1.1.
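As a numerical illustration of how much the choice of $t_i^*$ matters, the sketch below evaluates both Riemann sums, Eqs. (B.2) and (B.3), for the simplest case $b(x) = x$ with $x_t = w_t$ on one sampled Wiener path. It assumes NumPy and a uniform mesh; the function name and parameter values are illustrative only. The Stratonovich sum telescopes to $w_\tau^2/2$, the ordinary-calculus answer, while the Ito sum approaches $(w_\tau^2 - \tau)/2$.

```python
import numpy as np

def ito_and_stratonovich_w_dw(tau=1.0, n=1_000_000, seed=0):
    """Riemann sums for the integral of w_t dw_t over [0, tau] on a uniform mesh."""
    rng = np.random.default_rng(seed)
    dt = tau / n
    dw = rng.normal(0.0, np.sqrt(dt), size=n)     # independent Wiener increments
    w = np.concatenate(([0.0], np.cumsum(dw)))    # w at t_0 = 0, t_1, ..., t_n = tau
    ito = np.sum(w[:-1] * dw)                     # Eq. (B.2): integrand at t_{i-1}
    strat = np.sum(0.5 * (w[1:] + w[:-1]) * dw)   # Eq. (B.3): endpoint average
    return ito, strat, w[-1]

tau = 1.0
ito, strat, w_tau = ito_and_stratonovich_w_dw(tau)
print(f"Ito sum          {ito:+.5f}   (w^2 - tau)/2 = {(w_tau**2 - tau) / 2:+.5f}")
print(f"Stratonovich sum {strat:+.5f}   w^2/2         = {w_tau**2 / 2:+.5f}")
```

The two sums differ by half the discrete quadratic variation, which tends to $\tau/2$ rather than zero.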

¹ There is a slight caveat: one could add another random process that is nonzero only with probability zero.

Before moving on to the operational and statistical properties of these integrals, it is worth noting that the choice of calculus is crucial when modeling a classical physical system with SDEs. Fortunately, the question of which calculus to use is answered by the Wong-Zakai theorem [90] (see the introduction to Appendix D). Wong and Zakai showed that an ordinary differential equation containing a piecewise-smooth approximation to Brownian motion converges to a Stratonovich equation, not an Ito equation. Therefore, if one derives an equation of motion for a system that includes an approximation to Brownian motion, the solution to that equation must be interpreted in the Stratonovich sense.

B.1 Ito Calculus

The rules of Ito calculus can be derived with varying levels of detail and sophistication. Their practical purpose is to give a method for manipulating and combining multiple Ito integrals into new expressions. The bottom line is that the standard rules of differential calculus, such as the product rule $d(fg) = f\,dg + g\,df$, must be extended to include a second-order correction; see Eqs. (B.14) through (B.17). An often-cited reference for the derivation of the Ito integral and Ito calculus is the book by Oksendal [22]. There he shows how the limits in Eqs. (B.2) and (B.3) may or may not converge. Specifically, the standard technique of defining an integral as the limit of a Riemann sum fails, because in doing so one ultimately encounters the quantity called the total variation,

$$\lim_{\Delta t\to 0} \sum_{i=1}^{n} \left| w_{t_i} - w_{t_{i-1}} \right| \tag{B.4}$$

for the partition of times $a = t_0 < \cdots < t_n = b$. One can show that, with probability one, this is infinite for the Wiener process. However, one can also show that if, instead of summing the absolute value of each increment, $|w_{t_i} - w_{t_{i-1}}|$, one sums the square of each increment, $(w_{t_i} - w_{t_{i-1}})^2$, then one obtains a finite quantity. This is called the quadratic variation, and with probability one,

$$\lim_{\Delta t\to 0} \sum_{i=1}^{n} \left( w_{t_i} - w_{t_{i-1}} \right)^2 = b - a. \tag{B.5}$$

This says that while the Wiener process travels an infinite absolute distance in any finite time, its RMS displacement only grows like the square root of time. Using the fact that the Wiener process has a well-behaved quadratic variation, the limits defining the Ito and Stratonovich integrals make sense if one considers their squared expectation value, the so-called $L^2(P)$ limit.
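The contrast between Eqs. (B.4) and (B.5) is easy to see numerically. The sketch below (NumPy assumed; parameter values arbitrary) samples Wiener increments on $[0, T]$ with $T = 1$ for successively finer meshes: the sum of absolute increments grows like $\sqrt{2nT/\pi}$, while the sum of squared increments settles near $T$.

```python
import numpy as np

def variations(T=1.0, n=100_000, seed=1):
    """Total and quadratic variation of one sampled Wiener path on [0, T]."""
    rng = np.random.default_rng(seed)
    dw = rng.normal(0.0, np.sqrt(T / n), size=n)  # Wiener increments, variance T/n
    total = np.sum(np.abs(dw))                    # Eq. (B.4); mean is sqrt(2 n T / pi)
    quadratic = np.sum(dw**2)                     # Eq. (B.5); tends to T
    return total, quadratic

for n in (100, 10_000, 1_000_000):
    tv, qv = variations(n=n)
    print(f"n={n:>9}: total variation {tv:9.2f}, quadratic variation {qv:.4f}")
```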

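For a flavor of how a squared expectation value might make sense for a Wiener process, observe that, because it is constructed to have Gaussian statistics over any finite interval $[s, t]$ partitioned as $s = t_0 < \cdots < t_n = t$,

$$\mathbb{E}\left[ \left( \sum_{i=1}^{n} (w_{t_i} - w_{t_{i-1}}) \right)^2 \right] = \mathbb{E}\left[ (w_t - w_s)^2 \right] = t - s. \tag{B.6}$$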

The way this is turned into an integral is that if $b_t$ is a time-adapted stochastic process, i.e., it is assumed to be independent of the increments $w_{t'} - w_t$ for times $t' > t$, then it can be shown that

$$\lim_{n\to\infty} \mathbb{E}\left[ \left( \sum_{i=1}^{n} b_{t_{i-1}} (w_{t_i} - w_{t_{i-1}}) \right)^2 \right] = \lim_{n\to\infty} \mathbb{E}\left[ \sum_{i=1}^{n} b_{t_{i-1}}^2 (t_i - t_{i-1}) \right] \tag{B.7}$$

as long as $\mathbb{E}\left( \int_s^t b_{t'}^2\, dt' \right) < \infty$ [22]. A fair amount of analysis goes into showing for which kinds of processes a piecewise-constant approximation of the integrand is adequate, and more work is needed to extend the proof to hold with probability one. We mention this here only because it indicates the line of reasoning that relates the product of two Ito integrals to an integral over time of the product of the integrands.
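The identity in Eq. (B.7) can be checked by Monte Carlo. The sketch below assumes NumPy and takes the adapted integrand $b_t = w_t$ (an illustrative choice), evaluated at the left endpoint of each interval as the Ito definition requires. Both sides should then agree with $\sum_i t_{i-1}\,\Delta t = T^2(1 - 1/n)/2$ up to a Monte Carlo error of order $10^{-2}$.

```python
import numpy as np

def ito_isometry_check(T=1.0, n=40, paths=50_000, seed=2):
    """Monte Carlo estimate of both sides of Eq. (B.7) for the adapted integrand b_t = w_t."""
    rng = np.random.default_rng(seed)
    dt = T / n
    dw = rng.normal(0.0, np.sqrt(dt), size=(paths, n))
    w = np.cumsum(dw, axis=1)
    w_left = np.hstack([np.zeros((paths, 1)), w[:, :-1]])  # b_{t_{i-1}} = w_{t_{i-1}}
    lhs = np.mean(np.sum(w_left * dw, axis=1) ** 2)        # E[(sum_i b_{t_{i-1}} dw_i)^2]
    rhs = np.mean(np.sum(w_left**2, axis=1) * dt)          # E[sum_i b_{t_{i-1}}^2 dt]
    exact = T**2 * (1 - 1 / n) / 2                         # sum_i t_{i-1} dt
    return lhs, rhs, exact

lhs, rhs, exact = ito_isometry_check()
print(f"E[(sum b dw)^2] ~ {lhs:.4f},  E[sum b^2 dt] ~ {rhs:.4f},  exact {exact:.4f}")
```

The cross terms $b_{t_{i-1}} b_{t_{j-1}} \Delta w_i \Delta w_j$ with $i \neq j$ average to zero only because the integrand never looks ahead of the increment multiplying it.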

Moving from a consistent definition of an integral to a mature calculus involves placing constraints on what kind of integrands we are able to use in a stochastic integral. Consider the example of the recursively defined Ito process

$$x_t = x_0 + \int_0^t a(s, x_s)\, ds + \int_0^t b(s, x_s)\, dw_s, \tag{B.8}$$

where $a$ and $b$ are continuous functions that are once differentiable in time and twice differentiable in $x$. An extremely common and useful notational device is to represent this integral in differential form,

$$dx_t = a(t, x_t)\, dt + b(t, x_t)\, dw_t. \tag{B.9}$$

The utility of this notation is apparent in that, if $x_t$ were deterministic (setting $b(t, x_t) = 0$), then $x_t$ would be the solution to the ordinary differential equation

$$\frac{dx}{dt} = a(t, x). \tag{B.10}$$

This is why these kinds of equations are called stochastic differential equations.

A typical application is to consider two such equations, so that in addition to $x_t$ we have the differential

$$dy_t = c(t, y_t)\, dt + d(t, y_t)\, dw_t. \tag{B.11}$$

We wish to find a differential for the product $x_t y_t$, or even some other function $f(x_t, y_t)$. The answer to the first example is

$$\begin{aligned} d(x_t y_t) = {}& a(t, x_t)\, y_t\, dt + b(t, x_t)\, y_t\, dw_t \\ &+ x_t\, c(t, y_t)\, dt + x_t\, d(t, y_t)\, dw_t \\ &+ b(t, x_t)\, d(t, y_t)\, dt. \end{aligned} \tag{B.12}$$

This expression can be derived by computing the $L^2(P)$, $\Delta t_i \to 0$ limit of the definition of the Ito integral. It is important to emphasize that the right-hand side of this equation is itself another Ito integral. By multiplying, adding, and subtracting Ito integrals one still finds “just” another Ito integral, so that the whole construction closes upon itself to form an algebra. Compare this example to a second-order Taylor expansion of the product $x_t y_t$,

$$d(x_t y_t) = dx_t\, y_t + x_t\, dy_t + dx_t\, dy_t. \tag{B.13}$$

If we apply the Ito rules $dw_t\, dw_t = dt$ and $dt\, dw_t = dt\, dt = 0$ to the second-order product $dx_t\, dy_t$, the only surviving term is $(b\, dw_t)(d\, dw_t) = b\, d\, dt$. Hence Eqs. (B.12) and (B.13) are consistent. The general case for some function $f(x_t, y_t)$ is given by the second-order Taylor expansion,

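$$\begin{aligned} df(x_t, y_t) = {}& \frac{\partial f(x_t, y_t)}{\partial x_t}\, dx_t + \frac{\partial f(x_t, y_t)}{\partial y_t}\, dy_t \\ &+ \frac{1}{2} \frac{\partial^2 f(x_t, y_t)}{\partial x_t^2}\, dx_t\, dx_t + \frac{1}{2} \frac{\partial^2 f(x_t, y_t)}{\partial y_t^2}\, dy_t\, dy_t + \frac{\partial^2 f(x_t, y_t)}{\partial x_t\, \partial y_t}\, dx_t\, dy_t, \end{aligned} \tag{B.14}$$

where

$$dx_t\, dx_t = b^2(t, x_t)\, dt, \tag{B.15}$$

$$dy_t\, dy_t = d^2(t, y_t)\, dt, \text{ and} \tag{B.16}$$

$$dx_t\, dy_t = b(t, x_t)\, d(t, y_t)\, dt. \tag{B.17}$$

In order for this general Ito expansion to be well defined, $f(x, y)$ must have finite first and second derivatives.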
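To see that the $dx_t\, dy_t$ term in Eq. (B.13) does not vanish, the sketch below (NumPy assumed) uses two geometric Brownian motions driven by the same Wiener path, $dx_t = \mu x_t\, dt + \sigma x_t\, dw_t$ and $dy_t = \nu y_t\, dt + \rho y_t\, dw_t$, sampled from their known closed-form Ito solutions $x_t = e^{(\mu - \sigma^2/2)t + \sigma w_t}$ and $y_t = e^{(\nu - \rho^2/2)t + \rho w_t}$. The parameter values are arbitrary. On the mesh, $\Delta(xy) = y\,\Delta x + x\,\Delta y + \Delta x\,\Delta y$ holds exactly, and the accumulated $\sum \Delta x\,\Delta y$ should approach $\int b\, d\, dt = \int \sigma\rho\, x_t y_t\, dt$, as in Eq. (B.17).

```python
import numpy as np

def product_rule_check(T=1.0, n=500_000, seed=3,
                       mu=0.1, sigma=0.5, nu=-0.2, rho=0.8):
    """Compare sum(dx * dy) with int sigma*rho*x*y dt for two GBMs sharing one w."""
    rng = np.random.default_rng(seed)
    dt = T / n
    t = np.linspace(0.0, T, n + 1)
    w = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), size=n))))
    # Closed-form Ito solutions with x_0 = y_0 = 1
    x = np.exp((mu - sigma**2 / 2) * t + sigma * w)
    y = np.exp((nu - rho**2 / 2) * t + rho * w)
    dx, dy = np.diff(x), np.diff(y)
    first_order = np.sum(y[:-1] * dx + x[:-1] * dy)         # y dx + x dy terms
    cross = np.sum(dx * dy)                                # the dx dy term of Eq. (B.13)
    ito_term = np.sum(sigma * rho * x[:-1] * y[:-1] * dt)  # int b d dt, Eq. (B.17)
    total = x[-1] * y[-1] - 1.0                            # x_T y_T - x_0 y_0
    return total, first_order, cross, ito_term

total, first_order, cross, ito_term = product_rule_check()
print(f"x_T y_T - 1 = {total:.5f}, first-order terms = {first_order:.5f}")
print(f"sum dx dy = {cross:.5f}, int b d dt = {ito_term:.5f}")
```

Dropping `cross`, as ordinary calculus would, leaves a discrepancy of order $\int \sigma\rho\, x_t y_t\, dt$ that does not shrink as the mesh is refined.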

B.1.1 The Ito conversion

Suppose we have the recursive Ito-form SDE

$$x_\tau = x_0 + \int_0^\tau a(x_t)\, dt + \int_0^\tau b(x_t)\, dw_t. \tag{B.18}$$

We then assert that this same process has a corresponding Stratonovich form,

$$x_\tau = x_0 + \int_0^\tau \hat{a}(x_t)\, dt + \int_0^\tau b(x_t) \circ dw_t. \tag{B.19}$$

($\hat{a}$ has no relation to the Fourier transform.) Our goal is then to find the relation between the functions $a(x)$ and $\hat{a}(x)$ that makes this assertion true. Subtracting these two expressions leads to the equality

$$I_c \equiv \int_0^\tau b(x_t) \circ dw_t - \int_0^\tau b(x_t)\, dw_t = \int_0^\tau a(x_t)\, dt - \int_0^\tau \hat{a}(x_t)\, dt. \tag{B.20}$$

This difference is known as the Ito correction term.

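To lighten the notation we define the increments $\Delta x_i \equiv x_{t_i} - x_{t_{i-1}}$, $\Delta w_i \equiv w_{t_i} - w_{t_{i-1}}$, and $\Delta t_i \equiv t_i - t_{i-1}$. The Ito correction can then be written as

$$\begin{aligned} I_c &= \lim_{n\to\infty} \sum_{i=1}^{n} \left( b\!\left( \frac{x_{t_i} + x_{t_{i-1}}}{2} \right) - b(x_{t_{i-1}}) \right) \Delta w_i \\ &= \lim_{n\to\infty} \sum_{i=1}^{n} \left( b\!\left( x_{t_{i-1}} + \tfrac{1}{2} \Delta x_i \right) - b(x_{t_{i-1}}) \right) \Delta w_i. \end{aligned} \tag{B.21}$$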

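We can write the integrand very suggestively in terms of a prelimit form of the derivative of $b(x)$. By defining

$$\frac{\Delta b}{\Delta x}(x) \equiv \frac{b(x + \Delta x) - b(x)}{\Delta x}, \tag{B.22}$$

with $\Delta x = \tfrac{1}{2} \Delta x_i$, we then have

$$b\!\left( x_{t_{i-1}} + \tfrac{1}{2} \Delta x_i \right) - b(x_{t_{i-1}}) = \frac{1}{2} \frac{\Delta b}{\Delta x}(x_{t_{i-1}})\, \Delta x_i. \tag{B.23}$$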

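With the recursive definition of $x_t$, we can substitute for $\Delta x_i$ to find

$$I_c = \lim_{n\to\infty} \sum_{i=1}^{n} \frac{1}{2} \frac{\Delta b}{\Delta x}(x_{t_{i-1}}) \left( a(x_{t_{i-1}})\, \Delta t_i + b(x_{t_{i-1}})\, \Delta w_i \right) \Delta w_i. \tag{B.24}$$

However, by the rules of Ito calculus (i.e., $\Delta t_i\, \Delta w_i \to 0$ and $\Delta w_i\, \Delta w_i \to \Delta t_i$ with probability one), the whole expression converges to

$$I_c = \frac{1}{2} \int_0^\tau \frac{db}{dx}(x_t)\, b(x_t)\, dt. \tag{B.25}$$

Comparing with Eq. (B.20), we ultimately find that

$$\hat{a}(x) = a(x) - \frac{1}{2} \frac{db}{dx}(x)\, b(x). \tag{B.26}$$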

This conversion between Ito and Stratonovich equations is vitally important in Chap. 4, where the first-order rules of ordinary calculus allow for the application of differential geometry to a stochastic system. Sec. 4.2.4 gives an example of why using Stratonovich calculus is necessary.
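Eq. (B.26) can be tested on a case with a known answer. The sketch below assumes NumPy and takes the Ito SDE $dx = \mu x\, dt + \sigma x\, dw$, whose exact solution is $x_T = x_0 e^{(\mu - \sigma^2/2)T + \sigma w_T}$; here $b(x) = \sigma x$, so Eq. (B.26) gives $\hat{a}(x) = \mu x - \tfrac12 \sigma^2 x$. The Euler-Maruyama scheme (left-point, Ito) is applied with the drift $a$, and Heun's predictor-corrector scheme (endpoint-averaged, which converges to the Stratonovich solution) is applied with $\hat{a}$. Both should reproduce the exact solution, while Heun with the unconverted drift $a$ should miss by a factor of about $e^{\sigma^2 T/2}$. Parameter values are illustrative.

```python
import numpy as np

def conversion_check(T=1.0, n=50_000, seed=4, mu=0.5, sigma=0.8, x0=1.0):
    """Ito SDE dx = mu x dt + sigma x dw versus its Stratonovich form, Eq. (B.26)."""
    rng = np.random.default_rng(seed)
    dt = T / n
    dw = rng.normal(0.0, np.sqrt(dt), size=n)
    a = lambda x: mu * x                          # Ito drift
    b = lambda x: sigma * x                       # noise coefficient, db/dx = sigma
    a_hat = lambda x: a(x) - 0.5 * sigma * b(x)   # Stratonovich drift, Eq. (B.26)

    def euler_maruyama(drift):
        # Left-point (Ito) discretization
        x = x0
        for dwi in dw:
            x = x + drift(x) * dt + b(x) * dwi
        return x

    def heun(drift):
        # Endpoint-averaged predictor-corrector (Stratonovich) discretization
        x = x0
        for dwi in dw:
            xp = x + drift(x) * dt + b(x) * dwi
            x = x + 0.5 * (drift(x) + drift(xp)) * dt + 0.5 * (b(x) + b(xp)) * dwi
        return x

    exact = x0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * dw.sum())
    return exact, euler_maruyama(a), heun(a_hat), heun(a)

exact, em, heun_converted, heun_unconverted = conversion_check()
print(f"exact {exact:.5f}  Ito/EM {em:.5f}  "
      f"Stratonovich/Heun with a_hat {heun_converted:.5f}  with a {heun_unconverted:.5f}")
```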

Appendix C

Quantum Stochastic Calculus

This appendix reviews the basic notation and properties of an Ito-form quantum stochastic differential equation (QSDE). Sec. 2.5 discusses at length how a bosonic Fock space $\mathcal{F}(\mathfrak{h})$, defined over the single-particle Hilbert space $\mathfrak{h} = L^2(\mathbb{R}^+) \otimes \mathbb{C}^d$, has the operators $Q^i_t$ and $P^i_t$. Each of these operators has the statistics of a Wiener process when taken in vacuum expectation. They are constructed through linear combinations of the annihilation and creation operators, $A^i_t = a[\chi_{[0,t]}\, e_i]$ and $A^{i\dagger}_t = a^\dagger[\chi_{[0,t]}\, e_i]$, which are also used to form non-Hermitian quantum Ito integrals. In addition to these processes, Sec. 2.6.2 encountered a different kind of operator, the scattering or conservation processes $\Lambda^{ij}_t$. Here we gloss over how $\Lambda^{ij}_t$ can be defined in terms of an operation acting on the single-particle Hilbert space (see [45]). Note that $(\Lambda^{ij}_t)^\dagger = \Lambda^{ji}_t$. To define an integral with respect to $A^i_t$, $A^{i\dagger}_t$, and $\Lambda^{ij}_t$, it is sufficient to know the matrix elements¹

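$$\langle e[f] | A^i_t | e[h] \rangle = \int_0^t ds\, h_i(s)\, \langle e[f] | e[h] \rangle, \tag{C.1a}$$

$$\langle e[f] | A^{j\dagger}_t | e[h] \rangle = \int_0^t ds\, f^*_j(s)\, \langle e[f] | e[h] \rangle, \text{ and} \tag{C.1b}$$

$$\langle e[f] | \Lambda^{ij}_t | e[h] \rangle = \int_0^t ds\, f^*_i(s)\, h_j(s)\, \langle e[f] | e[h] \rangle. \tag{C.1c}$$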

The derivation of the quantum Ito integral and the corresponding stochastic calculus begins in much the same way as for the classical Ito integral. Eq. (B.2) gives the limiting sequence used in defining the Ito integral, with its characteristic nonanticipative integrand. The quantum Ito integral has a similar form, where the integral with respect to $dA^j_t$ is given by the limiting sum (with the mesh of times $0 = t_0 < t_1 < \cdots < t_n = t$)

$$Y_t \equiv \int_0^t X_s\, dA^j_s \equiv \lim_{n\to\infty} \sum_{i=1}^{n} X_{t_{i-1}} \left( A^j_{t_i} - A^j_{t_{i-1}} \right). \tag{C.2}$$

Similar definitions are made for integrals with respect to $A^{j\dagger}_t$, $\Lambda^{ij}_t$, and time. The integrand $X_s$ is required to be a time-adapted process, meaning that it acts as the identity on the Fock space $\mathcal{F}(\mathfrak{h}_{(s,\infty)})$. Note that, in general, $X_s$ is not required to act solely on the Fock space $\mathcal{F}(\mathfrak{h}_{[0,s]})$; it could be an operator defined on a joint space $\mathcal{H}_{\rm sys} \otimes \mathcal{F}(\mathfrak{h}_{[0,s]})$. Because the integrand is time-adapted, it commutes with the differential, so we can write

$$\int_0^t X_s\, dA^j_s = \int_0^t dA^j_s\, X_s. \tag{C.3}$$

We will use both forms, whichever is more convenient. It should not be surprising that proving that the above integrals exist and are finite is more difficult than in the classical setting. We will not reproduce the full result here; see [23, 52] for the proof.

¹ For technical reasons the amplitudes of these states are assumed to be square integrable and to have a large but finite upper bound [25, 45].

However, to get a sense of where the quantum Ito rule comes from, we discuss a few key points. Convergence of the infinitesimal limit is shown by taking a piecewise-constant approximation and then proving convergence of the matrix elements. The vectors chosen for those matrix elements are tensor products of an arbitrary system pure state and an exponential vector. By showing that the result holds for these matrix elements, one can then extend it to hold in expectation for any state composed of convex linear combinations of these vectors. For example, consider the integral $Y_t$ in Eq. (C.2),

$$\langle \psi \otimes e[f] | Y_t | \psi \otimes e[h] \rangle = \lim_{n\to\infty} \sum_{i=1}^{n} \int_{t_{i-1}}^{t_i} ds\, h_j(s)\, \langle \psi \otimes e[f] | X_{t_{i-1}} | \psi \otimes e[h] \rangle. \tag{C.4}$$

For a reasonably large class of integrands the piecewise-constant approximation is appropriate and converges to

$$\langle \psi \otimes e[f] | Y_t | \psi \otimes e[h] \rangle = \int_0^t ds\, h_j(s)\, \langle \psi \otimes e[f] | X_s | \psi \otimes e[h] \rangle. \tag{C.5}$$

Equivalent expressions hold for integrals with respect to $dA^{j\dagger}_t$ and $d\Lambda^{ij}_t$, where $h_j(s)$ is replaced by the appropriate amplitudes given in Eq. (C.1).

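In order to define a proper calculus, one must also consider how products of integrals behave. The elementary step is to consider the matrix elements of products of two of the operators $A^i_t$, $A^{j\dagger}_t$, and $\Lambda^{ij}_t$. Matrix elements such as $\langle e[f] | A^i_t A^j_t | e[h] \rangle$ and $\langle e[f] | A^{i\dagger}_t A^{j\dagger}_t | e[h] \rangle$ are easily calculated, as they involve only the eigenvalue relationship of $A_t$ and the matrix elements given in Eq. (C.1). The nonobvious two-operator matrix elements are [52, Proposition 20.13]

$$\langle e[f] | A^i_t A^{j\dagger}_t | e[h] \rangle = \left( \int_0^t ds\, f^*_j(s) \int_0^t ds\, h_i(s) + \delta_{ij}\, t \right) \langle e[f] | e[h] \rangle, \tag{C.6a}$$

$$\langle e[f] | A^i_t \Lambda^{jk}_t | e[h] \rangle = \left( \int_0^t ds\, f^*_j(s)\, h_k(s) \int_0^t ds\, h_i(s) + \delta_{ij} \int_0^t ds\, h_k(s) \right) \langle e[f] | e[h] \rangle, \tag{C.6b}$$

$$\langle e[f] | \Lambda^{ij}_t A^{k\dagger}_t | e[h] \rangle = \left( \int_0^t ds\, f^*_i(s)\, h_j(s) \int_0^t ds\, f^*_k(s) + \delta_{jk} \int_0^t ds\, f^*_i(s) \right) \langle e[f] | e[h] \rangle, \text{ and} \tag{C.6c}$$

$$\langle e[f] | \Lambda^{ij}_t \Lambda^{k\ell}_t | e[h] \rangle = \left( \int_0^t ds\, f^*_i(s)\, h_j(s) \int_0^t ds\, f^*_k(s)\, h_\ell(s) + \delta_{jk} \int_0^t ds\, f^*_i(s)\, h_\ell(s) \right) \langle e[f] | e[h] \rangle. \tag{C.6d}$$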

Calculating these expressions requires knowledge of the commutation relations between all of the noises and/or their action on the exponential vectors. Without discussing the origin of $\Lambda_t$, writing down the commutation relations is no more intuitive, and less useful, than the matrix elements above. Inspecting these four equations shows that each has two parts. The first is essentially the product of the single-operator matrix elements, up to a factor of $\langle e[f] | e[h] \rangle$. The second part is an additional term due to the noncommuting structure of the processes. If we express these four matrix elements in terms of differentials, we obtain

$$\langle e[f] | dA^i_t\, dA^{j\dagger}_t | e[h] \rangle = \left( O(dt^2) + \delta_{ij}\, dt \right) \langle e[f] | e[h] \rangle, \tag{C.7a}$$

$$\langle e[f] | dA^i_t\, d\Lambda^{jk}_t | e[h] \rangle = \left( O(dt^2) + \delta_{ij}\, h_k(t)\, dt \right) \langle e[f] | e[h] \rangle, \tag{C.7b}$$

$$\langle e[f] | d\Lambda^{ij}_t\, dA^{k\dagger}_t | e[h] \rangle = \left( O(dt^2) + \delta_{jk}\, f^*_i(t)\, dt \right) \langle e[f] | e[h] \rangle, \text{ and} \tag{C.7c}$$

$$\langle e[f] | d\Lambda^{ij}_t\, d\Lambda^{k\ell}_t | e[h] \rangle = \left( O(dt^2) + \delta_{jk}\, f^*_i(t)\, h_\ell(t)\, dt \right) \langle e[f] | e[h] \rangle. \tag{C.7d}$$

Notice that each term of order $dt$ is expressible in terms of the matrix element of a differential: $dt$, $dA^k_t$, etc. Taking the $O(dt^2)$ terms to zero, and asserting that knowing these matrix elements is sufficient, we have the quantum Ito rules

$$dA^i_t\, dA^{j\dagger}_t = \delta_{ij}\, dt, \tag{C.8a}$$

$$dA^i_t\, d\Lambda^{jk}_t = \delta_{ij}\, dA^k_t, \tag{C.8b}$$

$$d\Lambda^{ij}_t\, dA^{k\dagger}_t = \delta_{jk}\, dA^{i\dagger}_t, \tag{C.8c}$$

$$d\Lambda^{ij}_t\, d\Lambda^{k\ell}_t = \delta_{jk}\, d\Lambda^{i\ell}_t. \tag{C.8d}$$

All other products of differentials are zero.
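A compact way to encode Eqs. (C.8a) through (C.8d), sketched below, uses the labeling already implicit in Eq. (C.11): write each basic differential as $d\Lambda^{\alpha\beta}_t$ with $d\Lambda^{00}_t \equiv dt$, $d\Lambda^{i0}_t \equiv dA^{i\dagger}_t$, and $d\Lambda^{0j}_t \equiv dA^j_t$. All four rules, together with “all other products are zero,” then collapse to $d\Lambda^{\alpha\beta}_t\, d\Lambda^{\gamma\delta}_t = \hat{\delta}_{\beta\gamma}\, d\Lambda^{\alpha\delta}_t$, where $\hat{\delta}$ is the Kronecker delta restricted to field indices $\beta, \gamma \ge 1$. The function names are hypothetical; the assertions check each line of Eq. (C.8).

```python
def ito_product(alpha_beta, gamma_delta):
    """Product of basic differentials labeled as in Eq. (C.11):
    (0, 0) -> dt, (i, 0) -> dA^{i dagger}, (0, j) -> dA^j, (i, j) -> dLambda^{ij}.
    Returns the label of the product differential, or None if the product vanishes."""
    (a, b), (c, d) = alpha_beta, gamma_delta
    # Only a matching pair of field indices (b == c >= 1) survives.
    if b == c and b >= 1:
        return (a, d)
    return None

def name(label):
    """Human-readable name of a differential label."""
    if label is None:
        return "0"
    i, j = label
    if (i, j) == (0, 0):
        return "dt"
    if j == 0:
        return f"dA^{i}†"
    if i == 0:
        return f"dA^{j}"
    return f"dΛ^{i}{j}"

d = 2  # number of field channels in this illustration
labels = [(a, b) for a in range(d + 1) for b in range(d + 1)]
for x in labels:
    for y in labels:
        prod = ito_product(x, y)
        if prod is not None:
            print(f"{name(x)} {name(y)} = {name(prod)}")
```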

The remainder of the construction of the quantum Ito calculus is technical, and involves showing how the product of two piecewise approximations converges as $\Delta t_i \to 0$, as well as dealing with the equality

$$\langle X_t\, \psi \otimes e[f] \,|\, Y_t\, \psi \otimes e[h] \rangle = \langle \psi \otimes e[f] \,|\, X^\dagger_t Y_t\, \psi \otimes e[h] \rangle \tag{C.9}$$

for two integrals $X_t$ and $Y_t$. These issues are well beyond our scope, except that we note that when $X_t$ and $Y_t$ are bounded in a suitable sense, everything works out nicely [25, 45].

Before moving on to the specifics of a QSDE for a unitary propagator, we give the following general example of using the quantum Ito rule. (See [45, Proposition 2.4] for all of the necessary qualifiers and assumptions.) Consider the quantum stochastic process $X_t$ given by the Ito integral (with implied sums over repeated indices)

$$X_t = X_0 + \int_0^t d\Lambda^{ij}_s\, F^{ij}_s + \int_0^t dA^{i\dagger}_s\, F^{i0}_s + \int_0^t dA^j_s\, F^{0j}_s + \int_0^t ds\, F^{00}_s. \tag{C.10}$$

The processes $F^{\alpha\beta}_s$ are assumed to act nontrivially only on the joint Hilbert space $\mathcal{H}_{\rm sys} \otimes \mathcal{F}(\mathfrak{h}_{[0,s]})$ and to be integrable (without our really defining what that means). The initial value $X_0$ is assumed to be a bounded operator that acts as the identity on $\mathcal{F}(\mathfrak{h})$. This integral can also be written in differential form as the QSDE

$$dX_t = F^{ij}_t\, d\Lambda^{ij}_t + F^{i0}_t\, dA^{i\dagger}_t + F^{0j}_t\, dA^j_t + F^{00}_t\, dt. \tag{C.11}$$

Given another process $Y_t$ whose differential is

$$dY_t = K^{ij}_t\, d\Lambda^{ij}_t + K^{i0}_t\, dA^{i\dagger}_t + K^{0j}_t\, dA^j_t + K^{00}_t\, dt, \tag{C.12}$$

the product $X_t Y_t$ can also be expressed in terms of a process $Z_t = X_t Y_t$ whose differential is given by

$$dZ_t = M^{ij}_t\, d\Lambda^{ij}_t + M^{i0}_t\, dA^{i\dagger}_t + M^{0j}_t\, dA^j_t + M^{00}_t\, dt, \tag{C.13}$$

where the resulting integrands $M^{\alpha\beta}_t$ are

$$M^{ij}_t = X_t K^{ij}_t + F^{ij}_t Y_t + F^{i\ell}_t K^{\ell j}_t, \tag{C.14a}$$

$$M^{i0}_t = X_t K^{i0}_t + F^{i0}_t Y_t + F^{ij}_t K^{j0}_t, \tag{C.14b}$$

$$M^{0j}_t = X_t K^{0j}_t + F^{0j}_t Y_t + F^{0i}_t K^{ij}_t, \text{ and} \tag{C.14c}$$

$$M^{00}_t = X_t K^{00}_t + F^{00}_t Y_t + F^{0i}_t K^{i0}_t. \tag{C.14d}$$

This result can be viewed as an application of the quantum Ito product rule

$$d(X_t Y_t) = X_t\, dY_t + dX_t\, Y_t + dX_t\, dY_t, \tag{C.15}$$

where $dX_t\, dY_t$ is multiplied out and the second-order differentials are evaluated according to the quantum Ito rules in Eq. (C.8).

C.1 The Quantum Stochastic Unitary

With the general quantum Ito rule firmly in hand, we would like to apply it to find a universal expression for a unitary process $U_t$. This is actually quite straightforward, starting from the fact that, because $U_t$ is unitary, $U^\dagger_t U_t = 1$. We also know that if $U_t$ is independent of the fundamental processes $A_t$, $A^\dagger_t$, and $\Lambda_t$, then $U_t$ is the solution to the ordinary differential equation $dU_t = -iH_t U_t\, dt$. When including the fundamental processes, the objective is to write $U_t$ as a general QSDE and find how unitarity constrains the various integrands. Taking the “noise-free” solution as a starting point, we hypothesize coefficients $G^{\alpha\beta}_t$ so that $U_t$ is given by the QSDE

$$dU_t = G^{ij}_t U_t\, d\Lambda^{ij}_t + G^{i0}_t U_t\, dA^{i\dagger}_t + G^{0j}_t U_t\, dA^j_t + G^{00}_t U_t\, dt, \tag{C.16}$$

and its adjoint is

$$dU^\dagger_t = U^\dagger_t G^{ij\dagger}_t\, d\Lambda^{ji}_t + U^\dagger_t G^{i0\dagger}_t\, dA^i_t + U^\dagger_t G^{0j\dagger}_t\, dA^{j\dagger}_t + U^\dagger_t G^{00\dagger}_t\, dt. \tag{C.17}$$

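The unitarity constraint's impact on the differential is that $d(U^\dagger_t U_t) = d(U_t U^\dagger_t) = 0$. The general Ito product coefficients in Eq. (C.14) then say that, in order for $U_t$ to be unitary,

$$\begin{aligned} U^\dagger_t G^{\alpha\beta}_t U_t + U^\dagger_t G^{\beta\alpha\dagger}_t U_t + U^\dagger_t G^{\ell\alpha\dagger}_t G^{\ell\beta}_t U_t &= 0, \\ U_t U^\dagger_t G^{\beta\alpha\dagger}_t + G^{\alpha\beta}_t U_t U^\dagger_t + G^{\alpha\ell}_t U_t U^\dagger_t G^{\beta\ell\dagger}_t &= 0, \end{aligned} \tag{C.18}$$

for $\alpha, \beta$ starting at zero and the implied sum over $\ell$ starting from 1. Eliminating $U_t$ and $U^\dagger_t$ from the constraints (using $U_t U^\dagger_t = U^\dagger_t U_t = 1$), they simplify to

$$G^{\alpha\beta}_t + G^{\beta\alpha\dagger}_t + G^{\ell\alpha\dagger}_t G^{\ell\beta}_t = G^{\alpha\beta}_t + G^{\beta\alpha\dagger}_t + G^{\alpha\ell}_t G^{\beta\ell\dagger}_t = 0. \tag{C.19}$$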

The coefficients $G^{\alpha\beta}$ are typically written in terms of a different set of operators, $S^{ij}_t$, $L^i_t$, and $H_t$, because $(S^{ij}_t, L^i_t, H_t)$ have more desirable and physically relevant properties than $G^{\alpha\beta}$. Immediately we can see that some part of $G^{00}_t$ should be $-iH_t$, as the general QSDE solution contains the case $U(t) = \exp(-iHt)$ for a time-independent Hamiltonian $H$. Also, if $G^{0i} = G^{i0} = 0$, then Eq. (C.19) reads $G^{00} = -G^{00\dagger}$, implying that $G^{00}_t = -iH_t$ for Hermitian $H_t$.

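To identify how $S^{ij}_t$ fits into the picture, consider for the moment the case where each $G^{ij}_t = g_{ij}\mathbb{1}$ for some complex coefficients $g_{ij}$. Then the constraints for $G^{ij}_t$ become

$$g_{ij} + g^*_{ji} + g^*_{\ell i}\, g_{\ell j} = g_{ij} + g^*_{ji} + g_{i\ell}\, g^*_{j\ell} = 0. \tag{C.20}$$

Writing the constants in terms of a matrix $\mathbf{G}$, we have

$$\mathbf{G} + \mathbf{G}^\dagger + \mathbf{G}^\dagger \mathbf{G} = 0. \tag{C.21}$$

If we define the matrix $\mathbf{S} \equiv \mathbf{G} + \mathbb{1}$, then this constraint reads

$$\begin{aligned} 0 &= \mathbf{S} - \mathbb{1} + \mathbf{S}^\dagger - \mathbb{1} + (\mathbf{S}^\dagger - \mathbb{1})(\mathbf{S} - \mathbb{1}) \\ 0 &= \mathbf{S} - \mathbb{1} + \mathbf{S}^\dagger - \mathbb{1} + (\mathbf{S}^\dagger \mathbf{S} - \mathbf{S}^\dagger - \mathbf{S} + \mathbb{1}) \\ \mathbb{1} &= \mathbf{S}^\dagger \mathbf{S}. \end{aligned} \tag{C.22}$$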

In other words, $\mathbf{S}$ is a unitary matrix. Returning to the general case, we can still define the operators

$$S^{ij}_t \equiv G^{ij}_t + \delta_{ij}. \tag{C.23}$$

Then Eq. (C.19) transforms the constraint for $G^{ij}_t$ into the constraint

$$S^{ji\dagger}_t S^{jk}_t = S^{ij}_t S^{kj\dagger}_t = \delta_{ik}. \tag{C.24}$$

In other words, $S^{ij}_t$ is a unitary matrix of operators.

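By introducing $S^{ij}_t$, the constraint for $G^{0i}_t$ also becomes significantly simpler. Specifically,

$$\begin{aligned} 0 &= G^{0i}_t + G^{i0\dagger}_t + G^{\ell 0\dagger}_t G^{\ell i}_t \\ 0 &= G^{0i}_t + G^{i0\dagger}_t + G^{\ell 0\dagger}_t \left( S^{\ell i}_t - \delta_{\ell i} \right) \\ G^{0i}_t &= -G^{\ell 0\dagger}_t S^{\ell i}_t. \end{aligned} \tag{C.25}$$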

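The remaining coefficients $G^{i0}_t$ are essentially arbitrary, and are relabeled as the operators $L^i_t$. Writing the constraint for $G^{00}_t$ in terms of $L^i_t$ gives $G^{00}_t + G^{00\dagger}_t = -L^{i\dagger}_t L^i_t$. Bringing all of these results together, we can re-express $G^{\alpha\beta}_t$ in terms of $(S^{ij}_t, L^i_t, H_t)$,

$$\begin{aligned} G^{ij}_t &= S^{ij}_t - \delta_{ij}, \\ G^{i0}_t &= L^i_t, \\ G^{0j}_t &= -L^{i\dagger}_t S^{ij}_t, \\ G^{00}_t &= -iH_t - \tfrac{1}{2} L^{i\dagger}_t L^i_t. \end{aligned} \tag{C.26}$$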

In other words,

$$dU_t = \left( \left( S^{ij}_t - \delta_{ij} \right) d\Lambda^{ij}_t + L^i_t\, dA^{i\dagger}_t - L^{i\dagger}_t S^{ij}_t\, dA^j_t - \tfrac{1}{2} L^{i\dagger}_t L^i_t\, dt - iH_t\, dt \right) U_t. \tag{C.27}$$

This is the standard form of the QSDE for a general propagator $U_t$. While in principle the initial value $U_0$ could be any unitary operator acting on the system space $\mathcal{H}_{\rm sys}$, typically $U_t$ describes an interaction-picture representation of the system-field dynamics, and then $U_0 = 1$.
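The claim that Eq. (C.26) satisfies both forms of Eq. (C.19) for any unitary $S$, any $L^i$, and any Hermitian $H$ can be checked numerically. The sketch below (NumPy assumed) uses random finite matrices as hypothetical stand-ins for the system operators and stores $G^{\alpha\beta}$ as one block matrix. In that form the Hermitian conjugate of the whole block matrix has $(G^{\beta\alpha})^\dagger$ in block $(\alpha, \beta)$, and the projector `P` drops the $\ell = 0$ block, implementing the sum over $\ell$ starting from 1.

```python
import numpy as np

rng = np.random.default_rng(5)
d, n = 2, 3   # hypothetical sizes: d field channels, n-level system

def rand_c(shape):
    return rng.normal(size=shape) + 1j * rng.normal(size=shape)

# Hypothetical test data: S is any unitary on C^d (x) H_sys, viewed as a d x d
# array of n x n operator blocks S^{ij}; the L^i are arbitrary; H is Hermitian.
S_big, _ = np.linalg.qr(rand_c((d * n, d * n)))
L = [rand_c((n, n)) for _ in range(d)]
A = rand_c((n, n))
H = (A + A.conj().T) / 2

def block(a, b):
    return slice(a * n, (a + 1) * n), slice(b * n, (b + 1) * n)

# Assemble G^{alpha beta} of Eq. (C.26) as one ((d+1) n) x ((d+1) n) matrix.
G = np.zeros(((d + 1) * n, (d + 1) * n), dtype=complex)
G[block(0, 0)] = -1j * H - 0.5 * sum(Li.conj().T @ Li for Li in L)
for i in range(1, d + 1):
    G[block(i, 0)] = L[i - 1]
    G[block(0, i)] = -sum(L[k - 1].conj().T @ S_big[block(k - 1, i - 1)]
                          for k in range(1, d + 1))
G[n:, n:] = S_big - np.eye(d * n)   # G^{ij} = S^{ij} - delta_ij

# P removes the alpha = 0 block, implementing "sum over ell starting from 1".
P = np.kron(np.diag([0.0] + [1.0] * d), np.eye(n))
Gd = G.conj().T             # block (alpha, beta) of Gd is (G^{beta alpha})^dagger
c1 = G + Gd + Gd @ P @ G    # first form of Eq. (C.19), from d(U^dagger U) = 0
c2 = G + Gd + G @ P @ Gd    # second form of Eq. (C.19), from d(U U^dagger) = 0
print(f"max |first constraint| = {np.abs(c1).max():.1e}, "
      f"max |second constraint| = {np.abs(c2).max():.1e}")
```

Replacing `S_big` by a non-unitary matrix makes both constraints fail, which is the numerical counterpart of Eq. (C.24).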

One final remark is that if we take each coefficient to be its own stochastic process, $\{G^{\alpha\beta}_t U_t\}_{t \ge 0}$, then these processes are still required to be time-adapted. This results in the constraint that the initial values $S^{ij}_0$, $L^i_0$, and $H_0$ must all be system operators only, as they must act as the identity on the field at that time. Furthermore, if $S^{ij}_t$, $L^i_t$, and $H_t$ are known to be time-independent, then they must be system operators only.

C.1.1 Unitary evolution

One final calculation we include is the unitary evolution of a time-independent system operator $X$. In the quantum stochastic literature this unitary evolution is written in terms of a map $j_t(\cdot)$ generating the “flow” of the operator,

$$j_t(X) \equiv U^\dagger_t X U_t. \tag{C.28}$$

This map can be written as the solution to a QSDE,

$$j_t(X) = U^\dagger_0 X U_0 + \int_0^t dj_s(X), \tag{C.29}$$

with differential

$$dj_t(X) = dU^\dagger_t\, X U_t + U^\dagger_t X\, dU_t + dU^\dagger_t\, X\, dU_t. \tag{C.30}$$

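After an exercise in quantum stochastic calculus, one finds the recursive QSDE

$$dj_t(X) = j_t(\mathcal{L}^{ij}_t(X))\, d\Lambda^{ij}_t + j_t(\mathcal{L}^{i0}_t(X))\, dA^{i\dagger}_t + j_t(\mathcal{L}^{0j}_t(X))\, dA^j_t + j_t(\mathcal{L}^{00}_t(X))\, dt, \tag{C.31}$$

where the $\mathcal{L}^{\alpha\beta}_t(\cdot)$ are known as the Evans-Hudson maps and, in terms of $G^{\alpha\beta}_t$, are

$$\mathcal{L}^{\alpha\beta}_t(X) = G^{\beta\alpha\dagger}_t X + X G^{\alpha\beta}_t + G^{k\alpha\dagger}_t X G^{k\beta}_t. \tag{C.32}$$

When written in terms of $(S^{ij}_t, L^i_t, H_t)$, these maps are

$$\mathcal{L}^{ij}_t(X) = S^{ki\dagger}_t X S^{kj}_t - \delta_{ij} X, \tag{C.33a}$$

$$\mathcal{L}^{i0}_t(X) = S^{ki\dagger}_t \left[ X, L^k_t \right], \tag{C.33b}$$

$$\mathcal{L}^{0j}_t(X) = -\left[ X, L^{k\dagger}_t \right] S^{kj}_t, \tag{C.33c}$$

$$\mathcal{L}^{00}_t(X) = +i[H_t, X] + L^{i\dagger}_t X L^i_t - \tfrac{1}{2} L^{i\dagger}_t L^i_t X - \tfrac{1}{2} X L^{i\dagger}_t L^i_t. \tag{C.33d}$$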
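Going from Eq. (C.32) to Eqs. (C.33a) through (C.33d) is a short substitution of Eq. (C.26), and the algebra can be spot-checked numerically. The sketch below (NumPy assumed) draws random finite matrices as hypothetical stand-ins for $S^{ij}$ (blocks of a random unitary), $L^i$, a Hermitian $H$, and an arbitrary system operator $X$, then compares the two forms for every $(\alpha, \beta)$.

```python
import numpy as np

rng = np.random.default_rng(6)
d, n = 2, 3  # hypothetical: 2 field channels, 3-level system

def rand_c(shape):
    return rng.normal(size=shape) + 1j * rng.normal(size=shape)

def dag(M):
    return M.conj().T

def comm(P, R):
    return P @ R - R @ P

# Random test data: S[i][j] are n x n blocks of a unitary on C^d (x) H_sys.
Q, _ = np.linalg.qr(rand_c((d * n, d * n)))
S = [[Q[i*n:(i+1)*n, j*n:(j+1)*n] for j in range(d)] for i in range(d)]
L = [rand_c((n, n)) for _ in range(d)]
A = rand_c((n, n))
H = (A + dag(A)) / 2
X = rand_c((n, n))
I = np.eye(n)

# G^{alpha beta} from Eq. (C.26); index 0 labels the time/vacuum slot.
G = {(0, 0): -1j * H - 0.5 * sum(dag(Li) @ Li for Li in L)}
for i in range(1, d + 1):
    G[i, 0] = L[i - 1]
    G[0, i] = -sum(dag(L[k - 1]) @ S[k - 1][i - 1] for k in range(1, d + 1))
    for j in range(1, d + 1):
        G[i, j] = S[i - 1][j - 1] - (I if i == j else 0 * I)

def eh_from_G(a, b):
    """Eq. (C.32): G^{ba dagger} X + X G^{ab} + sum_k G^{ka dagger} X G^{kb}."""
    return (dag(G[b, a]) @ X + X @ G[a, b]
            + sum(dag(G[k, a]) @ X @ G[k, b] for k in range(1, d + 1)))

def eh_closed(a, b):
    """Eqs. (C.33a)-(C.33d) in terms of (S, L, H)."""
    if a >= 1 and b >= 1:                                   # (C.33a)
        return (sum(dag(S[k][a - 1]) @ X @ S[k][b - 1] for k in range(d))
                - (X if a == b else 0 * X))
    if a >= 1:                                              # (C.33b)
        return sum(dag(S[k][a - 1]) @ comm(X, L[k]) for k in range(d))
    if b >= 1:                                              # (C.33c)
        return -sum(comm(X, dag(L[k])) @ S[k][b - 1] for k in range(d))
    LdL = sum(dag(Li) @ Li for Li in L)                     # (C.33d)
    return 1j * comm(H, X) + sum(dag(Li) @ X @ Li for Li in L) - 0.5 * (LdL @ X + X @ LdL)

err = max(np.abs(eh_from_G(a, b) - eh_closed(a, b)).max()
          for a in range(d + 1) for b in range(d + 1))
print(f"max deviation between Eq. (C.32) and Eqs. (C.33): {err:.2e}")
```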

For an arbitrary model the unitary evolution becomes exceedingly complicated very rapidly. Each coefficient in Eq. (C.31) is itself given by the unitary flow of the operator $Z_t \equiv \mathcal{L}^{\alpha\beta}_t(X)$. Systems lacking any kind of fundamental symmetry will rarely close on a useful subspace of operators, meaning that repeated applications of $\mathcal{L}^{\alpha\beta}_t(\cdot)$ generate ever more complicated operators, spanning a larger and larger space of operators.

In addition to calculating the unitary output of system operators, it is also useful to calculate the output for the fundamental field operators $A^{j\dagger}_t$, $A^i_t$, and $\Lambda^{ij}_t$. In a rather tedious exercise in manipulating the quantum Ito rules, it can be shown that

$$dj_t(A^j_t) = j_t(S^{jk}_t)\, dA^k_t + j_t(L^j_t)\, dt, \tag{C.34a}$$

$$dj_t(A^{i\dagger}_t) = j_t(S^{ik\dagger}_t)\, dA^{k\dagger}_t + j_t(L^{i\dagger}_t)\, dt, \tag{C.34b}$$

$$dj_t(\Lambda^{ij}_t) = j_t(S^{ik\dagger}_t S^{j\ell}_t)\, d\Lambda^{k\ell}_t + j_t(S^{ik\dagger}_t L^j_t)\, dA^{k\dagger}_t + j_t(L^{i\dagger}_t S^{j\ell}_t)\, dA^\ell_t + j_t(L^{i\dagger}_t L^j_t)\, dt. \tag{C.34c}$$

Appendix D

The Quantum Wong-Zakai Theorem

In a classical system, the convergence of an ordinary differential equation to a stochastic one was treated in the work of Wong and Zakai [90]. There they showed that an ODE containing a piecewise-smooth approximation to white noise, $\xi^{(\lambda)}_t$, converges to a Stratonovich integral as $\xi^{(\lambda)}_t \to \xi_t$ with $\lambda \to 0$. For instance, consider the ODE

$$\frac{\partial x^{(\lambda)}(t)}{\partial t} = f(t, x^{(\lambda)}(t)) + g(t, x^{(\lambda)}(t))\, \xi^{(\lambda)}_t. \tag{D.1}$$

The Wong-Zakai theorem states that the integrated solution

$$x^{(\lambda)}(t) = x^{(\lambda)}(0) + \int_0^t ds\, f(s, x^{(\lambda)}(s)) + \int_0^t ds\, g(s, x^{(\lambda)}(s))\, \xi^{(\lambda)}_s \tag{D.2}$$

converges to the Stratonovich integral

$$x_t = x_0 + \int_0^t f(s, x_s)\, ds + \int_0^t g(s, x_s) \circ dw_s. \tag{D.3}$$
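A minimal numerical sketch of the classical statement, assuming NumPy: take $f = \mu x$ and $g = \sigma x$, and let $\xi^{(\lambda)}_t$ be the derivative of the piecewise-linear interpolation of a sampled Wiener path on a mesh of spacing $\lambda$ (the construction Wong and Zakai considered; the parameter values here are arbitrary). Integrating the resulting smooth ODE with a classical fourth-order Runge-Kutta scheme lands on the Stratonovich solution $x_0 e^{\mu T + \sigma w_T}$, not the Ito solution $x_0 e^{(\mu - \sigma^2/2)T + \sigma w_T}$. For this linear example the agreement with the Stratonovich answer holds for every $\lambda$, so the sketch illustrates which calculus emerges rather than the rate of convergence.

```python
import numpy as np

def wong_zakai_demo(T=1.0, n=2_000, substeps=4, seed=7, mu=0.3, sigma=0.9, x0=1.0):
    """Integrate dx/dt = mu x + sigma x xi(t), where xi is the derivative of the
    piecewise-linear interpolation of a sampled Wiener path (mesh lambda = T / n)."""
    rng = np.random.default_rng(seed)
    lam = T / n
    dw = rng.normal(0.0, np.sqrt(lam), size=n)
    f = lambda x, xi: mu * x + sigma * x * xi      # right-hand side of Eq. (D.1)
    x, h = x0, lam / substeps
    for dwi in dw:
        xi = dwi / lam                             # constant slope on this interval
        for _ in range(substeps):                  # classical RK4 on the smooth ODE
            k1 = f(x, xi)
            k2 = f(x + 0.5 * h * k1, xi)
            k3 = f(x + 0.5 * h * k2, xi)
            k4 = f(x + h * k3, xi)
            x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    w_T = dw.sum()
    stratonovich = x0 * np.exp(mu * T + sigma * w_T)             # Eq. (D.3)
    ito = x0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * w_T)   # same SDE read as Ito
    return x, stratonovich, ito

x_ode, x_strat, x_ito = wong_zakai_demo()
print(f"smooth-noise ODE {x_ode:.6f}  Stratonovich {x_strat:.6f}  Ito {x_ito:.6f}")
```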

Appendix B reviews the distinctions between the two most common forms of classical stochastic integration, the Ito integral and the Stratonovich integral, and Appendix C discusses their quantum analogs. This appendix reviews the quantum analog of the Wong-Zakai result, where the specific ODE is for a propagator whose Hamiltonian contains field operators that limit to quantum white noise.

In 2006, Gough derived a quantum limit equivalent to the Wong-Zakai theorem [43]. Specifically, he investigated the convergence of the Schrödinger equation, written in terms of the time-evolution operator $U(t)$, as field operators in the Hamiltonian converge to singular, delta-commuting operators. The specific Hamiltonian considered is given in Eq. (D.9), but before discussing it, we first describe the quantum version of $\xi^{(\lambda)}_t$ and how it can be interpreted as quantum white noise.

D.1 Quantum white noise

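In order to make a connection with a quantum Ito integral as formulated by Hudson and Parthasarathy, the limiting field operators clearly must refer to a Fock space $\mathcal{F}(\mathfrak{h}')$, $\mathfrak{h}' = L^2(\mathbb{R}^+) \otimes \mathbb{C}^d$. The delta-commuting limit is introduced by considering the differentiable functions $\{\xi^{(\lambda)}_i(t) \in \mathfrak{h}' : i = 1, \ldots, d\}$, parameterized by $\lambda > 0$, so that we have the field operators $a[\xi^{(\lambda)}_i(t)]$ and $a^\dagger[\xi^{(\lambda)}_j(t')]$, with

$$\left[ a[\xi^{(\lambda)}_i(t)],\, a^\dagger[\xi^{(\lambda)}_j(t')] \right] = \left\langle \xi^{(\lambda)}_i(t), \xi^{(\lambda)}_j(t') \right\rangle \equiv c_{ij}(\lambda, t - t'). \tag{D.4}$$

This inner product is assumed to satisfy the properties

$$\int_{-\infty}^{\infty} dt\, c_{ij}(\lambda, t) < \infty, \tag{D.5a}$$

$$c_{ij}(\lambda, t) = c^*_{ji}(\lambda, -t), \text{ and} \tag{D.5b}$$

$$\lim_{\lambda\to0} c_{ij}(\lambda, t) = \delta_{ij}\, \delta(t). \tag{D.5c}$$

To simplify the notation, it is convenient to write $a_i(\lambda, t) \equiv a[\xi^{(\lambda)}_i(t)]$.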

These operators end up serving two purposes in the quantum Wong-Zakai theorem. The first is, of course, to act in the limiting Hamiltonian, and the second is to generate the smoothed exponential vectors,

$$e[g^{(\lambda)}] \equiv \exp\left( \int_0^\infty dt\, g_i(t)\, a^\dagger_i(\lambda, t) \right) |\varnothing\rangle. \tag{D.6}$$

Note that, in relation to the one-dimensional representation of Sec. 2.4, $a_i(\lambda, t)$ is almost equivalent to $a[\varphi^{(\sigma)}(t)]$. The difference lies in how the smoothed wave packets are defined. One possible mapping between paraxial optics and the abstract operators $a_i(\lambda, t)$ is to identify $d$ paraxial spatial mode functions $u^{(+)}_i(\mathbf{x}_T, z)$, which satisfy the orthogonality relation $\int d^2x_T\, u^*_i(\mathbf{x}_T, z) \cdot u_j(\mathbf{x}_T, z) = \delta_{ij}\, \sigma_T$. For each paraxial mode there are $d$ independent complex wave-packet envelopes, inducing the smoothing operators $a[\varphi^{(\sigma)}_i(t)]$. In the case of a single paraxial mode, Eq. (2.107) gave the expression for a smoothed paraxial wave packet $g^{(\sigma)}(k, t)$. In order for this expression to be equivalent to the argument of Eq. (D.6), we require that

$$a_i(\lambda, t) \cong e^{+i\omega_0 t}\, a[\varphi^{(\sigma)}_i(-t)]. \tag{D.7}$$

The fact that we require the time-reversed version of $\varphi^{(\sigma)}_i$ is an artifact of defining the smoothing with respect to a convolution. The inclusion of the carrier phase is both mathematically and physically interesting. Physically, it is a reminder that the elements of $\mathfrak{h}'$ represent the part of the light existing on the measurement timescale, which is much slower than the carrier frequency. In fact, the appearance of this phase is intimately related to the rotating-wave approximation: in the rotating frame, atomic transition operators develop explicit time dependence at the carrier frequency. The mathematical relevance of this carrier phase is that it cancels any rapidly oscillating phases in the inner product $\langle \varphi^{(\sigma)}_i(-t), \varphi^{(\sigma)}_j(-t') \rangle$. This cancellation is explicitly apparent by computing

$$\begin{aligned} \left[ a_i(\lambda, t), a^\dagger_j(\lambda, t') \right] &= c_{ij}(\lambda, t - t') \\ &= e^{+i\omega_0 (t - t')} \left\langle \varphi^{(\sigma)}_i(-t), \varphi^{(\sigma)}_j(-t') \right\rangle \\ &= \delta_{ij} \left( \varphi^{(\sigma)} \star \varphi^{(\sigma)}(t - t') - i \frac{1}{\omega_0} \frac{d\varphi^{(\sigma)}}{dt} \star \varphi^{(\sigma)}(t - t') \right), \end{aligned} \tag{D.8}$$

where in the final line we inserted the unequal-time inner product in Eq. (2.92) and used the fact that, for the smoother $\varphi^{(\sigma)}_i$, $\|g\|/\sqrt{\tau} \to 1$. Here $\lambda$ is simply a parameter representing the formal limit that, as $\lambda \to 0$, $\sigma \to 0$ and $(\sigma\omega_0)^{-1} \to 0$.

D.2 The quantum Wong-Zakai theorem

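The Hamiltonian that Gough ultimately considers is

$$H_{\rm int}(\lambda, t) = \hbar \left( \sum_{i,j=1}^{d} E_{ij}\, a^\dagger_i(\lambda, t)\, a_j(\lambda, t) + \sum_{i=1}^{d} E_{i0}\, a^\dagger_i(\lambda, t) + \sum_{j=1}^{d} E_{0j}\, a_j(\lambda, t) + E_{00} \right). \tag{D.9}$$

The quantum Wong-Zakai theorem then takes the solution to the equation

$$\frac{d}{dt} U(\lambda, t) = -\frac{i}{\hbar} H_{\rm int}(\lambda, t)\, U(\lambda, t), \qquad U(\lambda, 0) = 1, \tag{D.10}$$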

and proves that the limit $\lim_{\lambda\to0} U(\lambda, t) \equiv U_t$ is a quantum stochastic unitary process, solving a quantum Stratonovich differential equation. Fortunately, a quantum Stratonovich integral is also expressible in terms of a quantum Ito integral. Gough provides a well-constructed conversion between both forms, which we review shortly. The limit is shown to hold “weakly,” in that

$$\lim_{\lambda\to0} \left\langle \psi \otimes e[f^{(\lambda)}] \right| U(\lambda, t) \left| \phi \otimes e[g^{(\lambda)}] \right\rangle = \left\langle \psi \otimes e[f] \right| U_t \left| \phi \otimes e[g] \right\rangle \tag{D.11}$$

for any system state vectors $\psi$ and $\phi$ and the exponential vectors $e[f^{(\lambda)}]$ and $e[g^{(\lambda)}]$ as defined in Eq. (D.6). In addition to the matrix elements of the propagator, the limit is also shown to hold weakly for the Heisenberg-picture evolution of an operator $X$, so that

$$\lim_{\lambda\to0} U^\dagger(\lambda, t)\, X\, U(\lambda, t) = U^\dagger_t X U_t. \tag{D.12}$$

Before describing the resulting unitary process, it is worth stressing several advantages of the quantum Wong-Zakai theorem. So long as $H_{\rm int}(\lambda, t)$ and the $\lambda \to 0$ limit are physically justifiable, the specifics of the total state $\rho_{\rm tot}$ are almost irrelevant. The only constraint is that the total state must be expressible in terms of convergent sequences of the matrix elements $\lim_{\lambda\to0} \langle \psi_i \otimes e[f_i^{(\lambda)}] | \rho_{\rm tot} | \phi_j \otimes e[g_j^{(\lambda)}] \rangle$. This means that nonclassical superpositions are allowed. A second advantage is that the presence of the scattering interaction $E_{ij}$ allows for a much broader class of interactions than the linear interactions typically considered in quantum optics. While this dissertation ultimately considers a linear Hamiltonian ($E_{ij}$ will be negligibly small), the fact that the theory can treat a system coupled to an instantaneous number operator $a^\dagger_i(\lambda, t)\, a_i(\lambda, t)$ in a “white noise” limit is no small feat. One possible example is engineering a quantum-optical router, where for some set of modes the operators $E_{ij}$ coherently scatter quanta in a system-dependent way. Finally, much can be said for the fact that the limiting interaction is still described by a unitary operation. While we have constructed the limiting field operators in terms of a measurement timescale, we have not specified what kind of measurement we will be performing. Formulating a conditional estimate for the system given a measurement of the field is one of the main purposes of Chap. 3, but at the level of the system-field interaction, everything remains fully coherent.

D.2.1 Quantum stochastic calculus and operator ordering

In proving the quantum Wong-Zakai theorem, Gough also found an intuitive correspondence between operator orderings and quantum stochastic differential equations. In order to describe the limiting propagator $U_t$ as a solution to a standard Ito-form QSDE, we need to review this correspondence. Ultimately, the correspondence is between quantum Stratonovich equations and time-ordered solutions to a given recursive differential equation. Correspondingly, a quantum Ito equation is identified with a normally ordered solution [43].

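The solution to the Schrödinger equation with a time-dependent Hamiltonian is given by a time-ordered exponential,

$$U(\lambda, t) = \vec{T} \exp\left( -\frac{i}{\hbar} \int_0^t ds\, H_{\rm int}(\lambda, s) \right). \tag{D.13}$$

The time-ordered exponential is a compact shorthand for the iterated integrals

$$\vec{T} \exp\left( -\frac{i}{\hbar} \int_0^t ds\, H_{\rm int}(\lambda, s) \right) = \sum_{n=0}^{\infty} \left( \frac{-i}{\hbar} \right)^n \int_0^t dt_n \cdots \int_0^{t_2} dt_1\, H_{\rm int}(\lambda, t_n) \cdots H_{\rm int}(\lambda, t_1). \tag{D.14}$$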

Note that $H_{\rm int}(\lambda, t)$ need not commute with $H_{\rm int}(\lambda, s)$, and thus the operator ordering is critical in this expression. Nothing in the $\lambda \to 0$ limit changes the operator ordering, so identifying a Stratonovich equation with a time-ordered equation is simply a statement of this fact. The heart of the quantum Wong-Zakai theorem is relating this time-ordered exponential to a quantum Ito integral. As the Wong-Zakai theorem considers the limit of matrix elements between two exponential vectors, this relation is made by comparing the matrix elements of this expression with the matrix elements of an iterated Ito integral.
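To make the role of ordering in Eqs. (D.13) and (D.14) concrete, the sketch below (NumPy assumed, $\hbar = 1$) uses a hypothetical two-level Hamiltonian $H(t) = \sigma_z + 3t\,\sigma_x$, chosen only because $[H(t), H(s)] \neq 0$; it is not Gough's Hamiltonian. The product of short-time propagators with later times to the left agrees with the summed iterated integrals of Eq. (D.14), while the same factors multiplied in the reverse order give a visibly different unitary.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

def H(t):
    """Hypothetical two-level Hamiltonian (hbar = 1) with [H(t), H(s)] != 0."""
    return SZ + 3.0 * t * SX

def exp_step(Hm, dt):
    """exp(-i Hm dt) for Hermitian Hm, via its eigendecomposition."""
    evals, V = np.linalg.eigh(Hm)
    return (V * np.exp(-1j * evals * dt)) @ V.conj().T

def ordered_product(T, n, time_ordered=True):
    dt = T / n
    U = np.eye(2, dtype=complex)
    for k in range(n):
        step = exp_step(H((k + 0.5) * dt), dt)       # midpoint of each time slice
        # Time ordering puts later slices on the left; the reverse is anti-time-ordered.
        U = step @ U if time_ordered else U @ step
    return U

def dyson_series(T, n, order=30):
    """Sum of the iterated integrals in Eq. (D.14): U_m(t) = -i int_0^t H(s) U_{m-1}(s) ds."""
    t = np.linspace(0.0, T, n + 1)
    dt = T / n
    Ht = np.array([H(tk) for tk in t])                       # shape (n + 1, 2, 2)
    term = np.tile(np.eye(2, dtype=complex), (n + 1, 1, 1))  # zeroth-order term
    total = term.copy()
    for _ in range(order):
        F = Ht @ term                                        # H(s) U_{m-1}(s)
        incr = 0.5 * (F[1:] + F[:-1]) * dt                   # trapezoid rule per slice
        term = np.concatenate([np.zeros((1, 2, 2), dtype=complex),
                               -1j * np.cumsum(incr, axis=0)])
        total = total + term
    return total[-1]

U_T = ordered_product(1.0, 2000)
U_D = dyson_series(1.0, 2000)
U_anti = ordered_product(1.0, 2000, time_ordered=False)
print(f"|U_time-ordered - U_Dyson| = {np.linalg.norm(U_T - U_D):.1e}")
print(f"|U_time-ordered - U_anti-ordered| = {np.linalg.norm(U_T - U_anti):.3f}")
```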

Appendix B discusses the classical Ito integral and shows how an integral $x_t = \int_0^t b_s\, dw_s$ for a time-adapted process $b_t$ leads to the Ito rules of calculus, and how it crucially depends upon the statistical independence of $b_t$ from $dw_t$. Not surprisingly, the quantum Ito integral also relies on a similar independence of the integrand from the differential. As Sec. 2.5.1 discussed, the continuous-time tensor product decomposition allows for defining field operators $a[\chi_{(t, t+dt]}\, e_j] = A^j_{t+dt} - A^j_t$, which commute with any operator adapted to the time interval $[0, t]$. The quantum Ito integral with respect to $dA^j_t$ is defined as

$$Y_t \equiv \int_0^t X_s\, dA^j_s \equiv \lim_{n\to\infty} \sum_{i=1}^{n} X_{t_{i-1}} \left( A^j_{t_i} - A^j_{t_{i-1}} \right) \tag{D.15}$$

and likewise an integral with respect to $dA^{k\dagger}_t$ is

$$Z_t \equiv \int_0^t Y_s\, dA^{k\dagger}_s \equiv \lim_{n\to\infty} \sum_{i=1}^{n} Y_{t_{i-1}} \left( A^{k\dagger}_{t_i} - A^{k\dagger}_{t_{i-1}} \right). \tag{D.16}$$

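Taking the matrix element of $Y_t$ between two exponential vectors results in

$$\left\langle e[f] \,\middle|\, \int_0^t X_s\, dA^j_s \,\middle|\, e[h] \right\rangle = \int_0^t ds\, \langle e[f] | X_s | e[h] \rangle\, h_j(s) \tag{D.17}$$

and equivalently

$$\langle e[f] | Z_t | e[h] \rangle = \int_0^t ds\, f^*_k(s)\, \langle e[f] | Y_s | e[h] \rangle. \tag{D.18}$$

Eq. (D.18) is valid because $Y_s$ is time-adapted and therefore commutes with $dA^{k\dagger}_s$. This commuting property carries over to the iterated integral, so by substituting in for $Y_s$,

$$\langle e[f] | Z_t | e[h] \rangle = \int_0^t ds\, f^*_k(s) \int_0^s ds'\, \langle e[f] | X_{s'} | e[h] \rangle\, h_j(s'). \tag{D.19}$$

If we hypothesize the existence of an operator $a_j(t)$ defined by the eigenvalue relationship

$$a_j(t)\, |e[f]\rangle = f_j(t)\, |e[f]\rangle, \tag{D.20}$$

we then have

$$\langle e[f] | Z_t | e[h] \rangle \cong \int_0^t ds \int_0^s ds'\, \left\langle e[f] \right| a^\dagger_k(s)\, X_{s'}\, a_j(s') \left| e[h] \right\rangle. \tag{D.21}$$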

This relation holds for any iterated Ito integral as long as the integrand is expressed in normal order, with all creation operators on the left and all annihilation operators on the right. Because we also have

$$\lim_{\lambda\to 0} a_j(\lambda, t)\,|e[f]\rangle = \lim_{\lambda\to 0} \int_0^\infty ds\, c_{jk}(\lambda, t - s)\, f_k(s)\,|e[f]\rangle = f_j(t)\,|e[f]\rangle, \tag{D.22}$$

we will ultimately find an equivalence between the operator $a_j(t)$ and the limiting form of $a_j(\lambda, t)$.
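The convergence in Eq. (D.22) is the familiar smoothing of a function by a narrowing kernel. A minimal scalar sketch, assuming a single channel, a Gaussian $c(\lambda, \tau)$ of width $\sigma$ standing in for the $\lambda$ dependence (as in the example of Sec. 2.3.2), and a hypothetical mode function $f$ supported on $s \geq 0$:

```python
import numpy as np


def gaussian(tau, sigma):
    """Assumed normalized, real Gaussian smoother of width sigma."""
    return np.exp(-tau**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))


def f(s):
    """Hypothetical complex mode function, zero for s < 0 (f lives on R_+)."""
    s = np.asarray(s, dtype=float)
    return np.where(s >= 0, np.exp(3j * s - s**2), 0.0)


def smoothed(t, sigma, n=20_001):
    """int ds c(t - s) f(s), trapezoid rule on a window of +/- 10 sigma around t."""
    s = np.linspace(t - 10 * sigma, t + 10 * sigma, n)
    y = gaussian(t - s, sigma) * f(s)
    ds = s[1] - s[0]
    return ds * (y.sum() - 0.5 * (y[0] + y[-1]))


t0 = 0.7
for sigma in (0.3, 0.03, 0.003):
    print(sigma, abs(smoothed(t0, sigma) - f(t0)))   # error shrinks as sigma -> 0
```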

The proof of the quantum Wong-Zakai theorem proceeds by converting the time-ordered exponential into normal order, showing that the matrix elements converge to a finite quantity, and then proving a correspondence with an equivalent Ito-form QSDE.

D.2.2 Gauge freedom in the Ito correction

The difference between a Stratonovich equation and an Ito equation is often called the Ito correction term. Since we have identified an Ito equation with the normally ordered version of the iterated integral, the Ito correction term is intimately related to this conversion. The conversion of any product of field operators into normal order is given by Wick's theorem [91], which states that any product of creation and annihilation operators can be written as the sum, over all possible sets of contractions between pairs of operators (including none), of the corresponding normal-ordered products. A contraction between the operators $a$ and $b$ is defined as

$$a^{\bullet} b^{\bullet} \equiv ab - {:}ab{:}, \tag{D.23}$$

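where ${:}ab{:}$ denotes the normal ordering of the two operators. By Wick's theorem we can write the product

$$abc = {:}abc{:} + {:}a^{\bullet}b^{\bullet}c{:} + {:}a^{\bullet}bc^{\bullet}{:} + {:}ab^{\bullet}c^{\bullet}{:}. \tag{D.24}$$

For the boson operators considered here, the only nonzero contraction is

$$a_i^{\bullet}(\lambda, t)\, a_j^{\dagger\bullet}(\lambda, s) = [a_i(\lambda, t), a_j^\dagger(\lambda, s)] = c_{ij}(\lambda, t - s). \tag{D.25}$$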
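The bookkeeping behind Eq. (D.24) is an enumeration of every set of disjoint pairs, including the empty set, with each pair contributing a contraction. The helpers below are hypothetical, written only for illustration: they generate these pairings and, following Eq. (D.25), keep only those in which every contracted pair is an annihilator standing to the left of a creator. They count terms; they do not evaluate operators.

```python
from typing import List, Tuple

# An operator is ("a", label) for an annihilator or ("ad", label) for a creator.
Op = Tuple[str, str]


def partial_pairings(indices: List[int]):
    """Yield every set of disjoint pairs (i, j), i < j, drawn from indices,
    including the empty set: one Wick term per choice of contractions."""
    if not indices:
        yield []
        return
    first, rest = indices[0], indices[1:]
    # Case 1: the first operator is left uncontracted.
    for p in partial_pairings(rest):
        yield p
    # Case 2: the first operator is contracted with some later operator.
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        for p in partial_pairings(remaining):
            yield [(first, partner)] + p


def wick_terms(ops: List[Op]):
    """Keep the pairings whose contractions are all nonzero for bosons, i.e.
    an annihilator to the LEFT of a creator, as in Eq. (D.25)."""
    terms = []
    for pairing in partial_pairings(list(range(len(ops)))):
        if all(ops[i][0] == "a" and ops[j][0] == "ad" for i, j in pairing):
            terms.append(pairing)
    return terms


print(len(list(partial_pairings([0, 1, 2]))))       # number of terms in Eq. (D.24)
print(wick_terms([("a", "i"), ("ad", "j")]))        # normal-ordered term plus c_ij
```

For three generic operators there are four pairings, matching the four terms of Eq. (D.24); for $a_i a_j^\dagger$ two survive, while $a_j^\dagger a_i$ is already normal ordered and keeps only the empty pairing.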

The heart of finding the equivalent Ito QSDE from the time-ordered exponential is to first apply Wick's theorem to each term in the time-ordered exponential, then take the $\lambda \to 0$ limit, and finally re-sum the series. We do not reproduce this calculation here; the details of the limit can be found in the following references. In the absence of the scattering terms, the proof is given in the book by Accardi et al. [55], and the scattering terms were subsequently added by Gough [92]. However, one aspect of the limit must be discussed, because it affects the final limiting QSDE and has its roots in the physical origin of $c_{ij}(\lambda, t - s)$.

In each term of the time-ordered exponential, the operators on the right are always constrained to be at earlier times than the operators on the left. Therefore, when applying Wick's theorem, the contraction $a_i^{\bullet}(\lambda, t)\, a_j^{\dagger\bullet}(\lambda, s)$ is always constrained to have $t \geq s$. This means that when $\lambda \to 0$, only half of the limit $c_{ij}(\lambda, t - s) \to \delta(t - s)$ applies. When a causal constraint is applied to a delta-function limit, an additional complex term involving a Cauchy principal value often appears. For each $c_{ij}(\lambda, \tau)$, this extra term is called a gauge freedom, and it generates, among other things, an effective level shift in the $E_{00}$ term.

As a concrete example, consider the second-order term in the time-ordered expansion,

$$\left(-\frac{i}{\hbar}\right)^2 \int_0^\tau dt \int_0^t ds\, H_{\rm int}(\lambda, t)\, H_{\rm int}(\lambda, s). \tag{D.26}$$

The operator product $H_{\rm int}(\lambda, t) H_{\rm int}(\lambda, s)$ contains 16 terms, with at most four field operators (from the scattering terms) and, in the case of $E_{00}(t)E_{00}(s)$, no field operators. One part of this expression is the integral

$$-\sum_{ij} \int_0^\tau dt \int_0^t ds\, E_{0i}\, a_i(\lambda, t)\, E_{j0}\, a_j^\dagger(\lambda, s). \tag{D.27}$$

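Applying Wick's theorem gives

$$-\sum_{ij} \int_0^\tau dt \int_0^t ds\, E_{0i}\, a_i(\lambda, t)\, E_{j0}\, a_j^\dagger(\lambda, s) = -\sum_{ij} \int_0^\tau dt \int_0^t ds\, E_{0i} E_{j0} \left( {:}a_i(\lambda, t)\, a_j^\dagger(\lambda, s){:} + c_{ij}(\lambda, t - s) \right). \tag{D.28}$$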

The commutator term on the right-hand side will ultimately contribute to the Ito correction term and generate the gauge shift, provided it survives the $\lambda \to 0$ limit. The most basic contribution is therefore the limit

$$\lim_{\lambda\to 0} \int_0^\tau dt \int_0^t ds\, c_{ij}(\lambda, t - s). \tag{D.29}$$

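Substituting Eq. (D.8) for $c_{ij}(\lambda, t - s)$ and dropping the term proportional to $1/\omega_0$ gives

$$\lim_{\lambda\to 0} \int_0^\tau dt \int_0^t ds\, c_{ij}(\lambda, t - s) = \delta_{ij} \lim_{\sigma\to 0} \int_0^\tau dt \int_0^t ds\, \phi^{(\sigma)} \star \phi^{(\sigma)}(t - s). \tag{D.30}$$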

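In Sec. 2.3.2 we used the example of $\phi^{(\sigma)}$ as a real-valued Gaussian with mean zero and standard deviation $\sigma$. In this case it is easy to show that

$$\phi^{(\sigma)} \star \phi^{(\sigma)}(t - s) = \frac{1}{\sigma}\, \phi^{(1)} \star \phi^{(1)}\!\left(\frac{t - s}{\sigma}\right), \tag{D.31}$$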

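where $\phi^{(1)}$ is a mean-zero Gaussian with unit variance. In fact, this is a general property often used in distribution theory: if $\int_{\mathbb{R}} dt\, \phi^{(1)}(t) = 1$, then

$$\lim_{\sigma\to 0} \frac{1}{\sigma}\, \phi^{(1)}(t/\sigma) = \delta(t). \tag{D.32}$$

With $\phi^{(\sigma)}(t) \equiv \phi^{(1)}(t/\sigma)/\sigma$ defined by this rescaling, Eq. (D.31) also holds for a general $\phi^{(1)}$. Using this relation on the right-hand side of Eq. (D.30),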

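$$\lim_{\lambda\to 0} \int_0^\tau dt \int_0^t ds\, c_{ij}(\lambda, t - s) = \delta_{ij} \lim_{\sigma\to 0} \int_0^\tau dt \int_0^t ds\, \frac{1}{\sigma}\, \phi^{(1)} \star \phi^{(1)}\!\left(\frac{t - s}{\sigma}\right). \tag{D.33}$$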

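This limit is easily evaluated with the change of variables $u \equiv (t - s)/\sigma$:

$$\delta_{ij} \lim_{\sigma\to 0} \int_0^\tau dt \int_0^{t/\sigma} du\, \phi^{(1)} \star \phi^{(1)}(u) = \delta_{ij}\, \tau \int_0^\infty du\, \phi^{(1)} \star \phi^{(1)}(u). \tag{D.34}$$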

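If $\phi^{(1)}$ is a normalized, real-valued distribution, then $\phi^{(1)} \star \phi^{(1)}(t) = \phi^{(1)} \star \phi^{(1)}(-t)$, and so

$$\int_0^\infty dt\, \phi^{(1)} \star \phi^{(1)}(t) = \frac{1}{2} \int_{-\infty}^{\infty} dt\, \phi^{(1)} \star \phi^{(1)}(t) = \frac{1}{2} \left( \int_{-\infty}^{\infty} dt\, \phi^{(1)}(t) \right)^2 = \frac{1}{2}. \tag{D.35}$$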

The change of variables in Eq. (D.34) is the delta-correlation limit, but because of the time-ordered integration we obtain a factor of $\frac{1}{2}$. In the general case, however, $\phi^{(1)}$ need not be real-valued. For our example involving quasi-monochromatic fields it is sufficient for $\phi$ to be a real-valued function, since it is simply a mathematical tool representing a limit on the rate of change of the arbitrary complex functions $f \in L^2(\mathbb{R}_+) \otimes \mathbb{C}^d$.
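For the Gaussian example, the factor $\frac{1}{2}$ of Eqs. (D.34) and (D.35) can be seen before the limit is taken. The autocorrelation of a unit-variance Gaussian is a Gaussian of variance 2, so the inner integral in Eq. (D.34) equals $\frac{1}{2}\operatorname{erf}(t/2\sigma)$; the sketch below, an illustration with the assumed choice $\tau = 1$, integrates it over $t$ and divides by $\tau$.

```python
import math

import numpy as np


def inner_integral(t, sigma):
    """int_0^{t/sigma} du (phi*phi)(u) for a unit-variance Gaussian phi; its
    autocorrelation is a Gaussian of variance 2, so this equals erf(t/2sigma)/2."""
    return 0.5 * math.erf(t / (2.0 * sigma))


def time_ordered_weight(sigma, tau=1.0, n=4001):
    """(1/tau) int_0^tau dt inner_integral(t, sigma) by the trapezoid rule: the
    left-hand side of Eq. (D.34) divided by tau, before the limit is taken."""
    ts = np.linspace(0.0, tau, n)
    ys = np.array([inner_integral(t, sigma) for t in ts])
    dt = ts[1] - ts[0]
    return dt * (ys.sum() - 0.5 * (ys[0] + ys[-1])) / tau


for sigma in (0.1, 0.01, 0.001):
    print(sigma, time_ordered_weight(sigma))   # tends to 1/2, not 1
```

The values approach $\frac{1}{2}$ roughly as $\frac{1}{2} - \sigma/(\tau\sqrt{\pi})$, rather than the value 1 that the full, unordered delta function would give.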

In the general Wong-Zakai limit, $c_{ij}(\lambda, t)$ is not assumed to be real; it is only required to satisfy the criteria of Eq. (D.5). From Eq. (D.5c) we have

$$\lim_{\lambda\to 0} \int_{-\infty}^{\infty} dt\, c_{ij}(\lambda, t) = \delta_{ij}, \tag{D.36}$$

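which means that we can define the potentially complex constants

$$\kappa_{ij} \equiv \lim_{\lambda\to 0} \int_0^\infty dt\, c_{ji}(\lambda, t), \qquad \kappa^*_{ij} \equiv \lim_{\lambda\to 0} \int_0^\infty dt\, c^*_{ji}(\lambda, t) = \lim_{\lambda\to 0} \int_{-\infty}^0 dt\, c_{ji}(\lambda, t), \tag{D.37}$$

where we used the fact that $c^*_{ji}(\lambda, t) = c_{ji}(\lambda, -t)$. Combining these two results gives

$$\delta_{ij} = \kappa_{ij} + \kappa^*_{ij}. \tag{D.38}$$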

For a real-valued smoother $\kappa_{ij} = \frac{1}{2}\delta_{ij}$, but the general case allows for a complex coefficient. In complex analysis there is the general relation

$$\int_0^\infty dt\, e^{-i\omega t} = \pi\,\delta(\omega) - i\,\mathrm{P.V.}\!\left[\frac{1}{\omega}\right], \tag{D.39}$$

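where P.V. denotes the Cauchy principal value. Expressing $\phi^{(1)} \star \phi^{(1)}(t)$ in the frequency domain gives

$$\begin{aligned}
\kappa_{ij} &= \delta_{ij} \int_0^\infty dt \int_{-\infty}^{\infty} d\omega\, \left|\phi^{(1)}(\omega)\right|^2 e^{-i\omega t} \\
&= \delta_{ij} \int_{-\infty}^{\infty} d\omega\, \left|\phi^{(1)}(\omega)\right|^2 \left( \pi\,\delta(\omega) - i\,\mathrm{P.V.}\frac{1}{\omega} \right) \\
&= \delta_{ij} \left( \pi \left|\phi^{(1)}(0)\right|^2 - i\,\mathrm{P.V.}\!\int_{-\infty}^{\infty} d\omega\, \frac{\left|\phi^{(1)}(\omega)\right|^2}{\omega} \right).
\end{aligned} \tag{D.40}$$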

The requirement of Eq. (D.38) implies that $\pi |\phi^{(1)}(0)|^2 = \frac{1}{2}$. The principal value vanishes for a power spectrum that is symmetric about $\omega = 0$ (e.g., for real $\phi^{(1)}$), but it is nonzero in general. The remaining imaginary term is what Gough refers to as a gauge freedom, since it depends on the details of $\phi^{(1)}$. Here we assume that $\phi^{(1)}$ is real-valued, so that $\kappa_{ij} = \kappa^*_{ij} = \frac{1}{2}\delta_{ij}$.
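A non-real smoother makes the gauge term visible. As an assumed scalar example, take $c(t) = A\, e^{-\gamma|t| - i\Omega t}$ with $A = (\gamma^2 + \Omega^2)/2\gamma$, which satisfies $c^*(t) = c(-t)$ and $\int c\, dt = 1$, and whose spectrum $S(\omega)$ in the convention of Eq. (D.40) is a Lorentzian centered at $\Omega$. Analytically $\kappa = \frac{1}{2} - i\Omega/2\gamma$: the real part obeys Eq. (D.38), while the imaginary part equals $-\mathrm{P.V.}\int d\omega\, S(\omega)/\omega$ for this spectrum and vanishes only when $\Omega = 0$. The sketch checks the half-line integral numerically.

```python
import numpy as np


def correlation(t, gamma, omega):
    """Hypothetical complex correlation c(t) = A exp(-gamma|t| - i omega t), with A
    fixed so that int c dt = 1 (cf. Eq. (D.36)); it obeys c(t)* = c(-t)."""
    A = (gamma**2 + omega**2) / (2 * gamma)
    return A * np.exp(-gamma * np.abs(t) - 1j * omega * t)


def half_line_integral(gamma, omega, t_max=40.0, n=400_001):
    """kappa = int_0^infinity c(t) dt, truncated at t_max, by the trapezoid rule."""
    t = np.linspace(0.0, t_max, n)
    y = correlation(t, gamma, omega)
    dt = t[1] - t[0]
    return dt * (y.sum() - 0.5 * (y[0] + y[-1]))


gamma, omega = 1.0, 0.5
kappa = half_line_integral(gamma, omega)
print(kappa)                     # close to 0.5 - 0.25j, i.e. 1/2 - i*omega/(2*gamma)
print(kappa + np.conj(kappa))    # close to 1, as required by Eq. (D.38)
```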

D.3 The Limiting Propagator

We have just considered one part of the second-order term in the time-ordered exponential and shown how normal ordering produces a nonzero correction. The full time-ordered exponential generates more complicated expressions, involving iterated integrals of the form

$$\int_0^t dt_n \cdots \int_0^{t_2} dt_1\, c_{ij}(\lambda, t_n - t_{n-3}) \cdots c_{kl}(\lambda, t_5 - t_1). \tag{D.41}$$

Depending on the relative ordering of the times, these integrals may or may not converge to zero as $\lambda \to 0$. It turns out that the only terms that remain nonzero are those whose contractions are time consecutive, meaning that each contraction $c_{ij}(\lambda, \tau_m)$ must be evaluated at $\tau_m = t_m - t_{m-1}$ [92]. In the example above, $t_n - t_{n-3}$ and $t_5 - t_1$ are not time-consecutive intervals, so Eq. (D.41) converges to zero. Once the class of nonzero integrals is identified, the expansion can be re-summed, which in general generates a Neumann series [43].
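The distinction between time-consecutive and non-consecutive contractions can be seen in a reduced toy example, not the full expansion of Eq. (D.41): three ordered times $0 < t_1 < t_2 < t_3 < T$, a single contraction, and an assumed real Gaussian $c_\sigma$ of width $\sigma$. Integrating out the uncontracted time reduces each triple integral to one integral over $\tau$. The consecutive case $c_\sigma(t_2 - t_1)$ gives $\int_0^T d\tau\, c_\sigma(\tau)(T - \tau)^2/2 \to T^2/4$, while the non-consecutive case $c_\sigma(t_3 - t_1)$ gives $\int_0^T d\tau\, c_\sigma(\tau)\,\tau\,(T - \tau)$, which vanishes as $\sigma \to 0$ because the intermediate time $t_2$ is squeezed into an interval of length $\tau \lesssim \sigma$.

```python
import numpy as np


def c_sigma(tau, sigma):
    """Assumed real Gaussian smoother of width sigma; its integral over tau >= 0 is 1/2."""
    return np.exp(-tau**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))


def trapezoid(y, x):
    """Trapezoid rule, written out to avoid depending on a particular NumPy version."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def contraction_integrals(sigma, T=1.0):
    """Integrals over 0 < t1 < t2 < t3 < T of one contraction, either time
    consecutive, c(t2 - t1), or non-consecutive, c(t3 - t1), each reduced
    to a single integral over tau by integrating out the remaining times."""
    tau = np.linspace(0.0, T, int(50 * T / sigma) + 1)   # grid resolves the width sigma
    c = c_sigma(tau, sigma)
    consecutive = trapezoid(c * (T - tau) ** 2 / 2, tau)
    non_consecutive = trapezoid(c * tau * (T - tau), tau)
    return consecutive, non_consecutive


for sigma in (0.1, 0.01, 0.001):
    print(sigma, contraction_integrals(sigma))   # consecutive -> 0.25, other -> 0
```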

Once the $\lambda \to 0$ limit is taken, the normally ordered propagator can be identified and its equivalent Ito-form QSDE written down. A general QSDE for the unitary propagator $U_t$ can be written as

$$dU_t = \left( G_{ij}\, d\Lambda^{ij}_t + G_{i0}\, dA^{i\dagger}_t + G_{0j}\, dA^j_t + G_{00}\, dt \right) U_t. \tag{D.42}$$

The coefficients $G_{\alpha\beta}$ are constrained to ensure unitarity. Appendix C.1 discusses these constraints at length and shows how they can be re-expressed in terms of the operators $S$, $L$, and $H$; Eq. (C.26) gives this conversion. We note here that $H$ is the part of the unitary that is completely uncoupled from the field operators, so we have the correspondence $H = E_{00}$.

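In terms of the system operators $E_{\alpha\beta}$ ($\alpha, \beta = 0, 1, \ldots, d$) defining $H_{\rm int}(\lambda, t)$ in Eq. (2.145), the general limiting coefficients are

$$G_{\alpha\beta} = -iE_{\alpha\beta} - E_{\alpha i} \left[ \left(1 + iKE\right)^{-1} \right]_{ij} \kappa_{jk}\, E_{k\beta}, \tag{D.43}$$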

where the indices $i$, $j$, $k$ run from 1 to $d$ (repeated indices are summed), and we have introduced the following notation. In the most general case we have a $d \times d$ matrix of constants $K = [\kappa_{ij}]$, as well as a matrix of operators $E = [E_{ij}]$. A Neumann series is the operator-valued generalization of a geometric series: for an operator $T$,

$$\sum_{n=0}^{\infty} T^n = (1 - T)^{-1}, \tag{D.44}$$

where the series converges whenever $\|T\| < 1$, and the right-hand side is well defined whenever $1 - T$ is invertible. The time-consecutive contractions ultimately generate a Neumann series in which $T$ is the matrix of operators $-iKE$, with components $(-iKE)_{ik} = -i\kappa_{ij} E_{jk}$. The limiting coefficient then involves the $(i, j)$ component of the operator/matrix inverse $(1 + iKE)^{-1}$.
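A minimal numerical check of Eq. (D.44) with $T = -iKE$, assuming finite-dimensional stand-ins: $E$ is a random Hermitian $d \times d$ matrix of numbers (in the text the $E_{ij}$ are system operators), rescaled so that $\|T\| < 1$, and $K = \frac{1}{2}\mathbb{1}$ is the real-smoother value of $\kappa_{ij}$.

```python
import numpy as np

rng = np.random.default_rng(seed=1)   # arbitrary seed
d = 3

# Hypothetical stand-ins: E is a random Hermitian d x d matrix of numbers and
# K = (1/2) * identity, the real-smoother value of kappa.
M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
E = 0.5 * (M + M.conj().T)
E *= 0.5 / np.linalg.norm(E, 2)       # spectral norm of E is now 1/2
K = 0.5 * np.eye(d)

T = -1j * K @ E                       # T = -iKE, so ||T|| = 1/4 < 1
partial = np.zeros((d, d), dtype=complex)
term = np.eye(d, dtype=complex)
for _ in range(60):                   # partial sums of sum_n T^n
    partial += term
    term = term @ T

exact = np.linalg.inv(np.eye(d) + 1j * K @ E)   # (1 - T)^{-1} = (1 + iKE)^{-1}
print(np.linalg.norm(partial - exact))          # at the level of rounding error
```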

References

[1] E. M. Rasel, M. K. Oberthaler, H. Batelaan, J. Schmiedmayer, and A. Zeilinger, Physical Review Letters 75, 2633 (1995).

[2] H. Häffner, W. Hänsel, C. F. Roos, J. Benhelm, D. Chek-al-kar, M. Chwalla, T. Körber, U. D. Rapol, M. Riebe, P. O. Schmidt, et al., Nature 438, 643 (2005).

[3] D. Leibfried, E. Knill, S. Seidelin, J. Britton, R. B. Blakestad, J. Chiaverini, D. B. Hume, W. M. Itano, J. D. Jost, C. Langer, et al., Nature 438, 639 (2005).

[4] W. M. Itano, J. C. Bergquist, J. J. Bollinger, J. M. Gilligan, D. J. Heinzen, F. L. Moore, M. G. Raizen, and D. J. Wineland, Physical Review A 47, 3554 (1993).

[5] I. H. Deutsch and P. S. Jessen, Optics Communications 283, 681 (2010).

[6] A. P. VanDevender, Y. Colombe, J. Amini, D. Leibfried, and D. J. Wineland,

[7] E. W. Streed, B. G. Norton, A. Jechow, T. J. Weinhold, and D. Kielpinski,

[8] A. Kuzmich, L. Mandel, and N. P. Bigelow, Physical Review Letters 85, 1594 (2000).

[9] J. Hald, J. L. Sørensen, C. Schori, and E. S. Polzik, Physical Review Letters 83, 1319 (1999).


[10] M. H. Schleier-Smith, I. D. Leroux, and V. Vuletić, Physical Review Letters 104, 073604 (2010).

[11] A. Silberfarb, P. S. Jessen, and I. H. Deutsch, Physical Review Letters 95, 030402 (2005).

[12] G. A. Smith, A. Silberfarb, I. H. Deutsch, and P. S. Jessen, Physical Review Letters 97, 180403 (2006).

[13] C. A. Riofrío, P. S. Jessen, and I. H. Deutsch, Journal of Physics B: Atomic, Molecular and Optical Physics 44, 154007 (2011).

[14] N. Wiener, Extrapolation, Interpolation, and Smoothing of Stationary Time Series: With Engineering Applications (The MIT Press, 1949).

[15] R. E. Kalman and R. S. Bucy, Journal of Basic Engineering 83, 95 (1961).

[16] R. L. Stratonovich, Theory of Probability & Its Applications 5, 156 (1960).

[17] H. J. Kushner, Journal of the Society for Industrial and Applied Mathematics Series A Control 2, 106 (1964).

[18] G. Kallianpur and C. Striebel, The Annals of Mathematical Statistics 39, 785 (1968).

[19] M. Zakai, Probability Theory and Related Fields 11, 230 (1969).

[20] R. Van Handel, Doctoral dissertation, California Institute of Technology, Pasadena, California (2006).

[21] H. Holden, B. Øksendal, J. Ubøe, and T. Zhang, Stochastic Partial Differential Equations: A Modeling, White Noise Functional Approach, Universitext (Springer New York, New York, NY, 2010).

[22] B. K. Oksendal, Stochastic Differential Equations: An Introduction with Applications (Springer, 2002), 5th ed.


[23] R. L. Hudson and K. R. Parthasarathy, Communications in Mathematical Physics 93, 301 (1984).

[24] V. P. Belavkin, Journal of Multivariate Analysis 42, 171 (1992).

[25] L. Bouten, R. van Handel, and M. R. James, SIAM Journal on Control and Optimization 46, 2199 (2007).

[26] J. E. Gough, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 370, 5241 (2012).

[27] V. P. Belavkin, Radiotekhnika i Elektronika 25, 1445–1453 (1980).

[28] V. Belavkin, in XXIV Karpacz Winter School on Theoretical Physics, edited by R. Gielerak and W. Karwowski (World Scientific, 1988), Stochastic methods in mathematics and physics, pp. 310–324.

[29] V. P. Belavkin, Communications in Mathematical Physics 146, 611 (1992).

[30] P. Grangier, J. A. Levenson, and J.-P. Poizat, Nature 396, 537 (1998).

[31] D. Brigo, B. Hanzon, and F. LeGland, IEEE Transactions on Automatic Control 43, 247 (1998).

[32] D. Brigo, B. Hanzon, and F. LeGland, Bernoulli 5, 495 (1999).

[33] R. van Handel and H. Mabuchi, Journal of Optics B: Quantum and Semiclassical Optics 7, S226 (2005).

[34] H. Mabuchi, Physical Review A 78, 015801 (2008).

[35] A. S. Hopkins, Thesis, California Institute of Technology (2009).

[36] A. E. B. Nielsen, A. S. Hopkins, and H. Mabuchi, New Journal of Physics 11, 105043 (2009).

[37] B. A. Chase, Ph.D. thesis, University of New Mexico (2009).


[38] C. Le Bris and P. Rouchon, arXiv:1207.4580 (2012).

[39] S. Massar and S. Popescu, Physical Review Letters 74, 1259 (1995).

[40] I. H. Deutsch, American Journal of Physics 59, 834 (1991).

[41] J. C. Garrison and R. Chiao, Quantum Optics (Oxford University Press, USA, 2008).

[42] B. J. Smith and M. G. Raymer, New Journal of Physics 9, 414 (2007).

[43] J. Gough, Journal of Mathematical Physics 47, 113509 (2006).

[44] C. Cohen-Tannoudji, J. Dupont-Roc, and G. Grynberg, Photons and Atoms: Introduction to Quantum Electrodynamics (Wiley-Interscience, 1989).

[45] A. Barchielli, in Open Quantum Systems III, edited by S. Attal, A. Joye, and C. Pillet (Springer, 2006), vol. 1882 of Lecture Notes in Mathematics, pp. 207–

[46] I. Bialynicki-Birula, Physical Review Letters 80, 5247 (1998).

[47] A. E. Siegman, Lasers (University Science Books, 1986).

[48] E. Hecht, Optics (Addison-Wesley, 2002).

[49] J.-C. Diels and W. Rudolph, Ultrashort Laser Pulse Phenomena (Academic Press, 2006).

[50] C. W. Gardiner and M. J. Collett, Physical Review A 31, 3761 (1985).

[51] C. W. Gardiner and P. Zoller, Quantum noise (Springer, 2004).

[52] K. R. Parthasarathy, An introduction to quantum stochastic calculus (Birkhäuser, 1992).


[53] R. van Handel, Stochastic calculus, filtering, and stochastic control (2007), URL http://www.princeton.edu/~rvan/.

[54] L. Accardi, A. Frigerio, and Y. G. Lu, Communications in Mathematical Physics 131, 537 (1990).

[55] L. Accardi, Y. G. Lu, and I. Volovich, Quantum Theory and Its Stochastic Limit (Springer-Verlag, 2002).

[56] R. van Handel, J. K. Stockton, and H. Mabuchi, Journal of Optics B: Quantum and Semiclassical Optics 7, S179 (2005).

[57] L. Bouten, J. Stockton, G. Sarma, and H. Mabuchi, Physical Review A (Atomic, Molecular, and Optical Physics) 75, 052111 (2007).

[58] Z. Schuss, Theory and Applications of Stochastic Processes: An Analytical Approach (Springer, 2009), 1st ed.

[59] T. Tao, An Introduction to Measure Theory (American Mathematical Society, 2011).

[60] D. Williams, Probability with Martingales (Cambridge University Press, 1991).

[61] R. van Handel, J. Stockton, and H. Mabuchi, IEEE Transactions on Automatic Control 50, 768 (2005).

[62] H. Maassen, in Quantum Information, Computation and Cryptography, edited by F. Benatti, M. Fannes, R. Floreanini, and D. Petritis (Springer, 2010), vol. 808 of Lecture Notes in Physics, pp. 65–108.

[63] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information (Cambridge University Press, 2000).

[64] L. Bouten and R. van Handel (2005), arXiv:math-ph/0508006.

[65] J. Gough and C. Kostler, Communications in Stochastic Analysis 4, 505 (2010).


[66] J. E. Gough, M. R. James, and H. I. Nurdin, Quantum Information Processing (2012).

[67] M. Tsang, Physical Review Letters 102, 250403 (2009).

[68] G. Kimura and A. Kossakowski, Open Systems and Information Dynamics 12, 207 (2005).

[69] P. E. Kloeden, E. Platen, and H. Schurz, Numerical Solution of SDE Through Computer Experiments (Springer, 1994).

[70] M. Kitagawa and M. Ueda, Physical Review A 47, 5138 (1993).

[71] A. Kuzmich, N. P. Bigelow, and L. Mandel, Europhysics Letters (EPL) 42, 481 (1998).

[72] X. Yin, X. Wang, J. Ma, and X. Wang, Journal of Physics B: Atomic, Molecular and Optical Physics 44, 015501 (2011).

[73] M. Varbanov and T. A. Brun, Physical Review A 76, 032104 (2007).

[74] E. Bagan, A. Monras, and R. Muñoz-Tapia, Physical Review A 71, 062318 (2005).

[75] S. T. Merkel, P. S. Jessen, and I. H. Deutsch, Physical Review A 78, 023404 (2008).

[76] R. Van Handel, Infinite Dimensional Analysis, Quantum Probability and Related Topics 12, 153 (2009).

[77] J. Hunter, Lecture notes on applied mathematics – methods and models (2009), URL http://www.math.ucdavis.edu/~hunter/.

[78] D. Nualart, The Malliavin Calculus and Related Topics (Springer, 1995), 1st ed.

[79] B. A. Chase and J. M. Geremia, Physical Review A 78, 052101 (2008).


[80] L. Bouten and A. Silberfarb, Communications in Mathematical Physics 283, 491 (2008).

[81] M. Tsang and C. M. Caves, Physical Review X 2, 031016 (2012).

[82] J. von Neumann, Mathematical Foundations of Quantum Mechanics (Princeton University Press, 1996).

[83] K. R. Davidson and S. J. Szarek, in Handbook of the Geometry of Banach Spaces, edited by W. Johnson and J. Lindenstrauss (Elsevier Science B.V., 2001), vol. 1, pp. 317–366.

[84] Y. Ogata, arXiv:1111.5933 (2011).

[85] A. H. Jazwinski, Stochastic Processes and Filtering Theory (Dover Publications, 2007).

[86] S. Julier and J. Uhlmann, Proceedings of the IEEE 92, 401 (2004).

[87] M. S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp, IEEE Transactions on Signal Processing 50, 174 (2002).

[88] S. Bonnabel, P. Martin, and P. Rouchon, IEEE Transactions on Automatic Control 53, 2514 (2008).

[89] B. A. Chase and J. M. Geremia, Physical Review A 79, 022314 (2009).

[90] E. Wong and M. Zakai, The Annals of Mathematical Statistics 36, 1560 (1965).

[91] G. C. Wick, Physical Review 80, 268 (1950).

[92] J. Gough, Communications in Mathematical Physics 254, 489 (2005).
