Analysis sensitivity calculation in an Ensemble Kalman Filter

Junjie Liu1, Eugenia Kalnay2, Takemasa Miyoshi2, and Carla Cardinali3

1University of California, Berkeley, CA, USA 2University of Maryland, College Park, MD, USA

3ECMWF, Data Assimilation Section, Reading, UK


Abstract

Analysis sensitivity indicates the sensitivity of an analysis to the observations,

which is complementary to the sensitivity of the analysis to the background. In this paper,

we discuss a method to calculate this quantity in an Ensemble Kalman Filter (EnKF). The

calculation procedure and the geometrical interpretation, which shows that the analysis

sensitivity is proportional to the analysis error and anti-correlated with the observation

error, are experimentally verified with the Lorenz 40-variable model. With the analysis

sensitivity, the cross-validation in its original formulation can be efficiently computed in

EnKFs, and this property can be used in observational quality control.

Idealized experiments based on a simplified-parameterization primitive equation

global model show that the information content (the trace of the analysis sensitivity of

any subset of observations) qualitatively agrees with the actual observation impact

calculated from much more expensive data denial experiments, not only for the same type

of dynamical variable, but also for different types of dynamical variables.


1. Introduction

Observations are the central information introduced into numerical weather

prediction systems through data assimilation, which is a process that combines

observations with background forecasts based on their error statistics. With the increase

of observation datasets assimilated in modern data assimilation systems, such as the

assimilation of Atmospheric InfraRed Sounder (AIRS) and Infrared Atmospheric Sounding Interferometer (IASI) observations, it is important to examine questions such as how much

information content a new observation dataset has, and what is the relative influence of

the background forecasts and the observations on the analyses.

Analysis sensitivity (also called self-sensitivity), a quantity introduced by

Cardinali et al. (2004), can address these questions. The larger the analysis sensitivity is,

the larger is the influence of the corresponding observation on the analysis, and the

smaller is the influence of the corresponding background forecast. Since the analysis

sensitivity is a function of the analysis error covariance, which is not explicitly calculated

in variational data assimilation schemes, Cardinali et al. (2004) proposed an approximate

method to calculate this quantity within a 4D-Var data assimilation framework. They

showed that the trace of the analysis sensitivity of a particular observation data type

represented the information content of that type of observations, and the relative

importance of different observation types determined by information content was in good

qualitative agreement with the observation impact from other studies. In addition,

information content has also been used for channel selection in multi-thousand-channel

satellite instruments, such as IASI (Rabier et al., 2002).


Though the approximate method to calculate the analysis sensitivity in 4D-Var is

computationally feasible, it introduces spurious values that are not within the theoretical

range (between 0 and 1). In contrast with variational data assimilation schemes,

Ensemble Kalman filters (EnKFs) (Evensen, 1994; Anderson, 2001; Bishop et al., 2001;

Houtekamer and Mitchell, 2001; Whitaker and Hamill, 2002; Ott et al., 2004; Hunt et al.,

2007) generate ensemble analyses that can be used to calculate the analysis error

covariance. Because of this, it would be more straightforward to calculate the analysis

sensitivity in an EnKF than in a variational data assimilation system. In addition, the

Cross Validation (CV) (Wahba, 1990) score can be calculated from the analysis

sensitivity, observation and analysis values in EnKFs without carrying out data denial

experiments (section 4), whereas CV cannot be computed in this way in variational data

assimilation systems because of the approximations made in calculating the analysis

sensitivity. In this paper, following Cardinali et al. (2004), we show how to calculate the

analysis sensitivity and the related diagnostics in an EnKF, and further study the

properties and possible applications of these diagnostics. This paper is organized as

follows: section 2 describes how to calculate the analysis sensitivity in an EnKF. In

section 3, with a geometrical interpretation method adapted from Desroziers et al. (2005),

we show that the analysis sensitivity is proportional to the analysis errors and anti-

correlated with the observation errors. With the Lorenz 40-variable model (Lorenz and

Emanuel, 1998), section 4 verifies the self-sensitivity calculation procedure, and shows

that the CV can be efficiently calculated from the self-sensitivity, observation and

analysis values, and it can be used in observational quality control. In addition, the

geometrical interpretation is experimentally tested in this section. In section 5, with a


primitive equation model and a perfect model experimental setup (no model error), we

examine the effectiveness of the trace of the self-sensitivity in assessing the observation

impact obtained from data denial experiments. Section 6 contains a summary and

conclusions.

2. Calculation of the analysis sensitivity in an EnKF

In this section, we first briefly derive the analysis sensitivity equation valid for all

data assimilation schemes (equations (1), (2), (3), equivalent to equations (3.1), (3.4), and

(3.5) in Cardinali et al., 2004) and then focus on how to calculate this quantity in an

EnKF.

In data assimilation, the analysis state $\mathbf{x}^a$ is a combination of the background (an $n$-dimensional vector $\mathbf{x}^b$) with the observations (a $p$-dimensional vector $\mathbf{y}^o$) based on a weighting matrix $\mathbf{K}$. It can be expressed as:

$$\mathbf{x}^a = \mathbf{K}\mathbf{y}^o + (\mathbf{I}_n - \mathbf{K}\mathbf{H})\mathbf{x}^b \qquad (1)$$

The $(n \times p)$ weighting matrix $\mathbf{K} = \mathbf{P}^b\mathbf{H}^T(\mathbf{H}\mathbf{P}^b\mathbf{H}^T + \mathbf{R})^{-1}$ is a function of the background error covariance $\mathbf{P}^b$ and the observation error covariance $\mathbf{R}$, and $\mathbf{H}(\cdot)$ is the linearized observation operator, which transforms a perturbation from model space to observation space. The sensitivity of the analysis vector $\mathbf{x}^a$ to the observation vector $\mathbf{y}^o$ is:

$$\mathbf{S}^o = \frac{\partial \mathbf{y}^a}{\partial \mathbf{y}^o} = \mathbf{K}^T\mathbf{H}^T = \mathbf{R}^{-1}\mathbf{H}\mathbf{P}^a\mathbf{H}^T, \qquad (2)$$

and the sensitivity with respect to the background is given by

$$\mathbf{S}^b = \frac{\partial \mathbf{y}^a}{\partial \mathbf{y}^b} = \mathbf{I}_p - \mathbf{K}^T\mathbf{H}^T = \mathbf{I}_p - \mathbf{S}^o \qquad (3)$$


where $\mathbf{y}^a = \mathbf{H}\mathbf{x}^a = \mathbf{H}\mathbf{K}\mathbf{y}^o + (\mathbf{I}_p - \mathbf{H}\mathbf{K})\mathbf{y}^b$ is the projection of the analysis vector $\mathbf{x}^a$ onto the observation space, and $\mathbf{y}^b = \mathbf{H}\mathbf{x}^b$ is the projection of the background onto the observation space. $\mathbf{P}^a = (\mathbf{I} - \mathbf{K}\mathbf{H})\mathbf{P}^b$ is the analysis error covariance. The matrix $\mathbf{S}^o$ is called the influence matrix, because the elements of this matrix reflect the sensitivity of the analysis state to the observations. The diagonal elements of $\mathbf{S}^o$ are the analysis self-sensitivities, and the off-diagonal elements are cross sensitivities. Similarly, $\partial\mathbf{y}^a/\partial\mathbf{y}^b$ reflects the sensitivity of the analysis state to the background. As shown in Cardinali et al. (2004), the sensitivity of the analysis to an observation and the sensitivity of the analysis to the corresponding background at that observation location are complementary (i.e., they add up to one), and the self-sensitivity is dimensionless with a theoretical value between 0 and 1 when the observation errors are not correlated ($\mathbf{R}$ is diagonal).

EnKFs generate ensemble analyses at every analysis cycle, and the analysis error covariance can be written as a product of the ensemble analysis perturbations, so in EnKFs the influence matrix $\mathbf{S}^o$ can be written as:

$$\mathbf{S}^o = \mathbf{R}^{-1}\mathbf{H}\mathbf{P}^a\mathbf{H}^T = \frac{1}{k-1}\mathbf{R}^{-1}(\mathbf{H}\mathbf{X}^a)(\mathbf{H}\mathbf{X}^a)^T \qquad (4)$$

where $\mathbf{H}\mathbf{X}^a$ is the analysis ensemble perturbation matrix in the observation space, whose $i$th column is

$$\mathbf{H}\mathbf{X}^a_i \equiv h(\mathbf{x}^a_i) - \frac{1}{k}\sum_{i=1}^{k} h(\mathbf{x}^a_i), \qquad (5)$$

$\mathbf{x}^a_i$ is the $i$th analysis ensemble member, $k$ is the total number of ensemble analyses, and $h(\cdot)$ is the observation operator, which can be linear or nonlinear. When the observation


operator is linear, the right hand side and the left hand side of Equation (5) are equal.

Otherwise, the left hand side is a linear approximation of the right hand side. When the

observation errors are not correlated, the self-sensitivity can be written as:

$$S^o_{jj} = \frac{\partial y^a_j}{\partial y^o_j} = \frac{1}{k-1}\,\frac{1}{\sigma_j^2}\sum_{i=1}^{k}(\mathbf{H}\mathbf{X}^a_i)_j\,(\mathbf{H}\mathbf{X}^a_i)_j \qquad (6)$$

and the cross sensitivity $S^o_{jl}$ (see details in the Appendix), which represents the change of the analysis $y^a_l$ with respect to a variation of the observation $y^o_j$, can be written as:

$$S^o_{jl} = \frac{\partial y^a_l}{\partial y^o_j} = \frac{1}{k-1}\,\frac{1}{\sigma_j^2}\sum_{i=1}^{k}\left[(\mathbf{H}\mathbf{X}^a_i)_j\,(\mathbf{H}\mathbf{X}^a_i)_l\right] \qquad (7)$$

where $\sigma_j^2$ is the $j$th observation error variance. Equation (5) indicates that the calculation of the self-sensitivity and the cross sensitivity in EnKFs only requires applying the observation operator to each analysis ensemble member and then computing scalar products according to equations (6) and (7). In section 4, this calculation procedure will be verified with the Lorenz 40-variable model. The calculation of the self-sensitivity based on equation (6) requires no approximations when the observation errors are not correlated, so that $S^o_{jj}$ satisfies the theoretical limits (between 0 and 1). In 4D-Var, by contrast, the calculation of the analysis error covariance is based on a truncated eigenvector expansion with the vectors obtained through the Lanczos algorithm, which can introduce spurious values larger than 1 (Cardinali et al., 2004).
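As an illustration of equations (4)-(7), the following minimal sketch (Python/NumPy; not part of the original formulation, and the ensemble values and error variances are hypothetical placeholders) computes the influence matrix and its diagonal from an analysis ensemble mapped into observation space:

```python
import numpy as np

def influence_matrix(h_xa_members, sigma2):
    """Influence matrix S^o (equation (4)) from k analysis ensemble members
    mapped into observation space, assuming a diagonal R.

    h_xa_members : (k, p) array, h(x_i^a) for each ensemble member i
    sigma2       : (p,) array of observation error variances sigma_j^2
    """
    k, _ = h_xa_members.shape
    # analysis ensemble perturbations in observation space (equation (5))
    hxa = h_xa_members - h_xa_members.mean(axis=0)
    # row j of S^o is scaled by 1/sigma_j^2 (equations (6) and (7))
    return (hxa.T @ hxa) / ((k - 1) * sigma2[:, None])

# hypothetical example: 40 members, 5 observations
rng = np.random.default_rng(0)
members = rng.normal(size=(40, 5))       # placeholder for h(x_i^a)
sigma2 = np.full(5, 0.2 ** 2)            # placeholder observation error variances
So = influence_matrix(members, sigma2)
self_sensitivities = np.diag(So)         # S^o_jj, equation (6)
cross_sensitivity_01 = So[0, 1]          # S^o_jl, equation (7)
```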

3. Geometric interpretation of self-sensitivity

Equation (6) shows that the self-sensitivity is proportional to the analysis error

variance and inversely proportional to the observation error variance. Since in most cases,


the analysis and the observation error statistics used in data assimilation do represent the

accuracy of the analyses and the observations respectively, equation (6) implicitly

indicates that the analysis sensitivity increases with the analysis errors and decreases with

the observation errors. In this section, we adapt the geometrical interpretation that

Desroziers et al. (2005) used to explain the relationship among the background error,

observation error and analysis error to show the relationship among the analysis

sensitivity, the observation error and the analysis error.

Following the same notation as Desroziers et al. (2005), we rewrite the equation $\mathbf{y}^a = \mathbf{H}\mathbf{K}\mathbf{y}^o + (\mathbf{I}_p - \mathbf{H}\mathbf{K})\mathbf{y}^b$ by subtracting $h(\mathbf{x}^t)$, the true state in the observation space, from both sides and obtain

$$\mathbf{y}^a - h(\mathbf{x}^t) = \mathbf{H}\mathbf{K}\left(\mathbf{y}^o - h(\mathbf{x}^t)\right) + (\mathbf{I}_p - \mathbf{H}\mathbf{K})\left(\mathbf{y}^b - h(\mathbf{x}^t)\right) \qquad (8)$$

We define $\delta\mathbf{y}^a = \mathbf{y}^a - h(\mathbf{x}^t)$, $\delta\mathbf{y}^b = \mathbf{y}^b - h(\mathbf{x}^t)$ and $\delta\mathbf{y}^o = \mathbf{y}^o - h(\mathbf{x}^t)$, which are the analysis and the background errors in the observation space, and the observation errors, respectively. Then equation (8) can be written as

$$\delta\mathbf{y}^a = \mathbf{H}\mathbf{K}\,\delta\mathbf{y}^o + (\mathbf{I}_p - \mathbf{H}\mathbf{K})\,\delta\mathbf{y}^b \qquad (9)$$

After eigenvalue decomposition, $\mathbf{H}\mathbf{K}$ can be written as $\mathbf{H}\mathbf{K} = \mathbf{V}\boldsymbol{\Lambda}\mathbf{V}^T$, where $\boldsymbol{\Lambda}$ is a diagonal matrix whose elements are the eigenvalues of $\mathbf{H}\mathbf{K}$, and the columns of $\mathbf{V}$ are the eigenvectors of the matrix $\mathbf{H}\mathbf{K}$. Following Desroziers et al. (2005), the projections of $\delta\mathbf{y}^a$ onto the eigenvectors $\mathbf{V}$ are given by $\mathbf{V}^T\delta\mathbf{y}^a = \mathbf{V}^T\mathbf{V}\boldsymbol{\Lambda}\mathbf{V}^T\delta\mathbf{y}^o + \mathbf{V}^T\mathbf{V}(\mathbf{I}_p - \boldsymbol{\Lambda})\mathbf{V}^T\delta\mathbf{y}^b$, which can be further written as:

$$\widetilde{\delta\mathbf{y}}^a = \boldsymbol{\Lambda}\,\widetilde{\delta\mathbf{y}}^o + (\mathbf{I}_p - \boldsymbol{\Lambda})\,\widetilde{\delta\mathbf{y}}^b \qquad (10)$$


Here $\widetilde{\delta\mathbf{y}}^a$, $\widetilde{\delta\mathbf{y}}^o$ and $\widetilde{\delta\mathbf{y}}^b$ are the projections of $\delta\mathbf{y}^a$, $\delta\mathbf{y}^o$ and $\delta\mathbf{y}^b$ on the eigenvector matrix $\mathbf{V}$, respectively. When these vectors are projected on a particular eigenvector $\mathbf{V}_i$ with corresponding eigenvalue $\lambda_i$, the above equation can be written as

$$\widetilde{\delta y}^a_i = \lambda_i\,\widetilde{\delta y}^o_i + (1 - \lambda_i)\,\widetilde{\delta y}^b_i \qquad (11)$$

Therefore, in the space of the eigenvector $\mathbf{V}_i$, the analysis sensitivity with respect to the observation is $\lambda_i$, and with respect to the background it is $(1 - \lambda_i)$. As shown in Cardinali et al. (2004), they are complementary when the observation error covariance matrix is diagonal.

All the elements of equation (11) except $\lambda_i$ are shown schematically in Figure 1. In the following, we show how to represent $\lambda_i$ as a function of the angle $\theta$, and how to interpret $\lambda_i$ in terms of the observation error $\widetilde{\delta y}^o_i$ and the analysis error $\widetilde{\delta y}^a_i$. As shown in Figure 1, the observation error ($\widetilde{\delta y}^o_i$) and the background error ($\widetilde{\delta y}^b_i$) in the eigenvector $\mathbf{V}_i$ space are perpendicular, because the background error and the observation error are assumed to be uncorrelated, which implies that the inner product between $\widetilde{\delta y}^o_i$ and $\widetilde{\delta y}^b_i$ is zero. The analysis error ($\widetilde{\delta y}^a_i$) is also perpendicular to the line connecting the observation and the background, reflecting that the analysis is the linear combination of the background and the observations closest to the truth (Desroziers et al., 2005). With these two relationships, the projection of $\widetilde{\delta y}^o_i$ on $\widetilde{\delta y}^a_i$ is

$$\widetilde{\delta y}^a_i \cdot \widetilde{\delta y}^o_i = \lambda_i\,\widetilde{\delta y}^o_i \cdot \widetilde{\delta y}^o_i \;\Longrightarrow\; \left|\widetilde{\delta y}^a_i\right|\left|\widetilde{\delta y}^o_i\right|\cos(90° - \theta) = \lambda_i\left|\widetilde{\delta y}^o_i\right|^2 \qquad (12)$$

Since $\sin(\theta) = |\widetilde{\delta y}^a_i| / |\widetilde{\delta y}^o_i|$ and $\lambda_i = |\widetilde{\delta y}^a_i|\,|\widetilde{\delta y}^o_i|\cos(90° - \theta) / |\widetilde{\delta y}^o_i|^2$, it follows from the above equation that $\lambda_i = \sin^2(\theta)$ and $(1 - \lambda_i) = \cos^2(\theta)$. This indicates that the analysis sensitivity with respect to the observation increases with the angle $\theta$ ($\theta$ is greater than 0° and smaller than 90°), and the analysis sensitivity to the background decreases with the angle $\theta$. This geometrical representation is consistent with equation (6), showing that the analysis sensitivity per observation increases with the analysis error and decreases with the observation error.

Figure 1 Geometrical representation of the vector elements in equation (11) (see text). The analysis sensitivity with respect to the observations is $\sin^2\theta$ (adapted from Desroziers et al., 2005).
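For a single scalar observation this geometry can be checked directly: with background error variance $\sigma_b^2$ and observation error variance $\sigma_o^2$, the self-sensitivity is $\lambda = \sigma_b^2/(\sigma_b^2 + \sigma_o^2)$ and the analysis error variance is $\sigma_a^2 = \sigma_b^2\sigma_o^2/(\sigma_b^2 + \sigma_o^2)$, so $\lambda = \sigma_a^2/\sigma_o^2 = \sin^2\theta$. A minimal numerical check of this scalar case (not part of the paper; the variance values are arbitrary):

```python
import numpy as np

sigma_b2, sigma_o2 = 1.5, 0.04            # arbitrary background / observation error variances
lam = sigma_b2 / (sigma_b2 + sigma_o2)    # scalar self-sensitivity HK (eigenvalue lambda)
sigma_a2 = sigma_b2 * sigma_o2 / (sigma_b2 + sigma_o2)   # scalar analysis error variance

sin2_theta = sigma_a2 / sigma_o2          # sin^2(theta), with sin(theta) = sigma_a / sigma_o
assert np.isclose(lam, sin2_theta)        # lambda = sin^2(theta), as derived in section 3
```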

4. Validation of the self-sensitivity calculation method in EnKFs and cross

validation experiments with the Lorenz 40-variable model

Cardinali et al. (2004) showed that the change in the analysis value $y^a_i$ that takes place with the deletion of the $i$th observation can be calculated from $S^o_{ii}$, $y^o_i$ and $y^a_i$ without calculating $y^{a(-i)}_i$ (the estimate of $y^a_i$ when the $i$th observation is deleted), and that the same is true for the calculation of the Cross Validation (CV) score (Wahba, 1990, theorem 4.2.1), which is defined as $\sum_{i=1}^{p}\left(y^o_i - y^{a(-i)}_i\right)^2$, where $p$ is the total number of observations. The CV score represents the statistical difference between the actual observation $y^o_i$ and the predicted observation $y^{a(-i)}_i$ based on all the “buddy” observations. Therefore, to verify the self-sensitivity calculation method (equation (6)), we follow equations (2.9) and (2.10) in Cardinali et al. (2004), comparing the change of the analysis value $y^a_i - y^{a(-i)}_i$ and the CV scores based on the actual data denial results of $y^{a(-i)}_i$ to those from the self-sensitivity:

$$y^a_i - y^{a(-i)}_i = \frac{S^o_{ii}}{1 - S^o_{ii}}\left(y^o_i - y^a_i\right) \qquad (13)$$

$$\sum_{i=1}^{p}\left(y^o_i - y^{a(-i)}_i\right)^2 = \sum_{i=1}^{p}\frac{\left(y^o_i - y^a_i\right)^2}{\left(1 - S^o_{ii}\right)^2} \qquad (14)$$

With a correct estimate of the self-sensitivity, the left-hand sides and the right-hand sides

in equations (13) and (14) should be equal. Deleting each observation in turn is a

computationally formidable task in a realistic NWP data assimilation system, so we use

the Lorenz 40-variable model for this purpose.
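The following minimal sketch (Python/NumPy, not from the paper; the arrays are hypothetical placeholders) shows how equations (13) and (14) would be evaluated once the self-sensitivities are available:

```python
import numpy as np

def leave_one_out_change(y_obs, y_ana, s_diag):
    """Change in the analysis value when observation i is deleted (equation (13)):
    y_i^a - y_i^{a(-i)} = S_ii / (1 - S_ii) * (y_i^o - y_i^a)."""
    return s_diag / (1.0 - s_diag) * (y_obs - y_ana)

def cv_score(y_obs, y_ana, s_diag):
    """Cross-validation score from the self-sensitivities (right-hand side of
    equation (14)), with no data denial experiments."""
    return np.sum((y_obs - y_ana) ** 2 / (1.0 - s_diag) ** 2)

# hypothetical values at p = 3 observation locations
y_obs  = np.array([1.02, -0.31, 0.45])
y_ana  = np.array([0.95, -0.28, 0.40])
s_diag = np.array([0.16, 0.20, 0.33])
print(leave_one_out_change(y_obs, y_ana, s_diag), cv_score(y_obs, y_ana, s_diag))
```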

4.1 Lorenz 40-variable model and experimental setup

The Lorenz 40-variable model is governed by the following equation:

$$\frac{dx_j}{dt} = (x_{j+1} - x_{j-2})\,x_{j-1} - x_j + F \qquad (15)$$


The variables $x_j$, $j = 1, \ldots, J$, represent “meteorological” variables on a “latitude circle”

with periodic boundary conditions. As in Lorenz and Emanuel (1998), J is chosen to be

40. The assimilation interval is 0.05. F is the external forcing, which is 8 for the nature

run, and 7.6 for the forecast, allowing for some model error in the system. Observations

are simulated by adding Gaussian random perturbations (with a standard deviation equal

to 0.2) to the nature run.
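For reference, a minimal sketch of model (15) with a fourth-order Runge-Kutta time step (the paper does not specify its time-stepping scheme, so RK4 and the initial condition below are assumptions):

```python
import numpy as np

def lorenz96_tendency(x, F):
    """dx_j/dt = (x_{j+1} - x_{j-2}) x_{j-1} - x_j + F, periodic in j (equation (15))."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, dt, F):
    """Advance the Lorenz 40-variable model by one time step dt."""
    k1 = lorenz96_tendency(x, F)
    k2 = lorenz96_tendency(x + 0.5 * dt * k1, F)
    k3 = lorenz96_tendency(x + 0.5 * dt * k2, F)
    k4 = lorenz96_tendency(x + dt * k3, F)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

# nature run with F = 8; forecasts in the assimilation would use F = 7.6 (model error)
x = 8.0 + 0.01 * np.random.default_rng(0).standard_normal(40)
for _ in range(1000):
    x = rk4_step(x, dt=0.05, F=8.0)
```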

The data assimilation scheme we use is the Local Ensemble Transform Kalman

Filter (LETKF), which is one type of EnKF especially efficient for parallel computing

(see Hunt et al. (2007) for a detailed description of this method). Since F has different

values in the nature run and in the forecast run during data assimilation, the multiplicative

covariance inflation method (Anderson and Anderson, 1999) has been applied to account

for model error in addition to sampling errors. The covariance inflation factor is fixed to

be 1.3 in this study, which means that the background error covariance Pb is multiplied

by 1.3 in each data assimilation cycle. In verifying the self-sensitivity calculation method,

we use 40 ensemble members, and observe every grid point. The analysis sensitivity $S^o_{ii}$ is calculated after each analysis cycle based on equation (6). After assimilating the full set of observations, each observation is left out in turn to obtain $y^{a(-i)}_i$ using the same

background forecasts. To experimentally examine the relationship among analysis

sensitivity per observation, observation coverage, and analysis accuracy, we carry out

several experiments with uniform observation coverage, which are 20, 25, 30, 35 and 40

observations. Analysis accuracy is measured by the Root Mean Square (RMS) error,

which is defined as the RMS difference between the analysis mean state and the state

from the nature run. In order to ensure statistical significance of the results, we run each


experiment for 1000 analysis cycles, and then calculate the time average over the last 500

analysis cycles.
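Multiplicative covariance inflation amounts to scaling the ensemble perturbations about the mean, since multiplying $\mathbf{P}^b$ by a factor $\rho$ is equivalent to multiplying the perturbations by $\sqrt{\rho}$; a minimal sketch (the array shapes are assumptions, not from the paper):

```python
import numpy as np

def inflate_ensemble(members, rho=1.3):
    """Multiplicative covariance inflation: scale the perturbations about the
    ensemble mean by sqrt(rho), so the sample covariance is multiplied by rho.

    members : (k, n) array of background ensemble members
    """
    mean = members.mean(axis=0)
    return mean + np.sqrt(rho) * (members - mean)
```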

4.2 Results

Comparison between $y^a_i - y^{a(-i)}_i$ and $\frac{S^o_{ii}}{1 - S^o_{ii}}\left(y^o_i - y^a_i\right)$ at a single analysis time

(top panel in Figure 2) shows that they have the same value at every grid point. The

differences (not shown) between the two quantities are on the order of $10^{-7}$. The CV score

based on deleting each observation in turn (plus signs in the bottom panel of Figure 2)

and that from the self-sensitivity (open circles in the bottom panel of Figure 2) essentially

also have the same value. These comparisons show that the self-sensitivity calculation method we propose (equation (6)) is valid in EnKFs, and that a complete cross-validation, which would be unfeasible with the standard approach, can be efficiently computed in

EnKFs based on the self-sensitivity. Further experiments with different observation

coverage support this conclusion (not shown).

The fact that $y^o_i - y^{a(-i)}_i$ can be calculated without carrying out data denial experiments can be used in observational quality control. In current numerical weather prediction, one way to do observational quality control is to compare the actual observation $y^o_i$ to the predicted observation $y^b_i$ from the numerical model forecast. When the difference between $y^o_i$ and $y^b_i$ is larger than a certain statistic, the observation is marked as a bad observation (Dee et al., 2001). In this method, we can replace $y^b_i$ with $y^{a(-i)}_i$, which can be calculated from equation (13). Since $y^{a(-i)}_i$ is based not only on the numerical forecast, but also on the “buddy” observations, it is more accurate than $y^b_i$, as shown in Figure 3. Since the error difference between $y^{a(-i)}_i$ and $y^b_i$ is small compared to the observation error (standard deviation 0.2), using $y^{a(-i)}_i$ instead of $y^b_i$ will not make much difference in observational quality control in the current case; it may only make a significant difference when $y^b_i$ is much worse. Note that since the calculation of $y^{a(-i)}_i$ requires the self-sensitivity, which is a function of the analysis ensemble perturbations (equation (6)), it can only be used to examine the observation quality after the data assimilation or in reanalysis mode.
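A sketch of such a quality-control check (Python/NumPy; the rejection threshold of four observation-error standard deviations is a hypothetical choice, not one prescribed here): since from equation (13) $y^o_i - y^{a(-i)}_i = (y^o_i - y^a_i)/(1 - S^o_{ii})$, the check needs only the analysis, the observations and the self-sensitivities.

```python
import numpy as np

def buddy_check(y_obs, y_ana, s_diag, sigma_o, n_sigma=4.0):
    """Flag observations whose departure from the leave-one-out prediction
    y^{a(-i)} is large: y^o_i - y^{a(-i)}_i = (y^o_i - y^a_i) / (1 - S_ii)."""
    departure = (y_obs - y_ana) / (1.0 - s_diag)
    return np.abs(departure) > n_sigma * sigma_o   # True marks a suspect observation
```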

Figure 4 shows that the self-sensitivity increases with the analysis RMS error,

which is consistent with the geometrical interpretation in section 3. The analysis error is

anti-correlated with the observation coverage, and so is the self-sensitivity (Figure 4).

The analysis sensitivity per observation becomes larger when the observation coverage

becomes sparser. The analysis sensitivity per observation is about 0.16 when all the grid

points are observed, which indicates that 16% of the information in the analysis comes

from the observation at each observation location. Since the analysis sensitivity with

respect to the background is complementary to the analysis sensitivity with respect to the

observation (section 3), 84% of the information in the analysis comes from the

background forecast. When only half of the grid points have observations, the analysis

self-sensitivity is about 0.32, which indicates that deletion of one observation in the dense

observation coverage will do less harm to the analysis system than deletion of one

observation in a sparse case. This is consistent with field experiments (e.g., Kelly et al.,

2007).


Figure 2 Top panel: $y^a_i - y^{a(-i)}_i$ (open circles) and $\frac{S^o_{ii}}{1 - S^o_{ii}}\,r_i$ (plus signs), where $r_i = y^o_i - y^a_i$, compared at one analysis time as a function of grid point. Bottom panel: comparison of the CV score calculated from $\sum_{i=1}^{p}\left(y^o_i - y^{a(-i)}_i\right)^2$ (open circles) and that from $\sum_{i=1}^{p}\left(y^o_i - y^a_i\right)^2/\left(1 - S^o_{ii}\right)^2$ (plus signs) in the first 60 analysis cycles.


Figure 3 Comparison of the time average of the absolute 6-hour forecast error ($|y^b_i - y^T_i|$, black line) and of the absolute error of the predicted observation ($|y^{a(-i)}_i - y^T_i|$, grey line). $y^T_i$ is the truth (from the nature run) projected onto the observation space.

Figure 4 Scatter plot of the time-averaged analysis sensitivity per observation (y-axis) and the analysis RMS error (x-axis) (from the bottom to the top, the points correspond to 40, 35, 30, 25 and 20 observations).


5. Consistency between information content and the observation impact obtained

from data denial experiments

Equation (13) indicates that the deletion of an observation with larger self-

sensitivity will result in a larger change in the analysis value than the deletion of

the observations with smaller self-sensitivities. Assuming that assimilating an observation

makes the analysis better, which is statistically true when the error statistics used in data

assimilation are correct, the deletion of the observation with larger self-sensitivity will

result in a worse analysis than the deletion of the other observations. Equation (13) is

only valid when only one observation is left out. However, in most Observing System

Experiments (OSEs) or Observing System Simulation Experiments (OSSEs), we need to

examine the impact of a subset of observations on the analysis, which is done through data denial

experiments. In this section, we will explore whether the trace of the self-sensitivities of a

subset of observations, which is referred to as information content, can qualitatively show

the actual observation impact from the data denial experiments.
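Computing the information content of any subset of observations is then trivial once the diagonal of $\mathbf{S}^o$ from equation (6) is available; a minimal sketch (the mask and values below are hypothetical, standing in for, e.g., the zonal wind observations at the dense locations):

```python
import numpy as np

def information_content(s_diag, subset_mask):
    """Trace of the influence matrix restricted to a subset of observations,
    i.e. the sum of their self-sensitivities."""
    return np.sum(s_diag[subset_mask])

# hypothetical example
s_diag = np.array([0.12, 0.35, 0.28, 0.05, 0.41])
dense_locations = np.array([False, True, True, False, True])
print(information_content(s_diag, dense_locations))   # 1.04
```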

5.1 Experimental setup

We use the Simplified Parameterizations primitivE Equation DYnamics

(SPEEDY, Molteni, 2003) model, which is a global atmospheric model with 96x48 grid

points in the horizontal and 7 vertical levels in a sigma coordinate. We follow a “perfect

model” OSSE setup (e.g., Lord et al., 1997), in which the simulated truth is generated

with the same atmospheric model as the one used in the data assimilation (identical twin

experiment). Observations are created by adding Gaussian random noise to the truth. The

observation error standard deviations assumed for winds and specific humidity are about

30% of their natural variability, as shown in Figure 5. The specific humidity is only


observed in the lowest five vertical levels, below 300 hPa. Since the temperature variability does not change much between vertical levels, we assume an observation error standard deviation of 0.8 K at all vertical levels. The error standard deviation for surface pressure is 1.0 hPa. In such an experimental setup, the observation error statistics are fairly accurate, and so are the analysis error statistics estimated in each analysis cycle. We carry out 1.5 months of data assimilation cycles using the LETKF data assimilation scheme. The results

shown in this section are averaged over the last month.

Figure 5 The observation error standard deviation for zonal wind (Unit: m/s, left panel), meridional wind (Unit: m/s, middle panel) and specific humidity (Unit: g/kg, right panel).

In the data denial experimental setup, the control experiment (called all-obs) has

full observation coverage (both the closed circles and the plus signs in Figure 6). In the

data denial experiments, the denied variable is only observed at the “rawinsonde”

locations (closed circles in Figure 6). For instance, in the no-u sensitivity experiment,

zonal wind observations are still assimilated at the rawinsonde locations but not at the

locations marked with plus signs. We compare the information content of the zonal wind

observations over the locations with plus signs with the actual impact of these

observations obtained as the analysis error difference between no-u and all-obs


experiment. Again, the information content is the sum of the self-sensitivities, which are calculated from equation (6) based on the ensemble analyses of the control run.

We also do a similar comparison for a no-q sensitivity experiment. Ideally, the area with

larger information content should correspond to the area with larger error differences

between the sensitivity experiment and the control run.

Figure 6 The full observation distribution (31% of the total grid points; closed dots: rawinsonde observation network; plus signs: dense observation network). Every observation is located at a model grid point.

5.2 Results

Figure 7 shows the zonal mean zonal wind analysis RMS error difference (contours) between no-u and all-obs and the information content (shaded) of the zonal wind observations. Here, the information content is the trace of the zonal wind self-sensitivities at the observation locations marked with plus signs in each latitude circle (Figure 6). Quantitatively, the analysis RMS error difference (contours) between the no-u and all-obs experiments is

largest over the tropics, and is smallest over the mid-latitude Northern Hemisphere (NH).

Qualitatively, the information content (shaded area) agrees with this RMS error

difference, showing the largest values over the tropics and the smallest values in the mid-

latitude NH. Interestingly, the zonal wind RMS error difference over the mid-latitude


Southern Hemisphere (SH) is small, even though the zonal wind observation coverage is

significantly different between no-u and all-obs over that region. This is because the

observations other than zonal wind, such as the meridional wind observations and the

mass related observations (such as temperature and surface pressure) are used by the

LETKF to update the zonal wind analysis. The information content basically reflects this

feature, also showing relatively small values over that region.

Figure 8 shows the time averaged zonal wind self-sensitivity (filled grid points) at

the observation locations with plus signs (Figure 6) and the zonal wind RMS error

difference between no-u and the control run at the sixth model level ($\sigma = 0.2$ in the sigma coordinate, around 200 hPa). It shows that the self-sensitivity is larger between 30ºS and 30ºN, and so is the RMS error difference. However, we should not over-interpret this result. For any two single points, the relative magnitude of the self-

sensitivity may not reflect the relative magnitude of the RMS error difference due to

sampling errors. Self-sensitivity can only qualitatively show the observation impact when

it is summed over some spatial domain. Since the sum of the self-sensitivity and the

sensitivity of the analysis with respect to the background at the same observation location

is equal to 1, Figure 8 indicates that more than 85% of the information in the analysis

comes from the background forecasts, which is consistent with Cardinali et al. (2004),

because the background forecast contains observation information from the previous

analysis cycles.


Figure 7 Zonal wind RMS error difference (contour, unit: m/s) between no-u and the control experiment, and the zonal wind observation information content (shaded) over the locations with plus signs (Figure 6).

Figure 8 Time-averaged zonal wind RMS error difference (contours; unit: m/s) between no-u and the control experiment at the sixth model level ($\sigma = 0.2$ in the sigma coordinate, around 200 hPa) and the self-sensitivity (filled grid points) of the zonal wind observations at the locations with plus signs (Figure 6) at the same level.


The high spatial and temporal variability of humidity and the nonlinear physical processes associated with it make observation impact studies for humidity very challenging (e.g., Dee and da Silva, 2003; Langland and Baker, 2004). In spite

of these challenges, the spatial distribution of the specific humidity information content is

consistent with the specific humidity RMS error difference between the no-q and all-obs

experiments (left panel in Figure 9). It is important to note that the information content of

the specific humidity observations qualitatively reflects not only the impact on the

humidity analysis field, but also the impact on the other dynamical variables, such as the

zonal wind (right panel in Figure 9). This originates from the multivariate assimilation

characteristics in the all-obs experiment, in which the specific humidity observations

linearly affect the wind analyses through the error covariance, and this effect is

maximized in the tropical upper troposphere (right panel in Figure 9). This large impact

of humidity observations on the wind analyses over the high tropical levels is consistent

with the multivariate assimilation of AIRS humidity retrievals in a low-resolution NCEP Global Forecast System (Liu, 2007; Liu et al., 2009).

The horizontal distribution of the specific humidity self-sensitivity (Figure 10) is

also qualitatively consistent with the analysis RMS error difference between no-q and the

control run for both the specific humidity and zonal wind analyses. Interestingly, the

information content and the self-sensitivity of the specific humidity observations (Figure

9 and Figure 10) are larger than those of the zonal wind observations (Figure 7 and

Figure 8). This indicates that the specific humidity analyses are more sensitive to the

humidity observations than the zonal wind analyses are to the zonal wind observations.

However, this does not mean that assimilating specific humidity observations would


reduce more error overall than assimilating zonal wind observations. This is because

specific humidity and zonal wind have different dynamical roles in the forecast system.

The large-scale interaction between zonal wind and the other dynamical variables during

the forecast may make zonal wind observations more important to reduce the overall

analysis RMS error. Therefore, information content may not be applicable to examine the

relative impact of the observations belonging to different dynamical variable types, for which the forecast sensitivity to observations may be a more appropriate measure (Langland and Baker, 2004; Liu and Kalnay, 2008; Cardinali, 2009). Nevertheless, the qualitative

consistency between the information content and the actual observation impact from data

denial experiments suggests that it can be used to examine the relative impact of observations of the same dynamical variable type without actually carrying out the much more

expensive data denial experiments.

Figure 9 RMS error difference (contours) between the no-q and all-obs experiments and the specific humidity observation information content (shaded). Left panel: specific humidity RMS error difference (unit: $10^{-1}$ g/kg); right panel: zonal wind RMS error difference (unit: m/s).


Figure 10 The time-averaged RMS error difference (contours) between the no-q and all-obs experiments and the self-sensitivity (filled grid points) of specific humidity observations at the locations with plus signs (Figure 6) at the fourth model level ($\sigma = 0.51$ in the sigma coordinate, around 500 hPa). The contours in the top panel are the specific humidity RMS error difference (unit: $10^{-1}$ g/kg), and the contours in the bottom panel are the zonal wind RMS error difference (unit: $10^{-1}$ m/s).


6. Conclusions and discussion

The influence matrix reflects the regression fit of the analysis to observations, and

the self-sensitivity (the diagonal elements of the influence matrix) gives a measure of the

analysis sensitivity to the observations. Information content, defined as the trace of the

self-sensitivities, reflects the amount of information that the analysis extracts from a

subset of observations during data assimilation. These measures show the analysis

sensitivity to observations, and can further show the relative impact of the same type of

observations on the performance of the analysis system when the statistics used in the

data assimilation reflect the true uncertainty of each factor.

In this paper, we showed how to calculate the self-sensitivity and the cross

sensitivity (the off-diagonal elements of the influence matrix) in an EnKF framework. Based on

the ensemble analyses generated in each analysis cycle, the calculation of the self-

sensitivity and cross-sensitivity in EnKFs only needs the application of the observation

operator to each analysis ensemble member (equations (6) and (7) in section 2). By

comparing cross-validation (CV) scores calculated from leaving out each observation in

turn with those based on self-sensitivity, we verified the self-sensitivity calculation and

the cross-validation method in EnKF with the Lorenz 40-variable model. Unlike the self-

sensitivity calculation in 4D-Var (Cardinali et al., 2004), the self-sensitivity calculated in

EnKFs satisfies the theoretical limits (between 0 and 1) when the observation errors are

not correlated, which makes it possible to calculate the predicted observation based on all

the “buddy” observations without carrying out data denial experiments. The predicted

observation calculated in this way is more accurate than the predicted observation from

the numerical model forecast, which makes it a better choice in observational quality


control when the background forecast is not very accurate. In agreement with the geometrical interpretation (Desroziers et al., 2005), we showed experimentally that the self-sensitivity is

proportional to the analysis errors and anti-correlated with the observation coverage when

the error statistics used in the data assimilation are fairly accurate.

In an idealized experimental setup, the comparison between information content

and the actual observation impact given by data denial experiments shows qualitative

agreement. This implies that the spatial distribution of information content can be utilized

in examining the relative importance of the observations belonging to the same

dynamical variable type without actually carrying out much more expensive data denial

experiments. Since the impact of different dynamical variable types is also related to

the dynamical role they play in the forecast system, information content may not reflect

the relative importance of the observations belonging to different dynamical variable

types during a forecast: the sensitivity of the forecasts to observations (Langland and

Baker, 2004; Liu and Kalnay, 2008; Cardinali, 2009) would be a more useful measure for

this purpose.

Acknowledgements

We are grateful to the Weather-Chaos group in Maryland for suggestions and

discussions, and to Prof. John Thuburn and to anonymous reviewers whose valuable

suggestions greatly improved the manuscript. This work has been supported by NASA grants

NNG06GB77G, NNX07AM97G, NNX08AD40G, and DOE grant DEFG0207ER64437.

Appendix: Cross sensitivity derivation


Equations (4) and (5) in section 2 show that the influence matrix $\mathbf{S}^o$ can be written as:

$$\mathbf{S}^o = \mathbf{R}^{-1}\mathbf{H}\mathbf{P}^a\mathbf{H}^T = \frac{1}{k-1}\mathbf{R}^{-1}(\mathbf{H}\mathbf{X}^a)(\mathbf{H}\mathbf{X}^a)^T \qquad (A1)$$

In the case of a two-observation ($j$th and $l$th) scenario, the influence matrix $\mathbf{S}^o$ is:

$$\mathbf{S}^o = \begin{pmatrix} S^o_{jj} & S^o_{jl} \\ S^o_{lj} & S^o_{ll} \end{pmatrix} = \frac{1}{k-1}\begin{pmatrix} \sigma_j^2 & 0 \\ 0 & \sigma_l^2 \end{pmatrix}^{-1}\begin{pmatrix} \sum_{i=1}^{k}\left[(\mathbf{H}\mathbf{X}^a_i)_j (\mathbf{H}\mathbf{X}^a_i)_j\right] & \sum_{i=1}^{k}\left[(\mathbf{H}\mathbf{X}^a_i)_j (\mathbf{H}\mathbf{X}^a_i)_l\right] \\ \sum_{i=1}^{k}\left[(\mathbf{H}\mathbf{X}^a_i)_l (\mathbf{H}\mathbf{X}^a_i)_j\right] & \sum_{i=1}^{k}\left[(\mathbf{H}\mathbf{X}^a_i)_l (\mathbf{H}\mathbf{X}^a_i)_l\right] \end{pmatrix} \qquad (A2)$$

Therefore, the cross sensitivity is:

$$S^o_{jl} = \frac{\partial y^a_l}{\partial y^o_j} = \frac{1}{k-1}\,\frac{1}{\sigma_j^2}\sum_{i=1}^{k}\left[(\mathbf{H}\mathbf{X}^a_i)_j (\mathbf{H}\mathbf{X}^a_i)_l\right] \qquad (A3)$$
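As a small numerical illustration of (A2) (not part of the original appendix; the perturbation values and error variances are arbitrary), note that the bracketed sums are symmetric in $j$ and $l$, so any asymmetry of $\mathbf{S}^o$ comes only from the differing observation error variances:

```python
import numpy as np

k = 4                                       # ensemble size (arbitrary)
hxa = np.array([[ 0.3, -0.1],               # (k, 2): perturbations (H X^a_i)_j, (H X^a_i)_l;
                [-0.2,  0.4],               # each column sums to zero (mean removed)
                [ 0.1, -0.3],
                [-0.2,  0.0]])
sigma2 = np.array([0.04, 0.09])             # different error variances for j and l

So = (hxa.T @ hxa) / ((k - 1) * sigma2[:, None])   # equation (A2)
# So[0, 1] = S^o_jl and So[1, 0] = S^o_lj differ only through sigma_j^2 vs sigma_l^2
```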

References

Anderson JL. 2001. An ensemble adjustment Kalman filter for data assimilation. Mon.

Wea. Rev., 129, 2884-2903.

Anderson JL and Anderson SL. 1999. A Monte Carlo implementation of the nonlinear

filtering problem to produce ensemble assimilations and forecasts. Mon. Wea. Rev.,

127, 2741–2758.

Bishop CH, Etherton B and Majumdar SJ. 2001. Adaptive sampling with the ensemble

transform Kalman filter. Part I: Theoretical aspects. Mon. Wea. Rev., 129, 420-436.

Cardinali C, Pezzulli S and Andersson E. 2004. Influence-matrix diagnostic of a data

assimilation system. Quart. J. Roy. Meteor. Soc., 130, 2767-2786.


Cardinali C. 2009. Monitoring the observation impact on the short-range forecast. Quart.

J. Roy. Meteor. Soc. 135, 239-250.

Dee DP and da Silva AM. 2003. The choice of variable for atmospheric moisture

analysis. Mon. Wea. Rev., 131, 155-171.

Dee DP, Rukhovets L, Todling R, da Silva AM and Larson JW. 2001. An adaptive buddy

check for observational quality control. Quart. J. Roy. Meteor. Soc. 127, 2451-2471.

Desroziers G, Berre L, Chapnik B, and Poli P. 2005. Diagnosis of observation, background

and analysis error statistics in observation space. Quart. J. Roy. Meteor. Soc., 131, 3385-

3396.

Evensen G. 1994. Sequential data assimilation with a nonlinear quasi-geostrophic model

using Monte Carlo methods to forecast error statistics. J. Geophys. Res., 99(C5), 10143-10162.

Houtekamer PL, and Mitchell HL. 2001. A sequential Ensemble Kalman Filter for

atmospheric data assimilation. Mon. Wea. Rev., 129, 123-137.

Hunt BR, Kostelich EJ, and Szunyogh I. 2007. Efficient Data Assimilation for

Spatiotemporal Chaos: a Local Ensemble Transform Kalman Filter. Physica D, 230,

112-126.

Kelly G, Thepaut J-N, Buizza R and Cardinali C. 2007. The value of targeted

observations part I : the value of observations taken over the oceans. ECMWF’s

status and plans. ECMWF Tech. Memo., 511, pp27.

Langland RH and Baker NL. 2004. Estimation of observation impact using the NRL

atmospheric variational data assimilation adjoint system. Tellus, 56a, 189-201.

Liu J. 2007. Applications of the LETKF to adaptive observations, analysis sensitivity,

observation impact, and assimilation of moisture. Ph.D. thesis, University of Maryland.


Liu J and Kalnay E. 2008. Estimating observation impact without adjoint model in an ensemble Kalman filter. Quart. J. Roy. Meteor. Soc., 134, 1327-1335.

Liu J, Li H, Kalnay E, Kostelich E. and Szunyogh I. 2009. Univariate and multivariate

assimilation of AIRS humidity retrievals with the Local Ensemble Transform Kalman

Filter. Mon. Wea. Rev. (in press).

Lord SJ, Kalnay E, Daley R, Emmit GD, and Atlas R. 1997. Using OSSEs in the design of

the future generation of integrated observing system. 1st Symposium on Integrated

Observation Systems, AMS

Lorenz EN, and Emanuel KA. 1998. Optimal sites for supplementary observations:

Simulation with a small model. J. Atmos. Sci., 55, 399-414.

Molteni F. 2003. Atmospheric simulations using a GCM with simplified physical

parametrizations. I: Model climatology and variability in multi-decadal experiments.

Climate Dyn., 20, 175-191.

Ott, E, Hunt BR, Szunyogh I, Zimin AV, Kostelich EJ, Corazza M, Kalnay E, Patil DJ,

and Yorke JA. 2004. A Local Ensemble Kalman Filter for Atmospheric Data

Assimilation. Tellus, 56A, 415-428

Rabier F, Fourrie N, Chafai D, and Prunet P. 2002. Channel selection method for Infrared

Atmospheric Sounding Interferometer radiances. Quart. J. Roy. Meteor. Soc. 128,

1011-1027.

Wahba G. 1990. Spline models for observational data. CBMS-NSF Regional Conference Series in Applied Mathematics, volume 59. Society for Industrial and Applied Mathematics.


Whitaker JS, and Hamill TM. 2002. Ensemble data assimilation without perturbed

observations. Mon. Wea. Rev. 130, 1913-1924.