36
Background error covariance with balance constraints for aerosol species and 1 applications in variational data assimilation 2 3 4 Zengliang Zang 1 , Zilong Hao 1 , Yi Li 1 , Xiaobin Pan 1 , Wei You 1 , Zhijin Li 2 and Dan Chen 3 5 6 Units: 1 College of Meteorology and Oceanography, PLA University of Science and Technology, 7 Nanjing 211101, China; 8 2 Joint Institute For Regional Earth System Science and Engineering, University of California, Los 9 Angeles, California90095, USA; 10 3 National Center for Atmospheric Research, Boulder, Colorado 80305, USA 11 12 13 14 15 16 17 18 19 20 June 8, 2016 21 22 23 24 25 26 27 Corresponding author: 28 Prof. Zengliang Zang 29 E-mail: [email protected] 30 Telephone: 86-025-80830400 31 Fax: 86-025-80830400 32 Address: No.60, Shuanglong Street, Nanjing 211101, China 33

1 Background error covariance with balance constraints for ... · 46 constraints, which is a consequence of the spreading of observation information across variables 47 due to the

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

  • Background error covariance with balance constraints for aerosol species and 1

    applications in variational data assimilation 2

    3

    4

    Zengliang Zang1, Zilong Hao

    1, Yi Li

    1, Xiaobin Pan

    1, Wei You

    1, Zhijin Li

    2 and Dan Chen

    3 5

    6

    Units: 1 College of Meteorology and Oceanography, PLA University of Science and Technology, 7

    Nanjing 211101, China; 8

    2Joint Institute For Regional Earth System Science and Engineering, University of California, Los 9

    Angeles, California90095, USA; 10

    3National Center for Atmospheric Research, Boulder, Colorado 80305, USA 11

    12

    13

    14

    15

    16

    17

    18

    19

    20

    June 8, 2016 21

    22

    23

    24

    25

    26

    27

    Corresponding author: 28

    Prof. Zengliang Zang 29

    E-mail: [email protected] 30

    Telephone: 86-025-80830400 31

    Fax: 86-025-80830400 32

    Address: No.60, Shuanglong Street, Nanjing 211101, China 33

  • Abstract 34

    Balance constraints are important for background error covariance (BEC) in data assimilation 35

    to spread information between different variables and produce balance analysis fields. Using 36

    statistical regression, we develop a balance constraint for the BEC of aerosol variables and apply it 37

    to a three-dimensional variational data assimilation system in the WRF/Chem model. One-month 38

    forecasts from the WRF/Chem model are employed for BEC statistics. The cross-correlations 39

    between the different species are generally high. The largest correlation occurs between elemental 40

    carbon and organic carbon with as large as 0.9. After using the balance constraints, the 41

    correlations between the unbalanced variables reduce to less than 0.2. A set of data assimilation 42

    and forecasting experiments is performed. In these experiments, surface PM2.5 concentrations and 43

    speciated concentrations along aircraft flight tracks are assimilated. The analysis increments with 44

    the balance constraints show spatial distributions more complex than those without the balance 45

    constraints, which is a consequence of the spreading of observation information across variables 46

    due to the balance constraints. The forecast skills with the balance constraints show substantial 47

    and durable improvements from the 2nd

    hour to the 16th

    hour compared with the forecast skills 48

    without the balance constraints. The results suggest that the developed balance constraints are 49

    important for the aerosol assimilation and forecasting. 50

    Keyword: aerosol species, WRF/Chem, data assimilation, balance constraint, background error 51

    covariance 52

    53

    54

    55

    56

    57

    58

    59

    60

  • 1. Introduction 61

    Aerosol data assimilation in chemical transport models has received an increasing amount of 62

    attention in recent years as a basic methodology for improving aerosol analysis and forecasting. In 63

    a data assimilation system, the background error covariance (BEC) plays a crucial role in the 64

    success of an assimilation process. The BEC and the observation error determine analysis 65

    increments from the assimilation process (Derber and Bouttier 1999, Chen et al., 2013). 66

    However, accurate estimation of the BEC remains difficult due to a lack of information about 67

    the true atmospheric states and also due to computational requirement arising from the large 68

    dimension of the BEC (typically ). For a variational data assimilation system, a few 69

    methods have been developed to estimate and simplify the expression of the BEC, such as the 70

    analysis of innovations, the NMC (National Meteorological Center) and the ensemble-based 71

    (Monte Carlo) methods. The NMC method is extensively used in operational atmospheric and 72

    meteorology-chemistry data assimilation systems. It assumes that the forecast errors are 73

    approximated by differences between pairs of forecasts that are valid at the same time (Parrish and 74

    Derber, 1992). Pagowski et al. (2010) estimated the BEC of PM2.5 (particles having an 75

    aerodynamic diameter less than 2.5 µm) by calculating the differences between the forecasts of 24 76

    and 48 h, and used the estimated BEC in a Grid-point Statistical Interpolation (GSI) system (Wu et 77

    al., 2002). Benedetti et al. (2007) estimated the BEC of the sum of the mixing ratios of all aerosol 78

    species for an operational analysis and forecast systems at ECMWF (The European Centre for 79

    Medium-Range Weather Forecasts). The BEC with multiple species and size bins of aerosols have 80

    been calculated and employed in data assimilation. Liu et al. (2011) estimated the BEC with 14 81

    aerosol species in the Goddard Chemistry Aerosol Radiation and Transport scheme of the Weather 82

    Research and Forecasting/Chemistry (WRF/Chem) model and applied it to the GSI system. 83

    Schwartz et al. (2012) increased the number of the species to 15 based on the study of Liu et al. 84

    (2011). Li et al. (2013) estimated the BEC for five species derived from the Model for Simulation 85

    Aerosol Interactions and Chemistry (MOSAIC) scheme. 86

    One important role that the BEC plays in meteorological data assimilation is to spread 87

    information between different variables to produce balanced analysis fields, which employ 88

    balance constraints to convert original variables into new independent variables. Balance 89

  • constraints have been employed in atmospheric and oceanic data assimilation, such as geostrophic 90

    balance or temperature-salinity balance (Bannister, 2008a, 2008b). To incorporate balance 91

    constraints, the model variables are usually transformed to balanced and unbalanced parts. The 92

    unbalanced parts as control variables are can be assumed independent, and the balanced parts are 93

    constrained by balance constraints (Derber and Bouttier, 1999). Instead of using an empirical 94

    function as a balance constraint, balance constraints are also derived using regression techniques 95

    (Ricci and Weaver, 2005). Although distinct empirical relations between some variables (such as 96

    temperature and humidity) may not exist, the regression equation can also be estimated as balance 97

    constraints (Chen et al., 2013). 98

    In current aerosol variational data assimilation with multiple variables, balance constraints are 99

    not yet incorporated in the BEC. The state variables are assumed to be independent variables 100

    without cross-correlation. However, the aerosol species are frequently highly correlated due to 101

    their common emission sources and diffusion processes. For example, the correlations in terms of 102

    the R-square between elemental carbon and black carbon exceed 0.6 in many locations across Asia 103

    and the South Pacific in both urban and suburban locations (Salako et al., 2012), and the 104

    correlations between different size bins, such as PM2.5 and PM10-2.5 (the diameter of particles being 105

    between 2.5 and 10 µm), are also generally significant (Sun et al., 2003; Geller et al., 2004). Thus, 106

    the cross-correlations between different species or size bins are necessary to produce balanced 107

    analysis fields. Cross-correlations spread the observation information from one variable to other 108

    variables, which can produce more balanced initial fields. For the data assimilation of the 109

    ensemble Kalman filter method, the BEC with balance constraints is assured (Pagowski et al., 110

    2012; Schwartz et al., 2014), although the balance may break down because of localization. 111

    Recently, several studies have suggested that the BEC with balanced cross-correlation should 112

    be introduced into aerosol variational data assimilation (Kahnert, 2008; Liu et al., 2011; Li et al., 113

    2013; Saide et al., 2013). Kahnert (2008) exhibited cross-correlations of the seventeen aerosol 114

    variables from Multiple-scale Atmospheric Transport and Chemistry (MATCH) Model. He found 115

    that the statistical cross-correlations between aerosol components are primarily influenced by the 116

    interrelations between emissions and by interrelations due to chemical reactions to a much lesser 117

    degree. Saide et al., (2012; 2013) incorporated the capacity to add cross-correlations between 118

  • aerosol size bins in GSI for assimilating observations of aerosol optical depth (AOD) data. The 119

    cross-correlations between the two connecting size bins for each species were considered using 120

    recursive filters while, the cross-correlation is not considered for the other size bins that are not 121

    connecting. 122

    In this paper, we explore incorporating cross-correlations between different species in BEC 123

    using balance constraints. The balance constraints are established using statistical regression. We 124

    apply the BEC with the balance constraints to a data assimilation and forecasting system with the 125

    MOSAIC scheme in WRF/Chem. The MOSAIC scheme includes a large number of variables with 126

    eight species, and flexibility of eight or four size bins. The scheme of four size bins is used in our 127

    studies. The four bins are located between 0.039–0.1 μm, 0.1–1.0 μm, 1.0–2.5 μm, and 2.5–10 μm, 128

    and the total mass of the first three bins are PM2.5. A 3DVAR system for the MOSAIC (4-bin) 129

    scheme has been developed by Li et al. (2013). For comparisons, we employ this 3DVAR system 130

    with the same model configurations as employed by Li et al. (2013). The data assimilation and 131

    forecasting experiments are performed with a focus on assessing the impact of cross-correlations 132

    of the BEC on analyses and forecasts. 133

    The paper is organized as follows: Section 2 describes the 3DVAR system and the formulation 134

    of the BEC. Section 3 describes the WRF/Chem configuration and estimates the correlations 135

    among the emissions. The statistical characteristics of the BEC, including the regression 136

    coefficient of the cross-correlation, are discussed in Section 4. Using the BEC, experiments of 137

    assimilating surface PM2.5 observations and aircraft observations are discussed in Section 5. 138

    Shortcomings, conclusions and future perspectives are presented in Section 6. 139

    2. Data assimilation system and BEC 140

    In this section, we present a formulation of the BEC with cross-correlation between different 141

    species using a regression technique. Then, the cost function with the new BEC is derived and the 142

    calculating factorization of the BEC is described. 143

    The control variables of the data assimilation are obtained from the MOSAIC (4-bin) aerosol 144

    scheme in the WRF/Chem model (Zaveri et al., 2008). The MOSAIC scheme includes eight 145

    aerosol species, that is, elemental carbon or black carbon (EC/BC), organic carbon (OC), nitrate 146

    (NO3), sulfate (SO4), chloride (Cl), sodium (Na), ammonium (NH4), and other inorganic mass 147

  • (OIN). Each species is separated into four bins with different sizes: 0.039–0.1 μm, 0.1–1.0 μm, 148

    1.0–2.5 μm and 2.5–10 μm. The scheme involves 32 aerosol variables with eight species and four 149

    size bins. These variables cannot be directly introduced as control variables in an assimilation 150

    system in consideration of computational efficiency. The number of variables must be decreased 151

    prior to assimilation. Li et al. (2013) have lumped these variables into five species as control 152

    variables in the 3DVAR system. The five species consist of EC, OC, NO3, SO4 and OTR. Here, 153

    OTR is the sum of Cl, Na, NH4 and OIN. Note that the data assimilation system aims to assimilate 154

    the observation of PM2.5; only the first three of four size bins are utilized to lump as one control 155

    variable for each species. 156

    For a 3DVAR system, the cost function ( ), which measures the distance of the state vector to the 157

    background and observations, can be written as follows: 158

    . (1) 159

    Here, is the vector of the state variables, including EC, OC, NO3, SO4 and OTR; is the 160

    background vector of these five species, which are generated by the MOSAIC scheme; is the 161

    observation vector; is the observation operator that maps the model space to the observation 162

    space and is assumed to be linear here; is the observation error covariance associated with ; 163

    and is the background error covariance associated with . Eq. (1) is usually written in the 164

    incremental form 165

    , (2) 166

    where ) is the incremental state variable. The observation innovation vector is 167

    known as . The minimization solution is the analysis increment , and the final 168

    analysis is . This analysis is statistically optimal as a minimum error variance 169

    estimate (e.g., Jazwinski, 1970; Cohn, 1997). 170

    In Eq. (1) or Eq. (2), is a , where is the number of model grid points, 171

    and is the number of state variables. is a symmetric matrix with a dimension of . 172

    For a high-resolution model, the number of vector is on the order of . Therefore, the 173

    number of elements in B is approximately . With this dimension, B cannot be explicitly 174

    manipulated. To pursue simplifications of B, we employ the following factorization 175

  • , (3) 176

    where and are the standard deviation matrix and the correlation matrix, respectively. 177

    and can be described and separately prescribed after the factorization. is a diagonal matrix 178

    whose elements include the standard deviation of all state variables in the three-dimensional grids. 179

    To reduce the computational cost, we use the average value of standard deviations that are at the 180

    same level. Thus, the standard deviation is simplified with vertical levels. is a symmetric 181

    matrix, having the form 182

    , (4) 183

    where , , , and at diagonal locations are the background error 184

    auto-correlation matrices that are associated with each species. They represent the correlation 185

    among pairs of grid points for one species. Other submatrices represent the correlations between 186

    different species, known as cross-correlations. For example, represents the cross-correlations 187

    between EC and OC, and

    . In Li et al. (2013), these cross-correlations were 188

    disregarded, that is, the five species are considered independently and is thus a block diagonal 189

    matrix. 190

    In this study, the cross-correlations between different species are considered by introducing 191

    control variable transforms (Derber and Bouttier, 1999; Barker, 2004; Huang, 2009). We divide 192

    the model aerosol variables into balanced components ( ) and unbalanced components ( ): 193

    . (5) 194

    Note the EC does not need to be divided. There is not unbalanced component for EC that is 195

    similar to the variable of vorticity in the data assimilation of ECMWF (Derber and Bouttier, 1999), 196

    or the variable of stream function in the data assimilation of MM5 (Barker, 2004). The 197

    transformation from unbalanced variables ( ) to full variables ( ) by the balance operator 198

    is given by 199

    . (6) 200

  • Eq. (6) can be written as 201

    (7) 202

    where is the submatrix of , which represents the statistical regression coefficients between 203

    the variables and (Chen et al., 2013). Note that is a diagonal matrix with the dimension of 204

    model grid points. Each model grid point has a regression coefficient. For convenience, we 205

    assumed that the elements of is a constant value for all grid points, which are denoted as 206

    and are calculated by linear regression with all grid points. For example, can be obtained 207

    from the regression equation of OC and as 208

    , (8) 209

    where is the residual. and can be estimated from the forecast differences of 24 h 210

    forecasts and 48 h forecasts, similar to the statistics of the BEC. Eq. (8) contains the slope but no 211

    intercept. The intercept is nearly zero because and represent forecast differences that 212

    can be considered to be zero mean values. After obtaining , the balanced part (e.g., the value 213

    of the regression prediction) of OC can be obtained by 214

    . (9) 215

    Where represents the predicted value of Eq. (8), which is equal to the balanced part ( ). 216

    Remove the from the full variables to obtain the unbalanced part ( ), that is, in Eq. 217

    (8). Thus, the calculation of can be written as 218

    . (10) 219

    Here, and EC are employed as predictors in the next regression equation to obtain 220

    . Then, we can obtain the unbalanced parts of the remaining variables, which are defined as 221

    follows: 222

    (11) 223

    (12) 224

    (13) 225

  • The coefficient of determination ( ) can be employed to measure the fit of these regressions. It 226

    can be expressed as 227

    , (14) 228

    where SSR and SST are the regression sum of squares and the sum of squares for total, 229

    respectively. 230

    These unbalanced parts can be considered to be independent because they are residual and 231

    random. denotes the unbalanced variables of the BEC and can be factorized as 232

    , (15) 233

    where and are the standard deviation matrix and the correlation matrix, respectively. 234

    should be a block diagonal without cross-correlations as follows: 235

    . (16) 236

    According the definition of the BEC, 237

    . (17) 238

    And can be written as 239

    . (18) 240

    Using Eq. (6), Eq. (17) and Eq. (18), the relationship between and is 241

    . (19) 242

    and

    are defined as the square root of and the square root of , respectively. Their 243

    transformation is 244

    . (20) 245

    Using Eq. (15), Eq. (20) can be written as follows: 246

    . (21) 247

    Generally, a transformed cost function of Eq. (2) is expressed as a function of a preconditioned 248

    state variable: 249

  • . (22) 250

    Here,

    . Using Eq. (21), Eq. (22) can be written as 251

    . (23) 252

    Eq. (23) is the last form of the cost function with the cross-correlation of . 253

    According to Li et al. (2013), the correlation matrix of the unbalanced parts ( ) is factorized as 254

    (24) 255

    Here, denotes the Kronecker product, and , and represent the correlation 256

    matrices between gridpoints in the direction, the direction, and the direction, respectively, 257

    with the sizes , , and , respectively. Here, , and represent the 258

    numbers of grid points in the direction, direction, and direction, respectively. This 259

    factorization can decrease the size of the dimension of . Another desirable property of Eq. (24) 260

    is 261

    (25) 262

    and are expressed by Gaussian functions, and is directly computed from the proxy 263

    data. They will be discussed in Sec 4.2. 264

    3. WRF/Chem configuration and cross-correlations of emission species 265

    In this section, we describe the configuration of WRF/Chem, whose forecasting products will 266

    be employed in the following BEC statistics and data assimilation experiments. In addition, the 267

    cross-correlations of emission species from the WRF/Chem emission data are investigated to 268

    understand the cross-correlation between different species of the BEC. 269

    3.1 WRF/Chem configuration 270

    WRF/Chem (V3.5.1) is employed in our study. This is a fully coupled online model with a 271

    regional meteorological model that is coupled to aerosol and chemistry models (Grell et al., 2005). 272

    The model domain with three spatial domains is shown in Figure 1. The horizontal grid spacing 273

    for these three domains are 36 km (80×60 points), 12 km (97×97 points), and 4 km (144×96 274

    points), respectively. The outer domain spans southern California and the innermost domain 275

  • encompasses Los Angeles. All domains have 31 vertical levels with the top at 50 hPa. The vertical 276

    grid is stretched to place the highest resolution in the lower troposphere. The discussion of the 277

    BEC and the emissions presented in this paper will be confined to the innermost domain. The 278

    initial meteorology conditions for WRF/Chem are prepared using the North American Regional 279

    Reanalysis (NARR) (Mesinger et al. 2006). The meteorology boundary conditions and sea surface 280

    temperatures are updated at each initialization. For the forecast running, the initial meteorological 281

    conditions are obtained from the NARR data. The initial aerosol conditions are obtained from the 282

    former forecast. The emissions are derived from the National Emission Inventory 2005 (NEI’05) 283

    for both aerosols and trace gases (Guenther et al., 2006). For more details, the readers are referred 284

    to Li et al. (2013). 285

    286

    Figure 1. Geographical display of the three-nested model domains. The innermost domain covers 287

    the Los Angeles basin; the black point denotes the location of Los Angeles. 288

    3.2 Cross-correlations of emission species 289

    The emission source is necessary for running the WRF/Chem model. It is an important factor 290

    for the distribution of the aerosol forecasts. The analysis of the correlations among the emission 291

    species can help us to understand the BEC statistics. The emission species is derived from the 292

    emission file that is produced by the NEI’05 data for each model domain. Only the emission data 293

    for the innermost domain is used to calculate the correlation among the emission species. The 294

    emission file contains 37 variables, including gas species and aerosol species. An aerosol species 295

    also comprises a nuclei mode and accumulation model species (Peckam et al., 2013). From these 296

  • aerosol emission species, five lumped aerosol species are calculated, which is consistent with the 297

    variables in the data assimilation. These five lumped species are E_EC (sum of the nuclei mode 298

    and the accumulation mode of elemental carbon PM2.5), E_ORG (sum of the nuclei mode and the 299

    accumulation mode of organic PM2.5), E_NO3 (sum of the nuclei mode and the accumulation 300

    mode of nitrate PM2.5), E_SO4 (sum of the nuclei mode and the accumulation mode of sulfate 301

    PM2.5), and E_PM25 (sum of the nuclei mode and the accumulation mode of unspeciated primary 302

    PM2.5). 303

    Figure 2 shows the cross-correlations of the five lumped aerosol emission species. All 304

    cross-correlations exceed 0.5. This result reveals that the emission species are correlated, which 305

    may be attributed to the common emission sources and diffusion processes that are controlled by 306

    the same atmospheric circulation. The most significant cross-correlation is between E_EC and 307

    E_ORG with a value of approximately 0.8. This high correlation demonstrates that the emission 308

    distributions of these two species are very similar. Their emissions are primary in urban and 309

    suburban areas with small emissions in rural areas and along roadways (not shown). As shown in 310

    Fig. 2, the lowest cross-correlation is between E_ORG and E_SO4; the latter emissions are 311

    primary in the urban and suburban areas with few emissions in rural areas and roadways (not 312

    shown). 313

    314

    Figure 2. Cross-correlations between emission species of E_EC, E_ORG, E_NO3, E_SO4 and 315

    E_PM25. The emission species data are derived from the NEI’05 emissions set for the innermost 316

    domain of the WRF/Chem model 317

    318

    4 Balance constraints and BEC statistics 319

    E_EC E_ORG E_NO3 E_SO4 E_PM25

    E_EC

    E_ORG

    E_NO3

    E_SO4

    E_PM25

    0.55

    0.6

    0.65

    0.7

    0.75

    0.8

    0.85

    0.9

    0.95

    1

  • With the configuration of the WRF/Chem model described in Section 3.1, forecasts for one 320

    month (from 00UTC of May 15 to 00UTC of June 14, 2010) were performed for the balance 321

    constraints and the BEC statistics. Forecast differences between 24 h forecasts and 48 h forecasts 322

    are available at 00UTC. Thirty forecast differences are employed as inputs in the NMC method. 323

    For this method, 30 forecast differences are sufficient; however, a longer time series may be more 324

    beneficial for the BEC statistics (Parrish and Derber, 1992). 325

    4.1 Balance regression statistics 326

    Using the 30 forecast differences between 24 h and 48 h forecasts, we can obtain , , 327

    and . The size of these variables is , where is the number of 328

    model grid points. We put these data into Eqs. (6-13) to calculate the regression coefficients of 329

    and the unbalanced parts of the variables. Note the process of calculation should be step by step, 330

    since the latter equation will use the unbalanced parts of former equations. Table 1 shows the 331

    regression coefficients whose column and row are consistent with in Eq. (7). The last column 332

    in Tab. 1 is the coefficient of determination ( ) of the regression equations. For the regression 333

    equation of OC, the regression coefficient is 0.90 and the coefficient of determination of Eq. (7) is 334

    0.86, which indicates that EC and OC are highly correlated and their mass concentration scales are 335

    approximate. Their correlation is similar to the correlation of the stream function and velocity 336

    potential; thus, we set them as the first and second variables in the regression statistics. For the 337

    regression equation of , the regression coefficients of EC and are 4.01 and 3.76, 338

    respectively, because the mass concentration scale of exceeds the mass concentration scales 339

    of EC and . The coefficient of determination is only 0.32, which indicates that the 340

    correlations between and EC and between and are weak. This result reveals that 341

    the forecast errors of differ from the forecast errors of EC and . A possible reason is 342

    that is the secondary particle that is primarily derived from the transformation of , but 343

    EC and are derived from direct emissions. Similar to , is also primarily derived 344

    from the transformation of and the coefficient of determination for is also low. For the 345

    last variable OTR, the maximum coefficient of determination is 0.96 because OTR includes some 346

    different compositions that are correlated with the first four variables. 347

  • Table 1 Regression coefficients of balance operator and the coefficient of determination 348

    (regression coefficients correspond to in Eq. (7)) 349

    species regression coefficient ( ) coefficient of determination ( )

    EC 1 /

    OC 0.90 1 0.86

    NO3 4.01 3.76 1 0.32

    SO4 1.35 -0.21 -3.15 1 0.48

    OTR 2.93 2.35 0.28 0.60 1 0.96

    350

    Figure 3 shows the cross-correlations of the five full variables and the unbalanced variables. In 351

    Fig. 3a, the cross-correlations of the full variables exceed 0.3 and most of them exceed 0.5. In Fig. 352

    3b, however, the cross-correlations of the unbalanced variables are less than 0.2. Some of the 353

    cross-correlations are close to zero, which indicates that these unbalanced variables are 354

    approximatively independent and can be employed as control variables in the data assimilation 355

    system. 356

    (a) full variables (b) unbalanced variables

    Figure 3. Cross-correlations between the five variables of the BEC. These variables are (a) full 357

    variables and (b) unbalanced variables of EC, OC, , and OTR. 358

    359

    4.2 BEC statistics 360

    Using the original full variables and the unbalanced variables obtained by the regression 361

    equations, the BEC statistics are obtained. Figure 4 shows the vertical profiles of the standard 362

  • deviations of the original and the unbalanced . In Fig. 4a, the original standard deviation of 363

    is the largest value, whereas the smallest value is OC, whose profile is close to the profile of 364

    EC. All profiles show a significant decrease at approximately 800 m because the aerosol 365

    particulates are usually limited under the boundary level. In Fig. 4b, all standard deviations 366

    decrease in different degree, with the exception of EC, which remains as the control variable in the 367

    unbalanced BEC statistics. Note that the standard deviation of decreases by approximately 368

    80% compared with , which decreases by approximately 10%. This result is attributed to the 369

    small coefficient of determination for the regression of (in Tab. 1), which indicates that a 370

    small portion of can be predicted by the regression and a large portion is an unbalanced 371

    component. In contrast with , a small portion of OTR is the unbalanced component. 372

    (a) full variables (b) unbalanced variables

    Figure 4. Vertical profiles of the standard deviation of the variables. (a) full variables and (b) 373

    unbalanced variables 374

    375

    For the correlation matrix of and , they are factorized as three independent 376

    one-dimensional correlation matrices in Eq. (24). The horizontal correlation or is 377

    approximately expressed by a Gaussian function. The correlation between two points and 378

    can be written as

    , where is the horizontal correlation scale and is a constant value 379

    for and , which are considered to be isotropic (Li et al., 2013). This scale can be estimated 380

    by the curve of the horizontal correlations with distances. Figure 5 shows the curves of the 381

  • horizontal correlations for the five control variables. For the full variables (Fig. 5a), the sharpest 382

    decrease in the curves is observed for and the slowest decrease in the curves is observed 383

    for . We assume that the decline curve is according to the Gaussian function. Then the 384

    intersection of the decline curve and the line of

    can be approximately as the value 385

    of horizontal correlation scale. The horizontal correlation scales of EC, OC, , and OTR 386

    are 25 km, 27 km, 20 km, 30 km and 28 km, respectively. For the unbalanced variables (Fig. 5b), 387

    their curves are closer than the curves of the full variables. The correlation scales of EC, , 388

    , and are 25 km, 23 km, 24 km, 28 km and 25 km, respectively. These results 389

    suggest that the unbalanced variables are expressed by some common factors such as EC, 390

    and , in the regression equations of Eqs. (10-13), which produces consistent horizontal 391

    correlation scales. 392

    (a) full variables (b) unbalanced variables

    Figure 5. Same as Figure 4, with the exception of the horizontal auto-correlation curves of the 393

    variables. The horizontal thin line is the reference line of

    for determining the 394

    horizontal correlation scales. 395

    396

    For the vertical correlation between and , they are directly estimated using the 397

    forecasting differences in the data assimilation system, but not estimated from a approximately 398

    alternative function. Because it is only an matrix. Figure 6 shows the vertical correlation 399

    matrices and for the full variables (left column) and the unbalanced variables (right 400

    column), respectively. A common feature of both the full variables and the unbalanced variables is 401

  • the significant correlation between the levels of the boundary layer height, which is consistent 402

    with the profile of the standard deviation in Fig. 4. Some weak adjustments to the correlations 403

    between the full and unbalanced variables are made. For example, the correlation of is 404

    stronger than the correlation of between the boundary layers. Similar with the analysis of 405

    horizontal correlation scale, the vertical correlation scale of is larger than the vertical 406

    correlation scale of . Conversely, the vertical correlation scale of is smaller than the 407

    vertical correlation scale of OTR. These results demonstrate that the vertical correlations for the 408

    unbalanced variables are more consistent than the vertical correlations of the full variables, which 409

    is similar to the adjustments to the horizontal correlation scale. Note that the differences of vertical 410

    correlation are slight, compared with the difference of horizontal. The main reason is that the 411

    vertical correlations are generally affected by the atmospheric boundary layer height. Thus, all 412

    vertical correlation decreases rapidly for the levels above the boundary layer height. 413

    Height (m)

    Heig

    ht (m

    )

    OC

    50 100 200 500 1000 2000 500010000

    50

    100

    200

    500

    1000

    2000

    5000

    10000

    0

    0.2

    0.4

    0.6

    0.8

    1

    Height (m)

    Heig

    ht (m

    )

    OC

    50 100 200 500 1000 2000 500010000

    50

    100

    200

    500

    1000

    2000

    5000

    10000

    0

    0.2

    0.4

    0.6

    0.8

    1u

    Height (m)

    )m(

    th

    gie

    H

    NO3

    50 100 200 500 1000 2000 500010000

    50

    100

    200

    500

    1000

    2000

    5000

    10000

    0

    0.2

    0.4

    0.6

    0.8

    1

    Height (m)

    )m(

    th

    gie

    H

    NO3

    50 100 200 500 1000 2000 500010000

    50

    100

    200

    500

    1000

    2000

    5000

    10000

    0

    0.2

    0.4

    0.6

    0.8

    1u

  • Figure 6. Vertical correlations of the five variables of the BEC. The left column represents the full 414

    variables, and the right column represents the unbalanced variables. 415

    416

    5. Application to data assimilation and prediction 417

    To exhibit the effect of the balance constraint of the BEC, the data assimilation experiments 418

    and 24-h forecasts for nine cases are run using WRF/Chem model. The surface PM2.5 and 419

    aircraft-speciated observations are assimilated using different BEC, and the evaluations are 420

    presented for the data assimilation and subsequent forecasts. Three basic statistical measures 421

    including mean bias (BIAS), root mean square error (RMSE) and correlation coefficient (CORR) 422

    are utilized for the evaluations. 423

    5.1 Observation data and experiment scheme 424

    Two types of observation data are employed in our experiments. The first type of observation 425

    data consists of hourly surface PM2.5 concentrations from the California Air Resources Board 426

    (ARB). There are 42 surface PM2.5 monitoring sites existed in the innermost domain of the 427

    Height (m)

    )m(

    th

    gie

    H

    SO4

    50 100 200 500 1000 2000 500010000

    50

    100

    200

    500

    1000

    2000

    5000

    10000

    0

    0.2

    0.4

    0.6

    0.8

    1

    Height (m)

    )m(

    th

    gie

    H

    SO4

    50 100 200 500 1000 2000 500010000

    50

    100

    200

    500

    1000

    2000

    5000

    10000

    0

    0.2

    0.4

    0.6

    0.8

    1u

    Height (m)

    Heig

    ht (m

    )

    OTR

    50 100 200 500 1000 2000 500010000

    50

    100

    200

    500

    1000

    2000

    5000

    10000

    0

    0.2

    0.4

    0.6

    0.8

    1

    Height (m)

    Heig

    ht (m

    )

    OTR

    50 100 200 500 1000 2000 500010000

    50

    100

    200

    500

    1000

    2000

    5000

    10000

    0

    0.2

    0.4

    0.6

    0.8

    1u

  • WRF/Chem model (Fig. 7). The second type of observation data is the speciated concentration 428

    along the aircraft flight track. The aircraft observations were investigated from the California 429

    Research at the Nexus of Air Quality and Climate Change (CalNex) field campaign in 2010. Nine 430

    flights data around Los Angeles from 15 May to14 June, 2010 are selected as the cases of data 431

    assimilation. Table 2 shows the start time and end time of each flight. The species of the aircraft 432

    observations include OC, NO3, SO4 and NH4. Note that NH4 is not a control variable; thus, the 433

    aircraft observation of NH4 is disregarded in the data assimilation. Because the particle size of the 434

    aircraft observations is less than 1.0 μm, some adjustments to the flight observations are made 435

    according to the ratios between the concentration under 2.5 μm and the concentration under 1.0 436

    μm for each species using model products. With the ratios multiplied by the aircraft observed 437

    concentrations, the speciated concentrations under 2.5 μm can be obtained. 438

    439

    Table 2 The periods of flight during CalNex 2010 and the initial time of assimilation 440

    Number of

    cases

    Start time of flight End time of flight Initial time of assimilation

    1 18:00 UTC, May 16 01:42 UTC, May 17 00:00 UTC, May 17

    2 17:28 UTC, May 19 00:10 UTC, May 20 18:00 UTC, May 19

    3 17:28 UTC, May 21 00:10 UTC, May 21 18:00 UTC, May 21

    4 23:08 UTC, May 24 05:23 UTC, May 25 00:00 UTC, May 25

    5 01:59 UTC, May 30 07:45 UTC, May 30 06:00 UTC, May 30

    6 05:00 UTC, May 31 10:54 UTC, May 31 06:00 UTC, May 31

    7 07:59 UTC, June 2 14:09 UTC, June 2 12:00 UTC, June 2

    8 07:59 UTC, June 3 14:041 UTC, June 3 12:00 UTC, June 3

    9 17:56 UTC, June 14 23:35 UTC, June 14 18:00 UTC, June 14

    441

  • 442

    Figure 7. The topography of the innermost domain and the locations of surface monitoring stations 443

    (black dots). The red square is the location of Los Angeles 444

    445

    446

    (a) 00:00 UTC±1.5 h, May 17 (b) 18:00 UTC±1.5 h, May 19 447

    448

    (c) 18:00 UTC±1.5 h, May 21 (d) 00:00 UTC±1.5 h, May 25 449

    450

    (e) 06:00 UTC±1.5 h, May 30 (f) 06:00 UTC±1.5 h, May 31 451

    36N

    35N

    34N

    33N

    121W 120W 119W 118W 117W 116W

    (m)

    36N

    35N

    34N

    33N

    121W 120W 119W 118W 117W 116W 115W

    (m)

    36N

    35N

    34N

    33N

    121W 120W 119W 118W 117W 116W 115W

    (m)

    36N

    35N

    34N

    33N

    121W 120W 119W 118W 117W 116W 115W

    (m)

    36N

    35N

    34N

    33N

    121W 120W 119W 118W 117W 116W 115W

    (m)

    36N

    35N

    34N

    33N

    121W 120W 119W 118W 117W 116W 115W

    (m)

    36N

    35N

    34N

    33N

    121W 120W 119W 118W 117W 116W 115W

    (m)

  • 452

    (g) 12:00 UTC±1.5 h, Jun 02 (h) 12:00 UTC±1.5 h, Jun 03 453

    454

    (i) 18:00 UTC±1.5 h, Jun 14 455

    Figure 8. Aircraft flight tracks during the time window of data assimilation for nine cases. The 456

    color of the track indicates the aircraft height. 457

    458 The initial time of data assimilation cases are designed according to the period of flights, 459

    showed in Table 2. The time window of assimilation for the flight data is ±1.5h, though some 460

    flight times do not completely cover the time windows. Figure 8 shows the aircraft tracks during 461

    the time window of data assimilation. It is obvious that the aircarft data on May 21, May 25 and 462

    June 14 are relative few as the tracks are almost outside of the study domain. For the surface data, 463

    it is only the observations at the initial time are assimilated. For each case, three parallel 464

    experiments are performed. The first experiment is the control experiment without aerosol data 465

    assimilation, which is frequently known as a free run and denoted as Control. The second 466

    experiment is a data assimilation experiment that assimilates surface PM2.5 and aircraft 467

    observations using the full variables without balance constraints; it is denoted as DA-full. The 468

    third experiment is also a data assimilation experiment that also assimilates surface PM2.5 and 469

    aircraft observations, but employs the unbalanced variables as control variables conducted by the 470

    balanced constraint; it is denoted as DA-balance. The backgrounds for DA-full and DA-balance 471

    are the forecasting results from the previous runs without DA. These previous forecasting results 472

    36N

    35N

    34N

    33N

    121W 120W 119W 118W 117W 116W 115W

    (m)

    36N

    35N

    34N

    33N

    121W 120W 119W 118W 117W 116W 115W

    (m)

    36N

    35N

    34N

    33N

    121W 120W 119W 118W 117W 116W 115W

    (m)

  • have been obtained when we run the model for the BEC statistics. The observation error is the half 473

    of standard deviation of the original background variable, and a vertical profile of observation 474

    errors is applied with the average profile of standard deviation of the background variable. In each 475

    experiment, a 24-h forecasting is run using the WRF/Chem model with the same configuration 476

    described in Section 3.1, and the case on June 3, 2010 is presented in detail as an example. 477

    5.2 Increments of data assimilation 478

    Figure 9 shows the horizontal increments of EC, OC, NO3, SO4 and OTR at the first model 479

    level for the DA-full (left column) and DA-balance experiments (right column) of the case on 480

    June 3, 2010. In the DA-full experiment, the increment of EC and OTR (Fig. 9a and 9i) are similar. 481

    They are obtained from the surface PM2.5 observations because no direct aircraft observations 482

    correspond to these two variables. In the DA-balance experiment, significant adjustments are 483

    made to the increments of EC (Fig. 9b) under the action of the balance constraints. The 484

    observations of OC affect greatly the increments of EC for thee high cross-correlation between EC 485

    and OC. Thus the increments of EC are similar with the increments of OC. Similarly, significant 486

    adjustments are made to the increment of OTR (Fig. 9j), though there are not the species 487

    observation of OTR. There are also some slight adjustments for the increments of OC, NO3 and 488

    SO4 for the crossing spread among species. 489

    Figure 10 shows the vertical increments along 35.0 N for the DA-full and DA-balance 490

    experiments. Similar to Fig. 9, the increments of EC and OTR (Fig. 10a and 10i) spread upward 491

    from the surface in the DA-full experiment, which are obtained from the surface PM2.5 492

    observation. In the DA-balance, the increments of EC and OTR (Fig. 10b and 10j) exhibit 493

    observation information from the aircraft height at approximately 500 m, and the value of the 494

    increments show significant increases. The distributions of the increments for these five variables 495

    in the DA-balance (Fig. 10, right column) generally tend to coincide compared with the 496

    distributions of the increments in the DA-full (Fig. 10, left column). The results of the DA-balance 497

    are reasonable due to the influence of each other across the balance constraints. 498

  • (a) EC in the DA-full (b) EC in the DA-balance

    (c) OC in the DA-full (d) OC in the DA-balance

    (e) NO3 in the DA-full (f) NO3 in the DA-balance

    (g) SO4 in the DA-full (h) SO4 in the DA-balance

  • (i) OTR in the DA-full (j) OTR in the DA-balance

    Figure 9. Surface distributions of increments of the five variables of EC, OC, NO3, SO4 and OTR 499

    at 12:00 UTC on June 3, 2010. The left column and right column are from DA-full and 500

    DA-balance, respectively. 501

    502

    (a) EC in the DA-full (b) EC in the DA-balance

    (c) OC in the DA-full (d) OC in the DA-balance

    (e) NO3 in the DA-full (f) NO3 in the DA-balance

  • (g) SO4 in the DA-full (h) SO4 in the DA-balance

    (i) OTR in the DA-full (j) OTR in the DA-balance

    Figure 10. Same as Figure 9, with the exception of the vertical sections along 35 N. 503

    504 5.3 Evaluation of data assimilation and forecasts 505

    Figure 11 shows the scatter plots of the initial model fields versus the surface observation for all 506

    nine cases. In Fig. 11a, the simulated concentrations of the Control experiment display a 507

    significant underestimation with a BIAS of -3.66µg/m3. The mean concentration of Control is 508

    10.90 µg/m3, about 25.1% lower than observed mean concentrations (14.56 µg/m

    3). In the DA-full 509

    and DA-balance experiments, there are remarkable increases for the simulated concentrations, and 510

    the BIASs reduce to as small as -1.21 and -0.94 µg/m3. The RMSE is 9.53 µg/m

    3 in the Control 511

    experiment. The RMSE reduces to 4.82 and 4.48 µg/m3 in the DA-full and DA-balance 512

    experiment, respectively. There are also significant improvements for the CORR in the DA-full 513

    and DA-balance experiments, compared with the Control experiment. Furthermore, these three 514

    statistical measures of the DA-balance experiments show some slight improvement, compared 515

    with that of the DA-full experiments. The result demonstrates that more observation information 516

    spread by balance constraints can improve assimilation performance. 517

  • 518

    (a) Control 519

    520

    (b) DA-full 521

    522

    (c) DA-balance 523

    Figure 11. Scatter plots of observed concentrations of PM2.5 versus simulated PM2.5 concentrations 524

    of the experiments of (a) Control, (b) DA-full, and (c) DA-balance for all nine cases. 525

    526

    To evaluate the effects of the data assimilation, the CORR, RMSE and BIAS during the forecast 527

    time are calculated for each case, and their averaged results are showed in Figure 12. The CORRs 528

    of the DA-balance and DA-full experiments are very close (Fig. 12a). But, the difference increase 529

    0 20 40 600

    10

    20

    30

    40

    50

    60

    RMSE=9.53

    CORR=0.38

    BIAS=-3.66

    DA

    -contr

    ol

    Observation

    0 20 40 600

    10

    20

    30

    40

    50

    60

    RMSE=4.82

    CORR=0.85

    BIAS=-1.21

    DA

    -full

    Observation

    0 20 40 600

    10

    20

    30

    40

    50

    60

    RMSE=4.48

    CORR=0.87

    BIAS=-0.94

    DA

    -bala

    nce

    Observation

  • after the first hour with a higher CORR in the DA-balance experiment. The CORR of the 530

    DA-balance experiment is substantially higher than that of the DA-full experiment from the 2nd

    531

    hour to the 16th

    hour. Similar improvements for the RMSE and the BIAS of the DA-balance 532

    experiment are observed in Fig. 12b and Fig. 12c, respectively. The improvement for the BIAS in 533

    the DA-balance experiment is the most significant among these three statistical measures. The 534

    peak value of the improvement for the BIAS (Fig 12c) is at the 4th

    hour, and the improvement is 535

    distinct until the end of forecasts. These improvements indicate that the balance constraint is 536

    positive for the subsequent forecasts, which derives from the balanced initial distribution among 537

    species. 538

    (a) CORR

    (b) RMSE

    0 4 8 12 16 20 24

    0.4

    0.5

    0.6

    0.7

    0.8

    0.9

    1

    Forecast duration (h)

    CO

    RR

    Control

    DA-full

    DA-balance

    0 4 8 12 16 20 244

    5

    6

    7

    8

    9

    10

    11

    Forecast duration (h)

    RM

    SE

    Control

    DA-full

    DA-balance

  • (c) BIAS

    Figure 12. The averaged (a) Correlations, (b) root-mean-square errors (RMSE in µg/m3) and (c) 539

    mean bias (BIAS in µg/m3) of the PM2.5 concentration forecasts against observations as a function 540

    of forecast duration. 541

    542

    6. Summary and discussion 543

    We examined the BEC in a 3DVAR system, which uses five control variables (EC, OC, NO3, 544

    SO4 and OTR) that are derived from the MOSAIC aerosol scheme in the WRF/Chem model. 545

    Based on the NMC method, differences within a month-long period between 24- and 48-h 546

    forecasts that are valid at the same time were employed in the estimation and analyses of the BEC. 547

    The background errors of these five control variables are highly correlated. Especially between EC 548

    and OC, their correlation is as large as 0.9. 549

    A set of balance constraints was developed using a regression technique and incorporated in 550

    the BEC to account for the large cross correlations. We employ the the balance constraint to 551

    seperate the original full variables into balanced and unbalanced parts. The regression technique is 552

    used to express the balanced parts by the unbalanced parts. These unbalanced parts can be 553

    assumed independent. Then, the unbalanced parts are employed as control variables in the BEC 554

    statics. Accordingly, the standard deviations of these unbalanced variables are less than the 555

    standard deviations of the original variables. The horizontal correlation scales of unbalanced 556

    variables are closer than that of full variables on the effect of the balance constraints. And the 557

    vertical correlations of unbalanced variables show similar trend. 558

    To evaluate the impact of the balance constraints on the analyses and forecasts, three groups of 559

    experiments, including a control experiment without data assimilation and two data assimilation 560

    0 4 8 12 16 20 24-5

    -4

    -3

    -2

    -1

    0

    Forecast duration (h)

    BIA

    S

    Control

    DA-full

    DA-balance

  • experiments with and without balance constraints (DA-full and DA-balance), were performed. In 561

    the data assimilation experiments, the observations of surface PM2.5 concentration and 562

    aircraft-speciated concentration of OC, NO3 and SO4 were assimilated. The observations of these 563

    three variables can spread to the two remaining variables in the increments of the DA-balance, 564

    which results in a more complex distribution. The evaluations of CORR, RMSE and BIAS for the 565

    initial analysis fields show more improvement in the DA-balance experiments, compared with the 566

    DA-full experiments. Though, these improvement are some slight. An important reason is that the 567

    surface PM2.5 observations are independent from the aircraft observations. If we evaluate the 568

    analysis fields by the species observation of aircraft, there may be more significant improvements 569

    in the DA-balance experiments. 570

    While the improvements increase after the first forecasting hour in the DA-balance 571

    experiments, compared with forecasts of the DA-full experiments. The improvements persist to 572

    the end of forecasts, and are substantial from the 2nd

    hour to the 16th

    hour (Fig. 12). These results 573

    suggested that the balance constraints can serve an import role for continually improving the skill 574

    of sequent forecasts. Note that some aircraft data are relative few, and some flight tracks are not 575

    around Los Angeles in some cases (Fig. 8). If there are more aircraft observations, the 576

    improvements of the DA-balance experiments should be more significant and durable. 577

    The developed method for incorporating balance constraints in aerosol data assimilation can 578

    be employed in other areas or other applications for different aerosol models. For the aerosol 579

    variables in different models, some cross-correlations between different species or size bins 580

    should exist because their common emissions and diffusion processes are controlled by the same 581

    atmospheric circulation. Although these cross-correlations may be stronger than the 582

    cross-correlations of atmospheric or oceanic model variables, theoretic balance constraints, such 583

    as geostrophic balance or temperature-salinity balance, do not exist. We expected to discover a 584

    universal balance constraint that can describe the physical or chemical balanced relationship of 585

    aerosol variables, and utilize it in the data assimilation system. In addition, we expected to expand 586

    the balance constraint to include gaseous pollutants, such as nitrite (NO2), sulfur dioxide (SO2), 587

    and (carbon monoxide) CO. These gaseous pollutants are correlated with some aerosol species, 588

    such as NO3, SO4 and EC, which can improve the data assimilation analysis fields of aerosols by 589

    http://dict.cn/sulfur%20dioxide

  • assimilating these gaseous observations. The assimilation of aerosol observations may improve the 590

    analysis fields of gaseous pollutants. 591

    592

    Code availablity 593

    This data assimilation system is established by ourself. The code of this system can be obtained on 594

    request from the first author ([email protected]). 595

    596

    Acknowledgements 597

    This research was supported by the National Natural Science Foundation of China (41275128). 598

    We gratefully thank the California Air Resources Board (http://www.arb.ca.gov/homepage.htm) 599

    and NOAA Earth System Research Laboratory Chemical Sciences Division 600

    (http://esrl.noaa.gov/csd/groups/csd7/measurements/2010calnex/), for providing the download of 601

    surface and aircraft aerosol observations. 602

    603

    References 604

    Bannister, R.N., 2008a. A review of forecast error covariance statistics in atmospheric variational 605

    data assimilation. I: Characteristics and measurements of forecast error covariances. Quart. J. Roy. 606

    Meteor. Soc., 134, 1951–1970. 607

    Bannister, R.N., 2008b. A review of forecast error covariance statistics in atmospheric variational 608

    data assimilation. II: Modelling the forecast error covariance statistics. Quart. J. Roy. Meteor. Soc., 609

    134, 1971–1996. 610

    Barker, D.M., Huang, W., Guo, Y.R, Xiao, Q.N., 2004. A Three-Dimensional (3DVAR) data 611

    assimilation system for use with MM5: implementation and initial results. Mon. Weather Rev. 132, 612

    897–914. 613

    Benedetti, A., Fisher, M., 2007. Background error statistics for aerosols. Quart. J. Roy. Meteor. 614

    Soc., 133, 391–405. 615

    Chen, Y., Rizvi, S., Huang, X., Min, J., Zhang, X., 2013. Balance characteristics of multivariate 616

    background error covariances and their impact on analyses and forecasts in tropical and Arctic 617

    regions. Meteorol. Atmos. Phys., 121, 79–98. 618

    Cohn, S.E., 1997: Estimation theory for data assimilation problems: Basic conceptual framework 619

    mailto:[email protected]://www.arb.ca.gov/homepage.htmhttp://esrl.noaa.gov/csd/groups/csd7/measurements/2010calnex/

  • and some open questions. J. Meteorol. Soc. Jpn., 75, 257–288. 620

    Derber, J., Bouttier, F., 1999. A reformulation of the background error covariance in the ECMWF 621

    global data assimilation system. Tellus, 51, 195–221. 622

    Geller, M. D., Fine, P. M., and Sioutas, C., 2004. The relationship between real-time and 623

    time-integrated coarse (2.5–10m), intermodal (1–2.5m), and fine (

  • Pagowski, M., Grell, G. A., 2012. Experiments with the assimilation of fine aerosols using an 650

    ensemble Kalman filter, J. Geophys. Res., 117, D21302, doi:10.1029/2012JD018333. 651

    Pagowski, M., Grell, G.A., McKeen, S.A., Peckham, S.E., Devenyi, D., 2010. Three-dimensional 652

    variational data assimilation of ozone and fine particulate matter observations: Some results using 653

    the Weather Research and Forecasting–Chemistry model and Grid-point Statistical Interpolation. 654

    Q. J. Roy. Meteorol. Soc., 136, 2013–2024, doi:10.1002/qj.700. 655

    Parrish, D.F., Derber, J.C., 1992. The national meteorological center spectral statistical 656

    interpolation analysis. Mon. Weather Rev., 120, 1747–1763. 657

    Ricci, S., Weaver, A.T., 2005. Incorporating State-Dependent Temperature–Salinity Constraints in 658

    the Background Error Covariance of Variational Ocean Data Assimilation. Mon. Weather Rev., 659

    133, 317–338. 660

    Saide, P.E., Carmichael, G.R., Spak, S.N., Minnis, P., Ayers, J.K., 2012. Improving aerosol 661

    distributions below clouds by assimilating satellite-retrieved cloud droplet number. P. Natl. Acad. 662

    Sci. USA, 109, 11939–11943, doi:10.1073/pnas.1205877109. 663

    Saide, P.E., Carmichael, G.R., Liu, Z., Schwartz, C.S., Lin, H.C., Da Silva, A.M., Hyer, E., 2013. 664

    Aerosol optical depth assimilation for a size-resolved sectional model: impacts of observationally 665

    constrained, multi-wavelength and fine mode retrievals on regional scale forecasts. Atmos. Chem. 666

    Phys., 13, 10425–10444, doi:10.5194/acp-13-10425-2013. 667

    Salako, G.O., Hopke, P.K., Cohen, D.D., Begum, B.A., Biswas, S.K., Pandit, G.G., Chung, Y.S., 668

    Rahman, S.A., Hamzah, M.S., Davy, P., Markwitz, A., Shagjjamba, D., Lodoysamba, S., 669

    Wimolwattanapun, W., Bunprapob, S., 2012. Exploring the Variation between EC and BC in a 670

    Variety of Locations. Aerosol Air Qual. Res, 12, 1–7. 671

    Wu, W.S., Purser, R.J., Parrish, D.F., 2002. Three-dimensional variational analysis with spatially 672

    inhomogeneous covariances. Mon. Wea. Rev., 130, 2905-2916 673

    Schwartz, C.S., Liu, Z., Lin, H.-C., McKeen, S.A., 2012. Simultaneous three-dimensional 674

    variational assimilation of surface fine particulate matter and MODIS aerosol optical depth. J. 675

    Geophys. Res., 117, D13202, doi:10.1029/2011JD017383. 676

    Schwartz, C. S., Liu, Z., Lin, H.-C., Cetola, J. D., 2014, Assimilating aerosol observations with a 677

    “hybrid” variational-ensemble data assimilation system, J. Geophys. Res. Atmos., 119, 4043–4069, 678

    doi:10.1002/ 2013JD020937. 679

  • Sun C.-H.., Lin Y.-C., Wang C.-S., 2003. 7. Relationships among Particle Fractions of Urban and 680

    Non-urban Aerosols. Aerosol and Air Quality Research, 3(1), 7-15. 681

    Zaveri, R.A., Easter, R.C., Fast, J.D., Peters. L K., 2008. Model for Simulating Aerosol 682

    Interactions and Chemistry(MOSAIC). J. Geophys. Res., 113, D13204, 683

    doi:10.1029/2007JD008782. 684

    685

    686

    687

    688

    689

    690

    691

    692

    693

    694

    695

    696

    697

    698

    699

    700

    701

    702

    703

    704

    705

    706

    707

    708

    709

  • 710

    Table 1 Regression coefficients of balance operator and the coefficient of determination 711

    (regression coefficients correspond to in Eq. (7)) 712

    species regression coefficient ( )

    coefficient of

    determination

    ( )

    EC 1 /

    OC 0.90 1 0.86

    NO3 4.01 3.76 1 0.32

    SO4 1.35 -0.21 -3.15 1 0.48

    OTR 2.93 2.35 0.28 0.60 1 0.96

    713

    714

    715

    716

    Table 2 The periods of flight during CalNex 2010 and the initial time of assimilation 717

    Number of

    cases

    Start time of flight End time of flight Initial time of assimilation

    1 18:00 UTC, May 16 01:42 UTC, May 17 00:00 UTC, May 17

    2 17:28 UTC, May 19 00:10 UTC, May 20 18:00 UTC, May 19

    3 17:28 UTC, May 21 00:10 UTC, May 21 18:00 UTC, May 21

    4 23:08 UTC, May 24 05:23 UTC, May 25 00:00 UTC, May 25

    5 01:59 UTC, May 30 07:45 UTC, May 30 06:00 UTC, May 30

    6 05:00 UTC, May 31 10:54 UTC, May 31 06:00 UTC, May 31

    7 07:59 UTC, June 2 14:09 UTC, June 2 12:00 UTC, June 2

    8 07:59 UTC, June 3 14:041 UTC, June 3 12:00 UTC, June 3

    9 17:56 UTC, June 14 23:35 UTC, June 14 18:00 UTC, June 14

    718

    719

    720

    721

    722

  • Figure 1 Geographical display of the three-nested model domains. The innermost domain covers 723

    the Los Angeles basin; the black point denotes the location of Los Angeles. 724

    725

    Figure 2 Cross-correlations between emission species of E_EC, E_ORG, E_NO3, E_SO4 and 726

    E_PM25. The emission species data are derived from the NEI’05 emissions set for the innermost 727

    domain of the WRF/Chem model 728

    729

    Figure 3 Cross-correlations between the five variables of the BEC. These variables are (a) full 730

    variables and (b) unbalanced variables of EC, OC, , and OTR. 731

    732

    Figure 4 Vertical profiles of the standard deviation of the variables. (a) full variables and (b) 733

    unbalanced variables 734

    735

    Figure 5 Same as Figure 4, with the exception of the horizontal auto-correlation curves of the 736

    variables. The horizontal thin line is the reference line of

    for determining the 737

    horizontal correlation scales. 738

    739

    740

    Figure 6 Vertical correlations of the five variables of the BEC. The left column represents the full 741

    variables, and the right column represents the unbalanced variables. 742

    743

    Figure 7 The topography of the innermost domain and the locations of surface monitoring stations 744

    (black dots). The red square is the location of Los Angeles 745

    746

    Figure 8 Aircraft flight tracks during the time window of data assimilation for nine cases. The 747

    color of the track indicates the aircraft height. 748

    749

    Figure 9 Surface distributions of increments of the five variables of EC, OC, NO3, SO4 and OTR 750

    at 12:00 UTC on June 3, 2010. The left column and right column are from DA-full and 751

    DA-balance, respectively. 752

    753

    Figure 10 Same as Figure 9, with the exception of the vertical sections along 35 N. 754

  • 755

    Figure 11 Scatter plots of observed concentrations of PM2.5 versus simulated PM2.5 concentrations 756

    of the experiments of (a) Control, (b) DA-full, and (c) DA-balance for all nine cases. 757

    758

    Figure 12 The averaged (a) Correlations, (b) root-mean-square errors (RMSE in µg/m3) and (c) 759

    mean bias (BIAS in µg/m3) of the PM2.5 concentration forecasts against observations as a function 760

    of forecast duration. 761

    762