Data-driven nonlinear optimisation of a simple air ...cee.faculty-ms.technion.ac.il/wp-content/.../05/370...Department of Civil and Environmental Engineering, Technion, Israel Institute

at SciVerse ScienceDirect

Atmospheric Environment 79 (2013) 261e270

Contents lists available

Atmospheric Environment

journal homepage: www.elsevier .com/locate/atmosenv

Data-driven nonlinear optimisation of a simple air pollutiondispersion model generating high resolution spatiotemporal exposure

Yuval*, Shlomo Bekhor, David M. BrodayDepartment of Civil and Environmental Engineering, Technion, Israel Institute of Technology, Haifa 32000, Israel

h i g h l i g h t s

� Fitting a simple dispersion model to high resolution air pollution monitoring data.� Requires nonlinear regression to obtain the optimal model.� The results are fully cross-validated.� Providing pollution exposure estimates at a high resolution in both space and time.

a r t i c l e i n f o

Article history:Received 3 March 2013Received in revised form30 May 2013Accepted 3 June 2013

Keywords:Air pollution monitoringExposureNonlinear optimisationTraffic model output

* Corresponding author. Tel.: þ972 4 8292676.E-mail address: [email protected] ( Yuval).

1352-2310/$ e see front matter � 2013 Elsevier Ltd.http://dx.doi.org/10.1016/j.atmosenv.2013.06.005

a b s t r a c t

Spatially detailed estimation of exposure to air pollutants in the urban environment is needed for manyair pollution epidemiological studies. To benefit studies of acute effects of air pollution such exposuremaps are required at high temporal resolution. This study introduces nonlinear optimisation frameworkthat produces high resolution spatiotemporal exposure maps. An extensive traffic model output, servingas proxy for traffic emissions, is fitted via a nonlinear model embodying basic dispersion properties, tohigh temporal resolution routine observations of traffic-related air pollutant. An optimisation problem isformulated and solved at each time point to recover the unknown model parameters. These parametersare then used to produce a detailed concentration map of the pollutant for the whole area covered by thetraffic model. Repeating the process for multiple time points results in the spatiotemporal concentrationfield. The exposure at any location and for any span of time can then be computed by temporal inte-gration of the concentration time series at selected receptor locations for the durations of desired pe-riods. The methodology is demonstrated for NO2 exposure using the output of a traffic model for thegreater Tel Aviv area, Israel, and the half-hourly monitoring and meteorological data from the local airquality network. A leave-one-out cross-validation resulted in simulated half-hourly concentrations thatare almost unbiased compared to the observations, with a mean error (ME) of 5.2 ppb, normalised meanerror (NME) of 32%, 78% of the simulated values are within a factor of two (FAC2) of the observations, andthe coefficient of determination (R2) is 0.6. The whole study period integrated exposure estimations arealso unbiased compared with their corresponding observations, with ME of 2.5 ppb, NME of 18%, FAC2 of100% and R2 that equals 0.62.

� 2013 Elsevier Ltd. All rights reserved.

1. Introduction

Spatially accurate estimation of exposure to air pollutants in theurban environment is needed for most air pollution epidemiolog-ical studies (Jerret et al., 2005; Han and Naeher, 2006). Air pollutionis emitted at many source locations, is dispersed and transformedby complex physical and chemical processes but is observed only at

All rights reserved.

a limited sample of the locations where the population is exposed.As it is not feasible to distribute personal exposure monitors to alarge number of people for long periods of time, epidemiologicalstudies resort to various techniques to model the pollutants’ con-centrations. The large spatial variability in urban traffic-relatedpollutant levels requires estimation methods that can producesmall scale exposure variations (Jerret et al., 2005). To benefitstudies of acute effects of air pollution at the personal level (e.g.,Maynard et al., 2007; Van den Hooven et al., 2012), the exposurefeatures must be provided at high temporal resolution. The idealmodel for estimating the full spatiotemporal concentration field of

mailto:[email protected]

http://crossmark.dyndns.org/dialog/?doi=10.1016/j.atmosenv.2013.06.005&domain=pdf

www.sciencedirect.com/science/journal/13522310

www.elsevier.com/locate/atmosenv

http://dx.doi.org/10.1016/j.atmosenv.2013.06.005



Yuval et al. / Atmospheric Environment 79 (2013) 261e270262

an air pollutant should attempt to simulate the emission, disper-sion, transformation and removal processes and, due to theinherent difficulties involved and the inevitable errors, poses abuilt-in data-driven mechanism to adjust the model parametersbased on the available observations. To the best of our knowledgenone of the current schemes used for air pollution exposure esti-mation employs a data-driven model that incorporates a disper-sion-based simulation of air pollutants.

Gaussian dispersion models (Cyrus et al., 2005; Beelen et al.,2010; Gulliver and Briggs, 2011) and the more comprehensivemeteorologicalephotochemical ones (Astitha et al., 2008; Borregoet al., 2010) are designed to simulate air pollution concentrationsgiven emission and meteorological inputs. However, unlike thecurrent meteorological models, they do not adjust the model pa-rameters in a closed loop controlled by available observations. Forexample, a recent paper by Beevers et al. (2012) demonstrated animpressive performance of a model coupling the photochemicalCMAQ model (Byun and Ching, 1999) with the Gaussian ADMSmodel (McHugh et al., 1997) in estimating hourly NOx, NO2 and O3concentrations in London at a 20 � 20 m spatial grid resolution.Data from 80 monitoring stations were used for a comprehensiveassessment of the model output. However, there was no attemptto incorporate any type of adjustment of the models using thesimultaneous observations to lower the model errors and biaseswhich were found. Various interpolation schemes (Son et al., 2010;Lee et al., 2012) take the opposite approach. The spatial airpollution exposure maps which they produce are based onobserved values and statistical principles only. The sparse distri-bution of monitoring stations limits the ability of observed datainterpolations to provide the detailed spatial exposure required formany epidemiological investigations (Yuval and Broday, 2006). Inthe last decade the modelling approach of Land Use Regression(LUR) has been extensively used in air pollution epidemiologicalstudies (Ryan and LeMasters, 2007; Hoek et al., 2008). An LUR fitsa suit of covariates (mainly metrics of urbanisation, topographyand traffic) to observed concentrations and the recovered regres-sion parameters are then used to produce detailed spatial expo-sure estimation. The LUR scheme is a data-driven method but itdoes not attempt to mimic the actual processes that associate themodel covariates with the observations to which they are fitted.Moreover, in most cases the observations used by an LUR are in-tegrated data over a period of a couple of weeks (or a few suchperiods at different seasons) so the short time scale processesgoverning air pollution dispersion are averaged out and only thechronic exposure is estimated.

Incorporating short time scale effects has been shown toimprove the accuracy of chronic exposure to air pollution. Arainet al. (2007) used a vector average of hourly wind direction datato differentiate the effects of roads down and up the wind in theirLUR model. Su et al. (2008) demonstrated improvement in LURperformance while considering the hourly emission fluxes alongwedgeeshaped boxes whose geometry was dictated by the hourlywind direction and speed. Wilton et al. (2010) incorporated in anLUR the impact of short term meteorological variation throughadding hourly dispersion model outputs as covariates. A differentapproach by Szpiro et al. (2010) incorporated temporal trends inlongeterm exposure estimation as a linear combination of empir-ically derived temporal basis functions. Investigating the acute ef-fects of particulate matter on mortality, Maynard et al. (2007)included various temporal covariates like day of the week, day ofyear and meteorology in assessing the black carbon exposure levelsin a statistical scheme akin to universal kriging. A recent study byVan den Hooven et al. (2012) accounted for temporal variations bycalculating spatial PM10 and NO2 exposure patterns for eightdifferent wind conditions and deriving hourly spatial patterns by

means of interpolation between the eight characteristic spatialdistributions. These patterns were subsequently adjusted for fixedtemporal patterns of source activities to achieve themonthly, day ofthe week and time of the day concentrations at the home locationsof pregnant women. None of those studies incorporated the cova-riates which they used in a schemewhichwas optimally adjusted topollution observations at a temporal resolution close to the domi-nant frequency of the physical and chemical processes that governair pollution levels.

This study introduces a new concept enabling production ofdetailed pollution concentration maps at very high temporal res-olution. The main idea is formulating the study as an optimisationproblem in which the parameters of a nonlinear scheme thatsimulates the spatial distribution of air pollution concentration aredetermined by fitting the scheme’s output to observations. Thisrequires the incorporation of meteorological and air pollution ob-servations at a temporal resolution equal to or lower than thedominant time scales of the emission and dispersion processes. Themethodology is demonstrated using very simple means but isshown to achieve spatial variability similar to that enabled by anLUR, at the high temporal resolution of the meteorological andmonitoring data. The process is packaged in a closed mathematicalform and can thus be run completely automatically. This enables aquantitative assessment of its performance using a true leave-one-out cross-validation scheme. The ability to provide both the shorttime scale acute exposure and the long term chronic exposure isdemonstrated on real data from the monitoring network along theIsraeli coast.

2. Data

2.1. Study area

The proposed method is demonstrated in a strip of land about70 km long and 25 kmwide along the Israeli coast that includes themetropolitan Tel Aviv area and the cities of Netanya and Ashdodnorth and south of it, respectively (see Fig. 1). The total populationof the area is about 3.7 million. The area includes a dense roadsystem to which feeds much of Israel’s highway network, railwaylinks, two airports and a seaport. The transportation fleet includesabout 1.5 million private cars, 200,000 commercial vehicles, and10,000 city and long distance buses. Industrial sources are relativelyminor contributors to air pollution in the study area (Yuval andBroday, 2009). Two natural gas-powered electricity generationstations, a 450 MW station in Tel Aviv and a 1 GW station north ofAshdod, are the main industrial pollution sources.

2.2. Traffic model output

The traffic data is produced by the Metropolitan Tel AvivTraffic Model developed by Cambridge Systematics (CambridgeSystematics, 2008) during 2005e2008 using the EMME/2 soft-ware platform (INRO, 2012). The model was applied and is main-tained by an expert team employed by the Netivey AyalonCompany, a joint venture of the Israeli Government and the Tel Avivmunicipality. The road network which the model uses includes11,553 road segments (Fig. 1). Private car traffic is simulated by anactivity basedmodel. Bus tours follow the true bus network and thecommercial fleet (trucks and pick-ups) is modelled by a separatemodule. The model output includes 23 traffic attributes producedfor five periods during the day: morning, morning rush hour,middle day, afternoon rush hour and evening. For this work onlythe morning rush, middle day and afternoon rush outputs wereavailable.

165 170 175 180 185 190 195 200 205

630

640

650

660

670

680

690

700

Ashdod

Tel Aviv

Netanya

N

Fig. 1. The study area map showing the shoreline, the road network (thinned in denseareas for visual purposes), the location of the ambient and road-side monitoring sta-tions (green triangles and red squares, respectively) and the spatial 500 � 500 m grid.The coordinates are given using the New Israeli Grid system in km. (For interpretationof the references to colour in this figure legend, the reader is referred to the webversion of this article.)

Yuval et al. / Atmospheric Environment 79 (2013) 261e270 263

2.3. Monitoring and meteorological data

Air pollution monitoring data in the study area are observed byan air quality network of 38 stations and are available from theIsraeli Ministry for Environmental Protection’s website. We usedfor the model development the 2009 NO2 data from the 25 stationsthat observed the pollutant and that are designed to meet therequirement of the EU Council Directive 1999/30/EC for protectionof human health. NO2 data from additional six road-side moni-toring stations were used only for additional model testing. Thewind data used by the model were created from the set of windobservations in the monitoring stations following the representa-tive wind method of Yuval and Broday (2009). The representativewind data have full temporal coverage. They were compared andfound very similar at most time points to observations at 60 melevation above ground from ameteorological mast located close tothe centre of the study area (184.3 East, 670.8 North). The air qualityand meteorological data are all available as half-hourly means ofthe continuous measurements. A series of quality control steps wascarried out prior to using the data to ensure highest fidelity.

3. Methods

3.1. Description of the proposed model

The first step in transforming the traffic model outputs, given forroad sections, to spatially distributed air pollution concentrations isassigning these outputs to a regular grid. We used a regular gridthat divided the area covered by the traffic model into square cells.The contribution E of a road segment to the emission rate from acell through which it passes can be assumed to be proportional tothe multiplication of its traffic volume V (vehicles per unit time) bythemean time dt that it takes the traffic to cross the cell via the roadsegment,

EfVdt ¼ VL=S; (1)

where L is the length of the road segment through the cell and S isthe mean speed along it. The length L for each segment in each cellwas calculated given the grid and the road network geometries. Forthe purpose of air pollution study, the traffic volume V should beconsidered as summation of the numbers of the different vehicletypes passing through each segment, weighted by their emissionrates. The traffic model output which we used calculates privatevehicles, trucks, commercial vehicles and bus volumes for each ofthe road segments in the network. The trucks and commercialvehicles volume data are considered of lower accuracy and we thusused in our traffic volume estimation only the private vehicle andbus numbers i.e.,

V ¼ Va þ ReVb; (2)

where Va and Vb are the traffic model’s private car and bus numbersper hour in the segment, respectively, and Re is the ratio betweenthe mean bus to the mean private car emission of the air pollutantof interest. Following Romilly (1999), Re in our case was taken as 20.The volumes of the trucks and commercial vehicles are assumed tobe proportional to V. The proxy to the total emission rate from agrid cell is the summation of the contributions of the road segmentsthat go through it,

Tj ¼Xnk¼1

Ejk: (3)

where Tj is the traffic emission proxy at the jth cell in the gridand Ejk is the contribution of the kth road segment that passesthrough it.

Each grid cell can be considered an air pollutant source of in-tensity proportional to Tj. The impact of the contribution of Tj on theconcentration in any cell of the grid depends on the direction be-tween the source and receptor cells relative to the regional winddirection, and the distance between them. The traffic-related con-centration Ci in the ith cell at a given time point is a superposition ofthe contributions from the M cells of the grid. A reasonable, albeitadmittedly simplistic, way to specify that concentration is by

Ci ¼ p1 þ p2XMj¼1

Tjf�qij�p3

�Dij þ p4

�p5; (4)

whereDij is the distance between the ith and jth cells, qij is the anglebetween the regional wind direction and the direction between theith and jth cells, f(qij) equals cos(qij) for qij � 90� and zero otherwise,and p1, p2, p3, p4 and p5 are unknown parameters. A schematic di-agram of the setting of Equation (4) is given in Fig. 2. In a matrix

r

s

DWind

Fig. 2. Schematic diagram of the spatial grid and the geometrical relationship betweena pair of source (s) and receptive (r) grid cells. The impact of the emissions in cell s onthe concentration in cell r depends on the distance between them, D, and the angle q

between the wind direction and the direction ser.


notation the vector of concentrations in all the grid cells, C, isgiven by

C ¼ W ðpÞT; (5)

where T1 ¼1; Tj, j ¼ 2,3,/,M þ 1 are the emission proxies for theMgrid cells; p ¼ {p1 p2 p3 p4 p5}T and W is a weighting matrix suchthat W i1 ¼ p1 and W ij; j ¼ 2; 3; /; M þ 1 is given by

W ij ¼ p2XMj¼1

f�qij�p3

�Dij þ p4

�p5: (6)

To find the unknown model parameters we can write

dW ðpÞT � bC ¼ e (7)

where bC is the vector of N observed concentrations at the grid cellswhere monitoring reside and dW ðpÞ are the corresponding rows ofW ðpÞ. The unknown parameters are found by minimising the costfunction

F ¼ kek2 ¼��dW ðpÞT � bC

��2; (8)

subject to the constraints p1, p2, p3, p5 � 0 and p4 > 0. The con-straints ensure a physically plausible model (increasing impact of asource cell with decreasing q and D), and non-negative and boundconcentrations. We used for the constrained minimisation theMatlab� fmincon function (The MathWorks Inc., 2010). The solutionp of Equation (8) is plugged in Equation (5) to calculate the full gridspatial distribution of the pollutant concentration. Note that forsimplicity of writing, the time dependence was not explicitlyexpressed in Equations (4)e(7) but in fact C]C(t), the spatial dis-tribution at time point t. Moreover, p is also time dependent and adifferent parameters set p(t) is recovered at each time point.

3.2. Assessment of the results

In addition to a qualitative inspection of the concentration mapsproduced by the proposed method, we quantitatively assessed themodel performance using leave-one-out cross-validation. Thisprocess involves the removal of one observation from the set bC,solving Equation (8) to find the model parameters and using therelevant row of Equation (7) to estimate the concentration value atthe grid cell of the observation. Repeating this process by turn forall the observations at a time point results in a set of independentlyproduced concentration estimations which can be compared to thecorresponding observations. The process is then carried outconsecutively for all time points. A second assessment of the modelwas by a comparison of the values observed in road-side stations(completely independent and not used in the model production) tothose assigned by the model to the grid cells in which they reside.The assessment of the model performance against the observationsis by a set of statistical measures recommended by the dispersionmodels statistics review of Simon et al. (2012). The measures arethe mean bias (MB), the mean error (ME), the normalised meanerror (NME), the ratio of modelled values within a factor of two ofthe observations (FAC2), and the coefficient of determination (R2).Definitions of these measures and discussion of their properties aregiven by Simon et al. (2012). The proposed method’s performancewas also compared to the performance of three benchmarkmethods (a) Assigning at each time point to all the grid locationsthe mean of the available observed concentrations. (b) A simpleLUR that uses the traffic emission proxy T and thewind speed as theexplanatory variables and is calibrated at each time point by thesame observations used by the proposed method. (c) A spatialinterpolation scheme calculated for each time point. The threecomparisonmethods are described in detail in an appendix given inthe supplementary electronic Supporting Material.

4. Results

The proposed model is designed to run for consecutive timepoints of a study period. For this study we demonstrate only theresults for 07:30, 12:00 and 17:00 on all the working days in 2009.Testing against data from road-side stations, half of which startedoperations only after the beginning of 2010, are for the same hoursof the day in 2011. The model was run on a 500 � 500 m gridcovering the traffic model road network. The reported concentra-tions are for the centres of the rectangular grid cells. The 500 mspatial resolution was selected so that it provides exposure at thepostal code scale but yet not too prohibitive in terms of computa-tion time and computer memory requirements.

Fig. 3 presents three examples of half-hourly spatial NO2 dis-tributionwhich themodel produced. The time points were selectedfor their different times of the day and very different ambientconditions. Fig. 3a shows the concentrations in an autumn morn-ing, with a typically light wind from the south (180�, 2.2 m s�1),Fig. 3b is for a spring noonwith vigorous wind from the west (270�,4.6 m s�1) and Fig. 3c is for a winter evening with mild wind fromthe northwest (315�, 2.5 m s�1). The small scale concentrationvariability is clearly seen in all three plots, with locally highervalues along the roads. The spatial patterns in Fig. 3 reflect theambient conditions at the time points for which the plots wereproduced and demonstrate how the simple modelling schemecaptures the expected dispersion pattern. The main feature in theplots is of high concentrations in the Tel Aviv area and downwindfrom it. In Fig. 3a (wind from the south) relatively high concen-trations can be seen anywhere along the coastline north of Tel Aviv.In Fig. 3b and c (onshore wind from the west and northwest,respectively) the concentrations along the coastline are very low

Fig. 3. Examples of half-hourly spatial NO2 concentration in the study area. The concentrations are colour-coded, with the colour key given by the colourbars in ppb units. (a) On 17September 2009 at 07:30 with the wind from 180o, 2.2 m s�1. (b) On 17 April 2009 at 12:00 with the wind from 270o, 4.6 m s�1. (c) On 27 December 2009 at 17:00 with the windfrom 315o, 2.5 m s�1. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)


and the plume of high NO2 values expands very clearly downwind,east and southeast from Tel Aviv. However, the concentrations inFig. 3c, for 27 December at 17:00, are much higher then in Fig. 3b(17 April at noon), reflecting the fact that around sunset and close tothewinter solstice the photochemical rates are lowand the eveningTel Aviv rush hour emissions result in high concentrations downthe wind.

Fig. 4 shows scatter plots of all the cross-validated half-hourlytime points. The NO2 concentrations, simulated for grid pointswhere ambient monitoring stations reside, are plotted against thecorresponding observations, split by their time of the day. In allthree plots the points are concentrated along the 1:1 ratio line,pointing to generally minimal bias in the model simulations. Mostof the points arewithin a factor of two of the observations. The onesthat are not are mainly unduly high modelled values of very lowobserved concentrations. The plot for 07:30 (Fig. 4a) shows alsomany too high modelled values at the higher concentration range.Table 1 provides a summary of the model performance measurescalculated for the data points in Fig. 4. As could be figured out fromFig. 4, the MB is indeed very small, less than 10% of a ppb. However,the scatter around the 1:1 ratio line results in ME values of 6.4, 4.1,and 4.1 ppb for the 07:30, 12:00 and 17:00 time points, respectively.The higherME in themorningmay partially be due to poorermodelperformance in typical morning conditions, as depicted by thelower R2, but is mainly due to the higher concentrations observed atthat time of the day when traffic emissions are maximal and thesun is not yet intense enough to speed up the photochemicalreactions.

Fig. 4 and Table 1 depict the global performance calculated forthe observations and their model simulations from all the stationstogether. Fig. 5 shows the cross-validated model performancemeasures at 07:30, calculated separately for each station and pre-sented at the stations’ spatial locations. The spatial pattern of theperformance measures is not homogeneous. Better performance ismore common at the Tel Aviv and Ashdod areas, where both thetraffic network and the monitoring stations’ density are high. Theperformance of the model in the peripheral areas is usually inferior,especially at the locations of the three stations in the northern partof the study area. In each of the plots in Fig. 5 there are also a few

results which are conspicuously dissatisfactory. It is important tonote that the spatial distributions of the various performancemeasures are different. The MB and the ME are absolute measuresand tend to be higher in areas of high NO2 concentrations like theTel Aviv conurbation. All four highest ME values are found there.However, the NME, the R2 and the FAC2 are normalised measuresand are usually worse in the peripheral areas where the NO2 con-centrations are low, so even modest modelling errors may result inreduced performance. Fig. 6 shows the spatial patterns of the per-formance measures at 17:00. The model performance which itshows is generally better than the morning performance shown inFig. 5. Moreover, the patterns of the performance measures are alsoclearly different at the two different times of the day. For example,the ME in the twin stations around location (188 East, 655 North) isvery large in the morning (Fig. 5b), when the wind is usually fromthe south or southeast, but is average in the evening hour (Fig. 6b)when the wind usually blows from the west or northwest andbrings to these stations emissions from the Tel Aviv area. In thiscase the unduly low morning modelled concentrations are prob-ably a result of underestimation of the traffic volumes in the majorhighway along which the stations are situated (see Fig. 1).

To provide a visual impression of the temporal modelling whichgave rise to the performance measures in Table 1 and Figs. 5 and 6,the observed concentrations and their modelled simulations in oneof the northern Tel Aviv stations (179.6 East, 665.7 North) is given inFig. 7. The seasonal differences are clearly captured by the modelledvalues, with low concentrations and low variability during thesummer months and higher concentrations with very large vari-ability in the winter. The general level and many of the true NO2peaks are very well modelled in the noon and evening plots (Fig. 7band c, respectively). The morning modelled values (Fig. 7a),although reasonably well correlated with their corresponding ob-servations (see Fig. 5d), are generally overestimating the observa-tions as depicted by the high positive bias in Fig. 5a.

Table 2 provides the summary of model performance measurescalculated for observations in road-side stations whose data werenot used in the modelling. The ME and NME are much largercompared to the corresponding values in Table 1 which shows thecross-validated measures for the ambient monitoring stations.

Table 1Statistical measures of model performance in producing the cross-validated half-hourly NO2 observations in 2009. The table provides the number of observation andmodelled value pairs (n), the mean observed concentration value (Mean), the meanbias (MB), mean error (ME), normalised mean error (NME), the ratio of modelledvalues within factor of two from their corresponding observations (FAC2) and thecoefficient of determination (R2).

Time of day n Mean (ppb) MB (ppb) ME (ppb) NME (%) FAC2 R2

07:30 6882 19.5 0.00 6.2 32 0.87 0.4912:00 6803 11.3 0.01 4.7 41 0.74 0.5517:00 6895 12.3 0.00 4.9 39 0.74 0.60All 20,580 14.4 0.00 5.2 36 0.78 0.59

Fig. 4. Scatter plots of the half-hourly observed vs. cross-validated modelled NO2

concentrations in all the monitoring stations. The data points in the plots are colouredaccording to the density of points in their vicinity on the abscissa-ordinate plain(which is proportional to the frequency by which similar points transpired in theobservation-model space). The colour-coding key is given in the colourbars in units of[number of points]/[10�4 of the area of the abscissa-ordinate plain]. The solid line isthe 1:1 ratio and the two dashed lines are the 1:2 and 0.5:1 ratios. Six large outliers(above the 99.9 percentile) were not included in each of the plots for visual purpose.(a) Data points from all the stations and all days in 2009 at 07:30. (b) Data points fromall the stations and all days in 2009 at 12:00. (c) Data points from all the stations andall days in 2009 at 17:00. (For interpretation of the references to colour in this figurelegend, the reader is referred to the web version of this article.)


Most of the errors are due to underestimation of the observations,as seen by the negative MB. Certainly the regression based on datafrom ambient sites does not achieve a satisfactory fit at the road-side locations, which typically experience much higher concen-trations. The relatively coarse 500 � 500 m grid is also probably

inadequate to capture the high NO2 concentrations in a very closeproximity to roads. It must be noted though that similar poorperformance in simulating road-side concentrations was noted alsoby Beevers et al. (2012) who used a 20 � 20 m grid.

The half-hourly modelled concentrations discussed thus far canbe integrated to provide the accumulated exposure during anydesired time periods. Fig. 8 shows long term mean concentrationsmaps produced by integrations of the proposedmodel’s half-hourlyoutput. The plots are for the 07:30, 12:00 and 17:00 time pointsduring 2009. They were produced by taking for each grid cell centrethe mean of its modelled concentration time series during thespecified hour of the day. The plots capture the high variability inthe chronic NO2 exposure, which is expected in an urban envi-ronment. They also demonstrate the significant differences in NO2exposure during different hours of the day. The morning concen-trations are generally higher and the spatial pattern is differentfrom the noon and evening patterns. The differences in the con-centration range and patterns are due to the different emissionrates, photochemical rates, and the breeze cycle which determinesmost of the daily wind direction variability in the area. Fig. 9 showsscatter plots of the mean values of the cross-validated modelledvalues and their corresponding means of the observations duringthe 07:30, 12:00 and 17:00 h of 2009. Table 3 provides modelperformance measures corresponding to the values shown in theplots, and for the sets of means calculated for all the time pointstogether. Table 4 provides the performance measures for simu-lating the long-term mean of observations measured at the road-side stations (which were not used in the modelling). Figs. 8 and 9,and the information in Tables 3 and 4 enable assessment of theability of the model to estimate the long term chronic exposure toNO2. The maps in Fig. 8 are detailed enough for specifying theexposure at the postal code scale and there is very little bias in thecross-validated modelled values. The R2 of the model is 0.62, in themiddle of the range reported for NO2 LUR models (Ryan andLeMasters, 2007; Hoek et al., 2008). It is clear though that using arelatively coarse grid and not incorporating road-side station datain the modelling resulted in limitations on the model’s ability toestimate the exposure at the very close proximity to roads.

5. Discussion

This study proposes the formulation of air pollution exposureestimation as an optimisation problem inwhich the parameters of amodel, simulating the spatial air pollution distribution, are deter-mined by fitting its output to observations. Carrying out the pro-posed methodology requires (a) data regarding the pollutionsources, (b) a model to transform emissions (or their proxies) topollutant concentrations, (c) a spatially representative sample ofobservations at the temporal resolution of the dominant dispersionprocesses and (d) a regression algorithm to fit the modelling outputto the observations. The nonlinear nature of air pollution physicsand chemistry most probably necessitates the use of a nonlinear

180 200

630

640

650

660

670

680

690

700

−15.4

−6.8

1.7

10.3

a180 200

2.68

6.96

11.24

15.52

b180 200

0.16

0.42

0.67

0.92

c180 200

0.09

0.3

0.51

0.71

d180 200

0.48

0.65

0.81

0.98e

Fig. 5. The statistical measures of model performance at the 07:30 h, calculated separately for each station. The measures’ values are presented as colour-coded diamonds located atthe stations’ locations, with the colour coding key given in the colourbars. (a) Mean Bias (ppb). (b) Mean Error (ppb). (c) Normalised Mean Error. (d) Coefficient of Determination(R2). (e) Ratio of modelled values within factor of two from the observations (FAC2). (For interpretation of the references to colour in this figure legend, the reader is referred to theweb version of this article.)


regression. The concept is demonstrated using very simple meansfor each of the components of the scheme but it is able to produceacute NO2 exposure favourably compared with that obtained bycomprehensive dispersion models (e.g., Beevers et al., 2012), andchronic exposure at a level commensurate with the LUR modelsdiscussed by the review papers of Ryan and LeMasters (2007) andHoek et al. (2008). Application of the method to pollutants otherthan NO2, especially non-primary ones, is more complicated andmay require incorporation of chemical reaction calculations and agood pollutant background estimation. The advantages of highresolution spatiotemporal exposure are many and to our opinionworth the efforts. Exposure at a given location can be integrated forany desired period e.g., produce daily means for acute healthoutcome studies, weekly or monthly means for pregnancy studies,and annual or longer termmeans for chronic exposure. In addition,refining personal exposure by tracking a person’s mobility is made

180 200

630

640

650

660

670

680

690

700

−8.2

−3.6

1 5.6

a180 200

2.71

4.74

6.77

8.8

b180

0.17

0.6

c

Fig. 6. Like Fig. 5 but

possible thanks to the option to calculate a person’s exposure as atime-weighted mean of the exposure at home, work, etc. The hightemporal resolution may also enable investigations of the possibleexistence of threshold effects in the impact of air pollutants onhealth.

The proposed method’s performance was compared to that ofthree benchmark methods. The benchmarks’s performance mea-sures are given in the appendix in the electronic SupportingMaterial. The exposure estimated by our data-driven model out-performs by a very large margin the exposure based on assigning toeach location at each time point the mean of the simultaneousobservations. This shows that the additional variability provided bythe proposed method is indeed contributing a beneficial spatialvariability and not additional spatial noise. Similar advantage wasachieved in the comparison with the exposure produced by linearLURmodels computed for each time point and using as explanatory

200

1.03

1.46

180 200

0 0.3

0.6

0.89

d180 200

0.27

0.5

0.74

0.98

e

for the 17:00 h.

20

40

60N

O2 (

ppb

) a

20

40

60

80

NO

2 ( pp

b ) b

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

20

40

60

80

NO

2 ( pp

b ) c

Fig. 7. The time series of observed and cross-validated modelled NO2 concentrations (green circles and black crosses, respectively) in station Antokolski (179.6 East, 665.7 North) in2009. (a) The 07:30 time points. (b) The 12:00 time points. (c) The 17:00 time points. (For interpretation of the references to colour in this figure legend, the reader is referred to theweb version of this article.)

Fig. 8. The long termmean NO2 concentration in the study area. The concentrations are colouthe time points in 2009 at 07:30. (b) Mean values for of all the time points in 2009 at 12:00references to colour in this figure legend, the reader is referred to the web version of this

Table 2Statistical measures of model performance in producing the half-hourly NO2 ob-servations in 2011 at grid cells in which transportation stations reside. The tableprovides the number of observation and modelled value pairs (n), the meanobserved concentration value (Mean), the mean bias (MB), mean error (ME), nor-malised mean error (NME), the ratio of modelled values within factor of two fromtheir corresponding observations (FAC2) and the coefficient of determination (R2).


07:30 1723 34.2 �7.8 10.1 30 0.88 0.2012:00 1681 19.8 �6.6 7.9 40 0.71 0.5617:00 1705 20.0 �6.1 7.9 40 0.71 0.60All 5109 24.7 �7.0 8.7 35 0.77 0.58


variables the same traffic emission proxy used by our model andthe wind speed. This justifies our approach of embedding adispersion model in a regression which is nonlinear and thus moredemanding in terms of computing resources. The comparison of theproposed model’s cross-validated performance measures (Tables 1and 3) to the corresponding ones obtained by a spatial interpolation(Tables 9 and 11 in the Supporting Material appendix) is quitebalanced. However, the proposed model clearly outperforms theinterpolation in estimating the exposure at the road-side stationswhose data were not used by either scheme (compare Tables 2 and

r-coded, with the colour key given by the colourbars in ppb units. (a) Mean values of all. (c) Mean Values for of all the time points in 2009 at 17:00. (For interpretation of thearticle.)

0 10 20 30 400

5

10

15

20

25

30

35

40

Observed NO2 ( ppb )

Mod

elle

d N

O 2 ( pp

b )

a

0 5 10 150

5

10

15


Mod

elle

d N

O 2 ( pp

b )

b

0 5 10 150

5

10

15


Mod

elle

d N

O 2 ( pp

b )

c

Fig. 9. Scatter plots of the annual mean observed vs. cross-validated modelled NO2

concentrations in the ambient monitoring stations during 2009. (a) Rush hour, 07:30.(b) Noon, 12:00. (c) Evening rush hour, 17:00.

Table 3Statistical measures of model performance in producing the mean 2009 observationvalues using the corresponding means of the cross-validated half-hourly modelledNO2 concentrations. The table provides the number of observation and modelledvalue pairs (n), the mean observed concentration value (Mean), the mean bias (MB),mean error (ME), normalised mean error (NME), the ratio of modelled values withinfactor of two from their corresponding observations (FAC2) and the coefficient ofdetermination (R2).


07:30 25 19.5 �0.27 4.2 21 1.00 0.5412:00 25 11.3 �0.09 2.2 20 0.96 0.4917:00 25 12.3 �0.13 2.1 17 0.96 0.56All 25 14.4 �0.16 2.5 18 1.00 0.62

Table 4Statistical measures of model performance in producing the mean 2011 observationvalues at grid cells in which transportation stations reside using the correspondingmeans of the half-hourlymodelled concentrations. The table provides the number ofobservation-modelled value pairs (n), the mean observed concentration value(Mean), the mean bias (MB), mean error (ME), normalised mean error (NME), theratio of modelled values within factor of two from their corresponding observations(FAC2) and the coefficient of determination (R2).


07:30 6 34.2 �7.8 7.8 23 1.0 0.1412:00 6 19.8 �6.7 6.7 34 1.0 0.4417:00 6 20.0 �6.2 6.3 31 1.0 0.46All 6 24.7 �7.0 7.0 28 1.0 0.22


4 for the proposed model with Tables 10 and 12 in the SupportingMaterial for the interpolation). This advantage is a quantitativemanifestation of the more physically-sensible approach of ourmodelling and the higher spatial variability in the maps which itproduces (e.g., compare Figs. 3 to 5 in the Supporting Dataappendix).

There are potentially many options to enhance our model’sperformance, which can be relatively easy to incorporate. Refiningthe spatial grid resolution is straightforward and may yield betterestimation in the areas where the traffic network is dense. Pointsource emissions from grid cells where large industrial plants existcan be integrated in the dispersion scheme and account for what isusually the second largest source of air pollution in modern urbancities. The traffic emission proxies could be refined using bettertraffic volume estimations and incorporating also vehicle speedsinformation. In the near future comprehensive specification oftraffic intensity in a road network may become possible thanks tothe ubiquity of GPS mounted smartphone. A more challenging andpromising possible improvement, one that is actively sought by theauthors, is incorporating a recursive scheme to minimise themodelling errors using information from past time points.

The advantage of the proposed scheme comes from the inte-gration of various data sources via a regression scheme. Thepossible difficulty in obtaining all these components is a disad-vantage. Traffic models, providing proxies for emissions, are avail-able for many regions. For example, Henderson et al. (2007) usedoutput of traffic model similar to the one we used (the EMME/2) asa traffic density input in their LUR. Data from cellular telephonenetworks is a very promising traffic information source which willsoon be available almost anywhere. Lack of monitoring data at asufficiently high spatial and temporal resolution is a more severelimitation. Our methodology relies on detailed source specificationto provide the high spatial variability. However, to properly trans-form this input into pollutant concentrations a sample of obser-vations that represents well the concentration distribution in thearea must be available. Many urban areas, especially in developingcountries, do not have adequate air quality monitoring network.Many examples of such networks do exist however (e.g., Hendersonet al., 2007; Beelen et al., 2010; Beevers et al., 2012) and theemergence of cheapmobile monitoring devices may result in denseair quality network becomingmore common. The lesser accuracy ofsuch devices compared to established monitoring is less of aconcern while incorporated in our method because their data arenot used directly as in an interpolation scheme.

Additional issue to bear in mind is that running a nonlinearregression may be costly in terms of computer resources. Forexample, in our very simple scheme evaluating the full grid con-centration field at each time point took approximately 7 s on adesktop machine with an Intel� Core2 processor and required atleast 6 GB of random access memory (RAM). Carrying out carefulcross-validation is of prime importance for a nonlinear regression


scheme with potentially many model parameters. In our case, withonly five model parameters and no more than 25 observations,producing the cross-validation results for each time point tookabout 55 s. Adding complexity, and model parameters, or refiningthe grid may increase significantly the computation time and theRAM requirements. Moreover, a hidden danger with complexmodels is that they might overfit the observations. An inner cross-validation scheme, separate from the final cross-validation to testthe model performance, may be needed in such cases in order toarrive at the optimal model parameters. The general cross valida-tion function (Golub et al., 1979) has been successfully used in thepast for that purpose to control the fit of nonlinear schemes toobservations (Yuval, 2000). We believe that the technical obstaclesin incorporating optimisation schemes in the field of air pollutionexposure can be overcome and enhance our capabilities in carryingout air pollution epidemiological studies.

Acknowledgements

This study has been supported by the Technion Centre ofExcellence in Exposure Sciences and Environmental Health. Theauthors would like to thank two anonymous reviewers for theirconstructive comments.

Appendix A. Supplementary data

Supplementary data related to this article can be found at http://dx.doi.org/10.1016/j.atmosenv.2013.06.005.

References

Arain, M.A., Blair, R., Finkelstein, N., Brook, J.R., Sahsuvaroglu, T., Beckereman, B.,Zhang, L., Jerret, M., 2007. The use of wind fields in a land use regression modelto predict air pollution concentrations for health exposure studies. AtmosphericEnvironment 41, 3453e3464.

Astitha, M., Kallos, G., Katsafados, P., 2008. Air pollution monitoring in the Medi-terranean region: analysis and forecasting of episodes. Atmospheric Research89, 358e364.

Beelen, R., Voogt, M., Duyzer, J., Zandveld, P., Hoek, G., 2010. Comparison of theperformances of land use regression modelling and dispersion modelling inestimating the small-scale variations in long-term air pollution concentrationsin a Dutch urban area. Atmospheric Environment 44, 4614e4621.

Beevers, S.D., Kitwiroon, N., Williams, M.L., Carslaw, D.C., 2012. One way coupling ofCMAQ and a road source dispersion model for fine scale air pollution pre-dictions. Atmospheric Environment 59, 47e58.

Borrego, C., Monteiro, A., Ferreira, J., Noraes, M.R., Carvalho, A., Ribeiro, I.,Miranda, A.I., Moreiram, D.M., 2010. Modelling the photochemical pollutionover the metropolitan area of Porto Alegre, Brazil. Atmospheric Environment44, 370e380.

Byun, D.W., Ching, J.K.S., 1999. Science Algorithms of the EPA Models-3 CommunityMultiscale Air Quality (CMAQ) Modeling System. U.S. Environmental ProtectionAgency, Office of Research and Development. EPA/600/R-99/030.

Cambridge Systematics Inc., 2008. Tel-Aviv activity schedule travel demand modelsystem: a tour-based approach. In: Final Report Prepared for the Israeli Ministryof Transport.

Cyrus, J., Hochadel, M., Gehring, U., Hoek, G., Diegmann, V., Brunekreef, B.,Heinrich, J., 2005. GIS-based estimation of exposure to particulate matter and

NO2 in an urban area: stochastic versus dispersion modeling. EnvironmentalHealth Perspectives 113, 987e992.

Golub, G.H., Heath, M., Wahba, G., 1979. Generalized cross-validation as a methodfor choosing a good ridge parameter. Technometrics 21, 215e223.

Gulliver, J., Briggs, D., 2011. STEMS-Air: a simple air pollution dispersion model forcity-wide exposure assessment. Science of the Total Environment 409, 2419e2429.

Han, X., Naeher, L.P., 2006. A review of traffic-related pollution exposure assessmentstudies in the developing world. Environmental International 32, 106e120.

Henderson, S.B., Beckerman, B., Jerrett, M., Brauer, M., 2007. Application of land useregression to estimate long-term concentrations of traffic-related nitrogenoxides and fine particulate matter. Environmental Science and Technology 41,2422e2428.

Hoek, G., Beelen, R., de Hoogh, K., Vienneau, D., Gulliver, J., Fischer, P., Briggs, D.,2008. A review of land-use regression models to assess spatial variation ofoutdoor air pollution. Atmospheric Environment 42, 7561e7578.

INRO inc., 2012. http://www.inrosoftware.com.Jerret, M., Arain, A., Kanaroglou, P., Beckerman, B., Potoglou, D., Sahsuvaroglou, T.,

Morisson, J., Giovis, C., 2005. A review and evaluation of intraurban air pollutionexposure models. Journal of Exposure Analysis and Environmental Epidemi-ology 15, 185e204.

Lee, P., Talbott, E.O., Roberts, J.M., Catov, J.M., Bilonick, R.A., Stone, R.A., Sharma, R.K.,Ritz, B., 2012. Ambient air pollution exposure and blood pressure changesduring pregnancy. Environmental Research 117, 46e53.

Maynard, D., Coull, B.A., Gryparis, A., Schwartz, J., 2007. Mortality risk associatedwith short-term exposure to traffic particles and sulfates. Environmental HealthPerspectives 115, 751e755.

McHugh, C.A., Carruthers, D.J., Edmunds, H.A., 1997. ADMS-Urban: an air qualitymanagement system for traffic, domestic and industrial pollution. InternationalJournal of Environment and Pollution 18, 666e674.

Romilly, P., 1999. Substitution of bus for car travel in urban Britain: an economicevaluation of bus and car exhaust emission and other costs. TransportationResearch D, 109e125.

Ryan, P.H., LeMasters, G.K., 2007. A review of land-use regression models forcharacterizing intraurban air pollution exposure. Inhalation Toxicology 19(Suppl. 1), 127e133.

Simon, H., Baker, K.R., Phillips, S., 2012. Compilation and interpretation pf photo-chemical model performance statistics published between 2006 and 2012.Atmospheric Environment 61, 124e139.

Son, J., Bell, M., Lee, T., 2010. Individual exposure to air pollution and lung functionin Korea: spatial analysis using multiple exposure approaches. EnvironmentalExposure 110, 739e749.

Su, J.G., Brauer, M., Ainslie, B., Steyn, D., Larson, T., Buzzelli, M., 2008. An innovativeland use regression model incorporating meteorology for exposure analysis.Science of the Total Environment 390, 520e529.

Szpiro, A.A., Sampson, P.D., Sheppard, L., Lumley, T., Adar, S.D., Kaufman, J.D., 2010.Predicting intra-urban variation in air pollution concentrations with complexspatio-temporal dependencies. Environmetrics 21, 606e631.

Matlab Version 7.9.0.529 (R2009b), 2010. The MathWorks Inc., Natick,Massachusetts.

Van den-Hooven, E.H., Pierik, F.H., Van-Ratihgen, S.W., Zandveld, O.Y.J., Meijer, E.W.,Hofman, A., Miedema, H.M.E., Jaddoe, V.W.V., de-Kluizenaar, Y., 2012. Airpollution exposure estimation using dispersion modelling and continuousmonitoring data in a prospective birth cohort study in the Netherlands. Envi-ronmental Health 11, 9.

Wilton, D., Szpiro, A., Gould, T., Larson, T., 2010. Improving spatial concentrationestimates of nitrogen oxides using hybrid meteorological dispersion/land useregression model in Los Angeles, CA and Seattle, WA. Science of the TotalEnvironment 408, 1120e1130.

Yuval, 2000. Neural network training for prediction of climatological time series,regularized by minimization of the Generalized Cross Validation function.Monthly Weather Review 128, 1456e1473.

Yuval, Broday, D.M., 2006. High-resolution spatial patterns of long-term meanconcentrations of air pollutants in Haifa Bay area. Atmospheric Environment 40,3653e3664.

Yuval, Broday, D.M., 2009. Assessing the long term impact of power plant emissionson regional air pollution using extensive monitoring data. Journal of Environ-mental Monitoring 11, 425e433.



http://refhub.elsevier.com/S1352-2310(13)00459-7/sref1





















































http://www.inrosoftware.com


































































Documents

Data-driven nonlinear optimisation of a simple air ...cee.faculty-ms.technion.ac.il/wp-content/.../05/370...Department of Civil and Environmental Engineering, Technion, Israel Institute