    131 Hartwell Avenue
    Lexington, Massachusetts 02421-3126
    USA
    Tel: +1 781 761-2288 Fax: +1 781 761-2299
    www.aer.com

    FINAL REPORT

    El Paso Ozone and PM2.5 Background and Totals Trend Analysis

    TCEQ Contract No. 582-15-50414
    Work Order No. 582-18-81763-07

    Revision 2.0

    Prepared by:

    Amy McVey, Richard Pernak, Jennifer Hegarty, and Matthew Alvarado
    Atmospheric and Environmental Research, Inc. (AER)
    131 Hartwell Ave. Lexington, MA 02466

    Correspondence to: [email protected]

    Prepared for:

    Erik Gribbin
    Texas Commission on Environmental Quality
    Air Quality Division
    Building E, Room 342S
    Austin, Texas 78711-3087

    June 29, 2018



    Work Order No. 582-18-81763-07 Final Report


    Document Change Record

    Revision    Revision Date    Remarks
    0.1         21 May 2018      Internal draft for review
    1.0         1 June 2018      Draft version submitted to TCEQ
    2.0         29 June 2018     Final version submitted to TCEQ


    TABLE OF CONTENTS

    Executive Summary
    1. Introduction
    1.1 Project Objectives
    1.2 Purpose and Background
    1.2.1 Trends in O3 and PM2.5
    1.2.2 Regional Background Concentrations of O3 and PM2.5
    1.2.3 Synoptic- and Urban-scale Meteorological Controls on O3
    1.3 Report Outline
    2 Task 2: Effects of Meteorology on O3 and PM2.5 Trends
    2.1 Input Data and Processing
    2.1.1 TCEQ Monitor Data
    2.1.2 IGRA Radiosonde Data
    2.1.3 NCDC Integrated Surface Hourly Data
    2.1.4 NAM-12 Meteorological Data
    2.1.5 HYSPLIT Back Trajectories
    2.2 Generalized Additive Model
    2.2.1 GAMs Description
    2.2.2 MDA8 GAM Results
    2.2.3 PM2.5 GAMs
    2.2.4 Cross Validation Analysis
    2.3 GAMs for Background O3
    2.4 Meteorologically Adjusted Trends of O3 and PM2.5
    3 Task 3: Background O3
    3.1 Daily Estimates of Regional Background O3 (TCEQ Method)
    3.2 Temporal Trends of Background O3
    3.3 Alternative Methods to Determine Regional Background O3
    3.3.1 Determining Background O3 with PCA
    4 Task 4: The Role and Importance of Synoptic or Mesoscale Meteorological Conditions in Creating High O3 and PM2.5 Days
    4.1 Synoptic Map Type Analysis
    4.1.1 Technical Method and Results
    4.1.2 Discussion
    4.2 Urban-Scale Meteorological Predictors of O3
    4.2.1 Logistic Regression Approach
    4.2.2 Results and Discussion
    5 Quality Assurance Steps and Reconciliation with User Requirements
    5.1 Task 2: Development of GAMs
    5.2 Task 3: Background O3 and PM2.5
    5.3 Task 4: Synoptic and Mesoscale Controls of O3 and PM2.5
    6 Conclusions
    7 Recommendations for Future Study
    8 References


    Appendix A File Descriptions and Process Flow
    A.1 Process Flow
    A.2 File Descriptions
    A.2.1 Input data (./data/)
    A.2.2 Data Processing Scripts (./scripts/)
    A.2.3 HYSPLIT Files (./HYSPLIT/)
    A.2.4 Processed Input Data Files in CSV Format (./csv_files/)
    A.2.4.1 Intermediate CSV Files (./csv_files/Intermediate/)
    A.2.4.2 Final CSV Files (./csv_files/final/)
    A.2.5 GAM (./full_gam_fits/)
    A.2.6 Synoptic Map Typing (./MAPTYPE/)
    A.2.6.1 Output Files (./MAPTYPE/logReg/)
    A.2.7 PCA Files (./PCA/)


    List of Figures

    Figure 1. Annual average MDA8 O3 for the combined dataset of all monitoring sites (COMB) and just the sites within El Paso, Texas (ELP)
    Figure 2. Smooth functions for the total MDA8 O3 COMB GAM fit. The y-axis scale is the scale of the “linear predictor”, i.e., the deviation of the natural logarithm of the MDA8 O3 in ppbv from its mean value
    Figure 3. GAM evaluation plots for total MDA8 O3 COMB
    Figure 4. Smooth functions for the total MDA8 O3 ELP GAM fit. The y-axis scale is the scale of the “linear predictor”, i.e., the deviation of the natural logarithm of the MDA8 O3 in ppbv from its mean value
    Figure 5. GAM evaluation plots for total MDA8 O3 ELP
    Figure 6. Smooth functions for the total PM2.5 UTEP GAM fit
    Figure 7. GAM evaluation plots for total PM2.5 UTEP
    Figure 8. Smooth functions for the background MDA8 O3 GAM fit. The y-axis scale is the scale of the “linear predictor”, i.e., the deviation of the natural logarithm of the MDA8 O3 in ppbv from its mean value
    Figure 9. GAM evaluation plots for background MDA8 O3
    Figure 10. Original (dashed lines) and meteorologically adjusted (solid lines) annual averages for total and background O3 for COMB. Equations for the OLS linear regressions are shown on the plot as well
    Figure 11. Original (dashed lines) and meteorologically adjusted (solid lines) annual averages for total O3 for ELP. Equations for the OLS linear regressions are shown on the plot as well
    Figure 12. Original (dashed lines) and meteorologically adjusted (solid lines) annual averages for total PM2.5 for UTEP. Equations for the OLS linear regressions are shown on the plot as well
    Figure 13. Monitoring sites in the New Mexico, Juárez, and El Paso area provided by the TCEQ
    Figure 14. Box-and-whisker plots of seasonal (top) and annual (bottom) trends in the background MDA8 O3 for COMB estimated using the TCEQ method. The line is the mean. Box edges show the 25th and 75th percentiles (inter-quartile range, or IQR), the whiskers show the data range up to ±1.5*IQR, and the circles show data points outside that range
    Figure 15. PCA-derived background ozone applied over the entire ozone season dataset (x-axis, 10 years and all sites) compared to our original TCEQ method of determining background ozone (y-axis)
    Figure 16. Monthly (top) and yearly (bottom) background ozone (ppbv) as derived using the PCA method
    Figure 17. Synoptic map types determined from 850 mbar geopotential height fields from the 12-km resolution NAM-12 meteorology to drive HYSPLIT instead of the 32-km resolution data from NARR using the method of Hegarty et al. (2007)
    Figure 18. Relative frequency of synoptic map types in each month
    Figure 19. Box and whisker plots of the distributions of background MDA8 O3 COMB (ppbv, top) and total MDA8 O3 COMB (ppbv, bottom). The boundaries of the boxes are the 25th and 75th percentiles, and the whiskers cover the range of the data or all values within 1.5 times the interquartile range (IQR) of the box, whichever is smaller
    Figure 20. Box and whisker plots of the distributions of total MDA8 O3 ELP. The boundaries of the boxes are the 25th and 75th percentiles, and the whiskers cover the range of the data or all values within 1.5 times the interquartile range (IQR) of the box, whichever is smaller
    Figure 21. Probability of the total MDA8 O3 COMB exceeding 70 ppbv as a function of afternoon mean temperature (°C), daily wind speed (m/s), and synoptic type (as defined in Section 4.1)
    Figure 22. Probability of the total MDA8 O3 ELP exceeding 70 ppbv as a function of afternoon mean temperature (°C), daily wind speed (m/s), and synoptic type (as defined in Section 4.1)
    Figure 23. Probability of the background MDA8 O3 exceeding 55 ppbv as a function of afternoon mean temperature (°C), daily wind speed (m/s), and synoptic type (as defined in Section 4.1)
    Figure 24. Flowchart showing the processing from the original data sources (green boxes) to the final CSV file (red box) that is used as input for the GAM scripts
    Figure 25. Flowchart showing the processing from the input CSV file generated at the end of Figure 24 (red box) to the GAM output files (light green box)


    List of Tables

    Table 1. Projected schedule for TCEQ Work Order No. 582-18-81763-07
    Table 2. Air monitoring sites provided by the TCEQ
    Table 3. PM2.5 data coverage by site for the period of study
    Table 4. IGRA site
    Table 5. NCDC surface site
    Table 6. Meteorological parameters used in the GAMs. The column name is given in italics
    Table 7. Cross validation results
    Table 8. Original and meteorologically adjusted trends for GAM fits
    Table 9. AQS site numbers for the selected background sites
    Table 10. Percentage of observations above the criteria chosen to represent “high” values of total and background (BKG) MDA8 O3 during the O3 season (Mar.-Oct.). The chosen criteria are in parentheses in the first column
    Table 11. Deviance explained (%) and UBRE score (unitless) for the logistic models for total and background O3 COMB and total ELP


    List of Acronyms

    AER – Atmospheric and Environmental Research
    BG – Background
    CAMx – Comprehensive Air Quality Model with Extensions
    CMAQ – Community Multi-scale Air Quality Model
    COMB – COMBined dataset for El Paso and Ciudad Juárez
    CSV – Comma Separated Value
    CTM – Chemical Transport Model
    ELP – Dataset for El Paso only
    GAM – Generalized Additive Model
    GEO-CAPE – GEOstationary Coastal and Air Pollution Events
    GLM – Generalized Linear Model
    HYSPLIT – Hybrid Single Particle Lagrangian Integrated Trajectory Model
    IQR – Inter-Quartile Range
    LIDORT – Linearized Discrete ORdinate Radiative Transfer model
    MDA8 – maximum daily 8-hour average ozone
    MODIS – Moderate Resolution Imaging Spectroradiometer
    MT – Map Type
    NAAQS – National Ambient Air Quality Standards
    NCEP – National Centers for Environmental Prediction
    O3 – Ozone
    OE – Optimal Estimation
    OLS – Ordinary Least Squares
    OMI – Ozone Monitoring Instrument
    OSSE – Observing System Simulation Experiment
    PC1 – Principal Component 1
    PCA – Principal Component Analysis
    PM2.5 – Particulate Matter with diameter below 2.5 microns
    ppbv – Parts Per Billion by Volume
    QAPP – Quality Assurance Project Plan
    RH – Relative Humidity
    TCEQ – Texas Commission on Environmental Quality
    TES – Tropospheric Emission Spectrometer
    UBRE – Un-Biased Risk Estimator
    UTC – Coordinated Universal Time


    Executive Summary

    The purpose of this project was (a) to determine the effects of meteorology on trends in O3 and PM2.5 by developing new generalized additive models (GAMs) that relate O3 and PM2.5 concentrations to selected meteorological variables for El Paso, Texas, (b) to estimate the regional background concentrations of O3 and PM2.5 for El Paso, and (c) to investigate the synoptic and urban-scale meteorological conditions that are associated with high concentrations of background and total O3 and PM2.5 in El Paso.

    As the formation and loss of pollutants such as O3 and PM2.5 are strongly influenced by meteorology, inter-annual trends in these pollutants represent a combination of changes due to inter-annual variability in meteorology and changes due to air quality policy actions and other economic and societal trends. Statistical techniques are thus used to account for the effect that meteorological variations have on the trends of O3 and PM2.5 so that the adjusted trends can be used to assess the effectiveness of air quality policy. A common approach to performing this “meteorological adjustment” is to use a generalized additive model (GAM) (Wood, 2006) to describe the potentially nonlinear relationship between measured urban O3 (maximum daily 8-hour average, or MDA8) or PM2.5 (daily average) concentrations and selected meteorological variables taken from an array of candidate meteorological variables (e.g., Camalier et al., 2007). While TCEQ has had such a model developed for other urban areas in Texas (Alvarado et al., 2015), no models of this type have been developed for the El Paso urban area.
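    A minimal sketch of the meteorological adjustment idea follows. It fits an additive model for ln(MDA8) by backfitting, using a crude binned-mean smoother as a stand-in for the penalized regression splines that GAM software (e.g., the R package mgcv described by Wood, 2006) would fit, and then divides the fitted meteorological effect out of each daily value before forming annual means and an OLS trend. The function names, bin widths, and synthetic data are hypothetical, and this is not necessarily the adjustment procedure of Camalier et al. (2007) or the one used in this report.

```python
import math
from collections import defaultdict


def bin_smooth(x, r, width):
    """Crude smoother: mean of r within fixed-width bins of x.

    A stand-in for the penalized regression splines a GAM package would fit.
    """
    keys = [math.floor(xi / width) for xi in x]
    sums, counts = defaultdict(float), defaultdict(int)
    for k, ri in zip(keys, r):
        sums[k] += ri
        counts[k] += 1
    return [sums[k] / counts[k] for k in keys]


def fit_additive(y, covariates, widths, n_iter=20):
    """Backfit ln(y) = mu + sum_j f_j(x_j) + error; return mu and the centered f_j."""
    z = [math.log(v) for v in y]
    n = len(z)
    mu = sum(z) / n
    f = [[0.0] * n for _ in covariates]
    for _ in range(n_iter):
        for j, (x, width) in enumerate(zip(covariates, widths)):
            # Partial residual: remove the intercept and every other smooth term
            partial = [z[i] - mu - sum(f[k][i] for k in range(len(f)) if k != j)
                       for i in range(n)]
            fj = bin_smooth(x, partial, width)
            mean_fj = sum(fj) / n
            f[j] = [v - mean_fj for v in fj]  # center each term so mu stays the mean
    return mu, f


def met_adjust(y, covariates, widths):
    """Remove the fitted meteorological effect (multiplicative on the ppbv scale)."""
    _, f = fit_additive(y, covariates, widths)
    return [yi * math.exp(-sum(fj[i] for fj in f)) for i, yi in enumerate(y)]


def annual_means(years, values):
    """Average daily values within each year; return (sorted years, means)."""
    groups = defaultdict(list)
    for yr, v in zip(years, values):
        groups[yr].append(v)
    yrs = sorted(groups)
    return yrs, [sum(groups[yr]) / len(groups[yr]) for yr in yrs]


def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


if __name__ == "__main__":
    # Synthetic data: O3 depends only on temperature, and later years run warmer,
    # so raw annual means trend upward with no change in emissions.
    years, temps, o3 = [], [], []
    for yr in range(2007, 2017):
        for day in range(30):
            t = 20 + day % 10 + (yr - 2007) // 3  # integer deg C
            years.append(yr)
            temps.append(t)
            o3.append(math.exp(3.6 + 0.04 * (t - 25)))  # ppbv
    adjusted = met_adjust(o3, [temps], [1.0])
    yrs, raw = annual_means(years, o3)
    _, adj = annual_means(years, adjusted)
    print(f"raw trend:      {ols_slope(yrs, raw):+.3f} ppbv/yr")
    print(f"adjusted trend: {ols_slope(yrs, adj):+.3f} ppbv/yr")
```

    Because the synthetic O3 depends only on temperature, the adjusted annual means are flat while the raw means rise with the warmer later years. Leaving time out of the model is a simplification: if a meteorological covariate itself trends over time, it can absorb part of a real emissions-driven trend, which is the over-fitting risk noted in Section 2.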

    Daily surface concentrations of O3 and PM2.5 in urban areas can be considered as the sum of O3 and PM2.5 produced within the urban area (either through primary emissions of PM2.5 or through secondary chemical production of O3 and PM2.5) and a “regional background” that is transported into the urban area by the large-scale circulation. Accurate estimates of this regional background are critical to determining the potential for further reductions in O3 and PM2.5 concentrations in urban areas through control of local emissions of primary PM2.5 and the precursors of O3 and PM2.5.

    We also performed basic research into what synoptic- and urban-scale meteorological conditions are important in explaining and forecasting high concentrations of O3 and PM2.5 in the El Paso urban area. The goal was to identify necessary and/or sufficient meteorological conditions that lead to NAAQS exceedances or other high concentration events (e.g., above 90th percentile) for these pollutants. Meteorological conditions leading to both high regional background levels and high total levels of O3 and PM2.5 were identified. Output from meteorological reanalyses was used in this task, as well as the synoptic map typing method of Hegarty et al. (2007).
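    As a minimal sketch of how high-concentration days can be tallied by synoptic map type, the functions below compute a 90th-percentile cut point and the fraction of days above a chosen threshold within each map type. The 70 ppbv (total) and 55 ppbv (background) MDA8 O3 criteria are those stated in Section 1.2.3; the record layout and its keys (`synoptic_type`, `total_mda8`, `bkg_mda8`) are hypothetical. `statistics.quantiles` uses its default "exclusive" method here, so other percentile definitions can give slightly different cut points.

```python
from collections import defaultdict
from statistics import quantiles

TOTAL_HIGH_PPBV = 70.0       # "high" total MDA8 O3 criterion
BACKGROUND_HIGH_PPBV = 55.0  # "high" background MDA8 O3 criterion


def percentile_90(values):
    """90th percentile: the last of the nine decile cut points."""
    return quantiles(values, n=10)[-1]


def high_fraction_by_type(days, key, threshold):
    """Fraction of days in each synoptic type whose day[key] exceeds threshold.

    `days` is a list of dicts with a hypothetical "synoptic_type" entry;
    None stands for days not assigned to any map type.
    """
    counts = defaultdict(lambda: [0, 0])  # type -> [n_high, n_days]
    for day in days:
        tally = counts[day["synoptic_type"]]
        tally[1] += 1
        if day[key] > threshold:
            tally[0] += 1
    return {t: n_high / n_days for t, (n_high, n_days) in counts.items()}


# Hypothetical example days
days = [
    {"synoptic_type": 3, "total_mda8": 74.0, "bkg_mda8": 57.0},
    {"synoptic_type": 3, "total_mda8": 62.0, "bkg_mda8": 50.0},
    {"synoptic_type": 1, "total_mda8": 58.0, "bkg_mda8": 45.0},
    {"synoptic_type": None, "total_mda8": 71.5, "bkg_mda8": 56.0},
]
print(high_fraction_by_type(days, "total_mda8", TOTAL_HIGH_PPBV))
# prints {3: 0.5, 1: 0.0, None: 1.0}
```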

    The GAMs relating meteorological variables to the total MDA8 O3 for El Paso generally explain 51-52% of the deviance (i.e., variability), consistent with the results of Camalier et al. (2007). These GAMs also generally show good fits, with normally distributed residuals and little dependence of the residual variance on the predicted value. In contrast, the GAM relating meteorological variables to the total daily average PM2.5 for El Paso explains only 10% of the deviance and shows a much poorer fit.
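    For a GAM with a Gaussian error distribution, the percentage of deviance explained equals one minus the ratio of the residual sum of squares to the sum of squares about the mean (the deviance of an intercept-only model). A minimal sketch, assuming a Gaussian family on the scale the model was fit on; for other families (e.g., the binomial logistic models discussed below) the deviance comes from the family's likelihood instead:

```python
def deviance_explained(observed, fitted):
    """Percent deviance explained for a Gaussian model: 100 * (1 - RSS / TSS).

    For the O3 GAMs, `observed` and `fitted` would be on the fitted scale
    (assumed here to be the natural logarithm of MDA8).
    """
    mean_obs = sum(observed) / len(observed)
    null_deviance = sum((o - mean_obs) ** 2 for o in observed)  # intercept-only model
    resid_deviance = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    return 100.0 * (1.0 - resid_deviance / null_deviance)


print(f"{deviance_explained([1.0, 2.0, 3.0, 4.0], [1.5, 1.5, 3.5, 3.5]):.1f}%")  # prints 80.0%
```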

    After meteorological adjustment via the GAMs fit to total and background O3 and total PM2.5, none of the 2007-2016 trends in the pollutant metrics was statistically significant at the 95% confidence level. The adjusted trend in total O3 for El Paso sites only was -0.07 ± 1.31 ppb/yr, while the regional background trend was 0.45 ± 0.45 ppb/yr. The adjusted trend in total PM2.5 was -0.340 ± 2.063 µg/m3/yr.
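    Assuming the ± values are half-widths of two-sided 95% confidence intervals on ordinary least squares (OLS) slopes of annual means against year (OLS fits to annual averages appear in Figures 10-12), the significance test can be sketched as below: a trend is significant only if its interval excludes zero. The fixed critical value 2.306 is Student's t for 8 degrees of freedom (ten annual means minus two fitted parameters) and is an assumption that must be changed for other sample sizes.

```python
import math

T_CRIT_95_DF8 = 2.306  # two-sided 95% Student t value for 8 degrees of freedom


def ols_trend_ci(years, values, t_crit=T_CRIT_95_DF8):
    """Return (slope, half_width, significant) for an OLS trend of values on years."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values)) / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(years, values))
    std_err = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    half_width = t_crit * std_err
    return slope, half_width, abs(slope) > half_width


if __name__ == "__main__":
    years = list(range(2007, 2017))
    # Hypothetical annual means (ppbv) that zig-zag around 50 with no real trend
    annual = [51.0, 49.0] * 5
    slope, hw, sig = ols_trend_ci(years, annual)
    print(f"{slope:+.3f} ± {hw:.3f} ppbv/yr, significant: {sig}")
```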

    The principal component analysis (PCA)-based method of Langford et al. (2009) does not give reasonable values for background O3 in El Paso. These PCA-based background estimates are correlated with the values derived using the TCEQ method and show similar seasonal and inter-annual variability, but they include unphysically large and occasionally negative values.

    Logistic regression models were used to estimate how the probability of high O3 and PM2.5 events in the urban area changed with afternoon mean temperature, daily average wind speed, and synoptic type. These predictors were chosen because they had been shown to be important in our GAMs and in our previous analysis of the synoptic types. We used these probability models to investigate “necessary” (defined as giving a probability of a high event greater than 20%) and “sufficient” (defined as giving a probability of a high event greater than 80%) criteria for high O3 and PM2.5 events. The logistic regression models explained only 28% of the variability in the results, indicating a poor fit. As expected, the probabilities of high total and background O3 events were associated with high temperatures and low wind speeds, and the probability of total O3 above 70 ppbv was always greater than the probability of background O3 above 55 ppbv for similar conditions in El Paso. Synoptic Type 3, with flow from the Gulf turning northward, and the stagnant conditions associated with no synoptic type had a larger fraction of days with high total and background O3 than the other synoptic types. While the Gulf flow in Synoptic Type 3 does not directly impact El Paso, this flow is associated with low wind conditions in El Paso.
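    In a logistic model, the probability of a high event is p = 1 / (1 + exp(-eta)), where eta is the model's linear predictor. The sketch below shows how the 20% ("necessary") and 80% ("sufficient") criteria can be applied to such a probability. It assumes a purely linear eta for high total O3, and the coefficients and synoptic-type offsets are hypothetical placeholders, not the values fitted in Section 4.2.

```python
import math

# Hypothetical coefficients for illustration only; NOT the fitted values from this study.
INTERCEPT = -20.0
TEMP_COEF = 0.6    # per deg C of afternoon mean temperature
WIND_COEF = -0.8   # per m/s of daily mean wind speed
TYPE_OFFSET = {1: 0.0, 2: -0.5, 3: 1.0, None: 0.8}  # None = no synoptic type assigned

NECESSARY_P = 0.2   # "necessary" criterion from the text
SUFFICIENT_P = 0.8  # "sufficient" criterion from the text


def p_high(temp_c, wind_ms, synoptic_type):
    """Logistic probability of a high-O3 day: 1 / (1 + exp(-eta))."""
    eta = (INTERCEPT + TEMP_COEF * temp_c + WIND_COEF * wind_ms
           + TYPE_OFFSET[synoptic_type])
    return 1.0 / (1.0 + math.exp(-eta))


def criteria(p):
    """Return (meets_necessary, meets_sufficient) for a predicted probability."""
    return p > NECESSARY_P, p > SUFFICIENT_P


if __name__ == "__main__":
    for temp, wind, stype in [(28.0, 4.0, 1), (35.0, 1.5, 3), (38.0, 1.0, None)]:
        p = p_high(temp, wind, stype)
        print(temp, wind, stype, round(p, 3), criteria(p))
```

    Note that any condition meeting the "sufficient" criterion also meets the "necessary" one, so the function reports the two flags separately rather than as exclusive classes.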

    We recommend that future work focus on:

    • Quantifying the impact of the relative sparsity of observations in El Paso on the robustness of our conclusions.
    • Refining the synoptic typing technique to classify more of the remaining days.
    • Developing GAMs to provide forecasts of air quality for El Paso.
    • Comparing these GAMs derived from monitor network data with similar GAMs fit to meteorological and chemical data from 3D Eulerian air quality models like CAMx and CMAQ to determine if these models accurately represent the dependence of O3 and PM2.5 concentrations, and the probability of high O3 and PM2.5 events, on meteorology. Differences discovered between the two sets of GAMs could point towards missing physics or incorrect parameterizations in the current Eulerian air quality models.


    1. Introduction

    1.1 Project Objectives

    The objectives of this project were to:

    • Determine the effects of meteorology on trends in O3 and PM2.5 by developing new generalized additive models (GAM) that relate O3 and PM2.5 concentrations to selected meteorological variables for El Paso, Texas.
    • Estimate the regional background concentrations of O3 and PM2.5 for El Paso, Texas.
    • Investigate the synoptic and urban-scale meteorological conditions that are associated with high concentrations of background and total O3 and PM2.5 in El Paso.

    Table 1 summarizes the main tasks and deliverables for the project.

    1.2 Purpose and Background

    1.2.1 Trends in O3 and PM2.5

    As the formation and loss of pollutants such as O3 and PM2.5 are strongly influenced by meteorology, inter-annual trends in these pollutants represent a combination of changes due to inter-annual variability in meteorology and changes due to air quality policy actions and other economic and societal trends. Statistical techniques are thus used to account for the effect that meteorological variations have on the trends of O3 and PM2.5 so that the adjusted trends can be used to assess the effectiveness of air quality policy. A common approach to performing this “meteorological adjustment” is to use a generalized additive model (GAM, Wood, 2006) to describe the potentially non-linear relationship between measured O3 (maximum daily 8-hour average, or MDA8) or PM2.5 (daily average) concentrations and selected meteorological variables (e.g., Camalier et al., 2007). In this project, AER derived GAMs for urban O3 and PM2.5 for El Paso following the procedures used in a previous project (WO #582-15-54118-01, Alvarado et al., 2015). To the extent possible, the variables used in the meteorological adjustments were kept similar so that the adjusted trends in different urban areas in Texas could be compared. AER used these models to account for the effect that meteorological variations have on the trends of O3 and PM2.5.

    1.2.2 Regional Background Concentrations of O3 and PM2.5

    Daily surface concentrations of O3 and PM2.5 in urban areas can be considered as the sum of O3 and PM2.5 produced within the urban area (either through primary emissions of PM2.5 or through secondary chemical production of O3 and PM2.5) and a “regional background” that is transported into the urban area. Accurate estimates of this regional background are critical to determining the potential for further reductions in O3 and PM2.5 concentrations in urban areas through control of local emissions of primary PM2.5 and the precursors of O3 and PM2.5.

    In this project, AER determined daily regional background estimates of O3 for a ten-year period (2007-2016) for El Paso using the TCEQ method (i.e., the lowest value observed at defined “background” sites near the border of the area of interest; Berlin et al., 2013). AER then used these O3 background estimates to investigate the spatial and temporal trends of regional background O3. Upon review of the available data, it was determined that there were not enough representative data over this 10-year period to estimate a regional background for PM2.5; this is discussed in detail in Section 3. Instead, the values for the site with the best data coverage in the El Paso urban area were used to create a GAM for total PM2.5.
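    The TCEQ method can be sketched in a few lines: on each day, the regional background is the lowest MDA8 O3 among the designated background sites, and the locally produced O3 is the urban total minus that background (the decomposition described at the start of Section 1.2.2). The data layout, the site identifiers, and the `min_sites` completeness rule below are hypothetical; the actual background sites are listed in Table 9, and the method itself is described by Berlin et al. (2013).

```python
def daily_background(mda8_by_day, min_sites=1):
    """Daily regional background = lowest MDA8 O3 among the background sites.

    mda8_by_day: {date: {site_id: MDA8 in ppbv, or None if missing}}
    min_sites: hypothetical completeness rule; days with fewer valid sites are skipped.
    """
    background = {}
    for day, by_site in mda8_by_day.items():
        valid = [v for v in by_site.values() if v is not None]
        if len(valid) >= min_sites:
            background[day] = min(valid)
    return background


def local_increment(total_by_day, background_by_day):
    """Locally produced O3 = total urban MDA8 minus the regional background."""
    return {day: total_by_day[day] - background_by_day[day]
            for day in total_by_day if day in background_by_day}


# Hypothetical background-site readings (ppbv); None marks a missing value
obs = {
    "2016-06-01": {"SITE_A": 52.0, "SITE_B": 47.0, "SITE_C": None},
    "2016-06-02": {"SITE_A": None, "SITE_B": None, "SITE_C": None},
}
bg = daily_background(obs)
print(bg)  # prints {'2016-06-01': 47.0}
print(local_increment({"2016-06-01": 68.0, "2016-06-02": 60.0}, bg))  # prints {'2016-06-01': 21.0}
```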


    1.2.3 Synoptic- and Urban-scale Meteorological Controls on O3

    A variety of synoptic- and urban-scale meteorological conditions exist, some of which are important in explaining and forecasting high concentrations of O3 in the El Paso urban area. The goal of this task was to identify necessary and/or sufficient meteorological conditions that lead to NAAQS exceedances or other high concentration events (e.g., above the 90th percentile) for these pollutants. Meteorological conditions leading to both high regional background levels (over 55 ppb of O3) and high total levels (over 70 ppb of O3) were identified. Output from meteorological reanalyses was used in this task, as well as the synoptic map typing method of Hegarty et al. (2007).

    1.3 Report Outline

    This Final Report documents the methods and pertinent accomplishments of this project, including comprehensive overviews of each task, a summary of the data collected and analyzed during this work, key findings, shortfalls, limitations, and recommended future tasks. It satisfies Deliverable 5.2 of the Work Plan for Work Order No. 582-18-81763-07:

    Deliverable 5.2: Final Report delivered electronically via file transfer protocol or e-mail in Microsoft Word format and PDF format
    Deliverable Due Date: June 30, 2018

    This report contains three sections that describe the methods and major findings for Task 2 (Effects of Meteorology on O3 and PM2.5 Trends, Section 2), Task 3 (Estimating Background O3 and PM2.5, Section 3) and Task 4 (The Role and Importance of Synoptic or Mesoscale Meteorological Conditions in Creating High O3 and PM2.5 Days, Section 4). Section 5 discusses the Quality Assurance performed for the project, including answers to the assessment questions from the Quality Assurance Project Plan (QAPP). Section 6 summarizes the conclusions of our study, and Section 7 lists our recommendations for further research. In addition, Appendix A describes the files that are included in the final deliverable package (Deliverables 2.1, 3.1, 3.2, and 4.1).


    Table 1. Projected Schedule for TCEQ Work Order No. 582-18-81763-07

    Milestones Planned Date

    Task 1 - Work Plan

    1.1: TCEQ-approved Work Plan January 5, 2018

    1.2: TCEQ-approved QAPP January 5, 2018

    Task 2 - Effects of Meteorology on O3 and PM2.5 Trends

    2.1: Deliver the models’ estimates of O3 MDA8 and average PM2.5 alongside the daily observed values. In addition, at least one section of the final report shall be dedicated to a description of this task including:

    • a discussion/description of the statistical methodologies used to adjust pollutant trends in the El Paso urban area;

    • a description of the software used to analyze O3 and PM2.5 trends in the El Paso urban area;

    • a discussion/description of empirical data used in building the model;

    • a discussion/description of the meteorological variables chosen for the model and why they were chosen; and

    • a discussion of the model results that includes the analysis of possible errors.

    With Draft Report due June 1, 2018.

    With Final Report due June 30, 2018.

    Task 3 – Estimating Background O3 and PM2.5

    3.1: Deliver, as part of the draft and final reports, an analysis of the spatial and temporal trends in its estimates for regional background O3 and PM2.5. The analysis shall include a description/discussion of the method(s) chosen by the Contractor to estimate regional background for each pollutant.

    With Draft Report due June 1, 2018.

    With Final Report due June 30, 2018.

    3.2: Deliver daily estimates of regional background O3 and PM2.5 in a comma-separated values (*.csv) electronic data set.

    June 1, 2018

    Task 4 – The Role and Importance of Synoptic or Mesoscale Meteorological Conditions in Creating High O3 and PM2.5 Days

    4.1: Deliver, as part of the draft and final reports, an analysis and description of the synoptic map types associated with high levels of total O3 and PM2.5 in the El Paso urban area. The deliverable shall also include a discussion of the meteorological conditions explaining the variability in O3 and PM2.5 levels.

    With Draft Report due June 1, 2018.

    With Final Report due June 30, 2018.

    Task 5 – Draft and Final Reports

    5.1: Draft Report June 1, 2018

    5.2: Final Report June 30, 2018


    2 Task 2: Effects of Meteorology on O3 and PM2.5 Trends

    As described in the Work Plan, AER derived generalized additive models (GAMs) for O3 and PM2.5 at selected monitoring sites near the El Paso area. For O3, only data from the O3 season (March to October) were analyzed; for PM2.5, data for the entire year were analyzed. The O3 season was extended beyond the May to October period used in Alvarado et al. (2015) because mean O3 concentrations in May were higher than those in October, and extending the season back to March gave a more symmetric variation of O3 concentrations across the season.

    AER fit the data to the eight meteorological parameters that gave the best fit in a previous project (Alvarado et al., 2015) and in our recent work on air quality forecasting with GAMs in Texas urban areas (Pernak et al., 2017). We also ran HYSPLIT back-trajectories for the El Paso urban area following the approach of Alvarado et al. (2015). Because the period of interest for this project is later than that of Alvarado et al. (2015), higher-resolution meteorological data are available for the whole period, so we drove HYSPLIT with the 12-km resolution NAM-12 meteorology instead of the 32-km resolution NARR data. This should give more accurate estimates of the path of background air reaching the El Paso area.

    One risk of using GAMs for the meteorological adjustment of pollutant trends is “over-fitting,” in which some of the variability that is actually due to changes in air quality policy is attributed by the GAM to the meteorological variables. AER explored the potential errors from over-fitting via cross-validation. In cross-validation, part of the data (the testing set) is withheld before building the GAM, and the remaining data (the training set) are used to derive the GAM parameters. The testing set is then used to evaluate how well the GAM predicts “unseen” data (e.g., Starkweather et al., 2011).

    2.1 Input Data and Processing

    2.1.1 TCEQ Monitor Data

    The TCEQ provided AER with air quality and meteorological monitoring data covering a ten-year period (2007-2016) from the air quality monitoring network in and near El Paso, Texas, operated by the TCEQ, its grantees, or local agencies whose data are stored in the Texas Air Monitoring Information System (TAMIS). The TCEQ also provided data for several monitors in Ciudad Juárez, Mexico, and in the State of New Mexico for the same period. These monitoring stations are listed in Table 2. AER then used previously built Python scripts to process the TCEQ air quality and meteorological data and to calculate the averages (daily, morning, afternoon, etc.) and derived quantities (e.g., deviations from 10-year monthly averages) needed for the GAMs. Following Camalier et al. (2007) and previous projects (Alvarado et al., 2015; Pernak et al., 2017), these averaged and derived meteorological quantities were calculated using a single surface site in the center of the urban area combined with the nearest available radiosonde location. The selected surface site is flagged with Note 1 in Table 2.
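    The derived quantities mentioned above include deviations from 10-year monthly averages. As a minimal sketch (not the project's processing scripts, which are not reproduced here), the functions below subtract each calendar month's long-term mean from the daily values. The function names and the input layout, a list of (date, value) pairs with None marking missing days, are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Optional, Tuple

# Hypothetical input layout: one (date, value) pair per day, None where data are missing.
Series = List[Tuple[date, Optional[float]]]


def monthly_climatology(series: Series) -> Dict[int, float]:
    """Mean value for each calendar month over all years in the series (missing values skipped)."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for day, value in series:
        if value is not None:
            sums[day.month] += value
            counts[day.month] += 1
    return {month: sums[month] / counts[month] for month in counts}


def deviations_from_monthly_mean(series: Series) -> Series:
    """Daily value minus the multi-year mean for its calendar month (None if unavailable)."""
    climatology = monthly_climatology(series)
    result: Series = []
    for day, value in series:
        if value is None or day.month not in climatology:
            result.append((day, None))
        else:
            result.append((day, value - climatology[day.month]))
    return result
```

    When the input holds ten years of daily values, each month's mean is the 10-year monthly average described in the text.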


    Table 2. Air Monitoring Sites Provided by the TCEQ.

    City/State/Country          Site ID #   Site Name            Latitude (°)   Longitude (°)
    El Paso, Texas, USA         481410029   Ivanhoe              31.7857687     -106.3235781
    El Paso, Texas, USA         481410037   El Paso UTEP         31.7682914     -106.5012595
    El Paso, Texas, USA (1)     481410044   El Paso Chamizal     31.7656854     -106.4552272
    El Paso, Texas, USA         481410055   Ascarate Park SE     31.7467753     -106.4028059
    El Paso, Texas, USA         481410057   Socorro Hueco        31.6675000     -106.2880000
    El Paso, Texas, USA         481410058   Skyline Park         31.8939133     -106.4258270
    Ciudad de Juárez, Mexico    800060004   Juárez Advance       31.689722      -106.459722
    Ciudad de Juárez, Mexico    800060006   Juárez 20 30 Club    31.735556      -106.459722
    Ciudad de Juárez, Mexico    800060007   Juárez Delphi        31.712222      -106.395278
    New Mexico, USA             8           NA                   31.930659      -106.631103
    New Mexico, USA             17          NA                   31.795940      -106.558044
    New Mexico, USA             20          NA                   32.041212      -106.409710
    New Mexico, USA             21          NA                   31.796218      -106.584434
    New Mexico, USA             22          NA                   31.787885      -106.683324
    New Mexico, USA             23          NA                   32.317593      -106.768337

    Note 1: Surface meteorological site.

    Two additional Python scripts (calc_GLM_all.py and calc_GLM_NCDC.py) were used to calculate the potential meteorological predictors. The TCEQ monitor data, the Integrated Global Radiosonde Archive data (IGRA, Section 2.1.2), and the integrated surface hourly (ISH) database of the National Climatic Data Center (NCDC, Section 2.1.3), along with the previously calculated MDA8 O3 and PM2.5 maximum and minimum concentrations and the parameters from the HYSPLIT back-trajectories (Section 2.1.5), were merged by a final script (merge_param_all_Camalier.py), which outputs the final CSV file used in the GAMs.

    2.1.1.1 MDA8 O3

    We developed a Python script (EP_o3.py) that calculated the MDA8 O3 (ppbv) for all of the monitoring sites. The MDA8 was calculated as follows:

    1. A running 8-hour average was calculated for each hour, averaged over that hour and the following seven hours. At least 6 hours in this 8-hour window had to have valid O3 measurements for the 8-hour average to be considered valid.

    2. The largest of the calculated 8-hour averages in a day was selected as the MDA8 for that day.

    3. The maximum and minimum of the valid MDA8 O3 values for all sites in the urban area were determined.

    4. The minimum of the valid MDA8 O3 values for the selected background sites was taken as the daily background concentration for that area.
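    The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the EP_o3.py script itself: the function names are hypothetical, missing hours are represented as None, and the hourly list for a day is assumed to be optionally extended with the first seven hours of the next day so that late-evening windows can be complete. The report does not say how those late windows were handled; without the extra hours, windows starting after 18:00 have fewer than 6 hours and are treated as invalid.

```python
from typing import Dict, Iterable, Optional, Sequence, Tuple


def eight_hour_average(hourly: Sequence[Optional[float]], start: int,
                       min_valid: int = 6) -> Optional[float]:
    """Step 1: mean of hours start..start+7; None if fewer than min_valid hours are valid."""
    window = [v for v in hourly[start:start + 8] if v is not None]
    if len(window) < min_valid:
        return None
    return sum(window) / len(window)


def mda8(hourly: Sequence[Optional[float]], min_valid: int = 6) -> Optional[float]:
    """Step 2: largest valid 8-hour average among windows starting at hours 0-23."""
    averages = [eight_hour_average(hourly, h, min_valid) for h in range(24)]
    valid = [a for a in averages if a is not None]
    return max(valid) if valid else None


def daily_summary(site_mda8: Dict[str, Optional[float]],
                  background_sites: Iterable[str]
                  ) -> Tuple[Optional[float], Optional[float], Optional[float]]:
    """Steps 3-4: max and min over all valid sites, and the minimum over background sites."""
    valid = {site: v for site, v in site_mda8.items() if v is not None}
    background = [valid[s] for s in background_sites if s in valid]
    total_max = max(valid.values()) if valid else None
    total_min = min(valid.values()) if valid else None
    bkg = min(background) if background else None
    return total_max, total_min, bkg
```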


    Since El Paso’s urban area borders Ciudad Juárez (hereafter Juárez), total MDA8 O3 was calculated for two groups of monitors. The first group, referred to as “COMB”, includes all monitoring sites used in this study; the second group, referred to as “ELP”, includes the Texas sites only. Figure 1 compares the annual averages over the ozone season (March-October) for both groups. As Figure 1 shows, the annual average total MDA8 O3 for ELP is always lower than that for COMB.

    Figure 1. Annual average MDA8 O3 for the combined dataset of all monitoring sites (COMB) and just the sites within El Paso, Texas (ELP).

    2.1.1.2 PM2.5

    A similar script (EP_pm25.py) was used to calculate daily average PM2.5 values from the available hourly data. This average was calculated as follows:

    1. If more than one PM2.5 instrument was active at a site, the reported hourly values were averaged.

    2. A daily average PM2.5 value was then calculated for each site. At least 18 hours of that day had to have valid PM2.5 measurements for the daily average to be considered valid.

    3. The maximum and minimum of the valid PM2.5 values for all sites in the urban area were determined.

    4. The minimum of the valid PM2.5 values for the selected background sites was taken as the daily background concentration for that area.

    We initially calculated total daily average PM2.5 values for all sites provided, but when fitting the GAMs we noticed that the values did not behave like the PM2.5 of nearby cities studied in a previous project. On further review of the monitor data, we found that data coverage was spotty for many of the sites over the study period. Table 3 below shows the number of valid hourly PM2.5 values for each of the sites (New Mexico and Texas) in each year of the study period. No PM2.5 sites in Juárez were provided. Only two sites (New Mexico Site 21 and Texas Site 481410037) have sufficient data coverage to support a trend analysis or a properly fitted GAM. The Texas site 481410037 (University of Texas at El Paso, UTEP) was used for further analysis. Due to the lack of sites and of sufficient PM2.5 data, other analyses of the background PM2.5 concentrations were not possible.
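    Steps 1 and 2 of the PM2.5 daily averaging in Section 2.1.1.2 can be sketched as below. This is an illustrative stand-in for EP_pm25.py, not its actual code; the function names are hypothetical and missing hours are assumed to be None. Steps 3 and 4 apply the same site-level maximum, minimum, and background-minimum logic used for ozone and are omitted here.

```python
from typing import List, Optional, Sequence


def combine_instruments(instruments: Sequence[Sequence[Optional[float]]]) -> List[Optional[float]]:
    """Step 1: average co-located instruments hour by hour, ignoring missing values."""
    n_hours = len(instruments[0])
    combined: List[Optional[float]] = []
    for h in range(n_hours):
        values = [inst[h] for inst in instruments if inst[h] is not None]
        combined.append(sum(values) / len(values) if values else None)
    return combined


def daily_average(hourly: Sequence[Optional[float]], min_valid_hours: int = 18) -> Optional[float]:
    """Step 2: daily mean, valid only if at least min_valid_hours hours are present."""
    values = [v for v in hourly if v is not None]
    if len(values) < min_valid_hours:
        return None
    return sum(values) / len(values)
```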

    Table 3. PM2.5 Data Coverage by Site for the Period of Study

    Count of Hourly PM2.5 Data for Sites wrt Year NM Sites TX Sites Year 16 17 21 22 481410037 481410044 481410053 481410055 481410057 2007 8692 8706 8604 8518 8713 8719 2008 8651 8721 8756 8706 8541 8588 2009 8694 8258 8683 8715 8516 8393 2010 8680 6953 8437 8669 8446 11 8627 598 2011 8682 8229 8695 8052 7576 8458 8467 2012 8554 8277 8682 7205 6704 7841 7437 564 2013 7986 8514 8698 8550 8208 7813 7908 2014 6541 8317 7261 8666 6246 8507 8632 2015 8649 8694 7702 8630 8589 2016 8237 8707 8086 6975 8527

    2.1.2 IGRA Radiosonde Data

    The Integrated Global Radiosonde Archive (IGRA Version 2) provided the upper-air data used to derive the meteorological predictors for the GAMs. These data can be downloaded at ftp://ftp.ncdc.noaa.gov/pub/data/igra. The relevant measurements include the geopotential height, temperature, and dewpoint depression at several altitudes, with missing values coded as -99999. Table 4 describes the site selected for this study: the site closest to the center of El Paso with continuous data for the 2007-2016 period.

    Table 4. IGRA Site

    ID      Station Name    Lat. (°)   Lon. (°)
    72364   Santa Teresa    31.8728    -106.6981
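    As an illustration of how a sounding feeds the morning stability predictor listed in Table 6 (T_dif_700mb, the 700 mb temperature minus the surface temperature at 1200 UTC), the sketch below treats -99999 as missing. It assumes the IGRA record has already been decoded into (pressure in hPa, temperature in °C) pairs; the IGRA v2 file layout and unit scaling are not reproduced, and the function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

MISSING = -99999  # sentinel for missing values in the radiosonde data

# Assumed, already-decoded sounding level: (pressure in hPa, temperature in deg C).
Level = Tuple[float, float]


def clean(value: float) -> Optional[float]:
    """Map the missing-value sentinel to None."""
    return None if value == MISSING else value


def temperature_at(levels: Sequence[Level], pressure_hpa: float) -> Optional[float]:
    """Temperature reported at the given pressure level, or None if absent or missing."""
    for pressure, temperature in levels:
        p, t = clean(pressure), clean(temperature)
        if p is not None and t is not None and abs(p - pressure_hpa) < 1e-6:
            return t
    return None


def t_dif_700mb(levels: Sequence[Level], surface_temp_c: Optional[float]) -> Optional[float]:
    """T(700 mb) minus surface temperature; larger values indicate a more stable layer."""
    t700 = temperature_at(levels, 700.0)
    if t700 is None or surface_temp_c is None:
        return None
    return t700 - surface_temp_c
```

    Because 700 mb is a standard reporting level, an exact-level lookup is used here rather than vertical interpolation; that is a simplifying assumption.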

    2.1.3 NCDC Integrated Surface Hourly Data

    We have also added data from the integrated surface hourly (ISH) database of the National Climatic Data Center (NCDC) to our dataset. We used the NCDC data to get estimates of surface pressure and relative humidity, as this data was not generally available in the TCEQ dataset. The NCDC site used is described in Table 5 below. This site was selected because it is the closest site to the center of El Paso and had continuous data for the 2007-2016 period.


    Table 5. NCDC Surface Site

    USAF-WBAN ID    Station Name                    Lat. (°)   Lon. (°)
    722700-23044    EL PASO INTERNATIONAL AIRPORT   31.811     -106.376

    2.1.4 NAM-12 Meteorological Data

    The higher spatial resolution North American Mesoscale Forecast System (NAM) 12-km data were used in this project instead of the North American Regional Reanalysis (NARR) meteorological data used in Alvarado et al. (2015), because the NAM-12 data are available for the entire period of interest at 6-hourly intervals on a 12-km grid. The NAM is one of the primary vehicles by which NCEP's Environmental Modeling Center provides mesoscale guidance to public and private sector meteorologists. It is prepared using the Weather Research and Forecasting (WRF) model initialized with a 6-h data assimilation (DA) cycle with hourly analysis updates. The NAM data can be downloaded from NOAA’s public server at https://nomads.ncdc.noaa.gov/data/.

    2.1.5 HYSPLIT Back Trajectories

    We ran 24-hour HYSPLIT back-trajectories for the 2007-2016 period. These back-trajectories were calculated using the 12-km horizontal resolution NAM data, because these data were available in a form suitable for driving HYSPLIT over our entire study period (2007-2016), whereas the NARR data were only available for 1979-2014. As in Camalier et al. (2007), the back-trajectories were calculated assuming an initial height of 300 m above ground level (AGL) and were started at noon local solar time. The starting point for the back-trajectories is the selected surface meteorological site given in Table 5 above. The HYSPLIT model (Draxler and Hess, 1997, 1998) is available for download from the HYSPLIT website (http://ready.arl.noaa.gov/HYSPLIT.php). The performance of HYSPLIT driven with NAM meteorological fields has been evaluated with tracer release studies (e.g., Hegarty et al., 2013).

    The endpoints of the back-trajectories were used to calculate the 24-hour transport direction and distance for each urban area for the 2007-2016 period. This was done using the R functions bearing and distMeeus from the geosphere package (see the script ./HYSPLIT/calc_trajec.R, described in Section A.2.3). The function bearing returns the initial bearing (direction; azimuth) for travelling from point 1 to point 2 along the shortest path (a great circle). The function distMeeus calculates the shortest distance between two points (i.e., the ‘great-circle distance’ or ‘as the crow flies’) on the WGS84 ellipsoid.
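    The two quantities described above can be illustrated with a short sketch. Note that it swaps in a spherical-Earth model: the bearing uses the standard spherical initial-bearing formula, and the distance uses the haversine formula on a sphere of mean Earth radius, whereas distMeeus works on the WGS84 ellipsoid, so distances can differ slightly (on the order of a few tenths of a percent). Points are given here as (latitude, longitude) in degrees; geosphere itself takes (longitude, latitude). Function and constant names are hypothetical.

```python
import math
from typing import Tuple

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; spherical approximation, not WGS84

Point = Tuple[float, float]  # (latitude, longitude) in degrees


def initial_bearing(p1: Point, p2: Point) -> float:
    """Initial great-circle bearing from p1 to p2, degrees clockwise from north in [0, 360)."""
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    dlon = lon2 - lon1
    x = math.sin(dlon) * math.cos(lat2)
    y = math.cos(lat1) * math.sin(lat2) - math.sin(lat1) * math.cos(lat2) * math.cos(dlon)
    return math.degrees(math.atan2(x, y)) % 360.0


def great_circle_distance_m(p1: Point, p2: Point) -> float:
    """Haversine great-circle distance in metres on a spherical Earth."""
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
```

    As with geosphere's bearing, the order of the two points matters: the bearing from the receptor to a trajectory endpoint differs by roughly 180° from the bearing in the opposite direction.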

    The HYSPLIT back-trajectories used in the model development appear reasonable and are generally consistent with the surface wind speed and direction measured near the center of the area. The HYSPLIT back-trajectory distance is moderately correlated with the urban-area average surface wind speed, with a linear correlation coefficient (R) of 0.46. The most frequent daily average wind direction is 240-270° (southwesterly to westerly), and the HYSPLIT back-trajectory bearings peak at around 210-240° (southwesterly). However, the HYSPLIT back-trajectory bearings also show a secondary maximum at 0° (north) that is not seen in the daily average wind directions.

    2.2 Generalized Additive Model

    The easiest way to understand the GAM approach is to contrast it with two related, but simpler, approaches: ordinary linear models and generalized linear models. In an ordinary linear model (e.g., Wood, 2006, p. 12), the model equation is:


    $$\boldsymbol{\mu} = \mathbf{X}\boldsymbol{\beta}, \qquad \mathbf{y} \sim N(\boldsymbol{\mu}, \mathbf{I}_n\sigma^2)$$

    where $\boldsymbol{\mu}$ is the vector of expected values of the observation vector $\mathbf{y}$ (both of dimension $N_{obs}$), which is assumed to be normally distributed around the expected values with a constant variance $\sigma^2$. $\mathbf{X}$ is the matrix of predictor variables (dimension $N_{obs}$ by $N_{preds}$), and $\boldsymbol{\beta}$ is the (initially unknown) vector of best-fit coefficients for the predictor variables. Note that this functional form is not as limited as it first appears. For example, known non-linear functions of the predictor variables (e.g., $x_i^2$ or a sine of a predictor) can be used as new predictor variables, and the observation vector $\mathbf{y}$ can be similarly transformed to make it normally distributed (e.g., by taking the logarithm of a log-normally distributed observation).

    However, ordinary linear models have two inherent limitations. The first is the requirement that the observations follow a normal distribution. This rules out ordinary linear models for observations that follow other distributions, for example when predicting the probability that the result of an experiment will be true or false from a set of predictors (e.g., logistic regression), where the observations are expected to follow a binomial distribution. Generalized linear models (GLMs) (Wood, 2006, p. 59) relax this normality requirement so that any exponential-family distribution (e.g., Poisson, binomial, gamma, normal) can be used, together with a “link” function $g$, a smooth, monotonic function of the expected value vector $\boldsymbol{\mu}$, such that $g(\boldsymbol{\mu}) = \mathbf{X}\boldsymbol{\beta}$.

    The second limitation of ordinary (and generalized) linear models is that the functional dependence of the observation on the predictor variables must be specified ahead of time, with only the linear coefficients β of those functions allowed to vary. This makes these approaches less useful when the functional form of the response is unknown or highly complex. In this case, a generalized additive model can be used (Wood, 2006, p. 121). The response to each predictor variable is modeled as a non-linear but smooth function, constructed as a linear combination of a set of simpler basis functions of the predictor. By fitting the coefficients of these basis functions, one can estimate the previously unknown smooth function of the predictor. Cubic splines are generally used as the basis functions, as this ensures the resulting smooth function is continuous up to the second derivative. In our procedure, we fit the total MDA8 O3 value and the maximum 24-hour average PM2.5 value for each urban area using the gam function in the mgcv package (Wood, 2006) in R (R Core Team, 2015). The GAM can be written as follows:

    $$g(\mu_i) = \beta_0 + f_1(x_{i,1}) + f_2(x_{i,2}) + \cdots + f_n(x_{i,n}) + f_D(D_i) + W_w + Y_k$$

    where $i$ indexes the daily observations, $g(\mu_i)$ is the “link” function (here, a log link is used), and $x_{i,j}$ are the $n$ meteorological predictors, with the corresponding $f_j(x_{i,j})$ being an (initially unknown) smooth function of $x_{i,j}$ built from a cubic-spline basis set. Following Camalier et al. (2007), three non-meteorological predictors are also included: a smooth function $f_D(D_i)$ of the Julian day of the year ($D_i$), a factor $W_w$ for the day of the week $w$ of observation $i$, and a factor $Y_k$ for the year $k$ of observation $i$. As we only fit O3 data during the O3 season (March-October), $f_D(D_i)$ is built with a non-periodic cubic spline basis for O3, whereas a periodic cubic spline basis is used for PM2.5. To reduce the possibility of over-fitting the data, we set the “gamma” parameter to 1.4 for these fits, as recommended by Wood (2006).
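    To illustrate the idea that an unknown smooth function can be estimated by fitting coefficients of cubic-spline basis functions, here is a minimal, unpenalized stand-in: a cubic regression spline built from a truncated power basis and fitted by ordinary least squares. This is not what mgcv does; mgcv uses its own cubic-spline bases, a smoothing penalty chosen automatically (inflated by the gamma parameter mentioned above), and here a log link. The knot locations and all function names are illustrative assumptions.

```python
from typing import List, Sequence


def basis(x: float, knots: Sequence[float]) -> List[float]:
    """Truncated power cubic basis: 1, x, x^2, x^3, and (x - k)^3_+ for each knot k."""
    row = [1.0, x, x ** 2, x ** 3]
    row.extend(max(x - k, 0.0) ** 3 for k in knots)
    return row


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        coef[r] = (m[r][n] - sum(m[r][c] * coef[c] for c in range(r + 1, n))) / m[r][r]
    return coef


def fit_spline(xs: Sequence[float], ys: Sequence[float], knots: Sequence[float]) -> List[float]:
    """Least-squares basis coefficients from the normal equations X^T X beta = X^T y."""
    rows = [basis(x, knots) for x in xs]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef: Sequence[float], x: float, knots: Sequence[float]) -> float:
    """Evaluate the fitted smooth function as a linear combination of basis functions."""
    return sum(c * b for c, b in zip(coef, basis(x, knots)))
```

    The resulting function is a cubic polynomial between knots with continuous first and second derivatives at the knots, which is the continuity property noted above.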


    2.2.1 GAMs Description

    In a previous project (Alvarado et al., 2015), AER described three different GAMs that related meteorological variables to measured MDA8 O3 and PM2.5. Starting with the meteorological parameters suggested by Camalier et al. (2007) and comparing results using different meteorological parameters, we found that some were more significant than others, and we selected the variables that were highly significant for most of the Texas urban areas studied as our common set of predictor variables (Alvarado et al., 2015). Further work on using GAMs to forecast O3 in Texas urban areas found that using the water vapor density in g/m3 as a predictor gave better performance than using dew point or relative humidity (Pernak et al., 2016, 2017). Thus, for El Paso, we used the predictors identified by Alvarado et al. (2015) but with the humidity variable replaced by the water vapor density (Table 6). This keeps the meteorological adjustment consistent across the different Texas urban areas, so that trends can be compared between areas. The previous project used the difference between the morning temperature at 925 mb and the surface temperature to estimate atmospheric stability. However, because of El Paso's elevation, its surface pressure is commonly below 925 mb (i.e., the 925 mb level is often below ground), so here we used the difference between the 700 mb temperature and the surface temperature to represent lower-atmospheric stability more accurately.

    GAMs were developed for the total MDA8 O3 for all sites (COMB), the background MDA8 O3 for all sites, and the total MDA8 O3 for the El Paso, Texas sites only (ELP), as discussed in Section 2.1.1. A total PM2.5 GAM was developed for only one site (UTEP), as discussed in Section 2.1.1.2.

    Table 6. Meteorological parameters used in the GAMs. The column name used in the model is given after the units.

    • Afternoon mean temperature (°C, afternoon_mean_T, 1-4 PM CST)
    • Diurnal temperature change (°C, diurnal_T)
    • Daily average wind speed (m/s, daily_ws)
    • Daily average wind direction (degrees clockwise from north, daily_wd)
    • Daily average water vapor density (g/m3, SWVP)
    • Morning temperature difference at 1200 UTC (temperature at 700 mb minus temperature at the surface) (°C, T_dif_700mb)
    • Transport direction (degrees clockwise from north, HYSPLIT_Bearing)
    • Transport distance (m, HYSPLIT_dist)
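    The report does not state how the water vapor density (SWVP) was computed from the NCDC surface data. One common approach, sketched here under that assumption, converts relative humidity and air temperature to vapor pressure using a Magnus-type saturation formula (coefficients 6.112 hPa, 17.67, and 243.5 °C) and then applies the ideal gas law for water vapor. The function names are hypothetical.

```python
import math

R_V = 461.5  # specific gas constant for water vapor, J kg^-1 K^-1


def saturation_vapor_pressure_hpa(temp_c: float) -> float:
    """Magnus-type approximation of saturation vapor pressure over liquid water, in hPa."""
    return 6.112 * math.exp(17.67 * temp_c / (temp_c + 243.5))


def water_vapor_density_g_m3(temp_c: float, rh_percent: float) -> float:
    """Absolute humidity (g/m3) from air temperature (deg C) and relative humidity (%)."""
    vapor_pressure_pa = rh_percent / 100.0 * saturation_vapor_pressure_hpa(temp_c) * 100.0
    return vapor_pressure_pa / (R_V * (temp_c + 273.15)) * 1000.0
```

    For example, air at 25 °C and 50% relative humidity gives roughly 11.5 g/m3. A daily average SWVP would then be the mean of the hourly values.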

    2.2.2 MDA8 GAM Results

    2.2.2.1 Total MDA8 O3 for COMB

    Figure 2 shows the smooth functions from the GAM fit of the natural logarithm of the total MDA8 O3 COMB values to the meteorological predictors in Table 6, with 95% confidence intervals shown in red. The day-of-year (doy) function is also shown. This model explains 52% of the deviance of the MDA8 O3 values. This is lower than the Camalier et al. (2007) results, in which the predictive power of the models (measured by the R² statistic) was between 0.56 and 0.80 for the cities in that study. In this case, all eight meteorological predictors are statistically significant at the α = 0.001 level. As expected, the model fit shows O3 generally increasing with afternoon mean temperature, decreasing with SWVP, decreasing with wind speed, and increasing with vertical stability (positive values of T_dif_700mb). The day-of-year function may reflect the increase in mean mixing height during the summer, which leads to a decrease in MDA8 O3 in the middle of the ozone season.

    Examining the functional fits of the GAMs shows that:

    • O3 increases only slightly with the afternoon mean temperature.
    • O3 decreases slightly with water vapor density (SWVP), as expected.
    • O3 increases strongly with the diurnal temperature difference, which may account for the relatively weak dependence on the afternoon mean temperature.
    • O3 increases with decreasing wind speed for wind speeds of 4 m/s or less, but shows an unphysical increase at higher wind speeds (~8 m/s).
    • The dependences on wind speed and on HYSPLIT distance are similar: O3 decreases slightly with HYSPLIT back-trajectory distance up to approximately 1000 km (and with wind speed up to 4 m/s). Beyond that point, the functional forms become highly uncertain because of the small number of points, and the mean fit increases, which does not seem physically realistic.
    • The day-of-year function shows a maximum at approximately Julian day 200 (July).
    • O3 decreases slightly as stability increases (T_dif_700mb).
    • O3 peaks at a HYSPLIT_Bearing of 0°, suggesting that northerly flow (i.e., air arriving from the United States) is associated with slightly higher O3.

    The standard GAM evaluation plots (made with the gam.check function in R) for this case are shown in Figure 3. These plots indicate a good fit, as the model residuals are roughly normally distributed and show no trend versus predicted value. The variance of the residuals is lower for low predicted values, but this reflects the fact that the measured MDA8 O3 values cannot go below 0.


    Figure 2. Smooth functions for the total MDA8 O3 COMB GAM fit. The y-axis scale is the scale of the “linear predictor”, i.e. the deviation of the natural logarithm of the MDA8 O3 in ppbv from its mean value.


    Figure 3. GAM evaluation plots for total MDA8 O3 COMB

    2.2.2.2 Total MDA8 O3 for ELP

    Figure 4 shows the smooth functions from the GAM fit of the natural logarithm of the total MDA8 O3 ELP values to the meteorological predictors in Table 6, with 95% confidence intervals shown in red. The day-of-year (doy) function is also shown. This model explains 51% of the deviance of the MDA8 O3 values. This is lower than the Camalier et al. (2007) results, in which the predictive power of the models (measured by the R² statistic) was between 0.56 and 0.80 for the cities in that study. In this case, all eight meteorological predictors are statistically significant at the α = 0.001 level.

    The functional fits for ELP are very similar to those found for the combined (COMB) dataset. The standard GAM evaluation plots (made with the gam.check function in R) for this case are shown in Figure 5. These plots indicate a good fit, as the model residuals are roughly normally distributed and show no trend versus predicted value. The variance of the residuals is lower for low predicted values, but this reflects the fact that the measured MDA8 O3 values cannot go below 0.

    Figure 4. Smooth functions for the total MDA8 O3 ELP GAM fit. The y-axis scale is the scale of the “linear predictor”, i.e. the deviation of the natural logarithm of the MDA8 O3 in ppbv from its mean value.


    Figure 5. GAM evaluation plots for total MDA8 O3 ELP

    2.2.3 PM2.5 GAMs

    Due to the data processing findings discussed in Section 2.1.1.2, we fit a GAM for total daily average PM2.5 at UTEP only, rather than attempting to fit both “background” and “total” values. The UTEP GAM for PM2.5 had an R² value of 0.095, which indicates a poor fit. The smooth functional fits are shown in Figure 6, but because the evaluation plots in Figure 7 also show a very poor fit (the residuals do not follow a normal distribution), we do not think these fits provide much reliable information on the variability of PM2.5 under different meteorological conditions.


    Figure 6. Smooth functions for the total PM2.5 UTEP GAM fit.


    Figure 7. GAM evaluation plots for total PM2.5 UTEP

    2.2.4 Cross Validation Analysis

    To test for over-fitting in our GAMs, and to test the robustness of the functional relationships between the meteorological predictors and O3, we performed a 10-fold cross-validation for each GAM. We used the “CVgam” function in the R package.

    Table 7 shows the results of the cross-validation for each GAM. The “GAMscale” is the mean of the squared errors of the original GAM fits. The “CV-mse-GAM” is the mean of the squared errors calculated for the 10% of data withheld from the fit in each of the 10 cross-validation GAMs. The CV-mse-GAM is expected to be somewhat larger than the GAMscale, but a large difference between these values would suggest that the GAMs are over-fitting the data, since performance would then be much poorer on the data not included in the fit. For our fits, however, the two values are very close to each other (within about 3%), suggesting that our models are not over-fitting.

    Table 7. Cross Validation Results

    GAM                   GAMscale    CV-mse-GAM
    Total MDA8 O3 COMB    52.30113    53.60897
    Bkg MDA8 O3 COMB      54.74721    56.29425
    Total MDA8 O3 ELP     55.68996    57.10085
    PM2.5 UTEP            17.20852    17.5357
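    The comparison in Table 7 can be illustrated with a minimal k-fold sketch. To stay self-contained it uses an ordinary least-squares straight-line fit in place of a GAM, so its numbers are not comparable to Table 7. The interleaved fold assignment (observation i goes to fold i mod k) and the function names are assumptions; CVgam's own fold assignment is not reproduced here.

```python
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares intercept and slope (stand-in for fitting a GAM)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def mse(xs: Sequence[float], ys: Sequence[float], model: Tuple[float, float]) -> float:
    """Mean squared error of a fitted line on the given data (in-sample when data = training set)."""
    intercept, slope = model
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def k_fold_cv_mse(xs: Sequence[float], ys: Sequence[float], k: int = 10) -> float:
    """Mean squared error on withheld data, pooled over k folds."""
    n = len(xs)
    squared_errors: List[float] = []
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        intercept, slope = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        squared_errors.extend((ys[i] - (intercept + slope * xs[i])) ** 2 for i in test_idx)
    return sum(squared_errors) / n
```

    As in the GAM case, the cross-validated error should be somewhat larger than the in-sample error; a ratio close to 1, like the ratios of about 1.02 to 1.03 in Table 7, indicates little over-fitting.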

    2.3 GAMs for Background O3

    We used the same approach described in Section 2.2.1 to fit GAMs for the background MDA8 O3.

    Figure 8 shows the smooth functions from the GAM fit of the natural logarithm of the background MDA8 O3 values to the meteorological predictors in Table 6, with 95% confidence intervals shown in red. The day-of-year (doy) function is also shown. This model explains 51% of the deviance of the MDA8 O3 values. This is lower than the Camalier et al. (2007) results, in which the predictive power of the models (measured by the R² statistic) was between 0.56 and 0.80 for the cities in that study. In this case, all eight meteorological predictors are statistically significant at the α = 0.001 level. The model fit shows functional forms similar to those of the total MDA8 O3 fits (Figures 2 and 4), but the residuals deviate more from a normal distribution in the lower tail (Figure 9), and the functional fits have greater uncertainty (Figure 8, red bands).

    The standard GAM evaluation plots (made with the gam.check function in R) for this case are shown in Figure 9. These plots indicate a good fit, as the model residuals are roughly normally distributed and show no trend versus predicted value. The variance of the residuals is lower for low predicted values, but this reflects the fact that the measured MDA8 O3 values cannot go below 0.


    Figure 8. Smooth functions for the background MDA8 O3 GAM fit. The y-axis scale is the scale of the “linear predictor”, i.e. the deviation of the natural logarithm of the MDA8 O3 in ppbv from its mean value.


    Figure 9. GAM evaluation plots for background MDA8 O3

    2.4 Meteorologically Adjusted Trends of O3 and PM2.5

    We used the GAMs described in Sections 2.2.2, 2.2.3, and 2.3 to determine the meteorologically adjusted trends in total and background MDA8 O3 for COMB, total MDA8 O3 for ELP, and total PM2.5 for UTEP. In this procedure, we use the $Y_k$ terms from the GAM equation in Section 2.2 to determine the relative differences between the annual averages after meteorology has been taken into account. Our equation for the annual averages is thus

    $$g(\mu_k) = \beta_0 + Y_k + c_0$$

    where $k$ indexes the years, $\mu_k$ is the meteorologically adjusted annual average for year $k$, and $c_0$ is a constant. The constant $c_0$ is needed because of how R treats factor variables. To obtain an identifiable model, one of the factor levels, in this case the year 2007, must be set to $Y_k = 0$. However, 2007 is frequently the year with the largest annual average O3 values in the original data set. This results in $Y_k$ values that are predominantly less than 0, and hence in meteorologically adjusted annual averages that do not have the same 10-year average as the original data set. To avoid this issue, we add a constant $c_0$ to the meteorologically adjusted annual averages so that the 10-year averages of the original and meteorologically adjusted trend data are identical. The meteorologically adjusted linear trends over 2007-2016 are relatively insensitive to the value of $c_0$.
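    A minimal sketch of this adjustment follows. It assumes the log link used for these GAMs and assumes that $c_0$ enters inside the link, as in the equation above, and is chosen so that the mean of the adjusted annual averages equals the mean of the observed annual averages. Under those assumptions $c_0$ has the closed form $c_0 = \ln\left(\overline{\text{obs}} \,/\, \overline{\exp(\beta_0 + Y_k)}\right)$. The function name is hypothetical.

```python
import math
from typing import List, Sequence


def adjusted_annual_averages(beta0: float, year_effects: Sequence[float],
                             observed_annual: Sequence[float]) -> List[float]:
    """Adjusted annual averages mu_k = exp(beta0 + Y_k + c0), with c0 matching the observed mean.

    Assumes a log link; year_effects holds Y_k with the reference year (2007) set to 0.
    """
    base = [math.exp(beta0 + y) for y in year_effects]
    mean_observed = sum(observed_annual) / len(observed_annual)
    mean_base = sum(base) / len(base)
    c0 = math.log(mean_observed / mean_base)
    return [b * math.exp(c0) for b in base]
```

    Under a log link, adding $c_0$ rescales every year by the same factor $e^{c_0}$, so the ratios between years, which carry the meteorologically adjusted trend, are unchanged.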

    The original and meteorologically adjusted annual averages are shown in Figure 10 for total and background MDA8 O3 for COMB, in Figure 11 for total MDA8 O3 for ELP, and in Figure 12 for total PM2.5 for UTEP. The trend estimates, determined by ordinary least squares (OLS) linear regression of the annual averages, are shown on each figure and summarized in Table 8.

    No statistically significant trends with time are observed in El Paso for 2007-2016, either before or after meteorological adjustment. The original and meteorologically adjusted linear trends for each GAM fit are provided in Table 8.

    Table 8. Original and meteorologically adjusted trends for GAM fits

    GAM fit              Original                   Met Adjusted
    Total O3 COMB         0.030 ± 1.034 ppb/yr       0.250 ± 1.019 ppb/yr
    Background O3         0.430 ± 0.547 ppb/yr       0.450 ± 0.449 ppb/yr
    Total O3 ELP         -0.310 ± 1.120 ppb/yr      -0.070 ± 1.306 ppb/yr
    Total PM2.5 UTEP     -0.330 ± 2.101 µg/m3/yr    -0.340 ± 2.063 µg/m3/yr
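    The linear trends in Table 8 come from OLS regression of the annual averages on year. The sketch below computes the slope and its standard error from n - 2 degrees of freedom. The report does not state whether the ± values in Table 8 are standard errors or confidence-interval half-widths, so the uncertainty definition here is an assumption; a 95% interval would multiply the standard error by the appropriate t quantile. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def ols_trend(years: Sequence[float], values: Sequence[float]) -> Tuple[float, float]:
    """Slope of values versus years and its standard error (n - 2 degrees of freedom)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values)) / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(years, values))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope, std_err
```

    A trend whose magnitude is smaller than its uncertainty, as for most entries in Table 8, is consistent with no statistically significant change.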


    Figure 10. Original (dashed lines) and meteorologically adjusted (solid lines) annual averages for total and background O3 for COMB. Equations for the OLS linear regressions are shown on the plot as well.


    Figure 11. Original (dashed lines) and meteorologically adjusted (solid lines) annual averages for total O3 for ELP. Equations for the OLS linear regressions are shown on the plot as well.


    Figure 12. Original (dashed lines) and meteorologically adjusted (solid lines) annual averages for total PM2.5 for UTEP. Equations for the OLS linear regressions are shown on the plot as well.


    3 Task 3: Background O3

    3.1 Daily Estimates of Regional Background O3 (TCEQ Method)

    As described in the Work Plan, our approach to calculating regional background concentrations follows the TCEQ method described in Berlin et al. (2013). This method requires the selection of background sites; the calculation of the MDA8 O3 at each site; the estimation of a preliminary background value as the lowest of the valid values at the background sites; and further investigation to ensure the values are appropriate background estimates. These steps are described in detail below.

    The initial data for our analysis, provided by Erik Gribbin of the TCEQ, consisted of hourly average measurements of O3 and PM2.5 at several sampling sites surrounding the El Paso area. Background sites were chosen based on their distance from the approximate center point of the study area. The data from Texas, New Mexico, and Mexico were merged into a single dataset for O3, which we then processed using the previously developed scripts from Alvarado et al. (2015). Background MDA8 O3 values were calculated for this combined dataset (COMB) only. Figure 13 shows the monitoring sites relative to the approximate center of El Paso, with distance rings out to 20 km. Any site more than approximately 12 km from the center was considered a “background site”. Table 2 lists all of the sites provided by the TCEQ, and Table 9 lists the sites designated as background.

    Table 9. AQS site numbers for the selected background sites.

    Total # of Sites | # of Background Sites | AQS Site Numbers of Background Sites
    14 | 8 | 481410029, 481410057, 481410058, 800060004, 8, 20, 21, 22


    Figure 13. Monitoring sites in the New Mexico, Juárez, and El Paso area provided by the TCEQ.

    3.2 Temporal Trends of Background O3


    Figure 14. Box-and-whisker plots of the seasonal (top) and annual (bottom) trends in background MDA8 O3 for COMB estimated using the TCEQ method. The line is the mean. Box edges show the 25th and 75th percentiles (the interquartile range, or IQR), the whiskers show the data range up to ±1.5 × IQR, and the circles show data points outside that range.

    Figure 14 shows the seasonal (top) and annual (bottom) trends in background MDA8 O3. Background MDA8 O3 appears to peak in May and in August. Annual background MDA8 O3 is lower in 2011 and 2012.
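For reference, the box-plot elements described in the caption of Figure 14 can be computed for each group (month or year) as in the sketch below. It assumes NumPy's default linear-interpolation percentiles and Tukey-style whiskers drawn to the most extreme points inside the 1.5 × IQR fences; plotting packages differ slightly in both choices, and `box_stats` is a hypothetical helper.

```python
import numpy as np


def box_stats(values, k: float = 1.5) -> dict:
    """Summary statistics for one box in a box-and-whisker plot."""
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]
    q1, q3 = np.percentile(x, [25, 75])  # assumption: linear interpolation
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - k * iqr, q3 + k * iqr
    inside = x[(x >= lo_fence) & (x <= hi_fence)]
    return {
        "mean": float(x.mean()),  # the caption states the line marks the mean
        "q1": float(q1),
        "q3": float(q3),
        "whisker_low": float(inside.min()),   # most extreme points within the fences
        "whisker_high": float(inside.max()),
        "outliers": sorted(float(v) for v in x[(x < lo_fence) | (x > hi_fence)]),
    }
```

For the example values 1 through 9 plus 100, the quartiles are 3.25 and 7.75, the whiskers end at 1 and 9, and 100 is drawn as an outlier circle.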


    3.3 Alternative Methods to Determine Regional Background O3

    3.3.1 Determining Background O3 with PCA

    In Langford et al. (2009), principal component analysis (PCA) was applied to a large dataset of MDA8 O3 values at 30 sites across HGB over a 2.5-month period (August to October 2006). The PCA approach attempts to isolate the large day-to-day regional changes in MDA8 O3, and Langford et al. (2009) were able to associate regional meteorological patterns with the patterns of covariance identified by the PCA, using meteorological data from the National Centers for Environmental Prediction (NCEP) reanalysis. They found that nearly 84% of the variance in MDA8 O3 near HGB was described by the first principal component (PC1), which could be attributed to the regional background ozone concentration. PC2 and PC3 described 6% and 3.5% of the variance in MDA8 O3 and were attributed to local photochemistry and transport, respectively. After determining that PC1 described the large majority of the variance and represented regional background ozone, they applied the following equation to calculate the hourly background ozone for HGB, $O_3^{\mathrm{bkg}}(t)$:

    $$O_3^{\mathrm{bkg}}(t) = \overline{O_3} + \sigma(O_3)\, f_1\, \alpha_1(t)$$

    where $\overline{O_3}$ is the mean of all MDA8 ozone values for the entire time period, $\sigma(O_3)$ is the standard deviation of those values, $f_1$ is the variance contribution of PC1 (0.84 in Langford et al., 2009), and $\alpha_1(t)$ is the score (or amplitude) of PC1 at each hour.

    Performing a PCA of our MDA8 O3 data for COMB required that we first create a complete, interpolated dataset without any missing values. We calculated MDA8 values following the steps described in Section 2.1.1. We then removed any site with fewer than 75% valid data points during the ozone season (March to October) over the 10-year period (2007-2016). After this filtering, all data for 2012 had been removed. Next, we spatially interpolated the dataset to replace any missing MDA8 O3 values. If a site with a missing value on a given day was located outside the cluster of sites with valid data that day, we applied nearest-neighbor interpolation; if it was located within that cluster, we applied cubic interpolation in latitude and longitude.
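A minimal sketch of the site filtering and spatial gap-filling described above, using SciPy's `scipy.interpolate.griddata` as an assumed stand-in for the R routines actually used. Here "within the cluster of sites with valid data" is interpreted as inside their convex hull, which is where `griddata` with `method="cubic"` returns finite values; points outside it fall back to `method="nearest"`. Longitude and latitude are treated as planar coordinates, and `filter_sites` and `fill_day` are hypothetical names.

```python
import numpy as np
from scipy.interpolate import griddata


def filter_sites(mda8: np.ndarray, min_valid: float = 0.75):
    """Drop site columns with less than min_valid fraction of valid days.

    mda8: (n_days, n_sites) ozone-season MDA8 values, NaN = missing.
    """
    keep = np.mean(~np.isnan(mda8), axis=0) >= min_valid
    return mda8[:, keep], keep


def fill_day(lon: np.ndarray, lat: np.ndarray, values: np.ndarray) -> np.ndarray:
    """Fill one day's missing site values: cubic inside the hull, nearest outside."""
    ok = ~np.isnan(values)
    filled = values.copy()
    if ok.all():
        return filled
    pts = np.column_stack([lon[ok], lat[ok]])
    targets = np.column_stack([lon[~ok], lat[~ok]])
    # Cubic (Clough-Tocher) interpolation returns NaN outside the convex hull.
    # It needs at least 3 valid, non-collinear sites on that day.
    est = griddata(pts, values[ok], targets, method="cubic")
    outside = np.isnan(est)
    if outside.any():
        est[outside] = griddata(pts, values[ok], targets[outside], method="nearest")
    filled[~ok] = est
    return filled
```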

    Once this complete dataset was established, we applied the PCA, using the eigenvector-eigenvalue calculation in R, to the entire 10-year time span for COMB. The resulting variance contributions were similar to those of Langford et al. (2009): PC1, PC2, and PC3 explained 83%, 6%, and 3% of the variance, respectively. However, when we assumed that PC1 represented the regional background contribution and applied the above equation, the resulting values were correlated with our original background estimates from the TCEQ method (R² = 0.53) but spanned a much larger and unrealistic range of background concentrations (-35 to 115 ppbv), as seen in Figure 15.
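The PC1-based background calculation can be sketched with NumPy as below. This is an assumed equivalent of the eigenvector-eigenvalue calculation in R, not the project's code. The sketch performs the PCA on the site-by-site correlation matrix, takes $\alpha_1(t)$ as the projection of each day's standardized MDA8 values onto the unit-length PC1 eigenvector, and fixes the arbitrary eigenvector sign so that PC1 rises with the site-mean MDA8; all three are assumptions, and `pca_background` is a hypothetical name.

```python
import numpy as np


def pca_background(mda8: np.ndarray):
    """PC1-based background estimate following the equation above.

    mda8: (n_days, n_sites) gap-free MDA8 O3 (ppbv).
    Returns (background per day, variance fraction of each PC).
    """
    o3_mean = mda8.mean()         # mean of all MDA8 values
    o3_sd = mda8.std(ddof=1)      # standard deviation of all MDA8 values
    # Standardize each site so the PCA is done on the correlation matrix.
    z = (mda8 - mda8.mean(axis=0)) / mda8.std(axis=0, ddof=1)
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)   # returned in ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    frac = eigvals / eigvals.sum()            # f_k, variance contributions
    alpha1 = z @ eigvecs[:, 0]                # assumed scaling of the PC1 score
    # Eigenvector signs are arbitrary; orient PC1 to rise with site-mean MDA8.
    if np.corrcoef(alpha1, mda8.mean(axis=1))[0, 1] < 0:
        alpha1 = -alpha1
    return o3_mean + o3_sd * frac[0] * alpha1, frac
```

A raw projection score of this kind has a variance equal to the PC1 eigenvalue, which grows with the number of sites, so the spread of the resulting background values depends strongly on how $\alpha_1(t)$ is normalized.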


    Figure 15. PCA-derived background ozone applied over the entire ozone season dataset (x-axis, 10 years and all sites) compared to our original TCEQ method of determining background ozone (y-axis).


    Figure 16. Monthly (top) and yearly (bottom) background ozone (ppbv) as derived using the PCA method.

    3.3.1.1 PCA-derived Background O3 Temporal and Spatial Analysis

    Figure 16 shows the monthly and yearly box plots of background MDA8 O3 for COMB as determined by the PCA described above. These results compare well with the temporal analysis of TCEQ-method background MDA8 O3 discussed in Section 3.2, except that the PCA results show an increase in 2011 where the TCEQ method shows a decrease.

    3.3.1.2 Comparing Trends in Background O3 from the PCA and TCEQ Methods

    After completing background estimates using the PCA and TCEQ methods, we compared the yearly average concentrations from both methods. The PCA-method background estimates are larger than those of the TCEQ method, but the annual trend is smaller and effectively zero (-0.010 ± 0.639 ppb/yr). However, as discussed above, we believe the TCEQ-based method is more reliable for calculating regional background in El Paso.
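A trend of this kind (slope ± uncertainty, in ppb/yr) can be computed with an ordinary least-squares fit to the yearly averages, as sketched below. The report does not state whether the ± term is a standard error or a confidence interval; this sketch returns the standard error of the slope, and `ols_trend` is a hypothetical helper. Years without data (such as 2012 in the PCA results) are simply omitted from the inputs.

```python
import numpy as np


def ols_trend(years, values):
    """Slope (units/yr), intercept, and slope standard error of an OLS line."""
    x = np.asarray(years, dtype=float)
    y = np.asarray(values, dtype=float)
    n = x.size
    xc = x - x.mean()
    sxx = np.sum(xc ** 2)
    slope = np.sum(xc * (y - y.mean())) / sxx
    intercept = y.mean() - slope * x.mean()
    resid = y - (intercept + slope * x)
    se_slope = np.sqrt(np.sum(resid ** 2) / (n - 2) / sxx)  # requires n > 2
    return slope, intercept, se_slope
```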


    4 Task 4: The Role and Importance of Synoptic or Mesoscale Meteorological Conditions in Creating High O3 and PM2.5 Days

    4.1 Synoptic Map Type Analysis

    4.1.1 Technical Method and Results

    The five most common synoptic map types during this period were identified using the methods of Hegarty et al. (2007) and Alvarado et al. (2015), which enabled the classification of 69.4% of the days during the 10-year study period (and 62.3% of the days in the ten March-October O3 seasons) as being under the influence of one of those types. The five types, shown in Figure 17, are described below.

    1. MT (Map Type) 1 occurred on 688 days (494 during the O3 seasons) and featured an anticyclone over the eastern Gulf of Mexico with a trough in the Central Plains extending into northwest Texas. These features produced a general south-southwest flow over much of Texas.

    2. MT 2 occurred on 474 days (166 during the O3 seasons) and featured a cyclonic circulation centered over the Midwest with a ridge extending southeast to northwest over Mexico and extreme southern Texas. This pattern likely produces a light NW flow over much of Texas.

    3. MT 3 occurred on 486 days (463 during the O3 seasons) and featured a large anticyclone centered over the eastern Gulf of Mexico states extending into the Gulf and westward into eastern Texas. A broad trough is aligned along the eastern Rocky Mountains. This pattern produces moderate to strong southeasterly flow over much of eastern Texas but low wind conditions over El Paso.

    4. MT 4 occurred on 574 days (236 during the O3 seasons) and featured a broad trough in the Central Plains with an anticyclone centered over the western Caribbean of southern Florida and extending westward into eastern Texas. This pattern produces a general southwest flow over Texas.

    5. MT 5 occurred on 318 days (177 during the O3 seasons) and featured an anticyclone over the western Gulf of Mexico. This pattern produces a south to southwesterly flow over much of Texas.

    Days that do not fit any of the five types are labeled type “-999”. These days generally occurred under weak synoptic forcing, which is consistent with stagnant conditions in the area. Figure 18 shows the relative frequency of each synoptic type by month. The frequency of MT 3 (Gulf flow) shows a strong seasonal cycle, rising from near zero in winter to a peak of ~26% of days in July. MT 2 and MT 4 show the opposite seasonal cycle, being much more frequent in winter than in summer. Unclassified days (MT -999), with little synoptic forcing, are most frequent in August and September.
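The monthly relative frequencies plotted in Figure 18 amount to a normalized count of map types per calendar month. A minimal sketch, assuming one integer map-type code per day (with -999 for unclassified days) and a hypothetical helper name:

```python
from collections import Counter


def monthly_relative_frequency(months, map_types):
    """Fraction of days in each calendar month (1-12) assigned to each map type."""
    counts = {m: Counter() for m in range(1, 13)}
    for month, mt in zip(months, map_types):
        counts[month][mt] += 1
    freq = {}
    for month, c in counts.items():
        total = sum(c.values())
        # Months with no days in the input get an empty mapping.
        freq[month] = {mt: n / total for mt, n in c.items()} if total else {}
    return freq
```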

    We then determined the mean, standard deviation, and quartiles of both total and background MDA8 O3 for each synoptic type. Figure 19 shows box plots of the O3 distributions for each synoptic type for COMB and Figure 20 shows the same for ELP.

    In order to determine whether there was a relationship between synoptic type and the likelihood of high total or background MDA8 O3 values, we first needed a quantitative definition of a “high” value of each metric. We derived these definitions by examining the 90th percentile of the distribution of each metric: 51.6 ppb for background MDA8 O3 and 69.1 ppb for total MDA8 O3 COMB. We then chose criteria roughly in line with these 90th percentiles: 70 ppbv for total MDA8 O3 and 55 ppbv for background MDA8 O3. The percentage of days below these criteria (i.e., the percentile corresponding to each chosen criterion) is 94% for total MDA8 O3 COMB and 97% for background MDA8 O3. Table 10 shows the percentage of days above each criterion (i.e., “high” days) for each MT.
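The threshold derivation can be sketched as below: compute the 90th percentile of a metric, then the percentile rank of the chosen round-number criterion. The sketch assumes NumPy's default linear-interpolation percentile and counts a day as "below" the criterion when it is at or below it (so that "high" means strictly above); the report does not specify its tie handling, and `threshold_summary` is a hypothetical name.

```python
import numpy as np


def threshold_summary(values, criterion: float, pct: float = 90.0):
    """Return the pct-th percentile and the percentage of days at or below criterion."""
    v = np.asarray(values, dtype=float)
    v = v[~np.isnan(v)]
    p = float(np.percentile(v, pct))
    pct_below = 100.0 * float(np.mean(v <= criterion))  # assumed tie handling
    return p, pct_below
```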

    Figure 17. Synoptic map types determined from 850 mbar geopotential height fields using the method of Hegarty et al. (2007), but with the 12-km resolution NAM-12 meteorology used to drive HYSPLIT in place of the 32-km resolution NARR data.


    Figure 18. Relative frequency of synoptic map types in each month. [Chart axes: relative frequency (0-100%) by month (Jan-Dec); series MT-999, MT1, MT2, MT3, MT4, MT5.]


    Figure 19. Box and whisker plots of the distributions of background MDA8 O3 COMB (ppbv, top) and total MDA8 O3 COMB (ppbv, bottom). The boundaries of the boxes are the 25th and 75th percentiles, and the whiskers cover the range of the data or all values within 1.5 times the interquartile range (IQR) of the box, whichever is smaller.


    Figure 20. Box and whisker plots of the distributions of total MDA8 O3 ELP. The boundaries of the boxes are the 25th and 75th percentiles, and the whiskers cover the range of the data or all values within 1.5 times the interquartile range (IQR) of the box, whichever is smaller.


    Table 10. Percentage of observations above the criteria chosen to represent “high” values of total and background (bkg) MDA8 O3 during the O3 season (Mar.-Oct.). The chosen criteria are in parentheses in the first column.

    Percentage Above Criteria (%), by Synoptic Type

    Pollutant Metric (criterion) | -999 | 1 | 2 | 3 | 4 | 5
    Total MDA8 O3 COMB (70 ppbv) | 10.94 | 5.27 | 4.82 | 12.96 | 3.39 | 5.08
    Total MDA8 O3 ELP (70 ppbv) | 5.8 | 3.85 | 3.01 | 7.99 | 1.69 | 3.39
    Bkg MDA8 O3 (55 ppbv) | 5.8 | 4.67 | 3.01 | 3.89 | 5.08 | 3.95
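The entries in Table 10 are per-type exceedance percentages. A minimal sketch, assuming the inputs are already restricted to O3-season days, that "above" means strictly greater than the criterion, and that missing values (NaN) are excluded; `pct_above_by_type` is a hypothetical helper.

```python
import math


def pct_above_by_type(values, map_types, criterion: float) -> dict:
    """Percentage of valid days above criterion, per synoptic map type."""
    out = {}
    for mt in sorted(set(map_types)):
        sel = [v for v, t in zip(values, map_types) if t == mt and not math.isnan(v)]
        if sel:  # skip types with no valid observations
            out[mt] = 100.0 * sum(v > criterion for v in sel) / len(sel)
    return out
```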

    4.1.2 Discussion

    Under these criteria, there are no synoptic types under which high values never occur and none under which they always occur. Thus, synoptic type by itself is neither necessary nor sufficient to determine whether a given day will have elevated levels of O3.

    In addition, the distributions of background and total O3 (both COMB and ELP) are relatively constant across synoptic types, as shown in Figures 19 and 20, suggesting that synoptic type has relatively little influence on background and total O3 concentrations in El Paso. However, MT 3 does have a higher percentage of days exceeding the total O3 threshold of 70 ppbv (13% for COMB, 8% for ELP), suggesting that this type is associated with more frequent extreme values of total O3. The percentage of days with background O3 above 55 ppbv seems less influenced by synoptic type and is highest for the generally stagnant conditions that are not classified as any type (MT -999).


    4.2 Urban-Scale Meteorological Predictors of O3

    4.2.1 Logistic Regression Approach

    One goal of this project (Deliverable 4.2) is to determine if there are necessary and/or sufficient synoptic or urban-scale meteorological criteria for events of “high” total and background O3 (here we again define “high” using the criteria in Section 4.1.1). There are likely no conditions where the probability of high O3 is negligibly close to zero or one. Thus, in order to make our investigation of “necessary and/or sufficient” conditions for high O3 tractable, we adopt the following probability definitions, recognizing that they are arbitrary choices:

    • “Necessary” will refer to conditions that must be true for the probability of high O3 (as defined in Section 4.1.1) to be greater than 20%.

    • “More likely than not” will refer to conditions that, when true, give a greater than 50% chance of high O3.

    • “Sufficient” will refer to conditions that, when true, give a greater than 80% chance of high O3.
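One way to apply these definitions to a model's predicted probabilities (for example, on a grid of temperature and wind speed values) is sketched below. This operationalization is an assumption, not a procedure stated in the report: a condition is treated as "necessary" if the probability never exceeds 20% where the condition is false, and as "more likely than not" or "sufficient" if the probability exceeds 50% or 80% everywhere the condition is true. `classify_condition` is a hypothetical name.

```python
import numpy as np


def classify_condition(prob, condition) -> dict:
    """Check one candidate condition against the probability definitions above.

    prob: predicted probabilities of a high O3 day (e.g., on a grid of
          temperature and wind speed); condition: boolean mask, same shape.
    """
    p = np.asarray(prob, dtype=float)
    c = np.asarray(condition, dtype=bool)
    return {
        # Outside the condition, P(high O3) never exceeds 20%.
        "necessary": bool(np.all(p[~c] <= 0.2)),
        # Inside the condition, P(high O3) always exceeds 50% or 80%.
        "more_likely_than_not": bool(c.any() and np.all(p[c] > 0.5)),
        "sufficient": bool(c.any() and np.all(p[c] > 0.8)),
    }
```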

    Two ways to determine necessary and/or sufficient meteorological conditions have already been presented. First, the GAMs described in Sections 2.2.1 and 2.3 can be used to predict the actual values of total and background O3 given the set of urban-scale meteorological predictors listed in Table 6. These predicted values and their confidence intervals can be used to estimate the probability that there will be high O3 given a set of meteorological conditions. Second, in Section 4.1.2 we have shown that the probability of high O3 events does vary between synoptic types, with type MT3 having the highest frequency of high O3 events.

    Here we use the technique of logistic regression to create GAMs relating smooth functions of urban-scale and synoptic-scale meteorological variables to the probability that a high O3 event will occur. Similar to the GAM equation in Section 2.1.1, the logistic regression equation is given by

    $$g(\mu_i) = \beta_0 + f_1(x_{i,1}) + f_2(x_{i,2}) + \cdots + f_n(x_{i,n}) + S_m$$

    where $\mu_i$ is the expected value of the $i$th day's indicator of whether a high O3 event occurred (coded as 1 for true and 0 for false), i.e., the probability of a high O3 event on that day; $g(\mu_i)$ is the “link” function (here, a logit link, $g(\mu) = \ln[\mu/(1-\mu)]$, is used with a binomial probability distribution, unlike the log link and Gaussian distribution used for the GAMs of Section 2.1.1); and $x_{i,j}$ are the $n$ urban-scale meteorological predictors, with each corresponding $f_j(x_{i,j})$ being an (initially unknown) smooth function of $x_{i,j}$ constructed from a cubic-spline basis set. We do not include the day-of-week, year, and day-of-year variables in our logistic regression. Instead, we include a factor ($S_m$) describing the synoptic types defined in Section 4.1. To reduce the possibility of over-fitting the data, we set the “gamma” parameter to 1.4 for these fits, as recommended by Wood (2006).
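The logit link and its inverse, which convert between the additive predictor on the right-hand side and the probability $\mu_i$, are sketched below. The `prob_high` helper and its inputs are hypothetical: it takes already-evaluated smooth terms $f_j(x_{i,j})$ and a synoptic-type effect $S_m$ rather than fitting them. Fitting is not shown; in R, a model of this form can be fit with the mgcv package's `gam()` function using `family = binomial` and `gamma = 1.4`, although the report does not name its fitting software here.

```python
import numpy as np


def logit(p):
    """Link function g(mu) = ln(mu / (1 - mu))."""
    p = np.asarray(p, dtype=float)
    return np.log(p / (1.0 - p))


def inv_logit(eta):
    """Inverse link: probability from the linear predictor."""
    return 1.0 / (1.0 + np.exp(-np.asarray(eta, dtype=float)))


def prob_high(beta0: float, smooth_values, synoptic_effect: float):
    """P(high O3) = inv_logit(beta0 + sum_j f_j(x_ij) + S_m).

    smooth_values: already-evaluated f_j(x_ij) terms (hypothetical inputs;
    in practice they come from the fitted GAM).
    """
    eta = beta0 + np.sum(smooth_values, axis=0) + synoptic_effect
    return inv_logit(eta)
```

A linear predictor of zero corresponds to a 50% chance of a high O3 day, which is the "more likely than not" boundary defined above.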

    In order to simplify our analysis, we focused on just two urban-scale meteorological predictors: afternoon mean temperature and daily average wind speed. These variables were chosen because they appeared to have the biggest impact on the predicted values of both total and background O3 in our GAM fits from Sections 2.2 and 2.3. We then plot the probability of a high O3 event estimated by the logistic regression equation as a function of afternoon mean temperature and daily average wind speed, with a separate panel for each synoptic type. This results in three figures with six panels each (Figures 21, 22, and 23).


    Figure 21. Probability of the total MDA8 O3 COMB exceeding 70 ppbv as a function of afternoon mean temperature (°C), daily wind speed (m/s), and synoptic type (as defined in Section 4.1).


    Figure 22. Probability of the total MDA8 O3 ELP exceeding 70 ppbv as a function of afternoon mean temperature (°C), daily wind speed (m/s), and synoptic type (as defined in Section 4.1).


    Figure 23. Probability of the background MDA8 O3 exceeding 55 ppbv as a function of afternoon mean temperature (°C), daily wind speed (m/s), and synoptic type (as defined in Section 4.1).


    4.2.2 Results and Discussion

    The percent of the deviance¹ explained by each logistic model and the Un-Biased Risk Estimator² (UBRE) score for each model are given in Table 11. The models explain the largest percentage of deviance for total MDA8 O3 ELP, but even here only 28% of the deviance is explained, suggesting that most of the variability is due to factors not included in the model.

    Daily average wind speed and afternoon temperature are always significant predictors at the α = 0.001 level. The differences between the factors for the synoptic types are occasionally significant, but many types are found to be similar to each other, as expected from our discussion in Section 4.1.2.

    Figures 21 through 23 show that high total O3 events in the combined El Paso-Ciudad Juárez-New Mexico area (COMB, Figure 21) and in the El Paso area only (ELP, Figure 22) are primarily a function of wind speed and temperature, with high temperatures and low wind speeds favoring high O3 events, as expected. The regional background O3 shows a similar dependence on temperature and wind speed, but the probability of background O3 above 55 ppbv is always lower than the probability of total O3 above 70 ppbv.

    Table 11. Deviance explained (%) and UBRE score (unitless) for the logistic models for total and background O3 COMB and total O3 ELP.

    GAM | Deviance Explained (%) | UBRE
    Total MDA8 O3 COMB | 26.9 | -0.54667
    Bkg MDA8 O3 COMB | 13.4 | -0.65355
    Total MDA8 O3 ELP | 28.0 | -0.69676

    ¹ “Deviance” plays a role in GAMs similar to that of the variance of the residuals in ordinary linear models (see Wood, 2006, p. 70 for the full definition). The percent of deviance explained by a GAM is a generalization of r² for ordinary linear models (Wood, 2006, p. 84).

    ² For logistic regression with GAMs, minimizing the UBRE score (see Wood, 2006, p. 172 for the full definition) is equivalent to minimizing the expected mean square error of the model. The lower the score, the better the model fit.
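The two diagnostics in Table 11 can be computed from the 0/1 observations and the fitted probabilities as sketched below. The UBRE expression used here, $D/n - 1 + 2\gamma\,\mathrm{edf}/n$ for a binomial model with the scale parameter fixed at 1, is our reading of the general form in Wood (2006) and should be checked against that reference; the effective degrees of freedom (`edf`) must come from the fitted model, and the function names are hypothetical.

```python
import numpy as np


def binomial_deviance(y, p) -> float:
    """Deviance of 0/1 outcomes y under predicted probabilities p."""
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1 - 1e-12)  # avoid log(0)
    return float(-2.0 * np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))


def deviance_explained(y, p) -> float:
    """Percent deviance explained relative to a constant-probability (null) model."""
    y = np.asarray(y, dtype=float)
    null_dev = binomial_deviance(y, np.full(y.size, y.mean()))
    return 100.0 * (null_dev - binomial_deviance(y, p)) / null_dev


def ubre(y, p, edf: float, gamma: float = 1.4) -> float:
    """UBRE score with scale fixed at 1 (assumed form: D/n - 1 + 2*gamma*edf/n)."""
    n = len(y)
    return binomial_deviance(y, p) / n - 1.0 + 2.0 * gamma * edf / n
```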


    5 Quality Assurance Steps and Reconciliation with User Requirements

    All work on the project was done in accordance with the Quality Assurance Project Plan (QAPP). All scripts and data files used in this project were inspected by team members different from the original author to ensure they were correct, and any errors noted in early versions were fixed. Other required evaluations are contained within the report. In addition, if further analysis or feedback from the TCEQ uncovers any errors in the provided files, we will correct those and provide the TCEQ with corrected files.

    In addition, the QAPP listed several questions that needed to be addressed for each project task. These questions are addressed below.

    5.1 Task 2: Development of GAMs

    • Do the relationships between meteorological variables and O3 and PM2.5 described in the developed GAMs make physical sense given our conceptual models of O3 and PM2.5 emissions, chemistry, and transport? The functional dependencies in the GAMs between the predictors related to temperature, water vapor density, wind speed, vertical stability, and HYSPLIT bearing are all qualitatively consistent with our conceptual understanding of O3 and PM2.5 emissions, chemistry, and transport.

    • Are these relationships consistent with the scientific literature? Our GAMs for MDA8 O3 here are consistent with those found for eastern US cities by Camalier et al. (2007) and for other Texas urban areas by Alvarado et al. (2015).

    • Are the HYSPLIT back-trajectories used in the model development reasonable? How sensitive are these trajectories to the initial location? The HYSPLIT back-trajectories used in the model development appear reasonable and generally consistent with the surface wind speed and direction measured in El Paso.

    • How well does the GAM reproduce the testing sets in the cross-validation evaluation? The ten-fold cross-validation showed that the GAMs fit the data withheld from the training about as well as they fit the training data, giving little evidence of over-fitting (Section 2.2.4).

    • Does the cross-validation evaluation of the models show evidence of over-fitting? As noted in Section 2.2.4, there is no evidence of over-fitting in the MDA8 O3 predictions.

    • Under what conditions are the GAMs expected to be valid? What conditions give exceptionally large residuals? Strictly speaking, the GAMs are only expected to be valid during the periods for which they were fit, and when the data are taken from the sources and sites noted in this report. Extrapolations to other times and monitoring locations may be problematic, and the GAMs' ability in this regard has not been assessed in this project. We have not identified any set of necessary or sufficient conditions that lead to large residuals in the GAMs.


    5.2 Task 3: Background O3 and PM2.5

    • Are the derived background estimates, and their spatial and temporal variation, consistent with our conceptual models of O3 and PM2.5 emissions, chemistry, and transport? The trend in regional background O3 for the El Paso and Ciudad Juárez urban area is increasing, but the trend is not statistically significant. While US emissions of NOx are decreasing, which should decrease regional background O3, the influence of Mexican emissions may be keeping the trend effectively zero. Background O3 peaks in May and August but is mostly stable between these months.

    • Are these estimates consistent with the scientific literature? We are unaware of any previous literature estimates of background O3 in the El Paso and Ciudad Juárez urban areas. However, the values derived here are consistent with our previous work in other urban areas in Texas (Alvarado et al., 2015).

    • What are the uncertainties in the background estimates, and under what conditions are they valid? The major uncertainties in the background estimates calculated using the TCEQ method are, first, that they assume the regional background can be estimated as the lowest value observed at a selected number of sites around the urban area. This neglects the fact that urban areas in Texas and Mexico likely infl