U.S. Environmental Protection Agency
Conflation, Data Quality and MADness
ESRI Developer Meetup, June 7th, 2011
USEPA Office of Environmental Information
David G Smith PE PLS 202-566-0797 [email protected] Twitter: @DruidSmith

ESRI DevMeetup 201100607

DESCRIPTION

ESRI Developer Meetup June 7 2011 - Conflation, Data Quality and Record-Level Metadata

Page 2: ESRI DevMeetup 201100607

Metadata??

Page 3: ESRI DevMeetup 201100607

FRS Overview

• Facility Registry System

• FRS is a data aggregator

• FRS performs integration, validation and QA across more than 30 federal databases and more than 50 state, territory and tribal databases

• FRS contains information on nearly 2.8 million facilities

• > 80% of facilities have lat/long information

Page 4: ESRI DevMeetup 201100607

What FRS Does

• FRS improves program facility data validity from 40% to 95% by selecting the best contact and location information from multiple data sources

• Allows EPA, public, academic, and investment communities to evaluate compliance with environmental regulations

• Provides a robust, complete view of facility information, facilitating cross-media analyses:
– Community-based initiatives
– Environmental justice analyses
– NEPA assessments
– Emergency response
– Other mission needs (TMDL program, climate change analysis, etc.)

Page 5: ESRI DevMeetup 201100607

FRS Features

• Provides a more complete, holistic, cross-media view of key facility information
– through verification and
– data management procedures

• Incorporates layers of quality control – the FRS record is checked for completeness, consistency, and validity and is owned by FRS

• Integrates information from program national systems, state master facility records, tribal partners, and other federal agencies

• Supported by a network of data stewards covering
– both geographic and
– programmatic areas of expertise

• Fully integrated with the Locational Data and the Integrated Error Correction Process (IECP)

Page 6: ESRI DevMeetup 201100607

FRS Features

• Provides essential support for applications that rely on integrated views of facilities
– GIS applications (EnviroMapper, MyEnvironment)
– Public access applications (Envirofacts, Cleanups in My Community (CIMC))
– Enforcement systems and applications (IDEA, OTIS, ECHO, ICIS)

• Offers specialized services to applications in need of accurate facility information
– Emergency Response
– TRI-ME web
– DMR Loadings Tool

• Provides web services, enabling data exchanges with state partners on the Environmental Exchange Network

Page 7: ESRI DevMeetup 201100607

FRS Scope

Major Programs Represented in FRS

http://www.epa.gov/enviro/html/frs_demo/new_crosswalks.html

• Air: AFS, AQS, CAMDBS, EGRID, NEI, RBLC, RFS (Ethanol)

• Water: PCS, ICIS-NPDES, SDWIS, CWNS

• Chemical Releases: TRIS, RMP, TSCA, SSTS, FRP, BRAC

• Hazardous Waste: ACRES, CERCLIS, RCRAINFO, RADINFO

• Enforcement/Compliance: ICIS, ECRM, NCDB

• Schools: NCES, GNIS, BIA INDIAN SCHOOL

• Other: LANDFILL

Page 8: ESRI DevMeetup 201100607

FRS Data Model

[Diagram: High Level Data Model. Entities shown: Facility/Site, Environmental Interest, Supplemental Interest, Industrial Classification, Alternative Name, Geospatial, Affiliation, Individual, Organization, Mailing Address.]

Page 9: ESRI DevMeetup 201100607

FRS Data Pipeline

Clean & Validate

• Geo-codes & parses addresses

Integrate & Match

• Assigns a unique ID to each facility record

Select Best Pick

• Uses business rules to select the best contact/address & location
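
A minimal sketch of the first two pipeline stages, for illustration only. It assumes that matching can be approximated by comparing normalized address strings; the slides do not describe the actual FRS matching rules, and the function names, field names and sample addresses below are hypothetical. Geocoding is left out because it needs an external service.

```python
import re
from itertools import count


def normalize_address(raw: str) -> str:
    """Clean & Validate (simplified): uppercase, drop punctuation,
    collapse whitespace and abbreviate a few common street suffixes."""
    text = re.sub(r"[^\w\s]", " ", raw.upper())
    suffixes = {"STREET": "ST", "ROAD": "RD", "AVENUE": "AVE"}  # illustrative subset
    return " ".join(suffixes.get(word, word) for word in text.split())


def assign_registry_ids(records):
    """Integrate & Match (simplified): records whose normalized addresses
    are identical share one hypothetical registry ID."""
    ids_by_key = {}
    next_id = count(1)
    matched = []
    for record in records:
        key = normalize_address(record["address"])
        if key not in ids_by_key:
            ids_by_key[key] = next(next_id)
        matched.append({**record, "address_key": key, "registry_id": ids_by_key[key]})
    return matched


# Hypothetical program records, not real FRS data.
program_records = [
    {"program": "AIR", "address": "100 Main Street"},
    {"program": "WATER", "address": "100 MAIN ST."},
    {"program": "WASTE", "address": "250 River Road"},
]
for row in assign_registry_ids(program_records):
    print(row["program"], row["address_key"], row["registry_id"])
```

The third stage, Select Best Pick, depends on business rules and is not covered by this sketch.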

Page 10: ESRI DevMeetup 201100607

QA Process

[Flow diagram. Process steps: Format Addresses; Geocode Addresses; Standardize and Validate Geo Coordinates; Determine Facility Best Coordinate. Data inputs and outputs: Facility Addresses; Standard Format Addresses; FRS Facility Geocoded Coordinates; Program and State Geo Coordinates; Validated Geo Coordinates; Best Geographic Coordinates.]
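
One way the "Standardize and Validate Geo Coordinates" step could look, as a sketch only. It assumes some program systems report degrees/minutes/seconds while others report decimal degrees; the slides do not say which checks FRS actually runs, and both function names are hypothetical.

```python
def dms_to_decimal(degrees: int, minutes: int, seconds: float, hemisphere: str) -> float:
    """Convert degrees/minutes/seconds to signed decimal degrees.
    South and West hemispheres yield negative values."""
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must be in [0, 60)")
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere.upper() in ("S", "W") else value


def is_valid_coordinate(latitude: float, longitude: float) -> bool:
    """Basic range check; says nothing about whether the point is
    actually at the facility."""
    return -90 <= latitude <= 90 and -180 <= longitude <= 180


# 37 deg 27' 04" N, 77 deg 26' 02" W
lat = dms_to_decimal(37, 27, 4, "N")
lon = dms_to_decimal(77, 26, 2, "W")
print(round(lat, 6), round(lon, 6), is_valid_coordinate(lat, lon))
```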

Page 11: ESRI DevMeetup 201100607

Integration?

Air Permit Coordinate

Water Permit Coordinate

Toxics Permit Coordinate

Best Facility Coordinate?

Page 12: ESRI DevMeetup 201100607

Locational Data Accuracy and Best Pick

• FRS utilizes the EPA Lat/Long Data Standard

• Locational Reference Tables (LRT)

• Method Accuracy Description (MAD)

• Best Pick

Page 13: ESRI DevMeetup 201100607

Locational Reference Table

LRT Record ID | Conveyor of Record | Program System Name | Program System ID | Program System Subentity ID | Program Latitude | Program Longitude | Best Value | Collection Method | Accuracy Value | Scale | MOD Score | Reference Point | Insertion Date | Coordinate Source | Map Coordinate
12135178 | CEDS | CEDS | 200000072141 | | 37.4511 | -77.4339 | N | | | | 932.3551 | | 39105 | | MAP
12651135 | NEI | NEI | NEIVA2561 | | 37.450939 | -77.434273 | N | UNKNOWN | | | 932.3551 | AIR RELEASE STACK | 39105 | | MAP
14542018 | RCRIS | RCRAINFO | VAD009305137 | | 37.451667 | -77.433333 | N | | | | 1137.0184 | REGULATED ENTITY | 39112 | | MAP
15512736 | RMP | RMP | 1E+11 | | 37.451917 | -77.433361 | N | | | | 2.37 | CENTER OF FACILITY | 39967 | | MAP
15727233 | PCS | PCS | VA0004669 | 001N9 | 37.448888 | -77.423888 | N | | | | 898.2445 | WATER RELEASE PIPE | 39234 | | MAP
15727234 | PCS | PCS | VA0004669 | 101R9 | 37.448888 | -77.423888 | N | | | | 1.58 | WATER RELEASE PIPE | 39234 | | MAP
15727235 | PCS | PCS | VA0004669 | 101N9 | 37.448888 | -77.423888 | N | | | | 1.58 | WATER RELEASE PIPE | 39234 | | MAP
15727236 | PCS | PCS | VA0004669 | 101B9 | 37.448888 | -77.423888 | N | | | | 898.2445 | WATER RELEASE PIPE | 39234 | | MAP
15727237 | PCS | PCS | VA0004669 | 101A9 | 37.448888 | -77.423888 | N | | | | 1.58 | WATER RELEASE PIPE | 39234 | | MAP
15727238 | PCS | PCS | VA0004669 | 102N9 | 37.448888 | -77.423888 | N | | | | 1.58 | WATER RELEASE PIPE | 39234 | | MAP
15727239 | PCS | PCS | VA0004669 | | 37.451111 | -77.433889 | N | INTERPOLATION-MAP | 50 | 24000 | 2.37 | FACILITY CENTROID | 39819 | | MAP
15727240 | PCS | PCS | VA0004669 | 003A9 | 37.454166 | -77.3875 | N | | | | 1.94 | WATER RELEASE PIPE | 39234 | | MAP
15727241 | PCS | PCS | VA0004669 | 002A9 | 37.454166 | -77.3875 | N | | | | 1.94 | WATER RELEASE PIPE | 39234 | | MAP
15727242 | PCS | PCS | VA0004669 | 103N9 | 37.448888 | -77.423888 | N | | | | 1.58 | WATER RELEASE PIPE | 39234 | | MAP
16137349 | AIRS/AFS | AIRS/AFS | 5104100001 | | 37.451111 | -77.433889 | N | | | | 932.3551 | | 39819 | | MAP
16137350 | AIRS/AFS | AIRS/AFS | 5104100001 | 17 | 37.451111 | -77.434167 | N | | | | 932.3551 | | 39819 | | MAP
16137351 | AIRS/AFS | AIRS/AFS | 5104100001 | 16 | 37.451111 | -77.434167 | N | | | | 932.3551 | | 39819 | | MAP
16137352 | AIRS/AFS | AIRS/AFS | 5104100001 | 15 | 37.451111 | -77.434167 | N | | | | 932.3551 | | 39819 | | MAP
16137353 | AIRS/AFS | AIRS/AFS | 5104100001 | 14 | 37.451111 | -77.434167 | N | | | | 932.3551 | | 39819 | | MAP
16137354 | AIRS/AFS | AIRS/AFS | 5104100001 | 12 | 37.451111 | -77.434167 | N | | | | 932.3551 | | 39819 | | MAP
16137355 | AIRS/AFS | AIRS/AFS | 5104100001 | 10 | 37.451111 | -77.434167 | N | | | | 932.3551 | | 39819 | | MAP
16137356 | AIRS/AFS | AIRS/AFS | 5104100001 | 9 | 37.451111 | -77.434167 | N | | | | 932.3551 | | 39819 | | MAP
16137357 | AIRS/AFS | AIRS/AFS | 5104100001 | 8 | 37.451111 | -77.434167 | N | | | | 932.3551 | | 39819 | | MAP
16137358 | AIRS/AFS | AIRS/AFS | 5104100001 | 6 | 37.451111 | -77.434167 | N | | | | 932.3551 | | 39819 | | MAP
16137359 | AIRS/AFS | AIRS/AFS | 5104100001 | 19 | 37.451111 | -77.434167 | N | | | | 932.3551 | | 39819 | | MAP
16137360 | AIRS/AFS | AIRS/AFS | 5104100001 | 18 | 37.451111 | -77.434167 | N | | | | 932.3551 | | 39819 | | MAP
16137361 | AIRS/AFS | AIRS/AFS | 5104100001 | 2 | 37.451111 | -77.434167 | N | | | | 932.3551 | | 39819 | | MAP
16137362 | AIRS/AFS | AIRS/AFS | 5104100001 | 4 | 37.451111 | -77.434167 | N | | | | 932.3551 | | 39819 | | MAP
16137363 | AIRS/AFS | AIRS/AFS | 5104100001 | 1 | 37.451111 | -77.434167 | N | | | | 932.3551 | | 39819 | | MAP
16137364 | AIRS/AFS | AIRS/AFS | 5104100001 | 5 | 37.451111 | -77.434167 | N | | | | 932.3551 | | 39819 | | MAP
16446261 | TRIS-PREFERRED | TRIS | 23234DPNTSUSHIG | | 37.451667 | -77.435 | N | UNKNOWN | | | 28.6397 | UNKNOWN | 39323 | | MAP
16446262 | TRIS-REPORTED | TRIS | 23234DPNTSUSHIG | | 37.451666 | -77.435 | N | | | | 898.2445 | | 40265 | | MAP
17937134 | PCS | PCS | VA0004669 | 101O9 | 37.445833 | -77.429167 | N | | | | 898.2445 | WATER RELEASE PIPE | 39819 | | MAP

• All underlying information from programs is retained, including locational data

• For any given facility, there may be multiple individual locations that have been gathered, e.g. an associated air stack location, water outfall location, front gate location, et cetera

• MAD Codes help us to assess how to handle locational data quality as well as to understand what it represents

http://www.epa.gov/enviro/html/locational/lrt_viewer.html
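
To illustrate how far apart the locations gathered for one facility can be, the sketch below measures the great-circle distance between two of the LRT rows above: record 15727239 (facility centroid) and record 15727240 (a water release pipe). The haversine formula and the mean Earth radius are choices made for this example, not something the slides prescribe.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, an assumption of this sketch


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two lat/long points
    given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


centroid = (37.451111, -77.433889)  # LRT record 15727239, FACILITY CENTROID
outfall = (37.454166, -77.3875)     # LRT record 15727240, WATER RELEASE PIPE
spread_km = haversine_km(*centroid, *outfall)
print(f"{spread_km:.2f} km between centroid and outfall")
```

The two points are roughly 4 km apart, which is why what a coordinate represents (its reference point) matters as much as how accurately it was measured.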

Page 14: ESRI DevMeetup 201100607

MAD Codes

• MAD Codes help us to assess how to handle locational data quality

• They also help us to understand what the locational data represents

Page 16: ESRI DevMeetup 201100607

Select the “Best Pick” Information

• FRS maintains a database table of manual verifications in the LRT.
– EPA/Regional verifications trump State verifications.
– Manually verified locations trump all the rest regardless of calculated accuracy or QA checks.

• In automated processing, Superfund NPL Site locations trump everything.

• Our “normal” process is based on supplied or implied accuracy and QA checks performed (MAD codes).
– EPA Latitude/Longitude Data Standard (http://www.exchangenetwork.net/standards/Lat_Long_Standard_08_11_2006_Final.pdf)
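
The precedence rules above can be sketched as a sort key. This is an illustration under stated assumptions, not the FRS implementation: the `Candidate` fields are hypothetical, and the "normal" process is reduced to a single hypothetical `accuracy_m` value where smaller is treated as better. The real process uses MAD codes and QA checks whose scoring the slides do not spell out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    record_id: int
    latitude: float
    longitude: float
    accuracy_m: Optional[float] = None  # hypothetical stand-in for MAD-based accuracy
    verified_by: Optional[str] = None   # "EPA" (incl. Regional), "STATE", or None
    is_npl_site: bool = False           # Superfund NPL site location


def precedence(candidate: Candidate):
    """Lower tuples win: manual verifications first (EPA/Regional before
    State), then NPL site locations, then everything else by accuracy."""
    if candidate.verified_by == "EPA":
        tier = 0
    elif candidate.verified_by == "STATE":
        tier = 1
    elif candidate.is_npl_site:
        tier = 2
    else:
        tier = 3
    accuracy = candidate.accuracy_m if candidate.accuracy_m is not None else float("inf")
    return (tier, accuracy)


def select_best_pick(candidates: List[Candidate]) -> Candidate:
    if not candidates:
        raise ValueError("no candidate locations for this facility")
    return min(candidates, key=precedence)


# Hypothetical candidates for a single facility.
candidates = [
    Candidate(1, 37.4511, -77.4339, accuracy_m=5.0),
    Candidate(2, 37.4517, -77.4333, accuracy_m=500.0, verified_by="STATE"),
    Candidate(3, 37.4519, -77.4334, verified_by="EPA"),
    Candidate(4, 37.4489, -77.4239, accuracy_m=50.0, is_npl_site=True),
]
print(select_best_pick(candidates).record_id)
```

Encoding the rules as a tuple keeps the manual overrides strictly ahead of any calculated accuracy, which matches the "regardless of calculated accuracy" wording above.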

Page 17: ESRI DevMeetup 201100607

Business Case

• Users benefit from high quality integrated locational data for facilities toward enforcement, compliance, analysis, assessment and community impact

• Being able to assess and manage large amounts of data of varying quality, e.g. VGI

Page 18: ESRI DevMeetup 201100607

Thank You - URLs

Topic | URL
FRS Home Site | http://www.epa.gov/enviro/html/fii/
FRS Geodata Download | http://www.epa.gov/enviro/geo_data.html
My Environment | http://www.epa.gov/myenvironment/
EPA Geospatial Program | http://www.epa.gov/geospatial/index.html
EPA Geodata Gateway | https://geogateway.epa.gov/geoportal/catalog/main/home.page
EPA Geo Metadata | https://geogateway.epa.gov/EME/