Chapters 9 and 10, Longley et al.
Data Bases: Population and Maintenance
Geog 176B Lecture 8
Data Collection
One of the most expensive GIS activities
Many diverse sources (source integration, data fusion, interoperability)
Two broad types of collection
  Data capture (direct collection)
  Data transfer
Two broad capture methods
  Primary (direct measurement)
  Secondary (indirect derivation)
Stages in Data Collection Projects
Planning
Preparation
Digitizing / Transfer
Editing / Improvement
Evaluation
Data Collection Techniques
            Raster                          Vector
Primary     Digital remote sensing images   GPS measurements
            Digital aerial photographs      Survey measurements
Secondary   Scanned maps                    Topographic surveys
            DEMs from maps                  Toponymy data sets from atlases
Primary Data Capture
Capture specifically for GIS use
Raster – remote sensing
  e.g. SPOT and IKONOS satellites and aerial photography
  Passive and active sensors
Resolution is key consideration
  Spatial
  Spectral
  Temporal
www.spot.ucsb.edu
Imagery for GIS
Vector Primary Data Capture
Surveying
  Locations of objects determined by angle and distance measurements from known locations
  Uses expensive field equipment and crews
  Most accurate method for large scale, small areas
GPS
  Collection of satellites used to fix locations on Earth’s surface
  Differential GPS used to improve accuracy
Total Station
Pen/Portable PC and GPS
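The surveying bullet above describes fixing an object's location from an angle and a distance measured at a known location. A minimal sketch of that calculation, assuming planar coordinates, an azimuth measured clockwise from grid north, and a horizontal distance. The function name `locate_from_station` and the sample numbers are illustrative and not from the lecture.

```python
import math

def locate_from_station(east0, north0, azimuth_deg, distance_m):
    """Return (easting, northing) of a target observed from a known station."""
    az = math.radians(azimuth_deg)
    # Azimuth runs clockwise from north, so sine gives the east offset
    # and cosine gives the north offset.
    return (east0 + distance_m * math.sin(az),
            north0 + distance_m * math.cos(az))

# Hypothetical observation: target 50 m due east of station (1000, 2000).
east, north = locate_from_station(1000.0, 2000.0, 90.0, 50.0)
print(round(east, 3), round(north, 3))
```

Real survey reductions also handle instrument height, slope distance, and grid convergence, which this sketch leaves out.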
Secondary Geographic Data Capture
Data collected for other purposes can be converted for use in GIS
Raster conversion
  Scanning of maps, aerial photographs, documents, etc.
  Important scanning parameters are spatial and spectral (bit depth) resolution
Scanner
Raster to vector conversion
Vector Secondary Data Capture
Collection of vector objects from maps, photographs, plans, etc.
Digitizing
  Manual (table)
  Heads-up and vectorization
Photogrammetry – the science and technology of making measurements from photographs, etc.
Digitizer
Data Transfer
Buy vs. build is an important question
Many widely distributed sources of GI
Includes geocoding
Key catalogs include
  Geodata.gov
  Geography Network
Access technologies
  Translation
  Direct read
Managing Data Capture Projects
Key principles
  Clear plan, adequate resources, appropriate funding, and sufficient time
Fundamental tradeoff among quality, accuracy, speed and price
Two strategies
  Incremental
  ‘Blitzkrieg’
Alternative resource options
  In house
  Specialist external agency
Map scale       Ground distance corresponding to 0.5 mm map distance
1:1250          62.5 cm
1:2500          1.25 m
1:5000          2.5 m
1:10,000        5 m
1:24,000        12 m
1:50,000        25 m
1:100,000       50 m
1:250,000       125 m
1:1,000,000     500 m
1:10,000,000    5 km
A useful rule of thumb is that positions measured from maps are accurate to about 0.5 mm on the map. Multiplying this by the scale denominator of the map (for example, 24,000 for a 1:24,000 map) gives the corresponding distance on the ground.
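The rule of thumb above is a single multiplication, sketched here as a check on the table values. The function name `ground_distance_m` is an illustrative choice, and the 0.5 mm default is the lecture's rule-of-thumb figure.

```python
def ground_distance_m(scale_denominator, map_error_mm=0.5):
    """Ground distance in metres that corresponds to a map error in mm."""
    # 1 mm on a 1:N map is N mm on the ground; divide by 1000 for metres.
    return map_error_mm * scale_denominator / 1000.0

for denom in (1250, 24000, 1000000):
    print(f"1:{denom:,} -> {ground_distance_m(denom):g} m")
```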
Positional Accuracy (cont.)
Within a database, a typical UTM coordinate pair might be:
  Easting 579124.349 m
  Northing 5194732.247 m
If the database was digitized from a 1:24,000 map sheet, the last four digits in each coordinate (units, tenths, hundredths, thousandths) would be questionable
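One way to make the point about questionable digits concrete is to round each coordinate to the decimal place that the source accuracy supports. This is a sketch under the assumption that a 1:24,000 sheet gives about 12 m accuracy (0.5 mm on the map), so only the tens digit and above are kept. The helper `round_to_accuracy` is hypothetical.

```python
import math

def round_to_accuracy(coord_m, accuracy_m):
    """Round a coordinate to the power of ten at or just below the accuracy."""
    # For 12 m accuracy, log10 is about 1.08, so we round to the nearest 10 m.
    digits = -int(math.floor(math.log10(accuracy_m)))
    return round(coord_m, digits)

easting = round_to_accuracy(579124.349, 12.0)
northing = round_to_accuracy(5194732.247, 12.0)
print(easting, northing)
```

The stored coordinates need not be truncated in practice. The rounding only shows which digits carry real information.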
Testing Positional Accuracy
Use an independent source of higher accuracy:
  find a larger scale map
  use precision GPS
Use internal evidence:
  digitized polygons that are unclosed, lines that overshoot or undershoot nodes, etc. are indications of error
  sizes of gaps, overshoots, etc. may be a measure of positional accuracy
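The internal-evidence test above can be sketched for the unclosed-polygon case: measure the gap between the first and last vertex of each digitized ring and report the rings that fail to close. The function names, the sample rings, and the `tolerance` parameter are illustrative assumptions.

```python
import math

def closure_gap(ring):
    """Distance between the first and last vertex of a digitized ring."""
    (x0, y0), (x1, y1) = ring[0], ring[-1]
    return math.hypot(x1 - x0, y1 - y0)

def unclosed_rings(rings, tolerance=0.0):
    """Return (index, gap) for every ring whose gap exceeds the tolerance."""
    return [(i, closure_gap(r)) for i, r in enumerate(rings)
            if closure_gap(r) > tolerance]

rings = [
    [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)],      # closed ring
    [(0, 0), (10, 0), (10, 10), (0, 10), (0.3, 0.4)],  # ends short of start
]
problems = unclosed_rings(rings)
print(problems)
```

The reported gap sizes can then be summarized (for example, by their mean) as a rough indicator of digitizing accuracy.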
Testing Accuracy (cont.)
Compute accuracy from knowledge of the errors introduced by different sources, e.g.,
  1 mm in source document
  0.5 mm in map registration for digitizing
  0.2 mm in digitizing
If sources combine independently, we can get an estimate of overall accuracy:
  $(1^2 + 0.5^2 + 0.2^2)^{0.5} = 1.14$ mm
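The combination of independent error sources above is a root-sum-of-squares calculation. A minimal sketch using the three error values from the slide. The function name `combined_error` is illustrative.

```python
import math

def combined_error(*errors_mm):
    """Root-sum-of-squares of independent error components."""
    return math.sqrt(sum(e * e for e in errors_mm))

# Source document, map registration, and digitizing errors in mm.
total = combined_error(1.0, 0.5, 0.2)
print(f"{total:.2f} mm")
```

Because the terms are squared, the largest component (the 1 mm source-document error) dominates the total.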
Definitions
Database – an integrated set of data (attributes) on a particular subject
Geographic (=spatial) database – database containing geographic data of a particular subject for a particular area
Database Management System (DBMS) – software to create, maintain and access databases
A GIS links attribute and spatial data
Attribute Data
• Flat File
• Relations
Map Data
• Point File
• Line File
• Area File
• Topology
• Theme
Advantages of Databases over Files
Avoids redundancy and duplication
Reduces data maintenance costs
Faster for large datasets
Applications are separated from the data
  Applications persist over time
  Support multiple concurrent applications
Better data sharing
Security and standards can be defined and enforced
Disadvantages of Databases over Files
Expense
Complexity
Performance – especially complex data types
Integration with other systems can be difficult
Types of DBMS Model
Hierarchical
Network
Relational – RDBMS
Object-oriented – OODBMS
Object-relational – ORDBMS
Relational Databases rule now
Characteristics of DBMS (1)
Data model support for multiple data types
  e.g., MS Access: Text, Memo, Number, Date/Time, Currency, AutoNumber, Yes/No, OLE Object (MS Object Linking and Embedding), Hyperlink, Lookup Wizard
Load data from files, databases and other applications
Index for rapid retrieval
Characteristics of DBMS (2)
Query language – SQL
Security – controlled access to data
  Multi-level groups (e.g. census, NGA)
Controlled update using a transaction manager
Versioning
Backup and recovery
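The idea of controlled update through a transaction manager can be illustrated with Python's built-in `sqlite3` module, used here only as a stand-in for a full DBMS. The `parcels` table and its values are hypothetical. The sketch shows that when one statement in a transaction fails, the earlier update in the same transaction is rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parcels (id INTEGER PRIMARY KEY, owner TEXT NOT NULL)")
conn.execute("INSERT INTO parcels VALUES (1, 'Smith')")
conn.commit()

try:
    # Using the connection as a context manager commits on success
    # and rolls back if an exception is raised inside the block.
    with conn:
        conn.execute("UPDATE parcels SET owner = 'Jones' WHERE id = 1")
        conn.execute("INSERT INTO parcels VALUES (2, NULL)")  # violates NOT NULL
except sqlite3.IntegrityError:
    pass

# The failed transaction left the original owner in place.
owner = conn.execute("SELECT owner FROM parcels WHERE id = 1").fetchone()[0]
print(owner)
```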
Characteristics of DBMS (3)
Applications
  Forms builder
  Report writer
  Internet application server
  CASE tools
Programmable API (application programming interface)
Role of DBMS
System                           Task
Geographic Information System    Data load, editing, visualization, mapping, analysis
Database Management System       Storage, indexing, security, query
Data
Relational DBMS (1)
Data stored as tuples (tup-el), conceptualized as tables
Table – data about a class of objects
  Two-dimensional list (array)
  Rows = objects
  Columns = object states (properties, attributes)
Table
Row = object
Vector feature
Column = attribute
Relational DBMS (2)
Most popular type of DBMS
  Over 95% of data in DBMS is in RDBMS
Commercial systems
  IBM DB2
  Informix
  Microsoft Access
  Microsoft SQL Server
  Oracle
  Sybase
SQL
Structured (Standard) Query Language (pronounced "sequel")
Developed by IBM in the 1970s
Now the de facto and de jure standard for accessing relational databases
Three types of usage
  Stand-alone queries
  High-level programming
  Embedded in other applications
Types of SQL Statements
Data Definition Language (DDL)
  Create, alter and delete data structures
  CREATE TABLE, CREATE INDEX
Data Manipulation Language (DML)
  Retrieve and manipulate data
  SELECT, UPDATE, DELETE, INSERT
Data Control Language (DCL)
  Control security of data
  GRANT, CREATE USER, DROP USER
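The DDL and DML statement types listed above can be tried out with Python's built-in `sqlite3` module. This is a sketch only: the `wells` table and its values are hypothetical, and SQLite is a stand-in for a server DBMS. DCL statements such as GRANT are left out because SQLite has no user accounts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the table structure and an index on it.
conn.execute("CREATE TABLE wells (id INTEGER PRIMARY KEY, name TEXT, depth_m REAL)")
conn.execute("CREATE INDEX wells_name_idx ON wells (name)")

# DML: insert, update, delete, and retrieve rows.
conn.executemany("INSERT INTO wells VALUES (?, ?, ?)",
                 [(1, "A", 30.0), (2, "B", 45.5), (3, "C", 12.0)])
conn.execute("UPDATE wells SET depth_m = 50.0 WHERE name = 'B'")
conn.execute("DELETE FROM wells WHERE depth_m < 20")
rows = conn.execute("SELECT name, depth_m FROM wells ORDER BY name").fetchall()
print(rows)
```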
Relational Join
Fundamental query operation
Occurs because
  Data created/maintained by different users, but integration needed for queries
Table joins use common keys (column values)
Table (attribute) join concept has been extended to geographic case
Join

Record ID   Address           #cars
1241        123 State St.     3
1242        1801 Main St.     1
1243        2106 Elm St.      2
1244        7262 Pine Drive   1

Record ID   Make      Year
1241        Ford      2003
1241        Subaru    2000
1241        Honda     1999

Joined on Record ID:

Record ID   Address           Make
1241        123 State St.     Ford
1241        123 State St.     Subaru
1241        123 State St.     Honda
1242        1801 Main St.     Kia
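The join on Record ID can be reproduced with Python's built-in `sqlite3` module. This is a sketch: the table names `households` and `cars` and the column names `make` and `year` are assumptions, since the slide does not name them. Only the car rows that appear in the slide's source table are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE households (record_id INTEGER, address TEXT, n_cars INTEGER)")
conn.execute("CREATE TABLE cars (record_id INTEGER, make TEXT, year INTEGER)")

conn.executemany("INSERT INTO households VALUES (?, ?, ?)", [
    (1241, "123 State St.", 3),
    (1242, "1801 Main St.", 1),
    (1243, "2106 Elm St.", 2),
    (1244, "7262 Pine Drive", 1),
])
conn.executemany("INSERT INTO cars VALUES (?, ?, ?)", [
    (1241, "Ford", 2003),
    (1241, "Subaru", 2000),
    (1241, "Honda", 1999),
])

# The shared key record_id links each car row to its household row.
joined = conn.execute(
    "SELECT h.record_id, h.address, c.make "
    "FROM households AS h JOIN cars AS c ON h.record_id = c.record_id "
    "ORDER BY c.year DESC"
).fetchall()
for row in joined:
    print(row)
```

An inner join returns only households that have a matching car row, so households 1242 to 1244 are absent from this result until their car rows are loaded.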
Spatial indexing
Many maps tiled
B-tree (balanced)
Grid indexing
Quad tree: points/regions
R-tree (based on MBR, the minimum bounding rectangle)
New global/spatial grids: QTM
Go2 Grids
  38:53:22.08N 077:02:06.86W
  US.DC.WAS.54.18.28.83.11
  US.CA.SBA.UCSB.UCEN
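Of the indexing schemes listed above, grid indexing is the simplest to sketch: points are bucketed by grid cell so that a rectangle query inspects only the cells it overlaps. The class `GridIndex`, its method names, and the sample points are illustrative assumptions, not a description of any particular GIS product.

```python
from collections import defaultdict

class GridIndex:
    """Bucket points by square grid cell so searches inspect only nearby cells."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def _cell(self, x, y):
        # Integer column and row of the cell containing (x, y).
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, x, y, item):
        self.cells[self._cell(x, y)].append((x, y, item))

    def query(self, xmin, ymin, xmax, ymax):
        """Return items whose point lies inside the query rectangle."""
        cx0, cy0 = self._cell(xmin, ymin)
        cx1, cy1 = self._cell(xmax, ymax)
        hits = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for x, y, item in self.cells.get((cx, cy), []):
                    # Cells may extend past the rectangle, so test each point.
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(item)
        return hits

index = GridIndex(cell_size=100.0)
index.insert(10, 10, "a")
index.insert(150, 20, "b")
index.insert(950, 950, "c")
found = sorted(index.query(0, 0, 200, 100))
print(found)
```

Quad trees and R-trees refine the same idea by adapting cell or rectangle sizes to the data density instead of using a fixed grid.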
Spatial Search: Gateway to Spatial Analysis
Overlay is a spatial retrieval operation that is equivalent to an attribute join. Buffering is a spatial retrieval around points, lines, or areas based on distance.
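Buffering as a distance-based retrieval can be sketched for the simplest case, a circular buffer around a point. The sketch assumes planar coordinates and Euclidean distance. The function `within_buffer` and the sample features are hypothetical.

```python
import math

def within_buffer(center, radius, points):
    """Return the names of points no farther than radius from center."""
    cx, cy = center
    return [name for name, (x, y) in points.items()
            if math.hypot(x - cx, y - cy) <= radius]

# Hypothetical features with planar coordinates in metres.
schools = {"north": (0.0, 900.0), "south": (0.0, -300.0), "east": (400.0, 0.0)}
selected = within_buffer((0.0, 0.0), 500.0, schools)
print(selected)
```

Buffers around lines or areas use the same test with the distance taken to the nearest segment or boundary instead of to a single point.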