Aug. 15, 2013 Slide 1
Discovering Persistent Change Windows (PCW) in Spatiotemporal Climate Datasets
3rd Annual Workshop of NSF Expeditions in Computing:
Understanding Climate Change: A Data Driven Approach.
Evanston, IL, August 15th-16th, 2013
Sponsor: NSF CISE/EIA
Shashi Shekhar
Peter K. Snyder
Joseph F. Knight
Stefan Liess
Students:
Xun Zhou
Zhe Jiang
Aug. 15, 2013 Slide 2
Change Windows
• Time-Window of Change, e.g., interesting interval in a time series
• Space-Window of Change, e.g., region or sub-path
• Spatio-Temporal Window of Change, e.g., desertification, deforestation, urban sprawl
[Figure: Irrigation in Saudi Arabia in 2001, 2006 and 2012 (Google Time Lapse [5])]
Aug. 15, 2013 Slide 4
Spatiotemporal (ST) footprint of changes
• “Where” and “when” a change occurs?
• A taxonomy of ST change footprint:
Temporal footprint:
• Single snapshot
• Between few snapshots
• Point in time series
• Interval in time series
Spatial footprint:
• Raster: local, focal, zonal
• Vector: point, line, polygon
Aug. 15, 2013 Slide 5
Spatiotemporal change footprint (raster)
Temporal footprints (columns): snapshot, few snapshots, time series (point), time series (interval), time series collections
Spatial footprints (rows) and the techniques listed for each:
• Local: remote sensing change detection; change point detection (e.g., CUSUM); time series change interval (published)
• Focal: edge detection (Wombling)
• Zonal: sub-path of change (published); Persistent Change Window (in progress)
Aug. 15, 2013 Slide 6
Spatial Sub-path of Change
• Spatial footprint of Change
• Ex. Sahel – sharp change in vegetation cover
• Transition between ecological zones (ecotones)
• Vulnerable to climate change
[Figure: NDVI value (0 to 0.8) plotted against latitude (11.3 to 33.0) for a snapshot of vegetation cover in Africa [6]. Annotation: the change is persistent and rapid in W1 = [12N, 17N].]
Aug. 15, 2013 Slide 7
Temporal Sub-path of Change
• Time footprint of Change
  ◦ Abrupt shift in precipitation, temperature, etc.
  ◦ Climate change detection.
[Figure: Raw Sahel precipitation anomaly (JJASO) [7] and smoothed Sahel precipitation anomaly (JJASO)]
Aug. 15, 2013 Slide 8
Computer Science Problem: Interesting Sub-path Query (ISQ)
• Input
• A statistical interest measure & thresholds.
• A path and its attribute
• Output
• All dominant interesting sub-paths
• Constraints
• Correctness & completeness
• Automation & scalability to large datasets
• Algebraic interest measure, e.g., average slope
Example: average change (slope) ≥ 3.5 yields the sub-paths [1,2] and [5,11]
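The query can be illustrated on a toy path. The sketch below is a brute-force version, assuming that the interest measure is the average slope between the two end points of a sub-path and that a sub-path is dominant when no other interesting sub-path contains it. The data values and the function names are hypothetical, and indices are 0-based rather than 1-based as on the slide.

```python
from typing import List, Tuple

def average_slope(values: List[float], i: int, j: int) -> float:
    """Average change per unit step between locations i and j (0-based, i < j)."""
    return (values[j] - values[i]) / (j - i)

def naive_isq(values: List[float], threshold: float) -> List[Tuple[int, int]]:
    """Brute force: score every sub-path, then drop those contained in another."""
    n = len(values)
    # Phase 1: evaluate the interest measure for every sub-interval.
    interesting = [(i, j) for i in range(n) for j in range(i + 1, n)
                   if average_slope(values, i, j) >= threshold]
    # Phase 2: keep only sub-paths not contained in another interesting one.
    dominant = []
    for (i, j) in interesting:
        contained = any(a <= i and j <= b and (a, b) != (i, j)
                        for (a, b) in interesting)
        if not contained:
            dominant.append((i, j))
    return dominant

# Hypothetical attribute values along a path of 8 locations.
path = [0.0, 4.0, 4.5, 5.0, 5.2, 9.0, 13.0, 13.1]
result = naive_isq(path, threshold=3.5)
print(result)  # [(0, 1), (4, 6)]
```

Here (4, 5) and (5, 6) are also interesting, but both are contained in (4, 6) and are therefore not reported.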
Aug. 15, 2013 Slide 9
Related Work, Its Limitations, Novelty of Our Approach
Interesting sub-region query, by output type:
• Change points, e.g., CUSUM [8]
• Sub-paths, e.g., ISQ (our work)
• Sub-regions, e.g., PCW (our new work)
Example outputs shown on the slide: [1,2], [5,11] and [6]
Aug. 15, 2013 Slide 10
Naive Approach does not scale!
Naive approach
Phase 1: Evaluate interest measure for all O(n^2) sub-intervals
Phase 2: Identify dominant sub-paths (compare sub-path pairs)
Complexity: O(n^4) for a path,
where n = number of locations on the path
Example: Find intervals of high gradient along all longitudes
using 0.07 degree resolution dataset
=> 10^3 locations per path, 10^2 longitudes,
=> 10^14 computations
Window footprint | 1-D (path) | 2-D (spatial) | 3-D (spatiotemporal)
# locations | 10^3 | 10^6 | 10^9
# windows | 10^8 | 10^12 | 10^18
Computation | 10^14 | 10^24 | 10^36
Search space | O(n^2) | O(n^4) | O(n^6)
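The order-of-magnitude entries above can be checked with a few lines of arithmetic. This sketch assumes about 10^3 locations per path and 10^2 paths for the 1-D case, n = 10^3 cells per dimension for the 2-D and 3-D cases, and a pairwise-comparison cost equal to the square of the window count; the variable names are illustrative.

```python
n = 10**3        # locations per path or per dimension (assumed from the example)
paths = 10**2    # number of longitudes scanned in the 1-D example (assumed)

# 1-D: about n^2 sub-intervals per path, compared pairwise within each path.
windows_1d = n**2 * paths
cost_1d = n**4 * paths

# 2-D and 3-D: one interval along each of 2 or 3 dimensions of size n.
windows_2d, cost_2d = n**4, (n**4) ** 2
windows_3d, cost_3d = n**6, (n**6) ** 2

print(windows_1d, cost_1d)   # 10^8 windows, 10^14 computations
print(windows_2d, cost_2d)   # 10^12 windows, 10^24 computations
print(windows_3d, cost_3d)   # 10^18 windows, 10^36 computations
```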
Aug. 15, 2013 Slide 11
Computational Structure of ISQ
• Not Dynamic Programming!
• GRID-DAG (Directed Acyclic Graph)
• Node = sub-paths
• Edge = Dominance relationship
• Interest Measure (node) = f(leftmost & rightmost leaves)
• Question: Which traversal order avoids unnecessary work?
[Figure: Grid-DAG with axes start location (1 to 12) and end location (1 to 12). Legend: valid interval, dominant, dominated, dominated by. Labeled sub-paths: 5-11, 1-2, 1-12.]
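To make the structure concrete, the sketch below models one node per sub-path [i, j] and links it to the two sub-paths that extend it by one location. That choice of edges is an assumption consistent with a containment-based dominance order, and all names are illustrative. It also shows why an algebraic measure such as average slope needs only the values at the two ends of a sub-path.

```python
from typing import List, Tuple

Node = Tuple[int, int]  # (start location, end location), 1-based, start < end

def parents(node: Node, n: int) -> List[Node]:
    """Sub-paths that extend `node` by exactly one location on either side."""
    i, j = node
    out = []
    if i > 1:
        out.append((i - 1, j))
    if j < n:
        out.append((i, j + 1))
    return out

def dominates(a: Node, b: Node) -> bool:
    """True when sub-path a strictly contains sub-path b."""
    return a != b and a[0] <= b[0] and b[1] <= a[1]

def average_slope(values: List[float], node: Node) -> float:
    """Algebraic measure: uses only the values at the two end locations."""
    i, j = node
    return (values[j - 1] - values[i - 1]) / (j - i)

n = 12
print(parents((5, 11), n))          # [(4, 11), (5, 12)]
print(dominates((1, 12), (5, 11)))  # True
```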
Aug. 15, 2013 Slide 13
Backup slides start here
A Comparison of Techniques for Traversing G-DAG
Criterion | DFS or BFS | Bottom-Up with in-Row Pruning (BURP) | BFS with Sub-Graph Pruning (BSGP) | Row-Traversal with Column Pruning (RTCP)
A: Avoid redundant leaf visits | No | Yes | Yes | Yes
B: Avoid unnecessary visits to dominated non-leafs | No | No | Yes | Yes
C: Memory need for B (n = number of locations in path) | O(n^2) | O(n) | O(n^2) | O(1)
[Figure: Grid-DAG with labeled sub-paths 5-11, 1-2, 1-12]
Insights:
• Interest measure is an algebraic function => leaf scan
• Dominance = partial order among sub-paths => pruning
• The partial order is a Grid-DAG => O(1) memory traversal & pruning via row-wise scan (of non-leaf)
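The last insight can be illustrated with a simplified row-wise scan. This is not the authors' RTCP, only a sketch in the same spirit, assuming average slope as the interest measure and containment as dominance. Each row is a start location; ends are scanned from the longest sub-path down, and a single integer records how far earlier rows already reach. Worst-case time is still quadratic; the point is the constant pruning state.

```python
from typing import List, Tuple

def row_scan_with_pruning(values: List[float], threshold: float) -> List[Tuple[int, int]]:
    """Report interesting sub-paths not contained in any other interesting sub-path."""
    n = len(values)
    dominant = []
    max_end = -1  # largest end location of any sub-path reported so far
    for i in range(n - 1):
        # Ends at or below max_end lie inside a sub-path reported by an earlier row.
        lowest_end = max(i + 1, max_end + 1)
        for j in range(n - 1, lowest_end - 1, -1):
            slope = (values[j] - values[i]) / (j - i)
            if slope >= threshold:
                dominant.append((i, j))
                max_end = j
                break  # shorter sub-paths in this row are dominated by (i, j)
    return dominant

# Hypothetical attribute values along a path of 8 locations (0-based indices).
path = [0.0, 4.0, 4.5, 5.0, 5.2, 9.0, 13.0, 13.1]
found = row_scan_with_pruning(path, threshold=3.5)
print(found)  # [(0, 1), (4, 6)]
```

Apart from the output list, the only state carried between rows is `max_end`, which is what keeps the pruning memory constant in this sketch.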
Aug. 15, 2013 Slide 18
Theoretical and Experimental Evaluations
• Theoretical Evaluation:
• RTCP is Correct and Complete
• Correct: All the reported sub-paths are qualifying dominant sub-paths
• Complete: All the dominant interesting sub-paths are reported
• Experimental Evaluation
• RTCP is faster than the competing approaches
• RTCP needs less memory than the competing approaches
Aug. 15, 2013 Slide 19
Effect of Dataset Size on Memory Needs
• Setup: Pattern length ratio (PLR) is fixed at 0.1
Dataset – synthetic (left), real (right)
• Trends: RTCP has a smaller memory cost than the competing approaches
• Memory costs are not sensitive to the pattern length ratio
Aug. 15, 2013 Slide 20
Effect of Data Size on Computational Cost
• Setup:
• Data: GFDL-CM2.0 coupled model realization (1861-2000, entire world)
• Global weighted average of max daily temperature
• Data length: 51100; Pattern Length Ratio (PLR) = either 0.1 or 1
• Note difference in scale across 2 plots
• Trends: RTCP is faster than the competing approaches
[Figure: two run-time plots, one for PLR = 0.1 and one for PLR = 1, with RTCP labeled]
Aug. 15, 2013 Slide 21
Effect of Pattern Length
• Pattern Length Ratio (PLR) = ratio of the length of the longest interesting sub-path to the length of the entire path; between 0 and 1.
• Synthetic data: generated with Gaussian distributed unit values.
• Data length: fixed at 20k units
• Trend: RTCP is faster than the competing approaches at any pattern length
[Figure: run time (seconds, 0 to 4) versus Pattern Length Ratio (PLR, 0.1 to 1) for BSGP, BURP and RTCP]
Aug. 15, 2013 Slide 22
Case Study
• Data: NDVI by GIMMS [6], Africa, August 1981.
• Resolution: 8 km, smoothed within 1x1 degree.
• Path: along each longitude (south to north)
• Interest measure: (slope) sameness degree (SD)
• ∆: unit slope
• Quantities labeled on the slide: AVG{∆}, AVG≥α{∆}
• Thresholds: α = 20% percentile, SD ≥ 0.5
[Figure: results of CUSUM and ISQ]
Aug. 15, 2013 Slide 23
Case Study: Global Data and Ecotones
• NDVI of the entire world
• Aug 1-15 1981, 0.07 degree (8km) resolution
• α = 10%, SD ≥ 0.5
[Figure: world map of detected sub-paths, with labels for the Sahel region, the boundary of the Amazon, the Rocky Mountains, the Himalaya Mountains and the Gobi Desert]
Aug. 15, 2013 Slide 26
Contributions : Summary
• Formalize Interesting Sub-path Query (ISQ) problem
• Computational structure of efficient 1D interval enumeration
• Grid-DAG traversal strategy
• Bottom-up with in-row pruning (BURP)¹, BFS with sub-graph pruning (BSGP)²
• Row-wise with column-pruning (RTCP)³
• Evaluation: Experiments, Case Studies
• Next Steps
• GPU platform
• Efficient 3D region enumeration
• Submitted to ACM SIGSPATIAL GIS 2013.
• Visit poster session for details
¹ From our early work: J. Kang et al., Discovering Flow Anomalies: A SWEET Approach. In IEEE International Conference on Data Mining (ICDM 2008).
² Published in ACM SIGSPATIAL GIS 2011.
³ Manuscript to be submitted to IEEE Transactions on Knowledge and Data Engineering (TKDE).
Aug. 15, 2013 Slide 27
Accelerating ISQ with GPUs – Preliminary Results
• Motivation: Google Time-lapse like visualization of spatio-temporal change windows
• Example: GIMMS NDVI
• 611 snapshots (26 years every 16 days)
• Africa, 8km resolution: 1152 pixels x 1152 pixels
• Initial results with BURP (Bottom-up with in-row pruning)
• Setup: Platform: CUDA, 1 thread per longitude
• Trend: more than 10x speedup with 1 GPU
• CPU time (MATLAB): 240 seconds / video
• GPU/CUDA: 19 seconds / video
• Next steps: more algorithms (e.g., RTCP), platforms (e.g., multi-GPU), datasets
* In collaboration with Dr. S. Prasad et al. at Georgia State University. Results submitted to CyberGIS AHM’13.
Aug. 15, 2013 Slide 28
Case Study
CUSUM (pre-selected area) vs. proposed approach (α = 20% quantile, SD = 0.5)
Aug. 15, 2013 Slide 32
Persistent Change Window (PCW) Discovery: Problem
• Given:
• A spatial time series dataset
• An average change rate threshold
• A spatial aggregate function (e.g., sum, average)
• Find:
• All the dominant persistent change windows {Si, Ti}
• Constraints
• Correctness & completeness
• Automation & scalability to large datasets
Example: highlighted (3x3) window over time [1,4]: Sum(T1) = 90, Sum(T4) = 53
Average decrease rate = [(90-53)/90]/3 = 13.7%
[Figure: grid snapshots at T = 1, T = 2, T = 3, T = 4]
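The rate in the example can be computed directly. A minimal sketch, assuming the average change rate is the relative change of the aggregate between the first and last time step, divided by the number of steps in between; the function name is illustrative, and a decrease shows up as a negative rate.

```python
def average_change_rate(start_value: float, end_value: float, steps: int) -> float:
    """Relative change from start to end, averaged over the number of time steps."""
    return ((end_value - start_value) / start_value) / steps

# Slide example: Sum(T1) = 90, Sum(T4) = 53, three steps between T = 1 and T = 4.
rate = average_change_rate(90, 53, steps=3)
print(f"{rate:.1%}")  # -13.7%
```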
Aug. 15, 2013 Slide 33
Persistent Change Window (PCW) Discovery: Challenges
• A six-dimensional enumeration space
• An interval along each dimension
[Figure: lattice of candidate windows for a 3 x 3 x 3 example with cells (0, 0, 0) to (2, 2, 2). Each node is a triple of intervals, one per dimension, starting from the full window [0,2],[0,2],[0,2] and shrinking one interval at a time, e.g., [0,1],[0,2],[0,2] and [1,2],[0,2],[0,2], then [0,1],[0,1],[0,2], and so on.]
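The size of this enumeration space is easy to see on a small grid. The sketch below assumes a window is one closed index interval per dimension (two spatial, one temporal), so six numbers identify a window; the function names are illustrative.

```python
from itertools import combinations_with_replacement, product

def intervals(size: int):
    """All closed index intervals (lo, hi) with 0 <= lo <= hi < size."""
    return list(combinations_with_replacement(range(size), 2))

def enumerate_windows(nx: int, ny: int, nt: int):
    """Every window is one interval per dimension: six numbers in total."""
    return list(product(intervals(nx), intervals(ny), intervals(nt)))

windows = enumerate_windows(3, 3, 3)
print(len(windows))   # 216 = 6 * 6 * 6
print(windows[0])     # ((0, 0), (0, 0), (0, 0))
```

Even a 3 x 3 x 3 grid yields 216 candidate windows, and the count grows as the square of the grid size along each of the three dimensions.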
Aug. 15, 2013 Slide 35
Persistent Change Window Discovery: Case Study
• Initial Results
• Initial algorithms
• Case Study: MODIS 250m NDVI data (16 days)
• Time: 2000-2012. Annual: July 27/28 of each year.
[Figure: NDVI maps of the study area for 2001, 2006 and 2012 (color scale 0.1 to 0.9). Results of the proposed algorithm with average change rate >= 10% (outlined window).]
[Figure: Average NDVI in outlined window by year, 2000 to 2012. Annotation: an annual increase of 11.5%, 2001-2012.]
[Figure: Irrigation in Saudi Arabia, shown by Google Time lapse [5]]
Aug. 15, 2013 Slide 38
List of Publications and References
Contributors’ Publications:
[1] Xun Zhou, Shashi Shekhar, Pradeep Mohan, Stefan Liess, Peter K. Snyder: Discovering interesting sub-paths in spatiotemporal datasets: a summary of results. GIS 2011: 44-53
(A full journal version to be submitted to IEEE Transactions on Knowledge and Data Engineering)
[2] Xun Zhou, Shashi Shekhar, Pradeep Mohan. Spatiotemporal (ST) Change Pattern Mining: A Multi-disciplinary Perspective. In: Mei-Po Kwan, Douglas Richardson, Donggen Wang and Chenghu Zhou (eds) (2013) Space-Time Integration in Geography and GIScience: Research Frontiers in the US and China. Dordrecht: Springer (in press)
[3] Xun Zhou, Shashi Shekhar, Reem Y. Ali. Spatiotemporal (ST) Change Pattern Mining: A Multi-disciplinary Perspective. Submitted to the Wiley’s Interdisciplinary Review on Data Mining and Knowledge Discovery (DMKD).
[4] Xun Zhou, Shashi Shekhar, Dev Oliver. Discovering Spatiotemporal (ST) Persistent Change Windows: A Summary of Results. Submitted to ACM SIG SPATIAL GIS 2013.
References:
[5] Google Earth Engine (Accessed: June 22, 2013).
[6] Tucker, C. J., J. E. Pinzon, M. E. Brown. Global inventory modeling and mapping studies. Global Land Cover Facility, University of Maryland, College Park, Maryland, 1981--2006.
[7] Joint Institute for the Study of the Atmosphere and Ocean (JISAO). Sahel rainfall index. http://jisao.washington.edu/data/sahel/.
[8] E. Page. Continuous inspection schemes. Biometrika, 41(1/2):100--115, 1954.
Aug. 15, 2013 Slide 39
Google Time lapse
• Google Time lapse main page
• Irrigation in Saudi Arabia
• Amazon Deforestation