
1

Bio-inspired computational techniques applied to the clustering and visualization of spatio-temporal geospatial data

Miguel BARRETO-SANZ

June 27, 2011

2

More data has been created since 2005 than in the previous 40,000 years

3

Geospatial data timeline

• 1972: Landsat 1, the first civilian Earth observation satellite
• 1980: First commercial vendors of Geographical Information Systems (GIS) software
• 1992: Internet explosion
• 1993: Launch of the 24th Navstar satellite, completing the Global Positioning System
• 1997: Tropical Rainfall Measuring Mission (TRMM)
• 2000: Civilian demand for GPS products
• 2005: Google Earth
• 2006: GPS receivers built into cell phones
• 2010: Geotagging in social networks

4

These data are critical for decision support, but their value depends on our ability to extract useful information from them.

5

NASA Earth Observatory (information from several missions, e.g., Terra, TRMM, SRTM)

Challenges
• High dimensionality
• Large quantity of data
• Unlabeled samples (labeling is an expensive and time-consuming process)

[Figure: WorldClim (climate data from weather stations). Panels: mean annual temperature (ºC), range −30.1 to 30.5; annual precipitation (mm), range 0 to 12084]

[Figure: derived variables. Panels: elevation, slope, aspect, landscape class, moisture, solar radiation, exposure, curvature]

6

Spatio-temporal challenges
• Spatio-temporal representations at several levels (hours, days, months, years)
• Fuzzy boundaries in geographical space
• Variables and clusters that evolve in a temporal context
• Visualization of clusters in geographical and feature space

7

Thesis

[Diagram: thesis overview. Topics: clustering; visualization and projection; spatio-temporal data. Methods: SOM, GHSOM, FGHSON, tree-structured SOM component planes. Case studies: Colombia (ecoregions); South America (ecoregions); Colombia (agroecozones, ecoregions)]

8

Visualization and projection

9

[Figure: data set → SOM training → visualization]

Visualization using Self-Organizing Maps
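The pipeline above (data set, SOM training, visualization) can be sketched as a minimal online Kohonen SOM. This is an illustrative sketch, not the thesis implementation: the rectangular grid, Gaussian neighborhood, linear decay schedules, and the function name `train_som` are all assumptions. Each slice `w[:, :, k]` of the trained codebook is the component plane of variable `k`.

```python
import numpy as np


def train_som(data, rows=10, cols=10, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a rectangular SOM with the online Kohonen rule (illustrative sketch).

    data: array of shape (n_samples, n_vars), ideally normalized per variable.
    Returns the codebook as an array of shape (rows, cols, n_vars).
    """
    rng = np.random.default_rng(seed)
    n, dim = data.shape
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    weights = rng.random((rows, cols, dim))
    # Grid coordinates of every unit, shape (rows, cols, 2).
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    total, step = epochs * n, 0
    for _ in range(epochs):
        for x in data[rng.permutation(n)]:
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighborhood radius
            # Best-matching unit: the prototype closest to x.
            dist = np.linalg.norm(weights - x, axis=-1)
            bmu = np.array(np.unravel_index(np.argmin(dist), dist.shape))
            # Gaussian neighborhood measured on the map grid, not in data space.
            h = np.exp(-np.sum((grid - bmu) ** 2, axis=-1) / (2.0 * sigma ** 2))
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights


# Usage on synthetic data: 200 samples of 5 variables scaled to [0, 1].
rng = np.random.default_rng(42)
data = rng.random((200, 5))
w = train_som(data, rows=6, cols=6, epochs=5)
plane_0 = w[:, :, 0]  # component plane of the first variable
```

Because each update is a convex combination of the old prototype and the sample, prototypes stay inside the range of the (normalized) data.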

10

[Figure: exploration and correlation hunting with SOM component planes; labels: similar planes, partial correlations]

Visualization using Self-Organizing Maps

11

Climate variables
• Average temperature (TempAvg)
• Average relative humidity (RHAvg)
• Radiation (Rad)
• Precipitation (Prec)

Soil variables
• Order (Ord)
• Texture (Tex)
• Depth (Dee)

Topographic variables
• Landscape (Ls)
• Slope (Sl)

Other variables
• Water balance (WB)
• Variety (Var)
• Production

A real-world problem: classification of agro-ecological variables related to productivity in sugar cane cultivation.

Total: 54 variables

12

5 Variables

Classical approach: scatter plot matrix

13

23 Variables

Classical approach: scatter plot matrix

14

54 Variables

Classical approach: scatter plot matrix
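The three scatter plot matrix slides show why this approach does not scale: an n × n matrix has n² panels and n(n − 1)/2 distinct variable pairs. The short arithmetic check below computes both counts for the three cases shown. The helper name is hypothetical.

```python
def scatter_matrix_size(n_vars):
    """Panels in a full n x n scatter plot matrix and the number of distinct variable pairs."""
    return n_vars * n_vars, n_vars * (n_vars - 1) // 2


for n in (5, 23, 54):
    panels, pairs = scatter_matrix_size(n)
    print(f"{n} variables: {panels} panels, {pairs} distinct pairs")
# 5 variables: 25 panels, 10 distinct pairs
# 23 variables: 529 panels, 253 distinct pairs
# 54 variables: 2916 panels, 1431 distinct pairs
```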

15

5 Variables

SOM component planes

16

23 Variables

SOM component planes

17

54 Variables

SOM component planes

18

54 Variables

SOM component planes

19

Correlation Hunting
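One common way to hunt for correlations with component planes is to flatten each plane and compare planes pairwise. Planes that look alike on the map are strongly correlated across map units. The sketch below does this with Pearson correlation. The function names and the synthetic codebook are hypothetical. The variable names are taken from the sugar cane variable list, and the correlation is forced artificially for the illustration.

```python
import numpy as np


def component_planes(weights, names):
    """Component plane of variable k: the k-th weight of every map unit."""
    return {name: weights[:, :, k] for k, name in enumerate(names)}


def plane_correlations(weights):
    """Pearson correlation between flattened component planes, shape (n_vars, n_vars)."""
    rows, cols, n_vars = weights.shape
    return np.corrcoef(weights.reshape(rows * cols, n_vars), rowvar=False)


# Hypothetical 4 x 4 codebook over three of the variables listed earlier.
names = ["TempAvg", "RHAvg", "Prec"]
rng = np.random.default_rng(1)
w = rng.random((4, 4, 3))
w[:, :, 1] = 2.0 * w[:, :, 0] + 1.0  # force RHAvg to track TempAvg exactly (synthetic)
planes = component_planes(w, names)
corr = plane_correlations(w)
print(round(float(corr[0, 1]), 6))  # 1.0
```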

20

SOM of component planes

21

Tree-structured SOM component planes

22

54 Variables

Tree-structured SOM component planes

23

Tree-structured SOM component planes

24

Clustering

25

Hierarchical Self-organizing Structures

• They combine the advantages of hierarchical representation and soft competitive learning.

• In the state of the art, all of these methods are crisp approaches.

• In geospatial applications, crisp memberships are not the optimal representation of clusters.

26

Real-world data and its fuzzy nature

[Figure: crisp vs. fuzzy representation]

27

One approach to this problem is to allow a fuzzy representation within the hierarchical structures.
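To make the crisp/fuzzy contrast concrete, the sketch below assigns samples to two prototypes in two ways. The crisp version is winner-take-all. The fuzzy version uses the standard fuzzy c-means membership formula with fuzzifier `m`. Whether FGHSON uses exactly this membership form is an assumption here. The function names and toy data are hypothetical.

```python
import numpy as np


def distances(data, prototypes):
    """Euclidean distance from every sample to every prototype, shape (n, c)."""
    return np.linalg.norm(data[:, None, :] - prototypes[None, :, :], axis=-1)


def crisp_memberships(data, prototypes):
    """Winner-take-all: membership 1 for the nearest prototype, 0 elsewhere. Shape (c, n)."""
    d = distances(data, prototypes)
    u = np.zeros_like(d)
    u[np.arange(len(data)), d.argmin(axis=1)] = 1.0
    return u.T


def fuzzy_memberships(data, prototypes, m=2.0, eps=1e-12):
    """Fuzzy c-means memberships u_ij = 1 / sum_k (d_ij / d_kj)^(2 / (m - 1)). Shape (c, n)."""
    d = distances(data, prototypes) + eps  # eps avoids division by zero at a prototype
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
    return (1.0 / ratio.sum(axis=2)).T


prototypes = np.array([[0.0, 0.0], [10.0, 0.0]])
data = np.array([[1.0, 0.0], [5.0, 0.0], [9.0, 0.0]])
crisp = crisp_memberships(data, prototypes)  # the midpoint goes entirely to prototype 0
fuzzy = fuzzy_memberships(data, prototypes)  # the midpoint gets 0.5 / 0.5
```

Each column of `fuzzy` sums to 1. A sample on a boundary, such as the midpoint, is shared between clusters instead of being forced into one.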

28

[Figure: FGHSON hierarchy with fuzzy membership: breadth growth process, depth growth process, and α-cuts at successive levels]

Fuzzy Growing Hierarchical Self-Organizing Networks (FGHSON)
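The α-cut labels in the FGHSON figure suggest that α-cuts decide which samples each unit hands down to its child map. The sketch below assumes exactly that: a sample reaches the child map of prototype i when its membership in i is at least α, so one sample can reach several branches. The two growth checks are GHSOM-style quantization-error rules with hypothetical thresholds `tau1` (breadth) and `tau2` (depth). Reading these, together with α, as the three parameters mentioned in the conclusions is an assumption.

```python
import numpy as np


def alpha_cut(memberships, alpha):
    """For each prototype (row), indices of samples whose membership is >= alpha.

    memberships: array (n_prototypes, n_samples); each column sums to 1.
    """
    return [np.flatnonzero(row >= alpha) for row in memberships]


def needs_breadth_growth(map_mqe, parent_qe, tau1):
    # Assumed GHSOM-style rule: keep adding units while the map's mean
    # quantization error is still at least tau1 times the parent unit's error.
    return map_mqe >= tau1 * parent_qe


def needs_depth_growth(unit_qe, root_qe, tau2):
    # Assumed GHSOM-style rule: expand a unit into a child map while its
    # quantization error is still above tau2 times the root error.
    return unit_qe > tau2 * root_qe


U = np.array([[0.90, 0.55, 0.10, 0.05],
              [0.10, 0.45, 0.90, 0.95]])
subsets = alpha_cut(U, alpha=0.4)
print([s.tolist() for s in subsets])  # [[0, 1], [1, 2, 3]]
```

Sample 1 has memberships 0.55 and 0.45, so with α = 0.4 it is passed to both child maps. A crisp hierarchy would lose this overlap.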

29

[Figure panels: precipitation, temperature, similar zones]

Case study: South America (Cali, Colombia)

30

Case study: South America (Cali, Colombia)

31

Finding the right prototype

Case study: South America (Cali, Colombia)
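Finding the prototype for a site such as Cali amounts to locating its best-matching unit. Other places mapped to the same unit are then candidate similar zones. The sketch below is a crisp, single-map simplification of that idea, not the fuzzy hierarchical procedure. The map, the locations, and the use of index 0 for Cali are all hypothetical.

```python
import numpy as np


def best_matching_unit(weights, x):
    """Grid position (row, col) of the prototype closest to x."""
    d = np.linalg.norm(weights - x, axis=-1)
    return tuple(int(i) for i in np.unravel_index(np.argmin(d), d.shape))


def similar_locations(weights, samples, target):
    """Indices of all samples mapped to the same unit as samples[target]."""
    unit = best_matching_unit(weights, samples[target])
    return [i for i, x in enumerate(samples) if best_matching_unit(weights, x) == unit]


# Hypothetical 1 x 2 map and four locations; index 0 plays the role of Cali.
weights = np.array([[[0.0, 0.0], [10.0, 10.0]]])
samples = np.array([[1.0, 1.0], [9.0, 9.0], [0.0, 2.0], [11.0, 10.0]])
print(similar_locations(weights, samples, target=0))  # [0, 2]
```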

32

Level 1

33

Level 2

34

Fortaleza, Brazil

Cali, Colombia

Level 3

35

Spatio-Temporal Clustering

36

Space: where

Time: when

Spatio-Temporal Clustering

Places homologous to Colombian coffee-producing zones: Brazil, Ecuador, East Africa, and New Guinea.

37

Space and time: where and when

[Figure: maize (Zea mays L.) in Argentina and the United States]

Spatio-Temporal Clustering

38

Objective: to find similar environmental zones through time in South America.

In this experiment, we look for regions with similar patterns in three-month time windows.
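One way to turn monthly climate layers into inputs for this experiment is to concatenate three consecutive months per location. Each (location, window) pair then becomes one sample for clustering. The sketch below assumes non-overlapping windows starting in January (Jan–Mar, Apr–Jun, and so on) and a 12-month array per location. The original only says "time windows of three months", so the windowing scheme, array layout, and function name are assumptions.

```python
import numpy as np


def three_month_windows(monthly, starts=(0, 3, 6, 9)):
    """Build one feature vector per location and three-month window.

    monthly: array (n_locations, 12, n_vars) of monthly values (e.g. precipitation, temperature).
    Returns an array (n_locations, n_windows, 3 * n_vars).
    """
    n_loc, _, n_vars = monthly.shape
    windows = [monthly[:, s:s + 3, :].reshape(n_loc, 3 * n_vars) for s in starts]
    return np.stack(windows, axis=1)


# Two hypothetical locations, 12 months, 2 variables.
monthly = np.arange(2 * 12 * 2, dtype=float).reshape(2, 12, 2)
X = three_month_windows(monthly)
print(X.shape)           # (2, 4, 6)
print(X[0, 0].tolist())  # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
# Every (location, window) pair becomes one sample for the clustering step.
samples = X.reshape(-1, X.shape[-1])
```

Under this layout, window 0 of each location corresponds to January to March.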

Spatio-Temporal Clustering

39

Spatio-Temporal Clustering

40

[Figure panels: precipitation, temperature]

Which zones are similar to Cali in the January to March period?

Spatio-Temporal Clustering

41

Spatio-Temporal Clustering

42

Conclusions

1. Original contributions

FGHSON
• Capability to reflect the underlying structure of a dataset in a hierarchical, fuzzy way
• It does not require an a priori definition of the number of clusters.
• The algorithm executes self-organizing processes in parallel.
• Only three parameters are needed to set up the algorithm.

43

Conclusions

Tree-structured SOM component planes

• It creates structures that allow visual exploratory data analysis of large, high-dimensional datasets.

• Similarities in the variables' behavior can be easily detected (e.g., local correlations, maximal and minimal values, and outliers).

44

Conclusions

2. Test of methodologies for clustering and visualization of georeferenced data
• GHSOM
• SOM
• FGHSON

3. Methodology contributions
• Clustering of spatio-temporal datasets through time using FGHSON

45

The COCH project

4. Agroecological knowledge contributions
• In sugar cane productivity
• In sugar cane agro-ecological regionalization
• In Andean blackberry production

Conclusions

46

Questions
