Hauptseminar Context-based Presentation of Information for Mobile Users Prof. Dr. Uwe Baumgarten, Prof. Gudrun Klinker, Ph.D., Prof. Dr. Donald Kossmann Spatial Data Mining Elmar Witte Dec. 19, 2001

Spatial Data Mining


1. Spatial Data Mining (Page 1 / 30, Dec. 19, 2001, Elmar Witte, [email protected])

Hauptseminar (advanced seminar): Context-based Presentation of Information for Mobile Users
Prof. Dr. Uwe Baumgarten, Prof. Gudrun Klinker, Ph.D., Prof. Dr. Donald Kossmann
Spatial Data Mining
Elmar Witte
Dec. 19, 2001

2. Presentation overview

- Introduction: Spatial Data Structures
- Database Primitives for Spatial Data Mining
  - Overview
  - Neighborhood Relations
- Knowledge Discovery in Spatial Databases
  - Overview
  - Generalization
  - Clustering
  - Association Analysis
- Conclusion

4. Features of Spatial Data Structures (1) [Introduction: Spatial Data Structures]

- Spatial (German: räumlich) data mining means the discovery of knowledge in spatial databases (similar, but not identical, to relational data mining).
- Spatial databases store (huge amounts of) spatial data.
- Spatial data contain some geometrical information:
  - objects are defined by points, lines, polygons
  - objects described by spatial data have an area or volume
(from [5])

5. Features of Spatial Data Structures (2) [Introduction: Spatial Data Structures]

- Spatial data is stored in spatial databases. Multidimensional trees are used to build indices for these data (e.g. quad trees, k-d trees, R-trees, R*-trees).
- Often the attributes of spatial objects are still one-dimensional, so this non-spatial part can be stored in relational databases with references to the spatial data.
- Note: Spatial operations like spatial join and map overlay are the most expensive. There are efficient algorithms that handle these problems.
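The multidimensional index trees named on the "Features of Spatial Data Structures (2)" slide can be illustrated with a minimal k-d tree over 2-D points. This is only a sketch under stated assumptions: the names `Node`, `build_kdtree` and `range_query` are hypothetical, the tree is built once from a static point list, and real spatial databases typically use disk-oriented structures such as R-trees instead.

```python
from typing import NamedTuple, Optional

Point = tuple[float, float]


class Node(NamedTuple):
    point: Point            # point stored at this node
    axis: int               # 0 = split on x, 1 = split on y
    left: Optional["Node"]  # points with a smaller-or-equal coordinate on `axis`
    right: Optional["Node"] # points with a larger-or-equal coordinate on `axis`


def build_kdtree(points: list[Point], depth: int = 0) -> Optional[Node]:
    """Recursively split the points at the median, alternating x and y."""
    if not points:
        return None
    axis = depth % 2
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(
        point=ordered[mid],
        axis=axis,
        left=build_kdtree(ordered[:mid], depth + 1),
        right=build_kdtree(ordered[mid + 1:], depth + 1),
    )


def range_query(node: Optional[Node], lo: Point, hi: Point) -> list[Point]:
    """Return all points p with lo[i] <= p[i] <= hi[i] on both axes."""
    if node is None:
        return []
    found = []
    if all(lo[i] <= node.point[i] <= hi[i] for i in range(2)):
        found.append(node.point)
    # Descend only into subtrees that the query rectangle can reach.
    if lo[node.axis] <= node.point[node.axis]:
        found += range_query(node.left, lo, hi)
    if hi[node.axis] >= node.point[node.axis]:
        found += range_query(node.right, lo, hi)
    return found


points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(points)
print(sorted(range_query(tree, (3, 1), (8, 5))))  # [(5, 4), (7, 2), (8, 1)]
```

The pruning in `range_query` is what makes the index pay off: whole subtrees are skipped when the query rectangle lies entirely on one side of a split.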
7. Overview [Database Primitives for Spatial Data Mining]

Rules:
- spatial characteristic rule: general description of spatial data
- spatial discriminant rule: description of features discriminating or contrasting a class of spatial data from another class
- spatial association rule: description of the implications one set of features has for another set of features

Thematic maps:
- Presentation of the spatial distribution of a few attributes. There are two different types: raster and vector.

9. Neighborhood Relations [Database Primitives for Spatial Data Mining]

The major difference between mining in relational databases and mining in spatial databases is that attributes of the neighbors of some object of interest may have an influence on the object itself.

Various factors of mutual influence:
- topology
- distance
- direction

Generic representation of spatial objects: sets of points
- Point $p = (p_1, p_2, \ldots, p_d)$
- Spatial object $O \in 2^{Points}$
10. Neighborhood Relations (Topological Relations) [Database Primitives for Spatial Data Mining]

Topological relations are invariant under topological transformations, like rotation, translation or scaling.

Topological relations between A and B:
- A disjoint B
- A meets B
- A overlaps B
- A equals B
- A contains B (B inside A)
- A covers B (B covered-by A)

11. Neighborhood Relations (Distance Relations) [Database Primitives for Spatial Data Mining]

Let $dist$ be a distance function, let $\sigma$ be one of the arithmetic predicates $<$, $>$ or $=$, let $c$ be a real number and let $O_1$ and $O_2$ be spatial objects, i.e. $O_1, O_2 \in 2^{Points}$. Then a distance relation $A\ distance_{\sigma c}\ B$ holds iff $dist(O_1, O_2)\ \sigma\ c$.

Distance relations between A and B:
- $A\ distance_{= 0}\ B$
- $A\ distance_{= c}\ B$
- $A\ distance_{< c}\ B$

Notes:
- There is a distinction between source objects $O_1$ and destination objects $O_2$.
- One representative point $rep(O_1)$ of the source object (e.g. the center of the object) is compared to all points of the destination object.

12. Neighborhood Relations (Direction Relations) [Database Primitives for Spatial Data Mining]

The representative point of an object is used as the origin of a virtual coordinate system, and its quadrants define the directions.

Direction relations:
- There are 9 relations (north, east, south, west, northeast, northwest, southeast, southwest, any_direction).
- For each pair of spatial objects at least one relation holds.
- The direction relation between two objects may not be unique.
- The smallest direction relation (in terms of a partial order) is called the exact direction relation.

Example (figure with objects A, B, C, D and origin rep(A)):
- B north A
- C east A
- D east A, D south A, D southeast A
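The distance and direction relations from the two preceding slides can be sketched in a few lines. Everything here is an illustrative assumption rather than the definition used in [2]: objects are finite sets of 2-D points, `rep` is taken to be the centroid, `dist` is the minimum Euclidean distance between point pairs, and a direction relation is taken to hold when all points of the destination object lie in the corresponding half-plane or quadrant around `rep(source)`. All function names are hypothetical.

```python
import math
import operator
from typing import Callable

Point = tuple[float, float]
SpatialObject = set[Point]


def rep(obj: SpatialObject) -> Point:
    """Representative point of an object; assumed here to be the centroid."""
    n = len(obj)
    return (sum(x for x, _ in obj) / n, sum(y for _, y in obj) / n)


def dist(o1: SpatialObject, o2: SpatialObject) -> float:
    """Assumed object distance: minimum Euclidean distance over point pairs."""
    return min(math.dist(p, q) for p in o1 for q in o2)


def distance_relation(o1: SpatialObject, o2: SpatialObject,
                      sigma: Callable[[float, float], bool], c: float) -> bool:
    """True iff dist(o1, o2) sigma c, with sigma e.g. operator.lt or operator.eq."""
    return sigma(dist(o1, o2), c)


def directions(source: SpatialObject, dest: SpatialObject) -> set[str]:
    """All relations 'dest <direction> source' that hold under the assumption above."""
    ox, oy = rep(source)
    side = {
        "north": all(y >= oy for _, y in dest),
        "south": all(y <= oy for _, y in dest),
        "east": all(x >= ox for x, _ in dest),
        "west": all(x <= ox for x, _ in dest),
    }
    held = {"any_direction"} | {name for name, ok in side.items() if ok}
    for ns in ("north", "south"):
        for ew in ("east", "west"):
            if side[ns] and side[ew]:
                held.add(ns + ew)  # e.g. "south" + "east" -> "southeast"
    return held


def exact_directions(source: SpatialObject, dest: SpatialObject) -> set[str]:
    """Smallest relations that hold: quadrants first, then sides, then any_direction."""
    held = directions(source, dest)
    corners = held & {"northeast", "northwest", "southeast", "southwest"}
    if corners:
        return corners
    return (held - {"any_direction"}) or {"any_direction"}


A = {(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)}
B = {(0.5, 3.0), (1.5, 4.0)}
D = {(3.0, -1.0), (4.0, 0.0)}
print(exact_directions(A, B))                    # {'north'}
print(exact_directions(A, D))                    # {'southeast'}
print(distance_relation(A, D, operator.lt, 2.0)) # True
```

As on the slide, D satisfies east, south and southeast with respect to A, and southeast is reported as the exact relation because it is the most specific one.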
13. Neighborhood Relations [Database Primitives for Spatial Data Mining]

There are different kinds of neighborhood relations:
- topological relations
- distance relations
- direction relations

These relations can be combined by the logical operators ∧ (and) and ∨ (or). The result is called a complex neighborhood relation.

15. Overview [Knowledge Discovery in Spatial Databases]

There are different methods for discovering knowledge:
- generalization-based methods for mining spatial characteristic and discriminant rules
- an aggregate proximity technique for finding characteristics of spatial clusters
- a two-step spatial computation technique for mining spatial association rules

17. Generalization-Based Knowledge Discovery [Knowledge Discovery in Spatial Databases]

Generalization means a reduction of attribute values to a certain (small) set of categories (concept hierarchy). This reduction often requires the existence of background knowledge.

Two algorithms:
- spatial-data-dominant generalization
- non-spatial-data-dominant generalization
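Generalization along a concept hierarchy, as defined on the "Generalization-Based Knowledge Discovery" slide, can be sketched as follows. The hierarchy contents, the function name `generalize` and the rule "lift every value by one level until at most `threshold` distinct values remain" are assumptions made for illustration; the slides do not specify how the generalization threshold is applied.

```python
from collections import Counter

# Hypothetical concept hierarchy (background knowledge): child -> parent.
HIERARCHY = {
    "corn": "grain", "wheat": "grain", "rice": "grain",
    "apple": "fruit", "pear": "fruit",
    "grain": "crop", "fruit": "crop",
}


def generalize(values: list[str], hierarchy: dict[str, str], threshold: int) -> Counter:
    """Replace values by their parents until at most `threshold` categories remain.

    Stops early if no value can be generalized any further.
    """
    current = list(values)
    while len(set(current)) > threshold:
        lifted = [hierarchy.get(v, v) for v in current]
        if lifted == current:  # top of the hierarchy reached
            break
        current = lifted
    return Counter(current)


crops = ["corn", "wheat", "apple", "pear", "rice"]
print(generalize(crops, HIERARCHY, threshold=2))  # Counter({'grain': 3, 'fruit': 2})
print(generalize(crops, HIERARCHY, threshold=1))  # Counter({'crop': 5})
```

The counts per category are kept because they are what a characteristic rule would later report, for example "60% of the fields grow grain".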
18. Spatial-Data-Dominant Generalization [Knowledge Discovery in Spatial Databases]

Algorithm:
- Collect all data described by the query.
- Perform generalization first on the spatial data until the generalization threshold is reached.
- Retrieve and analyze the non-spatial data for each spatial object.

Computational complexity is O(N log N), where N is the number of spatial objects.

19. Non-Spatial-Data-Dominant Generalization [Knowledge Discovery in Spatial Databases]

Algorithm:
- Collect all data described by the query.
- Perform generalization on the non-spatial attributes until the generalization threshold is reached.
- Merge neighboring areas with the same generalized attributes.

Computational complexity is O(N log N), where N is the number of spatial objects.

20. Generalization-Based Knowledge Discovery [Knowledge Discovery in Spatial Databases]

Problems of generalization:
- Hierarchies may not be present a priori.
- The quality of mined characteristic rules depends heavily on the given concept hierarchies.

Solution: an algorithm that does not depend on spatial concept hierarchies.

22. Spatial Clustering [Knowledge Discovery in Spatial Databases]

Clustering means dividing all objects into different groups (clusters) so that all members of a cluster are as similar as possible, whereas the members of different clusters differ as much as possible from each other.

In the spatial context, objects are compared by their distance. The central object of a group is called the medoid. Non-medoid objects belong to the nearest medoid object.

Clustering algorithms (basic concepts):
1. Find k medoids randomly and calculate the nearest medoid for all other objects.
2. Find better medoids among the non-medoid objects, so that the overall distance decreases.
3. Exchange the newly found medoid and reallocate the non-medoid objects.
4. Repeat step 1 or 2 or break.

23. Spatial Clustering: Algorithms (1) [Knowledge Discovery in Spatial Databases]

Examples of clustering algorithms:
- PAM (Partitioning Around Medoids)
  - always search for the best partner for exchange
- CLARA (Clustering LARge Applications)
  - use PAM on a random subset
  - iterate pass
- CLARANS (Clustering Large Applications based on RANdomized Search)
  - exchange immediately if possible
  - iterate pass (parameterized)

In experiments CLARANS turned out to be the most efficient.

24. Spatial Clustering: Algorithms (2) [Knowledge Discovery in Spatial Databases]

Based upon CLARANS there are two spatial data mining algorithms:
- SD(CLARANS): spatial dominant CLARANS
  - Clustering of all spatial attributes using CLARANS.
  - Generalization of the non-spatial attributes for each object.
- NSD(CLARANS): non-spatial dominant CLARANS
  - Generalization of the non-spatial attributes.
  - For each generalized tuple, spatial data is collected and clustered using CLARANS.
  - If possible, merge clusters.
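The medoid-exchange idea behind the clustering slides can be sketched as a randomized local search in the spirit of CLARANS: pick a random (medoid, non-medoid) pair and exchange immediately if the overall distance decreases. This is a simplified illustration, not the published algorithm: it runs a single local search instead of several parameterized passes, uses Euclidean distance, and the names `total_cost`, `randomized_k_medoids`, `assign` and the parameter `max_neighbor` are hypothetical.

```python
import math
import random

Point = tuple[float, float]


def total_cost(points: list[Point], medoids: list[Point]) -> float:
    """Overall distance: each object contributes the distance to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)


def randomized_k_medoids(points: list[Point], k: int,
                         max_neighbor: int = 50, seed: int = 0):
    """Swap random medoid/non-medoid pairs; accept a swap as soon as it lowers the cost.

    Stops after `max_neighbor` consecutive swaps that bring no improvement.
    """
    if not 0 < k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    rng = random.Random(seed)
    medoids = rng.sample(points, k)  # step 1: random initial medoids
    cost = total_cost(points, medoids)
    failures = 0
    while failures < max_neighbor:
        i = rng.randrange(k)
        candidate = rng.choice([p for p in points if p not in medoids])
        trial = medoids[:i] + [candidate] + medoids[i + 1:]
        trial_cost = total_cost(points, trial)
        if trial_cost < cost:        # steps 2 and 3: exchange immediately
            medoids, cost, failures = trial, trial_cost, 0
        else:
            failures += 1
    return medoids, cost


def assign(points: list[Point], medoids: list[Point]) -> dict[Point, list[Point]]:
    """Allocate every object to its nearest medoid."""
    clusters: dict[Point, list[Point]] = {m: [] for m in medoids}
    for p in points:
        clusters[min(medoids, key=lambda m: math.dist(p, m))].append(p)
    return clusters


data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
medoids, cost = randomized_k_medoids(data, k=2)
for medoid, members in assign(data, medoids).items():
    print(medoid, members)
```

PAM would instead evaluate every possible exchange and take the best one, which is more expensive per step; the immediate exchange shown here is what the slide describes for CLARANS.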
26. Spatial Association Analysis [Knowledge Discovery in Spatial Databases]

Spatial association rule (X, Y sets of spatial or non-spatial predicates, c% confidence):

  X → Y (c%)

Example:

  is_a(x, school) → close_to(x, park) (80%)
  "80% of schools are close to parks"

- Only association rules that apply to a high percentage of objects are interesting (minimum support threshold).
- Only association rules that have a high confidence are interesting (minimum confidence threshold).

28. Lessons learned [Conclusion]

- Spatial data mining extends relational data mining with respect to special features of spatial data, like the mutual influence of neighboring objects through certain factors (topology, distance, direction).
- Spatial data mining is based on techniques like generalization, clustering and mining association rules.
- Some algorithms require further expert knowledge that cannot be mined from the data, like concept hierarchies.

29. Effects on Active Campus Garching [Conclusion]

- Need for data mining, especially association rules
- Need for efficient algorithms
- Not only spatial but also temporal, or better spatio-temporal, data mining will be crucial.
- Combination of different concepts (especially from the last three talks).

30. Bibliography

[1] Agrawal, Imielinski, Swami. Mining Association Rules between Sets of Items in Large Databases
[2] Ester, Frommelt, Kriegel, Sander. Spatial Data Mining: Database Primitives, Algorithms and Efficient DBMS Support
[3] Schober. Seminararbeit: Spatial Data Mining
[4] Knecht. Data Mining Verfahren: Übersicht
[5] Koperski, Han, Adhikary. Spatial Data Mining

... Merry Christmas! THE END.
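As a worked illustration of the support and confidence thresholds from the "Spatial Association Analysis" slide, the sketch below evaluates the rule is_a(x, school) → close_to(x, park) on a small made-up data set. The data, the dictionary keys and the function names are hypothetical; support is taken to be the fraction of all objects satisfying both X and Y, and confidence the fraction of objects satisfying X that also satisfy Y, which is the usual reading of these terms in [1].

```python
from typing import Callable, Iterable

Predicate = Callable[[dict], bool]


def rule_quality(objects: Iterable[dict], x: Predicate, y: Predicate) -> tuple[float, float]:
    """Return (support, confidence) of the rule x -> y over the given objects."""
    objs = list(objects)
    n_x = sum(1 for o in objs if x(o))
    n_xy = sum(1 for o in objs if x(o) and y(o))
    support = n_xy / len(objs) if objs else 0.0
    confidence = n_xy / n_x if n_x else 0.0
    return support, confidence


def is_interesting(support: float, confidence: float,
                   min_support: float, min_confidence: float) -> bool:
    """A rule is kept only if it passes both minimum thresholds."""
    return support >= min_support and confidence >= min_confidence


# Made-up data: 5 schools (4 of them close to a park) and 5 shops (2 close to a park).
objects = (
    [{"type": "school", "close_to_park": True}] * 4
    + [{"type": "school", "close_to_park": False}]
    + [{"type": "shop", "close_to_park": True}] * 2
    + [{"type": "shop", "close_to_park": False}] * 3
)

support, confidence = rule_quality(
    objects,
    x=lambda o: o["type"] == "school",
    y=lambda o: o["close_to_park"],
)
print(support, confidence)  # 0.4 0.8
print(is_interesting(support, confidence, min_support=0.3, min_confidence=0.7))  # True
```

Here 4 of the 10 objects are schools close to a park (support 40%), and 4 of the 5 schools are close to a park (confidence 80%), matching the "80% of schools are close to parks" reading of the example rule.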