Topological Data Analysis and Network Coverage
EE122 Project, Spring 2014
Rey Blume, Eric Chu
TDA: Motivation and Intro
• http://www.coloquios.info/ponencias/MBGT-TopologicalDataAnalysis.pdf
• Algebraic topology
• Increasing amounts of data being produced, much of it high-dimensional
• Qualitative information needed to make sense of the data and find its structure
• Metrics and coordinates used to calculate statistical values (means, distances, etc.) often unjustified, e.g. biological problems
• Clustering algorithms brittle to choice of epsilon
• Why topology?
– Study of qualitative geometric information, connectivity
– Less sensitive to actual choice of metrics, coordinate-free
– Functoriality: inclusion maps between spaces (in our case, complexes) allow us to make conclusions at the global level from local pieces
• TDA: Applications
Basic Approach
Ex. Torus
Algebraic Topology: Topology
• ‘Rubber-sheet geometry’
• Continuity
• Different levels of ‘sameness’
– Homeomorphism, homotopy equivalence
– Homology computationally tractable
– Ex.
Simplicial Complexes
• Def. A triangulation of a space X is a simplicial complex K with a homeomorphism |K| -> X
– Choice of triangulation doesn’t matter (homology is the same for any triangulation)
– Therefore, can go in reverse: point-cloud -> complex -> space
• A simplicial complex K is a set of simplices where
– Any face of a simplex in K is also in K
– Intersection of any two simplices i, j in K is either empty or a face of both i and j
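The face condition above can be checked mechanically. The following is a minimal sketch, assuming simplices are given abstractly as tuples of vertex labels (for abstract complexes the intersection condition then holds automatically); the function name `is_simplicial_complex` is illustrative only.

```python
from itertools import combinations

def is_simplicial_complex(simplices):
    """Return True if every nonempty proper face of every simplex is present."""
    complex_ = {frozenset(s) for s in simplices}
    for simplex in complex_:
        # Enumerate faces with 1 .. len(simplex)-1 vertices.
        for k in range(1, len(simplex)):
            for face in combinations(simplex, k):
                if frozenset(face) not in complex_:
                    return False
    return True

# A filled triangle with all of its edges and vertices.
filled_triangle = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]
# The same triangle with edge (1, 2) left out: not closed under faces.
missing_edge = [(0,), (1,), (2,), (0, 1), (0, 2), (0, 1, 2)]

print(is_simplicial_complex(filled_triangle))
print(is_simplicial_complex(missing_edge))
```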
Simplicial Complexes
• Cech Complex
– Intersection of epsilon/2 balls = edge
– Too computationally intensive for any n-simplex with n > 1
• Therefore, Rips Complex
– Pair-wise computations: add an edge if distance < epsilon
– Add higher-dimensional simplices whenever possible (all of their faces have been added)
– Not guaranteed to be homotopy equivalent to the cover of the set, but seems to work reasonably well
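The Rips construction described above needs only pairwise distances. Below is a brute-force sketch, assuming Euclidean points and a small point count; `rips_complex` and `max_dim` are illustrative names and this is not how JavaPlex builds the complex internally.

```python
from itertools import combinations
from math import dist

def rips_complex(points, epsilon, max_dim=2):
    """Vietoris-Rips complex up to dimension max_dim, as tuples of point indices."""
    n = len(points)
    # Step 1: pairwise computations, an edge whenever distance < epsilon.
    adjacent = {
        frozenset((i, j))
        for i, j in combinations(range(n), 2)
        if dist(points[i], points[j]) < epsilon
    }
    simplices = [(i,) for i in range(n)]
    # Step 2: add a simplex on k vertices whenever all of its edges are present.
    for k in range(2, max_dim + 2):
        for verts in combinations(range(n), k):
            if all(frozenset(e) in adjacent for e in combinations(verts, 2)):
                simplices.append(verts)
    return simplices

# Corners of a unit square: the diagonals have length sqrt(2), about 1.414.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(len(rips_complex(square, 1.1)))  # sides only: 4 vertices + 4 edges
print(len(rips_complex(square, 1.5)))  # diagonals too: 4 + 6 edges + 4 triangles
```

The enumeration is exponential in the number of points, which is one way to see why Rips construction becomes a bottleneck on larger data.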
Rips Complex
Homology Groups
• Groups
– Set of elements with an operation that satisfies closure, associativity, identity element, inverse element
– Ex. Integers with addition; integers with multiplication is NOT a group (no inverses); symmetry group (e.g. square with rotations)
• Chain groups
– k-chain = formal sum of oriented k-simplices
– C_k(K) := kth chain group, the set of all k-chains
– Relate chain groups of successive dimensions through the boundary operator
• Boundary of a k-simplex = alternating sum of its (k-1)-faces
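The boundary operator can be written out directly. This is a small sketch, assuming a chain is represented as a dict from oriented simplices (vertex tuples) to integer coefficients; that representation and the name `boundary` are choices made for this illustration. Applying it twice to a triangle shows that the boundary of a boundary is zero.

```python
from collections import Counter

def boundary(chain):
    """Boundary of a chain given as {oriented simplex (tuple): integer coefficient}."""
    result = Counter()
    for simplex, coeff in chain.items():
        if len(simplex) == 1:
            continue  # the boundary of a vertex is zero
        for i in range(len(simplex)):
            # Drop the i-th vertex; the sign alternates with i.
            face = simplex[:i] + simplex[i + 1:]
            result[face] += (-1) ** i * coeff
    # Keep only terms that did not cancel.
    return {s: c for s, c in result.items() if c != 0}

triangle = {(0, 1, 2): 1}
edges = boundary(triangle)
print(edges)            # {(1, 2): 1, (0, 2): -1, (0, 1): 1}
print(boundary(edges))  # {} : every vertex cancels
```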
Homology Groups
• Chain complex C*
• kth cycle group
• kth boundary group
• Homology groups from chain groups
– “cycles mod boundaries” - boundaries become identity element in new group
– Cycles that aren’t boundaries are holes
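For reference, the standard definitions behind the bullets above, with \partial_k denoting the boundary operator on C_k (notation assumed here, since the slide formulas were not preserved):

```latex
\cdots \xrightarrow{\partial_{k+1}} C_k \xrightarrow{\partial_k} C_{k-1}
       \xrightarrow{\partial_{k-1}} \cdots \xrightarrow{\partial_1} C_0 \to 0,
\qquad \partial_k \circ \partial_{k+1} = 0

Z_k = \ker \partial_k, \qquad
B_k = \operatorname{im} \partial_{k+1}, \qquad
B_k \subseteq Z_k, \qquad
H_k = Z_k / B_k
```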
Betti Numbers
• Rank of nth homology group = nth Betti number
• Computation of rank is just linear algebra given simplices
– Rank-Nullity Thm.
• Euler-Poincare Formula
– Alternating sum of Betti numbers equals the Euler characteristic (topological invariant)
• Betti-0 = # connected components
• Betti-1 = # loops
• Betti-2 = # cavities
• Ex. Circle, Torus, Solid Sphere
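The rank computation can be sketched in a few lines. The example below assumes Z/2 coefficients (so orientation can be ignored) and a valid complex given as vertex tuples; `rank_gf2` and `betti_numbers` are illustrative names, not JavaPlex functions. It uses rank-nullity: Betti-k = (# k-simplices) - rank(d_k) - rank(d_{k+1}).

```python
from itertools import combinations

def rank_gf2(rows):
    """Rank over Z/2 of a matrix whose rows are integer bitmasks."""
    basis = {}  # highest set bit -> reduced row with that pivot
    for row in rows:
        while row:
            pivot = row.bit_length() - 1
            if pivot not in basis:
                basis[pivot] = row
                break
            row ^= basis[pivot]  # eliminate the pivot bit
    return len(basis)

def betti_numbers(simplices):
    """Betti numbers (Z/2 coefficients) of a complex given as vertex tuples."""
    by_dim = {}
    for s in simplices:
        by_dim.setdefault(len(s) - 1, []).append(tuple(sorted(s)))
    top = max(by_dim)
    ranks = {0: 0, top + 1: 0}  # d_0 and d_{top+1} are zero maps
    for k in range(1, top + 1):
        index = {s: i for i, s in enumerate(by_dim.get(k - 1, []))}
        rows = []
        for s in by_dim.get(k, []):
            mask = 0
            for face in combinations(s, k):  # faces with one vertex removed
                mask |= 1 << index[face]
            rows.append(mask)
        ranks[k] = rank_gf2(rows)
    return [len(by_dim.get(k, [])) - ranks[k] - ranks[k + 1]
            for k in range(top + 1)]

# Circle: hollow triangle. Hollow sphere: boundary of a tetrahedron.
circle = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]
sphere = ([(i,) for i in range(4)]
          + list(combinations(range(4), 2))
          + list(combinations(range(4), 3)))
print(betti_numbers(circle))  # [1, 1]
print(betti_numbers(sphere))  # [1, 0, 1]
```

For the hollow sphere, the Euler-Poincare check is 4 - 6 + 4 = 2 = 1 - 0 + 1.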
Persistent Homology and Barcodes
• Want to ignore noise, capture features that persist
• Increasing epsilon creates different complexes over time
• Barcodes are simply a neat way of capturing that information
– Current research includes statistical methods to analyze barcodes
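For Betti-0, the barcode can be computed without building the full complex: sort the edges by length and merge components with union-find. The sketch below assumes Euclidean points and the convention that an edge appears when epsilon reaches its length; `h0_barcode` is an illustrative name, and higher-dimensional barcodes need the full persistence algorithm (as in JavaPlex).

```python
from itertools import combinations
from math import dist, inf

def h0_barcode(points):
    """(birth, death) bars for connected components as epsilon increases."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    bars = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            bars.append((0.0, length))  # one component dies at this epsilon
    if n:
        bars.append((0.0, inf))  # the last component persists forever
    return bars

# Two nearby points and one far away: one short bar (noise), one long bar.
pts = [(0, 0), (1, 0), (5, 0)]
print(h0_barcode(pts))  # [(0.0, 1.0), (0.0, 4.0), (0.0, inf)]
```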
Network Coverage: Problem, Classical Solutions
• Most prior work fell into one of two groups: approaches that used geometric analysis to obtain an exact answer, and approaches that sought a non-deterministic approximation but assumed significant capabilities of the sensors.
• The former requires a great deal of prior knowledge about the geometry of the domain and the exact location of the sensors, or at least exact distances for every pair of sensors. The latter does not require this exactness, but often requires a uniform distribution of nodes or a high level of intelligence in the sensors.
• http://www.elizabethmunch.com/math/research/ElizabethMunch-TimeVaryingPersistence.pdf
Network Coverage: Why TDA?, Applications
• Ghrist
– GPS can be unattractive due to: cost, power consumption, accuracy limitations
• Coverage problem can be solved if we have:
– Exact knowledge of the coverage area shape,
– Exact knowledge of each sensor’s position, and
– Centralized information gathering and processing
• But, using TDA, can solve even if we have:
– Unknown coverage area shape
– Crude proximity information
– Centralized information gathering and processing (still need this one)
• Topology gives global information from local inputs
Network Coverage: Why TDA?, Applications
• Especially applicable for ad-hoc networks, which are a hot area of research, entering public usage
– new iPhone mesh network functionality
– Egypt, openmeshnetwork
• Robotic sensors
Simulation: Intel Lab Data
• http://db.csail.mit.edu/labdata/labdata.html
• 54 sensors in Intel Berkeley Research Lab collecting humidity, temperature, light, etc.
• Computations done using JavaPlex and MATLAB
Simulation: Intel Lab Data, Euclidean Position
Max_filtration = 100, num_divisions = 100, Vietoris-Rips
Simulation: Intel Lab Data, Euclidean Position
Max_filtration = 100, num_divisions = 50, Vietoris-Rips
Fewer divisions = larger gap between homology calculations = some granularity lost
Simulation: Intel Lab Data, Euclidean Position
Max_filtration = 100, num_divisions = 10, Vietoris-Rips
Simulation: Intel Lab Data, Complexity
• Rips complex construction can be a bottleneck
– # of simplices for Intel example
• Witness/Lazy-witness complexes create far fewer simplices than Rips
– Landmark points
• 1) random sampling of landmark points L
• 2) greedy inductive selection process called sequential maxmin
• Formal definitions of each
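Sequential maxmin can be sketched as follows: start from a seed point, then repeatedly add the point farthest from the landmarks chosen so far. This is a minimal version assuming distinct Euclidean points; the fixed seed index `first` is an assumption made for reproducibility (in practice the seed is often chosen at random), and `sequential_maxmin` is an illustrative name rather than the JavaPlex API.

```python
from math import dist

def sequential_maxmin(points, num_landmarks, first=0):
    """Greedy landmark selection: indices of the chosen landmark points."""
    landmarks = [first]
    # Distance from each point to its nearest landmark so far.
    min_dist = [dist(p, points[first]) for p in points]
    while len(landmarks) < min(num_landmarks, len(points)):
        # Next landmark: the point that maximizes the minimum distance.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        landmarks.append(nxt)
        min_dist = [min(d, dist(p, points[nxt]))
                    for d, p in zip(min_dist, points)]
    return landmarks

points = [(0, 0), (1, 0), (10, 0), (11, 0), (5, 0)]
print(sequential_maxmin(points, 3))  # [0, 3, 4]
```

Compared with random sampling, maxmin tends to spread landmarks evenly over the data, at the cost of favoring outliers.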
Simulation: Intel Lab Data, Complexity
• Results: # simplices, run-time for each complex under different parameters
• Num_divisions = 50
– Rips: t=21.2005s, num_simplices=342540
– Witness; Lazy Witness (t, num_simplices):
• 20 L pts: 0.9828, 6195; 0.6864, 6195
• 30 L pts: 1.2792, 31930; 1.0920, 31930
• 40 L pts: 3.9156, 102090; 3.6816, 102090
Witness with only 20 landmark pts is still largely accurate
Simulation: Intel Lab Data, Connectivity Data
• Data: probability that sensor A will be able to talk to sensor B
– Asymmetric
• Create 1-simplex if P(A,B) > thres && P(B,A) > thres
• Create 2-simplex if all directional pairs in the triplet are > thres
• Global connectivity data from local data
• Studies show poor correlation between distance and signal anyway
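The thresholding rule above can be sketched as follows, assuming the connectivity data is available as a square matrix `prob` with `prob[a][b]` the probability that a can talk to b. The matrix values below are made up for illustration and are not taken from the Intel Lab data.

```python
from itertools import combinations

def connectivity_complex(prob, thres):
    """Vertices, edges, and triangles from an asymmetric link-probability matrix."""
    n = len(prob)
    vertices = [(i,) for i in range(n)]
    # 1-simplex only if the link is good in both directions.
    edges = [(a, b) for a, b in combinations(range(n), 2)
             if prob[a][b] > thres and prob[b][a] > thres]
    edge_set = set(edges)
    # 2-simplex only if every directional pair in the triplet passes.
    triangles = [t for t in combinations(range(n), 3)
                 if all(e in edge_set for e in combinations(t, 2))]
    return vertices + edges + triangles

# Hypothetical 3-sensor example; link 2 -> 0 is weak (0.2).
prob = [
    [0.0, 0.9, 0.6],
    [0.8, 0.0, 0.7],
    [0.2, 0.75, 0.0],
]
print(connectivity_complex(prob, 0.5))  # edges (0, 1) and (1, 2) only
print(connectivity_complex(prob, 0.1))  # all edges plus the triangle
```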
Threshold = 0.05, 3, 0, 2026
Threshold = 0.3, 3, 349
Threshold = 0.1, 3, 2, 981
Threshold = 0.3, 4, 118
Threshold = 0.3, 4, 7, 96
Threshold = 0.5, 11, 18
Threshold = 0.5, 11, 1, 3
Threshold = 0.7, 46, 0
Another Dataset/Moving Network?
• http://crawdad.cs.dartmouth.edu/all-byname.html
– Dataset of mobility traces of taxi cabs in San Francisco, USA. !!
– Dataset of WiFi-based connectivity between basestations and vehicles in urban settings.
– Dataset of received signal strength indication (RSSI) collected from within an indoor office building.
– Dataset of coverage and performance-related information of MetroFi, an 802.11x municipal wireless mesh network in Portland, Oregon in 2007. !!
– Data set consisting of measurements from two different wireless mesh network testbeds (802.11g and 802.11a).
• http://www.wings.cs.sunysb.edu/wiki/doku.php?id=mutli-channel-dataset
Future Research
• ns-3, mobile ad-hoc network
– Google Loon, Facebook Drones
• Distributed homology calculation
• Pursuit evasion problem: Betti-0 = 0 over time