I2.1: In-Network Storage
NS-CTA INARC Meeting, 23-24 March 2011, Cambridge, MA
Slide 2
Research Objective
Devise techniques to enhance the efficacy of data dissemination and query processing in an information network that is distributed over an unreliable communication network.

Provenance Dissemination References
M. Srivatsa, W. Gao and A. Iyengar. Provenance driven Data Dissemination in Disruption Tolerant Networks. Under review (Fusion 2011). (IBM/INARC & PSU/CNARC)
W. Gao, A. Iyengar, M. Srivatsa and G. Cao. Supporting Cooperative Caching in Disruption Tolerant Networks. In IEEE Intl Conference on Distributed Computing Systems (ICDCS), 2011. (IBM/INARC & PSU/CNARC)
Y. Zhang, W. Gao, G. Cao, T. La Porta, B. Krishnamachari and A. Iyengar. Social-Aware Data Diffusion in Delay Tolerant MANETs. Book chapter to appear in Handbook of Optimization in Complex Networks: Communication and Social Networks, Springer. (IBM/INARC & PSU/CNARC)
W. Gao, A. Iyengar and M. Srivatsa. System and Method for Caching Provenance Information. Patent filed. (IBM/INARC & PSU/CNARC)
Discussions with Robert Cole (CERDEC), John Hancock (ArtisTech/IRC), Matthew Aguirre (ArtisTech/IRC), Alice Leung (BBN/IRC)
  Used our DTN in-network caching code to run experiments on traces typical of military scenarios
Slide 7
2010 IBM Corporation
Disruption Tolerant Networks (DTNs)
Opportunistic and intermittent network connectivity
Low node density
Unpredictable node mobility
Slide 8
Network Model
Network contact graph at time t
  An edge (i, j) exists iff nodes i and j have been in contact before time t
Each edge is modeled by a contact process (e.g., a homogeneous Poisson process)
  Pairwise inter-contact time is exponentially distributed
  Parameter: pairwise contact rate
  We can predict, in expectation, when the next contact will happen
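
As an illustration of the contact model above, here is a minimal sketch that assumes contacts between one pair of nodes follow a homogeneous Poisson process. The function names and the sample timestamps are hypothetical and do not come from the slides; the rate estimate is the standard maximum-likelihood estimate for exponentially distributed gaps.

```python
import math


def estimate_contact_rate(contact_times):
    """Estimate the pairwise contact rate from sorted contact timestamps.

    For exponentially distributed inter-contact times the maximum-likelihood
    rate is (number of gaps) / (total length of the gaps).
    """
    gaps = [later - earlier for earlier, later in zip(contact_times, contact_times[1:])]
    total = sum(gaps)
    if not gaps or total <= 0:
        raise ValueError("need at least two distinct contact times")
    return len(gaps) / total


def expected_wait(rate):
    """Expected time until the next contact (mean of the exponential)."""
    return 1.0 / rate


def contact_probability(rate, horizon):
    """Probability that at least one contact occurs within `horizon`."""
    return 1.0 - math.exp(-rate * horizon)


# Hypothetical contact log for one node pair (time units are arbitrary).
rate = estimate_contact_rate([0.0, 2.0, 4.0, 6.0])
print(rate, expected_wait(rate), contact_probability(rate, 2.0))
```

Under this model the prediction is probabilistic: the sketch gives an expected waiting time and a contact probability over a horizon, not an exact contact instant.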
Slide 9
Basic Idea
Semi-ring provenance model
  $d = a \lor (b \land c)$
  Provenance level partial order: {} < {b}, {c} < {a}, {b, c} < {a, b, c}
Quantifying marginal provenance level of a data item
  $d = f(a_1, \ldots, a_n)$
  $d = a \cdot (b + c) \rightarrow w^d_b = w^d_c = \ldots;\ w^d_a = \ldots$
Utility-based data placement
  A unified probabilistic framework
  Two caching nodes optimize their caches upon contact
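
To make the partial order on provenance levels concrete, here is a small sketch for d = a OR (b AND c). The scoring function is an assumption introduced purely for illustration, not the metric used in the slides: it sums, over each minimal set of sources that derives d, the fraction of that set that is available. This particular choice happens to reproduce the ordering {} < {b}, {c} < {a}, {b, c} < {a, b, c}.

```python
from fractions import Fraction

# Minimal source sets that each derive d = a OR (b AND c).
WITNESSES = [frozenset({"a"}), frozenset({"b", "c"})]


def derivable(sources, witnesses=WITNESSES):
    """True if the available sources contain at least one complete witness."""
    available = set(sources)
    return any(w <= available for w in witnesses)


def provenance_score(sources, witnesses=WITNESSES):
    """Hypothetical score: sum of the covered fraction of every witness."""
    available = set(sources)
    return sum(Fraction(len(w & available), len(w)) for w in witnesses)


for subset in [set(), {"b"}, {"c"}, {"a"}, {"b", "c"}, {"a", "b", "c"}]:
    print(sorted(subset), derivable(subset), provenance_score(subset))
```

Exact fractions are used so that sets at the same level, such as {a} and {b, c}, compare as equal rather than differing by floating-point error.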
Diagram labels: Partition Map, Network Contact Graph, Routing Table (opportunistic path), Worker, Message Queue, Overlapped P
Preliminary Experiments: Running Sedge on DTNs
Data
  Web graph: 30M vertices, 150M edges
  # of partitions = # of nodes in contact graph
Contact Graph
  1. Complete Contact Graph: 12 nodes
  2. MIT Bluetooth Contact: 9 nodes, 24 hours (http://crawdad.cs.dartmouth.edu)
  Assumption: pairwise contacts follow a Poisson process. Other, more sophisticated contact types are also supported by Sedge.
Query
  h-step Random Walk Query
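
The h-step random walk query can be sketched as follows on a partitioned graph. This is not Sedge code: the adjacency-list layout, the `partition_of` map and the idea of counting cross-partition hops as a rough proxy for the contacts a walk needs are all assumptions made for illustration.

```python
import random


def random_walk(adj, start, h, partition_of, rng=None):
    """Run an h-step random walk and count hops that cross partitions.

    adj:          dict mapping a vertex to a list of neighbor vertices
    partition_of: dict mapping a vertex to its partition (DTN node) id
    Returns (path, crossings). The walk stops early at a vertex with no
    neighbors.
    """
    rng = rng or random.Random()
    path = [start]
    crossings = 0
    node = start
    for _ in range(h):
        neighbors = adj.get(node, [])
        if not neighbors:
            break
        nxt = rng.choice(neighbors)
        if partition_of[nxt] != partition_of[node]:
            crossings += 1  # this hop would need data from another partition
        path.append(nxt)
        node = nxt
    return path, crossings


# Tiny hypothetical graph: a directed 3-cycle split over two partitions.
adj = {1: [2], 2: [3], 3: [1]}
partition_of = {1: 0, 2: 0, 3: 1}
print(random_walk(adj, 1, 4, partition_of, random.Random(0)))
```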
Slide 24
Complete Contact Graph
Figure: RW with random start; plotted quantity: average contacts per superstep
Cyclades
Cyclades is Sedge for DTNs
  BSP model: a synch corresponds to a contact in the DTN
  Naively running Sedge on DTNs may be inefficient (e.g., it requires a large number of contacts -> high query latency)
Revisit graph query processing with DTN constraints
  Machines have high data rates to each other whenever they are in contact
  Contact opportunities may be rare
Cyclades innovations:
  New metric: minimize the number of contacts (synchs in BSP)
  Algorithms based on intra-machine speculative execution and inter-machine opportunistic aggregation
  We present results on computing shortest-path-based metrics (e.g., node betweenness)
Slide 27
Cyclades initial results
Dataset: DBLP data with 1.2M papers
Information network
  Each paper is a node
  Two nodes have an undirected edge if they have a common author
  Weight of an edge is the inverse of the Jaccard distance between the author lists of the papers
Problem
  Compute node betweenness centrality
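
The edge weighting can be sketched as below, using the usual definition of Jaccard distance, 1 - |A ∩ B| / |A ∪ B|. Two details are assumptions because the slides do not specify them: papers with identical author lists (distance 0) get an infinite weight, and papers with no common author get no edge (`None`).

```python
import math


def jaccard_distance(authors_a, authors_b):
    """Jaccard distance between two author lists, treated as sets."""
    a, b = set(authors_a), set(authors_b)
    union = a | b
    if not union:
        raise ValueError("both author lists are empty")
    return 1.0 - len(a & b) / len(union)


def edge_weight(authors_a, authors_b):
    """Inverse Jaccard distance, or None when the papers share no author."""
    if not set(authors_a) & set(authors_b):
        return None  # no common author, so no edge
    distance = jaccard_distance(authors_a, authors_b)
    # Assumption: identical author lists map to an infinite weight.
    return math.inf if distance == 0 else 1.0 / distance


# Hypothetical author lists: one shared author out of three in total.
print(edge_weight(["x", "y"], ["y", "z"]))
```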
Slide 28
Cyclades initial results
Approach
  Partition nodes into clusters; define a perimeter node as a node that has an edge to a node in another cluster
  Intra-cluster: compute the all-pair shortest path matrix M between perimeter nodes (using only edges within the partition)
  Inter-cluster: on an opportunistic contact between two partitions i and j, merge the all-pair shortest path matrices M_i and M_j into M_ij
  Guarantee: M_ij is the all-pair shortest path matrix on perimeter nodes in the merger of partitions i and j
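
One way the merge step could work is sketched below. The slides do not give the merge procedure, so this is an assumption: treat the entries of M_i and M_j as shortcut edges, add the edges that run between the two partitions, and run Floyd-Warshall over the union of perimeter nodes. The function name and the dict-of-dicts matrix layout are hypothetical, and every endpoint of a cross edge is assumed to appear in M_i or M_j, which holds by the definition of a perimeter node.

```python
import math


def merge_perimeter_matrices(m_i, m_j, cross_edges):
    """Merge two perimeter shortest-path matrices into one.

    m_i, m_j:    dict node -> dict node -> shortest distance inside the
                 partition, defined on that partition's perimeter nodes
    cross_edges: iterable of (u, v, weight) undirected edges between the
                 two partitions
    """
    nodes = list(set(m_i) | set(m_j))
    dist = {u: {v: (0.0 if u == v else math.inf) for v in nodes} for u in nodes}
    # Intra-partition distances act as shortcut edges.
    for matrix in (m_i, m_j):
        for u, row in matrix.items():
            for v, d in row.items():
                dist[u][v] = min(dist[u][v], d)
    # Edges that connect the two partitions.
    for u, v, w in cross_edges:
        dist[u][v] = min(dist[u][v], w)
        dist[v][u] = min(dist[v][u], w)
    # Floyd-Warshall over the combined perimeter nodes.
    for k in nodes:
        for u in nodes:
            for v in nodes:
                via = dist[u][k] + dist[k][v]
                if via < dist[u][v]:
                    dist[u][v] = via
    return dist


# Hypothetical example: a-b costs 5 inside partition i, but 3 via partition j.
m_i = {"a": {"a": 0, "b": 5}, "b": {"a": 5, "b": 0}}
m_j = {"c": {"c": 0, "d": 1}, "d": {"c": 1, "d": 0}}
merged = merge_perimeter_matrices(m_i, m_j, [("a", "c", 1), ("b", "d", 1)])
print(merged["a"]["b"])
```

The sketch keeps every former perimeter node in the result. Nodes whose only external edges led into the other partition are no longer perimeter nodes after the merge and could be dropped to keep the matrix small.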
Slide 29
Cyclades initial results
Clustering on DBLP data with 100k and 200k nodes
Shows the number of perimeter nodes and edges, and the maximum size of clustered partitions
Slide 30
Cyclades initial results
Comparison of four shortest path algorithms
  Centralized (Dijkstra's algorithm)
  Pregel random (Pregel with random partitioning)
  Pregel cluster (Pregel with node clustering)
  Cyclades (requires the provably minimum number of contacts in the BSP model; guarantees on communication and computation cost are being explored)
Slide 31
Cyclades initial results
Number of synch operations and communication cost between partitions
  Each synch operation requires at least one DTN contact
  Since contacts in DTNs are rare, we have to minimize the number of synchs to ensure low query latency
Slide 32
Cyclades initial results
Improved communication cost in our approach comes at the price of higher computation and storage cost
Our approach trades off the number of synch operations (query latency) against computation and storage cost
Slide 33
Cyclades initial results
Computing the node betweenness centrality
  Randomly chosen (u, v) pairs
  Random walk sampling (picks nodes with high degree)
  Expansion sampling (greedily picks nodes with maximum expansion |N(S)|/|S|)
Figure shows the accuracy of node betweenness versus the number of samples
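
Expansion sampling can be sketched as a greedy loop. The slides do not define N(S); this sketch assumes it is the set of neighbors of S that lie outside S. Because every candidate at a given step yields a sample of the same size, maximizing |N(S)|/|S| reduces to maximizing |N(S)|. Breaking ties by smallest node id is also an assumption.

```python
def neighborhood(adj, selected):
    """Neighbors of the selected set that are not themselves selected."""
    reached = set()
    for u in selected:
        reached.update(adj[u])
    return reached - set(selected)


def expansion(adj, selected):
    """Expansion |N(S)| / |S| of a non-empty sample S."""
    return len(neighborhood(adj, selected)) / len(selected)


def expansion_sample(adj, k):
    """Greedily pick up to k nodes, each time maximizing |N(S + {v})|."""
    selected = set()
    order = []
    candidates = sorted(adj)  # sorted so that ties go to the smallest id
    for _ in range(min(k, len(candidates))):
        best = max(
            (v for v in candidates if v not in selected),
            key=lambda v: len(neighborhood(adj, selected | {v})),
        )
        selected.add(best)
        order.append(best)
    return order


# Hypothetical graph: a star centered on 0 with a tail 3-4-5.
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0, 4], 4: [3, 5], 5: [4]}
print(expansion_sample(adj, 2), expansion(adj, {0, 4}))
```

The selected nodes would then serve as sources or endpoints for the sampled shortest-path computations that approximate betweenness.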
Slide 34
Military and Network Science Relevance
Tactical military networks: intermittent connectivity, multiple modalities of communication, unreliable communication
  Need disruption-tolerant data dissemination and query answering
  Need the ability to control tradeoffs between quality and performance
Enhancing trust in decision making: provenance dissemination and distributed trust computation
Joint analysis of communication and information networks to enhance the efficacy of information delivery and query processing
  Examine a spectrum of expressiveness of information network models
Slide 35
Path Ahead
In-network analytics and query answering on DTNs (with CNARC)
  Examine diversity-based (partial) redundancy elimination mechanisms; quantify tradeoffs between quality and performance of query answering
  Characterize graph query processing algorithms that benefit from hierarchical decomposition and/or speculative execution
  Determine better graph partitioning, sampling and clustering strategies to enhance query processing
Slide 36
Collaborations
Within task
  Numerous telecons
  Xifeng's student (UCSB) -> IBM (summer 2011) to work on distributed graph query processing in DTNs
Within INARC
  I2.2: Provide query execution interface for the DTN (battlefield) context; modify distributed information network processing platform for DTNs
  I1: I2.1 provides a scenario and data set for investigation of QoI metrics for data pools (in I1.2) that maximize quality of fusion; I2.1 offers a test scenario for algorithmic advances in I1.1 that focus on improving inference in resource-constrained networks
With CNARC (Guohong Cao, PSU C2.1)
  Research interest in DTNs and in-network storage
  Guohong's student (PSU/CNARC) -> IBM/INARC (summer 2010 & 2011) to work on delta encoding in DTNs
With IRC (Vikas Kawadia, BBN T2.3)
  Distributed trust computation over in-network storage
Slide 37
Impact
Publications
  M. Srivatsa, W. Gao and A. Iyengar. Provenance driven Data Dissemination in Disruption Tolerant Networks. Under submission, Fusion 2011
  W. Gao, A. Iyengar, M. Srivatsa and G. Cao. Supporting Cooperative Caching in Disruption Tolerant Networks. In ICDCS 2011
  Y. Zhang, W. Gao, G. Cao, T. La Porta, B. Krishnamachari and A. Iyengar. Social-Aware Data Diffusion in Delay Tolerant MANETs. Book chapter to appear in Handbook of Optimization in Complex Networks: Communication and Social Networks, Springer
  W. Gao, A. Iyengar and M. Srivatsa. System and Method for Caching Provenance Information. Patent filed
  F. Le, M. Srivatsa, A. Iyengar and G. Cao. Resolving Negative Interferences between In-Network Caching Methods. Under preparation
Demo/Transitions
  Md Y. S. Uddin, Guo-Jun Qi, Tarek Abdelzaher and Guohong Cao. PhotoNet: A Similarity-aware Image Delivery Service for Situation Awareness. IPSN Demo, April 2011
  Collaborations with Robert Cole (CERDEC), John Hancock (ArtisTech/IRC), Matthew Aguirre (ArtisTech/IRC), Alice Leung (BBN/IRC)
    Used our DTN in-network caching code to run experiments on traces typical of military scenarios