Combs, Needles, Haystacks: Balancing Push and Pull for …graphics.stanford.edu/~jgao/qingfeng_comb.pdf · Combs, Needles, Haystacks: Balancing Push and Pull for Discovery in Large-Scale

Combs, Needles, Haystacks: Balancing Push andPull for Discovery in Large-Scale Sensor Networks

Xin Liu

Department of Computer ScienceUniversity of CaliforniaDavis, CA 95616, USA

Emails: [email protected]

Qingfeng Huang and Ying Zhang

Palo Alto Research Center (PARC) Inc.3333 Coyote Hill Road

Palo Alto, CA 94304, USAEmails:qhuang, [email protected]

Abstract— In this paper we investigate efficient strategies forsupporting on-demand information dissemination and gatheringin large-scale wireless sensor networks. In particular, we proposea “comb-needle” discovery support model resembling an ancientmethod: use a comb to help find a needle in sands or a haystack.The model combines push and pull for information disseminationand gathering. The push component features data duplicationin a linear neighborhood of each node. The pull componentfeatures a dynamic formation of an on-demand routing structureresembling a comb. It enables us to investigate the cost of aspectrum of push and pull combinations for supporting discoveryand query in large scale sensor networks. Our results showsthat the optimal routing structure depends on the frequency ofquery occurrence and the spatial-temporal frequency of relatedevents in the network. The benefit of balancing push and pullfor discovery in large scale geometric networks are demonstrated.We also raise the issue of query coverage in unreliable networksand investigate how redundancy can improve the coverage viaboth theoretical analysis and simulation.

I. I NTRODUCTION

The goal of this work is to investigate efficient and reliablerouting mechanisms for supporting queries on large scalead hoc wireless sensor networks. Recently, wireless sensornetwork has attracted a lot of attention from the researchcommunity [1], [2], [3], [4], [5], [6], [7], [8]. Many emergingsensor network applications involve dissemination of observedinformation to interested clients and demand efficient dissem-ination mechanisms due to resource limitations. For instance,a sensor network might be deployed for enhancing soldiers’battle field situation awareness when visibility is low (eitherin the night or due to smoke), as shown in Figure 1. Asolider might be interested in where the tanks are in thebattlefield. The nodes detecting the tanks can periodicallypush (broadcast) the information throughout the network inanticipation of soldier’s needs, as shown in Figure 2(a).This push-based information dissemination strategy is efficientwhen there are many solders in the network constantly in needof the information. But when the demand for the informationis low, this push-based strategy incurs high communicationoverhead since lots of broadcast bandwidth are wasted as thedemand for the data is low. Alternatively, the system maychoose a pull-based information dissemination strategy, asshown in Figure 2(b). The soldier broadcasts a query for the

information when it is needed, and when the nodes that havethe information receive query, they send the information tothesoldier. Even though this pull-based strategy has exact querybroadcast process, it is relatively more efficient than the push-based strategy when the frequency of query is relatively lowcompared to the frequency of the interested events, becausecommunication takes place only when it is needed. A naturalquestion is: can we combine the advantages of both push andpull strategies and build an efficient query-support mechanismthat adapts to the frequencies of query and events?

Fig. 1. A Query Example in Ad Hoc Networks

In this paper, we propose a novel “comb-needle” querysupport model that integrates both push and pull data dissem-ination, and analyze its the performance in large scale ad hocwireless networks. Using this model, we are able to explore awhole spectrum of push and pull strategies shown in Fig.3. Inthe comb-needle model, each sensor node pushes its data toa certain neighborhood and the query is disseminated only toa subset of the network. More specifically, the query processbuilds a routing structure dynamically that resembles a comband the sensor nodes push the data duplication structure likea needle, as illustrated in Figure 4. One can view this query

Fig. 3. A Spectrum of Push-Pull Strategies

(a)

(b)

Fig. 2. (a) Pure push-based strategy (b) Pure pull-based strategy

process like combing for needles in the sands or haystack. Thedesirable properties of this mechanism include:

• In most cases, the comb-needle strategy is more efficientthan both pure push and pure pull strategies. To someextend, pure push and pull strategies can be consider asthe end points of the combing strategy.

• The combing granularity is dynamically adjustable basedon demand. No infrastructure is builta priori.

(a)

(b)

Fig. 4. A Comb-Needle Example

We analyze the communication cost of the comb-needle strat-egy based on query frequency and event frequency, explorethe issue of query coverage and reliability in the case of faultynetworks, and study adaptive strategies for the case where thefrequencies of query and events are unknowna priori andtime-varying.

This study adopts a few assumptions about the environment,the network and the application scenario:

• the event of interest can occur randomly at anywhere atany time.

• the query entry point can be anywhere in the network andoccur at any time.

• the network is ad hoc and uniform, in the sense that allnodes are equivalent and the network does not have build-in hierarchy.

• the network is geometric and 2d. All nodes in the networkhave information of their locations in the 2d space.

The remainder of the paper is organized as follows: in Sec-tion II, we describe and analyze the comb-needle mechanism.In Section III, we present simulation results for supportingqueries on random grid networks. In Section IV, we explorethe coverage issues in an unreliable network and present morerobust strategies. We then propose the adaptive comb-needlestrategy in Section V, which is followed by a related workand discussion section, and a conclusion section.

II. BALANCING PUSH AND PULL

A. Motivation

We consider the global discovery type query such as “whereare all the tanks?”, “what locations have a temperature exceed-ing 90 degrees?” etc. For such queries, the whole network needto be traversed by the underlying query resolution mechanismin order to get a complete response. Note that not all queriesrequire a global discovery. Queries like “are there 5 tanks inthe region?” could only traverse part of the network (up to thepoint where the answer is positive.)

For resolving a global query, a frequently used querydissemination mechanism is the flooding-based querying(FBQ)[9]. In FBQ, the underlying query support mechanismfloods the request to the whole network. All nodes withrelevant information respond. The flooding often representthe most significant message overhead for such queries. Notethat we are only interested in one-shot query, rather thancontinuous query. In a continuous query, directed diffusiontype of schemes work sufficiently well since the initial floodingcost can be amortized over the duration of the continuousinformation flow from the sensor nodes to the querier.

FBQ represents a pull-based query support mechanism. Asnoted in the introduction, there is a potential advantage incombining a push based strategy with the pull based strategyto increase query efficiency. For instance, let’s consider asimple grid network like the one in Figure 5. Nodes(4, 2)and (7, 6) detected some abnormal eventE, and push theinformation vertically in the networks, and have it temporarilystored on the nodes traversed. Node(0, 8) is an entry point ofa discovery query interested in such event. Note that the queryonly needs to be disseminated horizontally to be resolved,i.e., the information pull only need to occur horizontallyrather than throughout the network. Assume that the eventsare only detected once at the two nodes, and the query onlyoccurred once as well, the total message cost for supportingthe query is about 30 by this push-pull model, comparing theapproximately 100 messages needed for the FBQ method. In

the present hardware state, communication cost often weightsmuch more than the storage cost, so this push-pull strategyseems desirable. One also can see that the relative benefit ofthis push-pull strategy really depends on the relative rates ofoccurrence of the events and query in the network. If there’smore queries, there is more saving in the message overheadfor supporting the queries. In this section we introduce a moregeneral pull-push model and analyze its performance.

Fig. 5. A Special Push-Pull Strategy

B. The Combing Strategy

The push-pull query support model we propose resemblesan action of combing for needles in a haystack or in a poolof sand, and is thus dubbed as the “comb-needle” model. Thecomb-needle query support model combines both push andpull in the following way: each sensor node pushes its datato a certain neighborhood and the query node disseminates itsrequest to a subset of the network. More specifically, the queryprocess builds a routing structure dynamically that resembles acomb and the sensor nodes push the data duplication structurelike a needle, as illustrated in Figure 4.

To analyze mathematically the cost and benefit of a querysupport mechanism, we need to have a model of the worldthat the sensor network is monitoring. In this paper, weassume the following random event and query model: eventsoccur uniformly in space and time across the sensor network,and the discovery query entry point can be any node in thenetwork with the same probability. Accordingly, we define twoparameters:

fq: the arrival frequency of discovery queries.fe: the arrival frequency of relevant events.

C. Analysis of the Combing Strategy

For simplicity, like in the earlier example, first consider agrid network withn2 nodes located at (i,j) where0 ≤ i, j < n.Each node sends its state update to2l of its vertical neighbors.For instance, node(i, j) will send its state update to nodes

(i, j + 1), (i, j + 2), ..., (i, j + l) and(i, j − 1), (i, j − 2), ...,(i, j − l). An example is shown in Fig. 6 withl = 2. Each

Fig. 6. Vertical Data Duplication

data push incurs costCl ∼ 2l (1)

in terms of the number of hops. We assume per-hop commu-nication costs for the query propagation and data propagationare the same.

The query process first sends the query vertically then fansout horizontally. For instance, assume that node(0, 0) in Fig. 7is the query source. A query is sent vertically from(0, 0)

Fig. 7. Combing

to (n − 1, 0), and it is also fanned out horizontally fromnodes(0, 0), (s, 0), (2s, 0),..., (bn−1

scs, 0), and (n − 1, 0).

The resulting routing structure looks like a comb. The querydissemination costCqd, in terms of network hops, is

Cqd = n− 1 + (n− 1)(bn− 1

sc+ 1) (2)

Note that if s = 2l + 1, the query dissemination reaches itsfinest level, in the sense that it can collect the sensing datafrom every node.

The cost of query response depends on the reply schemeused including data aggregation strategies used, if any. Forsimplicity, we first consider the simplest reply scheme withoutdata aggregation. Each node on the comb replies immediatelyif it has the matching data, including those data pushed fromthe nodes not on the comb . The reply paths are the reversesof the comb growing paths for the simplicity of analysis. Inreality, the reply path could be a shorter one found by anyad hoc geometric routing algorithm, since the location of thedestination is known.

Assume there is a match in node(i, j), where i =0, s, 2s, .... The reply cost is

cqr = j + i (3)

In the case when there is more than one matching data in atime window of τ , and they are uniformly distributed in thenetwork with densityfeτ/n2, the total reply cost is about:

Cqr ∼ (n− 1) ∗ feτ (4)

Assume that the query frequency isfq, the data eventfrequency isfe, andτ = 1 (in a unit time window).

The total cost per query:

C = Cqd + Cqr + fe ∗ Cl/fq (5)

' Cqr + n− 1 +(n− 1)2

s+ s ∗ fe

fq

(6)

≥ Cqr + n− 1 + 2(n− 1)

√

fe

fq

(7)

This minimum of the cost is around

soptimal ∼ (n− 1)

√

fq

fe

∼√

N

√

fq

fe

(8)

whereN is the total number of nodes in the network. One cansee that the result tells us that if the data frequency is relativelyhigher than the query frequency, the comb should be finer, i.e.,with smaller inter-spike spacings. In other words, when wehave more queries and less related events/data, the “combingdegree”S = s should be larger and the “push degree”L =2l+1 should be larger, i.e., more push and less pull is better insuch scenarios. On the other hand, when there are more eventsand less queries,L and S should be smaller. Furthermore,note that the scenario shown in Figure 7 is a global-pull-local-push one. From formula 8 we can see that whenfq > fe,soptimal ∼>

√N ∼ n. What really means is that whenfq >

fe, the global-pull-local-push model is no longer an optimalone, but rather, local-pull-global-push scenarios becomemoreeconomic. We will discuss this in more detail in Section VI.Henceforth, we focus on thefq ≤ fe scenario. The case wherefq < fe is essentially a symmetric one.

Note that the baseline cost (using FBQ) per query isO(N). So, comparing to the naive flooding based approach,

this comb-needle scheme dramatically reduces query costto O(

√N). Note that the possibility of achieving this cost

reduction partly relies on some knowledge about the expecteddata and query frequency. Note also that the duplication ofdata in neighboring nodes incurs some storage cost as well.In this study, we focus on communication cost by assumingcommunication (energy) is a more restricted resource thanstorage in sensor networks. Last, throughout the paper inanalysis, we assume that the per-hop communication costsfor query propagation, data duplication, and data report arethe same. Such an assumption can be easily generalized byincluding a weight in the communication cost.

III. S IMULATION

In this section, we verify the theoretical results given in theprevious sections via simulation. While in the previous sectionwe make simplified assumption about the network model, insimulation we adopt a more realistic one.

A. Radio Propagation Model

The radio propagation model determines the strength of atransmitted signal at a particular point of the space for alltransmitters in the system. Based on this information the signalreception conditions for the receivers can be evaluated andcollisions can be detected. The transmission model is givenby:

Prec,ideal(d)← Ptransmit

1

1 + dγ, where 2 ≤ γ ≤ 4 (9)

Prec(i, j)← Prec,ideal(di,j)(1 + α(i, j))(1 + β(t)) (10)

wherePtransmit is the signal strength at the transmitter andPrec,ideal(d) is the ideal received signal strength at distanced, α and β are random variables with normal distributionsN(0, σα) andN(0, σβ), respectively. A network is asymmetricif σα > 0 or σβ > 0. A node j can receive a packet fromnodei if Prec(i, j) > ∆ where∆ is the threshold. There is acollision if two transmissions overlap in time and both couldbe received successfully. Furthermore, an additional parameterperror models the probability of a transmission error by anyunmodeled effects.

B. Topology Model

While our analysis in the previous section is based on aregular grid network, in simulation we use a random gridtopology model which is more realistic in some deploymentscenarios. An instance of the random grid network is generatedas follows, first the nodes are put on a regular grid, then thexand y coordinates of each nodes is shifted independently viaa random offset generated from a uniform distribution.

C. Routing Protocol

Unlike in a regular grid network, building the comb-needlequerying in a random grid network needs extra attention,since the “left-right-up-down” in the random network is nolonger well-defined, and the “combs” and the “needles” canno longer be straight lines. To overcome this problem, we

build approximations of the combs and needles using a Con-strained Geographical Flooding (CFG) method, similar to theillustration in Figure 4(b). The “needles” and the spikes onthe “comb” has a width ofW , besides their lengths.

CGF is robust and also simple to implement. In CGF,whenever a new packet arrives, each node will decide if itshould rebroadcast the packet according to the geographicalconstraints for the type of packet. More specifically, for aquery packet, a node will broadcast only if it is at the combbranch, for an event packet, a node will broadcast only ifit is vertically within the duplication distance to the source;for a report packet, a node will broadcast only if it is at thecomb branch and closer to the query node. If the radio range islarger than the grid distance, duplicated number of packetsarereceived at each node while the total number of packets arestill bounded by the size of comb or needle. This increasesthe coverage and thus the robustness. The pseudo code ofthe protocol is shown in Figure 8, whereS, L and W arethe inter-spike distance, the needle push length and the CFGwidth, respectively.

D. The Simulation Environment

Prowler [10] is used for the simulation of comb-needlequery model. Prowler [10] is a probabilistic wireless networksimulator based on the event-driven structure. It can be setto operate in either deterministic mode (to produce replicableresults while testing the application) or in probabilisticmodethat simulates the nondeterministic nature of the communi-cation channel and the low-level communication protocol. Inparticular, Prowler provides the radio model in (9,10) anda MAC layer that simulates the Berkeley motes’ CSMAprotocol, including random waiting and back offs.

A topology model, an application model and a performancemodel are developed on the top of Prowler. The topologymodel is used to generate networks with grid or randomtopologies, the application model is to specify propertiesandlocations of sources and destinations, source/event rate andquery rate, etc. The performance model provides performancemetrics such as latency, success rate/loss rate, throughput,and energy consumption (for simplicity, we assume eachtransmission cost one unit of energy).

E. Performance Evaluations

Our simulations are all performed on a 10x10 network usingProwler’s default radio model, withσα = 0.45, σβ = 0.02,perror = 0.05 and ∆ = 0.1. Each test is performed with10 runs and then averaged (with query and event locationsrandomly selected). The transmit signal strength is set to1,so that the maximum radio range is about3d, whered is thedistance between two neighbor nodes in the grid. Figure 9shows an instance of the network connectivity of this radiomodel for a grid network with small random offset. Twoseparate tests are done in this environment. The first test isto discover what is the best spacing for the comb, givenquery rate and event rate, so as to verify the theoreticalresults discovered in the paper. The second test is to show

receivedqueryq at w doif new(q) andatComb(q.x, q.y, w.x, w.y, S,W ) then

broadcast q;events = getEvents(q);if events 6= ∅ then

broadcast r(events, q.x, q.y, w.x, w.y);end

endend

receivedevente at w doif new(e) andatNeedle(e.x, e.y, w.x, w.y, l,W ) then

copy e to memorybroadcast e;

endend

received reportr(events, qx, qy, sx, sy) at w doif new(r) andqx = w.x andqy = w.y do

return events;endif new(r) andatPath(qx, qy, sx, sy, w.x, w.y,W ) then

broadcast r;end

end

function out = atComb(qx, qy, wx, wy, S,W )d = mod (|qy − wy|, S);out = |qx − wx| < W or min(d, S − d) < W ;

function out = atNeedle(ex, ey, wx, wy, L,W )out = |ex − wx| < W and |ey − wy| < L + W ;

function out = atPath(qx, qy, sx, sy, wx, wy,W )out = |sy − wy| < W and (wx − sx)(qx − wx) > 0 or|qx − wx| < W and (wy − sy)(qy − wy) > 0

Fig. 8. Constrained geographical flooding for the comb-needle query model

0

Fig. 9. An instance of network connectivity

0

Fig. 10. A random network (grid with large offset)

0 10 20 30 40 50 60 70 80 90 1000

100

200

300

400

500

600

700

800

900

1000

Simulation time (seconds)

En

erg

y co

nsu

mp

tio

n

l=0 s=1l=1 s=3l=2 s=5l=3 s=7

Fig. 11. Energy consumption:fq = 0.1, fe = 1

the robustness of the protocol with a varying query width fora grid network with large random offset (Figure 10).

The first test was run on a network like Figure 9. We firstlet event rate be 1 packet per second (p/s), and query rate be0.1 p/s (fq/fe = 0.1). Let the push lengthl be 0, 1, 2, 3and comb spacingS be 1 + 2l, i.e., 1, 3, 5, 7, respectively.The simulation runs 100 seconds and Figure 11 shows theresult of energy consumption. In this case,l = 1, S = 3 hasthe minimum energy consumption. This results matches wellwith the predication using formula (8). We then reduce theevent rate to 0.1 (fq/fe = 1). Figure 12 shows the resultof energy consumption. In this case,S = 5 and S = 7 areclose (withS = 5 slightly better) to be the minimum energyconsumption. It is partly due to the boundary effect, since bothS = 5 andS = 7 have the same number of nodes in the comb(b 95c = b 97c, but S = 5, l = 2 has shorter needles.

The second test was run on a network like Figure 10. In thiscase, we letfq = 0.1, fe = 1, l = 1 andS = 3, and set thequery widthW to 0.5, 0.6, 0.7, 0.8, and 0.9. Figure 13 showsthe success rate with respect to different CGF width. We can

0 10 20 30 40 50 60 70 80 90 1000

100

200

300

400

500

600

700

800

900

1000


En

erg

y co

nsu

mp

tio

n

l=0 s=1l=1 s=3l=2 s=5l=3 s=7

Fig. 12. Energy consumption:fq = 0.1, fe = 0.1

40 50 60 70 80 90 1000.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9


Su

cces

s ra

te

w=0.5w=0.6w=0.7w=0.8w=0.9

Fig. 13. Success rate for different query widths

see that the wider theW is, the higher the success rate. Figure14 shows corresponding the energy consumption; the wider thecost is higher. Clearly there is a trade-off between the successrate and the communication cost. Note that whenW goes to1.5, the combing is similar to flooding. This simulation resultsalso raise the issue of reliability of the query support routingmechanism. Due to the nature of wireless communication indistributed networks, the link between the nodes are not totallyreliable. For instance, there is certain probability that collisionsmay occur and data transmission is not successful. This ledus to investigate the impact of link failures on the routingstructure and the issue of query support reliability.

IV. RELIABILITY

As discussed in the last section, wireless links are unreliabledue to the nature of wireless communications. In addition,sensor nodes are deployed in harsh environment unattended,and thus prone to failures. In this section, we first analyzethe effect of node failures on the query and report, and then

40 50 60 70 80 90 100200

400

600

800

1000

1200

1400

1600


En

erg

y co

nsu

mp

tio

n

w=0.5w=0.6w=0.7w=0.8w=0.9

Fig. 14. Energy consumption for different query widths

study reliable query-and-report strategies, namely, local re-enhancement and spacial redundancy schemes.

A. Query Coverage

First, we analyze the effect node failures on the querycoverage. There are usually two types of failures: transientfailure and permanent failure. Because wireless channels areunreliable and time-varying [11], [12], one possible causeof atransient failure is the fading environment, which is modeledby Eq. (10) in our simulation in Section III. When the channelbetween the transmitter and the receiver is in deep fading orcollisions occur during the transmission, link-layer failuresoccur. Examples of permanent failure include the situationswhere nodes run out of energy or are physically damaged. Weassume that each node has a failure probabilityp, which isindependent of other nodes. (The independent assumption isdiscussed later.) Note thatp can be a function of time andmodel the time-evolving reliability of the network: as timeelapses, more nodes run out of battery and become unreliable.We look at a snapshot of the network, and classify nodes asfailure nodes and active nodes.

We consider the general case where any node can generatea query request. The question is how large an area can receivethe query given the node failure model. The route of a querypropagation in Figure 15. If a node fails, then the querypropagation stops. In the figure, we indicate two node failures.The dashed lines with arrows indicate the designed query paththat fails because of node failures. This is a simple querypropagation scheme.

Now we calculate the average coverage area of queries bynode (i, j). Let A1(j) be the number of horizontal lines aquery message initiated by a node at rowj reaches andA2(i)be the number of nodes that a query reaches in a horizontalline initiated by a node in columni. Note thatA1(j) andA2(i) are random variables and leta1(j) and a2(i) be theirexpectations respectively. LetN(i, j) be the average number

Fig. 15. Query coverage

of nodes that a query from(i, j) can reach.

N(i, j) = E

A1(j)∑

k=1

A2(i)

= E(A1(j))E(A2(i))

= a1(j)a2(i).

BecauseA1(j) andA2(i) are independent, the second equationholds by the Wald’s equation (p. 417, [13]). We have

a1(j)

= 1 +

b j

sc∑

k=1

[(1− p)s]k

+

bn−j−1

s c∑

k=1

[(1− p)s]k

=1− (1− p)s(b j

sc+1)

1− (1− p)s+

1− (1− p)s(bn−j−1

s c+1)

1− (1− p)s− 1,

and

a2(i) = 1 +

i∑

k=1

(1− p)k +

n−1−i∑

k=1

(1− p)k

=1− (1− p)i+1

p+

1− (1− p)n−i

p− 1,

wherebxc denotes the largest integer that does not exceedx.Note thata1((n−1)/2) ≤ a1(j) ≤ a1(0) anda2((n−1)/2) ≤a2(i) ≤ a2(0) for all j andi. In words, queries from nodes inthe middle of the grid can reach more nodes on average thannodes near the boundary. For convenience, we assume thatnis an odd number. Ifn is even, then indexesn/2 andn/2− 1have the same property.

Ignoring the boundary effect, the percentage of nodes cov-ered by the query of node(i, j) is given by

η(i, j) = a1(j) ∗ a2(i) ∗ s/n2.

Herea1(j)∗a2(i) is the average number of nodes that receivethe query in the comb. Each node reached on a horizontal linecan cover2l + 1 = s nodes, including itself. In Figure 16, weshow the coverage of nodes(i, j) where0 ≤ i, j ≤ n− 1.

010

2030

4050

60

0

10

20

30

40

50

600.55

0.6

0.65

0.7

0.75

ji

cove

rage

Fig. 16. Query Reachability in the Network

B. Local Enhancement

In the previous section, we study the coverage property ofthe query message in the existence of node failures. Whenthere exists a single route between the query node and thedata node, any failure in the route will result the failure ofthequery. As the number of hops increases, the failure probabilityincreases dramatically. We now discuss more reliable query-and-report schemes. The basic idea of providing reliability isto use redundancy. There are usually two types of schemes toenhance the reliable transmission of a message.

a) Link Enhancement:Reliable data delivery throughvulnerable sensor networks has been a research challenge. Inparticular, the authors of [14] propose the usage of interleavingpaths to increase robustness. Figure 17 shows an interleavingmesh, where paths in the mesh are implicit and interleaving.In this design, all nodes send data to all nodes at the nexthop. The performance of such interleaving mesh is shown tobe very robust against both independent link-layer packet lossand node failures. A necessary and sufficient condition forthe route to exist is that there exists at least one active nodein each hop. If each hop hask nodes, and each node has afailure probability ofp, then the hop fails with probabilitypk,wherepk << p. The interleaves mesh is even more effectiveagainst independent link failures. For example, give a packet-loss probability of 0.3, 95% of packets can travel more than200 hops if each hop has three nodes [14]. CFG used in thesimulation in Section III exploits this type of strategy in arandom network. We should note that the interleaved meshdoes require additional power consumption for communica-tion overhead and to keep more nodes awake, although thebroadcast nature of wireless communication can be exploited,

b) Reroute: Another approach to provide more reliablequery and report is to develop alternative routes. Considerthecase where node(i, j) is communicating with node(i, j +1).In the previous studies, we assume that when a node(i, j +1)fails, the query propagation ends there. However, after leaning

Fig. 17. Interleaved mesh

the failure of node(i, j + 1), node (i, j) may detour theroute as(i + 1, j) → (i + 1, j + 1) → (i + 1, j + 2) →(i, j + 2). Algorithms studied in the literature for developingalternative/backup routes can be applied here to enhance thetransmission reliability.

Both the link enhancement and reroute schemes have theiradvantages and disadvantages. Link enhancement is simpleand has low overhead. There is minimum requirement on theMAC/routing protocols. For example, no retransmission oracknowledgment is required. (Of course, retransmission andacknowledgment can improve the reliability at the cost ofdelay and energy consumption.) The functionality of each nodecan be simple listening and forwarding. On the other hand,the reroute scheme requires more routing functionality at eachnode and possibly a routing table be attached to the message.To develop alternative routes incurs additional overhead anddelay in exchanging information. As discussed earlier, thereare transient failures and permanent failures. For transientfailures, especially in small time-scale, simple redundancyscheme may preform better while reroute is preferred forpermanent failures. Another important factor is the querysource. If the queries are generated by a small number of fixednodes, it may worth the overhead to develop alternative routes,especially in the existence of permanent node failures. If thequeries are generated by a large number of nodes randomly,it may be more cost-effective to use the link enhancementschemes.

C. Spacial Diversity

We note that failures may not be independent and randomlyin a network as usually modeled. Node failure can be highlycorrelated. For example, sensor nodes in one area may havea much higher probability of failure than other areas dueto some unusual incidents, such as water flooding, overheat,etc. In addition, node failure can propagate. When a nodefails, it neighboring nodes have to share the sensing andcommunication burden of the failed node. They consume moreenergy and die quickly. Thus, node failures propagate in thenetwork. The previous schemes, such as interleaving mesh,are not designed specifically to address such correlated nodefailures in the network. Thus, we propose the use of spacialdiversity to improve the reliability and coverage of query.

In Figure 18, we show a construction of query paths withspacial diversities. Basically, an additional vertical query prop-agation path is constructed. In each horizontal line, between

Fig. 18. level-2 redundancy

the two vertical lines, the query message can be passed ineither direction. If a node receives the same query messagefrom both directions, it stops the message propagation. Thebasic idea here is to provide spatial diversity and redundancywhile local enhancement can be used to for each path. Ifone part of the network is seriously damaged beyond localenhancement, say one of the vertical lines is gone, nodesin other part of the network can still receive query messagepropagated from the other vertical line.

Next, we calculate the probability of a node that receivesthe query message. We assume that a node is the center ofa (large-scale) failure with probabilityp. Assume that thequery message is generated by node(x, y). Let p(i, j|(x, y), k)be the probability that a node at(i, j) receives the querymessage when there is an additional vertical line at(k, y),i = 0, · · · , n − 1 and j = 0, s, 2s, · · · . Without loss ofgenerality, we consider the case0 ≤ x < n/2 and k > x.We consider three different cases, wherei < x, x ≤ i < k,and k ≤ i. When i < x, the node is on the left side of bothvertical lines. We have

c1 = (1− p)|y−j|

c2 = (1− p)2(k−x)+|y−j|

p(i, j|(x, y), k) = (c1 + c2 − c1c2)(1− p)x−i.

Wheny ≤ j < k, i.e., the node is between two vertical lines,we have

c1 = (1− p)|y−j|+i−x

c2 = (1− p)k−x+|y−j|+k−i

p(i, j|(x, y), k) = c1 + c2 − c1c2.

When j ≥ k, the node is on the right side of both verticallines. We have

c1 = (1− p)|y−j|+k−x

p(i, j|(x, y), k) = (2c1 − c21)(1− p)i−k.

010

2030

4050

60

0

10

20

30

40

50

600.74

0.76

0.78

0.8

0.82

0.84

0.86

0.88

ji

cove

rage

Fig. 19. Coverage probability when an additional vertical line is placed.

The average query success probability is

p(x, y) = maxk

∑

i,j

p(i, j|(x, y), k).

For nodes on the right-hand side of the vertical lines,pincreases ask increases. For all other nodes,p increasesas k decreases. However, numerical results show that theaverage success probability is not sensitive to the value ofk. In Figure 19, we show the query success probability of allnodes when the optimal value ofk is used. We haven = 60,p = 0.01, ands = 3, the same as in Figure 16. Compare withFigure 16, we observe that the query success probability hasbeen improved significantly.

The query-report success probability can be further im-proved by extending the needle length. For instance, wecan extend the vertical data duplication in both directions(approximately) equally and let the needle length be2S.Thus, each report can reach two nodes that are on the querypropagation path. If one of them receives the query successful,the query-report will be successful. We call this scheme a2-level redundancy scheme. In the basic comb scheme, wehave one vertical query propagation path, one horizontal querypropagation path, and one vertical data duplication path fromthe query node to a data note. To improve the reliability,redundancy is introduced to all three of them. To be morespecific, we allow one of the vertical lines, one of thehorizontal lines, and one of the data duplication lines to fail.The failure probability can be approximated by

Pf (i, j|(x, y), k) ≈ a1 + a2 + a3 − a1a2 − a2a3 − a3a1

+a1a2a3

≈ a1 + a2 + a3,

wherea1, a2, and a3 approximate the error probabilities offailures of both vertical lines, both horizontal lines, andboth

0 10 20 30 40 50 600

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

i

Fai

lure

Pro

babi

lity

Approx2−levelBasic

Fig. 20. Compare the difference between the calculation and approximation.

data duplication lines, respectively. We have

a1 = (1− (1− p)|y−j|)(1− (1− p)|y−j|+k)

a2 = (1− (1− p)|x−i|)2

a3 = (1− (1− p)q)2.

Furthermore, we can provide high-level redundancy protectionby providing more protections for vertical query, horizontalquery, and vertical data duplications.

In Figure 20, we show the error probabilities for querygenerated by node(0, 0) with k = 1 for data nodes(i, i),1 ≤ i ≤ n − 1. We setp = 0.005, n = 60, and s = 3.Node (0, 0) is chosen as the query node because it has thehighest failure probability on average. It is not surprising thatnodes closer to the query node has lower failure probabilityand nodes far away has high failure probability. It showsthat the above approximation works well. In the figure, wealso compare the redundancy scheme with the basic scheme.The spacial redundancy scheme significantly decreases theerror probability especially for smaller values ofp. Localenhancement (such as interleaving mesh) can be used toimprovep.

In summary, in this section, we study the effect of nodefailures on the query-report process. Reliable transmissionsstudied in the literature can be used to enhance reliabilitylocally and spacial redundancy can be implemented to furtherimprove the query coverage of wireless sensor networks forcorrelated network failures.

V. A DAPTIVE COMB-NEEDLE STRATEGY

The basic comb-needle strategy assumes the knowledge ofthe query and event frequencies and analyzes the performancein a snap shot. In practice, the query and event frequencies maybe time-varying, and thus a good query strategy should adaptto such changes. However, the values of the query and eventfrequencies may not be given. More importantly, other sensornodes may not know the value of the inter-spike interval (s)used by a query node. This motivates the development of theadaptive comb scheme, as discussed in this section. Simulationresults show that our adaptive scheme can track the changes

well and keep a high success rate while maintaining a lowcommunication cost.

We consider a time-slotted system where a query node isonly interested in the event happened in the most recent timeslot, i.e., τ = 1. Extension to the case whereτ > 1 is notdifficult. Let fq be the probability that a query is generated ina time slot. Let

fd =fe

n2

be the probability that a sensor node detects an event in a timeslot. In this case, we can rewrite Eq. (8) by

soptimal ∼√

fq

fd

.

Thus, to calculate inter-spike interval and the needle length, anode only needs to estimatefd andfq.

We assume that the distribution offd and fq are knownto both query nodes and data nodes. If such information isnot available, a default distribution can be assumed. For thequery node, it has an estimate offd and fq. (Initially, thebest estimate is the mean values offd andfq.) The larger thenumber of queries the node performs, the better the estimateoffd andfq are. Based on the estimate, the query node calculates.

Data nodes take a different approach. LetL the be totalneedle length at a data node. In particular, if the data nodedoes not duplicate its data to any other nodes, thenL = 1. Inaddition, we allow asymmetric data duplications to fine tunedthe needle length. For example, if the data is duplicated toone node south and to two nodes north, thenL = 4. Analysisin Section II can be extended to such a case. Initially, eachdata node estimates its own data frequencyfd. Based on theestimation offd and the knowledge of the distribution offq,each data node calculatesL such that

P (L > s) ≥ Pm,

wherePm a criterion for the importance of the data.Even if the needle length of a node is smaller than the inter-

spike interval, its data duplication may still reach one queryline if its distance to the line is smaller than the needle lengthin that direction. To elaborate, the success probability isL/sgiven L < s, and the over query success probability is

Ps = P (L ≥ s) + E

(

L

s|L < s

)

P (L < s)

= E

(

min

(

L

s, 1

))

.

In general, the estimates offd and fq at a data node maybe poor and so does the value ofL calculated based on theestimates, specially whenfd is small and the node missessome queries. Thus, thes value obtained from previous queryis important and should be used to estimateL. A node learnsabout the value ofs in two cases. First, a node may obtaininformation when its data is successfully queried. Second,weenhance the possibility that a node learns about the query by

0 500 1000 1500 2000 2500 30000

0.2

0.4

0.6

0.8

1

f q

0 500 1000 1500 2000 2500 30000

0.01

0.02

0.03

0.04

0.05

0.06

f d

time

Fig. 21. The query frequency and the event frequency.

implementing a rotation of the query horizontal duplications.Suppose that the current horizontal queries occur on lines0, s, 2s, · · · , b(n − 1)/sc ∗ s, then the next query occur onlines 1, 1 + s, 1 + 2s, · · · , 1 + b(n − 1)/sc ∗ s, and so on. Anode may not generate an event that is successfully queried.However, when it is on a spike in the comb, it obtains thequery informations. A node uses its most recent informationon s as an important factor to calculate its needle length.

Note that the knowledge of the distribution offd andfq isnot critical. A more important issue is that the needle lengthof data nodes should synchronize with the inter-spike interval.Since the query node can learn the value offd andfq fasterand better, it is important for data nodes to obtain the valueof s from the query node, which is enhanced by the queryrotation strategy. Given the value of the most recently obtaineds, the needle length at each data node is adjusted according toimportance of the data and the desired query success rate. Therotation scheme does not trigger additional communicationcost, but does require memory spaces. If there are a very largenumber of query nodes, a node may not be able to maintainthe required information for all query noes.

We simulate the adaptive scheme for 3000 time slots in a20 × 20 regular grid. Figure 21 shows the query frequencyand the data frequency. We assume node(0, 0) is the querynode, which generates query with probabilityfq in each timeslot. Each other node generates events independently withprobability fd in a time slot. In the adaptive scheme, both thequery node and data nodes estimate the values offq and fd

using a moving average. The inter-spike interval and the needlelength are calculated based on the estimations. In addition, thequery rotation scheme is implemented in the simulation.

The adaptive scheme is compared with the ideal case, wherethe query node knowsfq andfd precisely, and all data nodesknow the value of currents. Such the ideal case consumes theleast amount of energy to guarantee 100% query successes.

In this simulation, the query-report success ratio of theadaptive scheme is99.33% and the total communication costis 99.64% of the ideal case. The query success ratio is theratio of the successful reports in the adaptive scheme overthat of the ideal one. The communication cost is defined as the

0 500 1000 1500 2000 2500 30000

5

10

15

Time

S

SimuIdeal

Fig. 22. The query frequency and the event frequency.

total cost to report events to the query node, which includesquery cost, data duplication cost, and data report cost. Wecount one hop of query/event as one unit of communicationcost. The estimated case consumes less energy at the cost oflower success ratio. In particular, there are cases where needlesare not long enough to reach the query line. Such events arenot reported back to the query node. There exists a tradeoffbetween the success probability and the communication cost.

Figure 22 compares the ideal comb size and the adaptivescheme wherefd and fq are estimated. We can see thatthe adaptive scheme can track the changes in the ideal case.The fluctuation is due to the estimation errors onfq and fd,especially onfq, at the query node.

Table I compares the comb scheme with that of push andpull in different time periods where the values offd and fq

differ. Each entry in the table represents a communicationcost normalized over that of the ideal comb. We can observethat the saving of the comb strategy varies asfd and fq

vary. The communication costs of pure-push and pure-pullstrategies are132% and 149% of that of the ideal case forthe total 3000 time-slot period on average. In all time periods,the adaptive scheme achieves over 99% success rate withnegligible additional communication cost compared to theideal case.

In the simulation, we fix the query node to be(0, 0).However, the exact strategy applies to the case where thequery node is moving in the network, such as a soldier ismoving within the deployed sensor network. The adaptivecomb strategy also applies to the case where different nodesgenerate query requests. In such case, similar queries should becombined. In particular, a possible query node should estimatethe value offq, which is the query frequency for all nodes,by observing the queries of its own and others. A query nodecan also use the current value ofs in the network to estimatethe inter-spike interval of its own query.

VI. RELATED WORK AND DISCUSSION

Sensor network applications often need support for periodicor sporadic monitoring of the environmental state. A simplestmethod for obtaining state from a large scale sensor network

is to flood the query for the named data to the whole networkand get responses from relevant sources. However, floodingin large-scale wireless networks is a very expensive operationand in general needsO(N) radio transmissions, whereN isthe number of nodes in the network.

Many strategies have recently been proposed to reduce thecost of discovery and query in large-scale wireless networks.Most relevant to this work includes three different types ofapproaches.

• The first approach is to improve flooding efficiencyby reducing the number of potentially redundant for-warding in the flooding process using neighborhoodtopological information[15], [16] or via probabilistic re-transmissions[17]. The goal of this approach can beviewed as to reduce the constant implicit inO(N). Incontrast, the comb-needle scheme achievesO(

√N) in

the best case by balancing push and pull.• The second approach in the literature is to reduce dis-

covery/query cost by taking into account applicationsemantics[18]. For instance, discovery queries like “isthere a tank in the field?” is potentially resolvable byonly traversing part of the network in many cases. Asa result, for this kind of queries, expanding-ring searchor ACQUIRE[9] have been demonstrated to be moreefficient than flooding. ACQUIRE considers the queryas an active entity that is forwarded through the networkeither randomly or in some directed manner in searchof a solution: nodes on the path that handle the activequery use information from all nodes within d-hops inorder to partially resolve the query. When the activequery is fully resolved, the query forwarding processstops and a complete response is sent back to the queryingnode. This type of scheme limits the scope of necessaryflooding by recognizing the specific nature of the query.Another outstanding example in this category is theInformation-Driven Query Routing (IDSQ) scheme[19].For instance, when tracking mobile entities, sensor nodescan keep a certain memory, so when a query like “whereare the tanks?” comes in, the underlying system canquickly route the query to the nodes with most currentinformation by following the traces of information exist inthe network. This again eliminates the needs for floodingin many cases. In a recent work, [18] investigated therelative benefit of push and pull diffusion for queries ofcontinuous type and arrived the conclusion of a need fora families of protocols catering to different applicationrequirements. This paper investigates and demonstratesthe benefit of taking into account both the application andthe environmental characteristics in designing efficientdata discovery and dissemination protocols.

• The third types of approach is to reduce the cost ofsearch by using efficient distributed indexing schemes.For instance, [7] described a distributed indexing schemecalled Semantic Routing Tree (SRT). SRT only supportsquery from a fixed node and constructs a route tree basedon historical data, so only applies well to slow changing

Time 1-500 501-1000 1001-1500 1501-2000 2001-2500 2501-3000Push 179% 131% 118% 169% 160% 114%Pull 156% 138% 157% 131% 139% 153%

TABLE I

COMPARE THE COMB STRATEGY(IDEAL CASE) WITH PURE-PULL AND PURE-PUSH STRATEGIES.

sensor readings. [20] proposed a distributed index formulti-dimensional data (DIM) which allows query to beissued from any node. DIM achieves an average eventinsertion cost ofO(

√N) and average query cost of

O(√

N) in most cases. [20] also mentioned DIM out-performs flooding as long as the ratio of the numberof insertions to the number of queries is less than

√N .

Earlier exploration in this direction includes GHT[21],DIMENSIONS[22] etc. The push component in ourcomb-needle scheme is similar in spirit with DIM in thesense that the data generated at a node are hashed/pushedto different locations. However, our scheme is muchsimpler and achieves similar cost savings. Furthermore,we investigate how the query support structure performwhen links are unreliable, an important issue which islargely neglected in earlier works.

While in the paper we only discussed supporting discoveryand query in a whole network, the use of comb-needle modelis not limited to that. The idea is also applicable for supportinggeographic queries like “how many tanks are there in regionx?”. For supporting such geographic queries, one traditionalstrategy is to geocast the query to the region, i.e., unicastthequery to the region, and then broadcast the query in the region.Similarly, we can choose to unicast the query to the region,then “comb” the region with desired granularity. Note also thatthe push component not only helps to reduce the query cost inthe comb-needle scheme, but also comes with the benefit ofdata redundancy which is desirable in an unreliable network.

Furthermore, as we shown in section II, given the values offe andfq, the optimal values of the comb widths is calculatedin Eq. 8. Note that1 ≤ s ≤ n. If fq ≤ (n−1)2fe, thens = 1,which is equivalent to the pure pull strategy that flooding thequery message in the network. On the other hand, iffq = fe,then the optimal strategy is the one shown in Figure 5. Iffq > fe, then a reversed combing strategy is desired asshown in Figure 23. An extreme of the reverse combing isthe pure pull-based strategy where each data node broadcastsits event to all other nodes. This highlights the principle ofthe comb-needle structure: adjust the communication strategybased on the frequency of the query and events. In particular,the lower the (relative) frequency of the query/event, the largerthe number of nodes it propagates to. Combined with thereverse comb, the comb strategy covers the whole spectrumof the push and pull strategies.

VII. C ONCLUSION

In this paper we proposed the comb-needle model, a simpleyet efficient data discovery scheme for supporting queries in

event

query

Fig. 23. Reverse comb

large-scale sensor networks. We also used the comb-needlemodel as a substrate for study the benefit of balancing pushand pull in data gathering and dissemination in large-scalewireless networks. The comb-needle scheme (including thereverse one) covers a spectrum of the push and pull strategies,with the pure push-based and pure pull-based schemes in twoextremes. We have demonstrated that in general, the comb-needle scheme performs better than both pure push-based andpure pull-based schemes. Furthermore, we raises and studiedthe issue of query reliability and query coverage when theunderlying communication channel is not reliable. Finally, weproposed and studied an adaptive comb-needle strategy forcases where the utilization patterns and environmental activityfrequency are time-varying. The contribution is highlightednot only by the theoretical analysis of the problems studied,but also by the confirmation of the theoretical observationsviasimulations. We note that the proposed scheme can be used inthe hierarchical structure as well. Further, data aggregationand compression can be integrated into the comb-needlestrategy to further reduce the communication cost. These areall interesting future work to explore. We also look forwardto further validating this work in actual large-scale sensornetwork test-beds in the future and exploring new issues thatmight arise in the real world.

REFERENCES

[1] D. Estrin, R. Govindan, J. S. Heidemann, and S. Kumar, “Nextcenturychallenges: Scalable coordination in sensor networks,” inProceedingsof the Fifth Annual ACM/IEEE International Conference on MobileComputing and Networking, 1999, pp. 263–270. [Online]. Available:citeseer.ist.psu.edu/estrin99next.html

[2] G. J. Pottie and W. J. Kaiser, “Wireless integrated network sensors,”Commun. ACM, vol. 43, no. 5, pp. 51–58, 2000.

[3] A. Mainwaring, J. Polastre, R. Szewczyk, D. Culler, and J. Anderson,“Wireless sensor networks for habitat monitoring,” inProceedings ofthe ACM International Workshop on Wireless Sensor NetworksandApplications., Atlanta, GA, September 2002.

[4] J. Liu, D. Petrovic, and F. Zhao, “Multi-step information-directedsensor querying in distributed sensor networks,” inProceedings of theInternational Conference in Acoustics, Speech and Signal Processing(ICASSP), 2003.

[5] D. Li, K. Wong, Y. Hu, and A. Sayeed, “Detection, classificationand tracking of targets in distributed sensor networks,”IEEE SignalProcessing Magazine, vol. 19, no. 2, March 2002.

[6] B. Blum, P. Nagaraddi, A. Wood, T. Abdelzaher, S. Son, andJ. Stankovic, “An entity maintenance and connection servicefor sensornetworks,” The First International Conference on Mobile Systems,Applications, and Services (MobiSys), San Francisco, CA, May 2003.

[7] S. Madden, M. J. Franklin, J. M. Hellerstein, and W. Hong,“The designof an acquisitional query processor for sensor networks,” in Proceedingsof the 2003 ACM SIGMOD international conference on on Managementof data. ACM Press, 2003, pp. 491–502.

[8] Q. Huang, C. Lu, and G.-C. Roman, “Spatiotemporal multicastin sensornetworks,” inProceedings of the First ACM Conference on EmbeddedNetworks Sensor Systems (SenSys), Los Angelos, California, Nov. 2003.

[9] N. Sadagopan, B. Krishnamachari, and A. Helmy, “Active query for-warding in sensor networks,”Elsevier Journal of Ad Hoc Networks,2003.

[10] G. Simon, “Probabilistic wireless network simulator,” 2003,http://www.isis.vanderbilt.edu/projects/nest/prowler/.

[11] J. Zhao and R. Govindan, “Understanding packet delivery performancein dense wireless sensor networks,” inACM Sensys 2003, 2003.

[12] A. Woo, T. Tong, and D. Culler, “Taming the underlying challengesof reliable multihop routing in sensor networks,” inACM Sensys 2003,2003.

[13] S. M. Ross,Introduction to probability models, 7th ed. HarcourtAcademic Press, 2000.

[14] F. Ye, G. Zhong, S. Lu, and L. Zhang, “Gradient broadcast: A robustdata delivery protocol for large scale sensor networks,”ACM WirelessNetworks (WINET), vol. 11, no. 2, March 2005.

[15] W. Peng and X. Lu, “On the reduction of broadcast redundancy in mobilead hoc networks,” inProceedings of the ACM Symposium on Mobile AdHoc Networking and Computing (MOBIHOC), 2000.

[16] A. Qayyum, L. Viennot, and A. Laouiti, “Multipoint relaying: Anefficient technique for flooding in mobile wireless networks,” INRIA,Tech. Rep. Research Report RR-3898, Feb. 2000. [Online]. Available:citeseer.nj.nec.com/qayyum00multipoint.html

[17] S.-Y. Ni, Y.-C. Tseng, Y.-S. Chen, and J.-P. Sheu, “The broadcast stormproblem in a mobile ad hoc network,” inProceedings of the FifthAnnual ACM/IEEE International Conference on Mobile Computing andNetworking, August 1999, pp. 152–162.

[18] J. Heidemann, F. Silva, and D. Estrin, “Matching data disseminationalgorithms to application requirements,” inProceedings of the firstinternational conference on Embedded networked sensor systems. ACMPress, 2003, pp. 218–229.

[19] M. Chu, H. Haussecker, and F. Zhao, “Scalable information-drivensensor querying and routing for ad hoc heterogeneous sensornetworks,”International Journal of High Performance Computing Applications,vol. 16, no. 3, 2002.

[20] X. Li, Y. J. Kim, R. Govindan, and W. Hong, “Multi-dimensional rangequeries in sensor networks,” inProceedings of the first internationalconference on Embedded networked sensor systems. ACM Press, 2003,pp. 63–75.

[21] S. Ratnasamy, B. Karp, S. Shenker, D. Estrin, R. Govindan, L. Yin, andF. Yu, “Data-centric storage in sensornets with ght, a geographic hashtable,” Mob. Netw. Appl., vol. 8, no. 4, pp. 427–442, 2003.

[22] D. Ganesan, D. Estrin, and J. Heidemann, “Dimensions: Why dowe need a new data handling architecture for sensor networks?” inProceedings of the First Workshop on Hot Topics In Networks (HotNets-I), Princeton, NJ, October 2002.

Documents

Combs, Needles, Haystacks: Balancing Push and Pull for …graphics.stanford.edu/~jgao/qingfeng_comb.pdf · Combs, Needles, Haystacks: Balancing Push and Pull for Discovery in Large-Scale