Venkatesh Saligrama Boston UniversityBoston Universitypeople.ee.duke.edu/~lcarin/saligrama.pdff T t O t t Y X i=K of T est O utput Y . – Here: Another Collection can be overlapping

Noisy Group Testing and Boolean Compressed Sensing

Venkatesh SaligramaBoston UniversityBoston University

Collaborator: George Atia

OutlineOutlineExamples …

Problem Setup– Noiseless Problem

A erage error• Average error, • Worst case error• Approximate reconstruction

– Noisy Problem• Additive Noise (False Alarms)• Dilution Effect (Misses)

Background MaterialBackground Material

Information-Theoretic Analysis: Achievability & Converse– Tradeoffs between #tests, #defectives, Noise, ,

What is Group testing?What is Group testing?Few soldiers in a large population have disease g p p– Detect by testing pooled blood samples

• Compressed Sensing in 1940s!!

Applications

DNA Screening (clones) [Du,Hwang’00]

Streaming [Gilbert-Strauss 08]– Telephone Calls

Compressed Sensing [Muthukrishnan 05]

Congested IP Link [Nguyen-Thiran07]

S t Vi l ti i C iti R di [Ati S S 08]Spectrum Violation in Cognitive Radio [Atia-S-S 08]MAC layer model of packet drops

XXXX XX XXXXXXXX X XXX XX XXXXTimeMedium few large few medium……

The group testing problemThe group testing problemGiven– N items: R1, R2, R3, …, RN ∈ {0,1}– K defectives: Rm = 1

• j ∈ S ⊂ {1 2 N}

A

j ∈ S ⊂ {1, 2, … N}

Test Matrix: C = [Xmn],

B

C D

– Xmn = 1 Put nth item in pool(test) m

Output (Ym) – mth test Outcome: +ve or –vep ( m)– Channel(noiseless)

Y = C RY = C R

Illustrative ExampleIllustrative ExampleY 0 0 1 0 1 1 0 0 0 0 1 0 1 0 0

1 0 0 0 0 1 1 0 1 0 1 0 0 0 0

TestsItems 1 2 3 … T

1

Defectives

1 0 0 0 0 1 1 0 1 0 1 0 0 0 0

0 0 1 0 1 1 0 0 0 0 0 0 1 0 0

0 0 1 0 1 1 0 0 0 0 1 0 1 0 0

1

2

N

Non-Adaptive vs Adaptive

1 1 0 1 0 0 0 1 0 1 0 1 0 0 0 N

Non Adaptive vs AdaptiveAlgorithm: Greedy (Matching algorithm)

Noiseless ProblemNoiseless ProblemGiven, Sample size N, and Defective size K, p

– Find a Test Matrix • Small Misclassification Error• #Tests = T minimum

Misclassification– Defectives indexed by

– Decoder Indicator function

– Pointwise Error:

Indicator function

– other errors

Problem Statements (Noiseless)Problem Statements (Noiseless)

Our Focus: Does a design matrix exist? g

– Average Error is small

– Worst-case error is small

– Asymptotic Approaches Zero with large K and N. y p pp g

– Distortion:

Distance function

Noisy Case 1Noisy Case 1

Additive noiseAdditive noise

W Bernoulli(q) m=1 2 TWm Bernoulli(q), m=1, 2, …T

Motivation:

•False alarms of tests•Background losses (Wireless)Background losses (Wireless)

Noisy case 2: Dilution effectNoisy case 2: Dilution effect0 0Situation when item is diluted in the pool

1 1

1-s

s

Situation when item is diluted in the pool

1 1

•Dilution effects in blood tests or DNA screeningP b bili ti d i l t i i (A il ’08)•Probabilistic adversarial transmission (Asilomar’08)

•Link Losses [Nguyen-Thiran07]

Allowable transmission pattern

Dilution

0 1 1 0 1 0 0 0 1 0 …...

1-s s s 1-s

Actual transmission pattern 0 0 1 0 1 0 0 0 0 0 …...

ProblemProblem

Does there exist a matrix C Does there exist a matrix C

d

– Small misclassification error

Averaged over noise

Small misclassification error– worst-case, average-case, distortion,

asymptoticsy p

Prior Work: Noiseless CasesPrior Work: Noiseless CasesAdaptive and non adaptive group testing (Du, p p g p g ( ,Hwang’2000)

Non Adaptive• Superimposed codes (Kautz and Singleton’64)• Deterministic designs (Dyachkov and Rykov’83) g y y

(Ruszinko’94)(Erdos’85)(Ngyuen’88)(Porat’08)• Random Designs (Dyachkov’76,’82)(Sebo’85)(Macula’96)• Compressed sensing and approximate identification(Gilbert’08)• Two-Stage Disjuntive Testing: (Berger-Levenshtein 2002)

Our Approach/ContributionOur Approach/ContributionRandom Coding perspectiveg p p

– Information theoretic relationship• Misclassification vs Existence of Random Matrix• Misclassification vs. Existence of Random Matrix

– Mutual Information Formula for different problemsM i i b d• Meets existing bounds

– Extensions to new problems Noisy Case

Main Result Main Result

Theorem (average error 0 asymptotically) if:( g y p y) f

sufficiency

Necessity

Xmn generated i.i.d. Bernoulli pmn g pX(K) corresponds to collection associated with K defective itemsX(i) subset of i defective items in K (that are mis-classified)Necessity: FANO BoundNecessity: FANO Bound

Avg Error InterpretationAvg Error Interpretation#ways i of the K items are mis‐classified

Amount of Info if K‐i items revealed

0Noiseless Case Computation

(1‐p)k‐1 H(p)

Prob(X(K‐1) = 0)

(1 p) H(p)

Noiseless- average Pe

X(K‐1)X(1)

i=1channel

Typical Set Decoding

X(K‐2)

(1)Ychannel

i=2X

1 error typical, 2 errors typical ….

M D ff lchannelX(2)Y

Main Difficulty:– Channel Coding:

Another Codeword is Typical is Independent f T t O t t Y

i=KX

of Test Output Y.

– Here: Another Collection can be overlapping with i=K

channel YX(K) overlapping with True Collection

Errors: •True collection not typical•Another collection typical

P(Ei): Probability that a set which differs from true coalition in exactly i users is jointly typical

Summary of ResultsSummary of ResultsN items; K defectives; T Pools/Tests

Noiseless- average Pe

Noiseless- Max Pe (exact reconstruction)Compare with CS

With distortion (approximate reconstruction)

Additive Noise (False alarms)

Dilution Effects

ConclusionsConclusionsRandom Coding Analysis of Group Random Coding Analysis of Group Testing

Mutual Info Expression

Easy to Compute

Extensions to Noisy Cases

Identity through interference fingerprints

MAC layer model of packet drops

XXXX XX XXXXXXXX X XXX XX XXXXXXXX XX XXXXXXXX X XXX XX XXXXTime

Primary observation y(t)Violation? identify culprits

Medium few large few medium……

of packet drops Violation? identify culprits

Discrimination step Identification step

Our Data is: Collision/No collision in the different time slots

Ati S h i d S li D ’08

1 0 0 1 1 0 0 1 1 1 0 0 0 1 0…

19

Atia, Sahai and Saligrama, Dyspan’08Atia, Saligrama and Sahai, Asilomar ‘08

TheoremFor Constant Time till conviction Tc and sparse number of culprits (K) we can support as many users N with throughput of order p*N (i e fixed utilization) order p N (i.e. fixed utilization)

20

Documents

Venkatesh Saligrama Boston UniversityBoston Universitypeople.ee.duke.edu/~lcarin/saligrama.pdff T t O t t Y X i=K of T est O utput Y . – Here: Another Collection can be overlapping