20
Noisy Group Testing and Boolean Compressed Sensing Venkatesh Saligrama Boston University Boston University Collaborator: George Atia

Venkatesh Saligrama Boston UniversityBoston Universitypeople.ee.duke.edu/~lcarin/saligrama.pdff T t O t t Y X i=K of T est O utput Y . – Here: Another Collection can be overlapping

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Venkatesh Saligrama Boston UniversityBoston Universitypeople.ee.duke.edu/~lcarin/saligrama.pdff T t O t t Y X i=K of T est O utput Y . – Here: Another Collection can be overlapping

Noisy Group Testing and Boolean Compressed Sensing

Venkatesh SaligramaBoston UniversityBoston University

Collaborator: George Atia

Page 2: Venkatesh Saligrama Boston UniversityBoston Universitypeople.ee.duke.edu/~lcarin/saligrama.pdff T t O t t Y X i=K of T est O utput Y . – Here: Another Collection can be overlapping

OutlineOutlineExamples …

Problem Setup– Noiseless Problem

A erage error• Average error, • Worst case error• Approximate reconstruction

– Noisy Problem• Additive Noise (False Alarms)• Dilution Effect (Misses)

Background MaterialBackground Material

Information-Theoretic Analysis: Achievability & Converse– Tradeoffs between #tests, #defectives, Noise, ,

Page 3: Venkatesh Saligrama Boston UniversityBoston Universitypeople.ee.duke.edu/~lcarin/saligrama.pdff T t O t t Y X i=K of T est O utput Y . – Here: Another Collection can be overlapping

What is Group testing?What is Group testing?Few soldiers in a large population have disease g p p– Detect by testing pooled blood samples

• Compressed Sensing in 1940s!!

Page 4: Venkatesh Saligrama Boston UniversityBoston Universitypeople.ee.duke.edu/~lcarin/saligrama.pdff T t O t t Y X i=K of T est O utput Y . – Here: Another Collection can be overlapping

Applications

DNA Screening (clones) [Du,Hwang’00]

Streaming [Gilbert-Strauss 08]– Telephone Calls

Compressed Sensing [Muthukrishnan 05]

Congested IP Link [Nguyen-Thiran07]

S t Vi l ti i C iti R di [Ati S S 08]Spectrum Violation in Cognitive Radio [Atia-S-S 08]MAC layer model of packet drops

XXXX          XX            XXXXXXXX        X      XXX     XX     XXXXTimeMedium        few  large few  medium……

Page 5: Venkatesh Saligrama Boston UniversityBoston Universitypeople.ee.duke.edu/~lcarin/saligrama.pdff T t O t t Y X i=K of T est O utput Y . – Here: Another Collection can be overlapping

The group testing problemThe group testing problemGiven– N items: R1, R2, R3, …, RN ∈ {0,1}– K defectives: Rm = 1

• j ∈ S ⊂ {1 2 N}

A

j ∈ S ⊂ {1, 2, … N}

Test Matrix: C = [Xmn],

B

C D

– Xmn = 1 Put nth item in pool(test) m

Output (Ym) – mth test Outcome: +ve or –vep ( m)– Channel(noiseless)

Y = C RY = C R

Page 6: Venkatesh Saligrama Boston UniversityBoston Universitypeople.ee.duke.edu/~lcarin/saligrama.pdff T t O t t Y X i=K of T est O utput Y . – Here: Another Collection can be overlapping

Illustrative ExampleIllustrative ExampleY 0 0  1  0  1  1  0  0  0  0  1  0  1  0  0 

1 0 0 0 0 1 1 0 1 0 1 0 0 0 0

TestsItems 1  2  3 … T 

1

Defectives

1  0  0  0  0  1  1  0  1  0  1  0  0  0  0 

0 0  1  0  1  1  0  0  0  0  0  0  1  0  0 

0 0  1  0  1  1  0  0  0  0  1  0  1  0  0 

1

2

N

Non-Adaptive vs Adaptive

1  1  0  1  0  0  0  1  0  1  0  1  0  0  0 N

Non Adaptive vs AdaptiveAlgorithm: Greedy (Matching algorithm)

Page 7: Venkatesh Saligrama Boston UniversityBoston Universitypeople.ee.duke.edu/~lcarin/saligrama.pdff T t O t t Y X i=K of T est O utput Y . – Here: Another Collection can be overlapping

Noiseless ProblemNoiseless ProblemGiven, Sample size N, and Defective size K, p

– Find a Test Matrix • Small Misclassification Error• #Tests = T minimum

Misclassification– Defectives indexed by

– Decoder Indicator function

– Pointwise Error:

Indicator function

– other errors

Page 8: Venkatesh Saligrama Boston UniversityBoston Universitypeople.ee.duke.edu/~lcarin/saligrama.pdff T t O t t Y X i=K of T est O utput Y . – Here: Another Collection can be overlapping

Problem Statements (Noiseless)Problem Statements (Noiseless)

Our Focus: Does a design matrix exist? g

– Average Error is small

– Worst-case error is small

– Asymptotic Approaches Zero with large K and N. y p pp g

– Distortion:

Distance function

Page 9: Venkatesh Saligrama Boston UniversityBoston Universitypeople.ee.duke.edu/~lcarin/saligrama.pdff T t O t t Y X i=K of T est O utput Y . – Here: Another Collection can be overlapping

Noisy Case 1Noisy Case 1

Additive noiseAdditive noise

W Bernoulli(q) m=1 2 TWm Bernoulli(q), m=1, 2, …T

Motivation:

•False alarms of tests•Background losses (Wireless)Background losses (Wireless)

Page 10: Venkatesh Saligrama Boston UniversityBoston Universitypeople.ee.duke.edu/~lcarin/saligrama.pdff T t O t t Y X i=K of T est O utput Y . – Here: Another Collection can be overlapping

Noisy case 2: Dilution effectNoisy case 2: Dilution effect0 0Situation when item is diluted in the pool

1 1

1-s

s

Situation when item is diluted in the pool

1 1

•Dilution effects in blood tests or DNA screeningP b bili ti d i l t i i (A il ’08)•Probabilistic adversarial transmission (Asilomar’08)

•Link Losses [Nguyen-Thiran07]

Allowable transmission pattern

Dilution

0 1 1 0 1 0 0 0 1 0 …...

1-s s s 1-s

Actual transmission pattern 0 0 1 0 1 0 0 0 0 0 …...

Page 11: Venkatesh Saligrama Boston UniversityBoston Universitypeople.ee.duke.edu/~lcarin/saligrama.pdff T t O t t Y X i=K of T est O utput Y . – Here: Another Collection can be overlapping

ProblemProblem

Does there exist a matrix C Does there exist a matrix C

d

– Small misclassification error

Averaged over noise

Small misclassification error– worst-case, average-case, distortion,

asymptoticsy p

Page 12: Venkatesh Saligrama Boston UniversityBoston Universitypeople.ee.duke.edu/~lcarin/saligrama.pdff T t O t t Y X i=K of T est O utput Y . – Here: Another Collection can be overlapping

Prior Work: Noiseless CasesPrior Work: Noiseless CasesAdaptive and non adaptive group testing (Du, p p g p g ( ,Hwang’2000)

Non Adaptive• Superimposed codes (Kautz and Singleton’64)• Deterministic designs (Dyachkov and Rykov’83) g y y

(Ruszinko’94)(Erdos’85)(Ngyuen’88)(Porat’08)• Random Designs (Dyachkov’76,’82)(Sebo’85)(Macula’96)• Compressed sensing and approximate identification(Gilbert’08)• Two-Stage Disjuntive Testing: (Berger-Levenshtein 2002)

Page 13: Venkatesh Saligrama Boston UniversityBoston Universitypeople.ee.duke.edu/~lcarin/saligrama.pdff T t O t t Y X i=K of T est O utput Y . – Here: Another Collection can be overlapping

Our Approach/ContributionOur Approach/ContributionRandom Coding perspectiveg p p

– Information theoretic relationship• Misclassification vs Existence of Random Matrix• Misclassification vs. Existence of Random Matrix

– Mutual Information Formula for different problemsM i i b d• Meets existing bounds

– Extensions to new problems Noisy Case

Page 14: Venkatesh Saligrama Boston UniversityBoston Universitypeople.ee.duke.edu/~lcarin/saligrama.pdff T t O t t Y X i=K of T est O utput Y . – Here: Another Collection can be overlapping

Main Result Main Result

Theorem (average error 0 asymptotically) if:( g y p y) f

sufficiency

Necessity

Xmn generated i.i.d. Bernoulli pmn g pX(K) corresponds to collection associated with K defective itemsX(i) subset of i defective items in K (that are mis-classified)Necessity: FANO BoundNecessity: FANO Bound

Page 15: Venkatesh Saligrama Boston UniversityBoston Universitypeople.ee.duke.edu/~lcarin/saligrama.pdff T t O t t Y X i=K of T est O utput Y . – Here: Another Collection can be overlapping

Avg Error InterpretationAvg Error Interpretation#ways  i of the K items are mis‐classified 

Amount of Info if K‐i items revealed

0Noiseless Case Computation

(1‐p)k‐1 H(p)

Prob(X(K‐1) = 0)

(1 p) H(p)

Noiseless- average Pe

Page 16: Venkatesh Saligrama Boston UniversityBoston Universitypeople.ee.duke.edu/~lcarin/saligrama.pdff T t O t t Y X i=K of T est O utput Y . – Here: Another Collection can be overlapping

X(K‐1)X(1)

i=1channel

Typical Set Decoding

X(K‐2)

(1)Ychannel

i=2X

1 error typical, 2 errors typical ….

M D ff lchannelX(2)Y

Main Difficulty:– Channel Coding:

Another Codeword is Typical is Independent f T t O t t Y

i=KX

of Test Output Y.

– Here: Another Collection can be overlapping with i=K

channel YX(K) overlapping with True Collection

Errors: •True collection not typical•Another collection typical

P(Ei): Probability that a set which differs from true coalition in exactly i users is jointly typical

Page 17: Venkatesh Saligrama Boston UniversityBoston Universitypeople.ee.duke.edu/~lcarin/saligrama.pdff T t O t t Y X i=K of T est O utput Y . – Here: Another Collection can be overlapping

Summary of ResultsSummary of ResultsN items; K defectives; T Pools/Tests

Noiseless- average Pe

Noiseless- Max Pe (exact reconstruction)Compare with CS

With distortion (approximate reconstruction)

Additive Noise (False alarms)

Dilution Effects

Page 18: Venkatesh Saligrama Boston UniversityBoston Universitypeople.ee.duke.edu/~lcarin/saligrama.pdff T t O t t Y X i=K of T est O utput Y . – Here: Another Collection can be overlapping

ConclusionsConclusionsRandom Coding Analysis of Group Random Coding Analysis of Group Testing

Mutual Info Expression

Easy to Compute

Extensions to Noisy Cases

Page 19: Venkatesh Saligrama Boston UniversityBoston Universitypeople.ee.duke.edu/~lcarin/saligrama.pdff T t O t t Y X i=K of T est O utput Y . – Here: Another Collection can be overlapping

Identity through interference fingerprints

MAC layer model of packet drops

XXXX XX XXXXXXXX X XXX XX XXXXXXXX          XX            XXXXXXXX        X      XXX     XX     XXXXTime

Primary observation y(t)Violation? identify culprits

Medium        few  large few  medium……

of packet drops Violation? identify culprits

Discrimination step Identification step

Our Data is: Collision/No collision in the different time slots

Ati S h i d S li D ’08

1 0 0 1 1 0 0 1 1 1 0 0 0 1 0… 

19

Atia, Sahai and Saligrama, Dyspan’08Atia, Saligrama and Sahai, Asilomar ‘08

Page 20: Venkatesh Saligrama Boston UniversityBoston Universitypeople.ee.duke.edu/~lcarin/saligrama.pdff T t O t t Y X i=K of T est O utput Y . – Here: Another Collection can be overlapping

TheoremFor Constant Time till conviction Tc and sparse number of culprits (K) we can support as many users N with throughput of order p*N (i e fixed utilization) order p N (i.e. fixed utilization)

20