Complexity Theory
Lecture 12
Lecturer: Moni Naor
Recap
Last week:
• Hardness and Randomness
• Semi-random sources
• Extractors
This week:
• Finish Hardness and Randomness
• Circuit Complexity
  – The class NC
  – Formulas = NC1
  – Lower bound for Andreev's function
  – Communication characterization of depth

Derandomization
A major research question:
• How to make the construction of
  – a small sample space `resembling' the large one
  – hitting sets
  efficient?
Successful approach: randomness from hardness
– (Cryptographic) pseudo-random generators
– Complexity-oriented pseudo-random generators
Extending the result
Theorem: if E contains 2^Ω(n)-unapproximable functions, then BPP = P.
• The assumption is an average-case one
• Based on non-uniformity
Improvement:
Theorem: If E contains functions that require size 2^Ω(n) circuits (for the worst case), then E contains 2^Ω(n)-unapproximable functions.
Corollary: If E requires exponential-size circuits, then BPP = P.
How to extend the result
• Recall the worst-case to average-case reduction for the Permanent
• The idea: encode the function in a form that allows translating a few worst-case errors into random errors
Properties of a code
Want a code C: {0,1}^{2^n} → {0,1}^{2^ℓ}
where:
• 2^ℓ is polynomial in 2^n
• C is polynomial-time computable
  – efficient encoding
  – certain local decoding properties
Codes and Hardness
• Use for worst-case to average-case hardness:
  truth table of f: {0,1}^n → {0,1} (worst-case hard)
  ↦ truth table of f′: {0,1}^ℓ → {0,1} (average-case hard)
[Figure: m_f, the truth table of f written as a bit string, and its encoding C(m_f).]
Codes and Hardness
• If 2^ℓ is polynomial in 2^n, then f ∈ E implies f′ ∈ E
• Want to be able to prove: if f′ is s′-approximable, then f is computable by a size s = poly(s′) circuit
Codes and Hardness
Key point: a circuit C that approximates f′ implicitly defines a received word R_C that is not far from C(m_f)
• Want the decoding procedure D to compute f exactly
[Figure: the received word R_C compared with the codeword C(m_f); D decodes R_C. Requires a special notion of efficient decoding.]
Decoding requirements
• Want that
  – for any received word R that is not far from C(m), and
  – for any input bit 1 ≤ i ≤ 2^n,
  we can reconstruct m(i) with probability 2/3 by accessing only poly(n) locations in R
Example of a code with good local decoding properties: the Hadamard code (a sketch of such a decoder follows). But it has exponential length.
This gives a probabilistic circuit for f of size poly(n) · size(C) + the size of the decoding circuit.
Since probabilistic circuits have deterministic versions of similar size - contradiction.
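As an illustration (not from the original slides), here is a minimal sketch of the standard local decoder for the Hadamard code, whose codeword positions are indexed by a ∈ {0,1}^n and satisfy C(m)[a] = <m, a> mod 2; the received word R is assumed to be given as a callable.

```python
import random

def hadamard_local_decode(R, n, i, repetitions=9):
    """Recover message bit m_i with good probability when R agrees with the
    codeword C(m) on most positions.  Each vote makes only two queries to R."""
    votes = 0
    for _ in range(repetitions):
        a = [random.randint(0, 1) for _ in range(n)]   # uniformly random position
        b = a[:]
        b[i] ^= 1                                      # flip coordinate i
        # <m, a> xor <m, a + e_i> = m_i, so XOR-ing the two answers votes for m_i
        votes += R(tuple(a)) ^ R(tuple(b))
    return 1 if 2 * votes > repetitions else 0
```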
Extractor
• Extractor: a universal procedure for “purifying” imperfect source:
– The function Ext(x, y) should be efficiently computable
– truly random seed as "catalyst"
– Parameters: (n, k, m, t, ε)
[Diagram: Ext maps an n-bit source string (drawn from a set of 2^k strings in {0,1}^n) plus a truly random t-bit seed to an m-bit near-uniform output.]
Extractor: Definition
(k, ε)-extractor: for all random variables X with min-entropy k:
– output fools all tests T:
  |Pr_{z∼U_m}[T(z) = 1] − Pr_{y∈_R{0,1}^t, x∼X}[T(Ext(x, y)) = 1]| ≤ ε
– the distributions Ext(X, U_t) and U_m are ε-close (L1 distance ≤ 2ε)
(U_m is the uniform distribution on {0,1}^m)
• Comparison to pseudo-random generators:
  – the output of a PRG fools all efficient tests
  – the output of an extractor fools all tests
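For reference (the slides use the notion without restating it), min-entropy is defined by:

$$H_\infty(X) = \min_x \log_2 \frac{1}{\Pr[X = x]}, \qquad H_\infty(X) \ge k \iff \Pr[X = x] \le 2^{-k} \ \text{for every } x.$$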
Extractors: Applications
• Using extractors
  – use the output in place of randomness in any application
  – this alters the probability of any outcome by at most ε
• Main motivation:
  – use the output in place of randomness in an algorithm
  – but how to get the truly random seed?
  – enumerate all seeds and take the majority
Extractor as a Graph
[Diagram: bipartite graph of the extractor. Left side: {0,1}^n, with a highlighted subset of size 2^k; right side: {0,1}^m; each left vertex has 2^t neighbors, one per seed.]
Want every subset of size 2^k to see almost all of the rhs with (roughly) equal probability.
Extractors: desired parameters
• Goals:            good:          optimal:
  short seed        O(log n)       log n + O(1)
  long output       m = k^Ω(1)     m = k + t − O(1)
  many k's          k = n^Ω(1)     any k = k(n)
A short seed of O(log n) bits allows going over all seeds.
Extractors
• A random construction for Ext achieves optimal parameters!
  – but we need explicit constructions
    • otherwise we cannot derandomize BPP
  – an optimal explicit construction of extractors is still open
• Trevisan Extractor:
  – idea: any string defines a function
    • a string C of length ℓ defines a function f_C: {1,…,ℓ} → {0,1} by f_C(i) = C[i]
  – use the NW generator with the source string in place of the hard function
From complexity to combinatorics!
Trevisan Extractor
• Tools:
  – An error-correcting code C: {0,1}^n → {0,1}^ℓ
    • distance between codewords: (½ − ¼m^{−4})ℓ
    – important: in any ball of radius ½ − δ there are at most 1/δ^2 codewords, where δ = ½m^{−2}
    • blocklength ℓ = poly(n)
    • polynomial-time encoding
      – decoding time does not matter
  – An (a, h)-design S_1, S_2, …, S_m ⊆ {1,…,t} where
    • h = log ℓ
    • a = (δ log n)/3
    • t = O(log ℓ)
• Construction (see the sketch below):
  Ext(x, y) = C(x)[y|_{S_1}] C(x)[y|_{S_2}] … C(x)[y|_{S_m}]
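A minimal sketch of the construction in code (the helpers `encode` and `design` are hypothetical stand-ins for the error-correcting code and the (a, h)-design above; building them is the real content of the construction):

```python
def trevisan_ext(x_bits, y_bits, encode, design):
    """Ext(x, y) = C(x)[y|S_1] C(x)[y|S_2] ... C(x)[y|S_m].

    x_bits : the n-bit sample from the weak source (list of 0/1)
    y_bits : the t-bit truly random seed (list of 0/1)
    encode : maps x_bits to the codeword C(x), a 0/1 list of length 2**h
    design : list of m sets S_i, each given as a list of h positions in range(t)
    """
    codeword = encode(x_bits)                          # C(x)
    output = []
    for S in design:                                   # one output bit per set S_i
        restricted = [y_bits[j] for j in S]            # y restricted to S_i ...
        index = int("".join(map(str, restricted)), 2)  # ... read as an index into C(x)
        output.append(codeword[index])
    return output
```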
Trevisan Extractor
Ext(x, y) = C(x)[y|_{S_1}] C(x)[y|_{S_2}] … C(x)[y|_{S_m}]
Theorem: Ext is an extractor for min-entropy k = n^δ, with
– output length m = k^{1/3}
– seed length t = O(log ℓ) = O(log n)
– error ε ≤ 1/m
[Diagram: the seed y, restricted to the design sets, picks out positions in the codeword C(x).]
Proof of Trevisan Extractor
Assume X ⊆ {0,1}^n is a min-entropy k random variable failing to ε-pass a statistical test T:
  |Pr_z[T(z) = 1] − Pr_{x∼X, y∈{0,1}^t}[T(Ext(x, y)) = 1]| > ε
By the usual hybrid argument, there is a predictor A and an index 1 ≤ i ≤ m such that:
  Pr_{x∼X, y∈{0,1}^t}[A(Ext(x, y)_{1…i−1}) = Ext(x, y)_i] > ½ + ε/m

The set for which A predicts well
Consider the set B of x's such that
  Pr_{y∈{0,1}^t}[A(Ext(x, y)_{1…i−1}) = Ext(x, y)_i] > ½ + ε/(2m)
By averaging, Pr_x[x ∈ B] ≥ ε/(2m)
Since X has min-entropy k, there are at least (ε/2m)·2^k different x ∈ B
The contradiction will be obtained by exhibiting a succinct encoding for each x ∈ B
…Proof of Trevisan Extractor
i, A and B are fixed.
If we fix the bits outside of S_i to an assignment α and let y′ vary over all possible assignments to the bits in S_i, then Ext(x, y′)_i = C(x)[y′|_{S_i}] = C(x)[y′] ranges over all the bits of C(x).
For every x ∈ B there is a short description of a string z close to C(x):
– fix the bits outside of S_i to α, preserving the advantage:
  Pr_{y′}[A(Ext(x, y′)_{1…i−1}) = C(x)[y′]] > ½ + ε/(2m),
  where α is the assignment to {1,…,t}∖S_i maximizing the advantage of A
– for j ≠ i, as y′ varies, y′|_{S_j} varies over only 2^a values!
– so we can provide (i−1) tables of 2^a values to supply Ext(x, y′)_{1…i−1}
Trevisan Extractor
Short description of a string z agreeing with C(x):
[Diagram: the predictor A, fed y′ ∈ Y′ = {0,1}^{log ℓ} together with the tables, outputs C(x)[y′] with probability ½ + ε/(2m) over Y′.]
…Proof of Trevisan Extractor
Up to (m−1) tables of size 2^a describe a string z that has ½ + ε/(2m) agreement with C(x).
• The number of codewords of C agreeing with z on a ½ + ε/(2m) fraction of the places is O(1/δ^2) = O(m^4), so given z there are at most O(m^4) corresponding x's
• The number of strings z with such a description is 2^{(m−1)·2^a} = 2^{n^{2δ/3}} = 2^{k^{2/3}}
• Total number of x ∈ B: at most O(m^4)·2^{k^{2/3}} << 2^k·(ε/2m) - contradiction
Johnson Bound: a binary code with distance (½ − δ^2)n has at most O(1/δ^2) codewords in any ball of radius (½ − δ)n.
• C has minimum distance (½ − ¼m^{−4})ℓ
Conclusion
• Given a source of n random bits with min-entropy k = n^{Ω(1)}, it is possible to run any BPP algorithm with it and obtain the correct answer with high probability
Application: strong error reduction
• L ∈ BPP if there is a p.p.t. TM M:
  x ∈ L ⇒ Pr_y[M(x,y) accepts] ≥ 2/3
  x ∉ L ⇒ Pr_y[M(x,y) rejects] ≥ 2/3
• Want:
  x ∈ L ⇒ Pr_y[M(x,y) accepts] ≥ 1 − 2^{−k}
  x ∉ L ⇒ Pr_y[M(x,y) rejects] ≥ 1 − 2^{−k}
• Already know: repeat O(k) times and take the majority
  – uses n = O(k)·|y| random bits; of the 2^n strings, 2^{n−k} can be bad
Strong error reduction
Better: let Ext be an extractor for min-entropy k = |y|^3 = n^δ, ε < 1/6 (sketched below)
– pick a random w ∈_R {0,1}^n
– run M(x, Ext(w, z)) for all z ∈ {0,1}^t and take the majority of the answers
– call w "bad" if maj_z M(x, Ext(w, z)) is incorrect; for such w,
  |Pr_z[M(x, Ext(w, z)) = b] − Pr_y[M(x, y) = b]| ≥ 1/6
– extractor property: there are at most 2^k bad w
– so we use n random bits, of which only 2^{n^δ} are bad strings
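A minimal sketch of this amplification, assuming a decider `M(x, y)` and an extractor `ext(w, z)` with the parameters above are available (both hypothetical helpers):

```python
import itertools

def amplified(M, ext, x, w, t):
    """Run M(x, Ext(w, z)) for every seed z in {0,1}^t and take the majority.

    w is the only randomness used: a single n-bit sample from the weak source.
    Since t = O(log n), enumerating all 2**t seeds keeps the running time polynomial.
    """
    answers = [M(x, ext(w, z)) for z in itertools.product((0, 1), repeat=t)]
    return 1 if 2 * sum(answers) > len(answers) else 0
```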
Strong error reduction
[Diagram: the extractor graph again. Left: {0,1}^n, the strings w; right: {0,1}^m, all strings for running the original randomized algorithm, of which at most 1/4 are bad for the given input. The strings w where the majority of neighbors are bad form a set of size at most 2^k, by the property that every subset of size 2^k sees almost all of the rhs with roughly equal probability.]
Two Surveys on Extractors
• Nisan and Ta-Shma, Extracting Randomness: A Survey and New Constructions, 1999 (predates Trevisan)
• Shaltiel, Recent Developments in Extractors, 2002, www.wisdom.weizmann.ac.il/~ronens/papers/survey.ps

Some of the slides are based on C. Umans' course: www.cs.caltech.edu/~umans/cs151-sp04/index.html
Circuit Complexity
• We will consider several issues regarding circuit complexity
Parallelism
• Refinement of polynomial time via (uniform) circuits:
  – circuit C
  – depth ↔ parallel time
  – size ↔ parallel work
The depth of a circuit is the length of the longest path from input to output.
It represents circuit latency.
Parallelism
• The NC hierarchy (of logspace-uniform circuits):
  NC^k = O(log^k n)-depth, poly(n)-size circuits with bounded fan-in (2)
  NC = ∪_k NC^k
• Aim: to capture efficiently parallelizable problems
• Not realistic?
  – overly generous in size
  – does not capture all aspects of parallelism
  – but does capture latency
• Sufficient for proving (presumed) lower bounds on best latency
What is NC0?
Matrix Multiplication
• Parallel complexity of this problem?
  – work = poly(n)
  – time = log^k(n)?
    • for which k?
n×n matrix A × n×n matrix B = n×n matrix AB
Matrix Multiplication
Arithmetic matrix multiplication:
  A = (a_{i,k}), B = (b_{k,j}), (AB)_{i,j} = Σ_k (a_{i,k} × b_{k,j})
… vs. Boolean matrix multiplication:
  A = (a_{i,k}), B = (b_{k,j}), (AB)_{i,j} = ∨_k (a_{i,k} ∧ b_{k,j})
– single output bit: to make matrix multiplication a language, on input A, B, (i, j) output (AB)_{i,j}
Matrix Multiplication
• Boolean Matrix Multiplication is in NC1
  – level 1: compute n ANDs: a_{i,k} ∧ b_{k,j}
  – next log n levels: a tree of ORs
  – n^2 such subtrees, one for each pair (i, j)
  – select the correct one and output (see the sketch below)
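A small illustrative sketch of the single output bit (ordinary sequential Python standing in for the circuit; the recursion of `or_tree` mirrors the log n levels of fan-in-2 ORs):

```python
def or_tree(bits):
    """OR of a list of bits, combined as a balanced tree of fan-in-2 ORs,
    mirroring an O(log n)-depth circuit."""
    if len(bits) == 1:
        return bits[0]
    mid = len(bits) // 2
    return or_tree(bits[:mid]) | or_tree(bits[mid:])

def bool_mm_bit(A, B, i, j):
    """(AB)_{i,j} = OR_k (a_{i,k} AND b_{k,j})."""
    ands = [A[i][k] & B[k][j] for k in range(len(B))]  # level 1: n ANDs
    return or_tree(ands)                               # next log n levels: tree of ORs
```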
Boolean formulas and NC1
• The circuit for Boolean Matrix Multiplication is actually a formula.
  – Formula: fan-out 1. The circuit looks like a tree.
This is no accident:
Theorem: L ∈ NC1 iff L is decidable by a polynomial-size uniform family of Boolean formulas.
Boolean formulas and NC1
From small-depth circuits to formulas
• Proof:
  – convert the NC1 circuit into a formula, recursively (duplicating gates whose fan-out is larger than 1)
  – note: this is a logspace transformation
    • stack depth log n, stack record 1 bit – "left" or "right"
Boolean formulas and NC1
From formulas to small-depth circuits
– convert a formula of size n into a formula of depth O(log n)
  • note: size ≤ 2^depth, so the new formula has poly(n) size
[Key transformation: pick a subformula D of C and rewrite C as (D ∧ C_1) ∨ (¬D ∧ C_0), where C_1 is C with D replaced by the constant 1 and C_0 is C with D replaced by the constant 0.]
Boolean formulas and NC1
– let D be any minimal subtree with size at least n/3; minimality implies size(D) ≤ 2n/3
– define T(n) = the maximum depth required for any size-n formula
– C_1, C_0, D all have size ≤ 2n/3, so
  T(n) ≤ T(2n/3) + 3
  which implies T(n) ≤ O(log n) (unrolled below)
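Unrolling the recurrence (a routine calculation, added for completeness):

$$T(n) \;\le\; T\!\left(\tfrac{2n}{3}\right) + 3 \;\Longrightarrow\; T(n) \;\le\; 3\log_{3/2} n + O(1) \;=\; O(\log n).$$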
Relation to other classes
• Clearly NC ⊆ P
  – P = uniform poly-size circuits
• NC1 ⊆ Logspace: on input x, compose logspace algorithms for:
  • generating C_{|x|}
  • converting it to a formula
  • FVAL on C_{|x|}(x)
  – FVAL is: given a formula and an assignment, what is the value of the output? Logspace composes!
Relation to other classes
• NL ⊆ NC2:
Claim: Directed S-T-CONN ∈ NC2
– Given a directed graph G = (V, E) and vertices s, t
– A = adjacency matrix (with self-loops)
– (A^2)_{i,j} = 1 iff there is a path of length at most 2 from node i to node j
– (A^n)_{i,j} = 1 iff there is a path of length at most n from node i to node j
– Compute A^n by a depth-log n tree of Boolean matrix multiplications and output entry (s, t)
  • repeated squaring! (sketched below)
– each level is a Boolean MM (which is in NC1), so log^2 n depth in total
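A minimal sketch of the repeated-squaring idea (sequential Python; each call to `bool_mat_mult` stands in for one NC1 level of Boolean matrix multiplication):

```python
def bool_mat_mult(A, B):
    n = len(A)
    return [[any(A[i][k] and B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def st_connected(adj, s, t):
    """adj: adjacency matrix with self-loops.  After about log n squarings,
    entry (s, t) is 1 iff there is a path of length at most n from s to t."""
    n = len(adj)
    M = adj
    steps = 1
    while steps < n:              # log n squarings in total
        M = bool_mat_mult(M, M)   # one Boolean MM per level
        steps *= 2
    return M[s][t]
```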
NC vs. P
Can every efficient algorithm be efficiently parallelized?
  NC =? P
• Common belief: NC ⊊ P
P-Completeness
A language L is P-complete if:
• L ∈ P
• any other language in P is reducible to L via a logspace reduction
P-complete problems are the least likely to be parallelizable:
if a P-complete problem is in NC, then P = NC
– we use logspace reductions to show a problem P-complete, and we have seen that Logspace is in NC
Some P-Complete Problems
• CVAL – the Circuit Value Problem
  – given a circuit and an assignment, what is the value of the output of the circuit?
  – the canonical P-complete problem
• Lexicographically first maximal independent set
• Linear Programming
• Finding a happy coloring of a graph
NC vs. P
Can every uniform, poly-size Boolean circuit family be converted into a uniform, poly-size Boolean formula family?
  NC1 =? P
Is the NC hierarchy proper:
  is it true that for all i, NC^i ⊊ NC^{i+1}?
Define AC^k = O(log^k n)-depth, poly(n)-size circuits with unbounded fan-in ∧ and ∨ gates.
Is the following true: AC^i ⊊ NC^{i+1} ⊊ AC^{i+1}?
Lower bounds
• Recall: NP ⊄ P/poly (NP does not have polynomial-size circuits) implies P ≠ NP
• Major goal: prove lower bounds on (non-uniform) circuit size for problems in NP
  – belief: an exponential lower bound holds
  – a super-polynomial lower bound would suffice for P ≠ NP
  – best bound known: 4.5n
  – we don't even have super-polynomial bounds for problems in NEXP!
Lower bounds
• There is lots of work on lower bounds for restricted classes of circuits:
• Formulas
  – the out-degree of each gate is 1
• Monotone circuits
  – no NOTs (even at the input level)
• Constant-depth circuits
  – polynomial size but unbounded fan-in
Counting argument for formulas
• Frustrating fact: almost all functions require huge formulas
Theorem [Shannon]: With probability at least 1 − o(1), a random function f: {0,1}^n → {0,1} requires a formula of size Ω(2^n/log n).
Shannon’s counting argument
• Proof (counting):
  – B(n) = 2^{2^n} = the number of functions f: {0,1}^n → {0,1}
  – the number of formulas with n inputs and size s is at most
    F(n, s) ≤ 4^s · 2^s · (2n)^s
    (4^s binary trees with s internal nodes; 2 gate choices per internal node; n + 2 ≤ 2n choices per leaf)
Shannon’s counting argument
– F(n, c·2^n/log n) < (16n)^{c·2^n/log n} = 16^{c·2^n/log n} · 2^{c·2^n} = 2^{(1+o(1))·c·2^n} < o(1)·2^{2^n}   (if c ≤ ½)
The probability that a random function has a formula of size s = (½)·2^n/log n is therefore at most F(n, s)/B(n) < o(1).
Andreev’s function
• best lower bound for formulas:
Theorem (Andreev, Håstad '93): the Andreev function requires (∧,∨,¬)-formulas of size at least Ω(n^{3−o(1)}).
Andreev’s function
The Andreev function A(x, y), A: {0,1}^{2n} → {0,1} (sketched in code below):
[Diagram: the n-bit input x is split into log n blocks of n/log n bits each; each block feeds an XOR gate; the log n XOR outputs form an index i, and a selector outputs the bit y_i of the n-bit string y.]
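A small sketch of A(x, y) as just described (illustration only; the block order and the bit-order of the index are one possible convention, assuming for simplicity that n is a power of two and log n divides n):

```python
import math

def andreev(x_bits, y_bits):
    """A(x, y): XOR each of the log n blocks of x (n/log n bits each);
    the resulting log n bits form an index i; output y_i."""
    n = len(y_bits)
    b = int(math.log2(n))              # number of blocks = log n
    block = n // b                     # bits per block = n / log n
    index_bits = []
    for j in range(b):
        parity = 0
        for bit in x_bits[j * block:(j + 1) * block]:
            parity ^= bit              # XOR of the j-th block
        index_bits.append(parity)
    i = int("".join(map(str, index_bits)), 2)
    return y_bits[i]
```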
Andreev’s function
Theorem: the Andreev function requires (∧,∨,¬)-formulas of size at least Ω(n^{3−o(1)}). First show Ω(n^{2−o(1)}).
Two important ideas:
• Random restrictions
• Using the existential counting lower bound on a smaller domain

General Strategy
Restrict the function and show:
• This must simplify the formula (a lot)
But
• The remaining function is still quite complex, so it needs a relatively large formula
Conclude: we must have started with a large formula.

Definition: L(f) = the smallest (∧,∨,¬) formula computing f
– measured as leaf-size
– directly related to formula size
Random restrictions
Key idea: given a function f: {0,1}^n → {0,1}, restrict it by ρ to get f_ρ
– ρ sets some variables to 0/1; the others remain free
• R(n, m) = the set of restrictions that leave m variables free
Random restrictions
Claim: Let m = εn. Then E_{ρ∈R(n, εn)}[L(f_ρ)] ≤ ε·L(f)
– each leaf survives with probability ε
• it may shrink even more…
  – by propagating constants
What happens to the XOR of a subset of variables under a random restriction (sketched below):
– if at least one member of the XOR survives, the XOR is not fixed and can obtain both values
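A tiny sketch of sampling a restriction from R(n, m) and of the survival criterion for an XOR block (helper names are mine, not from the slides):

```python
import random

def random_restriction(n, m):
    """Sample rho from R(n, m): m variables stay free ('*'),
    each remaining variable is fixed to an independent random bit."""
    free = set(random.sample(range(n), m))
    return ['*' if i in free else random.randint(0, 1) for i in range(n)]

def xor_survives(block_vars, rho):
    """An XOR over block_vars is not fixed by rho iff some variable in it stays free."""
    return any(rho[i] == '*' for i in block_vars)
```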
Shrinkage result
From the counting argument: there exists a function h: {0,1}^{log n} → {0,1} for which L(h) > n/(2 log log n).
– hardwire the truth table of that function into y to get the function A*(x)
– apply a random restriction from the set R(n, m = 2(log n)(ln log n)) to A*(x).
The lower bound
– probability a particular XOR is killed by the restriction = probability that all m free variables "miss" its block:
  (1 − (n/log n)/n)^m = (1 − 1/log n)^m ≤ (1/e)^{2 ln log n} = 1/log^2 n
– probability that even one of the log n XORs is killed by the restriction is at most:
  log n · (1/log^2 n) = 1/log n < ½.
The lower bound
– probability that even one of the XORs is killed by the restriction is at most:
  log n · (1/log^2 n) = 1/log n < ½.
– by Markov's inequality:
  Pr[ L(A*_ρ) > 2·E_{ρ∈R(n, m)}[L(A*_ρ)] ] < ½.
Conclusion: for some restriction ρ′ both events happen:
• all the XORs survive, and
• L(A*_{ρ′}) ≤ 2·E_{ρ∈R(n, m)}[L(A*_ρ)]

The lower bound
– if all the XORs survive, we can restrict the formula further so that it computes the hard function h
  • we may need additional ¬'s (these are free in the leaf-size measure)
L(h) = n/(2 log log n) ≤ L(A*_{ρ′}) ≤ 2·E_{ρ∈R(n, m)}[L(A*_ρ)] ≤ O((m/n)^1·L(A*)) ≤ O( ((log n)(ln log n)/n)^1 · L(A*) )
– Conclude: Ω(n^{2−o(1)}) ≤ L(A*) ≤ L(A).   [(m/n)^1 is the shrinkage factor]
Random restrictions and shrinkage factors
• Recall: E_{ρ∈R(n, εn)}[L(f_ρ)] ≤ ε·L(f)
  – each leaf survives with probability ε
But it may shrink even more by propagating constants.
Lemma [Håstad 93]: for all f, E_{ρ∈R(n, εn)}[L(f_ρ)] ≤ O(ε^{2−o(1)}·L(f))
The lower bound with new shrinkage factor
– if all the XORs survive, we can restrict the formula further so that it computes the hard function h
  • we may need to add ¬'s
L(h) = n/(2 log log n) ≤ L(A*_{ρ′}) ≤ 2·E_{ρ∈R(n, m)}[L(A*_ρ)] ≤ O((m/n)^{2−o(1)}·L(A*)) ≤ O( ((log n)(ln log n)/n)^{2−o(1)} · L(A*) )
– Conclude: Ω(n^{3−o(1)}) ≤ L(A*) ≤ L(A).   [(m/n)^{2−o(1)} is the new shrinkage factor; the calculation is spelled out below]
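Putting the numbers together (the same calculation as above, spelled out):

$$L(A^*) \;\ge\; \Omega\!\left(\frac{n}{2\log\log n}\cdot\left(\frac{n}{(\log n)(\ln\log n)}\right)^{2-o(1)}\right) \;=\; n^{3-o(1)}.$$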
What can be done in NC1
• Addition of two numbers, each of n bits
  – in fact, this can be done in AC0
• Adding n bits
  – can compute majority or threshold
  – something that cannot be done in AC0
• Multiplication
  – reduces to adding n numbers
• Division

Two different characterizations of NC1
• Through communication complexity
• Through branching programs
More on Depth: a communication complexity characterization of depth
• For a Boolean function f: {0,1}^n → {0,1} let
  – X = f^{-1}(1)
  – Y = f^{-1}(0)
Consider the relation R_f ⊆ X × Y × {1,…,n} where (x, y, i) ∈ R_f iff x_i ≠ y_i
• For monotone Boolean functions define M_f ⊆ X × Y × {1,…,n}
  where (x, y, i) ∈ M_f iff x_i = 1 and y_i = 0
A communication complexity characterization of depth
• What is the communication complexity D(R_f) of the relation R_f, assuming
  – Alice has x ∈ X = f^{-1}(1)
  – Bob has y ∈ Y = f^{-1}(0)?
Lemma: Let C be a circuit for f. Then D(R_f) ≤ depth(C).
Lemma: Let C be a monotone circuit for f. Then D(M_f) ≤ depth(C).
From circuits to protocolsboth monotone and non-monotone case
• For each ∨ gate (whose value is 1 under x), Alice says which of the two input wires to the gate is '1' under x
  – if both are '1' she picks one
  – this wire must be '0' under y
• For each ∧ gate (whose value is 0 under y), Bob says which of the two input wires to the gate is '0' under y
  – if both are '0' he picks one
  – this wire must be '1' under x
At the leaf reached, we find an i such that x_i ≠ y_i; if the circuit is monotone, then we know that x_i = 1 and y_i = 0.
Invariant maintained for the subformula considered: Alice's assignment yields '1' and Bob's assignment yields '0' (a sketch of the walk follows).
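A minimal sketch of this walk for the monotone case, with the circuit represented as nested tuples ('VAR', i), ('AND', l, r), ('OR', l, r) (a hypothetical representation chosen for the example; the non-monotone case additionally swaps the players' roles at NOT gates):

```python
def eval_circuit(node, assignment):
    kind = node[0]
    if kind == 'VAR':
        return assignment[node[1]]
    left = eval_circuit(node[1], assignment)
    right = eval_circuit(node[2], assignment)
    return (left and right) if kind == 'AND' else (left or right)

def kw_monotone(node, x, y):
    """Walk down a monotone circuit whose value is 1 under x and 0 under y.
    At an OR gate Alice points to an input wire that is 1 under x;
    at an AND gate Bob points to an input wire that is 0 under y.
    The leaf reached gives an index i with x_i = 1 and y_i = 0."""
    kind = node[0]
    if kind == 'VAR':
        return node[1]
    if kind == 'OR':                                   # Alice's move
        child = node[1] if eval_circuit(node[1], x) else node[2]
    else:                                              # 'AND': Bob's move
        child = node[1] if not eval_circuit(node[1], y) else node[2]
    return kw_monotone(child, x, y)
```

For example, on the circuit ('OR', ('AND', ('VAR', 0), ('VAR', 1)), ('VAR', 2)) with x = [1, 1, 0] and y = [0, 1, 0], the walk returns index 0, and indeed x_0 = 1 and y_0 = 0.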
From protocols to circuits both monotone and non-monotone case
Lemma: Let P be a protocol for R_f. Then there is a formula for f of depth C(P), the number of bits communicated by P.
[Diagram: the protocol tree, with leaves reading off variables z_0, z_1, z_2, …]
Label:
• Alice's moves with ∨
• Bob's moves with ∧
• a leaf with rectangle A × B and output i with either z_i or ¬z_i
A communication complexity characterization of depth
Theorem: D(R_f) = depth(f)
Theorem: for any monotone function f, D(M_f) = depth_monotone(f)
Applications:
• depth_monotone(STCON) = Θ(log^2 n)
• depth_monotone(matching) = Θ(n)
Example: Majority
• Input to Alice: x_1, x_2, …, x_n such that the majority are 1
• Input to Bob: y_1, y_2, …, y_n such that the majority are 0
Partition the input into two halves, x_1, …, x_{n/2} and x_{n/2+1}, …, x_n, and exchange the number of 1's each player holds in each half; in at least one half Alice's count exceeds Bob's, so recurse on that half until a single coordinate i with x_i ≠ y_i remains (sketched below).
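A sketch of the resulting binary-search protocol (log n rounds, each exchanging counts of O(log n) bits, so O(log^2 n) communication in total); the invariant is that Alice's count of 1's on the current interval exceeds Bob's:

```python
def majority_kw(x, y):
    """Find an index i with x_i != y_i, given that x has a majority of 1's
    and y has a majority of 0's.  In every round the players exchange the
    number of 1's each holds in the two halves of the current interval and
    keep a half on which Alice's count still exceeds Bob's."""
    lo, hi = 0, len(x)                      # current interval [lo, hi)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        a_left, b_left = sum(x[lo:mid]), sum(y[lo:mid])
        # Alice's total exceeds Bob's, so she leads on at least one half
        if a_left > b_left:
            hi = mid
        else:
            lo = mid
    return lo                               # here x[lo] = 1 and y[lo] = 0
```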