Complexity Theory
Lecture 12
Lecturer: Moni Naor
Recap
Last week:
• Hardness and Randomness
• Semi-random sources
• Extractors
This week:
• Finish Hardness and Randomness
• Circuit Complexity
  – The class NC
  – Formulas = NC1
  – Lower bound for Andreev's function
  – Communication characterization of depth

Derandomization
A major research question:
• How to make the construction of
  – a small sample space `resembling' the large one
  – hitting sets
  efficient?
Successful approach: randomness from hardness
– (Cryptographic) pseudo-random generators
– Complexity-oriented pseudo-random generators
Extending the result
Theorem: if E contains 2^Ω(n)-unapproximable functions, then BPP = P.
• The assumption is an average-case one
• Based on non-uniformity
Improvement:
Theorem: If E contains functions that require size 2^Ω(n) circuits (for the worst case), then E contains 2^Ω(n)-unapproximable functions.
Corollary: If E requires exponential-size circuits, then BPP = P.
How to extend the result
• Recall the worst-case to average-case reduction for the Permanent
• The idea: encode the function in a form that allows translating a few worst-case errors into random errors
Properties of a code
Want a code C: {0,1}^{2^n} → {0,1}^{2^ℓ}
where:
• 2^ℓ is polynomial in 2^n
• C is polynomial-time computable
  – efficient encoding
  – certain local decoding properties
Codes and Hardness
• Use for worst-case to average-case hardness:
  truth table of f: {0,1}^n → {0,1} (worst-case hard)
  ↦ truth table of f′: {0,1}^ℓ → {0,1} (average-case hard)
[Figure: m_f, the truth table of f written as a bit string, and its encoding C(m_f).]
Codes and Hardness
• If 2^ℓ is polynomial in 2^n, then f ∈ E implies f′ ∈ E
• Want to be able to prove: if f′ is s′-approximable, then f is computable by a size s = poly(s′) circuit
Codes and Hardness
Key point: a circuit C that approximates f′ implicitly defines a received word R_C that is not far from C(m_f)
• Want the decoding procedure D to compute f exactly
[Figure: the received word R_C compared with the codeword C(m_f); D decodes R_C. Requires a special notion of efficient decoding.]
Decoding requirements
• Want that
  – for any received word R that is not far from C(m), and
  – for any input bit 1 ≤ i ≤ 2^n,
  we can reconstruct m(i) with probability 2/3 by accessing only poly(n) locations in R
Example of a code with good local decoding properties: the Hadamard code (a sketch of such a decoder follows). But it has exponential length.
This gives a probabilistic circuit for f of size poly(n) · size(C) + the size of the decoding circuit.
Since probabilistic circuits have deterministic versions of similar size - contradiction.
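As an illustration (not from the original slides), here is a minimal sketch of the standard local decoder for the Hadamard code, whose codeword positions are indexed by a ∈ {0,1}^n and satisfy C(m)[a] = <m, a> mod 2; the received word R is assumed to be given as a callable.

```python
import random

def hadamard_local_decode(R, n, i, repetitions=9):
    """Recover message bit m_i with good probability when R agrees with the
    codeword C(m) on most positions.  Each vote makes only two queries to R."""
    votes = 0
    for _ in range(repetitions):
        a = [random.randint(0, 1) for _ in range(n)]   # uniformly random position
        b = a[:]
        b[i] ^= 1                                      # flip coordinate i
        # <m, a> xor <m, a + e_i> = m_i, so XOR-ing the two answers votes for m_i
        votes += R(tuple(a)) ^ R(tuple(b))
    return 1 if 2 * votes > repetitions else 0
```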
Extractor
• Extractor: a universal procedure for “purifying” imperfect source:
– The function Ext(x, y) should be efficiently computable
– truly random seed as "catalyst"
– Parameters: (n, k, m, t, ε)
[Diagram: Ext maps an n-bit source string (drawn from a set of 2^k strings in {0,1}^n) plus a truly random t-bit seed to an m-bit near-uniform output.]
Extractor: Definition
(k, ε)-extractor: for all random variables X with min-entropy k:
– output fools all tests T:
  |Pr_{z∼U_m}[T(z) = 1] − Pr_{y∈_R{0,1}^t, x∼X}[T(Ext(x, y)) = 1]| ≤ ε
– the distributions Ext(X, U_t) and U_m are ε-close (L1 distance ≤ 2ε)
(U_m is the uniform distribution on {0,1}^m)
• Comparison to pseudo-random generators:
  – the output of a PRG fools all efficient tests
  – the output of an extractor fools all tests
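For reference (the slides use the notion without restating it), min-entropy is defined by:

$$H_\infty(X) = \min_x \log_2 \frac{1}{\Pr[X = x]}, \qquad H_\infty(X) \ge k \iff \Pr[X = x] \le 2^{-k} \ \text{for every } x.$$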
Extractors: Applications
• Using extractors
  – use the output in place of randomness in any application
  – this alters the probability of any outcome by at most ε
• Main motivation:
  – use the output in place of randomness in an algorithm
  – but how to get the truly random seed?
  – enumerate all seeds and take the majority
Extractor as a Graph
[Diagram: bipartite graph of the extractor. Left side: {0,1}^n, with a highlighted subset of size 2^k; right side: {0,1}^m; each left vertex has 2^t neighbors, one per seed.]
Want every subset of size 2^k to see almost all of the rhs with (roughly) equal probability.
Extractors: desired parameters
• Goals:            good:          optimal:
  short seed        O(log n)       log n + O(1)
  long output       m = k^Ω(1)     m = k + t − O(1)
  many k's          k = n^Ω(1)     any k = k(n)
A short seed of O(log n) bits allows going over all seeds.
Extractors
• A random construction for Ext achieves optimal parameters!
  – but we need explicit constructions
    • otherwise we cannot derandomize BPP
  – an optimal explicit construction of extractors is still open
• Trevisan Extractor:
  – idea: any string defines a function
    • a string C of length ℓ defines a function f_C: {1,…,ℓ} → {0,1} by f_C(i) = C[i]
  – use the NW generator with the source string in place of the hard function
From complexity to combinatorics!
Trevisan Extractor
• Tools:
  – An error-correcting code C: {0,1}^n → {0,1}^ℓ
    • distance between codewords: (½ − ¼m^{−4})ℓ
    – important: in any ball of radius ½ − δ there are at most 1/δ^2 codewords, where δ = ½m^{−2}
    • blocklength ℓ = poly(n)
    • polynomial-time encoding
      – decoding time does not matter
  – An (a, h)-design S_1, S_2, …, S_m ⊆ {1,…,t} where
    • h = log ℓ
    • a = (δ log n)/3
    • t = O(log ℓ)
• Construction (see the sketch below):
  Ext(x, y) = C(x)[y|_{S_1}] C(x)[y|_{S_2}] … C(x)[y|_{S_m}]
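A minimal sketch of the construction in code (the helpers `encode` and `design` are hypothetical stand-ins for the error-correcting code and the (a, h)-design above; building them is the real content of the construction):

```python
def trevisan_ext(x_bits, y_bits, encode, design):
    """Ext(x, y) = C(x)[y|S_1] C(x)[y|S_2] ... C(x)[y|S_m].

    x_bits : the n-bit sample from the weak source (list of 0/1)
    y_bits : the t-bit truly random seed (list of 0/1)
    encode : maps x_bits to the codeword C(x), a 0/1 list of length 2**h
    design : list of m sets S_i, each given as a list of h positions in range(t)
    """
    codeword = encode(x_bits)                          # C(x)
    output = []
    for S in design:                                   # one output bit per set S_i
        restricted = [y_bits[j] for j in S]            # y restricted to S_i ...
        index = int("".join(map(str, restricted)), 2)  # ... read as an index into C(x)
        output.append(codeword[index])
    return output
```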
Trevisan Extractor
Ext(x, y) = C(x)[y|_{S_1}] C(x)[y|_{S_2}] … C(x)[y|_{S_m}]
Theorem: Ext is an extractor for min-entropy k = n^δ, with
– output length m = k^{1/3}
– seed length t = O(log ℓ) = O(log n)
– error ε ≤ 1/m
[Diagram: the seed y, restricted to the design sets, picks out positions in the codeword C(x).]
Proof of Trevisan Extractor
Assume X ⊆ {0,1}^n is a min-entropy k random variable failing to ε-pass a statistical test T:
  |Pr_z[T(z) = 1] − Pr_{x∼X, y∈{0,1}^t}[T(Ext(x, y)) = 1]| > ε
By the usual hybrid argument, there is a predictor A and an index 1 ≤ i ≤ m such that:
  Pr_{x∼X, y∈{0,1}^t}[A(Ext(x, y)_{1…i−1}) = Ext(x, y)_i] > ½ + ε/m

The set for which A predicts well
Consider the set B of x's such that
  Pr_{y∈{0,1}^t}[A(Ext(x, y)_{1…i−1}) = Ext(x, y)_i] > ½ + ε/(2m)
By averaging, Pr_x[x ∈ B] ≥ ε/(2m)
Since X has min-entropy k, there are at least (ε/2m)·2^k different x ∈ B
The contradiction will be obtained by exhibiting a succinct encoding for each x ∈ B
…Proof of Trevisan Extractor
i, A and B are fixed.
If we fix the bits outside of S_i to an assignment α and let y′ vary over all possible assignments to the bits in S_i, then Ext(x, y′)_i = C(x)[y′|_{S_i}] = C(x)[y′] ranges over all the bits of C(x).
For every x ∈ B there is a short description of a string z close to C(x):
– fix the bits outside of S_i to α, preserving the advantage:
  Pr_{y′}[A(Ext(x, y′)_{1…i−1}) = C(x)[y′]] > ½ + ε/(2m),
  where α is the assignment to {1,…,t}∖S_i maximizing the advantage of A
– for j ≠ i, as y′ varies, y′|_{S_j} varies over only 2^a values!
– so we can provide (i−1) tables of 2^a values to supply Ext(x, y′)_{1…i−1}
Trevisan Extractor
Short description of a string z agreeing with C(x):
[Diagram: the predictor A, fed y′ ∈ Y′ = {0,1}^{log ℓ} together with the tables, outputs C(x)[y′] with probability ½ + ε/(2m) over Y′.]
…Proof of Trevisan Extractor
Up to (m−1) tables of size 2^a describe a string z that has ½ + ε/(2m) agreement with C(x).
• The number of codewords of C agreeing with z on a ½ + ε/(2m) fraction of the places is O(1/δ^2) = O(m^4), so given z there are at most O(m^4) corresponding x's
• The number of strings z with such a description is 2^{(m−1)·2^a} = 2^{n^{2δ/3}} = 2^{k^{2/3}}
• Total number of x ∈ B: at most O(m^4)·2^{k^{2/3}} << 2^k·(ε/2m) - contradiction
Johnson Bound: a binary code with distance (½ − δ^2)n has at most O(1/δ^2) codewords in any ball of radius (½ − δ)n.
• C has minimum distance (½ − ¼m^{−4})ℓ
Conclusion
• Given a source of n random bits with min-entropy k = n^{Ω(1)}, it is possible to run any BPP algorithm with it and obtain the correct answer with high probability
Application: strong error reduction
• L ∈ BPP if there is a p.p.t. TM M:
  x ∈ L ⇒ Pr_y[M(x,y) accepts] ≥ 2/3
  x ∉ L ⇒ Pr_y[M(x,y) rejects] ≥ 2/3
• Want:
  x ∈ L ⇒ Pr_y[M(x,y) accepts] ≥ 1 − 2^{−k}
  x ∉ L ⇒ Pr_y[M(x,y) rejects] ≥ 1 − 2^{−k}
• Already know: repeat O(k) times and take the majority
  – uses n = O(k)·|y| random bits; of the 2^n strings, 2^{n−k} can be bad
Strong error reduction
Better: let Ext be an extractor for min-entropy k = |y|^3 = n^δ, ε < 1/6 (sketched below)
– pick a random w ∈_R {0,1}^n
– run M(x, Ext(w, z)) for all z ∈ {0,1}^t and take the majority of the answers
– call w "bad" if maj_z M(x, Ext(w, z)) is incorrect; for such w,
  |Pr_z[M(x, Ext(w, z)) = b] − Pr_y[M(x, y) = b]| ≥ 1/6
– extractor property: there are at most 2^k bad w
– so we use n random bits, of which only 2^{n^δ} are bad strings
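A minimal sketch of this amplification, assuming a decider `M(x, y)` and an extractor `ext(w, z)` with the parameters above are available (both hypothetical helpers):

```python
import itertools

def amplified(M, ext, x, w, t):
    """Run M(x, Ext(w, z)) for every seed z in {0,1}^t and take the majority.

    w is the only randomness used: a single n-bit sample from the weak source.
    Since t = O(log n), enumerating all 2**t seeds keeps the running time polynomial.
    """
    answers = [M(x, ext(w, z)) for z in itertools.product((0, 1), repeat=t)]
    return 1 if 2 * sum(answers) > len(answers) else 0
```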
Strong error reduction
[Diagram: the extractor graph again. Left: {0,1}^n, the strings w; right: {0,1}^m, all strings for running the original randomized algorithm, of which at most 1/4 are bad for the given input. The strings w where the majority of neighbors are bad form a set of size at most 2^k, by the property that every subset of size 2^k sees almost all of the rhs with roughly equal probability.]
Two Surveys on Extractors
• Nisan and Ta-Shma, Extracting Randomness: A Survey and New Constructions, 1999 (predates Trevisan)
• Shaltiel, Recent Developments in Extractors, 2002, www.wisdom.weizmann.ac.il/~ronens/papers/survey.ps

Some of the slides are based on C. Umans' course: www.cs.caltech.edu/~umans/cs151-sp04/index.html
Circuit Complexity
• We will consider several issues regarding circuit complexity
Parallelism
• Refinement of polynomial time via (uniform) circuits:
  – circuit C
  – depth ↔ parallel time
  – size ↔ parallel work
The depth of a circuit is the length of the longest path from input to output.
It represents circuit latency.
Parallelism
• The NC hierarchy (of logspace-uniform circuits):
  NC^k = O(log^k n)-depth, poly(n)-size circuits with bounded fan-in (2)
  NC = ∪_k NC^k
• Aim: to capture efficiently parallelizable problems
• Not realistic?
  – overly generous in size
  – does not capture all aspects of parallelism
  – but does capture latency
• Sufficient for proving (presumed) lower bounds on best latency
What is NC0?
Matrix Multiplication
• Parallel complexity of this problem?
  – work = poly(n)
  – time = log^k(n)?
    • for which k?
n×n matrix A × n×n matrix B = n×n matrix AB
Matrix Multiplication
Arithmetic matrix multiplication:
  A = (a_{i,k}), B = (b_{k,j}), (AB)_{i,j} = Σ_k (a_{i,k} × b_{k,j})
… vs. Boolean matrix multiplication:
  A = (a_{i,k}), B = (b_{k,j}), (AB)_{i,j} = ∨_k (a_{i,k} ∧ b_{k,j})
– single output bit: to make matrix multiplication a language, on input A, B, (i, j) output (AB)_{i,j}
Matrix Multiplication
• Boolean Matrix Multiplication is in NC1
  – level 1: compute n ANDs: a_{i,k} ∧ b_{k,j}
  – next log n levels: a tree of ORs
  – n^2 such subtrees, one for each pair (i, j)
  – select the correct one and output (see the sketch below)
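A small illustrative sketch of the single output bit (ordinary sequential Python standing in for the circuit; the recursion of `or_tree` mirrors the log n levels of fan-in-2 ORs):

```python
def or_tree(bits):
    """OR of a list of bits, combined as a balanced tree of fan-in-2 ORs,
    mirroring an O(log n)-depth circuit."""
    if len(bits) == 1:
        return bits[0]
    mid = len(bits) // 2
    return or_tree(bits[:mid]) | or_tree(bits[mid:])

def bool_mm_bit(A, B, i, j):
    """(AB)_{i,j} = OR_k (a_{i,k} AND b_{k,j})."""
    ands = [A[i][k] & B[k][j] for k in range(len(B))]  # level 1: n ANDs
    return or_tree(ands)                               # next log n levels: tree of ORs
```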
Boolean formulas and NC1
• The circuit for Boolean Matrix Multiplication is actually a formula.
  – Formula: fan-out 1. The circuit looks like a tree.
This is no accident:
Theorem: L ∈ NC1 iff L is decidable by a polynomial-size uniform family of Boolean formulas.
Boolean formulas and NC1
From small-depth circuits to formulas
• Proof:
  – convert the NC1 circuit into a formula, recursively (duplicating gates whose fan-out is larger than 1)
  – note: this is a logspace transformation
    • stack depth log n, stack record 1 bit – "left" or "right"
Boolean formulas and NC1
From formulas to small-depth circuits
– convert a formula of size n into a formula of depth O(log n)
  • note: size ≤ 2^depth, so the new formula has poly(n) size
[Key transformation: pick a subformula D of C and rewrite C as (D ∧ C_1) ∨ (¬D ∧ C_0), where C_1 is C with D replaced by the constant 1 and C_0 is C with D replaced by the constant 0.]
Boolean formulas and NC1
– let D be any minimal subtree with size at least n/3; minimality implies size(D) ≤ 2n/3
– define T(n) = the maximum depth required for any size-n formula
– C_1, C_0, D all have size ≤ 2n/3, so
  T(n) ≤ T(2n/3) + 3
  which implies T(n) ≤ O(log n) (unrolled below)
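Unrolling the recurrence (a routine calculation, added for completeness):

$$T(n) \;\le\; T\!\left(\tfrac{2n}{3}\right) + 3 \;\Longrightarrow\; T(n) \;\le\; 3\log_{3/2} n + O(1) \;=\; O(\log n).$$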
Relation to other classes
• Clearly NC ⊆ P
  – P = uniform poly-size circuits
• NC1 ⊆ Logspace: on input x, compose logspace algorithms for:
  • generating C_{|x|}
  • converting it to a formula
  • FVAL on C_{|x|}(x)
  – FVAL is: given a formula and an assignment, what is the value of the output? Logspace composes!
Relation to other classes
• NL ⊆ NC2:
Claim: Directed S-T-CONN ∈ NC2
– Given a directed graph G = (V, E) and vertices s, t
– A = adjacency matrix (with self-loops)
– (A^2)_{i,j} = 1 iff there is a path of length at most 2 from node i to node j
– (A^n)_{i,j} = 1 iff there is a path of length at most n from node i to node j
– Compute A^n by a depth-log n tree of Boolean matrix multiplications and output entry (s, t)
  • repeated squaring! (sketched below)
– each level is a Boolean MM (which is in NC1), so log^2 n depth in total
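A minimal sketch of the repeated-squaring idea (sequential Python; each call to `bool_mat_mult` stands in for one NC1 level of Boolean matrix multiplication):

```python
def bool_mat_mult(A, B):
    n = len(A)
    return [[any(A[i][k] and B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def st_connected(adj, s, t):
    """adj: adjacency matrix with self-loops.  After about log n squarings,
    entry (s, t) is 1 iff there is a path of length at most n from s to t."""
    n = len(adj)
    M = adj
    steps = 1
    while steps < n:              # log n squarings in total
        M = bool_mat_mult(M, M)   # one Boolean MM per level
        steps *= 2
    return M[s][t]
```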
NC vs. P
Can every efficient algorithm be efficiently parallelized?
  NC =? P
• Common belief: NC ⊊ P
P-Completeness
A language L is P-complete if:
• L ∈ P
• any other language in P is reducible to L via a logspace reduction
P-complete problems are the least likely to be parallelizable:
if a P-complete problem is in NC, then P = NC
– we use logspace reductions to show a problem P-complete, and we have seen that Logspace is in NC
Some P-Complete Problems
• CVAL – the Circuit Value Problem
  – given a circuit and an assignment, what is the value of the output of the circuit?
  – the canonical P-complete problem
• Lexicographically first maximal independent set
• Linear Programming
• Finding a happy coloring of a graph
NC vs. P
Can every uniform, poly-size Boolean circuit family be converted into a uniform, poly-size Boolean formula family?
  NC1 =? P
Is the NC hierarchy proper:
  is it true that for all i, NC^i ⊊ NC^{i+1}?
Define AC^k = O(log^k n)-depth, poly(n)-size circuits with unbounded fan-in ∧ and ∨ gates.
Is the following true: AC^i ⊊ NC^{i+1} ⊊ AC^{i+1}?
Lower bounds
• Recall: NP ⊄ P/poly (NP does not have polynomial-size circuits) implies P ≠ NP
• Major goal: prove lower bounds on (non-uniform) circuit size for problems in NP
  – belief: an exponential lower bound holds
  – a super-polynomial lower bound would suffice for P ≠ NP
  – best bound known: 4.5n
  – we don't even have super-polynomial bounds for problems in NEXP!
Lower bounds
• There is lots of work on lower bounds for restricted classes of circuits:
• Formulas
  – the out-degree of each gate is 1
• Monotone circuits
  – no NOTs (even at the input level)
• Constant-depth circuits
  – polynomial size but unbounded fan-in
Counting argument for formulas
• Frustrating fact: almost all functions require huge formulas
Theorem [Shannon]: With probability at least 1 − o(1), a random function f: {0,1}^n → {0,1} requires a formula of size Ω(2^n/log n).
Shannon’s counting argument
• Proof (counting):
  – B(n) = 2^{2^n} = the number of functions f: {0,1}^n → {0,1}
  – the number of formulas with n inputs and size s is at most
    F(n, s) ≤ 4^s · 2^s · (2n)^s
    (4^s binary trees with s internal nodes; 2 gate choices per internal node; n + 2 ≤ 2n choices per leaf)
Shannon’s counting argument
– F(n, c·2^n/log n) < (16n)^{c·2^n/log n} = 16^{c·2^n/log n} · 2^{c·2^n} = 2^{(1+o(1))·c·2^n} < o(1)·2^{2^n}   (if c ≤ ½)
The probability that a random function has a formula of size s = (½)·2^n/log n is therefore at most F(n, s)/B(n) < o(1).
Andreev’s function
• best lower bound for formulas:
Theorem (Andreev, Håstad '93): the Andreev function requires (∧,∨,¬)-formulas of size at least Ω(n^{3−o(1)}).
Andreev’s function
The Andreev function A(x, y), A: {0,1}^{2n} → {0,1} (sketched in code below):
[Diagram: the n-bit input x is split into log n blocks of n/log n bits each; each block feeds an XOR gate; the log n XOR outputs form an index i, and a selector outputs the bit y_i of the n-bit string y.]
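A small sketch of A(x, y) as just described (illustration only; the block order and the bit-order of the index are one possible convention, assuming for simplicity that n is a power of two and log n divides n):

```python
import math

def andreev(x_bits, y_bits):
    """A(x, y): XOR each of the log n blocks of x (n/log n bits each);
    the resulting log n bits form an index i; output y_i."""
    n = len(y_bits)
    b = int(math.log2(n))              # number of blocks = log n
    block = n // b                     # bits per block = n / log n
    index_bits = []
    for j in range(b):
        parity = 0
        for bit in x_bits[j * block:(j + 1) * block]:
            parity ^= bit              # XOR of the j-th block
        index_bits.append(parity)
    i = int("".join(map(str, index_bits)), 2)
    return y_bits[i]
```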
Andreev’s function
Theorem: the Andreev function requires (∧,∨,¬)-formulas of size at least Ω(n^{3−o(1)}). First show Ω(n^{2−o(1)}).
Two important ideas:
• Random restrictions
• Using the existential counting lower bound on a smaller domain

General Strategy
Restrict the function and show:
• This must simplify the formula (a lot)
But
• The remaining function is still quite complex, so it needs a relatively large formula
Conclude: we must have started with a large formula.

Definition: L(f) = the smallest (∧,∨,¬) formula computing f
– measured as leaf-size
– directly related to formula size
Random restrictions
Key idea: given a function f: {0,1}^n → {0,1}, restrict it by ρ to get f_ρ
– ρ sets some variables to 0/1; the others remain free
• R(n, m) = the set of restrictions that leave m variables free
Random restrictions
Claim: Let m = εn. Then E_{ρ∈R(n, εn)}[L(f_ρ)] ≤ ε·L(f)
– each leaf survives with probability ε
• it may shrink even more…
  – by propagating constants
What happens to the XOR of a subset of variables under a random restriction (sketched below):
– if at least one member of the XOR survives, the XOR is not fixed and can obtain both values
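A tiny sketch of sampling a restriction from R(n, m) and of the survival criterion for an XOR block (helper names are mine, not from the slides):

```python
import random

def random_restriction(n, m):
    """Sample rho from R(n, m): m variables stay free ('*'),
    each remaining variable is fixed to an independent random bit."""
    free = set(random.sample(range(n), m))
    return ['*' if i in free else random.randint(0, 1) for i in range(n)]

def xor_survives(block_vars, rho):
    """An XOR over block_vars is not fixed by rho iff some variable in it stays free."""
    return any(rho[i] == '*' for i in block_vars)
```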
Shrinkage result
From the counting argument: there exists a function h: {0,1}^{log n} → {0,1} for which L(h) > n/(2 log log n).
– hardwire the truth table of that function into y to get the function A*(x)
– apply a random restriction from the set R(n, m = 2(log n)(ln log n)) to A*(x).
The lower bound
– probability a particular XOR is killed by the restriction = probability that all m free variables "miss" its block:
  (1 − (n/log n)/n)^m = (1 − 1/log n)^m ≤ (1/e)^{2 ln log n} = 1/log^2 n
– probability that even one of the log n XORs is killed by the restriction is at most:
  log n · (1/log^2 n) = 1/log n < ½.
The lower bound
– probability that even one of the XORs is killed by the restriction is at most:
  log n · (1/log^2 n) = 1/log n < ½.
– by Markov's inequality:
  Pr[ L(A*_ρ) > 2·E_{ρ∈R(n, m)}[L(A*_ρ)] ] < ½.
Conclusion: for some restriction ρ′ both events happen:
• all the XORs survive, and
• L(A*_{ρ′}) ≤ 2·E_{ρ∈R(n, m)}[L(A*_ρ)]

The lower bound
– if all the XORs survive, we can restrict the formula further so that it computes the hard function h
  • we may need additional ¬'s (these are free in the leaf-size measure)
L(h) = n/(2 log log n) ≤ L(A*_{ρ′}) ≤ 2·E_{ρ∈R(n, m)}[L(A*_ρ)] ≤ O((m/n)^1·L(A*)) ≤ O( ((log n)(ln log n)/n)^1 · L(A*) )
– Conclude: Ω(n^{2−o(1)}) ≤ L(A*) ≤ L(A).   [(m/n)^1 is the shrinkage factor]
Random restrictions and shrinkage factors
• Recall: E_{ρ∈R(n, εn)}[L(f_ρ)] ≤ ε·L(f)
  – each leaf survives with probability ε
But it may shrink even more by propagating constants.
Lemma [Håstad 93]: for all f, E_{ρ∈R(n, εn)}[L(f_ρ)] ≤ O(ε^{2−o(1)}·L(f))
The lower bound with new shrinkage factor
– if all the XORs survive, we can restrict the formula further so that it computes the hard function h
  • we may need to add ¬'s
L(h) = n/(2 log log n) ≤ L(A*_{ρ′}) ≤ 2·E_{ρ∈R(n, m)}[L(A*_ρ)] ≤ O((m/n)^{2−o(1)}·L(A*)) ≤ O( ((log n)(ln log n)/n)^{2−o(1)} · L(A*) )
– Conclude: Ω(n^{3−o(1)}) ≤ L(A*) ≤ L(A).   [(m/n)^{2−o(1)} is the new shrinkage factor; the calculation is spelled out below]
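Putting the numbers together (the same calculation as above, spelled out):

$$L(A^*) \;\ge\; \Omega\!\left(\frac{n}{2\log\log n}\cdot\left(\frac{n}{(\log n)(\ln\log n)}\right)^{2-o(1)}\right) \;=\; n^{3-o(1)}.$$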
What can be done in NC1
• Addition of two numbers, each of n bits
  – in fact, this can be done in AC0
• Adding n bits
  – can compute majority or threshold
  – something that cannot be done in AC0
• Multiplication
  – reduces to adding n numbers
• Division

Two different characterizations of NC1
• Through communication complexity
• Through branching programs
More on Depth: a communication complexity characterization of depth
• For a Boolean function f: {0,1}^n → {0,1} let
  – X = f^{-1}(1)
  – Y = f^{-1}(0)
Consider the relation R_f ⊆ X × Y × {1,…,n} where (x, y, i) ∈ R_f iff x_i ≠ y_i
• For monotone Boolean functions define M_f ⊆ X × Y × {1,…,n}
  where (x, y, i) ∈ M_f iff x_i = 1 and y_i = 0
A communication complexity characterization of depth
• What is the communication complexity D(R_f) of the relation R_f, assuming
  – Alice has x ∈ X = f^{-1}(1)
  – Bob has y ∈ Y = f^{-1}(0)?
Lemma: Let C be a circuit for f. Then D(R_f) ≤ depth(C).
Lemma: Let C be a monotone circuit for f. Then D(M_f) ≤ depth(C).
From circuits to protocolsboth monotone and non-monotone case
• For each ∨ gate (whose value is 1 under x), Alice says which of the two input wires to the gate is '1' under x
  – if both are '1' she picks one
  – this wire must be '0' under y
• For each ∧ gate (whose value is 0 under y), Bob says which of the two input wires to the gate is '0' under y
  – if both are '0' he picks one
  – this wire must be '1' under x
At the leaf reached, we find an i such that x_i ≠ y_i; if the circuit is monotone, then we know that x_i = 1 and y_i = 0.
Invariant maintained for the subformula considered: Alice's assignment yields '1' and Bob's assignment yields '0' (a sketch of the walk follows).
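A minimal sketch of this walk for the monotone case, with the circuit represented as nested tuples ('VAR', i), ('AND', l, r), ('OR', l, r) (a hypothetical representation chosen for the example; the non-monotone case additionally swaps the players' roles at NOT gates):

```python
def eval_circuit(node, assignment):
    kind = node[0]
    if kind == 'VAR':
        return assignment[node[1]]
    left = eval_circuit(node[1], assignment)
    right = eval_circuit(node[2], assignment)
    return (left and right) if kind == 'AND' else (left or right)

def kw_monotone(node, x, y):
    """Walk down a monotone circuit whose value is 1 under x and 0 under y.
    At an OR gate Alice points to an input wire that is 1 under x;
    at an AND gate Bob points to an input wire that is 0 under y.
    The leaf reached gives an index i with x_i = 1 and y_i = 0."""
    kind = node[0]
    if kind == 'VAR':
        return node[1]
    if kind == 'OR':                                   # Alice's move
        child = node[1] if eval_circuit(node[1], x) else node[2]
    else:                                              # 'AND': Bob's move
        child = node[1] if not eval_circuit(node[1], y) else node[2]
    return kw_monotone(child, x, y)
```

For example, on the circuit ('OR', ('AND', ('VAR', 0), ('VAR', 1)), ('VAR', 2)) with x = [1, 1, 0] and y = [0, 1, 0], the walk returns index 0, and indeed x_0 = 1 and y_0 = 0.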
From protocols to circuits both monotone and non-monotone case
Lemma: Let P be a protocol for R_f. Then there is a formula for f of depth C(P), the number of bits communicated by P.
[Diagram: the protocol tree, with leaves reading off variables z_0, z_1, z_2, …]
Label:
• Alice's moves with ∨
• Bob's moves with ∧
• a leaf with rectangle A × B and output i with either z_i or ¬z_i
A communication complexity characterization of depth
Theorem: D(R_f) = depth(f)
Theorem: for any monotone function f, D(M_f) = depth_monotone(f)
Applications:
• depth_monotone(STCON) = Θ(log^2 n)
• depth_monotone(matching) = Θ(n)
Example: Majority
• Input to Alice: x_1, x_2, …, x_n such that the majority are 1
• Input to Bob: y_1, y_2, …, y_n such that the majority are 0
Partition the input into two halves, x_1, …, x_{n/2} and x_{n/2+1}, …, x_n, and exchange the number of 1's each player holds in each half; in at least one half Alice's count exceeds Bob's, so recurse on that half until a single coordinate i with x_i ≠ y_i remains (sketched below).
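A sketch of the resulting binary-search protocol (log n rounds, each exchanging counts of O(log n) bits, so O(log^2 n) communication in total); the invariant is that Alice's count of 1's on the current interval exceeds Bob's:

```python
def majority_kw(x, y):
    """Find an index i with x_i != y_i, given that x has a majority of 1's
    and y has a majority of 0's.  In every round the players exchange the
    number of 1's each holds in the two halves of the current interval and
    keep a half on which Alice's count still exceeds Bob's."""
    lo, hi = 0, len(x)                      # current interval [lo, hi)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        a_left, b_left = sum(x[lo:mid]), sum(y[lo:mid])
        # Alice's total exceeds Bob's, so she leads on at least one half
        if a_left > b_left:
            hi = mid
        else:
            lo = mid
    return lo                               # here x[lo] = 1 and y[lo] = 0
```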