
18-859S:

Analysis of Boolean Functions

Administrivia

Me: Ryan O’Donnell; email: [email protected]

Office hours: Wean 7121, by appointment

Web site: http://www.cs.cmu.edu/~odonnell/boolean-analysis

Mailing list: Please sign up! Instructions on web page.

Blog: http://boolean-analysis.blogspot.com

Evaluation:

• About 5 problem sets.

• 2 or 2.5 scribe notes, graded (each worth the same as a problem set).

The Boolean Function

All things to all people

x      f(x)
0000   0
0001   1
0010   1
0011   1
0100   0
0101   1
0110   1
0111   1
1000   0
1001   1
1010   1
1011   1
1100   1
1101   1
1110   1
1111   1

What: Truth Table

To whom: Complexity theorists, circuit designers

What: Subset of the Discrete Cube (with Hamming distance)

To whom: Geometers of the cube – combinatorialists, coding theorists, metric space types

What: “Concept”

To whom: Machine Learning theorists

Objects: each described by n "features".

Example (a spam message):

From: Tami Curran <[email protected]>
To: <[email protected]>
Date: Nov 8 2006 - 12:55pm

"Visit our new online pharmacy store and save up to 80%. Only we offer: - All popular drugs are available (Viagra, Cialis, Levitra and much much more) - World Wide Shipping - No Doctor Visits - No Prescriptions - 100%
CLICK TO FIND OUT ABOUT MORE SPECIAL OFFERS AND VISIT OUR NEW ONLINE PHARMACY STORE"

Feature vector for this message:

"Viagra"    1
"Cialis"    1
"Levitra"   1
".com.ng"   0
"Credit"    0
"Mortgage"  0
"Lottery"   0
ALL CAPS    1

f : message's features ↦ SPAM / NOT-SPAM

What: Set System

To whom: Extremal & algebraic combinatorialists

n-element "universe"; an input x ↔ a set X ⊆ [n]; f ↔ a collection of subsets:

a "Set System", or "Hypergraph", or "Simplicial Complex" (if f is monotone)

What: Graph Property

To whom: Statistical physicists, Probabilists, Random k-SAT-ers

x ↔ an actual graph: a graph with n "potential" edges, one coordinate per edge.

f = a property of graphs; e.g., percolation (left-right crossing).

Also good for:

Ising Model

Erdős-Rényi random graph model

Random k-SAT satisfiability (for k-reg. hypergraphs)

What: Voting Scheme / Social Choice

To whom: Econometricians, political scientists

n voters; x = the votes (0 = one candidate, 1 = the other); f(x) = the winner.

Examples: majority, electoral college, dictatorship.

What: Set of integers

To whom: Number theorists, additive combinatorialists

• "How dense a set do you need to guarantee an arithmetic progression of length k?"

• "Suppose f indicates the primes; is there a nontrivial solution to f(x) f(x+a) f(x+2a) = 1?"

“Fourier / Harmonic Analysis of Boolean Functions”

=

A set of techniques for studying structural properties

of boolean functions.

What does it mean for f to be…

• “simple”

• “fair”

• “symmetric”

• “spread out / concentrated”

• “pseudo- or quasi-random”

• “low-degree” ?

When is a Boolean function “simple”?

“Juntas”

Definition: f : {0,1}^n → {0,1} is called an r-junta if f actually depends on only some subset of r out of the n coordinates.

Fourier Analysis

f = Temperature (a real-valued function on a circle).

As t → ∞: f changes according to the Heat Equation, a differential equation.

Basic solutions: 1, sin(2πx), cos(2πx), sin(4πx), cos(4πx), sin(6πx), …

Every such f is expressible as a linear combination of these "frequencies".

Fourier Analysis of Boolean Functions

Basic solutions: Parity (XOR) functions on the 2^n subsets of coordinates.

Every f : {0,1}^n → ℝ is expressible as a linear combination of these "frequencies":

this linear combination is the Fourier expansion of f; its coefficients are the Fourier coefficients of f.

– Displacement?

As t → ∞: changes via a "Diffusion" differential equation.
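To make this concrete, here is a minimal brute-force sketch (my own illustration, not course code). Writing the parity function on S ⊆ [n] as χ_S(x) = (−1)^{Σ_{i∈S} x_i}, every f : {0,1}^n → ℝ has the expansion f = Σ_S f̂(S)·χ_S with f̂(S) = E_x[f(x)·χ_S(x)] under the uniform distribution; the code computes these coefficients for the 2-bit AND function.

```python
import itertools

def fourier_coefficients(f, n):
    """Brute-force Fourier coefficients of f : {0,1}^n -> R.

    chi_S(x) = (-1)^{sum_{i in S} x_i};  fhat(S) = E_x[f(x) * chi_S(x)]
    under the uniform distribution on {0,1}^n.
    """
    points = list(itertools.product([0, 1], repeat=n))
    coeffs = {}
    for S in itertools.chain.from_iterable(
            itertools.combinations(range(n), k) for k in range(n + 1)):
        chi = lambda x: (-1) ** sum(x[i] for i in S)
        coeffs[S] = sum(f(x) * chi(x) for x in points) / 2 ** n
    return coeffs

# Example: 2-bit AND, i.e. f(x) = x1 AND x2.
AND = lambda x: x[0] & x[1]
print(fourier_coefficients(AND, 2))
# {(): 0.25, (0,): -0.25, (1,): -0.25, (0, 1): 0.25}
```

So AND = 1/4 − (1/4)χ_{1} − (1/4)χ_{2} + (1/4)χ_{1,2}, which you can check pointwise.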

Hallmarks of Fourier Analysis

1. Uniform probability distribution on {0,1}^n.

2. Discrete cube graph structure.

Energy

Definition: For f : {0,1}^n → {0,1},

I(f) = E_x[ #{ i : f(x) ≠ f(x^{⊕i}) } ]

is the average sensitivity, or edge-boundary (normalized), or total influence, or energy. (Here x^{⊕i} denotes x with its ith bit flipped.)

Energy

Highest energy f ?   Parity on all bits / its negation: I(f) = n.

Lowest energy f ?   Constants: I(f) = 0.

Lowest energy balanced f ?   f(x) = x_i, or its negation. ("Dictator")

Majority?

Random function?   ≈ n/2.

(Homework: f "balanced" ⇒ I(f) ≥ 1.)
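A brute-force check of these values for a small n (a sketch with my own helper names, not course code):

```python
import itertools, random

def total_influence(f, n):
    """I(f) = E_x[ #{ i : f(x) != f(x with bit i flipped) } ], exactly, over all of {0,1}^n."""
    total = 0
    for x in itertools.product([0, 1], repeat=n):
        for i in range(n):
            y = list(x); y[i] ^= 1
            total += f(x) != f(tuple(y))
    return total / 2 ** n

n = 9
parity   = lambda x: sum(x) % 2
constant = lambda x: 0
dictator = lambda x: x[0]
majority = lambda x: int(sum(x) > n // 2)
table    = {x: random.randint(0, 1) for x in itertools.product([0, 1], repeat=n)}
random_f = lambda x: table[x]

print(total_influence(parity, n))     # n = 9       (highest possible)
print(total_influence(constant, n))   # 0           (lowest possible)
print(total_influence(dictator, n))   # 1           (lowest possible for balanced f)
print(total_influence(majority, n))   # ~2.46 here; grows like Theta(sqrt(n))
print(total_influence(random_f, n))   # ~ n/2 = 4.5
```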

Connection to Circuit Complexity

Theorem: [Linial-Mansour-Nisan + Håstad]

If f is computable by a circuit of size S and depth D, then

I(f) ≤ O(log S)^{D−1}.

In particular, f ∈ AC⁰ ⇒ I(f) ≤ polylog(n).

Hence:

• Parity ∉ AC⁰. Majority ∉ AC⁰. (Their total influences are n and Θ(√n), far above polylog(n).)

• Pseudorandom function generators ∉ AC⁰.

Lowest Possible Energy

Lowest energy balanced function that “depends essentially on all n inputs”?

Example: I(Tribes_n) = Θ(log n).

Friedgut's Theorem: For all f : {0,1}^n → {0,1} and all ε > 0,

f is ε-close to a 2^{O(I(f)/ε)}-junta.

"Tribes" = an OR (∨) of ANDs (∧) over disjoint blocks of coordinates,

with block width ≈ log_2 n − Θ(log log n) and ≈ n / log_2 n blocks.
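A sketch (my own parameter choices, not from the lecture) that computes I(Tribes) exactly from the block structure and shows the Θ(log n) growth; the closed form is checked against brute force on one tiny instance.

```python
import itertools, math

def tribes(x, w):
    """Tribes on len(x) = w*s bits: the OR over s disjoint width-w blocks of the AND of each block."""
    return int(any(all(x[j:j + w]) for j in range(0, len(x), w)))

def total_influence_bruteforce(f, n):
    """I(f) = E_x[ #{ i : f(x) != f(x with bit i flipped) } ], exact sum over {0,1}^n."""
    total = 0
    for x in itertools.product([0, 1], repeat=n):
        for i in range(n):
            y = list(x); y[i] ^= 1
            total += f(x) != f(tuple(y))
    return total / 2 ** n

def total_influence_tribes(w, s):
    """Closed form: a bit is pivotal iff the rest of its block is all 1s and no other block is all 1s."""
    return w * s * 2 ** -(w - 1) * (1 - 2 ** -w) ** (s - 1)

# Sanity check on a tiny instance (w = 2, s = 3, n = 6): both print 1.6875.
print(total_influence_bruteforce(lambda x: tribes(x, 2), 6))
print(total_influence_tribes(2, 3))

# Growth: with s ~ 2^w * ln 2 blocks (so Tribes is roughly balanced), I(Tribes_n) = Theta(log n).
for w in range(4, 11):
    s = round(2 ** w * math.log(2))
    n = w * s
    print(n, round(total_influence_tribes(w, s), 2), round(math.log2(n), 2))
```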

When is a boolean function “fair”?

Influences

Definition: The influence of the ith coordinate on f is

Inf_i(f) = Pr_x[ f(x) ≠ f(x^{⊕i}) ],

where x^{⊕i} is x with its ith bit flipped and x is drawn uniformly (the Impartial Culture (IC) assumption).

I.e., the probability the ith voter is a "swing voter". AKA the Banzhaf Power Index.

Proposition: I(f) = Σ_i Inf_i(f).

Influences

For a fair voting scheme, do you want influences large or small?

Inf_i(Parity) = 1.      Inf_i(x_j) = 1 if i = j, 0 else.

Inf_i(Majority_n) = Θ(1/√n).      Inf_i(Tribes_n) = Θ(log n / n).
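The Majority value can be checked exactly for odd n: voter i is pivotal iff the other n − 1 votes split evenly, so Inf_i(Majority_n) = C(n−1, (n−1)/2) / 2^{n−1}. A small sketch (my own, not course code) comparing this with √(2/(πn)) = Θ(1/√n):

```python
import math

def inf_majority(n):
    """Exact Inf_i(Majority_n) for odd n: voter i is pivotal iff the other n-1 votes split evenly."""
    return math.comb(n - 1, (n - 1) // 2) / 2 ** (n - 1)

for n in [11, 101, 1001, 10001]:
    print(n, round(inf_majority(n), 5), round(math.sqrt(2 / (math.pi * n)), 5))
# Inf_i(Majority_n) tracks sqrt(2/(pi*n)), i.e. Theta(1/sqrt(n)).
```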

Influential Coalitions

Theorem: [Kahn-Kalai-Linial]

If f : {0,1}^n → {0,1} is any balanced voting scheme, then at least one candidate can bribe an o(1) fraction of the voters and win with probability 1 − o(1).

Corollary of:

KKL Theorem: For every balanced f, there is an i with Inf_i(f) ≥ Ω(log n / n).

After collecting this o(1) fraction of voters, the candidate controls the outcome with probability 1 − o(1). (Both theorems are sharp: Tribes.)

Miscounted Votes

Definition: The noise sensitivity of f at noise rate δ is

NS_δ(f) = Pr[ f(x) ≠ f(y) ],

where x is uniform and y is obtained by flipping each bit of x independently with probability δ.

Aside: In the diffusion process, running for time t corresponds to noise rate δ = ½ − ½·exp(−t).

The Best Scheme Against Miscounts

Theorems: For all 0 ≤ δ ≤ ½:

NS_δ(Dictator) = δ

NS_δ(Majority_n) → arccos(1 − 2δ)/π,  as n → ∞

NS_δ(ElectoralCollege) ≈ …,  as n → ∞

NS_δ(Tribes_n) → ½,  as n → ∞

Majority Is Stablest Theorem:

If f is balanced and

NS_δ(f) ≤ arccos(1 − 2δ)/π − ε,

then Inf_i(f) ≥ Ω(1) (a constant depending on ε and δ) for at least one i.
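A quick Monte Carlo check of the first two lines (a sketch with my own sampling parameters): the dictator's noise sensitivity is δ itself, and Majority_n's approaches Sheppard's value arccos(1 − 2δ)/π.

```python
import math, random

def noise_sensitivity(f, n, delta, samples=20000):
    """Monte Carlo estimate of NS_delta(f) = Pr[f(x) != f(y)], where x is uniform
    and y flips each bit of x independently with probability delta."""
    mismatches = 0
    for _ in range(samples):
        x = [random.randint(0, 1) for _ in range(n)]
        y = [b ^ (random.random() < delta) for b in x]
        mismatches += f(x) != f(y)
    return mismatches / samples

n, delta = 101, 0.1
dictator = lambda x: x[0]
majority = lambda x: int(sum(x) > len(x) // 2)

print(noise_sensitivity(dictator, n, delta))   # ~ delta = 0.1
print(noise_sensitivity(majority, n, delta))   # close to the limiting value below
print(math.acos(1 - 2 * delta) / math.pi)      # arccos(1 - 2*0.1)/pi ~ 0.2048
```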

Applications to P vs. NP

Q: Is it possible that for every language L in NP,

there is a poly-size family of circuits computing L on

100% of all inputs (of length n, for each n)?

A: No, assuming NP ⊄ P/poly.

What about 99%?

What about 75%?

What about 51%?

How hard is NP on average?

Avg. case NP: Slightly hard ⇒ Very hard

Say f ∈ NP, balanced, and "slightly hard": best poly-size circuit is 99% right.

Impagliazzo's Hard Core Theorem: ∃ H ⊂ {0,1}^n of size 2% · 2^n

such that no poly-sized circuit can compute f on ≥ (½ + negl.) fraction of H.

The construction: compose Tribes on 10^6 blocks with 10^6 independent copies of f,

F(x^(1), …, x^(10^6)) = Tribes_{10^6}( f(x^(1)), …, f(x^(10^6)) ).

Let F : {0,1}^{10^6 · n} → {0,1} be this function; F ∈ NP. (Why?)

On a typical input to F, about 2% · 10^6 of the f-inputs come from H.

NS_{2%}(Tribes_{10^6}) ≈ 49%.

Theorem: F is not 51%-computable by poly-size circuits.

When is a boolean function “pseudo-” or “quasi-random”?

The Opposite of Pseudorandom

Given f ’s value on M random points, can you predict f at other points?

One idea: Take some weighted majority of known f-values, based on Hamming distance.

Can this work with M ≪ 2^n ?

Examples: a table of M random points x_i (bit strings such as 01010011) together with their labels f(x_i).

Predict: f(00010101) = ?

“Learning f

(from random examples)”

Learning from Random Examples

Works if f has "long-range correlations" – e.g., small total influence I(f) or small noise sensitivity NS_δ(f).

LMN Algorithm: This will work (using an appropriate weighted majority) if

M ≥ n^{O(I(f))}.

E.g., depth-D, poly-size circuits are predictable after only n^{O(log n)^{D−1}} (quasipolynomially many) examples.

A similar theorem exists for functions with small noise sensitivity.
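As an illustration of the Fourier-based approach (a minimal sketch of the standard "low-degree algorithm"; the helper names, the ±1 convention, and the demo target are mine, not the lecture's): estimate every Fourier coefficient of degree ≤ d from the random examples and predict with the sign of the truncated expansion.

```python
import itertools, random

def chi(S, x):
    """Parity character chi_S(x) = (-1)^{sum_{i in S} x_i} for x in {0,1}^n."""
    return (-1) ** sum(x[i] for i in S)

def learn_low_degree(examples, n, d):
    """Low-degree algorithm (sketch): estimate fhat(S) = E[(-1)^{f(x)} chi_S(x)] for all
    |S| <= d from labeled examples, and predict with the sign of the truncated expansion."""
    sets = [S for k in range(d + 1) for S in itertools.combinations(range(n), k)]
    est = {S: sum((1 - 2 * y) * chi(S, x) for x, y in examples) / len(examples)
           for S in sets}
    def hypothesis(x):
        val = sum(est[S] * chi(S, x) for S in sets)
        return 0 if val >= 0 else 1     # the +1 side of the +/-1 expansion corresponds to label 0
    return hypothesis

# Demo: learn Majority of the first 3 coordinates of x in {0,1}^10 from random examples.
random.seed(0)
n, d, M = 10, 2, 2000
target = lambda x: int(x[0] + x[1] + x[2] >= 2)
train = [tuple(random.randint(0, 1) for _ in range(n)) for _ in range(M)]
train = [(x, target(x)) for x in train]
h = learn_low_degree(train, n, d)

test = [tuple(random.randint(0, 1) for _ in range(n)) for _ in range(2000)]
print(sum(h(x) == target(x) for x in test) / len(test))
# close to 1.0: the degree-<=2 part of this target already determines its sign
```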

Learning with Queries

Goldreich-Levin Theorem: From any "one-way function" g : {0,1}^n → {0,1}^n,

can produce a "hard-core predicate" f : {0,1}^{2n} → {0,1}.

Proof by contraposition: gives a learning algorithm, using queries, for

learning f ’s large Fourier coefficients.

GL algorithm put to positive use in Learning Theory:

Theorem: [Mansour] Poly-size DNF (depth-2 circuits) learnable with queries

in time n^{O(log log n)}; Fourier techniques.

Jackson’s Theorem: Improved to poly time & queries, by adding an ML technique.

Quasirandomness

Fix a small set of simple statistical tests; an object is quasirandom if it passes all of them.

For graphs: Graph G with edge density p is quasirandom if,

for each O(1)-size graph H,

G has roughly the "expected" number of copies of H.

For boolean functions: Function f with E[ f ] = p is quasirandom if,

(one weak possible notion) for each O(1)-junta h : {0,1}^n → {0,1},

f has roughly 0 correlation with h.

(I.e., given h(x), you’d still guess p for Pr[ f(x) = 1].)
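To make "roughly 0 correlation" concrete, here is a small sketch (my own illustration): it computes Cov(f(x), h(x)) = E[f·h] − E[f]·E[h] exactly for a 2-junta test h, once for a random function and once for a blatantly non-quasirandom function (a dictator).

```python
import itertools, random

def correlation(f, h, n):
    """Cov(f(x), h(x)) = E[f(x)h(x)] - E[f(x)]E[h(x)] under uniform x, exact over {0,1}^n."""
    pts = list(itertools.product([0, 1], repeat=n))
    Ef = sum(f(x) for x in pts) / len(pts)
    Eh = sum(h(x) for x in pts) / len(pts)
    Efh = sum(f(x) * h(x) for x in pts) / len(pts)
    return Efh - Ef * Eh

n = 12
random.seed(1)
table = {x: random.randint(0, 1) for x in itertools.product([0, 1], repeat=n)}
random_f = lambda x: table[x]      # a "typical" random function: should look quasirandom
dictator = lambda x: x[0]          # blatantly non-quasirandom
junta_h  = lambda x: x[0] & x[1]   # an O(1)-junta test (here, a 2-junta)

print(correlation(random_f, junta_h, n))   # ~ 0 (small random fluctuation)
print(correlation(dictator, junta_h, n))   # 0.125: noticeably correlated with the junta
```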

Quasirandomness & “Tests”

Håstad's Test:
● Pick x ~ {0,1}^n uniformly.
● Pick y ~ {0,1}^n uniformly.
● Set z = x ⊕ y.
● Set w ~ z  (a noisy copy of z: each bit flipped independently with probability ε).

● Test whether f(x) ⊕ f(y) ⊕ f(w) = 0.

f balanced and random: would pass with probability ½.

f a Dictator: would pass with probability 1 − ε.

Theorem: If f is balanced and quasirandom, it passes the test with probability ≤ ½ + o(1).

Almost the canonical Fourier Analysis problem; where we’ll start.

x = 101000011011111
y = 000100000101110
z = x ⊕ y = 101100011110001
w = 001100010100000  (z with a few bits flipped)
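A small simulation of the test (my own sketch; the noise rate and trial count are arbitrary choices): a dictator passes with probability about 1 − ε, while a random function passes with probability about ½.

```python
import random

def hastad_test_pass_rate(f, n, eps, trials=20000):
    """Fraction of trials in which f(x) XOR f(y) XOR f(w) == 0, where x, y are uniform,
    z = x XOR y, and w is z with each bit flipped independently with probability eps."""
    passes = 0
    for _ in range(trials):
        x = [random.randint(0, 1) for _ in range(n)]
        y = [random.randint(0, 1) for _ in range(n)]
        w = [xi ^ yi ^ (random.random() < eps) for xi, yi in zip(x, y)]
        passes += (f(x) ^ f(y) ^ f(w)) == 0
    return passes / trials

n, eps = 20, 0.05
random.seed(0)
table = {}
def random_function(x):
    return table.setdefault(tuple(x), random.randint(0, 1))
dictator = lambda x: x[0]

print(hastad_test_pass_rate(dictator, n, eps))          # ~ 1 - eps = 0.95
print(hastad_test_pass_rate(random_function, n, eps))   # ~ 0.5
```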

Håstad’s “Hardness of Approximation”

Corollary: [Håstad’s Test + “PCP machinery”]

Given a system of 3-variable linear equations mod 2, e.g.,

x1 ⊕ x3 ⊕ x7 = 0
x2 ⊕ x4 ⊕ x7 = 1
x1 ⊕ x5 ⊕ x6 = 0
x6 ⊕ x8 ⊕ x9 = 0

which is 99%-satisfiable, no efficient algorithm can find

a solution satisfying 51% of the equations. (Unless P = NP.)

Proof Idea

Test yields an NP-hardness gadget for reductions:

m-coloring graphs → 3-variable mod-2 equations.

Blocks are 99%-satisfiable because of Dictators – these “encode” the m colors.

Håstad’s Test Theorem:

any f satisfying ≥ 51% of a block is noticeably correlated with O(1) coordinates,

⇒ "decodable" to O(1) Dictators/colors.

Each vertex ↦ a block of 2^m variables x_{000}, x_{001}, …

Testing f(000) ⊕ f(010) ⊕ f(011) = 0 with probability .05 ↦ the equation x_{000} ⊕ x_{010} ⊕ x_{011} = 0 with weight .05.

Thursday:

When is a boolean function “linear”?

And what is its Fourier expansion?