The Particle Swarm: Theme and Variations on
Computational Social Learning
James Kennedy
Washington, DC
Kennedy.Jim@gmail.com
The Particle Swarm
A stochastic, population-based algorithm for problem-solving
Based on a social-psychological metaphor
Used by engineers, computer scientists, applied mathematicians, etc.
First reported in 1995 by Kennedy and Eberhart
Constantly evolving
The Particle Swarm Paradigm is a Particle Swarm
A kind of program comprising a population of individuals that interact with one another according to simple rules in order to solve problems, which may be very complex.
It is an appropriate kind of description of the process of science.
Two Spaces
The social network topology, and
The state of the individual as a point in a Cartesian coordinate system
Moving point = a particle
A bunch of them = a swarm
Note memes: “evolution of ideas” vs. “changes in people who hold ideas”
Cognition as Optimization
Cognitive consistency theories, incl. dissonance
Feedforward Neural Nets
Minimizing or maximizing a function result by adjusting parameters
Parallel constraint satisfaction
Particle swarm describes the dynamics of the network, as opposed to its equilibrium properties
Mind and Society
Minsky: Minds are simply what brains do.
No: minds are what socialized human brains do.
… solipsism …
Dynamic Social Impact Theory
i = f(S, I, N): impact as a function of the Strength, Immediacy, and Number of sources
Computer simulation – 2-d CA
Each individual is both a target and source of influence
“Euclidean” neighborhoods
Binary, univariate individuals
“Strength” randomly assigned
Polarization
Nowak, A., Szamrej, J., & Latané, B. (1990). From private attitude to public opinion: A dynamic theory of social impact. Psychological Review, 97, 362-376.
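The 2-D cellular-automaton simulation described above can be sketched as follows. This is a minimal illustration in the spirit of Nowak, Szamrej & Latané (1990), not their exact model: binary univariate individuals, randomly assigned strengths, and influence falling off with squared Euclidean distance. The function name, grid size, and the unit self-distance are my assumptions.

```python
import random

def social_impact_step(grid, strength, size):
    """One synchronous update of a 2-D CA of binary attitudes: each cell
    adopts whichever attitude exerts the greater total impact, where a
    neighbor's impact is its random 'strength' divided by squared
    Euclidean distance (a sketch of dynamic social impact theory)."""
    new_grid = [row[:] for row in grid]
    for i in range(size):
        for j in range(size):
            impact = {0: 0.0, 1: 0.0}
            for a in range(size):
                for b in range(size):
                    d2 = (i - a) ** 2 + (j - b) ** 2
                    if d2 == 0:
                        d2 = 1.0  # assumed: self-influence at unit distance
                    impact[grid[a][b]] += strength[a][b] / d2
            new_grid[i][j] = 0 if impact[0] > impact[1] else 1
    return new_grid

random.seed(1)
SIZE = 10
grid = [[random.randint(0, 1) for _ in range(SIZE)] for _ in range(SIZE)]
strength = [[random.random() for _ in range(SIZE)] for _ in range(SIZE)]
for _ in range(5):
    grid = social_impact_step(grid, strength, SIZE)
# After a few steps the minority attitude typically survives only in
# spatial clusters -- the polarization effect noted on the slide.
```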
Particle Swarms: The Population
To understand the particle swarm, you have to understand the interactions of the particles
One particle is the stupidest thing in the world
The population learns
Every particle is a teacher and a learner
Social learning, norms, conformity, group dynamics, social influence, persuasion, self-presentation, cognitive dissonance, symbolic interactionism, cultural evolution …
Neighborhoods: Population topology
Gbest
Lbest
Particles learn from one another. Their communication structure determines how solutions propagate through the population.
(All N=20)
von Neumann (“Square”) Topology
Regular, easy to understand, works adequately with a variety of versions – perhaps not the best for any version, but not the worst.
The Particle Swarm: Pseudocode
Initialize population and constants
Repeat
    Do i = 1 to population size
        CurrentEvali = eval( )
        If CurrentEvali < pbesti then do
            pbesti = CurrentEvali
            For d = 1 to Dimension
                pid = xid
            Next d
            If CurrentEvali < pbestgbest then gbest = i
        End if
        g = best neighbor’s index
        For d = 1 to Dimension
            vid = W*vid + U(0, AC) × (pid – xid) + U(0, AC) × (pgd – xid)
            xid = xid + vid
        Next d
    Next i
Until termination criterion
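The pseudocode above can be sketched in Python. This assumes the gbest topology (so the best neighbor is the population best) and the Type 1″ constriction values from a later slide (W = 0.7298, AC = 1.496); the function name and the search bounds are my choices for the example.

```python
import random

def canonical_pso(eval_fn, dim, pop_size=20, iters=500,
                  lo=-5.0, hi=5.0, w=0.7298, ac=1.496):
    """Minimize eval_fn with a gbest-topology canonical particle swarm."""
    x = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    v = [[0.0] * dim for _ in range(pop_size)]
    p = [xi[:] for xi in x]                  # personal best positions
    pbest = [eval_fn(xi) for xi in x]        # personal best values
    gbest = min(range(pop_size), key=lambda i: pbest[i])
    for _ in range(iters):
        for i in range(pop_size):
            cur = eval_fn(x[i])
            if cur < pbest[i]:
                pbest[i] = cur
                p[i] = x[i][:]
                if cur < pbest[gbest]:
                    gbest = i
            g = gbest  # gbest topology: best neighbor = population best
            for d in range(dim):
                v[i][d] = (w * v[i][d]
                           + random.uniform(0, ac) * (p[i][d] - x[i][d])
                           + random.uniform(0, ac) * (p[g][d] - x[i][d]))
                x[i][d] += v[i][d]
    return pbest[gbest], p[gbest]

random.seed(0)
best, pos = canonical_pso(lambda xs: sum(c * c for c in xs), dim=5)
```

On the 5-dimensional Sphere function this reliably converges toward zero within a few hundred iterations.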
Pseudocode Chunk #1
CurrentEvali = eval( )
If CurrentEvali < pbesti then do
    pbesti = CurrentEvali
    For d = 1 to Dimension
        pid = xid
    Next d
    If CurrentEvali < pbestgbest then gbest = i
End if
Can come at top or bottom of the loop
In gbest topology, g = gbest
It’s useful to track the population best
Pseudocode Chunk #2
g = best neighbor’s index
For d = 1 to Dimension
    vid = W*vid + U(0, AC) × (pid – xid) + U(0, AC) × (pgd – xid)
    xid = xid + vid
Next d
Constriction Type 1″: W = 0.7298, AC = W × 2.05 = 1.496
Clerc 2006: W = 0.7, AC = 1.43
Inertia: W might vary with time, in (0.4, 0.9), etc.; AC = 2.0 typically
Note three components of velocity
Vmax - not necessary, might help
Q: Are these formulas arbitrary?
Some Standard Test Functions
Sphere
Griewank
Rosenbrock
Rastrigin
Schaffer’s f6
Test Functions: Typical Results
Step-Size Depends on Neighbors
pi = 0, pg = 0
pi = +2, pg = −2
pi = +0.1, pg = −0.1
Movement of the particle through the search space is centered on the mean of pi and pg on each dimension, and its amplitude is scaled to their difference.
Exploration vs. exploitation: automatic
… “the box” …
Search distribution – Scaled to Neighborhood
Q: What is the distribution of points that are tested by the particle?
Previous bests constant at +10
A million iterations
“Bare Bones” particle swarm
x = G((pi + pg)/2, abs(pi – pg))
G(mean, s.d.) is Gaussian RNG
Simplified (!)
Works pretty well, but not as good as canonical.
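The bare-bones update on this slide, x = G((pi + pg)/2, |pi – pg|), can be sketched directly; there is no velocity at all, only a Gaussian draw per dimension. The function name is mine.

```python
import random

def bare_bones_step(x, p, g):
    """Bare-bones PSO: replace each coordinate with a Gaussian sample
    centered on the mean of the personal and neighborhood bests, with
    s.d. equal to their absolute difference (per the slide)."""
    for d in range(len(x)):
        mean = (p[d] + g[d]) / 2.0
        sd = abs(p[d] - g[d])
        x[d] = random.gauss(mean, sd)
    return x

random.seed(2)
x = bare_bones_step([0.0, 0.0], p=[1.0, -1.0], g=[3.0, 1.0])
# Each coordinate is now a fresh Gaussian draw around the midpoint of
# the two bests; exploration collapses automatically as p and g converge.
```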
Kurtosis: tails trimmed vs. not trimmed
Empirical observations with p’s held constant
Peaked, with fat tails
Mean moments of the canonical particle swarm algorithm with previous bests set at +20, varying the number of iterations.
Iterations   Mean      S.D.      Skewness   Kurtosis
1,000         0.0970   37.7303   -0.0617      8.008
3,000         0.0214   41.5281    0.0814     18.813
10,000       -0.0080   41.6614   -0.0679     40.494
100,000       0.0022   41.7229    0.2116    170.204
1,000,000     0.0080   41.3048    0.3808    342.986
Bursts of Outliers
“Volatility clustering” seems to typify the particle’s trajectory
Adding Bursts of Outliers to Bare Bones PSO
Center = (pid + pgd)/2
SD = |pid – pgd|
xid = G(0, 1)
If Burst = 0 and U(0,1) < PBurstStart then Burst = U(0, maxpower)
Else if Burst > 0 and U(0,1) < PBurstEnd then Burst = 0
End if
If Burst > 0 then xid = xid ^ Burst
xid = Center + xid × SD
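A runnable sketch of the burst mechanism above. The probabilities and maximum power are placeholder values of my choosing, and I preserve the sign when exponentiating, since the slide's xid ^ Burst is undefined for a negative draw and a fractional exponent.

```python
import random

def burst_sample(center, sd, state, p_start=0.01, p_end=0.1, max_power=3.0):
    """One bare-bones draw with occasional bursts of outliers.
    'state' holds the current burst exponent (0 = no burst).
    Sign is preserved when exponentiating (an assumption; the slide's
    xid ^ Burst is undefined for negative xid and fractional Burst)."""
    if state[0] == 0 and random.random() < p_start:
        state[0] = random.uniform(0, max_power)   # enter a burst
    elif state[0] > 0 and random.random() < p_end:
        state[0] = 0                              # leave the burst
    z = random.gauss(0.0, 1.0)
    if state[0] > 0:
        z = (1 if z >= 0 else -1) * abs(z) ** state[0]
    return center + z * sd

random.seed(3)
state = [0]  # mutable burst state carried across calls
xs = [burst_sample(center=0.0, sd=1.0, state=state) for _ in range(1000)]
# Most draws are ordinary Gaussians; during a burst, draws are raised to
# a random power, producing the clustered outliers seen in the plots.
```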
[Results plots: bare bones with bursts vs. canonical PS on Sphere, Rosenbrock, Rastrigin, Griewank30, Griewank10, and Schaffer’s f6. Bubbled line is canonical PS.]
“The Box”
Where the particle goes next depends on which way it was already going, the random numbers, and the sign and magnitude of the differences.
The area where it can go is crisply bounded, but the probability inside the box is not uniformly dense.
vid = W*vid +
U(0, AC) × (pid – xid) +
U(0, AC) × (pgd – xid)
xid = xid + vid
Empirical distribution of means of random numbers from different ranges
Simulate with uniform RNG, trim tails
TUPS: Truncated-Uniform Particle Swarm
Start at current position: x(t)
Move weighted amount same direction: W1× (x(t) – x(t-1))
Find midpoint of extremes, and difference between them, on each dimension (the sides of the box)
Weight that, add it to where you are, call it the “center”
Generate uniformly distributed random number around the center, range slightly less than the length of the side
That’s x(t+1)
TUPS: Truncated-Uniform Particle Swarm
x(t+1) = x(t) + W1 × (x(t) – x(t-1)) + W2 × ((U(-1,+1) × (width/2.5)) + center)
W1 = 0.729; W2 = 1.494
Width is the difference between the highest and lowest (p – x)
Center is the midpoint of those extremes
Generates a point less than width/2 from the center
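The TUPS update can be sketched per dimension as below. The function name is mine; with only two informants (pi and pg) the "extremes" are just those two differences, and I use the W1, W2 values from the slide.

```python
import random

def tups_step(x, x_prev, p_i, p_g, w1=0.729, w2=1.494):
    """Truncated-uniform particle swarm (TUPS) update: persistence plus a
    uniform draw from a box defined by the midpoint and spread of the
    differences (p - x) on each dimension."""
    x_new = []
    for d in range(len(x)):
        diffs = (p_i[d] - x[d], p_g[d] - x[d])
        width = max(diffs) - min(diffs)
        center = (max(diffs) + min(diffs)) / 2.0  # midpoint of the extremes
        step = w2 * (random.uniform(-1, 1) * (width / 2.5) + center)
        x_new.append(x[d] + w1 * (x[d] - x_prev[d]) + step)
    return x_new

random.seed(4)
x1 = tups_step([0.0, 0.0], [0.0, 0.0], p_i=[1.0, -1.0], p_g=[2.0, 3.0])
# With x_prev == x the persistence term vanishes, and the new point lies
# in a crisp uniform box around the weighted center -- no Gaussian tails.
```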
[Results plots over 3000 iterations, comparing Canonical, TUPS, and FIPS on Sphere, Griewank30, Griewank10, Rastrigin, Rosenbrock, and f6]
Binary Particle Swarms
S(x) = 1 / (1 + exp(-x))
v = [the usual]
if rand() < S(v) then x = 1 else x = 0
Transform velocity with sigmoid function in (0 .. 1)
Use it as a probability threshold
Though this is a radically different concept, the principles of particle interaction still apply (because the power is in the interactions).
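The binary position rule on this slide is a one-liner: squash the velocity with the logistic sigmoid and use the result as a probability threshold for the bit. A sketch (function name mine):

```python
import math
import random

def binary_pso_position(v):
    """Binary PSO position rule: S(v) = 1/(1+exp(-v)) gives the
    probability that this bit is set to 1."""
    s = 1.0 / (1.0 + math.exp(-v))
    return 1 if random.random() < s else 0

random.seed(5)
bits = [binary_pso_position(v) for v in (-4.0, 0.0, 4.0)]
# Large negative velocity -> bit is almost always 0;
# large positive velocity -> bit is almost always 1;
# v = 0 -> a fair coin flip.
```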
FIPS -- The “fully-informed” particle swarm (Rui Mendes)
v(t+1) = W1 × v(t) + Σ(rand() × W2/K × (pk – x(t)))
x(t+1)=x(t)+v(t+1)
(K=number of neighbors, k=index of neighbor, W2 is a sum.)
Note that pi is not a source of influence in FIPS.
Doesn’t select the best neighbor.
Orbits around the mean of neighborhood bests.
This version is more dependent on topology.
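A sketch of the FIPS velocity update above. The slide says W2 is a sum split across the K neighbors; the total I use here (W2 = 2.992, i.e. 2 × 1.496 from the constriction slide) and the function name are my assumptions.

```python
import random

def fips_velocity(v, x, neighbor_bests, w1=0.7298, w2=2.992):
    """Fully-informed (FIPS) velocity update: every neighbor's previous
    best contributes, with the total acceleration w2 divided evenly
    among the K neighbors. The particle's own pi is not used."""
    k = len(neighbor_bests)
    v_new = []
    for d in range(len(x)):
        social = sum(random.uniform(0, 1) * (w2 / k) * (p[d] - x[d])
                     for p in neighbor_bests)
        v_new.append(w1 * v[d] + social)
    return v_new

random.seed(6)
v = fips_velocity([0.0], x=[0.0], neighbor_bests=[[1.0], [3.0]])
# The pull is toward a stochastically weighted mean of ALL neighborhood
# bests, which is why FIPS is more sensitive to the topology.
```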
Deconstructing Velocity
xid(t+1) = xid(t) + W1(xid(t) – xid(t-1)) + rand()×(W2)×(pid – xid(t)) + rand()×(W2)×(pgd – xid(t))
Because x(t+1) = x(t) + v(t+1)
we know that on the previous iteration,
x(t) = x(t-1) + v(t)
So we can find v(t)
v(t) = x(t) – x(t-1)
and can substitute that into the formula, to put it all in one line:
xid (t+1) = xid (t) + W1(xid (t)- xid (t-1)) +
Σ(rand()×(W2/K)×(pkd - xid(t)))
Generalization and Verbal Representation
… or in words …
NEW POSITION = CURRENT POSITION + PERSISTENCE + SOCIAL INFLUENCE
We can generalize the canonical and FIPS versions:
Social Influence
has two components:
Central Tendency, and Dispersion
NEW POSITION= CURRENT POSITION + PERSISTENCE + SOCIAL CENTRAL TENDENCY + SOCIAL DISPERSION
… Hmmm, this gives us something to play with …!
Gaussian “Essential” Particle Swarm
Note that only the last term has randomness in it – the rest is deterministic
meanp = (pid + pgd)/2
disp = abs(pid – pgd)/2
xid(t+1) = xid(t) + W1(xid(t) – xid(t-1)) + W2×(meanp – xid) + G(0,1)×disp
G(0,1) is a Gaussian RNG
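The Gaussian essential update can be sketched per dimension as below, using the constriction-style weights W1 = 0.7298 and W2 = 1.496 (an assumption; the slide does not restate them). Only the last term is random, exactly as noted above.

```python
import random

def gaussian_essential_step(x, x_prev, p_i, p_g, w1=0.7298, w2=1.496):
    """Gaussian 'essential' PSO update: deterministic persistence and
    pull toward the mean of the two bests, plus Gaussian noise scaled
    to half the distance between them (the only stochastic term)."""
    x_new = []
    for d in range(len(x)):
        meanp = (p_i[d] + p_g[d]) / 2.0
        disp = abs(p_i[d] - p_g[d]) / 2.0
        x_new.append(x[d] + w1 * (x[d] - x_prev[d])
                     + w2 * (meanp - x[d]) + random.gauss(0, 1) * disp)
    return x_new

random.seed(7)
x1 = gaussian_essential_step([0.0], [0.0], p_i=[1.0], p_g=[1.0])
# When pi == pg the dispersion is zero, so the step is fully
# deterministic: x moves w2 of the way toward the shared best.
```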
Gaussian Essential Particle Swarm
xid(t+1) = xid(t) + W1(xid(t) – xid(t-1)) + W2×(meanp – xid) + G(0,1)×disp
Function Trials Canonical Gaussian
F6 20 0.0015 13E-10
GRIEWANK 20 0.0086 0.0103
GRIEWANK10 20 0.0508 0.045
RASTRIGIN 20 56.862 49.151
ROSENBROCK 20 41.836 44.197
SPHERE 20 88E-16 38E-24
Various Probability Distributions
Function     Trials  Canonical  Gaussian  Triangular  Double-Exponential  Cauchy
F6           20      0.0015     13E-10    0.0057      0.001               0.0029
GRIEWANK     20      0.0086     0.0103    0.0275      0.0149              0.0253
GRIEWANK10   20      0.0508     0.045     0.0694      0.0541              0.0768
RASTRIGIN    20      56.862     49.151    140.94      47.26               33.829
ROSENBROCK   20      41.836     44.197    67.894      41.308              42.054
SPHERE       20      88E-16     38E-24    1.70E-18    2.40E-22            1.60E-17
There is clearly room for exploration here.
Gaussian FIPS
FIPScenter = mean of (pkd – xid)
FIPSrange = mean of abs(pid – pkd)
xid = xid + W1 × (xid(t) – xid(t-1)) + W2 × FIPScenter + G(0,1) × (FIPSrange/2)
Fully-informed – uses all neighbors
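The Gaussian FIPS update above, sketched per dimension. As with the other sketches, the weights W1 = 0.7298 and W2 = 1.496 and the function name are my assumptions; the center and range definitions follow the slide.

```python
import random

def gaussian_fips_step(x, x_prev, p_i, neighbor_bests, w1=0.7298, w2=1.496):
    """Gaussian FIPS update: deterministic pull toward the mean neighbor
    offset, plus Gaussian noise scaled to half the mean distance between
    the particle's own best and its neighbors' bests."""
    k = len(neighbor_bests)
    x_new = []
    for d in range(len(x)):
        center = sum(p[d] - x[d] for p in neighbor_bests) / k
        rng = sum(abs(p_i[d] - p[d]) for p in neighbor_bests) / k
        x_new.append(x[d] + w1 * (x[d] - x_prev[d])
                     + w2 * center + random.gauss(0, 1) * (rng / 2.0))
    return x_new

random.seed(8)
x1 = gaussian_fips_step([0.0], [0.0], p_i=[1.0], neighbor_bests=[[1.0], [1.0]])
# When pi agrees with every neighbor the range is zero and the step is
# deterministic -- the swarm has converged on that dimension.
```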
Gaussian FIPS
Function Trials Canonical Gaussian FIPS
F6 20 0.0015 0.001
GRIEWANK 20 0.0086 0.0007
GRIEWANK10 20 0.0508 0.0215
RASTRIGIN 20 56.862 36.858
ROSENBROCK 20 41.836 40.365
SPHERE 20 88E-16 41E-29
Gaussian FIPS compared to Canonical PSO, square topology.
t(38), alpha = 0.05
Function     t     p-value  Rank  Inv. rank  New alpha  Sig.
SPHERE       3.17  0.0030   1     6          0.008333   *
GRIEWANK10   2.99  0.0048   2     5          0.010000   *
GRIEWANK     2.64  0.0118   3     4          0.012500   *
RASTRIGIN    2.21  0.0333   4     3          0.016667   .
F6           0.45  0.6583   5     2          0.025000   .
ROSENBROCK   0.15  0.8782   6     1          0.050000   .
Gaussian FIPS vs. Canonical PSO
Ref: Jaccard, J. & Wan, C. K. (1996). LISREL approaches to interaction effects in multiple regression. Thousand Oaks, CA: Sage Publications.
Modified Bonferroni correction
Mendes: Two Measures of Performance
Color and shape indicate parameters of the social network – degree, clustering, etc.
Best Topologies
Best-neighbor versions
FIPS versions
Worst FIPS Sociometries
and Proportions Successful
Understanding the Particle Swarm
Lots of variations in particle movement formulas
Teachers and learners
Propagation of knowledge is central
Interaction method and social network topology … interact
It’s simple, but difficult to understand
In Sum
There is some fundamental process
• Uses information from neighbors
• Seems to require balance between “persistence” and “influence”
Decomposing a version we know is OK
• We can understand it
• We can improve it
Particle Trajectories
• Arbitrary? -- not quite
• Can be replaced by RNG (trajectory is not the essence)
How it works
• It works by sharing successes among individuals
• Need to look more closely at the sharing phenomenon itself
Send me a note
Jim Kennedy
Kennedy.Jim@gmail.com