
Natural Computing — shahed.ac.ir/stabaii/Files/NaturalComputingText.pdf · 2015-02-10


Page 1:

Natural Computing

1392 (2013–14)

References:
1- Computational Intelligence, 2nd ed. (A. P. Engelbrecht)
2- Bio-Inspired Artificial Intelligence: Theories, Methods, and Technologies (D. Floreano & C. Mattiussi)


Page 2:


Contents

1- Introduction to Natural Computing
(basic concepts: computing and intelligence; properties of natural systems; problem classes P and NP; optimization and search methods)

2- Theory of biological evolution
(Darwinian principles; the genetic mechanism of evolution: reproduction, chromosomes, genes)

3- Genetic Algorithms (GA): Canonical Genetic Algorithm, Crossover, Mutation, Control Parameters, Genetic Algorithm Variants

4- Generic Evolutionary Algorithm: Representation – The Chromosome, Initial Population, Fitness Function, Selection, Reproduction Operators, Stopping Conditions, Evolutionary Computation versus Classical Optimization

5- Genetic Programming: Tree-Based Representation, Initial Population, Fitness Function, Crossover Operators, Mutation Operators, Building Block Genetic Programming


6- Evolutionary Programming: Basic Evolutionary Programming, Evolutionary Programming Operators, Strategy Parameters, Evolutionary Programming Implementations

7- Evolution Strategies: (1+1)-ES, Generic Evolution Strategy Algorithm, Strategy Parameters and Self-Adaptation, Evolution Strategy Operators, Evolution Strategy Variants

8- Differential Evolution: Basic Differential Evolution, DE/x/y/z, Variations to Basic Differential Evolution, Differential Evolution for Discrete-Valued Problems

9- Cultural Algorithms: Culture and Artificial Culture, Basic Cultural Algorithm, Belief Space, Fuzzy Cultural Algorithm

10- Coevolution: Coevolution Types, Competitive Coevolution, Cooperative Coevolution

11- Simulated Annealing

13- Particle Swarm Optimization: Basic Particle Swarm Optimization, Social Network Structures, Basic Variations, Basic PSO Parameters, Single-Solution Particle Swarm Optimization

14- Ant Algorithms: Ant Colony Optimization Meta-Heuristic, Cemetery Organization and Brood Care

15- Artificial Immune Systems: Classical View, Artificial Immune System Algorithm, Classical View Models, Clonal Selection Theory Models, Network Theory Models, Danger Theory Models

Page 3:


Applications and Other AIS models

16- Cellular Systems: The Basic Ingredients, Cellular Automata, Modeling with Cellular Systems, Some Classic Cellular Automata, Other Cellular Systems, Computation, Artificial Life, Analysis and Synthesis of Cellular Systems

17- Artificial Neural Networks: Artificial Neuron, Learning, Supervised Learning Neural Networks, Supervised Learning Rules, Ensemble Neural Networks, Unsupervised Learning Neural Networks, Hebbian Learning Rule, Principal Component Learning Rule, Learning Vector Quantizer, Self-Organizing Feature Maps, Radial Basis Function Networks, Learning Vector Quantizer-II, Reinforcement Learning, Learning through Awards, Model-Free Reinforcement Learning Models, Neural Networks and Reinforcement Learning

17- Fuzzy Systems: Fuzzy Sets, Formal Definitions, Membership Functions, Fuzzy Operators, Fuzzy Set Characteristics, Fuzziness and Probability, Fuzzy Logic and Reasoning, Fuzzy Controllers, Components of Fuzzy Controllers

18-DNA Computing

19-Quantum Computing

20-Developmental Systems

21-Behavioral Systems-Autonomous Robots

Page 4:

1- Introduction to Natural Computing

This field has appeared under several names in recent years, including Natural Computing, Computational Intelligence, and Bio-Inspired Computing.

2-1 Definitions

Computing: the design and study of algorithms and the processing of information — more broadly, the science and engineering of computation.

Computation: a sequence of operations, carried out according to an algorithm, by which inputs are transformed into results.

Intelligence: the capability to solve a problem.

Computational intelligence: software methods — algorithms inspired by natural agents and processes — used for problem solving. Two key words that recur throughout this field are learning and adaptation; systems built on these principles are called intelligent systems. Nature-inspired algorithms are typically population-based: collections of simple elements whose interaction produces the problem-solving behavior.

1-1 Intelligence in nature

Three kinds of intelligence can be seen in nature: the intelligence of the human brain, collective (swarm) intelligence, and evolution.

1-1-1 Human brain intelligence

The first attempts at artificial intelligence tried to model the capabilities of the human brain. From around the 1950s much effort went into brain-inspired computing (artificial neural networks and related algorithms), and models such as the perceptron appeared. It was hoped these models would handle tasks such as speech recognition, machine translation, and control, but the early models did not work as well as expected. Over time the emphasis shifted toward pattern processing, control, and data mining.


Attention is one of the brain's remarkable capabilities. The rate of incoming sensory information exceeds what the brain can process; since it cannot process everything it receives, the brain uses attention to concentrate only on the relevant part of the input and ignore the rest.

Evolution and learning: over millions of years of biological evolution, living organisms have accumulated remarkable problem-solving capabilities, and these can inspire methods for solving difficult engineering problems.

Developmental systems: biological organisms develop from a single cell into a complete individual; during development they can repair damage and adapt their structure to the environment. Spatially this ranges from single cells to multicellular organisms and colonies.

Behavioral systems (autonomous robots): collective behaviors emerge when many simple individuals act under local rules, yielding capabilities such as behavioral autonomy, self-healing, and social interaction — organisms adjust their behavior as conditions change, without any central design. Since the 1980s, fields such as behavior-based robotics and evolutionary robotics have built on this view, relying on simple local behaviors rather than presupposing human-like intelligence. Societies of social insects — their genetic evolution, collective life, and nest building — have attracted wide attention from this perspective. The term COMPUTATIONAL INTELLIGENCE is also commonly used for this collection of methods.

2-2 Properties of natural systems

Natural systems are composed of many simple elements that process information and organize themselves through local interaction. Their characteristic properties are listed below.

1. Individuals (Entities, Agents)

The operation of natural systems is carried out collectively by groups of agents. Each agent has:

• Some degree of autonomy/identity

Page 5:


• Entity endowed with a (partial) representation of the environment, capable of acting upon itself and the environment, capable of communicating with other agents.

2. Parallelism

Operations are carried out in parallel by many agents; the number of agents varies from system to system. For example:

brain processing – 10^11 nerve cells in the brain; immune functioning – 10^12 lymphocytes in the immune system; insect societies.

3. Interactivity

Agents interact with one another in the following ways:

– Reproductively
– Symbiotically
– Competitively
– Predator-prey
– Parasitically

4. Communication

Communication between agents takes one of two forms.

(a) Direct communication (Connectivity)
• Nodes and connections encode information
• Connectionist systems:
– Pathways of interaction between units
– Connection strength quantifies degree of interaction
– Structural organization in a network

(b) Indirect communication (Stigmergy)
• A general mechanism that relates individual and colony-level behaviors:
– Individual behaviors modify the environment
– The modified environment alters the behavior of other individuals
– Communication is indirect
– Example: termite workers are stimulated to act during nest building according to the configuration of the construction left by other workers.

5. Adaptation

Agents adjust their behavior to match the environment (adaptability). Adaptation at the level of the individual is called learning; at the level of the society it is called evolution.

6. Feedback

The execution of one agent's actions affects the actions of other agents; this influence is feedback. Positive feedback amplifies and reinforces a behavior.


Negative feedback damps a behavior and keeps it in check: predator-prey regulation and the balancing (homeostatic) mechanisms in the bodies of organisms are examples.

7. Self-Organization

Order emerges from below, without central control. Examples:
– Birds gathering, chemical reaction patterns, fish schools, cells making tissues
– Pattern: an organized arrangement of objects in space or time (e.g., ants, termites, bees)

Characteristic features of self-organizing behavior:
– Collectivity and interactivity
– Dynamics in time
– Emergent patterns
– Nonlinearities
– Complexity
– Rule-based
– Feedback loops

By contrast, the following are NOT self-organization:
1) Following a leader  2) Building a blueprint  3) Following a recipe  4) Templates

8. Complexity

The overall state of the system is more complex than the states of its elements, as a multicellular organism is more complex than its cells. There is a large number of interacting components, and the aggregate behavior is nonlinear and exhibits self-organization.

The collective behavior of the whole is more complex than the sum of the behaviors of its independent parts. The way such a system takes shape stems from:
– Interaction
– Diversity
– Adaptivity

9. Determinism, Chaos, Fractals

Structures are seen in nature that look similar at small and large scales (fractals); such self-similar patterns are often produced by chaotic processes governed by simple deterministic rules.

Page 6:


10. Self-repair.

2-3 Why take inspiration from nature?

Researchers model natural phenomena for two reasons: to understand them more precisely, and to exploit them for solving engineering problems. Some of the pioneering work and pioneers of the field:

•1948, Turing: proposes “genetical or evolutionary search” •1962, Bremermann: optimization through evolution and recombination

First generation:
• Evolutionary Programming (Fogel), 1965; Genetic Algorithms (Holland), 1975; Evolution Strategies (Rechenberg, Schwefel), 1964

Second generation:
• Genetic Evolution of Data Structures (Michalewicz); Genetic Evolution of Programs (Koza), 1992; Hybrid Genetic Search (Davis); Tabu Search (Glover)

Third generation:
• Artificial Immune Systems (Forrest); Cultural Algorithms (Reynolds); DNA Computing (Adleman); Ant Colony Optimization (Dorigo); Particle Swarm Optimization (Kennedy & Eberhart); Memetic Algorithms; Estimation of Distribution Algorithms

Classification of natural-computing methods

Natural-computing methods can be classified by the scale of the natural phenomenon from which the algorithm is derived:

• Subatomic – Quantum Computing
• Atoms – Simulated Annealing
• Molecules – Molecular Computing
• Individual – Immunocomputing
• Individual – Neural Networks
• Populations – Evolutionary Computation
• Populations – Swarm Computing


• Populations – Artificial Life

Evolutionary computation (Evolving Populations)

A population under constantly changing conditions serves as the model: random genetic changes that occur during reproduction, filtered by selection, let the population track its environment. Evolutionary computation (EC) algorithms exploit this capability and are used for:

Stochastic search & optimization; combinatorial optimization (discrete solution spaces); general optimization problems; and automated programming.

Swarm intelligence

A key part of the performance of collective natural systems is that complex behavior emerges from many simple members. Parallel, distributed processing of this kind is itself a form of intelligence, and it is this capability that swarm methods imitate.

Swarm Intelligence example: an ant colony can find the shortest path between its nest and a food source.

The first studies showing that such groups need no leader date to 1986, when the flocking flight of birds was simulated (Reynolds' "boids"). The interesting result is that flock behavior is produced by simple agents, each of which follows the local rules below:

• Separation: steer to avoid crowding local flock mates.
• Alignment: steer toward the average heading of local flock mates.
• Cohesion: steer toward the average position of local flock mates.

These rules are simple, and each member decides on its own motion from them alone — none of them said, "Form a flock." Yet on this basis the simulated birds adjust their courses, turn together, and even split around an obstacle and rejoin on the far side: coherent flocking emerges.

Page 7:


Immune System

The body's immune system can tell microbes from non-microbes without any central authority making the decision. It is distributed throughout the body and continually adapts to conditions that are themselves constantly changing.

2-4 Course topics

Under the headings above, the following topics will be covered:

Evolutionary Computation
1. Biological detail
2. Genetic algorithms
3. Genetic programming
4. Evolutionary programming
5. Evolution strategies
6. Simulated annealing
7. Differential evolution
8. Coevolution and cultural evolution
9. Swarm intelligence: Particle Swarm Optimization, Ant Colony Optimization
10. Artificial immune systems
11. Evolutionary neural networks
12. Genetic fuzzy modelling
13. Cellular automata, L-systems & fractal geometry
14. DNA computing & quantum computing

The material in this course is drawn largely from the reference books listed above; for each topic, related software and programming tools are introduced as well.

2-5 Domains of natural computing

Natural computing is pursued from three angles:

1. Finding new nature-inspired processing algorithms for problem solving — this is what the name Computational Intelligence refers to.
2. (Computer) simulation and synthesis of natural phenomena.
3. Using natural media as computing hardware in place of silicon computers — e.g., DNA computing. Today's computer hardware performs its computations with silicon transistor gates; the human brain does the same job with a system of neurons. Given the physical limits facing current silicon technology, new natural computing media may become necessary.


The topics belonging to each of these areas are shown in the figure below.

Strengths and weaknesses of today's computers

Today's computers perform impressively on large volumes of data: information processing and display, analysis, mathematical simulation and modeling, optimization, and complex safety-critical tasks such as landing an aircraft.

Progress in computing has in turn driven progress in mathematics (mathematical programming). Yet alongside these strengths, current systems have structural weaknesses:

• reliance on a central processor or server
• not robust
• not scalable (or limited scalability)
• not adaptive
• static: only does what it's told to do (and how)

Despite the many capabilities they give their users, today's computers remain complex and vulnerable, and in networked (Internet) settings they come under constant attack.

The desired computing system: the central goal of computing is to build a system that can
1. operate with a high degree of autonomy;
2. act robustly when faced with noisy information;
3. adapt when its environment changes;
4. process information in parallel and at high speed;
5. learn from its environment and approach human competence;
6. improve its own performance over time — that is, evolve.

Page 8:


If such a system existed, many currently hard problems would become tractable. The table below contrasts today's computing systems with biological computing systems.

Problem difficulty: P vs NP

By the amount of computation they require, problems divide into two classes:

• easy (or "tractable", or "in P")
• hard (or "intractable", or "not known to be in P": NP)

Many practically important problems fall in the hard class. P is the set of problems whose computation is bounded by a polynomial in the input size n. Examples:

Polynomial: the dominant term in the running-time expression is polynomial in n, e.g. n^3, n·log n.
a. Correctly sort a set of n numbers: can be done in around n·log n steps.
b. Find the closest pair out of n vectors: can be done in O(n^2) steps.

Problems for which no polynomial-time algorithm is known are placed in the class NP (non-deterministic polynomial).

• Exponential: the dominant term is exponential in n, e.g. 1.1^n, n^(n+2).

Find the best design for an n-structural-element bridge: can be done in O(10^n) steps. If one elementary operation takes 0.1 µs, the table below shows the time needed for the computation as n grows.
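The growth rates above can be made concrete with a small script, using the same 0.1 µs per elementary operation as in the notes:

```python
import math

OP_TIME = 1e-7  # 0.1 microseconds per elementary operation

def runtime(ops, n):
    """Wall-clock seconds needed to perform ops(n) elementary operations."""
    return ops(n) * OP_TIME

for n in (10, 20, 50):
    print(f"n={n:2d}"
          f"  n*log2(n): {runtime(lambda m: m * math.log2(m), n):9.2e} s"
          f"  n^2: {runtime(lambda m: m ** 2, n):9.2e} s"
          f"  10^n: {runtime(lambda m: 10.0 ** m, n):9.2e} s")
```

Already at n = 20 the exponential 10^n column needs on the order of 10^13 seconds (hundreds of thousands of years), while the polynomial columns remain far below a millisecond — which is the whole point of the easy/hard distinction.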

At present no one has found a polynomial-time exact algorithm for NP-complete problems. Among further examples of problems in P, the minimum spanning tree of a graph can be mentioned.

c. Minimum Spanning Tree (MST) problems: connect all the nodes of a graph at minimum total cost. Applications include:


communications network backbones; electricity distribution networks; water distribution networks; etc.

In the example in the figure below, the cost of each link is marked. The problem is to find a subgraph that connects all the nodes of the graph, without creating any cycle, at minimum total cost. The minimum-cost solution is shown in the figure below: its cost is 20.
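The MST problem sits in P because greedy algorithms solve it in low-order polynomial time. Below is a minimal sketch of Prim's algorithm; the 5-node network and its link costs are invented for illustration and are not the values from the figure.

```python
import heapq

def prim_mst(n, edges):
    """Prim's algorithm: nodes 0..n-1, edges given as (u, v, weight).
    Returns the total weight of a minimum spanning tree."""
    adj = {u: [] for u in range(n)}
    for u, v, w in edges:
        adj[u].append((w, v))
        adj[v].append((w, u))
    visited = {0}              # grow the tree from node 0
    heap = list(adj[0])        # frontier edges, cheapest first
    heapq.heapify(heap)
    total = 0
    while heap and len(visited) < n:
        w, v = heapq.heappop(heap)
        if v not in visited:   # cheapest edge reaching a new node joins the tree
            visited.add(v)
            total += w
            for e in adj[v]:
                heapq.heappush(heap, e)
    return total

# hypothetical 5-node network with made-up link costs
links = [(0, 1, 4), (0, 2, 1), (1, 2, 2), (1, 3, 5), (2, 3, 8), (3, 4, 3)]
print(prim_mst(5, links))  # prints 11
```

The greedy choice — always take the cheapest edge leaving the current tree — is provably optimal here, which is precisely what breaks down once extra constraints are added, as discussed next.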

Seemingly small changes to a problem statement can flip it from easy to hard. For example:

adding a degree constraint to the MST (no node of the spanning tree may have degree greater than 3) turns it into a hard problem — to find the best solution, a far larger number of configurations must be examined.

Under such constraints a large share of candidate solutions may not even satisfy the conditions, so merely finding a feasible solution becomes hard as well.

Difficulties of imitating natural systems

Borrowing from natural systems comes with several difficulties.

1. Biological systems are multi-purpose: they are built for movement, feeding, reproduction, and self-defense all at once, while an engineered system usually needs a single function. Precisely isolating the relevant mechanism from the tangle of interacting factors, in order to model it, is difficult.

2. Biological systems are shaped by their environment and their history. Separating their behavior from the environment, and capturing it in a formula precise enough for a given problem, remains a standing challenge.

3. Nature may offer several different mechanisms for the same problem; which one to pick is not always obvious.

Page 9:


��+" 1. qV���� �V���4 �#V� �V���� &� ����5 t�� &�$� "X� >��B� <=�/� <U �� �B+; !���� ��7�g� $'� �.U

&��+./�� &�$� ��G#�!�� ��������.30.�$�' ��$T ���`.�� ��#� [���`�Y �� ����� L./�� D�, �; !/�� D�� �#%�� . ��V��5 4� _T�V� t�� �.U

�#V &���$V� �@; S? &��.8� q�T� �:/� �+.U �; !/�� ���F7� ) ��� ��`� ���#� �� ����� &� . $.V ���� ���� &��7�� ^��� $� !���� 4� �� �.8$' $� &��+.��#�7� �V�� ��V �.V#� �V�T�) ����5 . S? <V�7�

!�� �#G#� &$�#�@��; ����B�� �� �G#� �� S? &4�� ����5 S�B�� S��)? L�$8 . �B�V ��.��V� �V� H�J� &�$� >!�� �.8$' ���7� FZ� 4� �; ��c,electroactive polymers �Vu��� �$B*+, �; ���)� �� t$�� S�#���

���� S�#� �� ��; �� �@; �� � �$;. 31.existence proof : �V.`' {�V��� $V/�� S? �,#�c� s#� !��� �$k �.8� �� i�`�� !���� �� &F�k $'�

4� &$�' ���� �� �$�? �B*� !�� ���)��� !��� ��#��G ��� 4� �� �+�� z+� !��� �����#� &)�# �; �# ��!�� ���� ��a�� S? !��� &�$� �B�$�? �����#� . z��$� D�+ �� D������ D| �� S�'��$5 4�)$5 �� 4�)$5 ����

��� ��$; ���).2. ��Va�� >�V��+� �V� DV)� �� S? ��V��? S#Vk K7h�� >���� �#G) !���� 4� �� �.8$' F��? !��8#� ")� &�)��� �����

!�� �G#� ���? &)� q����.32. �V� S�#.� !�� DB+� ��#+� S�#�, �� S�'�V�$5 �� �)$V.B7� zVc� 4� S? H$V.�; ) <V+U &�$V� S? 4� �)� ���

�$; ���`.�� ��,X�� &�)? j+G ) D���)�� .!8$' ���.�� �� �$�? �����$8 FZ� $��� ����, �� . &�$� ���?4� ���$; ���`.�� ��� &��7����� S�$B/U.

2-6 Optimization

Optimization means finding the best solution. "Best" is defined by an objective function (to be minimized or maximized). Since maximization converts trivially to minimization, optimization is also referred to as minimization.

Objective Function & solution space: consider the example below.


f(x) is the objective function and g(x) are the constraint functions. Maximizing f(x) is equivalent to minimizing −f(x).

If the function has a single optimum it is called unimodal, otherwise multimodal. The example in the figure below is multimodal: one of the function's maxima is larger than all the others and is called the global maximum, while the rest are called local maxima.

Solution Space

The set of candidate solutions is called the solution space; it may be continuous or discrete, one-dimensional or many-dimensional — the set of all "candidates". In the figure above the search space is continuous: x ∈ R.

As an example of a discrete search space, consider a binary optimization problem: each variable x_i takes the value 0 or 1, and the length of the vector x equals the number of variables:

x = {x_1, x_2, ......},  x_i ∈ {0, 1}

One of the candidate solutions is x = {0, 1, 1, 0, ...}. The search space is therefore solution space = {0, 1}^n.

Problem size and the cost of computation

The cost of computation is a function of problem size: the work needed to find the best solution grows like O(log n) or O(n^2) for easy problems, versus O(5^n) and the like for hard problems.

Example 1

Page 10:


Suppose we have 4 items (item 1: 20 kg; item 2: 75 kg; item 3: 60 kg; item 4: 35 kg) and we want to choose the subset of them with the greatest total weight. The evaluation function is the total weight; the candidate subsets are the bit-vectors of length 4.

To us the answer to this problem is obvious — 1111, with evaluation value 190 — but for the computer (for which we must write the program) the answer is found only by search.

Optimization methods

Depending on the structure of the problem, one of the following methods may be used to find the optimum.

1- Analytical Approach: based on the derivative formula
• The objective function must be continuous and differentiable.
• Setting the derivative of the function equal to zero and solving yields the solution of the problem. Even then, a closed-form root may not be obtainable (for equations beyond low degree, no general analytic method exists).
• Difficulty: many practical problems have no usable analytical form at all.

2- Algorithmic (Deterministic) Search: based on a definite procedure
• An algorithm (not necessarily a formula) is designed to solve the problem; the problem must lend itself to this. The method resembles the analytical approach but is better suited to the computer.
• The computer executes the algorithm step by step and arrives at the global solution.
• Difficulty: finding such an algorithm is not always possible.
• Even when it exists, its running time should be polynomial in the input size to be practical.

Example: sorting numbers by an algorithmic method.
Problem: given a set of n numbers, return them in order from lowest to highest.
Solution space: all possible orderings of the n numbers (size is n!).
Objective function: f(x) is the number of elements out of order in some ordering x.
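The sorting-as-optimization formulation above can be written out directly. Here f counts adjacent out-of-order pairs — one concrete reading of "number of elements out of order" — and the optimum is found by brute-force enumeration of the n! orderings:

```python
from itertools import permutations

def f(x):
    """Objective: number of adjacent pairs out of order in ordering x."""
    return sum(1 for a, b in zip(x, x[1:]) if a > b)

nums = (3, 1, 2)
best = min(permutations(nums), key=f)  # enumerate all n! orderings
print(best, f(best))  # prints (1, 2, 3) 0
```

A sorted ordering is exactly one with f(x) = 0 — though of course a real sorting algorithm reaches it in n·log n steps rather than by searching all n! candidates.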


3- Search

Search can be carried out in the following ways.

1- Enumerative Approach: examine every possibility
• All candidate solutions are evaluated.
• The best is reported as the final answer.
• Difficulty: usable only for small problems of class P.

2- Gradient Descent: search guided by the slope of the function
• A starting point is chosen from prior information (or at random).
• The slope of the function is computed at that point (the derivative must exist) and a new point is taken in the downhill direction; this repeats until the slope becomes zero (a solution).
• Difficulty: there is no guarantee of reaching the global solution.
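The gradient-descent loop just described fits in a few lines. The objective here is a made-up one-dimensional example, f(x) = (x − 3)², whose derivative is 2(x − 3) and whose minimum is at x = 3:

```python
def gradient_descent(df, x0, lr=0.1, steps=200):
    """Repeatedly move against the slope df until (hopefully) it vanishes."""
    x = x0
    for _ in range(steps):
        x -= lr * df(x)  # step in the downhill direction
    return x

# f(x) = (x - 3)^2, so df(x) = 2*(x - 3); minimum at x = 3
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=-5.0)
print(round(x_min, 6))  # prints 3.0
```

On this unimodal example the method converges to the global minimum; on a multimodal function the same loop would simply stop at whichever local minimum the starting point leads to — the weakness noted in the last bullet.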

3- Random search
• A number of points are chosen at random and evaluated.
• The best of them is reported as the answer (it may or may not be the true optimum).
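Random search is even simpler to sketch; the objective is the same made-up f(x) = (x − 3)² used above, and the sample count and interval are arbitrary choices:

```python
import random

def random_search(f, low, high, n_samples=1000, seed=0):
    """Evaluate n random points; return the best seen (no optimality guarantee)."""
    rng = random.Random(seed)
    return min((rng.uniform(low, high) for _ in range(n_samples)), key=f)

f = lambda x: (x - 3) ** 2   # minimum at x = 3
x_best = random_search(f, -10, 10)
```

With enough samples the best point found is usually close to the optimum, but as the bullet says, there is no guarantee — the answer is only as good as the luckiest sample.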

4- Intelligent search, such as Simulated Annealing, Genetic Algorithms, and the like.

Page 11:


- Hillclimbing
• Choose a point x and evaluate it.
• Choose a neighboring point y of x and evaluate it.
• If y is better than x, set x = y; otherwise discard y and choose another neighbor.
• This operation is repeated.
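The hillclimbing loop above, sketched for a one-dimensional minimization (the neighbourhood size, iteration count, and the toy objective f(x) = (x − 3)² are all assumed values):

```python
import random

def hillclimb(f, x0, step=0.1, iters=500, seed=1):
    """Accept a random neighbour y of x only when it improves f; repeat."""
    rng = random.Random(seed)
    x = x0
    for _ in range(iters):
        y = x + rng.uniform(-step, step)  # a neighbour of x
        if f(y) < f(x):                   # y better than x? then x = y
            x = y                         # otherwise y is discarded
    return x

f = lambda x: (x - 3) ** 2
x_best = hillclimb(f, x0=0.0)
```

Because moves are accepted only when they improve f, the search climbs steadily on this unimodal function — and, for the same reason, it would get stuck on the nearest local optimum of a multimodal one.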

Genetic Algorithm
• A number of points are chosen at random and evaluated.
• New points are selected according to the GA strategy.
• The search proceeds both in depth and in breadth.
• Difficulty: there is no guarantee of convergence.
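A minimal genetic-algorithm sketch in the spirit of these bullets — tournament selection, one-point crossover, bit-flip mutation. The OneMax fitness (count of 1-bits) and all parameter values are assumed toy choices, not taken from the notes:

```python
import random

def simple_ga(fitness, n_bits=10, pop_size=20, gens=40, p_mut=0.05, seed=0):
    """Minimal GA: tournament selection, one-point crossover, bit-flip mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(gens):
        def pick():                          # binary tournament selection
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n_bits)   # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [b ^ (rng.random() < p_mut) for b in child]  # mutation
            nxt.append(child)
        pop = nxt
        cand = max(pop, key=fitness)
        if fitness(cand) > fitness(best):    # remember the best ever seen
            best = cand
    return best

# OneMax: maximize the number of 1-bits in the string
best = simple_ga(fitness=sum)
```

The population gives the breadth of the search, while selection pressure gives the depth; nothing in the loop guarantees convergence to the optimum, exactly as the bullets state.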

2-7 Classification

In a classification problem, we have a set of things to classify and a number of possible classes. To classify s we use an algorithm called a classifier, so classifier(s) gives us a class label for s. We can assign a fitness value to a classifier — this can simply be the percentage of examples it gets right. In finding a good classifier, we are therefore solving the following optimization problem: search a space of classifiers, and find the one that gives the best accuracy. E.g., the classifier might be a neural network, and we may use an EA to evolve the NN with the best connection weights.
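The fitness-of-a-classifier idea can be shown with a toy example. The 1-D threshold classifier, the labelled data, and the grid of 11 candidate thresholds are all invented for illustration (in place of the neural network the text mentions):

```python
def classifier_fitness(classifier, examples):
    """Fitness = fraction of labelled examples the classifier gets right."""
    return sum(classifier(s) == label for s, label in examples) / len(examples)

# hypothetical labelled data: 1-D inputs with 'low'/'high' labels
examples = [(0.1, 'low'), (0.4, 'low'), (0.6, 'high'), (0.9, 'high')]

def make_classifier(threshold):
    return lambda s: 'high' if s >= threshold else 'low'

# "search a space of classifiers": here the space is 11 candidate thresholds
best_t = max((t / 10 for t in range(11)),
             key=lambda t: classifier_fitness(make_classifier(t), examples))
print(best_t, classifier_fitness(make_classifier(best_t), examples))
```

Swap the threshold grid for a population of weight vectors and the `max` for an evolutionary loop, and this becomes the EA-evolves-the-NN setting described above.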

Questions

1. What does "computing" mean?
2. What is meant by individual adaptation (learning) and collective adaptation (evolution)? Give examples.
3. Name some applications that take inspiration from natural systems.
4. Name a few nature-inspired algorithms.
5. List the operational properties of natural systems, giving an example for each.
6. What are the shortcomings of current computing systems?
7. Give examples of P and NP problems.


8. Describe the optimization methods, with an example for each.
9. What are the search methods? Give examples.

Page 12:

2. Simple genetic algorithm

2-1 Biological Evolution

2-1-1 The rules of Darwinian evolution

Principle 1: Survival of the fittest

1. Living organisms reproduce in numbers beyond what can survive; populations tend to multiply (survival of the line is protected by an excess of offspring).
2. Organisms that are more successful in competing for limited resources have a higher chance of survival and of increasing their kind.
3. Which individuals persist and which disappear is determined by natural selection.
4. In evolution, fitness is a secondary (derived) measure: its practical yardstick is the number of surviving offspring.

Principle 2: Diversity drives change

1. Every member has phenotypic traits — its appearance, behavior, and physical form — which determine how it behaves and how well it survives in its environment.
2. Traits are the product of both heredity and the conditions of reproduction and environment.
3. If a trait contributes to greater survival and reproduction, the share of that trait in the next generation increases.
4. As a result of reproduction, new combinations of traits come into being.

Summary
• Random variation of traits, and their recombination during reproduction, is the source of the diversity on which selection acts.


• Combinations of traits belonging to better-adapted individuals make up a larger share of the population in the next generation.
• Diversity, together with this "selection of traits", is what makes a species evolve.
• No external driving force is required.

2-1-2 The pillars of Darwinian evolution

Biology advances continuously in explaining the components of living organisms and how they interact. One of its most fundamental theories is the theory of evolution. Darwinian evolution rests on four pillars: population, diversity, heredity, and selection for survival.

(a) Population

Evolution acts on a population, a collection of individuals; one cannot speak of the evolution of a single fixed organism.

(b) Heredity

Through reproduction, traits are transmitted to offspring; children are therefore copies of their parents except where sources of variation intervene.

(c) Diversity

1. Individuals carry differing traits. These differences exist both among the members of one species and, far more visibly, between different species.
2. Through recombination in reproduction, offspring acquire combinations of traits different from those of their parents.
3. From the standpoint of the theory of evolution, the diversity generated in reproduction is what drives change.
4. New behaviors and structures arise as trait combinations whose direction of change is continually tested against the environment's capacity to sustain them.
5. In practice, the complexity of the factors that affect the mechanisms and dynamics of evolution has not yet been sufficiently understood to allow the development of a universal formalism. Nonetheless, several formal models have been developed to address specific issues, mainly in the field of population genetics.

(d) Selection

1. Only part of the population is selected to survive and reproduce; higher fitness means a larger influence on the generations that follow. Natural selection was proposed in the nineteenth century by Darwin (1859) and Wallace (1870).
2. Natural selection: selection is by no means random. For instance, in an environment where food resources are limited, those with a greater capability for gathering food will have a larger influence on the next generation.

Page 13:


3. Ecological selection: organisms better matched to their habitat survive and reproduce more, increasing their share of the population.
4. Neutral evolution: new traits that have no effect on survival or reproduction are still passed on, more or less unchanged, to later generations — showing that heritable change does not require a fitness advantage.
5. Sexual selection: traits are favored because they increase the chance of reproduction.
6. Purifying selection: the removal of deleterious mutations keeps defective variants rare in the population.
7. Positive selection: increases the frequency of beneficial gene variants.
8. Balancing selection: maintains diversity within the population.
9. Frequency-dependent selection: a trait such as sexual attractiveness affects fitness through how common it is among the members of the population.
10. Kin/group selection: some individuals sacrifice themselves — even offering themselves as prey so that relatives may feed. Here it is not the survival of the individual that matters but the survival of the kind.
11. Heterozygote advantage: heterozygous individuals sometimes survive better than homozygous ones.
12. Baldwinian evolution: organisms with the capability to learn can discover, and thereby introduce, new selection pressures.

��+"1. ������G e�:.�� S�#�, �� ����� e�:.�� 4� ������fittest ��" D�$.��" �# �� ��$� ���.2. !8$ �5 ���� �� ���F7� ����� e�:.��progress !V/�� S? &#VZ7 ���� �� . $V.�� "$�hV5 �V��� �V� �VB*�

!�� �g��� \��$.

Adaptation to the environment versus progress

Reading 1: progress could mean each new generation being born better than the one before. But fitness has meaning only relative to a time and place (the environment prevailing then and there). A bird that functions well in the present environment might have functioned poorly under a past one; comparing individuals across different environments is meaningless unless the present environment happens to resemble the old one.

Reading 2: progress could mean becoming ever more ideal — but in that case nature would have a goal, whereas evolution has no predetermined goal or plan.

Reading 3: the one thing that can honestly be called progress is that, generation after generation, organisms grow better adapted to their current environment. And that is simply evolution itself.


2-1-3 The genetic machinery of evolution

(a) Components of the genetic machinery

1. Base: there are four types of nucleotide: adenine (A), cytosine (C), guanine (G), and thymine (T) — with uracil (U) replacing thymine in RNA.

2. Codon: each codon is a sequence of three nucleotides that codes for one amino acid, as shown in the standard codon table. There are 20 different amino acids; thus, of the 4^3 = 64 possible codons, many are synonymous, coding for only 20 amino acids.

3. Proteins: proteins are chains of hundreds of amino acids. The shape, concentration, and behavior of each protein determine the properties it confers, and different amino-acid sequences yield the enormous variety of proteins.

• When amino acids are chained together, the chain bends and twists in the three-dimensional space. The properties of a protein are determined mainly by its shape.
• Each protein is encoded by one gene.
• Genes are stretches of the genetic material (together with its codes).

• �# �� ����#� SC K� ��� �� D�E�)$5 $ &4�/7�� �� . • �B�.�C ��#� $. �� >��#' $ ��)S? &��; �� (��./ ��� �. 4. c4N�4�E ��� b�4�_ F� I"N ��: )&4�/7�� �� (�� ����E�)$5 4� &� �$�a�4)���C ( �)4#�)$; �#V �V� �.`'.

)����E�)$5 4� &����� $� ��+�(5. !��"# c4N�4�E: �.� K� �)4#�)$;DNA �� ���? H#� ) ����� ) ��# �� $�A !`G ��#c� �; !��

!�� �)�`.� �*.:� ���#G#� . S�/�� ��23 �V� �V��? &��V��#*;#� ���V�� �V; ���� �#VG) �)4#�)$; !`G��� �� S#�*�� �p D���k .�� �� S� � �� ���)4#�)$; !`G ><B.

Page 14: Natural Computing - shahed.ac.irshahed.ac.ir/stabaii/Files/NaturalComputingText.pdf · 2015-02-10 · Natural Computing 1392 1- Computational Intelligence, 2nd ed.( A. P. Engelbrecht)


6. Cell: proteins and chromosomes sit inside cells; the particular proteins a cell expresses determine the cell type, e.g. skin, hair, and nail cells.

7. DNA: the genetic macromolecule that encodes proteins is DNA (deoxyribonucleic acid).
• Proteins cannot alter DNA, so an organism's DNA stays essentially unchanged over its lifetime (apart from rare copying errors).
• An organism inherits its DNA from its parents.

8. Genome: the complete genetic material of an organism — its DNA molecules organized into chromosomes — is called its genome.
• The genome of every individual of a species is a variant of a common set of genes; genome size varies widely between species and is not a simple measure of an organism's complexity.
• gene duplication: repeated copies of genes occur.
• nongenic DNA: part of the DNA does not code for proteins and is not expressed.

9. Traits: the organism built according to a genotype — its observable characteristics — is called the phenotype.

• The genotype-to-phenotype mapping is complex. One gene may influence several traits (pleiotropy), and many genes may jointly determine a single trait (polygeny). Small differences in the genotype typically appear as small differences in the organism (e.g., height, hair colour).

(a) DNA structure: DNA resides in the cell nucleus, and every cell of an organism carries identical DNA. A DNA molecule resembles a twisted ladder (double helix): two sugar-phosphate backbones form the "sides", and complementary base pairs form the "ladder rungs". The bases are the four nucleotides: adenine (A), cytosine (C), guanine (G), and thymine (T).

The two strands of a DNA molecule are complementary and can be pulled apart and re-joined. In this pairing adenine always binds to thymine, and cytosine to guanine; so a strand containing ACA faces the complementary sequence TGT on the other strand (check that the correspondence holds position by position). The pairing is held by weak hydrogen bonds.

(b) Gene structure: a gene is the chain of codons (triplets, each specifying one of the 20 amino acids) that encodes one protein.

1. Introns: non-coding segments; they do not appear in the final protein recipe. Exons: coding segments — the parts copied into RNA and ultimately translated into the protein's amino acids.
2. Enhancer and promoter codons: facilitate RNA alignment with the gene.
3. Regulatory region: a stretch that does not itself produce protein but controls whether the gene is expressed.


4. Genes may overlap: introns of one gene can be exons of another.
5. The existence of one gene can affect the activity of another (epistasis) — a more complicated picture.
6. Only a fraction of the DNA actually codes for proteins.

7. At any point in time, a given gene can be active, inactive, or moderately active. The activity level of a gene is used to indicate the rate at which the corresponding protein is produced by means of RNA.

8. A gene is expressed from DNA in two steps, as the figure shows:

(a) Transcription: to produce a protein, the two DNA strands are pulled apart in the region containing the gene, and an RNA copy of that region is assembled from free nucleotides; this operation is called transcription. The resulting RNA molecule is much shorter than the DNA molecule.

(b) Translation: the RNA is then read codon by codon, and the corresponding amino acids are chained together to build the protein; this step is called translation.
• The translation process and its speed are controlled by the presence of special proteins that bind to the regulatory region of the gene.


• The shape of binding proteins is such that they can bind only to specific sequences of nucleotides on the regulatory region. If such a binding happens on the regulatory region of a gene, in some cases that gene expresses itself by initiating the translation of its coding region into an RNA molecule.

2-1-4 Reproduction: mitosis and meiosis. Cells multiply in two ways: mitosis and meiosis.

Mitosis: body cells reproduce by mitosis, in which the cell copies its full set of 23 chromosome pairs and divides, each daughter cell receiving an identical set.

During mitosis, the two strands of the 46 DNA molecules are separated and each strand goes to one cell. Each strand then rebuilds the missing strand by recruiting the complementary nucleotides. The process ends with two exact copies of the double-stranded DNA molecule, one for each cell.

Meiosis

Meiosis occurs during the production of sex cells (sperm and eggs). Gametes (sperm and egg cells) contain 23 individual chromosomes rather than 23 pairs. Gametes are formed by a special form of cell splitting called meiosis.

(c) Crossing over (Xover)
1. During meiosis the pairs of chromosomes undergo an operation called crossing-over.
2. Chromosome pairs align and duplicate.
3. Inner pairs link at a centromere and swap parts of themselves.

4. Outcome is one copy of maternal/paternal chromosome plus two entirely new combinations

5. After crossing-over one of each pair goes into each gamete


(d) Genetic Mutations: during reproduction, errors may creep into the copying of DNA (mutation), changing its structure and behavior. Mutations arise in several distinct forms:

1. Recombination (rearrangement) mutations: pieces of two chromosomes break off and reattach in a shuffled order. If the affected chromosomes belong to the sex cells, the rearranged combination is passed on to the offspring. Note that no new genes are created in this kind of change — only new arrangements of the parents' genes.

2. Point mutations
• Substitution mutations: replacement of one nucleotide by another (A for G, and so on). The change may produce a codon that still codes for the same amino acid; such a mutation is called synonymous or silent. If it leads to a different amino acid it is nonsynonymous and alters the genetic message.
• Inversion mutations: a segment of double-stranded DNA detaches and is reinserted rotated by 180 degrees.

(Figure: sperm cell from father + egg cell from mother → new person cell (zygote))


• Insertion and deletion mutations: a stretch of nucleotides is removed from, or added to, the DNA. This often happens when parts of chromosomes are misaligned during reproduction.

• The mutation rate of DNA in mammals has been estimated at about 4×10⁻⁹ nucleotide substitutions per nucleotide site per year.


2-2 Evolutionary algorithms

Evolutionary algorithms solve a search problem by simulating natural evolution. In this simulation:
1. Candidate solutions are encoded as chromosomes (collections of genes).
2. A selection mechanism chooses chromosomes for recombination, and a recombination mechanism combines them.
3. A mutation mechanism randomly perturbs some of the chromosomes.
4. A replacement mechanism selects which chromosomes survive and which are discarded, keeping the population size fixed.

2-2-1 Evolutionary Computing (EC)
• EC looks for good answers rather than a provably best one: natural evolution is an open-ended adaptation process whose "objective" shifts with the environment from generation to generation, which distinguishes it from classical optimization with a fixed target. What varies across EC algorithms is the definition of fitness and the kind of variation operators used.
• EC algorithms are weak (general-purpose) methods compared with classical, problem-specific optimization algorithms.
• EC is used for NP(hard) problems and for problems that are discontinuous or nondifferentiable, or that have many parameters and noisy, non-deterministic evaluations.
• EC has many branches, but all of them rest on the same pillars of natural evolution: a population, variation, selection (survival), and inheritance. They differ in their mechanisms for encoding, inheritance, selection, mutation, and reproduction; apart from these, the overall skeleton of the algorithms is the same.
• Members of this family include: genetic programming, differential evolution, evolutionary strategies, and evolutionary programming.

• Genetic algorithms (GA): the first algorithms for simulating genetic systems were proposed by Fraser, and later by Bremermann and by Reed et al., but it was through Holland's work that the GA became widely known; Holland is therefore regarded as the father of GA.
• Initially developed by John Holland, University of Michigan (1970s).
• Holland's original GA is now known as the simple genetic algorithm (SGA).

• The overall structure of the algorithm is as follows:

begin GA
  g := 0                            { generation counter }
  Initialize population P(g)
  Evaluate population P(g)          { evaluation }
  while not done do
    g := g + 1
    Select parents P(g) from P(g-1) { parent selection }
    Crossover P(g)                  { recombination }
    Mutate P(g)                     { mutation }
    Evaluate P(g)                   { evaluation }
    Select survivors                { survivor selection }
  end while
end GA
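The loop above can be sketched as a minimal runnable GA. The fitness function (counting 1-bits, i.e. the classic OneMax toy problem) and all parameter values here are illustrative assumptions, not fixed by the text.

```python
import random

def onemax(bits):
    # Illustrative fitness (an assumption, not from the text): count of 1-bits.
    return sum(bits)

def simple_ga(n_bits=20, pop_size=30, pc=0.9, pm=0.01, generations=50, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        def pick():
            # Parent selection: binary tournament on fitness.
            a, b = rng.sample(pop, 2)
            return a if onemax(a) >= onemax(b) else b
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = pick(), pick()
            if rng.random() < pc:                    # one-point crossover
                cut = rng.randrange(1, n_bits)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]
            for c in (c1, c2):                       # bit-flip mutation
                for j in range(n_bits):
                    if rng.random() < pm:
                        c[j] ^= 1
                nxt.append(c)
        pop = nxt[:pop_size]                         # generational replacement
    return max(pop, key=onemax)

best = simple_ga()
print(onemax(best))
```

With these settings the best individual typically reaches or approaches the optimum of 20 within a few dozen generations; changing `pc`, `pm`, or the population size changes that behavior, which is exactly the parameter sensitivity discussed later in the notes.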

2-2-2 Problem representation (coding)

Phenotype: the candidate solution in its natural form — the meaning carried by a chromosome.

Genotype: the chain of genes; a coded string whose decoding yields one candidate answer.

Population: the set of chromosomes currently chosen for evaluation; the initial population is generated at random.

(b) Fitness (evaluation) function: each candidate is scored by the evaluation function; defining it is part of posing the problem. Example: if the goal is to maximize the number of 1s in a binary string, a natural fitness is simply the count of 1-bits in the chromosome.

Phenotype: an array of size n/8 (bytes); each byte represents an element of a vector of ASCII characters.

Selection: parents are selected at random, but guided by the fitness score:
• the fittest member is not guaranteed to be chosen;
• the weakest member is not guaranteed to be removed;
• fitter members are merely preferred;
• nothing about the outcome is deterministic.

(c) Reproduction operators: 1) Crossover (Xover): one-point crossover is one of its common methods.


2) Mutation: in the canonical GA (CGA), mutation is not the main search operator; it is added to the algorithm as a background operator for escaping local optima. Mutation injects new information and pushes the search toward new regions of the space.

3) Survivor selection (replacement): the next generation is formed from parents and offspring according to a replacement strategy.

Example 1: the figure shows part of a run of the algorithm: random initial population, fitness evaluation of the population, parent selection, reproduction via Xover, mutation, and survivor selection. Note how the average fitness of the population improves from one generation to the next; the run stops once a satisfactory member appears in the population.


Example 2: Restaurant. Suppose we want to run a restaurant whose menu of food, drink, and service style admits the binary choices below. To solve this with a GA, the genes are encoded as follows (Gene Representation).

Gene: each gene encodes one trait:
HAMBURGER PRICE: 1 = $0.50 price, 0 = $10.00 price
ACCOMPANYING DRINK: 1 = Coca Cola, 0 = Juice
RESTAURANT AMBIANCE: 1 = Fast service, 0 = Leisurely service

The chain of genes forms a chromosome; e.g., the chromosome 101 specifies the phenotype: a $0.50 hamburger, juice, and fast service.
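The decoding of a 3-bit menu chromosome can be sketched in a few lines; the gene meanings follow the table above.

```python
# Map each bit position of the 3-bit menu chromosome to its phenotype value,
# following the gene table above.
PRICE    = {1: "$0.50 hamburger", 0: "$10.00 hamburger"}
DRINK    = {1: "Coca Cola",       0: "Juice"}
AMBIANCE = {1: "Fast service",    0: "Leisurely service"}

def decode(chromosome):
    # chromosome is a 3-character bit string, e.g. "101".
    p, d, a = (int(c) for c in chromosome)
    return (PRICE[p], DRINK[d], AMBIANCE[a])

print(decode("101"))  # ('$0.50 hamburger', 'Juice', 'Fast service')
```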

Fitness Functions: the fitness function gives every member of the population a numeric score for how well it answers the problem. For the restaurant, a natural fitness for each chromosome is the weekly profit it generates. This example is deliberately simple and exactly solvable; its search space has only 8 states, and its sole purpose is to illustrate the working of the algorithm.

A restaurant serving the $10.00 hamburger with juice and leisurely service has chromosome 000, while one serving the $0.50 hamburger with Coca Cola and fast service has chromosome 111.

Measuring fitness as weekly profit (in units of, say, $10), each of the 8 chromosomes receives a score from the profit table, and the chromosome with the highest score is the best answer. Had the choice set instead contained 100 genes with 4 options each, the search space would hold 4^100 states and could not be enumerated.

Initial Population: to start the GA, an initial population of 4 members is formed at random. The entire search space in this example contains only 8 chromosomes.


Suppose these 4 members are chosen at random as the initial population, and the fitness of each is computed as shown in the figure; in this population the lowest member fitness is 1 and the highest is 6.

4) Selection: the goal is that the best parents produce the most offspring, and that the poorer members — not the best ones — are the ones discarded. Each member's share of reproduction is computed from its proportional fitness (its score divided by the population total). One possible trace of a generation:
1- the weakest member is dropped from the population, and the elite (best) member takes its place as an extra copy;
2- the elite member is copied unchanged into the next generation (R);
3- member 4 undergoes mutation (M);
4- members 1 and 2 are recombined, producing offspring 1 and 2 (C);
5- in the new generation a member with fitness 7 appears, which is the answer to the problem.

The same run could also have proceeded as follows:


In this trace the mutation step (M) does not fire. Another run:

Different branches of EC could solve this same problem in different ways. To run the algorithms discussed, the following parameters of the simple GA must be fixed:

1. REPRESENTATION SCHEME
• Fixed-length string, K = 2, L = 3
• Mapping:

a. Left bit (Price): 1 = $ 0.50 price 0 = $10.00 price b. Middle bit (Drink): 1 = Coca Cola 0 = Wine c. Right bit (Ambiance): 1 = Fast service 0 = Leisurely service

2. FITNESS MEASURE: Profit in dollars
3. CONTROL PARAMETERS FOR THE RUN
• Major parameters: Population size, M = 4; Max number of generations, G = 6
• Secondary parameters:
• Probability of crossover, Pc = 90% is typical, but Pc = 50% for this small population (i.e., 2 = 50% of M = 4)
• Probability of mutation, Pm = 1% is typical, but Pm = 25% for this small population
• Probability of reproduction, Pr = 9% is typical, but Pr = 25% for this small population

4. TERMINATION CRITERION AND RESULT DESIGNATION• Termination Criterion: Global maximum (known to be 7) attained OR total of G = 6 generations have been run • Method of Results Designation: Best-so-far (cached) individual from population


Example 3
• Simple problem: max x² over {0, 1, …, 31}
• GA approach:
– Representation: binary code, e.g. 01101 ↔ 13
– Population size: 4
– 1-point xover, bitwise mutation
– Roulette wheel selection
– Random initialisation
• We show one generational cycle done by hand.
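The hand-worked generational cycle can also be reproduced in code. The initial population and all random draws below are fixed by a seed, so this is just one possible run, not the trace in the figure.

```python
import random

def fitness(bits):
    # Decode the 5-bit chromosome as an integer x in {0,...,31}; fitness is x^2.
    x = int("".join(map(str, bits)), 2)
    return x * x

rng = random.Random(0)
pop = [[rng.randint(0, 1) for _ in range(5)] for _ in range(4)]

# Roulette-wheel parent selection (fitness-proportionate).
total = sum(fitness(c) for c in pop) or 1
def spin():
    r, acc = rng.random() * total, 0.0
    for c in pop:
        acc += fitness(c)
        if r <= acc:
            return c
    return pop[-1]

parents = [spin() for _ in range(4)]

# Pair the parents, apply 1-point crossover, then bitwise mutation (pm = 0.01).
children = []
for p1, p2 in zip(parents[::2], parents[1::2]):
    cut = rng.randrange(1, 5)
    for c in (p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]):
        children.append([b ^ (rng.random() < 0.01) for b in c])

print(max(fitness(c) for c in children))  # best fitness after one cycle
```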

Questions

1. Name the components of natural evolution.
2. What is the difference between fitness and evolution?
3. What is the biological meaning of "selection"?


4. Describe the layers of the genetic message.
5. Explain the process of reproduction.
6. Draw the flowchart of the genetic algorithm and explain its key operators.


3 Genetic Algorithm

(Reference book, Chapter 9)

3-1 An overview of the genetic algorithm: applications of GA

1. GA has applications in economics, engineering, and the basic sciences, and is an effective technique in optimization and machine learning. It suits a remarkably broad range of problems because it places almost no conditions on the problem or its fitness function: no differentiability, continuity, convexity, or similar requirements.
2. It can search very large spaces and still turn up good candidate solutions.
3. It converges quickly to the neighborhood of good answers, but fine-tuning those answers to high precision can be slow. For this reason, hybrid algorithms that combine GA with field-specific methods usually perform better on a given problem than the plain, general-purpose GA.
4. GA also suits dynamic, noisy, and discontinuous problems, where the fitness landscape is unreliable and a degree of randomized selection is essential.

The main steps of the algorithm can be written as follows:

Algorithm parameters: population size, mutation operator, recombination operator, mutation rate, recombination rate, survivor selection mechanism
Coding: representation of individuals
Let t = 0 be the generation counter
Initial population: create and initialize an nx-dimensional population, C(0), consisting of ns individuals (usually at random)
while stopping condition(s) not true do
  Evaluation: evaluate the fitness, f(xi(t)), of each individual xi(t)
  Parent selection: select parents (conventionally done probabilistically)


  Recombination: recombine pairs of parents to create offspring (perhaps 90%)
  Mutation: mutate the resulting offspring (perhaps 1%)
  Reproduction: copy some parents unchanged (perhaps 9%)
  Insert offspring into the population
  Evaluate the new candidates
  Survivor selection: select the new population, C(t + 1)
  Advance to the new generation, i.e. t = t + 1
End

3-2 Representation: encoding

Encoding 1: binary encoding of discrete phenotypes: qualitative phenotype values — e.g. a quality rated "good", "medium", or "poor" — are represented by patterns of 0s and 1s.


Further examples: the table below, taken from published studies, shows binary encodings used for discrete attributes such as opinion.

A person's hair colour takes one of several discrete values; in a GA it can be encoded with 2 bits, each gene being 0 or 1.

Encoding 2: binary encoding of integer phenotypes (Integer): an integer phenotype can be written in base 2. For instance, the phenotype 84 has genotype <01010100>. Decoding the genotype back to the phenotype:

0·2⁷ + 1·2⁶ + 0·2⁵ + 1·2⁴ + 0·2³ + 1·2² + 0·2¹ + 0·2⁰ = 84
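The decoding sum above is just positional binary notation, which can be written directly:

```python
def decode_genotype(bits):
    # Sum b_j * 2^(position), most significant bit first,
    # exactly as in the expansion above.
    return sum(b << (len(bits) - 1 - j) for j, b in enumerate(bits))

print(decode_genotype([0, 1, 0, 1, 0, 1, 0, 0]))  # 84
```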

Gray code: although plain binary coding is simple to use, it suffers from Hamming cliffs: to step from the phenotype 7 (0111) to 8 (1000), every bit must change — the Hamming distance is 4 for a phenotype change of 1. The underlying issue is that small changes in the genotype should correspond to small changes in the phenotype for the search to proceed smoothly. To avoid the problem, Gray code is used instead, in which the genotypes of adjacent phenotypes always have Hamming distance 1.

Encoding 4: binary encoding of job schedules: published work has also used binary strings for scheduling. For example, to assign each of 7 jobs to either a morning or an evening slot, one bit per job suffices, as in the codes below; if there are more than two possible slots — say 4 — each assignment takes 2 bits instead.


Encoding 5: encoding of sequences: orderings, such as the routes of the travelling salesman problem, can also be represented discretely. Label each city with a letter of the alphabet; the position of each letter in the chain then gives the visiting order. With 6 cities, for example, the genotype ADCBEF is the phenotype of the tour that starts at A, then visits D, C, B, E, and ends at F. With 30 cities there are 30! ≈ 10³² possible tours — far too many to enumerate.

Encoding 6: binary encoding of real-valued phenotypes (Real): a real-valued phenotype can be mapped to a discrete genotype using the formula below.


R = min + (max − min)·Code/(2^m − 1)

where [min, max] is the phenotype range and m is the number of bits (the resolution). The answers are quantized: higher resolution means longer chromosomes and more computation, so precision is traded against speed.
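The mapping formula can be sketched as follows; the range and bit count used here are arbitrary example values, not taken from the text.

```python
def decode_real(code, lo, hi, m):
    # R = lo + (hi - lo) * code / (2^m - 1): maps an m-bit integer code
    # onto [lo, hi] with 2^m equally spaced quantization levels.
    return lo + (hi - lo) * code / (2 ** m - 1)

# With m = 8 bits on [-1, 1]: code 0 maps to -1.0, code 255 maps to 1.0.
print(decode_real(0, -1.0, 1.0, 8), decode_real(255, -1.0, 1.0, 8))
```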

Encoding 7: real-valued representation (Real-Valued Representations): here the genotype itself is an array of real numbers. This is used when high precision is needed — for example when designing the profile of an aircraft wing, where each gene of the chromosome is one real-valued parameter of the wing model. The evolutionary operators then treat each real-valued gene as a unit, much as the discrete case treats a symbol.

Multi-variable problems: if the problem has Nvar variables, the chromosome can be written as an array of Nvar values.

For example, when maximizing a function of two variables, the chromosome of the search holds those two variables.

The variables may also be a mixture of discrete and continuous types, in which case the chromosome is mixed as well.

Key points

The designer should build whatever problem knowledge is available into the encoding, because:
1. a well-chosen encoding speeds up the discovery of good answers by the genetic operators;
2. the encoded space must still cover the region that contains the answer;
3. studies show that the choice of coding has a decisive effect on the algorithm's performance.


4. The genotype should be compact: long chromosomes blow up the search space and slow the search dramatically. Shrinking the space using knowledge available before the run — excluding regions known in advance to hold no good solutions — keeps the algorithm from wasting effort.
5. The importance of the coding method becomes clear when one considers how differently members must be selected and varied under different encodings; with a poor encoding, useful variation happens essentially by accident, as in naive encodings for electronic circuit design.
6. The choice of a suitable coding depends on the problem. In electronic circuit design, for instance, symbolic genes such as A = AND and O = OR may be chosen instead of real values.
7. There is no exact recipe for choosing a good encoding, but two guidelines help:
• Preserve locality: small changes in the genotype should produce small changes in the phenotype.
• Be closed under genetic operators: applying crossover or mutation must always yield a valid genotype.

�,� ��&� -&/� 0�����1 �� 2������

Encoding 8: tree encodings, used in genetic programming, and finite state machines (FSM), used in evolutionary programming to represent the evolving structures.

3-3 Initial Population

The initial population must be sufficiently large and diverse to cover the whole search space well.
• The larger the initial population, the better it covers the search space, but the cost of evaluating each generation grows accordingly; population size is therefore a tradeoff between coverage and computational budget.
• If the evaluation of an individual requires real-world experiments, as in the case of robot evolution, the population size is often smaller than a hundred individuals.
• The usual initialization method is uniform random generation. If prior information about good solutions exists, the initial population can be seeded with individuals derived from it, but enough random diversity must be kept to avoid premature convergence.

3-4 Fitness evaluation (Fitness Functions)

Method 1: objective fitness: the fitness function gives each member a numeric score for how well it solves the problem at hand.
• The fitness may involve several objectives at once (multiobjective), as in wing design, where lift is to be maximized while drag is minimized. Several techniques exist for handling multiple criteria; a common one is to combine them into a single weighted sum.


• Evaluation usually takes far longer than every other part of the algorithm, so most of the run time is spent computing fitness. When the system model is complex, or fitness depends on a physical experiment (a real plant or measurement), a simplified surrogate model may be used for evaluation to cut the cost; this must be done carefully, since an answer tuned to the surrogate may not carry over to the real system.

Method 2: Subjective fitness
• Subjective fitness is the name used when human observers rate the performance of evolving individuals by visual inspection. An early instance of subjective fitness was the biomorphs software described by Dawkins (1986) where an observer was presented with a screen filled by insect-like creatures whose genes defined their morphologies. The user could select individuals for reproduction by clicking on their shapes. Subjective fitness is often used in artistic fields, such as the evolution of figurative art, architectural structures, and music, where it is difficult to formalize aesthetic qualities into objective fitness functions. Subjective fitness can also be combined with objective fitness (Takagi 2001).

Method 3: proportionate fitness: instead of the absolute score, each member's relative share — its score divided by the population total, a number between zero and one — is used; this form is better suited to probabilistic selection.

Method 4: Scaling: to keep a few strong individuals from taking over the population early in the run (and to maintain selection pressure later, when the scores bunch together), the raw fitness values can be rescaled before selection. The two common scaling approaches differ in how strongly they compress the spread between the scores.

Method 5: Rank-based scoring: the members are sorted by raw fitness and assigned new scores according to their rank, so the raw scores influence selection only through their ordering. This makes the method insensitive to outliers and to the scale of the fitness values. For example, if the raw scores are 20, 4, and 1, the ranked members simply receive scores 3, 2, 1: each member's share of offspring depends only on its position in the ranking.
• Rank-based fitness assignment behaves in a more robust manner than proportional fitness assignment and, thus, is the method of choice.
• Example (raw fitness values 7, 1, 4, 2, 8):


  Fitness            7      1      4      2      8
  Proportional fit.  0.318  0.045  0.181  0.090  0.363
  Rank-based fit.    2      5      3      4      1
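The two derived rows of the table can be reproduced with a few lines (note that the table above truncates the proportional values, while `round` rounds them):

```python
raw = [7, 1, 4, 2, 8]

# Proportional fitness: each score divided by the population total (= 22).
total = sum(raw)
proportional = [round(f / total, 3) for f in raw]

# Rank-based fitness: rank 1 for the best raw score, N for the worst.
order = sorted(range(len(raw)), key=lambda i: raw[i], reverse=True)
rank = [0] * len(raw)
for r, i in enumerate(order, start=1):
    rank[i] = r

print(proportional)  # [0.318, 0.045, 0.182, 0.091, 0.364]
print(rank)          # [2, 5, 3, 4, 1]
```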

S: selective pressure (SP): the probability of the best individual being selected compared to the average probability of selection of all individuals. A low SP means a large share of the population takes part in producing the next generation; a high SP means only a small, top-ranked share is selected for reproduction. Excessive pressure destroys diversity and is undesirable; too little pressure slows the search.

Linear Ranking: let N be the population size, r the rank of each member (r = 1 for the worst, r = N for the best), and SP the selective pressure; the scaled fitness assigned to rank r is

  Fitness(r) = 2 − SP + 2·(SP − 1)·(r − 1)/(N − 1)

Linear ranking allows values of selective pressure in [1.0, 2.0].

For probabilistic (proportional) selection, the ranked fitness is normalized so that the members' probabilities sum to 1:

  p(r) = Fitness(r)/N = [2 − SP + 2·(SP − 1)·(r − 1)/(N − 1)]/N,   with Σ_{i=1..N} p(i) = 1

With SP = 2 the worst member's probability is exactly zero; with smaller SP every member keeps a nonzero share.
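The linear-ranking probabilities follow directly from the formula above, with r = 1 the worst individual and r = N the best:

```python
def linear_rank_probs(N, SP):
    # p(r) = [2 - SP + 2*(SP - 1)*(r - 1)/(N - 1)] / N, r = 1 (worst) .. N (best).
    return [(2 - SP + 2 * (SP - 1) * (r - 1) / (N - 1)) / N
            for r in range(1, N + 1)]

p = linear_rank_probs(5, 2.0)
print(p)  # with SP = 2 the worst member's probability is exactly 0
print(abs(sum(p) - 1.0) < 1e-12)  # probabilities sum to 1
```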

Method 6: Non-linear ranking: here the assigned score is a non-linear function of the rank, which permits selective pressures larger than linear ranking allows. The scores are built from powers of a quantity X, and

X is computed as the root of a polynomial determined by SP and the population size (given in the reference).


Non-linear ranking allows values of selective pressure in [1, Nind − 2]. Unlike the linear case with SP = 2, here no member's share is forced to exactly zero; the gaps between successive members' shares are simply unequal.

Method 7: Exponential ranking

Method 8: Multi-objective Ranking: the methods so far assume that a single fitness value determines each member's standing. In real problems a member often has several relevant qualities (several evaluation functions), which may even conflict. In that situation multi-objective ranking methods must be used.

3-5 Selection for reproduction

Selection appears in two places: choosing which members are combined (parent selection) and choosing which members survive (replacement). The main methods follow.

3-5-1 Deterministic strategies

1. The entire population may simply be passed on to the variation step.
2. Truncated rank-based selection: only the top n individuals are selected for reproduction. From a population of 100, for example, the top 20 are selected and each contributes 5 copies toward the next generation; the smaller n is, the larger the share of the next generation descending from each top member. Since this method keeps the top of the ranking and discards the rest wholesale, only the ordering produced by the fitness function matters — its numerical precision is not essential.

3. Tournament selection: a group of nts individuals is drawn at random (without regard to fitness), and the best member of the group — judged by fitness — is selected for reproduction. The draw is repeated for each parent to be selected.
• If nts is not too large: tournament selection prevents the best individual from dominating, thus having a lower selection pressure.
• If nts is too small: the chances that bad individuals are selected increase.
• If nts = ns, the best individual will always be selected, resulting in a very high selective pressure.
• If nts = 1, random selection is obtained.

Tournament selection: Randomly select q << λ individuals. Copy best of these q into next generation.


Repeat λ times. q is the tournament size (often q = 2).
• All the methods above rely on global population statistics, which can be a bottleneck, especially on parallel machines.
• They also rely on the presence of an external fitness function, which might not exist: e.g. evolving game players.
• Informal procedure: pick k members at random, select the best of these, and repeat to select more individuals.
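Tournament selection is only a few lines; as noted above, the tournament size q controls the selection pressure. The toy individuals and fitness below are illustrative assumptions.

```python
import random

def tournament_select(pop, fitness, q=2, rng=random):
    # Draw q individuals uniformly at random and return the fittest of them.
    contestants = rng.sample(pop, q)
    return max(contestants, key=fitness)

rng = random.Random(42)
pop = list(range(10))          # toy individuals; fitness = the value itself
winner = tournament_select(pop, fitness=lambda x: x, q=3, rng=rng)
print(winner in pop)           # True

# q = len(pop) degenerates to always picking the best (maximal pressure).
print(tournament_select(pop, lambda x: x, q=10, rng=rng))  # 9
```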

3-5-2 Stochastic strategies

1. Random Selection
• Each member is selected with equal probability, regardless of fitness. This gives the lowest possible SP: diversity is preserved, but the search degenerates toward a random walk and convergence is slow.

2. Roulette wheel selection

In this method a member with a higher score has a higher chance of selection. The mechanism is pictured as a roulette wheel on which each member occupies an arc proportional to its normalized fitness, the arcs together tiling the interval [0, 1]. A random number between 0 and 1 is drawn, and the member whose arc contains the number is selected. To select N members, the wheel is spun N times; some members are thereby copied more than once. The example below illustrates the arcs on the wheel.

Fig. 3-3: Roulette-wheel selection

• Individual 1 is the fittest and individual 11 the weakest, so their arcs are the largest and smallest. In the first spin the random number 0.8 comes up, which selects individual 6; the following spins select individuals 2, 9, and 1 in the same manner.
• Because the spins are independent, a fit member may be selected several times while a weak member may never be selected at all; on average, the number of copies each member receives is proportional to its fitness.


Roulette Wheel Selection program
  let i = 1                      { i denotes the chromosome index }
  calculate p(i); sum = p(i)
  choose r ~ U(0, 1)
  while sum < r do
    i = i + 1                    { advance to the next chromosome }
    sum = sum + p(i)
  end
  return xi as the selected individual

This method has two main drawbacks:
• The variance of the outcome is high: a strong member may be selected far more often than its expectation, or, conversely, may not be selected at all while weak members are picked by chance.
• A member whose score is much larger than the rest quickly dominates the population.
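The pseudocode above translates directly to code; `fits` holds the raw fitness values, normalized inside the function.

```python
import random

def roulette_select(pop, fits, rng=random):
    # One spin: draw r ~ U(0,1) and walk the cumulative probability.
    total = sum(fits)
    r, acc = rng.random(), 0.0
    for ind, f in zip(pop, fits):
        acc += f / total
        if r <= acc:
            return ind
    return pop[-1]          # guard against floating-point shortfall

rng = random.Random(0)
pop, fits = ["a", "b", "c"], [7.0, 2.0, 1.0]
picks = [roulette_select(pop, fits, rng) for _ in range(1000)]
# Fitter members win proportionally more spins ("a" has p = 0.7, "c" has 0.1).
print(picks.count("a") > picks.count("c"))
```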

��� �8��c� ")� ��� �#n, ��+&)�/� ��.�� S�#V� �8��c� <�7�� �7) ���V e#V� #Vn, !V�� DVB+� e�:.�� ��B� ���5 �)$V� �V*�� �#V+���� e�#G &#/� !�� DB+� d��#�C ) . �)� ")� �� !V�� DVB+� �V� �Vn,�

~�� ���/� !�UXp ���)��� !��� �+�� Xp� ���? 4� �n�� �; !�� DB+� ) ���; ���5 �� L�� D�$. ��.. When roulette wheel selection is used to create offspring to replace the entire population, ns independent calls are made to Algorithm 8.2. It was found that this results in a high variance


in the number of offspring created by each individual. It may happen that the best individual is not selected to produce offspring during a given generation.

For this reason the modified sampling procedure below, shown as pseudocode, is used instead of repeated independent spins.

Stochastic Universal Sampling
  for i = 1, …, ns do
    λi(t) = 0
  end
  r ~ U(0, 1/λ), where λ is the total number of offspring
  sum = 0.0
  for i = 1, …, ns do
    sum = sum + p(i)
    while r < sum do
      λi = λi + 1
      r = r + 1/λ
    end
  end
  return λ = (λ1, …, λns)
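The SUS pseudocode above runs as follows in code: one spin places λ equally spaced pointers, and the returned vector holds each member's copy count.

```python
import random

def sus(probs, lam, rng=random):
    # One spin: r ~ U(0, 1/lam), then lam pointers spaced 1/lam apart.
    counts = [0] * len(probs)
    r = rng.random() / lam
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        while r < acc:
            counts[i] += 1
            r += 1.0 / lam
    return counts

# With probabilities [0.5, 0.3, 0.2] and 10 offspring, every expectation
# lam * p(i) is an integer, so the copy counts are exact for any spin.
counts = sus([0.5, 0.3, 0.2], lam=10, rng=random.Random(1))
print(counts, sum(counts))  # [5, 3, 2] 10
```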

3-6 Crossover

The Xover (recombination) operator models inheritance. By recombining pieces of the population's strings, it creates the members of the next generation. From the problem-solving viewpoint, it builds new candidate solutions out of parts of existing good ones, which are then themselves evaluated; in other words, crossover performs exploitation, refining the search around what already works. Crossover can be realized in one of the following forms:
• asexual, where an offspring is generated from one parent.
• sexual, where two parents are used to produce one or two offspring.
• multi-recombination, where more than two parents are used to produce one or more offspring.


Crossover probability (crossover rate, pc)

Crossover is applied only to a fraction of the population; that fraction is denoted pc. The crossover rate, pc, also bears significant influence on performance. With its optimal value being problem dependent, the same adaptive strategies as for pm can be used to dynamically adjust pc.
• Recombining a good candidate with a poor one can wreck precisely the structure that made the good one work, so pc must be set with care. A practical heuristic is to track the best and average fitness of the offspring produced by crossover and to lower pc when they fall below the corresponding figures for the population.

3-6-1 Crossover operators for binary representations

Operator 1: One-point crossover
One-point crossover can be applied to discrete and real-valued representations. It consists of randomly selecting a crossover point on each of the two strings and swapping genetic material between the individuals around this point.

Operator 2: Uniform crossover, which uses a randomly generated mask.
Operator 3: Multipoint crossover
Multipoint crossover consists of randomly selecting n crossover points on the two strings and exchanging genetic material that falls between these points. Uniform crossover consists of exchanging the genetic content at n randomly chosen positions.

Generic Algorithm for Bitstring Crossover
Let x̃1(t) = x1(t) and x̃2(t) = x2(t);
if U(0, 1) ≤ pc then
    Compute the binary mask, m(t);
    for j = 1, . . . , nx do
        if mj = 1 then
            // swap the bits
            x̃1j(t) = x2j(t);
            x̃2j(t) = x1j(t);
        end
    end
end

Mask calculation for the different crossover operators:

One-Point Crossover Mask Calculation
Select the crossover point, ξ ~ U(1, nx − 1);
Initialize the mask: mj(t) = 0, for all j = 1, . . . , nx;
for j = ξ + 1 to nx do
    mj(t) = 1;
end

Two-Point Crossover Mask Calculation
Select the two crossover points, ξ1, ξ2 ~ U(1, nx);
Initialize the mask: mj(t) = 0, for all j = 1, . . . , nx;
for j = ξ1 + 1 to ξ2 do
    mj(t) = 1;
end

Uniform Crossover Mask Calculation
Initialize the mask: mj(t) = 0, for all j = 1, . . . , nx;
for j = 1 to nx do
    if U(0, 1) ≤ px then
        mj(t) = 1;
    end
end
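The mask-based scheme can be sketched in Python; the helper names are mine, and the mask convention follows the pseudocode above (bits are swapped wherever the mask is 1):

```python
import random

def one_point_mask(nx):
    """Mask for one-point crossover: zeros up to a random point, ones after."""
    xi = random.randint(1, nx - 1)          # crossover point in 1..nx-1
    return [0] * xi + [1] * (nx - xi)

def uniform_mask(nx, px=0.5):
    """Mask for uniform crossover: each position set independently."""
    return [1 if random.random() <= px else 0 for _ in range(nx)]

def mask_crossover(x1, x2, mask):
    """Swap the parents' bits wherever the mask is 1; returns two offspring."""
    o1, o2 = list(x1), list(x2)
    for j, m in enumerate(mask):
        if m == 1:
            o1[j], o2[j] = o2[j], o1[j]
    return o1, o2
```

The same `mask_crossover` routine serves all three operators; only the mask generator changes.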

3-6-2 Crossover operators for real-valued representations
Operator 4: the linear operator, which produces three candidate offspring from two parents and keeps the best two:
(0.5x1(t) + 0.5x2(t)), (1.5x1(t) − 0.5x2(t)) and (−0.5x1(t) + 1.5x2(t)).

Operator 5: Arithmetic crossover
Arithmetic crossover instead creates a single genotype by taking the average of n randomly chosen positions of the two genetic strings. Variants of this operator differ in how the weighting parameter is chosen for each component.


• Operator 6: Extended line recombination
Offspring are not restricted to the segment between the two parents: a random point is chosen on the line defined by the parents, possibly beyond the area delimited by them.
Inside this possible area the offspring are not distributed uniformly at random. The probability of creating offspring near the parents is high; only with low probability are offspring created far away from the parents. If the fitness of the parents is available, then offspring are more often created in the direction from the worse to the better parent. Offspring are produced according to the following rule:

Operator 7: directional heuristic crossover, which uses fitness information to bias the search direction; it requires that x2 is not worse than x1.

Operator 8: the blend crossover (BLX-α), in which offspring are created at random in a region around the parents:

with γj = (1 + 2α)U(0, 1) − α. The BLX-α operator randomly picks, for each component, a random value in the range
[x1j(t) − α(x2j(t) − x1j(t)), x2j(t) + α(x2j(t) − x1j(t))].
BLX-α assumes that x1j(t) < x2j(t).

Typically α = 0.5 is used. With this operator the offspring's distance from the parents is proportional to the distance between the parents themselves: the farther apart the two parents are, the farther from them the offspring can be placed.
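The component-wise sampling range above translates directly into code; a minimal sketch (the function name is mine, and min/max are used so the x1j < x2j assumption is not needed):

```python
import random

def blend_crossover(x1, x2, alpha=0.5):
    """BLX-alpha: sample each offspring component uniformly from the
    parents' interval extended by alpha times its width on both sides."""
    child = []
    for a, b in zip(x1, x2):
        lo, hi = min(a, b), max(a, b)
        d = hi - lo                              # distance between parents
        child.append(random.uniform(lo - alpha * d, hi + alpha * d))
    return child
```

With α = 0.5 the sampling interval is twice as wide as the parents' interval, centered on it, which is the balance the text recommends.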

Operator 9: geometrical crossover
Michalewicz et al. developed the two-parent geometrical crossover to produce a single offspring as follows:


> E�� 10 simulated binary crossover (SBX)

Deb and Agrawal developed the simulated binary crossover (SBX) to simulate the behavior of the one-point crossover operator for binary representations. Two parents, x1(t) and x2(t) are used to produce two offspring, where for j = 1, . . . , nx

where rj ~ U(0, 1), and η > 0 is the distribution index. Deb and Agrawal suggested that η = 1. The SBX operator generates offspring symmetrically about the parents, which prevents bias towards any of the parents. For large values of η there is a higher probability that offspring will be created near the parents. For small η values, offspring will be more distant from the parents.
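A common formulation of SBX uses a spread factor γj derived from rj; the sketch below follows that standard form (the function name is mine), which preserves the parents' component-wise midpoint:

```python
import random

def sbx(x1, x2, eta=1.0):
    """Simulated binary crossover (Deb & Agrawal): offspring are spread
    symmetrically about the parents; eta controls how close they stay."""
    o1, o2 = [], []
    for a, b in zip(x1, x2):
        r = random.random()
        if r <= 0.5:
            gamma = (2.0 * r) ** (1.0 / (eta + 1.0))
        else:
            gamma = (1.0 / (2.0 * (1.0 - r))) ** (1.0 / (eta + 1.0))
        # symmetric blend: o1 + o2 == a + b for every component
        o1.append(0.5 * ((1 + gamma) * a + (1 - gamma) * b))
        o2.append(0.5 * ((1 - gamma) * a + (1 + gamma) * b))
    return o1, o2
```

Large η makes γ concentrate near 1, so offspring stay close to the parents; small η lets γ stray, producing more distant offspring, exactly as described above.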

Operator 11: crossover with Metropolis (simulated annealing) acceptance
Wei et al. proposed combining crossover with an acceptance test borrowed from simulated annealing: offspring are generated by a crossover operator, ranked with an absolute fitness measure, and then accepted or rejected using the Metropolis criterion. If the offspring is better, f(x'i(t)) < f(xi(t)), it is always accepted; otherwise it survives only with a probability that shrinks with the fitness deterioration and the current temperature,

where T is the temperature coefficient and T(t) = γT(t−1), 0 < γ < 1, i.e. the temperature is reduced over time according to an annealing (cooling) schedule.
The above metropolis selection has the advantage that the offspring has a chance of surviving even if it has a worse fitness than the parent, which reduces selection pressure and improves exploration.

3-6-3 Multi-parent crossover operators
Operator 12: majority mating for binary representations
Bremermann proposed the first multi-parent crossover operators for binary representations. Given nμ parent vectors, x1(t), . . . , xnμ(t), majority mating generates one offspring using

where n'μ is the number of parents with xlj(t) = 0.
In other words, each gene of the offspring takes the value held by the majority of the parents.


Operator 13: multi-parent linear crossover
Michalewicz proposed the following method, in which nμ parents produce one offspring through a linear combination:

with the combination coefficients suitably constrained.

Operator 14: multi-parent geometrical crossover

The geometrical crossover can be generalized to multi-parent recombination as follows:

where n� is the number of parents, and

Operator 15: unimodal normal distribution crossover (UNDX)
In its basic form, UNDX produces two or more offspring from three parents; the method can be generalized to 3 ≤ nμ ≤ ns parents. In the generalized form, nμ − 1 parents are selected at random and their center of mass (mean) x̄(t) is computed.
From the mean, nμ − 1 direction vectors, dl(t) = xl(t) − x̄(t), are computed, for l = 1, . . . , nμ − 1. Using the direction vectors, the direction cosines are computed as el(t) = dl(t)/|dl(t)|, where |dl(t)| is the length of vector dl(t). A random parent, with index nμ, is selected. Let xnμ(t) − x̄(t) be the vector orthogonal to all el(t), and δ = |xnμ(t) − x̄(t)|. Let el(t), l = nμ, . . . , ns be the orthonormal basis of the subspace orthogonal to the subspace spanned by the direction cosines, el(t), l = 1, . . . , nμ − 1. Offspring are then generated using

where the random coefficients are drawn from zero-mean normal distributions, so offspring are created around the center of mass of the chosen parents. The probability of creating an offspring close to the mean is higher than that of creating one far from it. The accompanying figure illustrates this method for nμ = 4.


Operator 16: simplex crossover (SPX), version 1
In this method nμ > 2 parents are selected at random; the best two of them, x1 and x2, are determined. The center of mass, x̄, of the selected group excluding x2 is computed, and the offspring is obtained from the formula below.

Operator 17: simplex crossover, version 2
In this method nμ = nx + 1 linearly independent parent vectors are selected from the ns individuals. For nx = 2, nμ = 3, and the expanded simplex is defined by the points

for l = 1, . . . , nμ = 3 and ε ≥ 0. Offspring are obtained by uniform sampling of the expanded simplex.

Operator 18: parent-centric crossover (PCX):


In this method, in contrast to UNDX (which creates offspring around the center of mass of the parents), offspring are created around one of the parents itself.
For each offspring to be generated one parent is selected uniformly from the nμ parents. A direction vector is calculated for each offspring as

where xi(t) is the randomly selected parent. From the other nμ − 1 parents, perpendicular distances δl, for l = 1, . . . , nμ, l ≠ i, are calculated to the line di(t). The average over these distances is calculated, i.e.

Offspring are generated using

where xi(t) is the randomly selected parent of offspring x̃i(t), and el(t) are the nμ − 1 orthonormal bases that span the subspace perpendicular to di(t). The effect of the PCX operator is illustrated in Figure 4.2(c).

Multi-parent crossover: Gene Scanning
Gene scanning generalizes multipoint crossover to more than two parents. The idea is the following:
a) Each parent chromosome keeps a marker indicating the gene position currently under consideration (the marked component), and the offspring is built gene by gene.
b) (scanning strategy) For each position of the offspring, one of the parents is chosen according to some probability, and its marked gene is inherited.
c) (marker update) The markers of the parents are then advanced to the next position.
d) This continues over all positions until the offspring is complete. The order in which the parent chromosomes are arranged influences the result.
A related scheme, shown below, is diagonal crossover: n crossover points are chosen, and from n + 1 parents up to n + 1 offspring can be produced, each composed of segments taken diagonally across the parents, as the figure illustrates.


Three scanning strategies have been proposed:

• Uniform scanning creates only one offspring. The probability, ps(xl(t)), of inheriting the gene from parent xl(t), l = 1, . . . , nμ, as indicated by the marker of that parent is computed as

i.e. each parent has the same probability, 1/nμ, of contributing its gene.

• Occurrence-based scanning assumes that the value occurring most often among the parents is the best possible value: at each position the offspring inherits the majority value of the parents' marked genes.

• Fitness-based scanning lets fitter parents contribute their genes with higher probability. Considering maximization, the probability to inherit from parent xl(t) is

Roulette-wheel selection is used to select the parent to inherit from.

Gene Scanning Crossover Operator Initialize parent markers; for j = 1, . . . , nx do

Select the parent, xl(t), to inherit from; xj(t) = xlj(t);


Update parent markers; end

3-6-4 Crossover operators for permutation representations
When the genotype represents a sequence, the crossover operator must respect more constraints.
• Example: in the TSP each city must appear in the tour exactly once, so classical crossover operators, which simply cut and recombine strings, can produce invalid tours; operators on permutations must therefore preserve the permutation property.
• The crossover operator creates a genotype by taking a randomly selected part of one string and filling the remaining slots with the remaining cities arranged in the order that appears on the other string with wraparound (figure 1.11, d)).

Operator 20: Order 1 crossover
Copy a randomly selected segment from the first parent, then copy the remaining values from the second parent in the order in which they appear there (e.g. 1, 9, 3, 8, 2).
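The two copy steps can be sketched in Python; the function name is mine, and the wraparound fill (starting just after the copied slice) follows the description above:

```python
import random

def order1_crossover(p1, p2):
    """Order-1 crossover for permutations: copy a random slice from the
    first parent, then fill the remaining positions with the missing
    values in the order they appear in the second parent (wrapping)."""
    n = len(p1)
    i, j = sorted(random.sample(range(n), 2))    # slice boundaries
    child = [None] * n
    child[i:j + 1] = p1[i:j + 1]                 # step 1: slice from parent 1
    kept = set(child[i:j + 1])
    # step 2: remaining genes, in parent-2 order starting after the slice
    fill = [g for g in p2[j + 1:] + p2[:j + 1] if g not in kept]
    for k in list(range(j + 1, n)) + list(range(0, i)):
        child[k] = fill.pop(0)
    return child
```

Because only missing values are inserted, the offspring is always a valid permutation, which is the whole point of the operator.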

Operator 21: Partially Mapped Crossover (PMX)
A naive exchange of segments between two permutations, e.g.

parent 1:  1 2 3 4 5
parent 2:  5 4 3 2 1

offspring: 1 2 3 2 1
offspring: 5 4 3 4 5

produces duplicated and missing values; PMX repairs such conflicts using the mapping defined by the exchanged segment.


Operator 22: Cycle crossover
Further permutation operators include edge recombination, and so on.

3-7 Mutation
Whereas crossover recombines existing genetic material, the aim of mutation is to introduce new genetic material into the population:
• Its usefulness is problem dependent, and it matters more the more similar the two parents are.
• It improves exploration, letting the search probe promising regions of the solution space and preventing the premature loss of genetic diversity.
• It is essential for escaping local optima once the population has become homogeneous, at which point crossover can no longer act effectively.
• Mutation is applied at a certain probability, pm, to each gene of the offspring, xi(t), to produce the mutated offspring, x'i(t). The mutation probability, also referred to as the mutation rate, is usually a small value, pm ∈ [0, 1], to ensure that good solutions are not distorted too much.
• In the literature we often find mutation probabilities on the order of 0.01 per position (which is much higher than in biology).
• Given that each gene is mutated at probability pm, the probability that an individual will be mutated is given by 1 − (1 − pm)^nx,


where the individual contains nx genes. • Assuming binary representations, if H(xi(t), x'i(t)) is the Hamming distance between offspring, xi(t), and its mutated version, x'i(t), then the probability that the mutated version resembles the original offspring is given by

3-7-1 Mutation operators for binary representations

Operator 1: Uniform (random) mutation, where bit positions are chosen randomly and the corresponding bit values negated:
for j = 1, . . . , nx do
    if U(0, 1) ≤ pm then
        x'ij(t) = ¬xij(t), where ¬ denotes the boolean NOT operator;
    end
end

Operator 2: Inorder mutation, where two mutation points are selected at random and only the bits between these points undergo mutation:
Select mutation points, ξ1, ξ2 ~ U(1, . . . , nx);
for j = ξ1, . . . , ξ2 do
    if U(0, 1) ≤ pm then
        x'ij(t) = ¬xij(t), where ¬ denotes the boolean NOT operator;
    end
end
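Both operators can be sketched in a few lines of Python (function names are mine; bits are 0/1 integers and the input lists are left unmodified):

```python
import random

def uniform_bit_mutation(x, pm):
    """Operator 1: flip each bit independently with probability pm."""
    return [1 - b if random.random() <= pm else b for b in x]

def inorder_bit_mutation(x, pm):
    """Operator 2: choose two random points and mutate only the bits
    lying between them (inclusive)."""
    i, j = sorted(random.sample(range(len(x)), 2))
    y = list(x)
    for k in range(i, j + 1):
        if random.random() <= pm:
            y[k] = 1 - y[k]       # boolean NOT on a 0/1 bit
    return y
```

Inorder mutation restricts change to one random substring, so on average it disturbs a solution less than uniform mutation at the same pm.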

Operator 3: Gaussian mutation: the bitstring is decoded back to its real (floating-point) value, random noise is added (mutated with Gaussian noise; in the original proposal the step sizes are drawn from a Poisson distribution), and the mutated value is then encoded again as a bitstring.
Hinterding showed that Gaussian mutation on the floating-point representation of decision variables provided superior results to bit flipping. For large dimensional bitstrings, mutation may significantly add to the computational cost of the GA. In a bid to reduce computational complexity, Birru divided the bitstring of each individual into a number of bins. The mutation probability is applied to the bins, and if a bin is to be mutated, one of its bits is randomly selected and flipped.

Dynamic and adaptive control of the mutation rate, pm
Early GAs used a small mutation rate pm, kept constant together with the crossover rate pc. Suitable values of pm and pc depend on the problem and strongly influence the performance of the algorithm, and finding the best values experimentally is time consuming; time-varying schedules are therefore used to sidestep this difficulty. Along these lines, Fogarty showed that decreasing the mutation rate exponentially from generation to generation improves performance:

He further proposed making the mutation rate differ per bit position:

where j = 1, . . . , nb indexes the bits of a chromosome and j = nb refers to the least significant bit, so that more significant bits receive lower mutation rates. The two schedules can also be combined.

A large initial mutation rate favors exploration in the initial steps of the search, and with a decrease in mutation rate as the generation number increases, exploitation is facilitated.

• Start with a large mutation rate and decrease it deterministically as the generation number grows; a large initial pm favors exploration, a small final pm favors fine-tuning.
• Adapt pm to the fitness of individuals: less fit individuals are mutated with a higher probability, fitter individuals more conservatively.
• Use an annealing (cooling) schedule, as in simulated annealing, to control the decrease of pm.


• Operator 4: mutation for real-valued genes
In real-value representations, a selected position is modified by adding a random value drawn from a Gaussian distribution N(0, σ), where 0 is the mean and σ is the variance, in order to produce few large mutations (figure 1.12, b)).
This resulted in the development of mutation operators for floating-point representations. One of the first proposals was a uniform mutation, where

where Δ(t, x) returns random values from the range [0, x].

Mutational step sizes for real-valued representations
For real-valued chromosomes the mutation step size, which determines the magnitude of change, must be controlled. A sensible strategy is to begin with large steps, so that the search can sweep the space broadly, and to shrink the step size over time so that late mutations make only small refinements.
• Step sizes can also be proportional to the fitness of an individual, with unfit individuals having larger step sizes than fit individuals.
• As an alternative to deterministic schedules to adapt step sizes, self-adaptation strategies as for EP and ES can be used.

Operator 5: Macromutation — the “headless chicken” operator
An individual is recombined with a randomly generated individual that shares no evolutionary history with the population. Because the random partner carries no inherited material, this crossover acts in effect as a large mutation rather than as recombination.
Operator 6: mutation for permutation representations

In representations that describe sequences, as in the example of the traveling salesman problem, mutation consists in swapping the contents of two randomly chosen positions on the genotype of the individual (figure 1.12, c)). In this latter case, the probability of mutation refers to individuals, not to positions in the genotype. Normal mutation operators lead to inadmissible solutions

3-�B�.�C L.��#�7� ����� �������

3-26

• e.g. bit-wise mutation : let gene i have value j • changing to some other value k would mean that k occurred twice and j no longer occurred Therefore must change at least two values Mutation parameter now reflects the probability that some operator is applied once to the whole string, rather than individually in each position Pick two allele values at random Move the second to follow the first, shifting the rest along to accommodate Note that this preserves most of the order and the adjacency information
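The swap and insert moves just described can be sketched in Python (function names are mine; both return a new list and always yield a valid permutation):

```python
import random

def swap_mutation(perm):
    """Exchange the contents of two randomly chosen positions."""
    i, j = random.sample(range(len(perm)), 2)
    out = list(perm)
    out[i], out[j] = out[j], out[i]
    return out

def insert_mutation(perm):
    """Pick two positions at random and move the second element to follow
    the first, shifting the rest along; preserves most of the order and
    adjacency information."""
    i, j = random.sample(range(len(perm)), 2)
    out = list(perm)
    moved = out.pop(j)
    out.insert(out.index(perm[i]) + 1, moved)
    return out
```

Since elements are only rearranged, never duplicated or dropped, neither operator can produce the inadmissible solutions that bit-wise mutation would.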

Further mutation operators for permutation representations exist as well, e.g. insert, swap, inversion, and scramble mutation.

3-8 Replacement
After reproduction, the offspring must take the place of the parents in the population. If the system simply discards all parents in favor of the offspring, the best solutions found so far may be lost. The question addressed here is therefore how the next population is assembled from parents and offspring, i.e. which parents are removed when offspring are inserted.

1. elitism, consists of maintaining the n best individuals from the previous population. It is also possible to relax full generational replacement by inserting only a few offspring into the population in place of individuals that have obtained the worst fitness. In this case, we have a gradual generational rollover.

2. (μ, λ) and (μ + λ) selection, where μ is the number of parents (the population size) and λ the number of offspring produced per generation.

• After production of the offspring, (μ, λ)-selection selects the best μ offspring for the next population.
• (μ + λ)-selection, on the other hand, selects the best μ individuals from both the parents and the offspring.
3. Survivor selection can be divided into two approaches:
• Age-Based Selection: In SSGA can implement as “delete-random” (not recommended) or as first-in-first-out (a.k.a. delete-oldest)
• Fitness-Based Selection


3-9 Stopping Conditions
1. A maximum number of generations has been produced.
2. A specified number of fitness-function evaluations has been exceeded; here small tolerance parameters are also taken into account.
3. An acceptable solution, with respect to the fitness function, has been found:
If x*(t) represents the optimum of the objective function, then if the best individual, xi, is such that |f(xi) − f(x*)| ≤ ε, an acceptable solution is found; ε is the error threshold. If ε is too large, solutions may be bad. Too small values of ε may cause the EA never to terminate if a time limit is not imposed.
Further conditions:
• Terminate when the objective function slope is approximately zero.
• Terminate on reaching a certain population convergence; mechanisms can then be applied to increase diversity in order to force further exploration — for example, the mutation probability and mutational step sizes can be increased.
• Stop when no improvements have been found in the last n generations.
Problem: there is no convergence guarantee. Solution: run the GA many times with different initial populations and parameter values.

3-10 Monitoring the evolutionary process: Evolutionary Measures
3-10-1 Visualizing the fitness landscape
Since the initial population is random, a single snapshot tells little; it is helpful to display the population graphically on the fitness landscape, plotting the individuals' parameter values on the x-axis and their fitness values on the y-axis:
• If the landscape is flat, selection pressure is low and evolution proceeds slowly; the population advances faster where the fitness differences are large.
• Individuals that appear adjacent on such a plot are not necessarily close in gene space, so the picture must be read with care.
• To judge the effect of an operator such as mutation, the operator can be applied to a member and the fitness of the result compared with the original; the individual whose offspring achieve the best result can then be selected.


3-10-2 Fitness graphs
• Plot the average fitness of the population and the fitness of the best individual against the generation number, as shown in figure 1.14.
• Because each run starts from a different random initial population, the quality of the solutions varies from run to run.
• The experiment is therefore repeated several times; for each generation the values are averaged over runs and both curves are drawn on one graph.
• Such averaged curves are only meaningful when the system does not change over time: if the system is time dependent, the conditions of each run differ, so the results of separate runs cannot be meaningfully averaged.
• If average and best fitness are still rising, the run is still making progress.
• When the best fitness flattens out, it is not clear whether the best possible solution has been found or the population is trapped in a local optimum.
• A flat curve may also simply mean the population has fallen into a state from which only mutation could produce further improvement.
• Although mutation can make the curves noisy, without it the graph would change little over time.


3-11 Diversity
Many criteria have been proposed for judging whether the population has converged. For binary genotypes the number of differing bits between chromosomes can be counted, and for real-valued genotypes the following measure can be used:

the sum of the pairwise distances between all members — Hamming distance for bitstrings, Euclidean distance for real vectors — normalized over all pairs of members. Figure 1.15 shows a plot of this measure.

3-�B�.�C L.��#�7� ����� �������

3-30

• For problems with nonbinary genetic alphabets, the entropic diversity can be used instead:

where fk(α) is the frequency of the character α of the genetic alphabet A at the position k in the population genome, and l is the length of the genomes.

In this measure each gene takes one of the symbols of the alphabet A = {a1, a2, a3, a4, . . .} and the genome has length l. When genotypes are of variable length, these measures cannot be applied directly and alternatives must be sought.
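The entropic diversity can be sketched as the per-locus Shannon entropy of the allele frequencies, averaged over positions; the function name and the averaging normalization are my reading of the formula above:

```python
import math
from collections import Counter

def entropic_diversity(population):
    """Average per-locus Shannon entropy of allele frequencies.
    population: list of equal-length genomes (strings or sequences)."""
    l = len(population[0])
    n = len(population)
    total = 0.0
    for k in range(l):
        counts = Counter(g[k] for g in population)   # f_k(alpha) * n
        total += -sum((c / n) * math.log(c / n) for c in counts.values())
    return total / l
```

A fully converged population scores 0, while a population whose alleles are uniformly spread over the alphabet scores the maximum log |A| per locus.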

3-12 GA theory: Holland’s Schema Theorem
The behavior of evolutionary algorithms is hard to analyze because of their many degrees of freedom and stochastic operators; the theory available today still rests on fairly crude models of very simple systems.
John Holland proposed the notion of a schema to show how a GA can search the space of solutions effectively. The resulting theorem shows that a GA does not process the members of the population blindly as monolithic strings; rather, it implicitly processes schemata — templates for potential high-quality solutions:
1. It shows that under fitness-proportional reproduction, schemata with above-average fitness receive an exponentially increasing number of trials in successive generations, while below-average schemata receive exponentially fewer.
2. In this way the average fitness of the population increases.
3. The same style of analysis has also been applied to other evolutionary algorithms.

Schema
While a chromosome is a single individual, a schema is a set of chromosomes:

A schema (plural: schemata) is a set of points from the search space with specified similarities. A schema is described by a string of length L (i.e., same length as the strings in the search space) over an extended alphabet of size K+1 consisting of the alphabet of the representation scheme (e.g., 0 and 1 if K=2) and a don't care symbol (denoted by an #)

0001# #1# #0# The schema H1 = 1** covers the {100, 101, 110, 111} individuals.

• Schema order: the number of defined positions in the schema.
Two values can be used to describe schemata. The Order, O(H) (or specificity), is the number of defined positions in the schema. For example, the schema 1** is a schema of specificity 1 and consists of the set of points


from the search space with a 1 in the 1st position (and we don’t care what’s in the remaining positions). The order for the 0001# #1# #0# schema is 6.

• Defining length: the distance between the two outermost defined positions.
The Defining Length is the length of the sub-string between the outermost defined positions:

For the schema 0001# #1# #0#, the outermost defined positions are the 1st and the 10th, so the defining length is 10 − 1 = 9.

For the schema ##1##0##, the Order is 2 and the Defining Length is 3. The schema 11#, shown in the figure, is the line in the back connecting 110 and 111.
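Both descriptors, plus the notion of a schema covering a string, are easy to compute; a small sketch with function names of my choosing, using `#` as the don't-care symbol:

```python
def schema_order(schema):
    """O(H): number of defined (non-#) positions."""
    return sum(1 for c in schema if c != '#')

def defining_length(schema):
    """L(H): distance between the outermost defined positions."""
    fixed = [i for i, c in enumerate(schema) if c != '#']
    return fixed[-1] - fixed[0] if fixed else 0

def matches(schema, s):
    """Does string s belong to the set described by the schema?"""
    return all(h == '#' or h == c for h, c in zip(schema, s))
```

These are exactly the quantities that enter the disruption probabilities below: long defining lengths make crossover disruption likely, high orders make mutation disruption likely.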

Estimating the fitness of a schema

The true “fitness” of a schema H, in each population is taken by averaging over all possible values in the “don’t care” positions, but this is effectively sampled by the population, giving an estimated fitness f(H)

The role of the operators in the theorem
• Crossover
• The theorem shows that crossover tends to preserve short, low-order schemata of above-average fitness (building blocks) while disrupting long ones; by recombining good building blocks it assembles increasingly fit individuals. In this sense crossover is the engine of exploitation.
• A schema with a shorter defining length is more likely to survive crossover (e.g., <*,1,1,*,*> has a shorter distance than <1,*,*,*,1>).
• Mutation

[Figure: schemata such as 1##, 01#, and 1#0 drawn as hyperplanes (faces and edges) of the 3-bit hypercube with vertices 000, 001, 100, 111, . . .]

Example (schema fitness estimated from the population members it covers):
H = *1*, members present in the population: {011, 110, 010}
f(H) = (3 + 6 + 2)/3 = 3.67

3-�B�.�C L.��#�7� ����� �������

3-32

• Mutation, by creating new genetic material, opens up regions of the search space that the current population does not yet represent — exploration. A good algorithm keeps exploration and exploitation in balance.
If schema theory is accepted, pc should be chosen large, e.g. 0.8, and pm small, e.g. 0.01; the best parameter values remain problem dependent.
• The schema-theorem analysis assumes that the population is a uniform sample of the solution space. In practice this does not hold, because only those schemata that are actually sampled by the current population can be evaluated.
• In other words, as the population converges toward a few genetic strings, it no longer represents a uniform sample of the schemas.
• In the absence of building blocks or of a suitable crossover operator, the genetic recombination of individuals is analogous to a very large random mutation that may disrupt evolutionary progress.

Suppose H is a schema and x denotes a chromosome; define:
• m(H, t): the number of members of H in the population at time t
• f(H, t): the average fitness of the members of H at time t
• f(x): the fitness of member x
• f̄(t): the average fitness of the population at time t

We want to compute E(m(H, t + 1)), the expected number of members of H in the next generation. If the next generation is produced by “fitness proportional reproduction” alone — without mutation and crossover — we obtain

where x ∈ H ranges over the population members whose strings are matched by H.
• The equation shows that schemata with above-average fitness gain members at an exponential rate, while below-average schemata lose members.

Adding crossover and mutation introduces both disruption and reconstruction of schemata; to derive a lower bound on the expectation, only the disruptive case is considered.

Disruption by crossover
For one-point crossover these probabilities can be computed explicitly.
• pxo: the probability of applying one-point crossover.

E(m(H, t + 1)) = Σx∈H f(x) / f̄(t) = m(H, t) · f(H, t) / f̄(t),

since f(H, t) = Σx∈H f(x) / m(H, t).


• Sc(H): the probability that the gene cluster defined by H survives crossover, i.e. is not disrupted, so that at least one offspring still belongs to H.
• If l is the chromosome length and L(H) the defining length of the schema, the probability of disruption is bounded by the quantity derived below.

One-point crossover selects a crossover point at random from the l − 1 possible points. For a schema with defining length L(H), the random point will fall inside the schema with probability L(H)/(l − 1). If recombination is applied with probability pxo, the survival probability is
1.0 − pxo · L(H)/(l − 1),
which implies that the probability of survival of a schema under the effects of crossover satisfies the inequality
Sc(H) ≥ 1 − pxo · L(H)/(l − 1).

Disruption by mutation can be modeled in the same way:
• pm: the probability that a single gene mutates.
• o(H): the Order of H, i.e. the number of specified (defined) positions.
• Sm(H): the probability that H survives mutation, which is the probability that none of its defined positions is mutated:
Sm(H) = (1 − pm)^o(H).
The final result, incorporating all the probabilities just derived (probability that the schema is generated and survives cross-over and survives mutation), is Holland’s Schema Theorem.
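Putting the selection gain and the two survival probabilities together, the theorem's lower bound can be evaluated numerically; a sketch (function and parameter names are mine) under the one-point-crossover assumptions above:

```python
def schema_bound(m_H, f_H, f_avg, L_H, o_H, l, pxo, pm):
    """Lower bound on E(m(H, t+1)), the expected number of members of
    schema H in the next generation (Holland's schema theorem)."""
    survive_xover = 1.0 - pxo * L_H / (l - 1)   # Sc(H) lower bound
    survive_mut = (1.0 - pm) ** o_H             # Sm(H)
    return (f_H / f_avg) * m_H * survive_xover * survive_mut
```

For a short, low-order schema (small L_H and o_H) with above-average fitness, both survival factors stay close to 1, so the bound grows geometrically — the formal content of the "building block" argument.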

Short, low-order schemata with above-average fitness thus receive exponentially increasing numbers of trials in subsequent generations:

Sc(H) ≥ 1 − pxo · L(H)/(l − 1)

E(m(H, t + 1)) ≥ [f(H, t)/f̄(t)] · m(H, t) · [1 − pxo · L(H)/(l − 1)] · (1 − pm)^o(H)

3-13 Variants of the GA

3-13-1 Generation gap methods (strategies for inserting offspring into the population)
• In generational genetic algorithms (GGA) the entire population is replaced by the offspring at each generation.


• In steady state genetic algorithms (SSGA) only a few individuals are replaced in each generation, with offspring inserted into the current population. Accordingly, the generation gap (the fraction of the population replaced per generation) is close to one for a GGA and small for an SSGA.

• &�$�SSGA ��� �� y�� $�4 &���(��$.��. • Replace worst, • Replace random, • Kill tournament, where a group of individuals is randomly selected, and the worst

individual of this group is replaced with the offspring. Alternatively, a tournament size of two is used, and the worst individual is replaced with a probability, 0.5 ≤ pr ≤ 1.

• Replace oldest, This strategy has a high probability of replacing one of the best individuals. • Conservative selection combines a first-in-first-out replacement strategy with a modified

deterministic binary tournament selection. A tournament size of two individuals is used of which one is always the oldest individual of the current population. The worst of the two is replaced by the offspring. This approach ensures that the oldest individual will not be lost if it is the fittest.

• Elitist strategies of the above replacement strategies have also been developed, where the best individual is excluded from selection.

• Parent-offspring competition, where a selection strategy is used to decide if an offspring replaces one of its own parents.

3-13-2 Messy Genetic Algorithms: variable-length chromosomes
• In a classic GA the chromosome length is fixed; in an mGA it is not. Here a chromosome is a chain of blocks whose number can vary.
• Each chromosome is a sequence of tuples, each tuple giving a gene position and its value. For example, a 4-bit chromosome could be
((1, 0)(3, 1), (4, 0)(1, 1)),
meaning that positions 1, 3, and 4 carry the values 0 or 1 shown.
• A gene may be over-specified, appearing more than once (like gene 1 above), while another may be under-specified, not appearing at all (like gene 2 above, marked ×). Over-specification is resolved on a first-come-first-served basis, while under-specified genes are evaluated against a competitive template.
• The competitive template is a locally optimal solution. As an example, if 1101 is the template, the fitness of 0 × 10 is evaluated as the fitness of 0101.
• The fitness function must be adapted accordingly.
• The objective of mGAs is to evolve optimal building blocks, and to incrementally combine optimized building blocks to form an optimal solution.

An mGA consists of two nested loops.


The inner loop consists of three steps:
• Initialization to create a population of building blocks of a specified length, nm.
• Primordial, which aims to generate small, promising building blocks.
• Juxtapositional, to combine building blocks.

The outer loop
The outer loop specifies the size of the building blocks to be considered, starting with the smallest size of one, and incrementally increasing the size until a maximum size is reached, or an acceptable solution is found. The outer loop also sets the best solution obtained from the juxtaposition step as the competitive template for the next iteration.

Messy Genetic Algorithm
Initialize the competitive template;
for nm = 1 to nm,max do
    Initialize the population to contain building blocks of size nm;
    Apply the primordial step;
    Apply the juxtaposition step;
    Set the competitive template to the best solution from the juxtaposition step;
end

The initialization step

The initialization step creates all possible combinations of building blocks of length nm. For nx-dimensional solutions, this results in a population size of

n(nm) = 2^nm · C(nx, nm)

where C(nx, nm) = nx! / (nm! (nx − nm)!) counts the possible choices of the nm gene positions, and 2^nm the value assignments for each choice.

• A computational problem of the mGA is that this initial population size grows very quickly with the building-block size nm (and with the dimension nx).
• The fast mGA addresses this problem by starting with larger building block sizes and

adding a gene deletion operator to the primordial step to prune building blocks. • The primordial step is executed for a specified number of generations, applying only

selection to find the best building blocks. At regular intervals the population is halved, with the worst individuals (building blocks) discarded.

• No crossover or mutation is used. While any selection operator can be used, fitness proportional selection is usually used.

• Because individuals in an mGA may contain different sets of genes (as specified by the building blocks), thresholding selection has been proposed to apply selection to “similar” individuals. Thresholding selection applies tournament selection between two individuals


that have in common a number of genes greater than a specified threshold. The effect achieved via the primordial step is that poor building blocks are eliminated, while good building blocks survive to the juxtaposition step.

• The juxtaposition step applies cut and splice operators. The cut operator is applied to selected individuals at a probability proportional to the length of the individual (i.e. the size of the building block). The objective of the cut operator is to reduce the size of building blocks by splitting the individual at a randomly selected gene.

• The splicing operator combines two individuals to form a larger building block. Since the probability of cutting is proportional to the length of the individual, and the mGA starts with small building blocks, splicing occurs more in the beginning. As nm increases, cutting occurs more. Cutting and splicing then resembles crossover.
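The cut and splice operators can be sketched as follows (a minimal Python illustration of the description above, not the book's code; the per-gene cut probability is an assumed parameter):

```python
import random

# Sketch of the mGA cut and splice operators. Individuals are lists of
# (position, value) genes, so they may have any length.

def cut(individual, p_cut_per_gene=0.02, rng=random):
    """Cut at a random gene; the probability grows with the length."""
    if len(individual) > 1 and rng.random() < p_cut_per_gene * len(individual):
        point = rng.randrange(1, len(individual))
        return individual[:point], individual[point:]
    return (individual,)               # no cut: the individual stays whole

def splice(a, b):
    """Concatenate two individuals into one larger building block."""
    return a + b

print(splice([(1, 0)], [(2, 1)]))      # [(1, 0), (2, 1)]
```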

3-13-3 Island Genetic Algorithms
• The search space is divided among subpopulations (islands), each of which is evolved by its own GA, independently and in parallel. Periodically the islands exchange individuals; between exchanges each island evolves in isolation. Designing an island GA therefore requires deciding:
• the number of islands;
• the number and frequency of individuals migrating from one island to another;
• the migration paths.

Parallel GA implementations are classified as follows: • Single-population master-slave GAs, where the evaluation of fitness is distributed

over several processors. • Single-population fine-grained GAs, where each individual is assigned to one

processor, and each processor is assigned only one individual. A small neighborhood is defined for each individual, and selection and reproduction are restricted to neighborhoods. Whitley refers to these as cellular GAs.

• Multi-population, or island GAs, where multiple populations are used, each on a separate processor. Information is exchanged among populations via a migration policy. Although developed for parallel implementation, island GAs can be implemented on a single processor system.

In an island GA, the subpopulations (islands) have the following properties:
• each island evolves in parallel, applying its reproduction operators independently of the others;
• the islands cooperate with one another in the form of migration.



The migration policy is determined by the following design decisions:

• A communications topology, which determines the migration paths between islands. For example, a ring topology (such as illustrated in Figure 16.4(b)) allows exchange of information between neighboring islands. The communication topology determines how fast (or slow) good solutions disseminate to other subpopulations. For a sparsely connected structure (such as the ring topology), islands are more isolated from one another, and the spread of information about good solutions is slower. Sparse topologies also facilitate the appearance of multiple solutions. Densely connected structures have a faster spread of information, which may lead to premature convergence.

• A migration rate, which determines how many individuals migrate, and when. Tied with the migration rate is the question of when migration should occur. If migration occurs too early, the number of good building blocks in the migrants may be too small to have any influence at their destinations. Usually, migration occurs when each population has converged. After exchange of individuals, all populations are restarted. • A selection mechanism to decide which individuals will migrate. • A replacement strategy to decide which individual of the destination island will be

replaced by the migrant. Based on the selection and replacement strategies used, island GAs are divided into two classes: static and dynamic.

In static island GAs, combinations of the following selection and replacement strategies are used:
• a good migrant replaces a bad individual of the destination island;
• a good migrant replaces a randomly selected individual;
• a randomly selected migrant replaces a bad individual;
• a randomly selected migrant replaces a randomly selected individual.

Selecting good migrants to replace bad individuals is the most commonly used combination.


In dynamic island GAs, migration decisions are not fixed in advance but are made at run time on the basis of a probabilistic criterion. When migration occurs from an island, the destination island is determined randomly.

• Tournament selection may be used, based on the average fitness of the subpopulations. Additionally, an acceptance strategy can be used to decide if an immigrant should be accepted. For example, an immigrant is probabilistically accepted if its fitness is better than the average fitness of the island (using, e.g. Boltzmann selection).

• Another interesting aspect to consider for island GAs is how subpopulations should be initialized. Of course a pure random approach can be used, which will cause different populations to share the same parts of the search space. A better approach would be to initialize subpopulations to cover different parts of the search space, thereby covering a larger search space and facilitating a kind of niching by individuals islands.

• Also, in multicriteria optimization, each subpopulation can be allocated the task to optimize one criterion. A meta-level step is then required to combine the solutions from each island.
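One synchronous migration step can be sketched as below (our own Python illustration, assuming a ring topology, best-individual emigrant selection and replace-worst insertion; fitness is maximized):

```python
def migrate_ring(islands, fitness):
    """Each island sends its best individual to the next island on the ring,
    where it replaces that island's worst individual."""
    migrants = [max(pop, key=fitness) for pop in islands]      # select emigrants
    for i, pop in enumerate(islands):
        incoming = migrants[i - 1]                             # ring neighbour
        worst = min(range(len(pop)), key=lambda j: fitness(pop[j]))
        pop[worst] = incoming                                  # replace worst
    return islands

islands = [[[0, 0], [1, 1]], [[0, 1], [1, 0]], [[1, 1], [0, 0]]]
print(migrate_ring(islands, sum))
```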

3-13-4 Cooperative coevolutionary GA (CCGA)
• A CCGA is a variant of the island GA. In this method, instead of each island evolving complete solutions, the problem is decomposed and each subpopulation evolves only one subcomponent of the solution.
• Each individual therefore represents only part of a complete solution. To evaluate an individual, a complete chromosome is constructed by combining it with representative members (for example, the current best individuals) selected from all the other subpopulations.
• The constructed complete chromosome is then a candidate solution to the optimization problem.
• It has been shown that such a cooperative approach substantially improves the accuracy of solutions, and the convergence speed compared to non-cooperative, noncoevolutionary GAs.

3-13-5 Population-based incremental learning (PBIL)
• In PBIL no population of chromosomes is stored explicitly; it is replaced by a probability vector. PBIL solves many of the problems that a GA solves, but without a crossover operator and with fewer control parameters. Chromosomes are binary strings of fixed length.
• Each gene i may take the value 0 or 1; it is assigned the value 1 with a probability Pi. The collection of these probabilities, P, is called the population string and holds real values.
• The population string is initialized so that Pi = 0.5 for all i.
• In each generation a set of sample individuals is generated according to the probabilities in the population string P, and each sample is evaluated.



• The s best individuals are selected from the samples.
• The population string P is then updated toward these best individuals:

Pi ← (1 − η) Pi + η b̄i, where 0 ≤ η ≤ 1 is an update constant and b̄i is the mean value of bit i over the s best individuals. At any given time PBIL requires only storage of the population string P and of the s best individuals used for the update.
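The whole PBIL cycle — sample, select the s best, shift P — can be sketched in Python (our own illustration of the scheme above; all parameter values are arbitrary):

```python
import random

def pbil(fitness, n_bits, n_samples=20, s=2, eta=0.1, generations=100,
         rng=random):
    P = [0.5] * n_bits                                   # population string
    for _ in range(generations):
        pop = [[1 if rng.random() < p else 0 for p in P]
               for _ in range(n_samples)]                # sample individuals
        best = sorted(pop, key=fitness, reverse=True)[:s]
        for i in range(n_bits):                          # P <- (1-eta)P + eta*mean
            mean_i = sum(ind[i] for ind in best) / s
            P[i] = (1 - eta) * P[i] + eta * mean_i
    return P

# Maximizing the number of ones drives every probability toward 1:
P = pbil(sum, n_bits=8, rng=random.Random(1))
```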

• If the problem is dynamic, PBIL can become trapped in a local minimum; to counter this, adding a small mutation to the population string has been proposed.

3-13-6 Adaptive PBIL
Urzelai and Floreano proposed an ADAPTIVE PBIL method that improves the convergence speed and robustness on dynamic problems. In A-PBIL the update constant is proportional to the fitness gain obtained by the s best individuals with respect to the average fitness of the population in the previous generation. After each update of the population string, the values Pi tend to return to their initial value 0.5.

Experiments show that A-PBIL performs better than the GA and standard PBIL in dynamic environments.

3-13-7 Interactive Evolution

• In applications where the quality of an individual cannot be formulated as a mathematical fitness function — for example evolving art, music, and animations — the GA cannot evaluate individuals automatically, and a human enters the loop. In interactive evolution the user visually inspects the evolved individuals and interactively assigns their fitness or selects which of them survive and reproduce. The reproduction operators of the GA (such as crossover and mutation) may be applied automatically, or may themselves be chosen through interaction.

Algorithm 4.9 Interactive Evolution Algorithm
    Set the generation counter, t = 0;
    Initialize the control parameters;
    Create and initialize the population, C(0), of ns individuals;
    while stopping condition(s) not true do
        Determine reproduction operators, either automatically or via interaction;
        Select parents via interaction;
        Perform crossover to produce offspring;
        Mutate offspring;


        Select new population via interaction;
    end

3-13-8 Human-Competitive Evolution
• Using evolutionary algorithms, some problems can be solved at a level that competes with human experts. In 2004 an annual competition was established that awards prizes for human-competitive results produced by genetic and evolutionary computation — automatically created solutions that are at least as good as results produced by humans, such as patented inventions. Genetic algorithms and genetic programming in particular have produced prize-winning, human-competitive designs in engineering fields.

3-14 A comparison of evolutionary algorithms
How well an evolutionary algorithm performs depends on the type of problem and the representation used; no single paradigm gives the best answer on all problems. The main paradigms, each with its own emphasis, are summarized below.
• Genetic algorithms (Holland 1975) operate on binary representations of the individuals and emphasize the role of building blocks and crossover.
• Genetic programming (Koza 1992) operates on tree-based representations of computer programs and circuits.
• Evolutionary programming (Fogel et al. 1966) operates directly on the parameters that define the phenotype by applying perturbations drawn from a zero-mean Gaussian distribution (small perturbations are more likely than large perturbations). Evolutionary programming often relies on tournament-based selection with gradual population replacement and does not use crossover.
• Evolutionary strategies (Rechenberg 1973) are similar to evolutionary programming, but the variance of the distribution used for mutation of the individual is genetically encoded and evolved along with the parameters that define the phenotype.

• Differential evolution (DE), which is similar to genetic algorithms, differing in the reproduction mechanism used. • Cultural evolution (CE), which models the evolution of culture of a population and how the culture influences the genetic and phenotypic evolution of individuals. • Co-evolution (CoE), where initially “dumb” individuals evolve through cooperation, or in competition with one another, acquiring the necessary characteristics to survive.



3-15 Advanced topics
3-15-1 Constraint Handling
1. Searching infeasible regions can be prevented either through the fitness function (penalizing infeasible solutions) or through the behavior of the algorithm (operators that only produce feasible solutions).
2. Penalty method: infeasible solutions are penalized in the fitness function. In this way a constrained optimization problem is converted into an unconstrained one.
3. The penalty may be multi-level, i.e. proportional to the amount by which a constraint is violated: the larger the violation, the larger the penalty. A drawback of this method is that the number of parameters grows with the number of levels.
4. Dynamic penalty: the penalty grows in proportion to the generation number, so that early in the search the penalty is small and exploration is cheap, while later the penalty is large and the search is pushed toward feasible regions.
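The static penalty of item 2 can be sketched as follows (our own Python illustration; the penalty weight is an arbitrary choice):

```python
# Infeasible solutions keep their objective value plus a penalty
# proportional to the total constraint violation (minimization problem).

def penalized_fitness(objective, constraints, x, weight=1000.0):
    """constraints: functions g with g(x) <= 0 required for feasibility."""
    violation = sum(max(0.0, g(x)) for g in constraints)
    return objective(x) + weight * violation

# Minimize x^2 subject to x >= 1, i.e. g(x) = 1 - x <= 0:
f = lambda x: x * x
g = lambda x: 1 - x
print(penalized_fitness(f, [g], 0.5))   # infeasible: 0.25 + 1000*0.5 = 500.25
print(penalized_fitness(f, [g], 1.5))   # feasible: 2.25
```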

3-15-2 Multi-Objective Optimization
The approaches in use include the following:
• Weighted aggregation — the most common method in multi-objective optimization. The sub-objectives are combined into a single objective

f(x) = ω1 f1(x) + ω2 f2(x) + . . . + ωnk fnk(x),

where nk ≥ 2 is the total number of sub-objectives, and ωk ∈ [0, 1], k = 1, . . . , nk, with the weights summing to one.

• The weights may be static, random, or dynamically adapted.
• Population-based non-Pareto methods, an example of which is the vector evaluated GA (VEGA). In this method:


• a subpopulation is assigned to each objective, and within each subpopulation the candidates for crossover are selected according to that objective;

• the parents selected from the different subpopulations are then pooled, and offspring are created by mating across the whole pool.

• Pareto-based approaches, Horn et al. [382] developed the niched Pareto GA (NPGA), where an adapted tournament selection operator is used to find nondominated solutions. The Pareto domination tournament selection operator randomly selects two candidate individuals, and a comparison set of randomly selected individuals. Each candidate is compared against each individual in the comparison set. If one candidate is dominated by an individual in the comparison set, and the other candidate is not dominated, then the latter is selected. If neither or both are dominated equivalence class sharing is used to select one individual: The individual with the lowest niche count is selected, where the niche count is the number of

individuals within a niche radius, σshare, from the candidate.

Dynamic Environments
• A very simple approach to track solutions in a dynamic environment is to restart the GA when a change is detected.

• If the population is still diverse enough, simply continuing the run may allow the algorithm to track the change.
• Random immigrants: a portion of each new generation is generated randomly to maintain diversity.

3-15-3 Niching Genetic Algorithms • Niching refers to the formation of groups of individuals in a population. • Individuals within a group are similar to each other, • while individuals from different groups are very different from each other. • This method has the ability to locate multiple solutions to optimization problems.

3-15-4 Several algorithms have been proposed to realize this idea: Fitness Sharing, Dynamic Niche Sharing, Sequential Niching, Crowding, Coevolutionary Shared Niching, and Dynamic Niche Clustering.
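As an illustration of the first method in this list, fitness sharing can be sketched as follows (our own Python code; the triangular sharing function and the parameter values are common textbook choices, not taken from this text):

```python
def shared_fitness(pop, raw_fitness, distance, sigma_share=1.0, alpha=1.0):
    """Divide each raw fitness by a niche count, de-emphasizing crowded regions."""
    shared = []
    for x in pop:
        niche_count = sum(
            1 - (distance(x, y) / sigma_share) ** alpha
            for y in pop if distance(x, y) < sigma_share
        )
        shared.append(raw_fitness(x) / niche_count)
    return shared

# Two identical individuals split their fitness; an isolated one keeps it all.
vals = shared_fitness([0.0, 0.0, 5.0],
                      raw_fitness=lambda x: 10.0,
                      distance=lambda a, b: abs(a - b))
print(vals)   # [5.0, 5.0, 10.0]
```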

3-16 Worked example: solving the 8-queens problem with a GA
Problem: place eight queens on a chessboard in such a way that no queen can attack another.



The problem has a very large number of candidate configurations, only some of which are solutions.

Representation: each individual is a permutation of 1..8, where the value in position i gives the row of the queen in column i.
Fitness: the number of queens that are in a position to be taken (in check); for a solution this fitness must reach its minimum.
Mutation: small variation in one permutation, e.g.: swapping values of two randomly chosen positions,

Combining two permutations into two new permutations:

• choose random crossover point • copy first parts into children • create second part by inserting values from other parent:

• in the order they appear there • beginning after crossover point • skipping values already in child
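This crossover is often called "cut-and-crossfill"; the steps above can be sketched in Python (our own implementation):

```python
def cross_fill(p1, p2, point):
    """Prefix from one parent; the rest filled from the other parent,
    scanned from just after the crossover point and skipping duplicates."""
    def make_child(a, b):
        child = a[:point]
        for i in range(len(b)):                  # wrap around the second parent
            v = b[(point + i) % len(b)]
            if v not in child:
                child.append(v)
        return child
    return make_child(p1, p2), make_child(p2, p1)

c1, c2 = cross_fill([1, 2, 3, 4, 5], [5, 4, 3, 2, 1], point=2)
print(c1)   # [1, 2, 3, 5, 4]
print(c2)   # [5, 4, 3, 1, 2]
```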

• Parent selection:
  – pick 5 parents and take the best two to undergo crossover
• Survivor selection (replacement):
  – when inserting a new child into the population, choose an existing member to replace by:

– sorting the whole population by decreasing fitness


– enumerating this list from high to low
– replacing the first with a fitness lower than the given child
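The replacement rule just listed can be sketched as (our own Python illustration, with fitness maximized):

```python
def replace_worse(population, child, fitness):
    """Sort by decreasing fitness and replace the first member whose
    fitness is lower than the child's; discard the child otherwise."""
    population.sort(key=fitness, reverse=True)
    for i, member in enumerate(population):
        if fitness(member) < fitness(child):
            population[i] = child
            return True
    return False

pop = [[1, 1, 0], [1, 0, 0], [0, 0, 0]]
replace_worse(pop, [1, 0, 1], fitness=sum)
print(pop)   # [[1, 1, 0], [1, 0, 1], [0, 0, 0]]
```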

Example: fitness evaluation for the Travelling Salesperson Problem (TSP)
An initial tour is selected at random, e.g. with fitness (tour length)

fitness(ACEDB) = 32

As a result of the variation operators, new candidate tours are produced, and their fitness is computed in the same way:

fitness(ACDEB) = 33

Among other offspring, F(CADEB) = 38 and F(AECDB) = 33 are obtained, while F(DCEBA) = 28 is found.
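A tour-length fitness of this kind can be computed as below (our own Python sketch; the distance table is hypothetical and is not the one behind the fitness values quoted above):

```python
# Hypothetical symmetric distances between cities A..E.
DIST = {('A','B'): 7, ('A','C'): 6, ('A','D'): 10, ('A','E'): 13,
        ('B','C'): 7, ('B','D'): 10, ('B','E'): 10,
        ('C','D'): 5, ('C','E'): 9, ('D','E'): 6}

def dist(a, b):
    return DIST.get((a, b)) or DIST[(b, a)]   # symmetric lookup

def tour_length(tour):
    """Length of the closed tour visiting the cities in the given order."""
    return sum(dist(tour[i], tour[(i + 1) % len(tour)])
               for i in range(len(tour)))

print(tour_length("ACEDB"))   # 38 with the distances above
```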

3-17 Demos
3-17-1 The GeneHunter demo
After obtaining GeneHunter, run its setup program to install it. The installed program contains the examples

2-dimensional function optimization Traveling Salesman Problem

These built-in examples can be run directly. The "about" section gives a description of the fitness function and its optimum for each example. By changing the GA parameters, the effect of each parameter on the course of the GA run can be observed.

GeneHunter is a powerful software solution for optimization problems which utilizes a state-of-the-art genetic algorithm methodology. GeneHunter includes an Excel Add-In which allows the user to run an optimization problem from Microsoft Excel, as well as a Dynamic



Link Library of genetic algorithm functions that may be called from programming languages such as Microsoft® Visual Basic or C.

3-17-2 An educational site
Educational material on applications of GAs is available at http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/tcw2/

3-17-3 MATLAB Genetic Algorithm Solver: the ga command
The ga function performs genetic-algorithm optimization within MATLAB. Some of the toolbox commands are listed below.

% Genetic Algorithm and Direct Search Toolbox
% Solvers
ga                    - Genetic algorithm solver.
gamultiobj            - Multi-objective genetic algorithm solver.
optimtool             - Genetic algorithm GUI.
simulannealbnd        - Simulated annealing solver.
threshacceptbnd       - Threshold acceptance solver.
% Accessing options
gaoptimset            - Create/modify a GA options structure.
gaoptimget            - Get options for genetic algorithm.
saoptimset            - Create/modify a simulated annealing or threshold acceptance options structure.
saoptimget            - Get options for simulated annealing or threshold acceptance.
% Fitness scaling for genetic algorithm
fitscalingshiftlinear - Offset and scale fitness to desired range.
fitscalingprop        - Proportional fitness scaling.
fitscalingrank        - Rank based fitness scaling.
fitscalingtop         - Top individuals reproduce equally.
% Selection for genetic algorithm
selectionremainder    - Remainder stochastic sampling without replacement.
selectionroulette     - Choose parents using roulette wheel.
selectionstochunif    - Choose parents using stochastic universal sampling (SUS).
selectiontournament   - Each parent is the best of a random set.
selectionuniform      - Choose parents at random.
% Crossover (recombination) functions for genetic algorithm.


crossoverheuristic    - Move from worst parent to slightly past best parent.
crossoverintermediate - Weighted average of the parents.
crossoverscattered    - Position independent crossover fcn.
crossoversinglepoint  - Single point crossover.
crossovertwopoint     - Two point crossover.
crossoverarithmetic   - Arithmetic mean between two parents satisfying linear constraints and bounds.
% Mutation functions for genetic algorithm
mutationgaussian      - Gaussian mutation.
mutationuniform       - Uniform multi-point mutation.
mutationadaptfeasible - Adaptive mutation for linearly constrained problems.
% Distance function for multi-objective genetic algorithm
distancecrowding      - Calculates crowding distance for individuals.
% Plot functions for genetic algorithm
gaplotbestf           - Plots the best score and the mean score.

MATLAB GA GUI
optimtool is a graphical user interface in MATLAB from which ga can be run. To use it:
• Run the program optimtool inside MATLAB.
• In the Solver field select the GA optimization method ga.
• The Help pane next to each field can be consulted for the options being set.
• The sample programs shipped with MATLAB can also be run through this GUI, which makes it easier to follow what happens during a run.

1- Minimizing the Rastrigin function
Rastrigin's function is one of the MATLAB sample problems; it can be plotted with the command below.

plotobjective(@rastriginsfcn,[-5 5; -5 5]);

The minimum of the function can be seen in the resulting plot. To find this minimum with the GA, type in the command window:

options = gaoptimset('PopulationSize',10);
[x, fval, reason] = ga(@rastriginsfcn, 2, options)

If you prefer to write the fitness function yourself, it can be run as below.

function Ras_min



options = gaoptimset('TolFun',0,'FitnessLimit',0, ...
    'MutationFcn',{@mutationuniform,0.1});
% Add some plot functions
options = gaoptimset(options,'PlotFcns',{@gaplotbestf,@gaplotscores});
[x, fval] = ga(@myfun, 2, options)
%--------------------------------------
function z = myfun(x)
z = 20 + x(1)^2 + x(2)^2 - 10*(cos(2*pi*x(1)) + cos(2*pi*x(2)));

The same run can also be carried out from optimtool, as follows:
• Run optimtool.
• In the Solver field select ga.
• In the Fitness function field enter @rastriginsfcn, set Number of variables = 2, and set the Bounds to -1 and 1.
• In the Plot functions panel tick Best fitness and Best individual.
• Press the Start button and watch the run proceed. The GA parameters can then be changed from the Options panel and their effect on the result observed.


2- Minimizing a parameterized function
Write the fitness function in an M-file:

function y = parameterized_fitness(x,a,b)
y = a * (x(1)^2 - x(2))^2 + (b - x(1))^2;

Then run the following commands:

a = 100; b = 1; % define constant values
FitnessFunction = @(x) parameterized_fitness(x,a,b);
numberOfVariables = 2;
[x,fval] = ga(FitnessFunction,numberOfVariables)



3- Vectorizing the fitness function
In the programs above, the fitness function is called once for every single point. If we want the fitness of the whole population to be computed in one call, we can proceed as follows. Consider the previous fitness function again:

f(x) = a * (x(1)^2 - x(2))^2 + (b - x(1))^2;

By default the GA calls the fitness function separately for every individual in every generation. To increase speed, the fitness function can be vectorized, so that it is called once per generation with the whole population as input and computes all fitness values using array operations.

Create an M-file called vectorized_fitness.m with the following code:

function y = vectorized_fitness(x,a,b)
y = zeros(size(x,1),1);   % pre-allocate y
for i = 1:size(x,1)
    x1 = x(i,1);
    x2 = x(i,2);
    y(i) = a * (x1^2 - x2)^2 + (b - x1)^2;
end

We need to specify that the fitness function is vectorized, using the options structure created with GAOPTIMSET. The options structure is passed in as the tenth argument.

FitnessFunction = @(x) vectorized_fitness(x,100,1);
numberOfVariables = 2;
options = gaoptimset('Vectorized','on');
[x,fval] = ga(FitnessFunction,numberOfVariables,[],[],[],[],[],[],[],options)

Optimization terminated: average change in the fitness value less than options.TolFun.
x =
    0.7875    0.6353
fval =
    0.0682

5- Finding the string of all ones (max-ones)
Each individual is a bit string of l bits (l = 50 in the code below), each of which can be 0 or 1. With the GA we minimize the fitness function f below, which counts the number of zeros in the string — equivalently, we maximize the number of ones.

function maxones
l = 50;
GenomeLength = l;   % l-bit representation
% GA options
options = gaoptimset('PopulationType','bitString', ...
    'TolFun',0, ...
    'FitnessLimit',0, ...


    'MutationFcn',{@mutationuniform,0.1});
% Add some plot functions
options = gaoptimset(options,'PlotFcns',{@gaplotbestf,@gaplotscores});
FitnessFcn = @(x) counting_ones(x,l);
[x, val, reason] = ga(FitnessFcn, GenomeLength, options);
fprintf(1,'Best: %s error: %d\n', char(x + '0'), val);

function scores = counting_ones(pop, L)
scores = -sum(pop) + L;

Example: routing in a telecommunication network
A GA can be used to determine the optimal route between two given switches. The steps of the GA for this problem are as follows.
1. Chromosome representation: a chromosome is a sequence of switch numbers, in which the first gene is the origin switch and the last gene is the destination switch. Chromosomes have variable length; two example chromosomes are shown below.

Duplicate switches are ignored. The first chromosome represents a route from switch 1 to switch 3 to switch 6 to switch 10. 2. Initialization of population: Individuals are generated randomly, with the restriction that the first gene represents the origin switch and the last gene represents the destination switch. For each gene, the value of that gene is selected as a uniform random value in the range [1, nx]. 3. Fitness function: The multi-criteria objective function

is used where

nx is the total number of switches, and |xi| is the number of switches (hops) on route xi,

with



represent the objective to select routes with minimum congestion, where Bab denotes the blocking probability on the link between switches a and b,

maximizes utilization, where Uab quantifies the level of utilization of the link between a and b, and ensures that minimum cost routes are selected, where Cab represents the financial cost of carrying a call on the link between a and b. The constants α1 to α4 control the influence of each criterion.
4. Use any selection operator.
5. Use any crossover operator.
6. Mutation: Mutation consists of replacing selected genes with a uniformly random selected switch in the range [1, nx].
This example is an illustration of a GA that uses a numeric representation, and variable length chromosomes with constraints placed on the structure of the initial individuals.
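The representation, initialization and mutation steps of this example can be sketched in Python (our own illustration; the function names are hypothetical):

```python
import random

def random_route(origin, dest, n_switches, length, rng=random):
    """A route is a variable-length list of switch numbers; the first and
    last genes (origin and destination) are fixed."""
    middle = [rng.randint(1, n_switches) for _ in range(length - 2)]
    return [origin] + middle + [dest]

def mutate_route(route, n_switches, rng=random):
    """Replace one interior gene with a uniform random switch in [1, nx]."""
    i = rng.randrange(1, len(route) - 1)      # never the origin or destination
    mutated = route[:]
    mutated[i] = rng.randint(1, n_switches)
    return mutated

r = random_route(1, 10, n_switches=10, length=4, rng=random.Random(0))
m = mutate_route(r, 10, rng=random.Random(0))
```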

3-18 Assignments
Wherever a question calls for random numbers, use the random sequence r = {0.6, 0.3, 0.8, ...}.

1. Name four types of encoding. What is a Gray code?
2. What is the usual strategy for forming the initial population?
3. What is the difference between subjective and objective fitness evaluation?
4. Write down the advantages and disadvantages of each of the scoring methods. For three individuals with fitness values 30, 20 and 2, how does linear rank-based selection proceed?
5. Explain selection of parents for reproduction, in terms of both the population and the method used.
6. Give an example of a crossover operator for each of the discrete, real-valued, and permutation representations.
7. Apply 2-point crossover to the chromosome [00101011], using the random numbers r, to produce 2 offspring.
8. Give an example of a mutation operator for each of the discrete, real-valued, and permutation representations.
9. What is meant by exploration and exploitation, and how is the balance between the two controlled?
10. What are suitable values for the parameters pr and pm?
11. Explain, with a sketch, ways of displaying the progress of a genetic algorithm.


12. What is meant by the order and the length of a schema? How does schema analysis indicate the convergence of a genetic algorithm?
13. Propose and write down a selection strategy of your own.
14. What is meant by messy and island genetic algorithms?

15. *Compare the following replacement strategies for crossover operators that produce only one offspring:

(a) The offspring always replaces the worst parent. (b) The offspring replaces the worst parent only when its fitness is better than the worst parent. (c) The offspring always replaces the worst individual in the population. (d) Boltzmann selection is used to decide if the offspring should replace the worst parent. 16. Propose a multiparent version of the geometrical crossover operator. 17. Propose a marker initialization and update strategy for gene scanning applied to order-

based representations 18. Propose a random mutation operator for discrete-valued decision variables. 19. In the context of GAs, when is a high mutation rate an advantage? 20. Is the following strategy sensible? Explain your answer. “Start evolution with a large

mutation rate, and decrease the mutation rate with an increase in generation number.” 21. Discuss how a GA can be used to cluster data. 22. For floating-point representations, devise a deterministic schedule to dynamically adjust

mutational step sizes. Discuss the merits of your proposal. 23. Suggest ways in which the competitive template can be initialized for messy GAs. 24. Discuss the consequences of migrating the best individuals before islands have

converged.


Natural Computing – 4: Genetic Programming

4. Genetic Programming

The central question: "How can computers learn to solve problems without being explicitly programmed?"

• The approach was introduced by Koza in the early 1990s in
J. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, 1992.
• On this basis a computer program itself is evolved: each candidate program is run and its fitness evaluated, and evolution gradually produces programs with better answers.

• It is also possible that only part of a program is evolved, e.g. a procedure performing a specific task.
• GP runs are CPU-intensive, but the method finds creative solutions to difficult problems.
• It has been tried for classifier systems and engineering design, e.g. electronics and mechanics.

• The main difference between GA and GP is the type of encoding: GP uses a tree representation (Koza).
• In this algorithm, crossover and mutation are performed by exchanging and re-growing subtrees (branches).

4-1 Representation: the tree structure
1. Tree: in GP each chromosome is a program that is represented as a tree. The size and structure of the tree are not fixed and change during evolution.
Adaptive individuals: GP population will usually have individuals of different size, shape and complexity. Here size refers to the tree depth, and shape refers to the branching factor of nodes in the tree. The size and shape of a specific individual are also not fixed, but may change due to application of the reproduction operators.

2. Tree depth: the depth of a tree is the maximum number of levels from the root node r, which is selected first when the tree is built. The depth of the tree in Fig. 5.2 is 5. The root serves as the starting point for constructing each tree.


3. Terminal set: the nodes used for the leaves.
• The terminal set, for leaf nodes, specifies all the variables and constants,
• e.g. numeric constants, inputs, state variables, and fixed parameters.

4. Function set: the operators used for the internal nodes.

the function set, for the intermediate nodes, contains all the functions that can be applied to the elements of the terminal set. These functions may include

• mathematical,(sin, cos,…) • arithmetic (+,-,*,..) • Boolean functions. (AND,OR, If then else) • Subroutines or predefined functions

Required properties of the function and terminal sets:
• Closure (every function must produce a valid output for every possible combination of inputs):

• the function set should be well defined for any combination of arguments, Example:•{AND, OR, NOT} with {T, NULL} is closed•{+,-,/,*} with {X,Y} where X and Y are real numbers is not closed because division by 0 is not acceptable. others are. Closure can usually be achieved easily such as

• Square root of a negative number • Logarithm of 0 • Division by zero: define a protected division function which returns 1 when

division by 0 is attempted • Define special functions which modify unacceptable input conditions
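The protected division mentioned above — a common GP convention — can be written as:

```python
def pdiv(a, b):
    """Protected division: return 1 when division by zero is attempted."""
    return a / b if b != 0 else 1.0

print(pdiv(6.0, 3.0))   # 2.0
print(pdiv(5.0, 0.0))   # 1.0, the protected value
```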

• Sufficiency (the terminals and functions must be adequate for expressing a solution): • There should be sufficient belief that there are compositions of functions that can yield a

solution to the problem • In some domains the requirements are well known, in others they are not so, ultimately, the

user must find a set which works •�.�� e�#G 4� $.���) �$��� ���� ����)�) ) j��#� ����Universality:


�B�.�C &F�� ����$� ����� �������

4-3

• The function and terminal sets should be larger than the minimum required for sufficiency.
• However, the addition of extraneous functions may either degrade or improve the performance of the GP system.
• In practice, a few extra functions seem to improve both the performance and the range of application of a GP system.

5. Domain-specific grammar: a grammar is needed to express which constructs are legal in the problem domain, since not every combination of elements forms a valid program (for instance, division by zero or the logarithm of zero, as noted above).

Examples of such representations include mathematical formulas and logical relations. For instance, consider the following program written in the C language:

int foo(int time) {
    int temp1, temp2;
    if (time > 10)
        temp1 = 3;
    else
        temp1 = 4;
    temp2 = temp1 + 1 + 2;
    return (temp2);
}

In the LISP language, the same program is written as the following S-expression; the tree representation of the program follows directly from it:

�B�.�C &F�� ����$� ����� �������

4-4

(+ 1 2 (IF (> TIME 10) 3 4))

The corresponding terminal and function sets are TERMINALS = {1, 2, 10, 3, 4, TIME} and FUNCTIONS = {+, IF, >}.
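As an illustration, the S-expression above can be represented as nested tuples and evaluated recursively (this encoding is a choice made for the sketch, not part of GP itself):

```python
def evaluate(node, env):
    """Recursively evaluate a GP tree stored as nested tuples.

    Internal nodes are ('function', child, ...); leaves are constants
    or variable names that are looked up in env.
    """
    if not isinstance(node, tuple):
        return env.get(node, node)          # variable lookup, else constant
    op, *args = node
    if op == '+':
        return sum(evaluate(a, env) for a in args)
    if op == '>':
        return evaluate(args[0], env) > evaluate(args[1], env)
    if op == 'IF':
        cond, if_true, if_false = args
        return evaluate(if_true if evaluate(cond, env) else if_false, env)
    raise ValueError('unknown function: ' + str(op))

# (+ 1 2 (IF (> TIME 10) 3 4))
program = ('+', 1, 2, ('IF', ('>', 'TIME', 10), 3, 4))
```

With TIME = 12 the IF branch yields 3 and the program evaluates to 6; with TIME = 5 it evaluates to 7.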

Examples:

Example 1: suppose that GP must discover a Boolean expression, given a data set of input interpretations and their associated target outputs (Table 5.1).

• The task is to evolve this expression.
• For this problem, the function set is defined as {AND, OR, NOT},
• and the terminal set is {x1, x2}, where x1, x2 ∈ {0, 1}.
• The solution is represented in Figure 5.1.


Example 2: here, an algebraic expression must be evolved from samples of its inputs and outputs.

• The terminal set is specified as {a, x, z, 3.4}, with a, x, z ∈ R.
• The minimal function set is given as {−, +, ×, /, sin, exp, ln}. The global optimum is illustrated in Figure 5.2. Because GP is a stochastic search, a particular run may instead converge to a different tree, such as the one shown below.


4-2 The GP Algorithm

The figure shows the components of the algorithm.


A. Initial Population

1. The initial population is usually generated randomly, subject to two constraints:

• the maximum tree depth, and
• the semantics as expressed by the given grammar.

2. To create each individual:

• A root is randomly selected from the set of function elements.
• The branching factor (the number of children) of the root, and of each non-terminal node, is determined by the arity (number of arguments) of the selected function.
• For each non-root node, the initialization algorithm randomly selects an element either from the terminal set or from the function set.
• As soon as an element from the terminal set is selected, the corresponding node becomes a leaf node and is no longer considered for expansion.
• It is preferable to create small initial trees; if needed, trees can grow later in the course of evolution.

3. The initial trees are built subject to a maximum initial tree depth, Dmax, using one of the following methods:

1. Full method (each branch has depth = Dmax):
– nodes at depth d < Dmax are randomly chosen from the function set F;
– nodes at depth d = Dmax are randomly chosen from the terminal set T.

2. Grow method (each branch has depth ≤ Dmax):
– nodes at depth d < Dmax are randomly chosen from F ∪ T;
– nodes at depth d = Dmax are randomly chosen from T.

3. Ramped half-and-half (the most common GP initialization): the grow and full methods each deliver half of the initial population, which enhances population diversity.

• For example, if the maximum depth is 6, the population is divided equally among trees of depths 2, 3, 4, 5, and 6.
• In addition, within each depth group, half of the trees are created with the full method and half with the grow method.
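The full, grow, and ramped half-and-half schemes can be sketched as follows (the function arities and set contents are illustrative assumptions, and trees are encoded as nested tuples):

```python
import random

FUNCTIONS = {'+': 2, '*': 2}        # name -> arity (illustrative)
TERMINALS = ['x', 'y', 1.0]

def full(d):
    # Full method: function nodes until the depth budget is spent.
    if d == 0:
        return random.choice(TERMINALS)
    f = random.choice(list(FUNCTIONS))
    return (f,) + tuple(full(d - 1) for _ in range(FUNCTIONS[f]))

def grow(d):
    # Grow method: nodes drawn from F union T, terminals at the depth limit.
    t_bias = len(TERMINALS) / (len(TERMINALS) + len(FUNCTIONS))
    if d == 0 or random.random() < t_bias:
        return random.choice(TERMINALS)
    f = random.choice(list(FUNCTIONS))
    return (f,) + tuple(grow(d - 1) for _ in range(FUNCTIONS[f]))

def depth(tree):
    return 0 if not isinstance(tree, tuple) else 1 + max(depth(c) for c in tree[1:])

def ramped_half_and_half(pop_size, depths=(2, 3, 4, 5, 6)):
    # Depths cycle through 2..6; within each depth, alternate full/grow.
    return [full(depths[i % len(depths)]) if i % 2 == 0
            else grow(depths[i % len(depths)])
            for i in range(pop_size)]
```

Note that full(d) always yields a tree of depth exactly d, while grow(d) may stop earlier, which is what produces the variety in shapes.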

B. Fitness Function

The fitness function is problem dependent: it measures how successfully an individual program solves the problem.

• Since each chromosome is a program, fitness is determined by executing the program on several different inputs and measuring how successful the executions are.
• If the desired outputs are known, fitness can be based directly on the correctness of the outputs produced.
• If an unknown function is being modeled, a table of sampled inputs and their true outputs is needed. Consider Example 2 above:

Each pattern contains a value for each of the variables (a, x and z) and the corresponding value of y. For each pattern the output of the expression represented by the individual is determined by executing the program. The output is compared with the target output to


compute the error for that pattern. The MSE over the errors for all the patterns gives the fitness of the individual.
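That pattern-based MSE fitness might look like the following sketch (the tuple tree encoding and the minimal {+, *} function set are assumptions made for brevity):

```python
def run_program(node, env):
    # Evaluate a tree of nested tuples; leaves are constants or variables.
    if not isinstance(node, tuple):
        return env.get(node, node)
    op, a, b = node
    x, y = run_program(a, env), run_program(b, env)
    return x + y if op == '+' else x * y     # minimal function set {+, *}

def mse_fitness(individual, patterns):
    """Mean squared error over all fitness cases.

    Each pattern is (env, target), e.g. ({'a': 1.0, 'x': 2.0, 'z': 0.0}, y).
    Lower values indicate fitter individuals.
    """
    errors = [(run_program(individual, env) - target) ** 2
              for env, target in patterns]
    return sum(errors) / len(errors)
```

An individual that reproduces every target output obtains the best possible fitness of 0.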

Fitness for decision trees

• GP can also be used to evolve decision trees. For this application each individual represents a decision tree, and the fitness of an individual is calculated as the classification accuracy of the corresponding decision tree. If the objective is instead to evolve a game strategy in the form of a computer program, the fitness of an individual can be the number of games that the individual won out of the total number of games played.

Penalty terms

• To keep trees from growing without bound, a penalty term can be added to the fitness function. The effect is that, of two individuals with nearly equivalent outputs, the smaller and shallower tree receives the better fitness, which also reduces the cost of evaluating and applying it.
• The fitness function can also be used to penalize semantically incorrect individuals.

Transforming the fitness values:

– raw fitness: not transformed;
– standardized fitness: a fitness value of zero is always assigned to the fittest individual;
– normalized fitness: all values lie between 0 and 1.

Selection methods for parents and survivors:

– fitness-proportional selection
– truncation selection
– ranking selection
– tournament selection

C. Crossover Operators

Two forms of crossover are commonly used, producing either one or two offspring:

• Generating one offspring: A random node is selected within each of the parents. Crossover then proceeds by replacing the corresponding subtree in the one parent by that of the other parent.

• Generating two offspring: Again, a random node is selected in each of the two parents. In this case the corresponding subtrees are swapped to create the two offspring.

The figure shows several crossover variants:

• Subtree exchange crossover
• Self crossover
• Module crossover
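Subtree exchange crossover on a tuple encoding might be sketched like this (node addressing by a path of child indices is a choice made for this sketch):

```python
import random

def node_paths(tree, path=()):
    # Enumerate every node position as a path of child indices.
    yield path
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from node_paths(child, path + (i,))

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def put(tree, path, subtree):
    # Return a copy of `tree` with the node at `path` replaced by `subtree`.
    if not path:
        return subtree
    i = path[0]
    return tree[:i] + (put(tree[i], path[1:], subtree),) + tree[i + 1:]

def subtree_crossover(p1, p2):
    # Select a random node in each parent, then swap the two subtrees,
    # producing two offspring.
    n1 = random.choice(list(node_paths(p1)))
    n2 = random.choice(list(node_paths(p2)))
    return put(p1, n1, get(p2, n2)), put(p2, n2, get(p1, n1))
```

Producing a single offspring corresponds to returning only the first of the two trees.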


D. Mutation Operators

• Mutation was used in early experiments in the evolution of programs. It was not, however, used in (Koza, 1992) and (Koza, 1994), as Koza wished to demonstrate that mutation was not necessary and that GP was not performing a simple random search.

• For this reason, mutation was omitted from much early GP work. However,
• O’Reilly (1995) argued that mutation, in combination with simulated annealing or stochastic iterated hill climbing, can perform as well as crossover-based GP in some cases.

• Today mutation is generally included as part of GP, but it is usually advised to keep its rate low.
• Harries and Smith also found that mutation-based hill climbers outperformed crossover-based GP systems on similar problems.
• Luke and Spector (1997) suggested that the situation is complex, and that the relative performance of crossover and mutation depends on both the problem and the details of the GP system.

Mutation variants (illustrated in Figure 5.4, with the original tree in Figure 5.4(a)):

• Function node mutation: A non-terminal node, or function node, is randomly selected and replaced with a node of the same arity, randomly selected from the function set. Figure 5.4(b) illustrates the ‘+’ function node being replaced by another function node.

• Terminal node mutation: A leaf node, or terminal node, is randomly selected and replaced with a new terminal node, also randomly selected from the terminal set. Figure 5.4(c) illustrates that terminal node a has been replaced with terminal node z.


• Swap mutation: A function node is randomly selected and the arguments of that node are swapped, as illustrated in Figure 5.4(d).

• Grow mutation: A node is randomly selected and replaced by a randomly generated subtree. The new subtree is restricted to a predetermined depth. Figure 5.4(e) illustrates that the node 3.4 is replaced with a subtree.

• Gaussian mutation: A terminal node that represents a constant is randomly selected and mutated by adding a Gaussian random value to that constant. Figure 5.4(f) illustrates Gaussian mutation.

• Trunc mutation: A function node is randomly selected and replaced by a random terminal node. This mutation operator performs a pruning of the tree. Figure 5.4(g) illustrates that the + function node is replaced by the terminal node a.
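Two of these operators, function node mutation and grow mutation, can be sketched on the same tuple encoding (for brevity they act at the root of the tree; a full implementation would select a random node anywhere, and the sets below are illustrative):

```python
import random

FUNCTIONS = {'+': 2, '*': 2}                 # name -> arity (illustrative)
TERMINALS = ['a', 'x', 'z', 3.4]

def random_subtree(max_depth):
    # Helper for grow mutation: a small random tree of bounded depth.
    if max_depth == 0 or random.random() < 0.5:
        return random.choice(TERMINALS)
    f = random.choice(list(FUNCTIONS))
    return (f,) + tuple(random_subtree(max_depth - 1)
                        for _ in range((FUNCTIONS[f])))

def function_node_mutation(tree):
    # Replace the function at the root with another one of the same arity.
    if not isinstance(tree, tuple):
        return tree
    arity = len(tree) - 1
    same_arity = [f for f, a in FUNCTIONS.items() if a == arity]
    return (random.choice(same_arity),) + tree[1:]

def grow_mutation(tree, max_depth=2):
    # Replace a randomly chosen child of the root with a random subtree
    # whose depth is restricted by max_depth.
    if not isinstance(tree, tuple):
        return random_subtree(max_depth)
    i = random.randrange(1, len(tree))
    return tree[:i] + (random_subtree(max_depth),) + tree[i + 1:]
```

The other operators (swap, Gaussian, trunc) follow the same pattern of picking a node and rewriting it in place.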


Mutation parameters:

1. The probability pm of choosing mutation vs. recombination. Individuals to be mutated are selected according to the mutation probability pm; the larger pm, the more individuals will be mutated. Remarkably, pm is advised to be 0 (Koza, 1992) or very small, such as 0.05.

2. The probability of choosing an internal point as the root of the subtree to be replaced. Nodes within the selected tree are mutated according to a probability pn; the larger pn, the more the genetic make-up of that individual is changed.

Asexual operators

In addition to the mutation operators above, Koza proposed the following asexual operators:

• Permutation operator: randomly permutes the order of a node's arguments; a function with n arguments has n! possible argument orderings, one of which is chosen at random.
• Editing operator: simplifies an individual according to a set of predefined rules; for example, x AND x is replaced with the single node x.
• Building block operator: automatically identifies potentially useful building blocks (subtrees) and protects them, so that good building blocks are not altered by the reproduction operators.


4-3 GP Parameters

The main GP parameters are the population size and the probabilities of applying mutation and crossover.

It is common to create the initial population randomly using ramped half-and-half with a depth range of 2–6. The initial tree sizes will depend upon the number of functions, the number of terminals, and the arities of the functions. However, evolution will quickly move the population away from its initial distribution.

Traditionally, 90% of children are created by subtree crossover. However, a 50–50 mixture of crossover and a variety of mutations also appears to work well.

In many cases, the main limitation on the population size is the time taken to evaluate the fitnesses, not the space required to store the individuals. As a rule, one prefers the largest population size that the system can handle gracefully; normally, the population size should be at least 500, and much larger populations are often used. Often, to a first approximation, the effort of a set of GP experiments is the product of the number of runs R, the number of generations G, the size of the population P, the average size of the programs s, and the number of fitness cases F.

Typically, the number of generations is limited to between ten and fifty; the most productive search is usually performed in those early generations, and if a solution has not been found by then, it is unlikely to be found in a reasonable amount of time.


The folk wisdom on population size is to make it as large as possible, but some suggest instead using many runs with much smaller populations. Some implementations do not require arbitrary limits on tree size. Even so, because of bloat (the uncontrolled growth of program sizes during GP runs), it is common to impose a size limit, a depth limit, or both.

Problems with GP:

• Crossover and mutation can be too destructive.
• Uncontrolled tree growth ('survival of the fattest'); intelligent crossover operations have been proposed to counter it.
• Evaluating fitness is slow, since it requires executing programs or simulations (and, in general, runs into the halting problem).

Example: the goal in this example is to find the function y = x^2 + x + 1 from samples of its inputs and outputs. The input x is varied from −1 to 1 in steps of 0.1 (x = −1:0.1:1), and the corresponding target outputs are computed from the function.

Discovering a function from input/output samples in this way with GP is called SYMBOLIC REGRESSION.

To solve the problem:

1. An initial population of four individuals is generated randomly, and parents are selected from it.
2. The fitness of each individual is evaluated, as illustrated in the figure.


3. Crossover and mutation are then applied to create the next generation.
4. One of the members of generation 1 is the solution to the problem; this individual resulted from crossover between members a and b of generation 0.
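The setup of this example can be sketched directly: candidates are scored by their error against y = x^2 + x + 1 on the sample grid (representing candidates as plain Python callables is a simplification made for this sketch):

```python
def target(x):
    return x * x + x + 1

# The fitness cases: x from -1 to 1 in steps of 0.1.
xs = [round(-1 + 0.1 * i, 1) for i in range(21)]

def fitness(candidate):
    """Sum of absolute errors over the fitness cases; 0 is a perfect fit."""
    return sum(abs(candidate(x) - target(x)) for x in xs)

perfect = lambda x: x * x + x + 1      # reproduces the target exactly
near    = lambda x: x + 1              # misses the quadratic term
```

fitness(perfect) is 0, while fitness(near) accumulates the error of the missing x^2 term, so selection prefers the former.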

Example: choosing the function and terminal sets for a robot

If the task is to evolve a control program for a robot, the primitives can encode the robot's tests and actions:

• Function set:
  – F = { IF_OBJ, IF_GOAL, IF_FORWARD, IF_OBS1 }
• Terminal set:
  – T = { MOVE_FORWARD, MOVE_FORWARD & TURN_LEFT, MOVE_FORWARD & TURN_RIGHT, MOVE_BACKWARD, TURN_LEFT, TURN_RIGHT, RANDOM }
• Fitness function:


4-4 Variants of GP

A. Building Block Genetic Programming

• In this approach, called the building-block approach (BGP), the individuals of the initial population are trees of minimal depth: a root node whose children are all leaf nodes.
• When this population can no longer produce better solutions, individuals are allowed to grow: a new, randomly created building block is attached in place of a leaf. This expansion is treated as a form of mutation.
• Each individual is expanded with a probability pe, so not all members grow at the same time. This incremental strategy keeps GP trees free of unnecessary complexity.

B. Linear Genetic Programming

• A GP program can also be represented as a linear sequence of instructions, written in one of three styles: stack-based, register-based, or machine-code.
• The main reason for the name is that computers naturally execute linear sequences of instructions; when individuals are executed directly, without compilation or interpretation, execution is very fast.
• GP individuals can therefore be evolved and run directly in this form.
• In linear GP, each instruction takes its operands from registers or memory and places its result in a register, as in Figure 7-2.

This means instructions in linear GP all have equivalent roles and communicate only via registers or memory.


• LGP does not distinguish between function and terminal elements.
• Also, in the absence of loops or branches, the position of the instructions determines the order of their execution; this is typically not the case for tree-based representations.
• LGP instructions may be actual machine code, or statements of a high-level language such as C.

C. Other Variants

Other types of GP include graph-based GP and probabilistic GP.

4-5 Applications

GP was developed to evolve computer programs. Problem types to which it has been applied include Boolean expressions, planning, symbolic function identification, empirical discovery, solving systems of equations, concept formation, automatic programming, pattern recognition, game-playing strategies, and neural network design. Other applications include decision trees, game playing, bioinformatics, data mining, and robotics.

4-6 The GPLAB Toolbox

To get started with the gplab toolbox, consult its manual. To solve the X^2+X+1 problem, apply the following changes to the demo program:

x=[-1:0.1:1]'; y=x.^2+x+1;
save 'input.txt' x -ASCII; save 'output.txt' y -ASCII

Make sure that x and y are saved as column vectors. Then:

p.datafilex='input.txt'; p.datafiley='output.txt';
p.testdatafilex='input.txt'; p.testdatafiley='output.txt';

For this problem, also include a constant terminal and restrict the function set to addition, multiplication, and protected division:

p=setterminals(p,'3'); p.functions={'plus' 2; 'times' 2; 'mydivide' 2};

The remaining settings can be configured as desired.

4-7 Assignments

1. Write pseudocode for tree-based GP.

2. Draw (x1 AND NOT x2) OR (NOT x1 AND x2) as an expression tree.
3. What is meant by sufficiency and universality?
4. Give an example of crossover and mutation applied to an expression tree.

5. Explain how a GP can be used to evolve a program to control a robot, where the objective of the robot is to move out of a room (through the door) filled with obstacles.


6. First explain what a decision tree is, and then show how GP can be used to evolve decision trees.

7. Is it possible to use GP for adaptive story telling?

8. Given a pre-condition and a post-condition of a function, is it possible to evolve the function using GP?

9. Explain why BGP is computationally less expensive than GP.

10. Show how a GP can be used to evolve polynomial expressions.

11. Discuss how GP can be used to evolve the evaluation function used to evaluate the desirability of leaf nodes in a game tree.


5. Evolutionary Programming

(Reference book, Chapter 11)

Novel features (relative to GAs): FSM coding, evolutionary mutation, and self-adaptive mutation.

Basic Evolutionary Programming Algorithm (coding: real-valued and finite state machines):

Set the generation counter, t = 0;
Initialize the strategy parameters;
Create and initialize the population, C(0), of ns individuals;
for each individual, xi(t) ∈ C(t) do
    Evaluate the fitness, f(xi(t));
end
while stopping condition(s) not true do
    for each individual, xi(t) ∈ C(t) do
        /* no crossover in EP */
        Create an offspring, x'i(t), by applying the (evolutionary or self-adaptive) mutation operator;
        Evaluate the fitness, f(x'i(t));
        Add x'i(t) to the set of offspring, C'(t);
    end
    Select the new population, C(t + 1), from C(t) ∪ C'(t), by applying a selection operator;
    t = t + 1;
end


Notable characteristics of this algorithm:

1) Early EP used an FSM coding. David B. Fogel, in the late 1980s, applied EP with FSM individuals to evolve game-playing strategies as well. Unlike a GA, where an individual is a string, the individual here is an FSM.

2) EP views evolution at the level of phenotypic behavior rather than genotypes; consequently, reproduction is asexual (mutation only), with no crossover.

What distinguishes EP within the family of evolutionary algorithms is the use of finite state machine (FSM) individuals, the self-adaptation of each individual's strategy parameters, mutation as the only reproduction operator, and behavior-based (relative) selection. It can be regarded as one of the relatives of the GA.

5-1 Finite State Machine Coding

In early EP, each individual is a finite state machine (FSM). An example FSM, whose start state is C, is shown below.

An FSM is a simple program with a finite number of states (A, B, C, …). The machine accepts an input sequence; with each input symbol the system moves to another state, and each transition can also produce an output symbol. In this way an FSM transforms a sequence of inputs into a sequence of outputs. Formally, an FSM is defined by:

S, a finite set of machine states;
I, a finite set of input symbols;
O, a finite set of output symbols;
δ : S × I → S, the next-state function;
ω : S × I → O, the output function.

The FSM coding is used as follows:

3) The goal is to find an FSM that maps a given input sequence to a desired output sequence (i.e. system identification).
4) The output string produced by each FSM is evaluated.
5) Fitness function: the error between the target output sequence and the output sequence produced by the individual FSM.


6) Survivor selection in EP is based on a relative fitness measure.

7) Relative fitness measure: fitness is judged by competition rather than in absolute terms; an individual's score is the number of opponents it beats. For each individual ui(t), a group of np opponents is selected at random from the population, and the score is computed by pairwise comparison against each opponent.

• Wong and Yuryevich, and Ma and Lai, suggest drawing two random numbers r1, r2 ~ U(0, 1) and scoring each competition according to the probabilistic rule below, based on the two fitness values:

where the np opponents are selected at random from the population.

• In this case, if f(ui(t)) << f(ul(t)) (i.e. the fitness of ui is significantly better than that of ul), then ui will have a high probability of being assigned a winning score of 1. This approach is less strict than the requirement that f(ui(t)) < f(ul(t)), somewhat reducing the effects of selection pressure.

8) The new generation is created by mutation: each parent produces exactly one offspring (unlike GA, where pairs of parents are recombined). How an individual mutates is controlled by its strategy parameters, which are stored with the individual. For FSMs, five mutation operators are used: add a state, delete a state, mutate the start state, mutate a link, and mutate an output symbol.

9) Each of the μ parent FSMs is allowed to create one offspring using the five mutation operators above.
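A minimal FSM of this kind can be written as a transition table mapping (state, input) to (next state, output); the concrete two-state machine below (which outputs the running parity of its binary input) is invented for illustration:

```python
def run_fsm(delta, start, inputs):
    """Run an FSM whose table maps (state, symbol) -> (next_state, output).

    This mirrors the next-state function and the output function of the
    formal definition above, and returns the produced output sequence.
    """
    state, outputs = start, []
    for symbol in inputs:
        state, out = delta[(state, symbol)]
        outputs.append(out)
    return outputs

def fsm_fitness(delta, start, inputs, targets):
    # Number of positions where the FSM's output differs from the target
    # sequence (an error count, to be minimized).
    produced = run_fsm(delta, start, inputs)
    return sum(p != t for p, t in zip(produced, targets))

# Two states A (even parity) and B (odd parity) over inputs {0, 1}:
delta = {('A', 0): ('A', 0), ('A', 1): ('B', 1),
         ('B', 0): ('B', 1), ('B', 1): ('A', 0)}
```

The five FSM mutation operators then amount to edits of this table: adding or deleting a state's rows, changing the start state, or rewriting a single next-state or output entry.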

5-2 Evolutionary Mutation

The general form of mutation in EP is

    x'ij(t) = xij(t) + Δxij(t),  with  Δxij(t) = Φ(σij(t)) ηij(t),

where ηij(t) is random noise, Φ is a scaling function, and σij is the deviation (strategy parameter). In the most general case, the noise distribution and the scale parameter may differ for each component j of each individual i. Assuming instead a fixed noise distribution for every individual, the deviations can still be varied over time.

Dynamic strategies: in these strategies, the deviations are recomputed in each generation, typically as a function of fitness. Such schemes balance exploration and exploitation by adapting each individual's mutation strength to its fitness:

• Fitness-proportional deviations: the deviation is taken proportional to the (normalized) fitness of the parent, in which case offspring are generated using the rule below.

In the above, the proportionality constant lies in (0, 1].

• Ideally, the scale should be small when an individual is close to a good solution. Since the optimum itself is unknown, the fitness distance to the best individual found so far can be used instead:

where y is the most fit individual. Distance in decision space can also be used:

where E(•, •) gives the Euclidean distance between the two vectors.

A drawback of these schemes is that they introduce additional constants that must be tuned for each problem.

Scaling by the objective value

• Using raw objective values directly to set the deviations can be problematic: when the objective values are very large, the mutation steps become very large as well, and good solutions may be overshot and lost.
• Several schemes therefore bound or rescale the deviations derived from objective values:

1) Fogel proposed an additive approach, where

where the two coefficients are respectively the proportionality constant and the offset parameter.

2) For the function considered, Bäck and Schwefel [45] proposed that

where nx is the dimension of the problem (in this case, nx = 2). 3) Ma and Lai proposed that deviations be proportional to normalized fitness values:

where the coefficient is the proportionality constant, and the deviations are calculated as

with ns the size of the population (equation 6.41). This approach assumes that f is minimized.

5-3 Self-Adaptive Mutation

In self-adaptive mutation, unlike the dynamic methods above, the deviation σ(t + 1) is computed from σ(t).

• A good strategy is to start with large mutation steps and gradually decrease them as the search converges.
• Unlike the dynamic methods, the fitness function is not used to set the mutation step sizes.

These methods can be divided into three broad categories:

(a) Additive methods

with η referred to as the learning rate; in the first application of this approach, η = 1/6. If σij(t) ≤ 0, then σij(t) is set to a small positive constant ε (typically, ε = 0.001) to ensure positive, non-zero deviations.

• As an alternative, Fogel [266] proposed an update in which a correction term ensures that the square root is applied to a positive, non-zero value.

(b) Multiplicative methods

In equation (6.51), the three constants are control parameters, and nt is the maximum number of iterations.

(c) Lognormal methods

The deviations are self-adapted according to equation (6.52), and offspring are then produced using equation (6.55).


Problems with self-adaptive EP

1) In self-adaptive EP, the deviations can collapse toward zero too early, which stops exploration prematurely. To prevent this, a lower bound is enforced on the deviations; since a good fixed bound is hard to choose, a dynamic lower bound has been proposed:

where σmin(t) is the lower bound at time step (generation) t, the reference rate lies in [0.25, 0.45], and nm(t) is the number of successful consecutive mutations (i.e. the number of mutations that resulted in improved fitness values). This approach is based on the 1/5 success rule of Rechenberg.

2) Matsumura et al. [565] proposed robust EP (REP), in which each individual carries several strategy parameter vectors, σik for k = 0, 1, . . . , nσ, as defined below,

where σi0 is the active strategy parameter vector. The next generation is produced using the following three variation operators:

• Duplication:

for l ∈ {1, 2, . . . , nσ}. Then σikj(t) is self-adapted by application of the lognormal method of equation (6.52) on the σikj(t) for k = 0, 1, . . . , nσ.

• Deletion:

for l ∈ {1, 2, . . . , nσ}. Then σikj(t) is self-adapted by application of the lognormal method of equation (6.52) on the σikj(t) for k = 0, 1, . . . , nσ.

• Invert:


for l ∈ {1, 2, . . . , nσ}. The lognormal self-adaptation method of equation (6.52) is applied to σ'i0j(t) and σ'ilj(t) to produce σi0j(t) and σilj(t) respectively. After application of the mutation operators, offspring are created using

• In a similar way, Fogel and Fogel [269] proposed multiple-vector self-adaptation. In their strategy, at each iteration and before offspring are generated, the active strategy parameter vector has a probability pσ of changing to one of the other nσ − 1 vectors. The problem is then to determine the best values for nσ and pσ, which are problem dependent.

5-4 EP Variants

Standard EP:

Procedure standardEP {
    t = 0;
    Initialize P(t);                 /* of μ individuals */
    Evaluate P(t);
    while (t <= (4000-μ)/μ) {
        for (i = 0; i < μ; i++) {
            Create_Offspring(<xi,yi>, <xμ+i,yμ+i>):
                xμ+i = xi + sqrt(fiti) Nx(0,1);
                yμ+i = yi + sqrt(fiti) Ny(0,1);
            fitμ+i = Evaluate(<xμ+i,yμ+i>);
        }
        Compute subjective fitness if μ >= 10;
        P(t+1) = best μ of the 2μ individuals;
        t = t + 1;
    }
}

Meta EP:

Procedure metaEP {
    t = 0;
    Initialize P(t);                 /* of μ individuals */
    Evaluate P(t);
    while (t <= (4000-μ)/μ) {
        for (i = 0; i < μ; i++) {
            Create_Offspring(<xi,yi,σi,x,σi,y>, <xμ+i,yμ+i,σμ+i,x,σμ+i,y>):
                xμ+i = xi + σi,x Nx(0,1);
                σμ+i,x = σi,x + η σi,x Nx(0,1);
                yμ+i = yi + σi,y Ny(0,1);


                σμ+i,y = σi,y + η σi,y Ny(0,1);
            fitμ+i = Evaluate(<xμ+i,yμ+i>);
        }
        Compute subjective fitness if μ >= 10;
        P(t+1) = best μ of the 2μ individuals;
        t = t + 1;
    }
}

5-4-1 Improved FEP (IFEP)

In this method, each parent creates two offspring: one using Gaussian mutation and one using Cauchy mutation. The better of the two is kept as the offspring and competes for survival in the next generation, so the algorithm benefits from both distributions without deciding in advance which to use.
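The twin-offspring idea of IFEP can be sketched in a few lines (minimization is assumed, the sphere objective is illustrative, and the Cauchy sampler uses the inverse-CDF construction):

```python
import math
import random

def cauchy_sample():
    # Standard Cauchy deviate via the inverse CDF: tan(pi * (u - 1/2)).
    return math.tan(math.pi * (random.random() - 0.5))

def ifep_offspring(x, sigma, f):
    """Create one Gaussian and one Cauchy offspring; keep the better one."""
    gaussian_child = [xi + sigma * random.gauss(0, 1) for xi in x]
    cauchy_child = [xi + sigma * cauchy_sample() for xi in x]
    return min((gaussian_child, cauchy_child), key=f)

def sphere(x):
    # Illustrative minimization objective.
    return sum(xi * xi for xi in x)
```

Selecting the better of the two children per parent is what lets the run exploit Gaussian steps near an optimum while still occasionally taking large Cauchy jumps.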

5-4-2 Evolutionary Programming with Local Search

A very simple approach to improving the exploitation ability of EP is to add a hill-climbing facility to generated offspring: hill-climbing is applied to each offspring for as long as a better fitness can be obtained [235]. Alternatively, gradient descent has been used to regenerate offspring [920, 779]. For each offspring, x'i(t), the offspring is recalculated using

where the learning rate is calculated as

As an alternative to gradient descent, Birru et al. [70] used conjugate gradient search, where line searches are performed for each component of the offspring. The initial search direction is the downhill gradient, with subsequent search directions chosen along subsequent gradient components that are orthogonal to all previous search directions. • The stochastic search developed by Solis and Wets [802] is applied to each offspring at a

specified probability. Based on this probability, if the local search is performed, a limited number of steps is done as summarized in Algorithm 6.2.

Algorithm 6.2: Solis and Wets Random Search Algorithm for Function Minimization

Initialize the candidate solution, x(0), with xj(0) ~ U(xmin,j, xmax,j), j = 1, . . . , nx;
Let t = 0;
Let ρ(0) = 1;
while stopping condition(s) not true do
    t = t + 1;


    Generate a new candidate solution as x'(t) = x(t) + ρ(t)N(0, σ);
    if f(x'(t)) < f(x(t)) then
        x(t) = x'(t);
    else
        ρ(t) = −ρ(t − 1);    /* reverse the search direction */
        x'(t) = x(t) + ρ(t)N(0, σ);
        if f(x'(t)) < f(x(t)) then
            x(t) = x'(t);
        else
            for j = 1, . . . , nx do
                xj(t) ~ U(xmin,j, xmax,j);
            end
        end
    end
end

5 -4-3Evolutionary Programming with Extinction

5-5 Applications

A few example applications of EP are presented below.

5-5-1 Function Optimization

In this example, the goal is to find the maximum of the function sin(2πx)e^(−x) over the interval [0, 2].

Representation: the function has a single parameter (j = 1), so each individual is represented by one real number. The initial population is created randomly with xi ~ U(0, 2).

Fitness evaluation: the fitness of individual i is sin(2πxi) exp(−xi).

Mutation: any of the mutation operators of Section 6-2-1 may be applied.
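A compact EP run for this example, using Gaussian mutation and elitist (mu + mu) survivor selection (the parameter values are illustrative choices of this sketch):

```python
import math
import random

def f(x):
    # Objective to maximize on [0, 2].
    return math.sin(2 * math.pi * x) * math.exp(-x)

def ep_maximize(mu=20, sigma=0.1, generations=200, seed=0):
    rng = random.Random(seed)
    pop = [rng.uniform(0, 2) for _ in range(mu)]
    for _ in range(generations):
        # Each parent creates exactly one offspring by Gaussian mutation,
        # clamped back into the feasible interval [0, 2].
        children = [min(2.0, max(0.0, x + rng.gauss(0, sigma))) for x in pop]
        # (mu + mu) selection: keep the best mu of parents and children.
        pop = sorted(pop + children, key=f, reverse=True)[:mu]
    return pop[0]
```

With these settings a run typically converges to the global maximum near x ≈ 0.22, where f(x) ≈ 0.79.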

5-5-2 Training Neural Networks

Representation: each individual is a neural network whose components are the link weights and bias values.

Fitness evaluation: the training inputs are applied to the network, and the outputs are computed from the individual's link weights and biases. If the network has several outputs, the mean squared error (MSE) or sum squared error (SSE) over the training set is computed as the fitness.


Mutation: in this setting, mutation can change the link weights and biases, and it can also change the network structure itself, by adding or deleting units.

5-5-3 Real-World Applications

The table lists a selection of real-world EP applications.

5-6 Questions

3) Explain the components of the EP flowchart.
4) Give an example of an FSM.
5) How do the different mutation noise distributions differ in their effect? Give an example.
6) Write down the EP mutation equation and explain its factors.
7) What is the difference between evolutionary (dynamic) and self-adaptive mutation? Explain by writing down their update equations.
8) How could EP be used to recognize a Persian letter?

9) *Explain why mutational noise should have a mean of 0. 10) *Discuss the merits of the following approach to calculate dynamic strategy parameters:

where ŷ(t) is the best individual of the current generation.

5-7 Advanced Topics

• Constraint handling approaches
• Multi-objective optimization and niching
• Dynamic environments
• Hybrids with particle swarm optimization
• Accelerated evolutionary programming


1. Mutation Noise Distributions

• Uniform (equation 6.4): xmin and xmax provide lower and upper bounds for the values of Δxij. It is important that E[Δxij] = 0, to prevent any bias induced by the noise; here E[•] denotes the expectation operator.
• Gaussian, Cauchy, Lévy, and exponential distributions.
• Chaos: a chaotic distribution is used to sample the noise, where R(0, 1) represents a chaotic sequence within the space (−1, 1); the sequence can be generated with a chaotic map.
• Combined distributions: a linear combination of Gaussian and Cauchy distributions (equation 6.20).

Comparing Cauchy and Gaussian mutation noise:

• The Cauchy distribution has heavier tails, so it generates larger mutation steps and therefore provides more exploration than the Gaussian.
• Near zero, the Cauchy density is lower than the Gaussian, so Gaussian noise provides better local refinement (exploitation) than Cauchy noise.
• A combination of the two therefore balances exploration and exploitation.
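The heavier tails of the Cauchy distribution can be checked numerically (a quick illustration; the threshold and sample size are arbitrary choices):

```python
import math
import random

def cauchy_sample(rng):
    # Standard Cauchy deviate via the inverse CDF.
    return math.tan(math.pi * (rng.random() - 0.5))

def tail_fraction(sampler, n=10000, threshold=3.0, seed=42):
    """Fraction of |samples| exceeding the threshold."""
    rng = random.Random(seed)
    return sum(abs(sampler(rng)) > threshold for _ in range(n)) / n

gauss_tail = tail_fraction(lambda rng: rng.gauss(0, 1))
cauchy_tail = tail_fraction(cauchy_sample)
```

For a standard Gaussian, P(|X| > 3) ≈ 0.003, while for a standard Cauchy it is about 0.20, so Cauchy mutation produces large exploratory steps far more often.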


6. Evolution Strategies

(Reference book, Chapter 12. Coding: real-valued; main operator: mutation with self-adapted step sizes.)

ES was introduced by Rechenberg in the 1960s and further developed by Schwefel. Rechenberg reasoned that biological evolution is itself an optimization process, and modeled not only the evolution of candidate solutions but also the evolution of the parameters that drive it; ES is therefore described as modeling "the evolution of evolution".

Generic Evolution Strategy Algorithm: a generic framework for the implementation of an ES is given in Algorithm 7.1. Parameters μ and λ respectively indicate the number of parents and the number of offspring.

Evolution Strategy Algorithm (coding: real)
Set the generation counter, t = 0;
Initialize the strategy parameters;
Create and initialize the population, C(0), of μ individuals;
for each individual, xi(t) ∈ C(t) do
    Evaluate the fitness, f(xi(t));
end
while stopping condition(s) not true do
    for i = 1, . . . , λ do
        Choose ρ ≥ 2 parents at random;
        Create offspring through application of crossover operator on parent genotypes and strategy parameters;
        Mutate offspring strategy parameters and genotype;
        Evaluate the fitness of the offspring;
    end
    Select the new population, C(t + 1);


t = t + 1; end

An ES individual encodes both the genetic material (the candidate solution) and a set of strategy parameters that control mutation; the ES evolves the solution and its strategy parameters together.
ES variants differ from one another mainly in how the strategy parameters — and hence the mutation step sizes — are represented and self-adapted.
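The generic loop above can be made concrete with a minimal sketch — illustrative only, assuming a sphere objective, a single log-normally self-adapted step size per individual, and no recombination (all names are my own):

```python
import random, math

def es_mu_plus_lambda(fitness, n, mu=5, lam=20, generations=150, sigma0=1.0):
    """Minimal (mu + lambda)-ES with one self-adapted step size per individual."""
    tau = 1.0 / math.sqrt(n)                                  # learning rate
    pop = [([random.uniform(-5.0, 5.0) for _ in range(n)], sigma0)
           for _ in range(mu)]
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            x, sigma = random.choice(pop)                     # pick a parent
            s2 = sigma * math.exp(tau * random.gauss(0, 1))   # mutate strategy parameter first
            x2 = [xj + s2 * random.gauss(0, 1) for xj in x]   # then mutate the genotype
            offspring.append((x2, s2))
        # (mu + lambda) selection: parents compete with their offspring
        pop = sorted(pop + offspring, key=lambda ind: fitness(ind[0]))[:mu]
    return pop[0]

random.seed(1)
best_x, best_sigma = es_mu_plus_lambda(lambda v: sum(vj * vj for vj in v), n=3)
```

Mutating the strategy parameter before the genotype is the key detail: the offspring's quality then reflects the quality of its step size, so good step sizes are inherited.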

6-1 The mutation operator
a) Mutation noise distribution

Different ES variants use different mutation noise distributions, although the Gaussian distribution is by far the most common. A deviation (standard deviation) strategy parameter σ is maintained for each component, so an individual is represented as

Xi = ((x_{i,1}, . . . , x_{i,nx}), (σ_{i,1}, . . . , σ_{i,nx}))

With a single deviation shared by all components, the mutation distribution is spherical (a circle in two dimensions); with one deviation per component it becomes an axis-aligned ellipse, which can match the local shape of the search landscape better.
In principle, the best step sizes follow from second-order (curvature) information about the landscape: convergence is fastest when the step sizes reflect the local curvature of the objective, and the optimal mutation step size is proportional to the inverse of the Hessian,

where H is the Hessian matrix.


• Since the Hessian is generally unknown and expensive to compute, Schwefel [769] proposed that the covariance matrix, C, described by the deviation strategy parameters of the individual, be used as additional information to determine optimal step sizes and directions. In this case,

mutations are sampled from the multivariate distribution N(0, C) rather than from independent one-dimensional distributions, which allows correlated mutations, i.e. ellipses rotated with respect to the coordinate axes. Because the covariance matrix is positive definite, it can be decomposed as C = BD²Bᵀ. Two representations are used for the extra strategy parameters that define C:
1. Covariances are given by rotation angles which describe the rotations that need to be

done to transform an uncorrelated mutation vector to a correlated vector. If �i(t) denotes the vector of rotational angles for individual i, then individuals are represented as the triplet,

Xi = ((x_{i,1}, . . . , x_{i,nx}), (σ_{i,1}, . . . , σ_{i,nx}), (ω_{i,1}, . . . , ω_{i,nx(nx−1)/2}))

2. Individual step sizes and one correlation angle per coordinate pair (one ω per (i, j) pair), with the covariance matrix C defined as:
– cii = σi²
– cij = 0 if i and j are not correlated
– cij = ½ (σi² − σj²) tan(2ωij) if i and j are correlated

The rotational angles are used to represent the covariances among the nx genetic variables in the genetic vector xi. Because the covariance matrix is symmetric, a vector can be used to represent the rotational angles instead of a matrix. The rotational angles are used to calculate an orthogonal rotation matrix, R(ωi), as

Each elementary rotation matrix R(ωij) equals the identity matrix except for four entries,

rll = rjj = cos(ωij),  rlj = −rjl = −sin(ωij),

i.e.

          ⎡ 1                              0 ⎤
          ⎢     cos(ωij)  ⋯  −sin(ωij)       ⎥
R(ωij) =  ⎢        ⋮       ⋱      ⋮          ⎥
          ⎢     sin(ωij)  ⋯   cos(ωij)       ⎥
          ⎣ 0                              1 ⎦

and the full rotation matrix R(ωi) is the product of the nx(nx − 1)/2 elementary rotation matrices, one per coordinate pair.
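The construction above is easy to check numerically — a small NumPy sketch (illustrative, not from the text) builds elementary rotation matrices and applies their product to an axis-aligned Gaussian step to obtain a correlated mutation:

```python
import numpy as np

def rotation_matrix(n, l, j, angle):
    """Elementary rotation in the (l, j) plane: identity except
    r_ll = r_jj = cos(angle) and r_lj = -r_jl = -sin(angle)."""
    R = np.eye(n)
    R[l, l] = R[j, j] = np.cos(angle)
    R[l, j] = -np.sin(angle)
    R[j, l] = np.sin(angle)
    return R

def correlated_mutation(x, sigmas, angles, rng):
    """Correlated mutation step: rotate an axis-aligned Gaussian step by the
    product of the elementary rotations (one per correlated coordinate pair)."""
    n = len(x)
    R = np.eye(n)
    for (l, j), w in angles.items():
        R = R @ rotation_matrix(n, l, j, w)
    return np.asarray(x) + R @ (np.asarray(sigmas) * rng.standard_normal(n))

rng = np.random.default_rng(0)
mutant = correlated_mutation([0.0, 0.0, 0.0], [1.0, 0.5, 0.1],
                             {(0, 1): 0.3, (1, 2): -0.2}, rng)
```

Because each elementary matrix is orthogonal, so is their product, and the resulting mutation distribution is exactly the axis-aligned ellipse rotated by the given angles.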

6-1-1 Mutation
Let nσ denote the number of deviation parameters used, and nω the number of rotational angles. The following cases have been used:


1) A single deviation parameter for the entire genotype (nσ = 1):

Xi = ((x_1, . . . , x_{nx}), σ)
σ′ = σ · exp(τ0 · N(0, 1))
x′_j = x_j + σ′ · N_j(0, 1)
X′_i = ((x′_1, . . . , x′_{nx}), σ′)

with τ0 = 1/√nx.

2) One deviation parameter per component of the genotype (nσ = nx):

Xi = ((x_1, . . . , x_{nx}), (σ_1, . . . , σ_{nx}))
σ′_j = σ_j · exp(τ′ · N(0, 1) + τ · N_j(0, 1))
x′_j = x_j + σ′_j · N_j(0, 1)
X′_i = ((x′_1, . . . , x′_{nx}), (σ′_1, . . . , σ′_{nx}))

Here τ′ and τ are learning rates: τ′ is the global learning rate, using a single realisation N(0, 1) shared by all components, while τ is the local learning rate, using a separate realisation N_j(0, 1) per component. Suggested by Schwefel:

τ′ = 1/√(2nx),  τ = 1/√(2√nx)

3) One deviation parameter per component plus rotational angles (correlated mutations):
nσ = nx, nω = nx(nx − 1)/2, where in addition to the deviations, rotational angles are used. The elliptical mutation distribution is rotated with respect to the coordinate axes as illustrated in Figure 7.1(c). Such rotations allow better approximation of the contours of the search space. Deviation parameters and rotational angles are updated as

Xi = ((x_1, . . . , x_{nx}), (σ_1, . . . , σ_{nx}), (ω_1, . . . , ω_{nx(nx−1)/2}))
σ′_j = σ_j · exp(τ′ · N(0, 1) + τ · N_j(0, 1))
ω′_j = ω_j + β · N_j(0, 1)
x′ = x + N(0, C′), with c′_ii = σ′_i² and c′_ij = ½ (σ′_i² − σ′_j²) tan(2ω′_ij)
X′_i = ((x′_1, . . .), (σ′_1, . . .), (ω′_1, . . .))

with

τ′ = 1/√(2nx),  τ = 1/√(2√nx),  β ≈ 5°

Boundary rules keep the parameters valid:
• if σ′_i < ε0, set σ′_i = ε0;
• if |ω′_j| > π, set ω′_j = ω′_j − 2π sign(ω′_j).


Adding the rotational angles improves flexibility, but at the cost of a quadratic increase in computational complexity.

4) An intermediate number of deviation parameters:
1 < nσ < nx. This approach allows for different degrees of freedom. For all j > nσ, the deviation σ_{nσ} is used.

6-2 Self-Adaptation Strategies
The self-adaptation strategies available for EP can also be used here, most notably log-normal self-adaptation and additive methods. In addition, the following approaches have been proposed specifically for ES:

• The 1/5 success rule of Rechenberg: if the fraction of successful mutations over a window of iterations exceeds 1/5, the step size is increased; otherwise it is decreased. This rule resets σ after every k iterations by
– σ = σ / c if ps > 1/5
– σ = σ · c if ps < 1/5
– σ = σ if ps = 1/5

where ps is the fraction of successful mutations and 0.8 ≤ c ≤ 1.
• Schwefel proposed a windowed variant: once t > 10nx, then whenever t mod nx = 0, the number nm of successful mutations over the interval [t − 10nx, t − 1] is counted, and the deviation parameter is scaled up or down by a constant factor accordingly, where α = 0.85.
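The 1/5 rule is easiest to see inside a (1+1)-ES — a minimal sketch, with illustrative assumptions (sphere objective, window k = 20, factor c = 0.9; all names are my own):

```python
import random, math

def one_plus_one_es(fitness, x0, sigma=1.0, c=0.9, k=20, iters=2000):
    """(1+1)-ES with Rechenberg's 1/5 success rule: every k mutations,
    grow sigma if more than 1/5 succeeded, shrink it otherwise."""
    x, fx = list(x0), fitness(x0)
    successes = 0
    for t in range(1, iters + 1):
        y = [xi + sigma * random.gauss(0, 1) for xi in x]
        fy = fitness(y)
        if fy < fx:                       # successful mutation
            x, fx = y, fy
            successes += 1
        if t % k == 0:                    # adapt the step size every k iterations
            ps = successes / k
            if ps > 1 / 5:
                sigma /= c                # many successes: take bigger steps
            elif ps < 1 / 5:
                sigma *= c                # few successes: take smaller steps
            successes = 0
    return x, fx

random.seed(2)
x_best, f_best = one_plus_one_es(lambda v: sum(vi * vi for vi in v), [3.0, -2.0])
```

Dividing by c < 1 enlarges σ, so a high success rate (steps too cautious) increases the step size, while a low success rate shrinks it.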

In other variants of the algorithm, the strategy parameters are simply kept constant: σj(t) = σ, j = 1, . . . , nx.
Reinforcement learning: Lee et al. [507] and Müller et al. [614] proposed that reinforcement learning be used to adapt strategy parameters, as follows:

where γi(t) is the sum of temporal rewards over the last nγ generations for individual i, i.e.

1) Lee et al. [507] propose the following reward:

Δf(xi(t)) = f(xi(t)) − f(xi(t − 1)) (7.21)

2) Müller et al. [614] propose the following rewards: γij(t) = f(xi(t)) − f(xi(t − Δt)), with 0 < Δt < t.

This approach bases rewards on changes in phenotypic behavior, as quantified by the fitness function. The more an individual improves its current fitness, the greater the reward. On the other hand the worse the fitness of the individual becomes, the greater the penalty for that individual.

• γij(t) = sign(f(xi(t)) − f(xi(t − Δt))). This scheme results in +1, 0, −1 rewards.
• γij(t) = ||xi(t) − xi(t − Δt)|| sign(f(xi(t)) − f(xi(t − Δt))). Here the reward (or punishment) is proportional to the step size in decision (genotypic) space.

c) Recombination: parents are chosen at random. If ρ = 2 parents take part, the operator is called local crossover; if 2 < ρ ≤ μ, it is called global crossover. The examples below show how the recombination operator combines the parents.

1) Local, discrete recombination, where


2) Global, intermediate recombination, which is similar to the local recombination above, except that the index i2 is replaced with rj ∼ U(1, μ), i.e. a new random parent is drawn for each component. Alternatively, the average of the parents can be calculated to form the offspring [62],

d) Selection: survivor selection in ES is deterministic, and variants differ in whether offspring compete with their parents:
1) (μ + λ)-ES: the next generation of μ individuals is selected from the union of the μ parents and their λ offspring.
2) (μ, λ)-ES: the next generation of μ individuals is selected from the λ offspring only.
3) (μ, κ, λ)-ES: each individual has a lifespan κ; an individual is discarded once it has survived more than κ generations.

6-3 Evolution Strategy Variants
• Incremental Evolution Strategies
• Surrogate Evolution Strategy: used when the fitness function is computationally expensive to evaluate.
• Directed Mutation
• Evolution Strategies with Directed Variation
• Constraint Handling Approaches
• Multi-Objective Optimization
• Dynamic and Noisy Environments

6-4 Assignments
1) Draw the flowchart of ES and explain its components.
2) Describe the mutation noise distributions that have been proposed for evolution strategies.
3) Explain the ES mutation equation and its components, and write code implementing it.

(Averaged-parent global recombination: x̄_j = (1/μ) Σ_{i=1}^{μ} x_{r_i, j}.)


7. Simulated Annealing

Characteristics of this approach:
• It works with a single candidate solution rather than a population.
• Worse solutions may be accepted, with a probability that decreases over time.
SA is therefore a unary search: at every step, one existing solution is replaced by one of its neighbors.

Physical origin of the method: in 1953, Metropolis et al. introduced a Monte Carlo method, grounded in statistical mechanics, for simulating the behavior of collections of atoms; the idea of simulating the annealing of metal crystals — the slow, controlled cooling toward a low-energy state — comes from this work.
Kirkpatrick et al. later proposed the Simulated Annealing (SA) algorithm for optimization in the early 1980s and applied it to many optimization problems; Cerny independently applied the same idea to the TSP.

In metallurgy and material science, annealing is a heat treatment of material with the goal of altering its properties such as hardness. Metal crystals have small defects, dislocations of ions which weaken the overall structure. By heating the metal, the energy of the ions and, thus, their diffusion rate is increased. Then, the dislocations can be destroyed and the structure of the crystal is reformed as the material cools down and approaches its equilibrium state. When annealing metal, the initial temperature must not be too low and the cooling must be done sufficiently slowly so as to avoid the system getting stuck in a meta-stable, non-crystalline, state representing a local minimum of

energy.
The probability that the material settles into a crystalline configuration pos follows the Boltzmann distribution,

P(pos) ∝ exp(−E(pos) / (kB · T))

where E(pos) is the energy of the configuration pos, T is the temperature measured in Kelvin, and kB is Boltzmann's constant.


7-1 The SA Algorithm
SA is built from the following components:
a) Energy function: for an optimization problem, the energy E(pos) is simply the objective (fitness) function to be minimized.
b) Move (neighborhood) generator: the next candidate is selected at random from a neighborhood of the current solution.
c) The function P: the acceptance probability function.


The probability of moving from configuration posi to posi+1 is derived from the annealing analogy:

P(ΔE) = 1 if ΔE ≤ 0,  P(ΔE) = exp(−ΔE / T) otherwise,

where ΔE is the change in energy.
1. If ΔE < 0 (the move improves the energy), P = 1 and the move is accepted unconditionally.
2. If ΔE > 0 (the move is worse), it is accepted with a probability that depends on the temperature T.
In practice a uniform random number r ∈ U[0, 1] is generated for each candidate move, and the move is accepted if r < P(ΔE).

• When T is large, P(ΔE) is close to 1 and almost all candidate states are accepted, regardless of quality; this allows the algorithm to escape local minima and keeps it from being trapped by its initial position. As T → 0, the probability of accepting worse moves drops to zero.
• As T decreases, the acceptance probability of moves that increase the objective value shrinks, so the search gradually turns into a descent toward a (local or, ideally, global) minimum.

• when T becomes 0, the procedure will reduce to the greedy algorithm—which makes the move only if it goes downhill.

d) Cooling schedule: the way the temperature is gradually decreased determines the behavior of SA; it is implemented as a function getTemperature(t).
• Tstart, the initial temperature, must be well above zero and is decreased gradually toward zero. Cooling too quickly risks freezing the search in a local minimum; cooling slowly yields a more thorough search at the cost of longer run time.
Many temperature schedules have been proposed, some even tuned using genetic algorithms. A simple, widely used geometric schedule is:

1. T(n) = (1 − α) · T(n − 1), with 0 < α < 1
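The acceptance rule and the geometric schedule combine into a short sketch — illustrative only (the test function is the one from the hand-coded example later in this chapter; parameter values are my own assumptions):

```python
import random, math

def simulated_annealing(energy, x0, neighbor, t_start=15.0, alpha=0.005, iters=10000):
    """SA with the Metropolis acceptance rule and geometric cooling
    T(n) = (1 - alpha) * T(n - 1)."""
    x, ex = x0, energy(x0)
    best, e_best = x, ex
    T = t_start
    for _ in range(iters):
        y = neighbor(x)
        dE = energy(y) - ex               # change in energy
        # improving moves always accepted; worse moves with prob exp(-dE/T)
        if dE <= 0 or random.random() < math.exp(-dE / T):
            x, ex = y, ex + dE
            if ex < e_best:
                best, e_best = x, ex      # keep the best state ever visited
        T *= (1 - alpha)                  # geometric cooling
    return best, e_best

random.seed(0)
f = lambda x: x + 10 * math.sin(5 * x) + 7 * math.cos(4 * x) + math.sin(x)
step = lambda x: min(9.0, max(0.0, x + random.uniform(-0.5, 0.5)))
x_best, f_best = simulated_annealing(f, random.uniform(0.0, 9.0), step)
```

Tracking the best state ever visited costs nothing and guards against the final greedy phase settling in a worse basin than one visited earlier.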

Barrier avoidance


It is possible for the search to settle in a local minimum separated from the global minimum by a high barrier that single moves cannot cross. Strategies have been proposed for escaping such local minima.
One of them is restarting: when the current solution has stopped improving for a while, the search is restarted, for example from the best solution found so far or from a fresh random point. This strategy is commonly used when the objective value remains large and the search appears stuck.

7-2 Simulation in MATLAB

1- A Simple Objective Function
We want to minimize a simple function of two variables:

function y = simple_objective(x)
y = (4 - 2.1*x(1)^2 + x(1)^4/3)*x(1)^2 + x(1)*x(2) + (-4 + 4*x(2)^2)*x(2)^2;

To minimize our objective function using the SIMULANNEALBND function:

ObjectiveFunction = @simple_objective;
X0 = [0.5 0.5]; % Starting point
[x,fval,exitFlag,output] = simulannealbnd(ObjectiveFunction,X0)

Optimization terminated: change in best function value less than options.TolFun.
x = -0.0899 0.7127
fval = -1.0316
exitFlag = 1
output = iterations: 1228 funccount: 1239 message: [1x80 char]


rngstate: [1x1 struct] problemtype: 'unconstrained' temperature: [2x1 double] totaltime: 0.5313 The first two output arguments returned by SIMULANNEALBND are x, the best point found, and fval, the function value at the best point. A third output argument, exitFlag returns a flag corresponding to the reason SIMULANNEALBND stopped.

2- Bound Constrained Minimization
SIMULANNEALBND can be used to solve problems with bound constraints. The lower and upper bounds are passed to the solver as vectors:

lb = [-64 -64];
ub = [64 64];

Now we can rerun the solver with the lower and upper bounds as input arguments:

[x,fval,exitFlag,output] = simulannealbnd(ObjectiveFunction,X0,lb,ub);
fprintf('The number of iterations was : %d\n', output.iterations);
fprintf('The number of function evaluations was : %d\n', output.funccount);
fprintf('The best function value found was : %g\n', fval);

Optimization terminated: change in best function value less than options.TolFun.
The number of iterations was : 1707
The number of function evaluations was : 1720
The best function value found was : -1.03163

3- Minimizing Using THRESHACCEPTBND
To minimize our objective function using the THRESHACCEPTBND function:

[x,fval,exitFlag,output] = threshacceptbnd(ObjectiveFunction,X0,lb,ub);
fprintf('The number of iterations was : %d\n', output.iterations);
fprintf('The number of function evaluations was : %d\n', output.funccount);
fprintf('The best function value found was : %g\n', fval);

Optimization terminated: change in best function value less than options.TolFun.
The number of iterations was : 1531
The number of function evaluations was : 1552
The best function value found was : -0.635985

A. A hand-coded SA example
Use SA to find the minimum of

f(x) = x + 10*sin(5*x) + 7*cos(4*x) + sin(x)

on the interval [0, 9]; the run should locate the minimum near x = 0.8.


B. Assignments
1- Write the SA algorithm and explain the role of its parameters.


8. Differential Evolution

(Reference book, Chapter 13. Main operator: differential mutation.)

DE was proposed by Storn and Price [696, 813] in 1995. The DE algorithm is structured as follows:

General Differential Evolution Algorithm
Coding: Real
Set the generation counter, t = 0;
Initialize the mutation parameter, β, and the crossover parameter, pr;
Create and initialize the population, C(0), of ns individuals;
while stopping condition(s) not true do

for each individual, xi(t) ∈ C(t) do Evaluate the fitness, f(xi(t)); New mutation:

Select target vector; Randomly generate difference vectors Create the trial vector, ui(t) ;

New crossover: Create an offspring, x'i(t), by recombining xi(t) & ui(t); if f(x'i(t)) is better than f(xi(t)) then Add x'i(t) to C(t + 1); end else

Add xi(t) to C(t + 1); end

end end Return the individual with the best fitness as the solution;

Following the algorithm above, the coding is real-valued and, unlike in a GA, mutation is applied before crossover. A whole family of variants of this algorithm exists, denoted DE/x/y/z, where x specifies how the target vector is selected, y the number of difference vectors used, and z the crossover method.

8-8

Differential evolution has been developed for optimizing continuous-valued parameters. However, a simple discretization procedure can be used to convert the floating-point solution vectors into discrete-valued vectors

8-1 The operators of differential evolution
a) Target vector selection: the target vector is selected either at random or as an elite (best) individual from the current population.
b) Difference vectors: for each difference vector, at least two further population members, distinct from each other and from the target, are chosen at random; their difference, e.g. x_{i2} − x_{i3}, forms the vector. If more difference vectors are used, correspondingly more distinct individuals must be drawn from {1, . . . , ns}.

c) Trial vector: for each individual xi, a target xi1 is chosen according to the strategy (random or best), and the trial vector is computed as

ui(t) = x_{i1}(t) + β (x_{i2}(t) − x_{i3}(t))

In this formula, xi1 is the target vector, while xi2 and xi3 are selected at random:

i2, i3 ∼ U(1, ns), with i ≠ i1 ≠ i2 ≠ i3,

where β ∈ (0, ∞) is the scale factor, controlling the amplification of the differential variation.

d) Crossover: the offspring is created by stochastically combining elements of the parent with elements of the trial vector:

8.3 where xij(t) refers to the j-th element of the vector xi(t), and J is the set of element indices that will undergo perturbation (or in other words, the set of crossover points).

Note: many schemes exist for determining the set J, but the following two are the most widely used.

• Binomial crossover


The crossover points are randomly selected from the set of possible crossover points, {1, 2, . . . , nx}, where nx is the problem dimension. In this algorithm, pr is the probability that the considered crossover point will be included. The larger the value of pr, the more elements of the trial vector are used to produce the offspring, and the fewer elements of the parent vector.
Alg. 8.1 DE Binomial Crossover for Selecting Crossover Points

• Exponential crossover: from a randomly selected index, the exponential crossover operator selects a sequence of adjacent crossover points, treating the list of potential crossover points as a circular array. The pseudocode in Algorithm 8.2 shows that at least one crossover point is selected; from this index, the next point is added until U(0, 1) ≥ pr or |J| = nx.

Alg. 8.2 DE Exponential Crossover for Selecting Crossover Points
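The mutation, binomial crossover, and greedy selection steps above fit in a short sketch — a minimal DE/rand/1/bin, illustrative only (function and parameter names are my own):

```python
import random

def de_rand_1_bin(fitness, n, ns=30, beta=0.5, pr=0.9, generations=200, bounds=(-5.0, 5.0)):
    """DE/rand/1/bin: random target vector, one difference vector, binomial crossover."""
    lo, hi = bounds
    pop = [[random.uniform(lo, hi) for _ in range(n)] for _ in range(ns)]
    fit = [fitness(x) for x in pop]
    for _ in range(generations):
        for i in range(ns):
            # three mutually distinct indices, all different from i
            i1, i2, i3 = random.sample([k for k in range(ns) if k != i], 3)
            # differential mutation: u = x_i1 + beta * (x_i2 - x_i3)
            u = [pop[i1][j] + beta * (pop[i2][j] - pop[i3][j]) for j in range(n)]
            # binomial crossover: element j comes from u with probability pr;
            # j_rand guarantees at least one crossover point
            j_rand = random.randrange(n)
            child = [u[j] if (j == j_rand or random.random() < pr) else pop[i][j]
                     for j in range(n)]
            fc = fitness(child)
            if fc < fit[i]:               # greedy one-to-one survivor selection
                pop[i], fit[i] = child, fc
    b = min(range(ns), key=lambda i: fit[i])
    return pop[b], fit[b]

random.seed(3)
x_best, f_best = de_rand_1_bin(lambda x: sum(xj * xj for xj in x), n=5)
```

The one-to-one replacement (child competes only against its own parent) is what makes DE elitist without any explicit sorting of the population.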

Remarks
• The spread of the initial population over the solution space matters, since the initial individuals carry all the information the algorithm starts with. With few individuals, the average distance between them is large; the larger the initial population, the smaller this average distance becomes.
• Over time the population contracts toward good solutions, and the distances between individuals shrink.
• When the distances between individuals are large, the step sizes β(x_{i2} − x_{i3}) are large and the search samples the solution space coarsely; when the distances are small, the step sizes are small and the search performs fine-grained local refinement.
• If the population is large, then by the central limit theorem the distribution of the difference vectors approaches a normal distribution.


• If the individuals used to build difference vectors are selected uniformly, the mean of the difference vectors is zero. This means the search is unbiased: the population does not drift collectively in any direction except through the greedy survivor selection, and as the population contracts, the effective mutation distribution shrinks with it, yielding ever smaller steps.
• Scaling factor: the scaling factor, β ∈ (0, ∞), controls the amplification of the differential variations, (x_{i2} − x_{i3}).
Small values of β slow convergence, while large values may cause the search to overshoot good solutions; a suitable intermediate value should therefore be chosen.
As the population size increases, β should be decreased: a large population already samples the space widely, so less amplification is needed for exploration.

• Empirical results suggest that large values for both ns and � often result in premature convergence, and that � = 0.5 generally provides good performance .

Geometric illustration of the DE operators
Figure 8.1(a) illustrates the mutation operator of the DE as described in Section 8.1.2. The optimum is indicated by x*, and it is assumed that β = 1.5.

The crossover operator is illustrated in Figure 8.1(b).


For this illustration the offspring consists of the first element of the trial vector, ui(t), and the second element of the parent, xi(t).

8-2 DE/x/y/z
DE algorithms are organized into the family DE/x/y/z, where:
• x denotes how the target vector is selected
• y denotes the number of difference vectors
• z denotes the crossover method used
For example, DE/best/1/bin selects the best individual as the target vector, uses one difference vector, and applies binomial crossover. The variants below follow the same convention:

DE/best/1/z: parent (target): the best individual x̂(t); number of difference vectors: 1; crossover method: z — any of the crossover methods can be used.
DE/x/nv/z: nv difference vectors are used.
DE/rand-to-best/nv/z: uses two target vectors, one selected at random and one being the best individual; a coefficient balances the two, i.e. controls how greedy the search is.


DE/current-to-best/1+nv/z: With this strategy, the parent is mutated using at least two difference vectors. One difference vector is calculated from the best vector and the parent vector, while the rest of the difference vectors are calculated using randomly selected vectors:

Choosing between the strategies
Empirical comparisons indicate that DE/rand/1/bin has good exploration, while DE/current-to-best/2/bin has good, fast convergence. A combined scheme has been proposed to unite the advantages of the two: each offspring is created by one of the two strategies, chosen probabilistically. If ps,1 is the probability that DE/rand/1/bin will be applied, then ps,2 = 1 − ps,1 is the probability that DE/current-to-best/2/bin will be applied. Then,

where ns,1 and ns,2 are respectively the number of offspring that survive to the next generation for DE/rand/1/bin, and nf,1 and nf,2 represent the number of discarded offspring for each strategy. The more offspring that survive for a specific strategy, the higher the probability for selecting that strategy for the next generation.

8 -3 Applications Table 1 Applications of Differential Evolution

A sample MATLAB implementation of DE, called DeMat, is available. In this package, deopt is the main program and can run several variants of DE. For each problem, one writes a driver script like Rundepot, which holds the run parameters, PlotIt, which displays the progress of the run, and objfun, which implements the fitness function. The package contains several worked examples, and new problems can be added in the same fashion. The DE variant is selected in the program as follows:

I_strategy = 1: DE/rand/1 — the classical version of DE.
I_strategy = 2: DE/local-to-best/1 — a version which has been used by quite a number of scientists; attempts a balance between robustness and fast convergence.
I_strategy = 3: DE/best/1 with jitter — tailored for small population sizes and fast convergence; dimensionality should not be too high.
I_strategy = 4: DE/rand/1 with per-vector-dither — classical DE with dither, to become even more robust.
I_strategy = 5: DE/rand/1 with per-generation-dither — classical DE with dither, to become even more robust; choosing F_weight = 0.3 is a good start here.
I_strategy = 6: DE/rand/1 either-or-algorithm — alternates between differential mutation and three-point recombination.

8-4 Assignments
1. Draw the flowchart of DE and explain its components.
2. Describe the mutation/crossover strategies of DE.
3. Explain the DE/x/y/z notation.

4. *For the DE/rand-to-best/y/z strategies, suggest an approach to balance exploration and exploitation.

8-5 Advanced Topics
• Particle Swarm Optimization Hybrids
• Population-Based Differential Evolution
• Self-Adaptive Differential Evolution
• Angle Modulated Differential Evolution
• Binary Differential Evolution
• Constraint Handling Approaches
• Multi-Objective Optimization
• Dynamic Environments


9. Cultural Evolution

Some evolutionary algorithms can be extended with cultural knowledge, and adding this mechanism can improve their performance by biasing the search toward promising regions.

Introduction: cultural evolution (CE), based on the principles of human social evolution, was developed by Reynolds in the early 1990s as an approach to bias the search process with prior knowledge about the domain as well as knowledge gained during the evolutionary process.

Genetic evolution: in biological evolution, traits are transmitted only through reproduction, as in the evolutionary algorithms discussed so far.
Cultural evolution: beyond genetic evolution, cultural factors also shape the behavior of a population. Culture has been characterized in several ways:
• a set of preferences and constraints that is transmitted from one generation to the next;
• the body of experiential knowledge that a population acquires over time;
• the collective mental programming that distinguishes one population from others;
• cultural evolution strongly resembles biological evolution, but individuals can deliberately influence it, so knowledge spreads much faster than genetic traits do;
• from the viewpoint of evolutionary algorithms, culture is the shared set of beliefs that the members of the population can access and are influenced by; the culture of the population itself evolves over time.

Structure of cultural evolution: a cultural algorithm (CA) is obtained by adding a belief space to an ordinary evolutionary algorithm.
a) The structure of the belief space
• Beliefs are population-level knowledge, which may be carried over from previous generations or previous runs.
• The belief space accumulates the information the population has discovered during the search.
• Beliefs prune unpromising regions from the search.
• A communication protocol is required between the population space and the belief space so that the two components can influence each other.


• Belief space representations: several structures have been used, including a lattice to store schemata [716], vector representations [720], fuzzy systems [719, 725], ensemble structures [722], and hierarchical belief spaces.
b) Interaction between the belief space and the population
• Acceptance function: determines which individuals are allowed to influence the belief space.
• Influence functions: determine how the beliefs affect the creation of new individuals.
c) Updating the belief space
• Adjusting the belief space: the beliefs are updated according to the accepted individuals; recombination- and mutation-like operators may also be used here.

A pseudocode cultural algorithm is given in Algorithm 9.1 and illustrated in Figure 9.1.
Algorithm 9.1 Cultural Algorithm
Set the generation counter, t = 0;
Create and initialize the population space, C(0);
Create and initialize the belief space, B(0);
while stopping condition(s) not true do

Evaluate the fitness of each xi(t) ∈ C(t); Adjust (B(t), Accept (C(t))); Variate (C(t), Influence (B(t))); t = t + 1; Select the new population;

End
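The Accept/Adjust/Influence cycle above can be sketched concretely — a minimal Python skeleton, illustrative only (the interval-based influence rule, the elite size, and all names are my own assumptions), with a situational and a normative knowledge component:

```python
import random

def cultural_algorithm(fitness, n, pop_size=30, n_accept=6, generations=200, bounds=(-5.0, 5.0)):
    """Skeleton of a cultural algorithm: the accepted elite adjusts a belief space
    holding situational knowledge (best solution) and normative knowledge
    (per-dimension intervals of good values); the intervals set mutation step sizes."""
    lo, hi = bounds
    pop = [[random.uniform(lo, hi) for _ in range(n)] for _ in range(pop_size)]
    belief = {"best": None, "intervals": [(lo, hi)] * n}      # belief space B(0)
    for _ in range(generations):
        pop.sort(key=fitness)
        elite = pop[:n_accept]                                # Accept(C(t))
        # Adjust(B(t)): update situational and normative knowledge from the elite
        belief["best"] = elite[0][:]
        belief["intervals"] = [(min(x[j] for x in elite), max(x[j] for x in elite))
                               for j in range(n)]
        # Influence(B(t)): offspring step sizes follow the normative interval widths
        new_pop = [x[:] for x in elite]
        while len(new_pop) < pop_size:
            parent = random.choice(elite)
            child = [parent[j] + max(b - a, 1e-3) * random.gauss(0, 1)
                     for j, (a, b) in enumerate(belief["intervals"])]
            new_pop.append(child)
        pop = new_pop
    return min(pop, key=fitness), belief

random.seed(4)
x_best, belief = cultural_algorithm(lambda x: sum(xi * xi for xi in x), n=3)
```

As the elite converges, the normative intervals shrink, so the mutation steps automatically shrink with them — the belief space is doing the step-size control.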


9-1 Knowledge Components of the Belief Space
The belief space holds at least two components:
a) Situational knowledge: a situational knowledge component keeps track of the best solutions found at each generation.
b) Normative knowledge: a normative knowledge component holds information about promising parts of the search space — e.g. the range of good areas to search in each dimension. It is represented as a set of intervals, one per dimension,

where, for each dimension, the following information is stored:

c) Optional components:
• A domain knowledge component (an archive of elites since the start): similar to the situational knowledge component in that it stores the best positions found, but the knowledge is not re-initialized at each generation; instead it keeps an archive of best solutions since evolution started — very similar to the hall-of-fame used in coevolution.
• A history knowledge component, used in problems where search landscapes may change. This component maintains information about sequences of environmental changes; for each environmental change, the following information is stored: the current best solution, the directional change for each dimension, and the current change distance.
• A topographical knowledge component, which maintains a multi-dimensional grid representation of the search space. Information is kept about each cell of the grid, e.g. the frequency of individuals that occupy the cell; such frequency information can be used to improve exploration by forcing the mutation direction toward unexplored areas.


9-2 Acceptance Functions
• A fraction of the best individuals of the EC population is selected; only these individuals may update the information in the belief space.
• The number of accepted individuals can also be computed dynamically,

with the schedule parameter in [0, 1]. Using this approach, the number of individuals used to adjust the belief space is initially large, with the number decreasing exponentially over time. Other simulated-annealing based schedules can be used instead.

• There are also methods in which the number of accepted individuals changes with the state of the search; these are called adaptive. Adaptive methods use information about the search space and process to self-adjust the number of individuals to be selected. Reynolds and Chung [719] proposed a fuzzy acceptance function to determine the number of individuals based on generation number and individual success ratio. Membership functions are defined to implement the rules given in Algorithm 10.2.

9-3 Adjusting the Belief Space
With the number of accepted individuals, nB(t), known, the two knowledge components can be updated as follows [720]:
1) Situational knowledge: assuming that only one element is kept in the situational knowledge component,

where

2) Normative knowledge: In adjusting the normative knowledge component, a conservative approach is followed when narrowing intervals, thereby delaying too early exploration. Widening of intervals is applied more progressively. The interval update rule is as follows:


9-4 Influence Functions
• The belief space steers individuals toward regions that are believed to be better.

The belief space is used to determine the mutational step sizes, and the direction of changes (i.e. whether step sizes are added or subtracted). Reynolds and Chung [720] proposed four ways in which the knowledge components can be used within the influence function: 1) Only the normative component is used to determine step sizes during offspring

generation:

where

is the size of the belief interval for component j. 2) Only the situational component is used for determining change direction:

10.14 where σij is the strategy parameter associated with component j of individual i. 3) The normative component is used to determine change directions, and the situational component is used to determine step sizes. Equation (10.14) is used, but with

4) The normative component is used for both the search directions and step sizes:

where β > 0 is a scaling coefficient.



9-5 Assignments
1. Write pseudocode for a cultural algorithm.
2. Explain how the acceptance and influence functions act on the belief space and the population.


10 Coevolution

In many of the algorithms discussed so far, fitness is computed by an absolute fitness function. Adding coevolution changes this: an individual's fitness depends on the other individuals it is evaluated against, so the fitness landscape itself changes as the algorithm evolves.

Definition: • Coevolution is the complementary evolution of closely associated species, in which the fitness of each species depends on the other. A classic example, described by Holland, is the interaction between plants and insects: 1) a plant may evolve a toxin that protects it against insect attack; 2) in response, the insects may evolve an enzyme that neutralizes the toxin, forcing the plant to produce a stronger toxin, to which the insects again adapt. In this way the arms race between insect attack and plant defense drives the continued evolution of both species.

Example (predator-prey): predators and prey exert reciprocal selective pressure: better escape strategies in the prey drive the evolution of better hunting strategies in the predators, and vice versa. Lotka and Volterra modeled the population dynamics of such a predator-prey system in the mid 1920s with the following pair of equations:

dN1/dt = r1 N1 − b1 N1 N2
dN2/dt = b2 N1 N2 − r2 N2

• where N1 and N2 are the population sizes of the two species (number of prey and number of

predators, respectively), • r1 is the growth rate of prey in the absence of predators, r2 is the death rate of predators in

the absence of prey, • b1 is the death rate of prey caused by predators, and b2 is the effect of prey capture on the

reproduction rate of predators.
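The dynamics defined by these parameters can be simulated numerically; below is a minimal Euler-integration sketch (the standard form of the Lotka-Volterra model is assumed, and the parameter values are illustrative only):

```python
def lotka_volterra_step(N1, N2, r1, r2, b1, b2, dt=0.001):
    """One Euler step of the Lotka-Volterra predator-prey equations:
       dN1/dt = r1*N1 - b1*N1*N2   (prey)
       dN2/dt = b2*N1*N2 - r2*N2   (predators)
    """
    dN1 = (r1 * N1 - b1 * N1 * N2) * dt
    dN2 = (b2 * N1 * N2 - r2 * N2) * dt
    return N1 + dN1, N2 + dN2

# Simulate: prey and predator numbers oscillate out of phase.
N1, N2 = 40.0, 9.0
history = [(N1, N2)]
for _ in range(20000):
    N1, N2 = lotka_volterra_step(N1, N2, r1=0.5, r2=0.4, b1=0.05, b2=0.02)
    history.append((N1, N2))
```

Plotting `history` shows the characteristic cycles: a rise in prey is followed by a rise in predators, which then depresses the prey again.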



Coevolution takes two general forms: 1- Competitive

In competitive coevolution, species evolve against one another. • Competition, where both species are inhibited. Due to the inverse fitness interaction

between the two species, success in one of the species is felt as failure by the other species. • Amensalism, where one species is inhibited, and the other is not affected.

Example (amensalism): an organism that, as a by-product of its own life processes, creates conditions harmful to a neighboring organism while itself remaining unaffected. • 2- Cooperative (and symbiosis)

In cooperative coevolution, species cooperate to solve the problem at hand. Types of cooperative interaction include: • Mutualism, where both species benefit. The positive fitness interaction leads to an

improvement in one species whenever the other species improves. • Commensalism, where only one of the species benefits, while the other is not affected.

Example (commensalism): a scavenger that feeds on the remains of another predator's kill. • Parasitism, where one of the species (the parasite) benefits, while the other species (the

host) is harmed.

10-1 Competitive Coevolution (CCE)
Typically two populations exist:
• one population of candidate solutions, whose individuals earn fitness by defeating more members of the other population;
• one population of test cases, whose individuals earn fitness when solutions fail on them and lose fitness when solutions beat them.

a) Sampling schemes:
1. All versus all sampling [33], where each individual is tested against all the

individuals of the other population. 2. Random sampling [711], where the fitness of each individual is tested against a

randomly selected group (consisting of one or more) individuals from the other population. The random sampling approach is computationally less complex than all versus all sampling.

3. Tournament sampling [27], which uses relative fitness measures to select the best opponent individual.

4. All versus best sampling [793], where all the individuals are tested against the fittest individual of the other population.
5. Shared sampling [738, 739, 740], where the sample is selected as those opponent individuals with maximum competitive shared fitness. Shared sampling tends to select opponents that beat a large number of individuals from the competing population.


b) Competitive fitness: relative fitness is usually evaluated by testing individuals of the first population against individuals of the second population.

• In a standard EA, fitness is absolute: it is computed by a fixed fitness function and reflects an individual's quality on its own. • In a CoEA, fitness is relative and time-dependent: it is measured against the opposing population at generation t, and it changes as the other population evolves.

c) Relative Fitness Evaluation
The following approaches can be followed to measure the relative fitness of each individual in a population. Assume that the two populations C1 and C2 coevolve, and the aim is to calculate the relative fitness of each individual C1.xi of population C1.

1) Simple fitness: the fitness of C1.xi is the number of opponents in the population C2 sample that it defeats.
2) Fitness sharing: A sharing function is defined to take into consideration similarity among the individuals of population C1. The simple fitness of an individual is divided by the sum of its similarities with all the other individuals in that population. Similarity can be defined as the number of individuals that also beat the individuals from the population C2 sample. The consequence of the fitness sharing function is that unusual individuals are rewarded.
3) Competitive fitness sharing: In this case the fitness of individual C1.xi is defined as

f(C1.xi) = Σ_l 1/C1.nl, summed over those sample members C2.xl that C1.xi defeats,

where C2.x1, . . . , C2.x_{C2.ns} form the population C2 sample, and C1.nl is the total number of individuals in population C1 that defeat individual C2.xl. The competitive fitness sharing method rewards those population C1 individuals that beat population C2 individuals which few other population C1 individuals could beat. It is therefore not necessarily the case that the best population C1 individual beats the most population C2 individuals.
4) Tournament fitness [27] holds a number of single elimination, binary tournaments to determine a relative fitness ranking. Tournament fitness results in a tournament tree, with the root element as the best individual. For each level in the tree, two opponents are randomly selected from that level, and the best of the two advances to the next level. In the case of an odd number of competitors, a single individual from the current level moves to the next level. After tournament ranking, any of the standard selection operators (refer to Section 8.5) can be used to select parents.
5) Hall of Fame: At each generation the best individual of a population is stored in that population's hall of fame. The hall of fame may have a limited size, in which case a new individual to be inserted in the hall of fame will replace the worst individual (or the oldest one). Individuals from one population now compete against a sample of the current opponent population and its hall of fame. The hall of fame prevents overspecialization.
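As a sketch of the competitive fitness sharing rule above, assume wins are recorded in a boolean matrix `beats` (a data layout chosen here purely for illustration):

```python
def competitive_fitness_sharing(beats):
    """beats[i][l] is True if C1 individual i defeats C2 sample member l.

    Each C1 individual receives, for every C2 member it defeats,
    1 / (number of C1 individuals that defeat that member), so wins
    against rarely beaten opponents are rewarded more.
    """
    n1 = len(beats)
    n2 = len(beats[0])
    # C1.nl: how many C1 individuals defeat C2 sample member l
    nl = [sum(beats[i][l] for i in range(n1)) for l in range(n2)]
    return [sum(1.0 / nl[l] for l in range(n2) if beats[i][l])
            for i in range(n1)]

# Individual 1 beats an opponent nobody else beats, so it scores highest.
scores = competitive_fitness_sharing([
    [True,  False, False],
    [True,  True,  False],
    [True,  False, False],
])
```

Note that individual 1 does not beat more opponents than the others by a wide margin; it scores highest because its second win is shared with no one.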



10-1-1 Generic Competitive Coevolutionary Algorithm

Algorithm 10.1 Competitive Coevolutionary Algorithm with Two Populations
Initialize two populations, C1 and C2;
while stopping condition(s) not true do
  for each C1.xi, i = 1, . . . , C1.ns do
    Select a sample of opponents from C2;
    Evaluate the relative fitness of C1.xi with respect to this sample;
  end
  for each C2.xi, i = 1, . . . , C2.ns do
    Select a sample of opponents from C1;
    Evaluate the relative fitness of C2.xi with respect to this sample;
  end
  Evolve population C1 for one generation;
  Evolve population C2 for one generation;
end
Select the best individual from the solution population, S1;

Algorithm 10.2 Single Population Competitive Coevolutionary Algorithm
Initialize one population, C;
while stopping condition(s) not true do
  for each C.xi, i = 1, . . . , C.ns do
    Select a sample of opponents from C, excluding C.xi;
    Evaluate the relative fitness of C.xi with respect to the sample;
  end
  Evolve population C for one generation;
end
Select the best individual from C as the solution;
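The flow of Algorithm 10.1 can be sketched in Python; the "game" here (comparing scalar strengths), the random-sampling scheme, and the mutate-around-the-best evolution step are toy stand-ins for illustration, not the book's operators:

```python
import random

def coevolve_two_populations(n_s=10, generations=20, sample_size=3, seed=1):
    """Skeleton of a two-population competitive CoEA: each individual's
    relative fitness is its number of wins against a random opponent
    sample; 'evolution' is a toy mutate-the-best step."""
    rng = random.Random(seed)
    C1 = [rng.random() for _ in range(n_s)]
    C2 = [rng.random() for _ in range(n_s)]

    def relative_fitness(x, opponents):
        sample = rng.sample(opponents, sample_size)   # random sampling
        return sum(x > o for o in sample)             # wins against sample

    def evolve(pop, fitnesses):
        best = pop[fitnesses.index(max(fitnesses))]
        return [min(1.0, max(0.0, best + rng.gauss(0, 0.1))) for _ in pop]

    for _ in range(generations):
        f1 = [relative_fitness(x, C2) for x in C1]
        f2 = [relative_fitness(x, C1) for x in C2]
        C1 = evolve(C1, f1)
        C2 = evolve(C2, f2)
    return C1, C2
```

Any of the sampling schemes and relative fitness measures from the previous section can be substituted for the two inner helpers.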

10-2 Cooperative Coevolution
In this approach, individuals cooperate with each other to solve the problem. In one such method:
1. The problem is decomposed into subcomponents, and each subpopulation evolves candidate values for one subcomponent of the solution.
2. A complete candidate solution is assembled by combining one member from each subpopulation; it is this assembled solution that is evaluated.
3. The fitness of the assembled solution is computed, and credit is assigned back to the participating members.
4. Each member is thus scored by how much it contributes to the fitness of the solutions in which it takes part.

Algorithm 10.3 Coevolutionary Training of Game Agents
Create and randomly initialize a population of NNs;
while stopping condition(s) not true do
  for each individual (or NN) do
    Select a sample of competitors from the population;
    for each opponent do
      for a specified number of times do
        Play a game as first player using the NNs as board state evaluators in a game tree;
        Record if game was won, lost, or drawn;
        Play another game against the same opponent, but as second player;
        Record if game was won, lost, or drawn;
      end
    end
    Determine a score for the individual;
  end
  Evolve the population for one generation;
end
Return the best individual as the NN evaluation function;

10-2-1 Assignments
5. Write pseudocode for a cooperative coevolutionary algorithm.
6. Describe the kinds of interaction possible between populations of types 1 and 2 above.


7 Particle Swarm Optimization

Reference: Chapter 16 of the textbook.

7-1 Swarm Intelligence
Motivating example: a flock of birds searching for food.
Consider a flock of birds looking for a cornfield. Each member of the flock can (1) evaluate how good its own position is, e.g. its current distance from the food, and (2) communicate with other members, learning which member of the flock is in the best position. Each bird then has two options: 1) ignore the flock and keep searching using only its own knowledge, in which case it may never find the food; or 2) cooperate, moving toward its best-placed neighbor while continuing its own search. In this second case the flock as a whole finds the food sooner, and each member's own chance of finding it increases as well. The key point is that local information is shared at low cost and becomes available to all members of the group, so cooperation raises the probability of success for every individual. The mechanisms below make this idea precise.

7-1-1 Components of a swarm
1. Swarm: a group of interacting, mobile individuals that are relatively simple in structure.
2. Communication: interaction among the individuals, which can be:

• direct (by means of physical contact, or by means of visual, audio, or chemical perceptual inputs) or


• stigmergy: indirect (via local changes of the environment). The term stigmergy is used to refer to the indirect form of communication between individuals.

3. Interaction between individual and collective behavior:
• the collective behavior of individuals shapes and dictates the behavior of the swarm;
• swarm behavior has an influence on the conditions under which each individual performs its actions.
• Although each individual follows simple rules, the behavior that emerges at the level of the swarm can be very complex.
• Emergence: its key signatures are:
• there is no centralized coordinated control over the individuals;
• the collective result is more than the sum of the parts and cannot be produced by any single member;
• the global behavior is not easily predictable or deducible in advance;
• the collective behavior is qualitatively different from the behavior of each individual.
• Swarm intelligence is therefore the property of a system in which simple agents, interacting locally with one another and with their environment, give rise to coherent collective behavior; it is also called collective intelligence. Many kinds of animals exhibit such behavior.
• Examples of such animals include: ants, termites, bees, spiders, fish schools and bird flocks.

7-1-2 Examples
Many examples of swarm behavior can be observed in nature:

• Termites build large nest structures with a complexity far beyond the comprehension and ability of a single termite.

• Ant colony: Tasks are dynamically allocated within an ant colony, without any central manager or task coordinator.

• Foraging: Recruitment via waggle dances in bee species, which results in optimal foraging behavior. Foraging behavior also emerges in ant colonies as a result of simple trail-following behaviors.

• Birds in a flock and fish in a school self-organize in optimal spatial patterns. Schools of fish determine their behavior (such as swimming direction and speed) based on a small number of neighboring individuals. The spatial patterns of bird flocks result from communication by sound and visual perception.

• Predators, for example a group of lionesses, exhibit hunting strategies to outsmart their prey.

• Bacteria communicate using molecules (comparable to pheromones) to collectively keep track of changes in their environment.



7-1-3 Computational Swarm Intelligence
• Computational swarm intelligence (CSI) is the algorithmic modeling of such swarm behavior. From these models the following algorithms have been derived:
1) Particle swarm optimization (PSO), in which each individual
• (1) moves toward its closest best neighbor,
• and (2) moves back to the state that the individual has experienced to be best for itself.
As a result, each individual's position is shaped by both its own best experience and the best position found in its neighborhood, and the swarm converges on promising regions.
2) Ant colony optimization, derived from modeling foraging behavior: ants are more likely to follow trails with a higher pheromone concentration, i.e. trails already used by more ants; the emergent result is that the colony finds the shortest path.
3) The cemetery behavior of ants, in which dead nestmates are gathered into piles, has inspired clustering algorithms.
• PSO is thus based on modeling the flocking of birds, and ant colony optimization on modeling the trail-laying and trail-following behavior of ants.

7-2 Particle Swarm Search

PSO is a population-based search algorithm modeled on the social behavior of bird flocks. It originated from graphical simulations of the graceful, unpredictable choreography of bird flocks: work that tried to reproduce how birds fly in synchrony, suddenly change direction, scatter and regroup in optimal formations [449]. In PSO:
1. The swarm is a set of particles.
2. Each particle represents a candidate solution to the problem, i.e. a point in the search space.
3. Particles follow very simple behaviors: each one emulates the success of its neighbors and its own past successes. Specifically, a particle's movement is driven by
• a pull toward the best position found within its neighborhood, and
• a pull back toward the best position it has found itself.
4. Emergent behavior: the discovery of optimal regions of the search space by the swarm as a whole.

Particles are initialized randomly, xi(0) ~ U(xmin, xmax), and positions are updated using

x_i(t + 1) = x_i(t) + v_i(t + 1)    (11.1)

where
• Particle position: xij(t) is the position of particle i in dimension j at time step t,


• Acceleration constants: c1 and c2 are positive acceleration constants used to scale the contribution of the cognitive and social components, respectively.

• Random coefficients: r1j(t), r2j(t) ~ U(0, 1) are random values in the range [0, 1], sampled from a uniform distribution. These random values introduce a stochastic element to the algorithm.

7-2-1 Components of the velocity update
The velocity update consists of the following components:

1. Previous velocity: vij(t) is the velocity of particle i in dimension j = 1, . . . , nx at time step t. This term acts as a momentum, which prevents the particle from drastically changing direction, and is also referred to as the inertia component.

2. Cognitive component: The experiential knowledge of a particle is generally referred to as the cognitive component, which is proportional to the distance of the particle from its own best position (referred to as the particle's personal best position) found since the first time step: c1 r1j(t)[yij(t) − xij(t)]. The effect of this term is that particles are drawn back to their own best positions, resembling the tendency of individuals to return to situations or places that satisfied them most in the past. Kennedy and Eberhart also referred to the cognitive component as the "nostalgia" of the particle [449].

If c2 = 0, the resulting model is called the cognition-only PSO.

3. Social component: The socially exchanged information is referred to as the social component of the velocity equation. The social component quantifies the performance of particle i relative to a group of particles, or neighbors. Conceptually, the social component resembles a group norm or standard that individuals seek to attain. The effect of the social component is that each particle is also drawn towards the best position found by the particle's neighborhood.

If c1 = 0, the resulting model is called the social-only PSO. The selfless PSO is a social-only PSO in which a particle excludes itself when determining the best position in its neighborhood.

• More complete PSO algorithms add further control parameters, such as the inertia weight, which is discussed later. Figure 11.1 illustrates the update of a single particle in a two-dimensional search space.

• Example: Figure 11.1(a) shows the position of the particle at time step t. Assuming that the personal best position does not change, the new position x(t+1) at time step t+1 moves closer to the global best ŷ(t), as illustrated in Figure 11.1(b). Note, however, that the stochastic coefficients can also cause x(t+1) to move farther away from the global best ŷ(t). In that case one of two scenarios may occur:



1. The new, farther position is in fact a better solution than the current global best; it then becomes the new global best, and the other particles are subsequently drawn toward it.

2. The new position is worse than the global best; it is not an improvement, and in the following steps the cognitive and social components pull the particle back toward its personal best and the global best positions.

3. Over time, the combined pull of the cognitive and social components draws each particle toward a stochastically weighted average of its personal best and the global best (or lbest) positions.

Figure 11.1 Geometrical Illustration of Velocity and Position Updates for a Single Two-Dimensional Particle

7-2-2 gbest and lbest PSO
Two basic versions of the algorithm are distinguished by the size of the particle neighborhoods.

• Global Best PSO (gbest): the neighborhood of each particle is the entire swarm; the social network is the star topology, and the social component of the velocity update uses the global best position found by the whole swarm.

v_ij(t + 1) = v_ij(t) + c1 r1j(t)[y_ij(t) − x_ij(t)] + c2 r2j(t)[ŷ_j(t) − x_ij(t)]    (11.2)

where
• Personal best position: The personal best position, yi, associated with particle i is the best position the particle has visited since the first time step.

y_i(t + 1) = y_i(t) if f(x_i(t + 1)) ≥ f(y_i(t)), and y_i(t + 1) = x_i(t + 1) if f(x_i(t + 1)) < f(y_i(t))    (11.3)

(assuming a minimization problem).


• Global best position: The global best position, ŷ(t), at time step t, is defined as

ŷ(t) ∈ {y_0(t), . . . , y_ns(t)} such that f(ŷ(t)) = min{f(y_0(t)), . . . , f(y_ns(t))}    (11.4)

where ns is the total number of particles in the swarm. The global best position can also be selected from the particles of the current swarm, in which case [359]

ŷ(t) ∈ {x_0(t), . . . , x_ns(t)} such that f(ŷ(t)) = min{f(x_0(t)), . . . , f(x_ns(t))}    (11.5)

• Figure 11-2 shows the gbest search trajectories of several particles in a two-dimensional search space (x1, x2); the optimum lies at the origin.

Figure 11.2 Multi-particle gbest PSO Illustration

The optimum is at the origin, indicated by the symbol ‘×’. Figure 11.2(a) illustrates the initial positions of eight particles, with the global best position as indicated. Since the contribution of the cognitive component is zero for each particle at time step t = 0, only the social component has an influence on the position adjustments. Note that the global best position does not change (it is assumed that vi(0) = 0, for all particles). Figure 11.2(b) shows the new positions of all the particles after the first iteration. A new global best position has been found. Figure 11.2(b) now indicates the influence of all the velocity components, with particles moving towards the new global best position.

The gbest PSO is summarized in Algorithm 11.1.

A MATLAB implementation of the PSO algorithm is provided by the psotb toolbox. The main program is pso, with its parameters set through get_psooption; the fitness function for each problem must be written by the user and plugged in. Test functions such as

DeJong [sum ( x(i)^2 )], Rastrigin [sum (x(i)^2 - 10 * cos(2 * pi * x(i)) + 10)]



are available for optimization experiments.

Algorithm 11.1 gbest PSO
Create and initialize an nx-dimensional swarm;
repeat
  for each particle i = 1, . . . , ns do
    // set the personal best position
    if f(xi) < f(yi) then
      yi = xi;
    end
    // set the global best position
    if f(yi) < f(ŷ) then
      ŷ = yi;
    end
  end
  for each particle i = 1, . . . , ns do
    update the velocity using equation (11.2);
    update the position using equation (11.1);
  end
until stopping condition is true;
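As a concrete illustration, Algorithm 11.1 fits in a few dozen lines of Python. This sketch adds an inertia weight w (introduced later in Section 7-4-2) and minimizes the DeJong sphere function; all parameter values are illustrative choices, not prescribed by the text:

```python
import random

def gbest_pso(f, dim=2, n_s=20, iters=200, c1=1.5, c2=1.5, w=0.7,
              xmin=-5.0, xmax=5.0, seed=0):
    """Minimal gbest PSO: minimizes f over [xmin, xmax]^dim."""
    rng = random.Random(seed)
    x = [[rng.uniform(xmin, xmax) for _ in range(dim)] for _ in range(n_s)]
    v = [[0.0] * dim for _ in range(n_s)]         # vi(0) = 0
    y = [xi[:] for xi in x]                       # personal bests
    g = min(y, key=f)[:]                          # global best

    for _ in range(iters):
        for i in range(n_s):
            for j in range(dim):                  # velocity update, eq. (11.2)
                r1, r2 = rng.random(), rng.random()
                v[i][j] = (w * v[i][j]
                           + c1 * r1 * (y[i][j] - x[i][j])
                           + c2 * r2 * (g[j] - x[i][j]))
                x[i][j] += v[i][j]                # position update, eq. (11.1)
            if f(x[i]) < f(y[i]):                 # personal best update
                y[i] = x[i][:]
                if f(y[i]) < f(g):                # global best update
                    g = y[i][:]
    return g

# DeJong sphere function: global minimum 0 at the origin.
dejong = lambda x: sum(xi ** 2 for xi in x)
best = gbest_pso(dejong)
```

With these settings the swarm reliably contracts onto the origin; swapping in the Rastrigin function shows the multimodal case where premature convergence becomes an issue.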

• Local Best PSO (lbest): uses a ring social network: each particle's social component draws on information from its immediate neighbors rather than from the whole swarm, i.e. local rather than global information is used in the velocity update:

v_ij(t + 1) = v_ij(t) + c1 r1j(t)[y_ij(t) − x_ij(t)] + c2 r2j(t)[ŷ_ij(t) − x_ij(t)]    (11.6)

• Neighborhood best component: where ŷij is the best position, found by the neighborhood of particle i in dimension j.
• Neighborhood best: The local best particle position, ŷi, i.e. the best position found in the neighborhood Ni, is defined as

ŷ_i(t + 1) ∈ {N_i | f(ŷ_i(t + 1)) = min{f(x)}, ∀x ∈ N_i}    (11.7)

with the neighborhood defined as

N_i = {y_{i−nNi}(t), . . . , y_{i−1}(t), y_i(t), y_{i+1}(t), . . . , y_{i+nNi}(t)}    (11.8)

for neighborhoods of size nNi. The local best position will also be referred to as the neighborhood best position.

Note that in PSO, neighborhoods are usually formed on the basis of particle indices, not spatial proximity in the search space; a particle's index neighbors may lie far from it in the search space. Figure 11-3 illustrates the lbest trajectories graphically: each particle is attracted by the best position found within its own neighborhood.


Figure 11.3 Illustration of lbest PSO

To keep the graph readable, only some of the movements are illustrated, and only the aggregate velocity direction is indicated. In neighborhood 1, both particles a and b move towards particle c, which is the best solution within that neighborhood. Considering neighborhood 2, particle d moves towards f, so does e. For the next iteration, e will be the best solution for neighborhood 2. Now d and f move towards e as illustrated in Figure 11.3(b) (only part of the solution space is illustrated). The blocks represent the previous positions. Note that e remains the best solution for neighborhood 2. Also note the general movement towards the minimum. More in-depth analyses of particle trajectories can be found in [136, 851, 863, 870].

• Advantages of index-based neighborhoods:
1. Low computational cost: It is computationally inexpensive, since spatial ordering of particles is not required. For approaches where the distance between particles is used to form neighborhoods, it is necessary to calculate the Euclidean distance between all pairs of particles, which is of O(ns²) complexity.
2. Information spreads regardless of position: It helps to promote the spread of information regarding good solutions to all particles, irrespective of their current location in the search space.
3. Overlapping neighborhoods: It should also be noted that neighborhoods overlap. A particle takes part as a member of a number of neighborhoods. This interconnection of neighborhoods also facilitates the sharing of information among neighborhoods, and ensures that the swarm converges on a single point, namely the global best particle. The gbest PSO is a special case of the lbest PSO with the neighborhood size equal to the swarm size.
• Algorithm 11.2 summarizes the lbest PSO.



• Algorithm 11.2 lbest PSO
Create and initialize an nx-dimensional swarm;
repeat
  for each particle i = 1, . . . , ns do
    // set the personal best position
    if f(xi) < f(yi) then
      yi = xi;
    end
    // set the neighborhood best position
    if f(yi) < f(ŷi) then
      ŷi = yi;
    end
  end
  for each particle i = 1, . . . , ns do
    update the velocity using equation (11.6);
    update the position using equation (11.1);
  end
until stopping condition is true;

• gbest versus lbest PSO:
1. Convergence speed: Due to the larger particle interconnectivity of the gbest PSO, it converges faster than the lbest PSO.
2. Diversity: However, this faster convergence comes at the cost of less diversity than the lbest PSO.
3. Susceptibility to local minima: As a consequence of its larger diversity (which results in larger parts of the search space being covered), the lbest PSO is less susceptible to being trapped in local minima. In general (depending on the problem), neighborhood structures such as the ring topology used in lbest PSO improve performance [452, 670].

7-3 Social Networks: the structure of information flow between particles in PSO
• Neighborhoods overlap, which allows information to flow between them (overlapping neighborhoods).
• The performance of PSO depends strongly on the social network structure it uses.
• When the network is fully connected, the best information reaches every particle immediately: convergence is fast, but such highly connected structures work best on simpler, unimodal problems and may converge prematurely on multimodal ones.
• When the network is sparsely connected:

For sparsely connected networks with a large amount of clustering in neighborhoods, it can also happen that the search space is not covered sufficiently to obtain the best possible solutions. Each cluster contains individuals in a tight neighborhood covering only a part of the search space. Within these network structures there usually exist a few clusters, with a


low connectivity between clusters. Consequently information on only a limited part of the search space is shared with a slow flow of information between clusters.

• Several social network structures have been investigated for PSO, of which the following are the most used:
• The star social structure, where all particles are interconnected as illustrated in Figure 11.4(a).
• The ring social structure, where each particle communicates with its nN immediate neighbors. In the case of nN = 2, a particle communicates with its immediately adjacent neighbors as illustrated in Figure 11.4(b). Each particle attempts to imitate its best neighbor by moving closer to the best solution found within the neighborhood. It is important to note from Figure 11.4(b) that neighborhoods overlap, which facilitates the exchange of information between neighborhoods and, in the end, convergence to a single solution. Since information flows at a slower rate through the social network, convergence is slower, but larger parts of the search space are covered compared to the star structure. This behavior allows the ring structure to provide better performance in terms of the quality of solutions found for multi-modal problems than the star structure. The resulting PSO algorithm is generally referred to as the lbest PSO.

• The wheel social structure, where individuals in a neighborhood are isolated from one another. One particle serves as the focal point, and all information is communicated through the focal particle (refer to Figure 11.4(c)). The wheel social network slows down the propagation of good solutions through the swarm. • The pyramid social structure, which forms a three-dimensional wire-frame as illustrated in Figure 11.4(d).



• The four clusters social structure, as illustrated in Figure 11.4(e). In this network structure, four clusters (or cliques) are formed with two connections between clusters. Particles within a cluster are connected with five neighbors.
• The Von Neumann social structure, where particles are connected in a grid structure as illustrated in Figure 11.4(f). The Von Neumann social network has been shown in a number of empirical studies to outperform other social networks in a large number of problems [452, 670].

Best topology: While many studies have been done using the different topologies, there is no outright best topology for all problems. In general, the fully connected structures perform best for unimodal problems, while the less connected structures perform better on multimodal problems, depending on the degree of particle interconnection [447, 452, 575, 670].

Determining neighborhoods: Neighborhoods are usually determined on the basis of particle indices. For example, for the lbest PSO with nN = 2, the neighborhood of a particle with index i includes particles i−1, i and i+1. While indices are usually used, Suganthan based neighborhoods on the Euclidean distance between particles [820].

Figure 11.4 Example Social Network Structures


7-4 PSO Parameters
The basic PSO is able to solve problems, but its convergence is not always consistent: different runs may converge to different solutions. To improve the performance of PSO, the following parameters and modifications of the basic algorithm have been introduced:

1) inertia weight, 2) velocity clamping, 3) velocity constriction, 4) different ways of determining the personal best and global best (or local best) positions, 5) and different velocity models.

These parameters balance exploration and exploitation, which in PSO is controlled mainly through the velocity update.

7-4-1 Velocity Clamping
In PSO, the velocity update can quickly grow large, especially for particles far from their personal best and the global best positions; as a result, particles can leave the boundaries of the search space, i.e. the particles diverge. To prevent this, velocities are clamped as follows:

v_ij(t + 1) = v'_ij(t + 1) if |v'_ij(t + 1)| < Vmax,j, otherwise v_ij(t + 1) = sign(v'_ij(t + 1)) Vmax,j    (11.17)

where v'ij is calculated using equation (11.2) or (11.6).

• Large values of Vmax,j facilitate global exploration, but risk the possibility of missing a good region: particles may jump over good solutions and continue to search in fruitless regions.
• Small values of Vmax,j encourage local exploitation, but particles may not explore sufficiently beyond locally good regions; the number of time steps to reach an optimum increases, and the swarm may become trapped in a local optimum.

• Choosing the velocity limits:
• Usually, the Vmax,j values are selected to be a fraction of the domain of each dimension of the search space. That is,

Vmax,j = δ(xmax,j − xmin,j)    (11.18)

where xmax,j and xmin,j are respectively the maximum and minimum values of the domain of x in dimension j, and δ ∈ (0, 1]. The best value of δ should be found for each different problem using empirical techniques such as cross-validation.
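Equations (11.17) and (11.18) combine into a small clamping step; the function below is an illustrative sketch (the name `clamp_velocity`, the vector layout, and the default `delta` are assumptions):

```python
def clamp_velocity(v_prime, xmin, xmax, delta=0.5):
    """Clamp each velocity component to Vmax,j = delta * (xmax_j - xmin_j),
    eq. (11.18), preserving the component's sign as in eq. (11.17)."""
    clamped = []
    for vj, lo, hi in zip(v_prime, xmin, xmax):
        vmax = delta * (hi - lo)
        clamped.append(max(-vmax, min(vmax, vj)))
    return clamped

# Each Vmax,j = 0.5 * 10 = 5, so only the first component is clamped.
v = clamp_velocity([7.0, -0.3], xmin=[-5.0, -5.0], xmax=[5.0, 5.0], delta=0.5)
```

This would be applied to v'ij right after the velocity update and before the position update.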



• Side effects of clamping: limiting the length of a step can also change the direction of the search.
• Firstly, velocity clamping changes not only the step size, but also the direction in which a particle moves. This effect is illustrated in Figure 11.5 (assuming two-dimensional particles). In this figure, xi(t+1) denotes the position of particle i without using velocity clamping. The position x'i(t+1) is the result of velocity clamping on the second dimension. Note how the search direction and the step size have changed. It may be said that these changes in search direction allow for better exploration. However, it may also cause the optimum not to be found at all.

Figure 11.5 Effects of Velocity Clamping

• A second problem: if the velocity of a particle becomes equal to the maximum in every dimension, the particle keeps searching on the boundary of a hypercube and may never reach the optimum. To address this problem, approaches that dynamically change the maximum velocity have been proposed:

• Change the maximum velocity when no improvement in the global best position has been seen over a number of consecutive iterations [766]:

Vmax,j(t + 1) = γ Vmax,j(t)

where γ decreases from 1 to 0.01 (the decrease can be linear or exponential using an annealing schedule similar to that given in Section 11.3.2 for the inertia weight).
• Exponential decay of the maximum velocity: Exponentially decay the maximum velocity, using

Vmax,j(t) = (1 − (t/nt)^α) Vmax,j(0)

where α is a positive constant, found by trial and error, or cross-validation methods; nt is the maximum number of time steps (or iterations). Finally, the sensitivity of PSO to the value of δ (refer to equation (11.18)) can be reduced by constraining velocities using the hyperbolic tangent function, i.e.


v_ij(t + 1) = Vmax,j tanh( v'_ij(t + 1) / Vmax,j )

where v'ij(t + 1) is calculated from equation (11.2) or (11.6).

7-4-2 Inertia Weight
The inertia weight was introduced by Shi and Eberhart [780] as a mechanism to control the exploration and exploitation abilities of the swarm, and as a mechanism to eliminate the need for velocity clamping [227].
• The inertia weight was successful in addressing the first objective, but could not

completely eliminate the need for velocity clamping. • The inertia weight, w, controls the momentum of the particle by weighing the contribution

of the previous velocity – basically controlling how much memory of the previous flight direction will influence the new velocity. For the gbest PSO, the velocity equation changes from equation (11.2) to

A similar change is made for the lbest PSO.
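The inertia-weight velocity update can be sketched as follows (a plain-list illustration; the function and argument names are mine, not from the text):

```python
import random

def velocity_update(w, c1, c2, v, x, y_personal, y_global,
                    rng=random.Random(0)):
    # Sketch of the gbest PSO velocity update with inertia weight:
    #   v_ij(t+1) = w*v_ij + c1*r1*(y_ij - x_ij) + c2*r2*(yhat_j - x_ij)
    # v, x, y_personal, y_global are per-dimension lists for one particle.
    new_v = []
    for j in range(len(v)):
        r1, r2 = rng.random(), rng.random()
        new_v.append(w * v[j]
                     + c1 * r1 * (y_personal[j] - x[j])
                     + c2 * r2 * (y_global[j] - x[j]))
    return new_v
```

Setting c1 = c2 = 0 isolates the momentum term, which makes the role of w easy to see.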

• The value of w is extremely important to ensure convergent behavior, and to optimally trade off exploration and exploitation.
• For w ≥ 1, velocities increase over time, accelerating towards the maximum velocity

(assuming velocity clamping is used), and the swarm diverges. Particles fail to change direction in order to move back towards promising areas. Large values for w facilitate exploration, with increased diversity.

• For w < 1, particles decelerate until their velocities reach zero (depending on the values of the acceleration coefficients). A small w promotes local exploitation. However, too small a value eliminates the exploration ability of the swarm: little momentum is then preserved from the previous time step, which enables quick changes in direction. The smaller w, the more the cognitive and social components control position updates.

• As with the maximum velocity, the optimal value for the inertia weight is problem dependent [781]. Initial implementations of the inertia weight used a static value for the entire search duration, for all particles and dimensions. Later implementations made use of dynamically changing inertia values. These approaches usually start with large inertia values, which decrease over time to smaller values. In doing so, particles are allowed to explore in the initial search steps, while exploitation is favored as time increases. At this point it is crucial to mention the important relationship between the value of w and the acceleration constants.

• >���� Z�9�"#w Z�9�"# �D {����# �� ���Dc1 4c2 ���D Van den Bergh and Engelbrecht [863, 870] showed that

Page 85: Natural Computing - shahed.ac.irshahed.ac.ir/stabaii/Files/NaturalComputingText.pdf · 2015-02-10 · Natural Computing 1392 1- Computational Intelligence, 2nd ed.( A. P. Engelbrecht)

7-15

guarantees convergent particle trajectories. If this condition is not satisfied, divergent or cyclic behavior may occur. A similar condition was derived by Trelea [851].

• A number of approaches have been proposed to dynamically adjust the inertia weight:
1. Random adjustments, where a different inertia weight is randomly selected at each

iteration. One approach is to sample from a Gaussian distribution, e.g.

w ~ N(0.72, σ) (11.24)

where σ is small enough to ensure that w is not predominantly greater than one. Alternatively, Peng et al. used [673]

w = (c1 r1 + c2 r2)

with no random scaling of the cognitive and social components.
2. Linear decreasing, where an initially large inertia weight (usually 0.9) is linearly

decreased to a small value (usually 0.4). From Naka et al. [619], Ratnaweera et al. [706], Suganthan [820], Yoshida et al. [941]

w(t) = (w(0) − w(nt)) (nt − t)/nt + w(nt)

where nt is the maximum number of time steps for which the algorithm is executed, w(0) is the initial inertia weight, w(nt) is the final inertia weight, and w(t) is the inertia at time step t. Note that w(0) > w(nt).
3. Nonlinear decreasing, where an initially large value decreases nonlinearly to a small

value. Nonlinear decreasing methods allow a shorter exploration time than the linear decreasing methods, with more time spent on refining solutions (exploiting). Nonlinear decreasing methods will be more appropriate for smoother search spaces. The following nonlinear methods have been defined:

• – From Peram et al. [675],

with w(0) = 0.9 and nt the maximum number of iterations of the algorithm.

• – From Venter and Sobieszczanski-Sobieski [874, 875],

where α = 0.975, and t' is the time step when the inertia last changed. The inertia is only changed when there is no significant difference in the fitness of the swarm.
• Venter and Sobieszczanski-Sobieski measure the variation in particle fitness of a 20%

subset of randomly selected particles. If this variation is too small, the inertia is changed. An initial inertia weight of w(0) = 1.4 is used with a lower bound of w(nt) = 0.35. The initial w(0) = 1.4 ensures that a large area of the search space is covered before the swarm focuses on refining solutions.


• – Clerc proposes an adaptive inertia weight approach where the amount of change in the inertia value is proportional to the relative improvement of the swarm [134]. The inertia weight is adjusted according to

w_i(t+1) = w(0) + (w(nt) − w(0)) (e^{m_i(t)} − 1) / (e^{m_i(t)} + 1) (11.29)

where the relative improvement, m_i, is estimated as

m_i(t) = (f(ŷ_i(t)) − f(x_i(t))) / (f(ŷ_i(t)) + f(x_i(t))) (11.30)

with w(nt) ≈ 0.5 and w(0) < 1. Using this approach, which was developed for velocity updates without the cognitive component, each particle has its own inertia weight based on its distance from the local best (or neighborhood best) position. The local best position, y_i(t), can just as well be replaced with the global best position, ŷ(t). Clerc motivates his approach by considering that the more an individual improves upon his/her neighbors, the more he/she follows his/her own way, and vice versa. Clerc reported that this approach results in fewer iterations [134].
4. Fuzzy adaptive inertia, where the inertia weight is dynamically adjusted on the basis of

fuzzy sets and rules. Shi and Eberhart [783] defined a fuzzy system for the inertia adaptation to consist of the following components:

• – Two inputs, one to represent the fitness of the global best position, and the other the current value of the inertia weight.

• – One output to represent the change in inertia weight. • – Three fuzzy sets, namely LOW, MEDIUM and HIGH, respectively implemented as a left

triangle, triangle and right triangle membership function [783]. • – Nine fuzzy rules from which the change in inertia is calculated. An example rule in the

fuzzy system is [229, 783]: if normalized best fitness is LOW,

and current inertia weight value is LOW then the change in weight is MEDIUM

5. Increasing inertia, where the inertia weight is linearly increased from 0.4 to 0.9 [958].
• Note that the linearly and nonlinearly decreasing inertia schedules above resemble the temperature schedule of simulated annealing.
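As an illustration of one of the schedules above, the linearly decreasing inertia weight can be computed as follows (a sketch; the defaults 0.9 and 0.4 are the usual values mentioned in the text):

```python
def linear_inertia(t, n_t, w0=0.9, w_nt=0.4):
    # Linearly decreasing inertia weight:
    #   w(t) = (w(0) - w(n_t)) * (n_t - t) / n_t + w(n_t)
    return (w0 - w_nt) * (n_t - t) / n_t + w_nt
```

At t = 0 the schedule returns w(0), and at t = n_t it returns w(n_t), so exploration dominates early and exploitation late.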

7-4-3 Constriction Coefficient
Clerc developed an approach very similar to the inertia weight to balance the exploration–exploitation trade-off, in which the velocities are constricted by a constant χ, referred to as the constriction coefficient [133, 136]. The velocity update equation becomes

v_ij(t+1) = χ [ v_ij(t) + φ1 (y_ij(t) − x_ij(t)) + φ2 (ŷ_j(t) − x_ij(t)) ] (11.31)

where

Page 86: Natural Computing - shahed.ac.irshahed.ac.ir/stabaii/Files/NaturalComputingText.pdf · 2015-02-10 · Natural Computing 1392 1- Computational Intelligence, 2nd ed.( A. P. Engelbrecht)

7-17

χ = 2κ / | 2 − φ − sqrt(φ (φ − 4)) | (11.32)

with φ = φ1 + φ2, φ1 = c1 r1 and φ2 = c2 r2. Equation (11.32) is used under the constraints that φ ≥ 4 and κ ∈ [0, 1]. The above equations were derived from a formal eigenvalue analysis of swarm dynamics [136].

• The constriction approach was developed as a natural, dynamic way to ensure convergence to a stable point, without the need for velocity clamping. Under the conditions that φ ≥ 4 and κ ∈ [0, 1], the swarm is guaranteed to converge. The constriction coefficient, χ, evaluates to a value in the range [0, 1], which implies that the velocity is reduced at each time step.

• The parameter κ in equation (11.32) controls the exploration and exploitation abilities of the swarm. For κ ≈ 0, fast convergence is obtained with local exploitation; the swarm exhibits an almost hill-climbing behavior. On the other hand, κ ≈ 1 results in slow convergence with a high degree of exploration. Usually, κ is set to a constant value. However, an initial high degree of exploration with local exploitation in the later search phases can be achieved by starting with a value close to one and decreasing it towards zero.

��$��� ")� �� ")� D�� ��� � The constriction approach is effectively equivalent to the inertia weight approach. Both approaches have the objective of balancing exploration and exploitation, and in doing so of improving convergence time and the quality of solutions found. Low values of w and � result in exploitation with little exploration, while large values result in exploration with difficulties in refining solutions. For a specific �, the equivalent inertia model can be obtained by simply setting w = �, �1 = �c1r1 and �2 = �c2r2. ")� )� oX.�� The differences in the two approaches are that • velocity clamping is not necessary for the constriction model, • the constriction model guarantees convergence under the given constraints, and • any ability to regulate the change in direction of particles must be done via the constants �1

and �2 for the constriction model. • While it is not necessary to use velocity clamping with the constriction model,

Eberhart and Shi showed empirically that if velocity clamping and constriction are used together, faster convergence rates can be obtained [226].
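The constriction coefficient of equation (11.32) can be computed as follows (a sketch; it uses the common practical choice of taking φ = c1 + c2 as the upper bound of φ1 + φ2, and κ = 1 by default — these defaults are assumptions, not from the text):

```python
import math

def constriction(c1=2.05, c2=2.05, kappa=1.0):
    # Clerc's constriction coefficient:
    #   chi = 2*kappa / |2 - phi - sqrt(phi*(phi - 4))|
    # phi = c1 + c2 is used here as the upper bound of phi1 + phi2;
    # the derivation requires phi >= 4.
    phi = c1 + c2
    if phi < 4:
        raise ValueError("the constriction derivation assumes phi >= 4")
    return 2.0 * kappa / abs(2.0 - phi - math.sqrt(phi * (phi - 4.0)))
```

With the frequently used c1 = c2 = 2.05, the coefficient evaluates to roughly 0.73, a value often quoted alongside constriction-based PSO.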

7-4-4 Synchronous versus Asynchronous Updates
• In the gbest and lbest PSO, the best positions can be updated synchronously: the personal best and neighborhood (or global) best positions are updated once per iteration, after all particle positions have been updated.
• With asynchronous updates, the new best positions are calculated after each individual particle position update, giving the swarm immediate feedback about the best regions of the search space.
Carlisle and Dozier reason that asynchronous updates are more important for lbest PSO, where immediate feedback will be more beneficial in loosely connected swarms, while synchronous updates are more appropriate for gbest PSO [108].


• The best positions are usually updated deterministically from the entire swarm (gbest) or from the neighborhoods (lbest), although random selection of the best position has also been suggested, as discussed next.

• Selection of the global (or local) best positions is usually done by selecting the absolute best position found by the swarm (or neighborhood). Kennedy proposed to select the best positions randomly from the neighborhood [448]. This is done to break the effect that one, potentially bad, solution drives the swarm. The random selection was specifically used to address the difficulties that the gbest PSO experience on highly multi-modal problems. The performance of the basic PSO is also strongly influenced by whether the best positions (gbest or lbest) are selected from the particle positions of the current iterations, or from the personal best positions of all particles. The difference between the two approaches is that the latter includes a memory component in the sense that the best positions are the best positions found over all iterations. The former approach neglects the temporal experience of the swarm. Selection from the personal best positions is similar to the “hall of fame” concept (refer to Sections 8.5.9 and 15.2.1) used within evolutionary computation.

7-5 Basic PSO Parameters
7-5-1 Swarm size

The swarm size involves the following trade-offs: (1) a larger swarm covers a larger part of the search space and increases the initial diversity; (2) it can reduce the number of iterations needed to reach a good solution; (3) it increases the computational cost per iteration. (4) Empirical studies have shown that swarm sizes of roughly 10 to 30 particles suffice for many problems. (5) The smoother the search space, the fewer particles are needed.

7-5-2 Neighborhood size
• The smaller the neighborhoods, the less interaction occurs. While smaller neighborhoods

are slower in convergence, they have more reliable convergence to optimal solutions. Smaller neighborhood sizes are less susceptible to local minima.

• An alternative is to start the search with small neighborhoods and to increase the neighborhood size proportionally to the increase in number of iterations [820]. This approach ensures an initially high diversity, with faster convergence as the particles move towards a promising search area.

7-5-3 Initializing Particle Positions
Particles are usually initialized to uniformly cover the entire search space. It is important to note that the efficiency of the PSO is influenced by the initial diversity of the swarm, i.e. how much of the search space is covered, and how well particles are distributed over the search space.


Uniform initialization: assume that an optimum needs to be located within the domain defined by the two vectors, xmin and xmax, which respectively represent the minimum and maximum ranges in each dimension. Positions are then initialized as

x_ij(0) = x_min,j + r_j (x_max,j − x_min,j)

where r_j ~ U(0, 1).

7-5-4 Initializing Velocities
• The initial velocities can be initialized to zero, i.e.

vi(0) = 0 (11.10)

• Velocities can also be initialized to small random values, but this is not necessary and must be done with care. Large initial velocities will have large initial momentum, and consequently large initial position updates. Such large initial position updates may cause particles to leave the boundaries of the search space, and may cause the swarm to take more iterations before particles settle on a single solution.
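The initialization choices above can be sketched as follows (uniform positions, zero velocities; setting the personal bests to the initial positions follows in the next subsection — all names are illustrative):

```python
import random

def init_swarm(n_particles, x_min, x_max, rng=random.Random(42)):
    # Positions uniform per dimension:
    #   x_ij(0) = x_min_j + r * (x_max_j - x_min_j)
    # Velocities zero, personal bests equal to the initial positions.
    positions = [[lo + rng.random() * (hi - lo)
                  for lo, hi in zip(x_min, x_max)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * len(x_min) for _ in range(n_particles)]
    personal_bests = [p[:] for p in positions]
    return positions, velocities, personal_bests
```

The seed is fixed only to make the sketch reproducible; a real run would draw fresh random numbers.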

7-5-5 Initializing Personal Best Positions
The personal best position of each particle is initialized to the particle's position at time step t = 0, i.e.

yi(0) = xi(0)

For many implementations, uniform random initialization over the search space is used. If the initial swarm does not cover the regions of the search space that contain the optimum, the PSO will have difficulty finding that optimum; initial distributions that cover the search space well therefore improve the chances that the algorithm succeeds.

7-5-6 Acceleration coefficients
• Static acceleration coefficients: the constants c1 and c2 are also referred to as trust parameters, where c1 expresses how much confidence a particle has in itself, while c2 expresses how much confidence a particle has in its neighbors.

• If c1 = c2 = 0, particles keep flying at their current speed until they hit a boundary of the search space.

• If c1 > 0 and c2 = 0, all particles are independent hill-climbers. Each particle finds the best position in its neighborhood by replacing the current best position if the new position is better. Particles perform a local search.

• On the other hand, if c2 > 0 and c1 = 0, the entire swarm is attracted to a single point, y. The swarm turns into one stochastic hill-climber.


• The algorithm performs well when nostalgia (c1) and envy (c2) coexist in a good balance, i.e. c1 ≈ c2. If c1 = c2, particles are attracted towards the average of yi and ŷ [863, 870]. While most applications use c1 = c2, the ratio between these constants is problem dependent.
• If c1 >> c2, each particle is much more attracted to its own personal best position, resulting

in excessive wandering. • On the other hand, if c2 >> c1, particles are more strongly attracted to the global best

position, causing particles to rush prematurely towards optima. • For unimodal problems with a smooth search space, a larger social component will be

efficient, while rough multi-modal search spaces may find a larger cognitive component more advantageous.

• Low values for c1 and c2 result in smooth particle trajectories, allowing particles to roam far from good regions to explore before being pulled back towards good regions. High values cause more acceleration, with abrupt movement towards or past good regions.

• Usually, c1 and c2 are static, with their optimized values being found empirically. Wrong initialization of c1 and c2 may result in divergent or cyclic behavior [863, 870].

Adaptive acceleration coefficients: Clerc [134] proposed a scheme for adaptive acceleration coefficients, assuming the social velocity model (refer to Section 11.3.5):

11.35 where mi is as defined in equation (11.30). The formulation of equation (11.30) implies that each particle has its own adaptive acceleration as a function of the slope of the search space at the current position of the particle. • Suganthan suggested that both acceleration coefficients be linearly decreased, but

reported no improvement in performance using this scheme [820]. • Ratnaweera et al. proposed that c1 decreases linearly over time, while c2 increases linearly

[706]. This strategy focuses on exploration in the early stages of optimization, while encouraging convergence to a good optimum near the end of the optimization process by attracting particles more towards the neighborhood best (or global best) positions. The values of c1(t) and c2(t) at time step t are calculated as

c1(t) = (c1,min − c1,max) t/nt + c1,max (11.36)
c2(t) = (c2,max − c2,min) t/nt + c2,min (11.37)

where c1,max = c2,max = 2.5 and c1,min = c2,min = 0.5.

• Convergence conditions for parameter values: formal theoretical analyses of particle trajectories have been carried out to determine parameter values that ensure convergence of the PSO [136, 851, 863, 870].

• For a specific constriction model and selected κ value, the value of the constriction coefficient is calculated to ensure convergence. For an unconstricted simplified PSO


system that includes inertia, the trajectory of a particle converges if the following conditions hold [851, 863, 870, 937]:

1 > w > (φ1 + φ2)/2 − 1 (11.38)

and 0 ≤ w < 1. Since φ1 = c1 U(0, 1) and φ2 = c2 U(0, 1), the acceleration coefficients c1 and c2 serve as upper bounds of φ1 and φ2. Equation (11.38) can then be rewritten as

1 > w > (c1 + c2)/2 − 1 (11.39)

Therefore, if w, c1 and c2 are selected such that the condition in equation (11.39) holds, the system has guaranteed convergence to an equilibrium state.
• The heuristics above have been derived for the simplified PSO system with no stochastic

component. It can happen that, for stochastic φ1 and φ2 and a w that violates the condition stated in equation (11.38), the swarm may still converge. The stochastic trajectory illustrated in Figure 11.6 is an example of such behavior. The particle follows a convergent trajectory for most of the time steps, with an occasional divergent step.

• Van den Bergh and Engelbrecht show in [863, 870] that convergent behavior will be observed under stochastic φ1 and φ2 if the ratio, φ_ratio,

is close to 1.0, where

It is even possible that parameter choices for which φ_ratio = 0.5 may lead to convergent behavior, since particles spend 50% of their time taking a step along a convergent trajectory.
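The deterministic trajectory condition (11.39) is easy to check programmatically (a sketch; the function name is my own):

```python
def guarantees_convergence(w, c1, c2):
    # Deterministic-model trajectory condition from the text:
    #   1 > w > (c1 + c2)/2 - 1
    # Stochastic swarms may still converge when it is violated.
    return 1.0 > w > 0.5 * (c1 + c2) - 1.0
```

For example, the popular combination w ≈ 0.73, c1 = c2 ≈ 1.496 satisfies the condition, while w = 0.9 with c1 = c2 = 2.0 does not.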

Figure 11.6 Stochastic Particle Trajectory for w = 0.9 and c1 = c2 = 2.0


7-5-7 Stopping conditions
Two criteria should be kept in mind when selecting a stopping condition: (1) it should not cause the PSO to converge prematurely, and (2) it should protect against oversampling of the fitness when the swarm oscillates around a solution. The following stopping conditions are used:
1. Terminate when a maximum number of iterations (or function evaluations) has been exceeded. The quality of the solution found depends strongly on the chosen budget.
2. Terminate when an acceptable solution has been found, i.e. when f(xi) ≤ |f(x*) − ε|. This condition assumes prior knowledge of the true optimum, which is usually not available.
3. Terminate when no improvement is observed over a number of iterations. Improvement can be measured in different ways: (1) the average change in particle positions is small, or the average particle velocity is approximately zero, in which case the algorithm has stagnated and is terminated; (2) the slope of the fitness (objective) function, estimated over a number of iterations, is approximately zero.

4. Terminate when the normalized swarm radius is close to zero. The normalized swarm radius is calculated as [863]

Rnorm = Rmax / diameter(S)

where diameter(S) is the diameter of the initial swarm and the maximum radius, Rmax, is

Rmax = max_i || x_i(t) − ŷ(t) ||

When Rnorm is close to zero, the swarm has little potential for improvement, unless the global best is still moving. In the equations above, || • || is a suitable distance norm, e.g. the Euclidean distance. The algorithm is terminated when Rnorm < ε. If ε is too large, the search process may stop prematurely, before a good solution is found. Alternatively, if ε is too small, the search may take excessively many iterations for the particles to form a compact swarm, tightly centered around the global best position.
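The normalized swarm radius can be computed as follows (a sketch assuming the Euclidean norm; names are illustrative):

```python
import math

def normalized_radius(positions, global_best, initial_diameter):
    # Normalized swarm radius R_norm = R_max / diameter(S), where
    # R_max = max_i ||x_i - yhat|| (Euclidean norm assumed).
    r_max = max(math.dist(x, global_best) for x in positions)
    return r_max / initial_diameter
```

Termination would then compare the returned value against the threshold ε.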

7-6 Applications
• Particle clustering
• Game learning

7.6.1.1 Algorithm 11.17 PSO Coevolutionary Game Training Algorithm

Create and randomly initialize a swarm of NNs;
repeat
    Add each personal best position to the competition pool;
    Add each particle to the competition pool;
    for each particle (or NN) do
        Randomly select a group of opponents from the competition pool;
        for each opponent do
            Play a game (using game trees to determine next moves) against the opponent, playing as first player;
            Record if the game was won, lost or drawn;
            Play a game against the same opponent, but as the second player;
            Record if the game was won, lost or drawn;
        end
        Determine a score for each particle;
        Compute new personal best positions based on scores;
    end
    Compute neighbor best positions;
    Update particle velocities;
    Update particle positions;
until stopping condition is true;
Return global best particle as game-playing agent;

Table 11.1 Applications of Particle Swarm Optimization

7-7 Assignments

1. Write down the PSO algorithm.
2. Name a component that has an important influence on the convergence of the PSO algorithm.
3. Describe the components of the velocity and position update equations of PSO.
4. Explain the advantages and disadvantages of the gbest and lbest PSO.

5. Discuss in detail the differences and similarities between PSO and EAs.
6. Discuss how PSO can be used to cluster data.
7. *Why is it better to base the calculation of neighborhoods on the index assigned to particles and not on geometrical information such as Euclidean distance?
8. *Explain how PSO can be used to approximate functions using an n-th order polynomial.
9. *Show how PSO can be used to solve a system of equations.
10. If the basic PSO is used to solve a system of equations, what problem(s) do you foresee? How can these be addressed?


11. *How can PSO be used to solve problems with discrete-valued parameters? For the predator–prey PSO, what will be the effect if more than one predator is used?
12. Critically discuss the following strategy applied to a dynamic inertia weight: start with an inertia weight of 2.0, and linearly decrease it to 0.5 as a function of the iteration number.

13. The GCPSO was developed to address a specific problem with the standard PSO. What is this problem? If mutation is combined with the PSO, will this problem be addressed?

14. Consider the following adaptation of the standard gbest PSO algorithm: all particles, except for the gbest particle, use the standard PSO velocity and position update equations. The new position of the gbest particle is, however, determined by using the LeapFrog algorithm. Comment on this strategy. What advantages and disadvantages do you see? Will it solve the problem of the standard PSO in the question above?

15. Explain why the basic gbest PSO cannot be used to find niches (multiple solutions), neither in parallel nor sequentially (assuming that the fitness function is not allowed to be augmented). Explain why velocities should be initialized to zero for the NichePSO.

16. * Can it be said that PSO implements a form of (a) competitive coevolution? (b) cooperative coevolution?

Justify your answers. 17. *Discuss the validity of the following statement: “PSO is an EA.”

7-8 Advanced topics
While the basic PSO has shown some success, formal analysis [136, 851, 863, 870] has shown that the performance of the PSO is sensitive to the values of control parameters. It was also shown that the basic PSO has a serious defect that may cause stagnation [868]. A variety of PSO variations have been developed, mainly to improve the accuracy of solutions, diversity and convergence behavior:
• Guaranteed Convergence PSO
• Social-Based Particle Swarm Optimization
• Fully Informed PSO
• Selection-Based PSO
• Cooperative Split PSO
• Predator-Prey PSO
• Multi-Start PSO Algorithms
• Repelling Methods
• Binary PSO: PSO was originally developed for continuous-valued search spaces. Kennedy and Eberhart developed the first discrete PSO to operate on binary search spaces [450, 451]. Since real-valued domains can easily be transformed into


binary-valued domains (using standard binary coding or Gray coding), this binary PSO can also be applied to real valued optimization problems after such transformation (see [450, 451] for applications of the binary PSO to real-valued problems).

• Constraint Handling Approaches
• Multi-Objective Optimization
• Dynamic Environments
• Niching PSO



12. Ant Algorithms

Reference: Chapter 17 of the textbook.
Ants appeared on earth about 100 million years ago, and their total population is estimated at 10^16 individuals; they are social insects that live in colonies. The first detailed studies of social insect behavior were made by the South African naturalist Eug´ene Marais (1872–1936), who studied termite societies. In The Soul of the Ant [558] (first published in 1937, after his death), he described in detail his experimental procedures and observations of the workings of termite societies.

• The Belgian writer Maurice Maeterlinck (1862–1949) published the book The Life of the White Ant, which drew heavily on Marais' work. The French entomologist Pierre-Paul Grassé later studied the nest-building of termites and formalized their mechanism of indirect communication, which he called stigmergy.
• Complex collective behavior emerges from simple individual behaviors: for example, the trail left by foraging ants stimulates other ants to follow it.

Social insects exhibit a variety of complex collective activities, including:
• foraging for food
• division of labour
• brood care
• cemetery organization
• construction of the nest
• These are stimulus–response behaviors: each individual acts on simple local information, without any central coordination, and the complex behavior of the colony emerges from these simple individual actions.
The studies of these behaviors have inspired algorithms for the following problem classes:
1. foraging: for finding shortest paths;
2. division of labour: for task allocation and scheduling;
3. cemetery organization: for clustering.


12-1 Foraging (Ant Colony Optimization Meta-Heuristic)
How do ants find the shortest path? Real ants are able to select the shortest route between their nest and a food source without any central coordinator directing the colony. The first insights into this behavior came from controlled experiments with ant colonies.

Stigmergy
The term stigmergy was coined by Grassé [333] to describe the indirect communication among individuals that is mediated by modifications of the environment. The word is formed from the Greek words stigma, meaning sign, and ergon, meaning work. Two forms of stigmergy are distinguished: sematectonic and sign-based.

• Sematectonic stigmergy is a form of indirect communication via physical modification of the environment; examples include nest building, nest cleaning, and brood sorting.
Grassé observed that coordination and regulation of nest-building activities are not on an individual level, but are achieved by the current nest structure: the actions of individuals are triggered by the current configuration of the nest. Similar observations have been made by Marais [558] with respect to Termes natalensis.

• Sign-based stigmergy is indirect communication via a signaling mechanism: for example, an ant deposits a pheromone trail along its path, which influences the path choices of the ants that follow. The action may reinforce or modify signals to influence the actions of other individuals.

The bridge experiment
Deneubourg et al. [199] studied the foraging behavior of the Argentine ant Iridomyrmex humilis in order to model its path-selection behavior. In this experiment the nest was connected to a food source by a bridge with two branches of equal length, and each ant had to choose one of the two branches. The experiment showed that, after an initial random phase, the colony converges on one of the two branches: positive reinforcement of pheromone trails causes almost all traffic to end up on a single branch. This setup is known as the binary bridge experiment.

Figure 12.1 Binary Bridge Experiment

The observed behavior can be modeled as follows:


1. Selection of the first path is random: initially there is no pheromone on either branch, so the first ants choose a branch randomly and deposit pheromone on it as they cross.
2. Subsequent ants prefer the branch with the higher pheromone concentration, and reinforce it further by depositing their own pheromone. This behavior was modeled by Pasteels et al. [666] as follows:

P_A(t+1) = (c + n_A(t))^α / [ (c + n_A(t))^α + (c + n_B(t))^α ] = 1 − P_B(t+1) (12.1)

• n_A(t) and n_B(t) are the numbers of ants that have chosen paths A and B at time step t.
• c quantifies the degree of attraction of an unexplored branch; the larger c, the greater the amount of pheromone needed to make the choice of path non-random.
• α is the bias towards pheromone deposits: the larger α, the higher the probability of selecting the path with more pheromone, and the smaller the influence of randomness. Experimental data are best fitted with α ≈ 2 and c ≈ 20.
• An ant selects its path by comparing a draw from U(0, 1) with P_A(t+1).
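Equation (12.1) can be evaluated directly (a sketch; the defaults α = 2 and c = 20 follow the experimental fit mentioned above):

```python
def p_choose_a(n_a, n_b, c=20.0, alpha=2.0):
    # Equation (12.1): probability that the next ant chooses branch A,
    #   P_A = (c + n_A)^alpha / ((c + n_A)^alpha + (c + n_B)^alpha)
    # Defaults follow the experimental fit alpha ~ 2, c ~ 20.
    num = (c + n_a) ** alpha
    return num / (num + (c + n_b) ** alpha)
```

With no prior crossings the choice is an even coin flip; as one branch accumulates crossings, its selection probability grows past 0.5.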

Selection of the shortest path
Goss et al. [330] extended the binary bridge experiment, where one of the branches of the bridge was longer than the other, as illustrated in Figure 12.2.

Figure 12.2 Shortest Path Selection by Forager Ants

• Initially, paths are chosen randomly, with approximately the same number of ants following both paths (as illustrated in Figure 12.2(a)).
• Over time, more and more ants select the shorter path, since ants on the shorter branch travel between the nest and the food source faster, so pheromone accumulates on that branch more quickly.
• Length ratio: Goss et al. [330] found that the probability of selecting the shorter path

increases with the length ratio between the two paths. This has been referred to as the differential path length effect by Dorigo et al. [210, 212].

12-4

• In this model, the selection of the shortest path emerges from the differential path length effect, driven by the positive feedback of pheromone accumulation [210, 212].
• The pheromone acts as a stimulus: each ant responds to the perceived pheromone concentrations with a probabilistic response that determines its path choice.
This decision process of an artificial ant is summarized in Algorithm 12.1.

Algorithm 12.1 Artificial Ant Decision Process

Let r ~ U(0, 1);
for each potential path A do
    Calculate PA using, e.g., equation (12.1);
    if r ≤ PA then
        Follow path A;
        Break;
    end
end

• Other probability distributions, for example roulette-wheel selection over the path probabilities, may also be used.
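Algorithm 12.1, interpreted as roulette-wheel selection over the path probabilities, can be made runnable as follows (a sketch; the cumulative-sum interpretation and all names are mine):

```python
import random

def choose_path(path_probs, rng=random.Random(1)):
    # Draw r ~ U(0,1) once and walk the cumulative probabilities
    # until r is covered (roulette-wheel selection).
    # `path_probs` maps path labels to probabilities summing to ~1.
    r = rng.random()
    cumulative = 0.0
    for path, p in path_probs.items():
        cumulative += p
        if r <= cumulative:
            return path
    return path  # guard against floating-point round-off
```

Accumulating the probabilities guarantees that exactly one path is chosen, which the literal per-path comparison in Algorithm 12.1 relies on implicitly.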

Foraging-based optimization
• The collection of algorithms inspired by the foraging behavior described above is referred to as the ant colony optimization meta-heuristic (ACO-MH). Algorithms in this class include:
• the basic ant algorithm and the ant system (AS)
• ant colony system (ACS), max-min ant system (MMAS), Ant-Q
• fast ant system, Antabu, AS-rank, ANTS

12-1-1 Simple Ant Colony Optimization (SACO)
SACO was the first algorithmic implementation of the ant metaphor, based on what Dorigo and Di Caro call artificial stigmergy, defined as
• “indirect communication mediated by numeric modifications of environmental states which are only locally accessible by the communicating agents”.
• The overall algorithm proceeds as follows:
1. represent the problem as a search graph;
2. place ants on the graph and distribute the problem among them (depending on the problem);
3. deposit the initial pheromone on the links;
4. let each ant select links and move according to a transition probability;


5. evaluate the constructed solutions using the fitness function;
6. deposit pheromone on the links of each path, in proportion to the fitness of the solution;
7. apply pheromone evaporation, and repeat until a stopping condition is met.
These components of the algorithm are discussed below:

12-1-2 The Search Graph
The solution space is represented as a graph on which the ants search; the graph consists of nodes and edges.

Figure 12.3 Graph for Shortest Path Problems

Graph representations allow many optimization problems to be solved as shortest-path problems. The graph illustrated in Figure 12.3, G = (V, E), has 2n nodes: for each of the n decision variables there are two nodes, one representing the value 0 and one representing the value 1. A path can therefore pass through only one of the two nodes associated with each variable.

12-1-3 Initial Conditions
• The initial pheromone on each link is set to a small value (or to zero).
• For shortest-path optimization, the ants, k = 1, . . . , nk, are placed at the source node.
• For problems that require a path that connects all nodes, once only, the ants are distributed randomly over the nodes of the graph.
• A pheromone concentration, τij, is associated with each edge, (i, j), of the graph. τ is the stigmergic variable which encapsulates the information used by the artificial ants to communicate indirectly.


12-1-4 Transition Probability (link selection)
An ant selects the next link probabilistically. The various ant algorithms use different transition probability formulas, presented below.

Ant system (AS)

• Equation (12.6) gives the probability that ant k, located at node i, selects node j from the set N_i^k of feasible nodes.
• Each ant starts at the source node, and already visited nodes are excluded from the feasible set; loop formation is nevertheless possible in this algorithm.
• The coefficient α: the larger α, the stronger the influence of the pheromone concentration on the selection of the next link, which can increase the speed of convergence.
• ηij expresses the a priori effectiveness of a single link, i.e. the quality of the move before any search; it can be computed as the inverse of the link cost, ηij = 1/dij. In simple ant colony optimization (SACO) this heuristic term is not used (effectively ηij = 1).
• If α = 0, the pheromone is not taken into account and the algorithm reduces to a stochastic greedy search; with β = 0, the a priori quality of the link is not taken into account.

An alternative form of this transition rule was proposed by Maniezzo and Colorni [557].

In equation (12.8), α plays the same role, but the pheromone and heuristic terms are combined as a weighted sum, with the heuristic weighted by (1 − α).
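As a minimal sketch, the AS transition rule of equation (12.6) can be implemented as below. The edge-keyed dictionaries `tau` and `eta` and the roulette-wheel selection are illustrative assumptions, not notation from the text.

```python
import random

def as_transition(i, feasible, tau, eta, alpha=1.0, beta=2.0):
    """Pick the next node j from `feasible` with probability
    proportional to tau[(i, j)]**alpha * eta[(i, j)]**beta (eq. 12.6)."""
    weights = [tau[(i, j)] ** alpha * eta[(i, j)] ** beta for j in feasible]
    total = sum(weights)
    r, acc = random.random() * total, 0.0
    for j, w in zip(feasible, weights):
        acc += w
        if r < acc:
            return j
    return feasible[-1]  # numerical safety net
```

With alpha = 0 the rule degenerates to a heuristic-only (stochastic greedy) choice, and with beta = 0 only the pheromone drives the selection, as noted above.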

Ant colony system (ACS)

The ACS transition rule of equation (12.17) decides between exploitation and exploration. Let r ~ U(0, 1), r0 ∈ [0, 1], and let N_i^k(t) denote the set of nodes that ant k is allowed to select from node i. With probability r0 the ant moves to the best node j ∈ N_i^k(t); otherwise a node J ∈ N_i^k(t) is selected according to equation (12.18).

• The parameter r0 is used to balance exploration and exploitation:
• if r ≤ r0, the algorithm exploits by favoring the best edge;



• if r > r0, the algorithm explores. Therefore, the smaller the value of r0, the less the best links are exploited, while exploration is emphasized more.

Note that equation (12.18) is the same as the AS transition rule of equation (12.6) with α = 1.

12-1-5 Evaporation and pheromone deposit
In these algorithms, after every ant has constructed a complete path from the source node, loops are removed and each ant retraces its path, depositing pheromone on every link of it; the amount of pheromone deposited is determined by the quality (evaluation) of the path.

Three variations of the pheromone deposit rule have been proposed for AS:

• Ant-cycle AS:

12.11 pheromone deposits are inversely proportional to the quality, f(xk(t)), of the complete path constructed by the ant. Q is a positive constant.

For maximization problems, the deposit is given by

12.12 In SACO, Q = 1 is used. For the shortest path problem of Figure 12.3, f(xk(t)) = Lk, the length of the path constructed by ant k.

• Ant-density AS:

12.13 Each ant deposits the same constant amount of pheromone, Q, on every link it selects, independent of the quality of the complete path.

• Ant-quantity AS:

12.14 In this rule only the local link cost dij is used in the deposit; the quality of the complete path is not taken into account.


12-1-6 Update variants in the AS algorithms

Pheromone evaporation was added to the improved algorithm. It was observed that, without evaporation, the algorithm tends to stagnate on the paths found early in the search. To prevent premature convergence, pheromone evaporation is applied when the pheromone is updated:

τij(t + 1) = (1 − ρ)τij(t) + Δτij(t)   12.9

12.10 In this update, ρ ∈ [0, 1] is the evaporation (forgetting) rate and nk is the number of ants, with Δτij(t) summing the deposits Δτij^k(t) of all nk ants. For ρ = 1 (no memory), all previous pheromone evaporates and the search degenerates toward a random search. This update is also used in SACO.
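The evaporation-plus-deposit cycle of equations (12.9)-(12.11) can be sketched as below, using ant-cycle deposits; the edge-keyed dictionary representation is an assumption for illustration.

```python
def update_pheromone(tau, paths, costs, rho=0.5, Q=1.0):
    """AS pheromone update: evaporate every trail by (1 - rho)
    (the first term of eq. 12.9), then add the ant-cycle deposit
    Q / f(x_k) (eq. 12.11) on each link of each ant's path."""
    for edge in tau:
        tau[edge] *= (1.0 - rho)           # evaporation
    for path, cost in zip(paths, costs):
        for edge in path:
            tau[edge] += Q / cost          # deposit, inversely prop. to cost
    return tau
```

For example, with rho = 0.5 and Q = 1, a link on a path of cost 2 keeps half its old trail and gains a deposit of 0.5, while unused links simply lose half their trail.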

In ACS two types of pheromone update are used: a local update, applied as each ant traverses a link, and a global update, written as follows.

Local update: τij(t) = (1 − ρ2)τij(t) + ρ2τ0   12.21

where ρ2 ∈ (0, 1) and τ0 is a small constant value. The global update is applied at time t only to the links of the global best path, x+(t):

τij(t + 1) = (1 − ρ1)τij(t) + ρ1Δτij(t)   (12.19)

12.20 with f(x+(t)) = |x+(t)|, in the case of finding shortest paths.

• Gambardella and Dorigo [215, 301] proposed two ways of selecting x+(t):
• iteration-best, where x+(t) represents the best path found during the current iteration;
• global-best, where x+(t) represents the best path found till now.
• Pheromone evaporation:

o For small values of ρ1, the existing pheromone concentrations on links evaporate slowly, while the influence of the best route is dampened.

o On the other hand, for large values of ρ1, previous pheromone deposits evaporate rapidly, but the influence of the best path is emphasized. The effect of large ρ1 is that previous experience is neglected in favor of more recent experiences. Exploration is emphasized.

o While the value of ρ1 is usually fixed, a strategy where ρ1 is adjusted dynamically from large to small values will favor exploration in the initial iterations of the search, while focusing on exploiting the best found paths in the later iterations.
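The two ACS updates can be sketched as small helpers; the dictionary representation and default parameter values are assumptions for illustration.

```python
def acs_local_update(tau, edge, rho2=0.1, tau0=0.01):
    """Local update, eq. (12.21): pull the just-traversed edge back
    toward the small constant tau0, discouraging immediate reuse."""
    tau[edge] = (1.0 - rho2) * tau[edge] + rho2 * tau0
    return tau

def acs_global_update(tau, best_path, best_cost, rho1=0.1):
    """Global update, eq. (12.19): only links of the best path x+ get a
    deposit of 1 / f(x+) (eq. (12.20) uses the path length as cost)."""
    for edge in best_path:
        tau[edge] = (1.0 - rho1) * tau[edge] + rho1 / best_cost
    return tau
```

Note the asymmetry emphasized above: the local rule weakens trails as they are used, while the global rule reinforces only the best path.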



Ant-tabu
In addition to using tabu search as a local search procedure, the global update rule is changed such that each ant's pheromone deposit on each link of its constructed path is proportional to the quality of the path. Each ant, k, updates pheromones using

(12.35) where f(x̃(t)) is the cost of the worst path found so far, f(x̂(t)) is the cost of the best path found so far, and f(xk(t)) is the cost of the path found by ant k. Equation (12.35) is applied for each ant k for each link (i, j) ∈ xk(t).

12-1-7 Termination
Any of a number of termination criteria can be used in Algorithm 12.2, for example:
• terminate when a maximum number of iterations, nt, has been exceeded;
• terminate when an acceptable solution has been found, with f(xk(t)) ≤ ε;
• terminate when all ants (or most of the ants) follow the same path.

12-2 Ant Algorithm Variants
Ant System (AS)
AS improves on SACO and was proposed by Dorigo. It adds the heuristic value η (the a priori quality of a link) to the search.

Algorithm 12.3 Ant System Algorithm

t = 0;
Initialize all parameters, i.e. α, β, ρ, Q, nk, τ0;
Place all ants, k = 1, . . . , nk;
for each link (i, j) do
    τij(t) ~ U(0, τ0);
end
repeat
    for each ant k = 1, . . . , nk do
        xk(t) = ∅;
        repeat
            From current node i, select next node j with probability as defined in equation (12.6);
            xk(t) = xk(t) ∪ {(i, j)};
        until full path has been constructed;
        Compute f(xk(t));
    end
    for each link (i, j) do
        Apply evaporation;
        Calculate Δτij(t) using equation (12.10);
        Update pheromone using equation (12.9);
    end
    for each link (i, j) do
        τij(t + 1) = τij(t);
    end
    t = t + 1;
until stopping condition is true;
Return xk(t) : f(xk(t)) = mink'=1,...,nk{f(xk'(t))};

• An ACO TSP program written in MATLAB solves the TSP. The program comes with a HELP and consists of 3 main files: init, which initializes the parameters; aco, the main program that runs the algorithm; and getcities, which describes the problem instance. This code can also be used as a template for solving other problems.

Ant Colony System
The ant colony system (ACS) was developed by Gambardella and Dorigo to improve the performance of AS [77, 215, 301]. It differs from AS in four respects: (1) a different transition rule is used, (2) a different pheromone update rule is defined, (3) local pheromone updates are introduced, and (4) candidate lists are used to favor specific nodes.

Algorithm 12.4 Ant Colony System Algorithm

t = 0;
Initialize parameters α, ρ1, ρ2, r0, τ0, nk;
Place all ants, k = 1, . . . , nk;
for each link (i, j) do
    τij(t) ~ U(0, τ0);
end
x̂(t) = ∅; f(x̂(t)) = 0;
repeat
    for each ant k = 1, . . . , nk do
        xk(t) = ∅;
        repeat
            if ∃j ∈ candidate list then
                Choose j ∈ N_i^k(t) from candidate list using equations (12.17) and (12.18);
            end
            else
                Choose non-candidate j ∈ N_i^k(t);
            end
            xk(t) = xk(t) ∪ {(i, j)};
            Apply local update using equation (12.21);
        until full path has been constructed;
        Compute f(xk(t));
    end
    x = xk(t) : f(xk(t)) = mink'=1,...,nk{f(xk'(t))};
    Compute f(x);
    if f(x) < f(x̂(t)) then
        x̂(t) = x;
        f(x̂(t)) = f(x);
    end
    for each link (i, j) ∈ x̂(t) do
        Apply global update using equation (12.19);
    end
    for each link (i, j) do
        τij(t + 1) = τij(t);
    end
    x̂(t + 1) = x̂(t); f(x̂(t + 1)) = f(x̂(t));
    t = t + 1;
until stopping condition is true;
Return x̂(t) as the solution;

Max-Min Ant System
• Stützle and Hoos derived the max-min ant system (MMAS) from AS to counteract stagnation. The method differs from AS in four respects: only the best ant deposits pheromone; pheromone concentrations are bounded; the initial pheromone concentration is set to its maximum, τmax(0) = τmax; and a recovery mechanism is applied when the algorithm stagnates.
• Pheromone bounds: Stützle and Hoos [816] obtained a value of τmax proportional to the quality of the optimum solution:

12.25 where f* is the cost (e.g. length) of the theoretical optimum solution. Since the optimal solution is generally not known, and therefore f* is an unknown value, MMAS initializes τmax to an estimate of f*, by setting f* = f(x̂(t)), where f(x̂(t)) is the cost of the global-best path.

12.26


12.27 Δτij(t) ∝ (τmax(t) − τij(t))   (12.28)

Global update: MMAS uses the pheromone update rule of equation (12.19) from ACS, in which x+(t) is either the global-best or the iteration-best path. Relying on the global-best path pulls the search strongly toward the best solution found so far and restricts exploration; using the iteration-best path allows more exploration, since it can change from one iteration to the next. Algorithm stagnation: when the algorithm stagnates, the τij values are increased toward the maximum value (equation (12.28)).
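The two MMAS mechanisms just described, trail limits and stagnation recovery, can be sketched as below; the dictionary representation and the proportionality constant `delta` are assumptions for illustration.

```python
def clamp_trails(tau, tau_min, tau_max):
    """MMAS trail limits: constrict every tau_ij into [tau_min, tau_max],
    keeping selection probabilities bounded away from 0 and 1."""
    for edge, value in tau.items():
        tau[edge] = min(max(value, tau_min), tau_max)
    return tau

def reinit_on_stagnation(tau, tau_max, delta=1.0):
    """Stagnation recovery in the spirit of eq. (12.28): raise each trail
    proportionally to (tau_max - tau_ij)."""
    for edge in tau:
        tau[edge] += delta * (tau_max - tau[edge])
    return tau
```

With delta = 1 the recovery step resets every trail all the way to tau_max, which corresponds to a full re-initialization.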

Algorithm 12.5 MMAS Algorithm with Periodic Use of the Global-Best Path

Initialize parameters α, β, ρ, nk, p̂, τmin, τmax, f̂;
t = 0, τmax(0) = τmax, τmin(0) = τmin;
Place all ants, k = 1, . . . , nk;
τij(t) = τmax(0), for all links (i, j);
x+(t) = ∅, f(x+(t)) = 0;
repeat
    if stagnation point then
        for each link (i, j) do
            Calculate Δτij(t) using equation (12.28);
            τij(t + 1) = τij(t) + Δτij(t);
        end
    end
    for each ant k = 1, . . . , nk do
        xk(t) = ∅;
        repeat
            Select next node j with probability defined in equation (12.6);
            xk(t) = xk(t) ∪ {(i, j)};
        until full path has been constructed;
        Compute f(xk(t));
    end
    (t mod f̂ = 0) ? (IterationBest = false) : (IterationBest = true);
    if IterationBest = true then
        Find iteration-best: x+(t) = xk(t) : f(xk(t)) = mink'=1,...,nk{f(xk'(t))};
        Compute f(x+(t));
    end
    else
        Find global-best: x = xk(t) : f(xk(t)) = mink'=1,...,nk{f(xk'(t))};
        Compute f(x);
        if f(x) < f(x+(t)) then
            x+(t) = x;
            f(x+(t)) = f(x);
        end
    end
    for each link (i, j) ∈ x+(t) do
        Apply global update using equation (12.19);
    end
    Constrict τij(t) to be in [τmin(t), τmax(t)] for all (i, j);
    x+(t + 1) = x+(t); f(x+(t + 1)) = f(x+(t));
    t = t + 1;
    Update τmax(t) using equation (12.26);
    Update τmin(t) using equation (12.27);
until stopping condition is true;
Return x+(t) as the solution;

AS-rank
Bullnheimer et al. [94] proposed a modification of AS to:
1. allow only the best ant to update pheromone concentrations on the links of the global-best path,
2. use elitist ants, and
3. let ants update pheromone on the basis of a ranking of the ants.

12-2-1 Parameter Settings
Table 12.1 summarizes the general ACO algorithm parameters.

Table 12.1 General ACO Algorithm Parameters

1. nk, the number of ants:
• Too many ants rapidly increase the total amount of pheromone deposited, reinforcing suboptimal paths; performance does not necessarily improve, while computational cost does.
• Too few ants reduce the exploration capability of the algorithm.
• The following relation has been proposed for determining a suitable number of ants:

(12.42) where ξ1τ0 is the average pheromone concentration on the edges of the last best path before the global update, and ξ2τ0 after the global update. Unfortunately, the optimal values of ξ1 and ξ2 are not known. Again for the TSP, empirical analyses showed that ACS worked best when (ξ1 − 1)/(ξ2 − 1) ≈ 0.4, which gives nk = 10 [215].

This relation was extracted empirically for the example above, so its generality is limited.
• Selecting suitable values for the remaining parameters is problem dependent.

12-3 Cemetery Organization and Brood Care
• Corpse piling (cemetery formation)
• Brood sorting and care
• Artificial algorithms derived from these behaviors: While these behaviors are still not fully understood, a number of studies have resulted in mathematical models to simulate the clustering and sorting behaviors. Based on these simulations, algorithms have been implemented to cluster data, to draw graphs, and to develop robot swarms with the ability to sort objects.

12-3-1 Basic Ant Colony Clustering Model
The first clustering model was developed by Chrétien, based on observations of the ant Lasius niger [127]. From these experiments it was found that the probability of an ant dropping a corpse next to a cluster of n corpses is proportional to 1 − (1 − p)^n for n ≤ 30, where p is a fitting parameter [78]. Since an ant cannot directly determine the size of a cluster (e.g. by counting), it has to rely on local information about the corpses it perceives. Accordingly, ants tend to pick up corpses in areas of low corpse density and to drop them where the density is high. This behavior was modeled by Deneubourg et al. [200]; the resulting model is also called the basic algorithm.

The basic algorithm aggregates corpses as follows:

1. Assume a grid in whose cells corpses are scattered randomly, and a number of ants that move randomly from cell to cell of the grid.
2. An unladen ant picks up a corpse (if one is present in its cell) with the probability

(12.43) In this relation γ1 > 0, and λ = n/T is referred to as the fraction of items: n is the number of corpses the ant has encountered during the last T time steps (the number of corpses present in the neighboring cells can also be used).
• For λ << γ1, Pp approaches 1.
• For λ >> γ1, Pp approaches 0.
3. A laden ant drops its corpse onto a grid cell with the probability

(12.44)



provided that the corresponding cell is empty; γ2 > 0. If a large number of items is observed in the neighborhood, i.e. λ >> γ2, then Pd approaches 1, and the probability of dropping the item is high. If λ << γ2, then Pd approaches 0.

• The algorithm can also sort items of different types, e.g. corpse types A and B. In equations (12.43) and (12.44) the fraction λ is then computed per type, i.e. λA or λB, according to the type of corpse the ant is carrying or observing in its neighborhood.
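The pick-up and drop probabilities of the basic model can be sketched as below, assuming the standard squared-ratio form for equations (12.43) and (12.44); the default γ values are illustrative.

```python
def pick_up_probability(lam, gamma1=0.3):
    """Eq. (12.43): an unladen ant picks up an item with
    Pp = (gamma1 / (gamma1 + lam))**2, where lam is the perceived
    fraction of items in the neighbourhood."""
    return (gamma1 / (gamma1 + lam)) ** 2

def drop_probability(lam, gamma2=0.3):
    """Eq. (12.44): a laden ant drops its item with
    Pd = (lam / (gamma2 + lam))**2."""
    return (lam / (gamma2 + lam)) ** 2
```

The limiting behavior matches the bullets above: for lam → 0, Pp → 1 and Pd → 0; for lam much larger than the γ constants the roles reverse.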

12-3-2 Generalized Ant Colony Clustering Model
Lumer and Faieta [540] generalized the basic algorithm so that it can be applied to the clustering of numerical data.
1. Each item is a data vector. The data vectors are scattered randomly over the cells of a grid; the number of grid cells exceeds the number of data vectors, and each cell holds at most one data vector.
2. The ants are placed randomly on the grid and move over it randomly.
3. A distance d(ya, yb) between two data vectors is needed; standard distance measures such as the Euclidean distance can be used. d = 0 means complete similarity. The goal is that the more similar two data vectors are, the closer together they should end up, so that each cluster contains similar vectors and dissimilar vectors are separated into different clusters.

4. The “local” density, λ(ya), of data vector ya within the ant’s neighborhood is then given as

(12.45)

where ya is the data vector at site i (the ant's position), and the yb are the data vectors in the neighborhood patch N of site i. This measure indicates how similar ya is to the other data vectors in the neighborhood of i: large values of λ(ya) indicate that the position suits ya, since the neighboring data vectors resemble it, while small values indicate that ya does not belong at its current position.

where γ > 0 defines the scale of dissimilarity between items ya and yb.
• A large γ produces fewer, coarser clusters and increases the speed of the process.
• A small γ keeps dissimilar items apart more strictly, producing many small clusters.

5. Using the measure of similarity, λ(ya), the picking up and dropping probabilities are defined as [540]


(12.47)
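A sketch of the local density of equation (12.45) follows; note the normalization by the number of neighbours and the default gamma are simplifying assumptions for illustration.

```python
def euclidean(a, b):
    """Euclidean distance between two data vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def local_density(ya, neighbours, gamma=0.5):
    """Lumer-Faieta local density (eq. 12.45, sketched): average
    similarity of data vector ya to the vectors yb in the ant's
    neighbourhood, with gamma scaling the dissimilarity d(ya, yb);
    negative densities are clipped to zero."""
    if not neighbours:
        return 0.0
    s = sum(1.0 - euclidean(ya, yb) / gamma for yb in neighbours)
    return max(s / len(neighbours), 0.0)
```

A vector surrounded by identical vectors gets density 1, while a vector surrounded by very different ones gets density 0, driving the pick-up/drop probabilities of equations (12.46) and (12.47).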

Algorithm 12.7 Lumer–Faieta Ant Colony Clustering Algorithm

Place each data vector ya randomly on a grid;
Place nk ants on randomly selected sites;
Initialize values of γ1, γ2, γ and nt;

Parameters of the algorithm:
• nt is the maximum number of iterations.
• The grid size: the grid must have more sites than data vectors, since each site can hold at most one data vector. Too small a grid leaves the ants no room to move and produces many small clusters; too large a grid slows the formation of clusters.
• The number of ants should be less than the number of data vectors.

Modifications to the Lumer–Faieta Algorithm
The algorithm tends to produce too many clusters; a number of modifications have been proposed to address this:

1. Different Moving Speeds: the ants do not all move at the same speed, as given by the relation below. Fast-moving ants are less accurate and form coarse clusters, while slow ants act more precisely. The speed of each ant is



(12.48) where v ~ U(1, vmax), and vmax is the maximum moving speed.
2. Short-Term Memory: Each ant may have a short-term memory, which allows the ant to remember a limited number of previously carried items and the position where these items have been dropped [540, 601].
3. Behavioral Switches: Ants are not allowed to start destroying clusters if they have not performed an action for a number of time steps. This strategy allows the algorithm to escape from local optima.
4. Pick-up and Dropping Probabilities: Pick-up and dropping probabilities different from equations (12.46) and (12.47) have been formulated, specifically in attempts to speed up the clustering process:

• Yang and Kamel [931] use a continuous, bounded, monotonically increasing function to calculate the pick-up probability as

Pp = 1 − fs(λ(ya))   (12.52)

and the dropping probability as

Pd = fs(λ(ya))   (12.53)

where fs is the sigmoid function.

12-4 Division of Labor
In insect colonies, labor is divided among the members according to their capabilities. Typical tasks include: reproduction, caring for the young, foraging, cemetery organization, waste disposal.
• Task allocation in a large colony happens in a self-organized way, without central control.
• The resulting division of labor forms regular patterns, without any explicit blueprint.
• Although many tasks are performed by specialized workers, the allocation is not fixed: when colony needs or environmental conditions change, workers switch tasks.
Task switching
• Some tasks are performed by a single member; e.g. reproduction is carried out by the queen.
• Temporal polyethism: ants of the same age perform similar tasks.
• Worker polymorphism: ants of the same size (caste) perform similar tasks.
• Individual variability: the division of labor according to size, age and other traits is not absolute; it also depends on environmental conditions and colony demands. What is of prime importance is the survival of the colony.

12-4-1 Task Allocation Based on Response Thresholds

Single Task Allocation


1. Each ant k has, for each task j, a response threshold θkj. Ant k starts performing task j when the stimulus sj of task j exceeds the threshold θkj.
2. sj is the stimulus associated with task j.
   o The stimulus increases while the task is left unperformed.
   o It decreases when many individuals are busy performing the task.
3. The probability that ant k performs task j is

(12.60) where n > 1 determines the steepness of the threshold. Usually, n = 2. For sj << θkj, Pθkj(sj) is close to zero, and the probability of performing task j is very small. For sj >> θkj, the probability of performing task j is close to one. • An alternative threshold response function is [77]

(12.61) Assume there is only one task (j = 1) and only two states: xk = 0 for inactive members and xk = 1 for active (engaged) members.

The probability that an inactive member becomes active is then

(12.62) and the probability that an active member gives up the task is

P(xk = 1 → xk = 0) = p   (12.63)

Thus each member works for 1/p time units, on average, after taking up the task, and then gives it up regardless of the stimulus.

• Stimulus intensity
Stimulus intensity changes over time, due to increase in demand and task performance. If δ is the increase in demand, φ is the decrease associated with one ant performing the task, and nact is the number of active ants, then

s(t + 1) = s(t) + δ − φ·nact   (12.64)

The more ants engaged in the task, the smaller the intensity, s, becomes, and consequently, the smaller the probability that an inactive ant will take up the task. On the other hand, if all ants are inactive, or if there are not enough ants busy with the task (i.e. δ > φ·nact), the probability increases that inactive ants will participate in the task.
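The threshold response and the stimulus dynamics can be sketched together; the symbols delta and phi stand in for the garbled originals, as noted above.

```python
def task_probability(s, theta, n=2):
    """Response threshold of eq. (12.60): probability that an ant with
    threshold theta engages in a task whose stimulus is s."""
    return s ** n / (s ** n + theta ** n)

def stimulus_step(s, delta, phi, n_active):
    """Stimulus dynamics of eq. (12.64): demand grows by delta per step
    and shrinks by phi for every active ant."""
    return s + delta - phi * n_active
```

At s = theta the engagement probability is exactly 0.5, and it saturates toward 0 or 1 as the stimulus moves away from the threshold.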

Allocation of Multiple Tasks
Now assume nj tasks, and let nkj denote the number of the nk workers engaged in task j.

Each individual has a vector of thresholds, �k, where each �kj is the threshold allocated to the stimulus of task j. After 1/p time units of performing task j, the ant stops with this task, and



selects another task on the basis of the probability defined in equation (12.60). It may be the case that the same task is again selected.

12-5 Applications
Table 12.2 lists some applications of these methods.

Table 12.2 Ant Algorithm Applications

12-6 Assignments
1. Write down the working steps of the ant algorithms.
2. Write down the path (transition) probability, evaporation, and update functions. (Memorizing them is not required.)
3. How is a suitable value for the number of ants obtained?
4. How are the grid dimensions and the number of ants determined in the clustering problem?

5. *Consider the following situation: ant A1 follows the shortest of two paths to the food source, while ant A2 follows the longer path. After A2 reached the food source, which path back to the nest has a higher probability of being selected by A2? Justify your answer.
6. *Discuss the importance of the forgetting factor in the pheromone trail depositing equation.
7. *Discuss the effects of the α and β parameters in the transition rule of equation (12.6).
8. Comment on the following strategy: Let the amount of pheromone deposited be a function of the best route. That is, the ant with the best route deposits more pheromone. Propose a pheromone update rule.
9. *For the ant clustering algorithm, explain why (a) the 2D-grid should have more sites than number of ants; (b) there should be more sites than data vectors.
10. Devise a dynamic forgetting factor for pheromone evaporation.

12-7 Advanced Topics
Continuous Ant Colony Optimization


ACO was originally proposed for solving discrete optimization problems, in which a solution is a selection from the members of a finite set. To extend the method to problems with continuous variables, several approaches have been proposed. One approach represents each continuous variable as an n-bit binary string. The problem graph then has 2n nodes, G = (V, E): for each bit there are 2 nodes, one for the value 0 and one for the value 1, in addition to a source node and a destination node. A path therefore makes n choices, selecting one of the 2 nodes at each step.

Algorithm 12.8 Continuous Ant Colony Optimization Algorithm

Create nr regions;
τi(0) = 1, i = 1, . . . , nr;
repeat
    Evaluate fitness, f(xi), of each region;
    Sort regions in descending order of fitness;
    Send 90% of ng global ants for crossover and mutation;
    Send 10% of ng global ants for trail diffusion;
    Update pheromone and age of ng weak regions;
    Send nl ants to probabilistically chosen good regions;
    for each local ant do
        if region with improved fitness is found then
            Move ant to better region;
            Update pheromone;
        end
        else
            Increase age of region;
            Choose new random direction;
        end
        Evaporate all pheromone;
    end
until stopping condition is true;
Return region xi with best fitness as solution;

Multi-Objective Optimization

Dynamic Environments



13. Artificial Immune Models

The immune system: this chapter first reviews the biology on which artificial immune models are based. The body has many defensive systems, among them the skin and the immune system. The immune system itself has two parts, the innate immune system and the adaptive immune system. The innate system recognizes around 1000 infections and does not require previous exposure.

• The adaptive immune system recognizes foreign cells and molecules, called antigens, which enter the body; it records their characteristics in memory, so that at the next encounter with the same antigen it can respond faster.

Many immunological theories, among them the classical view, clonal selection theory, network theory, and danger theory, have been proposed to explain this process.

13-1 Algorithms Based on the Classical View
13-1-1 The classical view model

In the classical view, the immune system involves 3 elements: (1) antigens, the attacking cells; (2) antibodies, the defending molecules; and (3) the B and T lymphocytes, which distinguish self cells from non-self (foreign) cells.

A. Antigens (pathogens)
Antigens are foreign bodies such as bacteria, parasites, fungi and viruses. Microorganisms can enter the body in different ways, e.g. through the skin, via the blood, or with transplanted organs, and spread through the cells, the blood and the lymphatic vessels.
• The small regions on the surface of an antigen that the immune system recognizes are called epitopes. Epitopes come in many types.

B. White blood cells (leukocytes)
White blood cells are produced in the bone marrow; they include the lymphocytes and the phagocytes.

The immune system comprises the lymphocytes and the associated lymphoid organs, where the cells are produced, accumulate or interact: the tonsils and adenoids, the thymus, the lymph nodes, the spleen, the appendix, Peyer's patches, and the lymphatic vessels.

Lymphocytes are small white blood cells of the two classes T and B, which perform the recognition of antigens and mount the response. For recognition, T-cells and B-cells carry surface receptors that bind to molecules on the surfaces of other cells.

Phagocytes are large white blood cells of a different kind that engulf foreign material; they include, among others, the macrophages. Macrophages are versatile cells and play an important role in stimulating T-cells. Figure 13.2 shows the white cell types.

Figure 13.2 White Cell Types

C .�"G� ��D� )i,#��( 1-��./ S�� �,�8� �=��+� &����E�)$5 � &��� �.�? . ��V� �V� S? rgV� �� �Bk#V; �V�gT &��� �.�? ����

�#����5 paratopes !��. 1-��# �� ��7#� z���� &� &��� �.�? ����$; ������ �� ��C �.�? ��.��#`+7 �.T). 2- ������ �.�?Y &���� �; ��./ <B2 ���� �� ��� � &)4�� . 3- �; �#����5variable regions ��; �� i$8 &$��� �� &��� �.�? K� 4� �# �� ����#�. 4-� d.�5� �� �; !�� �#����5�# �� SC �.�? S� H��8 $�] y,�� ) ��/k � . ��� �� S���/k ��affinity

�# �� ����#�. �� �� S� � �� ���.��� D�� <B



Figure 13.1 Antigen-Antibody Complex

D. The response mechanism
1. The paratope of a B-cell binds to an epitope of an antigen.
2. The B-cell ingests the antigen and breaks it into peptides.

Figure 13.4 B-Cell Develops into Plasma Cell, Producing Antibodies

3. The peptide fragments are bound to MHC II molecules and presented on the surface of the B-cell.
4. When a Helper-T cell binds to the presented MHC II complex, the B-cell proliferates. The greater the binding affinity between the T-cell receptor and the presented complex, the stronger the stimulation of the B-cell.
5. As a result, the stimulated B-cell multiplies and develops into plasma cells and memory cells.
• Plasma cells release large amounts of antibodies into the blood.
6. Memory cells increase the speed of antibody production at the next encounter with the same antigen; the secondary response is much faster, since clones of the corresponding helper T-cells and B-cells already exist.

The Natural-Killer-T-Cell (NKTC)
• Macrophages engulf antigens and break them into fragments; the fragments are carried to the cell surface by MHC I molecules.
• An NKTC binds to the presented MHC I complex, and the infected cell is then destroyed by the NKTC.

Figure 13.5 Macrophage and NKTC

E. Learning and forgetting in the immune system
1. Learning in the immune system is driven by the selection of those lymphocytes (B-cells) that are able to recognize an antigen.
2. In clonal selection, the lymphocytes that succeed in recognizing an antigen are cloned (reproduction is asexual, producing copies of the original cell).
3. The cloning triggered by antigen binding does not simply reproduce the matching antibody: mutations also occur, which can increase the binding affinity with the antigen. This process is called somatic hyper-mutation.
• The immune system thereby learns: as a result of encounters with antigens, receptors that match the encountered antigens more closely are retained, and the mutation of clones produces the receptor diversity needed for new antigens.
• Since the total number of lymphocytes in the NIS is roughly constant, an increase in the population of one clone reduces the populations of the other clones. This leads to the forgetting of patterns learned earlier.



• When an antigen that has been encountered before enters the body again, the largest corresponding clone responds immediately; the response is faster than at the first encounter. This is called the secondary immune response.
• Lymphocytes that no longer contribute to recognizing antigens gradually die off.
• A remarkable property of the immune system is that, although each individual's receptor repertoire is finite, through mutation of the lymphocyte receptors it can recognize essentially any type of foreign antigen.

13-1-2 Basic artificial immune system
In these algorithms three design decisions must be made, as shown in the figure: the representation (encoding) of the cells, the affinity (similarity) measure, and the cloning/selection process.

The basic AIS algorithm, given below, is built on these elements.

Algorithm 13.1 Basic AIS Algorithm


1. Initializing the population C: the initial population of ALCs (yi ∈ C) is either generated randomly or sampled from the data set. The size of the initial population depends on the method.
2. The antigen set DT: the set of antigen patterns, i.e. the training set.
3. Each antigen pattern, or a randomly chosen subset S of the antigens, is presented to all (or a subset) of the ALCs, and the affinity is determined.

The affinity d(zi, yj) can be measured with a distance function. In the basic AIS algorithm the r-contiguous matching rule is used.

Matching rule: a rule is needed to determine the affinity between an ALC and a (binary) antigen pattern.

In the r-contiguous method, two binary strings match if they are identical in at least r contiguous positions; the longest run of contiguous matching bits between the ALC string and the antigen string is compared against r. The figure below illustrates this rule: if a run of r contiguous matching bits exists, the strings match; otherwise there is no match, even if the strings agree in many non-contiguous positions.

Figure 13.1 r-Continuous Matching Rule Algorithm 13.2 Training ALCs with Negative Selection
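The matching rule and the negative-selection training loop of Algorithm 13.2 can be sketched as below; the function names and the random 0/1 detector generation are illustrative assumptions.

```python
import random

def r_contiguous_match(a, b, r):
    """r-contiguous matching rule: binary strings a and b match if they
    agree in at least r contiguous positions."""
    run = best = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best >= r

def train_detectors(self_set, n_detectors, length, r):
    """Negative-selection sketch: keep a randomly generated detector
    only if it matches no self pattern."""
    detectors = []
    while len(detectors) < n_detectors:
        d = [random.randint(0, 1) for _ in range(length)]
        if not any(r_contiguous_match(d, s, r) for s in self_set):
            detectors.append(d)
    return detectors
```

Detectors that survive this censoring match only non-self patterns, which is exactly the property the negative-selection model relies on.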



4. ALC selection: the ALCs with the highest affinity d are selected and form the set H; in this way the antigen is matched with the most suitable ALCs.
5. The members of H are adapted through negative selection, cloning, or other procedures. The procedure may be supervised or unsupervised, and is formulated differently in the different models.
6. Stimulation: as a result of this process, ALCs with small affinity d give up their place to the ALCs of the set H.
7. Stopping condition for the while-loop: In most of the discussed AIS models, the stopping

condition is based on convergence of the ALC population or a preset number of iterations.

13-1-3 Negative selection


This algorithm produces a set of na ALC detectors for distinguishing non-self patterns from self patterns: a generated detector is kept, and used as a detector, only if it does not match any self pattern.

13-2 Clonal Selection Theory Models
1. The ALCs with the highest affinity to the antigen are selected.
2. The selected ALCs are then cloned and mutated; the higher the affinity d of an ALC, the less it is mutated.
3. The cloned ALCs compete with the existing ALCs for survival into the next generation; the criterion of the competition is the affinity d.

Several algorithms are based on this scheme.

13-2-1 CLONALG
De Castro and Von Zuben [186, 190] proposed this algorithm; its steps are as follows.

Algorithm 13.3 CLONALG Algorithm for Pattern Recognition

t = tmax;
Determine the antigen patterns as training set DT;
Initialize a set of P randomly generated ALCs as population C;
Select a subset of M = |DT| memory ALCs, as population M ⊂ C;
Select a subset of P − M ALCs, as population Pr;
while t > 0 do
    for each antigen pattern zp ∈ DT do
        Calculate the affinity between zp and each of the ALCs in C;
        Select nh of the highest affinity ALCs with zp from C as subset H;
        Sort the ALCs of H in ascending order, according to the ALCs' affinity;
        Generate W as the set of clones for each ALC in H;
        Generate W' as the set of mutated clones for each ALC in W;
        Calculate the affinity between zp and each of the ALCs in W';
        Select the ALC with the highest affinity in W' as x̂;
        Insert x̂ in M at position p;
        Replace nl of the lowest affinity ALCs in Pr with randomly generated ALCs;
    end
    t = t − 1;
end

1. Training set DT: the antigen patterns form the training set DT. A binary encoding is used.
2. Initial population C: P = Pr + M, in two parts:
• M, the memory members (the best results found so far), whose number equals |DT|, and
• Pr, the remaining, randomly generated, ALCs.



3. The affinity d(zp, yi) is calculated for each zp and each ALC. The Hamming distance is used as the measure. p is the position of the antigen zp in the set DT.
4. Selection of the best: nh of the highest affinity ALCs are selected and form the set H.
5. The members of H are sorted by affinity.

Figure 3: Block diagram of the clonal selection algorithm

6. Each member of H produces a number of clones related to its affinity rank: the number of clones of the ALC at position i in H, forming the set W, is computed (and rounded) as

(13.1) 7. Mutating the clones in W produces the set W'. The amount of mutation applied to a clone of an ALC in W is inversely related to its affinity: an ALC with a large d receives fewer mutations.
8. Memory update: the affinity between zp and the members of W' is determined.
9. The ALC at position p of M is replaced by the corresponding best ALC of W', if the latter has a higher affinity.
10. nl members of Pr with the lowest affinity are replaced by randomly generated ALCs.
11. The learning process repeats, until the maximum number of generations, tmax, has been reached.
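Steps 6 and 7 can be sketched as below; the clone-count form round(beta * P / rank) for equation (13.1) and the exponential decay of the mutation rate are assumptions modeled on common CLONALG descriptions, not formulas quoted from the text.

```python
import math
import random

def n_clones(rank, pop_size, beta=1.0):
    """Clone count for the ALC at 1-based affinity rank `rank`
    (eq. (13.1), assuming the form round(beta * P / rank))."""
    return round(beta * pop_size / rank)

def hypermutate(alc, affinity, max_rate=0.3):
    """Step 7: flip each bit with a rate that decreases with affinity
    (affinity normalised to [0, 1]); the exp(-3*affinity) decay is an
    illustrative choice."""
    rate = max_rate * math.exp(-3.0 * affinity)
    return [bit ^ 1 if random.random() < rate else bit for bit in alc]
```

The highest-ranked ALC thus receives the most clones, while high-affinity clones are mutated the least, mirroring the inverse relation stated in step 7.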

Example: BINARY CHARACTER RECOGNITION
1- The task is to train the AIS to learn a set of eight binary character patterns.


2- Each character is a 12×10 pixel image (Figure 4(a)), converted into a binary string of length L = 120.
3- The ALC population consists of 10 ALCs.
4- The memory set of 8 ALCs is initialized randomly (Figure 4(b)), so it initially bears no resemblance to the characters.
5- Affinity is measured as the Hamming distance between an ALC and the antigen pattern; the initial similarity is essentially zero.
6- The algorithm is run for 250 generations.

Figure 4: (a) Patterns to be learned, or input patterns (antigens). (b) Initial memory set. (c) Memory set after 50 cell generations. (d) Memory set after 100 cell generations. (e) Memory set after 200 cell generations.

Multi-Layered AIS

Knight and Timmis proposed an algorithm based on 3 layers, inspired by the way the different layers of the immune system cooperate to respond to an antigen.
1- The training set of antigen patterns forms the input.
2- The ALC population is organized into 3 layers: a free-antibody layer (F), a B-Cell layer (B), and a memory layer (M).

Each layer has its own affinity threshold and death threshold: affinity thresholds (aF, aB, aM) and death thresholds (εF, εB, εM).
• Death threshold: a cell in a layer dies and is removed if it is not stimulated within the period given by the layer's death threshold.
• Affinity threshold: an antigen stimulates a cell in a layer only when the affinity (distance) is smaller than the layer's affinity threshold, in which case the cell is said to bind the antigen.


• The affinity, fa, is calculated with a distance measure appropriate to the data representation (e.g. Euclidean distance).

13-3 Network Theory Models

The network theory was first proposed by Jerne [416, 677].

• In this model, ALCs do not only respond to antigens: an ALC can also recognize, stimulate, and suppress other ALCs, so the cells form an interacting network rather than a set of isolated detectors. When an ALC recognizes an antigen (or another ALC) it becomes stimulated; a stimulated cell proliferates, producing mutated clones, some of which are retained as memory cells to respond to similar antigens later.
• ALCs that recognize each other are linked in the network.
• An ALC is stimulated when it recognizes an antigen or other ALCs in its neighborhood.
• An ALC is suppressed when it is recognized by many other ALCs.

13-3-1Artificial Immune NetworkThe network theory was first modeled by Timmis and Neal [846] resulting in the artificial immune network (AINE). AINE defines the new concept of artificial recognition balls (ARBs), which are bounded by a resource limited environment.

AINE has been applied successfully to unsupervised learning, data analysis and clustering. The initial population, A, is a set of ARBs, each of which covers a region of the antigen space.

• Each artificial recognition ball (ARB) represents a number of identical B-Cells (a cluster of similar antibodies).
• The individual B-Cells covered by an ARB are not stored explicitly; instead, each ARB maintains its own stimulation level.
• An ARB for which the number of allocated resources reaches n = 0 is removed.

Table 1: Mapping between the Immune System and AIRS

In summary, AINE consists of:

1- Initialization of the network affinity threshold, an: an is set equal to the average distance between the antigen patterns of the training set DT.
2- Creation of the initial ARB population, A: an initial population of ARBs is created from a subset of DT.
3- Affinity calculation: affinity is distance based, so a smaller distance means a higher affinity.
4- Two ARBs are linked in the network if the distance between them is below the network affinity threshold, an.
5- Calculation of the ARB stimulation levels: in each iteration, every ARB in A is presented with the training patterns (the members of DT) and its stimulation level is calculated. The total stimulation level of ARB i, larb,i, combines three quantities:
• the antigen stimulation, la,i: the stimulation that ARB i receives from the antigen patterns of DT, obtained by summing the affinities between the ARB and the antigens it binds;
• the network stimulation, ln,i: the stimulation that ARB i receives from the |Ln,i| other ARBs it is linked to in the network;
• and the network suppression, sn,i: the suppressive effect exerted on ARB i by the other linked ARBs.
The total stimulation level is then

larb,i = la,i + ln,i − sn,i    (13.3)

where Ln,i denotes the set of ARBs of the population A linked to ARB i.
6- Allocation of resources (B-Cells) to the ARBs: the number of B-Cells allocated to each ARB is calculated as

(13.7) where larb,i is the stimulation level and κ is some constant. Since the stimulation levels of the ARBs in A are normalised, some of the ARBs will have no resources allocated. After the resource allocation step, the weakest ARBs (i.e. ARBs having zero resources) are removed from the population of ARBs.


7- Cloning and mutation: Each of the remaining ARBs in the population, A, is then cloned and mutated if the calculated stimulation level of the ARB is above a certain threshold. These mutated clones are then integrated into the population by re-calculating the network links between the ARBs in A.

8- Stopping condition: Since ARBs compete for resources based on their stimulation level, an upper limit is set to the number of resources (i.e. B-Cells) available. The stopping condition can be based on whether the maximum size of A has been reached.

Algorithm 13.5 Artificial Immune Network (AINE)

while stopping condition not true do
    Allocate resources to the set of ARBs, A, using Algorithm 13.6;
    Clone and mutate remaining ARBs in A;
    Integrate mutated clones into A;
end

Algorithm 13.6 Resource Allocation in the Artificial Immune Network

Set the number of allocated resources, nr = 0;


Self Stabilizing AIS

The self stabilizing AIS (SSAIS) was developed by Neal [626] to simplify and improve AINE. The main difference between these two models is that the SSAIS does not have a shared/distributed pool with a fixed number of resources that ARBs must compete for. The resource level of an ARB is increased if the ARB has the highest stimulation for an incoming pattern. Each ARB calculates its resource level locally. After a data pattern has been presented to all of the ARBs, the resource level of the most stimulated ARB is increased by addition of the ARB's stimulation level.

13-3-2 aiNet

De Castro and Von Zuben [184, 187] proposed this algorithm, based on clonal selection. Algorithm 13.8 summarizes how it operates.


1- In this algorithm, the nh ALCs with the highest affinity to the antigen are selected. Affinity is calculated as a distance, d: the smaller the distance, the higher the affinity.

2- Cloning: as in clonal selection, the selected antibodies are cloned. For each selected antibody, yi, of the set H, nc clones are generated.

fa(zp, yi) = d(zp, yi)    (13.18)

where na = |B| is the number of antibodies.

3- Mutation: The nc clones of each antibody are mutated according to

(13.19) where yj is the antibody, zp is the antigen training pattern, and pm is the mutation rate. The mutation rate is inversely proportional to the calculated affinity, fa(zp, yj).

• The higher the calculated affinity, the less yj is mutated.
• Based on the affinity (dissimilarity) of the mutated clones with the antigen pattern zp, nh% of the highest affinity mutated clones are selected as the memory set, M. An antibody in M is removed if the affinity between the antibody and zp is higher than the threshold, amax.
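The inverse relation between affinity and mutation can be illustrated as follows. This is a sketch: the linear scaling of pm with dissimilarity is an assumption for illustration, not the exact form of equation (13.19).

```python
import random

def mutate_clone(y, z, pm_max=0.5):
    """Mutate a binary clone y: the mutation rate pm shrinks as the
    similarity (affinity) with the antigen pattern z grows."""
    matches = sum(a == b for a, b in zip(y, z))
    pm = pm_max * (1.0 - matches / len(y))   # perfect match -> pm = 0
    return [1 - bit if random.random() < pm else bit for bit in y]

z = [1, 1, 1, 1, 0, 0, 0, 0]
print(mutate_clone(list(z), z) == z)   # True: a perfect-affinity clone is unchanged
```

A clone identical to the antigen gets pm = 0 and is never mutated, while a maximally dissimilar clone is mutated at the full rate pm_max, which is exactly the behaviour the bullet points describe.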

4- The affinity (similarity) among antibodies in M is measured against εs for removal from M. M is then concatenated with the set of antibodies B. The affinity among antibodies in B is measured against εs for removal from B.

• A percentage, nl%, of antibodies with the lowest affinity to zp in B is replaced with randomly generated antibodies.

In general, the resulting network of ALCs captures the structure of the learned data: strongly connected ALCs form groups, and these groups correspond to clusters in the data. In aiNet, clusters are typically separated by analyzing the network, for example with a minimal spanning tree whose longest edges are cut.

5- Stopping conditions of the algorithm include:
• iteration counter: a maximum number of iterations is reached;
• maximum size of the network: the network is stopped from growing beyond a maximum number of ALCs;
• convergence: the algorithm stops when the average distance between the ALCs and the training set stops improving (reaches a minimum).

Some of the drawbacks of the aiNet model are:
• the large number of parameters that need to be specified, and
• the cost of computation, which increases as the number of variables of a training pattern increases.


        end
    end
    end
    Remove all antibodies marked for elimination from B;
    Replace nl% of the lowest affinity antibodies in B with randomly generated antibodies;
end

13-4 Danger Theory Models

The danger theory was first proposed by Matzinger [567, 568].

1- In this model the immune system distinguishes dangerous from non-dangerous foreign material: a response is triggered by danger signals rather than by foreignness alone. The immune system therefore does not react to every foreign entity.
2- A cell that dies unnaturally (under stress or damage) releases a danger signal, while cells that die naturally do not; only unnatural cell deaths raise the alarm.
3- This also explains the flexibility of the immune system with respect to changes in the body's own cells: altered self cells are tolerated as long as no danger signal is emitted.
4- When a cell dies in distress it releases a danger signal (signal 0) that activates the antigen presenting cells (APCs); an activated APC in turn stimulates helper T-Cells. This co-stimulation behaviour is as in the classical model.
5- If a lymphocyte is stimulated by a protein without the accompanying danger (co-stimulation) signal, no immune response follows.
6- The difference between this model and the classical model is thus the addition of a danger signal that separates dangerous from non-dangerous foreign cells.

Example: An Adaptive Mailbox

The goal of the algorithm is to separate interesting from uninteresting email. The algorithm has two phases: an initialization (training) phase and a running phase.

a. Training phase:
• When the user deletes an email, z, an antibody, ynew, is generated from it and added to the antibody set.
• ynew is then cloned and mutated to obtain antibodies that generalize the user's notion of uninteresting email.
• This continues until the antibody set reaches na members.

Algorithm 13.9 Initialization Phase for an Adaptive Mailbox

while |B| < na do
    if user action = delete email, z, then
        Generate an antibody, ynew, from z;
        Add ynew to the set of antibodies, B;
        for each antibody, yj ∈ B, do
            Clone and mutate antibody, yj, to maximize affinity with antibody, ynew;
            Add nh highest affinity clones to B;
        end
    end
end

7- Running phase: The running phase (see Algorithm 13.10) labels all incoming email, z, that are deleted by the user as uninteresting and buffers them as antigen in DT. When the buffered emails reach a specific size, nT, the buffer, DT, is presented to the antibody set, B.

The antibody set then adapts to the presented buffer of emails (antigens) through clonal selection. Thus, the antibody set, B, adapts to the changing interest of the user to represent the latest general set of antibodies (uninteresting emails).

Algorithm 13.10 Running Phase for an Adaptive Mailbox

while true do
    Wait for action from user;
    if user action = delete email, z, then
        Assign class uninteresting to z;
    end
    else
        Assign class interesting to z;
    end
    if the buffer DT has reached size nT then
        for each antibody, yj ∈ B, do
            δ = un − in;
            Clone and mutate yj in proportion to δ;
        end
        Remove nl of the antibodies with the lowest calculated δ from B;
        Remove all antigens from DT;
    end
    Calculate the degree of danger as δ;
    while δ > δmax do
        Present the unread emails, U, to the antibody set, B, for classification;
    end
end

The number of unread emails in the inbox determines the degree of the danger signal, δ. If the degree of the danger signal reaches a limit, δmax, the unread emails, U, are presented to the set of antibodies, B, for classification as uninteresting. An email, z′p ∈ U, is classified as uninteresting if the highest calculated affinity, ah, is higher than an affinity threshold, amax. The uninteresting classified email is then moved to a temporary folder or deleted. The degree of the danger signal needs to be calculated for each new email received.

Applications
• intrusion and anomaly detection [13, 30, 176, 279, 280, 327, 457, 458, 803, 804]
• virus detection [281], concept learning [689], data clustering [184], robotics [431, 892]
• initialization of feed-forward neural network weights [189]
• robotics and control: identification, adaptive control

Assignments
1- Explain the natural immune system theories on which the algorithms described in this chapter are based.
2- Explain clonal selection.
3- Explain the basic AIS algorithm using a flowchart.
4- Explain the aiNet algorithm using a flowchart.

5- Discuss how the principles of a NIS can be used to solve real-world problems where anomalies need to be detected, such as fraud.

6- *With reference to negative selection as described in Section 13.2.1, discuss the consequences of having small values for r. Also discuss the consequences for large values.

7- A drawback of negative selection is that the training set needs to have a good representation of self patterns. Why is this the case?


8- How can an AIS be used for classification problems where there are more than two classes?
9- *For the aiNet model in Algorithm 13.5, how does network suppression help to control the size of the ARB population?
10- *Why should an antibody be mutated less the higher the affinity of the antibody to an antigen training pattern, considering the aiNet model?

Appendix: further models

Dynamic Clonal Selection was proposed by Kim and Bentley [461].

Enhanced Artificial Immune Network was proposed by Nasraoui et al. [622]. Dynamic Weighted B-Cell AIS was presented in [623]

Adapted Artificial Immune Network [909]


14

Artificial Neural Networks

The biological brain: a person recognizes a familiar face within about 0.1 sec. This capability has made the brain an attractive model for problem solving.
The brain's neural network contains on the order of 10^10 neurons, each with a switching time of about 10^-3 secs. Further characteristics:
- Each neuron receives its inputs through dendrites and sends its output along an axon.
- A synapse connects a neuron's axon to the dendrites of other neurons.
- The behaviour of a single neuron is simple; complex behaviour emerges from the network of neurons.
- Each neuron has thousands of connections.
- Complex operations are completed in a fraction of a second.
- Processing is performed in parallel.
- Neurons die off frequently (and are never replaced).

14-1 Artificial Neural Networks

14-1-1 The Artificial Neuron

Figure 14.1 shows the operation of a single artificial neuron.


Figure 14.1 An Artificial Neuron

where:
1. x = (x1, x2, . . . , xI) is the input vector.
2. net is the net input, which may be formed either as a weighted sum of the inputs or as a weighted product:

net = Σ_{i=1}^{I} wi · xi        (summation)

net = Π_{i=1}^{I} xi^{wi}        (product)

Units of the first kind are called summation units (SU) and units of the second kind product units (PU). Product units allow higher-order combinations of the inputs and hence a greater information capacity. In both cases each input i has an associated weight, denoted vi (or wi).
3. Threshold θ: in addition to the weighted inputs, each neuron has a threshold level, also called the bias, θ.
4. Activation function f: the activation function computes the neuron's output from the net input; common choices are the linear, sigmoid, and hyperbolic tangent functions.

Boolean functions with a single artificial neuron

A single neuron (perceptron) can classify its input patterns correctly only if they are linearly separable, i.e. a hyperplane can separate the two classes.
• Figure 14.4 shows how a single perceptron implements the Boolean OR function, which is linearly separable. The perceptron has 2 inputs and one output, given by the truth table, and the weights v and threshold θ must be chosen such that the boundary condition

[Neuron diagram: inputs x1, …, xn with weights w1, …, wn feed a summation unit, net = Σ_{i=1}^{n} wi·xi; the output is o(x) = F(net − θ), equal to 1 if net − θ > 0 and −1 otherwise.]


v·z = θ    (14.15)

defines the decision boundary; here θ = 1 is used, with the output taken as 1 when net − θ ≥ 0. If at least one of the 2 inputs is 1, the pattern is classified as 1; if both inputs are 0, it is classified as 0.

Figure 14.4 Linear Separable Boolean Perceptrons
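The OR perceptron can be checked directly with a small sketch. The output convention (1 when net − θ ≥ 0, else 0) and the values v1 = v2 = 1, θ = 1 are assumptions consistent with the discussion above.

```python
def perceptron(x, v, theta):
    """Single neuron: weighted sum of the inputs followed by a threshold."""
    net = sum(vi * xi for vi, xi in zip(v, x))
    return 1 if net - theta >= 0 else 0

v, theta = [1, 1], 1
truth_table = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
for x, target in truth_table.items():
    assert perceptron(x, v, theta) == target   # OR is linearly separable
print("single perceptron realizes OR")
```

The same neuron cannot realize XOR: no choice of v and θ satisfies all four XOR rows, which is exactly the linear-separability limit discussed next.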

• The XOR function is not linearly separable, so a single-perceptron network cannot classify its inputs correctly. As Figure 14.5 shows, two decision boundaries are needed for this classification, which requires a network with a hidden layer.

• If a single perceptron is used, then the best accuracy that can be obtained is 75%.

Figure 14.5 XOR Decision Boundaries

Decision boundary example: one hidden-layer neuron is needed for each line segment of the decision boundary; the boundary of Figure 14.2.8 therefore needs one hidden neuron per segment. Note the misclassified shaded region near the top of the x direction: classifying that corner correctly would require 3 more neurons.

Figure 14.2.8 Feedforward Neural Network Classification Boundary Illustration.


Function approximation example:

Figure 14.2.9 Hidden Unit Functioning for Function Approximation

• To approximate a function with sigmoid hidden units, roughly one hidden unit per turning point (local maximum or minimum) of the target function, plus one, is needed.
• Keeping the number of hidden units near this minimum, while meeting the required accuracy, also keeps the training time low; with a different activation function a different number of hidden neurons may be needed.

14-2 Properties of Artificial Neural Networks
• Parallel processing.
• Robust and fault tolerant: losing part of the network degrades performance gradually instead of disabling the whole system, and the remaining parts can compensate for the damaged ones.
• Flexible: the network generalizes from examples to new tasks.
• Insensitive to noise.
• Able to model complex, nonlinear operations.
• A neural network can be trained for a task even when no expert can specify explicit rules for it.

Applications
• Process Modeling and Control - Creating a neural network model for a physical plant, then using that model to determine the best control settings for the plant.
• Machine Diagnostics - Detect when a machine has failed so that the system can automatically shut down the machine when this occurs.
• Target Recognition - Military application which uses video and/or infrared image data to determine if an enemy target is present.
• Sonar target recognition: Distinguish mines from rocks on the sea-bed. The neural network is provided with a large number of parameters which are extracted from the sonar signal. The training set consists of sets of signals from rocks and mines.


• Engine management: The behaviour of a car engine is influenced by a large number of parameters: temperature at various points, fuel/air mixture, lubricant viscosity. Major companies have used neural networks to dynamically tune an engine depending on current settings.

14-3 Neural Network Training
• Supervised learning: each training pattern provides an input vector together with its desired target output vector; if there are p training inputs, there are p corresponding target outputs. Supervised training algorithms adjust the network weights, w, so that the network reproduces the target outputs for the given inputs within some tolerance. Gradient descent-based optimization underlies the most common supervised training algorithm, backpropagation (BP).
• Unsupervised learning: no target outputs are provided; the objective of unsupervised learning is to discover patterns or features in the input data with no help from a teacher, as in the Hebbian rule.
• Reinforcement learning is a special case of supervised learning where the exact desired output is unknown. It is based only on the information of whether or not the actual output is correct.
• Evolutionary training: Evolution has been introduced into ANNs at roughly three different levels:

a. connection weights;
b. architectures: the evolution of architectures enables ANNs to adapt their topologies to different tasks without human intervention and thus provides an approach to automatic ANN design;

c. learning rules. The evolution of learning rules can be regarded as a process of “learning to learn” in ANN’s where the adaptation of learning rules is achieved through evolution.

Example: building a single-neuron perceptron network in MATLAB

Defining the problem: take two input vectors, [-2;-2] and [2;2], with target output 0 for the first and 1 for the second:

p=[-2 2;-2 2]; t=[0 1];

Creating the network: a perceptron matching these patterns is created with:

net = newp(p,t);

Training: the number of training epochs is set to 1. Since there are 2 input samples, each epoch consists of 2 training updates:

net.trainParam.epochs = 1;
net = train(net,p,t);

Training result: the weights and bias obtained from training are read with

net.iw{1,1}    % returns [2 2]
net.b{1}       % returns -1

Testing the network: to test the trained perceptron, apply the inputs to it:

a = sim(net,p)

The result, a = [0 1], matches the target outputs and shows that the network has been trained correctly.

Note: instead of writing a program, the same steps can also be carried out graphically by running the nntool GUI.

Linear single-layer networks: for this purpose the newlind command can be used. For linear classification with a perceptron-like linear layer, a network can also be built with the newlin command.

14-4 Supervised Neural Networks

A. Network architectures

Feedforward networks: a feedforward neural network (FFNN) with 3 inputs, one hidden layer of 3 neurons, and an output layer is shown in the figure below.

Figure 14.2.1 Feedforward Neural Network

Simple Recurrent Neural Networks: the Jordan network feeds the output layer back, and the Elman network feeds the hidden layer back, as additional (context-unit) inputs to the network.


Figure 14.2.3 Jordan and Elman Simple Recurrent Neural Network

Time-Delay Neural Networks: the inputs of this network are the delayed samples i(k), i(k-1), i(k-2), ….

Model Reference Control

The neural model reference control architecture uses two neural networks: a controller network and a plant model network, as shown in the following figure. The plant model is identified first, and then the controller is trained so that the plant output follows the reference model output.

B. Supervised training: back-propagation
• Suppose there are K training patterns, each consisting of an input vector zp with I components and a desired target output tp, and let W denote the matrix of network weights that training must adapt. Training then proceeds as follows:

1. Initialize the weights w to initial (small random) values.
2. Apply all the input patterns to the network and obtain the network outputs ok,p by running the network.
3. Calculate the network performance error from the mismatch between the target and actual outputs.


4. Adjust the weights, w, to reduce the error E: since dEp/dW = 0 at a minimum, the weights are moved against the gradient,

W = W − η·dEp/dW

The method of propagating the output error backwards through the layers to compute these updates is called back-propagation.
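For a single linear neuron the update rule above takes a particularly simple form. The following is a minimal sketch of the gradient step only, not the full multi-layer back-propagation; the learning rate and epoch count are illustrative choices.

```python
def train_linear_neuron(xs, ts, eta=0.1, epochs=100):
    """Fit o = w*x + b by gradient descent on E = 0.5*(t - o)^2."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, t in zip(xs, ts):
            err = t - (w * x + b)
            w += eta * err * x        # dE/dw = -err*x, so w -= eta*dE/dw
            b += eta * err            # dE/db = -err
    return w, b

xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ts = [2 * x + 1 for x in xs]          # target function t = 2x + 1
w, b = train_linear_neuron(xs, ts)
print(round(w, 2), round(b, 2))       # converges toward w = 2, b = 1
```

Each pattern presentation nudges the weights a small step down the error surface; stacking layers and applying the chain rule to propagate err backwards yields full BP.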

Batch and online training
• Stochastic/online learning, where weights are adjusted after each pattern presentation. In this case the next input pattern is selected randomly from the training set, to prevent any bias that may occur due to the order in which patterns occur in the training set.
• Batch/offline learning, where weight changes are accumulated and used to adjust weights only after all training patterns have been presented.

Function approximation example in MATLAB

Suppose we want to model the function x² + x + 1. The inputs and targets are built as

p=[-1:0.1:1]; t=polyval([1 1 1],p);

With the inputs arranged this way, each training sample occupies one column.

1. Batch training and parameter update: in this approach any ordering information in the data is ignored. The input data is divided into subsets for training, validation and testing. The train command updates the parameters once per epoch, where one epoch corresponds to one complete pass through all the training inputs.

net = newff(p,t,10,{},'trainbfg');
net=init(net);

Other training algorithms can be used instead, for example:
Conjugate gradient (traincgf, traincgp, traincgb, trainscg),
Quasi-Newton (trainbfg, trainoss), Levenberg-Marquardt (trainlm)

net.trainParam.epochs = 30;
net.trainParam.goal = 1e-5;
[net,tr,Y,E] = train(net,p,t);

Note that the train command does not use all of the input data for training: part of it is set aside for validation and testing and does not take part in the weight updates. Also note that train adjusts the parameters per epoch rather than per sample.

y = sim(net,p);
plot([y',t'])

If the number of hidden neurons is reduced from 10 to 3, training takes longer, but the behaviour of the network is less sensitive to noise in the data and the fit generalizes better. In this experiment a degree-2 function can in fact be followed with as few as 2 hidden neurons.


The same task can also be carried out easily in the nftool graphical window.

2. Incremental (sample-by-sample) training and parameter update

For sample-by-sample training the following can be used:

p=-1:0.1:1; t=polyval([1 1 1],p);

Note that the inputs must be passed as a sequence (i.e. placed in cell arrays):

p1={p}; t1={t};
net = newff(p1,t1,3,{},'trainbfg');
net.adaptParam.passes = 10;
[net,y,e] =adapt(net,p1,t1);
cell2mat([y';e'])
ys = sim(net,p);
subplot(211);plot([ys',t'])

In this program, net.adaptParam.passes=10 determines how many passes are made over the training data. Alternatively, sample-by-sample training can be written out explicitly:

p=-1:0.1:1; t=polyval([1 1 1],p); ee=0;
for i=1:1
  for j=1:21
    [net,y,e,pf] = adapt(net,p(j),t(j));
    ee(j+(i-1)*21)=e;
  end
end
y = sim(net,p);
subplot(211);plot([y',t']), subplot(212);plot(ee)

3. Incremental presentation with batch-style update

p=-1:0.1:1; t=polyval([1 1 1],p);
p1={p}; t1={t};
net = newff(p1,t1,3,{},'trainbfg');
net.trainParam.epochs = 30;
[net,y,e] =train(net,p1,t1);
cell2mat([y';e'])
ys = sim(net,p);
subplot(211);plot([ys',t'])

4. nftool

Curve fitting (including the experiments above) can also be done easily in the graphical nftool environment.


5. nprtool

For pattern recognition (classification), nprtool can be used. In these problems the target for each training pattern is a vector with a single 1 in the position of the class the sample belongs to. Example: the iris flower data set, with the samples of each class split between training and testing. The 'all confusion' plot shows the network performance; its entries cover the combined training, validation and test sets. The diagonal cells give the percentages classified correctly, and the off-diagonal cells the misclassification percentages.

6. nctool

This graphical tool is for clustering. In clustering no target output is given to the network; instead the network (a self-organizing map) places similar input vectors close to each other on the map. Example: the iris flower data clustered on a small SOM grid. The 'SOM sample hits' plot shows how many input vectors were captured by each neuron (cluster), and the 'SOM weight positions' plot shows where the neurons have positioned themselves among the input data.

14-4-1 Unsupervised Neural Networks

• In supervised training a desired target output exists for every input, and the network weights are adjusted until each input is mapped to its target.

• the objective of unsupervised learning is to discover patterns or features in the input data with no help from a teacher.

Associative memory NN
• This network consists of 2 layers and is trained so that presenting an input recalls its associated stored pattern, even when the input is noisy or incomplete. The weight matrix between the input and output layers holds all of the memorized associations.



Figure 14.4..1 Unsupervised Neural Network

This network is trained with the following rule:

Hebbian Learning Rule:

u_ki(t) = u_ki(t − 1) + Δu_ki(t)    (14.4.3)

Δu_ki(t) = −α·u_ki(t − 1) + η·o_k,p·z_i,p    (14.4.4)

where i indexes the inputs and k the outputs, u_ki is the weight of the link from input i to output k, p is the index of the stored (training) pattern, α is a forgetting factor, and η is the learning rate.

Because the plain Hebbian rule lets the weights grow without bound, the following normalized Hebbian learning rule is also used:

u_ki(t) = u_ki(t − 1) + η·o_k,p·(z_i,p − o_k,p·u_ki(t − 1))    (14.4.12)
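The difference between the plain and the normalized update can be seen on a single linear output unit. This is a sketch; the normalized step uses Oja's rule, an assumed concrete form of equation (14.4.12), and the forgetting term is omitted for simplicity.

```python
def hebb_step(u, z, eta=0.1):
    """Plain Hebbian step: weights grow without bound."""
    o = sum(ui * zi for ui, zi in zip(u, z))
    return [ui + eta * o * zi for ui, zi in zip(u, z)]

def oja_step(u, z, eta=0.1):
    """Normalized (Oja) Hebbian step: the -o^2*u decay keeps weights bounded."""
    o = sum(ui * zi for ui, zi in zip(u, z))
    return [ui + eta * o * (zi - o * ui) for ui, zi in zip(u, z)]

u_plain, u_norm, z = [0.3, 0.1], [0.3, 0.1], [1.0, 0.5]
for _ in range(200):
    u_plain, u_norm = hebb_step(u_plain, z), oja_step(u_norm, z)
norm = sum(ui * ui for ui in u_norm) ** 0.5
print(abs(u_plain[0]), round(norm, 3))   # plain weights explode; Oja norm stays near 1
```

Repeatedly presenting the same input makes the plain Hebbian weights diverge, while the normalized weight vector settles near unit length along the input direction.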

Self-Organizing Feature Maps

Kohonen developed the self-organizing feature map (SOM).

• The SOM maps I-dimensional input vectors onto K map units (classes), performing a compression of the input space.
• The output space is usually a two-dimensional grid.
• Similar inputs are mapped onto the same or neighbouring output units.


Figure 14.4.3 Self-organizing Map

Training is driven by the inputs alone (no target output or error signal), using the codebook update rule

w_kj(t + 1) = w_kj(t) + h_mn,kj(t)·[zp − w_kj(t)]    (14.4.31)

where mn is the row and column index of the winning neuron. The function h_mn,kj(t) in equation (14.4.31) is referred to as the neighborhood function.
• The winning neuron is found by computing the Euclidean distance from each codebook vector to the input vector, and selecting the neuron closest to the input vector. That is,

‖zp − w_mn‖ = min_{kj} { ‖zp − w_kj‖ }    (14.4.32)

• Thus, only those neurons within the neighborhood of the winning neuron mn have their codebook vectors updated. For convergence, it is necessary that h_mn,kj(t) → 0 when t → ∞.
• The neighborhood function is usually a function of the distance between the coordinates of the neurons as represented on the map, i.e.

h_mn,kj(t) = exp( −‖c_mn − c_kj‖² / (2σ²(t)) )    (14.4.33)

with the coordinates c_mn, c_kj ∈ R². With increasing ‖c_mn − c_kj‖ (that is, as neuron kj lies further away from the winning neuron mn), h_mn,kj → 0. The neighborhood can be defined as a square or hexagon.
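One update step of the map (winner selection per (14.4.32), then the neighborhood-weighted move of (14.4.31)) can be sketched on a toy one-dimensional map. The Gaussian width σ, the learning rate η and the tiny data set are illustrative assumptions.

```python
import math, random

def som_step(codebook, coords, z, eta=0.5, sigma=1.0):
    """Find the winning neuron, then pull every codebook vector toward z,
    weighted by the Gaussian neighborhood of the winner."""
    dists = [sum((zi - wi) ** 2 for zi, wi in zip(z, w)) for w in codebook]
    mn = dists.index(min(dists))                       # winning neuron (14.4.32)
    for kj in range(len(codebook)):
        d2 = sum((a - b) ** 2 for a, b in zip(coords[mn], coords[kj]))
        h = math.exp(-d2 / (2 * sigma ** 2))           # neighborhood (14.4.33)
        codebook[kj] = [wi + eta * h * (zi - wi)       # update (14.4.31)
                        for zi, wi in zip(z, codebook[kj])]

random.seed(2)
coords = [(float(i),) for i in range(4)]               # 4 neurons on a 1-D grid
codebook = [[random.random()] for _ in range(4)]
data = [[0.0], [0.1], [0.9], [1.0]]
for _ in range(200):
    som_step(codebook, coords, random.choice(data))
print([round(w[0], 2) for w in codebook])
```

After training, inputs near 0 and inputs near 1 win at different neurons, illustrating how the map spreads over the data while neighbouring neurons stay similar.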


14-4-2 Radial Basis Function Networks

Some networks are trained in a supervised manner and others in an unsupervised manner. The RBF network is a hybrid of the two: it uses supervised and unsupervised learning simultaneously to improve performance. Figure 14.3.1 shows an RBF network, which is similar to an FFNN. The differences between the two are the following:

Figure 14.3.1 Radial Basis Function Neural Network

1. Net input of a hidden unit: the net input of hidden unit j is the distance between the input vector and the center, μj, of its basis function:

η_{j,p} = ||zp − μj||₂    (14.3.2)

where μj represents the center of the basis function, and ||·||₂ is the Euclidean norm.
• Weights from the input units to a hidden unit, referred to as μij, represent the center of the radial basis function of hidden unit j.
• Radial basis function: hidden units do not implement an activation function, but represent a radial basis function. These functions, also referred to as kernel functions, are strictly positive, radially symmetric functions. A radial basis function (RBF) has a unique maximum at its center, μj, and the function usually drops off to zero rapidly further away from the center. The output of a hidden unit indicates the closeness of the input vector, zp, to the center of the basis function.
• In addition to the center of the function, some RBFs are characterized by a width, σj, which specifies the width of the receptive field of the RBF in the input space for hidden unit j. A number of RBFs have been proposed [123, 130]:
• Linear function

f(η_{j,p}) = η_{j,p}    (14.3.4)

• Gaussian function


f(η_{j,p}) = exp( −η²_{j,p} / (2σj²) )    (14.3.9)

Network output

The output of an RBFNN is calculated as

o_p = Σ_j wj·f(η_{j,p})    (14.3.3)
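A forward pass through a small Gaussian RBF network can be sketched as follows. The centers, widths and output weights are illustrative assumptions; eq. (14.3.2) gives the hidden net input and (14.3.9) the Gaussian basis response.

```python
import math

def rbf_forward(z, centers, sigmas, weights):
    """RBFNN output: distance to each center -> Gaussian basis response
    -> weighted sum at the (linear) output unit."""
    out = 0.0
    for mu, sigma, w in zip(centers, sigmas, weights):
        eta = math.sqrt(sum((zi - mi) ** 2 for zi, mi in zip(z, mu)))  # (14.3.2)
        out += w * math.exp(-eta ** 2 / (2 * sigma ** 2))              # (14.3.9)
    return out

centers = [[0.0, 0.0], [1.0, 1.0]]
sigmas = [0.5, 0.5]
weights = [1.0, -1.0]
print(rbf_forward([0.0, 0.0], centers, sigmas, weights))  # close to +1
print(rbf_forward([1.0, 1.0], centers, sigmas, weights))  # close to -1
```

Each hidden unit responds strongly only near its own center, so the output weights simply blend localized bumps; this locality is what the unsupervised step (placing the centers) exploits.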

14-4-3 Assignments

2. Explain the structure of an RBF network with the aid of a figure, and describe how it is trained.

3. Investigate alternative methods to initialize an RBF NN. 4. Develop a PSO, DE, and EP algorithm to train an RBFNN.

14-5 Evolutionary Training of Neural Networks

Evolution can be used to improve the performance of a neural network at 3 levels:

1. connection weights;
2. architectures;
3. learning rules.

14-5-1 Evolving the Connection Weights

• Evolutionary training is not restricted to a particular network type: the same method can train different network architectures.
• Evolutionary training may be slower than BP, but the algorithm does not get trapped in local minima and is less sensitive to the initial conditions.
• BP training requires a learning rate parameter: too large a value makes the algorithm oscillate, too small a value makes it slow.
• Many results show that a GA performs better than BP, although opposing results have also been reported.
• In one test, BP training took about 3 hours while GA training took about 3 hours and 20 minutes.
• Hybrid algorithms (a combination of evolution and BP) therefore tend to show better performance.
• The disagreement in reported results suggests that the outcome is problem dependent: the best training method differs from problem to problem.


Encoding the network weights: the figure below shows how the weights of the network are strung together into a chromosome; in this approach the network structure is kept fixed.

EP-NN and GA-NN

• A binary encoding of the real-valued weights, w, is not well suited: increasing the precision improves accuracy but also increases the size of the chromosome.

• Crossover difficulty in GA-NN: a known problem is that two networks with differently arranged weights can compute the same output, so recombining two good parents does not necessarily produce a good offspring. Example: two networks may produce the same output even though their parameters are ordered differently; in the figure below, swapping the two hidden units (top and bottom) changes the representation but not the function computed.

Using ES or EP instead of GA

To avoid the problem shown in the figure, the weights are kept as real-valued vectors.


Because of this, the GA crossover operator is of limited use here. For evolutionary training, ES, which operates directly on real-valued encodings, is recommended; reported results support this choice.

• EP-based methods, which unlike the GA do not use crossover, avoid the permutation problem altogether; a well-known example is EPNet.

• It starts with an initial population of M random networks, partially trains each network for some epochs, selects the network with the best-rank performance as the parent network and if it is improved beyond some threshold, further training is performed to obtain an offspring network, which replaces its parent and the process continues until a desired performance criterion is achieved.

• However, EPNet might take a long time to find a solution to a large parity problem. Some of the runs did not finish within the user-specified maximum number of generations.

� PSO-Based NN Training (Particle Swarm Optimization)

Particle swarm optimization (PSO) can be used to train a NN. In this case:
• each particle represents a weight vector, and
• fitness is evaluated using the MSE function.
• What should be noted is that weights and biases are adjusted without using any error signals or any gradient information.
• Weights are also not adjusted per training pattern.
• The PSO velocity and position update equations are used to adjust weights and biases, after which the training set is used to calculate the fitness of a particle (or NN) in PT feedforward passes.
• The basic PSO and cooperative PSO have been shown to outperform optimizers such as gradient-descent, scaled conjugate gradient and genetic algorithms.
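As a minimal sketch of the scheme above (each particle is a flat weight vector, fitness is the MSE over the training set, and no gradient information is used), the following trains a tiny feedforward net. The 2-3-1 topology, the XOR data and the PSO constants are illustrative assumptions, not the book's setup:

```python
import math
import random

random.seed(0)

# Toy dataset (an assumption): XOR with 2 inputs and 1 output.
DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

H = 3                    # number of hidden units (assumption)
DIM = 2 * H + H + H + 1  # in->hidden weights, hidden biases, hidden->out weights, out bias

def sig(a):
    # Clamped sigmoid so wandering particles cannot overflow math.exp.
    if a > 60:
        return 1.0
    if a < -60:
        return 0.0
    return 1.0 / (1.0 + math.exp(-a))

def forward(w, x):
    """One feedforward pass through the 2-H-1 network encoded in w."""
    hidden = []
    for j in range(H):
        s = w[2 * j] * x[0] + w[2 * j + 1] * x[1] + w[2 * H + j]
        hidden.append(sig(s))
    out = sum(w[3 * H + j] * hidden[j] for j in range(H)) + w[-1]
    return sig(out)

def mse(w):
    """Fitness: mean squared error over the training set."""
    return sum((forward(w, x) - t) ** 2 for x, t in DATA) / len(DATA)

def pso_train(n_particles=20, iters=300, inertia=0.7, c1=1.4, c2=1.4):
    pos = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(n_particles)]
    vel = [[0.0] * DIM for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [mse(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(DIM):
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * random.random() * (pbest[i][d] - pos[i][d])
                             + c2 * random.random() * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = mse(pos[i])  # fitness requires no gradient information
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f

weights, err = pso_train()
```

Note that the only feedback the swarm receives is the scalar MSE, computed in a feedforward pass per pattern, exactly as the bullets above describe.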

� PSO and Unsupervised Learning
• Xiao et al. [921] used the gbest PSO to evolve weights for a self organizing map (SOM)

[476] to perform gene clustering. • Messerschmidt and Engelbrecht [580], and Franken and Engelbrecht [283, 284, 285] used the gbest, lbest and Von Neumann PSO algorithms as well as the GCPSO to coevolve neural networks to approximate the evaluation function of leaf nodes in game trees. No target values were available; therefore NNs compete in game tournaments against groups of opponents in order to determine a score or fitness for each NN. During the coevolutionary training process, weights are adjusted using PSO algorithms to have NNs (particles) move towards the best game player.

� Hybrid Training

Most EAs are rather inefficient at fine-tuned local search, although they are good at global search. This is especially true for GAs. The efficiency of evolutionary training can be improved significantly by incorporating a local search procedure into the evolution, i.e., combining the EA's global search ability with local search's ability to fine-tune.


14-17


� The PSO–BP algorithm for feedforward neural network training

Consider the network shown below.

The learning error E can be calculated by the following formulation:

E = (1/q) Σ_{k=1..q} Σ_i (e_i^k)²

where q is the total number of training samples and e_i^k is the error between the actual output and the desired output of the i-th output unit when the k-th training sample is used for training.

� Fitness Function

The fitness of the i-th particle is defined as the learning error E computed with that particle's weights, for i = 1, . . . , M, where M is the total number of particles.

The performance of BP, PSO-BP, and Adaptive PSO-BP has been compared experimentally on networks with different numbers of hidden neurons; the results are summarized below.
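The error definition above can be sketched as follows; the sample outputs and targets are hypothetical:

```python
def learning_error(actual, desired):
    """E = (1/q) * sum over samples k and output units i of (e_i^k)^2,
    where e_i^k = actual[k][i] - desired[k][i] and q = number of samples."""
    q = len(actual)
    return sum(
        (a - d) ** 2
        for a_row, d_row in zip(actual, desired)
        for a, d in zip(a_row, d_row)
    ) / q

# Hypothetical network outputs vs. targets for q = 2 samples, 2 output units each.
actual = [[0.9, 0.1], [0.2, 0.8]]
desired = [[1.0, 0.0], [0.0, 1.0]]
E = learning_error(actual, desired)
```

In a PSO-BP run this E, evaluated per particle, would serve directly as the fitness value.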

14-18


14-5-2 Evolving Architectures

• Architecture design is still largely a human expert's trial-and-error task; there is no systematic procedure for choosing a good architecture for a given problem.
• The size of the network matters: too small a network cannot learn the problem, while too large a network overfits and generalizes poorly.
• Architecture design can be formulated as a search in the space of architectures, for example seeking the lowest training error and/or the lowest network complexity; an evolutionary algorithm can perform this search.
• Encoding the architecture is the key design decision. In the direct encoding approach, every connection and node is specified explicitly, e.g. as a connectivity matrix whose rows are concatenated into the chromosome.
• In the indirect encoding approach, only important parameters (such as the number of hidden layers and the number of units per layer) are encoded, and the details are left to the training process; this yields much shorter chromosomes.
• With direct encoding, the EA may evolve only the architecture, while the weights are adjusted by a separate learning algorithm, as in the scheme below.


14-19


For example, if the first row of the connectivity matrix is 001100, then neuron 1 is connected to neurons 3 and 4 and not to neurons 1, 2, 5 and 6. The complete chromosome is obtained by concatenating the rows of the matrix.

• Instead of a binary connectivity matrix, the chromosome may encode the connection weights directly, in which case the matrix holds real values (a zero meaning no connection), for example:

0    0    0    0.5   0.1   0
0    0    0    0.8   0     0.4
0    0    0    0    -0.9   0.2
0    0    0    0     0     0
0    0    0    0     0     0

• The chromosome may also take a generalized parametric form, as described below.

• ANN architectures may be specified by a set of parameters such as the number of hidden layers, the number of hidden nodes in each layer, the number of connections between two layers, etc.

• Alternatively, the architecture may be evolved with a parametric (indirect) encoding in which each gene specifies a design parameter of the network.
• Evolving the architecture alone, with randomly initialized weights during fitness evaluation, makes the fitness noisy; evolving architecture and weights simultaneously avoids this, but the chromosome then becomes very large.

• Zhang and Shao [949, 950] proposed a PSO model to simultaneously optimize NN weights and architecture. Two swarms are maintained: one swarm optimizes the architecture, and the other optimizes the weights.
• Particles in the architecture swarm are two-dimensional, with each particle representing the number of hidden units used and the connection density. The first step of the algorithm randomly initializes these architecture particles within predefined ranges.
• Each of these swarms is evolved using a PSO, where the fitness function is the MSE computed from the training set. After convergence of each NN weights swarm, the best weight vector is identified from each swarm (note that the selected weight vectors are of different architectures). The fitness of these NNs is then evaluated using a validation set containing patterns not used for training. The obtained fitness values are used to quantify the performance of the different architecture specifications given by the corresponding particles of the architecture swarm. Using these fitness values, the architecture swarm is further optimized using PSO.

14-20


14-5-3 Evolving Learning Rules (evolution of learning rules)

• The evolution of learning rules can be regarded as a process of "learning to learn".
• Adapting BP parameters such as the learning rate and momentum is an early example: their best settings depend on the architecture and are normally found by trial and error.

• An ANN training algorithm may have different performance when applied to different architectures. Different variants of the Hebbian learning rule have been proposed to deal with different architectures. • However, designing an optimal learning rule becomes very difficult when there is little prior knowledge about the ANN’s architecture, which is often the case in practice.

• Finding a procedure that designs learning rules automatically is therefore very attractive. Hand-crafted rules rest on expert intuition and simplifying assumptions that may not hold in practice.
• A central open issue is how a learning rule should be encoded so that it can be evolved.
• The relationship between evolution and learning is extremely complex. Research into the evolution of learning rules is still in its early stages.

14-6 Reinforcement Learning

• RL is inspired by animal learning: an action followed by a good outcome (a reward) is reinforced, while an action followed by a bad outcome (a penalty) is weakened.
• The agent is not told which action to take; it discovers, through the reward signal, which actions yield the most benefit.
• Figure 6-1 shows the general RL feedback loop.

� Neural Networks and Reinforcement Learning

• Neural networks and RL have been combined in various ways, among them using reinforcement signals to train NN weights. One such method is RPROP.

� RPROP

In this algorithm the classical gradient-based search is modified: only the sign of the partial derivative is used, not its magnitude, and each weight has its own adaptive step size.


14-21


• If the partial derivative ∂E/∂v_ji changes sign, the previous step size was too large and a minimum was jumped over; the step size is therefore decreased by the factor η−:

(14.5.13)

• If the derivative keeps its sign, the step size is increased by the factor η+ to accelerate convergence in shallow regions:

(14.5.11)

• The weight itself is then moved against the sign of the derivative: a positive derivative causes a negative weight change, and vice versa.

It is suggested that η− = 0.5 and η+ = 1.2.
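The sign-based update described above can be sketched as follows, using the suggested η− = 0.5 and η+ = 1.2; the one-dimensional quadratic error and the step bounds are illustrative stand-ins for a real network error surface:

```python
def rprop_step(grad, prev_grad, step, eta_minus=0.5, eta_plus=1.2,
               step_min=1e-6, step_max=50.0):
    """One RPROP update: adapt the step size from the sign of the gradient
    and move against the gradient's sign (its magnitude is ignored)."""
    if grad * prev_grad > 0:       # same sign: accelerate
        step = min(step * eta_plus, step_max)
    elif grad * prev_grad < 0:     # sign change: we overshot, slow down
        step = max(step * eta_minus, step_min)
    delta = -step if grad > 0 else (step if grad < 0 else 0.0)
    return delta, step

# Minimize E(w) = (w - 3)^2 as a stand-in for a network error surface.
w, step, prev_grad = -4.0, 0.1, 0.0
for _ in range(100):
    grad = 2 * (w - 3)
    delta, step = rprop_step(grad, prev_grad, step)
    w += delta
    prev_grad = grad
```

The step size grows geometrically while the sign is stable and halves on every sign flip, which is what makes RPROP insensitive to the gradient's magnitude.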

14-6-1 Ensembles of Networks (ensemble of networks)

• The idea is that replacing one large network by an ensemble of several smaller networks, trained separately and then combined, often yields better results.
• A collection of NNs can achieve better generalization than any single member.
• Such ensembles (committees) of networks combine the answers of their members to produce a more reliable joint response.

Figure 14.2.10 Ensemble Neural Network

� Assignments

1. Why is the threshold θ necessary in a perceptron? What problem arises if it is absent?

14-22


2. Explain in words the operation of a single neuron with two inputs.
3. Which of the following functions can be implemented using a single neuron?
4. where z1z2 denotes (z1 AND z2); z1 + z2 denotes (z1 OR z2); z̄1 denotes (NOT z1).
5. *Can errors be calculated as |tp − op| instead of (tp − op)² if gradient descent is used to adjust weights?
6. *Is the following statement true or false: A single neuron can be used to approximate the function f(z) = z²? Justify your answer.
7. Explain what training methods exist for NNs.
8. What are the applications of an MLP network?
9. What are the applications of a SOM network?

10. Explain why it is necessary to retrain a supervised NN on all the training data, including any new data that becomes available at a later stage. Why is this not such an issue with unsupervised NNs?

11. For a SOM, if the training set contains PT patterns, what is the upper bound on the number of neurons necessary to fit the data? Justify your answer.

12. Explain how a SOM can be used as a classifier. 13. Explain how it is possible for the SOM to train on data with missing values.

14. Explain what is meant by evolving neural networks.
15. *How can PSO be used for unsupervised learning?
16. *Discuss how reinforcement learning can be used to guide a robot out of a room filled

with obstacles. 17. For the RPROP algorithm, what will be the consequence if

(a) Δmax is too small? (b) η+ is very large? (c) η− is very small?

18.


15-1

15. Fuzzy Systems

This chapter draws on Chapters 20, 21 and 22 of the reference. Example: a fuzzy system offers a remarkably simple way to control systems that are hard to model precisely. For fuzzy control of an inverted pendulum, the pendulum angle θ and the angular velocity ω are measured.

The measured variables are described by the following fuzzy linguistic values:

Negative Large (NL), Negative Medium (NM), Negative Small (NS), Zero (ZR), Positive Small (PS), Positive Medium (PM), Positive Large (PL)

Fuzzy control rules: after the crisp measurements are converted into these linguistic qualities, the fuzzy controller (in place of a classical controller) determines the force F applied to the cart from linguistic rules over the variables θ and ω, such as:

Rule 1. IF θ is PM AND ω is ZR THEN F is PM.

15-2

Rule 2. IF θ is PS AND ω is PS THEN F is PS.
Rule 3. IF θ is PS AND ω is NS THEN F is ZR.
Rule 4. IF θ is NM AND ω is ZR THEN F is NM.
Rule 5. IF θ is NS AND ω is NS THEN F is NS.
Rule 6. IF θ is NS AND ω is PS THEN F is ZR.
Rule 7. IF θ is ZR AND ω is ZR THEN F is ZR.

This example shows that such a system can be controlled by linguistic rules; the essential ingredients are the fuzzification of the measurements, a suitable rule base, and inference over the rules.

� Fuzzy Sets

In classical two-valued logic, a statement is either false or true; an element either belongs to a set or it does not.

It is not obvious how such all-or-nothing logic can describe the real world: most real-world problems involve incomplete, imprecise, vague or uncertain information, as in the phrase "it is partly cloudy".

Fuzzy logic was introduced by Professor Lotfi Zadeh of UC Berkeley in 1965.

� Components of a fuzzy system: fuzzy sets, membership functions, fuzzy operators, and fuzzy rules.


15-3

15-1 Membership Functions

Crisp membership function: Consider the set of tall people, and suppose the threshold for being tall is 1.5 m. Under two-valued logic, people shorter than 1.5 m are called short and the rest tall, so the membership function has the step shape shown below.

For x ≤ 1.5 m the membership degree is μtall(x) = 0, and for x > 1.5 m it is μtall(x) = 1. A person of height 1.49 m is therefore short and one of 1.51 m is tall, even though their heights are nearly the same.

Fuzzy membership function: assigns to each element a degree of membership between 0 and 1, rather than a hard yes/no.

Figure 20.2 Illustration of tall Membership Function

Under the crisp function, a person of 1.51 m belongs fully (degree 1) to the set of tall people while a person of 1.50 m does not belong at all, which is counter-intuitive. A fuzzy membership degree instead maps each element into [0, 1]:

μA : X → [0, 1] (20.1)

Therefore, for all x ∈ X, μA(x) indicates the certainty to which element x belongs to fuzzy set A.

When μA(x) takes only the values 0 and 1, fuzzy membership reduces to ordinary binary membership. The membership function is the collection of the membership degrees of the elements.
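The contrast between the crisp 1.5 m threshold and a fuzzy degree of tallness can be sketched as below; the 1.4 to 1.6 m transition band of the fuzzy ramp is an illustrative assumption:

```python
def crisp_tall(height_m):
    """Two-valued membership: mu = 1 above the 1.5 m threshold, else 0."""
    return 1.0 if height_m > 1.5 else 0.0

def fuzzy_tall(height_m, low=1.4, high=1.6):
    """Ramp membership rising linearly from 0 at `low` to 1 at `high`
    (the transition band is an illustrative assumption)."""
    if height_m <= low:
        return 0.0
    if height_m >= high:
        return 1.0
    return (height_m - low) / (high - low)
```

The crisp function jumps from 0 to 1 between 1.49 m and 1.51 m, while the fuzzy version assigns those two heights nearly equal degrees, which matches intuition.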

� Database (DB)

This component contains the specifications of the membership functions.

� Properties of membership functions

1- The membership degree lies between 0 and 1.
2- For each x ∈ X, the value μA(x) is unique.
3- Common types of membership functions:

15-4

• Triangular, Trapezoidal, Logistic, Exponential-like , Gaussian

4- Height: the maximum value of μA(x).
5- Normality: set A is normal if its height is 1, i.e. μA(x) = 1 for some x.
6- Normalization: a fuzzy set can be normalized by dividing its membership function by its height. (20.21)
7- Support: the part of the set with nonzero membership degree:
support(A) = {x ∈ X | μA(x) > 0}
8- Core: the part of the set whose membership degree is 1:
core(A) = {x ∈ X | μA(x) = 1}
9- α-cut: the set of elements whose membership degree is at least α.
10- Unimodality: the membership function has only one maximum.
11- Cardinality: for a crisp set, the cardinality is the number of elements; for a fuzzy set it is defined for the discrete and continuous cases (for a continuous set, the area under the membership function).

For the discrete membership function on X = {a, b, c, d} with A = 0.3/a + 0.9/b + 0.1/c + 0.7/d (where 0.3/a means μA(a) = 0.3), the cardinality is
card(A) = 0.3 + 0.9 + 0.1 + 0.7 = 2.0.

Fuzzy sets, like crisp sets, satisfy the commutative, associative, distributive, transitive and idempotency laws, but cardinality behaves differently:
• card(A) + card(B) = card(A ∪ B) + card(A ∩ B)
• card(A) + card(Ā) = card(X)
where A and B are fuzzy sets, and X is the universe of discourse.
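The set-theoretic quantities above can be sketched for discrete fuzzy sets represented as dictionaries; A is the set from the text, while B is a made-up second set used to check the cardinality identity:

```python
# Discrete fuzzy sets as dicts from element to membership degree.
A = {"a": 0.3, "b": 0.9, "c": 0.1, "d": 0.7}   # the set from the text
B = {"a": 0.6, "b": 0.2, "c": 0.0, "d": 1.0}   # a second, made-up set

def card(s):
    """Cardinality of a discrete fuzzy set: sum of the membership degrees."""
    return sum(s.values())

def support(s):
    return {x for x, mu in s.items() if mu > 0}

def core(s):
    return {x for x, mu in s.items() if mu == 1}

def f_union(s, t):       # max s-norm
    return {x: max(s[x], t[x]) for x in s}

def f_intersect(s, t):   # min t-norm
    return {x: min(s[x], t[x]) for x in s}
```

Because max(a, b) + min(a, b) = a + b pointwise, the identity card(A) + card(B) = card(A ∪ B) + card(A ∩ B) holds exactly under these operators.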


15-5

15-2 Fuzzification

Fuzzification maps a crisp quantity to degrees of membership in the linguistic (fuzzy) variables. For example, based on the figure below, an age of 20 receives the degree μold = 0.4 for the linguistic variable old.

� Linguistic Variables and Hedges: Data Base

Lotfi Zadeh [946] introduced the concept of linguistic variable (or fuzzy variable) in 1973, which allows computation with words instead of numbers. Referring to the set of tall people, tall is a linguistic variable. Sensory inputs are linguistic variables, or nouns in a natural language, for example, temperature, pressure, displacement, etc.

Linguistic variables can be qualified by hedges of the following kinds:
• Quantification variables, e.g. all, most, many, none, etc.
• Usuality variables, e.g. sometimes, frequently, always, seldom, etc.
• Likelihood variables, e.g. possible, likely, certain, etc.

For example, the linguistic variable tall can be modified to very tall, not very tall, and so on.
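Hedges are often realized as simple operators on membership degrees; a common convention, following Zadeh, models "very" as concentration (squaring) and "more or less" as dilation (square root). A sketch under that convention:

```python
import math

def very(mu):
    """Concentration hedge: 'very A' squares the membership degree."""
    return mu ** 2

def more_or_less(mu):
    """Dilation hedge: 'more or less A' takes the square root."""
    return math.sqrt(mu)

def not_(mu):
    """Negation: complement of the membership degree."""
    return 1.0 - mu

mu_tall = 0.8
mu_very_tall = very(mu_tall)            # 0.64
mu_not_very_tall = not_(very(mu_tall))  # 0.36
```

Squaring pushes intermediate degrees toward 0 (so "very tall" is harder to satisfy), while the square root pulls them toward 1.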

15-3 Fuzzy Logic and Reasoning

� Inferencing: Fuzzy Rules

Fuzzy inference is performed with rules of the form

if A is a and B is b then C is c (21.6)

where A and B are fuzzy sets with universe of discourse X1, and C is a fuzzy set with universe of discourse X2.

For example, consider the rule

if Age is Old then Speed is Slow (21.7)

For the degree μold = 0.4, the consequent's membership function is clipped at 0.4, as illustrated in the figure.

15-6

Figure 21.2 Interpreting a Fuzzy Rule

� Rule Base (RB)

This component contains the fuzzy rules used by a fuzzy system.

� Fuzzy Operators

Equality of fuzzy sets: Two fuzzy sets A and B are equal if and only if the sets have the same domain, and μA(x) = μB(x) for all x ∈ X. That is, A = B.

Containment of fuzzy sets: Fuzzy set A is a subset of fuzzy set B if and only if μA(x) ≤ μB(x) for all x ∈ X. That is, A ⊆ B. Figure 20.4 shows two membership functions for which A ⊆ B.

Figure 20.4 Illustration of Fuzzy Set Containment

Complement of a fuzzy set (NOT): Let Ā denote the complement of set A. Then, for all x ∈ X, μĀ(x) = 1 − μA(x).

Intersection of fuzzy sets (AND):

For the AND (t-norm) of two fuzzy sets, one of the following two formulas is commonly used:
• Min-operator: μA∩B(x) = min{μA(x), μB(x)}, ∀x ∈ X


15-7

• Product operator: μA∩B(x) = μA(x)·μB(x), ∀x ∈ X

The product operator yields values no larger than the min operator. Besides these two, other AND operators have also been proposed.

The result of AND (assuming the min operator) is shown in the figure.

Union of fuzzy sets (OR):

For the OR (s-norm) of two fuzzy sets, one of the following two formulas can be used:
• Max-operator: μA∪B(x) = max{μA(x), μB(x)}, ∀x ∈ X, or
• Summation operator: μA∪B(x) = μA(x) + μB(x) − μA(x)·μB(x), ∀x ∈ X

Other s-norms have also been proposed and used. The result of OR (assuming the max operator) is shown in the figure.
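The four operators above can be sketched directly on membership degrees:

```python
def t_min(mu_a, mu_b):        # intersection (AND), min-operator
    return min(mu_a, mu_b)

def t_product(mu_a, mu_b):    # intersection (AND), product operator
    return mu_a * mu_b

def s_max(mu_a, mu_b):        # union (OR), max-operator
    return max(mu_a, mu_b)

def s_sum(mu_a, mu_b):        # union (OR), probabilistic sum
    return mu_a + mu_b - mu_a * mu_b

mu_a, mu_b = 0.7, 0.4
```

For degrees in [0, 1] the product never exceeds the min, and the probabilistic sum never falls below the max, so the two pairs bracket each other.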

Consider the rule

if A is a and B is b then C is c, with degrees μA(a) and μB(b). Assuming the min-operator, the firing strength is

min{μA(a), μB(b)} (21.9)

For each rule k, the firing strength αk is thus computed.

15-8

• The next step is to accumulate all activated outcomes. During this step, one single fuzzy value is determined for each ci ∈ C. Usually, the final fuzzy value, αi, associated with each outcome ci is computed using the max-operator, i.e.

αi = max_k {αki} (21.10)

where αki is the firing strength of rule k which has outcome ci.
• The end result of the inferencing process is a series of fuzzified output values. Rules that are not activated have a zero firing strength. Rules can be weighted a priori with a factor (in the range [0,1]), representing the degree of confidence in that rule. These rule confidence degrees are determined by the human expert during the design process.
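Equations (21.9) and (21.10), min over a rule's antecedents and max over rules sharing an outcome, can be sketched as follows; the rule table and input degrees are illustrative, not the book's example:

```python
# Each rule: (antecedent membership degrees for the current input, outcome label).
# The degrees and labels here are illustrative assumptions.
rules = [
    ((0.7, 0.4), "PM"),   # alpha = min(0.7, 0.4) = 0.4
    ((0.2, 0.9), "PM"),   # alpha = 0.2
    ((0.6, 0.5), "ZR"),   # alpha = 0.5
    ((0.0, 0.8), "NS"),   # alpha = 0.0 -> rule not activated
]

def infer(rules):
    """Firing strength alpha_k = min over antecedents (eq. 21.9);
    per-outcome degree alpha_i = max over rules with that outcome (eq. 21.10)."""
    out = {}
    for antecedents, outcome in rules:
        alpha = min(antecedents)
        out[outcome] = max(out.get(outcome, 0.0), alpha)
    return out

fired = infer(rules)
```

The two PM rules collapse to a single degree 0.4 via the max-operator, and the NS rule, with zero firing strength, contributes nothing.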

15-4 Defuzzification

Defuzzification converts the fuzzified outputs of the inference process, with their membership degrees, back into a crisp value for the output variables.

Suppose inference yields μLI = 0.8, μSI = 0.6 and μNC = 0.3 for output membership functions among large decrease (LD), slight decrease (SD), no change (NC), slight increase (SI), and large increase (LI).

� Several methods exist for converting these fuzzy results into a crisp output:

• The max-min method: The rule with the largest firing strength is selected. For our example, the largest firing strength is 0.8, which corresponds to the large increase membership function. Figure 21.3(b) illustrates the calculation of the output.
• The clipped center of gravity method: For this approach, each membership function is clipped at the corresponding rule firing strengths. The centroid of the composite area is calculated and the horizontal coordinate is used as the output of the controller. This approach to centroid calculation is illustrated in Figure 21.3(e).


15-9

� Centroid (center of gravity) defuzzification

The crisp output is taken as the center of gravity of the composite output area:

output = ∫X μ(x)·x dx / ∫X μ(x) dx (21.12)

where X is the universe of discourse.

For the earlier example with μOld(70) = 0.4, the centroid of the clipped Slow membership function, approximately 3, is taken as the crisp speed output.
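A discrete approximation of the centroid formula (21.12) can be sketched as follows; the sampled membership values are illustrative:

```python
def centroid(xs, mus):
    """Discrete centroid: sum(mu*x) / sum(mu), approximating
    integral(mu(x)*x dx) / integral(mu(x) dx) on a sampled universe."""
    num = sum(mu * x for x, mu in zip(xs, mus))
    den = sum(mus)
    return num / den

# Illustrative clipped output membership sampled on [0, 4].
xs = [0, 1, 2, 3, 4]
mus = [0.0, 0.4, 0.4, 0.4, 0.0]
crisp = centroid(xs, mus)   # membership is symmetric around x = 2
```

Because the clipped membership is symmetric about x = 2, the centroid lands exactly at 2; clipping at 0.4 does not move the centroid, only the area's height.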

� Fuzziness and Probability

Fuzziness and probability are different, though both express a form of uncertainty. They differ in at least the following ways:

1) Probability expresses the uncertainty of an event before it occurs; once the event has happened, that degree of uncertainty no longer carries information about it.

For example, before flipping a fair coin, there is a 50% probability that heads will be on top, and a 50% probability that it will be tails. After the event of flipping the coin, there is no uncertainty as to whether heads or tails are on top, and for that event the degree of certainty

no longer applies. 2) In contrast, membership to fuzzy sets is still relevant after an event occurred. For example,

consider the fuzzy set of tall people, with Peter belonging to that set with degree 0.9. Suppose the event to execute is to determine if Peter is good at basketball. Given some

15-10

membership function, the outcome of the event is a degree of membership to the set of good basketball players. After the event occurred, Peter still belongs to the set of tall people with degree 0.9.

3) Fuzziness does not assume everything to be known, and is based on descriptive measures of the domain (in terms of membership functions).

4) Fuzziness is not probability, and probability is not fuzziness. Probability and fuzzy sets can, however, be used in a symbiotic way to express the probability of a fuzzy event.

15-5 Applications

The most important applications of FRBSs are in fuzzy modelling, control, and classification.

A. Fuzzy modelling

1) Classical models are often not interpretable for practitioners, who cannot follow their internal behavior; a model provided by an FRBS, by contrast, is interpretable and analysable: it is a set of if-then rules.
2) Under certain conditions, FRBSs are universal approximators, in the sense that they are able to approximate any function to the desired degree. Although there are classical techniques, based on polynomials and splines, that are universal approximators as well, they are not frequently used nowadays.

B. Fuzzy classification

• A fuzzy rule-based classification system (FRBCS) is a classifier that uses fuzzy rules to assign class labels to objects.
• Humans possess the remarkable ability to recognise objects despite the presence of uncertain and incomplete information. FRBCSs provide a suitable means to deal with this kind of noisy, imprecise or incomplete information which in many classification problems is rather the norm than the exception. They make a strong effort to reconcile the empirical precision of traditional engineering techniques with the interpretability of artificial intelligence.

C. Fuzzy Controllers

Fuzzy controllers are used where, instead of a crisp controller derived from an exact model, an interpretable rule-based controller is preferred; they work well for plants that are hard to model precisely. Fuzzy controllers have been applied to, among others, automatic train operation, washing machines, camera autofocus, and similar industrial and consumer systems.

1) Components of Fuzzy Controllers

As illustrated in Figure 22.1, a fuzzy controller consists of four main components, which are integral to the operation of the controller:


15-11

Figure 22.1 A Fuzzy Controller

• Fuzzy rule base: The rule base, or knowledge base, contains the fuzzy rules that represent the knowledge and experience of a human expert of the system. These rules express a nonlinear control strategy for the system. While rules are usually obtained from human experts, and are static, strategies have been developed that adapt or refine rules through learning using neural networks or evolutionary algorithms [257, 888]. • Condition interface (fuzzifier): • Inference engine: The inference engine performs inferencing upon fuzzified inputs to produce a fuzzy output. • Action interface (defuzzifier):

Fuzzy controllers differ in the type of inference, the structure of the rule base, and the defuzzification method. The designer must specify:

a) the membership functions for the inputs and outputs;
b) the rule list, with the help of domain experts;
c) the defuzzification method.

Two controller types are widely used in this context: the Mamdani and the Takagi-Sugeno controllers.

� Mamdani Fuzzy Controller

� Takagi-Sugeno Controller

Instead of producing a fuzzy set as output, each Takagi-Sugeno rule has a consequent that is a singleton or a function (e.g. a linear combination) of the antecedent values, so the rule takes the form:

if f1(A1 is a1,A2 is a2, …,An is an) then C = f2(a1, a2, …, an) (22.1) The firing strength of each rule is computed using the min-operator, i.e.

15-12

(22.2) where Ak is the set of antecedents of rule k. Alternatively, the product can be used to calculate rule firing strengths:

(22.3)

The output of the controller is then determined as

(22.4)

The main advantage of Takagi-Sugeno controllers is that they break the closed-loop approach of the Mamdani controllers. For the Mamdani controllers the system is statically described by rules. For the Takagi-Sugeno controllers, the fact that the consequent of rules is a mathematical function provides for a more dynamic control.
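The Takagi-Sugeno inference of equations (22.2) through (22.4) can be sketched as a firing-strength-weighted average of the rule consequent functions; the two one-input rules are illustrative assumptions:

```python
def ts_output(rules, x):
    """Takagi-Sugeno inference: y = sum_k(alpha_k * f_k(x)) / sum_k(alpha_k),
    with alpha_k = min over the rule's antecedent memberships (eq. 22.2)."""
    num = den = 0.0
    for antecedent_mus, consequent in rules:
        alpha = min(mu(x) for mu in antecedent_mus)
        num += alpha * consequent(x)
        den += alpha
    return num / den

# Two illustrative rules on a single input x in [0, 1]:
#   IF x is Low  THEN y = 2*x + 1
#   IF x is High THEN y = -x + 4
low = lambda x: max(0.0, 1.0 - x)
high = lambda x: max(0.0, x)
rules = [
    ([low],  lambda x: 2 * x + 1),
    ([high], lambda x: -x + 4),
]

y = ts_output(rules, 0.25)
```

At x = 0.25 the Low rule fires with strength 0.75 and the High rule with 0.25, so the crisp output is the weighted blend of the two local models; no defuzzification step is needed.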

15-6 Genetic Fuzzy Systems (Genetic Fuzzy System)

Many hybrid algorithms have been formed by combining fuzzy, neural, and evolutionary techniques, among them:

Neuro-fuzzy systems: the dominant component is the FS, which is therefore regarded as an FS with the capability of neural adaptation.
Fuzzy-neural networks: Fuzzy-neural networks are hybrid systems that more resemble an NN rather than an FS.
Genetic algorithms for training neural networks: A GA is used to optimise the parameters of the learning method that adapts the synaptic weights of the neural net.

Fuzzy evolutionary algorithms A fuzzy evolutionary algorithm is an EA that uses FL techniques to adapt its parameters or operators to improve its performance.


15-13

Genetic Fuzzy Systems A GFS is an FS that is augmented with an evolutionary learning process. Figure 3.3 illustrates this idea. It is important to notice that the genetic learning process aims at designing or optimising the KB. Consequently, a GFRBS is a design method for FRBSs which incorporates evolutionary techniques to achieve the automatic generation or modification of the entire or part of the KB.

In a Mamdani-type model the KB consists of the RB (the fuzzy rules) and the DB (the membership functions and the scaling factors). The structure of a GFRBS is shown in Figure 3-4.

Tuning is more concerned with optimisation of an existing FRBS, whereas learning constitutes an automated design method for fuzzy rule sets that starts from scratch.

15-6-1 Genetic Tuning Processes

The parameters that can be tuned include:

• DB components: scaling functions* and membership function parameters, • RB components: "IF-THEN" rule consequents.

15-14

A. Tuning the scaling functions

The scaling functions may be linear or nonlinear; their parameters α and β are the quantities to tune.

In addition to these, a parameter S = {−1, 0, 1} indicates whether the scaling acts toward the beginning, the middle, or the end of the domain of x. The effect of this parameter is shown in the figure.

B. Tuning the membership functions

In this case each chromosome encodes a complete DB.

Piece-wise linear functions may be used for the membership functions; alternatively, differentiable functions are preferred when derivative-based refinement is involved.

With triangular membership functions, each function is specified by 3 parameters (a trapezoidal one by 4), so the chromosome for triangular membership functions takes the form:


15-15

where Ni is the number of linguistic labels in the term set of the i-th input variable.
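The triangular-membership-function chromosome described above can be sketched as a flat list of (left, center, right) triples, one per linguistic label per variable; the example DB is made up:

```python
def encode_db(db):
    """Flatten a DB into a chromosome: for each variable, for each linguistic
    label, append the triangle's (left, center, right) parameters."""
    chrom = []
    for variable in db:
        for (left, center, right) in variable:
            chrom.extend([left, center, right])
    return chrom

def decode_db(chrom, labels_per_var):
    """Rebuild the DB from a chromosome, given Ni labels per variable."""
    db, pos = [], 0
    for n_labels in labels_per_var:
        variable = []
        for _ in range(n_labels):
            variable.append(tuple(chrom[pos:pos + 3]))
            pos += 3
        db.append(variable)
    return db

# One input variable with 3 labels (made-up triangles): Low, Medium, High.
db = [[(0.0, 0.0, 0.5), (0.0, 0.5, 1.0), (0.5, 1.0, 1.0)]]
chrom = encode_db(db)
```

Real-coded crossover and mutation then operate directly on `chrom`, and the DB is decoded before each fitness evaluation.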

C. Tuning of fuzzy rules ���� $.����� D���#T �� �� �#� &�G ���Y �,�#T �; !�� D�� ���� . ) ���� �V*�� $Vm� \V�8 ��V,�T KV� $V��Z�

4� �+�� H#�� ��%.�� S�#.�+� !�� S? .!�� ��FG !�#n, j��� L�%�� �� �/���� �� S? $�m��. These kinds of genetic processes are subsumed within the genetic learning of RB based on the Pittsburgh approach.

15-6-2 Learning with Genetic Algorithms

Here it is assumed that the DB is fixed and the goal is to find a good RB.

Two encoding schemes are used: 1) each rule is one chromosome (the Michigan approach), and 2) a whole rule set (RB) is one chromosome (the Pittsburgh approach).

In the Michigan approach, individual rules compete with one another, yet the rules in the final list must cooperate to solve the task; in the Pittsburgh approach, entire rule bases compete, and the rules within each base cooperate.

A. The Pittsburgh approach solves the case by evolving a population of RBs instead of individual rules, thus implicitly evaluating the cooperation of fuzzy rules. This method is very time consuming, as in every generation each member of the population is evaluated separately, and is therefore not suitable for online learning tasks.

B. The iterative rule learning approach performs learning in two stages: 1) rules compete as in the Michigan approach, and the better rules are added one at a time to a list until it is complete; 2) in a post-processing phase, redundant rules are removed.

15-6-3 GFRBS Based on the Michigan Approach

This class of GFRBS is based on the Michigan approach.

15-16

In this approach the DB and the RB are not learned simultaneously: each chromosome encodes only a single rule, and no new membership functions are introduced for it.

A. Rules coding

In order to illustrate the basic operation of an FCS, this section focuses on the coding scheme proposed by Valenzuela-Rendon (1991b) in which a classifier is a DNF rule of the form:

For example, a rule may be encoded as follows.

The rule has two inputs, x0 and x1, and an output y, each taking one of the 3 linguistic values {Low, Medium, High}. Each variable is represented by a 3-bit substring, one bit per label: for the first variable, x0, the substring 110 means "low or medium";

the substring 101 for the second variable means "low or high". The output variable's label is likewise specified by its own substring.
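The DNF bit-mask encoding above (110 = "low or medium", 101 = "low or high") can be sketched as follows; the matching functions are an illustrative reading of the scheme:

```python
LABELS = ("Low", "Medium", "High")

def matches(mask, label):
    """A DNF condition with a bit mask over (Low, Medium, High) is satisfied
    by any label whose bit is set, e.g. '110' = 'low or medium'."""
    return mask[LABELS.index(label)] == "1"

def rule_fires(masks, labels):
    """The rule fires when every variable's label is covered by its mask."""
    return all(matches(m, lab) for m, lab in zip(masks, labels))

# Antecedent from the text: x0 in {low, medium} (110), x1 in {low, high} (101).
antecedent = ("110", "101")
```

In a full FCS the crisp matching here would be replaced by fuzzy matching degrees, but the bit masks encode the same disjunctive conditions.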

B. Creation of new rules through a GA 2) The strength of each classifier is evaluated for a certain number of time steps before the

FCS invokes a genetic process to generate new fuzzy classifiers that substitute the weaker one.

3) The GA employs a steady-state generational replacement scheme in which a certain fraction of the current population is replaced by offspring. The currently weakest classifier in the population is replaced by an offspring generated by means of crossover and mutation from two parent classifiers selected according to their strength.

4) Before a new classifier is added to the population, it is checked for its syntactical correctness. Recombination and mutation might generate classifiers that are invalid for one of the following two reasons:

Solving of the problem is very difficult in these kinds of systems. As the evolutionary process is made at the individual rule level, it is difficult to obtain a good cooperation among the fuzzy rules that are competing among themselves. To do so, there is a need to have a fitness function able to measure both the goodness of an individual rule and the quality of the cooperation it presents with the remaining ones in the population to provide the best possible output. As said by Bonarini (1996a), it is not easy to obtain a fitness function of this kind.

The basic cycle of the FCS

The basic cycle of the FCS is composed of the following steps:
(1) The input unit receives input values, encodes them into fuzzy messages, and adds these messages to the message list.


15-17

(2) The population of classifiers is scanned to find all classifiers whose conditions are at least partially satisfied by the current set of messages on the list. (3) All previously present messages are removed from the list. (4) All matched classifiers are fired producing minimal messages that are added to the message list. (5) The output unit identifies the output messages. (6) The output unit decomposes output messages into minimal messages. (7) Minimal messages are defuzzified and transformed into crisp output values. (8) External reward from the environment and internal taxes raised from currently active classifiers are distributed backwards to the classifiers active during the previous cycle.

Examples:
1) Valenzuela-Rendon (1991a, 1991b) presented the first GFRBS based on the Michigan approach for learning RBs with DNF fuzzy rules.
2) Furuhashi, Nakaoka, Morikawa, and Uchikawa (1994) proposed a basic FCS based on the usual linguistic rule structure with a single label per variable.
3) Bonarini (1993) developed ELF (Evolutionary Learning of Fuzzy Rules).
4) Ishibuchi, Nakashima, and Murata (1999) proposed an FCS for learning fuzzy linguistic rules for classification problems.

Algorithm:

The FCS is composed of the following steps:
(1) Generate an initial population of Npop fuzzy rules by randomly specifying the antecedent linguistic term of each rule. The consequent class and the degree of certainty are determined by the heuristic procedure.
(2) Classify all the given training patterns by the fuzzy rules in the current population, then calculate the fitness value of each rule.
(3) Generate Nrep fuzzy rules from the current population by means of the selection, crossover and mutation operators. The consequent class and the degree of certainty of each fuzzy rule are again determined by the heuristic procedure.
(4) Replace the worst Nrep fuzzy rules (those with the smallest fitness values) in the current population with the newly generated rules.
(5) Terminate if the maximum number of generations has elapsed; otherwise return to Step 2.
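A single generation of this cycle can be sketched as follows. The rule representation (a tuple of antecedent label indices) and the placeholder fitness function are illustrative assumptions, standing in for the classification-based fitness and heuristic consequent determination described above.

```python
import random

# Sketch of one generation of the Michigan-style fuzzy-rule GA above.
# N_POP rules compete; N_REP offspring replace the N_REP worst rules.
N_POP, N_REP, N_LABELS, N_VARS = 20, 4, 3, 2

def random_rule():
    # A rule is a tuple of antecedent label indices (illustrative coding).
    return tuple(random.randrange(N_LABELS) for _ in range(N_VARS))

def fitness(rule):
    # Placeholder: stands in for "classify all training patterns".
    return sum(rule)

def one_generation(population):
    scored = sorted(population, key=fitness)   # ascending fitness
    parents = scored[-2:]                      # fitness-based selection
    offspring = []
    for _ in range(N_REP):
        cut = random.randrange(1, N_VARS)      # one-point crossover
        child = parents[0][:cut] + parents[1][cut:]
        child = tuple(g if random.random() > 0.1
                      else random.randrange(N_LABELS)   # mutation
                      for g in child)
        offspring.append(child)
    # Replace the N_REP worst rules with the new offspring.
    return scored[N_REP:] + offspring

population = [random_rule() for _ in range(N_POP)]
population = one_generation(population)
assert len(population) == N_POP
```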

15-6-4 GFRBS Based on the Pittsburgh Approach

The general structure of a GFRBS based on the Pittsburgh approach is shown in Fig. 7.1, which is an adapted version of Fig. 5.3.

15-18

• The major difference between the two figures is that the classical rule-based system has been replaced by an FRBS and that it includes additional components required for learning.

• The Pitt learning process operates on a population of candidate solutions, in this case FRBSs. As the inference engine itself is not subject to adaptation, all FRBSs employ the identical fuzzy reasoning mechanism and the individuals in the population only encode the fuzzy rule sets themselves rather than the entire FS.

• GFRBSs based on the Pitt approach focus the learning process on the RB, but may in addition incorporate part of the DB (scaling or membership functions) into that process.

• However, this approach faces the drawback that the dimension of the search space increases significantly, making it substantially more difficult to find good solutions.

Coding Rule Bases as Chromosomes

The two main issues, which are tightly interrelated, are the use of fixed- or variable-length codes, and the use of positional or non-positional representation schemes.

Fixed-length codes are applicable to RB representations that possess a static structure, such as a decision table (see Sec. 1.2.1) or a relational matrix. The obtained code is positional: the functionality of a gene is determined by its position, and its value defines the corresponding attribute. A specific location in a positional chromosome refers to a certain entry in the decision table or the relational matrix, and the gene value defines the output fuzzy set associated with that entry. The number of genes in the chromosome is therefore identical to the number of elements in the decision table or the relational matrix.

It is also possible to rely on a fixed-length code when working with a list of rules, provided the number of rules is known in advance and remains constant.
• In this case the code is non-positional, since it represents a list of rules and the order of rules in an FS is immaterial to their interpretation.
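The positional decision-table coding can be sketched as follows; the table dimensions and gene values are illustrative assumptions.

```python
# Sketch: positional fixed-length coding of a fuzzy decision table.
# Two inputs with 3 linguistic labels each give a 3x3 table; the
# chromosome has one gene per table entry, and the gene value is the
# index of the output fuzzy set associated with that entry.
N_LABELS_X1, N_LABELS_X2 = 3, 3

def decode(chromosome):
    """Rebuild the decision table from the flat positional chromosome."""
    assert len(chromosome) == N_LABELS_X1 * N_LABELS_X2
    return [chromosome[i * N_LABELS_X2:(i + 1) * N_LABELS_X2]
            for i in range(N_LABELS_X1)]

chrom = [1, 2, 0, 2, 1, 1, 0, 2, 1]   # gene position -> table entry
table = decode(chrom)
# The gene at position (row * 3 + col) defines the output set of the
# rule "IF X1 is label_row AND X2 is label_col".
assert table[1][0] == chrom[3]
```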


15-19

When working with RBs represented as a list of rules with a variable number of rules, the code length becomes variable as well. Again, the code of an individual rule can be either a fixed- or a variable-length code. In the first case, the sub-code of each rule is positional, with one particular gene per variable or per fuzzy set. In the second case, genes employ a composed code that includes both the variable and the associated fuzzy set. This code explicitly indicates which part of the fuzzy rule the gene refers to, in contrast to a positional coding scheme in which this information is implicit in the gene position.
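The variable-length, composed-gene coding described above might look like this sketch; the (variable index, fuzzy-set index) gene layout and the example rules are assumptions for illustration.

```python
# Sketch: a variable-length, non-positional rule-list code. Each rule is
# a list of composed genes (variable index, fuzzy-set index), so the
# gene itself says which antecedent variable it refers to; rule order
# and rule count are free.
rule_base = [
    [(0, 2), (3, 1)],          # IF X0 is set2 AND X3 is set1 ...
    [(1, 0)],                  # IF X1 is set0 ...
    [(0, 1), (1, 2), (2, 0)],  # a longer rule
]

def variables_used(rule):
    return {var for var, _ in rule}

# Unlike a positional code, reordering the genes inside a rule does not
# change its meaning.
assert variables_used(rule_base[0]) == variables_used(rule_base[0][::-1])
```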

15-6-5 GFRBS Based on the Iterative Rule Learning Approach

Genetic learning processes based on the iterative rule learning (IRL) approach are characterised by tackling the learning problem in multiple steps.

The objective of the IRL approach is to reduce the dimension of the search space by encoding individual rules in the chromosome, as in the Michigan approach, while the evaluation scheme takes the cooperation of rules into account, as in the Pitt approach.

A. Generation process
• The generation process derives a preliminary set of fuzzy rules representing the knowledge existing in the data set. The generation stage forces competition between fuzzy rules, as in genetic learning processes based on the Michigan approach, to obtain a fuzzy rule set composed of the best possible fuzzy rules. To do so, a fuzzy rule generating method is run several times (in each run, only the best fuzzy rule with respect to the current state of the example set is obtained as process output) by an iterative covering method that wraps it and analyses the covering that the consecutively learnt rules induce on the training data set. Hence, the cooperation among the fuzzy rules generated in the different runs is only briefly addressed, by means of a rule penalty criterion.

B. Post-processing process
• The post-processing process refines the previous rule set in order to remove the redundant rules that emerged during the generation stage and to select those fuzzy rules that cooperate in an optimal way. This second stage is necessary because the generation stage largely ignores the cooperation aspect: the post-processing stage forces cooperation between the fuzzy rules generated in the previous stage by refining or eliminating redundant or excessive fuzzy rules, in order to obtain a final fuzzy rule set that demonstrates the best overall performance.
• The post-processing stage also deals with a simple search space, because it operates only on the fuzzy rule set obtained in the previous step. There are only two notable examples of application of the IRL approach to the design of GFRBSs (Gonzalez and Herrera, 1997; Cordon, Herrera, Gonzalez, and Perez, 1998): MOGUL (Cordon, del Jesus, Herrera, and Lozano, 1999) and SLAVE (Gonzalez and Perez, 1998a, 1999a).

15-20

Coding the Fuzzy Rules

The coding scheme depends on the rule type: descriptive Mamdani rules are encoded in terms of linguistic labels, whereas approximate Mamdani and TSK rules are encoded in terms of numerical parameters. Several coding approaches from the literature are reviewed below.

Coding linguistic rules

Method 1: The primary fuzzy sets belonging to each of the variables' fuzzy partitions are enumerated from 1 to the number of labels in each term set. A linguistic variable Xi taking values in a primary term set is then encoded by the index of the label it takes, so a fuzzy rule is represented as a chromosome listing the label indices of its antecedent and consequent.

Method 2: Gonzalez, Perez, and Verdegay (1993) proposed a binary coding scheme to encode DNF Mamdani-type rules. Consider three input variables X1, X2 and X3, each with an associated linguistic term set. Each variable is associated with a bit-string whose length is equal to the number of labels in its linguistic term set. A bit is set to 1 if the corresponding term is present in the rule antecedent, and is set to 0 otherwise.

Method 3: In this coding, each element of the population is represented by two binary chromosomes: the variable chromosome (VAR) is a binary string of length n (the number of input variables), with a 1 indicating that the variable is active and a 0 that it is not. The value chromosome (VAL) uses the binary coding shown above.


15-21


Coding approximate Mamdani-type fuzzy rules

In this case the coding is based on real-valued membership function parameters rather than linguistic labels: a triangular membership function is represented by a 3-tuple (a, b, c), and a trapezoidal one by a 4-tuple (a, b, c, d).

Hence, an approximate Mamdani-type fuzzy rule of the form

IF X1 is A1 and ... and Xn is An THEN Y is B,

where the Ai and B are triangular-shaped fuzzy sets, is encoded into the following chromosome:

C = (a1, b1, c1, ..., an, bn, cn, a, b, c)

and, when normalised trapezoidal-shaped sets are considered, its representation is:

C = (a1, b1, c1, d1, ..., an, bn, cn, dn, a, b, c, d)

If additional constraints on the triangular or trapezoidal shapes exist, the corresponding parameters are also placed in the chromosome. For example, a rule

IF X1 is A1 and ... and Xn is An THEN Y is B

with such membership functions is encoded into a chromosome C of one of the forms shown above.
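Assembling such a real-coded chromosome is straightforward; the parameter values below are illustrative.

```python
# Sketch: building the real-coded chromosome C for an approximate
# Mamdani rule with triangular membership functions (a, b, c).
antecedents = [(0.0, 1.0, 2.0),    # A1 for X1
               (1.5, 2.5, 3.5)]    # A2 for X2
consequent = (4.0, 5.0, 6.0)       # B for Y

# Concatenate the 3-tuples of every antecedent set, then the consequent.
C = [p for mf in antecedents + [consequent] for p in mf]
assert C == [0.0, 1.0, 2.0, 1.5, 2.5, 3.5, 4.0, 5.0, 6.0]
assert len(C) == 3 * (len(antecedents) + 1)
```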

15-6-6 Open Issues and New Trends in GFRBS

1) Trade-off between interpretability and precision. Most researchers would agree that interpretability involves aspects such as: the number of rules should be small enough to be comprehensible, rule premises should be simple in structure and contain only a few input variables, and the linguistic terms should be intuitively comprehensible. A central difficulty is that there is no agreed quantitative measure by which the interpretability of an FRBS can be computed.

15-22

2) FRBSs for high-dimensional problems. There is the difficulty of the exponential growth of the fuzzy rule search space with the number of features/instances considered. This problem can be tackled in different ways: a) compacting and reducing the rule set as a post-processing step, b) carrying out a feature selection process that determines the most relevant variables before or during the inductive learning process of the FRBS, and c) removing irrelevant training instances prior to FRBS learning.
3) Learning genetic models based on vague data. According to the point of view of fuzzy statistics, the primary use of fuzzy sets in classification and modelling problems is the treatment of vague data [5]. This is a novel area that is worth exploring in the near future and that can provide interesting and promising results.

Assignments:
1- What is the difference between classical (two-valued) logic and fuzzy logic?
2- Give examples of membership functions.
3- Give two examples each of the AND and OR operators.
4- Explain defuzzification with an example.
5- Draw the block diagram of a fuzzy inference system and explain its components.
6- Draw the block diagram of a fuzzy controller and explain its components.
7- Determine the intersection and union of the fuzzy sets of Figure 20.6.

8- Give the height, support, core and normalization of the fuzzy sets in Figure 20.6.

Figure 20.6 Membership Functions for Assignments 1 and 2

6- Consider a system governed by the following rules:

if x is Small then y is Big
if x is Medium then y is Small
if x is Big then y is Medium

with the membership functions for the input x and the output y as given in the figure.


15-23

(a) Using the clipped center of gravity method, draw the composite function for which the centroid needs to be calculated, for x = 2.

4.* Consider the following Takagi-Sugeno rules:

if x is A1 and y is B1 then z1 = x + y + 1 if x is A2 and y is B1 then z2 = 2x + y + 1

if x is A1 and y is B2 then z3 = 2x + 3y if x is A2 and y is B2 then z4 = 2x + 5

Compute the value of z for x = 1, y = 4 and the antecedent fuzzy sets A1 = {1/0.1, 2/0.6, 3/1.0} A2 = {1/0.9, 2/0.4, 3/0.0} B1 = {4/1.0, 5/1.0, 6/0.3} B2 = {4/0.1, 5/0.9, 6/1.0}
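A worked sketch of this computation, assuming the min t-norm for the antecedent AND and the usual weighted average for the overall output (the standard zero-order TSK combination of the per-rule consequents):

```python
# Takagi-Sugeno exercise above: x = 1, y = 4, min t-norm assumed.
x, y = 1, 4
A1, A2 = {1: 0.1, 2: 0.6, 3: 1.0}, {1: 0.9, 2: 0.4, 3: 0.0}
B1, B2 = {4: 1.0, 5: 1.0, 6: 0.3}, {4: 0.1, 5: 0.9, 6: 1.0}

rules = [
    (min(A1[x], B1[y]), x + y + 1),      # z1 = 6, weight 0.1
    (min(A2[x], B1[y]), 2 * x + y + 1),  # z2 = 7, weight 0.9
    (min(A1[x], B2[y]), 2 * x + 3 * y),  # z3 = 14, weight 0.1
    (min(A2[x], B2[y]), 2 * x + 5),      # z4 = 7, weight 0.1
]
# Weighted average of the rule consequents.
z = sum(w * zi for w, zi in rules) / sum(w for w, _ in rules)
assert abs(z - 7.5) < 1e-9   # (0.1*6 + 0.9*7 + 0.1*14 + 0.1*7) / 1.2
```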

7- What is meant by fuzzy modeling?
8- What is the difference between the Michigan, Pittsburgh, and iterative rule learning approaches? Explain the structure of the chromosome used in the Michigan and Pittsburgh approaches.

15-24

9(


16-1

16 Cellular Systems

Cellular Automata (CA) are the best-known kind of cellular system; they were introduced by Von Neumann.

CA grew out of a deceptively simple question: "Can a machine build a copy of itself?"

Von Neumann's positive answer to this question later gave rise to the field of Artificial Life. It also showed that simple cellular rules can generate patterns reminiscent of organic growth and fractal structures, at both small and large scales.

Example: multicellular biological organisms are among the most complex living systems known, and they are built from the cooperation of very many individual cells.
• Although the cells of an organism can differ greatly in form and function (compare, for instance, neurons and blood cells), each cell is a relatively simple unit: it lives, acts, and updates its state based only on its own condition and on locally available information from its immediate surroundings, without any global supervisor.
• Nevertheless, the collective behavior of these simple units produces phenomena that no single cell is capable of on its own.
• Striking behaviors such as self-organization, growth, self-repair, and self-replication emerge from these local interactions, which is why cellular systems are of great interest for simulating life-like behavior in artificial life.

16-2

• Cellular systems are used not only to analyze how complex behavior emerges from simple local rules, but also as a design tool for engineering systems that must produce a desired collective behavior.

15-1 Elements of Cellular Systems

1. Cell: the basic unit of a cellular system; it is far simpler than a biological cell.
• State: the state of a cell is a variable that takes one value from a (usually small) set of possible values; one distinguished value of the set usually acts as the quiescent (non-active, default) state.
• Sometimes an n-tuple of numerical variables rather than a single variable is used to represent the state, even if in principle this n-tuple could be re-encoded as a single variable. The advantage of using an n-tuple is that each variable can represent in a more meaningful way the different aspects of the cell condition that must be modeled and can thus simplify the definition of the transition function.

2. Cellular space: the discrete cellular space is the collection of cells, usually arranged as a regular lattice of dimension d, with d typically ranging from 1 to 3. Figure 14.1 shows some examples of such spaces.

Figure 14.1 Some examples of one-, two-, and three-dimensional cellular spaces.
• Neighborhood: the neighborhood of a cell is the set of cells (including the cell itself) whose state can directly influence the future state of the cell.
• Neighborhoods are usually local: the neighbors of a cell are the cells closest to it, although in principle cells far apart in the space could belong to the neighborhood. The size and shape of the neighborhood are defining parameters of the system.
• Figure 14.2 shows some examples of neighborhoods in cellular spaces.


16-3

Figure 14.2 Some examples of neighborhoods in cellular spaces. In two dimensions, the Von Neumann and Moore neighborhoods are the best known.
• If all the cells in the system have the same kind of neighborhood, the cellular system is said to have a homogeneous or uniform neighborhood. Homogeneity can be further specified as pertaining to space, time, or both.

3. State transition function: in a discrete-time cellular system, the states of all cells are updated step by step over time. The transition function specifies the local interaction between a cell and its neighborhood: it determines the next state of a cell as a function of the current states of the cells in its neighborhood.
• A cellular system is called homogeneous when the transition function is the same for all cells, independently of time and position.
• For discrete-time cellular systems the actual implementation of the transition function on a computer can be done by programming a routine that evaluates the function at each time step.
• For small finite state sets and small neighborhoods the transition function can be implemented with a lookup table that stores all the entries of the transition function.

• Boundary conditions: the cells at the border of a finite cellular space lack a complete neighborhood, so a suitable strategy for them must be chosen. Some common strategies are the following (figure 14.3):
• Periodic: the cells on one border are considered neighbors of the cells on the opposite border, so the space closes on itself (a ring in one dimension, a torus in two). These are called periodic boundary conditions.
• Assigned: in this approach virtual cells are added along the border; the states of the boundary cells of the system then depend on the assigned states of these virtual cells.
 o The virtual border cells may keep a fixed, unchanging state.
 o Alternatively, their state may be determined by the system itself: either the virtual cell copies the state of the adjacent border cell (adiabatic boundary conditions), or it copies the state of the cell on the other side of the border cell (mirror boundary conditions).
• Reflecting: here the border is assumed to behave like a wall that reflects back into the system whatever reaches it, much as a mirror reflects light; the reflected signal is what the system "sees" at the border.
• Absorbing: here the border is assumed to absorb whatever reaches it, acting as a kind of black hole for the system: nothing crosses the border back into the system, and the boundary does not disturb it. How an absorbing boundary is realized depends on the nature of the propagating signal.
• When motion is modeled in such a system, the border is fixed, and reflection and absorption model the behavior of the boundary as the system evolves.

Figure 14.3 Some examples of boundary conditions, illustrated for the one-dimensional case.

• Moving boundary: the boundary need not be fixed; it can move and the cellular space can expand virtually as the system evolves, which influences the overall behavior of the model.
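The effect of the boundary strategies above on the neighborhood of a border cell can be sketched for the one-dimensional, radius-1 case (the reflecting and moving cases are omitted for brevity):

```python
# Sketch: how different boundary strategies supply the missing left
# neighbor of the first cell of a one-dimensional cellular space.
cells = [1, 0, 0, 1, 1]

def left_neighbor(i, boundary, assigned_state=0):
    if i > 0:
        return cells[i - 1]
    if boundary == "periodic":      # wrap around: ring topology
        return cells[-1]
    if boundary == "assigned":      # virtual border cell, fixed state
        return assigned_state
    if boundary == "adiabatic":     # virtual cell copies the border cell
        return cells[0]
    raise ValueError(boundary)

assert left_neighbor(0, "periodic") == 1    # last cell's state
assert left_neighbor(0, "assigned") == 0
assert left_neighbor(0, "adiabatic") == 1   # cells[0]
```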

4. Initial conditions. In order to start the updating of the state of the cells of the system according to the transition function it is necessary to specify the initial state of all the cells. This is known as the assignment of the initial condition or seed of the cellular system.

5. Stopping condition. The stopping condition specifies when the update of the state of the cellular space must be stopped. Typical stopping conditions are the attainment of a preassigned simulation time and the observation that the state of the cellular system is cycling in a loop.


16-5

15-2 Cellular Automata

Von Neumann and Ulam introduced CA as machines able to build copies of themselves. The best-known kind of cellular system is the cellular automaton (CA). A CA is a cellular system built from automata, in which:
o time is discrete and the cell states belong to a finite set;
o the neighborhoods are uniform across the cellular space;
o all cells update their states simultaneously (synchronously).

A CA is defined in terms of automata:
1. An automaton is a discrete-time system with a finite set of inputs I, a finite set of states S, a finite set of outputs O, a state transition function ϕ, which gives the state at the next time step as a function of the current state and inputs, and an output function, which gives the current output as a function of the current state.

2. The integer sequence S = {0, . . . , k − 1} is often used as the CA state set, with s0 = 0 representing the quiescent state.

3. In a CA each cell is thus an automaton which issues its state as output and takes as inputs the outputs of the cells in the cell’s neighborhood.

4. The transition function ϕ of a CA (also called the transition rule or CA rule) is a deterministic function that gives the state si(t+1) of the ith cell at the time step t + 1 as a function of the states of the cells in the cell's neighborhood Ni at time t, that is,

In a discrete-state system the transition function can be given as a table that lists the next state of a cell for each possible configuration of the neighborhood states; see figure 14.4.
• If each cell can take one of k states and the neighborhood contains n cells, there are k^n possible neighborhood configurations, so the table needs k^n entries. The number of possible rules is k^(k^n), a quantity that grows very rapidly with k and n.
• For binary states (k = 2) and a neighborhood of n = 3 cells there are 2^3 = 8 configurations, so a rule can be written as an 8-bit string and there are 2^8 = 256 possible rules.
• For k = 3 (a ternary CA) and n = 3 the number of possible rules is already

3^(3^3) = 3^27 = 7,625,597,484,987.
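These counts are easy to check directly:

```python
# With k states and n neighborhood cells there are k**n neighborhood
# configurations, and a rule assigns one of k states to each of them,
# giving k**(k**n) possible rules.
def n_configurations(k, n):
    return k ** n

def n_rules(k, n):
    return k ** (k ** n)

assert n_configurations(2, 3) == 8
assert n_rules(2, 3) == 256
assert n_rules(3, 3) == 3 ** 27 == 7_625_597_484_987
```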

16-6

Figure 14.4 An example of a transition table for a two-dimensional CA with the Moore neighborhood. The cell states are represented as gray levels. The table contains one entry for each configuration of states of the nine cells that form the neighborhood. With k possible states, the table contains k9 entries.

15-1-1 Special CA Rules

The most common special classes of CA rules are the following:
• Totalistic: a rule is totalistic if it depends only on the sum of the states of the cells in the neighborhood. A totalistic transition rule can be written as

With k states and a neighborhood of size n the sum can take only n(k − 1) + 1 different values, and thus there are k^(n(k−1)+1) possible totalistic rules. For example, only 16 of the 256 rules of a binary CA with neighborhood size of three, and only 2187 of the more than 10^12 rules of a ternary CA with neighborhood size of three, are totalistic rules.

•Outer totalistic • A CA rule is called outer totalistic if it depends only on the value of the state of the updated cell (the “center” cell) and the sum of the values of the states of the other cells in the neighborhood (the “outer neighborhood”). An outer totalistic transition rule can be written as

• Symmetric: a transition rule is symmetric if it is invariant under permutations of the states of the cells in the neighborhood. By this definition both totalistic and outer totalistic rules are symmetric, since the sum does not depend on the order of the neighborhood states.

Null state quiescent: A CA rule is called null state quiescent if it maps a quiescent neighborhood to the quiescent state.


16-7

15-1-2 Space-Time Diagrams of CA

• Displaying the behavior of a CA on the computer screen is the most common way of visualizing it.
• For one-dimensional CAs a two-dimensional space-time diagram can be used:
 o The cellular space at each time step is represented as a horizontal line of squares,
 o and the vertical direction is used to show the unfolding in time of the configuration of states of the cellular space.
 o Each state of the state set is represented by a different shade of gray (or color, when available).
• The right side of figure 14.5 shows the direct generalization of the one-dimensional diagram to the two-dimensional case (the white cells of the cellular space are not shown for clarity).
• Figure 14.5 and figure 14.6 both represent a binary CA with the Moore neighborhood implementing the so-called outer parity rule.

• Black squares represent cells in the s = 1 state and white squares cells in the s = 0 state.
• In this example the transition rule sets the new state of a cell to s = 1 if the sum of the states of its outer neighborhood is odd, and to s = 0 otherwise.

Figure 14.5 Space-time diagrams for (left) one- and (right) two-dimensional CAs.
• Another kind of space-time diagram for a two-dimensional CA is shown below: a snapshot of the whole cellular space, containing the states of all cells, is drawn at each time step.

Figure 14.6 Another kind of space-time diagram for a two-dimensional CA.

16-8

15-3 Modeling with Cellular Systems

The previous section described the elements of a CA; here we explain how they are put together to build simulations of real-world phenomena. The modeling recipe is as follows:
1. Assign the cellular space.
2. Assign the time variable.
3. Assign the neighborhood.
4. Assign the state set.
5. Assign the transition rule.
6. Assign the boundary conditions.
7. Assign the initial condition.
8. Assign a stopping condition.
9. Proceed to update the state of the cells until the stopping condition is met.
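The nine steps above can be sketched for a one-dimensional binary CA; the parity transition rule used here is an illustrative choice:

```python
# Minimal sketch of the modeling recipe: space, time, neighborhood,
# state set, rule, boundaries, seed, and stopping condition are all
# chosen explicitly.
N_CELLS, N_STEPS = 11, 5                     # steps 1, 2: space and time
state = [0] * N_CELLS                        # step 4: binary state set
state[N_CELLS // 2] = 1                      # step 7: initial seed

def rule(left, center, right):               # step 5: transition rule
    return (left + center + right) % 2       # parity of the neighborhood

for _ in range(N_STEPS):                     # step 8: fixed-time stop
    state = [rule(state[i - 1],              # steps 3, 6: radius-1
                  state[i],                  # neighborhood with periodic
                  state[(i + 1) % N_CELLS])  # boundary conditions
             for i in range(N_CELLS)]        # step 9: synchronous update

assert len(state) == N_CELLS and set(state) <= {0, 1}
```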

15-1-3 Modeling Traffic with CA

• We derive a simple model of vehicles moving along a road, in order to study traffic flow.
• The road is modeled as a one-dimensional strip of cells along which the vehicles travel in a fixed direction (figure 14.7a).
• The CA is discrete in time: at each time step each vehicle can move by at most one cell.
• Each cell holds at most one vehicle.
• The state of a cell is simply empty or occupied; the next state depends only on the states of the cell and of its immediate neighbors.
• Transition rule: a vehicle moves forward one cell if and only if the cell ahead of it is empty; otherwise it stays where it is. The corresponding rule table is shown in figure 14.7e.


16-9

• For the boundary conditions we adopt the periodic approach (figure 14.7f).

• Figure 14.7 The elements of the traffic CA model. a) The kind of traffic flow modeled. b)The cellular space-time. c) The neighborhood. d) The state set. e) The transition table. f ) The boundary conditions.

• To model vehicle density, a parameter ρ between 0 and 1 specifies the fraction of cells occupied by vehicles.
• We assume that the number of vehicles is conserved, so that the density ρ stays constant: no vehicle enters or leaves the road.
• Running the CA we observe that after an initial transient the flow of vehicles stabilizes into a configuration that repeats itself periodically.
• Figure 14.8 shows the space-time diagrams for two different car densities. Black cells correspond to cells occupied by cars and white cells correspond to empty cells. Note that there is a qualitative difference in the aspect of the two diagrams.
• For the lower density ρ = 0.3, once the initial transient is finished there remain in the space-time diagram only diagonal stripes of white cells, corresponding in the model to empty stretches moving along the road in the same direction as the traffic.
• For the higher density ρ = 0.7, after the transient there remain only diagonal stripes of black cells, corresponding to traffic jams moving along the road in the direction opposite to that of the traffic.

16-10

Figure 14.8 Examples of traffic flow in the elementary traffic CA with two different densities of vehicles. The space-time diagram on the left corresponds to a road with 30% occupation by vehicles. The diagram on the right corresponds to a 70% occupation by vehicles.

• To understand traffic behavior, this model can be run at different vehicle densities.
• To quantify the traffic flow, we measure the mean speed of the vehicles, i.e., the fraction of vehicles that move at each time step.
• Figure 14.9 shows the plot of the mean speed of the vehicles as a function of the vehicle density ρ. The plot was obtained by running the CA simulation and averaging over the part of the space-time diagram where the traffic flow has stabilized.
• The plot reveals that a qualitative change of behavior occurs at ρ = 0.5. Below this threshold the traffic moves freely, and above it the traffic is congested. In the language of dynamical systems there is a phase transition between the two regimes at the critical density ρ = 0.5 (Fuks 1997; Maerivoet and De Moor 2005).

• Figure 14.9 Examples of mean speed v of the vehicles as a function of the vehicle density ρ in the elementary traffic CA. The points shown in the plot are determined by running the CA for 1000 time steps with a randomly generated initial distribution of vehicles and considering only the last 500 time steps in order to let the initial transient die out. The plot shows that at the critical 50% vehicle density there is a transition between two different kinds of traffic flow.
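A minimal sketch of this traffic CA (rule 184 with periodic boundaries) and its mean-speed measurement; the road length, step counts, and transient cutoff are illustrative choices:

```python
import random

# Elementary traffic CA: a car advances when the cell ahead is empty.
# Mean speed = fraction of cars that moved per step, after a transient.
def step(road):
    n = len(road)
    return [1 if (road[i] == 1 and road[(i + 1) % n] == 1) or
                 (road[i - 1] == 1 and road[i] == 0) else 0
            for i in range(n)]

def mean_speed(density, n=100, steps=200):
    road = [1 if random.random() < density else 0 for _ in range(n)]
    cars = sum(road)
    if cars == 0:
        return 1.0            # empty road: every (non-existent) car is free
    moved = 0
    for t in range(steps):
        nxt = step(road)
        if t >= steps // 2:   # skip the initial transient
            moved += sum(1 for i in range(n)
                         if road[i] == 1 and nxt[i] == 0)
        road = nxt
    return moved / (cars * (steps - steps // 2))

random.seed(0)
# Below the critical density traffic flows freely; above it, it jams.
assert mean_speed(0.2) > mean_speed(0.8)
```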

These simulation results agree with the analytical expression for the mean speed that has been derived for this cellular model (Fuks 1997).

What can be learned from the example above:

• by defining just the local properties of the model we have revealed moving traffic “holes” and jams, and the existence of two distinct regimes of traffic. This is one of the fundamental characterizing properties of cellular modeling which makes them an ideal tool to study how simple local rules can produce complex global behaviors.


16-11

• In this respect, since the state set of a CA is finite, CA models have the advantage of being implementable exactly on computer, without worrying about numerical approximations and error propagation. This means that what is observed in the simulation is an actual property of the model implemented and not a numerical artifact.

• In CA modeling it is often easier to implement a microscopic model of a phenomenon than a macroscopic one.
• A microscopic model represents the elementary components and their behavior directly. In such a model, boundary behavior can be expressed by simple local rules, something that is typically hard to formulate in a macroscopic model such as a set of differential equations; figure 14.10 shows an example, the interaction of a fluid with a rough wall.
• Macroscopic simulation, on the other hand, requires the numerical solution of the model equations, and recovering macroscopic quantities from a microscopic CA model requires averaging over many cells of the space-time diagram.
• A drawback of microscopic simulation is that it may include details that have no real influence on the collective behavior of interest. For instance, modeling the behavior of individual drivers in detail does not help answer the traffic-flow question and merely slows down the simulation.
• One must consider that most present-day computers are designed for efficiency in the complex manipulations of relatively few entities rather than simple manipulations of large numbers of entities. Computers specially designed to handle cellular models do exist, but occupy a small niche of the computer market (Toffoli and Margolus 1987; Talia 2000).
• Choosing the appropriate kind of model thus depends on whether the collective behavior or the detailed individual behavior is of interest.
• Finding the right level of description (in terms of the model's granularity) is not only a matter of theory; it also requires experimenting with the actual system under study.

16-12

Figure 14.10 The interaction of a fluid with a fixed wall. In the CA model represented schematically in this figure the black disks with an arrow represent moving particles and the black disks on a dark background represent fixed particles that constitute the wall. The collisions of the moving particles with those belonging to the wall slow down the collective motion of the particles and produce a result that appears as friction once the motion of the particles is averaged over large enough portions of the space-time diagram. The amount of friction is specified directly in terms of the roughness of the wall by choosing the frequency and depth of the pits. No special boundary conditions must be specified in the CA model besides the rules defining the result of the interaction between a moving particle and a fixed particle.

15-4 Some Classic Cellular Automata

Among the many kinds of cellular systems, the one-dimensional elementary CA and the two-dimensional Game of Life CA are particularly interesting; both are briefly described here.

First example: Elementary CAs

In a binary one-dimensional CA with a neighborhood radius of 1, the next state of each cell depends on its own state and on the states of its two immediate neighbors, so there are 2^3 = 8 possible neighborhood configurations, as shown in the figure. The transition table therefore has eight entries, one for each combination of the current states of the cell and its neighbors, and the eight next-state bits can be read as a binary number.

Figure 14.11 a) The mechanism that associatesWolfram’s rule code to the elementary CA. b) The conventional graphical representation of the transition table of elementary CAs: cells in the s = 1 state are represented as dark squares and cells in the s = 0 state are represented as white squares.

• For these CAs, Wolfram (1983) introduced a numbering scheme known as Wolfram's rule code; the traffic rule used above is R184 in this scheme. The code is obtained by reading the eight next-state bits of the transition table as a binary number.

• The transition table of an elementary CA is often represented with the stylized diagram shown in figure 14.11b).

• With this numbering there are 256 distinct rules; the corresponding CAs are called elementary CAs. These simple CAs are interesting for two reasons:



• First, despite their extreme simplicity, elementary CAs can model real systems; for example, the traffic model of section 14-7 can be built with rule R184. Compare the rule of the traffic example with the rule shown in figure 14-11.
• Second, there are only 256 of them, so all of them can be studied exhaustively.
• Figure 14-12 shows the space-time diagrams of the rules R30 and R110. Note that the diagrams were obtained starting from random initial states.

Figure 14.12 Examples of space-time diagrams for the elementary CA with rule codes 30 (left) and 110 (right). Both diagrams were obtained with a randomly generated initial state with about 50% of cells in the state s = 1, and periodic boundary conditions. Cells in the s = 1 state are represented as black squares and cells in the s = 0 state are represented as white squares.

The “Wolfram rule 30 CA” pattern is very similar to the patterns of some seashells. These seashells exhibit a natural variant of a CA.
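The rule-code lookup described above can be sketched in a few lines of Python (an illustrative sketch, not code from the notes): bit k of the rule code gives the next state of a cell whose three-cell neighborhood encodes the number k.

```python
# Minimal sketch of an elementary CA driven by its Wolfram rule code,
# on a cyclic (periodic-boundary) lattice.

def elementary_ca_step(cells, rule_code):
    """Apply one synchronous update of an elementary CA with periodic boundaries."""
    n = len(cells)
    nxt = []
    for i in range(n):
        left, center, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        neighborhood = (left << 2) | (center << 1) | right  # value 0..7
        nxt.append((rule_code >> neighborhood) & 1)         # bit of the rule code
    return nxt

def run(rule_code, cells, steps):
    history = [cells]
    for _ in range(steps):
        cells = elementary_ca_step(cells, rule_code)
        history.append(cells)
    return history

# Rule 30 from a single live cell: the familiar triangular, chaotic pattern.
width = 31
initial = [0] * width
initial[width // 2] = 1
for row in run(30, initial, 10):
    print(''.join('#' if s else '.' for s in row))
```

Printing each generation as a row of characters reproduces a small text-mode version of the space-time diagrams of figure 14.12.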


• Wolfram used computer simulations to explore the behavior of a large number of one-dimensional CAs. In particular, he explored exhaustively the class of one-dimensional elementary CAs for a large ensemble of initial conditions and noticed the following four classes.

• Class I CAs are those that for almost all initial conditions evolve in a finite number of time steps to a uniform state over all the cellular space.

• Class II CAs are those that for almost all initial conditions and after a short transient either produce a stable nonuniform structure in cellular space, or start to cycle over a small set of simple structures.

• Class III CAs are those that for almost all initial conditions produce random-like “chaotic” sequences of states that result in fractal-looking patterns in the space-time diagram.

• These first three CA classes correspond to three kinds of asymptotic behavior observed in continuous dynamical systems, namely limit points, limit cycles, and chaotic attractors. Wolfram observed, however, a fourth class of CA behavior that has no correspondence in the theory of continuous dynamical systems.

• Class IV CAs are characterized by long-lived localized structures that can propagate on a crystal-like background that covers the cellular space (see also figure 2.29). Wolfram conjectured that class IV CAs are capable of universal computation and that thus, in general, for the determination of their long-term behavior there is no shortcut to the explicit simulation.

15-5 Second Example: Conway’s Life Game and Artificial Life

• The roots of this CA lie in von Neumann’s work on the problem of machine self-reproduction.
• In the late 1940s von Neumann was interested in proving formally that there exist machines which can produce machines more complex than themselves (von Neumann 1966; McMullin 2000).

• Von Neumann observed that biological organisms routinely build organisms as complex as themselves, whereas every machine built by a machine seemed to be simpler than its builder. He eventually captured the notion of self-reproduction in the framework of a CA: he designed a self-reproducing cellular automaton, although the automaton required a very large number of cells and states.



• Von Neumann’s investigation can thus be considered as the first formal study of the fundamental principles of life, which later became the subject of study of artificial life (Langton 1996).

• With his Life Game, Conway found a setting in which the question of self-reproduction posed by von Neumann could be pursued experimentally. In the early 1970s, John Conway devised a CA rule that was popularized in Martin Gardner’s columns in Scientific American (Gardner 1970, 1971) and attracted an enormous following. Conway was looking for a CA rule whose long-term behavior would be hard to predict. After about two years of work he proposed the following rule for a two-state, two-dimensional CA:

Any live cell with fewer than two live neighbours dies of loneliness.
Any live cell with more than three live neighbours dies of overcrowding.
Any live cell with two or three live neighbours lives on.
Any dead cell with exactly three live neighbours comes to life.

1. A cell that is in the state s = 0 at time t switches to the state s = 1 at time t + 1 only if exactly three of its eight outer neighbors are in the state s = 1 at time t.
2. A cell that is in the state s = 1 at time t remains in this state at time t + 1 only if two or three of its eight outer neighbors are in the state s = 1 at time t.

Conway gave a snappier description of the rule by calling the cells in the state s = 0 dead cells, and calling live cells those in the state s = 1. The CA rule can then be rephrased as

3. Birth rule. A dead cell becomes a live cell only if exactly three of its eight outer neighbors are live cells.

4. Survival rule. A live cell remains a live cell only if two or three of its eight outer neighbors are live cells. A live cell dies by isolation if it has fewer than two live neighbors, and dies by overcrowding if it has more than three live neighbors.
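The birth and survival rules above can be sketched directly in Python (an illustrative sketch, not code from the notes), using a toroidal grid so that every cell has exactly eight outer neighbors:

```python
# A minimal implementation of one synchronous update of the Life CA
# (birth rule and survival rule) on a periodic, toroidal grid.

def life_step(grid):
    """grid: list of lists of 0/1 cell states. Returns the next generation."""
    rows, cols = len(grid), len(grid[0])
    nxt = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count the eight outer (Moore) neighbors with periodic boundaries.
            live = sum(grid[(r + dr) % rows][(c + dc) % cols]
                       for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                       if (dr, dc) != (0, 0))
            if grid[r][c] == 1:
                nxt[r][c] = 1 if live in (2, 3) else 0   # survival rule
            else:
                nxt[r][c] = 1 if live == 3 else 0        # birth rule
    return nxt

# The blinker oscillator has period two: a row of three flips to a column.
blinker = [[0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0],
           [0, 1, 1, 1, 0],
           [0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0]]
assert life_step(life_step(blinker)) == blinker
```

The blinker check illustrates the oscillators discussed below: applying the rule twice returns the configuration to its starting state.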

• The rule above, together with the update algorithm, is known as The Game of Life, and the corresponding CA is called the Life CA.
• Running this algorithm from a random initial configuration typically produces three kinds of persistent objects: 1) static (still) configurations, 2) oscillators, and 3) moving objects.

1. Some correspond to stable configurations that remain unchanged from one time step to the next. Conway and his coworkers called these objects still-life configurations.

Figure 14-13 shows three of the static shapes that commonly appear: the block, the pond, and the beehive.


Figure 14.13 Three static objects in Conway’s Life Game: (left) the block, (center) the pond, and (right) the beehive. Live cells are represented in dark gray. The figure shows also how a configuration at time step t can be generated by different configurations at time step t − 1. In this case, a beehive can be generated by another beehive or by two adjacent rows of live cells.

2. The second kind of common Life objects are oscillators, or life cycles. These are configurations that repeat themselves with a period greater than one time step.

Figure 14-14 shows the blinker, which has period 2, and the clock II, which has period 4 (Poundstone 1985).

Figure 14.14 Two oscillators in Conway’s Life Game: (left) the blinker, which has period two, and (right) the clock II with period four.

3. A third kind of common configurations in Life are moving objects. Figure 14.15 shows the simplest and most interesting of them: the glider.



Figure 14.15 Left: The glider is the most common moving object in Conway’s Life Game. Right: An eater can annihilate a glider and repair itself in four time steps.

• After four time steps the glider reproduces its shape displaced by one cell along a diagonal; the direction of motion depends on the orientation of the initial configuration.
• The existence of moving objects like the gliders suggested to Conway that Life could be interpreted as a synthetic universe where it is possible to send signals between places.

• This suggested that more complex self-reproducing machines, able to process information, could be built in Life.
• This investigation was encouraged by the discovery of the glider gun, a configuration designed by R.W. Gosper that is able to produce a new glider every 30 time steps (figure 14.16).

Figure 14.16 A glider gun generates a glider every 30 steps. Note that besides producing the glider the gun regenerates its initial configuration (the two spurious live cells on the right will die of isolation at time step 31).


• Another interesting object found by Conway’s collaborators is the eater, a configuration that can absorb and annihilate a glider and repair itself within four time steps (figure 14-15, right).

• Later we will explain how these structures were used by Conway to define within Life structures such as computers and self-reproducing automata.

• Conway pursued two goals in his experiments:
o initial configurations that settle into recurring structures. Example: oscillators, configurations that repeat themselves with a fixed period.
o an initial configuration whose population grows without bound. Example: the glider gun.

• Another interesting question is whether the evolution of a CA is reversible.
o A cellular automaton is reversible if for every current state of the CA there is exactly one past state (preimage).
o Cellular automata that are not reversible have patterns for which there are no previous states. These patterns are called ‘Garden of Eden patterns’.
o For 1D CAs there exist algorithms that can find preimages, and decide whether a rule is reversible or irreversible.
o For CAs of two or more dimensions it has been proved that reversibility is undecidable for arbitrary rules.
• Finally, note that being based on an outer totalistic rule, the Life CA preserves the symmetry of the configurations. Thus, any asymmetry existing in the configuration of states at a certain time step is a consequence of an asymmetry in the initial conditions.
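On a finite cyclic lattice, injectivity of the global map can be tested by brute force (a sketch, not from the notes; note this finite test only approximates reversibility on the infinite lattice). Rule 204 is the identity rule, hence injective; rule 30 maps both the all-0 and the all-1 configurations to all-0, so some configurations have no preimage:

```python
# Brute-force injectivity test for elementary CA rules on a cyclic lattice.
from itertools import product

def step(cells, rule_code):
    """One synchronous update of an elementary CA on a cyclic lattice."""
    n = len(cells)
    return tuple((rule_code >> ((cells[(i - 1) % n] << 2)
                                | (cells[i] << 1)
                                | cells[(i + 1) % n])) & 1
                 for i in range(n))

def is_injective(rule_code, n):
    """True if the global map on the size-n cyclic lattice is a bijection."""
    images = {step(cells, rule_code) for cells in product((0, 1), repeat=n)}
    return len(images) == 2 ** n   # injective iff no two configs share an image

print(is_injective(204, 8))  # True: rule 204 is the identity map
print(is_injective(30, 8))   # False: rule 30 loses information
```

Configurations missing from the image set are exactly the finite-lattice analog of Garden of Eden patterns.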

Applications
• CA can be used to model phenomena such as the following:

• •Biological processes • •Social movement • •Cancer cells growth • •Forest fires

15-6 Exercises
1. Describe the components of a CA.
2. What is meant by R128?



3. What is a … CA?
4. Write the rules needed for artificial life.

15-7 Advanced Topics
The classic cellular automata described so far have the following defining properties:

a discrete time variable, a finite state set, a neighborhood that is homogeneous in both space and time, a transition function that is deterministic and homogeneous in space and time, and updates all its cells synchronously.

Relaxing any one of these properties yields a different kind of cellular system. In the general case, the state update of each cell can be written in the form below.

x(k+1) = f( x(k) + G( y(k) + u(k) ) )

where x is the state of the cell, y the outputs received from the neighboring cells, u the external input, and f the update function; G is the parameter vector of the system. For example, in the linear system

x(k+1) = sum_i a_i y_i(k) + sum_j b_j u_j(k) + z

the parameter vector is G = [a0 … an, b0 … bm, z]. A number of cellular systems obtained by varying these ingredients are introduced below.
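Since the general formula is only sketched in the notes, the linear case can be illustrated with a hypothetical Python fragment (the function name and the particular coefficients are assumptions for illustration):

```python
# Hypothetical sketch of a linear cell update parameterized by
# G = (a-coefficients, b-coefficients, z): the next state is a weighted sum
# of the neighborhood outputs y, the external inputs u, and a bias term z.

def linear_cell_update(y, u, G):
    """y: neighborhood outputs, u: external inputs, G: (a, b, z) parameters."""
    a, b, z = G
    assert len(a) == len(y) and len(b) == len(u)
    return (sum(ai * yi for ai, yi in zip(a, y))
            + sum(bj * uj for bj, uj in zip(b, u))
            + z)

# Example: a weighted average of three neighbors plus one input term.
G = ([0.5, 0.25, 0.25], [1.0], 0.0)
print(linear_cell_update([1.0, 0.0, 1.0], [0.5], G))  # 1.25
```

The whole cellular system is then obtained by applying this update synchronously to every cell, with G shared by all cells in the homogeneous case.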

Nonhomogeneous CA Asynchronous CA Probabilistic CA Particle CA

One of the most common applications of particle CAs is their use in simulating the motion of fluids.

Coupled Map Lattices

Cellular Neural Networks
Cellular Systems with Multiple Cellular Spaces

Sometimes it is useful to consider systems obtained combining several interacting cellular systems. A first example is the multilayered CA (Bandini and Mauri 1999).


Figure 14.22 A schematic representation of a multilayered cellular system. A cell at level i > 0 in the hierarchy of layers corresponds to a whole CA at the next lower level.

Computation
One of the applications of CA is using them for computation: the input is encoded in the initial configuration of the cells, and the output is read from the final configuration reached by the system.

A cellular computer can be implemented in hardware as an array of analog or digital processors and has a number of advantages over a conventional computer, both in the realization of the hardware and in performance.

Complex SystemsMany systems of interest in the physical and biological sciences are composed of many simple units that interact nonlinearly. Study has revealed that at the global level these systems can display behaviors and phenomena that look very complicated despite the simplicity of their components and interactions. For this reason systems with many nonlinearly interacting units are called complex systems (of course, systems with few interacting elements can also show complex behavior). Following Crutchfield (2003) we could call it structural complexity. When the elements that form a complex system interact only locally, cellular systems are often the tool of choice for modeling. Earlier in this chapter we have already met some examples of cellular models for complex systems.



17

Fractal

The word Fractal was introduced by B. Mandelbrot in the 1970s. The term fractal is derived from the Latin adjective fractus. The corresponding Latin verb frangere means ‘to break’, to create irregular fragments. In addition to ‘fragmented’, fractus should also mean ‘irregular’, both meanings being preserved in fragment.

Classical (Euclidean) geometry, which we learn about in school, deals with smooth, regular shapes such as lines and circles. Many natural objects, however, are irregular and fragmented, and fractal geometry offers a new way of looking at them. The key point is that fractal geometry describes complicated natural shapes that classical geometry cannot capture. One of the properties on which fractal geometry is built is self-similarity.

Fractal patterns can be seen in trees, mountains, clouds, coastlines, and many other natural objects. If we assume that these irregular objects have a fractal structure, then simple recursive rules are enough to describe and regenerate them.

15-1 Iterated Function Systems
One way of generating fractal images is to repeatedly apply a fixed set of transformations; such a set is called an Iterated Function System (IFS).


Familiar symmetries of patterns include translational, reflectional, and rotational symmetry. Fractals exhibit a different kind of symmetry: symmetry under magnification, in which magnified pieces of the object resemble smaller copies of the whole. The examples below illustrate this for fractal patterns.

Self-similarity: repeating a basic shape at different scales on an initial pattern produces a composite object whose differently scaled components all reproduce the same pattern.

More examples of self-similarity: many objects found in nature can be viewed as self-similar compositions generated by the same repeated process.

Initiators and Generators: a reliable method of producing self-similarity is to choose an initial pattern (the initiator) and a rule (the generator) that replaces each copy of the initiator with a more detailed pattern. Repeating this process yields self-similar shapes. It is also the oldest method, dating back 5000 years to south India.

Geometry of plane transformations: rotations and other plane transformations provide yet another way of generating self-similar patterns.

Inverse problems: finding the fractal formula (the IFS) that generates a given fractal, starting from the fractal itself.

Random algorithm: in an IFS, the Deterministic Algorithm generates the fractal by repeatedly applying all the transformations, starting from an initial pattern. In the random algorithm the fractal system instead selects the transformations probabilistically and applies them to a single moving point.
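The random algorithm (often called the chaos game) can be sketched as follows (an illustrative sketch, not from the notes; the three maps used here are the standard Sierpinski-triangle contractions, chosen with equal probability):

```python
# The chaos game: apply randomly chosen IFS contractions to a single point.
import random

def chaos_game(n_points, seed=0):
    rng = random.Random(seed)
    # Fixed points of the three Sierpinski contractions (triangle vertices).
    vertices = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]
    x, y = 0.2, 0.2                        # arbitrary starting point
    points = []
    for _ in range(n_points):
        vx, vy = rng.choice(vertices)      # pick one contraction at random
        x, y = (x + vx) / 2, (y + vy) / 2  # move halfway toward its fixed point
        points.append((x, y))
    return points

pts = chaos_game(10000)
# After a short transient, the points approximate the Sierpinski triangle
# and all lie inside the unit square that contains it.
assert all(0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 for x, y in pts)
```

Scattering the returned points in the plane renders the attractor of the IFS; weighting the random choice gives the probabilistic variant mentioned above.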



H. Driven IFS. A variation on the Random Algorithm to test for patterns in data. We investigate patterns in mathematical sequences, DNA sequences, financial data, and texts.

I. Fractals in architecture African, Indian, and European. Repetition across several scales is a theme common to many cultures, developed independently so far as we can tell.

15-2 Fractal Dimension
Dimension is an important characteristic of a fractal: it measures the complexity of the object. For example, how can fractal images such as the ones below be distinguished from one another?

The entries below introduce these notions of dimension with examples.

B. Box-counting dimension. Box-counting dimension extends the notion of dimension to fractals. Arguing by analogy with Euclidean dimension, we develop an algorithm for determining this dimension.

C. Similarity dimension. Similarity dimension is a simplified method of computing dimensions for self-similar fractals with all pieces scaled by the same factor. This dimension gives a clear indication of the relation between dimension and complexity.

D. The Moran formula. The Moran formula extends the similarity dimension formula to self-similar fractals with different scaling factors.

F. Area-Perimeter Relations. Fractal curves that enclose regions in the plane can reveal their dimensions by a subtle relation between area and perimeter.


G. Some Algebra of Dimensions. When we build fractals from other fractals, how is the dimension of the whole related to the dimensions of the pieces?
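The similarity dimension and the Moran formula can be computed directly (a sketch, not from the notes; the bisection solver is an assumed implementation choice). For N copies all scaled by the factor r the dimension is d = log N / log(1/r); for unequal factors r_i, d solves sum(r_i ** d) = 1:

```python
# Similarity dimension and a bisection solver for the Moran equation.
import math

def similarity_dimension(n_copies, ratio):
    """d = log N / log(1/r) for N copies all scaled by the same factor r."""
    return math.log(n_copies) / math.log(1 / ratio)

def moran_dimension(ratios, lo=0.0, hi=10.0, tol=1e-12):
    """Solve sum(r_i ** d) = 1 by bisection (the sum is decreasing in d)."""
    f = lambda d: sum(r ** d for r in ratios) - 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
    return (lo + hi) / 2

# Sierpinski gasket: 3 copies at scale 1/2 -> d = log 3 / log 2 ≈ 1.585.
print(similarity_dimension(3, 0.5))
# With equal ratios the Moran equation reduces to the similarity dimension.
print(moran_dimension([0.5, 0.5, 0.5]))
```

A dimension strictly between 1 and 2, as here, is exactly what distinguishes such curves from ordinary lines and filled regions.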

15-3 The Mandelbrot Set and Julia Sets
The remarkably intricate Mandelbrot and Julia sets are generated by iterating very simple rules.

To generate a Julia set, a complex number c and a starting point z0 are first chosen, and then the following iteration is carried out:

z1 = z0^2 + c
z2 = z1^2 + c
z3 = z2^2 + c

and in general zn+1 = zn^2 + c.

Julia Sets. For a complex number c, the filled-in Julia set of c is the set of all z for which the iteration z → z^2 + c does not diverge to infinity. The Julia set is the boundary of the filled-in Julia set. For almost all c, these sets are fractals.

Finite resolution Although mathematically we can speak of all complex numbers z0 in the plane, in practice we consider only some portion of the complex plane (the window in which the picture is rendered). Because on a computer monitor it is represented as an array of pixels, in this window we take only a finite collection of z0. Pictures cannot be resolved at a level smaller than a pixel, so for each pixel we take one point z0, usually the center of the pixel. A 200 × 200 window requires testing 2002 = 40,000 pixels Run Away to Infinity



It is not difficult to prove that if some member zj of the sequence is farther than 2 from the origin, then the distance between the origin and later members of the sequence will grow without bound.

Here are some iterates of zn+1 = zn^2 + c for c = -0.25 + 0.25i, starting with z0 = 0.5 + 0.7i. The points z0 through z4 are shown, with later iterates a brighter red. Note that z4 is outside the circle of radius 2, so later zi should run farther away from the origin. A few more iterates will illustrate this.

Even if we check that |z| remains less than 2 for the first 1000 iterations, there is no guarantee that the orbit will not escape later. In practice, therefore, the pictures depend on the maximum number of iterations N that we are willing to compute, as the renderings below illustrate.

For example, here are two renderings of Kc (the points painted black) for the same c. On the left, N = 10; on the right, N = 50.

Generally, the larger N, the fewer mistakes we make. On the other hand, the larger N, the more computer time needed to generate the picture.
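The escape-time test described above can be sketched in a few lines (an illustrative sketch, not from the notes), using the numbers from the example: c = -0.25 + 0.25i and z0 = 0.5 + 0.7i.

```python
# Escape-time test: iterate z -> z**2 + c and report the first step at which
# |z| exceeds 2, or None if the orbit stays bounded for max_iter steps.

def escape_time(z0, c, max_iter=100):
    z = z0
    for k in range(1, max_iter + 1):
        z = z * z + c
        if abs(z) > 2:
            return k      # escaped: z0 lies outside the filled-in Julia set
    return None           # presumed inside (up to the resolution set by N)

c = -0.25 + 0.25j
print(escape_time(0.5 + 0.7j, c))   # 4: as in the text, z4 leaves the radius-2 circle
print(escape_time(0.0 + 0.0j, c))   # None: this orbit stays bounded
```

Mapping `escape_time` over a grid of z0 values, one per pixel, produces the filled-in Julia set pictures; the returned step number is what drives the coloring scheme below.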

15-4 Coloring the pictures
1. Points z0 whose orbit stays within distance 2 of the origin for all N iterations are painted black.
2. Points z0 whose orbit escapes are painted a color determined by the first iteration at which the distance from the origin exceeds 2:

if z1 is farther than 2 from the origin then paint white
if zk is farther than 2 from the origin then paint ....


If N is taken too small the picture is inaccurate near the boundary; if N is taken too large, the picture takes a long time to compute. A practical approach is to start with a small N for a quick rough picture and then increase N to refine the details.

The Mandelbrot set. The Mandelbrot set is the set of all c for which the iteration z z2 + c, starting from z = 0, does not diverge to infinity. Julia sets are either connected (one piece) or a dust of infinitely many points. The Mandelbrot set is those c for which the Julia set is connected.

The Mandelbrot set is defined by the same iteration used to define Julia sets, but applied in a different fashion.

The analog of the Mandelbrot set can be defined for the iteration z → z^n + c, for any integer n > 2. Here is an illustration of the effect of the maximum number of iterations on drawing the Mandelbrot set. One of the early surprises of the Mandelbrot set is that its periphery is filled with a halo of tiny copies of the entire set, each of which is surrounded by its own halo of still tinier copies, and so on, on smaller and smaller scales, without end.

Despite appearances, these small copies are attached to the main body of the set, through a sequence of still smaller copies. That is, the Mandelbrot set is connected.

The large filled-in components of the Mandelbrot set correspond to stable cycles.
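The Mandelbrot membership test is the same iteration as for Julia sets, but started from z = 0 while c varies (a sketch, not from the notes):

```python
# Mandelbrot membership: iterate z -> z**2 + c from z = 0 and vary c.

def in_mandelbrot(c, max_iter=200):
    z = 0
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2:
            return False  # orbit escapes: c is outside the set
    return True           # presumed member (up to the max_iter resolution)

print(in_mandelbrot(0))    # True: the orbit stays fixed at 0
print(in_mandelbrot(-1))   # True: the orbit cycles 0, -1, 0, -1, ... (a stable cycle)
print(in_mandelbrot(0.5))  # False: the orbit runs away to infinity
```

The case c = -1 illustrates the remark above: it sits in one of the large filled-in components, and its orbit settles into a stable cycle of period 2.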



D. Combinatorics of the Mandelbrot Set. Associated with each disc and cardioid of the Mandelbrot set is a cycle. There are simple rules relating the cycle of a feature to those of nearby features. From this we can build a map of the Mandelbrot set.

E. Some features of the Mandelbrot set boundary. The boundary of the Mandelbrot set contains infinitely many copies of the Mandelbrot set. In fact, as close as you look to any boundary point, you will find infinitely many little Mandelbrots. The boundary is so "fuzzy" that it is 2-dimensional. Also, the boundary is filled with points where a little bit of the Mandelbrot set looks like a little bit of the Julia set at that point.

F. Scalings in the Mandelbrot Set. The Mandelbrot set includes infinitely many smaller copies of itself. These can be organized into hierarchical sequences for which the ratio of the sizes of successive copies approaches a limiting value. Some of these give the Feigenbaum constant associated with the logistic map, others give new constants. Some give integer limits.

G. Complex Newton's Method. Julia sets related to finding the roots of equations. Similar features arise in magnetic pendula and in light reflected within a pyramid of shiny spheres.

H. Universality of the Mandelbrot Set. Newton's method for a family of cubic polynomials revealed more copies of the Mandelbrot set. Yet Newton's method is nothing like z → z^2 + c. Further investigation shows we're surrounded by Mandelbrot sets.

I. Here is Ray Girvan's page on the Mandelbrot Monk. Was the Mandelbrot set discovered in the 13th century? Read the page carefully.

J. Fractals in Literature. Not only are fractals present in the structure of literature, sometimes they are the subject of literature.


K. Fractals in Art. Because it exhibits a balance of familiarity and novelty, the Mandelbrot set is more interesting than the Sierpinski gasket. This aesthetic maxim is familiar in art.

4. Cellular Automata and Fractal Evolution, or how to build a world in a computer. These simple worlds can generate fractals, and exhibit wonderfully complicated dynamics. The biological paradigm can be extended to evolve populations of computer programs, and we are led, perhaps, to fractal aspects of evolution.

15-5 Cellular Automata and Fractal Evolution

Cellular automata (CA) are a convenient setting for exploring genetic algorithms, a powerful computer science application of a major biological paradigm. Some of the patterns that appear here also arise in music and in history, pointing to fractal aspects in those fields.

D. Genetic Algorithms and Artificial Evolution A message from biology to computer science. Applying the biological paradigms of crossover and mutation to evolve smart programs from a population of dumb programs.

E. Fractal Fitness Landscapes A message from fractal geometry to evolution. A map of the fitness of genotypes reveals not only multiple peaks, but a fractal distribution of peaks. This has implications for evolutionary strategies.

F. 1/f Noise Scalings in CA, evolution, and elsewhere. This scaling, that the size of events falls off as roughly the reciprocal of the frequency of the events, is present in a wide variety of settings, yet the source of this behavior remains unclear.



H. Fractals in History. There is some evidence that the distribution of conflicts, and of other major events, exhibits 1/f scaling. But perhaps other hierarchical structures are present in the fabric of history. Is Hegel's dialectic related to Leibniz's monads?

J. How the Leopard Gets Its Spots. A biological expression of CA principles appears to be responsible for the formation of the leopard's spots. The same mechanism may also explain the tiger's stripes.

5. Random Fractals and the Stock Market extends the geometrical fractals studied so far to fractals involving some elements of randomness. After examples from biology, physics, and astronomy, we apply these ideas to the stock market. Do we uncover useful information? Wait and see.

6. Chaos is a type of dynamical behavior most commonly characterized by sensitivity to initial conditions: tiny changes can grow to huge effects. Inevitable uncertainties in our knowledge of the initial conditions grow to overwhelm long-term prediction. Yet we shall see chaos has engineering and medical applications.

15-6 L-systems
In 1968 Aristid Lindenmayer (1968a,b) introduced a formalism for modeling the development of biological organisms. The formalism is based on rewriting systems and is known as the L-system.

An L-system is a rewriting system that operates on strings of symbols. The system is defined by assigning

o an alphabet A of symbols,
o an initial string of symbols ω called the axiom,
o and a set P = {pi} of rewriting or production rules that specify how each alphabet symbol is replaced by a string of symbols at each rewriting step.

The alphabet A is a finite set of symbols. In this example the symbols represent the states of the cells: a cell in state g is growing and is not yet ready to divide, while a cell in state d is ready to divide. The process starts from the axiom, and the production rules are applied repeatedly. When a cell divides it produces two cells, one of type d and one of type g; the letters r and l specify the polarity of the resulting cells. The production rules of this system are shown in the figure below.


Example 1: An L-system can also be given a graphical interpretation, in which the string produced by the system is drawn by a turtle. The symbols of this L-system are interpreted as the following turtle commands.

F  Move forward by a step while drawing a line.
f  Move forward by a step without drawing a line.
+  Turn left by an angle δ.
-  Turn right by an angle δ.
[  Save the current state of the turtle, that is, the turtle position and orientation.
]  Restore the state of the turtle using the last saved state.

Iterating this L-system produces a fractal shape; the curve generated by the system shown here is known as the Koch curve. The figure shows the result of applying the turtle interpretation to the generated string.
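The rewriting step itself can be sketched generically (an illustrative sketch, not from the notes; the productions shown are the standard Koch-curve rules F → F+F--F+F with δ = 60°, which may differ from the exact system in the figure):

```python
# Generic L-system rewriting: apply the production rules in parallel to
# every symbol of the string, a fixed number of times.

def lsystem(axiom, rules, steps):
    """rules: dict mapping a symbol to its replacement string."""
    s = axiom
    for _ in range(steps):
        # Symbols with no production rule are copied unchanged (identity rule).
        s = ''.join(rules.get(ch, ch) for ch in s)
    return s

koch_rules = {'F': 'F+F--F+F'}
print(lsystem('F', koch_rules, 1))  # F+F--F+F
print(lsystem('F', koch_rules, 2))  # F+F--F+F+F+F--F+F--F+F--F+F+F+F--F+F
```

Feeding the resulting string to a turtle that executes the commands of the table above draws the corresponding stage of the Koch curve.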



Example 2: Simulating the L-system with the specification below. Besides fractals, this L-system models a plant: L-systems can be used to generate the branching structure of plants. The figure shows the resulting shape obtained with the angle δ = 21°.

15-7 Exercises
1. What is meant by the dimension of a fractal?
2. How is the dimension of a fractal interpreted?
3. Implement and run the two examples above (Examples 1 and 2).
