The art of evolutionary algorithm programming

An invitation to change the way scientific programming is done, via evolutionary computation.

The art of Evolutionary Algorithm programming,

by Sgt. Dr. Juan-Julián Merelo.
EC Headquarters,
U. Granada (Spain)

Thanks for being here. But what I wonder is, why are you here? In a tutorial about programming evolutionary algorithms? Because...

You suck

CC picture from http://www.flickr.com/photos/torsteinsaltvedt/2459322441/in/photostream/

Don't worry

I suck too

This tutorial is about sucking less

At programming evolutionary algorithms, no less!

Or any other, for that matter!

Wonderful drawing from http://www.flickr.com/photos/keeki/4264005006/in/photostream/

And about the zen

Picture from http://www.flickr.com/photos/imsbildarkiv/4962991603/in/photostream/

Zen means totality, and this tutorial is not just about how to write programs, but about the whole process from idea to publishing a paper in a high-impact indexed journal (and everything in between).

So we'll do it in the shape of 28 mantras that should be followed when programming evolutionary algorithms (and probably all metaheuristics).

Why 28? It's more than 10, less than one hundred. And I couldn't come up with any round number of mantras. So let it be 28. Or 29. Maybe I thought of a few more after I wrote this.

And about writing publishable papers and artifacts painlessly through efficient, maintainable, scientific programming

Picture by MarionVoss, http://www.flickr.com/photos/ooocha/2869488840/in/photostream/

1.
Mind your environment

Forge picture from http://www.flickr.com/photos/jamesclay/1396990924/in/photostream/

Tools of the trade

Use real operating systems and real programming languages.

Programmer's editor: kate, emacs, geany, vim [your favorite editor here] + Debugger (gdb, language-specific debugger)

Set up macros, syntax-highlighting and -checking, online debugging.

IDE: Eclipse, NetBeans, Emacs!

Steep learning curve, geared for Java, and too heavy on the resource usage side

But worth the while eventually

You need a real operating system to speed up gathering the tools you need to get started. It needs to be free and gratis, so that you can have all the libraries you need, and designed for programming, not for desktop and office environments.

Programmer's editors speed up coding, letting you check the structure of a program at a glance. Everyone has a favorite, but you need to settle on one, since you'll be much more productive with it. Emacs or vi are probably a good choice, since you can live in them and with them.

NetBeans and Eclipse are free, popular, and good, but they are overkill if you want to write small programs and scripts (which is all you need sometimes).

There are other IDEs for particular languages, worth checking out.

2.
Open source your code and data

Picture: http://www.flickr.com/photos/goldenberg/22276200/in/photostream/

Now it's more important than ever to open source, because research topics are narrower and narrower and the code is the law.

Aw, maaaan!

Open source first, then program

Scientific code should be born free.

Science must be reproducible.

Easier for others to compare with your approach

Increased H

Scientist heaven!

Manifest hidden assumptions.

If you don't share, you don't care!

It's the second, because the best moment to make an open source release is now, before writing the first line of code.

3.
Minimize bugs via
test-driven programming

http://www.flickr.com/photos/wohinauswandern/4146824312/in/photostream/

In this case, it's a bug-colony optimization algorithm... so you really want to maximize bugs. Once you want to program, think about the tests first.

Tests before code

What do you want your code to do?

Mutate a bit string, for instance.

Write the test

Is the result from mutation different from the original? Of course!

But will it still be if you change an upstream function? Or the representation?

Does it change all bits in the same proportion (including first and last corner cases)?

Does it follow (roughly) the mutation rate?

Those are just a few examples. You might think of different tests at different levels for all parts of an evolutionary algorithm.
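The questions on the slide can be turned into tests more or less one by one. What follows is only a sketch, assuming a bit-string chromosome stored as a Python string of '0' and '1'; the `mutate` function and the test names are made up for illustration and are not taken from any library. The random generator is seeded so that the tests are repeatable.

```python
import random


def mutate(bits, rate, rng):
    """Flip each bit of a '0'/'1' string independently with probability `rate`."""
    return "".join(
        ("1" if b == "0" else "0") if rng.random() < rate else b
        for b in bits
    )


def test_length_is_preserved():
    rng = random.Random(42)
    assert len(mutate("0" * 64, 0.1, rng)) == 64


def test_extreme_rates_are_exact():
    rng = random.Random(42)
    assert mutate("0101", 0.0, rng) == "0101"  # rate 0: nothing changes
    assert mutate("0101", 1.0, rng) == "1010"  # rate 1: every bit flips


def test_observed_rate_is_close_to_nominal():
    rng = random.Random(42)
    n, rate = 100_000, 0.05
    flipped = mutate("0" * n, rate, rng).count("1")
    assert abs(flipped / n - rate) < 0.01


def test_first_and_last_bits_do_mutate():
    # Corner cases: the ends of the string must flip as often as the middle.
    rng = random.Random(42)
    trials = 2_000
    first = sum(mutate("0" * 8, 0.5, rng)[0] == "1" for _ in range(trials))
    last = sum(mutate("0" * 8, 0.5, rng)[-1] == "1" for _ in range(trials))
    assert 0.4 < first / trials < 0.6
    assert 0.4 < last / trials < 0.6


if __name__ == "__main__":
    for test in (
        test_length_is_preserved,
        test_extreme_rates_are_exact,
        test_observed_rate_is_close_to_nominal,
        test_first_and_last_bits_do_mutate,
    ):
        test()
    print("all mutation tests passed")
```

The statistical tests use wide tolerances on purpose: a test that fails one run out of twenty teaches you to ignore failures.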

Use testing frameworks

Unit testing tries to catch bugs in the smallest atom of a program

Interface, class, function, decision

Every language has its testing framework

PHPUnit, jUnit, xUnit, DejaGNU, Test::More...

Write tests for failure, not for success

http://www.flickr.com/photos/paulwatson/43393028/in/photostream/

4.
Control the source of your power

http://www.flickr.com/photos/anotherpioneer/5252749975/

Source control systems save the day

Source code management systems allow

Checkpoints

Stigmergic interaction

Individual responsibility over code changes

Branches

Distributed are in: git, mercurial, bazaar

Centralized are out: subversion, cvs.

Instant backup!

Code complete

Check out code/Update code

Make changes

Commit changes (and push to central repository)

Code complete

Check out code/Update code

Make changes

Test your code

Commit changes (and push to central repository)

Go with the Joneses

Use GitHub: http://github.com

Some day, most papers in a conference will include "Fork me on GitHub" notices. Eventually. GitHub is basically a Git repository host, but it's popular and the network effect makes it the place to be; it's like LinkedIn for hackers.

Besides, it has good clients that let even people with no idea of programming use it.

5.
Integrate

Nice picture by Joris Louwes at http://www.flickr.com/photos/jorislouwes/6921257085/

This is basically about continuous integration

Pushing is not the end of the story

Tests must be run, compilations made, checks and balances checked and balanced.

Use Travis or Jenkins

If it's good enough for software developers, it's good enough for scientists!

All this is free if you open source your code

Back to #2

Push is what you do with systems like git, but also bazaar and mercurial. Travis is integrated with GitHub. Most of what I mention here is integrated with GitHub, http://github.com

Travis is at http://travis-ci.org. Other services like drone.io can do a lot, and there are some that are mainly Java based, but you've got lots of services (many of them free) to choose from.

6.
Be language agnostic

http://www.flickr.com/photos/youngrobv/2497132840/

That means that you shouldn't believe that a single language is the revealed Truth to which all others should kneel, and that outside that language lies hell and eternal condemnation.

Language shapes thought

Don't believe the hype:

Compiled languages are faster... NOT

There is no free lunch.

Avoid programming in C in every language you use

Consider scripting languages: Python, Perl, Lua, Ruby, Clojure, JavaScript... interpreted languages are faster.

Well, they are at running stuff... mostly

The old saying goes that Fortran programmers are so good they can write Fortran programs in any language. Fortran is out of fashion, but people now write C programs in any language. Speed will depend on many factors, and every language is geared for particular features.

Language agnosticism at its best

Evolving Regular Expressions for GeneChip Probe Performance Prediction

http://www.springerlink.com/content/j3x8r108x757876w/

The regular expressions are coded in AWK scripts:

Although this may seem complex, gawk (Unix free interpreted pattern scanning and processing language) can handle populations of a million individuals.

Unique and beautiful usage of the English possessive!

Named after Aho, Weinberger and Kernighan, AWK is a data-driven language included in all Unices. Extremely mature, extremely fast, extremely unknown... and also the best tool for this paper.

7.
Programming speed > program speed

Photo: http://www.flickr.com/photos/gnackgnackgnack/3297936548/in/photostream/

Programming speed is more important than program speed. It does not make sense to spend days programming something just to run it in 2 minutes instead of 6. Or even 10.

Scientists, not software engineers

Our deadlines are for papers not for software releases (but we have those, too).

What should be optimized is speed-to-publish.

It makes no sense to spend 90% of your time programming and 5% writing the paper.

Scripting languages rock and minimize time-to-publish.

Perl faster than Java?

Algorithm::Evolutionary, a flexible Perl module for evolutionary computation, http://www.springerlink.com/content/8h025g83j0q68270/

Class-by-class, Perl library much more compact

Less code to write.

More time to write the paper, perform experiments....

In pure EC code, Algorithm::Evolutionary was faster than ECJ.

In fact, a test by Kernighan and Pike over Markov chains, back in '99, showed the same. Things have evolved, but in both languages. Scripting languages like Perl (or Python, or Ruby) require less effort to write even complex applications.

8.
Become an adept of the chosen language

http://www.flickr.com/photos/inoxkrow/1041584485/in/photostream/

This might seem to contradict your agnosticism, but let's say you should be pragmatic, like the Japanese, and believe in three or four languages, and be agnostic about which one to use at a particular time.

Don't teach an old dog new tricks

The first language you learn brands you with fire.

But no two languages are the same.

Every language includes very efficient solutions

Regular expressions.

Symbolic processing.

Graphics.

Efficiency/knowledge balance.

Clashes with #2? Not really.

There is a balance between efficiency in programming and efficiency when doing a task. There's a power law: you can be best at one, maybe three languages, but you should at least know the potential of several others. Or at least know enough to tell a graduate student where she should look.

9.

Don't assume: measure

Photo from http://www.flickr.com/photos/marcelgermain/2071204651/in/photostream/

Performance matters

Basic measure: CPU time as measured by time

jmerelo@penny:~/proyectos/CPAN/Algorithm-Evolutionary/benchmarks$ time perl onemax.pl
0; time: 0.003274
1; time: 0.005438
[...]
498; time: 1.006539
499; time: 1.00884
500; time: 1.010817

real 0m1.349s
user 0m1.140s
sys 0m0.050s

Don't ask me how you can do that in Windows, I told you already you should use a real operating system for development.
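The same basic measure can also be taken from inside the program. This is a sketch in Python rather than the Perl benchmark shown above: `run` is a made-up stand-in (plain random search on OneMax, not the author's onemax.pl), and the only real API it relies on is `time.process_time` for CPU time and `time.perf_counter` for wall-clock time.

```python
import random
import time


def onemax(bits):
    """OneMax fitness: number of '1' characters in the bit string."""
    return bits.count("1")


def run(generations=500, length=128, seed=1):
    """Random search on OneMax; returns (best fitness, CPU seconds, wall seconds)."""
    rng = random.Random(seed)
    best = 0
    start_cpu = time.process_time()   # CPU time used by this process
    start_wall = time.perf_counter()  # elapsed real time
    for _ in range(generations):
        candidate = "".join(rng.choice("01") for _ in range(length))
        best = max(best, onemax(candidate))
    cpu = time.process_time() - start_cpu
    wall = time.perf_counter() - start_wall
    return best, cpu, wall


if __name__ == "__main__":
    best, cpu, wall = run()
    print(f"best={best} cpu={cpu:.4f}s wall={wall:.4f}s")
```

CPU time is the number to compare between implementations; wall time also includes whatever else the machine was doing.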

Going a bit deeper: profilers

A profiler, which is available for every computer language but which we show here for Perl, lets you know how much time is spent in every function, and then in every line of every function. That allows identification of bottlenecks, and surgically directed optimization.
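For readers working in Python, a rough equivalent uses the standard `cProfile` and `pstats` modules. This is a sketch under assumptions: the tiny hill climber (`evolve`, `mutate`, `fitness`) is invented for the example, and cProfile reports per-function times only, so per-line figures would need a different tool.

```python
import cProfile
import io
import pstats
import random


def fitness(bits):
    """OneMax: number of ones in a list of 0/1 integers."""
    return sum(bits)


def mutate(bits, rate, rng):
    """Return a copy with each bit flipped with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


def evolve(length=64, generations=2000, seed=7):
    """(1+1) hill climber: keep the mutant if it is at least as good."""
    rng = random.Random(seed)
    current = [0] * length
    for _ in range(generations):
        candidate = mutate(current, 1.0 / length, rng)
        if fitness(candidate) >= fitness(current):
            current = candidate
    return fitness(current)


def profile_evolve():
    """Run evolve() under cProfile and return the top of the report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    evolve()
    profiler.disable()
    report = io.StringIO()
    stats = pstats.Stats(profiler, stream=report)
    stats.sort_stats("cumulative").print_stats(5)  # five most expensive entries
    return report.getvalue()


if __name__ == "__main__":
    print(profile_evolve())
```

In a run like this, expect most of the time to land in `mutate` and the random number generator, which is where any optimization effort should go first.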

Size matters

Parameters of an evolutionary algorithm really matter, and the two main ones are chromosome and population size. This is, BTW, one of the first purely evolutionary-algorithmic things I introduce here. Performance can vary in complex ways, and differently depending on implementation, computer language, and lots of other factors. In this case what is being measured is the running time of three different implementations.

Details in an upcoming paper (from SpringerLink)

You probably know this already. But it's good to remember it from time to time.

10.
There's always a better algorithm/
data structure

Well, not really true... there's also no free lunch here. How can you optimize? By looking for a better algorithm, like the cat is doing.

Picture from http://www.flickr.com/photos/broterham/13140244/in/photostream/

And differences are huge

Sort algorithms are an example

Plus, do you need to sort the population?

Cache fitness evaluations

Cache them permanently in a database?

Measure how long fitness evaluation takes

A thousand ways of computing fitness

How do you compute the MAXONES?

$fitness_of{$chromosome} = ($copy_of =~ tr/1/0/);

Algorithms and data structures interact.

And not in a nice way.
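Caching fitness evaluations boils down to a hash keyed by chromosome, like the Perl `%fitness_of` hash on the slide. Here is a sketch in Python; `make_cached` and its hit/miss counters are illustrative names, not part of any library, and it assumes the fitness function is deterministic and the chromosome is hashable (a string here).

```python
def onemax(chromosome):
    """OneMax fitness: number of '1' characters in the bit string."""
    return chromosome.count("1")


def make_cached(fitness):
    """Wrap `fitness` with a dictionary cache and hit/miss counters."""
    cache = {}
    stats = {"hits": 0, "misses": 0}

    def cached(chromosome):
        if chromosome in cache:
            stats["hits"] += 1
        else:
            stats["misses"] += 1
            cache[chromosome] = fitness(chromosome)
        return cache[chromosome]

    return cached, stats


if __name__ == "__main__":
    cached_onemax, stats = make_cached(onemax)
    population = ["1100", "1111", "1100", "0000", "1111", "1100"]
    print([cached_onemax(c) for c in population])
    print(stats)
```

The counters are there for a reason: if the hit rate is low, or the fitness function is as cheap as OneMax, the cache costs more than it saves. Measure first.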

Instead of sorting, some algorithms just need the max or two max. You have to look for the best implementation in your programming language of choice.

A reviewer asked: top 50% without sorting? Yes: select size/2 slots; each element only has to be compared, at worst, with size/2 others, not with every other element (starting with the last one, which will be dropped). This is called partial sorting.

A choice of data structure forces you to use certain language functions and algorithms, which might be slow, or the other way round. For instance, if you use a string instead of a bit vector you will be forced to use string operations that a language might not be prepared for.
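Selecting the top half without a full sort might look like this in Python. Note that this sketch swaps in a bounded heap (`heapq.nlargest` from the standard library) for the slot-by-slot comparison described in the notes; `top_half` and `onemax` are names invented for the example.

```python
import heapq
import random


def onemax(chromosome):
    """OneMax fitness: number of '1' characters in the bit string."""
    return chromosome.count("1")


def top_half(population, fitness):
    """Best len(population) // 2 individuals, best first, without sorting everyone."""
    return heapq.nlargest(len(population) // 2, population, key=fitness)


if __name__ == "__main__":
    rng = random.Random(3)
    population = ["".join(rng.choice("01") for _ in range(16)) for _ in range(10)]
    for chromosome in top_half(population, onemax):
        print(chromosome, onemax(chromosome))
```

Whether this beats a plain sort depends on the language and the population size; a built-in sort written in C can win for small populations, so this is one more thing to measure rather than assume.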

Case Study: EAs as software programs

Time analysis of standard evolutionary algorithms as software programs, http://dx.doi.org/10.1109/ISDA.2011.6121667

Programs implementing EAs are analyzed; huge improvements can be achieved by changing random number generators or memory usage patterns

Implementation matters!

11.

Learn the tricks of the trade

Image from http://www.flickr.com/photos/adactio/502362353/in/photostream/

Two trades

Evolutionary algorithms

Become one with your algorithm.

It does not work, but for a different reason than you think

Programming languages.

What function is better implemented?

Is there yet another library to do sorting?

Where should you go if there's a problem?

Even a third trade: programming itself.

This is the Zen!

Case study: sort

Sorting is routinely used in evolutionary algorithms

Roulette wheel, rank-based algorithms

Faster sorts (in Perl): http://raleigh.pm.org/sorting.html

Sorting implies comparing

Orcish Manoeuver, Schwartzian transform

Sort::Key, fastest ever http://search.cpan.org/dist/Sort-Key/

Do you even need sorting? Top or bottom n will do nicely sometimes.

And are algorithmically faster.
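The idea behind the Schwartzian transform is language independent: compute the sort key (here, the fitness) once per individual instead of once per comparison. A sketch in Python, with all function names invented for the example; the `counting` wrapper only exists to show how many fitness evaluations each approach costs.

```python
from functools import cmp_to_key


def onemax(chromosome):
    """OneMax fitness: number of '1' characters in the bit string."""
    return chromosome.count("1")


def counting(fitness):
    """Wrap `fitness` so that the number of evaluations can be inspected."""
    counter = {"calls": 0}

    def wrapped(chromosome):
        counter["calls"] += 1
        return fitness(chromosome)

    return wrapped, counter


def sort_with_comparator(population, fitness):
    """Naive: fitness is recomputed twice in every comparison."""
    return sorted(
        population,
        key=cmp_to_key(lambda a, b: fitness(b) - fitness(a)),
    )


def sort_decorated(population, fitness):
    """Decorate, sort, undecorate: fitness is computed once per individual."""
    decorated = [(fitness(c), c) for c in population]
    decorated.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in decorated]


if __name__ == "__main__":
    population = [format(i * 37 % 64, "06b") for i in range(50)]
    naive_fitness, naive_count = counting(onemax)
    smart_fitness, smart_count = counting(onemax)
    sort_with_comparator(population, naive_fitness)
    sort_decorated(population, smart_fitness)
    print("comparator:", naive_count["calls"], "decorated:", smart_count["calls"])
```

In Python, `sorted(population, key=fitness, reverse=True)` already does the decoration for you; the long form is shown only to make the trick visible.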

12.
Make experiment processing easy

Image from http://www.flickr.com/photos/lwr/3462352862/in/photostream/

Avoid drowning in data

Every experiment produces megabytes of data

Timestamps, vectors, arrays, hashes.

Difficult to understand after some time.

Use serialization languages for storing data

YAML: YAML Ain't Markup Language (originally Yet Another Markup Language).

JSON: JavaScript Object Notation.

XML: eXtensible Markup Language.

[Name your own].

Remember that Excel sheets are not the best way for doing this. The languages above are much better at serializing any kind of data. For the same reason, CSV is no good either.
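As an illustration, here is a sketch of a run log written as JSON with Python's standard library (YAML would need a third-party module, which is the only reason JSON is used here). The field names (`parameters`, `generations`, `runtime_seconds`) and the helper functions are made up for this example; use whatever structure your experiment needs.

```python
import json
import os
import tempfile


def build_run_log(parameters, per_generation, runtime_seconds):
    """Gather everything needed to understand a run later in one structure."""
    return {
        "parameters": parameters,
        "generations": per_generation,
        "runtime_seconds": runtime_seconds,
    }


def save_log(log, path):
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(log, handle, indent=2, sort_keys=True)


def load_log(path):
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    log = build_run_log(
        parameters={"population_size": 64, "mutation_rate": 0.05, "seed": 1},
        per_generation=[
            {"generation": 0, "best": 9, "average": 7.5},
            {"generation": 1, "best": 11, "average": 8.25},
        ],
        runtime_seconds=1.35,
    )
    path = os.path.join(tempfile.mkdtemp(), "run-001.json")
    save_log(log, path)
    print(load_log(path)["parameters"])
```

Storing the parameters in the same file as the results means that, six months later, the data still says how it was produced.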

Case study: Mastermind

Entropy-Driven Evolutionary Approaches to the Mastermind Problem

Carlos Cotta et al., http://www.springerlink.com/content/d8414476w2044g2m/

Output uses YAML.

Includes:

Experiment parameters.

Per-run and per-generation data.

Final population and run time.

Open source! (Follow #2!) http://github.com/JJ/algorithm-mastermind

Visit me at the poster session tomorrow!

13.
When everything fails,
visualize

In fact, The Practice of Programming also advises this: run simple statistical tests.

This is an example of an evolutionary algorithm for Mastermind. It does not work very well: too many copies of a single individual in the population.

Having a YAML log allowed easy off-line visualization of the evolution of the population.
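A visualization does not need to be fancy to expose a problem like that one. The sketch below assumes the log has already been loaded into a list of generations, each one a list of chromosome strings; the function names and the text-only plot are inventions for this example, not the tool used for the Mastermind paper.

```python
from collections import Counter


def dominance_per_generation(generations):
    """Fraction of each (non-empty) population taken by its most common individual."""
    return [
        Counter(population).most_common(1)[0][1] / len(population)
        for population in generations
    ]


def ascii_plot(values, width=40):
    """One text bar per generation; a full-width bar means a single genotype."""
    return "\n".join(
        f"{index:3d} {'#' * round(value * width)} {value:.2f}"
        for index, value in enumerate(values)
    )


if __name__ == "__main__":
    logged_generations = [
        ["AB", "CD", "EF", "GH"],
        ["AB", "AB", "CD", "EF"],
        ["AB", "AB", "AB", "CD"],
        ["AB", "AB", "AB", "AB"],
    ]
    print(ascii_plot(dominance_per_generation(logged_generations)))
```

Bars that grow toward full width within a few generations are the "too many copies of a single individual" symptom, visible at a glance.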

14 to 22.
Back up your data

Image from http://www.flickr.com/photos/dacran/2369215001/in/photostream/

'Nuff said!

Better safe than unpublished

Get an old computer, and back up everything there.

If you do open science, you get that for free!

In some cases, create virtual machines to reproduce one paper's environment

Do you think gcc 3.2.3 will compile your old code?

Use rsync, bacula or simply cp.

It's not if your hard disk will fail, it's when.

Cloud solutions are OK, but back those up too.

23.

Keep stuff together

Picture from RealBlades http://www.flickr.com/photos/xtl/3007809619/

Where did I leave my keys?

Paper: program + data + graphics + experiment logs + text + revisions + referee reports + presentations.

Experiments have to be rerun, graphics replotted, papers rewritten.

Use logs to know which parameters produced which data that produced which graph.

And put them all in the same directory tree.

Consider literate programming

Literate programming means keeping program and document describing it and results in the same place.

Sweave and knitr integrate LaTeX and R in the same document.

Check availability for your favorite platform.

Not the most popular way of writing papers.

But check also http://www.executablepapers.com/

With this, you get a great integration with source control tools and you can use commits to keep a writing + experimental log.

24.
Get out and smell the air

Picture by Fernando Rodríguez, http://www.flickr.com/photos/frodrig/5317382828/

New is always better

Programming paradigms are changing on a daily basis

NoSQL, cloud computing, internet of things, Google Prediction API, map/reduce, GPGPU, functional languages.

Keep a balance between following fashions and finding more efficient ways of doing evolutionary algorithms.

And by doing I mean publishing your stuff.

25.
Nurture your code

Picture by 1967geezer, http://www.flickr.com/photos/chrisguise/5607158342/in/photostream/

A moment of joy, a lifetime of grief

Run tests periodically, or when there is a major upgrade of interpreter, upstream library or OS.

Can be automated.

See #6.

Maintain a roadmap of releases

Remember this is free software

Get the community involved

Your research is for the whole wide world.

Publish, don't perish!

Picture by doviende from http://www.flickr.com/photos/doviende/77324602/

Check me out at:

http://geneura.wordpress.com

http://twitter.com/geneura

http://facebook.com/jjmerelo

http://www.amazon.com/-/e/B006K92NX2

Or camels!

(no cats were harmed doing this presentation)

Any (more) questions?
