If you can't read please download the document
Upload
jj-merelo
View
228
Download
1
Embed Size (px)
DESCRIPTION
An invitation to change the way scientific programming is done, via evolutionary computation.
Citation preview
The art of Evolutionary Algorithm programming,
by Sgt. Dr. Juan-Julin Merelo.
EC Headquarters,
U. Granada (Spain)
Thanks for being here. But what I wonder is, why are you here? In a tutorial about programming evolutionary algorithms? Because...
You suck
CC picture from http://www.flickr.com/photos/torsteinsaltvedt/2459322441/in/photostream/
Don't worry
I suck too
This tutorial is about sucking less
At programming evolutionary algorithms, no less!
Or any other, for that matter!Wonderful drawing from http://www.flickr.com/photos/keeki/4264005006/in/photostream/
And about the zen
Picture from http://www.flickr.com/photos/imsbildarkiv/4962991603/in/photostream/Zen means totality, and this tutorial is not just about how to write programs, but about the whole process from idea to publishing a paper in a high-impact index journal (and everything in between).So we'll make in the shape of 28 mantras that should be followed when programming evolutionary algorithms (and probably all metaheuristics)Why 28? It's more than 10, less than one hundred. And I couldn't come up with any round number of mantras. So let it be 28. Or 29. Maybe I though about a few more after I wrote this.
And about writing publishable papers artifacts painlessly through efficient, maintainable, scientific programming
Picture By MarionVosshttp://www.flickr.com/photos/ooocha/2869488840/in/photostream/
1.
Mind your environment
Forge picture from http://www.flickr.com/photos/jamesclay/1396990924/in/photostream/
Tools of the trade
Use real operating systems and real programming languages .
Programmer's editor: kate, emacs, geany, vim [your favorite editor here] + Debugger (gdb, language-specific debugger)Set up macros, syntax-highlighting and -checking, online debugging.
IDE: Eclipse, NetBeans, Emacs!Steep learning curve, geared for Java, and too heavy on the resource usage sideBut worth the while eventually
You need real operating systems to speed up the gathering of tools to start. It needs to be free and gratis, so that you can have all the libraries you need, and designed for programming, not for desktop and office environment.Programmer's editors speed up coding, allowing you to check the structure of a program at first glance. Every one has his favorite program, but you need to settle for one, since you'll be much more productive with it. Emacs or vi are probably a good choice, since you can live in them and with them.NetBeans and Eclipse are free, and popular, and good, but it's overkill if you want to do small programs and scripts (which is all you need sometimes)There are other IDEs for particular languages, worth checking out.
2.
Open source your code and data
Picturehttp://www.flickr.com/photos/goldenberg/22276200/in/photostream/Now it's more important than ever to open source because research topics are more and more narrower and the code is the law
Aw, maaaan!
Open source first, then programScientific code should be born free.
Science must be reproductible.
Easier for others to compare with your approachIncreased HScientist heaven!
Manifest hidden assumptions.
If you don't share, you don't care!
It's the second, because the best moment to make an open source release is now, before writing the first line of code.
3.
Minimize bugs via
test-driven programming
http://www.flickr.com/photos/wohinauswandern/4146824312/in/photostream/In this case, it's a bug-colony optimization algorithm... so you really want to maximize bugs. Once you want to program, you want to think about the tests first.
Tests before code
What do you want your code to do?Mutate a bit string, for instance.
Write the testIs the result from mutation different from the original?Of course!
But will it be even if you change an upstream function? Or the representation?
Does it change all bits in the same proportion (including first and last corner cases)?
Doest it follow (roughly) the mutation rate?
Those are just a few examples. You might think of different tests at different levels for all parts of an evolutionary algorithm.
Use testing frameworks
Unit testing tries to catch bugs in the smallest atom of a programInterface, class, function, decision
Every language has its testing frameworkPHPUnit, jUnit, xUnit, DejaGNU, Test::More...
Write tests for failure, not for success
http://www.flickr.com/photos/paulwatson/43393028/in/photostream/
4.
Control the source of your power
http://www.flickr.com/photos/anotherpioneer/5252749975/
Source control systems save the day
Source code management systems allowCheckpoints
Stygmergic interaction
Individual responsability over code changes
Branches
Distributed are in: git, mercurial, bazaar
Centralized are out: subversion, cvs.
Instant backup!
Code complete
Check out code/Update code
Make changes
Commit changes (and push to central repository)
Code complete
Check out code/Update code
Make changes
Test your code
Commit changes (and push to central repository)
Go with the Joneses
Use GitHub: http://github.com
Some day, most papers in a conference will include Fork me on GitHub notices. Eventually. GitHub is basically a Git repo, but it's popular and the network effect makes it the place to be; it's like LinkedIn for hackers.Besides, good clients that allow people even without any programming idea to use.
5.
Integrate
Nice picture by Joris Louwes at http://www.flickr.com/photos/jorislouwes/6921257085/This is basically about continuous integration
Pushing is not the end of the story
Tests must be run, compilations made, checks and balances checked and balanced.
Use Travis or JenkinsIf it's good enough for software developers, it's good enough for scientists!
All this is free if you open source your codeBack to #2
Push is what you do with systems like git, but also bazaar and mercurial. Travis is integrated with GitHub. Most stuff I say here is integrated with GitHub http://github.comTravis is at http://travis-ci.org Other services like drone.io can do lots of stuff, and there are some that are mainly Java based, but you've got lots of services (many of them free) to choose from.
6.
Be language agnostic
http://www.flickr.com/photos/youngrobv/2497132840/That means that you shouldn't believe that a single language is the revealed Truth to which all others should kneel, and that outside that language lies hell and eternal condemnation.
Language shapes thought
Don't believe the hype:Compiled languages are faster... NOT
There is no free lunch.
Avoid programming in C in every language you use
Consider scripting languages: Python, Perl, Lua, Ruby, Clojure, Javascript... interpreted languages are faster.
Well, they are at running stuff... mostlyOld saying goes Fortran programmers are so good they can write fortran programs in any language. Fortran is out of fashion, but people now write C programs in any language. Speed will depend on many factors, and every language is geared for particular features.
Language agnoticism at its best
Evolving Regular Expressions for GeneChip Probe Performance Prediction
http://www.springerlink.com/content/j3x8r108x757876w/
The regular expresions are coded in AWK scripts:Although this may seem complex, gawk (Unix free interpreted pattern scanning and processing language) can handle populations of a million individuals.
Unique and beautiful usage of English possessive!Aho, Weinberger, Kernighan, AWK is a data-driven language included in all Unices. Extremely mature, extremely fast, extremely unknown... and also the best tool for this paper.
Programming speed > program speed
7.
Foto: http://www.flickr.com/photos/gnackgnackgnack/3297936548/in/photostream/Programming speed is more important than program speed. It does not make sense to spend days programming something to be able to run it in 2 minutes instead of 6. Or even 10.
Scientists, not software engineers
Our deadlines are for papers not for software releases (but we have those, too).
What should be optimized is speed-to-publish.
Makes no sense to spend 90% time programming 5% writing the paper.
Scripting languages rockand minimize time-to-publish.
Perl faster than Java?
Algorithm::Evolutionary, a flexible Perl module for evolutionary computationhttp://www.springerlink.com/content/8h025g83j0q68270/ Class-by-class, Perl library much more compactLess code to write.More time to write the paper, perform experiments....
In pure EC code, Algorithm::Evolutionary was faster than ECJ.
In fact, test by Kernighan and Pike over Markov chains, back in 99, proved the same. Things have evolved, but in both languages. Scripting languages like Perl (or Python, or Ruby) require less effort to write even complex applications.
8.
Become an adept of the chosen language
http://www.flickr.com/photos/inoxkrow/1041584485/in/photostream/It could contradict with your agnosticism, but let's say you should be pragmatic, like Japanese, and believe in three or four languages, and be agnostic on which one using at a particular time.
Don't teach an old dog new tricks
The first language you learn brands you with fire.
But no two languages are the same.
Every language includes very efficient solutionsRegular expressions.
Symbolic processing.
Graphics.
Efficiency/knowledge balance.Clashes with #2? Not really.
There is a balance between efficiency in programming and efficiency when doing a task. There's a power law: you can be best at one, maybe 3 languages, but at least know the potentialities of several others. Or at least know enough to tell a graduate student where she should look.
9.
Don't assume: measure
Foto de http://www.flickr.com/photos/marcelgermain/2071204651/in/photostream/
Performance matters
Basic measure: CPU time as measured by time
jmerelo@penny:~/proyectos/CPAN/Algorithm-Evolutionary/benchmarks$ time perl onemax.pl0; time: 0.0032741; time: 0.005438[...]498; time: 1.006539499; time: 1.00884500; time: 1.010817
real0m1.349suser0m1.140ssys0m0.050s
Don't ask me how you can do that in Windows, I told you already you should use a real operating system for development.
Going a bit deeper: profilers
A profiler, which is available for every computer language but we show here for Perl, allows to know how much time is spent in every function, and then in every line of every function. That allows identification of bottlenecks, and surgically directed optimization.
Size matters
Parameters of an evolutionary algorithm really matter, and the two main ones are chromosome and population size. This is, BTW, one of the first purely evolutionary-algorithmic stuff I introduce here. It can vary in complex ways, and differently depending on implementation, computer languaje, and lots of other factors. In this case what is being measured is the running time of three different implementations.Details in an upcoming paper (from SpringerLink)
You probably know this already. But it's good to remember it from time to time.
There's always a better algorithm/
data structure
10.
Well, not really true... there's also no free lunch here. How can you optimize? By looking for a better algorithm, like the cat is doing.Picture from http://www.flickr.com/photos/broterham/13140244/in/photostream/
And differences are huge
Sort algorithms are an examplePlus, do you need to sort the population?
Cache fitness evaluationsCache them permanently in a database?Measure how much fitness evaluation takes
Thousand ways of computing fitnessHow do you compute the MAXONES?$fitness_of{$chromosome} = ($copy_of =~ tr/1/0/);
Algorithms and data structures interact.And not in a nice way.
Ins, some algorithms just need the max or two max. Yotead of sortingu have to look for the best implementation in your programming language of choice.A reviewer asked: top50% without sorting? Yes: select size/2 slots, only have to compare each element, at worst, with size/2, not with every other (starting with the last one, that will be dropped). This is called Partial sorting.
A choide of data structure forces using lenguage function and algorithms, which might be slow or the other way round. For instance, if you use a string instead of a bitvector you will be forced to use string operations a language might not be prepared for.
Case Study: EAs as software programs
Time analysis of standard evolutionary algorithms as software programshttp://dx.doi.org/10.1109/ISDA.2011.6121667
Programs implementing EAs are analyzed; huge improvements can be achieved by changing random number generators or memory usage patternsImplementation matters!
11.
Learn the tricks of the trade
Image from http://www.flickr.com/photos/adactio/502362353/in/photostream/
Two trades
Evolutionary algorithmsBecome one with your algorithm.It does not work, but for a different reason that what you think it does
Programming languages.What function is better implemented?
Is there yet another library to do sorting?
Where should you go if there's a problem?
Even a third trade: programming itself.
This is the Zen!
Case study: sort
Sorting is routinely used in evolutinary algorithmsRoulette wheel, rank-based algorithms
Faster sorts (in Perl): http://raleigh.pm.org/sorting.htmlSorting implies comparing
Orcish Manoeuver, Schwartzian transform
Sort::Key, fastest ever http://search.cpan.org/dist/Sort-Key/
Do you even need sorting? Top or bottom n will do nicely sometimes.And are algorithmically faster.
Make experiment processing easy
12
Imagen de http://www.flickr.com/photos/lwr/3462352862/in/photostream/
Avoid drowning in data
Every experiment produces megabytes of dataTimestamps, vectors, arrays, hashes.
Difficult to understand after some time.
Use serialization languages for storing dataYAML: Yet another markup language.
JSON: Javascript Object Notation.
XML: eXtensible Markup language.
[Name your own].
Remember that Excel sheets are not the best way for doing this. The languages above are much better at serializing any kind of data. For the same reason, CSV is no good either.
Case study: Mastermind
Entropy-Driven Evolutionary Approaches to the Mastermind ProblemCarlos Cotta et al., http://www.springerlink.com/content/d8414476w2044g2m/Output uses YAML.
Includes:Experiment parameters.
Per-run and per-generation data.
Final population and run time.
Open source! (Follow #2!) http://github.com/JJ/algorithm-mastermind
Visit me at the poster session tomorrow!
When everything fails
visualize
13
De hecho, tambin lo aconseja en The practice of programming: hacer tests estadsticos simples.It's an example of evoluitonary algorithm for MasterMind. It does not work very well: too many copies of a single one in the population.Having a YAML log allowed easy off-linen visualization of the evolution of the population.
14 to 22
backup your data
Image from http://www.flickr.com/photos/dacran/2369215001/in/photostream/'nuff said!
Better safe than unpublished
Get an old computer, and backup everything there.If you do open science, you get that for free!
In some cases, create virtual machines to reproduce one paper's environmentDo you think gcc 3.2.3 will compile your old code?
Use rsync, bacula or simply cp.
It's not if your hard disk will fail, it's when.
Cloud solutions are OK, but backup that too.
23.
Keep stuff together
Picture from RealBlades http://www.flickr.com/photos/xtl/3007809619/
Where did I left my keys?
Paper: program + data + graphics + experiment logs + text + revisions + referee reports + presentations.
Experiments have to be rerun, graphics replotted, papers rewritten.
Use logs to know which parameters produced which data that produced which graph.
And put them all in the same directory tree.
Consider literate programming
Literate programming means keeping program and document describing it and results in the same place.
SWeave and Knitr integrate LaTeX and R in the same document.Check availability for your favorite platform.
Not the most popular way of writing papers.
But check also http://www.executablepapers.com/
With this, you get a great integration with source control tools and you can use commits to keep a writing + experimental log.
Get out and smell the air
24.
Picture by Fernando Rodrguez, http://www.flickr.com/photos/frodrig/5317382828/
New is always better
Programming paradigms are changing on a daily basisNoSQL, cloud computing, internet of things, Google Prediction API, map/reduce, GPGPU, functional languages.
Keep a balance between following fashions and finding more efficient ways of doing evolutionary algorithms.And by doing I mean publishing your stuff.
Nurture your code
25.
Picture by 1967geezerhttp://www.flickr.com/photos/chrisguise/5607158342/in/photostream/
A moment of joy, a lifetime of grief
Run tests periodically, or when there is a major upgrade of interpreter, upstream library or OS.Can be automated.See #6.
Maintain a roadmap of releasesRemember this is free software
Get the community involvedYour research is for the whole wide world.
Publish, don't perish!
Picture by doviende from http://www.flickr.com/photos/doviende/77324602/
Check me out at:
http://geneura.wordpress.com
http://twitter.com/geneura
http://facebook.com/jjmerelo
http://www.amazon.com/-/e/B006K92NX2
Or camels!(no cats were harmed doing this presentation)
Any (more) questions?
/