A meta-analysis of computational biology benchmarks reveals predictors of programming accuracy
A meta-analysis of computational biology benchmarks reveals predictors of programming accuracy Paul Gardner University of Canterbury Christchurch New Zealand
1. A meta-analysis of computational biology benchmarks reveals
predictors of programming accuracy Paul Gardner University of
Canterbury Christchurch New Zealand
2. Hard work from...
3. ResBaz I want to say a big thank you to the organisers of
ResBaz and NeSI and Aleksandra and...! Everything you are about to
see is built using tools you have learned at ResBaz... Warning: the
following research is a work in progress; conclusions may change
(after I've triple-checked data & claims)
4. Pretend we want to build a phylogenetic tree...
5. Building trees... Bioinformaticians are bad, impatient &
intolerant people! Once you have gathered your data, you are faced
with a problem...
Parsimony (useful if we want to publish in Cladistics): 47 methods
ARB FootPrinter LVB Parsimov POY Bionumerics Freqpars MALIGN PAST
PRAP BIRCH Gambit MEGA PAUP* PSODA Bosque GAPars Mesquite PAUPRat
RA BPAnalysis GelCompar-II Murka PaupUp SeaView CAFCA GeneTree
Network phangorn SeqState CRANN gmaes NimbleTree PHYLIP Simplot
DAMBE Hennig86 NONA PhyloNet sog EMBOSS IDEA Notung Phylo_win TCS
TNT
Source: Felsenstein,
http://evolution.genetics.washington.edu/phylip/software.html
8. How can we choose software? Which of the 172 methods do you
use?
9. Can we trust the authors of software? We can read all the
manuscripts & manuals describing 172 software packages.
But...
10. How should we choose software? Some possibilities (assuming
you don't create another method...): Do you know the developer? Are
they famous? Select the most recently published tool? Has the
software been widely adopted? Is it published in a good journal? Is
the software fast? We could test the software...
11. Neutral comparison studies (a.k.a. benchmarks) A. The main
focus of the article is the comparison itself. B. The authors
should be reasonably neutral. C. The evaluation criteria, methods,
and data sets should be chosen in a rational way.
12. Try approaching software like a scientist Are any good
controls available? Positive: databases, publications, simulation,
... Negative: randomized, select relevant negative data, ... Some
common accuracy metrics: Sensitivity (true positive rate),
Specificity (true negative rate), Matthews correlation coefficient,
Area under an ROC curve.
[Figure: ROC curves (true positive rate vs. false positive rate)
for Pfam, Treefam, Custom, PROVEAN, Polyphen2, FATHMM and FATHMM,
unweighted.]
Wheeler et al. (2016) A Profile-Based Method for Measuring the
Impact of Genetic Variation. bioRxiv.
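The first three metrics on this slide can be written down directly from a confusion matrix. The following is a minimal sketch in Python; the function names and the example counts are hypothetical and are not taken from the talk.

```python
import math

def sensitivity(tp, fn):
    """True positive rate: fraction of real positives that were found."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: fraction of real negatives that were rejected."""
    return tn / (tn + fp)

def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient, ranging from -1 to +1."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when a row or column of the
        # confusion matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical counts from testing one tool on positive and negative controls.
tp, tn, fp, fn = 80, 90, 10, 20
print(sensitivity(tp, fn))
print(specificity(tn, fp))
print(matthews_cc(tp, tn, fp, fn))
```

The Matthews correlation coefficient uses all four cells of the confusion matrix, so it is less easily inflated by unbalanced positive and negative control sets than sensitivity or specificity alone.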
13. Benchmarks are useful, and fun...
14. Tools can be slow and inaccurate!
[Figure: A) Sum of log odds scores, phylum level. Deviation plotted
against log2 of runtime (minutes) for CLARK, Kraken, OneCodex,
LMAT, MGRAST, MetaPhlAn, mOTU, Genometa, QIIME, EBI, MetaPhyler,
MEGAN, taxatortk and GOTTCHA. Runtime annotations: ~30 mins,
~17 hrs, ~23 days.]
15. Is there really a relationship between speed &
accuracy? Can we run a meta-analysis of bioinformatic benchmarks?
What factors are predictive of accuracy? Training articles:
initially 10 (historical knowledge) Candidate articles:
((bioinformatics) AND (algorithmic OR algorithms OR biotechnologies
OR computational OR kernel OR methods OR procedure OR programs OR
software OR technologies)) AND (accuracy OR analysis OR assessment
OR benchmark OR benchmarking OR biases OR comparing OR comparison
OR comparisons OR comprehensive OR effectiveness OR estimation OR
evaluation OR metrics OR efficiency OR performance OR perspective
OR quality OR rated OR robust OR strengths OR suitable OR
suitability OR superior OR survey OR weaknesses) AND (benchmark OR
competing OR complexity OR cputime OR duration OR fast OR faster OR
perform OR performance OR slow OR speed OR time) 568,130 articles
Background articles: (bioinformatics [TIAB] 2013:2015 [dp]) #sorted
on first author 154,485 articles
17. Word and article scores Can use the same scoring scheme for
words that we use for scoring biological sequences...
$\mathrm{logOdds}(word) = \log_2 \frac{f_{training}(word) + \delta}{f_{background}(word) + \delta}$
$\mathrm{articleScore} = \sum_{word \in article} \mathrm{logOdds}(word)$
where $\delta$ is a small pseudocount.
[Figure: head & tail word scores, word score (bits). Words shown:
expression mirnas associated patients binding mirna expressed
network involved regulated levels revealed database mutations drug
response tumor system activity induced . . . benchmarking
sequencers benchtop merits correctness benchmark kernels
convolution winner supertree structal seeker choosing corpora
supermatrix phenocopy epistasis segmod encad balibase]
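The word and article scores can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code behind the talk: the pseudocount value, the whitespace tokenisation and the choice to count each distinct word once per article are all assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter

PSEUDOCOUNT = 1e-5  # assumed value; the slide does not give one

def word_frequencies(articles):
    """Relative frequency of each word across a list of article texts."""
    counts = Counter(w for text in articles for w in text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def log_odds(word, f_train, f_back, delta=PSEUDOCOUNT):
    """log2 ratio of training to background frequency, in bits."""
    return math.log2((f_train.get(word, 0.0) + delta) /
                     (f_back.get(word, 0.0) + delta))

def article_score(text, f_train, f_back):
    """Sum of word log-odds over the distinct words of one article."""
    words = set(text.lower().split())  # assumption: each word counted once
    return sum(log_odds(w, f_train, f_back) for w in words)

# Toy corpora standing in for the training and background article sets.
training = ["benchmark of alignment software speed"]
background = ["expression of mirna in tumor patients"]
f_train = word_frequencies(training)
f_back = word_frequencies(background)

print(article_score("a benchmark of speed", f_train, f_back))
print(article_score("mirna expression in patients", f_train, f_back))
```

Words seen in neither corpus score exactly 0 bits because the pseudocounts cancel, so unfamiliar vocabulary neither helps nor hurts an article.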
18. Iteratively checking articles... 1. Score and rank
candidate articles 2. Check the highest scoring articles, add to
either training or background articles 3. Return to 1.
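The three-step loop above might look something like the sketch below. The manual check in step 2 is replaced by a callback (`is_relevant`), and the scoring function here is a deliberately crude word-overlap count rather than the log-odds score; both, along with all names and the toy articles, are hypothetical stand-ins.

```python
def overlap_score(article, training, background):
    """Toy score: words shared with training minus words shared with background."""
    words = set(article.lower().split())
    train_words = {w for t in training for w in t.lower().split()}
    back_words = {w for t in background for w in t.lower().split()}
    return len(words & train_words) - len(words & back_words)

def curate(candidates, training, background, score, is_relevant,
           batch=1, rounds=2):
    """Repeatedly rank candidates, check the top few, and re-file them."""
    candidates = list(candidates)
    training, background = list(training), list(background)
    for _ in range(rounds):
        if not candidates:
            break
        # 1. Score and rank candidate articles against the current sets.
        candidates.sort(key=lambda a: score(a, training, background),
                        reverse=True)
        # 2. Check the highest scoring articles and file each one.
        top, candidates = candidates[:batch], candidates[batch:]
        for article in top:
            (training if is_relevant(article) else background).append(article)
        # 3. Return to 1 (next loop iteration uses the updated sets).
    return training, background, candidates

training, background, remaining = curate(
    candidates=["accuracy benchmark of assemblers",
                "tumor mirna binding",
                "speed of variant callers"],
    training=["benchmark of aligner speed and accuracy"],
    background=["mirna expression in tumor patients"],
    score=overlap_score,
    # Stand-in for a person reading the article.
    is_relevant=lambda a: "benchmark" in a or "speed" in a,
)
print(training)
print(remaining)
```

Because the training and background sets grow on every pass, the ranking of the remaining candidates can shift between rounds, which is the point of iterating.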
20. Possible predictors of accuracy...
[Figure: frequency histograms of six candidate predictors: number
of citations (#citations), journal impact factor (journal.IF),
journal H5 index from GoogleScholar (journal.H5), corresponding
author's H-index (author.H), corresponding author's M-index
(author.M) and relative age.]
21. I have found no *significant* predictors of accuracy!
[Figure: correlations with accuracy rank (Spearman's rho) for
author.M, author.H, journal.H5, relative age, speed, #citations and
journal.IF.]
[Figure: Accuracy vs. Speed, mean normalised accuracy rank against
mean normalised speed rank, with quadrants fast+accurate,
fast+inaccurate, slow+inaccurate and slow+accurate. Z = 1.52;
p = 0.94. Markers: * = high-profile journal; o = high-profile
author; x = highly cited.]
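For readers who want to try this kind of analysis, here is a minimal pure-Python sketch of a normalised rank and Spearman's rho. The normalisation formula (best tool scores 1, worst scores 0) is an assumption about what "normalised rank" means on this slide, and the example ranks are invented.

```python
import math

def normalised_rank(rank, n):
    """Map a rank (1 = best of n tools) onto [0, 1]; assumed definition."""
    if n < 2:
        raise ValueError("need at least two tools to normalise a rank")
    return 1.0 - (rank - 1) / (n - 1)

def average_ranks(values):
    """1-based ranks of values; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Invented speed and accuracy ranks for five tools in one benchmark.
speed_rank = [1, 2, 3, 4, 5]
accuracy_rank = [3, 1, 5, 2, 4]
speed = [normalised_rank(r, 5) for r in speed_rank]
accuracy = [normalised_rank(r, 5) for r in accuracy_rank]
print(spearman(speed, accuracy))
```

Normalising ranks within each benchmark puts benchmarks with different numbers of tools on the same scale before they are pooled.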
23. Conclusions Nothing appears to be predictive of accuracy¹
Fast software undergoes more developmental iterations Can heuristic
approaches produce a better result than mathematically complete
approaches? It doesn't appear to matter how famous you are, the
journals you publish in, whether you're early or late, or how often
your work is cited: you can still write great software!
¹ There is still a chance I have screwed something up...
24. Thanks Stephanie McGimpsey, Fatemeh Ashari Ghomi, Sinan Uğur
Umu Funded by: Rutherford Discovery Fellowship, BPRC and Biological
Heritage: National Science Challenge.