Contributions to methods in phylogeny
My PhD and post-doc work
Blaise Li
Institut de Génétique Humaine - 20/09/2013
Blaise Li IGH, 20/09/2013 1 / 45
Outline
- During my PhD and two post-docs, I’ve mainly worked on methodological aspects of phylogeny.
- You may not be very familiar with this, so I’ll start by briefly reminding you of a few things about (molecular) phylogeny.
- Then I’ll present my most recent work, from my second post-doc: trying to avoid obtaining wrong trees in the presence of composition biases in the genomes of bacteria and chloroplasts.
- And finally I’ll briefly talk about my PhD work: trying to extract reliable groups from a bunch of fish phylogenies.
My first post-doc in one slide
- I’ll skip what I did during my first post-doc, so here it is in one slide.
- I was supposed to test a new phylogenetic inference algorithm on real data, but we failed to make the algorithm perform well, so the project was abandoned.
- So I used my time studying computer science and doing some amateur work on human population genetics.
- I gathered worldwide population polymorphism data from various studies, ran a kind of classification program on it, and wrote scripts to generate graphical output in large numbers.
- I wrote a detailed commentary on the results, speculating about such things as a possible link between the ancestors of pygmies, ‘bushmen’, and various so-called ‘negrito’ populations of south and south-east Asia.
Part I
Reminders
Molecular phylogeny, short and simplified
- DNA sequences accumulate mutations as time passes and diverge between branches of the tree of life.
- We gather homologous (i.e. deriving from a common ancestor) sequence data and want to infer the evolutionary history that led to the observed sequences.
- The relationships between the sequences can be represented by a tree whose branch lengths are proportional to the quantity of mutations accumulated since the ancestor represented by the branching point.
- We use maths and informatics to search for the tree that hopefully best represents the true genealogical relationships between the sequences.
- We set up a model for sequence evolution: basically, probabilities of substitution between nucleotides or between amino-acids.
- Algorithms are used to explore the possible topologies, sets of branch lengths, and model parameters.
- These trees are evaluated using the probability of the sequences being as we observe them if we assume they evolved according to our model, along the branches of the tree (likelihood).
- We retain the most likely tree (Maximum Likelihood): this is what I call a primary analysis.
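To make the idea of likelihood concrete, here is a minimal sketch for the simplest possible case: two aligned sequences joined by a single branch. It assumes the Jukes-Cantor (JC69) substitution model, which is my choice for illustration and not necessarily the model used in the analyses presented here. The function names and the toy sequences are hypothetical. A real analysis also explores tree topologies, which this sketch does not attempt.

```python
import math

def jc69_log_likelihood(seq_a, seq_b, d):
    """Log-likelihood of two aligned sequences separated by a branch of
    length d (expected substitutions per site) under JC69.
    Assumes d > 0 and no gaps or ambiguous characters."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e   # probability that a site is unchanged
    p_diff = 0.25 - 0.25 * e   # probability of one specific change
    loglik = 0.0
    for x, y in zip(seq_a, seq_b):
        p = p_same if x == y else p_diff
        loglik += math.log(0.25 * p)  # 0.25 = equilibrium frequency of x
    return loglik

def jc69_ml_distance(seq_a, seq_b):
    """Closed-form maximum-likelihood branch length under JC69.
    Only defined when fewer than 75% of the sites differ."""
    n = len(seq_a)
    p = sum(x != y for x, y in zip(seq_a, seq_b)) / n
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Toy data (hypothetical): 2 differences over 20 sites.
seq_a = "ACGTACGTACGTACGTACGT"
seq_b = "ACGTACGAACGTACTTACGT"

# "Explore and retain the most likely": scan candidate branch lengths.
candidates = [0.01 * i for i in range(1, 100)]
best = max(candidates, key=lambda d: jc69_log_likelihood(seq_a, seq_b, d))
d_hat = jc69_ml_distance(seq_a, seq_b)
print(f"grid search: {best:.2f}, closed form: {d_hat:.4f}")
```

The grid search and the closed-form estimate should agree to within the grid step. With more than two sequences there is no closed form, which is why search algorithms are needed.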
Some vocabulary
[Figure: example tree of fish groups (Batrachoidiformes, Gobioidei, Apogonidae, Indostomus, Symbranchoidei, Mastacembeloidei, Channoidei, Anabantoidei) with clade labels (β, γ, δ, H, E+E’, Q, F, f1, L, η), highlighted in turn to illustrate the terms below.]
- Branch
- (Internal) node
- Taxa (leaves, terminals)
- Clade = monophyletic group (an ancestor and all its descendants)
- Sister-groups: two sister-groups form a clade
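The vocabulary above can be sketched in a few lines of code. This is only an illustration: the rooted tree is represented as nested tuples, the leaf names A to E and the topology are arbitrary, and the function names are hypothetical.

```python
def leaves(tree):
    """Set of leaf names below a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def clades(tree):
    """All clades of a rooted tree, each given as a set of leaves."""
    if isinstance(tree, str):
        return {frozenset([tree])}
    result = {leaves(tree)}
    for child in tree:
        result |= clades(child)
    return result

def is_monophyletic(tree, group):
    """True if `group` is exactly an ancestor and all its descendants."""
    return frozenset(group) in clades(tree)

def sister_groups(tree, group):
    """Leaf sets of the sister-group(s) of `group`, or None if `group`
    is not a clade of the tree (or is the whole tree)."""
    if isinstance(tree, str):
        return None
    target = frozenset(group)
    child_sets = [leaves(child) for child in tree]
    if target in child_sets:
        return [s for s in child_sets if s != target]
    for child in tree:
        found = sister_groups(child, group)
        if found is not None:
            return found
    return None

# Arbitrary toy topology: ((A, B), C) is sister to (D, E).
tree = ((("A", "B"), "C"), ("D", "E"))
print(is_monophyletic(tree, {"A", "B"}))   # A and B form a clade
print(is_monophyletic(tree, {"A", "C"}))   # A and C do not (B is missing)
print(sister_groups(tree, {"A", "B"}))     # the sister-group is C
```

Joining a clade with its sister-group always gives a clade again: here `{"A", "B"}` and `{"C"}` together are monophyletic.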
Part II
Dealing with composition convergence
Contribution to primary analyses
The endosymbiotic origin of plastids
[Figure (after Keeling, 2010): schematic of plastid origins. Labels: Cyanobacteria; primary endosymbiosis; Glaucophyta, red algae, green algae, land plants; secondary endosymbiosis (twice); chromalveolates..., euglenids.]
The endosymbiotic origins of plastids
[Figure: page 732 of P. J. Keeling, Review: "The origin and fate of plastids", Phil. Trans. R. Soc. B (2010), showing its Figure 2. Labels: primary, secondary, serial secondary (green alga) and tertiary (diatom, cryptomonad, haptophyte) endosymbioses; glaucophytes, red algae, green algae, land plants, Paulinella, euglenids, chlorarachniophytes, cryptomonads, haptophytes, stramenopiles, ciliates, Apicomplexa, dinoflagellates, Dinophysis, Lepidodinium, Durinskia, Karlodinium.]
[Figure: diagram labelled with plastids and cyanobacterial sections I, III and IV.]
Phylogenetic difficulties
Old events are generally difficult to infer correctly:
- mutational saturation (sequence randomization)
- enough time for divergences and convergences in modes of evolution
Difficulty amplified because of endosymbiosis:
- simplification (drift and loss of genes)
- gene relocation (bacteria → host nucleus)
→ Standard methods of analysis can produce artefactual groupings.
Overly straightforward analyses give conflicting results.
- rDNA and amino-acid data: early divergence of plastids
- protein-coding gene data: plastids close to pluricellular Cyanobacteria
What is the cause of this incongruence?
→ We studied the phenomenon on a dataset of protein-coding genes from plastids (or relocated in the plant host nucleus) and their (cyano)bacterial homologues.
Dataset
- 42 taxa, including 8 outgroup (non-cyano)bacteria, 16 Cyanobacteria, and plastids from 1 Glaucophyta, 4 Rhodophyta (red algae) and 13 Viridiplantae (green plants)
- Cyanobacteria groups present:
  - NOST-1 (section IV)
  - OSC-2 (section III)
  - SPM-3, SO-6, GBACT, UNIT+ (section I)
- 75 protein-coding genes (but with some missing sequences, 14% overall)
- Concatenated dataset (cg75) and its translation (cp75)
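As an illustration of how a concatenated nucleotide dataset and its translation relate, here is a minimal sketch. It assumes that each gene is already aligned in frame, that a taxon lacking a gene is padded with gaps, and that the standard genetic code applies. The helper names and the toy data are hypothetical and are not the scripts used to build cg75 and cp75. The missing-data measure counts absent gene sequences, which is only one possible reading of the 14% figure.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(seq):
    """Translate an in-frame sequence; '---' stays a gap, and any other
    codon absent from the table becomes 'X'."""
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3].upper()
        protein.append("-" if codon == "---" else CODON_TABLE.get(codon, "X"))
    return "".join(protein)

def concatenate(gene_alignments, taxa):
    """Join per-gene alignments (dicts taxon -> aligned sequence);
    a taxon missing from a gene is filled with gaps of that gene's length."""
    parts = {taxon: [] for taxon in taxa}
    for alignment in gene_alignments:
        length = len(next(iter(alignment.values())))
        for taxon in taxa:
            parts[taxon].append(alignment.get(taxon, "-" * length))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}

def missing_fraction(gene_alignments, taxa):
    """Fraction of (gene, taxon) pairs for which no sequence is available."""
    missing = sum(taxon not in aln for aln in gene_alignments for taxon in taxa)
    return missing / (len(gene_alignments) * len(taxa))

# Toy data (hypothetical): two genes, the second one missing for t2.
taxa = ["t1", "t2"]
genes = [{"t1": "ATGGCT", "t2": "ATGGCA"}, {"t1": "TTTTAA"}]
cg = concatenate(genes, taxa)                      # nucleotide dataset
cp = {taxon: translate(seq) for taxon, seq in cg.items()}  # its translation
print(cg, cp, missing_fraction(genes, taxa))
```

In the toy data, t1 and t2 differ at one third codon position (GCT versus GCA) yet translate to the same amino-acid, which is the kind of signal present in the nucleotide dataset but absent from its translation.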
‘Standard’ ML analyses
[Figure: ‘standard’ ML trees with support values for the nucleotide dataset cg75 and for its translation cp75. Outgroups: Firmicutes, Chloroflexi, Chlorobi, Proteobacteria. Both trees are annotated with a "basal" GBACT (Gloeobacter_violaceus, Synechococcus_JA33Ab). The cg75 tree lists, in order, SO-6, UNIT+ (Thermosynechococcus_elongatus, Cyanothece_PCC7425, Acaryochloris_marina), SPM-3, NOST-1 and OSC-2, then the plastids (Glaucophyta, Rhodophyta, Streptophyta, Chlorophyta), and is annotated "pluricellulars" and "grade of Cyanobacteria". The cp75 tree lists the plastids right after GBACT, followed by UNIT+, OSC-2, SO-6, NOST-1 and SPM-3, and is annotated "core" Cyanobacteria.]
- cp75 is a direct translation of cg75.
  → The trees should be the same.
- But the analyses conflict in the identification of the plastid sister-group.
  → Something is wrong!
→ Can we have confidence in one of these trees?
Nucleotides or amino-acids?
- Nucleotide sequences are more likely to randomize with time.
  - codon degeneracy → lowered selective pressure
  - only 4 states → convergence likely
- But estimation of substitution rates is easier for nucleotides (4 × 4 substitution matrix).
- And there may not be enough variability in amino-acid sequences to sort out relationships within recent groups.
- Here, we are at a large scale, and low bootstrap supports suggest conflicting signals for nucleotides...
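To give an order of magnitude for why rates are easier to estimate with 4 states than with 20, here is a small count of free parameters. It assumes a general time-reversible (GTR-type) model, where the substitution matrix is built from symmetric exchangeabilities and equilibrium frequencies. This is an assumption for illustration: the slides do not say which models were fitted, and the function name is hypothetical.

```python
def gtr_free_parameters(n_states):
    """Free parameters of a general time-reversible model with n states:
    n(n-1)/2 exchangeabilities (one is fixed to set the scale)
    plus n equilibrium frequencies (which must sum to 1)."""
    exchangeabilities = n_states * (n_states - 1) // 2 - 1
    frequencies = n_states - 1
    return exchangeabilities + frequencies

print("nucleotides:", gtr_free_parameters(4))    # 4 x 4 matrix
print("amino-acids:", gtr_free_parameters(20))   # 20 x 20 matrix
```

Under this assumption the nucleotide model has 8 free parameters and the amino-acid model 208, which is why amino-acid analyses commonly rely on pre-computed empirical matrices rather than estimating every rate from the data.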
Nucleotide composition attraction
We focus on a particular type of reconstruction artefact: nucleotide composition attraction.
- Different groups may have different mutational biases and codon preferences.
- This influences the composition of the genome.
- Sites under low selective constraint tend to conform to that composition.
→ Shared mutational biases and codon preferences between (possibly distant) groups may induce convergence in the nucleotide sequence, especially at the 3rd codon position.
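One simple way to look for such composition differences is to measure the G+C proportion separately at each codon position. The sketch below is a minimal illustration, not the code used for the analyses; the function name `gc_by_codon_position` and the two toy sequences are hypothetical.

```python
def gc_by_codon_position(cds: str) -> list:
    """Return the G+C proportion at codon positions 1, 2 and 3 of an in-frame sequence."""
    cds = cds.upper()
    proportions = []
    for offset in range(3):
        # Every third base starting at this offset; gaps and ambiguous bases are ignored.
        bases = [b for b in cds[offset::3] if b in "ACGT"]
        gc = sum(b in "GC" for b in bases)
        proportions.append(gc / len(bases) if bases else float("nan"))
    return proportions


# Hypothetical toy sequences encoding the same peptide (Leu-Ser-Arg-Gly) with different codons.
at_rich = "TTATCTAGAGGT"
gc_rich = "CTGTCCCGCGGC"
print(gc_by_codon_position(at_rich))  # [0.25, 0.75, 0.0]
print(gc_by_codon_position(gc_rich))  # [0.75, 0.75, 1.0]
```

In this toy case the two sequences code for the same amino acids, yet they differ strongly in G+C content at the 3rd position and, through the Leu and Arg codons, at the 1st position as well.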
Nucleotide composition attraction
    T          C          A          G
T   TTT Phe    TCT Ser    TAT Tyr    TGT Cys
    TTC Phe    TCC Ser    TAC Tyr    TGC Cys
    TTA Leu    TCA Ser    TAA Ter    TGA Ter
    TTG Leu    TCG Ser    TAG Ter    TGG Trp
C   CTT Leu    CCT Pro    CAT His    CGT Arg
    CTC Leu    CCC Pro    CAC His    CGC Arg
    CTA Leu    CCA Pro    CAA Gln    CGA Arg
    CTG Leu    CCG Pro    CAG Gln    CGG Arg
A   ATT Ile    ACT Thr    AAT Asn    AGT Ser
    ATC Ile    ACC Thr    AAC Asn    AGC Ser
    ATA Ile    ACA Thr    AAA Lys    AGA Arg
    ATG Met    ACG Thr    AAG Lys    AGG Arg
G   GTT Val    GCT Ala    GAT Asp    GGT Gly
    GTC Val    GCC Ala    GAC Asp    GGC Gly
    GTA Val    GCA Ala    GAA Glu    GGA Gly
    GTG Val    GCG Ala    GAG Glu    GGG Gly
Composition and codon usage biases
[Figure: cg75 tree with branch supports. Tips: Firmicutes, Chloroflexi, Chlorobi, Proteobacteria, Gloeobacter_violaceus (GBACT), Synechococcus_JA33Ab (GBACT), SO-6, Thermosynechococcus_elongatus (UNIT+), Cyanothece_PCC7425 (UNIT+), Acaryochloris_marina (UNIT+), SPM-3, NOST-1, OSC-2, Glaucophyta, Rhodophyta, Streptophyta, Chlorophyta. Next to each tip are plotted the codon usage log10 ratios (LeuT/LeuC, SerAG/SerTC, ArgA/ArgC; axis from -3 to 1) and the G+C proportion by codon position (1: −, 2: +, 3: ×; axis from 0 to 1). Successive overlays highlight 3rd pos. G+C, 1st pos. G+C, ArgA bias and LeuT bias.]
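The codon usage ratios named in the figure legend can be sketched as follows. This assumes that LeuT/LeuC compares Leu codons starting with T (TTA, TTG) to those starting with C (CTN), SerAG/SerTC compares AGY to TCN, and ArgA/ArgC compares AGR to CGN; that reading of the legend, the function name and the pseudocount are assumptions, not details taken from the slides.

```python
import math
from collections import Counter

# Codon families compared in each ratio (assumed reading of the figure legend).
FAMILIES = {
    "LeuT/LeuC": ({"TTA", "TTG"}, {"CTT", "CTC", "CTA", "CTG"}),
    "SerAG/SerTC": ({"AGT", "AGC"}, {"TCT", "TCC", "TCA", "TCG"}),
    "ArgA/ArgC": ({"AGA", "AGG"}, {"CGT", "CGC", "CGA", "CGG"}),
}


def codon_usage_log_ratios(cds: str, pseudocount: float = 0.5) -> dict:
    """log10 ratio of codon counts between the two families of Leu, Ser and Arg."""
    cds = cds.upper()
    # Count in-frame codons, ignoring a trailing incomplete codon.
    codons = Counter(cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3))
    ratios = {}
    for name, (numerator, denominator) in FAMILIES.items():
        # The pseudocount (an assumption of this sketch) avoids division by zero.
        num = sum(codons[c] for c in numerator) + pseudocount
        den = sum(codons[c] for c in denominator) + pseudocount
        ratios[name] = math.log10(num / den)
    return ratios


# Hypothetical toy sequence: TTA TTG CTT AGA TCT
ratios = codon_usage_log_ratios("TTATTGCTTAGATCT")
for name, value in ratios.items():
    print(f"{name}: {value:+.2f}")
```

A positive value means that the first family is used more often than the second; values near zero indicate balanced usage.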
3rd codon position removal
    T          C          A          G
T   TT- Phe    TC- Ser    TA- Tyr    TG- Cys
    TT- Phe    TC- Ser    TA- Tyr    TG- Cys
    TT- Leu    TC- Ser    TA- Ter    TG- Ter
    TT- Leu    TC- Ser    TA- Ter    TG- Trp
C   CT- Leu    CC- Pro    CA- His    CG- Arg
    CT- Leu    CC- Pro    CA- His    CG- Arg
    CT- Leu    CC- Pro    CA- Gln    CG- Arg
    CT- Leu    CC- Pro    CA- Gln    CG- Arg
A   AT- Ile    AC- Thr    AA- Asn    AG- Ser
    AT- Ile    AC- Thr    AA- Asn    AG- Ser
    AT- Ile    AC- Thr    AA- Lys    AG- Arg
    AT- Met    AC- Thr    AA- Lys    AG- Arg
G   GT- Val    GC- Ala    GA- Asp    GG- Gly
    GT- Val    GC- Ala    GA- Asp    GG- Gly
    GT- Val    GC- Ala    GA- Glu    GG- Gly
    GT- Val    GC- Ala    GA- Glu    GG- Gly
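Removing 3rd codon positions from an in-frame coding alignment can be sketched as below. The slide does not say whether the columns were deleted or masked; this sketch deletes them. The function name and the toy alignment are hypothetical.

```python
def remove_third_positions(cds: str) -> str:
    """Drop every 3rd base of an in-frame coding sequence (the 'no3' treatment)."""
    if len(cds) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    # Keep the first two bases of each codon.
    return "".join(cds[i:i + 2] for i in range(0, len(cds), 3))


# Hypothetical aligned coding sequences; gaps ('-') are treated like any other character.
alignment = {
    "taxon_A": "TTATCTAGAGGT",
    "taxon_B": "CTGTCCCGCGGC",
}
no3 = {name: remove_third_positions(seq) for name, seq in alignment.items()}
print(no3)  # {'taxon_A': 'TTTCAGGG', 'taxon_B': 'CTTCCGGG'}
```

The two toy sequences encode the same peptide, yet they still differ at 1st positions after the treatment (Leu TTA vs CTG, Arg AGA vs CGC): some synonymous differences survive 3rd position removal.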
3rd codon position removal
[Figure: the cg75 tree and the cg75_no3 tree shown side by side, both with branch supports. Tips of the cg75_no3 tree: Firmicutes, Chloroflexi, Chlorobi, Proteobacteria, Gloeobacter_violaceus (GBACT), Synechococcus_JA33Ab (GBACT), SO-6, UNIT+, NOST-1, SPM-3, OSC-2, Glaucophyta, Rhodophyta, Streptophyta, Chlorophyta.]
3rd codon position removal
- UNIT+ monophyly is restored.
- But some signal that does not correspond to synonymous substitutions is lost.
- This signal can be partially saved by recoding 3rd positions instead of removing them.
3rd codon position recoding
    T          C          A          G
T   TTY Phe    TCN Ser    TAY Tyr    TGY Cys
    TTY Phe    TCN Ser    TAY Tyr    TGY Cys
    TTN Leu    TCN Ser    TAR Ter    TGR Ter
    TTN Leu    TCN Ser    TAR Ter    TGG Trp
C   CTN Leu    CCN Pro    CAY His    CGN Arg
    CTN Leu    CCN Pro    CAY His    CGN Arg
    CTN Leu    CCN Pro    CAR Gln    CGN Arg
    CTN Leu    CCN Pro    CAR Gln    CGN Arg
A   ATH Ile    ACN Thr    AAY Asn    AGN Ser
    ATH Ile    ACN Thr    AAY Asn    AGN Ser
    ATH Ile    ACN Thr    AAR Lys    AGN Arg
    ATG Met    ACN Thr    AAR Lys    AGN Arg
G   GTN Val    GCN Ala    GAY Asp    GGN Gly
    GTN Val    GCN Ala    GAY Asp    GGN Gly
    GTN Val    GCN Ala    GAR Glu    GGN Gly
    GTN Val    GCN Ala    GAR Glu    GGN Gly
3rd codon position recoding
[Figure: the cg75 tree and the cg75_degen3 tree shown side by side, both with branch supports. Tips of the cg75_degen3 tree: Firmicutes, Chloroflexi, Chlorobi, Proteobacteria, Gloeobacter_violaceus (GBACT), Synechococcus_JA33Ab (GBACT), SO-6, UNIT+, NOST-1, SPM-3, OSC-2, Glaucophyta, Rhodophyta, Streptophyta, Chlorophyta.]
3rd codon position recoding
- Similar effect to no3: UNIT+ monophyly is restored.
- But codon degeneracy also exists at other positions, associated with switches between the codon families of Leu, Arg and Ser.
→ We can try to recode these degenerate codon positions too.
Degenerating all synonymous codon positions
    T          C          A          G
T   TTY Phe    WSN Ser    TAY Tyr    TGY Cys
    TTY Phe    WSN Ser    TAY Tyr    TGY Cys
    YTN Leu    WSN Ser    TAR Ter    TGR Ter
    YTN Leu    WSN Ser    TAR Ter    TGG Trp
C   YTN Leu    CCN Pro    CAY His    MGN Arg
    YTN Leu    CCN Pro    CAY His    MGN Arg
    YTN Leu    CCN Pro    CAR Gln    MGN Arg
    YTN Leu    CCN Pro    CAR Gln    MGN Arg
A   ATH Ile    ACN Thr    AAY Asn    WSN Ser
    ATH Ile    ACN Thr    AAY Asn    WSN Ser
    ATH Ile    ACN Thr    AAR Lys    MGN Arg
    ATG Met    ACN Thr    AAR Lys    MGN Arg
G   GTN Val    GCN Ala    GAY Asp    GGN Gly
    GTN Val    GCN Ala    GAY Asp    GGN Gly
    GTN Val    GCN Ala    GAR Glu    GGN Gly
    GTN Val    GCN Ala    GAR Glu    GGN Gly
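The recoding shown in the table can be sketched as follows: at each chosen codon position, the base is replaced by the IUPAC ambiguity code covering every base found at that position among the codons of the same amino acid. Restricting the recoding to the 3rd position gives the 3rd-position recoding scheme; using all three positions gives the full degeneration. This is a minimal sketch under the standard genetic code, not the author's implementation; the names `build_recoding` and `recode` are hypothetical, and stop codons are left unchanged here, whereas the slide shows them as TAR and TGR.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# IUPAC ambiguity codes, keyed by the set of bases they stand for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def build_recoding(positions=(0, 1, 2)) -> dict:
    """Map each sense codon to its degenerate form at the given 0-based codon positions."""
    synonyms = {}
    for codon, aa in GENETIC_CODE.items():
        if aa != "*":  # assumption of this sketch: stop codons are not recoded
            synonyms.setdefault(aa, []).append(codon)
    recoding = {}
    for codons in synonyms.values():
        for codon in codons:
            recoded = list(codon)
            for pos in positions:
                # Ambiguity code covering every base seen at this position among synonyms.
                recoded[pos] = IUPAC[frozenset(c[pos] for c in codons)]
            recoding[codon] = "".join(recoded)
    return recoding


def recode(cds: str, recoding: dict) -> str:
    # Codons absent from the table (stops, gaps, ambiguous bases) are kept unchanged.
    return "".join(recoding.get(cds[i:i + 3], cds[i:i + 3]) for i in range(0, len(cds), 3))


degen3 = build_recoding(positions=(2,))
degen = build_recoding()
print(recode("TTACTGAGTCGA", degen3))  # TTNCTNAGNCGN
print(recode("TTACTGAGTCGA", degen))   # YTNYTNWSNMGN
```

After full degeneration, the two Leu codons of the toy sequence (TTA and CTG) become identical, so a switch between them no longer counts as a difference between sequences.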
Degenerating all synonymous codon positions
[Figure: the cg75 tree and the cg75_degen tree shown side by side, both with branch supports. Tips of the cg75_degen tree: Firmicutes, Chloroflexi, Chlorobi, Proteobacteria, Gloeobacter_violaceus (GBACT), Synechococcus_JA33Ab (GBACT), Glaucophyta, Rhodophyta, Streptophyta, Chlorophyta, UNIT+, OSC-2, SO-6, NOST-1, SPM-3. A group is labelled "core" Cyanobacteria.]
Degenerating all synonymous codon positions
- Core Cyanobacteria are sister to plastids, as in the amino-acid analyses.
- Signal at 1st and 2nd codon positions contributes significantly to composition attraction.
Conclusions
- Incongruence between nucleotide and amino-acid data is mainly due to G+C convergence biases. It is likely that plastids diverged early from the Cyanobacteria.
- rDNA sequences are directly constrained by selection; this might explain why their results are similar to those obtained with amino-acid data.
Contributions to methods in phylogeny Contribution to secondary analyses
Part III
Identifying reliable clades
Clade supports
- In the previous part, I mentioned ‘bootstrap supports’.
- Other types of clade support exist. They indicate how much confidence can be placed in the clades.
- My PhD work involved developing a support indicator that compares trees from primary analyses based on several phylogenetic markers (secondary analysis).
- I’ll present a classification of support types to introduce the ‘reliability index’ I developed.
Sensitivity
- Ability to resist changes in the method
- Example: ‘Navajo rugs’:
  1. analyses repeated with different parameter values
  2. clade occurrences recorded for each parameter combination
  3. with 2 parameters → small binary matrix to place near the branch
[Figure: page 327 of G. Giribet et al. / Organisms, Diversity & Evolution 4 (2004) 319–340, showing its Fig. 4: "Summary cladograms showing most parsimonious topologies for combined analysis of molecular and morphological data for parameter set 121 (A) and immediately suboptimal parameter set 111 (B). Monophyly of clades in 12 explored parameter sets is indicated (black square = monophyly; gray square = monophyly in some of a set of equally parsimonious resolutions; white square = non-monophyly). Cladogram A shows groups in two equally parsimonious resolutions of Crustacea+Collembola+Ectognatha." Tips: Chilopoda, Symphyla, Protura, "Japygoidea", Campodeidae, Malacostraca, Entomostraca, Collembola, Archaeognatha, Tricholepidion, Zygentoma s.s., Pterygota. Rug axes: gap/change (1, 2, 4) and transversion/transition (1, 2, 4, 8).]
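The three steps above can be sketched as a function that builds the binary matrix for one clade. The parameter grid and the clades recorded for each combination are hypothetical placeholders: in practice they would come from re-running the phylogenetic analysis under each parameter combination. The function name `navajo_rug` is also an assumption.

```python
def navajo_rug(clade, results, rows, columns):
    """Binary matrix: 1 where `clade` occurs in the tree obtained for (row, column)."""
    clade = frozenset(clade)
    return [[int(clade in results[(r, c)]) for c in columns] for r in rows]


# Hypothetical results: for each combination of two analysis parameters,
# the set of clades (as frozensets of taxon names) found in the resulting tree.
AB, CD, AC = frozenset("AB"), frozenset("CD"), frozenset("AC")
results = {
    (1, 1): {AB, CD}, (1, 2): {AB, CD},
    (2, 1): {AB},     (2, 2): {AC},
}

rug = navajo_rug("AB", results, rows=[1, 2], columns=[1, 2])
for row in rug:
    # '#' = clade present, '.' = clade absent
    print("".join("#" if cell else "." for cell in row))
```

Here the clade A+B is found in three of the four parameter combinations, so its rug has a single empty cell.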
Robustness
- Ability to resist data disturbance
- Example: bootstrap and jackknife:
  1. random data resampling
  2. analysis of the resampled data
  3. repeat N times
  4. → proportion of the analyses in which a given clade occurs
[Figure: page 231 of R. Kawahara et al. / Molecular Phylogenetics and Evolution 46 (2008) 224–236, showing its Fig. 3: "Maximum likelihood tree using 123rRTn data set. Topological incongruities between analyses (partitioned ML and Bayesian analyses) denoted by arrowheads. Numbers beside internal branches indicate bootstrap values (above 50%) from 100 replicates and Bayesian PPs, respectively (shown as percentages). “–” indicates node not recovered in the other analysis. Thick branches supported by 100% bootstrap values and PPs. Gasterosteiform fishes indicated by black bars, perciform suborders by gray, non-perciform orders by white."]
Blaise Li IGH, 20/09/2013 34 / 45
Contributions to methods in phylogeny Contribution to secondary analyses
Robustness
- Ability to withstand perturbation of the data
- Example: bootstrap and jackknife:
    1. random data resampling
    2. analysis of the resampled data
    3. repeat N times
    4. → proportion of the analyses in which a given clade occurs
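The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the alignment is a plain dict of equal-length strings, and `closest_pair_clades` is a hypothetical toy stand-in for a real tree inference (it simply groups the two most similar taxa). All function names are made up for this example.

```python
import random
from itertools import combinations


def bootstrap_replicate(alignment, rng):
    """Step 1: resample alignment columns (sites) with replacement."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[i] for i in columns)
            for taxon, seq in alignment.items()}


def closest_pair_clades(alignment):
    """Toy stand-in for a phylogenetic analysis (assumption, not a real method):
    return a single 'clade' made of the two taxa with the fewest differences."""
    def hamming(x, y):
        return sum(a != b for a, b in zip(alignment[x], alignment[y]))
    pair = min(combinations(sorted(alignment), 2), key=lambda p: hamming(*p))
    return {frozenset(pair)}


def bootstrap_support(alignment, infer_clades, n_replicates=100, seed=0):
    """Proportion of replicates in which each clade is recovered."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_replicates):                        # step 3: N times
        replicate = bootstrap_replicate(alignment, rng)  # step 1: resampling
        for clade in infer_clades(replicate):            # step 2: analysis
            counts[clade] = counts.get(clade, 0) + 1
    # Step 4: proportion of the analyses in which each clade occurs.
    return {clade: n / n_replicates for clade, n in counts.items()}


alignment = {"a": "AAAAAAAA", "b": "AAAAAAAA", "c": "CCCCCCCC", "d": "GGGGGGGG"}
support = bootstrap_support(alignment, closest_pair_clades, n_replicates=50)
```

A jackknife would follow the same loop, but step 1 would drop a subset of the columns instead of drawing them with replacement.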
Reliability
- Sensitivity and robustness reflect the strength of the signal (historical or artefact-inducing).
- Are the inferred relationships between species compatible with the ‘true’ tree? Do they reflect the real phylogeny well?
- The big problem: the real tree is unknown (usually, it is the one we are searching for).
- A possible proxy: agreement between analyses.
Combined or separate analyses?
- In a combined analysis, one hopes that the historical signal adds up and emerges above the misleading signal.
- But sometimes, well-supported results obtained from a combined analysis are due to a strong misleading signal in one of the datasets.
- Non-historical signal may vary depending on the dataset.
- But the history of the taxa is unique.
- Clades repeatedly found across datasets should represent true historical relationships.
A repetition index for clades
682 A. Dettai, G. Lecointre / C. R. Biologies 328 (2005) 674–689
Table 4. Table of repeated clades. X represents groups present in a given analysis, no marks represents groups contradicted by an analysis. For the MP analyses, x: groups present in majority rule consensus only; X: groups present in strict consensus, X: bootstrap value above 80%. For the BPIM analyses, x: posterior probability between 0.50 and 0.59, x: posterior probability between 0.60 and 0.69, X: posterior probability between 0.70 and 0.89, X: posterior probability between 0.90 and 1. +: taxon intruding in repeated group. −: taxon escaping from repeated group. /: inserting or escaping taxa form a clade. In the column ‘supertree’, clades present in the strict consensus supertree are marked ‘X’. Question marks mean that the corresponding clade is collapsed in that strict consensus. The taxon name abbreviations are presented in the left hand column and in the following list: Ah, Atherina; Ai, Antigonia; As, Astronotus; Au, Austrolycus; B, Bothidae; Bo, Bothus; Ce, Cetostoma; Ci, Chelidonichthys; Cr, Carapus; Cs, Coryphaenoides; Cu, Citharus; Dc, Dicentrarchus; Dr, Drepane; El, Elassoma; Fi, Fistularia; Ga, Gadus; Gs, Gasterosteus; Hi, Hippocampus; Lg, Lagocephalus; Me, Merlangius; Mo, Mora; My, Myripristis; Os, Ostichthys; Ot, Ostracion; Oy, Oryzias; Pd, Pomadasys; Ps, Psettodes; Pt, Pomatoschistus; Sn, Sargocentron; Sr, Serranus; Su, Syacium; Sy, Syngnathus; Tet, Tetraodontidae; Tr, Trachinus; Ve, Metavelifer
Dettai and Lecointre (2005):
- table showing repeated clades across different analyses
- ‘handcrafted’ → headaches and errors!
- had to be formalized, and summarized into a clade support that could be calculated automatically
- The trees to compare must be obtained using independent datasets.
- Combining data may improve result accuracy, hence the partial combinations technique.
Independence
- What are the minimal parts of my data that I may keep separate?
- When non-independent parts are analysed separately, the same biased result might occur repeatedly.
- Do not analyse separately genes that are physically linked (they may share a non-species tree).
- Do not analyse separately genes whose products are in direct molecular interaction (co-evolution and perhaps co-functional convergence).
Partial combinations
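[Diagram: trees on taxa a, b, c, d, e obtained from the elementary datasets A, B and C, from the total combination A ∪ B ∪ C, and from the partial combination A ∪ B. Clade α is marked on the trees from C, A ∪ B ∪ C and A ∪ B.]

data sets        number of occurrences for α
A, B, C          1
A ∪ B ∪ C        1
A ∪ B, C         2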
Partitioning schemes
Datasets:
- Elementary datasets: A, B and C
- Total combination: A ∪ B ∪ C
- Partial combinations: B ∪ C, A ∪ C and A ∪ B
Analyse, and then compare ‘independent’ trees:
- Partitioning schemes: (A, B, C), (A, B ∪ C), (B, A ∪ C), (C, A ∪ B) and (A ∪ B ∪ C)
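The partitioning schemes listed above are the set partitions of the elementary datasets. A minimal Python sketch of how they could be enumerated for any number of datasets follows; the function name `partitioning_schemes` and the list-of-lists representation are assumptions made for this example, not taken from an existing tool.

```python
def partitioning_schemes(datasets):
    """Yield every way of splitting `datasets` into non-empty groups.

    Each scheme is a list of groups; a group with several elementary
    datasets stands for their combination (e.g. ["A", "B"] for A ∪ B).
    """
    datasets = list(datasets)
    if not datasets:
        yield []
        return
    first, rest = datasets[0], datasets[1:]
    for scheme in partitioning_schemes(rest):
        # Combine `first` with each existing group in turn...
        for i, group in enumerate(scheme):
            yield scheme[:i] + [[first] + group] + scheme[i + 1:]
        # ...or keep it as a separate dataset.
        yield [[first]] + scheme


# Print one scheme per line, writing combinations with "+".
for scheme in partitioning_schemes(["A", "B", "C"]):
    print(", ".join("+".join(group) for group in scheme))
```

The number of schemes grows quickly with the number of elementary datasets (5 schemes for 3 datasets, 15 for 4, 52 for 5), which is one reason to automate the enumeration.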
Sums of occurrences
- The more often a clade is repeated independently, the more reliable it is. → sums of occurrences
- Choose the partitioning scheme providing the highest sum (optimal signal extraction).
- Sum of occurrences for clade α:
  \[ N(\alpha) = \max_{PSc \in SP} \Big( \sum_{d \in PSc} \delta_{\alpha,d} \Big) \]
  \( \delta_{\alpha,d} \): 1 if clade α is obtained with dataset d, else 0
  d: dataset (elementary or combined)
  PSc: partitioning scheme
  SP: set of all partitioning schemes
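As an illustration, the formula can be evaluated on the partial-combinations example, whose counts imply that clade α is obtained with C, A ∪ B and A ∪ B ∪ C. This is only a sketch: the name `sum_of_occurrences` and the data layout are assumptions, and the partial combinations A ∪ C and B ∪ C, which that example does not show, are assumed here not to yield α.

```python
def sum_of_occurrences(found_in, schemes):
    """N(alpha): highest number of datasets yielding the clade
    within a single partitioning scheme."""
    return max(
        sum(dataset in found_in for dataset in scheme)  # sum of delta over d
        for scheme in schemes
    )


A, B, C = frozenset("A"), frozenset("B"), frozenset("C")

# The five partitioning schemes for three elementary datasets.
schemes = [
    [A, B, C],
    [A, B | C],
    [B, A | C],
    [C, A | B],
    [A | B | C],
]

# Datasets whose tree contains clade alpha (assumed for illustration).
found_in = {C, A | B, A | B | C}

n_alpha = sum_of_occurrences(found_in, schemes)
print(n_alpha)
```

With these inputs the scheme (C, A ∪ B) gives the highest sum, 2, which matches the count in the example table.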
Application
- Repetition index based on this sum of occurrences, and taking into account incompatible results between analyses
- Implementation in Python
- Application to 4 nuclear protein-coding genes in a large group of fish (Acanthomorpha)
- How to count clades when primary analyses don’t have exactly the same set of taxa?
- How to summarize the results?
- Presentation available for those interested in fish phylogeny
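The repetition index itself is not specified here, so the following only sketches the standard notion of incompatibility it has to deal with: on a common set of taxa, two clades can belong to the same rooted tree only if they are nested or disjoint. The function name `compatible` and the example taxa are hypothetical.

```python
def compatible(clade_1, clade_2):
    """Return True if two clades (sets of taxa) can coexist in one rooted tree."""
    clade_1, clade_2 = frozenset(clade_1), frozenset(clade_2)
    return (
        clade_1.isdisjoint(clade_2)
        or clade_1 <= clade_2
        or clade_2 <= clade_1
    )


print(compatible({"a", "b"}, {"a", "b", "c"}))  # nested clades
print(compatible({"a", "b"}, {"d", "e"}))       # disjoint clades
print(compatible({"a", "b"}, {"b", "c"}))       # overlapping clades
```

This check assumes both analyses share the same taxa; it does not address the open question of differing taxon sets raised in the list above.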
Conclusions
- My PhD and post-doc work concerned the quality of phylogenetic results at two different levels:
  - Making better use of the data during the primary analyses.
  - Comparing the results of primary analyses to identify reliable results.
- Secondary analyses can be used to identify, a posteriori, good methods for primary analyses: methods that better extract the historical signal should yield more coherent results.
- I hope this presentation gave you an idea of what research in phylogenetic methods may look like.
Thanks for your attention
- Contact: [email protected]