Contributions to methods in phylogeny

My PhD and post-doc works

Blaise Li

Institut de Génétique Humaine - 20/09/2013

Blaise Li IGH, 20/09/2013 1 / 45


Outline

• During my PhD and two post-docs, I’ve mainly worked on methodological aspects of phylogeny.

• You may not be very familiar with this, so I’ll start by briefly reminding you of a few things about (molecular) phylogeny.

• Then I’ll present my most recent work, from my second post-doc: trying to avoid obtaining wrong trees in the presence of composition biases in the genomes of bacteria and chloroplasts.

• And finally I’ll briefly talk about my PhD work: trying to extract reliable groups from a bunch of fish phylogenies.


My first post-doc in one slide

• I’ll skip what I did during my first post-doc, so here it is in one slide.

• I was supposed to test a new phylogenetic inference algorithm on real data, but we failed to make the algorithm perform well, so the project was abandoned.

• So I used my time to study computer science and to do some amateur work on human population genetics.

• I gathered worldwide population polymorphism data from various studies, ran a kind of classification program on it, and wrote scripts to generate graphical output in large numbers.

• I wrote a detailed commentary on the results, speculating about such things as a possible link between the ancestors of pygmies, ‘bushmen’, and various so-called ‘negrito’ populations of south and south-east Asia.


Part I

Reminders


Molecular phylogeny, short and simplified

• DNA sequences accumulate mutations as time passes and diverge between branches of the tree of life.

• We gather homologous (i.e. deriving from a common ancestor) sequence data and want to infer the evolutionary history that led to the observed sequences.

• The relationships between the sequences can be represented by a tree whose branch lengths are proportional to the quantity of mutations accumulated since the ancestor represented by the branching point.

• We use mathematics and computing to search for the tree that, we hope, best represents the true genealogical relationships between the sequences.
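As a rough illustration of sequences diverging, here is a minimal Python sketch that counts the proportion of differing sites between aligned sequences. The function name `p_distance` and the three toy sequences are invented for this example; real analyses use the model-based approach described on the next slide.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned (same length)")
    # Keep only sites where both sequences have an unambiguous base.
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


# Toy aligned sequences (invented, not real data).
seq_1 = "ACGTACGTAC"
seq_2 = "ACGTACGAAC"
seq_3 = "ACTTACGAAT"

for name, (x, y) in {"1-2": (seq_1, seq_2),
                     "1-3": (seq_1, seq_3),
                     "2-3": (seq_2, seq_3)}.items():
    print(name, p_distance(x, y))
```

In this toy case, sequences 1 and 2 differ least, which is the kind of signal a tree would summarise by placing them close together.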


Molecular phylogeny, short and simplified

• We set up a model for sequence evolution: basically, probabilities of substitution between nucleotides or between amino-acids.

• Algorithms are used to explore the possible topologies, sets of branch lengths, and model parameters.

• Each candidate tree is evaluated by its likelihood: the probability of observing the sequences as they are, assuming they evolved according to our model along the branches of that tree.

• We retain the tree with the highest likelihood (Maximum Likelihood): this is what I call a primary analysis.
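To make the likelihood idea concrete, here is a minimal sketch for the simplest possible case: two sequences joined by a single branch. It assumes the Jukes-Cantor model (all substitutions equally probable), which is a choice made for this example and not necessarily the model used in the analyses presented here. The function names are hypothetical.

```python
import math


def jc69_log_likelihood(n_sites: int, n_diff: int, d: float) -> float:
    """Log-likelihood of two aligned sequences separated by a branch of
    length d (expected substitutions per site) under Jukes-Cantor."""
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay  # same base at both ends of the branch
    p_diff = 0.25 - 0.25 * decay  # one specific different base at the other end
    # Per site: 1/4 for the base at one end, times the transition probability.
    return (n_sites * math.log(0.25)
            + (n_sites - n_diff) * math.log(p_same)
            + n_diff * math.log(p_diff))


def jc69_ml_distance(n_sites: int, n_diff: int) -> float:
    """Closed-form maximum-likelihood branch length under Jukes-Cantor."""
    p = n_diff / n_sites
    if p >= 0.75:
        raise ValueError("sequences are saturated; no finite estimate")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Toy data: 100 aligned sites, 20 of which differ.
grid = [i / 1000 for i in range(1, 2000)]
best = max(grid, key=lambda d: jc69_log_likelihood(100, 20, d))
print(best, jc69_ml_distance(100, 20))
```

The grid search plays the role of the "exploration" step: it tries many branch lengths and keeps the one with the highest likelihood. With a real tree, the search also covers topologies and model parameters.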


Some vocabulary

[Figure: example tree of fish taxa (Batrachoidiformes, Gobioidei, Apogonidae, Indostomus, Symbranchoidei, Mastacembeloidei, Channoidei, Anabantoidei), with internal groups labelled by Greek and Latin letters (e.g. β, γ, η, Q, F, L). The tree is annotated step by step to introduce the terms below.]

• Branch
• (Internal) node
• Taxa (leaves, terminals)
• Clades: a clade is a monophyletic group (an ancestor and all its descendants)
• Sister-groups: two sister-groups form a clade
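The vocabulary above can be sketched in Python with a tree written as nested tuples. The taxa A to E and the topology are hypothetical, chosen only to illustrate the definitions; the helper names are invented for this example.

```python
def leaves(tree):
    """Return the set of leaf (taxon) names under a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Return the leaf sets of all clades: each node with all its descendants."""
    found = {leaves(tree)}
    if not isinstance(tree, str):
        for child in tree:
            found |= clades(child)
    return found


def is_monophyletic(tree, group):
    """True if `group` is exactly the leaf set of one clade of `tree`."""
    return frozenset(group) in clades(tree)


# Hypothetical rooted tree: ((A, B), (C, (D, E)))
toy_tree = (("A", "B"), ("C", ("D", "E")))

print(is_monophyletic(toy_tree, {"D", "E"}))       # D and E are sister-groups
print(is_monophyletic(toy_tree, {"C", "D", "E"}))  # C is sister to (D, E)
print(is_monophyletic(toy_tree, {"B", "C"}))       # not a clade in this tree
```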


Contributions to methods in phylogeny Contribution to primary analyses

Part II

Dealing with composition convergence


The endosymbiotic origin of plastids

[Figure (after Keeling, 2010): diagram of plastid origins. Groups shown: Cyanobacteria, Glaucophyta, green algae, red algae, land plants, chromalveolates, euglenids. Events shown: one primary endosymbiosis and two secondary endosymbioses.]


The endosymbiotic origins of plastids

[Figure: reproduction of page 732 of P. J. Keeling, "The origin and fate of plastids" (review), Phil. Trans. R. Soc. B (2010), Figure 2. The diagram shows primary, secondary, serial secondary (green alga) and tertiary (diatom, cryptomonad, haptophyte) endosymbioses linking glaucophytes, red algae, green algae, land plants, euglenids, chlorarachniophytes, Paulinella, cryptomonads, haptophytes, stramenopiles, ciliates, Apicomplexa and dinoflagellates (including Dinophysis, Lepidodinium, Durinskia and Karlodinium).]


The endosymbiotic origin of plastids

[Figure: placement of plastids relative to Cyanobacteria of sections I, III and IV.]


Phylogenetic difficulties

Old events are generally difficult to infer correctly:

• mutational saturation (sequence randomization)
• enough time for divergences and convergences in modes of evolution

The difficulty is amplified by endosymbiosis:

• simplification (drift and loss of genes)
• gene relocation (bacteria → host nucleus)

→ Standard methods of analysis can produce artefactual groupings.
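Mutational saturation can be illustrated with a small calculation. The sketch below assumes the Jukes-Cantor model (an assumption made for this example only): as the true number of substitutions per site grows, the observed proportion of differing sites levels off at 3/4, so very old divergences all look alike.

```python
import math


def expected_p_distance(d: float) -> float:
    """Expected proportion of differing sites after d substitutions per site,
    under Jukes-Cantor (assumed here for illustration)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


for d in (0.1, 0.5, 1.0, 2.0, 5.0):
    print(f"{d:4.1f} substitutions/site -> "
          f"{expected_p_distance(d):.3f} observed differences/site")
```

Once the curve is flat, extra substitutions leave almost no trace in the data, which is why old events carry little recoverable signal.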


Phylogenetic difficulties

Overly straightforward analyses give conflicting results.

• rDNA and amino-acid data: early divergence of plastids
• protein-coding gene data: plastids close to pluricellular Cyanobacteria

What is the cause of this incongruence?

→ We studied the phenomenon on a dataset of protein-coding genes from plastids (or relocated in the plant host nucleus) and their (cyano)bacterial homologues.


Dataset

• 42 taxa, including 8 outgroup (non-cyano)bacteria, 16 Cyanobacteria, and plastids from 1 Glaucophyta, 4 Rhodophyta (red algae) and 13 Viridiplantae (green plants)

• Cyanobacteria groups present:
    • NOST-1 (section IV)
    • OSC-2 (section III)
    • SPM-3, SO-6, GBACT, UNIT+ (section I)

• 75 protein-coding genes (but with some missing sequences, 14% overall)

• Concatenated dataset (cg75) and its translation (cp75)
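A minimal sketch of how a concatenated nucleotide dataset and its translation can be built, in the spirit of cg75 and cp75. This is not the pipeline actually used: the helper names, the toy genes and taxa are invented, missing sequences are simply filled with gaps, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed here).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def concatenate(genes, taxa):
    """Join per-gene codon alignments into one sequence per taxon.
    A taxon missing from a gene is filled with '-' to keep columns aligned."""
    result = {taxon: "" for taxon in taxa}
    for gene in genes:
        length = len(next(iter(gene.values())))
        for taxon in taxa:
            result[taxon] += gene.get(taxon, "-" * length)
    return result


def translate(seq):
    """Translate codon by codon: '---' gives '-', unknown codons give 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "-" if seq[i:i + 3] == "---" else "X")
        for i in range(0, len(seq) - 2, 3))


# Toy data: two genes, the second one missing for taxon t2.
toy_genes = [{"t1": "ATGGCC", "t2": "ATGGCA"}, {"t1": "TGGTAA"}]
toy_nucleotides = concatenate(toy_genes, ["t1", "t2"])
toy_proteins = {taxon: translate(seq) for taxon, seq in toy_nucleotides.items()}
print(toy_nucleotides)
print(toy_proteins)
```

Note that GCC and GCA both translate to the same amino-acid: the two taxa differ at the nucleotide level but not at the protein level, which is one reason the two kinds of data can behave differently.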


‘Standard’ ML analyses

[Figure: two maximum-likelihood trees with support values on the branches, one from cg75 and one from its translation, cp75. Both show the outgroups (Firmicutes, Chloroflexi, Chlorobi, Proteobacteria), a "basal" GBACT group (Gloeobacter_violaceus, Synechococcus_JA33Ab), the cyanobacterial groups SO-6, UNIT+ (Thermosynechococcus_elongatus, Cyanothece_PCC7425, Acaryochloris_marina), SPM-3, NOST-1 and OSC-2, and the plastids (Glaucophyta, Rhodophyta, Streptophyta, Chlorophyta). Annotations on the cg75 tree: "pluricellulars" and "grade of Cyanobacteria". Annotation on the cp75 tree: "core" Cyanobacteria.]


‘Standard’ ML analyses

• cp75 is a direct translation of cg75
    → The trees should be the same.

• But the analyses conflict in the identification of the plastid sister-group.
    → Something is wrong!

→ Can we have confidence in one of these trees?

Blaise Li IGH, 20/09/2013 16 / 45
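To make "direct translation" concrete, here is a minimal sketch of how an amino-acid alignment such as cp75 could be derived codon by codon from an in-frame nucleotide alignment such as cg75. The function name `translate` and the handling of gaps are illustrative assumptions; the slides do not say which tool was actually used.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ('*' = stop).
GENETIC_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(aligned_cds):
    """Translate an in-frame, aligned coding sequence codon by codon.

    Hypothetical helper: a fully gapped codon becomes '-', and any codon
    that is not in the table (partial gaps, ambiguity codes) becomes 'X'.
    """
    protein = []
    usable_length = len(aligned_cds) - len(aligned_cds) % 3
    for i in range(0, usable_length, 3):
        codon = aligned_cds[i:i + 3].upper()
        if codon == "---":
            protein.append("-")
        else:
            protein.append(GENETIC_CODE.get(codon, "X"))
    return "".join(protein)
```

Because each amino-acid column comes from exactly one codon column, both data sets describe the same sites and the same history, which is why conflicting trees point to an artefact.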

Nucleotides or amino-acids?

- Nucleotide sequences are more likely to randomize with time:
  - codon degeneracy → lowered selective pressure
  - only 4 states → convergence likely
- But estimation of substitution rates is easier for nucleotides (4 × 4 substitution matrix).
- And there may not be enough variability in amino-acid sequences to sort out relationships within recent groups.
- Here, we are at a large scale, and low bootstrap supports suggest conflicting signals for nucleotides...
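A rough way to see why rate estimation is easier with 4 states: the sketch below counts the free parameters of a general time-reversible substitution model for a given number of states. Assuming a fully general reversible model is my simplification (the slides do not name the models used), and amino-acid analyses often rely on fixed empirical matrices precisely to avoid estimating this many rates.

```python
def free_parameters(n_states):
    """Free parameters of a general time-reversible model with n_states states."""
    # Symmetric exchangeabilities, one of which is fixed to set the scale.
    exchangeabilities = n_states * (n_states - 1) // 2 - 1
    # Equilibrium frequencies sum to 1, so one of them is not free.
    frequencies = n_states - 1
    return exchangeabilities + frequencies


print(free_parameters(4))   # nucleotides
print(free_parameters(20))  # amino acids
```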

Nucleotide composition attraction

We focus on a particular type of reconstruction artefact: nucleotide composition attraction.

- Different groups may have different mutational biases and codon preferences.
- This influences the composition of the genome.
- Sites under low selection constraint tend to conform to that composition.

→ Shared mutational biases and codon preferences between (possibly distant) groups may induce convergence in the nucleotide sequence, especially at the 3rd codon position.
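One simple way to look for such composition differences is to measure the G+C proportion separately at each codon position. The following is a minimal sketch; the function name `gc_by_codon_position` is hypothetical, and it assumes an in-frame coding sequence in which gaps and ambiguous characters are simply ignored.

```python
def gc_by_codon_position(cds):
    """Return the G+C proportions at codon positions 1, 2 and 3."""
    # For each codon position: [number of G or C, number of unambiguous bases].
    counts = [[0, 0] for _ in range(3)]
    for i, base in enumerate(cds.upper()):
        if base in "ACGT":
            counts[i % 3][1] += 1
            if base in "GC":
                counts[i % 3][0] += 1
    return [gc / total if total else float("nan") for gc, total in counts]
```

Comparing these three values across taxa shows whether distant groups share a similar bias, and at which codon position the bias is strongest.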

Nucleotide composition attraction

Genetic code (rows: 1st base; columns: 2nd base):

      T          C          A          G
T   TTT Phe    TCT Ser    TAT Tyr    TGT Cys
    TTC Phe    TCC Ser    TAC Tyr    TGC Cys
    TTA Leu    TCA Ser    TAA Ter    TGA Ter
    TTG Leu    TCG Ser    TAG Ter    TGG Trp
C   CTT Leu    CCT Pro    CAT His    CGT Arg
    CTC Leu    CCC Pro    CAC His    CGC Arg
    CTA Leu    CCA Pro    CAA Gln    CGA Arg
    CTG Leu    CCG Pro    CAG Gln    CGG Arg
A   ATT Ile    ACT Thr    AAT Asn    AGT Ser
    ATC Ile    ACC Thr    AAC Asn    AGC Ser
    ATA Ile    ACA Thr    AAA Lys    AGA Arg
    ATG Met    ACG Thr    AAG Lys    AGG Arg
G   GTT Val    GCT Ala    GAT Asp    GGT Gly
    GTC Val    GCC Ala    GAC Asp    GGC Gly
    GTA Val    GCA Ala    GAA Glu    GGA Gly
    GTG Val    GCG Ala    GAG Glu    GGG Gly

Composition and codon usage biases

[Figure: the cg75 tree with branch support values, shown alongside two plots for each taxon:
 - codon usage log10 ratios (LeuT/LeuC, SerAG/SerTC, ArgA/ArgC), axis from -3 to 1;
 - G+C proportion by codon position (1: −, 2: +, 3: ×), axis from 0 to 1.
Successive overlays highlight: 3rd pos. G+C; 1st pos. G+C; ArgA bias; LeuT bias.]
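The codon usage ratios in the figure can be sketched as follows. I am assuming that the labels split each six-codon family by its first two bases (LeuT = TTA/TTG versus LeuC = CTN, SerAG = AGT/AGC versus SerTC = TCN, ArgA = AGA/AGG versus ArgC = CGN); this reading of the labels, and the name `codon_usage_log_ratios`, are not confirmed by the slides.

```python
import math
from collections import Counter

# Assumed reading of the labels: numerator codons, denominator codons.
FAMILIES = {
    "LeuT/LeuC": ({"TTA", "TTG"}, {"CTT", "CTC", "CTA", "CTG"}),
    "SerAG/SerTC": ({"AGT", "AGC"}, {"TCT", "TCC", "TCA", "TCG"}),
    "ArgA/ArgC": ({"AGA", "AGG"}, {"CGT", "CGC", "CGA", "CGG"}),
}


def codon_usage_log_ratios(cds):
    """Return log10 codon-count ratios for the Leu, Ser and Arg families.

    A ratio is None when either side has no codon in the sequence.
    """
    usable_length = len(cds) - len(cds) % 3
    codons = Counter(cds[i:i + 3].upper() for i in range(0, usable_length, 3))
    ratios = {}
    for name, (numerator, denominator) in FAMILIES.items():
        n = sum(codons[c] for c in numerator)
        d = sum(codons[c] for c in denominator)
        ratios[name] = math.log10(n / d) if n and d else None
    return ratios
```

These ratios are informative because switching between the two halves of such a family changes the 1st (and, for Ser, the 2nd) codon position without changing the amino acid.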

3rd codon position removal

Genetic code with the 3rd position removed (rows: 1st base; columns: 2nd base):

      T          C          A          G
T   TT- Phe    TC- Ser    TA- Tyr    TG- Cys
    TT- Phe    TC- Ser    TA- Tyr    TG- Cys
    TT- Leu    TC- Ser    TA- Ter    TG- Ter
    TT- Leu    TC- Ser    TA- Ter    TG- Trp
C   CT- Leu    CC- Pro    CA- His    CG- Arg
    CT- Leu    CC- Pro    CA- His    CG- Arg
    CT- Leu    CC- Pro    CA- Gln    CG- Arg
    CT- Leu    CC- Pro    CA- Gln    CG- Arg
A   AT- Ile    AC- Thr    AA- Asn    AG- Ser
    AT- Ile    AC- Thr    AA- Asn    AG- Ser
    AT- Ile    AC- Thr    AA- Lys    AG- Arg
    AT- Met    AC- Thr    AA- Lys    AG- Arg
G   GT- Val    GC- Ala    GA- Asp    GG- Gly
    GT- Val    GC- Ala    GA- Asp    GG- Gly
    GT- Val    GC- Ala    GA- Glu    GG- Gly
    GT- Val    GC- Ala    GA- Glu    GG- Gly

3rd codon position removal

[Figure: ML trees for cg75 and for cg75_no3 (3rd codon positions removed), with branch support values. Same taxa in both trees; in the cg75_no3 tree, UNIT+ is drawn as a single terminal group.]

3rd codon position removal

- UNIT+ monophyly restored.
- But some signal not corresponding to synonymous substitutions was lost.
- This signal can be partially saved by recoding instead of removing.
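Removing 3rd codon positions from an in-frame alignment is a one-line filter. This is a minimal sketch with a hypothetical function name, assuming the sequence starts at a 1st codon position.

```python
def remove_third_positions(cds):
    """Keep only 1st and 2nd codon positions of an in-frame sequence."""
    # Index i is 0-based, so i % 3 == 2 marks every 3rd codon position.
    return "".join(base for i, base in enumerate(cds) if i % 3 != 2)
```

The cost is visible in the genetic code: codon pairs such as CAT (His) and CAA (Gln) differ only at the 3rd position, so non-synonymous differences of that kind are discarded together with the synonymous ones.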

3rd codon position recoding

Genetic code with the 3rd position recoded using IUPAC ambiguity codes (rows: 1st base; columns: 2nd base):

      T          C          A          G
T   TTY Phe    TCN Ser    TAY Tyr    TGY Cys
    TTY Phe    TCN Ser    TAY Tyr    TGY Cys
    TTN Leu    TCN Ser    TAR Ter    TGR Ter
    TTN Leu    TCN Ser    TAR Ter    TGG Trp
C   CTN Leu    CCN Pro    CAY His    CGN Arg
    CTN Leu    CCN Pro    CAY His    CGN Arg
    CTN Leu    CCN Pro    CAR Gln    CGN Arg
    CTN Leu    CCN Pro    CAR Gln    CGN Arg
A   ATH Ile    ACN Thr    AAY Asn    AGN Ser
    ATH Ile    ACN Thr    AAY Asn    AGN Ser
    ATH Ile    ACN Thr    AAR Lys    AGN Arg
    ATG Met    ACN Thr    AAR Lys    AGN Arg
G   GTN Val    GCN Ala    GAY Asp    GGN Gly
    GTN Val    GCN Ala    GAY Asp    GGN Gly
    GTN Val    GCN Ala    GAR Glu    GGN Gly
    GTN Val    GCN Ala    GAR Glu    GGN Gly

3rd codon position recoding

[Figure: ML trees for cg75 and for cg75_degen3 (3rd codon positions recoded), with branch support values. Same taxa in both trees; in the cg75_degen3 tree, UNIT+ is drawn as a single terminal group.]

3rd codon position recoding

- Similar effect as no3: UNIT+ monophyly restored.
- But codon degeneracy exists at other positions, associated with switches between Leu, Arg and Ser families.
  → We can try to recode these degenerate codon positions too.
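The recoded table can be reproduced with one simple rule: replace the 3rd base of each codon by the IUPAC code for all 3rd bases found among the codons encoding the same amino acid (or stop). The sketch below implements that rule; it matches the table on the slide, but the rule itself and the names `degenerate_third_position` and `degen3` are my reconstruction, not necessarily the implementation actually used.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ('*' = stop).
GENETIC_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
# IUPAC ambiguity codes, indexed by the set of bases they stand for.
IUPAC = {
    frozenset(bases): code
    for code, bases in {
        "A": "A", "C": "C", "G": "G", "T": "T",
        "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
        "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
    }.items()
}


def degenerate_third_position(codon):
    """Recode the 3rd base of one codon; unknown codons are left untouched."""
    codon = codon.upper()
    aa = GENETIC_CODE.get(codon)
    if aa is None:  # gaps, ambiguity codes, incomplete codons
        return codon
    thirds = frozenset(c[2] for c, a in GENETIC_CODE.items() if a == aa)
    return codon[:2] + IUPAC[thirds]


def degen3(cds):
    """Apply the 3rd-position recoding to an in-frame coding sequence."""
    return "".join(
        degenerate_third_position(cds[i:i + 3]) for i in range(0, len(cds), 3)
    )
```

Unlike plain removal, this keeps CAY (His) distinct from CAR (Gln), while synonymous 3rd-position differences within one codon block no longer count as changes.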

Degenerating all synonymous codon positions

Genetic code with all synonymous positions recoded using IUPAC ambiguity codes (rows: 1st base; columns: 2nd base):

      T          C          A          G
T   TTY Phe    WSN Ser    TAY Tyr    TGY Cys
    TTY Phe    WSN Ser    TAY Tyr    TGY Cys
    YTN Leu    WSN Ser    TAR Ter    TGR Ter
    YTN Leu    WSN Ser    TAR Ter    TGG Trp
C   YTN Leu    CCN Pro    CAY His    MGN Arg
    YTN Leu    CCN Pro    CAY His    MGN Arg
    YTN Leu    CCN Pro    CAR Gln    MGN Arg
    YTN Leu    CCN Pro    CAR Gln    MGN Arg
A   ATH Ile    ACN Thr    AAY Asn    WSN Ser
    ATH Ile    ACN Thr    AAY Asn    WSN Ser
    ATH Ile    ACN Thr    AAR Lys    MGN Arg
    ATG Met    ACN Thr    AAR Lys    MGN Arg
G   GTN Val    GCN Ala    GAY Asp    GGN Gly
    GTN Val    GCN Ala    GAY Asp    GGN Gly
    GTN Val    GCN Ala    GAR Glu    GGN Gly
    GTN Val    GCN Ala    GAR Glu    GGN Gly

Degenerating all synonymous codon positions

[Figure: ML trees for cg75 and for cg75_degen (all synonymous codon positions recoded), with branch support values. Same taxa in both trees; the cg75_degen tree is annotated "core" Cyanobacteria.]

Degenerating all synonymous codon positions

- Core Cyanobacteria are sister to plastids, as when using amino-acids.
- 1st and 2nd position signal significantly contributes to composition attraction.
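The full recoding can be sketched as a lookup: every codon is replaced by a single degenerate codon for its amino acid, taken from the table on the slide (YTN for Leu, WSN for Ser, MGN for Arg, and so on). Stop codons follow the slide too and are only recoded at the 3rd position (TAR, TGR). The function names are hypothetical, and this is an illustration rather than the implementation actually used.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ('*' = stop).
GENETIC_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
# One degenerate codon per amino acid, copied from the recoded table.
DEGENERATE = {
    "F": "TTY", "L": "YTN", "S": "WSN", "Y": "TAY", "C": "TGY",
    "W": "TGG", "P": "CCN", "H": "CAY", "Q": "CAR", "R": "MGN",
    "I": "ATH", "M": "ATG", "T": "ACN", "N": "AAY", "K": "AAR",
    "V": "GTN", "A": "GCN", "D": "GAY", "E": "GAR", "G": "GGN",
}


def degenerate_codon(codon):
    """Recode one codon; unknown codons (gaps, ambiguities) are left untouched."""
    codon = codon.upper()
    aa = GENETIC_CODE.get(codon)
    if aa is None:
        return codon
    if aa == "*":  # stops: TAA/TAG -> TAR, TGA -> TGR, as in the table
        return codon[:2] + "R"
    return DEGENERATE[aa]


def degenerate_all(cds):
    """Recode every codon of an in-frame coding sequence."""
    return "".join(degenerate_codon(cds[i:i + 3]) for i in range(0, len(cds), 3))
```

After this recoding, two sequences differ at a codon only if they encode different amino acids, yet the data can still be analysed with nucleotide models.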

Conclusions

- Incongruence between nucleotide and amino-acid data is mainly due to G+C convergence biases. It is likely that plastids diverged early from the Cyanobacteria.
- rDNA sequences are directly constrained by selection; this might explain why they give results similar to amino-acid data.

Contribution to secondary analyses

Part III

Identifying reliable clades

Clade supports

- In the previous part, I mentioned ‘bootstrap supports’.
- Other types of clade support exist. They give confidence indications for the clades.
- My PhD work involved the development of a kind of support indicator comparing trees from primary analyses based on several phylogenetic markers (secondary analysis).
- I’ll present a classification of support types to introduce the ‘reliability index’ I developed.

Sensitivity

- Ability to withstand changes in the method.
- Example: ‘Navajo rugs’:
  1. analyses repeated with different parameter values
  2. clade occurrences recorded for each parameter combination
  3. with 2 parameters → small binary matrix to place near the branch

[Figure: page 327 of G. Giribet et al., Organisms, Diversity & Evolution 4 (2004) 319–340. Fig. 4: "Summary cladograms showing most parsimonious topologies for combined analysis of molecular and morphological data for parameter set 121 (A) and immediately suboptimal parameter set 111 (B). Monophyly of clades in 12 explored parameter sets is indicated (black square = monophyly; gray square = monophyly in some of a set of equally parsimonious resolutions; white square = non-monophyly). Cladogram A shows groups in two equally parsimonious resolutions of Crustacea+Collembola+Ectognatha."
Rug axes: gap/change (1, 2, 4) and transversion/transition (1, 2, 4, 8). Labelled clades: Collembola + Ectognatha; Crustacea + Ectognatha.
Terminals: Chilopoda, Symphyla, Protura, "Japygoidea", Campodeidae, Malacostraca, Entomostraca, Collembola, Archaeognatha, Tricholepidion, Zygentoma s.s., Pterygota.]
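The three steps above can be sketched as follows. The data layout is an assumption made for illustration: each analysis is stored as the set of clades it recovered (a clade being a frozenset of taxon names), keyed by its pair of parameter values. The function name and parameter names are hypothetical.

```python
def navajo_rug(clade, clades_by_parameters, gap_costs, tv_ts_ratios):
    """Build the binary matrix for one clade.

    Rows follow gap_costs, columns follow tv_ts_ratios; a cell is 1 when the
    analysis run with that parameter pair recovered the clade, else 0.
    Parameter pairs with no recorded analysis count as 0.
    """
    clade = frozenset(clade)
    return [
        [int(clade in clades_by_parameters.get((gap, ratio), ()))
         for ratio in tv_ts_ratios]
        for gap in gap_costs
    ]


# Toy example with made-up taxa A, B, C and three analyses.
results = {
    (1, 1): {frozenset({"A", "B"})},
    (1, 2): {frozenset({"A", "C"})},
    (2, 1): {frozenset({"A", "B"})},
}
for row in navajo_rug({"A", "B"}, results, gap_costs=[1, 2], tv_ts_ratios=[1, 2]):
    print(row)
```

A clade whose matrix is filled with 1s is insensitive to the parameters explored. The published figure also uses a third, gray state for clades present in only some equally parsimonious resolutions, which this binary sketch does not represent.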

Page 98: MyPhDandpost-docworks BlaiseLi …

Contributions to methods in phylogeny Contribution to secondary analyses

Sensitivity

I Ability to resist to changes in the methodI Example: ‘Navajo rugs’:

1. analyses repeated with different parameter values2. clade occurrences recorded for each parameter combination

3. with 2 parameters → smallbinary matrix to place near thebranch

Beutel and Gorb 2001; Giribet et al. 2001; Wheeler et al.2001), although molecular biologists have had troublefinding data to support this hypothesis (but see Mallattet al. 2004). Recent analyses with dense character andtaxon sampling including several lineages of ‘basal’hexapods have suggested hexapod monophyly (Carpen-ter and Wheeler 1999; Wheeler et al. 2001), althoughunusually autapomorphic sequences can yield patternsincongruent with morphology (Giribet et al. 2001; Nardi

et al. 2003). In the present analyses, monophyly ofHexapoda is found only for morphological data inisolation (Fig. 1), because the molecular and combinedanalyses nest crustaceans and symphylans within theHexapoda when chilopods are specified as outgroups.The non-monophyly of hexapods is certainly shock-ing from a morphological perspective. In addition tothe unique thoracic tagmosis of hexapods (ch. 64;see Appendix A), other apomorphic characters of

ARTICLE IN PRESS

Chi

lopo

da

Sym

phyl

a

Pro

tura

"Jap

ygoi

dea"

Cam

pode

idae

Mal

acos

trac

a

Ent

omos

trac

a

Col

lem

bola

Arc

haeo

gnat

ha

Tric

hole

pidi

on

Zyg

ento

ma

s.s.

Pte

rygo

ta

Collembola + Ectognatha

Crustacea + Ectognatha

Chi

lopo

da

Sym

phyl

a

Pro

tura

"Jap

ygoi

dea"

Cam

pode

idae

Mal

acos

trac

a

Ent

omos

trac

a

Col

lem

bola

Arc

haeo

gnat

ha

Tric

hole

pidi

on

Zyg

ento

ma

s.s.

Pte

rygo

ta

1 2 4 8

124

Gap

/cha

nge

transversion/transition

A

B

Fig. 4. Summary cladograms showing most parsimonious topologies for combined analysis of molecular and morphological data

for parameter set 121 (A) and immediately suboptimal parameter set 111 (B). Monophyly of clades in 12 explored parameter sets is

indicated (black square=monophyly; gray square=monophyly in some of a set of equally parsimonious resolutions; white

square=non-monophyly). Cladogram A shows groups in two equally parsimonious resolutions of Crustacea+Collembola+Ec-

tognatha.

G. Giribet et al. / Organisms, Diversity & Evolution 4 (2004) 319–340 327

Blaise Li IGH, 20/09/2013 33 / 45

Page 99: MyPhDandpost-docworks BlaiseLi …

Contributions to methods in phylogeny Contribution to secondary analyses

Sensitivity

I Ability to resist to changes in the methodI Example: ‘Navajo rugs’:

1. analyses repeated with different parameter values

2. clade occurrences recorded for each parameter combination

3. with 2 parameters → smallbinary matrix to place near thebranch

Beutel and Gorb 2001; Giribet et al. 2001; Wheeler et al.2001), although molecular biologists have had troublefinding data to support this hypothesis (but see Mallattet al. 2004). Recent analyses with dense character andtaxon sampling including several lineages of ‘basal’hexapods have suggested hexapod monophyly (Carpen-ter and Wheeler 1999; Wheeler et al. 2001), althoughunusually autapomorphic sequences can yield patternsincongruent with morphology (Giribet et al. 2001; Nardi

et al. 2003). In the present analyses, monophyly ofHexapoda is found only for morphological data inisolation (Fig. 1), because the molecular and combinedanalyses nest crustaceans and symphylans within theHexapoda when chilopods are specified as outgroups.The non-monophyly of hexapods is certainly shock-ing from a morphological perspective. In addition tothe unique thoracic tagmosis of hexapods (ch. 64;see Appendix A), other apomorphic characters of

ARTICLE IN PRESS

Chi

lopo

da

Sym

phyl

a

Pro

tura

"Jap

ygoi

dea"

Cam

pode

idae

Mal

acos

trac

a

Ent

omos

trac

a

Col

lem

bola

Arc

haeo

gnat

ha

Tric

hole

pidi

on

Zyg

ento

ma

s.s.

Pte

rygo

ta

Collembola + Ectognatha

Crustacea + Ectognatha

Chi

lopo

da

Sym

phyl

a

Pro

tura

"Jap

ygoi

dea"

Cam

pode

idae

Mal

acos

trac

a

Ent

omos

trac

a

Col

lem

bola

Arc

haeo

gnat

ha

Tric

hole

pidi

on

Zyg

ento

ma

s.s.

Pte

rygo

ta

1 2 4 8

124

Gap

/cha

nge

transversion/transition

A

B

Fig. 4. Summary cladograms showing most parsimonious topologies for combined analysis of molecular and morphological data

for parameter set 121 (A) and immediately suboptimal parameter set 111 (B). Monophyly of clades in 12 explored parameter sets is

indicated (black square=monophyly; gray square=monophyly in some of a set of equally parsimonious resolutions; white

square=non-monophyly). Cladogram A shows groups in two equally parsimonious resolutions of Crustacea+Collembola+Ec-

tognatha.

G. Giribet et al. / Organisms, Diversity & Evolution 4 (2004) 319–340 327

Blaise Li IGH, 20/09/2013 33 / 45

Contributions to methods in phylogeny Contribution to secondary analyses

Sensitivity

I Ability to withstand changes in the method
I Example: ‘Navajo rugs’:
1. analyses repeated with different parameter values
2. clade occurrences recorded for each parameter combination
3. with 2 parameters → small binary matrix to place near the branch

[Figure: Fig. 4 of G. Giribet et al., Organisms, Diversity & Evolution 4 (2004) 319–340, p. 327. Summary cladograms showing most parsimonious topologies for combined analysis of molecular and morphological data for parameter set 121 (A) and immediately suboptimal parameter set 111 (B). Monophyly of clades in 12 explored parameter sets is indicated (black square = monophyly; gray square = monophyly in some of a set of equally parsimonious resolutions; white square = non-monophyly). Cladogram A shows groups in two equally parsimonious resolutions of Crustacea + Collembola + Ectognatha. Terminal taxa: Chilopoda, Symphyla, Protura, "Japygoidea", Campodeidae, Malacostraca, Entomostraca, Collembola, Archaeognatha, Tricholepidion, Zygentoma s.s., Pterygota. Axes of the Navajo rugs: gap/change and transversion/transition.]
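The Navajo-rug procedure listed on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the taxa A to D, the trees and the parameter values are made up, trees are written as nested tuples, and `clades_of` and `navajo_rug` are hypothetical helper names, not functions from any phylogenetics package.

```python
def clades_of(tree):
    """Return the set of clades (frozensets of leaf names) of a nested-tuple tree."""
    clades = set()

    def leaves(node):
        if isinstance(node, str):          # a leaf is a taxon name
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        clades.add(members)                # every internal node defines a clade
        return members

    leaves(tree)
    return clades


def navajo_rug(clade, trees_by_params, rows, cols):
    """Binary matrix: 1 where `clade` is recovered for the (row, col) parameter pair."""
    target = frozenset(clade)
    return [
        [int(target in clades_of(trees_by_params[(r, c)])) for c in cols]
        for r in rows
    ]


# Hypothetical results: one tree per (gap cost, transversion/transition ratio) pair.
trees_by_params = {
    (1, 1): ((("A", "B"), "C"), "D"),
    (1, 2): ((("A", "B"), "C"), "D"),
    (2, 1): ((("A", "C"), "B"), "D"),
    (2, 2): (("A", "B"), ("C", "D")),
}

for row in navajo_rug({"A", "B"}, trees_by_params, rows=[1, 2], cols=[1, 2]):
    print(" ".join("#" if cell else "." for cell in row))
```

Each printed row corresponds to one value of the first parameter and each column to one value of the second, so the small grid can be drawn next to the branch leading to the clade.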


Robustness

I Ability to withstand perturbations of the data
I Example: bootstrap and jackknife:
1. random data resampling
2. analysis of the resampled data
3. steps 1 and 2 repeated N times
4. → proportion of the analyses in which a given clade occurs

[Figure: Fig. 3 of R. Kawahara et al., Molecular Phylogenetics and Evolution 46 (2008) 224–236, p. 231. Maximum likelihood tree using 123rRTn data set. Topological incongruities between analyses (partitioned ML and Bayesian analyses) denoted by arrowheads. Numbers beside internal branches indicate bootstrap values (above 50%) from 100 replicates and Bayesian PPs, respectively (shown as percentages). ‘‘–’’ indicates node not recovered in the other analysis. Thick branches supported by 100% bootstrap values and PPs. Gasterosteiform fishes indicated by black bars, perciform suborders by gray, non-perciform orders by white.]
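The four bootstrap steps on this slide can be written down as a short Python sketch. Everything here is an assumption made for illustration: the toy alignment is invented, and `upgma_clades` is a deliberately simple stand-in (average-linkage clustering on p-distances) for whatever tree-building analysis is actually used. A jackknife would differ only in step 1, drawing a subset of sites without replacement.

```python
import random
from collections import Counter
from itertools import combinations


def p_distance(s, t):
    """Proportion of differing sites between two aligned sequences."""
    return sum(a != b for a, b in zip(s, t)) / len(s)


def upgma_clades(alignment):
    """Toy stand-in for a tree analysis; returns the non-trivial clades it finds."""
    clusters = [frozenset([name]) for name in alignment]
    clades = set()

    def dist(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(p_distance(alignment[a], alignment[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > 2:               # the last merge would only give the root
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        merged = c1 | c2
        clusters = [c for c in clusters if c not in (c1, c2)] + [merged]
        clades.add(merged)
    return clades


def bootstrap_support(alignment, analysis, n_replicates=100, seed=0):
    """Proportion of resampled analyses in which each clade occurs."""
    rng = random.Random(seed)
    n_sites = len(next(iter(alignment.values())))
    counts = Counter()
    for _ in range(n_replicates):                                    # step 3: N times
        columns = [rng.randrange(n_sites) for _ in range(n_sites)]   # step 1: resample sites
        resampled = {name: "".join(seq[i] for i in columns)
                     for name, seq in alignment.items()}
        counts.update(analysis(resampled))                           # step 2: analyse
    return {clade: n / n_replicates for clade, n in counts.items()}  # step 4: proportions


# Hypothetical four-taxon alignment, for illustration only.
toy_alignment = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAA",
    "C": "ACGAACGGTC",
    "D": "TCGAACGGTC",
}

support = bootstrap_support(toy_alignment, upgma_clades, n_replicates=200, seed=1)
for clade, value in sorted(support.items(), key=lambda item: sorted(item[0])):
    print("+".join(sorted(clade)), round(value, 2))
```

Sites are resampled with replacement and the same columns are used for every taxon, so each replicate is an alignment of the original length.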


Reliability

I Sensitivity and robustness reflect the strength of the signal (historical or artefact-inducing).
I Are inferred relationships between species compatible with the ‘true’ tree? Do they reflect the real phylogeny well?
I The big problem: the real tree is unknown (usually, it’s the one we are searching for).
I A possible proxy: agreement between analyses.


Combined or separate analyses?

I In a combined analysis, one hopes that the historical signal adds up and emerges above misleading signal.
I But sometimes, well-supported results obtained from a combined analysis are due to a strong misleading signal in one of the datasets.
I Non-historical signal may vary depending on the dataset.
I But the history of the taxa is unique.
I Clades repeatedly found across datasets should represent true historical relationships.


A repetition index for clades

Dettai and Lecointre (2005):
I table showing repeated clades across different analyses
I ‘handcrafted’ → headaches and errors!
I had to be formalized, and summarized into a clade support that could be calculated automatically

[Table: Table 4 of A. Dettai, G. Lecointre, C. R. Biologies 328 (2005) 674–689, p. 682. Table of repeated clades. X represents groups present in a given analysis, no marks represents groups contradicted by an analysis. For the MP analyses, x: groups present in majority rule consensus only; X: groups present in strict consensus, X: bootstrap value above 80%. For the BPIM analyses, x: posterior probability between 0.50 and 0.59, x: posterior probability between 0.60 and 0.69, X: posterior probability between 0.70 and 0.89, X: posterior probability between 0.90 and 1. +: taxon intruding in repeated group. −: taxon escaping from repeated group. /: inserting or escaping taxa form a clade. In the column ‘supertree’, clades present in the strict consensus supertree are marked ‘X’. Question marks mean that the corresponding clade is collapsed in that strict consensus. The taxon name abbreviations are presented in the left hand column and in the following list: Ah, Atherina; Ai, Antigonia; As, Astronotus; Au, Austrolycus; B, Bothidae; Bo, Bothus; Ce, Cetostoma; Ci, Chelidonichthys; Cr, Carapus; Cs, Coryphaenoides; Cu, Citharus; Dc, Dicentrarchus; Dr, Drepane; El, Elassoma; Fi, Fistularia; Ga, Gadus; Gs, Gasterosteus; Hi, Hippocampus; Lg, Lagocephalus; Me, Merlangius; Mo, Mora; My, Myripristis; Os, Ostichthys; Ot, Ostracion; Oy, Oryzias; Pd, Pomadasys; Ps, Psettodes; Pt, Pomatoschistus; Sn, Sargocentron; Sr, Serranus; Su, Syacium; Sy, Syngnathus; Tet, Tetraodontidae; Tr, Trachinus; Ve, Metavelifer.]
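To give an idea of what building such a table automatically might look like, here is a minimal Python sketch. It is not the repetition index itself, whose definition is not given on this slide: it only records which analyses recover each clade and keeps the clades found more than once. The analysis names `gene1` to `gene3`, the taxa A to D and the helper names are all hypothetical, and a dot only means "not recovered", which is weaker than "contradicted".

```python
def repeated_clades(clades_by_analysis, min_occurrences=2):
    """Map each clade found in at least `min_occurrences` analyses to those analyses."""
    occurrences = {}
    for analysis, clades in clades_by_analysis.items():
        for clade in clades:
            occurrences.setdefault(frozenset(clade), []).append(analysis)
    return {clade: names for clade, names in occurrences.items()
            if len(names) >= min_occurrences}


def format_table(repeated, analyses):
    """One row per repeated clade, with an X under each analysis recovering it."""
    lines = []
    for clade, found_in in sorted(repeated.items(), key=lambda item: sorted(item[0])):
        marks = " ".join("X" if name in found_in else "." for name in analyses)
        lines.append(f"{'+'.join(sorted(clade)):<12} {marks}")
    return "\n".join(lines)


# Hypothetical clades recovered by three separate analyses.
clades_by_analysis = {
    "gene1": [{"A", "B"}, {"A", "B", "C"}],
    "gene2": [{"A", "B"}, {"C", "D"}],
    "gene3": [{"A", "B"}, {"A", "B", "C"}],
}

repeated = repeated_clades(clades_by_analysis)
print(format_table(repeated, ["gene1", "gene2", "gene3"]))
```

Keeping the list of analyses per clade, rather than a bare count, leaves room for weighting analyses differently later on, for instance when some of them share data.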


Page 120: MyPhDandpost-docworks BlaiseLi …

Contributions to methods in phylogeny Contribution to secondary analyses

A repetition index for clades

682 A. Dettai, G. Lecointre / C. R. Biologies 328 (2005) 674–689

Table 4Table of repeated clades. X represents groups present in a given analysis, no marks represents groups contradicted by an analysis. For the MPanalyses, x: groups present in majority rule consensus only; X: groups present in strict consensus,X: bootstrap value above 80%. For the BPIManalyses, x: posterior probability between 0.50 and 0.59,x: posterior probability between 0.60 and 0.69, X: posterior probability between 0.70and 0.89,X: posterior probability between 0.90 and 1.+: taxon intruding in repeated group.−: taxon escaping from repeated group./: insertingor escaping taxa form a clade. In the column ‘supertree’, clades present in the strict consensus supertree are marked ‘X’. Question marks meanthat the corresponding clade is collapsed in that strict consensus. The taxon name abbreviations are presented in the left hand column and in thefollowing list: Ah, Atherina; Ai, Antigonia; As, Astronotus; Au, Austrolycus; B, Bothidae; Bo,Bothus; Ce,Cetostoma; Ci, Chelidonichthys; Cr,Carapus; Cs,Coryphaenoides; Cu,Citharus; Dc, Dicentrarchus; Dr, Drepane; El, Elassoma; Fi, Fistularia; Ga,Gadus; Gs,Gasterosteus; Hi,Hippocampus; Lg, Lagocephalus; Me,Merlangius; Mo, Mora; My, Myripristis; Os,Ostichthys; Ot,Ostracion; Oy,Oryzias; Pd,Pomadasys; Ps,Psettodes; Pt,Pomatoschistus; Sn,Sargocentron; Sr,Serranus; Su,Syacium; Sy,Syngnathus; Tet, Tetraodontidae; Tr,Trachinus;Ve, Metavelifer

Dettai and Lecointre (2005):I table showing repeated

clades across differentanalyses

I ‘handcrafted’ → headachesand errors!

I had to be formalized, andsummarized into a cladesupport that could becalculated automatically

Blaise Li IGH, 20/09/2013 37 / 45

Page 121: MyPhDandpost-docworks BlaiseLi …

Contributions to methods in phylogeny Contribution to secondary analyses

A repetition index for clades

- The trees to compare must be obtained using independent datasets.
- Combining data may improve result accuracy, hence the partial combinations technique.

Independence

- What are the minimal parts of my data that I may keep separate?
- When non-independent parts are analysed separately, the same biased result might occur repeatedly.
- Do not analyse separately genes that are physically linked (they may share a non-species tree).
- Do not analyse separately genes that code for products in direct molecular interaction (co-evolution and perhaps co-functional convergence).

Partial combinations

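[Figure: trees obtained from the elementary datasets A (leaf order a c b d e), B (c d a b e) and C (a b c e d), from the total combination A ∪ B ∪ C (a b c d e) and from the partial combination A ∪ B (a b d e c). The clade label α appears with the trees from C, A ∪ B ∪ C and A ∪ B.]

Data sets        Number of occurrences for α
A, B, C          1
A ∪ B ∪ C        1
A ∪ B, C         2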

Partitioning schemes

Datasets:
- Elementary datasets: A, B and C
- Total combination: A ∪ B ∪ C
- Partial combinations: B ∪ C, A ∪ C and A ∪ B

Analyse, and then compare ‘independent’ trees:
- Partitioning schemes: (A, B, C), (A, B ∪ C), (B, A ∪ C), (C, A ∪ B) and (A ∪ B ∪ C)
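The partitioning schemes listed above are the ways of grouping the elementary datasets into non-overlapping blocks. A minimal sketch of how they could be enumerated follows; the function name `partitioning_schemes` is hypothetical and this is not necessarily how the actual implementation proceeds. Blocks are printed with `+` standing for ∪.

```python
def partitioning_schemes(datasets):
    """Yield every grouping of the elementary datasets into disjoint blocks."""
    if not datasets:
        yield []
        return
    first, rest = datasets[0], datasets[1:]
    for scheme in partitioning_schemes(rest):
        # Keep `first` as a block of its own...
        yield [frozenset([first])] + scheme
        # ...or combine it with each existing block in turn.
        for i, block in enumerate(scheme):
            yield scheme[:i] + [block | {first}] + scheme[i + 1:]


schemes = list(partitioning_schemes(["A", "B", "C"]))
for scheme in schemes:
    print(" | ".join("+".join(sorted(block)) for block in scheme))
```

With three elementary datasets this yields the five schemes of the slide. The number of schemes grows quickly with the number of elementary datasets, so exhaustive enumeration is only practical for a handful of them.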

Sums of occurrences

- The more a clade is repeated independently, the more reliable it is. → sums of occurrences
- Choose the partitioning scheme providing the highest sum (optimal signal extraction).
- Sum of occurrences for clade α:

\[ N(\alpha) = \max_{PSc \in SP} \Bigl( \sum_{d \in PSc} \delta_{\alpha,d} \Bigr) \]

  δ_{α,d}: 1 if clade α is obtained with dataset d, else 0
  d: dataset (elementary or combined)
  PSc: partitioning scheme
  SP: set of all partitioning schemes
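As a worked example of this formula, the sketch below computes N(α) for three elementary datasets. It is an illustration under stated assumptions, not the actual implementation: the names `clade_found`, `schemes` and `sum_of_occurrences` are hypothetical, the presence values for C, A ∪ B and A ∪ B ∪ C follow the partial-combinations example, and the values for A ∪ C and B ∪ C are made up for the purpose of the example.

```python
# delta_{alpha,d}: is clade alpha found in the tree obtained from dataset d?
# Each dataset is written as the set of elementary datasets it combines.
clade_found = {
    frozenset("A"): False,
    frozenset("B"): False,
    frozenset("C"): True,
    frozenset("AB"): True,
    frozenset("AC"): False,   # assumption, not shown in the example
    frozenset("BC"): False,   # assumption, not shown in the example
    frozenset("ABC"): True,
}

# SP: the five partitioning schemes of three elementary datasets.
schemes = [
    [frozenset("A"), frozenset("B"), frozenset("C")],
    [frozenset("A"), frozenset("BC")],
    [frozenset("B"), frozenset("AC")],
    [frozenset("C"), frozenset("AB")],
    [frozenset("ABC")],
]


def sum_of_occurrences(clade_found, schemes):
    """Return N(alpha): the highest per-scheme count of trees containing alpha."""
    return max(sum(clade_found[d] for d in scheme) for scheme in schemes)


n_alpha = sum_of_occurrences(clade_found, schemes)
print(n_alpha)
```

Here the maximum is reached with the scheme (C, A ∪ B), where α is found twice, so N(α) = 2. Summing within a scheme only counts trees built from non-overlapping data, which is what keeps the repetitions independent.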

Application

- Repetition index based on this sum of occurrences, and taking into account incompatible results between analyses
- Implementation in Python
- Application to 4 nuclear protein-coding genes in a large group of fish (Acanthomorpha)
- How to count clades when primary analyses don’t have exactly the same set of taxa?
- How to summarize the results?
- Presentation available for those interested in fish phylogeny

Conclusions

- My PhD and post-doc work concerned the quality of phylogenetic results at two different levels:
  - Making better use of the data during the primary analyses.
  - Comparing the results of primary analyses to identify reliable results.
- Secondary analyses can be used to identify, a posteriori, good methods for primary analyses: methods that better extract the historical signal should yield more coherent results.
- I hope this presentation gave you an idea of what research in phylogenetic methods may look like.

Thanks for your attention

I Contact: [email protected]
