
wesfiles.wesleyan.edu · Web viewMissing Microbes and Alanna Collen’s 10% Human are extremely engaging book-length accounts of the importance of our gut microbiomes in human health


Evolutionary and Ecological Bioinformatics
Biology/Computer Science 327, Fall 2015
Professors Fred Cohan and Danny Krizanc

DATE LECTURER LECTURE TITLE TEXTBOOK READINGS

Sept. 8 Cohan 1. Bioinformatic approaches to ecology and evolution Ch. 1,2
Sept. 10 Krizanc 2. Algorithms in everyday life and in research
Sept. 15 Cohan 3. Approaches to phylogeny through overall similarity of organisms (phenetics vs. cladistics)
Sept. 17 Krizanc 4. Alignment of DNA and protein sequences Ch. 3, 4, 12
Sept. 22 Krizanc 5. Distance-based algorithms for estimating relationships (UPGMA and NJ) Ch. 5, 6
Sept. 24 Krizanc 6. Maximum parsimony approach to phylogeny; search algorithms for finding the best phylogeny Ch. 5, 8
Sept. 29 Krizanc 7. Models of molecular evolution (including Jukes-Cantor, neutral theory, transition-transversion); incorporating molecular models in maximum likelihood algorithms for phylogeny estimation Ch. 9, pp. 75-78
Oct. 1 Krizanc 9. Bayesian approaches to phylogeny and your own life Ch. 10
Oct. 6 Krizanc 8. Testing the robustness of a tree pp. 82-89
Oct. 8 Krizanc 10. Gene trees vs. species trees; splits trees and phylogenetic networks Ch. 15
Oct. 13 Cohan 11. The importance of using phylogeny for testing hypotheses about natural selection; phylogenetic algorithms for testing natural selection
Oct. 15 Krizanc 12. Assembly algorithms for genome sequencing, from isolates, metagenomes, and uncultivated single cells (Velvet)
Oct. 20 Cohan and Krizanc 13. Algorithms for structural annotation (where are the genes) and databases for functional annotation (what are the genes)
Oct. 22 Krizanc 14. Genome-based trees (based on gene content, gene order, and sequence of concatenation) (FastTree); supertrees.
Oct. 27 Fall break
Oct. 29 Cohan 15. Gene duplication in evolution; genome-wide analysis of adaptation through gene acquisition
Nov. 3 Cohan 16. Analyses of adaptation through changes in genome-wide gene expression
Nov. 5 Cohan and Krizanc 17. Research projects
Nov. 10 Cohan and Krizanc 18. Genome-wide approaches for finding shared genes under recent positive selection Ch. 14
Nov. 12 Cohan 19. Metagenomics in ecosystems biology: how to find out the physiological processes occurring in an ecosystem even when we don’t know who the organisms are
Nov. 17 Cohan 20. Metagenomic approaches for characterizing community-wide organismal diversity; sorting sequences into taxa
Nov. 19 GIS Guest speaker: Sophie Breitbart 21. Ecological niche modeling
Nov. 24 (cancelled)
Nov. 26 Thanksgiving
Dec. 1 Cohan 22. Metagenomic approaches to finding out what unidentified genes do (ecological annotation); bioinformational bioprospecting
Dec. 3 Cohan 23. The human microbiome: types of communities across humans, functional screening for novel genes, antibiotic holocausts and health consequences
Dec. 8 Cohan 24. Baseball, biology, global climate change, and big data
Dec. 10 Cohan and Krizanc 25. Molecular approaches for identifying microbial diversity in natural communities: AdaptML and Ecotype Simulation

Registrar-scheduled time for our Final Exam, Zelnick Pavilion: POSTER SESSION

HOMEWORK ASSIGNMENTS
Due Oct. 6 1. Make a tree (with help from computer algorithms)
Due Oct. 22 2. A pencil and paper phylogenetic problem set
Due Dec. 1 3. Ecological niche modeling
Due Dec. 3 4. Project abstract
Due Dec. 8 5. Comparing genomes to characterize past natural selection

TERM PROJECT
Due at time of final exam: Poster on research project
Due at time of final exam: Paper on (the same) research project

GRADING
Homework assignments 50%
Term project poster 20%
Term project paper 30%

READINGS

Textbook: Phylogenetic Trees Made Easy: A How-To Manual, Fourth Edition, Barry G. Hall, 2011, Sinauer Associates.
Supplementary Readings will be listed on the class WesFiles web site.

CONTACT INFORMATION (Email is the best way to set up an appointment.)

Fred Cohan
207 [email protected]
Office hours: Fridays 1:15-2:15, and by appointment

Danny Krizanc
631 Exley Science [email protected]

Abby Cram
111 [email protected]
Office hours: Tuesdays 1:00-2:00 (with additional times to be announced)

Sophie Breitbart
QAC Office, [email protected]
Office hours: Mondays 3:00-4:00, Wednesdays 3:00-4:00, Thursdays 7:00-9:00 pm (except Nov. 19 & Dec. 3) at the QAC, and by appointment. You can also email her.

November 18, 2015


Evolutionary and Ecological Bioinformatics
Biology/Computer Science 327, Fall 2015
Supplementary Reading

Sep. 8

1. Bioinformatic approaches to ecology and evolution

Ginsberg et al. give a really nice example of the Big Data approach, in this case to predict influenza levels before the CDC can, based on Google search queries (Ginsberg et al., 2009). See also Salathé et al. (Salathé et al., 2013). Larson et al. provide phylogenetic evidence that wild pigs were domesticated in six different places around Eurasia (Larson et al., 2005); similarly, Thalmann et al. show that dogs were domesticated four times in Europe (Thalmann et al., 2013). Keeling and Palmer chart phylogenetically the most significant horizontal transfer events in eukaryotic history (Keeling and Palmer, 2008). Mikkelsen et al. have identified those genes in the genome that have been under selection for new adaptations in humans (Mikkelsen et al., 2005). Merhej et al. compared bacterial genomes to test whether different lineages evolving independently toward pathogenicity (or mutualism) tend to lose the same genes convergently (Merhej et al., 2009). (They do!) Christina Richards et al. explored the circumstances under which gene expression changed over the course of an organism’s life, in the case of the plant Arabidopsis (Richards et al., 2012). Fierer et al. explored how the bacterial community on hands varies between the left and right hands and between people, and the effects of washing on hands’ bacterial communities (Fierer et al., 2008). Knight et al. showed, in a meta-analysis across various high-impact studies from the Earth Microbiome Project, how the similarity of environment drives the similarity of bacterial communities (Knight et al., 2012).

Sep. 10

2. Algorithms in everyday life and in research

Harel’s Chapter 4 is a "gentle" introduction to the notion of NP-completeness or why some problems are hard for computers to solve (Harel, 2000).

Sep. 15

3. Approaches to phylogeny through overall similarity of organisms

Nosenko et al. give a recent phylogeny of animals based on various genes; they explain how to choose the best set of genes when genes differ in the phylogenies they yield; for our purposes, the article shows how some evolutionary groups that were not obvious from morphological similarities were discovered through sequence analysis (Nosenko et al., 2013). Related to this, Adoutte et al. show how morphological and sequence data yield different relationships among animal phyla (Adoutte et al., 2000). Funch and Kristensen present their discovery of an animal phylum (Funch and Kristensen, 1995). Schloss and Handelsman present a phylogeny of the bacterial phyla, showing that most of the phyla do not have even a single cultivated species (Schloss and Handelsman, 2004). My colleagues and I offer an example of discovering new taxa based on sequence data alone, at the level of new genera and new species (Kim et al., 2012). My recent encyclopedia chapter on species gives an overview of the concepts of species, including the dynamic qualities species have long been expected to have (Cohan, 2013). Mallet gives a species concept based on Darwin’s idea that two species should have no or very little overlap in a set of distinguishing characteristics; his concept does not deal with the dynamic qualities of cohesion, irreversible separateness, and so on (Mallet, 1995). Genoways and Choate, from the heyday of numerical taxonomy, illustrate two ways of presenting data on clustering of organisms by their overall phenotypic similarity (Genoways and Choate, 1972). Kämpfer et al. make a case that species of Streptomyces form distinct, justifiable units when we demarcate species at the 80% similarity level for phenotypic traits (Kämpfer et al., 1991). Futuyma, in his textbook, explains the limitations of the phenetic approach to phylogeny (where all characters are used), and why we should constrain our analyses to those characters that are derived (Futuyma, 1998). Baum and Smith, in their textbook, give a clearer example of using shared, derived characters for making a phylogeny (Baum and Smith, 2013).

Sep. 17

4. Sequence Alignment

Sean Eddy’s article gives a biologist’s view of something called dynamic programming, which is the central idea behind a number of bioinformatics algorithms, including pairwise sequence alignment (Eddy, 2004). I’ve also included the original papers introducing ClustalW (the most commonly used multiple alignment tool), MUSCLE (a newer tool recommended by Hall) (Edgar, 2004) and GUIDANCE (a tool for evaluating the quality of alignments described in Chapter 12 of Hall) (Penn et al., 2010).
Morrison tries to answer the question “Why would phylogeneticists ignore computerized sequence alignment?” and makes some interesting points along the way (Morrison, 2009). His conclusion is that the current tools aren’t good enough.
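To make the dynamic programming idea concrete, here is a minimal sketch of global pairwise alignment scoring in the Needleman-Wunsch style. The function name and the scoring values (match +1, mismatch -1, gap -2) are illustrative assumptions, not taken from the readings or from any particular alignment tool.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the best global alignment score of sequences a and b.

    Scoring values are illustrative assumptions, not a recommended scheme.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[-1][-1]


print(global_alignment_score("ACGT", "AGT"))
```

Each cell depends only on its three neighbors, so the table is filled in time proportional to the product of the two sequence lengths. Recovering the alignment itself would require a traceback through the table, which this sketch omits.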

Sep. 22

5. Distance-based Methods for Phylogeny Construction

I’ve included the original papers describing UPGMA (by Michener and Sokal) (Michener and Sokal, 1957) and Neighbor-Joining (by Saitou and Nei) (Saitou and Nei, 1987). Both are pretty heavy going but interesting. For gentler descriptions of these algorithms I suggest Wikipedia. For a computer science perspective on this and the next three lectures I have also included Mona Singh’s notes (from a course she teaches at Princeton) on phylogeny reconstruction.
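As a rough illustration of how UPGMA works, the sketch below repeatedly merges the two closest clusters and recomputes distances as size-weighted averages. The function name, the input format (a dictionary keyed by pairs of names), and the toy distances are assumptions made for this example. It returns only the tree topology as nested tuples and does not compute branch lengths.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a UPGMA topology as nested tuples.

    names: list of taxon labels.
    dist:  dict mapping (name_a, name_b) pairs to distances (hypothetical format).
    """
    # Each current cluster is keyed by its subtree; the value is its leaf count.
    sizes = {name: 1 for name in names}
    d = {frozenset(pair): value for pair, value in dist.items()}
    while len(sizes) > 1:
        # Find the closest pair of current clusters.
        x, y = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (x, y)
        nx, ny = sizes.pop(x), sizes.pop(y)
        # New distances are averages weighted by cluster size,
        # so every original taxon contributes equally.
        for other in sizes:
            d[frozenset((merged, other))] = (
                nx * d[frozenset((x, other))] + ny * d[frozenset((y, other))]
            ) / (nx + ny)
        sizes[merged] = nx + ny
    return next(iter(sizes))


toy = {("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
       ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10}
print(upgma(["A", "B", "C", "D"], toy))
```

Neighbor-Joining follows the same merge-and-update pattern but chooses the pair to join with a criterion that corrects for unequal rates across lineages, which UPGMA assumes away.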

Sep. 24

6. Maximum parsimony approach

Sep. 29

7. Models of molecular evolution and maximum likelihood approach

The paper by Bos and Posada is a nice review of different models of DNA evolution and how they are used in building trees (Bos and Posada, 2005). The article by Guindon et al. discusses some recent developments in maximum likelihood algorithms that have had a real impact on how fast they are and how large a tree they can construct (Guindon et al., 2010). Sumner et al. discuss why it might not be such a good idea to use the most general model available when estimating trees (Sumner et al., 2012).
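The simplest of the models covered in this lecture, Jukes-Cantor, assumes all substitutions are equally likely. Under that model the observed proportion of differing sites p underestimates the true number of substitutions per site d, and the standard correction is d = -(3/4) ln(1 - (4/3) p). The sketch below applies it to two aligned sequences; the function name and the choice to skip sites with gaps or ambiguous characters are assumptions of this example.

```python
import math


def jukes_cantor_distance(seq1, seq2):
    """Jukes-Cantor corrected distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption of this sketch: ignore sites where either sequence
    # has a gap or an ambiguous character.
    sites = [(x, y) for x, y in zip(seq1.upper(), seq2.upper())
             if x in "ACGT" and y in "ACGT"]
    if not sites:
        raise ValueError("no comparable sites")
    p = sum(x != y for x, y in sites) / len(sites)
    if p >= 0.75:
        # The correction is undefined at saturation.
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


print(jukes_cantor_distance("ACGTACGTAC", "ACGTACGTAA"))
```

The corrected distance is always at least as large as p, because some sites will have changed more than once.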

Oct. 1

8. Bayesian methods

McGrayne discusses implicit, embedded use of Bayesian methods in baseball batting averages and other issues of daily import (McGrayne, 2011) (p. 130). Silver introduces Bayesian analysis using the mysterious panties (or nighty) story (Silver, 2012) (p. 245). Huelsenbeck et al. review the use of Bayesian methods in phylogeny reconstruction (Huelsenbeck et al., 2001). Ronquist and Huelsenbeck introduce the third iteration of the program MrBayes (Ronquist and Huelsenbeck, 2003).

Oct. 6

9. Testing the robustness of trees

The paper by Anisimova and Gascuel introduces an approximate likelihood ratio test that can be used in conjunction with maximum likelihood methods to estimate one’s confidence in the clades of a given tree (Anisimova and Gascuel, 2006). This turns out to be much faster than using non-parametric approaches such as bootstrapping.

Oct. 8

10. Gene trees vs. species trees; splits trees and phylogenetic networks

The paper by Degnan and Rosenberg shows how lineage sorting can cause serious problems when trying to infer the correct species tree from gene trees (Degnan and Rosenberg, 2006). White et al. study the discordance between gene trees for three subspecies of mouse (White et al., 2009). The Iwabe et al. paper uses gene duplication/loss parsimony to root the tree of life (Iwabe et al., 1989). Zmasek and Eddy describe a straightforward algorithm for inferring duplication/loss events given a gene tree and its corresponding species tree (Zmasek and Eddy, 2001). Ropars et al. present genomic evidence for adaptive horizontal gene transfers between cheese fungi; apparently, this was driven by artificial selection for deliciousness (Ropars et al., 2015).

Oct. 13

11. The importance of using phylogeny for testing hypotheses about natural selection; phylogenetic algorithms for testing natural selection

Donoghue presents the classic case for why every evolutionary biologist needs to pay attention to phylogeny (Donoghue, 1989). In their book on comparative biology, Harvey and Pagel explain how phylogeny can be used to make tests of natural selection (Harvey and Pagel, 1991). Probert et al. analyze the relationship between seed longevity and various phenotypic and environmental factors. In one analysis, they perform the tests using a pre-Donoghue, non-phylogenetic approach, and in another, they make a test based on phylogenetically independent contrasts (Probert et al., 2009). Our own Mike Singer presents a very nice phylogenetically independent contrasts analysis of how caterpillar specialization changes the effect of bird predation (Singer et al., 2014). Laurin compares the accuracy of various methods for phylogenetically independent contrasts (Laurin, 2010).

Oct. 15

12. Assembly algorithms for genome sequencing—from isolates, metagenomes, and uncultivated single cells

Zerbino and Birney present Velvet, an algorithm for sequence assembly from very short reads (Zerbino and Birney, 2008). Miller et al. present an overview of algorithms for assembly from short-read sequencing (Miller et al., 2010). Waterston et al. discuss the Celera project as a cannibalization of the worldwide human genome project (Waterston et al., 2002). The Venter group offers a rebuttal (Myers et al., 2002). She et al. discuss the challenge of genome sequencing in the context of duplicated regions (She et al., 2004).

Oct. 20

13. Algorithms for structural annotation (where are the genes) and databases for functional annotation (what are the genes)

Van Domselaar et al. present an overview of genome annotation (van Domselaar et al., 2014). Pruitt et al. describe the current state of NCBI’s Reference Sequences project, against which “extrinsic gene finding” is used to characterize putative ORFs (Pruitt et al., 2012). Hyatt et al. present Prodigal, a recent algorithm for identifying genes by ab initio methods (Hyatt et al., 2010), and Borodovsky et al. present GeneMark, among the first successful ab initio algorithms (Borodovsky et al., 2003). Delcher et al. describe Glimmer (Delcher et al., 2007). Functional annotation requires a reliable database with functions accurately attributed to genes; the UniProt Consortium provides one such database (Consortium, 2013). The KEGG database is another (Kanehisa et al., 2014). The TIGRFAM database of protein families is searchable by nhmmer (Wheeler and Eddy, 2013). Gardy et al. present PSORTb, a program for locating an unknown protein to a portion of a bacterial cell (Gardy et al., 2005). The COG database of 25 broad functional categories of proteins is presented by Tatusov et al. (Tatusov et al., 2003). An example of a RAST annotation is presented by Kopac et al. (Kopac et al., 2014).

Oct. 22

14. Genome-based trees (based on gene content, gene order, and sequence of concatenation) (FastTree); supertrees; need for highly resolved trees.

Rokas et al. empirically show that about 20 genes are all that is required to yield a high-resolution tree based on concatenation of genes (Rokas et al., 2003). Lin et al. present an algorithm for producing genome-based trees of prokaryotes, taking into account the content and structure of the genome (Lin et al., 2009). Huson and Steel present an algorithm for producing a tree based on gene content (Huson and Steel, 2004). Li et al. produce a distance-based tree-building algorithm based on the Kolmogorov distance between genomes (Li et al., 2001). Whidden et al. present an algorithm for producing a supertree while accommodating horizontal genetic transfer (Whidden et al., 2014). Swenson et al. present a two-step algorithm for producing supertrees with enormous numbers of organisms (Swenson et al., 2012).

Oct. 29

15. Gene duplication in evolution; genome-wide analysis of adaptation through gene acquisition

Brenner et al. present their classic result that families of gene duplicates are extremely common in a genome (Brenner et al., 1995). Zhong et al. explore the young duplicated genes specific to various species and lineages of Drosophila fruit flies, and find that there has been much convergence of duplication events across lineages (Zhong et al., 2013). Merhej et al. make a case for convergent losses of genes in the origins of various lineages of obligately intracellular parasites (Merhej et al., 2009). The Welch et al. paper shows the huge magnitude of gene content differences within one bacterial species (Welch et al., 2002). Popa et al. perform a comprehensive network analysis to identify donor-recipient pairs in recent HGT events, showing that most HGT events have involved close relatives (Popa et al., 2011). Nevertheless, this same group also showed that the radical transformation that resulted in the Haloarchaea involved over 1000 HGT events from various bacteria (a different domain) (Nelson-Sathi et al., 2012). Such a radical transformation is hindered by architectural constraints, as I have discussed (Wiedenbeck and Cohan, 2011; Cohan, 2010). Veyrier et al. identify the genes in the various steps from non-pathogenic Mycobacterium species to the human tuberculosis pathogen (Veyrier et al., 2009). Sun et al. used a genomic comparison to figure out the critical step toward Plague’s transmission from fleas (Sun et al., 2014). The bioinformatics for the Plague study was performed by Chain et al. (Chain et al., 2004).

Nov. 3

16. Adaptation through gene acquisition (part 2); analyses of adaptation through changes in genome-wide gene expression

Luo et al. (as discussed by Cohan and Kopac) identify the genes that distinguish environmental from gut-commensal E. coli, and use a bioinformatic approach to show that these changes are adaptive (Luo et al., 2011; Cohan and Kopac, 2011). Kleiner et al. present a case of genomic “reverse ecology,” where genomes indicate aspects of the environment that were previously unknown, in this case the sea grass sediment environment (Kleiner et al., 2012). Bhaya et al. do the same for a hot springs environment (Bhaya et al., 2007). Hao and Golding present evidence that evolution is accelerated when a gene enters a new organism through HGT (Hao and Golding, 2006). Touchon et al. identify HGT events among members of the species taxon E. coli, and show that among closest relatives, nearly all HGT events involve genes without a function for the bacteria (Touchon et al., 2009). Kopac et al. also showed that, among extremely close relatives, nearly all of the genes acquired are not of known function (Kopac et al., 2014). Richards et al. worked out the timing of gene gains and losses in Streptococcus, and found an early period of net gene gains and then a later period of net losses (Richards et al., 2014).

Herring et al. use a genome “re-sequencing” approach to infer that single changes in one gene might have manifold effects on gene expression across the genome (Herring et al., 2006). Ferea et al. present a classic piece of work showing the hundreds of gene expression changes that yeast undergoes as it spontaneously evolves to be aerobic (in the absence of competitors) (Ferea et al., 1999). Sumby et al. use genome-wide gene expression and genome resequencing to show that passaging a non-pathogenic strain of Strep through a mouse brings about evolution of virulence through a single change in a signal transducing gene, which in turn brings about massive changes in gene expression, including dozens of virulence genes (Sumby et al., 2006). Hahne et al. explore in one strain of Bacillus subtilis the various gene expression changes genome-wide that respond to a salinity challenge (Hahne et al., 2010). Dettman et al. discuss the diversity of evolutionary responses among closely related populations to a single selection pressure through the magic of genome-wide gene expression analyses (Dettman et al., 2012). Vital et al. work out the transcriptome differences between commensal and aquatic strains of E. coli (Vital et al., 2015). Gómez-Lozano et al. show the power of RNA-seq to open-endedly discover differences in expression beyond the annotated genome (Gómez-Lozano et al., 2012).

Nov. 10

18. Genome-wide approaches for finding shared genes under recent positive selection

Williamson et al. present a genome-wide analysis of selective sweeps in the human genome, across the entire species and within ethnic groups (Williamson et al., 2007). Pavlidis et al. presented a new algorithm (SweeD) for detecting selective sweeps from an input of thousands of whole-genome sequences (Pavlidis et al., 2013). Here they applied it to detect several genes that underwent a selective sweep on human chromosome 1. Clark et al. performed a genome-wide phylogenetic analysis of positive selection in the human lineage, compared to chimps, and with mouse as the outgroup (Clark et al., 2003). Note how they identified the individual genes under selection in the human lineage, and how they identified functional classes of genes with a particularly high frequency of accelerated evolution in humans. Vos developed a species concept for bacteria based on each ecotype having its own unique history of positive selection (Vos, 2011); you might think about how this idea may yield the same or different demarcations of ecotypes. Our Kopac et al. 2014 article implements this approach to find evidence that the most closely related strains we can find within Bacillus subtilis are already ecologically divergent (Kopac et al., 2014). Vos et al. present their new computer package ODoSE to find bacterial ecotypes as units that are different in their histories of positive selection (Vos et al., 2013). PAML implements a maximum likelihood approach (Yang, 2007).

Nov. 12

19. Metagenomics in ecosystems biology: how to find out the physiological processes occurring in an ecosystem even when we don’t know who the organisms are

Bell et al. present evidence that increasing bacterial diversity increases the productivity of an ecosystem (Bell et al., 2005). Lay et al. investigate the functional diversity in an extremely cold and salty spring at the top of the world; they find that certain functions are found redundantly in a great diversity of organisms, while others are not (Lay et al., 2013). Simon et al. use a metagenomic approach to studying the microbial organismic diversity on a glacier; they also discover the genes responsible for protection against the cold in this community (Simon et al., 2009). McHardy et al. present a package called PhyloPythia, for identifying organisms from a single metagenomic sequence, based on nucleotide composition (McHardy et al., 2007). Cecchini et al. use a metagenomic approach to figure out which organisms provide certain functions in the environment, in this case the ability to utilize prebiotic compounds (Cecchini et al., 2013). McMahon et al. present a functional screen for novel genes that provide a certain function, and they show that the host in which metagenomic segments are cloned makes a big difference in their expression (and ability to be screened) (McMahon et al., 2012). Sommer et al. perform a functional screen for antibiotic resistance genes in human guts; surprisingly, there are many resistance genes that show only a distant relationship to those resistance genes isolated from cultured bacteria (Sommer et al., 2009). Robertson et al. perform a functional screen for novel nitrilases, and are able to chart the history of evolutionary transitions from activity on one enantiomer to activity on another (Robertson et al., 2004). Rinke et al. show how single-cell genomics (i.e., sequencing the entire genome of one cell we cannot culture) adds to our understanding of the functional repertoire of an ecosystem (Rinke et al., 2013). (More from Rinke in the next lecture on the diversity of organisms in bacterial communities.) And one last bit (discovered after the lecture). Rocca et al. have tested the assumption that metagenomic or metatranscriptomic abundance of a gene is correlated with the amount of the process coded by the gene in the ecosystem; the correlation, it turns out, is not so great (Rocca et al., 2015)†. Nayfach et al. have very recently developed a pipeline for accurately annotating the functions of genes in a metagenome (Nayfach et al., 2015)†.

Nov. 17

20. Metagenomic approaches for characterizing community-wide organismal diversity

DeSantis et al. present their algorithm and web site, GreenGenes, for classifying a 16S rRNA sequence to a taxon (DeSantis et al., 2006). Konstantinidis and Tiedje present evidence for criteria (or a range of criteria) of 16S rRNA divergence for demarcating taxa of different ranks (Konstantinidis and Tiedje, 2005). Kim et al. is my foray into discovery of new genera and species by 16S rRNA analysis of environmental DNA (Kim et al., 2012). There are various tools for classifying unknown sequences into taxa, including Qiime (Caporaso et al., 2010), Mothur (Schloss et al., 2009), and BioMaS (Fosso et al., 2015). An approach to gene-based classification to identify elephant populations that are being poached is presented by Wasser et al. (Wasser et al., 2015). Sogin et al. present the first high-throughput sequencing of environmental DNA from a marine habitat, providing evidence that there is an extraordinary diversity of extremely rare organisms (Sogin et al., 2006). Hughes et al. review the various ways to estimate the total richness of a community when we know we haven’t sampled enough organisms to see everyone who is there (Hughes et al., 2001). We briefly revisit Simon et al., who gave an example of characterizing the organismic diversity of a community by assigning protein-coding genes from the metagenome to taxa (Simon et al., 2009); also, we revisit PhyloPythia (McHardy et al., 2007). Hess et al. perform the amazing feat of obtaining a nearly complete genome sequence of various organisms from the metagenome fragments of a cow’s rumen (Hess et al., 2011); Mackelprang et al. obtained a similar result, recovering the sequence of a novel methanogen from permafrost soil (Mackelprang et al., 2011). Rinke et al. provide results from single-cell genome sequencing of various phyla that had never previously been sequenced; this provided evidence for four previously unknown superphyla (Rinke et al., 2013). Just to show that we care about the gene-based discovery of phylogenetic supergroups in non-bacteria, we provide the discovery of superorders of mammals (Bininda-Emonds et al., 2007).
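To illustrate what estimating total richness from an incomplete sample can look like, here is a minimal sketch of the bias-corrected Chao1 estimator, which extrapolates from the number of taxa seen exactly once (singletons) and exactly twice (doubletons). Chao1 is used here as one common example of such an estimator; choosing it is an assumption of this sketch, and the function name and toy sample are hypothetical.

```python
from collections import Counter


def chao1(sample):
    """Bias-corrected Chao1 richness estimate.

    sample: iterable of taxon labels, one per sequenced individual or read.
    """
    counts = Counter(sample)
    s_obs = len(counts)                                   # taxa actually observed
    f1 = sum(1 for c in counts.values() if c == 1)        # singletons
    f2 = sum(1 for c in counts.values() if c == 2)        # doubletons
    # Many singletons relative to doubletons suggests many unseen taxa.
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


reads = ["a", "a", "a", "b", "b", "c", "d", "e"]
print(chao1(reads))
```

When a sample contains no singletons the estimate equals the observed richness, which matches the intuition that the community has been sampled thoroughly.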

Nov. 19

21. Ecological niche modeling (Sophie Breitbart lecture)

Levine et al. present an ecological niche modeling analysis of where the pathogen monkeypox could survive in today’s world (Levine et al., 2007). Why might monkeypox not be able to survive in some of the places where it is predicted to be able to? Peterson and colleagues predict with ENM what habitats the chachalaca of Mexico will be leaving and which it will be entering between now and 2055; more generally, they predict species turnovers among all species of birds and mammals in Mexico (Peterson et al., 2009). Batalden et al. predict the future geographic distribution of monarch butterflies, assuming that their food organisms (milkweeds) can keep up with climate change (Batalden et al., 2007). Peterson predicts the future distribution of malaria in Africa (Peterson, 2009). And now, something completely different, or is it? Lozier et al. present an ENM analysis of Sasquatch sightings (Lozier et al., 2009)!

Dec. 1

22. Metagenomic approaches to finding out what unidentified genes do (ecological annotation); bioinformatic prospecting

Here are the references for the metagenome projects discussed in class (Wu et al., 2009; Turnbaugh et al., 2007; Gilbert et al., 2010; 10K, 2009; Davies et al., 2012; Tyson et al., 2004). Knight et al. and Field et al. plead for a new standard of coverage of environmental data in metagenomics studies (Knight et al., 2012; Field et al., 2011). Plewniak et al. give a nice old-style example of how we can identify the genes responsible for adaptation to a given geochemical stressor, if we already know the genes (Plewniak et al., 2013). Inskeep et al. give a nice example of extremely different sets of geochemical stressors across habitats in a metagenome study (Inskeep et al., 2010). Biddle et al. give an example of less extreme variation among environments, where the same phyla are found everywhere, possibly a good source of ecological annotation (Biddle et al., 2011). Mackay et al. describe the Drosophila melanogaster genetic reference panel, which consists of the genome sequences of 168 inbred lines derived from a single natural population; this is being used to determine the genes responsible for each of many physiological, behavioral, and ecological traits (Mackay et al., 2012).

Dec. 3

23. The human microbiome: types of communities across humans, functional screening for novel genes, antibiotic holocausts and health consequences

Our story today begins with the emergence of the germ theory of disease, and an attitude both within households and in the public health establishment that the only good germ is a dead germ; I recommend The Gospel of Germs by Nancy Tomes as a great narrative of this period, from the 1870s mostly until the antibiotic revolution of the 1940s (Tomes, 1998). Martin Blaser’s Missing Microbes and Alanna Collen’s 10% Human are extremely engaging book-length accounts of the importance of our gut microbiomes in human health (Blaser, 2014; Collen, 2015). Pollan and Velasquez-Manoff have recently written short popular accounts of this issue (Pollan, 2013; Velasquez-Manoff, 2013) http://www.nytimes.com/2013/05/19/magazine/say-hello-to-the-100-trillion-bacteria-that-make-up-your-microbiome.html?ref=magazine . The most direct repercussion of the germ-as-enemy approach, leading to overuse of antibiotics, has been the emergence of antibiotic resistance. Forslund et al. present data on the prevalence of antibiotic resistance in different countries, and the relationship between use of antibiotics for animal agriculture and resistance in the human gut microbiome (Forslund et al., 2013). More recently, we have reached an appreciation for the beneficial qualities of our gut bacteria, and Khosravi and Mazmanian describe the disease-fighting importance of our resident bacteria (Khosravi and Mazmanian, 2013). Pérez-Cobas et al. describe the lasting effect of an antibiotic regimen on the composition of an individual’s gut microbiome (Perez-Cobas et al., 2013). Cho et al. chart the change in the mouse microbiome with early antibiotic treatment (Cho et al., 2012). Liping Zhao presents a proposal for a research field where we use various bioinformatic approaches to determine the organismal changes correlated with obesity and leanness, and then perform experiments to test the effects of the implicated bacteria (Zhao, 2013). Wu et al. make a case that there are two primary clusters of bacterial gut communities across humanity, one dominated by Prevotella and associated with a carbohydrate diet, and another dominated by Bacteroides and associated with a diet high in fat and protein; they also show that the microbiome can be changed in the short term but that it probably takes a long time to fully change a human’s gut microbiome (Wu et al., 2011). Lozupone and Knight have developed a very useful algorithm called UniFrac for clustering bacterial communities by their phylogenetic differences; it is described in a couple of articles (Lozupone et al., 2006; Lozupone and Knight, 2005). Muegge et al. find a functional pattern to the differences in microbiomes of mammalian herbivores vs. carnivores; they find such interesting things as carnivores tending to have microbiomes with lots of amino acid degradation enzymes, while herbivores tend to have lots of amino acid biosynthesis enzymes, which makes sense when you think about it. They also make the case that the microbiomes of human vegans tend to look more like those of mammalian herbivores, while microbiomes of human meat-eaters tend to look like those of mammalian carnivores (Muegge et al., 2011). A mouse study by Zhang et al. (including Liping Zhao) shows that the microbiome of a mouse is rapidly changed with the onset of a high-fat diet, and changes back quickly with resumption of a low-fat diet; they also identify key phylotypes that change rapidly with the change in diet, a step toward replacing fecal transplant with targeted probiotic therapy (Zhang et al., 2012). The next step in Liping Zhao’s paradigm is to test each of these taxa for its effect on weight by introducing it into gnotobiotic mice; I supply a (non-bioinformatic) example involving a bacterium previously suspected of reducing inflammation (Sokol et al., 2008). With thanks to our own Ariel Kaluzhny, here is an article from Wired on a start-up to do poop metagenome sequencing for the CDC (Zhang, 2015).
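To make the UniFrac idea concrete, here is a minimal sketch of the unweighted version of the distance: the fraction of branch length in a phylogeny that leads to sequences from only one of two communities. The toy tree, the community names (`gut1`, `gut2`), and the function name are all assumptions made for illustration; this is not the Lozupone and Knight implementation, which should be used for real analyses.

```python
from collections import defaultdict

def unweighted_unifrac(parent, length, leaf_community, a, b):
    """Fraction of branch length unique to community a or community b.

    parent: node -> parent node (the root maps to None)
    length: node -> length of the branch above that node
    leaf_community: leaf -> set of community names containing that sequence
    """
    # Propagate community membership from each leaf up to the root.
    seen = defaultdict(set)
    for leaf, communities in leaf_community.items():
        node = leaf
        while node is not None:
            seen[node] |= communities
            node = parent[node]

    unique = shared = 0.0
    for node, communities in seen.items():
        if parent[node] is None:
            continue  # the root has no branch above it
        in_a, in_b = a in communities, b in communities
        if in_a and in_b:
            shared += length[node]   # branch leads to both communities
        elif in_a or in_b:
            unique += length[node]   # branch leads to only one community
    total = unique + shared
    return unique / total if total else 0.0

# Hypothetical four-sequence tree: root -> X -> (t1, t2), root -> Y -> (t3, t4).
parent = {"root": None, "X": "root", "Y": "root",
          "t1": "X", "t2": "X", "t3": "Y", "t4": "Y"}
length = {"X": 1.0, "Y": 1.0, "t1": 1.0, "t2": 1.0, "t3": 1.0, "t4": 1.0}
leaf_community = {"t1": {"gut1"}, "t2": {"gut1", "gut2"},
                  "t3": {"gut2"}, "t4": {"gut2"}}

d = unweighted_unifrac(parent, length, leaf_community, "gut1", "gut2")
print(round(d, 3))  # 4 of the 6 branch-length units lead to only one community
```

A distance of 0 means the two communities occupy exactly the same branches of the tree, and a distance of 1 means they share no branches at all; a matrix of such distances can then be fed to any standard clustering method.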

Dec. 8

24. Baseball, biology, and big data

I have written a couple of pieces on Sandy Koufax’s perfect game, and what it taught me about using our imaginations better to have a fuller and more useful data set (Cohan, 2011b; Cohan, 2011c). I also wrote an article on how Big Data approaches can be used better in biology, in homage to Moneyball (Cohan, 2012). Lozupone and Knight wrote their break-out piece on UniFrac, showing that changes in adaptations to salinity were the most difficult transitions in bacterial history; they also lamented that the resolution of the environmental data was such that they could not also investigate the difficulty of more subtle changes in salinity adaptation (Lozupone and Knight, 2007). Travis Sawchik’s Big Data Baseball talks about the second wave of big data transforming baseball (Sawchik, 2015). Schoenfield’s article on Ned Yost discusses how a manager who has totally eschewed big data has succeeded (Schoenfeld, 2015). The disappointment of the missing data led to various conferences on what environmental parameters (and sequencing and assembly tools!) we should be recording when we spend millions of dollars on genome and metagenome sequencing (Field et al., 2008; Gilbert et al., 2010). David Toomey writes about how we see only what we expect to see. This was exemplified by microbiologists’ lack of interest in exploring what life may exist in Yellowstone’s hot springs, owing to their “knowledge” that life couldn’t possibly exist at such high temperatures (Toomey, 2013), pp. 12-13. See also Thomas Brock’s account of his discovery (Brock, 1995) and Thomas Kuhn’s account of our limitations toward discovery (Kuhn, 1996), pp. 63-64. Hurwitz and Sullivan have organized the unknown diversity among marine viral proteins by clustering them, and then trying to find out what ocean properties each cluster is associated with (Hurwitz and Sullivan, 2013). A whole new and open-ended way of seeing biodiversity (maybe with less preexisting bias) is being explored by Map of Life, a mobile application that lets researchers in the field add the species they encounter to its knowledge base; it also lists all the species one can expect in one’s current area (using GPS), making it a kind of community-based wiki of species distributions (http://mol.org) (Goldsmith, 2015).

Dec. 10

25. Molecular approaches for identifying microbial diversity in natural communities and tests of models of bacterial speciation

Two of our recent papers have explored the diversity of models of bacterial speciation (Cohan, 2011a; Kopac et al., 2014). Some of our earlier papers defined a wider range of speciation models but were more wedded to the Stable Ecotype model (Cohan and Perry, 2007). A classic paper by Silvia Acinas on marine bacterial diversity used a lineage-through-time approach to provide evidence for ecotypes (Acinas et al., 2004); Danny and I developed our algorithm Ecotype Simulation (ES) to estimate the parameters of speciation dynamics that would best match a lineage-through-time plot (Koeppel et al., 2008). We later compared Ecotype Simulation to various other algorithms that also purported to find ecotypes, and found that ES competed well (Francisco et al., 2014). We showed that the putative ecotypes demarcated for Bacillus by Ecotype Simulation were ecologically distinct, as based on habitat associations and physiological differences relating to heat tolerance (Connor et al., 2010). In what I call Ford Doolittle’s theory of Very Little Species, he has compellingly argued that the rate of ecological diversification may be extremely rapid in many bacteria, owing to the high rate of horizontal genetic transfer (Doolittle and Zhaxybayeva, 2009). Our genomic study of variation within one putative ecotype of Bacillus has indeed shown an extremely high rate of speciation, and has supported the Nano-Niche model of speciation (Kopac et al., 2014). Michiel Vos has argued that we can identify bacterial ecotypes by finding groups with unique histories of positive selection (Vos, 2011). In our study of diversification among extremely close relatives of Synechococcus within Yellowstone’s hot springs, we found that putative ecotypes identified by ES (with up to 0.5% genomic divergence) were ecologically homogeneous, indicating a slow rate of speciation in this group (Becraft et al., 2015). Also, our analysis of several genomes did not indicate evidence of positive selection for any gene within putative ecotypes (Olsen et al., 2015). A not-quite-published paper by Bendall et al. (from the Trina McMahon and Rex Malmstrom labs) gives evidence for genome-wide sweeps in some sequence clusters from Trout Bog Lake, but single-gene sweeps in other clusters. They offer a recombinational model for the differences in the breadth of selective sweeps, but in a paper to come out as soon as Bendall et al. comes out, I argue for an ecological mechanism (Cohan, 2016). Kwong et al. show that a phylogeny based on a concatenation of the core genome of Listeria monocytogenes yielded many unexpected, small sequence clusters (Kwong et al., 2015); this illustrates a way to test for microdiversity within the sequence clusters of Trout Bog Lake.
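The lineage-through-time idea mentioned above can be illustrated by counting how many sequence clusters exist at each of a series of divergence thresholds. The sketch below is only a crude stand-in: it uses single-linkage clustering for brevity, the distance matrix is made up, and the function name `cluster_counts` is hypothetical. It is not the Ecotype Simulation algorithm, which fits a model of speciation dynamics to such a curve and may use a different linkage rule.

```python
def cluster_counts(dist, thresholds):
    """Number of single-linkage clusters at each divergence threshold.

    dist: symmetric matrix (list of lists) of pairwise sequence divergences
    thresholds: divergence cutoffs at which to count clusters
    """
    n = len(dist)
    counts = []
    for t in thresholds:
        root = list(range(n))  # union-find forest, one set per sequence

        def find(i):
            while root[i] != i:
                root[i] = root[root[i]]  # path halving
                i = root[i]
            return i

        # Join every pair of sequences no more divergent than the threshold.
        for i in range(n):
            for j in range(i + 1, n):
                if dist[i][j] <= t:
                    root[find(i)] = find(j)
        counts.append(len({find(i) for i in range(n)}))
    return counts

# Made-up divergences among four sequences: two tight pairs, distantly related.
dist = [
    [0.00, 0.01, 0.05, 0.06],
    [0.01, 0.00, 0.05, 0.06],
    [0.05, 0.05, 0.00, 0.02],
    [0.06, 0.06, 0.02, 0.00],
]
thresholds = [0.0, 0.01, 0.02, 0.05]
print(cluster_counts(dist, thresholds))  # [4, 3, 2, 1]
```

Plotting the cluster count against the threshold gives a curve whose shape reflects when lineages split; a sharp rise in the number of clusters at very low divergence is the kind of signal that suggests many closely related clusters.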

10K, G. (2009). Genome 10K: A proposal to obtain whole-genome sequence for 10,000 vertebrate species. Journal of Heredity 100: 659-674.

Acinas, S. G., Klepac-Ceraj, V., Hunt, D. E., Pharino, C., Ceraj, I., Distel, D. L. & Polz, M. F. (2004). Fine-scale phylogenetic architecture of a complex bacterial community. Nature 430: 551-554.

Adoutte, A., Balavoine, G., Lartillot, N., Lespinet, O., Prud'homme, B. & de Rosa, R. (2000). The new animal phylogeny: reliability and implications. Proc Natl Acad Sci U S A 97(9): 4453-4456.

Anisimova, M. & Gascuel, O. (2006). Approximate likelihood-ratio test for branches: A fast, accurate, and powerful alternative. Syst Biol 55(4): 539-552.

Batalden, R. V., Oberhauser, K. & Peterson, A. T. (2007). Ecological niches in sequential generations of eastern North American monarch butterflies (Lepidoptera: Danaidae): the ecology of migration and likely climate change implications. Environ Entomol 36(6): 1365-1373.

Baum, D. A. & Smith, S. D. (2013). Tree Thinking: An Introduction to Phylogenetic Biology. Greenwood Village, Colorado: Roberts & Company Publishers.

Becraft, E. D., Wood, J. M., Rusch, D. B., Kühl, M., Jensen, S. I., Bryant, D. A., Roberts, D. W., Cohan, F. M. & Ward, D. M. (2015). The molecular dimension of microbial species: 1. Ecological distinctions among, and homogeneity within, putative ecotypes of Synechococcus inhabiting the cyanobacterial mat of Mushroom Spring, Yellowstone National Park. Front Microbiol 6: 590.

Bell, T., Newman, J. A., Silverman, B. W., Turner, S. L. & Lilley, A. K. (2005). The contribution of species richness and composition to bacterial services. Nature 436(7054): 1157-1160.

Bhaya, D., Grossman, A. R., Steunou, A. S., Khuri, N., Cohan, F. M., Hamamura, N., Melendrez, M. C., Bateson, M. M., Ward, D. M. & Heidelberg, J. F. (2007). Population level functional diversity in a microbial community revealed by comparative genomic and metagenomic analyses. ISME J 1(8): 703-713.

Biddle, J. F., White, J. R., Teske, A. P. & House, C. H. (2011). Metagenomics of the subsurface Brazos-Trinity Basin (IODP site 1320): comparison with other sediment and pyrosequenced metagenomes. ISME J 5(6): 1038-1047.

Bininda-Emonds, O. R., Cardillo, M., Jones, K. E., MacPhee, R. D., Beck, R. M., Grenyer, R., Price, S. A., Vos, R. A., Gittleman, J. L. & Purvis, A. (2007). The delayed rise of present-day mammals. Nature 446(7135): 507-512.

Blaser, M. J. (2014). Missing Microbes: How the Overuse of Antibiotics Is Fueling Our Modern Plagues. New York: Henry Holt and Co.

Borodovsky, M., Mills, R., Besemer, J. & Lomsadze, A. (2003). Prokaryotic gene prediction using GeneMark and GeneMark.hmm. Curr Protoc Bioinformatics Chapter 4: Unit4 5.

Bos, D. H. & Posada, D. (2005). Using models of nucleotide evolution to build phylogenetic trees. Dev Comp Immunol 29(3): 211-227.

Brenner, S. E., Hubbard, T., Murzin, A. & Chothia, C. (1995). Gene duplications in H. influenzae. Nature 378(6553): 140.

Brock, T. D. (1995). The road to Yellowstone--and beyond. Annu Rev Microbiol 49: 1-28.

Caporaso, J. G., Kuczynski, J., Stombaugh, J., Bittinger, K., Bushman, F. D., Costello, E. K., Fierer, N., Pena, A. G., Goodrich, J. K., Gordon, J. I., Huttley, G. A., Kelley, S. T., Knights, D., Koenig, J. E., Ley, R. E., Lozupone, C. A., McDonald, D., Muegge, B. D., Pirrung, M., Reeder, J., Sevinsky, J. R., Turnbaugh, P. J., Walters, W. A., Widmann, J., Yatsunenko, T., Zaneveld, J. & Knight, R. (2010). QIIME allows analysis of high-throughput community sequencing data. Nat Methods 7(5): 335-336.

Cecchini, D. A., Laville, E., Laguerre, S., Robe, P., Leclerc, M., Dore, J., Henrissat, B., Remaud-Simeon, M., Monsan, P. & Potocki-Veronese, G. (2013). Functional metagenomics reveals novel pathways of prebiotic breakdown by human gut bacteria. PLoS One 8(9): e72766.

Chain, P. S., Carniel, E., Larimer, F. W., Lamerdin, J., Stoutland, P. O., Regala, W. M., Georgescu, A. M., Vergez, L. M., Land, M. L., Motin, V. L., Brubaker, R. R., Fowler, J., Hinnebusch, J., Marceau, M., Medigue, C., Simonet, M., Chenal-Francisque, V., Souza, B., Dacheux, D., Elliott, J. M., Derbise, A., Hauser, L. J. & Garcia, E. (2004). Insights into the evolution of Yersinia pestis through whole-genome comparison with Yersinia pseudotuberculosis. Proc Natl Acad Sci U S A 101(38): 13826-13831.

Cho, I., Yamanishi, S., Cox, L., Methe, B. A., Zavadil, J., Li, K., Gao, Z., Mahana, D., Raju, K., Teitler, I., Li, H., Alekseyenko, A. V. & Blaser, M. J. (2012). Antibiotics in early life alter the murine colonic microbiome and adiposity. Nature 488(7413): 621-626.

Clark, A. G., Glanowski, S., Nielsen, R., Thomas, P. D., Kejariwal, A., Todd, M. A., Tanenbaum, D. M., Civello, D., Lu, F., Murphy, B., Ferriera, S., Wang, G., Zheng, X., White, T. J., Sninsky, J. J., Adams, M. D. & Cargill, M. (2003). Inferring nonneutral evolution from human-chimp-mouse orthologous gene trios. Science 302(5652): 1960-1963.

Cohan, F. M. (2010). Synthetic biology: now that we're creators, what should we create? Curr Biol 20(16): R675-677.

Cohan, F. M. (2011a). Are species cohesive?--A view from bacteriology. In Bacterial Population Genetics: A Tribute to Thomas S. Whittam, 43-65 (Eds S. Walk and P. Feng). Washington, DC: American Society for Microbiology Press.

Cohan, F. M. (2011b). A more perfect numbers game. In Los Angeles Times.

Cohan, F. M. (2011c). Q&A: Frederick Cohan. Current Biology 21(11): R412-R414.

Cohan, F. M. (2012). Science needs more Moneyball. American Scientist 100(3): 182-185.

Cohan, F. M. (2013). Species. In Brenner's Encyclopedia of Genetics, Second Edition, 506-511 (Eds S. Maloy and K. Hughes). Amsterdam: Elsevier.

Cohan, F. M. (2016). Bacterial Speciation: Genetic Sweeps in Bacterial Species. Current Biology.

Cohan, F. M. & Kopac, S. M. (2011). Microbial genomics: E. coli relatives out of doors and out of body. Curr Biol 21(15): R587-589.

Cohan, F. M. & Perry, E. B. (2007). A systematics for discovering the fundamental units of bacterial diversity. Current Biology 17: R373-R386.

Collen, A. (2015). 10% Human: How Your Body's Microbes Hold the Key to Health and Happiness. New York: HarperCollins.

Connor, N., Sikorski, J., Rooney, A. P., Kopac, S., Koeppel, A. F., Burger, A., Cole, S. G., Perry, E. B., Krizanc, D., Field, N. C., Slaton, M. & Cohan, F. M. (2010). The ecology of speciation in Bacillus. Applied and Environmental Microbiology 76: 1349-1358.

Consortium, T. U. (2013). Update on activities at the Universal Protein Resource (UniProt) in 2013. Nucleic Acids Res 41: D43-D47.

Davies, N., Field, D. & Genomic Observatories, N. (2012). Sequencing data: A genomic network to monitor Earth. Nature 481(7380): 145.

Degnan, J. H. & Rosenberg, N. A. (2006). Discordance of species trees with their most likely gene trees. PLoS Genet 2(5): e68.

Delcher, A. L., Bratke, K. A., Powers, E. C. & Salzberg, S. L. (2007). Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics 23(6): 673-679.

DeSantis, T. Z., Hugenholtz, P., Larsen, N., Rojas, M., Brodie, E. L., Keller, K., Huber, T., Dalevi, D., Hu, P. & Andersen, G. L. (2006). Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl Environ Microbiol 72(7): 5069-5072.

Dettman, J. R., Rodrigue, N., Melnyk, A. H., Wong, A., Bailey, S. F. & Kassen, R. (2012). Evolutionary insight from whole-genome sequencing of experimentally evolved microbes. Mol Ecol 21(9): 2058-2077.

Donoghue, M. J. (1989). Phylogenies and the analysis of evolutionary sequences, with examples from seed plants. Evolution 43: 1137-1156.

Doolittle, W. F. & Zhaxybayeva, O. (2009). On the origin of prokaryotic species. Genome Res 19(5): 744-756.

Eddy, S. R. (2004). What is dynamic programming? Nat Biotechnol 22(7): 909-910.

Edgar, R. C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research 32: 1792-1797.

Ferea, T. L., Botstein, D., Brown, P. O. & Rosenzweig, R. F. (1999). Systematic changes in gene expression patterns following adaptive evolution in yeast. Proc Natl Acad Sci U S A 96(17): 9721-9726.

Field, D., Amaral-Zettler, L., Cochrane, G., Cole, J. R., Dawyndt, P., Garrity, G. M., Gilbert, J., Glockner, F. O., Hirschman, L., Karsch-Mizrachi, I., Klenk, H. P., Knight, R., Kottmann, R., Kyrpides, N., Meyer, F., San Gil, I., Sansone, S. A., Schriml, L. M., Sterk, P., Tatusova, T., Ussery, D. W., White, O. & Wooley, J. (2011). The Genomic Standards Consortium. PLoS Biol 9(6): e1001088.

Field, D., Garrity, G., Gray, T., Morrison, N., Selengut, J., Sterk, P., Tatusova, T., Thomson, N., Allen, M. J., Angiuoli, S. V., Ashburner, M., Axelrod, N., Baldauf, S., Ballard, S., Boore, J., Cochrane, G., Cole, J., Dawyndt, P., De Vos, P., DePamphilis, C., Edwards, R., Faruque, N., Feldman, R., Gilbert, J., Gilna, P., Glockner, F. O., Goldstein, P., Guralnick, R., Haft, D., Hancock, D., Hermjakob, H., Hertz-Fowler, C., Hugenholtz, P., Joint, I., Kagan, L., Kane, M., Kennedy, J., Kowalchuk, G., Kottmann, R., Kolker, E., Kravitz, S., Kyrpides, N., Leebens-Mack, J., Lewis, S. E., Li, K., Lister, A. L., Lord, P., Maltsev, N., Markowitz, V., Martiny, J., Methe, B., Mizrachi, I., Moxon, R., Nelson, K., Parkhill, J., Proctor, L., White, O., Sansone, S. A., Spiers, A., Stevens, R., Swift, P., Taylor, C., Tateno, Y., Tett, A., Turner, S., Ussery, D., Vaughan, B., Ward, N., Whetzel, T., San Gil, I., Wilson, G. & Wipat, A. (2008). The minimum information about a genome sequence (MIGS) specification. Nat Biotechnol 26(5): 541-547.

Fierer, N., Hamady, M., Lauber, C. L. & Knight, R. (2008). The influence of sex, handedness, and washing on the diversity of hand surface bacteria. Proc Natl Acad Sci U S A 105(46): 17994-17999.

Forslund, K., Sunagawa, S., Kultima, J. R., Mende, D. R., Arumugam, M., Typas, A. & Bork, P. (2013). Country-specific antibiotic use practices impact the human gut resistome. Genome Res 23(7): 1163-1169.

Fosso, B., Santamaria, M., Marzano, M., Alonso-Alemany, D., Valiente, G., Donvito, G., Monaco, A., Notarangelo, P. & Pesole, G. (2015). BioMaS: a modular pipeline for Bioinformatic analysis of Metagenomic AmpliconS. BMC Bioinformatics 16: 203.

Francisco, J. C., Cohan, F. M. & Krizanc, D. (2014). Accuracy and efficiency of algorithms for the demarcation of bacterial ecotypes from DNA sequence data. Int. J. Bioinformatics Research and Applications 10: 409-425.

Funch, P. & Kristensen, R. (1995). Cycliophora is a new phylum with affinities to Entoprocta and Ectoprocta. Nature 378: 711-714.

Futuyma, D. J. (1998). Evolutionary Biology.

Gardy, J. L., Laird, M. R., Chen, F., Rey, S., Walsh, C. J., Ester, M. & Brinkman, F. S. (2005). PSORTb v.2.0: expanded prediction of bacterial protein subcellular localization and insights gained from comparative proteome analysis. Bioinformatics 21(5): 617-623.

Genoways, H. H. & Choate, J. R. (1972). A multivariate analysis of systematic relationships among populations of the short-tailed shrew (genus Blarina) in Nebraska. Systematic Zoology 21: 106-116.

Gilbert, J. A., Meyer, F., Jansson, J., Gordon, J., Pace, N., Tiedje, J., Ley, R., Fierer, N., Field, D., Kyrpides, N., Glockner, F. O., Klenk, H. P., Wommack, K. E., Glass, E., Docherty, K., Gallery, R., Stevens, R. & Knight, R. (2010). The Earth Microbiome Project: Meeting report of the "1 EMP meeting on sample selection and acquisition" at Argonne National Laboratory October 6 2010. Stand Genomic Sci 3(3): 249-253.

Ginsberg, J., Mohebbi, M. H., Patel, R. S., Brammer, L., Smolinski, M. S. & Brilliant, L. (2009). Detecting influenza epidemics using search engine query data. Nature 457(7232): 1012-1014.

Goldsmith, G. R. (2015). The field guide, rebooted. Science 349: 594.

Gómez-Lozano, M., Marvig, R. L., Molin, S. & Long, K. S. (2012). Genome-wide identification of novel small RNAs in Pseudomonas aeruginosa. Environ Microbiol 14(8): 2006-2016.

Guindon, S., Dufayard, J. F., Lefort, V., Anisimova, M., Hordijk, W. & Gascuel, O. (2010). New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol 59(3): 307-321.

Hahne, H., Mader, U., Otto, A., Bonn, F., Steil, L., Bremer, E., Hecker, M. & Becher, D. (2010). A comprehensive proteomics and transcriptomics analysis of Bacillus subtilis salt stress adaptation. J Bacteriol 192(3): 870-882.

Hao, W. & Golding, G. B. (2006). The fate of laterally transferred genes: life in the fast lane to adaptation or death. Genome Res 16(5): 636-643.

Harel, D. (2000). Sometimes we just don't know. In Computers Ltd.: What They Really Can't Do, 91-117. Oxford: Oxford Univ. Press.

Harvey, P. H. & Pagel, M. D. (1991). The Comparative Method in Evolutionary Biology. Oxford: Oxford University Press.

Herring, C. D., Raghunathan, A., Honisch, C., Patel, T., Applebee, M. K., Joyce, A. R., Albert, T. J., Blattner, F. R., van den Boom, D., Cantor, C. R. & Palsson, B. O. (2006). Comparative genome sequencing of Escherichia coli allows observation of bacterial evolution on a laboratory timescale. Nat Genet 38(12): 1406-1412.

Hess, M. (2011). Metagenomic discovery of biomass-degrading genes and genomes from cow rumen. Science 331: 463-467.

Huelsenbeck, J. P., Ronquist, F., Nielsen, R. & Bollback, J. P. (2001). Bayesian inference of phylogeny and its impact on evolutionary biology. Science 294(5550): 2310-2314.

Hughes, J. B., Hellmann, J. J., Ricketts, T. H. & Bohannan, B. J. (2001). Counting the uncountable: statistical approaches to estimating microbial diversity. Appl Environ Microbiol 67(10): 4399-4406.

Hurwitz, B. L. & Sullivan, M. B. (2013). The Pacific Ocean virome (POV): a marine viral metagenomic dataset and associated protein clusters for quantitative viral ecology. PLoS One 8(2): e57355.

Huson, D. H. & Steel, M. (2004). Phylogenetic trees based on gene content. Bioinformatics 20(13): 2044-2049.

Hyatt, D., Chen, G. L., Locascio, P. F., Land, M. L., Larimer, F. W. & Hauser, L. J. (2010). Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11: 119.

Inskeep, W. P., Rusch, D. B., Jay, Z. J., Herrgard, M. J., Kozubal, M. A., Richardson, T. H., Macur, R. E., Hamamura, N., Jennings, R., Fouke, B. W., Reysenbach, A. L., Roberto, F., Young, M., Schwartz, A., Boyd, E. S., Badger, J. H., Mathur, E. J., Ortmann, A. C., Bateson, M., Geesey, G. & Frazier, M. (2010). Metagenomes from high-temperature chemotrophic systems reveal geochemical controls on microbial community structure and function. PLoS One 5(3): e9773.

Iwabe, N., Kuma, K., Hasegawa, M., Osawa, S. & Miyata, T. (1989). Evolutionary relationship of archaebacteria, eubacteria, and eukaryotes inferred from phylogenetic trees of duplicated genes. Proc Natl Acad Sci U S A 86(23): 9355-9359.

Kämpfer, P., Kroppenstedt, R. M. & Dott, W. (1991). A numerical classification of the genera Streptomyces and Streptoverticillium using miniaturized physiological tests. Journal of General Microbiology 137: 1831-1891.

Kanehisa, M., Goto, S., Sato, Y., Kawashima, M., Furumichi, M. & Tanabe, M. (2014). Data, information, knowledge and principle: back to metabolism in KEGG. Nucleic Acids Res 42(Database issue): D199-205.

Keeling, P. J. & Palmer, J. D. (2008). Horizontal gene transfer in eukaryotic evolution. Nat Rev Genet 9(8): 605-618.

Khosravi, A. & Mazmanian, S. K. (2013). Disruption of the gut microbiome as a risk factor for microbial infections. Curr Opin Microbiol 16(2): 221-227.

Kim, J. S., Makama, M., Petito, J., Park, N. H., Cohan, F. M. & Dungan, R. S. (2012). Diversity of Bacteria and Archaea in hypersaline sediment from Death Valley National Park, California. MicrobiologyOpen 1(2): 135-148.

Kleiner, M., Wentrup, C., Lott, C., Teeling, H., Wetzel, S., Young, J., Chang, Y. J., Shah, M., VerBerkmoes, N. C., Zarzycki, J., Fuchs, G., Markert, S., Hempel, K., Voigt, B., Becher, D., Liebeke, M., Lalk, M., Albrecht, D., Hecker, M., Schweder, T. & Dubilier, N. (2012). Metaproteomics of a gutless marine worm and its symbiotic microbial community reveal unusual pathways for carbon and energy use. Proceedings of the National Academy of Sciences of the United States of America 109(19): E1173-E1182.

Knight, R., Jansson, J., Field, D., Fierer, N., Desai, N., Fuhrman, J. A., Hugenholtz, P., van der Lelie, D., Meyer, F., Stevens, R., Bailey, M. J., Gordon, J. I., Kowalchuk, G. A. & Gilbert, J. A. (2012). Unlocking the potential of metagenomics through replicated experimental design. Nat Biotechnol 30(6): 513-520.

Koeppel, A., Perry, E. B., Sikorski, J., Krizanc, D., Warner, W. A., Ward, D. M., Rooney, A. P., Brambilla, E., Connor, N., Ratcliff, R. M., Nevo, E. & Cohan, F. M. (2008). Identifying the fundamental units of bacterial diversity: a paradigm shift to incorporate ecology into bacterial systematics. Proceedings of the National Academy of Sciences 105: 2504-2509.

Konstantinidis, K. T. & Tiedje, J. M. (2005). Towards a genome-based taxonomy for prokaryotes. J Bacteriol 187(18): 6258-6264.

Kopac, S., Wang, Z., Wiedenbeck, J., Sherry, J., Wu, M. & Cohan, F. M. (2014). Genomic heterogeneity and ecological speciation within one subspecies of Bacillus subtilis. Applied and Environmental Microbiology 80: 4842-4853.

Kuhn, T. (1996). The Structure of Scientific Revolutions. Chicago: University of Chicago.

Kwong, J. C., Mercoulia, K., Tomita, T., Easton, M., Li, H. Y., Bulach, D. M., Stinear, T. P., Seemann, T. & Howden, B. P. (2015). Prospective whole genome sequencing enhances national surveillance of Listeria monocytogenes. Journal of Clinical Microbiology.

Larson, G., Dobney, K., Albarella, U., Fang, M., Matisoo-Smith, E., Robins, J., Lowden, S., Finlayson, H., Brand, T., Willerslev, E., Rowley-Conwy, P., Andersson, L. & Cooper, A. (2005). Worldwide phylogeography of wild boar reveals multiple centers of pig domestication. Science 307(5715): 1618-1621.

Laurin, M. (2010). Assessment of the Relative Merits of a Few Methods to Detect Evolutionary Trends. Systematic Biology 59: 689-704.

Lay, C. Y., Mykytczuk, N. C., Yergeau, E., Lamarche-Gagnon, G., Greer, C. W. & Whyte, L. G. (2013). Defining the functional potential and active community members of a sediment microbial community in a high-arctic hypersaline subzero spring. Appl Environ Microbiol 79(12): 3637-3648.

Levine, R. S., Peterson, A. T., Yorita, K. L., Carroll, D., Damon, I. K. & Reynolds, M. G. (2007). Ecological niche and geographic distribution of human monkeypox in Africa. PLoS One 2(1): e176.

Li, M., Badger, J. H., Chen, X., Kwong, S., Kearney, P. & Zhang, H. (2001). An information-based sequence distance and its application to whole mitochondrial genome phylogeny. Bioinformatics 17(2): 149-154.

Lin, G. N., Cai, Z., Lin, G., Chakraborty, S. & Xu, D. (2009). ComPhy: prokaryotic composite distance phylogenies inferred from whole-genome gene sets. BMC Bioinformatics 10 Suppl 1: S5.

Lozier, L. D., Aniello, P. & Hickerson, M. J. (2009). Predicting the distribution of Sasquatch in western North America: anything goes with ecological niche modelling. Journal of Biogeography.

Lozupone, C., Hamady, M. & Knight, R. (2006). UniFrac--an online tool for comparing microbial community diversity in a phylogenetic context. BMC Bioinformatics 7: 371.

Lozupone, C. & Knight, R. (2005). UniFrac: a new phylogenetic method for comparing microbial communities. Appl Environ Microbiol 71(12): 8228-8235.

Lozupone, C. A. & Knight, R. (2007). Global patterns in bacterial diversity. Proc Natl Acad Sci U S A 104(27): 11436-11440.

Luo, C., Walk, S. T., Gordon, D. M., Feldgarden, M., Tiedje, J. M. & Konstantinidis, K. T. (2011). Genome sequencing of environmental Escherichia coli expands understanding of the ecology and speciation of the model bacterial species. Proc Natl Acad Sci U S A 108(17): 7200-7205.

Mackay, T. F., Richards, S., Stone, E. A., Barbadilla, A., Ayroles, J. F., Zhu, D., Casillas, S., Han, Y., Magwire, M. M., Cridland, J. M., Richardson, M. F., Anholt, R. R., Barron, M., Bess, C., Blankenburg, K. P., Carbone, M. A., Castellano, D., Chaboub, L., Duncan, L., Harris, Z., Javaid, M., Jayaseelan, J. C., Jhangiani, S. N., Jordan, K. W., Lara, F., Lawrence, F., Lee, S. L., Librado, P., Linheiro, R. S., Lyman, R. F., Mackey, A. J., Munidasa, M., Muzny, D. M., Nazareth, L., Newsham, I., Perales, L., Pu, L. L., Qu, C., Ramia, M., Reid, J. G., Rollmann, S. M., Rozas, J., Saada, N., Turlapati, L., Worley, K. C., Wu, Y. Q., Yamamoto, A., Zhu, Y., Bergman, C. M., Thornton, K. R., Mittelman, D. & Gibbs, R. A. (2012). The Drosophila melanogaster Genetic Reference Panel. Nature 482(7384): 173-178.

Mackelprang, R., Waldrop, M. P., DeAngelis, K. M., David, M. M., Chavarria, K. L., Blazewicz, S. J., Rubin, E. M. & Jansson, J. K. (2011). Metagenomic analysis of a permafrost microbial community reveals a rapid response to thaw. Nature 480(7377): 368-371.

Mallet, J. (1995). A species definition for the modern synthesis. Trends Ecol. Evol. 10: 294-299.

McGrayne, S. B. (2011). The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines, & Emerged Triumphant from Two Centuries of Controversy. New Haven: Yale University Press.

McHardy, A. C., Martin, H. G., Tsirigos, A., Hugenholtz, P. & Rigoutsos, I. (2007). Accurate phylogenetic classification of variable-length DNA fragments. Nat Methods 4(1): 63-72.

McMahon, M. D., Guan, C., Handelsman, J. & Thomas, M. G. (2012). Metagenomic analysis of Streptomyces lividans reveals host-dependent functional expression. Appl Environ Microbiol 78(10): 3622-3629.

Merhej, V., Royer-Carenzi, M., Pontarotti, P. & Raoult, D. (2009). Massive comparative genomic analysis reveals convergent evolution of specialized bacteria. Biol Direct 4: 13.

Michener, C. D. & Sokal, R. R. (1957). A Quantitative Approach to a Problem in Classification. Evolution 11: 130-162.

Mikkelsen, T. S., Hillier, L. W., et al. (2005). Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437(7055): 69-87.

Miller, J. R., Koren, S. & Sutton, G. (2010). Assembly algorithms for next-generation sequencing data. Genomics 95(6): 315-327.

Morrison, D. A. (2009). Why would phylogeneticists ignore computerized sequence alignment? Syst Biol 58: 150-158.

Muegge, B. D., Kuczynski, J., Knights, D., Clemente, J. C., Gonzalez, A., Fontana, L., Henrissat, B., Knight, R. & Gordon, J. I. (2011). Diet drives convergence in gut microbiome functions across mammalian phylogeny and within humans. Science 332(6032): 970-974.

Myers, E. W., Sutton, G. G., Smith, H. O., Adams, M. D. & Venter, J. C. (2002). On the sequencing and assembly of the human genome. Proc Natl Acad Sci U S A 99(7): 4145-4146.

Nayfach, S., Bradley, P. H., Wyman, S. K., Laurent, T. J., Williams, A., Eisen, J. A., Pollard, K. S. & Sharpton, T. J. (2015). Automated and Accurate Estimation of Gene Family Abundance from Shotgun Metagenomes. PLoS Comput Biol 11(11): e1004573.

Nelson-Sathi, S., Dagan, T., Landan, G., Janssen, A., Steel, M., McInerney, J. O., Deppenmeier, U. & Martin, W. F. (2012). Acquisition of 1,000 eubacterial genes physiologically transformed a methanogen at the origin of Haloarchaea. Proc Natl Acad Sci U S A 109(50): 20537-20542.

Nosenko, T., Schreiber, F., Adamska, M., Adamski, M., Eitel, M., Hammel, J., Maldonado, M., Muller, W. E., Nickel, M., Schierwater, B., Vacelet, J., Wiens, M. & Worheide, G. (2013). Deep metazoan phylogeny: when different genes tell different stories. Mol Phylogenet Evol 67(1): 223-233.

Olsen, M. T., Nowack, S., Wood, J. M., Becraft, E. D., LaButti, K., Lipzen, A., Martin, J., Schackwitz, W. S., Rusch, D. B., Cohan, F. M., Bryant, D. A. & Ward, D. M. (2015). The molecular dimension of microbial species: 3. Comparative genomics of Synechococcus isolates with different light responses and in situ diel transcription patterns of associated putative ecotypes in the Mushroom Spring microbial mat. Front Microbiol 6: 604.

Pavlidis, P., Zivkovic, D., Stamatakis, A. & Alachiotis, N. (2013). SweeD: likelihood-based detection of selective sweeps in thousands of genomes. Mol Biol Evol 30(9): 2224-2234.

Penn, O., Privman, E., Ashkenazy, H., Landan, G., Graur, D. & Pupko, T. (2010). GUIDANCE: a web server for assessing alignment confidence scores. Nucleic Acids Res 38(Web Server issue): W23-28.

Perez-Cobas, A. E., Gosalbes, M. J., Friedrichs, A., Knecht, H., Artacho, A., Eismann, K., Otto, W., Rojo, D., Bargiela, R., von Bergen, M., Neulinger, S. C., Daumer, C., Heinsen, F. A., Latorre, A., Barbas, C., Seifert, J., dos Santos, V. M., Ott, S. J., Ferrer, M. & Moya, A. (2013). Gut microbiota disturbance during antibiotic therapy: a multi-omic approach. Gut 62(11): 1591-1601.

Peterson, A. T. (2009). Shifting suitability for malaria vectors across Africa with warming climates. BMC Infect Dis 9: 59.

Peterson, I., Borrell, L. N., El-Sadr, W. & Teklehaimanot, A. (2009). A temporal-spatial analysis of malaria transmission in Adama, Ethiopia. Am J Trop Med Hyg 81(6): 944-949.

Plewniak, F., Koechler, S., Navet, B., Dugat-Bony, E., Bouchez, O., Peyret, P., Seby, F., Battaglia-Brunet, F. & Bertin, P. N. (2013). Metagenomic insights into microbial metabolism affecting arsenic dispersion in Mediterranean marine sediments. Mol Ecol 22(19): 4870-4883.

Pollan, M. (2013). Some of my best friends are germs. In New York Times. New York.

Popa, O., Hazkani-Covo, E., Landan, G., Martin, W. & Dagan, T. (2011). Directed networks reveal genomic barriers and DNA repair bypasses to lateral gene transfer among prokaryotes. Genome Res 21(4): 599-609.

Probert, R. J., Daws, M. I. & Hay, F. R. (2009). Ecological correlates of ex situ seed longevity: a comparative study on 195 species. Annals of Botany 104(1): 57-69.

Pruitt, K. D., Tatusova, T., Brown, G. R. & Maglott, D. R. (2012). NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res 40(Database issue): D130-135.

Richards, C. L., Rosas, U., Banta, J., Bhambhra, N. & Purugganan, M. D. (2012). Genome-wide patterns of Arabidopsis gene expression in nature. PLoS Genet 8(4): e1002662.

Richards, V. P., Palmer, S. R., Pavinski Bitar, P. D., Qin, X., Weinstock, G. M., Highlander, S. K., Town, C. D., Burne, R. A. & Stanhope, M. J. (2014). Phylogenomics and the dynamic genome evolution of the genus Streptococcus. Genome Biol Evol 6(4): 741-753.

Rinke, C., Schwientek, P., Sczyrba, A., Ivanova, N. N., Anderson, I. J., Cheng, J. F., Darling, A., Malfatti, S., Swan, B. K., Gies, E. A., Dodsworth, J. A., Hedlund, B. P., Tsiamis, G., Sievert, S. M., Liu, W. T., Eisen, J. A., Hallam, S. J., Kyrpides, N. C., Stepanauskas, R., Rubin, E. M., Hugenholtz, P. & Woyke, T. (2013). Insights into the phylogeny and coding potential of microbial dark matter. Nature 499(7459): 431-437.

Robertson, D. E., Chaplin, J. A., DeSantis, G., Podar, M., Madden, M., Chi, E., Richardson, T., Milan, A., Miller, M., Weiner, D. P., Wong, K., McQuaid, J., Farwell, B., Preston, L. A., Tan, X., Snead, M. A., Keller, M., Mathur, E., Kretz, P. L., Burk, M. J. & Short, J. M. (2004). Exploring nitrilase sequence space for enantioselective catalysis. Appl Environ Microbiol 70(4): 2429-2436.

Rocca, J. D., Hall, E. K., Lennon, J. T., Evans, S. E., Waldrop, M. P., Cotner, J. B., Nemergut, D. R., Graham, E. B. & Wallenstein, M. D. (2015). Relationships between protein-encoding gene abundance and corresponding process are commonly assumed yet rarely observed. ISME J 9(8): 1693-1699.

Rokas, A., Williams, B. L., King, N. & Carroll, S. B. (2003). Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature 425(6960): 798-804.

Ronquist, F. & Huelsenbeck, J. P. (2003). MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19(12): 1572-1574.

Ropars, J. (2015). Adaptive Horizontal Gene Transfers between Multiple Cheese-Associated Fungi. Current Biology 25: 1-8.

Saitou, N. & Nei, M. (1987). The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4(4): 406-425.

Salathé, M., Freifeld, C. C., Mekaru, S. R., Tomasulo, A. F. & Brownstein, J. S. (2013). Influenza A (H7N9) and the Importance of Digital Epidemiology. N Engl J Med.

Sawchik, T. (2015). Big Data Baseball: Math, Miracles, and the End of a 20-Year Losing Streak. New York: Flatiron Books.

Schloss, P. D. & Handelsman, J. (2004). Status of the microbial census. Microbiol Mol Biol Rev 68(4): 686-691.

Schloss, P. D., Westcott, S. L., Ryabin, T., Hall, J. R., Hartmann, M., Hollister, E. B., Lesniewski, R. A., Oakley, B. B., Parks, D. H., Robinson, C. J., Sahl, J. W., Stres, B., Thallinger, G. G., Van Horn, D. J. & Weber, C. F. (2009). Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol 75(23): 7537-7541.

Schoenfeld, B. (2015). How Ned Yost Made the Kansas City Royals Unstoppable. In New York Times.

She, X., Jiang, Z., Clark, R. A., Liu, G., Cheng, Z., Tuzun, E., Church, D. M., Sutton, G., Halpern, A. L. & Eichler, E. E. (2004). Shotgun sequence assembly and recent segmental duplications within the human genome. Nature 431(7011): 927-930.

Silver, N. (2012). The Signal and the Noise: Why So Many Predictions Fail--but Some Don't. New York: Penguin.

Simon, C., Wiezer, A., Strittmatter, A. W. & Daniel, R. (2009). Phylogenetic diversity and metabolic potential revealed in a glacier ice metagenome. Appl Environ Microbiol 75(23): 7519-7526.

Singer, M. S., Lichter-Marck, I. H., Farkas, T. E., Aaron, E., Whitney, K. D. & Mooney, K. A. (2014). Herbivore diet breadth mediates the cascading effects of carnivores in food webs. Proc Natl Acad Sci U S A 111(26): 9521-9526.

Sogin, M. L., Morrison, H. G., Huber, J. A., Mark Welch, D., Huse, S. M., Neal, P. R., Arrieta, J. M. & Herndl, G. J. (2006). Microbial diversity in the deep sea and the underexplored "rare biosphere". Proc Natl Acad Sci U S A 103(32): 12115-12120.

Sokol, H., Pigneur, B., Watterlot, L., Lakhdari, O., Bermudez-Humaran, L. G., Gratadoux, J. J., Blugeon, S., Bridonneau, C., Furet, J. P., Corthier, G., Grangette, C., Vasquez, N., Pochart, P., Trugnan, G., Thomas, G., Blottiere, H. M., Dore, J., Marteau, P., Seksik, P. & Langella, P. (2008). Faecalibacterium prausnitzii is an anti-inflammatory commensal bacterium identified by gut microbiota analysis of Crohn disease patients. Proc Natl Acad Sci U S A 105(43): 16731-16736.

Sommer, M. O., Dantas, G. & Church, G. M. (2009). Functional characterization of the antibiotic resistance reservoir in the human microflora. Science 325(5944): 1128-1131.

Sumby, P., Whitney, A. R., Graviss, E. A., DeLeo, F. R. & Musser, J. M. (2006). Genome-wide analysis of group A streptococci reveals a mutation that modulates global phenotype and disease specificity. PLoS Pathog 2(1): e5.

Sumner, J. G., Jarvis, P. D., Fernandez-Sanchez, J., Kaine, B. T., Woodhams, M. D. & Holland, B. R. (2012). Is the general time-reversible model bad for molecular phylogenetics? Syst Biol 61(6): 1069-1074.

Sun, Y.-C., Jarrett, Clayton O., Bosio, Christopher F. & Hinnebusch, B. J. (2014). Retracing the Evolutionary Path that Led to Flea-Borne Transmission of Yersinia pestis. Cell Host & Microbe 15(5): 578-586.

Swenson, M. S., Suri, R., Linder, C. R. & Warnow, T. (2012). SuperFine: fast and accurate supertree estimation. Syst Biol 61(2): 214-227.

Tatusov, R. L., Fedorova, N. D., Jackson, J. D., Jacobs, A. R., Kiryutin, B., Koonin, E. V., Krylov, D. M., Mazumder, R., Mekhedov, S. L., Nikolskaya, A. N., Rao, B. S., Smirnov, S., Sverdlov, A. V., Vasudevan, S., Wolf, Y. I., Yin, J. J. & Natale, D. A. (2003). The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4: 41.

Thalmann, O., Shapiro, B., Cui, P., Schuenemann, V. J., Sawyer, S. K., Greenfield, D. L., Germonpre, M. B., Sablin, M. V., Lopez-Giraldez, F., Domingo-Roura, X., Napierala, H., Uerpmann, H. P., Loponte, D. M., Acosta, A. A., Giemsch, L., Schmitz, R. W., Worthington, B., Buikstra, J. E., Druzhkova, A., Graphodatsky, A. S., Ovodov, N. D., Wahlberg, N., Freedman, A. H., Schweizer, R. M., Koepfli, K. P., Leonard, J. A., Meyer, M., Krause, J., Paabo, S., Green, R. E. & Wayne, R. K. (2013). Complete mitochondrial genomes of ancient canids suggest a European origin of domestic dogs. Science 342(6160): 871-874.

Tomes, N. (1998). The Gospel of Germs: Men, Women, and the Microbe in American Life. Cambridge, Mass.: Harvard University Press.

Toomey, D. (2013). Weird Life: The Search for Life that Is Very, Very Different from our Own. New York: Norton.

Touchon, M., Hoede, C., Tenaillon, O., Barbe, V., Baeriswyl, S., Bidet, P., Bingen, E., Bonacorsi, S., Bouchier, C., Bouvet, O., Calteau, A., Chiapello, H., Clermont, O., Cruveiller, S., Danchin, A., Diard, M., Dossat, C., Karoui, M. E., Frapy, E., Garry, L., Ghigo, J. M., Gilles, A. M., Johnson, J., Le Bouguenec, C., Lescat, M., Mangenot, S., Martinez-Jéhanne, V., Matic, I., Nassif, X., Oztas, S., Petit, M. A., Pichon, C., Rouy, Z., Ruf, C. S., Schneider, D., Tourret, J., Vacherie, B., Vallenet, D., Médigue, C., Rocha, E. P. & Denamur, E. (2009). Organised genome dynamics in the Escherichia coli species results in highly diverse adaptive paths. PLoS Genet 5(1): e1000344.

Turnbaugh, P. J., Ley, R. E., Hamady, M., Fraser-Liggett, C. M., Knight, R. & Gordon, J. I. (2007). The human microbiome project. Nature 449(7164): 804-810.

Tyson, G. W., Chapman, J., Hugenholtz, P., Allen, E. E., Ram, R. J., Richardson, P. M., Solovyev, V. V., Rubin, E. M., Rokhsar, D. S. & Banfield, J. F. (2004). Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature 428(6978): 37-43.

van Domselaar, G., Graham, M. & Stothard, P. (2014). Prokaryotic genome annotation. In Bioinformatics and Data Analysis in Microbiology, 81-111 (Ed Ö. Taştan Bishop). Norfolk: Caister Academic Press.

Velasquez-Manoff, M. (2013). A cure for the allergy epidemic? In New York Times.

Veyrier, F., Pletzer, D., Turenne, C. & Behr, M. A. (2009). Phylogenetic detection of horizontal gene transfer during the step-wise genesis of Mycobacterium tuberculosis. BMC Evol Biol 9: 196.

Vital, M., Chai, B., Ostman, B., Cole, J., Konstantinidis, K. T. & Tiedje, J. M. (2015). Gene expression analysis of E. coli strains provides insights into the role of gene regulation in diversification. ISME J 9(5): 1130-1140.

Vos, M. (2011). A species concept for bacteria based on adaptive divergence. Trends Microbiol 19(1): 1-7.

Vos, M., te Beek, T. A., van Driel, M. A., Huynen, M. A., Eyre-Walker, A. & van Passel, M. W. (2013). ODoSE: a webserver for genome-wide calculation of adaptive divergence in prokaryotes. PLoS One 8(5): e62447.

Wasser, S. K., Brown, L., Mailand, C., Mondol, S., Clark, W., Laurie, C. & Weir, B. S. (2015). Genetic assignment of large seizures of elephant ivory reveals Africa's major poaching hotspots. Science 349(6243): 84-87.

Waterston, R. H., Lander, E. S. & Sulston, J. E. (2002). On the sequencing of the human genome. Proc Natl Acad Sci U S A 99(6): 3712-3716.

Welch, R. A., Burland, V., Plunkett, G., 3rd, Redford, P., Roesch, P., Rasko, D., Buckles, E. L., Liou, S. R., Boutin, A., Hackett, J., Stroud, D., Mayhew, G. F., Rose, D. J., Zhou, S., Schwartz, D. C., Perna, N. T., Mobley, H. L., Donnenberg, M. S. & Blattner, F. R. (2002). Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proc Natl Acad Sci U S A 99(26): 17020-17024.

Wheeler, T. J. & Eddy, S. R. (2013). nhmmer: DNA homology search with profile HMMs. Bioinformatics 29(19): 2487-2489.

Whidden, C., Zeh, N. & Beiko, R. G. (2014). Supertrees Based on the Subtree Prune-and-Regraft Distance. Syst Biol 63(4): 566-581.

White, M. A., Ane, C., Dewey, C. N., Larget, B. R. & Payseur, B. A. (2009). Fine-scale phylogenetic discordance across the house mouse genome. PLoS Genet 5(11): e1000729.

Wiedenbeck, J. & Cohan, F. M. (2011). Origins of bacterial diversity through horizontal gene transfer and adaptation to new ecological niches. FEMS Microbiology Reviews 35: 957-976.

Williamson, S. H., Hubisz, M. J., Clark, A. G., Payseur, B. A., Bustamante, C. D. & Nielsen, R. (2007). Localizing recent adaptive evolution in the human genome. PLoS Genet 3(6): e90.

Wu, D., Hugenholtz, P., Mavromatis, K., Pukall, R., Dalin, E., Ivanova, N. N., Kunin, V., Goodwin, L., Wu, M., Tindall, B. J., Hooper, S. D., Pati, A., Lykidis, A., Spring, S., Anderson, I. J., D'Haeseleer, P., Zemla, A., Singer, M., Lapidus, A., Nolan, M., Copeland, A., Han, C., Chen, F., Cheng, J. F., Lucas, S., Kerfeld, C., Lang, E., Gronow, S., Chain, P., Bruce, D., Rubin, E. M., Kyrpides, N. C., Klenk, H. P. & Eisen, J. A. (2009). A phylogeny-driven genomic encyclopaedia of Bacteria and Archaea. Nature 462(7276): 1056-1060.

Wu, G. D., Chen, J., Hoffmann, C., Bittinger, K., Chen, Y. Y., Keilbaugh, S. A., Bewtra, M., Knights, D., Walters, W. A., Knight, R., Sinha, R., Gilroy, E., Gupta, K., Baldassano, R., Nessel, L., Li, H., Bushman, F. D. & Lewis, J. D. (2011). Linking long-term dietary patterns with gut microbial enterotypes. Science 334(6052): 105-108.

Yang, Z. (2007). PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol 24(8): 1586-1591.

Zerbino, D. R. & Birney, E. (2008). Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 18(5): 821-829.

Zhang, C., Zhang, M., Pang, X., Zhao, Y., Wang, L. & Zhao, L. (2012). Structural resilience of the gut microbiota in adult mice under high-fat dietary perturbations. ISME J 6(10): 1848-1857.

Zhang, S. (2015). Microbiome Startup uBiome Will Sequence Poop for the CDC. In Wired.

Zhao, L. (2013). The gut microbiota and obesity: from correlation to causality. Nat Rev Microbiol 11(9): 639-647.

Zhong, Y., Jia, Y., Gao, Y., Tian, D., Yang, S. & Zhang, X. (2013). Functional requirements driving the gene duplication in 12 Drosophila species. BMC Genomics 14: 555.

Zmasek, C. M. & Eddy, S. R. (2001). A simple algorithm to infer gene duplication and speciation events on a gene tree. Bioinformatics 17(9): 821-828.