
Cracking the (bio)code -- Professional Development Session at SACNAS 2014


Page 1: Cracking the (bio)code -- Professional Development Session at SACNAS 2014

Cracking the (bio)code
Resources for research careers in computational biology & bioinformatics

Felipe Zapata, PhD, Brown University, @zapata_f

Conner Sandefur, PhD, Univ. North Carolina, @oshehoma

Emilia Huerta-Sanchez, PhD, Univ. California, Merced, @emiliahsc

Tracy Heath, PhD, Iowa State Univ., @trayc7

Visit our website: crackingthebiocode.github.io
● Information about the session
● Resources for learning to program: workshops, online courses, tutorials, etc.
● Links to many degree programs in the U.S. for studying computational biology/bioinformatics
● Profiles of computational biologists and bioinformaticians

Page 2: Cracking the (bio)code -- Professional Development Session at SACNAS 2014

How small changes can make a big difference

Bioinformatics @UNC-Pembroke
Investigating how changes in gene expression drive system-wide behavior

Computational Biology @UNC-Chapel Hill
Predicting therapies to improve mucus clearance in cystic fibrosis (CF) and chronic obstructive pulmonary disease (COPD)

[Figure: panels labeled 1 hr and 24 hrs; scale from -4 to 4]

Tools I use:

Page 3: Cracking the (bio)code -- Professional Development Session at SACNAS 2014

Dr. Conner I. Sandefur
SPIRE Postdoctoral Scholar at UNC-CH
Visiting Assistant Professor at UNCP

PhD Bioinformatics
University of Michigan, Ann Arbor, Michigan

BA Computer Science
George Washington University, Washington, DC

email: [email protected]
web: http://www.unc.edu/~sandefur
twitter: @oshehoma

Page 4: Cracking the (bio)code -- Professional Development Session at SACNAS 2014

What is the evolutionary history of species?
Using transcriptomes and genomes to resolve ancient animal radiations
Phylogeny of snails, slugs, and relatives

What genes are homologous?
Using graph-based approaches to infer homology
Gene clusters inferred to be the “same” gene family across multiple species

AGALMA: https://bitbucket.org/caseywdunn/agalma
BitBucket (Git)
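The graph-based idea on this slide can be illustrated with a minimal Python sketch. It is a generic illustration, not Agalma's actual algorithm: it assumes we already have a list of sequence pairs judged "similar enough" (for example from an all-by-all sequence comparison), treats each sequence as a node and each pair as an edge, and reports every connected component as one candidate gene family. The function name `cluster_homologs` and the gene labels are hypothetical.

```python
from collections import defaultdict


def cluster_homologs(similar_pairs):
    """Group sequences into candidate gene families.

    similar_pairs: iterable of (seq_a, seq_b) tuples that passed some
    similarity threshold (the threshold itself is outside this sketch).
    Returns a list of clusters; each cluster is a sorted list of names.
    """
    # Build an undirected similarity graph as an adjacency map.
    graph = defaultdict(set)
    for a, b in similar_pairs:
        graph[a].add(b)
        graph[b].add(a)

    seen = set()
    clusters = []
    for start in sorted(graph):
        if start in seen:
            continue
        # Depth-first search collects one connected component.
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        clusters.append(sorted(component))
    return clusters


if __name__ == "__main__":
    # Hypothetical gene names from three species.
    pairs = [
        ("snail_g1", "slug_g1"),
        ("slug_g1", "squid_g1"),
        ("snail_g2", "slug_g2"),
    ]
    for family in cluster_homologs(pairs):
        print(family)
```

Connected components are the simplest possible clustering rule; real pipelines typically use more selective graph clustering so that a single spurious match does not merge two families.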

Page 5: Cracking the (bio)code -- Professional Development Session at SACNAS 2014

Dr. Felipe Zapata
Postdoctoral Research Associate
Brown University

COLOMBIA

email: [email protected]
web: http://felipezapata.me
twitter: @zapata_f

PhD Ecology, Evolution & Systematics
University of Missouri-St. Louis, St. Louis, Missouri

BSc Biology
Universidad de Los Andes, Bogotá, Colombia

Page 6: Cracking the (bio)code -- Professional Development Session at SACNAS 2014

What does genetics tell us about human history?

Page 7: Cracking the (bio)code -- Professional Development Session at SACNAS 2014

Dr. Emilia Huerta-Sanchez
Assistant Professor
UC Merced

email: [email protected]
web: http://www.stat.berkeley.edu/~emiliahs
twitter: @emiliahsc

Postdoc in Integrative Biology and Statistics
UC Berkeley, Berkeley, CA

PhD Applied Mathematics
Cornell University, Ithaca, NY

BA Mathematics & French
Mills College, Oakland, CA

Page 8: Cracking the (bio)code -- Professional Development Session at SACNAS 2014

Modeling macro- & molecular evolutionary processes to infer phylogenetic relationships

● How have rates of molecular and morphological evolution changed across the tree of life?

● How do patterns of fossilization, preservation, and recovery change across different taxa?

● Can we detect relationships between geological events and species diversification?

● What are the evolutionary processes acting on different regions of the genome and how have those factors shaped the evolution of different genes?

C++

RevBayes

Probabilistic graphical models

Page 9: Cracking the (bio)code -- Professional Development Session at SACNAS 2014

Dr. Tracy A. Heath
Assistant Professor (Jan. 2015)
Iowa State University

email: [email protected]
web: phyloworks.org
twitter: @trayc7

Postdoctoral Fellow
U. Kansas & U.C. Berkeley

PhD Ecology, Evolution & Behavior
University of Texas at Austin

BA Biology
Boston University

Page 10: Cracking the (bio)code -- Professional Development Session at SACNAS 2014

What is Computational Biology?

What is Bioinformatics?

http://crackingthebiocode.github.io/

Page 11: Cracking the (bio)code -- Professional Development Session at SACNAS 2014

Modeling infectious disease transmission

Page 12: Cracking the (bio)code -- Professional Development Session at SACNAS 2014

Compartmental models are one type of mathematical model used to investigate the spread of infectious disease

Susceptible → Infected → Recovered
(Susceptible → Infected: rate of infection; Infected → Recovered: rate of recovery)

Change in proportion of Susceptible (S) people over time = - Susceptible (S) × Infected (I) × β

dS/dt = -β S I
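A minimal Python sketch of this compartmental model follows. The slide states only the equation for S; the equations for Infected and Recovered used here are the standard SIR forms (dI/dt = β S I - γ I and dR/dt = γ I) and are an assumption, as are the symbol γ for the recovery rate, the function name `simulate_sir`, the forward-Euler integration, and all numeric values.

```python
def simulate_sir(beta, gamma, i0=0.001, days=100, dt=0.1):
    """Integrate the SIR equations with a simple forward-Euler scheme.

    beta:  rate of infection (per day)
    gamma: rate of recovery (per day); assumed symbol, not on the slide
    i0:    initial infected proportion (illustrative value)
    Returns a list of (day, S, I, R) tuples; S, I and R are proportions.
    """
    s, i, r = 1.0 - i0, i0, 0.0
    steps = int(round(days / dt))
    history = [(0.0, s, i, r)]
    for step in range(1, steps + 1):
        new_infections = beta * s * i * dt  # from dS/dt = -beta * S * I
        new_recoveries = gamma * i * dt     # assumed: dR/dt = gamma * I
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((step * dt, s, i, r))
    return history


if __name__ == "__main__":
    # Illustrative parameters only, not taken from the slides.
    history = simulate_sir(beta=0.5, gamma=0.25)
    peak_day, _, peak_i, _ = max(history, key=lambda row: row[2])
    print("peak infected proportion %.3f on day %.1f" % (peak_i, peak_day))
```

Forward Euler keeps the sketch dependency-free; for serious work an ODE solver from a scientific library would be the more usual choice.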

Page 13: Cracking the (bio)code -- Professional Development Session at SACNAS 2014

Infection dynamics for different diseases can be simulated by selecting appropriate parameters

Page 14: Cracking the (bio)code -- Professional Development Session at SACNAS 2014

We can use models to predict how interventions change disease transmission dynamics

Infection dynamics with R0 = 2

Infection dynamics after intervention at day 10, which reduced R0 to 0.8

R0 > 1, infection peaks then disappears

R0 < 1, infection dies out

Simulations run in Python 3.4 (downloaded as part of Anaconda package: http://continuum.io/downloads)
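The intervention scenario can be sketched in a few lines of Python. This is not the presenters' simulation code: it assumes a standard SIR model in which R0 = β / γ, an assumed recovery rate γ of 0.25 per day, an assumed initial infected proportion of 0.001, and forward-Euler integration. Only the values R0 = 2, R0 = 0.8, and the switch at day 10 come from the slide; the function name `simulate_with_intervention` is hypothetical.

```python
def simulate_with_intervention(r0_before, r0_after, switch_day,
                               gamma=0.25, i0=0.001, days=100, dt=0.1):
    """Return the infected proportion over time for an SIR model whose
    R0 changes from r0_before to r0_after at switch_day.

    gamma and i0 are assumed illustrative values, not from the slides.
    """
    s, i = 1.0 - i0, i0
    infected = [i]
    steps = int(round(days / dt))
    for step in range(steps):
        day = step * dt
        r0 = r0_before if day < switch_day else r0_after
        beta = r0 * gamma                   # standard SIR: R0 = beta / gamma
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        infected.append(i)
    return infected


if __name__ == "__main__":
    no_intervention = simulate_with_intervention(2.0, 2.0, switch_day=10)
    intervention = simulate_with_intervention(2.0, 0.8, switch_day=10)
    print("peak without intervention: %.3f" % max(no_intervention))
    print("peak with intervention:    %.3f" % max(intervention))
```

With R0 held at 2 the infected proportion rises to a peak and then declines; once R0 drops below 1, each step removes more infected people than it adds, so the outbreak shrinks from the switch day onward.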

Page 15: Cracking the (bio)code -- Professional Development Session at SACNAS 2014

Agalma: automated and reproducible phylogenetic analyses

Page 16: Cracking the (bio)code -- Professional Development Session at SACNAS 2014

Phylogenetics

From… a few key genes (e.g. 16S rRNA, mitochondria, chloroplasts) across many species

To… high-throughput sequencing of 1000s of genes across many species

[Figure: two data matrices with axes labeled "genes" and "species"]

Page 17: Cracking the (bio)code -- Professional Development Session at SACNAS 2014

Challenges to phylogenetics

• Many steps

• Many programs must be used together

• Computationally intensive

• Difficult to reproduce

Page 18: Cracking the (bio)code -- Professional Development Session at SACNAS 2014

Challenges to phylogenetics

• Many steps

• Many programs must be used together

• Computationally intensive

• Difficult to reproduce

Automate!

Page 19: Cracking the (bio)code -- Professional Development Session at SACNAS 2014

Why automate?

• Results are reproducible

• Results can be easily explored and extended

• Methods can be compared in a controlled setting

• Facilitate method development without reinventing everything

Page 20: Cracking the (bio)code -- Professional Development Session at SACNAS 2014

The tool

https://bitbucket.org/caseywdunn/agalma

Page 21: Cracking the (bio)code -- Professional Development Session at SACNAS 2014

The paper

Page 22: Cracking the (bio)code -- Professional Development Session at SACNAS 2014

The example analysis

https://bitbucket.org/caseywdunn/dunnhowisonzapata2013/

Page 23: Cracking the (bio)code -- Professional Development Session at SACNAS 2014

For each transcriptome:
• Quality control
• Assemble transcriptome
• Translate and annotate genes
• Quantify gene expression
• Put sequences in database

Can also:
• Import DNA sequences from national databases (e.g., NCBI)
• Process externally produced assemblies

Page 24: Cracking the (bio)code -- Professional Development Session at SACNAS 2014

Across transcriptomes (many species):
• Identify homologous genes
• Build phylogenies using all genes!

Silhouette images from http://phylopic.org/

Page 25: Cracking the (bio)code -- Professional Development Session at SACNAS 2014

What tools do you need?

http://crackingthebiocode.github.io/

A biological question

programming skills

statistical modeling

C++

a mathematical model

Page 26: Cracking the (bio)code -- Professional Development Session at SACNAS 2014

Questions?

• What programming language should I learn?
• How do I get started learning a programming language?
• What is the best way to become proficient in a programming language?
• What is the difference between C++ and Python and Java and R and MATLAB and Ruby and ...?
• What is version control? Do I need to know it?
• Do I need a GitHub account?
• Where are jobs or degree programs in computational biology/bioinformatics listed?
• What does it mean to be open source? Why is it important?
• and ...?

http://crackingthebiocode.github.io/

Page 27: Cracking the (bio)code -- Professional Development Session at SACNAS 2014

Take-Home Messages

• You don’t have to be an expert programmer to do computational biology.
• Anyone can learn to program; it’s just a matter of getting started.
• Computational skills are extremely helpful for streamlining biology research.
• The skills you need to learn depend heavily on your background and your research interests.
• Quantitative skills (a firm understanding of math and statistics) are important for any research field.
• Don’t be overwhelmed by all there is to know; these skills grow over time. If you consistently seek to improve them & use them for your work, you will be amazed at how your expertise will develop.

http://crackingthebiocode.github.io/

Page 28: Cracking the (bio)code -- Professional Development Session at SACNAS 2014

Find out more!

http://crackingthebiocode.github.io/profiles.html