61
Bioinformatics to Undergraduates http://www.med.nyu.edu/rc r/ASM Stuart M. Brown Research Computing, NYU School of Medicine

Teaching Bioinformatics to Undergraduates Stuart M. Brown Research Computing, NYU School of Medicine

Embed Size (px)

Citation preview

Page 1: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

Teaching Bioinformatics to Undergraduates

http://www.med.nyu.edu/rcr/ASM

Stuart M. BrownResearch Computing, NYU School of Medicine

Page 2: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

I. What is Bioinformatics?

II. Challenges of teaching bioinformatics to

undergraduates

III. Common bioinformatics tools that you can use for teaching

IV. The limits of knowledge

V. Resources for the teacher

Page 3: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

I. What is Bioinformatics? The use of information technology to collect, analyze,

and interpret biological data. The use of software tools that deal with biological

sequences, genome analysis, molecular structures, gene expression, regulatory and metabolic modeling

Computational biology - the design of new algorithms and software to support biology research

The routine use of computers in all phases of biology and medicine

Page 4: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

The Human Genome Project

Page 5: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

A Genome Revolution in Biology and Medicine

We are in the midst of a "Golden Era" of biology

The Human Genome Project has produced a huge storehouse of data that will be used to change every aspect of biological research and medicine

The revolution is about treating biology as an information science, not about specific biochemical technologies.

Page 6: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

The job of the biologist is changing

As more biological information becomes available …and laboratory equipment becomes more automated ...– The biologist will spend more time using computers– The biologist will spend more time on experimental

design and data analysis (and less time doing tedious lab biochemistry)

– Biology will become a more quantitative science (think how the periodic table affected chemistry)

Page 7: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

II. Why teach bioinformatics in undergraduate education?

Demand for trained graduates from the biomedical industry

Bioinformatics is essential to understand current developments in all fields of biology

We need to educate an entire new generation of scientists, health care workers, etc.

Use bioinformatics to enhance the teaching of other subjects: genetics, evolution, biochemistry

Page 8: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

Biochemistry & Protein Structures "Hands-on graphics is a powerful enhancement to

learning, particularly individualized learning. There is powerful synergy in learning about proteins and learning simultaneously about how to represent and manipulate them with computer graphics. When students learn to use graphics they see proteins and other complex biomolecules in a new and vivid way, and discover personal solutions to the problem of "seeing" new structural concepts."

Molecular Graphics ManifestoGale Rhodes

Chemistry Department, Univ. of Southern Maine

Page 9: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

Challenges of presenting bioinformatics to undergraduates

Requires a deep understanding of molecular biology - lots of prerequisites

Training users or makers of these tools? A good bioinformatics program will require

substantially more math and statistics than most existing molecular biology and computer science curricula.

Who will teach?

Page 10: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

Different Programs, Different Goals

Integrate into existing biology courses:– genetics, molecular biology, microbiology

Make one or a few cross-disciplinary courses– jointly taught by biology and computing faculty– open to both biology and computing students

Create a curriculum for a true bioinformatics major (is this a double major?)

Are you training for employment or providing the fundamentals for advanced training?

Page 11: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

Shallow End

This workshop will focus on faculty skills needed at the shallow end of the continuum

(a few lectures or a short course). Use bioinformatics to teach biological

concepts» Evolution» Genetics» Protein structure and function

Page 12: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

How much Computing skills? Bioinformatics can be seen as a tool that the

biologist needs to use - like PCR Or should biologists be able to write their own

programs and build databases?– it is a big advantage to be able to design exactly the

tool that you want– this may be the wave of the future

Is your school going to train "bioinformatics professionals" or biologists with informatics skills?"

Page 13: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

Designing a Curriculum To really master bioinformatics, students need

to learn a lot of molecular biology and genetics as well as become competent programmers.

Then they need to learn specific bioinformatics skills - dealing with sequence databases, similarity algorithms, etc.

How can students learn this much material and still manage a well rounded education?– Graduates of these programs will become scientists

and managers. Writing and presentation skills are essential components of their education.

Page 14: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

Different Schools have Different Biases

There are still only a handful of bioinformatics undergraduate programs– [Many more schools offer a single course or a

"specialized track" similar to a biotechnology major]

You can generally predict the bias according to what school/department hosts the program– Computer Science vs. biology– Biomedical engineering– Medical informatics (library science)

Page 15: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

Teaching the Teachers There are more graduate level bioinformatics

programs, but they are all very new. Graduates of these programs will have many

opportunities as more schools gear up to offer bioinformatics training

The reality is that most schools will draft existing faculty - often jointly from Bio and CompSci departments

We need to train an entire generation of existing faculty in a new discipline

Page 16: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

Teaching Tips Strike a balance between theory and practical

experience– early bioinformatics training should be about what

you can do with the tools– deeper training can focus on how they work

Balance the "click here" tutorials against letting them figure it out for themselves– it will be different when they look at it next time– real bioinformatics work involves finding ways to

overcome frustrations with balky computer systems

Page 17: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

Training "computer savvy" scientists

Know the right tool for the job

Get the job done with tools available

Network connection is the lifeline of the scientist

Jobs change, computers change, projects change, scientists need to be adaptable

Page 18: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

III. Bioinformatics Tools You Can Use

GenBank - genes, proteins, genomes Similarity Search tools: BLAST Alignment: CLUSTAL Protein families: Pfam, ProDom Protein Structures: PDB, RasMol Whole Genomes: UCSC, Entrez Genomes Human Mutations: OMIM Biochemical Pathways: KEGG Integrated tools: Biology Workbench,

BCM SearchLauncher

Page 19: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

Large Databases Once upon a time, GenBank sent out

sequence updates on CD-ROM disks a few times per year.

Now GenBank is over 40 Gigabytes

(11 billion bases)

Most biocomputing sites update their copy of GenBank every day over the internet.

Scientists access GenBank directly over the Web

Page 20: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

Finding Genes in GenBank

•These billions of G, A, T, and C letters would be almost useless without descriptions of what genes they contain, the organisms they come from, etc.

•All of this information is contained in the "annotation" part of each sequence record.

Page 21: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine
Page 22: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

Entrez is a Tool for Finding Sequences GenBank is managed by the NCBI (National Center

for Biotechnology Information) which is a part of the US National Library of Medicine.

NCBI has created a Web-based tool called Entrez for finding sequences in GenBank.

http://www.ncbi.nlm.nih.gov

Each sequence in GenBank has a unique “accession number”.

Entrez can also search for keywords such as gene names, protein names, and the names of orgainisms or biological functions

Page 23: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine
Page 24: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

Entrez Databases contain more than just DNA & protein sequences

Page 25: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

Type in a Query term

Enter your search words in the

query box and hit the “Go” button

Page 26: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine
Page 27: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

Refine the Query

Often a search finds too many (or too few) sequences, so you can go back and try again with more (or fewer) keywords in your query

The “History” feature allows you to combine any of your past queries.

The “Limits” feature allows you to limit a query to specific organisms, sequences submitted during a specific period of time, etc.

[Many other features are designed to search for literature in MEDLINE]

Page 28: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

You can search for a text term in sequence annotations or in MEDLINE abstracts, and find all articles, DNA, and protein sequences that mention that term.

Then from any article or sequence, you can move to "related articles" or "related sequences".

•Relationships between sequences are computed with BLAST

•Relationships between articles are computed with "MESH" terms (shared keywords

•Relationships between DNA and protein sequences rely on accession numbers

•Relationships between sequences and MEDLINE articles rely on both shared keywords and the mention of accession numbers in the articles.

Related Items

Page 29: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine
Page 30: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

Database Search Strategies General search principles - not limited to

sequence (or to biology) Use accession numbers whenever possible Start with broad keywords and narrow the

search using more specific terms Try variants of spelling, numbers, etc. Search all relevant databases

Be persistent!!

Page 31: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine
Page 32: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

>gb|BE588357.1|BE588357 194087 BARC 5BOV Bos taurus cDNA 5'.

Length = 369

Score = 272 bits (137), Expect = 4e-71

Identities = 258/297 (86%), Gaps = 1/297 (0%)

Strand = Plus / Plus

Query: 17 aggatccaacgtcgctccagctgctcttgacgactccacagataccccgaagccatggca 76

|||||||||||||||| | ||| | ||| || ||| | |||| ||||| |||||||||

Sbjct: 1 aggatccaacgtcgctgcggctacccttaaccact-cgcagaccccccgcagccatggcc 59

Query: 77 agcaagggcttgcaggacctgaagcaacaggtggaggggaccgcccaggaagccgtgtca 136

|||||||||||||||||||||||| | || ||||||||| | ||||||||||| ||| ||

Sbjct: 60 agcaagggcttgcaggacctgaagaagcaagtggagggggcggcccaggaagcggtgaca 119

Query: 137 gcggccggagcggcagctcagcaagtggtggaccaggccacagaggcggggcagaaagcc 196

|||||||| | || | ||||||||||||||| ||||||||||| || ||||||||||||

Sbjct: 120 tcggccggaacagcggttcagcaagtggtggatcaggccacagaagcagggcagaaagcc 179

Query: 197 atggaccagctggccaagaccacccaggaaaccatcgacaagactgctaaccaggcctct 256

||||||||| | |||||||| |||||||||||||||||| ||||||||||||||||||||

Sbjct: 180 atggaccaggttgccaagactacccaggaaaccatcgaccagactgctaaccaggcctct 239

Query: 257 gacaccttctctgggattgggaaaaaattcggcctcctgaaatgacagcagggagac 313

|| || ||||| || ||||||||||| | |||||||||||||||||| ||||||||

Sbjct: 240 gagactttctcgggttttgggaaaaaacttggcctcctgaaatgacagaagggagac 296

Page 33: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

Sample Multiple Alignment

Page 34: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

Protein domains (from ProDom database)

Page 35: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

Original studied protein from whichannotation was inherited.

New sequence

Closest database annotated entry

Limits on best Matched Annotation Inheritanceresult from many things including multi domain proteins transitivity.

Page 36: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

Protein Structure

It is not really possible to predict protein structure from just amino acid sequence

PDB is a database of know protein structures (determined by X-ray crystallography and NMR)

There are also very handy structure viewers such as RasMol that are free for any computer

Page 37: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine
Page 38: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine
Page 39: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

Genome Browsers Scientists need to work with a lot of

layers of information about the genome– coding sequence of known genes and

cDNAs

– computer-predicted genes

– genetic maps (known mutations and markers)

– gene expression

– cross species homology

Page 40: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine
Page 41: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

UCSC

Page 42: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

Ensembl at EBI/EMBL

Page 43: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

Human Alleles The OMIM (Online Mendelian Inheritance

in Man) database at the NCBI tracks all human mutations with known phenotypes.

It contains a total of about 2,000 genetic diseases [and another ~11,000 genetic loci with known phenotypes - but not necessarily known gene sequences]

It is designed for use by physicians:– can search by disease name– contains summaries from clinical studies

Page 44: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine
Page 45: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

KEGG: Kyoto Encylopedia of Genes and Genomes

Enzymatic and regulatory pathways Mapped out by EC number and cross-

referenced to genes in all known organisms

(wherever sequence information exits) Parallel maps of regulatory pathways

Page 46: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine
Page 47: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine
Page 48: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

Integrated Online Tools National Database/Sequence Analsysis

Servers:– NCBI, EMBL/EBI, DDBJ

Tools for specific types of data or problems– Expasy (Protein, Mass Spec, 2-D PAGE)– 3-D Protein Structures:

PDB, Predict Protein Server Education oriented tools

– Biology Workbench Collections of links to other servers

– BCM SearchLauncher

Page 49: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine
Page 50: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine
Page 51: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine
Page 52: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine
Page 53: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine
Page 54: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

The Limits of our Knowledge

Bioinformatics is a very dynamic discipline– Teachers can't know everything in the field

The databases are clumsily built Biology is vastly more complex than our

software– Lots of our current bioinformatics programs don't

work well– We don't have even theoretical solutions for Gene

prediction, alternative splicing, protein structure & function prediction, regulatory networks

Page 55: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

What is a Gene? For every 2 biologists, you get 3 definitions

“A DNA sequence that encodes a heritable trait.”

The unit of heredity

Is it an abstract concept, or something you can isolate in a tube or print on your screen?

“Classic” vs.. “modern” understanding of molecular biology

Page 56: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

Genome Confusion The sequence of a gene in the genome includes:

– protein coding sequence– introns and exons– 5' and 3' untranslated regions on the mRNA– promoter and 5' transcription factor binding sites– enhancers??

What about alternative splicing?– Multiple cDNAs with different sequences (that

produce different proteins) can be transcribed from the same genomic locus

Page 57: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

V. Teaching Resources The Biology Student WorkBench

http://peptide.ncsa.uiuc.edu/bioswb.html

RasMol/Chime/Protein Explorerhttp://www.umass.edu/microbio/rasmol/

http://www.umass.edu/microbio/chime/

http://www.umass.edu/microbio/chime/explorer/

Bioinformatics.org

Other courses - It ain't cheating to learn from your peers

Page 58: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

Terri Attwood's Web Biocomputing tutorialshttp://www.biochem.ucl.ac.uk/bsm/dbbrowser/c32/index.html

http://www.biochem.ucl.ac.uk/bsm/dbbrowser/jj/prefacefrm.html

Sequence Analysis on the Web– Christian Büschking and Chris Schleiermacher

http://bibiserv.techfak.uni-bielefeld.de/sadr/

Online Lectures on Bioinformatics– Hannes Luz, Max Planck Institute for Molecular Genetics

http://lectures.molgen.mpg.de/

Using Computers in Molecular Biology – Stuart Brown, NYU School of Medicine

http://www.med.nyu.edu/rcr/rcr/course/index.htmlTeach Yourself Bioinformatics on the Web

http://www.med.nyu.edu/rcr/rcr/btr/index.html

Page 59: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

Long Term Implications

A "periodic table for biology" will lead to an explosion of research and discoveries - we will finally have the tools to start making systematic analyses of biological processes (quantitative biology).

Understanding the genome will lead to the ability to change it - to modify the characteristics of organisms and people in a wide variety of ways

Page 60: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

Genomics in Medical Education

“The explosion of information about the new genetics will create a huge problem in health education. Most physicians in practice have had not a single hour of education in genetics and are going to be severely challenged to pick up this new technology and run with it."

Francis Collins

Page 61: Teaching Bioinformatics to Undergraduates  Stuart M. Brown Research Computing, NYU School of Medicine

Stuart M. Brown, [email protected]

www.med.nyu/rcr

Bioinformatics: A Biologist's Guide to Biocomputing and the Internet