

5/19/2009

Simulation of Molecular Evolution with Bioinformatics Analysis

Barbara N. Beck, Rochester Community and Technical College, Rochester, MN

Project created by:

Barbara N. Beck, Ph.D., Rochester Community and Technical College, Rochester, MN

Chi-Cheng Lin, Ph.D., Winona State University, Winona, MN

Mingrui Zhang, Ph.D., Winona State University, Winona, MN

Gayle Olsen, M.S., C.N.P., Winona State University, Winona, MN

Robyn L. Keyport, M.Ed., Hastings High School, Hastings, MN


Learning objective

Students will cause a set of molecules to “evolve” and then use bioinformatics computational tools to analyze the relatedness of this set of molecules, displaying the results as phylogenetic trees.

Students will gain an understanding of how phylogenetic trees display evolutionary relationships.

• The “evolved” set of molecules is created by manipulating strings of Pop-It beads consisting of four colors of beads, representing the four DNA bases.

• Base substitutions result from substituting one color of bead for another.


• The ancestor (original sequence) molecule diverges into two lineages, each of which undergoes an independent mutation.

• Each of these lineages also diverges, with the descendants undergoing independent mutations, until a population of eight lineages is created.

#1 ______●______________________________

#2 _________________●____________________

#1a ______●__________●___________________

#1b ______●___________________●__________

#2a _________________●___●________________

#2b ___________●_____●____________________

#1aa ______●__________●______________●_____

#1ab ______●__________●__________________●_

#1ba ______●_______________●___●___________

#1bb ______●________●__________●___________

#2aa __●______________●___●________________

#2ab _______________●_●___●________________

#2ba ___________●_____●_______________●____

#2bb ___________●_____●_________●__________

● = a mutated site
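The branching scheme in the diagram above can also be sketched in software. The following is a minimal Python sketch, not part of the original exercise: the 40-base ancestor, the function names `mutate` and `evolve`, and the fixed random seed are all illustrative assumptions. Unlike the bead diagram, a random draw may occasionally hit the same site twice.

```python
import random

BASES = "ACGT"

def mutate(seq, rng):
    """Return a copy of seq with one randomly chosen site substituted."""
    pos = rng.randrange(len(seq))
    new_base = rng.choice([b for b in BASES if b != seq[pos]])
    return seq[:pos] + new_base + seq[pos + 1:]

def evolve(ancestor, generations=3, seed=0):
    """Split every lineage in two per generation; each child gains one mutation."""
    rng = random.Random(seed)
    lineages = {"": ancestor}
    for gen in range(generations):
        # First split is named 1/2, later splits a/b, matching the slide labels.
        labels = ("1", "2") if gen == 0 else ("a", "b")
        children = {}
        for name, seq in lineages.items():
            for label in labels:
                children[name + label] = mutate(seq, rng)
        lineages = children
    return lineages

ancestor = "ACGT" * 10  # hypothetical 40-base ancestor sequence
tips = evolve(ancestor)
for name, seq in tips.items():
    print(name, seq)
```

Three generations of splitting give the eight tip lineages 1aa through 2bb, each carrying the mutations of its own line of descent.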

Data recording

• Line up bead strings with “bead-size” Excel spreadsheet pages (cells in spreadsheet same size as beads) to record mutations (changes in bead color).


• Transfer changes to “live” spreadsheet.

• Save the altered file with a new name and then select the data-containing rows 3-10 from column B to BG and click Copy.


Data format conversion

• We need to convert the bead data from RGBY (bead colors) to ACGT (nucleotides) and put it in FASTA format, a format compatible with the publicly available analysis tools.

• Open a new document in WordPad or Notepad (Notepad is easier) and click Paste to transfer your data to the text editor.

• Click File, Save as, choose a location for storing the file (the Desktop is easy), type in a filename, and in the Save as type field, choose Text Document (*.txt).


• Open the Java file BBead.jar by double-clicking it.

• Browse for the data file you just saved (as a .txt file) and select it as the input file.

• Type in an output file name and then click Convert Sequence.

• The converted sequence will be displayed on the screen and the output file will be created in the location of the BBead.jar file.
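For readers curious about what this conversion step involves, here is a minimal Python sketch of a color-to-nucleotide FASTA conversion. It is not the BBead.jar code. The slides do not state which color maps to which base, so the `COLOR_TO_BASE` mapping below is a hypothetical one, as are the function name `beads_to_fasta`, the assumption that each pasted cell holds a single letter (R, G, B, or Y), and the default sequence names.

```python
# Hypothetical color-to-base mapping; the mapping BBead.jar uses is not given in the slides.
COLOR_TO_BASE = {"R": "A", "G": "C", "B": "G", "Y": "T"}

def beads_to_fasta(text, names=None):
    """Convert pasted spreadsheet rows of bead colors into FASTA-formatted text."""
    rows = [line for line in text.splitlines() if line.strip()]
    records = []
    for i, line in enumerate(rows):
        cells = line.split()  # cells pasted from a spreadsheet are tab-separated
        seq = "".join(COLOR_TO_BASE[c.upper()] for c in cells)
        name = names[i] if names else f"seq{i + 1}"
        records.append(f">{name}\n{seq}")
    return "\n".join(records) + "\n"

pasted = "R\tG\tB\tY\nY\tY\tR\tG\n"  # two short example rows
print(beads_to_fasta(pasted, names=["1aa", "1ab"]))
```

Each FASTA record is a header line starting with `>` followed by the sequence itself, which is the layout the upload step expects.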


Data analysis

• Open a web browser (Internet Explorer, Firefox) and enter the URL http://workbench.sdsc.edu.

• New users must register for a free account and then log in.

• We will upload our FASTA-formatted data file and use the computational tools available through this site to align the sequences and calculate the phylogenetic tree.

• Your work will be saved in your session.


• To start a new session, click on Session Tools.

• You will then see the following screen. Highlight Start New Session and click Run.


• Name your session and then click Start New Session.

• Your new session will be highlighted as shown below.

• Our data are DNA sequences, so we will use Nucleic Tools.

• Click on Nucleic Tools. You will see the following page. “-Empty” means that no DNA sequences have yet been imported.

• Click Add New Nucleic Sequence and then click Run.


• Click on Browse, find the .txt data file that you converted to FASTA format, select it, and then click Upload File.

• Once the file is uploaded, your sequences will appear on the page. Click Save to store them on the site.


• In the drop-down list, first highlight Select All Sequences (all the boxes should get checked), and then highlight CLUSTALW – Multiple Sequence Alignment.

• CLUSTALW compares and then aligns the sequences.

• You are now on a “Check” page. Verify that all your sequences are listed. (If not, click Abort and re-do the last step.) Then click Submit.


• The alignment may take a minute or so. When it is finished, scroll down the page. The sequence alignments are color-coded. Blue indicates that all the nucleotides at that position are identical.
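To make the idea of identical positions concrete, the short Python sketch below finds the columns of a small alignment in which every sequence carries the same base. The function name `conserved_columns` and the three example sequences are illustrative assumptions; this is not how the Workbench produces its display.

```python
def conserved_columns(aligned):
    """Return indices of alignment columns where all sequences share one base."""
    return [
        i
        for i, column in enumerate(zip(*aligned))
        if len(set(column)) == 1 and column[0] != "-"  # ignore all-gap columns
    ]

aligned = ["ACGTAC", "ACGTTC", "ACCTAC"]  # made-up aligned sequences of equal length
print(conserved_columns(aligned))  # prints [0, 1, 3, 5]
```

Columns 2 and 4 are left out because one sequence differs from the others there, which corresponds to a mutated bead position.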

Guide trees vs. phylogenetic trees

• If you scroll down below the alignment, you will see a guide tree displayed. CLUSTALW builds a guide tree to help align the sequences; the guide tree is not the same as a phylogenetic tree, although it may look very similar.

• In order to calculate the phylogenetic tree, we will do the following:

• On the CLUSTALW result page, click “Import Alignment(s)” near either the top or the bottom of the page.


Calculating phylogenetic trees

• On the result page, check the box in front of “CLUSTALW - Nucleic”. In the drop-down box, highlight and click “DRAWTREE”.

• Click Submit on the newly returned page and you will see the inferred unrooted phylogenetic tree in the result. An unrooted tree does not assume a direction of evolution and therefore will not include a single ancestor node, but will show evolutionary distances.

Unrooted Phylogenetic Tree


• Click Return to get back to the page showing the alignment. Highlight and click “DRAWGRAM” in the drop-down box. Click Submit on the newly returned page and you will see the inferred rooted tree in the result. The rooted tree assumes there is a direction of evolution and displays a single ancestor node.

Rooted Phylogenetic Tree

Tree analysis

The detailed results obtained will vary each time, of course, since the input sequences are created by “random” mutation of the original sequence. However, the overall picture should be similar:

The paired sequences (1aa and 1ab, 1ba and 1bb, 2aa and 2ab, 2ba and 2bb) should be most closely related to each other.

Then the 1aa – 1ab and 1ba – 1bb pairs should cluster, as should the 2aa – 2ab and 2ba – 2bb pairs.

The clustering reflects the temporal order in which the lineages were derived, thus showing that the alignment and tree-drawing algorithms can correctly infer the ancestral relationships.
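The expected pattern can be checked by hand with simple mismatch counts. The Python sketch below is an illustration only and is not the method used by the Workbench tools: it builds toy sequences in which every branch of the scheme mutates its own site (A to C, an assumption made so that no site is hit twice), then counts differences between tips. The names `hamming` and `build_tips` are hypothetical.

```python
from itertools import combinations

def hamming(a, b):
    """Count the positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def build_tips(length=14):
    """Toy data: each branch mutates one site of its own, inherited by descendants."""
    branches = ["1", "2", "1a", "1b", "2a", "2b",
                "1aa", "1ab", "1ba", "1bb", "2aa", "2ab", "2ba", "2bb"]
    site = {name: i for i, name in enumerate(branches)}
    tips = {}
    for tip in branches[6:]:
        seq = ["A"] * length
        for k in range(1, len(tip) + 1):  # apply mutations of every ancestor branch
            seq[site[tip[:k]]] = "C"
        tips[tip] = "".join(seq)
    return tips

tips = build_tips()
dist = {frozenset(p): hamming(tips[p[0]], tips[p[1]]) for p in combinations(tips, 2)}
nearest = {
    name: min((o for o in tips if o != name), key=lambda o: dist[frozenset((name, o))])
    for name in tips
}
for name, other in nearest.items():
    print(name, "->", other)
```

In this toy data, sister tips such as 1aa and 1ab differ at 2 sites, tips sharing only a grandparent (1aa and 1ba) differ at 4, and tips from the two original lineages (1aa and 2aa) differ at 6, so the nearest neighbor of every tip is its sister.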


Conclusions

• It is hoped that the hands-on activity of creating a set of related molecules and then of analyzing the sequence data represented by those molecules will give students a firmer grasp on the concept of how phylogenetic trees display information about evolutionary relationships.

• This activity can be extended by providing (or having students find) protein or DNA sequences using the Taxonomy Browser tool at the NCBI Taxonomy Homepage and then using the tools at the SDSC Biology Workbench to align the sequences and draw the phylogenetic trees.

All files for this exercise and a PDF version of this presentation are available by e-mail from the presenter at [email protected]