What is Jalview? acid Asp D acidic polar HOOC-CH2-CH(NH2)-COOH C 4 H 7 NO 4 Cysteine Cys C nonpolar HS-CH2-CH(NH2)-COOH C 3 H 7 NO 2 S Glutamic acid Glu E acidic polar HOOC-(CH2)2-CH(NH2)-COOH


What is Jalview?

Jalview is free-to-use computer software developed at the University of Dundee. It is designed to help research scientists visualise and analyse DNA, RNA and proteins. Jalview has an interactive multi-window interface for viewing sequence alignments, annotations and three-dimensional structures.

Jalview can read files directly from public biological databases. It has a number of analysis tools that let you align sequences, produce trees, measure similarities and compare structures. Jalview can also overlay features and annotations on sequences and structures.

About this workbook

This workbook contains 3 practical exercises for visualising the sub-units that make up DNA, RNA and protein sequences and for viewing their shape.

• Exercise 1: Viewing DNA
• Exercise 2: Viewing RNA
• Exercise 3: Viewing proteins
• Background Information

Who these exercises are for: Secondary school biology pupils

Knowledge required: Moderate computer literacy

Equipment needed: A desktop or laptop computer with a web browser and internet access

Produced by Suzanne Duce with help from Mungo Carstairs, Bob Hanson, Dmitry Finkelbergs, Charlotte Campbell, Benedict Soares, Jim Procter & Geoff Barton

www.jalview.org

Division of Computational Biology, School of Life Sciences, University of Dundee, Dundee, DD1 5EH

Schools Booklet JS - 1st February 2019


Exercise 1: Viewing DNA

Learning Objectives:
• Fetch DNA sequences from a public biological database
• Colour the four nucleotide bases (A, C, G, T) in the sequence
• View DNA structure

1. Launch Jalview in a browser using the link: https://builds.jalview.org/artifact/JB-JB47/shared/build-latest/jalview-js-site/jalview_bin_Jalview.html

2. Go to the File menu in the desktop window. Select Fetch Sequences.

3. Select PDB in the 'New Sequence Fetcher' box. Click OK.

4. Enter the ID 3BSE followed by [return] in the 'PDB Sequence Fetcher' box. Select the first 3BSE entry. Click OK. (Close the 'PDB Sequence Fetcher' box once the sequence has loaded.)

5. Go to the View menu in the alignment window. Select Show Sequence Features. NOTE: Sequence features need to be toggled off, otherwise they will mask the colour change in step 6.

6. Go to the Colour menu. Select Nucleotide. This colours the DNA sequence according to its nucleotide bases.

This is the DNA sequence from the crystal structure of a 16-base-pair B-DNA. For more information see https://www.rcsb.org/structure/3BSE.


7. Jalview's nucleotide colour-scheme.

8. Select both 3BSE DNA sequence names (PDB/3bse/3BSE/….) in the alignment window by holding down the Shift key while clicking each name with the mouse. Right click the mouse to open the pop-up menu. Select 3D Structure Data.

9. Drag the mouse whilst holding down the Shift key to select both 3BSE entries from the list in the 'Structure Chooser' box. Click New View.

10. A structure window opens containing the 3D structure. Click the screen with the mouse, then move the mouse to change the view.

This is the 16-base-pair B-DNA crystal structure. For more information see https://www.rcsb.org/structure/3BSE

Changing the Appearance:
(i) Enlarge the window by placing the mouse on the bottom right-hand corner, then click-and-drag the mouse.
(ii) Rotate the structure by placing the mouse on the structure, then hold down the left mouse button and drag the mouse.
(iii) Zoom by holding down the Shift key, then click-and-drag the mouse up or down.

For more Jmol mouse commands visit http://biomodel.uah.es/en/model4/dna/mouse/mouse-det.htm

Exercise 1 Questions:

Q: What are the names of the four different DNA bases?
Q: What are the single letters that are used to represent them?
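Colouring a sequence by base, as in step 6, amounts to treating each letter as a category. As an optional extra, the short Python sketch below counts how often each base letter appears in a DNA sequence. The function name and the example sequence are made up for illustration; the sequence is not the real 3BSE sequence.

```python
from collections import Counter


def count_bases(sequence: str) -> dict:
    """Count how often each of the four DNA base letters occurs."""
    counts = Counter(sequence.upper())
    # Report all four letters, even if one of them never appears.
    return {base: counts.get(base, 0) for base in "ACGT"}


# Hypothetical example sequence (not taken from 3BSE).
example = "ACGTTGCAAC"
print(count_bases(example))
```

You could paste in a sequence copied from the Jalview alignment window to count its bases in the same way.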


Exercise 2: Viewing RNA


Learning Objectives:
• Fetch an RNA sequence from a public biological database
• Colour the A, C, G, U nucleotide bases
• View an RNA structure

1. Go to the File menu in the desktop window. Select Fetch Sequences.

2. Select PDB in the 'New Sequence Fetcher' box. Click OK.

3. Enter the ID 2GIS followed by [return] in the 'PDB Sequence Fetcher' box. Select 2gis in the results list. Click OK. (Close the 'PDB Sequence Fetcher' box once the sequence has loaded.)

4. Go to the View menu in the alignment window. Select Show Sequence Features. NOTE: Sequence features need to be toggled off, otherwise they will mask the colour change in step 5.

5. Go to the Colour menu. Select Nucleotide. This colours the RNA sequence according to its nucleotide bases.

6. This is the RNA sequence of the SAM-responsive riboswitch mRNA. For more information see https://en.wikipedia.org/wiki/SAM_riboswitch_(S_box_leader)

7. Place the mouse on the sequence name PDB/2gis/2GIS/…. in the alignment window. Right click the mouse to open the pop-up menu. Select 3D Structure Data.

8. Select 2gis from the list in the 'Structure Chooser' box. Click New View.


9. A 3D structure window opens containing the structure of the RNA.

For help with changing the appearance of the Jmol window, see 'Changing the Appearance' in Exercise 1 or visit http://biomodel.uah.es/en/model4/dna/mouse/mouse-det.htm

Jalview's nucleotide colour scheme.

Exercise 2 Questions:

Q: What are the names of the four different RNA bases?
Q: What are the single letters that are used to represent them?
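The glossary in this workbook notes that RNA contains the base uracil (U) where DNA contains thymine (T). As an optional extra, the Python sketch below rewrites a DNA sequence in RNA letters by swapping each T for a U. The function name and the example sequence are made up for illustration, and this is only a letter substitution, not a model of how a cell makes RNA.

```python
def dna_to_rna(dna: str) -> str:
    """Rewrite a DNA sequence using RNA letters (T becomes U)."""
    return dna.upper().replace("T", "U")


# Hypothetical example sequence (not taken from 2GIS).
print(dna_to_rna("ATGCTT"))
```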

Exercise 3: Viewing Proteins

Protein type   Function                                           Example         PDB code   More information
Enzyme         Catalysis of chemical reactions                    Amylase [1]     1smd       http://www.rcsb.org/pdb/101/motm.do?momID=74
Structure      Provides mechanical support to cells and tissues   Collagen [2]    1cag       http://www.rcsb.org/pdb/101/motm.do?momID=4
Defence        Protection against disease                         Antibody [3]    1igt       http://www.rcsb.org/pdb/101/motm.do?momID=21
Storage        Stores small molecules or ions                     Ferritin [4]    5xb1       http://www.rcsb.org/pdb/101/motm.do?momID=35
Signalling     Regulates body metabolism and the nervous system   Insulin [5]     6bcx       http://www.rcsb.org/pdb/101/motm.do?momID=14
Transport      Carries substances around the body                 Myoglobin [6]   1mbn       http://www.rcsb.org/pdb/101/motm.do?momID=1

[1] https://en.wikipedia.org/wiki/Amylase
[2] https://en.wikipedia.org/wiki/Collagen
[3] https://en.wikipedia.org/wiki/Antibody
[4] https://en.wikipedia.org/wiki/Ferritin
[5] https://en.wikipedia.org/wiki/Insulin
[6] https://en.wikipedia.org/wiki/Myoglobin


Learning Objectives:
• Fetch protein sequences from the PDB database
• View the amino acid sub-units in the proteins
• Consider how the 3D shape of a protein affects its function

Background:
The amino acid sequence of a protein influences the shape and the chemical characteristics of the protein. These in turn influence the function (role) of the protein.

1. Go to the File menu in the desktop window. Select Fetch Sequences.

2. Select PDB in the 'New Sequence Fetcher' box. Click OK.

3. Enter the PDB ID as appropriate, e.g. 1smd, followed by [return] in the 'PDB Sequence Fetcher' box. Select the PDB ID entry from the list, e.g. 1smd. Click OK. (Close the 'PDB Sequence Fetcher' box once the sequence has loaded.)

4. Go to the View menu in the alignment window. Select Show Sequence Features. NOTE: Sequence features need to be toggled off, otherwise they will mask the colour change in step 5.

5. Go to the Colour menu. Select Taylor. This colours the amino acid sub-units according to their type.

6. The protein sequence is displayed in the alignment window. Use the horizontal scroll bar to view the entire sequence.

7. Place the mouse on the protein sequence name PDB/pdbid/pdbID/…. in the alignment window. Right click the mouse to open the pop-up menu. Select 3D Structure Data.


Q: Can you identify alpha helix and beta sheet regions in the 3D protein structures?

8. Select the appropriate PDB ID from the list in the 'Structure Chooser' box. Click New View.

9. A 3D structure window opens containing the protein's 3D structure.

For help with changing the appearance of the Jmol window, see 'Changing the Appearance' in Exercise 1 or visit http://biomodel.uah.es/en/model4/dna/mouse/mouse-det.htm

10. Each residue in the Taylor scheme has its own individual colour.

11. Repeat for the other protein types such as collagen (1cag), antibody (1igt), ferritin (5xb1), insulin (6bcx) and myoglobin (1mbn).
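When repeating the exercise it can help to keep the example PDB IDs in one place. As an optional extra, the Python sketch below stores the six IDs from the table in Exercise 3 and builds a web address for each one. It assumes that the address pattern used in this workbook for 3BSE (https://www.rcsb.org/structure/ followed by the ID) applies to the other IDs as well; the names `EXAMPLE_PDB_IDS` and `structure_page_url` are made up for illustration.

```python
# PDB IDs for the example proteins listed in the Exercise 3 table.
EXAMPLE_PDB_IDS = {
    "amylase": "1smd",
    "collagen": "1cag",
    "antibody": "1igt",
    "ferritin": "5xb1",
    "insulin": "6bcx",
    "myoglobin": "1mbn",
}


def structure_page_url(pdb_id: str) -> str:
    """Build a structure page address, assuming the 3BSE link pattern."""
    return f"https://www.rcsb.org/structure/{pdb_id.upper()}"


for name, pdb_id in EXAMPLE_PDB_IDS.items():
    print(f"{name}: {structure_page_url(pdb_id)}")
```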

(TBA)


Background Information

Help with Jalview:

Help can be accessed from the Help menu in Jalview's desktop window by selecting Documentation. A new window opens; use the Table of Contents or the search function to navigate to the appropriate page.

If you would like to find out more, the Jalview manual contains more hands-on exercises. It is available to view or download at www.jalview.org/about/documentation.

Structure of DNA & RNA:

Amino Acids:

Amino acid      3-letter code   1-letter code   Side chain polarity   Linear formula                       Molecular formula
Alanine         Ala             A               nonpolar              CH3-CH(NH2)-COOH                     C3H7NO2
Arginine        Arg             R               basic polar           HN=C(NH2)-NH-(CH2)3-CH(NH2)-COOH     C6H14N4O2
Asparagine      Asn             N               polar                 H2N-CO-CH2-CH(NH2)-COOH              C4H8N2O3
Aspartic acid   Asp             D               acidic polar          HOOC-CH2-CH(NH2)-COOH                C4H7NO4
Cysteine        Cys             C               nonpolar              HS-CH2-CH(NH2)-COOH                  C3H7NO2S
Glutamic acid   Glu             E               acidic polar          HOOC-(CH2)2-CH(NH2)-COOH             C5H9NO4
Glutamine       Gln             Q               polar                 H2N-CO-(CH2)2-CH(NH2)-COOH           C5H10N2O3
Glycine         Gly             G               nonpolar              H-CH(NH2)-COOH                       C2H5NO2
Histidine       His             H               basic polar           NH-CH=N-CH=C-CH2-CH(NH2)-COOH        C6H9N3O2
Isoleucine      Ile             I               nonpolar              CH3-CH2-CH(CH3)-CH(NH2)-COOH         C6H13NO2
Leucine         Leu             L               nonpolar              (CH3)2-CH-CH2-CH(NH2)-COOH           C6H13NO2
Lysine          Lys             K               basic polar           H2N-(CH2)4-CH(NH2)-COOH              C6H14N2O2
Methionine      Met             M               nonpolar              CH3-S-(CH2)2-CH(NH2)-COOH            C5H11NO2S
Phenylalanine   Phe             F               nonpolar              Ph-CH2-CH(NH2)-COOH                  C9H11NO2
Proline         Pro             P               nonpolar              -NH-(CH2)3--*CH-COOH                 C5H9NO2
Serine          Ser             S               polar                 HO-CH2-CH(NH2)-COOH                  C3H7NO3
Threonine       Thr             T               polar                 CH3-CH(OH)-CH(NH2)-COOH              C4H9NO3
Tryptophan      Trp             W               nonpolar              Ph-NH-CH=C-CH2-CH(NH2)-COOH          C11H12N2O2
Tyrosine        Tyr             Y               polar                 HO-Ph-CH2-CH(NH2)-COOH               C9H11NO3
Valine          Val             V               nonpolar              (CH3)2-CH-CH(NH2)-COOH               C5H11NO2
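Jalview displays protein sequences using the one-letter codes from the table above. As an optional extra, the Python sketch below turns a list of three-letter codes into the matching one-letter string. The dictionary is copied from the table; the function name and the example peptide are made up for illustration.

```python
# Three-letter to one-letter codes, copied from the amino acid table.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Glu": "E", "Gln": "Q", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter_codes: list) -> str:
    """Convert codes such as ["Met", "Ala"] into a string such as "MA"."""
    # capitalize() lets "MET" or "met" match the "Met" key.
    return "".join(THREE_TO_ONE[code.capitalize()] for code in three_letter_codes)


# Hypothetical three-residue peptide.
print(to_one_letter(["Met", "Ala", "Gly"]))  # prints MAG
```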


Glossary:

Amino acid:- basic building block molecules of peptides and proteins.
Bioinformatics:- the application of computer and statistical techniques to the management of large amounts of biological data.
cDNA (Complementary DNA):- DNA obtained by reverse transcription of an mRNA template.
CDS (Coding sequence):- the portion of a gene or an mRNA that codes for a protein. Introns are not coding sequences, nor are the 5' or 3' UTR.
Chromosome:- the structure in the cell nucleus that contains all of the cellular DNA together with a number of proteins that compact and package DNA.
Codon:- three consecutive bases in either DNA or RNA that code for an amino acid.
DNA (Deoxyribonucleic acid):- the molecule that encodes genetic information necessary for all cellular functions.
Exon:- the part of the genomic sequence that remains in the transcript (mRNA) after introns have been spliced out.
Gene:- a segment of DNA that encodes a specific protein or a protein subunit.
Genetic code:- the set of triplet letters in DNA (or mRNA) that encode specific amino acids.
Genome:- all of the genetic material in the chromosomes of a particular organism.
Genotype:- all of the genes possessed by a particular individual.
Intron:- the noncoding part of the genomic sequence that is transcribed and then spliced out of the transcript (mRNA).
Multiple sequence alignment:- an alignment of three or more sequences with gaps inserted in the sequences such that residues with common structural positions and/or ancestral residues are aligned in the same column.
Nucleotide:- the building blocks of RNA and DNA.
Phenotype:- the observable characteristics or features of a living organism.
Phylogenetic tree:- an evolutionary tree for organismal species or cellular macromolecules that is built using inheritance or molecular sequence information.
Protein:- a large biological molecule composed of a long string of amino acids joined by peptide bonds.
Protein sequence:- the linear sequence of amino acids in a protein.
RNA (Ribonucleic acid):- a usually single-stranded nucleic acid similar to DNA but containing the sugar ribose rather than deoxyribose and the base uracil (U) rather than thymine (T).
Sequence alignment:- the result of a linear comparison between two or more gene or protein sequences in order to determine their degree of nucleic or amino acid similarity.
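The glossary entries for codon and genetic code describe how triplets of bases encode amino acids. As an optional extra, the Python sketch below reads a coding sequence three bases at a time and looks each codon up in the standard genetic code. The standard code table is not given in this workbook and is supplied here as background knowledge; the names `CODON_TABLE` and `translate` and the example sequences are made up for illustration. The sketch assumes the sequence starts at the first base of a codon and contains only the letters A, C, G and T (or U).

```python
# Standard genetic code, with bases ordered T, C, A, G at each position.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(sequence: str) -> str:
    """Translate a coding sequence into one-letter amino acid codes."""
    # Accept RNA as well as DNA by treating U as T.
    sequence = sequence.upper().replace("U", "T")
    protein = []
    # Step through complete codons only; leftover bases are ignored.
    for start in range(0, len(sequence) - 2, 3):
        amino_acid = CODON_TABLE[sequence[start:start + 3]]
        if amino_acid == "*":  # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical example: ATG (Met), GCC (Ala), then the stop codon TAA.
print(translate("ATGGCCTAA"))  # prints MA
```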

Free Public Biological Databases:
• UniProt is a database of protein sequences (http://www.uniprot.org/).
• Protein Data Bank (PDB) is a database of crystallographic, three-dimensional structural data of large biological molecules (http://www.rcsb.org/).
• Ensembl is a genomic database (http://ensemblgenomes.org/).
• EMBL (CDS) data originates from the European Nucleotide Archive (ENA) database of annotated DNA and RNA sequences (https://www.ebi.ac.uk/ena).