Hemoglobin in cyberspace (A web-based tutorial on biological databases and bioinformatics tools)
A large number of biological databases and bioinformatics tools are available in
cyberspace, and a beginner can easily get bewildered. Where should one begin searching?
Which one suits one's purpose? Which database is more comprehensive, which is less
redundant, which gives results faster, and which tool should be used for a particular
job? A number of such questions start haunting the beginner. She may then decide to go
systematically through all the databases and tools described in a good textbook, slowly
examining each option they offer. As one who once started doing exactly this and
abandoned it after poking around a little, I assure you that you will soon get bored of
this 'systematic' exercise. If not, hats off to you for your endurance!
The purpose of this tutorial is to make bio-cyberspace familiar in a painless way. We
tour cyberspace with a protein well known even to a layman: hemoglobin. To learn more
about hemoglobin, to find its brothers, sisters and distant relatives, and to analyze it,
we visit a number of biological databases and use a number of powerful tools. This
tutorial is certainly not comprehensive, but we hope it can serve as a quick starter.
No originality is claimed.
Permission is hereby granted to everyone to correct any errors they come across while
going through this tutorial. All suggestions for enriching it are also welcome!
Let us start.
1. Get the details of hemoglobin from Wikipedia, a good place in cyberspace to start
searching for any information. Point your browser to
http://en.wikipedia.org/wiki/Hemoglobin, read the article and study the 3D structure
of hemoglobin. You may find it better than many textbook lessons!
2. Go to http://www.ncbi.nlm.nih.gov/, the NCBI home page. NCBI is one of the leading
biological database centres and offers a large number of analysis tools. Spend some
time going through the submenus to see what is offered.
3. Let us find out where exactly in the human genome the hemoglobin genes reside. From
the NCBI home page, select Map Viewer (under the Maps and Markers submenu), or use the
direct URL:
http://www.ncbi.nlm.nih.gov/mapview/
4. Click on the older human build (Build 36.3); old is gold!
5. Note that all the chromosomes (the 22 autosomes and the sex chromosomes, along with
the mitochondrial genome) are represented by symbols. Search for hemoglobin in all
chromosomes.
6. On the result page, find the chromosomes that carry hemoglobin genes. The hits are
concentrated on chromosomes 11 and 16. Scroll down, look through the page, and among the
hemoglobin hits click on 'all matches' of the reference assembly on chromosome 11.
7. On the new page, in the Genes map, click on the symbol HBB to get the Entrez Gene
record for HBB. Take your time going through it.
8. Let us read an OMIM entry. For that, click on MIM 141900 (below Phenotypes).
9. Go back to the Entrez Gene record for HBB. A great deal of literature is available on
hemoglobin. Click on the PubMed links (related articles in PubMed) and then click on any
one paper to view its abstract.
10. Now it is time to get the actual sequences of the hemoglobin gene and protein. Go
back to the Entrez Gene page of HBB and this time select RefSeqs.
11. Let us retrieve the nucleotide sequence of human beta globin. In the Entrez Gene
record for HBB, under RefSeqs, select NM_000518.4. We get the record in GenBank format.
Go through this important document carefully.
12. Change the display format to FASTA and save the sequence as a plain text file: copy
and paste it into Notepad and save it as hbbdna.txt.
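A FASTA file such as hbbdna.txt is plain text: a header line starting with '>', followed by one or more sequence lines. As a minimal sketch in plain Python (no Biopython assumed; the function names are made up for illustration), it can be read back like this:

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs.

    Header lines start with '>'; the sequence lines that follow are joined,
    with spaces removed and letters upper-cased.
    """
    records = []
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line.replace(" ", "").upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def read_fasta_file(path: str) -> list[tuple[str, str]]:
    """Read a FASTA file such as hbbdna.txt from disk."""
    # utf-8-sig also accepts a byte-order mark, which some editors add
    with open(path, encoding="utf-8-sig") as handle:
        return parse_fasta(handle.read())
```

For a single-sequence file, read_fasta_file("hbbdna.txt") should return one (header, sequence) pair, and len() of the sequence gives its length in bases.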
13. Go back to the Entrez Gene record of HBB and get the protein sequence by clicking on
NP_000509. View it in both GenBank and FASTA formats, and save the latter as plain text
(hbbprotein.txt).
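The protein NP_000509 is the translation of the coding sequence (CDS) annotated in the GenBank record NM_000518.4. A minimal sketch of that translation step with the standard genetic code follows; it assumes you have already cut out the CDS yourself, using the coordinates given in the CDS feature of the GenBank record (they are not reproduced here). The names are illustrative only.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). The 64 codons are generated
# in TCAG order (first base slowest, third base fastest), matching AMINO_ACIDS.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str, to_stop: bool = True) -> str:
    """Translate a DNA coding sequence, reading codons from the first base.

    Translation ends at the first stop codon when to_stop is True (otherwise
    stops appear as '*'); codons with ambiguous bases such as N become 'X'.
    A trailing partial codon is ignored.
    """
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)
```

For example, translate("ATGGCCTAA") gives "MA". Comparing translate(cds) with the sequence in hbbprotein.txt is a quick consistency check on the CDS you extracted.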
14. Go back to the Entrez Gene record of HBB and get the source sequence by clicking on
L4217. View it in both GenBank and FASTA formats, and save the latter as plain text
(hbbsource.txt).
15. Now we have the sequences with us. Let us look in cyberspace for proteins similar to
human beta globin. For that, go back to Map Viewer
(http://www.ncbi.nlm.nih.gov/mapview/) and click the B button next to Build 36.3 to run
a BLAST search of the sequence against the human genome.
16. The BLAST page opens for you; BLAST is the most popular bioinformatics tool.
Now BLAST hbbprotein.txt:
- Paste the sequence or upload the file.
- Database: Build Protein
- Program: blastp
- Expect: 10
- Press Begin Search.
- In the Format page that appears, press View report.
- The result page appears.
- Move the mouse over the graphic representation of the matches.
The red matches are the best: beta (the query itself), delta (which takes the place of
beta in the minor adult hemoglobin HbA2), gamma (the beta-type chain of fetal
hemoglobin) and epsilon (the embryonic equivalent of beta)!
- Click on the alpha match and examine the alignment. Note the identities, positives,
score, E value and so on.
- Also examine the match with cytoglobin.
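Two of the numbers in the BLAST alignment view can be reproduced by hand. The Identities line is the number of identical columns over the alignment length, and the E value follows from the bit score S' through the Karlin-Altschul relation E = m × n × 2^(-S'), where m is the query length and n the database length in residues. The sketch below uses these textbook formulas directly; BLAST itself uses effective (edge-corrected) lengths, so its E values will differ somewhat. The λ and K needed to turn a raw score into bits are listed in the statistics section of a text BLAST report. Function names are illustrative.

```python
import math


def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Identical columns as a percentage of the alignment length.

    Gap columns ('-') count toward the length, in the style of BLAST's
    'Identities = n/length' line.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned strings must have the same length")
    identical = sum(1 for q, s in zip(aligned_query, aligned_subject) if q == s and q != "-")
    return 100.0 * identical / len(aligned_query)


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Convert a raw score S into bits: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def evalue_from_bits(bits: float, query_length: int, database_length: int) -> float:
    """Expected chance hits E = m * n * 2**(-S'), with no edge-effect correction."""
    return query_length * database_length * 2.0 ** (-bits)
```

With made-up numbers, a 40-bit hit for a 147-residue query against a database of 10^8 residues gives evalue_from_bits(40, 147, 10**8), about 0.013. Each extra bit halves the E value, which is why small score differences matter so much.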
17. Now go to the general BLAST page (http://blast.ncbi.nlm.nih.gov/Blast.cgi) and
examine the various options. Let us try PSI-BLAST, a much-fancied item.
- Select protein BLAST and upload hbbprotein.txt. Use Swiss-Prot as the database, select
the organism human, select the PSI-BLAST algorithm, filter low-complexity regions, then
press BLAST. Apart from the hemoglobins, only cytoglobin is seen with an E-value better
than the threshold.
- Run the second iteration with all hits below the threshold included. Now myoglobin
and neuroglobin appear as 'new' hits with good E-values.
- Go for the next iteration with the sequences that have good E-values. No new sequences
with good E-values appear, but the E-values of myoglobin, cytoglobin and neuroglobin
improve, confirming that they are all relatives!
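The reason new relatives appear in later PSI-BLAST rounds is that the hits included in one round are aligned and summarized as a position-specific scoring matrix (PSSM), which is then used as the query for the next round. The sketch below shows only that PSSM-building step, under heavy simplifications: a uniform background of 1/20, a flat pseudocount and no sequence weighting. Real PSI-BLAST uses sequence weighting and a more elaborate pseudocount scheme, so treat this as an illustration, not its algorithm.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def build_pssm(alignment: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log-odds PSSM from an alignment of equal-length sequences.

    Each score is log2(q / p): q is the residue frequency in the column with
    `pseudocount` added to every residue, p a uniform background of 1/20.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        # gap characters and non-standard letters are simply not counted
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append(
            {aa: math.log2((counts[aa] + pseudocount) / total / background) for aa in AMINO_ACIDS}
        )
    return pssm


def score_window(pssm: list[dict[str, float]], window: str) -> float:
    """Sum the column scores for a window as long as the PSSM; unknown residues score 0."""
    if len(window) != len(pssm):
        raise ValueError("window length must equal the PSSM length")
    return sum(column.get(aa, 0.0) for column, aa in zip(pssm, window))
```

Residues that are conserved in the included hits get positive scores and everything else negative ones, so a distant relative that shares the conserved positions can score well even when its overall identity to the query is low.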
18. Let us go back to the hemoglobin search. We will try to relate the proteins similar
to human beta globin. For that we need a multiple alignment of them, so let us get all
these proteins in FASTA format as a single file. Point your browser to ExPASy
(http://www.expasy.org/) and click on UniProtKB (Swiss-Prot + TrEMBL), a high-quality
protein knowledgebase whose Swiss-Prot entries are manually reviewed.
19. On the new page, search the UniProt Knowledgebase with the query human hemoglobin
and press the Search button.
20. On the result page (human hemoglobin in UniProtKB):
- Select A, B, D, G1, G2, zeta, epsilon, theta1, mu and cytoglobin, and press the
Retrieve button to get the sequences in FASTA format.
- Click on the Align button to get a ClustalW alignment of them.
- Press the TEXT button to get a text-only result.
- Scroll down and press the ClustalW tree Show button to get the phylogenetic tree.
- Click on the amino acid properties Show bar and select various properties such as
hydrophobic, polar, etc.
- Also examine the sequence annotation features.
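The ClustalW tree is built from pairwise distances between the sequences: ClustalW derives a distance from each pairwise alignment and joins the sequences into a guide tree with neighbour-joining. The sketch below is a simplified stand-in, not ClustalW's procedure: it computes p-distances (the fraction of differing residues over gap-free columns) from an alignment you already have, such as the text-only output, and reports the closest pair, which is the pair a UPGMA-style clustering would join first. The sequence names in the test data are hypothetical.

```python
def p_distance(a: str, b: str) -> float:
    """Fraction of differing residues between two aligned sequences,
    counted only over columns where neither sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return sum(1 for x, y in pairs if x != y) / len(pairs)


def distance_matrix(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """p-distance for every ordered pair of named, aligned sequences."""
    return {(p, q): p_distance(aligned[p], aligned[q]) for p in aligned for q in aligned}


def closest_pair(matrix: dict[tuple[str, str], float]) -> tuple[float, str, str]:
    """Smallest off-diagonal distance, returned as (distance, name1, name2)."""
    return min((d, p, q) for (p, q), d in matrix.items() if p < q)
```

Applied to the globin alignment, you would expect the beta-type chains (beta, delta, gamma, epsilon) to sit closer to one another than to the alpha-type chains, which is what the tree shows.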
21. How similar is myoglobin to human beta globin?
Go to Entrez Gene (the web address is http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene,
but you may find it easier to let Google find it for you!) and search for the gene:
myoglobin human.
22. On the result page, click on MB to fetch the myoglobin gene record.
23. Get the myoglobin protein sequence in FASTA format (use the browser's Find to locate
it under RefSeqs) and save it as plain text (myoglobin.txt).
24. Go to http://www.ebi.ac.uk/emboss/align/ and align beta globin (hbbprotein.txt) with
myoglobin (myoglobin.txt), using both the Smith-Waterman (local alignment) and
Needleman-Wunsch (global alignment) algorithms, and examine the results.
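Both algorithms fill a dynamic-programming table; the difference is that Needleman-Wunsch must align the sequences end to end, while Smith-Waterman lets a cell restart at zero and takes the best cell anywhere, so it reports the best-matching region. The EMBOSS tools use a substitution matrix (such as EBLOSUM62) and affine gap penalties; this sketch deliberately uses simple match, mismatch and linear gap scores (illustrative values) and returns only the scores, so its numbers will not match the EMBOSS output.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Optimal global alignment score (all of a against all of b), linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    dp = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, cols):
        dp[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            dp[i][j] = max(diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[-1][-1]


def smith_waterman(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Optimal local alignment score: cells never drop below 0, best cell anywhere wins."""
    rows, cols = len(a) + 1, len(b) + 1
    dp = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            dp[i][j] = max(0, diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
            best = max(best, dp[i][j])
    return best
```

For "XXACGTXX" against "ACGT", the local score is 4 (the shared ACGT), while the global score is -4 because the four unmatched X residues must be paid for as gaps.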
25. Now let us use a gene-finding tool that uses the inherent properties of a gene to
identify it. Go to GENSCAN (http://genes.mit.edu/GENSCAN.html), the MIT tool based on
hidden Markov models. Upload hbbsource.txt and press the Run GENSCAN button.
Examine the result. (You will find that there is one gene in the sequence, with three
exons and a poly-A signal.) To confirm that the predicted peptide is a real protein,
copy it and BLAST it:
(http://blast.ncbi.nlm.nih.gov/Blast.cgi)
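GENSCAN uses a much richer generalized HMM, with states for exons, introns, promoters, poly-A signals and more. The core decoding idea, labelling each position with the hidden state on the most probable path (the Viterbi algorithm), can be shown with a toy model. The two states ("AT-rich" and "GC-rich") and all probabilities below are made up for illustration and have nothing to do with GENSCAN's trained parameters.

```python
import math


def viterbi(seq, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for seq under an HMM, computed in log space."""
    log = math.log
    # score[s]: best log-probability of any path that ends in state s at the current position
    score = {s: log(start_p[s]) + log(emit_p[s][seq[0]]) for s in states}
    backpointers = []
    for symbol in seq[1:]:
        new_score, pointer = {}, {}
        for s in states:
            best_prev = max(states, key=lambda r: score[r] + log(trans_p[r][s]))
            new_score[s] = score[best_prev] + log(trans_p[best_prev][s]) + log(emit_p[s][symbol])
            pointer[s] = best_prev
        score = new_score
        backpointers.append(pointer)
    # trace back from the best final state
    path = [max(states, key=lambda s: score[s])]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    return path[::-1]


# Toy two-state model with made-up probabilities (hypothetical, not GENSCAN's)
STATES = ("AT-rich", "GC-rich")
START = {"AT-rich": 0.5, "GC-rich": 0.5}
TRANS = {
    "AT-rich": {"AT-rich": 0.9, "GC-rich": 0.1},
    "GC-rich": {"AT-rich": 0.1, "GC-rich": 0.9},
}
EMIT = {
    "AT-rich": {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
    "GC-rich": {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4},
}
```

On "A"*10 + "G"*10 + "A"*10 the decoded path switches to GC-rich exactly over the run of G's and back again: the emission gain outweighs the cost of the two state changes. A gene finder does the same thing with exon and intron states instead.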
26. Another important primary protein database is PIR (http://pir.georgetown.edu/).
Click on Search/Analysis tools, then Search by Text, and give the protein name
haemoglobin beta (any field) human. On the result page, click on the PIRSF document of
HBB_HUMAN.
27. Now let us search the so-called secondary databases, where motifs are stored as
regular expressions (patterns) and profiles.
Go to http://www.expasy.ch/prosite/ and search with hemoglobin and *.
- Click on PDOC00793 (globin family profile).
- Get the globin family profile matrix, PS01033.
- Click on HBB_HUMAN and get the document.
- Go back to PROSITE and get a zinc finger signature, a detour to see a motif expressed
as a pattern, that is, a regular expression.
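A PROSITE pattern is a regular expression in its own notation: elements are separated by '-', x means any residue, [..] lists allowed residues, {..} lists forbidden ones, (n) or (n,m) gives a repeat count, and < or > anchor the pattern to the N- or C-terminus. A minimal converter to Python's re syntax might look like this (the function name is hypothetical; a '>' inside brackets, as in [G>], is deliberately not handled). The test uses the C2H2 zinc finger pattern as it is usually quoted, C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H; check the current PROSITE entry (PS00028) before relying on it.

```python
import re

# optional '<', the residue element, an optional (n) or (n,m) repeat, optional '>'
_ELEMENT = re.compile(r"(<?)(.+?)(\((\d+(?:,\d+)?)\))?(>?)")


def prosite_to_regex(pattern: str) -> str:
    """Convert a PROSITE pattern such as 'C-x(2,4)-C' into a Python regex string."""
    regex = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"cannot parse PROSITE element: {element!r}")
        n_term, core, _, repeat, c_term = m.groups()
        if n_term:
            regex.append("^")
        if core == "x":
            regex.append(".")  # any residue
        elif core.startswith("{") and core.endswith("}"):
            regex.append("[^" + core[1:-1] + "]")  # any residue except these
        else:
            regex.append(core)  # a single residue or a [..] set, same syntax in re
        if repeat:
            regex.append("{" + repeat + "}")
        if c_term:
            regex.append("$")
    return "".join(regex)
```

The result can be passed to re.search to scan a protein sequence; re.search(prosite_to_regex(p), seq) returns None when the signature is absent.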
28. Another important secondary database is BLOCKS. Go to BLOCKS at
http://blocks.fhcrc.org/, select Search Blocks by keyword, and search with beta and
haemoglobin to get the beta haemoglobin signature.
29. Let us search PRINTS, the secondary database that stores fingerprints. Go to PRINTS
(http://www.bioinf.manchester.ac.uk/dbbrowser/PRINTS/index.php, or use Google) and
search by text for beta haemoglobin. Get the BETAHAEM fingerprint. How many elements
(motifs) are there in the fingerprint?
30. Now let us do secondary structure prediction: predicting alpha helices, beta sheets
and coils. Go to JPred (http://www.compbio.dundee.ac.uk/www-jpred), paste the contents
of hbbprotein.txt and click Make Prediction. You will be asked whether to continue,
since homologues exist; continue. You may find yourself in a very long queue, so wait
patiently if you want the result! The result is a rich package. First explore the
'simple' result page. How many alpha helices do you find? Any beta sheets?
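JPred's simple view shows the predicted secondary structure as a line of symbols under the sequence, with H for helix, E for strand (sheet) and - for coil. Assuming you copy that prediction line as a plain string, counting helices and sheets comes down to counting runs of the same symbol. The function names and the min_length cut-off (for ignoring one- or two-residue fragments) are hypothetical conveniences.

```python
import re


def segments(prediction: str, state: str) -> list[tuple[int, int]]:
    """1-based (start, end) positions of each maximal run of `state` in the prediction."""
    return [(m.start() + 1, m.end()) for m in re.finditer(re.escape(state) + "+", prediction)]


def count_segments(prediction: str, state: str, min_length: int = 1) -> int:
    """Number of runs of `state` ('H' helix, 'E' strand) at least min_length long."""
    return sum(1 for start, end in segments(prediction, state) if end - start + 1 >= min_length)
```

For the string "--HHHH--EEE--HHH-", count_segments(..., "H") is 2 and segments(..., "H") gives [(3, 6), (14, 16)]; the positions can then be compared with the helices seen in the 3D structure.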
31. Now let us visit a special type of database that provides structural classification.
Go to CATH (http://www.cathdb.info/), do a sequence search by pasting hbbprotein.txt in
FASTA format, then click on 2dn1B00 and see the result.
32. Now let us go to OWL, a composite database:
http://www.bioinf.manchester.ac.uk/dbbrowser/OWL/index.php
Search by text for beta globin, click on HBB_HUMAN and examine the result.
33. Finally, let us visit the principal repository of 3D macromolecular structures. Go to
the PDB site (http://www.rcsb.org/pdb/) and search for 1HAB. View the 3D image, then
download the PDB file and examine it.
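A PDB file is fixed-column text, so a few lines of Python are enough to start examining the file downloaded for 1HAB. The sketch below (function names hypothetical) counts the residues in each chain by collecting the alpha-carbon (CA) ATOM records, using the PDB format's column positions: atom name in columns 13-16, chain identifier in column 22, residue number in columns 23-26 and insertion code in column 27. HETATM records, such as the heme groups and water, are ignored.

```python
from collections import defaultdict


def residues_per_chain(pdb_text: str) -> dict[str, int]:
    """Count residues in each chain from the alpha-carbon ('CA') ATOM records.

    Fixed PDB columns (1-based): record name 1-6, atom name 13-16,
    chain ID 22, residue number 23-26, insertion code 27.
    HETATM records (heme, water, ...) are skipped.
    """
    residues = defaultdict(set)
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA":
            continue
        chain = line[21]
        # residue number plus insertion code; a set ignores alternate-location duplicates
        residues[chain].add(line[22:27])
    return {chain: len(ids) for chain, ids in residues.items()}


def read_pdb(path: str) -> str:
    """Read a downloaded PDB file; the local file name is up to you."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()
```

For example, residues_per_chain(read_pdb("1hab.pdb")) returns a dictionary of chain IDs and residue counts, assuming you saved the file under that name. The counts can be compared with the chain lengths listed on the PDB entry page.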
This is only a glimpse of cyberspace. Please carry on. Problems like microarray
analysis, databases like InterPro and tools like FASTA are waiting for you. Have a nice
cyberspace tour! Bye!
- Compiled by Dr. Sreenadhan S, Asst. Professor, N.S.S. College of Engineering, Palakkad