Biological Databases in Cyberspace

Hemoglobin in Cyberspace (a web-based tutorial on biological databases and bioinformatics tools)

A large number of biological databases and bioinformatics tools are available in cyberspace, and a beginner can easily get bewildered. Where should one begin searching? Which database suits one’s purpose, which is more comprehensive, which is less redundant, which returns results faster, and which tool should be used for a particular job? Such questions start haunting the beginner. She may then decide to go systematically through all the databases and tools described in a good textbook, examining each option they offer. As one who once tried this and abandoned it after poking around a little, I assure you that you will soon get bored of this ‘systematic’ exercise. If not, hats off to you for your endurance!

The purpose of this tutorial is to make bio-cyberspace familiar in a painless way. We tour cyberspace with a protein known even to a layman: hemoglobin. To learn more about hemoglobin, to find its brothers, sisters and distant relatives, and to analyze it, we visit a number of biological databases and use a number of powerful tools. This tutorial is certainly not comprehensive, but we hope it can serve as a quick starter. No originality is claimed.

And permission is hereby granted to everyone to rectify the errors that they may come

across while going through this tutorial. All sorts of suggestions for enriching this are

also welcome!

Let us start.

1. Get the details of hemoglobin from Wikipedia, a good place in cyberspace to start searching for any information. Point your browser to http://en.wikipedia.org/wiki/Hemoglobin, read the article and study the 3D structure of hemoglobin. You may find it better than many textbook lessons!

2. Go to http://www.ncbi.nlm.nih.gov/, the NCBI home page. NCBI is considered to be the best biological database centre and offers a large number of analysis tools. Spend some time going through the submenus to see what is on offer.

3. Let us find out where exactly in the human genome the genes for hemoglobin reside. Select Map Viewer from the NCBI home page (under the submenu Maps and Markers) or use the direct URL: http://www.ncbi.nlm.nih.gov/mapview/

4. Click on the old human build (Build 36.3). Old is gold!

5. Notice that all the chromosomes (the 22 autosomes and the sex chromosomes, along with the mitochondrial genome) are represented by symbols. Search for hemoglobin in all of them.

6. In the result page, find the chromosomes that carry hemoglobin genes. Most of the hits are on chromosomes 11 and 16. Scroll down, look through the page and click on all matches for the reference assembly on chromosome 11 among the hemoglobin hits.

7. In the new page, on the Genes map, click on the symbol HBB to get the Entrez Gene document on HBB. Take your time going through the document.

8. Let us read an OMIM entry. For that, click on MIM141900 (below Phenotypes).

9. Go back to the Entrez Gene document on HBB. A great deal of literature is available on hemoglobin. Click on the PubMed links (related articles in PubMed) and then click on any one paper to view its abstract.

10. Now it is time to get the actual sequences of the hemoglobin genes and protein. Go back to the Entrez Gene page of HBB and this time select RefSeqs.

11. Let us retrieve the nucleotide sequence of human beta globin. For that, in the Entrez Gene document for HBB, select the RefSeq NM_000518.4. The record is displayed in GenBank format. Go through this important document carefully.

12. Click on the FASTA format and save the FASTA record as a plain text file: copy and paste it into Notepad and save it as hbbdna.txt.

13. Now go back to the Entrez Gene document of HBB and get the protein sequence (by clicking on NP_000509). View it in GenBank and FASTA formats and save the latter as plain text (hbbprotein.txt).

14. Go back once more to the Entrez Gene document of HBB and get the source sequence (by clicking on L4217). View it in GenBank and FASTA formats and save the latter as plain text (hbbsource.txt).
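
The files saved in steps 12 to 14 are all in FASTA format: a header line beginning with ">" followed by one or more lines of sequence. As a quick sanity check on such a file, a minimal Python sketch along the following lines could read the records and report their lengths. The function name read_fasta and the toy record are our own inventions for illustration; the toy sequence is not the real HBB sequence.

```python
from io import StringIO

def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file or file-like object."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            if header is not None:        # a new record starts: emit the previous one
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)           # sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)

# Toy record, made up for illustration (NOT the real beta globin sequence).
toy = StringIO(">toy_protein illustrative record\nACDEFGHIKL\nMNPQRSTVWY\n")
records = list(read_fasta(toy))
for header, seq in records:
    print(header, len(seq))
```

To try it on a real file, replace the StringIO object with open("hbbprotein.txt").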

15. Now we have the sequences with us. Let us look in cyberspace for proteins similar to human beta globin. For that, go back to Map Viewer (http://www.ncbi.nlm.nih.gov/mapview/) and select the button B against Build 36.3 to do a BLAST search of the sequence against the human genome.

16. The page for BLAST, the most popular bioinformatics tool, opens. Now BLAST hbbprotein.txt:
- Paste the sequence or upload the file.
- Select Database: Build Protein.
- Select Program: blastp.
- Set Expect to 10.
- Press Begin Search.
- In the Format page that appears, press View report.
- The result page appears.
- Move the mouse over the graphic representation of the matches. The red matches are the best: beta, which is the query itself; delta, a replacement for beta; gamma, the beta of fetal Hb; and epsilon, the beta equivalent of the embryo!
- Click on the alpha match and examine the alignment. Note the identities, the positives, the score, the E-value and so on.
- Also examine the match with cytoglobin.
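
The "identities" figure in a BLAST alignment counts the aligned positions at which both sequences carry the same residue, out of the full alignment length including gaps. A rough Python sketch of that count, for two already-aligned strings that use "-" for gaps, might look like this. The two aligned strings are toy examples, not a real globin alignment. The "positives" count additionally needs a substitution matrix such as BLOSUM62, which is not shown here.

```python
def identity(aligned_a, aligned_b):
    """Return (identical positions, alignment length) for two aligned strings."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # A position counts only if both residues are equal and neither is a gap.
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return matches, len(aligned_a)

# Toy alignment, made up for illustration.
matches, length = identity("ACDEF-GHIK", "ACEEFMGH-K")
print(f"Identities = {matches}/{length} ({100 * matches // length}%)")
```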

17. Now go to the general BLAST page (http://blast.ncbi.nlm.nih.gov/Blast.cgi) and examine the various options. Let us go for PSI-BLAST, a much-fancied item.
- Select protein blast and upload hbbprotein.txt. Use swissprot as the database, select human as the organism, select PSI-BLAST as the algorithm, filter low-complexity regions and then press BLAST. Other than the hemoglobins, only cytoglobin is seen with an E-value better than the threshold.
- Do the second iteration with all hits better than the threshold included. Now myoglobin and neuroglobin appear as ‘new’ with good E-values.
- Go for the next round of iteration with the sequences that have good E-values. We see no new sequences with good E-values, but the E-values of myoglobin, cytoglobin and neuroglobin have improved, confirming that they are all relatives!

18. Let us go back to the hemoglobin search. We will try to relate the proteins similar to human beta globin. For that we need to make a multiple alignment of them, so let us get all these proteins in FASTA format as a single file. Point your browser to ExPASy (http://www.expasy.org/) and click on UniProtKB (Swiss-Prot + TrEMBL), the protein database of the highest quality.

19. In the new page, search the UniProt Knowledgebase with the query human hemoglobin and press the Search button.

20. In the result page (human hemoglobin in UniProtKB):
- Select A, B, D, G1, G2, zeta, epsilon, theta1, mu and cytoglobin and press the Retrieve button to get the sequences in FASTA format.
- Click on the Align button to get their ClustalW alignment.
- Press the TEXT button to get a text-only result.
- Scroll down and press the ClustalW tree Show button to get the phylogenetic tree.
- Click on the amino acid properties Show bar and select various properties such as hydrophobic, polar, etc.
- Also examine the sequence annotation features.

21. How similar is myoglobin to human beta globin? Go to Entrez Gene (although http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene is the web address, it is easier to let Google find it for you!) and search for the gene: myoglobin human.

22. In the result page, click on MB to fetch the myoglobin gene record.

23. Get the myoglobin protein sequence in FASTA format (use the browser’s Find to locate it under RefSeqs) and save it as plain text (myoglobin.txt).

24. Go to http://www.ebi.ac.uk/emboss/align/ and align beta globin (hbbprotein.txt) with myoglobin (myoglobin.txt). Use both the Smith-Waterman (local alignment) and Needleman-Wunsch (global alignment) algorithms and examine the results.
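
To see why the two algorithms can give different answers on the same pair of sequences, here is a minimal Python sketch of both scoring schemes. It is a simplification under stated assumptions: it uses a plain match/mismatch score and a single linear gap penalty (values chosen arbitrarily), whereas the EMBOSS tools use a substitution matrix and separate gap-open and gap-extend penalties. It returns only the best score, not the alignment itself, and the sequences are toy strings.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch: best score for aligning all of a against all of b."""
    prev = [j * gap for j in range(len(b) + 1)]      # first row: b aligned to gaps
    for i in range(1, len(a) + 1):
        curr = [i * gap]                             # first column: a aligned to gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman: best score over all pairs of substrings of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)  # never below zero
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best

# Toy sequences sharing one common stretch, flanked by unrelated ends.
a, b = "TTTTGATTACA", "GATTACACCCC"
print("global:", global_score(a, b), "local:", local_score(a, b))
```

The local score rewards the shared stretch alone, while the global score is pulled down by the unrelated ends, which is the kind of difference to look for when comparing the two EMBOSS results for beta globin and myoglobin.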

25. Now let us use a gene-finding tool, which uses the inherent properties of a gene to identify it. Go to GENSCAN (http://genes.mit.edu/GENSCAN.html), the MIT tool based on hidden Markov models. Upload hbbsource.txt and press the RUN GENSCAN button. Examine the result (you will find one gene in the sequence, with three exons and a poly-A signal). To check that the predicted peptide sequence matches a known protein, copy it and BLAST it (http://blast.ncbi.nlm.nih.gov/Blast.cgi).

26. Another important primary protein database is PIR (http://pir.georgetown.edu/). Click on search analysis tools, then Search by Text, and give the protein name haemoglobin beta (any field) human. On the result page, click on the PIRSF document of HBB_HUMAN.

27. Now let us search the so-called secondary databases, where motifs are stored as regular expressions and profiles. Go to http://www.expasy.ch/prosite/ and search with hemoglobin and *
- Click on PDOC00793, the globin family profile.
- Get the globin family profile matrix, PS01033.
- Click on HBB_HUMAN and get the document.
- Go back to PROSITE and get a zinc finger signature, a detour to see a motif expressed as a pattern, or regular expression.
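
A PROSITE pattern is close to a regular expression, just written in its own notation: elements are separated by "-", x stands for any residue, square brackets list the allowed residues, curly brackets list the forbidden ones, and a number in parentheses gives a repeat count or range. The following Python sketch translates that notation into a Python regular expression. The function is our own, and it handles only the basic syntax described here. The zinc finger pattern shown is the C2H2 signature as we recall it from PROSITE entry PS00028, so please check it against the entry itself. The test sequence is a toy string built to fit the pattern.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    element_re = re.compile(
        r"(<?)"                              # optional N-terminal anchor
        r"([A-Zx]|\[[A-Z]+\]|\{[A-Z]+\})"    # residue, x (any), allowed set, forbidden set
        r"(?:\((\d+)(?:,(\d+))?\))?"         # optional repeat: (n) or (n,m)
        r"(>?)"                              # optional C-terminal anchor
    )
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = element_re.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        start, core, low, high, end = m.groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # forbidden residues
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + core + repeat + ("$" if end else ""))
    return "".join(parts)

# C2H2 zinc finger pattern as recalled from PROSITE PS00028 (verify before relying on it).
ZINC_FINGER_C2H2 = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H"
regex = prosite_to_regex(ZINC_FINGER_C2H2)
print(regex)

# Toy sequence constructed to contain one occurrence of the pattern.
toy = "AA" + "C" + "AA" + "C" + "AAA" + "L" + "A" * 8 + "H" + "AAA" + "H" + "AA"
match = re.search(regex, toy)
print(match.span() if match else "no match")
```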

28. Another important secondary database is BLOCKS. Go to BLOCKS (http://blocks.fhcrc.org/), select Search BLOCKS by keywords and search with beta and haemoglobin to get the beta haemoglobin signature.

29. Let us search PRINTS, the secondary database that stores fingerprints. Go to PRINTS (http://www.bioinf.manchester.ac.uk/dbbrowser/PRINTS/index.php, or use Google) and search by text: beta haemoglobin. Get the BETAHAEM fingerprint. How many elements (motifs) are there in the fingerprint?

30. Now let us do secondary structure prediction: predicting alpha helices, beta sheets and coils. Go to JPred (http://www.compbio.dundee.ac.uk/www-jpred), paste the contents of hbbprotein.txt and click Make Prediction. You will be asked whether to continue, as homologues exist. Continue. You will probably find yourself in a very long queue. Wait patiently if you want to get the result! The result is a rich package. First explore the ‘simple’ result page. How many alpha helices do you find? Any beta sheets?

31. Now let us visit a special type of database, one that does structural classification. Go to CATH (http://www.cathdb.info/) and do a sequence search by pasting the contents of hbbprotein.txt in FASTA format. Click on 2dn1boo.htm and see the result.

32. Now let us go to OWL, which is a composite database (http://www.bioinf.manchester.ac.uk/dbbrowser/OWL/index.php). Search by text for beta globin, click on HBB_HUMAN and examine the result.

33. Finally, let us visit the best 3D structure repository. Go to the PDB site (http://www.rcsb.org/pdb/) and search for 1HAB. See the 3D image, then download the PDB file and examine it.
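
A PDB file is plain text with fixed-width columns, so it can be examined with a few lines of code as well as by eye. As one possible way of examining it, the Python sketch below counts residues per chain by counting the alpha-carbon (CA) atoms among the ATOM records. It relies on the standard column positions of the PDB format (atom name in columns 13 to 16, chain identifier in column 22). The function name is our own, and the lines used are toy records truncated after the residue number, not lines copied from 1HAB.

```python
from collections import Counter

def residues_per_chain(pdb_lines):
    """Count residues per chain by counting alpha-carbon (CA) ATOM records."""
    counts = Counter()
    for line in pdb_lines:
        # Columns 13-16 hold the atom name, column 22 the chain identifier.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            counts[line[21]] += 1
    return counts

# Toy records for illustration, truncated after the residue number.
toy = [
    "ATOM      1  N   VAL A   1",
    "ATOM      2  CA  VAL A   1",
    "ATOM      3  CA  LEU A   2",
    "ATOM      4  CA  VAL B   1",
    "HETATM    5 FE   HEM A 142",   # heteroatom record, ignored by the count
]
counts = residues_per_chain(toy)
print(dict(counts))
```

On a real file, pass open("1hab.pdb") (or whatever name the download was saved under) in place of the toy list. Residues with alternate locations may be counted more than once by this simple version.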

This is only a glimpse of cyberspace. Please carry on. Problems like microarray analysis, databases like InterPro and tools like FASTA are waiting for you. Have a nice cyberspace tour! Bye!

- Compiled by Dr. Sreenadhan S, Asst.Professor, N.S.S. College of Engineering Palakkad
