Jian Ma | Sequence Comparison and Genome Alignment

Sequence Comparison and Genome Alignment in the Human Genome

Jian Ma

Powerpoint: Casey Hanson

Introduction

The goals of this lab are as follows:

1. Gain experience using BLAST and genome browsers by looking at repeat families in the VHL gene.

2. Become familiar with BLAT and the UCSC website by discovering the identity of a mystery sequence.

3. Visualize pairwise whole-genome alignments and chromosomal rearrangements.

4. View a phylogeny-based multi-genome alignment.

5. Use UCSC tools and Galaxy to intersect annotated functional regions between human and other placental mammals.

Step 0: Shared Desktop Directory

For viewing and manipulating files on the classroom computers, we provide a shared directory in the following folder on the desktop:

classes/mayo

In today’s lab, we will be using the following folder in the shared directory:

classes/mayo/ma

BLAST & Genome Browser

In this exercise, we will use BLAST (Basic Local Alignment Search Tool) to search for significant occurrences of a class of transposable elements (TEs) called Short INterspersed Elements (SINEs), specifically those of the ALU family, in the well-known VHL tumor suppressor gene.

The goal of this exercise is to gain experience using BLAST, particularly blastN, and the UCSC genome browser to answer biologically relevant questions.

Step 1A: BLAST VHL in ALU Database

Go to the following web page: http://blast.ncbi.nlm.nih.gov/Blast.cgi

Click nucleotide_blast

In the Enter Query Sequence box, paste the accession # for VHL:

AF010238

In the Database drop-down list, select the following:

Human ALU repeat elements (alu_repeats)

Click the BLAST button.

Step 1B: BLAST VHL in ALU Database

Step 2A: Interpreting BLAST Results

A match is a significant similarity between a region of the query and a region of a database sequence.

Lines between boxes indicate ‘gaps’ between matches in the query sequence. (The next slide has a legend for interpretation.)

[Figure: BLAST graphic summary. The horizontal axis shows coordinates along the VHL query. Color indicates the quality of each match; the annotations mark very good, good, and okay matches.]

Step 2B: Interpreting BLAST Results

Note the following legend for interpreting a match.

Exonic regions are less likely to have ALU repeats.

Matches like this are likely to be located in intronic regions.

[Figure: BLAST graphic summary with a legend (excellent match, good match, okay match) and the query annotated with its intron and exon regions.]

Step 3A: Examine VHL in UCSC Browser

Let’s look at the structure of the VHL gene in a genome browser to verify that ALU elements are confined to the introns.

Go to the following web page: http://genome.ucsc.edu/

Click Genome Browser

In the search term, type VHL

Click submit

Click the 2nd link: VHL (uc003bvd.3) at chr3:10183319-10195354

Step 3B: Examine VHL in UCSC Browser

Enter chr3:10,177,301-10,201,372 into the input box and click go.

Right click on tracks NOT shown below and hide them.

Right click on the RepeatMasker track and click full. It is dense by default.

Adjust the zoom until you get a view you are comfortable with.

Step 3C: Examine VHL in UCSC Browser

Repeat tracks are 3’ to the gene, 5’ to the gene, or in the intronic regions. This is consistent with our hypothesis.

ALUs are not the only family of SINEs located in the intronic regions. What other SINE families does VHL have? What about TE classes other than SINEs? (Answers provided in separate pdf)

BLAT

In this exercise, we will use BLAT (BLAST-Like Alignment Tool) to search for the identity of a mystery gene annotated in the human genome.

The goal of this exercise is to gain experience using BLAT and the UCSC genome browser to answer biologically relevant questions.

BLAST v. BLAT

BLAST

✓ Can find matches to a query in any set of GenBank sequences.

✓ Not limited to a given k-mer size.

× Consumes a lot of memory.

× Slow compared to BLAT.

BLAT

× Limited to matches to a query in a particular reference genome.

× Limited to non-overlapping 11-mers for DNA.

✓ Can fit an index of an entire genome in memory (< 1 GB of RAM).

✓ Fast compared to BLAST.
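To make the "non-overlapping 11-mers" point concrete, here is a minimal Python sketch of a BLAT-style seed lookup. It is an illustration under simplifying assumptions, not BLAT's actual code: the genome is indexed only at every k-th position, while every overlapping k-mer of the query is looked up. The function names and the toy sequences are hypothetical, and k is reduced to 4 so the example stays readable.

```python
from collections import defaultdict

def build_index(genome: str, k: int = 11) -> dict:
    """Map each non-overlapping k-mer of the genome to its start positions."""
    index = defaultdict(list)
    for start in range(0, len(genome) - k + 1, k):  # step by k: no overlap
        index[genome[start:start + k]].append(start)
    return index

def find_seed_hits(query: str, index: dict, k: int = 11) -> list:
    """Return (query_pos, genome_pos) pairs where a query k-mer is indexed."""
    hits = []
    for q in range(len(query) - k + 1):  # step by 1: every overlapping k-mer
        for g in index.get(query[q:q + k], []):
            hits.append((q, g))
    return hits

# Toy data (hypothetical); real BLAT indexes a whole genome with k = 11.
genome = "ACGTTGCAAGGCTTAACCGG"
query = "TGCAAGGC"
index = build_index(genome, k=4)
hits = find_seed_hits(query, index, k=4)
print(hits)
```

Because only every k-th genome position is stored, the index is roughly k times smaller than one that stores every position, which is the memory saving listed above. The cost is that a short or divergent match can fall between indexed k-mers and be missed.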

Step 1A: BLAT the Mystery Sequence

Go to the following web page: http://genome.ucsc.edu/

Click BLAT

Open our mystery sequence, located below, in Notepad.

classes/mayo/ma/mystery_sequence.txt

Paste the sequence into the textarea

Click submit

Step 1B: BLAT the Mystery Sequence

[Screenshot of the web form for BLAT.]

Step 2A: Identify Mystery Sequence

BLAT will return a list of significant matches in the genome.

Investigate the matches in the list by clicking the browser link for each match.

For example, click the first browser link here.

Step 2B: Identify Mystery Sequence

The screenshot below shows UCSC and RefSeq genes aligned to the mystery sequence, in particular CYP2A13.

Examine the other matches on the previous slide in the genome browser.

Keep in mind 2 questions: (Answers provided at the end of the document)

A. How many potential genes does the mystery sequence come from?

B. What is the relationship among these genes?

Pairwise Whole Genome Alignments

In this exercise, we will use the UCSC Genome Browser to view whole genome alignments, computed by LASTZ, of each of the following genomes to human: orangutan, mouse, dog, and opossum. We will investigate these alignments to see if we can discover chromosomal rearrangements.

Step 1: Create a Custom UCSC Track

Go to the UCSC Genome Browser: http://genome.ucsc.edu/index.html

Under the My Data Tab, click Create Custom Tracks:

In the Paste URLs textbox paste the following and click submit: (no commas)

chr13 58481798 58486558

On the next page, click Go to Genome Browser
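The "(no commas)" note matters because coordinates copied from the browser's position box usually contain thousands separators. As a small sketch, the Python helper below strips them and writes the region in the space-separated form pasted above. The function names and the track name lab_region are made up for illustration, and the leading track line is optional (the slide pastes the coordinates without one).

```python
def bed_line(chrom: str, start, end) -> str:
    """Format one region as a space-separated line with plain integers."""
    # Accept ints or strings such as "58,481,798"; drop thousands separators.
    start = int(str(start).replace(",", ""))
    end = int(str(end).replace(",", ""))
    if start >= end:
        raise ValueError("start must be smaller than end")
    return f"{chrom} {start} {end}"

def custom_track(regions, name="lab_region") -> str:
    """Join a track line (hypothetical name) and one line per region."""
    lines = [f"track name={name}"]
    lines += [bed_line(*region) for region in regions]
    return "\n".join(lines)

# The region from this step, written with commas as it might be copied.
text = custom_track([("chr13", "58,481,798", "58,486,558")])
print(text)
```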

Step 2A: Track Addition

The track should look similar to what is below:

Step 2B: Track Addition and Removal

To get ‘Pairwise Alignments’ we need to turn a few tracks on and one track off.

Specifically, we need to select:

Primate Chain/Net

Placental Chain/Net

Vertebrate Chain/Net

Underneath the Comparative Genomics tab, set these tracks to dense.

Additionally, set Conservation to hide and click refresh.

Step 2C: Track Addition

The resulting view should look like the figure below.

There is one problem: our species of interest are not being displayed.

Step 2D: Species Selection

To select the correct species, go back to the Comparative Genomics tab.

Click on the Primate Chain/Net link.

In the resulting window, set Chains to hide and make sure only Orangutan is selected. Click submit.

Step 2E: Species Selection Continued

Repeat Step 2D for the other two tracks:

Placental Chain/Net

Vertebrate Chain/Net

Make sure your configuration resembles the screenshots below:

[Screenshots: Placental Chain/Net and Vertebrate Chain/Net settings.]

Step 2F: Expand Tracks

On the tracks for each species, right click and select Full.

The resulting Genome Browser (after moving the tracks to the top) should look like the following:

Step 3: Whole Genome Alignment Analysis

Investigate the tracks for each species and answer the following questions.

A. Are the sequence counterparts co-linear with respect to human? If not, is there evidence of genomic rearrangements in this region? Which kind?

B. Can you infer when these rearrangements happened evolutionarily on the diagram to the right?

Answers provided in separate pdf.
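For readers who want a working definition of "co-linear", the Python sketch below walks through alignment blocks in human order and reports where the strand or the order in the other genome changes. It is a simplified illustration, not how the UCSC chain/net tracks are computed. The Block type, the function name, and all coordinates are hypothetical, and it assumes that the other genome's coordinates are always given on its forward strand.

```python
from typing import NamedTuple

class Block(NamedTuple):
    human_start: int   # start of the aligned block in human
    other_start: int   # start of its counterpart in the other genome
    strand: str        # '+' or '-' relative to human

def colinearity_breaks(blocks):
    """List (kind, human_start) for each break between neighboring blocks."""
    ordered = sorted(blocks, key=lambda b: b.human_start)
    events = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.strand != prev.strand:
            events.append(("strand change", cur.human_start))
        elif cur.strand == "+" and cur.other_start < prev.other_start:
            events.append(("order change", cur.human_start))
        elif cur.strand == "-" and cur.other_start > prev.other_start:
            events.append(("order change", cur.human_start))
    return events

# Made-up blocks: the third one lies on the opposite strand.
blocks = [
    Block(100, 5000, "+"),
    Block(200, 5100, "+"),
    Block(300, 5300, "-"),
    Block(400, 5400, "+"),
]
events = colinearity_breaks(blocks)
print(events)
```

An empty list means the blocks are co-linear under these assumptions. Two consecutive strand changes bracket a segment that is a candidate inversion, and an order change without a strand change points to a different kind of rearrangement.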

Phylogeny Based Whole Genome Alignment

In this exercise, we will use the UCSC Genome Browser to view a refined whole genome alignment of the orangutan, mouse, dog, and opossum genomes to human. This alignment is produced by Multiz, a program that takes pairwise whole genome alignments of many species and uses a phylogenetic tree to improve the alignment.

Step 1: Setup Multiz Visualization

Go to the UCSC Genome Browser: http://genome.ucsc.edu/index.html

Upload the following as a Custom Track and go to the genome browser, as in the previous exercise: (no commas)

chr20 61733467 61733528

Under the Comparative Genomics tab in the genome browser, click on Conservation.

Ensure the following settings are in place on the next 2 pages:

Step 1B: Setup Multiz Visualization

Step 1C: Setup Multiz Visualization

Once your configuration resembles the last 2 figures, click submit

Step 2: Multiz Visualization Analysis

Investigate the tracks for each species and answer the following questions:

A. Is this region highly conserved in mammals?

B. Look closely at the Multiz track. Do you see anything strange in the human sequence compared to the other species? What could be the reason for this discrepancy?

(Answers provided in separate pdf)
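One rough way to put a number on "highly conserved" is the fraction of species that agree with human at each alignment column. The Python sketch below computes that for a toy alignment. It is only a per-column identity count, not the phylogenetic model behind the browser's Conservation track, and the sequences are invented for illustration rather than taken from the region in this exercise.

```python
def column_identity(human: str, others: dict) -> list:
    """Fraction of other species matching the human base at each column."""
    for name, seq in others.items():
        if len(seq) != len(human):
            raise ValueError(f"{name} row has a different length")
    scores = []
    for i, base in enumerate(human):
        same = sum(1 for seq in others.values() if seq[i] == base)
        scores.append(same / len(others))
    return scores

# Hypothetical aligned rows; '-' marks a gap in the alignment.
human = "ACGTA"
others = {
    "orangutan": "ACGTA",
    "mouse":     "ACGAA",
    "dog":       "AC-TA",
}
scores = column_identity(human, others)
print([round(s, 2) for s in scores])
```

Columns where the score drops, or where one species shows a gap that the others do not, are the places worth a closer look in the Multiz track.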

After rearranging tracks, the genome browser should resemble the figure below:

Intersection of Annotated Regulatory Regions in Human and Placental Mammals

In this exercise, we will use Galaxy to intersect annotated regulatory regions in human with annotated regions in other placental mammals.

We will then view the intersection in the UCSC genome browser.

Step 1A: Place Regulatory Data in Galaxy

Log in to Galaxy: https://galaxy.illinois.edu/

Upload the file of predicted regulatory regions in hg19 to Galaxy:

classes/mayo/ma/PRe_Mod_hg19.bed

Make sure to identify hg19 as your reference genome.

Acquire all conserved regions in placental mammals from the UCSC Main Table Browser in Galaxy:

Step 1B: Place Regulatory Data in Galaxy

Select Comparative Genomics for Group.

Select Mammal E1: phastConsElements45wayPlacental for table.

Select Genome for region.

Select Galaxy for send output to.

Click Get Output

On the next screen, click Send Query to Galaxy.

Step 2: Intersect Datasets

Go to Operate on Genomic Intervals in Galaxy and select Intersect.

Select the parameters below and click Execute.

When finished, click display at UCSC in the history pane.

[Figure: UCSC results showing chr19 regulatory regions.]
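To see what an interval intersection computes, the Python sketch below intersects two interval lists on a single chromosome and returns the overlapping pieces. It is a minimal illustration, not Galaxy's implementation. It assumes half-open coordinates and that the intervals within each list do not overlap one another. The variable names and coordinates are made up.

```python
def intersect(a, b):
    """Overlapping pieces of two interval lists on the same chromosome."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    pieces = []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:              # the two intervals share at least one base
            pieces.append((start, end))
        # Advance whichever interval ends first; the other may overlap more.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return pieces

# Hypothetical coordinates standing in for the two datasets.
predicted_modules = [(100, 200), (300, 400), (500, 600)]
conserved_elements = [(150, 320), (390, 450), (700, 800)]
pieces = intersect(predicted_modules, conserved_elements)
print(pieces)
```

A real BED file spans many chromosomes, so the same sweep would be run once per chromosome after grouping the intervals by their first column.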

Step 3: Predicted Modules Overlap with PAX5 Regulators

Exploratory Exercise

Pick a gene of interest. (VHL, CMYC, ETS1, TBP, USF2, GATA-1, …)

Visualize the intersected intervals in the UCSC Genome Browser.

See how this region correlates with results from ENCODE to assess their functional roles.

We will come around to help.