
Prosite and UCSC Genome Browser Exercise 4


Page 1: Prosite and  UCSC Genome Browser Exercise 4

Prosite and UCSC Genome Browser

Exercise 4

Page 2: Prosite and  UCSC Genome Browser Exercise 4

What is a motif?

A sequence motif = a certain sequence that is widespread and conjectured to have biological significance

Examples:
KDEL – ER-lumen retention signal
PKKKRKV – an NLS (nuclear localization signal)

Page 3: Prosite and  UCSC Genome Browser Exercise 4

More loosely defined motifs

KDEL (usually) + HDEL (rarely) = [HK]-D-E-L: H or K at the first position

This is called a pattern (in biology), or a regular expression (in computer science)
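The [HK]-D-E-L pattern maps directly onto a regular expression. The following is a minimal sketch in Python, assuming the signal has to sit at the very end (C-terminus) of the protein, which is why the expression ends with `$`. The three sequences are made up for illustration.

```python
import re

# [HK]-D-E-L as a regular expression: H or K, then D, E, L.
# The trailing "$" is an assumption of this sketch: it requires the motif
# to be the last four residues of the sequence.
ER_RETENTION = re.compile(r"[HK]DEL$")

# Hypothetical sequences, invented for illustration only.
sequences = {
    "ends_with_KDEL": "MAGRSLLKDEL",
    "ends_with_HDEL": "MAGRSLLHDEL",
    "no_signal": "MAGRSLLADEL",
}

# True where the pattern is found, False otherwise.
hits = {name: bool(ER_RETENTION.search(seq)) for name, seq in sequences.items()}
print(hits)
```

The first two sequences are reported as hits; the third ends in ADEL, and A is not allowed at the first position.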

Page 4: Prosite and  UCSC Genome Browser Exercise 4

Syntax of a pattern

Example: W-x(9,11)-[FYV]-[FYW]-x(6,7)-[GSTNE]

Page 5: Prosite and  UCSC Genome Browser Exercise 4

Patterns

W-x(9,11)-[FYV]-[FYW]-x(6,7)-[GSTNE]

x(9,11): any amino acid, between 9 and 11 times

[FYV]: F or Y or V

WOPLASDFGYVWPPPLAWS
ROPLASDFGYVWPPPLAWS
WOPLASDFGYVWPPPLSQQQ

Page 6: Prosite and  UCSC Genome Browser Exercise 4

Patterns - syntax

The standard IUPAC one-letter codes are used for the amino acids.
‘x’ : any amino acid.
‘[]’ : residues allowed at the position.
‘{}’ : residues forbidden at the position.
‘()’ : repetition of a pattern element is indicated in parentheses: x(n) or x(n,m) gives the number or range of repetitions.
‘-’ : separates each pattern element.
‘<’ : indicates an N-terminal restriction of the pattern.
‘>’ : indicates a C-terminal restriction of the pattern.
‘.’ : the period ends the pattern.
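The syntax rules above can be turned into a small translator from a pattern to a Python regular expression. This is only a sketch under stated assumptions: `prosite_to_regex` is a hypothetical helper written for this example, not part of any Prosite tool, and it covers just the elements listed on this slide. The three test sequences are the example string from the Patterns slide, split into three separate sequences as an assumption.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a Prosite-style pattern into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")       # the period ends the pattern
    parts = []
    for element in pattern.split("-"):          # '-' separates pattern elements
        prefix = suffix = ""
        if element.startswith("<"):             # N-terminal restriction
            prefix, element = "^", element[1:]
        if element.endswith(">"):               # C-terminal restriction
            suffix, element = "$", element[:-1]
        # Split off an optional repetition such as (9) or (9,11).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":
            core = "."                          # any amino acid
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"      # forbidden residues
        # '[...]' and single letters are already valid regex syntax.
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)

regex = prosite_to_regex("W-x(9,11)-[FYV]-[FYW]-x(6,7)-[GSTNE]")
print(regex)  # W.{9,11}[FYV][FYW].{6,7}[GSTNE]

# Example sequences from the Patterns slide (the split into three is assumed).
examples = [
    "WOPLASDFGYVWPPPLAWS",
    "ROPLASDFGYVWPPPLAWS",
    "WOPLASDFGYVWPPPLSQQQ",
]
found = [bool(re.search(regex, seq)) for seq in examples]
print(found)
```

Only the first sequence matches: the second lacks the leading W, and the third has no G, S, T, N or E at the required distance after [FYW].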

Page 7: Prosite and  UCSC Genome Browser Exercise 4

Profile-pattern-consensus

multiple alignment:
AACTTG
AAGTCG
CACTTC

profile:
      1     2     3     4     5
A   0.66    1     0     0     .
T     0     0     0     1     .
C   0.33    0   0.66    0     .
G     0     0   0.33    0     .

consensus:
AACTTG
NANTNN

pattern:
[AC]-A-[GC]-T-[TC]-[GC]
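The three representations can be derived from the multiple alignment with a few lines of code. This is a minimal sketch, assuming the alignment on this slide is the three sequences AACTTG, AAGTCG and CACTTC. The function names are chosen for this example only.

```python
from collections import Counter

alignment = ["AACTTG", "AAGTCG", "CACTTC"]

def column_counts(seqs):
    """Count the letters in each alignment column."""
    return [Counter(column) for column in zip(*seqs)]

def profile(seqs, alphabet="ATCG"):
    """Per-column relative frequency of each letter."""
    n = len(seqs)
    return {letter: [counts[letter] / n for counts in column_counts(seqs)]
            for letter in alphabet}

def consensus(seqs):
    """Most frequent letter in each column."""
    return "".join(counts.most_common(1)[0][0] for counts in column_counts(seqs))

def pattern(seqs):
    """Every letter observed in a column, written in Prosite-like notation."""
    elements = []
    for counts in column_counts(seqs):
        letters = "".join(sorted(counts))
        elements.append(letters if len(letters) == 1 else "[" + letters + "]")
    return "-".join(elements)

print(consensus(alignment))
print(pattern(alignment))
for letter, freqs in profile(alignment).items():
    print(letter, [round(f, 2) for f in freqs])
```

The pattern lists the allowed letters of each column in alphabetical order, so a column written [GC] on the slide appears as [CG] here; the two mean the same thing. The profile keeps the frequencies that the pattern throws away, which is what makes it more suitable for divergent motifs.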

Page 8: Prosite and  UCSC Genome Browser Exercise 4

Prosite

A method for determining the function of uncharacterized translated protein sequences

Database of annotated protein families and functional sites as well as associated patterns and profiles to identify them

Page 9: Prosite and  UCSC Genome Browser Exercise 4

Prosite

Entries are represented with patterns or profiles

pattern:
[AC]-A-[GC]-T-[TC]-[GC]

profile:
      1     2     3     4     5
A   0.66    1     0     0     .
T     0     0     0     1     .
C   0.33    0   0.66    0     .
G     0     0   0.33    0     .

Profiles are used in Prosite when the motif is relatively divergent and difficult to represent as a pattern

Page 10: Prosite and  UCSC Genome Browser Exercise 4

Scanning Prosite

Query: sequence
Result: all patterns found in the sequence

Query: pattern
Result: all sequences which adhere to this pattern
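The two query directions can be sketched with a toy pattern collection and a toy sequence collection. Everything below is hypothetical: the entry names, the sequences and the two functions are invented for illustration and do not reflect how the Prosite server is implemented. The patterns are written directly as Python regular expressions.

```python
import re

# Hypothetical mini-databases, invented for illustration.
patterns = {
    "ER_RETENTION": r"[HK]DEL$",
    "NLS_EXAMPLE": r"PKKKRKV",
}
proteins = {
    "protA": "MSTPKKKRKVEDP",
    "protB": "MAGRSLLKDEL",
    "protC": "MAGRSLLADEA",
}

def scan_sequence(sequence):
    """Query: sequence. Result: names of all patterns found in it."""
    return [name for name, rx in patterns.items() if re.search(rx, sequence)]

def scan_pattern(regex):
    """Query: pattern. Result: names of all sequences that adhere to it."""
    return [name for name, seq in proteins.items() if re.search(regex, seq)]

print(scan_sequence(proteins["protA"]))
print(scan_pattern(r"[HK]DEL$"))
```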

Page 11: Prosite and  UCSC Genome Browser Exercise 4

Searching Prosite with a sequence

Page 12: Prosite and  UCSC Genome Browser Exercise 4

Prosite results for Hemoglobin subunit beta

Page 13: Prosite and  UCSC Genome Browser Exercise 4

Prosite profile

Page 14: Prosite and  UCSC Genome Browser Exercise 4

Prosite profile sequence logo

Page 15: Prosite and  UCSC Genome Browser Exercise 4

Sequence logo

Page 16: Prosite and  UCSC Genome Browser Exercise 4

WebLogo

http://weblogo.berkeley.edu/logo.cgi

Page 17: Prosite and  UCSC Genome Browser Exercise 4

Searching Prosite with a sequence

Page 18: Prosite and  UCSC Genome Browser Exercise 4

Patterns with a high probability of occurrence

Entries describing commonly found post-translational modifications or compositionally biased regions.

Found in the majority of known protein sequences

High probability of occurrence

Page 19: Prosite and  UCSC Genome Browser Exercise 4

Searching Prosite with a pattern

Page 20: Prosite and  UCSC Genome Browser Exercise 4

Searching Prosite with a pattern

[TAFR]-W-Q-Y

Page 21: Prosite and  UCSC Genome Browser Exercise 4

Searching Prosite with a Prosite AC

Page 22: Prosite and  UCSC Genome Browser Exercise 4

UCSC Genome Browser

Page 23: Prosite and  UCSC Genome Browser Exercise 4

UCSC Genome Browser - Gateway

Reset all settings of previous user

Page 24: Prosite and  UCSC Genome Browser Exercise 4

UCSC Genome Browser - Gateway

Page 25: Prosite and  UCSC Genome Browser Exercise 4

UCSC Genome Browser - Gateway

Page 26: Prosite and  UCSC Genome Browser Exercise 4

UCSC Genome Browser query results

Page 27: Prosite and  UCSC Genome Browser Exercise 4

UCSC Genome Browser Annotation tracks

Vertebrate conservation

mRNA (GenBank)

RefSeq

UCSC Genes

Base position

Single species compared

SNPs

Repeats

Direction of transcription (<)

CDS

Intron

UTR

EST based sequence

Page 28: Prosite and  UCSC Genome Browser Exercise 4

UCSC Gene

Page 29: Prosite and  UCSC Genome Browser Exercise 4

UCSC Genome Browser - movement

Zoom x3 + Center

Page 30: Prosite and  UCSC Genome Browser Exercise 4

mRNA annotation track option

Sickle-cell anemia distribution

Malaria distribution

Page 31: Prosite and  UCSC Genome Browser Exercise 4

BLAT

BLAT = BLAST-Like Alignment Tool

BLAT is designed to find similarity of >95% on DNA, >80% for protein

Rapid search by indexing entire genome.

Good for:
1. Finding genomic coordinates of cDNA
2. Determining exons/introns
3. Finding human (or chimp, dog, cow…) homologs of another vertebrate sequence
4. Finding upstream regulatory regions
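To illustrate why indexing the genome makes the search fast, here is a toy k-mer index. It is a simplified sketch and not BLAT's actual implementation: the genome string, the query, the value of k and both function names are made up for this example. The genome is indexed once; each query is then located by looking up its k-mers instead of scanning the whole genome.

```python
from collections import defaultdict

def build_index(genome: str, k: int):
    """Map every k-mer in the genome to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(genome) - k + 1):
        index[genome[pos:pos + k]].append(pos)
    return index

def candidate_offsets(query: str, index, k: int):
    """Count, per genome offset, how many query k-mers agree with it."""
    votes = defaultdict(int)
    for qpos in range(len(query) - k + 1):
        for gpos in index.get(query[qpos:qpos + k], []):
            votes[gpos - qpos] += 1
    return dict(votes)

genome = "TTGACCGATTACAGGCTTAACGGT"   # made-up sequence
query = "GATTACAGG"                   # made-up query, e.g. a cDNA fragment

index = build_index(genome, k=4)      # done once per genome
votes = candidate_offsets(query, index, k=4)
best = max(votes, key=votes.get)      # offset supported by the most k-mers
print(best, genome[best:best + len(query)])
```

A real tool would then align the query to the candidate region in detail, allowing mismatches and gaps such as introns; this sketch stops at finding the candidate position.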

Page 32: Prosite and  UCSC Genome Browser Exercise 4

BLAT on UCSC Genome Browser

Page 33: Prosite and  UCSC Genome Browser Exercise 4

BLAT on UCSC Genome Browser

Page 34: Prosite and  UCSC Genome Browser Exercise 4

BLAT Results

Page 35: Prosite and  UCSC Genome Browser Exercise 4

BLAT Results

Match

Non-match (mismatch/indel)

Indel boundaries

Page 36: Prosite and  UCSC Genome Browser Exercise 4

BLAT Results

Page 37: Prosite and  UCSC Genome Browser Exercise 4

BLAT Results on the browser

Page 38: Prosite and  UCSC Genome Browser Exercise 4

Getting DNA sequence of region

Page 39: Prosite and  UCSC Genome Browser Exercise 4

Getting DNA sequence of region