Aliya Sadeque BIOC 599 Supervisory Committee Meeting Wednesday December 19, 2007.



Outline

About me
Thesis project blueprint
Course selection


Curriculum Vitae

Queen’s University
Bachelor of Science (Honours) in Biochemistry, Minor in Computing
Graduated May 2007


Previous Coursework: Undergraduate Level

Biochemistry:
  Proteins and Enzymes
  Physical Biochemistry
  Metabolism
  Molecular Biology
  Introductory Biochemistry Laboratory
  Protein Structure and Function
  Current Topics in Biochemistry
  Biochemistry of the Cell
  Advanced Molecular Biology


Previous Coursework: Undergraduate Level

Computing:
  Database Management Systems
  Neural and Genetic Computing
  Introduction to Data Mining
  System Level Programming
  Operating Systems

Mathematics:
  Introduction to Statistics
  Discrete Math for Computer Scientists
  Modeling Techniques in Biology


Thesis Project Blueprint

Context
  Why is this work necessary?
  What kind of tools have been used to address it?
Longest Common Subsequence
Part I: Explore LCSs in poxvirus
  Visualization
  Threshold frequency equation
Part II: Develop an interface for use by biologists


Background

“Promoter sequences might be identified as conserved islands in a divergent sea”

Observed: 42-bp sequence showing “unusually high degree of sequence conservation” (Brunetti et al.)
Are these claims reasonable?
How can they be tested?


Tools

Alignment
0 mismatch suffix tree
Longest Common Subsequence Algorithm
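
As a point of reference for the Longest Common Subsequence algorithm named on this slide, the following is a minimal sketch of the textbook dynamic-programming formulation. The slides do not say which LCS variant or implementation the project uses, so the function name and the toy input strings here are assumptions made for illustration only.

```python
def longest_common_subsequence(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b (textbook DP)."""
    rows, cols = len(a) + 1, len(b) + 1
    # dp[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    dp = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Walk back from the bottom-right corner to recover one LCS.
    out = []
    i, j = len(a), len(b)
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


if __name__ == "__main__":
    # Toy strings, not poxvirus data.
    print(longest_common_subsequence("AGGTAB", "GXTXAYB"))
```

This formulation takes time and memory proportional to the product of the two sequence lengths, so whole-genome comparisons would likely need a more economical approach.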


Visualization


Threshold Frequency

Figure 1. Table showing the number of hits resulting from LCS trials with varying values of n (subsequence length) and k (error number).

        k = 1                   k = 2                   k = 3
length  # solutions     length  # solutions     length  # solutions
10      118643          15      58492           51      27
12      63845           17      6554            52      24
13      23723           18      2105            53      20
14      5966            19      1004            54      16
15      1350            20      667             55      12
17      344             25      216             56      10
20      191             30      114             57      7
25      101             40      46              59      7
30      48              45      24              60      5
35      28              50      14              61      2
36      25              53      6               62      0
40      13              54      5               63      0
45      6               55      4               64      0
50      1               57      2               65      0
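
The slides do not define how a "hit" is counted, so the sketch below rests on one assumed reading: a hit is a pair of length-n windows, one from each sequence, that differ at no more than k positions. The names `count_hits` and `within_k_mismatches` and the toy sequences are hypothetical, and this brute-force scan is not the project's actual method.

```python
def within_k_mismatches(x: str, y: str, k: int) -> bool:
    """True if equal-length strings x and y differ at no more than k positions."""
    mismatches = 0
    for cx, cy in zip(x, y):
        if cx != cy:
            mismatches += 1
            if mismatches > k:
                return False
    return True


def count_hits(seq_a: str, seq_b: str, n: int, k: int) -> int:
    """Count pairs of length-n windows (one per sequence) within k mismatches."""
    hits = 0
    for i in range(len(seq_a) - n + 1):
        window_a = seq_a[i:i + n]
        for j in range(len(seq_b) - n + 1):
            if within_k_mismatches(window_a, seq_b[j:j + n], k):
                hits += 1
    return hits


if __name__ == "__main__":
    # Toy sequences, not poxvirus genomes.
    a = "ACGTACGTTAGC"
    b = "ACGAACGTTTGC"
    for n in (4, 6, 8):
        print(n, [count_hits(a, b, n, k) for k in (1, 2, 3)])
```

Under this reading, the count can only stay the same or grow as k increases for a fixed n, because every window pair accepted at a smaller k is also accepted at a larger one.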


User Interface

Design with usability in mind
Selection of inputs – What kind of genomes can/will this tool be used for?
Format of results – How should these be presented in order to allow interpretation?
Visualization
Further processing of output


Timeline

Part I: Poxvirus LCS data collection and analysis (2 months)
Part II: Interface (4-6 months)


Course Selection

BIOC 570 - completed
MICR 502 - Virology
Courses to sit in for:
  Biochemistry courses?
  Computing courses?
  Data mining
  Bioinformatics
  Statistics