Bioc4010 sample questions

1. A) What is the base call accuracy of a base in an Illumina-sequenced short read with a Q value of 20? B) Is this better or worse than a Q value of 10?

Answer: A) Q20 corresponds to an error probability of 1 in 100, or 99% base call accuracy. B) Better. Q10 corresponds to an error probability of 1 in 10, or 90% base call accuracy.

Formula: Q = -10 log10(P), where P is the probability of an incorrect base call.

2. What two primary advantages does exome sequencing provide over whole genome sequencing?

Answer: Cost and data reduction. Exome capture limits the sequencing to known protein-coding genes and some miRNAs.

3. Split and sort the string ‘CAPTAINKIRK’ into its appropriate suffix array.

Answer:
AINKIRK
APTAINKIRK
CAPTAINKIRK
INKIRK
IRK
K
KIRK
NKIRK
PTAINKIRK
RK
TAINKIRK
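The answers to questions 1 and 3 can be checked with a short Python sketch. The function names are illustrative and not part of the course material, and the suffix array is built by plain sorting, which is fine for a short string but is not how genome-scale indexes are constructed.

```python
import math


def phred_to_error_probability(q):
    """Invert Q = -10 * log10(P) to recover the error probability P."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Apply Q = -10 * log10(P) directly."""
    return -10 * math.log10(p)


def suffix_array(text):
    """Return the start positions of all suffixes of text, in sorted suffix order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


# Question 1: compare the accuracy implied by Q10 and Q20.
for q in (10, 20):
    p = phred_to_error_probability(q)
    print(f"Q{q}: error probability {p:g}, accuracy {1 - p:.0%}")

# Question 3: list the sorted suffixes of CAPTAINKIRK with their start positions.
text = "CAPTAINKIRK"
for start in suffix_array(text):
    print(start, text[start:])
```

Storing only the start positions, rather than the suffixes themselves, keeps the index to one integer per character of the text.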

4. Given a base-quality score threshold of Q30, the following short read alignment, and the reference sequence, what is the genotype (two alleles, e.g. G/C) at the indicated position? The base quality at that position is listed beside each read.

AGCTCCCAGGGTCCAG                 Q29
          GTCCAGTCTCGGTT         Q40
      CAGGGTCCAGTC               Q47
           TCCAGTCTCGGTTCCATC    Q35
    CCCAGGGCCCAG                 Q50
        GGGTCCAGTCTC             Q31
   TCCCAGGGCC                    Q10
       AGGGTCCAGT                Q45
 GCTCCCAGGGCCCAGTCT              Q46
  CTCCCAGGGCCC                   Q33
     CCAGGGTCCAGTC               Q38
 GCTCCCAGGGCCCAGTCTCGG           Q41
      CAGGGTCCAGTCTCG            Q15
AGCTCCCAGGGTCCAGTCTCGGTTCCATCTA  (reference)
           *

Answer: Discard the reads whose base quality at the position is below Q30 (here Q29, Q10 and Q15). Then count the reference and alternate bases at the position among the remaining reads (T = 6, C = 4). Both alleles are supported, so the genotype called is T/C (heterozygous).

5. Sort the following types of genetic variants into the categories Potentially Disease Causing and Unlikely to be Disease Causing:

   1. Splice Site
   2. Non-Synonymous
   3. Synonymous
   4. Frameshift Indel
   5. Stop Loss
   6. Stop Gain
   7. Intronic (Non-Splice Site)
   8. Intergenic

Answer: Potentially Disease Causing: 1, 2, 4, 5, 6. Unlikely to be Disease Causing: 3, 7, 8.
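The counting procedure from question 4 (filter by base quality, then tally the bases at the marked position) can be sketched in Python as follows. Several things here are assumptions rather than part of the question: the marked position is taken to be the 12th reference base (0-based index 11, the column where the reads disagree), each read is placed by a naive ungapped best-match search standing in for a real aligner, and the `min_fraction` cutoff for reporting an allele is a hypothetical parameter.

```python
from collections import Counter

REFERENCE = "AGCTCCCAGGGTCCAGTCTCGGTTCCATCTA"
SITE = 11  # assumed 0-based index of the marked position (the T/C column)

# (read sequence, base quality at the marked position), as listed in question 4.
READS = [
    ("AGCTCCCAGGGTCCAG", 29),
    ("GTCCAGTCTCGGTT", 40),
    ("CAGGGTCCAGTC", 47),
    ("TCCAGTCTCGGTTCCATC", 35),
    ("CCCAGGGCCCAG", 50),
    ("GGGTCCAGTCTC", 31),
    ("TCCCAGGGCC", 10),
    ("AGGGTCCAGT", 45),
    ("GCTCCCAGGGCCCAGTCT", 46),
    ("CTCCCAGGGCCC", 33),
    ("CCAGGGTCCAGTC", 38),
    ("GCTCCCAGGGCCCAGTCTCGG", 41),
    ("CAGGGTCCAGTCTCG", 15),
]


def best_offset(read, reference):
    """Naive ungapped placement: the reference offset with the fewest mismatches."""
    def mismatches(offset):
        window = reference[offset:offset + len(read)]
        return sum(a != b for a, b in zip(read, window))
    return min(range(len(reference) - len(read) + 1), key=mismatches)


def allele_counts(reads, reference, site, min_quality=30):
    """Tally the base each sufficiently confident read shows at the site."""
    counts = Counter()
    for seq, quality in reads:
        if quality < min_quality:
            continue  # discard base calls below the quality threshold
        offset = best_offset(seq, reference)
        if offset <= site < offset + len(seq):
            counts[seq[site - offset]] += 1
    return counts


def call_genotype(counts, min_fraction=0.2):
    """Report alleles seen in at least min_fraction of the retained reads."""
    total = sum(counts.values())
    alleles = [base for base, n in counts.most_common() if n / total >= min_fraction]
    if len(alleles) == 1:
        alleles *= 2  # a single supported allele is reported as homozygous
    return "/".join(alleles[:2])


counts = allele_counts(READS, REFERENCE, SITE)
print(dict(counts), call_genotype(counts))
```

Real variant callers weigh base and mapping qualities probabilistically instead of applying a hard count threshold, so this is only a model of the hand calculation asked for in the question.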

6. What is the primary motivation for using “next gen” sequencing methods and modern genomics approaches to diagnosing human genetic diseases?

Answer: Cost.

7. What does the base quality of a sequencing read tell you?

Answer: The base quality encodes the probability of an incorrect base call. (The base call accuracy is also an acceptable answer.)

8. What problem does binary search address?

Answer: Efficiently searching the index of a genome.
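To illustrate question 8, here is a minimal Python sketch of binary search over a sorted index. It assumes the index is a suffix array like the one in question 3, which the answer above does not state explicitly, and the function names are illustrative. Because the suffixes are sorted, every suffix that starts with the pattern sits in one contiguous block, and two binary searches find the boundaries of that block.

```python
def build_suffix_array(text):
    """Start positions of all suffixes of text, in sorted suffix order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return the sorted start positions of pattern in text via binary search."""
    m = len(pattern)

    def prefix(rank):
        start = suffix_array[rank]
        return text[start:start + m]

    # Lower bound: first suffix whose first m characters are >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose first m characters are > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[first:lo])


text = "CAPTAINKIRK"
index = build_suffix_array(text)
for pattern in ("K", "IRK", "SPOCK"):
    print(pattern, find_occurrences(text, index, pattern))
```

Each lookup inspects on the order of log2(n) suffixes instead of scanning all n positions, which is what makes the approach practical for a genome-sized text.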