Validation & Critical Thinking
Gerard J. Kleywegt, Uppsala University
Critical thinking
What is wrong here?
Critical thinking
And what is wrong here?
Critical thinking
What is wrong here?
(1) The tacR gene regulates the human nervous system
(2) The tacQ gene is similar to tacR but is found in E. coli
==> So, the tacQ gene regulates the E. coli nervous system!
Critical thinking
Of course there is a fine line between critical thinking and silliness …
Knowledge pyramid
Data
• Processing
• Visualisation
Information
• Analysis
• Interpretation
• Validation
Knowledge
• Insight
• Experience
Wisdom
• Swedish friends?
• Luck?
• Longevity
Nobel Prize
Data versus information
Data
• Facts
• Observations
Information
• Context
• Meaning
• Interpretation
ATOM 2567 N PHE B 175 7.821 -25.530 -22.848 1.00 8.71
ATOM 2568 CA PHE B 175 8.845 -25.172 -21.877 1.00 9.41
ATOM 2569 C PHE B 175 9.449 -23.798 -22.169 1.00 10.02
ATOM 2570 O PHE B 175 10.664 -23.613 -22.103 1.00 10.37
ATOM 2571 CB PHE B 175 9.928 -26.251 -21.848 1.00 9.53
ATOM 2572 CG PHE B 175 10.969 -26.137 -22.982 1.00 10.03
ATOM 2573 CD1 PHE B 175 12.356 -25.819 -22.988 1.00 10.51
ATOM 2574 CD2 PHE B 175 11.725 -27.211 -23.402 1.00 10.25
ATOM 2575 CE1 PHE B 175 11.821 -27.095 -22.869 1.00 11.17
ATOM 2576 CE2 PHE B 175 12.282 -26.086 -24.008 1.00 10.95
ATOM 2577 CZ PHE B 175 10.953 -26.335 -23.622 1.00 11.38
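The step from data to information can be sketched in a few lines of Python. This is only an illustration: the `Atom` class and `parse_atom_line` function are hypothetical names, and splitting on whitespace is an assumption that happens to work for the records shown here, whereas real PDB files use fixed-width columns.

```python
import math
from dataclasses import dataclass


@dataclass
class Atom:
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float


def parse_atom_line(line: str) -> Atom:
    """Turn one ATOM record (raw data) into named fields (context)."""
    # Assumption: whitespace-separated fields, as in the records on the slide.
    # Real PDB files are fixed-column and need column slicing instead.
    fields = line.split()
    if len(fields) != 11 or fields[0] != "ATOM":
        raise ValueError(f"unexpected record: {line!r}")
    return Atom(
        serial=int(fields[1]),
        name=fields[2],
        res_name=fields[3],
        chain=fields[4],
        res_seq=int(fields[5]),
        x=float(fields[6]),
        y=float(fields[7]),
        z=float(fields[8]),
        occupancy=float(fields[9]),
        b_factor=float(fields[10]),
    )


def distance(a: Atom, b: Atom) -> float:
    """Euclidean distance between two atoms, in the units of the coordinates."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


records = [
    "ATOM 2567 N PHE B 175 7.821 -25.530 -22.848 1.00 8.71",
    "ATOM 2568 CA PHE B 175 8.845 -25.172 -21.877 1.00 9.41",
    "ATOM 2569 C PHE B 175 9.449 -23.798 -22.169 1.00 10.02",
]
atoms = [parse_atom_line(r) for r in records]

# Information derived from the data: a bond length and a mean B-factor.
n_ca_distance = distance(atoms[0], atoms[1])
mean_b = sum(a.b_factor for a in atoms) / len(atoms)
print(f"N-CA distance: {n_ca_distance:.2f}, mean B-factor: {mean_b:.2f}")
```

The numbers alone are data; a distance that can be compared with the expected N-Cα bond length is information that can then be validated.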
Karl Popper - falsifiability
A theory that is not falsifiable is not scientific
Example
• Theory: all swans are white
• New observation: black swan (Australia)
• New theory 1: Australian ornithologists are incompetent
• New theory 2: all swans except Cygnus atratus are white; C. atratus is black
Astrology versus astronomy
Occam’s razor
• Do not make more assumptions than strictly needed
• When you hear hoof beats, think horses, not zebras (unless you are in Africa!)
• KISS principle - Keep It Simple, Stupid
• Of two equivalent theories or explanations, all other things being equal, the simpler one is to be preferred
Maximum parsimony
Bioinformatics basics
• Don’t always believe what databases / programs / lecturers tell you!
• They (almost) always give you some answer, but this can be misleading and is sometimes wrong
• Don’t be a naïve user
• Garbage in, garbage out
• Statistical versus biological significance
• Use common sense!
Bioinformatics basics
What is the right question to ask?
Understand limitations of data, databases, search algorithms, alignment methods, prediction methods, etc.
Evaluate result: does it answer your question? Does it make sense?
Validation
Validation = establishing or checking the truth or accuracy of (something)
• Theory
• Hypothesis
• Model
• Assertion, claim, statement, observation
Integral part of scientific activity!
Science, errors & validation
Prior knowledge
Observations / Experiment
Hypothesis or Model
Predictions
Precision versus accuracy
• Precise, but not very accurate. Ex: π ~ 4.0053 ± 0.0001
• Fairly accurate, but not very precise. Ex: π ~ 3.1 ± 0.1
• Accurate and precise. Ex: π ~ 3.1416 ± 0.0001
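The three cases can be made concrete with a minimal Python sketch. The measurement lists are invented for illustration and only mimic the three π examples, and `bias_and_spread` is a hypothetical helper: it reports the offset of the mean from the true value (accuracy) and the sample standard deviation (precision).

```python
import math
import statistics


def bias_and_spread(measurements, true_value):
    """Return (bias, spread) for repeated measurements of a known quantity.

    bias   = mean - true value       (accuracy; systematic error)
    spread = sample standard deviation (precision; random error)
    """
    mean = statistics.mean(measurements)
    return mean - true_value, statistics.stdev(measurements)


# Invented data, chosen only to resemble the three cases above.
cases = {
    "precise, not accurate": [4.0052, 4.0053, 4.0054, 4.0053],
    "accurate, not precise": [3.0, 3.2, 3.1, 3.2],
    "accurate and precise": [3.1415, 3.1416, 3.1417, 3.1416],
}

results = {label: bias_and_spread(data, math.pi) for label, data in cases.items()}
for label, (bias, spread) in results.items():
    print(f"{label}: bias = {bias:+.4f}, spread = {spread:.4f}")
```

Note that the bias can only be computed here because the true value of π is known; in a real experiment a small spread says nothing about accuracy.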
Errors affect measurements
Random errors (noise)
• Affect precision
• Usually normally distributed
• Reduce by increasing nr of observations
Systematic errors (bias)
• Affect accuracy
• Incomplete knowledge or inadequate design
• Reproducible
Gross errors (bloopers)
• Incorrect assumptions, undetected mistakes or malfunctions
• Sometimes detectable as outliers
Errors affect measurements
Bias (accuracy)
Precision (uncertainty; random error)
Errors affect measurements
How tall is Gerard?
200 203 202 203 202 201 203 80
Random error? Systematic error? Gross error?
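One way to look for a gross error in such a series is a robust outlier test. The sketch below is an assumption-laden illustration, not a method taken from the slides: `flag_gross_errors` is a hypothetical helper that uses the median and the median absolute deviation (MAD), with the conventional constants 0.6745 and 3.5 of the modified z-score.

```python
import statistics


def flag_gross_errors(values, threshold=3.5):
    """Split values into (kept, flagged) using the modified z-score.

    The median and MAD are used instead of the mean and standard deviation
    because a single gross error distorts the latter two badly.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # All values (nearly) identical: nothing can be flagged this way.
        return list(values), []
    kept, flagged = [], []
    for v in values:
        score = 0.6745 * abs(v - med) / mad
        (flagged if score > threshold else kept).append(v)
    return kept, flagged


# The measurements from the slide (units are not stated there).
heights = [200, 203, 202, 203, 202, 201, 203, 80]

kept, flagged = flag_gross_errors(heights)
print("flagged as possible gross errors:", flagged)
print("mean of all values:", statistics.mean(heights))
print("mean of kept values:", statistics.mean(kept))
print("spread of kept values:", round(statistics.stdev(kept), 2))
```

The remaining scatter among the kept values reflects random error. A systematic error, such as a miscalibrated ruler, would shift every value in the same way and cannot be detected from these numbers alone.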
Science, errors & validation
Prior knowledge
Observations / Experiment
Hypothesis or Model (parameterisation, optimised values)
Predictions
[Diagram: check marks show which of these stages can be affected by random errors (precision), systematic errors (accuracy) and gross errors (both)]
Science not immune to Murphy’s Law!
Science, errors & validation
Prior knowledge
Observations / Experiment
Hypothesis or Model
Predictions
Fit? Explain?
Quality? Quantity? Inf. content? Reliable?
Experiments
Correct?
Independent observations
Predict?
Other prior knowledge
Fit?
Structure validation
• What type of residue is this?
• What is wrong with it?
• How did it end up in the PDB?
Structure validation
Should we trust the PDB?
• Structures are based on experimental data
• Amount of data differs
• Structures are interpretations of data
• PDB must accept all depositions
Resolution
Low resolution: little detail
High resolution: much detail
Resolution
1ISR: 4.0 Å
1EA7: 0.9 Å
Interpretation
Structure validation
Users of structures must make sure that these are reliable for their purposes
Ramachandran plot
Fit of model and electron density (http://eds.bmc.uu.se/)
Validation tutorial: http://xray.bmc.uu.se/embo2001/modval/
Torsion angles
Dihedral or torsion angle - given 4 sequential, bonded atoms A-B-C-D
• Dihedral = angle between the planes ABC and BCD
• Torsion = looking at the projection along bond B-C, the angle over which one has to rotate A to bring it on top of D (clockwise = positive)
• note: torsion (ABCD) = torsion (DCBA)
• phi = torsion (C[i-1]-N[i]-Cα[i]-C[i])
• psi = torsion (N[i]-Cα[i]-C[i]-N[i+1])
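The torsion definition above can be turned into a short calculation. The following is a minimal sketch in plain Python, using the common atan2 formulation; the function name `torsion` and the test coordinates are illustrative and not taken from the slides.

```python
import math


def _sub(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def _cross(u, v):
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def torsion(a, b, c, d):
    """Torsion angle A-B-C-D in degrees, in the range (-180, 180].

    Positive means: looking along the bond from B to C, atom A must be
    rotated clockwise to land on top of atom D.
    """
    b1 = _sub(b, a)  # bond A -> B
    b2 = _sub(c, b)  # bond B -> C (the rotation axis)
    b3 = _sub(d, c)  # bond C -> D
    n1 = _cross(b1, b2)  # normal of plane ABC
    n2 = _cross(b2, b3)  # normal of plane BCD
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Illustrative coordinates: B-C lies along the z axis.
A, B, C, D = (1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)
angle_abcd = torsion(A, B, C, D)
angle_dcba = torsion(D, C, B, A)
print(angle_abcd, angle_dcba)
```

Applying `torsion` to the coordinates of C[i-1], N[i], Cα[i], C[i] would give phi, and to N[i], Cα[i], C[i], N[i+1] would give psi; reversing the atom order leaves the value unchanged, as the note above states.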
Validation alert: The arrow points the wrong way!!!
Ramachandran plot
Steric clashes (pink dashed lines) develop during rotation around phi (left) and psi (right)
Only certain phi, psi combinations are sterically favourable/allowed: Ramachandran plot
Ramachandran plot
• Favourable regions in the Ramachandran plot
• Good models have very few residues outside these regions
• If there are any, there is usually a good reason
Ramachandran plot
Good model:
• Few outliers
• Strong concentration in core regions
PDBsum
Same structure, different data
Global quality important
Local quality also
• Active site
• Ligand
• Substrate analogue
• Metal-binding site
• Important loop
• …
Electron density fit
Good fit of model and density
Electron density fit
Poor fit of model and density
Electron density fit
PDBreport
http://swift.cmbi.ru.nl/gv/pdbreport/
Oops!
Playing the Blame Game …
Why do errors make it into the literature and the PDB? Who is to blame?
Playing the Blame Game …
Suggestions from students
• Cold Spring Harbor course, 2005
• Copenhagen University course, 2006
Playing the Blame Game …
• Crystallographer (ignorance, lack of experience, incompetence, incorrect preconceptions/bias, cheating, laziness, “science by mouse-click”, stress, can’t be bothered to fix minor problems, no validation)
• PI (pressure to publish/graduate fast, career interest, competition, grant writing, insufficient supervision)
• Referees/Editors (lazy, inadequate reviewing routines, no access to raw data, “validation by senior author name”, lack of experience)
• Software (misses or causes errors)
• PDB (doesn’t check)
• External (competition/danger of being scooped)
• Nature (limitations of the technique/resolution, errors hard to detect, poor data)