University of Nebraska at Omaha
Innovative Database Models and Advanced Tools in Bioinformatics
Hesham H. Ali
UNO Bioinformatics Research Group
Department of Computer Science
College of Information Science and Technology
Key Challenges Facing Bioinformatics Research
• Significant gaps between tool developers and tool users
– Different objectives
– Different funding agencies
– Different academic cultures
• Significant problems with available biological data
– Archival based
– Lack of structure
Source: ncbi.nih.gov
Problems with Current Biological Data
• The availability of large volumes of biological data and the increasing rate at which new data is produced, in public data banks or via microarray experiments
• The increasing pressure to maximize the use of the available data, particularly to impact key related industries (biotech companies, biotech drugs)
• The large degree of heterogeneity of the available data in terms of quality
UNO Bioinformatics Research Group
• Group Triangle
– Research motivated by real biological problems
– Innovative Database Models
– Advanced tools
Biological Questions Addressed by our Group
• Molecular diagnosis - Identification
– Sequence based id
– Enzyme (cutting order) based id
– Instrumentation (Mass Spec, WAVE) based id
• Basic Molecular Biology - Gene regulation
– Microarray Analysis
– Motif discovery/searching
• Epidemiology and Clinical Research
– Patient tracking system
– Clinical expert system
Bioinformatics Solutions to These Problems
• Develop new inventive database models
– Custom database for specific domains
– Centralized structured integrated data
• Develop innovative Bioinformatics tools
– Clustering algorithms
– Advanced motif finding approaches
Database Models
• Customized (Private) Solution:
Custom based Data Base Model
High degree of quality and consistency
• Centralized (Public) Solution:
New Curated Integrated and Structured DataBase Model
Model One: Custom Databases
• Allowing researchers to create custom sets of genetic data suited to their specific needs.
• Allowing researchers to control the quality of genetic data in their custom data sets through fine-tuning parameters.
• Searching data using optimal alignment algorithms, rather than using heuristic methods.
• Giving researchers/clinicians the ability to formulate sequence identification concepts and test their ideas against a validated database
• Incorporating information from GenBank if needed
The Sequence Identification Problem
• Identification of organisms using obtained sequences is a very important problem
• Relying on wet lab methods only is not enough
• Employing identification algorithms using signature motifs to complement the experimental approaches
• Currently, no robust software tool is available for aiding researchers and clinicians in the identification process
• Such a tool would have to utilize biological knowledge and databases to identify sequences
• The size and the quality of the available data are both concerns that would need to be dealt with
Nebraska gets its very own organism
While trying to pinpoint the cause of a lung infection in local cancer patients, they discovered a previously unknown micro-organism. And they've named it "mycobacterium nebraskense," after the Cornhusker state.
It was discovered a few weeks ago using Mycoalign, a bioinformatics program developed at PKI
Source: Omaha World Herald, March 21, 2005
Model Two: Centralized Database - the Integrated Model
A new integrated model based on:
– Organized and curated database
– True non-redundancy by having one record for each polymorphic set with pointer to the rest of the set if needed
– Allowing advanced queries
– Being user-friendly and employing true automation
– Employing various algorithms with different levels of accuracy and speed for conducting homology searches.
The Clean Gene Package
• A set of integrated database and alignment tools:
– Edited and curated
– Web based
– Of manageable size
– Based on hierarchical database model
– Utilizes various alignment algorithms
– Allows advanced automated queries
– Allows fast and accurate searches
The Key Challenges
• The new structured relational database model
• Identification of equivalence classes of records (polymorphic sets)
• Identification of a good representative for each set
• Curation and classification
• Accurate annotation
• Advanced data mining tools
• A user-friendly interface that employs true automation for interfacing with the database
Tool I: Clustering Biological Data
• Clustering is a fundamental technique in finding a structure in a collection of unlabeled data.
• Basically, clustering is the process of organizing objects into groups whose members are similar in some way.
• A good Clustering tool is a key component in analyzing microarray data
Message Passing Clustering (MPC)
• Inspired by real-world situations: elements with similar attributes cluster together simultaneously
• Advantages:
– Easy to understand and use.
– By taking advantage of communication among data objects, MPC is able to balance the global and local structure and can be performed in parallel.
– “Message” has a flexible structure, which allows further development to fit different research interests.
– We have extended the basic MPC to Weighted MPC, Stochastic MPC and Semi-supervised MPC.
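The slides do not spell out the MPC procedure itself. The sketch below is one plausible reading, not the group's implementation: in each round every cluster sends a "message" to its nearest neighbor, and pairs that choose each other merge in the same round. The centroid representation, the Euclidean distance, and the names `mpc_round` and `mpc` are all assumptions made for illustration.

```python
import math

def mpc_round(clusters):
    """One round: each cluster messages its nearest neighbor; mutual pairs merge."""
    # Represent every cluster by the mean of its points (an assumption).
    centroids = [tuple(sum(c) / len(pts) for c in zip(*pts)) for pts in clusters]
    nearest = []
    for i, ci in enumerate(centroids):
        others = [(math.dist(ci, cj), j) for j, cj in enumerate(centroids) if j != i]
        nearest.append(min(others)[1])  # ties go to the lowest index
    merged, used = [], set()
    for i, j in enumerate(nearest):
        if i in used:
            continue
        if nearest[j] == i and j not in used:
            merged.append(clusters[i] + clusters[j])  # mutual choice: merge
            used.update((i, j))
        else:
            merged.append(clusters[i])  # no mutual partner this round
            used.add(i)
    return merged

def mpc(points, k):
    """Repeat rounds until at most k clusters remain (k >= 1)."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        clusters = mpc_round(clusters)
    return clusters

result = mpc([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)
```

Because several pairs can merge in the same round, the merges within a round are independent of each other, which is what would make a parallel implementation possible. A round may also overshoot and leave fewer than k clusters.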
Basic MPC
[Figure: two phylogenetic trees of the same Mycobacterium strains, panel a. NJ and panel b. MPC. The leaves are strains of M. chelonae, M. flavescens, M. fortuitum, M. peregrinum, M. terrae, M. gordonae, M. intracellulare, M. kansasii and M. xenopi.]
• The phylogenetic trees of Mycobacterium (9 species, 34 strains), constructed by the Neighbor Joining and MPC methods.
Weighted MPC (WMPC) with Adaptive Feature Scaling
• Add a weight to each cluster-feature pair. A single feature can have different weights in different clusters, and within one cluster all features may have different weights.
• Update the weights during the clustering process. If the similarity between two clusters that are about to merge is high (or low) on some dimension, increase (or decrease) the weight on that dimension in the newly merged cluster.
• WMPC was tested on the Colon Cancer data (2000 genes in 40 tumor and 22 normal samples), giving a higher classification rate.
• Two benefits:
– Strengthens the signal features while reducing the noise features, making clustering results more accurate.
– More importantly, reveals the contribution of the features (genes) to the clusters (samples), which helps identify the set of genes responsible for certain diseases.
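The weight update described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the WMPC update rule from the talk: the per-feature similarity `1 / (1 + |difference|)`, the averaging of the two parent weights, the `rate` parameter, and the function names are all hypothetical.

```python
import math

def merge_weights(centroid_a, centroid_b, w_a, w_b, rate=0.5):
    """Weights for a newly merged cluster (hypothetical update rule).

    Features on which the two merging clusters agree gain weight;
    features on which they differ lose weight.
    """
    new_w = []
    for xa, xb, wa, wb in zip(centroid_a, centroid_b, w_a, w_b):
        sim = 1.0 / (1.0 + abs(xa - xb))   # per-feature similarity in (0, 1]
        base = (wa + wb) / 2.0             # start from the parents' average
        new_w.append(base * (1.0 + rate * (2.0 * sim - 1.0)))
    return new_w

def weighted_distance(x, y, w):
    """Euclidean distance with one weight per feature."""
    return math.sqrt(sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w)))

# Feature 0 agrees between the clusters, feature 1 does not.
w = merge_weights([1.0, 0.0], [1.0, 9.0], [1.0, 1.0], [1.0, 1.0])
```

With all weights fixed at 1, `weighted_distance` is the ordinary Euclidean distance, which matches the remark that the weighted method reduces to regular MPC when the weights never change.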
Stochastic MPC (SMPC) Based on Kernel Functions
[Figure: objects a to f placed by distance from a target object at 0, annotated "Tie?", "Chance to merge?" and "Kick out?". Panel titles: "Kernel Density Estimates Using Gaussian Kernels" and "Probability Density Estimates Based on Little Gaussian Kernel Functions".]
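A Gaussian kernel density estimate is standard and can be sketched in a few lines. How SMPC turns kernel values into merge probabilities is not given on the slide, so `merge_probabilities` below (normalizing the kernel values of the neighbor distances) is an assumption made for illustration, as are all function names.

```python
import math

def gaussian(u):
    """Standard normal density, used as the kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def rectangular(u):
    """Rectangular (uniform) kernel on [-1, 1]."""
    return 0.5 if abs(u) <= 1.0 else 0.0

def kernel_density(x, samples, h, kernel=gaussian):
    """Kernel density estimate at x with bandwidth h."""
    return sum(kernel((x - s) / h) for s in samples) / (len(samples) * h)

def merge_probabilities(distances, h, kernel=gaussian):
    """Hypothetical: chance of merging with each neighbor, from its distance."""
    weights = [kernel(d / h) for d in distances]
    total = sum(weights)
    return [w / total for w in weights]

distances = [1.0, 2.0, 4.0]   # target cluster to each neighbor
soft = merge_probabilities(distances, h=1.0)
hard = merge_probabilities(distances, h=min(distances), kernel=rectangular)
```

With the Gaussian kernel every neighbor keeps some chance of being chosen, with closer neighbors favored. With the rectangular kernel and the bandwidth set to the minimum distance, only the nearest neighbor has a nonzero probability, so the choice becomes deterministic.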
Semi-supervised MPC
• Clustering methods are considered unsupervised, meaning that the reduction is derived solely from the data rather than reflecting any previous knowledge.
• Classification methods are considered supervised, because in the training phase the sample classes are already known, and we classify the objects into known groups.
• Between clustering and classification: Unlabeled data with prior knowledge, such as constraints and hypotheses.
• The goal of semi-supervised clustering is to guide the clustering, using the prior knowledge, to get better partitions.
Semi-supervised MPC Instance-level Constraints
• Colon Cancer data (2000 genes in 40 tumor and 22 normal samples).
• We cluster samples with genes as features. Since the known labels (constraints) apply to the samples (instances), these are called instance-level constraints.
[Figure: accuracy (Rand index, 0 to 1) versus number of samples with an initial label (0 to 60), with one curve for OP and one for IP.]
We want to see how well our method can separate the normal and tumor tissues given different numbers of known sample labels as prior knowledge.
OP: Output partition after clustering. IP: Input constraints presented before clustering.
Combining the power of clustering with background information achieves better performance than either in isolation.
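The accuracy measure on the plot, the Rand index, is the fraction of sample pairs on which two partitions agree (both put the pair together, or both keep it apart). A minimal sketch, with illustrative labels rather than the Colon Cancer data:

```python
from itertools import combinations

def rand_index(labels_true, labels_pred):
    """Fraction of sample pairs treated the same way by both partitions."""
    agree = total = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        agree += same_true == same_pred
        total += 1
    return agree / total

truth = ["tumor", "tumor", "normal", "normal"]
perfect = rand_index(truth, [0, 0, 1, 1])   # cluster ids need not match label names
partial = rand_index(truth, [0, 0, 0, 1])   # one normal sample misplaced
```

The index only compares groupings, so it does not matter which cluster id the method assigns to tumor or normal.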
Cluster | t-statistic | Cluster size | Gene name (ORF)
1 | -5.73831 | 22 | T57079, T60860, R38758, R55828, H55759, X58401, H63361, R56207, U25435, R09138, H82741, H16096, H64576, L34059, U12140, H78346, H62177, M55422, H89481, U12134, H11460, R28608
2 | -5.11799 | 19 | R00254, H41147, R42765, T51139, D21205, H70635, H69819, H71122, H81802, R46731, M29550, H14607, R88749, J00146, H18451, M96859, X69115, X51435, H06189
3 | 13.9581 | 17 | L12723, M81637, D59253, X65873, M77698, T84049, M88108, R56401, H09719, X74330, D26018, D26067, R56630, X13482, H16991, T86749, H92195
4 | 9.81885 | 14 | R39531, R21901, J02645, R34876, H72234, T92259, H47107, T65740, Z15115, D43951, L19437, X82103, U29175, T65438
5 | -8.42722 | 42 | M16029*, M16029*, D38549, U03100, L24038, J02906, X93499, M73481, T71207, R73129, X56253, M81758, R27017, U18934, X17651, H47650, H89688, H23135, R39130, H82631, H45526*, H45526*, D90188, L06895, L34774, T58756, L11370, U34252, H78063, D21239, M96824, M24470, T77446, R53612, R38513*, R38513*, R44677, M98045, T88712, R54317, M77477, H11054
6 | 14.9275 | 33 | R42127, T90350, L25941, T65790, R22779, M90516, R11485, X68194, X63629, J05032, M34175, X74795, U34074, M58050, H20512, U18299, H80114, R67987, R56399, U10324, R56443, X17025, R40717*, R40717*, D13630, L24203, T96873, X53586, X73478, X14618, M55543, H42884, X54101
7 | 10.3472 | 12 | D13315, T70062, U30825, T89115, D26600, T60437, D50063, R20804, R09479, R42837, D14658, H87473
8 | -9.94085 | 14 | T61333, M28882, M69066, M14539, T67406, M37721, T79831, M69135, M36634, U25138, R48303, U31525, T51539, H21042
9 | 8.59304 | 11 | U22055, L41559, M88279, R37114, R96357, U29607, H82719, H04802, M22632, R27813, R60195
10 | 5.97321 | 29 | H48027, M14200, T52642, R85464, T52343, T72503, T49732, T95048, T47584, R21547, L28010, M21339, T87527, T61338, T79813, L36844, M34192, M69238, L26405, H06970, M84326, T70063, X66363, H85878, X62153, T67905, T57872, Z48950, R62425
11 | -7.47053 | 16 | T93284, H80975, R73052, X05610, X79683, X55187, U30827, T64974, D31887, M92843, R67358, R46753, X68277, H50623, H15813, X51345
12 | 12.6072 | 12 | T63591, T63370, T53412, T53396, T63133, R44884, R84411, X12671, R08183, M22382, D63874, D21261
13 | 5.07119 | 37 | H46728, T57686, T68848, M11354, T69026, M14630, T72879, T47144, T61627, X61971, M94345, R42570, H13194, T65580, H15542, J03077, D30655
14 | -5.35587 | 16 | T61661, T62220, T96832, H80240, H86060, T63508, H24754, T63499, M11799, L28809, T63484, M57710, M33680, U12255, H88360, R78934
Semi-supervised MPC Attribute-level Constraints
Gene clusters illustrating differentially expressed genes in tumor and normal samples
a. Cluster 6
b. Cluster 8
Generalizations
• WMPC extends the unweighted MPC to the weighted MPC.
– If we initialize all entries in w to 1 and never change the weights, MPC-AFS (MPC with Adaptive Feature Scaling, i.e. WMPC) becomes a regular MPC.
• SMPC extends the deterministic MPC to the stochastic MPC.
– If we choose a particular kernel function (rectangular) and a particular bandwidth parameter (the minimum distance between the target cluster and all the others) to estimate the probability, SMPC reduces to a regular MPC.
• Semi-supervised MPC extends unsupervised MPC to partially supervised MPC.
– Unsupervised MPC can be considered a special case of semi-supervised MPC with no background information or constraints.
Tool II: Motif Finding/Data Mining Tool
• Given a set of known binding sites, develop a representation of these binding sites that can be used to search for additional instances of those binding sites in the genome.
• Given a set of sequences known to be co-regulated (e.g. as shown by an expression array), determine the binding locations in the sequences and determine a representation for binding specificity.
Motif Representations
• Static Sequences: tataat
• Regular Expressions (RegEx): tat[at].t
• Sequences with N errors: tataat:2
• RegEx with N errors: tat[at].t:2
• Mononucleotide Scoring Matrices:

Position  | 1   | 2   | 3    | 4   | 5   | 6
Consensus | a   | t   | [at] | .   | t   | t
a         | 75% | 25% | 50%  | 25% | 25% | 0%
t         | 25% | 75% | 50%  | 25% | 75% | 100%
g         | 0%  | 0%  | 0%   | 25% | 0%  | 0%
c         | 0%  | 0%  | 0%   | 25% | 0%  | 0%

• Dinucleotide Scoring Matrices (HMMs)
• Multinucleotide Scoring Matrices
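The "RegEx with N errors" representation (for example tat[at].t:2) can be sketched as a window scan that counts mismatches. This sketch assumes that an "error" means a substitution only (no insertions or deletions) and that the pattern uses only literals, `[...]` classes and `.`; the function names are hypothetical and this is not the group's tool.

```python
def parse_pattern(pattern):
    """Turn a pattern such as 'tat[at].t' into one set of allowed bases per position."""
    positions, i = [], 0
    while i < len(pattern):
        if pattern[i] == "[":
            end = pattern.index("]", i)
            positions.append(set(pattern[i + 1:end]))
            i = end + 1
        elif pattern[i] == ".":
            positions.append(set("acgt"))   # wildcard: any base
            i += 1
        else:
            positions.append({pattern[i]})
            i += 1
    return positions

def find_with_errors(pattern, max_errors, sequence):
    """Return (start, window, errors) for every window within max_errors mismatches."""
    positions = parse_pattern(pattern)
    hits = []
    for start in range(len(sequence) - len(positions) + 1):
        window = sequence[start:start + len(positions)]
        errors = sum(ch not in allowed for ch, allowed in zip(window, positions))
        if errors <= max_errors:
            hits.append((start, window, errors))
    return hits

exact = find_with_errors("tat[at].t", 0, "ggtataatgg")
loose = find_with_errors("tat[at].t", 2, "ggtataatgg")
```

Raising the error allowance from 0 to 2 admits the partial match `taatgg` in addition to the exact match `tataat`, which illustrates the trade-off between sensitivity and specificity that these representations make.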
Searching for Known Motifs
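1. Obtain a multiple sequence alignment of known motifs (e.g. from gel shift assay)

…atagtt…
…aattat…
…attatt…
…ttactt…

2. Construct representation

Consensus | a   | t   | [at] | .   | t   | t
a         | 75% | 25% | 50%  | 25% | 25% | 0%
t         | 25% | 75% | 50%  | 25% | 75% | 100%
g         | 0%  | 0%  | 0%   | 25% | 0%  | 0%
c         | 0%  | 0%  | 0%   | 25% | 0%  | 0%

3. Score all possible windows in the data set based on:

$$I_{seq}(i) = \sum_{b} f_{b,i} \log_2 \frac{f_{b,i}}{p_b}$$

where $f_{b,i}$ is the frequency of base $b$ at position $i$ and $p_b$ is the background frequency of base $b$.

4. Output results from the data set that score above a specified threshold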
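The four steps above can be sketched with the four aligned sites from step 1. The per-column information content follows the formula in step 3. The window score used for scanning, a sum of log-odds with a pseudocount, is a common choice but an assumption here, as are the uniform background and all function names.

```python
import math

SITES = ["atagtt", "aattat", "attatt", "ttactt"]    # aligned sites from step 1
BASES = "acgt"
BACKGROUND = {b: 0.25 for b in BASES}               # assumed uniform background

def frequency_matrix(sites, pseudocount=0.0):
    """One dict of base frequencies per alignment column."""
    total = len(sites) + 4 * pseudocount
    matrix = []
    for i in range(len(sites[0])):
        column = [s[i] for s in sites]
        matrix.append({b: (column.count(b) + pseudocount) / total for b in BASES})
    return matrix

def information_content(freqs, background):
    """Sum over bases of f * log2(f / p) for one column; zero frequencies add nothing."""
    return sum(f * math.log2(f / background[b]) for b, f in freqs.items() if f > 0)

def score_window(window, matrix, background):
    """Log-odds score of a window (matrix must have no zero entries)."""
    return sum(math.log2(matrix[i][b] / background[b]) for i, b in enumerate(window))

def scan(sequence, matrix, background, threshold):
    """Return (start, window, score) for windows scoring at or above threshold."""
    n = len(matrix)
    hits = []
    for start in range(len(sequence) - n + 1):
        window = sequence[start:start + n]
        s = score_window(window, matrix, background)
        if s >= threshold:
            hits.append((start, window, s))
    return hits

raw = frequency_matrix(SITES)                       # frequencies as in step 2
smoothed = frequency_matrix(SITES, pseudocount=0.5)
hits = scan("ccccattattcccc", smoothed, BACKGROUND, threshold=4.0)
```

The pseudocount keeps every frequency above zero so that the logarithm in `score_window` is always defined. Under the uniform background, a fully conserved column (position 6) carries 2 bits and a column with all four bases equally likely (position 4) carries 0 bits.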
Finding Unknown Motifs
1. Input a set of co-expressed sequences that are related by microarray experiment
2. Input: motif length n
3. Score all possible windows by first constructing a multiple sequence alignment of the window to all other possible matches in the other sequences
4. Rank the set of all possible scoring matrices of length n based on information content relative to background:

$$I_{seq}(i) = \sum_{b} f_{b,i} \log_2 \frac{f_{b,i}}{p_b}$$

5. Output an ordered list of motifs and corresponding scoring matrices.
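A brute-force sketch of these steps follows. It simplifies the procedure in two ways that are assumptions, not the group's method: candidate windows are taken from the first sequence only, and each window is aligned to its single closest match (fewest mismatches) in every other sequence. The uniform background and the function names are also illustrative.

```python
import math

def best_match(window, sequence):
    """Closest same-length window in sequence by number of mismatches."""
    n = len(window)
    candidates = [sequence[i:i + n] for i in range(len(sequence) - n + 1)]
    return min(candidates, key=lambda c: sum(a != b for a, b in zip(window, c)))

def info_content(sites, background=0.25):
    """Total information content of the alignment, summed over its columns."""
    total = 0.0
    for column in zip(*sites):
        for base in set(column):
            f = column.count(base) / len(column)
            total += f * math.log2(f / background)
    return total

def find_motifs(sequences, n, top=3):
    """Rank candidate alignments of length n by information content."""
    seed, others = sequences[0], sequences[1:]
    ranked = []
    for i in range(len(seed) - n + 1):
        window = seed[i:i + n]
        sites = [window] + [best_match(window, s) for s in others]
        ranked.append((info_content(sites), sites))
    ranked.sort(key=lambda r: r[0], reverse=True)
    return ranked[:top]

# Toy input: three sequences that share the site "tataat".
motifs = find_motifs(["ccgtataatgc", "gtataatccgg", "acctataatag"], n=6)
```

Each returned entry pairs a score with the aligned sites, from which a scoring matrix can be built column by column. The cost grows with the number of windows times the total sequence length, so this exhaustive form only suits short inputs.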
AGAST: Advanced Grammar Alignment Search Tool
• Capitalize on the advantages of alignment.
• Provide a formal and robust method for computing bio-relationships
• Provide optimum results based on the input.
• Calculate relationships at the same time as alignment.
• Allow for user knowledge and subsequence relationships.
• Record attributes and sequence attributes can be considered simultaneously.
• Dynamically construct requisite algorithms in a user friendly way, thus limiting development time and technical knowledge requirements.
AGAST: Advanced Grammar Alignment Search Tool
Advantages:
• It can evaluate regular expressions important to biology as well as traditional RegEx tools.
• It can evaluate traditional alignments.
• It can do any combination of RegEx and traditional alignments.
BioRegEx
BioRegEx II
Example of an Advanced Query
Find a sequence that contains:
tatatagcagcccatgagccggcccgcadtgctagttcag
Transcription Start Site
5-10 bases
Start Codon
Any Number of Bases
Functional Unit
Example Query:
tatata.*{5,10}atg.*[gc]ca[at]gct[atgc]g:2.*
tatatagcaggggcccatgagccggcccccadagctcgttcag
tatatagcagcccatgagccggcccgcadtgagttcag
Score: 0
Score: 2
Conventional Problem
• Motif Searching programs do not calculate based on combinatorial regulation modules (instead they calculate based on probability of a single motif).
• We have developed and tested a program that considers an ordered set of motifs (or sequence attributes) and searches based on a context of adjacent elements (or grammar).
Next Steps
• Extract multiple motifs in the context of regulatory control networks.
• Use phylogenetic footprinting and gene regulatory network information to compare and contrast gene regulation networks and extrapolate combinatorial control mechanisms and corresponding motifs upstream of genes.
Other Current Research Projects
• Advanced tools for identifying splice sites
– Using ab initio Bayesian networks based approaches
– Using homology graph theory based approaches
• Fast recognition of microorganisms using enzyme cutting sequence, mass spectrometry or sequence based approaches
• Gene prediction using comparative genomics
• Reconstructing gene regulatory networks
• Clustering techniques for simplifying protein sequences
Acknowledgment
Kiran Bastola, Alexander Churbanov, Xutao Deng, Huimin Geng, Steven Hinrichs, Xiaolu Huang, Daniel Kuyper, Mark Pauley, Daniel Quest