Comparison of computational approaches for identification and quantification of urinary metabolites in 1H NMR spectra†

Marc Cassiède,*a Pascal Mercier,b Paul R. Shipley,c Meghan Dueck,a Samineh Kamravaei,a Sindhu Nair,a James Mino,a Lei Pei,a David Broadhurst,d Paige Lacy a and Bernadette Quémerais a

a Department of Medicine, University of Alberta, Edmonton, AB, Canada. E-mail: [email protected]
b National High Field NMR Centre (NANUC), University of Alberta, Edmonton, AB, Canada
c Department of Chemistry, University of British Columbia, Kelowna, BC, Canada
d School of Science, Edith Cowan University, Perth, Western Australia, Australia
† Electronic supplementary information (ESI) available. See DOI: 10.1039/c8ay00830b

Cite this: Anal. Methods, 2018, 10, 2129–2137. Received 12th April 2018. Accepted 14th April 2018. Published on 18 April 2018. DOI: 10.1039/c8ay00830b. rsc.li/methods. This journal is © The Royal Society of Chemistry 2018.



Nuclear magnetic resonance (NMR) spectroscopy is extensively used in analytical chemistry as a powerful, non-invasive, and non-destructive tool to elucidate detailed structures of small molecules in complex mixtures. A major initiative in NMR is the identification of metabolic changes in biological fluids, particularly urine, as potential biomarkers for specific diseases or occupational exposure. However, major challenges are encountered during data processing of complex NMR spectra, presenting obstacles in the use of NMR analysis in clinical applications. In this report, metabolite concentrations were determined using three different computational approaches with complex NMR spectra obtained using 33 replicates of quality control (QC) human urine samples. We have used a new computational method involving Monte Carlo (MC) simulation to automatically deconvolve and quantify metabolites in NMR spectra from human urine. MC simulation is independent of experimental bias or human error, and is recommended as the least biased approach to peak fitting for NMR spectra derived from human urine samples. We found that similar results could be obtained using MC simulation in urine samples compared with two previous approaches that are subject to experimental bias and/or human error.

Introduction

Proton nuclear magnetic resonance (1H NMR) spectroscopy is used for metabolomics analysis of biological fluids including urine samples,1–3 particularly for early disease detection.4–7 Urine samples consist of hundreds of metabolites,8 which generate NMR spectra containing a plethora of peaks.9 Consequently, substantial peak overlap and spectral interference in specific regions of NMR spectra make it more challenging to accurately analyse metabolites in urine than in cerebrospinal fluid (CSF) and serum or plasma samples.10–12

Several NMR spectral profiling techniques have been developed to determine precise identities and exact concentrations of metabolic compounds in biological fluids. The most commonly used method is manual targeted profiling, in which each individual metabolite is identified and quantified by manually fitting the experimental spectrum with a theoretical spectrum provided by a reference compound library.13 However, this procedure is slow, expensive, and prone to experimental bias or operator error.9 This is one reason that metabolomic profiling using NMR has not yet been fully validated for many clinical purposes. To be adopted as a widespread tool for clinical applications, metabolomics analysis of human biofluids using 1H NMR must be automated. Some automated methods that have been developed consist of matching experimental NMR spectra against a reference compound library, such as Chenomx or the Human Metabolome Database, that present metabolite data obtained at different magnetic field strengths and different pH values.14 Several algorithms have been proposed in the literature to address this complexity.15–17 However, none has been found to be universally applicable for the determination of urinary metabolites.

Improvements in NMR processing techniques over the past three decades have allowed us to obtain highly detailed spectra that generate more accurate information about the molecular composition of solutions. In general, a probabilistic approach is used to quickly approximate the most appropriate metabolic profile. Mercier et al. developed an automated algorithm (AutoFit) that was tested by comparing reconstructed spectra with those obtained by computer simulation of human biofluids (CSF, serum, and urine).13 In each case, the measured concentrations were compared with those obtained by manual peak fitting. The results showed a high correlation between concentrations obtained from automated profiles and those obtained using simulated or manual techniques for CSF and serum fluids. However, this correlation was lower for urine samples, due to the higher complexity of their NMR spectra and the greater number of urinary metabolites compared to CSF and serum samples.

Other probabilistic approaches have been developed to accurately identify metabolites in 1H NMR samples of serum, plasma, and CSF, such as the NMR spectral profiling technique, BAYESIL, proposed by Ravanbakhsh et al.9 or BATMAN developed by Hao et al.;18 however, these have also shown limitations in their application to more complex urine samples and generally require long computing time.19

It should be noted that many new approaches have been developed during the last decade to overcome the limited resolution of proton NMR spectra, such as using higher magnetic fields, “pure shift” NMR spectroscopy and multidimensional NMR.20 However, while pure shift spectra have been used to collapse peak multiplets to singlets by decoupling signals in highly overlapped proton NMR spectral regions, recent studies have reported that their sensitivity is in general significantly below that of regular 1H NMR spectra.20,21 Only recent high-resolution NMR methods based on the combination of pure shift NMR and spectral aliasing techniques have enabled the distinction of different compounds with very similar 1H NMR spectra.22,23 However, these promising new approaches have not yet been widely used in the context of metabolic samples. Moreover, two-dimensional NMR has been used to reduce signal overlap by spreading resonance frequencies in two dimensions.24 However, applications of 2D NMR have been largely restricted to qualitative analysis. One reason for this is that the time for collecting 2D NMR spectra is typically much longer than the time required for collecting 1D NMR spectra. A second reason is that cross-peak volume measurements from 2D spectra are influenced by many experimental factors (relaxation times, mixing time, evolution time, uneven excitation profiles), and therefore are not overly useful for quantification.25 Fast quantitative 2D NMR approaches have recently been proposed to shorten 2D NMR experiments and improve their analytical performance in terms of resolution, sensitivity and accessible spectral width.26–28 Despite their sensitivity limitation compared with their conventional counterparts, these innovative techniques show promising potential for quantitative analysis of complex biological systems.29,30 Nevertheless, as the overwhelming majority of studies have used 1D 1H NMR in human urine analysis, we concentrated our efforts on analyzing the limitations of the manual peak fitting approaches for quantitation of human urine metabolites in 1D 1H NMR spectra only.

Based on the observed difficulties in quantification of urinary metabolites, we first determined that appropriate selection criteria must be utilized to filter urinary metabolites using quality control (QC) samples to avoid measurement of erroneously identified metabolites. Next, we compared three different computational methods for accurate determination of metabolite concentrations in 33 replicates of a pooled QC urine sample. Urinary metabolite concentrations were obtained from the already established Chenomx manual peak fitting technique and compared with a progressive spectral region reconstruction (PSRR) method and a Monte Carlo (MC) simulation technique, all of which were based on the deconvolution of complex experimental NMR spectra using a target reference signal.

Experimental

Urine samples were obtained from 5 male non-smoking healthy subjects following 12 h of fasting using a standard urine collection protocol. Samples were individually collected in pre-cleaned polystyrene urine collection bottles and immediately refrigerated after collection. All urine samples were subsequently pooled, centrifuged in a refrigerated centrifuge, aliquoted and then frozen at −80 °C until NMR analysis. Sample treatment was performed within 3–4 hours of sample collection, which limits bacterial growth. Ethics approval was obtained from the institutional ethics committee for human research at the University of Alberta to carry out this study and informed consent was obtained from all subjects.

Sample processing for NMR

On the day of NMR analysis, the sample was thawed at room temperature, vortexed for a minimum of 1 min to ensure uniformity, and then placed on ice. Thirty-three replicate quality control (QC) samples from the same pooled urine were prepared by adding 200 µl of internal standard (IS) solution (IS-1, Chenomx IS: DSS with added imidazole) to 1.8 ml urine, followed by adjustment to pH 7.0 ± 0.1 using minimal appropriate volumes of 1 M NaOH. A portion of each sample (750 µl) was transferred into a 5 mm NMR tube (Wilmad-LabGlass, Vineland, NJ). These samples are part of a larger study involving about 220 samples; QC samples were inserted at intervals of 9 samples in the queue and were run on different days.

NMR spectra were acquired at the NANUC facility on an Oxford 14.09 Tesla (600 MHz) VNMRS spectrometer equipped with a 5 mm inverse-proton (HX) probe with Z-axis gradient coil and Varian 768 AS robotic system with sample cooling capability. One-dimensional NMR spectra were collected at 25 °C by using the first increment of a two-dimensional 1H,1H-NOESY experiment with a transmitter presaturation delay of 990 ms for water suppression, a 100 ms mixing time and a spectral width of 7200 Hz. All spectra were collected after 8 steady-state scans, using a 90° proton pulse width of ≈10.7 µs. The total acquisition time was 4 s per transient, with 256 transients acquired per sample.

The data were apodized with an exponential window function corresponding to a line broadening of 0.1 Hz, zero-filled to 64k complex points, Fourier-transformed, phased, and baseline-corrected for further analysis. Reference deconvolution using the methyl peak of DSS as the reference peak was used to correct for line shapes.
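The first three processing steps (exponential apodization, zero-filling, Fourier transformation) can be sketched with NumPy as follows. This is an illustrative sketch only, not the processing software used in the study: the function name `process_fid`, the synthetic single-resonance signal, and the zero-fill length of 65 536 points are assumptions, and phasing, baseline correction and reference deconvolution are not shown.

```python
import numpy as np

def process_fid(fid, dwell_s, lb_hz=0.1, n_zero_fill=65536):
    """Apodize, zero-fill and Fourier-transform a complex FID (hypothetical helper)."""
    t = np.arange(fid.size) * dwell_s
    # Exponential window: multiplying by exp(-pi * lb * t) adds lb Hz
    # of Lorentzian line broadening to every peak.
    apodized = fid * np.exp(-np.pi * lb_hz * t)
    # np.fft.fft pads the input with zeros when n exceeds its length.
    spectrum = np.fft.fftshift(np.fft.fft(apodized, n=n_zero_fill))
    freqs_hz = np.fft.fftshift(np.fft.fftfreq(n_zero_fill, d=dwell_s))
    return freqs_hz, spectrum

# Synthetic FID with one resonance at +500 Hz (made-up test signal),
# sampled with a 7200 Hz spectral width for 4 s.
sw_hz = 7200.0
dwell = 1.0 / sw_hz
n_points = int(4.0 * sw_hz)
t = np.arange(n_points) * dwell
fid = np.exp(2j * np.pi * 500.0 * t) * np.exp(-t / 0.5)

freqs_hz, spectrum = process_fid(fid, dwell)
peak_hz = freqs_hz[np.argmax(spectrum.real)]
print(f"peak found at {peak_hz:.2f} Hz")
```

The line-broadening value trades resolution for signal-to-noise; 0.1 Hz, as quoted above, is a mild setting that leaves narrow urinary resonances largely intact.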

Manual fitting with Chenomx NMR Suite software

For metabolomics analysis of biological fluids by NMR spectroscopy, we used the Chenomx NMR Suite software application developed by Chenomx, Inc. (Edmonton, AB, Canada). This software allows targeted profiling of NMR spectra and enables the rapid identification and quantification of hundreds of metabolites that are present in common biofluids such as urine, blood, and cell extracts.31 This approach is different from previously published methods that depend on bucketing or binning of spectral regions.32,33

For the first computational approach for peak fitting, Chenomx employees manually matched the pH-sensitive library of compound signatures with peak positions and shapes in each NMR spectrum, which is the “industry standard” for peak fitting. A peak cluster is defined as a peak multiplet which is obtained by modeling and fitting the experimental resonance pattern of every individual proton of a given metabolite to the expected network of J-couplings, based on its molecular structure.13 The center of each peak cluster was placed within an interval of allowed chemical shift positions (the so-called transform window) dictated by the pH of the sample. For example, Fig. 1 shows the peak fitting for the peak at 3.6 ppm of glycine with the corresponding transform window around neutral pH (7.0).

Computational generalities

The spectral deconvolution algorithms explored in this paper attempt to reconstruct the experimental NMR spectrum I in the frequency domain using a linear combination of each of the reference compound spectra following a Lorentzian model for individual peaks:

$$\bar{I}(x) = \sum_{i=1}^{n_{\text{metabolites}}} c_i \sum_{j=1}^{n_{\text{clusters}}} \sum_{k=1}^{n_{\text{peaks}}} \frac{h_{i,j,k}\, w_{i,j,k}^{2}}{4\left(x - \left(\delta_{i,j,k} + \Delta\delta_{i,j}\right)\right)^{2} + w_{i,j,k}^{2}} \qquad (1)$$

where $\bar{I}(x)$ is the predicted spectral intensity at frequency $x$, $c_i$ the concentration of compound $i$ (linear variable), $\delta_{i,j,k}$ the resonance frequency of peak $k$ of cluster $j$ of compound $i$, $\Delta\delta_{i,j}$ the center offset (called “transform”) of peak cluster $j$ of compound $i$ (non-linear variable), and $h_{i,j,k}$ and $w_{i,j,k}$ the intensity and line width of peak $k$ in cluster $j$ of compound $i$, respectively. The parameters $h_{i,j,k}$ and $w_{i,j,k}$ are constants once the compound library has been calibrated to the provided spectrum. The automated fitting procedure consists of finding the $c$ and $\Delta\delta$ values that minimize the target function:

$$\chi^{2} = \lVert I - \bar{I} \rVert^{2} \qquad (2)$$

Fig. 1 Peak fitting for glycine in urine at 3.6 ppm performed with Chenomx NMR Suite software. Black line represents the experimental spectrum curve, blue line is the compound line, and red line represents the sum of all fitted compounds.

In matrix notation, this is equivalent to multiplying a design matrix of dimensions (n_spectral points, n_compounds) holding an unscaled representation of the individual compound signatures by a concentration column vector of length n_compounds (Fig. 2). The mathematical challenge resides in finding both the best combination of the peak transforms that defines the design matrix, and the compound concentrations by which it is multiplied to recreate the experimental spectrum with the highest fidelity (eqn (2)).
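The matrix formulation can be illustrated with a small NumPy sketch. It builds one design-matrix column per compound from Lorentzian peaks in the form of eqn (1) and, with the cluster transforms held fixed, recovers the concentrations by ordinary linear least squares. The two-compound library, its peak parameters, and the function names are made up for illustration; this is not the authors' implementation, and the non-linear search over transforms is deliberately left out.

```python
import numpy as np

def lorentzian(x, center, height, width):
    """Lorentzian peak of height `height` and full width at half maximum `width`."""
    return height * width**2 / (4.0 * (x - center) ** 2 + width**2)

def design_matrix(x, library, transforms):
    """One column per compound: the unscaled sum of all its cluster peaks.

    library[i] is a list of clusters; each cluster is a list of
    (delta, h, w) tuples. transforms[i][j] is the offset of cluster j.
    """
    columns = []
    for clusters, offsets in zip(library, transforms):
        col = np.zeros_like(x)
        for peaks, shift in zip(clusters, offsets):
            for delta, h, w in peaks:
                col += lorentzian(x, delta + shift, h, w)
        columns.append(col)
    return np.column_stack(columns)

# Hypothetical library (positions in ppm; all numbers are illustrative).
library = [
    [[(3.03, 1.0, 0.01)], [(4.05, 0.67, 0.01)]],  # compound 0: two singlet clusters
    [[(3.55, 1.0, 0.01)]],                         # compound 1: one singlet cluster
]
transforms = [[0.0, 0.0], [0.0]]
x = np.linspace(2.5, 4.5, 4001)

A = design_matrix(x, library, transforms)
true_c = np.array([8.0, 1.5])
observed = A @ true_c  # noise-free synthetic spectrum

# With the transforms fixed, the concentrations solve a linear problem.
c_hat, *_ = np.linalg.lstsq(A, observed, rcond=None)
chi2 = float(np.sum((observed - A @ c_hat) ** 2))
print(c_hat, chi2)
```

Separating the variables this way shows why the transforms are the hard part: every candidate set of offsets changes the design matrix itself, whereas the concentrations for a given matrix follow from a single linear solve.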

In our study, a combined total number of 920 variables had to be optimized (770 peak clusters, 150 concentrations) in each spectrum. Despite the use of lower and upper bounds imposed on each variable, the dimensional space to explore is enormous.

Spectral deconvolution was achieved using two distinct automated approaches. The pseudo-code with the different steps for data processing involved in these two automated spectral fitting algorithms, PSRR and MC, which were compared with the Chenomx manual fitting approach, is shown in Table 1. With both techniques, the process started by calibration of the same reference spectral library supplied by Chenomx, based on the position, intensity and linewidth of the reference peak of the internal standard DSS. Information about the pH of the processed sample and the concentration of the reference compound was then used to calibrate the lower and upper bounds on the cluster transform windows. The final step was the reconstruction of the query spectrum by the two optimization methods and the determination of the compound concentrations and the peak cluster positions. The time required by both algorithms varies with the number of spectral points and the number of compounds to be fitted (the total number of variables). In the current study, an average of ~45 min per spectrum was required on a 2015 iMac computer with a quad-core Intel i7 CPU.

Comparison between PSRR and MC methods

For the PSRR method, the query spectrum was reconstructed in an iterative manner, in which each spectrum is decomposed into smaller sub-regions. The solutions found at each step were used as starting values for subsequent iterations. Ultimately, in the last minimization stage, the entire spectrum was fitted with the complete set of variables and a single solution was obtained from median values of 10 repeated fits. The Monte Carlo algorithm, on the other hand, was used to digest the entire spectrum at once (in its current version), and several solutions were obtained at the last stage. With MC, concentrations were calculated from the mean of the 1000 best final solutions.

Fig. 2 Schematic representation of the spectral deconvolution problem in matrix form.

Table 1 Schematic comparison between the two automated profiling algorithms (PSRR and MC)
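The idea of drawing many candidate solutions and averaging the best ones can be sketched as a plain random search. The code below is a generic toy example under stated assumptions, not the MC algorithm of this study: each compound has a single Lorentzian peak, offsets are sampled uniformly inside their transform windows, concentrations are solved linearly for every draw, and the `n_best` lowest chi-square draws are averaged (50 here rather than 1000, to keep the example fast). All names and numbers are hypothetical.

```python
import numpy as np

def lorentzian(x, center, height, width):
    """Lorentzian peak of height `height` and full width at half maximum `width`."""
    return height * width**2 / (4.0 * (x - center) ** 2 + width**2)

def monte_carlo_fit(x, observed, peaks, bounds, n_trials=2000, n_best=50, seed=0):
    """Random search over per-compound offsets; concentrations solved linearly.

    peaks[i] = (delta, h, w) for a one-peak compound; bounds[i] = (low, high)
    limits on its offset. Returns the mean concentrations of the n_best
    lowest chi-square trials.
    """
    rng = np.random.default_rng(seed)
    lows = np.array([b[0] for b in bounds])
    highs = np.array([b[1] for b in bounds])
    results = []
    for _ in range(n_trials):
        shifts = rng.uniform(lows, highs)  # one random offset per compound
        A = np.column_stack([
            lorentzian(x, d + s, h, w) for (d, h, w), s in zip(peaks, shifts)
        ])
        c, *_ = np.linalg.lstsq(A, observed, rcond=None)
        c = np.clip(c, 0.0, None)  # concentrations cannot be negative
        chi2 = float(np.sum((observed - A @ c) ** 2))
        results.append((chi2, c))
    results.sort(key=lambda r: r[0])
    best = np.array([c for _, c in results[:n_best]])
    return best.mean(axis=0)

# Two partially overlapping made-up peaks with unknown small offsets.
x = np.linspace(2.9, 3.15, 2001)
peaks = [(3.00, 1.0, 0.01), (3.03, 1.0, 0.01)]
true_shifts = [0.002, -0.001]
true_c = np.array([5.0, 2.0])
observed = sum(
    c * lorentzian(x, d + s, h, w)
    for c, (d, h, w), s in zip(true_c, peaks, true_shifts)
)
bounds = [(-0.005, 0.005), (-0.005, 0.005)]

est = monte_carlo_fit(x, observed, peaks, bounds)
print(est)
```

Averaging over a set of near-optimal draws, rather than reporting the single best one, reflects the observation that several combinations of offsets and concentrations can give almost the same chi-square value.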


Table 2 Numbers of metabolites passing the test (RSD ≤ 20%) between the three peak fitting methods (X = included set)

| Chenomx | PSRR | Monte Carlo | Passed metabolites in selected sets (out of 149) | Percentage of passed metabolites in selected sets |
|---|---|---|---|---|
| X | | | 79 | 53 |
| | X | | 61 | 40 |
| | | X | 60 | 40 |
| X | X | | 38 | 25 |
| X | | X | 42 | 28 |
| | X | X | 43 | 28 |
| X | X | X | 28 | 18 |


Results and discussion

Correlation between the three methods for creatinine quantification

Creatinine is a breakdown product of creatine phosphate in muscle, and is usually produced at a fairly constant rate by the body depending on muscle mass.34 Therefore, the urinary creatinine concentration is commonly used to normalize metabolite concentrations in urine to remove inter-person variation due to sample dilution, which is important for population studies.35 In this study, we compared replicates from a pooled QC sample, which reduces inter-sample concentration issues and negates the need for creatinine normalization. However, because creatinine is one of the most abundant detectable metabolites in urine, the fitting procedure of its high amplitude peaks is relatively straightforward. To address reproducibility of creatinine measurements, we determined creatinine levels in replicate urine samples using the three peak fitting approaches. The NMR signal of creatinine is composed of two singlet peaks located around 3.0 and 4.0 ppm at neutral pH. In the analysis of replicates of our QC samples, creatinine concentrations obtained by PSRR and MC closely matched those obtained by the originally established Chenomx manual peak fitting technique, as shown in Fig. 3. The MC values deviated slightly more from the manual Chenomx values than the PSRR values did, with mean relative errors of 1.2% and 0.5%, respectively. Moreover, only 6 of all measured values (99) were outside the 5% relative error interval.
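A comparison of this kind can be computed in a few lines. The sketch below takes the manual values as the reference and reports the mean relative error and the number of replicates outside a 5% interval. The use of the absolute relative error is an assumption (the text does not say whether signed or absolute errors were averaged), and the concentrations are made-up numbers, not data from the study.

```python
import numpy as np

def relative_error_summary(reference, candidate, tolerance=0.05):
    """Mean absolute relative error and count of replicates outside the tolerance."""
    reference = np.asarray(reference, dtype=float)
    candidate = np.asarray(candidate, dtype=float)
    rel_err = np.abs(candidate - reference) / reference
    return float(rel_err.mean()), int(np.sum(rel_err > tolerance))

# Hypothetical creatinine values for four replicates (arbitrary units).
chenomx = np.array([10.0, 10.2, 9.9, 10.1])
mc = np.array([10.1, 10.2, 10.5, 10.0])

mean_err, n_outside = relative_error_summary(chenomx, mc)
print(f"mean relative error: {100 * mean_err:.1f}%, outside 5%: {n_outside}")
```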

Calculation of relative standard deviations of metabolite concentrations in QC samples

Using the three tools for peak fitting in the 33 replicate urine samples had a major impact on metabolite detection and measurement (Table 2). While the original Chenomx peak fitting method demonstrated that 79 metabolites (53% of all analyzed metabolites) had ≤20% relative standard deviation (RSD), the number of metabolites that passed the same test for both Chenomx and MC methods dropped to 42 (28% of analyzed metabolites) and only 28 metabolites (18% of analyzed metabolites) passed the test for all three methods. To identify the source of discrepancies in metabolite concentration measurements, multivariate statistical analysis was carried out.

Fig. 3 Comparison of creatinine concentrations determined by the three peak fitting methods (Chenomx, PSRR and MC) versus replicate number.
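The pass/fail test can be expressed compactly. The sketch below computes %RSD per metabolite across replicates and lists the metabolites that stay at or below 20% in every method supplied. The sample standard deviation (`ddof=1`) and the NaN handling are assumptions, the function names are hypothetical, and the small tables are made-up numbers rather than study data.

```python
import numpy as np

def percent_rsd(concentrations):
    """%RSD per metabolite; rows are replicates, columns are metabolites.

    NaN entries (values that were not quantified) are ignored.
    """
    data = np.asarray(concentrations, dtype=float)
    mean = np.nanmean(data, axis=0)
    sd = np.nanstd(data, axis=0, ddof=1)  # sample standard deviation (assumed)
    return 100.0 * sd / mean

def passing_metabolites(rsd_by_method, threshold=20.0):
    """Indices of metabolites with %RSD <= threshold in every given method."""
    first = next(iter(rsd_by_method.values()))
    passed = np.ones_like(first, dtype=bool)
    for rsd in rsd_by_method.values():
        passed &= rsd <= threshold
    return np.flatnonzero(passed)

# Three replicates x three metabolites for two hypothetical methods.
method_a = [[10.0, 5.0, 1.0], [11.0, 5.5, 2.0], [9.0, 4.5, 3.0]]
method_b = [[10.0, 5.0, 2.0], [10.5, 8.0, 2.1], [9.5, 2.0, 1.9]]

rsd = {"A": percent_rsd(method_a), "B": percent_rsd(method_b)}
both = passing_metabolites(rsd)
print(rsd["A"], rsd["B"], both)
```

Because the intersection can only shrink as methods are added, the counts for combined sets are always at most the smallest single-method count, which is the pattern seen in Table 2.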

Statistical analysis

The complete quantification tables for the 149 targeted metabolites are provided in ESI Table S1.† Their corresponding mean concentration in mM, standard deviation (SD) and % RSD values from the 33 measurements for each method were also calculated. The metabolites were ranked in order from least % RSD to most % RSD according to the MC method. Metabolites that passed ≤20% RSD were bolded. The discrepancy in passed metabolites between the Chenomx and MC methods of peak fitting may be explained by a significant number of metabolites (60 out of 149) that were not quantified by the manual Chenomx technique due to the complexity of the NMR spectral regions of interest, and would have been labeled “Not-a-Number” (NaN). A major strength of MC-based peak fitting is that it avoids the issue of missing values (NaNs) in the data, which happens more frequently with Chenomx and PSRR-based peak fitting approaches. MC values had a mean error of 37% with 66% SD relative to manual Chenomx fitting, with some metabolites presenting higher differences in quantification, taking manual Chenomx peak fitting as the “industry standard”.

We next examined the peak fitting approaches using principal components analysis (PCA). PCA is a multivariate statistical technique that analyzes a data table representing observations described by several dependent and inter-correlated variables. Its goal is to extract important information from the data table, and to express this information as a set of new orthogonal variables (principal components), which are obtained as linear combinations of the original variables.36 The first principal component describes the largest variance, and the second principal component is computed under the constraint of being orthogonal to the first component. In our study, the 33 replicate QC samples represent the observations, and the variables that describe them are the metabolites. The different levels of metabolite cluster separation between the three methods are shown in Fig. 4. Mean centering and scaling to unit variance transformations were carried out before modelling. This data pretreatment method, called autoscaling, gives equal importance to each analyte, whether it is high or low in concentration.37 First, when the 149 metabolites that were originally fitted to the spectra were considered, there was a significant separation in the first principal component between the Chenomx data, located in the negative PC1 space, and the data obtained by PSRR and MC that were in the positive PC1 space. Separation was also observed on the second principal component between metabolite data obtained by PSRR and MC methods (Fig. 4a). The same separation was observed between each metabolite cluster when only the 42 metabolites that passed both the Chenomx and MC methods (because of having ≤20% RSD) were considered in the analysis (Fig. 4b). However, the separation between clusters was less pronounced than in Fig. 4a. Finally, when only the 28 metabolites that passed the three methods were considered, the clusters approached each other, with overlapping PSRR and MC clusters (Fig. 4c).
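Autoscaling followed by PCA can be sketched with NumPy alone, using a singular value decomposition of the pretreated table. This is an illustrative sketch, not the statistical software used in the study; the function names and the random 12 x 5 data table (with deliberately different column scales) are assumptions.

```python
import numpy as np

def autoscale(X):
    """Mean-center each column and scale it to unit variance."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pca(X, n_components=2):
    """PCA via singular value decomposition of an already pretreated matrix."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # observation coordinates
    loadings = Vt[:n_components].T                   # variable weights
    explained = s**2 / np.sum(s**2)                  # variance fraction per PC
    return scores, loadings, explained[:n_components]

rng = np.random.default_rng(1)
# Hypothetical table: 12 observations (rows) x 5 metabolites (columns)
# whose concentrations span four orders of magnitude.
scales = np.array([100.0, 10.0, 1.0, 0.1, 0.01])
X = rng.normal(size=(12, 5)) * scales + 50.0

Z = autoscale(X)
scores, loadings, explained = pca(Z)
print(explained)
```

Without autoscaling, the first component of such a table would be dominated by the highest-concentration column; after autoscaling every metabolite contributes the same total variance, which is the "equal importance" property described above.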

The clusters representing the Chenomx data were also muchmore tightly grouped, in contrast to the greater variabilityobserved using the two algorithmic techniques (Fig. 4). Thismay be explained by a difference in the approach to spectralanalysis in Chenomx peak tting, which consists in matchingthe experimental NMR spectrum with the spectral signaturelibrary of targeted metabolites. By applying the same proletemplate to all QC samples, limited adjustments are needed inboth concentrations and chemical shis. That explains thesmaller variance in metabolite concentrations obtained bythe Chenomx approach. Such tight clustering suggests that theoperator based measurements on a previously set template forpeak tting to detect and quantify metabolites. Indeed, this was

the case as we were informed by Chenomx that all samples were subjected to a similar peak fitting strategy to the first sample. Therefore, all subsequent fits by Chenomx required very little adjustment, which may explain the tighter cluster of Chenomx points in PCA plots. Conversely, the automated approaches start from a blank state for every fitted spectrum, meaning that the result of a previous fit is not used as a starting point for another. The mathematical methods used for automated profiling contain by their nature elements of randomness. The optimal solution obtained (by chi-square minimization) may vary for a given spectrum from one run to the next. In other words, several solutions corresponding to several combinations of the linear and non-linear variables may exist that give more or less the same minimum value of chi-square. That explains the larger variance found by the automated procedures.

Although the points in the Chenomx cluster were relatively tightly grouped, we observed that three points were located outside the 95% confidence ellipse, which is based on a T distribution of the scores in each principal component. On the other hand, the PSRR cluster presented four such outliers, while the MC cluster had only one.

To identify which specific metabolites drove the clusters for each method in positive or negative PC1 and PC2 spaces, or are responsible for outliers, the loading values for metabolites that passed the three methods were inspected (Fig. 5). The compounds starting with the letter “u” are unknown metabolites whose peaks could not be assigned to any known metabolites in the available library. The loading values for most metabolites in the first component were negative, revealing that concentration values for those metabolites were underestimated using the Chenomx peak fitting technique relative to the two simulation techniques, leading these to drive the Chenomx cluster into positive PC1 space.

Fig. 4 PCA plots representing metabolite concentration clusters obtained by the three methods. (a) All 149 metabolites. (b) Only the 42 metabolites that passed both Chenomx and MC methods. (c) Only the 28 metabolites that passed the three methods.
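As an illustration of how samples can be flagged against a 95% confidence ellipse in a two-component score plot, the following minimal sketch uses Hotelling's T² statistic. The exact ellipse definition used for Fig. 4 is not given in the text, so the limit formula 2(n − 1)/(n − 2) · F(2, n − 2) is an assumption, and the function names and the synthetic scores are hypothetical.

```python
import math

def f_quantile_2(p, d2):
    """Quantile of the F distribution with (2, d2) degrees of freedom.

    With 2 numerator degrees of freedom the CDF is
    1 - (d2 / (d2 + 2x)) ** (d2 / 2), which can be inverted in closed form.
    """
    return (d2 / 2.0) * ((1.0 - p) ** (-2.0 / d2) - 1.0)

def outside_ellipse(scores, level=0.95):
    """Flag samples whose (PC1, PC2) scores fall outside the T^2 ellipse.

    scores: list of (pc1, pc2) tuples for one cluster of samples.
    Returns one boolean per sample, True when the sample is outside.
    """
    n = len(scores)
    m1 = sum(x for x, _ in scores) / n
    m2 = sum(y for _, y in scores) / n
    # Sample covariance matrix of the two score columns.
    s11 = sum((x - m1) ** 2 for x, _ in scores) / (n - 1)
    s22 = sum((y - m2) ** 2 for _, y in scores) / (n - 1)
    s12 = sum((x - m1) * (y - m2) for x, y in scores) / (n - 1)
    det = s11 * s22 - s12 ** 2
    # Assumed form of the critical value for two components.
    limit = 2.0 * (n - 1) / (n - 2) * f_quantile_2(level, n - 2)
    flags = []
    for x, y in scores:
        dx, dy = x - m1, y - m2
        # Squared Mahalanobis distance using the inverse 2x2 covariance.
        t2 = (s22 * dx * dx - 2.0 * s12 * dx * dy + s11 * dy * dy) / det
        flags.append(t2 > limit)
    return flags

# Synthetic scores: 32 well-behaved replicates plus one distant sample.
scores = [(math.cos(k), math.sin(k)) for k in range(32)] + [(10.0, 0.0)]
flags = outside_ellipse(scores)
```

In this synthetic case only the last sample is flagged; counting the `True` entries per method would give the kind of outlier tally reported for the three clusters.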


Fig. 5 Loading values on the first and second principal components for metabolites that passed all three methods.
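The link between the sign of a loading and the position of a cluster along PC1 can be illustrated with a small sketch. It assumes mean-centred data without further scaling and extracts only the first component by power iteration; the concentration values, the grouping into "manual" and "automated" samples, and the function name are hypothetical and are not taken from the study.

```python
def first_component(rows, n_iter=200):
    """Return (loadings, scores) of PC1 for mean-centred data.

    rows: list of samples, each a list of metabolite concentrations.
    The leading eigenvector of the covariance matrix is found by
    power iteration, which is adequate for a small illustration.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix of the metabolites (columns).
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(p)]
           for i in range(p)]
    v = [1.0] * p
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centred]
    return v, scores

# Hypothetical concentrations of three metabolites (A, B, C).
manual = [[1.0, 5.0, 3.0], [1.1, 5.1, 3.0], [0.9, 4.9, 3.1]]
automated = [[2.0, 5.0, 2.5], [2.2, 4.9, 2.6], [1.9, 5.2, 2.4]]

loadings, scores = first_component(manual + automated)
manual_mean_score = sum(scores[:3]) / 3
```

Metabolite A is lower in the hypothetical manual group, so its loading and the mean PC1 score of that group have opposite signs. This is the same reasoning by which negative loadings combined with a cluster in positive PC1 space indicate lower concentrations for that cluster.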


To illustrate these differences, three example metabolites were chosen that had (1) strong negative PC1 and PC2 loadings (methylguanidine), (2) both loadings close to zero (hypoxanthine), and (3) strong positive PC1 and PC2 loadings (glycine) (Fig. 6). For methylguanidine, greater variability in concentrations was obtained using the PSRR and MC methods than with manual peak fitting using Chenomx. There was one outlier that was significantly lower than the median using the Chenomx technique, whereas two values were considered outliers in PSRR concentrations. In addition, the median value was twice as high using simulation techniques as with the manual technique. Measured concentrations were equivalent in all three approaches for hypoxanthine, while Chenomx values were higher relative to PSRR and MC for glycine.
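The text does not state which rule was used to mark boxplot outliers. A common convention, assumed here, is Tukey's rule, which flags values lying more than 1.5 times the interquartile range beyond the quartiles. The sketch below applies it to hypothetical replicate concentrations; neither the values nor the function name come from the study.

```python
import statistics

def tukey_outliers(values, k=1.5):
    """Return the values lying beyond k * IQR from the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical concentrations for one metabolite across QC replicates;
# a single replicate sits well below the rest.
concentrations = [10, 11, 12, 12, 13, 13, 14, 15, 2]
outliers = tukey_outliers(concentrations)
```

Here the quartiles are 11 and 13, so the fences are 8 and 16 and only the value 2 is flagged, analogous to a single low outlier in one method's boxplot.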

To explain the origin of these discrepancies, we considered the peak fitting of individual NMR spectra carried out with the three methods. We found that the manually fitted peak in the cluster at 2.8 ppm for a resonance attributed to methylguanidine was distinct between the first two QC samples (Fig. 7a and d), with different assigned chemical shifts. Sample #1 of the replicates was sent first to Chenomx and was analyzed separately from the other replicate samples. The operator was biased by a previously set template for peak fitting. Then a different peak fitting strategy was applied to sample #2 and the new fitting template was used for all subsequent samples, leading to discrepancies in metabolite concentrations. This explains the presence of an outlier for methylguanidine using the Chenomx method (Fig. 6a). On the other hand, although the amplitude of the fitting curve exceeded experimental spectral lines with PSRR and MC methods, the chemical shift position was the same for both samples (Fig. 7b and e, c and f). The quality of peak fitting for methylguanidine appeared to improve when using MC compared with PSRR and manual fitting. Therefore, the relative precision of concentrations between QC samples for methylguanidine may be explained by the ability of a human operator to repeatedly fit the same peak in a cluster regardless of the accuracy of the measurement based on chemical shift.

Fig. 6 Boxplots showing the spread in concentrations between QC samples for (a) methylguanidine, (b) hypoxanthine, and (c) glycine.

Fig. 7 Peak fitting for the cluster at 2.8 ppm of methylguanidine. (a) QC sample no. 1 – Chenomx, (b) QC sample no. 1 – PSRR, (c) QC sample no. 1 – MC, (d) QC sample no. 2 – Chenomx, (e) QC sample no. 2 – PSRR, (f) QC sample no. 2 – MC.

The fitted spectra for hypoxanthine and glycine, presented in ESI Fig. S2,† revealed that the peak position was consistent between the three methods, while a higher peak height variability between samples was observed for PSRR and MC approaches compared with Chenomx.

We have noticed similar inconsistencies in the analysis results provided by Chenomx, such as differences in fitted signal intensities or chemical shift positions between replicates of the same QC sample for 12 other metabolites, which have been described in ESI Fig. S2.† While the choice of fitted peak for a particular metabolite could vary between replicates, the same peak was usually assigned by both PSRR and MC methods, which confirms that the two automated methods are less precise, but generally agree with each other and disagree with Chenomx in some cases.

The arbitrary choice of one peak in a multiple peak cluster can be an advantage when a person may decide to put more weight on certain clusters than others, based on a judgement that they reside in “clear” regions of the spectrum with less peak overlap. This peak selection was not possible in the current version of the computational algorithms used in this work. However, it can also be a limitation of the manual peak fitting technique using Chenomx software, as several operators using the manual peak fitting technique may generate different outcomes. The new algorithmic approaches with PSRR and MC methods may produce less precise, but more accurate, estimations of metabolite concentrations, as all linear and non-linear variables of each problem may be optimized at the same time. In addition, operator error is eliminated by the application of PSRR or MC peak fitting approaches.
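The idea of optimizing linear and non-linear variables together, and the run-to-run variation this can produce, can be sketched on a toy problem. The example below is not the PSRR or MC algorithm used in this work; it is a plain random search, assuming a single Lorentzian peak in which the amplitude is the linear variable and the peak centre and width are the non-linear ones. The synthetic spectrum, search ranges, and function names are all hypothetical.

```python
import random

def lorentzian(x, center, width):
    """Unit-height Lorentzian line shape with half-width `width`."""
    return width ** 2 / ((x - center) ** 2 + width ** 2)

def fit_once(xs, ys, rng, n_trials=2000):
    """Random search over the non-linear variables (center, width).

    For each trial the linear variable (amplitude) is solved exactly by
    least squares; chi-square is the sum of squared residuals.
    Returns the best (chi2, amplitude, center, width) found.
    """
    best = None
    for _ in range(n_trials):
        center = rng.uniform(2.75, 2.85)
        width = rng.uniform(0.002, 0.02)
        shape = [lorentzian(x, center, width) for x in xs]
        amp = sum(s * y for s, y in zip(shape, ys)) / sum(s * s for s in shape)
        chi2 = sum((y - amp * s) ** 2 for s, y in zip(shape, ys))
        if best is None or chi2 < best[0]:
            best = (chi2, amp, center, width)
    return best

# Synthetic "spectrum": one peak at 2.80 ppm plus reproducible noise.
noise = random.Random(0)
xs = [2.70 + 0.001 * i for i in range(201)]
ys = [5.0 * lorentzian(x, 2.80, 0.008) + noise.gauss(0.0, 0.05) for x in xs]

# Two independent runs, each starting from scratch with its own random stream.
run_a = fit_once(xs, ys, random.Random(1))
run_b = fit_once(xs, ys, random.Random(2))
```

Both runs recover a peak close to 2.80 ppm, but the amplitude, centre, and width they settle on differ slightly. Several parameter combinations give nearly the same chi-square, which is one way an automated approach can lose precision between replicates.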

One limitation of our automated approach is that it was not possible to check if the correct peak was picked for some metabolites that have their clusters located in highly overlapped spectral regions. It would be a very difficult and expensive process to prepare a synthetic solution of urine containing all the 149 targeted metabolites with known concentrations. Standard mixtures of urine for NMR-based metabolomics analysis usually contain no more than 30 compounds.25,38 Therefore, the capacity of automated peak fitting approaches to accurately measure urinary metabolite concentrations in real samples will be difficult to assess.

One approach to this problem may be to use spike-in experiments, in which a QC urine sample is spiked with a pure metabolite compound to see which specific peak(s) in a cluster increase in intensity, so that the correct peak to fit may be identified.38 Another avenue for more detailed molecular structure information and more accurate peak identification would be two-dimensional NMR spectroscopy.39 This is warranted for future investigations.
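The spike-in comparison amounts to checking which peaks in a cluster grow after the pure compound is added. A minimal sketch follows, assuming peak intensities have already been extracted at matching chemical shifts before and after spiking. The intensities, the 1.5-fold threshold, and the function name are hypothetical.

```python
def peaks_raised_by_spike(before, after, min_ratio=1.5):
    """Return the chemical shifts whose intensity rose by at least min_ratio.

    before, after: dicts mapping chemical shift (ppm) to peak intensity.
    """
    return sorted(
        shift for shift in before
        if shift in after and before[shift] > 0
        and after[shift] / before[shift] >= min_ratio
    )

# Hypothetical intensities for three overlapping peaks near 2.8 ppm.
before = {2.78: 1.00, 2.81: 0.80, 2.83: 1.20}
after = {2.78: 1.02, 2.81: 1.95, 2.83: 1.18}
raised = peaks_raised_by_spike(before, after)
```

In this made-up case only the peak at 2.81 ppm responds to the spike, so it would be the candidate peak to fit for the spiked metabolite.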

Conclusions

In this work, we have used a Monte Carlo simulation approach to optimize and automate metabolite concentration measurements in 1D 1H NMR spectra obtained from replicates of a single pooled QC urine sample. Statistical analysis revealed significant differences in Monte Carlo simulation outcomes from results obtained with a standard manual peak fitting technique. These findings indicate that substantial validation must be carried out using appropriate quality control samples for NMR analysis of urine samples before any comparison can be made between control and case samples. We strongly believe that this new Monte Carlo approach for peak fitting may become a benchmark technique for future investigation in the prominent field of urine metabolomics using NMR spectroscopy.

Conflicts of interest

There are no conflicts to declare by any of the authors of this study.


Acknowledgements

The authors would like to thank Christopher Manning, Welding Program at the Northern Alberta Institute of Technology, for his assistance in obtaining urine samples from apprentice welders, which were used for this study. This work was supported by OHS Futures, Government of Alberta and the Lung Association of Alberta and Northwest Territories.

References

1 J. K. Nicholson, J. C. Lindon and E. Holmes, Xenobiotica, 1999, 29, 1181–1189.

2 C. W. Armstrong, N. R. McGregor, D. P. Lewis, H. L. Butt and P. R. Gooley, Metabolomics, 2017, 13, 8.

3 M. Cassiede, S. Nair, M. Dueck, J. Mino, R. McKay, P. Mercier, B. Quemerais and P. Lacy, Clin. Chim. Acta, 2017, 464, 37–43.

4 E. Holmes, P. J. D. Foxall, J. K. Nicholson, G. H. Neild, S. M. Brown, C. R. Beddell, B. C. Sweatman, E. Rahr, J. C. Lindon, M. Spraul and P. Neidig, Anal. Biochem., 1994, 220, 284–296.

5 E. Holmes, P. J. D. Foxall, M. Spraul, R. D. Farrant, J. K. Nicholson and J. C. Lindon, J. Pharm. Biomed. Anal., 1997, 15, 1647–1659.

6 P. Zaragoza, J. L. Ruiz-Cerda, G. Quintas, S. Gil, A. M. Costero, Z. Leon, J. L. Vivancos and R. Martínez-Máñez, Analyst, 2014, 139, 3875–3878.

7 C. K. Larive, G. A. Barding Jr and M. M. Dinges, Anal. Chem., 2015, 87, 133–146.

8 A.-H. Emwas, C. Luchinat, P. Turano, L. Tenori, R. Roy, R. M. Salek, D. Ryan, J. S. Merzaban, R. Kaddurah-Daouk, A. C. Zeri, G. A. Nagana Gowda, D. Raftery, Y. Wang, L. Brennan and D. S. Wishart, Metabolomics, 2015, 11, 872–894.

9 S. Ravanbakhsh, P. Liu, T. C. Bjordahl, R. Mandal, J. R. Grant, M. Wilson, R. Eisner, I. Sinelnikov, X. Hu, C. Luchinat, R. Greiner and D. S. Wishart, PLoS One, 2015, 10, 0124219.

10 D. S. Wishart, M. J. Lewis, J. A. Morrissey, M. D. Flegel, K. Jeroncic, Y. Xiong, D. Cheng, R. Eisner, B. Gautam, D. Tzur, S. Sawhney, F. Bamforth, R. Greiner and L. Li, J. Chromatogr. B: Anal. Technol. Biomed. Life Sci., 2008, 871, 164–173.

11 H. R. Cho, H. Wen, Y. J. Ryu, Y. J. An, H. C. Kim, W. K. Moon, M. H. Han, S. Park and S. H. Choi, Cancer Res., 2012, 72, 5179–5187.

12 N. Psychogios, D. D. Hau, J. Peng, A. C. Guo, R. Mandal, S. Bouatra, I. Sinelnikov, R. Krishnamurthy, R. Eisner, B. Gautam, N. Young, J. Xia, C. Knox, E. Dong, P. Huang, Z. Hollander, T. L. Pedersen, S. R. Smith, F. Bamforth, R. Greiner, B. McManus, J. W. Newman, T. Goodfriend and D. S. Wishart, PLoS One, 2011, 6, e16957.

13 P. Mercier, M. J. Lewis, D. Chang, D. Baker and D. S. Wishart, J. Biomol. NMR, 2011, 49, 307–323.

14 J. J. Ellinger, R. A. Chylla, E. L. Ulrich and J. L. Markley, Curr. Metabolomics, 2013, 1, 28–40.

15 C. Zheng, S. Zhang, S. Ragg, D. Raftery and O. Vitek, Bioinformatics, 2011, 27, 1637–1644.


16 D. Tulpan, S. Leger, L. Belliveau, A. Culf and M. Cuperlovic-Culf, BMC Bioinf., 2011, 12, 400.

17 J. Gomez, J. Brezmes, R. Mallol, M. A. Rodríguez, M. Vinaixa, R. M. Salek, X. Correig and N. Canellas, Anal. Bioanal. Chem., 2014, 406, 7967–7976.

18 J. Hao, M. Liebeke, W. Astle, M. De Iorio, J. G. Bundy and T. M. D. Ebbels, Nat. Protoc., 2014, 9, 1416–1427.

19 Maulidiani, Rudiyanto, A. Mediani, A. Khatib, A. Ismail, M. Hamid, N. H. Lajis, K. Shaari and F. Abas, Metabolomics, 2017, 13, 131.

20 K. Zangger, Prog. Nucl. Magn. Reson. Spectrosc., 2015, 86–87, 1–20.

21 J. Mauhart, S. Glanzer, P. Sakhaii, W. Bermel and K. Zangger, J. Magn. Reson., 2015, 259, 207–215.

22 L. Castanar, R. Roldan, P. Clapes, A. Virgili and T. Parella, Chem.–Eur. J., 2015, 21, 7682–7685.

23 J. Ying, J. Roche and A. Bax, J. Magn. Reson., 2014, 241, 97–102.

24 R. K. Rai, P. Tripathi and N. Sinha, Anal. Chem., 2009, 81, 10232–10238.

25 I. A. Lewis, S. C. Schommer, B. Hodis, K. A. Robb, M. Tonelli, W. M. Westler, M. R. Sussman and J. L. Markley, Anal. Chem., 2007, 79, 9385–9390.

26 P. Giraudeau, Magn. Reson. Chem., 2014, 52, 259–272.

27 S. Akoka and P. Giraudeau, Magn. Reson. Chem., 2015, 53, 986–994.

28 P. Giraudeau and L. Frydman, Annu. Rev. Anal. Chem., 2014, 7, 129–161.

29 J. Farjon, C. Milande, E. Martineau, S. Akoka and P. Giraudeau, Anal. Chem., 2018, 90, 1845–1851.

30 C. Mauve, S. Khlifi, F. Gilard, G. Mouille and J. Farjon, Chem. Commun., 2016, 52, 6142–6145.

31 A. M. Weljie, J. Newton, P. Mercier, E. Carlson and C. M. Slupsky, Anal. Chem., 2006, 78, 4430–4442.

32 A. Beneduci, G. Chidichimo, G. Dardo and G. Pontoni, Anal. Chim. Acta, 2011, 685, 186–195.

33 J. C. Lindon, E. Holmes and J. K. Nicholson, Prog. Nucl. Magn. Reson. Spectrosc., 2001, 39, 1–40.

34 Y. Zuo, C. Wang, J. Zhou, A. Sachdeva and V. C. Ruelos, Anal. Sci., 2008, 24, 1589–1592.

35 E. J. Cone, Y. H. Caplan, F. Moser, T. Robert, M. K. Shelby and D. L. Black, J. Anal. Toxicol., 2009, 33, 1–7.

36 H. Abdi and L. J. Williams, Wiley Interdiscip. Rev. Comput. Stat., 2010, 2, 433–459.

37 R. A. van den Berg, H. C. J. Hoefsloot, J. A. Westerhuis, A. K. Smilde and M. J. van der Werf, BMC Genomics, 2006, 7, 142.

38 W. Gronwald, M. S. Klein, H. Kaspar, S. R. Fagerer, N. Nurnberger, K. Dettmer, T. Bertsch and P. J. Oefner, Anal. Chem., 2008, 80, 9288–9297.

39 J. L. Markley, R. Bruschweiler, A. S. Edison, H. R. Eghbalnia, R. Powers, D. Raftery and D. S. Wishart, Curr. Opin. Biotechnol., 2017, 43, 34–40.
