www.sciencemag.org/cgi/content/full/science.aaq1327/DC1
Supplementary Materials for
Co-regulatory networks of human serum proteins link genetics to disease
Valur Emilsson*†, Marjan Ilkov*, John R. Lamb*†, Nancy Finkel, Elias F. Gudmundsson, Rebecca Pitts, Heather Hoover, Valborg Gudmundsdottir, Shane R. Horman, Thor Aspelund, Le Shu, Vladimir Trifonov, Sigurdur Sigurdsson, Andrei Manolescu, Jun Zhu, Örn Olafsson,
Johanna Jakobsdottir, Scott A. Lesley, Jeremy To, Jia Zhang, Tamara B. Harris, Lenore J. Launer, Bin Zhang, Gudny Eiriksdottir, Xia Yang, Anthony P. Orth, Lori L. Jennings‡,
Vilmundur Gudnason†‡
*These authors contributed equally to this work. †Corresponding author. Email: [email protected] (V.E.); [email protected] (V.G.);
[email protected] (J.R.L.) ‡These authors contributed equally to this work.
Published 2 August 2018 on Science First Release
DOI: 10.1126/science.aaq1327
This PDF file includes:
Materials and Methods Figs. S1 to S14 Tables S2, S5, S8, S11, S12, S16, and S18 to S20 Captions for tables S1, S3, S4, S6, S7, S9, S10, S13 to S15, S17, S21, and S22 References
Other Supplementary Materials for this manuscript include the following: (available at www.sciencemag.org/cgi/content/full/science.aaq1327/DC1)
Tables S1, S3, S4, S6, S7, S9, S10, S13 to S15, S17, S21 and S22 (Excel)
Materials and Methods
1. The study cohort
Cohort participants aged 66 through 96 were from the Age, Gene/Environment Susceptibility (AGES)-Reykjavik Study (12), a
single-center prospective population-based study of deeply phenotyped subjects (N = 5,457,
mean age 76.6±6 years). AGES Reykjavik is the study of the elderly and all survivors of the
50-year-long prospective Reykjavik study (N = 19,360), an epidemiologic study focusing on
four biologic systems: vascular, neurocognitive (including sensory), musculoskeletal, and
body composition/metabolism. Blood samples were collected at the AGES-Reykjavik
baseline after an overnight fast; serum was prepared using a standardized protocol and stored
in 0.5 ml aliquots at -80°C.
Prevalent coronary heart disease (CHD) was defined as previous myocardial infarction
(MI), coronary artery bypass graft or percutaneous intervention (PCI) obtained from hospital
records or prevalent MI, and according to echocardiography at AGES visit. Incident CHD
events included fatal CHD or incident non-fatal CHD (International Classification of Diseases
(ICD) 9th edition, codes 410, 411, 414, 429, and ICD-10th edition, codes I21–I25), obtained
from cause of death registries and hospitalization records. The criteria for heart failure (HF)
were based on symptoms, signs, chest X-ray, and echocardiographic findings from hospital
records, adjudicated by examining every record for both prevalent and incident HF (8-year
follow-up). Metabolic syndrome (MetS) was defined by three or more of the following:
fasting glucose ≥ 5.6 mmol/L, blood pressure ≥ 140/90 mmHg, triglycerides ≥ 1.7 mmol/L,
0 < HDL < 0.9 mmol/L for males or 0 < HDL < 1.0 mmol/L for females, and BMI > 30 kg/m². Roughly
20.7% of the AGES study population met the criteria for MetS.
Systolic and diastolic blood pressure were measured twice using a mercury sphygmomanometer
in a supine position. BMI was calculated as weight (kg) divided by height (in meters)
squared. Triglycerides (TG), HDL cholesterol, and plasma glucose levels were measured on fasting blood
samples. TG was measured using enzymatic colorimetry (Roche Triglyceride Assay Kit),
HDL with an enzymatic in vitro assay (Roche Direct HDL Cholesterol Assay Kit), and
glucose was measured using photometry (Roche Hitachi 717 Photometric Analysis System).
T2D (all prevalent cases) was determined from self-reported diabetes, diabetes medication
use, or fasting plasma glucose ≥ 7 mmol/L according to ADA (31). Computed tomography
(CT) imaging of the mid-thigh and abdomen at the L4/L5 vertebrae was performed with a 4-
row detector system (Sensation; Siemens Medical Systems, Erlangen, Germany). Visceral
adipose tissue (VAT) and abdominal subcutaneous adipose tissue (SAT) were estimated from
a single 10mm thick trans-axial section via CT. Images were loaded into an AVS5 display
environment. VAT was distinguished from SAT by tracing along the facial plane defining the
internal abdominal wall. Adipose areas were calculated by multiplying the number of pixels
by the pixel area using specialized software (University of California, San Francisco). Finally,
we assessed survival probability for individual proteins and the eigenvectors ($E^{(q)}$s, where q denotes
a specific module as explained below) of protein modules in a 12-14 year follow-up study, i.e.
both overall survival with 2,982 events as well as survival post incident CHD with 692 events.
Follow-up time for overall survival was defined as the time from entry into AGES until death
from any cause or end of follow-up (end of year 2016), while follow-up time for survival post
incident CHD was defined as the time from 28 days after an incident CHD-event until death
from any cause or end of follow-up time. Table S2 reports the baseline characteristics of the
study cohort.
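As an illustration of the metabolic syndrome definition given above, the following Python sketch counts how many of the five listed criteria a participant meets. The function and argument names are hypothetical. Reading "blood pressure ≥ 140/90" as systolic ≥ 140 or diastolic ≥ 90 is an assumption, since the text does not say how the two components are combined.

```python
def mets_criteria_count(glucose_mmol_l, systolic, diastolic, tg_mmol_l,
                        hdl_mmol_l, is_male, bmi):
    """Count how many of the five MetS criteria listed in the text are met."""
    hdl_cutoff = 0.9 if is_male else 1.0
    criteria = [
        glucose_mmol_l >= 5.6,
        # Assumption: either blood pressure component reaching its cutoff counts.
        systolic >= 140 or diastolic >= 90,
        tg_mmol_l >= 1.7,
        0 < hdl_mmol_l < hdl_cutoff,
        bmi > 30,
    ]
    return sum(criteria)


def has_mets(**measurements):
    """MetS requires three or more of the five criteria."""
    return mets_criteria_count(**measurements) >= 3
```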
Given the frequent associations of the serum proteins with prevalent disease throughout
the present study, we wanted to learn whether the association of proteins with prevalent disease
like CHD was influenced by the time from diagnosis to sampling. Note
however, that numerous protein biomarkers linked to CHD have been identified using
prevalent disease in the discovery phase (32), including CRP, LpPLA2, NT-proBNP, cTnT
and Lp(a), to mention a few. More to the point, the mean±SD time from diagnosis of CHD to the
time of sampling was 6.08±5.37 years in the AGES cohort. We performed forward logistic
regression analysis using either all 1,217 prevalent CHD cases or 700 CHD cases that were
diagnosed with the disease within 5 years before the entry into the AGES study. We found
that 927 proteins were correlated with CHD at a Bonferroni adjusted P-value <0.05 using all
prevalent cases, while 859 proteins were associated with the disease in the time restricted
analysis (fig. S8). Of these, 768 proteins were found in both analyses. The
list of top proteins and the effect sizes of the overlapping protein set were unchanged between
the analyses, while the P-values were numerically higher in the time-restricted analysis, most
likely due to reduced power as there were 517 fewer cases (fig. S8). In summary, the
levels of proteins associated with prevalent disease like CHD were not affected by restricting
the analysis to cases diagnosed within 5 years before sampling.
The AGES-Reykjavik study was approved by the National Bioethics Committee (NBC) in Iceland (approval number
VSN-00-063), and by the National Institute on Aging Intramural Institutional Review Board
(US), and the Data Protection Authority in Iceland. Informed consent was obtained from all
study participants.
2. Protein measurements and assessment of aptamer specificity
Each protein has its own detection reagent selected from chemically modified DNA libraries,
referred to as Slow-Off rate Modified Aptamers (SOMAmer). With the focus set on proteins
known or predicted to be present extracellularly or on the surface of cells, a new custom-
designed Novartis SOMAscan 5K platform was developed that measures 5,034 protein
analytes in a single serum sample, of which 4,783 SOMAmers bind specifically to human
proteins (4,137 distinct human proteins) and 250 SOMAmers that recognize non-human
targets (47 non-human vertebrate proteins and 203 targeting human pathogens). SOMAmers
are small single-stranded 40-mer DNA aptamers with modified nucleic acids selected to
specifically recognize target proteins in their native three-dimensional state. SOMAmer
reagents were selected for slow dissociation kinetics (t1/2>30 min), which in combination with
stringent wash steps impedes nonspecific binding (11).
Protein levels were measured at SomaLogic Inc. (Boulder, US) essentially as
previously described (10, 33). In brief, 5,457 individual serum samples were treated with the
detergent Tween-20 to prevent loss of reagent material to tube walls and for lysis of
exosomes, and then incubated with the mixture of 5,034 SOMAmers to generate SOMAmer-
protein complexes. Unbound SOMAmers and unbound or nonspecifically bound proteins
were eliminated by 2 bead-based immobilization wash steps and the use of polyanionic
competitors. After eluting the enriched SOMAmers from their target proteins they were
directly quantified on an Agilent hybridization array (Agilent Technologies). Hybridization
controls were used to correct for systematic variability in detection and calibrator samples of
three dilution sets (40%, 1% and 0.005%) were included so that the degree of fluorescence
was a quantitative reflection of protein concentration. All scale factors were then used to
normalize the protein data. We note that albumin-tolerance testing is a part of standard assay
development at SomaLogic and has been evaluated for all analytes on the new custom-
designed aptamer-based platform, showing no effect of albumin addition on the SOMAmer-
protein interactions.
To avoid batch or time of processing biases, both sample collection and sample
processing for protein measurements were randomized and all samples run as a single set.
The 5,034 SOMAmers that passed quality control had median intra-assay and inter-assay
coefficient of variation, CV = 100 × σ/µ, <5%, or similar to that reported on variability in the
SOMAscan assays (34). More specifically, we aliquoted samples from 30 subjects into two
separate plates (any two of the 67 plates processed), and assessed the inter-plate variability of
those proteins relevant to the present study, notably the serum protein network. First we
interrogated the inter-plate CV for the top 20% most connected proteins (kTotal) or 1,000
proteins/aptamers and found the median CV to be 0.60%. Secondly, we checked the inter-
plate CV for all the 390 proteins that constitute the PM26 module as an example, and found
that the median inter-plate CV for the proteins in PM26 was 0.42%. This exercise
demonstrates a remarkably low inter-plate CV in our dataset.
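The coefficient of variation used above, CV = 100 × σ/µ, can be sketched for duplicate measurements as follows. This is a minimal illustration only: the function names are hypothetical, and both the use of the sample standard deviation and the per-sample pairing across two plates are assumptions, since the text does not describe the exact computation.

```python
from statistics import mean, median, stdev


def cv_percent(values):
    """Coefficient of variation in percent: 100 * sigma / mu."""
    # Assumption: sigma is the sample standard deviation.
    return 100.0 * stdev(values) / mean(values)


def median_inter_plate_cv(plate_a, plate_b):
    """Median CV over samples measured once on each of two plates."""
    return median(cv_percent([a, b]) for a, b in zip(plate_a, plate_b))
```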
Information related to annotation of the 4,137 human proteins is provided in table S1.
Prior to the analysis of the protein data, we applied a Yeo-Johnson transformation on the
proteins to improve normality, symmetry and to maintain all protein variables on a similar
scale (35). Furthermore, we examined the pairwise correlation between all proteins measured
in the AGES population and found the median rho to be close to zero (rho = 0.0286,
interquartile range Q1 = -0.0577 and Q3 = 0.1268) consistent with there being no bias in the
data due to potential off-target binding of the aptamers.
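For reference, the Yeo-Johnson transformation mentioned above has a simple closed form for a given power parameter λ. The sketch below implements it for a single value. The function name is hypothetical, and λ is taken as given here; how λ was estimated for each protein is not stated in the text.

```python
import math


def yeo_johnson(y, lam):
    """Yeo-Johnson power transform of one value y for a given lambda."""
    if y >= 0:
        if abs(lam) < 1e-12:
            return math.log1p(y)              # limit as lambda -> 0
        return ((y + 1.0) ** lam - 1.0) / lam
    if abs(lam - 2.0) < 1e-12:
        return -math.log1p(-y)                # limit as lambda -> 2
    return -(((1.0 - y) ** (2.0 - lam)) - 1.0) / (2.0 - lam)
```

With λ = 1 the transform is the identity, which makes a convenient sanity check.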
2.1 Direct measures of aptamer specificity: confirmation of SOMAmer enrichment from
complex biological samples using mass spectrometry
SOMAmer reagents are selected not just for their affinity but also for low dissociation rates
(slow off-rates) with their target proteins (11). This kinetics-based element together with the
use of excess poly-anionic competitors and stringent wash steps during the screening process
can overcome non-specific binding interactions (11). To verify this, the authors of that study selected and
purified a random subset of 20 SOMAmer-protein complexes, analyzed them by mass
spectrometry (MS) sequencing, and confirmed the specificity of all of them, with negligible
amounts of contaminants (11). We have now expanded this work significantly to confirm
specificity of a much larger set, or 779 SOMAmers (tables S3 to S4). Thus in an effort to
confirm binding of SOMAmers on the custom designed SOMAscan platform to their
respective targets in an endogenous matrix, we have conducted a series of experiments using
SOMAmers to enrich proteins in complex biological matrices followed by measurements
using two mass spectrometry techniques: data dependent analysis (DDA) and multiple
reaction monitoring (MRM). Description of the experiments, results and data release is
detailed below.
For data dependent analysis, the library of 4,783 SOMAmers was multiplexed in sets
of 8 and used for enrichment of target proteins from cell lysate, conditioned media, human
plasma, and human serum. A subset was also screened in urine. Cell lines were selected
from Cancer Cell Line Encyclopedia (CCLE) (36), for screening by comparing gene
expression of the target proteins measured by RNA sequencing across the CCLE. Using the
criterion of Fragments Per Kilobase Million (FPKM) value greater than 5 for the target
protein transcript, a reduced number of cell lines were cultured to maximize coverage across
the proteins represented in the SOMAmer library. For a given cell line, the presence of at
least 8 target proteins with FPKM values greater than 5 was applied as a cutoff for inclusion.
It was not feasible to cover approximately 400 target proteins applying these criteria and these
proteins were screened in serum and plasma only. For enrichment from biological matrices,
SOMAmers were combined into sets of 8 such that the potential interaction with similar
proteins or binding partners was minimized, for example, isoforms and closely related
homologs were not included in the same batch of 8 targets. Non-specific binding of the
SOMAmers was assessed in each matrix by using a SOMAmer generated against the bacterial
protein phosphoadenosine phosphosulfate reductase (CysH). Target protein spectral counts in
the SOMAmer enriched samples were compared to the respective CysH control. A positive
hit is defined as target protein detection with a minimum of 2 spectral counts and signal over
CysH background greater than 10X.
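The positive-hit rule stated above (at least 2 spectral counts and more than 10X signal over the CysH control) can be written as a short predicate. The function and parameter names are hypothetical, and the handling of a zero CysH background is an assumption, as the text does not say how that case was treated.

```python
def is_positive_hit(target_counts, cysh_counts, min_counts=2, min_fold=10.0):
    """Apply the DDA positive-hit rule to one target protein in one matrix."""
    if target_counts < min_counts:
        return False
    # Assumption: with zero background counts no ratio can be formed,
    # so any target meeting the count minimum is treated as a hit.
    if cysh_counts == 0:
        return True
    return target_counts / cysh_counts > min_fold
```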
The SOMAmers were screened following established protocols (37). The SOMAmer
mix was combined with 1 mg lysate, 500 uL of conditioned media or 100 uL of plasma,
serum or urine and diluted to 1000 uL total volume with buffer (40 mM HEPES, pH 7.5, 100
mM NaCl, 5 mM KCl, 5 mM MgCl2, 1 mM EDTA, 0.5% NP-40). Enriched proteins were
released from the SOMAmers via denaturation, followed by reduction, alkylation, and
digestion with Trypsin/Lys-C. For plasma and serum matrices, a deglycosylation step with
PNGaseF was added before Trypsin/Lys-C digestion. Peptides were reconstituted in 40 uL
2% acetonitrile 0.1% formic acid and analyzed using a nanoflow liquid chromatography
system (Proxeon nano-LC) coupled to a data dependent mass spectrometer (LTQ-Orbitrap-Velos
or Q Exactive Plus mass spectrometer, Thermo Scientific, San Jose, CA). The sample
(2 uL) was loaded onto an EASY-Column™ (2 cm x 100 um ID, 5 um, 120Å C18-A1,
ThermoFisher Scientific) and separated using a 75 um id Picotip emitter with a 15 um
diameter tip (Cat No PF360-75-15-N-5, New Objective, Woburn, MA) hand packed with
Magic C18 100Å 3 um resin to a length of 12 cm. The following gradient conditions at a
flow rate of 400 nL/min (Mobile Phase A: 100% water 0.1% formic acid (FA); Mobile Phase
B: 90% acetonitrile (ACN), 10% water 0.1% FA) were used: 2-30% B over 40 min, 30-80%
B over 5 min, 80% B for 2 min, followed by column washing and equilibration. The mass
spectrometer was operated in the standard scan mode with positive ionization, electrospray
voltage of 2.75 kV and ion transfer tube temp of 275°C. Full MS spectra were acquired in the
Orbitrap mass analyzer over the 325-2000 m/z range with 60,000 mass resolution, AGC target
of 1x10^6, and 500 ms injection time. The 10 most intense peaks with a charge state ≥2 were
acquired with the LTQ ion trap with 1 microscan, minimum signal threshold of 5,000 counts,
100 ms injection time and 10,000 AGC. Dynamic exclusion was enabled with a repeat
duration of 10 sec, exclusion list size of 500, exclusion duration of 10 sec, and exclusion mass
width relative to reference mass (+/- 10 ppm). Comparable parameters were used on the Q
Exactive Plus. The data were searched using Uniprot Human canonical database (v Jan 2014)
with common contaminants and reverse database appended (43,136 sequences; 23,452,844
residues). Fifteen proteins on the SOMAmer list are present in the contaminants database, so
this subset of proteins was searched both with and without this database. Raw data were
processed with Mascot (v 2.4) using default parameters: trypsin enzyme specificity allowed
for up to 2 missed cleavages, monoisotopic mass values, unrestricted protein mass, peptide
mass tolerance +/- 15ppm, and fragment mass tolerance +/- 0.8Da. The fixed modification
Carbamidomethyl (C) and the following variable modifications were selected: Oxidation (M),
phospho (ST), and phospho (Y). In samples treated with PNGaseF, the additional variable
modification of deamidation (NQ) was used. The PeptideProphet and ProteinProphet
algorithms were used for peptide and protein identification, respectively (ISB/SPC Trans
Proteomic Pipeline TPP v4.3 JETSTREAM rev 1, Build 200909091257 (MinGW)). Protein
results were filtered using a false discovery rate (FDR) of less than 1%.
The MRM method was employed for selected protein targets in follow-up work in the
same biological matrices (cell lysate, conditioned media, serum, plasma and urine). The
higher sensitivity of MRM detection results in a higher success rate when compared to DDA,
however the additional time and expense requirements preclude the use of this methodology
for large-scale screening of the large SOMAmer library on the current custom designed
SOMAscan. Multiple tryptic peptides (minimum of 3 per protein) were selected using
standard criteria (38). Heavy-labeled ($^{13}$C$_6$ $^{15}$N$_4$-arginine, $^{13}$C$_6$ $^{15}$N$_2$-lysine) peptides were
synthesized to act as internal standards (crude or >97% purity with concentration determined
by AAA, JPT and ThermoFisher Scientific). Peptide optimization and transition selection was
completed using Skyline software (MacCoss Lab Software, University of Washington) (37,
39). SOMAmer enrichment was performed as above. Peptides were reconstituted in 10 uL
2% acetonitrile 0.1% formic acid, diluted in a mixture of respective internal standard peptides,
and analyzed using a nanoflow liquid chromatography system (Proxeon nano-LC) coupled to
a triple quadrupole mass spectrometer (TSQ Vantage or TSQ Altis, Thermo Scientific, San
Jose, CA). The sample (2 uL) was loaded onto an EASY-Column™ (2 cm x 100 um ID, 5
um, 120Å C18-A1, ThermoFisher Scientific) and separated using a 75 um id Picotip emitter
with a 15 um diameter tip (Cat No PF360-75-15-N-5, New Objective, Woburn, MA) hand
packed with Magic C18 100Å 3 um resin to a length of 12 cm. The following gradient
conditions at a flow rate of 250 nL/min (Mobile Phase A: 100% water 0.1% formic acid
(FA); Mobile Phase B: 90% acetonitrile (ACN), 10% water 0.1% FA) were used: 2-40% B
over 25 min, 40-80% B over 1 min, 80% B for 3 min, followed by column washing and
equilibration. The mass spectrometer was operated in the SRM scan mode with positive
ionization, electrospray voltage of 1800 V, capillary temperature of 225°C (TSQ Vantage) or
325°C (TSQ Altis), Q1 and Q3 resolution settings of 0.7 FWHM, and a cycle time of 1.0
second. Collision energy (CE) parameters were calculated using linear equations in Skyline
and collision cell gas pressure of 1.0 mTorr was used for fragmentation. Positive detection
was defined using standard criteria of co-elution and equivalent transition patterns with the
internal standard, as well as the absence of interferences. Data were processed using Skyline
software.
Results of the mass spectrometry experiments were combined into a database
containing confirmatory evidence of 779 SOMAmer reagents binding their endogenous
targets (736 by DDA and 104 by MRM). The raw DDA data have been deposited to the
ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE
partner repository (40), with the five dataset identifiers PXD008819-PXD008823. The raw
MRM data have been deposited to ProteomeXchange Consortium via the Peptide Atlas
PASSEL repository with the dataset identifier PASS01145. These databases can be used to
aid in the prioritization of SOMAscan results for technical and biological validation studies.
Results from the larger DDA screening efforts are utilized to set up targeted MRM assays for
additional follow-up. The annotation of the 779 SOMAmer reagents is provided in tables S3
and S4.
2.2 Inferential support for SOMAmer specificity towards target proteins
Here we highlight the use of inferred measures of SOMAmer specificity towards their
respective targets including: 1) Cross-platform validation of a number of known protein
biomarkers. Figure S2A demonstrates strong correlation between different measurements of
the well characterized serum proteins insulin (INS), C-reactive protein (CRP) and natriuretic
peptide B (NPPB) in the AGES population using either the custom-designed SOMAscan or
standard immunoassays. Although highly significant, the correlation for INS (r = 0.680, P =
1×10$^{-264}$) is apparently not as marked as for NPPB (r = 0.915, P < 1×10$^{-300}$) and CRP (r =
0.984, P < 1×10$^{-300}$). It is noted, however, that the aim of the present study is not clinical assay
validation and development but discovery of new biomarkers that will enable us, and others,
to expand the toolkit for the development of novel diagnostics and therapeutics. Next, we
applied the SOMAscan to confirm the associations of 20 different protein biomarkers with
relevant phenotypic measures previously found through standard immunoassays (table S5 and
figs. S2B and S3). Figure S2C highlights cross-platform validation of the known associations
of elevated serum levels of NPPB and growth differentiation factor 15 (GDF15) with reduced
survival probability post incident CHD (41, 42). 2) Assessment of cis-pSNPs as an internal
measure of specificity for the SOMAmer-protein interactions (see section “S4.1 Identification
of cis- and trans-acting protein SNPs” below). Here, a cis SNP is proximal to a given protein-encoding
gene and affects the levels of the cognate protein, as detected by a SOMAmer
designed to bind the cognate protein. In other words, the proximal cis variant localizes the
SOMAmer to the intended target protein, thus supporting its target specificity. 3) We note that
many of the results presented in the current work including for instance the functional
annotation of network modules and the expected stronger links of hub proteins to disease,
indirectly support the specificity of the aptamers towards the intended target. Overall, these
data in combination with the mass spectrometry approaches described above indicate
consistent target specificity across the platform. Table S6 lists all direct and inferential
measures of aptamer specificity in the present study. We note, however, that direct validation
of aptamers which still lack information regarding their target binding specificity is an
ongoing process.
3. Construction of the protein co-regulation network
We used a previously described method coupled with the Weighted Gene Co-expression Network
Analysis (WGCNA) R package (43). Biological metabolic networks are scale-free as regards
topology, and any network that does not reflect this property is unreliable (44). Scale-free
networks have a degree distribution that follows a power law. The measure of a network's
scale-free topology is the $r^2$ coefficient, the fitting index of the linear model
regressing $\log(p(k))$ on $\log(k)$, with $k$ being the connectivity and $p(k)$ its distribution; $r^2 = 1$
signifies a perfectly scale-free network. This is never the case with real-world biological
networks, so the criterion requires only nearly scale-free topology ($r^2 \gtrsim 0.8$). The
method starts by collecting in a matrix the Pearson correlation, $s_{ij} = \mathrm{cor}(x_i, x_j)$, between each
pair of proteins. This correlation matrix is then transformed through $a_{ij} = |s_{ij}|^{\beta}$, which gives
the elements of the adjacency matrix, $A$. This power transformation penalizes weak
correlations relative to strong ones, so that less meaningful weak correlations are suppressed
further, which in turn decreases noise and increases network
robustness. This condition alone constrains the value of $\beta$. Due to our large number of
samples and the dynamic range in protein levels, we were able to use $\beta = 5$ even though
the community standard for unsigned networks like ours is $\beta = 6$. Further details regarding
the scale-free property of the serum protein network are presented in the section “3.1
Assessing the robustness of the serum protein network” below.
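The first two steps described above, Pearson correlation followed by the power transformation into an adjacency matrix, can be sketched in plain Python. This is an illustration with hypothetical function names and not the WGCNA implementation itself. Input `profiles` is assumed to be a list of per-protein value lists over the same subjects.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length value lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def adjacency_matrix(profiles, beta=5):
    """Unsigned soft-threshold adjacency a_ij = |cor(x_i, x_j)| ** beta."""
    n = len(profiles)
    adj = [[1.0] * n for _ in range(n)]  # diagonal: |cor(x_i, x_i)| ** beta = 1
    for i in range(n):
        for j in range(i + 1, n):
            a = abs(pearson(profiles[i], profiles[j])) ** beta
            adj[i][j] = adj[j][i] = a
    return adj
```

With β = 5, a weak correlation of 0.2 maps to an adjacency of 0.00032, whereas a strong correlation of 0.9 maps to 0.59049, which shows how the transformation suppresses weak links far more than strong ones.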
In hyperspace, the distance between two proteins is given by $d_{ij} = 1 - a_{ij}$ and is
called the dissimilarity measure. The WGCNA package uses hierarchical clustering
to group closely co-expressed proteins into a tree, and the Dynamic
Tree Cut package (29) cuts branches according to specific morphological characteristics
(branch size, structure, etc.). Each cut branch represents a module, a group of closely related
proteins. The connectivity of a protein is simply the sum of all the adjacencies with all the
other proteins, $k_i = \sum_{j \neq i} a_{ij}$, where intra-module connectivity (kWithin) is the same concept
but only for proteins inside a specific module. The maximum connectivity, $k_{max}$, is simply the
largest connectivity from the list of all
proteins within a specific module. For ease of comparison between modules, connectivity
values were scaled as $K_i = k_i / k_{max}$. Protein significance is the absolute value of the correlation
between a protein and a disease trait $T$, raised to the power $\beta$: $PS_i = |\mathrm{cor}(x_i, T)|^{\beta}$. By plotting the scaled intra-modular
connectivity of all proteins versus the protein significances we can uncover which proteins
were most strongly associated with each trait. The slope of the regression line is the hub
protein significance, defined as $\mathrm{HubProtSignif} = \sum_i PS_i K_i / \sum_i (K_i)^2$. Studies have shown that highly
connected hub proteins are essential for yeast survival and are preserved across species (17,
45-49). Next, we characterized each module's eigenvector, or rather eigenprotein ($E^{(q)}$, i.e. the
1st principal component of a given module $q$), through a singular value decomposition
and transformation of the variable protein levels for any given module. The $E^{(q)}$s represent most
closely the behavior and biological relevance of each of the 27 modules, as these modules can
be viewed as independent sub-networks. Finally, we carried out 1,000 permutations to test
whether the network modules could be derived from random data and performed module
preservation analysis as described in more details in the section “3.1 Assessing the robustness
of the serum protein network” below. Network visualization was performed with the igraph
package in R, a circle graph for smaller modules and spring graph for the larger modules (30).
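The connectivity measures and the hub protein significance defined in this section can be sketched as follows. The function names are hypothetical, the adjacency matrix is assumed to be a symmetric list of lists, and protein significances are assumed to be supplied as a dictionary keyed by protein index. The hub protein significance is computed as the slope of a regression through the origin, matching the ratio of sums given in the text.

```python
def connectivity(adj, members=None):
    """k_i = sum over j != i of a_ij, optionally within one module only."""
    idx = list(range(len(adj))) if members is None else list(members)
    return {i: sum(adj[i][j] for j in idx if j != i) for i in idx}


def scaled_connectivity(k):
    """K_i = k_i / k_max."""
    k_max = max(k.values())
    return {i: v / k_max for i, v in k.items()}


def hub_protein_significance(ps, K):
    """Slope of the through-origin regression of PS_i on K_i."""
    numerator = sum(ps[i] * K[i] for i in K)
    denominator = sum(K[i] ** 2 for i in K)
    return numerator / denominator
```

Passing `members` restricts the sum to proteins of one module, which gives the intra-module connectivity (kWithin) rather than the total connectivity.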
3.1 Assessing the robustness of the serum protein network
In the random network model of Erdős-Rényi, the degree connectivity of nodes follows a
Poisson distribution (13). Random networks, however, neither capture the degree distribution
nor the clustering coefficient of networks based on real data (13). Instead, real networks are
more clustered and consist of few highly connected hubs (13). In other words, biological
networks are not random but follow a scale-free power-law distribution (13, 14). The scale-
free criterion is imposed for the simple reason of cleaning out spurious and/or noise induced
connections. If we build the serum protein network without the scale-free based constraints
using power transformation, we should still get a modular network like before with several
expected differences like fewer and larger clusters since there is no punishment on the weak
connections between proteins. To confirm this, we reconstructed the network by omitting any
power transformation ($\beta = 1$), hence removing the scale-free criterion altogether. This resulted
in a network consisting of 11 large modules as opposed to 27 modules of smaller sizes (fig.
S4A, B). The larger size of modules is due to the fact that there was no penalty imposed
on weak connections between protein nodes. Thus the group of unconnected proteins went
from 716 to 2 proteins. The smaller number of clusters in the un-tuned network is due to the
fact that all proteins are included, both the weakly and strongly connected; thus many bridges
appear in the space between different clusters, and close ones merge. Furthermore, the proteins
within these 11 modules show characteristic differential degree of connectivity as some
proteins were more strongly connected than others within the network. Thus the baseline un-
tuned serum protein network is scale-free as was initially anticipated.
The significance of the serum protein network was assessed by comparing the true
network to potential networks derived from 1,000 permutation tests through randomization of
each of the protein´s data across the AGES subjects. Here, we ran the co-expression algorithm
repeatedly and counted the number of modules created with the same parameters and
restrictions as were used for the true non-randomized protein data, i.e. that no cluster shall
have fewer than 20 proteins and that $\beta$ was 5 for transforming the data. In our 1,000
permutation tests, we did not detect a single protein module based on these criteria beyond a
single group of a handful of proteins that clustered by chance. In contrast, with these
constraints, the true non-randomized protein data presented with 27 clusters/modules which
we in addition have shown to contain a deeper biological meaning through enrichment of
distinct functional categories and links to disease (see main text). Because of this, a z-test,
$z = (x - \mu)/\sigma$, where $x$ is the number of modules from the real network, $\mu$ is the mean number of
modules from the permutation tests and $\sigma$ is the standard deviation of the permutation test
results, is trivial. This is also apparent in the comparison between the degree connectivity
(kTotal) of proteins from the real network and corresponding proteins from the network based
on the permuted data (fig. S4C). More to the point, the mean kTotal for proteins from the real
network was 9.950, while kTotal was 0.000018 for the randomized protein data. In summary,
we have shown that the modularity and degree connectivity of the serum protein network is
highly robust and could not be explained by random chance. Finally, when systematically
changing the tuning parameter $\beta$ between 1 and 10, no network structures appeared based on
the randomized dataset.
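The randomization step and the z-test described above can be sketched as follows. This is a minimal illustration with hypothetical function names: each protein's values are shuffled across subjects independently, which destroys between-protein correlation while preserving each protein's own distribution. Using the population standard deviation and returning `None` when the permutation results have zero variance are assumptions.

```python
import random
from statistics import mean, pstdev


def permute_within_proteins(profiles, seed=0):
    """Shuffle each protein's values across subjects, independently per protein."""
    rng = random.Random(seed)
    permuted = []
    for values in profiles:
        shuffled = list(values)
        rng.shuffle(shuffled)
        permuted.append(shuffled)
    return permuted


def z_score(observed, null_values):
    """z = (x - mu) / sigma against the permutation results."""
    mu = mean(null_values)
    sigma = pstdev(null_values)  # assumption: population standard deviation
    if sigma == 0:
        return None  # undefined when every permutation gives the same count
    return (observed - mu) / sigma
```

In practice the module-detection step would be rerun on each permuted dataset and the resulting module counts passed as `null_values`.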
To test the preservation of the network structure, we split the AGES cohort into a
training set (main network) and in an independent test set (to compare to), either by 2/3 vs.
1/3 split or 1/2 vs. 1/2 split applying a suite of statistics for quantifying the preservation of a
module’s topology as described in Langfelder et al. (15). Langfelder et al., applied the
preservation model (summary Z score statistics) successfully on many independent datasets to
show preservation of networks and pathways within and across species and datasets, as well
as preservation of sex differences across datasets (15).
We applied the summary Z score statistics, which produce a multitude of indicators
showing many facets of a given network:
$$Z_{summary}^{(q)} = \frac{Z_{density}^{(q)} + Z_{connectivity}^{(q)}}{2}$$
Here we raised the following questions:
1. Density Z score: Are the modules denser than the background density created from
randomized data?
2. Connectivity Z score: Is hub protein status preserved between the training and test
datasets?
The results are exhibited in Fig. 1B for the 1/2 vs. 1/2 split and fig. S10 for the 2/3 vs. 1/3
split. Here the summary Z score <2 indicates no preservation, 2< summary Z score <10
indicates moderate evidence of preservation, while a summary Z score >10 indicates strong
evidence of preservation. The summary Z-score thresholds were derived empirically as
previously described (15), applying multiple types of simulation tests and aggregating
multiple preservation statistics into a summary preservation statistic. The Bonferroni adjusted
P-value significance of the summary Z-score in our dataset for the different protein modules
was between 1×10$^{-358}$ and 1×10$^{-22}$, revealing strong preservation of the serum protein
network. More to the point, all the 27 modules of the serum protein network showed summary
Z score >10 indicating strong validation of the network topology including connectivity status
and module density (Figs. 1B and S10A, B).
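The interpretation of the summary Z score can be written as a small lookup. The sketch below encodes only the thresholds quoted above; how values of exactly 2 or 10 are assigned is an assumption, and the module labels and scores are hypothetical.

```python
def preservation_call(z_summary):
    """Map a summary Z score to the evidence categories quoted in the text."""
    if z_summary < 2:
        return "none"
    if z_summary <= 10:  # boundary handling at exactly 2 and 10 is an assumption
        return "moderate"
    return "strong"


# Hypothetical summary Z scores for three modules (illustrative values only).
example = {"module_a": 1.3, "module_b": 6.8, "module_c": 42.0}
calls = {module: preservation_call(z) for module, z in example.items()}
print(calls)
```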
4. Genetic studies and statistical analyses
Genotyping was conducted using the Illumina Hu370CNV Array on 3,200 of the AGES study
subjects and SNPs and gene targets mapped using the GRCh37 build reference sequence.
AGES subjects were imputed with the imputation reference panel 1000G v3 for all ethnicities,
through the use of MACH v 1.0.16 (50). Also, genotypes assayed through the exome-wide
genotyping array Illumina HumanExome Beadchip were available for all the 5,457 AGES
subjects. For detection of network-associated protein SNPs (npSNPs) we applied the conventional
P-value threshold of 5×10⁻⁸ for genome-wide significance and a P-value between 5×10⁻⁶ and
5×10⁻⁸ for suggestive evidence of association. All AGES study cohort members were
European Caucasians. For a cis effect we considered an arbitrary window of 300kb
across and including the protein-coding gene in question, given that the majority of cis-acting signals
detected for mRNA levels are found within that window (8), and assessed the window-wide
significance by correcting for the number of SNPs tested in each window. For detection of cis-
trans pairs we used the Bonferroni-corrected P-value threshold of 1×10⁻⁸ (adjusted for the number
of proteins and cis SNPs tested). A more detailed description of the detection and
characterization of the cis and trans effects in the present study is found below.
For all single-point SNP association analyses we applied linear regression using an
additive genetic model. For the associations of individual proteins and modules (eigenvectors)
to different phenotypic measures we used forward linear or logistic regression or Cox
proportional hazards regression, depending on whether the outcome was continuous, binary or a time
to an event. Given the consistency in sample handling, including the time from blood draw to
processing (between 9 and 11 am), the same personnel handling all specimens, and the ethnic
homogeneity of the population, we adjusted only for age and sex in all our regression and
network-based analyses unless otherwise noted.
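A single-point association under an additive genetic model, adjusted for age and sex, can be sketched as an ordinary least-squares fit of the protein level on the allele dosage (0, 1 or 2 copies). The example below is a minimal sketch on simulated data, not the software used in the study. The helper names, the allele frequency of 0.3, and the simulated effect sizes are all assumptions.

```python
import random


def ols(X, y):
    """Least-squares coefficients from the normal equations via Gauss-Jordan elimination."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(p)]
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def additive_snp_effect(dosage, age, sex, protein):
    """Per-allele effect of the SNP on the protein level, adjusted for age and sex."""
    X = [[1.0, g, a, s] for g, a, s in zip(dosage, age, sex)]
    return ols(X, protein)[1]


rng = random.Random(1)
n = 2000
dosage = [sum(rng.random() < 0.3 for _ in range(2)) for _ in range(n)]  # 0, 1 or 2 copies
age = [rng.uniform(66, 96) for _ in range(n)]
sex = [rng.randint(0, 1) for _ in range(n)]
protein = [0.3 * g + 0.02 * a - 0.1 * s + rng.gauss(0, 0.5)
           for g, a, s in zip(dosage, age, sex)]
beta = additive_snp_effect(dosage, age, sex, protein)
print(f"estimated per-allele beta = {beta:.3f} (simulated true value 0.3)")
```

The coefficient on the dosage column is the per-allele effect. A P-value would additionally require the standard error of that coefficient, which is omitted here for brevity.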
4.1 Identification of cis- and trans-acting protein SNPs
To identify proximal cis-acting effects on serum proteins we classified a pSNP-protein
association as cis-acting if the pSNP was no more than 150kb up- or downstream of the protein-
coding gene, or within the introns and exons of the corresponding gene. We defined the
window-wide significance for the cis effects by adjusting the P-value for the number of SNPs
tested in each gene window (P-value threshold between 6×10⁻⁶ and 5×10⁻⁴). We identified
1,046 significant cis pSNP-protein associations, corresponding to 25.3% of all human proteins screened.
Table S13 lists all significant cis-acting pSNP-proteins detected in the present study, as a
single lead pSNP per region showing the best P-value. For now, independent cis effects per
region were not considered. It is of note that 39.5% of all pSNPs were located within the
introns, exons or untranslated regions of the corresponding protein gene. Figure S11A
highlights some cis-acting effects on protein levels depending on whether the lead pSNP was
missense, in UTR regions, intronic or intergenic.
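The cis definition and the window-wide significance threshold described above can be sketched as two small functions. This is an illustration only: the function names and gene coordinates are hypothetical, and the family-wise alpha of 0.05 is an assumption, as the text gives only the resulting range of thresholds.

```python
def is_cis(snp_chrom, snp_pos, gene_chrom, gene_start, gene_end, flank=150_000):
    """True if the SNP lies within the gene or no more than `flank` bp up- or downstream."""
    return (snp_chrom == gene_chrom
            and gene_start - flank <= snp_pos <= gene_end + flank)


def window_wide_threshold(n_snps_tested, alpha=0.05):
    """Bonferroni threshold for one gene window (alpha = 0.05 is an assumption)."""
    return alpha / n_snps_tested


# Hypothetical gene on chromosome 17 spanning 1,000,000-1,050,000 bp.
print(is_cis("17", 860_000, "17", 1_000_000, 1_050_000))    # inside the upstream flank
print(is_cis("17", 1_250_000, "17", 1_000_000, 1_050_000))  # beyond the downstream flank
print(window_wide_threshold(100), window_wide_threshold(8000))
```

Under the assumed alpha of 0.05, windows holding roughly 100 to 8,000 tested SNPs would give thresholds of about 5×10⁻⁴ to 6×10⁻⁶, in line with the range quoted above.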
To provide some insight into how variation in the human serum proteome is regulated,
we cross-referenced the cis pSNP-proteins detected in serum with previously identified
expression SNPs/QTLs (eSNPs/eQTLs) from >30 different tissues and cell types using
PhenoScanner (20), applying stringent cutoffs of P < 5×10⁻⁸ and SNP proxies at r² ≥ 0.8
for significant matches. We found that 37.3% of the pSNP-proteins matched corresponding
eSNP-transcripts identified in one or more solid tissues (table S14). This suggests that more than 60%
of the genetic effects on serum protein levels are mediated by as yet unknown
transcriptional effects and/or post-transcriptional mechanisms, which is consistent with
previous data from human and yeast studies (51).
Because cis-acting pSNPs are functionally annotated variants, in that they affect population
variation of adjacent proteins, they can be useful as genetic instruments in Mendelian
randomization studies to infer causality between a protein and disease (pSNP → Protein →
Disease). We cross-referenced all cis pSNPs detected in the present study with GWAS lead
SNPs reported in PhenoScanner using P < 5×10⁻⁸ for genome-wide significance and
r² ≥ 0.8 for relevant SNP proxies (20). We found that 232 or 20.7% of all cis-acting serum
pSNPs matched GWAS lead SNPs associated with various disease-related phenotypes
including inflammatory bowel disease (IBD), adiposity, age-related macular degeneration
(AMD), blood pressure, CHD, Crohn's disease, hematological parameters, lipoprotein
fractions, late-onset Alzheimer's disease (LOAD), DNA methylation status, MetS, multiple
sclerosis (MS), prostate cancer, rheumatoid arthritis (RA), systemic lupus erythematosus
(SLE), diabetes and venous thrombosis (table S15).
Theoretical and experimental studies suggest that network hubs are evolutionarily
conserved and robust against disturbances like deleterious mutations or hub removal (7, 17,
18, 21, 52). In other words, removal of a hub protein in a biological network will have a
larger effect on phenotypic outcome than removal of a random protein. In fact, we have
shown that protein hubs are more strongly connected to various disease outcomes than less
well connected proteins within the serum protein network (Figs. 3 and S9). We explored if the
proteins affected by cis pSNPs showed a differential degree of connectivity depending on either
the strength of the β-coefficient or in comparison to other proteins across the serum protein
network. First, we found that the mean connectivity was significantly lower among proteins
with a detected cis effect compared to proteins with no detected cis effect (fig. S11B). Here,
the mean kTotal was 10.8 for proteins with no cis effects vs. 7.7 for proteins with significant
cis effects (down by 28.2%, P = 3×10⁻¹⁶). Second, there was a significant negative
correlation between the standardized β-coefficient (absolute values) of the cis effects and the
network connectivity of the corresponding cis serum proteins (r = -0.231, P = 1×10⁻¹⁵) (fig.
S11C). Thus cis pSNP-protein effects were significantly under-represented among highly
connected protein nodes, which may reflect a relaxed selective constraint on proteins with low
connectivity. These results are in agreement with previous observations showing hub proteins
to be essential and evolutionarily conserved (7, 17, 18, 21, 52), and to have a greater effect on
disease outcome (17). The phenomenon described above is not restricted to
humans, but has been noted in other kingdoms as well, including plants (21).
We tested the association of each cis-acting pSNP to all proteins screened in the
present study (table S17). Here, 16.0% of the cis pSNPs affected one or more proteins in
trans at a Bonferroni-adjusted P-value < 1×10⁻⁸, or 911 proteins in trans (table S17). Thus,
together with the proteins regulated in cis, the cis-acting pSNPs affected levels of 1,954 serum
proteins. Of interest, 40.7% of the cis pSNPs that were trans-acting were also associated with
GWAS lead SNPs, which is an increase of 20 percentage points compared with 20.7% for all cis-acting
pSNPs (see above). This indicates that the trans effects on protein levels could be a critical
part of the mechanism(s) underlying the genetic risk at GWAS loci.
In the past 10 years, GWASs have discovered thousands of disease-associated genetic
loci, providing insights into the genetic architecture of complex disease (19). Here, many
common SNPs, each contributing only a small amount to the total risk, act
synergistically to influence susceptibility to a complex disease. The majority of GWAS lead
SNPs are located outside the coding regions of genes, suggesting a key role for gene
regulation in disease aetiology. In fact, a strong enrichment of cis-acting eSNPs/eQTLs
among the GWAS signals has been observed (25, 53). A recent analytical study demonstrated
that GWAS SNPs that contribute most to the heritability of a given disease are not necessarily
located near genes with disease-specific effects or found in core pathways (25). In other
words, the numerous small peripheral GWAS effects converge onto a common biological
network that integrates other signals (e.g. environmental) as well, influencing activity/levels of
core protein hub(s), which in turn can cause a disease (25). The accumulated data suggest that
cis-acting pSNPs affect proteins that are located at the periphery of the network and, similar to
GWAS signals, may individually or synergistically affect activity/levels of neighboring
proteins, including protein hubs, to affect disease.
4.2 Validation of cis and trans pSNP-protein findings across different study populations and
proteomic platforms
In this section we tested the replication of previously reported cis and trans pSNP-protein
findings identified in different study populations and across different or related proteomic
profiling platforms. Given the differences in the genotyping and proteomic platforms, and the
definition of cis and trans effects between the different studies, we have used a moderate
proxy threshold of r² ≥ 0.5 between pSNPs for any comparison of pSNP-protein pairs
between studies. Generally, however, we interrogated the associations of the reported pSNP-
protein pairs directly in our dataset, at least for the large studies.
The percentage confirmation in the AGES of previous findings was only computed for
those proteins that are detected with the present multiplex aptamer-based platform. Proteins
encoded by genes on the X chromosome were excluded from the analysis as they were not
tested for cis-linked association. Given that our definition of cis effects within a 300kb window
was not necessarily applied in the other studies, we have followed the study-specific
definition. In some cases, therefore, the study-specific SNPs were not the strongest
cis or trans effects identified in the present study. For these we considered P < 1×10⁻⁴ to be a
significant replication provided the effect was directionally consistent across studies.
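The replication rule stated above (P below 1×10⁻⁴ and a directionally consistent effect) can be sketched as follows. The function names and the example pairs are hypothetical, and the sketch assumes that effect sizes from both studies are reported for the same effect allele.

```python
def is_replicated(beta_discovery, beta_here, p_here, alpha=1e-4):
    """Replication requires P below alpha and the same sign of effect in both studies."""
    same_direction = beta_discovery * beta_here > 0
    return same_direction and p_here < alpha


def replication_rate(pairs):
    """Percentage of (beta_discovery, beta_here, p_here) tuples that replicate."""
    hits = sum(is_replicated(*pair) for pair in pairs)
    return 100.0 * hits / len(pairs)


# Hypothetical pSNP-protein pairs: (discovery beta, beta in this cohort, P in this cohort).
pairs = [(0.40, 0.35, 2e-20), (-0.20, -0.10, 3e-6), (0.30, -0.25, 1e-9), (0.15, 0.05, 0.02)]
print(f"{replication_rate(pairs):.1f}% replicated")
```

In this toy set the third pair fails on direction despite a small P-value and the fourth fails on significance, so two of the four pairs count as replicated.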
Johansson et al. (54) used mass spectrometry (MS) to quantify 163 proteins in
1,060 subjects and identified cis-acting effects for five proteins. These effects were all
replicated in our dataset (table S18). Kim et al. (55) screened 132 proteins in plasma of
521 subjects from the ADNI cohort using a multiplex immunoassay-based platform,
identifying 28 cis pSNP-proteins. We confirmed 73.9% of these cis effects in the AGES
(table S18). Further, Enroth et al. (56) applied a multiplex immunoassay-based platform
that quantified 92 inflammation-related plasma proteins in 1,005 individuals,
identifying cis-acting effects for 23 proteins, of which 63.5% were replicated in the AGES
(table S18). Liu et al. (57) applied a SWATH mass spectrometry technique to measure 342
unique plasma proteins in 232 samples and identified cis-acting pSNPs affecting 13 proteins.
Of the 13 proteins, eight were measured with our aptamer-based platform, of
which seven cis effects (87.5%) were replicated in the AGES (table S18). Thus, on average,
we confirmed 74.6% of all pSNP-protein associations detected with non-aptamer-based
technology.
Next, we tested replication of pSNP-protein findings in studies applying the aptamer-
based platform (58, 59). We note that these studies often report multiple pSNPs per locus,
thus we explored all cis and trans pSNPs detected in their studies for association to
corresponding serum protein(s) in the AGES cohort. For the cis and trans effects reported in
Suhre et al. (59), we confirmed 88.3% of all cis and 84.5% of all trans effects in the AGES
dataset (table S18). For instance, Suhre et al. reported 14 trans effects mediated by six
independent pSNPs at the ABO locus (59). We tested two of these trans-acting pSNPs, rs651007
and rs8176749, proximal to the ABO locus and confirmed all of these trans effects except for
NOTCH1. For the cis and trans effects reported in Sun et al. (58), 75.7% of the cis effects and
72.8% of the trans effects were confirmed in the AGES (table S18). For instance, Sun et al.
detected 115 proteins regulated in trans by the rs704 missense variant (NP_000629.3:
p.Thr400Met) in VTN, while we detected 488 trans-regulated proteins at their Bonferroni-
adjusted P-value < 1.5×10⁻¹¹. The overlap between the rs704-mediated trans effects of the two
studies was 81.7%. In another example, Sun et al. detected 36 proteins that were affected by
20 independent pSNPs acting in trans at the ABO locus (58). We find that 13 of these pSNPs
affected 88 proteins in trans in the AGES dataset at P < 1.5×10⁻¹¹, with an 81% overlap of the
trans-regulated proteins at the ABO locus between the two studies.
Of the aptamer-based studies mentioned above (table S18), Sun et al. (58) comes
closest to the present study with regard to sample size and number of proteins measured.
However, they used a smaller version of the aptamer-based platform (28.3% fewer proteins)
and had 40% fewer study participants, who were predominantly of young age. Below we present
the reproducibility of selected examples of cis and trans effects described in Sun et al. in the
AGES dataset. Sun et al. reported a pSNP mediating a cis effect on WFIKKN2 as well as
mediating a trans effect on the myostatin protein GDF11/8 (58). We note that the cis effect
for WFIKKN2 was also reported in Suhre et al. (59). Using a window size of 300kb across
WFIKKN2, we detected a strong cis-acting effect for WFIKKN2 (P = 2×10⁻⁹³) that also
mediated a trans effect on GDF11/8 serum levels (P = 2×10⁻⁹) (fig. S14A). Here, the lead
SNP, the synonymous variant rs9675120 (NP_783165:p.Ser135=) in WFIKKN2, was highly
correlated (r² = 0.928) with pSNP rs11079936 (58). The common allele T for rs9675120 was
associated with lower levels of both WFIKKN2 and GDF11/8 (fig. S14A). Furthermore, we
find that the proteins WFIKKN2 and GDF11/8 were positively correlated in the AGES data
(fig. S14B). The direction of all effects is consistent with that reported in Sun et al. (58).
GDF11/8 has been implicated in muscular dystrophy (60), and experimental studies have
shown that WFIKKN2 has strong affinity for GDF11/8 (61). Interestingly, we found that both
WFIKKN2 and GDF11/8 map to the same protein module PM27 (table S7), a module
enriched for proteins involved in extracellular matrix organization and vascular disease. This
module is also enriched in fibrosis-related signatures (62), where 8 out of the 16 well-established
fibrosis-related proteins are found in PM27 (Fisher exact test P-value = 6×10⁻⁷).
The second example from Sun et al. (58), is the GWAS locus for inflammatory bowel
disease (IBD) at the missense variant rs3197999 (NP_066278.3: p.Arg703Cys) in MST1.
This locus also affected five other proteins in trans including PRDM1 (aka BLIMP1) at
chromosome 6 (58). We find a strong cis acting effect on MST1 and significant trans acting
effects on 11 proteins including three of the five reported in Sun et al. (fig. S14C,D). In the
third and final example of replicated findings from the work of Sun et al., we focused on the
pQTL hotspot at the vasculitis-associated missense variant rs28929474 (NP_001121179:
p.Glu366Lys) in SERPINA1 that was associated with 13 proteins (58). We find that
rs28929474 was associated with 17 proteins in trans, of which 8 were reported in Sun et al.
(58) and were directionally consistent across both studies (fig. S14E, F). Also, we
find that rs28929474 mediated a weak cis effect on SERPINA1 (T allele, β = -0.471, P = 8×10⁻⁶),
directionally consistent with that of Sun et al. (58).
In summary, this extensive validation and comparative study not only reveals the
robustness of our multiplex aptamer-based platform to confirm findings across independent
study populations and proteomics platforms, but highlights the added information the present
study can provide in terms of identifying links to new proteins and the relationship between
proteins in the context of the serum protein network. Although the study cohorts were
different in terms of subject recruitment, age range, health status and ethnic homogeneity, and
in the genotyping and proteomic platforms applied, on average 80% of all reported cis effects
and 74% of all trans effects were confirmed in our dataset. It is possible that study-specific cis
and trans effects exist that appear in a single study only. Finally, a lack of replication of cis
and trans effects may indicate false positive findings in the discovery study.
5. Assessment of tissue specificity of cis and trans proteins and protein modules
Transcript expression data for 53 different human tissues, as median RPKM by tissue, were
downloaded from GTEx (https://www.gtexportal.org) on 07/25/2017. The GTEx project
provides RNA-Seq based transcriptome data in over 40 tissues from hundreds of human
donors and since multiple tissues are collected from the same individuals, cross-tissue
analysis is feasible (63). The specificity score for a gene in a tissue was calculated by
subtracting from its RPKM value the mean value in all other tissues for that gene and dividing
by the standard deviation of those values. The top 0.5 to 2.5% (Z >9.24 to >2.75) were
declared as tissue specific and mapped to modules after removal of duplicate matches.
Similarly, we selected the subset of cis-trans protein pairs for which mRNA levels of both proteins
scored in the top 2.5% for tissue specificity (Z >2.75). Here, 158 cis-trans pairs showed the
same tissue-specific expression while 2,119 pairs exhibited different tissue-specific expression
(table S21).
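The tissue specificity score defined above can be sketched for a single gene as follows. The RPKM values are hypothetical, the function name is illustrative, and the use of the sample standard deviation (rather than the population form) is an assumption, since the text does not specify which was used.

```python
from statistics import mean, stdev


def tissue_specificity(rpkm_by_tissue):
    """Z-like score per tissue: (RPKM - mean of other tissues) / SD of other tissues."""
    scores = {}
    for tissue, value in rpkm_by_tissue.items():
        others = [v for t, v in rpkm_by_tissue.items() if t != tissue]
        sd = stdev(others)  # sample SD; an assumption
        scores[tissue] = (value - mean(others)) / sd if sd > 0 else float("nan")
    return scores


# Hypothetical median RPKM values for one gene across five tissues.
gene = {"liver": 500.0, "muscle": 2.0, "adipose": 3.0, "brain": 1.0, "lung": 4.0}
scores = tissue_specificity(gene)
specific = [t for t, z in scores.items() if z > 2.75]  # top 2.5% cutoff quoted in the text
print(specific)
```

Excluding the tissue in question from the mean and standard deviation keeps one dominant tissue from inflating its own baseline, which is why a liver-enriched gene scores far above the cutoff in liver and below it in every other tissue.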
The npSNP discovery also allowed us to assess if the serum networks resulted from
cross-tissue regulatory control. For example, the rs704 control of VTN protein levels occurred
primarily in liver (tissue specific Z >123), and this npSNP regulated proteins across several
modules including other tissue-specific proteins. For example, tissue-specific proteins from
five and 19 distinct tissues were regulated by VTN in the PM7 and PM10 modules,
respectively (18 non-liver tissues, table S22). These results provide evidence that in a number
of cases, npSNPs affected serum levels of a tissue-specific protein, which in turn
affected serum levels of other proteins synthesized in distinct tissues.
Finally, we interrogated how well the protein modules agree with the gene mRNA co-
expression modules constructed in solid tissues and evaluated whether a similar network organization
is shared at both the protein and mRNA levels. In addition, this may help
indicate the potential tissues of origin of the serum protein network modules. The assessment
of overlaps between serum protein modules and 2,672 gene mRNA co-expression modules
constructed from whole-genome transcript information from multiple solid human tissues
(16) was based on how well two modules shared a similar set of genes encoding either mRNA
or proteins. Here, we counted the number of genes/proteins that were common between a
protein module and a gene mRNA co-expression module in a given tissue and calculated the
overlap ratio of the match (fig. S5). Next, we assessed the significance of this overlap against
random expectation using Fisher's exact test. Heatmaps of overlap ratio values were used to
show that most module pairs have a very low gene member overlap (<8%) (fig. S5). A
heatmap based on the statistical test P-values showed three protein modules with weak but
significant overlaps with the tissue (mainly liver, muscle and adipose tissue) mRNA co-
expression modules (fig. S5). The accumulated data suggest that the serum protein network
arose at least in part via systemic cross-tissue regulation.
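The module overlap analysis can be sketched as follows, taking the overlap ratio to be the Jaccard index described in the caption of fig. S5 and the significance test to be a one-sided Fisher's exact test computed as a hypergeometric tail. The gene sets, the background size and the number of tests are hypothetical, and the choice of a one-sided test is an assumption.

```python
from math import comb


def jaccard(a, b):
    """Shared members divided by the number of unique members across both modules."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_p_value(a, b, universe_size):
    """One-sided Fisher's exact test: P(overlap >= observed) under the hypergeometric model."""
    a, b = set(a), set(b)
    k, n, K, N = len(a & b), len(a), len(b), universe_size
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


def bonferroni(p, n_tests):
    """Bonferroni-adjusted P-value, capped at 1."""
    return min(1.0, p * n_tests)


# Hypothetical protein module and gene co-expression module in a background of 10 genes.
protein_module = {"g1", "g2", "g3", "g4"}
gene_module = {"g3", "g4", "g5", "g6"}
p = overlap_p_value(protein_module, gene_module, universe_size=10)
print(jaccard(protein_module, gene_module), p, bonferroni(p, n_tests=3))
```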
Fig. S1. A general workflow of the present study. The figure demonstrates the datasets used
in the present study and the analyses of the datasets including the construction of the serum
protein network, identification of its individual protein modules and their association to
genetic variants and disease related outcomes.
Fig. S2. Cross-platform validation of protein measurements. (A) A comparison between
the SOMAmer-based technology and immunoassays measuring serum levels of C-reactive
protein (CRP), r=0.984, P<1×10⁻³⁰⁰, insulin (INS), r=0.680, P=1×10⁻²⁶⁴, and natriuretic
peptide B (NPPB), r=0.915, P<1×10⁻³⁰⁰. (B) Cross-platform validation of the correlation of
five known plasma protein biomarkers to the phenotypic measures previously observed using
immunoassays including prevalent heart failure (prev HF), metabolic syndrome (MetS), type
2 diabetes (T2D), lean (BMI<25), overweight (25≤BMI<30) or obese (BMI≥30) (see table
S5). (C) The custom-designed SOMAscan was used to confirm the association of elevated
serum levels of NPPB (red curve) and growth differentiation factor 15 (GDF15) (red curve) to
lower probability of survival post incident coronary heart disease (CHD) (highest vs. lowest
quartiles of the respective protein levels). General: controls are subjects free of the disease in
question. Data were analyzed using forward linear or logistic regression or Cox proportional
hazards regression, depending on the outcome being continuous, binary or a time to an event.
Kaplan-Meier plots were used to display survival probabilities.
Fig. S3. A correlation matrix for selected candidate proteins. The correlation matrix
demonstrates the relationship between the candidate proteins from table S5, and includes as
well the highly connected hub proteins highlighted in Figs. 3, S9 and S10.
Fig. S4. Clustering and robustness of the serum protein network. (A) Hierarchical
clustering by applying dynamic tree cut and a power transformation of 5 (β = 5) resulting in 27
protein modules each containing a minimum of 20 proteins (table S7). (B) A dynamic tree cut
using a power transformation of 1 (β = 1) and a minimum of 20 proteins per module, resulting
in 11 relatively large modules compared to using power transformation β = 5, thus
maintaining the scale-free property of the network. (C) Comparison between connectivity
(kTotal) of proteins from the real network (blue curve) and corresponding proteins from a
network based on random protein data (cyan curve). Proteins (x-axis) were ordered by
annotation and increasing kTotal (y-axis). The mean kTotal for proteins from the real network
was 9.950, while the mean kTotal was 0.000018 for the randomized protein data.
Fig. S5. Heat plots of the overlap between modules of the serum protein network and
gene mRNA co-expression modules generated from solid tissues. Limited overlap was
found between protein modules within the serum protein networks and 2,672 gene co-
expression modules constructed from multiple solid tissues (21). Top panel is the overlap
heatmap, representing the overlap ratio between each pair of protein module (rows) vs gene
co-expression module (columns). Proteins in each protein module were assessed for overlaps
with genes in each gene co-expression module by Jaccard Index, defined as the number of
shared genes between the two modules divided by the sum of unique genes in both modules.
Jaccard index values are plotted on a color scale at the intersections of each protein module-
gene module pair. The best Jaccard index is only 8%, a very low overlap ratio. The bottom
panel is the overlap heatmap based on statistical significance of the module overlap analysis
as evaluated by Fisher's Exact Test with Bonferroni correction. -log10 (adjusted P-values)
were used in this heatmap. Similarly, protein modules are in rows and gene modules are in
columns, and the intersection between a row and a column is colored based on the
significance of the -log10 (adjusted P-values). Only three protein modules demonstrated
significant overlap with gene co-expression modules at the cutoff of Jaccard Index > 5% and
Bonferroni-corrected P-value < 0.05 (shown in the heatmap to the right). Among these, PM23
overlaps with only liver gene co-expression modules, PM27 overlaps with liver and muscle
gene-coexpression modules, whereas PM24 overlaps with adipose, hypothalamus and liver
gene co-expression modules. Hierarchical clustering was applied to the rows and columns of
both heatmaps, and dendrograms were plotted accordingly.
Fig. S6. A dendrogram showing the inter-module clustering of the different protein
modules via correlation of their eigenproteins (E(q)s). PM1 does not link to any other
modules, while the other modules form four major super-clusters reflecting the functionality
shared between modules (tables S8 and S11). The numbers at the branches of the dendrogram
refer to the number of proteins found in a given protein module. Functional categories and
tissue/cell specific signatures enriched in the different super-clusters were obtained using
annotation tools like WebGestalt, DAVID, GeneMANIA and CTen, also reported in table
S11. Modules are ordered and annotated according to their inter-module relationship here as
well as throughout the present study.
Fig. S7. The relationship between module E(q)s and disease-related measures and
outcomes. (A) The modules PM7 and PM10 are members of super-cluster II. (B) Inverse
association of the modules E(PM7) and E(PM10) to prevalent heart failure (prev HF), ***P≤1×10⁻¹⁶.
(C) Reduced overall survival probability for low E(PM7) levels (cyan curve) compared to
high E(PM7) levels (red curve). (D) PM16, a 170 protein module, is a member of super-cluster
IV. (E) Positive association of E(PM16) quintiles to variation (cm²) in visceral adipose tissue
(VAT), P = 3×10⁻¹⁶, to the metabolic syndrome (MetS) and prevalent coronary heart disease
(prev CHD) and HF, ***P <1×10⁻¹¹. (F) Reduced overall survival probability for high E(PM16)
levels (red curve) compared to low E(PM16) levels (cyan curve). (G) PM26, a 390 protein
module, is a member of super-cluster V. (H) Positive association of E(PM26) to prevalent CHD
and HF as well as incident CHD (inc CHD) and HF (inc HF), ***P≤1×10⁻⁸. (I) Reduced post
CHD and overall survival probability for high E(PM26) levels (red curve) compared to low
E(PM26) levels (cyan curve). Controls are subjects free of the disease in question. Data were
analyzed using forward linear or logistic regression or Cox proportional hazards regression,
depending on the outcome being continuous, binary or a time to an event. Kaplan-Meier plots
were used to display survival probabilities. For more details see fig. S6 and tables S7 and S12.
The number of proteins per module is denoted at the branches of the dendrogram.
Fig. S8. A volcano plot of the association of global serum proteins to prevalent CHD
diagnosed at different times before sampling. The plot shows the significance,
−log(Bonferroni-adjusted P-value), as a function of effect size (log odds ratio), either when all
prevalent CHD cases (N=1,217) were included in the analysis (blue circles) or when only
CHD cases diagnosed within five years before entry into the AGES (N=700)
were included (orange circles). Two different aptamers were used to detect and measure
PCSK9. In terms of effect sizes, the levels of proteins associated with prevalent disease
such as CHD were not affected by restricting the analysis by the time from diagnosis to
sampling (see Materials and Methods).
Fig. S9. The relationship between network connectivity of proteins and disease related
measures and outcomes. (A) Spring graph of PM10 highlighting the hub protein DYRK3
located in the hub region of the module. (B) Positive correlation between within module
connectivity (Ki) (x-axis) of PM10 proteins and the absolute value of the effect (β-coefficient)
size of their association to prevalent heart failure (HF) (y-axis), Pearson's r=0.782, P=1×10⁻⁷².
(C) Positive association of DYRK3 to prevalent HF, P<1×10⁻³⁰, and reduced overall survival
(all-cause mortality post entry into the AGES study cohort) associated with low serum
DYRK3 levels (cyan curve). (D) Spring graph of PM16 showing the location of the hub
HNRNPA1 within the hub region. (E) Positive correlation between Ki (x-axis) and the
association to incident coronary heart disease (inc CHD), r=0.712, P=1×10⁻²². (F) Positive
association of HNRNPA1 to incident CHD, P=1×10⁻¹⁰, and high serum levels of HNRNPA1
(red curve) predict reduced overall survival. (G) A spring graph of PM26 highlighting the
module's hub FSTL3. (H) Positive correlation between Ki (x-axis) and the association to
prevalent HF, r=0.431, P=1×10⁻¹⁶. (I) Positive association of FSTL3 to prevalent HF,
P<1×10⁻³⁰, and reduced overall survival associated with high serum FSTL3 levels (red curve).
Network visualization was performed with the igraph package in R (30). Controls are subjects
free of the disease in question. Data were analyzed using forward linear or logistic regression
or Cox proportional hazards regression, depending on the outcome being continuous, binary
or a time to an event. Kaplan-Meier plots were used to display survival probabilities.
Fig. S10. Preservation analysis of the serum protein network structure. (A) The cohort
was randomly split into two parts, 2/3 for a training set and a 1/3 for the test set, and the
summary Z score statistics plotted for each of the 27 modules presented as colored data
points. Here the summary Z score <2 (blue dotted line) indicates no preservation, 2< summary
Z score <10 (between the blue and green dotted lines) indicates moderate evidence of
preservation, while a summary Z score >10 (green dotted line) indicates strong evidence of
preservation. All the modules showed strong preservation or Z score >10. (B) Preservation of
the connectivity status for the top 10 hubs within each module (kWithin). The modules and
protein hubs highlighted are also presented in Fig. 3 and figs. S3 and S9.
Fig. S11. Highlighted examples of cis acting SNPs depending on genomic location and in
relation to network connectivity. (A) Cis-acting pSNPs may be located in intergenic
regions (rs7547965), or within genes including missense (rs1250259, NP_997647.1:
p.Gln15Leu), 5´-UTR (rs16923189), 3´-UTR (rs15881) or intronic (rs76426991). (B) Mean
total connectivity ±2CI (2× 95% Confidence Interval) for all significant cis effects (yes)
compared to proteins with no detectable cis effect (no), Student's t-test P = 3×10⁻¹⁶. (C)
Pearson correlation between the absolute value of the β-coefficient of all cis effects (x-axis)
vs. total connectivity of corresponding cis-regulated proteins (y-axis), r = -0.231, P=1×10⁻¹⁵.
Fig. S12. Examples of GWAS risk loci affecting serum protein levels. (A) A box plot of
five cis regulated proteins by known GWAS loci listed in table S16. (B) Trans acting effects
at the rs1050362 GWAS locus and a corresponding boxplot of two proteins affected. (C)
Trans acting effects at the rs964184 GWAS locus and a boxplot of two proteins affected. (D)
The strong cis and trans acting effects at the CHD-associated locus rs579459 affecting 43
proteins in trans. The rs579459 mediates a strong proximal cis acting effect on serum ABO
levels as highlighted in the boxplot. Also shown are boxplots for two proteins regulated in
trans by rs579459. (E) The Venn diagram demonstrates a significant enrichment of the
rs579459 trans-affected proteins within the PM27 module (Fisher Exact Test P = 2×10⁻¹⁰).
Here, 18 out of 25 proteins regulated by rs579459 map to PM27. Chromosomal ideograms
were reprinted from the NCBI chromosome Map Viewer. The genotypes and pSNPs are at the
x-axis of each box plot while the normalized levels of serum proteins are denoted at the y-
axis.
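The enrichment in panel (E) is a one-sided Fisher exact test on a 2×2 table, which is equivalent to a hypergeometric tail probability. A minimal Python sketch follows, assuming a background of all 4,137 measured proteins (table S1) and a PM27 size of 378 (table S8). The function name and the choice of background are illustrative assumptions, so the result need not reproduce the reported P-value.

```python
from math import comb


def enrichment_p(overlap, n_hits, module_size, background):
    """One-sided Fisher exact (hypergeometric upper-tail) P-value.

    Probability of seeing at least `overlap` module members among
    `n_hits` proteins drawn without replacement from `background`.
    """
    total = comb(background, n_hits)
    upper = min(n_hits, module_size)
    tail = sum(
        comb(module_size, i) * comb(background - module_size, n_hits - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Counts from the caption: 18 of 25 affected proteins fall in PM27.
    # Module size and background are assumptions taken from tables S8 and S1.
    p = enrichment_p(overlap=18, n_hits=25, module_size=378, background=4137)
    print(f"enrichment P = {p:.3g}")
```

Exact integer arithmetic via `math.comb` avoids any dependency beyond the standard library; a statistics package would normally be used for large screens.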
Fig. S13. Selected examples of known GWAS risk loci for CHD, T2D and/or adiposity.
(A) A box plot of the trans regulated protein PROC at the CHD locus rs867186. (B) Trans
acting effects at the rs1892094 GWAS CHD locus and a boxplot of a protein affected by the
pSNP. (C) A trans acting effect at the rs1165669, another CHD GWAS locus, and a boxplot
of a protein affected by the locus. (D) The well documented T2D locus rs7756992 at
CDKAL1 affects the protein MLN in trans as highlighted in the boxplot. (E) The T2D GWAS
locus rs3132524 exerts trans effects on five proteins including proteins in the corresponding
box plots. (F) The distribution of the ABO protein serum levels in the AGES study population
as per genotypes for the CHD lead SNP rs579459. (G) A strong cis acting effect on ABO
serum levels using a 300kb window across the ABO locus, also representing many well
established GWAS risk lead SNPs for various disease related outcome data (right panel). (H)
The distribution of the VTN protein serum levels in the AGES study population as per
genotypes for the npSNP rs704. (I) The E(PM11) representing module PM11 is strongly
associated with LDL cholesterol and triglycerides (TG) but not HDL cholesterol, using
forward linear regression analysis. Chromosomal ideograms were reprinted from the NCBI
chromosome Map Viewer. The genotypes and pSNPs are at the x-axis of each box plot while
the normalized levels of proteins are denoted at the y-axis.
Fig. S14. Examples of replicated cis and trans effects reported by others. (A) Applying a
genomic window of 300kb across WFIKKN2, we detected a strong cis acting effect for
WFIKKN2. The lead pSNP rs9675120 is also associated with GDF11/8 levels acting in trans.
The T allele represents the major allele in the AGES cohort. The rs9675120 variant is highly correlated
(r² = 0.928) with rs11079936 reported in Sun et al. (58). (B) There was a significant
positive correlation between the protein levels of WFIKKN2 and GDF11/8 in the AGES
cohort, Pearson's r = 0.498, P = 1×10⁻²⁴¹. (C) The missense variant rs3197999 (NP_066278.3:
p.Arg703Cys) in MST1 mediated trans effects on 11 proteins in the AGES dataset. (D) Also,
the boxplot shows a strong cis effect on the proximal protein MST1 (P < 1×10⁻³⁰⁰). Two trans
effects are highlighted as well. (E) The pSNP hotspot at rs28929474 (NP_001121179:
p.Glu366Lys) in SERPINA1 affects 17 proteins in trans at P < 1×10⁻⁵. The regression values
in the table are per copy of the T allele (also called the Z allele). Subjects homozygous for the
Z allele are not found in the AGES cohort. Many of these effects were also reported in Sun et
al. (58). (F) Boxplots of three proteins affected by the pQTL hotspot rs28929474. The
genotypes and pSNPs are at the x-axis of each box plot while the normalized levels of serum
proteins are denoted at the y-axis.
Table S1. Annotation of the human proteins targeted in the present study
Annotation of the 4,137 human protein targets detected with the custom-designed SOMAscan
platform.
(Excel table hosted online)
Table S2. Descriptive statistics of the present study cohort for relevant measures
Baseline characteristics of the AGES Reykjavik study cohort: Numbers are mean(SD) for
continuous-, N(%) for categorical- and median[IQR] for skewed variables. Abbreviations:
SBP, systolic blood pressure; DBP, diastolic blood pressure; TOT-C, total cholesterol; LDL-
C, LDL cholesterol; TG, triglyceride; FG, fasting blood glucose; VAT, visceral adipose
tissue; SAT, subcutaneous adipose tissue; T2D, type 2 diabetes; MetS, metabolic syndrome;
CHD, coronary heart disease; HF, heart failure; N/A, not applicable.
*For sex differences, obtained from a two-sided t-test for continuous, χ² test for categorical and
quantile regression for skewed variables.
Characteristic | Variable | Males | Females | P-value* | Total
Demographics | Numbers | 2330 (42.7%) | 3127 (57.3%) | N/A | 5457
 | Age (years) | 76.7 (5.4) | 76.5 (5.7) | 0.280 | 76.6 (5.6)
Anthropometry | BMI (kg/m²) | 26.9 (3.8) | 27.2 (4.8) | 0.004 | 27.1 (4.4)
 | Obese (BMI>30) | 439 (18.9%) | 777 (24.9%) | <0.001 | 1216 (22.3%)
Physiological | SBP (mmHg) | 143.2 (20.4) | 142.2 (20.9) | 0.075 | 142.6 (20.7)
 | DBP (mmHg) | 76.2 (9.6) | 72.2 (9.5) | <0.001 | 73.9 (9.7)
 | TOT-C (mmol/L) | 5.2 (1.1) | 6.0 (1.1) | <0.001 | 5.6 (1.2)
 | LDL-C (mmol/L) | 3.2 (1.0) | 3.7 (1.0) | <0.001 | 3.5 (1.0)
 | TG (mmol/L) | 1.0 [0.8,1.4] | 1.1 [0.8,1.5] | <0.001 | 1.0 [0.8,1.4]
 | FG (mmol/L) | 5.9 (1.2) | 5.7 (1.1) | <0.001 | 5.8 (1.2)
 | VAT (cm²) | 203.0 (86.2) | 150.3 (67.2) | <0.001 | 172.8 (80.2)
 | SAT (cm²) | 203.4 (86.8) | 294.9 (112.3) | <0.001 | 255.7 (111.7)
Medication | Antihypertension | 1460 (62.7%) | 2016 (64.5%) | 0.169 | 3476 (63.7%)
 | Lipid lowering | 656 (28.2%) | 575 (18.4%) | <0.001 | 1231 (22.6%)
Lifestyle | Smoker | 265 (11.7%) | 390 (12.8%) | 0.199 | 655 (12.3%)
Metabolic | T2D | 363 (15.6%) | 291 (9.3%) | <0.001 | 654 (12.0%)
 | MetS | 486 (20.9%) | 641 (20.5%) | 0.746 | 1127 (20.7%)
Heart disease | CHD prevalent | 777 (33.6%) | 440 (14.2%) | <0.001 | 1217 (22.5%)
 | CHD incl recurrent | 938 (40.6%) | 681 (22.0%) | <0.001 | 1619 (30.0%)
 | CHD incident | 421 (27.4%) | 451 (17.0%) | <0.001 | 872 (20.8%)
 | HF prevalent | 101 (4.4%) | 71 (2.3%) | <0.001 | 172 (3.2%)
 | HF incl recurrent | 287 (12.4%) | 242 (7.8%) | <0.001 | 529 (9.8%)
 | HF incident | 233 (10.5%) | 207 (6.9%) | <0.001 | 440 (8.4%)
 | Followup yrs CHD | 7.4 [3.2,10.1] | 9.7 [5.8,10.8] | <0.001 | 9.2 [4.4,10.6]
 | Followup yrs death | 10.5 [6.2,12.3] | 11.6 [8.1,12.8] | <0.001 | 11.3 [7.2,12.6]
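As an illustration of the categorical comparison described in the table footnote, the sketch below recomputes a Pearson χ² test of T2D prevalence by sex from the counts in table S2 (363 of 2330 males, 291 of 3127 females). It is a sketch only: the function name is hypothetical, no continuity correction is applied, and whether the authors applied one is not stated.

```python
from math import erfc, sqrt


def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (1 d.f., no continuity correction)
    for the 2x2 table [[a, b], [c, d]]. Returns (statistic, P-value)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of chi-square with 1 d.f.: P(X > x) = erfc(sqrt(x / 2))
    return stat, erfc(sqrt(stat / 2))


if __name__ == "__main__":
    males_t2d, males_total = 363, 2330
    females_t2d, females_total = 291, 3127
    stat, p = chi2_2x2(
        males_t2d, males_total - males_t2d,
        females_t2d, females_total - females_t2d,
    )
    print(f"chi2 = {stat:.1f}, P = {p:.2g}")
```

The resulting P-value falls well below 0.001, in line with the "<0.001" entry reported for T2D.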
Table S3. Direct assessment of aptamer specificity via DDA mass spectrometry
List of proteins with confirmation by data dependent analysis (DDA) mass spectrometry after
SOMAmer enrichment in biological matrices. Column Biological Matrix: cell line name if
detected in lysate or conditioned media (cm), otherwise noted as blood serum, blood plasma,
or urine biofluid. Column File Name: refers to the raw data file name uploaded to PRIDE
ProteomeXchange with five dataset identifiers, PXD008819-PXD008823.
(Excel table hosted online)
Table S4. Direct assessment of aptamer specificity via MRM mass spectrometry
List of proteins with confirmation by multiple reaction monitoring (MRM) mass spectrometry
after SOMAmer enrichment in biological matrices. Cell line name if detected in lysate or
conditioned media (cm), otherwise noted as blood serum, blood plasma, or urine biofluid. The
MRM dataset has been deposited to Peptide Atlas PASSEL repository with the dataset
identifier PASS01145.
(Excel table hosted online)
Table S5. Cross-platform validation of known links of proteins to disease related traits
Confirmation, via application of the custom designed SOMAscan platform, in the AGES
cohort, of known associations of protein biomarkers to relevant disease related outcomes
detected with conventional immunoassays. The beta coefficients (β-coeff) were estimated
through either linear or logistic regression analysis. N/A, not applicable.
Protein | Reference | Trait | Reported levels | Prevalent disease, AGES: β-coeff | Prevalent disease, AGES: P-value | Incident disease, AGES: β-coeff | Incident disease, AGES: P-value
IL-18 | 21481392 | CHD | Elevated | 0.117 | 0.0007 | 0.094 | 0.002
CRP | 20182820 | CHD | Elevated | 0.077 | 0.02 | 0.176 | 1e-08
SAA | 20182820 | CHD | Elevated | 0.075 | 0.02 | 0.087 | 0.004
IL6 | 10769275 | CHD | Elevated | 0.066 | 0.045 | 0.111 | 0.0002
NPPB | 20182820 | CHD | Elevated | 0.656 | 1e-64 | 0.401 | 1e-26
MPO | 20182820 | CHD | Elevated | 0.277 | 9e-16 | 0.159 | 3e-07
PAPPA | 20182820 | CHD | Elevated | 0.163 | 6e-07 | 0.128 | 0.0002
GDF15 | 27811204 | CHD | Elevated | 0.327 | 3e-19 | 0.300 | 2e-18
LGALS3 | 22230397 | CHD | Elevated | 0.285 | 9e-16 | 0.179 | 2e-08
ADIPOQ | 19029992 | T2D | Reduced | 0.543 | 1e-55 | N/A | N/A
LEP | 29236298 | T2D | Elevated | 0.491 | 1e-41 | N/A | N/A
IGFBP2 | 22554827 | T2D | Reduced | -0.632 | <1e-258 | N/A | N/A
ADIPOQ | 11479627 | SAT | Reduced | -19.140 | 3e-37 | N/A | N/A
LEP | 27906690 | SAT | Elevated | 88.848 | <1e-300 | N/A | N/A
sLEPR | 12075576 | SAT | Reduced | -15.472 | 4e-29 | N/A | N/A
ADIPOQ | 11479627 | MetS | Reduced | -0.903 | <1e-300 | N/A | N/A
RBP4 | 18239568 | MetS | Elevated | 0.398 | 4e-24 | N/A | N/A
FABP4 | 17553506 | MetS | Elevated | 1.043 | <1e-300 | N/A | N/A
EDN1 | 8149524 | HF | Elevated | 0.527 | 1e-13 | 0.239 | 2e-07
NPPB | 24807464 | HF | Elevated | 1.303 | <1e-300 | 0.807 | <1e-300
UCN3 | 19961889 | HF | Elevated | 0.250 | 0.001 | 0.165 | 0.0005
LECT2 | 28278265 | VAT | Elevated | 10.670 | 2e-23 | N/A | N/A
PAI-1 | 8673927 | VAT | Elevated | 8.190 | 7e-15 | N/A | N/A
PTX3 | 21900125 | VAT | Elevated | 3.511 | 0.0008 | N/A | N/A
Table S6. The degree of validation of aptamer specificity for all human proteins measured
in the present study
A summary of direct and/or inferred validation of aptamer specificity for the 4,137 human
proteins detected in the present study.
(Excel table hosted online)
Table S7. The modules of the serum protein network and corresponding proteins
Annotation of the modules and the proteins that constitute each module of the serum protein
network together with information related to degree connectivity (kWithin, kOut, and kTotal).
(Excel table hosted online)
Table S8. Enrichment of functional categories in the different modules
Functional categories and tissue/cell specific signatures enriched in the different protein
modules using annotation tools like WebGestalt, DAVID, GeneMANIA and CTen (64-67).
Modules are ordered and annotated according to their inter-module relationship. N/A, not
applicable.
Module Size Over-represented
pathways & tissue signatures
FDR P-value
(Bonferroni
adjusted)
Database
PM1 31 Signal peptide
Autoimmunity
Notch signaling
BDCA4+ dendritic cells
N/A
0.03
0.00001
0.01
0.0007
N/A
N/A
N/A
DAVID
WebGestalt
GeneMANIA
CTen
PM2 86 Signal peptide
Circadian rhythm
Adenocarcinoma
Lymphocyte mediated immunity
Whole blood
N/A
N/A
0.00002
0.0002
0.006
2e-07
0.006
N/A
N/A
N/A
DAVID
DAVID
WebGestalt
GeneMANIA
CTen
PM3 921 Signal peptide
Growth factor activity
MAPK cascade
Zymogen
Cytokine-cytokine receptor
JAK - STAT signaling
PI3K – AKT signaling
Immune system diseases
Hypotension
Smooth muscle
Pancreas
N/A
N/A
N/A
N/A
1e-30
2e-11
1e-10
1e-06
0.0008
0.002
0.028
1e-78
1e-25
1e-12
1e-07
N/A
N/A
N/A
N/A
N/A
N/A
N/A
DAVID
DAVID
DAVID
DAVID
WebGestalt
WebGestalt
WebGestalt
WebGestalt
WebGestalt
CTen
CTen
PM4 86 Signal peptide
Pattern recognition receptor activity
Hepatitis B
SIDS
Rheumatoid arthritis
Leukemia lymphoblastic
N/A
1e-06
0.00005
0.005
0.02
0.002
0.002
N/A
N/A
N/A
N/A
N/A
DAVID
GeneMANIA
WebGestalt
WebGestalt
WebGestalt
CTen
PM5 65 Extracellular exosome
IkB / NF-kB signaling pathway
CD33+ Myeloid
Skin
N/A
0.0001
0.002
0.007
0.002
N/A
N/A
N/A
DAVID
GeneMANIA
CTen
CTen
PM6 157 Signal peptide
Calcium ion transport
Heart valve disease
Cardiac myocytes
N/A
0.004
0.04
0.002
5e-15
N/A
N/A
N/A
DAVID
GeneMania
WebGestalt
CTen
PM7 88 Protein binding N/A 0.02 DAVID
PM8 84 Signal peptide
Four helical cytokine core
Natural killer cell activation
Intravascular coagulation
N/A
N/A
2e-06
0.03
3e-07
0.01
N/A
N/A
DAVID
DAVID
GeneMania
WebGestalt
PM9 286 Signal peptide
Growth factor binding
Complement and coagulation
Liver
Pancreatic islets
N/A
0.0002
0.002
0.002
0.006
1e-31
N/A
N/A
N/A
N/A
DAVID
GeneMania
WebGestalt
CTen
CTen
PM10 312 Signal peptide N/A 8e-28 DAVID
Leukocyte differentiation
Fc-epsilon receptor
Innate immune system
Lung diseases
Globus pallidus
Cingulate cortex subthalamic
0.0004
0.0005
0.001
0.004
0.002
0.002
N/A
N/A
N/A
N/A
N/A
N/A
GeneMania
GeneMania
WebGestalt
WebGestalt
CTen
CTen
PM11 26 Secreted proteins
Lipoprotein particles
Sterol homeostasis
Familial hypercholesterolemia
Adrenal gland
Fetal liver
N/A
1e-12
1e-10
0.002
0.02
0.03
0.01
N/A
N/A
N/A
N/A
N/A
DAVID
GeneMania
GeneMania
WebGestalt
CTen
CTen
PM12 69 Signal peptide
Telomere maintenance
Ovary
Atrioventricular node
N/A
0.00004
0.01
0.01
0.00003
N/A
N/A
N/A
DAVID
GeneMania
CTen
CTen
PM13 318 Signal peptide
Biological rhythms
Epstein-Barr virus infection
Skeletal muscle
Uterus
N/A
N/A
0.009
0.0002
0.006
1e-09
0.006
N/A
N/A
N/A
DAVID
DAVID
WebGestalt
CTen
CTen
PM14 81 Signal peptide
Bone marrow
N/A
0.01
0.00003
N/A
DAVID
CTen
PM15 118 Signal peptide
TNF mediated signaling
Kaposi sarcoma
T- cell activation
N/A
N/A
0.01
0.004
1e-08
0.0005
N/A
N/A
DAVID
DAVID
WebGestalt
GeneMania
PM16 170 Poly(A) RNA binding
Acetylation
Ubiquitin conjugation
Secreted proteins
Antibiotic activity
Neutrophil degranulation
Inflammation
Liver carcinoma
RNA spliceosome
Bone marrow
CD33+ myeloid
N/A
N/A
N/A
N/A
N/A
1e-12
1e-06
0.01
1e-07
5e-14
4e-13
1e-08
3e-07
8e-07
0.00001
0.00002
N/A
N/A
N/A
N/A
N/A
N/A
DAVID
DAVID
DAVID
DAVID
DAVID
WebGestalt
WebGestalt
WebGestalt
GeneMania
CTen
CTen
PM17 53 Acetylation
Phosphoprotein
Stress
Vesicle mediated transport
SNARE complex
N/A
N/A
0.0009
0.003
0.002
1e-06
3e-06
N/A
N/A
N/A
DAVID
DAVID
WebGestalt
WebGestalt
GeneMania
PM18 83 Cytoplasm
ERBB signaling pathway
Platelet activation
EGF / EGFR signaling pathway
Drug-drug interaction
CD28 costimulation
Focal adhesion
FCg mediated phagocytosis
CCKR signaling
Angiogenesis
CD56+ NK Cells
N/A
2e-13
3e-13
1e-12
1e-10
1e-08
5e-07
3e-06
0.0004
0.01
0.0003
1e-15
N/A
N/A
N/A
N/A
N/A
N/A
N/A
N/A
N/A
N/A
DAVID
GeneMania
GeneMania
WebGestalt
WebGestalt
WebGestalt
WebGestalt
WebGestalt
WebGestalt
WebGestalt
CTen
CD19+ B cells 0.0028 N/A CTen
PM19 81 Acetylation
Extracellular exosomes
Hereditary hemolytic anemia
Cofactor metabolic process
Protein folding
CD71+ early erythroid
CD105+ endothelial
N/A
N/A
N/A
0.0005
0.002
1e-07
0.0001
1e-28
2e-14
0.00001
N/A
N/A
N/A
N/A
DAVID
DAVID
DAVID
GeneMania
GeneMania
CTen
CTen
PM20 32 Signal peptide
Immunoglobulin C1
Lymph node
Small intestine
N/A
N/A
0.01
0.02
4e-10
2e-06
N/A
N/A
DAVID
DAVID
CTen
CTen
PM21 18 Cellular ion homeostasis 0.004 N/A GeneMania
PM22 39 Signal peptide
Calcium ion binding
Bronchial epithelial cells
Adipocyte
N/A
N/A
0.01
0.02
3e-08
0.00002
N/A
N/A
DAVID
DAVID
CTen
CTen
PM23 35 Extracellular exosome
Biosynthesis of antibiotics
NAD(P)-binding domains
Disease mutation
Metabolic pathways
Amino acid metabolism
Carbon metabolism
Metabolism, inborn errors
Ethanol oxidation
Oxidoreductase
Liver
Kidney
Small intestine
Adrenal gland
N/A
N/A
N/A
N/A
1e-10
5e-10
0.00001
0.0001
1e-09
3e-07
2e-14
7e-06
0.00008
0.0003
3e-08
1e-07
3e-06
0.0001
N/A
N/A
N/A
N/A
N/A
N/A
N/A
N/A
N/A
N/A
DAVID
DAVID
DAVID
DAVID
WebGestalt
WebGestalt
WebGestalt
WebGestalt
GeneMania
GeneMania
CTen
CTen
CTen
CTen
PM24 37 Secreted proteins
Protein activation cascade
Vesicle lumen
Complement activation
Platelet degranulation
Thrombosis
Fetal liver
Fetal lung
Lymph node
N/A
1e-18
1e-15
2e-10
2e-09
3e-09
1e-21
5e-10
6e-06
1e-27
N/A
N/A
N/A
N/A
N/A
N/A
N/A
N/A
DAVID
GeneMania
GeneMania
GeneMania
WebGestalt
WebGestalt
CTen
CTen
CTen
PM25 30 Signal peptide N/A 0.003 DAVID
PM26 390 Signal peptide
Extracellular exosome
Ephrin receptor signaling
Inflammation
Glomerular filtration rate
Spontaneous abortion
Axon guidance
Osteoporosis
Prostatic neoplasms
Smooth muscle
Adipocyte
Lung
N/A
N/A
2e-08
5e-08
0.00001
0.00002
0.00002
0.04
0.04
1e-06
4e-06
6e-06
6e-101
1e-19
N/A
N/A
N/A
N/A
N/A
N/A
N/A
N/A
N/A
N/A
DAVID
DAVID
GeneMania
WebGestalt
WebGestalt
WebGestalt
WebGestalt
WebGestalt
WebGestalt
CTen
CTen
CTen
PM27 378 Signal peptide N/A 1e-113 DAVID
Extracellular exosome
Cell adhesion (CAMs)
Extracellular matrix organization
Collagen diseases
Vascular diseases
Axon guidance
Neoplasm metastasis
Osteoblast signaling
Adipocyte
Uterus
Smooth muscle
N/A
1e-30
1e-18
7e-07
1e-06
0.00001
0.0005
0.005
2e-15
6e-12
1e-09
1e-20
N/A
N/A
N/A
N/A
N/A
N/A
N/A
N/A
N/A
N/A
DAVID
WebGestalt
WebGestalt
WebGestalt
WebGestalt
WebGestalt
WebGestalt
WebGestalt
CTen
CTen
CTen
Table S9. Tissue specific expression of individual serum proteins
GTEx gene expression data (https://www.gtexportal.org) related to potential tissue of origin
of individual proteins. The Z>9.24 represents the top 0.5% of all tissue-specific Z-scores for
the proteins measured.
(Excel table hosted online)
Table S10. Tissue specific expression of serum protein modules
GTEx gene expression data (https://www.gtexportal.org) related to potential tissue of origin
of individual protein modules using a Z>2.75 cut-off, i.e. the top 2.5% of tissue specificity.
The numbers refer to percentage of all proteins in each module passing this cut-off.
(Excel table hosted online)
Table S11. Enrichment of functional categories in the different superclusters
Functional categories enriched in the five super-clusters using annotation tools like
WebGestalt, DAVID, GeneMANIA and CTen (64-67). N/A, not applicable. GEFs,
guanine nucleotide exchange factors.
Modules | Super-cluster | Over-representation of pathways & tissues | FDR | P-value (Bonferroni adjusted) | Database
PM1 | I | Signal peptide | N/A | 0.0007 | DAVID
 | | Autoimmunity | 0.03 | N/A | WebGestalt
 | | Notch signaling | 0.00001 | N/A | GeneMANIA
 | | BDCA4+ dendritic cells | 0.01 | N/A | CTen
PM2-10 | II | Signal peptide | N/A | 1e-169 | DAVID
 | | Immune diseases | 1e-100 | N/A | WebGestalt
 | | Necrosis | 1e-90 | N/A | WebGestalt
 | | Inflammation | 1e-90 | N/A | WebGestalt
 | | Cytokine | 1e-34 | N/A | WebGestalt
 | | Growth factor | 1e-33 | N/A | WebGestalt
 | | Jak STAT signaling | 3e-18 | N/A | WebGestalt
 | | PI3K-AKT signaling | 1e-15 | N/A | WebGestalt
PM11-15 | III | Signal peptide | N/A | 1e-31 | DAVID
 | | MAPK cascade | N/A | 3e-18 | DAVID
 | | Ras GEFs | N/A | 2e-18 | DAVID
 | | Extracellular matrix | 3e-30 | N/A | WebGestalt
PM16-19 | IV | Extracellular exosomes | N/A | 4e-28 | DAVID
 | | Kit receptor signaling | 1e-30 | N/A | WebGestalt
 | | Drug-drug interaction | 1e-20 | N/A | WebGestalt
 | | Nucleotide binding | 7e-12 | N/A | WebGestalt
 | | Fc epsilon RI pathway | 1e-06 | N/A | WebGestalt
 | | Bone marrow | 1e-14 | N/A | CTen
 | | CD33+ myeloid | 1e-12 | N/A | CTen
PM20-27 | V | Signal peptide | N/A | 1e-251 | DAVID
 | | Extracellular exosome | N/A | 3e-56 | DAVID
 | | Biological adhesion | 1e-43 | N/A | WebGestalt
 | | Neoplasm invasiveness | 1e-20 | N/A | WebGestalt
 | | Angiogenesis | 1e-09 | N/A | WebGestalt
 | | Axon guidance | 1e-10 | N/A | WebGestalt
 | | Adipocyte | 1e-20 | N/A | CTen
 | | Smooth muscle | 1e-15 | N/A | CTen
 | | Lung | 1e-13 | N/A | CTen
Table S12. Association of the modules E(q)s to disease related phenotypic measures
Correlation of different modules E(q)s to various disease related outcomes in the AGES study
cohort. The significance threshold of module trait correlations to outcome data was set at a
conservative P-value < 1×10⁻⁷. N/A, not applicable; NS, not significant.
E(module) | Size | Super-cluster | Outcome* | Data | N cases, events, measurements | Direction of effect | P-value
PM1 | 31 | I | VAT | Prevalent | 5239 | Direct | 1e-65
 | | | MetS | Prevalent | 1127 | Direct | 1e-55
 | | | SAT | Prevalent | 5239 | Direct | 5e-27
 | | | T2D | Prevalent | 654 | Direct | 2e-19
 | | | Survival | Post CHD | 692 | Direct | 1e-17
 | | | Survival | Overall | 2982 | Direct | <1e-14
 | | | CHD | Incident | 872 | Direct | 6e-13
 | | | HF | Incident | 440 | Direct | 1e-11
PM2 | 86 | II | N/A | N/A | N/A | N/A | NS
PM3 | 921 | II | N/A | N/A | N/A | N/A | NS
PM4 | 86 | II | CHD | Prevalent | 1217 | Inverse | 2e-13
 | | | VAT | Prevalent | 5239 | Direct | 6e-12
PM5 | 65 | II | HF | Prevalent | 172 | Inverse | 4e-18
 | | | MetS | Prevalent | 1127 | Inverse | 2e-14
 | | | CHD | Prevalent | 1217 | Inverse | 8e-14
 | | | HF | Incident | 440 | Inverse | 8e-09
PM6 | 157 | II | HF | Prevalent | 172 | Inverse | 1e-26
 | | | CHD | Prevalent | 1217 | Inverse | 3e-14
 | | | HF | Incident | 440 | Inverse | 2e-12
 | | | Survival | Overall | 2982 | Inverse | 8e-08
PM7 | 88 | II | HF | Prevalent | 172 | Inverse | 3e-20
 | | | Survival | Overall | 2982 | Inverse | 2e-10
PM8 | 84 | II | VAT | Prevalent | 5239 | Direct | 2e-22
 | | | HF | Prevalent | 172 | Inverse | 4e-11
 | | | Survival | Overall | 2982 | Inverse | 1e-09
PM9 | 286 | II | HF | Prevalent | 172 | Inverse | 2e-23
 | | | VAT | Prevalent | 5239 | Direct | 1e-13
 | | | CHD | Prevalent | 1217 | Inverse | 1e-09
 | | | HF | Incident | 440 | Inverse | 2e-09
 | | | Survival | Overall | 2982 | Inverse | 3e-09
PM10 | 312 | II | HF | Prevalent | 172 | Inverse | 7e-17
 | | | Survival | Overall | 2982 | Inverse | 1e-12
PM11 | 26 | III | MetS | Prevalent | 1127 | Direct | 1e-15
 | | | CHD | Prevalent | 1217 | Direct | 1e-14
PM12 | 69 | III | N/A | N/A | N/A | N/A | NS
PM13 | 318 | III | N/A | N/A | N/A | N/A | NS
PM14 | 81 | III | N/A | N/A | N/A | N/A | NS
PM15 | 118 | III | N/A | N/A | N/A | N/A | NS
PM16 | 170 | IV | CHD | Prevalent | 1217 | Direct | 1e-18
 | | | CHD | Incident | 872 | Direct | 5e-17
 | | | VAT | Prevalent | 5239 | Direct | 3e-16
 | | | MetS | Prevalent | 1127 | Direct | 1e-12
 | | | Survival | Overall | 2982 | Direct | 2e-12
 | | | HF | Prevalent | 172 | Direct | 3e-12
 | | | HF | Incident | 440 | Direct | 2e-09
PM17 | 53 | IV | HF | Prevalent | 172 | Direct | 2e-22
 | | | CHD | Prevalent | 1217 | Direct | 6e-22
 | | | HF | Incident | 440 | Direct | 1e-18
 | | | CHD | Incident | 872 | Direct | 3e-18
 | | | Survival | Overall | 2982 | Direct | <1e-16
 | | | Survival | Post CHD | 692 | Direct | 3e-11
 | | | SAT | Prevalent | 5339 | Direct | 1e-10
 | | | MetS | Prevalent | 1127 | Direct | 1e-09
PM18 | 83 | IV | N/A | N/A | N/A | N/A | NS
PM19 | 81 | IV | N/A | N/A | N/A | N/A | NS
PM20 | 32 | V | N/A | N/A | N/A | N/A | NS
PM21 | 18 | V | N/A | N/A | N/A | N/A | NS
PM22 | 39 | V | N/A | N/A | N/A | N/A | NS
PM23 | 35 | V | VAT | Prevalent | 5239 | Direct | 6e-90
 | | | MetS | Prevalent | 1127 | Direct | 8e-65
 | | | SAT | Prevalent | 5239 | Direct | 4e-42
 | | | T2D | Prevalent | 654 | Direct | 5e-32
 | | | CHD | Prevalent | 1217 | Direct | 2e-14
PM24 | 37 | V | VAT | Prevalent | 5239 | Inverse | 2e-18
 | | | MetS | Prevalent | 1127 | Inverse | 3e-12
 | | | SAT | Prevalent | 5239 | Inverse | 4e-10
PM25 | 30 | V | N/A | N/A | N/A | N/A | NS
PM26 | 390 | V | HF | Prevalent | 172 | Direct | 5e-20
 | | | HF | Incident | 440 | Direct | 2e-18
 | | | Survival | Overall | 2982 | Direct | <1e-16
 | | | CHD | Prevalent | 1217 | Direct | 5e-13
 | | | CHD | Incident | 872 | Direct | 2e-10
 | | | Survival | Post CHD | 692 | Direct | 1e-08
PM27 | 378 | V | VAT | Prevalent | 5239 | Inverse | 6e-34
 | | | MetS | Prevalent | 1127 | Inverse | 4e-11
 | | | HF | Prevalent | 172 | Direct | 6e-09
 | | | Survival | Overall | 2982 | Direct | 9e-08
*Survival probability was estimated either as post incident CHD or overall survival post entry into the
AGES study (see material and methods). Data were analyzed using forward linear or logistic
regression or Cox proportional hazards regression, depending on the outcome being continuous, binary
or a time to an event. Abbreviations: MetS, metabolic syndrome; VAT, visceral adipose tissue via CT;
SAT, subcutaneous adipose tissue via CT; T2D, type 2 diabetes; CHD, coronary heart disease; HF,
heart failure. See table S2 for descriptive statistics of the study cohort.
Table S13. Cis-acting serum pSNP-protein pairs
All cis-acting pSNP-protein pairs detected within a 300kb window across and including a
given serum protein encoding gene. For each specific cis effect we report the single strongest
one (lead pSNP), and do not consider multiple independent cis effects per region.
(Excel table hosted online)
Table S14. Cross-referencing cis acting serum pSNP-proteins with eSNP-transcript pairs
Matching cis pSNP-protein pairs to expression eSNP-transcript pairs identified in >30 solid
tissues or cell types, using the stringent cutoffs of P < 5e-08 for significance and r² ≥ 0.8 for
SNP proxy.
(Excel table hosted online)
Table S15. Cross-referencing cis acting serum pSNPs with GWAS lead SNPs
Cross-referencing cis pSNPs to genome-wide significant GWAS lead SNPs, using the
stringent cutoffs of P < 5×10⁻⁸ for significance and r² ≥ 0.8 for pSNP proxy.
(Excel table hosted online)
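The r² ≥ 0.8 proxy criterion used in tables S14 and S15 refers to linkage disequilibrium between two SNPs. One common way to estimate it from unphased data is the squared Pearson correlation of genotype dosages (0, 1 or 2 copies of an allele). The sketch below assumes that estimator; the function names and toy genotypes are illustrative and not taken from the study.

```python
from statistics import mean


def ld_r2(geno_a, geno_b):
    """Squared Pearson correlation between two genotype dosage vectors."""
    if len(geno_a) != len(geno_b):
        raise ValueError("genotype vectors must have equal length")
    ma, mb = mean(geno_a), mean(geno_b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(geno_a, geno_b))
    var_a = sum((x - ma) ** 2 for x in geno_a)
    var_b = sum((y - mb) ** 2 for y in geno_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("monomorphic SNP: r2 is undefined")
    return cov ** 2 / (var_a * var_b)


def is_proxy(geno_a, geno_b, threshold=0.8):
    """True when two SNPs pass the r2 proxy cutoff."""
    return ld_r2(geno_a, geno_b) >= threshold


if __name__ == "__main__":
    # Toy dosages for eight subjects (hypothetical data).
    snp1 = [0, 0, 1, 1, 2, 2, 0, 1]
    snp2 = [0, 0, 1, 1, 2, 2, 0, 2]
    print(round(ld_r2(snp1, snp2), 3), is_proxy(snp1, snp2))
```

In practice such values are usually looked up in a reference panel rather than computed from the study genotypes; which source was used here is not stated in this section.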
Table S16. Highlighted examples of serum pSNP-protein pairs underlying GWAS risk
Selected examples of genome-wide significant GWAS risk loci for various disease-related
outcomes (68), showing cis and/or trans acting effects on serum protein levels in the AGES
study population. The P-value threshold for significant trans effects was set at P < 5×10⁻⁷ based on
Bonferroni corrections (20 GWAS loci and SOMAmers tested). The proximal cis acting
window was 150kb from 5´ and 3´ of a gene and including the gene in question. MA, minor
allele.
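The Bonferroni threshold in the description above is the family-wise error rate divided by the number of tests (loci × SOMAmers). A short sketch follows, assuming α = 0.05 and, purely for illustration, one test per locus for each of the 4,137 proteins in table S1. The exact number of SOMAmers tested is not given here, so the counts in the example are assumptions.

```python
def bonferroni_threshold(n_loci, n_aptamers, alpha=0.05):
    """Per-test significance threshold after Bonferroni correction."""
    n_tests = n_loci * n_aptamers
    if n_tests <= 0:
        raise ValueError("number of tests must be positive")
    return alpha / n_tests


def is_significant(p_value, n_loci, n_aptamers, alpha=0.05):
    """True when a P-value passes the Bonferroni-corrected threshold."""
    return p_value < bonferroni_threshold(n_loci, n_aptamers, alpha)


if __name__ == "__main__":
    # 20 GWAS loci (from the table description); 4,137 is an assumed
    # number of tests per locus, not a figure stated for this analysis.
    thr = bonferroni_threshold(n_loci=20, n_aptamers=4137)
    print(f"threshold = {thr:.1e}")
    print(is_significant(7e-08, 20, 4137))
```

With these assumed counts the threshold lands at the same order of magnitude as the reported cut-off.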
GWAS risk locus | Phenotype | Reported gene | PMID | Protein(s) affected | Cis or trans | Effect (β) per MA | P-value
rs1892094 | CHD | ATP1B1 | 28530674 | SIGLEC11 | Trans | -0.134 | 3e-10
 | | | | EHBP1 | Trans | -0.157 | 7e-08
rs2820315 | CHD | LMOD1 | 28530674 | RNPEP | Cis | -0.157 | 2e-09
rs2258287 | CHD | HNF1A | 28530674 | CRP | Trans | -0.156 | 4e-09
rs1050362 | CHD | DHX38 | 28530674 | APOL1 | Trans | 0.373 | 6e-51
 | | | | SERPIND1 | Trans | -0.148 | 1e-08
 | | | | APOA1 | Trans | 0.130 | 5e-08
rs867186 | CHD | PROCR | 28530674 | PROC | Trans | 0.630 | 6e-59
rs964184 | CHD | ZNF259, APOA1, APOC3, APOA4, APOA5 | 21378990 | APOA5 | Cis | -0.371 | 8e-24
 | | | | NXPH2 | Trans | -0.276 | 2e-13
 | | | | PCSK7 | Trans | -0.272 | 3e-13
 | | | | ANGPTL3 | Trans | 0.213 | 8e-09
 | | | | FAM159B | Trans | 0.206 | 3e-08
 | | | | APOC3 | Cis | 0.194 | 6e-08
 | | | | LRP1B | Trans | 0.189 | 7e-08
rs1165669 | CHD | HSP90B1 | 21626137, 26343387 | HSP90B1 | Cis | 0.836 | 6e-228
 | | | | HNRNPM | Trans | 0.564 | 7e-95
rs10840293 | CHD | SWAP70 | 26343387 | SWAP70 | Cis | 0.281 | 7e-30
rs579459 | CHD | ABO, LCN1P2 | 21378990 | ABO | Cis | 1.087 | 8e-244
 | | | | SELE | Trans | -0.963 | 9e-193
 | | | | ADGRF5 | Trans | -0.815 | 8e-125
 | | | | ROBO4 | Trans | -0.730 | 1e-114
 | | | | IL3RA | Trans | -0.613 | 5e-81
 | | | | QSOX2 | Trans | 0.625 | 4e-73
 | | | | INSR | Trans | -0.595 | 3e-70
 | | | | ICAM2 | Trans | -0.593 | 3e-67
 | | | | FAM3D | Trans | 0.581 | 8e-66
 | | | | KDR | Trans | -0.508 | 5e-50
 | | | | EPHA4 | Trans | -0.492 | 2e-49
 | | | | ICAM5 | Trans | -0.457 | 5e-40
 | | | | FLT4 | Trans | -0.442 | 1e-36
 | | | | F8 | Trans | 0.393 | 9e-32
 | | | | ENG | Trans | -0.394 | 1e-30
 | | | | ISLR2 | Trans | -0.387 | 1e-28
 | | | | KIN | Trans | -0.376 | 2e-28
 | | | | GOLM1 | Trans | 0.379 | 2e-28
 | | | | CD200 | Trans | -0.357 | 2e-25
 | | | | MET | Trans | -0.367 | 3e-25
 | | | | GLCE | Trans | 0.360 | 1e-24
 | | | | LIFR | Trans | -0.337 | 1e-22
 | | | | C1GALT1C1 | Trans | 0.326 | 2e-20
 | | | | SHANK3 | Trans | 0.314 | 4e-20
 | | | | ICAM4 | Trans | -0.323 | 5e-20
 | | | | ACE | Trans | -0.316 | 2e-19
 | | | | CHST15 | Trans | -0.292 | 1e-17
 | | | | SELP | Trans | -0.248 | 7e-13
 | | | | IGF1R | Trans | -0.237 | 1e-12
 | | | | CDH5 | Trans | -0.242 | 1e-12
 | | | | VWF | Trans | 0.244 | 2e-12
 | | | | SEMA6A | Trans | -0.236 | 6e-12
 | | | | L1CAM | Trans | -0.232 | 2e-11
 | | | | CD109 | Trans | -0.211 | 7e-10
 | | | | CCL28 | Trans | -0.217 | 8e-10
 | | | | IL6ST | Trans | -0.202 | 3e-09
 | | | | CHST12 | Trans | 0.199 | 1e-08
 | | | | DPEP2 | Trans | -0.196 | 2e-08
 | | | | JAG1 | Trans | -0.185 | 2e-08
 | | | | MBL2 | Trans | 0.194 | 2e-08
 | | | | B3GNT2 | Trans | 0.191 | 5e-08
 | | | | GNS | Trans | -0.189 | 7e-08
 | | | | PEAR1 | Trans | -0.184 | 9e-08
rs6235 | Adiposity; Proinsulin | PCSK1; PCSK1 | 18604207; 21873549 | PCSK1 | Cis | 0.979 | 1e-300
rs7756992 | T2D | CDKAL1 | 24509480 | MLN | Trans | 0.245 | 2e-12
rs3132524 | T2D | TCF19, POU5F1 | 24509480 | C4A/B | Trans | 0.244 | 2e-20
 | | | | TGM3 | Trans | 0.215 | 6e-17
 | | | | DNAJC10 | Trans | 0.204 | 3e-15
 | | | | KIR2DS2 | Trans | 0.164 | 2e-10
 | | | | H6PD | Trans | 0.145 | 4e-08
rs16861329 | T2D | ST6GAL1 | 21874001 | ST6GAL1 | Trans | 0.193 | 2e-08
Table S17. Identification of cis-to-trans effects on serum proteins
All cis-to-trans pSNP-protein pair effects detected in the present study using P < 1×10⁻⁸ after
Bonferroni corrections (number of proteins and number of cis effects) for significant hits.
(Excel table hosted online)
Table S18. Replication of previously reported cis and trans pSNPs
Confirmation and comparison, via application of the custom designed SOMAscan platform in
the AGES cohort, of known cis and trans acting pSNPs-proteins across different study
populations and proteomic technologies. The percentage confirmed applies to proteins
detected in the AGES. N/A, not applicable. See material and methods for more details.
Study (reference) | Platform | Number of subjects | Number of proteins | Number of cis | Number of trans | % confirmed, cis | % confirmed, trans
Johansson et al. (54) | Mass spectrometry | 1,060 | 163 | 5 | 0 | 100 | N/A
Kim et al. (55) | Immunoassay | 521 | 132 | 28 | 0 | 73.9 | N/A
Enroth et al. (56) | Immunoassay | 1,005 | 92 | 23 | 0 | 62.5 | N/A
Liu et al. (57) | Mass spectrometry | 232 | 342 | 13 | 0 | 87.5 | N/A
Suhre et al. (59) | SOMAmers | 1,000 | 1,124 | 384 | 148 | 88.3 | 84.5
Sun et al. (58) | SOMAmers | 3,301 | 2,994 | 552 | 1,104 | 75.7 | 72.8
Emilsson et al. (present study) | SOMAmers | 5492 | 4,173 | 1,046 | 911 | N/A | N/A
Table S19. Common variants associated with module E(q)s
Identification of genetic variants associated with different modules E(q)s. Associations were
considered genome-wide significant when P < 5×10⁻⁸ (Bonferroni adjusted) while the P-values
for suggestive evidence of association were between 5×10⁻⁸ and 5×10⁻⁶. N/A, not applicable.
E(q)
Lead
npSNP
P-value Known GWAS
SNP (r2≥0.8)*
GWAS
phenotype**
PMID***
PM1 rs204896 1e-09 rs204896 RA 24390342
PM2 rs704 3e-11 rs704 OPG 25080503
PM3 rs6813952
rs7144389
3e-09
2e-12
None
None
N/A
N/A
N/A
N/A
PM4 rs10761731 1e-08 rs10761731 Platelets, TG
22139419
20686565
PM6 rs13026392 6e-07 None N/A N/A
PM7 rs704
rs887829
<1e-300
1e-25
rs704
rs887829
OPG
Bilirubin
25080503
19414484
PM9 rs1250229 2e-39 rs1250229 LDL 24097068
PM10 rs704 1e-70 rs704 OPG 25080503
PM11 rs445925
rs157582
rs6857
rs1803274
1e-88
1e-86
1e-66
1e-15
rs157582
rs445925
rs6857
rs844200
rs6445035
LDL, CHD, Lp-PLA2
LOAD, TG
LOAD, LDL
BCHE
ASPA
28334899
22005930
24162737
21862451
23508960
PM12 rs17836931 1e-07 None N/A N/A
PM13 rs1329424
rs541862
9e-13
2e-11
rs1329424
rs541862
AMD, NV
RA, AMD, NV
23455636
24390342
23455636
PM14 rs541862 3e-10 rs541862 RA, AMD, NV 24390342
23455636
PM15 rs1329424
rs389512
6e-18
1e-17
rs1329424
rs389512
rs406936
AMD, NV
AMD, RA, NV
T1D
23455636
22694956
24390342
PM16 rs1970793 2e-07 None N/A N/A
PM17 rs17080938 1e-07 None N/A N/A
PM18 rs2562545 1e-09 None N/A N/A
PM19 rs17091323 3e-08 None N/A N/A
PM20 rs719482
rs2885162
4e-66
2e-07
None
None
N/A
N/A
N/A
N/A
PM21 rs719482 1e-180 None N/A N/A
PM23 rs357707 3e-10 None N/A N/A
PM26 rs881029 1e-07 None N/A N/A
PM27 rs6683597 4e-08 None N/A N/A
*For a qualified proxy the correlation between npSNP and corresponding lead GWAS SNP was
r2≥0.8.
**RA, rheumatoid arthritis; OPG, Osteoprotegerin levels; Metabolites, blood metabolites; LDL, LDL-
cholesterol levels; LOAD, late-onset Alzheimer´s disease; TG, triglyceride levels; BCHE,
butyrylcholinesterase; ASPA, plasma aspirin activity; AMD, age-related macular degeneration; NV,
neovascularization
***Known GWAS findings are reported in the PhenoScanner (20), and/or the GWAS catalogue (68).
Table S20. Effects of network associated SNP (npSNP) on individual serum proteins
The SNPs associated with module E(q)s listed in table S19 mediated cis and trans acting
effects on multiple proteins which cluster within specific protein modules. The genome-wide
significant association threshold for individual cis and trans effects mediated by the npSNPs
was set at Bonferroni adjusted P < 5×10⁻⁷ (corrected for number of aptamers and npSNPs
tested). FET, Fisher exact test. N/A, not applicable.
E(q)
Lead
npSNP
Adjacent
cis effect(s)
#Trans
effects
Module
affected
#Cis and trans
effects in module
FET
P-value
PM1 rs204896
C4B, TNXB 78 PM13
PM15
33
18
1e-19
6e-13
PM2 rs704 VTN
698 PM2
PM4
PM6
PM7
PM10
27
34
68
87
160
1e-06
3e-10
4e-21
1e-75
4e-54
PM3 rs6813952 None 81 PM3 67 4e-39
PM4 rs10761731 None 27 PM3 18 4e-09
PM6 rs13026392 None 61 PM6
PM7
22
15
1e-17
4e-13
PM7
rs704
rs887829
VTN
UGT1A6
698
8
PM2
PM4
PM6
PM7
PM10
PM1
27
34
68
87
160
7
1e-06
3e-10
4e-21
1e-75
4e-54
1e-14
PM9 rs1250229 FN1 6 None N/A N/A
PM10 rs704
VTN
698
PM2
PM4
PM6
PM7
PM10
27
34
68
87
160
1e-06
3e-10
4e-21
1e-75
4e-54
PM11 rs445925
rs157582
rs6857
rs1803274
APOE
APOE
None
BCHE
37
37
35
20
PM11
PM11
PM11
PM11
16
19
19
9
4e-25
1e-31
5e-32
3e-15
PM12 rs17836931 None 27 PM12
PM14
6
8
2e-06
1e-08
PM13 rs1329424
rs541862
CFHR1, 4, 5
C4A/B, CFB
129
106
PM13
PM15
PM13
PM15
48
32
55
37
1e-24
3e-22
6e-37
3e-31
PM14 rs541862 C4A/B, CFB 106 PM13
PM15
55
37
6e-37
3e-31
PM15 rs1329424
rs389512
CFHR1, 4, 5
C4A/B, CFB
129
158
PM13
PM15
PM13
PM15
48
32
68
46
1e-25
3e-22
1e-38
4e-34
PM16 rs1970793 None 19 PM16 15 4e-19
PM17 rs17080938 None 7 PM17 5 3e-09
PM18 rs2562545 None 9 None N/A N/A
PM19 rs17091323 None 11 PM19
PM26
3
5
0.0006
0.0007
PM20 rs719482 IGHG1-4 68 PM20 25 7e-34
rs2885162
None
10
PM25
PM20
23
6
6e-31
2e-11
PM21 rs719482
IGHG1-4
68
PM20
PM25
25
23
7e-34
6e-31
PM23 rs357707 None 20 PM23 11 2e-18
PM26 rs881029 None 43 PM26 31 8e-26
PM27 rs6683597 None 28 PM27
PM26
15
9
1e-10
0.0001
Table S21. Tissue specificity of cis-to-trans protein pairs
Tissue specific expression of transcripts encoding the cis-to-trans regulated proteins based on
53 different human tissues (median RPKM by tissue) downloaded from GTEx
(https://www.gtexportal.org) on 07/25/2017.
(Excel table hosted online)
Table S22. Tissue specificity of npSNPs
Tissue-specificity of network-associated protein SNPs (npSNPs).
(Excel table hosted online)
References and Notes
1. J. M. Schwenk, G. S. Omenn, Z. Sun, D. S. Campbell, M. S. Baker, C. M. Overall, R. Aebersold, R. L. Moritz, E. W. Deutsch, The Human Plasma Proteome Draft of 2017: Building on the Human Plasma PeptideAtlas from Mass Spectrometry and Complementary Assays. J. Proteome Res. 16, 4299–4310 (2017). doi:10.1021/acs.jproteome.7b00467 Medline
2. M. Uhlén, L. Fagerberg, B. M. Hallström, C. Lindskog, P. Oksvold, A. Mardinoglu, Å. Sivertsson, C. Kampf, E. Sjöstedt, A. Asplund, I. Olsson, K. Edlund, E. Lundberg, S. Navani, C. A.-K. Szigyarto, J. Odeberg, D. Djureinovic, J. O. Takanen, S. Hober, T. Alm, P.-H. Edqvist, H. Berling, H. Tegel, J. Mulder, J. Rockberg, P. Nilsson, J. M. Schwenk, M. Hamsten, K. von Feilitzen, M. Forsberg, L. Persson, F. Johansson, M. Zwahlen, G. von Heijne, J. Nielsen, F. Pontén, Proteomics. Tissue-based map of the human proteome. Science 347, 1260419 (2015). doi:10.1126/science.1260419 Medline
3. M. Stastna, J. E. Van Eyk, Secreted proteins as a fundamental source for biomarker discovery. Proteomics 12, 722–735 (2012). doi:10.1002/pmic.201100346 Medline
4. I. M. Conboy, M. J. Conboy, A. J. Wagers, E. R. Girma, I. L. Weissman, T. A. Rando, Rejuvenation of aged progenitor cells by exposure to a young systemic environment. Nature 433, 760–764 (2005). doi:10.1038/nature03260 Medline
5. S. A. Villeda, J. Luo, K. I. Mosher, B. Zou, M. Britschgi, G. Bieri, T. M. Stan, N. Fainberg, Z. Ding, A. Eggel, K. M. Lucin, E. Czirr, J.-S. Park, S. Couillard-Després, L. Aigner, G. Li, E. R. Peskind, J. A. Kaye, J. F. Quinn, D. R. Galasko, X. S. Xie, T. A. Rando, T. Wyss-Coray, The ageing systemic milieu negatively regulates neurogenesis and cognitive function. Nature 477, 90–94 (2011). doi:10.1038/nature10357 Medline
6. E. E. Schadt, Molecular networks as sensors and drivers of common human diseases. Nature 461, 218–223 (2009). doi:10.1038/nature08454 Medline
7. B. Zhang, C. Gaiteri, L.-G. Bodea, Z. Wang, J. McElwee, A. A. Podtelezhnikov, C. Zhang, T. Xie, L. Tran, R. Dobrin, E. Fluder, B. Clurman, S. Melquist, M. Narayanan, C. Suver, H. Shah, M. Mahajan, T. Gillis, J. Mysore, M. E. MacDonald, J. R. Lamb, D. A. Bennett, C. Molony, D. J. Stone, V. Gudnason, A. J. Myers, E. E. Schadt, H. Neumann, J. Zhu, V. Emilsson, Integrated systems approach identifies genetic nodes and networks in late-onset Alzheimer’s disease. Cell 153, 707–720 (2013). doi:10.1016/j.cell.2013.03.030 Medline
8. V. Emilsson, G. Thorleifsson, B. Zhang, A. S. Leonardson, F. Zink, J. Zhu, S. Carlson, A. Helgason, G. B. Walters, S. Gunnarsdottir, M. Mouy, V. Steinthorsdottir, G. H. Eiriksdottir, G. Bjornsdottir, I. Reynisdottir, D. Gudbjartsson, A. Helgadottir, A. Jonasdottir, A. Jonasdottir, U. Styrkarsdottir, S. Gretarsdottir, K. P. Magnusson, H. Stefansson, R. Fossdal, K. Kristjansson, H. G. Gislason, T. Stefansson, B. G. Leifsson, U. Thorsteinsdottir, J. R. Lamb, J. R. Gulcher, M. L. Reitman, A. Kong, E. E. Schadt, K. Stefansson, Genetics of gene expression and its effect on disease. Nature 452, 423–428 (2008). doi:10.1038/nature06758 Medline
9. Y. Chen, J. Zhu, P. Y. Lum, X. Yang, S. Pinto, D. J. MacNeil, C. Zhang, J. Lamb, S. Edwards, S. K. Sieberts, A. Leonardson, L. W. Castellini, S. Wang, M.-F. Champy, B. Zhang, V. Emilsson, S. Doss, A. Ghazalpour, S. Horvath, T. A. Drake, A. J. Lusis, E. E. Schadt, Variations in DNA elucidate molecular networks that cause disease. Nature 452, 429–435 (2008). doi:10.1038/nature06757 Medline
10. D. R. Davies, A. D. Gelinas, C. Zhang, J. C. Rohloff, J. D. Carter, D. O’Connell, S. M. Waugh, S. K. Wolk, W. S. Mayfield, A. B. Burgin, T. E. Edwards, L. J. Stewart, L. Gold, N. Janjic, T. C. Jarvis, Unique motifs and hydrophobic interactions shape the binding of modified DNA ligands to protein targets. Proc. Natl. Acad. Sci. U.S.A. 109, 19971–19976 (2012). doi:10.1073/pnas.1213933109 Medline
11. L. Gold, D. Ayers, J. Bertino, C. Bock, A. Bock, E. N. Brody, J. Carter, A. B. Dalby, B. E. Eaton, T. Fitzwater, D. Flather, A. Forbes, T. Foreman, C. Fowler, B. Gawande, M. Goss, M. Gunn, S. Gupta, D. Halladay, J. Heil, J. Heilig, B. Hicke, G. Husar, N. Janjic, T. Jarvis, S. Jennings, E. Katilius, T. R. Keeney, N. Kim, T. H. Koch, S. Kraemer, L. Kroiss, N. Le, D. Levine, W. Lindsey, B. Lollo, W. Mayfield, M. Mehan, R. Mehler, S. K. Nelson, M. Nelson, D. Nieuwlandt, M. Nikrad, U. Ochsner, R. M. Ostroff, M. Otis, T. Parker, S. Pietrasiewicz, D. I. Resnicow, J. Rohloff, G. Sanders, S. Sattin, D. Schneider, B. Singer, M. Stanton, A. Sterkel, A. Stewart, S. Stratford, J. D. Vaught, M. Vrkljan, J. J. Walker, M. Watrobka, S. Waugh, A. Weiss, S. K. Wilcox, A. Wolfson, S. K. Wolk, C. Zhang, D. Zichi, Aptamer-based multiplexed proteomic technology for biomarker discovery. PLOS ONE 5, e15004 (2010). doi:10.1371/journal.pone.0015004 Medline
12. T. B. Harris, L. J. Launer, G. Eiriksdottir, O. Kjartansson, P. V. Jonsson, G. Sigurdsson, G. Thorgeirsson, T. Aspelund, M. E. Garcia, M. F. Cotch, H. J. Hoffman, V. Gudnason, Age, Gene/Environment Susceptibility-Reykjavik Study: Multidisciplinary applied phenomics. Am. J. Epidemiol. 165, 1076–1087 (2007). doi:10.1093/aje/kwk115 Medline
13. A. L. Barabási, R. Albert, Emergence of scaling in random networks. Science 286, 509–512 (1999). doi:10.1126/science.286.5439.509 Medline
14. B. Zhang, S. Horvath, A general framework for weighted gene co-expression network analysis. Stat. Appl. Genet. Mol. Biol. 4, e17 (2005). doi:10.2202/1544-6115.1128 Medline
15. P. Langfelder, R. Luo, M. C. Oldham, S. Horvath, Is my network module preserved and reproducible? PLOS Comput. Biol. 7, e1001057 (2011). doi:10.1371/journal.pcbi.1001057 Medline
16. L. Shu, K. H. K. Chan, G. Zhang, T. Huan, Z. Kurt, Y. Zhao, V. Codoni, D.-A. Trégouët, J. Yang, J. G. Wilson, X. Luo, D. Levy, A. J. Lusis, S. Liu, X. Yang; Cardiogenics Consortium, Shared genetic regulatory networks for cardiovascular disease and type 2 diabetes in multiple populations of diverse ethnicities in the United States. PLOS Genet. 13, e1007040 (2017). doi:10.1371/journal.pgen.1007040 Medline
17. H. Jeong, S. P. Mason, A. L. Barabási, Z. N. Oltvai, Lethality and centrality in protein networks. Nature 411, 41–42 (2001). doi:10.1038/35075138 Medline
18. A. L. Barabási, N. Gulbahce, J. Loscalzo, Network medicine: A network-based approach to human disease. Nat. Rev. Genet. 12, 56–68 (2011). doi:10.1038/nrg2918 Medline
19. M. Muñoz, R. Pong-Wong, O. Canela-Xandri, K. Rawlik, C. S. Haley, A. Tenesa, Evaluating the contribution of genetics and familial shared environment to common disease using the UK Biobank. Nat. Genet. 48, 980–983 (2016). Medline
20. J. R. Staley, J. Blackshaw, M. A. Kamat, S. Ellis, P. Surendran, B. B. Sun, D. S. Paul, D. Freitag, S. Burgess, J. Danesh, R. Young, A. S. Butterworth, PhenoScanner: A database of human genotype-phenotype associations. Bioinformatics 32, 3207–3209 (2016). doi:10.1093/bioinformatics/btw373 Medline
21. N. Mähler, J. Wang, B. K. Terebieniec, P. K. Ingvarsson, N. R. Street, T. R. Hvidsten, Gene co-expression network connectivity is an important determinant of selective constraint. PLOS Genet. 13, e1006402 (2017). doi:10.1371/journal.pgen.1006402 Medline
22. J. K. Pickrell, T. Berisa, J. Z. Liu, L. Ségurel, J. Y. Tung, D. A. Hinds, Detection and interpretation of shared genetic influences on 42 human traits. Nat. Genet. 48, 709–717 (2016). doi:10.1038/ng.3570 Medline
23. M. Franchini, G. Lippi, The intriguing relationship between the ABO blood group, cardiovascular disease, and cancer. BMC Med. 13, 7 (2015). doi:10.1186/s12916-014-0250-y Medline
24. M. Franchini, F. Capra, G. Targher, M. Montagnana, G. Lippi, Relationship between ABO blood group and von Willebrand factor levels: From biology to clinical implications. Thromb. J. 5, 14 (2007). doi:10.1186/1477-9560-5-14 Medline
25. E. A. Boyle, Y. I. Li, J. K. Pritchard, An Expanded View of Complex Traits: From Polygenic to Omnigenic. Cell 169, 1177–1186 (2017). doi:10.1016/j.cell.2017.05.038 Medline
26. D. Alfego, U. Rodeck, A. Kriete, Global mapping of transcription factor motifs in human aging. PLOS ONE 13, e0190457 (2018). doi:10.1371/journal.pone.0190457 Medline
27. J. Yang, T. Huang, F. Petralia, Q. Long, B. Zhang, C. Argmann, Y. Zhao, C. V. Mobbs, E. E. Schadt, J. Zhu, Z. Tu; GTEx Consortium, Synchronized age-related gene expression changes across multiple tissues in human and the link to complex diseases. Sci. Rep. 5, 15145 (2015). doi:10.1038/srep15145 Medline
28. J. M. Zahn, S. Poosala, A. B. Owen, D. K. Ingram, A. Lustig, A. Carter, A. T. Weeraratna, D. D. Taub, M. Gorospe, K. Mazan-Mamczarz, E. G. Lakatta, K. R. Boheler, X. Xu, M. P. Mattson, G. Falco, M. S. H. Ko, D. Schlessinger, J. Firman, S. K. Kummerfeld, W. H. Wood 3rd, A. B. Zonderman, S. K. Kim, K. G. Becker, AGEMAP: A gene expression database for aging in mice. PLOS Genet. 3, e201 (2007). doi:10.1371/journal.pgen.0030201 Medline
29. P. Langfelder, B. Zhang, S. Horvath, Defining clusters from a hierarchical cluster tree: The Dynamic Tree Cut package for R. Bioinformatics 24, 719–720 (2008). doi:10.1093/bioinformatics/btm563 Medline
30. G. Csardi, T. Nepusz, The igraph software package for complex network research. InterJournal. Complex Syst. 1695, 1 (2006).
31. American Diabetes Association, Diagnosis and classification of diabetes mellitus. Diabetes Care 36 (suppl. 1), S67–S74 (2013). doi:10.2337/dc13-S067 Medline
32. A. Agarwala, S. Virani, D. Couper, L. Chambless, E. Boerwinkle, B. C. Astor, R. C. Hoogeveen, J. Coresh, A. R. Sharrett, A. R. Folsom, T. Mosley, C. M. Ballantyne, V. Nambi, Biomarkers and degree of atherosclerosis are independently associated with incident atherosclerotic cardiovascular disease in a primary prevention cohort: The ARIC study. Atherosclerosis 253, 156–163 (2016). doi:10.1016/j.atherosclerosis.2016.08.028 Medline
33. Y. Hathout, E. Brody, P. R. Clemens, L. Cripe, R. K. DeLisle, P. Furlong, H. Gordish-Dressman, L. Hache, E. Henricson, E. P. Hoffman, Y. M. Kobayashi, A. Lorts, J. K. Mah, C. McDonald, B. Mehler, S. Nelson, M. Nikrad, B. Singer, F. Steele, D. Sterling, H. L. Sweeney, S. Williams, L. Gold, Large-scale serum protein biomarker discovery in Duchenne muscular dystrophy. Proc. Natl. Acad. Sci. U.S.A. 112, 7153–7158 (2015). doi:10.1073/pnas.1507719112 Medline
34. J. Candia, F. Cheung, Y. Kotliarov, G. Fantoni, B. Sellers, T. Griesman, J. Huang, S. Stuccio, A. Zingone, B. M. Ryan, J. S. Tsang, A. Biancotto, Assessment of Variability in the SOMAscan Assay. Sci. Rep. 7, 14248 (2017). doi:10.1038/s41598-017-14755-5 Medline
35. M. Kuhn, K. Johnson, Applied Predictive Modeling (Springer, 2013).
36. J. Barretina, G. Caponigro, N. Stransky, K. Venkatesan, A. A. Margolin, S. Kim, C. J. Wilson, J. Lehár, G. V. Kryukov, D. Sonkin, A. Reddy, M. Liu, L. Murray, M. F. Berger, J. E. Monahan, P. Morais, J. Meltzer, A. Korejwa, J. Jané-Valbuena, F. A. Mapa, J. Thibault, E. Bric-Furlong, P. Raman, A. Shipway, I. H. Engels, J. Cheng, G. K. Yu, J. Yu, P. Aspesi Jr., M. de Silva, K. Jagtap, M. D. Jones, L. Wang, C. Hatton, E. Palescandolo, S. Gupta, S. Mahan, C. Sougnez, R. C. Onofrio, T. Liefeld, L. MacConaill, W. Winckler, M. Reich, N. Li, J. P. Mesirov, S. B. Gabriel, G. Getz, K. Ardlie, V. Chan, V. E. Myer, B. L. Weber, J. Porter, M. Warmuth, P. Finan, J. L. Harris, M. Meyerson, T. R. Golub, M. P. Morrissey, W. R. Sellers, R. Schlegel, L. A. Garraway, The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012). doi:10.1038/nature11003 Medline
37. B. MacLean, D. M. Tomazela, S. E. Abbatiello, S. Zhang, J. R. Whiteaker, A. G. Paulovich, S. A. Carr, M. J. Maccoss, Effect of collision energy optimization on the measurement of peptides by selected reaction monitoring (SRM) mass spectrometry. Anal. Chem. 82, 10116–10124 (2010). doi:10.1021/ac102179j Medline
38. Y. Mohammed, D. Domański, A. M. Jackson, D. S. Smith, A. M. Deelder, M. Palmblad, C. H. Borchers, PeptidePicker: A scientific workflow with web interface for selecting appropriate peptides for targeted proteomics experiments. J. Proteomics 106, 151–161 (2014). doi:10.1016/j.jprot.2014.04.018 Medline
39. B. MacLean, D. M. Tomazela, N. Shulman, M. Chambers, G. L. Finney, B. Frewen, R. Kern, D. L. Tabb, D. C. Liebler, M. J. MacCoss, Skyline: An open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics 26, 966–968 (2010). doi:10.1093/bioinformatics/btq054 Medline
40. J. A. Vizcaíno, R. G. Côté, A. Csordas, J. A. Dianes, A. Fabregat, J. M. Foster, J. Griss, E. Alpi, M. Birim, J. Contell, G. O’Kelly, A. Schoenegger, D. Ovelleiro, Y. Pérez-Riverol, F. Reisinger, D. Ríos, R. Wang, H. Hermjakob, The PRoteomics IDEntifications (PRIDE) database and associated tools: Status in 2013. Nucleic Acids Res. 41, D1063–D1069 (2013). doi:10.1093/nar/gks1262 Medline
41. M. M. Chan, R. Santhanakrishnan, J. P. C. Chong, Z. Chen, B. C. Tai, O. W. Liew, T. P. Ng, L. H. Ling, D. Sim, K. T. G. Leong, P. S. D. Yeo, H.-Y. Ong, F. Jaufeerally, R. C.-C. Wong, P. Chai, A. F. Low, A. M. Richards, C. S. P. Lam, Growth differentiation factor 15 in heart failure with preserved vs. reduced ejection fraction. Eur. J. Heart Fail. 18, 81–88 (2016). doi:10.1002/ejhf.431 Medline
42. P. G. van Peet, A. J. de Craen, J. Gussekloo, W. de Ruijter, Plasma NT-proBNP as predictor of change in functional status, cardiovascular morbidity and mortality in the oldest old: The Leiden 85-plus study. Age (Dordr.) 36, 9660 (2014). doi:10.1007/s11357-014-9660-1 Medline
43. P. Langfelder, S. Horvath, WGCNA: An R package for weighted correlation network analysis. BMC Bioinformatics 9, 559 (2008). doi:10.1186/1471-2105-9-559 Medline
44. E. Ravasz, A. L. Somera, D. A. Mongru, Z. N. Oltvai, A. L. Barabási, Hierarchical organization of modularity in metabolic networks. Science 297, 1551–1555 (2002). doi:10.1126/science.1073374 Medline
45. S. L. Carter, C. M. Brechbühler, M. Griffin, A. T. Bond, Gene co-expression network topology provides a framework for molecular characterization of cellular state. Bioinformatics 20, 2242–2250 (2004). doi:10.1093/bioinformatics/bth234 Medline
46. M. C. Oldham, S. Horvath, D. H. Geschwind, Conservation and evolution of gene coexpression networks in human and chimpanzee brains. Proc. Natl. Acad. Sci. U.S.A. 103, 17973–17978 (2006). doi:10.1073/pnas.0605938103 Medline
47. R. Albert, H. Jeong, A.-L. Barabási, Error and attack tolerance of complex networks. Nature 406, 378–382 (2000). doi:10.1038/35019019 Medline
48. R. Albert, A.-L. Barabási, Statistical mechanics of complex networks. Rev. Mod. Phys. 74, 47–97 (2002). doi:10.1103/RevModPhys.74.47
49. J.-D. J. Han, N. Bertin, T. Hao, D. S. Goldberg, G. F. Berriz, L. V. Zhang, D. Dupuy, A. J. M. Walhout, M. E. Cusick, F. P. Roth, M. Vidal, Evidence for dynamically organized modularity in the yeast protein-protein interaction network. Nature 430, 88–93 (2004). doi:10.1038/nature02555 Medline
50. G. Chauhan, C. R. Arnold, A. Y. Chu, M. Fornage, A. Reyahi, J. C. Bis, A. S. Havulinna, M. Sargurupremraj, A. V. Smith, H. H. H. Adams, S. H. Choi, S. L. Pulit, S. Trompet, M. E. Garcia, A. Manichaikul, A. Teumer, S. Gustafsson, T. M. Bartz, C. Bellenguez, J. S. Vidal, X. Jian, O. Kjartansson, K. L. Wiggins, C. L. Satizabal, F. Xue, S. Ripatti, Y. Liu, J. Deelen, M. den Hoed, S. Bevan, J. C. Hopewell, R. Malik, S. R. Heckbert, K. Rice, N. L. Smith, C. Levi, P. Sharma, C. L. M. Sudlow, A. M. Nik, J. W. Cole, R. Schmidt, J. Meschia, V. Thijs, A. Lindgren, O. Melander, R. P. Grewal, R. L. Sacco, T. Rundek, P. M. Rothwell, D. K. Arnett, C. Jern, J. A. Johnson, O. R. Benavente, S. Wassertheil-Smoller, J.-M. Lee, Q. Wong, H. J. Aparicio, S. T. Engelter, M. Kloss, D. Leys, A. Pezzini, J. E. Buring, P. M. Ridker, C. Berr, J.-F. Dartigues, A. Hamsten, P. K. Magnusson, M. Traylor, N. L. Pedersen, L. Lannfelt, L. Lindgren, C. M. Lindgren, A. P. Morris, J. Jimenez-Conde, J. Montaner, F. Radmanesh, A. Slowik, D. Woo, A. Hofman, P. J. Koudstaal, M. L. P. Portegies, A. G. Uitterlinden, A. J. M. de Craen, I. Ford, J. W. Jukema, D. J. Stott, N. B. Allen, M. M. Sale, A. D. Johnson, D. A. Bennett, P. L. De Jager, C. C. White, H. J. Grabe, M. R. P. Markus, U. Schminke, G. B. Boncoraglio, R. Clarke, Y. Kamatani, J. Dallongeville, O. L. Lopez, J. I. Rotter, M. A. Nalls, R. F. Gottesman, M. E. Griswold, D. S. Knopman, B. G. Windham, A. Beiser, H. S. Markus, E. Vartiainen, C. R. French, M. Dichgans, T. Pastinen, M. Lathrop, V. Gudnason, T. Kurth, B. M. Psaty, T. B. Harris, S. S. Rich, A. L. deStefano, C. O. Schmidt, B. B. Worrall, J. Rosand, V. Salomaa, T. H. Mosley, E. Ingelsson, C. M. van Duijn, C. Tzourio, K. M. Rexrode, O. J. Lehmann, L. J. Launer, M. A. Ikram, P. Carlsson, D. I. Chasman, S. J. Childs, W. T. Longstreth, S. Seshadri, S. Debette; Neurology Working Group of the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium, the Stroke Genetics Network (SiGN), and the International Stroke Genetics Consortium, Identification of additional risk loci for stroke and small vessel disease: A meta-analysis of genome-wide association studies. Lancet Neurol. 15, 695–707 (2016). doi:10.1016/S1474-4422(16)00102-2
51. E. J. Foss, D. Radulovic, S. A. Shaffer, D. R. Goodlett, L. Kruglyak, A. Bedalov, Genetic variation shapes protein networks mainly through non-transcriptional mechanisms. PLOS Biol. 9, e1001144 (2011). doi:10.1371/journal.pbio.1001144 Medline
52. C. Gaiteri, Y. Ding, B. French, G. C. Tseng, E. Sibille, Beyond modules and hubs: The potential of gene coexpression networks for investigating molecular mechanisms of complex brain disorders. Genes Brain Behav. 13, 13–24 (2014). doi:10.1111/gbb.12106 Medline
53. D. L. Nicolae, E. Gamazon, W. Zhang, S. Duan, M. E. Dolan, N. J. Cox, Trait-associated SNPs are more likely to be eQTLs: Annotation to enhance discovery from GWAS. PLOS Genet. 6, e1000888 (2010). doi:10.1371/journal.pgen.1000888 Medline
54. Å. Johansson, S. Enroth, M. Palmblad, A. M. Deelder, J. Bergquist, U. Gyllensten, Identification of genetic variants influencing the human plasma proteome. Proc. Natl. Acad. Sci. U.S.A. 110, 4673–4678 (2013). doi:10.1073/pnas.1217238110 Medline
55. S. Kim, S. Swaminathan, M. Inlow, S. L. Risacher, K. Nho, L. Shen, T. M. Foroud, R. C. Petersen, P. S. Aisen, H. Soares, J. B. Toledo, L. M. Shaw, J. Q. Trojanowski, M. W. Weiner, B. C. McDonald, M. R. Farlow, B. Ghetti, A. J. Saykin; Alzheimer’s Disease Neuroimaging Initiative (ADNI), Influence of genetic variation on plasma protein levels in older adults using a multi-analyte panel. PLOS ONE 8, e70269 (2013). doi:10.1371/journal.pone.0070269 Medline
56. S. Enroth, A. Johansson, S. B. Enroth, U. Gyllensten, Strong effects of genetic and lifestyle factors on biomarker variation and use of personalized cutoffs. Nat. Commun. 5, 4684 (2014). doi:10.1038/ncomms5684 Medline
57. Y. Liu, A. Buil, B. C. Collins, L. C. Gillet, L. C. Blum, L.-Y. Cheng, O. Vitek, J. Mouritsen, G. Lachance, T. D. Spector, E. T. Dermitzakis, R. Aebersold, Quantitative variability of 342 plasma proteins in a human twin population. Mol. Syst. Biol. 11, 786 (2015). doi:10.15252/msb.20145728 Medline
58. B. B. Sun, J. C. Maranville, J. E. Peters, D. Stacey, J. R. Staley, J. Blackshaw, S. Burgess, T. Jiang, E. Paige, P. Surendran, C. Oliver-Williams, M. A. Kamat, B. P. Prins, S. K. Wilcox, E. S. Zimmerman, A. Chi, N. Bansal, S. L. Spain, A. M. Wood, N. W. Morrell, J. R. Bradley, N. Janjic, D. J. Roberts, W. H. Ouwehand, J. A. Todd, N. Soranzo, K. Suhre, D. S. Paul, C. S. Fox, R. M. Plenge, J. Danesh, H. Runz, A. S. Butterworth, Genomic atlas of the human plasma proteome. Nature 558, 73–79 (2018). doi:10.1038/s41586-018-0175-2 Medline
59. K. Suhre, M. Arnold, A. M. Bhagwat, R. J. Cotton, R. Engelke, J. Raffler, H. Sarwath, G. Thareja, A. Wahl, R. K. DeLisle, L. Gold, M. Pezer, G. Lauc, M. A. El-Din Selim, D. O. Mook-Kanamori, E. K. Al-Dous, Y. A. Mohamoud, J. Malek, K. Strauch, H. Grallert, A. Peters, G. Kastenmüller, C. Gieger, J. Graumann, Connecting genetic risk to disease end points through the human blood plasma proteome. Nat. Commun. 8, 14357 (2017). doi:10.1038/ncomms14357 Medline
60. I. Bhattacharya, Z. Manukyan, P. Chan, A. Heatherington, L. Harnisch, Application of Quantitative Pharmacology Approaches in Bridging Pharmacokinetics and Pharmacodynamics of Domagrozumab From Adult Healthy Subjects to Pediatric Patients With Duchenne Muscular Disease. J. Clin. Pharmacol. 58, 314–326 (2018). Medline
61. K. Kondás, G. Szláma, M. Trexler, L. Patthy, Both WFIKKN1 and WFIKKN2 have high affinity for growth and differentiation factors 8 and 11. J. Biol. Chem. 283, 23677–23684 (2008). doi:10.1074/jbc.M803025200 Medline
62. H. Sun, Y. Zhu, H. Pan, X. Chen, J. L. Balestrini, T. T. Lam, J. E. Kanyo, A. Eichmann, M. Gulati, W. H. Fares, H. Bai, C. A. Feghali-Bostwick, Y. Gan, X. Peng, M. W. Moore, E. S. White, P. Sava, A. L. Gonzalez, Y. Cheng, L. E. Niklason, E. L. Herzog, Netrin-1 Regulates Fibrocyte Accumulation in the Decellularized Fibrotic Sclerodermatous Lung Microenvironment and in Bleomycin-Induced Pulmonary Fibrosis. Arthritis Rheumatol. 68, 1251–1261 (2016). Medline
63. GTEx Consortium, The Genotype-Tissue Expression (GTEx) pilot analysis: Multitissue gene regulation in humans. Science 348, 648–660 (2015). doi:10.1126/science.1262110 Medline
64. J. Wang, D. Duncan, Z. Shi, B. Zhang, WEB-based GEne SeT AnaLysis Toolkit (WebGestalt): Update 2013. Nucleic Acids Res. 41, W77–W83 (2013). doi:10.1093/nar/gkt439 Medline
65. J. E. Shoemaker, T. J. S. Lopes, S. Ghosh, Y. Matsuoka, Y. Kawaoka, H. Kitano, CTen: A web-based platform for identifying enriched cell types from heterogeneous microarray data. BMC Genomics 13, 460 (2012). doi:10.1186/1471-2164-13-460 Medline
66. D. Warde-Farley, S. L. Donaldson, O. Comes, K. Zuberi, R. Badrawi, P. Chao, M. Franz, C. Grouios, F. Kazi, C. T. Lopes, A. Maitland, S. Mostafavi, J. Montojo, Q. Shao, G. Wright, G. D. Bader, Q. Morris, The GeneMANIA prediction server: Biological network integration for gene prioritization and predicting gene function. Nucleic Acids Res. 38 (suppl. 2), W214–W220 (2010). doi:10.1093/nar/gkq537 Medline
67. D. W. Huang, B. T. Sherman, R. A. Lempicki, Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4, 44–57 (2009). doi:10.1038/nprot.2008.211 Medline
68. D. Welter, J. MacArthur, J. Morales, T. Burdett, P. Hall, H. Junkins, A. Klemm, P. Flicek, T. Manolio, L. Hindorff, H. Parkinson, The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 42 (D1), D1001–D1006 (2014). doi:10.1093/nar/gkt1229 Medline