Edinburgh Research Explorer

Identification of genetic variants affecting vitamin D receptor binding and associations with autoimmune disease

Citation for published version: Gallone, G, Haerty, W, Disanto, G, Ramagopalan, SV, Ponting, CP & Berlanga-Taylor, AJ 2017, 'Identification of genetic variants affecting vitamin D receptor binding and associations with autoimmune disease', Human Molecular Genetics, vol. 26, no. 11, pp. 2164-2176. https://doi.org/10.1093/hmg/ddx092

Digital Object Identifier (DOI): 10.1093/hmg/ddx092

Link: Link to publication record in Edinburgh Research Explorer

Document Version: Publisher's PDF, also known as Version of record

Published In: Human Molecular Genetics

Publisher Rights Statement: This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

General rights: Copyright for the publications made accessible via the Edinburgh Research Explorer is retained by the author(s) and / or other copyright owners and it is a condition of accessing these publications that users recognise and abide by the legal requirements associated with these rights.

Take down policy: The University of Edinburgh has made every reasonable effort to ensure that Edinburgh Research Explorer content complies with UK legislation. If you believe that the public display of this file breaches copyright please contact openaccess@ed.ac.uk providing details, and we will remove access to the work immediately and investigate your claim.

Download date: 20. Feb. 2020



ASSOCIATION STUDIES ARTICLE

Identification of genetic variants affecting vitamin D receptor binding and associations with autoimmune disease

Giuseppe Gallone1,2,3, Wilfried Haerty1,2,‡, Giulio Disanto2, Sreeram V. Ramagopalan4,¶,†, Chris P. Ponting1,2,5,† and Antonio J. Berlanga-Taylor6,7,8,†

1MRC Functional Genomics Unit, 2Department of Physiology, Anatomy and Genetics, University of Oxford, South Parks Road, Oxford OX1 3PT, UK, 3Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK, 4Real-World Evidence, Evidera, London W6 8DL, UK, 5MRC Human Genetics Unit, The Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, UK, 6Wellcome Trust Centre for Human Genetics, Nuffield Department of Clinical Medicine, University of Oxford, Oxford OX3 7BN, UK, 7CGAT, MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford OX1 3PT, UK and 8MRC-PHE Centre for Environment and Health, Department of Epidemiology & Biostatistics, School of Public Health, Faculty of Medicine, Imperial College London, St Mary's Campus, Norfolk Place, London W2 1PG, UK

To whom correspondence should be addressed. Tel: +44 1223498698; Email: giuseppe.gallone@sanger.ac.uk (G.G.); Tel: +44 7915490167; Email: sreeram@ramagopalan.net (S.V.R.); Tel: +44 7504609744; Email: a.berlanga@imperial.ac.uk (A.J.B.-T.)

Abstract

Large numbers of statistically significant associations between sentinel SNPs and case-control status have been replicated by genome-wide association studies. Nevertheless, few underlying molecular mechanisms of complex disease are currently known. We investigated whether variation in binding of a transcription factor, the vitamin D receptor (VDR), whose activating ligand vitamin D has been proposed as a modifiable factor in multiple disorders, could explain any of these associations. VDR modifies gene expression by binding DNA as a heterodimer with the Retinoid X receptor (RXR). We identified 43,332 genetic variants significantly associated with altered VDR binding affinity (VDR-BVs) using a high-resolution (ChIP-exo) genome-wide analysis of 27 HapMap lymphoblastoid cell lines. VDR-BVs are enriched in consensus RXR:VDR binding motifs, yet most fell outside of these motifs, implying that genetic variation often affects the binding affinity only indirectly. Finally, we compared 341 VDR-BVs replicating by position in multiple individuals against background sets of variants lying within VDR-binding regions that had been matched in allele frequency and were independent with respect to linkage disequilibrium. In this stringent test, these replicated VDR-BVs were significantly (q < 0.1) and substantially (>2-fold) enriched in genomic intervals associated with autoimmune and other diseases, including inflammatory bowel disease, Crohn's disease and rheumatoid arthritis. The approach's validity is underscored by RXR:VDR motif sequence being predictive of binding strength and being evolutionarily constrained. Our findings are consistent with altered RXR:VDR binding contributing to immunity-related diseases. Replicated VDR-BVs associated with these disorders could represent causal disease risk alleles whose effect may be modifiable by vitamin D levels.

‡Present address: Earlham Institute, Norwich Research Park, Norwich NR4 7UZ, UK.

¶Present address: Centre for Observational Research and Data Sciences, Bristol-Myers Squibb, Uxbridge UB8 1DH, UK.

†These authors equally contributed to this work.

Received: October 16, 2016. Revised: February 28, 2017. Accepted: March 7, 2017

© The Author 2017. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Human Molecular Genetics, 2017, Vol. 26, No. 11, 2164–2176

doi: 10.1093/hmg/ddx092. Advance Access Publication Date: 9 March 2017. Association Studies Article

Introduction

Genetic variants can alter transcription factor (TF) binding, chromatin status and gene expression (1–3) and can be explanatory of disease susceptibility variation. Rather than residing in protein-coding sequence, the majority (approximately 93%) of disease- or trait-associated variants lie in non-coding sequence (4). Furthermore, non-coding variants in regulatory elements explain 8-fold more heritability of complex traits than protein-coding variants (5). If we are to better understand the mechanistic bases to complex disease (and their interaction with environmental factors), then we will need to understand how sequence differences in TF binding sites (TFBSs) across multiple genotypes contribute to disease susceptibility. Variations in TF affinities will then need to be cross-referenced not just with the sentinel variant from genome-wide association studies (GWASs), which has a very low probability (approximately 5%) of being causal and is, on average, 14 kb from the true causal variant (6), but also with all other variants with which it lies in strong linkage disequilibrium (LD). True causal variants will be enriched in cis-regulatory elements, especially in enhancers and, to a lesser extent, in gene promoters (6). These elements are often active in only a few cell types (7) and at limited developmental time points (8), which provides an opportunity for relating genetic variation, via molecular observations (TF binding differences), to cellular or organ-based aetiologies.

Understanding complex disease will also need the determination of how functional genetic variants interact with environmental factors. One such factor is vitamin D, a class of fat-soluble secosteroids that are synthesised in the skin upon exposure to sunlight and that enhance intestinal absorption of dietary minerals. Many observational studies have yielded associations between low serum concentrations of 25-hydroxyvitamin D [25(OH)D] and the risks of developing diverse diseases (9,10). However, Mendelian randomisation studies, for example on susceptibility to multiple sclerosis (11) and all-cause mortality (12), have indicated a causal role for this hormone. Identifying a molecular basis to these statistical associations would also indicate a direct causal role for vitamin D levels in these diseases. Vitamin D signalling occurs principally following the binding of calcitriol, the active form of vitamin D, to its cognate nuclear vitamin D receptor (VDR), which binds DNA in a heterodimer with the Retinoid X receptor (RXR) (13). We have previously shown that VDR binding in lymphoblastoid cell lines (LCLs) occurs at 2,776 locations and preferentially (>2-fold) within intervals genetically associated with diseases such as Crohn's disease and multiple sclerosis (14). On one hand, these enrichments could indicate a direct modifying effect of VDR binding on disease risk. On the other hand, they could more trivially reflect a general association between regulatory regions and disease-associated genomic intervals. These enrichments of VDR binding within these intervals are also not informative about whether altered VDR binding to DNA functional elements explains genetic contributions to disease risk. If this is the case, we then also wish to determine whether it is a gain or loss of VDR binding that causally increases disease risk.

In this study, we inferred VDR-binding sites using ChIP-exo (chromatin immunoprecipitation combined with lambda exonuclease digestion followed by high-throughput sequencing (15,16)) from each of 27 LCL samples using three complementary approaches. The first, ‘peak-calling’, method models the variation in read density. The second and third approaches predict VDR-binding sites using genetic variation to explain differences in VDR-binding affinity, either by quantitative trait loci association testing (‘QTL’) across all samples or by allele-specific binding (‘ASB’) analysis of allelic imbalance based on read depth at sites with heterozygous single nucleotide variants. Sequence variants were identified that both alter VDR binding and have been statistically associated by GWAS with altered risk for particular diseases (or are in strong LD with such variants). These are excellent candidates for sequence variants that, through their alteration of VDR binding, directly alter disease susceptibility. Compared against a background of all VDR-binding sites, we found that human variants associated with variable VDR binding are enriched (by up to 2-fold) in genomic intervals previously associated with particular traits, including some autoimmune diseases.

Results

ChIP-exo yields finely resolved VDR binding peaks genome-wide

Calcitriol-stimulated lymphoblastoid cells for 30 HapMap samples were grown and prepared for ChIP essentially as described previously (14), with modifications for ChIP-exo (15,16) (Methods; Supplementary Material, Table S1). In total, 27 samples successfully passed quality checking (Supplementary Material). Unlike for data from the more traditional ChIP-seq approach, analysis of ChIP-exo data has yet to become standardised. We thus undertook an in-depth study using contrasting approaches to modelling the ChIP-exo data background, to handling duplicate reads, to performing cross-correlation analyses and to calling peaks; these approaches are discussed in detail in the Supplementary Material (see also Supplementary Material, Figs S1 and S2, Tables S2–S9).

VDR peaks were both highly reproducible between samples, with 15,509 intervals containing peaks called in at least 3 samples (CPo3 peak set) (Fig. 1A and B; Supplementary Material, Figs S3 and S4), and highly concordant with our previously published VDR ChIP-seq data for calcitriol-stimulated LCLs (CPo3: 76.3%; Fig. 1C; Supplementary Material, Table S11) (14). For peaks identified in both experiments, binding intervals were considerably better resolved (typically >5-fold) using ChIP-exo (Supplementary Material, Fig. S2A; pairwise Kruskal-Wallis test, P < 0.001).
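The idea of a consensus set such as CPo3 (intervals supported by peaks in at least three samples) can be illustrated with a short sketch. This is not the authors' pipeline, which is described in their Supplementary Material; it assumes a single chromosome, half-open intervals, non-overlapping peaks within each sample, and a hypothetical function name `consensus_intervals`.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def consensus_intervals(peaks_by_sample: Dict[str, List[Interval]],
                        min_samples: int = 3) -> List[Interval]:
    """Return regions covered by peaks from at least `min_samples` samples.

    Assumes peaks within one sample do not overlap each other, so the
    coverage depth at a position equals the number of supporting samples.
    """
    events = []
    for peaks in peaks_by_sample.values():
        for start, end in peaks:
            events.append((start, 1))   # a peak opens
            events.append((end, -1))    # a peak closes
    # At equal positions, closings (-1) sort before openings (+1),
    # so intervals that merely touch are not counted as overlapping.
    events.sort()

    consensus = []
    depth = 0
    region_start = None
    for pos, delta in events:
        depth += delta
        if depth >= min_samples and region_start is None:
            region_start = pos
        elif depth < min_samples and region_start is not None:
            consensus.append((region_start, pos))
            region_start = None
    return consensus


toy_peaks = {
    "sample_a": [(100, 200)],
    "sample_b": [(150, 250)],
    "sample_c": [(180, 300)],
    "sample_d": [(500, 600)],
}
cpo3_like = consensus_intervals(toy_peaks, min_samples=3)
```

Lowering `min_samples` widens the consensus set, which mirrors the nesting of the CPo3, CPo10 and CPo20 sets: every region supported by ten samples is also supported by three.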

As expected, VDR binding peaks occurred preferentially at protein-coding transcriptional start sites (TSS) (Fig. 1D) and within regions of open chromatin (at LCL DNase I hypersensitivity sites, DHS), especially for those observed repeatedly (Supplementary Material, Figs S5 and S6). VDR binding peaks were significantly concentrated within proposed functional elements, specifically regions of high regulatory factor binding (HOT regions (17) and clustered TF binding sites from ENCODE), active promoters, enhancers and insulators (Fig. 1E; Supplementary Material, Fig. S7), and previously reported GWAS disease loci, principally for autoimmune disorders (Supplementary Material, Fig. S8). Notably, this latter observation could reflect a causal effect in which VDR binding influences disease susceptibility, and/or gene loci being correlated non-causally with both disease susceptibility intervals and VDR binding sites.

VDR binding motifs occur preferentially in enhancers

De novo motif analyses of peaks within ChIP-exo VDR binding intervals revealed motifs strongly resembling those for the RXR:VDR DR3 heterodimer (18,19) (Fig. 2A; Supplementary Material, Fig. S9A and Table S12). For example, 9,899 (63.8% of the CPo3 set) instances of this DR3 heterodimer motif occurred in binding sites (PScanChIP (20) score >0.7). 874 instances of the monomeric VDR motif were also identified in these peak intervals (Supplementary Material, Fig. S9B; PScanChIP score = 1), which are known to have limited affinity for functional dimeric nuclear hormone receptor complexes (18). The heterodimeric DR3 motif was strongly over-represented (Supplementary Material, Table S12) both locally (in the peak sequences compared to neighbouring regions) and globally (in the peak regions compared to LCL DHS sites (20)), and was significantly centrally enriched in the binding regions (Fig. 2A).

VDR binding sites could be separated into functionally distinct classes based on the presence of RXR:VDR DR3-like motifs. Class I sites contain strong DR3-like motifs (top 20% of the PScanChIP score distribution; 3,094 regions; Supplementary Material, Fig. S10A) and tend to be located away from genes' transcriptional start sites (TSSs) and yet close to genes involved in immune response processes (Fig. 2B and C). Class I VDR binding sites show the greatest significance of proximity to genomic regions associated with cancer and autoimmune diseases (Fig. 2C). We compare these to Class II sites, which contain only weak or no DR3-like motifs (bottom 20% of PScanChIP scores; 3,102 regions; Supplementary Material, Fig. S10A) (Fig. 2B and D). Class I binding events occur preferentially in strong or weak enhancers defined by ENCODE, rather than in promoters (Supplementary Material, Fig. S10B and C).
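The class I / class II split by the top and bottom 20% of the motif score distribution can be sketched as follows. This is an illustrative reading of the text, not the authors' code: the function name `classify_by_motif_score`, the region identifiers and the toy scores are all hypothetical, and the quintile cut points use Python's `statistics.quantiles` with the inclusive method.

```python
from statistics import quantiles
from typing import Dict


def classify_by_motif_score(scores: Dict[str, float]) -> Dict[str, str]:
    """Label each region 'I' (top 20% of scores), 'II' (bottom 20%)
    or 'intermediate' (everything in between)."""
    # n=5 yields four cut points: the 20th, 40th, 60th and 80th percentiles.
    cuts = quantiles(scores.values(), n=5, method="inclusive")
    low_cut, high_cut = cuts[0], cuts[-1]
    labels = {}
    for region, score in scores.items():
        if score >= high_cut:
            labels[region] = "I"
        elif score <= low_cut:
            labels[region] = "II"
        else:
            labels[region] = "intermediate"
    return labels


# Toy motif scores for eleven hypothetical regions, r0 (weakest) to r10 (strongest).
toy_scores = {f"r{i}": float(i) for i in range(11)}
toy_classes = classify_by_motif_score(toy_scores)
```

Ties at a cut point are assigned to the extreme class here; a real analysis would need to decide how to treat regions with no detectable motif at all.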

Thousands of variants explain differential VDR binding

Figure 1. ChIP-exo peaks are reproducible among samples and with previous ChIP-seq results, and are enriched in promoters and enhancers. (A) Numbers of overlapping ChIP-exo peaks across 27 LCL samples (log scale). Highlighted rows indicate numbers of peaks called in at least three (CPo3), ten (CPo10) or twenty (CPo20) samples. (B) Normalised peak read depths (63,62,78) (Supplementary Material) are concordant between samples. VDR ChIP-exo peak binding affinity heatmap showing pairwise binding affinity correlation among all 27 sample pairs. (Inset) histogram of correlation counts. (C) 50-fold enriched overlap between CPo3 peaks and the consensus ChIP-seq peakset from (14) (enrichment test: GAT (64), Supplementary Material). (D) Meta-gene profile of VDR binding reads. The panel shows a meta-profile of read pile-up for the pooled 27 LCL samples, averaged over gene bodies (Ensembl v. 75 protein-coding genes). (E) Enrichment of VDR binding sites in the CPo3 consensus with ENCODE chromatin states (chromHMM (79)) from LCL NA12878, TF-dense regions (HOT (17) and ENCODE clustered TF binding sites) and DNase I hypersensitive areas (ENCODE). Asterisk denotes significance at 1% False Discovery Rate (FDR) threshold (enrichment test: GAT (64), Supplementary Material).

We next used the QTL and ASB methods to explain differences in VDR binding affinity (Supplementary Material). The QTL analysis used Bayesian regression modelling of variants underlying binding peaks (21,22) (Supplementary Material, Tables S13 and S14; Supplementary Material, Figs S12–S15). The ASB analysis of allelic imbalance is based on read depth at heterozygous single-nucleotide variants (23) and uses sample-specific maternal and paternal reference maps, which we corrected for unannotated copy number variation and from which we excluded variants in ENCODE blacklisted repeat regions (24). The two methods' variants associated with differences in VDR binding (VDR-Binding Variants, VDR-BVs) were annotated and prioritised based on the guidelines proposed by (25) and related algorithms (Funseq2 (26)).
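Allelic imbalance at a heterozygous site is commonly assessed by asking whether the split of reads between the two alleles departs from the 50:50 expected under equal binding. The sketch below uses a plain exact two-sided binomial test as a stand-in; this is an assumption made for illustration, and the authors' actual ASB procedure (including the copy-number corrections mentioned above) is in their Supplementary Material. The function name and the read counts are hypothetical.

```python
from math import comb


def binomial_two_sided_p(ref_reads: int, alt_reads: int,
                         p_ref: float = 0.5) -> float:
    """Exact two-sided binomial P-value for allelic imbalance.

    Sums the probabilities of all outcomes that are no more likely
    than the observed split of reads between the two alleles.
    """
    n = ref_reads + alt_reads
    pmf = [comb(n, k) * p_ref ** k * (1 - p_ref) ** (n - k)
           for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Small relative tolerance guards against floating-point ties.
    total = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, total)


balanced_p = binomial_two_sided_p(5, 5)     # reads split evenly
imbalanced_p = binomial_two_sided_p(10, 0)  # all reads carry one allele
```

With only ten reads, even a complete imbalance gives a P-value of about 0.002, which shows why read depth limits the power of this kind of test and why multiple-testing correction across many sites matters.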

We report a total of 43,332 VDR-BVs. Of these, only 6.6–10.5% (2,867 or 4,447) lie within (or ‘hit’) RXR:VDR consensus motifs (PScanChIP score >0.7 or >0.5, respectively). This could be explained in part by variants that alter DNA-binding affinity either for other subunits of a larger multi-molecular complex (‘collaborative binding’ (27)) or for factors that inhibit RXR:VDR binding.

Figure 2. VDR binding intervals are immune-associated or are housekeeping-associated, tending to be either present far from, or adjacent to, transcriptional start sites, respectively. (A) The most highly ranking motif cluster ensuing from a de novo motif analysis of the VDR CPo3 consensus intervals using MEME-ChIP (65), including a CentriMO (80) analysis of enrichment with respect to VDR peak centres. (B–D) GREAT analyses (81) for subsets of VDR peaks from the CPo3 consensus set containing either a strong-to-perfect (class I) or weak-to-non-existent (class II) instance of the RXR:VDR heterodimer motif. (B) Distributions of distances between known TSS sites (UCSC hg19, Feb 2009) and VDR binding locations for classes I or II, or all VDR peaks. (C,D) Enrichment for Gene Ontology Biological Process and Disease Ontology terms for class I (C) or class II (D) RXR:VDR containing sites, respectively. For (C) and (D), bars map to fold change, whereas Hinton plots map to -log10 of the enrichment significance P-value.

Consequently, we next considered whether sequence variation in motifs for potential binding cofactors of RXR:VDR could explain the remaining >90% of binding variation. We found that position weight matrices for 26 additional TFs (Supplementary Material, Tables S15 and S16) were significantly enriched (P < 0.01): (i) relative to both open chromatin genome-wide and chromosomally adjacent regions (<150 bp), and/or (ii) at the peak summits within VDR-BV intervals (PScanChIP score >0.5 or 0.7; Supplementary Methods). Virtually all (84.1–88.3%) of binding variation could be explained by VDR-BVs lying within a RXR:VDR consensus motif or within one or more of these 26 additional motifs (PScanChIP score >0.7 or >0.5). These are upper-bound values because of the inaccuracy of motif prediction and because not all variants lying in motifs will alter binding affinity.

Functional annotation of VDR-BVs

To begin to understand whether VDR-BVs could contribute to particular human diseases, we first considered their enrichment in relevant genomic annotations. We were interested in whether these variably-bound sites are enriched relative to a background of all VDR binding regions genome-wide, so that the non-uniform chromosomal distribution of binding sites is accounted for. We chose for our null expectation all variants that were tested for differential binding within replicated (CPo3) VDR binding peaks (Supplementary Material). Relative to this stringent background, VDR-BVs are 20–40% enriched within enhancers, but not within promoters (Fig. 3A). These patterns of enrichment are reinforced when we stringently select only the most significant VDR-BVs (VDR-sBVs; Supplementary Material, Fig. S16A; AlleleSeq FDR threshold of 0.01; 6,715 VDR-BVs) or the 357 recurrent VDR-BVs that are called at the same position in multiple samples (VDR-rBVs; Supplementary Material, Fig. S16D). We conclude that VDR-BVs are not uniformly distributed among all VDR binding genomic intervals and are especially frequent in enhancer regions manifested in LCLs.

VDR-BVs occurred 60% more frequently in RXR:VDR motifs within replicated (CPo3) VDR binding peaks, and 120% more frequently in strong (class I) RXR:VDR motifs in the same regions (Fig. 3B). Again, these enrichments strengthened substantially for VDR-sBVs or VDR-rBVs (Supplementary Material, Fig. S16B and E, respectively).

We then considered whether VDR-BVs are significantly enriched within LCL expression QTLs (eQTLs) relative to 1000 Genomes variants under VDR ChIP-exo pileups tested for VDR binding variation potential (Supplementary Material). In addition, these background variants were matched to the VDR-BVs for derived allele frequency and for LD dependence, both of which are potential confounders (Methods). We also took care to only consider variants that do not share strong linkage disequilibrium (32,183 VDR-BVs, 5,808 VDR-sBVs and 341 VDR-rBVs).
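Matching a background to the foreground by derived allele frequency (DAF) can be sketched as drawing, for every foreground variant, one background variant from the same DAF bin, with replacement, and repeating this many times to build a null distribution of overlap counts. The code below is a minimal illustration under assumptions: the bin count, the function names and the toy variants are hypothetical, and LD pruning is not modelled. The authors' procedure is given in their Methods and Supplementary Material.

```python
import random
from collections import defaultdict
from typing import List, Optional, Sequence, Tuple


def daf_bin(daf: float, n_bins: int = 10) -> int:
    """Map a derived allele frequency in [0, 1] to a bin index."""
    return min(int(daf * n_bins), n_bins - 1)


def sample_matched_background(foreground_dafs: Sequence[float],
                              background: Sequence[Tuple[str, float]],
                              n_bins: int = 10,
                              rng: Optional[random.Random] = None) -> List[str]:
    """Draw one background variant per foreground variant from the same
    DAF bin, sampling with replacement."""
    rng = rng or random.Random(0)
    by_bin = defaultdict(list)
    for variant_id, daf in background:
        by_bin[daf_bin(daf, n_bins)].append(variant_id)
    matched = []
    for daf in foreground_dafs:
        pool = by_bin.get(daf_bin(daf, n_bins))
        if pool:  # foreground variants with an empty bin are skipped
            matched.append(rng.choice(pool))
    return matched


def empirical_p(observed: int, null_counts: Sequence[int]) -> float:
    """One-sided empirical P-value with the usual +1 correction."""
    exceed = sum(1 for count in null_counts if count >= observed)
    return (exceed + 1) / (len(null_counts) + 1)


toy_background = [("b1", 0.05), ("b2", 0.07), ("b3", 0.55), ("b4", 0.95)]
toy_matched = sample_matched_background([0.02, 0.58], toy_background)
```

In use, each matched set would be intersected with the annotation of interest (for example eQTL positions) to give one null overlap count, and `empirical_p` would compare the observed foreground overlap against those counts.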

VDR-BVs were enriched with dsQTLs (DNase I sensitivity QTLs; 1.4-fold) and eQTLs (1.1–1.2-fold; Fig. 3C), particularly for the more stringent VDR-BV subsets (Supplementary Material, Fig. S16C and F). The enrichment in eQTLs was confirmed using an independent eQTL resource (GEUVADIS LCL eQTLs (28); Supplementary Material).

Complex trait-associated variants influence VDR binding

Figure 3. Genomic association testing of VDR-BVs with functional annotation. VDR-BVs are enriched at enhancers, in RXR:VDR DR3-type motifs and in LCL eQTLs. (A) Enrichment of VDR-BVs within ENCODE chromatin state segmentation tracks for LCL NA12878, TF-dense regions (HOT (17) and ENCODE clustered TF binding sites) and DNase hypersensitive areas (ENCODE). (B) Enrichment of VDR-BVs in VDR consensus motif intervals present in VDR CPo3 binding regions. (C) Enrichment of VDR-BVs at DNase-QTLs and eQTLs from the Pritchard resource and CEU/YRI LCL eQTLs from the GEUVADIS resource. Significant enrichments are indicated using blue histogram bars (fc = fold change of observed versus expected overlaps; FDR q < 0.1; numbers in a box close to each bar indicate ‘observed [expected]’ overlap counts; light grey bars indicate lack of significance). For A, enrichments shown are above-and-beyond the previously observed (Figure 1E) enrichments of VDR CPo3 peaks in the same functional annotation classes (relative to a background of 20,330 1000 Genomes SNPs in CPo3 VDR binding regions). For B, enrichments are relative to a background of 114,155 1000 Genomes variants lying under VDR ChIP-exo read pileups (≥5 reads), which had been tested as potential VDR binding affinity modifiers. This background was further corrected for the analyses in C, where only LD-independent foreground VDR-BVs were tested and 10,000 DAF-matched random background sets were extracted with replacement from the main set of 114,155 background variants.

To investigate whether susceptibility to specific diseases could be influenced by VDR-BVs, we considered their locations relative to GWAS variants significantly associated with 472 diverse diseases or traits from 688 studies in the largest catalogue available (GRASP v2.0 (29)), with at least 5 associated genome-wide significant SNPs (P < 5 × 10⁻⁸) (Supplementary Material). All traits available in these catalogues were considered so as not to bias subsequent findings. Again, we employed stringent procedures, matching variants for allele frequency and LD-dependence and discarding VDR-BVs in the MHC region, as well as again comparing against a background of all VDR-binding regions (‘Test 3’, p. 34 of the Supplementary Material).

VDR-BVs were significantly, and up to 2-fold, enriched within LD intervals associated with 17 diseases or traits (Fig. 4). It is notable that these are not drawn randomly from all 472 traits considered, but are mostly immune- or inflammatory-related disorders such as sarcoidosis, Graves' disease, Crohn's disease, irritable bowel syndrome and type I diabetes (Fig. 4A). The most enriched trait is Alzheimer's disease, for which 25-hydroxyvitamin D level is a proposed causal risk factor (30).
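Testing hundreds of traits at once calls for multiple-testing control, and the figures in this paper report enrichment as a fold change of observed over expected overlaps together with an FDR q-value. The sketch below shows one standard way to compute both quantities; the Benjamini-Hochberg procedure is used here as an assumption, since the text does not state which FDR method was applied, and the function names and numbers are hypothetical.

```python
from math import log2
from typing import List, Sequence


def log2_fold_change(observed: float, expected: float) -> float:
    """log2 of observed over expected overlap counts."""
    return log2(observed / expected)


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg q-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values


toy_p = [0.01, 0.04, 0.03, 0.5]  # one P-value per hypothetical trait
toy_q = benjamini_hochberg(toy_p)
toy_lfc = log2_fold_change(10, 5)
```

A trait would then be reported only if its q-value falls below the chosen threshold and, as in the figure legend, if it is supported by more than one variant.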

For the 341 VDR-rBVs, intervals from six disorders contained more VDR-rBVs than expected from sets of DAF-matched, LD-accounted regions that bind VDR (Fig. 4B). We emphasise that this represents a highly stringent analysis that accounts for all known confounding effects, namely potential biases from called VDR-binding sites, population stratification and physical linkage.

Functional impact of genetic variation on loss or gain of VDR binding

This evidence is consistent with altered RXR:VDR binding contributing to immunity-related diseases. Nevertheless, because variants in TFBS often fail to alter binding affinity and/or have no measurable effect on gene expression levels (31,32), we needed to demonstrate that VDR-BVs are functional, specifically that they influence gene expression and have been negatively selected over evolution (33). As expected if they are enriched in functional variants, we found that 50–100% more VDR-BVs than random samples lie near (<10 kb) genes that are differentially expressed in LCLs upon addition of calcitriol (14) (P < 10⁻⁴; Supplementary Methods).

Figure 4. VDR-BVs are enriched in GWAS LD blocks for autoimmune, inflammatory and other diseases. (A,B) Traits and diseases showing enrichment of VDR-BVs (A) or VDR-rBVs (B). Disease/trait associations were acquired from the GRASP catalog v2.0 (29). Statistical association was computed between VDR-BVs and strong-LD (Supplementary Material) intervals around genome-wide significant (P < 5 × 10⁻⁸) disease tag-SNPs from the GRASP catalog. All available GRASP diseases and traits represented by at least 5 SNPs were analysed and those showing significant enrichment of VDR binding beyond a 1% FDR threshold were retained; disease associations supported by only 1 VDR-BV were discarded (full data are presented in the Supplementary Material). The sizes of black squares indicate the statistical significance of enrichments.

Also as expected if, as a class, they are functional, replicated VDR-binding peaks exhibit variable cross-species conservation (34) across RXR:VDR motif positions (Supplementary Material, Fig. S17A and B). This signature of uneven selection across the DR3 motif is evident for class I VDR binding sites (Supplementary Material, Fig. S17C and D; Kruskal-Wallis test, P = 1.9 × 10⁻¹¹, and post-hoc pairwise Kruskal-Wallis comparisons) but not in class II sites (Kruskal-Wallis test, P = 0.14), consistent with the DR3 heterodimer-binding motif regulating transcription.

Next, we assessed the impact on VDR binding affinity of VDR-BVs with respect to their location within the consensus DR3-type motif (JASPAR database; Fig. 5), the transcriptionally active binding site motif for the RXR:VDR heterodimer (35–37). Mapping of allele-specific ChIP-exo reads to multiply replicated VDR binding regions (CPo3 peaks) was highly predictive of whether variants substantially alter ('break') functional motifs. For 89% of observations, the change in the strength of the VDR binding motif correctly predicts the direction of VDR binding affinity change (Fig. 5A), a proportion that rises to 100% in class I VDR binding sites (Supplementary Material, Fig. S18A). Motif breaks do not favour 5′ RXR over 3′ VDR hexamers (Fig. 5A; Wilcoxon rank sum test, P = 0.71), but especially impact conserved G and C nucleotides (positions 2, 5, 11 and 14) and a T nucleotide (position 13). Notably, fewer events occur at the near-essential T nucleotides at positions 4 and 12. Interestingly, in class I sites there are no VDR-BVs disrupting the T nucleotide at position 12 (Supplementary Material, Fig. S18A), which could reflect either low mutation rates (mutational 'cold spots') or strong purifying selection against deleterious nucleotide substitutions at these binding sites (38).
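The concordance check described above (does the change in motif strength predict the direction of the binding change?) can be illustrated with a minimal sketch. The probability matrix below is a toy half-site with made-up values, not the JASPAR RXR:VDR matrix, and the function names `pwm_score` and `is_concordant` are hypothetical; the study itself assessed motif breaks via FunSeq2.

```python
import math

# Toy position probability matrix for one hexameric half-site.
# Values are invented for illustration; this is NOT the JASPAR RXR:VDR matrix.
TOY_PPM = [
    {"A": 0.60, "C": 0.05, "G": 0.30, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.45, "T": 0.45},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
]


def pwm_score(seq, ppm=TOY_PPM, background=0.25):
    """Log2-odds score of seq against the matrix, with a uniform background."""
    if len(seq) != len(ppm):
        raise ValueError("sequence length must equal motif length")
    return sum(math.log2(col[base] / background) for col, base in zip(ppm, seq))


def is_concordant(ref_seq, alt_seq, ref_reads, alt_reads):
    """True if the motif score and the read depth change in the same direction."""
    delta_motif = pwm_score(alt_seq) - pwm_score(ref_seq)
    # A pseudocount of 1 avoids division by zero for alleles with no reads.
    delta_binding = math.log2((alt_reads + 1) / (ref_reads + 1))
    return delta_motif * delta_binding > 0


# The alternative allele weakens position 5 (C to A) and also carries fewer reads.
print(is_concordant("AGGTCA", "AGGTAA", ref_reads=40, alt_reads=10))
```

Counting the fraction of variants for which such a check holds gives a concordance proportion of the kind reported in Fig. 5A.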

To distinguish these possibilities, we next compared the population frequencies of alleles associated with either historical Loss Of Binding (LOB) or Gain of Binding (GOB) affinity to VDR (Methods). LOB VDR-BVs cause significantly stronger effects when within either 5′ or 3′ hexamer than within the DR3

[Figure 5, panels A–D. Horizontal axis in all panels: motif positions 1–15 (5′ hexamer, RXR; DR3 spacer; 3′ hexamer, VDR). (A) "VDR-BV - concordance of variant genotype and binding affinity phenotype"; vertical axis: motif break events; legend: genotype and phenotype are concordant / discordant. (B) Motif logo; vertical axis: bits. (C) "VDR-BV (LOB) - impact on binding affinity"; legend: no asym binding, LOB + Motif Break, LOB. (D) "VDR-BV (GOB) - impact on binding affinity"; legend: no asym binding, GOB. Vertical axis in C and D: Magnitude of VDR binding affinity change.]

Figure 5. Genetic variation modulating VDR binding affinity in LCLs within the canonical RXR:VDR heterodimeric binding motif. The figure shows a genome-wide quantification of the effect of genetic variation on VDR binding, based on the analysis of VDR-BVs in CPo3 binding peaks which hit the canonical RXR:VDR DR3 heterodimeric consensus motif (motif shown in B; JASPAR database (19)). (A) Distribution across the RXR:VDR motif of VDR-BVs which cause significant motif disruption (significance assessed via FunSeq2 using TFMPvalue (82) with default threshold P < 4 × 10⁻⁸) and quantification of the concordance between directionality of VDR PWM disruption and directionality of resulting VDR binding affinity variation: 102/115 (89%) VDR-BVs in CPo3 VDR peaks that significantly break the RXR:VDR motif predict the direction of VDR binding affinity change. (C,D) Genome-wide quantification of the phenotypic effect of all VDR-BVs intersecting the RXR:VDR motif (including those which do not generate a motif break at the above significance level). The vertical axis (Magnitude of VDR binding affinity change) indicates the fold change of read depth of the ancestral versus the derived allele (C) or derived versus the ancestral allele (D). (C) Impact on VDR binding affinity of Loss of Binding (LOB, orange dots) VDR-BVs and Loss of Binding VDR-BVs which test for significant RXR:VDR motif break (red dots). (D) Impact on VDR binding affinity of Gain of Binding (GOB) VDR-BVs (blue dots). Nucleotide position has a significant influence on LOB (Kruskal–Wallis rank sum test, χ²(14) = 33.385, P < 0.01) but not on GOB (χ²(12) = 17.68, P = 0.12) phenotype magnitude. (C,D) Grey dots indicate impact on binding affinity of those 1000 Genomes variants carried by the LCL samples which do not test for significant asymmetric binding (AlleleSeq (23), FDR threshold = 0.1) and are not therefore VDR-BVs.


spacer sequence (Wilcoxon rank sum test, P < 0.01; GOB variants, P = 0.11). VDR-BVs affecting one of the most highly conserved motif positions (position 13) always result in a motif-breaking LOB effect (Fig. 5C and D; Supplementary Material, Fig. S18C and D).

We next found that RXR:VDR motif sequences have been under constraint during recent evolution within the extant human population. We did so by comparing the frequencies of derived alleles (Derived Allele Frequency [DAF] test) for LOB or GOB VDR-BVs in either Northern and Western European (CEU) or Yoruban (YRI) populations. In the absence of demographic confounders, stronger purifying selection is implied by an excess of low population frequency variants in one set of variants over another (39).

We detected significant increases in DAF distribution for GOB over LOB VDR-BVs, both for CEU (Fig. 6C, top-left panel) and YRI (Fig. 6C, bottom-left panel) cohorts. Class I binding site VDR-BVs exhibited significant differences in GOB versus LOB DAF distributions (Supplementary Material, Fig. S19; P < 0.05 for both CEU and YRI). By contrast, variants in RXR:VDR hexamers not assigned as VDR-BVs showed no significant difference in GOB versus LOB DAF distributions (Fig. 6C, top-right and bottom-right panels). Significant differences between GOB and LOB DAFs were attained even when observing the RXR and VDR motif hexamers separately, and in both sub-populations (P < 0.012). Consequently, LOB variants are under substantially stronger purifying selection than are GOB variants. This is an important finding because it indicates that reduced binding of VDR at these genomic positions, and presumably reduced vitamin D levels, are commonly deleterious.
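The DAF comparison between LOB and GOB variants rests on a Wilcoxon rank sum test. A minimal standard-library sketch of such a test is given below, using the normal approximation with tie correction and no continuity correction; the allele frequencies are invented for illustration and the function name `rank_sum_test` is hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum (Mann-Whitney U) test, normal approximation.

    Returns (U statistic for x, two-sided P value). Ties receive mid-ranks.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = len(pooled)
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = mid_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2))


# Invented derived allele frequencies: LOB alleles are rarer than GOB alleles.
lob_daf = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08]
gob_daf = [0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55]
u_stat, p_value = rank_sum_test(lob_daf, gob_daf)
print(u_stat, p_value)
```

An excess of low-frequency derived alleles in one class, as in the toy LOB set here, is the pattern interpreted in the text as stronger purifying selection.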

Discussion

We have demonstrated how genetic variation alters the binding affinity of the vitamin D receptor (VDR), a member of the nuclear receptor family of transcription factors. Genetic variants significantly associated with immune and inflammatory disease were found to disrupt VDR binding and to be located preferentially within enhancer regions (Fig. 3), in line with a recent analysis of causal immune disease-related variants (6). The study benefited from the 5-fold greater spatial resolution and signal-to-noise ratios provided by the ChIP-exo assay.

In its best studied model of allosteric modulation, the RXR:VDR heterodimer binds to DNA and enhances

[Figure 6, panels A–D. Each panel shows paired GOB and LOB DAF distributions; horizontal axis: Frequency. Group titles: "VDR-BVs" and "Variants not affecting VDR binding"; "ns" marks non-significant comparisons.]

Figure 6. Evolutionary conservation of VDR-BVs at RXR:VDR consensus motifs. The diagrams show distributions of the Derived Allele Frequency (DAF) for variants in RXR:VDR consensus motifs within reproducible CPo3 binding peaks. DAF values are separated by ethnicity (A and C, CEU; B and D, YRI); variants are separated by their effect on VDR binding affinity (A and B, VDR-BVs; C and D, 1000 Genomes variants carried by the LCL samples which do not test for significant asymmetric binding and are not therefore VDR-BVs). Within each of the four panels, variants are split in two groups based on their effect on VDR binding affinity direction (whether GOB or LOB). For all quadrants, only DAFs for variants hitting hexamer positions (i.e. hitting either the RXR recognition element at positions 1-6 or the VDR recognition element at positions 10-15) in the RXR:VDR motif are shown. An asterisk indicates significance (non-parametric Wilcoxon rank sum tests, α = 0.05). A: Wilcoxon rank sum test, P = 2.8 × 10⁻⁴. B: Wilcoxon rank sum test, P = 4.6 × 10⁻⁵. C and D: P = 0.55 and P = 0.14, respectively. ns = not significant.


transcription via response elements that typically consist of two hexameric ((A/G)G(T/G)TCA) half-sites, with RXR and VDR DNA binding domains occupying the upstream and downstream half-site, respectively (40). This DR3 motif, which we previously identified using ChIP-seq data from two calcitriol-activated CEU LCLs (14), was also the predominant VDR binding interface in calcitriol-activated LCLs in the present ChIP-exo study. We identified strong enrichment of VDR binding within enhancers, with stronger enrichments associated with the more stringent subsets of VDR-BVs (Figs 1 and 3). Whilst VDR binding occurs preferentially within promoters, these interactions are only rarely associated with RXR:VDR DR3 motifs and may be mediated by additional DNA-binding co-factors, including CTCF, and might reflect unproductive binding to open chromatin. In contrast, those VDR binding sites containing strong instances of the RXR:VDR DR3 motif were particularly enriched in enhancers and also near to genes involved in immunity processes and related diseases (Fig. 2). VDR thus is expected to exert a substantial effect on the biology of the immune system as a sequence-specific enhancer regulator.
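The response-element architecture described above, two (A/G)G(T/G)TCA half-sites separated by a 3-nucleotide spacer, can be written as a simple pattern. The sketch below scans the forward strand only for exact consensus matches; it is an illustration under that assumption and not the PWM-based scanning used in the study. The function name `find_dr3` is hypothetical.

```python
import re

# Consensus half-site (A/G)G(T/G)TCA written as a character-class pattern.
HALF_SITE = "[AG]G[TG]TCA"

# Two half-sites separated by a 3-nucleotide spacer (DR3).
# The lookahead allows overlapping occurrences to be reported.
DR3_PATTERN = re.compile(f"(?=({HALF_SITE}[ACGT]{{3}}{HALF_SITE}))")


def find_dr3(sequence):
    """Return (0-based start, matched 15-mer) for each forward-strand DR3 match."""
    return [(m.start(), m.group(1)) for m in DR3_PATTERN.finditer(sequence.upper())]


print(find_dr3("ttAGGTCAcgaGGTTCAtt"))
```

A strict consensus match of this kind would miss the weak or non-canonical sites discussed later in this section, which is why position weight matrices are normally preferred.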

Most remarkably, we found variants weakening or strengthening the RXR:VDR PWM score to be highly predictive of the loss or gain of VDR binding affinity (Fig. 5). Furthermore, the VDR-bound RXR:VDR motif shows evidence of levels of sequence conservation across vertebrate evolution that mirror its information content, and loss of VDR-binding variants over modern human evolution has occurred under stronger purifying selection than the gain of binding variants (Fig. 6). Together, these results imply that VDR binding, and thus presumably vitamin D levels, have in general been protective against disease.

Our study will have underestimated the true extent of regulatory variation at VDR binding sites. This is because the asymmetric binding analysis (23), by definition, can only test variants which are heterozygous for a given LCL sample; therefore, true positive VDR-BVs will be unobserved if all LCL samples we considered are homozygous. Similarly, not all of the required three biallelic genotypes will have been present in these samples for the regression-on-genotype analysis (21,22). Consequently, a larger scale study would be expected to show improved power to detect VDR-BVs.

At least 90% of VDR-BVs lay outside of RXR:VDR DR3 motifs. The majority of variable binding could occur, as with other TFs (41), at weak or non-canonical binding motifs, or its affinity may alter because of genetic disruption to a cofactor binding site (42). Another explanation is that VDR-BVs commonly exert their effect by altering the DNA conformation of regions flanking the core binding site (43). Nevertheless, our de novo motif discovery identified several proteins as potential RXR:VDR cofactors whose altered DNA-binding affinity could influence VDR binding. Over-representation of motifs in VDR peaks has been observed before, for example in a recent study of VDR binding in monocytes and monocyte-derived inflammatory and tolerogenic dendritic cells (44). Direct interaction between VDR and CREB1 has been reported previously (45), as have functional interactions between vitamin D/VDR and oestrogen metabolism and MYC proteins (46,47). Co-occurrence of GABPA and ESRRB motifs with VDR-binding sites had been reported previously (48). ZNF423 is not known to bind VDR, yet it might do so indirectly because it physically associates with its heterodimer partner RXRα (49).

Our finding that VDR-BVs are significantly enriched within eQTLs (Fig. 3C) is intriguing because the eQTLs were inferred from LCLs not treated with calcitriol, and implies that these enhancers' activities may not be entirely vitamin D-dependent in these cells. Nevertheless, there was no clear concordance or discordance between the direction of change of VDR binding at VDR-BVs (in calcitriol-treated LCLs) and the direction of change of gene expression in calcitriol-minus human LCLs reported for the GEUVADIS eQTL variants (data not shown). This, and the lack of enrichment of the DR3 motif in the VDR binding sites for the basal (unstimulated) state in LCLs (14), indicate that a study of calcitriol-activated LCLs at a similar scale to those employing unstimulated LCLs (28) will be required to fully resolve this issue.

Our most important results derived from a conservative analysis of a stringent set of replicated binding variants (Fig. 4B). From this, we report a significant excess of VDR-rBVs coincident, or in strong LD, with genome-wide significant GWAS tag variants for six disorders, including three that are autoimmune disorders (inflammatory bowel disease, Crohn's disease and rheumatoid arthritis), whilst a fourth, endometriosis, is frequently comorbid with autoimmune disorders (50). Deficiency of serum 25(OH)D levels is associated with cardiovascular disease risk factors in adults (10), while the association between vitamin D levels and coronary artery disease, as for many disorders, is debated (51). These results associate a molecular phenomenon, the genetic disruption of nuclear receptor binding, with a narrow set of immune-related syndromes and diseases.

Based on the combined functional, evolutionary and GWAS-based evidence, we propose that VDR-BVs represent good targets for subsequent experimental validation using, for example, genome editing in cells. It is also expected that only a small minority of genes bound by VDR will be altered in expression. A large-scale expression QTL study in calcitriol-activated LCLs will thus be required to further narrow down the set of candidate disease genes whose regulatory elements are variably bound by VDR and which are differentially expressed across the human population.

We have provided the first quantitative association-based analysis to explain the genetic effect on binding affinity variation for a nuclear receptor, and to propose a mechanistic model for disease susceptibility. Our results support the hypothesis that DNA variants altering transcription factor binding at enhancers contribute to complex disease aetiology, and suggest that altered VDR binding, and by inference variable vitamin D levels, explain in part altered autoimmune and other complex disease risk.

Materials and Methods

VDR ChIP-exo sample preparation

Calcitriol-stimulated lymphoblastoid cell lines (LCLs) were grown and prepared for ChIP as described previously with some modifications (52) and adapted for ChIP-exo (15,16). LCLs are known to be a good model of primary B-cells (53), and we chose lines from HapMap, a unique resource whose genomes have already been sequenced. Briefly, cells were incubated in phenol red-free RPMI-1640, 10% charcoal-stripped FBS, 2 mM L-glutamine, penicillin with streptomycin solution (100 U/ml + 100 µg/ml) medium at 37 °C and 5% CO2. Cells were harvested after stimulation for 36 hours with 0.1 mM calcitriol (Sigma), cross-linked using a 1% formaldehyde buffer for 15 minutes at room temperature, and quenched with 0.125 M glycine. Cells were lysed and chromatin sheared by sonication into fragments of 200–1000 bp. VDR-bound genomic DNA regions were isolated using a rabbit polyclonal antibody against VDR (Santa Cruz Biotechnology, sc-1008). Immunoprecipitated chromatin was then processed enzymatically on magnetic beads. Samples were polished, A-tailed, ligated to sequencing library adaptors and then digested with lambda exonuclease to remove nucleotides from 5′ ends of double-stranded DNA. Single-stranded DNA was eluted and converted to double-stranded DNA by primer annealing and extension. A second sequencing adaptor was ligated to exonuclease-treated ends, and samples were PCR amplified, gel purified and sequenced. Libraries were prepared for Illumina as described in (16).

Data processing and peak calling

Full data processing steps are presented in the Supplementary Material. Briefly, we mapped the ChIP-exo sequence reads (Peconics Inc., USA; single-end, 40 bp) against the hg19 build of the human reference genome (54) using BWA (v0.7.4 (55)) followed by Stampy (v1.0.21 (56)). We filtered the reads to retain only uniquely mapping reads with MAPQ > 20 and removed reads mapping to empirical blacklist regions identified by the ENCODE consortium (24).
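The read filtering step can be sketched as follows. This is a minimal illustration on in-memory records rather than BAM files; the `Read` class, the function names and the example coordinates are hypothetical, and a mapping quality threshold alone is used here as a stand-in for the unique-mapping criterion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    mapq: int


def overlaps(read, region):
    """True if the read shares at least one base with (chrom, start, end)."""
    chrom, start, end = region
    return read.chrom == chrom and read.start < end and start < read.end


def filter_reads(reads, blacklist, min_mapq=20):
    """Keep reads with MAPQ above min_mapq that avoid every blacklist region."""
    return [
        r for r in reads
        if r.mapq > min_mapq and not any(overlaps(r, b) for b in blacklist)
    ]


# Hypothetical 40-bp reads and one hypothetical blacklist interval.
reads = [
    Read("chr1", 100, 140, 37),   # kept
    Read("chr1", 500, 540, 12),   # dropped: low mapping quality
    Read("chr1", 990, 1030, 37),  # dropped: overlaps the blacklist interval
]
blacklist = [("chr1", 1000, 2000)]
print(filter_reads(reads, blacklist))
```

For real data the linear scan over blacklist regions would normally be replaced by an interval index.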

For peak calling, we first computed strand cross-correlation profiles (57) of read start densities to obtain consensus estimates of the ChIP-exo digested fragment sizes, which we corrected for the presence of phantom peaks (58) using a mappability-corrected approach to cross-correlation inference (MaSC (59)). We used the cross-correlation based estimates for the digested fragment size as a MACS2 (version 2.0.10 (60)) input parameter to obtain peak calls. Separately, to increase sensitivity, we called peaks using GPS/GEM (v2.4.1 (61)) with recommended ChIP-exo options --smooth 3 --mrc 20. We then pooled, for each sample, the two sets of peak calls (Supplementary Material). Finally, we used DiffBind (62) to manipulate the per-sample merged peaksets and to obtain consensus peaksets (CPo3, CPo10, CPo20) based on an overlap threshold. To do this, we normalised the read numbers using DiffBind's embedded edgeR routines based on the Trimmed Means of M's (TMM) algorithm (63).
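The idea behind strand cross-correlation can be sketched in a few lines: the plus-strand and minus-strand read start densities are correlated at increasing offsets, and the offset with the highest correlation serves as the fragment size estimate. The sketch below is a plain Pearson version on toy positions; it does not apply the phantom-peak or mappability corrections (MaSC) described above, and all names are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def estimate_fragment_size(plus_starts, minus_starts, length, max_shift):
    """Return (best shift, {shift: correlation}) over shifts 0..max_shift."""
    plus = [0] * length
    minus = [0] * length
    for p in plus_starts:
        plus[p] += 1
    for m in minus_starts:
        minus[m] += 1
    scores = {}
    for shift in range(max_shift + 1):
        # Compare plus-strand density at i with minus-strand density at i + shift.
        scores[shift] = pearson(plus[: length - shift], minus[shift:])
    best = max(scores, key=scores.get)
    return best, scores


# Toy data: each minus-strand start lies 35 bp downstream of a plus-strand start.
plus_starts = [10, 50, 90, 130]
minus_starts = [p + 35 for p in plus_starts]
best_shift, profile = estimate_fragment_size(plus_starts, minus_starts, 200, 60)
print(best_shift)
```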

Interval overlap enrichment analysis

We used the Genomic Association Tester (64) to perform the randomization-based interval enrichment analyses. All analyses were based on 10,000 randomisations over the chosen background, and backgrounds differed depending on the specific analysis performed. All analyses accounted for genome-wide patterns of GC content variability and, where relevant, for uniquely mappable regions (40 bp reads). For the enrichment analysis of VDR-BVs in GWAS intervals and QTLs, we used an in-house bootstrapping pipeline to perform LD-filtering (to retain only LD-independent foreground and background variants) and DAF matching. Further details on annotations, background and analytical design are provided in the Supplementary Material.
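The logic of a randomisation-based enrichment test can be illustrated with a deliberately simplified sketch: variant positions are redrawn uniformly along a toy linear genome, and the observed overlap with an annotation is compared with the resulting null distribution. This is not the Genomic Association Tester algorithm; in particular it ignores GC content, mappability, LD filtering and DAF matching, and the function names are hypothetical.

```python
import random


def count_overlaps(positions, intervals):
    """Number of positions that fall inside any half-open interval [start, end)."""
    return sum(any(start <= p < end for start, end in intervals) for p in positions)


def enrichment_by_randomisation(variants, annotation, genome_length,
                                n_iter=1000, seed=0):
    """Return (observed, expected, fold change, empirical one-sided P value)."""
    rng = random.Random(seed)
    observed = count_overlaps(variants, annotation)
    null_counts = []
    for _ in range(n_iter):
        shuffled = [rng.randrange(genome_length) for _ in variants]
        null_counts.append(count_overlaps(shuffled, annotation))
    expected = sum(null_counts) / n_iter
    # Adding 1 to numerator and denominator keeps the empirical P value above 0.
    p_value = (1 + sum(c >= observed for c in null_counts)) / (n_iter + 1)
    fold = observed / expected if expected > 0 else float("inf")
    return observed, expected, fold, p_value


# Toy example: nine variants, all inside a single annotated 100-bp interval.
variants = list(range(110, 200, 10))
annotation = [(100, 200)]
print(enrichment_by_randomisation(variants, annotation, genome_length=10_000))
```

The observed and expected counts and their log2 ratio correspond to the kind of quantities plotted in Fig. 4.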

Motif discovery

Full motif analysis steps are documented in the Supplementary Material. Briefly, we carried out de novo motif analysis using multiple parallel approaches: MEME-ChIP (65) from the MEME suite (66), XXmotif (67) and PScanChIP (20) from the Weeder MoDtools suite (68). For the MEME-ChIP and XXmotif analyses, we used the peak intervals in CPo10 and extracted the sequence underlying each interval ± 150 bp from a repeat-masked version of the hg19 reference (54). For the PScanChIP analysis, we used as the input the full set of CPo3 binding regions with Jaspar PWM descriptors, to which we added the best heterodimeric RXR:VDR PWM found by XXmotif and the best monomeric VDR PWM found by DREME (MEME-ChIP package). The global background chosen for the analysis consisted of built-in PScanChIP DNase I digital genomic footprinting data for the LCL CEPH individual NA12865. For the motif analysis at the 43,332 VDR-BVs, we used PScanChIP (20) with Jaspar PWM descriptors and a mixed background of the promoters and LCL DNase cut sites.

Differential binding analyses

We performed both allele-specific (VDR-ASB) and regression on genotype (VDR-QTL) differential binding analyses. Both methods start from analyses of read counts. For the VDR-QTL analysis, reads were normalised and covariate-corrected as detailed in the Supplementary Material.

For the VDR-ASB tests, we utilised the sequence composition of ChIP-exo sequence reads overlapping heterozygous SNPs to determine the sequences originating from each allele separately (23,69,70) and to identify allele-specific binding events showing a significant difference in the number of mapped reads between parental alleles. We employed a modified version of the AlleleSeq pipeline (23) to carry out the VDR-ASB analysis. Briefly, we tested for significant allelic imbalance among all read pile-ups intersecting a variant showing heterozygous genotype, given a null hypothesis of 50% paternal versus 50% maternal reads. We followed (23) in controlling for two major sources of bias typically encountered when running an allele-specific analysis of short read data: a bias due to mapping to the reference hg19 genome (71) and a bias due to unannotated copy number variants skewing the read counts for some of the SNPs being tested (72). In addition, we corrected for a third source of potential erroneous ASB SNP calls by removing any significant allelic imbalance hypothesis falling under repeat regions contained in the ENCODE blacklist data (24).
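The null hypothesis of 50% paternal versus 50% maternal reads at a heterozygous site corresponds to a two-sided exact binomial test. A minimal standard-library sketch is shown below; it covers only the per-site test and none of the bias corrections or the FDR control applied in the AlleleSeq-based pipeline, and the function name is hypothetical.

```python
from math import comb


def allelic_imbalance_p(paternal_reads, maternal_reads, p_null=0.5):
    """Two-sided exact binomial P value for the paternal read count.

    Sums the probabilities of all outcomes that are no more likely than the
    observed one under the null proportion p_null.
    """
    n = paternal_reads + maternal_reads
    probs = [comb(n, k) * p_null ** k * (1 - p_null) ** (n - k) for k in range(n + 1)]
    observed = probs[paternal_reads]
    # The small relative tolerance guards against floating-point ties.
    p_value = sum(q for q in probs if q <= observed * (1 + 1e-7))
    return min(1.0, p_value)


print(allelic_imbalance_p(18, 2))   # strongly imbalanced pile-up
print(allelic_imbalance_p(10, 10))  # balanced pile-up
```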

For the VDR-QTL tests, we analysed all 27 LCL samples simultaneously. We grouped the samples based on the genotypes of the underlying variant and carried out QTL association testing between the variant's genotype (under the assumption of an additive genetic model (73)) and the phenotype at that location (the VDR binding affinity, based on the quantile-normalised ChIP-exo peak size at that location). We used an in-house pipeline based on SNPs and indels imputed with IMPUTE2 (74) and Bayesian regression modelling based on SNPTEST (21) and BIMBAM (22). Bayesian methods for analysing SNP associations are now an established tool in GWAS analysis (75–77) and show advantages over the use of p-values in power and interpretation (73), though they require tighter initial modelling assumptions when compared to frequentist methods. For details on the underlying model and priors, we refer the reader to the Supplementary Material.
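Under an additive genetic model, the phenotype is regressed on the allele dosage (0, 1 or 2 copies of one allele). The sketch below uses ordinary least squares as a simple frequentist stand-in for the Bayesian regression performed with SNPTEST and BIMBAM in the study; the data are invented and the function name `additive_fit` is hypothetical.

```python
def additive_fit(dosages, phenotypes):
    """Least-squares fit of phenotype = intercept + beta * dosage.

    dosages: allele counts (0, 1 or 2) per sample.
    phenotypes: normalised peak sizes per sample.
    Returns (intercept, beta), where beta is the per-allele effect.
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        # No genotype variation among samples: the effect cannot be estimated.
        raise ValueError("all samples share one genotype")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotypes))
    beta = sxy / sxx
    return mean_y - beta * mean_x, beta


# Invented example: binding increases by one unit per copy of the allele.
intercept, beta = additive_fit([0, 0, 1, 1, 2, 2], [1.0, 1.0, 2.0, 2.0, 3.0, 3.0])
print(intercept, beta)
```

The error raised for a monomorphic site mirrors the limitation noted in the Discussion: with few samples, not all genotype classes are observed at every variant.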

Variant annotation

We annotated all variants associated with differences in VDR binding, obtained by pooling the VDR-QTL and the VDR-ASB variants (VDR-Binding Variants, VDR-BVs), according to the guidelines in (25) and using a customised version of the FunSeq2 suite of algorithms (26). We added additional VDR-centric information to FunSeq2: VDR CPo3 binding regions, motif intervals from the PScanChIP analysis for the VDR:RXR Jaspar DR3 motif and for the DREME VDR monomer, and VDR dimer/monomer PWM information. Gene annotation used FunSeq2's default (Gencode v16), and the HOT annotation in FunSeq2 was filtered to only retain LCL HOT information. We ran FunSeq2 using the -m 2 -nc options so that all annotation referred to the ancestral allele of the variant.

Availability of Data and Material

Sequencing data from this study have been submitted to the Gene Expression Omnibus archive (GEO; http://www.ncbi.nlm.nih.gov/geo; accession number GSE73254).

Supplementary Material

Supplementary Material is available at HMG online.

Acknowledgements

We thank Mark Gerstein, Jieming Chen, Joel Rozowsky, Alexej Abyzov, Ekta Khurana and Yao Fu for valuable help and technical insight on the AlleleSeq, CNVnator and FunSeq2 platforms. We also thank Matthew Stephens and Yongtao Guan for help and insight on the PIMASS and BIMBAM platforms. We thank Gosia Trynka and Kamil Slowikowski for providing r2-based LD block data; Yuchun Guo and David Gifford for valuable help and technical insight on the GEM peak caller; Gerton Lunter for help on the STAMPY platform; Andreas Heger for help and technical insight on the GAT software and genomic randomization testing; Rory Stark and Simon Anders for insight on differential analysis of ChIP-seq data; Laura Clarke for help with the 1000 Genomes data; and Daniel Gaffney for providing helpful pointers regarding the GEUVADIS QTL calls. We thank Li Shen and Ning-Yi Shao (Icahn School of Medicine at Mount Sinai) for insight into NGSplot utilisation, and Oscar Bedoya Reina and Christoffer Nellaker for helpful discussions.

Conflict of Interest statement. None declared.

Funding

Medical Research Council [to GG, AJBT, WH, CPP; MC_UU_120081, MC_UU_120211, MC_EX_G1000902]; Wellcome Trust and Genzyme [to SVR]; the Multiple Sclerosis Society UK [grant number 91509]; and the Council for Science and Technology (CONACyT), Mexico [grant number 211990 to AJBT]. Funding to pay the Open Access publication charges for this article was provided by the Research Councils UK (RCUK) open access fund.

References

1. Kasowski, M., Kyriazopoulou-Panagiotopoulou, S., Grubert, F., Zaugg, J.B., Kundaje, A., Liu, Y., Boyle, A.P., Zhang, Q.C., Zakharia, F., Spacek, D.V. et al. (2013) Extensive variation in chromatin states across humans. Science, 342, 750–752.

2. McVicker, G., van de Geijn, B., Degner, J.F., Cain, C.E., Banovich, N.E., Raj, A., Lewellen, N., Myrthil, M., Gilad, Y. and Pritchard, J.K. (2013) Identification of Genetic Variants That Affect Histone Modifications in Human Cells. Science, 342, 747–749.

3. Kilpinen, H., Waszak, S.M., Gschwind, A.R., Raghav, S.K., Witwicki, R.M., Orioli, A., Migliavacca, E., Wiederkehr, M., Gutierrez-Arcelus, M., Panousis, N.I. et al. (2013) Coordinated effects of sequence variation on DNA binding, chromatin structure and transcription. Science, 342, 744–747.

4. Maurano, M.T., Humbert, R., Rynes, E., Thurman, R.E., Haugen, E., Wang, H., Reynolds, A.P., Sandstrom, R., Qu, H., Brody, J. et al. (2012) Systematic localization of common disease-associated variation in regulatory DNA. Science, 337, 1190–1195.

5. Gusev, A., Lee, S.H., Trynka, G., Finucane, H., Vilhjalmsson, B.J., Xu, H., Zang, C., Ripke, S., Bulik-Sullivan, B., Stahl, E. et al. (2014) Partitioning heritability of regulatory and cell-type-specific variants across 11 common diseases. Am. J. Hum. Genet., 95, 535–552.

6. Farh, K.K., Marson, A., Zhu, J., Kleinewietfeld, M., Housley, W.J., Beik, S., Shoresh, N., Whitton, H., Ryan, R.J., Shishkin, A.A. et al. (2015) Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature, 518, 337–343.

7. Heinz, S., Romanoski, C.E., Benner, C. and Glass, C.K. (2015) The selection and function of cell type-specific enhancers. Nat. Rev. Mol. Cell Biol., 16, 144–154.

8. Arner, E., Daub, C.O., Vitting-Seerup, K., Andersson, R. et al. (2015) Transcribed enhancers lead waves of coordinated transcription in transitioning mammalian cells. Science, 347, 1010–1014.

9. Holick, M.F. (2007) Vitamin D Deficiency. N. Engl. J. Med., 357, 266–281. PMID: 17634462.

10. Martins, D., Wolf, M., Pan, D. et al. (2007) Prevalence of cardiovascular risk factors and the serum levels of 25-hydroxyvitamin D in the United States: data from the Third National Health and Nutrition Examination Survey. Arch. Intern. Med., 167, 1159–1165.

11. Mokry, L.E., Ross, S., Ahmad, O.S., Forgetta, V., Smith, G.D., Leong, A., Greenwood, C.M.T., Thanassoulis, G. and Richards, J.B. (2015) Vitamin D and Risk of Multiple Sclerosis: A Mendelian Randomization Study. PLoS Med., 12, 1–20.

12. Afzal, S., Brondum-Jacobsen, P., Bojesen, S.E. and Nordestgaard, B.G. (2014) Genetically low vitamin D concentrations and increased mortality: mendelian randomisation analysis in three large cohorts. BMJ, 349.

13. Kliewer, S.A., Umesono, K., Mangelsdorf, D.J. and Evans, R.M. (1992) Retinoid X receptor interacts with nuclear receptors in retinoic acid, thyroid hormone and vitamin D3 signalling. Nature, 355, 446–449.

14. Ramagopalan, S.V., Heger, A., Berlanga, A.J., Maugeri, N.J., Lincoln, M.R., Burrell, A., Handunnetthi, L., Handel, A.E., Disanto, G., Orton, S.M. et al. (2010) A ChIP-seq defined genome-wide map of vitamin D receptor binding: Associations with disease and evolution. Genome Res., 20, 1352–1360.

15. Rhee, H.S. and Pugh, B.F. (2011) Comprehensive Genome-wide Protein-DNA Interactions Detected at Single-Nucleotide Resolution. Cell, 147, 1408–1419.

16. Rhee, H.S. and Pugh, B.F. (2012) ChIP-exo method for identifying genomic location of DNA-binding proteins with near-single-nucleotide accuracy. Curr. Protoc. Mol. Biol., Chapter 21, Unit 21.24.

17. Yip, K., Cheng, C., Bhardwaj, N., Brown, J., Leng, J., Kundaje, A., Rozowsky, J., Birney, E., Bickel, P., Snyder, M. and Gerstein, M. (2012) Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors. Genome Biol., 13, R48.

18. Carlberg, C., Bendik, I., Wyss, A., Meier, E., Sturzenbecker, L.J., Grippo, J.F. and Hunziker, W. (1993) Two nuclear signalling pathways for vitamin D. Nature, 361, 657–660.

19. Mathelier, A., Zhao, X., Zhang, A.W., Parcy, F., Worsley-Hunt, R., Arenillas, D.J., Buchman, S., Chen, C.Y., Chou, A., Ienasescu, H. et al. (2014) JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res., 42, D142–D147.

20 Zambelli F Pesole G and Pavesi G (2013) PscanChIPFinding over-represented transcription factor-binding sitemotifs and their correlations in sequences from ChIP-Seqexperiments Nucleic Acids Res 41 W535ndashW543

21 Marchini J Howie B Myers S McVean G and Donnelly P(2007) A new multipoint method for genome-wide associ-ation studies by imputation of genotypes Nat Genet 39906ndash913

22 Servin B and Stephens M (2007) Imputation-based analysisof association studies candidate regions and quantitativetraits PLoS Genet 3 e114

23 Rozowsky J Abyzov A Wang J Alves P Raha DHarmanci A Leng J Bjornson R Kong Y Kitabayashi Net al (2011) AlleleSeq analysis of allele-specific expressionand binding in a network framework Mol Sys Biol 7 522

24 Dunham I Kundaje A Aldred SF Collins PJ Davis CADoyle F Epstein CB Frietze S Harrow J Kaul R et al(2012) An integrated encyclopedia of DNA elements in thehuman genome Nature 489 57ndash74

25 Khurana E Fu Y Colonna V Mu XJ Kang HMLappalainen T Sboner A Lochovsky L Chen JHarmanci A et al (2013) Integrative annotation of variantsfrom 1092 humans application to cancer genomics Science342 1235587

26 Fu Y Liu Z Lou S Bedford J Mu X Yip KY KhuranaE and Gerstein M (2014) FunSeq2 A framework for priori-tizing noncoding regulatory variants in cancer Genome Biol15 480

27 Heinz S Romanoski CE Benner C Allison KAKaikkonen MU Orozco LD and Glass CK (2013) Effect ofnatural genetic variation on enhancer selection and func-tion Nature 503 487ndash492

28 Lappalainen T Sammeth M Friedlander MR rsquot HoenPA Monlong J Rivas MA Gonzalez-Porta M KurbatovaN Griebel T Ferreira PG et al (2013) Transcriptome andgenome sequencing uncovers functional variation inhumans Nature 501 506ndash511

29 Eicher JD Landowski C Stackhouse B Sloan A ChenW Jensen N Lien JP Leslie R and Johnson AD (2015)GRASP v20 an update on the genome-wide repository of as-sociations between SNPs and phenotypes Nucleic Acids Res43 799ndash804

30 Mokry LE Ross S Morris JA Manousaki D Forgetta Vand Richards JB (2016) Genetically decreased vitamin Dand risk of Alzheimer disease Neurology 87 2567ndash2574

31 Cusanovich DA Pavlovic B Pritchard JK and Gilad Y(2014) The functional consequences of variation in tran-scription factor binding PLoS Genet 10 e1004226

32 Spivakov M (2014) Spurious transcription factor bindingnon-functional or genetically redundant Bioessays 36798ndash806

33 Villar D Berthelot C Aldridge S Rayner TF Lukk MPignatelli M Park TJ Deaville R Erichsen JT JasinskaAJ et al (2015) Enhancer evolution across 20 mammalianspecies Cell 160 554ndash566

34 Siepel A Bejerano G Pedersen JS Hinrichs AS Hou MRosenbloom K Clawson H Spieth J Hillier LWRichards S et al (2005) Evolutionarily conserved elementsin vertebrate insect worm and yeast genomes Genome Res15 1034ndash1050

35 Jimenez-Lara AM and Aranda A (1999) The vitamin D re-ceptor binds in a transcriptionally inactive form and without

a defined polarity on a retinoic acid response elementFASEB J 13 1073ndash1081

36 Toell A Polly P and Carlberg C (2000) All natural DR3-typevitamin D response elements show a similar functionalityin vitro Biochem J 352 Pt 2 301ndash309

37 Zhang J Chalmers MJ Stayrook KR Burris LL WangY Busby SA Pascal BD Garcia-Ordonez RD BruningJB Istrate MA et al (2011) DNA binding alters coactivatorinteraction surfaces of the intact VDR-RXR complex NatStruct Mol Biol 18 556ndash563

38 Drake JA Bird C Nemesh J Thomas DJ Newton-ChehC Reymond A Excoffier L Attar H Antonarakis SEDermitzakis ET et al (2006) Conserved noncoding se-quences are selectively constrained and not mutation coldspots Nat Genet 38 223ndash227

39 Fay JC Wyckoff GJ and Wu CI (2001) Positive and nega-tive selection on the human genome Genetics 1581227ndash1234

40 Orlov I Rochel N Moras D and Klaholz BP (2012)Structure of the full human RXRVDR nuclear receptor heter-odimer complex with its DR3 target DNA Embo J 31291ndash300

41 Deplancke B Alpern D and Gardeux V (2016) TheGenetics of Transcription Factor DNA Binding Variation Cell166 538ndash554

42. Reddy TE, Gertz J, Pauli F, Kucera KS, Varley KE, Newberry KM, Marinov GK, Mortazavi A, Williams BA, Song L, et al. (2012) Effects of sequence variation on differential allelic transcription factor occupancy and gene expression. Genome Res 22, 860–869.

43. Levo M, Zalckvar E, Sharon E, Dantas Machado AC, Kalma Y, Lotam-Pompan M, Weinberger A, Yakhini Z, Rohs R and Segal E (2015) Unraveling determinants of transcription factor binding outside the core binding site. Genome Res 25, 1018–1029.

44. Booth DR, Ding N, Parnell GP, Shahijanian F, Coulter S, Schibeci SD, Atkins AR, Stewart GJ, Evans RM, Downes M, et al. (2016) Cistromic and genetic evidence that the vitamin D receptor mediates susceptibility to latitude-dependent autoimmune diseases. Genes Immun 17, 213–219.

45. Nakane M, Ma J, Ruan X, Kroeger PE and Wu-Wong R (2007) Mechanistic analysis of VDR-mediated renin suppression. Nephron Physiol 107, 35–44.

46. Salehi-Tabar R, Nguyen-Yamamoto L, Tavera-Mendoza LE, Quail T, Dimitrov V, An BS, Glass L, Goltzman D and White JH (2012) Vitamin D receptor as a master regulator of the c-MYC/MXD1 network. Proc Natl Acad Sci USA 109, 18827–18832.

47. Capellino S, Straub RH and Cutolo M (2014) Aromatase and regulation of the estrogen-to-androgen ratio in synovial tissue inflammation: common pathway in both sexes. Ann N Y Acad Sci 1317, 24–31.

48. Tuoresmaki P, Vaisanen S, Neme A, Heikkinen S and Carlberg C (2014) Patterns of genome-wide VDR locations. PLoS One 9, e96105.

49. Huang S, Laoukili J, Epping MT, Koster J, Holzel M, Westerman BA, Nijkamp W, Hata A, Asgharzadeh S, Seeger RC, et al. (2009) ZNF423 is critically required for retinoic acid-induced differentiation and is a marker of neuroblastoma outcome. Cancer Cell 15, 328–340.

50. Matarese G, De Placido G, Nikas Y and Alviggi C (2003) Pathogenesis of endometriosis: natural immunity dysfunction or autoimmune disease? Trends Mol Med 9, 223–228.

2175Human Molecular Genetics 2017 Vol 26 No 11 |

51. Kunadian V, Ford GA, Bawamia B, Qiu W and Manson JE (2014) Vitamin D deficiency and coronary artery disease: a review of the evidence. Am Heart J 167, 283–291.

52. Ramagopalan SV, Maugeri NJ, Handunnetthi L, Lincoln MR, Orton SM, Dyment DA, Deluca GC, Herrera BM, Chao MJ, Sadovnick AD, et al. (2009) Expression of the multiple sclerosis-associated MHC class II Allele HLA-DRB1*1501 is regulated by vitamin D. PLoS Genet 5, e1000369.

53. Caliskan M, Cusanovich DA, Ober C and Gilad Y (2011) The effects of EBV transformation on gene expression levels and methylation profiles. Hum Mol Genet 20, 1643–1652.

54. Kuhn RM, Haussler D and Kent WJ (2013) The UCSC genome browser and associated tools. Brief Bioinformatics 14, 144–161.

55. Li H and Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760.

56. Lunter G and Goodson M (2011) Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res 21, 936–939.

57. Kharchenko PV, Tolstorukov MY and Park PJ (2008) Design and analysis of ChIP-seq experiments for DNA-binding proteins. Nat Biotechnol 26, 1351–1359.

58. Kundaje A, Jung LY, Kharchenko P, Wold B, Sidow A, Batzoglou S and Park P (2013) Assessment of ChIP-seq data quality using cross-correlation analysis.

59. Ramachandran P, Palidwor GA, Porter CJ and Perkins TJ (2013) MaSC: mappability-sensitive cross-correlation for estimating mean fragment length of single-end short-read sequencing data. Bioinformatics 29, 444–450.

60. Feng J, Liu T, Qin B, Zhang Y and Liu XS (2012) Identifying ChIP-seq enrichment using MACS. Nat Protoc 7, 1728–1740.

61. Guo Y, Mahony S and Gifford DK (2012) High Resolution Genome Wide Binding Event Finding and Motif Discovery Reveals Transcription Factor Spatial Binding Constraints. PLoS Comput Biol 8, e1002638.

62. Stark R and Brown G (2011) DiffBind: differential binding analysis of ChIP-Seq peak data. R package.

63. Robinson M and Oshlack A (2010) A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol 11, R25.

64. Heger A, Webber C, Goodson M, Ponting CP and Lunter G (2013) GAT: a simulation framework for testing the association of genomic intervals. Bioinformatics 29, 2046–2048.

65. Ma W, Noble WS and Bailey TL (2014) Motif-based analysis of large nucleotide data sets using MEME-ChIP. Nat Protoc 9, 1428–1450.

66. Bailey TL, Boden M, Buske FA, Frith M, Grant CE, Clementi L, Ren J, Li WW and Noble WS (2009) MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res 37, W202–W208.

67. Hartmann H, Guthohrlein EW, Siebert M, Luehr S and Soding J (2013) P-value-based regulatory motif discovery using positional weight matrices. Genome Res 23, 181–194.

68. Pavesi G, Mereghetti P, Zambelli F, Stefani M, Mauri G and Pesole G (2006) MoD Tools: regulatory motif discovery in nucleotide sequences from co-regulated or homologous genes. Nucleic Acids Res 34, W566–W570.

69. McDaniell R, Lee BK, Song L, Liu Z, Boyle AP, Erdos MR, Scott LJ, Morken MA, Kucera KS, Battenhouse A, et al. (2010) Heritable Individual-Specific and Allele-Specific Chromatin Signatures in Humans. Science 328, 235–239.

70. Pickrell JK, Marioni JC, Pai AA, Degner JF, et al. (2010) Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature 464, 768–772.

71. Degner JF, Marioni JC, Pai AA, Pickrell JK, Nkadori E, Gilad Y and Pritchard JK (2009) Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data. Bioinformatics 25, 3207–3212.

72. Pickrell JK, Gaffney DJ, Gilad Y and Pritchard JK (2011) False positive peaks in ChIP-seq and other sequencing-based functional assays caused by unannotated high copy number regions. Bioinformatics 27, 2144–2146.

73. Stephens M and Balding DJ (2009) Bayesian statistical methods for genetic association studies. Nat Rev Genet 10, 681–690.

74. Howie BN, Donnelly P and Marchini J (2009) A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet 5, e1000529.

75. Scott LJ, Mohlke KL, Bonnycastle LL, Willer CJ, et al. (2007) A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science 316, 1341–1345.

76. Burton PR, Clayton DG, Cardon LR, Craddock N, et al. (2007) Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447, 661–678.

77. Wakefield J (2009) Bayes factors for genome-wide association studies: comparison with P-values. Genet Epidemiol 33, 79–86.

78. Anders S, McCarthy DJ, Chen Y, Okoniewski M, Smyth GK, Huber W and Robinson MD (2013) Count-based differential expression analysis of RNA sequencing data using R and Bioconductor. Nat Protoc 8, 1765–1786.

79. Ernst J, Kheradpour P, Mikkelsen TS, Shoresh N, Ward LD, Epstein CB, Zhang X, Wang L, Issner R, Coyne M, et al. (2011) Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 473, 43–49.

80. Bailey TL and Machanick P (2012) Inferring direct DNA binding from ChIP-seq. Nucleic Acids Res 40, e128.

81. McLean CY, Bristor D, Hiller M, Clarke SL, Schaar BT, Lowe CB, Wenger AM and Bejerano G (2010) GREAT improves functional interpretation of cis-regulatory regions. Nat Biotechnol 28, 495–501.

82. Touzet H and Varre JS (2007) Efficient and accurate P-value computation for position weight matrices. Algorithms Mol Biol 2, 15.


ASSOCIATION STUDIES ARTICLE

Identification of genetic variants affecting vitamin D receptor binding and associations with autoimmune disease

Giuseppe Gallone1,2,3, Wilfried Haerty1,2,‡, Giulio Disanto2, Sreeram V. Ramagopalan4,¶,†, Chris P. Ponting1,2,5,† and Antonio J. Berlanga-Taylor6,7,8,†

1MRC Functional Genomics Unit, 2Department of Physiology, Anatomy and Genetics, University of Oxford, South Parks Road, Oxford OX1 3PT, UK, 3Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SA, UK, 4Real-World Evidence, Evidera, London W6 8DL, UK, 5MRC Human Genetics Unit, The Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, UK, 6Wellcome Trust Centre for Human Genetics, Nuffield Department of Clinical Medicine, University of Oxford, Oxford OX3 7BN, UK, 7CGAT, MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford OX1 3PT, UK and 8MRC-PHE Centre for Environment and Health, Department of Epidemiology & Biostatistics, School of Public Health, Faculty of Medicine, Imperial College London, St Mary's Campus, Norfolk Place, London W2 1PG, UK

To whom correspondence should be addressed. Tel: +44 1223498698; Email: giuseppe.gallone@sanger.ac.uk (G.G.); Tel: +44 7915490167; Email: sreeram@ramagopalan.net (S.V.R.); Tel: +44 7504609744; Email: a.berlanga@imperial.ac.uk (A.J.B.-T.)

Abstract
Large numbers of statistically significant associations between sentinel SNPs and case-control status have been replicated by genome-wide association studies. Nevertheless, few underlying molecular mechanisms of complex disease are currently known. We investigated whether variation in binding of a transcription factor, the vitamin D receptor (VDR), whose activating ligand vitamin D has been proposed as a modifiable factor in multiple disorders, could explain any of these associations. VDR modifies gene expression by binding DNA as a heterodimer with the Retinoid X receptor (RXR). We identified 43332 genetic variants significantly associated with altered VDR binding affinity (VDR-BVs) using a high-resolution (ChIP-exo) genome-wide analysis of 27 HapMap lymphoblastoid cell lines. VDR-BVs are enriched in consensus RXR::VDR binding motifs, yet most fell outside of these motifs, implying that genetic variation often affects the binding affinity only indirectly. Finally, we compared 341 VDR-BVs replicating by position in multiple individuals against background sets of variants lying within VDR-binding regions that had been matched in allele frequency and were independent with respect to linkage disequilibrium. In this stringent test, these replicated VDR-BVs were significantly (q < 0.1) and substantially (>2-fold) enriched in genomic intervals associated with autoimmune and other diseases, including inflammatory bowel disease, Crohn's disease and rheumatoid arthritis. The approach's validity is underscored by RXR::VDR motif sequence being predictive of binding strength and being evolutionarily constrained. Our findings are consistent with altered RXR::VDR binding contributing to immunity-related diseases. Replicated VDR-BVs associated with these disorders could represent causal disease risk alleles whose effect may be modifiable by vitamin D levels.

‡ Present address: Earlham Institute, Norwich Research Park, Norwich NR4 7UZ, UK.
¶ Present address: Centre for Observational Research and Data Sciences, Bristol-Myers Squibb, Uxbridge UB8 1DH, UK.
† These authors equally contributed to this work.
Received: October 16, 2016. Revised: February 28, 2017. Accepted: March 7, 2017

© The Author 2017. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Human Molecular Genetics, 2017, Vol. 26, No. 11, 2164–2176
doi: 10.1093/hmg/ddx092
Advance Access Publication Date: 9 March 2017
Association Studies Article

Introduction
Genetic variants can alter transcription factor (TF) binding, chromatin status and gene expression (1–3), and can be explanatory of disease susceptibility variation. Rather than residing in protein-coding sequence, the majority (approximately 93%) of disease- or trait-associated variants lie in non-coding sequence (4). Furthermore, non-coding variants in regulatory elements explain 8-fold more heritability of complex traits than protein-coding variants (5). If we are to better understand the mechanistic bases of complex disease (and their interaction with environmental factors), then we will need to understand how sequence differences in TF binding sites (TFBSs) across multiple genotypes contribute to disease susceptibility. Variations in TF affinities will then need to be cross-referenced not just with the sentinel variant from genome-wide association studies (GWASs), which has a very low probability (~5%) of being causal and is on average 14 kb from the true causal variant (6), but also with all other variants with which it lies in strong linkage disequilibrium (LD). True causal variants will be enriched in cis-regulatory elements, especially in enhancers and, to a lesser extent, in gene promoters (6). These elements are often active in only a few cell types (7) and at limited developmental time points (8), which provides an opportunity for relating genetic variation, via molecular observations (TF binding differences), to cellular or organ-based aetiologies.

Understanding complex disease will also require determining how functional genetic variants interact with environmental factors. One such factor is vitamin D, a class of fat-soluble secosteroids that are synthesised in the skin upon exposure to sunlight and that enhance intestinal absorption of dietary minerals. Many observational studies have yielded associations between low serum concentrations of 25-hydroxyvitamin D [25(OH)D] and the risks of developing diverse diseases (9,10). Such associations do not by themselves establish causation. However, Mendelian randomisation studies, for example on susceptibility to multiple sclerosis (11) and all-cause mortality (12), have indicated a causal role for this hormone. Identifying a molecular basis to these statistical associations would also indicate a direct causal role for vitamin D levels in these diseases. Vitamin D signalling occurs principally following the binding of calcitriol, the active form of vitamin D, to its cognate nuclear vitamin D receptor (VDR), which binds DNA in a heterodimer with the Retinoid X receptor (RXR (13)). We previously have shown that VDR binding in lymphoblastoid cell lines (LCLs) occurs at 2776 locations and preferentially (>2-fold) within intervals genetically associated with diseases such as Crohn's disease and multiple sclerosis (14). On one hand, these enrichments could indicate a direct modifying effect of VDR binding on disease risk. On the other hand, they could more trivially reflect a general association between regulatory regions and disease-associated genomic intervals. These enrichments of VDR binding within these intervals also are not informative of whether altered VDR binding to DNA functional elements explains genetic contributions to disease risk. If this is the case, we then also wish to determine whether it is a gain or loss of VDR binding that causally increases disease risk.

In this study we inferred VDR-binding sites using ChIP-exo (chromatin immunoprecipitation combined with lambda exonuclease digestion followed by high-throughput sequencing (15,16)) from each of 27 LCL samples using three complementary approaches. The first, 'peak-calling', method models the variation in read density. The second and third approaches predict VDR-binding sites using genetic variation to explain differences in VDR-binding affinity, either by quantitative trait loci association testing ('QTL') across all samples or by allele-specific binding ('ASB') analysis of allelic imbalance based on read depth at sites with heterozygous single nucleotide variants. Sequence variants were identified that both alter VDR binding and have been statistically associated by GWAS with altered risk for particular diseases (or are in strong LD with such variants). These are excellent candidates for sequence variants that, through their alteration of VDR binding, directly alter disease susceptibility. Compared against a background of all VDR-binding sites, we found that human variants associated with variable VDR binding are enriched (by up to 2-fold) in genomic intervals previously associated with particular traits, including some autoimmune diseases.

Results

ChIP-exo yields finely resolved VDR binding peaks genome-wide

Calcitriol-stimulated lymphoblastoid cells for 30 HapMap samples were grown and prepared for ChIP essentially as described previously (14), with modifications for ChIP-exo (15,16) (Methods; Supplementary Material, Table S1). In total, 27 samples successfully passed quality checking (Supplementary Material). Unlike for data from the more traditional ChIP-seq approach, analysis of ChIP-exo data has yet to become standardised. We thus undertook an in-depth study using contrasting approaches to modelling the ChIP-exo data background, to handling duplicate reads, to performing cross-correlation analyses and to calling peaks; these approaches are discussed in detail in the Supplementary Material (see also Supplementary Material, Figs S1 and S2, Tables S2–S9).

VDR peaks were both highly reproducible between samples, with 15509 intervals containing peaks called in at least 3 samples (CPo3 peak set) (Fig. 1A and B; Supplementary Material, Figs S3 and S4), and highly concordant with our previously published VDR ChIP-seq data for calcitriol-stimulated LCLs (CPo3: 76.3%; Fig. 1C; Supplementary Material, Table S11) (14). For peaks identified in both experiments, binding intervals were considerably better resolved (typically >5-fold) using ChIP-exo (Supplementary Material, Fig. S2A; pairwise Kruskal-Wallis test, P < 0.001).
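The idea of a consensus set such as CPo3 (positions supported by peaks in at least three samples) can be illustrated with a small sweep-line sketch. This is only one plausible reading of "intervals containing peaks called in at least 3 samples"; the function name `consensus_intervals`, the half-open coordinate convention and the single-chromosome simplification are assumptions made for illustration, not the authors' pipeline.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def consensus_intervals(peaks_by_sample: Dict[str, List[Interval]],
                        min_samples: int = 3) -> List[Interval]:
    """Return regions covered by peaks from at least `min_samples` samples."""
    events = []
    for peaks in peaks_by_sample.values():
        # Merge overlapping peaks within a sample so that each sample
        # contributes at most 1 to the coverage depth at any position.
        merged: List[List[int]] = []
        for start, end in sorted(peaks):
            if merged and start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        for start, end in merged:
            events.append((start, 1))
            events.append((end, -1))

    events.sort()  # at equal coordinates, ends (-1) come before starts (+1)
    result: List[Interval] = []
    depth = 0
    region_start = 0
    for pos, delta in events:
        previous = depth
        depth += delta
        if previous < min_samples <= depth:
            region_start = pos
        elif depth < min_samples <= previous:
            if result and result[-1][1] == region_start:
                # Stitch together regions that touch at a shared coordinate.
                result[-1] = (result[-1][0], pos)
            else:
                result.append((region_start, pos))
    return result
```

With four hypothetical samples holding peaks at (100, 200), (150, 250), (180, 300) and (400, 500), only the stretch from 180 to 200 is supported by three samples, so it is the only region returned when `min_samples` is 3.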

As expected, VDR binding peaks occurred preferentially at protein-coding transcriptional start sites (TSS) (Fig. 1D) and within regions of open chromatin (at LCL DNase I hypersensitivity sites, DHS), especially for those observed repeatedly (Supplementary Material, Figs S5 and S6). VDR binding peaks were significantly concentrated within proposed functional elements, specifically regions of high regulatory factor binding (HOT regions (17) and clustered TF binding sites from ENCODE), active promoters, enhancers and insulators (Fig. 1E; Supplementary Material, Fig. S7), and previously reported GWAS disease loci, principally for autoimmune disorders (Supplementary Material, Fig. S8). Notably, this latter observation could reflect a causal effect in which VDR binding influences disease susceptibility, and/or gene loci being correlated non-causally with both disease susceptibility intervals and VDR binding sites.

VDR binding motifs occur preferentially in enhancers

De novo motif analyses of peaks within ChIP-exo VDR binding intervals revealed motifs strongly resembling those for the RXR::VDR DR3 heterodimer (18,19) (Fig. 2A; Supplementary Material, Fig. S9A and Table S12). For example, 9899 (63.8% of the CPo3 set) instances of this DR3 heterodimer motif occurred in binding sites (PScanChIP (20) score > 0.7). In addition, 874 instances of the monomeric VDR motif were identified in these peak intervals (Supplementary Material, Fig. S9B; PScanChIP score = 1), which are known to have limited affinity for functional dimeric nuclear hormone receptor complexes (18). The heterodimeric DR3 motif was strongly over-represented (Supplementary Material, Table S12) both locally (in the peak sequences compared to neighbouring regions) and globally (in the peak regions compared to LCL DHS sites (20)) and was significantly centrally enriched in the binding regions (Fig. 2A).

VDR binding sites could be separated into functionally distinct classes based on the presence of RXR::VDR DR3-like motifs. Class I sites contain strong DR3-like motifs (top 20% of the PScanChIP score distribution; 3094 regions; Supplementary Material, Fig. S10A) and tend to be located away from genes' transcriptional start sites (TSSs) and yet close to genes involved in immune response processes (Fig. 2B and C). Class I VDR binding sites show the greatest significance of proximity to genomic regions associated with cancer and autoimmune diseases (Fig. 2C). We compare these to Class II sites, which contain only weak or no DR3-like motifs (bottom 20% of PScanChIP scores; 3102 regions; Supplementary Material, Fig. S10A) (Fig. 2B and D). Class I binding events occur preferentially in strong or weak enhancers defined by ENCODE rather than in promoters (Supplementary Material, Fig. S10B and C).
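As a rough illustration of what a "strong" versus "weak" DR3-like motif means, the sketch below scans a sequence on both strands for a direct repeat of two hexamer half-sites separated by a 3-nucleotide spacer and reports the fraction of matching half-site positions. The idealised half-site `AGGTCA` is a textbook assumption rather than the matrix used in the study, and this match fraction is not the PScanChIP score; `best_dr3_hit` and the other names are hypothetical.

```python
HALF_SITE = "AGGTCA"   # assumed idealised half-site, not the study's matrix
SPACER = 3             # DR3: direct repeat with a 3-nucleotide spacer
CONSENSUS = HALF_SITE + "N" * SPACER + HALF_SITE
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse-complement an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def window_score(window: str) -> float:
    """Fraction of the 12 half-site positions that match the consensus."""
    informative = [(c, w) for c, w in zip(CONSENSUS, window) if c != "N"]
    return sum(c == w for c, w in informative) / len(informative)


def best_dr3_hit(sequence: str):
    """Return (score, forward-strand start, strand) of the best window."""
    sequence = sequence.upper()
    width = len(CONSENSUS)
    best = (0.0, -1, "+")
    for strand, seq in (("+", sequence), ("-", reverse_complement(sequence))):
        for i in range(len(seq) - width + 1):
            score = window_score(seq[i:i + width])
            if score > best[0]:
                # Convert reverse-strand offsets to forward-strand coordinates.
                start = i if strand == "+" else len(seq) - width - i
                best = (score, start, strand)
    return best
```

Ranking regions by such a score and taking the top and bottom fifths would mimic the class I and class II split in spirit only; a position weight matrix scanner weights each position by its observed base frequencies rather than counting exact matches.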

[Figure 1, panels A–E. (A) Counts of overlapping ChIP-exo peaks across samples, with the CPo3, CPo10 and CPo20 rows highlighted. (C) Venn diagram: ChIP-exo (CPo3) only, 13391; shared, 2118; ChIP-seq only, 658 (ChIP-seq total 2776); 50× fold enrichment (p = 0.0001). (D) Meta-gene profile (gene body, protein coding): read count per million mapped reads against genomic region (5′ → 3′: −2 kb, TSS, 33%, 66%, TES, 2 kb). (E) Annotation (ENCODE and HOT regions) against log2(fc).]

Figure 1. ChIP-exo peaks are reproducible among samples and with previous ChIP-seq results, and are enriched in promoters and enhancers. (A) Numbers of overlapping ChIP-exo peaks across 27 LCL samples (log scale). Highlighted rows indicate numbers of peaks called in at least three (CPo3), ten (CPo10) or twenty (CPo20) samples. (B) Normalised peak read depths (63,62,78) (Supplementary Material) are concordant between samples. VDR ChIP-exo peak binding affinity heatmap showing pairwise binding affinity correlation among all 27 sample pairs. (Inset) histogram of correlation counts. (C) 50-fold enriched overlap between CPo3 peaks and the consensus ChIP-seq peakset from (14) (enrichment test: GAT (64); Supplementary Material). (D) Meta-gene profile of VDR binding reads. The panel shows a meta-profile of read pile-up for the pooled 27 LCL samples, averaged over gene bodies (Ensembl v. 75 protein coding genes). (E) Enrichment of VDR binding sites in the CPo3 consensus with ENCODE chromatin states (chromHMM (79)) from LCL NA12878, TF-dense regions (HOT (17) and ENCODE clustered TF binding sites) and DNase I hypersensitive areas (ENCODE). Asterisk denotes significance at 1% False Discovery Rate (FDR) threshold (enrichment test: GAT (64); Supplementary Material).

Thousands of variants explain differential VDR binding

We next used the QTL and ASB methods to explain differences in VDR binding affinity (Supplementary Material). The QTL analysis used Bayesian regression modelling of variants underlying binding peaks (21,22) (Supplementary Material, Tables S13 and S14; Supplementary Material, Figs S12–S15). The ASB analysis of allelic imbalance is based on read depth at heterozygous single-nucleotide variants (23) and uses sample-specific maternal and paternal reference maps, which we corrected for unannotated copy number variation and from which we excluded variants in ENCODE blacklisted repeat regions (24). The two methods' variants associated with differences in VDR binding (VDR-Binding Variants, VDR-BVs) were annotated and prioritised based on the guidelines proposed by (25) and related algorithms (Funseq2 (26)).
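The allelic-imbalance idea behind ASB can be sketched as a two-sided exact binomial test on the reads carrying each allele at a heterozygous site. This is a generic illustration under the assumption of an unbiased 50:50 null; it is not necessarily the exact procedure or thresholds used in the study, and `binomial_two_sided_p` is a hypothetical name.

```python
from math import comb


def binomial_two_sided_p(ref_reads: int, alt_reads: int,
                         p_ref: float = 0.5) -> float:
    """Exact two-sided binomial P-value for allelic imbalance.

    Sums the probabilities of all read splits that are no more likely
    than the observed split under the null proportion `p_ref`.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, no evidence of imbalance

    def pmf(k: int) -> float:
        return comb(n, k) * p_ref ** k * (1.0 - p_ref) ** (n - k)

    observed = pmf(ref_reads)
    # Small relative tolerance guards against floating-point ties.
    total = sum(pmf(k) for k in range(n + 1)
                if pmf(k) <= observed * (1.0 + 1e-7))
    return min(1.0, total)
```

A site with 20 reference reads and no alternative reads gives a very small P-value, whereas a 10:10 split gives 1. In practice the null proportion may need adjusting for reference-mapping bias (71), which is one reason the study maps reads to sample-specific maternal and paternal references.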

We report a total of 43332 VDR-BVs. Of these, only 6.6–10.5% (2867 or 4447) lie within (or 'hit') RXR::VDR consensus motifs (PScanChIP score > 0.7 or > 0.5, respectively). This could be explained in part by variants that alter DNA-binding affinity either for other subunits of a larger multi-molecular complex ('collaborative binding' (27)) or for factors that inhibit RXR::VDR binding.

Consequently, we next considered whether sequence variation in motifs for potential binding cofactors of RXR::VDR could explain the remaining > 90% of binding variation. We found that position weight matrices for 26 additional TFs (Supplementary Material, Tables S15 and S16) were significantly enriched (P < 0.01) (i) relative to both open chromatin genome-wide and chromosomally adjacent regions (< 150 bp), and/or (ii) at the peak summits within VDR-BV intervals (PScanChIP score > 0.5 or 0.7; Supplementary Methods). Virtually all (84.1–88.3%) of binding variation could be explained by VDR-BVs lying within a RXR::VDR consensus motif or within one or more of these 26 additional motifs (PScanChIP score > 0.7 or > 0.5). These are upper-bound values because of the inaccuracy of motif prediction and because not all variants lying in motifs will alter binding affinity.

[Figure 2, panels A–D. (A) Probability against position of best site in sequence (−150 to 150). (B) Region-gene associations (%) against distance to nearest TSS (kb) for VDR binding regions: CPo3 all, CPo3 RXR::VDR class I and CPo3 RXR::VDR class II. (C) Class I: Biological Process and Disease Ontology terms. (D) Class II: Biological Process terms. Bars show fold change; Hinton plots show −log10(p).]

Figure 2. VDR binding intervals are immune-associated or are housekeeping-associated, tending to be either present far from, or adjacent to, transcriptional start sites, respectively. (A) The most highly ranking motif cluster ensuing from a de novo motif analysis of the VDR CPo3 consensus intervals using MEME-ChIP (65), including a CentriMO (80) analysis of enrichment with respect to VDR peak centres. (B–D) GREAT analyses (81) for subsets of VDR peaks from the CPo3 consensus set containing either a strong-to-perfect (class I) or weak-to-non-existent (class II) instance of the RXR::VDR heterodimer motif. (B) Distributions of distances between known TSS sites (UCSC hg19, Feb 2009) and VDR binding locations for classes I or II, or all VDR peaks. (C,D) Enrichment for Gene Ontology Biological Process and Disease Ontology terms for class I (C) or class II (D) RXR::VDR containing sites, respectively. For (C) and (D), bars map to fold change whereas Hinton plots map to −log10 of the enrichment significance P-value.

Functional annotation of VDR-BVs

To begin to understand whether VDR-BVs could contribute to particular human diseases, we first considered their enrichment in relevant genomic annotations. We were interested in whether these variably-bound sites are enriched relative to a background of all VDR binding regions genome-wide, so that the non-uniform chromosomal distribution of binding sites is accounted for. We chose for our null expectation all variants that were tested for differential binding within replicated (CPo3) VDR binding peaks (Supplementary Material). Relative to this stringent background, VDR-BVs are 20–40% enriched within enhancers but not within promoters (Fig. 3A). These patterns of enrichment are reinforced when we stringently select only the most significant VDR-BVs (VDR-sBVs; Supplementary Material, Fig. S16A; AlleleSeq FDR 0.01; 6715 VDR-BVs) or the 357 recurrent VDR-BVs that are called at the same position in multiple samples (VDR-rBVs; Supplementary Material, Fig. S16D). We conclude that VDR-BVs are not uniformly distributed among all VDR binding genomic intervals and are especially frequent in enhancer regions manifested in LCLs.

VDR-BVs occurred 60% more frequently in RXR::VDR motifs within replicated (CPo3) VDR binding peaks, and 120% more frequently in strong (class I) RXR::VDR motifs in the same regions (Fig. 3B). Again, these enrichments strengthened substantially for VDR-sBVs or VDR-rBVs (Supplementary Material, Fig. S16B and E, respectively).

We then considered whether VDR-BVs are significantly enriched within LCL expression QTLs (eQTLs) relative to 1000 Genomes variants under VDR ChIP-exo pileups tested for VDR binding variation potential (Supplementary Material). In addition, these background variants were matched by derived allele frequency and LD-dependence confounders (Methods). We also took care to only consider variants that do not share strong linkage disequilibrium (32183 VDR-BVs, 5808 VDR-sBVs and 341 VDR-rBVs).
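Keeping only variants that do not share strong LD can be illustrated by greedy pruning on a pairwise r² threshold. The sketch below uses the squared Pearson correlation of genotype dosages (0, 1 or 2 copies of an allele) as a stand-in for LD r²; the threshold of 0.8, the priority ordering and the names `r_squared` and `ld_prune` are illustrative assumptions, since the text here states only that strong LD was avoided.

```python
from typing import Dict, List, Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a monomorphic variant carries no LD information
    return cov * cov / (var_x * var_y)


def ld_prune(genotypes: Dict[str, Sequence[float]],
             priority: List[str],
             threshold: float = 0.8) -> List[str]:
    """Greedily keep variants in `priority` order, skipping any variant
    whose r^2 with an already-kept variant reaches `threshold`."""
    kept: List[str] = []
    for variant in priority:
        if all(r_squared(genotypes[variant], genotypes[k]) < threshold
               for k in kept):
            kept.append(variant)
    return kept
```

Ordering `priority` by significance keeps the strongest representative of each LD cluster. With only 27 samples, r² would normally be taken from a larger reference panel rather than estimated from the study samples themselves.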

VDR-BVs were enriched with dsQTLs (1.4-fold) and eQTLs (1.1–1.2-fold; Fig. 3C), particularly for the more stringent VDR-BV subsets (Supplementary Material, Fig. S16C and F). The enrichment in eQTLs was confirmed using an independent eQTL resource (GEUVADIS LCL eQTLs (28); Supplementary Material).
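The fold enrichments quoted here compare an observed overlap count with the expectation from matched background sets. A minimal sketch of that logic is given below, assuming matching on derived allele frequency (DAF) by simple binning and sampling with replacement; the bin count, the data layout and the name `matched_enrichment` are hypothetical, and the study itself used GAT (64) with further LD corrections.

```python
import random
from collections import defaultdict
from typing import Dict, Set, Tuple


def daf_bin(daf: float, n_bins: int = 10) -> int:
    """Assign a derived allele frequency in [0, 1] to one of n_bins bins."""
    return min(int(daf * n_bins), n_bins - 1)


def matched_enrichment(foreground: Dict[str, float],
                       background: Dict[str, float],
                       annotated: Set[str],
                       n_sets: int = 1000,
                       n_bins: int = 10,
                       seed: int = 0) -> Tuple[int, float, float, float]:
    """Return (observed, expected, fold change, empirical P).

    `foreground` and `background` map variant IDs to DAF; `annotated`
    holds the IDs of variants that overlap the annotation of interest.
    """
    rng = random.Random(seed)
    pools = defaultdict(list)
    for variant, daf in background.items():
        pools[daf_bin(daf, n_bins)].append(variant)

    observed = sum(variant in annotated for variant in foreground)
    simulated = []
    for _ in range(n_sets):
        count = 0
        for daf in foreground.values():
            pool = pools[daf_bin(daf, n_bins)]
            if pool:  # draw one DAF-matched background variant
                count += rng.choice(pool) in annotated
        simulated.append(count)

    expected = sum(simulated) / n_sets
    fold = observed / expected if expected > 0 else float("inf")
    # Add-one correction keeps the empirical P-value away from zero.
    p_value = (1 + sum(s >= observed for s in simulated)) / (n_sets + 1)
    return observed, expected, fold, p_value
```

The 'observed [expected]' labels in Figure 3 correspond to the first two returned values, and log2 of their ratio is the plotted fold change.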

[Figure 3, panels A–C: bars of log2(fc), each labelled with 'observed [expected]' overlap counts; RXR::VDR motif categories in panel B.]

Figure 3. Genomic association testing of VDR-BVs with functional annotation. VDR-BVs are enriched at enhancers, in RXR::VDR DR3-type motifs and in LCL eQTLs. (A) Enrichment of VDR-BVs within ENCODE chromatin state segmentation tracks for LCL NA12878, TF-dense regions (HOT (17) and ENCODE clustered TF binding sites) and DNase hypersensitive areas (ENCODE). (B) Enrichment of VDR-BVs in VDR consensus motif intervals present in VDR CPo3 binding regions. (C) Enrichment of VDR-BVs at DNase-QTLs and eQTLs from the Pritchard resource, and CEU/YRI LCL eQTLs from the GEUVADIS resource. Significant enrichments are indicated using blue histogram bars (fc = fold change of observed versus expected overlaps; FDR q < 0.1; numbers in a box close to each bar indicate 'observed [expected]' overlap counts; light grey bars indicate lack of significance). For A, enrichments shown are above-and-beyond the previously observed (Figure 1E) enrichments of VDR CPo3 peaks in the same functional annotation classes (relative to a background of 20330 1000 Genomes SNPs in CPo3 VDR binding regions). For B, enrichments are relative to a background of 114155 1000 Genomes variants lying under VDR ChIP-exo read pileups (≥ 5 reads), which had been tested as potential VDR binding affinity modifiers. This background was further corrected for the analyses in C, where only LD-independent foreground VDR-BVs were tested and 10000 DAF-matched random background sets were extracted with replacement from the main set of 114155 background variants.

Complex trait-associated variants influence VDR binding

To investigate whether susceptibility to specific diseases could be influenced by VDR-BVs, we considered their locations relative to GWAS variants significantly associated with 472 diverse diseases or traits from 688 studies in the largest catalogue available (GRASP v2.0 (29)), with at least 5 associated genome-wide significant SNPs (P < 5 × 10⁻⁸) (Supplementary Material). All traits available in these catalogues were considered so as not to bias subsequent findings. Again, we employed stringent procedures, matching variants for allele frequency and LD-dependence and discarding VDR-BVs in the MHC region, as well as again comparing against a background of all VDR-binding regions ('Test 3', p. 34 of the Supplementary Material).

VDR-BVs were significantly, and up to 2-fold, enriched within LD intervals associated with 17 diseases or traits (Fig. 4). It is notable that these are not drawn randomly from all 472 traits considered, but are mostly immune- or inflammatory-related disorders such as sarcoidosis, Graves' disease, Crohn's disease, irritable bowel syndrome and type I diabetes (Fig. 4A). The most enriched trait is Alzheimer's disease, for which 25-hydroxyvitamin D level is a proposed causal risk factor (30).
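Testing 472 traits at once requires a multiple-testing correction, and the figures report significance as FDR q-values. The sketch below computes Benjamini-Hochberg q-values; that this specific procedure was used is an assumption made for illustration (the text states only an FDR threshold of q < 0.1), and `benjamini_hochberg` is a hypothetical helper name.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg q-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values
```

A trait would then be reported as enriched when its q-value falls below 0.1 and, where a minimum effect size is also required, when its fold change exceeds the chosen cut-off.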

For the 341 VDR-rBVs, intervals from six disorders contained more VDR-rBVs than expected from sets of DAF-matched, LD-accounted regions that bind VDR (Fig. 4B). We emphasise that this represents a highly stringent analysis that accounts for all known confounding effects, namely potential biases from called VDR-binding sites, population stratification and physical linkage.

Functional impact of genetic variation on loss or gain of VDR binding

This evidence is consistent with altered RXR:VDR binding contributing to immunity-related diseases. Nevertheless, because variants in TFBS often fail to alter binding affinity and/or have no measurable effect on gene expression levels (31,32), we needed to demonstrate that VDR-BVs are functional, specifically that they influence gene expression and have been negatively selected over evolution (33). As expected if they are enriched in functional variants, we found that 50–100% more VDR-BVs than random samples lie near (<10 kb) genes that are differentially expressed in LCLs upon addition of calcitriol (14) (P < 10⁻⁴; Supplementary Methods).

Also as expected if, as a class, they are functional, replicated VDR-binding peaks exhibit variable cross-species conservation (34) across RXR:VDR motif positions (Supplementary Material, Fig. S17A and B). This signature of uneven selection across the DR3 motif is evident for class I VDR binding sites (Supplementary Material, Fig. S17C and D; Kruskal-Wallis test,

[Figure 4 graphic: bar charts for panels A and B; axis log2(fc); bar labels give 'observed [expected]' overlap counts; square sizes encode -log10(q). Trait labels recoverable from the extraction include irritable bowel syndrome and Crohn's disease.]

Figure 4. VDR-BVs are enriched in GWAS LD blocks for autoimmune, inflammatory and other diseases. (A,B) Traits and diseases showing enrichment of VDR-BVs (A) or VDR-rBVs (B). Disease/trait associations were acquired from the GRASP catalog v2.0 (29). Statistical association was computed between VDR-BVs and strong-LD (Supplementary Material) intervals around genome-wide significant (P < 5 × 10⁻⁸) disease tag-SNPs from the GRASP catalog. All available GRASP diseases and traits represented by at least 5 SNPs were analysed and those showing significant enrichment of VDR binding beyond a 1% FDR threshold were retained; disease associations supported by only 1 VDR-BV were discarded (full data are presented in the Supplementary Material). The sizes of black squares indicate the statistical significance of enrichments.


P = 1.9 × 10⁻¹¹ and post-hoc pairwise Kruskal-Wallis comparisons), but not in class II sites (Kruskal-Wallis test, P = 0.14), consistent with the DR3 heterodimer-binding motif regulating transcription.

Next, we assessed the impact on the VDR binding affinity of VDR-BVs with respect to their location within the consensus DR3-type motif (JASPAR database; Fig. 5), the transcriptionally active binding site motif for the RXR:VDR heterodimer (35–37). Mapping of allele-specific ChIP-exo reads to multiply replicated VDR binding regions (CPo3 peaks) was highly predictive of whether variants substantially alter ('break') functional motifs. For 89% of observations, the change in the strength of the VDR binding motif correctly predicts the direction of VDR binding affinity change (Fig. 5A), a proportion that rises to 100% in class I VDR binding sites (Supplementary Material, Fig. S18A). Motif breaks do not favour 5′ RXR over 3′ VDR hexamers (Fig. 5A; Wilcoxon rank sum test, P = 0.71), but especially impact conserved G and C nucleotides (positions 2, 5, 11 and 14) and a T nucleotide (position 13). Notably, fewer events occur at the near-essential T nucleotides at positions 4 and 12. Interestingly, in class I sites there are no VDR-BVs disrupting the T nucleotide at position 12 (Supplementary Material, Fig. S18A), which could reflect either low mutation rates (mutational 'cold spots') or strong purifying selection against deleterious nucleotide substitutions at these binding sites (38).

To distinguish these possibilities, we next compared the population frequencies of alleles associated with either historical Loss Of Binding (LOB) or Gain of Binding (GOB) affinity to VDR (Methods). LOB VDR-BVs cause significantly stronger effects when within either 5′ or 3′ hexamer than within the DR3

[Figure 5 graphic: (A) VDR-BV: concordance of variant genotype and binding affinity phenotype (motif break events per motif position; genotype and phenotype concordant or discordant). (B) Sequence logo (bits) of the DR3 motif, positions 1–15: 5′ hexamer (RXR), DR3 spacer, 3′ hexamer (VDR). (C) VDR-BV (LOB): impact on binding affinity (legend: no asymmetric binding, LOB, LOB + motif break). (D) VDR-BV (GOB): impact on binding affinity (legend: no asymmetric binding, GOB). Vertical axis in C and D: magnitude of VDR binding affinity change.]

Figure 5. Genetic variation modulating VDR binding affinity in LCLs within the canonical RXR:VDR heterodimeric binding motif. The figure shows a genome-wide quantification of the effect of genetic variation on VDR binding, based on the analysis of VDR-BVs in CPo3 binding peaks which hit the canonical RXR:VDR DR3 heterodimeric consensus motif (motif shown in B; JASPAR database (19)). (A) Distribution across the RXR:VDR motif of VDR-BVs which cause significant motif disruption (significance assessed via FunSeq2 using TFMPvalue (82) with default threshold P < 4 × 10⁻⁸) and quantification of the concordance between directionality of VDR PWM disruption and directionality of resulting VDR binding affinity variation. 102/115 (89%) VDR-BVs in CPo3 VDR peaks that significantly break the RXR:VDR motif predict the direction of VDR binding affinity change. (C,D) Genome-wide quantification of the phenotypic effect of all VDR-BVs intersecting the RXR:VDR motif (including those which do not generate a motif break at the above significance level). The vertical axis (Magnitude of VDR binding affinity change) indicates the fold change of read depth of the ancestral versus the derived allele (C) or derived versus the ancestral allele (D). (C) Impact on VDR binding affinity of Loss of Binding (LOB, orange dots) VDR-BVs and Loss of Binding VDR-BVs which test for significant RXR:VDR motif break (red dots). (D) Impact on VDR binding affinity of Gain of Binding (GOB) VDR-BVs (blue dots). Nucleotide position has a significant influence on LOB (Kruskal-Wallis rank sum test, χ²(14) = 33.385, P < 0.01) but not on GOB (χ²(12) = 17.68, P = 0.12) phenotype magnitude. (C,D) Grey dots indicate impact on binding affinity of those 1000 Genomes variants carried by the LCL samples which do not test for significant asymmetric binding (AlleleSeq (23), FDR threshold = 0.1) and are not therefore VDR-BVs.


spacer sequence (Wilcoxon rank sum test, P < 0.01; GOB variants, P = 0.11). VDR-BVs affecting one of the most highly conserved motif positions (position 13) always result in a motif-breaking LOB effect (Fig. 5C and D; Supplementary Material, Fig. S18C and D).

We next found that RXR:VDR motif sequences have been under constraint during recent evolution within the extant human population. We did so by comparing the frequencies of derived alleles (Derived Allele Frequency [DAF] test) for LOB or GOB VDR-BVs in either Northern and Western European (CEU) or Yoruban (YRI) populations. In the absence of demographic confounders, stronger purifying selection is implied by an excess of low population frequency variants in one set of variants over another (39).

We detected significant increases in DAF distribution for GOB over LOB VDR-BVs, both for CEU (Fig. 6C, top-left panel) and YRI (Fig. 6C, bottom-left panel) cohorts. Class I binding site VDR-BVs exhibited significant differences in GOB versus LOB DAF distributions (Supplementary Material, Fig. S19; P < 0.05 for both CEU and YRI). By contrast, variants in RXR:VDR hexamers not assigned as VDR-BVs showed no significant difference in GOB versus LOB DAF distributions (Fig. 6C, top-right and bottom-right panels). Significant differences between GOB and LOB DAFs were attained even when observing the RXR and VDR motif hexamers separately, and in both sub-populations (P < 0.012). Consequently, LOB variants are under substantially stronger purifying selection than are GOB variants. This is an important finding because it indicates that reduced binding of VDR at these genomic positions, and presumably reduced vitamin D levels, are commonly deleterious.
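The GOB versus LOB comparison of DAF distributions relies on Wilcoxon rank sum tests. The following is a minimal sketch of such a test, assuming a two-sided normal approximation with tie correction and no continuity correction; the function name `rank_sum_test` and the DAF values are illustrative placeholders, not data or code from the study.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum (Mann-Whitney U) test.

    Uses the normal approximation with a tie correction and no
    continuity correction. Returns (U statistic for x, z score, p-value).
    """
    # Pool both samples, remembering which group each value came from.
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    n = len(pooled)
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of a tied run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every value is identical: no evidence of a shift
        return u1, 0.0, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u1, z, p

# Hypothetical DAF values, for illustration only.
gob_daf = [0.62, 0.71, 0.80, 0.55, 0.90]
lob_daf = [0.05, 0.12, 0.30, 0.08, 0.21]
print(rank_sum_test(gob_daf, lob_daf))
```

With small samples an exact test is preferable to the normal approximation; the sketch is meant only to make the direction of the comparison explicit (a positive z indicates higher DAF in the first group).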

Discussion

We have demonstrated how genetic variation alters the binding affinity of the vitamin D receptor (VDR), a member of the nuclear receptor family of transcription factors. Genetic variants significantly associated with immune and inflammatory disease were found to disrupt VDR binding and to be located preferentially within enhancer regions (Fig. 3), in line with a recent analysis of causal immune disease-related variants (6). The study benefited from the 5-fold greater spatial resolution and signal-to-noise ratios provided by the ChIP-exo assay.

In its best studied model of allosteric modulation, the RXR:VDR heterodimer binds to DNA and enhances

[Figure 6 graphic: four panels (A–D) showing frequency distributions of DAF for GOB versus LOB variants. Panel groups: VDR-BVs; variants not affecting VDR binding (ns).]

Figure 6. Evolutionary conservation of VDR-BVs at RXR:VDR consensus motifs. The diagrams show distributions of the Derived Allele Frequency (DAF) for variants in RXR:VDR consensus motifs within reproducible CPo3 binding peaks. DAF values are separated by ethnicity (A and C, CEU; B and D, YRI); variants are separated by their effect on VDR binding affinity (A and B, VDR-BVs; C and D, 1000 Genomes variants carried by the LCL samples which do not test for significant asymmetric binding and are not therefore VDR-BVs). Within each of the four panels, variants are split in two groups based on their effect on VDR binding affinity direction (whether GOB or LOB). For all quadrants, only DAFs for variants hitting hexamer positions (i.e. hitting either the RXR recognition element at positions 1–6 or the VDR recognition element at positions 10–15) in the RXR:VDR motif are shown. An asterisk indicates significance (non-parametric Wilcoxon rank sum tests, α = 0.05). A, Wilcoxon rank sum test, P = 2.8 × 10⁻⁴; B, Wilcoxon rank sum test, P = 4.6 × 10⁻⁵; C and D, P = 0.55 and P = 0.14, respectively. ns = not significant.


transcription via response elements that typically consist of two hexameric ((A/G)G(T/G)TCA) half-sites, with RXR and VDR DNA binding domains occupying the upstream and downstream half-site, respectively (40). This DR3 motif, which we previously identified using ChIP-seq data from two calcitriol-activated CEU LCLs (14), was also the predominant VDR binding interface in calcitriol-activated LCLs in the present ChIP-exo study. We identified strong enrichment of VDR binding within enhancers, with stronger enrichments associated with the more stringent subsets of VDR-BVs (Figs 1 and 3). Whilst VDR binding occurs preferentially within promoters, these interactions are only rarely associated with RXR:VDR DR3 motifs and may be mediated by additional DNA-binding co-factors, including CTCF, and might reflect unproductive binding to open chromatin. In contrast, those VDR binding sites containing strong instances of the RXR:VDR DR3 motif were particularly enriched in enhancers and also near to genes involved in immunity processes and related diseases (Fig. 2). VDR thus is expected to exert a substantial effect on the biology of the immune system as a sequence-specific enhancer regulator.
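The half-site consensus described above lends itself to a simple pattern search. The following is a minimal sketch, not the PWM-based scanning used in this study: it assumes a strict forward-strand match to two (A/G)G(T/G)TCA hexamers separated by a 3-bp spacer (positions 7–9 of the 15-bp motif), and the names `HALF_SITE`, `DR3` and `find_dr3` are illustrative.

```python
import re

# Strict consensus for one half-site: (A/G)G(T/G)TCA.
HALF_SITE = "[AG]G[TG]TCA"

# DR3: upstream (RXR) hexamer, 3-bp spacer, downstream (VDR) hexamer.
# The lookahead lets overlapping candidate sites all be reported.
DR3 = re.compile(f"(?=({HALF_SITE})([ACGT]{{3}})({HALF_SITE}))")

def find_dr3(seq):
    """Return (0-based start, RXR half-site, spacer, VDR half-site)
    for every forward-strand match to the strict DR3 consensus."""
    seq = seq.upper()
    return [(m.start(), m.group(1), m.group(2), m.group(3))
            for m in DR3.finditer(seq)]

# Hypothetical sequence, for illustration only.
for hit in find_dr3("ttAGGTCAcgaGGTTCAtt"):
    print(hit)
```

A strict consensus match is far more rigid than a position weight matrix: it neither scores degenerate sites nor scans the reverse strand, both of which would be needed for a realistic search.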

Most remarkably, we found variants weakening or strengthening the RXR:VDR PWM score to be highly predictive of the loss or gain of VDR binding affinity (Fig. 5). Furthermore, the VDR-bound RXR:VDR motif shows evidence of levels of sequence conservation across vertebrate evolution that mirror its information content, and loss of VDR-binding variants over modern human evolution has occurred under stronger purifying selection than the gain of binding variants (Fig. 6). Together, these results imply that VDR binding, and thus presumably vitamin D levels, have in general been protective against disease.

Our study will have underestimated the true extent of regulatory variation at VDR binding sites. This is because the asymmetric binding analysis (23), by definition, can only test variants which are heterozygous for a given LCL sample; therefore, true positive VDR-BVs will be unobserved if all LCL samples we considered are homozygous. Similarly, not all of the required three biallelic genotypes will have been present in these samples for the regression-on-genotype analysis (22,21). Consequently, a larger scale study would be expected to show improved power to detect VDR-BVs.

At least 90% of VDR-BVs lay outside of RXR:VDR DR3 motifs. The majority of variable binding could occur, as with other TFs (41), at weak or non-canonical binding motifs, or its affinity may alter because of genetic disruption to a cofactor binding site (42). Another explanation is that VDR-BVs commonly exert their effect by altering the DNA conformation of regions flanking the core binding site (43). Nevertheless, our de novo motif discovery identified several proteins as potential RXR:VDR cofactors whose altered DNA-binding affinity could influence VDR binding. Over-representation of motifs in VDR peaks has been observed before, for example in a recent study of VDR binding in monocytes and monocyte-derived inflammatory and tolerogenic dendritic cells (44). Direct interaction between VDR and CREB1 has been reported previously (45), as have functional interactions between vitamin D/VDR and oestrogen metabolism and MYC proteins (46,47). Co-occurrence of GABPA and ESRRB motifs with VDR-binding sites had been reported previously (48). ZNF423 is not known to bind VDR, yet it might do so indirectly because it physically associates with its heterodimer partner RXRα (49).

Our finding that VDR-BVs are significantly enriched within eQTLs (Fig. 3C) is intriguing because the eQTLs were inferred from LCLs not treated with calcitriol, which implies that these enhancers' activities may not be entirely vitamin D-dependent in these cells. Nevertheless, there was no clear concordance or discordance between the direction of change of VDR binding at VDR-BVs (in calcitriol-treated LCLs) and the direction of change of gene expression in calcitriol-minus human LCLs reported for the GEUVADIS eQTL variants (data not shown). This, and the lack of enrichment of the DR3 motif in the VDR binding sites for the basal (unstimulated) state in LCLs (14), indicate that a study of calcitriol-activated LCLs at a similar scale to those employing unstimulated LCLs (28) will be required to fully resolve this issue.

Our most important results derived from a conservative analysis of a stringent set of replicated binding variants (Fig. 4B). From this we report a significant excess of VDR-rBVs coincident, or in strong LD, with genome-wide significant GWAS tag variants for six disorders, including three that are autoimmune disorders (inflammatory bowel disease, Crohn's disease and rheumatoid arthritis), whilst a fourth, endometriosis, is frequently comorbid with autoimmune disorders (50). Deficiency of serum 25(OH)D levels is associated with cardiovascular disease risk factors in adults (10), while the association between vitamin D levels and coronary artery disease, as for many disorders, is debated (51). These results associate a molecular phenomenon, the genetic disruption of nuclear receptor binding, with a narrow set of immune-related syndromes and diseases.

Based on the combined functional, evolutionary and GWAS-based evidence, we propose that VDR-BVs represent good targets for subsequent experimental validation using, for example, genome editing in cells. It is also expected that only a small minority of genes bound by VDR will be altered in expression. A large-scale expression QTL study in calcitriol-activated LCLs will thus be required to further narrow down the set of candidate disease genes whose regulatory elements are variably bound by VDR and which are differentially expressed across the human population.

We have provided the first quantitative association-based analysis to explain the genetic effect on binding affinity variation for a nuclear receptor and to propose a mechanistic model for disease susceptibility. Our results support the hypothesis that DNA variants altering transcription factor binding at enhancers contribute to complex disease aetiology, and suggest that altered VDR binding, and by inference variable vitamin D levels, explain in part altered autoimmune and other complex disease risk.

Materials and Methods

VDR ChIP-exo sample preparation

Calcitriol-stimulated lymphoblastoid cell lines (LCLs) were grown and prepared for ChIP as described previously, with some modifications (52), and adapted for ChIP-exo (15,16). LCLs are known to be a good model of primary B-cells (53) and we chose lines from HapMap, a unique resource whose genomes have already been sequenced. Briefly, cells were incubated in phenol red-free RPMI-1640, 10% charcoal-stripped FBS, 2 mM L-glutamine, penicillin with streptomycin solution (100 U/ml + 100 µg/ml) medium at 37 °C and 5% CO2. Cells were harvested after stimulation for 36 hours with 0.1 mM calcitriol (Sigma), cross-linked using a 1% formaldehyde buffer for 15 minutes at room temperature and quenched with 0.125 M glycine. Cells were lysed and chromatin sheared by sonication into fragments of 200–1000 bp. VDR-bound genomic DNA regions were isolated using a rabbit polyclonal antibody against VDR (Santa Cruz Biotechnology, sc-1008). Immunoprecipitated chromatin was


then processed enzymatically on magnetic beads. Samples were polished, A-tailed, ligated to sequencing library adaptors and then digested with lambda exonuclease to remove nucleotides from 5′ ends of double-stranded DNA. Single-stranded DNA was eluted and converted to double-stranded DNA by primer annealing and extension. A second sequencing adaptor was ligated to exonuclease-treated ends, PCR amplified, gel purified and sequenced. Libraries were prepared for Illumina as described in (16).

Data processing and peak calling

Full data processing steps are presented in the Supplementary Material. Briefly, we mapped the ChIP-exo sequence reads (Peconics Inc., USA; single-end, 40 bp) against the hg19 build of the human reference genome (54) using BWA (v 0.7.4 (55)) followed by Stampy (v 1.0.21 (56)). We filtered the reads to retain only uniquely mapping reads with MAPQ > 20 and removed reads mapping to empirical blacklist regions identified by the ENCODE consortium (24).
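The read-filtering step can be illustrated with a short sketch. It assumes reads have already been parsed into simple records with half-open coordinates, treats the MAPQ > 20 threshold as the proxy for unique mapping, and stores blacklist regions as per-chromosome interval lists; the `Read` type and both function names are hypothetical and do not correspond to the tools used in the study.

```python
from typing import NamedTuple

class Read(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    mapq: int

def overlaps(read, intervals):
    """True if the read overlaps any (start, end) half-open interval."""
    return any(read.start < b_end and b_start < read.end
               for b_start, b_end in intervals)

def filter_reads(reads, blacklist, min_mapq=20):
    """Keep reads with MAPQ strictly above min_mapq that avoid the blacklist.

    blacklist maps chromosome name -> list of (start, end) intervals.
    """
    kept = []
    for read in reads:
        if read.mapq <= min_mapq:
            continue  # low-confidence or multi-mapping alignment
        if overlaps(read, blacklist.get(read.chrom, [])):
            continue  # falls in a blacklisted region
        kept.append(read)
    return kept

# Hypothetical reads and blacklist, for illustration only.
reads = [Read("chr1", 100, 140, 60), Read("chr1", 500, 540, 37)]
blacklist = {"chr1": [(520, 600)]}
print(filter_reads(reads, blacklist))
```

The linear scan over blacklist intervals is adequate for a sketch; a genome-scale implementation would typically use an interval tree or sorted-interval sweep.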

For peak calling, we first computed strand cross-correlation profiles (57) of read start densities to obtain consensus estimates of the ChIP-exo digested fragment sizes, which we corrected for the presence of phantom peaks (58) using a mappability-corrected approach to cross-correlation inference (MaSC (59)). We used the cross-correlation based estimates for the digested fragment size as a MACS2 (version 2.0.10 (60)) input parameter to obtain peak calls. Separately, to increase sensitivity, we called peaks using GPS/GEM (v 2.4.1 (61)) with recommended ChIP-exo options: --smooth 3 --mrc 20. We then pooled, for each sample, the two sets of peak calls (Supplementary Material). Finally, we used DiffBind (62) to manipulate the per-sample merged peaksets and to obtain consensus peaksets (CPo3, CPo10, CPo20) based on an overlap threshold. To do this, we normalised the read numbers using DiffBind's embedded edgeR routines, based on the Trimmed Means of M's (TMM) algorithm (63).
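The idea behind strand cross-correlation can be sketched as follows. This is a naive illustration under simplifying assumptions: per-base read-start counts are given as two equal-length lists, the score is a plain Pearson correlation at each shift, and no phantom-peak or mappability correction (as provided by MaSC in the study) is attempted. The function names are illustrative.

```python
def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant track carries no signal
    return cov / (var_a * var_b) ** 0.5

def best_shift(plus, minus, max_shift):
    """Correlate plus-strand read starts with minus-strand read starts
    shifted upstream by d bases, for d = 0..max_shift.

    Returns (shift with the highest correlation, {shift: correlation}).
    """
    scores = {}
    for d in range(max_shift + 1):
        # plus[i] is paired with minus[i + d]
        scores[d] = pearson(plus[:len(plus) - d], minus[d:])
    return max(scores, key=scores.get), scores

# Hypothetical read-start tracks: minus-strand pileups sit 25 bp
# downstream of the plus-strand pileups.
plus = [0] * 100
minus = [0] * 100
plus[10], plus[40] = 5, 3
minus[35], minus[65] = 5, 3
shift, _ = best_shift(plus, minus, 60)
print(shift)
```

The shift that maximises the correlation serves as an estimate of the typical distance between opposite-strand read starts, which is the quantity passed on to the peak caller as a fragment-size parameter.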

Interval overlap enrichment analysis

We used the Genomic Association Tester (64) to perform the randomization-based interval enrichment analyses. All analyses were based on 10,000 randomisations over the chosen background, and backgrounds differed depending on the specific analysis performed. All analyses accounted for genome-wide patterns of GC content variability and, where relevant, for uniquely mappable regions (40 bp reads). For the enrichment analysis of VDR-BVs in GWAS intervals and QTLs, we used an in-house bootstrapping pipeline to perform LD-filtering (to retain only LD-independent foreground and background variants) and DAF matching. Further details on annotations, background and analytical design are provided in the Supplementary Material.
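The logic of a randomization-based enrichment test can be sketched at the level of variant sets. The example below is a simplified stand-in, not the Genomic Association Tester or the in-house pipeline: it assumes variants are hashable identifiers, draws each random set without replacement from the background, and ignores GC content, mappability, LD filtering and DAF matching. The function name `enrichment` and its parameters are hypothetical.

```python
import random

def enrichment(foreground, background, annotated, n_rand=10000, seed=0):
    """Compare the observed overlap of foreground variants with an
    annotation against overlaps of random background sets of equal size.

    Returns (observed, expected, fold change, empirical p-value).
    """
    rng = random.Random(seed)
    observed = sum(1 for v in foreground if v in annotated)
    k = len(foreground)
    null_counts = []
    for _ in range(n_rand):
        sample = rng.sample(background, k)
        null_counts.append(sum(1 for v in sample if v in annotated))
    expected = sum(null_counts) / n_rand
    fold = observed / expected if expected > 0 else float("inf")
    # Add-one correction so the empirical p-value is never exactly zero.
    exceed = sum(1 for c in null_counts if c >= observed)
    p_value = (1 + exceed) / (n_rand + 1)
    return observed, expected, fold, p_value

# Hypothetical variant identifiers, for illustration only.
background = list(range(1000))
annotated = set(range(100))  # 10% of the background is annotated
foreground = list(range(50)) + list(range(500, 550))
print(enrichment(foreground, background, annotated, n_rand=2000))
```

The observed and expected counts correspond to the 'observed [expected]' labels in the enrichment figures, and their ratio to the fold change; the resulting p-values would still need multiple-testing correction across annotations or traits.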

Motif discovery

Full motif analysis steps are documented in the Supplementary Material. Briefly, we carried out de novo motif analysis using multiple parallel approaches: MEME-ChIP (65) from the MEME suite (66), XXmotif (67) and PScanChIP (20) from the WeederMoDtools suite (68). For the MEME-ChIP and XXmotif analyses, we used the peak intervals in CPo10 and extracted the sequence underlying each interval ± 150 bp from a repeat-masked version of the hg19 reference (54). For the PScanChIP analysis, we used as the input the full set of CPo3 binding regions with Jaspar PWM descriptors, to which we added the best heterodimeric RXR:VDR PWM found by XXmotif and the best monomeric VDR PWM found by DREME (MEME-ChIP package). The global background chosen for the analysis consisted of built-in PScanChIP DNase I digital genomic footprinting data for the LCL CEPH individual NA12865. For the motif analysis at the 43,332 VDR-BVs, we used PScanChIP (20) with Jaspar PWM descriptors and a mixed background of the promoters and LCL DNase cut sites.

Differential binding analyses

We performed both allele-specific (VDR-ASB) and regression on genotype (VDR-QTL) differential binding analyses. Both methods start from analyses of read counts. For the VDR-QTL analysis, reads were normalised and covariate-corrected as detailed in the Supplementary Material.

For the VDR-ASB tests, we utilised the sequence composition of ChIP-exo sequence reads overlapping heterozygous SNPs to determine the sequences originating from each allele separately (23,69,70) and to identify allele-specific binding events showing a significant difference in the number of mapped reads between parental alleles. We employed a modified version of the AlleleSeq pipeline (23) to carry out the VDR-ASB analysis. Briefly, we tested for significant allelic imbalance among all read pile-ups intersecting a variant showing heterozygous genotype, given a null hypothesis of 50% paternal versus 50% maternal reads. We followed (23) in controlling for two major sources of bias typically encountered when running an allele-specific analysis of short read data: a bias due to mapping to the reference hg19 genome (71) and a bias due to unannotated copy number variants skewing the read counts for some of the SNPs being tested (72). In addition, we corrected for a third source of potential erroneous ASB SNP calls by removing any significant allelic imbalance hypothesis falling under repeat regions contained in the ENCODE blacklist data (24).
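The null hypothesis of 50% paternal versus 50% maternal reads corresponds to a binomial test on the allelic read counts at a heterozygous site. The following is a minimal sketch of an exact two-sided binomial test; it does not reproduce the reference-mapping and copy-number bias corrections or the FDR control of the modified AlleleSeq pipeline, and the function name and read counts are illustrative.

```python
from math import comb

def binomial_two_sided(paternal_reads, maternal_reads, p_null=0.5):
    """Exact two-sided binomial test for allelic imbalance.

    Sums the probabilities of all outcomes that are no more likely
    than the observed paternal read count under the null proportion.
    """
    n = paternal_reads + maternal_reads
    probs = [comb(n, k) * p_null ** k * (1 - p_null) ** (n - k)
             for k in range(n + 1)]
    p_obs = probs[paternal_reads]
    # Small relative tolerance so equally likely outcomes are included
    # despite floating-point rounding.
    total = sum(p for p in probs if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)

# Hypothetical read counts at one heterozygous SNP.
print(binomial_two_sided(18, 4))
```

Because every tested site yields its own p-value, a genome-wide analysis additionally requires multiple-testing control, such as the FDR threshold quoted for AlleleSeq in the figure legends.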

For the VDR-QTL tests, we analysed all 27 LCL samples simultaneously. We grouped the samples based on the genotypes of the underlying variant and carried out QTL association testing between the variant's genotype (under the assumption of an additive genetic model (73)) and the phenotype at that location (the VDR binding affinity, based on the quantile-normalised ChIP-exo peak size at that location). We used an in-house pipeline based on SNPs and indels imputed with IMPUTE2 (74) and Bayesian regression modelling based on SNPTEST (21) and BIMBAM (22). Bayesian methods for analysing SNP associations are now an established tool in GWAS analysis (75–77) and show advantages over the use of p-values in power and interpretation (73), though they require tighter initial modelling assumptions when compared to frequentist methods. For details on the underlying model and priors, we refer the reader to the Supplementary document.
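The additive genetic model can be made concrete with a small example. Note that the sketch below swaps in an ordinary least-squares fit for the Bayesian regression (SNPTEST, BIMBAM) actually used; it assumes genotypes are coded as allele dosages 0, 1 or 2 and that the phenotype is a normalised binding signal per sample. The function name and the numbers are hypothetical.

```python
def additive_fit(dosages, binding):
    """Least-squares fit of binding = intercept + slope * dosage.

    Under an additive model, slope is the change in binding signal
    per copy of the counted allele.
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(binding) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        # All samples share one genotype, so no effect can be estimated.
        raise ValueError("variant is monomorphic in these samples")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, binding))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical dosages and normalised peak sizes for six samples.
dosages = [0, 0, 1, 1, 2, 2]
binding = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
print(additive_fit(dosages, binding))
```

The explicit check for a monomorphic variant mirrors a limitation of any regression on genotype: with a small sample set, not all genotype classes will be present at every variant.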

Variant annotation

We annotated all variants associated with differences in VDR binding, obtained by pooling the VDR-QTL and the VDR-ASB variants (VDR-Binding Variants, VDR-BVs), according to the guidelines in (25) and using a customised version of the Funseq2 suite of algorithms (26). We added additional VDR-centric information to Funseq2: VDR CPo3 binding regions, motif intervals from the PScanChIP analysis for the VDR:RXR Jaspar DR3 motif and for the DREME VDR monomer, and VDR dimer/monomer PWM information. Gene annotation used Funseq2's


default (Gencode v 16), and the HOT annotation in Funseq was filtered to only retain LCL HOT information. We ran Funseq using the -m 2 -nc options so that all annotation referred to the ancestral allele of the variant.

Availability of Data and Material

Sequencing data from this study have been submitted to the Gene Expression Omnibus archive (GEO, http://www.ncbi.nlm.nih.gov/geo; accession number GSE73254).

Supplementary Material

Supplementary Material is available at HMG online.

Acknowledgements

We thank Mark Gerstein, Jieming Chen, Joel Rozowsky, Alexej Abyzov, Ekta Khurana and Yao Fu for precious help and technical insight on the AlleleSeq, CNVnator and Funseq2 platforms. We also thank Matthew Stephens and Yongtao Guan for help and insight on the PIMASS and BIMBAM platforms. We thank Gosia Trynka and Kamil Slowikowksi for providing r²-based LD block data; Yuchun Guo and David Gifford for precious help and technical insight on the GEM peak caller; Gerton Lunter for help on the STAMPY platform; Andreas Heger for help and technical insight on the GAT software and genomic randomization testing; Rory Stark and Simon Anders for insight on differential analysis of ChIP-seq data; Laura Clarke for help with the 1000 Genomes data; and Daniel Gaffney for providing helpful pointers regarding the GEUVADIS QTLs calls. We thank Li Shen and Ning-Yi Shao (Icahn School of Medicine at Mount Sinai) for insight into NGSplot utilisation, and Oscar Bedoya-Reina and Christoffer Nellaker for helpful discussions.

Conflict of Interest statement. None declared.

Funding

Medical Research Council [to GG, AJBT, WH, CPP; MC_UU_120081, MC_UU_120211, MC_EX_G1000902], Wellcome Trust and Genzyme [to SVR], the Multiple Sclerosis Society UK [grant number 91509] and the Council for Science and Technology (CONACyT), Mexico [grant number 211990 to AJBT]. Funding to pay the Open Access publication charges for this article was provided by the Research Councils UK (RCUK) open access fund.

References

1. Kasowski, M., Kyriazopoulou-Panagiotopoulou, S., Grubert, F., Zaugg, J.B., Kundaje, A., Liu, Y., Boyle, A.P., Zhang, Q.C., Zakharia, F., Spacek, D.V. et al. (2013) Extensive variation in chromatin states across humans. Science, 342, 750–752.

2. McVicker, G., van de Geijn, B., Degner, J.F., Cain, C.E., Banovich, N.E., Raj, A., Lewellen, N., Myrthil, M., Gilad, Y. and Pritchard, J.K. (2013) Identification of Genetic Variants That Affect Histone Modifications in Human Cells. Science, 342, 747–749.

3. Kilpinen, H., Waszak, S.M., Gschwind, A.R., Raghav, S.K., Witwicki, R.M., Orioli, A., Migliavacca, E., Wiederkehr, M., Gutierrez-Arcelus, M., Panousis, N.I. et al. (2013) Coordinated effects of sequence variation on DNA binding, chromatin structure and transcription. Science, 342, 744–747.

4. Maurano, M.T., Humbert, R., Rynes, E., Thurman, R.E., Haugen, E., Wang, H., Reynolds, A.P., Sandstrom, R., Qu, H., Brody, J. et al. (2012) Systematic localization of common disease-associated variation in regulatory DNA. Science, 337, 1190–1195.

5. Gusev, A., Lee, S.H., Trynka, G., Finucane, H., Vilhjalmsson, B.J., Xu, H., Zang, C., Ripke, S., Bulik-Sullivan, B., Stahl, E. et al. (2014) Partitioning heritability of regulatory and cell-type-specific variants across 11 common diseases. Am. J. Hum. Genet., 95, 535–552.

6. Farh, K.K., Marson, A., Zhu, J., Kleinewietfeld, M., Housley, W.J., Beik, S., Shoresh, N., Whitton, H., Ryan, R.J., Shishkin, A.A. et al. (2015) Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature, 518, 337–343.

7. Heinz, S., Romanoski, C.E., Benner, C. and Glass, C.K. (2015) The selection and function of cell type-specific enhancers. Nat. Rev. Mol. Cell Biol., 16, 144–154.

8. Arner, E., Daub, C.O., Vitting-Seerup, K., Andersson, R. et al. (2015) Transcribed enhancers lead waves of coordinated transcription in transitioning mammalian cells. Science, 347, 1010–1014.

9. Holick, M.F. (2007) Vitamin D Deficiency. N. Engl. J. Med., 357, 266–281. PMID: 17634462.

10. Martins, D., Wolf, M., Pan, D. et al. (2007) Prevalence of cardiovascular risk factors and the serum levels of 25-hydroxyvitamin D in the United States: data from the Third National Health and Nutrition Examination Survey. Arch. Intern. Med., 167, 1159–1165.

11. Mokry, L.E., Ross, S., Ahmad, O.S., Forgetta, V., Smith, G.D., Leong, A., Greenwood, C.M.T., Thanassoulis, G. and Richards, J.B. (2015) Vitamin D and Risk of Multiple Sclerosis: A Mendelian Randomization Study. PLoS Med., 12, 1–20.

12. Afzal, S., Brondum-Jacobsen, P., Bojesen, S.E. and Nordestgaard, B.G. (2014) Genetically low vitamin D concentrations and increased mortality: mendelian randomisation analysis in three large cohorts. BMJ, 349.

13. Kliewer, S.A., Umesono, K., Mangelsdorf, D.J. and Evans, R.M. (1992) Retinoid X receptor interacts with nuclear receptors in retinoic acid, thyroid hormone and vitamin D3 signalling. Nature, 355, 446–449.

14. Ramagopalan, S.V., Heger, A., Berlanga, A.J., Maugeri, N.J., Lincoln, M.R., Burrell, A., Handunnetthi, L., Handel, A.E., Disanto, G., Orton, S.M. et al. (2010) A ChIP-seq defined genome-wide map of vitamin D receptor binding: Associations with disease and evolution. Genome Res., 20, 1352–1360.

15. Rhee, H.S. and Pugh, B.F. (2011) Comprehensive Genome-wide Protein-DNA Interactions Detected at Single-Nucleotide Resolution. Cell, 147, 1408–1419.

16. Rhee, H.S. and Pugh, B.F. (2012) ChIP-exo method for identifying genomic location of DNA-binding proteins with near-single-nucleotide accuracy. Curr. Protoc. Mol. Biol., Chapter 21, Unit 21.24.

17. Yip, K., Cheng, C., Bhardwaj, N., Brown, J., Leng, J., Kundaje, A., Rozowsky, J., Birney, E., Bickel, P., Snyder, M. and Gerstein, M. (2012) Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors. Genome Biol., 13, R48.

18. Carlberg, C., Bendik, I., Wyss, A., Meier, E., Sturzenbecker, L.J., Grippo, J.F. and Hunziker, W. (1993) Two nuclear signalling pathways for vitamin D. Nature, 361, 657–660.

19. Mathelier, A., Zhao, X., Zhang, A.W., Parcy, F., Worsley-Hunt, R., Arenillas, D.J., Buchman, S., Chen, C.Y., Chou, A., Ienasescu, H. et al. (2014) JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res., 42, D142–D147.

20. Zambelli, F., Pesole, G. and Pavesi, G. (2013) PscanChIP: Finding over-represented transcription factor-binding site motifs and their correlations in sequences from ChIP-Seq experiments. Nucleic Acids Res., 41, W535–W543.

21. Marchini, J., Howie, B., Myers, S., McVean, G. and Donnelly, P. (2007) A new multipoint method for genome-wide association studies by imputation of genotypes. Nat. Genet., 39, 906–913.

22. Servin, B. and Stephens, M. (2007) Imputation-based analysis of association studies: candidate regions and quantitative traits. PLoS Genet., 3, e114.

23. Rozowsky, J., Abyzov, A., Wang, J., Alves, P., Raha, D., Harmanci, A., Leng, J., Bjornson, R., Kong, Y., Kitabayashi, N. et al. (2011) AlleleSeq: analysis of allele-specific expression and binding in a network framework. Mol. Sys. Biol., 7, 522.

24. Dunham, I., Kundaje, A., Aldred, S.F., Collins, P.J., Davis, C.A., Doyle, F., Epstein, C.B., Frietze, S., Harrow, J., Kaul, R. et al. (2012) An integrated encyclopedia of DNA elements in the human genome. Nature, 489, 57–74.

25. Khurana, E., Fu, Y., Colonna, V., Mu, X.J., Kang, H.M., Lappalainen, T., Sboner, A., Lochovsky, L., Chen, J., Harmanci, A. et al. (2013) Integrative annotation of variants from 1092 humans: application to cancer genomics. Science, 342, 1235587.

26. Fu, Y., Liu, Z., Lou, S., Bedford, J., Mu, X., Yip, K.Y., Khurana, E. and Gerstein, M. (2014) FunSeq2: A framework for prioritizing noncoding regulatory variants in cancer. Genome Biol., 15, 480.

27. Heinz, S., Romanoski, C.E., Benner, C., Allison, K.A., Kaikkonen, M.U., Orozco, L.D. and Glass, C.K. (2013) Effect of natural genetic variation on enhancer selection and function. Nature, 503, 487–492.

28. Lappalainen, T., Sammeth, M., Friedlander, M.R., 't Hoen, P.A., Monlong, J., Rivas, M.A., Gonzalez-Porta, M., Kurbatova, N., Griebel, T., Ferreira, P.G. et al. (2013) Transcriptome and genome sequencing uncovers functional variation in humans. Nature, 501, 506–511.

29. Eicher, J.D., Landowski, C., Stackhouse, B., Sloan, A., Chen, W., Jensen, N., Lien, J.P., Leslie, R. and Johnson, A.D. (2015) GRASP v2.0: an update on the genome-wide repository of associations between SNPs and phenotypes. Nucleic Acids Res., 43, 799–804.

30. Mokry, L.E., Ross, S., Morris, J.A., Manousaki, D., Forgetta, V. and Richards, J.B. (2016) Genetically decreased vitamin D and risk of Alzheimer disease. Neurology, 87, 2567–2574.

31. Cusanovich, D.A., Pavlovic, B., Pritchard, J.K. and Gilad, Y. (2014) The functional consequences of variation in transcription factor binding. PLoS Genet., 10, e1004226.

32. Spivakov, M. (2014) Spurious transcription factor binding: non-functional or genetically redundant? Bioessays, 36, 798–806.

33. Villar, D., Berthelot, C., Aldridge, S., Rayner, T.F., Lukk, M., Pignatelli, M., Park, T.J., Deaville, R., Erichsen, J.T., Jasinska, A.J. et al. (2015) Enhancer evolution across 20 mammalian species. Cell, 160, 554–566.

34. Siepel, A., Bejerano, G., Pedersen, J.S., Hinrichs, A.S., Hou, M., Rosenbloom, K., Clawson, H., Spieth, J., Hillier, L.W., Richards, S. et al. (2005) Evolutionarily conserved elements in vertebrate, insect, worm and yeast genomes. Genome Res., 15, 1034–1050.

35. Jimenez-Lara AM and Aranda A (1999) The vitamin D receptor binds in a transcriptionally inactive form and without a defined polarity on a retinoic acid response element. FASEB J, 13, 1073–1081.

36. Toell A, Polly P and Carlberg C (2000) All natural DR3-type vitamin D response elements show a similar functionality in vitro. Biochem J, 352 Pt 2, 301–309.

37. Zhang J, Chalmers MJ, Stayrook KR, Burris LL, Wang Y, Busby SA, Pascal BD, Garcia-Ordonez RD, Bruning JB, Istrate MA, et al. (2011) DNA binding alters coactivator interaction surfaces of the intact VDR-RXR complex. Nat Struct Mol Biol, 18, 556–563.

38. Drake JA, Bird C, Nemesh J, Thomas DJ, Newton-Cheh C, Reymond A, Excoffier L, Attar H, Antonarakis SE, Dermitzakis ET, et al. (2006) Conserved noncoding sequences are selectively constrained and not mutation cold spots. Nat Genet, 38, 223–227.

39. Fay JC, Wyckoff GJ and Wu CI (2001) Positive and negative selection on the human genome. Genetics, 158, 1227–1234.

40. Orlov I, Rochel N, Moras D and Klaholz BP (2012) Structure of the full human RXR/VDR nuclear receptor heterodimer complex with its DR3 target DNA. EMBO J, 31, 291–300.

41. Deplancke B, Alpern D and Gardeux V (2016) The Genetics of Transcription Factor DNA Binding Variation. Cell, 166, 538–554.

42. Reddy TE, Gertz J, Pauli F, Kucera KS, Varley KE, Newberry KM, Marinov GK, Mortazavi A, Williams BA, Song L, et al. (2012) Effects of sequence variation on differential allelic transcription factor occupancy and gene expression. Genome Res, 22, 860–869.

43. Levo M, Zalckvar E, Sharon E, Dantas Machado AC, Kalma Y, Lotam-Pompan M, Weinberger A, Yakhini Z, Rohs R and Segal E (2015) Unraveling determinants of transcription factor binding outside the core binding site. Genome Res, 25, 1018–1029.

44. Booth DR, Ding N, Parnell GP, Shahijanian F, Coulter S, Schibeci SD, Atkins AR, Stewart GJ, Evans RM, Downes M, et al. (2016) Cistromic and genetic evidence that the vitamin D receptor mediates susceptibility to latitude-dependent autoimmune diseases. Genes Immun, 17, 213–219.

45. Nakane M, Ma J, Ruan X, Kroeger PE and Wu-Wong R (2007) Mechanistic analysis of VDR-mediated renin suppression. Nephron Physiol, 107, 35–44.

46. Salehi-Tabar R, Nguyen-Yamamoto L, Tavera-Mendoza LE, Quail T, Dimitrov V, An BS, Glass L, Goltzman D and White JH (2012) Vitamin D receptor as a master regulator of the c-MYC/MXD1 network. Proc Natl Acad Sci USA, 109, 18827–18832.

47. Capellino S, Straub RH and Cutolo M (2014) Aromatase and regulation of the estrogen-to-androgen ratio in synovial tissue inflammation: common pathway in both sexes. Ann N Y Acad Sci, 1317, 24–31.

48. Tuoresmaki P, Vaisanen S, Neme A, Heikkinen S and Carlberg C (2014) Patterns of genome-wide VDR locations. PLoS One, 9, e96105.

49. Huang S, Laoukili J, Epping MT, Koster J, Holzel M, Westerman BA, Nijkamp W, Hata A, Asgharzadeh S, Seeger RC, et al. (2009) ZNF423 is critically required for retinoic acid-induced differentiation and is a marker of neuroblastoma outcome. Cancer Cell, 15, 328–340.

50. Matarese G, De Placido G, Nikas Y and Alviggi C (2003) Pathogenesis of endometriosis: natural immunity dysfunction or autoimmune disease? Trends Mol Med, 9, 223–228.


51. Kunadian V, Ford GA, Bawamia B, Qiu W and Manson JE (2014) Vitamin D deficiency and coronary artery disease: a review of the evidence. Am Heart J, 167, 283–291.

52. Ramagopalan SV, Maugeri NJ, Handunnetthi L, Lincoln MR, Orton SM, Dyment DA, Deluca GC, Herrera BM, Chao MJ, Sadovnick AD, et al. (2009) Expression of the multiple sclerosis-associated MHC class II allele HLA-DRB1*1501 is regulated by vitamin D. PLoS Genet, 5, e1000369.

53. Caliskan M, Cusanovich DA, Ober C and Gilad Y (2011) The effects of EBV transformation on gene expression levels and methylation profiles. Hum Mol Genet, 20, 1643–1652.

54. Kuhn RM, Haussler D and Kent WJ (2013) The UCSC genome browser and associated tools. Brief Bioinformatics, 14, 144–161.

55. Li H and Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25, 1754–1760.

56. Lunter G and Goodson M (2011) Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res, 21, 936–939.

57. Kharchenko PV, Tolstorukov MY and Park PJ (2008) Design and analysis of ChIP-seq experiments for DNA-binding proteins. Nat Biotechnol, 26, 1351–1359.

58. Kundaje A, Jung LY, Kharchenko P, Wold B, Sidow A, Batzoglou S and Park P (2013) Assessment of ChIP-seq data quality using cross-correlation analysis.

59. Ramachandran P, Palidwor GA, Porter CJ and Perkins TJ (2013) MaSC: mappability-sensitive cross-correlation for estimating mean fragment length of single-end short-read sequencing data. Bioinformatics, 29, 444–450.

60. Feng J, Liu T, Qin B, Zhang Y and Liu XS (2012) Identifying ChIP-seq enrichment using MACS. Nat Protoc, 7, 1728–1740.

61. Guo Y, Mahony S and Gifford DK (2012) High Resolution Genome Wide Binding Event Finding and Motif Discovery Reveals Transcription Factor Spatial Binding Constraints. PLoS Comput Biol, 8, e1002638.

62. Stark R and Brown G (2011) DiffBind: differential binding analysis of ChIP-Seq peak data. R package.

63. Robinson M and Oshlack A (2010) A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol, 11, R25.

64. Heger A, Webber C, Goodson M, Ponting CP and Lunter G (2013) GAT: a simulation framework for testing the association of genomic intervals. Bioinformatics, 29, 2046–2048.

65. Ma W, Noble WS and Bailey TL (2014) Motif-based analysis of large nucleotide data sets using MEME-ChIP. Nat Protoc, 9, 1428–1450.

66. Bailey TL, Boden M, Buske FA, Frith M, Grant CE, Clementi L, Ren J, Li WW and Noble WS (2009) MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res, 37, W202–W208.

67. Hartmann H, Guthohrlein EW, Siebert M, Luehr S and Soding J (2013) P-value-based regulatory motif discovery using positional weight matrices. Genome Res, 23, 181–194.

68. Pavesi G, Mereghetti P, Zambelli F, Stefani M, Mauri G and Pesole G (2006) MoD Tools: regulatory motif discovery in nucleotide sequences from co-regulated or homologous genes. Nucleic Acids Res, 34, W566–W570.

69. McDaniell R, Lee BK, Song L, Liu Z, Boyle AP, Erdos MR, Scott LJ, Morken MA, Kucera KS, Battenhouse A, et al. (2010) Heritable Individual-Specific and Allele-Specific Chromatin Signatures in Humans. Science, 328, 235–239.

70. Pickrell JK, Marioni JC, Pai AA, Degner JF, et al. (2010) Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature, 464, 768–772.

71. Degner JF, Marioni JC, Pai AA, Pickrell JK, Nkadori E, Gilad Y and Pritchard JK (2009) Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data. Bioinformatics, 25, 3207–3212.

72. Pickrell JK, Gaffney DJ, Gilad Y and Pritchard JK (2011) False positive peaks in ChIP-seq and other sequencing-based functional assays caused by unannotated high copy number regions. Bioinformatics, 27, 2144–2146.

73. Stephens M and Balding DJ (2009) Bayesian statistical methods for genetic association studies. Nat Rev Genet, 10, 681–690.

74. Howie BN, Donnelly P and Marchini J (2009) A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet, 5, e1000529.

75. Scott LJ, Mohlke KL, Bonnycastle LL, Willer CJ, et al. (2007) A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science, 316, 1341–1345.

76. Burton PR, Clayton DG, Cardon LR, Craddock N, et al. (2007) Genome-wide association study of 14000 cases of seven common diseases and 3000 shared controls. Nature, 447, 661–678.

77. Wakefield J (2009) Bayes factors for genome-wide association studies: comparison with P-values. Genet Epidemiol, 33, 79–86.

78. Anders S, McCarthy DJ, Chen Y, Okoniewski M, Smyth GK, Huber W and Robinson MD (2013) Count-based differential expression analysis of RNA sequencing data using R and Bioconductor. Nat Protoc, 8, 1765–1786.

79. Ernst J, Kheradpour P, Mikkelsen TS, Shoresh N, Ward LD, Epstein CB, Zhang X, Wang L, Issner R, Coyne M, et al. (2011) Mapping and analysis of chromatin state dynamics in nine human cell types. Nature, 473, 43–49.

80. Bailey TL and Machanick P (2012) Inferring direct DNA binding from ChIP-seq. Nucleic Acids Res, 40, e128.

81. McLean CY, Bristor D, Hiller M, Clarke SL, Schaar BT, Lowe CB, Wenger AM and Bejerano G (2010) GREAT improves functional interpretation of cis-regulatory regions. Nat Biotechnol, 28, 495–501.

82. Touzet H and Varre JS (2007) Efficient and accurate P-value computation for position weight matrices. Algorithms Mol Biol, 2, 15.


arthritis. The approach's validity is underscored by RXR::VDR motif sequence being predictive of binding strength and being evolutionarily constrained. Our findings are consistent with altered RXR::VDR binding contributing to immunity-related diseases. Replicated VDR-BVs associated with these disorders could represent causal disease risk alleles whose effect may be modifiable by vitamin D levels.

Introduction

Genetic variants can alter transcription factor (TF) binding, chromatin status and gene expression (1–3), and can explain variation in disease susceptibility. Rather than residing in protein-coding sequence, the majority (approximately 93%) of disease- or trait-associated variants lie in non-coding sequence (4). Furthermore, non-coding variants in regulatory elements explain 8-fold more heritability of complex traits than protein-coding variants (5). If we are to better understand the mechanistic bases of complex disease (and their interaction with environmental factors), then we will need to understand how sequence differences in TF binding sites (TFBSs) across multiple genotypes contribute to disease susceptibility. Variations in TF affinities will then need to be cross-referenced not just with the sentinel variant from genome-wide association studies (GWASs), which has a very low probability (approximately 5%) of being causal and is on average 14 kb from the true causal variant (6), but also with all other variants with which it lies in strong linkage disequilibrium (LD). True causal variants will be enriched in cis-regulatory elements, especially in enhancers and, to a lesser extent, in gene promoters (6). These elements are often active in only a few cell types (7) and at limited developmental time points (8), which provides an opportunity for relating genetic variation, via molecular observations (TF binding differences), to cellular or organ-based aetiologies.

Understanding complex disease will also require determining how functional genetic variants interact with environmental factors. One such factor is vitamin D, a class of fat-soluble secosteroids that are synthesised in the skin upon exposure to sunlight and that enhance intestinal absorption of dietary minerals. Many observational studies have yielded associations between low serum concentrations of 25-hydroxyvitamin D [25(OH)D] and the risks of developing diverse diseases (9,10). However, Mendelian randomisation studies, for example on susceptibility to multiple sclerosis (11) and all-cause mortality (12), have indicated a causal role for this hormone. Identifying a molecular basis to these statistical associations would also indicate a direct causal role for vitamin D levels in these diseases. Vitamin D signalling occurs principally following the binding of calcitriol, the active form of vitamin D, to its cognate nuclear vitamin D receptor (VDR), which binds DNA in a heterodimer with the retinoid X receptor (RXR (13)). We have previously shown that VDR binding in lymphoblastoid cell lines (LCLs) occurs at 2776 locations and preferentially (>2-fold) within intervals genetically associated with diseases such as Crohn's disease and multiple sclerosis (14). On one hand, these enrichments could indicate a direct modifying effect of VDR binding on disease risk. On the other hand, they could more trivially reflect a general association between regulatory regions and disease-associated genomic intervals. Nor do these enrichments of VDR binding within disease-associated intervals show whether altered VDR binding to DNA functional elements explains genetic contributions to disease risk. If it does, we then also wish to determine whether it is a gain or a loss of VDR binding that causally increases disease risk.

In this study, we inferred VDR-binding sites using ChIP-exo (chromatin immunoprecipitation combined with lambda exonuclease digestion followed by high-throughput sequencing (15,16)) from each of 27 LCL samples using three complementary approaches. The first, 'peak-calling', method models the variation in read density. The second and third approaches predict VDR-binding sites using genetic variation to explain differences in VDR-binding affinity, either by quantitative trait loci association testing ('QTL') across all samples or by allele-specific binding ('ASB') analysis of allelic imbalance based on read depth at sites with heterozygous single nucleotide variants. Sequence variants were identified that both alter VDR binding and have been statistically associated by GWAS with altered risk for particular diseases (or are in strong LD with such variants). These are excellent candidates for sequence variants that, through their alteration of VDR binding, directly alter disease susceptibility. Compared against a background of all VDR-binding sites, we found that human variants associated with variable VDR binding are enriched (by up to 2-fold) in genomic intervals previously associated with particular traits, including some autoimmune diseases.

Results

ChIP-exo yields finely resolved VDR binding peaks genome-wide

Calcitriol-stimulated lymphoblastoid cells for 30 HapMap samples were grown and prepared for ChIP essentially as described previously (14), with modifications for ChIP-exo (15,16) (Methods; Supplementary Material, Table S1). In total, 27 samples successfully passed quality checking (Supplementary Material). Unlike for data from the more traditional ChIP-seq approach, analysis of ChIP-exo data has yet to become standardised. We thus undertook an in-depth study using contrasting approaches to modelling the ChIP-exo data background, to handling duplicate reads, to performing cross-correlation analyses and to calling peaks; these approaches are discussed in detail in the Supplementary Material (see also Supplementary Material, Figs S1 and S2, Tables S2–S9).

VDR peaks were both highly reproducible between samples, with 15509 intervals containing peaks called in at least 3 samples (CPo3 peak set) (Fig. 1A and B; Supplementary Material, Figs S3 and S4), and highly concordant with our previously published VDR ChIP-seq data for calcitriol-stimulated LCLs (CPo3: 76.3%; Fig. 1C; Supplementary Material, Table S11) (14). For peaks identified in both experiments, binding intervals were considerably better resolved (typically >5-fold) using ChIP-exo (Supplementary Material, Fig. S2A; pairwise Kruskal-Wallis test, P<0.001).
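The idea of a consensus set such as CPo3 can be illustrated with a small coverage sweep. This is a minimal sketch, assuming peaks are half-open intervals on a single chromosome and that a consensus interval is any maximal stretch covered by peaks from at least `min_samples` samples; the function name and the input layout are hypothetical, and the paper's own consensus procedure (described in its Supplementary Material) may differ in detail.

```python
from collections import Counter
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def consensus_intervals(peaks_by_sample: Dict[str, List[Interval]],
                        min_samples: int) -> List[Interval]:
    """Return maximal intervals covered by peaks from at least `min_samples` samples."""
    delta = Counter()  # net change in sample coverage at each coordinate
    for peaks in peaks_by_sample.values():
        # Merge overlapping or touching peaks so that each sample counts once.
        merged: List[Interval] = []
        for start, end in sorted(peaks):
            if merged and start <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        for start, end in merged:
            delta[start] += 1
            delta[end] -= 1

    result: List[Interval] = []
    depth = 0
    open_at = None
    for pos in sorted(delta):
        depth += delta[pos]  # coverage on [pos, next coordinate)
        if depth >= min_samples and open_at is None:
            open_at = pos
        elif depth < min_samples and open_at is not None:
            result.append((open_at, pos))
            open_at = None
    return result
```

With `min_samples=3` this mirrors the "called in at least 3 samples" criterion; raising the threshold to 10 or 20 gives the analogues of the CPo10 and CPo20 sets.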

As expected, VDR binding peaks occurred preferentially at protein-coding transcriptional start sites (TSS) (Fig. 1D) and within regions of open chromatin (at LCL DNase I hypersensitivity sites, DHS), especially for those observed repeatedly (Supplementary Material, Figs S5 and S6). VDR binding peaks were significantly concentrated within proposed functional elements, specifically regions of high regulatory factor binding (HOT regions (17) and clustered TF binding sites from ENCODE), active promoters, enhancers and insulators (Fig. 1E; Supplementary Material, Fig. S7), and previously reported GWAS disease loci, principally for autoimmune disorders (Supplementary Material, Fig. S8). Notably, this latter observation could reflect a causal effect in which VDR binding influences disease susceptibility, and/or gene loci being correlated non-causally with both disease susceptibility intervals and VDR binding sites.

VDR binding motifs occur preferentially in enhancers

De novo motif analyses of peaks within ChIP-exo VDR binding intervals revealed motifs strongly resembling those for the RXR::VDR DR3 heterodimer (18,19) (Fig. 2A; Supplementary Material, Fig. S9A and Table S12). For example, 9899 (63.8% of the CPo3 set) instances of this DR3 heterodimer motif occurred in binding sites (PScanChIP (20) score >0.7). In addition, 874 instances of the monomeric VDR motif, which are known to have limited affinity for functional dimeric nuclear hormone receptor complexes (18), were also identified in these peak intervals (Supplementary Material, Fig. S9B; PScanChIP score = 1). The heterodimeric DR3 motif was strongly over-represented (Supplementary Material, Table S12) both locally (in the peak sequences compared to neighbouring regions) and globally (in the peak regions compared to LCL DHS sites (20)) and was significantly centrally enriched in the binding regions (Fig. 2A).
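Motif score thresholds such as 0.5 or 0.7 are easiest to interpret on a scale where 0 is the worst possible match to a position weight matrix and 1 is the best. The sketch below shows one common way to obtain such a relative score from a log-odds matrix. It is an illustration under stated assumptions, not PScanChIP's implementation: the count matrix is a toy DR3-like model (two AGGTCA half-sites separated by a 3-bp spacer) invented for the example, all function names are hypothetical, sequences are assumed to be upper-case A/C/G/T, and only the forward strand is scanned.

```python
import math

BASES = "ACGT"


def toy_dr3_counts(half_site="AGGTCA", spacer=3, strong=17, weak=1):
    """Toy count matrix: two identical half-sites separated by an uninformative spacer."""
    half = [{b: (strong if b == c else weak) for b in BASES} for c in half_site]
    gap = [{b: 5 for b in BASES} for _ in range(spacer)]
    return half + gap + [dict(col) for col in half]


def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into log2 odds against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        matrix.append({b: math.log2((column[b] + pseudocount) / total / background)
                       for b in BASES})
    return matrix


def relative_score(matrix, site):
    """Rescale the raw log-odds score of `site` to [0, 1] (worst to best possible match)."""
    raw = sum(col[base] for col, base in zip(matrix, site))
    lowest = sum(min(col.values()) for col in matrix)
    highest = sum(max(col.values()) for col in matrix)
    return (raw - lowest) / (highest - lowest)


def best_site(matrix, sequence):
    """Return (offset, relative score) of the best forward-strand match in `sequence`.

    Raises ValueError if the sequence is shorter than the matrix.
    """
    width = len(matrix)
    scored = [(offset, relative_score(matrix, sequence[offset:offset + width]))
              for offset in range(len(sequence) - width + 1)]
    return max(scored, key=lambda pair: pair[1])
```

A peak would then count as carrying a motif instance when its best site exceeds the chosen threshold, for example `best_site(matrix, peak_sequence)[1] > 0.7`.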

VDR binding sites could be separated into functionally distinct classes based on the presence of RXR::VDR DR3-like motifs. Class I sites contain strong DR3-like motifs (top 20% of the PScanChIP score distribution; 3094 regions; Supplementary Material, Fig. S10A) and tend to be located away from genes' transcriptional start sites (TSSs) and yet close to genes involved in immune response processes (Fig. 2B and C). Class I VDR binding sites show the greatest significance of proximity to genomic regions associated with cancer and autoimmune diseases (Fig. 2C). We compare these to Class II sites, which contain only weak or no DR3-like motifs (bottom 20% of PScanChIP scores; 3102 regions; Supplementary Material, Fig. S10A) (Fig. 2B and D). Class I binding events occur preferentially in strong or weak enhancers defined by ENCODE rather than in promoters (Supplementary Material, Fig. S10B and C).

Thousands of variants explain differential VDR binding

[Figure 1 image not reproduced. Panel A bar labels, in descending order: 87269, 25915, 15509, 10257, 7037, 5144, 3927, 3193, 2698, 2329, 2018, 1780, 1607, 1429, 1270, 1137, 1033, 927, 821, 732, 647, 554, 461, 389, 300, 226, 145. Panel C Venn counts: 13391 ChIP-exo (CPo3) only, 2118 shared, 658 ChIP-seq only. Panel D axes: read count per million mapped reads against genomic region (5' to 3': -2 kb, TSS, 33%, 66%, TES, 2 kb). Panel E axis: log2(fc).]

Figure 1. ChIP-exo peaks are reproducible among samples and with previous ChIP-seq results, and are enriched in promoters and enhancers. (A) Numbers of overlapping ChIP-exo peaks across 27 LCL samples (log scale). Highlighted rows indicate numbers of peaks called in at least three (CPo3), ten (CPo10) or twenty (CPo20) samples. (B) Normalised peak read depths (63,62,78) (Supplementary Material) are concordant between samples. VDR ChIP-exo peak binding affinity heatmap showing pairwise binding affinity correlation among all 27 sample pairs. (Inset) Histogram of correlation counts. (C) 50-fold enriched overlap between CPo3 peaks and the consensus ChIP-seq peakset from (14) (enrichment test: GAT (64); Supplementary Material). (D) Meta-gene profile of VDR binding reads. The panel shows a meta-profile of read pile-up for the pooled 27 LCL samples averaged over gene bodies (Ensembl v. 75 protein-coding genes). (E) Enrichment of VDR binding sites in the CPo3 consensus with ENCODE chromatin states (chromHMM (79)) from LCL NA12878, TF-dense regions (HOT (17) and ENCODE clustered TF binding sites) and DNase I hypersensitive areas (ENCODE). Asterisk denotes significance at 1% False Discovery Rate (FDR) threshold (enrichment test: GAT (64); Supplementary Material).

We next used the QTL and ASB methods to explain differences in VDR binding affinity (Supplementary Material). The QTL analysis used Bayesian regression modelling of variants underlying binding peaks (21,22) (Supplementary Material, Tables S13 and S14; Supplementary Material, Figs S12–S15). The ASB analysis of allelic imbalance is based on read depth at heterozygous single-nucleotide variants (23) and uses sample-specific maternal and paternal reference maps, which we corrected for unannotated copy number variation and from which we excluded variants in ENCODE blacklisted repeat regions (24). The variants that the two methods associated with differences in VDR binding (VDR-Binding Variants, VDR-BVs) were annotated and prioritised based on the guidelines proposed by (25) and related algorithms (FunSeq2 (26)).
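The core of an allele-specific binding test can be sketched as an exact binomial test on the reads covering a heterozygous site: if neither allele is preferred, each read carries the reference allele with probability 0.5. The function below is a minimal illustration under that assumption, using only the standard library. The name is hypothetical, and the sketch omits the parts of the real pipeline described above (personalised maternal and paternal references, copy number correction, blacklist filtering and FDR control).

```python
from math import comb


def allelic_imbalance_p(ref_reads: int, alt_reads: int, p_ref: float = 0.5) -> float:
    """Two-sided exact binomial P-value for imbalance between two alleles.

    Sums the probabilities of all outcomes that are no more likely than the
    observed split of reads under the null of balanced binding.
    """
    n = ref_reads + alt_reads
    pmf = [comb(n, k) * p_ref ** k * (1 - p_ref) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Small relative tolerance guards against floating-point ties.
    tail = sum(prob for prob in pmf if prob <= observed * (1 + 1e-9))
    return min(1.0, tail)
```

For example, a site with 9 reference and 1 alternative read gives P = 22/1024, whereas a 5:5 split gives P = 1. Sites with few reads have little power, which is one reason read-depth thresholds matter for this kind of analysis.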

We report a total of 43332 VDR-BVs. Of these, only 6.6–10.5% (2867 or 4447) lie within (or 'hit') RXR::VDR consensus motifs (PScanChIP score >0.7 or >0.5, respectively). This could be explained in part by variants that alter DNA-binding affinity either for other subunits of a larger multi-molecular complex ('collaborative binding' (27)) or for factors that inhibit RXR::VDR binding.

[Figure 2 image not reproduced. Panel A axes: probability against position of best site in sequence (-150 to 150). Panel B axes: region-gene associations (%) against distance to nearest TSS (kb), for all CPo3 VDR binding regions and for RXR::VDR class I and class II regions. Panels C and D: fold change and -log10(p) per enriched term. Class I Biological Process terms: positive regulation of B cell activation; antigen receptor-mediated signaling pathway; regulation of B cell activation; immune response-activating cell surface receptor signaling pathway; immune response-regulating cell surface receptor signaling pathway; immune response-activating signal transduction; immune response-regulating signaling pathway; activation of immune response; regulation of innate immune response; positive regulation of lymphocyte activation; regulation of leukocyte proliferation; positive regulation of leukocyte activation; immune response; regulation of immune response; positive regulation of immune response; positive regulation of cell activation; cellular response to cytokine stimulus; response to cytokine stimulus; positive regulation of immune system process; regulation of cytokine production. Class I Disease Ontology terms: laryngeal squamous cell carcinoma; laryngeal disease; mouth disease; Herpesviridae infectious disease; upper respiratory tract disease; nasopharynx carcinoma; dsDNA virus infectious disease; demyelinating disease of central nervous system; demyelinating disease; systemic lupus erythematosus; chronic lymphocytic leukemia; lupus erythematosus. Class II Biological Process terms: regulation of translational initiation; protein complex disassembly; macromolecular complex disassembly; cellular macromolecular complex disassembly; cellular protein complex disassembly; cellular component disassembly at cellular level; cellular component disassembly; response to DNA damage stimulus; regulation of cell cycle arrest.]

Figure 2. VDR binding intervals are immune-associated or are housekeeping-associated, tending to be either present far from or adjacent to transcriptional start sites, respectively. (A) The most highly ranking motif cluster ensuing from a de novo motif analysis of the VDR CPo3 consensus intervals using MEME-ChIP (65), including a CentriMo (80) analysis of enrichment with respect to VDR peak centres. (B–D) GREAT analyses (81) for subsets of VDR peaks from the CPo3 consensus set containing either a strong-to-perfect (class I) or weak-to-non-existent (class II) instance of the RXR::VDR heterodimer motif. (B) Distributions of distances between known TSS sites (UCSC hg19, Feb 2009) and VDR binding locations for classes I or II, or all VDR peaks. (C, D) Enrichment for Gene Ontology Biological Process and Disease Ontology terms for class I (C) or class II (D) RXR::VDR-containing sites, respectively. For (C) and (D), bars map to fold change, whereas Hinton plots map to -log10 of the enrichment significance P-value.

Consequently, we next considered whether sequence variation in motifs for potential binding cofactors of RXR::VDR could explain the remaining >90% of binding variation. We found that position weight matrices for 26 additional TFs (Supplementary Material, Tables S15 and S16) were significantly enriched (P<0.01) (i) relative to both open chromatin genome-wide and chromosomally adjacent regions (<150 bp) and/or (ii) at the peak summits within VDR-BV intervals (PScanChIP score >0.5 or 0.7; Supplementary Methods). Most (84.1–88.3%) of the binding variation could be explained by VDR-BVs lying within an RXR::VDR consensus motif or within one or more of these 26 additional motifs (PScanChIP score >0.7 or >0.5). These are upper-bound values because of the inaccuracy of motif prediction and because not all variants lying in motifs will alter binding affinity.

Functional annotation of VDR-BVs

To begin to understand whether VDR-BVs could contribute to particular human diseases, we first considered their enrichment in relevant genomic annotations. We were interested in whether these variably bound sites are enriched relative to a background of all VDR binding regions genome-wide, so that the non-uniform chromosomal distribution of binding sites is accounted for. We chose for our null expectation all variants that were tested for differential binding within replicated (CPo3) VDR binding peaks (Supplementary Material). Relative to this stringent background, VDR-BVs are 20–40% enriched within enhancers but not within promoters (Fig. 3A). These patterns of enrichment are reinforced when we stringently select only the most significant VDR-BVs (VDR-sBVs; Supplementary Material, Fig. S16A; AlleleSeq FDR threshold of 0.01; 6715 VDR-BVs) or the 357 recurrent VDR-BVs that are called at the same position in multiple samples (VDR-rBVs; Supplementary Material, Fig. S16D). We conclude that VDR-BVs are not uniformly distributed among all VDR binding genomic intervals and are especially frequent in enhancer regions manifested in LCLs.

VDR-BVs occurred 60% more frequently in RXR::VDR motifs within replicated (CPo3) VDR binding peaks, and 120% more frequently in strong (class I) RXR::VDR motifs in the same regions (Fig. 3B). Again, these enrichments strengthened substantially for VDR-sBVs or VDR-rBVs (Supplementary Material, Fig. S16B and E, respectively).

We then considered whether VDR-BVs are significantly enriched within LCL expression QTLs (eQTLs) relative to 1000 Genomes variants under VDR ChIP-exo pileups tested for VDR binding variation potential (Supplementary Material). In addition, these background variants were matched by derived allele frequency and LD-dependence confounders (Methods). We also took care to only consider variants that do not share strong linkage disequilibrium (32183 VDR-BVs, 5808 VDR-sBVs and 341 VDR-rBVs).

VDR-BVs were enriched within DNase I sensitivity QTLs (dsQTLs; 1.4-fold) and eQTLs (1.1–1.2-fold; Fig. 3C), particularly for the more stringent VDR-BV subsets (Supplementary Material, Fig. S16C and F). The enrichment in eQTLs was confirmed using an independent eQTL resource (GEUVADIS LCL eQTLs (28); Supplementary Material).
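The logic of comparing observed overlaps with those of frequency-matched background sets can be sketched as follows. This is a simplified illustration rather than the GAT-based procedure used in the paper: variants are assumed to be LD-independent already, matching is done only on coarse derived allele frequency (DAF) bins, and every name (`daf_bin`, `matched_enrichment`, the ten-bin scheme) is hypothetical.

```python
import random
from collections import defaultdict


def daf_bin(daf, n_bins=10):
    """Map a derived allele frequency in [0, 1] to a bin index."""
    return min(int(daf * n_bins), n_bins - 1)


def matched_enrichment(foreground, background, annotated, n_sets=1000, seed=0):
    """Compare annotation overlap of foreground variants with DAF-matched background sets.

    foreground, background: lists of (variant_id, daf) tuples.
    annotated: set of variant ids lying in the annotation (for example, eQTLs).
    Returns (observed, expected, fold_change, empirical_p).
    """
    rng = random.Random(seed)
    pool = defaultdict(list)
    for variant_id, daf in background:
        pool[daf_bin(daf)].append(variant_id)

    bins = [daf_bin(daf) for _, daf in foreground]
    missing = sorted({b for b in bins if not pool[b]})
    if missing:
        raise ValueError(f"no background variants in DAF bins {missing}")

    observed = sum(1 for variant_id, _ in foreground if variant_id in annotated)
    counts = []
    for _ in range(n_sets):
        # One background draw (with replacement) per foreground variant, same DAF bin.
        drawn = (rng.choice(pool[b]) for b in bins)
        counts.append(sum(1 for variant_id in drawn if variant_id in annotated))

    expected = sum(counts) / n_sets
    fold_change = observed / expected if expected > 0 else float("inf")
    # Add-one correction keeps the empirical P-value strictly above zero.
    empirical_p = (1 + sum(c >= observed for c in counts)) / (n_sets + 1)
    return observed, expected, fold_change, empirical_p
```

The returned `observed` and `expected` values correspond to the 'observed [expected]' overlap counts reported alongside fold changes, and the fold change is simply their ratio.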

Complex trait-associated variants influence VDR binding

[Figure 3 image not reproduced. Panels A–C: bar charts of log2(fc), with bars annotated with 'observed [expected]' overlap counts.]

Figure 3. Genomic association testing of VDR-BVs with functional annotation. VDR-BVs are enriched at enhancers, in RXR::VDR DR3-type motifs and in LCL eQTLs. (A) Enrichment of VDR-BVs within ENCODE chromatin state segmentation tracks for LCL NA12878, TF-dense regions (HOT (17) and ENCODE clustered TF binding sites) and DNase hypersensitive areas (ENCODE). (B) Enrichment of VDR-BVs in VDR consensus motif intervals present in VDR CPo3 binding regions. (C) Enrichment of VDR-BVs at DNase-QTLs and eQTLs from the Pritchard resource and CEU/YRI LCL eQTLs from the GEUVADIS resource. Significant enrichments are indicated using blue histogram bars (fc = fold change of observed versus expected overlaps; FDR q<0.1; numbers in a box close to each bar indicate 'observed [expected]' overlap counts; light grey bars indicate lack of significance). For A, enrichments shown are above and beyond the previously observed (Figure 1E) enrichments of VDR CPo3 peaks in the same functional annotation classes (relative to a background of 20330 1000 Genomes SNPs in CPo3 VDR binding regions). For B, enrichments are relative to a background of 114155 1000 Genomes variants lying under VDR ChIP-exo read pileups (5-read threshold), which had been tested as potential VDR binding affinity modifiers. This background was further corrected for the analyses in C, where only LD-independent foreground VDR-BVs were tested and 10000 DAF-matched random background sets were extracted with replacement from the main set of 114155 background variants.

To investigate whether susceptibility to specific diseases could be influenced by VDR-BVs, we considered their locations relative to GWAS variants significantly associated with 472 diverse diseases or traits from 688 studies in the largest catalogue available (GRASP v2.0 (29)) with at least 5 associated genome-wide significant SNPs (P < 5 × 10^-8) (Supplementary Material). All traits available in these catalogues were considered so as not to bias subsequent findings. Again, we employed stringent procedures, matching variants for allele frequency and LD-dependence and discarding VDR-BVs in the MHC region, as well as again comparing against a background of all VDR-binding regions ('Test 3', p. 34 of the Supplementary Material).

VDR-BVs were significantly, and up to 2-fold, enriched within LD intervals associated with 17 diseases or traits (Fig. 4). It is notable that these are not drawn randomly from all 472 traits considered but are mostly immune- or inflammatory-related disorders such as sarcoidosis, Graves' disease, Crohn's disease, irritable bowel syndrome and type I diabetes (Fig. 4A). The most enriched trait is Alzheimer's disease, for which 25-hydroxyvitamin D level is a proposed causal risk factor (30).

For the 341 VDR-rBVs, intervals from six disorders contained more VDR-rBVs than expected from sets of DAF-matched, LD-accounted regions that bind VDR (Fig. 4B). We emphasise that this represents a highly stringent analysis that accounts for all known confounding effects, namely potential biases from called VDR-binding sites, population stratification and physical linkage.

Functional impact of genetic variation on loss or gain of VDR binding

This evidence is consistent with altered RXR:VDR binding contributing to immunity-related diseases. Nevertheless, because variants in TFBS often fail to alter binding affinity and/or have no measurable effect on gene expression levels (31,32), we needed to demonstrate that VDR-BVs are functional, specifically that they influence gene expression and have been negatively selected over evolution (33). As expected if they are enriched in functional variants, we found that 50–100% more VDR-BVs than random samples lie near (< 10 kb) genes that are differentially expressed in LCLs upon addition of calcitriol (14) (P < 10⁻⁴; Supplementary Methods).

Also as expected if, as a class, they are functional, replicated VDR-binding peaks exhibit variable cross-species conservation (34) across RXR:VDR motif positions (Supplementary Material, Fig. S17A and B). This signature of uneven selection across the DR3 motif is evident for class I VDR binding sites (Supplementary Material, Fig. S17C and D; Kruskal Wallis test

[Figure 4 graphic: panels A and B plot log2(fc) per trait, with square size scaled by -log10(q); legible trait labels include irritable bowel syndrome and Crohn's disease. The per-trait count annotations cannot be reliably recovered from the extracted text.]

Figure 4. VDR-BVs are enriched in GWAS LD blocks for autoimmune, inflammatory and other diseases. (A, B) Traits and diseases showing enrichment of VDR-BVs (A) or VDR-rBVs (B). Disease/trait associations were acquired from the GRASP catalog v2.0 (29). Statistical association was computed between VDR-BVs and strong-LD (Supplementary Material) intervals around genome-wide significant (P < 5 × 10⁻⁸) disease tag-SNPs from the GRASP catalog. All available GRASP diseases and traits represented by at least 5 SNPs were analysed and those showing significant enrichment of VDR binding beyond a 1% FDR threshold were retained; disease associations supported by only 1 VDR-BV were discarded (full data are presented in the Supplementary Material). The sizes of black squares indicate the statistical significance of enrichments.


P = 1.9 × 10⁻¹¹ and post-hoc pairwise Kruskal Wallis comparisons) but not in class II sites (Kruskal Wallis test, P = 0.14), consistent with the DR3 heterodimer-binding motif regulating transcription.

Next, we assessed the impact on the VDR binding affinity of VDR-BVs with respect to their location within the consensus DR3-type motif (JASPAR database; Fig. 5), the transcriptionally active binding site motif for the RXR:VDR heterodimer (35–37). Mapping of allele-specific ChIP-exo reads to multiply replicated VDR binding regions (CPo3 peaks) was highly predictive of whether variants substantially alter ('break') functional motifs. For 89% of observations, the change in the strength of the VDR binding motif correctly predicts the direction of VDR binding affinity change (Fig. 5A), a proportion that rises to 100% in class I VDR binding sites (Supplementary Material, Fig. S18A). Motif breaks do not favour 5′ RXR over 3′ VDR hexamers (Fig. 5A; Wilcoxon rank sum test, P = 0.71) but especially impact conserved G and C nucleotides (positions 2, 5, 11 and 14) and a T nucleotide (position 13). Notably, fewer events occur at the near-essential T nucleotides at positions 4 and 12. Interestingly, in class I sites there are no VDR-BVs disrupting the T nucleotide at position 12 (Supplementary Material, Fig. S18A), which could reflect either low mutation rates (mutational 'cold spots') or strong purifying selection against deleterious nucleotide substitutions at these binding sites (38).

To distinguish these possibilities, we next compared the population frequencies of alleles associated with either historical Loss Of Binding (LOB) or Gain of Binding (GOB) affinity to VDR (Methods). LOB VDR-BVs cause significantly stronger effects when within either 5′ or 3′ hexamer than within the DR3

[Figure 5 graphic. Panel A: 'VDR-BV - concordance of variant genotype and binding affinity phenotype' (motif break events by motif position 1-15; legend: genotype and phenotype are concordant / genotype and phenotype are discordant). Panel B: sequence logo of the motif (bits, positions 1-15) with the 5′ hexamer (RXR), DR3 spacer and 3′ hexamer (VDR) marked. Panel C: 'VDR-BV (LOB) - impact on binding affinity' (legend: no asym binding, LOB + Motif Break, LOB). Panel D: 'VDR-BV (GOB) - impact on binding affinity' (legend: no asym binding, GOB). Vertical axis in C and D: magnitude of VDR binding affinity change.]

Figure 5. Genetic variation modulating VDR binding affinity in LCLs within the canonical RXR:VDR heterodimeric binding motif. The figure shows a genome-wide quantification of the effect of genetic variation on VDR binding, based on the analysis of VDR-BVs in CPo3 binding peaks which hit the canonical RXR:VDR DR3 heterodimeric consensus motif (motif shown in B; JASPAR database (19)). (A) Distribution across the RXR:VDR motif of VDR-BVs which cause significant motif disruption (significance assessed via FunSeq2 using TFMPvalue (82) with default threshold P < 4 × 10⁻⁸) and quantification of the concordance between directionality of VDR PWM disruption and directionality of resulting VDR binding affinity variation. 102/115 (89%) VDR-BVs in CPo3 VDR peaks that significantly break the RXR:VDR motif predict the direction of VDR binding affinity change. (C, D) Genome-wide quantification of the phenotypic effect of all VDR-BVs intersecting the RXR:VDR motif (including those which do not generate a motif break at the above significance level). The vertical axis (Magnitude of VDR binding affinity change) indicates the fold change of read depth of the ancestral versus the derived allele (C) or derived versus the ancestral allele (D). (C) Impact on VDR binding affinity of Loss of Binding (LOB, orange dots) VDR-BVs and Loss of Binding VDR-BVs which test for significant RXR:VDR motif break (red dots). (D) Impact on VDR binding affinity of Gain of Binding (GOB) VDR-BVs (blue dots). Nucleotide position has a significant influence on LOB (Kruskal-Wallis rank sum test, χ²(14) = 33.385, P < 0.01) but not on GOB (χ²(12) = 17.68, P = 0.12) phenotype magnitude. (C, D) Grey dots indicate impact on binding affinity of those 1000 Genomes variants carried by the LCL samples which do not test for significant asymmetric binding (AlleleSeq (23), FDR threshold = 0.1) and are not therefore VDR-BVs.


spacer sequence (Wilcoxon rank sum test, P < 0.01; GOB variants, P = 0.11). VDR-BVs affecting one of the most highly conserved motif positions (position 13) always result in a motif-breaking LOB effect (Fig. 5C and D; Supplementary Material, Fig. S18C and D).

We next found that RXR:VDR motif sequences have been under constraint during recent evolution within the extant human population. We did so by comparing the frequencies of derived alleles (Derived Allele Frequency [DAF] test) for LOB or GOB VDR-BVs in either Northern and Western European (CEU) or Yoruban (YRI) populations. In the absence of demographic confounders, stronger purifying selection is implied by an excess of low population frequency variants in one set of variants over another (39).
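
The comparison of two DAF distributions described above can be illustrated with the rank-based statistic that underlies a Wilcoxon rank sum test. The following is a minimal sketch, not the authors' pipeline: the function names and the DAF values are hypothetical, and only the U statistic and the derived probability are computed, with no P-value.

```python
def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a: each pair with a > b counts 1, each tie counts 0.5."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def probability_a_exceeds_b(sample_a, sample_b):
    """Fraction of (a, b) pairs in which a is larger (ties count half)."""
    return mann_whitney_u(sample_a, sample_b) / (len(sample_a) * len(sample_b))


# Hypothetical derived allele frequencies, for illustration only.
gob_daf = [0.30, 0.45, 0.60]
lob_daf = [0.02, 0.05, 0.30]
u_stat = mann_whitney_u(gob_daf, lob_daf)
effect = probability_a_exceeds_b(gob_daf, lob_daf)
```

A value of `effect` well above 0.5 would correspond to the pattern reported in the text, in which LOB variants are shifted towards lower derived allele frequencies than GOB variants.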

We detected significant increases in DAF distribution for GOB over LOB VDR-BVs, both for CEU (Fig. 6C, top-left panel) and YRI (Fig. 6C, bottom-left panel) cohorts. Class I binding site VDR-BVs exhibited significant differences in GOB versus LOB DAF distributions (Supplementary Material, Fig. S19; P < 0.05 for both CEU and YRI). By contrast, variants in RXR:VDR hexamers not assigned as VDR-BVs showed no significant difference in GOB versus LOB DAF distributions (Fig. 6C, top-right and bottom-right panels). Significant differences between GOB and LOB DAFs were attained even when observing the RXR and VDR motif hexamers separately and in both sub-populations (P < 0.012). Consequently, LOB variants are under substantially stronger purifying selection than are GOB variants. This is an important finding because it indicates that reduced binding of VDR at these genomic positions, and presumably reduced vitamin D levels, are commonly deleterious.

Discussion

We have demonstrated how genetic variation alters the binding affinity of the vitamin D receptor (VDR), a member of the nuclear receptor family of transcription factors. Genetic variants significantly associated with immune and inflammatory disease were found to disrupt VDR binding and to be located preferentially within enhancer regions (Fig. 3), in line with a recent analysis of causal immune disease-related variants (6). The study benefited from the 5-fold greater spatial resolution and signal-to-noise ratios provided by the ChIP-exo assay.

In its best studied model of allosteric modulation, the RXR:VDR heterodimer binds to DNA and enhances

[Figure 6 graphic: four panels (A-D) comparing GOB and LOB variants along a Frequency axis; two panels are labelled 'VDR-BVs' and two 'Variants not affecting VDR binding', the latter marked ns.]

Figure 6. Evolutionary conservation of VDR-BVs at RXR:VDR consensus motifs. The diagrams show distributions of the Derived Allele Frequency (DAF) for variants in RXR:VDR consensus motifs within reproducible CPo3 binding peaks. DAF values are separated by ethnicity (A and C, CEU; B and D, YRI); variants are separated by their effect on VDR binding affinity (A and B, VDR-BVs; C and D, 1000 Genomes variants carried by the LCL samples which do not test for significant asymmetric binding and are not therefore VDR-BVs). Within each of the four panels, variants are split in two groups based on their effect on VDR binding affinity direction (whether GOB or LOB). For all quadrants, only DAFs for variants hitting hexamer positions (i.e. hitting either the RXR recognition element at positions 1-6 or the VDR recognition element at positions 10-15) in the RXR:VDR motif are shown. An asterisk indicates significance (non-parametric Wilcoxon rank sum tests, α = 0.05). A: Wilcoxon rank sum test, P = 2.8 × 10⁻⁴. B: Wilcoxon rank sum test, P = 4.6 × 10⁻⁵. C and D: P = 0.55 and P = 0.14, respectively. ns = not significant.


transcription via response elements that typically consist of two hexameric ((A/G)G(T/G)TCA) half-sites, with RXR and VDR DNA binding domains occupying the upstream and downstream half-site, respectively (40). This DR3 motif, which we previously identified using ChIP-seq data from two calcitriol-activated CEU LCLs (14), was also the predominant VDR binding interface in calcitriol-activated LCLs in the present ChIP-exo study. We identified strong enrichment of VDR binding within enhancers, with stronger enrichments associated with the more stringent subsets of VDR-BVs (Figs 1 and 3). Whilst VDR binding occurs preferentially within promoters, these interactions are only rarely associated with RXR:VDR DR3 motifs and may be mediated by additional DNA-binding co-factors including CTCF, and might reflect unproductive binding to open chromatin. In contrast, those VDR binding sites containing strong instances of the RXR:VDR DR3 motif were particularly enriched in enhancers and also near to genes involved in immunity processes and related diseases (Fig. 2). VDR thus is expected to exert a substantial effect on the biology of the immune system as a sequence-specific enhancer regulator.
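
The half-site consensus quoted above lends itself to a simple illustration. The sketch below scans a sequence for exact matches to two (A/G)G(T/G)TCA half-sites separated by a 3-nucleotide spacer, giving the 15-position layout used in Figs 5 and 6. It is a strict consensus match on the given strand only, not the PWM-based scoring used in the study, and the function name and test sequence are hypothetical.

```python
import re

# Half-site consensus from the text: (A/G)G(T/G)TCA.
HALF_SITE = "[AG]G[TG]TCA"
# Two half-sites as a direct repeat with a 3-nucleotide spacer (DR3).
# The lookahead lets overlapping candidate sites be reported as well.
DR3_PATTERN = re.compile(f"(?=({HALF_SITE})([ACGT]{{3}})({HALF_SITE}))")


def find_dr3_sites(sequence):
    """Return (start, upstream_half_site, spacer, downstream_half_site) tuples."""
    hits = []
    for match in DR3_PATTERN.finditer(sequence.upper()):
        hits.append((match.start(), match.group(1), match.group(2), match.group(3)))
    return hits


# Hypothetical sequence containing one consensus DR3 element at offset 2.
sites = find_dr3_sites("ttAGGTCAcgaGGTTCAtt")
```

Following the text, the upstream half-site would be the one occupied by RXR and the downstream half-site the one occupied by VDR. Scanning the reverse complement and tolerating mismatches are left out of this sketch.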

Most remarkably, we found variants weakening or strengthening the RXR:VDR PWM score to be highly predictive of the loss or gain of VDR binding affinity (Fig. 5). Furthermore, the VDR-bound RXR:VDR motif shows evidence of levels of sequence conservation across vertebrate evolution that mirror its information content, and loss of VDR-binding variants over modern human evolution has occurred under stronger purifying selection than the gain of binding variants (Fig. 6). Together, these results imply that VDR-binding, and thus presumably vitamin D levels, have in general been protective against disease.

Our study will have underestimated the true extent of regulatory variation at VDR binding sites. This is because the asymmetric binding analysis (23), by definition, can only test variants which are heterozygous for a given LCL sample; therefore, true positive VDR-BVs will be unobserved if all LCL samples we considered are homozygous. Similarly, not all of the required three biallelic genotypes will have been present in these samples for the regression-on-genotype analysis (22,21). Consequently, a larger scale study would be expected to show improved power to detect VDR-BVs.

At least 90% of VDR-BVs lay outside of RXR:VDR DR3 motifs. The majority of variable binding could occur, as with other TFs (41), at weak or non-canonical binding motifs, or its affinity may alter because of genetic disruption to a cofactor binding site (42). Another explanation is that VDR-BVs commonly exert their effect by altering the DNA conformation of regions flanking the core binding site (43). Nevertheless, our de novo motif discovery identified several proteins as potential RXR:VDR cofactors whose altered DNA-binding affinity could influence VDR binding. Over-representation of motifs in VDR peaks has been observed before, for example in a recent study of VDR binding in monocytes and monocyte-derived inflammatory and tolerogenic dendritic cells (44). Direct interaction between VDR and CREB1 has been reported previously (45), as have functional interactions between vitamin D/VDR and oestrogen metabolism and MYC proteins (46,47). Co-occurrence of GABPA and ESRRB motifs with VDR-binding sites had been reported previously (48). ZNF423 is not known to bind VDR, yet it might do so indirectly because it physically associates with its heterodimer partner RXRα (49).

Our finding that VDR-BVs are significantly enriched within eQTLs (Fig. 3C) is intriguing because the eQTLs were inferred from LCLs not treated with calcitriol, and implies that these enhancers' activities may not be entirely vitamin D-dependent in these cells. Nevertheless, there was no clear concordance or discordance between the direction of change of VDR binding at VDR-BVs (in calcitriol-treated LCLs) and the direction of change of gene expression in calcitriol-minus human LCLs reported for the GEUVADIS eQTL variants (data not shown). This, and the lack of enrichment of the DR3 motif in the VDR binding sites for the basal (unstimulated) state in LCLs (14), indicate that a calcitriol-activated LCL study at a similar scale to those employing unstimulated LCLs (28) will be required to fully resolve this issue.

Our most important results derived from a conservative analysis of a stringent set of replicated binding variants (Fig. 4B). From this we report a significant excess of VDR-rBVs coincident or in strong LD with genome-wide significant GWAS tag variants for six disorders, including three that are autoimmune disorders (inflammatory bowel disease, Crohn's disease and rheumatoid arthritis), whilst a fourth, endometriosis, is frequently comorbid with autoimmune disorders (50). Deficiency of serum 25(OH)D levels is associated with cardiovascular disease risk factors in adults (10), while the association between vitamin D levels and coronary artery disease, as for many disorders, is debated (51). These results associate a molecular phenomenon, the genetic disruption of nuclear receptor binding, with a narrow set of immune-related syndromes and diseases.

Based on the combined functional, evolutionary and GWAS-based evidence, we propose that VDR-BVs represent good targets for subsequent experimental validation using, for example, genome editing in cells. It is also expected that only a small minority of genes bound by VDR will be altered in expression. A large-scale expression QTL study in calcitriol-activated LCLs will thus be required to further narrow down the set of candidate disease genes whose regulatory elements are variably bound by VDR and which are differentially expressed across the human population.

We have provided the first quantitative association-based analysis to explain the genetic effect on binding affinity variation for a nuclear receptor and to propose a mechanistic model for disease susceptibility. Our results support the hypothesis that DNA variants altering transcription factor binding at enhancers contribute to complex disease aetiology, and suggest that altered VDR binding, and by inference variable vitamin D levels, explain in part altered autoimmune and other complex disease risk.

Materials and Methods

VDR ChIP-exo sample preparation

Calcitriol-stimulated lymphoblastoid cell lines (LCLs) were grown and prepared for ChIP as described previously with some modifications (52) and adapted for ChIP-exo (15,16). LCLs are known to be a good model of primary B-cells (53) and we chose lines from HapMap, a unique resource whose genomes have already been sequenced. Briefly, cells were incubated in phenol red-free RPMI-1640, 10% charcoal-stripped FBS, 2 mM L-glutamine, penicillin with streptomycin solution (100 U/ml + 100 µg/ml) medium at 37 °C and 5% CO2. Cells were harvested after stimulation for 36 hours with 0.1 mM calcitriol (Sigma), cross-linked using a 1% formaldehyde buffer for 15 minutes at room temperature and quenched with 0.125 M glycine. Cells were lysed and chromatin sheared by sonication into fragments of 200–1000 bp. VDR-bound genomic DNA regions were isolated using a rabbit polyclonal antibody against VDR (Santa Cruz Biotechnology, sc-1008). Immunoprecipitated chromatin was


then processed enzymatically on magnetic beads. Samples were polished, A-tailed, ligated to sequencing library adaptors and then digested with lambda exonuclease to remove nucleotides from 5′ ends of double-stranded DNA. Single-stranded DNA was eluted and converted to double-stranded DNA by primer annealing and extension. A second sequencing adaptor was ligated to exonuclease-treated ends, and the product was PCR amplified, gel purified and sequenced. Libraries were prepared for Illumina as described in (16).

Data processing and peak calling

Full data processing steps are presented in the Supplementary Material. Briefly, we mapped the ChIP-exo sequence reads (Peconics Inc., USA; single-end, 40 bp) against the hg19 build of the human reference genome (54) using BWA (v0.7.4 (55)) followed by Stampy (v1.0.21 (56)). We filtered the reads to retain only uniquely mapping reads with MAPQ > 20 and removed reads mapping to empirical blacklist regions identified by the ENCODE consortium (24).
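
The read filtering step above can be sketched on simplified read records. This is an illustration under stated assumptions rather than the pipeline used in the study: the `Read` record, the `filter_reads` function and the example coordinates are hypothetical, reads are assumed to be already parsed from the alignment file, and the uniqueness criterion is represented only by the MAPQ threshold.

```python
from typing import NamedTuple


class Read(NamedTuple):
    chrom: str
    start: int  # 0-based leftmost aligned position
    end: int    # exclusive end of the alignment
    mapq: int   # mapping quality reported by the aligner


def overlaps(read, region):
    """True if the read shares at least one base with a (chrom, start, end) region."""
    chrom, start, end = region
    return read.chrom == chrom and read.start < end and start < read.end


def filter_reads(reads, blacklist, min_mapq=20):
    """Keep reads with MAPQ strictly above min_mapq that avoid all blacklist regions."""
    kept = []
    for read in reads:
        if read.mapq <= min_mapq:
            continue
        if any(overlaps(read, region) for region in blacklist):
            continue
        kept.append(read)
    return kept


# Hypothetical reads and blacklist region, for illustration only.
reads = [
    Read("chr1", 100, 140, 60),  # kept
    Read("chr1", 100, 140, 20),  # dropped: MAPQ is not above 20
    Read("chr1", 500, 540, 60),  # dropped: overlaps the blacklist region
]
kept = filter_reads(reads, blacklist=[("chr1", 520, 600)])
```

For genome-scale data the linear scan over blacklist regions would normally be replaced by an interval index; the scan is kept here for readability.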

For peak calling, we first computed strand cross-correlation profiles (57) of read start densities to obtain consensus estimates of the ChIP-exo digested fragment sizes, which we corrected for the presence of phantom peaks (58) using a mappability-corrected approach to cross-correlation inference (MaSC (59)). We used the cross-correlation based estimates for the digested fragment size as a MACS2 (version 2.0.10 (60)) input parameter to obtain peak calls. Separately, to increase sensitivity, we called peaks using GPS/GEM (v2.4.1 (61)) with recommended ChIP-exo options --smooth 3 --mrc 20. We then pooled, for each sample, the two sets of peak calls (Supplementary Material). Finally, we used DiffBind (62) to manipulate the per-sample merged peaksets and to obtain consensus peaksets (CPo3, CPo10, CPo20) based on an overlap threshold. To do this, we normalised the read numbers using DiffBind's embedded edgeR routines, based on the Trimmed Mean of M-values (TMM) algorithm (63).
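
The idea behind the strand cross-correlation estimate can be shown with a small sketch. It computes an unnormalised cross-correlation, that is, a raw count of forward-strand and reverse-strand 5′ end pairs at each shift rather than a Pearson coefficient, and takes the best-scoring shift as the fragment size estimate. The phantom-peak and mappability corrections (MaSC) described above are not implemented, and the function names and positions are hypothetical.

```python
from collections import Counter


def strand_cross_correlation(forward_starts, reverse_starts, max_shift):
    """Return {shift: score}; score counts forward/reverse 5' end pairs
    separated by exactly `shift` bases."""
    forward_counts = Counter(forward_starts)
    reverse_counts = Counter(reverse_starts)
    scores = {}
    for shift in range(max_shift + 1):
        scores[shift] = sum(
            n * reverse_counts.get(position + shift, 0)
            for position, n in forward_counts.items()
        )
    return scores


def estimate_fragment_size(forward_starts, reverse_starts, max_shift=500):
    """Shift with the highest score; ties are resolved towards the smaller shift."""
    scores = strand_cross_correlation(forward_starts, reverse_starts, max_shift)
    return max(scores, key=lambda shift: (scores[shift], -shift))


# Hypothetical 5' end positions: most reverse-strand ends lie 35 bases downstream.
forward = [100, 200, 300, 300]
reverse = [135, 235, 335, 400]
fragment_size = estimate_fragment_size(forward, reverse, max_shift=300)
```

In practice the profile would be computed per chromosome on binned or smoothed densities, and the read-length peak would need to be distinguished from the true fragment peak.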

Interval overlap enrichment analysis

We used the Genomic Association Tester (64) to perform the randomization-based interval enrichment analyses. All analyses were based on 10000 randomisations over the chosen background, and backgrounds differed depending on the specific analysis performed. All analyses accounted for genome-wide patterns of GC content variability and, where relevant, for uniquely mappable regions (40 bp reads). For the enrichment analysis of VDR-BVs in GWAS intervals and QTLs, we used an in-house bootstrapping pipeline to perform LD-filtering (to retain only LD-independent foreground and background variants) and DAF matching. Further details on annotations, background and analytical design are provided in the Supplementary Material.
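
The logic of a randomisation-based enrichment test can be sketched as follows. This is a deliberately reduced illustration and not the Genomic Association Tester or the in-house pipeline: variants are treated as single positions, each randomisation draws a background set of the same size as the foreground without replacement, and GC content, mappability, LD filtering and DAF matching are all ignored. Function names and data are hypothetical.

```python
import random


def in_any_interval(position, intervals):
    """True if position falls inside any half-open (start, end) interval."""
    return any(start <= position < end for start, end in intervals)


def enrichment_by_randomisation(foreground, background, annotation,
                                n_randomisations=10000, seed=0):
    """Compare the observed annotation overlap with overlaps of random background sets."""
    rng = random.Random(seed)
    observed = sum(in_any_interval(p, annotation) for p in foreground)
    background_hits = [in_any_interval(p, annotation) for p in background]
    simulated = []
    for _ in range(n_randomisations):
        drawn = rng.sample(background_hits, len(foreground))
        simulated.append(sum(drawn))
    expected = sum(simulated) / n_randomisations
    n_as_extreme = sum(count >= observed for count in simulated)
    # Add-one correction so the empirical P-value is never exactly zero.
    p_value = (n_as_extreme + 1) / (n_randomisations + 1)
    fold = observed / expected if expected > 0 else float("inf")
    return {"observed": observed, "expected": expected, "fold": fold, "p_value": p_value}


# Hypothetical example: 20 foreground variants, all inside the annotated interval.
result = enrichment_by_randomisation(
    foreground=list(range(20)),
    background=list(range(1000)),
    annotation=[(0, 100)],
    n_randomisations=1000,
    seed=1,
)
```

With 10000 randomisations, as used in the study, the smallest attainable empirical P-value under this add-one convention is 1/10001.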

Motif discovery

Full motif analysis steps are documented in the Supplementary Material. Briefly, we carried out de novo motif analysis using multiple parallel approaches: MEME-ChIP (65) from the MEME suite (66), XXmotif (67) and PScanChIP (20) from the WeederMoDtools suite (68). For the MEME-ChIP and XXmotif analyses, we used the peak intervals in CPo10 and extracted the sequence underlying each interval ± 150 bp from a repeat-masked version of the hg19 reference (54). For the PScanChIP analysis, we used as the input the full set of CPo3 binding regions with Jaspar PWM descriptors, to which we added the best heterodimeric RXR:VDR PWM found by XXmotif and the best monomeric VDR PWM found by DREME (MEME-ChIP package). The global background chosen for the analysis consisted of built-in PScanChIP DNase I digital genomic footprinting data for the LCL CEPH individual NA12865. For the motif analysis at the 43332 VDR-BVs, we used PScanChIP (20) with Jaspar PWM descriptors and a mixed background of the promoters and LCL DNase cut sites.

Differential binding analyses

We performed both allele-specific (VDR-ASB) and regression on genotype (VDR-QTL) differential binding analyses. Both methods start from analyses of read counts. For the VDR-QTL analysis, reads were normalised and covariate-corrected as detailed in the Supplementary Material.

For the VDR-ASB tests, we utilised the sequence composition of ChIP-exo sequence reads overlapping heterozygous SNPs to determine the sequences originating from each allele separately (23,69,70) and to identify allele-specific binding events showing a significant difference in the number of mapped reads between parental alleles. We employed a modified version of the AlleleSeq pipeline (23) to carry out the VDR-ASB analysis. Briefly, we tested for significant allelic imbalance among all read pile-ups intersecting a variant showing heterozygous genotype, given a null hypothesis of 50% paternal versus 50% maternal reads. We followed (23) in controlling for two major sources of bias typically encountered when running an allele-specific analysis of short read data: a bias due to mapping to the reference hg19 genome (71) and a bias due to unannotated copy number variants skewing the read counts for some of the SNPs being tested (72). In addition, we corrected for a third source of potential erroneous ASB SNP calls by removing any significant allelic imbalance hypothesis falling under repeat regions contained in the ENCODE blacklist data (24).
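
One natural way to test the 50%/50% null hypothesis stated above is an exact two-sided binomial test on the allele-specific read counts at a heterozygous site. The sketch below shows that test in isolation. Whether it matches the modified AlleleSeq pipeline in detail is an assumption, the function name and counts are hypothetical, and the bias corrections and FDR control described in the text are not included.

```python
from math import comb


def binomial_two_sided_p(paternal_reads, maternal_reads, p_null=0.5):
    """Exact two-sided binomial P-value for allelic imbalance.

    Sums the probabilities of all outcomes that are no more likely
    than the observed split under the null proportion p_null.
    """
    n = paternal_reads + maternal_reads
    probabilities = [
        comb(n, k) * p_null ** k * (1 - p_null) ** (n - k) for k in range(n + 1)
    ]
    observed = probabilities[paternal_reads]
    # Small relative tolerance so equally likely outcomes are not lost to rounding.
    total = sum(p for p in probabilities if p <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical read counts at two heterozygous sites.
balanced = binomial_two_sided_p(10, 10)   # no evidence of imbalance
imbalanced = binomial_two_sided_p(18, 2)  # strong evidence of imbalance
```

Because many sites are tested genome-wide, P-values of this kind would still need a multiple-testing procedure such as the FDR threshold mentioned in the Figure 5 legend.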

For the VDR-QTL tests, we analysed all 27 LCL samples simultaneously. We grouped the samples based on the genotypes of the underlying variant and carried out QTL association testing between the variant's genotype (under the assumption of an additive genetic model (73)) and the phenotype at that location (the VDR binding affinity, based on the quantile-normalised ChIP-exo peak size at that location). We used an in-house pipeline based on SNPs and indels imputed with IMPUTE2 (74) and Bayesian regression modelling based on SNPTEST (21) and BIMBAM (22). Bayesian methods for analysing SNP associations are now an established tool in GWAS analysis (75–77) and show advantages over the use of p-values in power and interpretation (73), though they require tighter initial modelling assumptions when compared to frequentist methods. For details on the underlying model and priors, we refer the reader to the Supplementary document.
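
Under an additive genetic model, the phenotype is regressed on the allele dosage (0, 1 or 2 copies of one allele). The sketch below fits that relationship by ordinary least squares, which is a swapped-in frequentist simplification: the study used Bayesian regression (SNPTEST and BIMBAM), and no priors or Bayes factors are computed here. The function name and data values are hypothetical.

```python
def additive_effect(dosages, binding):
    """Least-squares slope and intercept of binding signal on allele dosage.

    dosages: allele counts (0, 1 or 2) per sample.
    binding: normalised binding signal per sample, in the same order.
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(binding) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        # All samples share one genotype, so no effect can be estimated.
        raise ValueError("no genotype variation among samples")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, binding))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical dosages and normalised peak sizes for six samples.
slope, intercept = additive_effect(
    [0, 0, 1, 1, 2, 2],
    [1.0, 1.2, 2.0, 2.2, 3.0, 3.2],
)
```

The `ValueError` branch mirrors a limitation noted in the Discussion: a variant cannot be tested by regression on genotype when the samples do not carry enough distinct genotypes.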

Variant annotation

We annotated all variants associated with differences in VDR binding, obtained by pooling the VDR-QTL and the VDR-ASB variants (VDR-Binding Variants, VDR-BVs), according to the guidelines in (25) and using a customised version of the Funseq2 suite of algorithms (26). We added additional VDR-centric information to Funseq2: VDR CPo3 binding regions, motif intervals from the PScanChIP analysis for the VDR:RXR Jaspar DR3 motif and for the DREME VDR monomer, and VDR dimer/monomer PWM information. Gene annotation used Funseq2's


default (Gencode v16), and the HOT annotation in Funseq was filtered to only retain LCL HOT information. We ran Funseq using the -m 2 -nc options so that all annotation referred to the ancestral allele of the variant.

Availability of Data and Material

Sequencing data from this study have been submitted to the Gene Expression Omnibus archive (GEO; http://www.ncbi.nlm.nih.gov/geo, accession number GSE73254).

Supplementary Material

Supplementary Material is available at HMG online.

Acknowledgements

We thank Mark Gerstein, Jieming Chen, Joel Rozowsky, Alexej Abyzov, Ekta Khurana and Yao Fu for precious help and technical insight on the AlleleSeq, CNVnator and Funseq2 platforms. We also thank Matthew Stephens and Yongtao Guan for help and insight on the PIMASS and BIMBAM platforms. We thank Gosia Trynka and Kamil Slowikowksi for providing r2-based LD block data; Yuchun Guo and David Gifford for precious help and technical insight on the GEM peak caller; Gerton Lunter for help on the STAMPY platform; Andreas Heger for help and technical insight on the GAT software and genomic randomization testing; Rory Stark and Simon Anders for insight on differential analysis of ChIP-seq data; Laura Clarke for help with the 1000 Genomes data; and Daniel Gaffney for providing helpful pointers regarding the GEUVADIS QTLs calls. We thank Li Shen and Ning-Yi Shao (Icahn School of Medicine at Mount Sinai) for insight into NGSplot utilisation, and Oscar Bedoya Reina and Christoffer Nellaker for helpful discussions.

Conflict of Interest statement. None declared.

Funding

Medical Research Council [to GG, AJBT, WH, CPP; MC_UU_120081, MC_UU_120211, MC_EX_G1000902]; Wellcome Trust and Genzyme [to SVR]; the Multiple Sclerosis Society UK [grant number 91509]; and the Council for Science and Technology (CONACyT), Mexico [grant number 211990 to AJBT]. Funding to pay the Open Access publication charges for this article was provided by the Research Councils UK (RCUK) open access fund.

References

1. Kasowski, M., Kyriazopoulou-Panagiotopoulou, S., Grubert, F., Zaugg, J.B., Kundaje, A., Liu, Y., Boyle, A.P., Zhang, Q.C., Zakharia, F., Spacek, D.V. et al. (2013) Extensive variation in chromatin states across humans. Science, 342, 750–752.

2. McVicker, G., van de Geijn, B., Degner, J.F., Cain, C.E., Banovich, N.E., Raj, A., Lewellen, N., Myrthil, M., Gilad, Y. and Pritchard, J.K. (2013) Identification of Genetic Variants That Affect Histone Modifications in Human Cells. Science, 342, 747–749.

3. Kilpinen, H., Waszak, S.M., Gschwind, A.R., Raghav, S.K., Witwicki, R.M., Orioli, A., Migliavacca, E., Wiederkehr, M., Gutierrez-Arcelus, M., Panousis, N.I. et al. (2013) Coordinated effects of sequence variation on DNA binding, chromatin structure and transcription. Science, 342, 744–747.

4. Maurano, M.T., Humbert, R., Rynes, E., Thurman, R.E., Haugen, E., Wang, H., Reynolds, A.P., Sandstrom, R., Qu, H., Brody, J. et al. (2012) Systematic localization of common disease-associated variation in regulatory DNA. Science, 337, 1190–1195.

5. Gusev, A., Lee, S.H., Trynka, G., Finucane, H., Vilhjalmsson, B.J., Xu, H., Zang, C., Ripke, S., Bulik-Sullivan, B., Stahl, E. et al. (2014) Partitioning heritability of regulatory and cell-type-specific variants across 11 common diseases. Am. J. Hum. Genet., 95, 535–552.

6. Farh, K.K., Marson, A., Zhu, J., Kleinewietfeld, M., Housley, W.J., Beik, S., Shoresh, N., Whitton, H., Ryan, R.J., Shishkin, A.A. et al. (2015) Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature, 518, 337–343.

7. Heinz, S., Romanoski, C.E., Benner, C. and Glass, C.K. (2015) The selection and function of cell type-specific enhancers. Nat. Rev. Mol. Cell Biol., 16, 144–154.

8. Arner, E., Daub, C.O., Vitting-Seerup, K., Andersson, R. et al. (2015) Transcribed enhancers lead waves of coordinated transcription in transitioning mammalian cells. Science, 347, 1010–1014.

9. Holick, M.F. (2007) Vitamin D Deficiency. N. Eng. J. Med., 357, 266–281. PMID 17634462.

10. Martins, D., Wolf, M., Pan, D. et al. (2007) Prevalence of cardiovascular risk factors and the serum levels of 25-hydroxyvitamin D in the United States: data from the Third National Health and Nutrition Examination Survey. Arch. Intern. Med., 167, 1159–1165.

11. Mokry, L.E., Ross, S., Ahmad, O.S., Forgetta, V., Smith, G.D., Leong, A., Greenwood, C.M.T., Thanassoulis, G. and Richards, J.B. (2015) Vitamin D and Risk of Multiple Sclerosis: A Mendelian Randomization Study. PLoS Med., 12, 1–20.

12. Afzal, S., Brondum-Jacobsen, P., Bojesen, S.E. and Nordestgaard, B.G. (2014) Genetically low vitamin D concentrations and increased mortality: mendelian randomisation analysis in three large cohorts. BMJ, 349.

13. Kliewer, S.A., Umesono, K., Mangelsdorf, D.J. and Evans, R.M. (1992) Retinoid X receptor interacts with nuclear receptors in retinoic acid, thyroid hormone and vitamin D3 signalling. Nature, 355, 446–449.

14. Ramagopalan, S.V., Heger, A., Berlanga, A.J., Maugeri, N.J., Lincoln, M.R., Burrell, A., Handunnetthi, L., Handel, A.E., Disanto, G., Orton, S.M. et al. (2010) A ChIP-seq defined genome-wide map of vitamin D receptor binding: Associations with disease and evolution. Genome Res., 20, 1352–1360.

15. Rhee, H.S. and Pugh, B.F. (2011) Comprehensive Genome-wide Protein-DNA Interactions Detected at Single-Nucleotide Resolution. Cell, 147, 1408–1419.

16. Rhee, H.S. and Pugh, B.F. (2012) ChIP-exo method for identifying genomic location of DNA-binding proteins with near-single-nucleotide accuracy. Curr. Protoc. Mol. Biol., Chapter 21, Unit 21.24.

17. Yip, K., Cheng, C., Bhardwaj, N., Brown, J., Leng, J., Kundaje, A., Rozowsky, J., Birney, E., Bickel, P., Snyder, M. and Gerstein, M. (2012) Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors. Genome Biol., 13, R48.

18. Carlberg, C., Bendik, I., Wyss, A., Meier, E., Sturzenbecker, L.J., Grippo, J.F. and Hunziker, W. (1993) Two nuclear signalling pathways for vitamin D. Nature, 361, 657–660.

19. Mathelier, A., Zhao, X., Zhang, A.W., Parcy, F., Worsley-Hunt, R., Arenillas, D.J., Buchman, S., Chen, C.Y., Chou, A., Ienasescu, H. et al. (2014) JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res., 42, D142–D147.

20. Zambelli, F., Pesole, G. and Pavesi, G. (2013) PscanChIP: Finding over-represented transcription factor-binding site motifs and their correlations in sequences from ChIP-Seq experiments. Nucleic Acids Res., 41, W535–W543.

21. Marchini, J., Howie, B., Myers, S., McVean, G. and Donnelly, P. (2007) A new multipoint method for genome-wide association studies by imputation of genotypes. Nat. Genet., 39, 906–913.

22. Servin, B. and Stephens, M. (2007) Imputation-based analysis of association studies: candidate regions and quantitative traits. PLoS Genet., 3, e114.

23. Rozowsky, J., Abyzov, A., Wang, J., Alves, P., Raha, D., Harmanci, A., Leng, J., Bjornson, R., Kong, Y., Kitabayashi, N. et al. (2011) AlleleSeq: analysis of allele-specific expression and binding in a network framework. Mol. Sys. Biol., 7, 522.

24. Dunham, I., Kundaje, A., Aldred, S.F., Collins, P.J., Davis, C.A., Doyle, F., Epstein, C.B., Frietze, S., Harrow, J., Kaul, R. et al. (2012) An integrated encyclopedia of DNA elements in the human genome. Nature, 489, 57–74.

25. Khurana, E., Fu, Y., Colonna, V., Mu, X.J., Kang, H.M., Lappalainen, T., Sboner, A., Lochovsky, L., Chen, J., Harmanci, A. et al. (2013) Integrative annotation of variants from 1092 humans: application to cancer genomics. Science, 342, 1235587.

26. Fu, Y., Liu, Z., Lou, S., Bedford, J., Mu, X., Yip, K.Y., Khurana, E. and Gerstein, M. (2014) FunSeq2: A framework for prioritizing noncoding regulatory variants in cancer. Genome Biol., 15, 480.

27. Heinz, S., Romanoski, C.E., Benner, C., Allison, K.A., Kaikkonen, M.U., Orozco, L.D. and Glass, C.K. (2013) Effect of natural genetic variation on enhancer selection and function. Nature, 503, 487–492.

28. Lappalainen, T., Sammeth, M., Friedlander, M.R., 't Hoen, P.A., Monlong, J., Rivas, M.A., Gonzalez-Porta, M., Kurbatova, N., Griebel, T., Ferreira, P.G. et al. (2013) Transcriptome and genome sequencing uncovers functional variation in humans. Nature, 501, 506–511.

29. Eicher, J.D., Landowski, C., Stackhouse, B., Sloan, A., Chen, W., Jensen, N., Lien, J.P., Leslie, R. and Johnson, A.D. (2015) GRASP v2.0: an update on the genome-wide repository of associations between SNPs and phenotypes. Nucleic Acids Res., 43, 799–804.

30 Mokry LE Ross S Morris JA Manousaki D Forgetta Vand Richards JB (2016) Genetically decreased vitamin Dand risk of Alzheimer disease Neurology 87 2567ndash2574

31 Cusanovich DA Pavlovic B Pritchard JK and Gilad Y(2014) The functional consequences of variation in tran-scription factor binding PLoS Genet 10 e1004226

32 Spivakov M (2014) Spurious transcription factor bindingnon-functional or genetically redundant Bioessays 36798ndash806

33 Villar D Berthelot C Aldridge S Rayner TF Lukk MPignatelli M Park TJ Deaville R Erichsen JT JasinskaAJ et al (2015) Enhancer evolution across 20 mammalianspecies Cell 160 554ndash566

34 Siepel A Bejerano G Pedersen JS Hinrichs AS Hou MRosenbloom K Clawson H Spieth J Hillier LWRichards S et al (2005) Evolutionarily conserved elementsin vertebrate insect worm and yeast genomes Genome Res15 1034ndash1050

35. Jimenez-Lara AM and Aranda A (1999) The vitamin D receptor binds in a transcriptionally inactive form and without a defined polarity on a retinoic acid response element. FASEB J, 13, 1073–1081.

36. Toell A, Polly P and Carlberg C (2000) All natural DR3-type vitamin D response elements show a similar functionality in vitro. Biochem J, 352 Pt 2, 301–309.

37. Zhang J, Chalmers MJ, Stayrook KR, Burris LL, Wang Y, Busby SA, Pascal BD, Garcia-Ordonez RD, Bruning JB, Istrate MA, et al. (2011) DNA binding alters coactivator interaction surfaces of the intact VDR-RXR complex. Nat Struct Mol Biol, 18, 556–563.

38. Drake JA, Bird C, Nemesh J, Thomas DJ, Newton-Cheh C, Reymond A, Excoffier L, Attar H, Antonarakis SE, Dermitzakis ET, et al. (2006) Conserved noncoding sequences are selectively constrained and not mutation cold spots. Nat Genet, 38, 223–227.

39. Fay JC, Wyckoff GJ and Wu CI (2001) Positive and negative selection on the human genome. Genetics, 158, 1227–1234.

40. Orlov I, Rochel N, Moras D and Klaholz BP (2012) Structure of the full human RXR/VDR nuclear receptor heterodimer complex with its DR3 target DNA. Embo J, 31, 291–300.

41. Deplancke B, Alpern D and Gardeux V (2016) The Genetics of Transcription Factor DNA Binding Variation. Cell, 166, 538–554.

42. Reddy TE, Gertz J, Pauli F, Kucera KS, Varley KE, Newberry KM, Marinov GK, Mortazavi A, Williams BA, Song L, et al. (2012) Effects of sequence variation on differential allelic transcription factor occupancy and gene expression. Genome Res, 22, 860–869.

43. Levo M, Zalckvar E, Sharon E, Dantas Machado AC, Kalma Y, Lotam-Pompan M, Weinberger A, Yakhini Z, Rohs R and Segal E (2015) Unraveling determinants of transcription factor binding outside the core binding site. Genome Res, 25, 1018–1029.

44. Booth DR, Ding N, Parnell GP, Shahijanian F, Coulter S, Schibeci SD, Atkins AR, Stewart GJ, Evans RM, Downes M, et al. (2016) Cistromic and genetic evidence that the vitamin D receptor mediates susceptibility to latitude-dependent autoimmune diseases. Genes Immun, 17, 213–219.

45. Nakane M, Ma J, Ruan X, Kroeger PE and Wu-Wong R (2007) Mechanistic analysis of VDR-mediated renin suppression. Nephron Physiol, 107, 35–44.

46. Salehi-Tabar R, Nguyen-Yamamoto L, Tavera-Mendoza LE, Quail T, Dimitrov V, An BS, Glass L, Goltzman D and White JH (2012) Vitamin D receptor as a master regulator of the c-MYC/MXD1 network. Proc Natl Acad Sci USA, 109, 18827–18832.

47. Capellino S, Straub RH and Cutolo M (2014) Aromatase and regulation of the estrogen-to-androgen ratio in synovial tissue inflammation: common pathway in both sexes. Ann N Y Acad Sci, 1317, 24–31.

48. Tuoresmaki P, Vaisanen S, Neme A, Heikkinen S and Carlberg C (2014) Patterns of genome-wide VDR locations. PLoS One, 9, e96105.

49. Huang S, Laoukili J, Epping MT, Koster J, Holzel M, Westerman BA, Nijkamp W, Hata A, Asgharzadeh S, Seeger RC, et al. (2009) ZNF423 is critically required for retinoic acid-induced differentiation and is a marker of neuroblastoma outcome. Cancer Cell, 15, 328–340.

50. Matarese G, De Placido G, Nikas Y and Alviggi C (2003) Pathogenesis of endometriosis: natural immunity dysfunction or autoimmune disease? Trends Mol Med, 9, 223–228.


51. Kunadian V, Ford GA, Bawamia B, Qiu W and Manson JE (2014) Vitamin D deficiency and coronary artery disease: A review of the evidence. Am Heart J, 167, 283–291.

52. Ramagopalan SV, Maugeri NJ, Handunnetthi L, Lincoln MR, Orton SM, Dyment DA, Deluca GC, Herrera BM, Chao MJ, Sadovnick AD, et al. (2009) Expression of the multiple sclerosis-associated MHC class II Allele HLA-DRB1*1501 is regulated by vitamin D. PLoS Genet, 5, e1000369.

53. Caliskan M, Cusanovich DA, Ober C and Gilad Y (2011) The effects of EBV transformation on gene expression levels and methylation profiles. Hum Mol Genet, 20, 1643–1652.

54. Kuhn RM, Haussler D and Kent WJ (2013) The UCSC genome browser and associated tools. Brief Bioinformatics, 14, 144–161.

55. Li H and Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25, 1754–1760.

56. Lunter G and Goodson M (2011) Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res, 21, 936–939.

57. Kharchenko PV, Tolstorukov MY and Park PJ (2008) Design and analysis of ChIP-seq experiments for DNA-binding proteins. Nat Biotechnol, 26, 1351–1359.

58. Kundaje A, Jung LY, Kharchenko P, Wold B, Sidow A, Batzoglou S and Park P (2013) Assessment of ChIP-seq data quality using cross-correlation analysis.

59. Ramachandran P, Palidwor GA, Porter CJ and Perkins TJ (2013) MaSC: mappability-sensitive cross-correlation for estimating mean fragment length of single-end short-read sequencing data. Bioinformatics, 29, 444–450.

60. Feng J, Liu T, Qin B, Zhang Y and Liu XS (2012) Identifying ChIP-seq enrichment using MACS. Nat Protoc, 7, 1728–1740.

61. Guo Y, Mahony S and Gifford DK (2012) High Resolution Genome Wide Binding Event Finding and Motif Discovery Reveals Transcription Factor Spatial Binding Constraints. PLoS Comput Biol, 8, e1002638.

62. Stark R and Brown G (2011) DiffBind: differential binding analysis of ChIP-Seq peak data. R package.

63. Robinson M and Oshlack A (2010) A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol, 11, R25.

64. Heger A, Webber C, Goodson M, Ponting CP and Lunter G (2013) GAT: a simulation framework for testing the association of genomic intervals. Bioinformatics, 29, 2046–2048.

65. Ma W, Noble WS and Bailey TL (2014) Motif-based analysis of large nucleotide data sets using MEME-ChIP. Nat Protoc, 9, 1428–1450.

66. Bailey TL, Boden M, Buske FA, Frith M, Grant CE, Clementi L, Ren J, Li WW and Noble WS (2009) MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res, 37, W202–W208.

67. Hartmann H, Guthohrlein EW, Siebert M, Luehr S and Soding J (2013) P-value-based regulatory motif discovery using positional weight matrices. Genome Res, 23, 181–194.

68. Pavesi G, Mereghetti P, Zambelli F, Stefani M, Mauri G and Pesole G (2006) MoD Tools: regulatory motif discovery in nucleotide sequences from co-regulated or homologous genes. Nucleic Acids Res, 34, W566–W570.

69. McDaniell R, Lee BK, Song L, Liu Z, Boyle AP, Erdos MR, Scott LJ, Morken MA, Kucera KS, Battenhouse A, et al. (2010) Heritable Individual-Specific and Allele-Specific Chromatin Signatures in Humans. Science, 328, 235–239.

70. Pickrell JK, Marioni JC, Pai AA, Degner JF, et al. (2010) Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature, 464, 768–772.

71. Degner JF, Marioni JC, Pai AA, Pickrell JK, Nkadori E, Gilad Y and Pritchard JK (2009) Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data. Bioinformatics, 25, 3207–3212.

72. Pickrell JK, Gaffney DJ, Gilad Y and Pritchard JK (2011) False positive peaks in ChIP-seq and other sequencing-based functional assays caused by unannotated high copy number regions. Bioinformatics, 27, 2144–2146.

73. Stephens M and Balding DJ (2009) Bayesian statistical methods for genetic association studies. Nat Rev Genet, 10, 681–690.

74. Howie BN, Donnelly P and Marchini J (2009) A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet, 5, e1000529.

75. Scott LJ, Mohlke KL, Bonnycastle LL, Willer CJ, et al. (2007) A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science, 316, 1341–1345.

76. Burton PR, Clayton DG, Cardon LR, Craddock N, et al. (2007) Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature, 447, 661–678.

77. Wakefield J (2009) Bayes factors for genome-wide association studies: comparison with P-values. Genet Epidemiol, 33, 79–86.

78. Anders S, McCarthy DJ, Chen Y, Okoniewski M, Smyth GK, Huber W and Robinson MD (2013) Count-based differential expression analysis of RNA sequencing data using R and Bioconductor. Nat Protoc, 8, 1765–1786.

79. Ernst J, Kheradpour P, Mikkelsen TS, Shoresh N, Ward LD, Epstein CB, Zhang X, Wang L, Issner R, Coyne M, et al. (2011) Mapping and analysis of chromatin state dynamics in nine human cell types. Nature, 473, 43–49.

80. Bailey TL and Machanick P (2012) Inferring direct DNA binding from ChIP-seq. Nucleic Acids Res, 40, e128.

81. McLean CY, Bristor D, Hiller M, Clarke SL, Schaar BT, Lowe CB, Wenger AM and Bejerano G (2010) GREAT improves functional interpretation of cis-regulatory regions. Nat Biotechnol, 28, 495–501.

82. Touzet H and Varre JS (2007) Efficient and accurate P-value computation for position weight matrices. Algorithms Mol Biol, 2, 15.


binding (HOT regions (17) and clustered TF binding sites from ENCODE), active promoters, enhancers and insulators (Fig. 1E; Supplementary Material, Fig. S7), and previously reported GWAS disease loci, principally for autoimmune disorders (Supplementary Material, Fig. S8). Notably, this latter observation could reflect a causal effect, in which VDR binding influences disease susceptibility, and/or gene loci that are correlated non-causally with both disease susceptibility intervals and VDR binding sites.

VDR binding motifs occur preferentially in enhancers

De novo motif analyses of peaks within ChIP-exo VDR binding intervals revealed motifs strongly resembling those for the RXR::VDR DR3 heterodimer (18,19) (Fig. 2A; Supplementary Material, Fig. S9A and Table S12). For example, 9,899 (63.8% of the CPo3 set) instances of this DR3 heterodimer motif occurred in binding sites (PscanChIP (20) score > 0.7). In addition, 874 instances of the monomeric VDR motif were identified in these peak intervals (Supplementary Material, Fig. S9B; PscanChIP score = 1), which are known to have limited affinity for functional dimeric nuclear hormone receptor complexes (18). The heterodimeric DR3 motif was strongly over-represented (Supplementary Material, Table S12) both locally (in the peak sequences compared to neighbouring regions) and globally (in the peak regions compared to LCL DHS sites (20)), and was significantly centrally enriched in the binding regions (Fig. 2A).

VDR binding sites could be separated into functionally distinct classes based on the presence of RXR::VDR DR3-like motifs. Class I sites contain strong DR3-like motifs (top 20% of the PscanChIP score distribution; 3,094 regions; Supplementary Material, Fig. S10A) and tend to be located away from genes' transcriptional start sites (TSSs) and yet close to genes involved in immune response processes (Fig. 2B and C). Class I VDR binding sites show the greatest significance of proximity to genomic regions associated with cancer and autoimmune diseases (Fig. 2C). We compare these to Class II sites, which contain only weak or no DR3-like motifs (bottom 20% of PscanChIP scores; 3,102 regions; Supplementary Material, Fig. S10A) (Fig. 2B and D). Class I binding events occur preferentially in strong or weak enhancers, as defined by ENCODE, rather than in promoters (Supplementary Material, Fig. S10B and C).
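The class I and class II definitions above amount to a quantile cut on per-peak motif scores. The following is a minimal Python sketch of such a cut, not the authors' pipeline: the function name, the toy peak identifiers and the toy scores are hypothetical, and only the 20% fraction is taken from the text.

```python
def split_by_motif_score(scores, fraction=0.2):
    """Return (class_i, class_ii): peaks in the top and bottom `fraction` of scores.

    `scores` maps a peak identifier to its best motif score (hypothetical input).
    """
    ranked = sorted(scores, key=scores.get, reverse=True)  # strongest motif first
    k = int(len(ranked) * fraction)                        # peaks per class
    class_i = ranked[:k]                                   # strong DR3-like motifs
    class_ii = ranked[len(ranked) - k:]                    # weak or absent motifs
    return class_i, class_ii


# Toy example with ten hypothetical peaks and made-up scores
toy_scores = {f"peak{i}": s for i, s in enumerate(
    [0.95, 0.10, 0.80, 0.55, 0.30, 0.99, 0.45, 0.70, 0.20, 0.60])}
class_i, class_ii = split_by_motif_score(toy_scores)
```

With real data the cut would be applied to the PscanChIP scores of the CPo3 peaks; how ties at the cut-off are resolved is left to the sort order here, which is an arbitrary choice of this sketch.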

Thousands of variants explain differential VDR binding

[Figure 1 graphic not reproduced. Panels A–E. Recoverable labels: peak sets CPo3, CPo10 and CPo20; 'VDR ChIP-exo peaks - overlap rate'; overlap of ChIP-exo (CPo3) and ChIP-seq peaks, '50x Fold Enrichment'; 'Meta-gene Profile - gene body protein coding', with x-axis 'Genomic Region (5′ -> 3′)' from -2kb through TSS and TES to 2kb and y-axis 'Read count Per Million mapped reads'; 'Annotation - Encode and HOT regions', with axis log2(fc).]

Figure 1. ChIP-exo peaks are reproducible among samples and with previous ChIP-seq results, and are enriched in promoters and enhancers. (A) Numbers of overlapping ChIP-exo peaks across 27 LCL samples (log scale). Highlighted rows indicate numbers of peaks called in at least three (CPo3), ten (CPo10) or twenty (CPo20) samples. (B) Normalised peak read depths (62,63,78) (Supplementary Material) are concordant between samples. VDR ChIP-exo peak binding affinity heatmap showing pairwise binding affinity correlation among all 27 sample pairs. (Inset) Histogram of correlation counts. (C) 50-fold enriched overlap between CPo3 peaks and the consensus ChIP-seq peakset from (14) (enrichment test: GAT (64); Supplementary Material). (D) Meta-gene profile of VDR binding reads. The panel shows a meta-profile of read pile-up for the pooled 27 LCL samples averaged over gene bodies (Ensembl v. 75 protein coding genes). (E) Enrichment of VDR binding sites in the CPo3 consensus with ENCODE chromatin states (chromHMM (79)) from LCL NA12878, TF-dense regions (HOT (17) and ENCODE clustered TF binding sites) and DNase I hypersensitive areas (ENCODE). Asterisk denotes significance at 1% False Discovery Rate (FDR) threshold (enrichment test: GAT (64); Supplementary Material).

We next used the QTL and ASB methods to explain differences in VDR binding affinity (Supplementary Material). The QTL analysis used Bayesian regression modelling of variants underlying binding peaks (21,22) (Supplementary Material, Tables S13 and S14; Supplementary Material, Figs S12–S15). The ASB analysis of allelic imbalance is based on read depth at heterozygous single-nucleotide variants (23) and uses sample-specific maternal and paternal reference maps, which we corrected for unannotated copy number variation and from which we excluded variants in ENCODE blacklisted repeat regions (24). Variants that either method associated with differences in VDR binding (VDR-Binding Variants, VDR-BVs) were annotated and prioritised based on the guidelines proposed by (25) and related algorithms (FunSeq2 (26)).
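The read-depth test for allelic imbalance at a heterozygous site can be illustrated with an exact binomial test against a balanced (50:50) expectation. The Python sketch below is a simplified stand-in and not the AlleleSeq implementation: it omits the personalised maternal and paternal reference maps, the copy-number correction and the FDR estimation described in the text, and the function name and read counts are hypothetical.

```python
from math import comb


def binomial_two_sided_p(ref_reads, alt_reads, p=0.5):
    """Exact two-sided binomial P-value for allelic imbalance at one heterozygous site."""
    n = ref_reads + alt_reads
    # Probability of every possible reference-read count under the null
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Sum the probabilities of all outcomes that are no more likely than the observed one
    return min(1.0, sum(x for x in pmf if x <= observed * (1 + 1e-9)))


# Hypothetical site: 9 reads carry the reference allele, 1 read the alternative allele
p_value = binomial_two_sided_p(9, 1)
```

In a genome-wide setting the per-site P-values would still need multiple-testing control, which this sketch does not attempt.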

We report a total of 43,332 VDR-BVs. Of these, only 6.6–10.5% (2,867 or 4,447) lie within (or 'hit') RXR::VDR consensus motifs (PscanChIP score > 0.7 or > 0.5, respectively). This could be explained in part by variants that alter DNA-binding affinity either for other subunits of a larger multi-molecular complex ('collaborative binding' (27)) or for factors that inhibit RXR::VDR binding.

Consequently, we next considered whether sequence variation in motifs for potential binding cofactors of RXR::VDR could explain the remaining > 90% of binding variation. We found that position weight matrices for 26 additional TFs (Supplementary Material, Tables S15 and S16) were significantly enriched (P < 0.01) (i) relative to both open chromatin genome-wide and chromosomally adjacent regions (< 150 bp) and/or (ii) at the peak summits within VDR-BV intervals (PscanChIP score > 0.5 or 0.7; Supplementary Methods). Most (84.1–88.3%) of the binding variation could be explained by VDR-BVs lying within a RXR::VDR consensus motif or within one or more of these 26 additional motifs (PscanChIP score > 0.7 or > 0.5). These are upper-bound values because of the inaccuracy of motif prediction and because not all variants lying in motifs will alter binding affinity.

[Figure 2 graphic not reproduced. Panels A–D. Recoverable labels: (A) x-axis 'Position of best site in sequence', y-axis 'Probability'; (B) x-axis 'Distance to nearest TSS (kb)', y-axis 'Region-gene associations (%)', series 'CPo3 - ALL', 'CPo3 - RXR::VDR class I' and 'CPo3 - RXR::VDR class II'; (C) 'Class I', with 'Biological Process' and 'Disease Ontology' terms; (D) 'Class II', with 'Biological Process' terms; axes 'fold change' and '-log10(p)'; term legend for abbreviated ontology terms.]

Figure 2. VDR binding intervals are immune-associated or housekeeping-associated, tending to be present either far from or adjacent to transcriptional start sites, respectively. (A) The most highly ranking motif cluster ensuing from a de novo motif analysis of the VDR CPo3 consensus intervals using MEME-ChIP (65), including a CentriMO (80) analysis of enrichment with respect to VDR peak centres. (B–D) GREAT analyses (81) for subsets of VDR peaks from the CPo3 consensus set containing either a strong-to-perfect (class I) or weak-to-non-existent (class II) instance of the RXR::VDR heterodimer motif. (B) Distributions of distances between known TSS sites (UCSC hg19, Feb 2009) and VDR binding locations for classes I or II, or all VDR peaks. (C,D) Enrichment for Gene Ontology Biological Process and Disease Ontology terms for class I (C) or class II (D) RXR::VDR containing sites, respectively. For (C) and (D), bars map to fold change whereas Hinton plots map to -log10 of the enrichment significance P-value.

Functional annotation of VDR-BVs

To begin to understand whether VDR-BVs could contribute to particular human diseases, we first considered their enrichment in relevant genomic annotations. We were interested in whether these variably-bound sites are enriched relative to a background of all VDR binding regions genome-wide, so that the non-uniform chromosomal distribution of binding sites is accounted for. We chose for our null expectation all variants that were tested for differential binding within replicated (CPo3) VDR binding peaks (Supplementary Material). Relative to this stringent background, VDR-BVs are 20–40% enriched within enhancers, but not within promoters (Fig. 3A). These patterns of enrichment are reinforced when we stringently select only the most significant VDR-BVs (VDR-sBVs; Supplementary Material, Fig. S16A; AlleleSeq FDR 0.01; 6,715 VDR-BVs) or the 357 recurrent VDR-BVs that are called at the same position in multiple samples (VDR-rBVs; Supplementary Material, Fig. S16D). We conclude that VDR-BVs are not uniformly distributed among all VDR binding genomic intervals and are especially frequent in enhancer regions manifested in LCLs.

VDR-BVs occurred 60% more frequently in RXR::VDR motifs within replicated (CPo3) VDR binding peaks, and 120% more frequently in strong (class I) RXR::VDR motifs in the same regions (Fig. 3B). Again, these enrichments strengthened substantially for VDR-sBVs or VDR-rBVs (Supplementary Material, Fig. S16B and E, respectively).

We then considered whether VDR-BVs are significantly enriched within LCL expression QTLs (eQTLs) relative to 1000 Genomes variants under VDR ChIP-exo pileups tested for VDR binding variation potential (Supplementary Material). In addition, these background variants were matched by derived allele frequency and LD-dependence confounders (Methods). We also took care to consider only variants that do not share strong linkage disequilibrium (32,183 VDR-BVs, 5,808 VDR-sBVs and 341 VDR-rBVs).

VDR-BVs were enriched in dsQTLs (1.4-fold) and eQTLs (1.1–1.2-fold; Fig. 3C), particularly for the more stringent VDR-BV subsets (Supplementary Material, Fig. S16C and F). The enrichment in eQTLs was confirmed using an independent eQTL resource (GEUVADIS LCL eQTLs (28); Supplementary Material).
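Fold enrichments of this kind compare an observed overlap count with the overlap expected under a null model, typically estimated from simulated or resampled backgrounds. The Python sketch below shows one common way to summarise such a comparison; it is an illustration under stated assumptions rather than the GAT procedure itself, and the function name and all counts are hypothetical.

```python
from math import log2
from statistics import mean


def enrichment_summary(observed, simulated):
    """Summarise an observed overlap count against overlap counts from null simulations."""
    expected = mean(simulated)          # expected overlap under the null
    fc = observed / expected            # fold change, observed versus expected
    # Empirical one-sided P-value; the +1 pseudocount keeps it above zero
    p = (1 + sum(s >= observed for s in simulated)) / (1 + len(simulated))
    return {"expected": expected, "fold_change": fc, "log2_fc": log2(fc), "p": p}


# Hypothetical counts: 14 observed overlaps against ten simulated background sets
summary = enrichment_summary(observed=14, simulated=[10, 9, 11, 12, 8, 10, 14, 9, 10, 7])
```

Ten simulations are used only to keep the example readable; a resolution of P-values useful for FDR control requires thousands of background sets.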

Complex trait-associated variants influence VDR binding

To investigate whether susceptibility to specific diseases could be influenced by VDR-BVs, we considered their locations relative to GWAS variants significantly associated with 472 diverse diseases or traits from 688 studies in the largest catalogue available (GRASP v2.0 (29)), with at least 5 associated genome-wide significant SNPs (P < 5 × 10^-8) (Supplementary Material). All traits available in these catalogues were considered so as not to bias subsequent findings. Again, we employed stringent procedures, matching variants for allele frequency and LD-dependence and discarding VDR-BVs in the MHC region, as well as again comparing against a background of all VDR-binding regions ('Test 3', p. 34 of the Supplementary Material).

[Figure 3 graphic not reproduced. Panels A–C. Recoverable labels: axis log2(fc); bar annotations give 'observed [expected]' overlap counts; motif rows labelled RXR::VDR.]

Figure 3. Genomic association testing of VDR-BVs with functional annotation. VDR-BVs are enriched at enhancers, in RXR::VDR DR3-type motifs and in LCL eQTLs. (A) Enrichment of VDR-BVs within ENCODE chromatin state segmentation tracks for LCL NA12878, TF-dense regions (HOT (17) and ENCODE clustered TF binding sites) and DNase hypersensitive areas (ENCODE). (B) Enrichment of VDR-BVs in VDR consensus motif intervals present in VDR CPo3 binding regions. (C) Enrichment of VDR-BVs at DNase-QTLs and eQTLs from the Pritchard resource and CEU/YRI LCL eQTLs from the GEUVADIS resource. Significant enrichments are indicated using blue histogram bars (fc = fold change of observed versus expected overlaps; FDR q < 0.1; numbers in a box close to each bar indicate 'observed [expected]' overlap counts; light grey bars indicate lack of significance). For A, enrichments shown are above and beyond the previously observed (Figure 1E) enrichments of VDR CPo3 peaks in the same functional annotation classes (relative to a background of 20,330 1000 Genomes SNPs in CPo3 VDR binding regions). For B, enrichments are relative to a background of 114,155 1000 Genomes variants lying under VDR ChIP-exo read pileups (read-depth threshold of 5 reads), which had been tested as potential VDR binding affinity modifiers. This background was further corrected for the analyses in C, where only LD-independent foreground VDR-BVs were tested and 10,000 DAF-matched random background sets were extracted with replacement from the main set of 114,155 background variants.

VDR-BVs were significantly, and up to 2-fold, enriched within LD intervals associated with 17 diseases or traits (Fig. 4). It is notable that these are not drawn randomly from all 472 traits considered but are mostly immune- or inflammatory-related disorders such as sarcoidosis, Graves' disease, Crohn's disease, irritable bowel syndrome and type I diabetes (Fig. 4A). The most enriched trait is Alzheimer's disease, for which 25-hydroxyvitamin D level is a proposed causal risk factor (30).

For the 341 VDR-rBVs, intervals from six disorders contained more VDR-rBVs than expected from sets of DAF-matched, LD-accounted regions that bind VDR (Fig. 4B). We emphasise that this represents a highly stringent analysis that accounts for all known confounding effects, namely potential biases from called VDR-binding sites, population stratification and physical linkage.
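Drawing DAF-matched background sets with replacement can be sketched as follows. This Python example is an assumption-laden illustration, not the procedure in the Supplementary Material: matching by ten equal-width DAF bins is a choice made here for simplicity, LD accounting is omitted, and all identifiers and frequencies are hypothetical.

```python
import random
from collections import defaultdict


def daf_bin(daf, n_bins=10):
    """Map a derived allele frequency in [0, 1] to an equal-width bin index."""
    return min(int(daf * n_bins), n_bins - 1)


def sample_daf_matched(foreground_dafs, background, rng, n_bins=10):
    """Draw one background set whose DAF-bin composition follows the foreground.

    `background` is a list of (variant_id, daf) pairs; sampling is with replacement.
    """
    by_bin = defaultdict(list)
    for variant_id, daf in background:
        by_bin[daf_bin(daf, n_bins)].append(variant_id)
    sample = []
    for daf in foreground_dafs:
        candidates = by_bin[daf_bin(daf, n_bins)]
        if candidates:  # foreground variants without a matching bin are skipped
            sample.append(rng.choice(candidates))
    return sample


# Hypothetical background variants and foreground DAFs
background = [("bg1", 0.05), ("bg2", 0.07), ("bg3", 0.52), ("bg4", 0.58), ("bg5", 0.95)]
matched = sample_daf_matched([0.02, 0.55, 0.55], background, random.Random(0))
```

Repeating the draw many times and counting overlaps with disease intervals in each set would give the null distribution against which the observed overlap is compared.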

Functional impact of genetic variation on loss or gain of VDR binding

This evidence is consistent with altered RXR::VDR binding contributing to immunity-related diseases. Nevertheless, because variants in TFBS often fail to alter binding affinity and/or have no measurable effect on gene expression levels (31,32), we needed to demonstrate that VDR-BVs are functional, specifically that they influence gene expression and have been negatively selected over evolution (33). As expected if they are enriched in functional variants, we found that 50–100% more VDR-BVs than random samples lie near (< 10 kb) genes that are differentially expressed in LCLs upon addition of calcitriol (14) (P < 10^-4; Supplementary Methods).

Also as expected if, as a class, they are functional, replicated VDR-binding peaks exhibit variable cross-species conservation (34) across RXR::VDR motif positions (Supplementary Material, Fig. S17A and B). This signature of uneven selection across the DR3 motif is evident for class I VDR binding sites (Supplementary Material, Fig. S17C and D; Kruskal-Wallis test, P = 1.9 × 10^-11, and post-hoc pairwise Kruskal-Wallis comparisons) but not in class II sites (Kruskal-Wallis test, P = 0.14), consistent with the DR3 heterodimer-binding motif regulating transcription.

[Figure 4 graphic not reproduced. Panels A and B. Recoverable labels: axis log2(fc); bar annotations give 'observed [expected]' overlap counts; square sizes keyed to -log10(q); legible trait labels include irritable bowel syndrome and Crohn's disease.]

Figure 4. VDR-BVs are enriched in GWAS LD blocks for autoimmune, inflammatory and other diseases. (A,B) Traits and diseases showing enrichment of VDR-BVs (A) or VDR-rBVs (B). Disease/trait associations were acquired from the GRASP catalog v2.0 (29). Statistical association was computed between VDR-BVs and strong-LD (Supplementary Material) intervals around genome-wide significant (P < 5 × 10^-8) disease tag-SNPs from the GRASP catalog. All available GRASP diseases and traits represented by at least 5 SNPs were analysed, and those showing significant enrichment of VDR binding beyond a 1% FDR threshold were retained; disease associations supported by only 1 VDR-BV were discarded (full data are presented in the Supplementary Material). The sizes of black squares indicate the statistical significance of enrichments.

Next, we assessed the impact on VDR binding affinity of VDR-BVs with respect to their location within the consensus DR3-type motif (JASPAR database; Fig. 5), the transcriptionally active binding site motif for the RXR::VDR heterodimer (35–37). Mapping of allele-specific ChIP-exo reads to multiply replicated VDR binding regions (CPo3 peaks) was highly predictive of whether variants substantially alter ('break') functional motifs. For 89% of observations, the change in the strength of the VDR binding motif correctly predicts the direction of VDR binding affinity change (Fig. 5A), a proportion that rises to 100% in class I VDR binding sites (Supplementary Material, Fig. S18A). Motif breaks do not favour 5′ RXR over 3′ VDR hexamers (Fig. 5A; Wilcoxon rank sum test, P = 0.71) but especially impact conserved G and C nucleotides (positions 2, 5, 11 and 14) and a T nucleotide (position 13). Notably, fewer events occur at the near-essential T nucleotides at positions 4 and 12. Interestingly, in class I sites there are no VDR-BVs disrupting the T nucleotide at position 12 (Supplementary Material, Fig. S18A), which could reflect either low mutation rates (mutational 'cold spots') or strong purifying selection against deleterious nucleotide substitutions at these binding sites (38).
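The concordance between motif disruption and binding change can be illustrated by scoring both alleles against a position weight matrix and checking whether the better-scoring allele also attracts more reads. The Python sketch below is a toy illustration: it uses a plain log-odds score rather than the FunSeq2/TFMPvalue procedure cited in the paper, and the three-position matrix, sequences and read counts are hypothetical (this is not the JASPAR RXR::VDR matrix).

```python
from math import log2


def pwm_log_odds(pwm, sequence, background=0.25):
    """Sum of per-position log-odds scores of `sequence` under a probability matrix."""
    return sum(log2(pwm[i][base] / background) for i, base in enumerate(sequence))


def is_concordant(pwm, ref_seq, alt_seq, ref_reads, alt_reads):
    """True when the allele with the stronger motif score also has more reads."""
    score_diff = pwm_log_odds(pwm, alt_seq) - pwm_log_odds(pwm, ref_seq)
    read_diff = alt_reads - ref_reads
    return score_diff * read_diff > 0


# Hypothetical three-position probability matrix; every entry is non-zero
toy_pwm = [
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
# The alternative allele weakens the motif (T -> A) and also loses reads
concordant = is_concordant(toy_pwm, "GTT", "GAT", ref_reads=30, alt_reads=8)
```

A real matrix would need pseudocounts before taking logarithms, since zero probabilities are common in curated motif databases.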

To distinguish these possibilities, we next compared the population frequencies of alleles associated with either historical Loss Of Binding (LOB) or Gain Of Binding (GOB) affinity to VDR (Methods). LOB VDR-BVs cause significantly stronger effects when within either the 5′ or 3′ hexamer than within the DR3 spacer sequence (Wilcoxon rank sum test, P < 0.01; GOB variants, P = 0.11). VDR-BVs affecting one of the most highly conserved motif positions (position 13) always result in a motif-breaking LOB effect (Fig. 5C and D; Supplementary Material, Fig. S18C and D).

[Figure 5 graphic not reproduced. Panels A–D. Recoverable labels: motif logo (y-axis 'bits') over positions 1–15, divided into '5′ Hexamer - RXR', 'DR3 Spacer' and '3′ Hexamer - VDR'; panel titles 'VDR-BV - concordance of variant genotype and binding affinity phenotype', 'VDR-BV (LOB) - impact on binding affinity' and 'VDR-BV (GOB) - impact on binding affinity'; y-axes 'motif break events' and 'Magnitude of VDR binding affinity change'; legend entries 'genotype and phenotype are concordant', 'genotype and phenotype are discordant', 'no asym binding', 'LOB', 'LOB + Motif Break' and 'GOB'.]

Figure 5. Genetic variation modulating VDR binding affinity in LCLs within the canonical RXR::VDR heterodimeric binding motif. The figure shows a genome-wide quantification of the effect of genetic variation on VDR binding, based on the analysis of VDR-BVs in CPo3 binding peaks which hit the canonical RXR::VDR DR3 heterodimeric consensus motif (motif shown in B; JASPAR database (19)). (A) Distribution across the RXR::VDR motif of VDR-BVs which cause significant motif disruption (significance assessed via FunSeq2 using TFMPvalue (82) with default threshold P < 4 × 10^-8) and quantification of the concordance between directionality of VDR PWM disruption and directionality of resulting VDR binding affinity variation: 102/115 (89%) VDR-BVs in CPo3 VDR peaks that significantly break the RXR::VDR motif predict the direction of VDR binding affinity change. (C,D) Genome-wide quantification of the phenotypic effect of all VDR-BVs intersecting the RXR::VDR motif (including those which do not generate a motif break at the above significance level). The vertical axis (Magnitude of VDR binding affinity change) indicates the fold change of read depth of the ancestral versus the derived allele (C) or derived versus the ancestral allele (D). (C) Impact on VDR binding affinity of Loss of Binding (LOB, orange dots) VDR-BVs and Loss of Binding VDR-BVs which test for significant RXR::VDR motif break (red dots). (D) Impact on VDR binding affinity of Gain of Binding (GOB) VDR-BVs (blue dots). Nucleotide position has a significant influence on LOB (Kruskal-Wallis rank sum test, χ²(14) = 33.385, P < 0.01) but not on GOB (χ²(12) = 17.68, P = 0.12) phenotype magnitude. (C,D) Grey dots indicate impact on binding affinity of those 1000 Genomes variants carried by the LCL samples which do not test for significant asymmetric binding (AlleleSeq (23) FDR threshold = 0.1) and are not therefore VDR-BVs.

We next found that RXR::VDR motif sequences have been under constraint during recent evolution within the extant human population. We did so by comparing the frequencies of derived alleles (Derived Allele Frequency [DAF] test) for LOB or GOB VDR-BVs in either Northern and Western European (CEU) or Yoruban (YRI) populations. In the absence of demographic confounders, stronger purifying selection is implied by an excess of low population frequency variants in one set of variants over another (39).

We detected significant increases in DAF distribution for GOB over LOB VDR-BVs, both for CEU (Fig. 6C, top-left panel) and YRI (Fig. 6C, bottom-left panel) cohorts. Class I binding site VDR-BVs exhibited significant differences in GOB versus LOB DAF distributions (Supplementary Material, Fig. S19; P < 0.05 for both CEU and YRI). By contrast, variants in RXR::VDR hexamers not assigned as VDR-BVs showed no significant difference in GOB versus LOB DAF distributions (Fig. 6C, top-right and bottom-right panels). Significant differences between GOB and LOB DAFs were attained even when observing the RXR and VDR motif hexamers separately and in both sub-populations (P < 0.012). Consequently, LOB variants are under substantially stronger purifying selection than are GOB variants. This is an important finding because it indicates that reduced binding of VDR at these genomic positions, and presumably reduced vitamin D levels, are commonly deleterious.
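The DAF comparison between LOB and GOB variants can be illustrated with a small rank-based test. The following is a minimal sketch, assuming a two-sided Wilcoxon rank-sum test with a normal approximation and tie correction; the function name `rank_sum_test` is hypothetical, and this is not the code used in the study.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Returns (U statistic for x, two-sided p-value). Tied values receive
    average ranks and the variance is tie-corrected. No continuity
    correction is applied, so p-values for very small samples are rough.
    """
    # Pool both samples, remembering which group each value came from.
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0      # mean of ranks i+1 .. j
        t = j - i                          # size of this tie group
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2))
```

Calling `rank_sum_test(lob_dafs, gob_dafs)` on two lists of derived allele frequencies returns a small p-value when one set is shifted towards rare alleles, which is the pattern interpreted above as stronger purifying selection.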

Discussion

We have demonstrated how genetic variation alters the binding affinity of the vitamin D receptor (VDR), a member of the nuclear receptor family of transcription factors. Genetic variants significantly associated with immune and inflammatory disease were found to disrupt VDR binding and to be located preferentially within enhancer regions (Fig. 3), in line with a recent analysis of causal immune disease-related variants (6). The study benefited from the 5-fold greater spatial resolution and signal-to-noise ratios provided by the ChIP-exo assay.

[Figure 6: histograms of derived allele frequency for GOB and LOB variants; horizontal axis: Frequency; panel titles: "VDR-BVs" and "Variants not affecting VDR binding".]

Figure 6. Evolutionary conservation of VDR-BVs at RXR::VDR consensus motifs. The diagrams show distributions of the Derived Allele Frequency (DAF) for variants in RXR::VDR consensus motifs within reproducible CPo3 binding peaks. DAF values are separated by ethnicity (A and C: CEU; B and D: YRI); variants are separated by their effect on VDR binding affinity (A and B: VDR-BVs; C and D: 1000 Genomes variants carried by the LCL samples which do not test for significant asymmetric binding and are not therefore VDR-BVs). Within each of the four panels, variants are split in two groups based on their effect on VDR binding affinity direction (whether GOB or LOB). For all quadrants, only DAFs for variants hitting hexamer positions (i.e. hitting either the RXR recognition element at positions 1-6 or the VDR recognition element at positions 10-15) in the RXR::VDR motif are shown. An asterisk indicates significance (non-parametric Wilcoxon rank sum tests, α = 0.05). A: Wilcoxon rank sum test, P = 2.8 × 10^-4. B: Wilcoxon rank sum test, P = 4.6 × 10^-5. C and D: P = 0.55 and P = 0.14, respectively. ns = not significant.

In its best studied model of allosteric modulation, the RXR::VDR heterodimer binds to DNA and enhances transcription via response elements that typically consist of two hexameric ((A/G)G(T/G)TCA) half-sites, with RXR and VDR DNA binding domains occupying the upstream and downstream half-site, respectively (40). This DR3 motif, which we previously identified using ChIP-seq data from two calcitriol-activated CEU LCLs (14), was also the predominant VDR binding interface in calcitriol-activated LCLs in the present ChIP-exo study. We identified strong enrichment of VDR binding within enhancers, with stronger enrichments associated with the more stringent subsets of VDR-BVs (Figs 1 and 3). Whilst VDR binding occurs preferentially within promoters, these interactions are only rarely associated with RXR::VDR DR3 motifs and may be mediated by additional DNA-binding co-factors, including CTCF, and might reflect unproductive binding to open chromatin. In contrast, those VDR binding sites containing strong instances of the RXR::VDR DR3 motif were particularly enriched in enhancers and also near to genes involved in immunity processes and related diseases (Fig. 2). VDR thus is expected to exert a substantial effect on the biology of the immune system as a sequence-specific enhancer regulator.
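The DR3 architecture described here (two hexameric half-sites separated by a 3-nucleotide spacer, giving the 15 motif positions referred to in Figure 6) can be written as a simple consensus pattern. This is a rough sketch only: it matches the strict consensus on the forward strand, whereas the study scored sites with position weight matrices. The names `HALF_SITE`, `DR3` and `find_dr3` are illustrative.

```python
import re

# Consensus half-site (A/G)G(T/G)TCA; a DR3 element is two half-sites
# separated by a 3-nucleotide spacer (positions 1-6, 7-9 and 10-15).
HALF_SITE = "[AG]G[TG]TCA"
# The lookahead lets overlapping candidate sites be reported as well.
DR3 = re.compile(rf"(?=({HALF_SITE})([ACGT]{{3}})({HALF_SITE}))")

def find_dr3(seq):
    """Return (start, upstream_half, spacer, downstream_half) per match.

    Only the forward strand is scanned; reverse-complement the input to
    search the other strand.
    """
    seq = seq.upper()
    return [(m.start(), m.group(1), m.group(2), m.group(3))
            for m in DR3.finditer(seq)]
```

An exact-consensus scan of this kind will miss the weak or non-canonical sites that a PWM tolerates, so it is best read as a picture of the motif layout rather than a detection method.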

Most remarkably, we found variants weakening or strengthening the RXR::VDR PWM score to be highly predictive of the loss or gain of VDR binding affinity (Fig. 5). Furthermore, the VDR-bound RXR::VDR motif shows evidence of levels of sequence conservation across vertebrate evolution that mirror its information content, and loss of VDR-binding variants over modern human evolution has occurred under stronger purifying selection than the gain of binding variants (Fig. 6). Together, these results imply that VDR binding, and thus presumably vitamin D levels, have in general been protective against disease.

Our study will have underestimated the true extent of regulatory variation at VDR binding sites. This is because the asymmetric binding analysis (23), by definition, can only test variants which are heterozygous for a given LCL sample; therefore, true positive VDR-BVs will be unobserved if all LCL samples we considered are homozygous. Similarly, not all of the required three biallelic genotypes will have been present in these samples for the regression-on-genotype analysis (22,21). Consequently, a larger scale study would be expected to show improved power to detect VDR-BVs.

At least 90% of VDR-BVs lay outside of RXR::VDR DR3 motifs. The majority of variable binding could occur, as with other TFs (41), at weak or non-canonical binding motifs, or its affinity may alter because of genetic disruption to a cofactor binding site (42). Another explanation is that VDR-BVs commonly exert their effect by altering the DNA conformation of regions flanking the core binding site (43). Nevertheless, our de novo motif discovery identified several proteins as potential RXR::VDR cofactors whose altered DNA-binding affinity could influence VDR binding. Over-representation of motifs in VDR peaks has been observed before, for example in a recent study of VDR binding in monocytes and monocyte-derived inflammatory and tolerogenic dendritic cells (44). Direct interaction between VDR and CREB1 has been reported previously (45), as have functional interactions between vitamin D/VDR and oestrogen metabolism and MYC proteins (46,47). Co-occurrence of GABPA and ESRRB motifs with VDR-binding sites has been reported previously (48). ZNF423 is not known to bind VDR, yet it might do so indirectly because it physically associates with its heterodimer partner RXRα (49).

Our finding that VDR-BVs are significantly enriched within eQTLs (Fig. 3C) is intriguing because the eQTLs were inferred from LCLs not treated with calcitriol, and implies that these enhancers' activities may not be entirely vitamin D-dependent in these cells. Nevertheless, there was no clear concordance or discordance between the direction of change of VDR binding at VDR-BVs (in calcitriol-treated LCLs) and the direction of change of gene expression in calcitriol-minus human LCLs reported for the GEUVADIS eQTL variants (data not shown). This, and the lack of enrichment of the DR3 motif in the VDR binding sites for the basal (unstimulated) state in LCLs (14), indicate that a study of calcitriol-activated LCLs at a similar scale to those employing unstimulated LCLs (28) will be required to fully resolve this issue.

Our most important results derived from a conservative analysis of a stringent set of replicated binding variants (Fig. 4B). From this, we report a significant excess of VDR-rBVs coincident or in strong LD with genome-wide significant GWAS tag variants for six disorders, including three that are autoimmune disorders (inflammatory bowel disease, Crohn's disease and rheumatoid arthritis), whilst a fourth, endometriosis, is frequently comorbid with autoimmune disorders (50). Deficiency of serum 25(OH)D levels is associated with cardiovascular disease risk factors in adults (10), while the association between vitamin D levels and coronary artery disease, as for many disorders, is debated (51). These results associate a molecular phenomenon, the genetic disruption of nuclear receptor binding, with a narrow set of immune-related syndromes and diseases.

Based on the combined functional, evolutionary and GWAS-based evidence, we propose that VDR-BVs represent good targets for subsequent experimental validation using, for example, genome editing in cells. It is also expected that only a small minority of genes bound by VDR will be altered in expression. A large-scale expression QTL study in calcitriol-activated LCLs will thus be required to further narrow down the set of candidate disease genes whose regulatory elements are variably bound by VDR and which are differentially expressed across the human population.

We have provided the first quantitative, association-based analysis to explain the genetic effect on binding affinity variation for a nuclear receptor and to propose a mechanistic model for disease susceptibility. Our results support the hypothesis that DNA variants altering transcription factor binding at enhancers contribute to complex disease aetiology, and suggest that altered VDR binding, and by inference variable vitamin D levels, explain in part altered autoimmune and other complex disease risk.

Materials and Methods

VDR ChIP-exo sample preparation

Calcitriol-stimulated lymphoblastoid cell lines (LCLs) were grown and prepared for ChIP as described previously, with some modifications (52), and adapted for ChIP-exo (15,16). LCLs are known to be a good model of primary B-cells (53), and we chose lines from HapMap, a unique resource whose genomes have already been sequenced. Briefly, cells were incubated in phenol red-free RPMI-1640, 10% charcoal-stripped FBS, 2 mM L-glutamine, penicillin with streptomycin solution (100 U/ml + 100 µg/ml) medium at 37 °C and 5% CO2. Cells were harvested after stimulation for 36 hours with 0.1 mM calcitriol (Sigma), cross-linked using a 1% formaldehyde buffer for 15 minutes at room temperature and quenched with 0.125 M glycine. Cells were lysed and chromatin sheared by sonication into fragments of 200–1000 bp. VDR-bound genomic DNA regions were isolated using a rabbit polyclonal antibody against VDR (Santa Cruz Biotechnology, sc-1008). Immunoprecipitated chromatin was then processed enzymatically on magnetic beads. Samples were polished, A-tailed, ligated to sequencing library adaptors and then digested with lambda exonuclease to remove nucleotides from 5' ends of double-stranded DNA. Single-stranded DNA was eluted and converted to double-stranded DNA by primer annealing and extension. A second sequencing adaptor was ligated to exonuclease-treated ends, PCR amplified, gel purified and sequenced. Libraries were prepared for Illumina as described in (16).

Data processing and peak calling

Full data processing steps are presented in the Supplementary Material. Briefly, we mapped the ChIP-exo sequence reads (Peconics Inc., USA; single-end, 40 bp) against the hg19 build of the human reference genome (54) using BWA (v0.7.4 (55)) followed by Stampy (v1.0.21 (56)). We filtered the reads to retain only uniquely mapping reads with MAPQ > 20 and removed reads mapping to empirical blacklist regions identified by the ENCODE consortium (24).
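The mapping-quality filter can be sketched on plain SAM text as follows. This is a simplified illustration, assuming tab-delimited SAM records in which FLAG is the second field and MAPQ the fifth; the function names are hypothetical, and the blacklist-removal step is not covered.

```python
def keep_read(sam_line, min_mapq=20):
    """True if an alignment record is mapped and has MAPQ > min_mapq."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, mapq = int(fields[1]), int(fields[4])
    is_unmapped = bool(flag & 0x4)   # SAM flag bit 0x4: read unmapped
    return (not is_unmapped) and mapq > min_mapq

def filter_sam(lines, min_mapq=20):
    """Yield header lines unchanged and alignment lines that pass the filter."""
    for line in lines:
        if line.startswith("@") or keep_read(line, min_mapq):
            yield line
```

MAPQ is used here as a proxy for unique mapping; how well that proxy holds depends on the aligner, so the threshold should be read as the paper's choice for its own BWA and Stampy output.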

For peak calling, we first computed strand cross-correlation profiles (57) of read start densities to obtain consensus estimates of the ChIP-exo digested fragment sizes, which we corrected for the presence of phantom peaks (58) using a mappability-corrected approach to cross-correlation inference (MaSC (59)). We used the cross-correlation based estimates for the digested fragment size as a MACS2 (version 2.0.10 (60)) input parameter to obtain peak calls. Separately, to increase sensitivity, we called peaks using GPS/GEM (v2.4.1 (61)) with recommended ChIP-exo options: --smooth 3 --mrc 20. We then pooled, for each sample, the two sets of peak calls (Supplementary Material). Finally, we used DiffBind (62) to manipulate the per-sample merged peaksets and to obtain consensus peaksets (CPo3, CPo10, CPo20) based on an overlap threshold. To do this, we normalised the read numbers using DiffBind's embedded edgeR routines based on the Trimmed Mean of M-values (TMM) algorithm (63).
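The idea behind strand cross-correlation can be shown with a toy calculation: forward-strand and reverse-strand read starts pile up on either side of a binding site, so the shift that best aligns the two strands estimates the digested fragment size. The sketch below uses an unnormalised co-occurrence count rather than a Pearson correlation and applies no phantom-peak or mappability correction, so it is not a substitute for MaSC; all names are illustrative.

```python
from collections import Counter

def strand_cross_correlation(fwd_starts, rev_starts, max_shift=100):
    """Co-occurrence of forward and reverse read starts at each shift.

    Returns a dict mapping shift d (0..max_shift) to the number of
    forward/reverse read pairs whose start positions differ by exactly d.
    """
    fwd = Counter(fwd_starts)
    rev = Counter(rev_starts)
    return {
        d: sum(n * rev.get(pos + d, 0) for pos, n in fwd.items())
        for d in range(max_shift + 1)
    }

def estimate_fragment_size(profile):
    """Shift with the highest score (smallest such shift if tied)."""
    return max(profile, key=profile.get)
```

In a real profile a second peak typically appears at the read length (the phantom peak), which is why the text applies a mappability-aware correction before taking the estimate.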

Interval overlap enrichment analysis

We used the Genomic Association Tester (64) to perform the randomization-based interval enrichment analyses. All analyses were based on 10,000 randomisations over the chosen background, and backgrounds differed depending on the specific analysis performed. All analyses accounted for genome-wide patterns of GC content variability and, where relevant, for uniquely mappable regions (40 bp reads). For the enrichment analysis of VDR-BVs in GWAS intervals and QTLs, we used an in-house bootstrapping pipeline to perform LD-filtering (to retain only LD-independent foreground and background variants) and DAF matching. Further details on annotations, background and analytical design are provided in the Supplementary Material.
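The logic of a randomisation-based interval enrichment test can be sketched in a few lines. This is an illustrative simplification, assuming a single contiguous workspace and uniform random placement of segments; it does not model GC content, mappability or the multi-chromosome workspaces that GAT handles, and the function names are hypothetical.

```python
import random

def overlap_bp(segments, annotation):
    """Bases shared between two lists of half-open (start, end) intervals."""
    return sum(
        max(0, min(e, b) - max(s, a))
        for s, e in segments
        for a, b in annotation
    )

def enrichment_test(segments, annotation, workspace, n_random=1000, seed=0):
    """Empirical enrichment of segment/annotation overlap.

    Segments are re-placed uniformly at random inside `workspace`
    (start, end), keeping their lengths. Returns
    (observed, expected, fold, p), where p is the one-sided empirical
    p-value (n_ge + 1) / (n_random + 1).
    """
    rng = random.Random(seed)
    ws_start, ws_end = workspace
    observed = overlap_bp(segments, annotation)
    sims = []
    for _ in range(n_random):
        shuffled = []
        for s, e in segments:
            length = e - s
            new_start = rng.randint(ws_start, ws_end - length)
            shuffled.append((new_start, new_start + length))
        sims.append(overlap_bp(shuffled, annotation))
    expected = sum(sims) / n_random
    n_ge = sum(1 for v in sims if v >= observed)
    fold = observed / expected if expected > 0 else float("inf")
    return observed, expected, fold, (n_ge + 1) / (n_random + 1)
```

With 10,000 randomisations, as in the analyses described above, the smallest attainable empirical p-value under this scheme is 1/10,001.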

Motif discovery

Full motif analysis steps are documented in the Supplementary Material. Briefly, we carried out de novo motif analysis using multiple parallel approaches: MEME-ChIP (65) from the MEME suite (66), XXmotif (67) and PScanChIP (20) from the Weeder/MoDtools suite (68). For the MEME-ChIP and XXmotif analyses, we used the peak intervals in CPo10 and extracted the sequence underlying each interval ± 150 bp from a repeat-masked version of the hg19 reference (54). For the PScanChIP analysis, we used as the input the full set of CPo3 binding regions with Jaspar PWM descriptors, to which we added the best heterodimeric RXR::VDR PWM found by XXmotif and the best monomeric VDR PWM found by DREME (MEME-ChIP package). The global background chosen for the analysis consisted of built-in PScanChIP DNase I digital genomic footprinting data for the LCL CEPH individual NA12865. For the motif analysis at the 43,332 VDR-BVs, we used PScanChIP (20) with Jaspar PWM descriptors and a mixed background of the promoters and LCL DNase cut sites.

Differential binding analyses

We performed both allele-specific (VDR-ASB) and regression on genotype (VDR-QTL) differential binding analyses. Both methods start from analyses of read counts. For the VDR-QTL analysis, reads were normalised and covariate-corrected as detailed in the Supplementary Material.

For the VDR-ASB tests, we utilised the sequence composition of ChIP-exo sequence reads overlapping heterozygous SNPs to determine the sequences originating from each allele separately (23,69,70) and to identify allele-specific binding events showing a significant difference in the number of mapped reads between parental alleles. We employed a modified version of the AlleleSeq pipeline (23) to carry out the VDR-ASB analysis. Briefly, we tested for significant allelic imbalance among all read pile-ups intersecting a variant showing heterozygous genotype, given a null hypothesis of 50% paternal versus 50% maternal reads. We followed (23) in controlling for two major sources of bias typically encountered when running an allele-specific analysis of short read data: a bias due to mapping to the reference hg19 genome (71) and a bias due to unannotated copy number variants skewing the read counts for some of the SNPs being tested (72). In addition, we corrected for a third source of potential erroneous ASB SNP calls by removing any significant allelic imbalance hypothesis falling under repeat regions contained in the ENCODE blacklist data (24).
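The null hypothesis of 50% paternal versus 50% maternal reads corresponds to a binomial test on the allele counts at each heterozygous site. The following is a minimal sketch of an exact two-sided binomial p-value; it illustrates the test only and is not the modified AlleleSeq pipeline, which additionally handles mapping bias, copy number and FDR control. The function name is hypothetical.

```python
from math import comb

def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value for k successes in n trials.

    Sums the probabilities of all outcomes that are no more likely than
    the observed one. Intended for modest read depths; very large n can
    overflow the float conversion of the binomial coefficient.
    """
    pmf = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)   # tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= threshold))
```

For a site covered by 20 reads of which 18 carry one allele, `binomial_two_sided_p(18, 20)` gives roughly 4 × 10^-4, whereas a 10/10 split gives 1.0.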

For the VDR-QTL tests, we analysed all 27 LCL samples simultaneously. We grouped the samples based on the genotypes of the underlying variant and carried out QTL association testing between the variant's genotype (under the assumption of an additive genetic model (73)) and the phenotype at that location (the VDR binding affinity based on the quantile-normalised ChIP-exo peak size at that location). We used an in-house pipeline based on SNPs and indels imputed with IMPUTE2 (74) and Bayesian regression modelling based on SNPTEST (21) and BIMBAM (22). Bayesian methods for analysing SNP associations are now an established tool in GWAS analysis (75–77) and show advantages over the use of p-values in power and interpretation (73), though they require tighter initial modelling assumptions when compared to frequentist methods. For details on the underlying model and priors, we refer the reader to the Supplementary document.
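Under an additive genetic model, each sample's genotype is coded as an allele dosage (0, 1 or 2) and the binding phenotype is regressed on that dosage. The sketch below uses ordinary least squares as a frequentist stand-in to show the model structure; the study itself used Bayesian regression via SNPTEST and BIMBAM, and the function name is hypothetical.

```python
def additive_regression(dosages, phenotypes):
    """Least-squares fit of phenotype on allele dosage (0, 1, 2).

    Returns (intercept, slope, r_squared). The slope is the estimated
    change in phenotype per additional copy of the counted allele.
    """
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("variant is monomorphic in these samples")
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(dosages, phenotypes))
    syy = sum((y - mean_y) ** 2 for y in phenotypes)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_g
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return intercept, slope, r_squared
```

The `ValueError` branch mirrors the power limitation noted in the Discussion: a variant whose genotype does not vary across the available samples cannot be tested.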

Variant annotation

We annotated all variants associated with differences in VDR binding, obtained by pooling the VDR-QTL and the VDR-ASB variants (VDR-Binding Variants, VDR-BVs), according to the guidelines in (25) and using a customised version of the Funseq2 suite of algorithms (26). We added additional VDR-centric information to Funseq2: VDR CPo3 binding regions, motif intervals from the PScanChIP analysis for the VDR::RXR Jaspar DR3 motif and for the DREME VDR monomer, and VDR dimer/monomer PWM information. Gene annotation used Funseq2's default (Gencode v16), and the HOT annotation in Funseq was filtered to only retain LCL HOT information. We ran Funseq using the -m 2 -nc options so that all annotation referred to the ancestral allele of the variant.

Availability of Data and Material

Sequencing data from this study have been submitted to the Gene Expression Omnibus archive (GEO; http://www.ncbi.nlm.nih.gov/geo/; accession number GSE73254).

Supplementary Material

Supplementary Material is available at HMG online.

Acknowledgements

We thank Mark Gerstein, Jieming Chen, Joel Rozowsky, Alexej Abyzov, Ekta Khurana and Yao Fu for precious help and technical insight on the AlleleSeq, CNVnator and Funseq2 platforms. We also thank Matthew Stephens and Yongtao Guan for help and insight on the PIMASS and BIMBAM platforms. We thank Gosia Trynka and Kamil Slowikowksi for providing r2-based LD block data; Yuchun Guo and David Gifford for precious help and technical insight on the GEM peak caller; Gerton Lunter for help on the STAMPY platform; Andreas Heger for help and technical insight on the GAT software and genomic randomization testing; Rory Stark and Simon Anders for insight on differential analysis of ChIP-seq data; Laura Clarke for help with the 1000 Genomes data; and Daniel Gaffney for providing helpful pointers regarding the GEUVADIS QTLs calls. We thank Li Shen and Ning-Yi Shao (Icahn School of Medicine at Mount Sinai) for insight into NGSplot utilisation, and Oscar Bedoya Reina and Christoffer Nellaker for helpful discussions.

Conflict of Interest statement. None declared.

Funding

Medical Research Council [to GG, AJBT, WH, CPP; MC_UU_120081, MC_UU_120211, MC_EX_G1000902]; Wellcome Trust and Genzyme [to SVR]; the Multiple Sclerosis Society UK [grant number 91509]; and the Council for Science and Technology (CONACyT), Mexico [grant number 211990 to AJBT]. Funding to pay the Open Access publication charges for this article was provided by the Research Councils UK (RCUK) open access fund.

References

1. Kasowski M, Kyriazopoulou-Panagiotopoulou S, Grubert F, Zaugg JB, Kundaje A, Liu Y, Boyle AP, Zhang QC, Zakharia F, Spacek DV, et al. (2013) Extensive variation in chromatin states across humans. Science, 342, 750–752.

2. McVicker G, van de Geijn B, Degner JF, Cain CE, Banovich NE, Raj A, Lewellen N, Myrthil M, Gilad Y and Pritchard JK (2013) Identification of Genetic Variants That Affect Histone Modifications in Human Cells. Science, 342, 747–749.

3. Kilpinen H, Waszak SM, Gschwind AR, Raghav SK, Witwicki RM, Orioli A, Migliavacca E, Wiederkehr M, Gutierrez-Arcelus M, Panousis NI, et al. (2013) Coordinated effects of sequence variation on DNA binding, chromatin structure and transcription. Science, 342, 744–747.

4. Maurano MT, Humbert R, Rynes E, Thurman RE, Haugen E, Wang H, Reynolds AP, Sandstrom R, Qu H, Brody J, et al. (2012) Systematic localization of common disease-associated variation in regulatory DNA. Science, 337, 1190–1195.

5. Gusev A, Lee SH, Trynka G, Finucane H, Vilhjalmsson BJ, Xu H, Zang C, Ripke S, Bulik-Sullivan B, Stahl E, et al. (2014) Partitioning heritability of regulatory and cell-type-specific variants across 11 common diseases. Am J Hum Genet, 95, 535–552.

6. Farh KK, Marson A, Zhu J, Kleinewietfeld M, Housley WJ, Beik S, Shoresh N, Whitton H, Ryan RJ, Shishkin AA, et al. (2015) Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature, 518, 337–343.

7. Heinz S, Romanoski CE, Benner C and Glass CK (2015) The selection and function of cell type-specific enhancers. Nat Rev Mol Cell Biol, 16, 144–154.

8. Arner E, Daub CO, Vitting-Seerup K, Andersson R, et al. (2015) Transcribed enhancers lead waves of coordinated transcription in transitioning mammalian cells. Science, 347, 1010–1014.

9. Holick MF (2007) Vitamin D Deficiency. N Engl J Med, 357, 266–281. PMID: 17634462.

10. Martins D, Wolf M, Pan D, et al. (2007) Prevalence of cardiovascular risk factors and the serum levels of 25-hydroxyvitamin D in the United States: data from the Third National Health and Nutrition Examination Survey. Arch Intern Med, 167, 1159–1165.

11. Mokry LE, Ross S, Ahmad OS, Forgetta V, Smith GD, Leong A, Greenwood CMT, Thanassoulis G and Richards JB (2015) Vitamin D and Risk of Multiple Sclerosis: A Mendelian Randomization Study. PLoS Med, 12, 1–20.

12. Afzal S, Brondum-Jacobsen P, Bojesen SE and Nordestgaard BG (2014) Genetically low vitamin D concentrations and increased mortality: mendelian randomisation analysis in three large cohorts. BMJ, 349.

13. Kliewer SA, Umesono K, Mangelsdorf DJ and Evans RM (1992) Retinoid X receptor interacts with nuclear receptors in retinoic acid, thyroid hormone and vitamin D3 signalling. Nature, 355, 446–449.

14. Ramagopalan SV, Heger A, Berlanga AJ, Maugeri NJ, Lincoln MR, Burrell A, Handunnetthi L, Handel AE, Disanto G, Orton SM, et al. (2010) A ChIP-seq defined genome-wide map of vitamin D receptor binding: Associations with disease and evolution. Genome Res, 20, 1352–1360.

15. Rhee HS and Pugh BF (2011) Comprehensive Genome-wide Protein-DNA Interactions Detected at Single-Nucleotide Resolution. Cell, 147, 1408–1419.

16. Rhee HS and Pugh BF (2012) ChIP-exo method for identifying genomic location of DNA-binding proteins with near-single-nucleotide accuracy. Curr Protoc Mol Biol, Chapter 21, Unit 21.24.

17. Yip K, Cheng C, Bhardwaj N, Brown J, Leng J, Kundaje A, Rozowsky J, Birney E, Bickel P, Snyder M and Gerstein M (2012) Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors. Genome Biol, 13, R48.

18. Carlberg C, Bendik I, Wyss A, Meier E, Sturzenbecker LJ, Grippo JF and Hunziker W (1993) Two nuclear signalling pathways for vitamin D. Nature, 361, 657–660.

19. Mathelier A, Zhao X, Zhang AW, Parcy F, Worsley-Hunt R, Arenillas DJ, Buchman S, Chen CY, Chou A, Ienasescu H, et al. (2014) JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res, 42, D142–D147.

20. Zambelli F, Pesole G and Pavesi G (2013) PscanChIP: Finding over-represented transcription factor-binding site motifs and their correlations in sequences from ChIP-Seq experiments. Nucleic Acids Res, 41, W535–W543.

21. Marchini J, Howie B, Myers S, McVean G and Donnelly P (2007) A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet, 39, 906–913.

22. Servin B and Stephens M (2007) Imputation-based analysis of association studies: candidate regions and quantitative traits. PLoS Genet, 3, e114.

23. Rozowsky J, Abyzov A, Wang J, Alves P, Raha D, Harmanci A, Leng J, Bjornson R, Kong Y, Kitabayashi N, et al. (2011) AlleleSeq: analysis of allele-specific expression and binding in a network framework. Mol Syst Biol, 7, 522.

24. Dunham I, Kundaje A, Aldred SF, Collins PJ, Davis CA, Doyle F, Epstein CB, Frietze S, Harrow J, Kaul R, et al. (2012) An integrated encyclopedia of DNA elements in the human genome. Nature, 489, 57–74.

25. Khurana E, Fu Y, Colonna V, Mu XJ, Kang HM, Lappalainen T, Sboner A, Lochovsky L, Chen J, Harmanci A, et al. (2013) Integrative annotation of variants from 1092 humans: application to cancer genomics. Science, 342, 1235587.

26. Fu Y, Liu Z, Lou S, Bedford J, Mu X, Yip KY, Khurana E and Gerstein M (2014) FunSeq2: A framework for prioritizing noncoding regulatory variants in cancer. Genome Biol, 15, 480.

27. Heinz S, Romanoski CE, Benner C, Allison KA, Kaikkonen MU, Orozco LD and Glass CK (2013) Effect of natural genetic variation on enhancer selection and function. Nature, 503, 487–492.

28. Lappalainen T, Sammeth M, Friedlander MR, 't Hoen PA, Monlong J, Rivas MA, Gonzalez-Porta M, Kurbatova N, Griebel T, Ferreira PG, et al. (2013) Transcriptome and genome sequencing uncovers functional variation in humans. Nature, 501, 506–511.

29. Eicher JD, Landowski C, Stackhouse B, Sloan A, Chen W, Jensen N, Lien JP, Leslie R and Johnson AD (2015) GRASP v2.0: an update on the genome-wide repository of associations between SNPs and phenotypes. Nucleic Acids Res, 43, 799–804.

30. Mokry LE, Ross S, Morris JA, Manousaki D, Forgetta V and Richards JB (2016) Genetically decreased vitamin D and risk of Alzheimer disease. Neurology, 87, 2567–2574.

31. Cusanovich DA, Pavlovic B, Pritchard JK and Gilad Y (2014) The functional consequences of variation in transcription factor binding. PLoS Genet, 10, e1004226.

32. Spivakov M (2014) Spurious transcription factor binding: non-functional or genetically redundant? Bioessays, 36, 798–806.

33. Villar D, Berthelot C, Aldridge S, Rayner TF, Lukk M, Pignatelli M, Park TJ, Deaville R, Erichsen JT, Jasinska AJ, et al. (2015) Enhancer evolution across 20 mammalian species. Cell, 160, 554–566.

34. Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al. (2005) Evolutionarily conserved elements in vertebrate, insect, worm and yeast genomes. Genome Res, 15, 1034–1050.

35. Jimenez-Lara AM and Aranda A (1999) The vitamin D receptor binds in a transcriptionally inactive form and without a defined polarity on a retinoic acid response element. FASEB J, 13, 1073–1081.

36. Toell A, Polly P and Carlberg C (2000) All natural DR3-type vitamin D response elements show a similar functionality in vitro. Biochem J, 352 Pt 2, 301–309.

37. Zhang J, Chalmers MJ, Stayrook KR, Burris LL, Wang Y, Busby SA, Pascal BD, Garcia-Ordonez RD, Bruning JB, Istrate MA, et al. (2011) DNA binding alters coactivator interaction surfaces of the intact VDR-RXR complex. Nat Struct Mol Biol, 18, 556–563.

38. Drake JA, Bird C, Nemesh J, Thomas DJ, Newton-Cheh C, Reymond A, Excoffier L, Attar H, Antonarakis SE, Dermitzakis ET, et al. (2006) Conserved noncoding sequences are selectively constrained and not mutation cold spots. Nat Genet, 38, 223–227.

39. Fay JC, Wyckoff GJ and Wu CI (2001) Positive and negative selection on the human genome. Genetics, 158, 1227–1234.

40. Orlov I, Rochel N, Moras D and Klaholz BP (2012) Structure of the full human RXR/VDR nuclear receptor heterodimer complex with its DR3 target DNA. EMBO J, 31, 291–300.

41. Deplancke B, Alpern D and Gardeux V (2016) The Genetics of Transcription Factor DNA Binding Variation. Cell, 166, 538–554.

42. Reddy TE, Gertz J, Pauli F, Kucera KS, Varley KE, Newberry KM, Marinov GK, Mortazavi A, Williams BA, Song L, et al. (2012) Effects of sequence variation on differential allelic transcription factor occupancy and gene expression. Genome Res, 22, 860–869.

43. Levo M, Zalckvar E, Sharon E, Dantas Machado AC, Kalma Y, Lotam-Pompan M, Weinberger A, Yakhini Z, Rohs R and Segal E (2015) Unraveling determinants of transcription factor binding outside the core binding site. Genome Res, 25, 1018–1029.

44. Booth DR, Ding N, Parnell GP, Shahijanian F, Coulter S, Schibeci SD, Atkins AR, Stewart GJ, Evans RM, Downes M, et al. (2016) Cistromic and genetic evidence that the vitamin D receptor mediates susceptibility to latitude-dependent autoimmune diseases. Genes Immun, 17, 213–219.

45. Nakane M, Ma J, Ruan X, Kroeger PE and Wu-Wong R (2007) Mechanistic analysis of VDR-mediated renin suppression. Nephron Physiol, 107, 35–44.

46. Salehi-Tabar R, Nguyen-Yamamoto L, Tavera-Mendoza LE, Quail T, Dimitrov V, An BS, Glass L, Goltzman D and White JH (2012) Vitamin D receptor as a master regulator of the c-MYC/MXD1 network. Proc Natl Acad Sci USA, 109, 18827–18832.

47. Capellino S, Straub RH and Cutolo M (2014) Aromatase and regulation of the estrogen-to-androgen ratio in synovial tissue inflammation: common pathway in both sexes. Ann N Y Acad Sci, 1317, 24–31.

48. Tuoresmaki P, Vaisanen S, Neme A, Heikkinen S and Carlberg C (2014) Patterns of genome-wide VDR locations. PLoS One, 9, e96105.

49. Huang S, Laoukili J, Epping MT, Koster J, Holzel M, Westerman BA, Nijkamp W, Hata A, Asgharzadeh S, Seeger RC, et al. (2009) ZNF423 Is Critically required for retinoic acid-induced differentiation and is a marker of neuroblastoma outcome. Cancer Cell, 15, 328–340.

50. Matarese G, De Placido G, Nikas Y and Alviggi C (2003) Pathogenesis of endometriosis: natural immunity dysfunction or autoimmune disease? Trends Mol Med, 9, 223–228.


51. Kunadian V, Ford GA, Bawamia B, Qiu W and Manson JE (2014) Vitamin D deficiency and coronary artery disease: A review of the evidence. Am Heart J, 167, 283–291.

52. Ramagopalan SV, Maugeri NJ, Handunnetthi L, Lincoln MR, Orton SM, Dyment DA, Deluca GC, Herrera BM, Chao MJ, Sadovnick AD, et al. (2009) Expression of the multiple sclerosis-associated MHC class II Allele HLA-DRB1*1501 is regulated by vitamin D. PLoS Genet, 5, e1000369.

53. Caliskan M, Cusanovich DA, Ober C and Gilad Y (2011) The effects of EBV transformation on gene expression levels and methylation profiles. Hum Mol Genet, 20, 1643–1652.

54. Kuhn RM, Haussler D and Kent WJ (2013) The UCSC genome browser and associated tools. Brief Bioinformatics, 14, 144–161.

55. Li H and Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25, 1754–1760.

56. Lunter G and Goodson M (2011) Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res, 21, 936–939.

57. Kharchenko PV, Tolstorukov MY and Park PJ (2008) Design and analysis of ChIP-seq experiments for DNA-binding proteins. Nat Biotechnol, 26, 1351–1359.

58. Kundaje A, Jung LY, Kharchenko P, Wold B, Sidow A, Batzoglou S and Park P (2013) Assessment of ChIP-seq data quality using cross-correlation analysis.

59. Ramachandran P, Palidwor GA, Porter CJ and Perkins TJ (2013) MaSC: mappability-sensitive cross-correlation for estimating mean fragment length of single-end short-read sequencing data. Bioinformatics, 29, 444–450.

60. Feng J, Liu T, Qin B, Zhang Y and Liu XS (2012) Identifying ChIP-seq enrichment using MACS. Nat Protoc, 7, 1728–1740.

61. Guo Y, Mahony S and Gifford DK (2012) High Resolution Genome Wide Binding Event Finding and Motif Discovery Reveals Transcription Factor Spatial Binding Constraints. PLoS Comput Biol, 8, e1002638.

62. Stark R and Brown G (2011) DiffBind: differential binding analysis of ChIP-Seq peak data. R package.

63. Robinson M and Oshlack A (2010) A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol, 11, R25.

64. Heger A, Webber C, Goodson M, Ponting CP and Lunter G (2013) GAT: a simulation framework for testing the association of genomic intervals. Bioinformatics, 29, 2046–2048.

65. Ma W, Noble WS and Bailey TL (2014) Motif-based analysis of large nucleotide data sets using MEME-ChIP. Nat Protoc, 9, 1428–1450.

66. Bailey TL, Boden M, Buske FA, Frith M, Grant CE, Clementi L, Ren J, Li WW and Noble WS (2009) MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res, 37, W202–W208.

67. Hartmann H, Guthohrlein EW, Siebert M, Luehr S and Soding J (2013) P-value-based regulatory motif discovery using positional weight matrices. Genome Res, 23, 181–194.

68. Pavesi G, Mereghetti P, Zambelli F, Stefani M, Mauri G and Pesole G (2006) MoD Tools: regulatory motif discovery in nucleotide sequences from co-regulated or homologous genes. Nucleic Acids Res, 34, W566–W570.

69. McDaniell R, Lee BK, Song L, Liu Z, Boyle AP, Erdos MR, Scott LJ, Morken MA, Kucera KS, Battenhouse A, et al. (2010) Heritable Individual-Specific and Allele-Specific Chromatin Signatures in Humans. Science, 328, 235–239.

70. Pickrell JK, Marioni JC, Pai AA, Degner JF, et al. (2010) Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature, 464, 768–772.

71. Degner, J.F., Marioni, J.C., Pai, A.A., Pickrell, J.K., Nkadori, E., Gilad, Y. and Pritchard, J.K. (2009) Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data. Bioinformatics, 25, 3207–3212.

72. Pickrell, J.K., Gaffney, D.J., Gilad, Y. and Pritchard, J.K. (2011) False positive peaks in ChIP-seq and other sequencing-based functional assays caused by unannotated high copy number regions. Bioinformatics, 27, 2144–2146.

73. Stephens, M. and Balding, D.J. (2009) Bayesian statistical methods for genetic association studies. Nat. Rev. Genet., 10, 681–690.

74. Howie, B.N., Donnelly, P. and Marchini, J. (2009) A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet., 5, e1000529.

75. Scott, L.J., Mohlke, K.L., Bonnycastle, L.L., Willer, C.J. et al. (2007) A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science, 316, 1341–1345.

76. Burton, P.R., Clayton, D.G., Cardon, L.R., Craddock, N. et al. (2007) Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature, 447, 661–678.

77. Wakefield, J. (2009) Bayes factors for genome-wide association studies: comparison with P-values. Genet. Epidemiol., 33, 79–86.

78. Anders, S., McCarthy, D.J., Chen, Y., Okoniewski, M., Smyth, G.K., Huber, W. and Robinson, M.D. (2013) Count-based differential expression analysis of RNA sequencing data using R and Bioconductor. Nat. Protoc., 8, 1765–1786.

79. Ernst, J., Kheradpour, P., Mikkelsen, T.S., Shoresh, N., Ward, L.D., Epstein, C.B., Zhang, X., Wang, L., Issner, R., Coyne, M. et al. (2011) Mapping and analysis of chromatin state dynamics in nine human cell types. Nature, 473, 43–49.

80. Bailey, T.L. and Machanick, P. (2012) Inferring direct DNA binding from ChIP-seq. Nucleic Acids Res., 40, e128.

81. McLean, C.Y., Bristor, D., Hiller, M., Clarke, S.L., Schaar, B.T., Lowe, C.B., Wenger, A.M. and Bejerano, G. (2010) GREAT improves functional interpretation of cis-regulatory regions. Nat. Biotechnol., 28, 495–501.

82. Touzet, H. and Varré, J.S. (2007) Efficient and accurate P-value computation for position weight matrices. Algorithms Mol. Biol., 2, 15.

2176 | Human Molecular Genetics 2017 Vol 26 No 11


allelic imbalance is based on read depth at heterozygous single-nucleotide variants (23) and uses sample-specific maternal and paternal reference maps, which we corrected for unannotated copy number variation and from which we excluded variants in ENCODE blacklisted repeat regions (24). The two methods' variants associated with differences in VDR binding (VDR-Binding Variants, VDR-BVs) were annotated and prioritised based on the guidelines proposed by (25) and related algorithms (Funseq2 (26)).
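As an illustration of a read-depth test for allelic imbalance at a single heterozygous site, the following is a minimal sketch, assuming an exact two-sided binomial test against an equal expectation for the two alleles. It is not the AlleleSeq implementation: the function names, the `alpha` cut-off and the `min_depth` filter are hypothetical, and a real analysis would additionally control the false discovery rate across all tested sites.

```python
from math import comb


def binomial_two_sided_p(ref_reads: int, alt_reads: int, p_ref: float = 0.5) -> float:
    """Exact two-sided binomial P-value for allelic read counts at one site.

    Null hypothesis (an assumption of this sketch): reads come from the
    reference allele with probability p_ref.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0
    # Probability of every possible reference-read count under the null.
    pmf = [comb(n, k) * p_ref**k * (1 - p_ref) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Sum all outcomes at least as unlikely as the observed one
    # (small relative tolerance so that symmetric ties are included).
    return min(1.0, sum(p for p in pmf if p <= observed * (1 + 1e-9)))


def is_asymmetric(ref_reads: int, alt_reads: int,
                  alpha: float = 0.05, min_depth: int = 5) -> bool:
    """Flag a heterozygous site as asymmetrically bound (hypothetical cut-offs)."""
    if ref_reads + alt_reads < min_depth:
        return False  # too few reads to test
    return binomial_two_sided_p(ref_reads, alt_reads) < alpha
```

For example, `is_asymmetric(20, 2)` flags a site with 20 reference and 2 alternate reads, whereas a balanced site such as `(5, 5)` is not flagged.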

We report a total of 43,332 VDR-BVs. Of these, only 6.6–10.5% (2,867 or 4,447) lie within (or 'hit') RXR::VDR consensus motifs (Pscanchip score > 0.7 or > 0.5, respectively). This could be explained in part by variants that alter DNA-binding affinity either for other subunits of a larger multi-molecular complex ('collaborative binding' (27)) or for factors that inhibit RXR::VDR binding.

Consequently, we next considered whether sequence variation in motifs for potential binding cofactors of RXR::VDR could explain the remaining > 90% of binding variation. We found that

[Figure 2 image not reproduced. Panel A: motif probability against position of best site in sequence. Panel B: region-gene associations (%) against distance to nearest TSS (kb) for VDR binding regions (CPo3 - ALL, CPo3 - RXR::VDR class I, CPo3 - RXR::VDR class II). Panel C (class I): Biological Process and Disease Ontology terms. Panel D (class II): Biological Process terms. Panels C and D plot fold change and -log10(p).]

Figure 2. VDR binding intervals are immune-associated or are housekeeping-associated, tending to be either present far from, or adjacent to, transcriptional start sites, respectively. (A) The most highly ranking motif cluster ensuing from a de novo motif analysis of the VDR CPo3 consensus intervals using MEME-ChIP (65), including a CentriMo (80) analysis of enrichment with respect to VDR peak centres. (B–D) GREAT analyses (81) for subsets of VDR peaks from the CPo3 consensus set containing either a strong-to-perfect (class I) or weak-to-non-existent (class II) instance of the RXR::VDR heterodimer motif. (B) Distributions of distances between known TSS sites (UCSC hg19, Feb 2009) and VDR binding locations for classes I or II or all VDR peaks. (C, D) Enrichment for Gene Ontology Biological Process and Disease Ontology terms for class I (C) or class II (D) RXR::VDR containing sites, respectively. For (C) and (D), bars map to fold change whereas Hinton plots map to -log10 of the enrichment significance P-value.


position weight matrices for 26 additional TFs (Supplementary Material, Tables S15 and S16) were significantly enriched (P < 0.01) (i) relative to both open chromatin genome-wide and chromosomally adjacent regions (< 150 bp) and/or (ii) at the peak summits within VDR-BV intervals (Pscanchip score > 0.5 or 0.7; Supplementary Methods). Virtually all (84.1–88.3%) of binding variation could be explained by VDR-BVs lying within a RXR::VDR consensus motif or within one or more of these 26 additional motifs (Pscanchip score > 0.7 or > 0.5). These are upper-bound values because of the inaccuracy of motif prediction and because not all variants lying in motifs will alter binding affinity.

Functional annotation of VDR-BVs

To begin to understand whether VDR-BVs could contribute to particular human diseases, we first considered their enrichment in relevant genomic annotations. We were interested in whether these variably-bound sites are enriched relative to a background of all VDR binding regions genome-wide, so that the non-uniform chromosomal distribution of binding sites is accounted for. We chose for our null expectation all variants that were tested for differential binding within replicated (CPo3) VDR binding peaks (Supplementary Material). Relative to this stringent background, VDR-BVs are 20–40% enriched within enhancers, but not within promoters (Fig. 3A). These patterns of enrichment are reinforced when we stringently select only the most significant VDR-BVs (VDR-sBVs; Supplementary Material, Fig. S16A; AlleleSeq FDR 0.01; 6,715 VDR-BVs) or the 357 recurrent VDR-BVs that are called at the same position in multiple samples (VDR-rBVs; Supplementary Material, Fig. S16D). We conclude that VDR-BVs are not uniformly distributed among all VDR binding genomic intervals and are especially frequent in enhancer regions manifested in LCLs.

VDR-BVs occurred 60% more frequently in RXR::VDR motifs within replicated (CPo3) VDR binding peaks, and 120% more frequently in strong (class I) RXR::VDR motifs in the same regions (Fig. 3B). Again, these enrichments strengthened substantially for VDR-sBVs or VDR-rBVs (Supplementary Material, Fig. S16B and E, respectively).

We then considered whether VDR-BVs are significantly enriched within LCL expression QTLs (eQTLs) relative to 1000 Genomes variants under VDR ChIP-exo pileups tested for VDR binding variation potential (Supplementary Material). In addition, these background variants were matched by derived allele frequency and LD-dependence confounders (Methods). We also took care to only consider variants that do not share strong linkage disequilibrium (32,183 VDR-BVs, 5,808 VDR-sBVs and 341 VDR-rBVs).

VDR-BVs were enriched with dsQTLs (1.4-fold) and eQTLs (1.1–1.2-fold; Fig. 3C), particularly for the more stringent VDR-BV subsets (Supplementary Material, Fig. S16C and F). The enrichment in eQTLs was confirmed using an independent eQTL resource (GEUVADIS LCL eQTLs (28); Supplementary Material).

Complex trait-associated variants influence VDR binding

To investigate whether susceptibility to specific diseases could be influenced by VDR-BVs, we considered their locations

[Figure 3 image not reproduced. Panels A–C: bar plots of log2(fc) with 'observed [expected]' overlap counts beside each bar; RXR::VDR motif categories are labelled in panel B.]

Figure 3. Genomic association testing of VDR-BVs with functional annotation. VDR-BVs are enriched at enhancers, in RXR::VDR DR3-type motifs and in LCL eQTLs. (A) Enrichment of VDR-BVs within ENCODE chromatin state segmentation tracks for LCL NA12878, TF-dense regions (HOT (17) and ENCODE clustered TF binding sites) and DNase hypersensitive areas (ENCODE). (B) Enrichment of VDR-BVs in VDR consensus motif intervals present in VDR CPo3 binding regions. (C) Enrichment of VDR-BVs at DNase-QTLs and eQTLs from the Pritchard resource and CEU/YRI LCL eQTLs from the GEUVADIS resource. Significant enrichments are indicated using blue histogram bars (fc = fold change of observed versus expected overlaps; FDR q < 0.1; numbers in a box close to each bar indicate 'observed [expected]' overlap counts; light grey bars indicate lack of significance). For A, enrichments shown are above-and-beyond the previously observed (Figure 1E) enrichments of VDR CPo3 peaks in the same functional annotation classes (relative to a background of 20,330 1000 Genomes SNPs in CPo3 VDR binding regions). For B, enrichments are relative to a background of 114,155 1000 Genomes variants lying under VDR ChIP-exo read pileups ( 5 reads), which had been tested as potential VDR binding affinity modifiers. This background was further corrected for the analyses in C, where only LD-independent foreground VDR-BVs were tested and 10,000 DAF-matched random background sets were extracted with replacement from the main set of 114,155 background variants.


relative to GWAS variants significantly associated with 472 diverse diseases or traits from 688 studies in the largest catalogue available (GRASP v2.0 (29)), with at least 5 associated genome-wide significant SNPs (P < 5 × 10⁻⁸) (Supplementary Material). All traits available in these catalogues were considered so as not to bias subsequent findings. Again, we employed stringent procedures, matching variants for allele frequency and LD-dependence and discarding VDR-BVs in the MHC region, as well as again comparing against a background of all VDR-binding regions ('Test 3', p. 34 of the Supplementary Material).

VDR-BVs were significantly, and up to 2-fold, enriched within LD intervals associated with 17 diseases or traits (Fig. 4). It is notable that these are not drawn randomly from all 472 traits considered, but are mostly immune- or inflammatory-related disorders, such as sarcoidosis, Graves' disease, Crohn's disease, irritable bowel syndrome and type I diabetes (Fig. 4A). The most enriched trait is Alzheimer's disease, for which 25-hydroxyvitamin D level is a proposed causal risk factor (30).

For the 341 VDR-rBVs, intervals from six disorders contained more VDR-rBVs than expected from sets of DAF-matched, LD-accounted regions that bind VDR (Fig. 4B). We emphasise that this represents a highly stringent analysis that accounts for all known confounding effects, namely potential biases from called VDR-binding sites, population stratification and physical linkage.

Functional impact of genetic variation on loss or gain of VDR binding

This evidence is consistent with altered RXR::VDR binding contributing to immunity-related diseases. Nevertheless, because variants in TFBS often fail to alter binding affinity and/or have no measurable effect on gene expression levels (31,32), we needed to demonstrate that VDR-BVs are functional, specifically that they influence gene expression and have been negatively selected over evolution (33). As expected if they are enriched in functional variants, we found that 50–100% more VDR-BVs than random samples lie near (< 10 kb) genes that are differentially expressed in LCLs upon addition of calcitriol (14) (P < 10⁻⁴; Supplementary Methods).

Also as expected if, as a class, they are functional, replicated VDR-binding peaks exhibit variable cross-species conservation (34) across RXR::VDR motif positions (Supplementary Material, Fig. S17A and B). This signature of uneven selection across the DR3 motif is evident for class I VDR binding sites (Supplementary Material, Fig. S17C and D; Kruskal-Wallis test,

[Figure 4 image not reproduced. Panels A and B: bar plots of log2(fc) per trait with 'observed [expected]' overlap counts; the sizes of squares encode -log10(q). Legible trait labels include irritable bowel syndrome and Crohn's disease.]

Figure 4. VDR-BVs are enriched in GWAS LD blocks for autoimmune, inflammatory and other diseases. (A, B) Traits and diseases showing enrichment of VDR-BVs (A) or VDR-rBVs (B). Disease/trait associations were acquired from the GRASP catalog v2.0 (29). Statistical association was computed between VDR-BVs and strong-LD (Supplementary Material) intervals around genome-wide significant (P < 5 × 10⁻⁸) disease tag-SNPs from the GRASP catalog. All available GRASP diseases and traits represented by at least 5 SNPs were analysed, and those showing significant enrichment of VDR binding beyond a 1 FDR threshold were retained; disease associations supported by only 1 VDR-BV were discarded (full data are presented in the Supplementary Material). The sizes of black squares indicate the statistical significance of enrichments.


P = 1.9 × 10⁻¹¹ and post-hoc pairwise Kruskal-Wallis comparisons) but not in class II sites (Kruskal-Wallis test, P = 0.14), consistent with the DR3 heterodimer-binding motif regulating transcription.

Next, we assessed the impact on the VDR binding affinity of VDR-BVs with respect to their location within the consensus DR3-type motif (JASPAR database; Fig. 5), the transcriptionally active binding site motif for the RXR::VDR heterodimer (35–37). Mapping of allele-specific ChIP-exo reads to multiply replicated VDR binding regions (CPo3 peaks) was highly predictive of whether variants substantially alter ('break') functional motifs. For 89% of observations, the change in the strength of the VDR binding motif correctly predicts the direction of VDR binding affinity change (Fig. 5A), a proportion that rises to 100% in class I VDR binding sites (Supplementary Material, Fig. S18A). Motif breaks do not favour 5' RXR over 3' VDR hexamers (Fig. 5A; Wilcoxon rank sum test, P = 0.71) but especially impact conserved G and C nucleotides (positions 2, 5, 11 and 14) and a T nucleotide (position 13). Notably, fewer events occur at the near-essential T nucleotides at positions 4 and 12. Interestingly, in class I sites there are no VDR-BVs disrupting the T nucleotide at position 12 (Supplementary Material, Fig. S18A), which could reflect either low mutation rates (mutational 'cold spots') or strong purifying selection against deleterious nucleotide substitutions at these binding sites (38).
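The concordance measure described above can be sketched as follows: score the reference and alternate alleles against a position frequency matrix and check whether the sign of the motif score change matches the sign of the observed binding change. This is a minimal sketch under stated assumptions, not the FunSeq2/TFM-Pvalue procedure used in the study: the log-odds scoring, the uniform background, the pseudocount and all names are hypothetical, and no significance threshold for a motif 'break' is applied.

```python
from math import log2


def log_odds_score(seq, pfm, background=0.25, pseudocount=0.01):
    """Sum of per-position log2 odds of `seq` under a position frequency matrix.

    `pfm` is a list with one dict per motif position, mapping base -> count.
    """
    if len(seq) != len(pfm):
        raise ValueError("sequence and motif must have the same length")
    score = 0.0
    for base, column in zip(seq, pfm):
        total = sum(column.values()) + 4 * pseudocount
        freq = (column.get(base, 0.0) + pseudocount) / total
        score += log2(freq / background)
    return score


def motif_change(ref_seq, alt_seq, pfm):
    """Positive when the alternate allele strengthens the motif match."""
    return log_odds_score(alt_seq, pfm) - log_odds_score(ref_seq, pfm)


def concordance(records, pfm):
    """Fraction of variants whose motif change and binding change agree in sign.

    `records` holds (ref_seq, alt_seq, binding_change) tuples, where
    binding_change is e.g. a log2 read-depth ratio of alt versus ref.
    """
    agree = total = 0
    for ref_seq, alt_seq, binding_change in records:
        delta = motif_change(ref_seq, alt_seq, pfm)
        if delta == 0 or binding_change == 0:
            continue  # no direction to compare
        total += 1
        agree += (delta > 0) == (binding_change > 0)
    return agree / total if total else float("nan")
```

With a toy three-position motif in which the first two of three variants change binding in the predicted direction, `concordance` returns 2/3.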

To distinguish these possibilities, we next compared the population frequencies of alleles associated with either historical Loss Of Binding (LOB) or Gain of Binding (GOB) affinity to VDR (Methods). LOB VDR-BVs cause significantly stronger effects when within either 5' or 3' hexamer than within the DR3

[Figure 5 image not reproduced. Panel A: VDR-BV - concordance of variant genotype and binding affinity phenotype (motif break events per motif position 1–15; genotype and phenotype concordant or discordant). Panel B: RXR::VDR motif logo (bits; 5' hexamer - RXR, DR3 spacer, 3' hexamer - VDR). Panel C: VDR-BV (LOB) - impact on binding affinity. Panel D: VDR-BV (GOB) - impact on binding affinity. Panels C and D plot the magnitude of VDR binding affinity change against motif position 1–15; legend entries: no asym binding, LOB, LOB + Motif Break, GOB.]

Figure 5. Genetic variation modulating VDR binding affinity in LCLs within the canonical RXR::VDR heterodimeric binding motif. The figure shows a genome-wide quantification of the effect of genetic variation on VDR binding, based on the analysis of VDR-BVs in CPo3 binding peaks which hit the canonical RXR::VDR DR3 heterodimeric consensus motif (motif shown in B; JASPAR database (19)). (A) Distribution across the RXR::VDR motif of VDR-BVs which cause significant motif disruption (significance assessed via FunSeq2 using TFMPvalue (82) with default threshold P < 4 × 10⁻⁸) and quantification of the concordance between directionality of VDR PWM disruption and directionality of resulting VDR binding affinity variation. 102/115 (89%) VDR-BVs in CPo3 VDR peaks that significantly break the RXR::VDR motif predict the direction of VDR binding affinity change. (C, D) Genome-wide quantification of the phenotypic effect of all VDR-BVs intersecting the RXR::VDR motif (including those which do not generate a motif break at the above significance level). The vertical axis (Magnitude of VDR binding affinity change) indicates the fold change of read depth of the ancestral versus the derived allele (C) or derived versus the ancestral allele (D). (C) Impact on VDR binding affinity of Loss of Binding (LOB, orange dots) VDR-BVs and Loss of Binding VDR-BVs which test for significant RXR::VDR motif break (red dots). (D) Impact on VDR binding affinity of Gain of Binding (GOB) VDR-BVs (blue dots). Nucleotide position has a significant influence on LOB (Kruskal-Wallis rank sum test, χ²(14) = 33.385, P < 0.01) but not on GOB (χ²(12) = 17.68, P = 0.12) phenotype magnitude. (C, D) Grey dots indicate impact on binding affinity of those 1000 Genomes variants carried by the LCL samples which do not test for significant asymmetric binding (AlleleSeq (23), FDR threshold = 0.1) and are not therefore VDR-BVs.


spacer sequence (Wilcoxon rank sum test, P < 0.01; GOB variants: P = 0.11). VDR-BVs affecting one of the most highly conserved motif positions (position 13) always result in a motif-breaking LOB effect (Fig. 5C and D; Supplementary Material, Fig. S18C and D).

We next found that RXR::VDR motif sequences have been under constraint during recent evolution within the extant human population. We did so by comparing the frequencies of derived alleles (Derived Allele Frequency [DAF] test) for LOB or GOB VDR-BVs in either Northern and Western European (CEU) or Yoruban (YRI) populations. In the absence of demographic confounders, stronger purifying selection is implied by an excess of low population frequency variants in one set of variants over another (39).
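The DAF comparison can be illustrated with a small rank-based test. The following is a minimal sketch, assuming a two-sided Wilcoxon rank-sum (Mann-Whitney U) test with a normal approximation and tie correction; the function name and the example allele frequencies are hypothetical, and the study's own test settings are described in its Methods and Supplementary Material rather than reproduced here.

```python
from math import erfc, sqrt


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).

    Returns (U statistic for x, approximate P-value).
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j for tied values
        t = j - i
        tie_term += t**3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = (u - mean_u) / sqrt(var_u)
    return u, erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal


# Hypothetical derived allele frequencies: LOB variants skewed towards rarity.
lob_daf = [0.01, 0.02, 0.03, 0.05, 0.08]
gob_daf = [0.10, 0.22, 0.35, 0.41, 0.60]
u_stat, p_value = rank_sum_test(lob_daf, gob_daf)
```

A low U statistic for the LOB set, as in this toy input, corresponds to LOB derived alleles being systematically rarer than GOB derived alleles.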

We detected significant increases in DAF distribution for GOB over LOB VDR-BVs, both for CEU (Fig. 6C, top-left panel) and YRI (Fig. 6C, bottom-left panel) cohorts. Class I binding site VDR-BVs exhibited significant differences in GOB versus LOB DAF distributions (Supplementary Material, Fig. S19; P < 0.05 for both CEU and YRI). By contrast, variants in RXR::VDR hexamers not assigned as VDR-BVs showed no significant difference in GOB versus LOB DAF distributions (Fig. 6C, top-right and bottom-right panels). Significant differences between GOB and LOB DAFs were attained even when observing the RXR and VDR motif hexamers separately and in both sub-populations (P < 0.012). Consequently, LOB variants are under substantially stronger purifying selection than are GOB variants. This is an important finding because it indicates that reduced binding of VDR at these genomic positions, and presumably reduced vitamin D levels, are commonly deleterious.

Discussion

We have demonstrated how genetic variation alters the binding affinity of the vitamin D receptor (VDR), a member of the nuclear receptor family of transcription factors. Genetic variants significantly associated with immune and inflammatory disease were found to disrupt VDR binding and to be located preferentially within enhancer regions (Fig. 3), in line with a recent analysis of causal immune disease-related variants (6). The study benefited from the 5-fold greater spatial resolution and signal-to-noise ratios provided by the ChIP-exo assay.

In its best studied model of allosteric modulation, the RXR::VDR heterodimer binds to DNA and enhances

[Figure 6 image not reproduced. Four panels (A–D) of DAF frequency histograms for GOB versus LOB variants, grouped as 'VDR-BVs' and 'Variants not affecting VDR binding'; 'ns' marks non-significant comparisons.]

Figure 6. Evolutionary conservation of VDR-BVs at RXR::VDR consensus motifs. The diagrams show distributions of the Derived Allele Frequency (DAF) for variants in RXR::VDR consensus motifs within reproducible CPo3 binding peaks. DAF values are separated by ethnicity (A and C: CEU; B and D: YRI); variants are separated by their effect on VDR binding affinity (A and B: VDR-BVs; C and D: 1000 Genomes variants carried by the LCL samples which do not test for significant asymmetric binding and are not therefore VDR-BVs). Within each of the four panels, variants are split in two groups based on their effect on VDR binding affinity direction (whether GOB or LOB). For all quadrants, only DAFs for variants hitting hexamer positions (i.e. hitting either the RXR recognition element at positions 1–6 or the VDR recognition element at positions 10–15) in the RXR::VDR motif are shown. An asterisk indicates significance (non-parametric Wilcoxon rank sum tests, α = 0.05). A: Wilcoxon rank sum test, P = 2.8 × 10⁻⁴; B: Wilcoxon rank sum test, P = 4.6 × 10⁻⁵; C and D: P = 0.55 and P = 0.14, respectively. ns = not significant.


transcription via response elements that typically consist of two hexameric ((A/G)G(T/G)TCA) half-sites, with RXR and VDR DNA binding domains occupying the upstream and downstream half-site, respectively (40). This DR3 motif, which we previously identified using ChIP-seq data from two calcitriol-activated CEU LCLs (14), was also the predominant VDR binding interface in calcitriol-activated LCLs in the present ChIP-exo study. We identified strong enrichment of VDR binding within enhancers, with stronger enrichments associated with the more stringent subsets of VDR-BVs (Figs 1 and 3). Whilst VDR binding occurs preferentially within promoters, these interactions are only rarely associated with RXR::VDR DR3 motifs and may be mediated by additional DNA-binding co-factors, including CTCF, and might reflect unproductive binding to open chromatin. In contrast, those VDR binding sites containing strong instances of the RXR::VDR DR3 motif were particularly enriched in enhancers and also near to genes involved in immunity processes and related diseases (Fig. 2). VDR thus is expected to exert a substantial effect on the biology of the immune system as a sequence-specific enhancer regulator.
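The DR3 arrangement described above (two hexameric half-sites separated by a three-nucleotide spacer, 15 positions in total) can be written as a simple pattern. The following is a minimal sketch, assuming exact matches to the consensus half-site on the forward strand only; real binding sites are degenerate, which is why the study scores them with position weight matrices rather than a strict pattern. The names `HALF_SITE`, `DR3` and `find_dr3` are hypothetical.

```python
import re

# Consensus half-site (A/G)G(T/G)TCA as a character-class pattern.
HALF_SITE = "[AG]G[TG]TCA"

# DR3: half-site, 3-nucleotide spacer, half-site. The lookahead lets
# overlapping candidate sites be reported as well.
DR3 = re.compile(f"(?=({HALF_SITE}[ACGT]{{3}}{HALF_SITE}))")


def find_dr3(sequence: str):
    """Return (0-based start, matched 15-mer) for each exact DR3 consensus hit."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in DR3.finditer(seq)]


hits = find_dr3("ttAGGTCAcgtGGTTCAaa")
```

In this toy sequence the upstream half-site AGGTCA and the downstream half-site GGTTCA are separated by the spacer CGT, so one hit is reported at offset 2.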

Most remarkably, we found variants weakening or strengthening the RXR::VDR PWM score to be highly predictive of the loss or gain of VDR binding affinity (Fig. 5). Furthermore, the VDR-bound RXR::VDR motif shows evidence of levels of sequence conservation across vertebrate evolution that mirror its information content, and loss of VDR-binding variants over modern human evolution has occurred under stronger purifying selection than the gain of binding variants (Fig. 6). Together, these results imply that VDR-binding, and thus presumably vitamin D levels, have in general been protective against disease.

Our study will have underestimated the true extent of regulatory variation at VDR binding sites. This is because the asymmetric binding analysis (23), by definition, can only test variants which are heterozygous for a given LCL sample; therefore, true positive VDR-BVs will be unobserved if all LCL samples we considered are homozygous. Similarly, not all of the required three biallelic genotypes will have been present in these samples for the regression-on-genotype analysis (22,21). Consequently, a larger scale study would be expected to show improved power to detect VDR-BVs.

At least 90% of VDR-BVs lay outside of RXR::VDR DR3 motifs. The majority of variable binding could occur, as with other TFs (41), at weak or non-canonical binding motifs, or its affinity may alter because of genetic disruption to a cofactor binding site (42). Another explanation is that VDR-BVs commonly exert their effect by altering the DNA conformation of regions flanking the core binding site (43). Nevertheless, our de novo motif discovery identified several proteins as potential RXR::VDR cofactors whose altered DNA-binding affinity could influence VDR binding. Over-representation of motifs in VDR peaks has been observed before, for example in a recent study of VDR binding in monocytes and monocyte-derived inflammatory and tolerogenic dendritic cells (44). Direct interaction between VDR and CREB1 has been reported previously (45), as have functional interactions between vitamin D/VDR and oestrogen metabolism and MYC proteins (46,47). Co-occurrence of GABPA and ESRRB motifs with VDR-binding sites had been reported previously (48). ZNF423 is not known to bind VDR, yet it might do so indirectly because it physically associates with its heterodimer partner RXRα (49).

Our finding that VDR-BVs are significantly enriched within eQTLs (Fig. 3C) is intriguing because the eQTLs were inferred from LCLs not treated with calcitriol, and implies that these enhancers' activities may not be entirely vitamin D-dependent in these cells. Nevertheless, there was no clear concordance or discordance between the direction of change of VDR binding at VDR-BVs (in calcitriol-treated LCLs) and the direction of change of gene expression in calcitriol-minus human LCLs reported for the GEUVADIS eQTL variants (data not shown). This, and the lack of enrichment of the DR3 motif in the VDR binding sites for the basal (unstimulated) state in LCLs (14), indicate that a study of calcitriol-activated LCLs at a similar scale to those employing unstimulated LCLs (28) will be required to fully resolve this issue.

Our most important results derived from a conservative analysis of a stringent set of replicated binding variants (Fig. 4B). From this we report a significant excess of VDR-rBVs coincident, or in strong LD, with genome-wide significant GWAS tag variants for six disorders, including three that are autoimmune disorders (inflammatory bowel disease, Crohn's disease and rheumatoid arthritis), whilst a fourth, endometriosis, is frequently comorbid with autoimmune disorders (50). Deficiency of serum 25(OH)D levels is associated with cardiovascular disease risk factors in adults (10), while the association between vitamin D levels and coronary artery disease, as for many disorders, is debated (51). These results associate a molecular phenomenon, the genetic disruption of nuclear receptor binding, with a narrow set of immune-related syndromes and diseases.

Based on the combined functional, evolutionary and GWAS-based evidence, we propose that VDR-BVs represent good targets for subsequent experimental validation using, for example, genome editing in cells. It is also expected that only a small minority of genes bound by VDR will be altered in expression. A large-scale expression QTL study in calcitriol-activated LCLs will thus be required to further narrow down the set of candidate disease genes whose regulatory elements are variably bound by VDR and which are differentially expressed across the human population.

We have provided the first quantitative, association-based analysis to explain the genetic effect on binding affinity variation for a nuclear receptor and to propose a mechanistic model for disease susceptibility. Our results support the hypothesis that DNA variants altering transcription factor binding at enhancers contribute to complex disease aetiology and suggest that altered VDR binding, and by inference variable vitamin D levels, explain in part altered autoimmune and other complex disease risk.

Materials and Methods

VDR ChIP-exo sample preparation

Calcitriol stimulated lymphoblastoid cell lines (LCLs) were grown and prepared for ChIP as described previously, with some modifications (52), and adapted for ChIP-exo (15,16). LCLs are known to be a good model of primary B-cells (53), and we chose lines from HapMap, a unique resource whose genomes have already been sequenced. Briefly, cells were incubated in phenol red-free RPMI-1640, 10% charcoal-stripped FBS, 2 mM L-glutamine, penicillin with streptomycin solution (100 U/ml + 100 µg/ml) medium at 37 °C and 5% CO2. Cells were harvested after stimulation for 36 hours with 0.1 mM calcitriol (Sigma), cross-linked using a 1% formaldehyde buffer for 15 minutes at room temperature and quenched with 0.125 M glycine. Cells were lysed and chromatin sheared by sonication into fragments of 200–1000 bp. VDR-bound genomic DNA regions were isolated using a rabbit polyclonal antibody against VDR (Santa Cruz Biotechnology, sc-1008). Immunoprecipitated chromatin was then processed enzymatically on magnetic beads. Samples were polished, A-tailed, ligated to sequencing library adaptors and then digested with lambda exonuclease to remove nucleotides from 5' ends of double stranded DNA. Single-stranded DNA was eluted and converted to double-stranded DNA by primer annealing and extension. A second sequencing adaptor was ligated to exonuclease treated ends, PCR amplified, gel purified and sequenced. Libraries were prepared for Illumina as described in (16).

Data processing and peak calling

Full data processing steps are presented in the Supplementary Material. Briefly, we mapped the ChIP-exo sequence reads (Peconics Inc, USA; single-end, 40 bp) against the hg19 build of the human reference genome (54) using BWA (v0.7.4 (55)) followed by Stampy (v1.0.21 (56)). We filtered the reads to retain only uniquely mapping reads with MAPQ > 20 and removed reads mapping to empirical blacklist regions identified by the ENCODE consortium (24).
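The read filtering step can be sketched in plain Python as below. This is a minimal sketch, assuming reads have already been parsed into simple records and that the blacklist is held as per-chromosome lists of half-open intervals; the `Read` type and function names are hypothetical, the uniqueness criterion is represented only through the MAPQ cut-off, and a real pipeline would operate on BAM files with dedicated tools.

```python
from typing import Dict, List, NamedTuple, Tuple


class Read(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    mapq: int


Blacklist = Dict[str, List[Tuple[int, int]]]


def overlaps_blacklist(read: Read, blacklist: Blacklist) -> bool:
    """True if the read overlaps any blacklisted interval on its chromosome."""
    return any(start < read.end and read.start < end
               for start, end in blacklist.get(read.chrom, []))


def filter_reads(reads: List[Read], blacklist: Blacklist,
                 min_mapq: int = 20) -> List[Read]:
    """Keep reads with MAPQ strictly above min_mapq and outside the blacklist."""
    return [r for r in reads
            if r.mapq > min_mapq and not overlaps_blacklist(r, blacklist)]
```

A linear scan over blacklist intervals is adequate for illustration; an interval tree or sorted sweep would be the usual choice at genome scale.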

For peak calling, we first computed strand cross-correlation profiles (57) of read start densities to obtain consensus estimates of the ChIP-exo digested fragment sizes, which we corrected for the presence of phantom peaks (58) using a mappability-corrected approach to cross-correlation inference (MaSC (59)). We used the cross-correlation based estimates for the digested fragment size as a MACS2 (version 2.0.10 (60)) input parameter to obtain peak calls. Separately, to increase sensitivity, we called peaks using GPS/GEM (v2.4.1 (61)) with recommended ChIP-exo options: --smooth 3 --mrc 20. We then pooled, for each sample, the two sets of peak calls (Supplementary Material). Finally, we used DiffBind (62) to manipulate the per-sample merged peaksets and to obtain consensus peaksets (CPo3, CPo10, CPo20) based on an overlap threshold. To do this, we normalised the read numbers using DiffBind's embedded edgeR routines, based on the Trimmed Means of M's (TMM) algorithm (63).
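The idea behind strand cross-correlation can be sketched as follows: forward-strand and reverse-strand read starts flank each binding event, so the shift that maximises the correlation between the two strand profiles estimates the fragment size. This is a minimal sketch under stated assumptions, using a naive Pearson correlation on per-base read start counts for one region; it does not include the mappability correction of MaSC or any phantom-peak handling, and all names are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length numeric lists (0.0 if undefined)."""
    n = len(a)
    if n == 0:
        return 0.0
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def strand_cross_correlation(fwd_starts, rev_starts, length, max_shift):
    """Correlation between forward and reverse read start counts at each shift."""
    fwd = [0] * length
    rev = [0] * length
    for pos in fwd_starts:
        fwd[pos] += 1
    for pos in rev_starts:
        rev[pos] += 1
    # At shift d, forward position i is compared with reverse position i + d.
    return {d: pearson(fwd[:length - d], rev[d:]) for d in range(max_shift + 1)}


def estimate_fragment_size(profile):
    """Shift with the highest cross-correlation."""
    return max(profile, key=profile.get)
```

For a toy region in which every reverse-strand start lies 30 bp downstream of a forward-strand start, the profile peaks at a shift of 30.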

Interval overlap enrichment analysis

We used the Genomic Association Tester (64) to perform the randomisation-based interval enrichment analyses. All analyses were based on 10,000 randomisations over the chosen background, and backgrounds differed depending on the specific analysis performed. All analyses accounted for genome-wide patterns of GC content variability and, where relevant, for uniquely mappable regions (40 bp reads). For the enrichment analysis of VDR-BVs in GWAS intervals and QTLs, we used an in-house bootstrapping pipeline to perform LD-filtering (to retain only LD-independent foreground and background variants) and DAF matching. Further details on annotations, background and analytical design are provided in the Supplementary Material.
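The general idea of a randomisation-based enrichment test can be sketched as follows. This is a simplified illustration, not GAT itself and not the in-house pipeline: it treats variants as single positions on one chromosome, draws random sets uniformly from a supplied background, and ignores GC content, mappability, LD filtering and DAF matching. All function and variable names are assumptions made for this example.

```python
import random
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def count_overlaps(positions: Sequence[int], intervals: List[Interval]) -> int:
    """Count positions that fall inside at least one annotation interval."""
    return sum(any(s <= p < e for s, e in intervals) for p in positions)


def randomisation_enrichment(foreground: List[int], background: List[int],
                             intervals: List[Interval],
                             n_random: int = 10000, seed: int = 0):
    """Return (observed, expected, fold change, empirical P-value).

    Each randomisation draws as many background positions as there are
    foreground positions and counts how many overlap the annotation.
    """
    rng = random.Random(seed)
    observed = count_overlaps(foreground, intervals)
    simulated = [
        count_overlaps(rng.sample(background, len(foreground)), intervals)
        for _ in range(n_random)
    ]
    expected = sum(simulated) / n_random
    fold = observed / expected if expected > 0 else float("inf")
    # Add-one correction so the empirical P-value is never exactly zero.
    p_value = (1 + sum(s >= observed for s in simulated)) / (1 + n_random)
    return observed, expected, fold, p_value
```

The fold change returned here corresponds to the ratio of observed to expected overlaps; plotting it on a log2 scale is a presentation choice.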

Motif discovery

Full motif analysis steps are documented in the Supplementary Material. Briefly, we carried out de novo motif analysis using multiple parallel approaches: MEME-ChIP (65) from the MEME suite (66), XXmotif (67) and PScanChIP (20) from the Weeder/MoDtools suite (68). For the MEME-ChIP and XXmotif analyses, we used the peak intervals in CPo10 and extracted the sequence underlying each interval ± 150 bp from a repeat-masked version of the hg19 reference (54). For the PScanChIP analysis, we used as input the full set of CPo3 binding regions with Jaspar PWM descriptors, to which we added the best heterodimeric RXR::VDR PWM found by XXmotif and the best monomeric VDR PWM found by DREME (MEME-ChIP package). The global background chosen for the analysis consisted of built-in PScanChIP DNase I digital genomic footprinting data for the LCL CEPH individual NA12865. For the motif analysis at the 43,332 VDR-BVs, we used PScanChIP (20) with Jaspar PWM descriptors and a mixed background of the promoters and LCL DNase cut sites.

Differential binding analyses

We performed both allele-specific (VDR-ASB) and regression-on-genotype (VDR-QTL) differential binding analyses. Both methods start from analyses of read counts. For the VDR-QTL analysis, reads were normalised and covariate-corrected as detailed in the Supplementary Material.

For the VDR-ASB tests, we utilised the sequence composition of ChIP-exo sequence reads overlapping heterozygous SNPs to determine the sequences originating from each allele separately (23,69,70) and to identify allele-specific binding events showing a significant difference in the number of mapped reads between parental alleles. We employed a modified version of the AlleleSeq pipeline (23) to carry out the VDR-ASB analysis. Briefly, we tested for significant allelic imbalance among all read pile-ups intersecting a variant showing a heterozygous genotype, given a null hypothesis of 50% paternal versus 50% maternal reads. We followed (23) in controlling for two major sources of bias typically encountered when running an allele-specific analysis of short-read data: a bias due to mapping to the reference hg19 genome (71) and a bias due to unannotated copy number variants skewing the read counts for some of the SNPs being tested (72). In addition, we corrected for a third source of potentially erroneous ASB SNP calls by removing any significant allelic imbalance hypothesis falling under repeat regions contained in the ENCODE blacklist data (24).
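The 50:50 null hypothesis above can be illustrated with an exact two-sided binomial test on the read counts at one heterozygous site. This is a sketch under simplifying assumptions: it omits the reference-bias, copy-number and blacklist corrections described in the text, as well as any multiple-testing control, and the function name `allelic_imbalance_p` is hypothetical.

```python
from math import comb


def allelic_imbalance_p(paternal_reads: int, maternal_reads: int,
                        p_null: float = 0.5) -> float:
    """Exact two-sided binomial P-value for allelic imbalance at one site.

    Sums the probabilities of all outcomes that are no more likely than
    the observed split under the null proportion p_null.
    """
    n = paternal_reads + maternal_reads
    if n == 0:
        return 1.0  # no reads, no evidence of imbalance
    probs = [comb(n, k) * p_null ** k * (1 - p_null) ** (n - k)
             for k in range(n + 1)]
    observed = probs[paternal_reads]
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(pr for pr in probs if pr <= observed * (1 + 1e-9))
    return min(1.0, p_value)
```

For instance, a site covered by 10 reads that all carry the same allele gives a P-value of 2/1024, whereas an even 5:5 split gives 1.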

For the VDR-QTL tests, we analysed all 27 LCL samples simultaneously. We grouped the samples based on the genotypes of the underlying variant and carried out QTL association testing between the variant's genotype (under the assumption of an additive genetic model (73)) and the phenotype at that location (the VDR binding affinity, based on the quantile-normalised ChIP-exo peak size at that location). We used an in-house pipeline based on SNPs and indels imputed with IMPUTE2 (74) and Bayesian regression modelling based on SNPTEST (21) and BIMBAM (22). Bayesian methods for analysing SNP associations are now an established tool in GWAS analysis (75–77) and show advantages over the use of p-values in power and interpretation (73), though they require tighter initial modelling assumptions than frequentist methods. For details on the underlying model and priors, we refer the reader to the Supplementary document.
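To make the additive genetic model concrete, the sketch below regresses a binding phenotype on allele dosage (0, 1 or 2 copies of the alternative allele) by ordinary least squares. This deliberately swaps in a plain frequentist fit for illustration; it is not the Bayesian regression of SNPTEST or BIMBAM used in the study, it includes no priors, covariates or Bayes factors, and the function name is an assumption.

```python
from typing import Sequence, Tuple


def additive_fit(dosages: Sequence[float],
                 phenotypes: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of phenotype = intercept + slope * dosage.

    Under an additive model, the slope is the expected change in the
    phenotype per additional copy of the alternative allele.
    """
    n = len(dosages)
    if n != len(phenotypes) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_g = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("variant is monomorphic in these samples")
    sxy = sum((g - mean_g) * (y - mean_y)
              for g, y in zip(dosages, phenotypes))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_g
    return intercept, slope
```

Imputed genotypes are usually expressed as fractional dosages, which is why the function accepts floats rather than only the integers 0, 1 and 2.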

Variant annotation

We annotated all variants associated with differences in VDR binding, obtained by pooling the VDR-QTL and the VDR-ASB variants (VDR-Binding Variants, VDR-BVs), according to the guidelines in (25) and using a customised version of the Funseq2 suite of algorithms (26). We added VDR-centric information to Funseq2: VDR CPo3 binding regions, motif intervals from the PScanChIP analysis for the VDR::RXR Jaspar DR3 motif and for the DREME VDR monomer, and VDR dimer/monomer PWM information. Gene annotation used Funseq2's default (Gencode v16), and the HOT annotation in Funseq was filtered to retain only LCL HOT information. We ran Funseq using the -m 2 -nc options so that all annotation referred to the ancestral allele of the variant.

Availability of Data and Material

Sequencing data from this study have been submitted to the Gene Expression Omnibus archive (GEO; http://www.ncbi.nlm.nih.gov/geo/, accession number GSE73254).

Supplementary Material

Supplementary Material is available at HMG online.

Acknowledgements

We thank Mark Gerstein, Jieming Chen, Joel Rozowsky, Alexej Abyzov, Ekta Khurana and Yao Fu for precious help and technical insight on the AlleleSeq, CNVnator and Funseq2 platforms. We also thank Matthew Stephens and Yongtao Guan for help and insight on the PIMASS and BIMBAM platforms. We thank Gosia Trynka and Kamil Slowikowski for providing r2-based LD block data; Yuchun Guo and David Gifford for precious help and technical insight on the GEM peak caller; Gerton Lunter for help on the STAMPY platform; Andreas Heger for help and technical insight on the GAT software and genomic randomization testing; Rory Stark and Simon Anders for insight on differential analysis of ChIP-seq data; Laura Clarke for help with the 1000 Genomes data; and Daniel Gaffney for providing helpful pointers regarding the GEUVADIS QTL calls. We thank Li Shen and Ning-Yi Shao (Icahn School of Medicine at Mount Sinai) for insight into NGSplot utilisation, and Oscar Bedoya Reina and Christoffer Nellaker for helpful discussions.

Conflict of Interest statement. None declared.

Funding

Medical Research Council [to GG, AJBT, WH, CPP; MC_UU_120081, MC_UU_120211, MC_EX_G1000902]; Wellcome Trust and Genzyme [to SVR]; the Multiple Sclerosis Society UK [grant number 91509]; and the Council for Science and Technology (CONACyT), Mexico [grant number 211990 to AJBT]. Funding to pay the Open Access publication charges for this article was provided by the Research Councils UK (RCUK) open access fund.

References

1. Kasowski M, Kyriazopoulou-Panagiotopoulou S, Grubert F, Zaugg JB, Kundaje A, Liu Y, Boyle AP, Zhang QC, Zakharia F, Spacek DV, et al. (2013) Extensive variation in chromatin states across humans. Science, 342, 750–752.

2. McVicker G, van de Geijn B, Degner JF, Cain CE, Banovich NE, Raj A, Lewellen N, Myrthil M, Gilad Y and Pritchard JK (2013) Identification of Genetic Variants That Affect Histone Modifications in Human Cells. Science, 342, 747–749.

3. Kilpinen H, Waszak SM, Gschwind AR, Raghav SK, Witwicki RM, Orioli A, Migliavacca E, Wiederkehr M, Gutierrez-Arcelus M, Panousis NI, et al. (2013) Coordinated effects of sequence variation on DNA binding, chromatin structure and transcription. Science, 342, 744–747.

4. Maurano MT, Humbert R, Rynes E, Thurman RE, Haugen E, Wang H, Reynolds AP, Sandstrom R, Qu H, Brody J, et al. (2012) Systematic localization of common disease-associated variation in regulatory DNA. Science, 337, 1190–1195.

5. Gusev A, Lee SH, Trynka G, Finucane H, Vilhjalmsson BJ, Xu H, Zang C, Ripke S, Bulik-Sullivan B, Stahl E, et al. (2014) Partitioning heritability of regulatory and cell-type-specific variants across 11 common diseases. Am J Hum Genet, 95, 535–552.

6. Farh KK, Marson A, Zhu J, Kleinewietfeld M, Housley WJ, Beik S, Shoresh N, Whitton H, Ryan RJ, Shishkin AA, et al. (2015) Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature, 518, 337–343.

7. Heinz S, Romanoski CE, Benner C and Glass CK (2015) The selection and function of cell type-specific enhancers. Nat Rev Mol Cell Biol, 16, 144–154.

8. Arner E, Daub CO, Vitting-Seerup K, Andersson R, et al. (2015) Transcribed enhancers lead waves of coordinated transcription in transitioning mammalian cells. Science, 347, 1010–1014.

9. Holick MF (2007) Vitamin D Deficiency. N Engl J Med, 357, 266–281. PMID: 17634462.

10. Martins D, Wolf M, Pan D, et al. (2007) Prevalence of cardiovascular risk factors and the serum levels of 25-hydroxyvitamin D in the United States: data from the Third National Health and Nutrition Examination Survey. Arch Intern Med, 167, 1159–1165.

11. Mokry LE, Ross S, Ahmad OS, Forgetta V, Smith GD, Leong A, Greenwood CMT, Thanassoulis G and Richards JB (2015) Vitamin D and Risk of Multiple Sclerosis: A Mendelian Randomization Study. PLoS Med, 12, 1–20.

12. Afzal S, Brondum-Jacobsen P, Bojesen SE and Nordestgaard BG (2014) Genetically low vitamin D concentrations and increased mortality: mendelian randomisation analysis in three large cohorts. BMJ, 349.

13. Kliewer SA, Umesono K, Mangelsdorf DJ and Evans RM (1992) Retinoid X receptor interacts with nuclear receptors in retinoic acid, thyroid hormone and vitamin D3 signalling. Nature, 355, 446–449.

14. Ramagopalan SV, Heger A, Berlanga AJ, Maugeri NJ, Lincoln MR, Burrell A, Handunnetthi L, Handel AE, Disanto G, Orton SM, et al. (2010) A ChIP-seq defined genome-wide map of vitamin D receptor binding: Associations with disease and evolution. Genome Res, 20, 1352–1360.

15. Rhee HS and Pugh BF (2011) Comprehensive Genome-wide Protein-DNA Interactions Detected at Single-Nucleotide Resolution. Cell, 147, 1408–1419.

16. Rhee HS and Pugh BF (2012) ChIP-exo method for identifying genomic location of DNA-binding proteins with near-single-nucleotide accuracy. Curr Protoc Mol Biol, Chapter 21, Unit 21.24.

17. Yip K, Cheng C, Bhardwaj N, Brown J, Leng J, Kundaje A, Rozowsky J, Birney E, Bickel P, Snyder M and Gerstein M (2012) Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors. Genome Biol, 13, R48.

18. Carlberg C, Bendik I, Wyss A, Meier E, Sturzenbecker LJ, Grippo JF and Hunziker W (1993) Two nuclear signalling pathways for vitamin D. Nature, 361, 657–660.

19. Mathelier A, Zhao X, Zhang AW, Parcy F, Worsley-Hunt R, Arenillas DJ, Buchman S, Chen CY, Chou A, Ienasescu H, et al. (2014) JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res, 42, D142–D147.

20. Zambelli F, Pesole G and Pavesi G (2013) PscanChIP: Finding over-represented transcription factor-binding site motifs and their correlations in sequences from ChIP-Seq experiments. Nucleic Acids Res, 41, W535–W543.

21. Marchini J, Howie B, Myers S, McVean G and Donnelly P (2007) A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet, 39, 906–913.

22. Servin B and Stephens M (2007) Imputation-based analysis of association studies: candidate regions and quantitative traits. PLoS Genet, 3, e114.

23. Rozowsky J, Abyzov A, Wang J, Alves P, Raha D, Harmanci A, Leng J, Bjornson R, Kong Y, Kitabayashi N, et al. (2011) AlleleSeq: analysis of allele-specific expression and binding in a network framework. Mol Sys Biol, 7, 522.

24. Dunham I, Kundaje A, Aldred SF, Collins PJ, Davis CA, Doyle F, Epstein CB, Frietze S, Harrow J, Kaul R, et al. (2012) An integrated encyclopedia of DNA elements in the human genome. Nature, 489, 57–74.

25. Khurana E, Fu Y, Colonna V, Mu XJ, Kang HM, Lappalainen T, Sboner A, Lochovsky L, Chen J, Harmanci A, et al. (2013) Integrative annotation of variants from 1092 humans: application to cancer genomics. Science, 342, 1235587.

26. Fu Y, Liu Z, Lou S, Bedford J, Mu X, Yip KY, Khurana E and Gerstein M (2014) FunSeq2: A framework for prioritizing noncoding regulatory variants in cancer. Genome Biol, 15, 480.

27. Heinz S, Romanoski CE, Benner C, Allison KA, Kaikkonen MU, Orozco LD and Glass CK (2013) Effect of natural genetic variation on enhancer selection and function. Nature, 503, 487–492.

28. Lappalainen T, Sammeth M, Friedlander MR, 't Hoen PA, Monlong J, Rivas MA, Gonzalez-Porta M, Kurbatova N, Griebel T, Ferreira PG, et al. (2013) Transcriptome and genome sequencing uncovers functional variation in humans. Nature, 501, 506–511.

29. Eicher JD, Landowski C, Stackhouse B, Sloan A, Chen W, Jensen N, Lien JP, Leslie R and Johnson AD (2015) GRASP v2.0: an update on the genome-wide repository of associations between SNPs and phenotypes. Nucleic Acids Res, 43, 799–804.

30. Mokry LE, Ross S, Morris JA, Manousaki D, Forgetta V and Richards JB (2016) Genetically decreased vitamin D and risk of Alzheimer disease. Neurology, 87, 2567–2574.

31. Cusanovich DA, Pavlovic B, Pritchard JK and Gilad Y (2014) The functional consequences of variation in transcription factor binding. PLoS Genet, 10, e1004226.

32. Spivakov M (2014) Spurious transcription factor binding: non-functional or genetically redundant? Bioessays, 36, 798–806.

33. Villar D, Berthelot C, Aldridge S, Rayner TF, Lukk M, Pignatelli M, Park TJ, Deaville R, Erichsen JT, Jasinska AJ, et al. (2015) Enhancer evolution across 20 mammalian species. Cell, 160, 554–566.

34. Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al. (2005) Evolutionarily conserved elements in vertebrate, insect, worm and yeast genomes. Genome Res, 15, 1034–1050.

35. Jimenez-Lara AM and Aranda A (1999) The vitamin D receptor binds in a transcriptionally inactive form and without a defined polarity on a retinoic acid response element. FASEB J, 13, 1073–1081.

36. Toell A, Polly P and Carlberg C (2000) All natural DR3-type vitamin D response elements show a similar functionality in vitro. Biochem J, 352 Pt 2, 301–309.

37. Zhang J, Chalmers MJ, Stayrook KR, Burris LL, Wang Y, Busby SA, Pascal BD, Garcia-Ordonez RD, Bruning JB, Istrate MA, et al. (2011) DNA binding alters coactivator interaction surfaces of the intact VDR-RXR complex. Nat Struct Mol Biol, 18, 556–563.

38. Drake JA, Bird C, Nemesh J, Thomas DJ, Newton-Cheh C, Reymond A, Excoffier L, Attar H, Antonarakis SE, Dermitzakis ET, et al. (2006) Conserved noncoding sequences are selectively constrained and not mutation cold spots. Nat Genet, 38, 223–227.

39. Fay JC, Wyckoff GJ and Wu CI (2001) Positive and negative selection on the human genome. Genetics, 158, 1227–1234.

40. Orlov I, Rochel N, Moras D and Klaholz BP (2012) Structure of the full human RXR/VDR nuclear receptor heterodimer complex with its DR3 target DNA. Embo J, 31, 291–300.

41. Deplancke B, Alpern D and Gardeux V (2016) The Genetics of Transcription Factor DNA Binding Variation. Cell, 166, 538–554.

42. Reddy TE, Gertz J, Pauli F, Kucera KS, Varley KE, Newberry KM, Marinov GK, Mortazavi A, Williams BA, Song L, et al. (2012) Effects of sequence variation on differential allelic transcription factor occupancy and gene expression. Genome Res, 22, 860–869.

43. Levo M, Zalckvar E, Sharon E, Dantas Machado AC, Kalma Y, Lotam-Pompan M, Weinberger A, Yakhini Z, Rohs R and Segal E (2015) Unraveling determinants of transcription factor binding outside the core binding site. Genome Res, 25, 1018–1029.

44. Booth DR, Ding N, Parnell GP, Shahijanian F, Coulter S, Schibeci SD, Atkins AR, Stewart GJ, Evans RM, Downes M, et al. (2016) Cistromic and genetic evidence that the vitamin D receptor mediates susceptibility to latitude-dependent autoimmune diseases. Genes Immun, 17, 213–219.

45. Nakane M, Ma J, Ruan X, Kroeger PE and Wu-Wong R (2007) Mechanistic analysis of VDR-mediated renin suppression. Nephron Physiol, 107, 35–44.

46. Salehi-Tabar R, Nguyen-Yamamoto L, Tavera-Mendoza LE, Quail T, Dimitrov V, An BS, Glass L, Goltzman D and White JH (2012) Vitamin D receptor as a master regulator of the c-MYC/MXD1 network. Proc Natl Acad Sci USA, 109, 18827–18832.

47. Capellino S, Straub RH and Cutolo M (2014) Aromatase and regulation of the estrogen-to-androgen ratio in synovial tissue inflammation: common pathway in both sexes. Ann N Y Acad Sci, 1317, 24–31.

48. Tuoresmaki P, Vaisanen S, Neme A, Heikkinen S and Carlberg C (2014) Patterns of genome-wide VDR locations. PLoS One, 9, e96105.

49. Huang S, Laoukili J, Epping MT, Koster J, Holzel M, Westerman BA, Nijkamp W, Hata A, Asgharzadeh S, Seeger RC, et al. (2009) ZNF423 Is Critically required for retinoic acid-induced differentiation and is a marker of neuroblastoma outcome. Cancer Cell, 15, 328–340.

50. Matarese G, De Placido G, Nikas Y and Alviggi C (2003) Pathogenesis of endometriosis: natural immunity dysfunction or autoimmune disease? Trends Mol Med, 9, 223–228.


51. Kunadian V, Ford GA, Bawamia B, Qiu W and Manson JE (2014) Vitamin D deficiency and coronary artery disease: A review of the evidence. Am Heart J, 167, 283–291.

52. Ramagopalan SV, Maugeri NJ, Handunnetthi L, Lincoln MR, Orton SM, Dyment DA, Deluca GC, Herrera BM, Chao MJ, Sadovnick AD, et al. (2009) Expression of the multiple sclerosis-associated MHC class II Allele HLA-DRB1*1501 is regulated by vitamin D. PLoS Genet, 5, e1000369.

53. Caliskan M, Cusanovich DA, Ober C and Gilad Y (2011) The effects of EBV transformation on gene expression levels and methylation profiles. Hum Mol Genet, 20, 1643–1652.

54. Kuhn RM, Haussler D and Kent WJ (2013) The UCSC genome browser and associated tools. Brief Bioinformatics, 14, 144–161.

55. Li H and Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25, 1754–1760.

56. Lunter G and Goodson M (2011) Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res, 21, 936–939.

57. Kharchenko PV, Tolstorukov MY and Park PJ (2008) Design and analysis of ChIP-seq experiments for DNA-binding proteins. Nat Biotechnol, 26, 1351–1359.

58. Kundaje A, Jung LY, Kharchenko P, Wold B, Sidow A, Batzoglou S and Park P (2013) Assessment of ChIP-seq data quality using cross-correlation analysis.

59. Ramachandran P, Palidwor GA, Porter CJ and Perkins TJ (2013) MaSC: mappability-sensitive cross-correlation for estimating mean fragment length of single-end short-read sequencing data. Bioinformatics, 29, 444–450.

60. Feng J, Liu T, Qin B, Zhang Y and Liu XS (2012) Identifying ChIP-seq enrichment using MACS. Nat Protoc, 7, 1728–1740.

61. Guo Y, Mahony S and Gifford DK (2012) High Resolution Genome Wide Binding Event Finding and Motif Discovery Reveals Transcription Factor Spatial Binding Constraints. PLoS Comput Biol, 8, e1002638.

62. Stark R and Brown G (2011) DiffBind: differential binding analysis of ChIP-Seq peak data. R package.

63. Robinson M and Oshlack A (2010) A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol, 11, R25.

64. Heger A, Webber C, Goodson M, Ponting CP and Lunter G (2013) GAT: a simulation framework for testing the association of genomic intervals. Bioinformatics, 29, 2046–2048.

65. Ma W, Noble WS and Bailey TL (2014) Motif-based analysis of large nucleotide data sets using MEME-ChIP. Nat Protoc, 9, 1428–1450.

66. Bailey TL, Boden M, Buske FA, Frith M, Grant CE, Clementi L, Ren J, Li WW and Noble WS (2009) MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res, 37, W202–W208.

67. Hartmann H, Guthohrlein EW, Siebert M, Luehr S and Soding J (2013) P-value-based regulatory motif discovery using positional weight matrices. Genome Res, 23, 181–194.

68. Pavesi G, Mereghetti P, Zambelli F, Stefani M, Mauri G and Pesole G (2006) MoD Tools: regulatory motif discovery in nucleotide sequences from co-regulated or homologous genes. Nucleic Acids Res, 34, W566–W570.

69. McDaniell R, Lee BK, Song L, Liu Z, Boyle AP, Erdos MR, Scott LJ, Morken MA, Kucera KS, Battenhouse A, et al. (2010) Heritable Individual-Specific and Allele-Specific Chromatin Signatures in Humans. Science, 328, 235–239.

70. Pickrell JK, Marioni JC, Pai AA, Degner JF, et al. (2010) Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature, 464, 768–772.

71. Degner JF, Marioni JC, Pai AA, Pickrell JK, Nkadori E, Gilad Y and Pritchard JK (2009) Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data. Bioinformatics, 25, 3207–3212.

72. Pickrell JK, Gaffney DJ, Gilad Y and Pritchard JK (2011) False positive peaks in ChIP-seq and other sequencing-based functional assays caused by unannotated high copy number regions. Bioinformatics, 27, 2144–2146.

73. Stephens M and Balding DJ (2009) Bayesian statistical methods for genetic association studies. Nat Rev Genet, 10, 681–690.

74. Howie BN, Donnelly P and Marchini J (2009) A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet, 5, e1000529.

75. Scott LJ, Mohlke KL, Bonnycastle LL, Willer CJ, et al. (2007) A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science, 316, 1341–1345.

76. Burton PR, Clayton DG, Cardon LR, Craddock N, et al. (2007) Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature, 447, 661–678.

77. Wakefield J (2009) Bayes factors for genome-wide association studies: comparison with P-values. Genet Epidemiol, 33, 79–86.

78. Anders S, McCarthy DJ, Chen Y, Okoniewski M, Smyth GK, Huber W and Robinson MD (2013) Count-based differential expression analysis of RNA sequencing data using R and Bioconductor. Nat Protoc, 8, 1765–1786.

79. Ernst J, Kheradpour P, Mikkelsen TS, Shoresh N, Ward LD, Epstein CB, Zhang X, Wang L, Issner R, Coyne M, et al. (2011) Mapping and analysis of chromatin state dynamics in nine human cell types. Nature, 473, 43–49.

80. Bailey TL and Machanick P (2012) Inferring direct DNA binding from ChIP-seq. Nucleic Acids Res, 40, e128.

81. McLean CY, Bristor D, Hiller M, Clarke SL, Schaar BT, Lowe CB, Wenger AM and Bejerano G (2010) GREAT improves functional interpretation of cis-regulatory regions. Nat Biotechnol, 28, 495–501.

82. Touzet H and Varre JS (2007) Efficient and accurate P-value computation for position weight matrices. Algorithms Mol Biol, 2, 15.


position weight matrices for 26 additional TFs (Supplementary Material, Tables S15 and S16) were significantly enriched (P < 0.01) (i) relative to both open chromatin genome-wide and chromosomally adjacent regions (< 150 bp) and/or (ii) at the peak summits within VDR-BV intervals (Pscanchip score > 0.5 or 0.7; Supplementary Methods). Virtually all (84.1–88.3%) of binding variation could be explained by VDR-BVs lying within a RXR::VDR consensus motif or within one or more of these 26 additional motifs (Pscanchip score > 0.7 or > 0.5). These are upper-bound values because of the inaccuracy of motif prediction and because not all variants lying in motifs will alter binding affinity.

Functional annotation of VDR-BVs

To begin to understand whether VDR-BVs could contribute to particular human diseases, we first considered their enrichment in relevant genomic annotations. We were interested in whether these variably bound sites are enriched relative to a background of all VDR binding regions genome-wide, so that the non-uniform chromosomal distribution of binding sites is accounted for. We chose for our null expectation all variants that were tested for differential binding within replicated (CPo3) VDR binding peaks (Supplementary Material). Relative to this stringent background, VDR-BVs are 20–40% enriched within enhancers but not within promoters (Fig. 3A). These patterns of enrichment are reinforced when we stringently select only the most significant VDR-BVs (VDR-sBVs; Supplementary Material, Fig. S16A; AlleleSeq FDR 0.01; 6,715 VDR-BVs) or the 357 recurrent VDR-BVs that are called at the same position in multiple samples (VDR-rBVs; Supplementary Material, Fig. S16D). We conclude that VDR-BVs are not uniformly distributed among all VDR binding genomic intervals and are especially frequent in enhancer regions manifested in LCLs.

VDR-BVs occurred 60% more frequently in RXR::VDR motifs within replicated (CPo3) VDR binding peaks, and 120% more frequently in strong (class I) RXR::VDR motifs in the same regions (Fig. 3B). Again, these enrichments strengthened substantially for VDR-sBVs or VDR-rBVs (Supplementary Material, Fig. S16B and E, respectively).

We then considered whether VDR-BVs are significantly enriched within LCL expression QTLs (eQTLs) relative to 1000 Genomes variants under VDR ChIP-exo pileups tested for VDR binding variation potential (Supplementary Material). In addition, these background variants were matched for derived allele frequency and LD-dependence confounders (Methods). We also took care to consider only variants that do not share strong linkage disequilibrium (32,183 VDR-BVs, 5,808 VDR-sBVs and 341 VDR-rBVs).

VDR-BVs were enriched within dsQTLs (1.4-fold) and eQTLs (1.1–1.2-fold; Fig. 3C), particularly for the more stringent VDR-BV subsets (Supplementary Material, Fig. S16C and F). The enrichment in eQTLs was confirmed using an independent eQTL resource (GEUVADIS LCL eQTLs (28); Supplementary Material).

Complex trait-associated variants influence VDR binding

To investigate whether susceptibility to specific diseases could be influenced by VDR-BVs, we considered their locations relative to GWAS variants significantly associated with 472 diverse diseases or traits from 688 studies in the largest catalogue available (GRASP v2.0 (29)) with at least 5 associated genome-wide significant SNPs (P < 5 × 10⁻⁸) (Supplementary Material). All traits available in these catalogues were considered so as not to bias subsequent findings. Again, we employed stringent procedures, matching variants for allele frequency and LD-dependence and discarding VDR-BVs in the MHC region, as well as again comparing against a background of all VDR-binding regions ('Test 3', p. 34 of the Supplementary Material).

Figure 3. Genomic association testing of VDR-BVs with functional annotation. VDR-BVs are enriched at enhancers, in RXR::VDR DR3-type motifs and in LCL eQTLs. (A) Enrichment of VDR-BVs within ENCODE chromatin state segmentation tracks for LCL NA12878, TF-dense regions (HOT (17) and ENCODE clustered TF binding sites) and DNase hypersensitive areas (ENCODE). (B) Enrichment of VDR-BVs in VDR consensus motif intervals present in VDR CPo3 binding regions. (C) Enrichment of VDR-BVs at DNase-QTLs and eQTLs from the Pritchard resource and CEU/YRI LCL eQTLs from the GEUVADIS resource. Significant enrichments are indicated using blue histogram bars (fc = fold change of observed versus expected overlaps; FDR q < 0.1; numbers in a box close to each bar indicate 'observed [expected]' overlap counts; light grey bars indicate lack of significance). For A, enrichments shown are above and beyond the previously observed (Figure 1E) enrichments of VDR CPo3 peaks in the same functional annotation classes (relative to a background of 20,330 1000 Genomes SNPs in CPo3 VDR binding regions). For B, enrichments are relative to a background of 114,155 1000 Genomes variants lying under VDR ChIP-exo read pileups ( 5 reads) which had been tested as potential VDR binding affinity modifiers. This background was further corrected for the analyses in C, where only LD-independent foreground VDR-BVs were tested and 10,000 DAF-matched random background sets were extracted with replacement from the main set of 114,155 background variants.

VDR-BVs were significantly, and up to 2-fold, enriched within LD intervals associated with 17 diseases or traits (Fig. 4). It is notable that these are not drawn randomly from all 472 traits considered but are mostly immune- or inflammatory-related disorders such as sarcoidosis, Graves' disease, Crohn's disease, irritable bowel syndrome and type I diabetes (Fig. 4A). The most enriched trait is Alzheimer's disease, for which 25-hydroxyvitamin D level is a proposed causal risk factor (30).

For the 341 VDR-rBVs, intervals from six disorders contained more VDR-rBVs than expected from sets of DAF-matched, LD-accounted regions that bind VDR (Fig. 4B). We emphasise that this represents a highly stringent analysis that accounts for all known confounding effects, namely potential biases from called VDR-binding sites, population stratification and physical linkage.

Functional impact of genetic variation on loss or gain of VDR binding

This evidence is consistent with altered RXR::VDR binding contributing to immunity-related diseases. Nevertheless, because variants in TFBS often fail to alter binding affinity and/or have no measurable effect on gene expression levels (31,32), we needed to demonstrate that VDR-BVs are functional, specifically that they influence gene expression and have been negatively selected over evolution (33). As expected if they are enriched in functional variants, we found that 50–100% more VDR-BVs than random samples lie near (< 10 kb) genes that are differentially expressed in LCLs upon addition of calcitriol (14) (P < 10⁻⁴; Supplementary Methods).

Also as expected if, as a class, they are functional, replicated VDR-binding peaks exhibit variable cross-species conservation (34) across RXR::VDR motif positions (Supplementary Material, Fig. S17A and B). This signature of uneven selection across the DR3 motif is evident for class I VDR binding sites (Supplementary Material, Fig. S17C and D; Kruskal-Wallis test, P = 1.9 × 10⁻¹¹, and post-hoc pairwise Kruskal-Wallis comparisons) but not in class II sites (Kruskal-Wallis test, P = 0.14), consistent with the DR3 heterodimer-binding motif regulating transcription.

Figure 4. VDR-BVs are enriched in GWAS LD blocks for autoimmune, inflammatory and other diseases. (A,B) Traits and diseases showing enrichment of VDR-BVs (A) or VDR-rBVs (B). Disease/trait associations were acquired from the GRASP catalog v2.0 (29). Statistical association was computed between VDR-BVs and strong-LD (Supplementary Material) intervals around genome-wide significant (P < 5 × 10⁻⁸) disease tag-SNPs from the GRASP catalog. All available GRASP diseases and traits represented by at least 5 SNPs were analysed, and those showing significant enrichment of VDR binding beyond a 1% FDR threshold were retained; disease associations supported by only 1 VDR-BV were discarded (full data are presented in the Supplementary Material). The sizes of black squares indicate the statistical significance of enrichments.

Next, we assessed the impact on VDR binding affinity of VDR-BVs with respect to their location within the consensus DR3-type motif (JASPAR database; Fig. 5), the transcriptionally active binding site motif for the RXR::VDR heterodimer (35–37). Mapping of allele-specific ChIP-exo reads to multiply replicated VDR binding regions (CPo3 peaks) was highly predictive of whether variants substantially alter ('break') functional motifs. For 89% of observations, the change in the strength of the VDR binding motif correctly predicts the direction of VDR binding affinity change (Fig. 5A), a proportion that rises to 100% in class I VDR binding sites (Supplementary Material, Fig. S18A). Motif breaks do not favour 5′ RXR over 3′ VDR hexamers (Fig. 5A; Wilcoxon rank sum test, P = 0.71) but especially impact conserved G and C nucleotides (positions 2, 5, 11 and 14) and a T nucleotide (position 13). Notably, fewer events occur at the near-essential T nucleotides at positions 4 and 12. Interestingly, in class I sites there are no VDR-BVs disrupting the T nucleotide at position 12 (Supplementary Material, Fig. S18A), which could reflect either low mutation rates (mutational 'cold spots') or strong purifying selection against deleterious nucleotide substitutions at these binding sites (38).
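The notion of concordance between motif disruption and binding change can be sketched with a toy position weight matrix. The sketch scores the reference and alternative alleles with a log-odds PWM and asks whether the sign of the score change matches the sign of the observed binding change. The three-position count matrix, the uniform background, the pseudocount and the function names are all illustrative assumptions; no significance test (such as the TFMPvalue step applied through FunSeq2) is performed here.

```python
from math import log2
from typing import Dict, List

CountMatrix = List[Dict[str, float]]  # one dict of base counts per position


def log_odds_score(seq: str, counts: CountMatrix,
                   background: float = 0.25,
                   pseudocount: float = 0.01) -> float:
    """Sum of per-position log2 odds of the sequence under the motif."""
    if len(seq) != len(counts):
        raise ValueError("sequence and motif must have the same length")
    score = 0.0
    for base, column in zip(seq, counts):
        total = sum(column.values()) + 4 * pseudocount
        prob = (column.get(base, 0.0) + pseudocount) / total
        score += log2(prob / background)
    return score


def is_concordant(ref_seq: str, alt_seq: str, counts: CountMatrix,
                  binding_change_alt_vs_ref: float) -> bool:
    """True if motif score and binding change move in the same direction."""
    delta = log_odds_score(alt_seq, counts) - log_odds_score(ref_seq, counts)
    return delta * binding_change_alt_vs_ref > 0


# Toy three-position motif strongly preferring A-G-T (hypothetical counts).
toy_motif = [
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
]
```

With this toy motif, changing AGT to ACT lowers the score, so the variant counts as concordant only if binding to the alternative allele is also weaker.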

To distinguish these possibilities, we next compared the population frequencies of alleles associated with either historical Loss Of Binding (LOB) or Gain of Binding (GOB) affinity to VDR (Methods). LOB VDR-BVs cause significantly stronger effects when within either the 5′ or 3′ hexamer than within the DR3

[Figure 5, panels A-D. (A) VDR-BV: concordance of variant genotype and binding affinity phenotype (vertical axis: motif break events; legend: genotype and phenotype are concordant / discordant). (B) RXR:VDR motif logo (vertical axis: bits) showing the 5′ hexamer (RXR), the DR3 spacer and the 3′ hexamer (VDR). (C) VDR-BV (LOB): impact on binding affinity (legend: no asym. binding, LOB + motif break, LOB). (D) VDR-BV (GOB): impact on binding affinity (legend: no asym. binding, GOB). Horizontal axes: motif positions 1-15. Vertical axis in C and D: magnitude of VDR binding affinity change.]

Figure 5. Genetic variation modulating VDR binding affinity in LCLs within the canonical RXR:VDR heterodimeric binding motif. The figure shows a genome-wide quantification of the effect of genetic variation on VDR binding, based on the analysis of VDR-BVs in CPo3 binding peaks which hit the canonical RXR:VDR DR3 heterodimeric consensus motif (motif shown in B; JASPAR database (19)). (A) Distribution across the RXR:VDR motif of VDR-BVs which cause significant motif disruption (significance assessed via FunSeq2 using TFMPvalue (82) with default threshold P < 4 × 10⁻⁸) and quantification of the concordance between directionality of VDR PWM disruption and directionality of resulting VDR binding affinity variation. 102/115 (89%) VDR-BVs in CPo3 VDR peaks that significantly break the RXR:VDR motif predict the direction of VDR binding affinity change. (C, D) Genome-wide quantification of the phenotypic effect of all VDR-BVs intersecting the RXR:VDR motif (including those which do not generate a motif break at the above significance level). The vertical axis (Magnitude of VDR binding affinity change) indicates the fold change of read depth of the ancestral versus the derived allele (C) or derived versus the ancestral allele (D). (C) Impact on VDR binding affinity of Loss of Binding (LOB, orange dots) VDR-BVs and Loss of Binding VDR-BVs which test for significant RXR:VDR motif break (red dots). (D) Impact on VDR binding affinity of Gain of Binding (GOB) VDR-BVs (blue dots). Nucleotide position has a significant influence on LOB (Kruskal-Wallis rank sum test, χ²(14) = 33.385, P < 0.01) but not on GOB (χ²(12) = 17.68, P = 0.12) phenotype magnitude. (C, D) Grey dots indicate impact on binding affinity of those 1000 Genomes variants carried by the LCL samples which do not test for significant asymmetric binding (AlleleSeq (23), FDR threshold = 0.1) and are not therefore VDR-BVs.


spacer sequence (Wilcoxon rank sum test, P < 0.01; GOB variants, P = 0.11). VDR-BVs affecting one of the most highly conserved motif positions (position 13) always result in a motif-breaking LOB effect (Fig. 5C and D; Supplementary Material, Fig. S18C and D).

We next found that RXR:VDR motif sequences have been under constraint during recent evolution within the extant human population. We did so by comparing the frequencies of derived alleles (Derived Allele Frequency [DAF] test) for LOB or GOB VDR-BVs in either Northern and Western European (CEU) or Yoruban (YRI) populations. In the absence of demographic confounders, stronger purifying selection is implied by an excess of low population frequency variants in one set of variants over another (39).

We detected significant increases in DAF distribution for GOB over LOB VDR-BVs, both for CEU (Fig. 6C, top-left panel) and YRI (Fig. 6C, bottom-left panel) cohorts. Class I binding site VDR-BVs exhibited significant differences in GOB versus LOB DAF distributions (Supplementary Material, Fig. S19; P < 0.05 for both CEU and YRI). By contrast, variants in RXR:VDR hexamers not assigned as VDR-BVs showed no significant difference in GOB versus LOB DAF distributions (Fig. 6C, top-right and bottom-right panels). Significant differences between GOB and LOB DAFs were attained even when observing the RXR and VDR motif hexamers separately and in both sub-populations (P < 0.012). Consequently, LOB variants are under substantially stronger purifying selection than are GOB variants. This is an important finding because it indicates that reduced binding of VDR at these genomic positions, and presumably reduced vitamin D levels, are commonly deleterious.
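The DAF comparison described above amounts to a two-sample rank test on derived allele frequencies. The sketch below is one way such a comparison could be written, assuming a two-sided Wilcoxon rank sum (Mann-Whitney) test with a normal approximation and tie correction and no continuity correction; the function names are hypothetical, the DAF values in the usage note are invented, and the authors' actual implementation is not given in this section.

```python
import math


def rank_with_ties(values):
    """Return 1-based ranks, assigning the average rank to tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided rank sum test; returns (U statistic for x, z score, p-value).

    Assumes the pooled values are not all identical (variance would be zero).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = rank_with_ties(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Tie correction term: sum of (t^3 - t) over groups of tied values.
    tie_counts = {}
    for value in combined:
        tie_counts[value] = tie_counts.get(value, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u1, z, p_value
```

Called as `rank_sum_test(lob_dafs, gob_dafs)` on two lists of frequencies, a negative z score indicates that the first group (here LOB) is shifted towards lower derived allele frequencies, which is the pattern interpreted in the text as stronger purifying selection.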

Discussion

We have demonstrated how genetic variation alters the binding affinity of the vitamin D receptor (VDR), a member of the nuclear receptor family of transcription factors. Genetic variants significantly associated with immune and inflammatory disease were found to disrupt VDR binding and to be located preferentially within enhancer regions (Fig. 3), in line with a recent analysis of causal immune disease-related variants (6). The study benefited from the 5-fold greater spatial resolution and signal-to-noise ratios provided by the ChIP-exo assay.

In its best studied model of allosteric modulation, the RXR:VDR heterodimer binds to DNA and enhances

[Figure 6, panels A-D: back-to-back histograms (horizontal axis: Frequency) of derived allele frequency for GOB versus LOB variants, shown for VDR-BVs and for variants not affecting VDR binding (the latter marked ns).]

Figure 6. Evolutionary conservation of VDR-BVs at RXR:VDR consensus motifs. The diagrams show distributions of the Derived Allele Frequency (DAF) for variants in RXR:VDR consensus motifs within reproducible CPo3 binding peaks. DAF values are separated by ethnicity (A and C, CEU; B and D, YRI); variants are separated by their effect on VDR binding affinity (A and B, VDR-BVs; C and D, 1000 Genomes variants carried by the LCL samples which do not test for significant asymmetric binding and are not therefore VDR-BVs). Within each of the four panels, variants are split in two groups based on their effect on VDR binding affinity direction (whether GOB or LOB). For all quadrants, only DAFs for variants hitting hexamer positions (i.e. hitting either the RXR recognition element at positions 1–6 or the VDR recognition element at positions 10–15) in the RXR:VDR motif are shown. An asterisk indicates significance (non-parametric Wilcoxon rank sum tests, α = 0.05). A: Wilcoxon rank sum test, P = 2.8 × 10⁻⁴; B: Wilcoxon rank sum test, P = 4.6 × 10⁻⁵; C and D: P = 0.55 and P = 0.14, respectively. ns = not significant.


transcription via response elements that typically consist of two hexameric ((A/G)G(T/G)TCA) half-sites, with RXR and VDR DNA binding domains occupying the upstream and downstream half-site, respectively (40). This DR3 motif, which we previously identified using ChIP-seq data from two calcitriol-activated CEU LCLs (14), was also the predominant VDR binding interface in calcitriol-activated LCLs in the present ChIP-exo study. We identified strong enrichment of VDR binding within enhancers, with stronger enrichments associated with the more stringent subsets of VDR-BVs (Figs 1 and 3). Whilst VDR binding occurs preferentially within promoters, these interactions are only rarely associated with RXR:VDR DR3 motifs and may be mediated by additional DNA-binding co-factors, including CTCF, and might reflect unproductive binding to open chromatin. In contrast, those VDR binding sites containing strong instances of the RXR:VDR DR3 motif were particularly enriched in enhancers and also near to genes involved in immunity processes and related diseases (Fig. 2). VDR thus is expected to exert a substantial effect on the biology of the immune system as a sequence-specific enhancer regulator.

Most remarkably, we found variants weakening or strengthening the RXR:VDR PWM score to be highly predictive of the loss or gain of VDR binding affinity (Fig. 5). Furthermore, the VDR-bound RXR:VDR motif shows evidence of levels of sequence conservation across vertebrate evolution that mirror its information content, and loss-of-binding variants have been under stronger purifying selection over modern human evolution than gain-of-binding variants (Fig. 6). Together, these results imply that VDR binding, and thus presumably vitamin D levels, have in general been protective against disease.

Our study will have underestimated the true extent of regulatory variation at VDR binding sites. This is because the asymmetric binding analysis (23), by definition, can only test variants which are heterozygous for a given LCL sample; therefore true positive VDR-BVs will be unobserved if all LCL samples we considered are homozygous. Similarly, not all of the required three biallelic genotypes will have been present in these samples for the regression-on-genotype analysis (22,21). Consequently, a larger scale study would be expected to show improved power to detect VDR-BVs.

At least 90% of VDR-BVs lay outside of RXR:VDR DR3 motifs. The majority of variable binding could occur, as with other TFs (41), at weak or non-canonical binding motifs, or its affinity may alter because of genetic disruption to a cofactor binding site (42). Another explanation is that VDR-BVs commonly exert their effect by altering the DNA conformation of regions flanking the core binding site (43). Nevertheless, our de novo motif discovery identified several proteins as potential RXR:VDR cofactors whose altered DNA-binding affinity could influence VDR binding. Over-representation of motifs in VDR peaks has been observed before, for example in a recent study of VDR binding in monocytes and monocyte-derived inflammatory and tolerogenic dendritic cells (44). Direct interaction between VDR and CREB1 has been reported previously (45), as have functional interactions between vitamin D/VDR and oestrogen metabolism and MYC proteins (46,47). Co-occurrence of GABPA and ESRRB motifs with VDR-binding sites had been reported previously (48). ZNF423 is not known to bind VDR, yet it might do so indirectly because it physically associates with its heterodimer partner RXRα (49).

Our finding that VDR-BVs are significantly enriched within eQTLs (Fig. 3C) is intriguing because the eQTLs were inferred from LCLs not treated with calcitriol, which implies that these enhancers' activities may not be entirely vitamin D-dependent in these cells. Nevertheless, there was no clear concordance or discordance between the direction of change of VDR binding at VDR-BVs (in calcitriol-treated LCLs) and the direction of change of gene expression in calcitriol-minus human LCLs reported for the GEUVADIS eQTL variants (data not shown). This, and the lack of enrichment of the DR3 motif in the VDR binding sites for the basal (unstimulated) state in LCLs (14), indicate that a study of calcitriol-activated LCLs at a similar scale to those employing unstimulated LCLs (28) will be required to fully resolve this issue.

Our most important results derived from a conservative analysis of a stringent set of replicated binding variants (Fig. 4B). From this, we report a significant excess of VDR-rBVs coincident or in strong LD with genome-wide significant GWAS tag variants for six disorders, including three that are autoimmune disorders (inflammatory bowel disease, Crohn's disease and rheumatoid arthritis), whilst a fourth, endometriosis, is frequently comorbid with autoimmune disorders (50). Deficiency of serum 25(OH)D levels is associated with cardiovascular disease risk factors in adults (10), while the association between vitamin D levels and coronary artery disease, as for many disorders, is debated (51). These results associate a molecular phenomenon, the genetic disruption of nuclear receptor binding, with a narrow set of immune-related syndromes and diseases.

Based on the combined functional, evolutionary and GWAS-based evidence, we propose that VDR-BVs represent good targets for subsequent experimental validation using, for example, genome editing in cells. It is also expected that only a small minority of genes bound by VDR will be altered in expression. A large-scale expression QTL study in calcitriol-activated LCLs will thus be required to further narrow down the set of candidate disease genes whose regulatory elements are variably bound by VDR and which are differentially expressed across the human population.

We have provided the first quantitative association-based analysis to explain the genetic effect on binding affinity variation for a nuclear receptor and to propose a mechanistic model for disease susceptibility. Our results support the hypothesis that DNA variants altering transcription factor binding at enhancers contribute to complex disease aetiology, and suggest that altered VDR binding, and by inference variable vitamin D levels, explain in part altered autoimmune and other complex disease risk.

Materials and Methods

VDR ChIP-exo sample preparation

Calcitriol-stimulated lymphoblastoid cell lines (LCLs) were grown and prepared for ChIP as described previously, with some modifications (52), and adapted for ChIP-exo (15,16). LCLs are known to be a good model of primary B-cells (53), and we chose lines from HapMap, a unique resource whose genomes have already been sequenced. Briefly, cells were incubated in phenol red-free RPMI-1640, 10% charcoal-stripped FBS, 2 mM glutamine-L, penicillin with streptomycin solution (100 U/ml + 100 µg/ml) medium at 37°C and 5% CO2. Cells were harvested after stimulation for 36 hours with 0.1 mM calcitriol (Sigma), cross-linked using a 1% formaldehyde buffer for 15 minutes at room temperature and quenched with 0.125 M glycine. Cells were lysed and chromatin sheared by sonication into fragments of 200–1000 bp. VDR-bound genomic DNA regions were isolated using a rabbit polyclonal antibody against VDR (Santa Cruz Biotechnology, sc-1008). Immunoprecipitated chromatin was then processed enzymatically on magnetic beads. Samples were polished, A-tailed, ligated to sequencing library adaptors and then digested with lambda exonuclease to remove nucleotides from 5′ ends of double-stranded DNA. Single-stranded DNA was eluted and converted to double-stranded DNA by primer annealing and extension. A second sequencing adaptor was ligated to exonuclease-treated ends, and the product was PCR amplified, gel purified and sequenced. Libraries were prepared for Illumina as described in (16).

Data processing and peak calling

Full data processing steps are presented in the Supplementary Material. Briefly, we mapped the ChIP-exo sequence reads (Peconics Inc., USA; single-end, 40 bp) against the hg19 build of the human reference genome (54) using BWA (v0.7.4 (55)) followed by Stampy (v1.0.21 (56)). We filtered the reads to retain only uniquely mapping reads with MAPQ > 20 and removed reads mapping to empirical blacklist regions identified by the ENCODE consortium (24).
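The read filtering step can be sketched as follows. This is an illustrative outline under stated assumptions rather than the pipeline actually used: reads are represented as simple in-memory records instead of BAM entries, the `Read` type and `filter_reads` function are hypothetical names, and the separate check for uniquely mapping reads is not modelled.

```python
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal read record (half-open, 0-based coordinates)."""
    chrom: str
    start: int
    end: int
    mapq: int


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def filter_reads(reads, blacklist, min_mapq=20):
    """Keep reads with MAPQ strictly above min_mapq that avoid blacklist regions.

    blacklist maps a chromosome name to a list of (start, end) intervals.
    """
    kept = []
    for read in reads:
        if read.mapq <= min_mapq:
            continue
        regions = blacklist.get(read.chrom, [])
        if any(overlaps(read.start, read.end, s, e) for s, e in regions):
            continue
        kept.append(read)
    return kept
```

The linear scan over blacklist intervals keeps the sketch short; for genome-scale data an interval index or a dedicated tool would normally be preferable.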

For peak calling, we first computed strand cross-correlation profiles (57) of read start densities to obtain consensus estimates of the ChIP-exo digested fragment sizes, which we corrected for the presence of phantom peaks (58) using a mappability-corrected approach to cross-correlation inference (MaSC (59)). We used the cross-correlation based estimates for the digested fragment size as a MACS2 (version 2.0.10 (60)) input parameter to obtain peak calls. Separately, to increase sensitivity, we called peaks using GPS/GEM (v2.4.1 (61)) with recommended ChIP-exo options --smooth 3 --mrc 20. We then pooled, for each sample, the two sets of peak calls (Supplementary Material). Finally, we used DiffBind (62) to manipulate the per-sample merged peaksets and to obtain consensus peaksets (CPo3, CPo10, CPo20) based on an overlap threshold. To do this, we normalised the read numbers using DiffBind's embedded edgeR routines based on the Trimmed Mean of M-values (TMM) algorithm (63).
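The idea of a consensus peakset defined by an overlap threshold can be sketched as below. This is a simplified stand-in, not DiffBind's implementation: it merges overlapping peaks across samples and keeps merged regions supported by at least `min_overlap` distinct samples. Reading CPo3, CPo10 and CPo20 as thresholds of 3, 10 and 20 samples is an assumption, and the function name is hypothetical.

```python
def consensus_peaks(peaks_by_sample, min_overlap):
    """Merge overlapping peaks across samples and keep well-supported regions.

    peaks_by_sample maps a sample name to a list of (chrom, start, end)
    half-open intervals. Returns merged (chrom, start, end) regions that are
    supported by at least min_overlap distinct samples.
    """
    tagged = sorted(
        (chrom, start, end, sample)
        for sample, peaks in peaks_by_sample.items()
        for chrom, start, end in peaks
    )
    merged = []  # each entry: [chrom, start, end, set_of_supporting_samples]
    for chrom, start, end, sample in tagged:
        last = merged[-1] if merged else None
        if last is not None and last[0] == chrom and start < last[2]:
            last[2] = max(last[2], end)
            last[3].add(sample)
        else:
            merged.append([chrom, start, end, {sample}])
    return [
        (chrom, start, end)
        for chrom, start, end, samples in merged
        if len(samples) >= min_overlap
    ]
```

Because merging is transitive, a chain of partially overlapping peaks collapses into one region; a stricter definition (for example, requiring a shared core interval) would give smaller consensus sets.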

Interval overlap enrichment analysis

We used the Genomic Association Tester (64) to perform the randomization-based interval enrichment analyses. All analyses were based on 10,000 randomisations over the chosen background, and backgrounds differed depending on the specific analysis performed. All analyses accounted for genome-wide patterns of GC content variability and, where relevant, for uniquely mappable regions (40 bp reads). For the enrichment analysis of VDR-BVs in GWAS intervals and QTLs, we used an in-house bootstrapping pipeline to perform LD-filtering (to retain only LD-independent foreground and background variants) and DAF matching. Further details on annotations, background and analytical design are provided in the Supplementary Material.
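The logic of a randomization-based interval enrichment test can be sketched as follows. This is a deliberately simplified illustration, not the Genomic Association Tester: segments are repositioned uniformly within a single linear workspace, and the GC-content, mappability, LD and DAF corrections described above are omitted. All names and defaults are assumptions.

```python
import random


def count_overlapping(segments, annotation):
    """Count segments that overlap at least one annotation interval."""
    return sum(
        any(start < a_end and a_start < end for a_start, a_end in annotation)
        for start, end in segments
    )


def randomization_test(segments, annotation, workspace_len, n_iter=10000, seed=0):
    """Empirical one-sided enrichment test by random repositioning of segments.

    Returns (observed count, mean count under the null, empirical p-value).
    Coordinates are half-open and lie within [0, workspace_len).
    """
    rng = random.Random(seed)
    observed = count_overlapping(segments, annotation)
    null_counts = []
    for _ in range(n_iter):
        shuffled = []
        for start, end in segments:
            length = end - start
            new_start = rng.randrange(0, workspace_len - length + 1)
            shuffled.append((new_start, new_start + length))
        null_counts.append(count_overlapping(shuffled, annotation))
    expected = sum(null_counts) / n_iter
    # Add-one correction so the p-value is never exactly zero.
    p_value = (1 + sum(c >= observed for c in null_counts)) / (n_iter + 1)
    return observed, expected, p_value
```

With 10,000 randomisations, the smallest attainable empirical p-value under this scheme is 1/10,001, which bounds the resolution of any significance statement drawn from it.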

Motif discovery

Full motif analysis steps are documented in the Supplementary Material. Briefly, we carried out de novo motif analysis using multiple parallel approaches: MEME-ChIP (65) from the MEME suite (66), XXmotif (67) and PScanChIP (20) from the WeederMoDtools suite (68). For the MEME-ChIP and XXmotif analyses, we used the peak intervals in CPo10 and extracted the sequence underlying each interval ± 150 bp from a repeat-masked version of the hg19 reference (54). For the PScanChIP analysis, we used as the input the full set of CPo3 binding regions with Jaspar PWM descriptors, to which we added the best heterodimeric RXR:VDR PWM found by XXmotif and the best monomeric VDR PWM found by DREME (MEME-ChIP package). The global background chosen for the analysis consisted of built-in PScanChIP DNase I digital genomic footprinting data for the LCL CEPH individual NA12865. For the motif analysis at the 43,332 VDR-BVs, we used PScanChIP (20) with Jaspar PWM descriptors and a mixed background of the promoters and LCL DNase cut sites.

Differential binding analyses

We performed both allele-specific (VDR-ASB) and regression on genotype (VDR-QTL) differential binding analyses. Both methods start from analyses of read counts. For the VDR-QTL analysis, reads were normalised and covariate-corrected as detailed in the Supplementary Material.

For the VDR-ASB tests, we utilised the sequence composition of ChIP-exo sequence reads overlapping heterozygous SNPs to determine the sequences originating from each allele separately (23,69,70) and to identify allele-specific binding events showing a significant difference in the number of mapped reads between parental alleles. We employed a modified version of the AlleleSeq pipeline (23) to carry out the VDR-ASB analysis. Briefly, we tested for significant allelic imbalance among all read pile-ups intersecting a variant showing heterozygous genotype, given a null hypothesis of 50% paternal versus 50% maternal reads. We followed (23) in controlling for two major sources of bias typically encountered when running an allele-specific analysis of short read data: a bias due to mapping to the reference hg19 genome (71) and a bias due to unannotated copy number variants skewing the read counts for some of the SNPs being tested (72). In addition, we corrected for a third source of potential erroneous ASB SNP calls by removing any significant allelic imbalance hypothesis falling under repeat regions contained in the ENCODE blacklist data (24).
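The null hypothesis of equal paternal and maternal read counts corresponds to a binomial test with success probability 0.5. The sketch below shows an exact two-sided version of that test under this assumption; it is not the modified AlleleSeq pipeline, it omits the FDR control and the reference-mapping and copy-number bias corrections described above, and the function name is hypothetical.

```python
from math import comb


def allelic_imbalance_p(allele_a_reads, allele_b_reads):
    """Exact two-sided binomial p-value for a 50:50 allelic ratio.

    Because the null distribution is symmetric at p = 0.5, the two-sided
    p-value is twice the smaller tail, capped at 1.
    """
    n = allele_a_reads + allele_b_reads
    if n == 0:
        return 1.0
    k = min(allele_a_reads, allele_b_reads)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

A site with 20 reads on one allele and none on the other gives a p-value of 2^-19 under this test, whereas a balanced 10:10 site gives 1; in practice many such tests are run genome-wide, so multiple-testing control is essential.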

For the VDR-QTL tests, we analysed all 27 LCL samples simultaneously. We grouped the samples based on the genotypes of the underlying variant and carried out QTL association testing between the variant's genotype (under the assumption of an additive genetic model (73)) and the phenotype at that location (the VDR binding affinity, based on the quantile-normalised ChIP-exo peak size at that location). We used an in-house pipeline based on SNPs and indels imputed with IMPUTE2 (74) and Bayesian regression modelling based on SNPTEST (21) and BIMBAM (22). Bayesian methods for analysing SNP associations are now an established tool in GWAS analysis (75–77) and show advantages over the use of p-values in power and interpretation (73), though they require tighter initial modelling assumptions when compared to frequentist methods. For details on the underlying model and priors, we refer the reader to the Supplementary document.
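Under an additive genetic model, genotype is coded as the number of copies of one allele (0, 1 or 2) and the phenotype is regressed on that dosage. The sketch below uses ordinary least squares purely to illustrate the additive coding; it is a frequentist simplification and not the Bayesian SNPTEST/BIMBAM modelling used in the study. The function name is hypothetical and the values in the usage note are invented.

```python
def additive_model_fit(dosages, phenotypes):
    """Least-squares fit of phenotype = intercept + slope * allele dosage.

    dosages are allele counts (0, 1 or 2) per sample; phenotypes are, for
    example, normalised peak sizes. Returns (slope, intercept).
    """
    if len(dosages) != len(phenotypes) or len(dosages) < 2:
        raise ValueError("need two equally sized samples of length >= 2")
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    s_xx = sum((x - mean_x) ** 2 for x in dosages)
    if s_xx == 0:
        raise ValueError("all samples share one genotype; slope is undefined")
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotypes))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

For example, `additive_model_fit([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.0, 2.2, 3.0, 3.2])` describes a variant where each additional allele copy raises the phenotype by about one unit. The error raised when every sample shares a genotype mirrors the limitation noted in the Discussion: variants lacking genotype diversity in the sampled LCLs cannot be tested.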

Variant annotation

We annotated all variants associated with differences in VDR binding and obtained by pooling the VDR-QTL and the VDR-ASB variants (VDR-Binding Variants, VDR-BVs) according to the guidelines in (25) and using a customised version of the Funseq2 suite of algorithms (26). We added additional VDR-centric information to Funseq2: VDR CPo3 binding regions; motif intervals from the PScanChIP analysis for the VDR:RXR Jaspar DR3 motif and for the DREME VDR monomer; and VDR dimer/monomer PWM information. Gene annotation used Funseq2's default (Gencode v16), and the HOT annotation in Funseq was filtered to only retain LCL HOT information. We ran Funseq using the -m 2 -nc options so that all annotation referred to the ancestral allele of the variant.

Availability of Data and Material

Sequencing data from this study have been submitted to the Gene Expression Omnibus archive (GEO, http://www.ncbi.nlm.nih.gov/geo/, accession number GSE73254).

Supplementary Material

Supplementary Material is available at HMG online.

Acknowledgements

We thank Mark Gerstein, Jieming Chen, Joel Rozowsky, Alexej Abyzov, Ekta Khurana and Yao Fu for precious help and technical insight on the AlleleSeq, CNVnator and Funseq2 platforms. We also thank Matthew Stephens and Yongtao Guan for help and insight on the PIMASS and BIMBAM platforms. We thank Gosia Trynka and Kamil Slowikowksi for providing r2-based LD block data; Yuchun Guo and David Gifford for precious help and technical insight on the GEM peak caller; Gerton Lunter for help on the STAMPY platform; Andreas Heger for help and technical insight on the GAT software and genomic randomization testing; Rory Stark and Simon Anders for insight on differential analysis of ChIP-seq data; Laura Clarke for help with the 1000 Genomes data; and Daniel Gaffney for providing helpful pointers regarding the GEUVADIS QTLs calls. We thank Li Shen and Ning-Yi Shao (Icahn School of Medicine at Mount Sinai) for insight into NGSplot utilisation, and Oscar Bedoya Reina and Christoffer Nellaker for helpful discussions.

Conflict of Interest statement. None declared.

Funding

Medical Research Council [to G.G., A.J.B.T., W.H., C.P.P.; MC_UU_120081, MC_UU_120211, MC_EX_G1000902], Wellcome Trust and Genzyme [to S.V.R.], the Multiple Sclerosis Society UK [grant number 91509] and the Council for Science and Technology (CONACyT) Mexico [grant number 211990 to A.J.B.T.]. Funding to pay the Open Access publication charges for this article was provided by the Research Councils UK (RCUK) open access fund.

References

1. Kasowski, M., Kyriazopoulou-Panagiotopoulou, S., Grubert, F., Zaugg, J.B., Kundaje, A., Liu, Y., Boyle, A.P., Zhang, Q.C., Zakharia, F., Spacek, D.V. et al. (2013) Extensive variation in chromatin states across humans. Science, 342, 750–752.

2. McVicker, G., van de Geijn, B., Degner, J.F., Cain, C.E., Banovich, N.E., Raj, A., Lewellen, N., Myrthil, M., Gilad, Y. and Pritchard, J.K. (2013) Identification of Genetic Variants That Affect Histone Modifications in Human Cells. Science, 342, 747–749.

3. Kilpinen, H., Waszak, S.M., Gschwind, A.R., Raghav, S.K., Witwicki, R.M., Orioli, A., Migliavacca, E., Wiederkehr, M., Gutierrez-Arcelus, M., Panousis, N.I. et al. (2013) Coordinated effects of sequence variation on DNA binding, chromatin structure and transcription. Science, 342, 744–747.

4. Maurano, M.T., Humbert, R., Rynes, E., Thurman, R.E., Haugen, E., Wang, H., Reynolds, A.P., Sandstrom, R., Qu, H., Brody, J. et al. (2012) Systematic localization of common disease-associated variation in regulatory DNA. Science, 337, 1190–1195.

5. Gusev, A., Lee, S.H., Trynka, G., Finucane, H., Vilhjalmsson, B.J., Xu, H., Zang, C., Ripke, S., Bulik-Sullivan, B., Stahl, E. et al. (2014) Partitioning heritability of regulatory and cell-type-specific variants across 11 common diseases. Am. J. Hum. Genet., 95, 535–552.

6. Farh, K.K., Marson, A., Zhu, J., Kleinewietfeld, M., Housley, W.J., Beik, S., Shoresh, N., Whitton, H., Ryan, R.J., Shishkin, A.A. et al. (2015) Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature, 518, 337–343.

7. Heinz, S., Romanoski, C.E., Benner, C. and Glass, C.K. (2015) The selection and function of cell type-specific enhancers. Nat. Rev. Mol. Cell Biol., 16, 144–154.

8. Arner, E., Daub, C.O., Vitting-Seerup, K., Andersson, R. et al. (2015) Transcribed enhancers lead waves of coordinated transcription in transitioning mammalian cells. Science, 347, 1010–1014.

9. Holick, M.F. (2007) Vitamin D Deficiency. N. Eng. J. Med., 357, 266–281. PMID: 17634462.

10. Martins, D., Wolf, M., Pan, D. et al. (2007) Prevalence of cardiovascular risk factors and the serum levels of 25-hydroxyvitamin D in the United States: data from the Third National Health and Nutrition Examination Survey. Arch. Intern. Med., 167, 1159–1165.

11. Mokry, L.E., Ross, S., Ahmad, O.S., Forgetta, V., Smith, G.D., Leong, A., Greenwood, C.M.T., Thanassoulis, G. and Richards, J.B. (2015) Vitamin D and Risk of Multiple Sclerosis: A Mendelian Randomization Study. PLoS Med., 12, 1–20.

12. Afzal, S., Brondum-Jacobsen, P., Bojesen, S.E. and Nordestgaard, B.G. (2014) Genetically low vitamin D concentrations and increased mortality: mendelian randomisation analysis in three large cohorts. BMJ, 349.

13. Kliewer, S.A., Umesono, K., Mangelsdorf, D.J. and Evans, R.M. (1992) Retinoid X receptor interacts with nuclear receptors in retinoic acid, thyroid hormone and vitamin D3 signalling. Nature, 355, 446–449.

14. Ramagopalan, S.V., Heger, A., Berlanga, A.J., Maugeri, N.J., Lincoln, M.R., Burrell, A., Handunnetthi, L., Handel, A.E., Disanto, G., Orton, S.M. et al. (2010) A ChIP-seq defined genome-wide map of vitamin D receptor binding: Associations with disease and evolution. Genome Res., 20, 1352–1360.

15. Rhee, H.S. and Pugh, B.F. (2011) Comprehensive Genome-wide Protein-DNA Interactions Detected at Single-Nucleotide Resolution. Cell, 147, 1408–1419.

16. Rhee, H.S. and Pugh, B.F. (2012) ChIP-exo method for identifying genomic location of DNA-binding proteins with near-single-nucleotide accuracy. Curr. Protoc. Mol. Biol., Chapter 21, Unit 21.24.

17. Yip, K., Cheng, C., Bhardwaj, N., Brown, J., Leng, J., Kundaje, A., Rozowsky, J., Birney, E., Bickel, P., Snyder, M. and Gerstein, M. (2012) Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors. Genome Biol., 13, R48.

18. Carlberg, C., Bendik, I., Wyss, A., Meier, E., Sturzenbecker, L.J., Grippo, J.F. and Hunziker, W. (1993) Two nuclear signalling pathways for vitamin D. Nature, 361, 657–660.

19. Mathelier, A., Zhao, X., Zhang, A.W., Parcy, F., Worsley-Hunt, R., Arenillas, D.J., Buchman, S., Chen, C.Y., Chou, A., Ienasescu, H. et al. (2014) JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res., 42, D142–D147.

20. Zambelli, F., Pesole, G. and Pavesi, G. (2013) PscanChIP: Finding over-represented transcription factor-binding site motifs and their correlations in sequences from ChIP-Seq experiments. Nucleic Acids Res., 41, W535–W543.

21. Marchini, J., Howie, B., Myers, S., McVean, G. and Donnelly, P. (2007) A new multipoint method for genome-wide association studies by imputation of genotypes. Nat. Genet., 39, 906–913.

22. Servin, B. and Stephens, M. (2007) Imputation-based analysis of association studies: candidate regions and quantitative traits. PLoS Genet., 3, e114.

23. Rozowsky, J., Abyzov, A., Wang, J., Alves, P., Raha, D., Harmanci, A., Leng, J., Bjornson, R., Kong, Y., Kitabayashi, N. et al. (2011) AlleleSeq: analysis of allele-specific expression and binding in a network framework. Mol. Sys. Biol., 7, 522.

24. Dunham, I., Kundaje, A., Aldred, S.F., Collins, P.J., Davis, C.A., Doyle, F., Epstein, C.B., Frietze, S., Harrow, J., Kaul, R. et al. (2012) An integrated encyclopedia of DNA elements in the human genome. Nature, 489, 57–74.

25. Khurana, E., Fu, Y., Colonna, V., Mu, X.J., Kang, H.M., Lappalainen, T., Sboner, A., Lochovsky, L., Chen, J., Harmanci, A. et al. (2013) Integrative annotation of variants from 1092 humans: application to cancer genomics. Science, 342, 1235587.

26. Fu, Y., Liu, Z., Lou, S., Bedford, J., Mu, X., Yip, K.Y., Khurana, E. and Gerstein, M. (2014) FunSeq2: A framework for prioritizing noncoding regulatory variants in cancer. Genome Biol., 15, 480.

27. Heinz, S., Romanoski, C.E., Benner, C., Allison, K.A., Kaikkonen, M.U., Orozco, L.D. and Glass, C.K. (2013) Effect of natural genetic variation on enhancer selection and function. Nature, 503, 487–492.

28. Lappalainen, T., Sammeth, M., Friedlander, M.R., 't Hoen, P.A., Monlong, J., Rivas, M.A., Gonzalez-Porta, M., Kurbatova, N., Griebel, T., Ferreira, P.G. et al. (2013) Transcriptome and genome sequencing uncovers functional variation in humans. Nature, 501, 506–511.

29. Eicher, J.D., Landowski, C., Stackhouse, B., Sloan, A., Chen, W., Jensen, N., Lien, J.P., Leslie, R. and Johnson, A.D. (2015) GRASP v2.0: an update on the genome-wide repository of associations between SNPs and phenotypes. Nucleic Acids Res., 43, 799–804.

30. Mokry, L.E., Ross, S., Morris, J.A., Manousaki, D., Forgetta, V. and Richards, J.B. (2016) Genetically decreased vitamin D and risk of Alzheimer disease. Neurology, 87, 2567–2574.

31. Cusanovich, D.A., Pavlovic, B., Pritchard, J.K. and Gilad, Y. (2014) The functional consequences of variation in transcription factor binding. PLoS Genet., 10, e1004226.

32. Spivakov, M. (2014) Spurious transcription factor binding: non-functional or genetically redundant? Bioessays, 36, 798–806.

33. Villar, D., Berthelot, C., Aldridge, S., Rayner, T.F., Lukk, M., Pignatelli, M., Park, T.J., Deaville, R., Erichsen, J.T., Jasinska, A.J. et al. (2015) Enhancer evolution across 20 mammalian species. Cell, 160, 554–566.

34. Siepel, A., Bejerano, G., Pedersen, J.S., Hinrichs, A.S., Hou, M., Rosenbloom, K., Clawson, H., Spieth, J., Hillier, L.W., Richards, S. et al. (2005) Evolutionarily conserved elements in vertebrate, insect, worm and yeast genomes. Genome Res., 15, 1034–1050.

35. Jimenez-Lara, A.M. and Aranda, A. (1999) The vitamin D receptor binds in a transcriptionally inactive form and without a defined polarity on a retinoic acid response element. FASEB J., 13, 1073–1081.

36. Toell, A., Polly, P. and Carlberg, C. (2000) All natural DR3-type vitamin D response elements show a similar functionality in vitro. Biochem. J., 352 Pt 2, 301–309.

37. Zhang, J., Chalmers, M.J., Stayrook, K.R., Burris, L.L., Wang, Y., Busby, S.A., Pascal, B.D., Garcia-Ordonez, R.D., Bruning, J.B., Istrate, M.A. et al. (2011) DNA binding alters coactivator interaction surfaces of the intact VDR-RXR complex. Nat. Struct. Mol. Biol., 18, 556–563.

38. Drake, J.A., Bird, C., Nemesh, J., Thomas, D.J., Newton-Cheh, C., Reymond, A., Excoffier, L., Attar, H., Antonarakis, S.E., Dermitzakis, E.T. et al. (2006) Conserved noncoding sequences are selectively constrained and not mutation cold spots. Nat. Genet., 38, 223–227.

39. Fay, J.C., Wyckoff, G.J. and Wu, C.I. (2001) Positive and negative selection on the human genome. Genetics, 158, 1227–1234.

40. Orlov, I., Rochel, N., Moras, D. and Klaholz, B.P. (2012) Structure of the full human RXR/VDR nuclear receptor heterodimer complex with its DR3 target DNA. Embo J., 31, 291–300.

41. Deplancke, B., Alpern, D. and Gardeux, V. (2016) The Genetics of Transcription Factor DNA Binding Variation. Cell, 166, 538–554.

42. Reddy, T.E., Gertz, J., Pauli, F., Kucera, K.S., Varley, K.E., Newberry, K.M., Marinov, G.K., Mortazavi, A., Williams, B.A., Song, L. et al. (2012) Effects of sequence variation on differential allelic transcription factor occupancy and gene expression. Genome Res., 22, 860–869.

43. Levo, M., Zalckvar, E., Sharon, E., Dantas Machado, A.C., Kalma, Y., Lotam-Pompan, M., Weinberger, A., Yakhini, Z., Rohs, R. and Segal, E. (2015) Unraveling determinants of transcription factor binding outside the core binding site. Genome Res., 25, 1018–1029.

44. Booth, D.R., Ding, N., Parnell, G.P., Shahijanian, F., Coulter, S., Schibeci, S.D., Atkins, A.R., Stewart, G.J., Evans, R.M., Downes, M. et al. (2016) Cistromic and genetic evidence that the vitamin D receptor mediates susceptibility to latitude-dependent autoimmune diseases. Genes Immun., 17, 213–219.

45. Nakane, M., Ma, J., Ruan, X., Kroeger, P.E. and Wu-Wong, R. (2007) Mechanistic analysis of VDR-mediated renin suppression. Nephron Physiol., 107, 35–44.

46. Salehi-Tabar, R., Nguyen-Yamamoto, L., Tavera-Mendoza, L.E., Quail, T., Dimitrov, V., An, B.S., Glass, L., Goltzman, D. and White, J.H. (2012) Vitamin D receptor as a master regulator of the c-MYC/MXD1 network. Proc. Natl. Acad. Sci. USA, 109, 18827–18832.

47. Capellino, S., Straub, R.H. and Cutolo, M. (2014) Aromatase and regulation of the estrogen-to-androgen ratio in synovial tissue inflammation: common pathway in both sexes. Ann. N. Y. Acad. Sci., 1317, 24–31.

48. Tuoresmaki, P., Vaisanen, S., Neme, A., Heikkinen, S. and Carlberg, C. (2014) Patterns of genome-wide VDR locations. PLoS One, 9, e96105.

49. Huang, S., Laoukili, J., Epping, M.T., Koster, J., Holzel, M., Westerman, B.A., Nijkamp, W., Hata, A., Asgharzadeh, S., Seeger, R.C. et al. (2009) ZNF423 Is Critically required for retinoic acid-induced differentiation and is a marker of neuroblastoma outcome. Cancer Cell, 15, 328–340.

50. Matarese, G., De Placido, G., Nikas, Y. and Alviggi, C. (2003) Pathogenesis of endometriosis: natural immunity dysfunction or autoimmune disease? Trends Mol. Med., 9, 223–228.


51. Kunadian, V., Ford, G.A., Bawamia, B., Qiu, W. and Manson, J.E. (2014) Vitamin D deficiency and coronary artery disease: a review of the evidence. Am. Heart J., 167, 283–291.

52. Ramagopalan, S.V., Maugeri, N.J., Handunnetthi, L., Lincoln, M.R., Orton, S.M., Dyment, D.A., Deluca, G.C., Herrera, B.M., Chao, M.J., Sadovnick, A.D. et al. (2009) Expression of the multiple sclerosis-associated MHC class II Allele HLA-DRB1*1501 is regulated by vitamin D. PLoS Genet., 5, e1000369.

53. Caliskan, M., Cusanovich, D.A., Ober, C. and Gilad, Y. (2011) The effects of EBV transformation on gene expression levels and methylation profiles. Hum. Mol. Genet., 20, 1643–1652.

54. Kuhn, R.M., Haussler, D. and Kent, W.J. (2013) The UCSC genome browser and associated tools. Brief. Bioinformatics, 14, 144–161.

55. Li, H. and Durbin, R. (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25, 1754–1760.

56. Lunter, G. and Goodson, M. (2011) Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res., 21, 936–939.

57. Kharchenko, P.V., Tolstorukov, M.Y. and Park, P.J. (2008) Design and analysis of ChIP-seq experiments for DNA-binding proteins. Nat. Biotechnol., 26, 1351–1359.

58. Kundaje, A., Jung, L.Y., Kharchenko, P., Wold, B., Sidow, A., Batzoglou, S. and Park, P. (2013) Assessment of ChIP-seq data quality using cross-correlation analysis.

59. Ramachandran, P., Palidwor, G.A., Porter, C.J. and Perkins, T.J. (2013) MaSC: mappability-sensitive cross-correlation for estimating mean fragment length of single-end short-read sequencing data. Bioinformatics, 29, 444–450.

60. Feng, J., Liu, T., Qin, B., Zhang, Y. and Liu, X.S. (2012) Identifying ChIP-seq enrichment using MACS. Nat. Protoc., 7, 1728–1740.

61. Guo, Y., Mahony, S. and Gifford, D.K. (2012) High Resolution Genome Wide Binding Event Finding and Motif Discovery Reveals Transcription Factor Spatial Binding Constraints. PLoS Comput. Biol., 8, e1002638.

62. Stark, R. and Brown, G. (2011) DiffBind: differential binding analysis of ChIP-Seq peak data. R package.

63. Robinson, M. and Oshlack, A. (2010) A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol., 11, R25.

64. Heger, A., Webber, C., Goodson, M., Ponting, C.P. and Lunter, G. (2013) GAT: a simulation framework for testing the association of genomic intervals. Bioinformatics, 29, 2046–2048.

65. Ma, W., Noble, W.S. and Bailey, T.L. (2014) Motif-based analysis of large nucleotide data sets using MEME-ChIP. Nat. Protoc., 9, 1428–1450.

66. Bailey, T.L., Boden, M., Buske, F.A., Frith, M., Grant, C.E., Clementi, L., Ren, J., Li, W.W. and Noble, W.S. (2009) MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res., 37, W202–W208.

67. Hartmann, H., Guthohrlein, E.W., Siebert, M., Luehr, S. and Soding, J. (2013) P-value-based regulatory motif discovery using positional weight matrices. Genome Res., 23, 181–194.

68. Pavesi, G., Mereghetti, P., Zambelli, F., Stefani, M., Mauri, G. and Pesole, G. (2006) MoD Tools: regulatory motif discovery in nucleotide sequences from co-regulated or homologous genes. Nucleic Acids Res., 34, W566–W570.

69. McDaniell, R., Lee, B.K., Song, L., Liu, Z., Boyle, A.P., Erdos, M.R., Scott, L.J., Morken, M.A., Kucera, K.S., Battenhouse, A. et al. (2010) Heritable Individual-Specific and Allele-Specific Chromatin Signatures in Humans. Science, 328, 235–239.

70. Pickrell, J.K., Marioni, J.C., Pai, A.A., Degner, J.F. et al. (2010) Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature, 464, 768–772.

71. Degner, J.F., Marioni, J.C., Pai, A.A., Pickrell, J.K., Nkadori, E., Gilad, Y. and Pritchard, J.K. (2009) Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data. Bioinformatics, 25, 3207–3212.

72. Pickrell, J.K., Gaffney, D.J., Gilad, Y. and Pritchard, J.K. (2011) False positive peaks in ChIP-seq and other sequencing-based functional assays caused by unannotated high copy number regions. Bioinformatics, 27, 2144–2146.

73. Stephens, M. and Balding, D.J. (2009) Bayesian statistical methods for genetic association studies. Nat. Rev. Genet., 10, 681–690.

74. Howie, B.N., Donnelly, P. and Marchini, J. (2009) A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet., 5, e1000529.

75. Scott, L.J., Mohlke, K.L., Bonnycastle, L.L., Willer, C.J. et al. (2007) A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science, 316, 1341–1345.

76. Burton, P.R., Clayton, D.G., Cardon, L.R., Craddock, N. et al. (2007) Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature, 447, 661–678.

77. Wakefield, J. (2009) Bayes factors for genome-wide association studies: comparison with P-values. Genet. Epidemiol., 33, 79–86.

78. Anders, S., McCarthy, D.J., Chen, Y., Okoniewski, M., Smyth, G.K., Huber, W. and Robinson, M.D. (2013) Count-based differential expression analysis of RNA sequencing data using R and Bioconductor. Nat. Protoc., 8, 1765–1786.

79. Ernst, J., Kheradpour, P., Mikkelsen, T.S., Shoresh, N., Ward, L.D., Epstein, C.B., Zhang, X., Wang, L., Issner, R., Coyne, M. et al. (2011) Mapping and analysis of chromatin state dynamics in nine human cell types. Nature, 473, 43–49.

80. Bailey, T.L. and Machanick, P. (2012) Inferring direct DNA binding from ChIP-seq. Nucleic Acids Res., 40, e128.

81. McLean, C.Y., Bristor, D., Hiller, M., Clarke, S.L., Schaar, B.T., Lowe, C.B., Wenger, A.M. and Bejerano, G. (2010) GREAT improves functional interpretation of cis-regulatory regions. Nat. Biotechnol., 28, 495–501.

82. Touzet, H. and Varre, J.S. (2007) Efficient and accurate P-value computation for position weight matrices. Algorithms Mol. Biol., 2, 15.


relative to GWAS variants significantly associated with 472 diverse diseases or traits from 688 studies in the largest catalogue available (GRASP v2.0 (29)) with at least 5 associated genome-wide significant SNPs (P < 5 × 10⁻⁸) (Supplementary Material). All traits available in these catalogues were considered so as not to bias subsequent findings. Again, we employed stringent procedures, matching variants for allele frequency and LD-dependence and discarding VDR-BVs in the MHC region, as well as again comparing against a background of all VDR-binding regions (Test 3', p. 34 of the Supplementary Material).

VDR-BVs were significantly, and up to 2-fold, enriched within LD intervals associated with 17 diseases or traits (Fig. 4). It is notable that these are not drawn randomly from all 472 traits considered but are mostly immune- or inflammatory-related disorders such as sarcoidosis, Graves' disease, Crohn's disease, irritable bowel syndrome and type I diabetes (Fig. 4A). The most enriched trait is Alzheimer's disease, for which 25-hydroxyvitamin D level is a proposed causal risk factor (30).

For the 341 VDR-rBVs, intervals from six disorders contained more VDR-rBVs than expected from sets of DAF-matched, LD-accounted regions that bind VDR (Fig. 4B). We emphasise that this represents a highly stringent analysis that accounts for all known confounding effects, namely potential biases from called VDR-binding sites, population stratification and physical linkage.

Functional impact of genetic variation on loss or gain of VDR binding

This evidence is consistent with altered RXR::VDR binding contributing to immunity-related diseases. Nevertheless, because variants in TFBS often fail to alter binding affinity and/or have no measurable effect on gene expression levels (31,32), we needed to demonstrate that VDR-BVs are functional, specifically that they influence gene expression and have been negatively selected over evolution (33). As expected if they are enriched in functional variants, we found that 50–100% more VDR-BVs than random samples lie near (< 10 kb) genes that are differentially expressed in LCLs upon addition of calcitriol (14) (P < 10⁻⁴; Supplementary Methods).

Also as expected if, as a class, they are functional, replicated VDR-binding peaks exhibit variable cross-species conservation (34) across RXR::VDR motif positions (Supplementary Material, Fig. S17A and B). This signature of uneven selection across the DR3 motif is evident for class I VDR binding sites (Supplementary Material, Fig. S17C and D; Kruskal-Wallis test,

[Figure 4 plot not reproduced. Panels A and B list traits (for example irritable bowel syndrome and Crohn's disease) against log2(fc), with square sizes scaled by -log10(q).]

Figure 4. VDR-BVs are enriched in GWAS LD blocks for autoimmune, inflammatory and other diseases. (A,B) Traits and diseases showing enrichment of VDR-BVs (A) or VDR-rBVs (B). Disease/trait associations were acquired from the GRASP catalog v2.0 (29). Statistical association was computed between VDR-BVs and strong-LD (Supplementary Material) intervals around genome-wide significant (P < 5 × 10⁻⁸) disease tag-SNPs from the GRASP catalog. All available GRASP diseases and traits represented by at least 5 SNPs were analysed, and those showing significant enrichment of VDR binding beyond a 1% FDR threshold were retained; disease associations supported by only 1 VDR-BV were discarded (full data are presented in the Supplementary Material). The sizes of black squares indicate the statistical significance of enrichments.


P = 1.9 × 10⁻¹¹ and post-hoc pairwise Kruskal-Wallis comparisons) but not in class II sites (Kruskal-Wallis test, P = 0.14), consistent with the DR3 heterodimer-binding motif regulating transcription.

Next, we assessed the impact on the VDR binding affinity of VDR-BVs with respect to their location within the consensus DR3-type motif (JASPAR database; Fig. 5), the transcriptionally active binding site motif for the RXR::VDR heterodimer (35–37). Mapping of allele-specific ChIP-exo reads to multiply replicated VDR binding regions (CPo3 peaks) was highly predictive of whether variants substantially alter ('break') functional motifs. For 89% of observations, the change in the strength of the VDR binding motif correctly predicts the direction of VDR binding affinity change (Fig. 5A), a proportion that rises to 100% in class I VDR binding sites (Supplementary Material, Fig. S18A). Motif breaks do not favour 5' RXR over 3' VDR hexamers (Fig. 5A; Wilcoxon rank sum test, P = 0.71) but especially impact conserved G and C nucleotides (positions 2, 5, 11 and 14) and a T nucleotide (position 13). Notably fewer events occur at the near-essential T nucleotides at positions 4 and 12. Interestingly, in class I sites there are no VDR-BVs disrupting the T nucleotide at position 12 (Supplementary Material, Fig. S18A), which could reflect either low mutation rates (mutational 'cold spots') or strong purifying selection against deleterious nucleotide substitutions at these binding sites (38).
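The notion that a variant 'breaks' a motif when it lowers the motif score can be illustrated with a toy position weight matrix. The sketch below is only an illustration under stated assumptions: the three-position matrix, the uniform background and the function names are hypothetical, and it reproduces neither the JASPAR DR3 matrix nor the FunSeq2/TFMPvalue procedure used in the study.

```python
import math

# Hypothetical 3-position probability matrix (NOT the JASPAR DR3 matrix).
TOY_PWM = [
    {"A": 0.10, "C": 0.10, "G": 0.70, "T": 0.10},  # moderately conserved G
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},  # strongly conserved T
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},  # uninformative position
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def log_odds_score(seq, pwm=TOY_PWM):
    """Sum over positions of log2(motif probability / background)."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(pwm, seq))


def score_change(ref_seq, alt_seq, pwm=TOY_PWM):
    """Alternate minus reference score; negative means a weaker motif match."""
    return log_odds_score(alt_seq, pwm) - log_odds_score(ref_seq, pwm)


delta_conserved = score_change("GTA", "GCA")  # substitution at the conserved T
delta_neutral = score_change("GTA", "GTC")    # substitution at the flat position
```

In this toy case the substitution at the conserved position lowers the score by about 4 bits, whereas the substitution at the uninformative position leaves it unchanged, which mirrors the expectation that disruptions at high-information positions are the ones most likely to alter binding.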

To distinguish these possibilities, we next compared the population frequencies of alleles associated with either historical Loss Of Binding (LOB) or Gain of Binding (GOB) affinity to VDR (Methods). LOB VDR-BVs cause significantly stronger effects when within either 5' or 3' hexamer than within the DR3

[Figure 5 plots not reproduced. Panel B shows the motif logo (vertical axis in bits) over positions 1–15, annotated as 5' hexamer (RXR), DR3 spacer and 3' hexamer (VDR). Panel titles: 'VDR-BV - concordance of variant genotype and binding affinity phenotype' (A, counts of motif break events), 'VDR-BV (LOB) - impact on binding affinity' (C) and 'VDR-BV (GOB) - impact on binding affinity' (D); the vertical axis of C and D is the magnitude of VDR binding affinity change.]

Figure 5. Genetic variation modulating VDR binding affinity in LCLs within the canonical RXR::VDR heterodimeric binding motif. The figure shows a genome-wide quantification of the effect of genetic variation on VDR binding based on the analysis of VDR-BVs in CPo3 binding peaks which hit the canonical RXR::VDR DR3 heterodimeric consensus motif (motif shown in B; JASPAR database (19)). (A) Distribution across the RXR::VDR motif of VDR-BVs which cause significant motif disruption (significance assessed via FunSeq2 using TFMPvalue (82) with default threshold P < 4 × 10⁻⁸) and quantification of the concordance between directionality of VDR PWM disruption and directionality of resulting VDR binding affinity variation: 102/115 (89%) VDR-BVs in CPo3 VDR peaks that significantly break the RXR::VDR motif predict the direction of VDR binding affinity change. (C,D) Genome-wide quantification of the phenotypic effect of all VDR-BVs intersecting the RXR::VDR motif (including those which do not generate a motif break at the above significance level). The vertical axis (Magnitude of VDR binding affinity change) indicates the fold change of read depth of the ancestral versus the derived allele (C) or derived versus the ancestral allele (D). (C) Impact on VDR binding affinity of Loss of Binding (LOB, orange dots) VDR-BVs and Loss of Binding VDR-BVs which test for significant RXR::VDR motif break (red dots). (D) Impact on VDR binding affinity of Gain of Binding (GOB) VDR-BVs (blue dots). Nucleotide position has a significant influence on LOB (Kruskal-Wallis rank sum test, χ²(14) = 33.385, P < 0.01) but not on GOB (χ²(12) = 17.68, P = 0.12) phenotype magnitude. (C,D) Grey dots indicate impact on binding affinity of those 1000 Genomes variants carried by the LCL samples which do not test for significant asymmetric binding (AlleleSeq (23), FDR threshold = 0.1) and are not therefore VDR-BVs.


spacer sequence (Wilcoxon rank sum test, P < 0.01; GOB variants, P = 0.11). VDR-BVs affecting one of the most highly conserved motif positions (position 13) always result in motif-breaking LOB effect (Fig. 5C and D; Supplementary Material, Fig. S18C and D).

We next found that RXR::VDR motif sequences have been under constraint during recent evolution within the extant human population. We did so by comparing the frequencies of derived alleles (Derived Allele Frequency [DAF] test) for LOB or GOB VDR-BVs in either Northern and Western European (CEU) or Yoruban (YRI) populations. In the absence of demographic confounders, stronger purifying selection is implied by an excess of low population frequency variants in one set of variants over another (39).
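The logic of the DAF comparison can be sketched with a rank-based test on two sets of derived allele frequencies. This is a minimal illustration, assuming a two-sided Wilcoxon rank-sum (Mann-Whitney U) test with a normal approximation and no tie correction in the variance; the frequencies below are invented for the example and are not taken from the study.

```python
import math


def midranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Return (U statistic for x, z score, two-sided P by normal approximation)."""
    ranks = midranks(list(x) + list(y))
    n_x, n_y = len(x), len(y)
    u_x = sum(ranks[:n_x]) - n_x * (n_x + 1) / 2
    mean_u = n_x * n_y / 2
    sd_u = math.sqrt(n_x * n_y * (n_x + n_y + 1) / 12)
    z = (u_x - mean_u) / sd_u
    return u_x, z, math.erfc(abs(z) / math.sqrt(2))


# Invented DAF values: LOB alleles skewed towards low frequency.
lob_daf = [0.01, 0.02, 0.03, 0.05, 0.08]
gob_daf = [0.10, 0.20, 0.35, 0.50, 0.70]
u_lob, z_lob, p_value = rank_sum_test(lob_daf, gob_daf)
```

A strongly negative z score for the LOB set indicates that its alleles sit at lower frequencies than the GOB set, which is the pattern interpreted in the text as stronger purifying selection.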

We detected significant increases in DAF distribution for GOB over LOB VDR-BVs, both for CEU (Fig. 6C, top-left panel) and YRI (Fig. 6C, bottom-left panel) cohorts. Class I binding site VDR-BVs exhibited significant differences in GOB versus LOB DAF distributions (Supplementary Material, Fig. S19; P < 0.05 for both CEU and YRI). By contrast, variants in RXR::VDR hexamers not assigned as VDR-BVs showed no significant difference in GOB versus LOB DAF distributions (Fig. 6C, top-right and bottom-right panels). Significant differences between GOB and LOB DAFs were attained even when observing the RXR and VDR motif hexamers separately and in both sub-populations (P < 0.012). Consequently, LOB variants are under substantially stronger purifying selection than are GOB variants. This is an important finding because it indicates that reduced binding of VDR at these genomic positions, and presumably reduced vitamin D levels, are commonly deleterious.

Discussion

We have demonstrated how genetic variation alters the binding affinity of the vitamin D receptor (VDR), a member of the nuclear receptor family of transcription factors. Genetic variants significantly associated with immune and inflammatory disease were found to disrupt VDR binding and to be located preferentially within enhancer regions (Fig. 3), in line with a recent analysis of causal immune disease-related variants (6). The study benefited from the 5-fold greater spatial resolution and signal-to-noise ratios provided by the ChIP-exo assay.

In its best studied model of allosteric modulation, the RXR::VDR heterodimer binds to DNA and enhances

[Figure 6 plots not reproduced. Four back-to-back histograms (panels A–D) compare GOB and LOB variants, with frequency on the horizontal axis; panels are grouped as 'VDR-BVs' and 'Variants not affecting VDR binding'.]

Figure 6. Evolutionary conservation of VDR-BVs at RXR::VDR consensus motifs. The diagrams show distributions of the Derived Allele Frequency (DAF) for variants in RXR::VDR consensus motifs within reproducible CPo3 binding peaks. DAF values are separated by ethnicity (A and C, CEU; B and D, YRI); variants are separated by their effect on VDR binding affinity (A and B, VDR-BVs; C and D, 1000 Genomes variants carried by the LCL samples which do not test for significant asymmetric binding and are not therefore VDR-BVs). Within each of the four panels, variants are split in two groups based on their effect on VDR binding affinity direction (whether GOB or LOB). For all quadrants, only DAFs for variants hitting hexamer positions (i.e. hitting either the RXR recognition element at positions 1-6 or the VDR recognition element at positions 10-15) in the RXR::VDR motif are shown. An asterisk indicates significance (non-parametric Wilcoxon rank sum tests, α = 0.05). A: Wilcoxon rank sum test, P = 2.8 × 10⁻⁴; B: Wilcoxon rank sum test, P = 4.6 × 10⁻⁵; C and D: P = 0.55 and P = 0.14, respectively. ns = not significant.


transcription via response elements that typically consist of two hexameric ((A/G)G(T/G)TCA) half-sites, with RXR and VDR DNA binding domains occupying the upstream and downstream half-site, respectively (40). This DR3 motif, which we previously identified using ChIP-seq data from two calcitriol-activated CEU LCLs (14), was also the predominant VDR binding interface in calcitriol-activated LCLs in the present ChIP-exo study. We identified strong enrichment of VDR binding within enhancers, with stronger enrichments associated with the more stringent subsets of VDR-BVs (Figs 1 and 3). Whilst VDR binding occurs preferentially within promoters, these interactions are only rarely associated with RXR::VDR DR3 motifs and may be mediated by additional DNA-binding co-factors, including CTCF, and might reflect unproductive binding to open chromatin. In contrast, those VDR binding sites containing strong instances of the RXR::VDR DR3 motif were particularly enriched in enhancers and also near to genes involved in immunity processes and related diseases (Fig. 2). VDR thus is expected to exert a substantial effect on the biology of the immune system as a sequence-specific enhancer regulator.

Most remarkably, we found variants weakening or strengthening the RXR::VDR PWM score to be highly predictive of the loss or gain of VDR binding affinity (Fig. 5). Furthermore, the VDR-bound RXR::VDR motif shows evidence of levels of sequence conservation across vertebrate evolution that mirror its information content, and loss of VDR-binding variants over modern human evolution has occurred under stronger purifying selection than the gain of binding variants (Fig. 6). Together, these results imply that VDR-binding, and thus presumably vitamin D levels, have in general been protective of disease.
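The DR3 architecture described above (two hexameric half-sites separated by a three-nucleotide spacer) can be expressed as a simple pattern. The following is a hedged sketch only: it scans the forward strand with an exact-match regular expression, whereas the study relied on position weight matrices; the example sequence and function name are hypothetical.

```python
import re

# Degenerate half-site (A/G)G(T/G)TCA written as a character-class pattern.
HALF_SITE = "[AG]G[TG]TCA"

# Two half-sites separated by exactly three nucleotides (direct repeat 3).
# The lookahead lets overlapping candidate sites be reported as well.
DR3_PATTERN = re.compile(rf"(?=({HALF_SITE})([ACGT]{{3}})({HALF_SITE}))")


def find_dr3_sites(sequence):
    """Return (start, upstream half-site, spacer, downstream half-site) tuples."""
    return [
        (m.start(), m.group(1), m.group(2), m.group(3))
        for m in DR3_PATTERN.finditer(sequence.upper())
    ]


example = "ttcAGGTCAatgGGTTCAcc"  # invented sequence containing one DR3 element
sites = find_dr3_sites(example)
```

Here the upstream half-site would correspond to the RXR-bound hexamer and the downstream one to the VDR-bound hexamer. An exact-match scan of this kind misses weak or non-canonical sites, which is one reason score-based PWM methods are preferred in practice.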

Our study will have underestimated the true extent of regulatory variation at VDR binding sites. This is because the asymmetric binding analysis (23), by definition, can only test variants which are heterozygous for a given LCL sample; therefore, true positive VDR-BVs will be unobserved if all LCL samples we considered are homozygous. Similarly, not all of the required three biallelic genotypes will have been present in these samples for the regression-on-genotype analysis (21,22). Consequently, a larger scale study would be expected to show improved power to detect VDR-BVs.

At least 90% of VDR-BVs lay outside of RXR::VDR DR3 motifs. The majority of variable binding could occur, as with other TFs (41), at weak or non-canonical binding motifs, or its affinity may alter because of genetic disruption to a cofactor binding site (42). Another explanation is that VDR-BVs commonly exert their effect by altering the DNA conformation of regions flanking the core binding site (43). Nevertheless, our de novo motif discovery identified several proteins as potential RXR::VDR cofactors whose altered DNA-binding affinity could influence VDR binding. Over-representation of motifs in VDR peaks has been observed before, for example in a recent study of VDR binding in monocytes and monocyte-derived inflammatory and tolerogenic dendritic cells (44). Direct interaction between VDR and CREB1 has been reported previously (45), as have functional interactions between vitamin D/VDR and oestrogen metabolism and MYC proteins (46,47). Co-occurrence of GABPA and ESRRB motifs with VDR-binding sites had been reported previously (48). ZNF423 is not known to bind VDR, yet it might do so indirectly because it physically associates with its heterodimer partner RXRα (49).

Our finding that VDR-BVs are significantly enriched within eQTLs (Fig. 3C) is intriguing because the eQTLs were inferred from LCLs not treated with calcitriol, and implies that these enhancers' activities may not be entirely vitamin D-dependent in these cells. Nevertheless, there was no clear concordance or discordance between the direction of change of VDR binding at VDR-BVs (in calcitriol-treated LCLs) and the direction of change of gene expression in calcitriol-minus human LCLs reported for the GEUVADIS eQTL variants (data not shown). This, and the lack of enrichment of the DR3 motif in the VDR binding sites for the basal (unstimulated) state in LCLs (14), indicate that a study of calcitriol-activated LCLs at a similar scale to those employing unstimulated LCLs (28) will be required to fully resolve this issue.

Our most important results derived from a conservative analysis of a stringent set of replicated binding variants (Fig. 4B). From this, we report a significant excess of VDR-rBVs coincident, or in strong LD, with genome-wide significant GWAS tag variants for six disorders, including three that are autoimmune disorders (inflammatory bowel disease, Crohn's disease and rheumatoid arthritis), whilst a fourth, endometriosis, is frequently comorbid with autoimmune disorders (50). Deficiency of serum 25(OH)D levels is associated with cardiovascular disease risk factors in adults (10), while the association between vitamin D levels and coronary artery disease, as for many disorders, is debated (51). These results associate a molecular phenomenon, the genetic disruption of nuclear receptor binding, with a narrow set of immune-related syndromes and diseases.

Based on the combined functional, evolutionary and GWAS-based evidence, we propose that VDR-BVs represent good targets for subsequent experimental validation using, for example, genome editing in cells. It is also expected that only a small minority of genes bound by VDR will be altered in expression. A large-scale expression QTL study in calcitriol-activated LCLs will thus be required to further narrow down the set of candidate disease genes whose regulatory elements are variably bound by VDR and which are differentially expressed across the human population.

We have provided the first quantitative, association-based analysis to explain the genetic effect on binding affinity variation for a nuclear receptor, and to propose a mechanistic model for disease susceptibility. Our results support the hypothesis that DNA variants altering transcription factor binding at enhancers contribute to complex disease aetiology, and suggest that altered VDR binding, and by inference variable vitamin D levels, explain in part altered autoimmune and other complex disease risk.

Materials and Methods

VDR ChIP-exo sample preparation

Calcitriol stimulated lymphoblastoid cell lines (LCLs) were grown and prepared for ChIP as described previously, with some modifications (52), and adapted for ChIP-exo (15,16). LCLs are known to be a good model of primary B-cells (53) and we chose lines from HapMap, a unique resource whose genomes have already been sequenced. Briefly, cells were incubated in phenol red free RPMI-1640, 10% charcoal stripped FBS, 2 mM glutamine-L, penicillin with streptomycin solution (100 U/ml + 100 µg/ml) medium at 37°C and 5% CO2. Cells were harvested after stimulation for 36 hours with 0.1 mM calcitriol (Sigma) and cross-linked using a 1% formaldehyde buffer for 15 minutes at room temperature and quenched with 0.125 M glycine. Cells were lysed and chromatin sheared by sonication into fragments of 200–1000 bp. VDR-bound genomic DNA regions were isolated using a rabbit polyclonal antibody against VDR (Santa Cruz Biotechnology, sc-1008). Immunoprecipitated chromatin was


then processed enzymatically on magnetic beads. Samples were polished, A-tailed, ligated to sequencing library adaptors and then digested with lambda exonuclease to remove nucleotides from 5' ends of double stranded DNA. Single-stranded DNA was eluted and converted to double-stranded DNA by primer annealing and extension. A second sequencing adaptor was ligated to exonuclease treated ends, PCR amplified, gel purified and sequenced. Libraries were prepared for Illumina as described in (16).

Data processing and peak calling

Full data processing steps are presented in the Supplementary Material. Briefly, we mapped the ChIP-exo sequence reads (Peconics Inc., USA; single-end, 40 bp) against the hg19 build of the human reference genome (54) using BWA (v. 0.7.4 (55)) followed by Stampy (v. 1.0.21 (56)). We filtered the reads to retain only uniquely mapping reads with MAPQ > 20 and removed reads mapping to empirical blacklist regions identified by the ENCODE consortium (24).

For peak calling, we first computed strand cross-correlation profiles (57) of read start densities to obtain consensus estimates of the ChIP-exo digested fragment sizes, which we corrected for the presence of phantom peaks (58) using a mappability-corrected approach to cross-correlation inference (MaSC (59)). We used the cross-correlation based estimates for the digested fragment size as a MACS2 (version 2.0.10 (60)) input parameter to obtain peak calls. Separately, to increase sensitivity, we called peaks using GPS/GEM (v. 2.4.1 (61)) with recommended ChIP-exo options (--smooth 3 --mrc 20). We then pooled, for each sample, the two sets of peak calls (Supplementary Material). Finally, we used DiffBind (62) to manipulate the per-sample merged peaksets and to obtain consensus peaksets (CPo3, CPo10, CPo20) based on an overlap threshold. To do this, we normalised the read numbers using DiffBind's embedded EdgeR routines, based on the Trimmed Means of M's (TMM) algorithm (63).
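The read filtering step can be illustrated as follows. This is a minimal sketch, not the pipeline used in the study: the `Read` record, the coordinates and the blacklist region are hypothetical, and a real workflow would operate on BAM files rather than in-memory tuples.

```python
from typing import NamedTuple


class Read(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    mapq: int


def overlaps(read, region):
    """True if the read shares at least one base with (chrom, start, end)."""
    chrom, start, end = region
    return read.chrom == chrom and read.start < end and start < read.end


def filter_reads(reads, blacklist, min_mapq=20):
    """Keep reads with MAPQ strictly above min_mapq that miss every blacklist region."""
    return [
        r for r in reads
        if r.mapq > min_mapq and not any(overlaps(r, b) for b in blacklist)
    ]


# Invented example data.
reads = [
    Read("chr1", 100, 140, 60),   # kept
    Read("chr1", 200, 240, 20),   # dropped: MAPQ is not above 20
    Read("chr1", 990, 1030, 60),  # dropped: overlaps the blacklist region
    Read("chr2", 990, 1030, 37),  # kept: same coordinates, other chromosome
]
blacklist = [("chr1", 1000, 2000)]
kept = filter_reads(reads, blacklist)
```

The linear scan over blacklist regions is adequate for a sketch; with genome-scale data an interval index would normally replace it.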
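The principle behind the strand cross-correlation estimate can be sketched in a few lines. This is an illustration under simplifying assumptions: it computes an unnormalised cross-product of 5' end counts between strands rather than a Pearson correlation, it ignores mappability and phantom peaks (the corrections attributed to MaSC in the text), and the read positions are invented.

```python
from collections import Counter


def strand_cross_products(plus_starts, minus_starts, max_shift):
    """For each shift, sum plus-strand counts times minus-strand counts at pos + shift."""
    plus, minus = Counter(plus_starts), Counter(minus_starts)
    return {
        shift: sum(n * minus[pos + shift] for pos, n in plus.items())
        for shift in range(max_shift + 1)
    }


def estimate_fragment_size(plus_starts, minus_starts, max_shift=300):
    """Shift that maximises the cross-product between the two strands."""
    profile = strand_cross_products(plus_starts, minus_starts, max_shift)
    return max(profile, key=profile.get)


# Invented 5' end positions: minus-strand ends sit 35 bp downstream of plus-strand ends.
plus_starts = [100, 100, 250, 400]
minus_starts = [p + 35 for p in plus_starts] + [500]
fragment_size = estimate_fragment_size(plus_starts, minus_starts)
```

The shift at which reads on opposite strands best coincide approximates the size of the protected fragment, which is the quantity passed to the peak caller in the workflow described above.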

Interval overlap enrichment analysis

We used the Genomic Association Tester (64) to perform the randomization-based interval enrichment analyses. All analyses were based on 10,000 randomisations over the chosen background, and backgrounds differed depending on the specific analysis performed. All analyses accounted for genome-wide patterns of GC content variability and, where relevant, for uniquely mappable regions (40 bp reads). For the enrichment analysis of VDR-BVs in GWAS intervals and QTLs, we used an in-house bootstrapping pipeline to perform LD-filtering (to retain only LD-independent foreground and background variants) and DAF matching. Further details on annotations, background and analytical design are provided in the Supplementary Material.
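The general idea of a randomisation-based enrichment test can be sketched as below. This is a deliberately simplified stand-in and not GAT's algorithm: it treats variants as single points on one contig, samples random points uniformly from a background workspace, and ignores GC content, mappability, LD filtering and DAF matching. All coordinates, names and the default number of randomisations are assumptions made for the example.

```python
import random


def count_overlaps(points, intervals):
    """Number of points falling inside any half-open interval [start, end)."""
    return sum(any(s <= p < e for s, e in intervals) for p in points)


def sample_point(workspace, rng):
    """Draw one position uniformly from the union of workspace intervals."""
    lengths = [e - s for s, e in workspace]
    (start, end), = rng.choices(workspace, weights=lengths, k=1)
    return rng.randrange(start, end)


def enrichment(points, annotation, workspace, n_random=1000, seed=0):
    """Return (observed, expected, fold enrichment, empirical one-sided P)."""
    rng = random.Random(seed)
    observed = count_overlaps(points, annotation)
    null = [
        count_overlaps([sample_point(workspace, rng) for _ in points], annotation)
        for _ in range(n_random)
    ]
    expected = sum(null) / n_random
    p_value = (1 + sum(n >= observed for n in null)) / (1 + n_random)
    fold = observed / expected if expected else float("inf")
    return observed, expected, fold, p_value


# Invented data: half of the 20 variants lie in an annotation covering 5% of the background.
workspace = [(0, 10000)]
annotation = [(1000, 1500)]
variants = list(range(1000, 1500, 50)) + list(range(5000, 6000, 100))
observed, expected, fold, p_value = enrichment(variants, annotation, workspace)
```

Adding one to both numerator and denominator of the empirical P value avoids reporting a probability of exactly zero, so the smallest attainable value is set by the number of randomisations.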

Motif discovery

Full motif analysis steps are documented in the Supplementary Material. Briefly, we carried out de novo motif analysis using multiple parallel approaches: MEME-ChIP (65) from the MEME suite (66), XXmotif (67) and PScanChIP (20) from the Weeder/MoDtools suite (68). For the MEME-ChIP and XXmotif analyses, we used the peak intervals in CPo10 and extracted the sequence underlying each interval ± 150 bp from a repeat-masked version of the hg19 reference (54). For the PScanChIP analysis, we used as the input the full set of CPo3 binding regions with Jaspar PWM descriptors, to which we added the best heterodimeric RXR::VDR PWM found by XXmotif and the best monomeric VDR PWM found by DREME (MEME-ChIP package). The global background chosen for the analysis consisted of built-in PScanChIP DNase I digital genomic footprinting data for the LCL CEPH individual NA12865. For the motif analysis at the 43,332 VDR-BVs, we used PScanChIP (20) with Jaspar PWM descriptors and a mixed background of the promoters and LCL DNase cut sites.

Differential binding analyses

We performed both allele-specific (VDR-ASB) and regression on genotype (VDR-QTL) differential binding analyses. Both methods start from analyses of read counts. For the VDR-QTL analysis, reads were normalised and covariate-corrected as detailed in the Supplementary Material.

For the VDR-ASB tests, we utilised the sequence composition of ChIP-exo sequence reads overlapping heterozygous SNPs to determine the sequences originating from each allele separately (23,69,70) and to identify allele-specific binding events showing a significant difference in the number of mapped reads between parental alleles. We employed a modified version of the AlleleSeq pipeline (23) to carry out the VDR-ASB analysis. Briefly, we tested for significant allelic imbalance among all read pile-ups intersecting a variant showing heterozygous genotype, given a null hypothesis of 50% paternal versus 50% maternal reads. We followed (23) in controlling for two major sources of bias typically encountered when running an allele-specific analysis of short read data: a bias due to mapping to the reference hg19 genome (71) and a bias due to unannotated copy number variants skewing the read counts for some of the SNPs being tested (72). In addition, we corrected for a third source of potential erroneous ASB SNP calls by removing any significant allelic imbalance hypothesis falling under repeat regions contained in the ENCODE blacklist data (24).

For the VDR-QTL tests, we analysed all 27 LCL samples simultaneously. We grouped the samples based on the genotypes of the underlying variant and carried out QTL association testing between the variant's genotype (under the assumption of an additive genetic model (73)) and the phenotype at that location (the VDR binding affinity, based on the quantile-normalised ChIP-exo peak size at that location). We used an in-house pipeline based on SNPs and indels imputed with IMPUTE2 (74) and Bayesian regression modelling based on SNPTEST (21) and BIMBAM (22). Bayesian methods for analysing SNP associations are now an established tool in GWAS analysis (75–77) and show advantages over the use of p-values in power and interpretation (73), though they require tighter initial modelling assumptions when compared to frequentist methods. For details on the underlying model and priors, we refer the reader to the Supplementary document.
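The null hypothesis of a 50:50 split between parental alleles can be illustrated with an exact binomial test on read counts at a single heterozygous site. This is a hedged sketch of the statistical idea only: the function name and the read counts are hypothetical, and the bias corrections and multiple-testing control of the actual pipeline are not reproduced.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided P: total probability of outcomes no more likely than k."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= threshold))


# Invented counts: 18 of 20 reads carry the reference allele.
p_imbalanced = binomial_two_sided_p(18, 20)

# A perfectly balanced site gives no evidence of allelic imbalance.
p_balanced = binomial_two_sided_p(10, 20)
```

With only 20 reads, an 18:2 split already yields a P value of about 4 × 10⁻⁴, which shows why read depth at heterozygous sites largely determines the power of this kind of test.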
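The additive genetic model can be made concrete with a regression of the binding phenotype on allele dosage (0, 1 or 2 copies of one allele). Note that this sketch swaps in ordinary least squares for the Bayesian regression described in the text, purely to show what 'additive' means; the dosages and phenotype values are invented.

```python
def additive_regression(dosages, phenotypes):
    """Least-squares slope and intercept of phenotype on allele dosage (0, 1, 2)."""
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    s_gg = sum((g - mean_g) ** 2 for g in dosages)
    if s_gg == 0:
        # All samples share one genotype, so no effect can be estimated.
        raise ValueError("genotype does not vary across samples")
    s_gy = sum((g - mean_g) * (y - mean_y) for g, y in zip(dosages, phenotypes))
    slope = s_gy / s_gg
    return slope, mean_y - slope * mean_g


# Invented data: normalised peak size rises by about one unit per allele copy.
dosages = [0, 0, 1, 1, 2, 2]
peak_sizes = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
slope, intercept = additive_regression(dosages, peak_sizes)
```

The slope is the per-allele effect on binding. The explicit error for a non-varying genotype reflects the limitation of small sample sets: a variant cannot be tested when the samples do not carry more than one genotype.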

Variant annotation

We annotated all variants associated with differences in VDR binding, obtained by pooling the VDR-QTL and the VDR-ASB variants (VDR-Binding Variants, VDR-BVs), according to the guidelines in (25) and using a customised version of the Funseq2 suite of algorithms (26). We added additional VDR-centric information to Funseq2: VDR CPo3 binding regions, motif intervals from the PScanChIP analysis for the VDR::RXR Jaspar DR3 motif and for the DREME VDR monomer, and VDR dimer/monomer PWM information. Gene annotation used Funseq2's


default (Gencode v. 16) and the HOT annotation in Funseq was filtered to only retain LCL HOT information. We ran Funseq using the -m 2 -nc options so that all annotation referred to the ancestral allele of the variant.

Availability of Data and Material

Sequencing data from this study have been submitted to the Gene Expression Omnibus archive (GEO, http://www.ncbi.nlm.nih.gov/geo/; accession number GSE73254).

Supplementary Material

Supplementary Material is available at HMG online.

Acknowledgements

We thank Mark Gerstein Jieming Chen Joel Rozowsky AlexejAbyzov Ekta Khurana and Yao Fu for precious help and tech-nical insight on the AlleleSeq CNVnator and Funseq2 plat-forms We also thank Matthew Stephens and Yongtao Guan forhelp and insight on the PIMASS and BIMBAM platforms Wethank Gosia Trynka and Kamil Slowikowksi for providing r2-based LD block data Yuchun Guo and David Gifford for precioushelp and technical insight on the GEM peak caller GertonLunter for help on the STAMPY platform Andreas Heger forhelp and technical insight on the GAT software and genomicrandomization testing Rory Stark and Simon Anders for insighton differential analysis of ChIP-seq data Laura Clarke for helpwith the 1000 Genomes data and Daniel Gaffney for providinghelpful pointers regarding the GEUVADIS QTLs calls We thankLi Shen and Ning-Yi Shao (Icahn School of Medicine at MountSinai) for insight into NGSplot utilisation and Oscar BedoyaReina and Christoffer Nellaker for helpful discussions

Conflict of Interest statement. None declared.

Funding

Medical Research Council [to G.G., A.J.B.T., W.H., C.P.P.; MC_UU_120081, MC_UU_120211, MC_EX_G1000902]; Wellcome Trust and Genzyme [to S.V.R.]; the Multiple Sclerosis Society UK [grant number 91509]; and the Council for Science and Technology (CONACyT), Mexico [grant number 211990 to A.J.B.T.]. Funding to pay the Open Access publication charges for this article was provided by the Research Councils UK (RCUK) open access fund.

References

1. Kasowski M, Kyriazopoulou-Panagiotopoulou S, Grubert F, Zaugg JB, Kundaje A, Liu Y, Boyle AP, Zhang QC, Zakharia F, Spacek DV, et al. (2013) Extensive variation in chromatin states across humans. Science, 342, 750–752.

2. McVicker G, van de Geijn B, Degner JF, Cain CE, Banovich NE, Raj A, Lewellen N, Myrthil M, Gilad Y and Pritchard JK (2013) Identification of Genetic Variants That Affect Histone Modifications in Human Cells. Science, 342, 747–749.

3. Kilpinen H, Waszak SM, Gschwind AR, Raghav SK, Witwicki RM, Orioli A, Migliavacca E, Wiederkehr M, Gutierrez-Arcelus M, Panousis NI, et al. (2013) Coordinated effects of sequence variation on DNA binding, chromatin structure and transcription. Science, 342, 744–747.

4. Maurano MT, Humbert R, Rynes E, Thurman RE, Haugen E, Wang H, Reynolds AP, Sandstrom R, Qu H, Brody J, et al. (2012) Systematic localization of common disease-associated variation in regulatory DNA. Science, 337, 1190–1195.

5. Gusev A, Lee SH, Trynka G, Finucane H, Vilhjalmsson BJ, Xu H, Zang C, Ripke S, Bulik-Sullivan B, Stahl E, et al. (2014) Partitioning heritability of regulatory and cell-type-specific variants across 11 common diseases. Am. J. Hum. Genet., 95, 535–552.

6. Farh KK, Marson A, Zhu J, Kleinewietfeld M, Housley WJ, Beik S, Shoresh N, Whitton H, Ryan RJ, Shishkin AA, et al. (2015) Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature, 518, 337–343.

7. Heinz S, Romanoski CE, Benner C and Glass CK (2015) The selection and function of cell type-specific enhancers. Nat. Rev. Mol. Cell Biol., 16, 144–154.

8. Arner E, Daub CO, Vitting-Seerup K, Andersson R, et al. (2015) Transcribed enhancers lead waves of coordinated transcription in transitioning mammalian cells. Science, 347, 1010–1014.

9. Holick MF (2007) Vitamin D Deficiency. N. Eng. J. Med., 357, 266–281. PMID: 17634462.

10. Martins D, Wolf M, Pan D, et al. (2007) Prevalence of cardiovascular risk factors and the serum levels of 25-hydroxyvitamin D in the United States: data from the Third National Health and Nutrition Examination Survey. Arch. Intern. Med., 167, 1159–1165.

11. Mokry LE, Ross S, Ahmad OS, Forgetta V, Smith GD, Leong A, Greenwood CMT, Thanassoulis G and Richards JB (2015) Vitamin D and Risk of Multiple Sclerosis: A Mendelian Randomization Study. PLoS Med., 12, 1–20.

12. Afzal S, Brondum-Jacobsen P, Bojesen SE and Nordestgaard BG (2014) Genetically low vitamin D concentrations and increased mortality: mendelian randomisation analysis in three large cohorts. BMJ, 349.

13. Kliewer SA, Umesono K, Mangelsdorf DJ and Evans RM (1992) Retinoid X receptor interacts with nuclear receptors in retinoic acid, thyroid hormone and vitamin D3 signalling. Nature, 355, 446–449.

14. Ramagopalan SV, Heger A, Berlanga AJ, Maugeri NJ, Lincoln MR, Burrell A, Handunnetthi L, Handel AE, Disanto G, Orton SM, et al. (2010) A ChIP-seq defined genome-wide map of vitamin D receptor binding: Associations with disease and evolution. Genome Res., 20, 1352–1360.

15. Rhee HS and Pugh BF (2011) Comprehensive Genome-wide Protein-DNA Interactions Detected at Single-Nucleotide Resolution. Cell, 147, 1408–1419.

16. Rhee HS and Pugh BF (2012) ChIP-exo method for identifying genomic location of DNA-binding proteins with near-single-nucleotide accuracy. Curr. Protoc. Mol. Biol., Chapter 21, Unit 21.24.

17. Yip K, Cheng C, Bhardwaj N, Brown J, Leng J, Kundaje A, Rozowsky J, Birney E, Bickel P, Snyder M and Gerstein M (2012) Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors. Genome Biol., 13, R48.

18. Carlberg C, Bendik I, Wyss A, Meier E, Sturzenbecker LJ, Grippo JF and Hunziker W (1993) Two nuclear signalling pathways for vitamin D. Nature, 361, 657–660.

19. Mathelier A, Zhao X, Zhang AW, Parcy F, Worsley-Hunt R, Arenillas DJ, Buchman S, Chen CY, Chou A, Ienasescu H, et al. (2014) JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res., 42, D142–D147.

20. Zambelli F, Pesole G and Pavesi G (2013) PscanChIP: Finding over-represented transcription factor-binding site motifs and their correlations in sequences from ChIP-Seq experiments. Nucleic Acids Res., 41, W535–W543.

21. Marchini J, Howie B, Myers S, McVean G and Donnelly P (2007) A new multipoint method for genome-wide association studies by imputation of genotypes. Nat. Genet., 39, 906–913.

22. Servin B and Stephens M (2007) Imputation-based analysis of association studies: candidate regions and quantitative traits. PLoS Genet., 3, e114.

23. Rozowsky J, Abyzov A, Wang J, Alves P, Raha D, Harmanci A, Leng J, Bjornson R, Kong Y, Kitabayashi N, et al. (2011) AlleleSeq: analysis of allele-specific expression and binding in a network framework. Mol. Sys. Biol., 7, 522.

24. Dunham I, Kundaje A, Aldred SF, Collins PJ, Davis CA, Doyle F, Epstein CB, Frietze S, Harrow J, Kaul R, et al. (2012) An integrated encyclopedia of DNA elements in the human genome. Nature, 489, 57–74.

25. Khurana E, Fu Y, Colonna V, Mu XJ, Kang HM, Lappalainen T, Sboner A, Lochovsky L, Chen J, Harmanci A, et al. (2013) Integrative annotation of variants from 1092 humans: application to cancer genomics. Science, 342, 1235587.

26. Fu Y, Liu Z, Lou S, Bedford J, Mu X, Yip KY, Khurana E and Gerstein M (2014) FunSeq2: A framework for prioritizing noncoding regulatory variants in cancer. Genome Biol., 15, 480.

27. Heinz S, Romanoski CE, Benner C, Allison KA, Kaikkonen MU, Orozco LD and Glass CK (2013) Effect of natural genetic variation on enhancer selection and function. Nature, 503, 487–492.

28. Lappalainen T, Sammeth M, Friedlander MR, 't Hoen PA, Monlong J, Rivas MA, Gonzalez-Porta M, Kurbatova N, Griebel T, Ferreira PG, et al. (2013) Transcriptome and genome sequencing uncovers functional variation in humans. Nature, 501, 506–511.

29. Eicher JD, Landowski C, Stackhouse B, Sloan A, Chen W, Jensen N, Lien JP, Leslie R and Johnson AD (2015) GRASP v2.0: an update on the genome-wide repository of associations between SNPs and phenotypes. Nucleic Acids Res., 43, 799–804.

30. Mokry LE, Ross S, Morris JA, Manousaki D, Forgetta V and Richards JB (2016) Genetically decreased vitamin D and risk of Alzheimer disease. Neurology, 87, 2567–2574.

31. Cusanovich DA, Pavlovic B, Pritchard JK and Gilad Y (2014) The functional consequences of variation in transcription factor binding. PLoS Genet., 10, e1004226.

32. Spivakov M (2014) Spurious transcription factor binding: non-functional or genetically redundant? Bioessays, 36, 798–806.

33. Villar D, Berthelot C, Aldridge S, Rayner TF, Lukk M, Pignatelli M, Park TJ, Deaville R, Erichsen JT, Jasinska AJ, et al. (2015) Enhancer evolution across 20 mammalian species. Cell, 160, 554–566.

34. Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al. (2005) Evolutionarily conserved elements in vertebrate, insect, worm and yeast genomes. Genome Res., 15, 1034–1050.

35. Jimenez-Lara AM and Aranda A (1999) The vitamin D receptor binds in a transcriptionally inactive form and without a defined polarity on a retinoic acid response element. FASEB J., 13, 1073–1081.

36. Toell A, Polly P and Carlberg C (2000) All natural DR3-type vitamin D response elements show a similar functionality in vitro. Biochem. J., 352 (Pt 2), 301–309.

37. Zhang J, Chalmers MJ, Stayrook KR, Burris LL, Wang Y, Busby SA, Pascal BD, Garcia-Ordonez RD, Bruning JB, Istrate MA, et al. (2011) DNA binding alters coactivator interaction surfaces of the intact VDR-RXR complex. Nat. Struct. Mol. Biol., 18, 556–563.

38. Drake JA, Bird C, Nemesh J, Thomas DJ, Newton-Cheh C, Reymond A, Excoffier L, Attar H, Antonarakis SE, Dermitzakis ET, et al. (2006) Conserved noncoding sequences are selectively constrained and not mutation cold spots. Nat. Genet., 38, 223–227.

39. Fay JC, Wyckoff GJ and Wu CI (2001) Positive and negative selection on the human genome. Genetics, 158, 1227–1234.

40. Orlov I, Rochel N, Moras D and Klaholz BP (2012) Structure of the full human RXR/VDR nuclear receptor heterodimer complex with its DR3 target DNA. Embo J., 31, 291–300.

41. Deplancke B, Alpern D and Gardeux V (2016) The Genetics of Transcription Factor DNA Binding Variation. Cell, 166, 538–554.

42. Reddy TE, Gertz J, Pauli F, Kucera KS, Varley KE, Newberry KM, Marinov GK, Mortazavi A, Williams BA, Song L, et al. (2012) Effects of sequence variation on differential allelic transcription factor occupancy and gene expression. Genome Res., 22, 860–869.

43. Levo M, Zalckvar E, Sharon E, Dantas Machado AC, Kalma Y, Lotam-Pompan M, Weinberger A, Yakhini Z, Rohs R and Segal E (2015) Unraveling determinants of transcription factor binding outside the core binding site. Genome Res., 25, 1018–1029.

44. Booth DR, Ding N, Parnell GP, Shahijanian F, Coulter S, Schibeci SD, Atkins AR, Stewart GJ, Evans RM, Downes M, et al. (2016) Cistromic and genetic evidence that the vitamin D receptor mediates susceptibility to latitude-dependent autoimmune diseases. Genes Immun., 17, 213–219.

45. Nakane M, Ma J, Ruan X, Kroeger PE and Wu-Wong R (2007) Mechanistic analysis of VDR-mediated renin suppression. Nephron Physiol., 107, 35–44.

46. Salehi-Tabar R, Nguyen-Yamamoto L, Tavera-Mendoza LE, Quail T, Dimitrov V, An BS, Glass L, Goltzman D and White JH (2012) Vitamin D receptor as a master regulator of the c-MYC/MXD1 network. Proc. Natl. Acad. Sci. USA, 109, 18827–18832.

47. Capellino S, Straub RH and Cutolo M (2014) Aromatase and regulation of the estrogen-to-androgen ratio in synovial tissue inflammation: common pathway in both sexes. Ann. N. Y. Acad. Sci., 1317, 24–31.

48. Tuoresmaki P, Vaisanen S, Neme A, Heikkinen S and Carlberg C (2014) Patterns of genome-wide VDR locations. PLoS One, 9, e96105.

49. Huang S, Laoukili J, Epping MT, Koster J, Holzel M, Westerman BA, Nijkamp W, Hata A, Asgharzadeh S, Seeger RC, et al. (2009) ZNF423 is critically required for retinoic acid-induced differentiation and is a marker of neuroblastoma outcome. Cancer Cell, 15, 328–340.

50. Matarese G, De Placido G, Nikas Y and Alviggi C (2003) Pathogenesis of endometriosis: natural immunity dysfunction or autoimmune disease? Trends Mol. Med., 9, 223–228.


51. Kunadian V, Ford GA, Bawamia B, Qiu W and Manson JE (2014) Vitamin D deficiency and coronary artery disease: A review of the evidence. Am. Heart J., 167, 283–291.

52. Ramagopalan SV, Maugeri NJ, Handunnetthi L, Lincoln MR, Orton SM, Dyment DA, Deluca GC, Herrera BM, Chao MJ, Sadovnick AD, et al. (2009) Expression of the multiple sclerosis-associated MHC class II Allele HLA-DRB1*1501 is regulated by vitamin D. PLoS Genet., 5, e1000369.

53. Caliskan M, Cusanovich DA, Ober C and Gilad Y (2011) The effects of EBV transformation on gene expression levels and methylation profiles. Hum. Mol. Genet., 20, 1643–1652.

54. Kuhn RM, Haussler D and Kent WJ (2013) The UCSC genome browser and associated tools. Brief. Bioinformatics, 14, 144–161.

55. Li H and Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25, 1754–1760.

56. Lunter G and Goodson M (2011) Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res., 21, 936–939.

57. Kharchenko PV, Tolstorukov MY and Park PJ (2008) Design and analysis of ChIP-seq experiments for DNA-binding proteins. Nat. Biotechnol., 26, 1351–1359.

58. Kundaje A, Jung LY, Kharchenko P, Wold B, Sidow A, Batzoglou S and Park P (2013) Assessment of ChIP-seq data quality using cross-correlation analysis.

59. Ramachandran P, Palidwor GA, Porter CJ and Perkins TJ (2013) MaSC: mappability-sensitive cross-correlation for estimating mean fragment length of single-end short-read sequencing data. Bioinformatics, 29, 444–450.

60. Feng J, Liu T, Qin B, Zhang Y and Liu XS (2012) Identifying ChIP-seq enrichment using MACS. Nat. Protoc., 7, 1728–1740.

61. Guo Y, Mahony S and Gifford DK (2012) High Resolution Genome Wide Binding Event Finding and Motif Discovery Reveals Transcription Factor Spatial Binding Constraints. PLoS Comput. Biol., 8, e1002638.

62. Stark R and Brown G (2011) DiffBind: differential binding analysis of ChIP-Seq peak data. R package.

63. Robinson M and Oshlack A (2010) A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol., 11, R25.

64. Heger A, Webber C, Goodson M, Ponting CP and Lunter G (2013) GAT: a simulation framework for testing the association of genomic intervals. Bioinformatics, 29, 2046–2048.

65. Ma W, Noble WS and Bailey TL (2014) Motif-based analysis of large nucleotide data sets using MEME-ChIP. Nat. Protoc., 9, 1428–1450.

66. Bailey TL, Boden M, Buske FA, Frith M, Grant CE, Clementi L, Ren J, Li WW and Noble WS (2009) MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res., 37, W202–W208.

67. Hartmann H, Guthohrlein EW, Siebert M, Luehr S and Soding J (2013) P-value-based regulatory motif discovery using positional weight matrices. Genome Res., 23, 181–194.

68. Pavesi G, Mereghetti P, Zambelli F, Stefani M, Mauri G and Pesole G (2006) MoD Tools: regulatory motif discovery in nucleotide sequences from co-regulated or homologous genes. Nucleic Acids Res., 34, W566–W570.

69. McDaniell R, Lee BK, Song L, Liu Z, Boyle AP, Erdos MR, Scott LJ, Morken MA, Kucera KS, Battenhouse A, et al. (2010) Heritable Individual-Specific and Allele-Specific Chromatin Signatures in Humans. Science, 328, 235–239.

70. Pickrell JK, Marioni JC, Pai AA, Degner JF, et al. (2010) Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature, 464, 768–772.

71. Degner JF, Marioni JC, Pai AA, Pickrell JK, Nkadori E, Gilad Y and Pritchard JK (2009) Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data. Bioinformatics, 25, 3207–3212.

72. Pickrell JK, Gaffney DJ, Gilad Y and Pritchard JK (2011) False positive peaks in ChIP-seq and other sequencing-based functional assays caused by unannotated high copy number regions. Bioinformatics, 27, 2144–2146.

73. Stephens M and Balding DJ (2009) Bayesian statistical methods for genetic association studies. Nat. Rev. Genet., 10, 681–690.

74. Howie BN, Donnelly P and Marchini J (2009) A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet., 5, e1000529.

75. Scott LJ, Mohlke KL, Bonnycastle LL, Willer CJ, et al. (2007) A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science, 316, 1341–1345.

76. Burton PR, Clayton DG, Cardon LR, Craddock N, et al. (2007) Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature, 447, 661–678.

77. Wakefield J (2009) Bayes factors for genome-wide association studies: comparison with P-values. Genet. Epidemiol., 33, 79–86.

78. Anders S, McCarthy DJ, Chen Y, Okoniewski M, Smyth GK, Huber W and Robinson MD (2013) Count-based differential expression analysis of RNA sequencing data using R and Bioconductor. Nat. Protoc., 8, 1765–1786.

79. Ernst J, Kheradpour P, Mikkelsen TS, Shoresh N, Ward LD, Epstein CB, Zhang X, Wang L, Issner R, Coyne M, et al. (2011) Mapping and analysis of chromatin state dynamics in nine human cell types. Nature, 473, 43–49.

80. Bailey TL and Machanick P (2012) Inferring direct DNA binding from ChIP-seq. Nucleic Acids Res., 40, e128.

81. McLean CY, Bristor D, Hiller M, Clarke SL, Schaar BT, Lowe CB, Wenger AM and Bejerano G (2010) GREAT improves functional interpretation of cis-regulatory regions. Nat. Biotechnol., 28, 495–501.

82. Touzet H and Varre JS (2007) Efficient and accurate P-value computation for position weight matrices. Algorithms Mol. Biol., 2, 15.



P = 1.9 × 10⁻¹¹ and post-hoc pairwise Kruskal-Wallis comparisons) but not in class II sites (Kruskal-Wallis test, P = 0.14), consistent with the DR3 heterodimer-binding motif regulating transcription.

Next, we assessed the impact on the VDR binding affinity of VDR-BVs with respect to their location within the consensus DR3-type motif (JASPAR database; Fig. 5), the transcriptionally active binding site motif for the RXR::VDR heterodimer (35–37). Mapping of allele-specific ChIP-exo reads to multiply replicated VDR binding regions (CPo3 peaks) was highly predictive of whether variants substantially alter ('break') functional motifs. For 89% of observations, the change in the strength of the VDR binding motif correctly predicts the direction of VDR binding affinity change (Fig. 5A), a proportion that rises to 100% in class I VDR binding sites (Supplementary Material, Fig. S18A). Motif breaks do not favour 5′ RXR over 3′ VDR hexamers (Fig. 5A; Wilcoxon rank sum test, P = 0.71) but especially impact conserved G and C nucleotides (positions 2, 5, 11 and 14) and a T nucleotide (position 13). Notably, fewer events occur at the near-essential T nucleotides at positions 4 and 12. Interestingly, in class I sites there are no VDR-BVs disrupting the T nucleotide at position 12 (Supplementary Material, Fig. S18A), which could reflect either low mutation rates (mutational 'cold spots') or strong purifying selection against deleterious nucleotide substitutions at these binding sites (38).
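The comparison between motif-strength change and binding-affinity change can be illustrated with a minimal Python sketch. It assumes a log-odds position weight matrix scored against a uniform background; the three-position count matrix, the pseudocount and all function names are hypothetical and are not the JASPAR DR3 matrix or the FunSeq2/TFMPvalue procedure used in the study.

```python
import math

# Hypothetical 3-position count matrix (rows: positions; columns: A, C, G, T).
# These counts are invented for illustration and are not the JASPAR DR3 matrix.
BASES = "ACGT"
COUNTS = [
    [2, 1, 16, 1],   # position 1: mostly G
    [1, 1, 1, 17],   # position 2: mostly T
    [1, 16, 2, 1],   # position 3: mostly C
]

def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert a count matrix to log2 odds scores against a uniform background."""
    pwm = []
    for row in counts:
        total = sum(row) + 4 * pseudocount
        pwm.append([math.log2(((c + pseudocount) / total) / background) for c in row])
    return pwm

def score(pwm, seq):
    """Sum the per-position log-odds scores for a sequence of the motif's length."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match motif length")
    return sum(pwm[i][BASES.index(b)] for i, b in enumerate(seq))

def predicted_direction(pwm, ancestral_seq, derived_seq):
    """Return 'LOB' if the derived allele lowers the PWM score, 'GOB' if it raises it."""
    delta = score(pwm, derived_seq) - score(pwm, ancestral_seq)
    if delta < 0:
        return "LOB"
    if delta > 0:
        return "GOB"
    return "none"

def concordance(predictions, observations):
    """Fraction of variants whose predicted direction matches the observed one."""
    matches = sum(p == o for p, o in zip(predictions, observations))
    return matches / len(predictions)
```

Under these assumptions, a derived allele that replaces a high-scoring base is predicted to be a loss of binding, and `concordance` gives the fraction of variants whose predicted direction matches the direction observed from allele-specific read depth.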

To distinguish these possibilities, we next compared the population frequencies of alleles associated with either historical Loss Of Binding (LOB) or Gain Of Binding (GOB) affinity to VDR (Methods). LOB VDR-BVs cause significantly stronger effects when within either the 5′ or 3′ hexamer than within the DR3 spacer sequence (Wilcoxon rank sum test, P < 0.01; GOB variants, P = 0.11). VDR-BVs affecting one of the most highly conserved motif positions (position 13) always result in a motif-breaking LOB effect (Fig. 5C and D; Supplementary Material, Fig. S18C and D).

[Figure 5 panels: (A) VDR-BV: concordance of variant genotype and binding affinity phenotype (motif break events at motif positions 1-15; genotype and phenotype concordant or discordant). (B) Sequence logo (bits) of the motif: 5′ hexamer (RXR), DR3 spacer, 3′ hexamer (VDR). (C) VDR-BV (LOB): impact on binding affinity (no asymmetric binding; LOB; LOB + motif break). (D) VDR-BV (GOB): impact on binding affinity (no asymmetric binding; GOB). Vertical axis in C and D: magnitude of VDR binding affinity change.]

Figure 5. Genetic variation modulating VDR binding affinity in LCLs within the canonical RXR::VDR heterodimeric binding motif. The figure shows a genome-wide quantification of the effect of genetic variation on VDR binding, based on the analysis of VDR-BVs in CPo3 binding peaks which hit the canonical RXR::VDR DR3 heterodimeric consensus motif (motif shown in B; JASPAR database (19)). (A) Distribution across the RXR::VDR motif of VDR-BVs which cause significant motif disruption (significance assessed via FunSeq2 using TFMPvalue (82) with default threshold P < 4 × 10⁻⁸) and quantification of the concordance between directionality of VDR PWM disruption and directionality of resulting VDR binding affinity variation: 102/115 (89%) VDR-BVs in CPo3 VDR peaks that significantly break the RXR::VDR motif predict the direction of VDR binding affinity change. (C, D) Genome-wide quantification of the phenotypic effect of all VDR-BVs intersecting the RXR::VDR motif (including those which do not generate a motif break at the above significance level). The vertical axis (Magnitude of VDR binding affinity change) indicates the fold change of read depth of the ancestral versus the derived allele (C) or derived versus the ancestral allele (D). (C) Impact on VDR binding affinity of Loss of Binding (LOB, orange dots) VDR-BVs and Loss of Binding VDR-BVs which test for significant RXR::VDR motif break (red dots). (D) Impact on VDR binding affinity of Gain of Binding (GOB) VDR-BVs (blue dots). Nucleotide position has a significant influence on LOB (Kruskal-Wallis rank sum test, χ²(14) = 33.385, P < 0.01) but not on GOB (χ²(12) = 17.68, P = 0.12) phenotype magnitude. (C, D) Grey dots indicate impact on binding affinity of those 1000 Genomes variants carried by the LCL samples which do not test for significant asymmetric binding (AlleleSeq (23), FDR threshold = 0.1) and are not therefore VDR-BVs.

We next found that RXR::VDR motif sequences have been under constraint during recent evolution within the extant human population. We did so by comparing the frequencies of derived alleles (Derived Allele Frequency [DAF] test) for LOB or GOB VDR-BVs in either Northern and Western European (CEU) or Yoruban (YRI) populations. In the absence of demographic confounders, stronger purifying selection is implied by an excess of low population frequency variants in one set of variants over another (39).

We detected significant increases in DAF distribution for GOB over LOB VDR-BVs, both for CEU (Fig. 6C, top-left panel) and YRI (Fig. 6C, bottom-left panel) cohorts. Class I binding site VDR-BVs exhibited significant differences in GOB versus LOB DAF distributions (Supplementary Material, Fig. S19; P < 0.05 for both CEU and YRI). By contrast, variants in RXR::VDR hexamers not assigned as VDR-BVs showed no significant difference in GOB versus LOB DAF distributions (Fig. 6C, top-right and bottom-right panels). Significant differences between GOB and LOB DAFs were attained even when observing the RXR and VDR motif hexamers separately, and in both sub-populations (P < 0.012). Consequently, LOB variants are under substantially stronger purifying selection than are GOB variants. This is an important finding because it indicates that reduced binding of VDR at these genomic positions, and presumably reduced vitamin D levels, are commonly deleterious.
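As an illustration of the DAF comparison, the following Python sketch implements a two-sided Wilcoxon rank sum test using the normal approximation. It is a simplified stand-in written for this purpose, not the software used in the study: it applies no tie or continuity correction, and the function name is hypothetical.

```python
from statistics import NormalDist

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum (Mann-Whitney U) test, normal approximation.

    Ties receive average ranks; no tie or continuity correction is applied,
    so the P-value is approximate for small or heavily tied samples.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2          # U statistic for the first sample
    mean_u = n1 * n2 / 2
    sd_u = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u1 - mean_u) / sd_u
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, p
```

Called with the DAFs of GOB variants as `x` and those of LOB variants as `y`, a small P-value together with a large U for the first sample would indicate that GOB alleles tend to segregate at higher frequencies than LOB alleles.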

Discussion

We have demonstrated how genetic variation alters the binding affinity of the vitamin D receptor (VDR), a member of the nuclear receptor family of transcription factors. Genetic variants significantly associated with immune and inflammatory disease were found to disrupt VDR binding and to be located preferentially within enhancer regions (Fig. 3), in line with a recent analysis of causal immune disease-related variants (6). The study benefited from the 5-fold greater spatial resolution and signal-to-noise ratios provided by the ChIP-exo assay.

In its best studied model of allosteric modulation, the RXR::VDR heterodimer binds to DNA and enhances transcription via response elements that typically consist of two hexameric ((A/G)G(T/G)TCA) half-sites, with RXR and VDR DNA binding domains occupying the upstream and downstream half-site, respectively (40). This DR3 motif, which we previously identified using ChIP-seq data from two calcitriol-activated CEU LCLs (14), was also the predominant VDR binding interface in calcitriol-activated LCLs in the present ChIP-exo study. We identified strong enrichment of VDR binding within enhancers, with stronger enrichments associated with the more stringent subsets of VDR-BVs (Figs 1 and 3). Whilst VDR binding occurs preferentially within promoters, these interactions are only rarely associated with RXR::VDR DR3 motifs and may be mediated by additional DNA-binding co-factors, including CTCF, and might reflect unproductive binding to open chromatin. In contrast, those VDR binding sites containing strong instances of the RXR::VDR DR3 motif were particularly enriched in enhancers and also near to genes involved in immunity processes and related diseases (Fig. 2). VDR thus is expected to exert a substantial effect on the biology of the immune system as a sequence-specific enhancer regulator.

[Figure 6 panels A-D: histograms of Derived Allele Frequency for GOB and LOB variants (horizontal axis: frequency), shown for VDR-BVs and for variants not affecting VDR binding; ns = not significant.]

Figure 6. Evolutionary conservation of VDR-BVs at RXR::VDR consensus motifs. The diagrams show distributions of the Derived Allele Frequency (DAF) for variants in RXR::VDR consensus motifs within reproducible CPo3 binding peaks. DAF values are separated by ethnicity (A and C, CEU; B and D, YRI); variants are separated by their effect on VDR binding affinity (A and B, VDR-BVs; C and D, 1000 Genomes variants carried by the LCL samples which do not test for significant asymmetric binding and are not therefore VDR-BVs). Within each of the four panels, variants are split in two groups based on their effect on VDR binding affinity direction (whether GOB or LOB). For all quadrants, only DAFs for variants hitting hexamer positions (i.e. hitting either the RXR recognition element at positions 1-6 or the VDR recognition element at positions 10-15) in the RXR::VDR motif are shown. An asterisk indicates significance (non-parametric Wilcoxon rank sum tests, α = 0.05). A, Wilcoxon rank sum test, P = 2.8 × 10⁻⁴; B, Wilcoxon rank sum test, P = 4.6 × 10⁻⁵; C and D, P = 0.55 and P = 0.14, respectively; ns = not significant.

Most remarkably, we found variants weakening or strengthening the RXR::VDR PWM score to be highly predictive of the loss or gain of VDR binding affinity (Fig. 5). Furthermore, the VDR-bound RXR::VDR motif shows levels of sequence conservation across vertebrate evolution that mirror its information content, and loss-of-binding variants have been under stronger purifying selection over modern human evolution than gain-of-binding variants (Fig. 6). Together, these results imply that VDR binding, and thus presumably vitamin D levels, have in general been protective against disease.

Our study will have underestimated the true extent of regulatory variation at VDR binding sites. This is because the asymmetric binding analysis (23), by definition, can only test variants which are heterozygous in a given LCL sample; therefore, true positive VDR-BVs will be unobserved if all LCL samples we considered are homozygous. Similarly, not all of the required three biallelic genotypes will have been present in these samples for the regression-on-genotype analysis (21,22). Consequently, a larger scale study would be expected to show improved power to detect VDR-BVs.
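The dependence on heterozygosity can be made concrete with a small Python sketch of an allelic imbalance test. It uses a plain exact binomial test against an expected 50:50 allele ratio, which is an assumption made here for illustration and not a description of the AlleleSeq pipeline (23); the function names and the genotype encoding are hypothetical.

```python
from math import comb

def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial P-value: sum of outcomes no more likely than k."""
    pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= threshold))

def allelic_imbalance(genotype, ref_reads, alt_reads):
    """Return a P-value for a heterozygous site, or None if the site is untestable."""
    if genotype[0] == genotype[1]:
        return None  # homozygous: there are no two alleles to compare
    n = ref_reads + alt_reads
    if n == 0:
        return None  # no reads overlap the variant
    return binomial_two_sided_p(ref_reads, n)
```

A variant that is homozygous in every sample returns `None` throughout, so it can never be called as a binding variant however strongly it affects binding, which is the source of the underestimate described above.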

At least 90% of VDR-BVs lay outside of RXR::VDR DR3 motifs. The majority of variable binding could occur, as with other TFs (41), at weak or non-canonical binding motifs, or VDR affinity may alter because of genetic disruption to a cofactor binding site (42). Another explanation is that VDR-BVs commonly exert their effect by altering the DNA conformation of regions flanking the core binding site (43). Nevertheless, our de novo motif discovery identified several proteins as potential RXR::VDR cofactors whose altered DNA-binding affinity could influence VDR binding. Over-representation of motifs in VDR peaks has been observed before, for example in a recent study of VDR binding in monocytes and monocyte-derived inflammatory and tolerogenic dendritic cells (44). Direct interaction between VDR and CREB1 has been reported previously (45), as have functional interactions between vitamin D/VDR and oestrogen metabolism and MYC proteins (46,47). Co-occurrence of GABPA and ESRRB motifs with VDR-binding sites has been reported previously (48). ZNF423 is not known to bind VDR, yet it might do so indirectly because it physically associates with its heterodimer partner RXRα (49).

Our finding that VDR-BVs are significantly enriched within eQTLs (Fig. 3C) is intriguing because the eQTLs were inferred from LCLs not treated with calcitriol, and implies that these enhancers' activities may not be entirely vitamin D-dependent in these cells. Nevertheless, there was no clear concordance or discordance between the direction of change of VDR binding at VDR-BVs (in calcitriol-treated LCLs) and the direction of change of gene expression in calcitriol-minus human LCLs reported for the GEUVADIS eQTL variants (data not shown). This, and the lack of enrichment of the DR3 motif in the VDR binding sites for the basal (unstimulated) state in LCLs (14), indicate that a study of calcitriol-activated LCLs at a similar scale to those employing unstimulated LCLs (28) will be required to fully resolve this issue.

Our most important results derived from a conservative analysis of a stringent set of replicated binding variants (Fig. 4B). From this we report a significant excess of VDR-rBVs coincident or in strong LD with genome-wide significant GWAS tag variants for six disorders, including three that are autoimmune disorders (inflammatory bowel disease, Crohn's disease and rheumatoid arthritis), whilst a fourth, endometriosis, is frequently comorbid with autoimmune disorders (50). Deficiency of serum 25(OH)D levels is associated with cardiovascular disease risk factors in adults (10), while the association between vitamin D levels and coronary artery disease, as for many disorders, is debated (51). These results associate a molecular phenomenon, the genetic disruption of nuclear receptor binding, with a narrow set of immune-related syndromes and diseases.

Based on the combined functional, evolutionary and GWAS-based evidence, we propose that VDR-BVs represent good targets for subsequent experimental validation using, for example, genome editing in cells. It is also expected that only a small minority of genes bound by VDR will be altered in expression. A large-scale expression QTL study in calcitriol-activated LCLs will thus be required to further narrow down the set of candidate disease genes whose regulatory elements are variably bound by VDR and which are differentially expressed across the human population.

We have provided the first quantitative association-based analysis to explain the genetic effect on binding affinity variation for a nuclear receptor and to propose a mechanistic model for disease susceptibility. Our results support the hypothesis that DNA variants altering transcription factor binding at enhancers contribute to complex disease aetiology, and suggest that altered VDR binding, and by inference variable vitamin D levels, explain in part altered autoimmune and other complex disease risk.

Materials and Methods

VDR ChIP-exo sample preparation

Calcitriol-stimulated lymphoblastoid cell lines (LCLs) were grown and prepared for ChIP as described previously, with some modifications (52), and adapted for ChIP-exo (15,16). LCLs are known to be a good model of primary B-cells (53), and we chose lines from HapMap, a unique resource whose genomes have already been sequenced. Briefly, cells were incubated in phenol red-free RPMI-1640, 10% charcoal-stripped FBS, 2 mM L-glutamine, penicillin with streptomycin solution (100 U/ml + 100 µg/ml) medium at 37°C and 5% CO2. Cells were harvested after stimulation for 36 hours with 0.1 mM calcitriol (Sigma), cross-linked using a 1% formaldehyde buffer for 15 minutes at room temperature and quenched with 0.125 M glycine. Cells were lysed and chromatin sheared by sonication into fragments of 200–1000 bp. VDR-bound genomic DNA regions were isolated using a rabbit polyclonal antibody against VDR (Santa Cruz Biotechnology, sc-1008). Immunoprecipitated chromatin was then processed enzymatically on magnetic beads. Samples were polished, A-tailed, ligated to sequencing library adaptors and then digested with lambda exonuclease to remove nucleotides from the 5′ ends of double-stranded DNA. Single-stranded DNA was eluted and converted to double-stranded DNA by primer annealing and extension. A second sequencing adaptor was ligated to the exonuclease-treated ends, and the products were PCR amplified, gel purified and sequenced. Libraries were prepared for Illumina as described in (16).

Data processing and peak calling

Full data processing steps are presented in the Supplementary Material. Briefly, we mapped the ChIP-exo sequence reads (Peconics Inc., USA; single-end, 40 bp) against the hg19 build of the human reference genome (54) using BWA (v. 0.7.4 (55)) followed by Stampy (v. 1.0.21 (56)). We filtered the reads to retain only uniquely mapping reads with MAPQ > 20 and removed reads mapping to empirical blacklist regions identified by the ENCODE consortium (24).
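The read filtering step can be sketched in Python as follows. This is a minimal illustration rather than the pipeline used in the study: reads are represented as plain dictionaries with hypothetical `chrom`, `pos` and `mapq` keys, blacklist intervals are assumed to be half-open and non-overlapping, and the uniqueness criterion is not modelled beyond the MAPQ threshold.

```python
from bisect import bisect_right

def build_blacklist_index(blacklist):
    """Group sorted, non-overlapping (chrom, start, end) intervals by chromosome."""
    index = {}
    for chrom, start, end in sorted(blacklist):
        index.setdefault(chrom, []).append((start, end))
    return index

def in_blacklist(index, chrom, pos):
    """True if 0-based position pos lies in a half-open blacklist interval."""
    intervals = index.get(chrom, [])
    # Locate the last interval whose start is <= pos, then check its end.
    i = bisect_right(intervals, (pos, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos < intervals[i][1]

def filter_reads(reads, index, min_mapq=20):
    """Keep reads with MAPQ > min_mapq whose start is outside blacklisted regions."""
    return [
        r for r in reads
        if r["mapq"] > min_mapq and not in_blacklist(index, r["chrom"], r["pos"])
    ]
```

In practice this kind of filtering is usually done directly on alignment files with dedicated tools; the sketch only makes the two criteria (mapping quality and blacklist exclusion) explicit.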

For peak calling, we first computed strand cross-correlation profiles (57) of read start densities to obtain consensus estimates of the ChIP-exo digested fragment sizes, which we corrected for the presence of phantom peaks (58) using a mappability-corrected approach to cross-correlation inference (MaSC (59)). We used the cross-correlation-based estimates of the digested fragment size as a MACS2 (version 2.0.10 (60)) input parameter to obtain peak calls. Separately, to increase sensitivity, we called peaks using GPS/GEM (v2.4.1 (61)) with the recommended ChIP-exo options --smooth 3 --mrc 20. We then pooled, for each sample, the two sets of peak calls (Supplementary Material). Finally, we used DiffBind (62) to manipulate the per-sample merged peaksets and to obtain consensus peaksets (CPo3, CPo10, CPo20) based on an overlap threshold. To do this, we normalised the read numbers using DiffBind's embedded edgeR routines, based on the Trimmed Means of M's (TMM) algorithm (63).
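The idea behind strand cross-correlation can be sketched as follows. This is a simplified illustration under stated assumptions, not MaSC or the procedure of (57): it counts raw co-occurrences of forward and reverse read starts at each shift instead of computing a normalised correlation, and it applies no phantom-peak or mappability correction. Function names and read positions are hypothetical.

```python
def strand_cross_correlation(fwd_starts, rev_starts, max_shift):
    """Return {shift: number of forward starts with a reverse start exactly `shift` bp downstream}."""
    rev_counts = {}
    for p in rev_starts:
        rev_counts[p] = rev_counts.get(p, 0) + 1
    return {
        shift: sum(rev_counts.get(p + shift, 0) for p in fwd_starts)
        for shift in range(max_shift + 1)
    }


def estimate_fragment_size(profile):
    """Shift with the highest co-occurrence count (the smallest such shift on ties)."""
    return max(profile, key=profile.get)


# Toy read starts, invented for illustration: most pairs are 30 bp apart.
fwd = [100, 200, 300, 400]
rev = [130, 230, 330, 431]
profile = strand_cross_correlation(fwd, rev, max_shift=100)
fragment_size = estimate_fragment_size(profile)
```

In this toy example the profile peaks at a shift of 30 bp, which would then be passed to the peak caller as the fragment size estimate.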

Interval overlap enrichment analysis

We used the Genomic Association Tester (64) to perform the randomization-based interval enrichment analyses. All analyses were based on 10,000 randomisations over the chosen background, and backgrounds differed depending on the specific analysis performed. All analyses accounted for genome-wide patterns of GC content variability and, where relevant, for uniquely mappable regions (40 bp reads). For the enrichment analysis of VDR-BVs in GWAS intervals and QTLs, we used an in-house bootstrapping pipeline to perform LD-filtering (to retain only LD-independent foreground and background variants) and DAF matching. Further details on annotations, background and analytical design are provided in the Supplementary Material.
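A randomization-based enrichment test of this kind can be sketched in a few lines of Python. The sketch below is not GAT: it assumes a single linear workspace, places segments uniformly at random while preserving their lengths, and ignores GC content, mappability, LD filtering and DAF matching. All names and coordinates are hypothetical.

```python
import random


def count_overlapping(segments, annotations):
    """Number of (start, end) segments overlapping at least one annotation interval."""
    return sum(
        any(s < a_end and a_start < e for a_start, a_end in annotations)
        for s, e in segments
    )


def enrichment_test(segments, annotations, workspace_len, n_sim=1000, seed=0):
    """Empirical one-sided enrichment test by random relocation of segments."""
    rng = random.Random(seed)
    observed = count_overlapping(segments, annotations)
    sims = []
    for _ in range(n_sim):
        shuffled = []
        for s, e in segments:
            length = e - s
            new_start = rng.randrange(0, workspace_len - length + 1)
            shuffled.append((new_start, new_start + length))
        sims.append(count_overlapping(shuffled, annotations))
    expected = sum(sims) / n_sim
    # Add-one correction so that the empirical P-value is never exactly zero.
    p_value = (1 + sum(x >= observed for x in sims)) / (1 + n_sim)
    return observed, expected, p_value


# Toy intervals, invented for illustration: 8 of 10 segments sit inside annotations.
annotations = [(1000, 1100), (5000, 5100)]
segments = [(1000 + 10 * i, 1010 + 10 * i) for i in range(4)]
segments += [(5000 + 10 * i, 5010 + 10 * i) for i in range(4)]
segments += [(20000, 20010), (30000, 30010)]
observed, expected, p_value = enrichment_test(segments, annotations, workspace_len=100000)
```

With n_sim randomisations the smallest attainable P-value is 1/(n_sim + 1), which is one reason to use a large number of randomisations such as the 10,000 reported above.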

Motif discovery

Full motif analysis steps are documented in the Supplementary Material. Briefly, we carried out de novo motif analysis using multiple parallel approaches: MEME-ChIP (65) from the MEME suite (66), XXmotif (67) and PScanChIP (20) from the Weeder/MoDtools suite (68). For the MEME-ChIP and XXmotif analyses, we used the peak intervals in CPo10 and extracted the sequence underlying each interval ± 150 bp from a repeat-masked version of the hg19 reference (54). For the PScanChIP analysis, we used as input the full set of CPo3 binding regions with Jaspar PWM descriptors, to which we added the best heterodimeric RXR::VDR PWM found by XXmotif and the best monomeric VDR PWM found by DREME (MEME-ChIP package). The global background chosen for the analysis consisted of built-in PScanChIP DNase I digital genomic footprinting data for the LCL CEPH individual NA12865. For the motif analysis at the 43,332 VDR-BVs, we used PScanChIP (20) with Jaspar PWM descriptors and a mixed background of the promoters and LCL DNase cut sites.
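To make the notion of a PWM score concrete, the following Python sketch scores a reference and an alternate allele against a log-odds matrix. It is an illustration only: the count matrix is a toy hexamer invented for this example (not a Jaspar matrix), the background is assumed uniform, and the pseudocount is an arbitrary choice.

```python
import math


def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2 odds against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in "ACGT") + 4 * pseudocount
        pwm.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / background)
            for b in "ACGT"
        })
    return pwm


def pwm_score(pwm, seq):
    """Sum of per-position log-odds for a sequence of the same length as the PWM."""
    if len(seq) != len(pwm):
        raise ValueError("sequence and PWM lengths differ")
    return sum(col[base] for col, base in zip(pwm, seq))


# Toy counts: each position strongly prefers one base of the hexamer AGGTCA.
consensus = "AGGTCA"
counts = [{b: (17 if b == c else 1) for b in "ACGT"} for c in consensus]
pwm = log_odds_pwm(counts)

ref_score = pwm_score(pwm, "AGGTCA")  # reference allele matches the consensus
alt_score = pwm_score(pwm, "AGGTTA")  # alternate allele changes position 5
delta = alt_score - ref_score         # negative delta: the variant weakens the motif
```

A negative delta corresponds to a motif-weakening variant and a positive delta to a motif-strengthening one.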

Differential binding analyses

We performed both allele-specific (VDR-ASB) and regression-on-genotype (VDR-QTL) differential binding analyses. Both methods start from analyses of read counts. For the VDR-QTL analysis, reads were normalised and covariate-corrected as detailed in the Supplementary Material.

For the VDR-ASB tests, we utilised the sequence composition of ChIP-exo sequence reads overlapping heterozygous SNPs to determine the sequences originating from each allele separately (23,69,70) and to identify allele-specific binding events showing a significant difference in the number of mapped reads between parental alleles. We employed a modified version of the AlleleSeq pipeline (23) to carry out the VDR-ASB analysis. Briefly, we tested for significant allelic imbalance among all read pile-ups intersecting a variant with a heterozygous genotype, given a null hypothesis of 50% paternal versus 50% maternal reads. We followed (23) in controlling for two major sources of bias typically encountered when running an allele-specific analysis of short read data: a bias due to mapping to the reference hg19 genome (71) and a bias due to unannotated copy number variants skewing the read counts for some of the SNPs being tested (72). In addition, we corrected for a third source of potentially erroneous ASB SNP calls by removing any significant allelic imbalance call falling within repeat regions contained in the ENCODE blacklist data (24).
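The null hypothesis of 50% paternal versus 50% maternal reads lends itself to an exact binomial test, sketched below in Python. This is an illustration of the statistical idea only; the modified AlleleSeq pipeline, its multiple-testing control and its bias corrections are not reproduced, and the read counts are invented.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial P-value: total probability of outcomes no more likely than k."""
    probs = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = probs[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in probs if q <= threshold))


# Hypothetical heterozygous site: 18 of 20 overlapping reads carry the paternal allele.
p_imbalanced = binomial_two_sided_p(18, 20)
# A balanced site for comparison: 10 of 20 reads carry the paternal allele.
p_balanced = binomial_two_sided_p(10, 20)
```

Only sites where a sample is heterozygous can be tested this way, because a homozygous site gives no way to assign reads to one parental allele or the other.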

For the VDR-QTL tests, we analysed all 27 LCL samples simultaneously. We grouped the samples based on the genotypes of the underlying variant and carried out QTL association testing between the variant's genotype (under the assumption of an additive genetic model (73)) and the phenotype at that location (the VDR binding affinity, based on the quantile-normalised ChIP-exo peak size at that location). We used an in-house pipeline based on SNPs and indels imputed with IMPUTE2 (74) and Bayesian regression modelling based on SNPTEST (21) and BIMBAM (22). Bayesian methods for analysing SNP associations are now an established tool in GWAS analysis (75–77) and show advantages over the use of p-values in power and interpretation (73), though they require tighter initial modelling assumptions than frequentist methods. For details on the underlying model and priors, we refer the reader to the Supplementary Material.
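Under an additive genetic model, the phenotype is regressed on the allele dosage (0, 1 or 2 copies of the alternate allele). The Python sketch below fits this model by ordinary least squares. It is a frequentist stand-in used purely to illustrate the additive coding; it is not the Bayesian regression of SNPTEST or BIMBAM used in the study, and the dosages and peak sizes are invented.

```python
def additive_fit(dosages, phenotypes):
    """Least-squares slope and intercept of phenotype on allele dosage (0, 1, 2)."""
    n = len(dosages)
    if n != len(phenotypes) or n < 2:
        raise ValueError("need matching dosage and phenotype vectors of length >= 2")
    mean_g = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    if sxx == 0:
        # All samples share one genotype, so the effect cannot be estimated.
        raise ValueError("no genotype variation among samples")
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(dosages, phenotypes))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_g
    return beta, intercept


# Toy data: normalised peak size rises by about one unit per alternate allele.
dosages = [0, 0, 1, 1, 2, 2]
peak_sizes = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
beta, intercept = additive_fit(dosages, peak_sizes)
```

The guard on genotype variation reflects a practical limit of regression on genotype: with few samples, many variants will not show all three genotype classes.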

Variant annotation

We annotated all variants associated with differences in VDR binding, obtained by pooling the VDR-QTL and the VDR-ASB variants (VDR-Binding Variants, VDR-BVs), according to the guidelines in (25) and using a customised version of the Funseq2 suite of algorithms (26). We added VDR-centric information to Funseq2: VDR CPo3 binding regions, motif intervals from the PScanChIP analysis for the VDR::RXR Jaspar DR3 motif and for the DREME VDR monomer, and VDR dimer/monomer PWM information. Gene annotation used Funseq2's default (Gencode v16), and the HOT annotation in Funseq was filtered to retain only LCL HOT information. We ran Funseq using the -m 2 -nc options so that all annotation referred to the ancestral allele of the variant.

Availability of Data and Material

Sequencing data from this study have been submitted to the Gene Expression Omnibus archive (GEO; http://www.ncbi.nlm.nih.gov/geo/; accession number GSE73254).

Supplementary Material

Supplementary Material is available at HMG online.

Acknowledgements

We thank Mark Gerstein, Jieming Chen, Joel Rozowsky, Alexej Abyzov, Ekta Khurana and Yao Fu for precious help and technical insight on the AlleleSeq, CNVnator and Funseq2 platforms. We also thank Matthew Stephens and Yongtao Guan for help and insight on the PIMASS and BIMBAM platforms. We thank Gosia Trynka and Kamil Slowikowksi for providing r2-based LD block data; Yuchun Guo and David Gifford for precious help and technical insight on the GEM peak caller; Gerton Lunter for help on the STAMPY platform; Andreas Heger for help and technical insight on the GAT software and genomic randomization testing; Rory Stark and Simon Anders for insight on differential analysis of ChIP-seq data; Laura Clarke for help with the 1000 Genomes data; and Daniel Gaffney for providing helpful pointers regarding the GEUVADIS QTL calls. We thank Li Shen and Ning-Yi Shao (Icahn School of Medicine at Mount Sinai) for insight into NGSplot utilisation, and Oscar Bedoya Reina and Christoffer Nellaker for helpful discussions.

Conflict of Interest statement. None declared.

Funding

Medical Research Council [to GG, AJBT, WH, CPP; MC_UU_120081, MC_UU_120211, MC_EX_G1000902]; Wellcome Trust and Genzyme [to SVR]; the Multiple Sclerosis Society UK [grant number 91509]; and the Council for Science and Technology (CONACyT), Mexico [grant number 211990 to AJBT]. Funding to pay the Open Access publication charges for this article was provided by the Research Councils UK (RCUK) open access fund.

References

1. Kasowski M, Kyriazopoulou-Panagiotopoulou S, Grubert F, Zaugg JB, Kundaje A, Liu Y, Boyle AP, Zhang QC, Zakharia F, Spacek DV, et al. (2013) Extensive variation in chromatin states across humans. Science, 342, 750–752.

2. McVicker G, van de Geijn B, Degner JF, Cain CE, Banovich NE, Raj A, Lewellen N, Myrthil M, Gilad Y and Pritchard JK (2013) Identification of Genetic Variants That Affect Histone Modifications in Human Cells. Science, 342, 747–749.

3. Kilpinen H, Waszak SM, Gschwind AR, Raghav SK, Witwicki RM, Orioli A, Migliavacca E, Wiederkehr M, Gutierrez-Arcelus M, Panousis NI, et al. (2013) Coordinated effects of sequence variation on DNA binding, chromatin structure and transcription. Science, 342, 744–747.

4. Maurano MT, Humbert R, Rynes E, Thurman RE, Haugen E, Wang H, Reynolds AP, Sandstrom R, Qu H, Brody J, et al. (2012) Systematic localization of common disease-associated variation in regulatory DNA. Science, 337, 1190–1195.

5. Gusev A, Lee SH, Trynka G, Finucane H, Vilhjalmsson BJ, Xu H, Zang C, Ripke S, Bulik-Sullivan B, Stahl E, et al. (2014) Partitioning heritability of regulatory and cell-type-specific variants across 11 common diseases. Am J Hum Genet, 95, 535–552.

6. Farh KK, Marson A, Zhu J, Kleinewietfeld M, Housley WJ, Beik S, Shoresh N, Whitton H, Ryan RJ, Shishkin AA, et al. (2015) Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature, 518, 337–343.

7. Heinz S, Romanoski CE, Benner C and Glass CK (2015) The selection and function of cell type-specific enhancers. Nat Rev Mol Cell Biol, 16, 144–154.

8. Arner E, Daub CO, Vitting-Seerup K, Andersson R, et al. (2015) Transcribed enhancers lead waves of coordinated transcription in transitioning mammalian cells. Science, 347, 1010–1014.

9. Holick MF (2007) Vitamin D Deficiency. N Engl J Med, 357, 266–281. PMID: 17634462.

10. Martins D, Wolf M, Pan D, et al. (2007) Prevalence of cardiovascular risk factors and the serum levels of 25-hydroxyvitamin D in the United States: data from the Third National Health and Nutrition Examination Survey. Arch Intern Med, 167, 1159–1165.

11. Mokry LE, Ross S, Ahmad OS, Forgetta V, Smith GD, Leong A, Greenwood CMT, Thanassoulis G and Richards JB (2015) Vitamin D and Risk of Multiple Sclerosis: A Mendelian Randomization Study. PLoS Med, 12, 1–20.

12. Afzal S, Brondum-Jacobsen P, Bojesen SE and Nordestgaard BG (2014) Genetically low vitamin D concentrations and increased mortality: mendelian randomisation analysis in three large cohorts. BMJ, 349.

13. Kliewer SA, Umesono K, Mangelsdorf DJ and Evans RM (1992) Retinoid X receptor interacts with nuclear receptors in retinoic acid, thyroid hormone and vitamin D3 signalling. Nature, 355, 446–449.

14. Ramagopalan SV, Heger A, Berlanga AJ, Maugeri NJ, Lincoln MR, Burrell A, Handunnetthi L, Handel AE, Disanto G, Orton SM, et al. (2010) A ChIP-seq defined genome-wide map of vitamin D receptor binding: Associations with disease and evolution. Genome Res, 20, 1352–1360.

15. Rhee HS and Pugh BF (2011) Comprehensive Genome-wide Protein-DNA Interactions Detected at Single-Nucleotide Resolution. Cell, 147, 1408–1419.

16. Rhee HS and Pugh BF (2012) ChIP-exo method for identifying genomic location of DNA-binding proteins with near-single-nucleotide accuracy. Curr Protoc Mol Biol, Chapter 21, Unit 21.24.

17. Yip K, Cheng C, Bhardwaj N, Brown J, Leng J, Kundaje A, Rozowsky J, Birney E, Bickel P, Snyder M and Gerstein M (2012) Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors. Genome Biol, 13, R48.

18. Carlberg C, Bendik I, Wyss A, Meier E, Sturzenbecker LJ, Grippo JF and Hunziker W (1993) Two nuclear signalling pathways for vitamin D. Nature, 361, 657–660.

19. Mathelier A, Zhao X, Zhang AW, Parcy F, Worsley-Hunt R, Arenillas DJ, Buchman S, Chen CY, Chou A, Ienasescu H, et al. (2014) JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res, 42, D142–D147.

20. Zambelli F, Pesole G and Pavesi G (2013) PscanChIP: Finding over-represented transcription factor-binding site motifs and their correlations in sequences from ChIP-Seq experiments. Nucleic Acids Res, 41, W535–W543.

21. Marchini J, Howie B, Myers S, McVean G and Donnelly P (2007) A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet, 39, 906–913.

22. Servin B and Stephens M (2007) Imputation-based analysis of association studies: candidate regions and quantitative traits. PLoS Genet, 3, e114.

23. Rozowsky J, Abyzov A, Wang J, Alves P, Raha D, Harmanci A, Leng J, Bjornson R, Kong Y, Kitabayashi N, et al. (2011) AlleleSeq: analysis of allele-specific expression and binding in a network framework. Mol Sys Biol, 7, 522.

24. Dunham I, Kundaje A, Aldred SF, Collins PJ, Davis CA, Doyle F, Epstein CB, Frietze S, Harrow J, Kaul R, et al. (2012) An integrated encyclopedia of DNA elements in the human genome. Nature, 489, 57–74.

25. Khurana E, Fu Y, Colonna V, Mu XJ, Kang HM, Lappalainen T, Sboner A, Lochovsky L, Chen J, Harmanci A, et al. (2013) Integrative annotation of variants from 1092 humans: application to cancer genomics. Science, 342, 1235587.

26. Fu Y, Liu Z, Lou S, Bedford J, Mu X, Yip KY, Khurana E and Gerstein M (2014) FunSeq2: A framework for prioritizing noncoding regulatory variants in cancer. Genome Biol, 15, 480.

27. Heinz S, Romanoski CE, Benner C, Allison KA, Kaikkonen MU, Orozco LD and Glass CK (2013) Effect of natural genetic variation on enhancer selection and function. Nature, 503, 487–492.

28. Lappalainen T, Sammeth M, Friedlander MR, 't Hoen PA, Monlong J, Rivas MA, Gonzalez-Porta M, Kurbatova N, Griebel T, Ferreira PG, et al. (2013) Transcriptome and genome sequencing uncovers functional variation in humans. Nature, 501, 506–511.

29. Eicher JD, Landowski C, Stackhouse B, Sloan A, Chen W, Jensen N, Lien JP, Leslie R and Johnson AD (2015) GRASP v2.0: an update on the genome-wide repository of associations between SNPs and phenotypes. Nucleic Acids Res, 43, 799–804.

30. Mokry LE, Ross S, Morris JA, Manousaki D, Forgetta V and Richards JB (2016) Genetically decreased vitamin D and risk of Alzheimer disease. Neurology, 87, 2567–2574.

31. Cusanovich DA, Pavlovic B, Pritchard JK and Gilad Y (2014) The functional consequences of variation in transcription factor binding. PLoS Genet, 10, e1004226.

32. Spivakov M (2014) Spurious transcription factor binding: non-functional or genetically redundant? Bioessays, 36, 798–806.

33. Villar D, Berthelot C, Aldridge S, Rayner TF, Lukk M, Pignatelli M, Park TJ, Deaville R, Erichsen JT, Jasinska AJ, et al. (2015) Enhancer evolution across 20 mammalian species. Cell, 160, 554–566.

34. Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al. (2005) Evolutionarily conserved elements in vertebrate, insect, worm and yeast genomes. Genome Res, 15, 1034–1050.

35. Jimenez-Lara AM and Aranda A (1999) The vitamin D receptor binds in a transcriptionally inactive form and without a defined polarity on a retinoic acid response element. FASEB J, 13, 1073–1081.

36. Toell A, Polly P and Carlberg C (2000) All natural DR3-type vitamin D response elements show a similar functionality in vitro. Biochem J, 352 Pt 2, 301–309.

37. Zhang J, Chalmers MJ, Stayrook KR, Burris LL, Wang Y, Busby SA, Pascal BD, Garcia-Ordonez RD, Bruning JB, Istrate MA, et al. (2011) DNA binding alters coactivator interaction surfaces of the intact VDR-RXR complex. Nat Struct Mol Biol, 18, 556–563.

38. Drake JA, Bird C, Nemesh J, Thomas DJ, Newton-Cheh C, Reymond A, Excoffier L, Attar H, Antonarakis SE, Dermitzakis ET, et al. (2006) Conserved noncoding sequences are selectively constrained and not mutation cold spots. Nat Genet, 38, 223–227.

39. Fay JC, Wyckoff GJ and Wu CI (2001) Positive and negative selection on the human genome. Genetics, 158, 1227–1234.

40. Orlov I, Rochel N, Moras D and Klaholz BP (2012) Structure of the full human RXR/VDR nuclear receptor heterodimer complex with its DR3 target DNA. Embo J, 31, 291–300.

41. Deplancke B, Alpern D and Gardeux V (2016) The Genetics of Transcription Factor DNA Binding Variation. Cell, 166, 538–554.

42. Reddy TE, Gertz J, Pauli F, Kucera KS, Varley KE, Newberry KM, Marinov GK, Mortazavi A, Williams BA, Song L, et al. (2012) Effects of sequence variation on differential allelic transcription factor occupancy and gene expression. Genome Res, 22, 860–869.

43. Levo M, Zalckvar E, Sharon E, Dantas Machado AC, Kalma Y, Lotam-Pompan M, Weinberger A, Yakhini Z, Rohs R and Segal E (2015) Unraveling determinants of transcription factor binding outside the core binding site. Genome Res, 25, 1018–1029.

44. Booth DR, Ding N, Parnell GP, Shahijanian F, Coulter S, Schibeci SD, Atkins AR, Stewart GJ, Evans RM, Downes M, et al. (2016) Cistromic and genetic evidence that the vitamin D receptor mediates susceptibility to latitude-dependent autoimmune diseases. Genes Immun, 17, 213–219.

45. Nakane M, Ma J, Ruan X, Kroeger PE and Wu-Wong R (2007) Mechanistic analysis of VDR-mediated renin suppression. Nephron Physiol, 107, 35–44.

46. Salehi-Tabar R, Nguyen-Yamamoto L, Tavera-Mendoza LE, Quail T, Dimitrov V, An BS, Glass L, Goltzman D and White JH (2012) Vitamin D receptor as a master regulator of the c-MYC/MXD1 network. Proc Natl Acad Sci USA, 109, 18827–18832.

47. Capellino S, Straub RH and Cutolo M (2014) Aromatase and regulation of the estrogen-to-androgen ratio in synovial tissue inflammation: common pathway in both sexes. Ann N Y Acad Sci, 1317, 24–31.

48. Tuoresmaki P, Vaisanen S, Neme A, Heikkinen S and Carlberg C (2014) Patterns of genome-wide VDR locations. PLoS One, 9, e96105.

49. Huang S, Laoukili J, Epping MT, Koster J, Holzel M, Westerman BA, Nijkamp W, Hata A, Asgharzadeh S, Seeger RC, et al. (2009) ZNF423 Is Critically required for retinoic acid-induced differentiation and is a marker of neuroblastoma outcome. Cancer Cell, 15, 328–340.

50. Matarese G, De Placido G, Nikas Y and Alviggi C (2003) Pathogenesis of endometriosis: natural immunity dysfunction or autoimmune disease? Trends Mol Med, 9, 223–228.

51. Kunadian V, Ford GA, Bawamia B, Qiu W and Manson JE (2014) Vitamin D deficiency and coronary artery disease: A review of the evidence. Am Heart J, 167, 283–291.

52. Ramagopalan SV, Maugeri NJ, Handunnetthi L, Lincoln MR, Orton SM, Dyment DA, Deluca GC, Herrera BM, Chao MJ, Sadovnick AD, et al. (2009) Expression of the multiple sclerosis-associated MHC class II Allele HLA-DRB1*1501 is regulated by vitamin D. PLoS Genet, 5, e1000369.

53. Caliskan M, Cusanovich DA, Ober C and Gilad Y (2011) The effects of EBV transformation on gene expression levels and methylation profiles. Hum Mol Genet, 20, 1643–1652.

54. Kuhn RM, Haussler D and Kent WJ (2013) The UCSC genome browser and associated tools. Brief Bioinformatics, 14, 144–161.

55. Li H and Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25, 1754–1760.

56. Lunter G and Goodson M (2011) Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res, 21, 936–939.

57. Kharchenko PV, Tolstorukov MY and Park PJ (2008) Design and analysis of ChIP-seq experiments for DNA-binding proteins. Nat Biotechnol, 26, 1351–1359.

58. Kundaje A, Jung LY, Kharchenko P, Wold B, Sidow A, Batzoglou S and Park P (2013) Assessment of ChIP-seq data quality using cross-correlation analysis.

59. Ramachandran P, Palidwor GA, Porter CJ and Perkins TJ (2013) MaSC: mappability-sensitive cross-correlation for estimating mean fragment length of single-end short-read sequencing data. Bioinformatics, 29, 444–450.

60. Feng J, Liu T, Qin B, Zhang Y and Liu XS (2012) Identifying ChIP-seq enrichment using MACS. Nat Protoc, 7, 1728–1740.

61. Guo Y, Mahony S and Gifford DK (2012) High Resolution Genome Wide Binding Event Finding and Motif Discovery Reveals Transcription Factor Spatial Binding Constraints. PLoS Comput Biol, 8, e1002638.

62. Stark R and Brown G (2011) DiffBind: differential binding analysis of ChIP-Seq peak data. R package.

63. Robinson M and Oshlack A (2010) A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol, 11, R25.

64. Heger A, Webber C, Goodson M, Ponting CP and Lunter G (2013) GAT: a simulation framework for testing the association of genomic intervals. Bioinformatics, 29, 2046–2048.

65. Ma W, Noble WS and Bailey TL (2014) Motif-based analysis of large nucleotide data sets using MEME-ChIP. Nat Protoc, 9, 1428–1450.

66. Bailey TL, Boden M, Buske FA, Frith M, Grant CE, Clementi L, Ren J, Li WW and Noble WS (2009) MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res, 37, W202–W208.

67. Hartmann H, Guthohrlein EW, Siebert M, Luehr S and Soding J (2013) P-value-based regulatory motif discovery using positional weight matrices. Genome Res, 23, 181–194.

68. Pavesi G, Mereghetti P, Zambelli F, Stefani M, Mauri G and Pesole G (2006) MoD Tools: regulatory motif discovery in nucleotide sequences from co-regulated or homologous genes. Nucleic Acids Res, 34, W566–W570.

69. McDaniell R, Lee BK, Song L, Liu Z, Boyle AP, Erdos MR, Scott LJ, Morken MA, Kucera KS, Battenhouse A, et al. (2010) Heritable Individual-Specific and Allele-Specific Chromatin Signatures in Humans. Science, 328, 235–239.

70. Pickrell JK, Marioni JC, Pai AA, Degner JF, et al. (2010) Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature, 464, 768–772.

71. Degner JF, Marioni JC, Pai AA, Pickrell JK, Nkadori E, Gilad Y and Pritchard JK (2009) Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data. Bioinformatics, 25, 3207–3212.

72. Pickrell JK, Gaffney DJ, Gilad Y and Pritchard JK (2011) False positive peaks in ChIP-seq and other sequencing-based functional assays caused by unannotated high copy number regions. Bioinformatics, 27, 2144–2146.

73. Stephens M and Balding DJ (2009) Bayesian statistical methods for genetic association studies. Nat Rev Genet, 10, 681–690.

74. Howie BN, Donnelly P and Marchini J (2009) A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet, 5, e1000529.

75. Scott LJ, Mohlke KL, Bonnycastle LL, Willer CJ, et al. (2007) A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science, 316, 1341–1345.

76. Burton PR, Clayton DG, Cardon LR, Craddock N, et al. (2007) Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature, 447, 661–678.

77. Wakefield J (2009) Bayes factors for genome-wide association studies: comparison with P-values. Genet Epidemiol, 33, 79–86.

78. Anders S, McCarthy DJ, Chen Y, Okoniewski M, Smyth GK, Huber W and Robinson MD (2013) Count-based differential expression analysis of RNA sequencing data using R and Bioconductor. Nat Protoc, 8, 1765–1786.

79. Ernst J, Kheradpour P, Mikkelsen TS, Shoresh N, Ward LD, Epstein CB, Zhang X, Wang L, Issner R, Coyne M, et al. (2011) Mapping and analysis of chromatin state dynamics in nine human cell types. Nature, 473, 43–49.

80. Bailey TL and Machanick P (2012) Inferring direct DNA binding from ChIP-seq. Nucleic Acids Res, 40, e128.

81. McLean CY, Bristor D, Hiller M, Clarke SL, Schaar BT, Lowe CB, Wenger AM and Bejerano G (2010) GREAT improves functional interpretation of cis-regulatory regions. Nat Biotechnol, 28, 495–501.

82. Touzet H and Varre JS (2007) Efficient and accurate P-value computation for position weight matrices. Algorithms Mol Biol, 2, 15.


spacer sequence (Wilcoxon rank sum test, P < 0.01; GOB variants, P = 0.11). VDR-BVs affecting one of the most highly conserved motif positions (position 13) always result in a motif-breaking LOB effect (Fig. 5C and D; Supplementary Material, Fig. S18C and D).

We next found that RXR::VDR motif sequences have been under constraint during recent evolution within the extant human population. We did so by comparing the frequencies of derived alleles (Derived Allele Frequency [DAF] test) for LOB or GOB VDR-BVs in either Northern and Western European (CEU) or Yoruban (YRI) populations. In the absence of demographic confounders, stronger purifying selection is implied by an excess of low population frequency variants in one set of variants over another (39).

We detected significant increases in DAF distribution for GOB over LOB VDR-BVs, both for CEU (Fig. 6C, top-left panel) and YRI (Fig. 6C, bottom-left panel) cohorts. Class I binding site VDR-BVs exhibited significant differences in GOB versus LOB DAF distributions (Supplementary Material, Fig. S19; P < 0.05 for both CEU and YRI). By contrast, variants in RXR::VDR hexamers not assigned as VDR-BVs showed no significant difference in GOB versus LOB DAF distributions (Fig. 6C, top-right and bottom-right panels). Significant differences between GOB and LOB DAFs were attained even when observing the RXR and VDR motif hexamers separately and in both sub-populations (P < 0.012). Consequently, LOB variants are under substantially stronger purifying selection than are GOB variants. This is an important finding because it indicates that reduced binding of VDR at these genomic positions, and presumably reduced vitamin D levels, are commonly deleterious.
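The DAF comparison rests on a Wilcoxon rank sum test between two sets of allele frequencies. A minimal Python sketch of such a test is given below, assuming a normal approximation with tie correction and no continuity correction; it is not the code used in the study, and the DAF values are invented for illustration.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum (Mann-Whitney U) test using a normal approximation.

    Returns (U statistic for x, z score, two-sided P-value).
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of the 1-based ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 0.0, 1.0  # all values identical: no evidence of a shift
    z = (u - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u, z, p_value


# Hypothetical DAFs: loss-of-binding variants skewed towards low frequencies.
lob_daf = [0.01, 0.02, 0.03, 0.05, 0.08]
gob_daf = [0.10, 0.20, 0.30, 0.40, 0.50]
u, z, p_value = rank_sum_test(lob_daf, gob_daf)
```

For samples this small an exact test would be preferable; the normal approximation is used here only to keep the sketch short.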

Discussion

We have demonstrated how genetic variation alters the binding affinity of the vitamin D receptor (VDR), a member of the nuclear receptor family of transcription factors. Genetic variants significantly associated with immune and inflammatory disease were found to disrupt VDR binding and to be located preferentially within enhancer regions (Fig. 3), in line with a recent analysis of causal immune disease-related variants (6). The study benefited from the 5-fold greater spatial resolution and signal-to-noise ratios provided by the ChIP-exo assay.

In its best studied model of allosteric modulation, the RXR::VDR heterodimer binds to DNA and enhances transcription via response elements that typically consist of two hexameric ((A/G)G(T/G)TCA) half-sites, with RXR and VDR DNA binding domains occupying the upstream and downstream half-site, respectively (40). This DR3 motif, which we previously identified using ChIP-seq data from two calcitriol-activated CEU LCLs (14), was also the predominant VDR binding interface in calcitriol-activated LCLs in the present ChIP-exo study. We identified strong enrichment of VDR binding within enhancers, with stronger enrichments associated with the more stringent subsets of VDR-BVs (Figs 1 and 3). Whilst VDR binding occurs preferentially within promoters, these interactions are only rarely associated with RXR::VDR DR3 motifs and may be mediated by additional DNA-binding co-factors, including CTCF, and might reflect unproductive binding to open chromatin. In contrast, those VDR binding sites containing strong instances of the RXR::VDR DR3 motif were particularly enriched in enhancers and also near to genes involved in immunity processes and related diseases (Fig. 2). VDR is thus expected to exert a substantial effect on the biology of the immune system as a sequence-specific enhancer regulator.

[Figure 6: panels A to D; x-axis, Frequency; groups, GOB and LOB; panel titles, "VDR-BVs" and "Variants not affecting VDR binding" (ns)]

Figure 6. Evolutionary conservation of VDR-BVs at RXR::VDR consensus motifs. The diagrams show distributions of the Derived Allele Frequency (DAF) for variants in RXR::VDR consensus motifs within reproducible CPo3 binding peaks. DAF values are separated by ethnicity (A and C, CEU; B and D, YRI); variants are separated by their effect on VDR binding affinity (A and B, VDR-BVs; C and D, 1000 Genomes variants carried by the LCL samples which do not test for significant asymmetric binding and are not therefore VDR-BVs). Within each of the four panels, variants are split in two groups based on their effect on VDR binding affinity direction (whether GOB or LOB). For all quadrants, only DAFs for variants hitting hexamer positions (i.e. hitting either the RXR recognition element at positions 1-6 or the VDR recognition element at positions 10-15) in the RXR::VDR motif are shown. An asterisk indicates significance (non-parametric Wilcoxon rank sum tests, α = 0.05). A: Wilcoxon rank sum test, P = 2.8 × 10^-4. B: Wilcoxon rank sum test, P = 4.6 × 10^-5. C and D: P = 0.55 and P = 0.14, respectively. ns = not significant.

Most remarkably, we found variants weakening or strengthening the RXR::VDR PWM score to be highly predictive of the loss or gain of VDR binding affinity (Fig. 5). Furthermore, the VDR-bound RXR::VDR motif shows evidence of levels of sequence conservation across vertebrate evolution that mirror its information content and, over modern human evolution, loss-of-VDR-binding variants have been under stronger purifying selection than gain-of-binding variants (Fig. 6). Together, these results imply that VDR binding, and thus presumably vitamin D levels, have in general been protective against disease.

Our study will have underestimated the true extent of regulatory variation at VDR binding sites. This is because the asymmetric binding analysis (23), by definition, can only test variants which are heterozygous in a given LCL sample; therefore, true positive VDR-BVs will be unobserved if all LCL samples we considered are homozygous. Similarly, not all of the required three biallelic genotypes will have been present in these samples for the regression-on-genotype analysis (22,21). Consequently, a larger scale study would be expected to show improved power to detect VDR-BVs.

At least 90% of VDR-BVs lay outside of RXR::VDR DR3 motifs. The majority of variable binding could occur, as with other TFs (41), at weak or non-canonical binding motifs, or VDR affinity may alter because of genetic disruption to a cofactor binding site (42). Another explanation is that VDR-BVs commonly exert their effect by altering the DNA conformation of regions flanking the core binding site (43). Nevertheless, our de novo motif discovery identified several proteins as potential RXR::VDR cofactors whose altered DNA-binding affinity could influence VDR binding. Over-representation of motifs in VDR peaks has been observed before, for example in a recent study of VDR binding in monocytes and monocyte-derived inflammatory and tolerogenic dendritic cells (44). Direct interaction between VDR and CREB1 has been reported previously (45), as have functional interactions between vitamin D/VDR and oestrogen metabolism and MYC proteins (46,47). Co-occurrence of GABPA and ESRRB motifs with VDR-binding sites has been reported previously (48). ZNF423 is not known to bind VDR, yet it might do so indirectly because it physically associates with its heterodimer partner RXRα (49).

Our finding that VDR-BVs are significantly enriched within eQTLs (Fig. 3C) is intriguing because the eQTLs were inferred from LCLs not treated with calcitriol, which implies that these enhancers' activities may not be entirely vitamin D-dependent in these cells. Nevertheless, there was no clear concordance or discordance between the direction of change of VDR binding at VDR-BVs (in calcitriol-treated LCLs) and the direction of change of gene expression in calcitriol-minus human LCLs reported for the GEUVADIS eQTL variants (data not shown). This, and the lack of enrichment of the DR3 motif in the VDR binding sites for the basal (unstimulated) state in LCLs (14), indicate that a study of calcitriol-activated LCLs at a similar scale to those employing unstimulated LCLs (28) will be required to fully resolve this issue.

Our most important results derived from a conservative analysis of a stringent set of replicated binding variants (Fig. 4B). From this, we report a significant excess of VDR-rBVs coincident or in strong LD with genome-wide significant GWAS tag variants for six disorders, including three that are autoimmune disorders (inflammatory bowel disease, Crohn's disease and rheumatoid arthritis), whilst a fourth, endometriosis, is frequently comorbid with autoimmune disorders (50). Deficiency of serum 25(OH)D levels is associated with cardiovascular disease risk factors in adults (10), while the association between vitamin D levels and coronary artery disease, as for many disorders, is debated (51). These results associate a molecular phenomenon, the genetic disruption of nuclear receptor binding, with a narrow set of immune-related syndromes and diseases.

Based on the combined functional, evolutionary and GWAS-based evidence, we propose that VDR-BVs represent good targets for subsequent experimental validation using, for example, genome editing in cells. It is also expected that only a small minority of genes bound by VDR will be altered in expression. A large-scale expression QTL study in calcitriol-activated LCLs will thus be required to further narrow down the set of candidate disease genes whose regulatory elements are variably bound by VDR and which are differentially expressed across the human population.

We have provided the first quantitative, association-based analysis to explain the genetic effect on binding affinity variation for a nuclear receptor, and to propose a mechanistic model for disease susceptibility. Our results support the hypothesis that DNA variants altering transcription factor binding at enhancers contribute to complex disease aetiology, and suggest that altered VDR binding, and by inference variable vitamin D levels, explain in part altered autoimmune and other complex disease risk.

Materials and Methods

VDR ChIP-exo sample preparation

Calcitriol-stimulated lymphoblastoid cell lines (LCLs) were grown and prepared for ChIP as described previously, with some modifications (52), and adapted for ChIP-exo (15,16). LCLs are known to be a good model of primary B-cells (53), and we chose lines from HapMap, a unique resource whose genomes have already been sequenced. Briefly, cells were incubated in phenol red-free RPMI-1640, 10% charcoal-stripped FBS, 2 mM L-glutamine, penicillin with streptomycin solution (100 U/ml + 100 µg/ml) medium at 37 °C and 5% CO2. Cells were harvested after stimulation for 36 hours with 0.1 mM calcitriol (Sigma), cross-linked using a 1% formaldehyde buffer for 15 minutes at room temperature, and quenched with 0.125 M glycine. Cells were lysed and chromatin sheared by sonication into fragments of 200–1000 bp. VDR-bound genomic DNA regions were isolated using a rabbit polyclonal antibody against VDR (Santa Cruz Biotechnology, sc-1008). Immunoprecipitated chromatin was then processed enzymatically on magnetic beads. Samples were polished, A-tailed, ligated to sequencing library adaptors, and then digested with lambda exonuclease to remove nucleotides from 5' ends of double-stranded DNA. Single-stranded DNA was eluted and converted to double-stranded DNA by primer annealing and extension. A second sequencing adaptor was ligated to exonuclease-treated ends, PCR amplified, gel purified and sequenced. Libraries were prepared for Illumina as described in (16).

Data processing and peak calling

Full data processing steps are presented in the Supplementary Material. Briefly, we mapped the ChIP-exo sequence reads (Peconics Inc., USA; single-end, 40 bp) against the hg19 build of the human reference genome (54) using BWA (v 0.7.4 (55)), followed by Stampy (v 1.0.21 (56)). We filtered the reads to retain only uniquely mapping reads with MAPQ > 20, and removed reads mapping to empirical blacklist regions identified by the ENCODE consortium (24).
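The read-filtering step described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Read` record, the interval representation and the toy coordinates are assumptions made for illustration, and a real workflow would operate on BAM files with a dedicated toolkit.

```python
from typing import List, NamedTuple, Tuple


class Read(NamedTuple):
    """Hypothetical minimal alignment record (0-based, half-open coordinates)."""
    chrom: str
    start: int
    end: int
    mapq: int


Region = Tuple[str, int, int]  # (chrom, start, end), half-open


def overlaps(read: Read, region: Region) -> bool:
    """True if the read shares at least one base with the region."""
    chrom, start, end = region
    return read.chrom == chrom and read.start < end and start < read.end


def filter_reads(reads: List[Read], blacklist: List[Region], min_mapq: int = 20) -> List[Read]:
    """Keep reads with MAPQ strictly above min_mapq that miss every blacklist region."""
    return [
        r for r in reads
        if r.mapq > min_mapq and not any(overlaps(r, b) for b in blacklist)
    ]


# Toy data: one good read, one low-MAPQ read, one read inside a blacklisted region.
reads = [
    Read("chr1", 100, 140, 60),
    Read("chr1", 500, 540, 10),
    Read("chr1", 1000, 1040, 37),
]
blacklist = [("chr1", 990, 1100)]
kept = filter_reads(reads, blacklist)
```

The linear scan over blacklist regions keeps the sketch short; for genome-scale data an interval tree or sorted sweep would be the usual choice.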

For peak calling, we first computed strand cross-correlation profiles (57) of read start densities to obtain consensus estimates of the ChIP-exo digested fragment sizes, which we corrected for the presence of phantom peaks (58) using a mappability-corrected approach to cross-correlation inference (MaSC (59)). We used the cross-correlation-based estimates for the digested fragment size as a MACS2 (version 2.0.10 (60)) input parameter to obtain peak calls. Separately, to increase sensitivity, we called peaks using GPS/GEM (v 2.4.1 (61)) with recommended ChIP-exo options --smooth 3 --mrc 20. We then pooled, for each sample, the two sets of peak calls (Supplementary Material). Finally, we used DiffBind (62) to manipulate the per-sample merged peaksets and to obtain consensus peaksets (CPo3, CPo10, CPo20) based on an overlap threshold. To do this, we normalised the read numbers using DiffBind's embedded EdgeR routines, based on the Trimmed Means of M's (TMM) algorithm (63).
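As a rough illustration of the strand cross-correlation idea, the Python sketch below finds the shift that best aligns forward-strand and reverse-strand read-start counts. It is a simplification under stated assumptions: the per-base count vectors are toy data, and neither the phantom-peak nor the mappability correction (MaSC) used in the study is implemented.

```python
from statistics import mean
from typing import List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0


def cross_correlation_shift(fwd: List[float], rev: List[float], max_shift: int) -> Tuple[int, float]:
    """Return the shift d in 0..max_shift maximising corr(fwd[i], rev[i + d])."""
    best_shift, best_r = 0, float("-inf")
    for d in range(max_shift + 1):
        n = len(fwd) - d
        r = pearson(fwd[:n], rev[d:d + n])
        if r > best_r:
            best_shift, best_r = d, r
    return best_shift, best_r


# Toy read-start counts: reverse-strand pile-ups sit 5 bp downstream of forward ones.
fwd = [0] * 50
rev = [0] * 50
fwd[10], fwd[30] = 5, 3
rev[15], rev[35] = 5, 3
shift, r = cross_correlation_shift(fwd, rev, max_shift=10)
```

In this toy case the best-scoring shift equals the 5 bp offset built into the data; on real ChIP-exo data the profile is noisier and the read-length "phantom" peak must be accounted for before the estimate is used.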

Interval overlap enrichment analysis

We used the Genomic Association Tester (64) to perform the randomization-based interval enrichment analyses. All analyses were based on 10,000 randomisations over the chosen background, and backgrounds differed depending on the specific analysis performed. All analyses accounted for genome-wide patterns of GC content variability and, where relevant, for uniquely mappable regions (40 bp reads). For the enrichment analysis of VDR-BVs in GWAS intervals and QTLs, we used an in-house bootstrapping pipeline to perform LD-filtering (to retain only LD-independent foreground and background variants) and DAF matching. Further details on annotations, background and analytical design are provided in the Supplementary Material.
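The general logic of a randomisation-based interval enrichment test can be sketched in Python as follows. This is not GAT itself: the function names, the single contiguous workspace and the toy intervals are assumptions, the sketch uses 1,000 rather than 10,000 randomisations for speed, and it ignores GC content and mappability.

```python
import random
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open (start, end)


def overlap_bases(segments: List[Interval], annotation: List[Interval]) -> int:
    """Bases of `segments` covered by `annotation` (annotation assumed non-overlapping)."""
    total = 0
    for s_start, s_end in segments:
        for a_start, a_end in annotation:
            total += max(0, min(s_end, a_end) - max(s_start, a_start))
    return total


def randomisation_test(segments, annotation, workspace, n_iter=1000, seed=0):
    """Empirical enrichment of segment/annotation overlap against random placement."""
    rng = random.Random(seed)
    ws_start, ws_end = workspace
    observed = overlap_bases(segments, annotation)
    simulated = []
    for _ in range(n_iter):
        shuffled = []
        for start, end in segments:
            length = end - start
            # Place each segment uniformly within the workspace, keeping its length.
            new_start = rng.randint(ws_start, ws_end - length)
            shuffled.append((new_start, new_start + length))
        simulated.append(overlap_bases(shuffled, annotation))
    expected = sum(simulated) / n_iter
    exceed = sum(1 for s in simulated if s >= observed)
    p_value = (exceed + 1) / (n_iter + 1)  # add-one correction avoids p = 0
    fold = observed / expected if expected else float("inf")
    return observed, expected, fold, p_value


# Toy example: five 10 bp segments, all inside an annotation covering 10% of the workspace.
segments = [(5, 15), (20, 30), (40, 50), (60, 70), (80, 90)]
annotation = [(0, 100)]
observed, expected, fold, p_value = randomisation_test(segments, annotation, (0, 1000))
```

Shuffled segments are allowed to overlap one another here, which a production tool would normally prevent; the add-one p-value is a common conservative choice for empirical tests.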

Motif discovery

Full motif analysis steps are documented in the Supplementary Material. Briefly, we carried out de novo motif analysis using multiple parallel approaches: MEME-ChIP (65) from the MEME suite (66), XXmotif (67) and PScanChIP (20) from the Weeder/MoDtools suite (68). For the MEME-ChIP and XXmotif analyses, we used the peak intervals in CPo10 and extracted the sequence underlying each interval ± 150 bp from a repeat-masked version of the hg19 reference (54). For the PScanChIP analysis, we used as the input the full set of CPo3 binding regions with Jaspar PWM descriptors, to which we added the best heterodimeric RXR::VDR PWM found by XXmotif and the best monomeric VDR PWM found by DREME (MEME-ChIP package). The global background chosen for the analysis consisted of built-in PScanChIP DNase I digital genomic footprinting data for the LCL CEPH individual NA12865. For the motif analysis at the 43,332 VDR-BVs, we used PScanChIP (20) with Jaspar PWM descriptors and a mixed background of the promoters and LCL DNase cut sites.
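To make the notion of a PWM score concrete, the Python sketch below scores a reference and an alternate allele against a toy log-odds matrix for a single hexameric half-site. The base counts are hypothetical values chosen to resemble an (A/G)G(T/G)TCA half-site, not the Jaspar, XXmotif or DREME matrices used in the study, and the uniform background is an assumption.

```python
import math
from typing import Dict, List

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_pwm(counts: List[Dict[str, int]], pseudocount: float = 1.0,
                 background: float = 0.25) -> List[Dict[str, float]]:
    """Convert per-position base counts into log2 odds against a uniform background."""
    pwm = []
    for col in counts:
        total = sum(col[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2(((col[b] + pseudocount) / total) / background)
                    for b in BASES})
    return pwm


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def best_score(pwm: List[Dict[str, float]], seq: str) -> float:
    """Highest PWM score over all windows on both strands."""
    width = len(pwm)
    best = float("-inf")
    for strand in (seq, reverse_complement(seq)):
        for i in range(len(strand) - width + 1):
            best = max(best, sum(pwm[j][strand[i + j]] for j in range(width)))
    return best


# Hypothetical counts for one half-site, consensus (A/G)G(T/G)TCA.
counts = [
    {"A": 10, "C": 0, "G": 10, "T": 0},
    {"A": 0, "C": 0, "G": 20, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 10},
    {"A": 0, "C": 0, "G": 0, "T": 20},
    {"A": 0, "C": 20, "G": 0, "T": 0},
    {"A": 20, "C": 0, "G": 0, "T": 0},
]
pwm = log_odds_pwm(counts)
ref_score = best_score(pwm, "TTAGGTCATT")  # contains AGGTCA
alt_score = best_score(pwm, "TTAGGTTATT")  # C>T change at motif position 5
delta = alt_score - ref_score              # negative: the variant weakens the match
```

A negative `delta` marks a variant predicted to weaken the motif match and a positive one a strengthening variant; a full DR3 model would score two half-sites separated by a three-base spacer.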

Differential binding analyses

We performed both allele-specific (VDR-ASB) and regression-on-genotype (VDR-QTL) differential binding analyses. Both methods start from analyses of read counts. For the VDR-QTL analysis, reads were normalised and covariate-corrected as detailed in the Supplementary Material.

For the VDR-ASB tests, we utilised the sequence composition of ChIP-exo sequence reads overlapping heterozygous SNPs to determine the sequences originating from each allele separately (23,69,70) and to identify allele-specific binding events showing a significant difference in the number of mapped reads between parental alleles. We employed a modified version of the AlleleSeq pipeline (23) to carry out the VDR-ASB analysis. Briefly, we tested for significant allelic imbalance among all read pile-ups intersecting a variant showing heterozygous genotype, given a null hypothesis of 50% paternal versus 50% maternal reads. We followed (23) in controlling for two major sources of bias typically encountered when running an allele-specific analysis of short read data: a bias due to mapping to the reference hg19 genome (71), and a bias due to unannotated copy number variants skewing the read counts for some of the SNPs being tested (72). In addition, we corrected for a third source of potential erroneous ASB SNP calls by removing any significant allelic imbalance hypothesis falling under repeat regions contained in the ENCODE blacklist data (24).
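The core of the allelic imbalance test, a comparison of read counts at a heterozygous site against a 50:50 null, can be sketched in Python with an exact two-sided binomial test. This shows only the statistical kernel under assumed toy counts; it does not reproduce the AlleleSeq pipeline, its multiple-testing control, or the mapping-bias and copy-number corrections described above.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than the
    observed count k under Binomial(n, p).
    """
    probs = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = probs[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in probs if q <= threshold))


# Toy pile-up: 18 of 20 reads carry the paternal allele at a heterozygous SNP.
p_imbalanced = binomial_two_sided_p(18, 20)
# Balanced pile-up: 10 of 20 reads from each allele.
p_balanced = binomial_two_sided_p(10, 20)
```

With 18 of 20 reads on one allele the p-value is 422/1048576 (about 4 × 10⁻⁴), whereas a perfectly balanced pile-up gives 1.0.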

For the VDR-QTL tests, we analysed all 27 LCL samples simultaneously. We grouped the samples based on the genotypes of the underlying variant and carried out QTL association testing between the variant's genotype (under the assumption of an additive genetic model (73)) and the phenotype at that location (the VDR binding affinity, based on the quantile-normalised ChIP-exo peak size at that location). We used an in-house pipeline based on SNPs and indels imputed with IMPUTE2 (74), and Bayesian regression modelling based on SNPTEST (21) and BIMBAM (22). Bayesian methods for analysing SNP associations are now an established tool in GWAS analysis (75–77) and show advantages over the use of p-values in power and interpretation (73), though they require tighter initial modelling assumptions when compared to frequentist methods. For details on the underlying model and priors, we refer the reader to the Supplementary document.
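The additive genetic model underlying the regression-on-genotype test can be illustrated with a short Python sketch. Note the substitution: the study used Bayesian regression via SNPTEST and BIMBAM, whereas this sketch fits an ordinary least-squares line as a simpler frequentist stand-in. The dosage coding (0, 1, 2 copies of the alternate allele) and the toy binding values are assumptions.

```python
from typing import List, Tuple


def additive_regression(dosages: List[int], phenotypes: List[float]) -> Tuple[float, float, float]:
    """Least-squares fit of phenotype = intercept + beta * dosage.

    Returns (beta, intercept, r_squared). Dosage is the alternate-allele count.
    """
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    if sxx == 0:
        # All samples share one genotype, so the variant cannot be tested.
        raise ValueError("variant is monomorphic in these samples")
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(dosages, phenotypes))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_g
    residual_ss = sum((y - intercept - beta * g) ** 2 for g, y in zip(dosages, phenotypes))
    total_ss = sum((y - mean_y) ** 2 for y in phenotypes)
    r_squared = 1 - residual_ss / total_ss if total_ss else 0.0
    return beta, intercept, r_squared


# Toy data: normalised peak size rises by about one unit per alternate allele.
dosages = [0, 0, 1, 1, 2, 2]
binding = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
beta, intercept, r_squared = additive_regression(dosages, binding)
```

The monomorphic check mirrors a practical limit of any regression-on-genotype design: a variant can only be tested when more than one genotype class is present among the samples.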

Variant annotation

We annotated all variants associated with differences in VDR binding, obtained by pooling the VDR-QTL and the VDR-ASB variants (VDR-Binding Variants, VDR-BVs), according to the guidelines in (25) and using a customised version of the Funseq2 suite of algorithms (26). We added additional VDR-centric information to Funseq2: VDR CPo3 binding regions, motif intervals from the PScanChIP analysis for the VDR::RXR Jaspar DR3 motif and for the DREME VDR monomer, and VDR dimer/monomer PWM information. Gene annotation used Funseq2's default (Gencode v 16), and the HOT annotation in Funseq was filtered to only retain LCL HOT information. We ran Funseq using the -m 2 -nc options, so that all annotation referred to the ancestral allele of the variant.

Availability of Data and Material

Sequencing data from this study have been submitted to the Gene Expression Omnibus archive (GEO; http://www.ncbi.nlm.nih.gov/geo/; accession number GSE73254).

Supplementary Material

Supplementary Material is available at HMG online.

Acknowledgements

We thank Mark Gerstein, Jieming Chen, Joel Rozowsky, Alexej Abyzov, Ekta Khurana and Yao Fu for precious help and technical insight on the AlleleSeq, CNVnator and Funseq2 platforms. We also thank Matthew Stephens and Yongtao Guan for help and insight on the PIMASS and BIMBAM platforms. We thank Gosia Trynka and Kamil Slowikowski for providing r2-based LD block data; Yuchun Guo and David Gifford for precious help and technical insight on the GEM peak caller; Gerton Lunter for help on the STAMPY platform; Andreas Heger for help and technical insight on the GAT software and genomic randomization testing; Rory Stark and Simon Anders for insight on differential analysis of ChIP-seq data; Laura Clarke for help with the 1000 Genomes data; and Daniel Gaffney for providing helpful pointers regarding the GEUVADIS QTLs calls. We thank Li Shen and Ning-Yi Shao (Icahn School of Medicine at Mount Sinai) for insight into NGSplot utilisation, and Oscar Bedoya-Reina and Christoffer Nellaker for helpful discussions.

Conflict of Interest statement. None declared.

Funding

Medical Research Council [to G.G., A.J.B.T., W.H., C.P.P.; MC_UU_120081, MC_UU_120211, MC_EX_G1000902]; Wellcome Trust and Genzyme [to S.V.R.]; the Multiple Sclerosis Society UK [grant number 91509]; and the Council for Science and Technology (CONACyT), Mexico [grant number 211990 to A.J.B.T.]. Funding to pay the Open Access publication charges for this article was provided by the Research Councils UK (RCUK) open access fund.

References

1. Kasowski, M., Kyriazopoulou-Panagiotopoulou, S., Grubert, F., Zaugg, J.B., Kundaje, A., Liu, Y., Boyle, A.P., Zhang, Q.C., Zakharia, F., Spacek, D.V. et al. (2013) Extensive variation in chromatin states across humans. Science, 342, 750–752.

2. McVicker, G., van de Geijn, B., Degner, J.F., Cain, C.E., Banovich, N.E., Raj, A., Lewellen, N., Myrthil, M., Gilad, Y. and Pritchard, J.K. (2013) Identification of Genetic Variants That Affect Histone Modifications in Human Cells. Science, 342, 747–749.

3. Kilpinen, H., Waszak, S.M., Gschwind, A.R., Raghav, S.K., Witwicki, R.M., Orioli, A., Migliavacca, E., Wiederkehr, M., Gutierrez-Arcelus, M., Panousis, N.I. et al. (2013) Coordinated effects of sequence variation on DNA binding, chromatin structure and transcription. Science, 342, 744–747.

4. Maurano, M.T., Humbert, R., Rynes, E., Thurman, R.E., Haugen, E., Wang, H., Reynolds, A.P., Sandstrom, R., Qu, H., Brody, J. et al. (2012) Systematic localization of common disease-associated variation in regulatory DNA. Science, 337, 1190–1195.

5. Gusev, A., Lee, S.H., Trynka, G., Finucane, H., Vilhjalmsson, B.J., Xu, H., Zang, C., Ripke, S., Bulik-Sullivan, B., Stahl, E. et al. (2014) Partitioning heritability of regulatory and cell-type-specific variants across 11 common diseases. Am. J. Hum. Genet., 95, 535–552.

6. Farh, K.K., Marson, A., Zhu, J., Kleinewietfeld, M., Housley, W.J., Beik, S., Shoresh, N., Whitton, H., Ryan, R.J., Shishkin, A.A. et al. (2015) Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature, 518, 337–343.

7. Heinz, S., Romanoski, C.E., Benner, C. and Glass, C.K. (2015) The selection and function of cell type-specific enhancers. Nat. Rev. Mol. Cell Biol., 16, 144–154.

8. Arner, E., Daub, C.O., Vitting-Seerup, K., Andersson, R. et al. (2015) Transcribed enhancers lead waves of coordinated transcription in transitioning mammalian cells. Science, 347, 1010–1014.

9. Holick, M.F. (2007) Vitamin D Deficiency. N. Eng. J. Med., 357, 266–281. PMID: 17634462.

10. Martins, D., Wolf, M., Pan, D. et al. (2007) Prevalence of cardiovascular risk factors and the serum levels of 25-hydroxyvitamin D in the United States: data from the Third National Health and Nutrition Examination Survey. Arch. Intern. Med., 167, 1159–1165.

11. Mokry, L.E., Ross, S., Ahmad, O.S., Forgetta, V., Smith, G.D., Leong, A., Greenwood, C.M.T., Thanassoulis, G. and Richards, J.B. (2015) Vitamin D and Risk of Multiple Sclerosis: A Mendelian Randomization Study. PLoS Med., 12, 1–20.

12. Afzal, S., Brondum-Jacobsen, P., Bojesen, S.E. and Nordestgaard, B.G. (2014) Genetically low vitamin D concentrations and increased mortality: mendelian randomisation analysis in three large cohorts. BMJ, 349.

13. Kliewer, S.A., Umesono, K., Mangelsdorf, D.J. and Evans, R.M. (1992) Retinoid X receptor interacts with nuclear receptors in retinoic acid, thyroid hormone and vitamin D3 signalling. Nature, 355, 446–449.

14. Ramagopalan, S.V., Heger, A., Berlanga, A.J., Maugeri, N.J., Lincoln, M.R., Burrell, A., Handunnetthi, L., Handel, A.E., Disanto, G., Orton, S.M. et al. (2010) A ChIP-seq defined genome-wide map of vitamin D receptor binding: Associations with disease and evolution. Genome Res., 20, 1352–1360.

15. Rhee, H.S. and Pugh, B.F. (2011) Comprehensive Genome-wide Protein-DNA Interactions Detected at Single-Nucleotide Resolution. Cell, 147, 1408–1419.

16. Rhee, H.S. and Pugh, B.F. (2012) ChIP-exo method for identifying genomic location of DNA-binding proteins with near-single-nucleotide accuracy. Curr. Protoc. Mol. Biol., Chapter 21, Unit 21.24.

17. Yip, K., Cheng, C., Bhardwaj, N., Brown, J., Leng, J., Kundaje, A., Rozowsky, J., Birney, E., Bickel, P., Snyder, M. and Gerstein, M. (2012) Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors. Genome Biol., 13, R48.

18. Carlberg, C., Bendik, I., Wyss, A., Meier, E., Sturzenbecker, L.J., Grippo, J.F. and Hunziker, W. (1993) Two nuclear signalling pathways for vitamin D. Nature, 361, 657–660.

19. Mathelier, A., Zhao, X., Zhang, A.W., Parcy, F., Worsley-Hunt, R., Arenillas, D.J., Buchman, S., Chen, C.Y., Chou, A., Ienasescu, H. et al. (2014) JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res., 42, D142–D147.

20. Zambelli, F., Pesole, G. and Pavesi, G. (2013) PscanChIP: Finding over-represented transcription factor-binding site motifs and their correlations in sequences from ChIP-Seq experiments. Nucleic Acids Res., 41, W535–W543.

21. Marchini, J., Howie, B., Myers, S., McVean, G. and Donnelly, P. (2007) A new multipoint method for genome-wide association studies by imputation of genotypes. Nat. Genet., 39, 906–913.

22. Servin, B. and Stephens, M. (2007) Imputation-based analysis of association studies: candidate regions and quantitative traits. PLoS Genet., 3, e114.

23. Rozowsky, J., Abyzov, A., Wang, J., Alves, P., Raha, D., Harmanci, A., Leng, J., Bjornson, R., Kong, Y., Kitabayashi, N. et al. (2011) AlleleSeq: analysis of allele-specific expression and binding in a network framework. Mol. Sys. Biol., 7, 522.

24. Dunham, I., Kundaje, A., Aldred, S.F., Collins, P.J., Davis, C.A., Doyle, F., Epstein, C.B., Frietze, S., Harrow, J., Kaul, R. et al. (2012) An integrated encyclopedia of DNA elements in the human genome. Nature, 489, 57–74.

25. Khurana, E., Fu, Y., Colonna, V., Mu, X.J., Kang, H.M., Lappalainen, T., Sboner, A., Lochovsky, L., Chen, J., Harmanci, A. et al. (2013) Integrative annotation of variants from 1092 humans: application to cancer genomics. Science, 342, 1235587.

26. Fu, Y., Liu, Z., Lou, S., Bedford, J., Mu, X., Yip, K.Y., Khurana, E. and Gerstein, M. (2014) FunSeq2: A framework for prioritizing noncoding regulatory variants in cancer. Genome Biol., 15, 480.

27. Heinz, S., Romanoski, C.E., Benner, C., Allison, K.A., Kaikkonen, M.U., Orozco, L.D. and Glass, C.K. (2013) Effect of natural genetic variation on enhancer selection and function. Nature, 503, 487–492.

28. Lappalainen, T., Sammeth, M., Friedlander, M.R., 't Hoen, P.A., Monlong, J., Rivas, M.A., Gonzalez-Porta, M., Kurbatova, N., Griebel, T., Ferreira, P.G. et al. (2013) Transcriptome and genome sequencing uncovers functional variation in humans. Nature, 501, 506–511.

29. Eicher, J.D., Landowski, C., Stackhouse, B., Sloan, A., Chen, W., Jensen, N., Lien, J.P., Leslie, R. and Johnson, A.D. (2015) GRASP v2.0: an update on the genome-wide repository of associations between SNPs and phenotypes. Nucleic Acids Res., 43, 799–804.

30. Mokry, L.E., Ross, S., Morris, J.A., Manousaki, D., Forgetta, V. and Richards, J.B. (2016) Genetically decreased vitamin D and risk of Alzheimer disease. Neurology, 87, 2567–2574.

31. Cusanovich, D.A., Pavlovic, B., Pritchard, J.K. and Gilad, Y. (2014) The functional consequences of variation in transcription factor binding. PLoS Genet., 10, e1004226.

32. Spivakov, M. (2014) Spurious transcription factor binding: non-functional or genetically redundant? Bioessays, 36, 798–806.

33. Villar, D., Berthelot, C., Aldridge, S., Rayner, T.F., Lukk, M., Pignatelli, M., Park, T.J., Deaville, R., Erichsen, J.T., Jasinska, A.J. et al. (2015) Enhancer evolution across 20 mammalian species. Cell, 160, 554–566.

34. Siepel, A., Bejerano, G., Pedersen, J.S., Hinrichs, A.S., Hou, M., Rosenbloom, K., Clawson, H., Spieth, J., Hillier, L.W., Richards, S. et al. (2005) Evolutionarily conserved elements in vertebrate, insect, worm and yeast genomes. Genome Res., 15, 1034–1050.

35. Jimenez-Lara, A.M. and Aranda, A. (1999) The vitamin D receptor binds in a transcriptionally inactive form and without a defined polarity on a retinoic acid response element. FASEB J., 13, 1073–1081.

36. Toell, A., Polly, P. and Carlberg, C. (2000) All natural DR3-type vitamin D response elements show a similar functionality in vitro. Biochem. J., 352 Pt 2, 301–309.

37. Zhang, J., Chalmers, M.J., Stayrook, K.R., Burris, L.L., Wang, Y., Busby, S.A., Pascal, B.D., Garcia-Ordonez, R.D., Bruning, J.B., Istrate, M.A. et al. (2011) DNA binding alters coactivator interaction surfaces of the intact VDR-RXR complex. Nat. Struct. Mol. Biol., 18, 556–563.

38. Drake, J.A., Bird, C., Nemesh, J., Thomas, D.J., Newton-Cheh, C., Reymond, A., Excoffier, L., Attar, H., Antonarakis, S.E., Dermitzakis, E.T. et al. (2006) Conserved noncoding sequences are selectively constrained and not mutation cold spots. Nat. Genet., 38, 223–227.

39. Fay, J.C., Wyckoff, G.J. and Wu, C.I. (2001) Positive and negative selection on the human genome. Genetics, 158, 1227–1234.

40. Orlov, I., Rochel, N., Moras, D. and Klaholz, B.P. (2012) Structure of the full human RXR/VDR nuclear receptor heterodimer complex with its DR3 target DNA. Embo J., 31, 291–300.

41. Deplancke, B., Alpern, D. and Gardeux, V. (2016) The Genetics of Transcription Factor DNA Binding Variation. Cell, 166, 538–554.

42. Reddy, T.E., Gertz, J., Pauli, F., Kucera, K.S., Varley, K.E., Newberry, K.M., Marinov, G.K., Mortazavi, A., Williams, B.A., Song, L. et al. (2012) Effects of sequence variation on differential allelic transcription factor occupancy and gene expression. Genome Res., 22, 860–869.

43. Levo, M., Zalckvar, E., Sharon, E., Dantas Machado, A.C., Kalma, Y., Lotam-Pompan, M., Weinberger, A., Yakhini, Z., Rohs, R. and Segal, E. (2015) Unraveling determinants of transcription factor binding outside the core binding site. Genome Res., 25, 1018–1029.

44. Booth, D.R., Ding, N., Parnell, G.P., Shahijanian, F., Coulter, S., Schibeci, S.D., Atkins, A.R., Stewart, G.J., Evans, R.M., Downes, M. et al. (2016) Cistromic and genetic evidence that the vitamin D receptor mediates susceptibility to latitude-dependent autoimmune diseases. Genes Immun., 17, 213–219.

45. Nakane, M., Ma, J., Ruan, X., Kroeger, P.E. and Wu-Wong, R. (2007) Mechanistic analysis of VDR-mediated renin suppression. Nephron Physiol., 107, 35–44.

46. Salehi-Tabar, R., Nguyen-Yamamoto, L., Tavera-Mendoza, L.E., Quail, T., Dimitrov, V., An, B.S., Glass, L., Goltzman, D. and White, J.H. (2012) Vitamin D receptor as a master regulator of the c-MYC/MXD1 network. Proc. Natl. Acad. Sci. USA, 109, 18827–18832.

47. Capellino, S., Straub, R.H. and Cutolo, M. (2014) Aromatase and regulation of the estrogen-to-androgen ratio in synovial tissue inflammation: common pathway in both sexes. Ann. N. Y. Acad. Sci., 1317, 24–31.

48. Tuoresmaki, P., Vaisanen, S., Neme, A., Heikkinen, S. and Carlberg, C. (2014) Patterns of genome-wide VDR locations. PLoS One, 9, e96105.

49. Huang, S., Laoukili, J., Epping, M.T., Koster, J., Holzel, M., Westerman, B.A., Nijkamp, W., Hata, A., Asgharzadeh, S., Seeger, R.C. et al. (2009) ZNF423 Is Critically required for retinoic acid-induced differentiation and is a marker of neuroblastoma outcome. Cancer Cell, 15, 328–340.

50. Matarese, G., De Placido, G., Nikas, Y. and Alviggi, C. (2003) Pathogenesis of endometriosis: natural immunity dysfunction or autoimmune disease? Trends Mol. Med., 9, 223–228.


51. Kunadian, V., Ford, G.A., Bawamia, B., Qiu, W. and Manson, J.E. (2014) Vitamin D deficiency and coronary artery disease: A review of the evidence. Am. Heart J., 167, 283–291.

52. Ramagopalan, S.V., Maugeri, N.J., Handunnetthi, L., Lincoln, M.R., Orton, S.M., Dyment, D.A., Deluca, G.C., Herrera, B.M., Chao, M.J., Sadovnick, A.D. et al. (2009) Expression of the multiple sclerosis-associated MHC class II Allele HLA-DRB1*1501 is regulated by vitamin D. PLoS Genet., 5, e1000369.

53. Caliskan, M., Cusanovich, D.A., Ober, C. and Gilad, Y. (2011) The effects of EBV transformation on gene expression levels and methylation profiles. Hum. Mol. Genet., 20, 1643–1652.

54. Kuhn, R.M., Haussler, D. and Kent, W.J. (2013) The UCSC genome browser and associated tools. Brief. Bioinformatics, 14, 144–161.

55. Li, H. and Durbin, R. (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25, 1754–1760.

56. Lunter, G. and Goodson, M. (2011) Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res., 21, 936–939.

57. Kharchenko, P.V., Tolstorukov, M.Y. and Park, P.J. (2008) Design and analysis of ChIP-seq experiments for DNA-binding proteins. Nat. Biotechnol., 26, 1351–1359.

58. Kundaje, A., Jung, L.Y., Kharchenko, P., Wold, B., Sidow, A., Batzoglou, S. and Park, P. (2013) Assessment of ChIP-seq data quality using cross-correlation analysis.

59. Ramachandran, P., Palidwor, G.A., Porter, C.J. and Perkins, T.J. (2013) MaSC: mappability-sensitive cross-correlation for estimating mean fragment length of single-end short-read sequencing data. Bioinformatics, 29, 444–450.

60. Feng, J., Liu, T., Qin, B., Zhang, Y. and Liu, X.S. (2012) Identifying ChIP-seq enrichment using MACS. Nat. Protoc., 7, 1728–1740.

61. Guo, Y., Mahony, S. and Gifford, D.K. (2012) High Resolution Genome Wide Binding Event Finding and Motif Discovery Reveals Transcription Factor Spatial Binding Constraints. PLoS Comput. Biol., 8, e1002638.

62. Stark, R. and Brown, G. (2011) DiffBind: differential binding analysis of ChIP-Seq peak data. R package.

63. Robinson, M. and Oshlack, A. (2010) A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol., 11, R25.

64. Heger, A., Webber, C., Goodson, M., Ponting, C.P. and Lunter, G. (2013) GAT: a simulation framework for testing the association of genomic intervals. Bioinformatics, 29, 2046–2048.

65. Ma, W., Noble, W.S. and Bailey, T.L. (2014) Motif-based analysis of large nucleotide data sets using MEME-ChIP. Nat. Protoc., 9, 1428–1450.

66. Bailey, T.L., Boden, M., Buske, F.A., Frith, M., Grant, C.E., Clementi, L., Ren, J., Li, W.W. and Noble, W.S. (2009) MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res., 37, W202–W208.

67. Hartmann, H., Guthohrlein, E.W., Siebert, M., Luehr, S. and Soding, J. (2013) P-value-based regulatory motif discovery using positional weight matrices. Genome Res., 23, 181–194.

68. Pavesi, G., Mereghetti, P., Zambelli, F., Stefani, M., Mauri, G. and Pesole, G. (2006) MoD Tools: regulatory motif discovery in nucleotide sequences from co-regulated or homologous genes. Nucleic Acids Res., 34, W566–W570.

69. McDaniell, R., Lee, B.K., Song, L., Liu, Z., Boyle, A.P., Erdos, M.R., Scott, L.J., Morken, M.A., Kucera, K.S., Battenhouse, A. et al. (2010) Heritable Individual-Specific and Allele-Specific Chromatin Signatures in Humans. Science, 328, 235–239.

70. Pickrell, J.K., Marioni, J.C., Pai, A.A., Degner, J.F. et al. (2010) Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature, 464, 768–772.

71. Degner, J.F., Marioni, J.C., Pai, A.A., Pickrell, J.K., Nkadori, E., Gilad, Y. and Pritchard, J.K. (2009) Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data. Bioinformatics, 25, 3207–3212.

72. Pickrell, J.K., Gaffney, D.J., Gilad, Y. and Pritchard, J.K. (2011) False positive peaks in ChIP-seq and other sequencing-based functional assays caused by unannotated high copy number regions. Bioinformatics, 27, 2144–2146.

73. Stephens, M. and Balding, D.J. (2009) Bayesian statistical methods for genetic association studies. Nat. Rev. Genet., 10, 681–690.

74. Howie, B.N., Donnelly, P. and Marchini, J. (2009) A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet., 5, e1000529.

75. Scott, L.J., Mohlke, K.L., Bonnycastle, L.L., Willer, C.J. et al. (2007) A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science, 316, 1341–1345.

76. Burton, P.R., Clayton, D.G., Cardon, L.R., Craddock, N. et al. (2007) Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature, 447, 661–678.

77. Wakefield, J. (2009) Bayes factors for genome-wide association studies: comparison with P-values. Genet. Epidemiol., 33, 79–86.

78. Anders, S., McCarthy, D.J., Chen, Y., Okoniewski, M., Smyth, G.K., Huber, W. and Robinson, M.D. (2013) Count-based differential expression analysis of RNA sequencing data using R and Bioconductor. Nat. Protoc., 8, 1765–1786.

79. Ernst, J., Kheradpour, P., Mikkelsen, T.S., Shoresh, N., Ward, L.D., Epstein, C.B., Zhang, X., Wang, L., Issner, R., Coyne, M. et al. (2011) Mapping and analysis of chromatin state dynamics in nine human cell types. Nature, 473, 43–49.

80. Bailey, T.L. and Machanick, P. (2012) Inferring direct DNA binding from ChIP-seq. Nucleic Acids Res., 40, e128.

81. McLean, C.Y., Bristor, D., Hiller, M., Clarke, S.L., Schaar, B.T., Lowe, C.B., Wenger, A.M. and Bejerano, G. (2010) GREAT improves functional interpretation of cis-regulatory regions. Nat. Biotechnol., 28, 495–501.

82. Touzet, H. and Varre, J.S. (2007) Efficient and accurate P-value computation for position weight matrices. Algorithms Mol. Biol., 2, 15.


Page 10: Edinburgh Research Explorer...equilibrium (LD). True causal variants will be enriched in cis-regulatory elements, especially in enhancers and, to a lesser extent, in gene promoters

transcription via response elements that typically consist of two hexameric ((A/G)G(T/G)TCA) half-sites, with RXR and VDR DNA binding domains occupying the upstream and downstream half-site, respectively (40). This DR3 motif, which we previously identified using ChIP-seq data from two calcitriol-activated CEU LCLs (14), was also the predominant VDR binding interface in calcitriol-activated LCLs in the present ChIP-exo study. We identified strong enrichment of VDR binding within enhancers, with stronger enrichments associated with the more stringent subsets of VDR-BVs (Figs 1 and 3). Whilst VDR binding occurs preferentially within promoters, these interactions are only rarely associated with RXR::VDR DR3 motifs and may be mediated by additional DNA-binding co-factors, including CTCF, and might reflect unproductive binding to open chromatin. In contrast, those VDR binding sites containing strong instances of the RXR::VDR DR3 motif were particularly enriched in enhancers and also near to genes involved in immunity processes and related diseases (Fig. 2). VDR thus is expected to exert a substantial effect on the biology of the immune system as a sequence-specific enhancer regulator.

Most remarkably, we found variants weakening or strengthening the RXR::VDR PWM score to be highly predictive of the loss or gain of VDR binding affinity (Fig. 5). Furthermore, the VDR-bound RXR::VDR motif shows evidence of levels of sequence conservation across vertebrate evolution that mirror its information content, and loss of VDR-binding variants over modern human evolution has occurred under stronger purifying selection than the gain of binding variants (Fig. 6). Together, these results imply that VDR binding, and thus presumably vitamin D levels, have in general been protective of disease.

Our study will have underestimated the true extent of regulatory variation at VDR binding sites. This is because the asymmetric binding analysis (23), by definition, can only test variants which are heterozygous for a given LCL sample; therefore, true positive VDR-BVs will be unobserved if all LCL samples we considered are homozygous. Similarly, not all of the required three biallelic genotypes will have been present in these samples for the regression-on-genotype analysis (22,21). Consequently, a larger scale study would be expected to show improved power to detect VDR-BVs.

At least 90% of VDR-BVs lay outside of RXR::VDR DR3 motifs. The majority of variable binding could occur, as with other TFs (41), at weak or non-canonical binding motifs, or its affinity may alter because of genetic disruption to a cofactor binding site (42). Another explanation is that VDR-BVs commonly exert their effect by altering the DNA conformation of regions flanking the core binding site (43). Nevertheless, our de novo motif discovery identified several proteins as potential RXR::VDR cofactors whose altered DNA-binding affinity could influence VDR binding. Over-representation of motifs in VDR peaks has been observed before, for example in a recent study of VDR binding in monocytes and monocyte-derived inflammatory and tolerogenic dendritic cells (44). Direct interaction between VDR and CREB1 has been reported previously (45), as have functional interactions between vitamin D/VDR and oestrogen metabolism and MYC proteins (46,47). Co-occurrence of GABPA and ESRRB motifs with VDR-binding sites had been reported previously (48). ZNF423 is not known to bind VDR, yet it might do so indirectly because it physically associates with its heterodimer partner RXRα (49).

Our finding that VDR-BVs are significantly enriched withineQTLs (Fig 3C) is intriguing because the eQTLs were inferredfrom LCLs not treated with calcitriol and implied that these en-hancersrsquo activities may not be entirely vitamin D-dependent in

these cells Nevertheless there was no clear concordance or dis-cordance between the direction of change of VDR binding atVDR-BVs (in calcitriol-treated LCLs) with the direction of changeof gene expression in calcitriol-minus human LCLs reported forthe GEUVADIS eQTL variants (data not shown) This and thelack of enrichment of the DR3 motif in the VDR binding sites forthe basal (unstimulated) state in LCLs (14) indicate that acalcitriol-activated LCLsrsquo study at a similar scale to those em-ploying unstimulated LCLs (28) will be required to fully resolvethis issue

Our most important results derived from a conservative ana-lysis of a stringent set of replicated binding variants (Fig 4B)From this we report a significant excess of VDR-rBVs coincidentor in strong LD with genome-wide significant GWAS tag vari-ants for six disorders including three that are autoimmune dis-orders (inflammatory bowel disease Crohnrsquos disease andrheumatoid arthritis) whilst a fourth endometriosis is fre-quently comorbid with autoimmune disorders (50) Deficiencyof serum 25(OH)D levels is associated with cardiovascular dis-ease risk factors in adults (10) while the association betweenvitamin D levels and coronary artery disease as for many dis-orders is debated (51) These results associate a molecular phe-nomenon the genetic disruption of nuclear receptor bindingwith a narrow set of immune-related syndromes and diseases

Based on the combined functional evolutionary and GWAS-based evidence we propose that VDR-BVs represent good tar-gets for subsequent experimental validation using for examplegenome editing in cells It is also expected that only a small mi-nority of genes bound by VDR will be altered in expression Alarge-scale expression QTL study in calcitriol-activated LCLswill thus be required to further narrow down the set of candi-date disease genes whose regulatory elements are variablybound by VDR and which are differentially expressed across thehuman population

We have provided the first quantitative association-basedanalysis to explain the genetic effect on binding affinity vari-ation for a nuclear receptor and to propose a mechanisticmodel for disease susceptibility Our results support the hy-pothesis that DNA variants altering transcription factor bindingat enhancers contribute to complex disease aetiology and sug-gest that altered VDR binding and by inference variable vitaminD levels explain in part altered autoimmune and other com-plex disease risk

Materials and MethodsVDR ChIP-exo sample preparation

Calcitriol-stimulated lymphoblastoid cell lines (LCLs) were grown and prepared for ChIP as described previously with some modifications (52) and adapted for ChIP-exo (15,16). LCLs are known to be a good model of primary B-cells (53), and we chose lines from HapMap, a unique resource whose genomes have already been sequenced. Briefly, cells were incubated in phenol red-free RPMI-1640, 10% charcoal-stripped FBS, 2 mM L-glutamine, penicillin with streptomycin solution (100 U/ml + 100 µg/ml) medium at 37°C and 5% CO2. Cells were harvested after stimulation for 36 hours with 0.1 mM calcitriol (Sigma), cross-linked using a 1% formaldehyde buffer for 15 minutes at room temperature and quenched with 0.125 M glycine. Cells were lysed and chromatin sheared by sonication into fragments of 200–1000 bp. VDR-bound genomic DNA regions were isolated using a rabbit polyclonal antibody against VDR (Santa Cruz Biotechnology, sc-1008). Immunoprecipitated chromatin was then processed enzymatically on magnetic beads. Samples were polished, A-tailed, ligated to sequencing library adaptors and then digested with lambda exonuclease to remove nucleotides from 5′ ends of double-stranded DNA. Single-stranded DNA was eluted and converted to double-stranded DNA by primer annealing and extension. A second sequencing adaptor was ligated to exonuclease-treated ends, PCR amplified, gel purified and sequenced. Libraries were prepared for Illumina as described in (16).

Data processing and peak calling

Full data processing steps are presented in the Supplementary Material. Briefly, we mapped the ChIP-exo sequence reads (Peconics Inc., USA; single-end, 40 bp) against the hg19 build of the human reference genome (54) using BWA (v. 0.7.4 (55)) followed by Stampy (v. 1.0.21 (56)). We filtered the reads to retain only uniquely mapping reads with MAPQ > 20 and removed reads mapping to empirical blacklist regions identified by the ENCODE consortium (24).
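The read-filtering step can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Read` record, the `filter_reads` helper and the dictionary layout of the blacklist are hypothetical names introduced here for illustration, and the uniqueness criterion is reduced to the MAPQ threshold alone.

```python
from typing import Dict, List, NamedTuple, Tuple


class Read(NamedTuple):
    """Hypothetical minimal alignment record (half-open coordinates)."""
    chrom: str
    start: int
    end: int
    mapq: int


def overlaps_blacklist(read: Read,
                       blacklist: Dict[str, List[Tuple[int, int]]]) -> bool:
    # True if the read shares at least one base with a blacklisted interval.
    return any(read.start < b_end and b_start < read.end
               for b_start, b_end in blacklist.get(read.chrom, []))


def filter_reads(reads: List[Read],
                 blacklist: Dict[str, List[Tuple[int, int]]],
                 min_mapq: int = 20) -> List[Read]:
    # Keep reads with MAPQ strictly above the threshold (MAPQ > 20 in the
    # text) that do not fall in a blacklisted region.
    return [r for r in reads
            if r.mapq > min_mapq and not overlaps_blacklist(r, blacklist)]
```

In practice this filtering would be applied to BAM records with a dedicated toolkit; the sketch only makes the two criteria explicit.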

For peak calling, we first computed strand cross-correlation profiles (57) of read start densities to obtain consensus estimates of the ChIP-exo digested fragment sizes, which we corrected for the presence of phantom peaks (58) using a mappability-corrected approach to cross-correlation inference (MaSC (59)). We used the cross-correlation-based estimates for the digested fragment size as a MACS2 (version 2.0.10 (60)) input parameter to obtain peak calls. Separately, to increase sensitivity, we called peaks using GPS/GEM (v. 2.4.1 (61)) with recommended ChIP-exo options (--smooth 3 --mrc 20). We then pooled, for each sample, the two sets of peak calls (Supplementary Material). Finally, we used DiffBind (62) to manipulate the per-sample merged peaksets and to obtain consensus peaksets (CPo3, CPo10, CPo20) based on an overlap threshold. To do this, we normalised the read numbers using DiffBind's embedded EdgeR routines based on the Trimmed Means of M's (TMM) algorithm (63).
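The idea behind the strand cross-correlation estimate can be sketched as follows. This is a simplified, unnormalised illustration in Python, not the MaSC method: it ignores mappability and phantom-peak correction, and the function names and toy coordinates are assumptions made for this example. Inputs are the 5′ end positions of plus-strand and minus-strand reads on one chromosome.

```python
from collections import Counter
from typing import Dict, Iterable


def strand_cross_correlation(plus_starts: Iterable[int],
                             minus_starts: Iterable[int],
                             max_shift: int) -> Dict[int, int]:
    """For each shift d, count pairs of plus-strand and minus-strand
    read starts that lie exactly d bases apart (raw co-occurrence)."""
    plus = Counter(plus_starts)
    minus = Counter(minus_starts)
    return {
        shift: sum(n * minus.get(pos + shift, 0) for pos, n in plus.items())
        for shift in range(max_shift + 1)
    }


def estimate_fragment_size(plus_starts: Iterable[int],
                           minus_starts: Iterable[int],
                           max_shift: int) -> int:
    # The shift with the highest score is taken as the fragment-size estimate.
    scores = strand_cross_correlation(plus_starts, minus_starts, max_shift)
    return max(scores, key=scores.get)
```

A value obtained this way plays the role of the fragment-size parameter passed to the peak caller; real data would additionally require the corrections described in the text.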

Interval overlap enrichment analysis

We used the Genomic Association Tester (64) to perform the randomization-based interval enrichment analyses. All analyses were based on 10,000 randomisations over the chosen background, and backgrounds differed depending on the specific analysis performed. All analyses accounted for genome-wide patterns of GC content variability and, where relevant, for uniquely mappable regions (40 bp reads). For the enrichment analysis of VDR-BVs in GWAS intervals and QTLs, we used an in-house bootstrapping pipeline to perform LD-filtering (to retain only LD-independent foreground and background variants) and DAF matching. Further details on annotations, background and analytical design are provided in the Supplementary Material.
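The logic of a randomization-based interval enrichment test can be sketched in a few lines of Python. This is not the GAT implementation: it assumes a single contiguous workspace, places segments uniformly at random, and omits the GC-content and mappability corrections mentioned above. All function and parameter names are illustrative.

```python
import random
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open (start, end)


def overlap_bp(segments: List[Interval], annotations: List[Interval]) -> int:
    """Bases shared between segments and annotations, summed per segment."""
    return sum(max(0, min(e, b) - max(s, a))
               for s, e in segments for a, b in annotations)


def enrichment(segments: List[Interval], annotations: List[Interval],
               workspace_len: int, n_rand: int = 1000, seed: int = 0):
    """Return (observed overlap, mean simulated overlap, empirical P-value)."""
    rng = random.Random(seed)
    observed = overlap_bp(segments, annotations)
    lengths = [e - s for s, e in segments]
    sims = []
    for _ in range(n_rand):
        # Re-place every segment at a random position, preserving its length.
        shuffled = []
        for length in lengths:
            start = rng.randint(0, workspace_len - length)
            shuffled.append((start, start + length))
        sims.append(overlap_bp(shuffled, annotations))
    expected = sum(sims) / n_rand
    # Add-one correction so the empirical P-value is never exactly zero.
    p_value = (1 + sum(s >= observed for s in sims)) / (n_rand + 1)
    return observed, expected, p_value
```

The ratio of observed to expected overlap gives a fold enrichment, and the empirical P-value is bounded below by 1/(n_rand + 1), which is one reason a large number of randomisations is used.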

Motif discovery

Full motif analysis steps are documented in the Supplementary Material. Briefly, we carried out de novo motif analysis using multiple parallel approaches: MEME-ChIP (65) from the MEME suite (66), XXmotif (67) and PScanChIP (20) from the Weeder/MoDtools suite (68). For the MEME-ChIP and XXmotif analyses, we used the peak intervals in CPo10 and extracted the sequence underlying each interval ± 150 bp from a repeat-masked version of the hg19 reference (54). For the PScanChIP analysis, we used as the input the full set of CPo3 binding regions with Jaspar PWM descriptors, to which we added the best heterodimeric RXR/VDR PWM found by XXmotif and the best monomeric VDR PWM found by DREME (MEME-ChIP package). The global background chosen for the analysis consisted of built-in PScanChIP DNase I digital genomic footprinting data for the LCL CEPH individual NA12865. For the motif analysis at the 43,332 VDR-BVs, we used PScanChIP (20) with Jaspar PWM descriptors and a mixed background of the promoters and LCL DNase cut sites.

Differential binding analyses

We performed both allele-specific (VDR-ASB) and regression-on-genotype (VDR-QTL) differential binding analyses. Both methods start from analyses of read counts. For the VDR-QTL analysis, reads were normalised and covariate-corrected as detailed in the Supplementary Material.

For the VDR-ASB tests, we utilised the sequence composition of ChIP-exo sequence reads overlapping heterozygous SNPs to determine the sequences originating from each allele separately (23,69,70) and to identify allele-specific binding events showing a significant difference in the number of mapped reads between parental alleles. We employed a modified version of the AlleleSeq pipeline (23) to carry out the VDR-ASB analysis. Briefly, we tested for significant allelic imbalance among all read pile-ups intersecting a variant showing heterozygous genotype, given a null hypothesis of 50% paternal versus 50% maternal reads. We followed (23) in controlling for two major sources of bias typically encountered when running an allele-specific analysis of short read data: a bias due to mapping to the reference hg19 genome (71) and a bias due to unannotated copy number variants skewing the read counts for some of the SNPs being tested (72). In addition, we corrected for a third source of potential erroneous ASB SNP calls by removing any significant allelic imbalance hypothesis falling under repeat regions contained in the ENCODE blacklist data (24).
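The null hypothesis of equal paternal and maternal read counts corresponds to a binomial test with success probability 0.5. The following Python sketch shows an exact two-sided version of that test for a single heterozygous site. It is an illustration only: the function name is hypothetical, and the bias controls and multiple-testing procedure of the AlleleSeq pipeline are not reproduced.

```python
from math import comb


def allelic_imbalance_p(paternal_reads: int, maternal_reads: int,
                        p_null: float = 0.5) -> float:
    """Exact two-sided binomial P-value for allelic imbalance at one site."""
    n = paternal_reads + maternal_reads
    if n == 0:
        return 1.0  # no reads, no evidence against the null
    # Probability of every possible paternal count under the null.
    probs = [comb(n, k) * p_null ** k * (1 - p_null) ** (n - k)
             for k in range(n + 1)]
    observed = probs[paternal_reads]
    # Sum all outcomes at least as unlikely as the observed one
    # (small tolerance guards against floating-point ties).
    p_value = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, p_value)
```

For example, a site covered by 10 reads that all carry the same allele yields P = 2/1024, whereas a 5:5 split yields P = 1.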

For the VDR-QTL tests, we analysed all 27 LCL samples simultaneously. We grouped the samples based on the genotypes of the underlying variant and carried out QTL association testing between the variant's genotype (under the assumption of an additive genetic model (73)) and the phenotype at that location (the VDR binding affinity based on the quantile-normalised ChIP-exo peak size at that location). We used an in-house pipeline based on SNPs and indels imputed with IMPUTE2 (74) and Bayesian regression modelling based on SNPTEST (21) and BIMBAM (22). Bayesian methods for analysing SNP associations are now an established tool in GWAS analysis (75–77) and show advantages over the use of p-values in power and interpretation (73), though they require tighter initial modelling assumptions when compared to frequentist methods. For details on the underlying model and priors, we refer the reader to the Supplementary document.
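Under an additive genetic model, each genotype is coded as the number of copies of one allele (0, 1 or 2) and the phenotype is modelled as a linear function of that dosage. The Python sketch below fits this model by ordinary least squares. Note that this is a frequentist stand-in chosen for brevity, not the Bayesian regression (SNPTEST, BIMBAM) used in the study, and the function name and toy values are assumptions.

```python
from typing import Sequence, Tuple


def additive_fit(dosages: Sequence[float],
                 phenotypes: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of phenotype = intercept + beta * dosage.

    `dosages` holds allele counts (0, 1, 2; imputed values may be
    fractional) and `phenotypes` the normalised binding signal per sample.
    Returns (intercept, beta), where beta is the per-allele effect.
    """
    n = len(dosages)
    if n != len(phenotypes) or n < 2:
        raise ValueError("need two equally long vectors with >= 2 samples")
    mean_g = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")
    sxy = sum((g - mean_g) * (y - mean_y)
              for g, y in zip(dosages, phenotypes))
    beta = sxy / sxx
    return mean_y - beta * mean_g, beta
```

A Bayesian analysis would place priors on the effect size and report a Bayes factor rather than a point estimate alone; the additive coding of the genotype is the part shared with the approach described above.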

Variant annotation

We annotated all variants associated with differences in VDR binding and obtained by pooling the VDR-QTL and the VDR-ASB variants (VDR-Binding Variants, VDR-BVs) according to the guidelines in (25) and using a customised version of the Funseq2 suite of algorithms (26). We added additional VDR-centric information to Funseq2: VDR CPo3 binding regions, motif intervals from the PScanChIP analysis for the VDR/RXR Jaspar DR3 motif and for the DREME VDR monomer, and VDR dimer/monomer PWM information. Gene annotation used Funseq2's default (Gencode v. 16), and the HOT annotation in Funseq was filtered to only retain LCL HOT information. We ran Funseq using the -m 2 -nc options so that all annotation referred to the ancestral allele of the variant.

Availability of Data and Material

Sequencing data from this study have been submitted to the Gene Expression Omnibus archive (GEO; http://www.ncbi.nlm.nih.gov/geo/; accession number GSE73254).

Supplementary Material

Supplementary Material is available at HMG online.

Acknowledgements

We thank Mark Gerstein, Jieming Chen, Joel Rozowsky, Alexej Abyzov, Ekta Khurana and Yao Fu for precious help and technical insight on the AlleleSeq, CNVnator and Funseq2 platforms. We also thank Matthew Stephens and Yongtao Guan for help and insight on the PIMASS and BIMBAM platforms. We thank Gosia Trynka and Kamil Slowikowski for providing r2-based LD block data; Yuchun Guo and David Gifford for precious help and technical insight on the GEM peak caller; Gerton Lunter for help on the STAMPY platform; Andreas Heger for help and technical insight on the GAT software and genomic randomization testing; Rory Stark and Simon Anders for insight on differential analysis of ChIP-seq data; Laura Clarke for help with the 1000 Genomes data; and Daniel Gaffney for providing helpful pointers regarding the GEUVADIS QTL calls. We thank Li Shen and Ning-Yi Shao (Icahn School of Medicine at Mount Sinai) for insight into NGSplot utilisation, and Oscar Bedoya-Reina and Christoffer Nellaker for helpful discussions.

Conflict of Interest statement. None declared.

Funding

Medical Research Council [to G.G., A.J.B.T., W.H., C.P.P.; MC_UU_12008/1, MC_UU_12021/1, MC_EX_G1000902]; Wellcome Trust and Genzyme [to S.V.R.]; the Multiple Sclerosis Society UK [grant number 915/09]; and the Council for Science and Technology (CONACyT), Mexico [grant number 211990 to A.J.B.T.]. Funding to pay the Open Access publication charges for this article was provided by the Research Councils UK (RCUK) open access fund.

References

1. Kasowski M, Kyriazopoulou-Panagiotopoulou S, Grubert F, Zaugg JB, Kundaje A, Liu Y, Boyle AP, Zhang QC, Zakharia F, Spacek DV, et al. (2013) Extensive variation in chromatin states across humans. Science 342, 750–752.

2. McVicker G, van de Geijn B, Degner JF, Cain CE, Banovich NE, Raj A, Lewellen N, Myrthil M, Gilad Y and Pritchard JK (2013) Identification of Genetic Variants That Affect Histone Modifications in Human Cells. Science 342, 747–749.

3. Kilpinen H, Waszak SM, Gschwind AR, Raghav SK, Witwicki RM, Orioli A, Migliavacca E, Wiederkehr M, Gutierrez-Arcelus M, Panousis NI, et al. (2013) Coordinated effects of sequence variation on DNA binding, chromatin structure and transcription. Science 342, 744–747.

4. Maurano MT, Humbert R, Rynes E, Thurman RE, Haugen E, Wang H, Reynolds AP, Sandstrom R, Qu H, Brody J, et al. (2012) Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195.

5. Gusev A, Lee SH, Trynka G, Finucane H, Vilhjalmsson BJ, Xu H, Zang C, Ripke S, Bulik-Sullivan B, Stahl E, et al. (2014) Partitioning heritability of regulatory and cell-type-specific variants across 11 common diseases. Am J Hum Genet 95, 535–552.

6. Farh KK, Marson A, Zhu J, Kleinewietfeld M, Housley WJ, Beik S, Shoresh N, Whitton H, Ryan RJ, Shishkin AA, et al. (2015) Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature 518, 337–343.

7. Heinz S, Romanoski CE, Benner C and Glass CK (2015) The selection and function of cell type-specific enhancers. Nat Rev Mol Cell Biol 16, 144–154.

8. Arner E, Daub CO, Vitting-Seerup K, Andersson R, et al. (2015) Transcribed enhancers lead waves of coordinated transcription in transitioning mammalian cells. Science 347, 1010–1014.

9. Holick MF (2007) Vitamin D Deficiency. N Eng J Med 357, 266–281. PMID: 17634462.

10. Martins D, Wolf M, Pan D, et al. (2007) Prevalence of cardiovascular risk factors and the serum levels of 25-hydroxyvitamin D in the United States: data from the Third National Health and Nutrition Examination Survey. Arch Intern Med 167, 1159–1165.

11. Mokry LE, Ross S, Ahmad OS, Forgetta V, Smith GD, Leong A, Greenwood CMT, Thanassoulis G and Richards JB (2015) Vitamin D and Risk of Multiple Sclerosis: A Mendelian Randomization Study. PLoS Med 12, 1–20.

12. Afzal S, Brondum-Jacobsen P, Bojesen SE and Nordestgaard BG (2014) Genetically low vitamin D concentrations and increased mortality: mendelian randomisation analysis in three large cohorts. BMJ 349.

13. Kliewer SA, Umesono K, Mangelsdorf DJ and Evans RM (1992) Retinoid X receptor interacts with nuclear receptors in retinoic acid, thyroid hormone and vitamin D3 signalling. Nature 355, 446–449.

14. Ramagopalan SV, Heger A, Berlanga AJ, Maugeri NJ, Lincoln MR, Burrell A, Handunnetthi L, Handel AE, Disanto G, Orton SM, et al. (2010) A ChIP-seq defined genome-wide map of vitamin D receptor binding: Associations with disease and evolution. Genome Res 20, 1352–1360.

15. Rhee HS and Pugh BF (2011) Comprehensive Genome-wide Protein-DNA Interactions Detected at Single-Nucleotide Resolution. Cell 147, 1408–1419.

16. Rhee HS and Pugh BF (2012) ChIP-exo method for identifying genomic location of DNA-binding proteins with near-single-nucleotide accuracy. Curr Protoc Mol Biol Chapter 21, Unit 21.24.

17. Yip K, Cheng C, Bhardwaj N, Brown J, Leng J, Kundaje A, Rozowsky J, Birney E, Bickel P, Snyder M and Gerstein M (2012) Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors. Genome Biol 13, R48.

18. Carlberg C, Bendik I, Wyss A, Meier E, Sturzenbecker LJ, Grippo JF and Hunziker W (1993) Two nuclear signalling pathways for vitamin D. Nature 361, 657–660.

19. Mathelier A, Zhao X, Zhang AW, Parcy F, Worsley-Hunt R, Arenillas DJ, Buchman S, Chen CY, Chou A, Ienasescu H, et al. (2014) JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res 42, D142–D147.

20. Zambelli F, Pesole G and Pavesi G (2013) PscanChIP: Finding over-represented transcription factor-binding site motifs and their correlations in sequences from ChIP-Seq experiments. Nucleic Acids Res 41, W535–W543.

21. Marchini J, Howie B, Myers S, McVean G and Donnelly P (2007) A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet 39, 906–913.

22. Servin B and Stephens M (2007) Imputation-based analysis of association studies: candidate regions and quantitative traits. PLoS Genet 3, e114.

23. Rozowsky J, Abyzov A, Wang J, Alves P, Raha D, Harmanci A, Leng J, Bjornson R, Kong Y, Kitabayashi N, et al. (2011) AlleleSeq: analysis of allele-specific expression and binding in a network framework. Mol Sys Biol 7, 522.

24. Dunham I, Kundaje A, Aldred SF, Collins PJ, Davis CA, Doyle F, Epstein CB, Frietze S, Harrow J, Kaul R, et al. (2012) An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74.

25. Khurana E, Fu Y, Colonna V, Mu XJ, Kang HM, Lappalainen T, Sboner A, Lochovsky L, Chen J, Harmanci A, et al. (2013) Integrative annotation of variants from 1092 humans: application to cancer genomics. Science 342, 1235587.

26. Fu Y, Liu Z, Lou S, Bedford J, Mu X, Yip KY, Khurana E and Gerstein M (2014) FunSeq2: A framework for prioritizing noncoding regulatory variants in cancer. Genome Biol 15, 480.

27. Heinz S, Romanoski CE, Benner C, Allison KA, Kaikkonen MU, Orozco LD and Glass CK (2013) Effect of natural genetic variation on enhancer selection and function. Nature 503, 487–492.

28. Lappalainen T, Sammeth M, Friedlander MR, 't Hoen PA, Monlong J, Rivas MA, Gonzalez-Porta M, Kurbatova N, Griebel T, Ferreira PG, et al. (2013) Transcriptome and genome sequencing uncovers functional variation in humans. Nature 501, 506–511.

29. Eicher JD, Landowski C, Stackhouse B, Sloan A, Chen W, Jensen N, Lien JP, Leslie R and Johnson AD (2015) GRASP v2.0: an update on the genome-wide repository of associations between SNPs and phenotypes. Nucleic Acids Res 43, 799–804.

30. Mokry LE, Ross S, Morris JA, Manousaki D, Forgetta V and Richards JB (2016) Genetically decreased vitamin D and risk of Alzheimer disease. Neurology 87, 2567–2574.

31. Cusanovich DA, Pavlovic B, Pritchard JK and Gilad Y (2014) The functional consequences of variation in transcription factor binding. PLoS Genet 10, e1004226.

32. Spivakov M (2014) Spurious transcription factor binding: non-functional or genetically redundant? Bioessays 36, 798–806.

33. Villar D, Berthelot C, Aldridge S, Rayner TF, Lukk M, Pignatelli M, Park TJ, Deaville R, Erichsen JT, Jasinska AJ, et al. (2015) Enhancer evolution across 20 mammalian species. Cell 160, 554–566.

34. Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al. (2005) Evolutionarily conserved elements in vertebrate, insect, worm and yeast genomes. Genome Res 15, 1034–1050.

35. Jimenez-Lara AM and Aranda A (1999) The vitamin D receptor binds in a transcriptionally inactive form and without a defined polarity on a retinoic acid response element. FASEB J 13, 1073–1081.

36. Toell A, Polly P and Carlberg C (2000) All natural DR3-type vitamin D response elements show a similar functionality in vitro. Biochem J 352 Pt 2, 301–309.

37. Zhang J, Chalmers MJ, Stayrook KR, Burris LL, Wang Y, Busby SA, Pascal BD, Garcia-Ordonez RD, Bruning JB, Istrate MA, et al. (2011) DNA binding alters coactivator interaction surfaces of the intact VDR-RXR complex. Nat Struct Mol Biol 18, 556–563.

38. Drake JA, Bird C, Nemesh J, Thomas DJ, Newton-Cheh C, Reymond A, Excoffier L, Attar H, Antonarakis SE, Dermitzakis ET, et al. (2006) Conserved noncoding sequences are selectively constrained and not mutation cold spots. Nat Genet 38, 223–227.

39. Fay JC, Wyckoff GJ and Wu CI (2001) Positive and negative selection on the human genome. Genetics 158, 1227–1234.

40. Orlov I, Rochel N, Moras D and Klaholz BP (2012) Structure of the full human RXR/VDR nuclear receptor heterodimer complex with its DR3 target DNA. Embo J 31, 291–300.

41. Deplancke B, Alpern D and Gardeux V (2016) The Genetics of Transcription Factor DNA Binding Variation. Cell 166, 538–554.

42. Reddy TE, Gertz J, Pauli F, Kucera KS, Varley KE, Newberry KM, Marinov GK, Mortazavi A, Williams BA, Song L, et al. (2012) Effects of sequence variation on differential allelic transcription factor occupancy and gene expression. Genome Res 22, 860–869.

43. Levo M, Zalckvar E, Sharon E, Dantas Machado AC, Kalma Y, Lotam-Pompan M, Weinberger A, Yakhini Z, Rohs R and Segal E (2015) Unraveling determinants of transcription factor binding outside the core binding site. Genome Res 25, 1018–1029.

44. Booth DR, Ding N, Parnell GP, Shahijanian F, Coulter S, Schibeci SD, Atkins AR, Stewart GJ, Evans RM, Downes M, et al. (2016) Cistromic and genetic evidence that the vitamin D receptor mediates susceptibility to latitude-dependent autoimmune diseases. Genes Immun 17, 213–219.

45. Nakane M, Ma J, Ruan X, Kroeger PE and Wu-Wong R (2007) Mechanistic analysis of VDR-mediated renin suppression. Nephron Physiol 107, 35–44.

46. Salehi-Tabar R, Nguyen-Yamamoto L, Tavera-Mendoza LE, Quail T, Dimitrov V, An BS, Glass L, Goltzman D and White JH (2012) Vitamin D receptor as a master regulator of the c-MYC/MXD1 network. Proc Natl Acad Sci USA 109, 18827–18832.

47. Capellino S, Straub RH and Cutolo M (2014) Aromatase and regulation of the estrogen-to-androgen ratio in synovial tissue inflammation: common pathway in both sexes. Ann N Y Acad Sci 1317, 24–31.

48. Tuoresmaki P, Vaisanen S, Neme A, Heikkinen S and Carlberg C (2014) Patterns of genome-wide VDR locations. PLoS One 9, e96105.

49. Huang S, Laoukili J, Epping MT, Koster J, Holzel M, Westerman BA, Nijkamp W, Hata A, Asgharzadeh S, Seeger RC, et al. (2009) ZNF423 Is Critically required for retinoic acid-induced differentiation and is a marker of neuroblastoma outcome. Cancer Cell 15, 328–340.

50. Matarese G, De Placido G, Nikas Y and Alviggi C (2003) Pathogenesis of endometriosis: natural immunity dysfunction or autoimmune disease? Trends Mol Med 9, 223–228.

51. Kunadian V, Ford GA, Bawamia B, Qiu W and Manson JE (2014) Vitamin D deficiency and coronary artery disease: A review of the evidence. Am Heart J 167, 283–291.

52. Ramagopalan SV, Maugeri NJ, Handunnetthi L, Lincoln MR, Orton SM, Dyment DA, Deluca GC, Herrera BM, Chao MJ, Sadovnick AD, et al. (2009) Expression of the multiple sclerosis-associated MHC class II Allele HLA-DRB1*1501 is regulated by vitamin D. PLoS Genet 5, e1000369.

53. Caliskan M, Cusanovich DA, Ober C and Gilad Y (2011) The effects of EBV transformation on gene expression levels and methylation profiles. Hum Mol Genet 20, 1643–1652.

54. Kuhn RM, Haussler D and Kent WJ (2013) The UCSC genome browser and associated tools. Brief Bioinformatics 14, 144–161.

55. Li H and Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760.

56. Lunter G and Goodson M (2011) Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res 21, 936–939.

57. Kharchenko PV, Tolstorukov MY and Park PJ (2008) Design and analysis of ChIP-seq experiments for DNA-binding proteins. Nat Biotechnol 26, 1351–1359.

58. Kundaje A, Jung LY, Kharchenko P, Wold B, Sidow A, Batzoglou S and Park P (2013) Assessment of ChIP-seq data quality using cross-correlation analysis.

59. Ramachandran P, Palidwor GA, Porter CJ and Perkins TJ (2013) MaSC: mappability-sensitive cross-correlation for estimating mean fragment length of single-end short-read sequencing data. Bioinformatics 29, 444–450.

60. Feng J, Liu T, Qin B, Zhang Y and Liu XS (2012) Identifying ChIP-seq enrichment using MACS. Nat Protoc 7, 1728–1740.

61. Guo Y, Mahony S and Gifford DK (2012) High Resolution Genome Wide Binding Event Finding and Motif Discovery Reveals Transcription Factor Spatial Binding Constraints. PLoS Comput Biol 8, e1002638.

62. Stark R and Brown G (2011) DiffBind: differential binding analysis of ChIP-Seq peak data. R package.

63. Robinson M and Oshlack A (2010) A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol 11, R25.

64. Heger A, Webber C, Goodson M, Ponting CP and Lunter G (2013) GAT: a simulation framework for testing the association of genomic intervals. Bioinformatics 29, 2046–2048.

65. Ma W, Noble WS and Bailey TL (2014) Motif-based analysis of large nucleotide data sets using MEME-ChIP. Nat Protoc 9, 1428–1450.

66. Bailey TL, Boden M, Buske FA, Frith M, Grant CE, Clementi L, Ren J, Li WW and Noble WS (2009) MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res 37, W202–W208.

67. Hartmann H, Guthohrlein EW, Siebert M, Luehr S and Soding J (2013) P-value-based regulatory motif discovery using positional weight matrices. Genome Res 23, 181–194.

68. Pavesi G, Mereghetti P, Zambelli F, Stefani M, Mauri G and Pesole G (2006) MoD Tools: regulatory motif discovery in nucleotide sequences from co-regulated or homologous genes. Nucleic Acids Res 34, W566–W570.

69. McDaniell R, Lee BK, Song L, Liu Z, Boyle AP, Erdos MR, Scott LJ, Morken MA, Kucera KS, Battenhouse A, et al. (2010) Heritable Individual-Specific and Allele-Specific Chromatin Signatures in Humans. Science 328, 235–239.

70. Pickrell JK, Marioni JC, Pai AA, Degner JF, et al. (2010) Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature 464, 768–772.

71. Degner JF, Marioni JC, Pai AA, Pickrell JK, Nkadori E, Gilad Y and Pritchard JK (2009) Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data. Bioinformatics 25, 3207–3212.

72. Pickrell JK, Gaffney DJ, Gilad Y and Pritchard JK (2011) False positive peaks in ChIP-seq and other sequencing-based functional assays caused by unannotated high copy number regions. Bioinformatics 27, 2144–2146.

73. Stephens M and Balding DJ (2009) Bayesian statistical methods for genetic association studies. Nat Rev Genet 10, 681–690.

74. Howie BN, Donnelly P and Marchini J (2009) A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet 5, e1000529.

75. Scott LJ, Mohlke KL, Bonnycastle LL, Willer CJ, et al. (2007) A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science 316, 1341–1345.

76. Burton PR, Clayton DG, Cardon LR, Craddock N, et al. (2007) Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447, 661–678.

77. Wakefield J (2009) Bayes factors for genome-wide association studies: comparison with P-values. Genet Epidemiol 33, 79–86.

78. Anders S, McCarthy DJ, Chen Y, Okoniewski M, Smyth GK, Huber W and Robinson MD (2013) Count-based differential expression analysis of RNA sequencing data using R and Bioconductor. Nat Protoc 8, 1765–1786.

79. Ernst J, Kheradpour P, Mikkelsen TS, Shoresh N, Ward LD, Epstein CB, Zhang X, Wang L, Issner R, Coyne M, et al. (2011) Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 473, 43–49.

80. Bailey TL and Machanick P (2012) Inferring direct DNA binding from ChIP-seq. Nucleic Acids Res 40, e128.

81. McLean CY, Bristor D, Hiller M, Clarke SL, Schaar BT, Lowe CB, Wenger AM and Bejerano G (2010) GREAT improves functional interpretation of cis-regulatory regions. Nat Biotechnol 28, 495–501.

82. Touzet H and Varre JS (2007) Efficient and accurate P-value computation for position weight matrices. Algorithms Mol Biol 2, 15.

2176 | Human Molecular Genetics 2017 Vol 26 No 11

Page 11: Edinburgh Research Explorer...equilibrium (LD). True causal variants will be enriched in cis-regulatory elements, especially in enhancers and, to a lesser extent, in gene promoters

then processed enzymatically on magnetic beads Sampleswere polished A-tailed ligated to sequencing library adaptorsand then digested with lambda exonuclease to remove nucleo-tides from 5rsquo ends of double stranded DNA Single-strandedDNA was eluted and converted to double-stranded DNA by pri-mer annealing and extension A second sequencing adaptorwas ligated to exonuclease treated ends PCR amplified gel puri-fied and sequenced Libraries were prepared for Illumina asdescribed in (16)

Data processing and peak calling

Full data processing steps are presented in the SupplementaryMaterial Briefly we mapped the ChIP-exo sequence reads(Peconics Inc USA single-end 40bp) against the hg19 build ofthe human reference genome (54) using BWA (v 074 (55)) fol-lowed by Stampy (v 1021 (56)) We filtered the reads to retainonly uniquely mapping reads with MAPQ gt 20 and removedreads mapping to empirical blacklist regions identified by theENCODE consortium (24)

For peak calling we first computed strand cross-correlationprofiles (57) of read start densities to obtain consensus esti-mates of the ChIP-exo digested fragment sizes which we cor-rected for the presence of phantom peaks (58) using amappability-corrected approach to cross-correlation inference(MaSC (59)) We used the cross-correlation based estimates for thedigested fragment size as a MACS2 (version 2010 (60)) input par-ameter to obtain peak calls Separately to increase sensitivity wecalled peaks using GPSGEM (v 241 (61)) with recommendedChIP-exo options ndashsmooth 3 ndashmrc 20 We then pooled for eachsample the two sets of peak calls (Supplementary Material)Finally we used DiffBind (62) to manipulate the per-samplemerged peaksets and to obtain consensus peakset (CPo3 CPo10CPo20) based on an overlap threshold To do this we normalisedthe read numbers using DiffBindrsquos embedded EdgeR routinesbased on the Trimmed Means of Mrsquos (TMM) algorithm (63)

Interval overlap enrichment analysis

We used the Genomic Association Tester (64) to perform therandomization-based interval enrichment analyses All ana-lyses were based on 10000 randomisations over the chosenbackground and backgrounds differed depending on the spe-cific analysis performed All analyses accounted for genome-wide patterns of GC content variability and where relevant foruniquely mappable regions (40bp reads) For the enrichment ana-lysis of VDR-BVs in GWAS intervals and QTLs we used an in-house bootstrapping pipeline to perform LD-filtering (to retainonly LD-independent foreground and background variants) andDAF matching Further details on annotations background andanalytical design are provided in the Supplementary Material

Motif discovery

Full motif analysis steps are documented in the SupplementaryMaterial Briefly we carried out de novo motif analysis usingmultiple parallel approaches MEME-ChIP (65) from the MEMEsuite (66) XXmotif (67) and PScanChIP (20) from the WeederMoDtools suite (68) For the MEME-ChIP and XXmotif analyseswe used the peak intervals in CPo10 and extracted the sequenceunderlying each interval 6 150bp from a repeat-masked versionof the hg19 reference (54) For the PScanChIP analysis we usedas the input the full set of CPo3 binding regions with Jaspar PWM

descriptors to which we added the best heterodimericRXRVDR PWM found by XXmotif and the best monomeric VDRPWM found by DREME (MEME-ChIP package) The global back-ground chosen for the analysis consisted of built-in PScanChIPDNase I digital genomic footprinting data for the LCL CEPH indi-vidual NA12865 For the motif analysis at the 43332 VDR-BVswe used PScanChIP (20) with Jaspar PWM descriptors and amixed background of the promoters and LCL DNase cut sites

Differential binding analyses

We performed both allele-specific (VDR-ASB) and regression ongenotype (VDR-QTL) differential binding analyses Both meth-ods start from analyses of read counts For the VDR-QTL ana-lysis reads were normalised and covariant-corrected asdetailed in the Supplementary material

For the VDR-ASB tests we utilised the sequence compositionof ChIP-exo sequence reads overlapping heterozygous SNPs todetermine the sequences originating from each allele separately(236970) and to identify allele-specific binding events showinga significant difference in the number of mapped reads betweenparental alleles We employed a modified version of theAlleleSeq pipeline (23) to carry out the VDR-ASB analysisBriefly we tested for significant allelic imbalance among allread pile-ups intersecting a variant showing heterozygousgenotype given a null hypothesis of 50 paternal versus 50maternal reads We followed (23) in controlling for two majorsources of bias typically encountered when running an allele-specific analysis of short read data a bias due to mapping to thereference hg19 genome (71) and a bias due to unannotated copynumber variants skewing the read counts for some of the SNPsbeing tested (72) In addition we corrected for a third source ofpotential erroneous ASB SNP calls by removing any significantallelic imbalance hypothesis falling under repeat regions con-tained in the ENCODE blacklist data (24)

For the VDR-QTL tests we analysed all 27 LCL samples sim-ultaneously We grouped the samples based on the genotypesof the underlying variant and carried out QTL association test-ing between the variantrsquos genotype (under the assumption of anadditive genetic model (73)) and the phenotype at that location(the VDR binding affinity based on the quantile-normalisedChIP-exo peak size at that location) We used an in-house pipe-line based on SNPs and indels imputed with IMPUTE2 (74) andBayesian regression modelling based on SNPTEST (21) andBIMBAM (22) Bayesian methods for analysing SNP associationsare now an established tool in the GWAS analysis (75ndash77) andshow advantages over the use of p-values in power and the in-terpretation (73) ) though they require tighter initial modellingassumptions when compared to frequentist methods For de-tails on the underlying model and priors we refer the reader tothe Supplementary document

Variant annotation

We annotated all variants associated with differences in VDRbinding and obtained by pooling the VDR-QTL and the VDR-ASBvariants (VDR-Binding Variants VDR-BVs) according to theguidelines in (25) and using a customised version of theFunseq2 suite of algorithms (26) We added additional VDR-centric information to Funseq2 VDR CPo3 binding regions motifintervals from the PScanChIP analysis for the VDRRXR JasparDR3 motif and for the DREME VDR monomer and VDR dimermonomer PWM information Gene annotation used Funseq2rsquos

2173Human Molecular Genetics 2017 Vol 26 No 11 |

default (Gencode v 16) and the HOT annotation in Funseq wasfiltered to only retain LCL HOT information We ran Funsequsing the -m 2 -nc options so that all annotation referred to theancestral allele of the variant

Availability of Data and MaterialSequencing data from this study have been submitted to theGene Expression Omnibus archive (GEO httpwwwncbinlmnihgovgeo accession number GSE73254)

Supplementary MaterialSupplementary Material is available at HMG online

Acknowledgements

We thank Mark Gerstein Jieming Chen Joel Rozowsky AlexejAbyzov Ekta Khurana and Yao Fu for precious help and tech-nical insight on the AlleleSeq CNVnator and Funseq2 plat-forms We also thank Matthew Stephens and Yongtao Guan forhelp and insight on the PIMASS and BIMBAM platforms Wethank Gosia Trynka and Kamil Slowikowksi for providing r2-based LD block data Yuchun Guo and David Gifford for precioushelp and technical insight on the GEM peak caller GertonLunter for help on the STAMPY platform Andreas Heger forhelp and technical insight on the GAT software and genomicrandomization testing Rory Stark and Simon Anders for insighton differential analysis of ChIP-seq data Laura Clarke for helpwith the 1000 Genomes data and Daniel Gaffney for providinghelpful pointers regarding the GEUVADIS QTLs calls We thankLi Shen and Ning-Yi Shao (Icahn School of Medicine at MountSinai) for insight into NGSplot utilisation and Oscar BedoyaReina and Christoffer Nellaker for helpful discussions

Conflict of Interest statement. None declared.

Funding

Medical Research Council [to GG, AJBT, WH, CPP; MC_UU_120081, MC_UU_120211, MC_EX_G1000902]; Wellcome Trust and Genzyme [to SVR]; the Multiple Sclerosis Society UK [grant number 91509]; and the Council for Science and Technology (CONACyT), Mexico [grant number 211990 to AJBT]. Funding to pay the Open Access publication charges for this article was provided by the Research Councils UK (RCUK) open access fund.

References

1. Kasowski M, Kyriazopoulou-Panagiotopoulou S, Grubert F, Zaugg JB, Kundaje A, Liu Y, Boyle AP, Zhang QC, Zakharia F, Spacek DV, et al. (2013) Extensive variation in chromatin states across humans. Science, 342, 750–752.

2. McVicker G, van de Geijn B, Degner JF, Cain CE, Banovich NE, Raj A, Lewellen N, Myrthil M, Gilad Y and Pritchard JK (2013) Identification of Genetic Variants That Affect Histone Modifications in Human Cells. Science, 342, 747–749.

3. Kilpinen H, Waszak SM, Gschwind AR, Raghav SK, Witwicki RM, Orioli A, Migliavacca E, Wiederkehr M, Gutierrez-Arcelus M, Panousis NI, et al. (2013) Coordinated effects of sequence variation on DNA binding, chromatin structure, and transcription. Science, 342, 744–747.

4. Maurano MT, Humbert R, Rynes E, Thurman RE, Haugen E, Wang H, Reynolds AP, Sandstrom R, Qu H, Brody J, et al. (2012) Systematic localization of common disease-associated variation in regulatory DNA. Science, 337, 1190–1195.

5. Gusev A, Lee SH, Trynka G, Finucane H, Vilhjalmsson BJ, Xu H, Zang C, Ripke S, Bulik-Sullivan B, Stahl E, et al. (2014) Partitioning heritability of regulatory and cell-type-specific variants across 11 common diseases. Am J Hum Genet, 95, 535–552.

6. Farh KK, Marson A, Zhu J, Kleinewietfeld M, Housley WJ, Beik S, Shoresh N, Whitton H, Ryan RJ, Shishkin AA, et al. (2015) Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature, 518, 337–343.

7. Heinz S, Romanoski CE, Benner C and Glass CK (2015) The selection and function of cell type-specific enhancers. Nat Rev Mol Cell Biol, 16, 144–154.

8. Arner E, Daub CO, Vitting-Seerup K, Andersson R, et al. (2015) Transcribed enhancers lead waves of coordinated transcription in transitioning mammalian cells. Science, 347, 1010–1014.

9. Holick MF (2007) Vitamin D Deficiency. N Engl J Med, 357, 266–281. PMID: 17634462.

10. Martins D, Wolf M, Pan D, et al. (2007) Prevalence of cardiovascular risk factors and the serum levels of 25-hydroxyvitamin D in the United States: data from the Third National Health and Nutrition Examination Survey. Arch Intern Med, 167, 1159–1165.

11. Mokry LE, Ross S, Ahmad OS, Forgetta V, Smith GD, Leong A, Greenwood CMT, Thanassoulis G and Richards JB (2015) Vitamin D and Risk of Multiple Sclerosis: A Mendelian Randomization Study. PLoS Med, 12, 1–20.

12. Afzal S, Brondum-Jacobsen P, Bojesen SE and Nordestgaard BG (2014) Genetically low vitamin D concentrations and increased mortality: mendelian randomisation analysis in three large cohorts. BMJ, 349.

13. Kliewer SA, Umesono K, Mangelsdorf DJ and Evans RM (1992) Retinoid X receptor interacts with nuclear receptors in retinoic acid, thyroid hormone and vitamin D3 signalling. Nature, 355, 446–449.

14. Ramagopalan SV, Heger A, Berlanga AJ, Maugeri NJ, Lincoln MR, Burrell A, Handunnetthi L, Handel AE, Disanto G, Orton SM, et al. (2010) A ChIP-seq defined genome-wide map of vitamin D receptor binding: Associations with disease and evolution. Genome Res, 20, 1352–1360.

15. Rhee HS and Pugh BF (2011) Comprehensive Genome-wide Protein-DNA Interactions Detected at Single-Nucleotide Resolution. Cell, 147, 1408–1419.

16. Rhee HS and Pugh BF (2012) ChIP-exo method for identifying genomic location of DNA-binding proteins with near-single-nucleotide accuracy. Curr Protoc Mol Biol, Chapter 21, Unit 21.24.

17. Yip K, Cheng C, Bhardwaj N, Brown J, Leng J, Kundaje A, Rozowsky J, Birney E, Bickel P, Snyder M and Gerstein M (2012) Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors. Genome Biol, 13, R48.

18. Carlberg C, Bendik I, Wyss A, Meier E, Sturzenbecker LJ, Grippo JF and Hunziker W (1993) Two nuclear signalling pathways for vitamin D. Nature, 361, 657–660.

19. Mathelier A, Zhao X, Zhang AW, Parcy F, Worsley-Hunt R, Arenillas DJ, Buchman S, Chen CY, Chou A, Ienasescu H, et al. (2014) JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res, 42, D142–D147.

20. Zambelli F, Pesole G and Pavesi G (2013) PscanChIP: Finding over-represented transcription factor-binding site motifs and their correlations in sequences from ChIP-Seq experiments. Nucleic Acids Res, 41, W535–W543.

21. Marchini J, Howie B, Myers S, McVean G and Donnelly P (2007) A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet, 39, 906–913.

22. Servin B and Stephens M (2007) Imputation-based analysis of association studies: candidate regions and quantitative traits. PLoS Genet, 3, e114.

23. Rozowsky J, Abyzov A, Wang J, Alves P, Raha D, Harmanci A, Leng J, Bjornson R, Kong Y, Kitabayashi N, et al. (2011) AlleleSeq: analysis of allele-specific expression and binding in a network framework. Mol Syst Biol, 7, 522.

24. Dunham I, Kundaje A, Aldred SF, Collins PJ, Davis CA, Doyle F, Epstein CB, Frietze S, Harrow J, Kaul R, et al. (2012) An integrated encyclopedia of DNA elements in the human genome. Nature, 489, 57–74.

25. Khurana E, Fu Y, Colonna V, Mu XJ, Kang HM, Lappalainen T, Sboner A, Lochovsky L, Chen J, Harmanci A, et al. (2013) Integrative annotation of variants from 1092 humans: application to cancer genomics. Science, 342, 1235587.

26. Fu Y, Liu Z, Lou S, Bedford J, Mu X, Yip KY, Khurana E and Gerstein M (2014) FunSeq2: A framework for prioritizing noncoding regulatory variants in cancer. Genome Biol, 15, 480.

27. Heinz S, Romanoski CE, Benner C, Allison KA, Kaikkonen MU, Orozco LD and Glass CK (2013) Effect of natural genetic variation on enhancer selection and function. Nature, 503, 487–492.

28. Lappalainen T, Sammeth M, Friedlander MR, ’t Hoen PA, Monlong J, Rivas MA, Gonzalez-Porta M, Kurbatova N, Griebel T, Ferreira PG, et al. (2013) Transcriptome and genome sequencing uncovers functional variation in humans. Nature, 501, 506–511.

29. Eicher JD, Landowski C, Stackhouse B, Sloan A, Chen W, Jensen N, Lien JP, Leslie R and Johnson AD (2015) GRASP v2.0: an update on the genome-wide repository of associations between SNPs and phenotypes. Nucleic Acids Res, 43, 799–804.

30. Mokry LE, Ross S, Morris JA, Manousaki D, Forgetta V and Richards JB (2016) Genetically decreased vitamin D and risk of Alzheimer disease. Neurology, 87, 2567–2574.

31. Cusanovich DA, Pavlovic B, Pritchard JK and Gilad Y (2014) The functional consequences of variation in transcription factor binding. PLoS Genet, 10, e1004226.

32. Spivakov M (2014) Spurious transcription factor binding: non-functional or genetically redundant? Bioessays, 36, 798–806.

33. Villar D, Berthelot C, Aldridge S, Rayner TF, Lukk M, Pignatelli M, Park TJ, Deaville R, Erichsen JT, Jasinska AJ, et al. (2015) Enhancer evolution across 20 mammalian species. Cell, 160, 554–566.

34. Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al. (2005) Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res, 15, 1034–1050.

35. Jimenez-Lara AM and Aranda A (1999) The vitamin D receptor binds in a transcriptionally inactive form and without a defined polarity on a retinoic acid response element. FASEB J, 13, 1073–1081.

36. Toell A, Polly P and Carlberg C (2000) All natural DR3-type vitamin D response elements show a similar functionality in vitro. Biochem J, 352 Pt 2, 301–309.

37. Zhang J, Chalmers MJ, Stayrook KR, Burris LL, Wang Y, Busby SA, Pascal BD, Garcia-Ordonez RD, Bruning JB, Istrate MA, et al. (2011) DNA binding alters coactivator interaction surfaces of the intact VDR-RXR complex. Nat Struct Mol Biol, 18, 556–563.

38. Drake JA, Bird C, Nemesh J, Thomas DJ, Newton-Cheh C, Reymond A, Excoffier L, Attar H, Antonarakis SE, Dermitzakis ET, et al. (2006) Conserved noncoding sequences are selectively constrained and not mutation cold spots. Nat Genet, 38, 223–227.

39. Fay JC, Wyckoff GJ and Wu CI (2001) Positive and negative selection on the human genome. Genetics, 158, 1227–1234.

40. Orlov I, Rochel N, Moras D and Klaholz BP (2012) Structure of the full human RXR/VDR nuclear receptor heterodimer complex with its DR3 target DNA. EMBO J, 31, 291–300.

41. Deplancke B, Alpern D and Gardeux V (2016) The Genetics of Transcription Factor DNA Binding Variation. Cell, 166, 538–554.

42. Reddy TE, Gertz J, Pauli F, Kucera KS, Varley KE, Newberry KM, Marinov GK, Mortazavi A, Williams BA, Song L, et al. (2012) Effects of sequence variation on differential allelic transcription factor occupancy and gene expression. Genome Res, 22, 860–869.

43. Levo M, Zalckvar E, Sharon E, Dantas Machado AC, Kalma Y, Lotam-Pompan M, Weinberger A, Yakhini Z, Rohs R and Segal E (2015) Unraveling determinants of transcription factor binding outside the core binding site. Genome Res, 25, 1018–1029.

44. Booth DR, Ding N, Parnell GP, Shahijanian F, Coulter S, Schibeci SD, Atkins AR, Stewart GJ, Evans RM, Downes M, et al. (2016) Cistromic and genetic evidence that the vitamin D receptor mediates susceptibility to latitude-dependent autoimmune diseases. Genes Immun, 17, 213–219.

45. Nakane M, Ma J, Ruan X, Kroeger PE and Wu-Wong R (2007) Mechanistic analysis of VDR-mediated renin suppression. Nephron Physiol, 107, 35–44.

46. Salehi-Tabar R, Nguyen-Yamamoto L, Tavera-Mendoza LE, Quail T, Dimitrov V, An BS, Glass L, Goltzman D and White JH (2012) Vitamin D receptor as a master regulator of the c-MYC/MXD1 network. Proc Natl Acad Sci USA, 109, 18827–18832.

47. Capellino S, Straub RH and Cutolo M (2014) Aromatase and regulation of the estrogen-to-androgen ratio in synovial tissue inflammation: common pathway in both sexes. Ann N Y Acad Sci, 1317, 24–31.

48. Tuoresmaki P, Vaisanen S, Neme A, Heikkinen S and Carlberg C (2014) Patterns of genome-wide VDR locations. PLoS One, 9, e96105.

49. Huang S, Laoukili J, Epping MT, Koster J, Holzel M, Westerman BA, Nijkamp W, Hata A, Asgharzadeh S, Seeger RC, et al. (2009) ZNF423 is critically required for retinoic acid-induced differentiation and is a marker of neuroblastoma outcome. Cancer Cell, 15, 328–340.

50. Matarese G, De Placido G, Nikas Y and Alviggi C (2003) Pathogenesis of endometriosis: natural immunity dysfunction or autoimmune disease? Trends Mol Med, 9, 223–228.

51. Kunadian V, Ford GA, Bawamia B, Qiu W and Manson JE (2014) Vitamin D deficiency and coronary artery disease: A review of the evidence. Am Heart J, 167, 283–291.

52. Ramagopalan SV, Maugeri NJ, Handunnetthi L, Lincoln MR, Orton SM, Dyment DA, Deluca GC, Herrera BM, Chao MJ, Sadovnick AD, et al. (2009) Expression of the multiple sclerosis-associated MHC class II Allele HLA-DRB1*1501 is regulated by vitamin D. PLoS Genet, 5, e1000369.

53. Caliskan M, Cusanovich DA, Ober C and Gilad Y (2011) The effects of EBV transformation on gene expression levels and methylation profiles. Hum Mol Genet, 20, 1643–1652.

54. Kuhn RM, Haussler D and Kent WJ (2013) The UCSC genome browser and associated tools. Brief Bioinformatics, 14, 144–161.

55. Li H and Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25, 1754–1760.

56. Lunter G and Goodson M (2011) Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. Genome Res, 21, 936–939.

57. Kharchenko PV, Tolstorukov MY and Park PJ (2008) Design and analysis of ChIP-seq experiments for DNA-binding proteins. Nat Biotechnol, 26, 1351–1359.

58. Kundaje A, Jung LY, Kharchenko P, Wold B, Sidow A, Batzoglou S and Park P (2013) Assessment of ChIP-seq data quality using cross-correlation analysis.

59. Ramachandran P, Palidwor GA, Porter CJ and Perkins TJ (2013) MaSC: mappability-sensitive cross-correlation for estimating mean fragment length of single-end short-read sequencing data. Bioinformatics, 29, 444–450.

60. Feng J, Liu T, Qin B, Zhang Y and Liu XS (2012) Identifying ChIP-seq enrichment using MACS. Nat Protoc, 7, 1728–1740.

61. Guo Y, Mahony S and Gifford DK (2012) High Resolution Genome Wide Binding Event Finding and Motif Discovery Reveals Transcription Factor Spatial Binding Constraints. PLoS Comput Biol, 8, e1002638.

62. Stark R and Brown G (2011) DiffBind: differential binding analysis of ChIP-Seq peak data. R package.

63. Robinson M and Oshlack A (2010) A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol, 11, R25.

64. Heger A, Webber C, Goodson M, Ponting CP and Lunter G (2013) GAT: a simulation framework for testing the association of genomic intervals. Bioinformatics, 29, 2046–2048.

65. Ma W, Noble WS and Bailey TL (2014) Motif-based analysis of large nucleotide data sets using MEME-ChIP. Nat Protoc, 9, 1428–1450.

66. Bailey TL, Boden M, Buske FA, Frith M, Grant CE, Clementi L, Ren J, Li WW and Noble WS (2009) MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res, 37, W202–W208.

67. Hartmann H, Guthohrlein EW, Siebert M, Luehr S and Soding J (2013) P-value-based regulatory motif discovery using positional weight matrices. Genome Res, 23, 181–194.

68. Pavesi G, Mereghetti P, Zambelli F, Stefani M, Mauri G and Pesole G (2006) MoD Tools: regulatory motif discovery in nucleotide sequences from co-regulated or homologous genes. Nucleic Acids Res, 34, W566–W570.

69. McDaniell R, Lee BK, Song L, Liu Z, Boyle AP, Erdos MR, Scott LJ, Morken MA, Kucera KS, Battenhouse A, et al. (2010) Heritable Individual-Specific and Allele-Specific Chromatin Signatures in Humans. Science, 328, 235–239.

70. Pickrell JK, Marioni JC, Pai AA, Degner JF, et al. (2010) Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature, 464, 768–772.

71. Degner JF, Marioni JC, Pai AA, Pickrell JK, Nkadori E, Gilad Y and Pritchard JK (2009) Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data. Bioinformatics, 25, 3207–3212.

72. Pickrell JK, Gaffney DJ, Gilad Y and Pritchard JK (2011) False positive peaks in ChIP-seq and other sequencing-based functional assays caused by unannotated high copy number regions. Bioinformatics, 27, 2144–2146.

73. Stephens M and Balding DJ (2009) Bayesian statistical methods for genetic association studies. Nat Rev Genet, 10, 681–690.

74. Howie BN, Donnelly P and Marchini J (2009) A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet, 5, e1000529.

75. Scott LJ, Mohlke KL, Bonnycastle LL, Willer CJ, et al. (2007) A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science, 316, 1341–1345.

76. Burton PR, Clayton DG, Cardon LR, Craddock N, et al. (2007) Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature, 447, 661–678.

77. Wakefield J (2009) Bayes factors for genome-wide association studies: comparison with P-values. Genet Epidemiol, 33, 79–86.

78. Anders S, McCarthy DJ, Chen Y, Okoniewski M, Smyth GK, Huber W and Robinson MD (2013) Count-based differential expression analysis of RNA sequencing data using R and Bioconductor. Nat Protoc, 8, 1765–1786.

79. Ernst J, Kheradpour P, Mikkelsen TS, Shoresh N, Ward LD, Epstein CB, Zhang X, Wang L, Issner R, Coyne M, et al. (2011) Mapping and analysis of chromatin state dynamics in nine human cell types. Nature, 473, 43–49.

80. Bailey TL and Machanick P (2012) Inferring direct DNA binding from ChIP-seq. Nucleic Acids Res, 40, e128.

81. McLean CY, Bristor D, Hiller M, Clarke SL, Schaar BT, Lowe CB, Wenger AM and Bejerano G (2010) GREAT improves functional interpretation of cis-regulatory regions. Nat Biotechnol, 28, 495–501.

82. Touzet H and Varre JS (2007) Efficient and accurate P-value computation for position weight matrices. Algorithms Mol Biol, 2, 15.

Page 12: Edinburgh Research Explorer...equilibrium (LD). True causal variants will be enriched in cis-regulatory elements, especially in enhancers and, to a lesser extent, in gene promoters

default (Gencode v 16) and the HOT annotation in Funseq wasfiltered to only retain LCL HOT information We ran Funsequsing the -m 2 -nc options so that all annotation referred to theancestral allele of the variant

Availability of Data and MaterialSequencing data from this study have been submitted to theGene Expression Omnibus archive (GEO httpwwwncbinlmnihgovgeo accession number GSE73254)

Supplementary MaterialSupplementary Material is available at HMG online

Acknowledgements

We thank Mark Gerstein Jieming Chen Joel Rozowsky AlexejAbyzov Ekta Khurana and Yao Fu for precious help and tech-nical insight on the AlleleSeq CNVnator and Funseq2 plat-forms We also thank Matthew Stephens and Yongtao Guan forhelp and insight on the PIMASS and BIMBAM platforms Wethank Gosia Trynka and Kamil Slowikowksi for providing r2-based LD block data Yuchun Guo and David Gifford for precioushelp and technical insight on the GEM peak caller GertonLunter for help on the STAMPY platform Andreas Heger forhelp and technical insight on the GAT software and genomicrandomization testing Rory Stark and Simon Anders for insighton differential analysis of ChIP-seq data Laura Clarke for helpwith the 1000 Genomes data and Daniel Gaffney for providinghelpful pointers regarding the GEUVADIS QTLs calls We thankLi Shen and Ning-Yi Shao (Icahn School of Medicine at MountSinai) for insight into NGSplot utilisation and Oscar BedoyaReina and Christoffer Nellaker for helpful discussions

Conflict of Interest statement None declared

FundingMedical Research Council [to GG AJBT WH CPP MC_UU_120081 MC_UU_120211 MC_EX_G1000902] Wellcome Trust andGenzyme [to SVR] the Multiple Sclerosis Society UK [grant num-ber 91509] and the Council for Science and Technology(CONACyT) Mexico [grant number 211990 to AJBT] Funding topay the Open Access publication charges for this article wasprovided by the Research Councils UK (RCUK) open access fund

References1 Kasowski M Kyriazopoulou-Panagiotopoulou S Grubert

F Zaugg JB Kundaje A Liu Y Boyle AP Zhang QCZakharia F Spacek DV et al (2013) Extensive variation inchromatin states across humans Science 342 750ndash752

2 McVicker G van de Geijn B Degner JF Cain CEBanovich NE Raj A Lewellen N Myrthil M Gilad Y andPritchard JK (2013) Identification of Genetic Variants ThatAffect Histone Modifications in Human Cells Science 342747ndash749

3 Kilpinen H Waszak SM Gschwind AR Raghav SKWitwicki RM Orioli A Migliavacca E Wiederkehr MGutierrez-Arcelus M Panousis NI et al (2013) Coordinatedeffects of sequence variation on DNA binding chromatinstructure and transcription Science 342 744ndash747

4 Maurano MT Humbert R Rynes E Thurman REHaugen E Wang H Reynolds AP Sandstrom R Qu HBrody J et al (2012) Systematic localization of commondisease-associated variation in regulatory DNA Science 3371190ndash1195

5 Gusev A Lee SH Trynka G Finucane H VilhjalmssonBJ Xu H Zang C Ripke S Bulik-Sullivan B Stahl Eet al (2014) Partitioning heritability of regulatory and cell-type-specific variants across 11 common diseases Am JHum Genet 95 535ndash552

6 Farh KK Marson A Zhu J Kleinewietfeld M HousleyWJ Beik S Shoresh N Whitton H Ryan RJ ShishkinAA et al (2015) Genetic and epigenetic fine mapping ofcausal autoimmune disease variants Nature 518 337ndash343

7 Heinz S Romanoski CE Benner C and Glass CK (2015)The selection and function of cell type-specific enhancersNat Rev Mol Cell Biol 16 144ndash154

8 Arner E Daub CO Vitting-Seerup K Andersson R et al(2015) Transcribed enhancers lead waves of coordinatedtranscription in transitioning mammalian cells Science 3471010ndash1014

9 Holick MF (2007) Vitamin D Deficiency N Eng JMed 357266ndash281 PMID 17634462

10 Martins D Wolf M Pan D and et al (2007) Prevalence ofcardiovascular risk factors and the serum levels of 25-hydroxyvitamin D in the United States data from the ThirdNational Health and Nutrition Examination Survey ArchIntern Med 167 1159ndash1165

11 Mokry LE Ross S Ahmad OS Forgetta V Smith GDLeong A Greenwood CMT Thanassoulis G andRichards JB (2015) Vitamin D and Risk of Multiple SclerosisA Mendelian Randomization Study PLoS Med 12 1ndash20

12 Afzal S Brondum-Jacobsen P Bojesen SE andNordestgaard BG (2014) Genetically low vitamin D concen-trations and increased mortality mendelian randomisationanalysis in three large cohorts BMJ 349

13 Kliewer SA Umesono K Mangelsdorf DJ and Evans RM(1992) Retinoid X receptor interacts with nuclear receptors inretinoic acid thyroid hormone and vitamin D3 signallingNature 355 446ndash449

14 Ramagopalan SV Heger A Berlanga AJ Maugeri NJLincoln MR Burrell A Handunnetthi L Handel AEDisanto G Orton SM et al (2010) A ChIP-seq definedgenome-wide map of vitamin D receptor bindingAssociations with disease and evolution Genome Res 201352ndash1360

15 Rhee HS and Pugh BF (2011) Comprehensive Genome-wide Protein-DNA Interactions Detected at Single-Nucleotide Resolution Cell 147 1408ndash1419

16 Rhee HS and Pugh BF (2012) ChIP-exo method for iden-tifying genomic location of DNA-binding proteins with near-single-nucleotide accuracy Curr Protoc Mol Biol Chapter21 Unit 2124

17 Yip K Cheng C Bhardwaj N Brown J Leng J KundajeA Rozowsky J Birney E Bickel P Snyder M andGerstein M (2012) Classification of human genomic regionsbased on experimentally determined binding sites of morethan 100 transcription-related factors Genome Biol 13 R48

18 Carlberg C Bendik I Wyss A Meier E SturzenbeckerLJ Grippo JF and Hunziker W (1993) Two nuclear signal-ling pathways for vitamin D Nature 361 657ndash660

19 Mathelier A Zhao X Zhang AW Parcy F Worsley-HuntR Arenillas DJ Buchman S Chen CY Chou AIenasescu H et al (2014) JASPAR 2014 an extensively

2174 | Human Molecular Genetics 2017 Vol 26 No 11

expanded and updated open-access database of transcrip-tion factor binding profiles Nucleic Acids Res 42 D142ndashD147

20 Zambelli F Pesole G and Pavesi G (2013) PscanChIPFinding over-represented transcription factor-binding sitemotifs and their correlations in sequences from ChIP-Seqexperiments Nucleic Acids Res 41 W535ndashW543

21 Marchini J Howie B Myers S McVean G and Donnelly P(2007) A new multipoint method for genome-wide associ-ation studies by imputation of genotypes Nat Genet 39906ndash913

22 Servin B and Stephens M (2007) Imputation-based analysisof association studies candidate regions and quantitativetraits PLoS Genet 3 e114

23 Rozowsky J Abyzov A Wang J Alves P Raha DHarmanci A Leng J Bjornson R Kong Y Kitabayashi Net al (2011) AlleleSeq analysis of allele-specific expressionand binding in a network framework Mol Sys Biol 7 522

24 Dunham I Kundaje A Aldred SF Collins PJ Davis CADoyle F Epstein CB Frietze S Harrow J Kaul R et al(2012) An integrated encyclopedia of DNA elements in thehuman genome Nature 489 57ndash74

25 Khurana E Fu Y Colonna V Mu XJ Kang HMLappalainen T Sboner A Lochovsky L Chen JHarmanci A et al (2013) Integrative annotation of variantsfrom 1092 humans application to cancer genomics Science342 1235587

26 Fu Y Liu Z Lou S Bedford J Mu X Yip KY KhuranaE and Gerstein M (2014) FunSeq2 A framework for priori-tizing noncoding regulatory variants in cancer Genome Biol15 480

27 Heinz S Romanoski CE Benner C Allison KAKaikkonen MU Orozco LD and Glass CK (2013) Effect ofnatural genetic variation on enhancer selection and func-tion Nature 503 487ndash492

28 Lappalainen T Sammeth M Friedlander MR rsquot HoenPA Monlong J Rivas MA Gonzalez-Porta M KurbatovaN Griebel T Ferreira PG et al (2013) Transcriptome andgenome sequencing uncovers functional variation inhumans Nature 501 506ndash511

29 Eicher JD Landowski C Stackhouse B Sloan A ChenW Jensen N Lien JP Leslie R and Johnson AD (2015)GRASP v20 an update on the genome-wide repository of as-sociations between SNPs and phenotypes Nucleic Acids Res43 799ndash804

30 Mokry LE Ross S Morris JA Manousaki D Forgetta Vand Richards JB (2016) Genetically decreased vitamin Dand risk of Alzheimer disease Neurology 87 2567ndash2574

31 Cusanovich DA Pavlovic B Pritchard JK and Gilad Y(2014) The functional consequences of variation in tran-scription factor binding PLoS Genet 10 e1004226

32 Spivakov M (2014) Spurious transcription factor bindingnon-functional or genetically redundant Bioessays 36798ndash806

33 Villar D Berthelot C Aldridge S Rayner TF Lukk MPignatelli M Park TJ Deaville R Erichsen JT JasinskaAJ et al (2015) Enhancer evolution across 20 mammalianspecies Cell 160 554ndash566

34 Siepel A Bejerano G Pedersen JS Hinrichs AS Hou MRosenbloom K Clawson H Spieth J Hillier LWRichards S et al (2005) Evolutionarily conserved elementsin vertebrate insect worm and yeast genomes Genome Res15 1034ndash1050

35 Jimenez-Lara AM and Aranda A (1999) The vitamin D re-ceptor binds in a transcriptionally inactive form and without

a defined polarity on a retinoic acid response elementFASEB J 13 1073ndash1081

36 Toell A Polly P and Carlberg C (2000) All natural DR3-typevitamin D response elements show a similar functionalityin vitro Biochem J 352 Pt 2 301ndash309

37 Zhang J Chalmers MJ Stayrook KR Burris LL WangY Busby SA Pascal BD Garcia-Ordonez RD BruningJB Istrate MA et al (2011) DNA binding alters coactivatorinteraction surfaces of the intact VDR-RXR complex NatStruct Mol Biol 18 556ndash563

38 Drake JA Bird C Nemesh J Thomas DJ Newton-ChehC Reymond A Excoffier L Attar H Antonarakis SEDermitzakis ET et al (2006) Conserved noncoding se-quences are selectively constrained and not mutation coldspots Nat Genet 38 223ndash227

39 Fay JC Wyckoff GJ and Wu CI (2001) Positive and nega-tive selection on the human genome Genetics 1581227ndash1234

40 Orlov I Rochel N Moras D and Klaholz BP (2012)Structure of the full human RXRVDR nuclear receptor heter-odimer complex with its DR3 target DNA Embo J 31291ndash300

41 Deplancke B Alpern D and Gardeux V (2016) TheGenetics of Transcription Factor DNA Binding Variation Cell166 538ndash554

42 Reddy TE Gertz J Pauli F Kucera KS Varley KENewberry KM Marinov GK Mortazavi A Williams BASong L et al (2012) Effects of sequence variation on differ-ential allelic transcription factor occupancy and gene ex-pression Genome Res 22 860ndash869

43 Levo M Zalckvar E Sharon E Dantas Machado ACKalma Y Lotam-Pompan M Weinberger A Yakhini ZRohs R and Segal E (2015) Unraveling determinants oftranscription factor binding outside the core binding siteGenome Res 25 1018ndash1029

44 Booth DR Ding N Parnell GP Shahijanian F CoulterS Schibeci SD Atkins AR Stewart GJ Evans RMDownes M et al (2016) Cistromic and genetic evidence thatthe vitamin D receptor mediates susceptibility to latitude-dependent autoimmune diseases Genes Immun 17213ndash219

45 Nakane M Ma J Ruan X Kroeger PE and Wu-Wong R(2007) Mechanistic analysis of VDR-mediated renin suppres-sion Nephron Physiol 107 35ndash44

46 Salehi-Tabar R Nguyen-Yamamoto L Tavera-MendozaLE Quail T Dimitrov V An BS Glass L Goltzman Dand White JH (2012) Vitamin D receptor as a master regula-tor of the c-MYCMXD1 network Proc Natl Acad Sci USA109 18827ndash18832

47 Capellino S Straub RH and Cutolo M (2014) Aromataseand regulation of the estrogen-to-androgen ratio in synovialtissue inflammation common pathway in both sexes AnnN Y Acad Sci 1317 24ndash31

48 Tuoresmaki P Vaisanen S Neme A Heikkinen S andCarlberg C (2014) Patterns of genome-wide VDR locationsPLoS One 9 e96105

49 Huang S Laoukili J Epping MT Koster J Holzel MWesterman BA Nijkamp W Hata A Asgharzadeh SSeeger RC et al (2009) ZNF423 Is Critically required for ret-inoic acid-induced differentiation and is a marker of neuro-blastoma outcome Cancer Cell 15 328ndash340

50 Matarese G De Placido G Nikas Y and Alviggi C (2003)Pathogenesis of endometriosis natural immunity dysfunc-tion or autoimmune disease Trends Mol Med 9 223ndash228

2175Human Molecular Genetics 2017 Vol 26 No 11 |

51 Kunadian V Ford GA Bawamia B Qiu W and MansonJE (2014) Vitamin D deficiency and coronary artery diseaseA review of the evidence Am Heart J 167 283ndash291

52 Ramagopalan SV Maugeri NJ Handunnetthi L LincolnMR Orton SM Dyment DA Deluca GC Herrera BMChao MJ Sadovnick AD et al (2009) Expression of themultiple sclerosis-associated MHC class II Allele HLA-DRB11501 is regulated by vitamin D PLoS Genet 5e1000369

53 Caliskan M Cusanovich DA Ober C and Gilad Y (2011)The effects of EBV transformation on gene expression levelsand methylation profiles Hum Mol Genet 20 1643ndash1652

54 Kuhn RM Haussler D and Kent WJ (2013) The UCSC gen-ome browser and associated tools Brief Bioinformatics 14144ndash161

55 Li H and Durbin R (2009) Fast and accurate short readalignment with Burrows-Wheeler transform Bioinformatics25 1754ndash1760

56 Lunter G and Goodson M (2011) Stampy a statistical algo-rithm for sensitive and fast mapping of Illumina sequencereads Genome Res 21 936ndash939

57 Kharchenko PV Tolstorukov MY and Park PJ (2008)Design and analysis of ChIP-seq experiments for DNA-binding proteins Nat Biotechnol 26 1351ndash1359

58 Kundaje A Jung LY Kharchenko P Wold B Sidow ABatzoglou S and Park P (2013) Assessment of ChIP-seqdata quality using cross-correlation analysis

59 Ramachandran P Palidwor GA Porter CJ and PerkinsTJ (2013) MaSC mappability-sensitive cross-correlation forestimating mean fragment length of single-end short-readsequencing data Bioinformatics 29 444ndash450

60 Feng J Liu T Qin B Zhang Y and Liu XS (2012)Identifying ChIP-seq enrichment using MACS Nat Protoc 71728ndash1740

61 Guo Y Mahony S and Gifford DK (2012) High ResolutionGenome Wide Binding Event Finding and Motif DiscoveryReveals Transcription Factor Spatial Binding ConstraintsPLoS Comput Biol 8 e1002638

62 Stark R and Brown G (2011) DiffBind differential bindinganalysis of ChIP-Seq peak data R package

63 Robinson M and Oshlack A (2010) A scaling normalizationmethod for differential expression analysis of RNA-seq dataGenome Biol 11 R25

64 Heger A Webber C Goodson M Ponting CP and LunterG (2013) GAT a simulation framework for testing the associ-ation of genomic intervals Bioinformatics 29 2046ndash2048

65 Ma W Noble WS and Bailey TL (2014) Motif-based ana-lysis of large nucleotide data sets using MEME-ChIP NatProtoc 9 1428ndash1450
