
Genomics 92 (2008) 292–300

Contents lists available at ScienceDirect

Genomics

journal homepage: www.elsevier.com/locate/ygeno

Screening reveals conserved and nonconserved transcriptional regulatory elements including an E3/E4 allele-dependent APOE coding region enhancer

Hsuan Pu Chen a,1, Andy Lin b, Joshua S. Bloom c,2, Arshad H. Khan c, Christopher C. Park c, Desmond J. Smith c,d,⁎

a Department of Biomedical Engineering, University of California, Los Angeles, CA 90095, USA
b Department of Psychology, University of California, Los Angeles, CA 90095, USA
c Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, CA 90095, USA
d Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, 23-120 CHS, Box 951735, University of California, Los Angeles, CA 90095-1735, USA

⁎ Corresponding author. Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, 23-120 CHS, Box 951735, University of California, Los Angeles, CA 90095-1735, USA. Fax: +1 310 825 6267.
E-mail address: [email protected] (D.J. Smith).
1 Current address: Department of Neurology, University of California, San Francisco, CA 94143, USA.
2 Current address: Department of Molecular Biology, Princeton University, Princeton, NJ 08544, USA.

0888-7543/$ – see front matter © 2008 Elsevier Inc. All rights reserved.
doi:10.1016/j.ygeno.2008.07.009

Article info

Article history:
Received 31 January 2008
Accepted 24 July 2008
Available online 3 September 2008

Keywords:
APOE
Bioinformatics
Enhancer
Luciferase
Nucleic acid regulatory sequences
Shotgun cloning
Silencer
SNPs
Transcription

Abstract

We performed an unbiased experimental search for enhancers and silencers in a 153-kb region containing the human apolipoprotein (APO) E/C1/C4/C2 gene cluster using shotgun cloning into a luciferase vector. A continuum of transcriptional effect sizes was observed, possibly explaining the limited success of bioinformatics in identifying regulatory regions. We identified nine statistically significant enhancers and five silencers functional in either liver or astrocyte cells, including two previously known enhancers. Only two of the fourteen elements contained conserved noncoding sequences. Within the coding sequence of the APOE gene we identified an enhancer for the E4 allele associated with Alzheimer's disease, but not E3. The single nucleotide polymorphism (SNP) causing the E4/E3 amino acid substitution was responsible for these variations, potentially explaining the higher expression levels of E4. Our results suggest a wider variety of mammalian transcriptional regulatory sequences than is currently recognized and that these may include coding region SNPs.

© 2008 Elsevier Inc. All rights reserved.

An essential task in deciphering the human genome sequence will be to compile a “dictionary” of sequences responsible for regulating gene expression [1]. Furthermore, disruption of normal regulation has been linked to a number of diseases including hemophilia, anemia, epilepsy, and a variety of cancers [2]. However, our knowledge of transcriptional regulatory elements is scarce. Enhancers and silencers can reside far from the core promoter of a gene and be either upstream or downstream or in intergenic or intronic regions [3]. Known regulatory sequences are often enriched in transcription factor binding sites, for activators in the case of enhancers or repressors in the case of silencers [4,5]. Enhancers and silencers are usually orientation-independent.

Bioinformatic approaches have had limited success in identifying regulatory elements, with many false predictions. These algorithms frequently assume that regulatory sequences are likely to be conserved in evolution [6,7]. Complicating this approach is the fact that regulatory sequences are often short and highly degenerate. Nevertheless, one recent study directly assayed highly conserved noncoding sequences in transgenic mice and showed enrichment in tissue-specific regulatory sequences [8].

Recently, the notion that all regulatory sequences are evolutionarily conserved has been challenged by a study from the ENCODE consortium [9]. This large project examined 1% of the human genome for functional elements, using a number of approaches including transcription factor binding, detailed transcription start site mapping, and identification of DNase I hypersensitive sites. Up to 50% of the identified functional elements were nonconserved, a phenomenon possibly endowing the genome with an evolutionary “stockpile” of reserve elements.

Direct high-throughput analysis of transcriptional regulatory elements by transfecting luciferase reporter genes into mammalian cells has also been employed [10]. The promoters of 613 genes were examined, consisting of 1,000 bp upstream of the transcriptional start site. Interestingly, there was a significant relationship between the activity profile of the isolated promoters in various tissue culture lines and the endogenous gene expression profiles.

Here we evaluate in a relatively unbiased fashion the transcriptional properties of a 153-kb region of human chromosome 19. We shotgun cloned a BAC harboring the APOE/C1/C4/C2 gene cluster into a luciferase vector and quantitatively evaluated the resulting fragments for regulatory activity in liver and astrocyte cells. Overall, our data indicated that traditional principles guiding the discovery of regulatory elements by sequence comparison are too constrained, encouraging the use of more inclusive discovery strategies.

Results

Shotgun cloning, transfection, and luciferase assays

A 153-kb BAC from human chromosome 19 was selected for analysis. The BAC contains seven known genes: PVRL2, TOMM40, APOE, APOC1, APOC4, APOC2, and CLPTM1. The APOE/C1/C4/C2 gene cluster and the APOE gene had previously been analyzed for enhancer activity by deletion mapping. Two liver enhancers, HCR1 [11] and HCR2 [12], associated with the APOE/C1/C4/C2 gene cluster were identified, as well as two astrocyte enhancers, ME1 and ME2 [13], associated with the APOE gene. Another enhancer linked with the APOE gene, BCR, is active in neurons and microglia, but not in the cells tested here, liver and astroglia [14].

The BAC was digested independently by AluI or Sau3AI and subcloned into the multiple cloning site of the pGL3-promoter vector. This vector contains the basal SV40 early promoter upstream of the firefly luciferase gene. Fragments cloned into the multiple cloning site upstream of the SV40 promoter can act as enhancers to increase firefly luciferase transcription. A total of 1,798 clones were obtained, 1,049 from AluI and 749 from Sau3AI, with an average size of ∼230 bp. The coverage of the BAC was 1.2-fold for AluI and 1.4-fold for Sau3AI, giving a total coverage of 2.6-fold.
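The total coverage can be roughly checked from the clone count and the average insert size. The sketch below assumes that fold coverage is simply the total cloned bases divided by the BAC length; the function name `fold_coverage` is illustrative and not part of the original analysis. Because it uses the rounded pooled average of ∼230 bp, it gives about 2.7-fold, close to but not identical with the reported 2.6-fold.

```python
def fold_coverage(n_clones, mean_insert_bp, region_bp):
    """Approximate fold coverage: total cloned bases / region length.

    Assumption: every clone contributes an insert of the average size.
    """
    return n_clones * mean_insert_bp / region_bp


# Values taken from the text: 1,798 clones, ~230-bp average insert, 153-kb BAC.
total = fold_coverage(1798, 230, 153_000)
print(f"approximate total coverage: {total:.2f}-fold")
```

The per-enzyme figures cannot be reproduced this way without the enzyme-specific insert sizes, which are not given in the text.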

All members of the pGL3 shotgun library were individually lipofected into the human liver cell line C3A in a 96-well plate format. Nearly the entire library (>99%) was also transfected into the human astrocyte glial cell line SVGP12. All clones were cotransfected with phRL-TK, which contains the Renilla luciferase gene driven by the ubiquitously expressed HSV-TK promoter. The firefly and Renilla luciferases employ different substrates and can be assayed independently. The Renilla luciferase acts as an internal control for transfection efficiency and results are expressed as the log2 ratio of firefly to Renilla luciferase activities. As judged using a CMV-GFP construct (pEGFP-N3, Clontech), the transfection efficiency in the 96-well plate format was uniform at ∼7% for the C3A liver cells and ∼25% for the SVGP12 glial cells.
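As an illustration of the readout, the relative luciferase activity of a well is the base-2 logarithm of the firefly signal divided by the Renilla signal. The following is a minimal sketch; the function name and the example readings are hypothetical and are not data from this study.

```python
import math


def relative_activity(firefly, renilla):
    """Log2 ratio of firefly to Renilla luciferase signal for one well."""
    if firefly <= 0 or renilla <= 0:
        raise ValueError("luminescence readings must be positive")
    return math.log2(firefly / renilla)


# Hypothetical readings: an 8-fold firefly/Renilla ratio and a 0.5-fold ratio.
print(relative_activity(800.0, 100.0))
print(relative_activity(50.0, 100.0))
```

On this scale a value of 0 means equal firefly and Renilla signal, and each unit corresponds to a twofold change in the ratio.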

Mammalian DNA sequences show a continuum of transcriptional regulatory properties

The BeadArray package in R was used to quantile normalize the relative luciferase activities across the experimental plates in the C3A (Fig. 1A) and SVGP12 (Fig. 1B) cell lines. The control pGL3-basic vector with no promoter or enhancer had low activity, as expected. The pGL3-promoter vector containing an SV40 promoter but no enhancer was found to have a medium amount of luciferase activity. A vector containing both the promoter and the known enhancer element HCR1 had very high activity.
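For readers unfamiliar with quantile normalization, the idea is to force every plate to share the same distribution of values by replacing each measurement with the mean, across plates, of the values at the same rank. The sketch below is a plain-Python illustration of that idea, not the BeadArray implementation used here; it assumes plates of equal size and does not treat ties specially, and the input values are hypothetical.

```python
def quantile_normalize(plates):
    """Quantile normalize a list of equal-length plates (lists of floats)."""
    n = len(plates[0])
    if any(len(p) != n for p in plates):
        raise ValueError("all plates must contain the same number of wells")
    sorted_plates = [sorted(p) for p in plates]
    # Mean across plates of the value found at each rank.
    rank_means = [sum(sp[i] for sp in sorted_plates) / len(plates) for i in range(n)]
    normalized = []
    for plate in plates:
        order = sorted(range(n), key=lambda i: plate[i])  # well indices by rank
        out = [0.0] * n
        for rank, well in enumerate(order):
            out[well] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical relative activities for two three-well plates.
normalized = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
print(normalized)
```

After normalization each plate contains the same set of values, so plate-to-plate shifts in overall signal no longer affect the ranking of clones.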

The distribution of raw relative activity measures for all fragments in liver C3A cells appeared nearly normal, with a negative tail potentially representing silencers (Figs. 1C and 1D). To evaluate whether the apparent normality of the distribution was solely attributable to experimental noise or whether clone identity made an appreciable contribution, we modeled the relative luciferase activity measure under a linear mixed effects model. We treated the baseline activity as a fixed effect and the fragment effect as random. We chose 20 clones using a random number generator and assayed their luciferase activity in liver cells, using 10 replicates for each clone. One clone was subsequently found to be E. coli DNA and its results were excluded. The model was thus fit to activity data from 19 clones.

A likelihood ratio test revealed that the model including a clone effect better fitted the data than one in which all variability was treated as noise (p<0.001). The clone effect accounted for approximately 40% of the variance, with experimental noise contributing the balance. Thus the clone effect was substantial, although noise contributed most to shaping the distribution of luciferase activities. Fig. 1E shows that the distribution of the estimated activity means of the 19 random fragments follows closely the distribution of single measures of activity of all fragments (Fig. 1D), reinforcing the idea that fragment effects make a significant contribution to the variance.
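The partition of variance into a clone component and a noise component can be illustrated with a balanced one-way random-effects calculation. The sketch below uses the ANOVA (method-of-moments) estimator rather than the likelihood-based mixed model fit described above, so it is a stand-in for the idea and not the analysis itself; the function name and the toy replicate values are hypothetical.

```python
from statistics import mean


def clone_variance_fraction(groups):
    """Fraction of total variance attributable to clone identity.

    groups: one list of replicate activities per clone, all of equal length.
    Balanced one-way ANOVA estimator; assumes some within-clone variation.
    """
    k = len(groups)       # number of clones
    n = len(groups[0])    # replicates per clone
    if k < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need at least 2 clones with equal replicate counts >= 2")
    clone_means = [mean(g) for g in groups]
    grand_mean = mean(clone_means)
    ms_between = n * sum((m - grand_mean) ** 2 for m in clone_means) / (k - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, clone_means) for x in g
    ) / (k * (n - 1))
    # Negative estimates are truncated at zero.
    var_clone = max(0.0, (ms_between - ms_within) / n)
    return var_clone / (var_clone + ms_within)


# Toy data: three clones, two replicates each (not measurements from this study).
toy = [[1.0, 3.0], [5.0, 7.0], [9.0, 11.0]]
fraction = clone_variance_fraction(toy)
print(round(fraction, 3))
```

Applied to the 19 clones with 10 replicates each, an estimate of this kind would correspond to the approximately 40% clone contribution reported above, with the remainder attributed to experimental noise.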

The observation of an apparently continuous effect size raised the possibility that there may be no categorical distinction between nonregulatory and regulatory sequences, particularly enhancers. We explored this issue using Gaussian mixture modeling of 10,000 bootstrap samples of the raw relative activity measures of all fragments in liver (Fig. 1C). The distribution was most likely composed of two modes (p=0.957), corresponding to silencers and nonsilencers. This is consistent with the negative tail of the bulk distribution (Fig. 1D) and implies difficulty in separating enhancers from nonregulatory sequences. Three modes consisting of silencer, nonregulatory, and enhancer sequences had p=0.043, while a unimodal distribution was not observed; i.e., p<0.0001.

We used mixture modeling to investigate the role of experimental noise and sample size in the low power to detect an enhancer mode. Separate enhancer and nonenhancer populations were modeled by two-component data simulated using empirically derived estimates of the means and variances from multiple replication experiments (Fig. 1F and below). Experimental variance was modeled either by adding noise corresponding to our experiments (60% of the overall variance) or by neglecting it. We also used either our actual sample sizes or simulated samples tenfold larger.

Noise at a level found in our data completely obscured any distinction between enhancers and nonenhancers, regardless of sample size. Even without noise, the two components could not be separated most of the time at empirical sample sizes, unless enhancers were on average at least as strong as the HCR1 control. Increasing the sample size tenfold resulted in reliable multimodal detection, but curiously resulted in three components on most trials.

Thus, even with much larger sample sizes and no experimental noise, it appears difficult to separate enhancers and nonregulatory sequences, while silencers seem to be somewhat more easily distinguished. These observations may help explain the limited success of bioinformatic strategies in recognizing transcriptional control regions.

Identifying regulatory elements

Clones whose values exceeded a threshold of two standard deviations above and below the mean after normalization were chosen for a second round of screening, in which their activities were assayed in duplicate. Fig. 1F summarizes the screening procedure. Clones whose activities were significantly different upon duplicate testing (t test, df 2, p<0.05) from the pGL3-promoter-only control were end-sequenced and subjected to a final round of screening. Through end-sequencing, we eliminated all fragments containing either multiple inserts or E. coli DNA, as judged using BLAST (NCBI) [15]. For the remaining clones, the final screen consisted of 10 replicate assays. Clones were eliminated that were not significantly different from the pGL3-promoter control as judged using t-tests (df 18, p<0.05) with a Bonferroni correction for multiple hypothesis testing.
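The first-round threshold and the Bonferroni adjustment used in the final round can be sketched as follows. This is an illustration under stated assumptions: the sample standard deviation is used for the threshold, the clone names and activity values are hypothetical, and the number of tests passed to the Bonferroni helper is a placeholder because the count of clones entering the final round is not stated here.

```python
from statistics import mean, stdev


def first_round_hits(activities, n_sd=2.0):
    """Return clone names whose activity lies more than n_sd SDs from the mean.

    activities: dict mapping clone name to normalized relative activity.
    """
    values = list(activities.values())
    mu, sd = mean(values), stdev(values)  # sample SD (an assumption)
    return [name for name, v in activities.items() if abs(v - mu) > n_sd * sd]


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical plate: nine clones at baseline and one strong outlier.
plate = {f"c{i}": 0.0 for i in range(9)}
plate["c9"] = 10.0
hits = first_round_hits(plate)
print(hits)
print(bonferroni_alpha(0.05, 10))  # 10 tests is a placeholder count
```

Clones flagged in this way would then proceed to duplicate and 10-replicate t-tests against the promoter-only control, as described above.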

Both conserved and nonconserved sequences constitute regulatory elements

The genomic locations of the identified regulatory sequences that passed the final screen are shown in Fig. 2A. Of the fragments active in liver cells, five were intronic, one was exonic, and four were intergenic (Fig. 2A). Six of the liver elements were enhancers and four were silencers (Fig. 2B). For the astroglial cells, three fragments were intergenic and one was intronic (Fig. 2A). Of the glial elements, three were enhancers and one was a silencer (Fig. 2C).

Fig. 1. Distributions and workflow. (A) Quantile normalization of relative luciferase activity for all fragments in liver C3A cells compared within and between plates. Relative luciferase activity is the log2 ratio of firefly luciferase to Renilla luciferase. Batch number indicates corresponding 96-well plate. (B) Quantile normalization in astroglial SVGP12 cells. (C) Distribution of raw relative luciferase activities for all fragments tested in liver C3A cells. The distribution appears unimodal and nearly normal. (D) Q–Q normal plot of all fragments in C3A cells. The majority of points lie within a normal distribution (line). A negative tail is present. (E) Q–Q normal plot of mean activity of 20 randomly chosen fragments. A normal distribution with a negative tail is replicated. (F) Workflow for identifying regulatory elements.

Two liver [11,12] and two astroglial [13] enhancers had previously been identified in the BAC. Of these four elements, we recovered two. The HCR2 liver enhancer associated with the APOE/C1/C4/C2 gene cluster was recovered in clone H6. The ME2 astroglial enhancer associated with the APOE gene was recovered in clone G4. This rediscovery rate suggests that the search for regulatory sequences in the BAC was reasonably comprehensive, consistent with the 2.6-fold clone coverage. Each of the enhancers and silencers we identified could be closely associated with one of the genes in the BAC. For example, two liver enhancers and two liver silencers were found within the CLPTM1 gene.

Fig. 2. Putative regulatory elements. (A) Enhancers and silencers. Expression values for each gene in brain and liver deduced from SymAtlas [38]. (B) Mean activities of 10 replicates of selected fragments in liver cells. Fold change relative to promoter-only construct shown. Error bars, standard error of the mean. (⁎, p<0.0001 compared to promoter-only construct, all figures.) (C) Selected fragments in astroglial cells. (D) Conserved and repetitive sequence within regulatory elements.

Fig. 3. Regulatory effects of evolutionarily conserved regions and Alu repeat elements. (A) Fragment H2 has two regions of conservation among vertebrates. The 45-bp region is responsible for enhancer activity. (B) A portion of H1 contains an Alu repeat element. Only full-length H1 in the forward orientation shows silencer activity.

RepeatMasker [16] and phastCons [17] were employed to find repetitive sequences and conserved regions, respectively, in the discovered regulatory sequences. The locations of these elements are shown in Fig. 2D. Of the 14 regulatory elements we identified, 6 harbored repetitive sequences for greater than 50% of their lengths. These elements acted as both enhancers and silencers in liver and astroglial cells. There are precedents for repetitive sequences playing a role in mammalian gene regulation [18]. For example, Alu elements have been shown to act as positive and negative regulatory elements in the γ chain gene of the high-affinity IgE receptor [19]. In another example, a silencer of the Wilms' tumor 1 gene has been shown to be dependent on an Alu element for activity [20].

Of the single-copy noncoding DNA present in all 14 regulatory elements, only 10.9% distributed across three blocks had significant conservation among vertebrates, as judged by phastCons [17]. Two of the conserved blocks (70 bp and 45 bp) were found in the liver enhancer H2 and the third (63 bp) was found in the liver enhancer H6. The low proportion of conserved elements suggests that bioinformatic approaches relying on sequence conservation have limited power to detect control elements, consistent with our observation that there are relatively small effect differences between regulatory and nonregulatory sequences.

We performed additional experiments to narrow down the particular sequences responsible for enhancer or silencer activity. Fragment H2 (396 bp), a liver enhancer, contained two regions that are highly conserved among vertebrates, as identified by significant phastCons conservation LOD scores surpassing a threshold of 20 (Fig. 3A, left). The resected 70-bp conserved region showed relative luciferase activity comparable to the promoter-only control (Fig. 3A, right). The resected 45-bp conserved region and the full-length fragment showed similar luciferase activities that were both significantly greater than the promoter control (p<0.0001).

Fig. 4. Exon 4 coding region of APOE shows differential regulatory activity between E3 and E4 alleles. (A) Region of APOE exon 4 in fragment H3. APOE E3 and E4 differ by a one-nucleotide missense mutation. (B) APOE4 allele in the H3 fragment shows orientation-independent enhancer activity in liver C3A cells, (C) astroglial SVGP12 cells, and (D) astroglial DBTRG cells. The E3 allele shows no enhancer activity in any cell line.

Fragment H1 (157 bp), a liver silencer, contained a 45-bp single-copy region and a 112-bp piece of a repetitive Alu element (Fig. 3B, left). This silencer fragment is in the first intron of the PVRL2 gene and was cloned in the same orientation relative to the SV40 promoter of pGL3, though upstream rather than downstream. Neither the 112-bp region nor the 45-bp region of H1 by itself recapitulated the silencer effect of this element. Only the full H1 fragment in the forward orientation and not the reverse significantly suppressed luciferase activity compared to the promoter-alone control (Fig. 3B, right), suggesting that the silencer effect of H1 is an orientation-dependent property of the entire fragment.

An E3/E4 allele-dependent coding region enhancer in APOE

Fragment H3 (108 bp) was contained entirely within the exon 4 coding region of APOE and exhibited enhancer activity in liver cells (Figs. 2A and 2D). The H3 fragment represents the E4 allele of APOE, the allele on the parental BAC (Fig. 4A). APOE is ubiquitously transcribed, and the E4 allele is associated with a higher level of APOE transcripts than the E3 allele in astroglial cells [24]. The E4 allele is also associated with an increased risk of Alzheimer's disease [21, 22] and elevated levels of cholesterol [23].

We confirmed that the 108-bp H3 fragment containing the E4 allele was an active enhancer in C3A liver and SVGP12 astroglial cells (Figs. 4B and 4C). We end-sequenced the top 34 clones that just failed to make the cutoff in the initial screen of clones using the SVGP12 astroglial cell line and found that the H3 clone had positive transcriptional effects, being 1.22 standard deviations above the mean. We further confirmed the enhancer activity of H3 in astroglial cells by transfecting the clone into the additional human cell line DBTRG (Fig. 4D).

The E4 allele differs from the more common E3 allele by a single nucleotide missense mutation that changes a cysteine to an arginine (Fig. 4A). We investigated whether the enhancer activity associated with exon 4 of APOE might explain the differing transcript levels of the E3 and E4 alleles. In contrast to the E4 allele, the 108-bp region of the E3 allele corresponding to fragment H3 showed no enhancer activity in any of the cell lines investigated (Figs. 4B–4D). Furthermore, the 108-bp E4 allele coding sequence fragment showed orientation-independent enhancer activity in the liver cell line and both glial cell lines (Figs. 4B–4D). This observation indicates that the APOE E4 allele coding region sequence behaves in an orientation-independent manner typical of a genuine enhancer.

Orientation-independent enhancer activity

To further investigate orientation independence, liver enhancers H6 (125 bp) and H8 (207 bp) were assayed for regulatory activity when inserted in either the forward or reverse orientations. Both exhibited orientation-independent enhancer activity (Fig. 5A).

Copy number does not explain regulatory effects

To exclude the possibility that differences in transcript levels were due to varying plasmid replication levels, we evaluated the copy number of the luciferase gene compared to the endogenous RNase P gene (Fig. 5B). There was no relation between transfection efficiency and regulatory activity (r=-0.387, df=11, p=0.192).

Fig. 5. Orientation independence and copy number. (A) Fragments H6 and H8 show orientation-independent activity in liver C3A cells. (B) Relative clone copy number in transfected liver C3A cells (RNase P/luciferase).
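The reported p value can be reproduced approximately from the correlation coefficient and its degrees of freedom. The sketch below converts r to a t statistic and integrates the Student t density numerically with Simpson's rule, using only the standard library; it assumes a two-sided test, which the text does not state explicitly, and the function names are illustrative.

```python
import math


def correlation_t(r, df):
    """t statistic for a Pearson correlation r with df = n - 2."""
    return r * math.sqrt(df / (1.0 - r * r))


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p value from the Student t density (Simpson's rule, even steps)."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return c * (1.0 + x * x / df) ** (-(df + 1) / 2)

    b = abs(t)
    h = b / steps
    total = pdf(0.0) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central = total * h / 3.0       # probability mass between 0 and |t|
    return 1.0 - 2.0 * central


t_stat = correlation_t(-0.387, 11)
p_value = t_two_sided_p(t_stat, 11)
print(f"t = {t_stat:.3f}, two-sided p = {p_value:.3f}")
```

With r=-0.387 and df=11 the t statistic is about -1.39, and the resulting p value is close to the reported 0.192.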

Discussion

Our relatively unbiased approach to the discovery of gene regulatory elements efficiently uncovered novel elements in intergenic, conserved, nonconserved, and coding regions. We used 4-base cutters to strike a balance between usefully narrowing transcription regulatory sequences and having too many fragments to assay. However, 4-base cutters have an average fragment size of only 256 bp and may not fully encompass some functional elements, abolishing or decreasing their activity. This may explain, for example, the nonrecovery of HCR1 that harbors an AluI site. Using 5-base cutters might recover larger functional elements.
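The 256-bp figure follows from the expected spacing of a recognition site of length n in random sequence, which is 4^n bp when the four bases are equally frequent and independent. A minimal sketch under that simplifying assumption (the function name is illustrative):

```python
def expected_fragment_bp(site_length):
    """Expected spacing between recognition sites of the given length.

    Assumes random sequence with equal, independent base frequencies.
    """
    return 4 ** site_length


for n in (4, 5, 6):
    print(f"{n}-base cutter: {expected_fragment_bp(n)} bp")
```

Under the same assumption a 5-base cutter would give fragments averaging 1,024 bp, four times larger than those from a 4-base cutter.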

Our analysis suggested that regulatory elements may display a continuum of transcriptional effects. Gaussian mixture modeling also suggested that enhancers were separated from nonregulatory sequences by at most only a small effect size, while silencers had somewhat stronger effects. A clearer view of this proposition would emerge with experimental methods to evaluate regulatory sequences with less noise and higher throughput. One potential source of variation in our study was due to transfection in the 96-well plate format. This was mitigated by the dual-channel design of the luciferase assay. In addition, the transfection efficiency was uniform, as judged using GFP, and the regulatory properties of the identified elements were not due to variations in copy number based on real-time PCR.

The weak separation of regulatory and nonregulatory sequences may help explain the limited success of bioinformatics in identifying such control elements. It may also have evolutionary implications, shedding light on the observation that both large [25] and small regulatory effects [26] can occur as a result of minor sequence changes. However, our work has been done entirely in tissue culture cells and the results can be quite different, even opposite, in vivo [14]. In addition, elements that are functional in the whole organism but not in tissue culture will be missed in our screen.

Because of the apparent continuum of transcriptional effect size, we employed a stringent protocol incorporating thresholding and additional rounds of multiple replication to identify clones with consistently strong regulatory activity. Putative regulatory elements may have failed at any of these steps, which is another reason that the list of elements may be incomplete.

We identified nine enhancers that passed the screen, six in liver cells and three in astroglial cells, and five silencers, four in liver cells and one in astroglial cells. Each of the seven known genes within the BAC seemed to be associated with at least one regulatory element, based on position.

Consistent with a continuum of transcriptional effects, we found only 2 out of the 14 putative regulatory elements with conserved noncoding sequences. These results suggest that current approaches of identifying enhancers based on conserved regions may be too constrained [8, 27–30]. Recent comprehensive biochemical studies of 1% of the human genome in the ENCODE project have shown that important functions of DNA such as transcription factor binding can reside in both conserved and nonconserved regions [9]. It is possible that there is an evolutionary advantage to having much of the genome consisting of sequences poised on the brink of being regulatory or nonregulatory. A small sequence change in the “stockpile” of nonregulatory elements could then dramatically change the expression or properties of a relevant target gene.

One enhancer of 396 bp had two conserved elements of 70 and 45 bp. The enhancer properties of this fragment were entirely due to the 45-bp region. We also analyzed a silencer containing part of a repetitive Alu element. Only the entire fragment in a forward orientation showed silencer activity. Our result echoes the previous discovery of an orientation-dependent silencer consisting of a partial Alu element in the CD8α gene [31]. A cruciform structure was implicated in this silencer effect. The transcriptional effects of repetitive sequence observed in our and other studies contribute to the growing body of evidence that “junk” DNA plays an important role in gene regulation [18].

Interestingly, we discovered a novel enhancer element entirely within the exon 4 coding region of the APOE gene. To our knowledge, this is the first report of an enhancer residing solely within a coding sequence. The only other description of an enhancer element in an exonic sequence was in the 3' untranslated region of the GADD45A gene [32].

The APOE enhancer encompassed the single nucleotide polymorphism (SNP) responsible for distinguishing between the E4 and E3 alleles of APOE. The polymorphism results in an arginine in the APOE4 allele where APOE3 has a cysteine and is the only difference between the alleles in the 108 bp of the identified enhancer fragment. Only the E4 allele, and not E3, showed enhancer activity.

The APOE4 allele is a known risk factor for late-onset Alzheimer's disease [21, 22]. The allele is associated with greater deposition of amyloid-β peptides in the brain, a hallmark of Alzheimer's disease [33], and also with hypercholesterolemia [24]. Furthermore, the E4 allele is more highly expressed than E3. The increased expression of E4 may reflect an intrinsic effect of the allele [24]. Our results corroborate this supposition. Thus, the adverse effects of the E4 allele may be related not only to alterations in protein chemistry but also to changes in transcriptional activity.

The frequency with which enhancers reside within coding sequences is unknown. However, if prevalent, evolutionary constraints on coding sequence may extend beyond the genetic code to also encompass transcriptional regulation. A recent precedent was established by the demonstration that DNA encodes histone positioning as well as protein sequence [34]. Bioinformatic approaches to finding transcriptional regulatory elements in coding regions will clearly be challenging, since the genetic code will dominate any statistical signal. Thus, deeper experimental searches for such regulatory sequences will be necessary. Our observation that a SNP causing an amino acid change has transcriptional consequences suggests that the many identified coding region SNPs (cSNPs) [35] in the human genome may have significance for their effects on not only protein function but also transcription.

Unbiased experimental search methods such as described here will be useful in identifying regulatory elements in other genomic regions. However, additional analyses will be necessary to identify the minimal sequences necessary and sufficient for regulatory function. Recombineering in BACs will be a relatively high-throughput approach for evaluating potential control sequences in a more natural context [36]. Our findings underscore the notion that sequence conservation is not a necessary property of a regulatory sequence. As much of the genome remains unannotated, it is important to approach the discovery of transcriptional regulators without a priori assumptions of sequence, conservation, or location.

Materials and methods

Library construction

A BAC (RP11-1147O10, Children's Hospital Oakland Research Institute) was selected from the RPCI-11 human male BAC library [37]. The BAC is located on chromosome 19 and extends 153.407 kb from 50031316 to 50184722 (NCBI Build 36.1). The BAC was independently digested with AluI or Sau3AI (New England Biolabs) and subcloned into the pGL3-promoter vector (Promega) digested with SmaI or BglII, respectively. Following ligation, TOP 10 chemically competent bacteria were transformed (Invitrogen), clones isolated, and plasmid DNA purified (Qiagen) for screening.
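The shotgun fragmentation step can be illustrated with a small in silico digest. This is a minimal sketch, not part of the original protocol: it assumes complete digestion of a linear sequence, uses the standard recognition sites AluI (AG^CT, blunt) and Sau3AI (^GATC), and runs on a hypothetical 20-nt toy sequence rather than on the BAC itself. The function name `digest` and the `ENZYMES` table are illustrative.

```python
import re

# Recognition site and cut offset within the site (top strand).
# AluI cuts AG^CT (blunt ends); Sau3AI cuts ^GATC (5' GATC overhang).
ENZYMES = {"AluI": ("AGCT", 2), "Sau3AI": ("GATC", 0)}


def digest(seq, enzyme):
    """Return fragments from complete digestion of a linear sequence."""
    site, offset = ENZYMES[enzyme]
    # The lookahead pattern also finds overlapping sites.
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    # Skip empty fragments (for example, a site at the very start).
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


# Hypothetical toy sequence, not taken from the BAC.
toy = "TTAGCTAAGATCCCAGCTGG"
alu_fragments = digest(toy, "AluI")
sau_fragments = digest(toy, "Sau3AI")

# Insert length implied by the coordinates in the text (inclusive).
bac_length = 50184722 - 50031316 + 1
```

The blunt AluI fragments are compatible with the blunt SmaI site, and the GATC overhangs left by Sau3AI are compatible with those left by BglII, which is why the two digests were paired with those vector sites.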

Control clones

The pGL3-promoter and pGL3-basic vector were from Promega. As a positive control in C3A cells, we used the previously identified human APOE liver-specific enhancer HCR1 [11] inserted into the pGL3-promoter vector. As a positive control in SVGP12 cells, we used the previously identified human APOE brain-specific enhancer ME2 [13] also inserted into the pGL3-promoter vector.

Transfection and reporter gene activity assays

We co-transfected 100 ng of experimental firefly luciferase plasmid with 10 ng of Renilla luciferase plasmid (phRL-TK, Promega) into C3A, SVGP12, and DBTRG-05MG human cells (ATCC) using Effectene transfection reagent (Qiagen) in 96-well plates. The Renilla luciferase plasmid acted as an internal control for transfection efficiency. Cells were at 80% confluence at the time of transfection. We grew C3A and SVGP12 cells in Eagle's Minimum Essential medium (ATCC 30-2003) and DBTRG-05MG cells in RPMI-1640 medium (ATCC 30-2001). Cells were lysed after 24 h. We assayed luciferase activity using the Dual-Luciferase Assay Kit (Promega).

Screens and sequencing

Relative luciferase activity was calculated as the log2 ratio of firefly to Renilla luciferase signal. These raw data were then normalized using a quantile normalization method (Illumina BeadArray Technology). Clones whose activities were two standard deviations away from the mean after normalization were considered outliers and subjected to further screening. Sequencing was performed at the UCLA Genotyping and Sequencing Core. PhastCons, RepeatMasker, and the UCSC Genome Browser (http://genome.ucsc.edu) were used to search for conserved regions and repetitive families.
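The screening arithmetic can be sketched as follows. This is an illustrative outline under stated assumptions, not the software actually used: it assumes that quantile normalization was applied across plates (columns of a matrix), ignores ties, and uses the sample standard deviation for the two-standard-deviation cutoff. All function names are hypothetical.

```python
import numpy as np


def relative_activity(firefly, renilla):
    """Log2 ratio of firefly to Renilla luciferase signal."""
    return np.log2(np.asarray(firefly, float) / np.asarray(renilla, float))


def quantile_normalize(matrix):
    """Give every column (assumed: one plate) the same empirical distribution.

    Each value is replaced by the mean, across columns, of the values that
    share its within-column rank. Ties are not treated specially.
    """
    matrix = np.asarray(matrix, float)
    ranks = np.argsort(np.argsort(matrix, axis=0), axis=0)
    reference = np.sort(matrix, axis=0).mean(axis=1)
    return reference[ranks]


def flag_outliers(values, n_sd=2.0):
    """Mark values more than n_sd sample standard deviations from the mean."""
    values = np.asarray(values, float)
    z = (values - values.mean()) / values.std(ddof=1)
    return np.abs(z) > n_sd
```

Clones flagged by `flag_outliers` would correspond to the candidates carried forward into the additional rounds of replication described above.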

Linear mixed effects modeling

We randomly chose 19 clones from our library and transfected each clone into C3A cells. The luciferase activity of each clone was assayed 10 times to estimate both within-clone and between-clone effects. The resulting data were then fit by a linear mixed effects model $y = \beta + b_i + \varepsilon$, where $y$ is the relative luciferase activity, $\beta$ is the fixed baseline effect, $b_i$ is the random between-clone effect, and $\varepsilon$ is within-clone noise. As a check for the fit of the model, we performed a likelihood ratio test comparing the likelihood of the model including a between-clone effect to the likelihood of a model with no clone effect ($y = \beta + \varepsilon$). To estimate the proportion of variance accounted for by between-clone effects, we calculated the ratio of the variance of the clone effects ($\sigma_{b_i}^2 = 0.334$) to the total variance ($\sigma_{b_i}^2 + \sigma_{\varepsilon}^2 = 0.823$). To check the distribution of the randomly chosen 19 clones, we fit each clone's set of activities to a linear regression model and found that the intercepts for each regression followed a distribution similar to the distribution of all clones tested in C3A cells.
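The variance decomposition can be made concrete with a short calculation. The sketch below is not the mixed-model fit used in the study: it swaps in the simpler method-of-moments (one-way ANOVA) estimator for a balanced design of clones by replicates, and then evaluates the between-clone fraction from the two variance values reported above. Function names are hypothetical.

```python
import numpy as np


def variance_components(data):
    """Method-of-moments estimates for a balanced one-way random-effects layout.

    data has shape (clones, replicates). Returns (between-clone variance,
    within-clone variance). This is a stand-in for a mixed-model fit.
    """
    data = np.asarray(data, float)
    k, n = data.shape
    clone_means = data.mean(axis=1)
    ms_between = n * np.sum((clone_means - data.mean()) ** 2) / (k - 1)
    ms_within = np.sum((data - clone_means[:, None]) ** 2) / (k * (n - 1))
    # Truncate at zero because the moment estimator can go negative.
    var_between = max((ms_between - ms_within) / n, 0.0)
    return float(var_between), float(ms_within)


def between_clone_fraction(var_between, var_total):
    """Share of the total variance attributable to between-clone effects."""
    return var_between / var_total


# Values reported in the text: 0.334 between-clone, 0.823 total.
reported_fraction = between_clone_fraction(0.334, 0.823)
```

With the reported values the between-clone share is about 0.41, which leaves roughly 60% of the variance as within-clone noise.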

Gaussian mixture modeling

The distribution of log2 raw relative activity measures for all fragments in liver C3A cells was subjected to bootstrap resampling 10,000 times. Each sample was modeled as a mixture of one to nine Gaussian components, and the optimal number of component distributions characterizing the full distribution evaluated. In 9,570 samples, the best model contained two components ($\mu_1 = -2.809 \pm 2.124$, $\mu_2 = 3.947 \pm 0.044$). In the other 430 samples, the best model contained three components ($\mu_1 = -3.937 \pm 1.751$, $\mu_2 = 3.828 \pm 0.231$, $\mu_3 = 4.132 \pm 0.414$).
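The bootstrap mixture analysis can be outlined in a few lines. This sketch makes several assumptions that are not stated in the text: components are fit by a plain expectation-maximization loop, the number of components is chosen by the Bayesian information criterion (the selection criterion actually used is not given), and the demonstration data are simulated rather than experimental. All names are hypothetical.

```python
import numpy as np


def fit_gmm_1d(x, k, n_iter=200):
    """Fit a k-component 1-D Gaussian mixture by EM; return the log-likelihood
    evaluated at the last E-step."""
    x = np.asarray(x, float)
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)  # spread starting means
    var = np.full(k, x.var())
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: log density of each point under each weighted component.
        log_p = (np.log(w) - 0.5 * np.log(2 * np.pi * var)
                 - 0.5 * (x[:, None] - mu) ** 2 / var)
        log_norm = np.logaddexp.reduce(log_p, axis=1)
        resp = np.exp(log_p - log_norm[:, None])
        # M-step: update weights, means, and variances.
        nk = resp.sum(axis=0) + 1e-12
        w = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    return float(log_norm.sum())


def best_k(x, max_k=3):
    """Number of components with the lowest BIC (assumed criterion)."""
    n = len(x)
    bic = [-2 * fit_gmm_1d(x, k) + (3 * k - 1) * np.log(n)
           for k in range(1, max_k + 1)]
    return int(np.argmin(bic)) + 1


def bootstrap_component_counts(x, n_boot=20, max_k=3, seed=0):
    """Tally the preferred component count over bootstrap resamples."""
    rng = np.random.default_rng(seed)
    counts = {}
    for _ in range(n_boot):
        sample = rng.choice(x, size=len(x), replace=True)
        k = best_k(sample, max_k)
        counts[k] = counts.get(k, 0) + 1
    return counts


# Simulated, clearly bimodal demonstration data (not experimental values).
rng = np.random.default_rng(1)
demo = np.concatenate([rng.normal(-3.0, 1.0, 200), rng.normal(4.0, 0.5, 400)])
demo_counts = bootstrap_component_counts(demo, n_boot=5, max_k=2)
```

The study used 10,000 resamples and up to nine components; the small defaults here only keep the example fast.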

To estimate our power to identify distinct enhancer and nonenhancer distributions, we performed mixture modeling on simulated data in which sample sizes and noise were varied. Two normal distributions were generated for each simulation, the first with a fixed mean of 3.82, estimated by the mean regulatory activity of the pGL3-promoter control, and the second with one of seven means between 4.74 and 6.64, estimated by the mean regulatory activities of the six discovered liver enhancers and the HCR1 enhancer control. The variances attributable to between-fragment effects were assumed equal and estimated by the variance of the means of the seven liver enhancers ($\sigma^2 = 0.520$).

For each pair of distributions generated, 1,000 samples were taken. Each bootstrap sample consisted of 1,731 values, 1,724 from the nonregulatory distribution and 7 from the enhancer distribution, reflecting experimental sample sizes. To explore improved sample size, the bootstrap sample was also increased to 10 times the experimental numbers. In addition, experimental noise composing 60% of the overall variance was either added or omitted. The optimal number of Gaussian mixture components was then calculated for each sample distribution.
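The simulated data sets described above can be generated along the following lines. This is a sketch under an explicit assumption: the overall variance is taken to be the between-fragment variance plus independent experimental noise, so that noise making up 60% of the total corresponds to a noise variance of 0.520 × 0.6 / 0.4 = 0.78. Function and constant names are hypothetical.

```python
import numpy as np

BASELINE_MEAN = 3.82    # pGL3-promoter control mean (from the text)
BETWEEN_VAR = 0.520     # between-fragment variance (from the text)
NOISE_FRACTION = 0.60   # noise share of the overall variance (from the text)


def noise_variance(between_var=BETWEEN_VAR, noise_fraction=NOISE_FRACTION):
    """Noise variance such that noise is noise_fraction of the total.

    Assumes total variance = between-fragment variance + noise variance.
    """
    return between_var * noise_fraction / (1.0 - noise_fraction)


def simulate(enhancer_mean, n_null=1724, n_enh=7, add_noise=True,
             scale=1, seed=0):
    """Draw one simulated data set of nonregulatory and enhancer activities."""
    rng = np.random.default_rng(seed)
    variance = BETWEEN_VAR + (noise_variance() if add_noise else 0.0)
    sd = np.sqrt(variance)
    null = rng.normal(BASELINE_MEAN, sd, n_null * scale)
    enhancers = rng.normal(enhancer_mean, sd, n_enh * scale)
    return np.concatenate([null, enhancers])


# Largest mean tested (6.64), at experimental and tenfold sample sizes.
empirical = simulate(6.64)
tenfold = simulate(6.64, scale=10, add_noise=False)
```

Each simulated data set would then be passed to a mixture-model fit to count how often one, two, or three components are preferred.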

For noiseless data with empirical sample sizes, single-component models were best (p = 0.831 to 1) with one exception, corresponding to the largest mean difference tested, 2.82, for which a single-component model was optimal for 49.2% of the samples. Noiseless distributions at tenfold larger sample sizes were modeled most often with a single component up to a mean difference of 2.03 (p = 0.81 to 1). Distributions with larger mean differences were generally modeled by three components (p = 0.46 to 0.97). Adding noise almost always resulted in models with a single component (p = 0.99 to 1), regardless of sample sizes.

Quantitative real-time PCR for copy number analysis

For each clone, 400 ng DNA was transfected into C3A cells. DNA was isolated using DNeasy tissue kits (Qiagen) after 24 h. Each sample set included human liver cell genomic DNA and individual pGL3-insert DNA as controls. Twenty ng of each sample of DNA was employed for the real-time PCR assay. We also measured the RNase P gene as an endogenous control in each sample to normalize for any variation in absolute quantities.
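Normalization to an endogenous control is commonly done with a threshold-cycle difference. The text does not state which calculation was used, so the sketch below is only an assumed illustration: it applies the standard comparative Ct formula with perfect doubling per cycle, and the function name and example Ct values are hypothetical.

```python
def relative_copy_number(ct_target, ct_reference, efficiency=2.0):
    """Target abundance relative to the reference (assumed comparative Ct).

    A lower Ct means more template, so the ratio is
    efficiency ** (ct_reference - ct_target).
    """
    return efficiency ** (ct_reference - ct_target)


# Hypothetical Ct values: plasmid insert at 20 cycles, RNase P at 22 cycles.
example_ratio = relative_copy_number(20.0, 22.0)
```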

Acknowledgments

We thank Xiao Chen of the UCLA Academic Technology Services Statistical Consulting group for her help.

References

[1] L.A. Pennacchio, E.M. Rubin, Genomic strategies to identify mammalian regulatory sequences, Nat. Rev. Genet. 2 (2001) 100–109.

[2] G.A. Maston, S.K. Evans, M.R. Green, Transcriptional Regulatory Elements in the Human Genome, Annu. Rev. Genom. Hum. Genet. 7 (2006) 29–59.

[3] E.M. Blackwood, J.T. Kadonaga, Going the distance: a current view of enhancer action, Science 281 (1998) 60–63.

[4] B.P. Berman, et al., Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome, Proc. Natl. Acad. Sci. USA. 99 (2002) 757–762.

[5] S. Ogbourne, T.M. Antalis, Transcriptional control and the role of silencers in transcriptional regulation in eukaryotes, Biochem. J. 331 (1998) 1–14.

[6] A. Sandelin, W. Alkema, P. Engstrom, W.W. Wasserman, B. Lenhard, JASPAR: an open-access database for eukaryotic transcription factor binding profiles, Nucleic. Acids. Res. 32 (2004) D91–D94.

[7] A.E. Kel, et al., MATCH: A tool for searching transcription factor binding sites in DNA sequences, Nucleic. Acids. Res. 31 (2003) 3576–3579.

[8] L.A. Pennacchio, et al., In vivo enhancer analysis of human conserved non-coding sequences, Nature 444 (2006) 499–502.

[9] E. Birney, et al., Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project, Nature 447 (2007) 799–816.

[10] S.J. Cooper, N.D. Trinklein, E.D. Anton, L. Nguyen, R.M. Myers, Comprehensive analysis of transcriptional promoter structure and function in 1% of the human genome, Genome Res. 16 (2006) 1–10.

[11] N.S. Shachter, Y. Zhu, A. Walsh, J.L. Breslow, J.D. Smith, Localization of a liver-specific enhancer in the apolipoprotein E/C-I/C-II gene locus, J. Lipid. Res. 34 (1993) 1699–1707.

[12] C.M. Allan, D. Walker, J.M. Taylor, Evolutionary duplication of a hepatic control region in the human apolipoprotein E gene locus. Identification of a second region that confers high level and liver-specific expression of the human apolipoprotein E gene in transgenic mice, J. Biol. Chem. 270 (1995) 26278–26281.

[13] S. Grehan, E. Tse, J.M. Taylor, Two distal downstream enhancers direct expression of the human apolipoprotein E gene to astrocytes in the brain, J. Neurosci. 21 (2001) 812–822.

[14] P. Zheng, L.A. Pennacchio, W. Le Goff, E.M. Rubin, J.D. Smith, Identification of a novel enhancer of brain expression near the apoE gene cluster by comparative genomics, Biochim. Biophys. Acta 1676 (2004) 41–50.

[15] S.F. Altschul, W. Gish, W. Miller, E.W. Myers, D.J. Lipman, Basic local alignment search tool, J. Mol. Biol. 215 (1990) 403–410.

[16] A.F.A. Smit, R. Hubley, P. Green, RepeatMasker Open-3.1.8, 1996.

[17] A. Siepel, et al., Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes, Genome Res. 15 (2005) 1034–1050.

[18] N.V. Tomilin, Control of genes by mammalian retroposons, Int. Rev. Cytol. 186 (1999) 1–48.

[19] A.T. Brini, G.M. Lee, J.P. Kinet, Involvement of Alu sequences in the cell-specific regulation of transcription of the gamma chain of Fc and T cell receptors, J. Biol. Chem. 268 (1993) 1355–1361.

[20] S.M. Hewitt, G.C. Fraizer, G.F. Saunders, Transcriptional silencer of the Wilms' tumor gene WT1 contains an Alu repeat, J. Biol. Chem. 270 (1995) 17908–17912.

[21] A.M. Saunders, et al., Association of apolipoprotein E allele epsilon 4 with late-onset familial and sporadic Alzheimer's disease, Neurology 43 (1993) 1467–1472.

[22] J. Poirier, et al., Apolipoprotein E polymorphism and Alzheimer's disease, Lancet 342 (1993) 697–699.

[23] J.M. Hagberg, K.R. Wilund, R.E. Ferrell, APO E gene and gene-environment effects on plasma lipoprotein-lipid levels, Physiol Genomics 4 (2000) 101–108.

[24] N.J. Bray, et al., Allelic expression of APOE in human brain: effects of epsilon status and promoter haplotypes, Hum. Mol. Genet. 13 (2004) 2885–2892.

[25] H.A. Itani, X. Liu, J.H. Pratt, C.D. Sigmund, Functional characterization of polymorphisms in the kidney enhancer of the human renin gene, Endocrinology 148 (2007) 1424–1430.

[26] M.Z. Ludwig, C. Bergman, N.H. Patel, M. Kreitman, Evidence for stabilizing selection in a eukaryotic enhancer element, Nature 403 (2000) 564–567.

[27] G. Bejerano, et al., Ultraconserved elements in the human genome, Science 304 (2004) 1321–1325.

[28] J.P. Balhoff, G.A. Wray, Evolutionary analysis of the well characterized endo16 promoter reveals substantial variation within functional sites, Proc. Natl. Acad. Sci. USA. 102 (2005) 8591–8596.

[29] M.Z. Ludwig, et al., Functional evolution of a cis-regulatory module, PLoS. Biol. 3 (2005) e93.

[30] S. Prabhakar, et al., Close sequence comparisons are sufficient to identify human cis-regulatory elements, Genome Res. 16 (2006) 855–863.

[31] J.H. Hanke, J.E. Hambor, P. Kavathas, Repetitive Alu elements form a cruciform structure that regulates the function of the human CD8 alpha T cell-specific enhancer, J. Mol. Biol. 246 (1995) 63–73.

[32] F. Jiang, P. Li, A.J. Fornace Jr., S.V. Nicosia, W. Bai, G2/M arrest by 1,25-dihydroxyvitamin D3 in ovarian cancer cells mediated through the induction of GADD45 via an exonic enhancer, J. Biol. Chem. 278 (2003) 48030–48040.

[33] K.R. Bales, et al., Apolipoprotein E is essential for amyloid deposition in the APP(V717F) transgenic mouse model of Alzheimer's disease, Proc. Natl. Acad. Sci. USA. 96 (1999) 15233–15238.

[34] E. Segal, et al., A genomic code for nucleosome positioning, Nature 442 (2006) 772–778.

[35] M. Cargill, et al., Characterization of single-nucleotide polymorphisms in coding regions of human genes, Nat. Genet. 22 (1999) 231–238.

[36] I. Poser, et al., BAC TransgeneOmics: a high-throughput method for exploration of protein function in mammals, Nat. Methods. 5 (2008) 409–415.

[37] K. Osoegawa, et al., A bacterial artificial chromosome library for sequencing the complete human genome, Genome Res. 11 (2001) 483–496.

[38] A.I. Su, et al., Large-scale analysis of the human and mouse transcriptomes, Proc. Natl. Acad. Sci. USA. 99 (2002) 4465–4470.