
ISSN 0026-8933, Molecular Biology, 2013, Vol. 47, No. 6, pp. 836–851. © Pleiades Publishing, Inc., 2013. Published in Russian in Molekulyarnaya Biologiya, 2013, Vol. 47, No. 6, pp. 959–975.

GENOMICS. TRANSCRIPTOMICS

UDC 577.214.6

Selection of Reference Genes for Quantitative Real-Time PCR in Six Oil-Tea Camellia Based on RNA-seq¹

C. F. Zhou*, P. Lin*, X. H. Yao, K. L. Wang, J. Chang, and X. J. Han
Research Institute of Subtropical Forestry of the Chinese Academy of Forestry, Fuyang, Hangzhou, China 311400;
e-mail: [email protected]; [email protected]
Received February 28, 2013; in final form, May 28, 2013

Abstract: qRT-PCR is becoming a routine tool in molecular biology to study gene expression. It is necessary to find stable reference genes when performing qRT-PCR. The expression of genes cloned in oil-tea camellia currently cannot be accurately analyzed due to a lack of suitable reference genes. We collected different tissues (including roots, stems, leaves, flowers and seeds) from six oil-tea camellia species to determine stable reference genes. Five novel and ten traditional reference gene sequences were selected from the RNA-seq database of Camellia oleifera Abel seeds, and specific PCR primers were designed for each. Cycle threshold (Ct) data were obtained from each reaction for all samples. Three different software tools, geNorm, NormFinder and BestKeeper, were applied to calculate the expression stability of the candidate reference genes according to the Ct values. The results were similar between the three software packages, and indicated that the traditional genes TUBα-3 and ACT7α and the novel gene CESA were relatively stable in all species and tissues. However, no genes were sufficiently stable across all species and tissues; thus the optimal number of reference genes required for accurate normalization varied from 2 to 6. Finally, the relative expression of the squalene synthase (SQS) and squalene epoxidase (SQE) genes, which are related to the important ingredients squalene and tea saponin in oil-tea camellia seeds, was compared using the most stable and less stable reference genes. The comparison results validated the selection of reference genes in the current study. In summary, different optimal numbers of suitable reference genes were found for the different tissues of six oil-tea camellia species.

DOI: 10.1134/S0026893313060198

Keywords: reference genes, real-time PCR, oil-tea camellia

¹ The article is published in the original.
* Zhou C.F. and Lin P. made an equal contribution to the study, and should be regarded as joint first authors.

INTRODUCTION

The analysis of gene expression has become increasingly important in biological research. Many different research methods are now used to study gene expression, such as northern blotting, serial analysis of gene expression (SAGE), semi-quantitative PCR (semi-PCR), gene microarrays, quantitative real-time PCR (qRT-PCR), RNase protection analysis (RPA), and RNA-seq [1–3]. qRT-PCR is becoming a routine tool in molecular biology to study gene expression, and is a PCR-based technique for analysis of mRNA expression in various biological samples. Compared with other gene expression analysis methods, it is highly sensitive and useful even for genes with low expression levels or those which show small expression changes [4]. At the same time, it is easy to perform and enables rapid quantification. However, normalization is required for accurate interpretation of results. Several strategies have been proposed for normalizing qRT-PCR data, including choosing samples of a similar size, quantifying RNA relative to genomic DNA, and using an internal housekeeping or reference gene.

Normalizing to a reference gene is a simple and popular method for internally controlling for error in qRT-PCR [5]. Ideal reference genes are expected to have a stable expression level across various experimental conditions, such as plant developmental stages, tissue types and external stimuli. The most commonly used reference genes include β-actin (ACTB), 18S ribosomal RNA, translation elongation factor 1 (TEF1) and glyceraldehyde-3-phosphate dehydrogenase (GAPDH) [6–12]. However, the level of transcript expressed from such genes is not always stable under all experimental conditions, which may lead to the misinterpretation of results.

Species of the genus Camellia whose seeds can produce oil are referred to as oil-tea camellia. These species are very important woody oil trees that have been cultivated for a long time and are widely distributed in southern China [13]. The oil extracted from oil-tea camellia seeds is named tea oil and is extensively used throughout China. Tea oil is regarded as one of the healthiest edible vegetable oils in the world, with nutritional composition superior to olive oil. The content of unsaturated fatty acids, such as oleic and linoleic acid, is above 90%. It is also very rich in other nutrients such as squalene, sterols, vitamins and tocopherol. Recently, improvement of high oil producing varieties has become one of the most important areas of camellia scientific research. Selection and hybridization are the most popular and long-established breeding methods, but progress requires considerable time and resources. Therefore, improving the production and quality of tea oil using molecular approaches should become especially necessary in the future. So far, only the evolutionary relationships of the Camellia species have been studied using molecular markers [14]; in addition, an EST library has been constructed for camellia seed [15], and transcriptome sequencing of different seed developmental stages has been completed [16]. Many functional genes have been cloned, such as the dehydrin-like protein [17], metallothionein gene [18], FAD2 gene [19], SAD gene [20], calmodulin genes [21], and an aldo-keto reductase [22]. However, analyses of gene expression in different species, tissues or growth stages could not be performed because no suitable reference genes for the genus Camellia had been identified. Therefore, the identification of reliable reference genes in Camellia has become a crucial factor to allow accurate functional gene expression analysis. In this study, we aimed at identifying potential reference genes suitable for transcript normalization in different tissues of six camellia species. These reference genes will allow more accurate and reliable qRT-PCR normalization for gene expression studies in the genus Camellia.

EXPERIMENTAL

Plant materials and biological samples. Six economically important and widely cultivated oil-tea camellia species were selected from the Camellia Germplasm Collection Park at the Subtropical Forestry Research Institute of the Chinese Academy of Forestry. These six species were Zhejiang (C. chekiangoleosa Hu), Guangning (C. semiserrata C.W. Chi), Xiaoguo (C. meiocarpa Hu), Wantian (C. polyodonta F.C. How), Youxian (C. yuhsienensis Hu) and Putong (C. oleifera C. Abel). Five different tissues, including roots, stems, leaves, flowers and seeds, were collected from six adult trees of each species. Ten samples of each tissue were picked from those 36 trees. Vegetative tissue samples, such as roots, leaves and stems, were taken from young tissues in March. Flowers were harvested at full bloom; the flowers of Putong and Xiaoguo were harvested in November, and the flowers of the other species were harvested in March the following year. Fruits at the fast growth stage were collected in July and peeled to extract the seeds immediately. All samples were harvested at about 10:00 in the morning and cut into small pieces. For each species, tissue samples from six trees were mixed together. Plant materials were frozen in liquid nitrogen immediately after being harvested and stored in a freezer at –80°C until the total RNA was isolated.

RNA extraction and cDNA synthesis. The five mixed tissue samples from each species were taken from the –80°C freezer and ground into fine powder in a mortar with liquid nitrogen. About 100 mg of the powder was used for RNA extraction. Total RNA from all samples was prepared using the RN38 EASYspin plus Plant RNA kit (Aidlab Biotech, Beijing, China) according to the manufacturer's instructions. A Nanodrop 2000 microvolume spectrophotometer was used to detect the RNA concentration and quality (Table 1). Only RNA samples with a 260/280 wavelength ratio of 1.8–2.1 and a 260/230 wavelength ratio between 0.1 and 1 were used for cDNA synthesis; several samples with 260/230 values greater than 1 were diluted. cDNA synthesis was performed with 3–8 μL total RNA (the final content of RNA in the reaction mixture was adjusted to about 1 μg for all samples) according to the protocols of the SuperScript™ First-Strand Synthesis System (Invitrogen, Carlsbad, CA, USA) in a total volume of 20 μL. Finally, the cDNA was diluted 1 : 15 with nuclease-free water for qRT-PCR [23].

Selection of candidate reference genes based on RNA-Seq and primer design. The transcriptomes of Putong oil-tea camellia seeds collected from July to October were sequenced (RNA-Seq) using Solexa technology. The total number of reads, total nucleotides, Q20 percentage, gap percentage, GC percentage, and numbers of contigs, scaffolds and unigenes in each month are listed in Table 2. There were 8310777 reads per month on average from July to October, and the average gap percentage was only 0.04%. Transcriptome assembly was carried out with the short-read assembly software SOAPdenovo [24]; the detailed assembly steps are presented in Fig. 1. The assembly yielded 43461, 47932, 35589 and 30022 unigenes from July to October, respectively (Table 2). After the repeated unigenes were removed, a total of 80310 unigenes were detected in this RNA-Seq of Putong oil-tea camellia seeds. Of these, 21789 unigenes received protein function annotations. The annotated unigenes can be classified into twenty-four functional categories and account for 27.13% of all unigenes. All unigenes were queried against the KEGG pathway database, and 42638 unigenes were given pathway annotations related to 265 pathways, including metabolism, genetic information processing, cell metabolism pathways and so on.

RPKM values (reads per kb per million reads) [25] of unigenes were used to analyze gene expression in the RNA-Seq database. Genes whose RPKM values fluctuate least among different growth periods or tissues are likely to be the most stably expressed. The coefficient of variation (CV) of the RPKM values from July to October was used to quantify expression fluctuation; it was calculated as the standard deviation divided by the mean. All unigenes corresponding to two dozen traditional reference genes and several reported novel reference genes were retrieved from the RNA-Seq database. Fifteen unigenes with the smallest variation in RPKM values among the four seed development stages were selected (Table 3). These included nine traditional reference genes and two reported novel reference genes. The nine traditional genes were Actin 7α (ACT7α), Cyclophilin B (CYPB), Tubulin α3 (TUBα-3), TATA box binding protein component of TFIID and TFIIIB (TBP), Cyclin-D4-2 (CYCD4-2), Translation elongation factor 1α (TEF1α), phosphoglycerate kinase (PGK), Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) and RNA polymerase II accessory factor (RNAPII). The two reported novel reference genes were clathrin coat associated protein AP-2 complex subunit (CAC) and Coatomer protein subunit ε1 (COPε1); both have also been reported as novel stable reference genes in Arabidopsis, tomato and Cupressaceae [26, 27]. Besides these reported reference genes, the four unigenes with the lowest coefficients of variation among all unigenes were chosen as candidate references in this paper. They were cellulose synthase A (CESA), Glyoxylate reductase (GARY), nuclear protein localization protein 4 (NPL4) and vacuolar-sorting receptor 3 (VSR3). Primers were designed with Primer3 (v0.4.0; http://frodo.wi.mit.edu/primer3/) according to the conserved sequences with the following parameters: optimal length 20–22 nucleotides, melting temperature 60–65°C, and product size range from 160 to 220 base pairs. We then used OligoAnalyser to make sure that no hairpins or dimers could be formed.
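The CV-based screening described above can be sketched in a few lines of Python. This is only an illustration, not the authors' actual procedure: the function names are hypothetical, the RPKM values are the monthly values as we read them from Table 3, and the CV here uses the sample standard deviation divided by the mean. Only the resulting order matters for choosing candidates.

```python
from statistics import mean, stdev

# RPKM values for July, August, September and October, as read from Table 3.
rpkm = {
    "NPL4": [65.71, 66.14, 67.88, 66.83],
    "GARY": [1.52, 1.47, 1.47, 1.53],
    "VSR3": [120.94, 127.52, 120.74, 123.00],
    "SQE": [208.80, 812.31, 464.05, 206.35],
}


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumed convention)."""
    return stdev(values) / mean(values)


def rank_by_cv(table):
    """Return gene names ordered from least to most variable."""
    return sorted(table, key=lambda gene: coefficient_of_variation(table[gene]))


if __name__ == "__main__":
    for gene in rank_by_cv(rpkm):
        print(f"{gene}: CV = {coefficient_of_variation(rpkm[gene]):.3f}")
```

Genes at the top of the returned list fluctuate least across the four months and are therefore the more plausible reference candidates.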

qRT-PCR conditions. qRT-PCR reactions were performed in fast optical 0.1 mL, 96-well plates using an ABI 7300 Real Time PCR System (Applied Biosystems, Foster City, CA, USA). Each qRT-PCR reaction had a volume of 20 μL, containing 10 μL 2× SYBR® Premix Ex Taq™ (Takara, Tokyo, Japan), 0.4 μL 50× ROX reference dye, 2 μL of 15-fold diluted synthesized cDNA, 0.4 μL 10 μM forward primer, 0.4 μL 10 μM reverse primer, and 6.8 μL sterile distilled water. PCR was then performed following the operation manual (30 s at 95°C; 40 cycles of 95°C for 5 s and 60°C for 34 s). After thermal cycling, melting curve analysis (60–95°C, fluorescence read once every 0.3°C) was performed to verify the specificity of the amplicons. A negative PCR control lacking the cDNA template was run for each primer pair. The threshold cycle (Ct) values were calculated from the mean of four technical replicates for each sample (every candidate reference gene primer pair).

Table 1. RNA concentration and quality detected by Nanodrop 2000

| Species | Tissue | Nucleic acid conc., ng/µL | A260 | A280 | 260/280 | 260/230 |
|---|---|---|---|---|---|---|
| Zhejiang | Roots | 208.60 | 5.21 | 2.91 | 1.79 | 0.73 |
| Zhejiang | Stems | 308.30 | 7.70 | 3.56 | 2.16 | 1.46 |
| Zhejiang | Leaves | 266.50 | 3.40 | 1.60 | 2.13 | 0.43 |
| Zhejiang | Flowers | 326.90 | 8.10 | 3.95 | 2.12 | 0.62 |
| Zhejiang | Seeds | 496.90 | 12.42 | 5.87 | 2.12 | 1.14 |
| Guangning | Roots | 107.70 | 2.64 | 1.34 | 1.45 | 0.47 |
| Guangning | Stems | 225.60 | 5.64 | 2.83 | 2.00 | 1.26 |
| Guangning | Leaves | 647.60 | 11.36 | 5.78 | 1.97 | 1.24 |
| Guangning | Flowers | 267.30 | 9.60 | 4.53 | 2.12 | 0.75 |
| Guangning | Seeds | 319.90 | 8.00 | 3.82 | 2.09 | 1.12 |
| Wantian | Roots | 170.60 | 4.25 | 2.20 | 1.93 | 1.09 |
| Wantian | Stems | 122.80 | 3.07 | 1.51 | 2.03 | 0.74 |
| Wantian | Leaves | 289.90 | 4.30 | 2.06 | 1.11 | 0.41 |
| Wantian | Flowers | 384.10 | 9.60 | 4.53 | 2.12 | 1.06 |
| Wantian | Seeds | 711.20 | 17.78 | 8.25 | 2.15 | 1.59 |
| Youxian | Roots | 136.80 | 3.42 | 1.77 | 1.94 | 1.08 |
| Youxian | Stems | 242.50 | 6.06 | 2.86 | 2.12 | 1.4 |
| Youxian | Leaves | 334.90 | 8.06 | 3.91 | 2.06 | 1.31 |
| Youxian | Flowers | 174.20 | 4.17 | 2.16 | 1.93 | 0.59 |
| Youxian | Seeds | 590.40 | 14.77 | 7.49 | 1.97 | 1.21 |
| Putong | Roots | 448.10 | 9.76 | 4.99 | 1.96 | 0.67 |
| Putong | Stems | 446.80 | 12.17 | 6.28 | 1.78 | 0.82 |
| Putong | Leaves | 254.30 | 5.73 | 2.84 | 2.02 | 1.02 |
| Putong | Flowers | 158.60 | 3.97 | 1.99 | 1.99 | 0.31 |
| Putong | Seeds | 493.00 | 12.33 | 5.82 | 2.12 | 0.81 |
| Xiaoguo | Roots | 378.90 | 8.69 | 4.18 | 2.08 | 1.01 |
| Xiaoguo | Stems | 410.20 | 11.63 | 5.77 | 2.02 | 1.21 |
| Xiaoguo | Leaves | 401.50 | 10.24 | 5.38 | 1.90 | 0.73 |
| Xiaoguo | Flowers | 132.60 | 3.57 | 1.84 | 1.94 | 0.68 |
| Xiaoguo | Seeds | 276.40 | 5.89 | 3.07 | 1.92 | 0.92 |

Analysis of gene stability. qRT-PCR was performed for each primer pair in a series of 10-fold dilutions ($10^{-1}$, $10^{-2}$, $10^{-3}$, $10^{-4}$, $10^{-5}$) of the mixed cDNA template to obtain Ct values. The corresponding PCR amplification efficiencies ($E$) were calculated according to the equation $E = (10^{-1/\text{slope}} - 1) \times 100$ [28]. The correlation coefficients ($R$) and slope values were calculated from the standard curve of Ct values for each gene using Excel.
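As an illustration of the standard-curve calculation, the following Python sketch fits Ct against the base-10 logarithm of the dilution and converts the slope into an efficiency using the equation given above. The Ct values and function names are hypothetical and are not taken from the study, which used Excel for this step.

```python
import math


def fit_standard_curve(log10_dilutions, ct_values):
    """Least-squares fit of Ct = slope * log10(dilution) + intercept.

    Returns (slope, intercept, r), where r is Pearson's correlation coefficient.
    """
    n = len(ct_values)
    mean_x = sum(log10_dilutions) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_dilutions)
    syy = sum((y - mean_y) ** 2 for y in ct_values)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_dilutions, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


def amplification_efficiency(slope):
    """E (%) = (10^(-1/slope) - 1) * 100."""
    return (10 ** (-1.0 / slope) - 1.0) * 100.0


# Hypothetical Ct values for the 10^-1 ... 10^-5 dilution series.
log10_dilutions = [-1, -2, -3, -4, -5]
ct_values = [18.4, 21.8, 25.2, 28.6, 32.0]

slope, intercept, r = fit_standard_curve(log10_dilutions, ct_values)
print(f"slope = {slope:.2f}, R = {r:.3f}, "
      f"E = {amplification_efficiency(slope):.1f}%")
```

A slope close to −3.32 corresponds to a doubling of product in every cycle, that is, an efficiency of about 100%; the hypothetical series above has a slope of −3.4 and therefore a slightly lower efficiency.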

To select suitable reference genes, the expression stability of every candidate reference gene was analyzed with three different Microsoft Excel-based software tools: geNorm, NormFinder, and BestKeeper. The raw Ct values were transformed into the required data input format for geNorm and NormFinder. The maximum expression level (the lowest Ct value) of each gene was set to a value of 1. Relative expression levels were then calculated from Ct values using the formula $2^{-\Delta Ct}$, in which $\Delta Ct$ is obtained by subtracting the lowest Ct value from the Ct value of each sample for every gene. According to the analysis by geNorm and NormFinder, the ten most suitable genes were chosen for further analysis with BestKeeper, based on untransformed Ct values and amplification efficiencies.

Table 2. Description of Camellia oleifera C. Abel RNA-seq analysis

| Sample | Total reads | Total nucleotides, nt | Q20 percentage, % | Gap percentage, % | GC percentage, % | Contig | Scaffold | Unigenes |
|---|---|---|---|---|---|---|---|---|
| July | 8320068 | 1248010200 | 97.02 | 0.03 | 47.55 | 43216 | 43410 | 43461 |
| August | 8397376 | 1259606400 | 97.11 | 0.02 | 46.91 | 47568 | 47798 | 47932 |
| September | 8385809 | 1257871350 | 94.52 | 0.06 | 49.03 | 33827 | 35543 | 35589 |
| October | 8139855 | 1220978250 | 91.20 | 0.04 | 48.76 | 28817 | 29954 | 30022 |
| Average | 8310777 | 1246616550 | 94.96 | 0.04 | 48.06 | 38357 | 39176 | 39251 |

[Flowchart: for each sample, reads are assembled into contigs, reads are mapped back to the contigs, contigs are assembled into scaffolds, and gaps are filled to give unigenes; the unigenes from all samples then undergo long-sequence clustering.]

Fig. 1. Detailed assembling steps of Putong oil-camellia seeds RNA-seq from July to October.

Table 3. Expression and coefficient of variation of each gene from July to October based on the RNA-Seq database

| Unigene ID | Gene name | July reads | July RPKM | August reads | August RPKM | September reads | September RPKM | October reads | October RPKM | Average of RPKM | Standard deviation of RPKM | CV | Rank of CV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Unigene17158_All | ACT7α | 258 | 21.17 | 269 | 22.71 | 223 | 23.17 | 174 | 17.37 | 84.42 | 2.63 | 0.031 | 8 |
| Unigene20799_All | CYPB | 119 | 13.44 | 107 | 12.43 | 106 | 15.16 | 111 | 15.26 | 56.29 | 1.37 | 0.024 | 5 |
| Unigene10169_All | TUBα-3 | 273 | 116.47 | 283 | 124.17 | 183 | 98.83 | 172 | 89.27 | 428.73 | 15.97 | 0.037 | 9 |
| Unigene13817_All | TBP | 369 | 28.76 | 534 | 42.81 | 305 | 30.10 | 328 | 31.10 | 132.78 | 6.48 | 0.049 | 12 |
| Unigene57574_All | CYCD4-2 | 59 | 4.70 | 82 | 6.72 | 45 | 4.54 | 43 | 4.17 | 20.14 | 1.15 | 0.057 | 14 |
| Unigene36161_All | TEF1-α | 978 | 244.56 | 976 | 251.00 | 774 | 245.02 | 990 | 301.17 | 1041.74 | 27.31 | 0.026 | 6 |
| Unigene9686_All | PGK | 449 | 256.22 | 397 | 232.99 | 246 | 177.71 | 403 | 279.77 | 946.69 | 43.70 | 0.046 | 10 |
| Unigene11231_All | GAPDH | 327 | 84.13 | 214 | 56.63 | 285 | 92.83 | 274 | 85.76 | 319.35 | 15.93 | 0.050 | 13 |
| Unigene6103_All | RNAPII | 784 | 64.62 | 598 | 50.69 | 523 | 54.57 | 399 | 40.01 | 209.90 | 10.17 | 0.048 | 11 |
| Unigene5695_All | CESA | 1604 | 100.84 | 2100 | 135.78 | 993 | 79.03 | 1247 | 95.37 | 411.02 | 23.89 | 0.058 | 15 |
| Unigene24263_All | GARY | 17 | 1.52 | 16 | 1.47 | 13 | 1.47 | 14 | 1.53 | 6.00 | 0.03 | 0.005 | 2 |
| Unigene14532_All | NPL4 | 750 | 65.71 | 734 | 66.14 | 612 | 67.88 | 627 | 66.83 | 266.57 | 0.95 | 0.004 | 1 |
| Unigene5330_All | VSR3 | 317 | 120.94 | 325 | 127.52 | 250 | 120.74 | 265 | 123.00 | 492.20 | 3.15 | 0.006 | 3 |
| Unigene12278_All | CAC | 55 | 4.99 | 42 | 3.92 | 35 | 4.02 | 42 | 4.64 | 17.57 | 0.51 | 0.029 | 7 |
| Unigene5501_All | COPε1 | 1148 | 154.22 | 937 | 129.45 | 843 | 143.36 | 777 | 126.98 | 554.02 | 12.72 | 0.023 | 4 |
| Unigene63559_All | SQS | 373 | 83.43 | 629 | 144.69 | 192 | 54.37 | 109 | 29.66 | 312.14 | 49.57 | 0.159 | 16 |
| Unigene4773_All | SQE | 405 | 208.80 | 1532 | 812.31 | 329 | 464.05 | 711 | 206.35 | 1691.52 | 286.39 | 0.169 | 17 |
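The Ct transformation used as input for geNorm and NormFinder can be illustrated with a short Python sketch. The Ct values and the function name are hypothetical; like the formula in the text, the sketch assumes a doubling of product in every cycle (base 2).

```python
def relative_quantities(ct_values):
    """Convert the raw Ct values of one gene into relative quantities.

    The sample with the highest expression (lowest Ct) is set to 1 and
    every other sample is expressed relative to it as 2^-(Ct - lowest Ct).
    """
    lowest_ct = min(ct_values)
    return [2.0 ** -(ct - lowest_ct) for ct in ct_values]


# Hypothetical mean Ct values of one gene in four samples.
example_ct = [20.0, 21.0, 23.0, 20.5]
print(relative_quantities(example_ct))
```

A sample whose Ct is one cycle above the minimum therefore receives a relative quantity of 0.5, and one three cycles above receives 0.125.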

Reference gene validation. Squalene synthase (SQS) [29, 30] and squalene epoxidase (SQE) [31] are enzymes that catalyze the synthesis of squalene, sterols and terpenoids, which are extremely important ingredients in oil-tea camellia seeds and play a crucial role in biological growth and metabolism. According to the RNA-Seq database, the expression of SQS and SQE changed considerably during the different seed developmental stages (Table 3). Therefore, SQS and SQE were used as target genes, and their expression values in the species and tissues of interest were compared to test the validity of the reference genes ranked by the three software tools in the current study.

RESULTS

Characteristics of Candidate Reference Genes

Fifteen unigenes were chosen as candidate reference genes because of the low fluctuation of their RPKM values across the different seed development stages. The raw reads and RPKM values of each month, together with the average, standard deviation and coefficient of variation of the RPKM values, are listed in Table 3. Nine of them were traditional reference genes; the others were not commonly used. The candidate reference genes differed in their RPKM values. The sequences of the unigenes were then aligned using BLASTn at NCBI. This alignment revealed that the sequences were highly similar to each particular gene, and the conserved regions were used for primer design. The gene names, descriptions, primer sequences, amplicon lengths, amplification efficiencies, Tm values and correlation coefficients are listed in Table 4. The Tm values varied from 79.9°C (GARY) to 85.9°C (CYPB), and the amplicon lengths were about 200 bp. The amplification efficiencies were between 89.4% (COPε1) and 101.8% (PGK), and the correlation coefficients were all larger than 0.99. Agarose gel electrophoresis (2%) showed unique amplicons of expected length without primer dimers. The melting curve analysis showed a single peak for each gene, and no non-specific products were detected (Fig. 2).

Comparison of Candidate Reference Gene Expression Stability

Real-time qRT-PCR assays of the fifteen candidate reference genes were performed using the newly designed primer pairs, and cycle threshold (Ct) data were obtained for all samples. The Ct values revealed differences in transcript levels between the various candidate genes (Fig. 3). The Ct values of the fifteen genes ranged from 19 to 23 cycles. Two genes (PGK and TEF1-α) showed the lowest Ct values (mean Ct = 18.3 and 17.8), indicating that they had the most abundant transcript levels. The gene GARY had the lowest transcript abundance (mean Ct = 22.9). Three software tools, geNorm, NormFinder and BestKeeper, were used to calculate the expression stability of the candidate reference genes.

GeNorm Analysis

GeNorm is a Visual Basic application tool for Microsoft Excel that operates on the assumption that the expression ratio of two ideal reference genes is constant throughout the different groups of templates. Gene expression stability values (M) are calculated for all genes, and genes with an M value below the threshold of 1.5 are considered stable [32]. In this analysis, most genes in the different species and tissues had M values less than 1.5, which meant that these genes were stable according to geNorm. When the results of all 30 samples were combined, CESA and TUBα-3 had the highest expression stability (the lowest M value), GAPDH was the least stable, and the other twelve genes varied between the two extremes (Fig. 4). The results changed when the samples were classified into different species and tissues (Fig. 4). CAC and PGK had the highest expression stability in Zhejiang, TUBα-3 and RNAPII were the most stable in Guangning, CESA and TUBα-3 were the most stable in both Xiaoguo and Putong, CESA and ACT7α had the highest expression stability in Wantian, and CAC and COPε1 were the most stable in Youxian. GAPDH was the least stable gene in Zhejiang, Guangning and Wantian, while the least stable genes in Xiaoguo, Youxian and Putong were GARY, PGK, and COPε1, respectively. Furthermore, the reference gene expression stability also varied among the five different tissue samples. PGK and GAPDH had the lowest M value (the highest expression stability) in both roots and stems, CAC and COPε1 had the highest stability in leaves, CESA and TUBα-3 were the most stable in flowers, and PGK and TBP were the most stable in seeds. GAPDH was the least stable gene in both leaves and flowers, while CYPB, TEF1-α and GARY were the least stable genes in roots, stems and seeds, respectively.
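The stability measure M can be sketched in Python as follows, based on our understanding of the published geNorm method: for each gene, M is the average, over all other candidate genes, of the standard deviation of the log2-transformed expression ratios across samples. The function name and the data are hypothetical, and the stepwise exclusion of the least stable gene that geNorm performs when ranking is omitted here.

```python
import math
from statistics import mean, stdev


def genorm_m_values(quantities):
    """Compute a geNorm-style stability value M for every gene.

    quantities maps a gene name to its relative quantities, with the
    samples in the same order for every gene. A lower M means that the
    gene's expression ratio to the other genes is more constant.
    """
    genes = list(quantities)
    m_values = {}
    for gene_j in genes:
        pairwise_sd = []
        for gene_k in genes:
            if gene_k == gene_j:
                continue
            # Log2 expression ratio of the two genes in every sample.
            log_ratios = [math.log2(a / b)
                          for a, b in zip(quantities[gene_j], quantities[gene_k])]
            pairwise_sd.append(stdev(log_ratios))
        m_values[gene_j] = mean(pairwise_sd)
    return m_values


# Hypothetical relative quantities of three genes in three samples.
example = {
    "geneA": [1.0, 2.0, 4.0],
    "geneB": [2.0, 4.0, 8.0],
    "geneC": [1.0, 1.0, 1.0],
}
print(genorm_m_values(example))
```

In this toy data set geneA and geneB keep a constant ratio to each other, so they share the lowest M, while geneC varies relative to both and obtains the highest M.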

Evaluation of the optimal number of referencegenes required accurate normalization. The pairwisevariation (Vn/Vn + 1) between consecutively rankednormalization factors was calculated using geNorm.NFn and NFn + 1 were used to determine the number ofgenes required for reliable normalization. It has beensuggested that if the pairwise variation is below thethreshold value of 0.15, then there is no need for anadditional internal control gene [32]. In the currentresearch, six genes were needed when all 30 sampleswere analyzed together. However, different species

Page 7: Molecular Biology Volume 47 Issue 6 2013 [Doi 10.1134%2FS0026893313060198] Zhou, C. F.; Lin, P.; Yao, X. H.; Wang, K. L.; Chang, J.; Han, X -- Selection of Reference Genes for Quantitative

842

MOLECULAR BIOLOGY Vol. 47 No. 6 2013

ZHOU et al.

[Fig. 2: agarose gel (marker lane M; 200 bp and 500 bp bands indicated) and melting curves (derivative versus T, °C, from 60 to 95 °C) for ACT7α, CYPB, TUBα-3, TBP, CYCD4-2, TEF1-α, PGK, GAPDH, RNAPII, CESA, COPε1, NPL4, VSR3, GARY, CAC, SQS and SQE.]

Fig. 2. Specificity and melting curves of candidate genes. 2% agarose gel electrophoresis indicated unique amplicons of the expected length and no primer dimers. The melting curve analysis showed a single peak for each gene, and no non-specific products were detected.


[Fig. 3: box plot of Ct values (y-axis, 12 to 30) for candidate reference genes 1 to 15 (x-axis).]

Fig. 3. Ct values of each reference gene. Expression data are displayed as Ct values for each reference gene in all samples, numbered 1 to 15: 1, ACT7α; 2, CYPB; 3, TUBα-3; 4, TBP; 5, CESA; 6, CYCD4-2; 7, TEF1-α; 8, PGK; 9, GAPDH; 10, RNAPII; 11, GARY; 12, NPL4; 13, VSR3; 14, CAC; and 15, COPε1. The line across each box is the median. The box indicates the 25th and 75th percentiles, whisker caps represent the maximum and minimum values, and dots represent outliers.

needed different numbers of genes: Zhejiang, Youxian, Xiaoguo and Putong needed two genes, Wantian needed three genes, and Guangning needed four genes. The optimal number of reference genes also varied among the tissues. Two genes were needed in leaves and flowers, three in stems and seeds, and five in roots (Fig. 5).
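The pairwise variation criterion described above can be sketched as follows. This is an illustration under stated assumptions, not the geNorm implementation: the normalization factor NFn is taken as the geometric mean of the relative quantities of the n most stable genes, Vn/Vn+1 as the sample standard deviation of log2(NFn/NFn+1) across samples, and the input columns are assumed to be already ordered from most to least stable. The function names are hypothetical.

```python
import numpy as np


def pairwise_variation(rel_q_ranked):
    """Compute V2/3, V3/4, ... from relative quantities.

    rel_q_ranked: array (n_samples, n_genes) of relative quantities,
    with columns ordered from the most stable gene to the least stable.
    """
    log_q = np.log2(np.asarray(rel_q_ranked, dtype=float))
    n_genes = log_q.shape[1]
    v = {}
    for n in range(2, n_genes):
        # mean of log2 values = log2 of the geometric mean (the NF)
        log_nf_n = log_q[:, :n].mean(axis=1)
        log_nf_n1 = log_q[:, :n + 1].mean(axis=1)
        v[f"V{n}/{n + 1}"] = float(np.std(log_nf_n - log_nf_n1, ddof=1))
    return v


def optimal_gene_number(v, threshold=0.15):
    """Smallest n whose Vn/Vn+1 is below the threshold, or None if no n qualifies."""
    for n, value in enumerate(v.values(), start=2):
        if value < threshold:
            return n
    return None
```

With this reading, a result such as "two genes were needed in leaves" corresponds to V2/3 already falling below 0.15, whereas "six genes for all 30 samples" means that V6/7 was the first value below the threshold.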

NormFinder Analysis

NormFinder is another Excel application that uses a model-based approach to identify the most stable reference genes by combining samples into groups. More stably expressed genes should show lower average expression stability values (M values). The stability value of each gene was calculated by NormFinder (Table 5), and the results indicated that TUBα-3 and CESA were the most appropriate for use as reference genes over the 30 samples. The least stable genes were GAPDH and GARY. The most stable and least stable reference genes differed among species and tissues. PGK was the most stable in Zhejiang, GARY in Guangning, TEF1-α in Wantian, RNAPII in Youxian, TUBα-3 in Putong, and CESA in Xiaoguo. The least stable gene in Guangning, Zhejiang and Wantian was GAPDH, while in Youxian, Putong and Xiaoguo it was PGK, TBP and GARY, respectively. For the tissues, TUBα-3 was the most stable in roots, PGK in stems, COPε1 in leaves, CAC in flowers, and RNAPII in seeds. The least stable gene in leaves and flowers was GAPDH, while in roots, stems and seeds it was COPε1, TBP and GARY, respectively.

BestKeeper Analysis

BestKeeper, also an Excel-based tool, estimates the inter-gene relationships between possible reference gene pairs by performing numerous pairwise correlation analyses using the raw Ct values of each gene. Most importantly, all genes may be included in the calculation of the BestKeeper index, which is the geometric mean of the Ct values of all candidate reference genes and can be used to rank the best reference genes, because stable reference genes show a strong correlation with the BestKeeper index. BestKeeper also calculates the coefficient of variance (CV) and the standard deviation (SD) of the Ct values using the whole data set, which includes all Ct values. The most stable reference genes are identified as those that exhibit the lowest coefficient of variance and standard deviation (CV ± SD). Genes with SD values greater than 1 are considered to be unacceptable [33, 34].
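The quantities named above can be sketched in a few lines of Python. This is an approximation of the idea rather than the BestKeeper spreadsheet: it assumes the ordinary sample standard deviation (the Excel tool's exact formula may differ), CV expressed as a percentage of the mean Ct, and Pearson correlation against an index built from all supplied genes. The function name `bestkeeper_stats` and its parameters are hypothetical.

```python
import numpy as np


def bestkeeper_stats(ct, names, sd_cutoff=1.0):
    """Per-gene SD, CV (%) and correlation with a BestKeeper-style index.

    ct: array (n_samples, n_genes) of raw Ct values.
    names: gene names, one per column of ct.
    sd_cutoff: genes with SD above this value are flagged as unacceptable.
    """
    ct = np.asarray(ct, dtype=float)
    # index = geometric mean of the Ct values of all candidates, per sample
    index = np.exp(np.log(ct).mean(axis=1))
    stats = {}
    for j, name in enumerate(names):
        col = ct[:, j]
        sd = float(np.std(col, ddof=1))
        cv = 100.0 * sd / float(col.mean())
        r = float(np.corrcoef(col, index)[0, 1])  # Pearson r with the index
        stats[name] = {"sd": sd, "cv": cv, "r": r, "acceptable": sd <= sd_cutoff}
    return stats
```

Ranking candidates by ascending SD (and CV) and discarding those with `acceptable` set to `False` mirrors the selection rule stated in the text.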

According to the geNorm and NormFinder analyses, the five least stable genes (TEF1-α, VSR3, CYCD4-2, GARY and GAPDH) were excluded. The remaining ten candidate genes were ranked according to the CV ± SD in each species and tissue (Table 6). The analysis revealed that all ten genes were acceptable in the Xiaoguo and Putong species and in seed tissue. Several genes (NPL4, PGK, COPε1, CAC, RNAPII) were not


[Fig. 4, panels (a) to (f): average expression stability M (y-axis) of the 15 candidate reference genes, ordered from the least stable genes (left) to the most stable genes (right).]

Fig. 4. Average expression stability values (M) calculated by geNorm for each species and tissue (a lower M value indicates more stable expression). (a) Zhejiang oil-tea camellia; (b) Guangning oil-tea camellia; (c) Wantian oil-tea camellia; (d) Youxian oil-tea camellia; (e) Xiaoguo oil-tea camellia; (f) Putong oil-tea camellia; (g) roots; (h) stems; (i) leaves; (j) flowers; (k) seeds; (l) all samples.

acceptable in some species and tissues as their SD values were greater than 1. ACT7α, TUBα-3 and CESA were the three most stable genes when all samples were combined. These three genes were also stable among different species and tissues, and only varied in their rank positions. NPL4, PGK, COPε1 and CAC exhibited high SD in most species and tissues, indicating that these were the least stable reference genes; when all tissues and species under analysis were combined, these four genes were not acceptable as reference genes. The results of the BestKeeper analysis showed only a small difference from the results obtained by geNorm and NormFinder.

Reference Gene Validation

The expression values of SQS and SQE were compared among the 5 different tissues of Putong camellia and among seeds of the 6 different oil-tea camellia species using the most stable and least stable candidate reference genes as internal controls. CESA, TUBα-3 and PGK were the most stable reference genes and GARY was the least stable one for seeds of the different oil-tea camellia species; TUBα-3 and CESA were the most stable and COPε1 was the least stable for the different tissues of Putong oil-tea camellia. The results showed that relative expression trends were mainly the same between the stable reference genes, but there were

[Fig. 4, continued, panels (g) to (l): average expression stability M (y-axis) of the 15 candidate reference genes, ordered from the least stable genes (left) to the most stable genes (right).]


Fig. 5. Optimal number of reference genes in each species and tissue. The pairwise variation (Vn/Vn+1) was calculated by geNorm, and NFn and NFn+1 were used to determine the number of genes required for reliable normalization. It has been suggested that if the pairwise variation is below the threshold value of 0.15, then there is no need for an additional internal control gene. The results show that different numbers of reference genes are needed in different species and tissues.

considerable differences between the results obtained with stable reference genes and those obtained with unstable ones. For example, the expression levels of SQE in different tissues of Putong camellia were almost the same with TUBα-3 and CESA, and with both references the relative expression level was highest in seeds, followed by flowers, roots, stems and leaves. However, the results changed when COPε1 was used as the internal reference: the relative expression in flowers was lower than in roots. A notable discrepancy in the SQS results was also seen between stable and unstable reference genes (Fig. 6). The relative expression of SQS and SQE in seeds of the six species showed an even greater difference between stable and unstable reference genes. For example, the relative expression of SQS showed little difference among the six species with the stable references, but showed a great difference with the least stable reference gene: about 150 in Xiaoguo oil-tea camellia, but greater than 7000 in Putong oil-tea camellia. Thus, the application of different reference genes resulted in different relative expression levels. Unsuitable references could cause great deviation from the actual target gene expression levels. It is therefore important to validate the reference genes before experimental application.
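The effect described above, where the same target Ct values yield different relative expression depending on the reference, can be reproduced with a small Python sketch. It assumes the widely used 2^(−ΔΔCt) calculation with 100% amplification efficiency and, for several reference genes, averages their Ct values (equivalent to taking the geometric mean of their quantities). The section does not spell out the exact formula used for Fig. 6, so this is an assumption, and the function name `relative_expression` and the toy Ct values are hypothetical.

```python
import numpy as np


def relative_expression(ct_target, ct_refs, calibrator=0):
    """Fold change of a target gene relative to a calibrator sample.

    ct_target: Ct values of the target gene, one per sample.
    ct_refs: Ct values of one or more reference genes, shape
             (n_samples,) or (n_samples, n_refs).
    calibrator: index of the sample whose expression is set to 1.
    """
    ct_target = np.asarray(ct_target, dtype=float)
    ct_refs = np.asarray(ct_refs, dtype=float)
    if ct_refs.ndim == 1:
        ct_refs = ct_refs[:, None]
    ref = ct_refs.mean(axis=1)           # mean Ct of the reference gene(s)
    d_ct = ct_target - ref               # delta Ct per sample
    dd_ct = d_ct - d_ct[calibrator]      # delta-delta Ct versus the calibrator
    return 2.0 ** (-dd_ct)


# Toy example: the same target normalized to a stable and an unstable reference
target = [25, 24, 23]
stable_ref = [20, 20, 20]
unstable_ref = [20, 22, 18]
print(relative_expression(target, stable_ref))
print(relative_expression(target, unstable_ref))
```

In the toy data the stable reference reports a steady increase of the target across the three samples, while the unstable reference reports a single spike, the same kind of discrepancy the validation experiment was designed to reveal.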

DISCUSSION

Recently, the quantification of RNA transcripts has become increasingly rapid and precise because of advances in gene quantification strategies. To remove sampling differences and identify real gene-specific variation, especially when studying samples with subtle gene expression differences, normalization becomes necessary for accurate gene expression quantification by qRT-PCR. Normalizing to a reference gene is a simple and popular method for this. However, the expression of many traditional housekeeping genes changes considerably under different conditions or in different tissues, which biases the analysis of target genes [35]. For example, 18S rRNA, ACTB and RNAPII were shown to be the most stable genes among six leaf samples of different citrus genotypes, but when further analyzed in five other tissues the results indicated that they were not completely stable [36]. GAPDH and ACT7α were shown to have unacceptable variability in peach [37], and two of the most commonly used reference genes, TUBα-3 and ACT7α, are unsuitable for different tissues of Chinese cabbage [11].

Oil-tea camellias are a very important woody oil crop of the genus Camellia. So far, studies on gene expression in Camellia have been carried out using single reference genes. For example, the expression of the B function CjDEF-1 gene in Camellia japonica Hongshibaxueshi [38] and of the chalcone isomerase gene in Camellia nitidissima [39] was analyzed across different parts of the flowers. Both studies used 18S rRNA as the reference gene, but neither validated the results with any preliminary expression stability analysis. Furthermore, the expression analysis of two calmodulin genes of Camellia oleifera did not use any reference genes [21]. To our knowledge, this is the first time suitable reference genes have been assessed for qRT-PCR in six oil-tea camellia species and their different tissues. Therefore, gene expression normalization by qRT-PCR in camellia can now be put into practice based on this selection of reference genes. Pairwise analysis using geNorm reveals that the optimal number of reference genes varies from two to six depending on the particular sample set. Our results suggest that there is no particular gene that is expressed across all species and tissues in a consistently stable way, and also show that the more complex the samples are, the more reference genes are needed. Thus, it is necessary to select reference genes according to the samples being researched.

[Fig. 5 plot: pairwise variation (y-axis, 0 to 0.25) for V2/3 through V14/15 (x-axis); series: Zhejiang, Guangning, Wantian, Youxian, Xiaoguo, Putong, roots, stems, leaves, flowers, seeds and all samples.]

[Fig. 6 plots: relative expression (y-axis). Panels (a) and (b): species on the x-axis (Zhejiang, Guangning, Wantian, Youxian, Xiaoguo, Putong), normalized to CESA, PGK, TUBα-3, CESA + PGK + TUBα-3, and GARY. Panels (c) and (d): tissues on the x-axis (roots, stems, leaves, flowers, seeds), normalized to CESA, TUBα-3, CESA + TUBα-3, and COPε1.]

Fig. 6. Relative expression levels of SQE and SQS in different tissues/organs of Putong oil-tea camellia and in seeds of different species. Expression levels of both SQS (a and c) and SQE (b and d) were normalized to stable and unstable reference genes, respectively. Error bars show the standard error of the mean calculated from two biological replicates.

18S rRNA has been used as a reference gene in many gene expression studies of the genus Camellia, but it has been reported that 18S rRNA is an unsuitable reference gene since it is synthesized by RNA polymerase I, whereas mRNA transcription is carried out by RNA polymerase II. Moreover, the reverse transcription reaction is oligo(dT)-primed, but rRNA contains no poly(A) tail, so oligo(dT)-primed cDNA cannot be synthesized from rRNA [40]. It was also reported that target gene expression was down-regulated when 18S rRNA was used as a reference gene [41]. Based on these results, 18S rRNA was not chosen for the current study.

The expression results of the fifteen candidate genes were analyzed for each sample with three different software tools: geNorm, NormFinder and BestKeeper. These three software tools use different algorithms and can give different results. However, the results showed only a small difference among them, and all of the results indicated that TUBα-3 and CESA were the most stable reference genes, though they were not the most stable among all of the fifteen candidate genes based on comparison of the RPKM values in the


Table 4. Description of candidate genes from Camellia for qRT-PCR

Gene symbol | Gene name | Primer sequence (5' → 3') (forward/reverse) | Tm, °C | Amplicon length, bp | Amplification efficiency, % | R2
ACT7α | Actin7 alpha | CAACTTCGCTGGTGTCTTCA / ACCCTCTACGCAGAAGCAAA | 82.4 | 212 | 101.1 | 0.9969
CYPB | Cyclophilin B | ACAGGGAGCTCACCACATTC / TCTTAGCATGGCAAATGCAG | 85.9 | 210 | 93.9 | 0.9989
TUBα-3 | Tubulin alpha-3 | CCATGCCTTGGATCACATTT / TGGGGCCATTAATGTAGACG | 82.63 | 195 | 94 | 0.9949
TBP | TATA box binding protein component of TFIID and TFIIIB | GAAAAGGCACCATGGGAATA / GGAAGATGGTTTGCACTGGT | 81.8 | 193 | 95.6 | 0.9955
CYCD4-2 | Cyclin-D4-2 | GGATTGAGGAATGGGGATTT / ATAAACAGGCCACAGCCAAC | 82.25 | 196 | 90.7 | 0.9993
TEF1-α | Translation elongation factor 1-alpha | TCCAGGAGCATCAATGACAG / ACCACCACTGGTCACCTCAT | 83.75 | 219 | 99.4 | 0.9987
PGK | Phosphoglycerate kinase | CCCAAGGGTACTCAGTTGGA / CCATCCAACCATCAGGGATA | 83.82 | 192 | 101.8 | 0.9995
GAPDH | Glyceraldehyde-3-phosphate dehydrogenase | TCAATCACCCGATTGCTGTA / CTGCTATCAAGGAGGCTTCG | 82.25 | 199 | 96.9 | 0.9993
RNAPII | RNA polymerase II accessory factor | AATGCTCGCTCTCACAACCT / CGAAATCGTTGTCGTCATTG | 85.55 | 196 | 93 | 0.9958
CESA | Cellulose synthetase A | AAGGACCGCTGATACTCGAA / ACACCATGGCCTGGAAATAA | 83.9 | 195 | 99.1 | 0.9939
GARY | Glyoxylate reductase | TGCGGTTCTTGTGGATGATA / GCACTCATGCTTTCCTGACA | 79.9 | 220 | 100.9 | 0.9927
NPL4 | Nuclear protein localization protein 4 | GGCCATGGACTCAATTAGGA / ATCATCTGGACCGAACAAGG | 83.8 | 176 | 98.9 | 0.9935
VSR3 | Vacuolar-sorting receptor 3 | GCACAAATGGCCTTCAAAAC / GGTGACCCAAATGCTGATTC | 82.3 | 167 | 98.3 | 0.9904
CAC | Clathrin coat associated protein AP-2 complex subunit | GGCATTCCAGAAAGAAAGCA / AGGAAGGAGTACGCTCACCA | 82 | 235 | 92.3 | 0.9920
COPε1 | Coatomer protein subunit epsilon-1 | GCCTTTCCATTCAGGATCAA / ATGCGGAAAAACAGTTGAGG | 81.7 | 178 | 89.4 | 0.9900
SQS | Squalene synthetase | TTTCGCCCTCGTAATTCAAC / CATGAAAAATGCCAGTCACG | 81.43 | 180 | 105.3 | 0.9966
SQE | Squalene epoxidase | AAAGAGCAGACCACCACCAC / TCGGGCTCTGTCAAATCTCT | 80.15 | 208 | 100.8 | 0.9941


Table 5. Ranking of candidate reference genes in order of their expression stability as calculated by NormFinder

Rank | Zhejiang | Guangning | Wantian | Youxian | Putong | Xiaoguo | Roots | Stems | Leaves | Flowers | Seeds | All
1 | PGK 0.061 | GARY 0.302 | TEF1-α 0.331 | RNAPII 0.109 | TUBα-3 0.144 | CESA 0.052 | TUBα-3 0.263 | PGK 0.163 | COPε1 0.130 | CAC 0.026 | RNAPII 0.261 | TUBα-3 0.358
2 | COPε1 0.188 | RNAPII 0.371 | TUBα-3 0.390 | CYPB 0.356 | CAC 0.194 | TUBα-3 0.080 | CYPB 0.366 | CESA 0.225 | PGK 0.199 | TUBα-3 0.062 | TUBα-3 0.270 | CESA 0.492
3 | CAC 0.213 | ACT7α 0.385 | CESA 0.411 | CAC 0.393 | VSR3 0.201 | COPε1 0.159 | CESA 0.523 | TUBα-3 0.345 | RNAPII 0.227 | CESA 0.062 | CESA 0.459 | NPL4 0.524
4 | TEF1-α 0.346 | PGK 0.416 | RNAPII 0.430 | COPε1 0.422 | GAPDH 0.205 | CAC 0.159 | PGK 0.524 | GAPDH 0.354 | TUBα-3 0.242 | PGK 0.163 | PGK 0.483 | CAC 0.547
5 | CESA 0.366 | NPL4 0.422 | ACT7α 0.438 | NPL4 0.430 | CYCD4-2 0.216 | GAPDH 0.332 | ACT7α 0.524 | GARY 0.443 | NPL4 0.253 | TBP 0.175 | TBP 0.562 | ACT7α 0.552
6 | TUBα-3 0.447 | TUBα-3 0.564 | CYPB 0.465 | TUBα-3 0.469 | CESA 0.219 | VSR3 0.340 | TBP 0.530 | NPL4 0.527 | CAC 0.274 | NPL4 0.264 | TEF1-α 0.645 | PGK 0.557
7 | CYCD4-2 0.452 | COPε1 0.620 | PGK 0.498 | TBP 0.483 | NPL4 0.240 | CYPB 0.374 | NPL4 0.606 | CYPB 0.529 | ACT7α 0.303 | ACT7α 0.393 | CYPB 0.717 | RNAPII 0.587
8 | NPL4 0.477 | CESA 0.639 | VSR3 0.674 | GAPDH 0.512 | TEF1-α 0.245 | ACT7α 0.386 | GAPDH 0.711 | COPε1 0.589 | TEF1-α 0.402 | RNAPII 0.405 | COPε1 0.718 | TBP 0.627
9 | ACT7α 0.580 | TEF1-α 0.685 | TBP 0.727 | VSR3 0.586 | ACT7α 0.246 | NPL4 0.432 | VSR3 0.767 | TEF1-α 0.628 | TBP 0.416 | COPε1 0.411 | CAC 0.727 | CYPB 0.631
10 | TBP 0.609 | CYPB 0.686 | NPL4 0.733 | ACT7α 0.669 | COPε1 0.290 | PGK 0.442 | CAC 0.775 | CYCD4-2 0.711 | VSR3 0.517 | VSR3 0.607 | GAPDH 0.767 | COPε1 0.641
11 | CYPB 0.676 | CYCD4-2 0.809 | CAC 0.771 | CESA 0.726 | RNAPII 0.292 | TBP 0.510 | RNAPII 0.804 | VSR3 0.726 | CESA 0.519 | CYPB 0.632 | CYCD4-2 0.804 | TEF1-α 0.708
12 | RNAPII 0.728 | TBP 0.860 | CYCD4-2 0.975 | CYCD4-2 0.834 | GARY 0.339 | CYCD4-2 0.517 | TEF1-α 1.016 | CAC 0.737 | CYPB 0.557 | TEF1-α 0.670 | ACT7α 0.890 | VSR3 0.749
13 | VSR3 0.773 | VSR3 0.861 | COPε1 1.166 | GARY 0.849 | PGK 0.340 | RNAPII 0.622 | GARY 1.029 | ACT7α 0.752 | CYCD4-2 0.588 | CYCD4-2 0.724 | NPL4 0.933 | CYCD4-2 0.850
14 | GARY 1.005 | CAC 0.996 | GARY 1.174 | TEF1-α 0.905 | CYPB 0.383 | TEF1-α 0.625 | CYCD4-2 1.129 | RNAPII 0.778 | GARY 0.676 | GARY 0.760 | VSR3 1.249 | GARY 0.917
15 | GAPDH 1.606 | GAPDH 1.174 | GAPDH 1.383 | PGK 1.069 | TBP 0.643 | GARY 0.756 | COPε1 1.244 | TBP 0.852 | GAPDH 1.092 | GAPDH 2.142 | GARY 1.293 | GAPDH 1.231


RNA-seq database of Camellia oleifera seeds from July to October. The CV values of TUBα-3 and CESA ranked ninth and fifteenth, respectively (Table 3). So it seems that the most stable reference genes are not the genes with the lowest CV values of RPKM. This may be because the RNA-seq data focused on gene expression during seed development rather than on different species and tissues. Furthermore, all of the candidate genes were relatively stable unigenes, and there was only a small difference in CV between them. RNA-seq therefore remains a valid source of candidates, and it may be supposed that more suitable reference genes will be found using RNA-seq data in future research.

Finally, reference gene validation was performed. The results showed many relative expression discrepancies between the stable and unstable reference genes, indicating that the inappropriate use of reference genes without validation may bias the results away from the real expression.

Fifteen genes were selected as candidate reference genes from the RNA-seq database, all of which were stable among the different seed developmental stages. Ten genes were expressed at moderate levels, three genes were expressed at low levels (GARY, CAC and CYCD4-2), and the other two genes (TEF1-α and PGK) were expressed at a much higher level; thus, their Ct values were lower than those of the others. Based on the reference gene validation analysis, we speculated that a highly expressed reference gene would lower the relative expression and lead to discrepancies in the transcript abundance of the target gene. The two highest-abundance genes, TEF1-α and PGK, were the least stable genes in most species and tissues according to the three software tools. Although PGK was found to be a stable gene in seeds, the relative expression of SQE in the seeds of Guangning, Wantian and Youxian was almost zero when PGK was used as the reference gene, so differences in target gene expression were hard to distinguish. This result verified our assumption that a highly expressed reference gene could lead to discrepancies in target gene expression.

CONCLUSION

In summary, fifteen reference genes were analyzed in different tissues of six oil-tea camellia species with geNorm, NormFinder and BestKeeper. The results indicated that TUBα-3 and CESA were the most stably expressed when all samples were analyzed together. Suitable reference genes were also evaluated in each tissue and species specifically. Although no gene was stable across all of the different species and tissues, TUBα-3, CESA and ACT7α were comparatively stable in many species and tissues. The optimal number of reference genes required for accurate normalization varied from two to six.

ACKNOWLEDGMENTS

This work was supported by the Creation of High Yield and Quality New Oil-tea camellia Germplasm (no. 2009BADB1B01) and the Research and Development of Crucial Technology for Oil-tea camellia Industrial Upgrading (no. 2009BADB1B00).

REFERENCES

1. Chang Z., Ling C., Yamashita M., et al. 2010. Microarray-driven validation of reference genes for quantitative real-time polymerase chain reaction in a rat vocal fold model of mucosal injury. Anal. Biochem. 406 (2), 214–221.

2. Umenishi F., Verkman A.S., Gropper M.A. 1996. Quantitative analysis of aquaporin mRNA expression in rat tissues by RNase protection assay. DNA Cell Biol. 15, 475–480.

3. Zhang L., Zhou W., Velculescu V.E., et al. 1997. Gene expression profiles in normal and cancer cells. Science. 276 (5316), 1268–1272.

4. Demidenko N.V., Logacheva M.D., Penin A.A. 2011. Selection and validation of reference genes for quantitative real-time PCR in Buckwheat (Fagopyrum esculentum) based on transcriptome sequence data. PloS ONE. 6 (5), e19434.

5. Huggett J., Dheda K., Bustin S., et al. 2005. Real-time RT-PCR normalization: Strategies and considerations. Genes Immun. 6 (4), 279–284.

6. Paolacci A.R., Tanzarella O.A., Porceddu E., Ciaffi M. 2009. Identification and validation of reference genes for quantitative RT–PCR normalization in wheat. BMC Mol. Biol. 10 (1), 11.

7. Maccoux L.J., Clements D.N., Salway F., Day P.J. 2007. Identification of new reference genes for the normalization of canine osteoarthritic joint tissue transcripts from microarray data. BMC Mol. Biol. 8 (1), 62.

8. Migocka M., Papierniak A. 2011. Identification of suitable reference genes for studying gene expression in cucumber plants subjected to abiotic stress and growth regulators. Mol. Breeding. 28 (3), 343–357.

9. Dheda K., Huggett J.F., Chang J.S., et al. 2005. The implications of using an inappropriate reference gene for real-time reverse transcription PCR data normalization. Anal. Biochem. 344 (1), 141–143.

10. Andersen C.L., Jensen J.L., Ørntoft T.F. 2004. Normalization of real-time quantitative reverse transcription-PCR data, a model-based variance estimation approach to identify genes suited for normalization, applied to bladder and colon cancer data sets. Cancer Res. 64 (15), 5245.

11. Qi J., Yu S., Zhang F., Shen X., Zhao X., Yu Y., Zhang D. 2010. Reference gene selection for real-time quantitative polymerase chain reaction of mRNA transcript levels in Chinese cabbage (Brassica rapa L. ssp. pekinensis). Plant Mol. Biol. Rep. 28 (4), 597–604.

12. Hruz T., Wyss M., Docquier M., et al. 2011. RefGenes, identification of reliable and condition specific reference genes for RT–qPCR data normalization. BMC Genomics. 12 (1), 156.


13. Zhuang R.L., Yao X.H. 2008. Oil-Tea Camellia of China, 2nd ed. Beijing: Chinese Forestry Publ., pp. 1–10 (in Chinese).

14. Vijayan K., Zhang W.J., Tsou C.H. 2009. Molecular taxonomy of Camellia (Theaceae) inferred from nrITS sequences. Am. J. Bot. 96 (7), 1348–1360.

15. Tan X.F., Hu F.M., Xie L.S., et al. 2006. Construction of EST library and analysis of main expressed genes of Camellia oleifera seeds. Sci. Silvae Sinicae (China). 42 (1), 43–48.

16. Lin P., Cao Y.Q., Yao X.H., et al. 2011. Transcriptome analysis of Camellia oleifera Abel seed in four development stages. Mol. Plant Breeding (China). 19 (4), 498–505.

17. Hu X., Tan X., Tian X., et al. 2008. cDNA cloning, sequence analysis and physiological role speculation of a dehydrin-like protein from Camellia oleifera. Acta Bot. Boreali-Occidentalia Sinica (China). 28 (8), 1541–1548.

18. Jiang Y., Tan X.F., Zhang D.Q., et al. 2009. Cloning and sequence analysis of a metallothionein gene from Camellia oleifera. Acta Agric. Univ. Jiangxiensis (China). 31 (4), 699–705.

19. Luo Q., Xie L.S., Tan X.F., et al. 2008. Cloning of full-length cDNA of FAD2 gene from Camellia oleifera. Sci. Silvae Sinicae (China). 44 (3), 70–75.

20. Zhang D., Tan X., Chen H., et al. 2008. Full-length cDNA cloning and bioinformatic analysis of Camellia oleifera SAD. Sci. Silvae Sinicae. 44 (2), 155–159.

21. Wang B., Tan X.F., Chen Y., et al. 2012. Molecular cloning and expression analysis of two calmodulin genes encoding an identical protein from Camellia oleifera. Pak. J. Bot. 44 (3), 961–968.

22. Shao G., Tan X.F., Chen H., et al. 2012. Isolation and characterization of an aldo-keto reductase cDNA from Camellia oleifera seed. Adv. Sci. Lett. (China). 10 (1), 153–157.

23. Han X.J., Lu M., Chen Y., et al. 2012. Selection of reliable reference genes for gene expression studies using real-time PCR in tung tree during seed development. PloS ONE. 7 (8), e43084.

24. Li R., Zhu H., Ruan J., et al. 2010. De novo assembly of human genomes with massively parallel short read sequencing. Genome Res. 20, 265–272.

25. Mortazavi A., Williams B.A., McCue K., Schaeffer L., Wold B. 2008. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods. 5 (7), 621–628.

26. Czechowski T., Stitt M., Altmann T., et al. 2005. Genome-wide identification and testing of superior reference genes for transcript normalization in Arabidopsis. Plant Physiol. 139, 5–17.

27. Expósito-Rodríguez M., Borges A.A., Borges-Pérez A., et al. 2008. Selection of internal control genes for quantitative real-time RT-PCR studies during tomato development process. BMC Plant Biol. 8, 131.

28. Chang E., Shi S., Liu J., et al. 2012. Selection of reference genes for quantitative gene expression studies in Platycladus orientalis (Cupressaceae) using real-time PCR. PloS ONE. 7 (3), e33278.

29. Do R., Kiss R.S., Gaudet D., et al. 2009. Squalene synthase, a critical enzyme in the cholesterol biosynthesis pathway. Clin. Genet. 75 (1), 19–29.

30. Beytia E., Qureshi A.A., Porter J.W. 1973. Squalene synthetase III. Mechanism of the reaction. J. Biol. Chem. 248 (5), 1856–1867.

31. M’Baya B., Fegueur M., Servouse M., et al. 1989. Regulation of squalene synthetase and squalene epoxidase activities in Saccharomyces cerevisiae. Lipids. 24 (12), 1020–1023.

32. Vandesompele J., De Preter K., Pattyn F., et al. 2002. Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes. Genome Biol. 3 (7), research0034.

33. Pfaffl M.W., Tichopad A., Prgomet C., et al. 2004. Determination of stable housekeeping genes, differentially regulated target genes and sample integrity: BestKeeper–Excel-based tool using pair-wise correlations. Biotechnol. Lett. 26 (6), 509–515.

34. Livak K.J., Schmittgen T.D. 2001. Analysis of relative gene expression data using real-time quantitative PCR and the 2^(−ΔΔCt) method. Methods. 25 (4), 402–408.

35. Benn C.L., Fox H., Bates G. 2008. Optimisation of region-specific reference gene selection and relative gene expression analysis methods for pre-clinical trials of Huntington’s disease. Mol. Neurodegener. 3 (1), 17.

36. Yan J., Yuan F., Long G., et al. 2011. Selection of reference genes for quantitative real-time RT-PCR analysis in citrus. Mol. Biol. Rep. 39 (2), 1831–1838.

37. Tong Z., Gao Z., Wang F., et al. 2009. Selection of reliable reference genes for gene expression studies in peach using real-time PCR. BMC Mol. Biol. 10 (1), 71.

38. Zhu G.P., Li J.Y., Fan Z.Q., et al. 2011. Isolation of B function CjDEF-1 gene involved in floral development in Camellia japonica Hongshibaxueshi and its expression analysis. J. Agric. Biotechnol. 19 (3), 442–448.

39. Zhou X.W., Li J.Y., Fan Z.Q. 2012. Cloning and expression analysis of chalcone isomerase gene cDNA from Camellia nitidissima. Forest Res. (China). 25 (1), 93–99.

40. Radonić A., Thulke S., Mackay I.M., Landt O., Siegert W., Nitsche A. 2004. Guideline to reference gene selection for quantitative real-time PCR. Biochem. Biophys. Res. Commun. 313 (4), 856–862.

41. Raaijmakers M.H., van Emst L., de Witte T., Mensink E., Raymakers R.A. 2002. Quantitative assessment of gene expression in highly purified hematopoietic cells using real-time reverse transcriptase polymerase chain reaction. Exp. Hematol. 30 (5), 481–487.