
Association genetics of traits controlling lignin and cellulose biosynthesis in black cottonwood (Populus trichocarpa, Salicaceae) secondary xylem


Jill L. Wegrzyn¹, Andrew J. Eckert²,³, Minyoung Choi², Jennifer M. Lee², Brian J. Stanton⁴, Robert Sykes⁵, Mark F. Davis⁵, Chung-Jui Tsai⁶ and David B. Neale¹,³,⁷,⁸

¹Department of Plant Sciences, University of California at Davis, Davis, CA 95616, USA; ²Section of Evolution and Ecology, University of California at Davis, Davis, CA 95616, USA; ³Center for Population Biology, University of California at Davis, Davis, CA 95616, USA; ⁴Genetic Resources Conservation Program, Greenwood Resources, Portland, OR, 97201 USA; ⁵National Renewable Energy Laboratory, Golden, CO, 80401 USA; ⁶School of Forestry and Natural Resources, and Department of Genetics, University of Georgia, Athens, GA, 30602 USA; ⁷Bioenergy Research Center (BERC), University of California at Davis, Davis, CA 95616, USA; ⁸Institute of Forest Genetics, USDA Forest Service, Davis, CA 95616, USA

Author for correspondence: David B. Neale

Tel: +1 530 754 8431

Email: [email protected]

Received: 2 March 2010

Accepted: 11 June 2010

New Phytologist (2010) 188: 515–532

doi: 10.1111/j.1469-8137.2010.03415.x

Key words: association genetics, biofuels, black cottonwood (Populus trichocarpa), genotyping, lignin biosynthesis, linkage disequilibrium, resequencing, single nucleotide polymorphism (SNP).

Summary

• An association genetics approach was used to examine individual genes and alleles at the loci responsible for complex traits controlling lignocellulosic biosynthesis in black cottonwood (Populus trichocarpa). Recent interest in poplars as a source of renewable energy, combined with the vast genomic resources available, has enabled further examination of their genetic diversity.

• Forty candidate genes were resequenced in a panel of 15 unrelated individuals to identify single nucleotide polymorphisms (SNPs). Eight hundred and seventy-six SNPs were successfully genotyped in a clonally replicated population (448 clones). The association population (average of 2.4 ramets per clone) was phenotyped using pyrolysis molecular beam mass spectrometry. Both single-marker and haplotype-based association tests were implemented to identify associations for composite traits representing lignin content, syringyl : guaiacyl ratio and C6 sugars.

• Twenty-seven highly significant, unique, single-marker associations (false discovery rate Q < 0.10) were identified across 40 candidate genes in three composite traits. Twenty-three significant haplotypes within 11 genes were discovered in two composite traits.

• Given the rapid decay of within-gene linkage disequilibrium and the high coverage of amplicons across each gene, it is likely that the numerous polymorphisms identified are in close proximity to the causative SNPs and the haplotype associations reflect information present in the associations between markers.

Introduction

Forest trees are a potential source of net-zero carbon emission lignocellulosic biofuels. The production of biofuels involves the collection of biomass, deconstruction of cell wall polymers into component sugars (pretreatment and saccharification) and conversion of these sugars to ethanol (fermentation) (Rubin, 2008). Woody bioenergy crops from which biomass is derived have not been domesticated for this purpose and the current methods for lignocellulosic saccharification and fermentation are inefficient. The recent need to develop viable fuel alternatives is now taking advantage of genomics resources and technologies to discover the potential gain that can be achieved through breeding. Traits of interest in trees with applications in bioenergy include growth rate, branching habit, stem thickness and cell wall chemistry (Bradshaw et al., 2000). Rapid growth, moderate genome size, woody tissues and economic importance make black cottonwood (Populus trichocarpa) an ideal model organism to examine biofuels-related traits (Bradshaw et al., 2000). Black cottonwood possesses tremendous genetic and phenotypic diversity, is obligately outcrossing, is able to hybridize with many other species and is easily clonally propagated (Davis, 2008). To further complement the advantages, black cottonwood is the first tree and bioenergy feedstock to have its genome sequenced and annotated. Derived from a single wild individual (Nisqually-1), the genome sequence represents an estimated 45 500 genes across 19 chromosomes (Tuskan et al., 2006). In addition to the genome, resources such as controlled cross-populations, cross-species’ molecular markers, expressed sequence tag (EST) collections and full-length cDNAs are available to the research community (Strauss & Martin, 2004; Ralph et al., 2006a,b; Tuskan et al., 2006).

New Phytologist Research

No claim to original US government works

Journal compilation © New Phytologist Trust (2010)

New Phytologist (2010) 188: 515–532

www.newphytologist.com

Improvement of biofuels feedstocks focuses on increasing both the relative carbon partitioning in woody tissues above ground and the accessibility of cellulose for enzymatic digestion (Ragauskas et al., 2006). As with other woody species, the major components of black cottonwood secondary cell walls are cellulose, hemicellulose and lignin (Harris et al., 2008). Lignin inhibits saccharification in processes aimed at producing simple sugars for fermentation to ethanol. Many studies have focused on the molecular biology of wood and secondary wall formation (Sterky et al., 1998, 2004; Plomion et al., 2001; Schrader et al., 2004). The pathways and genes involved in lignin and cellulose biosynthesis, and microfibril deposition, are becoming increasingly well understood through biochemical analysis and expression studies (Whetten et al., 1998; Plomion et al., 2001; Li et al., 2003a,b; Peter & Neale, 2004; Schrader et al., 2004; Boerjan, 2005; Oakley et al., 2007). The specific roles of genes in these pathways have been verified through forward and reverse genetic mutation studies (Dixon & Reddy, 2003; Ralph et al., 2006a,b; Davis, 2008). A relatively unexplored area of research is the identification of the natural allelic variation controlling phenotype variation and the exploitation of this variation in breeding.

A major goal of population and quantitative genetics is the identification of the polymorphisms responsible for phenotypic variation (Feder & Mitchell-Olds, 2003; Stinchcombe & Hoekstra, 2008). Many traits of interest in forest trees, such as wood quality, are complex in nature and occur later in development (Groover, 2007). Recent advances in high-throughput marker technologies, combined with the wealth of genomic resources available for species such as black cottonwood, have enabled a closer examination of the number and effect sizes of genes responsible for traits of interest through complex trait dissection using association mapping. Tree species are ideal for association mapping as they are predominantly outcrossing and have large, relatively unstructured, populations, resulting in high levels of nucleotide diversity and low linkage disequilibrium (Neale & Savolainen, 2004; Gonzalez-Martinez et al., 2006). Significant associations between single nucleotide polymorphisms (SNPs) within candidate genes and phenotypic traits have been established in forest trees. Associations with wood quality traits in eucalyptus (Thumma et al., 2005), wood quality and drought tolerance traits in loblolly pine (Gonzalez-Martinez et al., 2007, 2008), bud phenology traits in European poplar (Ingvarsson et al., 2008) and cold hardiness-related traits in coastal Douglas fir (Eckert et al., 2009a) have been identified. In general, individual SNPs explain a small proportion of the phenotypic variance (0.5–5.0%), which is consistent with the complex nature of these traits.

In this study, statistical models were applied to perform association tests and to account for population structure in 579 SNPs from 40 candidate genes involved in lignocellulosic cell wall synthesis in black cottonwood. Single-marker and haplotype-based tests were performed to identify associations with natural variation in composite traits evaluating lignin and cellulose content.

Materials and Methods

Association population and phenotypic data

Association population. GreenWood Resources (Portland, OR, USA) assembled a collection of 1189 black cottonwood (Populus trichocarpa Torr. & A. Gray) clones from 101 provenances from 12 river drainages located west of the Cascade Mountains between 48°56′N (Nooksack River, Whatcom County, WA, USA) and 43°47′N (Middle Fork, Willamette River, Lane County, OR, USA) latitudes during the period 1990 to 1999 (Fig. 1). The collection was established in clone banks where it was annually coppiced to remove C effects from planting stock used in the establishment of clonally replicated field trials in 1994, 1996, 1999 and 2003. All four trials were planted at an alluvial site on the lower Columbia River floodplain at Westport, OR, USA (46°08′N). The soil is deep and moderately well drained, with a loam–silt loam surface overlaying a sandy loam to fine sand horizon. Annual precipitation averages 2034 mm and the average maximum temperature during the April–September growing season is 20°C.

Sample preparation and wood chemistry phenotyping. Wood samples were collected from a subset of 448 clones representing all of the original provenances. Two Haglof 5 mm increment cores were taken from the bark to the pith of up to three ramets per clone growing in the four Westport clone trials (Fig. 1b, Supporting Information Table S1). Cores were extracted at breast height (1.37 m) and placed in a −8°C freezer until sectioning. Sample preparation consisted of removing the two outermost complete growth rings of each core because of the different ages of the trees.

Ground wood samples (c. 4 mg) were prepared in stainless steel sample cups, and pyrolyzed using a Frontier Pyrolyzer PY2020iD (Frontier Laboratories, Ltd., Fukushima, Japan). Pyrolysis was performed at 500°C using helium carrier gas flowing at 2.0 l min⁻¹ (at STP). The transfer line connecting the pyrolysis unit to the molecular beam mass spectrometer was heated to c. 400°C. The pyrolysis vapors were expanded through a ruby sampling orifice that was mated directly to the faceplate of the molecular beam mass spectrometer. The total pyrolysis time was 30 s, although the pyrolysis reaction was completed in < 12 s. A custom-built molecular beam mass spectrometer using an Extrel Model TQMS C50 mass spectrometer was used for pyrolysis vapor analysis. Mass spectral data from mass to charge ratio (m/z) 30–450 were acquired on a Merlin data acquisition system using 22.5 eV electron impact ionization. Using this system, both light gases and heavy tars are sampled simultaneously and in real time. The mass spectrum of the pyrolysis vapor provides a rapid, semi-quantitative depiction of the molecular fragments. Data analysis was performed using Unscrambler v. 9.7 (CAMO A/S, Trondheim, Norway).

Resequencing, SNP discovery and genotyping

Candidate gene selection. Forty candidate genes associated with lignocellulosic cell wall development, and well annotated in the JGI Poplar Genome Assembly v. 1.1, were selected for resequencing (Table 1). These included 22 genes from 11 gene families involved in lignin biosynthesis and polymerization, six genes from four families involved in one-carbon metabolism associated with lignin biosynthesis, and 12 genes from five families involved in cellulose biosynthesis and microfibril deposition. The corresponding gene models were obtained from the poplar genome and manually curated (Table 1).

DNA isolation, primer design and resequencing. Leaf tissue from the diversity panel of 15 unrelated poplar clones (one ramet per clone), selected to represent the latitudinal range of the entire clone collection, was sampled as leaf punches, dried with silica gel and shipped at room temperature to DNA Landmarks (Saint-Jean-sur-Richelieu, QC, Canada) for DNA extraction, utilizing their proprietary microscale protocol. All DNA extractions were standardized to 2.5 ng μl⁻¹ for resequencing. The same protocol was used to extract DNA for the 448 clones, with all extractions standardized to 50 ng μl⁻¹ prior to genotyping.

Primers were designed at Ampure Agencourt Bioscience Corporation (Beverly, MA, USA), utilizing custom software against the Poplar Genome Assembly v. 1.1. Genomic sequences covering the entire protein-coding regions, including introns and 1000 bp upstream and 300 bp downstream noncoding sequences, were retrieved for primer design. The program was set to design primers every 700 bp, which yielded 517 primer pairs across the 40 genes. Of these, Agencourt utilized in-house software to select 200 nonoverlapping primer pairs based on a quality metric representing the redundancy in the genome and how likely the amplicon is to be a homopolymer locus. The best-scoring pairs were tagged with M13F (GTAAAACGACGGCCAGT) and M13R (CAGGAAACAGCTATGACC) primers for high-throughput sequencing.

Fig. 1 Descriptive information on the distribution, sampling localities and population structure across the range of black cottonwood. (a) Range map for black cottonwood. (b) Sample locations across Oregon and Washington. Each point denotes a single tree (n = 448). (c) Population structure estimates across all the sampled range of black cottonwood. Colors designate the five significant genetic clusters detected using principal components analysis (PCA). Multiple colors denote points with multiple clones assigned to different genetic clusters.

Genomic DNA was amplified in a 384-well format polymerase chain reaction (PCR) set-up. Each PCR contained 10 ng DNA, 1× HotStar buffer, 0.8 mM deoxynucleoside triphosphates (dNTPs), 1 mM MgCl2, 0.2 U HotStar enzyme (Qiagen, Valencia, CA, USA) and 0.2 μM forward and reverse primers in a 10 μl reaction. PCR cycling parameters were: one cycle of 95°C for 15 min, 35 cycles of 9°C for 20 s, 60°C for 30 s and 72°C for 1 min, followed by one cycle of 72°C for 3 min. The resulting PCR products were purified using solid-phase reversible immobilization chemistry followed by dye-terminator fluorescent sequencing with universal M13 primers. PCR for sequencing was initiated at 95°C for 15 min followed by 40 cycles for 10 s, 50 cycles for 5 s and, finally, 60 cycles for 2 min 30 s. Dye-terminator removal was performed using solid-phase reversible immobilization. Bidirectional Sanger sequencing of PCR fragments was carried out via capillary electrophoresis using ABI Prism 3730xl DNA analyzers (Applied Biosystems, Foster City, CA, USA).

Table 1 Details of candidate genes selected for resequencing

| Gene | Amplicons | SNPs targeted | SNPs converted | Gene family | JGI gene model |
|---|---|---|---|---|---|
| 4CL1¹ | 10 | 42 | 28 | 4-Coumarate:CoA ligase (4CL) | estExt_fgenesh4_pg.C_1210004 |
| 4CL3¹ | 3 | 9 | 5 | | grail3.0100002702 |
| 4CL5 | 3 | 19 | 15 | | fgenesh4_pg.C_LG_III001773 |
| C3H3 | 3 | 10 | 10 | Coumarate 3-hydroxylase (C3H) | fgenesh4_pg.C_LG_VI000268 |
| C4H1² | 2 | 9 | 7 | Cinnamate 4-hydroxylase (C4H) | grail3.0094002901 |
| C4H2² | 3 | 15 | 10 | | estExt_fgenesh4_pg.C_LG_XIII0519 |
| CAD¹,² | 7 | 56 | 30 | Cinnamyl alcohol dehydrogenase (CAD) | estExt_Genewise1_v1.C_LG_IX2359 |
| CCR¹,² | 5 | 21 | 17 | Cinnamoyl-CoA reductase (CCR) | estExt_fgenesh4_kg.C_LG_III0056 |
| CesA1A¹,² | 10 | 43 | 27 | Cellulose synthase (CesA) | gw1.XI.3218.1 |
| CesA1B¹,² | 10 | 65 | 39 | | eugene3.00040363 |
| CesA2A² | 6 | 25 | 19 | | gw1.XVIII.3152.1 |
| CesA2B¹,² | 6 | 32 | 18 | | estExt_Genewise1_v1.C_LG_VI2188 |
| CesA3A¹ | 7 | 38 | 29 | | eugene3.00002636 |
| HCT1¹,² | 5 | 33 | 18 | Hydroxycinnamoyl-CoA quinate/shikimate hydroxycinnamoyltransferase (HCT) | fgenesh4_pg.C_LG_III001559 |
| HCT6² | 5 | 27 | 18 | | eugene3.02080010 |
| KOR1² | 4 | 12 | 9 | Cellulase (KOR) | estExt_fgenesh4_pg.C_LG_I0683 |
| LAC1A² | 5 | 15 | 13 | Laccase (LAC) | estExt_fgenesh4_pg.C_LG_XVI1027 |
| LAC2 | 2 | 7 | 5 | | estExt_fgenesh4_pg.C_LG_VIII0541 |
| LAC90A | 5 | 29 | 16 | | estExt_fgenesh4_pm.C_LG_VIII0291 |
| PAL2² | 5 | 17 | 11 | Phenylalanine ammonia-lyase (PAL) | estExt_fgenesh4_pg.C_LG_VIII0293 |
| PAL4 | 4 | 24 | 11 | | gw1.X.2713.1 |
| PAL5 | 4 | 24 | 11 | | estExt_fgenesh4_pg.C_LG_X2023 |
| SAM1¹,² | 3 | 20 | 12 | S-Adenosylmethionine synthetase (SAMS) | eugene3.00080928 |
| SHMT1 | 3 | 17 | 9 | Serine hydroxymethyltransferase (SHMT) | eugene3.00012227 |
| SHMT3 | 6 | 26 | 14 | | grail3.0003095602 |
| SHMT6 | 3 | 14 | 10 | | estExt_fgenesh4_pm.C_880008 |
| SUSY1¹,² | 6 | 24 | 16 | Sucrose synthase (SUSY) | estExt_fgenesh4_pm.C_LG_XVIII0009 |
| TUA1 | 3 | 12 | 9 | α-Tubulin (TUA) | gw1.II.3483.1 |
| TUA5¹ | 3 | 19 | 11 | | eugene3.00090803 |
| TUB15 | 4 | 27 | 18 | β-Tubulin (TUB) | estExt_Genewise1_v1.C_LG_I1970 |
| TUB16 | 3 | 13 | 7 | | estExt_fgenesh4_pm.C_LG_IX0457 |
| TUB9 | 3 | 18 | 12 | | eugene3.00010909 |
| CoAOMT1² | 3 | 14 | 7 | Caffeoyl CoA O-methyltransferase (CCoAOMT) | grail3.0001059501 |
| CoAOMT2 | 3 | 24 | 16 | | estExt_fgenesh4_pm.C_LG_I1023 |
| COMT1 | 4 | 26 | 14 | Caffeate O-methyltransferase (COMT) | estExt_fgenesh4_pm.C_LG_XV0035 |
| COMT2² | 4 | 17 | 13 | | estExt_fgenesh4_pm.C_LG_XII0129 |
| F5H1 | 4 | 14 | 9 | Ferulate 5-hydroxylase (F5H) | estExt_fgenesh4_pm.C_570058 |
| F5H2 | 3 | 9 | 5 | | eugene3.00071182 |
| gdcH1 | 6 | 36 | 22 | Glycine decarboxylase complex, H (gdcH) | estExt_fgenesh4_pg.C_LG_XII1299 |
| gdcT2 | 3 | 10 | 8 | Glycine decarboxylase complex, T (gdcT) | eugene3.02520018 |

SNPs targeted, single nucleotide polymorphisms (SNPs) identified and sent for genotyping on the Illumina GoldenGate assay; SNPs converted, SNPs successfully genotyped on the Illumina GoldenGate assay.
¹Genes with significant haplotype-based associations.
²Genes with significant single-marker associations.

SNP discovery and selection. Sanger resequencing produced a total of 202 amplicons (600–700 bp in length) representing 40 genes (3–12 amplicons per gene). The package PineSAP (Pine Sequence Alignment and SNP Identification) (Wegrzyn et al., 2009) applied a combination of ProbConsRNA (Do et al., 2005), Polyphred (Nickerson et al., 1997), Polybayes (Marth et al., 1999) and machine learning techniques to align sequences from 195 of the 202 amplicons and to computationally identify 1485 polymorphisms (an average of seven SNPs per amplicon). SNP detection of the resulting calls was based on information gathered on quality scores, coverage and alignment metrics computed during the sequence alignments. The identified polymorphisms and their flanking sequences were formatted for the GoldenGate assay (Illumina, San Diego, CA, USA) and submitted to Illumina's in-house software package, which assigns design scores. An additional 1233 SNPs from 232 genes were identified for population structure inference through eSNP methods, utilizing ESTs from male and female catkin tissue aligned to the reference genome (Unneberg et al., 2005). To construct the 1536-SNP assay, we selected 948 high-scoring SNPs from the 40 lignin/cellulose genes and 588 high-scoring eSNPs from the 232 catkin ESTs.

SNP genotyping. Genotyping was carried out using the Illumina GoldenGate SNP genotyping platform (Landegren et al., 1998; Oliphant et al., 2002; Fan et al., 2003; Eckert et al., 2009b) at the DNA Technologies Core Facility (University of California at Davis, Davis, CA, USA). The assay involves the generation of templates with specific target and address sequences using allele-specific extension, followed by ligation and amplification with universal primers. Fluorescent products are hybridized to coded beads on an array matrix and the signal intensities are subsequently determined using the BeadArray Reader (Illumina). Signal intensities are quantified and matched to specific alleles using BeadStudio v. 3.1.14 (Illumina). Manual adjustments to genotypic clusters were made when necessary. For the inclusion of SNPs into the final dataset, we used thresholds of 0.20 and 0.60 for the GenCall50 (GC50) and call rate (CR) indices, respectively (Table S2). These are established quality metrics that have been used to evaluate Illumina genotyping data (Pavy et al., 2008; Eckert et al., 2009b). The scores reflect the quality of the genotypic clusters (GC50) and the fraction of samples having a genotype defined for a particular SNP (CR).
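The inclusion rule described above can be sketched in a few lines of Python. This is only an illustration: the record layout, the SNP names and the use of inclusive (>=) comparisons are assumptions made for the example and are not taken from BeadStudio or from the study's data files.

```python
from dataclasses import dataclass

GC50_MIN = 0.20  # GenCall50 threshold quoted in the text
CR_MIN = 0.60    # call-rate threshold quoted in the text


@dataclass
class SnpQuality:
    name: str         # hypothetical SNP identifier
    gc50: float       # quality of the genotypic clusters
    call_rate: float  # fraction of samples with a genotype call


def passes_qc(snp: SnpQuality) -> bool:
    """Keep a SNP only if both indices meet their thresholds (inclusive by assumption)."""
    return snp.gc50 >= GC50_MIN and snp.call_rate >= CR_MIN


def filter_snps(snps):
    """Return the SNPs that would enter the final dataset under this rule."""
    return [s for s in snps if passes_qc(s)]


# Made-up example records, not data from the study.
example = [
    SnpQuality("snp_a", 0.71, 0.72),
    SnpQuality("snp_b", 0.15, 0.95),  # fails GC50
    SnpQuality("snp_c", 0.40, 0.55),  # fails call rate
]
kept = [s.name for s in filter_snps(example)]
```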

Tests for association

Genetic diversity, population structure and linkage disequilibrium. For each SNP, we estimated the expected and observed heterozygosity, Wright’s inbreeding coefficient (FIS) and hierarchical fixation indices using the Genetics and HIERFSTAT packages available in R (Goudet, 2005; Warnes and Leisch, 2006; R Development Core Team, 2007). We excluded those SNPs with |FIS| > 0.25 from further analyses. The significance of multilocus fixation indices was tested via bootstrapping across loci (n = 10 000 replicates) to obtain 99% confidence intervals (99% CI). Patterns of population structure were further examined using principal components analysis. Population structure coefficients were estimated using Eigenstrat v. 2.0 (Price et al., 2006). For association analyses, a Q matrix defined by significant principal components, as assessed using the Tracy–Widom distribution (Patterson et al., 2006), was utilized. Cluster membership was determined via hierarchical cluster analysis using Ward’s linkage and Euclidean distances on the significant principal components. The number of clusters was identified as k + 1, where k is the number of significant principal components. We identified FST outliers using the bivariate distribution of expected heterozygosity and FST among inferred clusters observed for the 297 eSNPs to define the genome-wide expectation of background levels of genetic structure. Lignin SNPs falling outside this distribution were identified as FST outliers.
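To make the |FIS| > 0.25 exclusion rule concrete, the sketch below computes observed and expected heterozygosity and FIS = 1 − Ho/He for one biallelic SNP from genotype counts. It is a plain-Python illustration using the simple (uncorrected) estimator; the R packages named above may apply sample-size corrections, and the genotype counts shown are invented.

```python
def heterozygosity_stats(n_aa: int, n_ab: int, n_bb: int):
    """Return (Ho, He, FIS) for a biallelic SNP from genotype counts."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped individuals")
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    h_obs = n_ab / n                  # observed heterozygosity
    h_exp = 2 * p * (1 - p)           # expected under Hardy-Weinberg
    f_is = 1 - h_obs / h_exp if h_exp > 0 else 0.0
    return h_obs, h_exp, f_is


def keep_snp(n_aa: int, n_ab: int, n_bb: int, max_abs_fis: float = 0.25) -> bool:
    """Apply the exclusion rule from the text: drop SNPs with |FIS| > 0.25."""
    _, _, f_is = heterozygosity_stats(n_aa, n_ab, n_bb)
    return abs(f_is) <= max_abs_fis


# Invented counts: a SNP at Hardy-Weinberg proportions and one with a
# strong heterozygote deficit.
balanced = heterozygosity_stats(25, 50, 25)
deficit = heterozygosity_stats(40, 20, 40)
```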

Linkage disequilibrium (LD) was measured as the squared correlation of allele frequencies r² (Hill & Robertson, 1968), which is affected by both recombination and differences in allele frequencies between sites. The r² value between pairs of informative SNP sites in candidate genes was calculated using the Genetics package in R (Warnes and Leisch, 2006; R Development Core Team, 2007). Patterns of LD were investigated among SNPs from 39 of the 40 candidate genes. CesA1A was not included in this analysis because of physical annotation differences in the reference genome. To assess the extent of LD in the sequenced genomic regions, the decay of LD with physical distance (base pairs) between SNP sites within each candidate locus and over all candidate genes was evaluated by nonlinear regression analysis of r² values (Remington et al., 2001). The expectation of r² for low mutation rates and taking into account sample size is given by:

$$E(r^2) = \left[\frac{10 + C}{(2 + C)(11 + C)}\right]\left[1 + \frac{(3 + C)(12 + 12C + C^2)}{n(2 + C)(11 + C)}\right]$$

where C is the population recombination parameter (ρ = 4Nₑr) and n is the sample size; C was replaced by C × distance in base pairs when fitting the formula to our data using the nonlinear regression (nls) function in R (R Development Core Team, 2007).
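The expectation above is straightforward to evaluate numerically. The following Python sketch codes the formula with C replaced by a per-base-pair rate multiplied by distance, as described in the text; the per-base-pair value and the sample size used in the example are arbitrary placeholders, not estimates from this study, and the fitting step itself (done with nls in R) is not reproduced here.

```python
def expected_r2(c_per_bp: float, distance_bp: float, n: int) -> float:
    """Expected r^2 at a given distance under the formula in the text.

    c_per_bp    -- population recombination parameter per base pair
    distance_bp -- physical distance between the two sites
    n           -- sample size
    """
    c = c_per_bp * distance_bp
    first = (10 + c) / ((2 + c) * (11 + c))
    second = 1 + ((3 + c) * (12 + 12 * c + c ** 2)) / (n * (2 + c) * (11 + c))
    return first * second


# Placeholder inputs for illustration only.
N_SAMPLES = 896
curve = [(d, expected_r2(0.01, d, N_SAMPLES)) for d in (0, 100, 500, 1000)]
```

With any positive rate, the values in `curve` fall as distance grows, which is the decay pattern the regression is meant to capture.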

Statistical models

Single-marker models were utilized for all SNP–trait combinations. A general linear model was fitted to each trait–SNP combination (Yu et al., 2006), with SNP markers as fixed effects and elements of the Q matrix as covariates. P values were generated for each test using 10 000 permutations of genotypes with respect to phenotypic trait values. All analyses were conducted using TASSEL v. 2.0.1 (Bradbury et al., 2007). Corrections for multiple testing were performed using the positive false discovery rate (FDR) method (Storey, 2002; Storey & Tibshirani, 2003). All the necessary data to perform these analyses are available in Tables S4 and S5. Modes of gene action were quantified using the ratio of dominance (d) to additive (a) effects estimated from least-square means for each genotypic class. Partial or complete dominance was defined as values in the range 0.50 < |d/a| < 1.25, whereas additive effects were defined as values in the range −0.50 ≤ d/a ≤ 0.50. Values of |d/a| > 1.25 were equated with over- or under-dominance.
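The gene-action categories can be illustrated with a small Python sketch. It assumes the conventional definitions a = (mean of one homozygote − mean of the other) / 2 and d = heterozygote mean − midpoint of the two homozygote means; the text does not spell these out, so they are an assumption, as is the handling of the boundary value |d/a| = 1.25, which the stated ranges leave undefined. The genotype means in the example are invented.

```python
def gene_action(mean_aa: float, mean_ab: float, mean_bb: float):
    """Classify the mode of gene action from three genotypic means.

    Returns (d/a ratio, category) using the ranges quoted in the text.
    """
    a = (mean_bb - mean_aa) / 2.0               # additive effect (assumed definition)
    d = mean_ab - (mean_aa + mean_bb) / 2.0     # dominance effect (assumed definition)
    if a == 0:
        raise ValueError("additive effect is zero; d/a is undefined")
    ratio = d / a
    if abs(ratio) <= 0.50:
        category = "additive"
    elif abs(ratio) <= 1.25:  # exactly 1.25 is placed here by assumption
        category = "partial or complete dominance"
    else:
        category = "over- or under-dominance"
    return ratio, category


# Invented least-square means for the three genotypic classes.
examples = {
    "additive": gene_action(10.0, 11.0, 12.0),
    "dominant": gene_action(10.0, 12.0, 12.0),
    "overdominant": gene_action(10.0, 14.0, 12.0),
}
```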

Haplotypes were inferred and their frequencies were estimated using the modified expectation maximization method of haplotype inference included in the haplo.stats (v. 2.0.1) program available in R (Schaid et al., 2002; R Development Core Team, 2007). Input consisted of genotype matrices with principal components analysis values and phenotypic values organized by tree sample (Table S6). Singleton alleles were ignored when constructing the haplotypes, and haplotypes with a frequency < 5 were also discarded. Output in the form of global score statistics and haplotype-specific scores was derived from generalized linear models (haplo.score). Corrections for multiple testing were performed using the positive FDR method (Storey, 2002; Storey & Tibshirani, 2003).
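For readers unfamiliar with FDR-style corrections, the sketch below shows the Benjamini–Hochberg step-up adjustment in Python. This is a deliberately swapped-in, simpler technique and not the positive FDR method of Storey cited above: it corresponds to Storey's q-values only when the estimated proportion of true null hypotheses is fixed at 1, and is therefore more conservative. The P values in the example are made up.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up P values for four hypothetical SNP-trait tests.
raw = [0.01, 0.04, 0.03, 0.5]
adj = bh_adjust(raw)
significant = [q < 0.10 for q in adj]  # threshold mirrors Q < 0.10 in the Summary
```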

Results

Phenotype

Wood samples were analyzed using pyrolysis molecular beam mass spectrometry. The intensities of the major peaks assigned to lignin were summed in order to estimate the lignin content, syringyl : guaiacyl (S : G) ratios, C5 sugars and C6 sugars across the range of samples (Table 2). Lignin content was calculated with peaks at mass to charge ratio (m/z) of 124, 137, 138, 150, 152, 164 and 178; these were summed and then averaged for the different samples. S : G ratios were determined by summing the S peaks and then dividing by the sum of G peaks. C5 and C6 sugars were calculated as the sum of their respective peaks. Visualization of each phenotype demonstrated a strongly bimodal distribution for the C5 trait as opposed to the distributions for the other three composite traits, which were approximately normal. As a result, C5 was not included in subsequent analyses. S : G ratios ranged from 1.2 to 2.4, and lignin content ranged from 15.8 to 27.5%.
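The S : G calculation described above amounts to a ratio of two peak sums. The Python sketch below illustrates it with the S and G assignments listed in Table 2; the peak at m/z 180 is assigned to both S and G in that table and is left out of both sums here, which is an assumption, since the text does not say how it was handled. The spectrum is a made-up dictionary of intensities, not measured data.

```python
# Peak assignments taken from Table 2 (m/z 180 omitted by assumption).
G_PEAKS = (124, 137, 138, 150, 152, 164, 178)
S_PEAKS = (154, 167, 168, 182, 194, 208, 210)


def sg_ratio(spectrum: dict) -> float:
    """Sum of syringyl peak intensities divided by sum of guaiacyl peak intensities."""
    s_total = sum(spectrum.get(mz, 0.0) for mz in S_PEAKS)
    g_total = sum(spectrum.get(mz, 0.0) for mz in G_PEAKS)
    if g_total == 0:
        raise ValueError("no guaiacyl signal; ratio is undefined")
    return s_total / g_total


# Invented spectrum: every S peak at intensity 2.0, every G peak at 1.0.
toy_spectrum = {mz: 2.0 for mz in S_PEAKS}
toy_spectrum.update({mz: 1.0 for mz in G_PEAKS})
toy_ratio = sg_ratio(toy_spectrum)
```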

Genotyping results

The 1536 SNPs chosen for genotyping using the Illumina GoldenGate platform represent 948 from 40 candidate genes (20 gene families and 202 amplicons), with 7–65 SNPs per gene, and 588 from the 232 catkin ESTs (Table 1). Of the 1536 SNPs, 876 (57%) yielded data consistent with our quality thresholds (579 candidate gene SNPs and 297 eSNPs). A conversion rate of 61% (579 SNPs) was observed among the 948 SNPs from the resequenced 40 lignin/cellulose candidate genes, as opposed to 51% for the eSNPs. The median GC50 score across all usable SNPs was 0.71 and the median CR score was 0.72. Quality scores across the genotyped loci are summarized in Table S2. The distribution of the quality metrics for genotyped SNPs, grouped by dataset, is shown in Fig. S1. The majority of the 579 successfully genotyped SNPs were silent, with nonsynonymous SNPs accounting for 19% of the total.
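The conversion rates quoted above follow directly from the counts given in this section (579 of 948 candidate-gene SNPs and 297 of 588 eSNPs). The short Python check below recomputes them; the variable names are illustrative only.

```python
# (converted, assayed) counts as stated in the text.
counts = {
    "candidate_gene": (579, 948),
    "eSNP": (297, 588),
}

total_converted = sum(c for c, _ in counts.values())
total_assayed = sum(a for _, a in counts.values())

# Percent conversion per SNP set and overall, rounded to whole percentages.
rates = {name: round(100 * c / a) for name, (c, a) in counts.items()}
overall_rate = round(100 * total_converted / total_assayed)
```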

Population structure

Principal components analysis on the 488 clones using 297eSNPs revealed four significant principal components,explaining 10% of the overall variance. From these fourprincipal components, five clusters were formed using hier-archical clustering with Ward’s linkage method. All fiveclusters illustrated a latitudinal trend, with the Columbia

Table 2 Major peak assignments from pyrolysis molecular beam mass spectrometry

m/z                         Assignment                                          (S) or (G) precursor
57, 73, 85, 96, 114         C5 sugars
57, 60, 73, 98, 126, 144    C6 sugars
124                         Guaiacol                                            G
137                         Ethylguaiacol, homovanillin, coniferyl alcohol      G
138                         Methylguaiacol                                      G
150                         Vinylguaiacol, coumaryl alcohol                     G
152                         4-Ethylguaiacol, vanillin                           G
154                         Syringol                                            S
164                         Allyl-*propenyl guaiacol                            G
167                         Ethylsyringol, syringylacetone, propiosyringone     S
168                         4-Methyl-2,6-dimethoxyphenol                        S
178                         Coniferyl aldehyde                                  G
180                         Coniferyl alcohol, vinylsyringol, α-D-glucose       G, S
182                         Syringaldehyde                                      S
194                         4-Propenylsyringol                                  S
208                         Sinapyl aldehyde                                    S
210                         Sinapyl alcohol                                     S

m/z, mass to charge ratio; S, syringyl peaks; G, guaiacyl peaks.

520 Research

NewPhytologist

No claim to original US government works

Journal compilation © New Phytologist Trust (2010)

New Phytologist (2010) 188: 515–532

www.newphytologist.com

River delineating a major geographical north–south separation (Fig. 1c). These five clusters also illustrated significant genetic structuring as estimated using FST, as well as significant differences among means for the three composite traits. The average FST was low for both sets, but greater for the lignocellulosic SNPs (FST = 0.034; 99% CI, 0.028–0.042) than for the eSNPs (FST = 0.013; 99% CI, 0.011–0.016). A comparison of the distribution of FST for each set revealed that seven genes had values of FST greater than any observed for the eSNPs (Fig. S4). These outliers were concentrated within the CesA3A, CAD, SUSY1, 4CL1, CesA2B, TUB15 and CesA1B genes (Fig. S4). Polymorphisms within these genes had values of FST approximately five- to 10-fold greater than the multiple-locus average. Cluster 1, which was distributed primarily south of the Columbia River, also had significantly different means for lignin, S : G and C6 (ANOVA: P < 2.0 × 10⁻⁶; Tukey multiple comparison tests: P < 0.01). Additional summaries of genetic diversity across all SNPs and clusters are given in Table S3 and Figs S1–S5.
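To make the FST comparison concrete, the sketch below computes a simple heterozygosity-based FST for one biallelic SNP across clusters. It is an illustrative stand-in only: the estimator actually used for the values above is not described in this section, and the function name `fst_biallelic` and the example frequencies are assumptions.

```python
def fst_biallelic(freqs, sizes=None):
    """F_ST = (H_T - H_S) / H_T for one biallelic SNP.

    freqs: frequency of one allele in each cluster.
    sizes: optional cluster sample sizes used as weights (equal if omitted).
    """
    if sizes is None:
        sizes = [1.0] * len(freqs)
    total = float(sum(sizes))
    # Pooled allele frequency and total expected heterozygosity.
    p_bar = sum(n * p for n, p in zip(sizes, freqs)) / total
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    # Weighted mean of the within-cluster expected heterozygosities.
    h_within = sum(n * 2.0 * p * (1.0 - p) for n, p in zip(sizes, freqs)) / total
    if h_total == 0.0:
        return 0.0  # monomorphic SNP: no differentiation to measure
    return (h_total - h_within) / h_total


# Invented allele frequencies in two equally sized clusters.
example_fst = fst_biallelic([0.2, 0.4])
```

With these invented frequencies the result is about 0.048, the same order of magnitude as the averages reported for the lignocellulosic SNPs.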

Linkage disequilibrium

All r² values were pooled to assess the overall behavior of LD for the candidate genes and to estimate the genome-wide degree of LD in black cottonwood. Fig. 2(b) shows the extent of LD across the sequenced regions. The fitted curve indicates that LD is generally low in black

Fig. 2 (a) Decay of linkage disequilibrium (LD) with distance in base pairs between sites in two candidate genes: SUSY1 and C4H1. Squared correlations of allele frequencies (r²) are plotted against distance in base pairs. The fitted curve represents the trend of decay of LD. (b) Decay of LD with distance in base pairs between sites pooled across 39 genes. (c) Decay of LD across all candidate genes for the first 400 base pairs from that presented in (b).


cottonwood, rapidly decaying by over 50% (from 0.50 to 0.20) within a distance of c. 200 bp (Fig. 2b,c). Within candidate genes, the average distance associated with LD decline to r² = 0.1 varies from c. 200 to c. 600 bp (Fig. 2a).
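For readers unfamiliar with the r² statistic behind these decay curves, the following sketch computes it for one pair of biallelic sites. It assumes phased haplotypes are available, which is a simplification (SNP genotype data are unphased, and the estimation procedure used in the study is not given in this section); the function name `r_squared` and the example data are hypothetical.

```python
def r_squared(haplotypes):
    """Squared allele-frequency correlation between two biallelic sites.

    haplotypes: sequence of (allele_at_site_1, allele_at_site_2) pairs,
    one pair per phased chromosome.
    """
    n = len(haplotypes)
    ref1, ref2 = haplotypes[0]  # any observed allele can serve as reference
    p1 = sum(h[0] == ref1 for h in haplotypes) / n
    p2 = sum(h[1] == ref2 for h in haplotypes) / n
    p12 = sum(h[0] == ref1 and h[1] == ref2 for h in haplotypes) / n
    denom = p1 * (1 - p1) * p2 * (1 - p2)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    d = p12 - p1 * p2  # coefficient of linkage disequilibrium, D
    return d * d / denom


# Invented haplotype samples: complete, absent and partial LD.
complete = [("A", "G")] * 4 + [("C", "T")] * 4
absent = [("A", "G"), ("A", "T"), ("C", "G"), ("C", "T")] * 2
partial = [("A", "G")] * 3 + [("A", "T")] + [("C", "G")] + [("C", "T")] * 3
```

Plotting such pairwise values against the distance between sites, and fitting a decay curve to them, gives the kind of summary shown in Fig. 2.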

Overall summary of single SNP and haplotype-based associations

A total of 1734 (579 SNPs × 3 traits) single-marker association tests were performed. Of these, 65 were significant at the threshold of P < 0.05. Multiple test corrections using the FDR method reduced this number to 37 at a significance threshold of Q < 0.10. A total of 13 lignin content, one S : G and 23 C6 sugar content associations were identified (Table 3). The 37 associations represent 27 unique SNPs from 40 candidate genes. Many of the 37 SNPs that exhibited significant associations with at least one trait were consistent with codominance (Table 4). Four of the 34 markers for which dominance and additive effects could be calculated were consistent with overdominance (|d/a| > 1.25). The remaining 30 markers were split between modes of gene action that were codominant (|d/a| < 0.50, 25) or partially to fully dominant (0.50 < |d/a| < 1.25, 5).
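The effect of a false discovery rate correction on a set of P values can be illustrated with the Benjamini–Hochberg step-up procedure. This is offered as a stand-in under stated assumptions: the Q values reported here may have been computed with a different FDR method, and the function name and the P values below are invented.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented P values from four hypothetical marker-trait tests.
example_p = [0.01, 0.04, 0.03, 0.20]
example_q = benjamini_hochberg(example_p)
n_significant = sum(q < 0.10 for q in example_q)
```

In this toy case three of the four tests remain significant at an adjusted threshold of 0.10, mirroring how the correction trims the list of nominally significant tests.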

Among haplotype-based associations, 181 amplicons were analyzed (after the removal of singletons) and 17 amplicons from 13 unique genes were significant, with a global significance threshold of P < 0.05 (Table 5). Multiple test corrections using the FDR method reduced

Table 3 List of significant marker–trait pairs after a correction for multiple testing [false discovery rate FDR (Q) ≤ 0.10]

Trait    Gene symbol       SNP        F        P        N     R²       Q
Lignin   C4H1_04-219       [A:C]ns    4.6766   0.0013   433   0.0187   0.0395
         C4H2_09-169       [A:C]nc    9.8329   0.0001   433   0.0384   0.0178
         CCR_08-554        [A:G]nc    5.9541   0.0014   435   0.0119   0.0395
         CesA1A_20-226     [A:G]nc    4.0516   0.0015   432   0.0163   0.0402
         CesA1B_02-87      [A:G]nc    4.0226   0.0024   427   0.0163   0.0482
         CesA1B_04-127     [A:C]nc    5.7095   0.001    432   0.0227   0.0288
         CesA1B_08-261     [A:G]nc    5.4417   0.001    434   0.0216   0.0288
         CesA2A_08-38      [A:G]ns    8.6111   0.0011   431   0.0172   0.0288
         HCT1_03-246       [A:G]nc    3.9879   0.0027   434   0.0159   0.0482
         HCT6_13-225       [A:G]s     5.4364   0.0001   433   0.0217   0.0178
         SUSY1_02-108      [A:T]nc    6.7751   0.0001   433   0.0268   0.0178
         SUSY1_10-258      [A:C]nc    6.7036   0.0001   433   0.0265   0.0178
         SUSY1_14-94       [A:G]nc    3.7898   0.0027   434   0.0152   0.0482
S : G    CAD_04-185        [A:T]nc    6.5211   0.001    325   0.0322   0.0288
C6       C4H2_09-169       [A:C]nc    9.9962   0.003    433   0.0373   0.0444
         C4H2_12-151       [A:G]s     3.535    0.0027   429   0.0137   0.0487
         CAD_04-185        [A:T]nc    3.4754   0.0033   325   0.0175   0.0487
         CesA1A_02-481     [A:C]nc    5.6615   0.008    432   0.0216   0.0766
         CesA1A_12-40      [A:G]s     3.593    0.0033   433   0.0138   0.0487
         CesA1A_20-226     [A:G]nc    5.7281   0.003    432   0.0218   0.0444
         CesA1B_04-127     [A:C]nc    3.4124   0.0037   432   0.0131   0.0487
         CesA1B_08-261     [A:G]nc    3.4197   0.0035   434   0.0131   0.0487
         CesA2A_08-38      [A:G]ns    8.1264   0.008    431   0.0157   0.0766
         CesA2B_01-162     [A:C]nc    3.5186   0.0026   431   0.0135   0.0487
         CoAOMT1_08-313    [A:G]nc    7.2144   0.002    431   0.0272   0.0344
         COMT2_10-423      [A:C]nc    3.3459   0.0046   397   0.014    0.0487
         HCT6_13-225       [A:G]s     3.1681   0.0044   433   0.0122   0.0487
         LAC1a_03-98       [A:G]nc    4.6733   0.0013   424   0.0184   0.0487
         LAC1a_11-493      [A:G]nc    2.8918   0.0049   433   0.0111   0.0487
         PAL2_04-212       [A:G]nc    3.5212   0.0021   432   0.0135   0.0487
         SAM1_09-195       [A:T]nc    4.1603   0.0015   422   0.0162   0.0487
         SUSY1_02-108      [A:T]nc    5.4253   0.004    433   0.0207   0.0366
         SUSY1_02-396      [A:G]nc    3.0401   0.0042   430   0.0118   0.0487
         SUSY1_02-503      [A:G]nc    3.1539   0.0031   432   0.0122   0.0487
         SUSY1_10-258      [A:C]nc    4.3819   0.0014   433   0.0168   0.0487
         SUSY1_14-128      [A:T]nc    3.2779   0.0035   434   0.0126   0.0487
         SUSY1_14-94       [A:G]nc    3.2779   0.0045   434   0.0126   0.0487

ns, nonsynonymous polymorphism; s, synonymous polymorphism; nc, noncoding polymorphism.


this number to 14 amplicons (13 unique genes and 71 haplotypes) at a global significance threshold of Q < 0.10.

Lignin associations

Lignin composition was represented by averaging values of guaiacyl precursor peaks. A total of 13 significant single-marker associations were found for nine candidate genes associated with lignin content (Table 3). Three of the significant marker–trait associations were located in the coding region and 10 in the noncoding region. Two of the significant associations were nonsynonymous (C4H1, CESA2A) and one was synonymous (HCT6). Individually, each of the 13 markers explained a small proportion of the phenotypic variance, with effects ranging from 1.2% to 3.8%.
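The proportion of phenotypic variance attributed to a single marker can be illustrated with a one-way decomposition of phenotypes grouped by genotype class. This is a simplified sketch: it ignores any population-structure covariates that the association model may have included, and the function name and phenotype values are invented.

```python
def variance_explained(groups):
    """R^2 = SS_between / SS_total for phenotypes grouped by genotype class.

    groups: dict mapping a genotype label to a non-empty list of phenotypes.
    """
    values = [y for ys in groups.values() for y in ys]
    grand_mean = sum(values) / len(values)
    ss_total = sum((y - grand_mean) ** 2 for y in values)
    ss_between = sum(
        len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2
        for ys in groups.values()
    )
    return ss_between / ss_total


# Invented phenotypes for the three genotype classes of one SNP.
toy_groups = {"AA": [1, 2, 3], "AG": [2, 3, 4], "GG": [3, 4, 5]}
toy_r2 = variance_explained(toy_groups)
```

The toy marker explains half of the variance by construction; the markers reported here each explain only a few percent, as expected for a complex trait.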

Eleven significant haplotype associations from 10 unique genes were identified for lignin content (Table 5). Eight amplicons, representing seven unique genes, had at least one significant haplotype after multiple test corrections (Table 5). Three of the amplicons did not have significant individual haplotypes and included regions of three candidate genes (CCR, CesA2B and TUA5). From the eight

Table 4 List of marker effects for significant marker–trait pairs

Trait    SNP               2a[1]     d[2]      d/a       2a/sp[3]   Frequency[4]   a[5]
Lignin   C4H1_04-219       1.1128    -0.2097   -0.3769   0.8912     0.17 (C)       0.7467
         C4H2_09-169       5.4356    -2.8933   -1.0646   4.3533     0.01 (A)       5.2266
         CesA1A_20-226     0.8812    -0.2481   -0.5631   0.7057     0.32 (A)       0.3124
         CesA1B_02-87      0.7162    0.1487    0.4154    0.5736     0.30 (G)       -0.7869
         CesA1B_04-127     0.8864    0.1855    0.4185    0.7099     0.28 (A)       -0.6741
         CesA1B_08-261     0.8609    0.1475    0.3427    0.6895     0.28 (A)       -0.5574
         HCT1_03-246       1.7016    -0.0340   -0.0400   1.3628     0.07 (A)       1.4719
         HCT6_13-225       1.1007    -0.8865   -1.6108   0.8815     0.12 (G)       0.8653
         SUSY1_02-108      0.6518    0.5331    1.6356    0.5220     0.13 (T)       0.3869
         SUSY1_10-258      0.4200    0.7762    3.6963    0.3363     0.07 (A)       0.2358
         SUSY1_14-94       1.8197    -0.1772   -0.1947   1.4574     0.06 (A)       1.5948
S : G    CAD_04-185        0.1655    0.0268    0.3236    0.7762     0.33 (A)       -0.4801
C6       C4H2_09-169       9.3773    4.0891    0.8721    4.4291     0.01 (A)       -9.3344
         C4H2_12-151       0.9478    -0.2340   -0.4938   0.4477     0.45 (G)       -0.7508
         CAD_04-185        1.3827    0.3715    0.5373    0.6531     0.33 (A)       -8.8239
         CesA1A_02-481     4.4213    2.1152    0.9568    2.0883     0.08 (C)       -4.2124
         CesA1A_12-40      2.0992    0.2777    0.2645    0.9915     0.13 (A)       -1.8498
         CesA1A_20-226     1.6669    0.5787    0.6943    0.7873     0.32 (A)       -1.1166
         CesA1B_04-127     1.5740    -0.3659   -0.4649   0.7435     0.28 (A)       0.7040
         CesA1B_08-261     1.5529    -0.2997   -0.3860   0.7335     0.28 (A)       0.8447
         CesA2B_01-162     1.2053    0.2626    0.4358    0.5693     0.30 (A)       -0.9912
         CoAOMT1_08-313    0.9396    -0.4899   -1.0429   0.4438     0.47 (A)       -0.0247
         COMT2_10-423      1.3815    -0.1172   -0.1697   0.6525     0.44 (C)       -2.4781
         HCT6_13-225       1.8043    1.2473    1.3826    0.8522     0.12 (G)       -1.7190
         LAC1a_03-98       2.5171    0.2932    0.2329    1.1889     0.14 (G)       -2.7865
         LAC1a_11-493      1.3099    0.0851    0.1299    0.6187     0.26 (A)       0.6407
         PAL2_04-212       0.9397    -1.5024   -3.1978   0.4438     0.05 (G)       -1.0696
         SAM1_09-195       0.8770    0.7918    1.8058    0.4142     0.32 (A)       -1.4465
         SUSY1_02-108      1.4921    -0.6568   -0.8804   0.7047     0.13 (T)       -1.3030
         SUSY1_02-396      1.0363    -0.1138   -0.2196   0.4895     0.23 (G)       -1.0285
         SUSY1_02-503      0.9770    -0.1478   -0.3024   0.4615     0.23 (G)       -0.8377
         SUSY1_10-258      1.5603    -0.8029   -1.0292   0.7370     0.07 (A)       -1.4921
         SUSY1_14-128      3.6518    0.5908    0.3236    1.7248     0.06 (A)       -3.3928
         SUSY1_14-94       3.6518    0.5908    0.3236    1.7248     0.06 (A)       -3.3928

[1] Calculated as the difference between the phenotypic means observed within each homozygous class ($2a = |G_{BB} - G_{bb}|$, where $G_{ij}$ is the trait mean in the ijth genotypic class).
[2] Calculated as the difference between the phenotypic mean observed within the heterozygous class and the average phenotypic mean across both homozygous classes [$d = G_{Bb} - 0.5(G_{BB} + G_{bb})$, where $G_{ij}$ is the trait mean in the ijth genotypic class].
[3] sp, standard deviation for the phenotypic trait under consideration.
[4] Allele frequency of either the derived or minor allele. Single nucleotide polymorphism (SNP) alleles corresponding to the frequency listed are given in parentheses.
[5] The additive effect was calculated as $a = p_B(G_{BB}) + p_b(G_{Bb}) - G$, where $G$ is the overall trait mean, $G_{ij}$ is the trait mean in the ijth genotypic class and $p_i$ is the frequency of the ith marker allele. These values were always calculated with respect to the minor allele.
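The d/a column of Table 4 and the gene-action classes used in the text can be reproduced from the 2a and d columns. The sketch below applies the thresholds stated in the overall summary (|d/a| < 0.50, 0.50 to 1.25, and > 1.25); assigning values that fall exactly on a threshold to the dominant class is an assumption, and the function name `gene_action` is hypothetical.

```python
def gene_action(two_a, d):
    """Return (d/a, label) from the 2a and d columns of Table 4."""
    ratio = d / (two_a / 2.0)
    size = abs(ratio)
    if size < 0.50:
        label = "codominant"
    elif size <= 1.25:
        # Boundary values are placed here by assumption.
        label = "partially to fully dominant"
    else:
        label = "overdominant"
    return ratio, label


# Two rows of Table 4 (lignin content).
c4h1_ratio, c4h1_label = gene_action(1.1128, -0.2097)   # C4H1_04-219
hct6_ratio, hct6_label = gene_action(1.1007, -0.8865)   # HCT6_13-225
```

For C4H1_04-219 the ratio comes out near -0.377, matching the tabulated -0.3769 and the codominant (additive) interpretation given for this marker.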


amplicons with at least one significant haplotype, just one (SUSY1) was supported with a single-marker association in the same trait (SUSY1_02-108). The remaining five candidate genes (4CL1, 4CL3, CesA1B, CesA3A and HCT1) had at least one supporting single-marker association, with a P value of < 0.05 before multiple test corrections.

S : G ratio associations

The S : G ratio phenotype is the ratio of the summed intensities of the seven S peaks to those of the six G peaks. Analysis of the S : G trait resulted in one significant marker–trait association (Table 3). This marker is noncoding and explained a small proportion of the phenotypic variance (3.2%). Haplotype-based tests did not reveal significant associations.

C6 sugar associations

C6 sugars were represented by summing the values of six peaks. A total of 23 significant associations were found in 13 candidate genes associated with C6 sugars (Table 3). Four of the significant marker–trait associations were located in coding regions. Three of these SNPs were synonymous for three different candidate genes (CESA1A, C4H2, HCT6) and one significant association was nonsynonymous (CESA2A). Four marker–trait associations in two candidate genes were highly significant and unique only to the C6 phenotype (SUSY1, CESA1B). All 23 markers explained a small proportion of the phenotypic variance, with individual effects ranging from 1.1% to 3.7%.

Three amplicons representing three unique candidate genes (4CL1, CesA1A and SAM1) were significant in terms of haplotype-based associations with C6 (Table 5). All three amplicons were highly significant (Q < 0.05) with respect to C6 sugars and contained at least one significant individual haplotype after multiple test corrections (Q < 0.10). One candidate gene (CesA1A) contained a significant single-marker association in the same amplicon and associated with the same trait (CesA1A_12-40).

Discussion

Strategies for the domestication of forest trees using either conventional or novel molecular breeding approaches are

Table 5 List of haplotypes with significant associations to phenotype after a correction for multiple testing [false discovery rate (FDR) Q ≤ 0.10]

Amplicon     Trait    P        Q        Haplotypes   Significant haplotypes   Haplotype frequency   Single-marker associations
4CL1_11      Lignin   0.0042   0.0539   3            TGC                      0.31                  4CL1_11-108 (0.2278)[1]
                                                     AGC                      0.94
4CL3_14      Lignin   0.0021   0.0519   6            CGT                      0.02                  4CL3_13-464 (0.2041)[1]
                                                     GGT                      0.09
                                                     GGA                      0.22
CAD_04       Lignin   0.0065   0.0578   9            CAAAAT                   0.03                  CAD_04-185 (S/G, C6)[2]
                                                     CATAAT                   0.01
                                                     GATAAT                   0.02
CCR_12       Lignin   0.0060   0.0578   4            0                                              CCR_12-366 (0.2168)[1]
CesA1B_10    Lignin   0.0038   0.0539   6            AGA                      0.15                  CesA1B_10-41 (0.3726)[1]
CesA2B_16    Lignin   0.0055   0.0576   2            0                                              CesA2B_16-423 (0.2967)[1]
CesA3A_09    Lignin   0.0018   0.0519   5            TAAAAA                   0.01                  CesA3A_09-93 (0.2068)[1]
CesA3A_13    Lignin   0.0022   0.0519   7            CGGAA                    0.15                  CesA3A_13-535 (0.1777)[1]
                                                     CAAAT                    0.04
                                                     CGGCT                    0.02
                                                     CAACT                    0.57
HCT1_12      Lignin   0.0016   0.0519   3            AA                       0.73                  HCT1_12-156 (0.1828)[1]
                                                     GA                       0.08
SUSY1_02     Lignin   0.0053   0.0576   3            AAAA                     0.77                  SUSY1_02-108 (lignin, C6)[2]
                                                     TGGG                     0.13                  SUSY1_02-396, SUSY1_02-503 (C6)[2]
TUA5_09      Lignin   0.0027   0.0521   7            0                                              TUA5_09-73 (0.1899)[1]
4CL1_01      C6       0.0000   0.0018   5            AGA                      0.12                  4CL1_01-468 (0.1668)[1]
                                                     AAA                      0.02
CesA1A_12    C6       0.0005   0.0231   6            AGA                      0.09                  CesA1A_12-40 (C6)[2]
                                                     AAA                      0.04
                                                     GAG                      0.20
SAM1_07      C6       0.0008   0.0239   5            AGAA                     0.01                  SAM1_07-480 (0.2874)[1]
                                                     GGAA                     0.30

[1] Single-marker associations with the lowest Q value relating to the significant haplotype–trait association.
[2] Significant single-marker associations (FDR Q ≤ 0.10) listed with the associated traits.


centered around the exploitation of existing genetic diversity. Over the past few decades, genetic maps have been made for many forest tree species and quantitative trait loci have been mapped for a range of traits (Brown et al., 2003). The lack of resolution in mapping candidate genes and quantitative trait loci alleles can be overcome by association genetics, using natural populations in which the long evolutionary history has decreased the extent of LD in populations (Neale & Savolainen, 2004). An important prerequisite for association mapping is the availability of large allelic variation in the population. LD describes a key aspect of genetic variation in natural populations of plants. This study is the first examination of genome-wide LD in black cottonwood, and enables comparison with other poplars. We examined LD across 39 of the candidate genes (Fig. 2b,c), and observed a rapid decay of LD within just a few hundred base pairs, indicating the potential of association genetics to identify the genes responsible for variation in the trait. Previous studies in both P. tremula (five genes) and P. nigra (nine genes) showed a similar rapid decay of LD (Ingvarsson, 2005; Chu et al., 2009). LD was demonstrated to decay over significantly longer distances in a recent study across over 300 randomly selected gene fragments in the closely related P. balsamifera (Olsen et al., 2010).

This study examined both single-marker associations and haplotype-based tests to account for information present in the associations between markers, as well as directly between an SNP and the trait. Given the structure of our data, a natural way to apply the knowledge of LD within and between genes is to perform haplotype-based association tests. The power of a single-marker association test is often limited because LD information contained in flanking markers is ignored. Intuitively, haplotypes (which are essentially a collection of ordered markers) may be more powerful than individual, nonordered markers. This study demonstrates that the use of haplotypes can increase significantly the ability to map traits of interest.
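The idea of treating an ordered set of markers as a single unit can be sketched as follows. The example assumes that each individual's two haplotypes have already been inferred (phasing is not shown), and it only summarizes haplotype frequencies and carrier means rather than performing a formal association test; the function name and all data values are invented.

```python
from collections import Counter, defaultdict


def haplotype_summary(individuals):
    """Summarize haplotype frequencies and mean phenotype of carriers.

    individuals: list of ((hap1, hap2), phenotype), where each haplotype is
    a string of alleles at the ordered markers of one amplicon.
    """
    counts = Counter()
    carriers = defaultdict(list)
    for (hap1, hap2), phenotype in individuals:
        counts.update([hap1, hap2])
        for hap in {hap1, hap2}:      # count each carrier once
            carriers[hap].append(phenotype)
    n_chromosomes = 2 * len(individuals)
    freqs = {hap: c / n_chromosomes for hap, c in counts.items()}
    means = {hap: sum(v) / len(v) for hap, v in carriers.items()}
    return freqs, means


# Invented four-marker haplotypes and lignin values for four individuals.
toy_individuals = [
    (("AAAA", "AAAA"), 21.8),
    (("AAAA", "TGGG"), 22.4),
    (("TGGG", "TGGG"), 22.9),
    (("AAAA", "AAAA"), 21.6),
]
toy_freqs, toy_means = haplotype_summary(toy_individuals)
```

A haplotype-based test then asks whether such carrier groups differ in phenotype more than expected by chance, which uses the joint information of all markers in the amplicon at once.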

Candidate genes known to be involved in lignocellulosic cell wall development were examined for genetic associations. There are two major steps of lignin biosynthesis in plants: monolignol biosynthesis and the subsequent polymerization of lignin monomers to form polymers. This biochemical pathway is highly conserved throughout vascular plants, and many of the enzymes have been identified and characterized (Boerjan et al., 2003; Xu et al., 2009). The cellulose biosynthesis pathway involves the synthesis and assembly of β-1,4 glucan chains at the rosette terminal complex, and their orderly deposition to form cell wall microfibrils. Although several candidate genes have been identified, the precise molecular mechanism of cellulose biosynthesis and microfibril deposition in plants is still not clearly understood. Genetic improvement of lignin and cellulose biosynthesis in trees continues to be a major research priority. Similar to other commercial applications for black cottonwood, modified lignin structure (chemical reactivity) and increased cellulose content are desirable traits. Mechanisms that can increase C6 sugar content and decrease C5 sugar content of hemicelluloses are favorable for fermentation.

In the monolignol biosynthetic pathway, the first step consists of a deamination of phenylalanine by phenylalanine ammonia-lyase (PAL) that produces cinnamic acid. PAL is encoded by a small multigene family (Appert et al., 1994; Osakabe et al., 1995; Cochrane et al., 2004), and five isoforms have been annotated in the poplar genome (Tsai et al., 2006). In this study, markers in PAL2, PAL4 and PAL5 were genotyped. A single-marker noncoding association was identified with PAL2 that explained 1.4% of the phenotypic variation in C6 sugars (Table 3). In aspen (P. tremuloides) stem, PAL2 transcripts have been localized to developing xylem cells, consistent with its involvement in lignin biosynthesis (Kao et al., 2002).

C4H catalyzes the first oxidative reaction in phenylpropanoid metabolism, namely the conversion of cinnamic acid to p-coumaric acid (Sewalt et al., 1997). Three C4H genes have been characterized in black cottonwood (Lu et al., 2006). C4H1 is proposed to be associated with G lignin deposition, whereas C4H2 is thought to be involved in S lignin biosynthesis (Lu et al., 2006). Four unique single-marker associations were identified in the C4H1 and C4H2 genes examined in this study. A significant nonsynonymous association in exon 1 of C4H1 with lignin demonstrated modes of gene action consistent with additive effects (Table 3; Fig. 3). The C allele at C4H1_04-219 is the minor allele and causes a histidine (H) → proline (P) amino acid substitution. Heterozygotes for the marker had a percentage value of lignin composition that was intermediate to either homozygote class (21.9% for A/A, 22.7% for A/C, 23.2% for C/C). A similar study in European maize identified a nonsynonymous SNP in the first exon of C4H1 associated with forage quality traits (Andersen et al., 2008). Physiological studies of these genes describe unique functions for the isoforms within the lignin biosynthetic pathway.

4-Coumarate:CoA ligase (4CL), which catalyzes the formation of CoA esters of p-coumaric acid and its derivatives, has a pivotal role in channeling phenylpropanoid precursors into different downstream pathways, each leading to a variety of functionally distinct end products (Harding et al., 2002). 4CL is also encoded by multigene families, with five isoforms annotated in the poplar genome (Tsai et al., 2006). Although we were unable to identify significant single-marker associations in 4CL1, 4CL3 and 4CL5, significant associations with haplotypes in 4CL1 and 4CL3 were observed for both lignin and C6 traits. Of the five haplotypes (spanning 389 bp) in 4CL1_01, two significant associations demonstrated an effect on C6 sugar content (35.1% for AGA and 34.1% for AAA). In lignin composition, two haplotypes of 4CL1_11 demonstrated a difference of > 1% in lignin composition (Table 5;


Fig. 4b). Three single markers in 4CL1_11 at P < 0.05 were found to be in LD, and their individual genotypic effects on lignin composition were small in comparison with the spanning haplotype block (Fig. 4b). The reduction of 4CL expression in transgenic poplar has resulted in significant reductions of lignin, ranging from 5% to 45% (Hu et al., 1999; Li et al., 2003a,b).

Hydroxycinnamoyl-CoA transferase (HCT) is the most recently identified enzyme in monolignol biosynthesis and belongs to a large family of acyltransferases (Hoffmann et al., 2003a,b). It catalyzes the conversion of p-coumaroyl-CoA and caffeoyl-CoA to their corresponding shikimate or quinate esters. Two of the six annotated HCT genes in the Populus genome (HCT1 and HCT6) are expressed in developing xylem (Tsai et al., 2006). HCT6_13-225 was a significant synonymous marker in both lignin and C6 (Table 3). Two significant haplotypes in HCT1_12 were associated with lignin composition (Table 5). HCT has not been transgenically manipulated in poplar; however, RNAi-mediated silencing of HCT in conifers (Pinus radiata) that do not produce S lignin had a strong impact on lignin content (42% reduction), monolignol composition and interunit linkage distribution (Wagner et al., 2007). A similar study of HCT in Arabidopsis showed a reduction in lignin content and an increased G lignin deposition (Hoffmann et al., 2004).

p-Coumaroyl-CoA shikimate proceeds through a series of transformations into caffeoyl-CoA shikimate, caffeoyl-CoA, feruloyl-CoA and coniferaldehyde by the action of the enzymes p-coumaroyl-CoA 3′-hydroxylase (C3′H), HCT, caffeoyl-CoA O-methyltransferase (CCoAOMT) and cinnamoyl CoA reductase (CCR), respectively. CCoAOMT, catalyzing the methylation of caffeoyl-CoA to feruloyl-CoA, is critical in maintaining lignin structural integrity (Meyermans et al., 2000; Zhong et al., 2000). In the two independent studies referenced, antisense downregulation of CCoAOMT1 in transgenic hybrid poplar (P. tremula × P. alba) resulted in reduced lignin content as well as altered S : G ratio. In this study, markers from CCoAOMT1 and CCoAOMT2 were genotyped. CCoAOMT1 had one

Fig. 3 Marker effects on the significant nonsynonymous single nucleotide polymorphisms (SNPs) found in C4H1 and CesA2A. (a) The C4H1_04-219 nonsynonymous marker in the first exon of the C4H1 gene illustrates patterns of gene action consistent with additive effects. The C allele at C4H1_04-219 causes a histidine (H) to proline (P) amino acid substitution. (b) The CesA2A_08-38 nonsynonymous marker is located in the sixth exon of the CesA2A gene. This SNP is significant for both lignin content and C6 traits. For lignin content, the homozygote decreases the percentage content, whereas, in C6, the sugar content is elevated. The G allele at CESA2A is the derived state and is responsible for an isoleucine (I) to valine (V) amino acid substitution. In both gene models, solid boxes denote untranslated region, solid lines are introns and open boxes indicate exons.


significant noncoding SNP associated with C6 sugar content (Table 3).

Cinnamoyl-CoA reductase (CCR) catalyzes the conversion of hydroxycinnamoyl-CoA esters (p-coumaroyl-CoA, feruloyl-CoA, sinapoyl-CoA) into their corresponding cinnamyl aldehydes (Pichon et al., 1998). Downregulation of CCR in transgenic poplar (P. tremula × P. alba) is associated with up to 50% reduced lignin content (Leple et al., 2007). In this study, a single noncoding two-state marker in CCR was found to be strongly associated with lignin composition (Table 3). A different amplicon in the same gene (CCR_12) was globally significant in terms of haplotype associations, but did not report any significant individual haplotypes (Table 5). Haplotype associations have been identified previously in eucalyptus with CCR in relation to wood property traits (Thumma et al., 2005).

Coniferaldehyde can be converted to coniferyl alcohol by the action of cinnamyl alcohol dehydrogenase (CAD) or to 5-hydroxy-coniferaldehyde and sinapyl aldehyde by the action of ferulate 5-hydroxylase (F5H) and caffeic/5-hydroxyferulic acid O-methyltransferase (COMT). CAD catalyzes the reduction of p-hydroxycinnamaldehydes into their corresponding alcohols and is the last enzyme in monolignol biosynthesis. In this study, CAD_04-185, a noncoding marker, illustrated patterns of gene action consistent with additive effects in relation to S : G and C6 sugars. This was the only single-marker association identified with the S : G ratio. Three of the nine individual haplotypes (spanning 407 bp) in the same amplicon of CAD were significant for lignin composition. Differences in genotypic effects on lignin content were minimal (22.2% for CAAAAT, 22.8% for CATAAT and 22.5% for GATAAT). The CAD gene family has been studied extensively in Arabidopsis, rice and poplar (Barakat et al., 2009). The downregulation of CAD in transgenic poplar did not affect the overall lignin content and composition, but led to an increased incorporation of the hydroxycinnamaldehydes into lignin (Baucher et al., 1996; Pilate et al., 2002). Field trials of CAD-deficient transgenic poplar showed improved Kraft pulping performance (Pilate et al., 2002).

COMT was originally thought to be a bifunctional enzyme that sequentially methylated caffeic and 5-hydroxyferulic acids. More recently, it has been shown to act downstream in monolignol biosynthesis by methylating the aldehyde and alcohol backbones (Osakabe et al., 1999; Parvathi et al., 2001). In this study, markers from COMT1 and COMT2 were successfully genotyped (Table 1). A single noncoding COMT2 marker was identified as significant with C6 sugar content (Table 3). Suppression of COMT in both P. tremula × P. alba and P. tremuloides lines did not change the lignin content, but resulted in a reduction in the S : G lignin ratio (as a result of a decrease in S and an increase in G), as well as the incorporation of an abnormal 5-hydroxyguaiacyl unit into the lignin (Van Doorsselaere et al., 1995; Tsai et al., 1998).

Fig. 4 Haplotype and single-marker associations are illustrated for SUSY1 and 4CL1. (a) The genotypic effects of the three proposed haplotypes (two significant) of SUSY1 are shown. The haplotypes yield significantly different median phenotypic values for the lignin content trait. The marker effects of four significant single-marker associations are also shown. SUSY1_02-108 is significant with respect to lignin. The remaining markers are significant with respect to the related trait, C6 sugars. All four markers are within linkage disequilibrium (LD) with one another. (b) The genotypic effects of the three haplotypes (two significant) of 4CL1 are shown. The significant haplotypes yield different median phenotypic values for the lignin content trait. No significant single-marker associations were identified after multiple testing; however, the box plots for single markers with P < 0.05 are shown. Two of the three markers are in LD with one another.

After their biosynthesis, monolignols are transported from the cytoplasm to the cell wall and polymerized to a lignin matrix. In the cell wall, the monolignols are oxidized to their radicals and polymerized. Laccases (Lac), peroxidases and other phenol oxidases have long been thought to be involved in this polymerization (Baucher et al., 2003), but conclusive evidence for their role is still lacking. In our study, we examined Lac1a, Lac2 and Lac90a. Lac1a was found to have two noncoding single-marker associations with C6 sugars (Table 3). In poplars, several laccases (Ranocha et al., 1999) have been cloned and characterized. At least eight of these laccases were identified in association with lignin biosynthetic pathways by microarray analysis (Andersson-Gunneras et al., 2006). Subsequent studies with antisense Lac3 in transgenic hybrid poplar showed little variation in lignin content; however, the soluble phenolics and structure of the secondary wall were altered (Ranocha et al., 2002).

Variations in the quantity and quality of cellulose in plants are suspected to be a primary result of enzymatic activities of different types of cellulose synthases (CesAs)

Fig. 5 (a–c) An example of marker effects in the CesA1B gene on the lignin content phenotype. Each marker explains a small proportion of the phenotypic variance (r² ≈ 2–3%) and is consistent with an additive model of gene action. Whiskers in the box plots represent 1.5 times the interquartile range. (d) Illustrated are the 39 single nucleotide polymorphisms (SNPs) genotyped for the CesA1B gene relative to the reference gene model, as well as three of the 39 that were significant (indicated with an asterisk). Solid boxes denote UTR, solid lines are introns and open boxes indicate exons in the gene model.


(Haigler & Blanton, 1996). The CesA gene family contains 17 members in the sequenced poplar genome, five of which are highly expressed during wood formation (Joshi et al., 2004; Djerbi et al., 2005a,b; Suzuki et al., 2006; Kumar et al., 2009). All five isoforms were evaluated for association in this study (CesA1A, CesA1B, CesA2A, CesA2B and CesA3A), and all had at least one single-marker or haplotype association (Table 1). In lignin and C6 sugar traits, the same nonsynonymous marker in the sixth exon of CesA2A was strongly associated. The G allele at CesA2A is the minor allele and causes an isoleucine (I) → valine (V) amino acid substitution (Table 3). The genotypic effects of the two-state SNP are shown in Fig. 3(b). In lignin traits, the differences in content were significant (22% for AA and 23.6% for AG); the same is true for C6 sugar content (34.9% for AA and 32.1% for AG). Three single-marker associations between CesA1B and lignin composition were identified (Table 3; Fig. 5). Two of these three noncoding SNPs were also associated with C6 sugar content. CesA1B_10 had one significant haplotype associated with lignin composition. CesA1A had two noncoding and one synonymous association (CesA1A_12-40) for C6 sugars. One of the noncoding SNPs (CesA1A_20-226) was also associated with lignin content. CesA3A had two different amplicons with significant haplotype associations with lignin. Three significant haplotypes from six were highly associated in CesA1A_12 (spanning 183 bp), and their genotypic effects on C6 were also significant (33.6% for AGA, 34.2% for AAA, 35.3% for GAG) (Table 5).

CesA proteins in the rosette terminal complex use cytosolic uridine diphosphate (UDP)-glucose as substrate, which is provided directly by particulate sucrose synthase (SUSY) (Haigler et al., 2001). This enzyme produces UDP-glucose and fructose from sucrose and UDP. Of the six SUSY genes annotated in the poplar genome, only two were highly expressed in wood-forming tissues based on microarray analysis (Geisler-Lee et al., 2006; Meng et al., 2007). In this study, amplicons from SUSY1 were successfully genotyped (Table 1). Single-marker tests with SUSY1 revealed six noncoding associations with C6 and three with lignin composition (Table 3). Two of the three individual haplotypes (spanning 386 bp) identified in SUSY1_02 were significant. Genotypic differences between haplotypes were observed for lignin composition (21.8% for AAAA and 22.9% for TGGG) (Table 5). Three of the four markers that compose the SUSY1_02 haplotype are in strong LD (Fig. 4a). Recently, overexpression of SUSY in transgenic poplar has led to an increase in both cellulose production and cellulose crystallinity (Coleman et al., 2009), confirming the previous suggestion that SUSY could be one of the limiting steps of cellulose biosynthesis (Haigler et al., 2001).

This study represents the most comprehensive evaluation of LD and genetic association in poplars. High-throughput genotyping technologies and the vast genomic resources in black cottonwood allowed a large number of candidate genes to be evaluated for associations with lignocellulosic cell wall development. The genes studied are those known to be associated with these pathways and those that have been extensively studied for commercial applications, such as pulp and feedstock production, and are now being further evaluated for improvement in relation to biofuels production. Given the rapid decay of within-gene LD in black cottonwood and the high coverage of amplicons across each gene, it is likely that the numerous polymorphisms identified are in close proximity to the causative SNPs, and that the haplotype associations accurately reflect the information present in the associations between markers. This study demonstrates that a forward genetics approach (association genetics) can be used to discover naturally occurring allelic variation in genes associated with commercially important traits. The association approach also provides estimates of the size of the effects of these alleles on a phenotype. Understanding the size of these effects, as well as the existing variation, is critical in applying the knowledge gained on a particular SNP to marker-based breeding programs that aim to increase cellulose yield and, therefore, cellulosic ethanol production.
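To make the notion of an allelic effect size concrete, the sketch below fits a simple additive model, y = mu + a·x, by ordinary least squares, where x is the number of minor-allele copies (0, 1 or 2). It is a minimal illustration and not the analysis used in this study: the data are invented, the function name `additive_effect` is hypothetical, and the model ignores population structure, clonal replication and multiple-testing correction, all of which a real association analysis would need to address.

```python
# Hypothetical toy data: minor-allele counts per clone and a made-up
# phenotype. None of these values come from the study.
genotypes = [0, 0, 1, 1, 1, 2, 2, 0, 1, 2]
phenotype = [22.1, 21.9, 22.8, 23.1, 22.7, 23.9, 23.6, 22.0, 23.0, 23.8]


def additive_effect(x, y):
    """Ordinary least-squares fit of y = mu + a * x; returns (mu, a, r2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    a = sxy / sxx                    # effect per copy of the minor allele
    mu = mean_y - a * mean_x         # expected phenotype at zero copies
    r2 = (sxy ** 2) / (sxx * syy)    # share of phenotypic variance explained
    return mu, a, r2


mu, a, r2 = additive_effect(genotypes, phenotype)
print(f"intercept = {mu:.2f}, additive effect = {a:.2f}, R^2 = {r2:.2f}")
```

In this framing, the slope is the per-allele effect a breeder would weigh, and R² indicates how much of the trait variation the marker accounts for in the sample.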

Acknowledgements

We thank Charles Nicolet and Vanessa Rashbrook for performing the SNP genotyping, and John Liechty and Benjamin Figueroa for bioinformatics support. Funding for this project was made available through the Chevron Technology Ventures-UC Davis Biofuels Project.

References

Andersson-Gunneras S, Mellerowicz EJ, Love J, Segerman B, Ohmiya Y, Coutinho PM, Nilsson P, Henrissat B, Moritz T, Sundberg B. 2006. Biosynthesis of cellulose-enriched tension wood in Populus: global analysis of transcripts and metabolites identifies biochemical and developmental regulators in secondary wall biosynthesis. Plant Journal 45: 144–165.

Andersen JR, Zein I, Wenzel G, Darnhofer B, Eder J, Ouzunova M, Lubberstedt T. 2008. Characterization of phenylpropanoid pathway genes within European maize (Zea mays L.) inbreds. BMC Plant Biology 8: 2.

Appert C, Logemann E, Hahlbrock K, Schmid J, Amrhein N. 1994. Structural and catalytic properties of the 4-phenylalanine ammonia-lyase isoenzymes from parsley (Petroselinum crispum Nym). European Journal of Biochemistry 225: 491–499.

Barakat A, Bagniewska-Zadworna A, Choi A, Plakkat U, DiLoreto DS, Yellanki P, Carlson JE. 2009. The cinnamyl alcohol dehydrogenase gene family in Populus: phylogeny, organization, and expression. BMC Plant Biology 9: 26.

Baucher M, Chabbert B, Pilate G, VanDoorsselaere J, Tollier MT, PetitConil M, Cornu D, Monties B, VanMontagu M, Inze D et al. 1996. Red xylem and higher lignin extractability by down-regulating a cinnamyl alcohol dehydrogenase in poplar. Plant Physiology 112: 1479–1490.


Baucher M, Halpin C, Petit-Conil M, Boerjan W. 2003. Lignin: genetic engineering and impact on pulping. Critical Reviews in Biochemistry and Molecular Biology 38: 305–350.

Boerjan W. 2005. Biotechnology and the domestication of forest trees. Current Opinion in Biotechnology 16: 159–166.

Boerjan W, Ralph J, Baucher M. 2003. Lignin biosynthesis. Annual Review of Plant Biology 54: 519–546.

Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. 2007. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics 23: 2633–2635.

Bradshaw HD, Ceulemans R, Davis J, Stettler R. 2000. Emerging model systems in plant biology: poplar (Populus) as a model forest tree. Journal of Plant Growth Regulation 19: 306–313.

Brown GR, Bassoni DL, Gill GP, Fontana JR, Wheeler NC, Megraw RA, Davis MF, Sewell MM, Tuskan GA, Neale DB. 2003. Identification of quantitative trait loci influencing wood property traits in loblolly pine (Pinus taeda L.). III. QTL verification and candidate gene mapping. Genetics 164: 1537–1546.

Chu Y, Su X, Huang Q, Zhang X. 2009. Patterns of DNA sequence variation at candidate gene loci in black poplar (Populus nigra L.) as revealed by single nucleotide polymorphisms. Genetica 137: 141–150.

Cochrane FC, Davin LB, Lewis NG. 2004. The Arabidopsis phenylalanine ammonia lyase gene family: kinetic characterization of the four PAL isoforms. Phytochemistry 65: 1557–1564.

Coleman HD, Yan J, Mansfield SD. 2009. Sucrose synthase affects carbon partitioning to increase cellulose production and altered cell wall ultrastructure. Proceedings of the National Academy of Sciences, USA 106: 13118–13123.

Davis JM. 2008. Genetic improvement of poplar (Populus spp.) as a bioenergy crop. In: Vermerris W, ed. Genetic improvement of bioenergy crops. New York, NY, USA: Springer New York, 397–419.

Dixon RA, Reddy MSS. 2003. Biosynthesis of monolignols. Genomic and reverse genetic approaches. Phytochemistry Reviews 2: 289–306.

Djerbi S, Lindskog M, Arvestad L, Sterky F, Teeri TT. 2005a. The genome sequence of black cottonwood (Populus trichocarpa) reveals 18 conserved cellulose synthase (CesA) genes. Planta 221: 739–746.

Do CB, Mahabhashyam MSP, Brudno M, Batzoglou S. 2005. ProbCons: probabilistic consistency-based multiple sequence alignment. Genome Research 15: 330–340.

Eckert AJ, Bower AD, Wegrzyn JL, Pande B, Jermstad KD, Krutovsky KV, St Clair JB, Neale DB. 2009a. Association genetics of coastal Douglas fir (Pseudotsuga menziesii var. menziesii, Pinaceae). I. Cold-hardiness related traits. Genetics 182: 1289–1302.

Eckert AJ, Wegrzyn JL, Pande B, Jermstad KD, Lee JM, Liechty JD, Tearse BR, Krutovsky KV, Neale DB. 2009b. Multilocus patterns of nucleotide diversity and divergence reveal positive selection at candidate genes related to cold hardiness in coastal Douglas fir (Pseudotsuga menziesii var. menziesii). Genetics 183: 289–298.

Fan JB, Oliphant A, Shen R, Kermani BG, Garcia F, Gunderson KL, Hansen M, Steemers F, Butler SL, Deloukas P et al. 2003. Highly parallel SNP genotyping. Cold Spring Harbor Symposia on Quantitative Biology 68: 69–78.

Feder ME, Mitchell-Olds T. 2003. Evolutionary and ecological functional genomics. Nature Reviews Genetics 4: 651–657.

Geisler-Lee J, Geisler M, Coutinho PM, Segerman B, Nishikubo N, Takahashi J, Aspeborg H, Djerbi S, Master E, Andersson-Gunneras S et al. 2006. Poplar carbohydrate-active enzymes. Gene identification and expression analyses. Plant Physiology 140: 946–962.

Gill GP, Brown GR, Neale DB. 2003. A sequence mutation in the cinnamyl alcohol dehydrogenase gene associated with altered lignification in loblolly pine. Plant Biotechnology Journal 1: 253–258.

Gonzalez-Martinez SC, Ersoz E, Brown GR, Wheeler NC, Neale DB. 2006. DNA sequence variation and selection of tag single-nucleotide polymorphisms at candidate genes for drought-stress response in Pinus taeda L. Genetics 172: 1915–1926.

Gonzalez-Martinez SC, Huber D, Ersoz E, Davis JM, Neale DB. 2008. Association genetics in Pinus taeda L. II. Carbon isotope discrimination. Heredity 101: 19–26.

Gonzalez-Martinez SC, Wheeler NC, Ersoz E, Nelson CD, Neale DB. 2007. Association genetics in Pinus taeda L. I. Wood property traits. Genetics 175: 399–409.

Goudet J. 2005. HIERFSTAT, a package for R to compute and test hierarchical F-statistics. Molecular Ecology Notes 5: 184–186.

Groover AT. 2007. Will genomics guide a greener forest biotech? Trends in Plant Science 12: 234–238.

Haigler CH, Blanton RL. 1996. New hope for old dreams: evidence that plant cellulose synthase genes have finally been identified. Proceedings of the National Academy of Sciences, USA 93: 12082–12085.

Haigler CH, Ivanova-Datcheva M, Hogan PS, Salnikov VV, Hwang S, Martin K, Delmer DP. 2001. Carbon partitioning to cellulose synthesis. Plant Molecular Biology 47: 29–51.

Harding SA, Leshkevich J, Chiang VL, Tsai CJ. 2002. Differential substrate inhibition couples kinetically distinct 4-coumarate:coenzyme A ligases with spatially distinct metabolic roles in quaking aspen. Plant Physiology 128: 428–438.

Harris AT, Riddlestone S, Bell Z, Hartwell PR. 2008. Towards zero emission pulp and paper production: the BioRegional MiniMill. Journal of Cleaner Production 16: 1971–1979.

Hoffmann B, Chabbert B, Monties B, Speck T. 2003a. Mechanical, chemical and X-ray analysis of wood in the two tropical lianas Bauhinia guianensis and Condylocarpon guianense: variations during ontogeny. Planta 217: 32–40.

Hoffmann L, Besseau S, Geoffroy P, Ritzenthaler C, Meyer D, Lapierre C, Pollet B, Legrand M. 2004. Silencing of hydroxycinnamoyl-coenzyme A shikimate/quinate hydroxycinnamoyltransferase affects phenylpropanoid biosynthesis. Plant Cell 16: 1446–1465.

Hoffmann L, Maury S, Martz F, Geoffroy P, Legrand M. 2003b. Purification, cloning, and properties of an acyltransferase controlling shikimate and quinate ester intermediates in phenylpropanoid metabolism. Journal of Biological Chemistry 278: 95–103.

Hu WJ, Harding SA, Lung J, Popko JL, Ralph J, Stokke DD, Tsai CJ, Chiang VL. 1999. Repression of lignin biosynthesis promotes cellulose accumulation and growth in transgenic trees. Nature Biotechnology 17: 808–812.

Ingvarsson PK. 2005. Nucleotide polymorphism and linkage disequilibrium within and among natural populations of European aspen (Populus tremula L., Salicaceae). Genetics 169: 945–953.

Ingvarsson PK, Garcia MV, Luquez V, Hall D, Jansson S. 2008. Nucleotide polymorphism and phenotypic associations within and around the phytochrome B2 locus in European aspen (Populus tremula, Salicaceae). Genetics 178: 2217–2226.

Joshi CP, Bhandari S, Ranjan P, Kalluri UC, Liang X, Fujino T, Samuga A. 2004. Genomics of cellulose biosynthesis in poplars. New Phytologist 164: 53–61.

Kao YY, Harding SA, Tsai CJ. 2002. Differential expression of two distinct phenylalanine ammonia-lyase genes in condensed tannin-accumulating and lignifying cells of quaking aspen. Plant Physiology 130: 796–807.

Kumar M, Thammannagowda S, Bulone V, Chiang V, Han KH, Joshi CP, Mansfield SD, Mellerowicz E, Sundberg B, Teeri T et al. 2009. An update on the nomenclature for the cellulose synthase genes in Populus. Trends in Plant Science 14: 248–254.

Landegren U, Nilsson M, Kwok PY. 1998. Reading bits of genetic information: methods for single-nucleotide polymorphism analysis. Genome Research 8: 769–776.

Leple JC, Dauwe R, Morreel K, Storme V, Lapierre C, Pollet B, Naumann A, Kang KY, Kim H, Ruel K et al. 2007. Downregulation of cinnamoyl-coenzyme A reductase in poplar: multiple-level phenotyping reveals effects on cell wall polymer metabolism and structure. Plant Cell 19: 3669–3691.

Li L, Zhou Y, Cheng X, Sun J, Marita JM, Ralph J, Chiang VL. 2003a. Combinatorial modification of multiple lignin traits in trees through multigene cotransformation. Proceedings of the National Academy of Sciences, USA 100: 4939–4944.

Li Y, Kajita S, Kawai S, Katayama Y, Morohoshi N. 2003b. Down-regulation of an anionic peroxidase in transgenic aspen and its effect on lignin characteristics. Journal of Plant Research 116: 175–182.

Lu SF, Zhou YH, Li LG, Chiang VL. 2006. Distinct roles of cinnamate 4-hydroxylase genes in Populus. Plant and Cell Physiology 47: 905–914.

Marth GT, Korf I, Yandell MD, Yeh RT, Gu ZJ, Zakeri H, Stitziel NO, Hillier L, Kwok PY, Gish WR. 1999. A general approach to single-nucleotide polymorphism discovery. Nature Genetics 23: 452–456.

Meng M, Geisler M, Johansson H, Mellerowicz EJ, Karpinski S, Kleczkowski LA. 2007. Differential tissue/organ-dependent expression of two sucrose- and cold-responsive genes for UDP-glucose pyrophosphorylase in Populus. Gene 389: 186–195.

Meyermans H, Morreel K, Lapierre C, Pollet B, De Bruyn A, Busson R, Herdewijn P, Devreese B, Van Beeumen J, Marita JM et al. 2000. Modifications in lignin and accumulation of phenolic glucosides in poplar xylem upon down-regulation of caffeoyl-coenzyme A O-methyltransferase, an enzyme involved in lignin biosynthesis. Journal of Biological Chemistry 275: 36899–36909.

Neale DB, Savolainen O. 2004. Association genetics of complex traits in conifers. Trends in Plant Science 9: 325–330.

Nickerson DA, Tobe VO, Taylor SL. 1997. PolyPhred: automating the detection and genotyping of single nucleotide substitutions using fluorescence-based resequencing. Nucleic Acids Research 25: 2745–2751.

Oakley RV, Wang YS, Ramakrishna W, Harding SA, Tsai CJ. 2007. Differential expansion and expression of alpha- and beta-tubulin gene families in Populus. Plant Physiology 145: 961–973.

Oliphant A, Barker DL, Stuelpnagel JR, Chee MS. 2002. BeadArray technology: enabling an accurate, cost-effective approach to high-throughput genotyping. BioTechniques 32: S56–S61.

Olsen MS, Robertson AL, Takebayashi N, Salim S, Schroeder WR, Tiffin P. 2010. Nucleotide diversity and linkage disequilibrium in balsam poplar (Populus balsamifera). New Phytologist 186: 2526–2536.

Osakabe K, Tsao CC, Li LG, Popko JL, Umezawa T, Carraway DT, Smeltzer RH, Joshi CP, Chiang VL. 1999. Coniferyl aldehyde 5-hydroxylation and methylation direct syringyl lignin biosynthesis in angiosperms. Proceedings of the National Academy of Sciences, USA 96: 8955–8960.

Osakabe Y, Ohtsubo Y, Kawai S, Katayama Y, Morohoshi N. 1995. Structure and tissue-specific expression of genes for phenylalanine ammonia-lyase from a hybrid aspen, Populus kitakamiensis. Plant Science 105: 217–226.

Parvathi K, Chen F, Guo DJ, Blount JW, Dixon RA. 2001. Substrate preferences of O-methyltransferases in alfalfa suggest new pathways for 3-O-methylation of monolignols. Plant Journal 25: 193–202.

Patterson N, Price A, Reich D. 2006. Population structure and Eigenanalysis. PLoS Genetics 2(12).

Pavy N, Pelgas B, Beauseigle S, Blais S, Gagnon F, Gosselin I, Lamothe M, Isabel N, Bousquet J. 2008. Enhancing genetic mapping of complex genomes through the design of highly-multiplexed SNP arrays: application to the large and unsequenced genomes of white spruce and black spruce. BMC Genomics 9: 21.

Peter G, Neale D. 2004. Molecular basis for the evolution of xylem lignification. Current Opinion in Plant Biology 7: 737–742.

Pichon M, Courbou I, Beckert M, Boudet AM, Grima-Pettenati J. 1998. Cloning and characterization of two maize cDNAs encoding cinnamoyl-CoA reductase (CCR) and differential expression of the corresponding genes. Plant Molecular Biology 38: 671–676.

Pilate G, Guiney E, Holt K, Petit-Conil M, Lapierre C, Leple JC, Pollet B, Mila I, Webster EA, Marstorp HG et al. 2002. Field and pulping performances of transgenic trees with altered lignification. Nature Biotechnology 20: 607–612.

Plomion C, Leprovost G, Stokes A. 2001. Wood formation in trees. Plant Physiology 127: 1513–1523.

Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. 2006. Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics 38: 904–909.

Ragauskas AJ, Williams CK, Davison BH, Britovsek G, Cairney J, Eckert CA, Frederick WJ Jr, Hallett JP, Leak DJ, Liotta CL et al. 2006. The path forward for biofuels and biomaterials. Science 311: 484–489.

Ralph J, Akiyama T, Kim H, Lu FC, Schatz PF, Marita JM, Ralph SA, Reddy MSS, Chen F, Dixon RA. 2006a. Effects of coumarate 3-hydroxylase down-regulation on lignin structure. Journal of Biological Chemistry 281: 8843–8853.

Ralph S, Oddy C, Cooper D, Yueh H, Jancsik S, Kolosova N, Philippe RN, Aeschliman D, White R, Huber D et al. 2006b. Genomics of hybrid poplar (Populus trichocarpa × deltoides) interacting with forest tent caterpillars (Malacosoma disstria): normalized and full-length cDNA libraries, expressed sequence tags, and a cDNA microarray for the study of insect-induced defences in poplar. Molecular Ecology 15: 1275–1297.

Ranocha P, Chabannes M, Chamayou S, Danoun S, Jauneau A. 2002. Laccase down-regulation causes alterations in phenolic metabolism and cell wall structure in poplar. Plant Physiology 129: 145.

Ranocha P, McDougall G, Hawkins S, Sterjiades R, Borderies G, Stewart D, Cabanes-Macheteau M, Boudet AM, Goffner D. 1999. Biochemical characterization, molecular cloning and expression of laccases – a divergent gene family – in poplar. European Journal of Biochemistry 259: 485–495.

Remington DL, Thornsberry JM, Matsuoka Y, Wilson LM, Whitt SR, Doebley J, Kresovich S, Goodman MM, Buckler ES IV. 2001. Structure of linkage disequilibrium and phenotypic associations in the maize genome. Proceedings of the National Academy of Sciences, USA 98: 11479–11484.

Rubin EM. 2008. Genomics of cellulosic biofuels. Nature 454: 841–845.

Schaid DJ, Rowland CM, Tines DE, Jacobson RM, Poland GA. 2002. Score tests for association between traits and haplotypes when linkage phase is ambiguous. American Journal of Human Genetics 70: 425–434.

Schrader J, Nilsson J, Mellerowicz E, Berglund A, Nilsson P, Hertzberg M, Sandberg G. 2004. A high-resolution transcript profile across the wood-forming meristem of poplar identifies potential regulators of cambial stem cell identity. Plant Cell 16: 2278–2292.

Sewalt VJH, Ni WT, Jung HG, Dixon RA. 1997. Lignin impact on fiber degradation: increased enzymatic digestibility of genetically engineered tobacco (Nicotiana tabacum) stems reduced in lignin content. Journal of Agricultural and Food Chemistry 45: 1977–1983.

Sterky F, Bhalerao RR, Unneberg P, Segerman B, Nilsson P, Brunner AM, Charbonnel-Campaa L, Lindvall JJ, Tandre K, Strauss SH et al. 2004. A Populus EST resource for plant functional genomics. Proceedings of the National Academy of Sciences, USA 101: 13951–13956.

Sterky F, Regan S, Karlsson J, Hertzberg M, Rohde A, Holmberg A, Amini B, Bhalerao R, Larsson M, Villarroel R et al. 1998. Gene discovery in the wood-forming tissues of poplar: analysis of 5,692 expressed sequence tags. Proceedings of the National Academy of Sciences, USA 95: 13330–13335.

Stinchcombe JR, Hoekstra HE. 2008. Combining population genomics and quantitative genetics: finding the genes underlying ecologically important traits. Heredity 100: 158–170.


Storey JD. 2002. A direct approach to false discovery rates. Journal of the Royal Statistical Society, Series B (Methodological) 64: 479–498.

Storey JD, Tibshirani R. 2003. Statistical significance for genomewide studies. Proceedings of the National Academy of Sciences, USA 100: 9440–9445.

Strauss SH, Martin FM. 2004. Poplar genomics comes of age. New Phytologist 164: 1–4.

Suzuki S, Li LG, Sun YH, Chiang VL. 2006. The cellulose synthase gene superfamily and biochemical functions of xylem-specific cellulose synthase-like genes in Populus trichocarpa. Plant Physiology 142: 1233–1245.

R Development Core Team. 2007. R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing.

Thumma BR, Nolan MF, Evans R, Moran GF. 2005. Polymorphisms in cinnamoyl CoA reductase (CCR) are associated with variation in microfibril angle in Eucalyptus spp. Genetics 171: 1257–1265.

Tsai CJ, Harding SA, Tschaplinski TJ, Lindroth RL, Yuan Y. 2006. Genome-wide analysis of the structural genes regulating defense phenylpropanoid metabolism in Populus. New Phytologist 172: 47–62.

Tsai CJ, Popko JL, Mielke MR, Hu WJ, Podila GK, Chiang VL. 1998. Suppression of O-methyltransferase gene by homologous sense transgene in quaking aspen causes red–brown wood phenotypes. Plant Physiology 117: 101–112.

Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, Putnam N, Ralph S, Rombauts S, Salamov A et al. 2006. The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 313: 1596–1604.

Unneberg P, Stromberg M, Sterky F. 2005. SNP discovery using advanced algorithms and neural networks. Bioinformatics 21: 2528–2530.

Van Doorsselaere J, Baucher M, Chognot E, Chabbert B, Tollier MT, PetitConil M, Leple JC, Pilate G, Cornu D, Monties B et al. 1995. A novel lignin in poplar trees with a reduced caffeic acid 5-hydroxyferulic acid O-methyltransferase activity. Plant Journal 8: 855–864.

Wagner A, Ralph J, Akiyama T, Flint H, Phillips L, Torr K, Nanayakkara B, Kiri LT. 2007. Exploring lignification in conifers by silencing hydroxycinnamoyl-CoA:shikimate hydroxycinnamoyltransferase in Pinus radiata. Proceedings of the National Academy of Sciences, USA 104: 11856–11861.

Warnes G, Leisch F. 2006. Genetics: population genetics. R Package.

Wegrzyn JL, Lee JM, Liechty J, Neale DB. 2009. PineSAP – sequence alignment and SNP identification pipeline. Bioinformatics 25: 2609–2610.

Whetten RW, MacKay JJ, Sederoff RR. 1998. Recent advances in understanding lignin biosynthesis. Annual Review of Plant Physiology and Plant Molecular Biology 49: 585–609.

Xu Z, Zhang D, Hu J, Zhou X, Ye X, Reichel KL, Stewart NR, Syrenne RD, Yang X, Gao P et al. 2009. Comparative genome analysis of lignin biosynthesis gene families across the plant kingdom. BMC Bioinformatics 10(Suppl. 11): S3.

Yu XQ, Mei HW, Luo LJ, Liu GL, Liu HY, Zou GH, Hu SP, Li MS, Wu JH. 2006. Dissection of additive, epistatic effect and Q × E interaction of quantitative trait loci influencing stigma exsertion under water stress in rice. Yi Chuan Xue Bao 33: 542–550.

Zhong RQ, Morrison WH, Himmelsbach DS, Poole FL, Ye ZH. 2000. Essential role of caffeoyl coenzyme A O-methyltransferase in lignin biosynthesis in woody poplar plants. Plant Physiology 124: 563–577.

Supporting Information

Additional supporting information may be found in the online version of this article.

Fig. S1 Distribution of quality metrics for genotyped single nucleotide polymorphisms (SNPs) grouped by dataset.

Fig. S2 Cluster assignments illustrated across pairwise plots of the four significant principal components (PCs) derived using principal components analysis (PCA).

Fig. S3 Summaries of population genetic parameters across all samples and samples placed into clusters.

Fig. S4 Differentiation among inferred genetic clusters for Populus trichocarpa reveals FST outliers within the set of focal single nucleotide polymorphisms (SNPs).

Fig. S5 Cluster assignment is correlated with phenotypic traits.

Table S1 Sample localities for the 448 individuals used for association mapping in Populus trichocarpa

Table S2 Summaries of quality scores across genotyped single nucleotide polymorphism (SNP) loci

Table S3 Summaries of genotyped single nucleotide polymorphisms (SNPs) for focal and control SNPs

Table S4 Genotype data

Table S5 Phenotype data

Table S6 Haplotype data

Please note: Wiley-Blackwell are not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing material) should be directed to the New Phytologist Central Office.
