
The ORFeome Collaboration: A genome-scale human ORF-clone resource

The ORFeome Collaboration

Supplementary Notes

Contents

Background and Structure of the ORFeome Collaboration (OC)
Description of OC Clone Collection
  Sources and methods used to create OC clones
  Vector and ORF clone format
  General utility of the ORFeome and individual ORF clones
  Sequence verification of OC clones
  Quality criteria for OC clones
  Verification of OC plates
Genes Represented in the Collection
Evaluating human gene coverage in functional categories
Obtaining OC clones
  Finding the Desired OC Clones
  Evaluating cDNA clone quality
  Clone availability
Description of the OC clone annotation process
  Databases utilized in annotation process
  Tools used in annotation process
  Input
  Blast
  Collection of additional information and extraction of nucleotide coding sequence of Blast hit for alignment
  Alignment with Kalign and analysis for SNPs
Supplementary References

Nature Methods: doi:10.1038/nmeth.3776


Background and structure of the ORFeome Collaboration (OC)

The OC was initiated in 2005 with the primary aim of generating a comprehensive resource of fully sequenced human open-reading-frame (ORF) clones available to the entire research community. Over time, the OC expanded into a collaboration of twelve international bioscience laboratories and commercial groups, most of which were already pursuing large-scale cloning and characterization of human cDNAs and ORFs (Supplementary Table 1). To promote clone distribution, the OC includes commercial and academic distributors in the U.S., Europe, and Asia (Supplementary Table 1).

Supplementary Table 1: Members of the ORFeome Collaboration. '+' indicates a role as OC clone provider and/or OC clone distributor. Some laboratories are no longer active in the OC or no longer distribute cDNA clones: (a) IMAGE clones are now available through OC distributors; (b) the MGC was completed in 2009; (c) the Wellcome Trust Sanger Institute is no longer active in the OC.

| Organization | Website | OC clone provider / OC clone distributor |
|---|---|---|
| Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute | http://ccsb.dfci.harvard.edu/web/www/ccsb/ | + |
| DKFZ - German Cancer Research Center - Division of Molecular Genome Analysis | http://www.dkfz.de/en/mga | + |
| DNAFORM | http://www.dnaform.jp/en/ | + |
| DNASU Plasmid Repository at Arizona State University | https://dnasu.org/DNASU/Home.do | + + |
| GE Healthcare Dharmacon | http://dharmacon.gelifesciences.com/ | + |
| GeneCopoeia | http://www.genecopoeia.com/product/orfeome-clone/ | + |
| Kazusa DNA Research Institute | http://www.kazusa.or.jp/huge/ | + + |
| IMAGE Consortium, Lawrence Livermore National Laboratory (a) | | + |
| Mammalian Gene Collection (MGC) (b) | http://mgc.nci.nih.gov/ | + |
| PLASMID: DF/HCC Resource Core at Harvard | http://plasmid.med.harvard.edu/PLASMID/ | + + |
| The FANTOM Consortium | http://fantom.gsc.riken.jp/ | + |
| Source BioScience | http://www.lifesciences.sourcebioscience.com/overview/ | + |
| Wellcome Trust Sanger Institute (c) | http://www.sanger.ac.uk/ | + |


Description of OC Clone Collection

Sources and methods used to create OC clones

All OC clones were previously prepared, sequenced, and donated to the OC by its members [1-11] (Supplementary Table 1). Cloning methods used by individual OC members have been published (http://www.orfeomecollaboration.org/Contributors). Most OC ORF clones were generated from sequence-verified Mammalian Gene Collection (MGC) cDNA clones [12-14] originally obtained from the IMAGE consortium of human cDNA libraries [15] or were derived from full-length cDNAs of the German cDNA Consortium [11]. Additional clones were obtained from the Kazusa DNA Research Institute and the RIKEN FANTOM Consortium. Where pre-existing cDNAs were not available, ORF sequences were isolated by directed reverse transcriptase-PCR cloning of transcript sequences, using primers based on known sequences from the 5' and 3' ends of an ORF [13, 16-18], or were prepared by DNA synthesis [13]. The complete protein-coding regions were transferred into Gateway™ entry vectors [19], and the ORF sequences and adjacent nucleotides were sequence verified (see below) before being added to the OC collection. The laboratory source of each clone and literature references for its preparation are provided in the GenBank record for individual clones and derived sequences (http://www.ncbi.nlm.nih.gov/nuccore).

Vector and ORF clone format

OC clones are provided as 'entry' clones compatible with the Gateway Cloning System™ (Life Technologies). Entry clones contain an ORF flanked by attL recombination sites that permit directional and precise recombinational transfer of the ORF sequence from an OC entry vector to an attR-modified expression vector (Figure S1), using the phage lambda att-based recombination system [19]. This recombination reaction yields an attB expression clone, in which 25 bp attB sites flank the ORF. The OC chose this format because of its widespread adoption, ease of use, and reliability, and because of the large variety of attR-modified vectors available for protein expression in a range of biological systems, e.g., E. coli, yeast, and mammalian cells [20], or for cell-free protein expression [21]. Standardized reagents required for using the Gateway Cloning System™ are commercially available (http://www.lifetechnologies.com/us/en/home/life-science/cloning/gateway-cloning.html).

General utility of the ORFeome and individual ORF clones

The value of the OC resource has been shown in numerous studies covering a broad range of applications. These include large-scale protein-protein interaction mapping of binary [22] and co-complex associations [23], production of recombinant human proteins [21], fluorescent-protein tagging for human protein localization in mammalian cells and microscopy-based functional screening of proteins [24], development of disease-specific protein interaction networks [25], co-expression to rescue RNAi- or CRISPR/Cas9-induced reduction of endogenous transcripts [26], and expression of ORFs carrying a mutation of interest to allow measurement of the mutation effect in the absence of the wild-type background [27].


Figure S1: Utilization of OC entry clones to generate expression clones for subsequent protein expression in a range of biological systems and applications.

Sequence verification of OC clones

After transfer of an ORF sequence into an entry vector, all clones were sequence verified to ensure that they contained complete protein-coding sequences corresponding to RefSeq or Ensembl consensus RNA sequences, together with the intact attL recombination sequences flanking the ORF that are required for transfer of the ORF into a suitable Gateway (attR-modified) expression vector. Details of OC clone flanking sequences are provided at http://www.orfeomecollaboration.org/Resources. All OC ORF sequences have been deposited in the GenBank/EMBL/DDBJ databases (www.ncbi.nlm.nih.gov/genbank/; www.ebi.ac.uk/ena; www.ddbj.nig.ac.jp/). Synonymous and nonsynonymous changes and indels were permitted within the protein-coding sequences of clones, but changes that altered the reading frame or introduced premature stop codons were not permitted. Differences noted between the sequences of OC clones and the corresponding RefSeq and Ensembl transcripts are annotated in the OC database (see below).

Quality criteria for OC clones:

1. Sequencing standards for Sanger capillary sequencing: less than one error per 50,000 bp, no uncertain base calls, and a Phred score of 30 or higher at each base pair.

2. Sequencing standards for next-generation DNA sequencing: Plasmid DNA replicates from single clones were subjected to standard library preparation and sequenced under standard conditions using Illumina sequencing technology. PCR products were generated from entry clones using flanking universal primers and subjected to Roche 454 sequencing. Sequencing reads were filtered for quality and vector backbone sequences, and mapped against the expected ORF sequence of the parent clone. Only clones with a minimum positional coverage of 10x (91%) were considered further. Seventy-seven percent of analyzed clones fully matched the expected parent sequence. Potential sequence discrepancies were resolved by Sanger sequence analysis.
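To make the Sanger criterion concrete: a Phred score Q corresponds to an estimated base-calling error probability of 10^(-Q/10), so Q30 means roughly one error in 1,000 calls at that base. The sketch below is illustrative only and is not the OC's own tooling; the function names are hypothetical, and it assumes that uncertain base calls appear as any character other than A, C, G, or T.

```python
from typing import Sequence


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score to an estimated error probability."""
    return 10 ** (-q / 10)


def passes_per_base_criteria(bases: str, quals: Sequence[int], min_q: int = 30) -> bool:
    """Check the two per-base criteria: no uncertain calls, every base >= min_q.

    Assumption: an uncertain call is any character other than A, C, G, or T.
    """
    if len(bases) != len(quals):
        raise ValueError("bases and quals must have the same length")
    no_uncertain_calls = all(b in "ACGT" for b in bases.upper())
    all_high_quality = all(q >= min_q for q in quals)
    return no_uncertain_calls and all_high_quality
```

For example, `passes_per_base_criteria("ATGNCC", [40] * 6)` is rejected because of the `N`, even though every quality value is above 30.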

Verification of OC plates

Replicas of 96-well master plates were prepared at a central site to create duplicate plates that were sent to OC distributors. Before master plates were replicated, a subset of wells was sequenced to verify the correct identity and orientation of the plate. Previous small-scale tests of plates, in which all 96 wells were sequence verified, showed that typically 95%-98% of wells contain the correct clones. Most errors in clone positions in plates were explained by spill-over from nearby wells during filling and handling. Most OC clone distributors will provide, on request, additional plate and cDNA clone sequence validation.

Genes represented in the collection

The OC clone collection includes one or more ORF clones for 73% of 20,906 RefSeq human genes (Figure 1) and 79% of the 19,022 highly curated Consensus Coding DNA Sequence Project (CCDS) human genes (Figure 1). The OC also includes clones for two or more transcript variants for 6,304 of the 17,154 (37%) genes represented in the collection. To minimize inclusion of sequences from pseudogenes and other non-protein-coding regions, clones potentially encoding proteins of fewer than 100 amino acids were mostly excluded from the collection. As a result, clones for some authentic single-exon genes and genes encoding short protein-coding sequences are absent. Similarly, cDNAs for transcripts of very large genes are notably underrepresented: fewer than ten percent of human ORFs >10 kb were successfully cloned, despite the use of PCR rescue and DNA synthesis [14]. Despite these limitations, 79% of shared RefSeq and Ensembl genes are represented by one or more OC ORF clones, covering all the major functional categories of human genes shown in Figure 1. See the Supplementary Source Data File for a complete listing of all clones.

Evaluating human gene coverage in functional categories

Functional categories were annotated in a customized way, combining gene ontology annotation, UniProt annotation, Reactome annotation, relevant publications, and other specific resources such as TransporterDB for transporters (http://www.membranetransport.org/), the HPMR database for receptors (http://receptome.stanford.edu/HPMR/), and OMIM for disease genes (http://www.ncbi.nlm.nih.gov/omim). Gene ontology analysis was done with DAVID (http://david.abcc.ncifcrf.gov/) using flatted cellular component, biological process and molecular function annotation. The top gene ontology terms with at least 300 gene members were selected (Supplementary Table 2) and similar terms were removed manually.

Supplementary Table 2: Top Gene Ontology Terms Enriched by OC Genes.

| Gene Ontology Term | Term | Human Genome | OC | % Coverage |
|---|---|---|---|---|
| mitochondrion | Component | 1040 | 923 | 89 |
| endoplasmic reticulum | Component | 947 | 812 | 86 |
| nucleoplasm | Component | 864 | 724 | 84 |
| nucleolus | Component | 669 | 555 | 83 |
| Golgi apparatus | Component | 849 | 690 | 81 |
| vesicle | Component | 653 | 550 | 84 |
| organelle lumen | Component | 1772 | 1500 | 85 |
| organelle membrane | Component | 1063 | 910 | 86 |
| integral to plasma membrane | Component | 1156 | 942 | 81 |
| cytosol | Component | 1300 | 1088 | 84 |
| envelope | Component | 600 | 519 | 86 |
| protein dimerization activity | Function | 531 | 436 | 82 |
| magnesium ion binding | Function | 442 | 365 | 83 |
| transcription regulator activity | Function | 1451 | 1167 | 80 |
| nucleotide binding | Function | 2149 | 1726 | 80 |
| protein transport | Process | 740 | 638 | 86 |
| ion transport | Process | 742 | 606 | 82 |
| protein localization | Process | 858 | 730 | 85 |
| cellular response to stress | Process | 546 | 455 | 83 |
| homeostatic process | Process | 728 | 610 | 84 |
| cell cycle | Process | 756 | 630 | 83 |
| oxidation reduction | Process | 614 | 528 | 86 |
| regulation of cell death | Process | 791 | 666 | 84 |
| regulation of cell proliferation | Process | 767 | 651 | 85 |
| intracellular signaling cascade | Process | 1226 | 1001 | 82 |
| regulation of transcription from RNA polymerase II promoter | Process | 713 | 585 | 82 |
| protein catabolic process | Process | 600 | 501 | 84 |
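The "% Coverage" column in Supplementary Table 2 appears to be the OC gene count divided by the human genome gene count for each term, rounded to a whole percent. The short sketch below reproduces that arithmetic for three rows of the table; the function name is hypothetical and the rounding rule is an assumption, since the source does not state how values were rounded.

```python
def percent_coverage(oc_genes: int, genome_genes: int) -> int:
    """Return OC coverage of a gene set as a whole-number percentage."""
    if genome_genes <= 0:
        raise ValueError("genome_genes must be positive")
    return round(100 * oc_genes / genome_genes)


# (term, genes in human genome, genes represented in OC), copied from the table
rows = [
    ("mitochondrion", 1040, 923),
    ("nucleoplasm", 864, 724),
    ("cytosol", 1300, 1088),
]

for term, genome, oc in rows:
    print(f"{term}: {percent_coverage(oc, genome)}%")
```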

Obtaining OC clones

Finding the desired OC clones

1. OC database. The database lists all OC clones and is searchable by RefSeq and Ensembl transcript accession numbers, GI number, HUGO (HGNC) gene symbol, Gene ID, RefSeq protein ID, and Ensembl protein ID, as well as keywords (like <kinase>). The database also notes whether the clone is an exact match or contains mismatches to the best-matching RefSeq consensus gene sequence and distinguishes synonymous from non-synonymous changes. The annotation further notes whether an OC clone contains an ORF that ends with or without a stop codon or is available in both versions. The database also provides end sequences of the attL1 and attL2 recombination sites flanking the ORF.

2. Other databases listing OC clones. The Genome Browser (http://genome.ucsc.edu/cgi-bin/hgGateway) of the University of California Santa Cruz (UCSC) includes an 'ORFeome Clones' track within the 'Genes and Gene Prediction Tracks'. Activating this track displays available OC clones for genes of interest. This browser provides a graphical representation of the gene structure and allows for simple examination of splice variants and potential coverage in the OC collection. Similarly, the RIKEN FANTOM ZENBU browser (http://fantom.gsc.riken.jp/zenbu/gLyphs/#config) provides a graphical representation of the gene structure of OC clones. The OC clone database also links OC clones directly to their respective entries within the UCSC Genome Browser and the RIKEN FANTOM ZENBU Browser [28, 29].

3. The websites of OC clone distributors (Supplementary Table 1) also provide useful details on OC clones, and have shopping carts for ordering.

4. General information on OC clones, including originating source, cloning vector, flanking sequences around the ORF, and references to methods used in its preparation, is provided in the GenBank-EMBL-DDBJ records for each clone. For example, such information on the clone with accession number EU831970 is found by searching the Nucleotide databases at NCBI/EBI/DDBJ for EU831970 (e.g., http://www.ncbi.nlm.nih.gov/nuccore/eu831970).
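The GenBank lookup in item 4 can also be done programmatically. The sketch below uses the NCBI E-utilities `efetch` endpoint, which the source does not mention (it describes only a web search), so treat this as one possible route and not the OC's recommended method. The function names are hypothetical. The second function needs network access, and NCBI's usage guidelines apply to automated requests.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities efetch endpoint (an assumption of this sketch, not from the source)
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def genbank_record_url(accession: str) -> str:
    """Build an efetch URL for a flat-file GenBank record from the nuccore database."""
    params = {"db": "nuccore", "id": accession, "rettype": "gb", "retmode": "text"}
    return f"{EFETCH_URL}?{urlencode(params)}"


def fetch_genbank_record(accession: str, timeout: float = 30.0) -> str:
    """Download the GenBank flat file for an accession (requires network access)."""
    with urlopen(genbank_record_url(accession), timeout=timeout) as response:
        return response.read().decode("utf-8")


print(genbank_record_url("EU831970"))
```

The returned flat file carries the source, vector, and reference fields described above, so it can be read directly or passed to a GenBank parser.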

Evaluating clone quality

The following are two quality measures to consider when choosing an ORF clone:

A. How strong is the evidence that an mRNA of interest is an authentic protein-coding transcript? In other words, how likely is it that a GenBank or Ensembl transcript is an artifact of transcription, a transcript from a pseudogene, an incompletely spliced transcript, or an ORF with an incorrectly annotated initiating ATG? Confidence in transcript authenticity depends on supporting genomic, RNA, and protein evidence and on whether this evidence has been manually curated. To assist users in assessing the authenticity of a transcript, the OC developed nine confidence levels for transcripts, based on a combination of the RefSeq ranking of the transcript (described at "Curation by RefSeq Staff": http://www.ncbi.nlm.nih.gov/books/NBK21091/), whether a transcript is included in the highly curated CCDS database (http://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi), and whether the sequence is present in Ensembl but not in RefSeq.


This score is based on:

(1) NCBI curation of the RefSeq annotation of the ORF (see "Curation by NCBI Staff" at http://www.ncbi.nlm.nih.gov/refseq/about/), and

(2) whether the transcript is included in the highly curated Consensus CDS database (CCDS; http://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi).

Thus, each clone is ranked at one of nine confidence levels (Supplementary Table 3), based on the relative level of evidence that its corresponding transcript encodes a bona fide protein.

A second measure of confidence in an ORF sequence is its degree of sequence identity to the corresponding RefSeq or Ensembl consensus transcript and the number and type of variations, if any, in the clone. Although synonymous and non-synonymous changes are permitted in OC clones, incomplete coding potential, i.e., lack of either the N- or C-terminal region of the protein, disqualifies a clone from entry into the OC collection.

Supplementary Table 3. Confidence levels and coverage of OC clones

| Confidence Level | Details | All Genes | Represented in OC | OC Coverage |
|---|---|---|---|---|
| 1 | Status "Reviewed" in RefSeq database and reference [transcript in CCDS] | 9872 | 8244 | 84% |
| 2 | Status "Validated" in RefSeq and reference transcript [in CCDS] | 7682 | 5794 | 75% |
| 3 | Status "Reviewed" in RefSeq database | 47 | 30 | 64% |
| 4 | Status "Validated" in RefSeq database | 262 | 66 | 25% |
| 5 | Status "Provisional" in RefSeq and reference [transcript in CCDS] | 1296 | 880 | 68% |
| 6 | Status "Provisional" in RefSeq database | 55 | 14 | 25% |
| 7 | Other Status (Predicted, Inferred, Model or Suppressed) in RefSeq and reference transcript in CCDS | 172 | 71 | 41% |
| 8 | Other Status (Predicted, Inferred, Model or Suppressed) in RefSeq | 1520 | 103 | 7% |
| 9 | Reference transcript identified in Ensembl database but no reference transcript identified in RefSeq and | 4557 | 1952 | 43% |
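The nine levels in Supplementary Table 3 can be read as a lookup on two inputs: the RefSeq status of the reference transcript and whether that transcript is in CCDS. The sketch below encodes that reading. It is an interpretation and not the OC's code: the function name and arguments are hypothetical, and it assumes that the truncated descriptions for levels 1, 2, and 5 end in "reference transcript in CCDS", by analogy with level 7.

```python
from typing import Optional

# (level if the transcript is in CCDS, level if it is not), per Supplementary Table 3
_LEVELS_BY_STATUS = {
    "reviewed": (1, 3),
    "validated": (2, 4),
    "provisional": (5, 6),
}
# Predicted, Inferred, Model, or Suppressed
_OTHER_STATUS_LEVELS = (7, 8)


def confidence_level(refseq_status: Optional[str], in_ccds: bool) -> int:
    """Map a RefSeq status and CCDS membership to a confidence level (1 to 9).

    refseq_status is None when the transcript is found in Ensembl only.
    """
    if refseq_status is None:
        return 9
    status = refseq_status.strip().lower()
    with_ccds, without_ccds = _LEVELS_BY_STATUS.get(status, _OTHER_STATUS_LEVELS)
    return with_ccds if in_ccds else without_ccds
```

Under this reading, a "Validated" transcript that is absent from CCDS falls to level 4, below a "Reviewed" transcript without CCDS support (level 3).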

B. Are changes in a cDNA sequence (compared to the consensus sequence) acceptable for the intended application? Synonymous or non-synonymous changes or small indels are permitted in OC clones as long as they do not change the ATG start, the site of protein synthesis termination, or the translational reading frame.
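The constraints in B can be approximated with a simple check on the ORF sequence itself. The sketch below is a simplified stand-in for comparison against the consensus transcript, which is what the OC annotation actually uses: it tests only for an ATG start, a length that is a multiple of three, the absence of internal stop codons, and (optionally) a terminal stop codon, since OC clones exist with and without one. The function name is hypothetical.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_problems(orf: str, expect_stop: bool = True) -> List[str]:
    """Return a list of integrity problems found in an ORF (empty if none)."""
    seq = orf.upper()
    problems = []
    if not seq.startswith("ATG"):
        problems.append("missing ATG start")
    if len(seq) % 3 != 0:
        problems.append("length not a multiple of three")
    # Split into complete codons, ignoring any trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    internal = codons[:-1] if expect_stop else codons
    if any(codon in STOP_CODONS for codon in internal):
        problems.append("premature stop codon")
    if expect_stop and (not codons or codons[-1] not in STOP_CODONS):
        problems.append("missing terminal stop codon")
    return problems
```

A clone supplied without a stop codon would be checked with `expect_stop=False`, in which case any stop codon in the sequence counts as premature.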

Sequence differences in clones can arise through natural variation or through mutations acquired during the cloning process. Many, but not all, changes that represent natural variation are listed in dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP/). Changes in each clone from the consensus sequence are annotated in the OC database entry.

Useful guides for finding and evaluating cDNA clones are provided by the MGC ("A Guide to Finding Mammalian Gene Collection (MGC) Clones and Evaluating Their Sequence"), and a tutorial for the UCSC Genome Browser is available at http://www.openhelix.com/ucsc.

Clone availability

ORFeome Collaboration clones are provided by OC clone distributors under the terms of a Good Faith Agreement (http://www.orfeomecollaboration.org/Good_Faith_Agreement) in response to requests from any and all scientists worldwide. The Good Faith Agreement places no restrictions on the use or distribution of OC clones (or their derivatives) to collaborating laboratories, but prohibits the resale of OC clones. Users are responsible for complying with any third-party proprietary rights that might apply to functional sequences or other elements within specific OC clones (for example, clones encoding proprietary proteins, and vectors with proprietary transcription promoters, terminators, or polyA-addition sites). Authorized distributors of OC clones and links to their websites are listed in Supplementary Table 1. OC clone distributors cover their costs for maintaining and distributing the collection by charging a nominal service fee for providing clones to users. Several distributors provide additional search capabilities for OC clones.

Description of the OC clone annotation process

All OC sequences have been analyzed with a custom-made pipeline, using the databases and tools shown in Figure S2; the respective processes are described in the following paragraphs. The annotation process has been performed at regular intervals using updates of databases and annotation tools.


Figure S2: Workflow of the sequence annotation process. All OC clone sequences were subjected to this pipeline to obtain full annotation of genes, transcripts, and proteins, as well as of variation.

Databases utilized in annotation process

ccds_prot_human Release 17 (August 7, 2014)

RefSeq Release 69 (January 2015)

Ensembl 75 (February, 2014)

Tools used in annotation process

Blastx (Blast 2.2.26, NCBI)

Kalign (Kalign version 2.03, http://msa.cgb.ki.se/)

SRS 7.1.3.1, http://www.instem.com/solutions/srs.html

Ensembl API 72, http://www.ensembl.org


Input

Clone sequences in multiple fasta format (open reading frame and 5' + 3' att-sites), obtained from GenBank.

Blast

First, BlastX [30] was run with the input sequences (E value < 1, list 10 hits and 10 alignments only) against three protein databases (Ccds_prot_human Release 17 from August 2014; RefSeq_prot, RefSeq Release 69 from January 2015; Ensembl_prot, Ensembl 75 from February 2014) to identify the best protein hit(s) for the clone sequence. In the case of alternative splicing, several transcripts with the same score might be found. The output of BlastX was parsed and the quality of the Blast alignment was classified according to the following rules:

EXACT: overlap 100%, matches 100%
SNPs: overlap 100%, matches > 95%
ManySNPs: overlap 100%, matches > 90%
PART: overlap > 90%, matches 100%
PARTWithSNPs: overlap >= 70%, matches > 90%

The result was a tab-delimited list with clone_name, hit(s), and hit classification.
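The classification rules above translate directly into a cascade of threshold tests. The sketch below is a minimal reading of those rules, not the pipeline's actual parser: the function name is hypothetical, the rules are assumed to be applied in the order listed with the first match winning, and hits that satisfy none of the rules are returned as `None`, because the source does not say how such hits were handled.

```python
from typing import Optional


def classify_blast_hit(overlap_pct: float, match_pct: float) -> Optional[str]:
    """Classify a BlastX alignment by percent overlap and percent matches."""
    if overlap_pct == 100 and match_pct == 100:
        return "EXACT"
    if overlap_pct == 100 and match_pct > 95:
        return "SNPs"
    if overlap_pct == 100 and match_pct > 90:
        return "ManySNPs"
    if overlap_pct > 90 and match_pct == 100:
        return "PART"
    if overlap_pct >= 70 and match_pct > 90:
        return "PARTWithSNPs"
    return None  # assumption: outside all listed classes


# One tab-delimited output line; the clone and hit names are made-up placeholders
print("\t".join(["example_clone", "example_hit", str(classify_blast_hit(100, 97.5))]))
```

Note that under this ordering a hit with 80% overlap and 100% matches lands in PARTWithSNPs, because PART requires more than 90% overlap.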

Collection of additional information and extraction of nucleotide coding sequence of Blast hit for alignment

For the RefSeq_prot Blast results, the RefSeq protein id of the hit(s) was used in command-line SRS (SRS 7.1.3.1, http://www.instem.com/solutions/srs.html) to extract (from the best RefSeq entries) the respective RefSeq gene name, the gene symbol, the EntrezGene id, the RefSeq status, the NCBI GI id, the start and the end of the coding sequence, and the mRNA sequence. Using this information, a table with the status information, a table with the different names and ids, and files containing the clone sequence and the Blast hit sequence(s) in fasta format were created.

For the Ensembl_prot Blast results, a script using the EnsemblAPI 75 (http://www.ensembl.org) extracted the transcript coding sequence, the gene symbol as well as the description (if available). The result was an Ensembl info table, and files containing the clone sequence and the Blast hit sequence(s) in fasta format.

Alignment with Kalign and analysis for SNPs

Next, each clone sequence was aligned with its RefSeq and Ensembl Blast hit(s) using Kalign (Kalign version 2.03, http://msa.cgb.ki.se/; gap open penalty = 11, gap extension penalty = 0.85, terminal gap penalty = 0.45). Alignments were analyzed for positions of mismatches in the codons to obtain the numbers of synonymous, nonsynonymous, and neutral (N in one position) SNPs. The results were collected in a table containing those SNP numbers together with clone name and length.
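The codon-level SNP analysis can be illustrated with a small sketch. This is not the pipeline's code, and it simplifies in several labeled ways: it assumes the clone and reference coding sequences are already aligned, in frame, of equal length, and free of gaps; it counts differing codons, not individual base changes; and it treats "neutral" as any differing codon that contains an ambiguous base `N`. The function name is hypothetical; the translation table is the standard genetic code.

```python
from itertools import product
from typing import Dict

# Standard genetic code; codons enumerated with bases in T, C, A, G order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_changes(clone_cds: str, ref_cds: str) -> Dict[str, int]:
    """Count synonymous, nonsynonymous, and neutral codon differences."""
    clone, ref = clone_cds.upper(), ref_cds.upper()
    if len(clone) != len(ref) or len(ref) % 3 != 0:
        raise ValueError("sequences must be equal length and a multiple of three")
    counts = {"synonymous": 0, "nonsynonymous": 0, "neutral": 0}
    for i in range(0, len(ref), 3):
        c, r = clone[i:i + 3], ref[i:i + 3]
        if c == r:
            continue
        if "N" in c or "N" in r:
            counts["neutral"] += 1  # ambiguous base: effect cannot be determined
        elif "-" in c or "-" in r:
            raise ValueError("gapped codons are not handled in this sketch")
        elif CODON_TABLE[c] == CODON_TABLE[r]:
            counts["synonymous"] += 1
        else:
            counts["nonsynonymous"] += 1
    return counts
```

For instance, comparing the clone `ATGGCCGAANGA` with the reference `ATGGCTAAATGA` gives one synonymous change (GCT to GCC, both alanine), one nonsynonymous change (AAA to GAA, lysine to glutamate), and one neutral position (NGA versus TGA).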


Supplementary References

1. Bechtel, S. et al. The full-ORF clone resource of the German cDNA Consortium. BMC genomics 8, 399 (2007).

2. Collins, J.E. et al. A genome annotation-driven approach to cloning the human ORFeome. Genome biology 5, R84 (2004).

3. Lamesch, P. et al. hORFeome v3.1: a resource of human open reading frames representing over 10,000 human genes. Genomics 89, 307-315 (2007).

4. Nagase, T. et al. Exploration of human ORFeome: high-throughput preparation of ORF clones and efficient characterization of their protein products. DNA research: an international journal for rapid publication of reports on genes and genomes 15, 137-149 (2008).

5. Nakajima, D. et al. Preparation of a set of expression-ready clones of mammalian long cDNAs encoding large proteins by the ORF trap cloning method. DNA research: an international journal for rapid publication of reports on genes and genomes 12, 257-267 (2005).

6. Rolfs, A. et al. A biomedically enriched collection of 7000 human ORF clones. PLoS One 3, e1528 (2008).

7. Rual, J.F. et al. Human ORFeome version 1.1: a platform for reverse proteomics. Genome research 14, 2128-2135 (2004).

8. Seiler, C.Y. et al. DNASU plasmid and PSI:Biology-Materials repositories: resources to accelerate biological research. Nucleic acids research 42, D1253-1260 (2014).

9. Wellenreuther, R., Schupp, I., Poustka, A., Wiemann, S. & German cDNA Consortium. SMART amplification combined with cDNA size fractionation in order to obtain large full-length clones. BMC genomics 5, 36 (2004).

10. Wiemann, S. et al. From ORFeome to biology: a functional genomics pipeline. Genome research 14, 2136-2144 (2004).

11. Wiemann, S. et al. Toward a catalog of human genes and proteins: sequencing and analysis of 500 novel complete protein coding human cDNAs. Genome research 11, 422-435 (2001).

12. Gerhard, D.S. et al. The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC). Genome research 14, 2121-2127 (2004).

13. MGC Project Team et al. The completion of the Mammalian Gene Collection (MGC). Genome research 19, 2324-2333 (2009).

14. Strausberg, R.L. et al. Generation and initial analysis of more than 15,000 full-length human and mouse cDNA sequences. Proc Natl Acad Sci USA 99, 16899-16903 (2002).

15. Lennon, G., Auffray, C., Polymeropoulos, M. & Soares, M.B. The I.M.A.G.E. Consortium: an integrated molecular analysis of genomes and their expression. Genomics 33, 151-152 (1996).

16. Baross, A. et al. Systematic recovery and analysis of full-ORF human cDNA clones. Genome research 14, 2083-2092 (2004).

17. Wu, J.Q. et al. Large-scale RT-PCR recovery of full-length cDNA clones. Biotechniques 36, 690-696, 698-700 (2004).

18. Yang, X. et al. A public genome-scale lentiviral expression library of human ORFs. Nat Methods 8, 659-661 (2011).

19. Hartley, J.L., Temple, G.F. & Brasch, M.A. DNA cloning using in vitro site-specific recombination. Genome research 10, 1788-1795 (2000).

20. Brasch, M.A., Hartley, J.L. & Vidal, M. ORFeome cloning and systems biology: standardized mass production of the parts from the parts-list. Genome research 14, 2001-2009 (2004).

21. Yu, X. & LaBaer, J. High-throughput identification of proteins with AMPylation using self-assembled human protein (NAPPA) microarrays. Nat Protoc 10, 756-767 (2015).

22. Rolland, T. et al. A proteome-scale map of the human interactome network. Cell 159, 1212-1226 (2014).

23. Huttlin, E.L. et al. The BioPlex Network: A Systematic Exploration of the Human Interactome. Cell 162, 425-440 (2015).

24. Stadler, C. et al. Immunofluorescence and fluorescent-protein tagging show high correlation for protein localization in mammalian cells. Nat Methods 10, 315-323 (2013).

25. Sahni, N. et al. Widespread macromolecular interaction perturbations in human genetic disorders. Cell 161, 647-660 (2015).

26. Simpson, J.C. et al. Genome-wide RNAi screening identifies human proteins with a regulatory function in the early secretory pathway. Nat Cell Biol 14, 764-774 (2012).

27. Cooks, T. et al. Mutant p53 prolongs NF-kappaB activation and promotes chronic inflammation and inflammation-associated colorectal cancer. Cancer Cell 23, 634-646 (2013).

28. Consortium, F. et al. A promoter-level mammalian expression atlas. Nature 507, 462-470 (2014).

29. Severin, J. et al. Interactive visualization and analysis of large-scale sequencing datasets using ZENBU. Nature biotechnology 32, 217-219 (2014).

30. Altschul, S.F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic acids research 25, 3389-3402 (1997).
