
The Reference Sequence database

• A non-redundant collection of richly annotated DNA, RNA, and protein sequences from diverse taxa.

• The collection includes sequences from plasmids, organelles, viruses, archaea, bacteria, and eukaryotes.

• Each RefSeq represents a single, naturally occurring molecule from one organism.

• RefSeq biological sequences (also known as RefSeqs) are derived from GenBank records but differ in that each RefSeq is a synthesis of information, not an archived unit of primary research data.

• Similar to a review article in the literature, a RefSeq represents the consolidation of information by a particular group at a particular time.


Accession prefix   Molecule type   Comment
AC_                Genomic         Complete genomic molecule, alternate assembly
NC_                Genomic         Complete genomic molecule, reference assembly
NG_                Genomic         Incomplete genomic region
NT_                Genomic         Contig or scaffold, clone-based or WGS (a)
NW_                Genomic         Contig or scaffold, primarily WGS (a)
NS_                Genomic         Environmental sequence
NZ_ (b)            Genomic         Unfinished WGS
NM_                mRNA
NR_                RNA
XM_ (c)            mRNA            Predicted model
XR_ (c)            RNA             Predicted model
AP_                Protein         Annotated on AC_ alternate assembly
NP_                Protein
YP_ (c)            Protein
XP_ (c)            Protein         Predicted model
ZP_ (c)            Protein         Predicted model, annotated on NZ_ genomic records

(a) Whole Genome Shotgun sequence data.
(b) An ordered collection of WGS for a genome.
(c) Computed.

The RefSeq accession number format and molecule types
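Because the prefix alone determines the molecule type, the table can be turned into a simple lookup. The following is a minimal sketch in Python; the names `REFSEQ_PREFIXES` and `classify_accession` are illustrative and not part of any NCBI tool, and the accession used in the usage note is a made-up placeholder.

```python
# Lookup table transcribed from the accession-prefix table in this deck.
# Each value is (molecule type, comment); an empty comment means the table
# gives no comment for that prefix.
REFSEQ_PREFIXES = {
    "AC_": ("Genomic", "Complete genomic molecule, alternate assembly"),
    "NC_": ("Genomic", "Complete genomic molecule, reference assembly"),
    "NG_": ("Genomic", "Incomplete genomic region"),
    "NT_": ("Genomic", "Contig or scaffold, clone-based or WGS"),
    "NW_": ("Genomic", "Contig or scaffold, primarily WGS"),
    "NS_": ("Genomic", "Environmental sequence"),
    "NZ_": ("Genomic", "Unfinished WGS"),
    "NM_": ("mRNA", ""),
    "NR_": ("RNA", ""),
    "XM_": ("mRNA", "Predicted model"),
    "XR_": ("RNA", "Predicted model"),
    "AP_": ("Protein", "Annotated on AC_ alternate assembly"),
    "NP_": ("Protein", ""),
    "YP_": ("Protein", ""),
    "XP_": ("Protein", "Predicted model"),
    "ZP_": ("Protein", "Predicted model, annotated on NZ_ genomic records"),
}


def classify_accession(accession: str) -> tuple[str, str]:
    """Return (molecule type, comment) for a RefSeq-style accession string."""
    # The prefix is the first three characters: two letters and an underscore.
    prefix = accession.strip().upper()[:3]
    if prefix not in REFSEQ_PREFIXES:
        raise ValueError(f"{accession!r} does not start with a known RefSeq prefix")
    return REFSEQ_PREFIXES[prefix]
```

For instance, `classify_accession("NM_000000.1")` (a placeholder accession) looks up the `NM_` row and reports an mRNA record, while a GenBank-style accession without an underscore raises `ValueError`.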


Flat File Format and Annotated Features

RefSeq records appear similar in format to the GenBank records from which they are derived.


Features of a RefSeq record


RefSeq records may also be displayed in a graphical format


Code Description

GENOME ANNOTATION The RefSeq record is provided via automated processing and is not subject to individual review or revision between builds.

INFERRED The RefSeq record has been predicted by genome sequence analysis, but it is not yet supported by experimental evidence. The record may be partially supported by homology data.

PREDICTED The RefSeq record has not yet been subject to individual review, and some aspect of the RefSeq record is predicted.

PROVISIONAL The RefSeq record has not yet been subject to individual review. The initial sequence-to-gene name associations have been established by outside collaborators or NCBI staff.

REVIEWED The RefSeq record has been reviewed by NCBI staff or by a collaborator. The NCBI review process includes assessing available sequence data and the literature. Some RefSeq records may incorporate expanded sequence and annotation information.

VALIDATED The RefSeq record has undergone an initial review to provide the preferred sequence standard. The record has not yet been subject to final review, at which time additional functional information may be provided.

WGS The RefSeq record is provided to represent a collection of whole genome shotgun sequences. These records are not subject to individual review or revisions between genome updates.

RefSeq status codes
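A status code can be picked out of a record's comment text programmatically. The sketch below assumes, and this is an assumption rather than something stated in these slides, that the comment begins with the status code followed by the word "REFSEQ" (for example "REVIEWED REFSEQ: ..."). The function name `find_status` is hypothetical.

```python
# Status codes as listed in the table above.
STATUS_CODES = (
    "GENOME ANNOTATION",
    "INFERRED",
    "PREDICTED",
    "PROVISIONAL",
    "REVIEWED",
    "VALIDATED",
    "WGS",
)


def find_status(comment: str):
    """Return the status code that opens a comment block, or None.

    Assumption: the comment starts with '<STATUS> REFSEQ'. Records that
    phrase their comment differently simply yield None.
    """
    # Collapse line breaks and repeated spaces so wrapped flat-file text matches.
    text = " ".join(comment.split()).upper()
    for code in STATUS_CODES:
        if text.startswith(code + " REFSEQ"):
            return code
    return None
```

Returning `None` instead of raising keeps the helper usable on records whose comment does not follow the assumed layout.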


Using Entrez Limits to restrict a query to RefSeq
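The same restriction can also be written directly into the query text rather than set on the Limits page. The sketch below assumes the Entrez property filter `srcdb_refseq[prop]`, which does not appear in these slides; the helper name `restrict_to_refseq` is hypothetical.

```python
def restrict_to_refseq(term: str) -> str:
    """Wrap an Entrez query so that only RefSeq records match.

    Assumption: the target database supports the srcdb_refseq[prop] filter.
    """
    term = term.strip()
    if not term:
        raise ValueError("query term must not be empty")
    # Parentheses keep any OR inside the original term from escaping the AND.
    return f"({term}) AND srcdb_refseq[prop]"
```

For example, `restrict_to_refseq("human[ORGN]")` yields `(human[ORGN]) AND srcdb_refseq[prop]`.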


http://www.ncbi.nlm.nih.gov/gene

Gene maintains information about genes from genomes of interest to the RefSeq group


Find genes by...                           Search text
free text                                  human muscular dystrophy
partial name and multiple species          transporter[title] AND ("Drosophila melanogaster"[orgn] OR "Mus musculus"[orgn])
chromosome and symbol                      (II[chr] OR 2[chr]) AND adh*[sym]
associated sequence accession number       M11313[accn]
gene name (symbol)                         BRCA1[sym]
publication (PubMed ID)                    11331580[PMID]
Gene Ontology (GO) terms or identifiers    "cell adhesion"[GO]
                                           10030[GO]
Genes with variants of medical interest    gene_snp_clin[filter]
chromosome and species                     Y[CHR] AND human[ORGN]
Enzyme Commission (EC) numbers             1.9.3.1[EC]
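Fielded queries like those in the table can be assembled in code and sent to Gene programmatically. The sketch below composes query strings and builds, without sending, a request URL for the NCBI E-utilities `esearch` endpoint. E-utilities are not mentioned in these slides, so treat that part as an assumption; the helper names `field`, `any_of`, `all_of`, and `gene_search_url` are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the public E-utilities esearch endpoint is used to query Gene.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def field(value: str, tag: str) -> str:
    """Attach a field tag, e.g. field('BRCA1', 'sym') -> 'BRCA1[sym]'."""
    return f"{value}[{tag}]"


def any_of(*terms: str) -> str:
    """Join terms with OR and parenthesize the group."""
    return "(" + " OR ".join(terms) + ")"


def all_of(*terms: str) -> str:
    """Join terms with AND."""
    return " AND ".join(terms)


def gene_search_url(term: str, retmax: int = 20) -> str:
    """Build (but do not send) an esearch URL against the gene database."""
    params = {"db": "gene", "term": term, "retmax": retmax}
    # urlencode percent-escapes the brackets and turns spaces into '+'.
    return f"{ESEARCH}?{urlencode(params)}"


# Reproduces the "chromosome and symbol" row of the table.
query = all_of(any_of(field("II", "chr"), field("2", "chr")), field("adh*", "sym"))
url = gene_search_url(all_of(field("Y", "CHR"), field("human", "ORGN")))
```

Keeping query composition separate from URL building means the same strings can be pasted into the Gene search box unchanged.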

Entrez Gene is accessed like any other Entrez database: