2016 Summer - Araport Project Overview Leaflet

Araport is an initiative funded by the NSF and the BBSRC. It was established after a series of community workshops [1] to give Arabidopsis and plant scientists direct access to a new-generation, web-based data platform. Users can browse and analyze a wide array of data already available through Araport, and can publish their own data modules for community sharing and for building analysis workflows. The Araport data platform consists of three major components: ThaleMine, JBrowse, and Science Apps [2, 3]. Find out more at https://www.araport.org.

Araport has completed comprehensive updates to both the structural and functional annotation of the Col-0 genome. The resulting annotation, Araport11, has been officially released by NCBI as of June 2016. Araport11 contains revisions to gene structures and isoforms derived from 113 publicly available RNA-seq datasets, as well as non-coding genes, upstream ORFs, and pseudogenes, along with annotation contributions from NCBI, UniProt, and individual research groups. The Araport11 data release is available through ThaleMine, JBrowse, Science Apps, and FTP download. Find out more at https://www.araport.org/data/araport11.

Categories                                      TAIR10          Araport11

(A) Protein-coding gene
Number of loci                                  27,416          27,655
Number of transcripts                           35,386          48,359
Number of loci with >=2 splice variants         5,804 (18%)     10,696 (39%)

(B) Noncoding gene
Long intergenic noncoding RNA (lincRNA)         36              2,444
Natural antisense transcript (NAT)              223             1,115
MicroRNA (miRNA)                                177             325
Small nucleolar RNA (snoRNA)                    71              287
Small nuclear RNA (snRNA)                       13              82
tRNA                                            689             689
rRNA                                            15              15
Other RNA                                       394             221

(C) Genomic feature
Small RNA                                                       35,846
Novel transcribed region                                        508
Upstream open reading frame                     58              84
Obsolete loci with short coding sequence                        388
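The size of the annotation change is easier to see as ratios. The sketch below is only an illustration: it copies a few rows from the table and computes the fold change from TAIR10 to Araport11 and the average number of transcripts per protein-coding locus. The dictionary keys and function names are made up for this example.

```python
# Counts copied from the table above as (TAIR10, Araport11) pairs.
# Only rows with a value in both releases are included.
COUNTS = {
    "protein-coding loci": (27416, 27655),
    "protein-coding transcripts": (35386, 48359),
    "lincRNA": (36, 2444),
    "NAT": (223, 1115),
    "miRNA": (177, 325),
    "snoRNA": (71, 287),
    "snRNA": (13, 82),
}


def fold_change(tair10: int, araport11: int) -> float:
    """Return how many times larger the Araport11 count is than the TAIR10 count."""
    return araport11 / tair10


def transcripts_per_locus(transcripts: int, loci: int) -> float:
    """Return the average number of annotated transcripts per locus."""
    return transcripts / loci


if __name__ == "__main__":
    for name, (old, new) in COUNTS.items():
        print(f"{name}: {old} -> {new} ({fold_change(old, new):.2f}x)")

    loci_old, loci_new = COUNTS["protein-coding loci"]
    tx_old, tx_new = COUNTS["protein-coding transcripts"]
    print(f"TAIR10 transcripts per locus: {transcripts_per_locus(tx_old, loci_old):.2f}")
    print(f"Araport11 transcripts per locus: {transcripts_per_locus(tx_new, loci_new):.2f}")
```

On these figures the number of protein-coding loci is nearly unchanged, while the average number of transcripts per locus rises from about 1.29 to about 1.75.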

ThaleMine is based on the popular model organism data warehouse GMOD InterMine. ThaleMine currently houses a wide array of Arabidopsis genomic information, including RNA-seq expression (NCBI SRA), array expression (BAR), coexpression (ATTED), orthologs (Phytozome, Panther), protein interactions (IntAct, BioGrid), pathways (KEGG), and publications (NCBI, UniProt). Germplasm and phenotypes (ABRC/TAIR) have also been integrated into the Gene Report page. Users can browse Gene Reports, analyze gene lists for feature enrichment, run data queries (prebuilt or customized), export data tables, and save or share gene lists and data queries. Find out more at https://apps.araport.org/thalemine.
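Because ThaleMine is an InterMine instance, customized data queries can in principle also be run programmatically. The sketch below only builds a request URL and sends nothing. It assumes that ThaleMine exposes the standard InterMine web-service root under `/service`, and that the `Gene.primaryIdentifier` and `Gene.symbol` paths are available. The gene identifier AT1G01010 is an arbitrary example.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import quoteattr

# Assumption: the standard InterMine web-service root for ThaleMine.
SERVICE_ROOT = "https://apps.araport.org/thalemine/service"


def build_query_xml(gene_id: str) -> str:
    """Build an InterMine path-query XML string selecting one gene by identifier."""
    # Output columns; these paths are assumed to exist in the ThaleMine data model.
    views = "Gene.primaryIdentifier Gene.symbol"
    return (
        f'<query model="genomic" view="{views}">'
        # quoteattr escapes the value and wraps it in quotes for use as an attribute.
        f'<constraint path="Gene.primaryIdentifier" op="=" value={quoteattr(gene_id)}/>'
        "</query>"
    )


def build_results_url(gene_id: str, fmt: str = "json") -> str:
    """Return a URL for the query-results endpoint with the query URL-encoded."""
    params = urlencode({"query": build_query_xml(gene_id), "format": fmt})
    return f"{SERVICE_ROOT}/query/results?{params}"


if __name__ == "__main__":
    # Print the URL only; fetching it is left to the reader.
    print(build_results_url("AT1G01010"))
```

Building the URL separately from sending it keeps the example free of network access and makes the query easy to inspect before it is run.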

The GMOD JBrowse is a next-generation, fast-response genome browser. The Araport JBrowse hosts a collection of over 100 data tracks, some of which draw on data sources accessed in real time. Data tracks include the latest Araport11 gene structure updates and over 100 RNA-seq datasets used in the Araport genome reannotation effort, TDNA-seq (Ecker lab), population variants (1001 Genomes Project), epigenetic marks (EPIC-CoGe), chromatin states (Gutierrez lab), sequence conservation (VISTA) plots (Phytozome), and more. In addition to choosing from this large collection of data tracks, users can upload and view their own sequence alignments (e.g. BAM files) or genomic features (e.g. GFF files) for side-by-side comparison with the reference genome annotation. New mechanisms for community users to easily publish persistent data tracks for public sharing are being developed. Find out more at https://apps.araport.org/jbrowse.
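As a rough illustration of the kind of genomic feature file a user might upload, the sketch below writes a minimal GFF3 document by hand. The nine tab-separated columns follow the GFF3 format. The sequence name, source label, coordinates, and feature ID are hypothetical placeholders, not real Araport11 annotations.

```python
def gff3_line(seqid, source, feature_type, start, end, strand, attributes,
              score=".", phase="."):
    """Format one feature as a nine-column, tab-separated GFF3 line.

    Coordinates are 1-based and inclusive. Attribute values are written
    as-is, so reserved characters (such as ';' or '=') are not escaped here.
    """
    if start > end:
        raise ValueError("start must not be greater than end")
    attr_text = ";".join(f"{key}={value}" for key, value in attributes.items())
    columns = [seqid, source, feature_type, str(start), str(end),
               score, strand, phase, attr_text]
    return "\t".join(columns)


def gff3_document(lines):
    """Prepend the required version header to a list of feature lines."""
    return "\n".join(["##gff-version 3", *lines]) + "\n"


if __name__ == "__main__":
    # Hypothetical feature on chromosome 1; all values are placeholders.
    feature = gff3_line("Chr1", "my_lab", "gene", 1000, 2000, "+",
                        {"ID": "example_gene_1", "Name": "example"})
    print(gff3_document([feature]), end="")
```

The sequence name in the first column has to match the name used by the reference genome in the browser, which is assumed here to be `Chr1`.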

The Arabidopsis Information Portal

Araport11 Genome Annotation

ThaleMine Data Warehouse

JBrowse Genome Browser

Science Apps are a growing collection of modules that provide data or analysis capabilities and serve as building blocks for creating discovery workflows. The modules will be built and shared by the research community for interoperability and reuse. Users can review the currently available set of Science Apps and give feedback on which modules they would like to see or contribute. Find out more at https://www.araport.org/apps/catalog.

Araport is an open-source science resource and maintains an active GitHub repository. The community is not only invited to contribute and expand functionality, but is also empowered to do so. Araport developers use cutting-edge technologies such as CyVerse/iPlant, Agave, Adama, git, jQuery, Bootstrap, Docker, and Swagger. Araport hosts developer workshops and hackathons to help the community enrich and exploit this resource. Find out more at https://www.araport.org/devzone.

[1] International Arabidopsis Informatics Consortium. (2012). Taking the next step: building an Arabidopsis information portal. The Plant Cell, 24(6), 2248-2256. PMID: 22751211

[2] Krishnakumar et al. (2014). Araport: the Arabidopsis Information Portal. Nucl. Acids Res., 43(D1), D1003-D1009. PMID: 25414324

[3] Hanlon et al. (2015). Araport: an application platform for data discovery. Concurrency Computat.: Pract. Exper., doi: 10.1002/cpe.3542.

Science Apps

Developer Zone

References

Last modified: June 23,

Science App: Visualization of BAR Interaction data

https://www.araport.org

Email: [email protected]

Twitter: @araportorg