CyVerse Documentation Release 0.1b.0 CyVerse Jan 18, 2021

Welcome to CyVerse

1 Getting Started

2 Platform Guides
  2.1 Discovery Environment
  2.2 Atmosphere
  2.3 Data Store
  2.4 DNA Subway
  2.5 BisQue
  2.6 SciApps
  2.7 Science APIs
  2.8 VICE

3 Tool and App Integration

4 Quick Start Guides

5 Tutorials
  5.1 VICE
  5.2 Discovery Environment
  5.3 SciApps
  5.4 Atmosphere

6 Workshops

7 Webinars

8 Contributing to the Learning Center

9 Power Users
  9.1 Letter of Support:
  9.2 CyVerse’s APIs
  9.3 External Collaborative Partnerships
  9.4 Powered by CyVerse

Welcome to the CyVerse Learning Center

The CyVerse Learning Center is a release of our learning materials in the popular “Read the Docs” formatting. We are transitioning our learning materials from our wiki into this format to make them easier to search, use, and update. We will be making regular contributions to these materials, and you can suggest new materials or create and share your own. If you have ideas or suggestions, please email [email protected]. You can also view, edit, and submit contributions on GitHub.

• Getting Started Webinars - Watch recent webinars

• About CyVerse

CyVerse Homepage:

Funding and Citations:

CyVerse is funded entirely by the National Science Foundation under Award Numbers DBI-0735191, DBI-1265383, and DBI-1743442.

Please cite CyVerse appropriately when you make use of our resources; see the CyVerse citation policy.

Fix or improve this documentation

• On GitHub:

• Send feedback: [email protected]

CHAPTER 1

Getting Started

• Getting Started Webinars - Watch recent webinars

• About CyVerse

CyVerse Homepage:

CHAPTER 2

Platform Guides

CyVerse offers an interconnected series of platforms, tools, and services. These guides will help you navigate the top-level user platforms.

2.1 Discovery Environment

Use hundreds of bioinformatics apps and manage data in the CyVerse Data Store from a simple web interface

2.2 Atmosphere

Cloud computing with CyVerse

2.3 Data Store

A unified system for managing and sharing your data across CyVerse’s tools and services

2.4 DNA Subway

Educator-focused access to data and informatics tools for modern biology

2.5 BisQue

Bio-Image Semantic Query User Environment for the exchange and exploration of image data

2.6 SciApps

A web-based platform for reproducible bioinformatics workflows

2.7 Science APIs

CyVerse provides programmatic access to its services through multiple APIs (application programming interfaces), which are access points with various levels of complexity:

• Terrain: Access to CyVerse resources

• Tapis: Access to TACC HPC resources

2.8 VICE

Visual Interactive Computing Environment. VICE introduces graphical user interfaces (GUIs) and common Integrated Development Environments (IDEs) such as Project Jupyter Notebooks and Lab, RStudio, Shiny Apps, and Linux Desktop.

CHAPTER 3

Tool and App Integration

You can contribute to CyVerse. The documentation pieces below are of interest when developing new applications.

Platform(s)                                Notes
Visual Interactive Compute Environment     Quick guide to developing for VICE.
Discovery Environment, VICE, Atmosphere    A short guide to Docker and creating your own containerized applications.
Discovery Environment, VICE                A quick start guide to integrating different types of tools in Discovery Environment.

CHAPTER 4

Quick Start Guides

These are short guides to common tasks.

Platform                             Notes
User Portal                          Start here to create your own account.
Atmosphere and Jetstream             Install Anaconda (Python 2 or 3, R, Jupyter notebooks), RStudio, Singularity, or Docker easily on any Atmosphere or Jetstream cloud computer (instance).
CyVerse Data Store                   Access through Discovery Environment, Command Line, Cyberduck.
Data Store, Discovery Environment    Organize your dataset and request a DOI (Digital Object Identifier).
Data Store, Discovery Environment    Learn the basic steps for setting up a collaborative project using CyVerse.
Discovery Environment                Quick start guide for integrating executable tools in DE.
Discovery Environment                Quick start guide for integrating Open Science Grid (OSG) tools in DE.
Discovery Environment                Quick start guide for integrating interactive (VICE) tools in DE.

CHAPTER 5

Tutorials

These are in-depth tutorials that cover popular science workflows.

5.1 VICE

Date            Notes
Nov. 8, 2019    Perform RNA-Seq differential expression analysis using the Read Mapping and Transcript Assembly (RMTA) and RStudio-DESeq2 apps.

5.2 Discovery Environment

Notes

• Kallisto is a quick, highly efficient program for quantifying transcript abundances in an RNA-Seq experiment. Sleuth is designed to analyze and visualize the Kallisto results in R.

• This tutorial is a step-by-step guide for using SciApps to perform MAKER-based annotation.

• The NCBI Sequence Read Archive (SRA) is a repository for high-throughput sequencing reads. These are valuable data for novel analysis and reuse. You can directly import data from SRA into your Data Store using a Discovery Environment app.

• FastQC is a popular tool for evaluating the quality of high-throughput sequencing reads such as those from Illumina and PacBio.

• Trimmomatic is a popular application for filtering and trimming high-throughput sequencing reads. Several functions can remove populations of low-quality reads, remove sequencing adaptors, and trim low-quality regions of individual reads.

• The SRA is a canonical repository for sequencing data generated by high-throughput instruments. The CyVerse submission pipeline allows you to directly submit your data into an SRA-linked BioProject.

• Commonly used procedure for de novo whole genome assembly of Illumina reads using the DE: assemble reads, assess assembly.

• Reduce the number of transcripts and the level of redundancy in an assembled transcriptome, and identify coding sequences that can be submitted to BLASTP searches.

• Identify changes in gene expression levels between at least two sequenced transcriptome samples (18 separate tutorials).

• Input entire protein-encoding gene or transcript repertoires from genomes of interest, and cluster homologs (orthologs and paralogs), then query clusters to assemble gene sets based on presence/absence and copy number.

• Discover Variants Using SAMTools: Detect and call variants from sequence reads using Bowtie and SAMtools.

• Clean and filter Illumina reads using DE apps.

• Learn to identify genetic variants that are associated with a trait.

• Kallisto is a program for quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads. It is based on the novel idea of pseudoalignment for rapidly determining the compatibility of reads with targets, without the need for alignment.

• Gain familiarity with a commonly used procedure for de novo whole genome assembly of Illumina reads using the DE.

• QIIME is an open-source bioinformatics pipeline for performing microbiome analysis from raw DNA sequencing data.

• Become familiar with TNRS to identify, correct, and update scientific names of plants.

• An automated quality control analysis tool for single and paired-end high-throughput sequencing (HTS) data generated from Illumina sequencing platforms.

5.3 SciApps

Notes

• A genome-wide association study (or GWAS) workflow using TASSEL, EMMAX, and MLMM for mixed model analysis.

5.4 Atmosphere

Notes

• Use next generation sequence data produced from Reduced Representation Libraries (RRL) such as Restriction site associated (RAD) tags.

• Introduce new users to BATools and the BATools Wrapper Script.

• Evolinc is a two-part pipeline to identify lincRNAs from an assembled transcriptome file (.gtf output from Cufflinks) and then determine the extent to which those lincRNAs are conserved in the genome and transcriptome of other species.

• Introduce new users to the FaST-LMM software for GWAS analysis.

• fastStructure is a fast algorithm for inferring population structure from large SNP genotype data. It is based on a variational Bayesian framework for posterior inference and is written in Python 2.x.

• Install R packages on Atmosphere: launch instance, transfer files to instance, install R package, request imaging.

• Learn how to annotate and identify using KOBAS 2.0.

• QIIME is an open-source bioinformatics pipeline for performing microbiome analysis from raw DNA sequencing data. QIIME is designed to take users from raw sequencing data generated on the Illumina or other platforms through publication-quality graphics and statistics. QIIME has been applied to studies based on billions of sequences from tens of thousands of samples.

• QUAST is a tool for evaluating genome assemblies by computing various metrics.

• rnaQUAST is a tool for evaluating RNA-Seq assemblies using a reference genome and gene database. In addition, rnaQUAST is also capable of estimating gene database coverage by raw reads and de novo quality assessment using third-party software (STAR, TopHat, GMAP etc.).

• rnaQUAST is a tool for evaluating RNA-Seq assemblies using a reference genome and gene database. In addition, rnaQUAST is also capable of estimating gene database coverage by raw reads and de novo quality assessment using third-party software (STAR, TopHat, GMAP etc.).

• Learn to navigate the Validate Workflow.

CHAPTER 6

Workshops

These are workshop-formatted tutorials that you can use or remix when running your own CyVerse workshop.

Platform(s)                                            Notes
Discovery Environment, VICE, Data Store                Introductory workshop on using CyVerse for the
Discovery Environment, VICE, Data Store                Collaboration between CyVerse and NEON to use remote sensing data in RStudio and Python
Discovery Environment, Atmosphere, VICE, Data Store    Topics on container technology for reproducible science
Discovery Environment, Atmosphere, VICE, Data Store    Workshop to train early career scientists on using advanced cyberinfrastructure to advance their research
Discovery Environment, Atmosphere, VICE, Data Store    Topics on container technology for reproducible science.
Discovery Environment, Atmosphere, VICE, Data Store    Workshop to train new PIs on advanced cyberinfrastructure
Discovery Environment, Atmosphere, Data Store          A generic agenda and slides for a one-day CyVerse workshop giving an overview of the major components of the science infrastructure.
Discovery Environment, Atmosphere                      Provision Atmosphere as a Data Science Workbench running Docker, Singularity, Project Jupyter, and RStudio-Server. The focus is on remote sensing and reproducible workflows in Python and R.
Discovery Environment, VICE, Data Store                A short introduction to using R and RStudio

CHAPTER 7

Webinars

Follow this link to upcoming and past CyVerse webinars: https://cyverse.org/webinars

To search for webinars organized into popular topics such as Genomic File Manipulation, Genome Annotation, Image Analysis/Phenotyping, and more, view our playlists: https://cyverse.org/webinars/playlists

CHAPTER 8

Contributing to the Learning Center

You can contribute to the Learning Center with anything from fixing a typo to adding new documentation pieces.

Platform(s)        Notes
Learning Center    Quick guide to simple contributions and creating new documentation pieces.

CHAPTER 9

Power Users

Power users are researchers who want to do more with CyVerse’s cyberinfrastructure by leveraging CyVerse’s services to develop their own platforms, automate and scale their work, deploy CyVerse’s infrastructure for their home institution or country, and engage in collaborative projects.

9.1 Letter of Support:

For letters of support and collaboration, email [email protected]

9.2 CyVerse’s APIs

There are several APIs to CyVerse’s resources:

• Terrain: RESTful API to many of CyVerse’s services (authentication, data store, VICE apps, DE apps):

– Swagger UI

– Jupyter Notebook

– Run the Jupyter Notebook in the DE

• Tapis: Formerly Agave, an independent API to the HPC resources CyVerse uses at TACC
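
As a rough illustration of what calling a RESTful API such as Terrain from a script might look like, the sketch below builds (but does not send) an authenticated HTTP request using only Python's standard library. The base URL, the `example/resource` path, the bearer-token scheme, and the helper name `build_api_request` are all assumptions made for illustration and are not taken from the CyVerse documentation; consult the Swagger UI listed above for the actual endpoints and authentication requirements.

```python
import urllib.request
from urllib.parse import urljoin

# Hypothetical values for illustration only; replace them with the real
# base URL and endpoint path documented in the API's Swagger UI.
BASE_URL = "https://api.example.org/terrain/"
EXAMPLE_PATH = "example/resource"


def build_api_request(base_url: str, path: str, token: str) -> urllib.request.Request:
    """Build a GET request carrying a bearer token, without sending it."""
    url = urljoin(base_url, path)
    return urllib.request.Request(
        url,
        headers={
            # Assumed authentication scheme: an access token sent as a bearer token.
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
        method="GET",
    )


request = build_api_request(BASE_URL, EXAMPLE_PATH, token="YOUR-ACCESS-TOKEN")
print(request.get_method(), request.full_url)
```

To send the request you would pass it to `urllib.request.urlopen`; that step is left out so the sketch runs without network access or credentials.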

9.3 External Collaborative Partnerships

The External Collaborative Partnership (ECP) program pairs members of our user community with expert staff to address the computational needs of a specific scientific project. To participate, please review the required criteria and then complete the ECP Request web form. CyVerse does not provide funding support for external projects.

9.4 Powered by CyVerse

Third-party projects can leverage our cyberinfrastructure to provide services to their users, including:

• Authentication system: Use secure single sign-on between your application and all CyVerse services.

• Data Store: Store, share and distribute large amounts of data.

• High-performance framework: Execute analyses on High Performance Computing resources.

Get Powered by CyVerse Today
