CyVerse Documentation, Release 0.1b.0
CyVerse
Jan 18, 2021
Welcome to CyVerse
1 Getting Started
2 Platform Guides
  2.1 Discovery Environment
  2.2 Atmosphere
  2.3 Data Store
  2.4 DNA Subway
  2.5 BisQue
  2.6 SciApps
  2.7 Science APIs
  2.8 VICE
3 Tool and App Integration
4 Quick Start Guides
5 Tutorials
  5.1 VICE
  5.2 Discovery Environment
  5.3 SciApps
  5.4 Atmosphere
6 Workshops
7 Webinars
8 Contributing to the Learning Center
9 Power Users
  9.1 Letter of Support:
  9.2 CyVerse’s APIs
  9.3 External Collaborative Partnerships
  9.4 Powered by CyVerse
Welcome to the CyVerse Learning Center
The CyVerse Learning Center is a release of our learning materials in the popular “Read the Docs” format. We are transitioning our learning materials from our wiki into this format to make them easier to search, use, and update. We will be making regular contributions to these materials, and you can suggest new materials or create and share your own. If you have ideas or suggestions, please email [email protected]. You can also view, edit, and submit contributions on GitHub.
CHAPTER 1
Getting Started
• Getting Started Webinars - Watch recent webinars
• About CyVerse
CyVerse Homepage:
Funding and Citations:
CyVerse is funded entirely by the National Science Foundation under Award Numbers DBI-0735191, DBI-1265383,and DBI-1743442.
Please cite CyVerse appropriately when you make use of our resources; see the CyVerse citation policy.
Fix or improve this documentation
• On GitHub:
• Send feedback: [email protected]
CHAPTER 2
Platform Guides
CyVerse offers an interconnected series of platforms, tools, and services. These guides will help you navigate the top-level user platforms.
2.1 Discovery Environment
Use hundreds of bioinformatics apps and manage data in the CyVerse Data Store from a simple web interface
2.2 Atmosphere
Cloud computing with CyVerse
2.3 Data Store
A unified system for managing and sharing your data across CyVerse’s tools and services
2.4 DNA Subway
Educator-focused access to data and informatics tools for modern biology
2.5 BisQue
Bio-Image Semantic Query User Environment for the exchange and exploration of image data
2.6 SciApps
A web-based platform for reproducible bioinformatics workflows
2.7 Science APIs
CyVerse provides programmatic access to its services through multiple APIs (application programming interfaces), access points with various levels of complexity.
• : Access to CyVerse resources
• : Access to TACC HPC resources
2.8 VICE
Visual Interactive Computing Environment. VICE introduces graphical user interfaces (GUIs) and common integrated development environments (IDEs) such as Project Jupyter Notebooks & Lab, RStudio, Shiny Apps, and Linux Desktop.
CHAPTER 3
Tool and App Integration
You can contribute to CyVerse. The documentation below is of interest when developing new applications.
Platform(s) | Notes
Visual Interactive Compute Environment | Quick guide to developing for VICE.
Discovery Environment, VICE, Atmosphere | A short guide to Docker and creating your own containerized applications.
Discovery Environment, VICE | A quick start guide to integrating different types of tools in Discovery Environment.
CHAPTER 4
Quick Start Guides
These include short guides through common tasks.
Platform | Notes
User Portal | Start here to create your own account.
Atmosphere and Jetstream | Install Anaconda (Python 2 or 3, R, Jupyter notebooks), RStudio, Singularity, or Docker easily on any Atmosphere or Jetstream cloud computer (instance).
CyVerse Data Store | Access through Discovery Environment, Command Line, Cyberduck.
Data Store, Discovery Environment | Organize your dataset and request a DOI (Digital Object Identifier).
Data Store, Discovery Environment | Learn the basic steps for setting up a collaborative project using CyVerse.
Discovery Environment | Quick start guide for integrating executable tools in DE.
Discovery Environment | Quick start guide for integrating Open Science Grid (OSG) tools in DE.
Discovery Environment | Quick start guide for integrating interactive (VICE) tools in DE.
CHAPTER 5
Tutorials
These are involved tutorials that cover popular science workflows.
5.1 VICE
Date | Notes
Nov. 8, 2019 | Perform RNAseq differential expression analysis using Read Mapping and Transcript Assembly (RMTA) and RStudio-DESeq2 apps.
5.2 Discovery Environment
• Kallisto is a quick, highly efficient software for quantifying transcript abundances in an RNA-Seq experiment. Sleuth is designed to analyze and visualize the Kallisto results in R.
• This tutorial is a step-by-step guide for using SciApps to perform MAKER-based annotation.
• The NCBI Sequence Read Archive (SRA) is a repository for high-throughput sequencing reads. These are valuable data for novel analysis and reuse. You can directly import data from SRA into your Data Store using a Discovery Environment app.
• FastQC is a popular tool for evaluating the quality of high-throughput sequencing reads such as from Illumina and PacBio.
• Trimmomatic is a popular application for filtering and trimming high-throughput sequencing reads. Several functions can remove populations of low-quality reads, remove sequencing adaptors, and trim low-quality regions of individual reads.
• The SRA is a canonical repository for sequencing data generated by high-throughput instruments. The CyVerse submission pipeline allows you to directly submit your data into an SRA-linked BioProject.
• Commonly used procedure for de novo whole genome assembly of Illumina reads using the DE: assemble reads, assess assembly.
• Reduce number of transcripts and level of redundancy in an assembled transcriptome, and identify coding sequences that can be submitted to BLASTP searches.
• Identify changes in gene expression levels between at least two sequenced transcriptome samples (18 separate tutorials).
• Input entire protein-encoding gene or transcript repertoires from genomes of interest, and cluster homologs (orthologs and paralogs), then query clusters to assemble gene sets based on presence/absence and copy number.
• Discover Variants Using SAMTools: Detect and call variants from sequence reads using Bowtie and SAM Tools.
• Clean and filter Illumina reads using DE apps.
• Learn to identify genetic variants that are associated with a trait.
• Kallisto is a program for quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads. It is based on the novel idea of pseudoalignment for rapidly determining the compatibility of reads with targets, without the need for alignment.
• Gain familiarity with a commonly used procedure for de novo whole genome assembly of Illumina reads using the DE.
• QIIME is an open-source bioinformatics pipeline for performing microbiome analysis from raw DNA sequencing data.
• Become familiar with TNRS to identify, correct, and update scientific names of plants.
• An automated quality control analysis tool for single and paired-end high-throughput sequencing (HTS) data generated from Illumina sequencing platforms.
5.3 SciApps
• A genome-wide association study (or GWAS) workflow using TASSEL, EMMAX, and MLMM for mixed model analysis.
5.4 Atmosphere
• Use next generation sequence data produced from Reduced Representation Libraries (RRL) such as Restriction site associated (RAD) tags.
• Introduce new users to BATools and the BATools Wrapper Script.
• Evolinc is a two-part pipeline to identify lincRNAs from an assembled transcriptome file (.gtf output from cufflinks) and then determine the extent to which those lincRNAs are conserved in the genome and transcriptome of other species.
• Introduce new users to the FaST-LMM software for GWAS analysis.
• fastStructure is a fast algorithm for inferring population structure from large SNP genotype data. It is based on a variational Bayesian framework for posterior inference and is written in Python 2.x.
• Install R packages on Atmosphere: launch instance, transfer files to instance, install R package, request imaging.
• Learn how to annotate and identify using KOBAS 2.0.
• QIIME is an open-source bioinformatics pipeline for performing microbiome analysis from raw DNA sequencing data. QIIME is designed to take users from raw sequencing data generated on the Illumina or other platforms through publication-quality graphics and statistics. QIIME has been applied to studies based on billions of sequences from tens of thousands of samples.
• QUAST is a tool for evaluating genome assemblies by computing various metrics.
• rnaQUAST is a tool for evaluating RNA-Seq assemblies using a reference genome and gene database. In addition, rnaQUAST is also capable of estimating gene database coverage by raw reads and de novo quality assessment using third-party software (STAR, TopHat, GMAP, etc.).
• Learn to navigate the Validate Workflow.
CHAPTER 6
Workshops
These are workshop formatted tutorials that can be used and/or remixed in running your own CyVerse workshop.
Platform(s) | Notes
Discovery Environment, VICE, Data Store | Introductory workshop on using CyVerse for the
Discovery Environment, VICE, Data Store | Collaboration between CyVerse and NEON to use remote sensing data in RStudio and Python.
Discovery Environment, Atmosphere, VICE, Data Store | Topics on container technology for reproducible science.
Discovery Environment, Atmosphere, VICE, Data Store | Workshop to train early career scientists on using advanced cyberinfrastructure to advance their research.
Discovery Environment, Atmosphere, VICE, Data Store | Topics on container technology for reproducible science.
Discovery Environment, Atmosphere, VICE, Data Store | Workshop to train new PIs on advanced cyberinfrastructure.
Discovery Environment, Atmosphere, Data Store | This is a generic agenda and slides for a one-day CyVerse Workshop overviewing the major components of the science infrastructure.
Discovery Environment, Atmosphere | Provision Atmosphere as a Data Science Workbench running Docker, Singularity, Project Jupyter, and RStudio-Server. The focus is on remote sensing and reproducible workflows in Python and R.
Discovery Environment, VICE, Data Store | A short introduction to using R and RStudio.
CHAPTER 7
Webinars
Follow this link to upcoming and past CyVerse Webinars: https://cyverse.org/webinars
To search for webinars organized into popular topics such as Genomic File Manipulation, Genome Annotation, Image Analysis/Phenotyping and more, view our Playlists: https://cyverse.org/webinars/playlists
CHAPTER 8
Contributing to the Learning Center
You can contribute to the Learning Center - everything from fixing a typo to adding new documentation pieces.
Platform(s) | Notes
Learning Center | Quick guide to simple contributions and creating new documentation pieces.
CHAPTER 9
Power Users
Power users are researchers who want to do more with CyVerse’s cyberinfrastructure by leveraging CyVerse’s services to develop their own platforms, automate and scale their work, deploy CyVerse’s infrastructure for their home institution or country, and engage in collaborative projects.
9.1 Letter of Support:
For letters of support and collaboration, email [email protected]
9.2 CyVerse’s APIs
There are several APIs to CyVerse’s resources:
• Terrain: RESTful API to many of CyVerse’s services (authentication, data store, VICE apps, DE apps):
– Swagger UI
– Jupyter Notebook
– Run the Jupyter Notebook in the DE
• Tapis: Formerly Agave, an independent API to the HPC resources CyVerse uses at TACC
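As a rough illustration of what calling a RESTful API such as Terrain can involve, the Python sketch below builds (but does not send) two HTTP requests: one that would trade a username and password for a token, and one that would present that token to a service. The base URL, the "/token" path, the use of HTTP Basic credentials, and the example service path are all placeholders assumed for illustration; they are not taken from the Terrain documentation, so check the Swagger UI for the real endpoints and authentication scheme.

```python
import base64
import urllib.request

# Hypothetical placeholder: replace with the base URL shown in the Swagger UI.
BASE_URL = "https://terrain.example.org"


def build_token_request(username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a request that would exchange credentials for a token.

    Assumption: the "/token" path and HTTP Basic authentication are illustrative only.
    """
    credentials = f"{username}:{password}".encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(f"{BASE_URL}/token", method="GET")
    request.add_header("Authorization", f"Basic {encoded}")
    return request


def build_api_request(path: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request that presents a bearer token to a service path."""
    request = urllib.request.Request(f"{BASE_URL}{path}", method="GET")
    request.add_header("Authorization", f"Bearer {token}")
    request.add_header("Accept", "application/json")
    return request


if __name__ == "__main__":
    # "/hypothetical/path" and "my-token" are made-up values for demonstration.
    demo = build_api_request("/hypothetical/path", "my-token")
    print(demo.get_method(), demo.full_url)
```

Keeping request construction separate from sending makes it possible to inspect the URL and headers before any network call is made; passing a built request to `urllib.request.urlopen` would perform the actual call once real endpoint values are filled in.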
9.3 External Collaborative Partnerships
The External Collaborative Partnership program pairs members of our user community with expert staff to address the computational needs of a specific scientific project. To participate, please review the required criteria and then complete the ECP Request web form. CyVerse does not provide funding support for external projects.
9.4 Powered by CyVerse
Third-party projects can leverage our cyberinfrastructure to provide services to their users, including:
• Authentication system: Use secure single sign-on between your application and all CyVerse services.
• Data Store: Store, share and distribute large amounts of data.
• High-performance framework: Execute analyses on High Performance Computing resources.
Get Powered by CyVerse Today