13
The shinySISPA Manual Introduction shinySISPA is a web-based tool intended for the researchers who are interested in defining samples with similar gene set enrichment profiles. The shinySISPA tool is based on a novel SISPA method published in our previous paper (Kowalski, et al., 2016). SISPA defines samples as with and without profile activity, on average, among genes defined as a set, by applying a change point model (Killick, et. al., 2016) to a composite, between features zscore obtained by adding or subtracting individual sample zscores between features, depending on their corresponding profiles (Fig 1; see Kowalski, et al., 2016 for method details). The individual sample zscores are computed using the Bioconductor R package GSVA (Hänzelmann, et al., 2013) for more than or equal to three genes. Figure 1. Schematic representation of theSISPA method The tool is hosted on a 64bit CentOS 6 server (http://shinygispa.winship.emory.edu/shinySISPA/) running the Shiny Server program designed to host R Shiny applications. This tool has been extensively tested on Windows 7 and Mac Pro 10 operating system with firefox and chrome browser. Given a dataset of 377 samples and 16 genes under the two-feature analysis, it took three seconds to obtain shinySISPA defined sample groups and less than a second to generate the waterfall plot. The time it takes to generate sample profile diagnostic plots depends on the number of genes in a set; it took less than 10 seconds for 16 genes in both sets of a two-feature analysis. Running shinySISPA on a local computer: 1) Download and install R or RStudio (version 3.3.2. or later) from https://cran.r- project.organd 2) Open R and install the below required packages: > install.packages(c("shiny", "GSVA", “genefilter”, "changepoint", "data.table", “ggplot2”, "plyr"))

The shinySISPA Manual - bbisr.shinyapps.winship.emory.edubbisr.shinyapps.winship.emory.edu/shinySISPA/shinySISPA_Manual.pdf · samples with similar gene set enrichment profiles. The

  • Upload
    others

  • View
    14

  • Download
    0

Embed Size (px)

Citation preview

Page 1: The shinySISPA Manual - bbisr.shinyapps.winship.emory.edubbisr.shinyapps.winship.emory.edu/shinySISPA/shinySISPA_Manual.pdf · samples with similar gene set enrichment profiles. The

TheshinySISPAManual

IntroductionshinySISPA is a web-based tool intended for the researchers who are interested in definingsampleswithsimilargenesetenrichmentprofiles.TheshinySISPAtoolisbasedonanovelSISPAmethodpublishedinourpreviouspaper(Kowalski,etal.,2016).SISPAdefinessamplesaswithandwithoutprofileactivity,onaverage,amonggenesdefinedasaset,byapplyingachangepointmodel (Killick, et. al., 2016) to a composite, between features zscore obtained by adding orsubtracting individual sample zscores between features, depending on their correspondingprofiles(Fig1;seeKowalski,etal.,2016formethoddetails).TheindividualsamplezscoresarecomputedusingtheBioconductorRpackageGSVA(Hänzelmann,etal.,2013)formorethanorequaltothreegenes.

Figure1.SchematicrepresentationoftheSISPAmethod

Thetoolishostedona64bitCentOS6server(http://shinygispa.winship.emory.edu/shinySISPA/)running the Shiny Server program designed to host R Shiny applications. This tool has beenextensively testedonWindows7 andMacPro10operating systemwith firefox and chromebrowser.Givenadatasetof377samplesand16genesunderthetwo-featureanalysis,ittookthreesecondstoobtainshinySISPAdefinedsamplegroupsandlessthanasecondtogeneratethewaterfallplot.Thetimeittakestogeneratesampleprofilediagnosticplotsdependsonthenumberofgenesinaset;ittooklessthan10secondsfor16genesinbothsetsofatwo-featureanalysis.RunningshinySISPAonalocalcomputer:1) DownloadandinstallRorRStudio(version3.3.2.orlater)fromhttps://cran.r-

project.organd2) OpenRandinstallthebelowrequiredpackages:

>install.packages(c("shiny","GSVA",“genefilter”,"changepoint","data.table",“ggplot2”,"plyr"))

Page 2: The shinySISPA Manual - bbisr.shinyapps.winship.emory.edubbisr.shinyapps.winship.emory.edu/shinySISPA/shinySISPA_Manual.pdf · samples with similar gene set enrichment profiles. The

3) UserscanrunshinySISPAlocallyusingthesourcecodeavailablefromtheGitHub:https://github.com/BhaktiDwivedi/shinySISPA,bytypingthebelowcommandsinRconsole:

>library(shiny)>runApp("shinySISPA")4) UserscanalsodownloadandruntheappfromGitHubdirectlyusing:>shiny::runGitHub('shinySISPA','BhaktiDwivedi')ExampleSampleDataset We have used publically available multiple myeloma (MM) patient data from the ongoingMultiple Myeloma Research Foundation (MMRF) CoMMpass clinical trial(https://www.themmrf.org/) to identify sample groups using GISPA derived gene setcharacterizingIgHtranslocation,t(14;16)intheMMcelllines(Kowalski,et.al.,2016).Datafrom377patientswithavailableclinicaloutcome,RNA-seqexpression,andcopynumberchangeatpre-treatment was downloaded from the MMRF researcher gateway portal(https://research.themmrf.org) based on the IA6 release of this trial. The expression isnormalized log2transformedcountdata,whilecopynumber isthemeansegmentvalue.Thedata is available to download from GitHub (https://github.com/BhaktiDwivedi/shinyGISPA).Userscanalsoaccessandanalyzethisdatasetbychoosingthe“Exampledata”fromthe“DataInput”optionunderthetwo-featureAnalysisTypeontheweb-interface.GettingStarted1. Selecttheanalysistype

Clickonoptionsunder“AnalysisType”toselectasinglefeatureortwo-featureanalysis.Herefeatureisdefinedbyadatatype(e.g.,expression,somaticmutations).

Page 3: The shinySISPA Manual - bbisr.shinyapps.winship.emory.edubbisr.shinyapps.winship.emory.edu/shinySISPA/shinySISPA_Manual.pdf · samples with similar gene set enrichment profiles. The

An example of single-feature analysis could be identifying samples with increased (ordecreased) expression profilewithin the defined gene set signature, while a two-featureanalysiscouldbebasedonacombinationofanytwodatatypes,e.g.,identifyingsamplesthatexhibitdecreasedgeneexpressionanddecreasedcopychange.

2. UploadtheInputData

1

Page 4: The shinySISPA Manual - bbisr.shinyapps.winship.emory.edubbisr.shinyapps.winship.emory.edu/shinySISPA/shinySISPA_Manual.pdf · samples with similar gene set enrichment profiles. The

Uploadtheinputdatafilegiventheselecteddatatypefrom(1).Useruploadstheinputdatafileforgenesofinterestandprofiletodefinesamplesonwithinthedatatype.Thegenesetscanbedefinedbasedonpriorknowledgederivedfromeitherbiologicalprocesses,pathways,biomarkersdiscovery,genomicanalysis,orintegratedgenesetanalysis(e.g.,GISPA).Aprofileisagenomicchangeofeitherincrease(“up”)ordecrease(“down”)withinaspecificfeatureordatatype.

Fileformatrequirements:

• Maximumfilesizelimitofupto200MB.• ASCII formatted tab-delimited file, where each row represents a gene and each

columnasample.• Here isanexampleofauseruploaded ‘File Input’ foronedata typeanalysis.First

columnisgeneidfollowedbysamplesdataasshowninthescreenshotbelow:

• Hereisanexampleofuseruploaded‘FileInput1’andFileInput2’fortwodatatypeanalysis.Foreachinputdatafile,firstcolumnisgeneidfollowedbysamplesdataasshowninthescreenshotbelow:

s1 s2 s3 s4 s5g1 6.79 8.25 10.27 11.13 9.70g2 12.05 9.40 11.34 11.71 9.62g3 0.00 0.00 0.00 0.00 0.00g4 5.15 5.72 6.04 4.68 3.41g5 8.96 5.09 6.10 1.70 6.03

Genenames

Samplenames

DataPoints

s1 s2 s3 s4 s5v1 0.50 0.23 0.40 0.60 0.71v2 0.20 0.50 0.34 1.00 0.62v3 0.00 0.30 0.00 1.00 0.00v4 0.15 0.55 0.60 0.68 0.41v5 0.40 0.09 0.10 0.70 0.03v6 0.05 0.90 0.00 0.65 0.22

s1 s2 s3 s4 s5g1 6.79 8.25 10.27 11.13 9.70g2 12.05 9.40 11.34 11.71 9.62g3 0.00 0.00 0.00 0.00 0.00g4 5.15 5.72 6.04 4.68 3.41g5 8.96 5.09 6.10 1.70 6.03

Genenames

Samplenames

DataPoints

GenevariantID’s

Samplenames

DataPoints

Page 5: The shinySISPA Manual - bbisr.shinyapps.winship.emory.edubbisr.shinyapps.winship.emory.edu/shinySISPA/shinySISPA_Manual.pdf · samples with similar gene set enrichment profiles. The

• WhenrunningshinySISPAontwodatatypes,thesamplesmustoverlapbetweenthe

twoinputdatafiles.Thetwoinputdatafilesmusthavethesameexactnumberofsamples and sample (or column) names. Rows that corrospond to genes (probes,variants,oranyotherid)mayormaynotbethesame.

• Aminimumofatleastonegeneandtensamplesarerequired.• Noduplicatedcolumnnamesorduplicatedrownamesareallowedandanalysiswill

bestopped.• Genes (or rows) with zero variance across all samples will be excluded from the

analysis.

SnapshotofshinySISPAusingSingle-featureanalysis:

2

Click on “Choose File” option under “File Input 1” to upload a single data file

Select the desired “Sample Profile” to define the sample profile of interest. Here usercan select either “up” or “down” profile to define samples with increased or decreaseddata change in the user input gene set, i.e., to identify samples with increasedexpression within the gene set; or samples with increased variant change within thegene set.

Page 6: The shinySISPA Manual - bbisr.shinyapps.winship.emory.edubbisr.shinyapps.winship.emory.edu/shinySISPA/shinySISPA_Manual.pdf · samples with similar gene set enrichment profiles. The

SnapshotofSISPAusingTwo-featureanalysis:

3. SelecttheChangepointMethod

Usercanspecify/modifythechangepointdetectionmethod(KillickR,etal.,2016), i.e.,tofind the optimal break pointswithin the estimated profile sample score (Kowalski, et al.,2016).Thechangescanbefoundinmeanand/orvarianceusingtheuser-specifiedmethod(“AMOC”,“BinSeg”,“PELT”,or“SeqNeigh”)giventheallottedmaximumnumberofchangepoints.Notethat thenumberofchangepoints identifiedmaydiffer for thesamedatasetdependingonthechangepointRpackageversioninstalledonthesystem.Currentlywearerunningchangepointversion2.2.2onourhostingserver.

Click on “Choose File” option under “File Input 1” to upload the First data type file

Select the desired “Sample Profile” to define the sample profile for first data type.Here user can select either “up” or “down” profile to define samples with increasedor decreased data change in the user input data.

Click on “Choose File” option under “File Input 2” to upload the Second data type file

Select the desired “Sample Profile” to define the sample profiles from the seconddata type. Here user can select either “up” or “down” profile to define samples withincreased or decreased data change in the user input data.

2

Page 7: The shinySISPA Manual - bbisr.shinyapps.winship.emory.edubbisr.shinyapps.winship.emory.edu/shinySISPA/shinySISPA_Manual.pdf · samples with similar gene set enrichment profiles. The

4. Result

Theresultsareoutputinfourtabs:

4.1. InputData4.2. SISPAResults4.3. WaterfallPlot4.4. SampleProfile

4.1. InputData

Summarizestheuserinputdataintermsoftheinputnumberofgenes(orrows),numberofsamples (or columns), and number of genes (or rows) with zero variance among all thesamplestobeexcludedfromtheanalysis.Dataforonlythefirstfivegenesoverallsamplesisshownandvisualizedasboxplot.

3

Select method (“AMOC”, “PELT”, “SegNeigh”, or “BinSeg”) for change point identificationin the data setSelectmaximum number of change points to search for using the method

Select changes inmean, variance, or both for change point identification

Page 8: The shinySISPA Manual - bbisr.shinyapps.winship.emory.edubbisr.shinyapps.winship.emory.edu/shinySISPA/shinySISPA_Manual.pdf · samples with similar gene set enrichment profiles. The

4.2. SISPAResults

Tableofdefinedsampleprofileswiththeirgenesetenrichmentscorefortheselecteddatatypeandsampleprofileofinterest.Theenrichmentscoresforeachsamplearecomputedusingthezscoremethod(Hänzelmann,et.al.,2013).Thescorestatisticsisrankorderedbytheuserselectedprofile(e.g.,upordown)foreachsample.Achangepointmodel(Killick,et.al.,2016)isthenappliedtothesamplescorestoidentifygroupsofsamplesthatshowsimilargenesetprofile.

4.1

Page 9: The shinySISPA Manual - bbisr.shinyapps.winship.emory.edubbisr.shinyapps.winship.emory.edu/shinySISPA/shinySISPA_Manual.pdf · samples with similar gene set enrichment profiles. The

Samplesassignedavalueof1inthe‘sample_group’columnarethesampleswiththeuserselectedprofileactivity,whilesamplesassignedavalueof“0”arethesampleswithouttheprofile activity. The results table canbe searched, sorted, and filteredby anyof the fourcolumns.Thescatterplotontherightdisplaysallthechangepointsdetectedwithinthedata-set,samplesfallinginthetopmostchangepointarethesampleswiththeprofileactivity.Thefrequencyplotattherightmostbottomrepresentsthedistributionofthenumberofsampleswithandwithouttheprofileactivity.

4.3. WaterfallPlotVisualrepresentationofallsamplessortedbytheirzscores.Sampleswiththeprofileactivityhavethehighestscoreandareshowninorangefilledbars,whilesampleswithouttheprofileactivityare shownwithgrey-filledbars. Forany selected sampleprofile,whether “up”or“down”,theorange-filledbarswiththehighestscoresarethemostdesirableandrepresentsampleswithprofileactivity.

4.2

Page 10: The shinySISPA Manual - bbisr.shinyapps.winship.emory.edubbisr.shinyapps.winship.emory.edu/shinySISPA/shinySISPA_Manual.pdf · samples with similar gene set enrichment profiles. The

4.4. SampleProfileRepresents thedistributionof theuser-inputdataoverall by the samplegroupswithandwithouttheprofileactivity.Thisenablestoassesswhetherornot,onanaverage,theentiregenesetdefinesthesamplegroupsbytheuserselectedsampleprofile.Withinthe“SampleProfile” tab, user also have the option to select a “Gene to Display” from the “ChooseOptions”dropdownmenuontherightmostpanel,toseetheinputdatadistributionbyeachsamplesortedwithineachsampleprofile.Thisenablestheusertoidentifythegene(s)thataremostdistinguishingamongsampleswithandwithouttheprofileactivity.

4.3

Page 11: The shinySISPA Manual - bbisr.shinyapps.winship.emory.edubbisr.shinyapps.winship.emory.edu/shinySISPA/shinySISPA_Manual.pdf · samples with similar gene set enrichment profiles. The

5. SaveResultsUsercandownloadtheresultsasoneexcelfilebyclickingonthe“Save”buttonontheleftpanel.Thedownloadincludestableandpdfplotsoftheresults.

4.4

Page 12: The shinySISPA Manual - bbisr.shinyapps.winship.emory.edubbisr.shinyapps.winship.emory.edu/shinySISPA/shinySISPA_Manual.pdf · samples with similar gene set enrichment profiles. The

5

Page 13: The shinySISPA Manual - bbisr.shinyapps.winship.emory.edubbisr.shinyapps.winship.emory.edu/shinySISPA/shinySISPA_Manual.pdf · samples with similar gene set enrichment profiles. The

CitationPlease cite the SISPAmethodas: Kowalski J,Dwivedi B,NewmanS, Switchenko JM,PaulyR,GutmanDA,etal.Geneintegratedsetprofileanalysis:acontext-basedapproachforinferringbiologicalendpoints.NucleicAcidsResearch.2016;44(7):e69.ReferencesHänzelmann,S.,Castelo,R.andGuinney,A.GSVA:genesetvariationanalysisformicroarrayandRNA-seqdata.BMCBioinformatics2013;14(7).Killick R, Eckley IA. changepoint: An R Package for Changepoint Analysis. J Stat Softw.2014;58(3):1-19.KowalskiJ,DwivediB,NewmanS,SwitchenkoJM,PaulyR,GutmanDA,etal.Geneintegratedsetprofile analysis: a context-based approach for inferring biological endpoints. Nucleic AcidsResearch.2016;44(7):e69.