Roswell Park Cancer Institute Data Science Strategic Plan
March 2017
v4.20170309
DATA SCIENCE TASK FORCE RECOMMENDATIONS

Restructure IT to include a distinct ‘research’ component
IT restructuring should extend to the top of the IT administration infrastructure, with an individual leader tasked with enabling research at RPCI. It should extend through the support and helpdesk infrastructure and staff to include individuals identified as 'research-enabled'. The research portion of IT should have its own strategic plan and other operational aspects that are on the same footing as the current administrative/clinical initiative to create a comprehensive strategic plan.

IT Security and the Legal department must develop a 'NONO' policy
The IT Security and Legal departments' response to researcher needs should be to deliver the appropriate solution, rather than to ensnare the researcher in complex implementation and policy details. The primary point of contact for the researcher should be the IT helpdesk; negotiations between security and legal should primarily be 'behind the scenes', and the helpdesk should be empowered to escalate recalcitrant issues to the highest levels for prompt resolution.

IT infrastructure and support needs to enable use of high-throughput and scientific computing (e.g., Linux workstation) resources
There is no alternative to this for modern informatics.

Build a distributed cancer informatics knowledge environment
A scalable ‘data grid’ architecture will provide integrated access to data across the RPCI enterprise, e.g. EHR, Tumor Registry, department-based clinical databases, research databases, specimen metadata, and ‘omics data archives. Building this capability facilitates both development of a cancer ontology and the data sharing required for translational research and medicine.

Implement a five-year educational initiative to establish an informatics-capable community
The components of this initiative are comprehensive: technical training on use of computing resources; development of short courses focusing on informatics use in analysis and comprehension of 'omics data; close ties with relevant UB academic programs; investigator education on opportunities for introducing informatics into their research programs and funding efforts; and recruitment of informatics-trained post-docs and scientists.
I. Overview

Roswell Park Cancer Institute (RPCI) is a world-class NCI-designated Comprehensive Cancer Center dedicated to understanding, preventing and curing cancer. In addition to delivering high-quality medical care to more people effectively, one of the institute’s core strategic goals is excellence in research. The explosion of data in cancer research, as well as the large amount of clinical information collected as part of standard of care, has elevated the importance of effectively managing the quality, consistency and handling of research and clinical data. Increasingly, access is needed to large-scale computing and storage devices, together with high-speed networks with global connectivity. Enabling the annotation, integration and sharing of a wide variety of data types and sizes supports higher-order analytics and visualization, which positively impacts the quality of research and patient care. RPCI must structure and implement cutting-edge data science capability to stay at the forefront of cancer research and treatment.

Data science requirements are diverse. Individual researchers have varying computing and information technology needs, including desktop and laptop computers for individual use; dedicated computers paired with specific instruments; software of varying degrees of sophistication and complexity; infrastructure, network and software to facilitate high-throughput pipelines; high-performance computing power; and high-bandwidth networks. Increasingly, cancer research and health delivery depend on database technology and analysis of clinical, epidemiologic, pathologic, biological and outcomes data, resulting in the need to capture, manage and analyze very large heterogeneous datasets from diverse, geographically distributed sources.

In fall 2016, Dr. Adekunle Odunsi formed a Data Science Task Force to review existing data science capabilities, identify gaps, and develop a comprehensive data science strategy for RPCI. The Task Force included representatives from research, clinical, ethics, data governance, computing and information technology areas across the institution, and input from groups not represented on the Task Force was solicited. Environmental scans informed the development of the strategic plan by cataloging data repositories of all sizes within the institution, evaluating RPCI policies that impact data science capability, assessing network and computing capacity, and gathering information from community members whose interests and governance impact RPCI’s data science environment. Summary tables with information from the environmental scans may be found in the Appendices. This document serves as the RPCI data science strategic plan.
II. Vision

RPCI will provide a user-friendly, comprehensive data science environment to support and extend cutting-edge translational cancer research by establishing an informatics-capable community through educational initiatives, implementing an enabling research IT infrastructure, building data connectivity, and keeping pace with emerging technologies.

III. Strategic Goals

A. Leverage existing assets

RPCI has many data science capabilities and strengths, including:
1. Core Cancer Center Support Grant analytic resources
   a. Biostatistics
   b. Bioinformatics
   c. Clinical Research Services
2. Electronic Health Record (EHR)
3. Electronic Pathology systems
4. Electronic systems to support clinical administration (e.g. scheduling, billing)
5. Information Technology (IT) resources
6. Laboratory Information Management System (LIMS)
7. Clinical Data Network with expertise in data standards, cancer ontologies and data management
8. Cancer Registry with rich longitudinal data
9. Analytic software development (e.g. R Bioconductor, Bioinformatics Resource)
10. Clinical trials data management
11. Biorepositories (DBBR, PRN, Lymphoma, Leukemia, Ovary)
12. NRG and ALLIANCE Cancer Cooperative Groups
13. Partnership with the University at Buffalo’s Center for Computational Resources
While the organization has significant strengths, the current data science environment has inherent barriers that make it difficult to make the best use of these assets. Generally, data are siloed, difficult to extract and semantically heterogeneous. The approach of adding more vendor and/or internally developed technical solutions ad hoc may exacerbate the problem; increasing complexity tends to result in siloed, uncoordinated data and capabilities and makes data integration and use across platforms more difficult. To facilitate strong data science, RPCI needs a financially sustainable solution (IT research infrastructure, technology, standards, processes, methods, and leadership) that seamlessly knits together these established high-quality data assets and any new components that will be added in the future, breaking down silos, supporting interoperability and data access across systems and domains, and enabling RPCI to achieve its goals. From a data science perspective, the institutional goals include:
• Integration of patient-reported, clinical and research findings to enhance patient engagement, treatment and outcomes
• Secure and timely access to clinical data for researchers
• Access to and integration of different kinds of data on a patient or population to enable high-level analytics
• Ability to manage, share and analyze high-throughput, high-volume and big data
• Integration of derivative data with biorepository and clinical information
• Strong data science technical infrastructure and expertise to facilitate the award of federal, foundation and industrial partnerships and grant dollars
• Enhanced ability to attract strong informatics, basic science, and clinical researchers to RPCI

A research IT infrastructure is the underpinning of a highly functional data science capability and critical for meeting the above aims. RPCI faces the challenge of managing increasing data volume and complexity. A significant capital investment in data capture and in various technologies to manage that information has resulted in the emergence of data silos and isolated systems. Failing to make the necessary investment in a strong data science solution in the next few years will leave RPCI increasingly challenged to handle data productively and profitably and may preclude it from reaching its goals.

B. Develop an informatics-capable infrastructure and community

Developing high-level data science capability requires, at its core, an enabling IT and software infrastructure that is financially sustainable, scalable, platform agnostic, data format agnostic, minimally disruptive to existing workflows, and inclusive of the institution’s significant investment in technology (meaning it must not require the discard or replacement of existing software systems that support established workflows). It also requires an institutional culture that empowers researchers with data science education, enabling policies, cross-departmental support for rapid and reasonable problem resolution, and research-appropriate risk tolerance. We are recommending both IT reorganization and a culture shift.
IV. Strategic Initiatives: Actionable Steps

This Task Force recommends the support of several initiatives to drive this strategy.

A. Reorganize RPCI IT to include a distinct ‘research’ component

Feedback from faculty and staff consistently points to an IT infrastructure that cripples research activities with overly risk-averse policies, a dearth of channels for timely resolution of research IT issues, inadequate storage space, inflexibility with research-specific software installation on machines, and inadequate infrastructure (e.g. workstations and data sharing-enabling infrastructure). Additionally, we noted inadequate onsite research IT support. Existing resources may be leveraged to develop a ‘research’ IT network; however, this network must be an IT arm with full leadership, decision-making authority and support. Support must include both helpdesk staff who are well educated in resolving research issues and effective procedures for resolving issues. Although the ‘research’ IT network will not handle clinical data exchanged in the delivery of health care, clinical and billing data may move onto this network in the context of research; therefore, privacy and/or de-identification procedures will be carefully addressed, and close collaboration with RPCI IT is critical. Importantly, a segregated network would allow investigators to have administrative rights to their own personal computers, enabling them to install the software packages necessary for their research.

Making full use of a ‘research’ IT network requires researchers and IT staff to be capable of using scientific computing resources. To this end we recommend the development of an ambitious educational program focusing on informatics use in the management, analysis and comprehension of diverse biomedical data, e.g. ‘omics and other high-throughput data. The program would include technical training on use of computing resources, short informatics courses, and investigator education on opportunities for introducing informatics into their research programs and funding efforts.

B. Construct a world-class data science technical infrastructure

RPCI needs an effective data science technical infrastructure where scientific data, clinical and billing data, study-specific data and biospecimen metadata may be captured, assessed and shared at an institutional level. To create a transparent, grid-type architecture, we recommend initially focusing on six major informatics goals:
1. Define an information model for describing the RPCI research space (the focus of the DSCO) and promote the data standards across RPCI
2. Define an information model for describing the RPCI clinical space (may be acquired from, or developed in collaboration with, the clinical informatics team)
3. Enable all components of the technical infrastructure to be distributed
4. Provide user-friendly software interfaces for capture, discovery and authorized access of data resources across RPCI (complex interfaces may detract from best use of information)
5. Provide a secure transfer and distribution infrastructure to meet United States Federal and RPCI regulations for data sharing of health information
6. Provide an integrated portal environment for access to the distributed RPCI data and resources
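Goals 1 and 2 rest on shared data standards, and the architecture starts from Common Data Elements (CDEs). As a purely illustrative sketch, assuming a CDE can be reduced to a name, a definition and a fixed set of permissible values, the Python snippet below shows how local codes from two separate repositories might be mapped onto one CDE so that their records become comparable. The CDE (`tumor_laterality`), its permissible values, the repository names (`registry`, `redcap_project`) and the local codes are all hypothetical; none is drawn from an RPCI system or from the DSCO ontology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonDataElement:
    """A hypothetical, minimal CDE: a named attribute with a fixed value set."""
    name: str
    definition: str
    permissible_values: frozenset


# Hypothetical CDE for tumor laterality (values chosen for illustration only).
LATERALITY = CommonDataElement(
    name="tumor_laterality",
    definition="Side of a paired organ in which the primary tumor arose.",
    permissible_values=frozenset({"left", "right", "bilateral", "unknown"}),
)

# Hypothetical local coding schemes used by two separate repositories.
LOCAL_MAPPINGS = {
    "registry": {"L": "left", "R": "right", "B": "bilateral", "9": "unknown"},
    "redcap_project": {"1": "left", "2": "right", "3": "bilateral"},
}


def harmonize(cde, source, local_value):
    """Translate a repository-specific code into the CDE's standard value.

    Codes with no mapping fall back to 'unknown' rather than being guessed;
    a mapping that points outside the CDE's value set raises ValueError.
    """
    standard = LOCAL_MAPPINGS[source].get(local_value, "unknown")
    if standard not in cde.permissible_values:
        raise ValueError(f"{standard!r} is not a permissible value of {cde.name}")
    return standard


if __name__ == "__main__":
    records = [("registry", "L"), ("redcap_project", "2"), ("redcap_project", "7")]
    for source, code in records:
        print(source, code, "->", harmonize(LATERALITY, source, code))
```

The design choice worth noting is that the mapping tables, not the source systems, carry the translation, so existing repositories can keep their workflows while still contributing standardized values.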
The RPCI data science architecture, intended to leverage informatics technology to enable research and the development of new approaches to scientific discovery, will take a very pragmatic approach: it deconstructs the process of cancer research into a set of functions and provides a layered system with applications constructed on top of the infrastructure, enabling RPCI research efforts to become an integrated network. The applications represent critical functions that are performed by the research community. Furthermore, by integrating existing and newly developed applications into an enterprise system, RPCI will provide the capability for managing its information assets at an institute-wide level. The architecture is therefore decomposed into a set of projects that make up the informatics portfolio. The projects may be implemented across the institution by a diverse, collaborative “informatics team” drawn from multiple groups across RPCI. The projects include:

1. Development of Common Data Elements (CDEs) to explicitly capture and manage data attributes in a consistent manner
2. Development of a cancer ontology, organizing the CDEs into a set of objects and relationships that represents the information space of RPCI cancer research, leveraging and aligning as closely as possible with national and industry standards
3. Documentation of the EMR data model to facilitate use of EMR data
4. Development of an overarching integrated system for accessing and sharing information, including biospecimen, epidemiological, biomarker, clinical, pathologic, ontologic, outcome, billing and study information (e.g. ‘omics data) from cancer research
5. Development of an information system resource for capture and management of complex biomedical research data. This includes building a team with expertise in systems architecture, database theory, relational database and web technology, and process engineering to collaborate with investigators whose projects are too complex for a simple tool like REDCap. This team will triage projects for appropriate informatics support and participate in grant writing to ensure appropriate resourcing of new research and strong technical grantsmanship. RPCI’s Clinical Data Network is well positioned to grow to fill this need.
6. Development of an infrastructure for capturing and warehousing results when centralized data storage is beneficial. These results may include the raw and processed data from RPCI research studies. The infrastructure may provide a common software component, “Catalog and Archive Service,” that can be configured to capture information across very different studies.
7. Development of capabilities in text and data mining and Natural Language Processing (NLP) by RPCI faculty and external collaborators, the implementation of which will be facilitated by the data science architecture.
8. Development of a web-enabled RPCI Data Science Portal for sharing resources and results with the research community (within and outside of RPCI).

We recommend specification and development of an Apache OODT-based research and analytics platform to enable 1) institute-wide data capture, organization, sharing, integration, archiving, and dissemination; 2) integration of existing data repositories into a center-wide capability; and 3) collaboration among researchers and research groups. This proven, overarching open source infrastructure allows both integration of existing programs and data resources and incorporation of new technologies and data repositories as they are developed or acquired. This data science capability brings information together and stages it for analytic and visualization operations; it is the integrating “middle piece” that makes possible the use of the increasing number of emerging analytical and data classification tools. Apache OODT is an Apache Software Foundation top-level project that has a strong community and features high-quality interoperable tools. To avoid “reinventing the wheel,” we will explore the successful data science environments at several high-functioning academic institutions, e.g. Fred Hutchinson Cancer Center, Memorial Sloan Kettering Cancer Center, Dana-Farber Cancer Institute/Harvard, and Vanderbilt, to model the positives and learn from the negatives.

C. Establish Data Science Working Groups to develop and execute detailed plans

We recommend the establishment of the four working groups described in the table below. The goal of these Working Groups is to support the strategic plan by developing recommendations and specific action items in the following critical areas:
Working Group / Objectives / Membership

Data Science Research Network & High Performance Computing (DSRN)
Objectives:
1. Document requirements for a distinct ‘research’ IT division of RPCI IT
2. Estimate one-time and ongoing costs to be incurred
3. Develop implementation plan
4. Develop elements of educational program (communicate possibilities and access); engage RPCI education resources
5. Define RPCI needs and system requirements for high performance computing (HPC)
6. Identify and coordinate resources and efforts for HPC program across RPCI
7. Direct the development of user-friendly access to high performance computing, including the support needed to make HPC accessible to researchers
Membership: Martin Morgan (Lead), Chris Darlak, Song Liu, Alan Hutson, Kevin Eng, Sandra Gollnick, Scott Gould, Jianmin Wang

Data Science Policies & Risk Management (DSPRM)
Objectives:
1. Amend or develop enabling, sustainable and enforceable data science policies
2. Engineer processes for approving, enforcing and updating policies
3. Develop pipeline for resolution of research informatics problems
4. Consult with Data Science Working Groups to assess risk of newly developing processes and technology
5. Facilitate data science initiatives by researching guidelines and requirements, consulting with appropriate RPCI resources and authorities, and meeting security and ethical requirements
Membership: Camille Wicher (Lead), Alan Hutson, Everett Weiss, Chris Darlak, Ami Coleman, Laurie Musial

Data Science Cancer Ontology (DSCO)
Objectives:
1. Develop an institution-wide standard in the form of an ontology for data and metadata to facilitate data consistency, sharing and integration
2. Develop Common Data Elements (CDEs) from the ontology
3. Establish a timeline and phases for ontology development
4. Promote the data standards across RPCI
Membership: William Duncan (Lead), Everett Weiss, Josh Killion, Chris Darlak

Data Science Biospecimen (DSB)
Objectives:
1. Address development of metadata standards (as part of cancer ontology)
2. Explore centralization of standards and banking and propose recommendations, in collaboration with the 5 specimen banks: Data Bank and Biorepository Shared Resource (DBBR), PRN, Lymphoma, Leukemia and Ovary
3. Make recommendations to RPCI leadership and stakeholders
Membership: Kirsten Moysich (Lead), Sandra Gollnick, Barbara Foster, Brahm Segal
D. Reorganization

Successful implementation of this strategy will require reorganization in RPCI Information Technology as well as in the CDN and other technical teams. Research IT and Data Science will dovetail, with research network and research database system building shifting to the research data science teams. Strong collaboration between IT and Data Science is critical to success. This Task Force will develop a proposal for a new administrative structure that will advance the proposed data science initiatives; the proposal will be presented to senior leadership for consideration.

E. Oversight and Guidance

The development of a data science program will be strengthened by the guidance of a Scientific and Technical Advisory Board (STAB) composed of cancer area domain experts as well as ethics, data governance and clinical leadership. The establishment of the STAB both adds an outside voice to the data science priorities and broadens the outreach of the implemented data science capabilities. The specific goals of the STAB include guiding the Task Force on new and emerging scientific and clinical needs, institutional changes impacting (or impacted by) data science, and the existence or development of other synergistic programs both within and external to RPCI. Importantly, the STAB will help to prevent instances of parallel, siloed activities. The Task Force will look to the STAB for support in resolving issues and overcoming roadblocks. In Year 1 the STAB will meet with the Task Force bi-annually; the meeting schedule after Year 1 will be determined as initial progress is assessed.

The Data Science Task Force will invite these faculty and staff to the STAB:
Christine Ambrosone
Stephen Edge
Andrei Gudkov
Kerry Kerlin
James Mohler
Adekunle Odunsi

The figure below describes the lines of accountability and guidance among the groups specifying and implementing this data science strategy. Dotted lines indicate key collaborations.
V. Business

The proposed organizational changes, informational modeling, and architecture and software development will require capital investment. The Task Force will continue to meet to identify costs (informed by the Working Groups), opportunities for revenue, and a process for assessing the resources required to support and integrate new research projects, technology and data resources. One outcome of this effort will be an enhanced ability to compete for larger grants and potentially more commercialization opportunities.

Pilot studies for research platform

An initial phase will develop the data science program and architecture to support two research projects that will be strengthened by the capabilities proposed in this strategy. Leveraging funds from these projects, in conjunction with RPCI resources, will enable rapid organization and implementation of first-phase data science capabilities. Early success will demonstrate the need for, and benefit of, improved data science.

Photodynamic Therapy (PDT) Registry
Principal Investigators: Drs. Sandra Gollnick and Mary Reid

An existing partnership between RPCI and Concordia Laboratories Inc. will be expanded by the funding of a new comprehensive registry. The new registry will collect data on patients who have received PDT for lung cancer or esophageal cancer, with expansion to head and neck cancer, mesothelioma, pancreatic carcinoma, brain cancer and cholangiocarcinoma. This multi-center registry can be built within the new data science environment (network and architecture), strengthened by the opportunity for both a central archive and dynamic integration of data stored at distributed sources, an interactive portal featuring both public and protected spaces, and clear data and metadata standards. The architecture will allow for integration of technologies already developed, e.g. a legacy system through which physicians share case information.

Ovarian SPORE and P50 Supplement
Principal Investigator: Dr. Adekunle Odunsi

The University of Pittsburgh Cancer Center (UPCI) and RPCI have received the NCI’s Specialized Program of Research Excellence (SPORE) for Ovarian Cancer. The focus of the SPORE is to reduce the morbidity and mortality of ovarian cancer through innovative translational research and development of new immunotherapy approaches for treatment of, and risk assessment for, ovarian cancer. A component of a P50 supplement to the Ovarian SPORE addresses the establishment of a technical infrastructure to facilitate cataloging, archiving and integration of trial and other data across the SPORE, all based on the development of an information standard. The data science needs of this supplement are fulfilled by the architecture proposed in this strategic plan. Funds from this project may be contributed to the initial phase of data science infrastructure implementation.

VI. Timeline

Feedback from research faculty expressing a strong need for the infrastructure, technology and services proposed in this strategic plan motivates initiation of activities as soon as possible. Administrative/leadership approval of this Strategic Plan will trigger the development of a formal 3–5 year project plan, outlined in the following high-level timeline.

Group / Milestone | Quarters (Q1–Q4)

Data Science Working Groups
Establish membership and initial meeting | X
Develop goals, project plan, anticipated challenges, timeline, budgetary needs | X X
Review plans with Data Science Task Force | X
Confirm required resources | X X
Implement project plan | X X X

Data Science Technical Architecture (Apache OODT)
Develop objective statement, project plan, anticipated challenges, timeline, budgetary needs | X
Confirm required resources | X X
Implement project plan | X X X

Review/Interview data science at other institutions
Make contact and organize video/teleconferences | X
Gather information | X X

Data Science Task Force
Meet, discuss plans and needs with Working Groups | X X X X
Develop full budget | X X
Meet with STAB | X X X X

Scientific and Technical Advisory Board (STAB)
Invite members and organize quarterly meetings | X
Meet, discuss progress and challenges; STAB provides feedback and guidance | X X X X
This data science strategy recommends a “best of” approach that leverages the strong expertise, technology and methods already in place at RPCI as the basis for a world-class, institution-wide data science capability. Implementing an independent ‘research’ IT division, developing educational programs and building a scalable, overarching architecture to securely tie existing technology, resources and data repositories together allows for growth and the addition of data types and technologies yet to be developed. We will enable more translational and collaborative science and provide the informatics environment required for attracting top-notch faculty to RPCI. Our goal is, through building a vibrant, capable research computing community, to become a world leader in data science for cancer research.

Data Science Task Force Participants
Kristen Anton
Chris Darlak
Sandy Gollnick
Alan Hutson
Song Liu
James Mohler
Martin Morgan
Kirsten Moysich
Laurie Musial
Anurag Singh
Everett Weiss
Camille Wicher
Appendix A: Policy Evaluation

The table below contains highlights from an existing summary of RPCI institutional policies that intersect with data science and research, and notes some missing policies. The Task Force subcommittee reviewed policies in the 900 Information Technology series, 1100 General Scientific Research series, and 1200 General Education Staff series and extracted the ten policies above as examples of policies that significantly impact research. The large number of policies makes them difficult to understand, update and enforce. The Task Force proposes the establishment of a Data Science Policies and Risk Management Working Group (DSPRM) to amend or develop enabling data science and research policies. These policies will be reasonable in number, accessible and enforceable. Effective change management and dissemination methods will be developed. DSPRM will confer with RPCI IT, data governance, ethics and administration in the course of policy development.

RPCI Policy # | Summary of policy focus and comments
905.1 | The NO PHI/subject line requirement for email usage in terms of sending attachments is not an official policy
907.1 | Officially, RPCI has three networks: Workforce Network (primary network used by anyone performing work in support of the mission of RPCI); Patient/Visitor/Vendor Network; and Demilitarized Zone Network (for IT-approved external access to special RPCI servers and services)
921.1 | IT has chosen to follow the industry best-practices construct called “The Principle of Least Privileges” (restricts a researcher’s administrative rights to his/her machines)
925.1 | All databases created for RPCI or HRI that contain PHI and/or PII data must be registered with the IT Department
937.1 | The information security organization follows the HiTrust Information Security Framework for its governance, policies, procedures, and controls. IT Security shall have a distinct budget line. Budget planning and expenditures shall be approved via the committees noted previously and be based on risk.
1107.1 | The Clinical Data Network director approves Honest Broker applications (there are many faculty and staff at RPCI serving as de facto honest brokers who are not on the officially designated list)
1208.1 | The retention of data policy may be an issue in terms of storage requirements
n/a | No policies exist pertaining to our relationship with the University at Buffalo in terms of CCR and/or utilization of the UB internet at RPCI, e.g. downstream affiliation agreements, etc.
n/a | No policies exist relative to LINUX-based issues
n/a | Documents that are deemed searchable on i2 and that are supposed to be updated yearly, as noted in the policy statements (such as the “IT Security Risk Management Plan”), are not currently found online
Appendix B. Research Computing Evaluation

Strengths, Weaknesses, Opportunities, Threats Analysis of: RPCI Research Computing Infrastructure

Strengths:

In-House Computing Cluster – RPCI fully owns a 1600-core High Performance Computing cluster composed of 97 compute nodes and 2 head nodes, 100 TB of fast production storage and 114 TB of slower storage. The cluster, purchased in 2012 with funding from the Western New York Regional Economic Development Council, has been used primarily to support a number of sequencing projects at the Center for Personalized Medicine and the Genomics Shared Resource.

IT Technology Talent – During the past several years IT has acquired a small pool of IT talent capable of supporting this type of research computing environment.

Weaknesses:

Desktop & Infrastructure Support – Desktop & IT Infrastructure support of the research computing environment is currently provided via the corporate IT Service Desk & Infrastructure teams, which also support business and clinical operations (mainly Windows). Overall, this support structure does not allow for deep-level support within a research computing environment (ex: Mac and Linux), as the Service Desk and Infrastructure teams are spread across the entire enterprise with minimal opportunity to specialize in the unique aspects of the research computing environment at RPCI.

Computing Cluster Utilization – Investigators at RPCI are not utilizing the cluster to its full computational potential in their research efforts, mainly due to 1) the lack of training/education on how to run their analyses within this environment, 2) the limitation of storage space in the era of big data (currently the cluster reaches ~90% of its storage quota every 3 months), and 3) the current desktop limitations of RPCI being primarily a “Windows” shop.

HIPAA Covered Entity – Because all of the RPCI operation (including research computing) is considered a “covered entity” under HIPAA, the research computing environment must comply with HIPAA requirements to protect the privacy and security of health information. This makes the IT environment less flexible to work in compared to other academic medical environments that have separate legal entities, such as UPMC (Medical Center vs. University) and Fred Hutch (Research Center vs. Seattle Cancer Care Alliance).

Opportunities:

Upgrading RPCI Bioinformatics Capabilities – Many of the ‘wet lab’ components of the RPCI research operation are highly productive and creative in their endeavors. Complementing those efforts with easier-to-use Bioinformatics resources & technology would upgrade the research efforts at RPCI overall, allowing researchers to focus on their research and not on the mechanics of ‘how to’ store and analyze their research data within the RPCI environment. An upgraded bioinformatics environment would include enhanced training, tools, technology and focused IT support, additionally making the cluster publicly available to the whole RPCI research community.

Investigator Trust – Many NCI-funded grants have only a small human subjects component, and many of our basic scientists do not work with protected health information (PHI) in any capacity. However, RPCI does not extend that trust to them, and requires them to work in the same IT environment as the clinical workforce, with the IT security measures appropriate for protecting regulated data. By recognizing that a significant portion of our scientific community does not handle regulated data in their research, RPCI could trust scientific workforce members to perform their research in a significantly more flexible (less secure) IT environment, similar to their research colleagues at many other academic institutions.

UB COE – The NYS Center of Excellence in Bioinformatics and Life Sciences (CBLS) is a hub for life sciences innovation and technology-based economic development, driving scientific discovery and facilitating collaboration among academia, industry and the public sector to create jobs that directly impact the region’s and state’s economies. Given the proximity of the UB COE and RPCI, an enhanced (more formal) relationship between the two entities may be advantageous for Roswell Park Cancer Institute.

Threats:

Use of Big Data – The ability to analyze “Big Data” is crucial in today’s cancer research space. A panel set up by President Obama’s Moonshot program suggests that cancer cures could lie within known big data, stating that we must make better use of our existing research data. The current state of RPCI’s Research Computing Infrastructure could put the institute at a disadvantage in this regard compared to other organizations better suited to process & analyze various types of big data.
Appendix C. Data Repository Evaluation

The table below contains a sample of the data repositories in place at RPCI. The repositories range from small to large, may be organized in database or spreadsheet format, may run on RPCI servers or on individual machines, and are organized by individual researchers or by IT Research Computing/CDN staff. Our evaluation has allowed us only to estimate the number and kinds of data repositories at RPCI. The number of databases being generated at RPCI is growing steadily.

Database Name | Supporting Software
Photodynamic Therapy (PDT) Database | Access
EHR - Electronic Health Record (SCM) | Allscripts EMR
EHR Analytics Database | Allscripts EMR
Laboratory Medicine Resulting - Cerner | Cerner - Oracle
Psychology Datasets | Excel
Pediatric Long Term Follow-up Database | Excel
Clinical Genetics Database | Excel
Melanoma Lymph Node Metastasis Database | Excel
NCCN Non-Hodgkin's Lymphoma Outcomes Database | Excel and CD discs
NCCN Breast Cancer Outcomes Database | Excel on CDN share drive and CD discs
NCCN Colorectal Outcomes Database | Excel on CDN share drive and CD discs
NCCN Non-Small Cell Lung Cancer Outcomes Database | Excel on CDN share drive and CD discs
Bladder (Robot-assisted Radical Cystectomy) Database | EXPeRT - Oracle and web
Prostate Cancer Database | EXPeRT - Oracle and web
Ovarian SPORE P1 | EXPeRT - Oracle and web
Stacey Scott Lung Cancer Registry | EXPeRT - Oracle and web
Pancreas Database | EXPeRT - Oracle and web/REDCap
Clinical Trials Data Management System | EXPeRT - Oracle and web
HemOnc Biobank Database | LIMS - Oracle
Ovarian Biobank | LIMS - Oracle
Ovarian Familial Cancer Registry | LIMS - Oracle
DBBR biobank | LIMS - Oracle
DBBR Annotation Data Set (DADS) | LIMS - Oracle
PRN Biobank Database | LIMS - Oracle
Invision Registration and Billing System | Mainframe system
Cancer Registry | Metriq
Gamma Knife Database | MS Access
TIES | mySQL
Anatomic Pathology System (PowerPath) | Oracle
Radiation Medicine Clinical Care Database | REDCap
Head and Neck Cancer Database | REDCap
Breast Screening and High Risk Management Database | REDCap
Renal Tumor (Kidney) Database | REDCap
Center for Immunotherapy - Clinical Database (ovarian database) | REDCap
Autofluorescence Screening (High Risk Oral) Database | REDCap
Tobacco Cessation Database | REDCap
Melanoma Database | REDCap
New York State Sepsis | REDCap
Ovarian SPORE P4 | REDCap
GU Medicine: Kidney | REDCap
GU Medicine: Urothelial | REDCap
GU Medicine: Prostate | REDCap
H&N Melanoma | REDCap
Liver Surgical | REDCap
Esophageal Database | REDCap
Renal Tumor Patients on Active Surveillance | REDCap
Spine Surgical | REDCap
NYSTEM - multiple databases | REDCap
Pediatric Database | REDCap
Quality of Life Surveys | REDCap
Survivorship | REDCap
HIPEC | REDCap
Financial billing database - Pinpoint | SQL
HemOnc Clinical Database | SQL and Access
BMT Database | SQL back end, Access front end
Breast Program Database | SQL back end, Access front end
General Thoracic Surgery Database | SQL back end, web front end