Workshop - finding and accessing data - Cambridge August 22 2016

Page 1

We are always looking for data

Finding and accessing human genomic data for research

Cambridge, 22nd August 2016

Slides will be made available online

Tweets welcome #CamFindData

Page 2

Outline of the day

- Data sources and data access (Charlotte)
- Case study: University of Cambridge
- Coffee break
- Introduction to Repositive (Fiona)
- Hands-on session: searching for data
- Round up and closure

Page 3

Online tools used during the workshop

To ask and answer questions during the presentation:

go to slido.com

enter event code: 1641

To leave feedback on the workshop:

http://tinyurl.com/feedback220816

Page 4

We are on Twitter:

@glyn_dk

@repositiveio

@DNAdigest

@CamOpenData

Cambridge, 22nd August 2016

Slides will be made available online

Tweets welcome #CamFindData

Page 5

1. What data are you looking for?

Join at slido.com with the event code #1641

This workshop will focus on finding and accessing human genomic data.

… why would you be looking for genomic data for your research?

Page 6

How much data do you need to publish a paper?

2001: 1 human genome

2012: 1000 Genomes (1092 genomes, since increased to ~2500)

2015: UK10K & deCODE (>100k individuals); Cancer Genome Atlas (~11,000 genomes); ExAC consortium (65,000 exomes)

?

Page 7

Case studies

Raquel, PhD Student, London, UK.

Researching genes associated with rare eye disorders.

Problems:
- Doesn’t know where to look for data.
- Doesn't know if data even exists.

“I gave up on finding the data - it was very time consuming and not proving fruitful – so I started focusing more on generating my own data.”

Mahantesh, Academic Researcher, Taipei, Taiwan.

Studying pharmacogenomics in cardiovascular epidemiology.

Problems:
- Needs lots of data.
- Knows it exists but struggles with getting access to it.

“Often it’s very hard to get the required number of cases and controls to carry out research in public health and epidemiology.”

Jana, Company Biocurator, Zurich, Switzerland.

Biocurating microarray and RNA-Seq data.

Problems:
- Needs lots of data.
- Lots of data out there but hard to filter down to ‘useful/relevant’ data.

“Many repositories don’t list the metadata details I need to know if a dataset is useful to me, I can waste a lot of time searching.”

Page 8

What can I do?

PRO TIPS:

Involve a statistician early on in your study design!

Include more reference data in your analysis

Search for collaborators who have the data you need

Tell your colleagues and peers what type of data you have in your lab

Use external sources of data…

Page 9

Large amounts of data, but not accessible

[Chart: WGS data available in public repos, 2006 to 2015; exponential growth rate]

≈0.5 PB of sequence available

80+ PB sequenced every year

Under-utilised data has huge potential for medical research

Page 10

2. Data resources from around the world

Public repositories

• some you apply for access, especially if data contains clinical info or whole genome PID

• some are open access: GEO, SRA, PGP, OpenSNP, GigaDB, …

• some are consented for general research use, some have specific consent

Page 11

How many data sources?

How many sources of human genomics data do you know about?

Page 12

Hundreds of data sources… but they aren’t easy to find!

First 30 data sources listed here: http://dx.doi.org/10.1371/journal.pbio.1002418

[Chart: number of data sources, Jan-15 to Jun-16 (axis ticks Jan-15, Mar-15, Jun-15, Sep-15, Dec-15, Mar-16, Jun-16; y-axis 0 to 300); data labels: 10, 25, 33, 35, 102, 174, 239]

Page 13

DATA is fragmented

Page 14

Data sources across the globe

Geolocation of 278 data sources analysed.

Found by tracking IP address of the source.

These include:
- Public repositories
- Universities
- Companies
- Biobanks
- Research consortiums

Page 15

It may be confusing

Page 16

Data source content

Assay Types

Dedicated to…

Page 17

More information about data sources

… in our recent paper:

http://tinyurl.com/plos-biology-repositive

Page 18

3. Getting access to restricted data

Benefits:
• Strict governance
• Individuals are protected
• Review of consent
• Applicant signs for full responsibility for governance

Disadvantages:
• No control of data once access is given
• High barrier for access – too high?

Page 19

Data accessibility

Can download the data straight away or after logging in.

Need to apply for access to the data.

Has both Open and Restricted access data within one repository.

Access type of 225 sampled data sources.

Page 20

Often a long process

Bottlenecks:
• Finding relevant and usable data
• Getting authorisation to access data
• Formatting data
• Storing and moving data

We studied the problem with qualitative interviews followed by a survey of researchers in human genetics

T. A. van Schaik et al., The need to redefine genomic data sharing: a focus on data accessibility, Applied & Translational Genomics, 2014, doi:10.1016/j.atg.2014.09.013

Page 21

Often a long process

Researchers spend months trying to find and access genomic data, and often choose not to access data at all

Page 22

dbGaP application process

[Flowchart]

Checks (Yes / No): NIH / eRA Commons login; Organisation registered with eRA; Organisation has DUNS number

Steps: Write research proposal; Submit proposal; Access granted; Find/Download/Decrypt data; Science…

Time added along the way, in slide order: + 2-3 days; + 1-2 weeks; + 1 week; + 1-2 days; + 1-4 weeks; + 1-2 days

PRO Tip: If you use human genomic data, apply for the GRU datasets in dbGaP: one application gives access to all the GRU datasets.

Blog post: http://blog.repositive.io/how-to-successfully-apply-for-access-to-dbgap/
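
As a rough illustration, the "+" time estimates from the dbGaP flowchart can be added up to get a sense of the overall wait. The sketch below is in Python and rests on assumptions that are not on the slide: every step is needed, the steps happen one after another, and a week counts as 7 days. The names `ESTIMATES`, `to_days` and `total_range` are made up for this example.

```python
# Rough total for the dbGaP application timeline, using the "+ ..." estimates
# from the slide. Assumptions (not from the slide): every step is needed,
# steps run one after another, and 1 week = 7 days.

DAYS_PER_WEEK = 7

# (label, shortest, longest, unit) in slide order. The labels are only
# positions, because the slide text does not tie each estimate to a named step.
ESTIMATES = [
    ("estimate 1", 2, 3, "days"),
    ("estimate 2", 1, 2, "weeks"),
    ("estimate 3", 1, 1, "weeks"),
    ("estimate 4", 1, 2, "days"),
    ("estimate 5", 1, 4, "weeks"),
    ("estimate 6", 1, 2, "days"),
]


def to_days(value, unit):
    """Convert a duration given in days or weeks to days."""
    return value * DAYS_PER_WEEK if unit == "weeks" else value


def total_range(estimates):
    """Return (shortest, longest) total duration in days."""
    shortest = sum(to_days(lo, unit) for _, lo, _, unit in estimates)
    longest = sum(to_days(hi, unit) for _, _, hi, unit in estimates)
    return shortest, longest


if __name__ == "__main__":
    shortest, longest = total_range(ESTIMATES)
    print(f"dbGaP application: roughly {shortest} to {longest} days")
```

Under those assumptions the total comes out at about 25 to 56 days. Someone who already has an eRA Commons login would presumably skip some of the early steps, so this is an upper-end picture rather than a prediction.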

Page 23

EGA application process

[Flowchart]

Check (Yes / No): Sanger eDAM Account

Steps: Write research proposal; Submit proposal; Access granted; Find/Download/Decrypt data; Science…

Time added along the way, in slide order: + 1 hour; + 1-2 days; + 2-7 days; + 1-2 days

Blog post: http://blog.repositive.io/how-to-successfully-apply-for-access-to-ega/

Page 24

Cambridge specific Case Study

• Postdoctoral researcher at University of Cambridge Medical School
• Working on genetic inheritance and cancer
• Using NGS data and bioinformatics
• After searching for data online she decided to apply for:
  • 2 dbGaP datasets
  • 3 EGA datasets

Blog post: Pending… will be on http://blog.repositive.io/

Page 25

Cambridge specific Case Study

The Research Operations Office will help you with the contracts (DTAs) and signatures.

• Has a designated individual who processes all dbGaP applications, as they all abide by NIH legal restrictions and regulations about how to handle the data once granted access.
• For EGA applications, each DTA must get processed separately because there is no consensus for the ‘contracts’ between each dataset.

Blog post: Pending… will be on http://blog.repositive.io/

Page 26

Cambridge specific Case Study

The nominated IT director will be specific to your department.

• They will need to confirm you can support the requirements of the DTA.
• If the head of your departmental IT is not happy to sign – the head of IT for the University will be able to sign it off.

Blog post: Pending… will be on http://blog.repositive.io/

Page 27

Cambridge specific Case Study

Top Tips: Be prepared…

• Think about your storage space!
• Think about what sort of analysis and processing you are going to do with the data once you do have it. After such a long process, the approval could be too quick!!
• Designate time!
• Understand what you need before you start the application process!
• You only have 1 year!

Page 28

4. Not all data is restricted

Applying for access to restricted data is a hard and time-consuming process.

Think about using open access data!

Page 29

Make the (research) world a better place by sharing in return

Best practices: Share in return!

Page 30

Best practices

• If you expect data to be available to you – you have to make your data available too!
• Encourage collaborations: power by numbers

1. Get credit – publish and make your data available
2. Give credit – cite data sources
3. Understand consent – for all uses of clinical data

Page 31

Best practices: use the tools

• Use all available tools to make your life easier:
  • Data publications: visibility and citations for your data, e.g. GigaScience and Scientific Data
  • Figshare, Zenodo, Dryad for sharing open access data
  • PhenomeCentral, Matchmaker Exchange for rare disease research
  • Repositive for finding data across repositories and making your own data discoverable

Page 32

What the future holds

• Digital consent: towards automatic processing of applications
• Dynamic consent and power to the patient, e.g. Patients Know Best
• Privacy-preserving access to datasets: preserving control and governance with data custodian, lower barrier for access

Page 33

Workshop: Finding and accessing human genomic data for research

Fiona Nielsen – August 22nd 2016

Page 34

We are always looking for data

Genetics, Cancer, Rare disease research

We need access to the right data at the right time

DNA interpretation requires lots of data

Page 35

Data is not easy to find and access

FRAGMENTED
Poor visibility of available genomic data

ADMIN BURDEN
Huge overhead to manage data access

BAD CULTURE
Lack of data sharing habits in research culture

Page 36

We are enabling best practices

MAKE DATA DISCOVERABLE

SIMPLIFY WORKFLOWS

CONTRIBUTE TO COMMUNITY

DNAdigest and Repositive – Connecting the world of genomic data

http://www.tinyurl.com/plos-biology-repositive

Page 37

Connecting the world of genomic data

Page 38

Live demo http://discover.repositive.io

Page 39

Team 2-minute presentation

1. Introduction: What data did you try to find and why? Have you tried to search for this data before?

2. Methods: The 5 main steps you took on Repositive to try and find this data.

3. Results: Did you find the data on Repositive? What challenges did you encounter?

4. Conclusion: Sum up your experience in 1 sentence.

Page 40

Tell us your thoughts: @repositiveio

@glyn_dk

And read more on http://repositive.io

Bugs and feedback to: Charlotte at Repositive.io

Page 41

Thank you!