Genomics resources: Feeding your inner bioinformatician
Associate Professor Mik Black, Department of Biochemistry, University of Otago

Feeding your inner bioinformatician · bioinformatics.org.au/ws/wp-content/uploads/sites/...


Brief aside: who am I?

· Background in statistics: my (rather diverse and collaborative) research involves the development and application of statistical methods for problems in human disease genomics.
· Heavily involved in the establishment of two government-funded national infrastructure initiatives in New Zealand:
  - NZGL (New Zealand Genomics Ltd) - inter-university collaboration in genomics and bioinformatics.
  - NeSI (NZ eScience Infrastructure) - cross institutional (universities and Crown Research Institutes) collaboration in high performance computing and eResearch.
· Formerly the bioinformatics team leader for NZ Genomics Limited (2012), and still a semi-active team member.

Some backstory, or "how did I get this gig?"

· This slot used to be about "Genomics Infrastructure"
  - Using external providers to generate your sequence data
  - Options (and caveats) for outsourcing the bioinformatic and/or statistical analysis of your data
· This year I wanted to focus more on skills development - why the change?
  - We are all "Biological Data Scientists" - genomic data analysis is a core component of modern molecular research.
  - Programming, version control, Open Science, reproducible research: these are core skills and concepts that are relevant for ALL researchers.

BUT: outsourcing some of the analytic workload isn't necessarily bad - let's still consider that as an option.

Overview - this talk will cover...

· Bioinformatics service provision: outsourcing the analysis of genomic data.
· Community resources: doing bioinformatics without (quite) being a bioinformatician
· Training: growing your computational skill set.

But no Game of Thrones...

First though: I'm a statistician - let's generate some data.

Outsourcing your bioinformatics

· What are you outsourcing?
  - Quality assessment and basic bioinformatics?
  - Generic data analysis?
  - Domain-specific analysis?
  - Tailored analysis for your specific question?
· Make sure a full analysis plan is in place before committing to the work.
  - If possible, have the plan inspected by an independent "expert".

DON'T underestimate the value that an expert team of bioinformaticians can bring to your project, but DO make sure you know what you will be getting from them (and the cost...)

Outsourcing your bioinformatics

· Quality assessment: this will usually be provided with the data. Don't be afraid to ask for more information (and even more QA).
· Basic bioinformatics: e.g., quality trimming/filtering and alignment to a reference genome - make sure you are very clear about what you want (if you know): trim/filter parameters, genome build, organism (!), aligner, parameters...
· Generic data analysis: e.g., variant calling, differential expression, etc.
· Tailored analysis for your specific question: this requires the most specification (and input from you) and should involve a provider with a background in this area.
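The trim/filter parameters mentioned above can be made concrete with a small sketch. The Python below is an illustrative assumption, not the method any particular provider uses: it cuts a read at the first base whose Phred quality falls below a threshold, then discards reads that end up too short. The function names, the threshold of 20, the minimum length of 30 and the Phred+33 encoding are all choices made for this example; production trimming tools typically apply more elaborate rules.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def trim_read(sequence, quality_string, min_quality=20, min_length=30):
    """Trim a read at the first low-quality base; return None if it becomes too short.

    min_quality and min_length are illustrative defaults, not recommendations.
    """
    scores = phred_scores(quality_string)
    keep = len(scores)
    for i, q in enumerate(scores):
        if q < min_quality:
            keep = i  # cut just before the first base below the threshold
            break
    if keep < min_length:
        return None  # read is filtered out entirely
    return sequence[:keep], quality_string[:keep]


# Toy read: 'I' encodes Phred 40 and '#' encodes Phred 2 under Phred+33.
read = "ACGT" * 10            # 40 bases
quals = "I" * 35 + "#" * 5    # the last 5 bases are low quality
trimmed = trim_read(read, quals)
```

Even in a toy like this, the outcome depends on `min_quality`, `min_length` and the quality encoding, which is why these details (along with genome build and aligner settings) are worth agreeing on with a provider in advance.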

ALWAYS specify that you require the code used to perform the analysis - you need to know what was done every step of the way.

Brief aside - publishing

· When outsourcing genomics and bioinformatics work, discuss publishing expectations up front.
  - some researchers feel that "fee-for-service" work does not constitute a "meaningful contribution" to a paper.
  - other researchers treat "service providers" more like collaborators.
· There are some advantages to including genomics and bioinformatics personnel on publications:
  - access to deep expertise (can be particularly helpful at review time).
  - collaborative approach can lead to greater engagement in the work being done.

Community resources: what is available?

· Databases/Browsers (the big players)
  - NCBI (http://www.ncbi.nlm.nih.gov/)
  - Ensembl (http://www.ensembl.org/)
  - UCSC (https://genome.ucsc.edu/)
  - MANY domain-specific options (check Nat Gen annual DB issue)
· Software tools:
  - GenomeSpace (Galaxy, GenePattern, Cytoscape,...)
  - R/Bioconductor
  - That scary command line thing....
· Generic data sources:
  - GEO, ArrayExpress, inSilicoDB, SRA, dbGaP, EGA...

GenePattern: web-based analysis platform
http://www.broadinstitute.org/cancer/software/genepattern/

Galaxy: web-based analysis platform
https://usegalaxy.org/

GenomeSpace: joining up the cool stuff...
http://www.genomespace.org/


Training - "I'm sure I can do it better..."

· Most researchers aren't looking to outsource the investigative component of their research.
  - Data analysis is a fundamental part of scientific investigation.
  - Many investigators want to "own" the entire data processing/analysis process, others don't.
· Training: there are MANY opportunities available for up-skilling:
  - Bioplatforms Australia: http://www.bioplatforms.com.au
  - Institute based (e.g., IMB Winter School...)
  - Software Carpentry: http://software-carpentry.org/
  - NZGL: http://nzgenomics.co.nz
  - Online courses (what to choose...??)

Training - know what you need

· Bioinformatics is a VERY broad field: what is it that you want to learn?
  - Early-stage analysis: QA/QC, alignment
  - More specialized: variant calling (SNPs, CNV, other SV), assembly, RNA-seq count generation, metagenomics...
  - Further downstream: analysis of "processed" data (clustering, prediction, pathways, network reconstruction, phylogeny...)
· Specialization is a GOOD thing, if you can afford it
  - it's great to be a jack-of-all-trades, but the "master-of-none" trade-off can be a problem.
  - makes sense to invest your time where it will be most effective: learn the skills most relevant to what you are trying to accomplish with your research.

Training - "I'm sure I can do it better..."

DO take advantage of training opportunities, but DON'T overestimate what is being provided.

There is only so much we can teach in a few days...

Learn "enough to be dangerous", and then find a good "bioinformatics buddy" to keep you from going astray - know your limitations.

Upskilling: a case study (my group)

· Small research group of relatively junior graduate students and research assistants
  - Mix of computer science, statistics and biology/genetics backgrounds.
  - Similar/related research projects, and common needs in terms of skills development
· Common requirements across projects
  - programming skills
  - version control
  - collaboration tools
  - reproducible research
· None of these areas is specific to bioinformatics/genetics/genomics.

What were/are our needs?

· Unix shell
  - general usage for data manipulation
  - scripting for basic automation
· R
  - general statistical analysis (esp. linear models)
  - genetics/genomics data analysis techniques
  - data visualisation
  - reproducible research
· HPC cluster access
  - simulations and permutations/resampling
  - embarrassingly parallel...
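To illustrate why permutation/resampling work is "embarrassingly parallel", here is a minimal Python sketch of a two-group permutation test split into independent chunks. It is an assumption-laden toy rather than the group's actual workflow: the data, function names, chunk sizes and per-chunk seeding scheme are all invented for the example. Each chunk needs only the data and its own seed, so chunks can run on separate cores or cluster nodes and their counts are simply summed at the end.

```python
import random
from statistics import mean


def count_extreme(x, y, observed, n_perm, seed):
    """Run n_perm label permutations; count statistics at least as extreme as observed."""
    rng = random.Random(seed)  # independent, reproducible stream per chunk
    pooled = list(x) + list(y)
    n_x = len(x)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_x]) - mean(pooled[n_x:]))
        if diff >= observed:
            extreme += 1
    return extreme


def run_chunk(args):
    """Unpack one chunk's arguments (top-level function, so it can be pickled)."""
    return count_extreme(*args)


def permutation_pvalue(x, y, n_chunks=4, perms_per_chunk=250, map_fn=map):
    """Two-sided permutation p-value for a difference in group means.

    Chunks are independent, so map_fn can be swapped for a parallel map.
    """
    observed = abs(mean(x) - mean(y))
    jobs = [(x, y, observed, perms_per_chunk, seed) for seed in range(n_chunks)]
    extreme = sum(map_fn(run_chunk, jobs))
    total = n_chunks * perms_per_chunk
    return (extreme + 1) / (total + 1)  # add-one correction avoids p = 0


# Invented toy measurements for two groups.
control = [4.1, 5.0, 4.7, 5.2, 4.4, 4.9]
treated = [6.3, 6.9, 6.1, 7.2, 6.6, 6.8]
p_value = permutation_pvalue(control, treated)
```

The default `map_fn=map` runs the chunks one after another. On a multi-core machine, passing the `map` method of a `concurrent.futures.ProcessPoolExecutor` (created inside an `if __name__ == "__main__":` guard) would spread the chunks across processes; on a cluster, each chunk could instead be submitted as a separate job.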

A non-sustainable training model...

· During the second half of 2014 I prepared training sessions for my research group on:
  - dplyr/tidyr
  - reproducible research
  - ggplot2
  - ggvis
  - shiny
  - Bayesian modelling with JAGS
  - genomic data visualisation
  - linear algebra and linear models
· That was exhausting...

A better approach

· Software Carpentry
  - Unix Shell
  - R/Python
  - Git
  - MySQL
· Regular workshops offered throughout Australia and New Zealand
  - Australia:
    - Belinda Weaver ([email protected])
    - Damien Irving ([email protected])
  - NZ: NeSI (Aleksandra Pawlik: [email protected])
· Data Carpentry (more domain-focused) now also offered.

Software Carpentry

"Since 1998, Software Carpentry has been teaching researchers in science, engineering, medicine, and related disciplines the computing skills they need to get more done in less time and with less pain."

http://software-carpentry.org

· Trained instructors
· Comprehensive lessons

Data Carpentry

· In May 2014, the first "Data Carpentry bootcamp" was taught:
  - "Data Carpentry develops and teaches workshops on the fundamental data skills needed to conduct research."
  - sibling organisation to Software Carpentry
  - http://www.datacarpentry.org/
· We now use the Data Carpentry material to give our incoming 4th year Biochemistry and Genetics students a two day "crash course" in data analysis with R.
  - Gives them the basic tools needed for the analytic components of their 4th year projects.
  - Prepares them to take a Software Carpentry workshop later in the year.

Software Carpentry

· SWC instructor training, Melbourne, Jan 2015
  - Two group members and I attended.
  - Became certified SWC instructors.
· Inaugural Research Bazaar (ResBaz), Melbourne, Feb 2015
  - postgraduate students and early/mid-career researchers
  - SWC training + many other workshops

Saved by SYSKA

· We now had three trained SWC instructors in our extended research group
  - the students were taking over!
  - the senior students were now able to train others... and so were the junior students
· Time for SYSKA: Sh*t You Should Know About
  - rotating weekly slots
  - short (20-30 minute) presentation (by student) to group on something useful or topical
  - Python Tricks, Vim vs Emacs, NeSI HPC, Shell tricks, RMySQL, LaTeX, dplyr (again)...

Expanding: Mozilla Study Groups

· Announced in April 2015 by Mozilla Science Lab
  - skill sharing and idea discovery
  - community support
  - lots of introductory lessons: https://github.com/mozillascience/studyGroupLessons

Additional lessons: Bioconductor course material
http://bioconductor.org/help/course-materials/2016/

(Our) Mozilla Study Group format

· Otago-based Mozilla Study group takes the student-led training beyond our immediate research group.
· Fortnightly meetings
  - 4 session rotating format:
  - 2 weeks of nominated topics: hands-on coding
  - 1 week of hacky hour
  - 1 week of 5x5 lightning SYSKA
· Lightning SYSKA!
  - 5 presenters, 5 minutes each
  - Present a cool topic
  - Use topics/interest to decide content for future lessons

Other events: Research Bazaar

· Digital skills training for graduate students and early career researchers.
  - First held at University of Melbourne in February 2015.
  - Sites throughout NZ and Australia (and the Americas) in 2016.
  - Look for "ResBaz Week" in February 2017.
· Site-specific programme
  - Generally a Software Carpentry core, plus more advanced lessons for SWC "graduates"
  - Keynote presentations
  - Modules on a broad range of digital skills and tools.
· Fantastic opportunity for upskilling (especially as a group), and meeting other members of the research community.

My group: where are we now?

· Senior group members are competent SWC instructors or helpers
  - Good study group attendance
  - Extended research group is becoming competent with core digital research tools (Shell, R, Git)
  - We have a solid collection of training materials (both general, and domain-specific)
  - Presentations are hands-on: major advantage (and a good step forward)
· Reproducible Research and Open Science concepts/techniques are starting to be used more frequently.

Reproducible research

· We are currently (I hope) in the midst of a "reproducibility revolution"
  - increased emphasis on sharing all aspects of our research.
  - strong emphasis on the use (and development) of open source tools that build on existing frameworks.
  - move (by many) towards the use of frameworks for ensuring that we are doing "reproducible research".
· The R computing environment provides a good example of this, but there are a number of others (e.g., IPython notebooks).
  - RStudio (http://rstudio.com) includes R markdown by default.
  - Facilitates the production of high-quality output (HTML, PDF, even Word!) with embedded analysis and results.
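The idea of output "with embedded analysis and results" can be illustrated outside R as well. The Python sketch below is an analogy to an R markdown document, not R markdown itself: the numbers in the report text are computed from the data when the report is rendered, so rerunning the script with the same seed regenerates identical output. The simulated data, the seed, the function names and the report wording are all invented for illustration.

```python
import random
from statistics import mean, stdev


def simulate_expression(n, seed):
    """Generate toy 'expression' values; a fixed seed makes the run repeatable."""
    rng = random.Random(seed)
    return [rng.gauss(10.0, 2.0) for _ in range(n)]


def render_report(values, seed):
    """Build a Markdown report whose numbers come straight from the analysis."""
    lines = [
        "# Toy expression summary",
        "",
        f"- samples: {len(values)}",
        f"- random seed: {seed}",
        f"- mean: {mean(values):.3f}",
        f"- standard deviation: {stdev(values):.3f}",
    ]
    return "\n".join(lines)


SEED = 2016
report = render_report(simulate_expression(50, SEED), SEED)
print(report)
```

Because no number is typed into the report by hand, the text cannot drift out of step with the analysis, which is the property that makes this style of document useful for reproducible research.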

RStudio interface
http://rstudio.com

R markdown output
http://rmarkdown.rstudio.com/

Tools for collaboration

· GenomeSpace (e.g., Galaxy and GenePattern) provides domain-specific tools for the collaborative sharing of data and analyses.
· A number of groups combine cloud-based tools in an ad hoc fashion to generate a collaborative research environment:
  - storage provision (e.g., Dropbox*, Google Drive*, FigShare*)
  - code sharing/editing + version control (e.g., Git/GitHub, Bitbucket)
  - reproducible research (e.g., R markdown, IPython notebooks)
  - shared/collaborative web-based analysis (e.g., RStudio Server, Shiny Server).

* Note potential data security/privacy issues.

Although seemingly haphazard, this approach provides a lot of flexibility for incorporating new tools as they emerge.

Summary

· Outsourcing your analysis - know what you are getting:
  - Clearly define plans and expectations in terms of the data and analysis that you are paying for.
  - Ensure you have the resources needed to complete the project.
· Shared resources - know what is available:
  - Generic and domain-specific resources exist that can facilitate, streamline and complement your research.
· Personal workflow - know what you are doing:
  - Upskill yourself: interact with your research community.
  - Know your tools, and develop (and follow!) an analysis plan.
  - The "reproducible research" paradigm offers a valuable set of resources to help ensure reproducibility.

A (non-exhaustive) list of useful local links:

· Australia:
  - QFAB: http://qfab.org
  - AGRF: http://agrf.org.au
  - QCIF: http://www.qcif.edu.au
  - Combine: https://combine.org.au
  - Aus. Bioinformatics Network: http://australianbioinformatics.net
  - Bioplatforms Australia: http://bioplatforms.com.au
  - Ensembl/EMBL resources (local): https://www.embl-abr.org.au
· New Zealand:
  - NZGL: http://nzgenomics.co.nz
  - Bioinformatics Institute: http://www.bioinformatics.org.nz
  - NeSI: http://nesi.org.nz