NHS R Users: The Art of the Possible

Dr. Andrew Hill, Clinical Lead for Stroke Services, St Helens and Knowsley Teaching Hospitals NHS Trust, United Kingdom

Email: [email protected]



Aims of this talk

• WHY might we want to use R?

• HOW might we want to use R?

• How do we build sustainable workflows and reusable tools?

• What is possible with R with good workflow?

• SSNAP R tools – a case example of complex workflow and development needs


It looks like you’re trying to develop a complex data analysis process. Would you like help?

• Get help with writing the complex data analysis process
• Just type the complex process without help.

Top tip for presentations: test the cultural reference age of your audience early…


https://pasokonga.com

You Can Do Anything In Excel


USE EXCEL

DATA

^


So Why Do People Go From Excel To R?

• Easy to break a complex spreadsheet.
  • User interaction and workflow design are not separated.
  • ‘I’ll just protect all these cells, and hide these rows and columns, and put this data validation in… etc.’
  • This adds complexity just to stop people breaking things, and hides the methodology from other users AND future maintainers.
  • User workarounds (‘I need it to do X so I’ll just do this…’) inevitably break it.
• Limited charting options.
• Limited output options / customisation.
  • ‘I’ll just use this VBA code / use these macros…’
  • Adds complexity, harder to maintain…
• Very hard to scale up for larger tasks.
• BAD workflow.
• Very poor reusability.


What Does ‘Good’ Look like?

• High reproducibility
• High scalability
• High re-usability
• Be the ‘blacksmith’ not the consumer – you craft the tools so your consumers can get at their data.
• High flexibility over output styles according to task:
  • Office formats for wider acceptance by PHBs.
  • PDF for reports.
  • Ability to modify for variants on the original query.
  • Realtime dashboards.

• Supports Information Governance best practices.


Development of SSNAP R Tools

• The Sentinel Stroke National Audit Programme (SSNAP).
• HQIP-commissioned; run by RCP (2012-2018); KCL (2018-).
• Continuous, high-quality national audit of stroke care in England, Wales and NI since 2013.
• Measures performance across all acute and community stroke providers with >98% data completeness.
• 250,000 records to date. >300 data fields per record; converted into over 1,000 aggregated measures in a full analysis ‘portfolio’ per team. (Most reporting focuses on 45 Key Indicators across 10 Domains of care.)
• From >200 teams.
• Many stakeholders with different needs and levels of understanding:
  • Individuals within a team understanding operational performance.
  • Senior members of teams undertaking QI work.
  • Clinical Executive Boards.
  • Local reporting (CCG / regional comparison).
  • National reporting (NHSE / CQC / stakeholders).
  • Major data source for large-scale stroke research.
  • Accessible by the public to understand quality of stroke care in their area, including those with aphasia (communication problems due to stroke) or visual problems.


Where are we?

• Data input on a web tool.
• Data storage in an SQL Server.
• Snapshots of data taken on ‘lock dates’ for analysis and exported as CSV.
• Analysis scripts in Stata to produce tables of data.
• Outputs from Stata -> Office format documents.
• VBA scripts to tidy up the documents.
• Manual intervention to run scripts, perform checking etc.
• Approximately 1 month from lock date to publication.
• Analyst time intensive.

Where do we want to be?

• Modernise the toolchain for maintainability.
• Minimise need for analyst intervention - fully automate process where possible.
• Allow ‘outliers’ to be easily identified at national level.
• Improve IG measures – remove CSV files, ideally remove the need for individuals to deal with patient-level records at all.
• Work towards realtime dashboarding.
• Allow us to change dataset / add / remove measures easily.
• Allow us to ideally move away from manual data entry and towards taking data more directly from provider EHRs.
• Use the wide range of plot options with ggplot2 to improve design of outputs.


Use Source Control

• Source Control is critical to having a reproducible and scalable project.

• I use GitHub – open source projects are free; scaled license price scheme for commercial projects.

• Alternatives include GitLab, Bitbucket, or an in-house Git server.

• Tutorial on using Git with RStudio: http://r-bio.github.io/intro-git-rstudio/

• Allows you to collaborate safely on your work and share your work with others.


Write Your Reusable R Analysis As Packages Rather Than Scripts

• Packages allow you to share your code more easily with other R users.

• The workload is minimal:
  • Hadley Wickham’s book ‘R Packages’ (freely available online) helps explain all the core concepts: http://r-pkgs.had.co.nz/intro.html

• Helps avoid bad habits such as hardcoded file paths:
  • Resources you may need live inside the package.
  • Your analysis becomes a function.
  • Your raw data pathname is a parameter to your function:

mypackage::myanalysis("C:/MyRawData.txt")
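As a minimal sketch of that pattern: the function name `my_analysis()` and the columns `team` and `los_days` below are invented for illustration and are not taken from any real package. The point is only that the data location arrives as an argument.

```r
# Hypothetical analysis function: the raw data path is an argument,
# so nothing in the code depends on one person's machine.
my_analysis <- function(path) {
  stopifnot(file.exists(path))
  raw <- read.csv(path, stringsAsFactors = FALSE)
  # Mean length of stay per team (column names are assumptions).
  aggregate(los_days ~ team, data = raw, FUN = mean)
}

# Usage with a throwaway file standing in for the real raw data.
demo_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(team = c("A", "A", "B"), los_days = c(2, 4, 7)),
  demo_path,
  row.names = FALSE
)
result <- my_analysis(demo_path)
```

Because the path is a parameter, the same function can run unchanged on a colleague's machine or in an automated job.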

If the first line of your R script is

setwd("C:\Users\jenny\path\that\only\I\have")

I will come into your office and SET YOUR COMPUTER ON FIRE 🔥.

If the first line of your R script is

rm(list = ls())

I will come into your office and SET YOUR COMPUTER ON FIRE 🔥.

Jenny Bryan, IASC-ARS/NZSA R Conference / Twitter #rstats


Break Off ‘Generic’ Components And Share Them

• Projects live in packages and are stored under version control.

• Many operations are common to many organisations.

• Where possible, share code which is of use to others as a package.

• Publish it on GitHub / CRAN (you will need your employer’s permission).

• Encourage others to use the same code – and contribute improvements.

• Everyone ends up with an improved version of the same code – less maintenance and more features.


The Possible: A Shared NHS R Repository

• A hosted NHS Git server and home page hosted within N3.

• Organisations sign up to a common code of conduct (i.e. to collaborate, share resources, keep data and patient information off the site, etc.).

• May offer risk-averse Trusts, which may have little awareness of R development, a politically ‘safer’ option than privately owned sites for hosting source code.

• Facilitate publication of useful resources into the wider public domain in a controlled manner.


Cleanly divide Inputs, Charting functions and Outputs

• SSNAPInterface: handles ‘input’ (initially CSV; later will be dbplyr -> SQL).

• SSNAPStats: data wrangling ‘engine’. Package for handling patient audit operations. Definitions for SSNAP’s ‘cohorts’, ‘measures’, ‘domains’ outputs. Built around dplyr (to allow dbplyr use for direct SQL Server access). Uses tidyverse packages.

• SSNAPCharts: common chart / map outputs with a range of customisation options. ggplot2; ggmap.

• SSNAPReports: R Markdown for common outputs. RStudio 1.2 Preview supports *PPTX*. officer for some Office work. openxlsx for fast Excel files. huxtable for slower but more interoperable table outputs.
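The division above might be sketched as four small functions, one per layer. All function names, column names and data here are hypothetical and are not the SSNAP packages' actual API; the sketch only shows each layer taking the previous layer's result and knowing nothing else.

```r
library(ggplot2)

# Input layer (cf. SSNAPInterface): the only code that knows where data lives.
read_audit_data <- function(path) {
  read.csv(path, stringsAsFactors = FALSE)
}

# Analysis layer (cf. SSNAPStats): data frame in, aggregated data frame out.
percent_achieved <- function(records) {
  out <- aggregate(achieved ~ team, data = records, FUN = mean)
  out$percent <- 100 * out$achieved
  out[, c("team", "percent")]
}

# Chart layer (cf. SSNAPCharts): returns a ggplot object, writes nothing.
chart_percent <- function(measure) {
  ggplot(measure, aes(x = team, y = percent)) +
    geom_col() +
    labs(x = "Team", y = "Patients achieving standard (%)")
}

# Output layer (cf. SSNAPReports): the only code that picks a file format.
save_chart <- function(chart, file) {
  ggsave(file, plot = chart, width = 6, height = 4)
}

# Made-up records standing in for read_audit_data("...").
records <- data.frame(
  team = c("A", "A", "B", "B"),
  achieved = c(TRUE, FALSE, TRUE, TRUE)
)
measure <- percent_achieved(records)
chart <- chart_percent(measure)
```

Swapping CSV input for a database, or a PNG for a slide deck, then means changing one function rather than the whole pipeline.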


Cleanly divide row operations from aggregation operations

• Harder task than dividing Inputs / Analysis / Output

• I tend to place row operations as part of ‘inputs’.

• Aggregation operations live within the analysis.

• Thus ‘patient identifying’ work is clearly separated from ‘aggregated anonymized’ work so you know which bits of your code pose a patient identification risk.
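A rough illustration of the split, using made-up records and hypothetical column names (`arrival_time`, `thrombolysis_time`); the door-to-needle measure is chosen only as an example. The first function works row by row on patient-level data, the second returns one row per team.

```r
# Row operation: one output row per patient, so this code touches
# patient-level data and belongs with the 'input' stage.
add_door_to_needle <- function(records) {
  records$door_to_needle_mins <- as.numeric(
    difftime(records$thrombolysis_time, records$arrival_time, units = "mins")
  )
  records
}

# Aggregation: one output row per team, with no patient identifiers.
median_door_to_needle <- function(records) {
  aggregate(door_to_needle_mins ~ team, data = records, FUN = median)
}

# Made-up patient-level records for the demonstration.
records <- data.frame(
  patient_id = 1:4,
  team = c("A", "A", "B", "B"),
  arrival_time = as.POSIXct(
    c("2018-01-01 10:00", "2018-01-01 11:00",
      "2018-01-02 09:00", "2018-01-02 12:00"),
    tz = "UTC"
  ),
  thrombolysis_time = as.POSIXct(
    c("2018-01-01 10:40", "2018-01-01 12:00",
      "2018-01-02 09:30", "2018-01-02 12:50"),
    tz = "UTC"
  )
)
team_summary <- median_door_to_needle(add_door_to_needle(records))
```

Only `add_door_to_needle()` ever sees identifiable rows, so that is the function to review when assessing identification risk.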


Act directly on source data rather than using files

• Most early analysis works on files.

• Longer-lived work should act directly on the source data: this is more reproducible, and more secure from an IG perspective, than leaving patient-level data unsecured in files.

• The dbplyr package lets you talk directly to common database systems using dplyr.

• RonFHIR is an interesting package for using HL7 FHIR calls to fetch data from EHRs, giving us modern healthcare interop.

• (DON’T store usernames or passwords in the code – pass them in as parameters.)
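A small sketch of the dbplyr approach, assuming the DBI, RSQLite, dplyr and dbplyr packages are installed. An in-memory SQLite database stands in for the real server, and the table `admissions` and its columns are invented for the example. The connection is passed in as an argument, so no credentials appear in the analysis code.

```r
library(dplyr)  # dbplyr must also be installed; dplyr uses it for database tables

# The connection is a parameter, so no server name, username or
# password is written into the analysis code.
admissions_by_team <- function(con) {
  tbl(con, "admissions") %>%       # lazy reference; no rows fetched yet
    group_by(team) %>%
    summarise(n_patients = n()) %>%
    arrange(team) %>%
    collect()                      # run the SQL and fetch only the summary
}

# Demo connection: in-memory SQLite standing in for the real database.
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(
  con, "admissions",
  data.frame(patient_id = 1:5, team = c("A", "A", "B", "B", "B"))
)
counts <- admissions_by_team(con)
DBI::dbDisconnect(con)
```

The grouping and counting run inside the database; only the per-team summary is brought into R. For a real server, the connection could be created once outside the analysis, with credentials read from environment variables via `Sys.getenv()` rather than typed into a script.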


Avoid Patient level data at all if possible

• By communicating directly with the database, and by separating and minimizing patient-level commands, we can aspire to avoid fetching patient-level detail.

• This is much better from an IG perspective.


Realtime Dashboards

• Because we’ve designed the analysis and the charts to be totally independent of the outputs, reporting and dashboarding can share common analysis code and common charts.

• Shiny offers a great way to build dashboards (as will be shown later today).

• Options for making software available to users include setting up an R Shiny Server.

• For small-scale or ‘testing’ releases, R Shiny Electron (https://github.com/dirkschumacher/r-shiny-electron) lets you distribute your dashboard to users as a (large) standalone app, avoiding many of the IT permissions issues of installing R or hosting a Shiny server.
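To illustrate sharing chart code between reports and dashboards, here is a minimal Shiny sketch. The data, the function `plot_team_measure()` and the input names are all made up, and base graphics are used only to keep the example short; a ggplot2 chart function would slot in the same way.

```r
library(shiny)

# Shared chart function: the same code a static report could call.
plot_team_measure <- function(measure, team) {
  rows <- measure$team == team
  barplot(
    measure$percent[rows],
    names.arg = measure$quarter[rows],
    ylim = c(0, 100),
    ylab = "Patients achieving standard (%)",
    main = paste("Team", team)
  )
}

# Made-up aggregated data; a real dashboard would call the analysis package.
measure <- data.frame(
  team = rep(c("A", "B"), each = 3),
  quarter = rep(c("Q1", "Q2", "Q3"), times = 2),
  percent = c(62, 70, 75, 55, 58, 66)
)

ui <- fluidPage(
  titlePanel("Team dashboard (sketch)"),
  selectInput("team", "Team", choices = unique(measure$team)),
  plotOutput("chart")
)

server <- function(input, output, session) {
  # The dashboard adds interactivity only; the chart logic lives elsewhere.
  output$chart <- renderPlot(plot_team_measure(measure, input$team))
}

app <- shinyApp(ui, server)
# Launch with runApp(app) in an interactive session.
```

The server function contains no analysis or plotting logic of its own, which is what lets reports and dashboards stay in step.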


Other Resources

• RStudio hosts a very high-quality series of webinars and rstudio::conf talks on their website (www.rstudio.com).

• The R Consortium has a YouTube channel containing a lot of useful webinars.

• Jenny Bryan’s talks on structuring and approaching R code are particularly valuable.

• Twitter - #rstats, @dataandme for a vast amount of links to interesting projects.