Flow Cytometry and Reproducible Analysis Cliburn Chan Department of Biostatistics and Bioinformatics, DUMC


Page 1: Flow Cytometry and Reproducible Analysis

Flow Cytometry and Reproducible Analysis

Cliburn Chan
Department of Biostatistics and Bioinformatics, DUMC

Page 2: Flow Cytometry and Reproducible Analysis

Reproducible Analysis

• Can someone in a different lab replicate your results?

• Can someone else in your lab replicate your results?

• Can you replicate your own results
– 6 months later?
– When FlowJo goes from version 10.0 to 11.0?
– When your lab catches fire and all your computers melt into toxic waste?

Page 3: Flow Cytometry and Reproducible Analysis

Complexity of flow analysis

• Experimental design
• Running the experiment
• Raw data (FCS files)
• Compensation
• Transformation
• Gating strategy
• Gates MFI and relative frequencies
• Statistical analysis – e.g. outcome correlation

Page 4: Flow Cytometry and Reproducible Analysis

Experimental design

• Is randomization done correctly?
• Is the sample size sufficient?
• Is there an SOP for annotating the experiment?
– MIATA
– MIFlowCyt
• What is the informatics strategy to ensure that data is recorded accurately and backed up safely?
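One way to make the randomization step itself reproducible is to script it with a recorded seed. The sketch below is only an illustration under stated assumptions: the sample IDs, arm names, seed value, and the `randomize` helper are all hypothetical and do not come from the slides.

```python
import random

def randomize(sample_ids, arms, seed):
    """Assign samples to arms in a balanced way using a recorded seed."""
    rng = random.Random(seed)      # local generator: same seed, same allocation
    shuffled = sorted(sample_ids)  # sort first so the input order does not matter
    rng.shuffle(shuffled)
    # Deal the shuffled samples round-robin so arm sizes differ by at most one.
    return {sid: arms[i % len(arms)] for i, sid in enumerate(shuffled)}

# Hypothetical sample IDs and arm names, for illustration only.
samples = [f"S{i:02d}" for i in range(1, 9)]
allocation = randomize(samples, ["control", "treated"], seed=20130507)
```

Recording the seed alongside the experiment annotation lets anyone regenerate the same allocation later.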

Page 5: Flow Cytometry and Reproducible Analysis

Running the experiment

• Stuff I know little about …
• Janet and Jennifer will teach in this workshop
– Instrument calibration
– Bridging studies
– Reagent qualification
– Use of appropriate biological controls
– Use of appropriate technical controls

Page 6: Flow Cytometry and Reproducible Analysis

Raw data (FCS files)

• Is there a file naming SOP that is followed?
• Is there an SOP for recording FCS metadata?
– Channel labels – fluorochrome, antibody, FMO

Page 7: Flow Cytometry and Reproducible Analysis

Inconsistent annotation example

Page 8: Flow Cytometry and Reproducible Analysis

Compensation, transformation and gating strategy

• Compensation is Real = Spillover⁻¹ × Observed (the inverse of the spillover matrix applied to the observed signal)
• Transformation is complicated – can think of as linear (low values) and log (high values)
• Gating strategy is hard to replicate, but can be stored as a template and “re-used” with tweaking
• Compensation, transformation and gating should be done on a per-batch and not per-file basis
• Would recommend storing workspace containing this data in both .jo and .xml formats
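The compensation formula and the "linear then log" idea can be sketched in a few lines. This is a toy two-channel illustration, not what FlowJo or any other package actually runs: the spillover values and signals are made up, and arcsinh with a cofactor of 150 is an assumed stand-in that merely has the same linear-at-low, log-at-high shape.

```python
import math

def invert_2x2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("spillover matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def mat_vec(m, v):
    """Multiply a matrix by a column vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]

# Hypothetical spillover matrix in the column-vector convention of the slide:
# Observed = Spillover x Real, so Real = Spillover^-1 x Observed.
spillover = [[1.0, 0.1],
             [0.2, 1.0]]

real = [1000.0, 50.0]                  # made-up true signals
observed = mat_vec(spillover, real)    # what the detectors would report
recovered = mat_vec(invert_2x2(spillover), observed)

def asinh_transform(x, cofactor=150.0):
    """Roughly linear for |x| << cofactor, roughly logarithmic for |x| >> cofactor."""
    return math.asinh(x / cofactor)
```

Here `recovered` matches `real` up to floating-point error. Applying one spillover matrix and one transform to every file in a batch is what makes the per-batch recommendation easy to follow.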

Page 9: Flow Cytometry and Reproducible Analysis

Working with statisticians

• At some point, a statistician is likely to be asked to analyze your data. This can lead to much unhappiness.

• Statisticians do not like Excel
– The first thing they will try to do is export to a CSV or delimited file, for import into SAS or R
– If this is difficult to do, they will not like you

Page 10: Flow Cytometry and Reproducible Analysis

Excel rules for happy statisticians

• 1 worksheet = 1 table
• 1 cell = 1 value
• Data/metadata = comprehensive & consistent
• Formatting = None
• Validation = Yes

Page 11: Flow Cytometry and Reproducible Analysis

1 worksheet = 1 table

• A table has column headers and a number of rows and nothing else – it is RECTANGULAR
• Do not put more than 1 table in a worksheet
• Do not use non-rectangular tables
• Example of good worksheet
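Rectangularity is easy to check mechanically once a worksheet has been exported to CSV. The following is a minimal sketch; the `check_rectangular` helper and the sample data are hypothetical.

```python
import csv
import io

def check_rectangular(csv_text):
    """Return problems that stop a CSV export from being one rectangular table."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    header, problems = rows[0], []
    if any(not name.strip() for name in header):
        problems.append("blank column header")
    if len(set(header)) != len(header):
        problems.append("duplicate column header")
    for row_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(f"row {row_no}: {len(row)} cells, expected {len(header)}")
        elif not any(cell.strip() for cell in row):
            problems.append(f"row {row_no}: blank row (second table below?)")
    return problems

# Made-up exports: one clean table, and one with a second table tacked on below.
good = "subject,tube,cd4_count\nA01,1,5123\nA02,1,4870\n"
bad = "subject,tube,cd4_count\nA01,1,5123\n\nSummary,mean\n"
```

`check_rectangular(good)` returns an empty list, while `check_rectangular(bad)` flags both the blank separator row and the short row of the second table.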

Page 12: Flow Cytometry and Reproducible Analysis

1 worksheet = 1 table

Page 13: Flow Cytometry and Reproducible Analysis

1 cell = 1 value

• Easy to filter by tube, sample or subject
• Easy to write validation rules or lookup table

Page 14: Flow Cytometry and Reproducible Analysis

1 cell = 1 value

• ID column has 3 different values
• Need to do text parsing to recover information – very error prone
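The contrast between the two layouts can be sketched as follows. The column names, the data, and the `subject_sample_tube` ID format are assumptions made for illustration, since the worksheets pictured on the slides are not part of this text.

```python
import csv
import io

# Hypothetical export with one value per cell: subject, sample and tube separate.
tidy = io.StringIO(
    "subject,sample,tube,cd4_count\n"
    "A01,PBMC,1,5123\n"
    "A01,PBMC,2,4980\n"
    "A02,PBMC,1,4870\n"
)
rows = list(csv.DictReader(tidy))
subject_a01 = [r for r in rows if r["subject"] == "A01"]  # filtering is one comparison

def parse_combined_id(combined):
    """Split a hypothetical 'subject_sample_tube' ID; fail loudly if malformed."""
    parts = combined.split("_")
    if len(parts) != 3:
        raise ValueError(f"cannot parse ID {combined!r}")
    subject, sample, tube = parts
    return {"subject": subject, "sample": sample, "tube": tube}
```

With separate columns the filter needs no parsing at all. With a combined ID, a value such as `A01_whole_blood_1` already breaks the parser, which is the kind of fragility the slide warns about.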

Page 15: Flow Cytometry and Reproducible Analysis

Data: column names

• Consistent column names across worksheets
– Singlets/Lymphocytes
– Singlet/Lymphs
– Singlets / Lymphocytes
– Singlets/Lymphoctyes
• Use full gating path for column name
– Singlets/Lymphocytes/Viable/CD4+/CM/IFN+
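A quick script can catch inconsistent column names before the data reaches the statistician. This sketch assumes the headers of each worksheet have already been read into lists; the sheet names and both helper functions are hypothetical.

```python
def inconsistent_columns(headers_by_sheet):
    """Map each column name to the sheets that lack it (empty dict = consistent)."""
    all_names = set().union(*headers_by_sheet.values())
    return {
        name: sorted(s for s, cols in headers_by_sheet.items() if name not in cols)
        for name in sorted(all_names)
        if any(name not in cols for cols in headers_by_sheet.values())
    }

def split_gating_path(column_name):
    """Break a full gating path into its gates, parent first."""
    return [gate.strip() for gate in column_name.split("/")]

# Sheet names are hypothetical; the column variants are taken from the slide.
headers = {
    "plate1": ["Singlets/Lymphocytes"],
    "plate2": ["Singlet/Lymphs"],
    "plate3": ["Singlets/Lymphocytes"],
}
report = inconsistent_columns(headers)
```

Here `report` shows that `Singlet/Lymphs` is missing from plate1 and plate3 and that `Singlets/Lymphocytes` is missing from plate2. Using the full gating path as the column name also means the hierarchy can be recovered with a simple split.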

Page 16: Flow Cytometry and Reproducible Analysis

Data: What to record

• Better to have more data than less data
– Sample type (PBMC, whole blood)
– Recovery
– Viability
• Better to have basic than derived data
– Counts better than relative frequencies
• Keep link to raw data for reproducibility
– Path to FCS and workspace files on server
• Use special indicator for missing data (e.g. NAN), not zero
• Use as many columns as you need and name them sensibly and consistently
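Two of these points are easy to show with numbers: frequencies can always be derived from counts, and coding a missing value as zero silently biases summaries. The counts below are made up and the helper names are hypothetical.

```python
import math

def relative_frequency(count, parent_count):
    """Derive a frequency from basic counts; NaN if either count is unusable."""
    if math.isnan(count) or math.isnan(parent_count) or parent_count == 0:
        return math.nan
    return count / parent_count

def mean_ignoring_missing(values):
    """Average the recorded values, skipping NaN placeholders."""
    present = [v for v in values if not math.isnan(v)]
    return sum(present) / len(present) if present else math.nan

# Made-up CD4+ counts for three samples; the third tube was never acquired.
cd4_counts = [5000.0, 4000.0, math.nan]
wrong_if_zero = sum([5000.0, 4000.0, 0.0]) / 3   # missing coded as zero: 3000.0
right_with_nan = mean_ignoring_missing(cd4_counts)  # missing skipped: 4500.0
```

Going the other way, from a relative frequency back to a count, is impossible unless the parent count was also recorded, which is why basic data is preferred over derived data.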

Page 17: Flow Cytometry and Reproducible Analysis

Data: Versioning

• Do not change the data in the worksheet once it has been handed to the statistician.
• If there are errors that must be corrected, make a new copy, label the filename with date and version, and send that to the statistician
– ArcticRatExperiment_07May2013_Version01.xlsx
– ArcticRatExperiment_17May2013_Version02.xlsx
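Generating such names with a small function keeps the pattern consistent. This sketch reproduces the naming pattern of the two examples above; the `versioned_filename` helper is hypothetical.

```python
from datetime import date

# Fixed English abbreviations, so the result does not depend on the system locale.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def versioned_filename(stem, when, version, ext="xlsx"):
    """Build a name of the form Stem_DDMonYYYY_VersionNN.ext."""
    stamp = f"{when.day:02d}{MONTHS[when.month - 1]}{when.year}"
    return f"{stem}_{stamp}_Version{version:02d}.{ext}"

first = versioned_filename("ArcticRatExperiment", date(2013, 5, 7), 1)
second = versioned_filename("ArcticRatExperiment", date(2013, 5, 17), 2)
```

`first` and `second` reproduce the two filenames listed on the slide.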

Page 18: Flow Cytometry and Reproducible Analysis

Formatting

• Don’t do it.
• Avoid encoding information via:
– Highlighting
– Fancy spacing
– Different fonts and font effects
– Merging cells
– Comments
• Will it survive a round-trip from Excel to CSV and back again?

Page 19: Flow Cytometry and Reproducible Analysis

Formatting - Before

Page 20: Flow Cytometry and Reproducible Analysis

Formatting - After

Comments are lost
Highlighting is lost
Bad cell formatting is lost
Merged cells become missing information

Page 21: Flow Cytometry and Reproducible Analysis

Summary of Reproducible Analysis

• Know what you are doing from PBMC to Excel
• SOPs are important
• Annotation is important
• Excel is OK if you use NONE of its features
• Keep all necessary data in the same place
• Keep a remote backup
• Talk with your statistician

Page 22: Flow Cytometry and Reproducible Analysis

Biologist talks to Statistician

http://www.youtube.com/watch?v=Hz1fyhVOjr4