EBI is an Outstation of the European Molecular Biology Laboratory.
Image analysis and modelling of high-throughput cell based assays
Wolfgang Huber
Cellular Phenotype Assays
Library of perturbation
reagents:
RNAi (~20k)
over-expression constructs (~1k)
small mole-cules (~1M)
cell line that does something
(e.g. mitotic growth;
differentiation; respond to a
signal)
readout
Any cellular process can be probed.- (de-)activation of a signaling pathway- cell differentiation- changes in the cell cycle dynamics- morphological changes- activation of apoptosisSimilarly, for organisms (e.g. fly embryos, worms)
Phenotypes can be registered at various levels of detail- yes/no alternative- single quantitative variable- tuple of quantitative variables- image- time course
What is a phenotype? It all depends on the assay.
High-throughput microscopy screening
Genetic interactions
���� in yeast, ~73% of genes are "non-essential"(Glaever et al. Nature 418 (2002))
���� synthetic lethality phenotypes are prevalent (Tong et al. Science (2004))
���� in drosophila, ~95% no viability phenotype (Boutros, Kiger, et al. Science 303 (2004))
���� association studies for most human genetic diseases did not produce single loci with high penetrance
���� evolutionary pressure for robustness
Two types of unspecificity effects
���� because the phenotype assay may lump together a number of different underlying mechanisms (e.g. via bility assay)
���� because the reagents are not as specific to their t arget as intended
What are the implications for designing functional studies?
� need specific phenotypes: multiple assays, complex readout, over time
� use combinatorial perturbations (co-RNAi, small molecules, different genetic backgrounds)
� good preprocessing (normalisation/transformation, QA just as important as for µµµµarrays)
� graph-type models to relate the data to gene-gene a nd gene-phenotype interactions, detect patterns and es timate modules
Plate reader96 or 384 well, 1…4 measurements per well
FACS4…8 measurements per cell, thousands of cellsper well
Automated Microscopyunlimited
Monitoring tools
Bioconductor packages for cell-based assays
cellHTS (Ligia Bras, M. Boutros)
genome-wide screens with scalar (or low-dimensional ) read-outdata management, normalization, quality assessment, visualization,
hit scoring, reproducibility, publicationraw data →→→→ annotated hit list
prada (Florian Hahne) ; flowCore, -Utils et al. (B. Ellis, P. Haaland, N. Lemeur, F. Hahne)
flow cytometrydata management
EBImage (O. Sklyar)
image processing and analysisconstruction of feature extraction workflows for la rge sets of similar images
cellHTSBioconductor package for the analysis of cell-based high-throughput screening (HTS) assays
Manage all data and metadata relevant for interpret ing a cell-based screen
Data cleaning, preprocessing, primary statistical analysis
Raw data -> annotated hit list
Boutros, Bras, Huber. Analysis of cell-based RNAi screens. Genome Biology (2006)
The cellHTS package
per plate quality assessment
• Dynamic range
• Distribution of the intensity values for each repli cate
• Scatterplot between replicates and correlation coeff icient
• Plate plots for individual replicates and for stand ard deviation between replicates
per experiment quality assessment
• Boxplots grouped by plate
• Distribution of the signal in the control wells, Z' -factor
whole screen visualization
KcViab Analysis Report rendered in HTML
Original image data
1. Negative control (siRNA against Renilla luziferase)
2. Elongated cell morphology after silencing GPR124
3. Mitotic arrest after silencing CDCA1
A genome-wide siRNA screenon HEK293 cells to identify modulators of cell morp hology (apoptosis, cell cycle, …)
1
2
3
12 images per probe: 4 images in each of Hoechst-, Tritc- and Fitc-channels 22848 probes in total x 2 datasets
Expt's: Florian Fuchs, Michael Boutros, DKFZ Heidelberg
EBImageImage processing and analysis on
large sets of images in a programmatic fashion
A package of R functions - to construct workflows that integrate statistic analysis and quality assessment, using a "real" modern language
Number crunching uses C (easy to add your own C/C++ modules)
Based on ImageMagick and other C/C++ image processing libraries
Free and open source (LGPL), distributed with Bioconductor
Collaboration with Michael Boutros, Florian Fuchs (DKFZ)
orig
inal
imag
e
thre
shol
ding
open
ing/
clos
ing
dist
ance
map
cells
det
ecte
d
dete
cted
nuc
lei &
cel
ls
Image processing with R: simple operations
1
2
I/Ofiles = c(“im1.tif”, “im2.tif”)im = read.image(files)
Subsettingw = dim(im)[1]/2 - 1h = dim(im)[2]/2 - 1r1 = im[1:w, 1:h, ]w1 = r1[,, 1]
Image stacks combine(w1, r1[,,2], r1[,,3])
Logical indexingx[ x > 0.5 & w1 > 0.7 ] = 1
Colour channels, greyscalech1 = channel(w1, “asred”)ch2 = channel(res[,,2], “asgreen”)ch3 = channel(res[,,3], “asblue”)rgb = ch1 + ch2 + ch3
Image processing: arithmetic and visualization
display(x)hist(x, xlim=c(0,.7), col=”gray”)
nx = (x-min(x))/diff(range(x))
## naïve high pass filterfx = fft(x)fx[ 1:10, 1:10 ] = 0x1 = normalize(Re(fft(fx, inv=TRUE)))
Image processing: filters from ImageMagickdisplay( x )
display( edge(x, 1) )
display( blur(x, 6, 2) )
display( sharpen(x) )
## othersnormalize2enhancecontrastcgamma
denoisedespeckleumaskmediansmooth
resizeresampleflipfloprotate
segmentathreshcthresh
modulatenegate
etc
Basic tools for segmentation
1. t = thresh(w0, 40, 40, 0.001)
mask = closing(t, morphKern(5))
2. mask = opening(mask, morphKern(5))
3. dm = distmap(mask)
range(dm)
[1] 0 87
1
2
3
Locally adaptive thresholding
Mathematical Morphology
Distance map transformation
binary image -> greyscale
each pixel is given the value of its distance to the nearest background pixel
Distance map transformation
( ) ( ) ( ) 0f x = min{d x',x | b x' = }r r r r
bbf
Watershed segmentation
1. w1 = watershed(dm, 0, 1)
range(w1)
[1] 0 189
2. w2 = watershed(dm, 2, 1)
range(w2)
[1] 0 61
3. x = paintObjects(w2,
channel(w1, “rgb”))
1
2
3
distance map/ watershed segmentation can be very effective, but...:- susceptible to spurious local minima- potentially unstable around flat ridges- does not use shape or distance criteria
Voronoi diagramspartitioning of a plane with n convex seed sets into n convex polygons such that each polygon contains only one seed and every point in a polygon is closer to its seed than to any other
Example: segment nuclei (easy)use them as seed pointsVoronoi sets: estimates of cell shapes
Voronoi diagrams on image manifolds
Instead of Euclidean distance in (x,y)-plane, use geodesic distance on the image manifold
T. Jones, A. Carpenter et al.: CellProfiler
λ ·dx
dg
Voronoi diagrams on image manifolds
dm = distmap( thresh(nucl, 30, 30) )
seeds = watershed(dm, 1, 1)
mask = thresh(cell, 60, 60)
w = watershed(distmap(mask), 2, 1) ## y ellow
vi = propagate(cell, seeds, mask, lambda=0) ## r ed
v = propagate(cell, seeds, mask, lambda=2e16) ## w hite
Thumbnail overview of one plate's images
Gallery view of segmented objects of one well
Some visualisation before we continue with the anal ysis
Object features
number of objects
Generic
Moments: area, mass (=intensity), center of mass, elements of the covariance matrix and its eigenvalu es, rotation angle, Hu's 7 rotation invariants
Haralick texture features
Zernike rotation invariant moments
Application-adapted
measures of acircularity or relative overlap betwee n different stain channels
Zernike Moments
mnunit circle
1(r, ) f(r, ) d drin
mn
mA e Z− θ+= θ θ θ
π ∫
• |n|<=m, m-|n| even• |Amn| rotation invariant• careful: f a discrete image,
pixelisation of the circle
From object features to phenotypes
per-cell object features
summary statistics classification(descriptors)
per cell phenoytpeper-gene feature set
per gene phenotype
classification (class occupancies)
classification (descriptors)
Back to reality:
within plate spatial trends -
normalization and quality assessment
Batch effects
Normalization: Plate effects
Percent of control
Normalized percent inhibition
z-score
k-th welli-th plate100' ki
ki posi
xx = ×
100pos
' i kiki pos neg
i i
xx =
− ×−
' ki iki
i
xx =
µ−
Number of cells
Number of cells / no. cells in negative
controls in same plate
Long term drifts
DharmaconsiARRAYlibrary
Hek293 cells viability screenBoutros Lab DKFZ
Plate 26
proteasome subunits or components;
ATP/GTP-binding site motifs
ribosomal proteins
like-Sm nucleoproteins and ribosomal proteins
Normalization problem…Too many hits
Show imageHTS3
01 / A08
AZU1
Homo Sapiens azurocidin precursor (cationic antimicrobial protein CAP37), heparin-binding protein) (HBP)
Number of cells:Run 1: 302 / NC:308Run 2: 312 / NC:305
Wilcoxon test for acirc:p=1.11022e-16, W= 465024
Z-test acirc:p=1.87601e-17, t= 8.5637
67 / F13
GPR124
Homo Sapiens probable G protein-coupled receptor 124 precursor (tumorendothelial marker 5)
Number of cellsRun 1: 357 / NC:473.5Run 2: 357 / NC:474
Wilcoxon test for acirc:p= 0, W= 1078176
Z-test acirc:p= 4.9e-105, t= 24.5806
54/ F13
FLJ41238
Homo sapiens family with sequence similarity 79, member B (FAM79B), mRNA
Number of cells: Run 1: 281 / NC:417.5Run 2: 274 / NC:432.5
Wilcoxon test for acirc:p=0.990294, W= 440619
Z-test acirc:p=0.994775, t=-2.56542
Wilcox: Wilcoxon rank sum test with continuity correction. One sided with alternative hypothesis: shift > 0Z-test: Two-sample Welch t-test. One sided with alternative hypothesis of diff(means) > 0
Gene info obtained from ensembl using biomaRt
Phenotype of interest: elongated cells
Phenotype of interest: elongated cells
01 / A08
AZU1
Homo Sapiens azurocidin precursor (cationic antimicrobial protein CAP37), heparin-binding protein) (HBP)
Number of cells:Run 1: 302 / NC:308Run 2: 312 / NC:305
Wilcoxon test for acirc:p=1.11022e-16, W= 465024
Z-test acirc:p=1.87601e-17, t= 8.5637
67 / F13
GPR124
Homo Sapiens probable G protein-coupled receptor 124 precursor (tumorendothelial marker 5)
Number of cellsRun 1: 357 / NC:473.5Run 2: 357 / NC:474
Wilcoxon test for acirc:p= 0, W= 1078176
Z-test acirc:p= 4.9e-105, t= 24.5806
54/ F13
FLJ41238
Homo sapiens family with sequence similarity 79, member B (FAM79B), mRNA
Number of cells: Run 1: 281 / NC:417.5Run 2: 274 / NC:432.5
Wilcoxon test for acirc:p=0.990294, W= 440619
Z-test acirc:p=0.994775, t=-2.56542
acircularity T-test: acirc.T > 12 & 250 < n < 450
Phenotype of interest: elongated cells – hit list visualisation
Mitocheck: dynamic modeling of live cell populations for clustering and classification of genes and phenotypes
Gregoire Pau (EBI)
withThomas WalterBeate NeumannJan Ellenberg (EMBL)
Mitocheck time lapse dataLive cell time-lapse imaging
• HeLa cell line expressing H2B GFP• seeded on siRNA spots and grown during ~48h• fluorescence time-lapse live imaging (sampling rate =30 min)
Experimental output• video sequences of 96 images (1024x1024)• 100 MB per spot• ~200,000 spots (20 TB)
Examples
Kif11
Incenp
Neumann et al. Nature Methods
2006
Conclusions
HT microscopy of biological systems is becoming a rich source of such dataTools in Bioconductor (et al.)Reproducible researchFeature extraction, variable selection, machine lea rning
mitoODEParameters of a biologically motivated model of the data are a more useful phenotype for classification than the raw time courses
EBIElin AxelssonRichard BourgonAlessandro BrozziLigia BrasTony ChiangAudrey KauffmannGregoire PauOleg SklyarMike SmithJörn Tödling
DKFZFlorian FuchsThomas HornDierk IngelfingerSandra SteinbrinkMichael Boutros
Cristina Cruciat
Florian HahneStefan Wiemann
UCSDAmy Kiger
EMBLLars SteinmetzEugenio ManceraZhenyu XuJulien Gagneur
Jan EllenbergThomas WalterBeate Neumann
BioconductorRobert GentlemanSeth FalconMartin MorganRafael IrizarryVince Carey… & many others
This new EMBL initiative promotes cross-disciplinary research. EIPODs are supported by at least two labs at the five EMBL sites in Heidelberg and Hamburg (Germany), Grenoble (France), Hinxton (UK) and Monterotondo (Italy). EIPOD projects connect scientific fields that are usually separate, or transfer techniques to a novel context.
For a list of possible projects and further information please visit: www.embl.org/eipodYou are also encouraged to propose your own interdisciplinary project.
Online application until 31st August 2007
2007 Call for Applications
EMBL
EMBL Interdisciplinary Postdocs - EIPOD