The Connectivity Map: A Portrait of the Database as Biomedical Laboratory
by Pablo Tamayo, Broad Institute and Oracle Corporation
It is organized as a publicly available online database that contains the signatures of many drugs in the language of gene expression.
Connectivity Map
It can be “queried” with genetic signatures of disease, in an approach known as in silico drug screening, in order to find matching drugs that are therefore identified as potential new treatments for the disease.
Connectivity Map (‐)‐catechin 12,13‐EODE 3‐hydroxy‐DL‐kynurenine BW‐B70C DL‐PPMP MG‐132 cytochalasin demecolcine doxycycline minocycline monensin phenanthridinone phentolamine trichostatin tyrphostin yohimbine 15‐delta 17‐allylamino‐geldanamycin 17‐dimethylamino‐geldanamycin LY‐294002 acetylsalicylic
5186223 5186324 5213008 5286656 HC calmidazolium carbamazepine
celastrol
celecoxib clotrimazole colforsin decitabine docosahexaenoic ikarugamycin ionomycin pararosaniline quercetin rottlerin topiramate 5182598 5211181 5224221 5230742 5248896 5252917
fulvestrant geldanamycin genistein haloperidol
monorden nordihydroguaiaretic prochlorperazine rosiglitazone sirolimus thioridazine tretinoin troglitazone valproic vorinostat wortmannin clozapine trifluoperazine
5109870 5114445
5140203 5149715 5151277 5152487 5162773
5253409 5255229 5279552 Y‐27632 blebbistatin bucladesine depudecin felodipine oxaprozin prazosin pyrvinium resveratrol monastrol butirosin mercaptopurine W‐13 benserazide colchicine tioguanine paclitaxel pentamidine novobiocin 4,5‐dianilinophthalimide nocodazole 5666823
LM‐1685 NU‐1025 butein thalidomide MK‐886 arachidonic ciclosporin nifedipine arachidonyltrifluoromethane 3‐aminobenzamide probucol U0125 splitomicin HNMPA‐(AM)3 dimethyloxalylglycine fisetin copper deferoxamine tetraethylenepentamine 1,5‐isoquinolinediol SC‐58125 gefitinib staurosporine indometacin sodium iloprost
pirinixic dopamine imatinib rofecoxib cobalt quinpirole TTNPB diclofenac clofibrate
oligomycin oxamic fasudil raloxifene tacrolimus
tamoxifen dexamethasone 2‐deoxy‐D‐glucose
azathioprine nitrendipine
N‐phenylanthranilic flufenamic exisulind
sulindac fludrocortisone prednisolone tomelukast sulfasalazine amitriptyline dexverapamil exemestane verapamil chlorpropamide tolbutamide mesalazine metformin phenformin Phenylalpha‐estradiol Chlorpromazine
estradiol fluphenazine
It contains 164 different drugs in v1 (1,079 in v2), including most FDA-approved drugs.
The CMAP can significantly speed up the rate of drug discovery, and find new uses for old drugs.
The CMAP is housed at the Broad Institute in Cambridge MA and is publicly available at
www.broad.mit.edu/cmap/
The Broad Institute is a research collaboration involving the MIT and Harvard academic and medical communities.
It was founded in 2003 through the far-sighted generosity of philanthropists Eli and Edythe Broad.
The Institute is organized around interdisciplinary Scientific Programs and Scientific Platforms to enable scientists to collaborate on important projects with the objective of bringing the power of genomics to medicine.
People who have participated in the project include Irene Blat, Jean-Philippe Brunet, Steve Carr, Jon Clardy, Paul Clemons, Emily Crawford, Stephen Haggarty, William Hahn, Jim Lerner, Joshua Modell, David Peck, Xiao Peng, Srilakshmi Raj, Michael Reich, Kenneth Ross, Aravind Subramanian, David Twomey, Ru Wei and Matthew Wrobel. Justin Lamb and Todd Golub (shown in photo below) lead the CMAP team.
Photo courtesy of Justin Ide/Harvard News Office
The CMAP Team
CMAP reference: Lamb et al. The Connectivity Map: Using Gene‐Expression Signatures to Connect Small Molecules, Genes, and Disease. Science 313 (5795), 1929 (2006).
The CMAP v1 runs on an Oracle Database 10g Enterprise Edition Release 10.1 ‐ 64bit with partitioning, OLAP and data mining options.
What type of database is the CMAP?
[Diagram: CMAP database, web interface, Java servlets]
It captures information about the experimental process that generates the data.
It stores the drug and disease signatures plus entire results sets for each user/query that can be retrieved at later times.
It has about 5,800 registered users.
It is implemented as a Java/servlet application with a web interface.
Two articles in this issue of Cancer Cell show the use of the CMAP in leukemia and prostate cancer research to predict anticancer activity that was subsequently demonstrated in additional experiments on model systems.
Volume 10, October 2006
The Connectivity Map has been useful to identify novel therapeutics in leukemia and prostate cancer
Later in this presentation we will see the leukemia example in detail …
How was the CMAP created?
First, 164 (1,079 in v2) distinct drugs were selected and used at several doses and times, for a total of 564 (5,774 in v2) instances…
…on 4 different types of cell lines (breast, prostate, leukemia, melanoma)…
…then those were profiled using Affymetrix DNA microarray chips and a scanner…
…then a computer program identified drug signatures. The drug signatures are ordered lists of genes (figure labels: genes that go up, genes that go down)…
…and they were finally stored in the database (CMAP).
How is the CMAP queried?
Starting from two patient populations, A and B (e.g. Disease and Normal)…
…samples are extracted and profiled using Affymetrix DNA microarray chips and a scanner…
…a computer program defines the disease signature (genes that go up, genes that go down)…
…and the disease signature itself becomes the query.
[Figure: the Disease X signature (top genes up, top genes down) is matched against all the drugs in the CMAP: ~22,000 genes, 564 (5,774 in v2) drug instances.]
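The slides do not say how the "computer program" derives the disease signature. As a minimal sketch only, assuming a simple difference of mean expression between the two populations (the function name `disease_signature`, the gene names and the expression values are all hypothetical, and real pipelines use more robust statistics), the idea of picking the top genes up and down could look like this:

```python
from statistics import mean

def disease_signature(expr_a, expr_b, n_top=2):
    """Return (up, down) gene lists for population A versus population B.

    expr_a, expr_b: dict mapping gene name -> list of expression values.
    Assumption: genes are ranked by difference of means (A minus B).
    """
    diffs = {gene: mean(expr_a[gene]) - mean(expr_b[gene]) for gene in expr_a}
    ranked = sorted(diffs, key=diffs.get, reverse=True)  # most up-regulated first
    up = [g for g in ranked[:n_top] if diffs[g] > 0]
    down = [g for g in ranked[::-1][:n_top] if diffs[g] < 0]
    return up, down

# Hypothetical toy data: two samples per population, five genes.
disease = {"G1": [9.0, 8.0], "G2": [5.0, 5.0], "G3": [1.0, 2.0],
           "G4": [6.0, 7.0], "G5": [3.0, 3.0]}
normal = {"G1": [4.0, 5.0], "G2": [5.0, 5.0], "G3": [6.0, 7.0],
          "G4": [5.0, 6.0], "G5": [4.0, 5.0]}

up_genes, down_genes = disease_signature(disease, normal, n_top=2)
print("up:", up_genes, "down:", down_genes)
```

The two returned lists play the role of the "top genes up" and "top genes down" that form the query.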
How to match diseases to drugs?
The disease signature (e.g. 13 genes: 7 up and 6 down, from populations A and B) is matched against all the drugs by using a statistical test.
[Figure: ~22,000 genes by 564 (5,774 in v2) drug instances, each signature gene marked up or down; matches range from strong positive through weak positive, null and weak negative to strong negative.]
One Example in Detail…
Notice that the CMAP queries are not standard information retrieval queries such as:
SELECT <...> FROM CMAP <...>
This is because the actual link between a drug and a disease does not exist until the query is made!
The match between the disease and drug signatures is computed using a statistical test that compares the gene orderings of both signatures and computes a similarity score.
Let's see how it works…
CMAP queries use a Kolmogorov‐Smirnov statistical test
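[Figure: the disease signature is compared with drug x's effect on genes, ordered from down to up.]
Are the genes in the down signature enriched on the down side?
Are the genes in the up signature enriched on the up side?
Connectivity score:
$$S_x = \begin{cases} 0 & \text{if } \operatorname{sign}(K_{up}) = \operatorname{sign}(K_{down}) \\ K_{up} - K_{down} & \text{otherwise} \end{cases}$$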
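More formally, let n = number of genes, t_up = size of the up signature, t_down = size of the down signature, and V(j) = position of the j-th signature gene in drug x's ordered gene list. For each signature, with t = t_up or t = t_down:

$$a = \max_{j=1}^{t} \left[ \frac{j}{t} - \frac{V(j)}{n} \right] \qquad b = \max_{j=1}^{t} \left[ \frac{V(j)}{n} - \frac{j-1}{t} \right]$$

$$K_{up},\ K_{down} = \begin{cases} a & \text{if } a > b \\ -b & \text{if } b > a \end{cases}$$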
It can be computed entirely inside the RDBMS:
SELECT stats_ks_test(drug_instance, disease_sig, 'STATISTIC') ks_statistic,
       stats_ks_test(drug_instance, disease_sig) p_value
FROM cmap.drugs c, cmap.sig s
WHERE c.gene_id = s.gene_id;
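To make the signed statistic and the connectivity score concrete outside the database, here is a minimal Python sketch. It follows the definitions in Lamb et al. (2006), where the score is set to zero when K_up and K_down have the same sign. The function names `ks_signed` and `connectivity_score` are hypothetical, and the sketch assumes that position 1 in the drug's ordered gene list is the most up-regulated gene.

```python
def ks_signed(positions, n):
    """Signed Kolmogorov-Smirnov-style statistic K for one signature.

    positions: 1-based ranks V(j) of the signature genes in the drug's
               ordered gene list (assumed: rank 1 = most up-regulated).
    n: total number of genes in the ordered list.
    """
    v = sorted(positions)
    t = len(v)
    a = max(j / t - v[j - 1] / n for j in range(1, t + 1))
    b = max(v[j - 1] / n - (j - 1) / t for j in range(1, t + 1))
    return a if a > b else -b

def connectivity_score(up_positions, down_positions, n):
    """Combine K_up and K_down into the connectivity score S."""
    k_up = ks_signed(up_positions, n)
    k_down = ks_signed(down_positions, n)
    if (k_up > 0) == (k_down > 0):
        return 0.0  # same sign: up and down genes are not separated
    return k_up - k_down

# Toy example with n = 10 genes and two genes in each signature.
print(connectivity_score([1, 2], [9, 10], 10))  # drug mimics the signature
print(connectivity_score([9, 10], [1, 2], 10))  # drug reverses the signature
print(connectivity_score([1, 2], [3, 4], 10))   # both enriched at the same end
```

In the first call the up genes sit at the top of the drug's list and the down genes at the bottom, so the score is strongly positive. In the second call the roles are swapped and the score is strongly negative, which is the pattern of interest when looking for a drug that might reverse a disease signature.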
Finally the top scoring drugs are selected
564 drug instances → connectivity scores S1, S2, S3, …, S564
For example:
Drugs:     Sx      Sy      Sz
Result:    hit +   miss    hit –
p-values:  0.01    0.3     0.02
Drugs are sorted by their connectivity scores, and hits are found by the pattern of dose/time instances of the same drug.
A (second) test is used to assess the statistical significance of each hit.
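The sorting and labelling step can be sketched as follows. This is an illustration only: it takes the p-values as given and does not implement the second significance test, the 0.05 cutoff is an assumption, and the function name `rank_instances`, the drug names and the score values are hypothetical (only the three p-values come from the example above).

```python
def rank_instances(results, alpha=0.05):
    """Sort drug instances by connectivity score and label each one.

    results: list of (name, score, p_value) tuples.
    alpha: assumed significance cutoff (not stated in the slides).
    """
    ranked = sorted(results, key=lambda r: r[1], reverse=True)
    labelled = []
    for name, score, p_value in ranked:
        if p_value >= alpha or score == 0:
            label = "miss"
        elif score > 0:
            label = "hit +"
        else:
            label = "hit -"
        labelled.append((name, score, label))
    return labelled

# Hypothetical scores paired with the p-values from the example.
results = [("drug_y", 0.1, 0.3), ("drug_z", -0.8, 0.02), ("drug_x", 0.9, 0.01)]
for row in rank_instances(results):
    print(row)
```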
Cancer is the most common cause of death from disease in children in developed countries, and the most frequent childhood malignancy is acute lymphoblastic leukemia (ALL).
dexamethasone
Glucocorticoids have been an important component of the treatment of acute lymphoblastic leukemia (ALL) for more than 50 years. However, it is still unknown what specific factors affect sensitivity and resistance to these drugs.