
  • In Silico Prediction of DILI - Extraction of Histopathology Data from Preclinical Toxicity Studies of the eTOX Database for new In Silico Models of Hepatotoxicity

    Alexander Amberg (1), Lennart T. Anger (1), Manuela Stolte (1), Jennifer Hemmerich (1), Hans Matter (2), Lilia Fisk (3), Inga Tluczkiewicz (4), Kevin Pinto-Gil (5), Oriol López-Massaguer (5), Manuel Pastor (5)

    (1) Sanofi, Preclinical Safety, Frankfurt, Germany; (2) Sanofi, Integrated Drug Discovery, Frankfurt, Germany; (3) Lhasa Limited, Leeds, United Kingdom; (4) Fraunhofer ITEM, Hannover, Germany; (5) IMIM, GRIB, Barcelona, Spain



    The eTOX consortium extracted in vivo data from unpublished preclinical toxicity studies of 13 EFPIA partners. The resulting database contains high-quality, highly detailed toxicity results for 1,947 drug candidates (8,196 studies), supplemented with 1,286 chemicals from the RepDose database (2,695 studies). Several compilation steps were applied to transform these data into usable training datasets for in silico models. First, all toxicity findings were extracted from the study reports (paper/PDF). Then the verbatim terms for all treatment-related hepatotoxicity findings were harmonized using special ontologies. Finally, to obtain training sets with sufficient compound numbers and chemical space coverage, the primary histopathology terms were grouped into 1st level and then 2nd level clusters of similar toxicity mechanisms: e.g. primary necrosis terms such as “centrilobular” and “periportal” were grouped into the 1st level cluster “necrosis”, and clusters such as “necrosis” and “vacuolization” were then grouped into the 2nd level cluster “degenerative lesions”. With this approach, various training datasets were compiled by species (rat, dog and monkey), treatment duration (2 weeks - 2 years) and administration route. Different modeling approaches were then applied to these datasets, including structural alerts as well as fragment-based and molecular descriptor-based machine learning approaches (e.g. random forest, decision tree, k-nearest neighbor). Models were validated and optimized, first by internal validation (10% test set) and then by external validation using Sanofi’s confidential data. For example, the best external validation results (n=66) were achieved for the 1st level cluster rat necrosis models (229 positives, 198 negatives) using a fragment-based approach (Sensitivity: 0.80, Specificity: 0.77) and a molecular descriptor-based decision tree approach (Sensitivity: 0.81, Specificity: 0.88). These validation results show that, by reasonable clustering of the histopathology data from eTOX, it is possible to develop highly predictive in silico models for drug-induced liver injury (DILI).
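    The sensitivity and specificity values quoted above follow the usual definitions for binary classifiers. A minimal sketch of the calculation is shown here; the labels are invented for illustration and are not the Sanofi validation data, and the function name is hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical observed and predicted labels for ten compounds (illustration only)
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

    Sensitivity is the fraction of observed positives that the model flags, and specificity is the fraction of observed negatives that it clears, so both are needed to judge a hepatotoxicity model.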


    Structural Alerts – Validation and Improvement of existing and Identification of new alerts

    eTOX Training Dataset Compilation

    Steps to transform the eTOX in vivo data into usable in silico model training datasets

    1) Data extraction of treatment-/compound-related hepatotoxicity findings from study reports




    A) EFPIA preclinical toxicity studies (paper/PDF)

    • Reports/studies: 8,196 • Compounds: 1,947

    B) RepDose DB (

    • Reports/studies: 2,695 • Compounds: 1,286

    Work performed at the eTOX Hackathon (“Hack Marathon”) by toxicologists, pathologists, in silico modelers and data managers

    2) Harmonization of the verbatim terms from study reports using special ontologies and combination of the data into different model training datasets

    Training datasets per species (rat, dog, monkey): histopathology, clinical chemistry, liver weight

    No. of compounds (EFPIA: oral; RepDose: gavage, feed)

    Treatment duration            Rat               Dog               Monkey
                                  EFPIA    RepDose  EFPIA    RepDose  EFPIA    RepDose
    ≤ 28 days (~ 1 month)         889      284      380      -        62       -
    28 – 120 days (~ 3 months)    114      562      96       91       16       -
    ≥ 120 days (~ 6-24 months)    87       382      98       189      18       -
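    The split into training sets by species, data source and treatment duration can be sketched as follows. This is an illustrative sketch only: the record layout, compound IDs and function names are hypothetical, and the handling of the bin boundaries (exactly 28 or 120 days) is an assumption, since the table does not state it.

```python
from collections import defaultdict


def duration_bin(days):
    """Map a treatment duration in days to one of the three duration bins.

    Assumption: exactly 28 days falls in the short bin and exactly 120 days
    in the long bin; the source table does not say how boundaries are handled.
    """
    if days <= 28:
        return "<= 28 days"
    if days < 120:
        return "28 - 120 days"
    return ">= 120 days"


def compile_training_sets(studies):
    """Group compound IDs by (species, source, duration bin)."""
    sets = defaultdict(set)
    for compound_id, species, source, days in studies:
        sets[(species, source, duration_bin(days))].add(compound_id)
    return dict(sets)


# Hypothetical study records: (compound ID, species, data source, duration in days)
studies = [
    ("cpd-1", "rat", "EFPIA", 14),
    ("cpd-1", "rat", "EFPIA", 28),
    ("cpd-2", "rat", "RepDose", 90),
    ("cpd-3", "dog", "EFPIA", 365),
]

training_sets = compile_training_sets(studies)
for key, compounds in sorted(training_sets.items()):
    print(key, len(compounds))
```

    Counting distinct compounds rather than studies matches the "No. of compounds" reading of the table, where one compound may appear in several studies.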

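    Data matrix available for all of these training sets

    → primary terms with at least one treatment-related finding (values = LOEL)
    → all individual compounds

    Example data matrix histopathology

    Columns: Substance ID | min dose | max dose | accumulation lipid | bile duct hyperplasia | congestion | hyperplasia | hypertrophy | hypertrophy kupffer cells | inflammation | necrosis centrilobular | necrosis periportal | single cell necrosis

    No. of positives: accumulation lipid 3 | bile duct hyperplasia 35 | congestion 7 | hyperplasia 47 | hypertrophy 111 | hypertrophy kupffer cells 6 | inflammation 40 | necrosis centrilobular 6 | necrosis periportal 8 | single cell necrosis 33

    Substance ID | min dose | max dose | LOEL entries (finding column not recoverable here)
    Compound 1 | 500 | 1500 |
    Compound 2 | 50 | 1000 | 50, 250, 250
    Compound 3 | 0.69 | 13.8 | 0.69
    Compound 4 | 100 | 1750 | 750, 100
    Compound 5 | 38 | 510 |
    Compound 6 | 50 | 2000 | 250, 1000
    Compound 7 | 60 | 500 | 60
    Compound 8 | 100 | 2000 | 2000, 100
    Compound 9 | 5 | 651 |
    Compound 10 | 1 | 100 | 5, 100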
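    A data matrix of this kind (compounds as rows, primary terms as columns, LOEL as cell value) can be turned into binary training labels. The sketch below assumes that an empty cell means "no treatment-related finding"; the compounds, values and function names are hypothetical and are not taken from the eTOX matrix.

```python
from collections import Counter

# Hypothetical LOEL matrix: compound -> {primary term: LOEL}.
# A missing term means no treatment-related finding was recorded (assumption).
loel_matrix = {
    "cpd-1": {},
    "cpd-2": {"hypertrophy": 50.0, "necrosis centrilobular": 250.0},
    "cpd-3": {"hypertrophy": 100.0},
}


def binary_labels(matrix, term):
    """Label each compound 1 if it has a LOEL for the term, else 0."""
    return {cpd: int(term in findings) for cpd, findings in matrix.items()}


def positives_per_term(matrix):
    """Count, for every primary term, how many compounds have a LOEL entry."""
    counts = Counter()
    for findings in matrix.values():
        counts.update(findings.keys())
    return dict(counts)


labels = binary_labels(loel_matrix, "hypertrophy")
counts = positives_per_term(loel_matrix)
print(labels)
print(counts)
```

    The per-term counts correspond to the "No. of positives" row of the matrix; terms with very few positives are the ones that motivate grouping into clusters.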

    Grouping of primary terms (EFPIA legacy reports: histopathology findings, preferred ontology; RepDose: histopathology findings) into 1st and 2nd level clusters, with the number of positive cpds.

    1st level cluster: hypertrophy
      EFPIA primary terms (142 positive cpds.): hypertrophy, epithelial | hypertrophy, hepatocyte | hypertrophy
      RepDose primary terms (380 positive cpds.): cell enlargement | enlargement | hypertrophy | peroxisome proliferation | swelling

    1st level cluster: vacuolation; 2nd level cluster: degenerative lesions
      EFPIA primary terms (140 positive cpds.): intracellular vacuolation | vacuolation, biliary epithelium | vacuolation, epithelial
      RepDose primary terms (134 positive cpds.): vacuolization

    1st level cluster: steatosis; 2nd level cluster: degenerative lesions
      EFPIA primary terms (131 positive cpds.): accumulation, lipid | increased, lipid content | intracellular increase of lipids | vacuolation, lipidic (fatty change) | vacuolation, lipidic
      RepDose primary terms (114 positive cpds.): fatty degeneration | lipidosis

    1st level cluster: necrosis; 2nd level cluster: degenerative lesions
      EFPIA primary terms (126 positive cpds.): necrosis, fibrinoid | necrosis, focal/multifocal | necrosis, hepatocellular | necrosis, centrilobular | necrosis, midzonal | necrosis, periportal | necrosis, zonal | single cell necrosis …
      RepDose primary terms (165 positive cpds.): apoptosis | necrosis

    1st level cluster: inflammation; 2nd level cluster: inflammatory changes
      EFPIA primary terms (124 positive cpds.): abscess(es) | chronic inflammatory/proliferative/metaplastic changes | inflammation, granulomatous | inflammatory processes …
      RepDose primary terms (67 positive cpds.): inflammation

    1st level cluster: infiltration; 2nd level cluster: inflammatory changes
      EFPIA primary terms (94 positive cpds.): inflammatory cell infiltration | granuloma | histiocytic inflammatory cell infiltrate | histiocytic proliferation | increased, histiocyte number | increased plasma cell number ...
      RepDose primary terms (78 positive cpds.): granulocytes | granuloma | histiocytosis | infiltration | leukocytes | macrophages | polynuclear cells
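    The two-level grouping above can be expressed as two lookup tables. The sketch below uses a small subset of the listed terms; taking the lowest LOEL among the grouped terms as the cluster LOEL is an assumption made for this illustration, not something the poster states, and the function name is hypothetical.

```python
# Subset of the primary-term -> 1st level cluster mapping listed above
FIRST_LEVEL = {
    "necrosis, centrilobular": "necrosis",
    "necrosis, periportal": "necrosis",
    "single cell necrosis": "necrosis",
    "apoptosis": "necrosis",
    "intracellular vacuolation": "vacuolation",
    "vacuolization": "vacuolation",
    "accumulation, lipid": "steatosis",
    "lipidosis": "steatosis",
    "inflammation, granulomatous": "inflammation",
    "granuloma": "infiltration",
}

# 1st level cluster -> 2nd level cluster, as listed above
SECOND_LEVEL = {
    "necrosis": "degenerative lesions",
    "vacuolation": "degenerative lesions",
    "steatosis": "degenerative lesions",
    "inflammation": "inflammatory changes",
    "infiltration": "inflammatory changes",
}


def cluster_loels(findings, level=1):
    """Aggregate {primary term: LOEL} to {cluster: lowest LOEL} (assumed rule)."""
    clustered = {}
    for term, loel in findings.items():
        cluster = FIRST_LEVEL.get(term)
        if cluster is None:
            continue  # term not covered by this illustrative mapping
        if level == 2:
            cluster = SECOND_LEVEL.get(cluster)
            if cluster is None:
                continue  # no 2nd level cluster known for this 1st level cluster
        clustered[cluster] = min(loel, clustered.get(cluster, loel))
    return clustered


# Hypothetical findings for one compound
findings = {
    "necrosis, centrilobular": 250.0,
    "single cell necrosis": 100.0,
    "intracellular vacuolation": 500.0,
}
level1 = cluster_loels(findings, level=1)
level2 = cluster_loels(findings, level=2)
print(level1)
print(level2)
```

    Each step up the hierarchy merges more primary terms into one label, which raises the number of positive compounds per training set at the cost of mechanistic detail.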

    • Many compilation steps had to be applied to transform the in vivo hepatotoxicity data from the eTOX database into usable model training datasets
      • Data extraction of treatment-related findings from unpublished preclinical toxicity study reports
      • Harmonization of the verbatim terms using special ontologies and combination of the data
      • Grouping of primary histopathology terms into 1st and 2nd level clusters of similar findings (and mechanisms) to obtain training data with sufficient compound numbers and chemical space coverage

    • Various in silico models were developed from these training datasets using approaches such as
      • Statistical structural fragment / fingerprint-based models
      • Molecular descriptor-based machine learning models (QSAR), such as Decision Tree (DT), Partial Least Squares (PLS), Random Forest (RF), k-Nearest Neighbor (kNN) etc.
      • Systems biology models
      • Structural alerts

    • Internal & external validation showed sensitivities up to 81% and specificities up to 88%

    • Case study on tienilic acid demonstrated how these in silico models can be used for the prediction of DILI and for elucidation of potential mechanisms of hepatotoxicity
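    To illustrate one of the listed model types, the sketch below shows a k-nearest-neighbor classifier on structural fragment sets with Tanimoto similarity. It is not the eTOX implementation: the fragment identifiers (F1, F2, ...), the labels and the choice of Tanimoto similarity and majority voting are all assumptions made for this example.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two fragment sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def knn_predict(query, training, k=3):
    """Predict 1 (positive) or 0 (negative) by majority vote of the k most similar compounds."""
    ranked = sorted(training, key=lambda item: tanimoto(query, item[0]), reverse=True)
    top = ranked[:k]
    votes = sum(label for _, label in top)
    return int(votes * 2 > len(top))


# Hypothetical training set: (fragment set, label); fragments are placeholders
training = [
    (frozenset({"F1", "F2", "F3"}), 1),
    (frozenset({"F1", "F3"}), 1),
    (frozenset({"F2", "F4", "F5", "F6"}), 0),
    (frozenset({"F4", "F5"}), 0),
    (frozenset({"F5", "F6"}), 0),
]

query_a = frozenset({"F1", "F2"})
query_b = frozenset({"F4", "F5"})
pred_a = knn_predict(query_a, training)
pred_b = knn_predict(query_b, training)
print(pred_a, pred_b)
```

    In practice the fragment sets would come from a cheminformatics toolkit and the labels from the clustered histopathology data; the nearest neighbors also show which training compounds drive a prediction.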

    Tienilic acid

    • Withdrawn from the market due to idiosyncratic, autoimmune-mediated hepatotoxicity findings in patients, characterized by covalent CYP2C9 binding (antibodies to the acylated CYP2C9 hapten found in patients) [6]
    • Mulliner et al. models [7] prediction results: Leadscope & SVM prediction summary, structural feature contribution
    • eTOX models prediction results: Leadscope & C5 DT summary, structural feature contribution
    • Structural alert: Thiophene
    • c) metabolic CYP soft spot analysis (MetaSite)

    • Drug Induced Liver Injury (DILI)
      • still of great concern for patient safety and a major cause of drug candidate attrition and drug withdrawal from the market

    • Data about liver toxicity
      • scattered and available from many sources
        → public, proprietary, consortia (eTOX), commercial
        → eSafety in vitro, preclinical in vivo, clinical, post-market

    • Chen et al. [1] reviewed several in silico models for human DILI (since 2012)
      • Models trained with 74 - 1087 compounds from different data sources using various modeling approaches
      • Performance:
        higher accuracy for small datasets: 70-84% (13-53 compounds)
        lower accuracy for larger datasets: 60-75% (73-1087 compounds)
        high specificity (90-95%), but poorer sensitivity (~50%)

    → Conclusion
      • Sufficiently harmonized hepatotoxicity datasets from all the different sources in high detail level are missing
      • Consequently, satisfactory in silico models for generating reliable predictions of hepatotoxicity are lacking

    using these harmonized, detailed training data

    • In the eTOX consor

