Reconstruction of dynamic mRNA Institutionen f£¶r klinisk farmakologi Project name Sponsors Language

  • View
    0

  • Download
    0

Embed Size (px)

Text of Reconstruction of dynamic mRNA Institutionen f£¶r klinisk farmakologi Project name...

  • UPTEC X 06 039 ISSN 1401-2138 AUG 2006

    KRISTOFFER FORSLUND

    Reconstruction of dynamic mRNA networks Master’s degree project

  • Bioinformatics Programme Uppsala University School of Engineering

    UPTEC X 06 039 Date of issue 2006-09 Author

    Kristoffer Forslund Title (English)

    Reconstruction of dynamic mRNA networks

    Title (Swedish) Abstract This work uses computer simulations to evaluate algorithms that analyze gene regulatory networks by systematically perturbed gene expression measurements. To investigate the accuracy of a family of such algorithms, they are applied repeatedly to a variety of simulated gene regulatory networks over a range of conditions. Thereby, statistical comparisons may be made between different methods. Specifically, an algorithm called Mode of action by Network Identification (MNI) is shown to be at least comparable to its parent method, and is shown to be able to distinguish between direct and indirect target genes for a given treatment under noise-free circumstances. To further evaluate the performance of MNI, further research might apply it to a wider range of existing expression data collections. Keywords Gene regulation, MNI, system recovery, simulation, mRNA, perturbation method

    Supervisors Mats Gustafsson

    Computational medicine group

    Scientific reviewer Rolf Larsson

    Institutionen för klinisk farmakologi

    Project name

    Sponsors

    Language English

    Security

    ISSN 1401-2138

    Classification

    Supplementary bibliographical information Pages 68

    Biology Education Centre Biomedical Center Husargatan 3 Uppsala Box 592 S-75124 Uppsala Tel +46 (0)18 4710000 Fax +46 (0)18 555217

  • Reconstruction of dynamic mRNA networks

    Kristoffer Forslund

    Sammanfattning För att kunna förstå människor eller andra biologiska system behöver vi veta vad deras gener gör. Detta innebär dels deras kemiska funktion, och dels de villkor under vilka de aktiveras. Hur olika gener aktiverar eller avaktiverar varandra, det vi kallar det genetiska reglernätverket, är en av de centrala frågorna i den moderna biologin, både för att vi vill förstå organismerna i sig själva och för att kunna inverka på dem. Exempelvis dyker ofta situationen upp att ett läkemedel, kanske en cancerbehandling, har effekt men utan att vi är säkra på vilken gen den faktiskt inverkar på. För att kunna besvara sådana frågor behöver vi ett sätt att kartlägga det genetiska reglernätverket. Det har lagts fram metoder för att försöka göra detta. Dessa utgår från experiment där man systematiskt stör celler av den typ man vill studera genom att överuttrycka specifika gener. Därefter mäts hur mycket mRNA som produceras för varje gen och resultaten sätts samman till en datamängd som beskriver hur mRNA-nivåerna relaterar till varandra under olika villkor. Denna datamängd analyseras sedan bioinformatiskt, genom en algoritm som söker det reglernätverk som bäst motsvarar experimenten. Hur väl de här algoritmerna faktiskt fungerar är inte helt klarlagt. Detta arbete prövar ut sådana algoritmer för analys av genetiska reglernätverk. Detta görs genom att simulera ett stort antal experiment på virtuella system av gener under olika omständigheter. Därigenom kan algoritmens resultat jämföras direkt med de faktiska system som användes för simulationen, och på så vis kan dess effektivitet utvärderas. Här har fokuserats på två specifika sådana algoritmer, och det visas att de fungerar i princip.

    Examensarbete 20 p i Bioinformatikprogrammet

    Uppsala universitet augusti 2006

  • Table of Contents

    1 Introduction 3

    1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.3 Microarrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.4 Data analysis and pattern recognition . . . . . . . . . . . . . . . 5 1.5 Project goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.6 Outline of this thesis . . . . . . . . . . . . . . . . . . . . . . . . . 7

    2 Methods, materials and algorithms 8

    2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 2.2 Methods for analysing perturbation data . . . . . . . . . . . . . . 9

    2.2.1 The expression change method (”subtraction method”) . 9 2.2.2 The NIR method . . . . . . . . . . . . . . . . . . . . . . . 9 2.2.3 The MNI method . . . . . . . . . . . . . . . . . . . . . . . 12 2.2.4 Variant MNI methods . . . . . . . . . . . . . . . . . . . . 16 2.2.5 Other perturbation methods . . . . . . . . . . . . . . . . 17

    2.3 Evaluating the performance of perturbation methods . . . . . . . 17 2.3.1 Model differences . . . . . . . . . . . . . . . . . . . . . . . 17 2.3.2 Performance in published studies - NIR . . . . . . . . . . 18 2.3.3 Performance in published studies - MNI . . . . . . . . . . 18

    2.4 Simulated systems . . . . . . . . . . . . . . . . . . . . . . . . . . 19 2.4.1 Simulation approach . . . . . . . . . . . . . . . . . . . . . 19 2.4.2 Linear system model . . . . . . . . . . . . . . . . . . . . . 19 2.4.3 Cascade system . . . . . . . . . . . . . . . . . . . . . . . . 20 2.4.4 Full (Zak) system . . . . . . . . . . . . . . . . . . . . . . . 25 2.4.5 Estimating connectivity for the full system . . . . . . . . 26 2.4.6 Random architecture system model . . . . . . . . . . . . . 29

    2.5 Perturbation sets . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 2.5.1 Single plasmid-type perturbations . . . . . . . . . . . . . 36 2.5.2 Pairwise plasmid-type perturbations . . . . . . . . . . . . 36

    2.6 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 2.6.1 Experiment setup . . . . . . . . . . . . . . . . . . . . . . . 37 2.6.2 Dependence on initial guess . . . . . . . . . . . . . . . . . 38 2.6.3 Other MNI variants . . . . . . . . . . . . . . . . . . . . . 38 2.6.4 Dependence on noise . . . . . . . . . . . . . . . . . . . . . 38 2.6.5 Dependence on amplification factor . . . . . . . . . . . . . 39 2.6.6 Dependence on MNI σ factor . . . . . . . . . . . . . . . . 39 2.6.7 Result corroboration MNI - NIR . . . . . . . . . . . . . . 39

    1

  • 3 Results 41

    3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 3.2 Evaluating results . . . . . . . . . . . . . . . . . . . . . . . . . . 41

    3.2.1 Matrix comparison . . . . . . . . . . . . . . . . . . . . . . 41 3.2.2 Result presentation . . . . . . . . . . . . . . . . . . . . . . 43

    3.3 Dependence on measurement noise . . . . . . . . . . . . . . . . . 44 3.3.1 Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . 49

    3.4 Dependence on amplification factor . . . . . . . . . . . . . . . . . 49 3.4.1 Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . 54

    3.5 MNI σ factor dependence . . . . . . . . . . . . . . . . . . . . . . 54 3.5.1 Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . 56

    3.6 Result corroboration MNI - NIR . . . . . . . . . . . . . . . . . . 56 3.6.1 Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . 60

    4 Discussion 61

    4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 4.2 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

    4.2.1 Direct/Indirect responder discrimination . . . . . . . . . . 61 4.2.2 Variant methods . . . . . . . . . . . . . . . . . . . . . . . 62 4.2.3 MNI and NIR comparison . . . . . . . . . . . . . . . . . . 62 4.2.4 Connectivity recovery . . . . . . . . . . . . . . . . . . . . 62 4.2.5 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . 63

    4.3 Future directions of work . . . . . . . . . . . . . . . . . . . . . . 63

    5 Acknowledgements 65

    References 66

    2

  • 1 Introduction

    1.1 Overview

    This section (1) begins with an overview of its contents (1.1), then proceeds (1.2) by describing the background and the underlying set of biological questions that the techniques addressed in this thesis seeks to answer. Next follows a very brief introduction to microarray technology (1.3) and to pattern recognition (1.4) as they relate to this problem. The goals (1.5) of the project are described, and finally (1.6) the report as a whole is outlined.

    1.2 Background

    In the so-called ”post-genomic” era following the completion of the Human Genome Project, biology has increasingly shifted towards holistic or systemic approaches, giving rise to the buzzwords of genomics, proteomics, peptidomics, transcriptomics etc. These are all variations on a central theme; as we now have access to almost every single component of the (human) biological make-up, and increasingly powerful methods of measuring several quantities at once, we may begin to study on a large scale how these components interact. A core idea here is that biological responses cannot be adequatly modelled without considering the setting in which they take place; understanding of a disease or molecular mechanism does not come from studying a single gene, but rather a set of interacting genes that together with the surrounding circumstances form a whole.

    Another key theme to the post-genomic era is that after the human genome se- quencing provided a map of our genetic make-up, the post-genomic research will provide an understanding of that map. Dete