30
In vivo and in vitro human gene essentiality estimations capture contrasting functional constraints Caldu-Primo JL 1,2 , Verduzco-Martínez JA 3 , Alvarez-Buylla ER 1,2 , Davila-Velderrain J 4,5 * 1 Instituto de Ecología, Universidad Nacional Autónoma de México, Cd. Universitaria, México, D.F., 04510, Mexico. 2 Centro de Ciencias de la Complejidad (C3), Universidad Nacional Autónoma de México, Cd. Universitaria, México, D.F., 04510, Mexico. 3 Departamento de Biologia Celular y Genetica, Facultad de Ciencias Biologicas, Universidad Autonoma de Nuevo Leon, San Nicolas de los Garza, Nuevo Leon, Mexico. 4 MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, Massachusetts 02139, USA. 5 Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA. Corresponding author: *e-mail: [email protected] . Abstract Gene essentiality estimation is a popular empirical approach to link genotypes to phenotypes. In humans, essentiality is estimated based on loss-of-function (LoF) mutation intolerance, either from population exome sequencing (in vivo) data or CRISPR-based in vitro perturbation experiments. Both approaches identify genes presumed to have strong detrimental consequences on the organism upon mutation. Are these genes functionally distinct and constrained by having key roles? Do in vivo and in vitro estimations equally recover these constraints? To address these questions, here we integrated disparate genome-scale datasets and compared structural, functional, and evolutionary features of essential genes versus genes with extremely high mutational tolerance and proteome expectation. We found that essentiality estimates do recover functional constraints. However, the organismal or cellular context of estimation leads to functionally contrasting properties underlying the constraint. Our results suggest that depletion of LoF mutations in human populations effectively captures developmental, organismal-level functional constraints not experimentally accessible through CRISPR-based screens. Finally, we identify a new set of genes ( OrgEssential), which are intolerant of LoF mutation in vivo but highly tolerant in vitro. These genes drive observed functional constraint differences and have an unexpected preference for nervous system expression. . CC-BY-ND 4.0 International license a certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under The copyright holder for this preprint (which was not this version posted October 22, 2019. ; https://doi.org/10.1101/814855 doi: bioRxiv preprint

In vivo and in vitro human gene essentiality estimations capture … · In vivo and in vitro human gene essentiality estimations capture contrasting functional constraints Caldu-Primo

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

  • In vivo and in vitro human gene essentiality estimations capture contrastingfunctional constraintsCaldu-Primo JL1,2, Verduzco-Martínez JA3, Alvarez-Buylla ER1,2, Davila-Velderrain J4,5*

    1Instituto de Ecología, Universidad Nacional Autónoma de México, Cd. Universitaria, México, D.F., 04510, Mexico.2Centro de Ciencias de la Complejidad (C3), Universidad Nacional Autónoma de México, Cd. Universitaria, México, D.F.,04510, Mexico.3Departamento de Biologia Celular y Genetica, Facultad de Ciencias Biologicas, Universidad Autonoma de Nuevo Leon,San Nicolas de los Garza, Nuevo Leon, Mexico.4MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, Massachusetts 02139, USA. 5Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA.Corresponding author: *e-mail: [email protected].

    Abstract

    Gene essentiality estimation is a popular empirical approach to link genotypes to phenotypes. Inhumans, essentiality is estimated based on loss-of-function (LoF) mutation intolerance, either frompopulation exome sequencing (in vivo) data or CRISPR-based in vitro perturbation experiments. Bothapproaches identify genes presumed to have strong detrimental consequences on the organism uponmutation. Are these genes functionally distinct and constrained by having key roles? Do in vivo and invitro estimations equally recover these constraints? To address these questions, here we integrateddisparate genome-scale datasets and compared structural, functional, and evolutionary features ofessential genes versus genes with extremely high mutational tolerance and proteome expectation.We found that essentiality estimates do recover functional constraints. However, the organismal orcellular context of estimation leads to functionally contrasting properties underlying the constraint. Ourresults suggest that depletion of LoF mutations in human populations effectively capturesdevelopmental, organismal-level functional constraints not experimentally accessible throughCRISPR-based screens. Finally, we identify a new set of genes (OrgEssential), which are intolerantof LoF mutation in vivo but highly tolerant in vitro. These genes drive observed functional constraintdifferences and have an unexpected preference for nervous system expression.

    .CC-BY-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

    The copyright holder for this preprint (which was notthis version posted October 22, 2019. ; https://doi.org/10.1101/814855doi: bioRxiv preprint

    mailto:[email protected]://doi.org/10.1101/814855http://creativecommons.org/licenses/by-nd/4.0/

  • Introduction

    Understanding the patterns and phenotypic consequences of genotypic alterations is a fundamentalproblem in evolution and development 1–4. A popular empirical approach to link genotypes tophenotypes is by estimating the degree of essentiality of a gene. A gene is considered “essential” if itis required to sustain life in cells or whole organisms, and this requirement is often estimated byexperimental perturbations 5,6. Thus, the study of essential genes was originally conducted onprokaryotes, due to their accessibility to genetic manipulation. More recently, however, geneessentiality has been estimated in multicellular eukaryotes, including mammals 7. Despite theabsolute character of the “essential” gene denomination, data from multiple studies in modelorganisms have shown strong context dependency: genes are required for survival or not dependingon environmental conditions and developmental stages 5,8,9. Sequencing technologies and gene editing techniques enabled the estimation of gene essentiality inhumans 6. The problem has been addressed following two approaches. On one hand, systematictesting of gene silencing effects on human cell cultures identifies genes that affect cell viability oroptimal fitness upon perturbation 10–13. On the other hand, population-level statistical estimates ofunexpected mutational depletion identifies genes presumed to be subjected to functional constraints14. Both approaches aim at ranking genes according to their effect on the organism (or cell) upon loss-of-function mutations. However, given the context dependency of gene essentiality, and thedifferences in the organizational level at which the effects of genotypic changes are assessed, theparallels of the two types of essentiality approximations are unclear.

    In vitro screens of mutation tolerance identify genes with an immediate effect on cell proliferation andviability; consequently, the corresponding essentiality estimates depend on the specific cell line andculture conditions being tested. In addition, cell culture experiments do not capture developmentaland functional constraints intrinsic to the organism. In vitro estimation of gene essentiality is thusinevitably tailored to cell viability and/or fitness. On the other hand, ”in vivo” measures of mutationaltolerance estimated from population-level genetic variation score genes according to theprevalence/depletion of loss-of-function (LoF) mutations. Genes showing mutational constraint areassumed to be consistent with a scenario where purifying selection filters out protein alteringmutations with detrimental effects, thus eluding fixation within the population. In this sense, in vivoestimates of mutational tolerance are considered a proxy for the effect of mutations on organismalfitness. Such effect, in turn, mirrors to some extent the notion of essentiality in the context ofpopulation dynamics 5. Both types of estimations (in vitro and in vivo) have been discussed within thecontext of human gene essentially, nonetheless 6. Hereafter we use the terms cellular viability (CV)and organismal fitness (OF) to refer to the context at which human gene essentiality is estimated bymeans of either in vitro perturbation-based or in vivo population-based mutation tolerance estimation,respectively.

    .CC-BY-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

    The copyright holder for this preprint (which was notthis version posted October 22, 2019. ; https://doi.org/10.1101/814855doi: bioRxiv preprint

    https://paperpile.com/c/r0zWIQ/HK7h+gSn5+P7IN+94GVhttps://paperpile.com/c/r0zWIQ/Lkw4https://paperpile.com/c/r0zWIQ/LCRthttps://paperpile.com/c/r0zWIQ/o6hNhttps://paperpile.com/c/r0zWIQ/mCGT+vimS+j1Ib+PY8jhttps://paperpile.com/c/r0zWIQ/Lkw4https://paperpile.com/c/r0zWIQ/LCRt+zxss+FNhohttps://paperpile.com/c/r0zWIQ/VayShttps://paperpile.com/c/r0zWIQ/LCRt+Lkw4https://doi.org/10.1101/814855http://creativecommons.org/licenses/by-nd/4.0/

  • Notably, in both the CV and the OF context, a subset of mutational intolerant genes has beenobserved, leading to the idea of defining an “essential genome” containing genes that do not toleratemutations, and a “dispensable genome” including mutation-tolerant genes 6,14. Intolerant genes(essential) are commonly of interest due to their potential detrimental effect on phenotype anddisease association; however, highly tolerant genes (non-essential) might be relevant for evolvability,due to the plasticity they confer to the system at longer time-scales -- for example, as sources ofcryptic genetic variation 4 or possible editable links that integrate subsystems 15. Hereafter we will usethe terms tolerant and intolerant to refer to human essential or non-essential genes as estimated bythe degree of tolerance to LoF mutations. Despite the potential functional relevance of tolerant and intolerant genes, an understanding of themolecular determinants that discriminate the two groups has been only partially explored for humans6. Moreover, an understanding of the dependency of molecular determinants of gene essentiality onthe differences between the operational context of estimation (CV, in-vitro vs. OF, in-vivo) is lacking.To address these problems, here we systematically defined groups of human tolerant and intolerantgenes and performed an integrative and comparative analysis of the structural, functional, andevolutionary features associated with gene essentiality. We analyzed the particularities andcommonalities between genes that show extreme (in)tolerance to LoF mutation in a given context:CV, OF, or both (Figure 1).

    .CC-BY-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

    The copyright holder for this preprint (which was notthis version posted October 22, 2019. ; https://doi.org/10.1101/814855doi: bioRxiv preprint

    https://paperpile.com/c/r0zWIQ/Lkw4https://paperpile.com/c/r0zWIQ/fcJFhttps://paperpile.com/c/r0zWIQ/94GVhttps://paperpile.com/c/r0zWIQ/o6hN+Lkw4https://doi.org/10.1101/814855http://creativecommons.org/licenses/by-nd/4.0/

  • Results

    Context-dependent mutational tolerance categorises human genes

    To define genes that show extreme (in)tolerance to detrimental mutation, as estimated from patternson mutational depletion in exome sequencing data (OF) or CRISPR-based cell culture perturbationexperiments (CV) data, we first calculated for each gene and context a consensus tolerance scoremeasured by the average rank across a total of 6 and 10 previously proposed tolerance measures forOF or CV, respectively. A high correlation among pairs of individual measures within each context(OF or CV) justifies the definition of the proposed consensus score (Figure 2a). We then defined a setof mutation-intolerant (tolerant) genes based on the distribution of the computed scores, using the80th percentile of the distribution as arbitrary cut-off value. This choice approximates the previouslyproposed cut-off of pLI>0.9 used in 14 when introducing the widely used LoF-intolerance metric, pLI. Inaddition, we defined a contrasting, similar-sized set of mutation-tolerant (intolerant) genes byselecting the 20% bottom ranked genes of the consensus score distribution (Figure 2b). The size ofthe resulting gene sets are 3,028 genes for both intolerant and tolerant OF groups and 3,139 genesfor both intolerant and tolerant CV groups. Exploring the intersection between gene sets, we identified714 tolerant and 771 intolerant genes with consistent tolerance behavior across contexts. In contrast,we identified 918 inconsistent genes, whose tolerance behavior depends on the context (OF/CV)(Figure 2c).

    Structural and functional constraints predict mutational tolerance classes

    We next tested whether the different gene groups are distinctively associated with structural andfunctional molecular network properties. Following previous studies that point to a central role ofessential genes in the interactome 16–19, we first asked if (in)tolerant genes have contrasting positionsin the interactome and whether such pattern is consistent in genes affecting both organismal andcellular fitness. From a network perspective, we hypothesized that (1) a core constrainedneighborhood exists within the human interactome, which is separated from a more peripheral,scattered layer formed by mutationally tolerant genes; and (2) that, given the central role of theintolerant neighborhood, perturbations affecting the corresponding genes are more likely to conferstructural fragility to the entire system.

    By measuring network features associated with node centrality and aggregation (Figure 3a, seeMethods), we confirmed that intolerant genes estimated by either effect on OF or CV, both have asignificantly higher aggregation and centrality in the interactome, while tolerant genes consistentlyshow the opposite behavior: loose aggregation and peripheral positioning (Figure 3c, SupplementaryFigure 1). To test the vulnerability of the interactome to perturbations targeting tolerant or intolerantgenes, we analyzed the network’s behavior as a function of the progressive removal of nodes in

    .CC-BY-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

    The copyright holder for this preprint (which was notthis version posted October 22, 2019. ; https://doi.org/10.1101/814855doi: bioRxiv preprint

    https://paperpile.com/c/r0zWIQ/bB82+mkdd+s9ri+Y9BBhttps://paperpile.com/c/r0zWIQ/o6hNhttps://doi.org/10.1101/814855http://creativecommons.org/licenses/by-nd/4.0/

  • decreasing order of mutational intolerance score (Methods). This analysis further confirmed that thereis a strong association between mutational patterns and the structural properties of the interactome,revealing that intolerant gene removal produces a higher structural damage than random noderemoval (Figure 3b).

    The previous data suggests that global structural properties of the interactome are related tomutational tolerance classes. We reasoned that there might also be molecular properties suggestiveof functional constraint that similarly discriminate tolerance groups. By analysing the degree ofestimated haploinsufficiency (GHIS), protein intrinsic disorder (ID), expression breadth, andspecificity; we similarly found contrasting patterns between tolerant and intolerant genes. Intolerantgenes are more likely to be haploinsufficient, to have more intrinsic disorder in protein structure, andto be more broadly expressed across tissues; while tolerant genes show exact opposite behavior(Figure 3d, Supplementary Figure 1). Together, our results confirm that mutationally tolerant andintolerant genes can be consistently discriminated by features indicative of structural and functionalconstraints, irrespective of the context in which tolerance is estimated (OF or CV).

    Evolutionary history of tolerance gene classes

    Gene essentiality and interactome centrality have been previously related to evolutionaryconservation, with a tendency for topologically central and essential genes to be conserved(evolutionarily old) 20,21. We studied whether the tolerance gene groups identified here similarly havecontrasting evolutionary conservation patterns and whether associations are consistent in genesaffecting organismal or cellular fitness. We analyzed two features of evolutionary conservation:orthology and paralogy.

    First, we evaluated the degree of gene conservation by calculating a gene conservation index (CI)that measures the proportion of species in which a human gene has a one-to-one ortholog (Figure4a). We considered a total of 187 species from 7 taxonomic groups (Archaea, Bacteria, Protozoa,Fungi, Plants, Invertebrates, Vertebrates) (Methods). Intolerant genes are significantly moreconserved than tolerant genes in both OF and CV contexts (Figure 4b, Supp. Fig. 2). Notably,however, the conservation of intolerant genes that affect cell viability is considerably higher than thatof genes affecting organismal fitness. To further explore the difference in conservation, we calculatedthe proportion of genes with a one-to-one ortholog in each taxonomic group, for every tolerancegroup (Figure 4c). This analysis revealed a clear difference between CV and OF gene groups. Inparticular, CV intolerant genes are significantly more represented in every taxonomic group except forVertebrates, while the behavior of OF intolerant genes does not deviate from random expectations,presenting only a marginal enrichment among Plants and Fungi and depletion in Bacteria (Figure 4c).This indicates a deep conservation of intolerant genes affecting cellular viability in humans andpossibly reflecting the relevance of these genes in core cell-autonomous functions.

    We next analysed the association between tolerance groups and copy number variation, consideringthe number of gene duplication events represented in each gene group. The number of duplicationevents is consistently depleted among intolerant genes in both OF and CV, suggesting furtherevolutionary constraint in mutational events for genes that are also intolerant to deleterious pointcoding mutations. In contrast, tolerant genes show overrepresentation of duplication events, but only

    .CC-BY-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

    The copyright holder for this preprint (which was notthis version posted October 22, 2019. ; https://doi.org/10.1101/814855doi: bioRxiv preprint

    https://paperpile.com/c/r0zWIQ/XepP+X06Ghttps://doi.org/10.1101/814855http://creativecommons.org/licenses/by-nd/4.0/

  • in the OF context, which is consistent with a scenario of relaxed selection pressure in these genes(Figure 5d). The evolutionary pattern of reduced duplication events in intolerant genes is alsoconsistent with a reduction in gene family size distribution for intolerant relative to tolerant genes(Supp. Fig. 2). Together, these results confirm that there is a marked difference in the evolutionaryhistory of tolerant vs intolerant genes. They also uncover clear differences in genes that affect cellvariability or organismal fitness.

    Molecular classes predict context-dependent mutational tolerance

    Our previous results revealed differences in the evolutionary history of tolerance gene classes thataffect CV or OF, suggesting that the deep conservation of intolerant genes with effect in cellularviability possibly stems from their role in core unicellular functions. To further explore this inferenceand unravel the differences between the OF and CV gene sets, we analysed the differentialoverrepresentation of tolerance gene groups within protein classes and gene ontology terms.

    We again found differences in the protein class distribution of (in)tolerant genes, depending on thecontext of influence CV or OF. (Figure 5a). Protein kinases, receptors, transcription factors (TF), andcancer associated proteins show an unexpected contrasting pattern in CV vs OF context, withenrichment of OF intolerant genes but depletion of CV intolerant genes, and a reverse pattern fortolerant genes: enrichment for CV and depletion for OF. Thus these four protein categories, whichtogether have a key role in developmental processes and associated signaling pathways, tend as agroup to not tolerate LoF mutations in the human population, yet are not strongly required for humancell viability.

    Gene ontology enrichment analysis further supports the difference between tolerance groupsdepending on CV or OF context, with distinct over-represented term (Figure 5b). Consistent with theprevious result, OF intolerant genes show over-representation of gene ontology terms related withdevelopment and cell communication, such as transcription regulation, kinases, and synapse. CVintolerant genes on the contrary, show over-representation of core cellular processes related to cellenergetics, replication, transcription, and translation. Consistent with contrasting functional propertiesof mutational gene tolerance classes depending on context, human genes that tolerate LoF mutationsin cell culture are over-represented in processes related to cell adhesion and communication -- i.e., insimilar processes that, in sharp contrast, do not tolerate mutations in the human population context(OF) (Figure 5b).

    Contrasting tissue-specificity and developmental activity of (in)tolerant genes

    While global network structural and molecular functional properties provide evidence of consistentstrong functional constraint on genes that do not tolerate LoF mutations in either human populations(OF context) or in human culture experiments (CV context), more detailed analyses of evolutionaryhistory, protein classes, and gene ontology terms suggest that the two contexts (OF and CV) capturedistinct functional roles of intolerant genes in the organism. To further explore the hypothesis ofcontrasting functional constraints, we gathered and interrogated data informative of developmentalinvolvement and tissue-specific expression.

    .CC-BY-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

    The copyright holder for this preprint (which was notthis version posted October 22, 2019. ; https://doi.org/10.1101/814855doi: bioRxiv preprint

    https://doi.org/10.1101/814855http://creativecommons.org/licenses/by-nd/4.0/

  • First, we evaluated the distribution of developmental stages in which genes are first expressed.Intolerant genes are expressed earlier in development than tolerant genes in both OF and CVcontexts (Figure 6a), with at least 98% of the genes already expressed in prenatal stages. On thecontrary, tolerant genes are depleted in prenatal stages and preferentially expressed after birth. Thissimilar pattern of early expression is consistent with the high involvement of both TF mediatedspecification, cell-attachment, and core cellular replication in embryogenesis and organogenesis.However, when considering curated gene sets involved in specific developmental processes, wefound a contrasting pattern between OF and CV gene sets, consistent with previous results. In OFcontext, intolerant genes are over-represented in every developmental category, while tolerant genesare depleted in all categories. In sharp contrast, in CV context, tolerant genes are over-represented indevelopmental processes, while intolerant genes are depleted (Figure 6b). This result furthersupports the view that genes intolerant of LoF mutations in the human population are preferentiallyinvolved in organismal development, and that the same constraint is not captured by in vitro screensof gene essentiality.

    In addition to developmental-stage associations, we next explored whether (in)tolerant gene classesof CV or OF context recover distinct preferential behavior in adult tissues. We used RNA-seq datafrom the Genotype-Tissue Expression (GTEx) project 22 to analyze patterns of tissue-specificexpression. First we performed gene expression specificity analysis to compute for each tissue andgene a quantitative measure of specific expression relative to all other tissues (Methods). Then, weused the specificity values to estimate the degree to which a gene tolerance group showsunexpectedly high preferential expression in a given tissue relative to random expectation, resulting az-score. We performed these calculations independently for each tissue and tolerance group, and foreach context (CV or OF). We again found both common and particular patterns of behavior amongOF and CV contexts. We found consistent opposite behavior in tissue preference in tolerant vsintolerant genes: in both contexts tissues with preferential expression of intolerant genes showdepleted preferential expression of tolerant genes, and vice versa (Figure 6c).

    Next, to contrast the tissue-preference behavior of tolerance groups in each context, we ordered thetissues by their relative preference to preferentially express intolerant vs tolerant genes. Wemeasured this by the log ratio of the rank of the tissue in intolerant preferential expression relative toits rank in tolerant preferential expression. Using this approach, tissues that tend to preferentiallyexpress intolerant genes and not tolerant genes appear on top (Figure 6c). Notably, this analysisuncovered a contrasting behavior between OF and CV contexts: tissues from the central nervoussystem as a group (n=12 brain regions) show the largest relative preference of intolerant geneexpression in the OF context and the least preference in the CV context. In other words, we foundthat the adult human brain tends to preferentially express genes that do not tolerate LoF mutations inthe human population and to not preferentially express both OF tolerant genes and genes required forcell viability (CV intolerant genes) (Figure 6c). The other tissues do not show any clear patterndistinguishing CV and OF measures. We corroborated the reproducibility of these results by using

    .CC-BY-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

    The copyright holder for this preprint (which was notthis version posted October 22, 2019. ; https://doi.org/10.1101/814855doi: bioRxiv preprint

    https://paperpile.com/c/r0zWIQ/XLaahttps://doi.org/10.1101/814855http://creativecommons.org/licenses/by-nd/4.0/

  • independent gene expression reference data form human protein atlas23 (Supplementary Figure 3).

    Genes with inconsistent mutational tolerance behavior

    The contrasting patterns found in OF versus CV gene groups suggests that genes with aninconsistent tolerance behavior across contexts might be driving the observed functional differences.To test this hypothesis, and to identify specific genes that capture the differential functionalconstraints accessible through population-based vs CRISPR-bases essentiality estimations, wedefined consistency gene classes and repeated all association analyses, using this time only the newgroups (Fig. 7a). We identified a group of 567 genes that do not tolerate LoF mutations in humanpopulations, but that are not required for survival in human cells (organismal intolerant but cellulartolerant genes, OI-CT). Similarly, we identified a group of 351 genes that are cellular intolerant butorganismal tolerant (CI-OT). Consistent with our previous results, OI-CT genes include major TFregulators of early cell lineage specification (e.g., SOX1, PAX6, and OLIG1), members of signalingpathways regulating these TFs (NOTCH1, NOTCH3, SMAD1), and genes encoding proteins relevantfor non-cell autonomous physiological integration (e.g. ion channels CACNA1C, CLCN3, GABRA1)(Supplementary Table 1).

    As expected, association analyses revealed clear differences in these two inconsistent groups (OI-CTand CI-OT), in particular with respect to categories with contrasting behavior in OF versus CV genesets. OI-CT genes are associated with gene ontology terms related to transcriptional regulation andneuronal communication, while CI-OT genes are associated with unicellular functions (Fig. 7b).Similarly, protein classes with contrasting behavior in OF vs CV (i.e., TFs, receptors, and kinases) arehighly over-represented in OI-CT (Figure 7c). Notably, the same enrichment patterns are notobserved when considering (in)tolerant genes with consistent behavior in both OF and CV contexts.

    We similarly identified discrepancies in the evolutionary history of the new gene groups. OI-CT geneshave less orthologs than expected within every taxonomic group except for Vertebrates, suggestingthat many of these genes emerged relatively late in evolution, in pair with the emergence ofVertebrates (Figure 7d). To further explore this observation, we calculated the number of genes ineach tolerance gene subgroup that have one-to-one orthologs only within Vertebrates, and not inother taxonomic groups. This analysis confirmed that more than 80% of OI-CT genes are exclusive tothe vertebrate branch, in sharp contrast with both consistently intolerant genes and with CI-OT genes(Figure 7e). Lastly, OI-CT genes are also highly over-represented within every developmentalprocess considered (Fig. 7f) and are preferentially expressed in adult human brain tissues (Fig. 7g).The evolutionary and functional patterns associated with the OI-CT group suggests that the genes inthis group are relevant for organismal physiology.

    Altogether, these results indicate that the genes with inconsistent mutational tolerance behavioracross OF and CV contexts drive the differences observed in OF versus CV gene groups. That is,because specific genes do not tolerate LoF mutations in the human population but are not required

    .CC-BY-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

    The copyright holder for this preprint (which was notthis version posted October 22, 2019. ; https://doi.org/10.1101/814855doi: bioRxiv preprint

    https://paperpile.com/c/r0zWIQ/wLDNhttps://doi.org/10.1101/814855http://creativecommons.org/licenses/by-nd/4.0/

  • for human cell viability, and, in contrast, other genes are required for the latter but do tolerate LoFmutations across the population; in vivo or in vitro overall estimations of human gene essentialitycapture contrasting functional constraints overall.

    .CC-BY-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

    The copyright holder for this preprint (which was notthis version posted October 22, 2019. ; https://doi.org/10.1101/814855doi: bioRxiv preprint

    https://doi.org/10.1101/814855http://creativecommons.org/licenses/by-nd/4.0/

  • Discussion

    We examined the degree to which measures that rank human genes according to their degree oftolerance to LoF mutations capture functional constraints. We considered tolerance estimationsbased on either in vivo exome-based population data or in vitro CRISPR-based perturbationexperiments. To interpret evidence of functional constraint, we integrated genome-wide datasetsrelated to gene function; including structural, functional, and evolutionary features; and comparedessential genes versus genes with extremely high mutational tolerance, considering randomproteome expectation. Our results indicate that intolerant genes (1) form a core networkneighborhood in the human protein interactome, (2) are enriched in molecular properties suggestiveof functional constraint, (3) are evolutionarily conserved, and (4) show preferential expression inspecific tissues and developmental stages (Figure 8). The molecular and network properties thatconsistently discriminate intolerant from tolerant genes suggest that essentiality estimates based onmutational tolerance inference do recover functional constraints, irrespective of estimation context(OF or CV). However, we also found differences in the discriminatory properties of genes dependingon whether their tolerance to mutation was estimated at the organismal or cellular level.

    Consistent with previous observations 6,14, we found that structural network properties consistentlydiscriminate tolerance/essentiality classes. Intolerant genes are central and localized in the humaninteractome, while tolerant genes are dispersed in the periphery. This relative organization predicts apreferential vulnerability of the cell to intolerant gene failure, a principle that we demonstrated here bysimulated network perturbation analysis. Both centrality and perturbation results provided resultsconsistent with the hypothesis of a dominant role of intolerant genes in influencing cell behavior.Molecularly, intolerant genes also show properties often associated with gene functional relevance.We observe a significant tendency (FDR

  • higher level of organization.

    The idea that the context at which mutational tolerance is estimatimated discriminates genesoperating at different levels of organization is reinforced when considering the differences we foundfor the enrichment of molecular classes among tolerance groups. The ontology terms associated withCV intolerant genes (mitochondria, RNA processing, ribonucleoprotein and cell cycle) relate to corefunctions required for cell survival, such as cellular metabolism and replication. On the contrary, CVtolerant genes relate to intercellular adhesion and communication (glycoprotein, cell junction, proteinkinase). In extreme contrast with the cellular level, OF intolerant genes are enriched in functionalfeatures key to multicellularity, such as organismal development and cell-cell communication; asevidenced by over-representation of transcriptional regulators, synapse genes, and proteins of classkinase, receptor, and transcription factor. These results suggest that OF measures recover functionalconstraints stemming from multicellularity and organismal regulation, a property not readily capturedby CV estimations. Consistent with this view, by considering curated gene annotations fordevelopmental processes, we found a contrasting pattern between CV and OF genes. Genesinvolved in developmental processes are enriched for OF intolerant genes and CV tolerant genes,and depleted in OF tolerant genes and CV intolerant genes.

    This contrasting behavior is also present in the tissue-specificity of gene expression. Specifically, wefound that OF intolerant and CV tolerant genes are preferentially expressed in the adult human brain,in contrast with the underexpression of both OF tolerant genes and genes required for cell viability(CV intolerant genes). The overrepresentation of OF intolerant genes in the adult brain requires acareful explanation on their specific function in the organism and why is their sequence somutationally constrained, which is beyond the scope of this paper. From the current results wespeculate that, in addition to multicellular functional constraints, the depletion of LoF mutationsestimated in human populations might capture constraints stemming from functional properties ofspecies-specific relevance, such as higher cognition and associated traits grounded on thecomplexified human brain 25.

    Although essentiality estimates from both cellular and organismal contexts do recover functionalconstraints, we found that a subgroup of 567 genes estimated as essential at the organismal level,yet nonessential at the cellular level, is responsible for the contrasting functional patterns foundbetween OF and CV intolerant genes. These genes, which we refer to as (OrgEssential), areenriched in developmental processes, transcriptional regulation, and neuronal communication; andare preferentially expressed in the human brain. Despite being evolutionary younger than otheressential genes, sharing one-to-one orthologs mainly with Vertebrates, OrgEssential genes seem tohave developed a central role in the organism, providing an example of how during evolution novelgenes can acquire essentiality properties by acting at levels of biological organization beyond corecell functionality.

    .CC-BY-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

    The copyright holder for this preprint (which was notthis version posted October 22, 2019. ; https://doi.org/10.1101/814855doi: bioRxiv preprint

    https://paperpile.com/c/r0zWIQ/wXjDhttps://doi.org/10.1101/814855http://creativecommons.org/licenses/by-nd/4.0/

  • Methods

    Gene essentialityHuman gene essentiality estimations based on measures of tolerance to LoF mutations were takenfrom 6. These include the following scores based on the Exome Aggregation Consortium (ExAC)sample of 60,706 human exomes 14: residual variation intolerance score (RVIS) 26, EvoTol 27,missense Z-score 28, LoFtool 29, probability of haploinsufficiency (Phi) 30, probability of loss-of-functionintolerance (pLI) 14, and selection coefficient against heterozygous loss-of-function (shet) 31; andscores based on cell culture perturbation-based experiments performed in KBM7, Raji, Jiyoye,HCT116, and K562 cell lines in 12; the KBM7 cell line in 10, and RPE1, GBM514, HeLa, and DLD1 celllines in 32.

    Intrinsic Structural DisorderDisorder predictions for each protein in the human proteome were generated at residue resolutionusing IUPred 33. A gene intrinsic disorder score was calculated by averaging the predicted residuescores over the corresponding protein. Scores range from 0 to 1, with higher scores indicating ahigher propensity toward intrinsic disorder.

    HaploinsufficiencyA predictive genome-wide haploinsufficiency score (GHIS) was obtained from 34.

    Gene expression specificityReference RNA-seq data for human tissues was downloaded from the Genotype-Tissue Expressionproject (GTEx.v7) 22 (https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-5214/), and theHuman protein atlas (HPA) 23 (https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-2836/). TheGTEx dataset includes 53 tissues profiled from 961 donors. The HPA dataset includes 32 tissuesprofiled from 122 control subjects. For both datasets the median expression over replicates wasconsidered as the expression value of the tissue. Expression breath values for each gene werecalculated as the fraction of tissues in which the gene is expressed, using an arbitrary cut-off value of2 RPKM to determine expression. Expression specificity was measured using the Tau statistic on thesame tissue-median matrix as computed in 35.

    Protein classificationsProteins were classified as Transcription factors (TF), Transporters, Receptors, Enzymes, Peptidase,Kinase, Cancer-related, and RNA binding proteins (RBP) based on combined curated annotationsextracted from the Human Protein Atlas(https://www.proteinatlas.org/humanproteome/proteinclasses) 23, TF reference in 36, RBPs referencefrom 37, and transporters and receptors reported in 38.

    Evolutionary conservationComprehensive gene homology information for each human gene with respect to 187 species wasextracted from Ensembl comparative genomics resources 39. Only one-to-one orthology relationshipswere considered to build a binary gene-species matrix. A gene conservation index was calculated foreach human gene as the fraction of species having a corresponding ortholog 39. Gene duplications

    .CC-BY-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

    The copyright holder for this preprint (which was notthis version posted October 22, 2019. ; https://doi.org/10.1101/814855doi: bioRxiv preprint

    https://paperpile.com/c/r0zWIQ/1bvChttps://paperpile.com/c/r0zWIQ/1bvChttps://paperpile.com/c/r0zWIQ/VXJThttps://paperpile.com/c/r0zWIQ/Y68Zhttps://paperpile.com/c/r0zWIQ/FkLuhttps://paperpile.com/c/r0zWIQ/wLDNhttps://paperpile.com/c/r0zWIQ/JYrFhttps://paperpile.com/c/r0zWIQ/wLDNhttps://paperpile.com/c/r0zWIQ/XLaahttps://paperpile.com/c/r0zWIQ/edPPhttps://paperpile.com/c/r0zWIQ/Hm1Dhttps://paperpile.com/c/r0zWIQ/FshEhttps://paperpile.com/c/r0zWIQ/mCGThttps://paperpile.com/c/r0zWIQ/j1Ibhttps://paperpile.com/c/r0zWIQ/pvGghttps://paperpile.com/c/r0zWIQ/o6hNhttps://paperpile.com/c/r0zWIQ/tylZhttps://paperpile.com/c/r0zWIQ/wtlIhttps://paperpile.com/c/r0zWIQ/6Xmwhttps://paperpile.com/c/r0zWIQ/3cPOhttps://paperpile.com/c/r0zWIQ/lOiuhttps://paperpile.com/c/r0zWIQ/o6hNhttps://paperpile.com/c/r0zWIQ/Lkw4https://doi.org/10.1101/814855http://creativecommons.org/licenses/by-nd/4.0/

  • data was extracted from 40,41.

    Developmental annotations Developmental expression classes and developmental process gene annotations were downloadedfrom the Online Gene Essentiality database (OGEE.v2) at (http://ogee.medgenius.info/downloads/) 41.

    Mutational tolerance gene group definitionWe define consensus organismal fitness and cell viability mutational tolerant and intolerant groups byfirst independently calculating the gene average rank across n=6 measures of in vivo tolerance toLoF mutation for OF context, and n=10 in vitro measures for CV context. In all measures, as reportedin 6, values increase with the degree of intolerance to mutation: intolerant genes have high values.The group of Tolerant genes was defined as the bottom 20% genes with consensus score rank(lowest constraint), and Intolerant genes as the top 20% (highest constraint). 4 additional subgroupswere defined based on the overlaps between intolerant and intolerant groups (Figure 2c). Theresulting subgroups are: consistent tolerant genes (n=714 genes classified as tolerant in bothconditions), consistent intolerant genes (n=771 genes classified as intolerant in both conditions),organismal intolerant but cellular tolerant genes (OI-CT) (n=567 genes classified as OF intolerant andCV tolerant), and cellular intolerant but organismal tolerant genes (CI-OT) (n=351 genes classified asOF tolerant and CV intolerant).

    Analysis of gene set aggregation and centrality in the PPI networkA reference human protein-protein interaction (PPI) network was obtained from 42.

    Gene set aggregation in the PPI network was quantified using three complementary approaches:module size of the gene set subgraph, clustering enrichment and pairwise distance distribution.Subgraph module size was calculated by counting the number of nodes (Sc) and edges (Cc) of thelargest connected subgraph formed by proteins belonging to a given gene set. Clustering enrichmentwas measured using the SANTA method 43. Pairwise shortest distance between every protein pairwas measured using the igraph R package 44. The distance distribution of each gene set, wascharacterized by calculating its minimum (Ds) and mean (Dsm) distances. Network centrality wascalculated using three complementary measures: degree, betweenness, and coreness. Thesemeasures were quantified for every node using the igraph R package 44. For each gene set, networkaggregation and centrality enrichment was calculated comparing the gene set mean with a randomdistribution obtained from 10,000 randomly sampled gene sets of the same size, resulting in a zscore.

    Network structural robustness analysisNetwork robustness was characterized by measuring the effect on network structure after thetargeted removal of nodes according to the OF and CV mutational tolerance ranking, and comparingit with random expectation. Network structural response was assessed calculating the number ofnodes (Sf) and edges (Cf) in the perturbed giant connected component after removing a fraction ( f) ofnodes relative to the unperturbed measures. For each measure, we built a random expectation byremoving fractions from 0.01 to 0.99 of randomly selected nodes, repeating this procedure 10,000times.

    .CC-BY-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

    The copyright holder for this preprint (which was notthis version posted October 22, 2019. ; https://doi.org/10.1101/814855doi: bioRxiv preprint

    https://paperpile.com/c/r0zWIQ/4QUhhttps://paperpile.com/c/r0zWIQ/4QUhhttps://paperpile.com/c/r0zWIQ/vRt9https://paperpile.com/c/r0zWIQ/zebWhttps://paperpile.com/c/r0zWIQ/Lkw4https://paperpile.com/c/r0zWIQ/jB6Uhttps://paperpile.com/c/r0zWIQ/emTq+jB6Uhttps://doi.org/10.1101/814855http://creativecommons.org/licenses/by-nd/4.0/

  • Enrichment of gene set functional properties and protein class distributionGene set enrichment in gene functional features (GHIS, ID, expression specificity, expressionbreadth, earliest stage expression, and developmental process annotation) was calculated bycomparing the gene set mean with the random expectation obtained from measuring the givenfeature in 10,000 randomly sampled gene sets of the same size, resulting in a z score. Deviation inprotein class distribution among gene sets was calculated by determining the percentage of genesbelonging to each protein class and comparing it to the random expectation built from 10,000randomly sampled gene sets of the same size, resulting in a z score.

    Gene set evolutionary analysisFor each gene set, deviation of conservation index and number of gene duplications were calculatedby a z test with a random distribution obtained from 10,000 randomly sampled gene sets of the samesize. The gene-species ortholog matrix was reduced by classifying species by taxonomic groupresulting in: Archaea (21 species), Bacteria (99 species), Protozoa (16 species), Fungi (8 species),Plants (9 species), Invertebrates (24 species), and Vertebrates (10 species). Using the reducedorthologs matrix, the percentage of orthologs in each taxonomic group was calculated for each geneset.

    Tissue specific and expression enrichment We used RNA-seq data from the Genotype-Tissue Expression Project (GTEx) (51 tissues) 22 andfrom the Human Protein Atlas (HPA) (32 tissues) 23 to measure tissue specific differential expressionof every gene. Gene differential expression was calculated using voom and lmFit functions from thelimma package in R 45. With the genes-tissue differential expression matrix, we measured the degreeof preferential enrichment of each gene tolerance group in every tissue. To do this we compared themean gene differential expression of every group to random expectation, calculated from 10,000randomly sampled gene sets, and computed the differential expression deviation (z score) of eachtolerance group in every tissue.

    Gene set enrichment analysisGene ontology enrichment analysis was performed using DAVID(https://david.ncifcrf.gov/summary.jsp) 46,47.

    .CC-BY-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

    The copyright holder for this preprint (which was notthis version posted October 22, 2019. ; https://doi.org/10.1101/814855doi: bioRxiv preprint

    https://paperpile.com/c/r0zWIQ/Fw6j+2S1Bhttps://paperpile.com/c/r0zWIQ/Ntvvhttps://paperpile.com/c/r0zWIQ/wLDNhttps://paperpile.com/c/r0zWIQ/XLaahttps://doi.org/10.1101/814855http://creativecommons.org/licenses/by-nd/4.0/

  • Figures

    Figure 1. Overview. a, Mutational tolerance scores used to categorize human (in)tolerant genes. b,Consensus mutational tolerance score derivation (mean rank distribution) and corresponding (in)tolerant genesets. c-f, Features considered as potential determinants of mutational constraint and gene essentiality,including structural and functional features (c), evolutionary (d), protein functional characterization (e), andexpression enrichment (f).

    .CC-BY-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

    The copyright holder for this preprint (which was notthis version posted October 22, 2019. ; https://doi.org/10.1101/814855doi: bioRxiv preprint

    https://doi.org/10.1101/814855http://creativecommons.org/licenses/by-nd/4.0/

  • Figure 2. Mutational tolerant and intolerant groups definition. a, Correlation plot of mutational tolerancemeasures. b) Mutational tolerance measures mean rank distribution. Tolerant (blue) and intolerant (red) genesets are defined as the bottom and top 20%, respectively. c, Venn diagram of the defined gene sets. Groupintersections are highlighted to represent tolerant (blue) and intolerant (red) genes found consistently in bothOF and CV groups, and inconsistent gene (violet) found in contradictory groups depending on the context.

    RVISLofToolmissense.Zs_hetPhipLiWang.RajiWang.JiyoyeWang.K562Wang.KBM7Blomen.KBM7Hart.HeLaHart.HCT116Hart.RPE1Hart.GBMHart.DLD1

    0 1

    correlation

    Organism

    al fitness

    Cell

    viability

    a)

    b)

    100

    80

    50

    20

    00 5000 1000 15000

    % g

    enes

    mean rank

    c)

    *

    *

    1858 1963

    7140

    0

    0

    0 0

    0

    0

    1690

    567

    771

    351

    2017

    TolerantIntolerant *Consistent InconsistentOrganismal fitnessCell survival

    .CC-BY-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

    The copyright holder for this preprint (which was notthis version posted October 22, 2019. ; https://doi.org/10.1101/814855doi: bioRxiv preprint

    https://doi.org/10.1101/814855http://creativecommons.org/licenses/by-nd/4.0/

  • Figure 3. Structural and functional constraints. a, Network features measured in the PPI. b, Structuralrobustness of the PPI after random removal of nodes (gray lines) and directed removal of genes

    .CC-BY-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

    The copyright holder for this preprint (which was notthis version posted October 22, 2019. ; https://doi.org/10.1101/814855doi: bioRxiv preprint

    https://doi.org/10.1101/814855http://creativecommons.org/licenses/by-nd/4.0/

  • ranked by mutational intolerance score (inset shows the pattern around the 20% of nodes removal,corresponding to the intolerant gene sets). c, Deviation (z score) of every network feature.(significance: p < 0.05 = n.s., 0.001

  • Figure 5. Protein class distribution and gene ontology term enrichment. a, Percent of genes belonging toeach protein class distribution, dashed horizontal lines indicate the random expectation, deviation from expectation is shown on top of each bar. (significance: p < 0.05 = n.s., 0.001

  • Figure 6. Gene set enrichment in tissue-specific expression, temporal stage expression, anddevelopmental processes. a, Cumulative distribution of gene set percentage of genes firstexpressed by developmental stage. b, Number of genes per gene set associated with adevelopmental process. c, Expression enrichment deviation by tissue, tissues are ordered accordingto rank log ratio. Tissues highlighted with a star are part of the central nervous system. (significance:p < 0.05 = n.s., 0.001

  • Figure 7. Consistent and inconsistent subgroups analyses. a, Subgroups definitions based onthe overlap between OF and CV gene sets. b, Ontology term enrichment for each subgroup. c,Protein class distribution enrichment. d, Percent of orthologs in each taxon, color of the columnindicates subgroup, black column shows the expected values. e, Percent of genes that only haveorthologs among vertebrates. f, Enrichment of gene presence in developmental processes. g, Braintissues expression enrichment. (significance: p < 0.05 = n.s., 0.001

  • Figure 8. Results summary. Enrichment patterns of the main features found associated to genesessentiality. Top panel: Structural and functional features distinctive of tolerant/intolerant genesirrespective of the context. Bottom panel: Distinctive features that show a divergent pattern dependingon the context in which mutational tolerance is defined.

    Tissue Expression

    Conservation

    Structural

    Functional

    Con

    stra

    ints

    ass

    ocia

    ted

    to

    mut

    atio

    nal t

    oler

    ance

    Organismal fitness

    Cell viability

    Dispersed

    Peripheric

    Tolerant

    GHIS

    ID

    Breadth

    Specificity

    Organismal fitness

    Cell viability

    Central

    Aggregated

    Intolerant

    C.I.Gene

    duplications

    Protein Classes

    Kinase

    RBP

    Cancer

    TF

    ReceptorsKinase

    RBP

    Cancer

    TF

    Receptors

    Developmental genes

    C.I.Gene

    duplications

    Gen

    e fe

    atur

    es d

    epen

    dent

    on

    mut

    atio

    n to

    lera

    nce

    cont

    ext

    .CC-BY-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

    The copyright holder for this preprint (which was notthis version posted October 22, 2019. ; https://doi.org/10.1101/814855doi: bioRxiv preprint

    https://doi.org/10.1101/814855http://creativecommons.org/licenses/by-nd/4.0/

  • Supplementary materials forIn vivo and in vitro human gene essentiality estimations capture contrasting functionalconstraintsCaldu-Primo JL1,2, Verduzco-Martínez JA3, Alvarez-Buylla ER1,2, Davila-Velderrain J4,5*

    Figure S1. PPI structural features distribution. a, Aggregation measures: mean random distribution isshown as a violin diagram and the observed value for tolerant and intolerant gene sets as a dot coloredcorresponding to the group. b, Centrality measures: distribution of node centrality, line colors correspond to thetolerant/intolerant condition and filling color to the context. c, Boxplot showing the relationship betweencoreness and mutational tolerance.

    a) b)SC

    CC

    DS

    DSM

    c)

    coreness

    Cellular viabilitycoreness

    Organismal development

    .CC-BY-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

    The copyright holder for this preprint (which was notthis version posted October 22, 2019. ; https://doi.org/10.1101/814855doi: bioRxiv preprint

    https://doi.org/10.1101/814855http://creativecommons.org/licenses/by-nd/4.0/

  • Figure S2. Evolutionary patterns. a, Gene conservation index distribution among gene sets. b, Distribution ofthe number of genes in a family for each gene set. Black circle represents the distribution mean.

    C.I.

    1.00

    0.10

    0.01

    OF CV

    a) b)

    gene

    fam

    ily s

    ize

    100

    Exp

    ecte

    d

    Tole

    rant

    Into

    lera

    nt

    OF*** **

    CV* *

    10

    1

    Exp

    ecte

    d

    Tole

    rant

    Into

    lera

    nt

    .CC-BY-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

    The copyright holder for this preprint (which was notthis version posted October 22, 2019. ; https://doi.org/10.1101/814855doi: bioRxiv preprint

    https://doi.org/10.1101/814855http://creativecommons.org/licenses/by-nd/4.0/

  • Figure S3. HPA tissue enrichment. a, Expression enrichment deviation by tissue calculated from HPAexpression dataset. Each point represents the deviation in expression enrichment for the given geneset and condition in a particular tissue, tissues are ordered according to rank log ratio. Tissueshighlighted with a star are part of the central nervous system. b, Distribution in the transcript numberfor each tissue from the central nervous system from GTEx data. c, Same as b) for HPA data.

    List of Tables

    File S1. Tables with genes mutational group membership and analyzed features scores

    File S2. Genes one-to-one orthologs.

    −4−2024

    rank log ratio

    Cell viability

    z score

    Organismal fitnessa) b)

    c)

    .CC-BY-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

    The copyright holder for this preprint (which was notthis version posted October 22, 2019. ; https://doi.org/10.1101/814855doi: bioRxiv preprint

    https://doi.org/10.1101/814855http://creativecommons.org/licenses/by-nd/4.0/

  • References

    1. Mayr, E. THE DETERMINANTS AND EVOLUTION OF LIFE. THE EVOLUTION OF LIVING

    SYSTEMS. Proc. Natl. Acad. Sci. U. S. A. 51, 934–941 (1964).

    2. Dobzhansky, T. & Levene, H. Genetics of natural populations; proof of operation of natural

    selection in wild populations of Drosophila pseudoobscura. Genetics 33, 537–547 (1948).

    3. Waddington, C. H. CANALIZATION OF DEVELOPMENT AND THE INHERITANCE OF

    ACQUIRED CHARACTERS. Nature 150, 563–565 (1942).

    4. Gibson, G. & Dworkin, I. Uncovering cryptic genetic variation. Nat. Rev. Genet. 5, 681–690

    (2004).

    5. Zhan, T. & Boutros, M. Towards a compendium of essential genes - From model organisms to

    synthetic lethality in cancer cells. Crit. Rev. Biochem. Mol. Biol. 51, 74–85 (2016).

    6. Bartha, I., di Iulio, J., Craig Venter, J. & Telenti, A. Human gene essentiality. Nature Reviews

    Genetics 19, 51–62 (2017).

    7. Rancati, G., Moffat, J., Typas, A. & Pavelka, N. Emerging and evolving concepts in gene

    essentiality. Nat. Rev. Genet. 19, 34–49 (2018).

    8. Chen, P., Wang, D., Chen, H., Zhou, Z. & He, X. The nonessentiality of essential genes in yeast

    provides therapeutic insights into a human disease. Genome Res. 26, 1355–1362 (2016).

    9. Liu, G. et al. Gene Essentiality Is a Quantitative Property Linked to Cellular Evolvability. Cell 163,

    1388–1399 (2015).

    10. Blomen, V. A. et al. Gene essentiality and synthetic lethality in haploid human cells. Science 350,

    1092–1096 (2015).

    11. Wang, T. et al. Identification and characterization of essential genes in the human genome.

    .CC-BY-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

    The copyright holder for this preprint (which was notthis version posted October 22, 2019. ; https://doi.org/10.1101/814855doi: bioRxiv preprint

    http://paperpile.com/b/r0zWIQ/HK7hhttp://paperpile.com/b/r0zWIQ/mCGThttp://paperpile.com/b/r0zWIQ/FNhohttp://paperpile.com/b/r0zWIQ/zxsshttp://paperpile.com/b/r0zWIQ/VayShttp://paperpile.com/b/r0zWIQ/Lkw4http://paperpile.com/b/r0zWIQ/LCRthttp://paperpile.com/b/r0zWIQ/94GVhttp://paperpile.com/b/r0zWIQ/P7INhttp://paperpile.com/b/r0zWIQ/gSn5http://paperpile.com/b/r0zWIQ/HK7hhttp://paperpile.com/b/r0zWIQ/vimShttp://paperpile.com/b/r0zWIQ/mCGThttp://paperpile.com/b/r0zWIQ/mCGThttp://paperpile.com/b/r0zWIQ/FNhohttp://paperpile.com/b/r0zWIQ/FNhohttp://paperpile.com/b/r0zWIQ/zxsshttp://paperpile.com/b/r0zWIQ/VayShttp://paperpile.com/b/r0zWIQ/Lkw4http://paperpile.com/b/r0zWIQ/Lkw4http://paperpile.com/b/r0zWIQ/LCRthttp://paperpile.com/b/r0zWIQ/94GVhttp://paperpile.com/b/r0zWIQ/P7INhttp://paperpile.com/b/r0zWIQ/gSn5http://paperpile.com/b/r0zWIQ/HK7hhttp://paperpile.com/b/r0zWIQ/HK7hhttp://paperpile.com/b/r0zWIQ/vimShttp://paperpile.com/b/r0zWIQ/vimShttp://paperpile.com/b/r0zWIQ/mCGThttp://paperpile.com/b/r0zWIQ/mCGThttp://paperpile.com/b/r0zWIQ/mCGThttp://paperpile.com/b/r0zWIQ/mCGThttp://paperpile.com/b/r0zWIQ/FNhohttp://paperpile.com/b/r0zWIQ/FNhohttp://paperpile.com/b/r0zWIQ/FNhohttp://paperpile.com/b/r0zWIQ/FNhohttp://paperpile.com/b/r0zWIQ/zxsshttp://paperpile.com/b/r0zWIQ/zxsshttp://paperpile.com/b/r0zWIQ/zxsshttp://paperpile.com/b/r0zWIQ/VayShttp://paperpile.com/b/r0zWIQ/VayShttp://paperpile.com/b/r0zWIQ/VayShttp://paperpile.com/b/r0zWIQ/Lkw4http://paperpile.com/b/r0zWIQ/Lkw4http://paperpile.com/b/r0zWIQ/LCRthttp://paperpile.com/b/r0zWIQ/LCRthttp://paperpile.com/b/r0zWIQ/LCRthttp://paperpile.com/b/r0zWIQ/94GVhttp://paperpile.com/b/r0zWIQ/94GVhttp://paperpile.com/b/r0zWIQ/94GVhttp://paperpile.com/b/r0zWIQ/P7INhttp://paperpile.com/b/r0zWIQ/P7INhttp://paperpile.com/b/r0zWIQ/P7INhttp://paperpile.com/b/r0zWIQ/gSn5http://paperpile.com/b/r0zWIQ/gSn5http://paperpile.com/b/r0zWIQ/gSn5http://paperpile.com/b/r0zWIQ/HK7hhttps://doi.org/10.1101/814855http://creativecommons.org/licenses/by-nd/4.0/

  • Science 350, 1096–1101 (2015).

    12. Wang, T., Wei, J. J., Sabatini, D. M. & Lander, E. S. Genetic Screens in Human Cells Using the

    CRISPR-Cas9 System. Science 343, 80–84 (2014).

    13. Wang, T. et al. Gene Essentiality Profiling Reveals Gene Networks and Synthetic Lethal

    Interactions with Oncogenic Ras. Cell 168, 890–903.e15 (2017).

    14. Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–

    291 (2016).

    15. Kirschner, M. W., Gerhart, J. C. & Norton, J. The Plausibility of Life: Resolving Darwin’s

    Dilemma. (Yale University Press, 2006).

    16. Zotenko, E., Mestre, J., O’Leary, D. P. & Przytycka, T. M. Why Do Hubs in the Yeast Protein

    Interaction Network Tend To Be Essential: Reexamining the Connection between the Network

    Topology and Essentiality. PLoS Computational Biology 4, e1000140 (2008).

    17. Batada, N. N., Hurst, L. D. & Tyers, M. Evolutionary and physiological importance of hub

    proteins. PLoS Comput. Biol. 2, e88 (2006).

    18. Yu, H., Greenbaum, D., Lu, H. X., Zhu, X. & Gerstein, M. Genomic analysis of essentiality within

    protein networks. Trends in Genetics 20, 227–231 (2004).

    19. Fraser, H. B. Evolutionary Rate in the Protein Interaction Network. Science 296, 750–752 (2002).

    20. Fraser, H. B. Modularity and evolutionary constraint on proteins. Nat. Genet. 37, 351–352 (2005).

    21. Hahn, M. W. & Kern, A. D. Comparative genomics of centrality and essentiality in three

    eukaryotic protein-interaction networks. Mol. Biol. Evol. 22, 803–806 (2005).

    22. Carithers, L. J. et al. A Novel Approach to High-Quality Postmortem Tissue Procurement: The

    GTEx Project. Biopreserv. Biobank. 13, 311–319 (2015).

    23. Uhlen, M. et al. Towards a knowledge-based Human Protein Atlas. Nat. Biotechnol. 28, 1248–

    .CC-BY-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

    The copyright holder for this preprint (which was notthis version posted October 22, 2019. ; https://doi.org/10.1101/814855doi: bioRxiv preprint

    http://paperpile.com/b/r0zWIQ/wLDNhttp://paperpile.com/b/r0zWIQ/XLaahttp://paperpile.com/b/r0zWIQ/X06Ghttp://paperpile.com/b/r0zWIQ/XepPhttp://paperpile.com/b/r0zWIQ/Y9BBhttp://paperpile.com/b/r0zWIQ/s9rihttp://paperpile.com/b/r0zWIQ/mkddhttp://paperpile.com/b/r0zWIQ/bB82http://paperpile.com/b/r0zWIQ/o6hNhttp://paperpile.com/b/r0zWIQ/PY8jhttp://paperpile.com/b/r0zWIQ/j1Ibhttp://paperpile.com/b/r0zWIQ/vimShttp://paperpile.com/b/r0zWIQ/wLDNhttp://paperpile.com/b/r0zWIQ/wLDNhttp://paperpile.com/b/r0zWIQ/XLaahttp://paperpile.com/b/r0zWIQ/XLaahttp://paperpile.com/b/r0zWIQ/X06Ghttp://paperpile.com/b/r0zWIQ/XepPhttp://paperpile.com/b/r0zWIQ/Y9BBhttp://paperpile.com/b/r0zWIQ/s9rihttp://paperpile.com/b/r0zWIQ/mkddhttp://paperpile.com/b/r0zWIQ/bB82http://paperpile.com/b/r0zWIQ/fcJFhttp://paperpile.com/b/r0zWIQ/fcJFhttp://paperpile.com/b/r0zWIQ/o6hNhttp://paperpile.com/b/r0zWIQ/o6hNhttp://paperpile.com/b/r0zWIQ/PY8jhttp://paperpile.com/b/r0zWIQ/PY8jhttp://paperpile.com/b/r0zWIQ/j1Ibhttp://paperpile.com/b/r0zWIQ/vimShttp://paperpile.com/b/r0zWIQ/wLDNhttp://paperpile.com/b/r0zWIQ/wLDNhttp://paperpile.com/b/r0zWIQ/wLDNhttp://paperpile.com/b/r0zWIQ/XLaahttp://paperpile.com/b/r0zWIQ/XLaahttp://paperpile.com/b/r0zWIQ/XLaahttp://paperpile.com/b/r0zWIQ/XLaahttp://paperpile.com/b/r0zWIQ/X06Ghttp://paperpile.com/b/r0zWIQ/X06Ghttp://paperpile.com/b/r0zWIQ/X06Ghttp://paperpile.com/b/r0zWIQ/XepPhttp://paperpile.com/b/r0zWIQ/XepPhttp://paperpile.com/b/r0zWIQ/Y9BBhttp://paperpile.com/b/r0zWIQ/Y9BBhttp://paperpile.com/b/r0zWIQ/s9rihttp://paperpile.com/b/r0zWIQ/s9rihttp://paperpile.com/b/r0zWIQ/s9rihttp://paperpile.com/b/r0zWIQ/mkddhttp://paperpile.com/b/r0zWIQ/mkddhttp://paperpile.com/b/r0zWIQ/mkddhttp://paperpile.com/b/r0zWIQ/bB82http://paperpile.com/b/r0zWIQ/bB82http://paperpile.com/b/r0zWIQ/bB82http://paperpile.com/b/r0zWIQ/bB82http://paperpile.com/b/r0zWIQ/fcJFhttp://paperpile.com/b/r0zWIQ/fcJFhttp://paperpile.com/b/r0zWIQ/o6hNhttp://paperpile.com/b/r0zWIQ/o6hNhttp://paperpile.com/b/r0zWIQ/o6hNhttp://paperpile.com/b/r0zWIQ/o6hNhttp://paperpile.com/b/r0zWIQ/PY8jhttp://paperpile.com/b/r0zWIQ/PY8jhttp://paperpile.com/b/r0zWIQ/PY8jhttp://paperpile.com/b/r0zWIQ/PY8jhttp://paperpile.com/b/r0zWIQ/j1Ibhttp://paperpile.com/b/r0zWIQ/j1Ibhttp://paperpile.com/b/r0zWIQ/j1Ibhttp://paperpile.com/b/r0zWIQ/vimShttp://paperpile.com/b/r0zWIQ/vimShttps://doi.org/10.1101/814855http://creativecommons.org/licenses/by-nd/4.0/

  • 1250 (2010).

    24. Wright, P. E. & Dyson, H. J. Intrinsically disordered proteins in cellular signalling and regulation.

    Nat. Rev. Mol. Cell Biol. 16, 18–29 (2015).

    25. Geschwind, D. H. & Rakic, P. Cortical evolution: judge the brain by its cover. Neuron 80, 633–

    647 (2013).

    26. Petrovski, S., Wang, Q., Heinzen, E. L., Allen, A. S. & Goldstein, D. B. Genic intolerance to

    functional variation and the interpretation of personal genomes. PLoS Genet. 9, e1003709

    (2013).

    27. Rackham, O. J. L., Shihab, H. A., Johnson, M. R. & Petretto, E. EvoTol: a protein-sequence

    based evolutionary intolerance framework for disease-gene prioritization. Nucleic Acids Res. 43,

    e33 (2015).

    28. Samocha, K. E. et al. A framework for the interpretation of de novo mutation in human disease.

    Nat. Genet. 46, 944–950 (2014).

    29. Fadista, J., Oskolkov, N., Hansson, O. & Groop, L. LoFtool: a gene intolerance score based on

    loss-of-function variants in 60 706 individuals. Bioinformatics 33, 471–474 (2017).

    30. Bartha, I. et al. The Characteristics of Heterozygous Protein Truncating Variants in the Human

    Genome. PLoS Comput. Biol. 11, e1004647 (2015).

    31. Cassa, C. A. et al. Estimating the selective effects of heterozygous protein-truncating variants

    from human exome data. Nat. Genet. 49, 806–810 (2017).

    32. Hart, T. et al. High-Resolution CRISPR Screens Reveal Fitness Genes and Genotype-Specific

    Cancer Liabilities. Cell 163, 1515–1526 (2015).

    33. Dosztanyi, Z., Csizmok, V., Tompa, P. & Simon, I. IUPred: web server for the prediction of

    intrinsically unstructured regions of proteins based on estimated energy content. Bioinformatics

    .CC-BY-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

    The copyright holder for this preprint (which was notthis version posted October 22, 2019. ; https://doi.org/10.1101/814855doi: bioRxiv preprint

    http://paperpile.com/b/r0zWIQ/FshEhttp://paperpile.com/b/r0zWIQ/pvGghttp://paperpile.com/b/r0zWIQ/tylZhttp://paperpile.com/b/r0zWIQ/wtlIhttp://paperpile.com/b/r0zWIQ/6Xmwhttp://paperpile.com/b/r0zWIQ/3cPOhttp://paperpile.com/b/r0zWIQ/lOiuhttp://paperpile.com/b/r0zWIQ/wXjDhttp://paperpile.com/b/r0zWIQ/sxg8http://paperpile.com/b/r0zWIQ/Hm1Dhttp://paperpile.com/b/r0zWIQ/FshEhttp://paperpile.com/b/r0zWIQ/FshEhttp://paperpile.com/b/r0zWIQ/pvGghttp://paperpile.com/b/r0zWIQ/pvGghttp://paperpile.com/b/r0zWIQ/tylZhttp://paperpile.com/b/r0zWIQ/tylZhttp://paperpile.com/b/r0zWIQ/wtlIhttp://paperpile.com/b/r0zWIQ/6Xmwhttp://paperpile.com/b/r0zWIQ/6Xmwhttp://paperpile.com/b/r0zWIQ/3cPOhttp://paperpile.com/b/r0zWIQ/lOiuhttp://paperpile.com/b/r0zWIQ/wXjDhttp://paperpile.com/b/r0zWIQ/sxg8http://paperpile.com/b/r0zWIQ/Hm1Dhttp://paperpile.com/b/r0zWIQ/Hm1Dhttp://paperpile.com/b/r0zWIQ/FshEhttp://paperpile.com/b/r0zWIQ/FshEhttp://paperpile.com/b/r0zWIQ/FshEhttp://paperpile.com/b/r0zWIQ/FshEhttp://paperpile.com/b/r0zWIQ/pvGghttp://paperpile.com/b/r0zWIQ/pvGghttp://paperpile.com/b/r0zWIQ/pvGghttp://paperpile.com/b/r0zWIQ/pvGghttp://paperpile.com/b/r0zWIQ/tylZhttp://paperpile.com/b/r0zWIQ/tylZhttp://paperpile.com/b/r0zWIQ/tylZhttp://paperpile.com/b/r0zWIQ/tylZhttp://paperpile.com/b/r0zWIQ/wtlIhttp://paperpile.com/b/r0zWIQ/wtlIhttp://paperpile.com/b/r0zWIQ/wtlIhttp://paperpile.com/b/r0zWIQ/6Xmwhttp://paperpile.com/b/r0zWIQ/6Xmwhttp://paperpile.com/b/r0zWIQ/6Xmwhttp://paperpile.com/b/r0zWIQ/6Xmwhttp://paperpile.com/b/r0zWIQ/3cPOhttp://paperpile.com/b/r0zWIQ/3cPOhttp://paperpile.com/b/r0zWIQ/3cPOhttp://paperpile.com/b/r0zWIQ/3cPOhttp://paperpile.com/b/r0zWIQ/lOiuhttp://paperpile.com/b/r0zWIQ/lOiuhttp://paperpile.com/b/r0zWIQ/lOiuhttp://paperpile.com/b/r0zWIQ/lOiuhttp://paperpile.com/b/r0zWIQ/wXjDhttp://paperpile.com/b/r0zWIQ/wXjDhttp://paperpile.com/b/r0zWIQ/wXjDhttp://paperpile.com/b/r0zWIQ/sxg8http://paperpile.com/b/r0zWIQ/sxg8http://paperpile.com/b/r0zWIQ/sxg8http://paperpile.com/b/r0zWIQ/wLDNhttps://doi.org/10.1101/814855http://creativecommons.org/licenses/by-nd/4.0/

  • 21, 3433–3434 (2005).

    34. Steinberg, J., Honti, F., Meader, S. & Webber, C. Haploinsufficiency predictions without study

    bias. Nucleic Acids Research 43, e101–e101 (2015).

    35. Kryuchkova-Mostacci, N. & Robinson-Rechavi, M. A benchmark of gene expression tissue-

    specificity metrics. Brief. Bioinform. 18, 205–214 (2017).

    36. Lambert, S. A. et al. The Human Transcription Factors. Cell 175, 598–599 (2018).

    37. Gerstberger, S., Hafner, M. & Tuschl, T. A census of human RNA-binding proteins. Nat. Rev.

    Genet. 15, 829–845 (2014).

    38. Southan, C. et al. The IUPHAR/BPS Guide to PHARMACOLOGY in 2016: towards curated

    quantitative interactions between 1300 protein targets and 6000 ligands. Nucleic Acids Res. 44,

    D1054–68 (2016).

    39. Herrero, J. et al. Ensembl comparative genomics resources. Database 2016, (2016).

    40. Chen, W.-H., Minguez, P., Lercher, M. J. & Bork, P. OGEE: an online gene essentiality database.

    Nucleic Acids Res. 40, D901–6 (2012).

    41. Chen, W.-H., Lu, G., Chen, X., Zhao, X.-M. & Bork, P. OGEE v2: an update of the online gene

    essentiality database with special focus on differentially essential genes in human cancer cell

    lines. Nucleic Acids Res. 45, D940–D944 (2017).

    42. Li, T. et al. A scored human protein–protein interaction network to catalyze genomic

    interpretation. Nature Methods 14, 61–64 (2017).

    43. Cornish, A. J. & Markowetz, F. SANTA: quantifying the functional content of molecular networks.

    PLoS Comput. Biol. 10, e1003808 (2014).

    44. Gabor, C. & Nepusz, T. The Igraph Software Package for Complex Network Research.

    InterJournal Complex Systems, 1695 (2006).

    .CC-BY-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

    The copyright holder for this preprint (which was notthis version posted October 22, 2019. ; https://doi.org/10.1101/814855doi: bioRxiv preprint

    http://paperpile.com/b/r0zWIQ/4QUhhttp://paperpile.com/b/r0zWIQ/vRt9http://paperpile.com/b/r0zWIQ/zebWhttp://paperpile.com/b/r0zWIQ/jB6Uhttp://paperpile.com/b/r0zWIQ/emTqhttp://paperpile.com/b/r0zWIQ/1bvChttp://paperpile.com/b/r0zWIQ/VXJThttp://paperpile.com/b/r0zWIQ/Y68Zhttp://paperpile.com/b/r0zWIQ/FkLuhttp://paperpile.com/b/r0zWIQ/JYrFhttp://paperpile.com/b/r0zWIQ/edPPhttp://paperpile.com/b/r0zWIQ/Hm1Dhttp://paperpile.com/b/r0zWIQ/4QUhhttp://paperpile.com/b/r0zWIQ/vRt9http://paperpile.com/b/r0zWIQ/zebWhttp://paperpile.com/b/r0zWIQ/zebWhttp://paperpile.com/b/r0zWIQ/jB6Uhttp://paperpile.com/b/r0zWIQ/emTqhttp://paperpile.com/b/r0zWIQ/1bvChttp://paperpile.com/b/r0zWIQ/1bvChttp://paperpile.com/b/r0zWIQ/VXJThttp://paperpile.com/b/r0zWIQ/VXJThttp://paperpile.com/b/r0zWIQ/Y68Zhttp://paperpile.com/b/r0zWIQ/Y68Zhttp://paperpile.com/b/r0zWIQ/FkLuhttp://paperpile.com/b/r0zWIQ/FkLuhttp://paperpile.com/b/r0zWIQ/JYrFhttp://paperpile.com/b/r0zWIQ/edPPhttp://paperpile.com/b/r0zWIQ/4QUhhttp://paperpile.com/b/r0zWIQ/4QUhhttp://paperpile.com/b/r0zWIQ/4QUhhttp://paperpile.com/b/r0zWIQ/vRt9http://paperpile.com/b/r0zWIQ/vRt9http://paperpile.com/b/r0zWIQ/vRt9http://paperpile.com/b/r0zWIQ/zebWhttp://paperpile.com/b/r0zWIQ/zebWhttp://paperpile.com/b/r0zWIQ/zebWhttp://paperpile.com/b/r0zWIQ/zebWhttp://paperpile.com/b/r0zWIQ/jB6Uhttp://paperpile.com/b/r0zWIQ/jB6Uhttp://paperpile.com/b/r0zWIQ/jB6Uhttp://paperpile.com/b/r0zWIQ/jB6Uhttp://paperpile.com/b/r0zWIQ/emTqhttp://paperpile.com/b/r0zWIQ/emTqhttp://paperpile.com/b/r0zWIQ/emTqhttp://paperpile.com/b/r0zWIQ/1bvChttp://paperpile.com/b/r0zWIQ/1bvChttp://paperpile.com/b/r0zWIQ/1bvChttp://paperpile.com/b/r0zWIQ/VXJThttp://paperpile.com/b/r0zWIQ/VXJThttp://paperpile.com/b/r0zWIQ/VXJThttp://paperpile.com/b/r0zWIQ/VXJThttp://paperpile.com/b/r0zWIQ/VXJThttp://paperpile.com/b/r0zWIQ/Y68Zhttp://paperpile.com/b/r0zWIQ/Y68Zhttp://paperpile.com/b/r0zWIQ/FkLuhttp://paperpile.com/b/r0zWIQ/FkLuhttp://paperpile.com/b/r0zWIQ/FkLuhttp://paperpile.com/b/r0zWIQ/JYrFhttp://paperpile.com/b/r0zWIQ/JYrFhttp://paperpile.com/b/r0zWIQ/JYrFhttp://paperpile.com/b/r0zWIQ/edPPhttp://paperpile.com/b/r0zWIQ/edPPhttp://paperpile.com/b/r0zWIQ/edPPhttp://paperpile.com/b/r0zWIQ/Hm1Dhttps://doi.org/10.1101/814855http://creativecommons.org/licenses/by-nd/4.0/

  • 45. Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and

    microarray studies. Nucleic Acids Res. 43, e47 (2015).

    46. Huang, D. W., Sherman, B. T. & Lempicki, R. A. Systematic and integrative analysis of large

    gene lists using DAVID bioinformatics resources. Nature Protocols 4, 44–57 (2009).

    47. Huang, D. W. et al. Extracting biological meaning from large gene lists with DAVID. Curr. Protoc.

    Bioinformatics Chapter 13, Unit 13.11 (2009).

    .CC-BY-ND 4.0 International licenseacertified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under

    The copyright holder for this preprint (which was notthis version posted October 22, 2019. ; https://doi.org/10.1101/814855doi: bioRxiv preprint

    http://paperpile.com/b/r0zWIQ/2S1Bhttp://paperpile.com/b/r0zWIQ/Fw6jhttp://paperpile.com/b/r0zWIQ/Ntvvhttp://paperpile.com/b/r0zWIQ/2S1Bhttp://paperpile.com/b/r0zWIQ/2S1Bhttp://paperpile.com/b/r0zWIQ/2S1Bhttp://paperpile.com/b/r0zWIQ/Fw6jhttp://paperpile.com/b/r0zWIQ/Ntvvhttp://paperpile.com/b/r0zWIQ/Ntvvhttp://paperpile.com/b/r0zWIQ/2S1Bhttp://paperpile.com/b/r0zWIQ/2S1Bhttp://paperpile.com/b/r0zWIQ/2S1Bhttp://paperpile.com/b/r0zWIQ/Fw6jhttp://paperpile.com/b/r0zWIQ/Fw6jhttp://paperpile.com/b/r0zWIQ/Fw6jhttp://paperpile.com/b/r0zWIQ/Ntvvhttp://paperpile.com/b/r0zWIQ/Ntvvhttp://paperpile.com/b/r0zWIQ/Ntvvhttp://paperpile.com/b/r0zWIQ/Ntvvhttps://doi.org/10.1101/814855http://creativecommons.org/licenses/by-nd/4.0/