See discussions stats and author profiles for this publication at httpswwwresearchgatenetpublication322853954
Inferring and analyzing module-specific lncRNA-mRNA causal regulatory
networks in human cancer
Article in Briefings in Bioinformatics middot February 2018
DOI 101093bibbby008
CITATIONS
5READS
359
4 authors
Some of the authors of this publication are also working on these related projects
Economic Complexity View project
Measuring the Economic Complexity of Australias States and Territories View project
Junpeng Zhang
Dali University
57 PUBLICATIONS 266 CITATIONS
SEE PROFILE
Thuc D le
University of South Australia
110 PUBLICATIONS 629 CITATIONS
SEE PROFILE
Lin Liu
University of South Australia
180 PUBLICATIONS 1267 CITATIONS
SEE PROFILE
Jiuyong Li
University of South Australia
295 PUBLICATIONS 3406 CITATIONS
SEE PROFILE
All content following this page was uploaded by Junpeng Zhang on 01 February 2018
The user has requested enhancement of the downloaded file
Inferring and analyzing module-specific lncRNAndash
mRNA causal regulatory networks in human cancerJunpeng Zhang Thuc Duy Le Lin Liu and Jiuyong LiCorresponding authors Junpeng Zhang School of Engineering Dali University Dali Yunnan 671003 Public Republic of China Tel thorn86 872 2219799 Faxthorn86 872 2219799 E-mail zhangjunpeng_411yahoocom Jiuyong Li School of Information Technology and Mathematical Sciences University of SouthAustralia Mawson Lakes 5095 SA Australia Tel thorn61 8 830 23898 Fax thorn61 8 830 23381 E-mail jiuyongliunisaeduau
Abstract
It is known that noncoding RNAs (ncRNAs) cover 98 of the transcriptome but do not encode proteins Among ncRNAslong noncoding RNAs (lncRNAs) are a large and diverse class of RNA molecules and are thought to be a gold mine of poten-tial oncogenes anti-oncogenes and new biomarkers Although only a minority of lncRNAs is functionally characterized it isclear that they are important regulators to modulate gene expression and involve in many biological functions To revealthe functions and regulatory mechanisms of lncRNAs it is vital to understand how lncRNAs regulate their target genes forimplementing specific biological functions In this article we review the computational methods for inferring lncRNAndashmRNA interactions and the third-party databases of storing lncRNAndashmRNA regulatory relationships We have found thatthe existing methods are based on statistical correlations between the gene expression levels of lncRNAs and mRNAs andmay not reveal gene regulatory relationships which are causal relationships Moreover these methods do not consider themodularity of lncRNAndashmRNA regulatory networks and thus the networks identified are not module-specific To addressthe above two issues we propose a novel method MSLCRN to infer and analyze module-specific lncRNAndashmRNA causal reg-ulatory networks We have applied it into glioblastoma multiforme lung squamous cell carcinoma ovarian cancer andprostate cancer respectively The experimental results show that MSLCRN as an expression-based method could be a use-ful complementary method to study lncRNA regulations
Key words lncRNA mRNA lncRNAndashmRNA co-expression lncRNAndashmRNA interaction lncRNAndashmRNA causal relationshiphuman cancer
Junpeng Zhang is an associate professor at the School of Engineering Dali University He received his BSc (2009) in Bio-medical Engineering and MSc(2012) in Control Theory and Control Engineering from Kunming University of Science and Technology Kunming City China His research interestsinclude bioinformatics and data miningThuc Duy Le is a research fellow at the University of South Australia (UniSA) He received his BSc (2002) and MSc (2006) in pure Mathematics from theUniversity of Pedagogy Ho Chi Minh City Vietnam and BSc (2010) in Computer Science from UniSA He received his PhD degree in Computer Science(Bioinformatics) in 2014 at UniSA His research interests are bioinformatics data mining and machine learningLin Liu is a senior lecturer at the School of Information Technology and Mathematical Sciences University of South Australia (UniSA) She received herbachelor and master degrees in Electronic Engineering from Xidian University China in 1991 and 1994 respectively and her PhD degree in computer sys-tems engineering from UniSA in 2006 Her research interests include data mining and bioinformatics as well as Petri nets and their applications to proto-col verification and network security analysisJiuyong Li is a professor at the School of Information Technology and Mathematical Sciences University of South Australia He received his PhD degree incomputer science from the Griffith University Australia (2002) His research interests are in the fields of data mining privacy preserving and bioinfor-matics His research has been supported by five prestigious Australian Research Council Discovery grants since 2005 and he has published more than 100research papersSubmitted 13 December 2017 Received (in revised form) 8 January 2018
VC The Author(s) 2018 Published by Oxford University Press All rights reservedFor Permissions please email journalspermissionsoupcom
1
Briefings in Bioinformatics 2018 1ndash17
doi 101093bibbby008Paper
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
Introduction
Long noncoding RNAs (lncRNAs) are non-protein coding tran-scripts with gt200 nucleotides in length Unlike small noncodingRNAs (sncRNAs) lncRNAs generally exhibit low sequence con-servation However owing to rapidly adaptive selection pres-sures the low conservation of lncRNAs (such as Air and Xist)does not indicate absence of function [1] Similar to microRNAs(miRNAs) an important class of sncRNAs evidence has shownthat lncRNAs play important roles in a wide range of biologicalprocesses even in cancers [2 3] Despite the importance oflncRNAs in many physiological and pathological processes alarge number of lncRNAs remain to be functionally character-ized For this reason the number of studies on lncRNA researchhas been increased exponentially in the past decade (as shownin Figure 1)
To achieve various biological functions lncRNAs form generegulatory networks by interacting with other biological mole-cules such as transcription factors miRNAs messenger RNAs(mRNAs) and RNA-binding proteins [4] Among these biologicalmolecules interacting with lncRNAs mRNAs are the most popu-lar ones By regulating the transcription and translation ofmRNAs lncRNAs could get involved in several vital biologicalprocesses such as cell differentiation cell proliferation andcytoprotective programs [5] Therefore the identification oflncRNAndashmRNA regulatory networks would help to uncoverfunctions and regulatory mechanisms of lncRNAs
A straightforward method for identifying lncRNAndashmRNAregulatory networks is sequence-based complementary basepairing To predict lncRNA targets several sequence-basedmethods such as GUUGle [6] RNAup [7] RNAplex [8] IntaRNA[9] RactIP [10] LncTar [11] and RIblast [12] have been devel-oped Owing to the long sequence and complex tertiary struc-ture of each lncRNA the computational costs of predictinglarge-scale lncRNAndashmRNA regulatory relationships are usuallyhigh Moreover these sequence-based methods only considerthe sequence information of lncRNAs and target mRNAs andthus the predicted lncRNAndashmRNA regulatory networks arestatic However previous studies [13ndash15] have shown thatlncRNAs exhibit condition-specific expression fashion anddynamic networks of gene regulation To identify dynamic orcondition-specific lncRNAndashmRNA regulatory networks it is nec-essary to use expression data Some expression-based methods[16ndash19] for predicting co-expressed lncRNAndashmRNA networks
have been proposed However as the predictions are based onstatistical associations found in gene expression levels onlythey may not represent the real lsquocausalrsquo lncRNAndashmRNA regula-tory relationships Furthermore the existing expression-basedmethods do not consider the modularity of lncRNAndashmRNA reg-ulatory networks an important feature of gene regulatory net-works [20]
In this article we first review the computational methodsfor inferring lncRNAndashmRNA interactions and the public data-bases for storing lncRNAndashmRNA regulatory relationshipsSecond we propose a novel method to infer Module-SpecificLncRNAndashmRNA Causal Regulatory Network (thus the proposedmethod is called MSLCRN) In the first step by consideringmodularity of networks MSLCRN uses Weighted Gene Co-expression Network Analysis (WGCNA) [21] to identify lncRNAndashmRNA co-expression modules In each module the lncRNAsand mRNAs are regarded as module-specific genes In the sec-ond step MSLCRN uses a causal inference method named inter-vention calculus when the directed acyclic graph (DAG) isabsent (IDA) [22 23] to estimate the causal effects of possiblelncRNAndashmRNA causal pairs in each module To speed up theestimation the parallelized version of IDA [24] is used to calcu-late the causal effects For each module the noncausal lncRNAndashmRNA pairs are eliminated and the retained lncRNAndashmRNAcausal pairs are further assembled to generate a module-specific lncRNAndashmRNA causal network To obtain a globallncRNAndashmRNA causal regulatory network we further integratethe identified module-specific lncRNAndashmRNA causal networksin the third step
To evaluate MSLCRN we have applied it into four humancancer data sets including glioblastoma multiforme (GBM) lungsquamous cell carcinoma (LSCC) ovarian cancer (OvCa) andprostate cancer (PrCa) from [25] The validation survival andenrichment analysis results show that the proposed methodcan help with revealing the functions and regulatory mecha-nisms of lncRNAs MSLCRN is released under the GPL-30License and is freely available through GitHub (httpsgithubcomzhangjunpeng411MSLCRN)
Computational methods for inferringlncRNAndashmRNA interactions
In this section we review the computational approaches forinferring lncRNAndashmRNA interactions In Table 1 we divide themethods into two categories (1) sequence-based method and(2) expression-based method We will separately review thesemethods as follows
Sequence-based method
The common characteristic of the sequence-based methods isthat the identification of RNAndashRNA interactions depends onRNA binding energy between two RNA molecules To evaluatethe strength of RNA binding energy a number of energy models[6ndash12 26ndash32] are proposed to predict RNAndashRNA interactions
Gerlach and Giegerich [6] propose a utility program GUUGlefor locating potential helical regions under RNA complementarybase pairs rules The method can be effectively used as a filterfor noncoding RNA (ncRNA) target prediction However the reli-able prediction of RNAndashRNA binding energies is also importantfor the identification of RNAndashRNA interactions To study thethermodynamics of RNAndashRNA interactions Muckstein et al [7]present an extension of the standard partition function methodcalled RNAup to RNA secondary structures By comparing
129 141 164 240 338
639
991
1419
2068
2596
0
500
1000
1500
2000
2500
3000
Year
Num
ber
of p
ublic
atio
ns
Figure 1 The number of lncRNA-related publications in the past decade The
number of queried publications is obtained from PubMed library with keyword
lsquolncRNArsquo
2 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
Table 1 Summary of computational methods or tools for inferring lncRNAndashmRNA interactions
Methodstools Categories of methods Brief descriptions Available
GUUGle [6] Sequence-based Target prediction by locating potential helical regions of RNAndashRNA pairs under RNA base pairing rules which include G-Ubases
httpbibiserv2cebitecuni-bielefelddeguugle
RNAup [7] Sequence-based Target prediction by studying thermodynamics of RNAndashRNApairs based on the sum of the energy of binding andhybridization
httprnatbiunivieacatcgi-binRNAWebSuiteRNAupcgi
RNAcofold [26] Sequence-based Target prediction by computing the hybridization energy andbase pairing pattern of RNAndashRNA pairs
httprnatbiunivieacatcgi-binRNAWebSuiteRNAcofoldcgi
Alkan et al [27] Sequence-based Target prediction by minimizing the joint free energy of RNAndashRNA pairs under a number of energy models including basepair energy model stacked pair energy model loop energymodel
On request
RNAplex [8] Sequence-based Target prediction by finding possible hybridization sites ofRNAndashRNA pairs
httpwwwtbiunivieacathtafer
IntaRNA [9] Sequence-based Target prediction by incorporating accessibility of target sitesas well as the existence of a user-definable seed
httprnainformatikuni-freiburgdeIntaRNAInputjsp
RactIP [10] Sequence-based Target prediction by integrating approximate information onan ensemble of equilibrium joint structures into the objec-tive function of integer programming
httprtipsdnabiokeioacjpractip
PETcofold [28] Sequence-based Target prediction by taking covariance information in intra-molecular and intermolecular base pairs into account
httprthdkresourcespetcofold
RIsearch [29] Sequence-based Target prediction by implementing a simplified Turner energymodel for fast computation of hybridization
httpsrthdkresourcesrisearchrisearch1php
RIsearch2 [30] Sequence-based An updated version of RIsearch and predict targets using a sin-gle integrated seed-and-extend framework based on suffixarrays
httpsrthdkresourcesrisearch
LncTar [11] Sequence-based lncRNA target prediction by finding the minimum free energyjoint structure of RNAndashRNA pairs based on base pairing
httpwwwcuilabcnlnctar
lncRNATargets [31] Sequence-based lncRNA target prediction based on nucleic acidthermodynamics
httpwwwherbbolorg 8001lrt
Terai et al [32] Sequence-based lncRNA target prediction by developing an integrated pipelineon the K computer which is one of the fastest super-com-puters in the world
httprtoolscbrcjpcgi-binRNARNAindexpl
RIblast [12] Sequence-based Target prediction based on the seed-and-extension approach httpgithubcomfukunagatsuRIblast
Liao et al [16] Expression-based Identify lncRNAndashmRNA interactions by using Pearson methodand the identified lncRNAndashmRNA interactions should be co-expressed in the same direction in no less than 3 Mousemicroarray data sets
On request
Guo et al [17] Expression-based Identify lncRNAndashmRNA interactions by using Pearson methodin OvCa malignant progression
On request
Du et al [18] Expression-based Identify lncRNAndashmRNA interactions by using Pearson methodand a power function in thyroid cancer
On request
Liu et al [33] Expression-based Identify lncRNAndashmRNA interactions by using Pearson methodin human colorectal carcinoma
On request
Huang et al [34] Expression-based Identify lncRNAndashmRNA interactions associated with pneumo-nia by using Pearson method
On request
Li et al [35] Expression-based Identify dynamic lncRNAndashmRNA interactions associated withvenous congestion by using Pearson method
On request
Wu et al [19] Expression-based Identify lncRNAndashmRNA interactions by using a generalized lin-ear model to regress mRNA expression on lncRNA expres-sion in breast cancer
On request
Fu et al [36] Expression-based Identify lncRNAndashmRNA interactions by considering mRNA lociwithin lncRNA and the Pearson correlation in cartilage
On request
Zhang et al [37] Expression-based Identify lncRNAndashmRNA interactions by considering mRNA lociwithin lncRNA and the Pearson correlation in cartilage inperipheral blood mononuclear cells
On request
Iwakiri et al [38] Expression-based Identify tissue-specific lncRNAndashmRNA interactions by integrat-ing the tissue specificity of lncRNAs and mRNAs intosequence-based prediction of human lncRNAndashRNAinteractions
On request
Lv et al [39] Expression-based Identify tissue-specific lncRNAndashmRNA interactions by usingPearson and sequence-based methods and in human intra-hepatic cholangiocarcinoma
On request
Module-specific lncRNA-mRNA causal regulatory networks | 3
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
predicted free energies of binding with RNA interference experi-mental data RNAup can produce biologically reasonableresults For genome-wide predictions of ncRNA targets RNAupis not fast enough Therefore it is usually to be combined withother faster RNAndashRNA prediction methods
To extend the standard dynamic programming algorithmsfor computing RNA secondary structures Bernhart et al [26]propose a program named RNAcofold to compute the hybridiza-tion energy and base pairing pattern of the co-folding of twoRNA molecules However the method disregards some impor-tant interaction structures and is restricted to dimeric com-plexes Moreover for the RNAndashRNA interaction predictionpredicting the joint secondary structure of two interacting RNAsis also important To solve it Alkan et al [27] develop severalalgorithms to minimize the joint free energy between the twoRNAs under a number of energy models Assuming that con-served RNAndashRNA interactions imply conserved functionSeemann et al [28] also implement a comparative method calledPETcofold to predict the joint secondary structure of two inter-acting RNAs As PETcofold considers sequence conservation anincreasing amount of structural covariance can further improveits performance
RNAup [7] and RNAcofold [26] are too slow for genome-widesearch in finding target sites of ncRNAs To accelerate the speedof RNAndashRNA interaction predictions RNAplex [8] is presented toquickly find possible hybridization sites between two interact-ing RNAs To focus on the target search on short highly stableinteractions RNAplex introduces a per nucleotide penaltyMeanwhile another general and fast approach IntaRNA [9] isproposed to efficiently predict bacterial RNAndashRNA interactionsCompared with other existing target prediction methodsIntaRNA considers both the accessibility of target sites and theexistence of a user-defined seed Therefore it shows a higheraccuracy than competing methods Kato et al [10] also present afast and accurate prediction method RactIP for comprehensivetype of RNAndashRNA interactions In terms of predicting joint sec-ondary structures of two interacting RNAs RactIP run incompa-rably faster than competitive programs
To further achieve a speed improvement of predictingRNAndashRNA interactions Wenzel et al [29] present RIsearch forfast computation of hybridization between two interactingRNAs They show that the energy model of RIsearch is anaccurate approximation of the full energy model for near-complementary RNAndashRNA duplexes Furthermore RIsearch isfaster than RNAplex [8] in RNAndashRNA interaction searchRecently RIsearch2 [30] an updated version of RIsearch [29] isproposed to localize potential near-complementary RNAndashRNAinteractions between two RNA sequences The comparisonresults show that RIsearch2 is much faster than the previousmethods such as GUUGle [6] RNAplex [8] IntaRNA [9] andRIsearch [29]
Although the above RNAndashRNA interaction prediction meth-ods can be extended to predict lncRNAndashmRNA interactionsnone of them are exclusively used for identifying the RNA tar-gets of lncRNAs in a large scale To efficiently identify lncRNAndashmRNA interactions Li et al [11] propose a tool named LncTarLncTar explores lncRNAndashmRNA interactions by finding the min-imum free energy joint structure of two interacting RNAs basedon base pairing As LncTar runs fast and does not have a limitto RNA size it can be used for large-scale identification of theRNA targets for all RNAs Another web-based platformlncRNATargets [31] is also provided for lncRNA target predic-tion Because there is no limit to RNA size lncRNATargets canalso be used to identify the RNA targets of all RNAs In a whole
human transcriptome Terai et al [32] develop an integratedpipeline to predict lncRNAndashmRNA interactions for the first timeIn the pipeline IntaRNA [9] is used to calculate interactionenergy and RactIP [10] is used to predict joint secondary struc-ture Recently to further shorten the running time of predictinglncRNAndashmRNA interactions an ultrafast RNAndashRNA interactionprediction method RIblast [12] based on the seed-and-extensionmethod is presented The comparison results show that RIblastruns faster than RNAplex [8] IntaRNA [9] Terai et al pipeline[32] and thus can be applied to a large scale of lncRNA targetidentification
Expression-based method
At the gene expression level the co-expressed lncRNAndashmRNApairs are regarded as lncRNAndashmRNA interactions for theexpression-based methods Among the existing expression-based methods [16ndash19 33ndash39] Pearson correlation method is akey step of most methods to identify co-expressed lncRNAndashmRNA pairs
Liao et al [16] construct a lncRNAndashmRNA co-expression net-work from re-annotated mouse microarray data sets By usingPearson method they only keep the lncRNAndashmRNA pairs withPlt 001 and Pearson correlation ranked in the top or bottom005 percentile The study is the first large-scale prediction oflncRNA functions from a lncRNAndashmRNA co-expression networkTo identify immune-associated lncRNA biomarkers in OvCa Guoet al [17] make a comprehensive analysis of lncRNAndashmRNA co-expression patterns To identify lncRNAndashmRNA co-expressionpairs they calculate Pearson correlation between differentiallyexpressed lncRNAs and mRNAs They only reserve the lncRNAndashmRNA co-expression pairs with Pearson correlationgt 05 and thecorresponding False Discovery Rate (FDR)lt 001 Liu et al [33] andHuang et al [34] also use Pearson method to study lncRNAndashmRNAco-expression networks in human colorectal carcinoma andpneumonia respectively The inferred lncRNAndashmRNA co-expression networks will help to study lncRNA functionsRecently Du et al [18] propose a two-step method to conduct acomprehensive analysis of lncRNAndashmRNA co-expression patternsin thyroid cancer First they use Pearson method to calculatePearson correlation and the cutoff of Pearson correlation is 05and the corresponding FDR cutoff is 001 Second the Pearson cor-relations are transformed into an adjacency matrix
Owing to dynamic characteristic of gene regulatory net-works Wu et al [19] identify two distinct lncRNAndashmRNA co-expression networks in tumor and normal breast tissue Theyuse a generalized linear model to regress mRNA expression onlncRNA expression in tumor and normal breast tissue and onlyfocus on dynamic breast lncRNAndashmRNA co-expression pairsthat differ in tumor and normal breast tissue Meanwhile tostudy the potential role of lncRNAs in venous congestion Liet al [35] also construct a dynamic lncRNAndashmRNA co-expression network By using Pearson method they separatelycalculate Pearson correlations of each lncRNAndashmRNA pair invenous congestion and normal samples The lncRNAndashmRNApairs with Pearson correlationgt099 orlt099 and P-valuelt001are selected as lncRNAndashmRNA co-expression pairs They con-struct two types of lncRNAndashmRNA co-expression networkslsquolostrsquo network where lncRNAndashmRNA co-expression pairs onlyexisted in normal samples and lsquoobtainedrsquo network wherelncRNAndashmRNA co-expression pairs only existed in venous con-gestion samples The lsquolostrsquo and lsquoobtainedrsquo networks are furtherintegrated to obtain a dynamic lncRNAndashmRNA co-expressionnetwork
4 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
The above methods simply use matched lncRNA and mRNAexpression data to identify lncRNAndashmRNA co-expression pairsTo identify lsquocis-regulated target genesrsquo of lncRNAs some methodsalso consider mRNA loci information within lncRNA For exampleFu et al [36] combine mRNA loci information and matchedlncRNA and mRNA expression data to predict lncRNA targetsThey identify the mRNAs as targets under two conditions (i) themRNA loci are within a 300-kb window up- or downstream oflncRNA and (ii) lncRNAndashmRNA co-expression pairs are signifi-cantly positive correlated (Pearson correlationgt 08 and the corre-sponding P-valuelt 005) Zhang et al [37] also use a similarmethod to Fu et al [36] for identifying lncRNA targets The mRNAscan be regarded as targets when (1) the mRNA loci are within a10 window up- or downstream of lncRNA and (2) lncRNAndashmRNAco-expression pairs are significantly positive correlated (Pearsoncorrelationgt 098 and the corresponding P-valuelt 005)
Apart from mRNA loci information within lncRNA someemerging methods consider predictions from sequence-basedmethods as putative lncRNAndashmRNA interactions For exampleIwakiri et al [38] integrate tissue-specific lncRNA and mRNAexpression data into predictions from a sequence-basedmethod in [32] They discover that integrating tissue specificitycan improve prediction accuracy of lncRNAndashmRNA interactionsLv et al [39] also combine matched lncRNA and mRNA expres-sion data with predictions from a sequence-based methodLncTar [11] They first use Pearson method to identify co-expressed lncRNAndashmRNA co-expression pairs with Pearsoncorrelationgt095 orlt095 Then LncTar is used to further filterthe identified lncRNAndashmRNA co-expression pairs
Public databases for storing lncRNAndashmRNAregulatory relationships
In this section we review the public databases of storinglncRNAndashmRNA regulatory relationships Table 2 shows a sum-mary of the third-party public databases including experimen-tally validated and computationally predicted databases
NPInter [40] contains experimentally validated interactionsbetween ncRNAs especially lncRNAs and miRNAs The databasecontains 915 067 interactions in 188 tissues or cell lines from 68kinds of experimental technologies There is a classification ofthe functional interactions based on the functional process thatncRNA is involved in Moreover NPInter allows users to searchinteractions related publications and other information
LncRNADisease [41] not only collects experimentally sup-ported lncRNAndashdisease associations and lncRNA interactionsbut also predicts novel lncRNAndashdisease associations Recentlythe database curates 478 entries of experimentally validatedlncRNA interactions LncRNADisease provides users severalways to search lncRNA-related diseases and interactions
To study differentially expressed genes after lncRNA knock-down or overexpression Jiang et al [42] develop a databasecalled LncRNA2Target in human and mouse organisms Thedatabase has a collection of 396 experimentally validatedlncRNAndashtarget interactions In LncRNA2Target if a gene is dif-ferentially expressed after lncRNA knockdown or overexpres-sion it is regarded as a target of a lncRNA For convenienceLncRNA2Target allows users to search for the targets of singlelncRNA or for the lncRNAs that target a specific geneMeanwhile Zhou et al [43] also build a reference resourceLncReg for lncRNA-related regulatory networks The databasehas 1081 experimentally validated lncRNA-related regulatory
records between 258 nonredundant lncRNAs and 571 nonredun-dant genes
IRNdb [44] is a database that focuses on collecting immuno-logically relevant lncRNAndashtarget miRNAndashtarget and PIWI-interacting RNAndashtarget interactions The current version ofIRNdb documents 22 453 immunologically relevant lncRNAndashtar-get interactions by integrating three databases LncRNADisease[41] LncRNA2Target [42] and LncReg [43] The aim is to helpresearchers study the roles of ncRNAs in the immune systemRecently a new experimentally validated database namedlncRInter [45] was developed to collect reliable and high-qualitylncRNAndashtarget interactions The extracted lncRNAndashtarget inter-actions are all from published literature and are supported bycertain biological experiments (eg luciferase reporter assayin vitro binding assay RNA pull-down) In total lncRInter con-tains 1036 experimentally validated lncRNAndashtarget interactionsin 15 organisms
In addition to the experimentally validated databases pre-sented above there are several computationally predicted data-bases for collecting lncRNAndashmRNA interactions For instancestarBase [46] is a comprehensive database of systematicallyidentifying the RNAndashRNA and proteinndashRNA interaction net-works from 108 CLIP-Seq (PAR-CLIP HITS-CLIP iCLIP CLASH)data sets The lncRNAndashmRNA interactions can be extractedfrom proteinndashRNA interaction networks InCaNet [47] aimsto establish a comprehensive regulatory network betweenlncRNAs and cancer genes They identify lncRNAndashcancergene interactions by computing gene co-expression betweenlncRNAs and cancer genes BmncRNAdb [48] is a comprehensivedatabase of silkworm lncRNAs and miRNAs The database pro-vides three online tools for users to predict both lncRNAndashtargetand miRNAndashtarget interactions lncRNAtor [49] collect expres-sion data from 243 RNA-seq experiments including 5237samples of various tissues and developmental stages ThelncRNAndashmRNA co-expression pairs are identified through co-expression analysis of lncRNAs and mRNAs lncRNome [50] is acomprehensive knowledgebase of sequence structure biologi-cal functions genomic variations and epigenetic modificationson gt17 000 lncRNAs in human For lncRNAndashprotein interactionsthe database incorporates PAR-CLIP experiments and a supportvector machine-based prediction method Co-lncRNA [51] andLncRNA2Function [52] predict co-expressed lncRNAndashmRNAinteractions from RNA-Seq data and further annotates thepotential functions of human lncRNAs using functional enrich-ment analysis lncRNAMap [53] is an integrated and compre-hensive database to explore regulatory functions of humanlncRNAs By integrating small RNAs supported by publicly avail-able deep sequencing data lncRNAMap construct lncRNA-derived siRNAndashtarget interactions
In summary for experimentally validated databases userscan select individual database or combine several databases asground truth to validate the predicted lncRNAndashmRNA interac-tions As for computationally predicted databases they can beused as initial structural of sequence-based or expression-basedmethods to identify lncRNAndashmRNA interactions
Inferring and analyzing MSLCRN networksRepurposed microarray data across human cancers
We collect the repurposed lncRNA and mRNA expression dataof GBM LSCC OvCa and PrCa from [25] A lncRNA or mRNA iseliminated if it does not have a corresponding gene symbol in adata set By calculating average expression values of replicate
Module-specific lncRNA-mRNA causal regulatory networks | 5
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
lncRNAs and mRNAs we obtain unique expression value ofthese replicates Consequently we get the matched expressiondata of 9704 lncRNAs and 18 282 mRNAs in 451 GBM 113 LSCC585 OvCa and 150 PrCa samples
Pipeline of MSLCRN
As shown in Figure 2 MSLCRN contains the following threesteps to infer module-specific lncRNAndashmRNA causal regulatorynetworks
i Identification of lncRNAndashmRNA co-expression modulesGiven the matched lncRNA and mRNA expression data weuse WGCNA to generate gene co-expression modules Amodule containing at least two lncRNAs and two mRNAsare regarded as a lncRNAndashmRNA co-expression moduleand used as the input of the second step
ii Identification of module-specific lncRNAndashmRNA causal reg-ulatory networks For each lncRNAndashmRNA co-expressionmodule with each lncRNAndashmRNA pair we apply parallelIDA to estimate the causal effect of the lncRNA on the
Table 2 Public databases for storing lncRNAndashmRNA regulatory relationships
Databases Types of databases Brief descriptions Organisms Available
NPInter [40] Validated A database of experimentally verified func-tional interactions between ncRNAs(including lncRNAs miRNAs etc) and bio-molecules (proteins RNAs and DNAs)
22 organisms httpwwwbioinfoorgNPInter
LncRNADisease [41] Validated A database of experimentally supportedlncRNAndashdisease association data andlncRNAndashtarget interactions in various lev-els including protein RNA miRNA andDNA
Human httpwwwcuilabcnlncrnadisease
LncRNA2Target [42] Validated A database of lncRNAndashtarget regulatory rela-tionships experimentally validated bylncRNA knockdown or overexpression
Human mouse httpwwwlncrna2targetorg
LncReg [43] Validated A database of experimentally validatedlncRNAndashtarget interactions from publicliterature
7 organisms httpbioinformaticsustceducnlncreg
IRNdb [44] Validated A database of immunologically relevantncRNAs (miRNAs lncRNAs and otherncRNAs) and target genes
Human mouse httpcompbiomasseyacnzappsirndb
lncRInter [45] Validated A database of experimentally validatedlncRNAndashtarget interactions extracted frompeer-reviewed publications
15 organisms httpbioinfolifehusteducnlncRInter
starBase [46] Predicted A comprehensive database of systematicallyidentifying the RNAndashRNA and proteinndashRNAinteraction networks from 108 CLIP-Seq(PAR-CLIP HITS-CLIP iCLIP CLASH) datasets
Human httpstarbasesysueducn
lnCaNet [47] Predicted A database of establishing a comprehensiveregulatory network source for lncRNA andcancer genes
Human httplncanetbioinfo-minzhaoorg
BmncRNAdb [48] Predicted A comprehensive database of the silkwormlncRNAs and miRNAs as well as the threeonline tools for users to predict the targetgenes of lncRNAs or miRNAs
Bombyx mori httpgenecqueducnBmncRNAdbindexphp
lncRNAtor [49] Predicted A comprehensive resource of encompassingannotation sequence analysis geneexpression protein binding and phyloge-netic conservation
6 organisms httplncrnatorewhaackr
lncRNome [50] Predicted A comprehensive knowledgebase on thetypes chromosomal locations descriptionon the biological functions and diseaseassociations of lncRNAs
Human httpgenomeigibresinlncRNome
Co-LncRNA [51] Predicted A computationally predicted database toidentify GO annotations and KEGG path-ways affected by co-expressed protein-cod-ing genes of a single or multiple lncRNAs
Human httpwwwbio-bigdatacomCo-LncRNA
LncRNA2Function [52] Predicted A comprehensive resource of investigatingthe functions of lncRNAs based on co-expressed lncRNAndashmRNA interactions
Human httpmlghiteducnlncrna2function
lncRNAMap [53] Predicted An integrated and comprehensive databaseof regulatory functions of lncRNAs and act-ing as ceRNAs
Human httplncRNAMapmbcnctuedutw
6 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
mRNA We use the absolute value of the causal effect(AVCE) to evaluate the strength of the regulation of thelncRNA on the mRNA and a higher AVCE indicates a stron-ger lncRNA regulation The lncRNAndashmRNA pairs with highAVCEs in each module are considered as module-specificlncRNAndashmRNA causal regulatory relationships and we calleach module with these relationships identified a module-specific causal regulatory network
iii Identification of global lncRNAndashmRNA causal regulatorynetwork We integrate the module-specific lncRNAndashmRNA
causal regulatory networks to form the global lncRNAndashmRNA causal regulatory network
Identification of lncRNAndashmRNA co-expression modules
In systems biology WGCNA [21] is a popular method for findingthe correlation patterns among genes across samples and canbe used to identify clusters or modules of highly co-expressedgenes Therefore we use WGCNA to first infer lncRNAndashmRNAco-expression modules
Figure 2 The pipeline of MSLCRN First WGCNA is used to identify lncRNAndashmRNA co-expression modules from matched lncRNA and mRNA expression data Second
we infer lncRNAndashmRNA causal regulatory relationships in each module by using parallel IDA method For each module we assemble the identified lncRNAndashmRNA reg-
ulatory relationships to obtain a module-specific lncRNAndashmRNA causal regulatory network Third the module-specific lncRNAndashmRNA causal regulatory networks are
integrated to form a global lncRNAndashmRNA causal regulatory network
Module-specific lncRNA-mRNA causal regulatory networks | 7
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
Specifically the matched lncRNA and mRNA expressiondata are used as the input of WGCNA For each pair of genesi and j the gene co-expression similarity sij of the pair is definedas
sij frac14 jcorethi jTHORNj (1)
where jcor(i j)j is the absolute value of the Pearson correlationbetween genes i and j The gene co-expression similarity matrixis denoted by Sfrac14 [sij]
To pick an appropriate soft-thresholding power for trans-forming the similarity matrix S into an adjacency matrix A weuse the scale-free topology criterion for soft-thresholding andthe minimum scale free topology fitting index R2 is set as 09Then the topological overlap matrix (TOM) Wfrac14 [wij] is gener-ated based on the adjacency matrix Afrac14 [aij] The TOM similaritywij between genes i and j is defined
wij frac14P
uaiuauj thorn aij
minfP
uaiuP
uaujg thorn 1 aij(2)
where u denotes all genes of the matched lncRNA and mRNAexpression data The TOM dissimilarity between genes i and j isdenoted by dijfrac14 1 - wij To identify gene co-expression modulesthe TOM dissimilarity matrix Dfrac14 [dij] is clustered using optimalhierarchical clustering method [54] Here the identified geneco-expression modules are groups of lncRNAs and mRNAs withhigh topological overlap The lncRNAs and mRNAs of eachlncRNAndashmRNA co-expression module are considered for possi-ble lncRNAndashmRNA causal relationships in the next step
Identification of module-specific lncRNAndashmRNA causalregulatory networks
After the identification of lncRNAndashmRNA co-expression mod-ules we use the parallel IDA method [24] to estimate causaleffects of possible lncRNAndashmRNA causal pairs in each moduleThe application of parallel IDA method to matched lncRNAand mRNA expression data for estimating causal effectsincludes two steps (i) learning the causal structure fromexpression data using the parallel-PC algorithm [24] and(ii) estimating the causal effects of lncRNAs on mRNAs byapplying do-calculus [55]
In step (i) Vfrac14 L1 Lm T1 Tn is a set of random varia-bles denoting m lncRNAs and n mRNAs The causal structure isin the form of a DAG where a node denotes a lncRNA Li ormRNA Tj and an edge between two nodes represents a causalrelationship between them We use the parallel-PC algorithm aparallel version of the PC algorithm [56] to learn the causalstructures (the DAGs) from expression data Starting with a fullyconnected undirected graph the parallel-PC algorithm deter-mines if an edge is retained or removed in the graph by con-ducting conditional independence tests in parallel Then to geta DAG the directions of edges in the obtained graph are ori-ented As different DAGs may represent the same conditionalindependence the parallel-PC algorithm uses a completed par-tially directed acyclic graph (CPDAG) to uniquely describe anequivalence class of DAGs In this work we use the R-packageParallelPC [57] to implement the parallel-PC algorithm and setthe significant level of the conditional independence testsafrac14 001
In step (ii) we are only interested in estimating the causaleffect of the directed edge Li Tj where vertex is Li a parent ofvertex Tj As described above a CPDAG may generate a class ofDAGs For the causal effect of Li Tj in a CPDAG we use do-calculus [55] to estimate the causal effects of Li on Tj in a class ofDAGs Then we use the minimum absolute value of all possiblecausal effects as a final causal effect of Li Tj As for the detailsof how the parallel IDA method is applied to estimate causalrelationships from expression data the readers can refer to [24]
The estimated causal effects can be positive or negativereflecting the up or down regulation by the lncRNAs on themRNAs For the purpose of constructing the regulatory net-works we use the absolute values of the causal effects (AVCEs)to evaluate the strengths of the regulation and thus to confirmthe regulatory relationships
We set different AVCE cutoffs from 010 to 060 with a step of005 to generate MSLCRN networks in GBM LSCC OvCa andPrCa respectively For each cutoff we merge the identifiedMSLCRN networks to obtain global lncRNAndashmRNA causal regu-latory networks in the four human cancers respectively Asshown in Table 3 a higher cutoff selection causes a smallerglobal lncRNAndashmRNA causal regulatory network but bettergoodness of fit To make a trade-off between the size of theglobal lncRNAndashmRNA causal regulatory networks and goodnessof fit we set a compromised AVCE cutoff with a value of 045 Ifthe AVCE of a lncRNA on a mRNA is 045 or above we considerthere is a causal regulatory relationship between the lncRNAndashmRNA pair Under the compromise cutoff we have a moderatesize of the global lncRNAndashmRNA causal regulatory networks inGBM LSCC OvCa and PrCa Meanwhile the node degree distri-butions of four global lncRNAndashmRNA causal regulatory net-works also follow power law distribution (the fitted power curveis in the form of yfrac14 axb) well with R2gt 08
Validation survival and enrichment analysis
Previous studies have demonstrated that about 20 of thenodes in a biological network are essential and are regarded ashub genes [58 59] Therefore when analyzing a global lncRNAndashmRNA causal network we select the 20 of lncRNAs with thehighest degrees in the network as hub lncRNAs The degree of alncRNA node in the global network is the number of mRNAsconnected with it
To validate the predicted module-specific lncRNAndashmRNAcausal regulatory relationships we obtain the experimentallyvalidated lncRNAndashmRNA regulatory relationships from thethree widely used databases NPInter v30 [40] LncRNADiseasev2017 [41] and LncRNA2Target v12 [42] Furthermore we retainexperimentally validated lncRNAndashmRNA regulatory relation-ships associated with the four human cancer data sets asground truth
We perform survival analysis using the R-package survival[60] A multivariate Cox model is used to predict the risk scoreof each tumor sample Then all tumor samples in each cancerdata set are equally divided into high- and low-risk groupsaccording to their risk scores Moreover we calculate theHazard Ratio between the high- and the low-risk groups andperform the Log-rank test
To further investigate the underlying biological processesand pathways related to each of the MSLCRN networks we usethe R-package clusterProfiler [61] to conduct functional enrich-ment analysis on the networks respectively The GeneOntology (GO) [62] biological processes and Kyoto Encyclopedia
8 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
of Genes and Genomes (KEGG) [63] pathways with adjustedP-valuelt005 [adjusted by Benjamini-Hochberg (BH) method]are regarded as functional categories for the MSLCRN networks
We also collect a list of lncRNAs and mRNAs that areassociated with GBM LSCC OvCa and PrCa to study diseaseenrichment of each of the MSLCRN networks The list of disease-associated lncRNAs is obtained from LncRNADisease v2017 [41]Lnc2Cancer v2016 [64] and MNDR v20 [65] The list of disease-associated mRNAs is from DisGeNET v50 [66] To evaluatewhether a MSLCRN network is significantly enriched in a specificdisease we use a hyper-geometric distribution test as follows
p frac14 1 FethxjBNMTHORN frac14 1Xx1
ifrac140
N
i
B N
M i
B
M
(3)
In the formula B is the number of all genes in the expressiondata set N denotes the number of all genes associated with aspecific disease in the expression data set M is the number ofgenes in a MSLCRN network and x is the number of genes asso-ciated with a specific disease in a MSLCRN network A MSLCRNnetwork is significantly enriched in a specific disease if theP-valuelt 005
Network analysis validation and comparisonon MSLCRN networkslncRNAs exhibit dynamic positive gene regulationacross cancers
By following the first step of the MSLCRN method we haveidentified 23 38 45 and 32 lncRNAndashmRNA co-expression mod-ules in GBM LSCC OvCa and PrCa respectively In the secondstep of the MSLCRN method we eliminate the noncausallncRNAndashmRNA pairs in lncRNAndashmRNA co-expression modulesAs a result we generate 23 38 45 and 32 module-specificlncRNAndashmRNA causal regulatory networks in GBM LSCC OvCaand PrCa respectively After merging the module-specificlncRNAndashmRNA causal regulatory networks for each data set weobtain the four global lncRNAndashmRNA regulatory networks inGBM LSCC OvCa and PrCa respectively
To understand the overlap and difference of module-specificgenes module-specific lncRNAndashmRNA causal regulatory rela-tionships and module-specific hub lncRNAs in the four humancancers we generate three set intersection plots using theR-package UpSetR [67] As shown in Figure 3 we find that themajority of module-specific genes (5752) module-specificlncRNAndashmRNA causal regulatory relationships (9902) andmodule-specific hub lncRNAs (8922) tend to be cancer-specific Only a small portion of module-specific genes (396) andmodule-specific lncRNAndashmRNA causal regulatory relationships(6) are shared by the four cancers Especially none of themodule-specific hub lncRNAs are common between the fourcancers In addition the causal effects are positive for 99569672 9993 and 7863 of the causal regulatory relationshipsidentified in GBM LSCC OvCa and PrCa respectively Theseresults indicate that lncRNAs are more likely to exhibit dynamicpositive gene regulation across cancers The results are alsoconsistent with the proposition that the positive gene regula-tion by lncRNAs would be desired in specific situations [68]
Differential network analysis uncovers cancer-specificlncRNAndashmRNA causal networks
In this section we focus on studying cancer-specific lncRNAndashmRNA causal networks using differential network analysisThus the GBM-specific LSCC-specific OvCa-specific and PrCa-specific lncRNAndashmRNA causal networks are identified Asshown in Figure 4A the distributions of node degrees in thesefour cancer-specific lncRNAndashmRNA causal networks followpower law distributions well with R2frac14 09774 09923 09723and 08310 respectively Thus these four cancer-specificlncRNAndashmRNA causal networks are scale free indicating thatmost mRNAs are regulated by a small number of lncRNAs
Table 3 Degree distributions of global lncRNAndashmRNA causal regula-tory networks with different cutoffs in GBM LSCC OvCa and PrCa
Datasets Cutoffs Number of causalregulations
yfrac14axb R2
GBM 010 11 847 yfrac142274x06893 04161015 10 924 yfrac142495x07275 05460020 9732 yfrac142745x0767 06475025 8461 yfrac142958x08074 06757030 7176 yfrac143194x08319 06807035 6041 yfrac143363x08703 07203040 4997 yfrac143741x09348 07999045 4074 y54082x21034 08694050 3279 yfrac144194x118 09244055 2583 yfrac143896x1259 09463060 1862 yfrac143666x143 09792
LSCC 010 789 172 yfrac143143x06071 04829015 684 524 yfrac143475x06323 05841020 569 369 yfrac143905x06525 06578025 451 346 yfrac144855x06928 07789030 340 860 yfrac146341x07554 08796035 244 547 yfrac148147x08379 09504040 166 593 yfrac149724x0935 09848045 108 024 y51031x21018 09933050 66 335 yfrac149425x1068 09963055 37 632 yfrac147807x1089 09948060 19 547 yfrac146565x1169 09972
OvCa 010 333 146 yfrac143272x05928 05042015 232 794 yfrac144192x06262 06531020 159 872 yfrac146398x07216 08247025 112 792 yfrac148816x08356 09120030 80 808 yfrac141008x09472 09551035 57 099 yfrac149545x1014 09744040 38 517 yfrac148198x1066 09748045 24 439 y56575x21066 09697050 14 435 yfrac14540x1079 09551055 7973 yfrac144368x1107 09319060 4026 yfrac143285x1107 09460
PrCa 010 1 894 322 yfrac143089x06245 02750015 1 749 595 yfrac143586x06787 03582020 1 594 744 yfrac144013x07169 04316025 1 429 858 yfrac144271x0732 04919030 1 260 968 yfrac144389x07244 05616035 1 097 654 yfrac144406x0702 06470040 946 439 yfrac144485x06816 07338045 812 687 y55175x207005 08206050 694 558 yfrac14667x07588 08823055 584 834 yfrac148833x08469 09332060 474 654 yfrac141113x09684 09503
Note The AVCE cutoffs range from 010 to 060 with a step of 005
The bold values are the degree distributions of global lncRNA-mRNA causal reg-
ulatory networks with a compromised AVCE cutoff (045) in four human cancers
Module-specific lncRNA-mRNA causal regulatory networks | 9
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
Next we use four lists of lncRNAs and mRNAs associatedwith GBM LSCC OvCa and PrCa to discover lncRNAndashmRNAcausal networks that are associated with the four human can-cers We define that cancer-related lncRNAndashmRNA causal regu-latory relationships are those in which at least one regulatoryparty is cancer-related lncRNA or mRNA As a result wehave extracted GBM-related LSCC-related OvCa-related andPrCa-related lncRNAndashmRNA causal networks from the fourcancer-specific lncRNAndashmRNA causal networks (details inSupplementary File S1) To understand the potential biologicalprocesses and pathways of the four cancer-related lncRNAndashmRNA causal networks we identify significant GO biologicalprocesses and KEGG pathways using functional enrichmentanalysis In Figure 4B several top GO biological processes and
KEGG pathways such as cytokine activity [69] G-proteincoupled receptor binding [70] TNF signaling pathway [71] cAMPsignaling pathway [72] pathways in cancer are closely associ-ated with the occurrence and development of cancer Thisresult suggests that the identified cancer-related lncRNAndashmRNA causal networks may be involved in the occurrence anddevelopment of human cancer
Conservative network analysis highlights a corelncRNAndashmRNA causal regulatory network acrosshuman cancers
Although most of the lncRNAndashmRNA causal regulatory relation-ships are cancer-specific there are still a number of common
396 424
106 115
1370
133 173
1599
126
1729
1224
252
2523
2157
5081
0
2000
4000
Inte
rsec
tion
Siz
e
PrCa
Ov Ca
LSCC
GBM
025
0050
0075
00
1000
0
Set Size
6 283
13 1 80 673
206
3969
76 3493
407
2816
9950
7
1948
7
0
250000
500000
750000
Inte
rsec
tion
Siz
e
PrCa
Ov Ca
LSCC
GBM
0e+0
0
2e+0
5
4e+0
5
6e+0
5
8e+0
5
Set Size
6 1 2 933
2
41
6 11
246
47
524
0
200
400In
ters
ectio
n S
ize
PrCa
Ov Ca
LSCC
GBM
0200
400
600
Set Size
Module-specific genes
Module-specific causal regulations Module-specific hub lncRNAs 808611
A
B C
Figure 3 Overlap and difference of module-specific genes module-specific causal regulations and module-specific hub lncRNAs across GBM LSCC OvCa and PrCa
(A) Module-specific genes (both lncRNAs and mRNAs) intersection plot (B) Module-specific causal regulations intersection plot (C) Module-specific hub lncRNAs inter-
section plot The red lines denote common genes and causal regulations across GBM LSCC OvCa and PrCa
10 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
causal regulatory relationships between the four global net-works To evaluate whether there is a common core of lncRNAndashmRNA causal regulatory relationships in the global regulatorynetworks across human cancers we concentrate on the con-served lncRNAndashmRNA causal regulatory relationships thatexisted in at least three human cancers
As shown in Figure 5A the majority of the conservedlncRNAndashmRNA causal regulatory relationships form a closely
connected community This finding indicates that the con-served lncRNAndashmRNA causal regulatory network may be a corenetwork across human cancers
The survival analysis shows that the lncRNAs and mRNAs inthe core network can significantly distinguish the metastasisrisks between the high- and low-risk groups in GBM OvCa andPrCa data sets (Figure 5B) This result suggests that the core net-work may act as a common network biomarker of GBM OvCa
Cancer-specific networks Causal regulations y=axb R2
GBM-specific 2816 y=4812x-1292 09774
LSCC-specific 99507 y=1034x-1014 09923
OvCa-specific 19487 y=7366x-1156 09723
PrCa-specific 808611 y=5243x-07055 08310
1 10 100 11001
10
100
1100
Degree of genes
Nu
mb
er o
f ge
nes
GBM-specific fitting curveLSCC-specific fitting curveOvCa-specific fitting curvePrCa-specific fitting curveGBM-specific degree distributionLSCC-specific degree distributionOvCa-specific degree distributionPrCa-specific degree distribution
solutecation symporter activitysymporter activity
gated channel activitysodium ion transmembrane transporter activity
ion channel activitysubstrate-specific channel activity
channel activitypassive transmembrane transporter activity
growth factor activitycation channel activity
metal ion transmembrane transporter activitycollagen bindingheparin binding
sulfur compound bindingintegrin binding
glycosaminoglycan bindingextracellular matrix binding
peptide receptor activityG-protein coupled peptide receptor activity
chemokine bindingG-protein coupled receptor binding
growth factor bindingcytokine binding
serine-type endopeptidase activitydeath receptor activity
tumor necrosis factor-activated receptor activitycytokine receptor activity
protein heterodimerization activitydipeptidase activity
glycoprotein bindingRAGE receptor binding
cytokine receptor bindingcytokine activity
GBM(203)
LSCC(1015)
OvCa(479)
PrCa(3019)
001
002
003
004
padjust
GeneRatio002
004
006
008
Taste transduction
ECM-receptor interaction
cAMP signaling pathway
Calcium signaling pathway
Neuroactive ligand-receptor interaction
PI3K-Akt signaling pathway
Pathways in cancerRegulation of actin cytoskeleton
Complement and coagulation cascades
AGE-RAGE signaling pathway in diabetic complications
Hematopoietic cell lineage
Th17 cell differentiation
Inflammatory bowel disease (IBD)
Malaria
Osteoclast differentiation
Influenza A
Tuberculosis
Intestinal immune network for IgA production
Chagas disease (American trypanosomiasis)
Leishmaniasis
TNF signaling pathway
Toll-like receptor signaling pathway
Rheumatoid arthritis
Cytokine-cytokine receptor interaction
GBM(129)
LSCC(500)
OvCa(266)
PrCa(1364)
GeneRatio
005
010
015
001
002
003
004padjust
GO enrichment analysis KEGG enrichment analysis
A
B
Figure 4 Differential network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) Degree distribution of cancer-specific lncRNAndashmRNA
causal networks in GBM LSCC OvCa and PrCa (B) Functional enrichment analysis of cancer-related lncRNAndashmRNA causal networks in GBM LSCC OvCa and PrCa
Module-specific lncRNA-mRNA causal regulatory networks | 11
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
and PrCa In Figure 5B we also find that the core network con-tains several cancer genes (34 26 30 and 38 cancer genes asso-ciated with GBM LSCC OvCa and PrCa respectively)
By conducting GO and KEGG enrichment analysis we findthat the core network is significantly enriched in 399 GO biologi-cal processes and 3 KEGG pathways (details in SupplementaryFile S2) Of the 399 GO biological processes 2 GO terms includ-ing negative regulation of cell adhesion (GO 0007162) and cyto-kine production in immune response (GO 0002367) areinvolved in three cancer hallmarks Tissue Invasion andMetastasis Tumor Promoting Inflammation and EvadingImmune Detection [73] This observation implies that the corenetwork may control these cancer-related hallmarks
Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers
We divide the hub lncRNAs into two categories (1) conserved hublncRNAs which exist in at least three human cancers and (2)cancer-specific hub lncRNAs which only exist in single humancancer As a result we obtain 9 conserved hub lncRNAs and 828cancer-specific hub lncRNAs (include 11 GBM-specific 246 LSCC-specific 47 OvCa-specific and 524 PrCa-specific hub lncRNAs)
To evaluate whether the hub lncRNAs can distinguish meta-stasis risks of human cancers we use them to predict metasta-sis risks for tumor samples in GBM LSCC OvCa and PrCaAs shown in Figure 6A the conserved hub lncRNAs can discrim-inate the metastasis risks of tumor samples significantly(Log-rank P-valuelt 005) in four human cancers In Figure 6Bexcepting LSCC-specific hub lncRNAs owing to failing to fit aCox regression model GBM-specific OvCa-specific and PrCa-specific hub lncRNAs can discriminate the metastasis risks oftumor samples significantly in GBM OvCa and PrCa respec-tively (Log-rank P-valuelt 005) These results suggest that thehub lncRNAs are discriminative and can act as biomarkers todistinguish between high- and low-risk tumor samples
Experimentally validated lncRNAndashmRNA regulations aremostly bad hits for LncTar
Using a collection of experimentally validated lncRNAndashmRNAregulatory relationships (details in Supplementary File S3) asthe ground truth the numbers of experimentally confirmedlncRNAndashmRNA causal regulations are 17 14 20 and 42 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4)
Figure 5 Conservative network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) The core lncRNAndashmRNA causal network that
occurred in at least three human cancers The red diamond nodes and white circle nodes denote lncRNAs and mRNAs respectively (B) Survival analysis of the core
lncRNAndashmRNA causal network
12 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
We further apply a representative sequence-based methodcalled LncTar [11] to the experimentally validated lncRNAndashmRNAcausal regulatory relationships discovered by MSLCRN There aretwo main reasons for choosing LncTar First LncTar does nothave a limit to input RNA size Second LncTar uses a quantitativestandard rather than expert knowledge to determine whetherlncRNAs interact with mRNAs Similar to LncTar we also set -01as normalized binding free energy (ndG) cutoff to determinewhether lncRNAndashmRNA pairs interact with each other In otherwords the lncRNAndashmRNA pairs with ndG01 are regarded as
lncRNAndashmRNA regulatory relationships Among the experimen-tally confirmed lncRNAndashmRNA causal regulatory relationships thatare discovered by MSLCRN the numbers of successfully predictedlncRNAndashmRNA regulations using LncTar are 0 0 1 and 1 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4) The result indicates that our experimentally confirmedlncRNAndashmRNA causal regulations are mostly bad hits for LncTarMeanwhile this result also suggests that expression-based andsequence-based methods may be complementary with each otherin predicting lncRNAndashmRNA regulations
A
B
Figure 6 Survival analysis of hub lncRNAs (A) Conserved hub lncRNAs in GBM LSCC OvCa and PrCa datasets (B) Survival analysis of cancer-specific hub lncRNAs
Module-specific lncRNA-mRNA causal regulatory networks | 13
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
MSLCRN networks are biologically meaningful
In this section we conduct GO and KEGG enrichment analysisto check whether the MSLCRN networks are associated withsome biological processes and pathways significantlyEnrichment analysis uncovers that 15 of the 23 (6522)MSLCRN networks in GBM 29 of the 38 (7632) MSLCRN net-works in LSCC 30 of the 45 (6667) MSLCRN networks inOvCa and 20 of the 32 (6250) MSLCRN networks in PrCa aresignificantly enriched in at least one GO biological process orKEGG pathway respectively (details in Supplementary File S5)This result implies that most of the MSLCRN networks in eachcancer are functional networks
We further investigate whether the MSLCRN networks aresignificantly enriched in GBM LSCC OvCa and PrCa diseasesrespectively We discover that 5 of the 23 MSLCRN networks7 of the 38 MSLCRN networks 6 of the 45 MSLCRN networks and6 of the 32 MSLCRN networks are significantly enriched in GBMLSCC OvCa and PrCa diseases respectively (details inSupplementary File S5) This result indicates that severalMSLCRN networks are closely associated with GBM LSCC OvCaand PrCa diseases
Altogether functional and disease enrichment analysis resultsshow that MSLCRN networks are biologically meaningful
Comparison with other PC-based networkinference methods
Based on a parallel version of the PC algorithm [56] the parallelIDA method in the second step of MSLCRN learns the causalstructure from expression data Owing to the popularity of thePC algorithm in causal structure learning some other networkinference methods including PCA-CMI [74] PCA-PMI [75] andCMI2NI [76] have also successfully applied it for network infer-ence Different from the three methods using conditional orpartial mutual information to infer lncRNAndashmRNA regulationsour method estimates causal effects to identify lncRNAndashmRNAregulations For comparisons we also use the PCA-CMI PCA-PMI and CMI2NI methods to infer module-specific lncRNAndashmRNA regulatory relationships Similar to our method (whichuses the parallel IDA method) the strength cutoff of lncRNAndashmRNA regulatory relationships in PCA-CMI PCA-PMI andCMI2NI methods is also set to 045
We evaluate the performance of each method in terms offinding experimentally validated lncRNAndashmRNA regulatoryrelationships functional MSLCRN networks and disease-associated MSLCRN networks As shown in Table 4 in terms ofthe three criteria MSLCRN performs the best in GBM LSCCOvCa and PrCa data sets This result suggests that MSLCRN is auseful method to infer module-specific lncRNAndashmRNA regula-tory network in human cancers
Conclusions and discussion
Notwithstanding lncRNAs do not encode proteins directly theyengage in a wide range of biological processes including cancerdevelopments through their interactions with other biologicalmacromolecules eg DNA RNA and protein Therefore touncover the functions and regulatory mechanisms of lncRNAsit is necessary to investigate lncRNAndashtarget regulatory networkacross different types of biological conditions
As a biological network the lncRNAndashtarget regulatory net-work exhibits a high degree of modularity Each functionalmodule is responsible for implementing specific biological
functions Moreover modularity is an important feature ofhuman cancer development and progression Thus from a net-work community point of view it is necessary to investigatemodule-specific lncRNAndashmRNA regulatory networks
Until now several statistical correlation or associationmeasures eg Pearson Mutual Information and ConditionalMutual Information have been used to infer gene regulatorynetworks However these methods tend to identify indirect reg-ulatory relationships between genes The identified gene regu-latory networks cannot reflect real lsquocausalrsquo regulatoryrelationships To better understand lncRNA regulatory mecha-nism it is vital to investigate how lncRNAs causally influencethe expression levels of their target mRNAs
In this work the computational methods for inferringlncRNAndashmRNA interactions and the publicly available data-bases of lncRNAndashmRNA regulatory relationships are firstreviewed Then to address the above two issues we propose anovel computational method MSLCRN to study module-specific lncRNAndashmRNA causal regulatory networks across GBMLSCC OvCa and PrCa diseases In contrast to other approaches(expression-based and sequence-based methods) MSLCRN hastwo unique features First MSLCRN considers the modularity oflncRNAndashmRNA regulatory networks Instead of studying globalregulatory relationships between lncRNAs and mRNAs wefocus on investigating the regulatory behavior of lncRNAs in themodules of interest Second considering the restrictions withconducting gene knockout experiments MSLCRN uses thecausal inference method IDA to infer causal relationshipsbetween lncRNAs and mRNAs based on expression data Thepromising results suggest that exploiting modularity of generegulatory network and causality-based method could provideanother effective approach to elucidating lncRNA functions andregulatory mechanisms of human cancers
Despite the advantages of MSLCRN there is still room toimprove it First the WGCNA method only allows clusteringgenes across all samples from the matched lncRNA and mRNAexpression data In fact a class of genes may exhibit similarexpression patterns across a subset of samples An alternativesolution of this problem is to use a bi-clustering method to iden-tify lncRNAndashmRNA co-expression modules Second it is stilltime-consuming to estimate causal effects from large expres-sion data sets When constructing the module-specific lncRNAndashmRNA causal regulatory networks the running time of parallelIDA is still high on estimating the causal effects of lncRNAs onmRNAs In future more efficient parallel IDA method is neededto explore lncRNAndashmRNA causal regulatory relationships inlarge-scale expression data Third previous research [38]has shown that the prediction accuracy of lncRNAndashmRNA inter-actions can be improved by integrating both sequence data and
Table 4 Comparison results in terms of experimentally validatedlncRNAndashmRNA regulatory relationships functional MSLCRN net-works and disease-associated MSLCRN networks
Methods GBM (a b c) LSCC (a b c) OvCa (a b c) PrCa (a b c)
MSLCRN (17 15 5) (14 29 7) (20 30 6) (42 20 6)PCA-CMI (2 13 0) (0 11 0) (0 7 1) (0 20 2)PCA-PMI (2 15 1) (0 11 0) (0 8 2) (1 18 1)CMI2NI (2 15 0) (0 11 0) (0 7 1) (0 19 1)
Note afrac14number of experimentally validated lncRNAndashmRNA regulatory relation-
ships bfrac14number of functional MSLCRN networks cfrac14number of disease-asso-
ciated MSLCRN networks
14 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
expression data To improve the accuracy of the predictedlncRNAndashmRNA regulatory relationships it is necessary todevelop an ensemble method (fusing sequence-based andexpression-based methods) to infer lncRNAndashmRNA regulatorynetwork Finally recent studies [77] show that lncRNAs can actas competing endogenous RNAs (ceRNAs) or miRNA sponges toattract miRNAs for bindings by competing with mRNAsTherefore some predicted lncRNAndashmRNA regulatory relation-ships are lncRNA-related ceRNAndashceRNA interactions To furtherimprove the prediction of lncRNAndashmRNA regulatory relation-ships it is necessary to remove the crosstalk relationshipsbetween lncRNAs and mRNAs
Key Points
bull Among ncRNAs lncRNAs are a large and diverse classof RNA molecules and are thought to be a gold mine ofpotential oncogenes anti-oncogenes and newbiomarkers
bull lncRNAs exhibit dynamic positive gene regulationacross human cancers
bull Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers
bull There is still a lack of ground truth for validating pre-dicted lncRNAndashmRNA regulatory relationships
bull There is still room to develop reliable methods for elu-cidating lncRNA regulatory mechanisms
Supplementary Data
Supplementary data are available online at httpsacademicoupcombib
Funding
The National Natural Science Foundation of China (No61702069) the Applied Basic Research Foundation ofScience and Technology of Yunnan Province (No2017FB099) the NHMRC Grant (No 1123042) and theAustralian Research Council Discovery Grant (NoDP140103617)
References1 Pang KC Frith MC Mattick JS Rapid evolution of noncoding
RNAs lack of conservation does not mean lack of functionTrends Genet 200622(1)1ndash5
2 Kung JT Colognori D Lee JT Long noncoding RNAs pastpresent and future Genetics 2013193(3)651ndash69
3 Schmitt AM Chang HY Long noncoding RNAs in cancer path-ways Cancer Cell 201629(4)452ndash63
4 Zhang Y Tao Y Liao Q Long noncoding RNA a crosslink inbiological regulatory network Brief Bioinform 2017 doi 101093bibbbx042
5 Yoon JH Abdelmohsen K Gorospe M Posttranscriptionalgene regulation by long noncoding RNA J Mol Biol 2013425(19)3723ndash30
6 Gerlach W Giegerich R GUUGle a utility for fast exact match-ing under RNA complementary rules including G-U base pair-ing Bioinformatics 200622(6)762ndash4
7 Muckstein U Tafer H Hackermuller J et al Thermodynamicsof RNA-RNA binding Bioinformatics 200622(10)1177ndash82
8 Tafer H Hofacker IL RNAplex a fast tool for RNA-RNA inter-action search Bioinformatics 200824(22)2657ndash63
9 Busch A Richter AS Backofen R IntaRNA efficient predictionof bacterial sRNA targets incorporating target site accessibil-ity and seed regions Bioinformatics 200824(24)2849ndash56
10Kato Y Sato K Hamada M et al RactIP fast and accurate pre-diction of RNA-RNA interaction using integer programmingBioinformatics 201026(18)i460ndash6
11Li J Ma W Zeng P et al LncTar a tool for predicting the RNAtargets of long noncoding RNAs Brief Bioinform 201516(5)806ndash12
12Fukunaga T Hamada M RIblast an ultrafast RNA-RNA inter-action prediction system based on a seed-and-extensionapproach Bioinformatics 201733(17)2666ndash74
13Derrien T Johnson R Bussotti G et al The GENCODE v7 cata-log of human long noncoding RNAs analysis of their genestructure evolution and expression Genome Res 201222(9)1775ndash89
14Gloss BS Dinger ME The specificity of long noncoding RNAexpression Biochim Biophys Acta 20161859(1)16ndash22
15Munshi A Mohan V Ahuja YR Non-coding RNAs a dynamicand complex network of gene regulation J PharmacogenomicsPharmacoproteomics 20167156
16Liao Q Liu C Yuan X et al Large-scale prediction of long non-coding RNA functions in a coding-non-coding gene co-expression network Nucleic Acids Res 201139(9)3864ndash78
17Guo Q Cheng Y Liang T et al Comprehensive analysis oflncRNA-mRNA co-expression patterns identifies immune-associated lncRNA biomarkers in ovarian cancer malignantprogression Sci Rep 20155(1)17683
18Du Y Xia W Zhang J et al Comprehensive analysis of longnoncoding RNA-mRNA co-expression patterns in thyroidcancer Mol Biosyst 201713(10)2107ndash15
19Wu W Wagner EK Hao Y et al Tissue-specific co-expressionof long non-coding and coding RNAs associated with breastCancer Sci Rep 2016632731
20Barabasi AL Oltvai ZN Network biology understanding thecellrsquos functional organization Nat Rev Genet 20045(2)101ndash13
21Langfelder P Horvath S WGCNA an R package for weightedcorrelation network analysis BMC Bioinformatics 20089559
22Maathuis HM Kalisch M Buhlmann P Estimating high-dimensional intervention effects from observational dataAnn Stat 200937(6A)3133ndash64
23Maathuis HM Colombo D Kalisch M et al Predicting causaleffects in large-scale systems from observational data NatMethods 20107(4)247ndash8
24Le T Hoang T Li J et al A fast PC algorithm for high dimen-sional causal discovery with multi-core PCs IEEEACM TransComput Biol Bioinform 2016 doi 101109TCBB20162591526
25Du Z Fei T Verhaak RG et al Integrative genomic analysesreveal clinically relevant long noncoding RNAs in humancancer Nat Struct Mol Biol 201320(7)908ndash13
26Bernhart SH Tafer H Muckstein U et al Partition functionand base pairing probabilities of RNA heterodimersAlgorithms Mol Biol 20061(1)3
27Alkan C Karakoc E Nadeau JH et al RNA-RNA interactionprediction and antisense RNA target search J Comput Biol200613(2)267ndash82
28Seemann SE Richter AS Gesell T et al PETcofold predictingconserved interactions and structures of two multiple align-ments of RNA sequences Bioinformatics 201127(2)211ndash19
29Wenzel A Akbasli E Gorodkin J RIsearch fast RNA-RNAinteraction search using a simplified nearest-neighborenergy model Bioinformatics 201228(21)2738ndash46
Module-specific lncRNA-mRNA causal regulatory networks | 15
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
30Alkan F Wenzel A Palasca O et al RIsearch2 suffix array-based large-scale prediction of RNA-RNA interactions andsiRNA off-targets Nucleic Acids Res 201745e60
31Hu R Sun X lncRNATargets a platform for lncRNA target pre-diction based on nucleic acid thermodynamics J BioinformComput Biol 201614(4)1650016
32Terai G Iwakiri J Kameda T et al Comprehensive predictionof lncRNA-RNA interactions in human transcriptome BMCGenomics 201617(Suppl 1)12
33Liu J Wu S Li M et al LncRNA expression profiles reveal theco-expression network in human colorectal carcinoma Int JClin Exp Pathol 201691885ndash1892
34Huang S Feng C Chen L et al Identification of potential keylong non-coding RNAs and target genes associated withpneumonia using long non-coding RNA sequencing (lncRNA-Seq) a preliminary study Med Sci Monit 2016223394ndash408
35Li J Xu Y Xu J et al Dynamic co-expression network analysisof lncRNAs and mRNAs associated with venous congestionMol Med Rep 201614(3)2045ndash51
36Fu M Huang G Zhang Z et al Expression profile of long non-coding RNAs in cartilage from knee osteoarthritis patientsOsteoarthritis Cartilage 201523(3)423ndash32
37Zhang F Gao C Ma XF et al Expression profile of long non-coding RNAs in peripheral blood mononuclear cells frommultiple sclerosis patients CNS Neurosci Ther 201622(4)298ndash305
38 Iwakiri J Terai G Hamada M Computational prediction oflncRNA-mRNA interactionsby integrating tissue specificity inhuman transcriptome Biol Direct 201712(1)15
39Lv L Wei M Lin P et al Integrated mRNA and lncRNA expres-sion profiling for exploring metastatic biomarkers of humanintrahepatic cholangiocarcinoma Am J Cancer Res 20177688ndash99
40Hao Y Wu W Li H et al NPInter v30 an upgraded databaseof noncoding RNA-associated interactions Database 20162016baw057
41Chen G Wang Z Wang D et al LncRNADisease a databasefor long-non-coding RNA-associated diseases Nucleic AcidsRes 201341D983ndash6
42 Jiang Q Wang J Wu X et al LncRNA2Target a database fordifferentially expressed genes after lncRNA knockdown oroverexpression Nucleic Acids Res 201543D193ndash6
43Zhou Z Shen Y Khan MR et al LncReg a reference resourcefor lncRNA-associated regulatory networks Database 20152015bav083
44Denisenko E Ho D Tamgue O et al IRNdb the database ofimmunologically relevant non-coding RNAs Database 20162016baw138
45Liu CJ Gao C Ma Z et al lncRInter a database of experimen-tally validated long non-coding RNA interaction J GenetGenomics 201744(5)265ndash8
46Li JH Liu S Zhou H et al starBase v20 decoding miRNA-ceRNA miRNA-ncRNA and protein-RNA interaction net-works from large-scale CLIP-Seq data Nucleic Acids Res 201442(D1)D92ndash7
47Liu Y Zhao M lnCaNet pan-cancer co-expression networkfor human lncRNA and cancer genes Bioinformatics 201632(10)1595ndash7
48Zhou QZ Zhang B Yu QY et al BmncRNAdb a comprehen-sive database of non-coding RNAs in the silkworm Bombyxmori BMC Bioinformatics 201617(1)370
49Park C Yu N Choi I et al lncRNAtor a comprehensiveresource for functional investigation of long non-codingRNAs Bioinformatics 201430(17)2480ndash5
50Bhartiya D Pal K Ghosh S et al lncRNome a comprehensiveknowledgebase of human long noncoding RNAs Database20132013bat034
51Zhao Z Bai J Wu A et al Co-LncRNA investigating thelncRNA combinatorial effects in GO annotations and KEGGpathways based on human RNA-Seq data Database 20152015bav082
52 Jiang Q Ma R Wang J et al LncRNA2Function a compre-hensive resource for functional investigation of humanlncRNAs based on RNA-seq data BMC Genomics 201516(Suppl 3)S2
53Chan WL Huang HD Chang JG lncRNAMap a map of puta-tive regulatory functions in the long non-coding transcrip-tome Comput Biol Chem 20145041ndash9
54Langfelder P Horvath S Fast R functions for robust correla-tions and hierarchical clustering J Stat Softw 2012461ndash17
55 Judea P Causality Models Reasoning and Inference New YorkNY Cambridge University Press 2000
56Spirtes P Glymour C Scheines R Causation Prediction andSearch 2nd edn Cambridge MIT Press 2000
57Le T Hoang T Li J et al ParallelPC an R package for efficientconstraint based causal exploration arXiv prepring 2015arXiv151003042v1
58Hahn MW Kern AD Comparative genomics of centrality andessentiality in three eukaryotic protein-interaction networksMol Biol Evol 200522(4)803ndash6
59Song J Singh M Roth FP From hub proteins to hub modulesthe relationship between essentiality and centrality in theyeast interactome at different scales of organization PLoSComput Biol 20139(2)e1002910
60Therneau TM Grambsch PM Modeling Survival Data Extendingthe Cox Model New York Springer Press 2000
61Yu G Wang L-G Han Y He Q-Y clusterProfiler an R packagefor comparing biological themes among gene clusters OMICS201216(5)284ndash7
62Ashburner M Ball CA Blake JA et al Gene ontology tool forthe unification of biology Nat Genet 200025(1)25ndash9
63Kanehisa M Goto S KEGG Kyoto Encyclopedia of Genes andGenomes Nucleic Acids Res 200028(1)27ndash30
64Ning S Zhang J Wang P et al Lnc2Cancer a manually curateddatabase of experimentally supported lncRNAs associatedwith various human cancers Nucleic Acids Res 201644(D1)D980ndash5
65Wang Y Chen L Chen B et al Mammalian ncRNA-diseaserepository a global view of ncRNA-mediated disease net-work Cell Death Dis 20134e765
66Pi~nero J Bravo A Queralt-Rosinach N et al DisGeNET a com-prehensive platform integrating information on humandisease-associated genes and variants Nucleic Acids Res 201745(D1)D833ndash9
67Conway JR Lex A Gehlenborg N UpSetR an R package for thevisualization of intersecting sets and their propertiesBioinformatics 201733(18)2938ndash40
68Wahlestedt C Targeting long non-coding RNA to therapeuti-cally upregulate gene expression Nat Rev Drug Discov 201312(6)433ndash46
69Mantovani G Maccio A Lai P et al Cytokine activity incancer-related anorexiacachexia role of megestrol acetateand medroxyprogesterone acetate Semin Oncol 19982545ndash52
70Dorsam RT Gutkind JS G-protein-coupled receptors and can-cer Nat Rev Cancer 20077(2)79ndash94
71Wang X Lin Y Tumor necrosis factor and cancer buddies orfoes Acta Pharmacol Sin 200829(11)1275ndash88
16 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
72Fajardo AM Piazza GA Tinsley HN The role of cyclic nucleo-tide signaling pathways in cancer targets for prevention andtreatment Cancers 20146(1)436ndash58
73Hanahan D Weinberg RA Hallmarks of cancer the next gen-eration Cell 2011144(5)646ndash74
74Zhang X Zhao XM He K et al Inferring gene regulatory net-works from gene expression data by path consistencyalgorithm based on conditional mutual informationBioinformatics 201228(1)98ndash104
75Zhao J Zhou Y Zhang X et al Part mutual information forquantifying direct associations in networks Proc Natl Acad SciUSA 2016113(18)5130ndash5
76Zhang X Zhao J Hao JK et al Conditional mutual inclusiveinformation enables accurate quantification of associationsin gene regulatory networks Nucleic Acids Re 201543(5)e31
77Le TD Zhang J Liu L et al Computational methods for identi-fying miRNA sponge interactions Brief Bioinform 201718(4)577ndash90
Module-specific lncRNA-mRNA causal regulatory networks | 17
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018View publication statsView publication stats
Inferring and analyzing module-specific lncRNAndash
mRNA causal regulatory networks in human cancerJunpeng Zhang Thuc Duy Le Lin Liu and Jiuyong LiCorresponding authors Junpeng Zhang School of Engineering Dali University Dali Yunnan 671003 Public Republic of China Tel thorn86 872 2219799 Faxthorn86 872 2219799 E-mail zhangjunpeng_411yahoocom Jiuyong Li School of Information Technology and Mathematical Sciences University of SouthAustralia Mawson Lakes 5095 SA Australia Tel thorn61 8 830 23898 Fax thorn61 8 830 23381 E-mail jiuyongliunisaeduau
Abstract
It is known that noncoding RNAs (ncRNAs) cover 98 of the transcriptome but do not encode proteins Among ncRNAslong noncoding RNAs (lncRNAs) are a large and diverse class of RNA molecules and are thought to be a gold mine of poten-tial oncogenes anti-oncogenes and new biomarkers Although only a minority of lncRNAs is functionally characterized it isclear that they are important regulators to modulate gene expression and involve in many biological functions To revealthe functions and regulatory mechanisms of lncRNAs it is vital to understand how lncRNAs regulate their target genes forimplementing specific biological functions In this article we review the computational methods for inferring lncRNAndashmRNA interactions and the third-party databases of storing lncRNAndashmRNA regulatory relationships We have found thatthe existing methods are based on statistical correlations between the gene expression levels of lncRNAs and mRNAs andmay not reveal gene regulatory relationships which are causal relationships Moreover these methods do not consider themodularity of lncRNAndashmRNA regulatory networks and thus the networks identified are not module-specific To addressthe above two issues we propose a novel method MSLCRN to infer and analyze module-specific lncRNAndashmRNA causal reg-ulatory networks We have applied it into glioblastoma multiforme lung squamous cell carcinoma ovarian cancer andprostate cancer respectively The experimental results show that MSLCRN as an expression-based method could be a use-ful complementary method to study lncRNA regulations
Key words lncRNA mRNA lncRNAndashmRNA co-expression lncRNAndashmRNA interaction lncRNAndashmRNA causal relationshiphuman cancer
Junpeng Zhang is an associate professor at the School of Engineering Dali University He received his BSc (2009) in Bio-medical Engineering and MSc(2012) in Control Theory and Control Engineering from Kunming University of Science and Technology Kunming City China His research interestsinclude bioinformatics and data miningThuc Duy Le is a research fellow at the University of South Australia (UniSA) He received his BSc (2002) and MSc (2006) in pure Mathematics from theUniversity of Pedagogy Ho Chi Minh City Vietnam and BSc (2010) in Computer Science from UniSA He received his PhD degree in Computer Science(Bioinformatics) in 2014 at UniSA His research interests are bioinformatics data mining and machine learningLin Liu is a senior lecturer at the School of Information Technology and Mathematical Sciences University of South Australia (UniSA) She received herbachelor and master degrees in Electronic Engineering from Xidian University China in 1991 and 1994 respectively and her PhD degree in computer sys-tems engineering from UniSA in 2006 Her research interests include data mining and bioinformatics as well as Petri nets and their applications to proto-col verification and network security analysisJiuyong Li is a professor at the School of Information Technology and Mathematical Sciences University of South Australia He received his PhD degree incomputer science from the Griffith University Australia (2002) His research interests are in the fields of data mining privacy preserving and bioinfor-matics His research has been supported by five prestigious Australian Research Council Discovery grants since 2005 and he has published more than 100research papersSubmitted 13 December 2017 Received (in revised form) 8 January 2018
VC The Author(s) 2018 Published by Oxford University Press All rights reservedFor Permissions please email journalspermissionsoupcom
1
Briefings in Bioinformatics 2018 1ndash17
doi 101093bibbby008Paper
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
Introduction
Long noncoding RNAs (lncRNAs) are non-protein coding tran-scripts with gt200 nucleotides in length Unlike small noncodingRNAs (sncRNAs) lncRNAs generally exhibit low sequence con-servation However owing to rapidly adaptive selection pres-sures the low conservation of lncRNAs (such as Air and Xist)does not indicate absence of function [1] Similar to microRNAs(miRNAs) an important class of sncRNAs evidence has shownthat lncRNAs play important roles in a wide range of biologicalprocesses even in cancers [2 3] Despite the importance oflncRNAs in many physiological and pathological processes alarge number of lncRNAs remain to be functionally character-ized For this reason the number of studies on lncRNA researchhas been increased exponentially in the past decade (as shownin Figure 1)
To achieve various biological functions lncRNAs form generegulatory networks by interacting with other biological mole-cules such as transcription factors miRNAs messenger RNAs(mRNAs) and RNA-binding proteins [4] Among these biologicalmolecules interacting with lncRNAs mRNAs are the most popu-lar ones By regulating the transcription and translation ofmRNAs lncRNAs could get involved in several vital biologicalprocesses such as cell differentiation cell proliferation andcytoprotective programs [5] Therefore the identification oflncRNAndashmRNA regulatory networks would help to uncoverfunctions and regulatory mechanisms of lncRNAs
A straightforward method for identifying lncRNAndashmRNAregulatory networks is sequence-based complementary basepairing To predict lncRNA targets several sequence-basedmethods such as GUUGle [6] RNAup [7] RNAplex [8] IntaRNA[9] RactIP [10] LncTar [11] and RIblast [12] have been devel-oped Owing to the long sequence and complex tertiary struc-ture of each lncRNA the computational costs of predictinglarge-scale lncRNAndashmRNA regulatory relationships are usuallyhigh Moreover these sequence-based methods only considerthe sequence information of lncRNAs and target mRNAs andthus the predicted lncRNAndashmRNA regulatory networks arestatic However previous studies [13ndash15] have shown thatlncRNAs exhibit condition-specific expression fashion anddynamic networks of gene regulation To identify dynamic orcondition-specific lncRNAndashmRNA regulatory networks it is nec-essary to use expression data Some expression-based methods[16ndash19] for predicting co-expressed lncRNAndashmRNA networks
have been proposed However as the predictions are based onstatistical associations found in gene expression levels onlythey may not represent the real lsquocausalrsquo lncRNAndashmRNA regula-tory relationships Furthermore the existing expression-basedmethods do not consider the modularity of lncRNAndashmRNA reg-ulatory networks an important feature of gene regulatory net-works [20]
In this article we first review the computational methodsfor inferring lncRNAndashmRNA interactions and the public data-bases for storing lncRNAndashmRNA regulatory relationshipsSecond we propose a novel method to infer Module-SpecificLncRNAndashmRNA Causal Regulatory Network (thus the proposedmethod is called MSLCRN) In the first step by consideringmodularity of networks MSLCRN uses Weighted Gene Co-expression Network Analysis (WGCNA) [21] to identify lncRNAndashmRNA co-expression modules In each module the lncRNAsand mRNAs are regarded as module-specific genes In the sec-ond step MSLCRN uses a causal inference method named inter-vention calculus when the directed acyclic graph (DAG) isabsent (IDA) [22 23] to estimate the causal effects of possiblelncRNAndashmRNA causal pairs in each module To speed up theestimation the parallelized version of IDA [24] is used to calcu-late the causal effects For each module the noncausal lncRNAndashmRNA pairs are eliminated and the retained lncRNAndashmRNAcausal pairs are further assembled to generate a module-specific lncRNAndashmRNA causal network To obtain a globallncRNAndashmRNA causal regulatory network we further integratethe identified module-specific lncRNAndashmRNA causal networksin the third step
To evaluate MSLCRN we have applied it into four humancancer data sets including glioblastoma multiforme (GBM) lungsquamous cell carcinoma (LSCC) ovarian cancer (OvCa) andprostate cancer (PrCa) from [25] The validation survival andenrichment analysis results show that the proposed methodcan help with revealing the functions and regulatory mecha-nisms of lncRNAs MSLCRN is released under the GPL-30License and is freely available through GitHub (httpsgithubcomzhangjunpeng411MSLCRN)
Computational methods for inferringlncRNAndashmRNA interactions
In this section we review the computational approaches forinferring lncRNAndashmRNA interactions In Table 1 we divide themethods into two categories (1) sequence-based method and(2) expression-based method We will separately review thesemethods as follows
Sequence-based method
The common characteristic of the sequence-based methods isthat the identification of RNAndashRNA interactions depends onRNA binding energy between two RNA molecules To evaluatethe strength of RNA binding energy a number of energy models[6ndash12 26ndash32] are proposed to predict RNAndashRNA interactions
Gerlach and Giegerich [6] propose a utility program GUUGlefor locating potential helical regions under RNA complementarybase pairs rules The method can be effectively used as a filterfor noncoding RNA (ncRNA) target prediction However the reli-able prediction of RNAndashRNA binding energies is also importantfor the identification of RNAndashRNA interactions To study thethermodynamics of RNAndashRNA interactions Muckstein et al [7]present an extension of the standard partition function methodcalled RNAup to RNA secondary structures By comparing
129 141 164 240 338
639
991
1419
2068
2596
0
500
1000
1500
2000
2500
3000
Year
Num
ber
of p
ublic
atio
ns
Figure 1 The number of lncRNA-related publications in the past decade The
number of queried publications is obtained from PubMed library with keyword
lsquolncRNArsquo
2 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
Table 1 Summary of computational methods or tools for inferring lncRNAndashmRNA interactions
Methodstools Categories of methods Brief descriptions Available
GUUGle [6] Sequence-based Target prediction by locating potential helical regions of RNAndashRNA pairs under RNA base pairing rules which include G-Ubases
httpbibiserv2cebitecuni-bielefelddeguugle
RNAup [7] Sequence-based Target prediction by studying thermodynamics of RNAndashRNApairs based on the sum of the energy of binding andhybridization
httprnatbiunivieacatcgi-binRNAWebSuiteRNAupcgi
RNAcofold [26] Sequence-based Target prediction by computing the hybridization energy andbase pairing pattern of RNAndashRNA pairs
httprnatbiunivieacatcgi-binRNAWebSuiteRNAcofoldcgi
Alkan et al [27] Sequence-based Target prediction by minimizing the joint free energy of RNAndashRNA pairs under a number of energy models including basepair energy model stacked pair energy model loop energymodel
On request
RNAplex [8] Sequence-based Target prediction by finding possible hybridization sites ofRNAndashRNA pairs
httpwwwtbiunivieacathtafer
IntaRNA [9] Sequence-based Target prediction by incorporating accessibility of target sitesas well as the existence of a user-definable seed
httprnainformatikuni-freiburgdeIntaRNAInputjsp
RactIP [10] Sequence-based Target prediction by integrating approximate information onan ensemble of equilibrium joint structures into the objec-tive function of integer programming
httprtipsdnabiokeioacjpractip
PETcofold [28] Sequence-based Target prediction by taking covariance information in intra-molecular and intermolecular base pairs into account
httprthdkresourcespetcofold
RIsearch [29] Sequence-based Target prediction by implementing a simplified Turner energymodel for fast computation of hybridization
httpsrthdkresourcesrisearchrisearch1php
RIsearch2 [30] Sequence-based An updated version of RIsearch and predict targets using a sin-gle integrated seed-and-extend framework based on suffixarrays
httpsrthdkresourcesrisearch
LncTar [11] Sequence-based lncRNA target prediction by finding the minimum free energyjoint structure of RNAndashRNA pairs based on base pairing
httpwwwcuilabcnlnctar
lncRNATargets [31] Sequence-based lncRNA target prediction based on nucleic acidthermodynamics
httpwwwherbbolorg 8001lrt
Terai et al [32] Sequence-based lncRNA target prediction by developing an integrated pipelineon the K computer which is one of the fastest super-com-puters in the world
httprtoolscbrcjpcgi-binRNARNAindexpl
RIblast [12] Sequence-based Target prediction based on the seed-and-extension approach httpgithubcomfukunagatsuRIblast
Liao et al [16] Expression-based Identify lncRNAndashmRNA interactions by using Pearson methodand the identified lncRNAndashmRNA interactions should be co-expressed in the same direction in no less than 3 Mousemicroarray data sets
On request
Guo et al [17] Expression-based Identify lncRNAndashmRNA interactions by using Pearson methodin OvCa malignant progression
On request
Du et al [18] Expression-based Identify lncRNAndashmRNA interactions by using Pearson methodand a power function in thyroid cancer
On request
Liu et al [33] Expression-based Identify lncRNAndashmRNA interactions by using Pearson methodin human colorectal carcinoma
On request
Huang et al [34] Expression-based Identify lncRNAndashmRNA interactions associated with pneumo-nia by using Pearson method
On request
Li et al [35] Expression-based Identify dynamic lncRNAndashmRNA interactions associated withvenous congestion by using Pearson method
On request
Wu et al [19] Expression-based Identify lncRNAndashmRNA interactions by using a generalized lin-ear model to regress mRNA expression on lncRNA expres-sion in breast cancer
On request
Fu et al [36] Expression-based Identify lncRNAndashmRNA interactions by considering mRNA lociwithin lncRNA and the Pearson correlation in cartilage
On request
Zhang et al [37] Expression-based Identify lncRNAndashmRNA interactions by considering mRNA lociwithin lncRNA and the Pearson correlation in cartilage inperipheral blood mononuclear cells
On request
Iwakiri et al [38] Expression-based Identify tissue-specific lncRNAndashmRNA interactions by integrat-ing the tissue specificity of lncRNAs and mRNAs intosequence-based prediction of human lncRNAndashRNAinteractions
On request
Lv et al [39] Expression-based Identify tissue-specific lncRNAndashmRNA interactions by usingPearson and sequence-based methods and in human intra-hepatic cholangiocarcinoma
On request
Module-specific lncRNA-mRNA causal regulatory networks | 3
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
predicted free energies of binding with RNA interference experi-mental data RNAup can produce biologically reasonableresults For genome-wide predictions of ncRNA targets RNAupis not fast enough Therefore it is usually to be combined withother faster RNAndashRNA prediction methods
To extend the standard dynamic programming algorithmsfor computing RNA secondary structures Bernhart et al [26]propose a program named RNAcofold to compute the hybridiza-tion energy and base pairing pattern of the co-folding of twoRNA molecules However the method disregards some impor-tant interaction structures and is restricted to dimeric com-plexes Moreover for the RNAndashRNA interaction predictionpredicting the joint secondary structure of two interacting RNAsis also important To solve it Alkan et al [27] develop severalalgorithms to minimize the joint free energy between the twoRNAs under a number of energy models Assuming that con-served RNAndashRNA interactions imply conserved functionSeemann et al [28] also implement a comparative method calledPETcofold to predict the joint secondary structure of two inter-acting RNAs As PETcofold considers sequence conservation anincreasing amount of structural covariance can further improveits performance
RNAup [7] and RNAcofold [26] are too slow for genome-widesearch in finding target sites of ncRNAs To accelerate the speedof RNAndashRNA interaction predictions RNAplex [8] is presented toquickly find possible hybridization sites between two interact-ing RNAs To focus on the target search on short highly stableinteractions RNAplex introduces a per nucleotide penaltyMeanwhile another general and fast approach IntaRNA [9] isproposed to efficiently predict bacterial RNAndashRNA interactionsCompared with other existing target prediction methodsIntaRNA considers both the accessibility of target sites and theexistence of a user-defined seed Therefore it shows a higheraccuracy than competing methods Kato et al [10] also present afast and accurate prediction method RactIP for comprehensivetype of RNAndashRNA interactions In terms of predicting joint sec-ondary structures of two interacting RNAs RactIP run incompa-rably faster than competitive programs
To further achieve a speed improvement of predictingRNAndashRNA interactions Wenzel et al [29] present RIsearch forfast computation of hybridization between two interactingRNAs They show that the energy model of RIsearch is anaccurate approximation of the full energy model for near-complementary RNAndashRNA duplexes Furthermore RIsearch isfaster than RNAplex [8] in RNAndashRNA interaction searchRecently RIsearch2 [30] an updated version of RIsearch [29] isproposed to localize potential near-complementary RNAndashRNAinteractions between two RNA sequences The comparisonresults show that RIsearch2 is much faster than the previousmethods such as GUUGle [6] RNAplex [8] IntaRNA [9] andRIsearch [29]
Although the above RNAndashRNA interaction prediction meth-ods can be extended to predict lncRNAndashmRNA interactionsnone of them are exclusively used for identifying the RNA tar-gets of lncRNAs in a large scale To efficiently identify lncRNAndashmRNA interactions Li et al [11] propose a tool named LncTarLncTar explores lncRNAndashmRNA interactions by finding the min-imum free energy joint structure of two interacting RNAs basedon base pairing As LncTar runs fast and does not have a limitto RNA size it can be used for large-scale identification of theRNA targets for all RNAs Another web-based platformlncRNATargets [31] is also provided for lncRNA target predic-tion Because there is no limit to RNA size lncRNATargets canalso be used to identify the RNA targets of all RNAs In a whole
human transcriptome Terai et al [32] develop an integratedpipeline to predict lncRNAndashmRNA interactions for the first timeIn the pipeline IntaRNA [9] is used to calculate interactionenergy and RactIP [10] is used to predict joint secondary struc-ture Recently to further shorten the running time of predictinglncRNAndashmRNA interactions an ultrafast RNAndashRNA interactionprediction method RIblast [12] based on the seed-and-extensionmethod is presented The comparison results show that RIblastruns faster than RNAplex [8] IntaRNA [9] Terai et al pipeline[32] and thus can be applied to a large scale of lncRNA targetidentification
Expression-based method
At the gene expression level the co-expressed lncRNAndashmRNApairs are regarded as lncRNAndashmRNA interactions for theexpression-based methods Among the existing expression-based methods [16ndash19 33ndash39] Pearson correlation method is akey step of most methods to identify co-expressed lncRNAndashmRNA pairs
Liao et al [16] construct a lncRNAndashmRNA co-expression net-work from re-annotated mouse microarray data sets By usingPearson method they only keep the lncRNAndashmRNA pairs withPlt 001 and Pearson correlation ranked in the top or bottom005 percentile The study is the first large-scale prediction oflncRNA functions from a lncRNAndashmRNA co-expression networkTo identify immune-associated lncRNA biomarkers in OvCa Guoet al [17] make a comprehensive analysis of lncRNAndashmRNA co-expression patterns To identify lncRNAndashmRNA co-expressionpairs they calculate Pearson correlation between differentiallyexpressed lncRNAs and mRNAs They only reserve the lncRNAndashmRNA co-expression pairs with Pearson correlationgt 05 and thecorresponding False Discovery Rate (FDR)lt 001 Liu et al [33] andHuang et al [34] also use Pearson method to study lncRNAndashmRNAco-expression networks in human colorectal carcinoma andpneumonia respectively The inferred lncRNAndashmRNA co-expression networks will help to study lncRNA functionsRecently Du et al [18] propose a two-step method to conduct acomprehensive analysis of lncRNAndashmRNA co-expression patternsin thyroid cancer First they use Pearson method to calculatePearson correlation and the cutoff of Pearson correlation is 05and the corresponding FDR cutoff is 001 Second the Pearson cor-relations are transformed into an adjacency matrix
Owing to dynamic characteristic of gene regulatory net-works Wu et al [19] identify two distinct lncRNAndashmRNA co-expression networks in tumor and normal breast tissue Theyuse a generalized linear model to regress mRNA expression onlncRNA expression in tumor and normal breast tissue and onlyfocus on dynamic breast lncRNAndashmRNA co-expression pairsthat differ in tumor and normal breast tissue Meanwhile tostudy the potential role of lncRNAs in venous congestion Liet al [35] also construct a dynamic lncRNAndashmRNA co-expression network By using Pearson method they separatelycalculate Pearson correlations of each lncRNAndashmRNA pair invenous congestion and normal samples The lncRNAndashmRNApairs with Pearson correlationgt099 orlt099 and P-valuelt001are selected as lncRNAndashmRNA co-expression pairs They con-struct two types of lncRNAndashmRNA co-expression networkslsquolostrsquo network where lncRNAndashmRNA co-expression pairs onlyexisted in normal samples and lsquoobtainedrsquo network wherelncRNAndashmRNA co-expression pairs only existed in venous con-gestion samples The lsquolostrsquo and lsquoobtainedrsquo networks are furtherintegrated to obtain a dynamic lncRNAndashmRNA co-expressionnetwork
4 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
The above methods simply use matched lncRNA and mRNAexpression data to identify lncRNAndashmRNA co-expression pairsTo identify lsquocis-regulated target genesrsquo of lncRNAs some methodsalso consider mRNA loci information within lncRNA For exampleFu et al [36] combine mRNA loci information and matchedlncRNA and mRNA expression data to predict lncRNA targetsThey identify the mRNAs as targets under two conditions (i) themRNA loci are within a 300-kb window up- or downstream oflncRNA and (ii) lncRNAndashmRNA co-expression pairs are signifi-cantly positive correlated (Pearson correlationgt 08 and the corre-sponding P-valuelt 005) Zhang et al [37] also use a similarmethod to Fu et al [36] for identifying lncRNA targets The mRNAscan be regarded as targets when (1) the mRNA loci are within a10 window up- or downstream of lncRNA and (2) lncRNAndashmRNAco-expression pairs are significantly positive correlated (Pearsoncorrelationgt 098 and the corresponding P-valuelt 005)
Apart from mRNA loci information within lncRNA someemerging methods consider predictions from sequence-basedmethods as putative lncRNAndashmRNA interactions For exampleIwakiri et al [38] integrate tissue-specific lncRNA and mRNAexpression data into predictions from a sequence-basedmethod in [32] They discover that integrating tissue specificitycan improve prediction accuracy of lncRNAndashmRNA interactionsLv et al [39] also combine matched lncRNA and mRNA expres-sion data with predictions from a sequence-based methodLncTar [11] They first use Pearson method to identify co-expressed lncRNAndashmRNA co-expression pairs with Pearsoncorrelationgt095 orlt095 Then LncTar is used to further filterthe identified lncRNAndashmRNA co-expression pairs
Public databases for storing lncRNAndashmRNAregulatory relationships
In this section we review the public databases of storinglncRNAndashmRNA regulatory relationships Table 2 shows a sum-mary of the third-party public databases including experimen-tally validated and computationally predicted databases
NPInter [40] contains experimentally validated interactionsbetween ncRNAs especially lncRNAs and miRNAs The databasecontains 915 067 interactions in 188 tissues or cell lines from 68kinds of experimental technologies There is a classification ofthe functional interactions based on the functional process thatncRNA is involved in Moreover NPInter allows users to searchinteractions related publications and other information
LncRNADisease [41] not only collects experimentally sup-ported lncRNAndashdisease associations and lncRNA interactionsbut also predicts novel lncRNAndashdisease associations Recentlythe database curates 478 entries of experimentally validatedlncRNA interactions LncRNADisease provides users severalways to search lncRNA-related diseases and interactions
To study differentially expressed genes after lncRNA knock-down or overexpression Jiang et al [42] develop a databasecalled LncRNA2Target in human and mouse organisms Thedatabase has a collection of 396 experimentally validatedlncRNAndashtarget interactions In LncRNA2Target if a gene is dif-ferentially expressed after lncRNA knockdown or overexpres-sion it is regarded as a target of a lncRNA For convenienceLncRNA2Target allows users to search for the targets of singlelncRNA or for the lncRNAs that target a specific geneMeanwhile Zhou et al [43] also build a reference resourceLncReg for lncRNA-related regulatory networks The databasehas 1081 experimentally validated lncRNA-related regulatory
records between 258 nonredundant lncRNAs and 571 nonredun-dant genes
IRNdb [44] is a database that focuses on collecting immuno-logically relevant lncRNAndashtarget miRNAndashtarget and PIWI-interacting RNAndashtarget interactions The current version ofIRNdb documents 22 453 immunologically relevant lncRNAndashtar-get interactions by integrating three databases LncRNADisease[41] LncRNA2Target [42] and LncReg [43] The aim is to helpresearchers study the roles of ncRNAs in the immune systemRecently a new experimentally validated database namedlncRInter [45] was developed to collect reliable and high-qualitylncRNAndashtarget interactions The extracted lncRNAndashtarget inter-actions are all from published literature and are supported bycertain biological experiments (eg luciferase reporter assayin vitro binding assay RNA pull-down) In total lncRInter con-tains 1036 experimentally validated lncRNAndashtarget interactionsin 15 organisms
In addition to the experimentally validated databases pre-sented above there are several computationally predicted data-bases for collecting lncRNAndashmRNA interactions For instancestarBase [46] is a comprehensive database of systematicallyidentifying the RNAndashRNA and proteinndashRNA interaction net-works from 108 CLIP-Seq (PAR-CLIP HITS-CLIP iCLIP CLASH)data sets The lncRNAndashmRNA interactions can be extractedfrom proteinndashRNA interaction networks InCaNet [47] aimsto establish a comprehensive regulatory network betweenlncRNAs and cancer genes They identify lncRNAndashcancergene interactions by computing gene co-expression betweenlncRNAs and cancer genes BmncRNAdb [48] is a comprehensivedatabase of silkworm lncRNAs and miRNAs The database pro-vides three online tools for users to predict both lncRNAndashtargetand miRNAndashtarget interactions lncRNAtor [49] collect expres-sion data from 243 RNA-seq experiments including 5237samples of various tissues and developmental stages ThelncRNAndashmRNA co-expression pairs are identified through co-expression analysis of lncRNAs and mRNAs lncRNome [50] is acomprehensive knowledgebase of sequence structure biologi-cal functions genomic variations and epigenetic modificationson gt17 000 lncRNAs in human For lncRNAndashprotein interactionsthe database incorporates PAR-CLIP experiments and a supportvector machine-based prediction method Co-lncRNA [51] andLncRNA2Function [52] predict co-expressed lncRNAndashmRNAinteractions from RNA-Seq data and further annotates thepotential functions of human lncRNAs using functional enrich-ment analysis lncRNAMap [53] is an integrated and compre-hensive database to explore regulatory functions of humanlncRNAs By integrating small RNAs supported by publicly avail-able deep sequencing data lncRNAMap construct lncRNA-derived siRNAndashtarget interactions
In summary for experimentally validated databases userscan select individual database or combine several databases asground truth to validate the predicted lncRNAndashmRNA interac-tions As for computationally predicted databases they can beused as initial structural of sequence-based or expression-basedmethods to identify lncRNAndashmRNA interactions
Inferring and analyzing MSLCRN networksRepurposed microarray data across human cancers
We collect the repurposed lncRNA and mRNA expression dataof GBM LSCC OvCa and PrCa from [25] A lncRNA or mRNA iseliminated if it does not have a corresponding gene symbol in adata set By calculating average expression values of replicate
Module-specific lncRNA-mRNA causal regulatory networks | 5
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
lncRNAs and mRNAs we obtain unique expression value ofthese replicates Consequently we get the matched expressiondata of 9704 lncRNAs and 18 282 mRNAs in 451 GBM 113 LSCC585 OvCa and 150 PrCa samples
Pipeline of MSLCRN
As shown in Figure 2 MSLCRN contains the following threesteps to infer module-specific lncRNAndashmRNA causal regulatorynetworks
i Identification of lncRNAndashmRNA co-expression modulesGiven the matched lncRNA and mRNA expression data weuse WGCNA to generate gene co-expression modules Amodule containing at least two lncRNAs and two mRNAsare regarded as a lncRNAndashmRNA co-expression moduleand used as the input of the second step
ii Identification of module-specific lncRNAndashmRNA causal reg-ulatory networks For each lncRNAndashmRNA co-expressionmodule with each lncRNAndashmRNA pair we apply parallelIDA to estimate the causal effect of the lncRNA on the
Table 2 Public databases for storing lncRNAndashmRNA regulatory relationships
Databases Types of databases Brief descriptions Organisms Available
NPInter [40] Validated A database of experimentally verified func-tional interactions between ncRNAs(including lncRNAs miRNAs etc) and bio-molecules (proteins RNAs and DNAs)
22 organisms httpwwwbioinfoorgNPInter
LncRNADisease [41] Validated A database of experimentally supportedlncRNAndashdisease association data andlncRNAndashtarget interactions in various lev-els including protein RNA miRNA andDNA
Human httpwwwcuilabcnlncrnadisease
LncRNA2Target [42] Validated A database of lncRNAndashtarget regulatory rela-tionships experimentally validated bylncRNA knockdown or overexpression
Human mouse httpwwwlncrna2targetorg
LncReg [43] Validated A database of experimentally validatedlncRNAndashtarget interactions from publicliterature
7 organisms httpbioinformaticsustceducnlncreg
IRNdb [44] Validated A database of immunologically relevantncRNAs (miRNAs lncRNAs and otherncRNAs) and target genes
Human mouse httpcompbiomasseyacnzappsirndb
lncRInter [45] Validated A database of experimentally validatedlncRNAndashtarget interactions extracted frompeer-reviewed publications
15 organisms httpbioinfolifehusteducnlncRInter
starBase [46] Predicted A comprehensive database of systematicallyidentifying the RNAndashRNA and proteinndashRNAinteraction networks from 108 CLIP-Seq(PAR-CLIP HITS-CLIP iCLIP CLASH) datasets
Human httpstarbasesysueducn
lnCaNet [47] Predicted A database of establishing a comprehensiveregulatory network source for lncRNA andcancer genes
Human httplncanetbioinfo-minzhaoorg
BmncRNAdb [48] Predicted A comprehensive database of the silkwormlncRNAs and miRNAs as well as the threeonline tools for users to predict the targetgenes of lncRNAs or miRNAs
Bombyx mori httpgenecqueducnBmncRNAdbindexphp
lncRNAtor [49] Predicted A comprehensive resource of encompassingannotation sequence analysis geneexpression protein binding and phyloge-netic conservation
6 organisms httplncrnatorewhaackr
lncRNome [50] Predicted A comprehensive knowledgebase on thetypes chromosomal locations descriptionon the biological functions and diseaseassociations of lncRNAs
Human httpgenomeigibresinlncRNome
Co-LncRNA [51] Predicted A computationally predicted database toidentify GO annotations and KEGG path-ways affected by co-expressed protein-cod-ing genes of a single or multiple lncRNAs
Human httpwwwbio-bigdatacomCo-LncRNA
LncRNA2Function [52] Predicted A comprehensive resource of investigatingthe functions of lncRNAs based on co-expressed lncRNAndashmRNA interactions
Human httpmlghiteducnlncrna2function
lncRNAMap [53] Predicted An integrated and comprehensive databaseof regulatory functions of lncRNAs and act-ing as ceRNAs
Human httplncRNAMapmbcnctuedutw
6 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
mRNA We use the absolute value of the causal effect(AVCE) to evaluate the strength of the regulation of thelncRNA on the mRNA and a higher AVCE indicates a stron-ger lncRNA regulation The lncRNAndashmRNA pairs with highAVCEs in each module are considered as module-specificlncRNAndashmRNA causal regulatory relationships and we calleach module with these relationships identified a module-specific causal regulatory network
iii Identification of global lncRNAndashmRNA causal regulatorynetwork We integrate the module-specific lncRNAndashmRNA
causal regulatory networks to form the global lncRNAndashmRNA causal regulatory network
Identification of lncRNAndashmRNA co-expression modules
In systems biology WGCNA [21] is a popular method for findingthe correlation patterns among genes across samples and canbe used to identify clusters or modules of highly co-expressedgenes Therefore we use WGCNA to first infer lncRNAndashmRNAco-expression modules
Figure 2 The pipeline of MSLCRN First WGCNA is used to identify lncRNAndashmRNA co-expression modules from matched lncRNA and mRNA expression data Second
we infer lncRNAndashmRNA causal regulatory relationships in each module by using parallel IDA method For each module we assemble the identified lncRNAndashmRNA reg-
ulatory relationships to obtain a module-specific lncRNAndashmRNA causal regulatory network Third the module-specific lncRNAndashmRNA causal regulatory networks are
integrated to form a global lncRNAndashmRNA causal regulatory network
Module-specific lncRNA-mRNA causal regulatory networks | 7
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
Specifically the matched lncRNA and mRNA expressiondata are used as the input of WGCNA For each pair of genesi and j the gene co-expression similarity sij of the pair is definedas
sij frac14 jcorethi jTHORNj (1)
where jcor(i j)j is the absolute value of the Pearson correlationbetween genes i and j The gene co-expression similarity matrixis denoted by Sfrac14 [sij]
To pick an appropriate soft-thresholding power for trans-forming the similarity matrix S into an adjacency matrix A weuse the scale-free topology criterion for soft-thresholding andthe minimum scale free topology fitting index R2 is set as 09Then the topological overlap matrix (TOM) Wfrac14 [wij] is gener-ated based on the adjacency matrix Afrac14 [aij] The TOM similaritywij between genes i and j is defined
wij frac14P
uaiuauj thorn aij
minfP
uaiuP
uaujg thorn 1 aij(2)
where u denotes all genes of the matched lncRNA and mRNAexpression data The TOM dissimilarity between genes i and j isdenoted by dijfrac14 1 - wij To identify gene co-expression modulesthe TOM dissimilarity matrix Dfrac14 [dij] is clustered using optimalhierarchical clustering method [54] Here the identified geneco-expression modules are groups of lncRNAs and mRNAs withhigh topological overlap The lncRNAs and mRNAs of eachlncRNAndashmRNA co-expression module are considered for possi-ble lncRNAndashmRNA causal relationships in the next step
Identification of module-specific lncRNAndashmRNA causalregulatory networks
After the identification of lncRNAndashmRNA co-expression mod-ules we use the parallel IDA method [24] to estimate causaleffects of possible lncRNAndashmRNA causal pairs in each moduleThe application of parallel IDA method to matched lncRNAand mRNA expression data for estimating causal effectsincludes two steps (i) learning the causal structure fromexpression data using the parallel-PC algorithm [24] and(ii) estimating the causal effects of lncRNAs on mRNAs byapplying do-calculus [55]
In step (i) Vfrac14 L1 Lm T1 Tn is a set of random varia-bles denoting m lncRNAs and n mRNAs The causal structure isin the form of a DAG where a node denotes a lncRNA Li ormRNA Tj and an edge between two nodes represents a causalrelationship between them We use the parallel-PC algorithm aparallel version of the PC algorithm [56] to learn the causalstructures (the DAGs) from expression data Starting with a fullyconnected undirected graph the parallel-PC algorithm deter-mines if an edge is retained or removed in the graph by con-ducting conditional independence tests in parallel Then to geta DAG the directions of edges in the obtained graph are ori-ented As different DAGs may represent the same conditionalindependence the parallel-PC algorithm uses a completed par-tially directed acyclic graph (CPDAG) to uniquely describe anequivalence class of DAGs In this work we use the R-packageParallelPC [57] to implement the parallel-PC algorithm and setthe significant level of the conditional independence testsafrac14 001
In step (ii) we are only interested in estimating the causaleffect of the directed edge Li Tj where vertex is Li a parent ofvertex Tj As described above a CPDAG may generate a class ofDAGs For the causal effect of Li Tj in a CPDAG we use do-calculus [55] to estimate the causal effects of Li on Tj in a class ofDAGs Then we use the minimum absolute value of all possiblecausal effects as a final causal effect of Li Tj As for the detailsof how the parallel IDA method is applied to estimate causalrelationships from expression data the readers can refer to [24]
The estimated causal effects can be positive or negativereflecting the up or down regulation by the lncRNAs on themRNAs For the purpose of constructing the regulatory net-works we use the absolute values of the causal effects (AVCEs)to evaluate the strengths of the regulation and thus to confirmthe regulatory relationships
We set different AVCE cutoffs from 010 to 060 with a step of005 to generate MSLCRN networks in GBM LSCC OvCa andPrCa respectively For each cutoff we merge the identifiedMSLCRN networks to obtain global lncRNAndashmRNA causal regu-latory networks in the four human cancers respectively Asshown in Table 3 a higher cutoff selection causes a smallerglobal lncRNAndashmRNA causal regulatory network but bettergoodness of fit To make a trade-off between the size of theglobal lncRNAndashmRNA causal regulatory networks and goodnessof fit we set a compromised AVCE cutoff with a value of 045 Ifthe AVCE of a lncRNA on a mRNA is 045 or above we considerthere is a causal regulatory relationship between the lncRNAndashmRNA pair Under the compromise cutoff we have a moderatesize of the global lncRNAndashmRNA causal regulatory networks inGBM LSCC OvCa and PrCa Meanwhile the node degree distri-butions of four global lncRNAndashmRNA causal regulatory net-works also follow power law distribution (the fitted power curveis in the form of yfrac14 axb) well with R2gt 08
Validation survival and enrichment analysis
Previous studies have demonstrated that about 20 of thenodes in a biological network are essential and are regarded ashub genes [58 59] Therefore when analyzing a global lncRNAndashmRNA causal network we select the 20 of lncRNAs with thehighest degrees in the network as hub lncRNAs The degree of alncRNA node in the global network is the number of mRNAsconnected with it
To validate the predicted module-specific lncRNAndashmRNAcausal regulatory relationships we obtain the experimentallyvalidated lncRNAndashmRNA regulatory relationships from thethree widely used databases NPInter v30 [40] LncRNADiseasev2017 [41] and LncRNA2Target v12 [42] Furthermore we retainexperimentally validated lncRNAndashmRNA regulatory relation-ships associated with the four human cancer data sets asground truth
We perform survival analysis using the R-package survival[60] A multivariate Cox model is used to predict the risk scoreof each tumor sample Then all tumor samples in each cancerdata set are equally divided into high- and low-risk groupsaccording to their risk scores Moreover we calculate theHazard Ratio between the high- and the low-risk groups andperform the Log-rank test
To further investigate the underlying biological processesand pathways related to each of the MSLCRN networks we usethe R-package clusterProfiler [61] to conduct functional enrich-ment analysis on the networks respectively The GeneOntology (GO) [62] biological processes and Kyoto Encyclopedia
8 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
of Genes and Genomes (KEGG) [63] pathways with adjustedP-valuelt005 [adjusted by Benjamini-Hochberg (BH) method]are regarded as functional categories for the MSLCRN networks
We also collect a list of lncRNAs and mRNAs that areassociated with GBM LSCC OvCa and PrCa to study diseaseenrichment of each of the MSLCRN networks The list of disease-associated lncRNAs is obtained from LncRNADisease v2017 [41]Lnc2Cancer v2016 [64] and MNDR v20 [65] The list of disease-associated mRNAs is from DisGeNET v50 [66] To evaluatewhether a MSLCRN network is significantly enriched in a specificdisease we use a hyper-geometric distribution test as follows
p frac14 1 FethxjBNMTHORN frac14 1Xx1
ifrac140
N
i
B N
M i
B
M
(3)
In the formula B is the number of all genes in the expressiondata set N denotes the number of all genes associated with aspecific disease in the expression data set M is the number ofgenes in a MSLCRN network and x is the number of genes asso-ciated with a specific disease in a MSLCRN network A MSLCRNnetwork is significantly enriched in a specific disease if theP-valuelt 005
Network analysis validation and comparisonon MSLCRN networkslncRNAs exhibit dynamic positive gene regulationacross cancers
By following the first step of the MSLCRN method we haveidentified 23 38 45 and 32 lncRNAndashmRNA co-expression mod-ules in GBM LSCC OvCa and PrCa respectively In the secondstep of the MSLCRN method we eliminate the noncausallncRNAndashmRNA pairs in lncRNAndashmRNA co-expression modulesAs a result we generate 23 38 45 and 32 module-specificlncRNAndashmRNA causal regulatory networks in GBM LSCC OvCaand PrCa respectively After merging the module-specificlncRNAndashmRNA causal regulatory networks for each data set weobtain the four global lncRNAndashmRNA regulatory networks inGBM LSCC OvCa and PrCa respectively
To understand the overlap and difference of module-specificgenes module-specific lncRNAndashmRNA causal regulatory rela-tionships and module-specific hub lncRNAs in the four humancancers we generate three set intersection plots using theR-package UpSetR [67] As shown in Figure 3 we find that themajority of module-specific genes (5752) module-specificlncRNAndashmRNA causal regulatory relationships (9902) andmodule-specific hub lncRNAs (8922) tend to be cancer-specific Only a small portion of module-specific genes (396) andmodule-specific lncRNAndashmRNA causal regulatory relationships(6) are shared by the four cancers Especially none of themodule-specific hub lncRNAs are common between the fourcancers In addition the causal effects are positive for 99569672 9993 and 7863 of the causal regulatory relationshipsidentified in GBM LSCC OvCa and PrCa respectively Theseresults indicate that lncRNAs are more likely to exhibit dynamicpositive gene regulation across cancers The results are alsoconsistent with the proposition that the positive gene regula-tion by lncRNAs would be desired in specific situations [68]
Differential network analysis uncovers cancer-specificlncRNAndashmRNA causal networks
In this section we focus on studying cancer-specific lncRNAndashmRNA causal networks using differential network analysisThus the GBM-specific LSCC-specific OvCa-specific and PrCa-specific lncRNAndashmRNA causal networks are identified Asshown in Figure 4A the distributions of node degrees in thesefour cancer-specific lncRNAndashmRNA causal networks followpower law distributions well with R2frac14 09774 09923 09723and 08310 respectively Thus these four cancer-specificlncRNAndashmRNA causal networks are scale free indicating thatmost mRNAs are regulated by a small number of lncRNAs
Table 3 Degree distributions of global lncRNAndashmRNA causal regula-tory networks with different cutoffs in GBM LSCC OvCa and PrCa
Datasets Cutoffs Number of causalregulations
yfrac14axb R2
GBM 010 11 847 yfrac142274x06893 04161015 10 924 yfrac142495x07275 05460020 9732 yfrac142745x0767 06475025 8461 yfrac142958x08074 06757030 7176 yfrac143194x08319 06807035 6041 yfrac143363x08703 07203040 4997 yfrac143741x09348 07999045 4074 y54082x21034 08694050 3279 yfrac144194x118 09244055 2583 yfrac143896x1259 09463060 1862 yfrac143666x143 09792
LSCC 010 789 172 yfrac143143x06071 04829015 684 524 yfrac143475x06323 05841020 569 369 yfrac143905x06525 06578025 451 346 yfrac144855x06928 07789030 340 860 yfrac146341x07554 08796035 244 547 yfrac148147x08379 09504040 166 593 yfrac149724x0935 09848045 108 024 y51031x21018 09933050 66 335 yfrac149425x1068 09963055 37 632 yfrac147807x1089 09948060 19 547 yfrac146565x1169 09972
OvCa 010 333 146 yfrac143272x05928 05042015 232 794 yfrac144192x06262 06531020 159 872 yfrac146398x07216 08247025 112 792 yfrac148816x08356 09120030 80 808 yfrac141008x09472 09551035 57 099 yfrac149545x1014 09744040 38 517 yfrac148198x1066 09748045 24 439 y56575x21066 09697050 14 435 yfrac14540x1079 09551055 7973 yfrac144368x1107 09319060 4026 yfrac143285x1107 09460
PrCa 010 1 894 322 yfrac143089x06245 02750015 1 749 595 yfrac143586x06787 03582020 1 594 744 yfrac144013x07169 04316025 1 429 858 yfrac144271x0732 04919030 1 260 968 yfrac144389x07244 05616035 1 097 654 yfrac144406x0702 06470040 946 439 yfrac144485x06816 07338045 812 687 y55175x207005 08206050 694 558 yfrac14667x07588 08823055 584 834 yfrac148833x08469 09332060 474 654 yfrac141113x09684 09503
Note The AVCE cutoffs range from 010 to 060 with a step of 005
The bold values are the degree distributions of global lncRNA-mRNA causal reg-
ulatory networks with a compromised AVCE cutoff (045) in four human cancers
Module-specific lncRNA-mRNA causal regulatory networks | 9
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
Next we use four lists of lncRNAs and mRNAs associatedwith GBM LSCC OvCa and PrCa to discover lncRNAndashmRNAcausal networks that are associated with the four human can-cers We define that cancer-related lncRNAndashmRNA causal regu-latory relationships are those in which at least one regulatoryparty is cancer-related lncRNA or mRNA As a result wehave extracted GBM-related LSCC-related OvCa-related andPrCa-related lncRNAndashmRNA causal networks from the fourcancer-specific lncRNAndashmRNA causal networks (details inSupplementary File S1) To understand the potential biologicalprocesses and pathways of the four cancer-related lncRNAndashmRNA causal networks we identify significant GO biologicalprocesses and KEGG pathways using functional enrichmentanalysis In Figure 4B several top GO biological processes and
KEGG pathways such as cytokine activity [69] G-proteincoupled receptor binding [70] TNF signaling pathway [71] cAMPsignaling pathway [72] pathways in cancer are closely associ-ated with the occurrence and development of cancer Thisresult suggests that the identified cancer-related lncRNAndashmRNA causal networks may be involved in the occurrence anddevelopment of human cancer
Conservative network analysis highlights a corelncRNAndashmRNA causal regulatory network acrosshuman cancers
Although most of the lncRNAndashmRNA causal regulatory relation-ships are cancer-specific there are still a number of common
396 424
106 115
1370
133 173
1599
126
1729
1224
252
2523
2157
5081
0
2000
4000
Inte
rsec
tion
Siz
e
PrCa
Ov Ca
LSCC
GBM
025
0050
0075
00
1000
0
Set Size
6 283
13 1 80 673
206
3969
76 3493
407
2816
9950
7
1948
7
0
250000
500000
750000
Inte
rsec
tion
Siz
e
PrCa
Ov Ca
LSCC
GBM
0e+0
0
2e+0
5
4e+0
5
6e+0
5
8e+0
5
Set Size
6 1 2 933
2
41
6 11
246
47
524
0
200
400In
ters
ectio
n S
ize
PrCa
Ov Ca
LSCC
GBM
0200
400
600
Set Size
Module-specific genes
Module-specific causal regulations Module-specific hub lncRNAs 808611
A
B C
Figure 3 Overlap and difference of module-specific genes module-specific causal regulations and module-specific hub lncRNAs across GBM LSCC OvCa and PrCa
(A) Module-specific genes (both lncRNAs and mRNAs) intersection plot (B) Module-specific causal regulations intersection plot (C) Module-specific hub lncRNAs inter-
section plot The red lines denote common genes and causal regulations across GBM LSCC OvCa and PrCa
10 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
causal regulatory relationships between the four global net-works To evaluate whether there is a common core of lncRNAndashmRNA causal regulatory relationships in the global regulatorynetworks across human cancers we concentrate on the con-served lncRNAndashmRNA causal regulatory relationships thatexisted in at least three human cancers
As shown in Figure 5A the majority of the conservedlncRNAndashmRNA causal regulatory relationships form a closely
connected community This finding indicates that the con-served lncRNAndashmRNA causal regulatory network may be a corenetwork across human cancers
The survival analysis shows that the lncRNAs and mRNAs inthe core network can significantly distinguish the metastasisrisks between the high- and low-risk groups in GBM OvCa andPrCa data sets (Figure 5B) This result suggests that the core net-work may act as a common network biomarker of GBM OvCa
Cancer-specific networks Causal regulations y=axb R2
GBM-specific 2816 y=4812x-1292 09774
LSCC-specific 99507 y=1034x-1014 09923
OvCa-specific 19487 y=7366x-1156 09723
PrCa-specific 808611 y=5243x-07055 08310
1 10 100 11001
10
100
1100
Degree of genes
Nu
mb
er o
f ge
nes
GBM-specific fitting curveLSCC-specific fitting curveOvCa-specific fitting curvePrCa-specific fitting curveGBM-specific degree distributionLSCC-specific degree distributionOvCa-specific degree distributionPrCa-specific degree distribution
solutecation symporter activitysymporter activity
gated channel activitysodium ion transmembrane transporter activity
ion channel activitysubstrate-specific channel activity
channel activitypassive transmembrane transporter activity
growth factor activitycation channel activity
metal ion transmembrane transporter activitycollagen bindingheparin binding
sulfur compound bindingintegrin binding
glycosaminoglycan bindingextracellular matrix binding
peptide receptor activityG-protein coupled peptide receptor activity
chemokine bindingG-protein coupled receptor binding
growth factor bindingcytokine binding
serine-type endopeptidase activitydeath receptor activity
tumor necrosis factor-activated receptor activitycytokine receptor activity
protein heterodimerization activitydipeptidase activity
glycoprotein bindingRAGE receptor binding
cytokine receptor bindingcytokine activity
GBM(203)
LSCC(1015)
OvCa(479)
PrCa(3019)
001
002
003
004
padjust
GeneRatio002
004
006
008
Taste transduction
ECM-receptor interaction
cAMP signaling pathway
Calcium signaling pathway
Neuroactive ligand-receptor interaction
PI3K-Akt signaling pathway
Pathways in cancerRegulation of actin cytoskeleton
Complement and coagulation cascades
AGE-RAGE signaling pathway in diabetic complications
Hematopoietic cell lineage
Th17 cell differentiation
Inflammatory bowel disease (IBD)
Malaria
Osteoclast differentiation
Influenza A
Tuberculosis
Intestinal immune network for IgA production
Chagas disease (American trypanosomiasis)
Leishmaniasis
TNF signaling pathway
Toll-like receptor signaling pathway
Rheumatoid arthritis
Cytokine-cytokine receptor interaction
GBM(129)
LSCC(500)
OvCa(266)
PrCa(1364)
GeneRatio
005
010
015
001
002
003
004padjust
GO enrichment analysis KEGG enrichment analysis
A
B
Figure 4 Differential network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) Degree distribution of cancer-specific lncRNAndashmRNA
causal networks in GBM LSCC OvCa and PrCa (B) Functional enrichment analysis of cancer-related lncRNAndashmRNA causal networks in GBM LSCC OvCa and PrCa
Module-specific lncRNA-mRNA causal regulatory networks | 11
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
and PrCa In Figure 5B we also find that the core network con-tains several cancer genes (34 26 30 and 38 cancer genes asso-ciated with GBM LSCC OvCa and PrCa respectively)
By conducting GO and KEGG enrichment analysis we findthat the core network is significantly enriched in 399 GO biologi-cal processes and 3 KEGG pathways (details in SupplementaryFile S2) Of the 399 GO biological processes 2 GO terms includ-ing negative regulation of cell adhesion (GO 0007162) and cyto-kine production in immune response (GO 0002367) areinvolved in three cancer hallmarks Tissue Invasion andMetastasis Tumor Promoting Inflammation and EvadingImmune Detection [73] This observation implies that the corenetwork may control these cancer-related hallmarks
Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers
We divide the hub lncRNAs into two categories (1) conserved hublncRNAs which exist in at least three human cancers and (2)cancer-specific hub lncRNAs which only exist in single humancancer As a result we obtain 9 conserved hub lncRNAs and 828cancer-specific hub lncRNAs (include 11 GBM-specific 246 LSCC-specific 47 OvCa-specific and 524 PrCa-specific hub lncRNAs)
To evaluate whether the hub lncRNAs can distinguish meta-stasis risks of human cancers we use them to predict metasta-sis risks for tumor samples in GBM LSCC OvCa and PrCaAs shown in Figure 6A the conserved hub lncRNAs can discrim-inate the metastasis risks of tumor samples significantly(Log-rank P-valuelt 005) in four human cancers In Figure 6Bexcepting LSCC-specific hub lncRNAs owing to failing to fit aCox regression model GBM-specific OvCa-specific and PrCa-specific hub lncRNAs can discriminate the metastasis risks oftumor samples significantly in GBM OvCa and PrCa respec-tively (Log-rank P-valuelt 005) These results suggest that thehub lncRNAs are discriminative and can act as biomarkers todistinguish between high- and low-risk tumor samples
Experimentally validated lncRNAndashmRNA regulations aremostly bad hits for LncTar
Using a collection of experimentally validated lncRNAndashmRNAregulatory relationships (details in Supplementary File S3) asthe ground truth the numbers of experimentally confirmedlncRNAndashmRNA causal regulations are 17 14 20 and 42 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4)
Figure 5 Conservative network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) The core lncRNAndashmRNA causal network that
occurred in at least three human cancers The red diamond nodes and white circle nodes denote lncRNAs and mRNAs respectively (B) Survival analysis of the core
lncRNAndashmRNA causal network
12 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
We further apply a representative sequence-based methodcalled LncTar [11] to the experimentally validated lncRNAndashmRNAcausal regulatory relationships discovered by MSLCRN There aretwo main reasons for choosing LncTar First LncTar does nothave a limit to input RNA size Second LncTar uses a quantitativestandard rather than expert knowledge to determine whetherlncRNAs interact with mRNAs Similar to LncTar we also set -01as normalized binding free energy (ndG) cutoff to determinewhether lncRNAndashmRNA pairs interact with each other In otherwords the lncRNAndashmRNA pairs with ndG01 are regarded as
lncRNAndashmRNA regulatory relationships Among the experimen-tally confirmed lncRNAndashmRNA causal regulatory relationships thatare discovered by MSLCRN the numbers of successfully predictedlncRNAndashmRNA regulations using LncTar are 0 0 1 and 1 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4) The result indicates that our experimentally confirmedlncRNAndashmRNA causal regulations are mostly bad hits for LncTarMeanwhile this result also suggests that expression-based andsequence-based methods may be complementary with each otherin predicting lncRNAndashmRNA regulations
A
B
Figure 6 Survival analysis of hub lncRNAs (A) Conserved hub lncRNAs in GBM LSCC OvCa and PrCa datasets (B) Survival analysis of cancer-specific hub lncRNAs
Module-specific lncRNA-mRNA causal regulatory networks | 13
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
MSLCRN networks are biologically meaningful
In this section we conduct GO and KEGG enrichment analysisto check whether the MSLCRN networks are associated withsome biological processes and pathways significantlyEnrichment analysis uncovers that 15 of the 23 (6522)MSLCRN networks in GBM 29 of the 38 (7632) MSLCRN net-works in LSCC 30 of the 45 (6667) MSLCRN networks inOvCa and 20 of the 32 (6250) MSLCRN networks in PrCa aresignificantly enriched in at least one GO biological process orKEGG pathway respectively (details in Supplementary File S5)This result implies that most of the MSLCRN networks in eachcancer are functional networks
We further investigate whether the MSLCRN networks aresignificantly enriched in GBM LSCC OvCa and PrCa diseasesrespectively We discover that 5 of the 23 MSLCRN networks7 of the 38 MSLCRN networks 6 of the 45 MSLCRN networks and6 of the 32 MSLCRN networks are significantly enriched in GBMLSCC OvCa and PrCa diseases respectively (details inSupplementary File S5) This result indicates that severalMSLCRN networks are closely associated with GBM LSCC OvCaand PrCa diseases
Altogether functional and disease enrichment analysis resultsshow that MSLCRN networks are biologically meaningful
Comparison with other PC-based networkinference methods
Based on a parallel version of the PC algorithm [56] the parallelIDA method in the second step of MSLCRN learns the causalstructure from expression data Owing to the popularity of thePC algorithm in causal structure learning some other networkinference methods including PCA-CMI [74] PCA-PMI [75] andCMI2NI [76] have also successfully applied it for network infer-ence Different from the three methods using conditional orpartial mutual information to infer lncRNAndashmRNA regulationsour method estimates causal effects to identify lncRNAndashmRNAregulations For comparisons we also use the PCA-CMI PCA-PMI and CMI2NI methods to infer module-specific lncRNAndashmRNA regulatory relationships Similar to our method (whichuses the parallel IDA method) the strength cutoff of lncRNAndashmRNA regulatory relationships in PCA-CMI PCA-PMI andCMI2NI methods is also set to 045
We evaluate the performance of each method in terms offinding experimentally validated lncRNAndashmRNA regulatoryrelationships functional MSLCRN networks and disease-associated MSLCRN networks As shown in Table 4 in terms ofthe three criteria MSLCRN performs the best in GBM LSCCOvCa and PrCa data sets This result suggests that MSLCRN is auseful method to infer module-specific lncRNAndashmRNA regula-tory network in human cancers
Conclusions and discussion
Notwithstanding lncRNAs do not encode proteins directly theyengage in a wide range of biological processes including cancerdevelopments through their interactions with other biologicalmacromolecules eg DNA RNA and protein Therefore touncover the functions and regulatory mechanisms of lncRNAsit is necessary to investigate lncRNAndashtarget regulatory networkacross different types of biological conditions
As a biological network the lncRNAndashtarget regulatory net-work exhibits a high degree of modularity Each functionalmodule is responsible for implementing specific biological
functions Moreover modularity is an important feature ofhuman cancer development and progression Thus from a net-work community point of view it is necessary to investigatemodule-specific lncRNAndashmRNA regulatory networks
Until now several statistical correlation or associationmeasures eg Pearson Mutual Information and ConditionalMutual Information have been used to infer gene regulatorynetworks However these methods tend to identify indirect reg-ulatory relationships between genes The identified gene regu-latory networks cannot reflect real lsquocausalrsquo regulatoryrelationships To better understand lncRNA regulatory mecha-nism it is vital to investigate how lncRNAs causally influencethe expression levels of their target mRNAs
In this work the computational methods for inferringlncRNAndashmRNA interactions and the publicly available data-bases of lncRNAndashmRNA regulatory relationships are firstreviewed Then to address the above two issues we propose anovel computational method MSLCRN to study module-specific lncRNAndashmRNA causal regulatory networks across GBMLSCC OvCa and PrCa diseases In contrast to other approaches(expression-based and sequence-based methods) MSLCRN hastwo unique features First MSLCRN considers the modularity oflncRNAndashmRNA regulatory networks Instead of studying globalregulatory relationships between lncRNAs and mRNAs wefocus on investigating the regulatory behavior of lncRNAs in themodules of interest Second considering the restrictions withconducting gene knockout experiments MSLCRN uses thecausal inference method IDA to infer causal relationshipsbetween lncRNAs and mRNAs based on expression data Thepromising results suggest that exploiting modularity of generegulatory network and causality-based method could provideanother effective approach to elucidating lncRNA functions andregulatory mechanisms of human cancers
Despite the advantages of MSLCRN there is still room toimprove it First the WGCNA method only allows clusteringgenes across all samples from the matched lncRNA and mRNAexpression data In fact a class of genes may exhibit similarexpression patterns across a subset of samples An alternativesolution of this problem is to use a bi-clustering method to iden-tify lncRNAndashmRNA co-expression modules Second it is stilltime-consuming to estimate causal effects from large expres-sion data sets When constructing the module-specific lncRNAndashmRNA causal regulatory networks the running time of parallelIDA is still high on estimating the causal effects of lncRNAs onmRNAs In future more efficient parallel IDA method is neededto explore lncRNAndashmRNA causal regulatory relationships inlarge-scale expression data Third previous research [38]has shown that the prediction accuracy of lncRNAndashmRNA inter-actions can be improved by integrating both sequence data and
Table 4 Comparison results in terms of experimentally validatedlncRNAndashmRNA regulatory relationships functional MSLCRN net-works and disease-associated MSLCRN networks
Methods GBM (a b c) LSCC (a b c) OvCa (a b c) PrCa (a b c)
MSLCRN (17 15 5) (14 29 7) (20 30 6) (42 20 6)PCA-CMI (2 13 0) (0 11 0) (0 7 1) (0 20 2)PCA-PMI (2 15 1) (0 11 0) (0 8 2) (1 18 1)CMI2NI (2 15 0) (0 11 0) (0 7 1) (0 19 1)
Note afrac14number of experimentally validated lncRNAndashmRNA regulatory relation-
ships bfrac14number of functional MSLCRN networks cfrac14number of disease-asso-
ciated MSLCRN networks
14 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
expression data To improve the accuracy of the predictedlncRNAndashmRNA regulatory relationships it is necessary todevelop an ensemble method (fusing sequence-based andexpression-based methods) to infer lncRNAndashmRNA regulatorynetwork Finally recent studies [77] show that lncRNAs can actas competing endogenous RNAs (ceRNAs) or miRNA sponges toattract miRNAs for bindings by competing with mRNAsTherefore some predicted lncRNAndashmRNA regulatory relation-ships are lncRNA-related ceRNAndashceRNA interactions To furtherimprove the prediction of lncRNAndashmRNA regulatory relation-ships it is necessary to remove the crosstalk relationshipsbetween lncRNAs and mRNAs
Key Points
bull Among ncRNAs lncRNAs are a large and diverse classof RNA molecules and are thought to be a gold mine ofpotential oncogenes anti-oncogenes and newbiomarkers
bull lncRNAs exhibit dynamic positive gene regulationacross human cancers
bull Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers
bull There is still a lack of ground truth for validating pre-dicted lncRNAndashmRNA regulatory relationships
bull There is still room to develop reliable methods for elu-cidating lncRNA regulatory mechanisms
Supplementary Data
Supplementary data are available online at httpsacademicoupcombib
Funding
The National Natural Science Foundation of China (No61702069) the Applied Basic Research Foundation ofScience and Technology of Yunnan Province (No2017FB099) the NHMRC Grant (No 1123042) and theAustralian Research Council Discovery Grant (NoDP140103617)
References1 Pang KC Frith MC Mattick JS Rapid evolution of noncoding
RNAs lack of conservation does not mean lack of functionTrends Genet 200622(1)1ndash5
2 Kung JT Colognori D Lee JT Long noncoding RNAs pastpresent and future Genetics 2013193(3)651ndash69
3 Schmitt AM Chang HY Long noncoding RNAs in cancer path-ways Cancer Cell 201629(4)452ndash63
4 Zhang Y Tao Y Liao Q Long noncoding RNA a crosslink inbiological regulatory network Brief Bioinform 2017 doi 101093bibbbx042
5 Yoon JH Abdelmohsen K Gorospe M Posttranscriptionalgene regulation by long noncoding RNA J Mol Biol 2013425(19)3723ndash30
6 Gerlach W Giegerich R GUUGle a utility for fast exact match-ing under RNA complementary rules including G-U base pair-ing Bioinformatics 200622(6)762ndash4
7 Muckstein U Tafer H Hackermuller J et al Thermodynamicsof RNA-RNA binding Bioinformatics 200622(10)1177ndash82
8 Tafer H Hofacker IL RNAplex a fast tool for RNA-RNA inter-action search Bioinformatics 200824(22)2657ndash63
9 Busch A Richter AS Backofen R IntaRNA efficient predictionof bacterial sRNA targets incorporating target site accessibil-ity and seed regions Bioinformatics 200824(24)2849ndash56
10Kato Y Sato K Hamada M et al RactIP fast and accurate pre-diction of RNA-RNA interaction using integer programmingBioinformatics 201026(18)i460ndash6
11Li J Ma W Zeng P et al LncTar a tool for predicting the RNAtargets of long noncoding RNAs Brief Bioinform 201516(5)806ndash12
12Fukunaga T Hamada M RIblast an ultrafast RNA-RNA inter-action prediction system based on a seed-and-extensionapproach Bioinformatics 201733(17)2666ndash74
13Derrien T Johnson R Bussotti G et al The GENCODE v7 cata-log of human long noncoding RNAs analysis of their genestructure evolution and expression Genome Res 201222(9)1775ndash89
14Gloss BS Dinger ME The specificity of long noncoding RNAexpression Biochim Biophys Acta 20161859(1)16ndash22
15Munshi A Mohan V Ahuja YR Non-coding RNAs a dynamicand complex network of gene regulation J PharmacogenomicsPharmacoproteomics 20167156
16Liao Q Liu C Yuan X et al Large-scale prediction of long non-coding RNA functions in a coding-non-coding gene co-expression network Nucleic Acids Res 201139(9)3864ndash78
17Guo Q Cheng Y Liang T et al Comprehensive analysis oflncRNA-mRNA co-expression patterns identifies immune-associated lncRNA biomarkers in ovarian cancer malignantprogression Sci Rep 20155(1)17683
18Du Y Xia W Zhang J et al Comprehensive analysis of longnoncoding RNA-mRNA co-expression patterns in thyroidcancer Mol Biosyst 201713(10)2107ndash15
19Wu W Wagner EK Hao Y et al Tissue-specific co-expressionof long non-coding and coding RNAs associated with breastCancer Sci Rep 2016632731
20Barabasi AL Oltvai ZN Network biology understanding thecellrsquos functional organization Nat Rev Genet 20045(2)101ndash13
21Langfelder P Horvath S WGCNA an R package for weightedcorrelation network analysis BMC Bioinformatics 20089559
22Maathuis HM Kalisch M Buhlmann P Estimating high-dimensional intervention effects from observational dataAnn Stat 200937(6A)3133ndash64
23Maathuis HM Colombo D Kalisch M et al Predicting causaleffects in large-scale systems from observational data NatMethods 20107(4)247ndash8
24Le T Hoang T Li J et al A fast PC algorithm for high dimen-sional causal discovery with multi-core PCs IEEEACM TransComput Biol Bioinform 2016 doi 101109TCBB20162591526
25Du Z Fei T Verhaak RG et al Integrative genomic analysesreveal clinically relevant long noncoding RNAs in humancancer Nat Struct Mol Biol 201320(7)908ndash13
26Bernhart SH Tafer H Muckstein U et al Partition functionand base pairing probabilities of RNA heterodimersAlgorithms Mol Biol 20061(1)3
27Alkan C Karakoc E Nadeau JH et al RNA-RNA interactionprediction and antisense RNA target search J Comput Biol200613(2)267ndash82
28Seemann SE Richter AS Gesell T et al PETcofold predictingconserved interactions and structures of two multiple align-ments of RNA sequences Bioinformatics 201127(2)211ndash19
29Wenzel A Akbasli E Gorodkin J RIsearch fast RNA-RNAinteraction search using a simplified nearest-neighborenergy model Bioinformatics 201228(21)2738ndash46
Module-specific lncRNA-mRNA causal regulatory networks | 15
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
30Alkan F Wenzel A Palasca O et al RIsearch2 suffix array-based large-scale prediction of RNA-RNA interactions andsiRNA off-targets Nucleic Acids Res 201745e60
31Hu R Sun X lncRNATargets a platform for lncRNA target pre-diction based on nucleic acid thermodynamics J BioinformComput Biol 201614(4)1650016
32Terai G Iwakiri J Kameda T et al Comprehensive predictionof lncRNA-RNA interactions in human transcriptome BMCGenomics 201617(Suppl 1)12
33Liu J Wu S Li M et al LncRNA expression profiles reveal theco-expression network in human colorectal carcinoma Int JClin Exp Pathol 201691885ndash1892
34Huang S Feng C Chen L et al Identification of potential keylong non-coding RNAs and target genes associated withpneumonia using long non-coding RNA sequencing (lncRNA-Seq) a preliminary study Med Sci Monit 2016223394ndash408
35Li J Xu Y Xu J et al Dynamic co-expression network analysisof lncRNAs and mRNAs associated with venous congestionMol Med Rep 201614(3)2045ndash51
36Fu M Huang G Zhang Z et al Expression profile of long non-coding RNAs in cartilage from knee osteoarthritis patientsOsteoarthritis Cartilage 201523(3)423ndash32
37Zhang F Gao C Ma XF et al Expression profile of long non-coding RNAs in peripheral blood mononuclear cells frommultiple sclerosis patients CNS Neurosci Ther 201622(4)298ndash305
38 Iwakiri J Terai G Hamada M Computational prediction oflncRNA-mRNA interactionsby integrating tissue specificity inhuman transcriptome Biol Direct 201712(1)15
39Lv L Wei M Lin P et al Integrated mRNA and lncRNA expres-sion profiling for exploring metastatic biomarkers of humanintrahepatic cholangiocarcinoma Am J Cancer Res 20177688ndash99
40Hao Y Wu W Li H et al NPInter v30 an upgraded databaseof noncoding RNA-associated interactions Database 20162016baw057
41Chen G Wang Z Wang D et al LncRNADisease a databasefor long-non-coding RNA-associated diseases Nucleic AcidsRes 201341D983ndash6
42 Jiang Q Wang J Wu X et al LncRNA2Target a database fordifferentially expressed genes after lncRNA knockdown oroverexpression Nucleic Acids Res 201543D193ndash6
43Zhou Z Shen Y Khan MR et al LncReg a reference resourcefor lncRNA-associated regulatory networks Database 20152015bav083
44Denisenko E Ho D Tamgue O et al IRNdb the database ofimmunologically relevant non-coding RNAs Database 20162016baw138
45Liu CJ Gao C Ma Z et al lncRInter a database of experimen-tally validated long non-coding RNA interaction J GenetGenomics 201744(5)265ndash8
46Li JH Liu S Zhou H et al starBase v20 decoding miRNA-ceRNA miRNA-ncRNA and protein-RNA interaction net-works from large-scale CLIP-Seq data Nucleic Acids Res 201442(D1)D92ndash7
47Liu Y Zhao M lnCaNet pan-cancer co-expression networkfor human lncRNA and cancer genes Bioinformatics 201632(10)1595ndash7
48Zhou QZ Zhang B Yu QY et al BmncRNAdb a comprehen-sive database of non-coding RNAs in the silkworm Bombyxmori BMC Bioinformatics 201617(1)370
49Park C Yu N Choi I et al lncRNAtor a comprehensiveresource for functional investigation of long non-codingRNAs Bioinformatics 201430(17)2480ndash5
50Bhartiya D Pal K Ghosh S et al lncRNome a comprehensiveknowledgebase of human long noncoding RNAs Database20132013bat034
51Zhao Z Bai J Wu A et al Co-LncRNA investigating thelncRNA combinatorial effects in GO annotations and KEGGpathways based on human RNA-Seq data Database 20152015bav082
52 Jiang Q Ma R Wang J et al LncRNA2Function a compre-hensive resource for functional investigation of humanlncRNAs based on RNA-seq data BMC Genomics 201516(Suppl 3)S2
53Chan WL Huang HD Chang JG lncRNAMap a map of puta-tive regulatory functions in the long non-coding transcrip-tome Comput Biol Chem 20145041ndash9
54Langfelder P Horvath S Fast R functions for robust correla-tions and hierarchical clustering J Stat Softw 2012461ndash17
55 Judea P Causality Models Reasoning and Inference New YorkNY Cambridge University Press 2000
56Spirtes P Glymour C Scheines R Causation Prediction andSearch 2nd edn Cambridge MIT Press 2000
57Le T Hoang T Li J et al ParallelPC an R package for efficientconstraint based causal exploration arXiv prepring 2015arXiv151003042v1
58Hahn MW Kern AD Comparative genomics of centrality andessentiality in three eukaryotic protein-interaction networksMol Biol Evol 200522(4)803ndash6
59Song J Singh M Roth FP From hub proteins to hub modulesthe relationship between essentiality and centrality in theyeast interactome at different scales of organization PLoSComput Biol 20139(2)e1002910
60Therneau TM Grambsch PM Modeling Survival Data Extendingthe Cox Model New York Springer Press 2000
61Yu G Wang L-G Han Y He Q-Y clusterProfiler an R packagefor comparing biological themes among gene clusters OMICS201216(5)284ndash7
62Ashburner M Ball CA Blake JA et al Gene ontology tool forthe unification of biology Nat Genet 200025(1)25ndash9
63Kanehisa M Goto S KEGG Kyoto Encyclopedia of Genes andGenomes Nucleic Acids Res 200028(1)27ndash30
64Ning S Zhang J Wang P et al Lnc2Cancer a manually curateddatabase of experimentally supported lncRNAs associatedwith various human cancers Nucleic Acids Res 201644(D1)D980ndash5
65Wang Y Chen L Chen B et al Mammalian ncRNA-diseaserepository a global view of ncRNA-mediated disease net-work Cell Death Dis 20134e765
66Pi~nero J Bravo A Queralt-Rosinach N et al DisGeNET a com-prehensive platform integrating information on humandisease-associated genes and variants Nucleic Acids Res 201745(D1)D833ndash9
67Conway JR Lex A Gehlenborg N UpSetR an R package for thevisualization of intersecting sets and their propertiesBioinformatics 201733(18)2938ndash40
68Wahlestedt C Targeting long non-coding RNA to therapeuti-cally upregulate gene expression Nat Rev Drug Discov 201312(6)433ndash46
69Mantovani G Maccio A Lai P et al Cytokine activity incancer-related anorexiacachexia role of megestrol acetateand medroxyprogesterone acetate Semin Oncol 19982545ndash52
70Dorsam RT Gutkind JS G-protein-coupled receptors and can-cer Nat Rev Cancer 20077(2)79ndash94
71Wang X Lin Y Tumor necrosis factor and cancer buddies orfoes Acta Pharmacol Sin 200829(11)1275ndash88
16 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
72Fajardo AM Piazza GA Tinsley HN The role of cyclic nucleo-tide signaling pathways in cancer targets for prevention andtreatment Cancers 20146(1)436ndash58
73Hanahan D Weinberg RA Hallmarks of cancer the next gen-eration Cell 2011144(5)646ndash74
74Zhang X Zhao XM He K et al Inferring gene regulatory net-works from gene expression data by path consistencyalgorithm based on conditional mutual informationBioinformatics 201228(1)98ndash104
75Zhao J Zhou Y Zhang X et al Part mutual information forquantifying direct associations in networks Proc Natl Acad SciUSA 2016113(18)5130ndash5
76Zhang X Zhao J Hao JK et al Conditional mutual inclusiveinformation enables accurate quantification of associationsin gene regulatory networks Nucleic Acids Re 201543(5)e31
77Le TD Zhang J Liu L et al Computational methods for identi-fying miRNA sponge interactions Brief Bioinform 201718(4)577ndash90
Module-specific lncRNA-mRNA causal regulatory networks | 17
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018View publication statsView publication stats
Introduction
Long noncoding RNAs (lncRNAs) are non-protein coding tran-scripts with gt200 nucleotides in length Unlike small noncodingRNAs (sncRNAs) lncRNAs generally exhibit low sequence con-servation However owing to rapidly adaptive selection pres-sures the low conservation of lncRNAs (such as Air and Xist)does not indicate absence of function [1] Similar to microRNAs(miRNAs) an important class of sncRNAs evidence has shownthat lncRNAs play important roles in a wide range of biologicalprocesses even in cancers [2 3] Despite the importance oflncRNAs in many physiological and pathological processes alarge number of lncRNAs remain to be functionally character-ized For this reason the number of studies on lncRNA researchhas been increased exponentially in the past decade (as shownin Figure 1)
To achieve various biological functions lncRNAs form generegulatory networks by interacting with other biological mole-cules such as transcription factors miRNAs messenger RNAs(mRNAs) and RNA-binding proteins [4] Among these biologicalmolecules interacting with lncRNAs mRNAs are the most popu-lar ones By regulating the transcription and translation ofmRNAs lncRNAs could get involved in several vital biologicalprocesses such as cell differentiation cell proliferation andcytoprotective programs [5] Therefore the identification oflncRNAndashmRNA regulatory networks would help to uncoverfunctions and regulatory mechanisms of lncRNAs
A straightforward method for identifying lncRNAndashmRNAregulatory networks is sequence-based complementary basepairing To predict lncRNA targets several sequence-basedmethods such as GUUGle [6] RNAup [7] RNAplex [8] IntaRNA[9] RactIP [10] LncTar [11] and RIblast [12] have been devel-oped Owing to the long sequence and complex tertiary struc-ture of each lncRNA the computational costs of predictinglarge-scale lncRNAndashmRNA regulatory relationships are usuallyhigh Moreover these sequence-based methods only considerthe sequence information of lncRNAs and target mRNAs andthus the predicted lncRNAndashmRNA regulatory networks arestatic However previous studies [13ndash15] have shown thatlncRNAs exhibit condition-specific expression fashion anddynamic networks of gene regulation To identify dynamic orcondition-specific lncRNAndashmRNA regulatory networks it is nec-essary to use expression data Some expression-based methods[16ndash19] for predicting co-expressed lncRNAndashmRNA networks
have been proposed However as the predictions are based onstatistical associations found in gene expression levels onlythey may not represent the real lsquocausalrsquo lncRNAndashmRNA regula-tory relationships Furthermore the existing expression-basedmethods do not consider the modularity of lncRNAndashmRNA reg-ulatory networks an important feature of gene regulatory net-works [20]
In this article we first review the computational methodsfor inferring lncRNAndashmRNA interactions and the public data-bases for storing lncRNAndashmRNA regulatory relationshipsSecond we propose a novel method to infer Module-SpecificLncRNAndashmRNA Causal Regulatory Network (thus the proposedmethod is called MSLCRN) In the first step by consideringmodularity of networks MSLCRN uses Weighted Gene Co-expression Network Analysis (WGCNA) [21] to identify lncRNAndashmRNA co-expression modules In each module the lncRNAsand mRNAs are regarded as module-specific genes In the sec-ond step MSLCRN uses a causal inference method named inter-vention calculus when the directed acyclic graph (DAG) isabsent (IDA) [22 23] to estimate the causal effects of possiblelncRNAndashmRNA causal pairs in each module To speed up theestimation the parallelized version of IDA [24] is used to calcu-late the causal effects For each module the noncausal lncRNAndashmRNA pairs are eliminated and the retained lncRNAndashmRNAcausal pairs are further assembled to generate a module-specific lncRNAndashmRNA causal network To obtain a globallncRNAndashmRNA causal regulatory network we further integratethe identified module-specific lncRNAndashmRNA causal networksin the third step
To evaluate MSLCRN we have applied it into four humancancer data sets including glioblastoma multiforme (GBM) lungsquamous cell carcinoma (LSCC) ovarian cancer (OvCa) andprostate cancer (PrCa) from [25] The validation survival andenrichment analysis results show that the proposed methodcan help with revealing the functions and regulatory mecha-nisms of lncRNAs MSLCRN is released under the GPL-30License and is freely available through GitHub (httpsgithubcomzhangjunpeng411MSLCRN)
Computational methods for inferringlncRNAndashmRNA interactions
In this section we review the computational approaches forinferring lncRNAndashmRNA interactions In Table 1 we divide themethods into two categories (1) sequence-based method and(2) expression-based method We will separately review thesemethods as follows
Sequence-based method
The common characteristic of the sequence-based methods isthat the identification of RNAndashRNA interactions depends onRNA binding energy between two RNA molecules To evaluatethe strength of RNA binding energy a number of energy models[6ndash12 26ndash32] are proposed to predict RNAndashRNA interactions
Gerlach and Giegerich [6] propose a utility program GUUGlefor locating potential helical regions under RNA complementarybase pairs rules The method can be effectively used as a filterfor noncoding RNA (ncRNA) target prediction However the reli-able prediction of RNAndashRNA binding energies is also importantfor the identification of RNAndashRNA interactions To study thethermodynamics of RNAndashRNA interactions Muckstein et al [7]present an extension of the standard partition function methodcalled RNAup to RNA secondary structures By comparing
129 141 164 240 338
639
991
1419
2068
2596
0
500
1000
1500
2000
2500
3000
Year
Num
ber
of p
ublic
atio
ns
Figure 1 The number of lncRNA-related publications in the past decade The
number of queried publications is obtained from PubMed library with keyword
lsquolncRNArsquo
2 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
Table 1 Summary of computational methods or tools for inferring lncRNAndashmRNA interactions
Methodstools Categories of methods Brief descriptions Available
GUUGle [6] Sequence-based Target prediction by locating potential helical regions of RNAndashRNA pairs under RNA base pairing rules which include G-Ubases
httpbibiserv2cebitecuni-bielefelddeguugle
RNAup [7] Sequence-based Target prediction by studying thermodynamics of RNAndashRNApairs based on the sum of the energy of binding andhybridization
httprnatbiunivieacatcgi-binRNAWebSuiteRNAupcgi
RNAcofold [26] Sequence-based Target prediction by computing the hybridization energy andbase pairing pattern of RNAndashRNA pairs
httprnatbiunivieacatcgi-binRNAWebSuiteRNAcofoldcgi
Alkan et al [27] Sequence-based Target prediction by minimizing the joint free energy of RNAndashRNA pairs under a number of energy models including basepair energy model stacked pair energy model loop energymodel
On request
RNAplex [8] Sequence-based Target prediction by finding possible hybridization sites ofRNAndashRNA pairs
httpwwwtbiunivieacathtafer
IntaRNA [9] Sequence-based Target prediction by incorporating accessibility of target sitesas well as the existence of a user-definable seed
httprnainformatikuni-freiburgdeIntaRNAInputjsp
RactIP [10] Sequence-based Target prediction by integrating approximate information onan ensemble of equilibrium joint structures into the objec-tive function of integer programming
httprtipsdnabiokeioacjpractip
PETcofold [28] Sequence-based Target prediction by taking covariance information in intra-molecular and intermolecular base pairs into account
httprthdkresourcespetcofold
RIsearch [29] Sequence-based Target prediction by implementing a simplified Turner energymodel for fast computation of hybridization
httpsrthdkresourcesrisearchrisearch1php
RIsearch2 [30] Sequence-based An updated version of RIsearch and predict targets using a sin-gle integrated seed-and-extend framework based on suffixarrays
httpsrthdkresourcesrisearch
LncTar [11] Sequence-based lncRNA target prediction by finding the minimum free energyjoint structure of RNAndashRNA pairs based on base pairing
httpwwwcuilabcnlnctar
lncRNATargets [31] Sequence-based lncRNA target prediction based on nucleic acidthermodynamics
httpwwwherbbolorg 8001lrt
Terai et al [32] Sequence-based lncRNA target prediction by developing an integrated pipelineon the K computer which is one of the fastest super-com-puters in the world
httprtoolscbrcjpcgi-binRNARNAindexpl
RIblast [12] Sequence-based Target prediction based on the seed-and-extension approach httpgithubcomfukunagatsuRIblast
Liao et al [16] Expression-based Identify lncRNAndashmRNA interactions by using Pearson methodand the identified lncRNAndashmRNA interactions should be co-expressed in the same direction in no less than 3 Mousemicroarray data sets
On request
Guo et al [17] Expression-based Identify lncRNAndashmRNA interactions by using Pearson methodin OvCa malignant progression
On request
Du et al [18] Expression-based Identify lncRNAndashmRNA interactions by using Pearson methodand a power function in thyroid cancer
On request
Liu et al [33] Expression-based Identify lncRNAndashmRNA interactions by using Pearson methodin human colorectal carcinoma
On request
Huang et al [34] Expression-based Identify lncRNAndashmRNA interactions associated with pneumo-nia by using Pearson method
On request
Li et al [35] Expression-based Identify dynamic lncRNAndashmRNA interactions associated withvenous congestion by using Pearson method
On request
Wu et al [19] Expression-based Identify lncRNAndashmRNA interactions by using a generalized lin-ear model to regress mRNA expression on lncRNA expres-sion in breast cancer
On request
Fu et al [36] Expression-based Identify lncRNAndashmRNA interactions by considering mRNA lociwithin lncRNA and the Pearson correlation in cartilage
On request
Zhang et al [37] Expression-based Identify lncRNAndashmRNA interactions by considering mRNA lociwithin lncRNA and the Pearson correlation in cartilage inperipheral blood mononuclear cells
On request
Iwakiri et al [38] Expression-based Identify tissue-specific lncRNAndashmRNA interactions by integrat-ing the tissue specificity of lncRNAs and mRNAs intosequence-based prediction of human lncRNAndashRNAinteractions
On request
Lv et al [39] Expression-based Identify tissue-specific lncRNAndashmRNA interactions by usingPearson and sequence-based methods and in human intra-hepatic cholangiocarcinoma
On request
Module-specific lncRNA-mRNA causal regulatory networks | 3
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
predicted free energies of binding with RNA interference experi-mental data RNAup can produce biologically reasonableresults For genome-wide predictions of ncRNA targets RNAupis not fast enough Therefore it is usually to be combined withother faster RNAndashRNA prediction methods
To extend the standard dynamic programming algorithmsfor computing RNA secondary structures Bernhart et al [26]propose a program named RNAcofold to compute the hybridiza-tion energy and base pairing pattern of the co-folding of twoRNA molecules However the method disregards some impor-tant interaction structures and is restricted to dimeric com-plexes Moreover for the RNAndashRNA interaction predictionpredicting the joint secondary structure of two interacting RNAsis also important To solve it Alkan et al [27] develop severalalgorithms to minimize the joint free energy between the twoRNAs under a number of energy models Assuming that con-served RNAndashRNA interactions imply conserved functionSeemann et al [28] also implement a comparative method calledPETcofold to predict the joint secondary structure of two inter-acting RNAs As PETcofold considers sequence conservation anincreasing amount of structural covariance can further improveits performance
RNAup [7] and RNAcofold [26] are too slow for genome-widesearch in finding target sites of ncRNAs To accelerate the speedof RNAndashRNA interaction predictions RNAplex [8] is presented toquickly find possible hybridization sites between two interact-ing RNAs To focus on the target search on short highly stableinteractions RNAplex introduces a per nucleotide penaltyMeanwhile another general and fast approach IntaRNA [9] isproposed to efficiently predict bacterial RNAndashRNA interactionsCompared with other existing target prediction methodsIntaRNA considers both the accessibility of target sites and theexistence of a user-defined seed Therefore it shows a higheraccuracy than competing methods Kato et al [10] also present afast and accurate prediction method RactIP for comprehensivetype of RNAndashRNA interactions In terms of predicting joint sec-ondary structures of two interacting RNAs RactIP run incompa-rably faster than competitive programs
To further achieve a speed improvement of predictingRNAndashRNA interactions Wenzel et al [29] present RIsearch forfast computation of hybridization between two interactingRNAs They show that the energy model of RIsearch is anaccurate approximation of the full energy model for near-complementary RNAndashRNA duplexes Furthermore RIsearch isfaster than RNAplex [8] in RNAndashRNA interaction searchRecently RIsearch2 [30] an updated version of RIsearch [29] isproposed to localize potential near-complementary RNAndashRNAinteractions between two RNA sequences The comparisonresults show that RIsearch2 is much faster than the previousmethods such as GUUGle [6] RNAplex [8] IntaRNA [9] andRIsearch [29]
Although the above RNAndashRNA interaction prediction meth-ods can be extended to predict lncRNAndashmRNA interactionsnone of them are exclusively used for identifying the RNA tar-gets of lncRNAs in a large scale To efficiently identify lncRNAndashmRNA interactions Li et al [11] propose a tool named LncTarLncTar explores lncRNAndashmRNA interactions by finding the min-imum free energy joint structure of two interacting RNAs basedon base pairing As LncTar runs fast and does not have a limitto RNA size it can be used for large-scale identification of theRNA targets for all RNAs Another web-based platformlncRNATargets [31] is also provided for lncRNA target predic-tion Because there is no limit to RNA size lncRNATargets canalso be used to identify the RNA targets of all RNAs In a whole
human transcriptome Terai et al [32] develop an integratedpipeline to predict lncRNAndashmRNA interactions for the first timeIn the pipeline IntaRNA [9] is used to calculate interactionenergy and RactIP [10] is used to predict joint secondary struc-ture Recently to further shorten the running time of predictinglncRNAndashmRNA interactions an ultrafast RNAndashRNA interactionprediction method RIblast [12] based on the seed-and-extensionmethod is presented The comparison results show that RIblastruns faster than RNAplex [8] IntaRNA [9] Terai et al pipeline[32] and thus can be applied to a large scale of lncRNA targetidentification
Expression-based method
At the gene expression level the co-expressed lncRNAndashmRNApairs are regarded as lncRNAndashmRNA interactions for theexpression-based methods Among the existing expression-based methods [16ndash19 33ndash39] Pearson correlation method is akey step of most methods to identify co-expressed lncRNAndashmRNA pairs
Liao et al [16] construct a lncRNAndashmRNA co-expression net-work from re-annotated mouse microarray data sets By usingPearson method they only keep the lncRNAndashmRNA pairs withPlt 001 and Pearson correlation ranked in the top or bottom005 percentile The study is the first large-scale prediction oflncRNA functions from a lncRNAndashmRNA co-expression networkTo identify immune-associated lncRNA biomarkers in OvCa Guoet al [17] make a comprehensive analysis of lncRNAndashmRNA co-expression patterns To identify lncRNAndashmRNA co-expressionpairs they calculate Pearson correlation between differentiallyexpressed lncRNAs and mRNAs They only reserve the lncRNAndashmRNA co-expression pairs with Pearson correlationgt 05 and thecorresponding False Discovery Rate (FDR)lt 001 Liu et al [33] andHuang et al [34] also use Pearson method to study lncRNAndashmRNAco-expression networks in human colorectal carcinoma andpneumonia respectively The inferred lncRNAndashmRNA co-expression networks will help to study lncRNA functionsRecently Du et al [18] propose a two-step method to conduct acomprehensive analysis of lncRNAndashmRNA co-expression patternsin thyroid cancer First they use Pearson method to calculatePearson correlation and the cutoff of Pearson correlation is 05and the corresponding FDR cutoff is 001 Second the Pearson cor-relations are transformed into an adjacency matrix
Owing to dynamic characteristic of gene regulatory net-works Wu et al [19] identify two distinct lncRNAndashmRNA co-expression networks in tumor and normal breast tissue Theyuse a generalized linear model to regress mRNA expression onlncRNA expression in tumor and normal breast tissue and onlyfocus on dynamic breast lncRNAndashmRNA co-expression pairsthat differ in tumor and normal breast tissue Meanwhile tostudy the potential role of lncRNAs in venous congestion Liet al [35] also construct a dynamic lncRNAndashmRNA co-expression network By using Pearson method they separatelycalculate Pearson correlations of each lncRNAndashmRNA pair invenous congestion and normal samples The lncRNAndashmRNApairs with Pearson correlationgt099 orlt099 and P-valuelt001are selected as lncRNAndashmRNA co-expression pairs They con-struct two types of lncRNAndashmRNA co-expression networkslsquolostrsquo network where lncRNAndashmRNA co-expression pairs onlyexisted in normal samples and lsquoobtainedrsquo network wherelncRNAndashmRNA co-expression pairs only existed in venous con-gestion samples The lsquolostrsquo and lsquoobtainedrsquo networks are furtherintegrated to obtain a dynamic lncRNAndashmRNA co-expressionnetwork
4 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
The above methods simply use matched lncRNA and mRNAexpression data to identify lncRNAndashmRNA co-expression pairsTo identify lsquocis-regulated target genesrsquo of lncRNAs some methodsalso consider mRNA loci information within lncRNA For exampleFu et al [36] combine mRNA loci information and matchedlncRNA and mRNA expression data to predict lncRNA targetsThey identify the mRNAs as targets under two conditions (i) themRNA loci are within a 300-kb window up- or downstream oflncRNA and (ii) lncRNAndashmRNA co-expression pairs are signifi-cantly positive correlated (Pearson correlationgt 08 and the corre-sponding P-valuelt 005) Zhang et al [37] also use a similarmethod to Fu et al [36] for identifying lncRNA targets The mRNAscan be regarded as targets when (1) the mRNA loci are within a10 window up- or downstream of lncRNA and (2) lncRNAndashmRNAco-expression pairs are significantly positive correlated (Pearsoncorrelationgt 098 and the corresponding P-valuelt 005)
Apart from mRNA loci information within lncRNA someemerging methods consider predictions from sequence-basedmethods as putative lncRNAndashmRNA interactions For exampleIwakiri et al [38] integrate tissue-specific lncRNA and mRNAexpression data into predictions from a sequence-basedmethod in [32] They discover that integrating tissue specificitycan improve prediction accuracy of lncRNAndashmRNA interactionsLv et al [39] also combine matched lncRNA and mRNA expres-sion data with predictions from a sequence-based methodLncTar [11] They first use Pearson method to identify co-expressed lncRNAndashmRNA co-expression pairs with Pearsoncorrelationgt095 orlt095 Then LncTar is used to further filterthe identified lncRNAndashmRNA co-expression pairs
Public databases for storing lncRNAndashmRNAregulatory relationships
In this section we review the public databases of storinglncRNAndashmRNA regulatory relationships Table 2 shows a sum-mary of the third-party public databases including experimen-tally validated and computationally predicted databases
NPInter [40] contains experimentally validated interactionsbetween ncRNAs especially lncRNAs and miRNAs The databasecontains 915 067 interactions in 188 tissues or cell lines from 68kinds of experimental technologies There is a classification ofthe functional interactions based on the functional process thatncRNA is involved in Moreover NPInter allows users to searchinteractions related publications and other information
LncRNADisease [41] not only collects experimentally sup-ported lncRNAndashdisease associations and lncRNA interactionsbut also predicts novel lncRNAndashdisease associations Recentlythe database curates 478 entries of experimentally validatedlncRNA interactions LncRNADisease provides users severalways to search lncRNA-related diseases and interactions
To study differentially expressed genes after lncRNA knock-down or overexpression Jiang et al [42] develop a databasecalled LncRNA2Target in human and mouse organisms Thedatabase has a collection of 396 experimentally validatedlncRNAndashtarget interactions In LncRNA2Target if a gene is dif-ferentially expressed after lncRNA knockdown or overexpres-sion it is regarded as a target of a lncRNA For convenienceLncRNA2Target allows users to search for the targets of singlelncRNA or for the lncRNAs that target a specific geneMeanwhile Zhou et al [43] also build a reference resourceLncReg for lncRNA-related regulatory networks The databasehas 1081 experimentally validated lncRNA-related regulatory
records between 258 nonredundant lncRNAs and 571 nonredun-dant genes
IRNdb [44] is a database that focuses on collecting immuno-logically relevant lncRNAndashtarget miRNAndashtarget and PIWI-interacting RNAndashtarget interactions The current version ofIRNdb documents 22 453 immunologically relevant lncRNAndashtar-get interactions by integrating three databases LncRNADisease[41] LncRNA2Target [42] and LncReg [43] The aim is to helpresearchers study the roles of ncRNAs in the immune systemRecently a new experimentally validated database namedlncRInter [45] was developed to collect reliable and high-qualitylncRNAndashtarget interactions The extracted lncRNAndashtarget inter-actions are all from published literature and are supported bycertain biological experiments (eg luciferase reporter assayin vitro binding assay RNA pull-down) In total lncRInter con-tains 1036 experimentally validated lncRNAndashtarget interactionsin 15 organisms
In addition to the experimentally validated databases pre-sented above there are several computationally predicted data-bases for collecting lncRNAndashmRNA interactions For instancestarBase [46] is a comprehensive database of systematicallyidentifying the RNAndashRNA and proteinndashRNA interaction net-works from 108 CLIP-Seq (PAR-CLIP HITS-CLIP iCLIP CLASH)data sets The lncRNAndashmRNA interactions can be extractedfrom proteinndashRNA interaction networks InCaNet [47] aimsto establish a comprehensive regulatory network betweenlncRNAs and cancer genes They identify lncRNAndashcancergene interactions by computing gene co-expression betweenlncRNAs and cancer genes BmncRNAdb [48] is a comprehensivedatabase of silkworm lncRNAs and miRNAs The database pro-vides three online tools for users to predict both lncRNAndashtargetand miRNAndashtarget interactions lncRNAtor [49] collect expres-sion data from 243 RNA-seq experiments including 5237samples of various tissues and developmental stages ThelncRNAndashmRNA co-expression pairs are identified through co-expression analysis of lncRNAs and mRNAs lncRNome [50] is acomprehensive knowledgebase of sequence structure biologi-cal functions genomic variations and epigenetic modificationson gt17 000 lncRNAs in human For lncRNAndashprotein interactionsthe database incorporates PAR-CLIP experiments and a supportvector machine-based prediction method Co-lncRNA [51] andLncRNA2Function [52] predict co-expressed lncRNAndashmRNAinteractions from RNA-Seq data and further annotates thepotential functions of human lncRNAs using functional enrich-ment analysis lncRNAMap [53] is an integrated and compre-hensive database to explore regulatory functions of humanlncRNAs By integrating small RNAs supported by publicly avail-able deep sequencing data lncRNAMap construct lncRNA-derived siRNAndashtarget interactions
In summary for experimentally validated databases userscan select individual database or combine several databases asground truth to validate the predicted lncRNAndashmRNA interac-tions As for computationally predicted databases they can beused as initial structural of sequence-based or expression-basedmethods to identify lncRNAndashmRNA interactions
Inferring and analyzing MSLCRN networksRepurposed microarray data across human cancers
We collect the repurposed lncRNA and mRNA expression dataof GBM LSCC OvCa and PrCa from [25] A lncRNA or mRNA iseliminated if it does not have a corresponding gene symbol in adata set By calculating average expression values of replicate
Module-specific lncRNA-mRNA causal regulatory networks | 5
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
lncRNAs and mRNAs we obtain unique expression value ofthese replicates Consequently we get the matched expressiondata of 9704 lncRNAs and 18 282 mRNAs in 451 GBM 113 LSCC585 OvCa and 150 PrCa samples
Pipeline of MSLCRN
As shown in Figure 2 MSLCRN contains the following threesteps to infer module-specific lncRNAndashmRNA causal regulatorynetworks
i Identification of lncRNAndashmRNA co-expression modulesGiven the matched lncRNA and mRNA expression data weuse WGCNA to generate gene co-expression modules Amodule containing at least two lncRNAs and two mRNAsare regarded as a lncRNAndashmRNA co-expression moduleand used as the input of the second step
ii Identification of module-specific lncRNAndashmRNA causal reg-ulatory networks For each lncRNAndashmRNA co-expressionmodule with each lncRNAndashmRNA pair we apply parallelIDA to estimate the causal effect of the lncRNA on the
Table 2 Public databases for storing lncRNAndashmRNA regulatory relationships
Databases Types of databases Brief descriptions Organisms Available
NPInter [40] Validated A database of experimentally verified func-tional interactions between ncRNAs(including lncRNAs miRNAs etc) and bio-molecules (proteins RNAs and DNAs)
22 organisms httpwwwbioinfoorgNPInter
LncRNADisease [41] Validated A database of experimentally supportedlncRNAndashdisease association data andlncRNAndashtarget interactions in various lev-els including protein RNA miRNA andDNA
Human httpwwwcuilabcnlncrnadisease
LncRNA2Target [42] Validated A database of lncRNAndashtarget regulatory rela-tionships experimentally validated bylncRNA knockdown or overexpression
Human mouse httpwwwlncrna2targetorg
LncReg [43] Validated A database of experimentally validatedlncRNAndashtarget interactions from publicliterature
7 organisms httpbioinformaticsustceducnlncreg
IRNdb [44] Validated A database of immunologically relevantncRNAs (miRNAs lncRNAs and otherncRNAs) and target genes
Human mouse httpcompbiomasseyacnzappsirndb
lncRInter [45] Validated A database of experimentally validatedlncRNAndashtarget interactions extracted frompeer-reviewed publications
15 organisms httpbioinfolifehusteducnlncRInter
starBase [46] Predicted A comprehensive database of systematicallyidentifying the RNAndashRNA and proteinndashRNAinteraction networks from 108 CLIP-Seq(PAR-CLIP HITS-CLIP iCLIP CLASH) datasets
Human httpstarbasesysueducn
lnCaNet [47] Predicted A database of establishing a comprehensiveregulatory network source for lncRNA andcancer genes
Human httplncanetbioinfo-minzhaoorg
BmncRNAdb [48] Predicted A comprehensive database of the silkwormlncRNAs and miRNAs as well as the threeonline tools for users to predict the targetgenes of lncRNAs or miRNAs
Bombyx mori httpgenecqueducnBmncRNAdbindexphp
lncRNAtor [49] Predicted A comprehensive resource of encompassingannotation sequence analysis geneexpression protein binding and phyloge-netic conservation
6 organisms httplncrnatorewhaackr
lncRNome [50] Predicted A comprehensive knowledgebase on thetypes chromosomal locations descriptionon the biological functions and diseaseassociations of lncRNAs
Human httpgenomeigibresinlncRNome
Co-LncRNA [51] Predicted A computationally predicted database toidentify GO annotations and KEGG path-ways affected by co-expressed protein-cod-ing genes of a single or multiple lncRNAs
Human httpwwwbio-bigdatacomCo-LncRNA
LncRNA2Function [52] Predicted A comprehensive resource of investigatingthe functions of lncRNAs based on co-expressed lncRNAndashmRNA interactions
Human httpmlghiteducnlncrna2function
lncRNAMap [53] Predicted An integrated and comprehensive databaseof regulatory functions of lncRNAs and act-ing as ceRNAs
Human httplncRNAMapmbcnctuedutw
6 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
mRNA We use the absolute value of the causal effect(AVCE) to evaluate the strength of the regulation of thelncRNA on the mRNA and a higher AVCE indicates a stron-ger lncRNA regulation The lncRNAndashmRNA pairs with highAVCEs in each module are considered as module-specificlncRNAndashmRNA causal regulatory relationships and we calleach module with these relationships identified a module-specific causal regulatory network
iii Identification of global lncRNAndashmRNA causal regulatorynetwork We integrate the module-specific lncRNAndashmRNA
causal regulatory networks to form the global lncRNAndashmRNA causal regulatory network
Identification of lncRNAndashmRNA co-expression modules
In systems biology WGCNA [21] is a popular method for findingthe correlation patterns among genes across samples and canbe used to identify clusters or modules of highly co-expressedgenes Therefore we use WGCNA to first infer lncRNAndashmRNAco-expression modules
Figure 2 The pipeline of MSLCRN First WGCNA is used to identify lncRNAndashmRNA co-expression modules from matched lncRNA and mRNA expression data Second
we infer lncRNAndashmRNA causal regulatory relationships in each module by using parallel IDA method For each module we assemble the identified lncRNAndashmRNA reg-
ulatory relationships to obtain a module-specific lncRNAndashmRNA causal regulatory network Third the module-specific lncRNAndashmRNA causal regulatory networks are
integrated to form a global lncRNAndashmRNA causal regulatory network
Module-specific lncRNA-mRNA causal regulatory networks | 7
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
Specifically the matched lncRNA and mRNA expressiondata are used as the input of WGCNA For each pair of genesi and j the gene co-expression similarity sij of the pair is definedas
sij frac14 jcorethi jTHORNj (1)
where jcor(i j)j is the absolute value of the Pearson correlationbetween genes i and j The gene co-expression similarity matrixis denoted by Sfrac14 [sij]
To pick an appropriate soft-thresholding power for trans-forming the similarity matrix S into an adjacency matrix A weuse the scale-free topology criterion for soft-thresholding andthe minimum scale free topology fitting index R2 is set as 09Then the topological overlap matrix (TOM) Wfrac14 [wij] is gener-ated based on the adjacency matrix Afrac14 [aij] The TOM similaritywij between genes i and j is defined
wij frac14P
uaiuauj thorn aij
minfP
uaiuP
uaujg thorn 1 aij(2)
where u denotes all genes of the matched lncRNA and mRNAexpression data The TOM dissimilarity between genes i and j isdenoted by dijfrac14 1 - wij To identify gene co-expression modulesthe TOM dissimilarity matrix Dfrac14 [dij] is clustered using optimalhierarchical clustering method [54] Here the identified geneco-expression modules are groups of lncRNAs and mRNAs withhigh topological overlap The lncRNAs and mRNAs of eachlncRNAndashmRNA co-expression module are considered for possi-ble lncRNAndashmRNA causal relationships in the next step
Identification of module-specific lncRNAndashmRNA causalregulatory networks
After the identification of lncRNAndashmRNA co-expression mod-ules we use the parallel IDA method [24] to estimate causaleffects of possible lncRNAndashmRNA causal pairs in each moduleThe application of parallel IDA method to matched lncRNAand mRNA expression data for estimating causal effectsincludes two steps (i) learning the causal structure fromexpression data using the parallel-PC algorithm [24] and(ii) estimating the causal effects of lncRNAs on mRNAs byapplying do-calculus [55]
In step (i) Vfrac14 L1 Lm T1 Tn is a set of random varia-bles denoting m lncRNAs and n mRNAs The causal structure isin the form of a DAG where a node denotes a lncRNA Li ormRNA Tj and an edge between two nodes represents a causalrelationship between them We use the parallel-PC algorithm aparallel version of the PC algorithm [56] to learn the causalstructures (the DAGs) from expression data Starting with a fullyconnected undirected graph the parallel-PC algorithm deter-mines if an edge is retained or removed in the graph by con-ducting conditional independence tests in parallel Then to geta DAG the directions of edges in the obtained graph are ori-ented As different DAGs may represent the same conditionalindependence the parallel-PC algorithm uses a completed par-tially directed acyclic graph (CPDAG) to uniquely describe anequivalence class of DAGs In this work we use the R-packageParallelPC [57] to implement the parallel-PC algorithm and setthe significant level of the conditional independence testsafrac14 001
In step (ii) we are only interested in estimating the causaleffect of the directed edge Li Tj where vertex is Li a parent ofvertex Tj As described above a CPDAG may generate a class ofDAGs For the causal effect of Li Tj in a CPDAG we use do-calculus [55] to estimate the causal effects of Li on Tj in a class ofDAGs Then we use the minimum absolute value of all possiblecausal effects as a final causal effect of Li Tj As for the detailsof how the parallel IDA method is applied to estimate causalrelationships from expression data the readers can refer to [24]
The estimated causal effects can be positive or negativereflecting the up or down regulation by the lncRNAs on themRNAs For the purpose of constructing the regulatory net-works we use the absolute values of the causal effects (AVCEs)to evaluate the strengths of the regulation and thus to confirmthe regulatory relationships
We set different AVCE cutoffs from 010 to 060 with a step of005 to generate MSLCRN networks in GBM LSCC OvCa andPrCa respectively For each cutoff we merge the identifiedMSLCRN networks to obtain global lncRNAndashmRNA causal regu-latory networks in the four human cancers respectively Asshown in Table 3 a higher cutoff selection causes a smallerglobal lncRNAndashmRNA causal regulatory network but bettergoodness of fit To make a trade-off between the size of theglobal lncRNAndashmRNA causal regulatory networks and goodnessof fit we set a compromised AVCE cutoff with a value of 045 Ifthe AVCE of a lncRNA on a mRNA is 045 or above we considerthere is a causal regulatory relationship between the lncRNAndashmRNA pair Under the compromise cutoff we have a moderatesize of the global lncRNAndashmRNA causal regulatory networks inGBM LSCC OvCa and PrCa Meanwhile the node degree distri-butions of four global lncRNAndashmRNA causal regulatory net-works also follow power law distribution (the fitted power curveis in the form of yfrac14 axb) well with R2gt 08
Validation survival and enrichment analysis
Previous studies have demonstrated that about 20 of thenodes in a biological network are essential and are regarded ashub genes [58 59] Therefore when analyzing a global lncRNAndashmRNA causal network we select the 20 of lncRNAs with thehighest degrees in the network as hub lncRNAs The degree of alncRNA node in the global network is the number of mRNAsconnected with it
To validate the predicted module-specific lncRNAndashmRNAcausal regulatory relationships we obtain the experimentallyvalidated lncRNAndashmRNA regulatory relationships from thethree widely used databases NPInter v30 [40] LncRNADiseasev2017 [41] and LncRNA2Target v12 [42] Furthermore we retainexperimentally validated lncRNAndashmRNA regulatory relation-ships associated with the four human cancer data sets asground truth
We perform survival analysis using the R-package survival[60] A multivariate Cox model is used to predict the risk scoreof each tumor sample Then all tumor samples in each cancerdata set are equally divided into high- and low-risk groupsaccording to their risk scores Moreover we calculate theHazard Ratio between the high- and the low-risk groups andperform the Log-rank test
To further investigate the underlying biological processesand pathways related to each of the MSLCRN networks we usethe R-package clusterProfiler [61] to conduct functional enrich-ment analysis on the networks respectively The GeneOntology (GO) [62] biological processes and Kyoto Encyclopedia
8 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
of Genes and Genomes (KEGG) [63] pathways with adjustedP-valuelt005 [adjusted by Benjamini-Hochberg (BH) method]are regarded as functional categories for the MSLCRN networks
We also collect a list of lncRNAs and mRNAs that areassociated with GBM LSCC OvCa and PrCa to study diseaseenrichment of each of the MSLCRN networks The list of disease-associated lncRNAs is obtained from LncRNADisease v2017 [41]Lnc2Cancer v2016 [64] and MNDR v20 [65] The list of disease-associated mRNAs is from DisGeNET v50 [66] To evaluatewhether a MSLCRN network is significantly enriched in a specificdisease we use a hyper-geometric distribution test as follows
p frac14 1 FethxjBNMTHORN frac14 1Xx1
ifrac140
N
i
B N
M i
B
M
(3)
In the formula B is the number of all genes in the expressiondata set N denotes the number of all genes associated with aspecific disease in the expression data set M is the number ofgenes in a MSLCRN network and x is the number of genes asso-ciated with a specific disease in a MSLCRN network A MSLCRNnetwork is significantly enriched in a specific disease if theP-valuelt 005
Network analysis validation and comparisonon MSLCRN networkslncRNAs exhibit dynamic positive gene regulationacross cancers
By following the first step of the MSLCRN method we haveidentified 23 38 45 and 32 lncRNAndashmRNA co-expression mod-ules in GBM LSCC OvCa and PrCa respectively In the secondstep of the MSLCRN method we eliminate the noncausallncRNAndashmRNA pairs in lncRNAndashmRNA co-expression modulesAs a result we generate 23 38 45 and 32 module-specificlncRNAndashmRNA causal regulatory networks in GBM LSCC OvCaand PrCa respectively After merging the module-specificlncRNAndashmRNA causal regulatory networks for each data set weobtain the four global lncRNAndashmRNA regulatory networks inGBM LSCC OvCa and PrCa respectively
To understand the overlap and difference of module-specificgenes module-specific lncRNAndashmRNA causal regulatory rela-tionships and module-specific hub lncRNAs in the four humancancers we generate three set intersection plots using theR-package UpSetR [67] As shown in Figure 3 we find that themajority of module-specific genes (5752) module-specificlncRNAndashmRNA causal regulatory relationships (9902) andmodule-specific hub lncRNAs (8922) tend to be cancer-specific Only a small portion of module-specific genes (396) andmodule-specific lncRNAndashmRNA causal regulatory relationships(6) are shared by the four cancers Especially none of themodule-specific hub lncRNAs are common between the fourcancers In addition the causal effects are positive for 99569672 9993 and 7863 of the causal regulatory relationshipsidentified in GBM LSCC OvCa and PrCa respectively Theseresults indicate that lncRNAs are more likely to exhibit dynamicpositive gene regulation across cancers The results are alsoconsistent with the proposition that the positive gene regula-tion by lncRNAs would be desired in specific situations [68]
Differential network analysis uncovers cancer-specificlncRNAndashmRNA causal networks
In this section we focus on studying cancer-specific lncRNAndashmRNA causal networks using differential network analysisThus the GBM-specific LSCC-specific OvCa-specific and PrCa-specific lncRNAndashmRNA causal networks are identified Asshown in Figure 4A the distributions of node degrees in thesefour cancer-specific lncRNAndashmRNA causal networks followpower law distributions well with R2frac14 09774 09923 09723and 08310 respectively Thus these four cancer-specificlncRNAndashmRNA causal networks are scale free indicating thatmost mRNAs are regulated by a small number of lncRNAs
Table 3 Degree distributions of global lncRNAndashmRNA causal regula-tory networks with different cutoffs in GBM LSCC OvCa and PrCa
Datasets Cutoffs Number of causalregulations
yfrac14axb R2
GBM 010 11 847 yfrac142274x06893 04161015 10 924 yfrac142495x07275 05460020 9732 yfrac142745x0767 06475025 8461 yfrac142958x08074 06757030 7176 yfrac143194x08319 06807035 6041 yfrac143363x08703 07203040 4997 yfrac143741x09348 07999045 4074 y54082x21034 08694050 3279 yfrac144194x118 09244055 2583 yfrac143896x1259 09463060 1862 yfrac143666x143 09792
LSCC 010 789 172 yfrac143143x06071 04829015 684 524 yfrac143475x06323 05841020 569 369 yfrac143905x06525 06578025 451 346 yfrac144855x06928 07789030 340 860 yfrac146341x07554 08796035 244 547 yfrac148147x08379 09504040 166 593 yfrac149724x0935 09848045 108 024 y51031x21018 09933050 66 335 yfrac149425x1068 09963055 37 632 yfrac147807x1089 09948060 19 547 yfrac146565x1169 09972
OvCa 010 333 146 yfrac143272x05928 05042015 232 794 yfrac144192x06262 06531020 159 872 yfrac146398x07216 08247025 112 792 yfrac148816x08356 09120030 80 808 yfrac141008x09472 09551035 57 099 yfrac149545x1014 09744040 38 517 yfrac148198x1066 09748045 24 439 y56575x21066 09697050 14 435 yfrac14540x1079 09551055 7973 yfrac144368x1107 09319060 4026 yfrac143285x1107 09460
PrCa 010 1 894 322 yfrac143089x06245 02750015 1 749 595 yfrac143586x06787 03582020 1 594 744 yfrac144013x07169 04316025 1 429 858 yfrac144271x0732 04919030 1 260 968 yfrac144389x07244 05616035 1 097 654 yfrac144406x0702 06470040 946 439 yfrac144485x06816 07338045 812 687 y55175x207005 08206050 694 558 yfrac14667x07588 08823055 584 834 yfrac148833x08469 09332060 474 654 yfrac141113x09684 09503
Note The AVCE cutoffs range from 010 to 060 with a step of 005
The bold values are the degree distributions of global lncRNA-mRNA causal reg-
ulatory networks with a compromised AVCE cutoff (045) in four human cancers
Module-specific lncRNA-mRNA causal regulatory networks | 9
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
Next we use four lists of lncRNAs and mRNAs associatedwith GBM LSCC OvCa and PrCa to discover lncRNAndashmRNAcausal networks that are associated with the four human can-cers We define that cancer-related lncRNAndashmRNA causal regu-latory relationships are those in which at least one regulatoryparty is cancer-related lncRNA or mRNA As a result wehave extracted GBM-related LSCC-related OvCa-related andPrCa-related lncRNAndashmRNA causal networks from the fourcancer-specific lncRNAndashmRNA causal networks (details inSupplementary File S1) To understand the potential biologicalprocesses and pathways of the four cancer-related lncRNAndashmRNA causal networks we identify significant GO biologicalprocesses and KEGG pathways using functional enrichmentanalysis In Figure 4B several top GO biological processes and
KEGG pathways such as cytokine activity [69] G-proteincoupled receptor binding [70] TNF signaling pathway [71] cAMPsignaling pathway [72] pathways in cancer are closely associ-ated with the occurrence and development of cancer Thisresult suggests that the identified cancer-related lncRNAndashmRNA causal networks may be involved in the occurrence anddevelopment of human cancer
Conservative network analysis highlights a corelncRNAndashmRNA causal regulatory network acrosshuman cancers
Although most of the lncRNAndashmRNA causal regulatory relation-ships are cancer-specific there are still a number of common
396 424
106 115
1370
133 173
1599
126
1729
1224
252
2523
2157
5081
0
2000
4000
Inte
rsec
tion
Siz
e
PrCa
Ov Ca
LSCC
GBM
025
0050
0075
00
1000
0
Set Size
6 283
13 1 80 673
206
3969
76 3493
407
2816
9950
7
1948
7
0
250000
500000
750000
Inte
rsec
tion
Siz
e
PrCa
Ov Ca
LSCC
GBM
0e+0
0
2e+0
5
4e+0
5
6e+0
5
8e+0
5
Set Size
6 1 2 933
2
41
6 11
246
47
524
0
200
400In
ters
ectio
n S
ize
PrCa
Ov Ca
LSCC
GBM
0200
400
600
Set Size
Module-specific genes
Module-specific causal regulations Module-specific hub lncRNAs 808611
A
B C
Figure 3 Overlap and difference of module-specific genes module-specific causal regulations and module-specific hub lncRNAs across GBM LSCC OvCa and PrCa
(A) Module-specific genes (both lncRNAs and mRNAs) intersection plot (B) Module-specific causal regulations intersection plot (C) Module-specific hub lncRNAs inter-
section plot The red lines denote common genes and causal regulations across GBM LSCC OvCa and PrCa
10 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
causal regulatory relationships between the four global net-works To evaluate whether there is a common core of lncRNAndashmRNA causal regulatory relationships in the global regulatorynetworks across human cancers we concentrate on the con-served lncRNAndashmRNA causal regulatory relationships thatexisted in at least three human cancers
As shown in Figure 5A the majority of the conservedlncRNAndashmRNA causal regulatory relationships form a closely
connected community This finding indicates that the con-served lncRNAndashmRNA causal regulatory network may be a corenetwork across human cancers
The survival analysis shows that the lncRNAs and mRNAs inthe core network can significantly distinguish the metastasisrisks between the high- and low-risk groups in GBM OvCa andPrCa data sets (Figure 5B) This result suggests that the core net-work may act as a common network biomarker of GBM OvCa
Cancer-specific networks Causal regulations y=axb R2
GBM-specific 2816 y=4812x-1292 09774
LSCC-specific 99507 y=1034x-1014 09923
OvCa-specific 19487 y=7366x-1156 09723
PrCa-specific 808611 y=5243x-07055 08310
1 10 100 11001
10
100
1100
Degree of genes
Nu
mb
er o
f ge
nes
GBM-specific fitting curveLSCC-specific fitting curveOvCa-specific fitting curvePrCa-specific fitting curveGBM-specific degree distributionLSCC-specific degree distributionOvCa-specific degree distributionPrCa-specific degree distribution
solutecation symporter activitysymporter activity
gated channel activitysodium ion transmembrane transporter activity
ion channel activitysubstrate-specific channel activity
channel activitypassive transmembrane transporter activity
growth factor activitycation channel activity
metal ion transmembrane transporter activitycollagen bindingheparin binding
sulfur compound bindingintegrin binding
glycosaminoglycan bindingextracellular matrix binding
peptide receptor activityG-protein coupled peptide receptor activity
chemokine bindingG-protein coupled receptor binding
growth factor bindingcytokine binding
serine-type endopeptidase activitydeath receptor activity
tumor necrosis factor-activated receptor activitycytokine receptor activity
protein heterodimerization activitydipeptidase activity
glycoprotein bindingRAGE receptor binding
cytokine receptor bindingcytokine activity
GBM(203)
LSCC(1015)
OvCa(479)
PrCa(3019)
001
002
003
004
padjust
GeneRatio002
004
006
008
Taste transduction
ECM-receptor interaction
cAMP signaling pathway
Calcium signaling pathway
Neuroactive ligand-receptor interaction
PI3K-Akt signaling pathway
Pathways in cancerRegulation of actin cytoskeleton
Complement and coagulation cascades
AGE-RAGE signaling pathway in diabetic complications
Hematopoietic cell lineage
Th17 cell differentiation
Inflammatory bowel disease (IBD)
Malaria
Osteoclast differentiation
Influenza A
Tuberculosis
Intestinal immune network for IgA production
Chagas disease (American trypanosomiasis)
Leishmaniasis
TNF signaling pathway
Toll-like receptor signaling pathway
Rheumatoid arthritis
Cytokine-cytokine receptor interaction
GBM(129)
LSCC(500)
OvCa(266)
PrCa(1364)
GeneRatio
005
010
015
001
002
003
004padjust
GO enrichment analysis KEGG enrichment analysis
A
B
Figure 4 Differential network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) Degree distribution of cancer-specific lncRNAndashmRNA
causal networks in GBM LSCC OvCa and PrCa (B) Functional enrichment analysis of cancer-related lncRNAndashmRNA causal networks in GBM LSCC OvCa and PrCa
Module-specific lncRNA-mRNA causal regulatory networks | 11
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
and PrCa In Figure 5B we also find that the core network con-tains several cancer genes (34 26 30 and 38 cancer genes asso-ciated with GBM LSCC OvCa and PrCa respectively)
By conducting GO and KEGG enrichment analysis we findthat the core network is significantly enriched in 399 GO biologi-cal processes and 3 KEGG pathways (details in SupplementaryFile S2) Of the 399 GO biological processes 2 GO terms includ-ing negative regulation of cell adhesion (GO 0007162) and cyto-kine production in immune response (GO 0002367) areinvolved in three cancer hallmarks Tissue Invasion andMetastasis Tumor Promoting Inflammation and EvadingImmune Detection [73] This observation implies that the corenetwork may control these cancer-related hallmarks
Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers
We divide the hub lncRNAs into two categories (1) conserved hublncRNAs which exist in at least three human cancers and (2)cancer-specific hub lncRNAs which only exist in single humancancer As a result we obtain 9 conserved hub lncRNAs and 828cancer-specific hub lncRNAs (include 11 GBM-specific 246 LSCC-specific 47 OvCa-specific and 524 PrCa-specific hub lncRNAs)
To evaluate whether the hub lncRNAs can distinguish meta-stasis risks of human cancers we use them to predict metasta-sis risks for tumor samples in GBM LSCC OvCa and PrCaAs shown in Figure 6A the conserved hub lncRNAs can discrim-inate the metastasis risks of tumor samples significantly(Log-rank P-valuelt 005) in four human cancers In Figure 6Bexcepting LSCC-specific hub lncRNAs owing to failing to fit aCox regression model GBM-specific OvCa-specific and PrCa-specific hub lncRNAs can discriminate the metastasis risks oftumor samples significantly in GBM OvCa and PrCa respec-tively (Log-rank P-valuelt 005) These results suggest that thehub lncRNAs are discriminative and can act as biomarkers todistinguish between high- and low-risk tumor samples
Experimentally validated lncRNAndashmRNA regulations aremostly bad hits for LncTar
Using a collection of experimentally validated lncRNAndashmRNAregulatory relationships (details in Supplementary File S3) asthe ground truth the numbers of experimentally confirmedlncRNAndashmRNA causal regulations are 17 14 20 and 42 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4)
Figure 5 Conservative network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) The core lncRNAndashmRNA causal network that
occurred in at least three human cancers The red diamond nodes and white circle nodes denote lncRNAs and mRNAs respectively (B) Survival analysis of the core
lncRNAndashmRNA causal network
12 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
We further apply a representative sequence-based methodcalled LncTar [11] to the experimentally validated lncRNAndashmRNAcausal regulatory relationships discovered by MSLCRN There aretwo main reasons for choosing LncTar First LncTar does nothave a limit to input RNA size Second LncTar uses a quantitativestandard rather than expert knowledge to determine whetherlncRNAs interact with mRNAs Similar to LncTar we also set -01as normalized binding free energy (ndG) cutoff to determinewhether lncRNAndashmRNA pairs interact with each other In otherwords the lncRNAndashmRNA pairs with ndG01 are regarded as
lncRNAndashmRNA regulatory relationships Among the experimen-tally confirmed lncRNAndashmRNA causal regulatory relationships thatare discovered by MSLCRN the numbers of successfully predictedlncRNAndashmRNA regulations using LncTar are 0 0 1 and 1 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4) The result indicates that our experimentally confirmedlncRNAndashmRNA causal regulations are mostly bad hits for LncTarMeanwhile this result also suggests that expression-based andsequence-based methods may be complementary with each otherin predicting lncRNAndashmRNA regulations
A
B
Figure 6 Survival analysis of hub lncRNAs (A) Conserved hub lncRNAs in GBM LSCC OvCa and PrCa datasets (B) Survival analysis of cancer-specific hub lncRNAs
Module-specific lncRNA-mRNA causal regulatory networks | 13
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
MSLCRN networks are biologically meaningful
In this section we conduct GO and KEGG enrichment analysisto check whether the MSLCRN networks are associated withsome biological processes and pathways significantlyEnrichment analysis uncovers that 15 of the 23 (6522)MSLCRN networks in GBM 29 of the 38 (7632) MSLCRN net-works in LSCC 30 of the 45 (6667) MSLCRN networks inOvCa and 20 of the 32 (6250) MSLCRN networks in PrCa aresignificantly enriched in at least one GO biological process orKEGG pathway respectively (details in Supplementary File S5)This result implies that most of the MSLCRN networks in eachcancer are functional networks
We further investigate whether the MSLCRN networks aresignificantly enriched in GBM LSCC OvCa and PrCa diseasesrespectively We discover that 5 of the 23 MSLCRN networks7 of the 38 MSLCRN networks 6 of the 45 MSLCRN networks and6 of the 32 MSLCRN networks are significantly enriched in GBMLSCC OvCa and PrCa diseases respectively (details inSupplementary File S5) This result indicates that severalMSLCRN networks are closely associated with GBM LSCC OvCaand PrCa diseases
Altogether functional and disease enrichment analysis resultsshow that MSLCRN networks are biologically meaningful
Comparison with other PC-based networkinference methods
Based on a parallel version of the PC algorithm [56] the parallelIDA method in the second step of MSLCRN learns the causalstructure from expression data Owing to the popularity of thePC algorithm in causal structure learning some other networkinference methods including PCA-CMI [74] PCA-PMI [75] andCMI2NI [76] have also successfully applied it for network infer-ence Different from the three methods using conditional orpartial mutual information to infer lncRNAndashmRNA regulationsour method estimates causal effects to identify lncRNAndashmRNAregulations For comparisons we also use the PCA-CMI PCA-PMI and CMI2NI methods to infer module-specific lncRNAndashmRNA regulatory relationships Similar to our method (whichuses the parallel IDA method) the strength cutoff of lncRNAndashmRNA regulatory relationships in PCA-CMI PCA-PMI andCMI2NI methods is also set to 045
We evaluate the performance of each method in terms offinding experimentally validated lncRNAndashmRNA regulatoryrelationships functional MSLCRN networks and disease-associated MSLCRN networks As shown in Table 4 in terms ofthe three criteria MSLCRN performs the best in GBM LSCCOvCa and PrCa data sets This result suggests that MSLCRN is auseful method to infer module-specific lncRNAndashmRNA regula-tory network in human cancers
Conclusions and discussion
Notwithstanding lncRNAs do not encode proteins directly theyengage in a wide range of biological processes including cancerdevelopments through their interactions with other biologicalmacromolecules eg DNA RNA and protein Therefore touncover the functions and regulatory mechanisms of lncRNAsit is necessary to investigate lncRNAndashtarget regulatory networkacross different types of biological conditions
As a biological network the lncRNAndashtarget regulatory net-work exhibits a high degree of modularity Each functionalmodule is responsible for implementing specific biological
functions Moreover modularity is an important feature ofhuman cancer development and progression Thus from a net-work community point of view it is necessary to investigatemodule-specific lncRNAndashmRNA regulatory networks
Until now several statistical correlation or associationmeasures eg Pearson Mutual Information and ConditionalMutual Information have been used to infer gene regulatorynetworks However these methods tend to identify indirect reg-ulatory relationships between genes The identified gene regu-latory networks cannot reflect real lsquocausalrsquo regulatoryrelationships To better understand lncRNA regulatory mecha-nism it is vital to investigate how lncRNAs causally influencethe expression levels of their target mRNAs
In this work the computational methods for inferringlncRNAndashmRNA interactions and the publicly available data-bases of lncRNAndashmRNA regulatory relationships are firstreviewed Then to address the above two issues we propose anovel computational method MSLCRN to study module-specific lncRNAndashmRNA causal regulatory networks across GBMLSCC OvCa and PrCa diseases In contrast to other approaches(expression-based and sequence-based methods) MSLCRN hastwo unique features First MSLCRN considers the modularity oflncRNAndashmRNA regulatory networks Instead of studying globalregulatory relationships between lncRNAs and mRNAs wefocus on investigating the regulatory behavior of lncRNAs in themodules of interest Second considering the restrictions withconducting gene knockout experiments MSLCRN uses thecausal inference method IDA to infer causal relationshipsbetween lncRNAs and mRNAs based on expression data Thepromising results suggest that exploiting modularity of generegulatory network and causality-based method could provideanother effective approach to elucidating lncRNA functions andregulatory mechanisms of human cancers
Despite the advantages of MSLCRN there is still room toimprove it First the WGCNA method only allows clusteringgenes across all samples from the matched lncRNA and mRNAexpression data In fact a class of genes may exhibit similarexpression patterns across a subset of samples An alternativesolution of this problem is to use a bi-clustering method to iden-tify lncRNAndashmRNA co-expression modules Second it is stilltime-consuming to estimate causal effects from large expres-sion data sets When constructing the module-specific lncRNAndashmRNA causal regulatory networks the running time of parallelIDA is still high on estimating the causal effects of lncRNAs onmRNAs In future more efficient parallel IDA method is neededto explore lncRNAndashmRNA causal regulatory relationships inlarge-scale expression data Third previous research [38]has shown that the prediction accuracy of lncRNAndashmRNA inter-actions can be improved by integrating both sequence data and
Table 4 Comparison results in terms of experimentally validatedlncRNAndashmRNA regulatory relationships functional MSLCRN net-works and disease-associated MSLCRN networks
Methods GBM (a b c) LSCC (a b c) OvCa (a b c) PrCa (a b c)
MSLCRN (17 15 5) (14 29 7) (20 30 6) (42 20 6)PCA-CMI (2 13 0) (0 11 0) (0 7 1) (0 20 2)PCA-PMI (2 15 1) (0 11 0) (0 8 2) (1 18 1)CMI2NI (2 15 0) (0 11 0) (0 7 1) (0 19 1)
Note afrac14number of experimentally validated lncRNAndashmRNA regulatory relation-
ships bfrac14number of functional MSLCRN networks cfrac14number of disease-asso-
ciated MSLCRN networks
14 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
expression data To improve the accuracy of the predictedlncRNAndashmRNA regulatory relationships it is necessary todevelop an ensemble method (fusing sequence-based andexpression-based methods) to infer lncRNAndashmRNA regulatorynetwork Finally recent studies [77] show that lncRNAs can actas competing endogenous RNAs (ceRNAs) or miRNA sponges toattract miRNAs for bindings by competing with mRNAsTherefore some predicted lncRNAndashmRNA regulatory relation-ships are lncRNA-related ceRNAndashceRNA interactions To furtherimprove the prediction of lncRNAndashmRNA regulatory relation-ships it is necessary to remove the crosstalk relationshipsbetween lncRNAs and mRNAs
Key Points
bull Among ncRNAs lncRNAs are a large and diverse classof RNA molecules and are thought to be a gold mine ofpotential oncogenes anti-oncogenes and newbiomarkers
bull lncRNAs exhibit dynamic positive gene regulationacross human cancers
bull Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers
bull There is still a lack of ground truth for validating pre-dicted lncRNAndashmRNA regulatory relationships
bull There is still room to develop reliable methods for elu-cidating lncRNA regulatory mechanisms
Supplementary Data
Supplementary data are available online at httpsacademicoupcombib
Funding
The National Natural Science Foundation of China (No61702069) the Applied Basic Research Foundation ofScience and Technology of Yunnan Province (No2017FB099) the NHMRC Grant (No 1123042) and theAustralian Research Council Discovery Grant (NoDP140103617)
References1 Pang KC Frith MC Mattick JS Rapid evolution of noncoding
RNAs lack of conservation does not mean lack of functionTrends Genet 200622(1)1ndash5
2 Kung JT Colognori D Lee JT Long noncoding RNAs pastpresent and future Genetics 2013193(3)651ndash69
3 Schmitt AM Chang HY Long noncoding RNAs in cancer path-ways Cancer Cell 201629(4)452ndash63
4 Zhang Y Tao Y Liao Q Long noncoding RNA a crosslink inbiological regulatory network Brief Bioinform 2017 doi 101093bibbbx042
5 Yoon JH Abdelmohsen K Gorospe M Posttranscriptionalgene regulation by long noncoding RNA J Mol Biol 2013425(19)3723ndash30
6 Gerlach W Giegerich R GUUGle a utility for fast exact match-ing under RNA complementary rules including G-U base pair-ing Bioinformatics 200622(6)762ndash4
7 Muckstein U Tafer H Hackermuller J et al Thermodynamicsof RNA-RNA binding Bioinformatics 200622(10)1177ndash82
8 Tafer H Hofacker IL RNAplex a fast tool for RNA-RNA inter-action search Bioinformatics 200824(22)2657ndash63
9 Busch A Richter AS Backofen R IntaRNA efficient predictionof bacterial sRNA targets incorporating target site accessibil-ity and seed regions Bioinformatics 200824(24)2849ndash56
10Kato Y Sato K Hamada M et al RactIP fast and accurate pre-diction of RNA-RNA interaction using integer programmingBioinformatics 201026(18)i460ndash6
11Li J Ma W Zeng P et al LncTar a tool for predicting the RNAtargets of long noncoding RNAs Brief Bioinform 201516(5)806ndash12
12Fukunaga T Hamada M RIblast an ultrafast RNA-RNA inter-action prediction system based on a seed-and-extensionapproach Bioinformatics 201733(17)2666ndash74
13Derrien T Johnson R Bussotti G et al The GENCODE v7 cata-log of human long noncoding RNAs analysis of their genestructure evolution and expression Genome Res 201222(9)1775ndash89
14Gloss BS Dinger ME The specificity of long noncoding RNAexpression Biochim Biophys Acta 20161859(1)16ndash22
15Munshi A Mohan V Ahuja YR Non-coding RNAs a dynamicand complex network of gene regulation J PharmacogenomicsPharmacoproteomics 20167156
16Liao Q Liu C Yuan X et al Large-scale prediction of long non-coding RNA functions in a coding-non-coding gene co-expression network Nucleic Acids Res 201139(9)3864ndash78
17Guo Q Cheng Y Liang T et al Comprehensive analysis oflncRNA-mRNA co-expression patterns identifies immune-associated lncRNA biomarkers in ovarian cancer malignantprogression Sci Rep 20155(1)17683
18Du Y Xia W Zhang J et al Comprehensive analysis of longnoncoding RNA-mRNA co-expression patterns in thyroidcancer Mol Biosyst 201713(10)2107ndash15
19Wu W Wagner EK Hao Y et al Tissue-specific co-expressionof long non-coding and coding RNAs associated with breastCancer Sci Rep 2016632731
20Barabasi AL Oltvai ZN Network biology understanding thecellrsquos functional organization Nat Rev Genet 20045(2)101ndash13
21Langfelder P Horvath S WGCNA an R package for weightedcorrelation network analysis BMC Bioinformatics 20089559
22Maathuis HM Kalisch M Buhlmann P Estimating high-dimensional intervention effects from observational dataAnn Stat 200937(6A)3133ndash64
23Maathuis HM Colombo D Kalisch M et al Predicting causaleffects in large-scale systems from observational data NatMethods 20107(4)247ndash8
24Le T Hoang T Li J et al A fast PC algorithm for high dimen-sional causal discovery with multi-core PCs IEEEACM TransComput Biol Bioinform 2016 doi 101109TCBB20162591526
25Du Z Fei T Verhaak RG et al Integrative genomic analysesreveal clinically relevant long noncoding RNAs in humancancer Nat Struct Mol Biol 201320(7)908ndash13
26Bernhart SH Tafer H Muckstein U et al Partition functionand base pairing probabilities of RNA heterodimersAlgorithms Mol Biol 20061(1)3
27Alkan C Karakoc E Nadeau JH et al RNA-RNA interactionprediction and antisense RNA target search J Comput Biol200613(2)267ndash82
28Seemann SE Richter AS Gesell T et al PETcofold predictingconserved interactions and structures of two multiple align-ments of RNA sequences Bioinformatics 201127(2)211ndash19
29Wenzel A Akbasli E Gorodkin J RIsearch fast RNA-RNAinteraction search using a simplified nearest-neighborenergy model Bioinformatics 201228(21)2738ndash46
Module-specific lncRNA-mRNA causal regulatory networks | 15
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
30Alkan F Wenzel A Palasca O et al RIsearch2 suffix array-based large-scale prediction of RNA-RNA interactions andsiRNA off-targets Nucleic Acids Res 201745e60
31Hu R Sun X lncRNATargets a platform for lncRNA target pre-diction based on nucleic acid thermodynamics J BioinformComput Biol 201614(4)1650016
32Terai G Iwakiri J Kameda T et al Comprehensive predictionof lncRNA-RNA interactions in human transcriptome BMCGenomics 201617(Suppl 1)12
33Liu J Wu S Li M et al LncRNA expression profiles reveal theco-expression network in human colorectal carcinoma Int JClin Exp Pathol 201691885ndash1892
34Huang S Feng C Chen L et al Identification of potential keylong non-coding RNAs and target genes associated withpneumonia using long non-coding RNA sequencing (lncRNA-Seq) a preliminary study Med Sci Monit 2016223394ndash408
35Li J Xu Y Xu J et al Dynamic co-expression network analysisof lncRNAs and mRNAs associated with venous congestionMol Med Rep 201614(3)2045ndash51
36Fu M Huang G Zhang Z et al Expression profile of long non-coding RNAs in cartilage from knee osteoarthritis patientsOsteoarthritis Cartilage 201523(3)423ndash32
37Zhang F Gao C Ma XF et al Expression profile of long non-coding RNAs in peripheral blood mononuclear cells frommultiple sclerosis patients CNS Neurosci Ther 201622(4)298ndash305
38 Iwakiri J Terai G Hamada M Computational prediction oflncRNA-mRNA interactionsby integrating tissue specificity inhuman transcriptome Biol Direct 201712(1)15
39Lv L Wei M Lin P et al Integrated mRNA and lncRNA expres-sion profiling for exploring metastatic biomarkers of humanintrahepatic cholangiocarcinoma Am J Cancer Res 20177688ndash99
40Hao Y Wu W Li H et al NPInter v30 an upgraded databaseof noncoding RNA-associated interactions Database 20162016baw057
41Chen G Wang Z Wang D et al LncRNADisease a databasefor long-non-coding RNA-associated diseases Nucleic AcidsRes 201341D983ndash6
42 Jiang Q Wang J Wu X et al LncRNA2Target a database fordifferentially expressed genes after lncRNA knockdown oroverexpression Nucleic Acids Res 201543D193ndash6
43Zhou Z Shen Y Khan MR et al LncReg a reference resourcefor lncRNA-associated regulatory networks Database 20152015bav083
44Denisenko E Ho D Tamgue O et al IRNdb the database ofimmunologically relevant non-coding RNAs Database 20162016baw138
45Liu CJ Gao C Ma Z et al lncRInter a database of experimen-tally validated long non-coding RNA interaction J GenetGenomics 201744(5)265ndash8
46Li JH Liu S Zhou H et al starBase v20 decoding miRNA-ceRNA miRNA-ncRNA and protein-RNA interaction net-works from large-scale CLIP-Seq data Nucleic Acids Res 201442(D1)D92ndash7
47Liu Y Zhao M lnCaNet pan-cancer co-expression networkfor human lncRNA and cancer genes Bioinformatics 201632(10)1595ndash7
48Zhou QZ Zhang B Yu QY et al BmncRNAdb a comprehen-sive database of non-coding RNAs in the silkworm Bombyxmori BMC Bioinformatics 201617(1)370
49Park C Yu N Choi I et al lncRNAtor a comprehensiveresource for functional investigation of long non-codingRNAs Bioinformatics 201430(17)2480ndash5
50Bhartiya D Pal K Ghosh S et al lncRNome a comprehensiveknowledgebase of human long noncoding RNAs Database20132013bat034
51Zhao Z Bai J Wu A et al Co-LncRNA investigating thelncRNA combinatorial effects in GO annotations and KEGGpathways based on human RNA-Seq data Database 20152015bav082
52 Jiang Q Ma R Wang J et al LncRNA2Function a compre-hensive resource for functional investigation of humanlncRNAs based on RNA-seq data BMC Genomics 201516(Suppl 3)S2
53Chan WL Huang HD Chang JG lncRNAMap a map of puta-tive regulatory functions in the long non-coding transcrip-tome Comput Biol Chem 20145041ndash9
54Langfelder P Horvath S Fast R functions for robust correla-tions and hierarchical clustering J Stat Softw 2012461ndash17
55 Judea P Causality Models Reasoning and Inference New YorkNY Cambridge University Press 2000
56Spirtes P Glymour C Scheines R Causation Prediction andSearch 2nd edn Cambridge MIT Press 2000
57Le T Hoang T Li J et al ParallelPC an R package for efficientconstraint based causal exploration arXiv prepring 2015arXiv151003042v1
58Hahn MW Kern AD Comparative genomics of centrality andessentiality in three eukaryotic protein-interaction networksMol Biol Evol 200522(4)803ndash6
59Song J Singh M Roth FP From hub proteins to hub modulesthe relationship between essentiality and centrality in theyeast interactome at different scales of organization PLoSComput Biol 20139(2)e1002910
60Therneau TM Grambsch PM Modeling Survival Data Extendingthe Cox Model New York Springer Press 2000
61Yu G Wang L-G Han Y He Q-Y clusterProfiler an R packagefor comparing biological themes among gene clusters OMICS201216(5)284ndash7
62Ashburner M Ball CA Blake JA et al Gene ontology tool forthe unification of biology Nat Genet 200025(1)25ndash9
63Kanehisa M Goto S KEGG Kyoto Encyclopedia of Genes andGenomes Nucleic Acids Res 200028(1)27ndash30
64Ning S Zhang J Wang P et al Lnc2Cancer a manually curateddatabase of experimentally supported lncRNAs associatedwith various human cancers Nucleic Acids Res 201644(D1)D980ndash5
65Wang Y Chen L Chen B et al Mammalian ncRNA-diseaserepository a global view of ncRNA-mediated disease net-work Cell Death Dis 20134e765
66Pi~nero J Bravo A Queralt-Rosinach N et al DisGeNET a com-prehensive platform integrating information on humandisease-associated genes and variants Nucleic Acids Res 201745(D1)D833ndash9
67Conway JR Lex A Gehlenborg N UpSetR an R package for thevisualization of intersecting sets and their propertiesBioinformatics 201733(18)2938ndash40
68Wahlestedt C Targeting long non-coding RNA to therapeuti-cally upregulate gene expression Nat Rev Drug Discov 201312(6)433ndash46
69Mantovani G Maccio A Lai P et al Cytokine activity incancer-related anorexiacachexia role of megestrol acetateand medroxyprogesterone acetate Semin Oncol 19982545ndash52
70Dorsam RT Gutkind JS G-protein-coupled receptors and can-cer Nat Rev Cancer 20077(2)79ndash94
71Wang X Lin Y Tumor necrosis factor and cancer buddies orfoes Acta Pharmacol Sin 200829(11)1275ndash88
16 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
72Fajardo AM Piazza GA Tinsley HN The role of cyclic nucleo-tide signaling pathways in cancer targets for prevention andtreatment Cancers 20146(1)436ndash58
73Hanahan D Weinberg RA Hallmarks of cancer the next gen-eration Cell 2011144(5)646ndash74
74Zhang X Zhao XM He K et al Inferring gene regulatory net-works from gene expression data by path consistencyalgorithm based on conditional mutual informationBioinformatics 201228(1)98ndash104
75Zhao J Zhou Y Zhang X et al Part mutual information forquantifying direct associations in networks Proc Natl Acad SciUSA 2016113(18)5130ndash5
76Zhang X Zhao J Hao JK et al Conditional mutual inclusiveinformation enables accurate quantification of associationsin gene regulatory networks Nucleic Acids Re 201543(5)e31
77Le TD Zhang J Liu L et al Computational methods for identi-fying miRNA sponge interactions Brief Bioinform 201718(4)577ndash90
Module-specific lncRNA-mRNA causal regulatory networks | 17
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018View publication statsView publication stats
Table 1 Summary of computational methods or tools for inferring lncRNAndashmRNA interactions
Methodstools Categories of methods Brief descriptions Available
GUUGle [6] Sequence-based Target prediction by locating potential helical regions of RNAndashRNA pairs under RNA base pairing rules which include G-Ubases
httpbibiserv2cebitecuni-bielefelddeguugle
RNAup [7] Sequence-based Target prediction by studying thermodynamics of RNAndashRNApairs based on the sum of the energy of binding andhybridization
httprnatbiunivieacatcgi-binRNAWebSuiteRNAupcgi
RNAcofold [26] Sequence-based Target prediction by computing the hybridization energy andbase pairing pattern of RNAndashRNA pairs
httprnatbiunivieacatcgi-binRNAWebSuiteRNAcofoldcgi
Alkan et al [27] Sequence-based Target prediction by minimizing the joint free energy of RNAndashRNA pairs under a number of energy models including basepair energy model stacked pair energy model loop energymodel
On request
RNAplex [8] Sequence-based Target prediction by finding possible hybridization sites ofRNAndashRNA pairs
httpwwwtbiunivieacathtafer
IntaRNA [9] Sequence-based Target prediction by incorporating accessibility of target sitesas well as the existence of a user-definable seed
httprnainformatikuni-freiburgdeIntaRNAInputjsp
RactIP [10] Sequence-based Target prediction by integrating approximate information onan ensemble of equilibrium joint structures into the objec-tive function of integer programming
httprtipsdnabiokeioacjpractip
PETcofold [28] Sequence-based Target prediction by taking covariance information in intra-molecular and intermolecular base pairs into account
httprthdkresourcespetcofold
RIsearch [29] Sequence-based Target prediction by implementing a simplified Turner energymodel for fast computation of hybridization
httpsrthdkresourcesrisearchrisearch1php
RIsearch2 [30] Sequence-based An updated version of RIsearch and predict targets using a sin-gle integrated seed-and-extend framework based on suffixarrays
httpsrthdkresourcesrisearch
LncTar [11] Sequence-based lncRNA target prediction by finding the minimum free energyjoint structure of RNAndashRNA pairs based on base pairing
httpwwwcuilabcnlnctar
lncRNATargets [31] Sequence-based lncRNA target prediction based on nucleic acidthermodynamics
httpwwwherbbolorg 8001lrt
Terai et al [32] Sequence-based lncRNA target prediction by developing an integrated pipelineon the K computer which is one of the fastest super-com-puters in the world
httprtoolscbrcjpcgi-binRNARNAindexpl
RIblast [12] Sequence-based Target prediction based on the seed-and-extension approach httpgithubcomfukunagatsuRIblast
Liao et al [16] Expression-based Identify lncRNAndashmRNA interactions by using Pearson methodand the identified lncRNAndashmRNA interactions should be co-expressed in the same direction in no less than 3 Mousemicroarray data sets
On request
Guo et al [17] Expression-based Identify lncRNAndashmRNA interactions by using Pearson methodin OvCa malignant progression
On request
Du et al [18] Expression-based Identify lncRNAndashmRNA interactions by using Pearson methodand a power function in thyroid cancer
On request
Liu et al [33] Expression-based Identify lncRNAndashmRNA interactions by using Pearson methodin human colorectal carcinoma
On request
Huang et al [34] Expression-based Identify lncRNAndashmRNA interactions associated with pneumo-nia by using Pearson method
On request
Li et al [35] Expression-based Identify dynamic lncRNAndashmRNA interactions associated withvenous congestion by using Pearson method
On request
Wu et al [19] Expression-based Identify lncRNAndashmRNA interactions by using a generalized lin-ear model to regress mRNA expression on lncRNA expres-sion in breast cancer
On request
Fu et al [36] Expression-based Identify lncRNAndashmRNA interactions by considering mRNA lociwithin lncRNA and the Pearson correlation in cartilage
On request
Zhang et al [37] Expression-based Identify lncRNAndashmRNA interactions by considering mRNA lociwithin lncRNA and the Pearson correlation in cartilage inperipheral blood mononuclear cells
On request
Iwakiri et al [38] Expression-based Identify tissue-specific lncRNAndashmRNA interactions by integrat-ing the tissue specificity of lncRNAs and mRNAs intosequence-based prediction of human lncRNAndashRNAinteractions
On request
Lv et al [39] Expression-based Identify tissue-specific lncRNAndashmRNA interactions by usingPearson and sequence-based methods and in human intra-hepatic cholangiocarcinoma
On request
Module-specific lncRNA-mRNA causal regulatory networks | 3
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
predicted free energies of binding with RNA interference experi-mental data RNAup can produce biologically reasonableresults For genome-wide predictions of ncRNA targets RNAupis not fast enough Therefore it is usually to be combined withother faster RNAndashRNA prediction methods
To extend the standard dynamic programming algorithmsfor computing RNA secondary structures Bernhart et al [26]propose a program named RNAcofold to compute the hybridiza-tion energy and base pairing pattern of the co-folding of twoRNA molecules However the method disregards some impor-tant interaction structures and is restricted to dimeric com-plexes Moreover for the RNAndashRNA interaction predictionpredicting the joint secondary structure of two interacting RNAsis also important To solve it Alkan et al [27] develop severalalgorithms to minimize the joint free energy between the twoRNAs under a number of energy models Assuming that con-served RNAndashRNA interactions imply conserved functionSeemann et al [28] also implement a comparative method calledPETcofold to predict the joint secondary structure of two inter-acting RNAs As PETcofold considers sequence conservation anincreasing amount of structural covariance can further improveits performance
RNAup [7] and RNAcofold [26] are too slow for genome-widesearch in finding target sites of ncRNAs To accelerate the speedof RNAndashRNA interaction predictions RNAplex [8] is presented toquickly find possible hybridization sites between two interact-ing RNAs To focus on the target search on short highly stableinteractions RNAplex introduces a per nucleotide penaltyMeanwhile another general and fast approach IntaRNA [9] isproposed to efficiently predict bacterial RNAndashRNA interactionsCompared with other existing target prediction methodsIntaRNA considers both the accessibility of target sites and theexistence of a user-defined seed Therefore it shows a higheraccuracy than competing methods Kato et al [10] also present afast and accurate prediction method RactIP for comprehensivetype of RNAndashRNA interactions In terms of predicting joint sec-ondary structures of two interacting RNAs RactIP run incompa-rably faster than competitive programs
To further achieve a speed improvement of predictingRNAndashRNA interactions Wenzel et al [29] present RIsearch forfast computation of hybridization between two interactingRNAs They show that the energy model of RIsearch is anaccurate approximation of the full energy model for near-complementary RNAndashRNA duplexes Furthermore RIsearch isfaster than RNAplex [8] in RNAndashRNA interaction searchRecently RIsearch2 [30] an updated version of RIsearch [29] isproposed to localize potential near-complementary RNAndashRNAinteractions between two RNA sequences The comparisonresults show that RIsearch2 is much faster than the previousmethods such as GUUGle [6] RNAplex [8] IntaRNA [9] andRIsearch [29]
Although the above RNAndashRNA interaction prediction meth-ods can be extended to predict lncRNAndashmRNA interactionsnone of them are exclusively used for identifying the RNA tar-gets of lncRNAs in a large scale To efficiently identify lncRNAndashmRNA interactions Li et al [11] propose a tool named LncTarLncTar explores lncRNAndashmRNA interactions by finding the min-imum free energy joint structure of two interacting RNAs basedon base pairing As LncTar runs fast and does not have a limitto RNA size it can be used for large-scale identification of theRNA targets for all RNAs Another web-based platformlncRNATargets [31] is also provided for lncRNA target predic-tion Because there is no limit to RNA size lncRNATargets canalso be used to identify the RNA targets of all RNAs In a whole
human transcriptome Terai et al [32] develop an integratedpipeline to predict lncRNAndashmRNA interactions for the first timeIn the pipeline IntaRNA [9] is used to calculate interactionenergy and RactIP [10] is used to predict joint secondary struc-ture Recently to further shorten the running time of predictinglncRNAndashmRNA interactions an ultrafast RNAndashRNA interactionprediction method RIblast [12] based on the seed-and-extensionmethod is presented The comparison results show that RIblastruns faster than RNAplex [8] IntaRNA [9] Terai et al pipeline[32] and thus can be applied to a large scale of lncRNA targetidentification
Expression-based method
At the gene expression level the co-expressed lncRNAndashmRNApairs are regarded as lncRNAndashmRNA interactions for theexpression-based methods Among the existing expression-based methods [16ndash19 33ndash39] Pearson correlation method is akey step of most methods to identify co-expressed lncRNAndashmRNA pairs
Liao et al [16] construct a lncRNAndashmRNA co-expression net-work from re-annotated mouse microarray data sets By usingPearson method they only keep the lncRNAndashmRNA pairs withPlt 001 and Pearson correlation ranked in the top or bottom005 percentile The study is the first large-scale prediction oflncRNA functions from a lncRNAndashmRNA co-expression networkTo identify immune-associated lncRNA biomarkers in OvCa Guoet al [17] make a comprehensive analysis of lncRNAndashmRNA co-expression patterns To identify lncRNAndashmRNA co-expressionpairs they calculate Pearson correlation between differentiallyexpressed lncRNAs and mRNAs They only reserve the lncRNAndashmRNA co-expression pairs with Pearson correlationgt 05 and thecorresponding False Discovery Rate (FDR)lt 001 Liu et al [33] andHuang et al [34] also use Pearson method to study lncRNAndashmRNAco-expression networks in human colorectal carcinoma andpneumonia respectively The inferred lncRNAndashmRNA co-expression networks will help to study lncRNA functionsRecently Du et al [18] propose a two-step method to conduct acomprehensive analysis of lncRNAndashmRNA co-expression patternsin thyroid cancer First they use Pearson method to calculatePearson correlation and the cutoff of Pearson correlation is 05and the corresponding FDR cutoff is 001 Second the Pearson cor-relations are transformed into an adjacency matrix
Owing to dynamic characteristic of gene regulatory net-works Wu et al [19] identify two distinct lncRNAndashmRNA co-expression networks in tumor and normal breast tissue Theyuse a generalized linear model to regress mRNA expression onlncRNA expression in tumor and normal breast tissue and onlyfocus on dynamic breast lncRNAndashmRNA co-expression pairsthat differ in tumor and normal breast tissue Meanwhile tostudy the potential role of lncRNAs in venous congestion Liet al [35] also construct a dynamic lncRNAndashmRNA co-expression network By using Pearson method they separatelycalculate Pearson correlations of each lncRNAndashmRNA pair invenous congestion and normal samples The lncRNAndashmRNApairs with Pearson correlationgt099 orlt099 and P-valuelt001are selected as lncRNAndashmRNA co-expression pairs They con-struct two types of lncRNAndashmRNA co-expression networkslsquolostrsquo network where lncRNAndashmRNA co-expression pairs onlyexisted in normal samples and lsquoobtainedrsquo network wherelncRNAndashmRNA co-expression pairs only existed in venous con-gestion samples The lsquolostrsquo and lsquoobtainedrsquo networks are furtherintegrated to obtain a dynamic lncRNAndashmRNA co-expressionnetwork
4 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
The above methods simply use matched lncRNA and mRNAexpression data to identify lncRNAndashmRNA co-expression pairsTo identify lsquocis-regulated target genesrsquo of lncRNAs some methodsalso consider mRNA loci information within lncRNA For exampleFu et al [36] combine mRNA loci information and matchedlncRNA and mRNA expression data to predict lncRNA targetsThey identify the mRNAs as targets under two conditions (i) themRNA loci are within a 300-kb window up- or downstream oflncRNA and (ii) lncRNAndashmRNA co-expression pairs are signifi-cantly positive correlated (Pearson correlationgt 08 and the corre-sponding P-valuelt 005) Zhang et al [37] also use a similarmethod to Fu et al [36] for identifying lncRNA targets The mRNAscan be regarded as targets when (1) the mRNA loci are within a10 window up- or downstream of lncRNA and (2) lncRNAndashmRNAco-expression pairs are significantly positive correlated (Pearsoncorrelationgt 098 and the corresponding P-valuelt 005)
Apart from mRNA loci information within lncRNA someemerging methods consider predictions from sequence-basedmethods as putative lncRNAndashmRNA interactions For exampleIwakiri et al [38] integrate tissue-specific lncRNA and mRNAexpression data into predictions from a sequence-basedmethod in [32] They discover that integrating tissue specificitycan improve prediction accuracy of lncRNAndashmRNA interactionsLv et al [39] also combine matched lncRNA and mRNA expres-sion data with predictions from a sequence-based methodLncTar [11] They first use Pearson method to identify co-expressed lncRNAndashmRNA co-expression pairs with Pearsoncorrelationgt095 orlt095 Then LncTar is used to further filterthe identified lncRNAndashmRNA co-expression pairs
Public databases for storing lncRNAndashmRNAregulatory relationships
In this section we review the public databases of storinglncRNAndashmRNA regulatory relationships Table 2 shows a sum-mary of the third-party public databases including experimen-tally validated and computationally predicted databases
NPInter [40] contains experimentally validated interactionsbetween ncRNAs especially lncRNAs and miRNAs The databasecontains 915 067 interactions in 188 tissues or cell lines from 68kinds of experimental technologies There is a classification ofthe functional interactions based on the functional process thatncRNA is involved in Moreover NPInter allows users to searchinteractions related publications and other information
LncRNADisease [41] not only collects experimentally sup-ported lncRNAndashdisease associations and lncRNA interactionsbut also predicts novel lncRNAndashdisease associations Recentlythe database curates 478 entries of experimentally validatedlncRNA interactions LncRNADisease provides users severalways to search lncRNA-related diseases and interactions
To study differentially expressed genes after lncRNA knock-down or overexpression Jiang et al [42] develop a databasecalled LncRNA2Target in human and mouse organisms Thedatabase has a collection of 396 experimentally validatedlncRNAndashtarget interactions In LncRNA2Target if a gene is dif-ferentially expressed after lncRNA knockdown or overexpres-sion it is regarded as a target of a lncRNA For convenienceLncRNA2Target allows users to search for the targets of singlelncRNA or for the lncRNAs that target a specific geneMeanwhile Zhou et al [43] also build a reference resourceLncReg for lncRNA-related regulatory networks The databasehas 1081 experimentally validated lncRNA-related regulatory
records between 258 nonredundant lncRNAs and 571 nonredun-dant genes
IRNdb [44] is a database that focuses on collecting immuno-logically relevant lncRNAndashtarget miRNAndashtarget and PIWI-interacting RNAndashtarget interactions The current version ofIRNdb documents 22 453 immunologically relevant lncRNAndashtar-get interactions by integrating three databases LncRNADisease[41] LncRNA2Target [42] and LncReg [43] The aim is to helpresearchers study the roles of ncRNAs in the immune systemRecently a new experimentally validated database namedlncRInter [45] was developed to collect reliable and high-qualitylncRNAndashtarget interactions The extracted lncRNAndashtarget inter-actions are all from published literature and are supported bycertain biological experiments (eg luciferase reporter assayin vitro binding assay RNA pull-down) In total lncRInter con-tains 1036 experimentally validated lncRNAndashtarget interactionsin 15 organisms
In addition to the experimentally validated databases pre-sented above there are several computationally predicted data-bases for collecting lncRNAndashmRNA interactions For instancestarBase [46] is a comprehensive database of systematicallyidentifying the RNAndashRNA and proteinndashRNA interaction net-works from 108 CLIP-Seq (PAR-CLIP HITS-CLIP iCLIP CLASH)data sets The lncRNAndashmRNA interactions can be extractedfrom proteinndashRNA interaction networks InCaNet [47] aimsto establish a comprehensive regulatory network betweenlncRNAs and cancer genes They identify lncRNAndashcancergene interactions by computing gene co-expression betweenlncRNAs and cancer genes BmncRNAdb [48] is a comprehensivedatabase of silkworm lncRNAs and miRNAs The database pro-vides three online tools for users to predict both lncRNAndashtargetand miRNAndashtarget interactions lncRNAtor [49] collect expres-sion data from 243 RNA-seq experiments including 5237samples of various tissues and developmental stages ThelncRNAndashmRNA co-expression pairs are identified through co-expression analysis of lncRNAs and mRNAs lncRNome [50] is acomprehensive knowledgebase of sequence structure biologi-cal functions genomic variations and epigenetic modificationson gt17 000 lncRNAs in human For lncRNAndashprotein interactionsthe database incorporates PAR-CLIP experiments and a supportvector machine-based prediction method Co-lncRNA [51] andLncRNA2Function [52] predict co-expressed lncRNAndashmRNAinteractions from RNA-Seq data and further annotates thepotential functions of human lncRNAs using functional enrich-ment analysis lncRNAMap [53] is an integrated and compre-hensive database to explore regulatory functions of humanlncRNAs By integrating small RNAs supported by publicly avail-able deep sequencing data lncRNAMap construct lncRNA-derived siRNAndashtarget interactions
In summary for experimentally validated databases userscan select individual database or combine several databases asground truth to validate the predicted lncRNAndashmRNA interac-tions As for computationally predicted databases they can beused as initial structural of sequence-based or expression-basedmethods to identify lncRNAndashmRNA interactions
Inferring and analyzing MSLCRN networksRepurposed microarray data across human cancers
We collect the repurposed lncRNA and mRNA expression dataof GBM LSCC OvCa and PrCa from [25] A lncRNA or mRNA iseliminated if it does not have a corresponding gene symbol in adata set By calculating average expression values of replicate
Module-specific lncRNA-mRNA causal regulatory networks | 5
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
lncRNAs and mRNAs we obtain unique expression value ofthese replicates Consequently we get the matched expressiondata of 9704 lncRNAs and 18 282 mRNAs in 451 GBM 113 LSCC585 OvCa and 150 PrCa samples
Pipeline of MSLCRN
As shown in Figure 2 MSLCRN contains the following threesteps to infer module-specific lncRNAndashmRNA causal regulatorynetworks
i Identification of lncRNAndashmRNA co-expression modulesGiven the matched lncRNA and mRNA expression data weuse WGCNA to generate gene co-expression modules Amodule containing at least two lncRNAs and two mRNAsare regarded as a lncRNAndashmRNA co-expression moduleand used as the input of the second step
ii Identification of module-specific lncRNAndashmRNA causal reg-ulatory networks For each lncRNAndashmRNA co-expressionmodule with each lncRNAndashmRNA pair we apply parallelIDA to estimate the causal effect of the lncRNA on the
Table 2 Public databases for storing lncRNAndashmRNA regulatory relationships
Databases Types of databases Brief descriptions Organisms Available
NPInter [40] Validated A database of experimentally verified func-tional interactions between ncRNAs(including lncRNAs miRNAs etc) and bio-molecules (proteins RNAs and DNAs)
22 organisms httpwwwbioinfoorgNPInter
LncRNADisease [41] Validated A database of experimentally supportedlncRNAndashdisease association data andlncRNAndashtarget interactions in various lev-els including protein RNA miRNA andDNA
Human httpwwwcuilabcnlncrnadisease
LncRNA2Target [42] Validated A database of lncRNAndashtarget regulatory rela-tionships experimentally validated bylncRNA knockdown or overexpression
Human mouse httpwwwlncrna2targetorg
LncReg [43] Validated A database of experimentally validatedlncRNAndashtarget interactions from publicliterature
7 organisms httpbioinformaticsustceducnlncreg
IRNdb [44] Validated A database of immunologically relevantncRNAs (miRNAs lncRNAs and otherncRNAs) and target genes
Human mouse httpcompbiomasseyacnzappsirndb
lncRInter [45] Validated A database of experimentally validatedlncRNAndashtarget interactions extracted frompeer-reviewed publications
15 organisms httpbioinfolifehusteducnlncRInter
starBase [46] Predicted A comprehensive database of systematicallyidentifying the RNAndashRNA and proteinndashRNAinteraction networks from 108 CLIP-Seq(PAR-CLIP HITS-CLIP iCLIP CLASH) datasets
Human httpstarbasesysueducn
lnCaNet [47] Predicted A database of establishing a comprehensiveregulatory network source for lncRNA andcancer genes
Human httplncanetbioinfo-minzhaoorg
BmncRNAdb [48] Predicted A comprehensive database of the silkwormlncRNAs and miRNAs as well as the threeonline tools for users to predict the targetgenes of lncRNAs or miRNAs
Bombyx mori httpgenecqueducnBmncRNAdbindexphp
lncRNAtor [49] Predicted A comprehensive resource of encompassingannotation sequence analysis geneexpression protein binding and phyloge-netic conservation
6 organisms httplncrnatorewhaackr
lncRNome [50] Predicted A comprehensive knowledgebase on thetypes chromosomal locations descriptionon the biological functions and diseaseassociations of lncRNAs
Human httpgenomeigibresinlncRNome
Co-LncRNA [51] Predicted A computationally predicted database toidentify GO annotations and KEGG path-ways affected by co-expressed protein-cod-ing genes of a single or multiple lncRNAs
Human httpwwwbio-bigdatacomCo-LncRNA
LncRNA2Function [52] Predicted A comprehensive resource of investigatingthe functions of lncRNAs based on co-expressed lncRNAndashmRNA interactions
Human httpmlghiteducnlncrna2function
lncRNAMap [53] Predicted An integrated and comprehensive databaseof regulatory functions of lncRNAs and act-ing as ceRNAs
Human httplncRNAMapmbcnctuedutw
6 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
mRNA We use the absolute value of the causal effect(AVCE) to evaluate the strength of the regulation of thelncRNA on the mRNA and a higher AVCE indicates a stron-ger lncRNA regulation The lncRNAndashmRNA pairs with highAVCEs in each module are considered as module-specificlncRNAndashmRNA causal regulatory relationships and we calleach module with these relationships identified a module-specific causal regulatory network
iii Identification of global lncRNAndashmRNA causal regulatorynetwork We integrate the module-specific lncRNAndashmRNA
causal regulatory networks to form the global lncRNAndashmRNA causal regulatory network
Identification of lncRNAndashmRNA co-expression modules
In systems biology WGCNA [21] is a popular method for findingthe correlation patterns among genes across samples and canbe used to identify clusters or modules of highly co-expressedgenes Therefore we use WGCNA to first infer lncRNAndashmRNAco-expression modules
Figure 2 The pipeline of MSLCRN First WGCNA is used to identify lncRNAndashmRNA co-expression modules from matched lncRNA and mRNA expression data Second
we infer lncRNAndashmRNA causal regulatory relationships in each module by using parallel IDA method For each module we assemble the identified lncRNAndashmRNA reg-
ulatory relationships to obtain a module-specific lncRNAndashmRNA causal regulatory network Third the module-specific lncRNAndashmRNA causal regulatory networks are
integrated to form a global lncRNAndashmRNA causal regulatory network
Module-specific lncRNA-mRNA causal regulatory networks | 7
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
Specifically the matched lncRNA and mRNA expressiondata are used as the input of WGCNA For each pair of genesi and j the gene co-expression similarity sij of the pair is definedas
sij frac14 jcorethi jTHORNj (1)
where jcor(i j)j is the absolute value of the Pearson correlationbetween genes i and j The gene co-expression similarity matrixis denoted by Sfrac14 [sij]
To pick an appropriate soft-thresholding power for trans-forming the similarity matrix S into an adjacency matrix A weuse the scale-free topology criterion for soft-thresholding andthe minimum scale free topology fitting index R2 is set as 09Then the topological overlap matrix (TOM) Wfrac14 [wij] is gener-ated based on the adjacency matrix Afrac14 [aij] The TOM similaritywij between genes i and j is defined
wij frac14P
uaiuauj thorn aij
minfP
uaiuP
uaujg thorn 1 aij(2)
where u denotes all genes of the matched lncRNA and mRNAexpression data The TOM dissimilarity between genes i and j isdenoted by dijfrac14 1 - wij To identify gene co-expression modulesthe TOM dissimilarity matrix Dfrac14 [dij] is clustered using optimalhierarchical clustering method [54] Here the identified geneco-expression modules are groups of lncRNAs and mRNAs withhigh topological overlap The lncRNAs and mRNAs of eachlncRNAndashmRNA co-expression module are considered for possi-ble lncRNAndashmRNA causal relationships in the next step
Identification of module-specific lncRNAndashmRNA causalregulatory networks
After the identification of lncRNAndashmRNA co-expression mod-ules we use the parallel IDA method [24] to estimate causaleffects of possible lncRNAndashmRNA causal pairs in each moduleThe application of parallel IDA method to matched lncRNAand mRNA expression data for estimating causal effectsincludes two steps (i) learning the causal structure fromexpression data using the parallel-PC algorithm [24] and(ii) estimating the causal effects of lncRNAs on mRNAs byapplying do-calculus [55]
In step (i) Vfrac14 L1 Lm T1 Tn is a set of random varia-bles denoting m lncRNAs and n mRNAs The causal structure isin the form of a DAG where a node denotes a lncRNA Li ormRNA Tj and an edge between two nodes represents a causalrelationship between them We use the parallel-PC algorithm aparallel version of the PC algorithm [56] to learn the causalstructures (the DAGs) from expression data Starting with a fullyconnected undirected graph the parallel-PC algorithm deter-mines if an edge is retained or removed in the graph by con-ducting conditional independence tests in parallel Then to geta DAG the directions of edges in the obtained graph are ori-ented As different DAGs may represent the same conditionalindependence the parallel-PC algorithm uses a completed par-tially directed acyclic graph (CPDAG) to uniquely describe anequivalence class of DAGs In this work we use the R-packageParallelPC [57] to implement the parallel-PC algorithm and setthe significant level of the conditional independence testsafrac14 001
In step (ii) we are only interested in estimating the causaleffect of the directed edge Li Tj where vertex is Li a parent ofvertex Tj As described above a CPDAG may generate a class ofDAGs For the causal effect of Li Tj in a CPDAG we use do-calculus [55] to estimate the causal effects of Li on Tj in a class ofDAGs Then we use the minimum absolute value of all possiblecausal effects as a final causal effect of Li Tj As for the detailsof how the parallel IDA method is applied to estimate causalrelationships from expression data the readers can refer to [24]
The estimated causal effects can be positive or negativereflecting the up or down regulation by the lncRNAs on themRNAs For the purpose of constructing the regulatory net-works we use the absolute values of the causal effects (AVCEs)to evaluate the strengths of the regulation and thus to confirmthe regulatory relationships
We set different AVCE cutoffs from 010 to 060 with a step of005 to generate MSLCRN networks in GBM LSCC OvCa andPrCa respectively For each cutoff we merge the identifiedMSLCRN networks to obtain global lncRNAndashmRNA causal regu-latory networks in the four human cancers respectively Asshown in Table 3 a higher cutoff selection causes a smallerglobal lncRNAndashmRNA causal regulatory network but bettergoodness of fit To make a trade-off between the size of theglobal lncRNAndashmRNA causal regulatory networks and goodnessof fit we set a compromised AVCE cutoff with a value of 045 Ifthe AVCE of a lncRNA on a mRNA is 045 or above we considerthere is a causal regulatory relationship between the lncRNAndashmRNA pair Under the compromise cutoff we have a moderatesize of the global lncRNAndashmRNA causal regulatory networks inGBM LSCC OvCa and PrCa Meanwhile the node degree distri-butions of four global lncRNAndashmRNA causal regulatory net-works also follow power law distribution (the fitted power curveis in the form of yfrac14 axb) well with R2gt 08
Validation survival and enrichment analysis
Previous studies have demonstrated that about 20 of thenodes in a biological network are essential and are regarded ashub genes [58 59] Therefore when analyzing a global lncRNAndashmRNA causal network we select the 20 of lncRNAs with thehighest degrees in the network as hub lncRNAs The degree of alncRNA node in the global network is the number of mRNAsconnected with it
To validate the predicted module-specific lncRNAndashmRNAcausal regulatory relationships we obtain the experimentallyvalidated lncRNAndashmRNA regulatory relationships from thethree widely used databases NPInter v30 [40] LncRNADiseasev2017 [41] and LncRNA2Target v12 [42] Furthermore we retainexperimentally validated lncRNAndashmRNA regulatory relation-ships associated with the four human cancer data sets asground truth
We perform survival analysis using the R-package survival[60] A multivariate Cox model is used to predict the risk scoreof each tumor sample Then all tumor samples in each cancerdata set are equally divided into high- and low-risk groupsaccording to their risk scores Moreover we calculate theHazard Ratio between the high- and the low-risk groups andperform the Log-rank test
To further investigate the underlying biological processesand pathways related to each of the MSLCRN networks we usethe R-package clusterProfiler [61] to conduct functional enrich-ment analysis on the networks respectively The GeneOntology (GO) [62] biological processes and Kyoto Encyclopedia
8 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
of Genes and Genomes (KEGG) [63] pathways with adjustedP-valuelt005 [adjusted by Benjamini-Hochberg (BH) method]are regarded as functional categories for the MSLCRN networks
We also collect a list of lncRNAs and mRNAs that areassociated with GBM LSCC OvCa and PrCa to study diseaseenrichment of each of the MSLCRN networks The list of disease-associated lncRNAs is obtained from LncRNADisease v2017 [41]Lnc2Cancer v2016 [64] and MNDR v20 [65] The list of disease-associated mRNAs is from DisGeNET v50 [66] To evaluatewhether a MSLCRN network is significantly enriched in a specificdisease we use a hyper-geometric distribution test as follows
p frac14 1 FethxjBNMTHORN frac14 1Xx1
ifrac140
N
i
B N
M i
B
M
(3)
In the formula B is the number of all genes in the expressiondata set N denotes the number of all genes associated with aspecific disease in the expression data set M is the number ofgenes in a MSLCRN network and x is the number of genes asso-ciated with a specific disease in a MSLCRN network A MSLCRNnetwork is significantly enriched in a specific disease if theP-valuelt 005
Network analysis validation and comparisonon MSLCRN networkslncRNAs exhibit dynamic positive gene regulationacross cancers
By following the first step of the MSLCRN method we haveidentified 23 38 45 and 32 lncRNAndashmRNA co-expression mod-ules in GBM LSCC OvCa and PrCa respectively In the secondstep of the MSLCRN method we eliminate the noncausallncRNAndashmRNA pairs in lncRNAndashmRNA co-expression modulesAs a result we generate 23 38 45 and 32 module-specificlncRNAndashmRNA causal regulatory networks in GBM LSCC OvCaand PrCa respectively After merging the module-specificlncRNAndashmRNA causal regulatory networks for each data set weobtain the four global lncRNAndashmRNA regulatory networks inGBM LSCC OvCa and PrCa respectively
To understand the overlap and difference of module-specificgenes module-specific lncRNAndashmRNA causal regulatory rela-tionships and module-specific hub lncRNAs in the four humancancers we generate three set intersection plots using theR-package UpSetR [67] As shown in Figure 3 we find that themajority of module-specific genes (5752) module-specificlncRNAndashmRNA causal regulatory relationships (9902) andmodule-specific hub lncRNAs (8922) tend to be cancer-specific Only a small portion of module-specific genes (396) andmodule-specific lncRNAndashmRNA causal regulatory relationships(6) are shared by the four cancers Especially none of themodule-specific hub lncRNAs are common between the fourcancers In addition the causal effects are positive for 99569672 9993 and 7863 of the causal regulatory relationshipsidentified in GBM LSCC OvCa and PrCa respectively Theseresults indicate that lncRNAs are more likely to exhibit dynamicpositive gene regulation across cancers The results are alsoconsistent with the proposition that the positive gene regula-tion by lncRNAs would be desired in specific situations [68]
Differential network analysis uncovers cancer-specificlncRNAndashmRNA causal networks
In this section we focus on studying cancer-specific lncRNAndashmRNA causal networks using differential network analysisThus the GBM-specific LSCC-specific OvCa-specific and PrCa-specific lncRNAndashmRNA causal networks are identified Asshown in Figure 4A the distributions of node degrees in thesefour cancer-specific lncRNAndashmRNA causal networks followpower law distributions well with R2frac14 09774 09923 09723and 08310 respectively Thus these four cancer-specificlncRNAndashmRNA causal networks are scale free indicating thatmost mRNAs are regulated by a small number of lncRNAs
Table 3 Degree distributions of global lncRNAndashmRNA causal regula-tory networks with different cutoffs in GBM LSCC OvCa and PrCa
Datasets Cutoffs Number of causalregulations
yfrac14axb R2
GBM 010 11 847 yfrac142274x06893 04161015 10 924 yfrac142495x07275 05460020 9732 yfrac142745x0767 06475025 8461 yfrac142958x08074 06757030 7176 yfrac143194x08319 06807035 6041 yfrac143363x08703 07203040 4997 yfrac143741x09348 07999045 4074 y54082x21034 08694050 3279 yfrac144194x118 09244055 2583 yfrac143896x1259 09463060 1862 yfrac143666x143 09792
LSCC 010 789 172 yfrac143143x06071 04829015 684 524 yfrac143475x06323 05841020 569 369 yfrac143905x06525 06578025 451 346 yfrac144855x06928 07789030 340 860 yfrac146341x07554 08796035 244 547 yfrac148147x08379 09504040 166 593 yfrac149724x0935 09848045 108 024 y51031x21018 09933050 66 335 yfrac149425x1068 09963055 37 632 yfrac147807x1089 09948060 19 547 yfrac146565x1169 09972
OvCa 010 333 146 yfrac143272x05928 05042015 232 794 yfrac144192x06262 06531020 159 872 yfrac146398x07216 08247025 112 792 yfrac148816x08356 09120030 80 808 yfrac141008x09472 09551035 57 099 yfrac149545x1014 09744040 38 517 yfrac148198x1066 09748045 24 439 y56575x21066 09697050 14 435 yfrac14540x1079 09551055 7973 yfrac144368x1107 09319060 4026 yfrac143285x1107 09460
PrCa 010 1 894 322 yfrac143089x06245 02750015 1 749 595 yfrac143586x06787 03582020 1 594 744 yfrac144013x07169 04316025 1 429 858 yfrac144271x0732 04919030 1 260 968 yfrac144389x07244 05616035 1 097 654 yfrac144406x0702 06470040 946 439 yfrac144485x06816 07338045 812 687 y55175x207005 08206050 694 558 yfrac14667x07588 08823055 584 834 yfrac148833x08469 09332060 474 654 yfrac141113x09684 09503
Note The AVCE cutoffs range from 010 to 060 with a step of 005
The bold values are the degree distributions of global lncRNA-mRNA causal reg-
ulatory networks with a compromised AVCE cutoff (045) in four human cancers
Module-specific lncRNA-mRNA causal regulatory networks | 9
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
Next we use four lists of lncRNAs and mRNAs associatedwith GBM LSCC OvCa and PrCa to discover lncRNAndashmRNAcausal networks that are associated with the four human can-cers We define that cancer-related lncRNAndashmRNA causal regu-latory relationships are those in which at least one regulatoryparty is cancer-related lncRNA or mRNA As a result wehave extracted GBM-related LSCC-related OvCa-related andPrCa-related lncRNAndashmRNA causal networks from the fourcancer-specific lncRNAndashmRNA causal networks (details inSupplementary File S1) To understand the potential biologicalprocesses and pathways of the four cancer-related lncRNAndashmRNA causal networks we identify significant GO biologicalprocesses and KEGG pathways using functional enrichmentanalysis In Figure 4B several top GO biological processes and
KEGG pathways such as cytokine activity [69] G-proteincoupled receptor binding [70] TNF signaling pathway [71] cAMPsignaling pathway [72] pathways in cancer are closely associ-ated with the occurrence and development of cancer Thisresult suggests that the identified cancer-related lncRNAndashmRNA causal networks may be involved in the occurrence anddevelopment of human cancer
Conservative network analysis highlights a corelncRNAndashmRNA causal regulatory network acrosshuman cancers
Although most of the lncRNAndashmRNA causal regulatory relation-ships are cancer-specific there are still a number of common
396 424
106 115
1370
133 173
1599
126
1729
1224
252
2523
2157
5081
0
2000
4000
Inte
rsec
tion
Siz
e
PrCa
Ov Ca
LSCC
GBM
025
0050
0075
00
1000
0
Set Size
6 283
13 1 80 673
206
3969
76 3493
407
2816
9950
7
1948
7
0
250000
500000
750000
Inte
rsec
tion
Siz
e
PrCa
Ov Ca
LSCC
GBM
0e+0
0
2e+0
5
4e+0
5
6e+0
5
8e+0
5
Set Size
6 1 2 933
2
41
6 11
246
47
524
0
200
400In
ters
ectio
n S
ize
PrCa
Ov Ca
LSCC
GBM
0200
400
600
Set Size
Module-specific genes
Module-specific causal regulations Module-specific hub lncRNAs 808611
A
B C
Figure 3 Overlap and difference of module-specific genes module-specific causal regulations and module-specific hub lncRNAs across GBM LSCC OvCa and PrCa
(A) Module-specific genes (both lncRNAs and mRNAs) intersection plot (B) Module-specific causal regulations intersection plot (C) Module-specific hub lncRNAs inter-
section plot The red lines denote common genes and causal regulations across GBM LSCC OvCa and PrCa
10 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
causal regulatory relationships between the four global net-works To evaluate whether there is a common core of lncRNAndashmRNA causal regulatory relationships in the global regulatorynetworks across human cancers we concentrate on the con-served lncRNAndashmRNA causal regulatory relationships thatexisted in at least three human cancers
As shown in Figure 5A the majority of the conservedlncRNAndashmRNA causal regulatory relationships form a closely
connected community This finding indicates that the con-served lncRNAndashmRNA causal regulatory network may be a corenetwork across human cancers
The survival analysis shows that the lncRNAs and mRNAs inthe core network can significantly distinguish the metastasisrisks between the high- and low-risk groups in GBM OvCa andPrCa data sets (Figure 5B) This result suggests that the core net-work may act as a common network biomarker of GBM OvCa
Cancer-specific networks Causal regulations y=axb R2
GBM-specific 2816 y=4812x-1292 09774
LSCC-specific 99507 y=1034x-1014 09923
OvCa-specific 19487 y=7366x-1156 09723
PrCa-specific 808611 y=5243x-07055 08310
1 10 100 11001
10
100
1100
Degree of genes
Nu
mb
er o
f ge
nes
GBM-specific fitting curveLSCC-specific fitting curveOvCa-specific fitting curvePrCa-specific fitting curveGBM-specific degree distributionLSCC-specific degree distributionOvCa-specific degree distributionPrCa-specific degree distribution
solutecation symporter activitysymporter activity
gated channel activitysodium ion transmembrane transporter activity
ion channel activitysubstrate-specific channel activity
channel activitypassive transmembrane transporter activity
growth factor activitycation channel activity
metal ion transmembrane transporter activitycollagen bindingheparin binding
sulfur compound bindingintegrin binding
glycosaminoglycan bindingextracellular matrix binding
peptide receptor activityG-protein coupled peptide receptor activity
chemokine bindingG-protein coupled receptor binding
growth factor bindingcytokine binding
serine-type endopeptidase activitydeath receptor activity
tumor necrosis factor-activated receptor activitycytokine receptor activity
protein heterodimerization activitydipeptidase activity
glycoprotein bindingRAGE receptor binding
cytokine receptor bindingcytokine activity
GBM(203)
LSCC(1015)
OvCa(479)
PrCa(3019)
001
002
003
004
padjust
GeneRatio002
004
006
008
Taste transduction
ECM-receptor interaction
cAMP signaling pathway
Calcium signaling pathway
Neuroactive ligand-receptor interaction
PI3K-Akt signaling pathway
Pathways in cancerRegulation of actin cytoskeleton
Complement and coagulation cascades
AGE-RAGE signaling pathway in diabetic complications
Hematopoietic cell lineage
Th17 cell differentiation
Inflammatory bowel disease (IBD)
Malaria
Osteoclast differentiation
Influenza A
Tuberculosis
Intestinal immune network for IgA production
Chagas disease (American trypanosomiasis)
Leishmaniasis
TNF signaling pathway
Toll-like receptor signaling pathway
Rheumatoid arthritis
Cytokine-cytokine receptor interaction
GBM(129)
LSCC(500)
OvCa(266)
PrCa(1364)
GeneRatio
005
010
015
001
002
003
004padjust
GO enrichment analysis KEGG enrichment analysis
A
B
Figure 4 Differential network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) Degree distribution of cancer-specific lncRNAndashmRNA
causal networks in GBM LSCC OvCa and PrCa (B) Functional enrichment analysis of cancer-related lncRNAndashmRNA causal networks in GBM LSCC OvCa and PrCa
Module-specific lncRNA-mRNA causal regulatory networks | 11
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
and PrCa In Figure 5B we also find that the core network con-tains several cancer genes (34 26 30 and 38 cancer genes asso-ciated with GBM LSCC OvCa and PrCa respectively)
By conducting GO and KEGG enrichment analysis we findthat the core network is significantly enriched in 399 GO biologi-cal processes and 3 KEGG pathways (details in SupplementaryFile S2) Of the 399 GO biological processes 2 GO terms includ-ing negative regulation of cell adhesion (GO 0007162) and cyto-kine production in immune response (GO 0002367) areinvolved in three cancer hallmarks Tissue Invasion andMetastasis Tumor Promoting Inflammation and EvadingImmune Detection [73] This observation implies that the corenetwork may control these cancer-related hallmarks
Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers
We divide the hub lncRNAs into two categories (1) conserved hublncRNAs which exist in at least three human cancers and (2)cancer-specific hub lncRNAs which only exist in single humancancer As a result we obtain 9 conserved hub lncRNAs and 828cancer-specific hub lncRNAs (include 11 GBM-specific 246 LSCC-specific 47 OvCa-specific and 524 PrCa-specific hub lncRNAs)
To evaluate whether the hub lncRNAs can distinguish meta-stasis risks of human cancers we use them to predict metasta-sis risks for tumor samples in GBM LSCC OvCa and PrCaAs shown in Figure 6A the conserved hub lncRNAs can discrim-inate the metastasis risks of tumor samples significantly(Log-rank P-valuelt 005) in four human cancers In Figure 6Bexcepting LSCC-specific hub lncRNAs owing to failing to fit aCox regression model GBM-specific OvCa-specific and PrCa-specific hub lncRNAs can discriminate the metastasis risks oftumor samples significantly in GBM OvCa and PrCa respec-tively (Log-rank P-valuelt 005) These results suggest that thehub lncRNAs are discriminative and can act as biomarkers todistinguish between high- and low-risk tumor samples
Experimentally validated lncRNAndashmRNA regulations aremostly bad hits for LncTar
Using a collection of experimentally validated lncRNAndashmRNAregulatory relationships (details in Supplementary File S3) asthe ground truth the numbers of experimentally confirmedlncRNAndashmRNA causal regulations are 17 14 20 and 42 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4)
Figure 5 Conservative network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) The core lncRNAndashmRNA causal network that
occurred in at least three human cancers The red diamond nodes and white circle nodes denote lncRNAs and mRNAs respectively (B) Survival analysis of the core
lncRNAndashmRNA causal network
12 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
We further apply a representative sequence-based methodcalled LncTar [11] to the experimentally validated lncRNAndashmRNAcausal regulatory relationships discovered by MSLCRN There aretwo main reasons for choosing LncTar First LncTar does nothave a limit to input RNA size Second LncTar uses a quantitativestandard rather than expert knowledge to determine whetherlncRNAs interact with mRNAs Similar to LncTar we also set -01as normalized binding free energy (ndG) cutoff to determinewhether lncRNAndashmRNA pairs interact with each other In otherwords the lncRNAndashmRNA pairs with ndG01 are regarded as
lncRNAndashmRNA regulatory relationships Among the experimen-tally confirmed lncRNAndashmRNA causal regulatory relationships thatare discovered by MSLCRN the numbers of successfully predictedlncRNAndashmRNA regulations using LncTar are 0 0 1 and 1 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4) The result indicates that our experimentally confirmedlncRNAndashmRNA causal regulations are mostly bad hits for LncTarMeanwhile this result also suggests that expression-based andsequence-based methods may be complementary with each otherin predicting lncRNAndashmRNA regulations
A
B
Figure 6 Survival analysis of hub lncRNAs (A) Conserved hub lncRNAs in GBM LSCC OvCa and PrCa datasets (B) Survival analysis of cancer-specific hub lncRNAs
Module-specific lncRNA-mRNA causal regulatory networks | 13
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
MSLCRN networks are biologically meaningful
In this section we conduct GO and KEGG enrichment analysisto check whether the MSLCRN networks are associated withsome biological processes and pathways significantlyEnrichment analysis uncovers that 15 of the 23 (6522)MSLCRN networks in GBM 29 of the 38 (7632) MSLCRN net-works in LSCC 30 of the 45 (6667) MSLCRN networks inOvCa and 20 of the 32 (6250) MSLCRN networks in PrCa aresignificantly enriched in at least one GO biological process orKEGG pathway respectively (details in Supplementary File S5)This result implies that most of the MSLCRN networks in eachcancer are functional networks
We further investigate whether the MSLCRN networks aresignificantly enriched in GBM LSCC OvCa and PrCa diseasesrespectively We discover that 5 of the 23 MSLCRN networks7 of the 38 MSLCRN networks 6 of the 45 MSLCRN networks and6 of the 32 MSLCRN networks are significantly enriched in GBMLSCC OvCa and PrCa diseases respectively (details inSupplementary File S5) This result indicates that severalMSLCRN networks are closely associated with GBM LSCC OvCaand PrCa diseases
Altogether functional and disease enrichment analysis resultsshow that MSLCRN networks are biologically meaningful
Comparison with other PC-based networkinference methods
Based on a parallel version of the PC algorithm [56] the parallelIDA method in the second step of MSLCRN learns the causalstructure from expression data Owing to the popularity of thePC algorithm in causal structure learning some other networkinference methods including PCA-CMI [74] PCA-PMI [75] andCMI2NI [76] have also successfully applied it for network infer-ence Different from the three methods using conditional orpartial mutual information to infer lncRNAndashmRNA regulationsour method estimates causal effects to identify lncRNAndashmRNAregulations For comparisons we also use the PCA-CMI PCA-PMI and CMI2NI methods to infer module-specific lncRNAndashmRNA regulatory relationships Similar to our method (whichuses the parallel IDA method) the strength cutoff of lncRNAndashmRNA regulatory relationships in PCA-CMI PCA-PMI andCMI2NI methods is also set to 045
We evaluate the performance of each method in terms offinding experimentally validated lncRNAndashmRNA regulatoryrelationships functional MSLCRN networks and disease-associated MSLCRN networks As shown in Table 4 in terms ofthe three criteria MSLCRN performs the best in GBM LSCCOvCa and PrCa data sets This result suggests that MSLCRN is auseful method to infer module-specific lncRNAndashmRNA regula-tory network in human cancers
Conclusions and discussion
Notwithstanding lncRNAs do not encode proteins directly theyengage in a wide range of biological processes including cancerdevelopments through their interactions with other biologicalmacromolecules eg DNA RNA and protein Therefore touncover the functions and regulatory mechanisms of lncRNAsit is necessary to investigate lncRNAndashtarget regulatory networkacross different types of biological conditions
As a biological network the lncRNAndashtarget regulatory net-work exhibits a high degree of modularity Each functionalmodule is responsible for implementing specific biological
functions Moreover modularity is an important feature ofhuman cancer development and progression Thus from a net-work community point of view it is necessary to investigatemodule-specific lncRNAndashmRNA regulatory networks
Until now several statistical correlation or associationmeasures eg Pearson Mutual Information and ConditionalMutual Information have been used to infer gene regulatorynetworks However these methods tend to identify indirect reg-ulatory relationships between genes The identified gene regu-latory networks cannot reflect real lsquocausalrsquo regulatoryrelationships To better understand lncRNA regulatory mecha-nism it is vital to investigate how lncRNAs causally influencethe expression levels of their target mRNAs
In this work the computational methods for inferringlncRNAndashmRNA interactions and the publicly available data-bases of lncRNAndashmRNA regulatory relationships are firstreviewed Then to address the above two issues we propose anovel computational method MSLCRN to study module-specific lncRNAndashmRNA causal regulatory networks across GBMLSCC OvCa and PrCa diseases In contrast to other approaches(expression-based and sequence-based methods) MSLCRN hastwo unique features First MSLCRN considers the modularity oflncRNAndashmRNA regulatory networks Instead of studying globalregulatory relationships between lncRNAs and mRNAs wefocus on investigating the regulatory behavior of lncRNAs in themodules of interest Second considering the restrictions withconducting gene knockout experiments MSLCRN uses thecausal inference method IDA to infer causal relationshipsbetween lncRNAs and mRNAs based on expression data Thepromising results suggest that exploiting modularity of generegulatory network and causality-based method could provideanother effective approach to elucidating lncRNA functions andregulatory mechanisms of human cancers
Despite the advantages of MSLCRN there is still room toimprove it First the WGCNA method only allows clusteringgenes across all samples from the matched lncRNA and mRNAexpression data In fact a class of genes may exhibit similarexpression patterns across a subset of samples An alternativesolution of this problem is to use a bi-clustering method to iden-tify lncRNAndashmRNA co-expression modules Second it is stilltime-consuming to estimate causal effects from large expres-sion data sets When constructing the module-specific lncRNAndashmRNA causal regulatory networks the running time of parallelIDA is still high on estimating the causal effects of lncRNAs onmRNAs In future more efficient parallel IDA method is neededto explore lncRNAndashmRNA causal regulatory relationships inlarge-scale expression data Third previous research [38]has shown that the prediction accuracy of lncRNAndashmRNA inter-actions can be improved by integrating both sequence data and
Table 4 Comparison results in terms of experimentally validatedlncRNAndashmRNA regulatory relationships functional MSLCRN net-works and disease-associated MSLCRN networks
Methods GBM (a b c) LSCC (a b c) OvCa (a b c) PrCa (a b c)
MSLCRN (17 15 5) (14 29 7) (20 30 6) (42 20 6)PCA-CMI (2 13 0) (0 11 0) (0 7 1) (0 20 2)PCA-PMI (2 15 1) (0 11 0) (0 8 2) (1 18 1)CMI2NI (2 15 0) (0 11 0) (0 7 1) (0 19 1)
Note afrac14number of experimentally validated lncRNAndashmRNA regulatory relation-
ships bfrac14number of functional MSLCRN networks cfrac14number of disease-asso-
ciated MSLCRN networks
14 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
expression data To improve the accuracy of the predictedlncRNAndashmRNA regulatory relationships it is necessary todevelop an ensemble method (fusing sequence-based andexpression-based methods) to infer lncRNAndashmRNA regulatorynetwork Finally recent studies [77] show that lncRNAs can actas competing endogenous RNAs (ceRNAs) or miRNA sponges toattract miRNAs for bindings by competing with mRNAsTherefore some predicted lncRNAndashmRNA regulatory relation-ships are lncRNA-related ceRNAndashceRNA interactions To furtherimprove the prediction of lncRNAndashmRNA regulatory relation-ships it is necessary to remove the crosstalk relationshipsbetween lncRNAs and mRNAs
Key Points
bull Among ncRNAs lncRNAs are a large and diverse classof RNA molecules and are thought to be a gold mine ofpotential oncogenes anti-oncogenes and newbiomarkers
bull lncRNAs exhibit dynamic positive gene regulationacross human cancers
bull Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers
bull There is still a lack of ground truth for validating pre-dicted lncRNAndashmRNA regulatory relationships
bull There is still room to develop reliable methods for elu-cidating lncRNA regulatory mechanisms
Supplementary Data
Supplementary data are available online at httpsacademicoupcombib
Funding
The National Natural Science Foundation of China (No61702069) the Applied Basic Research Foundation ofScience and Technology of Yunnan Province (No2017FB099) the NHMRC Grant (No 1123042) and theAustralian Research Council Discovery Grant (NoDP140103617)
References1 Pang KC Frith MC Mattick JS Rapid evolution of noncoding
RNAs lack of conservation does not mean lack of functionTrends Genet 200622(1)1ndash5
2 Kung JT Colognori D Lee JT Long noncoding RNAs pastpresent and future Genetics 2013193(3)651ndash69
3 Schmitt AM Chang HY Long noncoding RNAs in cancer path-ways Cancer Cell 201629(4)452ndash63
4 Zhang Y Tao Y Liao Q Long noncoding RNA a crosslink inbiological regulatory network Brief Bioinform 2017 doi 101093bibbbx042
5 Yoon JH Abdelmohsen K Gorospe M Posttranscriptionalgene regulation by long noncoding RNA J Mol Biol 2013425(19)3723ndash30
6 Gerlach W Giegerich R GUUGle a utility for fast exact match-ing under RNA complementary rules including G-U base pair-ing Bioinformatics 200622(6)762ndash4
7 Muckstein U Tafer H Hackermuller J et al Thermodynamicsof RNA-RNA binding Bioinformatics 200622(10)1177ndash82
8 Tafer H Hofacker IL RNAplex a fast tool for RNA-RNA inter-action search Bioinformatics 200824(22)2657ndash63
9 Busch A Richter AS Backofen R IntaRNA efficient predictionof bacterial sRNA targets incorporating target site accessibil-ity and seed regions Bioinformatics 200824(24)2849ndash56
10Kato Y Sato K Hamada M et al RactIP fast and accurate pre-diction of RNA-RNA interaction using integer programmingBioinformatics 201026(18)i460ndash6
11Li J Ma W Zeng P et al LncTar a tool for predicting the RNAtargets of long noncoding RNAs Brief Bioinform 201516(5)806ndash12
12Fukunaga T Hamada M RIblast an ultrafast RNA-RNA inter-action prediction system based on a seed-and-extensionapproach Bioinformatics 201733(17)2666ndash74
13Derrien T Johnson R Bussotti G et al The GENCODE v7 cata-log of human long noncoding RNAs analysis of their genestructure evolution and expression Genome Res 201222(9)1775ndash89
14Gloss BS Dinger ME The specificity of long noncoding RNAexpression Biochim Biophys Acta 20161859(1)16ndash22
15Munshi A Mohan V Ahuja YR Non-coding RNAs a dynamicand complex network of gene regulation J PharmacogenomicsPharmacoproteomics 20167156
16Liao Q Liu C Yuan X et al Large-scale prediction of long non-coding RNA functions in a coding-non-coding gene co-expression network Nucleic Acids Res 201139(9)3864ndash78
17Guo Q Cheng Y Liang T et al Comprehensive analysis oflncRNA-mRNA co-expression patterns identifies immune-associated lncRNA biomarkers in ovarian cancer malignantprogression Sci Rep 20155(1)17683
18Du Y Xia W Zhang J et al Comprehensive analysis of longnoncoding RNA-mRNA co-expression patterns in thyroidcancer Mol Biosyst 201713(10)2107ndash15
19Wu W Wagner EK Hao Y et al Tissue-specific co-expressionof long non-coding and coding RNAs associated with breastCancer Sci Rep 2016632731
20Barabasi AL Oltvai ZN Network biology understanding thecellrsquos functional organization Nat Rev Genet 20045(2)101ndash13
21Langfelder P Horvath S WGCNA an R package for weightedcorrelation network analysis BMC Bioinformatics 20089559
22Maathuis HM Kalisch M Buhlmann P Estimating high-dimensional intervention effects from observational dataAnn Stat 200937(6A)3133ndash64
23Maathuis HM Colombo D Kalisch M et al Predicting causaleffects in large-scale systems from observational data NatMethods 20107(4)247ndash8
24Le T Hoang T Li J et al A fast PC algorithm for high dimen-sional causal discovery with multi-core PCs IEEEACM TransComput Biol Bioinform 2016 doi 101109TCBB20162591526
25Du Z Fei T Verhaak RG et al Integrative genomic analysesreveal clinically relevant long noncoding RNAs in humancancer Nat Struct Mol Biol 201320(7)908ndash13
26Bernhart SH Tafer H Muckstein U et al Partition functionand base pairing probabilities of RNA heterodimersAlgorithms Mol Biol 20061(1)3
27Alkan C Karakoc E Nadeau JH et al RNA-RNA interactionprediction and antisense RNA target search J Comput Biol200613(2)267ndash82
28Seemann SE Richter AS Gesell T et al PETcofold predictingconserved interactions and structures of two multiple align-ments of RNA sequences Bioinformatics 201127(2)211ndash19
29Wenzel A Akbasli E Gorodkin J RIsearch fast RNA-RNAinteraction search using a simplified nearest-neighborenergy model Bioinformatics 201228(21)2738ndash46
Module-specific lncRNA-mRNA causal regulatory networks | 15
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
30Alkan F Wenzel A Palasca O et al RIsearch2 suffix array-based large-scale prediction of RNA-RNA interactions andsiRNA off-targets Nucleic Acids Res 201745e60
31Hu R Sun X lncRNATargets a platform for lncRNA target pre-diction based on nucleic acid thermodynamics J BioinformComput Biol 201614(4)1650016
32Terai G Iwakiri J Kameda T et al Comprehensive predictionof lncRNA-RNA interactions in human transcriptome BMCGenomics 201617(Suppl 1)12
33Liu J Wu S Li M et al LncRNA expression profiles reveal theco-expression network in human colorectal carcinoma Int JClin Exp Pathol 201691885ndash1892
34Huang S Feng C Chen L et al Identification of potential keylong non-coding RNAs and target genes associated withpneumonia using long non-coding RNA sequencing (lncRNA-Seq) a preliminary study Med Sci Monit 2016223394ndash408
35Li J Xu Y Xu J et al Dynamic co-expression network analysisof lncRNAs and mRNAs associated with venous congestionMol Med Rep 201614(3)2045ndash51
36Fu M Huang G Zhang Z et al Expression profile of long non-coding RNAs in cartilage from knee osteoarthritis patientsOsteoarthritis Cartilage 201523(3)423ndash32
37Zhang F Gao C Ma XF et al Expression profile of long non-coding RNAs in peripheral blood mononuclear cells frommultiple sclerosis patients CNS Neurosci Ther 201622(4)298ndash305
38 Iwakiri J Terai G Hamada M Computational prediction oflncRNA-mRNA interactionsby integrating tissue specificity inhuman transcriptome Biol Direct 201712(1)15
39Lv L Wei M Lin P et al Integrated mRNA and lncRNA expres-sion profiling for exploring metastatic biomarkers of humanintrahepatic cholangiocarcinoma Am J Cancer Res 20177688ndash99
40Hao Y Wu W Li H et al NPInter v30 an upgraded databaseof noncoding RNA-associated interactions Database 20162016baw057
41Chen G Wang Z Wang D et al LncRNADisease a databasefor long-non-coding RNA-associated diseases Nucleic AcidsRes 201341D983ndash6
42 Jiang Q Wang J Wu X et al LncRNA2Target a database fordifferentially expressed genes after lncRNA knockdown oroverexpression Nucleic Acids Res 201543D193ndash6
43Zhou Z Shen Y Khan MR et al LncReg a reference resourcefor lncRNA-associated regulatory networks Database 20152015bav083
44Denisenko E Ho D Tamgue O et al IRNdb the database ofimmunologically relevant non-coding RNAs Database 20162016baw138
45Liu CJ Gao C Ma Z et al lncRInter a database of experimen-tally validated long non-coding RNA interaction J GenetGenomics 201744(5)265ndash8
46Li JH Liu S Zhou H et al starBase v20 decoding miRNA-ceRNA miRNA-ncRNA and protein-RNA interaction net-works from large-scale CLIP-Seq data Nucleic Acids Res 201442(D1)D92ndash7
47Liu Y Zhao M lnCaNet pan-cancer co-expression networkfor human lncRNA and cancer genes Bioinformatics 201632(10)1595ndash7
48Zhou QZ Zhang B Yu QY et al BmncRNAdb a comprehen-sive database of non-coding RNAs in the silkworm Bombyxmori BMC Bioinformatics 201617(1)370
49Park C Yu N Choi I et al lncRNAtor a comprehensiveresource for functional investigation of long non-codingRNAs Bioinformatics 201430(17)2480ndash5
50Bhartiya D Pal K Ghosh S et al lncRNome a comprehensiveknowledgebase of human long noncoding RNAs Database20132013bat034
51Zhao Z Bai J Wu A et al Co-LncRNA investigating thelncRNA combinatorial effects in GO annotations and KEGGpathways based on human RNA-Seq data Database 20152015bav082
52 Jiang Q Ma R Wang J et al LncRNA2Function a compre-hensive resource for functional investigation of humanlncRNAs based on RNA-seq data BMC Genomics 201516(Suppl 3)S2
53Chan WL Huang HD Chang JG lncRNAMap a map of puta-tive regulatory functions in the long non-coding transcrip-tome Comput Biol Chem 20145041ndash9
54Langfelder P Horvath S Fast R functions for robust correla-tions and hierarchical clustering J Stat Softw 2012461ndash17
55 Judea P Causality Models Reasoning and Inference New YorkNY Cambridge University Press 2000
56Spirtes P Glymour C Scheines R Causation Prediction andSearch 2nd edn Cambridge MIT Press 2000
57Le T Hoang T Li J et al ParallelPC an R package for efficientconstraint based causal exploration arXiv prepring 2015arXiv151003042v1
58Hahn MW Kern AD Comparative genomics of centrality andessentiality in three eukaryotic protein-interaction networksMol Biol Evol 200522(4)803ndash6
59Song J Singh M Roth FP From hub proteins to hub modulesthe relationship between essentiality and centrality in theyeast interactome at different scales of organization PLoSComput Biol 20139(2)e1002910
60Therneau TM Grambsch PM Modeling Survival Data Extendingthe Cox Model New York Springer Press 2000
61Yu G Wang L-G Han Y He Q-Y clusterProfiler an R packagefor comparing biological themes among gene clusters OMICS201216(5)284ndash7
62Ashburner M Ball CA Blake JA et al Gene ontology tool forthe unification of biology Nat Genet 200025(1)25ndash9
63Kanehisa M Goto S KEGG Kyoto Encyclopedia of Genes andGenomes Nucleic Acids Res 200028(1)27ndash30
64Ning S Zhang J Wang P et al Lnc2Cancer a manually curateddatabase of experimentally supported lncRNAs associatedwith various human cancers Nucleic Acids Res 201644(D1)D980ndash5
65Wang Y Chen L Chen B et al Mammalian ncRNA-diseaserepository a global view of ncRNA-mediated disease net-work Cell Death Dis 20134e765
66Pi~nero J Bravo A Queralt-Rosinach N et al DisGeNET a com-prehensive platform integrating information on humandisease-associated genes and variants Nucleic Acids Res 201745(D1)D833ndash9
67Conway JR Lex A Gehlenborg N UpSetR an R package for thevisualization of intersecting sets and their propertiesBioinformatics 201733(18)2938ndash40
68Wahlestedt C Targeting long non-coding RNA to therapeuti-cally upregulate gene expression Nat Rev Drug Discov 201312(6)433ndash46
69Mantovani G Maccio A Lai P et al Cytokine activity incancer-related anorexiacachexia role of megestrol acetateand medroxyprogesterone acetate Semin Oncol 19982545ndash52
70Dorsam RT Gutkind JS G-protein-coupled receptors and can-cer Nat Rev Cancer 20077(2)79ndash94
71Wang X Lin Y Tumor necrosis factor and cancer buddies orfoes Acta Pharmacol Sin 200829(11)1275ndash88
16 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
72Fajardo AM Piazza GA Tinsley HN The role of cyclic nucleo-tide signaling pathways in cancer targets for prevention andtreatment Cancers 20146(1)436ndash58
73Hanahan D Weinberg RA Hallmarks of cancer the next gen-eration Cell 2011144(5)646ndash74
74Zhang X Zhao XM He K et al Inferring gene regulatory net-works from gene expression data by path consistencyalgorithm based on conditional mutual informationBioinformatics 201228(1)98ndash104
75Zhao J Zhou Y Zhang X et al Part mutual information forquantifying direct associations in networks Proc Natl Acad SciUSA 2016113(18)5130ndash5
76Zhang X Zhao J Hao JK et al Conditional mutual inclusiveinformation enables accurate quantification of associationsin gene regulatory networks Nucleic Acids Re 201543(5)e31
77Le TD Zhang J Liu L et al Computational methods for identi-fying miRNA sponge interactions Brief Bioinform 201718(4)577ndash90
Module-specific lncRNA-mRNA causal regulatory networks | 17
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018View publication statsView publication stats
predicted free energies of binding with RNA interference experi-mental data RNAup can produce biologically reasonableresults For genome-wide predictions of ncRNA targets RNAupis not fast enough Therefore it is usually to be combined withother faster RNAndashRNA prediction methods
To extend the standard dynamic programming algorithmsfor computing RNA secondary structures Bernhart et al [26]propose a program named RNAcofold to compute the hybridiza-tion energy and base pairing pattern of the co-folding of twoRNA molecules However the method disregards some impor-tant interaction structures and is restricted to dimeric com-plexes Moreover for the RNAndashRNA interaction predictionpredicting the joint secondary structure of two interacting RNAsis also important To solve it Alkan et al [27] develop severalalgorithms to minimize the joint free energy between the twoRNAs under a number of energy models Assuming that con-served RNAndashRNA interactions imply conserved functionSeemann et al [28] also implement a comparative method calledPETcofold to predict the joint secondary structure of two inter-acting RNAs As PETcofold considers sequence conservation anincreasing amount of structural covariance can further improveits performance
RNAup [7] and RNAcofold [26] are too slow for genome-widesearch in finding target sites of ncRNAs To accelerate the speedof RNAndashRNA interaction predictions RNAplex [8] is presented toquickly find possible hybridization sites between two interact-ing RNAs To focus on the target search on short highly stableinteractions RNAplex introduces a per nucleotide penaltyMeanwhile another general and fast approach IntaRNA [9] isproposed to efficiently predict bacterial RNAndashRNA interactionsCompared with other existing target prediction methodsIntaRNA considers both the accessibility of target sites and theexistence of a user-defined seed Therefore it shows a higheraccuracy than competing methods Kato et al [10] also present afast and accurate prediction method RactIP for comprehensivetype of RNAndashRNA interactions In terms of predicting joint sec-ondary structures of two interacting RNAs RactIP run incompa-rably faster than competitive programs
To further achieve a speed improvement of predictingRNAndashRNA interactions Wenzel et al [29] present RIsearch forfast computation of hybridization between two interactingRNAs They show that the energy model of RIsearch is anaccurate approximation of the full energy model for near-complementary RNAndashRNA duplexes Furthermore RIsearch isfaster than RNAplex [8] in RNAndashRNA interaction searchRecently RIsearch2 [30] an updated version of RIsearch [29] isproposed to localize potential near-complementary RNAndashRNAinteractions between two RNA sequences The comparisonresults show that RIsearch2 is much faster than the previousmethods such as GUUGle [6] RNAplex [8] IntaRNA [9] andRIsearch [29]
Although the above RNAndashRNA interaction prediction meth-ods can be extended to predict lncRNAndashmRNA interactionsnone of them are exclusively used for identifying the RNA tar-gets of lncRNAs in a large scale To efficiently identify lncRNAndashmRNA interactions Li et al [11] propose a tool named LncTarLncTar explores lncRNAndashmRNA interactions by finding the min-imum free energy joint structure of two interacting RNAs basedon base pairing As LncTar runs fast and does not have a limitto RNA size it can be used for large-scale identification of theRNA targets for all RNAs Another web-based platformlncRNATargets [31] is also provided for lncRNA target predic-tion Because there is no limit to RNA size lncRNATargets canalso be used to identify the RNA targets of all RNAs In a whole
human transcriptome Terai et al [32] develop an integratedpipeline to predict lncRNAndashmRNA interactions for the first timeIn the pipeline IntaRNA [9] is used to calculate interactionenergy and RactIP [10] is used to predict joint secondary struc-ture Recently to further shorten the running time of predictinglncRNAndashmRNA interactions an ultrafast RNAndashRNA interactionprediction method RIblast [12] based on the seed-and-extensionmethod is presented The comparison results show that RIblastruns faster than RNAplex [8] IntaRNA [9] Terai et al pipeline[32] and thus can be applied to a large scale of lncRNA targetidentification
Expression-based method
At the gene expression level the co-expressed lncRNAndashmRNApairs are regarded as lncRNAndashmRNA interactions for theexpression-based methods Among the existing expression-based methods [16ndash19 33ndash39] Pearson correlation method is akey step of most methods to identify co-expressed lncRNAndashmRNA pairs
Liao et al [16] construct a lncRNAndashmRNA co-expression net-work from re-annotated mouse microarray data sets By usingPearson method they only keep the lncRNAndashmRNA pairs withPlt 001 and Pearson correlation ranked in the top or bottom005 percentile The study is the first large-scale prediction oflncRNA functions from a lncRNAndashmRNA co-expression networkTo identify immune-associated lncRNA biomarkers in OvCa Guoet al [17] make a comprehensive analysis of lncRNAndashmRNA co-expression patterns To identify lncRNAndashmRNA co-expressionpairs they calculate Pearson correlation between differentiallyexpressed lncRNAs and mRNAs They only reserve the lncRNAndashmRNA co-expression pairs with Pearson correlationgt 05 and thecorresponding False Discovery Rate (FDR)lt 001 Liu et al [33] andHuang et al [34] also use Pearson method to study lncRNAndashmRNAco-expression networks in human colorectal carcinoma andpneumonia respectively The inferred lncRNAndashmRNA co-expression networks will help to study lncRNA functionsRecently Du et al [18] propose a two-step method to conduct acomprehensive analysis of lncRNAndashmRNA co-expression patternsin thyroid cancer First they use Pearson method to calculatePearson correlation and the cutoff of Pearson correlation is 05and the corresponding FDR cutoff is 001 Second the Pearson cor-relations are transformed into an adjacency matrix
Owing to dynamic characteristic of gene regulatory net-works Wu et al [19] identify two distinct lncRNAndashmRNA co-expression networks in tumor and normal breast tissue Theyuse a generalized linear model to regress mRNA expression onlncRNA expression in tumor and normal breast tissue and onlyfocus on dynamic breast lncRNAndashmRNA co-expression pairsthat differ in tumor and normal breast tissue Meanwhile tostudy the potential role of lncRNAs in venous congestion Liet al [35] also construct a dynamic lncRNAndashmRNA co-expression network By using Pearson method they separatelycalculate Pearson correlations of each lncRNAndashmRNA pair invenous congestion and normal samples The lncRNAndashmRNApairs with Pearson correlationgt099 orlt099 and P-valuelt001are selected as lncRNAndashmRNA co-expression pairs They con-struct two types of lncRNAndashmRNA co-expression networkslsquolostrsquo network where lncRNAndashmRNA co-expression pairs onlyexisted in normal samples and lsquoobtainedrsquo network wherelncRNAndashmRNA co-expression pairs only existed in venous con-gestion samples The lsquolostrsquo and lsquoobtainedrsquo networks are furtherintegrated to obtain a dynamic lncRNAndashmRNA co-expressionnetwork
4 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
The above methods simply use matched lncRNA and mRNAexpression data to identify lncRNAndashmRNA co-expression pairsTo identify lsquocis-regulated target genesrsquo of lncRNAs some methodsalso consider mRNA loci information within lncRNA For exampleFu et al [36] combine mRNA loci information and matchedlncRNA and mRNA expression data to predict lncRNA targetsThey identify the mRNAs as targets under two conditions (i) themRNA loci are within a 300-kb window up- or downstream oflncRNA and (ii) lncRNAndashmRNA co-expression pairs are signifi-cantly positive correlated (Pearson correlationgt 08 and the corre-sponding P-valuelt 005) Zhang et al [37] also use a similarmethod to Fu et al [36] for identifying lncRNA targets The mRNAscan be regarded as targets when (1) the mRNA loci are within a10 window up- or downstream of lncRNA and (2) lncRNAndashmRNAco-expression pairs are significantly positive correlated (Pearsoncorrelationgt 098 and the corresponding P-valuelt 005)
Apart from mRNA loci information within lncRNA someemerging methods consider predictions from sequence-basedmethods as putative lncRNAndashmRNA interactions For exampleIwakiri et al [38] integrate tissue-specific lncRNA and mRNAexpression data into predictions from a sequence-basedmethod in [32] They discover that integrating tissue specificitycan improve prediction accuracy of lncRNAndashmRNA interactionsLv et al [39] also combine matched lncRNA and mRNA expres-sion data with predictions from a sequence-based methodLncTar [11] They first use Pearson method to identify co-expressed lncRNAndashmRNA co-expression pairs with Pearsoncorrelationgt095 orlt095 Then LncTar is used to further filterthe identified lncRNAndashmRNA co-expression pairs
Public databases for storing lncRNAndashmRNAregulatory relationships
In this section we review the public databases of storinglncRNAndashmRNA regulatory relationships Table 2 shows a sum-mary of the third-party public databases including experimen-tally validated and computationally predicted databases
NPInter [40] contains experimentally validated interactionsbetween ncRNAs especially lncRNAs and miRNAs The databasecontains 915 067 interactions in 188 tissues or cell lines from 68kinds of experimental technologies There is a classification ofthe functional interactions based on the functional process thatncRNA is involved in Moreover NPInter allows users to searchinteractions related publications and other information
LncRNADisease [41] not only collects experimentally sup-ported lncRNAndashdisease associations and lncRNA interactionsbut also predicts novel lncRNAndashdisease associations Recentlythe database curates 478 entries of experimentally validatedlncRNA interactions LncRNADisease provides users severalways to search lncRNA-related diseases and interactions
To study differentially expressed genes after lncRNA knock-down or overexpression Jiang et al [42] develop a databasecalled LncRNA2Target in human and mouse organisms Thedatabase has a collection of 396 experimentally validatedlncRNAndashtarget interactions In LncRNA2Target if a gene is dif-ferentially expressed after lncRNA knockdown or overexpres-sion it is regarded as a target of a lncRNA For convenienceLncRNA2Target allows users to search for the targets of singlelncRNA or for the lncRNAs that target a specific geneMeanwhile Zhou et al [43] also build a reference resourceLncReg for lncRNA-related regulatory networks The databasehas 1081 experimentally validated lncRNA-related regulatory
records between 258 nonredundant lncRNAs and 571 nonredun-dant genes
IRNdb [44] is a database that focuses on collecting immuno-logically relevant lncRNAndashtarget miRNAndashtarget and PIWI-interacting RNAndashtarget interactions The current version ofIRNdb documents 22 453 immunologically relevant lncRNAndashtar-get interactions by integrating three databases LncRNADisease[41] LncRNA2Target [42] and LncReg [43] The aim is to helpresearchers study the roles of ncRNAs in the immune systemRecently a new experimentally validated database namedlncRInter [45] was developed to collect reliable and high-qualitylncRNAndashtarget interactions The extracted lncRNAndashtarget inter-actions are all from published literature and are supported bycertain biological experiments (eg luciferase reporter assayin vitro binding assay RNA pull-down) In total lncRInter con-tains 1036 experimentally validated lncRNAndashtarget interactionsin 15 organisms
In addition to the experimentally validated databases pre-sented above there are several computationally predicted data-bases for collecting lncRNAndashmRNA interactions For instancestarBase [46] is a comprehensive database of systematicallyidentifying the RNAndashRNA and proteinndashRNA interaction net-works from 108 CLIP-Seq (PAR-CLIP HITS-CLIP iCLIP CLASH)data sets The lncRNAndashmRNA interactions can be extractedfrom proteinndashRNA interaction networks InCaNet [47] aimsto establish a comprehensive regulatory network betweenlncRNAs and cancer genes They identify lncRNAndashcancergene interactions by computing gene co-expression betweenlncRNAs and cancer genes BmncRNAdb [48] is a comprehensivedatabase of silkworm lncRNAs and miRNAs The database pro-vides three online tools for users to predict both lncRNAndashtargetand miRNAndashtarget interactions lncRNAtor [49] collect expres-sion data from 243 RNA-seq experiments including 5237samples of various tissues and developmental stages ThelncRNAndashmRNA co-expression pairs are identified through co-expression analysis of lncRNAs and mRNAs lncRNome [50] is acomprehensive knowledgebase of sequence structure biologi-cal functions genomic variations and epigenetic modificationson gt17 000 lncRNAs in human For lncRNAndashprotein interactionsthe database incorporates PAR-CLIP experiments and a supportvector machine-based prediction method Co-lncRNA [51] andLncRNA2Function [52] predict co-expressed lncRNAndashmRNAinteractions from RNA-Seq data and further annotates thepotential functions of human lncRNAs using functional enrich-ment analysis lncRNAMap [53] is an integrated and compre-hensive database to explore regulatory functions of humanlncRNAs By integrating small RNAs supported by publicly avail-able deep sequencing data lncRNAMap construct lncRNA-derived siRNAndashtarget interactions
In summary for experimentally validated databases userscan select individual database or combine several databases asground truth to validate the predicted lncRNAndashmRNA interac-tions As for computationally predicted databases they can beused as initial structural of sequence-based or expression-basedmethods to identify lncRNAndashmRNA interactions
Inferring and analyzing MSLCRN networksRepurposed microarray data across human cancers
We collect the repurposed lncRNA and mRNA expression dataof GBM LSCC OvCa and PrCa from [25] A lncRNA or mRNA iseliminated if it does not have a corresponding gene symbol in adata set By calculating average expression values of replicate
Module-specific lncRNA-mRNA causal regulatory networks | 5
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
lncRNAs and mRNAs we obtain unique expression value ofthese replicates Consequently we get the matched expressiondata of 9704 lncRNAs and 18 282 mRNAs in 451 GBM 113 LSCC585 OvCa and 150 PrCa samples
Pipeline of MSLCRN
As shown in Figure 2 MSLCRN contains the following threesteps to infer module-specific lncRNAndashmRNA causal regulatorynetworks
i Identification of lncRNAndashmRNA co-expression modulesGiven the matched lncRNA and mRNA expression data weuse WGCNA to generate gene co-expression modules Amodule containing at least two lncRNAs and two mRNAsare regarded as a lncRNAndashmRNA co-expression moduleand used as the input of the second step
ii Identification of module-specific lncRNAndashmRNA causal reg-ulatory networks For each lncRNAndashmRNA co-expressionmodule with each lncRNAndashmRNA pair we apply parallelIDA to estimate the causal effect of the lncRNA on the
Table 2 Public databases for storing lncRNAndashmRNA regulatory relationships
Databases Types of databases Brief descriptions Organisms Available
NPInter [40] Validated A database of experimentally verified func-tional interactions between ncRNAs(including lncRNAs miRNAs etc) and bio-molecules (proteins RNAs and DNAs)
22 organisms httpwwwbioinfoorgNPInter
LncRNADisease [41] Validated A database of experimentally supportedlncRNAndashdisease association data andlncRNAndashtarget interactions in various lev-els including protein RNA miRNA andDNA
Human httpwwwcuilabcnlncrnadisease
LncRNA2Target [42] Validated A database of lncRNAndashtarget regulatory rela-tionships experimentally validated bylncRNA knockdown or overexpression
Human mouse httpwwwlncrna2targetorg
LncReg [43] Validated A database of experimentally validatedlncRNAndashtarget interactions from publicliterature
7 organisms httpbioinformaticsustceducnlncreg
IRNdb [44] Validated A database of immunologically relevantncRNAs (miRNAs lncRNAs and otherncRNAs) and target genes
Human mouse httpcompbiomasseyacnzappsirndb
lncRInter [45] Validated A database of experimentally validatedlncRNAndashtarget interactions extracted frompeer-reviewed publications
15 organisms httpbioinfolifehusteducnlncRInter
starBase [46] Predicted A comprehensive database of systematicallyidentifying the RNAndashRNA and proteinndashRNAinteraction networks from 108 CLIP-Seq(PAR-CLIP HITS-CLIP iCLIP CLASH) datasets
Human httpstarbasesysueducn
lnCaNet [47] Predicted A database of establishing a comprehensiveregulatory network source for lncRNA andcancer genes
Human httplncanetbioinfo-minzhaoorg
BmncRNAdb [48] Predicted A comprehensive database of the silkwormlncRNAs and miRNAs as well as the threeonline tools for users to predict the targetgenes of lncRNAs or miRNAs
Bombyx mori httpgenecqueducnBmncRNAdbindexphp
lncRNAtor [49] Predicted A comprehensive resource of encompassingannotation sequence analysis geneexpression protein binding and phyloge-netic conservation
6 organisms httplncrnatorewhaackr
lncRNome [50] Predicted A comprehensive knowledgebase on thetypes chromosomal locations descriptionon the biological functions and diseaseassociations of lncRNAs
Human httpgenomeigibresinlncRNome
Co-LncRNA [51] Predicted A computationally predicted database toidentify GO annotations and KEGG path-ways affected by co-expressed protein-cod-ing genes of a single or multiple lncRNAs
Human httpwwwbio-bigdatacomCo-LncRNA
LncRNA2Function [52] Predicted A comprehensive resource of investigatingthe functions of lncRNAs based on co-expressed lncRNAndashmRNA interactions
Human httpmlghiteducnlncrna2function
lncRNAMap [53] Predicted An integrated and comprehensive databaseof regulatory functions of lncRNAs and act-ing as ceRNAs
Human httplncRNAMapmbcnctuedutw
6 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
mRNA We use the absolute value of the causal effect(AVCE) to evaluate the strength of the regulation of thelncRNA on the mRNA and a higher AVCE indicates a stron-ger lncRNA regulation The lncRNAndashmRNA pairs with highAVCEs in each module are considered as module-specificlncRNAndashmRNA causal regulatory relationships and we calleach module with these relationships identified a module-specific causal regulatory network
iii Identification of global lncRNAndashmRNA causal regulatorynetwork We integrate the module-specific lncRNAndashmRNA
causal regulatory networks to form the global lncRNAndashmRNA causal regulatory network
Identification of lncRNAndashmRNA co-expression modules
In systems biology WGCNA [21] is a popular method for findingthe correlation patterns among genes across samples and canbe used to identify clusters or modules of highly co-expressedgenes Therefore we use WGCNA to first infer lncRNAndashmRNAco-expression modules
Figure 2 The pipeline of MSLCRN First WGCNA is used to identify lncRNAndashmRNA co-expression modules from matched lncRNA and mRNA expression data Second
we infer lncRNAndashmRNA causal regulatory relationships in each module by using parallel IDA method For each module we assemble the identified lncRNAndashmRNA reg-
ulatory relationships to obtain a module-specific lncRNAndashmRNA causal regulatory network Third the module-specific lncRNAndashmRNA causal regulatory networks are
integrated to form a global lncRNAndashmRNA causal regulatory network
Module-specific lncRNA-mRNA causal regulatory networks | 7
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
Specifically the matched lncRNA and mRNA expressiondata are used as the input of WGCNA For each pair of genesi and j the gene co-expression similarity sij of the pair is definedas
sij frac14 jcorethi jTHORNj (1)
where jcor(i j)j is the absolute value of the Pearson correlationbetween genes i and j The gene co-expression similarity matrixis denoted by Sfrac14 [sij]
To pick an appropriate soft-thresholding power for trans-forming the similarity matrix S into an adjacency matrix A weuse the scale-free topology criterion for soft-thresholding andthe minimum scale free topology fitting index R2 is set as 09Then the topological overlap matrix (TOM) Wfrac14 [wij] is gener-ated based on the adjacency matrix Afrac14 [aij] The TOM similaritywij between genes i and j is defined
wij frac14P
uaiuauj thorn aij
minfP
uaiuP
uaujg thorn 1 aij(2)
where u denotes all genes of the matched lncRNA and mRNAexpression data The TOM dissimilarity between genes i and j isdenoted by dijfrac14 1 - wij To identify gene co-expression modulesthe TOM dissimilarity matrix Dfrac14 [dij] is clustered using optimalhierarchical clustering method [54] Here the identified geneco-expression modules are groups of lncRNAs and mRNAs withhigh topological overlap The lncRNAs and mRNAs of eachlncRNAndashmRNA co-expression module are considered for possi-ble lncRNAndashmRNA causal relationships in the next step
Identification of module-specific lncRNAndashmRNA causalregulatory networks
After the identification of lncRNAndashmRNA co-expression mod-ules we use the parallel IDA method [24] to estimate causaleffects of possible lncRNAndashmRNA causal pairs in each moduleThe application of parallel IDA method to matched lncRNAand mRNA expression data for estimating causal effectsincludes two steps (i) learning the causal structure fromexpression data using the parallel-PC algorithm [24] and(ii) estimating the causal effects of lncRNAs on mRNAs byapplying do-calculus [55]
In step (i) Vfrac14 L1 Lm T1 Tn is a set of random varia-bles denoting m lncRNAs and n mRNAs The causal structure isin the form of a DAG where a node denotes a lncRNA Li ormRNA Tj and an edge between two nodes represents a causalrelationship between them We use the parallel-PC algorithm aparallel version of the PC algorithm [56] to learn the causalstructures (the DAGs) from expression data Starting with a fullyconnected undirected graph the parallel-PC algorithm deter-mines if an edge is retained or removed in the graph by con-ducting conditional independence tests in parallel Then to geta DAG the directions of edges in the obtained graph are ori-ented As different DAGs may represent the same conditionalindependence the parallel-PC algorithm uses a completed par-tially directed acyclic graph (CPDAG) to uniquely describe anequivalence class of DAGs In this work we use the R-packageParallelPC [57] to implement the parallel-PC algorithm and setthe significant level of the conditional independence testsafrac14 001
In step (ii) we are only interested in estimating the causaleffect of the directed edge Li Tj where vertex is Li a parent ofvertex Tj As described above a CPDAG may generate a class ofDAGs For the causal effect of Li Tj in a CPDAG we use do-calculus [55] to estimate the causal effects of Li on Tj in a class ofDAGs Then we use the minimum absolute value of all possiblecausal effects as a final causal effect of Li Tj As for the detailsof how the parallel IDA method is applied to estimate causalrelationships from expression data the readers can refer to [24]
The estimated causal effects can be positive or negativereflecting the up or down regulation by the lncRNAs on themRNAs For the purpose of constructing the regulatory net-works we use the absolute values of the causal effects (AVCEs)to evaluate the strengths of the regulation and thus to confirmthe regulatory relationships
We set different AVCE cutoffs from 010 to 060 with a step of005 to generate MSLCRN networks in GBM LSCC OvCa andPrCa respectively For each cutoff we merge the identifiedMSLCRN networks to obtain global lncRNAndashmRNA causal regu-latory networks in the four human cancers respectively Asshown in Table 3 a higher cutoff selection causes a smallerglobal lncRNAndashmRNA causal regulatory network but bettergoodness of fit To make a trade-off between the size of theglobal lncRNAndashmRNA causal regulatory networks and goodnessof fit we set a compromised AVCE cutoff with a value of 045 Ifthe AVCE of a lncRNA on a mRNA is 045 or above we considerthere is a causal regulatory relationship between the lncRNAndashmRNA pair Under the compromise cutoff we have a moderatesize of the global lncRNAndashmRNA causal regulatory networks inGBM LSCC OvCa and PrCa Meanwhile the node degree distri-butions of four global lncRNAndashmRNA causal regulatory net-works also follow power law distribution (the fitted power curveis in the form of yfrac14 axb) well with R2gt 08
Validation survival and enrichment analysis
Previous studies have demonstrated that about 20 of thenodes in a biological network are essential and are regarded ashub genes [58 59] Therefore when analyzing a global lncRNAndashmRNA causal network we select the 20 of lncRNAs with thehighest degrees in the network as hub lncRNAs The degree of alncRNA node in the global network is the number of mRNAsconnected with it
To validate the predicted module-specific lncRNAndashmRNAcausal regulatory relationships we obtain the experimentallyvalidated lncRNAndashmRNA regulatory relationships from thethree widely used databases NPInter v30 [40] LncRNADiseasev2017 [41] and LncRNA2Target v12 [42] Furthermore we retainexperimentally validated lncRNAndashmRNA regulatory relation-ships associated with the four human cancer data sets asground truth
We perform survival analysis using the R-package survival[60] A multivariate Cox model is used to predict the risk scoreof each tumor sample Then all tumor samples in each cancerdata set are equally divided into high- and low-risk groupsaccording to their risk scores Moreover we calculate theHazard Ratio between the high- and the low-risk groups andperform the Log-rank test
To further investigate the underlying biological processesand pathways related to each of the MSLCRN networks we usethe R-package clusterProfiler [61] to conduct functional enrich-ment analysis on the networks respectively The GeneOntology (GO) [62] biological processes and Kyoto Encyclopedia
8 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
of Genes and Genomes (KEGG) [63] pathways with adjustedP-valuelt005 [adjusted by Benjamini-Hochberg (BH) method]are regarded as functional categories for the MSLCRN networks
We also collect a list of lncRNAs and mRNAs that areassociated with GBM LSCC OvCa and PrCa to study diseaseenrichment of each of the MSLCRN networks The list of disease-associated lncRNAs is obtained from LncRNADisease v2017 [41]Lnc2Cancer v2016 [64] and MNDR v20 [65] The list of disease-associated mRNAs is from DisGeNET v50 [66] To evaluatewhether a MSLCRN network is significantly enriched in a specificdisease we use a hyper-geometric distribution test as follows
p frac14 1 FethxjBNMTHORN frac14 1Xx1
ifrac140
N
i
B N
M i
B
M
(3)
In the formula B is the number of all genes in the expressiondata set N denotes the number of all genes associated with aspecific disease in the expression data set M is the number ofgenes in a MSLCRN network and x is the number of genes asso-ciated with a specific disease in a MSLCRN network A MSLCRNnetwork is significantly enriched in a specific disease if theP-valuelt 005
Network analysis validation and comparisonon MSLCRN networkslncRNAs exhibit dynamic positive gene regulationacross cancers
By following the first step of the MSLCRN method we haveidentified 23 38 45 and 32 lncRNAndashmRNA co-expression mod-ules in GBM LSCC OvCa and PrCa respectively In the secondstep of the MSLCRN method we eliminate the noncausallncRNAndashmRNA pairs in lncRNAndashmRNA co-expression modulesAs a result we generate 23 38 45 and 32 module-specificlncRNAndashmRNA causal regulatory networks in GBM LSCC OvCaand PrCa respectively After merging the module-specificlncRNAndashmRNA causal regulatory networks for each data set weobtain the four global lncRNAndashmRNA regulatory networks inGBM LSCC OvCa and PrCa respectively
To understand the overlap and difference of module-specificgenes module-specific lncRNAndashmRNA causal regulatory rela-tionships and module-specific hub lncRNAs in the four humancancers we generate three set intersection plots using theR-package UpSetR [67] As shown in Figure 3 we find that themajority of module-specific genes (5752) module-specificlncRNAndashmRNA causal regulatory relationships (9902) andmodule-specific hub lncRNAs (8922) tend to be cancer-specific Only a small portion of module-specific genes (396) andmodule-specific lncRNAndashmRNA causal regulatory relationships(6) are shared by the four cancers Especially none of themodule-specific hub lncRNAs are common between the fourcancers In addition the causal effects are positive for 99569672 9993 and 7863 of the causal regulatory relationshipsidentified in GBM LSCC OvCa and PrCa respectively Theseresults indicate that lncRNAs are more likely to exhibit dynamicpositive gene regulation across cancers The results are alsoconsistent with the proposition that the positive gene regula-tion by lncRNAs would be desired in specific situations [68]
Differential network analysis uncovers cancer-specificlncRNAndashmRNA causal networks
In this section we focus on studying cancer-specific lncRNAndashmRNA causal networks using differential network analysisThus the GBM-specific LSCC-specific OvCa-specific and PrCa-specific lncRNAndashmRNA causal networks are identified Asshown in Figure 4A the distributions of node degrees in thesefour cancer-specific lncRNAndashmRNA causal networks followpower law distributions well with R2frac14 09774 09923 09723and 08310 respectively Thus these four cancer-specificlncRNAndashmRNA causal networks are scale free indicating thatmost mRNAs are regulated by a small number of lncRNAs
Table 3 Degree distributions of global lncRNAndashmRNA causal regula-tory networks with different cutoffs in GBM LSCC OvCa and PrCa
Datasets Cutoffs Number of causalregulations
yfrac14axb R2
GBM 010 11 847 yfrac142274x06893 04161015 10 924 yfrac142495x07275 05460020 9732 yfrac142745x0767 06475025 8461 yfrac142958x08074 06757030 7176 yfrac143194x08319 06807035 6041 yfrac143363x08703 07203040 4997 yfrac143741x09348 07999045 4074 y54082x21034 08694050 3279 yfrac144194x118 09244055 2583 yfrac143896x1259 09463060 1862 yfrac143666x143 09792
LSCC 010 789 172 yfrac143143x06071 04829015 684 524 yfrac143475x06323 05841020 569 369 yfrac143905x06525 06578025 451 346 yfrac144855x06928 07789030 340 860 yfrac146341x07554 08796035 244 547 yfrac148147x08379 09504040 166 593 yfrac149724x0935 09848045 108 024 y51031x21018 09933050 66 335 yfrac149425x1068 09963055 37 632 yfrac147807x1089 09948060 19 547 yfrac146565x1169 09972
OvCa 010 333 146 yfrac143272x05928 05042015 232 794 yfrac144192x06262 06531020 159 872 yfrac146398x07216 08247025 112 792 yfrac148816x08356 09120030 80 808 yfrac141008x09472 09551035 57 099 yfrac149545x1014 09744040 38 517 yfrac148198x1066 09748045 24 439 y56575x21066 09697050 14 435 yfrac14540x1079 09551055 7973 yfrac144368x1107 09319060 4026 yfrac143285x1107 09460
PrCa 010 1 894 322 yfrac143089x06245 02750015 1 749 595 yfrac143586x06787 03582020 1 594 744 yfrac144013x07169 04316025 1 429 858 yfrac144271x0732 04919030 1 260 968 yfrac144389x07244 05616035 1 097 654 yfrac144406x0702 06470040 946 439 yfrac144485x06816 07338045 812 687 y55175x207005 08206050 694 558 yfrac14667x07588 08823055 584 834 yfrac148833x08469 09332060 474 654 yfrac141113x09684 09503
Note The AVCE cutoffs range from 010 to 060 with a step of 005
The bold values are the degree distributions of global lncRNA-mRNA causal reg-
ulatory networks with a compromised AVCE cutoff (045) in four human cancers
Module-specific lncRNA-mRNA causal regulatory networks | 9
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
Next we use four lists of lncRNAs and mRNAs associatedwith GBM LSCC OvCa and PrCa to discover lncRNAndashmRNAcausal networks that are associated with the four human can-cers We define that cancer-related lncRNAndashmRNA causal regu-latory relationships are those in which at least one regulatoryparty is cancer-related lncRNA or mRNA As a result wehave extracted GBM-related LSCC-related OvCa-related andPrCa-related lncRNAndashmRNA causal networks from the fourcancer-specific lncRNAndashmRNA causal networks (details inSupplementary File S1) To understand the potential biologicalprocesses and pathways of the four cancer-related lncRNAndashmRNA causal networks we identify significant GO biologicalprocesses and KEGG pathways using functional enrichmentanalysis In Figure 4B several top GO biological processes and
KEGG pathways such as cytokine activity [69] G-proteincoupled receptor binding [70] TNF signaling pathway [71] cAMPsignaling pathway [72] pathways in cancer are closely associ-ated with the occurrence and development of cancer Thisresult suggests that the identified cancer-related lncRNAndashmRNA causal networks may be involved in the occurrence anddevelopment of human cancer
Conservative network analysis highlights a corelncRNAndashmRNA causal regulatory network acrosshuman cancers
Although most of the lncRNAndashmRNA causal regulatory relation-ships are cancer-specific there are still a number of common
396 424
106 115
1370
133 173
1599
126
1729
1224
252
2523
2157
5081
0
2000
4000
Inte
rsec
tion
Siz
e
PrCa
Ov Ca
LSCC
GBM
025
0050
0075
00
1000
0
Set Size
6 283
13 1 80 673
206
3969
76 3493
407
2816
9950
7
1948
7
0
250000
500000
750000
Inte
rsec
tion
Siz
e
PrCa
Ov Ca
LSCC
GBM
0e+0
0
2e+0
5
4e+0
5
6e+0
5
8e+0
5
Set Size
6 1 2 933
2
41
6 11
246
47
524
0
200
400In
ters
ectio
n S
ize
PrCa
Ov Ca
LSCC
GBM
0200
400
600
Set Size
Module-specific genes
Module-specific causal regulations Module-specific hub lncRNAs 808611
A
B C
Figure 3 Overlap and difference of module-specific genes module-specific causal regulations and module-specific hub lncRNAs across GBM LSCC OvCa and PrCa
(A) Module-specific genes (both lncRNAs and mRNAs) intersection plot (B) Module-specific causal regulations intersection plot (C) Module-specific hub lncRNAs inter-
section plot The red lines denote common genes and causal regulations across GBM LSCC OvCa and PrCa
10 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
causal regulatory relationships between the four global net-works To evaluate whether there is a common core of lncRNAndashmRNA causal regulatory relationships in the global regulatorynetworks across human cancers we concentrate on the con-served lncRNAndashmRNA causal regulatory relationships thatexisted in at least three human cancers
As shown in Figure 5A the majority of the conservedlncRNAndashmRNA causal regulatory relationships form a closely
connected community This finding indicates that the con-served lncRNAndashmRNA causal regulatory network may be a corenetwork across human cancers
The survival analysis shows that the lncRNAs and mRNAs inthe core network can significantly distinguish the metastasisrisks between the high- and low-risk groups in GBM OvCa andPrCa data sets (Figure 5B) This result suggests that the core net-work may act as a common network biomarker of GBM OvCa
Cancer-specific networks Causal regulations y=axb R2
GBM-specific 2816 y=4812x-1292 09774
LSCC-specific 99507 y=1034x-1014 09923
OvCa-specific 19487 y=7366x-1156 09723
PrCa-specific 808611 y=5243x-07055 08310
1 10 100 11001
10
100
1100
Degree of genes
Nu
mb
er o
f ge
nes
GBM-specific fitting curveLSCC-specific fitting curveOvCa-specific fitting curvePrCa-specific fitting curveGBM-specific degree distributionLSCC-specific degree distributionOvCa-specific degree distributionPrCa-specific degree distribution
solutecation symporter activitysymporter activity
gated channel activitysodium ion transmembrane transporter activity
ion channel activitysubstrate-specific channel activity
channel activitypassive transmembrane transporter activity
growth factor activitycation channel activity
metal ion transmembrane transporter activitycollagen bindingheparin binding
sulfur compound bindingintegrin binding
glycosaminoglycan bindingextracellular matrix binding
peptide receptor activityG-protein coupled peptide receptor activity
chemokine bindingG-protein coupled receptor binding
growth factor bindingcytokine binding
serine-type endopeptidase activitydeath receptor activity
tumor necrosis factor-activated receptor activitycytokine receptor activity
protein heterodimerization activitydipeptidase activity
glycoprotein bindingRAGE receptor binding
cytokine receptor bindingcytokine activity
GBM(203)
LSCC(1015)
OvCa(479)
PrCa(3019)
001
002
003
004
padjust
GeneRatio002
004
006
008
Taste transduction
ECM-receptor interaction
cAMP signaling pathway
Calcium signaling pathway
Neuroactive ligand-receptor interaction
PI3K-Akt signaling pathway
Pathways in cancerRegulation of actin cytoskeleton
Complement and coagulation cascades
AGE-RAGE signaling pathway in diabetic complications
Hematopoietic cell lineage
Th17 cell differentiation
Inflammatory bowel disease (IBD)
Malaria
Osteoclast differentiation
Influenza A
Tuberculosis
Intestinal immune network for IgA production
Chagas disease (American trypanosomiasis)
Leishmaniasis
TNF signaling pathway
Toll-like receptor signaling pathway
Rheumatoid arthritis
Cytokine-cytokine receptor interaction
GBM(129)
LSCC(500)
OvCa(266)
PrCa(1364)
GeneRatio
005
010
015
001
002
003
004padjust
GO enrichment analysis KEGG enrichment analysis
A
B
Figure 4 Differential network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) Degree distribution of cancer-specific lncRNAndashmRNA
causal networks in GBM LSCC OvCa and PrCa (B) Functional enrichment analysis of cancer-related lncRNAndashmRNA causal networks in GBM LSCC OvCa and PrCa
Module-specific lncRNA-mRNA causal regulatory networks | 11
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
and PrCa In Figure 5B we also find that the core network con-tains several cancer genes (34 26 30 and 38 cancer genes asso-ciated with GBM LSCC OvCa and PrCa respectively)
By conducting GO and KEGG enrichment analysis we findthat the core network is significantly enriched in 399 GO biologi-cal processes and 3 KEGG pathways (details in SupplementaryFile S2) Of the 399 GO biological processes 2 GO terms includ-ing negative regulation of cell adhesion (GO 0007162) and cyto-kine production in immune response (GO 0002367) areinvolved in three cancer hallmarks Tissue Invasion andMetastasis Tumor Promoting Inflammation and EvadingImmune Detection [73] This observation implies that the corenetwork may control these cancer-related hallmarks
Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers
We divide the hub lncRNAs into two categories (1) conserved hublncRNAs which exist in at least three human cancers and (2)cancer-specific hub lncRNAs which only exist in single humancancer As a result we obtain 9 conserved hub lncRNAs and 828cancer-specific hub lncRNAs (include 11 GBM-specific 246 LSCC-specific 47 OvCa-specific and 524 PrCa-specific hub lncRNAs)
To evaluate whether the hub lncRNAs can distinguish meta-stasis risks of human cancers we use them to predict metasta-sis risks for tumor samples in GBM LSCC OvCa and PrCaAs shown in Figure 6A the conserved hub lncRNAs can discrim-inate the metastasis risks of tumor samples significantly(Log-rank P-valuelt 005) in four human cancers In Figure 6Bexcepting LSCC-specific hub lncRNAs owing to failing to fit aCox regression model GBM-specific OvCa-specific and PrCa-specific hub lncRNAs can discriminate the metastasis risks oftumor samples significantly in GBM OvCa and PrCa respec-tively (Log-rank P-valuelt 005) These results suggest that thehub lncRNAs are discriminative and can act as biomarkers todistinguish between high- and low-risk tumor samples
Experimentally validated lncRNAndashmRNA regulations aremostly bad hits for LncTar
Using a collection of experimentally validated lncRNAndashmRNAregulatory relationships (details in Supplementary File S3) asthe ground truth the numbers of experimentally confirmedlncRNAndashmRNA causal regulations are 17 14 20 and 42 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4)
Figure 5 Conservative network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) The core lncRNAndashmRNA causal network that
occurred in at least three human cancers The red diamond nodes and white circle nodes denote lncRNAs and mRNAs respectively (B) Survival analysis of the core
lncRNAndashmRNA causal network
12 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
We further apply a representative sequence-based methodcalled LncTar [11] to the experimentally validated lncRNAndashmRNAcausal regulatory relationships discovered by MSLCRN There aretwo main reasons for choosing LncTar First LncTar does nothave a limit to input RNA size Second LncTar uses a quantitativestandard rather than expert knowledge to determine whetherlncRNAs interact with mRNAs Similar to LncTar we also set -01as normalized binding free energy (ndG) cutoff to determinewhether lncRNAndashmRNA pairs interact with each other In otherwords the lncRNAndashmRNA pairs with ndG01 are regarded as
lncRNAndashmRNA regulatory relationships Among the experimen-tally confirmed lncRNAndashmRNA causal regulatory relationships thatare discovered by MSLCRN the numbers of successfully predictedlncRNAndashmRNA regulations using LncTar are 0 0 1 and 1 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4) The result indicates that our experimentally confirmedlncRNAndashmRNA causal regulations are mostly bad hits for LncTarMeanwhile this result also suggests that expression-based andsequence-based methods may be complementary with each otherin predicting lncRNAndashmRNA regulations
A
B
Figure 6 Survival analysis of hub lncRNAs (A) Conserved hub lncRNAs in GBM LSCC OvCa and PrCa datasets (B) Survival analysis of cancer-specific hub lncRNAs
Module-specific lncRNA-mRNA causal regulatory networks | 13
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
MSLCRN networks are biologically meaningful
In this section we conduct GO and KEGG enrichment analysisto check whether the MSLCRN networks are associated withsome biological processes and pathways significantlyEnrichment analysis uncovers that 15 of the 23 (6522)MSLCRN networks in GBM 29 of the 38 (7632) MSLCRN net-works in LSCC 30 of the 45 (6667) MSLCRN networks inOvCa and 20 of the 32 (6250) MSLCRN networks in PrCa aresignificantly enriched in at least one GO biological process orKEGG pathway respectively (details in Supplementary File S5)This result implies that most of the MSLCRN networks in eachcancer are functional networks
We further investigate whether the MSLCRN networks aresignificantly enriched in GBM LSCC OvCa and PrCa diseasesrespectively We discover that 5 of the 23 MSLCRN networks7 of the 38 MSLCRN networks 6 of the 45 MSLCRN networks and6 of the 32 MSLCRN networks are significantly enriched in GBMLSCC OvCa and PrCa diseases respectively (details inSupplementary File S5) This result indicates that severalMSLCRN networks are closely associated with GBM LSCC OvCaand PrCa diseases
Altogether functional and disease enrichment analysis resultsshow that MSLCRN networks are biologically meaningful
Comparison with other PC-based networkinference methods
Based on a parallel version of the PC algorithm [56] the parallelIDA method in the second step of MSLCRN learns the causalstructure from expression data Owing to the popularity of thePC algorithm in causal structure learning some other networkinference methods including PCA-CMI [74] PCA-PMI [75] andCMI2NI [76] have also successfully applied it for network infer-ence Different from the three methods using conditional orpartial mutual information to infer lncRNAndashmRNA regulationsour method estimates causal effects to identify lncRNAndashmRNAregulations For comparisons we also use the PCA-CMI PCA-PMI and CMI2NI methods to infer module-specific lncRNAndashmRNA regulatory relationships Similar to our method (whichuses the parallel IDA method) the strength cutoff of lncRNAndashmRNA regulatory relationships in PCA-CMI PCA-PMI andCMI2NI methods is also set to 045
We evaluate the performance of each method in terms offinding experimentally validated lncRNAndashmRNA regulatoryrelationships functional MSLCRN networks and disease-associated MSLCRN networks As shown in Table 4 in terms ofthe three criteria MSLCRN performs the best in GBM LSCCOvCa and PrCa data sets This result suggests that MSLCRN is auseful method to infer module-specific lncRNAndashmRNA regula-tory network in human cancers
Conclusions and discussion
Notwithstanding lncRNAs do not encode proteins directly theyengage in a wide range of biological processes including cancerdevelopments through their interactions with other biologicalmacromolecules eg DNA RNA and protein Therefore touncover the functions and regulatory mechanisms of lncRNAsit is necessary to investigate lncRNAndashtarget regulatory networkacross different types of biological conditions
As a biological network the lncRNAndashtarget regulatory net-work exhibits a high degree of modularity Each functionalmodule is responsible for implementing specific biological
functions Moreover modularity is an important feature ofhuman cancer development and progression Thus from a net-work community point of view it is necessary to investigatemodule-specific lncRNAndashmRNA regulatory networks
Until now several statistical correlation or associationmeasures eg Pearson Mutual Information and ConditionalMutual Information have been used to infer gene regulatorynetworks However these methods tend to identify indirect reg-ulatory relationships between genes The identified gene regu-latory networks cannot reflect real lsquocausalrsquo regulatoryrelationships To better understand lncRNA regulatory mecha-nism it is vital to investigate how lncRNAs causally influencethe expression levels of their target mRNAs
In this work the computational methods for inferringlncRNAndashmRNA interactions and the publicly available data-bases of lncRNAndashmRNA regulatory relationships are firstreviewed Then to address the above two issues we propose anovel computational method MSLCRN to study module-specific lncRNAndashmRNA causal regulatory networks across GBMLSCC OvCa and PrCa diseases In contrast to other approaches(expression-based and sequence-based methods) MSLCRN hastwo unique features First MSLCRN considers the modularity oflncRNAndashmRNA regulatory networks Instead of studying globalregulatory relationships between lncRNAs and mRNAs wefocus on investigating the regulatory behavior of lncRNAs in themodules of interest Second considering the restrictions withconducting gene knockout experiments MSLCRN uses thecausal inference method IDA to infer causal relationshipsbetween lncRNAs and mRNAs based on expression data Thepromising results suggest that exploiting modularity of generegulatory network and causality-based method could provideanother effective approach to elucidating lncRNA functions andregulatory mechanisms of human cancers
Despite the advantages of MSLCRN there is still room toimprove it First the WGCNA method only allows clusteringgenes across all samples from the matched lncRNA and mRNAexpression data In fact a class of genes may exhibit similarexpression patterns across a subset of samples An alternativesolution of this problem is to use a bi-clustering method to iden-tify lncRNAndashmRNA co-expression modules Second it is stilltime-consuming to estimate causal effects from large expres-sion data sets When constructing the module-specific lncRNAndashmRNA causal regulatory networks the running time of parallelIDA is still high on estimating the causal effects of lncRNAs onmRNAs In future more efficient parallel IDA method is neededto explore lncRNAndashmRNA causal regulatory relationships inlarge-scale expression data Third previous research [38]has shown that the prediction accuracy of lncRNAndashmRNA inter-actions can be improved by integrating both sequence data and
Table 4 Comparison results in terms of experimentally validatedlncRNAndashmRNA regulatory relationships functional MSLCRN net-works and disease-associated MSLCRN networks
Methods GBM (a b c) LSCC (a b c) OvCa (a b c) PrCa (a b c)
MSLCRN (17 15 5) (14 29 7) (20 30 6) (42 20 6)PCA-CMI (2 13 0) (0 11 0) (0 7 1) (0 20 2)PCA-PMI (2 15 1) (0 11 0) (0 8 2) (1 18 1)CMI2NI (2 15 0) (0 11 0) (0 7 1) (0 19 1)
Note afrac14number of experimentally validated lncRNAndashmRNA regulatory relation-
ships bfrac14number of functional MSLCRN networks cfrac14number of disease-asso-
ciated MSLCRN networks
14 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
expression data To improve the accuracy of the predictedlncRNAndashmRNA regulatory relationships it is necessary todevelop an ensemble method (fusing sequence-based andexpression-based methods) to infer lncRNAndashmRNA regulatorynetwork Finally recent studies [77] show that lncRNAs can actas competing endogenous RNAs (ceRNAs) or miRNA sponges toattract miRNAs for bindings by competing with mRNAsTherefore some predicted lncRNAndashmRNA regulatory relation-ships are lncRNA-related ceRNAndashceRNA interactions To furtherimprove the prediction of lncRNAndashmRNA regulatory relation-ships it is necessary to remove the crosstalk relationshipsbetween lncRNAs and mRNAs
Key Points
bull Among ncRNAs lncRNAs are a large and diverse classof RNA molecules and are thought to be a gold mine ofpotential oncogenes anti-oncogenes and newbiomarkers
bull lncRNAs exhibit dynamic positive gene regulationacross human cancers
bull Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers
bull There is still a lack of ground truth for validating pre-dicted lncRNAndashmRNA regulatory relationships
bull There is still room to develop reliable methods for elu-cidating lncRNA regulatory mechanisms
Supplementary Data
Supplementary data are available online at httpsacademicoupcombib
Funding
The National Natural Science Foundation of China (No61702069) the Applied Basic Research Foundation ofScience and Technology of Yunnan Province (No2017FB099) the NHMRC Grant (No 1123042) and theAustralian Research Council Discovery Grant (NoDP140103617)
References1 Pang KC Frith MC Mattick JS Rapid evolution of noncoding
RNAs lack of conservation does not mean lack of functionTrends Genet 200622(1)1ndash5
2 Kung JT Colognori D Lee JT Long noncoding RNAs pastpresent and future Genetics 2013193(3)651ndash69
3 Schmitt AM Chang HY Long noncoding RNAs in cancer path-ways Cancer Cell 201629(4)452ndash63
4 Zhang Y Tao Y Liao Q Long noncoding RNA a crosslink inbiological regulatory network Brief Bioinform 2017 doi 101093bibbbx042
5 Yoon JH Abdelmohsen K Gorospe M Posttranscriptionalgene regulation by long noncoding RNA J Mol Biol 2013425(19)3723ndash30
6 Gerlach W Giegerich R GUUGle a utility for fast exact match-ing under RNA complementary rules including G-U base pair-ing Bioinformatics 200622(6)762ndash4
7 Muckstein U Tafer H Hackermuller J et al Thermodynamicsof RNA-RNA binding Bioinformatics 200622(10)1177ndash82
8 Tafer H Hofacker IL RNAplex a fast tool for RNA-RNA inter-action search Bioinformatics 200824(22)2657ndash63
9 Busch A Richter AS Backofen R IntaRNA efficient predictionof bacterial sRNA targets incorporating target site accessibil-ity and seed regions Bioinformatics 200824(24)2849ndash56
10Kato Y Sato K Hamada M et al RactIP fast and accurate pre-diction of RNA-RNA interaction using integer programmingBioinformatics 201026(18)i460ndash6
11Li J Ma W Zeng P et al LncTar a tool for predicting the RNAtargets of long noncoding RNAs Brief Bioinform 201516(5)806ndash12
12Fukunaga T Hamada M RIblast an ultrafast RNA-RNA inter-action prediction system based on a seed-and-extensionapproach Bioinformatics 201733(17)2666ndash74
13Derrien T Johnson R Bussotti G et al The GENCODE v7 cata-log of human long noncoding RNAs analysis of their genestructure evolution and expression Genome Res 201222(9)1775ndash89
14Gloss BS Dinger ME The specificity of long noncoding RNAexpression Biochim Biophys Acta 20161859(1)16ndash22
15Munshi A Mohan V Ahuja YR Non-coding RNAs a dynamicand complex network of gene regulation J PharmacogenomicsPharmacoproteomics 20167156
16Liao Q Liu C Yuan X et al Large-scale prediction of long non-coding RNA functions in a coding-non-coding gene co-expression network Nucleic Acids Res 201139(9)3864ndash78
17Guo Q Cheng Y Liang T et al Comprehensive analysis oflncRNA-mRNA co-expression patterns identifies immune-associated lncRNA biomarkers in ovarian cancer malignantprogression Sci Rep 20155(1)17683
18Du Y Xia W Zhang J et al Comprehensive analysis of longnoncoding RNA-mRNA co-expression patterns in thyroidcancer Mol Biosyst 201713(10)2107ndash15
19Wu W Wagner EK Hao Y et al Tissue-specific co-expressionof long non-coding and coding RNAs associated with breastCancer Sci Rep 2016632731
20Barabasi AL Oltvai ZN Network biology understanding thecellrsquos functional organization Nat Rev Genet 20045(2)101ndash13
21Langfelder P Horvath S WGCNA an R package for weightedcorrelation network analysis BMC Bioinformatics 20089559
22Maathuis HM Kalisch M Buhlmann P Estimating high-dimensional intervention effects from observational dataAnn Stat 200937(6A)3133ndash64
23Maathuis HM Colombo D Kalisch M et al Predicting causaleffects in large-scale systems from observational data NatMethods 20107(4)247ndash8
24Le T Hoang T Li J et al A fast PC algorithm for high dimen-sional causal discovery with multi-core PCs IEEEACM TransComput Biol Bioinform 2016 doi 101109TCBB20162591526
25Du Z Fei T Verhaak RG et al Integrative genomic analysesreveal clinically relevant long noncoding RNAs in humancancer Nat Struct Mol Biol 201320(7)908ndash13
26Bernhart SH Tafer H Muckstein U et al Partition functionand base pairing probabilities of RNA heterodimersAlgorithms Mol Biol 20061(1)3
27Alkan C Karakoc E Nadeau JH et al RNA-RNA interactionprediction and antisense RNA target search J Comput Biol200613(2)267ndash82
28Seemann SE Richter AS Gesell T et al PETcofold predictingconserved interactions and structures of two multiple align-ments of RNA sequences Bioinformatics 201127(2)211ndash19
29Wenzel A Akbasli E Gorodkin J RIsearch fast RNA-RNAinteraction search using a simplified nearest-neighborenergy model Bioinformatics 201228(21)2738ndash46
Module-specific lncRNA-mRNA causal regulatory networks | 15
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
30Alkan F Wenzel A Palasca O et al RIsearch2 suffix array-based large-scale prediction of RNA-RNA interactions andsiRNA off-targets Nucleic Acids Res 201745e60
31Hu R Sun X lncRNATargets a platform for lncRNA target pre-diction based on nucleic acid thermodynamics J BioinformComput Biol 201614(4)1650016
32Terai G Iwakiri J Kameda T et al Comprehensive predictionof lncRNA-RNA interactions in human transcriptome BMCGenomics 201617(Suppl 1)12
33Liu J Wu S Li M et al LncRNA expression profiles reveal theco-expression network in human colorectal carcinoma Int JClin Exp Pathol 201691885ndash1892
34Huang S Feng C Chen L et al Identification of potential keylong non-coding RNAs and target genes associated withpneumonia using long non-coding RNA sequencing (lncRNA-Seq) a preliminary study Med Sci Monit 2016223394ndash408
35Li J Xu Y Xu J et al Dynamic co-expression network analysisof lncRNAs and mRNAs associated with venous congestionMol Med Rep 201614(3)2045ndash51
36Fu M Huang G Zhang Z et al Expression profile of long non-coding RNAs in cartilage from knee osteoarthritis patientsOsteoarthritis Cartilage 201523(3)423ndash32
37Zhang F Gao C Ma XF et al Expression profile of long non-coding RNAs in peripheral blood mononuclear cells frommultiple sclerosis patients CNS Neurosci Ther 201622(4)298ndash305
38 Iwakiri J Terai G Hamada M Computational prediction oflncRNA-mRNA interactionsby integrating tissue specificity inhuman transcriptome Biol Direct 201712(1)15
39Lv L Wei M Lin P et al Integrated mRNA and lncRNA expres-sion profiling for exploring metastatic biomarkers of humanintrahepatic cholangiocarcinoma Am J Cancer Res 20177688ndash99
40Hao Y Wu W Li H et al NPInter v30 an upgraded databaseof noncoding RNA-associated interactions Database 20162016baw057
41Chen G Wang Z Wang D et al LncRNADisease a databasefor long-non-coding RNA-associated diseases Nucleic AcidsRes 201341D983ndash6
42 Jiang Q Wang J Wu X et al LncRNA2Target a database fordifferentially expressed genes after lncRNA knockdown oroverexpression Nucleic Acids Res 201543D193ndash6
43Zhou Z Shen Y Khan MR et al LncReg a reference resourcefor lncRNA-associated regulatory networks Database 20152015bav083
44Denisenko E Ho D Tamgue O et al IRNdb the database ofimmunologically relevant non-coding RNAs Database 20162016baw138
45Liu CJ Gao C Ma Z et al lncRInter a database of experimen-tally validated long non-coding RNA interaction J GenetGenomics 201744(5)265ndash8
46Li JH Liu S Zhou H et al starBase v20 decoding miRNA-ceRNA miRNA-ncRNA and protein-RNA interaction net-works from large-scale CLIP-Seq data Nucleic Acids Res 201442(D1)D92ndash7
47Liu Y Zhao M lnCaNet pan-cancer co-expression networkfor human lncRNA and cancer genes Bioinformatics 201632(10)1595ndash7
48Zhou QZ Zhang B Yu QY et al BmncRNAdb a comprehen-sive database of non-coding RNAs in the silkworm Bombyxmori BMC Bioinformatics 201617(1)370
49Park C Yu N Choi I et al lncRNAtor a comprehensiveresource for functional investigation of long non-codingRNAs Bioinformatics 201430(17)2480ndash5
50Bhartiya D Pal K Ghosh S et al lncRNome a comprehensiveknowledgebase of human long noncoding RNAs Database20132013bat034
51Zhao Z Bai J Wu A et al Co-LncRNA investigating thelncRNA combinatorial effects in GO annotations and KEGGpathways based on human RNA-Seq data Database 20152015bav082
52 Jiang Q Ma R Wang J et al LncRNA2Function a compre-hensive resource for functional investigation of humanlncRNAs based on RNA-seq data BMC Genomics 201516(Suppl 3)S2
53Chan WL Huang HD Chang JG lncRNAMap a map of puta-tive regulatory functions in the long non-coding transcrip-tome Comput Biol Chem 20145041ndash9
54Langfelder P Horvath S Fast R functions for robust correla-tions and hierarchical clustering J Stat Softw 2012461ndash17
55 Judea P Causality Models Reasoning and Inference New YorkNY Cambridge University Press 2000
56Spirtes P Glymour C Scheines R Causation Prediction andSearch 2nd edn Cambridge MIT Press 2000
57Le T Hoang T Li J et al ParallelPC an R package for efficientconstraint based causal exploration arXiv prepring 2015arXiv151003042v1
58Hahn MW Kern AD Comparative genomics of centrality andessentiality in three eukaryotic protein-interaction networksMol Biol Evol 200522(4)803ndash6
59Song J Singh M Roth FP From hub proteins to hub modulesthe relationship between essentiality and centrality in theyeast interactome at different scales of organization PLoSComput Biol 20139(2)e1002910
60Therneau TM Grambsch PM Modeling Survival Data Extendingthe Cox Model New York Springer Press 2000
61Yu G Wang L-G Han Y He Q-Y clusterProfiler an R packagefor comparing biological themes among gene clusters OMICS201216(5)284ndash7
62Ashburner M Ball CA Blake JA et al Gene ontology tool forthe unification of biology Nat Genet 200025(1)25ndash9
63Kanehisa M Goto S KEGG Kyoto Encyclopedia of Genes andGenomes Nucleic Acids Res 200028(1)27ndash30
64Ning S Zhang J Wang P et al Lnc2Cancer a manually curateddatabase of experimentally supported lncRNAs associatedwith various human cancers Nucleic Acids Res 201644(D1)D980ndash5
65Wang Y Chen L Chen B et al Mammalian ncRNA-diseaserepository a global view of ncRNA-mediated disease net-work Cell Death Dis 20134e765
66Pi~nero J Bravo A Queralt-Rosinach N et al DisGeNET a com-prehensive platform integrating information on humandisease-associated genes and variants Nucleic Acids Res 201745(D1)D833ndash9
67Conway JR Lex A Gehlenborg N UpSetR an R package for thevisualization of intersecting sets and their propertiesBioinformatics 201733(18)2938ndash40
68Wahlestedt C Targeting long non-coding RNA to therapeuti-cally upregulate gene expression Nat Rev Drug Discov 201312(6)433ndash46
69Mantovani G Maccio A Lai P et al Cytokine activity incancer-related anorexiacachexia role of megestrol acetateand medroxyprogesterone acetate Semin Oncol 19982545ndash52
70Dorsam RT Gutkind JS G-protein-coupled receptors and can-cer Nat Rev Cancer 20077(2)79ndash94
71Wang X Lin Y Tumor necrosis factor and cancer buddies orfoes Acta Pharmacol Sin 200829(11)1275ndash88
16 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
72Fajardo AM Piazza GA Tinsley HN The role of cyclic nucleo-tide signaling pathways in cancer targets for prevention andtreatment Cancers 20146(1)436ndash58
73Hanahan D Weinberg RA Hallmarks of cancer the next gen-eration Cell 2011144(5)646ndash74
74Zhang X Zhao XM He K et al Inferring gene regulatory net-works from gene expression data by path consistencyalgorithm based on conditional mutual informationBioinformatics 201228(1)98ndash104
75Zhao J Zhou Y Zhang X et al Part mutual information forquantifying direct associations in networks Proc Natl Acad SciUSA 2016113(18)5130ndash5
76Zhang X Zhao J Hao JK et al Conditional mutual inclusiveinformation enables accurate quantification of associationsin gene regulatory networks Nucleic Acids Re 201543(5)e31
77Le TD Zhang J Liu L et al Computational methods for identi-fying miRNA sponge interactions Brief Bioinform 201718(4)577ndash90
Module-specific lncRNA-mRNA causal regulatory networks | 17
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018View publication statsView publication stats
The above methods simply use matched lncRNA and mRNAexpression data to identify lncRNAndashmRNA co-expression pairsTo identify lsquocis-regulated target genesrsquo of lncRNAs some methodsalso consider mRNA loci information within lncRNA For exampleFu et al [36] combine mRNA loci information and matchedlncRNA and mRNA expression data to predict lncRNA targetsThey identify the mRNAs as targets under two conditions (i) themRNA loci are within a 300-kb window up- or downstream oflncRNA and (ii) lncRNAndashmRNA co-expression pairs are signifi-cantly positive correlated (Pearson correlationgt 08 and the corre-sponding P-valuelt 005) Zhang et al [37] also use a similarmethod to Fu et al [36] for identifying lncRNA targets The mRNAscan be regarded as targets when (1) the mRNA loci are within a10 window up- or downstream of lncRNA and (2) lncRNAndashmRNAco-expression pairs are significantly positive correlated (Pearsoncorrelationgt 098 and the corresponding P-valuelt 005)
Apart from mRNA loci information within lncRNA someemerging methods consider predictions from sequence-basedmethods as putative lncRNAndashmRNA interactions For exampleIwakiri et al [38] integrate tissue-specific lncRNA and mRNAexpression data into predictions from a sequence-basedmethod in [32] They discover that integrating tissue specificitycan improve prediction accuracy of lncRNAndashmRNA interactionsLv et al [39] also combine matched lncRNA and mRNA expres-sion data with predictions from a sequence-based methodLncTar [11] They first use Pearson method to identify co-expressed lncRNAndashmRNA co-expression pairs with Pearsoncorrelationgt095 orlt095 Then LncTar is used to further filterthe identified lncRNAndashmRNA co-expression pairs
Public databases for storing lncRNAndashmRNAregulatory relationships
In this section we review the public databases of storinglncRNAndashmRNA regulatory relationships Table 2 shows a sum-mary of the third-party public databases including experimen-tally validated and computationally predicted databases
NPInter [40] contains experimentally validated interactionsbetween ncRNAs especially lncRNAs and miRNAs The databasecontains 915 067 interactions in 188 tissues or cell lines from 68kinds of experimental technologies There is a classification ofthe functional interactions based on the functional process thatncRNA is involved in Moreover NPInter allows users to searchinteractions related publications and other information
LncRNADisease [41] not only collects experimentally sup-ported lncRNAndashdisease associations and lncRNA interactionsbut also predicts novel lncRNAndashdisease associations Recentlythe database curates 478 entries of experimentally validatedlncRNA interactions LncRNADisease provides users severalways to search lncRNA-related diseases and interactions
To study differentially expressed genes after lncRNA knock-down or overexpression Jiang et al [42] develop a databasecalled LncRNA2Target in human and mouse organisms Thedatabase has a collection of 396 experimentally validatedlncRNAndashtarget interactions In LncRNA2Target if a gene is dif-ferentially expressed after lncRNA knockdown or overexpres-sion it is regarded as a target of a lncRNA For convenienceLncRNA2Target allows users to search for the targets of singlelncRNA or for the lncRNAs that target a specific geneMeanwhile Zhou et al [43] also build a reference resourceLncReg for lncRNA-related regulatory networks The databasehas 1081 experimentally validated lncRNA-related regulatory
records between 258 nonredundant lncRNAs and 571 nonredun-dant genes
IRNdb [44] is a database that focuses on collecting immuno-logically relevant lncRNAndashtarget miRNAndashtarget and PIWI-interacting RNAndashtarget interactions The current version ofIRNdb documents 22 453 immunologically relevant lncRNAndashtar-get interactions by integrating three databases LncRNADisease[41] LncRNA2Target [42] and LncReg [43] The aim is to helpresearchers study the roles of ncRNAs in the immune systemRecently a new experimentally validated database namedlncRInter [45] was developed to collect reliable and high-qualitylncRNAndashtarget interactions The extracted lncRNAndashtarget inter-actions are all from published literature and are supported bycertain biological experiments (eg luciferase reporter assayin vitro binding assay RNA pull-down) In total lncRInter con-tains 1036 experimentally validated lncRNAndashtarget interactionsin 15 organisms
In addition to the experimentally validated databases pre-sented above there are several computationally predicted data-bases for collecting lncRNAndashmRNA interactions For instancestarBase [46] is a comprehensive database of systematicallyidentifying the RNAndashRNA and proteinndashRNA interaction net-works from 108 CLIP-Seq (PAR-CLIP HITS-CLIP iCLIP CLASH)data sets The lncRNAndashmRNA interactions can be extractedfrom proteinndashRNA interaction networks InCaNet [47] aimsto establish a comprehensive regulatory network betweenlncRNAs and cancer genes They identify lncRNAndashcancergene interactions by computing gene co-expression betweenlncRNAs and cancer genes BmncRNAdb [48] is a comprehensivedatabase of silkworm lncRNAs and miRNAs The database pro-vides three online tools for users to predict both lncRNAndashtargetand miRNAndashtarget interactions lncRNAtor [49] collect expres-sion data from 243 RNA-seq experiments including 5237samples of various tissues and developmental stages ThelncRNAndashmRNA co-expression pairs are identified through co-expression analysis of lncRNAs and mRNAs lncRNome [50] is acomprehensive knowledgebase of sequence structure biologi-cal functions genomic variations and epigenetic modificationson gt17 000 lncRNAs in human For lncRNAndashprotein interactionsthe database incorporates PAR-CLIP experiments and a supportvector machine-based prediction method Co-lncRNA [51] andLncRNA2Function [52] predict co-expressed lncRNAndashmRNAinteractions from RNA-Seq data and further annotates thepotential functions of human lncRNAs using functional enrich-ment analysis lncRNAMap [53] is an integrated and compre-hensive database to explore regulatory functions of humanlncRNAs By integrating small RNAs supported by publicly avail-able deep sequencing data lncRNAMap construct lncRNA-derived siRNAndashtarget interactions
In summary for experimentally validated databases userscan select individual database or combine several databases asground truth to validate the predicted lncRNAndashmRNA interac-tions As for computationally predicted databases they can beused as initial structural of sequence-based or expression-basedmethods to identify lncRNAndashmRNA interactions
Inferring and analyzing MSLCRN networksRepurposed microarray data across human cancers
We collect the repurposed lncRNA and mRNA expression dataof GBM LSCC OvCa and PrCa from [25] A lncRNA or mRNA iseliminated if it does not have a corresponding gene symbol in adata set By calculating average expression values of replicate
Module-specific lncRNA-mRNA causal regulatory networks | 5
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
lncRNAs and mRNAs we obtain unique expression value ofthese replicates Consequently we get the matched expressiondata of 9704 lncRNAs and 18 282 mRNAs in 451 GBM 113 LSCC585 OvCa and 150 PrCa samples
Pipeline of MSLCRN
As shown in Figure 2 MSLCRN contains the following threesteps to infer module-specific lncRNAndashmRNA causal regulatorynetworks
i Identification of lncRNAndashmRNA co-expression modulesGiven the matched lncRNA and mRNA expression data weuse WGCNA to generate gene co-expression modules Amodule containing at least two lncRNAs and two mRNAsare regarded as a lncRNAndashmRNA co-expression moduleand used as the input of the second step
ii Identification of module-specific lncRNAndashmRNA causal reg-ulatory networks For each lncRNAndashmRNA co-expressionmodule with each lncRNAndashmRNA pair we apply parallelIDA to estimate the causal effect of the lncRNA on the
Table 2 Public databases for storing lncRNAndashmRNA regulatory relationships
Databases Types of databases Brief descriptions Organisms Available
NPInter [40] Validated A database of experimentally verified func-tional interactions between ncRNAs(including lncRNAs miRNAs etc) and bio-molecules (proteins RNAs and DNAs)
22 organisms httpwwwbioinfoorgNPInter
LncRNADisease [41] Validated A database of experimentally supportedlncRNAndashdisease association data andlncRNAndashtarget interactions in various lev-els including protein RNA miRNA andDNA
Human httpwwwcuilabcnlncrnadisease
LncRNA2Target [42] Validated A database of lncRNAndashtarget regulatory rela-tionships experimentally validated bylncRNA knockdown or overexpression
Human mouse httpwwwlncrna2targetorg
LncReg [43] Validated A database of experimentally validatedlncRNAndashtarget interactions from publicliterature
7 organisms httpbioinformaticsustceducnlncreg
IRNdb [44] Validated A database of immunologically relevantncRNAs (miRNAs lncRNAs and otherncRNAs) and target genes
Human mouse httpcompbiomasseyacnzappsirndb
lncRInter [45] Validated A database of experimentally validatedlncRNAndashtarget interactions extracted frompeer-reviewed publications
15 organisms httpbioinfolifehusteducnlncRInter
starBase [46] Predicted A comprehensive database of systematicallyidentifying the RNAndashRNA and proteinndashRNAinteraction networks from 108 CLIP-Seq(PAR-CLIP HITS-CLIP iCLIP CLASH) datasets
Human httpstarbasesysueducn
lnCaNet [47] Predicted A database of establishing a comprehensiveregulatory network source for lncRNA andcancer genes
Human httplncanetbioinfo-minzhaoorg
BmncRNAdb [48] Predicted A comprehensive database of the silkwormlncRNAs and miRNAs as well as the threeonline tools for users to predict the targetgenes of lncRNAs or miRNAs
Bombyx mori httpgenecqueducnBmncRNAdbindexphp
lncRNAtor [49] Predicted A comprehensive resource of encompassingannotation sequence analysis geneexpression protein binding and phyloge-netic conservation
6 organisms httplncrnatorewhaackr
lncRNome [50] Predicted A comprehensive knowledgebase on thetypes chromosomal locations descriptionon the biological functions and diseaseassociations of lncRNAs
Human httpgenomeigibresinlncRNome
Co-LncRNA [51] Predicted A computationally predicted database toidentify GO annotations and KEGG path-ways affected by co-expressed protein-cod-ing genes of a single or multiple lncRNAs
Human httpwwwbio-bigdatacomCo-LncRNA
LncRNA2Function [52] Predicted A comprehensive resource of investigatingthe functions of lncRNAs based on co-expressed lncRNAndashmRNA interactions
Human httpmlghiteducnlncrna2function
lncRNAMap [53] Predicted An integrated and comprehensive databaseof regulatory functions of lncRNAs and act-ing as ceRNAs
Human httplncRNAMapmbcnctuedutw
6 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
mRNA We use the absolute value of the causal effect(AVCE) to evaluate the strength of the regulation of thelncRNA on the mRNA and a higher AVCE indicates a stron-ger lncRNA regulation The lncRNAndashmRNA pairs with highAVCEs in each module are considered as module-specificlncRNAndashmRNA causal regulatory relationships and we calleach module with these relationships identified a module-specific causal regulatory network
iii Identification of global lncRNAndashmRNA causal regulatorynetwork We integrate the module-specific lncRNAndashmRNA
causal regulatory networks to form the global lncRNAndashmRNA causal regulatory network
Identification of lncRNAndashmRNA co-expression modules
In systems biology WGCNA [21] is a popular method for findingthe correlation patterns among genes across samples and canbe used to identify clusters or modules of highly co-expressedgenes Therefore we use WGCNA to first infer lncRNAndashmRNAco-expression modules
Figure 2 The pipeline of MSLCRN First WGCNA is used to identify lncRNAndashmRNA co-expression modules from matched lncRNA and mRNA expression data Second
we infer lncRNAndashmRNA causal regulatory relationships in each module by using parallel IDA method For each module we assemble the identified lncRNAndashmRNA reg-
ulatory relationships to obtain a module-specific lncRNAndashmRNA causal regulatory network Third the module-specific lncRNAndashmRNA causal regulatory networks are
integrated to form a global lncRNAndashmRNA causal regulatory network
Module-specific lncRNA-mRNA causal regulatory networks | 7
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
Specifically the matched lncRNA and mRNA expressiondata are used as the input of WGCNA For each pair of genesi and j the gene co-expression similarity sij of the pair is definedas
sij frac14 jcorethi jTHORNj (1)
where jcor(i j)j is the absolute value of the Pearson correlationbetween genes i and j The gene co-expression similarity matrixis denoted by Sfrac14 [sij]
To pick an appropriate soft-thresholding power for trans-forming the similarity matrix S into an adjacency matrix A weuse the scale-free topology criterion for soft-thresholding andthe minimum scale free topology fitting index R2 is set as 09Then the topological overlap matrix (TOM) Wfrac14 [wij] is gener-ated based on the adjacency matrix Afrac14 [aij] The TOM similaritywij between genes i and j is defined
wij frac14P
uaiuauj thorn aij
minfP
uaiuP
uaujg thorn 1 aij(2)
where u denotes all genes of the matched lncRNA and mRNAexpression data The TOM dissimilarity between genes i and j isdenoted by dijfrac14 1 - wij To identify gene co-expression modulesthe TOM dissimilarity matrix Dfrac14 [dij] is clustered using optimalhierarchical clustering method [54] Here the identified geneco-expression modules are groups of lncRNAs and mRNAs withhigh topological overlap The lncRNAs and mRNAs of eachlncRNAndashmRNA co-expression module are considered for possi-ble lncRNAndashmRNA causal relationships in the next step
Identification of module-specific lncRNAndashmRNA causalregulatory networks
After the identification of lncRNAndashmRNA co-expression mod-ules we use the parallel IDA method [24] to estimate causaleffects of possible lncRNAndashmRNA causal pairs in each moduleThe application of parallel IDA method to matched lncRNAand mRNA expression data for estimating causal effectsincludes two steps (i) learning the causal structure fromexpression data using the parallel-PC algorithm [24] and(ii) estimating the causal effects of lncRNAs on mRNAs byapplying do-calculus [55]
In step (i) Vfrac14 L1 Lm T1 Tn is a set of random varia-bles denoting m lncRNAs and n mRNAs The causal structure isin the form of a DAG where a node denotes a lncRNA Li ormRNA Tj and an edge between two nodes represents a causalrelationship between them We use the parallel-PC algorithm aparallel version of the PC algorithm [56] to learn the causalstructures (the DAGs) from expression data Starting with a fullyconnected undirected graph the parallel-PC algorithm deter-mines if an edge is retained or removed in the graph by con-ducting conditional independence tests in parallel Then to geta DAG the directions of edges in the obtained graph are ori-ented As different DAGs may represent the same conditionalindependence the parallel-PC algorithm uses a completed par-tially directed acyclic graph (CPDAG) to uniquely describe anequivalence class of DAGs In this work we use the R-packageParallelPC [57] to implement the parallel-PC algorithm and setthe significant level of the conditional independence testsafrac14 001
In step (ii) we are only interested in estimating the causaleffect of the directed edge Li Tj where vertex is Li a parent ofvertex Tj As described above a CPDAG may generate a class ofDAGs For the causal effect of Li Tj in a CPDAG we use do-calculus [55] to estimate the causal effects of Li on Tj in a class ofDAGs Then we use the minimum absolute value of all possiblecausal effects as a final causal effect of Li Tj As for the detailsof how the parallel IDA method is applied to estimate causalrelationships from expression data the readers can refer to [24]
The estimated causal effects can be positive or negativereflecting the up or down regulation by the lncRNAs on themRNAs For the purpose of constructing the regulatory net-works we use the absolute values of the causal effects (AVCEs)to evaluate the strengths of the regulation and thus to confirmthe regulatory relationships
We set different AVCE cutoffs from 010 to 060 with a step of005 to generate MSLCRN networks in GBM LSCC OvCa andPrCa respectively For each cutoff we merge the identifiedMSLCRN networks to obtain global lncRNAndashmRNA causal regu-latory networks in the four human cancers respectively Asshown in Table 3 a higher cutoff selection causes a smallerglobal lncRNAndashmRNA causal regulatory network but bettergoodness of fit To make a trade-off between the size of theglobal lncRNAndashmRNA causal regulatory networks and goodnessof fit we set a compromised AVCE cutoff with a value of 045 Ifthe AVCE of a lncRNA on a mRNA is 045 or above we considerthere is a causal regulatory relationship between the lncRNAndashmRNA pair Under the compromise cutoff we have a moderatesize of the global lncRNAndashmRNA causal regulatory networks inGBM LSCC OvCa and PrCa Meanwhile the node degree distri-butions of four global lncRNAndashmRNA causal regulatory net-works also follow power law distribution (the fitted power curveis in the form of yfrac14 axb) well with R2gt 08
Validation survival and enrichment analysis
Previous studies have demonstrated that about 20 of thenodes in a biological network are essential and are regarded ashub genes [58 59] Therefore when analyzing a global lncRNAndashmRNA causal network we select the 20 of lncRNAs with thehighest degrees in the network as hub lncRNAs The degree of alncRNA node in the global network is the number of mRNAsconnected with it
To validate the predicted module-specific lncRNAndashmRNAcausal regulatory relationships we obtain the experimentallyvalidated lncRNAndashmRNA regulatory relationships from thethree widely used databases NPInter v30 [40] LncRNADiseasev2017 [41] and LncRNA2Target v12 [42] Furthermore we retainexperimentally validated lncRNAndashmRNA regulatory relation-ships associated with the four human cancer data sets asground truth
We perform survival analysis using the R-package survival[60] A multivariate Cox model is used to predict the risk scoreof each tumor sample Then all tumor samples in each cancerdata set are equally divided into high- and low-risk groupsaccording to their risk scores Moreover we calculate theHazard Ratio between the high- and the low-risk groups andperform the Log-rank test
To further investigate the underlying biological processesand pathways related to each of the MSLCRN networks we usethe R-package clusterProfiler [61] to conduct functional enrich-ment analysis on the networks respectively The GeneOntology (GO) [62] biological processes and Kyoto Encyclopedia
8 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
of Genes and Genomes (KEGG) [63] pathways with adjustedP-valuelt005 [adjusted by Benjamini-Hochberg (BH) method]are regarded as functional categories for the MSLCRN networks
We also collect a list of lncRNAs and mRNAs that areassociated with GBM LSCC OvCa and PrCa to study diseaseenrichment of each of the MSLCRN networks The list of disease-associated lncRNAs is obtained from LncRNADisease v2017 [41]Lnc2Cancer v2016 [64] and MNDR v20 [65] The list of disease-associated mRNAs is from DisGeNET v50 [66] To evaluatewhether a MSLCRN network is significantly enriched in a specificdisease we use a hyper-geometric distribution test as follows
p frac14 1 FethxjBNMTHORN frac14 1Xx1
ifrac140
N
i
B N
M i
B
M
(3)
In the formula B is the number of all genes in the expressiondata set N denotes the number of all genes associated with aspecific disease in the expression data set M is the number ofgenes in a MSLCRN network and x is the number of genes asso-ciated with a specific disease in a MSLCRN network A MSLCRNnetwork is significantly enriched in a specific disease if theP-valuelt 005
Network analysis validation and comparisonon MSLCRN networkslncRNAs exhibit dynamic positive gene regulationacross cancers
By following the first step of the MSLCRN method we haveidentified 23 38 45 and 32 lncRNAndashmRNA co-expression mod-ules in GBM LSCC OvCa and PrCa respectively In the secondstep of the MSLCRN method we eliminate the noncausallncRNAndashmRNA pairs in lncRNAndashmRNA co-expression modulesAs a result we generate 23 38 45 and 32 module-specificlncRNAndashmRNA causal regulatory networks in GBM LSCC OvCaand PrCa respectively After merging the module-specificlncRNAndashmRNA causal regulatory networks for each data set weobtain the four global lncRNAndashmRNA regulatory networks inGBM LSCC OvCa and PrCa respectively
To understand the overlap and difference of module-specificgenes module-specific lncRNAndashmRNA causal regulatory rela-tionships and module-specific hub lncRNAs in the four humancancers we generate three set intersection plots using theR-package UpSetR [67] As shown in Figure 3 we find that themajority of module-specific genes (5752) module-specificlncRNAndashmRNA causal regulatory relationships (9902) andmodule-specific hub lncRNAs (8922) tend to be cancer-specific Only a small portion of module-specific genes (396) andmodule-specific lncRNAndashmRNA causal regulatory relationships(6) are shared by the four cancers Especially none of themodule-specific hub lncRNAs are common between the fourcancers In addition the causal effects are positive for 99569672 9993 and 7863 of the causal regulatory relationshipsidentified in GBM LSCC OvCa and PrCa respectively Theseresults indicate that lncRNAs are more likely to exhibit dynamicpositive gene regulation across cancers The results are alsoconsistent with the proposition that the positive gene regula-tion by lncRNAs would be desired in specific situations [68]
Differential network analysis uncovers cancer-specificlncRNAndashmRNA causal networks
In this section we focus on studying cancer-specific lncRNAndashmRNA causal networks using differential network analysisThus the GBM-specific LSCC-specific OvCa-specific and PrCa-specific lncRNAndashmRNA causal networks are identified Asshown in Figure 4A the distributions of node degrees in thesefour cancer-specific lncRNAndashmRNA causal networks followpower law distributions well with R2frac14 09774 09923 09723and 08310 respectively Thus these four cancer-specificlncRNAndashmRNA causal networks are scale free indicating thatmost mRNAs are regulated by a small number of lncRNAs
Table 3 Degree distributions of global lncRNAndashmRNA causal regula-tory networks with different cutoffs in GBM LSCC OvCa and PrCa
Datasets Cutoffs Number of causalregulations
yfrac14axb R2
GBM 010 11 847 yfrac142274x06893 04161015 10 924 yfrac142495x07275 05460020 9732 yfrac142745x0767 06475025 8461 yfrac142958x08074 06757030 7176 yfrac143194x08319 06807035 6041 yfrac143363x08703 07203040 4997 yfrac143741x09348 07999045 4074 y54082x21034 08694050 3279 yfrac144194x118 09244055 2583 yfrac143896x1259 09463060 1862 yfrac143666x143 09792
LSCC 010 789 172 yfrac143143x06071 04829015 684 524 yfrac143475x06323 05841020 569 369 yfrac143905x06525 06578025 451 346 yfrac144855x06928 07789030 340 860 yfrac146341x07554 08796035 244 547 yfrac148147x08379 09504040 166 593 yfrac149724x0935 09848045 108 024 y51031x21018 09933050 66 335 yfrac149425x1068 09963055 37 632 yfrac147807x1089 09948060 19 547 yfrac146565x1169 09972
OvCa 010 333 146 yfrac143272x05928 05042015 232 794 yfrac144192x06262 06531020 159 872 yfrac146398x07216 08247025 112 792 yfrac148816x08356 09120030 80 808 yfrac141008x09472 09551035 57 099 yfrac149545x1014 09744040 38 517 yfrac148198x1066 09748045 24 439 y56575x21066 09697050 14 435 yfrac14540x1079 09551055 7973 yfrac144368x1107 09319060 4026 yfrac143285x1107 09460
PrCa 010 1 894 322 yfrac143089x06245 02750015 1 749 595 yfrac143586x06787 03582020 1 594 744 yfrac144013x07169 04316025 1 429 858 yfrac144271x0732 04919030 1 260 968 yfrac144389x07244 05616035 1 097 654 yfrac144406x0702 06470040 946 439 yfrac144485x06816 07338045 812 687 y55175x207005 08206050 694 558 yfrac14667x07588 08823055 584 834 yfrac148833x08469 09332060 474 654 yfrac141113x09684 09503
Note The AVCE cutoffs range from 010 to 060 with a step of 005
The bold values are the degree distributions of global lncRNA-mRNA causal reg-
ulatory networks with a compromised AVCE cutoff (045) in four human cancers
Module-specific lncRNA-mRNA causal regulatory networks | 9
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
Next we use four lists of lncRNAs and mRNAs associatedwith GBM LSCC OvCa and PrCa to discover lncRNAndashmRNAcausal networks that are associated with the four human can-cers We define that cancer-related lncRNAndashmRNA causal regu-latory relationships are those in which at least one regulatoryparty is cancer-related lncRNA or mRNA As a result wehave extracted GBM-related LSCC-related OvCa-related andPrCa-related lncRNAndashmRNA causal networks from the fourcancer-specific lncRNAndashmRNA causal networks (details inSupplementary File S1) To understand the potential biologicalprocesses and pathways of the four cancer-related lncRNAndashmRNA causal networks we identify significant GO biologicalprocesses and KEGG pathways using functional enrichmentanalysis In Figure 4B several top GO biological processes and
KEGG pathways such as cytokine activity [69] G-proteincoupled receptor binding [70] TNF signaling pathway [71] cAMPsignaling pathway [72] pathways in cancer are closely associ-ated with the occurrence and development of cancer Thisresult suggests that the identified cancer-related lncRNAndashmRNA causal networks may be involved in the occurrence anddevelopment of human cancer
Conservative network analysis highlights a corelncRNAndashmRNA causal regulatory network acrosshuman cancers
Although most of the lncRNAndashmRNA causal regulatory relation-ships are cancer-specific there are still a number of common
396 424
106 115
1370
133 173
1599
126
1729
1224
252
2523
2157
5081
0
2000
4000
Inte
rsec
tion
Siz
e
PrCa
Ov Ca
LSCC
GBM
025
0050
0075
00
1000
0
Set Size
6 283
13 1 80 673
206
3969
76 3493
407
2816
9950
7
1948
7
0
250000
500000
750000
Inte
rsec
tion
Siz
e
PrCa
Ov Ca
LSCC
GBM
0e+0
0
2e+0
5
4e+0
5
6e+0
5
8e+0
5
Set Size
6 1 2 933
2
41
6 11
246
47
524
0
200
400In
ters
ectio
n S
ize
PrCa
Ov Ca
LSCC
GBM
0200
400
600
Set Size
Module-specific genes
Module-specific causal regulations Module-specific hub lncRNAs 808611
A
B C
Figure 3 Overlap and difference of module-specific genes module-specific causal regulations and module-specific hub lncRNAs across GBM LSCC OvCa and PrCa
(A) Module-specific genes (both lncRNAs and mRNAs) intersection plot (B) Module-specific causal regulations intersection plot (C) Module-specific hub lncRNAs inter-
section plot The red lines denote common genes and causal regulations across GBM LSCC OvCa and PrCa
10 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
causal regulatory relationships between the four global net-works To evaluate whether there is a common core of lncRNAndashmRNA causal regulatory relationships in the global regulatorynetworks across human cancers we concentrate on the con-served lncRNAndashmRNA causal regulatory relationships thatexisted in at least three human cancers
As shown in Figure 5A the majority of the conservedlncRNAndashmRNA causal regulatory relationships form a closely
connected community This finding indicates that the con-served lncRNAndashmRNA causal regulatory network may be a corenetwork across human cancers
The survival analysis shows that the lncRNAs and mRNAs inthe core network can significantly distinguish the metastasisrisks between the high- and low-risk groups in GBM OvCa andPrCa data sets (Figure 5B) This result suggests that the core net-work may act as a common network biomarker of GBM OvCa
Cancer-specific networks Causal regulations y=axb R2
GBM-specific 2816 y=4812x-1292 09774
LSCC-specific 99507 y=1034x-1014 09923
OvCa-specific 19487 y=7366x-1156 09723
PrCa-specific 808611 y=5243x-07055 08310
1 10 100 11001
10
100
1100
Degree of genes
Nu
mb
er o
f ge
nes
GBM-specific fitting curveLSCC-specific fitting curveOvCa-specific fitting curvePrCa-specific fitting curveGBM-specific degree distributionLSCC-specific degree distributionOvCa-specific degree distributionPrCa-specific degree distribution
solutecation symporter activitysymporter activity
gated channel activitysodium ion transmembrane transporter activity
ion channel activitysubstrate-specific channel activity
channel activitypassive transmembrane transporter activity
growth factor activitycation channel activity
metal ion transmembrane transporter activitycollagen bindingheparin binding
sulfur compound bindingintegrin binding
glycosaminoglycan bindingextracellular matrix binding
peptide receptor activityG-protein coupled peptide receptor activity
chemokine bindingG-protein coupled receptor binding
growth factor bindingcytokine binding
serine-type endopeptidase activitydeath receptor activity
tumor necrosis factor-activated receptor activitycytokine receptor activity
protein heterodimerization activitydipeptidase activity
glycoprotein bindingRAGE receptor binding
cytokine receptor bindingcytokine activity
GBM(203)
LSCC(1015)
OvCa(479)
PrCa(3019)
001
002
003
004
padjust
GeneRatio002
004
006
008
Taste transduction
ECM-receptor interaction
cAMP signaling pathway
Calcium signaling pathway
Neuroactive ligand-receptor interaction
PI3K-Akt signaling pathway
Pathways in cancerRegulation of actin cytoskeleton
Complement and coagulation cascades
AGE-RAGE signaling pathway in diabetic complications
Hematopoietic cell lineage
Th17 cell differentiation
Inflammatory bowel disease (IBD)
Malaria
Osteoclast differentiation
Influenza A
Tuberculosis
Intestinal immune network for IgA production
Chagas disease (American trypanosomiasis)
Leishmaniasis
TNF signaling pathway
Toll-like receptor signaling pathway
Rheumatoid arthritis
Cytokine-cytokine receptor interaction
GBM(129)
LSCC(500)
OvCa(266)
PrCa(1364)
GeneRatio
005
010
015
001
002
003
004padjust
GO enrichment analysis KEGG enrichment analysis
A
B
Figure 4 Differential network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) Degree distribution of cancer-specific lncRNAndashmRNA
causal networks in GBM LSCC OvCa and PrCa (B) Functional enrichment analysis of cancer-related lncRNAndashmRNA causal networks in GBM LSCC OvCa and PrCa
Module-specific lncRNA-mRNA causal regulatory networks | 11
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
and PrCa In Figure 5B we also find that the core network con-tains several cancer genes (34 26 30 and 38 cancer genes asso-ciated with GBM LSCC OvCa and PrCa respectively)
By conducting GO and KEGG enrichment analysis we findthat the core network is significantly enriched in 399 GO biologi-cal processes and 3 KEGG pathways (details in SupplementaryFile S2) Of the 399 GO biological processes 2 GO terms includ-ing negative regulation of cell adhesion (GO 0007162) and cyto-kine production in immune response (GO 0002367) areinvolved in three cancer hallmarks Tissue Invasion andMetastasis Tumor Promoting Inflammation and EvadingImmune Detection [73] This observation implies that the corenetwork may control these cancer-related hallmarks
Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers
We divide the hub lncRNAs into two categories (1) conserved hublncRNAs which exist in at least three human cancers and (2)cancer-specific hub lncRNAs which only exist in single humancancer As a result we obtain 9 conserved hub lncRNAs and 828cancer-specific hub lncRNAs (include 11 GBM-specific 246 LSCC-specific 47 OvCa-specific and 524 PrCa-specific hub lncRNAs)
To evaluate whether the hub lncRNAs can distinguish meta-stasis risks of human cancers we use them to predict metasta-sis risks for tumor samples in GBM LSCC OvCa and PrCaAs shown in Figure 6A the conserved hub lncRNAs can discrim-inate the metastasis risks of tumor samples significantly(Log-rank P-valuelt 005) in four human cancers In Figure 6Bexcepting LSCC-specific hub lncRNAs owing to failing to fit aCox regression model GBM-specific OvCa-specific and PrCa-specific hub lncRNAs can discriminate the metastasis risks oftumor samples significantly in GBM OvCa and PrCa respec-tively (Log-rank P-valuelt 005) These results suggest that thehub lncRNAs are discriminative and can act as biomarkers todistinguish between high- and low-risk tumor samples
Experimentally validated lncRNAndashmRNA regulations aremostly bad hits for LncTar
Using a collection of experimentally validated lncRNAndashmRNAregulatory relationships (details in Supplementary File S3) asthe ground truth the numbers of experimentally confirmedlncRNAndashmRNA causal regulations are 17 14 20 and 42 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4)
Figure 5 Conservative network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) The core lncRNAndashmRNA causal network that
occurred in at least three human cancers The red diamond nodes and white circle nodes denote lncRNAs and mRNAs respectively (B) Survival analysis of the core
lncRNAndashmRNA causal network
12 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
We further apply a representative sequence-based methodcalled LncTar [11] to the experimentally validated lncRNAndashmRNAcausal regulatory relationships discovered by MSLCRN There aretwo main reasons for choosing LncTar First LncTar does nothave a limit to input RNA size Second LncTar uses a quantitativestandard rather than expert knowledge to determine whetherlncRNAs interact with mRNAs Similar to LncTar we also set -01as normalized binding free energy (ndG) cutoff to determinewhether lncRNAndashmRNA pairs interact with each other In otherwords the lncRNAndashmRNA pairs with ndG01 are regarded as
lncRNAndashmRNA regulatory relationships Among the experimen-tally confirmed lncRNAndashmRNA causal regulatory relationships thatare discovered by MSLCRN the numbers of successfully predictedlncRNAndashmRNA regulations using LncTar are 0 0 1 and 1 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4) The result indicates that our experimentally confirmedlncRNAndashmRNA causal regulations are mostly bad hits for LncTarMeanwhile this result also suggests that expression-based andsequence-based methods may be complementary with each otherin predicting lncRNAndashmRNA regulations
A
B
Figure 6 Survival analysis of hub lncRNAs (A) Conserved hub lncRNAs in GBM LSCC OvCa and PrCa datasets (B) Survival analysis of cancer-specific hub lncRNAs
Module-specific lncRNA-mRNA causal regulatory networks | 13
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
MSLCRN networks are biologically meaningful
In this section we conduct GO and KEGG enrichment analysisto check whether the MSLCRN networks are associated withsome biological processes and pathways significantlyEnrichment analysis uncovers that 15 of the 23 (6522)MSLCRN networks in GBM 29 of the 38 (7632) MSLCRN net-works in LSCC 30 of the 45 (6667) MSLCRN networks inOvCa and 20 of the 32 (6250) MSLCRN networks in PrCa aresignificantly enriched in at least one GO biological process orKEGG pathway respectively (details in Supplementary File S5)This result implies that most of the MSLCRN networks in eachcancer are functional networks
We further investigate whether the MSLCRN networks aresignificantly enriched in GBM LSCC OvCa and PrCa diseasesrespectively We discover that 5 of the 23 MSLCRN networks7 of the 38 MSLCRN networks 6 of the 45 MSLCRN networks and6 of the 32 MSLCRN networks are significantly enriched in GBMLSCC OvCa and PrCa diseases respectively (details inSupplementary File S5) This result indicates that severalMSLCRN networks are closely associated with GBM LSCC OvCaand PrCa diseases
Altogether functional and disease enrichment analysis resultsshow that MSLCRN networks are biologically meaningful
Comparison with other PC-based networkinference methods
Based on a parallel version of the PC algorithm [56] the parallelIDA method in the second step of MSLCRN learns the causalstructure from expression data Owing to the popularity of thePC algorithm in causal structure learning some other networkinference methods including PCA-CMI [74] PCA-PMI [75] andCMI2NI [76] have also successfully applied it for network infer-ence Different from the three methods using conditional orpartial mutual information to infer lncRNAndashmRNA regulationsour method estimates causal effects to identify lncRNAndashmRNAregulations For comparisons we also use the PCA-CMI PCA-PMI and CMI2NI methods to infer module-specific lncRNAndashmRNA regulatory relationships Similar to our method (whichuses the parallel IDA method) the strength cutoff of lncRNAndashmRNA regulatory relationships in PCA-CMI PCA-PMI andCMI2NI methods is also set to 045
We evaluate the performance of each method in terms offinding experimentally validated lncRNAndashmRNA regulatoryrelationships functional MSLCRN networks and disease-associated MSLCRN networks As shown in Table 4 in terms ofthe three criteria MSLCRN performs the best in GBM LSCCOvCa and PrCa data sets This result suggests that MSLCRN is auseful method to infer module-specific lncRNAndashmRNA regula-tory network in human cancers
Conclusions and discussion
Notwithstanding lncRNAs do not encode proteins directly theyengage in a wide range of biological processes including cancerdevelopments through their interactions with other biologicalmacromolecules eg DNA RNA and protein Therefore touncover the functions and regulatory mechanisms of lncRNAsit is necessary to investigate lncRNAndashtarget regulatory networkacross different types of biological conditions
As a biological network the lncRNAndashtarget regulatory net-work exhibits a high degree of modularity Each functionalmodule is responsible for implementing specific biological
functions Moreover modularity is an important feature ofhuman cancer development and progression Thus from a net-work community point of view it is necessary to investigatemodule-specific lncRNAndashmRNA regulatory networks
Until now several statistical correlation or associationmeasures eg Pearson Mutual Information and ConditionalMutual Information have been used to infer gene regulatorynetworks However these methods tend to identify indirect reg-ulatory relationships between genes The identified gene regu-latory networks cannot reflect real lsquocausalrsquo regulatoryrelationships To better understand lncRNA regulatory mecha-nism it is vital to investigate how lncRNAs causally influencethe expression levels of their target mRNAs
In this work the computational methods for inferringlncRNAndashmRNA interactions and the publicly available data-bases of lncRNAndashmRNA regulatory relationships are firstreviewed Then to address the above two issues we propose anovel computational method MSLCRN to study module-specific lncRNAndashmRNA causal regulatory networks across GBMLSCC OvCa and PrCa diseases In contrast to other approaches(expression-based and sequence-based methods) MSLCRN hastwo unique features First MSLCRN considers the modularity oflncRNAndashmRNA regulatory networks Instead of studying globalregulatory relationships between lncRNAs and mRNAs wefocus on investigating the regulatory behavior of lncRNAs in themodules of interest Second considering the restrictions withconducting gene knockout experiments MSLCRN uses thecausal inference method IDA to infer causal relationshipsbetween lncRNAs and mRNAs based on expression data Thepromising results suggest that exploiting modularity of generegulatory network and causality-based method could provideanother effective approach to elucidating lncRNA functions andregulatory mechanisms of human cancers
Despite the advantages of MSLCRN there is still room toimprove it First the WGCNA method only allows clusteringgenes across all samples from the matched lncRNA and mRNAexpression data In fact a class of genes may exhibit similarexpression patterns across a subset of samples An alternativesolution of this problem is to use a bi-clustering method to iden-tify lncRNAndashmRNA co-expression modules Second it is stilltime-consuming to estimate causal effects from large expres-sion data sets When constructing the module-specific lncRNAndashmRNA causal regulatory networks the running time of parallelIDA is still high on estimating the causal effects of lncRNAs onmRNAs In future more efficient parallel IDA method is neededto explore lncRNAndashmRNA causal regulatory relationships inlarge-scale expression data Third previous research [38]has shown that the prediction accuracy of lncRNAndashmRNA inter-actions can be improved by integrating both sequence data and
Table 4 Comparison results in terms of experimentally validatedlncRNAndashmRNA regulatory relationships functional MSLCRN net-works and disease-associated MSLCRN networks
Methods GBM (a b c) LSCC (a b c) OvCa (a b c) PrCa (a b c)
MSLCRN (17 15 5) (14 29 7) (20 30 6) (42 20 6)PCA-CMI (2 13 0) (0 11 0) (0 7 1) (0 20 2)PCA-PMI (2 15 1) (0 11 0) (0 8 2) (1 18 1)CMI2NI (2 15 0) (0 11 0) (0 7 1) (0 19 1)
Note afrac14number of experimentally validated lncRNAndashmRNA regulatory relation-
ships bfrac14number of functional MSLCRN networks cfrac14number of disease-asso-
ciated MSLCRN networks
14 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
expression data To improve the accuracy of the predictedlncRNAndashmRNA regulatory relationships it is necessary todevelop an ensemble method (fusing sequence-based andexpression-based methods) to infer lncRNAndashmRNA regulatorynetwork Finally recent studies [77] show that lncRNAs can actas competing endogenous RNAs (ceRNAs) or miRNA sponges toattract miRNAs for bindings by competing with mRNAsTherefore some predicted lncRNAndashmRNA regulatory relation-ships are lncRNA-related ceRNAndashceRNA interactions To furtherimprove the prediction of lncRNAndashmRNA regulatory relation-ships it is necessary to remove the crosstalk relationshipsbetween lncRNAs and mRNAs
Key Points
bull Among ncRNAs lncRNAs are a large and diverse classof RNA molecules and are thought to be a gold mine ofpotential oncogenes anti-oncogenes and newbiomarkers
bull lncRNAs exhibit dynamic positive gene regulationacross human cancers
bull Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers
bull There is still a lack of ground truth for validating pre-dicted lncRNAndashmRNA regulatory relationships
bull There is still room to develop reliable methods for elu-cidating lncRNA regulatory mechanisms
Supplementary Data
Supplementary data are available online at httpsacademicoupcombib
Funding
The National Natural Science Foundation of China (No61702069) the Applied Basic Research Foundation ofScience and Technology of Yunnan Province (No2017FB099) the NHMRC Grant (No 1123042) and theAustralian Research Council Discovery Grant (NoDP140103617)
References1 Pang KC Frith MC Mattick JS Rapid evolution of noncoding
RNAs lack of conservation does not mean lack of functionTrends Genet 200622(1)1ndash5
2 Kung JT Colognori D Lee JT Long noncoding RNAs pastpresent and future Genetics 2013193(3)651ndash69
3 Schmitt AM Chang HY Long noncoding RNAs in cancer path-ways Cancer Cell 201629(4)452ndash63
4 Zhang Y Tao Y Liao Q Long noncoding RNA a crosslink inbiological regulatory network Brief Bioinform 2017 doi 101093bibbbx042
5 Yoon JH Abdelmohsen K Gorospe M Posttranscriptionalgene regulation by long noncoding RNA J Mol Biol 2013425(19)3723ndash30
6 Gerlach W Giegerich R GUUGle a utility for fast exact match-ing under RNA complementary rules including G-U base pair-ing Bioinformatics 200622(6)762ndash4
7 Muckstein U Tafer H Hackermuller J et al Thermodynamicsof RNA-RNA binding Bioinformatics 200622(10)1177ndash82
8 Tafer H Hofacker IL RNAplex a fast tool for RNA-RNA inter-action search Bioinformatics 200824(22)2657ndash63
9 Busch A Richter AS Backofen R IntaRNA efficient predictionof bacterial sRNA targets incorporating target site accessibil-ity and seed regions Bioinformatics 200824(24)2849ndash56
10Kato Y Sato K Hamada M et al RactIP fast and accurate pre-diction of RNA-RNA interaction using integer programmingBioinformatics 201026(18)i460ndash6
11Li J Ma W Zeng P et al LncTar a tool for predicting the RNAtargets of long noncoding RNAs Brief Bioinform 201516(5)806ndash12
12Fukunaga T Hamada M RIblast an ultrafast RNA-RNA inter-action prediction system based on a seed-and-extensionapproach Bioinformatics 201733(17)2666ndash74
13Derrien T Johnson R Bussotti G et al The GENCODE v7 cata-log of human long noncoding RNAs analysis of their genestructure evolution and expression Genome Res 201222(9)1775ndash89
14Gloss BS Dinger ME The specificity of long noncoding RNAexpression Biochim Biophys Acta 20161859(1)16ndash22
15Munshi A Mohan V Ahuja YR Non-coding RNAs a dynamicand complex network of gene regulation J PharmacogenomicsPharmacoproteomics 20167156
16Liao Q Liu C Yuan X et al Large-scale prediction of long non-coding RNA functions in a coding-non-coding gene co-expression network Nucleic Acids Res 201139(9)3864ndash78
17Guo Q Cheng Y Liang T et al Comprehensive analysis oflncRNA-mRNA co-expression patterns identifies immune-associated lncRNA biomarkers in ovarian cancer malignantprogression Sci Rep 20155(1)17683
18Du Y Xia W Zhang J et al Comprehensive analysis of longnoncoding RNA-mRNA co-expression patterns in thyroidcancer Mol Biosyst 201713(10)2107ndash15
19Wu W Wagner EK Hao Y et al Tissue-specific co-expressionof long non-coding and coding RNAs associated with breastCancer Sci Rep 2016632731
20Barabasi AL Oltvai ZN Network biology understanding thecellrsquos functional organization Nat Rev Genet 20045(2)101ndash13
21Langfelder P Horvath S WGCNA an R package for weightedcorrelation network analysis BMC Bioinformatics 20089559
22Maathuis HM Kalisch M Buhlmann P Estimating high-dimensional intervention effects from observational dataAnn Stat 200937(6A)3133ndash64
23Maathuis HM Colombo D Kalisch M et al Predicting causaleffects in large-scale systems from observational data NatMethods 20107(4)247ndash8
24Le T Hoang T Li J et al A fast PC algorithm for high dimen-sional causal discovery with multi-core PCs IEEEACM TransComput Biol Bioinform 2016 doi 101109TCBB20162591526
25Du Z Fei T Verhaak RG et al Integrative genomic analysesreveal clinically relevant long noncoding RNAs in humancancer Nat Struct Mol Biol 201320(7)908ndash13
26Bernhart SH Tafer H Muckstein U et al Partition functionand base pairing probabilities of RNA heterodimersAlgorithms Mol Biol 20061(1)3
27Alkan C Karakoc E Nadeau JH et al RNA-RNA interactionprediction and antisense RNA target search J Comput Biol200613(2)267ndash82
28Seemann SE Richter AS Gesell T et al PETcofold predictingconserved interactions and structures of two multiple align-ments of RNA sequences Bioinformatics 201127(2)211ndash19
29Wenzel A Akbasli E Gorodkin J RIsearch fast RNA-RNAinteraction search using a simplified nearest-neighborenergy model Bioinformatics 201228(21)2738ndash46
Module-specific lncRNA-mRNA causal regulatory networks | 15
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
30Alkan F Wenzel A Palasca O et al RIsearch2 suffix array-based large-scale prediction of RNA-RNA interactions andsiRNA off-targets Nucleic Acids Res 201745e60
31Hu R Sun X lncRNATargets a platform for lncRNA target pre-diction based on nucleic acid thermodynamics J BioinformComput Biol 201614(4)1650016
32Terai G Iwakiri J Kameda T et al Comprehensive predictionof lncRNA-RNA interactions in human transcriptome BMCGenomics 201617(Suppl 1)12
33Liu J Wu S Li M et al LncRNA expression profiles reveal theco-expression network in human colorectal carcinoma Int JClin Exp Pathol 201691885ndash1892
34Huang S Feng C Chen L et al Identification of potential keylong non-coding RNAs and target genes associated withpneumonia using long non-coding RNA sequencing (lncRNA-Seq) a preliminary study Med Sci Monit 2016223394ndash408
35Li J Xu Y Xu J et al Dynamic co-expression network analysisof lncRNAs and mRNAs associated with venous congestionMol Med Rep 201614(3)2045ndash51
36Fu M Huang G Zhang Z et al Expression profile of long non-coding RNAs in cartilage from knee osteoarthritis patientsOsteoarthritis Cartilage 201523(3)423ndash32
37Zhang F Gao C Ma XF et al Expression profile of long non-coding RNAs in peripheral blood mononuclear cells frommultiple sclerosis patients CNS Neurosci Ther 201622(4)298ndash305
38 Iwakiri J Terai G Hamada M Computational prediction oflncRNA-mRNA interactionsby integrating tissue specificity inhuman transcriptome Biol Direct 201712(1)15
39Lv L Wei M Lin P et al Integrated mRNA and lncRNA expres-sion profiling for exploring metastatic biomarkers of humanintrahepatic cholangiocarcinoma Am J Cancer Res 20177688ndash99
40Hao Y Wu W Li H et al NPInter v30 an upgraded databaseof noncoding RNA-associated interactions Database 20162016baw057
41Chen G Wang Z Wang D et al LncRNADisease a databasefor long-non-coding RNA-associated diseases Nucleic AcidsRes 201341D983ndash6
42 Jiang Q Wang J Wu X et al LncRNA2Target a database fordifferentially expressed genes after lncRNA knockdown oroverexpression Nucleic Acids Res 201543D193ndash6
43Zhou Z Shen Y Khan MR et al LncReg a reference resourcefor lncRNA-associated regulatory networks Database 20152015bav083
44Denisenko E Ho D Tamgue O et al IRNdb the database ofimmunologically relevant non-coding RNAs Database 20162016baw138
45Liu CJ Gao C Ma Z et al lncRInter a database of experimen-tally validated long non-coding RNA interaction J GenetGenomics 201744(5)265ndash8
46Li JH Liu S Zhou H et al starBase v20 decoding miRNA-ceRNA miRNA-ncRNA and protein-RNA interaction net-works from large-scale CLIP-Seq data Nucleic Acids Res 201442(D1)D92ndash7
47Liu Y Zhao M lnCaNet pan-cancer co-expression networkfor human lncRNA and cancer genes Bioinformatics 201632(10)1595ndash7
48Zhou QZ Zhang B Yu QY et al BmncRNAdb a comprehen-sive database of non-coding RNAs in the silkworm Bombyxmori BMC Bioinformatics 201617(1)370
49Park C Yu N Choi I et al lncRNAtor a comprehensiveresource for functional investigation of long non-codingRNAs Bioinformatics 201430(17)2480ndash5
50Bhartiya D Pal K Ghosh S et al lncRNome a comprehensiveknowledgebase of human long noncoding RNAs Database20132013bat034
51Zhao Z Bai J Wu A et al Co-LncRNA investigating thelncRNA combinatorial effects in GO annotations and KEGGpathways based on human RNA-Seq data Database 20152015bav082
52 Jiang Q Ma R Wang J et al LncRNA2Function a compre-hensive resource for functional investigation of humanlncRNAs based on RNA-seq data BMC Genomics 201516(Suppl 3)S2
53Chan WL Huang HD Chang JG lncRNAMap a map of puta-tive regulatory functions in the long non-coding transcrip-tome Comput Biol Chem 20145041ndash9
54Langfelder P Horvath S Fast R functions for robust correla-tions and hierarchical clustering J Stat Softw 2012461ndash17
55 Judea P Causality Models Reasoning and Inference New YorkNY Cambridge University Press 2000
56Spirtes P Glymour C Scheines R Causation Prediction andSearch 2nd edn Cambridge MIT Press 2000
57Le T Hoang T Li J et al ParallelPC an R package for efficientconstraint based causal exploration arXiv prepring 2015arXiv151003042v1
58Hahn MW Kern AD Comparative genomics of centrality andessentiality in three eukaryotic protein-interaction networksMol Biol Evol 200522(4)803ndash6
59Song J Singh M Roth FP From hub proteins to hub modulesthe relationship between essentiality and centrality in theyeast interactome at different scales of organization PLoSComput Biol 20139(2)e1002910
60Therneau TM Grambsch PM Modeling Survival Data Extendingthe Cox Model New York Springer Press 2000
61Yu G Wang L-G Han Y He Q-Y clusterProfiler an R packagefor comparing biological themes among gene clusters OMICS201216(5)284ndash7
62Ashburner M Ball CA Blake JA et al Gene ontology tool forthe unification of biology Nat Genet 200025(1)25ndash9
63Kanehisa M Goto S KEGG Kyoto Encyclopedia of Genes andGenomes Nucleic Acids Res 200028(1)27ndash30
64Ning S Zhang J Wang P et al Lnc2Cancer a manually curateddatabase of experimentally supported lncRNAs associatedwith various human cancers Nucleic Acids Res 201644(D1)D980ndash5
65Wang Y Chen L Chen B et al Mammalian ncRNA-diseaserepository a global view of ncRNA-mediated disease net-work Cell Death Dis 20134e765
66Pi~nero J Bravo A Queralt-Rosinach N et al DisGeNET a com-prehensive platform integrating information on humandisease-associated genes and variants Nucleic Acids Res 201745(D1)D833ndash9
67Conway JR Lex A Gehlenborg N UpSetR an R package for thevisualization of intersecting sets and their propertiesBioinformatics 201733(18)2938ndash40
68Wahlestedt C Targeting long non-coding RNA to therapeuti-cally upregulate gene expression Nat Rev Drug Discov 201312(6)433ndash46
69Mantovani G Maccio A Lai P et al Cytokine activity incancer-related anorexiacachexia role of megestrol acetateand medroxyprogesterone acetate Semin Oncol 19982545ndash52
70Dorsam RT Gutkind JS G-protein-coupled receptors and can-cer Nat Rev Cancer 20077(2)79ndash94
71Wang X Lin Y Tumor necrosis factor and cancer buddies orfoes Acta Pharmacol Sin 200829(11)1275ndash88
16 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
72Fajardo AM Piazza GA Tinsley HN The role of cyclic nucleo-tide signaling pathways in cancer targets for prevention andtreatment Cancers 20146(1)436ndash58
73Hanahan D Weinberg RA Hallmarks of cancer the next gen-eration Cell 2011144(5)646ndash74
74Zhang X Zhao XM He K et al Inferring gene regulatory net-works from gene expression data by path consistencyalgorithm based on conditional mutual informationBioinformatics 201228(1)98ndash104
75Zhao J Zhou Y Zhang X et al Part mutual information forquantifying direct associations in networks Proc Natl Acad SciUSA 2016113(18)5130ndash5
76Zhang X Zhao J Hao JK et al Conditional mutual inclusiveinformation enables accurate quantification of associationsin gene regulatory networks Nucleic Acids Re 201543(5)e31
77Le TD Zhang J Liu L et al Computational methods for identi-fying miRNA sponge interactions Brief Bioinform 201718(4)577ndash90
Module-specific lncRNA-mRNA causal regulatory networks | 17
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018View publication statsView publication stats
lncRNAs and mRNAs we obtain unique expression value ofthese replicates Consequently we get the matched expressiondata of 9704 lncRNAs and 18 282 mRNAs in 451 GBM 113 LSCC585 OvCa and 150 PrCa samples
Pipeline of MSLCRN
As shown in Figure 2 MSLCRN contains the following threesteps to infer module-specific lncRNAndashmRNA causal regulatorynetworks
i Identification of lncRNAndashmRNA co-expression modulesGiven the matched lncRNA and mRNA expression data weuse WGCNA to generate gene co-expression modules Amodule containing at least two lncRNAs and two mRNAsare regarded as a lncRNAndashmRNA co-expression moduleand used as the input of the second step
ii Identification of module-specific lncRNAndashmRNA causal reg-ulatory networks For each lncRNAndashmRNA co-expressionmodule with each lncRNAndashmRNA pair we apply parallelIDA to estimate the causal effect of the lncRNA on the
Table 2 Public databases for storing lncRNAndashmRNA regulatory relationships
Databases Types of databases Brief descriptions Organisms Available
NPInter [40] Validated A database of experimentally verified func-tional interactions between ncRNAs(including lncRNAs miRNAs etc) and bio-molecules (proteins RNAs and DNAs)
22 organisms httpwwwbioinfoorgNPInter
LncRNADisease [41] Validated A database of experimentally supportedlncRNAndashdisease association data andlncRNAndashtarget interactions in various lev-els including protein RNA miRNA andDNA
Human httpwwwcuilabcnlncrnadisease
LncRNA2Target [42] Validated A database of lncRNAndashtarget regulatory rela-tionships experimentally validated bylncRNA knockdown or overexpression
Human mouse httpwwwlncrna2targetorg
LncReg [43] Validated A database of experimentally validatedlncRNAndashtarget interactions from publicliterature
7 organisms httpbioinformaticsustceducnlncreg
IRNdb [44] Validated A database of immunologically relevantncRNAs (miRNAs lncRNAs and otherncRNAs) and target genes
Human mouse httpcompbiomasseyacnzappsirndb
lncRInter [45] Validated A database of experimentally validatedlncRNAndashtarget interactions extracted frompeer-reviewed publications
15 organisms httpbioinfolifehusteducnlncRInter
starBase [46] Predicted A comprehensive database of systematicallyidentifying the RNAndashRNA and proteinndashRNAinteraction networks from 108 CLIP-Seq(PAR-CLIP HITS-CLIP iCLIP CLASH) datasets
Human httpstarbasesysueducn
lnCaNet [47] Predicted A database of establishing a comprehensiveregulatory network source for lncRNA andcancer genes
Human httplncanetbioinfo-minzhaoorg
BmncRNAdb [48] Predicted A comprehensive database of the silkwormlncRNAs and miRNAs as well as the threeonline tools for users to predict the targetgenes of lncRNAs or miRNAs
Bombyx mori httpgenecqueducnBmncRNAdbindexphp
lncRNAtor [49] Predicted A comprehensive resource of encompassingannotation sequence analysis geneexpression protein binding and phyloge-netic conservation
6 organisms httplncrnatorewhaackr
lncRNome [50] Predicted A comprehensive knowledgebase on thetypes chromosomal locations descriptionon the biological functions and diseaseassociations of lncRNAs
Human httpgenomeigibresinlncRNome
Co-LncRNA [51] Predicted A computationally predicted database toidentify GO annotations and KEGG path-ways affected by co-expressed protein-cod-ing genes of a single or multiple lncRNAs
Human httpwwwbio-bigdatacomCo-LncRNA
LncRNA2Function [52] Predicted A comprehensive resource of investigatingthe functions of lncRNAs based on co-expressed lncRNAndashmRNA interactions
Human httpmlghiteducnlncrna2function
lncRNAMap [53] Predicted An integrated and comprehensive databaseof regulatory functions of lncRNAs and act-ing as ceRNAs
Human httplncRNAMapmbcnctuedutw
6 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
mRNA We use the absolute value of the causal effect(AVCE) to evaluate the strength of the regulation of thelncRNA on the mRNA and a higher AVCE indicates a stron-ger lncRNA regulation The lncRNAndashmRNA pairs with highAVCEs in each module are considered as module-specificlncRNAndashmRNA causal regulatory relationships and we calleach module with these relationships identified a module-specific causal regulatory network
iii Identification of global lncRNAndashmRNA causal regulatorynetwork We integrate the module-specific lncRNAndashmRNA
causal regulatory networks to form the global lncRNAndashmRNA causal regulatory network
Identification of lncRNAndashmRNA co-expression modules
In systems biology WGCNA [21] is a popular method for findingthe correlation patterns among genes across samples and canbe used to identify clusters or modules of highly co-expressedgenes Therefore we use WGCNA to first infer lncRNAndashmRNAco-expression modules
Figure 2 The pipeline of MSLCRN First WGCNA is used to identify lncRNAndashmRNA co-expression modules from matched lncRNA and mRNA expression data Second
we infer lncRNAndashmRNA causal regulatory relationships in each module by using parallel IDA method For each module we assemble the identified lncRNAndashmRNA reg-
ulatory relationships to obtain a module-specific lncRNAndashmRNA causal regulatory network Third the module-specific lncRNAndashmRNA causal regulatory networks are
integrated to form a global lncRNAndashmRNA causal regulatory network
Module-specific lncRNA-mRNA causal regulatory networks | 7
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
Specifically the matched lncRNA and mRNA expressiondata are used as the input of WGCNA For each pair of genesi and j the gene co-expression similarity sij of the pair is definedas
sij frac14 jcorethi jTHORNj (1)
where jcor(i j)j is the absolute value of the Pearson correlationbetween genes i and j The gene co-expression similarity matrixis denoted by Sfrac14 [sij]
To pick an appropriate soft-thresholding power for trans-forming the similarity matrix S into an adjacency matrix A weuse the scale-free topology criterion for soft-thresholding andthe minimum scale free topology fitting index R2 is set as 09Then the topological overlap matrix (TOM) Wfrac14 [wij] is gener-ated based on the adjacency matrix Afrac14 [aij] The TOM similaritywij between genes i and j is defined
wij frac14P
uaiuauj thorn aij
minfP
uaiuP
uaujg thorn 1 aij(2)
where u denotes all genes of the matched lncRNA and mRNAexpression data The TOM dissimilarity between genes i and j isdenoted by dijfrac14 1 - wij To identify gene co-expression modulesthe TOM dissimilarity matrix Dfrac14 [dij] is clustered using optimalhierarchical clustering method [54] Here the identified geneco-expression modules are groups of lncRNAs and mRNAs withhigh topological overlap The lncRNAs and mRNAs of eachlncRNAndashmRNA co-expression module are considered for possi-ble lncRNAndashmRNA causal relationships in the next step
Identification of module-specific lncRNAndashmRNA causalregulatory networks
After the identification of lncRNAndashmRNA co-expression mod-ules we use the parallel IDA method [24] to estimate causaleffects of possible lncRNAndashmRNA causal pairs in each moduleThe application of parallel IDA method to matched lncRNAand mRNA expression data for estimating causal effectsincludes two steps (i) learning the causal structure fromexpression data using the parallel-PC algorithm [24] and(ii) estimating the causal effects of lncRNAs on mRNAs byapplying do-calculus [55]
In step (i) Vfrac14 L1 Lm T1 Tn is a set of random varia-bles denoting m lncRNAs and n mRNAs The causal structure isin the form of a DAG where a node denotes a lncRNA Li ormRNA Tj and an edge between two nodes represents a causalrelationship between them We use the parallel-PC algorithm aparallel version of the PC algorithm [56] to learn the causalstructures (the DAGs) from expression data Starting with a fullyconnected undirected graph the parallel-PC algorithm deter-mines if an edge is retained or removed in the graph by con-ducting conditional independence tests in parallel Then to geta DAG the directions of edges in the obtained graph are ori-ented As different DAGs may represent the same conditionalindependence the parallel-PC algorithm uses a completed par-tially directed acyclic graph (CPDAG) to uniquely describe anequivalence class of DAGs In this work we use the R-packageParallelPC [57] to implement the parallel-PC algorithm and setthe significant level of the conditional independence testsafrac14 001
In step (ii) we are only interested in estimating the causaleffect of the directed edge Li Tj where vertex is Li a parent ofvertex Tj As described above a CPDAG may generate a class ofDAGs For the causal effect of Li Tj in a CPDAG we use do-calculus [55] to estimate the causal effects of Li on Tj in a class ofDAGs Then we use the minimum absolute value of all possiblecausal effects as a final causal effect of Li Tj As for the detailsof how the parallel IDA method is applied to estimate causalrelationships from expression data the readers can refer to [24]
The estimated causal effects can be positive or negativereflecting the up or down regulation by the lncRNAs on themRNAs For the purpose of constructing the regulatory net-works we use the absolute values of the causal effects (AVCEs)to evaluate the strengths of the regulation and thus to confirmthe regulatory relationships
We set different AVCE cutoffs from 010 to 060 with a step of005 to generate MSLCRN networks in GBM LSCC OvCa andPrCa respectively For each cutoff we merge the identifiedMSLCRN networks to obtain global lncRNAndashmRNA causal regu-latory networks in the four human cancers respectively Asshown in Table 3 a higher cutoff selection causes a smallerglobal lncRNAndashmRNA causal regulatory network but bettergoodness of fit To make a trade-off between the size of theglobal lncRNAndashmRNA causal regulatory networks and goodnessof fit we set a compromised AVCE cutoff with a value of 045 Ifthe AVCE of a lncRNA on a mRNA is 045 or above we considerthere is a causal regulatory relationship between the lncRNAndashmRNA pair Under the compromise cutoff we have a moderatesize of the global lncRNAndashmRNA causal regulatory networks inGBM LSCC OvCa and PrCa Meanwhile the node degree distri-butions of four global lncRNAndashmRNA causal regulatory net-works also follow power law distribution (the fitted power curveis in the form of yfrac14 axb) well with R2gt 08
Validation survival and enrichment analysis
Previous studies have demonstrated that about 20 of thenodes in a biological network are essential and are regarded ashub genes [58 59] Therefore when analyzing a global lncRNAndashmRNA causal network we select the 20 of lncRNAs with thehighest degrees in the network as hub lncRNAs The degree of alncRNA node in the global network is the number of mRNAsconnected with it
To validate the predicted module-specific lncRNAndashmRNAcausal regulatory relationships we obtain the experimentallyvalidated lncRNAndashmRNA regulatory relationships from thethree widely used databases NPInter v30 [40] LncRNADiseasev2017 [41] and LncRNA2Target v12 [42] Furthermore we retainexperimentally validated lncRNAndashmRNA regulatory relation-ships associated with the four human cancer data sets asground truth
We perform survival analysis using the R-package survival[60] A multivariate Cox model is used to predict the risk scoreof each tumor sample Then all tumor samples in each cancerdata set are equally divided into high- and low-risk groupsaccording to their risk scores Moreover we calculate theHazard Ratio between the high- and the low-risk groups andperform the Log-rank test
To further investigate the underlying biological processesand pathways related to each of the MSLCRN networks we usethe R-package clusterProfiler [61] to conduct functional enrich-ment analysis on the networks respectively The GeneOntology (GO) [62] biological processes and Kyoto Encyclopedia
8 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
of Genes and Genomes (KEGG) [63] pathways with adjustedP-valuelt005 [adjusted by Benjamini-Hochberg (BH) method]are regarded as functional categories for the MSLCRN networks
We also collect a list of lncRNAs and mRNAs that areassociated with GBM LSCC OvCa and PrCa to study diseaseenrichment of each of the MSLCRN networks The list of disease-associated lncRNAs is obtained from LncRNADisease v2017 [41]Lnc2Cancer v2016 [64] and MNDR v20 [65] The list of disease-associated mRNAs is from DisGeNET v50 [66] To evaluatewhether a MSLCRN network is significantly enriched in a specificdisease we use a hyper-geometric distribution test as follows
p frac14 1 FethxjBNMTHORN frac14 1Xx1
ifrac140
N
i
B N
M i
B
M
(3)
In the formula B is the number of all genes in the expressiondata set N denotes the number of all genes associated with aspecific disease in the expression data set M is the number ofgenes in a MSLCRN network and x is the number of genes asso-ciated with a specific disease in a MSLCRN network A MSLCRNnetwork is significantly enriched in a specific disease if theP-valuelt 005
Network analysis validation and comparisonon MSLCRN networkslncRNAs exhibit dynamic positive gene regulationacross cancers
By following the first step of the MSLCRN method we haveidentified 23 38 45 and 32 lncRNAndashmRNA co-expression mod-ules in GBM LSCC OvCa and PrCa respectively In the secondstep of the MSLCRN method we eliminate the noncausallncRNAndashmRNA pairs in lncRNAndashmRNA co-expression modulesAs a result we generate 23 38 45 and 32 module-specificlncRNAndashmRNA causal regulatory networks in GBM LSCC OvCaand PrCa respectively After merging the module-specificlncRNAndashmRNA causal regulatory networks for each data set weobtain the four global lncRNAndashmRNA regulatory networks inGBM LSCC OvCa and PrCa respectively
To understand the overlap and difference of module-specificgenes module-specific lncRNAndashmRNA causal regulatory rela-tionships and module-specific hub lncRNAs in the four humancancers we generate three set intersection plots using theR-package UpSetR [67] As shown in Figure 3 we find that themajority of module-specific genes (5752) module-specificlncRNAndashmRNA causal regulatory relationships (9902) andmodule-specific hub lncRNAs (8922) tend to be cancer-specific Only a small portion of module-specific genes (396) andmodule-specific lncRNAndashmRNA causal regulatory relationships(6) are shared by the four cancers Especially none of themodule-specific hub lncRNAs are common between the fourcancers In addition the causal effects are positive for 99569672 9993 and 7863 of the causal regulatory relationshipsidentified in GBM LSCC OvCa and PrCa respectively Theseresults indicate that lncRNAs are more likely to exhibit dynamicpositive gene regulation across cancers The results are alsoconsistent with the proposition that the positive gene regula-tion by lncRNAs would be desired in specific situations [68]
Differential network analysis uncovers cancer-specificlncRNAndashmRNA causal networks
In this section we focus on studying cancer-specific lncRNAndashmRNA causal networks using differential network analysisThus the GBM-specific LSCC-specific OvCa-specific and PrCa-specific lncRNAndashmRNA causal networks are identified Asshown in Figure 4A the distributions of node degrees in thesefour cancer-specific lncRNAndashmRNA causal networks followpower law distributions well with R2frac14 09774 09923 09723and 08310 respectively Thus these four cancer-specificlncRNAndashmRNA causal networks are scale free indicating thatmost mRNAs are regulated by a small number of lncRNAs
Table 3 Degree distributions of global lncRNAndashmRNA causal regula-tory networks with different cutoffs in GBM LSCC OvCa and PrCa
Datasets Cutoffs Number of causalregulations
yfrac14axb R2
GBM 010 11 847 yfrac142274x06893 04161015 10 924 yfrac142495x07275 05460020 9732 yfrac142745x0767 06475025 8461 yfrac142958x08074 06757030 7176 yfrac143194x08319 06807035 6041 yfrac143363x08703 07203040 4997 yfrac143741x09348 07999045 4074 y54082x21034 08694050 3279 yfrac144194x118 09244055 2583 yfrac143896x1259 09463060 1862 yfrac143666x143 09792
LSCC 010 789 172 yfrac143143x06071 04829015 684 524 yfrac143475x06323 05841020 569 369 yfrac143905x06525 06578025 451 346 yfrac144855x06928 07789030 340 860 yfrac146341x07554 08796035 244 547 yfrac148147x08379 09504040 166 593 yfrac149724x0935 09848045 108 024 y51031x21018 09933050 66 335 yfrac149425x1068 09963055 37 632 yfrac147807x1089 09948060 19 547 yfrac146565x1169 09972
OvCa 010 333 146 yfrac143272x05928 05042015 232 794 yfrac144192x06262 06531020 159 872 yfrac146398x07216 08247025 112 792 yfrac148816x08356 09120030 80 808 yfrac141008x09472 09551035 57 099 yfrac149545x1014 09744040 38 517 yfrac148198x1066 09748045 24 439 y56575x21066 09697050 14 435 yfrac14540x1079 09551055 7973 yfrac144368x1107 09319060 4026 yfrac143285x1107 09460
PrCa 010 1 894 322 yfrac143089x06245 02750015 1 749 595 yfrac143586x06787 03582020 1 594 744 yfrac144013x07169 04316025 1 429 858 yfrac144271x0732 04919030 1 260 968 yfrac144389x07244 05616035 1 097 654 yfrac144406x0702 06470040 946 439 yfrac144485x06816 07338045 812 687 y55175x207005 08206050 694 558 yfrac14667x07588 08823055 584 834 yfrac148833x08469 09332060 474 654 yfrac141113x09684 09503
Note The AVCE cutoffs range from 010 to 060 with a step of 005
The bold values are the degree distributions of global lncRNA-mRNA causal reg-
ulatory networks with a compromised AVCE cutoff (045) in four human cancers
Module-specific lncRNA-mRNA causal regulatory networks | 9
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
Next we use four lists of lncRNAs and mRNAs associatedwith GBM LSCC OvCa and PrCa to discover lncRNAndashmRNAcausal networks that are associated with the four human can-cers We define that cancer-related lncRNAndashmRNA causal regu-latory relationships are those in which at least one regulatoryparty is cancer-related lncRNA or mRNA As a result wehave extracted GBM-related LSCC-related OvCa-related andPrCa-related lncRNAndashmRNA causal networks from the fourcancer-specific lncRNAndashmRNA causal networks (details inSupplementary File S1) To understand the potential biologicalprocesses and pathways of the four cancer-related lncRNAndashmRNA causal networks we identify significant GO biologicalprocesses and KEGG pathways using functional enrichmentanalysis In Figure 4B several top GO biological processes and
KEGG pathways such as cytokine activity [69] G-proteincoupled receptor binding [70] TNF signaling pathway [71] cAMPsignaling pathway [72] pathways in cancer are closely associ-ated with the occurrence and development of cancer Thisresult suggests that the identified cancer-related lncRNAndashmRNA causal networks may be involved in the occurrence anddevelopment of human cancer
Conservative network analysis highlights a corelncRNAndashmRNA causal regulatory network acrosshuman cancers
Although most of the lncRNAndashmRNA causal regulatory relation-ships are cancer-specific there are still a number of common
396 424
106 115
1370
133 173
1599
126
1729
1224
252
2523
2157
5081
0
2000
4000
Inte
rsec
tion
Siz
e
PrCa
Ov Ca
LSCC
GBM
025
0050
0075
00
1000
0
Set Size
6 283
13 1 80 673
206
3969
76 3493
407
2816
9950
7
1948
7
0
250000
500000
750000
Inte
rsec
tion
Siz
e
PrCa
Ov Ca
LSCC
GBM
0e+0
0
2e+0
5
4e+0
5
6e+0
5
8e+0
5
Set Size
6 1 2 933
2
41
6 11
246
47
524
0
200
400In
ters
ectio
n S
ize
PrCa
Ov Ca
LSCC
GBM
0200
400
600
Set Size
Module-specific genes
Module-specific causal regulations Module-specific hub lncRNAs 808611
A
B C
Figure 3 Overlap and difference of module-specific genes module-specific causal regulations and module-specific hub lncRNAs across GBM LSCC OvCa and PrCa
(A) Module-specific genes (both lncRNAs and mRNAs) intersection plot (B) Module-specific causal regulations intersection plot (C) Module-specific hub lncRNAs inter-
section plot The red lines denote common genes and causal regulations across GBM LSCC OvCa and PrCa
10 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
causal regulatory relationships between the four global net-works To evaluate whether there is a common core of lncRNAndashmRNA causal regulatory relationships in the global regulatorynetworks across human cancers we concentrate on the con-served lncRNAndashmRNA causal regulatory relationships thatexisted in at least three human cancers
As shown in Figure 5A the majority of the conservedlncRNAndashmRNA causal regulatory relationships form a closely
connected community This finding indicates that the con-served lncRNAndashmRNA causal regulatory network may be a corenetwork across human cancers
The survival analysis shows that the lncRNAs and mRNAs inthe core network can significantly distinguish the metastasisrisks between the high- and low-risk groups in GBM OvCa andPrCa data sets (Figure 5B) This result suggests that the core net-work may act as a common network biomarker of GBM OvCa
Cancer-specific networks Causal regulations y=axb R2
GBM-specific 2816 y=4812x-1292 09774
LSCC-specific 99507 y=1034x-1014 09923
OvCa-specific 19487 y=7366x-1156 09723
PrCa-specific 808611 y=5243x-07055 08310
1 10 100 11001
10
100
1100
Degree of genes
Nu
mb
er o
f ge
nes
GBM-specific fitting curveLSCC-specific fitting curveOvCa-specific fitting curvePrCa-specific fitting curveGBM-specific degree distributionLSCC-specific degree distributionOvCa-specific degree distributionPrCa-specific degree distribution
solutecation symporter activitysymporter activity
gated channel activitysodium ion transmembrane transporter activity
ion channel activitysubstrate-specific channel activity
channel activitypassive transmembrane transporter activity
growth factor activitycation channel activity
metal ion transmembrane transporter activitycollagen bindingheparin binding
sulfur compound bindingintegrin binding
glycosaminoglycan bindingextracellular matrix binding
peptide receptor activityG-protein coupled peptide receptor activity
chemokine bindingG-protein coupled receptor binding
growth factor bindingcytokine binding
serine-type endopeptidase activitydeath receptor activity
tumor necrosis factor-activated receptor activitycytokine receptor activity
protein heterodimerization activitydipeptidase activity
glycoprotein bindingRAGE receptor binding
cytokine receptor bindingcytokine activity
GBM(203)
LSCC(1015)
OvCa(479)
PrCa(3019)
001
002
003
004
padjust
GeneRatio002
004
006
008
Taste transduction
ECM-receptor interaction
cAMP signaling pathway
Calcium signaling pathway
Neuroactive ligand-receptor interaction
PI3K-Akt signaling pathway
Pathways in cancerRegulation of actin cytoskeleton
Complement and coagulation cascades
AGE-RAGE signaling pathway in diabetic complications
Hematopoietic cell lineage
Th17 cell differentiation
Inflammatory bowel disease (IBD)
Malaria
Osteoclast differentiation
Influenza A
Tuberculosis
Intestinal immune network for IgA production
Chagas disease (American trypanosomiasis)
Leishmaniasis
TNF signaling pathway
Toll-like receptor signaling pathway
Rheumatoid arthritis
Cytokine-cytokine receptor interaction
GBM(129)
LSCC(500)
OvCa(266)
PrCa(1364)
GeneRatio
005
010
015
001
002
003
004padjust
GO enrichment analysis KEGG enrichment analysis
A
B
Figure 4 Differential network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) Degree distribution of cancer-specific lncRNAndashmRNA
causal networks in GBM LSCC OvCa and PrCa (B) Functional enrichment analysis of cancer-related lncRNAndashmRNA causal networks in GBM LSCC OvCa and PrCa
Module-specific lncRNA-mRNA causal regulatory networks | 11
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
and PrCa In Figure 5B we also find that the core network con-tains several cancer genes (34 26 30 and 38 cancer genes asso-ciated with GBM LSCC OvCa and PrCa respectively)
By conducting GO and KEGG enrichment analysis we findthat the core network is significantly enriched in 399 GO biologi-cal processes and 3 KEGG pathways (details in SupplementaryFile S2) Of the 399 GO biological processes 2 GO terms includ-ing negative regulation of cell adhesion (GO 0007162) and cyto-kine production in immune response (GO 0002367) areinvolved in three cancer hallmarks Tissue Invasion andMetastasis Tumor Promoting Inflammation and EvadingImmune Detection [73] This observation implies that the corenetwork may control these cancer-related hallmarks
Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers
We divide the hub lncRNAs into two categories (1) conserved hublncRNAs which exist in at least three human cancers and (2)cancer-specific hub lncRNAs which only exist in single humancancer As a result we obtain 9 conserved hub lncRNAs and 828cancer-specific hub lncRNAs (include 11 GBM-specific 246 LSCC-specific 47 OvCa-specific and 524 PrCa-specific hub lncRNAs)
To evaluate whether the hub lncRNAs can distinguish meta-stasis risks of human cancers we use them to predict metasta-sis risks for tumor samples in GBM LSCC OvCa and PrCaAs shown in Figure 6A the conserved hub lncRNAs can discrim-inate the metastasis risks of tumor samples significantly(Log-rank P-valuelt 005) in four human cancers In Figure 6Bexcepting LSCC-specific hub lncRNAs owing to failing to fit aCox regression model GBM-specific OvCa-specific and PrCa-specific hub lncRNAs can discriminate the metastasis risks oftumor samples significantly in GBM OvCa and PrCa respec-tively (Log-rank P-valuelt 005) These results suggest that thehub lncRNAs are discriminative and can act as biomarkers todistinguish between high- and low-risk tumor samples
Experimentally validated lncRNAndashmRNA regulations aremostly bad hits for LncTar
Using a collection of experimentally validated lncRNAndashmRNAregulatory relationships (details in Supplementary File S3) asthe ground truth the numbers of experimentally confirmedlncRNAndashmRNA causal regulations are 17 14 20 and 42 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4)
Figure 5 Conservative network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) The core lncRNAndashmRNA causal network that
occurred in at least three human cancers The red diamond nodes and white circle nodes denote lncRNAs and mRNAs respectively (B) Survival analysis of the core
lncRNAndashmRNA causal network
12 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
We further apply a representative sequence-based methodcalled LncTar [11] to the experimentally validated lncRNAndashmRNAcausal regulatory relationships discovered by MSLCRN There aretwo main reasons for choosing LncTar First LncTar does nothave a limit to input RNA size Second LncTar uses a quantitativestandard rather than expert knowledge to determine whetherlncRNAs interact with mRNAs Similar to LncTar we also set -01as normalized binding free energy (ndG) cutoff to determinewhether lncRNAndashmRNA pairs interact with each other In otherwords the lncRNAndashmRNA pairs with ndG01 are regarded as
lncRNAndashmRNA regulatory relationships Among the experimen-tally confirmed lncRNAndashmRNA causal regulatory relationships thatare discovered by MSLCRN the numbers of successfully predictedlncRNAndashmRNA regulations using LncTar are 0 0 1 and 1 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4) The result indicates that our experimentally confirmedlncRNAndashmRNA causal regulations are mostly bad hits for LncTarMeanwhile this result also suggests that expression-based andsequence-based methods may be complementary with each otherin predicting lncRNAndashmRNA regulations
A
B
Figure 6 Survival analysis of hub lncRNAs (A) Conserved hub lncRNAs in GBM LSCC OvCa and PrCa datasets (B) Survival analysis of cancer-specific hub lncRNAs
Module-specific lncRNA-mRNA causal regulatory networks | 13
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
MSLCRN networks are biologically meaningful
In this section we conduct GO and KEGG enrichment analysisto check whether the MSLCRN networks are associated withsome biological processes and pathways significantlyEnrichment analysis uncovers that 15 of the 23 (6522)MSLCRN networks in GBM 29 of the 38 (7632) MSLCRN net-works in LSCC 30 of the 45 (6667) MSLCRN networks inOvCa and 20 of the 32 (6250) MSLCRN networks in PrCa aresignificantly enriched in at least one GO biological process orKEGG pathway respectively (details in Supplementary File S5)This result implies that most of the MSLCRN networks in eachcancer are functional networks
We further investigate whether the MSLCRN networks aresignificantly enriched in GBM LSCC OvCa and PrCa diseasesrespectively We discover that 5 of the 23 MSLCRN networks7 of the 38 MSLCRN networks 6 of the 45 MSLCRN networks and6 of the 32 MSLCRN networks are significantly enriched in GBMLSCC OvCa and PrCa diseases respectively (details inSupplementary File S5) This result indicates that severalMSLCRN networks are closely associated with GBM LSCC OvCaand PrCa diseases
Altogether functional and disease enrichment analysis resultsshow that MSLCRN networks are biologically meaningful
Comparison with other PC-based networkinference methods
Based on a parallel version of the PC algorithm [56] the parallelIDA method in the second step of MSLCRN learns the causalstructure from expression data Owing to the popularity of thePC algorithm in causal structure learning some other networkinference methods including PCA-CMI [74] PCA-PMI [75] andCMI2NI [76] have also successfully applied it for network infer-ence Different from the three methods using conditional orpartial mutual information to infer lncRNAndashmRNA regulationsour method estimates causal effects to identify lncRNAndashmRNAregulations For comparisons we also use the PCA-CMI PCA-PMI and CMI2NI methods to infer module-specific lncRNAndashmRNA regulatory relationships Similar to our method (whichuses the parallel IDA method) the strength cutoff of lncRNAndashmRNA regulatory relationships in PCA-CMI PCA-PMI andCMI2NI methods is also set to 045
We evaluate the performance of each method in terms offinding experimentally validated lncRNAndashmRNA regulatoryrelationships functional MSLCRN networks and disease-associated MSLCRN networks As shown in Table 4 in terms ofthe three criteria MSLCRN performs the best in GBM LSCCOvCa and PrCa data sets This result suggests that MSLCRN is auseful method to infer module-specific lncRNAndashmRNA regula-tory network in human cancers
Conclusions and discussion
Notwithstanding lncRNAs do not encode proteins directly theyengage in a wide range of biological processes including cancerdevelopments through their interactions with other biologicalmacromolecules eg DNA RNA and protein Therefore touncover the functions and regulatory mechanisms of lncRNAsit is necessary to investigate lncRNAndashtarget regulatory networkacross different types of biological conditions
As a biological network the lncRNAndashtarget regulatory net-work exhibits a high degree of modularity Each functionalmodule is responsible for implementing specific biological
functions Moreover modularity is an important feature ofhuman cancer development and progression Thus from a net-work community point of view it is necessary to investigatemodule-specific lncRNAndashmRNA regulatory networks
Until now several statistical correlation or associationmeasures eg Pearson Mutual Information and ConditionalMutual Information have been used to infer gene regulatorynetworks However these methods tend to identify indirect reg-ulatory relationships between genes The identified gene regu-latory networks cannot reflect real lsquocausalrsquo regulatoryrelationships To better understand lncRNA regulatory mecha-nism it is vital to investigate how lncRNAs causally influencethe expression levels of their target mRNAs
In this work the computational methods for inferringlncRNAndashmRNA interactions and the publicly available data-bases of lncRNAndashmRNA regulatory relationships are firstreviewed Then to address the above two issues we propose anovel computational method MSLCRN to study module-specific lncRNAndashmRNA causal regulatory networks across GBMLSCC OvCa and PrCa diseases In contrast to other approaches(expression-based and sequence-based methods) MSLCRN hastwo unique features First MSLCRN considers the modularity oflncRNAndashmRNA regulatory networks Instead of studying globalregulatory relationships between lncRNAs and mRNAs wefocus on investigating the regulatory behavior of lncRNAs in themodules of interest Second considering the restrictions withconducting gene knockout experiments MSLCRN uses thecausal inference method IDA to infer causal relationshipsbetween lncRNAs and mRNAs based on expression data Thepromising results suggest that exploiting modularity of generegulatory network and causality-based method could provideanother effective approach to elucidating lncRNA functions andregulatory mechanisms of human cancers
Despite the advantages of MSLCRN there is still room toimprove it First the WGCNA method only allows clusteringgenes across all samples from the matched lncRNA and mRNAexpression data In fact a class of genes may exhibit similarexpression patterns across a subset of samples An alternativesolution of this problem is to use a bi-clustering method to iden-tify lncRNAndashmRNA co-expression modules Second it is stilltime-consuming to estimate causal effects from large expres-sion data sets When constructing the module-specific lncRNAndashmRNA causal regulatory networks the running time of parallelIDA is still high on estimating the causal effects of lncRNAs onmRNAs In future more efficient parallel IDA method is neededto explore lncRNAndashmRNA causal regulatory relationships inlarge-scale expression data Third previous research [38]has shown that the prediction accuracy of lncRNAndashmRNA inter-actions can be improved by integrating both sequence data and
Table 4 Comparison results in terms of experimentally validatedlncRNAndashmRNA regulatory relationships functional MSLCRN net-works and disease-associated MSLCRN networks
Methods GBM (a b c) LSCC (a b c) OvCa (a b c) PrCa (a b c)
MSLCRN (17 15 5) (14 29 7) (20 30 6) (42 20 6)PCA-CMI (2 13 0) (0 11 0) (0 7 1) (0 20 2)PCA-PMI (2 15 1) (0 11 0) (0 8 2) (1 18 1)CMI2NI (2 15 0) (0 11 0) (0 7 1) (0 19 1)
Note afrac14number of experimentally validated lncRNAndashmRNA regulatory relation-
ships bfrac14number of functional MSLCRN networks cfrac14number of disease-asso-
ciated MSLCRN networks
14 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
expression data To improve the accuracy of the predictedlncRNAndashmRNA regulatory relationships it is necessary todevelop an ensemble method (fusing sequence-based andexpression-based methods) to infer lncRNAndashmRNA regulatorynetwork Finally recent studies [77] show that lncRNAs can actas competing endogenous RNAs (ceRNAs) or miRNA sponges toattract miRNAs for bindings by competing with mRNAsTherefore some predicted lncRNAndashmRNA regulatory relation-ships are lncRNA-related ceRNAndashceRNA interactions To furtherimprove the prediction of lncRNAndashmRNA regulatory relation-ships it is necessary to remove the crosstalk relationshipsbetween lncRNAs and mRNAs
Key Points
bull Among ncRNAs lncRNAs are a large and diverse classof RNA molecules and are thought to be a gold mine ofpotential oncogenes anti-oncogenes and newbiomarkers
bull lncRNAs exhibit dynamic positive gene regulationacross human cancers
bull Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers
bull There is still a lack of ground truth for validating pre-dicted lncRNAndashmRNA regulatory relationships
bull There is still room to develop reliable methods for elu-cidating lncRNA regulatory mechanisms
Supplementary Data
Supplementary data are available online at httpsacademicoupcombib
Funding
The National Natural Science Foundation of China (No61702069) the Applied Basic Research Foundation ofScience and Technology of Yunnan Province (No2017FB099) the NHMRC Grant (No 1123042) and theAustralian Research Council Discovery Grant (NoDP140103617)
References1 Pang KC Frith MC Mattick JS Rapid evolution of noncoding
RNAs lack of conservation does not mean lack of functionTrends Genet 200622(1)1ndash5
2 Kung JT Colognori D Lee JT Long noncoding RNAs pastpresent and future Genetics 2013193(3)651ndash69
3 Schmitt AM Chang HY Long noncoding RNAs in cancer path-ways Cancer Cell 201629(4)452ndash63
4 Zhang Y Tao Y Liao Q Long noncoding RNA a crosslink inbiological regulatory network Brief Bioinform 2017 doi 101093bibbbx042
5 Yoon JH Abdelmohsen K Gorospe M Posttranscriptionalgene regulation by long noncoding RNA J Mol Biol 2013425(19)3723ndash30
6 Gerlach W Giegerich R GUUGle a utility for fast exact match-ing under RNA complementary rules including G-U base pair-ing Bioinformatics 200622(6)762ndash4
7 Muckstein U Tafer H Hackermuller J et al Thermodynamicsof RNA-RNA binding Bioinformatics 200622(10)1177ndash82
8 Tafer H Hofacker IL RNAplex a fast tool for RNA-RNA inter-action search Bioinformatics 200824(22)2657ndash63
9 Busch A Richter AS Backofen R IntaRNA efficient predictionof bacterial sRNA targets incorporating target site accessibil-ity and seed regions Bioinformatics 200824(24)2849ndash56
10Kato Y Sato K Hamada M et al RactIP fast and accurate pre-diction of RNA-RNA interaction using integer programmingBioinformatics 201026(18)i460ndash6
11Li J Ma W Zeng P et al LncTar a tool for predicting the RNAtargets of long noncoding RNAs Brief Bioinform 201516(5)806ndash12
12Fukunaga T Hamada M RIblast an ultrafast RNA-RNA inter-action prediction system based on a seed-and-extensionapproach Bioinformatics 201733(17)2666ndash74
13Derrien T Johnson R Bussotti G et al The GENCODE v7 cata-log of human long noncoding RNAs analysis of their genestructure evolution and expression Genome Res 201222(9)1775ndash89
14Gloss BS Dinger ME The specificity of long noncoding RNAexpression Biochim Biophys Acta 20161859(1)16ndash22
15Munshi A Mohan V Ahuja YR Non-coding RNAs a dynamicand complex network of gene regulation J PharmacogenomicsPharmacoproteomics 20167156
16Liao Q Liu C Yuan X et al Large-scale prediction of long non-coding RNA functions in a coding-non-coding gene co-expression network Nucleic Acids Res 201139(9)3864ndash78
17Guo Q Cheng Y Liang T et al Comprehensive analysis oflncRNA-mRNA co-expression patterns identifies immune-associated lncRNA biomarkers in ovarian cancer malignantprogression Sci Rep 20155(1)17683
18Du Y Xia W Zhang J et al Comprehensive analysis of longnoncoding RNA-mRNA co-expression patterns in thyroidcancer Mol Biosyst 201713(10)2107ndash15
19Wu W Wagner EK Hao Y et al Tissue-specific co-expressionof long non-coding and coding RNAs associated with breastCancer Sci Rep 2016632731
20Barabasi AL Oltvai ZN Network biology understanding thecellrsquos functional organization Nat Rev Genet 20045(2)101ndash13
21Langfelder P Horvath S WGCNA an R package for weightedcorrelation network analysis BMC Bioinformatics 20089559
22Maathuis HM Kalisch M Buhlmann P Estimating high-dimensional intervention effects from observational dataAnn Stat 200937(6A)3133ndash64
23Maathuis HM Colombo D Kalisch M et al Predicting causaleffects in large-scale systems from observational data NatMethods 20107(4)247ndash8
24Le T Hoang T Li J et al A fast PC algorithm for high dimen-sional causal discovery with multi-core PCs IEEEACM TransComput Biol Bioinform 2016 doi 101109TCBB20162591526
25Du Z Fei T Verhaak RG et al Integrative genomic analysesreveal clinically relevant long noncoding RNAs in humancancer Nat Struct Mol Biol 201320(7)908ndash13
26Bernhart SH Tafer H Muckstein U et al Partition functionand base pairing probabilities of RNA heterodimersAlgorithms Mol Biol 20061(1)3
27Alkan C Karakoc E Nadeau JH et al RNA-RNA interactionprediction and antisense RNA target search J Comput Biol200613(2)267ndash82
28Seemann SE Richter AS Gesell T et al PETcofold predictingconserved interactions and structures of two multiple align-ments of RNA sequences Bioinformatics 201127(2)211ndash19
29Wenzel A Akbasli E Gorodkin J RIsearch fast RNA-RNAinteraction search using a simplified nearest-neighborenergy model Bioinformatics 201228(21)2738ndash46
Module-specific lncRNA-mRNA causal regulatory networks | 15
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
30Alkan F Wenzel A Palasca O et al RIsearch2 suffix array-based large-scale prediction of RNA-RNA interactions andsiRNA off-targets Nucleic Acids Res 201745e60
31Hu R Sun X lncRNATargets a platform for lncRNA target pre-diction based on nucleic acid thermodynamics J BioinformComput Biol 201614(4)1650016
32Terai G Iwakiri J Kameda T et al Comprehensive predictionof lncRNA-RNA interactions in human transcriptome BMCGenomics 201617(Suppl 1)12
33Liu J Wu S Li M et al LncRNA expression profiles reveal theco-expression network in human colorectal carcinoma Int JClin Exp Pathol 201691885ndash1892
34Huang S Feng C Chen L et al Identification of potential keylong non-coding RNAs and target genes associated withpneumonia using long non-coding RNA sequencing (lncRNA-Seq) a preliminary study Med Sci Monit 2016223394ndash408
35Li J Xu Y Xu J et al Dynamic co-expression network analysisof lncRNAs and mRNAs associated with venous congestionMol Med Rep 201614(3)2045ndash51
36Fu M Huang G Zhang Z et al Expression profile of long non-coding RNAs in cartilage from knee osteoarthritis patientsOsteoarthritis Cartilage 201523(3)423ndash32
37Zhang F Gao C Ma XF et al Expression profile of long non-coding RNAs in peripheral blood mononuclear cells frommultiple sclerosis patients CNS Neurosci Ther 201622(4)298ndash305
38 Iwakiri J Terai G Hamada M Computational prediction oflncRNA-mRNA interactionsby integrating tissue specificity inhuman transcriptome Biol Direct 201712(1)15
39Lv L Wei M Lin P et al Integrated mRNA and lncRNA expres-sion profiling for exploring metastatic biomarkers of humanintrahepatic cholangiocarcinoma Am J Cancer Res 20177688ndash99
40Hao Y Wu W Li H et al NPInter v30 an upgraded databaseof noncoding RNA-associated interactions Database 20162016baw057
41Chen G Wang Z Wang D et al LncRNADisease a databasefor long-non-coding RNA-associated diseases Nucleic AcidsRes 201341D983ndash6
42 Jiang Q Wang J Wu X et al LncRNA2Target a database fordifferentially expressed genes after lncRNA knockdown oroverexpression Nucleic Acids Res 201543D193ndash6
43Zhou Z Shen Y Khan MR et al LncReg a reference resourcefor lncRNA-associated regulatory networks Database 20152015bav083
44Denisenko E Ho D Tamgue O et al IRNdb the database ofimmunologically relevant non-coding RNAs Database 20162016baw138
45Liu CJ Gao C Ma Z et al lncRInter a database of experimen-tally validated long non-coding RNA interaction J GenetGenomics 201744(5)265ndash8
46Li JH Liu S Zhou H et al starBase v20 decoding miRNA-ceRNA miRNA-ncRNA and protein-RNA interaction net-works from large-scale CLIP-Seq data Nucleic Acids Res 201442(D1)D92ndash7
47Liu Y Zhao M lnCaNet pan-cancer co-expression networkfor human lncRNA and cancer genes Bioinformatics 201632(10)1595ndash7
48Zhou QZ Zhang B Yu QY et al BmncRNAdb a comprehen-sive database of non-coding RNAs in the silkworm Bombyxmori BMC Bioinformatics 201617(1)370
49Park C Yu N Choi I et al lncRNAtor a comprehensiveresource for functional investigation of long non-codingRNAs Bioinformatics 201430(17)2480ndash5
50Bhartiya D Pal K Ghosh S et al lncRNome a comprehensiveknowledgebase of human long noncoding RNAs Database20132013bat034
51Zhao Z Bai J Wu A et al Co-LncRNA investigating thelncRNA combinatorial effects in GO annotations and KEGGpathways based on human RNA-Seq data Database 20152015bav082
52 Jiang Q Ma R Wang J et al LncRNA2Function a compre-hensive resource for functional investigation of humanlncRNAs based on RNA-seq data BMC Genomics 201516(Suppl 3)S2
53Chan WL Huang HD Chang JG lncRNAMap a map of puta-tive regulatory functions in the long non-coding transcrip-tome Comput Biol Chem 20145041ndash9
54Langfelder P Horvath S Fast R functions for robust correla-tions and hierarchical clustering J Stat Softw 2012461ndash17
55 Judea P Causality Models Reasoning and Inference New YorkNY Cambridge University Press 2000
56Spirtes P Glymour C Scheines R Causation Prediction andSearch 2nd edn Cambridge MIT Press 2000
57Le T Hoang T Li J et al ParallelPC an R package for efficientconstraint based causal exploration arXiv prepring 2015arXiv151003042v1
58Hahn MW Kern AD Comparative genomics of centrality andessentiality in three eukaryotic protein-interaction networksMol Biol Evol 200522(4)803ndash6
59Song J Singh M Roth FP From hub proteins to hub modulesthe relationship between essentiality and centrality in theyeast interactome at different scales of organization PLoSComput Biol 20139(2)e1002910
60Therneau TM Grambsch PM Modeling Survival Data Extendingthe Cox Model New York Springer Press 2000
61Yu G Wang L-G Han Y He Q-Y clusterProfiler an R packagefor comparing biological themes among gene clusters OMICS201216(5)284ndash7
62Ashburner M Ball CA Blake JA et al Gene ontology tool forthe unification of biology Nat Genet 200025(1)25ndash9
63Kanehisa M Goto S KEGG Kyoto Encyclopedia of Genes andGenomes Nucleic Acids Res 200028(1)27ndash30
64Ning S Zhang J Wang P et al Lnc2Cancer a manually curateddatabase of experimentally supported lncRNAs associatedwith various human cancers Nucleic Acids Res 201644(D1)D980ndash5
65Wang Y Chen L Chen B et al Mammalian ncRNA-diseaserepository a global view of ncRNA-mediated disease net-work Cell Death Dis 20134e765
66Pi~nero J Bravo A Queralt-Rosinach N et al DisGeNET a com-prehensive platform integrating information on humandisease-associated genes and variants Nucleic Acids Res 201745(D1)D833ndash9
67Conway JR Lex A Gehlenborg N UpSetR an R package for thevisualization of intersecting sets and their propertiesBioinformatics 201733(18)2938ndash40
68Wahlestedt C Targeting long non-coding RNA to therapeuti-cally upregulate gene expression Nat Rev Drug Discov 201312(6)433ndash46
69Mantovani G Maccio A Lai P et al Cytokine activity incancer-related anorexiacachexia role of megestrol acetateand medroxyprogesterone acetate Semin Oncol 19982545ndash52
70Dorsam RT Gutkind JS G-protein-coupled receptors and can-cer Nat Rev Cancer 20077(2)79ndash94
71Wang X Lin Y Tumor necrosis factor and cancer buddies orfoes Acta Pharmacol Sin 200829(11)1275ndash88
16 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
72Fajardo AM Piazza GA Tinsley HN The role of cyclic nucleo-tide signaling pathways in cancer targets for prevention andtreatment Cancers 20146(1)436ndash58
73Hanahan D Weinberg RA Hallmarks of cancer the next gen-eration Cell 2011144(5)646ndash74
74Zhang X Zhao XM He K et al Inferring gene regulatory net-works from gene expression data by path consistencyalgorithm based on conditional mutual informationBioinformatics 201228(1)98ndash104
75Zhao J Zhou Y Zhang X et al Part mutual information forquantifying direct associations in networks Proc Natl Acad SciUSA 2016113(18)5130ndash5
76Zhang X Zhao J Hao JK et al Conditional mutual inclusiveinformation enables accurate quantification of associationsin gene regulatory networks Nucleic Acids Re 201543(5)e31
77Le TD Zhang J Liu L et al Computational methods for identi-fying miRNA sponge interactions Brief Bioinform 201718(4)577ndash90
Module-specific lncRNA-mRNA causal regulatory networks | 17
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018View publication statsView publication stats
mRNA We use the absolute value of the causal effect(AVCE) to evaluate the strength of the regulation of thelncRNA on the mRNA and a higher AVCE indicates a stron-ger lncRNA regulation The lncRNAndashmRNA pairs with highAVCEs in each module are considered as module-specificlncRNAndashmRNA causal regulatory relationships and we calleach module with these relationships identified a module-specific causal regulatory network
iii Identification of global lncRNAndashmRNA causal regulatorynetwork We integrate the module-specific lncRNAndashmRNA
causal regulatory networks to form the global lncRNAndashmRNA causal regulatory network
Identification of lncRNAndashmRNA co-expression modules
In systems biology WGCNA [21] is a popular method for findingthe correlation patterns among genes across samples and canbe used to identify clusters or modules of highly co-expressedgenes Therefore we use WGCNA to first infer lncRNAndashmRNAco-expression modules
Figure 2 The pipeline of MSLCRN First WGCNA is used to identify lncRNAndashmRNA co-expression modules from matched lncRNA and mRNA expression data Second
we infer lncRNAndashmRNA causal regulatory relationships in each module by using parallel IDA method For each module we assemble the identified lncRNAndashmRNA reg-
ulatory relationships to obtain a module-specific lncRNAndashmRNA causal regulatory network Third the module-specific lncRNAndashmRNA causal regulatory networks are
integrated to form a global lncRNAndashmRNA causal regulatory network
Module-specific lncRNA-mRNA causal regulatory networks | 7
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
Specifically the matched lncRNA and mRNA expressiondata are used as the input of WGCNA For each pair of genesi and j the gene co-expression similarity sij of the pair is definedas
sij frac14 jcorethi jTHORNj (1)
where jcor(i j)j is the absolute value of the Pearson correlationbetween genes i and j The gene co-expression similarity matrixis denoted by Sfrac14 [sij]
To pick an appropriate soft-thresholding power for trans-forming the similarity matrix S into an adjacency matrix A weuse the scale-free topology criterion for soft-thresholding andthe minimum scale free topology fitting index R2 is set as 09Then the topological overlap matrix (TOM) Wfrac14 [wij] is gener-ated based on the adjacency matrix Afrac14 [aij] The TOM similaritywij between genes i and j is defined
wij frac14P
uaiuauj thorn aij
minfP
uaiuP
uaujg thorn 1 aij(2)
where u denotes all genes of the matched lncRNA and mRNAexpression data The TOM dissimilarity between genes i and j isdenoted by dijfrac14 1 - wij To identify gene co-expression modulesthe TOM dissimilarity matrix Dfrac14 [dij] is clustered using optimalhierarchical clustering method [54] Here the identified geneco-expression modules are groups of lncRNAs and mRNAs withhigh topological overlap The lncRNAs and mRNAs of eachlncRNAndashmRNA co-expression module are considered for possi-ble lncRNAndashmRNA causal relationships in the next step
Identification of module-specific lncRNAndashmRNA causalregulatory networks
After the identification of lncRNAndashmRNA co-expression mod-ules we use the parallel IDA method [24] to estimate causaleffects of possible lncRNAndashmRNA causal pairs in each moduleThe application of parallel IDA method to matched lncRNAand mRNA expression data for estimating causal effectsincludes two steps (i) learning the causal structure fromexpression data using the parallel-PC algorithm [24] and(ii) estimating the causal effects of lncRNAs on mRNAs byapplying do-calculus [55]
In step (i) Vfrac14 L1 Lm T1 Tn is a set of random varia-bles denoting m lncRNAs and n mRNAs The causal structure isin the form of a DAG where a node denotes a lncRNA Li ormRNA Tj and an edge between two nodes represents a causalrelationship between them We use the parallel-PC algorithm aparallel version of the PC algorithm [56] to learn the causalstructures (the DAGs) from expression data Starting with a fullyconnected undirected graph the parallel-PC algorithm deter-mines if an edge is retained or removed in the graph by con-ducting conditional independence tests in parallel Then to geta DAG the directions of edges in the obtained graph are ori-ented As different DAGs may represent the same conditionalindependence the parallel-PC algorithm uses a completed par-tially directed acyclic graph (CPDAG) to uniquely describe anequivalence class of DAGs In this work we use the R-packageParallelPC [57] to implement the parallel-PC algorithm and setthe significant level of the conditional independence testsafrac14 001
In step (ii) we are only interested in estimating the causaleffect of the directed edge Li Tj where vertex is Li a parent ofvertex Tj As described above a CPDAG may generate a class ofDAGs For the causal effect of Li Tj in a CPDAG we use do-calculus [55] to estimate the causal effects of Li on Tj in a class ofDAGs Then we use the minimum absolute value of all possiblecausal effects as a final causal effect of Li Tj As for the detailsof how the parallel IDA method is applied to estimate causalrelationships from expression data the readers can refer to [24]
The estimated causal effects can be positive or negativereflecting the up or down regulation by the lncRNAs on themRNAs For the purpose of constructing the regulatory net-works we use the absolute values of the causal effects (AVCEs)to evaluate the strengths of the regulation and thus to confirmthe regulatory relationships
We set different AVCE cutoffs from 010 to 060 with a step of005 to generate MSLCRN networks in GBM LSCC OvCa andPrCa respectively For each cutoff we merge the identifiedMSLCRN networks to obtain global lncRNAndashmRNA causal regu-latory networks in the four human cancers respectively Asshown in Table 3 a higher cutoff selection causes a smallerglobal lncRNAndashmRNA causal regulatory network but bettergoodness of fit To make a trade-off between the size of theglobal lncRNAndashmRNA causal regulatory networks and goodnessof fit we set a compromised AVCE cutoff with a value of 045 Ifthe AVCE of a lncRNA on a mRNA is 045 or above we considerthere is a causal regulatory relationship between the lncRNAndashmRNA pair Under the compromise cutoff we have a moderatesize of the global lncRNAndashmRNA causal regulatory networks inGBM LSCC OvCa and PrCa Meanwhile the node degree distri-butions of four global lncRNAndashmRNA causal regulatory net-works also follow power law distribution (the fitted power curveis in the form of yfrac14 axb) well with R2gt 08
Validation survival and enrichment analysis
Previous studies have demonstrated that about 20 of thenodes in a biological network are essential and are regarded ashub genes [58 59] Therefore when analyzing a global lncRNAndashmRNA causal network we select the 20 of lncRNAs with thehighest degrees in the network as hub lncRNAs The degree of alncRNA node in the global network is the number of mRNAsconnected with it
To validate the predicted module-specific lncRNAndashmRNAcausal regulatory relationships we obtain the experimentallyvalidated lncRNAndashmRNA regulatory relationships from thethree widely used databases NPInter v30 [40] LncRNADiseasev2017 [41] and LncRNA2Target v12 [42] Furthermore we retainexperimentally validated lncRNAndashmRNA regulatory relation-ships associated with the four human cancer data sets asground truth
We perform survival analysis using the R-package survival[60] A multivariate Cox model is used to predict the risk scoreof each tumor sample Then all tumor samples in each cancerdata set are equally divided into high- and low-risk groupsaccording to their risk scores Moreover we calculate theHazard Ratio between the high- and the low-risk groups andperform the Log-rank test
To further investigate the underlying biological processesand pathways related to each of the MSLCRN networks we usethe R-package clusterProfiler [61] to conduct functional enrich-ment analysis on the networks respectively The GeneOntology (GO) [62] biological processes and Kyoto Encyclopedia
8 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
of Genes and Genomes (KEGG) [63] pathways with adjustedP-valuelt005 [adjusted by Benjamini-Hochberg (BH) method]are regarded as functional categories for the MSLCRN networks
We also collect a list of lncRNAs and mRNAs that areassociated with GBM LSCC OvCa and PrCa to study diseaseenrichment of each of the MSLCRN networks The list of disease-associated lncRNAs is obtained from LncRNADisease v2017 [41]Lnc2Cancer v2016 [64] and MNDR v20 [65] The list of disease-associated mRNAs is from DisGeNET v50 [66] To evaluatewhether a MSLCRN network is significantly enriched in a specificdisease we use a hyper-geometric distribution test as follows
p frac14 1 FethxjBNMTHORN frac14 1Xx1
ifrac140
N
i
B N
M i
B
M
(3)
In the formula B is the number of all genes in the expressiondata set N denotes the number of all genes associated with aspecific disease in the expression data set M is the number ofgenes in a MSLCRN network and x is the number of genes asso-ciated with a specific disease in a MSLCRN network A MSLCRNnetwork is significantly enriched in a specific disease if theP-valuelt 005
Network analysis validation and comparisonon MSLCRN networkslncRNAs exhibit dynamic positive gene regulationacross cancers
By following the first step of the MSLCRN method we haveidentified 23 38 45 and 32 lncRNAndashmRNA co-expression mod-ules in GBM LSCC OvCa and PrCa respectively In the secondstep of the MSLCRN method we eliminate the noncausallncRNAndashmRNA pairs in lncRNAndashmRNA co-expression modulesAs a result we generate 23 38 45 and 32 module-specificlncRNAndashmRNA causal regulatory networks in GBM LSCC OvCaand PrCa respectively After merging the module-specificlncRNAndashmRNA causal regulatory networks for each data set weobtain the four global lncRNAndashmRNA regulatory networks inGBM LSCC OvCa and PrCa respectively
To understand the overlap and difference of module-specificgenes module-specific lncRNAndashmRNA causal regulatory rela-tionships and module-specific hub lncRNAs in the four humancancers we generate three set intersection plots using theR-package UpSetR [67] As shown in Figure 3 we find that themajority of module-specific genes (5752) module-specificlncRNAndashmRNA causal regulatory relationships (9902) andmodule-specific hub lncRNAs (8922) tend to be cancer-specific Only a small portion of module-specific genes (396) andmodule-specific lncRNAndashmRNA causal regulatory relationships(6) are shared by the four cancers Especially none of themodule-specific hub lncRNAs are common between the fourcancers In addition the causal effects are positive for 99569672 9993 and 7863 of the causal regulatory relationshipsidentified in GBM LSCC OvCa and PrCa respectively Theseresults indicate that lncRNAs are more likely to exhibit dynamicpositive gene regulation across cancers The results are alsoconsistent with the proposition that the positive gene regula-tion by lncRNAs would be desired in specific situations [68]
Differential network analysis uncovers cancer-specificlncRNAndashmRNA causal networks
In this section we focus on studying cancer-specific lncRNAndashmRNA causal networks using differential network analysisThus the GBM-specific LSCC-specific OvCa-specific and PrCa-specific lncRNAndashmRNA causal networks are identified Asshown in Figure 4A the distributions of node degrees in thesefour cancer-specific lncRNAndashmRNA causal networks followpower law distributions well with R2frac14 09774 09923 09723and 08310 respectively Thus these four cancer-specificlncRNAndashmRNA causal networks are scale free indicating thatmost mRNAs are regulated by a small number of lncRNAs
Table 3 Degree distributions of global lncRNAndashmRNA causal regula-tory networks with different cutoffs in GBM LSCC OvCa and PrCa
Datasets Cutoffs Number of causalregulations
yfrac14axb R2
GBM 010 11 847 yfrac142274x06893 04161015 10 924 yfrac142495x07275 05460020 9732 yfrac142745x0767 06475025 8461 yfrac142958x08074 06757030 7176 yfrac143194x08319 06807035 6041 yfrac143363x08703 07203040 4997 yfrac143741x09348 07999045 4074 y54082x21034 08694050 3279 yfrac144194x118 09244055 2583 yfrac143896x1259 09463060 1862 yfrac143666x143 09792
LSCC 010 789 172 yfrac143143x06071 04829015 684 524 yfrac143475x06323 05841020 569 369 yfrac143905x06525 06578025 451 346 yfrac144855x06928 07789030 340 860 yfrac146341x07554 08796035 244 547 yfrac148147x08379 09504040 166 593 yfrac149724x0935 09848045 108 024 y51031x21018 09933050 66 335 yfrac149425x1068 09963055 37 632 yfrac147807x1089 09948060 19 547 yfrac146565x1169 09972
OvCa 010 333 146 yfrac143272x05928 05042015 232 794 yfrac144192x06262 06531020 159 872 yfrac146398x07216 08247025 112 792 yfrac148816x08356 09120030 80 808 yfrac141008x09472 09551035 57 099 yfrac149545x1014 09744040 38 517 yfrac148198x1066 09748045 24 439 y56575x21066 09697050 14 435 yfrac14540x1079 09551055 7973 yfrac144368x1107 09319060 4026 yfrac143285x1107 09460
PrCa 010 1 894 322 yfrac143089x06245 02750015 1 749 595 yfrac143586x06787 03582020 1 594 744 yfrac144013x07169 04316025 1 429 858 yfrac144271x0732 04919030 1 260 968 yfrac144389x07244 05616035 1 097 654 yfrac144406x0702 06470040 946 439 yfrac144485x06816 07338045 812 687 y55175x207005 08206050 694 558 yfrac14667x07588 08823055 584 834 yfrac148833x08469 09332060 474 654 yfrac141113x09684 09503
Note The AVCE cutoffs range from 010 to 060 with a step of 005
The bold values are the degree distributions of global lncRNA-mRNA causal reg-
ulatory networks with a compromised AVCE cutoff (045) in four human cancers
Module-specific lncRNA-mRNA causal regulatory networks | 9
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
Next we use four lists of lncRNAs and mRNAs associatedwith GBM LSCC OvCa and PrCa to discover lncRNAndashmRNAcausal networks that are associated with the four human can-cers We define that cancer-related lncRNAndashmRNA causal regu-latory relationships are those in which at least one regulatoryparty is cancer-related lncRNA or mRNA As a result wehave extracted GBM-related LSCC-related OvCa-related andPrCa-related lncRNAndashmRNA causal networks from the fourcancer-specific lncRNAndashmRNA causal networks (details inSupplementary File S1) To understand the potential biologicalprocesses and pathways of the four cancer-related lncRNAndashmRNA causal networks we identify significant GO biologicalprocesses and KEGG pathways using functional enrichmentanalysis In Figure 4B several top GO biological processes and
KEGG pathways such as cytokine activity [69] G-proteincoupled receptor binding [70] TNF signaling pathway [71] cAMPsignaling pathway [72] pathways in cancer are closely associ-ated with the occurrence and development of cancer Thisresult suggests that the identified cancer-related lncRNAndashmRNA causal networks may be involved in the occurrence anddevelopment of human cancer
Conservative network analysis highlights a corelncRNAndashmRNA causal regulatory network acrosshuman cancers
Although most of the lncRNAndashmRNA causal regulatory relation-ships are cancer-specific there are still a number of common
396 424
106 115
1370
133 173
1599
126
1729
1224
252
2523
2157
5081
0
2000
4000
Inte
rsec
tion
Siz
e
PrCa
Ov Ca
LSCC
GBM
025
0050
0075
00
1000
0
Set Size
6 283
13 1 80 673
206
3969
76 3493
407
2816
9950
7
1948
7
0
250000
500000
750000
Inte
rsec
tion
Siz
e
PrCa
Ov Ca
LSCC
GBM
0e+0
0
2e+0
5
4e+0
5
6e+0
5
8e+0
5
Set Size
6 1 2 933
2
41
6 11
246
47
524
0
200
400In
ters
ectio
n S
ize
PrCa
Ov Ca
LSCC
GBM
0200
400
600
Set Size
Module-specific genes
Module-specific causal regulations Module-specific hub lncRNAs 808611
A
B C
Figure 3 Overlap and difference of module-specific genes module-specific causal regulations and module-specific hub lncRNAs across GBM LSCC OvCa and PrCa
(A) Module-specific genes (both lncRNAs and mRNAs) intersection plot (B) Module-specific causal regulations intersection plot (C) Module-specific hub lncRNAs inter-
section plot The red lines denote common genes and causal regulations across GBM LSCC OvCa and PrCa
10 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
causal regulatory relationships between the four global net-works To evaluate whether there is a common core of lncRNAndashmRNA causal regulatory relationships in the global regulatorynetworks across human cancers we concentrate on the con-served lncRNAndashmRNA causal regulatory relationships thatexisted in at least three human cancers
As shown in Figure 5A the majority of the conservedlncRNAndashmRNA causal regulatory relationships form a closely
connected community This finding indicates that the con-served lncRNAndashmRNA causal regulatory network may be a corenetwork across human cancers
The survival analysis shows that the lncRNAs and mRNAs inthe core network can significantly distinguish the metastasisrisks between the high- and low-risk groups in GBM OvCa andPrCa data sets (Figure 5B) This result suggests that the core net-work may act as a common network biomarker of GBM OvCa
Cancer-specific networks Causal regulations y=axb R2
GBM-specific 2816 y=4812x-1292 09774
LSCC-specific 99507 y=1034x-1014 09923
OvCa-specific 19487 y=7366x-1156 09723
PrCa-specific 808611 y=5243x-07055 08310
1 10 100 11001
10
100
1100
Degree of genes
Nu
mb
er o
f ge
nes
GBM-specific fitting curveLSCC-specific fitting curveOvCa-specific fitting curvePrCa-specific fitting curveGBM-specific degree distributionLSCC-specific degree distributionOvCa-specific degree distributionPrCa-specific degree distribution
solutecation symporter activitysymporter activity
gated channel activitysodium ion transmembrane transporter activity
ion channel activitysubstrate-specific channel activity
channel activitypassive transmembrane transporter activity
growth factor activitycation channel activity
metal ion transmembrane transporter activitycollagen bindingheparin binding
sulfur compound bindingintegrin binding
glycosaminoglycan bindingextracellular matrix binding
peptide receptor activityG-protein coupled peptide receptor activity
chemokine bindingG-protein coupled receptor binding
growth factor bindingcytokine binding
serine-type endopeptidase activitydeath receptor activity
tumor necrosis factor-activated receptor activitycytokine receptor activity
protein heterodimerization activitydipeptidase activity
glycoprotein bindingRAGE receptor binding
cytokine receptor bindingcytokine activity
GBM(203)
LSCC(1015)
OvCa(479)
PrCa(3019)
001
002
003
004
padjust
GeneRatio002
004
006
008
Taste transduction
ECM-receptor interaction
cAMP signaling pathway
Calcium signaling pathway
Neuroactive ligand-receptor interaction
PI3K-Akt signaling pathway
Pathways in cancerRegulation of actin cytoskeleton
Complement and coagulation cascades
AGE-RAGE signaling pathway in diabetic complications
Hematopoietic cell lineage
Th17 cell differentiation
Inflammatory bowel disease (IBD)
Malaria
Osteoclast differentiation
Influenza A
Tuberculosis
Intestinal immune network for IgA production
Chagas disease (American trypanosomiasis)
Leishmaniasis
TNF signaling pathway
Toll-like receptor signaling pathway
Rheumatoid arthritis
Cytokine-cytokine receptor interaction
GBM(129)
LSCC(500)
OvCa(266)
PrCa(1364)
GeneRatio
005
010
015
001
002
003
004padjust
GO enrichment analysis KEGG enrichment analysis
A
B
Figure 4 Differential network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) Degree distribution of cancer-specific lncRNAndashmRNA
causal networks in GBM LSCC OvCa and PrCa (B) Functional enrichment analysis of cancer-related lncRNAndashmRNA causal networks in GBM LSCC OvCa and PrCa
Module-specific lncRNA-mRNA causal regulatory networks | 11
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
and PrCa In Figure 5B we also find that the core network con-tains several cancer genes (34 26 30 and 38 cancer genes asso-ciated with GBM LSCC OvCa and PrCa respectively)
By conducting GO and KEGG enrichment analysis we findthat the core network is significantly enriched in 399 GO biologi-cal processes and 3 KEGG pathways (details in SupplementaryFile S2) Of the 399 GO biological processes 2 GO terms includ-ing negative regulation of cell adhesion (GO 0007162) and cyto-kine production in immune response (GO 0002367) areinvolved in three cancer hallmarks Tissue Invasion andMetastasis Tumor Promoting Inflammation and EvadingImmune Detection [73] This observation implies that the corenetwork may control these cancer-related hallmarks
Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers
We divide the hub lncRNAs into two categories (1) conserved hublncRNAs which exist in at least three human cancers and (2)cancer-specific hub lncRNAs which only exist in single humancancer As a result we obtain 9 conserved hub lncRNAs and 828cancer-specific hub lncRNAs (include 11 GBM-specific 246 LSCC-specific 47 OvCa-specific and 524 PrCa-specific hub lncRNAs)
To evaluate whether the hub lncRNAs can distinguish meta-stasis risks of human cancers we use them to predict metasta-sis risks for tumor samples in GBM LSCC OvCa and PrCaAs shown in Figure 6A the conserved hub lncRNAs can discrim-inate the metastasis risks of tumor samples significantly(Log-rank P-valuelt 005) in four human cancers In Figure 6Bexcepting LSCC-specific hub lncRNAs owing to failing to fit aCox regression model GBM-specific OvCa-specific and PrCa-specific hub lncRNAs can discriminate the metastasis risks oftumor samples significantly in GBM OvCa and PrCa respec-tively (Log-rank P-valuelt 005) These results suggest that thehub lncRNAs are discriminative and can act as biomarkers todistinguish between high- and low-risk tumor samples
Experimentally validated lncRNAndashmRNA regulations aremostly bad hits for LncTar
Using a collection of experimentally validated lncRNAndashmRNAregulatory relationships (details in Supplementary File S3) asthe ground truth the numbers of experimentally confirmedlncRNAndashmRNA causal regulations are 17 14 20 and 42 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4)
Figure 5 Conservative network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) The core lncRNAndashmRNA causal network that
occurred in at least three human cancers The red diamond nodes and white circle nodes denote lncRNAs and mRNAs respectively (B) Survival analysis of the core
lncRNAndashmRNA causal network
12 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
We further apply a representative sequence-based methodcalled LncTar [11] to the experimentally validated lncRNAndashmRNAcausal regulatory relationships discovered by MSLCRN There aretwo main reasons for choosing LncTar First LncTar does nothave a limit to input RNA size Second LncTar uses a quantitativestandard rather than expert knowledge to determine whetherlncRNAs interact with mRNAs Similar to LncTar we also set -01as normalized binding free energy (ndG) cutoff to determinewhether lncRNAndashmRNA pairs interact with each other In otherwords the lncRNAndashmRNA pairs with ndG01 are regarded as
lncRNAndashmRNA regulatory relationships Among the experimen-tally confirmed lncRNAndashmRNA causal regulatory relationships thatare discovered by MSLCRN the numbers of successfully predictedlncRNAndashmRNA regulations using LncTar are 0 0 1 and 1 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4) The result indicates that our experimentally confirmedlncRNAndashmRNA causal regulations are mostly bad hits for LncTarMeanwhile this result also suggests that expression-based andsequence-based methods may be complementary with each otherin predicting lncRNAndashmRNA regulations
A
B
Figure 6 Survival analysis of hub lncRNAs (A) Conserved hub lncRNAs in GBM LSCC OvCa and PrCa datasets (B) Survival analysis of cancer-specific hub lncRNAs
Module-specific lncRNA-mRNA causal regulatory networks | 13
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
MSLCRN networks are biologically meaningful
In this section we conduct GO and KEGG enrichment analysisto check whether the MSLCRN networks are associated withsome biological processes and pathways significantlyEnrichment analysis uncovers that 15 of the 23 (6522)MSLCRN networks in GBM 29 of the 38 (7632) MSLCRN net-works in LSCC 30 of the 45 (6667) MSLCRN networks inOvCa and 20 of the 32 (6250) MSLCRN networks in PrCa aresignificantly enriched in at least one GO biological process orKEGG pathway respectively (details in Supplementary File S5)This result implies that most of the MSLCRN networks in eachcancer are functional networks
We further investigate whether the MSLCRN networks aresignificantly enriched in GBM LSCC OvCa and PrCa diseasesrespectively We discover that 5 of the 23 MSLCRN networks7 of the 38 MSLCRN networks 6 of the 45 MSLCRN networks and6 of the 32 MSLCRN networks are significantly enriched in GBMLSCC OvCa and PrCa diseases respectively (details inSupplementary File S5) This result indicates that severalMSLCRN networks are closely associated with GBM LSCC OvCaand PrCa diseases
Altogether functional and disease enrichment analysis resultsshow that MSLCRN networks are biologically meaningful
Comparison with other PC-based networkinference methods
Based on a parallel version of the PC algorithm [56] the parallelIDA method in the second step of MSLCRN learns the causalstructure from expression data Owing to the popularity of thePC algorithm in causal structure learning some other networkinference methods including PCA-CMI [74] PCA-PMI [75] andCMI2NI [76] have also successfully applied it for network infer-ence Different from the three methods using conditional orpartial mutual information to infer lncRNAndashmRNA regulationsour method estimates causal effects to identify lncRNAndashmRNAregulations For comparisons we also use the PCA-CMI PCA-PMI and CMI2NI methods to infer module-specific lncRNAndashmRNA regulatory relationships Similar to our method (whichuses the parallel IDA method) the strength cutoff of lncRNAndashmRNA regulatory relationships in PCA-CMI PCA-PMI andCMI2NI methods is also set to 045
We evaluate the performance of each method in terms offinding experimentally validated lncRNAndashmRNA regulatoryrelationships functional MSLCRN networks and disease-associated MSLCRN networks As shown in Table 4 in terms ofthe three criteria MSLCRN performs the best in GBM LSCCOvCa and PrCa data sets This result suggests that MSLCRN is auseful method to infer module-specific lncRNAndashmRNA regula-tory network in human cancers
Conclusions and discussion
Notwithstanding lncRNAs do not encode proteins directly theyengage in a wide range of biological processes including cancerdevelopments through their interactions with other biologicalmacromolecules eg DNA RNA and protein Therefore touncover the functions and regulatory mechanisms of lncRNAsit is necessary to investigate lncRNAndashtarget regulatory networkacross different types of biological conditions
As a biological network the lncRNAndashtarget regulatory net-work exhibits a high degree of modularity Each functionalmodule is responsible for implementing specific biological
functions Moreover modularity is an important feature ofhuman cancer development and progression Thus from a net-work community point of view it is necessary to investigatemodule-specific lncRNAndashmRNA regulatory networks
Until now several statistical correlation or associationmeasures eg Pearson Mutual Information and ConditionalMutual Information have been used to infer gene regulatorynetworks However these methods tend to identify indirect reg-ulatory relationships between genes The identified gene regu-latory networks cannot reflect real lsquocausalrsquo regulatoryrelationships To better understand lncRNA regulatory mecha-nism it is vital to investigate how lncRNAs causally influencethe expression levels of their target mRNAs
In this work the computational methods for inferringlncRNAndashmRNA interactions and the publicly available data-bases of lncRNAndashmRNA regulatory relationships are firstreviewed Then to address the above two issues we propose anovel computational method MSLCRN to study module-specific lncRNAndashmRNA causal regulatory networks across GBMLSCC OvCa and PrCa diseases In contrast to other approaches(expression-based and sequence-based methods) MSLCRN hastwo unique features First MSLCRN considers the modularity oflncRNAndashmRNA regulatory networks Instead of studying globalregulatory relationships between lncRNAs and mRNAs wefocus on investigating the regulatory behavior of lncRNAs in themodules of interest Second considering the restrictions withconducting gene knockout experiments MSLCRN uses thecausal inference method IDA to infer causal relationshipsbetween lncRNAs and mRNAs based on expression data Thepromising results suggest that exploiting modularity of generegulatory network and causality-based method could provideanother effective approach to elucidating lncRNA functions andregulatory mechanisms of human cancers
Despite the advantages of MSLCRN there is still room toimprove it First the WGCNA method only allows clusteringgenes across all samples from the matched lncRNA and mRNAexpression data In fact a class of genes may exhibit similarexpression patterns across a subset of samples An alternativesolution of this problem is to use a bi-clustering method to iden-tify lncRNAndashmRNA co-expression modules Second it is stilltime-consuming to estimate causal effects from large expres-sion data sets When constructing the module-specific lncRNAndashmRNA causal regulatory networks the running time of parallelIDA is still high on estimating the causal effects of lncRNAs onmRNAs In future more efficient parallel IDA method is neededto explore lncRNAndashmRNA causal regulatory relationships inlarge-scale expression data Third previous research [38]has shown that the prediction accuracy of lncRNAndashmRNA inter-actions can be improved by integrating both sequence data and
Table 4 Comparison results in terms of experimentally validatedlncRNAndashmRNA regulatory relationships functional MSLCRN net-works and disease-associated MSLCRN networks
Methods GBM (a b c) LSCC (a b c) OvCa (a b c) PrCa (a b c)
MSLCRN (17 15 5) (14 29 7) (20 30 6) (42 20 6)PCA-CMI (2 13 0) (0 11 0) (0 7 1) (0 20 2)PCA-PMI (2 15 1) (0 11 0) (0 8 2) (1 18 1)CMI2NI (2 15 0) (0 11 0) (0 7 1) (0 19 1)
Note afrac14number of experimentally validated lncRNAndashmRNA regulatory relation-
ships bfrac14number of functional MSLCRN networks cfrac14number of disease-asso-
ciated MSLCRN networks
14 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
expression data To improve the accuracy of the predictedlncRNAndashmRNA regulatory relationships it is necessary todevelop an ensemble method (fusing sequence-based andexpression-based methods) to infer lncRNAndashmRNA regulatorynetwork Finally recent studies [77] show that lncRNAs can actas competing endogenous RNAs (ceRNAs) or miRNA sponges toattract miRNAs for bindings by competing with mRNAsTherefore some predicted lncRNAndashmRNA regulatory relation-ships are lncRNA-related ceRNAndashceRNA interactions To furtherimprove the prediction of lncRNAndashmRNA regulatory relation-ships it is necessary to remove the crosstalk relationshipsbetween lncRNAs and mRNAs
Key Points
bull Among ncRNAs lncRNAs are a large and diverse classof RNA molecules and are thought to be a gold mine ofpotential oncogenes anti-oncogenes and newbiomarkers
bull lncRNAs exhibit dynamic positive gene regulationacross human cancers
bull Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers
bull There is still a lack of ground truth for validating pre-dicted lncRNAndashmRNA regulatory relationships
bull There is still room to develop reliable methods for elu-cidating lncRNA regulatory mechanisms
Supplementary Data
Supplementary data are available online at httpsacademicoupcombib
Funding
The National Natural Science Foundation of China (No61702069) the Applied Basic Research Foundation ofScience and Technology of Yunnan Province (No2017FB099) the NHMRC Grant (No 1123042) and theAustralian Research Council Discovery Grant (NoDP140103617)
References1 Pang KC Frith MC Mattick JS Rapid evolution of noncoding
RNAs lack of conservation does not mean lack of functionTrends Genet 200622(1)1ndash5
2 Kung JT Colognori D Lee JT Long noncoding RNAs pastpresent and future Genetics 2013193(3)651ndash69
3 Schmitt AM Chang HY Long noncoding RNAs in cancer path-ways Cancer Cell 201629(4)452ndash63
4 Zhang Y Tao Y Liao Q Long noncoding RNA a crosslink inbiological regulatory network Brief Bioinform 2017 doi 101093bibbbx042
5 Yoon JH Abdelmohsen K Gorospe M Posttranscriptionalgene regulation by long noncoding RNA J Mol Biol 2013425(19)3723ndash30
6 Gerlach W Giegerich R GUUGle a utility for fast exact match-ing under RNA complementary rules including G-U base pair-ing Bioinformatics 200622(6)762ndash4
7 Muckstein U Tafer H Hackermuller J et al Thermodynamicsof RNA-RNA binding Bioinformatics 200622(10)1177ndash82
8 Tafer H Hofacker IL RNAplex a fast tool for RNA-RNA inter-action search Bioinformatics 200824(22)2657ndash63
9 Busch A Richter AS Backofen R IntaRNA efficient predictionof bacterial sRNA targets incorporating target site accessibil-ity and seed regions Bioinformatics 200824(24)2849ndash56
10Kato Y Sato K Hamada M et al RactIP fast and accurate pre-diction of RNA-RNA interaction using integer programmingBioinformatics 201026(18)i460ndash6
11Li J Ma W Zeng P et al LncTar a tool for predicting the RNAtargets of long noncoding RNAs Brief Bioinform 201516(5)806ndash12
12Fukunaga T Hamada M RIblast an ultrafast RNA-RNA inter-action prediction system based on a seed-and-extensionapproach Bioinformatics 201733(17)2666ndash74
13Derrien T Johnson R Bussotti G et al The GENCODE v7 cata-log of human long noncoding RNAs analysis of their genestructure evolution and expression Genome Res 201222(9)1775ndash89
14Gloss BS Dinger ME The specificity of long noncoding RNAexpression Biochim Biophys Acta 20161859(1)16ndash22
15Munshi A Mohan V Ahuja YR Non-coding RNAs a dynamicand complex network of gene regulation J PharmacogenomicsPharmacoproteomics 20167156
16Liao Q Liu C Yuan X et al Large-scale prediction of long non-coding RNA functions in a coding-non-coding gene co-expression network Nucleic Acids Res 201139(9)3864ndash78
17Guo Q Cheng Y Liang T et al Comprehensive analysis oflncRNA-mRNA co-expression patterns identifies immune-associated lncRNA biomarkers in ovarian cancer malignantprogression Sci Rep 20155(1)17683
18Du Y Xia W Zhang J et al Comprehensive analysis of longnoncoding RNA-mRNA co-expression patterns in thyroidcancer Mol Biosyst 201713(10)2107ndash15
19Wu W Wagner EK Hao Y et al Tissue-specific co-expressionof long non-coding and coding RNAs associated with breastCancer Sci Rep 2016632731
20Barabasi AL Oltvai ZN Network biology understanding thecellrsquos functional organization Nat Rev Genet 20045(2)101ndash13
21Langfelder P Horvath S WGCNA an R package for weightedcorrelation network analysis BMC Bioinformatics 20089559
22Maathuis HM Kalisch M Buhlmann P Estimating high-dimensional intervention effects from observational dataAnn Stat 200937(6A)3133ndash64
23Maathuis HM Colombo D Kalisch M et al Predicting causaleffects in large-scale systems from observational data NatMethods 20107(4)247ndash8
24Le T Hoang T Li J et al A fast PC algorithm for high dimen-sional causal discovery with multi-core PCs IEEEACM TransComput Biol Bioinform 2016 doi 101109TCBB20162591526
25Du Z Fei T Verhaak RG et al Integrative genomic analysesreveal clinically relevant long noncoding RNAs in humancancer Nat Struct Mol Biol 201320(7)908ndash13
26Bernhart SH Tafer H Muckstein U et al Partition functionand base pairing probabilities of RNA heterodimersAlgorithms Mol Biol 20061(1)3
27Alkan C Karakoc E Nadeau JH et al RNA-RNA interactionprediction and antisense RNA target search J Comput Biol200613(2)267ndash82
28Seemann SE Richter AS Gesell T et al PETcofold predictingconserved interactions and structures of two multiple align-ments of RNA sequences Bioinformatics 201127(2)211ndash19
29Wenzel A Akbasli E Gorodkin J RIsearch fast RNA-RNAinteraction search using a simplified nearest-neighborenergy model Bioinformatics 201228(21)2738ndash46
Module-specific lncRNA-mRNA causal regulatory networks | 15
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
30Alkan F Wenzel A Palasca O et al RIsearch2 suffix array-based large-scale prediction of RNA-RNA interactions andsiRNA off-targets Nucleic Acids Res 201745e60
31Hu R Sun X lncRNATargets a platform for lncRNA target pre-diction based on nucleic acid thermodynamics J BioinformComput Biol 201614(4)1650016
32Terai G Iwakiri J Kameda T et al Comprehensive predictionof lncRNA-RNA interactions in human transcriptome BMCGenomics 201617(Suppl 1)12
33Liu J Wu S Li M et al LncRNA expression profiles reveal theco-expression network in human colorectal carcinoma Int JClin Exp Pathol 201691885ndash1892
34Huang S Feng C Chen L et al Identification of potential keylong non-coding RNAs and target genes associated withpneumonia using long non-coding RNA sequencing (lncRNA-Seq) a preliminary study Med Sci Monit 2016223394ndash408
35Li J Xu Y Xu J et al Dynamic co-expression network analysisof lncRNAs and mRNAs associated with venous congestionMol Med Rep 201614(3)2045ndash51
36Fu M Huang G Zhang Z et al Expression profile of long non-coding RNAs in cartilage from knee osteoarthritis patientsOsteoarthritis Cartilage 201523(3)423ndash32
37Zhang F Gao C Ma XF et al Expression profile of long non-coding RNAs in peripheral blood mononuclear cells frommultiple sclerosis patients CNS Neurosci Ther 201622(4)298ndash305
38 Iwakiri J Terai G Hamada M Computational prediction oflncRNA-mRNA interactionsby integrating tissue specificity inhuman transcriptome Biol Direct 201712(1)15
39Lv L Wei M Lin P et al Integrated mRNA and lncRNA expres-sion profiling for exploring metastatic biomarkers of humanintrahepatic cholangiocarcinoma Am J Cancer Res 20177688ndash99
40Hao Y Wu W Li H et al NPInter v30 an upgraded databaseof noncoding RNA-associated interactions Database 20162016baw057
41Chen G Wang Z Wang D et al LncRNADisease a databasefor long-non-coding RNA-associated diseases Nucleic AcidsRes 201341D983ndash6
42 Jiang Q Wang J Wu X et al LncRNA2Target a database fordifferentially expressed genes after lncRNA knockdown oroverexpression Nucleic Acids Res 201543D193ndash6
43Zhou Z Shen Y Khan MR et al LncReg a reference resourcefor lncRNA-associated regulatory networks Database 20152015bav083
44Denisenko E Ho D Tamgue O et al IRNdb the database ofimmunologically relevant non-coding RNAs Database 20162016baw138
45Liu CJ Gao C Ma Z et al lncRInter a database of experimen-tally validated long non-coding RNA interaction J GenetGenomics 201744(5)265ndash8
46Li JH Liu S Zhou H et al starBase v20 decoding miRNA-ceRNA miRNA-ncRNA and protein-RNA interaction net-works from large-scale CLIP-Seq data Nucleic Acids Res 201442(D1)D92ndash7
47Liu Y Zhao M lnCaNet pan-cancer co-expression networkfor human lncRNA and cancer genes Bioinformatics 201632(10)1595ndash7
48Zhou QZ Zhang B Yu QY et al BmncRNAdb a comprehen-sive database of non-coding RNAs in the silkworm Bombyxmori BMC Bioinformatics 201617(1)370
49Park C Yu N Choi I et al lncRNAtor a comprehensiveresource for functional investigation of long non-codingRNAs Bioinformatics 201430(17)2480ndash5
50Bhartiya D Pal K Ghosh S et al lncRNome a comprehensiveknowledgebase of human long noncoding RNAs Database20132013bat034
51Zhao Z Bai J Wu A et al Co-LncRNA investigating thelncRNA combinatorial effects in GO annotations and KEGGpathways based on human RNA-Seq data Database 20152015bav082
52 Jiang Q Ma R Wang J et al LncRNA2Function a compre-hensive resource for functional investigation of humanlncRNAs based on RNA-seq data BMC Genomics 201516(Suppl 3)S2
53Chan WL Huang HD Chang JG lncRNAMap a map of puta-tive regulatory functions in the long non-coding transcrip-tome Comput Biol Chem 20145041ndash9
54Langfelder P Horvath S Fast R functions for robust correla-tions and hierarchical clustering J Stat Softw 2012461ndash17
55 Judea P Causality Models Reasoning and Inference New YorkNY Cambridge University Press 2000
56Spirtes P Glymour C Scheines R Causation Prediction andSearch 2nd edn Cambridge MIT Press 2000
57Le T Hoang T Li J et al ParallelPC an R package for efficientconstraint based causal exploration arXiv prepring 2015arXiv151003042v1
58Hahn MW Kern AD Comparative genomics of centrality andessentiality in three eukaryotic protein-interaction networksMol Biol Evol 200522(4)803ndash6
59Song J Singh M Roth FP From hub proteins to hub modulesthe relationship between essentiality and centrality in theyeast interactome at different scales of organization PLoSComput Biol 20139(2)e1002910
60Therneau TM Grambsch PM Modeling Survival Data Extendingthe Cox Model New York Springer Press 2000
61Yu G Wang L-G Han Y He Q-Y clusterProfiler an R packagefor comparing biological themes among gene clusters OMICS201216(5)284ndash7
62Ashburner M Ball CA Blake JA et al Gene ontology tool forthe unification of biology Nat Genet 200025(1)25ndash9
63Kanehisa M Goto S KEGG Kyoto Encyclopedia of Genes andGenomes Nucleic Acids Res 200028(1)27ndash30
64Ning S Zhang J Wang P et al Lnc2Cancer a manually curateddatabase of experimentally supported lncRNAs associatedwith various human cancers Nucleic Acids Res 201644(D1)D980ndash5
65Wang Y Chen L Chen B et al Mammalian ncRNA-diseaserepository a global view of ncRNA-mediated disease net-work Cell Death Dis 20134e765
66Pi~nero J Bravo A Queralt-Rosinach N et al DisGeNET a com-prehensive platform integrating information on humandisease-associated genes and variants Nucleic Acids Res 201745(D1)D833ndash9
67Conway JR Lex A Gehlenborg N UpSetR an R package for thevisualization of intersecting sets and their propertiesBioinformatics 201733(18)2938ndash40
68Wahlestedt C Targeting long non-coding RNA to therapeuti-cally upregulate gene expression Nat Rev Drug Discov 201312(6)433ndash46
69Mantovani G Maccio A Lai P et al Cytokine activity incancer-related anorexiacachexia role of megestrol acetateand medroxyprogesterone acetate Semin Oncol 19982545ndash52
70Dorsam RT Gutkind JS G-protein-coupled receptors and can-cer Nat Rev Cancer 20077(2)79ndash94
71Wang X Lin Y Tumor necrosis factor and cancer buddies orfoes Acta Pharmacol Sin 200829(11)1275ndash88
16 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
72Fajardo AM Piazza GA Tinsley HN The role of cyclic nucleo-tide signaling pathways in cancer targets for prevention andtreatment Cancers 20146(1)436ndash58
73Hanahan D Weinberg RA Hallmarks of cancer the next gen-eration Cell 2011144(5)646ndash74
74Zhang X Zhao XM He K et al Inferring gene regulatory net-works from gene expression data by path consistencyalgorithm based on conditional mutual informationBioinformatics 201228(1)98ndash104
75Zhao J Zhou Y Zhang X et al Part mutual information forquantifying direct associations in networks Proc Natl Acad SciUSA 2016113(18)5130ndash5
76Zhang X Zhao J Hao JK et al Conditional mutual inclusiveinformation enables accurate quantification of associationsin gene regulatory networks Nucleic Acids Re 201543(5)e31
77Le TD Zhang J Liu L et al Computational methods for identi-fying miRNA sponge interactions Brief Bioinform 201718(4)577ndash90
Module-specific lncRNA-mRNA causal regulatory networks | 17
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018View publication statsView publication stats
Specifically the matched lncRNA and mRNA expressiondata are used as the input of WGCNA For each pair of genesi and j the gene co-expression similarity sij of the pair is definedas
sij frac14 jcorethi jTHORNj (1)
where jcor(i j)j is the absolute value of the Pearson correlationbetween genes i and j The gene co-expression similarity matrixis denoted by Sfrac14 [sij]
To pick an appropriate soft-thresholding power for trans-forming the similarity matrix S into an adjacency matrix A weuse the scale-free topology criterion for soft-thresholding andthe minimum scale free topology fitting index R2 is set as 09Then the topological overlap matrix (TOM) Wfrac14 [wij] is gener-ated based on the adjacency matrix Afrac14 [aij] The TOM similaritywij between genes i and j is defined
wij frac14P
uaiuauj thorn aij
minfP
uaiuP
uaujg thorn 1 aij(2)
where u denotes all genes of the matched lncRNA and mRNAexpression data The TOM dissimilarity between genes i and j isdenoted by dijfrac14 1 - wij To identify gene co-expression modulesthe TOM dissimilarity matrix Dfrac14 [dij] is clustered using optimalhierarchical clustering method [54] Here the identified geneco-expression modules are groups of lncRNAs and mRNAs withhigh topological overlap The lncRNAs and mRNAs of eachlncRNAndashmRNA co-expression module are considered for possi-ble lncRNAndashmRNA causal relationships in the next step
Identification of module-specific lncRNAndashmRNA causalregulatory networks
After the identification of lncRNAndashmRNA co-expression mod-ules we use the parallel IDA method [24] to estimate causaleffects of possible lncRNAndashmRNA causal pairs in each moduleThe application of parallel IDA method to matched lncRNAand mRNA expression data for estimating causal effectsincludes two steps (i) learning the causal structure fromexpression data using the parallel-PC algorithm [24] and(ii) estimating the causal effects of lncRNAs on mRNAs byapplying do-calculus [55]
In step (i) Vfrac14 L1 Lm T1 Tn is a set of random varia-bles denoting m lncRNAs and n mRNAs The causal structure isin the form of a DAG where a node denotes a lncRNA Li ormRNA Tj and an edge between two nodes represents a causalrelationship between them We use the parallel-PC algorithm aparallel version of the PC algorithm [56] to learn the causalstructures (the DAGs) from expression data Starting with a fullyconnected undirected graph the parallel-PC algorithm deter-mines if an edge is retained or removed in the graph by con-ducting conditional independence tests in parallel Then to geta DAG the directions of edges in the obtained graph are ori-ented As different DAGs may represent the same conditionalindependence the parallel-PC algorithm uses a completed par-tially directed acyclic graph (CPDAG) to uniquely describe anequivalence class of DAGs In this work we use the R-packageParallelPC [57] to implement the parallel-PC algorithm and setthe significant level of the conditional independence testsafrac14 001
In step (ii) we are only interested in estimating the causaleffect of the directed edge Li Tj where vertex is Li a parent ofvertex Tj As described above a CPDAG may generate a class ofDAGs For the causal effect of Li Tj in a CPDAG we use do-calculus [55] to estimate the causal effects of Li on Tj in a class ofDAGs Then we use the minimum absolute value of all possiblecausal effects as a final causal effect of Li Tj As for the detailsof how the parallel IDA method is applied to estimate causalrelationships from expression data the readers can refer to [24]
The estimated causal effects can be positive or negativereflecting the up or down regulation by the lncRNAs on themRNAs For the purpose of constructing the regulatory net-works we use the absolute values of the causal effects (AVCEs)to evaluate the strengths of the regulation and thus to confirmthe regulatory relationships
We set different AVCE cutoffs from 010 to 060 with a step of005 to generate MSLCRN networks in GBM LSCC OvCa andPrCa respectively For each cutoff we merge the identifiedMSLCRN networks to obtain global lncRNAndashmRNA causal regu-latory networks in the four human cancers respectively Asshown in Table 3 a higher cutoff selection causes a smallerglobal lncRNAndashmRNA causal regulatory network but bettergoodness of fit To make a trade-off between the size of theglobal lncRNAndashmRNA causal regulatory networks and goodnessof fit we set a compromised AVCE cutoff with a value of 045 Ifthe AVCE of a lncRNA on a mRNA is 045 or above we considerthere is a causal regulatory relationship between the lncRNAndashmRNA pair Under the compromise cutoff we have a moderatesize of the global lncRNAndashmRNA causal regulatory networks inGBM LSCC OvCa and PrCa Meanwhile the node degree distri-butions of four global lncRNAndashmRNA causal regulatory net-works also follow power law distribution (the fitted power curveis in the form of yfrac14 axb) well with R2gt 08
Validation survival and enrichment analysis
Previous studies have demonstrated that about 20 of thenodes in a biological network are essential and are regarded ashub genes [58 59] Therefore when analyzing a global lncRNAndashmRNA causal network we select the 20 of lncRNAs with thehighest degrees in the network as hub lncRNAs The degree of alncRNA node in the global network is the number of mRNAsconnected with it
To validate the predicted module-specific lncRNAndashmRNAcausal regulatory relationships we obtain the experimentallyvalidated lncRNAndashmRNA regulatory relationships from thethree widely used databases NPInter v30 [40] LncRNADiseasev2017 [41] and LncRNA2Target v12 [42] Furthermore we retainexperimentally validated lncRNAndashmRNA regulatory relation-ships associated with the four human cancer data sets asground truth
We perform survival analysis using the R-package survival[60] A multivariate Cox model is used to predict the risk scoreof each tumor sample Then all tumor samples in each cancerdata set are equally divided into high- and low-risk groupsaccording to their risk scores Moreover we calculate theHazard Ratio between the high- and the low-risk groups andperform the Log-rank test
To further investigate the underlying biological processesand pathways related to each of the MSLCRN networks we usethe R-package clusterProfiler [61] to conduct functional enrich-ment analysis on the networks respectively The GeneOntology (GO) [62] biological processes and Kyoto Encyclopedia
8 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
of Genes and Genomes (KEGG) [63] pathways with adjustedP-valuelt005 [adjusted by Benjamini-Hochberg (BH) method]are regarded as functional categories for the MSLCRN networks
We also collect a list of lncRNAs and mRNAs that areassociated with GBM LSCC OvCa and PrCa to study diseaseenrichment of each of the MSLCRN networks The list of disease-associated lncRNAs is obtained from LncRNADisease v2017 [41]Lnc2Cancer v2016 [64] and MNDR v20 [65] The list of disease-associated mRNAs is from DisGeNET v50 [66] To evaluatewhether a MSLCRN network is significantly enriched in a specificdisease we use a hyper-geometric distribution test as follows
p frac14 1 FethxjBNMTHORN frac14 1Xx1
ifrac140
N
i
B N
M i
B
M
(3)
In the formula B is the number of all genes in the expressiondata set N denotes the number of all genes associated with aspecific disease in the expression data set M is the number ofgenes in a MSLCRN network and x is the number of genes asso-ciated with a specific disease in a MSLCRN network A MSLCRNnetwork is significantly enriched in a specific disease if theP-valuelt 005
Network analysis validation and comparisonon MSLCRN networkslncRNAs exhibit dynamic positive gene regulationacross cancers
By following the first step of the MSLCRN method we haveidentified 23 38 45 and 32 lncRNAndashmRNA co-expression mod-ules in GBM LSCC OvCa and PrCa respectively In the secondstep of the MSLCRN method we eliminate the noncausallncRNAndashmRNA pairs in lncRNAndashmRNA co-expression modulesAs a result we generate 23 38 45 and 32 module-specificlncRNAndashmRNA causal regulatory networks in GBM LSCC OvCaand PrCa respectively After merging the module-specificlncRNAndashmRNA causal regulatory networks for each data set weobtain the four global lncRNAndashmRNA regulatory networks inGBM LSCC OvCa and PrCa respectively
To understand the overlap and difference of module-specificgenes module-specific lncRNAndashmRNA causal regulatory rela-tionships and module-specific hub lncRNAs in the four humancancers we generate three set intersection plots using theR-package UpSetR [67] As shown in Figure 3 we find that themajority of module-specific genes (5752) module-specificlncRNAndashmRNA causal regulatory relationships (9902) andmodule-specific hub lncRNAs (8922) tend to be cancer-specific Only a small portion of module-specific genes (396) andmodule-specific lncRNAndashmRNA causal regulatory relationships(6) are shared by the four cancers Especially none of themodule-specific hub lncRNAs are common between the fourcancers In addition the causal effects are positive for 99569672 9993 and 7863 of the causal regulatory relationshipsidentified in GBM LSCC OvCa and PrCa respectively Theseresults indicate that lncRNAs are more likely to exhibit dynamicpositive gene regulation across cancers The results are alsoconsistent with the proposition that the positive gene regula-tion by lncRNAs would be desired in specific situations [68]
Differential network analysis uncovers cancer-specificlncRNAndashmRNA causal networks
In this section we focus on studying cancer-specific lncRNAndashmRNA causal networks using differential network analysisThus the GBM-specific LSCC-specific OvCa-specific and PrCa-specific lncRNAndashmRNA causal networks are identified Asshown in Figure 4A the distributions of node degrees in thesefour cancer-specific lncRNAndashmRNA causal networks followpower law distributions well with R2frac14 09774 09923 09723and 08310 respectively Thus these four cancer-specificlncRNAndashmRNA causal networks are scale free indicating thatmost mRNAs are regulated by a small number of lncRNAs
Table 3 Degree distributions of global lncRNAndashmRNA causal regula-tory networks with different cutoffs in GBM LSCC OvCa and PrCa
Datasets Cutoffs Number of causalregulations
yfrac14axb R2
GBM 010 11 847 yfrac142274x06893 04161015 10 924 yfrac142495x07275 05460020 9732 yfrac142745x0767 06475025 8461 yfrac142958x08074 06757030 7176 yfrac143194x08319 06807035 6041 yfrac143363x08703 07203040 4997 yfrac143741x09348 07999045 4074 y54082x21034 08694050 3279 yfrac144194x118 09244055 2583 yfrac143896x1259 09463060 1862 yfrac143666x143 09792
LSCC 010 789 172 yfrac143143x06071 04829015 684 524 yfrac143475x06323 05841020 569 369 yfrac143905x06525 06578025 451 346 yfrac144855x06928 07789030 340 860 yfrac146341x07554 08796035 244 547 yfrac148147x08379 09504040 166 593 yfrac149724x0935 09848045 108 024 y51031x21018 09933050 66 335 yfrac149425x1068 09963055 37 632 yfrac147807x1089 09948060 19 547 yfrac146565x1169 09972
OvCa 010 333 146 yfrac143272x05928 05042015 232 794 yfrac144192x06262 06531020 159 872 yfrac146398x07216 08247025 112 792 yfrac148816x08356 09120030 80 808 yfrac141008x09472 09551035 57 099 yfrac149545x1014 09744040 38 517 yfrac148198x1066 09748045 24 439 y56575x21066 09697050 14 435 yfrac14540x1079 09551055 7973 yfrac144368x1107 09319060 4026 yfrac143285x1107 09460
PrCa 010 1 894 322 yfrac143089x06245 02750015 1 749 595 yfrac143586x06787 03582020 1 594 744 yfrac144013x07169 04316025 1 429 858 yfrac144271x0732 04919030 1 260 968 yfrac144389x07244 05616035 1 097 654 yfrac144406x0702 06470040 946 439 yfrac144485x06816 07338045 812 687 y55175x207005 08206050 694 558 yfrac14667x07588 08823055 584 834 yfrac148833x08469 09332060 474 654 yfrac141113x09684 09503
Note The AVCE cutoffs range from 010 to 060 with a step of 005
The bold values are the degree distributions of global lncRNA-mRNA causal reg-
ulatory networks with a compromised AVCE cutoff (045) in four human cancers
Module-specific lncRNA-mRNA causal regulatory networks | 9
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
Next we use four lists of lncRNAs and mRNAs associatedwith GBM LSCC OvCa and PrCa to discover lncRNAndashmRNAcausal networks that are associated with the four human can-cers We define that cancer-related lncRNAndashmRNA causal regu-latory relationships are those in which at least one regulatoryparty is cancer-related lncRNA or mRNA As a result wehave extracted GBM-related LSCC-related OvCa-related andPrCa-related lncRNAndashmRNA causal networks from the fourcancer-specific lncRNAndashmRNA causal networks (details inSupplementary File S1) To understand the potential biologicalprocesses and pathways of the four cancer-related lncRNAndashmRNA causal networks we identify significant GO biologicalprocesses and KEGG pathways using functional enrichmentanalysis In Figure 4B several top GO biological processes and
KEGG pathways such as cytokine activity [69] G-proteincoupled receptor binding [70] TNF signaling pathway [71] cAMPsignaling pathway [72] pathways in cancer are closely associ-ated with the occurrence and development of cancer Thisresult suggests that the identified cancer-related lncRNAndashmRNA causal networks may be involved in the occurrence anddevelopment of human cancer
Conservative network analysis highlights a corelncRNAndashmRNA causal regulatory network acrosshuman cancers
Although most of the lncRNAndashmRNA causal regulatory relation-ships are cancer-specific there are still a number of common
396 424
106 115
1370
133 173
1599
126
1729
1224
252
2523
2157
5081
0
2000
4000
Inte
rsec
tion
Siz
e
PrCa
Ov Ca
LSCC
GBM
025
0050
0075
00
1000
0
Set Size
6 283
13 1 80 673
206
3969
76 3493
407
2816
9950
7
1948
7
0
250000
500000
750000
Inte
rsec
tion
Siz
e
PrCa
Ov Ca
LSCC
GBM
0e+0
0
2e+0
5
4e+0
5
6e+0
5
8e+0
5
Set Size
6 1 2 933
2
41
6 11
246
47
524
0
200
400In
ters
ectio
n S
ize
PrCa
Ov Ca
LSCC
GBM
0200
400
600
Set Size
Module-specific genes
Module-specific causal regulations Module-specific hub lncRNAs 808611
A
B C
Figure 3 Overlap and difference of module-specific genes module-specific causal regulations and module-specific hub lncRNAs across GBM LSCC OvCa and PrCa
(A) Module-specific genes (both lncRNAs and mRNAs) intersection plot (B) Module-specific causal regulations intersection plot (C) Module-specific hub lncRNAs inter-
section plot The red lines denote common genes and causal regulations across GBM LSCC OvCa and PrCa
10 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
causal regulatory relationships between the four global net-works To evaluate whether there is a common core of lncRNAndashmRNA causal regulatory relationships in the global regulatorynetworks across human cancers we concentrate on the con-served lncRNAndashmRNA causal regulatory relationships thatexisted in at least three human cancers
As shown in Figure 5A the majority of the conservedlncRNAndashmRNA causal regulatory relationships form a closely
connected community This finding indicates that the con-served lncRNAndashmRNA causal regulatory network may be a corenetwork across human cancers
The survival analysis shows that the lncRNAs and mRNAs inthe core network can significantly distinguish the metastasisrisks between the high- and low-risk groups in GBM OvCa andPrCa data sets (Figure 5B) This result suggests that the core net-work may act as a common network biomarker of GBM OvCa
Cancer-specific networks Causal regulations y=axb R2
GBM-specific 2816 y=4812x-1292 09774
LSCC-specific 99507 y=1034x-1014 09923
OvCa-specific 19487 y=7366x-1156 09723
PrCa-specific 808611 y=5243x-07055 08310
1 10 100 11001
10
100
1100
Degree of genes
Nu
mb
er o
f ge
nes
GBM-specific fitting curveLSCC-specific fitting curveOvCa-specific fitting curvePrCa-specific fitting curveGBM-specific degree distributionLSCC-specific degree distributionOvCa-specific degree distributionPrCa-specific degree distribution
solutecation symporter activitysymporter activity
gated channel activitysodium ion transmembrane transporter activity
ion channel activitysubstrate-specific channel activity
channel activitypassive transmembrane transporter activity
growth factor activitycation channel activity
metal ion transmembrane transporter activitycollagen bindingheparin binding
sulfur compound bindingintegrin binding
glycosaminoglycan bindingextracellular matrix binding
peptide receptor activityG-protein coupled peptide receptor activity
chemokine bindingG-protein coupled receptor binding
growth factor bindingcytokine binding
serine-type endopeptidase activitydeath receptor activity
tumor necrosis factor-activated receptor activitycytokine receptor activity
protein heterodimerization activitydipeptidase activity
glycoprotein bindingRAGE receptor binding
cytokine receptor bindingcytokine activity
GBM(203)
LSCC(1015)
OvCa(479)
PrCa(3019)
001
002
003
004
padjust
GeneRatio002
004
006
008
Taste transduction
ECM-receptor interaction
cAMP signaling pathway
Calcium signaling pathway
Neuroactive ligand-receptor interaction
PI3K-Akt signaling pathway
Pathways in cancerRegulation of actin cytoskeleton
Complement and coagulation cascades
AGE-RAGE signaling pathway in diabetic complications
Hematopoietic cell lineage
Th17 cell differentiation
Inflammatory bowel disease (IBD)
Malaria
Osteoclast differentiation
Influenza A
Tuberculosis
Intestinal immune network for IgA production
Chagas disease (American trypanosomiasis)
Leishmaniasis
TNF signaling pathway
Toll-like receptor signaling pathway
Rheumatoid arthritis
Cytokine-cytokine receptor interaction
GBM(129)
LSCC(500)
OvCa(266)
PrCa(1364)
GeneRatio
005
010
015
001
002
003
004padjust
GO enrichment analysis KEGG enrichment analysis
A
B
Figure 4 Differential network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) Degree distribution of cancer-specific lncRNAndashmRNA
causal networks in GBM LSCC OvCa and PrCa (B) Functional enrichment analysis of cancer-related lncRNAndashmRNA causal networks in GBM LSCC OvCa and PrCa
Module-specific lncRNA-mRNA causal regulatory networks | 11
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
and PrCa In Figure 5B we also find that the core network con-tains several cancer genes (34 26 30 and 38 cancer genes asso-ciated with GBM LSCC OvCa and PrCa respectively)
By conducting GO and KEGG enrichment analysis we findthat the core network is significantly enriched in 399 GO biologi-cal processes and 3 KEGG pathways (details in SupplementaryFile S2) Of the 399 GO biological processes 2 GO terms includ-ing negative regulation of cell adhesion (GO 0007162) and cyto-kine production in immune response (GO 0002367) areinvolved in three cancer hallmarks Tissue Invasion andMetastasis Tumor Promoting Inflammation and EvadingImmune Detection [73] This observation implies that the corenetwork may control these cancer-related hallmarks
Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers
We divide the hub lncRNAs into two categories (1) conserved hublncRNAs which exist in at least three human cancers and (2)cancer-specific hub lncRNAs which only exist in single humancancer As a result we obtain 9 conserved hub lncRNAs and 828cancer-specific hub lncRNAs (include 11 GBM-specific 246 LSCC-specific 47 OvCa-specific and 524 PrCa-specific hub lncRNAs)
To evaluate whether the hub lncRNAs can distinguish meta-stasis risks of human cancers we use them to predict metasta-sis risks for tumor samples in GBM LSCC OvCa and PrCaAs shown in Figure 6A the conserved hub lncRNAs can discrim-inate the metastasis risks of tumor samples significantly(Log-rank P-valuelt 005) in four human cancers In Figure 6Bexcepting LSCC-specific hub lncRNAs owing to failing to fit aCox regression model GBM-specific OvCa-specific and PrCa-specific hub lncRNAs can discriminate the metastasis risks oftumor samples significantly in GBM OvCa and PrCa respec-tively (Log-rank P-valuelt 005) These results suggest that thehub lncRNAs are discriminative and can act as biomarkers todistinguish between high- and low-risk tumor samples
Experimentally validated lncRNAndashmRNA regulations aremostly bad hits for LncTar
Using a collection of experimentally validated lncRNAndashmRNAregulatory relationships (details in Supplementary File S3) asthe ground truth the numbers of experimentally confirmedlncRNAndashmRNA causal regulations are 17 14 20 and 42 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4)
Figure 5 Conservative network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) The core lncRNAndashmRNA causal network that
occurred in at least three human cancers The red diamond nodes and white circle nodes denote lncRNAs and mRNAs respectively (B) Survival analysis of the core
lncRNAndashmRNA causal network
12 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
We further apply a representative sequence-based methodcalled LncTar [11] to the experimentally validated lncRNAndashmRNAcausal regulatory relationships discovered by MSLCRN There aretwo main reasons for choosing LncTar First LncTar does nothave a limit to input RNA size Second LncTar uses a quantitativestandard rather than expert knowledge to determine whetherlncRNAs interact with mRNAs Similar to LncTar we also set -01as normalized binding free energy (ndG) cutoff to determinewhether lncRNAndashmRNA pairs interact with each other In otherwords the lncRNAndashmRNA pairs with ndG01 are regarded as
lncRNAndashmRNA regulatory relationships Among the experimen-tally confirmed lncRNAndashmRNA causal regulatory relationships thatare discovered by MSLCRN the numbers of successfully predictedlncRNAndashmRNA regulations using LncTar are 0 0 1 and 1 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4) The result indicates that our experimentally confirmedlncRNAndashmRNA causal regulations are mostly bad hits for LncTarMeanwhile this result also suggests that expression-based andsequence-based methods may be complementary with each otherin predicting lncRNAndashmRNA regulations
A
B
Figure 6 Survival analysis of hub lncRNAs (A) Conserved hub lncRNAs in GBM LSCC OvCa and PrCa datasets (B) Survival analysis of cancer-specific hub lncRNAs
Module-specific lncRNA-mRNA causal regulatory networks | 13
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
MSLCRN networks are biologically meaningful
In this section we conduct GO and KEGG enrichment analysisto check whether the MSLCRN networks are associated withsome biological processes and pathways significantlyEnrichment analysis uncovers that 15 of the 23 (6522)MSLCRN networks in GBM 29 of the 38 (7632) MSLCRN net-works in LSCC 30 of the 45 (6667) MSLCRN networks inOvCa and 20 of the 32 (6250) MSLCRN networks in PrCa aresignificantly enriched in at least one GO biological process orKEGG pathway respectively (details in Supplementary File S5)This result implies that most of the MSLCRN networks in eachcancer are functional networks
We further investigate whether the MSLCRN networks aresignificantly enriched in GBM LSCC OvCa and PrCa diseasesrespectively We discover that 5 of the 23 MSLCRN networks7 of the 38 MSLCRN networks 6 of the 45 MSLCRN networks and6 of the 32 MSLCRN networks are significantly enriched in GBMLSCC OvCa and PrCa diseases respectively (details inSupplementary File S5) This result indicates that severalMSLCRN networks are closely associated with GBM LSCC OvCaand PrCa diseases
Altogether functional and disease enrichment analysis resultsshow that MSLCRN networks are biologically meaningful
Comparison with other PC-based networkinference methods
Based on a parallel version of the PC algorithm [56] the parallelIDA method in the second step of MSLCRN learns the causalstructure from expression data Owing to the popularity of thePC algorithm in causal structure learning some other networkinference methods including PCA-CMI [74] PCA-PMI [75] andCMI2NI [76] have also successfully applied it for network infer-ence Different from the three methods using conditional orpartial mutual information to infer lncRNAndashmRNA regulationsour method estimates causal effects to identify lncRNAndashmRNAregulations For comparisons we also use the PCA-CMI PCA-PMI and CMI2NI methods to infer module-specific lncRNAndashmRNA regulatory relationships Similar to our method (whichuses the parallel IDA method) the strength cutoff of lncRNAndashmRNA regulatory relationships in PCA-CMI PCA-PMI andCMI2NI methods is also set to 045
We evaluate the performance of each method in terms offinding experimentally validated lncRNAndashmRNA regulatoryrelationships functional MSLCRN networks and disease-associated MSLCRN networks As shown in Table 4 in terms ofthe three criteria MSLCRN performs the best in GBM LSCCOvCa and PrCa data sets This result suggests that MSLCRN is auseful method to infer module-specific lncRNAndashmRNA regula-tory network in human cancers
Conclusions and discussion
Notwithstanding lncRNAs do not encode proteins directly theyengage in a wide range of biological processes including cancerdevelopments through their interactions with other biologicalmacromolecules eg DNA RNA and protein Therefore touncover the functions and regulatory mechanisms of lncRNAsit is necessary to investigate lncRNAndashtarget regulatory networkacross different types of biological conditions
As a biological network the lncRNAndashtarget regulatory net-work exhibits a high degree of modularity Each functionalmodule is responsible for implementing specific biological
functions Moreover modularity is an important feature ofhuman cancer development and progression Thus from a net-work community point of view it is necessary to investigatemodule-specific lncRNAndashmRNA regulatory networks
Until now several statistical correlation or associationmeasures eg Pearson Mutual Information and ConditionalMutual Information have been used to infer gene regulatorynetworks However these methods tend to identify indirect reg-ulatory relationships between genes The identified gene regu-latory networks cannot reflect real lsquocausalrsquo regulatoryrelationships To better understand lncRNA regulatory mecha-nism it is vital to investigate how lncRNAs causally influencethe expression levels of their target mRNAs
In this work the computational methods for inferringlncRNAndashmRNA interactions and the publicly available data-bases of lncRNAndashmRNA regulatory relationships are firstreviewed Then to address the above two issues we propose anovel computational method MSLCRN to study module-specific lncRNAndashmRNA causal regulatory networks across GBMLSCC OvCa and PrCa diseases In contrast to other approaches(expression-based and sequence-based methods) MSLCRN hastwo unique features First MSLCRN considers the modularity oflncRNAndashmRNA regulatory networks Instead of studying globalregulatory relationships between lncRNAs and mRNAs wefocus on investigating the regulatory behavior of lncRNAs in themodules of interest Second considering the restrictions withconducting gene knockout experiments MSLCRN uses thecausal inference method IDA to infer causal relationshipsbetween lncRNAs and mRNAs based on expression data Thepromising results suggest that exploiting modularity of generegulatory network and causality-based method could provideanother effective approach to elucidating lncRNA functions andregulatory mechanisms of human cancers
Despite the advantages of MSLCRN there is still room toimprove it First the WGCNA method only allows clusteringgenes across all samples from the matched lncRNA and mRNAexpression data In fact a class of genes may exhibit similarexpression patterns across a subset of samples An alternativesolution of this problem is to use a bi-clustering method to iden-tify lncRNAndashmRNA co-expression modules Second it is stilltime-consuming to estimate causal effects from large expres-sion data sets When constructing the module-specific lncRNAndashmRNA causal regulatory networks the running time of parallelIDA is still high on estimating the causal effects of lncRNAs onmRNAs In future more efficient parallel IDA method is neededto explore lncRNAndashmRNA causal regulatory relationships inlarge-scale expression data Third previous research [38]has shown that the prediction accuracy of lncRNAndashmRNA inter-actions can be improved by integrating both sequence data and
Table 4 Comparison results in terms of experimentally validatedlncRNAndashmRNA regulatory relationships functional MSLCRN net-works and disease-associated MSLCRN networks
Methods GBM (a b c) LSCC (a b c) OvCa (a b c) PrCa (a b c)
MSLCRN (17 15 5) (14 29 7) (20 30 6) (42 20 6)PCA-CMI (2 13 0) (0 11 0) (0 7 1) (0 20 2)PCA-PMI (2 15 1) (0 11 0) (0 8 2) (1 18 1)CMI2NI (2 15 0) (0 11 0) (0 7 1) (0 19 1)
Note afrac14number of experimentally validated lncRNAndashmRNA regulatory relation-
ships bfrac14number of functional MSLCRN networks cfrac14number of disease-asso-
ciated MSLCRN networks
14 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
expression data To improve the accuracy of the predictedlncRNAndashmRNA regulatory relationships it is necessary todevelop an ensemble method (fusing sequence-based andexpression-based methods) to infer lncRNAndashmRNA regulatorynetwork Finally recent studies [77] show that lncRNAs can actas competing endogenous RNAs (ceRNAs) or miRNA sponges toattract miRNAs for bindings by competing with mRNAsTherefore some predicted lncRNAndashmRNA regulatory relation-ships are lncRNA-related ceRNAndashceRNA interactions To furtherimprove the prediction of lncRNAndashmRNA regulatory relation-ships it is necessary to remove the crosstalk relationshipsbetween lncRNAs and mRNAs
Key Points
bull Among ncRNAs lncRNAs are a large and diverse classof RNA molecules and are thought to be a gold mine ofpotential oncogenes anti-oncogenes and newbiomarkers
bull lncRNAs exhibit dynamic positive gene regulationacross human cancers
bull Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers
bull There is still a lack of ground truth for validating pre-dicted lncRNAndashmRNA regulatory relationships
bull There is still room to develop reliable methods for elu-cidating lncRNA regulatory mechanisms
Supplementary Data
Supplementary data are available online at httpsacademicoupcombib
Funding
The National Natural Science Foundation of China (No61702069) the Applied Basic Research Foundation ofScience and Technology of Yunnan Province (No2017FB099) the NHMRC Grant (No 1123042) and theAustralian Research Council Discovery Grant (NoDP140103617)
References1 Pang KC Frith MC Mattick JS Rapid evolution of noncoding
RNAs lack of conservation does not mean lack of functionTrends Genet 200622(1)1ndash5
2 Kung JT Colognori D Lee JT Long noncoding RNAs pastpresent and future Genetics 2013193(3)651ndash69
3 Schmitt AM Chang HY Long noncoding RNAs in cancer path-ways Cancer Cell 201629(4)452ndash63
4 Zhang Y Tao Y Liao Q Long noncoding RNA a crosslink inbiological regulatory network Brief Bioinform 2017 doi 101093bibbbx042
5 Yoon JH Abdelmohsen K Gorospe M Posttranscriptionalgene regulation by long noncoding RNA J Mol Biol 2013425(19)3723ndash30
6 Gerlach W Giegerich R GUUGle a utility for fast exact match-ing under RNA complementary rules including G-U base pair-ing Bioinformatics 200622(6)762ndash4
7 Muckstein U Tafer H Hackermuller J et al Thermodynamicsof RNA-RNA binding Bioinformatics 200622(10)1177ndash82
8 Tafer H Hofacker IL RNAplex a fast tool for RNA-RNA inter-action search Bioinformatics 200824(22)2657ndash63
9 Busch A Richter AS Backofen R IntaRNA efficient predictionof bacterial sRNA targets incorporating target site accessibil-ity and seed regions Bioinformatics 200824(24)2849ndash56
10Kato Y Sato K Hamada M et al RactIP fast and accurate pre-diction of RNA-RNA interaction using integer programmingBioinformatics 201026(18)i460ndash6
11Li J Ma W Zeng P et al LncTar a tool for predicting the RNAtargets of long noncoding RNAs Brief Bioinform 201516(5)806ndash12
12Fukunaga T Hamada M RIblast an ultrafast RNA-RNA inter-action prediction system based on a seed-and-extensionapproach Bioinformatics 201733(17)2666ndash74
13Derrien T Johnson R Bussotti G et al The GENCODE v7 cata-log of human long noncoding RNAs analysis of their genestructure evolution and expression Genome Res 201222(9)1775ndash89
14Gloss BS Dinger ME The specificity of long noncoding RNAexpression Biochim Biophys Acta 20161859(1)16ndash22
15Munshi A Mohan V Ahuja YR Non-coding RNAs a dynamicand complex network of gene regulation J PharmacogenomicsPharmacoproteomics 20167156
16Liao Q Liu C Yuan X et al Large-scale prediction of long non-coding RNA functions in a coding-non-coding gene co-expression network Nucleic Acids Res 201139(9)3864ndash78
17Guo Q Cheng Y Liang T et al Comprehensive analysis oflncRNA-mRNA co-expression patterns identifies immune-associated lncRNA biomarkers in ovarian cancer malignantprogression Sci Rep 20155(1)17683
18Du Y Xia W Zhang J et al Comprehensive analysis of longnoncoding RNA-mRNA co-expression patterns in thyroidcancer Mol Biosyst 201713(10)2107ndash15
19Wu W Wagner EK Hao Y et al Tissue-specific co-expressionof long non-coding and coding RNAs associated with breastCancer Sci Rep 2016632731
20Barabasi AL Oltvai ZN Network biology understanding thecellrsquos functional organization Nat Rev Genet 20045(2)101ndash13
21Langfelder P Horvath S WGCNA an R package for weightedcorrelation network analysis BMC Bioinformatics 20089559
22Maathuis HM Kalisch M Buhlmann P Estimating high-dimensional intervention effects from observational dataAnn Stat 200937(6A)3133ndash64
23Maathuis HM Colombo D Kalisch M et al Predicting causaleffects in large-scale systems from observational data NatMethods 20107(4)247ndash8
24Le T Hoang T Li J et al A fast PC algorithm for high dimen-sional causal discovery with multi-core PCs IEEEACM TransComput Biol Bioinform 2016 doi 101109TCBB20162591526
25Du Z Fei T Verhaak RG et al Integrative genomic analysesreveal clinically relevant long noncoding RNAs in humancancer Nat Struct Mol Biol 201320(7)908ndash13
26Bernhart SH Tafer H Muckstein U et al Partition functionand base pairing probabilities of RNA heterodimersAlgorithms Mol Biol 20061(1)3
27Alkan C Karakoc E Nadeau JH et al RNA-RNA interactionprediction and antisense RNA target search J Comput Biol200613(2)267ndash82
28Seemann SE Richter AS Gesell T et al PETcofold predictingconserved interactions and structures of two multiple align-ments of RNA sequences Bioinformatics 201127(2)211ndash19
29Wenzel A Akbasli E Gorodkin J RIsearch fast RNA-RNAinteraction search using a simplified nearest-neighborenergy model Bioinformatics 201228(21)2738ndash46
Module-specific lncRNA-mRNA causal regulatory networks | 15
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
30Alkan F Wenzel A Palasca O et al RIsearch2 suffix array-based large-scale prediction of RNA-RNA interactions andsiRNA off-targets Nucleic Acids Res 201745e60
31Hu R Sun X lncRNATargets a platform for lncRNA target pre-diction based on nucleic acid thermodynamics J BioinformComput Biol 201614(4)1650016
32Terai G Iwakiri J Kameda T et al Comprehensive predictionof lncRNA-RNA interactions in human transcriptome BMCGenomics 201617(Suppl 1)12
33Liu J Wu S Li M et al LncRNA expression profiles reveal theco-expression network in human colorectal carcinoma Int JClin Exp Pathol 201691885ndash1892
34Huang S Feng C Chen L et al Identification of potential keylong non-coding RNAs and target genes associated withpneumonia using long non-coding RNA sequencing (lncRNA-Seq) a preliminary study Med Sci Monit 2016223394ndash408
35Li J Xu Y Xu J et al Dynamic co-expression network analysisof lncRNAs and mRNAs associated with venous congestionMol Med Rep 201614(3)2045ndash51
36Fu M Huang G Zhang Z et al Expression profile of long non-coding RNAs in cartilage from knee osteoarthritis patientsOsteoarthritis Cartilage 201523(3)423ndash32
37Zhang F Gao C Ma XF et al Expression profile of long non-coding RNAs in peripheral blood mononuclear cells frommultiple sclerosis patients CNS Neurosci Ther 201622(4)298ndash305
38 Iwakiri J Terai G Hamada M Computational prediction oflncRNA-mRNA interactionsby integrating tissue specificity inhuman transcriptome Biol Direct 201712(1)15
39Lv L Wei M Lin P et al Integrated mRNA and lncRNA expres-sion profiling for exploring metastatic biomarkers of humanintrahepatic cholangiocarcinoma Am J Cancer Res 20177688ndash99
40Hao Y Wu W Li H et al NPInter v30 an upgraded databaseof noncoding RNA-associated interactions Database 20162016baw057
41Chen G Wang Z Wang D et al LncRNADisease a databasefor long-non-coding RNA-associated diseases Nucleic AcidsRes 201341D983ndash6
42 Jiang Q Wang J Wu X et al LncRNA2Target a database fordifferentially expressed genes after lncRNA knockdown oroverexpression Nucleic Acids Res 201543D193ndash6
43Zhou Z Shen Y Khan MR et al LncReg a reference resourcefor lncRNA-associated regulatory networks Database 20152015bav083
44Denisenko E Ho D Tamgue O et al IRNdb the database ofimmunologically relevant non-coding RNAs Database 20162016baw138
45Liu CJ Gao C Ma Z et al lncRInter a database of experimen-tally validated long non-coding RNA interaction J GenetGenomics 201744(5)265ndash8
46Li JH Liu S Zhou H et al starBase v20 decoding miRNA-ceRNA miRNA-ncRNA and protein-RNA interaction net-works from large-scale CLIP-Seq data Nucleic Acids Res 201442(D1)D92ndash7
47Liu Y Zhao M lnCaNet pan-cancer co-expression networkfor human lncRNA and cancer genes Bioinformatics 201632(10)1595ndash7
48Zhou QZ Zhang B Yu QY et al BmncRNAdb a comprehen-sive database of non-coding RNAs in the silkworm Bombyxmori BMC Bioinformatics 201617(1)370
49Park C Yu N Choi I et al lncRNAtor a comprehensiveresource for functional investigation of long non-codingRNAs Bioinformatics 201430(17)2480ndash5
50Bhartiya D Pal K Ghosh S et al lncRNome a comprehensiveknowledgebase of human long noncoding RNAs Database20132013bat034
51Zhao Z Bai J Wu A et al Co-LncRNA investigating thelncRNA combinatorial effects in GO annotations and KEGGpathways based on human RNA-Seq data Database 20152015bav082
52 Jiang Q Ma R Wang J et al LncRNA2Function a compre-hensive resource for functional investigation of humanlncRNAs based on RNA-seq data BMC Genomics 201516(Suppl 3)S2
53Chan WL Huang HD Chang JG lncRNAMap a map of puta-tive regulatory functions in the long non-coding transcrip-tome Comput Biol Chem 20145041ndash9
54Langfelder P Horvath S Fast R functions for robust correla-tions and hierarchical clustering J Stat Softw 2012461ndash17
55 Judea P Causality Models Reasoning and Inference New YorkNY Cambridge University Press 2000
56Spirtes P Glymour C Scheines R Causation Prediction andSearch 2nd edn Cambridge MIT Press 2000
57Le T Hoang T Li J et al ParallelPC an R package for efficientconstraint based causal exploration arXiv prepring 2015arXiv151003042v1
58Hahn MW Kern AD Comparative genomics of centrality andessentiality in three eukaryotic protein-interaction networksMol Biol Evol 200522(4)803ndash6
59Song J Singh M Roth FP From hub proteins to hub modulesthe relationship between essentiality and centrality in theyeast interactome at different scales of organization PLoSComput Biol 20139(2)e1002910
60Therneau TM Grambsch PM Modeling Survival Data Extendingthe Cox Model New York Springer Press 2000
61Yu G Wang L-G Han Y He Q-Y clusterProfiler an R packagefor comparing biological themes among gene clusters OMICS201216(5)284ndash7
62Ashburner M Ball CA Blake JA et al Gene ontology tool forthe unification of biology Nat Genet 200025(1)25ndash9
63Kanehisa M Goto S KEGG Kyoto Encyclopedia of Genes andGenomes Nucleic Acids Res 200028(1)27ndash30
64Ning S Zhang J Wang P et al Lnc2Cancer a manually curateddatabase of experimentally supported lncRNAs associatedwith various human cancers Nucleic Acids Res 201644(D1)D980ndash5
65Wang Y Chen L Chen B et al Mammalian ncRNA-diseaserepository a global view of ncRNA-mediated disease net-work Cell Death Dis 20134e765
66Pi~nero J Bravo A Queralt-Rosinach N et al DisGeNET a com-prehensive platform integrating information on humandisease-associated genes and variants Nucleic Acids Res 201745(D1)D833ndash9
67Conway JR Lex A Gehlenborg N UpSetR an R package for thevisualization of intersecting sets and their propertiesBioinformatics 201733(18)2938ndash40
68Wahlestedt C Targeting long non-coding RNA to therapeuti-cally upregulate gene expression Nat Rev Drug Discov 201312(6)433ndash46
69Mantovani G Maccio A Lai P et al Cytokine activity incancer-related anorexiacachexia role of megestrol acetateand medroxyprogesterone acetate Semin Oncol 19982545ndash52
70Dorsam RT Gutkind JS G-protein-coupled receptors and can-cer Nat Rev Cancer 20077(2)79ndash94
71Wang X Lin Y Tumor necrosis factor and cancer buddies orfoes Acta Pharmacol Sin 200829(11)1275ndash88
16 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
72Fajardo AM Piazza GA Tinsley HN The role of cyclic nucleo-tide signaling pathways in cancer targets for prevention andtreatment Cancers 20146(1)436ndash58
73Hanahan D Weinberg RA Hallmarks of cancer the next gen-eration Cell 2011144(5)646ndash74
74Zhang X Zhao XM He K et al Inferring gene regulatory net-works from gene expression data by path consistencyalgorithm based on conditional mutual informationBioinformatics 201228(1)98ndash104
75Zhao J Zhou Y Zhang X et al Part mutual information forquantifying direct associations in networks Proc Natl Acad SciUSA 2016113(18)5130ndash5
76Zhang X Zhao J Hao JK et al Conditional mutual inclusiveinformation enables accurate quantification of associationsin gene regulatory networks Nucleic Acids Re 201543(5)e31
77Le TD Zhang J Liu L et al Computational methods for identi-fying miRNA sponge interactions Brief Bioinform 201718(4)577ndash90
Module-specific lncRNA-mRNA causal regulatory networks | 17
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018View publication statsView publication stats
of Genes and Genomes (KEGG) [63] pathways with adjustedP-valuelt005 [adjusted by Benjamini-Hochberg (BH) method]are regarded as functional categories for the MSLCRN networks
We also collect a list of lncRNAs and mRNAs that areassociated with GBM LSCC OvCa and PrCa to study diseaseenrichment of each of the MSLCRN networks The list of disease-associated lncRNAs is obtained from LncRNADisease v2017 [41]Lnc2Cancer v2016 [64] and MNDR v20 [65] The list of disease-associated mRNAs is from DisGeNET v50 [66] To evaluatewhether a MSLCRN network is significantly enriched in a specificdisease we use a hyper-geometric distribution test as follows
p frac14 1 FethxjBNMTHORN frac14 1Xx1
ifrac140
N
i
B N
M i
B
M
(3)
In the formula B is the number of all genes in the expressiondata set N denotes the number of all genes associated with aspecific disease in the expression data set M is the number ofgenes in a MSLCRN network and x is the number of genes asso-ciated with a specific disease in a MSLCRN network A MSLCRNnetwork is significantly enriched in a specific disease if theP-valuelt 005
Network analysis validation and comparisonon MSLCRN networkslncRNAs exhibit dynamic positive gene regulationacross cancers
By following the first step of the MSLCRN method we haveidentified 23 38 45 and 32 lncRNAndashmRNA co-expression mod-ules in GBM LSCC OvCa and PrCa respectively In the secondstep of the MSLCRN method we eliminate the noncausallncRNAndashmRNA pairs in lncRNAndashmRNA co-expression modulesAs a result we generate 23 38 45 and 32 module-specificlncRNAndashmRNA causal regulatory networks in GBM LSCC OvCaand PrCa respectively After merging the module-specificlncRNAndashmRNA causal regulatory networks for each data set weobtain the four global lncRNAndashmRNA regulatory networks inGBM LSCC OvCa and PrCa respectively
To understand the overlap and difference of module-specificgenes module-specific lncRNAndashmRNA causal regulatory rela-tionships and module-specific hub lncRNAs in the four humancancers we generate three set intersection plots using theR-package UpSetR [67] As shown in Figure 3 we find that themajority of module-specific genes (5752) module-specificlncRNAndashmRNA causal regulatory relationships (9902) andmodule-specific hub lncRNAs (8922) tend to be cancer-specific Only a small portion of module-specific genes (396) andmodule-specific lncRNAndashmRNA causal regulatory relationships(6) are shared by the four cancers Especially none of themodule-specific hub lncRNAs are common between the fourcancers In addition the causal effects are positive for 99569672 9993 and 7863 of the causal regulatory relationshipsidentified in GBM LSCC OvCa and PrCa respectively Theseresults indicate that lncRNAs are more likely to exhibit dynamicpositive gene regulation across cancers The results are alsoconsistent with the proposition that the positive gene regula-tion by lncRNAs would be desired in specific situations [68]
Differential network analysis uncovers cancer-specificlncRNAndashmRNA causal networks
In this section we focus on studying cancer-specific lncRNAndashmRNA causal networks using differential network analysisThus the GBM-specific LSCC-specific OvCa-specific and PrCa-specific lncRNAndashmRNA causal networks are identified Asshown in Figure 4A the distributions of node degrees in thesefour cancer-specific lncRNAndashmRNA causal networks followpower law distributions well with R2frac14 09774 09923 09723and 08310 respectively Thus these four cancer-specificlncRNAndashmRNA causal networks are scale free indicating thatmost mRNAs are regulated by a small number of lncRNAs
Table 3 Degree distributions of global lncRNAndashmRNA causal regula-tory networks with different cutoffs in GBM LSCC OvCa and PrCa
Datasets Cutoffs Number of causalregulations
yfrac14axb R2
GBM 010 11 847 yfrac142274x06893 04161015 10 924 yfrac142495x07275 05460020 9732 yfrac142745x0767 06475025 8461 yfrac142958x08074 06757030 7176 yfrac143194x08319 06807035 6041 yfrac143363x08703 07203040 4997 yfrac143741x09348 07999045 4074 y54082x21034 08694050 3279 yfrac144194x118 09244055 2583 yfrac143896x1259 09463060 1862 yfrac143666x143 09792
LSCC 010 789 172 yfrac143143x06071 04829015 684 524 yfrac143475x06323 05841020 569 369 yfrac143905x06525 06578025 451 346 yfrac144855x06928 07789030 340 860 yfrac146341x07554 08796035 244 547 yfrac148147x08379 09504040 166 593 yfrac149724x0935 09848045 108 024 y51031x21018 09933050 66 335 yfrac149425x1068 09963055 37 632 yfrac147807x1089 09948060 19 547 yfrac146565x1169 09972
OvCa 010 333 146 yfrac143272x05928 05042015 232 794 yfrac144192x06262 06531020 159 872 yfrac146398x07216 08247025 112 792 yfrac148816x08356 09120030 80 808 yfrac141008x09472 09551035 57 099 yfrac149545x1014 09744040 38 517 yfrac148198x1066 09748045 24 439 y56575x21066 09697050 14 435 yfrac14540x1079 09551055 7973 yfrac144368x1107 09319060 4026 yfrac143285x1107 09460
PrCa 010 1 894 322 yfrac143089x06245 02750015 1 749 595 yfrac143586x06787 03582020 1 594 744 yfrac144013x07169 04316025 1 429 858 yfrac144271x0732 04919030 1 260 968 yfrac144389x07244 05616035 1 097 654 yfrac144406x0702 06470040 946 439 yfrac144485x06816 07338045 812 687 y55175x207005 08206050 694 558 yfrac14667x07588 08823055 584 834 yfrac148833x08469 09332060 474 654 yfrac141113x09684 09503
Note The AVCE cutoffs range from 010 to 060 with a step of 005
The bold values are the degree distributions of global lncRNA-mRNA causal reg-
ulatory networks with a compromised AVCE cutoff (045) in four human cancers
Module-specific lncRNA-mRNA causal regulatory networks | 9
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
Next we use four lists of lncRNAs and mRNAs associatedwith GBM LSCC OvCa and PrCa to discover lncRNAndashmRNAcausal networks that are associated with the four human can-cers We define that cancer-related lncRNAndashmRNA causal regu-latory relationships are those in which at least one regulatoryparty is cancer-related lncRNA or mRNA As a result wehave extracted GBM-related LSCC-related OvCa-related andPrCa-related lncRNAndashmRNA causal networks from the fourcancer-specific lncRNAndashmRNA causal networks (details inSupplementary File S1) To understand the potential biologicalprocesses and pathways of the four cancer-related lncRNAndashmRNA causal networks we identify significant GO biologicalprocesses and KEGG pathways using functional enrichmentanalysis In Figure 4B several top GO biological processes and
KEGG pathways such as cytokine activity [69] G-proteincoupled receptor binding [70] TNF signaling pathway [71] cAMPsignaling pathway [72] pathways in cancer are closely associ-ated with the occurrence and development of cancer Thisresult suggests that the identified cancer-related lncRNAndashmRNA causal networks may be involved in the occurrence anddevelopment of human cancer
Conservative network analysis highlights a corelncRNAndashmRNA causal regulatory network acrosshuman cancers
Although most of the lncRNAndashmRNA causal regulatory relation-ships are cancer-specific there are still a number of common
396 424
106 115
1370
133 173
1599
126
1729
1224
252
2523
2157
5081
0
2000
4000
Inte
rsec
tion
Siz
e
PrCa
Ov Ca
LSCC
GBM
025
0050
0075
00
1000
0
Set Size
6 283
13 1 80 673
206
3969
76 3493
407
2816
9950
7
1948
7
0
250000
500000
750000
Inte
rsec
tion
Siz
e
PrCa
Ov Ca
LSCC
GBM
0e+0
0
2e+0
5
4e+0
5
6e+0
5
8e+0
5
Set Size
6 1 2 933
2
41
6 11
246
47
524
0
200
400In
ters
ectio
n S
ize
PrCa
Ov Ca
LSCC
GBM
0200
400
600
Set Size
Module-specific genes
Module-specific causal regulations Module-specific hub lncRNAs 808611
A
B C
Figure 3 Overlap and difference of module-specific genes module-specific causal regulations and module-specific hub lncRNAs across GBM LSCC OvCa and PrCa
(A) Module-specific genes (both lncRNAs and mRNAs) intersection plot (B) Module-specific causal regulations intersection plot (C) Module-specific hub lncRNAs inter-
section plot The red lines denote common genes and causal regulations across GBM LSCC OvCa and PrCa
10 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
causal regulatory relationships between the four global net-works To evaluate whether there is a common core of lncRNAndashmRNA causal regulatory relationships in the global regulatorynetworks across human cancers we concentrate on the con-served lncRNAndashmRNA causal regulatory relationships thatexisted in at least three human cancers
As shown in Figure 5A the majority of the conservedlncRNAndashmRNA causal regulatory relationships form a closely
connected community This finding indicates that the con-served lncRNAndashmRNA causal regulatory network may be a corenetwork across human cancers
The survival analysis shows that the lncRNAs and mRNAs inthe core network can significantly distinguish the metastasisrisks between the high- and low-risk groups in GBM OvCa andPrCa data sets (Figure 5B) This result suggests that the core net-work may act as a common network biomarker of GBM OvCa
Cancer-specific networks Causal regulations y=axb R2
GBM-specific 2816 y=4812x-1292 09774
LSCC-specific 99507 y=1034x-1014 09923
OvCa-specific 19487 y=7366x-1156 09723
PrCa-specific 808611 y=5243x-07055 08310
1 10 100 11001
10
100
1100
Degree of genes
Nu
mb
er o
f ge
nes
GBM-specific fitting curveLSCC-specific fitting curveOvCa-specific fitting curvePrCa-specific fitting curveGBM-specific degree distributionLSCC-specific degree distributionOvCa-specific degree distributionPrCa-specific degree distribution
solutecation symporter activitysymporter activity
gated channel activitysodium ion transmembrane transporter activity
ion channel activitysubstrate-specific channel activity
channel activitypassive transmembrane transporter activity
growth factor activitycation channel activity
metal ion transmembrane transporter activitycollagen bindingheparin binding
sulfur compound bindingintegrin binding
glycosaminoglycan bindingextracellular matrix binding
peptide receptor activityG-protein coupled peptide receptor activity
chemokine bindingG-protein coupled receptor binding
growth factor bindingcytokine binding
serine-type endopeptidase activitydeath receptor activity
tumor necrosis factor-activated receptor activitycytokine receptor activity
protein heterodimerization activitydipeptidase activity
glycoprotein bindingRAGE receptor binding
cytokine receptor bindingcytokine activity
GBM(203)
LSCC(1015)
OvCa(479)
PrCa(3019)
001
002
003
004
padjust
GeneRatio002
004
006
008
Taste transduction
ECM-receptor interaction
cAMP signaling pathway
Calcium signaling pathway
Neuroactive ligand-receptor interaction
PI3K-Akt signaling pathway
Pathways in cancerRegulation of actin cytoskeleton
Complement and coagulation cascades
AGE-RAGE signaling pathway in diabetic complications
Hematopoietic cell lineage
Th17 cell differentiation
Inflammatory bowel disease (IBD)
Malaria
Osteoclast differentiation
Influenza A
Tuberculosis
Intestinal immune network for IgA production
Chagas disease (American trypanosomiasis)
Leishmaniasis
TNF signaling pathway
Toll-like receptor signaling pathway
Rheumatoid arthritis
Cytokine-cytokine receptor interaction
GBM(129)
LSCC(500)
OvCa(266)
PrCa(1364)
GeneRatio
005
010
015
001
002
003
004padjust
GO enrichment analysis KEGG enrichment analysis
A
B
Figure 4 Differential network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) Degree distribution of cancer-specific lncRNAndashmRNA
causal networks in GBM LSCC OvCa and PrCa (B) Functional enrichment analysis of cancer-related lncRNAndashmRNA causal networks in GBM LSCC OvCa and PrCa
Module-specific lncRNA-mRNA causal regulatory networks | 11
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
and PrCa In Figure 5B we also find that the core network con-tains several cancer genes (34 26 30 and 38 cancer genes asso-ciated with GBM LSCC OvCa and PrCa respectively)
By conducting GO and KEGG enrichment analysis we findthat the core network is significantly enriched in 399 GO biologi-cal processes and 3 KEGG pathways (details in SupplementaryFile S2) Of the 399 GO biological processes 2 GO terms includ-ing negative regulation of cell adhesion (GO 0007162) and cyto-kine production in immune response (GO 0002367) areinvolved in three cancer hallmarks Tissue Invasion andMetastasis Tumor Promoting Inflammation and EvadingImmune Detection [73] This observation implies that the corenetwork may control these cancer-related hallmarks
Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers
We divide the hub lncRNAs into two categories (1) conserved hublncRNAs which exist in at least three human cancers and (2)cancer-specific hub lncRNAs which only exist in single humancancer As a result we obtain 9 conserved hub lncRNAs and 828cancer-specific hub lncRNAs (include 11 GBM-specific 246 LSCC-specific 47 OvCa-specific and 524 PrCa-specific hub lncRNAs)
To evaluate whether the hub lncRNAs can distinguish meta-stasis risks of human cancers we use them to predict metasta-sis risks for tumor samples in GBM LSCC OvCa and PrCaAs shown in Figure 6A the conserved hub lncRNAs can discrim-inate the metastasis risks of tumor samples significantly(Log-rank P-valuelt 005) in four human cancers In Figure 6Bexcepting LSCC-specific hub lncRNAs owing to failing to fit aCox regression model GBM-specific OvCa-specific and PrCa-specific hub lncRNAs can discriminate the metastasis risks oftumor samples significantly in GBM OvCa and PrCa respec-tively (Log-rank P-valuelt 005) These results suggest that thehub lncRNAs are discriminative and can act as biomarkers todistinguish between high- and low-risk tumor samples
Experimentally validated lncRNAndashmRNA regulations aremostly bad hits for LncTar
Using a collection of experimentally validated lncRNAndashmRNAregulatory relationships (details in Supplementary File S3) asthe ground truth the numbers of experimentally confirmedlncRNAndashmRNA causal regulations are 17 14 20 and 42 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4)
Figure 5 Conservative network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) The core lncRNAndashmRNA causal network that
occurred in at least three human cancers The red diamond nodes and white circle nodes denote lncRNAs and mRNAs respectively (B) Survival analysis of the core
lncRNAndashmRNA causal network
12 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
We further apply a representative sequence-based methodcalled LncTar [11] to the experimentally validated lncRNAndashmRNAcausal regulatory relationships discovered by MSLCRN There aretwo main reasons for choosing LncTar First LncTar does nothave a limit to input RNA size Second LncTar uses a quantitativestandard rather than expert knowledge to determine whetherlncRNAs interact with mRNAs Similar to LncTar we also set -01as normalized binding free energy (ndG) cutoff to determinewhether lncRNAndashmRNA pairs interact with each other In otherwords the lncRNAndashmRNA pairs with ndG01 are regarded as
lncRNAndashmRNA regulatory relationships Among the experimen-tally confirmed lncRNAndashmRNA causal regulatory relationships thatare discovered by MSLCRN the numbers of successfully predictedlncRNAndashmRNA regulations using LncTar are 0 0 1 and 1 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4) The result indicates that our experimentally confirmedlncRNAndashmRNA causal regulations are mostly bad hits for LncTarMeanwhile this result also suggests that expression-based andsequence-based methods may be complementary with each otherin predicting lncRNAndashmRNA regulations
A
B
Figure 6 Survival analysis of hub lncRNAs (A) Conserved hub lncRNAs in GBM LSCC OvCa and PrCa datasets (B) Survival analysis of cancer-specific hub lncRNAs
Module-specific lncRNA-mRNA causal regulatory networks | 13
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
MSLCRN networks are biologically meaningful
In this section we conduct GO and KEGG enrichment analysisto check whether the MSLCRN networks are associated withsome biological processes and pathways significantlyEnrichment analysis uncovers that 15 of the 23 (6522)MSLCRN networks in GBM 29 of the 38 (7632) MSLCRN net-works in LSCC 30 of the 45 (6667) MSLCRN networks inOvCa and 20 of the 32 (6250) MSLCRN networks in PrCa aresignificantly enriched in at least one GO biological process orKEGG pathway respectively (details in Supplementary File S5)This result implies that most of the MSLCRN networks in eachcancer are functional networks
We further investigate whether the MSLCRN networks aresignificantly enriched in GBM LSCC OvCa and PrCa diseasesrespectively We discover that 5 of the 23 MSLCRN networks7 of the 38 MSLCRN networks 6 of the 45 MSLCRN networks and6 of the 32 MSLCRN networks are significantly enriched in GBMLSCC OvCa and PrCa diseases respectively (details inSupplementary File S5) This result indicates that severalMSLCRN networks are closely associated with GBM LSCC OvCaand PrCa diseases
Altogether functional and disease enrichment analysis resultsshow that MSLCRN networks are biologically meaningful
Comparison with other PC-based networkinference methods
Based on a parallel version of the PC algorithm [56] the parallelIDA method in the second step of MSLCRN learns the causalstructure from expression data Owing to the popularity of thePC algorithm in causal structure learning some other networkinference methods including PCA-CMI [74] PCA-PMI [75] andCMI2NI [76] have also successfully applied it for network infer-ence Different from the three methods using conditional orpartial mutual information to infer lncRNAndashmRNA regulationsour method estimates causal effects to identify lncRNAndashmRNAregulations For comparisons we also use the PCA-CMI PCA-PMI and CMI2NI methods to infer module-specific lncRNAndashmRNA regulatory relationships Similar to our method (whichuses the parallel IDA method) the strength cutoff of lncRNAndashmRNA regulatory relationships in PCA-CMI PCA-PMI andCMI2NI methods is also set to 045
We evaluate the performance of each method in terms offinding experimentally validated lncRNAndashmRNA regulatoryrelationships functional MSLCRN networks and disease-associated MSLCRN networks As shown in Table 4 in terms ofthe three criteria MSLCRN performs the best in GBM LSCCOvCa and PrCa data sets This result suggests that MSLCRN is auseful method to infer module-specific lncRNAndashmRNA regula-tory network in human cancers
Conclusions and discussion
Notwithstanding lncRNAs do not encode proteins directly theyengage in a wide range of biological processes including cancerdevelopments through their interactions with other biologicalmacromolecules eg DNA RNA and protein Therefore touncover the functions and regulatory mechanisms of lncRNAsit is necessary to investigate lncRNAndashtarget regulatory networkacross different types of biological conditions
As a biological network the lncRNAndashtarget regulatory net-work exhibits a high degree of modularity Each functionalmodule is responsible for implementing specific biological
functions Moreover modularity is an important feature ofhuman cancer development and progression Thus from a net-work community point of view it is necessary to investigatemodule-specific lncRNAndashmRNA regulatory networks
Until now several statistical correlation or associationmeasures eg Pearson Mutual Information and ConditionalMutual Information have been used to infer gene regulatorynetworks However these methods tend to identify indirect reg-ulatory relationships between genes The identified gene regu-latory networks cannot reflect real lsquocausalrsquo regulatoryrelationships To better understand lncRNA regulatory mecha-nism it is vital to investigate how lncRNAs causally influencethe expression levels of their target mRNAs
In this work the computational methods for inferringlncRNAndashmRNA interactions and the publicly available data-bases of lncRNAndashmRNA regulatory relationships are firstreviewed Then to address the above two issues we propose anovel computational method MSLCRN to study module-specific lncRNAndashmRNA causal regulatory networks across GBMLSCC OvCa and PrCa diseases In contrast to other approaches(expression-based and sequence-based methods) MSLCRN hastwo unique features First MSLCRN considers the modularity oflncRNAndashmRNA regulatory networks Instead of studying globalregulatory relationships between lncRNAs and mRNAs wefocus on investigating the regulatory behavior of lncRNAs in themodules of interest Second considering the restrictions withconducting gene knockout experiments MSLCRN uses thecausal inference method IDA to infer causal relationshipsbetween lncRNAs and mRNAs based on expression data Thepromising results suggest that exploiting modularity of generegulatory network and causality-based method could provideanother effective approach to elucidating lncRNA functions andregulatory mechanisms of human cancers
Despite the advantages of MSLCRN there is still room toimprove it First the WGCNA method only allows clusteringgenes across all samples from the matched lncRNA and mRNAexpression data In fact a class of genes may exhibit similarexpression patterns across a subset of samples An alternativesolution of this problem is to use a bi-clustering method to iden-tify lncRNAndashmRNA co-expression modules Second it is stilltime-consuming to estimate causal effects from large expres-sion data sets When constructing the module-specific lncRNAndashmRNA causal regulatory networks the running time of parallelIDA is still high on estimating the causal effects of lncRNAs onmRNAs In future more efficient parallel IDA method is neededto explore lncRNAndashmRNA causal regulatory relationships inlarge-scale expression data Third previous research [38]has shown that the prediction accuracy of lncRNAndashmRNA inter-actions can be improved by integrating both sequence data and
Table 4 Comparison results in terms of experimentally validatedlncRNAndashmRNA regulatory relationships functional MSLCRN net-works and disease-associated MSLCRN networks
Methods GBM (a b c) LSCC (a b c) OvCa (a b c) PrCa (a b c)
MSLCRN (17 15 5) (14 29 7) (20 30 6) (42 20 6)PCA-CMI (2 13 0) (0 11 0) (0 7 1) (0 20 2)PCA-PMI (2 15 1) (0 11 0) (0 8 2) (1 18 1)CMI2NI (2 15 0) (0 11 0) (0 7 1) (0 19 1)
Note afrac14number of experimentally validated lncRNAndashmRNA regulatory relation-
ships bfrac14number of functional MSLCRN networks cfrac14number of disease-asso-
ciated MSLCRN networks
14 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
expression data To improve the accuracy of the predictedlncRNAndashmRNA regulatory relationships it is necessary todevelop an ensemble method (fusing sequence-based andexpression-based methods) to infer lncRNAndashmRNA regulatorynetwork Finally recent studies [77] show that lncRNAs can actas competing endogenous RNAs (ceRNAs) or miRNA sponges toattract miRNAs for bindings by competing with mRNAsTherefore some predicted lncRNAndashmRNA regulatory relation-ships are lncRNA-related ceRNAndashceRNA interactions To furtherimprove the prediction of lncRNAndashmRNA regulatory relation-ships it is necessary to remove the crosstalk relationshipsbetween lncRNAs and mRNAs
Key Points
bull Among ncRNAs lncRNAs are a large and diverse classof RNA molecules and are thought to be a gold mine ofpotential oncogenes anti-oncogenes and newbiomarkers
bull lncRNAs exhibit dynamic positive gene regulationacross human cancers
bull Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers
bull There is still a lack of ground truth for validating pre-dicted lncRNAndashmRNA regulatory relationships
bull There is still room to develop reliable methods for elu-cidating lncRNA regulatory mechanisms
Supplementary Data
Supplementary data are available online at httpsacademicoupcombib
Funding
The National Natural Science Foundation of China (No61702069) the Applied Basic Research Foundation ofScience and Technology of Yunnan Province (No2017FB099) the NHMRC Grant (No 1123042) and theAustralian Research Council Discovery Grant (NoDP140103617)
References1 Pang KC Frith MC Mattick JS Rapid evolution of noncoding
RNAs lack of conservation does not mean lack of functionTrends Genet 200622(1)1ndash5
2 Kung JT Colognori D Lee JT Long noncoding RNAs pastpresent and future Genetics 2013193(3)651ndash69
3 Schmitt AM Chang HY Long noncoding RNAs in cancer path-ways Cancer Cell 201629(4)452ndash63
4 Zhang Y Tao Y Liao Q Long noncoding RNA a crosslink inbiological regulatory network Brief Bioinform 2017 doi 101093bibbbx042
5 Yoon JH Abdelmohsen K Gorospe M Posttranscriptionalgene regulation by long noncoding RNA J Mol Biol 2013425(19)3723ndash30
6 Gerlach W Giegerich R GUUGle a utility for fast exact match-ing under RNA complementary rules including G-U base pair-ing Bioinformatics 200622(6)762ndash4
7 Muckstein U Tafer H Hackermuller J et al Thermodynamicsof RNA-RNA binding Bioinformatics 200622(10)1177ndash82
8 Tafer H Hofacker IL RNAplex a fast tool for RNA-RNA inter-action search Bioinformatics 200824(22)2657ndash63
9 Busch A Richter AS Backofen R IntaRNA efficient predictionof bacterial sRNA targets incorporating target site accessibil-ity and seed regions Bioinformatics 200824(24)2849ndash56
10Kato Y Sato K Hamada M et al RactIP fast and accurate pre-diction of RNA-RNA interaction using integer programmingBioinformatics 201026(18)i460ndash6
11Li J Ma W Zeng P et al LncTar a tool for predicting the RNAtargets of long noncoding RNAs Brief Bioinform 201516(5)806ndash12
12Fukunaga T Hamada M RIblast an ultrafast RNA-RNA inter-action prediction system based on a seed-and-extensionapproach Bioinformatics 201733(17)2666ndash74
13Derrien T Johnson R Bussotti G et al The GENCODE v7 cata-log of human long noncoding RNAs analysis of their genestructure evolution and expression Genome Res 201222(9)1775ndash89
14Gloss BS Dinger ME The specificity of long noncoding RNAexpression Biochim Biophys Acta 20161859(1)16ndash22
15Munshi A Mohan V Ahuja YR Non-coding RNAs a dynamicand complex network of gene regulation J PharmacogenomicsPharmacoproteomics 20167156
16Liao Q Liu C Yuan X et al Large-scale prediction of long non-coding RNA functions in a coding-non-coding gene co-expression network Nucleic Acids Res 201139(9)3864ndash78
17Guo Q Cheng Y Liang T et al Comprehensive analysis oflncRNA-mRNA co-expression patterns identifies immune-associated lncRNA biomarkers in ovarian cancer malignantprogression Sci Rep 20155(1)17683
18Du Y Xia W Zhang J et al Comprehensive analysis of longnoncoding RNA-mRNA co-expression patterns in thyroidcancer Mol Biosyst 201713(10)2107ndash15
19Wu W Wagner EK Hao Y et al Tissue-specific co-expressionof long non-coding and coding RNAs associated with breastCancer Sci Rep 2016632731
20Barabasi AL Oltvai ZN Network biology understanding thecellrsquos functional organization Nat Rev Genet 20045(2)101ndash13
21Langfelder P Horvath S WGCNA an R package for weightedcorrelation network analysis BMC Bioinformatics 20089559
22Maathuis HM Kalisch M Buhlmann P Estimating high-dimensional intervention effects from observational dataAnn Stat 200937(6A)3133ndash64
23Maathuis HM Colombo D Kalisch M et al Predicting causaleffects in large-scale systems from observational data NatMethods 20107(4)247ndash8
24Le T Hoang T Li J et al A fast PC algorithm for high dimen-sional causal discovery with multi-core PCs IEEEACM TransComput Biol Bioinform 2016 doi 101109TCBB20162591526
25Du Z Fei T Verhaak RG et al Integrative genomic analysesreveal clinically relevant long noncoding RNAs in humancancer Nat Struct Mol Biol 201320(7)908ndash13
26Bernhart SH Tafer H Muckstein U et al Partition functionand base pairing probabilities of RNA heterodimersAlgorithms Mol Biol 20061(1)3
27Alkan C Karakoc E Nadeau JH et al RNA-RNA interactionprediction and antisense RNA target search J Comput Biol200613(2)267ndash82
28Seemann SE Richter AS Gesell T et al PETcofold predictingconserved interactions and structures of two multiple align-ments of RNA sequences Bioinformatics 201127(2)211ndash19
29Wenzel A Akbasli E Gorodkin J RIsearch fast RNA-RNAinteraction search using a simplified nearest-neighborenergy model Bioinformatics 201228(21)2738ndash46
Module-specific lncRNA-mRNA causal regulatory networks | 15
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
30Alkan F Wenzel A Palasca O et al RIsearch2 suffix array-based large-scale prediction of RNA-RNA interactions andsiRNA off-targets Nucleic Acids Res 201745e60
31Hu R Sun X lncRNATargets a platform for lncRNA target pre-diction based on nucleic acid thermodynamics J BioinformComput Biol 201614(4)1650016
32Terai G Iwakiri J Kameda T et al Comprehensive predictionof lncRNA-RNA interactions in human transcriptome BMCGenomics 201617(Suppl 1)12
33Liu J Wu S Li M et al LncRNA expression profiles reveal theco-expression network in human colorectal carcinoma Int JClin Exp Pathol 201691885ndash1892
34Huang S Feng C Chen L et al Identification of potential keylong non-coding RNAs and target genes associated withpneumonia using long non-coding RNA sequencing (lncRNA-Seq) a preliminary study Med Sci Monit 2016223394ndash408
35Li J Xu Y Xu J et al Dynamic co-expression network analysisof lncRNAs and mRNAs associated with venous congestionMol Med Rep 201614(3)2045ndash51
36Fu M Huang G Zhang Z et al Expression profile of long non-coding RNAs in cartilage from knee osteoarthritis patientsOsteoarthritis Cartilage 201523(3)423ndash32
37Zhang F Gao C Ma XF et al Expression profile of long non-coding RNAs in peripheral blood mononuclear cells frommultiple sclerosis patients CNS Neurosci Ther 201622(4)298ndash305
38 Iwakiri J Terai G Hamada M Computational prediction oflncRNA-mRNA interactionsby integrating tissue specificity inhuman transcriptome Biol Direct 201712(1)15
39Lv L Wei M Lin P et al Integrated mRNA and lncRNA expres-sion profiling for exploring metastatic biomarkers of humanintrahepatic cholangiocarcinoma Am J Cancer Res 20177688ndash99
40Hao Y Wu W Li H et al NPInter v30 an upgraded databaseof noncoding RNA-associated interactions Database 20162016baw057
41Chen G Wang Z Wang D et al LncRNADisease a databasefor long-non-coding RNA-associated diseases Nucleic AcidsRes 201341D983ndash6
42 Jiang Q Wang J Wu X et al LncRNA2Target a database fordifferentially expressed genes after lncRNA knockdown oroverexpression Nucleic Acids Res 201543D193ndash6
43Zhou Z Shen Y Khan MR et al LncReg a reference resourcefor lncRNA-associated regulatory networks Database 20152015bav083
44Denisenko E Ho D Tamgue O et al IRNdb the database ofimmunologically relevant non-coding RNAs Database 20162016baw138
45Liu CJ Gao C Ma Z et al lncRInter a database of experimen-tally validated long non-coding RNA interaction J GenetGenomics 201744(5)265ndash8
46Li JH Liu S Zhou H et al starBase v20 decoding miRNA-ceRNA miRNA-ncRNA and protein-RNA interaction net-works from large-scale CLIP-Seq data Nucleic Acids Res 201442(D1)D92ndash7
47Liu Y Zhao M lnCaNet pan-cancer co-expression networkfor human lncRNA and cancer genes Bioinformatics 201632(10)1595ndash7
48Zhou QZ Zhang B Yu QY et al BmncRNAdb a comprehen-sive database of non-coding RNAs in the silkworm Bombyxmori BMC Bioinformatics 201617(1)370
49Park C Yu N Choi I et al lncRNAtor a comprehensiveresource for functional investigation of long non-codingRNAs Bioinformatics 201430(17)2480ndash5
50Bhartiya D Pal K Ghosh S et al lncRNome a comprehensiveknowledgebase of human long noncoding RNAs Database20132013bat034
51Zhao Z Bai J Wu A et al Co-LncRNA investigating thelncRNA combinatorial effects in GO annotations and KEGGpathways based on human RNA-Seq data Database 20152015bav082
52 Jiang Q Ma R Wang J et al LncRNA2Function a compre-hensive resource for functional investigation of humanlncRNAs based on RNA-seq data BMC Genomics 201516(Suppl 3)S2
53Chan WL Huang HD Chang JG lncRNAMap a map of puta-tive regulatory functions in the long non-coding transcrip-tome Comput Biol Chem 20145041ndash9
54Langfelder P Horvath S Fast R functions for robust correla-tions and hierarchical clustering J Stat Softw 2012461ndash17
55 Judea P Causality Models Reasoning and Inference New YorkNY Cambridge University Press 2000
56Spirtes P Glymour C Scheines R Causation Prediction andSearch 2nd edn Cambridge MIT Press 2000
57Le T Hoang T Li J et al ParallelPC an R package for efficientconstraint based causal exploration arXiv prepring 2015arXiv151003042v1
58Hahn MW Kern AD Comparative genomics of centrality andessentiality in three eukaryotic protein-interaction networksMol Biol Evol 200522(4)803ndash6
59Song J Singh M Roth FP From hub proteins to hub modulesthe relationship between essentiality and centrality in theyeast interactome at different scales of organization PLoSComput Biol 20139(2)e1002910
60Therneau TM Grambsch PM Modeling Survival Data Extendingthe Cox Model New York Springer Press 2000
61Yu G Wang L-G Han Y He Q-Y clusterProfiler an R packagefor comparing biological themes among gene clusters OMICS201216(5)284ndash7
62Ashburner M Ball CA Blake JA et al Gene ontology tool forthe unification of biology Nat Genet 200025(1)25ndash9
63Kanehisa M Goto S KEGG Kyoto Encyclopedia of Genes andGenomes Nucleic Acids Res 200028(1)27ndash30
64Ning S Zhang J Wang P et al Lnc2Cancer a manually curateddatabase of experimentally supported lncRNAs associatedwith various human cancers Nucleic Acids Res 201644(D1)D980ndash5
65Wang Y Chen L Chen B et al Mammalian ncRNA-diseaserepository a global view of ncRNA-mediated disease net-work Cell Death Dis 20134e765
66Pi~nero J Bravo A Queralt-Rosinach N et al DisGeNET a com-prehensive platform integrating information on humandisease-associated genes and variants Nucleic Acids Res 201745(D1)D833ndash9
67Conway JR Lex A Gehlenborg N UpSetR an R package for thevisualization of intersecting sets and their propertiesBioinformatics 201733(18)2938ndash40
68Wahlestedt C Targeting long non-coding RNA to therapeuti-cally upregulate gene expression Nat Rev Drug Discov 201312(6)433ndash46
69Mantovani G Maccio A Lai P et al Cytokine activity incancer-related anorexiacachexia role of megestrol acetateand medroxyprogesterone acetate Semin Oncol 19982545ndash52
70Dorsam RT Gutkind JS G-protein-coupled receptors and can-cer Nat Rev Cancer 20077(2)79ndash94
71Wang X Lin Y Tumor necrosis factor and cancer buddies orfoes Acta Pharmacol Sin 200829(11)1275ndash88
16 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
72Fajardo AM Piazza GA Tinsley HN The role of cyclic nucleo-tide signaling pathways in cancer targets for prevention andtreatment Cancers 20146(1)436ndash58
73Hanahan D Weinberg RA Hallmarks of cancer the next gen-eration Cell 2011144(5)646ndash74
74Zhang X Zhao XM He K et al Inferring gene regulatory net-works from gene expression data by path consistencyalgorithm based on conditional mutual informationBioinformatics 201228(1)98ndash104
75Zhao J Zhou Y Zhang X et al Part mutual information forquantifying direct associations in networks Proc Natl Acad SciUSA 2016113(18)5130ndash5
76Zhang X Zhao J Hao JK et al Conditional mutual inclusiveinformation enables accurate quantification of associationsin gene regulatory networks Nucleic Acids Re 201543(5)e31
77Le TD Zhang J Liu L et al Computational methods for identi-fying miRNA sponge interactions Brief Bioinform 201718(4)577ndash90
Module-specific lncRNA-mRNA causal regulatory networks | 17
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018View publication statsView publication stats
Next we use four lists of lncRNAs and mRNAs associatedwith GBM LSCC OvCa and PrCa to discover lncRNAndashmRNAcausal networks that are associated with the four human can-cers We define that cancer-related lncRNAndashmRNA causal regu-latory relationships are those in which at least one regulatoryparty is cancer-related lncRNA or mRNA As a result wehave extracted GBM-related LSCC-related OvCa-related andPrCa-related lncRNAndashmRNA causal networks from the fourcancer-specific lncRNAndashmRNA causal networks (details inSupplementary File S1) To understand the potential biologicalprocesses and pathways of the four cancer-related lncRNAndashmRNA causal networks we identify significant GO biologicalprocesses and KEGG pathways using functional enrichmentanalysis In Figure 4B several top GO biological processes and
KEGG pathways such as cytokine activity [69] G-proteincoupled receptor binding [70] TNF signaling pathway [71] cAMPsignaling pathway [72] pathways in cancer are closely associ-ated with the occurrence and development of cancer Thisresult suggests that the identified cancer-related lncRNAndashmRNA causal networks may be involved in the occurrence anddevelopment of human cancer
Conservative network analysis highlights a corelncRNAndashmRNA causal regulatory network acrosshuman cancers
Although most of the lncRNAndashmRNA causal regulatory relation-ships are cancer-specific there are still a number of common
396 424
106 115
1370
133 173
1599
126
1729
1224
252
2523
2157
5081
0
2000
4000
Inte
rsec
tion
Siz
e
PrCa
Ov Ca
LSCC
GBM
025
0050
0075
00
1000
0
Set Size
6 283
13 1 80 673
206
3969
76 3493
407
2816
9950
7
1948
7
0
250000
500000
750000
Inte
rsec
tion
Siz
e
PrCa
Ov Ca
LSCC
GBM
0e+0
0
2e+0
5
4e+0
5
6e+0
5
8e+0
5
Set Size
6 1 2 933
2
41
6 11
246
47
524
0
200
400In
ters
ectio
n S
ize
PrCa
Ov Ca
LSCC
GBM
0200
400
600
Set Size
Module-specific genes
Module-specific causal regulations Module-specific hub lncRNAs 808611
A
B C
Figure 3 Overlap and difference of module-specific genes module-specific causal regulations and module-specific hub lncRNAs across GBM LSCC OvCa and PrCa
(A) Module-specific genes (both lncRNAs and mRNAs) intersection plot (B) Module-specific causal regulations intersection plot (C) Module-specific hub lncRNAs inter-
section plot The red lines denote common genes and causal regulations across GBM LSCC OvCa and PrCa
10 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
causal regulatory relationships between the four global net-works To evaluate whether there is a common core of lncRNAndashmRNA causal regulatory relationships in the global regulatorynetworks across human cancers we concentrate on the con-served lncRNAndashmRNA causal regulatory relationships thatexisted in at least three human cancers
As shown in Figure 5A the majority of the conservedlncRNAndashmRNA causal regulatory relationships form a closely
connected community This finding indicates that the con-served lncRNAndashmRNA causal regulatory network may be a corenetwork across human cancers
The survival analysis shows that the lncRNAs and mRNAs inthe core network can significantly distinguish the metastasisrisks between the high- and low-risk groups in GBM OvCa andPrCa data sets (Figure 5B) This result suggests that the core net-work may act as a common network biomarker of GBM OvCa
Cancer-specific networks Causal regulations y=axb R2
GBM-specific 2816 y=4812x-1292 09774
LSCC-specific 99507 y=1034x-1014 09923
OvCa-specific 19487 y=7366x-1156 09723
PrCa-specific 808611 y=5243x-07055 08310
1 10 100 11001
10
100
1100
Degree of genes
Nu
mb
er o
f ge
nes
GBM-specific fitting curveLSCC-specific fitting curveOvCa-specific fitting curvePrCa-specific fitting curveGBM-specific degree distributionLSCC-specific degree distributionOvCa-specific degree distributionPrCa-specific degree distribution
solutecation symporter activitysymporter activity
gated channel activitysodium ion transmembrane transporter activity
ion channel activitysubstrate-specific channel activity
channel activitypassive transmembrane transporter activity
growth factor activitycation channel activity
metal ion transmembrane transporter activitycollagen bindingheparin binding
sulfur compound bindingintegrin binding
glycosaminoglycan bindingextracellular matrix binding
peptide receptor activityG-protein coupled peptide receptor activity
chemokine bindingG-protein coupled receptor binding
growth factor bindingcytokine binding
serine-type endopeptidase activitydeath receptor activity
tumor necrosis factor-activated receptor activitycytokine receptor activity
protein heterodimerization activitydipeptidase activity
glycoprotein bindingRAGE receptor binding
cytokine receptor bindingcytokine activity
GBM(203)
LSCC(1015)
OvCa(479)
PrCa(3019)
001
002
003
004
padjust
GeneRatio002
004
006
008
Taste transduction
ECM-receptor interaction
cAMP signaling pathway
Calcium signaling pathway
Neuroactive ligand-receptor interaction
PI3K-Akt signaling pathway
Pathways in cancerRegulation of actin cytoskeleton
Complement and coagulation cascades
AGE-RAGE signaling pathway in diabetic complications
Hematopoietic cell lineage
Th17 cell differentiation
Inflammatory bowel disease (IBD)
Malaria
Osteoclast differentiation
Influenza A
Tuberculosis
Intestinal immune network for IgA production
Chagas disease (American trypanosomiasis)
Leishmaniasis
TNF signaling pathway
Toll-like receptor signaling pathway
Rheumatoid arthritis
Cytokine-cytokine receptor interaction
GBM(129)
LSCC(500)
OvCa(266)
PrCa(1364)
GeneRatio
005
010
015
001
002
003
004padjust
GO enrichment analysis KEGG enrichment analysis
A
B
Figure 4 Differential network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) Degree distribution of cancer-specific lncRNAndashmRNA
causal networks in GBM LSCC OvCa and PrCa (B) Functional enrichment analysis of cancer-related lncRNAndashmRNA causal networks in GBM LSCC OvCa and PrCa
Module-specific lncRNA-mRNA causal regulatory networks | 11
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
and PrCa In Figure 5B we also find that the core network con-tains several cancer genes (34 26 30 and 38 cancer genes asso-ciated with GBM LSCC OvCa and PrCa respectively)
By conducting GO and KEGG enrichment analysis we findthat the core network is significantly enriched in 399 GO biologi-cal processes and 3 KEGG pathways (details in SupplementaryFile S2) Of the 399 GO biological processes 2 GO terms includ-ing negative regulation of cell adhesion (GO 0007162) and cyto-kine production in immune response (GO 0002367) areinvolved in three cancer hallmarks Tissue Invasion andMetastasis Tumor Promoting Inflammation and EvadingImmune Detection [73] This observation implies that the corenetwork may control these cancer-related hallmarks
Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers
We divide the hub lncRNAs into two categories (1) conserved hublncRNAs which exist in at least three human cancers and (2)cancer-specific hub lncRNAs which only exist in single humancancer As a result we obtain 9 conserved hub lncRNAs and 828cancer-specific hub lncRNAs (include 11 GBM-specific 246 LSCC-specific 47 OvCa-specific and 524 PrCa-specific hub lncRNAs)
To evaluate whether the hub lncRNAs can distinguish meta-stasis risks of human cancers we use them to predict metasta-sis risks for tumor samples in GBM LSCC OvCa and PrCaAs shown in Figure 6A the conserved hub lncRNAs can discrim-inate the metastasis risks of tumor samples significantly(Log-rank P-valuelt 005) in four human cancers In Figure 6Bexcepting LSCC-specific hub lncRNAs owing to failing to fit aCox regression model GBM-specific OvCa-specific and PrCa-specific hub lncRNAs can discriminate the metastasis risks oftumor samples significantly in GBM OvCa and PrCa respec-tively (Log-rank P-valuelt 005) These results suggest that thehub lncRNAs are discriminative and can act as biomarkers todistinguish between high- and low-risk tumor samples
Experimentally validated lncRNAndashmRNA regulations aremostly bad hits for LncTar
Using a collection of experimentally validated lncRNAndashmRNAregulatory relationships (details in Supplementary File S3) asthe ground truth the numbers of experimentally confirmedlncRNAndashmRNA causal regulations are 17 14 20 and 42 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4)
Figure 5 Conservative network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) The core lncRNAndashmRNA causal network that
occurred in at least three human cancers The red diamond nodes and white circle nodes denote lncRNAs and mRNAs respectively (B) Survival analysis of the core
lncRNAndashmRNA causal network
12 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
We further apply a representative sequence-based methodcalled LncTar [11] to the experimentally validated lncRNAndashmRNAcausal regulatory relationships discovered by MSLCRN There aretwo main reasons for choosing LncTar First LncTar does nothave a limit to input RNA size Second LncTar uses a quantitativestandard rather than expert knowledge to determine whetherlncRNAs interact with mRNAs Similar to LncTar we also set -01as normalized binding free energy (ndG) cutoff to determinewhether lncRNAndashmRNA pairs interact with each other In otherwords the lncRNAndashmRNA pairs with ndG01 are regarded as
lncRNAndashmRNA regulatory relationships Among the experimen-tally confirmed lncRNAndashmRNA causal regulatory relationships thatare discovered by MSLCRN the numbers of successfully predictedlncRNAndashmRNA regulations using LncTar are 0 0 1 and 1 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4) The result indicates that our experimentally confirmedlncRNAndashmRNA causal regulations are mostly bad hits for LncTarMeanwhile this result also suggests that expression-based andsequence-based methods may be complementary with each otherin predicting lncRNAndashmRNA regulations
A
B
Figure 6 Survival analysis of hub lncRNAs (A) Conserved hub lncRNAs in GBM LSCC OvCa and PrCa datasets (B) Survival analysis of cancer-specific hub lncRNAs
Module-specific lncRNA-mRNA causal regulatory networks | 13
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
MSLCRN networks are biologically meaningful
In this section we conduct GO and KEGG enrichment analysisto check whether the MSLCRN networks are associated withsome biological processes and pathways significantlyEnrichment analysis uncovers that 15 of the 23 (6522)MSLCRN networks in GBM 29 of the 38 (7632) MSLCRN net-works in LSCC 30 of the 45 (6667) MSLCRN networks inOvCa and 20 of the 32 (6250) MSLCRN networks in PrCa aresignificantly enriched in at least one GO biological process orKEGG pathway respectively (details in Supplementary File S5)This result implies that most of the MSLCRN networks in eachcancer are functional networks
We further investigate whether the MSLCRN networks aresignificantly enriched in GBM LSCC OvCa and PrCa diseasesrespectively We discover that 5 of the 23 MSLCRN networks7 of the 38 MSLCRN networks 6 of the 45 MSLCRN networks and6 of the 32 MSLCRN networks are significantly enriched in GBMLSCC OvCa and PrCa diseases respectively (details inSupplementary File S5) This result indicates that severalMSLCRN networks are closely associated with GBM LSCC OvCaand PrCa diseases
Altogether functional and disease enrichment analysis resultsshow that MSLCRN networks are biologically meaningful
Comparison with other PC-based networkinference methods
Based on a parallel version of the PC algorithm [56] the parallelIDA method in the second step of MSLCRN learns the causalstructure from expression data Owing to the popularity of thePC algorithm in causal structure learning some other networkinference methods including PCA-CMI [74] PCA-PMI [75] andCMI2NI [76] have also successfully applied it for network infer-ence Different from the three methods using conditional orpartial mutual information to infer lncRNAndashmRNA regulationsour method estimates causal effects to identify lncRNAndashmRNAregulations For comparisons we also use the PCA-CMI PCA-PMI and CMI2NI methods to infer module-specific lncRNAndashmRNA regulatory relationships Similar to our method (whichuses the parallel IDA method) the strength cutoff of lncRNAndashmRNA regulatory relationships in PCA-CMI PCA-PMI andCMI2NI methods is also set to 045
We evaluate the performance of each method in terms offinding experimentally validated lncRNAndashmRNA regulatoryrelationships functional MSLCRN networks and disease-associated MSLCRN networks As shown in Table 4 in terms ofthe three criteria MSLCRN performs the best in GBM LSCCOvCa and PrCa data sets This result suggests that MSLCRN is auseful method to infer module-specific lncRNAndashmRNA regula-tory network in human cancers
Conclusions and discussion
Notwithstanding lncRNAs do not encode proteins directly theyengage in a wide range of biological processes including cancerdevelopments through their interactions with other biologicalmacromolecules eg DNA RNA and protein Therefore touncover the functions and regulatory mechanisms of lncRNAsit is necessary to investigate lncRNAndashtarget regulatory networkacross different types of biological conditions
As a biological network the lncRNAndashtarget regulatory net-work exhibits a high degree of modularity Each functionalmodule is responsible for implementing specific biological
functions Moreover modularity is an important feature ofhuman cancer development and progression Thus from a net-work community point of view it is necessary to investigatemodule-specific lncRNAndashmRNA regulatory networks
Until now several statistical correlation or associationmeasures eg Pearson Mutual Information and ConditionalMutual Information have been used to infer gene regulatorynetworks However these methods tend to identify indirect reg-ulatory relationships between genes The identified gene regu-latory networks cannot reflect real lsquocausalrsquo regulatoryrelationships To better understand lncRNA regulatory mecha-nism it is vital to investigate how lncRNAs causally influencethe expression levels of their target mRNAs
In this work the computational methods for inferringlncRNAndashmRNA interactions and the publicly available data-bases of lncRNAndashmRNA regulatory relationships are firstreviewed Then to address the above two issues we propose anovel computational method MSLCRN to study module-specific lncRNAndashmRNA causal regulatory networks across GBMLSCC OvCa and PrCa diseases In contrast to other approaches(expression-based and sequence-based methods) MSLCRN hastwo unique features First MSLCRN considers the modularity oflncRNAndashmRNA regulatory networks Instead of studying globalregulatory relationships between lncRNAs and mRNAs wefocus on investigating the regulatory behavior of lncRNAs in themodules of interest Second considering the restrictions withconducting gene knockout experiments MSLCRN uses thecausal inference method IDA to infer causal relationshipsbetween lncRNAs and mRNAs based on expression data Thepromising results suggest that exploiting modularity of generegulatory network and causality-based method could provideanother effective approach to elucidating lncRNA functions andregulatory mechanisms of human cancers
Despite the advantages of MSLCRN there is still room toimprove it First the WGCNA method only allows clusteringgenes across all samples from the matched lncRNA and mRNAexpression data In fact a class of genes may exhibit similarexpression patterns across a subset of samples An alternativesolution of this problem is to use a bi-clustering method to iden-tify lncRNAndashmRNA co-expression modules Second it is stilltime-consuming to estimate causal effects from large expres-sion data sets When constructing the module-specific lncRNAndashmRNA causal regulatory networks the running time of parallelIDA is still high on estimating the causal effects of lncRNAs onmRNAs In future more efficient parallel IDA method is neededto explore lncRNAndashmRNA causal regulatory relationships inlarge-scale expression data Third previous research [38]has shown that the prediction accuracy of lncRNAndashmRNA inter-actions can be improved by integrating both sequence data and
Table 4 Comparison results in terms of experimentally validatedlncRNAndashmRNA regulatory relationships functional MSLCRN net-works and disease-associated MSLCRN networks
Methods GBM (a b c) LSCC (a b c) OvCa (a b c) PrCa (a b c)
MSLCRN (17 15 5) (14 29 7) (20 30 6) (42 20 6)PCA-CMI (2 13 0) (0 11 0) (0 7 1) (0 20 2)PCA-PMI (2 15 1) (0 11 0) (0 8 2) (1 18 1)CMI2NI (2 15 0) (0 11 0) (0 7 1) (0 19 1)
Note afrac14number of experimentally validated lncRNAndashmRNA regulatory relation-
ships bfrac14number of functional MSLCRN networks cfrac14number of disease-asso-
ciated MSLCRN networks
14 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
expression data To improve the accuracy of the predictedlncRNAndashmRNA regulatory relationships it is necessary todevelop an ensemble method (fusing sequence-based andexpression-based methods) to infer lncRNAndashmRNA regulatorynetwork Finally recent studies [77] show that lncRNAs can actas competing endogenous RNAs (ceRNAs) or miRNA sponges toattract miRNAs for bindings by competing with mRNAsTherefore some predicted lncRNAndashmRNA regulatory relation-ships are lncRNA-related ceRNAndashceRNA interactions To furtherimprove the prediction of lncRNAndashmRNA regulatory relation-ships it is necessary to remove the crosstalk relationshipsbetween lncRNAs and mRNAs
Key Points
bull Among ncRNAs lncRNAs are a large and diverse classof RNA molecules and are thought to be a gold mine ofpotential oncogenes anti-oncogenes and newbiomarkers
bull lncRNAs exhibit dynamic positive gene regulationacross human cancers
bull Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers
bull There is still a lack of ground truth for validating pre-dicted lncRNAndashmRNA regulatory relationships
bull There is still room to develop reliable methods for elu-cidating lncRNA regulatory mechanisms
Supplementary Data
Supplementary data are available online at httpsacademicoupcombib
Funding
The National Natural Science Foundation of China (No61702069) the Applied Basic Research Foundation ofScience and Technology of Yunnan Province (No2017FB099) the NHMRC Grant (No 1123042) and theAustralian Research Council Discovery Grant (NoDP140103617)
References1 Pang KC Frith MC Mattick JS Rapid evolution of noncoding
RNAs lack of conservation does not mean lack of functionTrends Genet 200622(1)1ndash5
2 Kung JT Colognori D Lee JT Long noncoding RNAs pastpresent and future Genetics 2013193(3)651ndash69
3 Schmitt AM Chang HY Long noncoding RNAs in cancer path-ways Cancer Cell 201629(4)452ndash63
4 Zhang Y Tao Y Liao Q Long noncoding RNA a crosslink inbiological regulatory network Brief Bioinform 2017 doi 101093bibbbx042
5 Yoon JH Abdelmohsen K Gorospe M Posttranscriptionalgene regulation by long noncoding RNA J Mol Biol 2013425(19)3723ndash30
6 Gerlach W Giegerich R GUUGle a utility for fast exact match-ing under RNA complementary rules including G-U base pair-ing Bioinformatics 200622(6)762ndash4
7 Muckstein U Tafer H Hackermuller J et al Thermodynamicsof RNA-RNA binding Bioinformatics 200622(10)1177ndash82
8 Tafer H Hofacker IL RNAplex a fast tool for RNA-RNA inter-action search Bioinformatics 200824(22)2657ndash63
9 Busch A Richter AS Backofen R IntaRNA efficient predictionof bacterial sRNA targets incorporating target site accessibil-ity and seed regions Bioinformatics 200824(24)2849ndash56
10Kato Y Sato K Hamada M et al RactIP fast and accurate pre-diction of RNA-RNA interaction using integer programmingBioinformatics 201026(18)i460ndash6
11Li J Ma W Zeng P et al LncTar a tool for predicting the RNAtargets of long noncoding RNAs Brief Bioinform 201516(5)806ndash12
12Fukunaga T Hamada M RIblast an ultrafast RNA-RNA inter-action prediction system based on a seed-and-extensionapproach Bioinformatics 201733(17)2666ndash74
13Derrien T Johnson R Bussotti G et al The GENCODE v7 cata-log of human long noncoding RNAs analysis of their genestructure evolution and expression Genome Res 201222(9)1775ndash89
14Gloss BS Dinger ME The specificity of long noncoding RNAexpression Biochim Biophys Acta 20161859(1)16ndash22
15Munshi A Mohan V Ahuja YR Non-coding RNAs a dynamicand complex network of gene regulation J PharmacogenomicsPharmacoproteomics 20167156
16Liao Q Liu C Yuan X et al Large-scale prediction of long non-coding RNA functions in a coding-non-coding gene co-expression network Nucleic Acids Res 201139(9)3864ndash78
17Guo Q Cheng Y Liang T et al Comprehensive analysis oflncRNA-mRNA co-expression patterns identifies immune-associated lncRNA biomarkers in ovarian cancer malignantprogression Sci Rep 20155(1)17683
18Du Y Xia W Zhang J et al Comprehensive analysis of longnoncoding RNA-mRNA co-expression patterns in thyroidcancer Mol Biosyst 201713(10)2107ndash15
19Wu W Wagner EK Hao Y et al Tissue-specific co-expressionof long non-coding and coding RNAs associated with breastCancer Sci Rep 2016632731
20Barabasi AL Oltvai ZN Network biology understanding thecellrsquos functional organization Nat Rev Genet 20045(2)101ndash13
21Langfelder P Horvath S WGCNA an R package for weightedcorrelation network analysis BMC Bioinformatics 20089559
22Maathuis HM Kalisch M Buhlmann P Estimating high-dimensional intervention effects from observational dataAnn Stat 200937(6A)3133ndash64
23Maathuis HM Colombo D Kalisch M et al Predicting causaleffects in large-scale systems from observational data NatMethods 20107(4)247ndash8
24Le T Hoang T Li J et al A fast PC algorithm for high dimen-sional causal discovery with multi-core PCs IEEEACM TransComput Biol Bioinform 2016 doi 101109TCBB20162591526
25Du Z Fei T Verhaak RG et al Integrative genomic analysesreveal clinically relevant long noncoding RNAs in humancancer Nat Struct Mol Biol 201320(7)908ndash13
26Bernhart SH Tafer H Muckstein U et al Partition functionand base pairing probabilities of RNA heterodimersAlgorithms Mol Biol 20061(1)3
27Alkan C Karakoc E Nadeau JH et al RNA-RNA interactionprediction and antisense RNA target search J Comput Biol200613(2)267ndash82
28Seemann SE Richter AS Gesell T et al PETcofold predictingconserved interactions and structures of two multiple align-ments of RNA sequences Bioinformatics 201127(2)211ndash19
29Wenzel A Akbasli E Gorodkin J RIsearch fast RNA-RNAinteraction search using a simplified nearest-neighborenergy model Bioinformatics 201228(21)2738ndash46
Module-specific lncRNA-mRNA causal regulatory networks | 15
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
30Alkan F Wenzel A Palasca O et al RIsearch2 suffix array-based large-scale prediction of RNA-RNA interactions andsiRNA off-targets Nucleic Acids Res 201745e60
31Hu R Sun X lncRNATargets a platform for lncRNA target pre-diction based on nucleic acid thermodynamics J BioinformComput Biol 201614(4)1650016
32Terai G Iwakiri J Kameda T et al Comprehensive predictionof lncRNA-RNA interactions in human transcriptome BMCGenomics 201617(Suppl 1)12
33Liu J Wu S Li M et al LncRNA expression profiles reveal theco-expression network in human colorectal carcinoma Int JClin Exp Pathol 201691885ndash1892
34Huang S Feng C Chen L et al Identification of potential keylong non-coding RNAs and target genes associated withpneumonia using long non-coding RNA sequencing (lncRNA-Seq) a preliminary study Med Sci Monit 2016223394ndash408
35Li J Xu Y Xu J et al Dynamic co-expression network analysisof lncRNAs and mRNAs associated with venous congestionMol Med Rep 201614(3)2045ndash51
36Fu M Huang G Zhang Z et al Expression profile of long non-coding RNAs in cartilage from knee osteoarthritis patientsOsteoarthritis Cartilage 201523(3)423ndash32
37Zhang F Gao C Ma XF et al Expression profile of long non-coding RNAs in peripheral blood mononuclear cells frommultiple sclerosis patients CNS Neurosci Ther 201622(4)298ndash305
38 Iwakiri J Terai G Hamada M Computational prediction oflncRNA-mRNA interactionsby integrating tissue specificity inhuman transcriptome Biol Direct 201712(1)15
39Lv L Wei M Lin P et al Integrated mRNA and lncRNA expres-sion profiling for exploring metastatic biomarkers of humanintrahepatic cholangiocarcinoma Am J Cancer Res 20177688ndash99
40Hao Y Wu W Li H et al NPInter v30 an upgraded databaseof noncoding RNA-associated interactions Database 20162016baw057
41Chen G Wang Z Wang D et al LncRNADisease a databasefor long-non-coding RNA-associated diseases Nucleic AcidsRes 201341D983ndash6
42 Jiang Q Wang J Wu X et al LncRNA2Target a database fordifferentially expressed genes after lncRNA knockdown oroverexpression Nucleic Acids Res 201543D193ndash6
43Zhou Z Shen Y Khan MR et al LncReg a reference resourcefor lncRNA-associated regulatory networks Database 20152015bav083
44Denisenko E Ho D Tamgue O et al IRNdb the database ofimmunologically relevant non-coding RNAs Database 20162016baw138
45Liu CJ Gao C Ma Z et al lncRInter a database of experimen-tally validated long non-coding RNA interaction J GenetGenomics 201744(5)265ndash8
46Li JH Liu S Zhou H et al starBase v20 decoding miRNA-ceRNA miRNA-ncRNA and protein-RNA interaction net-works from large-scale CLIP-Seq data Nucleic Acids Res 201442(D1)D92ndash7
47Liu Y Zhao M lnCaNet pan-cancer co-expression networkfor human lncRNA and cancer genes Bioinformatics 201632(10)1595ndash7
48Zhou QZ Zhang B Yu QY et al BmncRNAdb a comprehen-sive database of non-coding RNAs in the silkworm Bombyxmori BMC Bioinformatics 201617(1)370
49Park C Yu N Choi I et al lncRNAtor a comprehensiveresource for functional investigation of long non-codingRNAs Bioinformatics 201430(17)2480ndash5
50Bhartiya D Pal K Ghosh S et al lncRNome a comprehensiveknowledgebase of human long noncoding RNAs Database20132013bat034
51Zhao Z Bai J Wu A et al Co-LncRNA investigating thelncRNA combinatorial effects in GO annotations and KEGGpathways based on human RNA-Seq data Database 20152015bav082
52 Jiang Q Ma R Wang J et al LncRNA2Function a compre-hensive resource for functional investigation of humanlncRNAs based on RNA-seq data BMC Genomics 201516(Suppl 3)S2
53Chan WL Huang HD Chang JG lncRNAMap a map of puta-tive regulatory functions in the long non-coding transcrip-tome Comput Biol Chem 20145041ndash9
54Langfelder P Horvath S Fast R functions for robust correla-tions and hierarchical clustering J Stat Softw 2012461ndash17
55 Judea P Causality Models Reasoning and Inference New YorkNY Cambridge University Press 2000
56Spirtes P Glymour C Scheines R Causation Prediction andSearch 2nd edn Cambridge MIT Press 2000
57Le T Hoang T Li J et al ParallelPC an R package for efficientconstraint based causal exploration arXiv prepring 2015arXiv151003042v1
58Hahn MW Kern AD Comparative genomics of centrality andessentiality in three eukaryotic protein-interaction networksMol Biol Evol 200522(4)803ndash6
59Song J Singh M Roth FP From hub proteins to hub modulesthe relationship between essentiality and centrality in theyeast interactome at different scales of organization PLoSComput Biol 20139(2)e1002910
60Therneau TM Grambsch PM Modeling Survival Data Extendingthe Cox Model New York Springer Press 2000
61Yu G Wang L-G Han Y He Q-Y clusterProfiler an R packagefor comparing biological themes among gene clusters OMICS201216(5)284ndash7
62Ashburner M Ball CA Blake JA et al Gene ontology tool forthe unification of biology Nat Genet 200025(1)25ndash9
63Kanehisa M Goto S KEGG Kyoto Encyclopedia of Genes andGenomes Nucleic Acids Res 200028(1)27ndash30
64Ning S Zhang J Wang P et al Lnc2Cancer a manually curateddatabase of experimentally supported lncRNAs associatedwith various human cancers Nucleic Acids Res 201644(D1)D980ndash5
65Wang Y Chen L Chen B et al Mammalian ncRNA-diseaserepository a global view of ncRNA-mediated disease net-work Cell Death Dis 20134e765
66Pi~nero J Bravo A Queralt-Rosinach N et al DisGeNET a com-prehensive platform integrating information on humandisease-associated genes and variants Nucleic Acids Res 201745(D1)D833ndash9
67Conway JR Lex A Gehlenborg N UpSetR an R package for thevisualization of intersecting sets and their propertiesBioinformatics 201733(18)2938ndash40
68Wahlestedt C Targeting long non-coding RNA to therapeuti-cally upregulate gene expression Nat Rev Drug Discov 201312(6)433ndash46
69Mantovani G Maccio A Lai P et al Cytokine activity incancer-related anorexiacachexia role of megestrol acetateand medroxyprogesterone acetate Semin Oncol 19982545ndash52
70Dorsam RT Gutkind JS G-protein-coupled receptors and can-cer Nat Rev Cancer 20077(2)79ndash94
71Wang X Lin Y Tumor necrosis factor and cancer buddies orfoes Acta Pharmacol Sin 200829(11)1275ndash88
16 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
72Fajardo AM Piazza GA Tinsley HN The role of cyclic nucleo-tide signaling pathways in cancer targets for prevention andtreatment Cancers 20146(1)436ndash58
73Hanahan D Weinberg RA Hallmarks of cancer the next gen-eration Cell 2011144(5)646ndash74
74Zhang X Zhao XM He K et al Inferring gene regulatory net-works from gene expression data by path consistencyalgorithm based on conditional mutual informationBioinformatics 201228(1)98ndash104
75Zhao J Zhou Y Zhang X et al Part mutual information forquantifying direct associations in networks Proc Natl Acad SciUSA 2016113(18)5130ndash5
76Zhang X Zhao J Hao JK et al Conditional mutual inclusiveinformation enables accurate quantification of associationsin gene regulatory networks Nucleic Acids Re 201543(5)e31
77Le TD Zhang J Liu L et al Computational methods for identi-fying miRNA sponge interactions Brief Bioinform 201718(4)577ndash90
Module-specific lncRNA-mRNA causal regulatory networks | 17
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018View publication statsView publication stats
causal regulatory relationships between the four global net-works To evaluate whether there is a common core of lncRNAndashmRNA causal regulatory relationships in the global regulatorynetworks across human cancers we concentrate on the con-served lncRNAndashmRNA causal regulatory relationships thatexisted in at least three human cancers
As shown in Figure 5A the majority of the conservedlncRNAndashmRNA causal regulatory relationships form a closely
connected community This finding indicates that the con-served lncRNAndashmRNA causal regulatory network may be a corenetwork across human cancers
The survival analysis shows that the lncRNAs and mRNAs inthe core network can significantly distinguish the metastasisrisks between the high- and low-risk groups in GBM OvCa andPrCa data sets (Figure 5B) This result suggests that the core net-work may act as a common network biomarker of GBM OvCa
Cancer-specific networks Causal regulations y=axb R2
GBM-specific 2816 y=4812x-1292 09774
LSCC-specific 99507 y=1034x-1014 09923
OvCa-specific 19487 y=7366x-1156 09723
PrCa-specific 808611 y=5243x-07055 08310
1 10 100 11001
10
100
1100
Degree of genes
Nu
mb
er o
f ge
nes
GBM-specific fitting curveLSCC-specific fitting curveOvCa-specific fitting curvePrCa-specific fitting curveGBM-specific degree distributionLSCC-specific degree distributionOvCa-specific degree distributionPrCa-specific degree distribution
solutecation symporter activitysymporter activity
gated channel activitysodium ion transmembrane transporter activity
ion channel activitysubstrate-specific channel activity
channel activitypassive transmembrane transporter activity
growth factor activitycation channel activity
metal ion transmembrane transporter activitycollagen bindingheparin binding
sulfur compound bindingintegrin binding
glycosaminoglycan bindingextracellular matrix binding
peptide receptor activityG-protein coupled peptide receptor activity
chemokine bindingG-protein coupled receptor binding
growth factor bindingcytokine binding
serine-type endopeptidase activitydeath receptor activity
tumor necrosis factor-activated receptor activitycytokine receptor activity
protein heterodimerization activitydipeptidase activity
glycoprotein bindingRAGE receptor binding
cytokine receptor bindingcytokine activity
GBM(203)
LSCC(1015)
OvCa(479)
PrCa(3019)
001
002
003
004
padjust
GeneRatio002
004
006
008
Taste transduction
ECM-receptor interaction
cAMP signaling pathway
Calcium signaling pathway
Neuroactive ligand-receptor interaction
PI3K-Akt signaling pathway
Pathways in cancerRegulation of actin cytoskeleton
Complement and coagulation cascades
AGE-RAGE signaling pathway in diabetic complications
Hematopoietic cell lineage
Th17 cell differentiation
Inflammatory bowel disease (IBD)
Malaria
Osteoclast differentiation
Influenza A
Tuberculosis
Intestinal immune network for IgA production
Chagas disease (American trypanosomiasis)
Leishmaniasis
TNF signaling pathway
Toll-like receptor signaling pathway
Rheumatoid arthritis
Cytokine-cytokine receptor interaction
GBM(129)
LSCC(500)
OvCa(266)
PrCa(1364)
GeneRatio
005
010
015
001
002
003
004padjust
GO enrichment analysis KEGG enrichment analysis
A
B
Figure 4 Differential network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) Degree distribution of cancer-specific lncRNAndashmRNA
causal networks in GBM LSCC OvCa and PrCa (B) Functional enrichment analysis of cancer-related lncRNAndashmRNA causal networks in GBM LSCC OvCa and PrCa
Module-specific lncRNA-mRNA causal regulatory networks | 11
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
and PrCa In Figure 5B we also find that the core network con-tains several cancer genes (34 26 30 and 38 cancer genes asso-ciated with GBM LSCC OvCa and PrCa respectively)
By conducting GO and KEGG enrichment analysis we findthat the core network is significantly enriched in 399 GO biologi-cal processes and 3 KEGG pathways (details in SupplementaryFile S2) Of the 399 GO biological processes 2 GO terms includ-ing negative regulation of cell adhesion (GO 0007162) and cyto-kine production in immune response (GO 0002367) areinvolved in three cancer hallmarks Tissue Invasion andMetastasis Tumor Promoting Inflammation and EvadingImmune Detection [73] This observation implies that the corenetwork may control these cancer-related hallmarks
Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers
We divide the hub lncRNAs into two categories (1) conserved hublncRNAs which exist in at least three human cancers and (2)cancer-specific hub lncRNAs which only exist in single humancancer As a result we obtain 9 conserved hub lncRNAs and 828cancer-specific hub lncRNAs (include 11 GBM-specific 246 LSCC-specific 47 OvCa-specific and 524 PrCa-specific hub lncRNAs)
To evaluate whether the hub lncRNAs can distinguish meta-stasis risks of human cancers we use them to predict metasta-sis risks for tumor samples in GBM LSCC OvCa and PrCaAs shown in Figure 6A the conserved hub lncRNAs can discrim-inate the metastasis risks of tumor samples significantly(Log-rank P-valuelt 005) in four human cancers In Figure 6Bexcepting LSCC-specific hub lncRNAs owing to failing to fit aCox regression model GBM-specific OvCa-specific and PrCa-specific hub lncRNAs can discriminate the metastasis risks oftumor samples significantly in GBM OvCa and PrCa respec-tively (Log-rank P-valuelt 005) These results suggest that thehub lncRNAs are discriminative and can act as biomarkers todistinguish between high- and low-risk tumor samples
Experimentally validated lncRNAndashmRNA regulations aremostly bad hits for LncTar
Using a collection of experimentally validated lncRNAndashmRNAregulatory relationships (details in Supplementary File S3) asthe ground truth the numbers of experimentally confirmedlncRNAndashmRNA causal regulations are 17 14 20 and 42 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4)
Figure 5 Conservative network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) The core lncRNAndashmRNA causal network that
occurred in at least three human cancers The red diamond nodes and white circle nodes denote lncRNAs and mRNAs respectively (B) Survival analysis of the core
lncRNAndashmRNA causal network
12 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
We further apply a representative sequence-based methodcalled LncTar [11] to the experimentally validated lncRNAndashmRNAcausal regulatory relationships discovered by MSLCRN There aretwo main reasons for choosing LncTar First LncTar does nothave a limit to input RNA size Second LncTar uses a quantitativestandard rather than expert knowledge to determine whetherlncRNAs interact with mRNAs Similar to LncTar we also set -01as normalized binding free energy (ndG) cutoff to determinewhether lncRNAndashmRNA pairs interact with each other In otherwords the lncRNAndashmRNA pairs with ndG01 are regarded as
lncRNAndashmRNA regulatory relationships Among the experimen-tally confirmed lncRNAndashmRNA causal regulatory relationships thatare discovered by MSLCRN the numbers of successfully predictedlncRNAndashmRNA regulations using LncTar are 0 0 1 and 1 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4) The result indicates that our experimentally confirmedlncRNAndashmRNA causal regulations are mostly bad hits for LncTarMeanwhile this result also suggests that expression-based andsequence-based methods may be complementary with each otherin predicting lncRNAndashmRNA regulations
A
B
Figure 6 Survival analysis of hub lncRNAs (A) Conserved hub lncRNAs in GBM LSCC OvCa and PrCa datasets (B) Survival analysis of cancer-specific hub lncRNAs
Module-specific lncRNA-mRNA causal regulatory networks | 13
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
MSLCRN networks are biologically meaningful
In this section we conduct GO and KEGG enrichment analysisto check whether the MSLCRN networks are associated withsome biological processes and pathways significantlyEnrichment analysis uncovers that 15 of the 23 (6522)MSLCRN networks in GBM 29 of the 38 (7632) MSLCRN net-works in LSCC 30 of the 45 (6667) MSLCRN networks inOvCa and 20 of the 32 (6250) MSLCRN networks in PrCa aresignificantly enriched in at least one GO biological process orKEGG pathway respectively (details in Supplementary File S5)This result implies that most of the MSLCRN networks in eachcancer are functional networks
We further investigate whether the MSLCRN networks aresignificantly enriched in GBM LSCC OvCa and PrCa diseasesrespectively We discover that 5 of the 23 MSLCRN networks7 of the 38 MSLCRN networks 6 of the 45 MSLCRN networks and6 of the 32 MSLCRN networks are significantly enriched in GBMLSCC OvCa and PrCa diseases respectively (details inSupplementary File S5) This result indicates that severalMSLCRN networks are closely associated with GBM LSCC OvCaand PrCa diseases
Altogether functional and disease enrichment analysis resultsshow that MSLCRN networks are biologically meaningful
Comparison with other PC-based networkinference methods
Based on a parallel version of the PC algorithm [56] the parallelIDA method in the second step of MSLCRN learns the causalstructure from expression data Owing to the popularity of thePC algorithm in causal structure learning some other networkinference methods including PCA-CMI [74] PCA-PMI [75] andCMI2NI [76] have also successfully applied it for network infer-ence Different from the three methods using conditional orpartial mutual information to infer lncRNAndashmRNA regulationsour method estimates causal effects to identify lncRNAndashmRNAregulations For comparisons we also use the PCA-CMI PCA-PMI and CMI2NI methods to infer module-specific lncRNAndashmRNA regulatory relationships Similar to our method (whichuses the parallel IDA method) the strength cutoff of lncRNAndashmRNA regulatory relationships in PCA-CMI PCA-PMI andCMI2NI methods is also set to 045
We evaluate the performance of each method in terms offinding experimentally validated lncRNAndashmRNA regulatoryrelationships functional MSLCRN networks and disease-associated MSLCRN networks As shown in Table 4 in terms ofthe three criteria MSLCRN performs the best in GBM LSCCOvCa and PrCa data sets This result suggests that MSLCRN is auseful method to infer module-specific lncRNAndashmRNA regula-tory network in human cancers
Conclusions and discussion
Notwithstanding lncRNAs do not encode proteins directly theyengage in a wide range of biological processes including cancerdevelopments through their interactions with other biologicalmacromolecules eg DNA RNA and protein Therefore touncover the functions and regulatory mechanisms of lncRNAsit is necessary to investigate lncRNAndashtarget regulatory networkacross different types of biological conditions
As a biological network the lncRNAndashtarget regulatory net-work exhibits a high degree of modularity Each functionalmodule is responsible for implementing specific biological
functions Moreover modularity is an important feature ofhuman cancer development and progression Thus from a net-work community point of view it is necessary to investigatemodule-specific lncRNAndashmRNA regulatory networks
Until now several statistical correlation or associationmeasures eg Pearson Mutual Information and ConditionalMutual Information have been used to infer gene regulatorynetworks However these methods tend to identify indirect reg-ulatory relationships between genes The identified gene regu-latory networks cannot reflect real lsquocausalrsquo regulatoryrelationships To better understand lncRNA regulatory mecha-nism it is vital to investigate how lncRNAs causally influencethe expression levels of their target mRNAs
In this work the computational methods for inferringlncRNAndashmRNA interactions and the publicly available data-bases of lncRNAndashmRNA regulatory relationships are firstreviewed Then to address the above two issues we propose anovel computational method MSLCRN to study module-specific lncRNAndashmRNA causal regulatory networks across GBMLSCC OvCa and PrCa diseases In contrast to other approaches(expression-based and sequence-based methods) MSLCRN hastwo unique features First MSLCRN considers the modularity oflncRNAndashmRNA regulatory networks Instead of studying globalregulatory relationships between lncRNAs and mRNAs wefocus on investigating the regulatory behavior of lncRNAs in themodules of interest Second considering the restrictions withconducting gene knockout experiments MSLCRN uses thecausal inference method IDA to infer causal relationshipsbetween lncRNAs and mRNAs based on expression data Thepromising results suggest that exploiting modularity of generegulatory network and causality-based method could provideanother effective approach to elucidating lncRNA functions andregulatory mechanisms of human cancers
Despite the advantages of MSLCRN there is still room toimprove it First the WGCNA method only allows clusteringgenes across all samples from the matched lncRNA and mRNAexpression data In fact a class of genes may exhibit similarexpression patterns across a subset of samples An alternativesolution of this problem is to use a bi-clustering method to iden-tify lncRNAndashmRNA co-expression modules Second it is stilltime-consuming to estimate causal effects from large expres-sion data sets When constructing the module-specific lncRNAndashmRNA causal regulatory networks the running time of parallelIDA is still high on estimating the causal effects of lncRNAs onmRNAs In future more efficient parallel IDA method is neededto explore lncRNAndashmRNA causal regulatory relationships inlarge-scale expression data Third previous research [38]has shown that the prediction accuracy of lncRNAndashmRNA inter-actions can be improved by integrating both sequence data and
Table 4 Comparison results in terms of experimentally validatedlncRNAndashmRNA regulatory relationships functional MSLCRN net-works and disease-associated MSLCRN networks
Methods GBM (a b c) LSCC (a b c) OvCa (a b c) PrCa (a b c)
MSLCRN (17 15 5) (14 29 7) (20 30 6) (42 20 6)PCA-CMI (2 13 0) (0 11 0) (0 7 1) (0 20 2)PCA-PMI (2 15 1) (0 11 0) (0 8 2) (1 18 1)CMI2NI (2 15 0) (0 11 0) (0 7 1) (0 19 1)
Note afrac14number of experimentally validated lncRNAndashmRNA regulatory relation-
ships bfrac14number of functional MSLCRN networks cfrac14number of disease-asso-
ciated MSLCRN networks
14 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
expression data To improve the accuracy of the predictedlncRNAndashmRNA regulatory relationships it is necessary todevelop an ensemble method (fusing sequence-based andexpression-based methods) to infer lncRNAndashmRNA regulatorynetwork Finally recent studies [77] show that lncRNAs can actas competing endogenous RNAs (ceRNAs) or miRNA sponges toattract miRNAs for bindings by competing with mRNAsTherefore some predicted lncRNAndashmRNA regulatory relation-ships are lncRNA-related ceRNAndashceRNA interactions To furtherimprove the prediction of lncRNAndashmRNA regulatory relation-ships it is necessary to remove the crosstalk relationshipsbetween lncRNAs and mRNAs
Key Points
bull Among ncRNAs lncRNAs are a large and diverse classof RNA molecules and are thought to be a gold mine ofpotential oncogenes anti-oncogenes and newbiomarkers
bull lncRNAs exhibit dynamic positive gene regulationacross human cancers
bull Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers
bull There is still a lack of ground truth for validating pre-dicted lncRNAndashmRNA regulatory relationships
bull There is still room to develop reliable methods for elu-cidating lncRNA regulatory mechanisms
Supplementary Data
Supplementary data are available online at httpsacademicoupcombib
Funding
The National Natural Science Foundation of China (No61702069) the Applied Basic Research Foundation ofScience and Technology of Yunnan Province (No2017FB099) the NHMRC Grant (No 1123042) and theAustralian Research Council Discovery Grant (NoDP140103617)
References1 Pang KC Frith MC Mattick JS Rapid evolution of noncoding
RNAs lack of conservation does not mean lack of functionTrends Genet 200622(1)1ndash5
2 Kung JT Colognori D Lee JT Long noncoding RNAs pastpresent and future Genetics 2013193(3)651ndash69
3 Schmitt AM Chang HY Long noncoding RNAs in cancer path-ways Cancer Cell 201629(4)452ndash63
4 Zhang Y Tao Y Liao Q Long noncoding RNA a crosslink inbiological regulatory network Brief Bioinform 2017 doi 101093bibbbx042
5 Yoon JH Abdelmohsen K Gorospe M Posttranscriptionalgene regulation by long noncoding RNA J Mol Biol 2013425(19)3723ndash30
6 Gerlach W Giegerich R GUUGle a utility for fast exact match-ing under RNA complementary rules including G-U base pair-ing Bioinformatics 200622(6)762ndash4
7 Muckstein U Tafer H Hackermuller J et al Thermodynamicsof RNA-RNA binding Bioinformatics 200622(10)1177ndash82
8 Tafer H Hofacker IL RNAplex a fast tool for RNA-RNA inter-action search Bioinformatics 200824(22)2657ndash63
9 Busch A Richter AS Backofen R IntaRNA efficient predictionof bacterial sRNA targets incorporating target site accessibil-ity and seed regions Bioinformatics 200824(24)2849ndash56
10Kato Y Sato K Hamada M et al RactIP fast and accurate pre-diction of RNA-RNA interaction using integer programmingBioinformatics 201026(18)i460ndash6
11Li J Ma W Zeng P et al LncTar a tool for predicting the RNAtargets of long noncoding RNAs Brief Bioinform 201516(5)806ndash12
12Fukunaga T Hamada M RIblast an ultrafast RNA-RNA inter-action prediction system based on a seed-and-extensionapproach Bioinformatics 201733(17)2666ndash74
13Derrien T Johnson R Bussotti G et al The GENCODE v7 cata-log of human long noncoding RNAs analysis of their genestructure evolution and expression Genome Res 201222(9)1775ndash89
14Gloss BS Dinger ME The specificity of long noncoding RNAexpression Biochim Biophys Acta 20161859(1)16ndash22
15Munshi A Mohan V Ahuja YR Non-coding RNAs a dynamicand complex network of gene regulation J PharmacogenomicsPharmacoproteomics 20167156
16Liao Q Liu C Yuan X et al Large-scale prediction of long non-coding RNA functions in a coding-non-coding gene co-expression network Nucleic Acids Res 201139(9)3864ndash78
17Guo Q Cheng Y Liang T et al Comprehensive analysis oflncRNA-mRNA co-expression patterns identifies immune-associated lncRNA biomarkers in ovarian cancer malignantprogression Sci Rep 20155(1)17683
18Du Y Xia W Zhang J et al Comprehensive analysis of longnoncoding RNA-mRNA co-expression patterns in thyroidcancer Mol Biosyst 201713(10)2107ndash15
19Wu W Wagner EK Hao Y et al Tissue-specific co-expressionof long non-coding and coding RNAs associated with breastCancer Sci Rep 2016632731
20Barabasi AL Oltvai ZN Network biology understanding thecellrsquos functional organization Nat Rev Genet 20045(2)101ndash13
21Langfelder P Horvath S WGCNA an R package for weightedcorrelation network analysis BMC Bioinformatics 20089559
22Maathuis HM Kalisch M Buhlmann P Estimating high-dimensional intervention effects from observational dataAnn Stat 200937(6A)3133ndash64
23Maathuis HM Colombo D Kalisch M et al Predicting causaleffects in large-scale systems from observational data NatMethods 20107(4)247ndash8
24Le T Hoang T Li J et al A fast PC algorithm for high dimen-sional causal discovery with multi-core PCs IEEEACM TransComput Biol Bioinform 2016 doi 101109TCBB20162591526
25Du Z Fei T Verhaak RG et al Integrative genomic analysesreveal clinically relevant long noncoding RNAs in humancancer Nat Struct Mol Biol 201320(7)908ndash13
26Bernhart SH Tafer H Muckstein U et al Partition functionand base pairing probabilities of RNA heterodimersAlgorithms Mol Biol 20061(1)3
27Alkan C Karakoc E Nadeau JH et al RNA-RNA interactionprediction and antisense RNA target search J Comput Biol200613(2)267ndash82
28Seemann SE Richter AS Gesell T et al PETcofold predictingconserved interactions and structures of two multiple align-ments of RNA sequences Bioinformatics 201127(2)211ndash19
29Wenzel A Akbasli E Gorodkin J RIsearch fast RNA-RNAinteraction search using a simplified nearest-neighborenergy model Bioinformatics 201228(21)2738ndash46
Module-specific lncRNA-mRNA causal regulatory networks | 15
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
30Alkan F Wenzel A Palasca O et al RIsearch2 suffix array-based large-scale prediction of RNA-RNA interactions andsiRNA off-targets Nucleic Acids Res 201745e60
31Hu R Sun X lncRNATargets a platform for lncRNA target pre-diction based on nucleic acid thermodynamics J BioinformComput Biol 201614(4)1650016
32Terai G Iwakiri J Kameda T et al Comprehensive predictionof lncRNA-RNA interactions in human transcriptome BMCGenomics 201617(Suppl 1)12
33Liu J Wu S Li M et al LncRNA expression profiles reveal theco-expression network in human colorectal carcinoma Int JClin Exp Pathol 201691885ndash1892
34Huang S Feng C Chen L et al Identification of potential keylong non-coding RNAs and target genes associated withpneumonia using long non-coding RNA sequencing (lncRNA-Seq) a preliminary study Med Sci Monit 2016223394ndash408
35Li J Xu Y Xu J et al Dynamic co-expression network analysisof lncRNAs and mRNAs associated with venous congestionMol Med Rep 201614(3)2045ndash51
36Fu M Huang G Zhang Z et al Expression profile of long non-coding RNAs in cartilage from knee osteoarthritis patientsOsteoarthritis Cartilage 201523(3)423ndash32
37Zhang F Gao C Ma XF et al Expression profile of long non-coding RNAs in peripheral blood mononuclear cells frommultiple sclerosis patients CNS Neurosci Ther 201622(4)298ndash305
38 Iwakiri J Terai G Hamada M Computational prediction oflncRNA-mRNA interactionsby integrating tissue specificity inhuman transcriptome Biol Direct 201712(1)15
39Lv L Wei M Lin P et al Integrated mRNA and lncRNA expres-sion profiling for exploring metastatic biomarkers of humanintrahepatic cholangiocarcinoma Am J Cancer Res 20177688ndash99
40Hao Y Wu W Li H et al NPInter v30 an upgraded databaseof noncoding RNA-associated interactions Database 20162016baw057
41Chen G Wang Z Wang D et al LncRNADisease a databasefor long-non-coding RNA-associated diseases Nucleic AcidsRes 201341D983ndash6
42 Jiang Q Wang J Wu X et al LncRNA2Target a database fordifferentially expressed genes after lncRNA knockdown oroverexpression Nucleic Acids Res 201543D193ndash6
43Zhou Z Shen Y Khan MR et al LncReg a reference resourcefor lncRNA-associated regulatory networks Database 20152015bav083
44Denisenko E Ho D Tamgue O et al IRNdb the database ofimmunologically relevant non-coding RNAs Database 20162016baw138
45Liu CJ Gao C Ma Z et al lncRInter a database of experimen-tally validated long non-coding RNA interaction J GenetGenomics 201744(5)265ndash8
46Li JH Liu S Zhou H et al starBase v20 decoding miRNA-ceRNA miRNA-ncRNA and protein-RNA interaction net-works from large-scale CLIP-Seq data Nucleic Acids Res 201442(D1)D92ndash7
47Liu Y Zhao M lnCaNet pan-cancer co-expression networkfor human lncRNA and cancer genes Bioinformatics 201632(10)1595ndash7
48Zhou QZ Zhang B Yu QY et al BmncRNAdb a comprehen-sive database of non-coding RNAs in the silkworm Bombyxmori BMC Bioinformatics 201617(1)370
49Park C Yu N Choi I et al lncRNAtor a comprehensiveresource for functional investigation of long non-codingRNAs Bioinformatics 201430(17)2480ndash5
50Bhartiya D Pal K Ghosh S et al lncRNome a comprehensiveknowledgebase of human long noncoding RNAs Database20132013bat034
51Zhao Z Bai J Wu A et al Co-LncRNA investigating thelncRNA combinatorial effects in GO annotations and KEGGpathways based on human RNA-Seq data Database 20152015bav082
52 Jiang Q Ma R Wang J et al LncRNA2Function a compre-hensive resource for functional investigation of humanlncRNAs based on RNA-seq data BMC Genomics 201516(Suppl 3)S2
53Chan WL Huang HD Chang JG lncRNAMap a map of puta-tive regulatory functions in the long non-coding transcrip-tome Comput Biol Chem 20145041ndash9
54Langfelder P Horvath S Fast R functions for robust correla-tions and hierarchical clustering J Stat Softw 2012461ndash17
55 Judea P Causality Models Reasoning and Inference New YorkNY Cambridge University Press 2000
56Spirtes P Glymour C Scheines R Causation Prediction andSearch 2nd edn Cambridge MIT Press 2000
57Le T Hoang T Li J et al ParallelPC an R package for efficientconstraint based causal exploration arXiv prepring 2015arXiv151003042v1
58Hahn MW Kern AD Comparative genomics of centrality andessentiality in three eukaryotic protein-interaction networksMol Biol Evol 200522(4)803ndash6
59Song J Singh M Roth FP From hub proteins to hub modulesthe relationship between essentiality and centrality in theyeast interactome at different scales of organization PLoSComput Biol 20139(2)e1002910
60Therneau TM Grambsch PM Modeling Survival Data Extendingthe Cox Model New York Springer Press 2000
61Yu G Wang L-G Han Y He Q-Y clusterProfiler an R packagefor comparing biological themes among gene clusters OMICS201216(5)284ndash7
62Ashburner M Ball CA Blake JA et al Gene ontology tool forthe unification of biology Nat Genet 200025(1)25ndash9
63Kanehisa M Goto S KEGG Kyoto Encyclopedia of Genes andGenomes Nucleic Acids Res 200028(1)27ndash30
64Ning S Zhang J Wang P et al Lnc2Cancer a manually curateddatabase of experimentally supported lncRNAs associatedwith various human cancers Nucleic Acids Res 201644(D1)D980ndash5
65Wang Y Chen L Chen B et al Mammalian ncRNA-diseaserepository a global view of ncRNA-mediated disease net-work Cell Death Dis 20134e765
66Pi~nero J Bravo A Queralt-Rosinach N et al DisGeNET a com-prehensive platform integrating information on humandisease-associated genes and variants Nucleic Acids Res 201745(D1)D833ndash9
67Conway JR Lex A Gehlenborg N UpSetR an R package for thevisualization of intersecting sets and their propertiesBioinformatics 201733(18)2938ndash40
68Wahlestedt C Targeting long non-coding RNA to therapeuti-cally upregulate gene expression Nat Rev Drug Discov 201312(6)433ndash46
69Mantovani G Maccio A Lai P et al Cytokine activity incancer-related anorexiacachexia role of megestrol acetateand medroxyprogesterone acetate Semin Oncol 19982545ndash52
70Dorsam RT Gutkind JS G-protein-coupled receptors and can-cer Nat Rev Cancer 20077(2)79ndash94
71Wang X Lin Y Tumor necrosis factor and cancer buddies orfoes Acta Pharmacol Sin 200829(11)1275ndash88
16 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
72Fajardo AM Piazza GA Tinsley HN The role of cyclic nucleo-tide signaling pathways in cancer targets for prevention andtreatment Cancers 20146(1)436ndash58
73Hanahan D Weinberg RA Hallmarks of cancer the next gen-eration Cell 2011144(5)646ndash74
74Zhang X Zhao XM He K et al Inferring gene regulatory net-works from gene expression data by path consistencyalgorithm based on conditional mutual informationBioinformatics 201228(1)98ndash104
75Zhao J Zhou Y Zhang X et al Part mutual information forquantifying direct associations in networks Proc Natl Acad SciUSA 2016113(18)5130ndash5
76Zhang X Zhao J Hao JK et al Conditional mutual inclusiveinformation enables accurate quantification of associationsin gene regulatory networks Nucleic Acids Re 201543(5)e31
77Le TD Zhang J Liu L et al Computational methods for identi-fying miRNA sponge interactions Brief Bioinform 201718(4)577ndash90
Module-specific lncRNA-mRNA causal regulatory networks | 17
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018View publication statsView publication stats
and PrCa In Figure 5B we also find that the core network con-tains several cancer genes (34 26 30 and 38 cancer genes asso-ciated with GBM LSCC OvCa and PrCa respectively)
By conducting GO and KEGG enrichment analysis we findthat the core network is significantly enriched in 399 GO biologi-cal processes and 3 KEGG pathways (details in SupplementaryFile S2) Of the 399 GO biological processes 2 GO terms includ-ing negative regulation of cell adhesion (GO 0007162) and cyto-kine production in immune response (GO 0002367) areinvolved in three cancer hallmarks Tissue Invasion andMetastasis Tumor Promoting Inflammation and EvadingImmune Detection [73] This observation implies that the corenetwork may control these cancer-related hallmarks
Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers
We divide the hub lncRNAs into two categories (1) conserved hublncRNAs which exist in at least three human cancers and (2)cancer-specific hub lncRNAs which only exist in single humancancer As a result we obtain 9 conserved hub lncRNAs and 828cancer-specific hub lncRNAs (include 11 GBM-specific 246 LSCC-specific 47 OvCa-specific and 524 PrCa-specific hub lncRNAs)
To evaluate whether the hub lncRNAs can distinguish meta-stasis risks of human cancers we use them to predict metasta-sis risks for tumor samples in GBM LSCC OvCa and PrCaAs shown in Figure 6A the conserved hub lncRNAs can discrim-inate the metastasis risks of tumor samples significantly(Log-rank P-valuelt 005) in four human cancers In Figure 6Bexcepting LSCC-specific hub lncRNAs owing to failing to fit aCox regression model GBM-specific OvCa-specific and PrCa-specific hub lncRNAs can discriminate the metastasis risks oftumor samples significantly in GBM OvCa and PrCa respec-tively (Log-rank P-valuelt 005) These results suggest that thehub lncRNAs are discriminative and can act as biomarkers todistinguish between high- and low-risk tumor samples
Experimentally validated lncRNAndashmRNA regulations aremostly bad hits for LncTar
Using a collection of experimentally validated lncRNAndashmRNAregulatory relationships (details in Supplementary File S3) asthe ground truth the numbers of experimentally confirmedlncRNAndashmRNA causal regulations are 17 14 20 and 42 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4)
Figure 5 Conservative network analysis of global lncRNAndashmRNA causal networks across GBM LSCC OvCa and PrCa (A) The core lncRNAndashmRNA causal network that
occurred in at least three human cancers The red diamond nodes and white circle nodes denote lncRNAs and mRNAs respectively (B) Survival analysis of the core
lncRNAndashmRNA causal network
12 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
We further apply a representative sequence-based methodcalled LncTar [11] to the experimentally validated lncRNAndashmRNAcausal regulatory relationships discovered by MSLCRN There aretwo main reasons for choosing LncTar First LncTar does nothave a limit to input RNA size Second LncTar uses a quantitativestandard rather than expert knowledge to determine whetherlncRNAs interact with mRNAs Similar to LncTar we also set -01as normalized binding free energy (ndG) cutoff to determinewhether lncRNAndashmRNA pairs interact with each other In otherwords the lncRNAndashmRNA pairs with ndG01 are regarded as
lncRNAndashmRNA regulatory relationships Among the experimen-tally confirmed lncRNAndashmRNA causal regulatory relationships thatare discovered by MSLCRN the numbers of successfully predictedlncRNAndashmRNA regulations using LncTar are 0 0 1 and 1 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4) The result indicates that our experimentally confirmedlncRNAndashmRNA causal regulations are mostly bad hits for LncTarMeanwhile this result also suggests that expression-based andsequence-based methods may be complementary with each otherin predicting lncRNAndashmRNA regulations
A
B
Figure 6 Survival analysis of hub lncRNAs (A) Conserved hub lncRNAs in GBM LSCC OvCa and PrCa datasets (B) Survival analysis of cancer-specific hub lncRNAs
Module-specific lncRNA-mRNA causal regulatory networks | 13
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
MSLCRN networks are biologically meaningful
In this section we conduct GO and KEGG enrichment analysisto check whether the MSLCRN networks are associated withsome biological processes and pathways significantlyEnrichment analysis uncovers that 15 of the 23 (6522)MSLCRN networks in GBM 29 of the 38 (7632) MSLCRN net-works in LSCC 30 of the 45 (6667) MSLCRN networks inOvCa and 20 of the 32 (6250) MSLCRN networks in PrCa aresignificantly enriched in at least one GO biological process orKEGG pathway respectively (details in Supplementary File S5)This result implies that most of the MSLCRN networks in eachcancer are functional networks
We further investigate whether the MSLCRN networks aresignificantly enriched in GBM LSCC OvCa and PrCa diseasesrespectively We discover that 5 of the 23 MSLCRN networks7 of the 38 MSLCRN networks 6 of the 45 MSLCRN networks and6 of the 32 MSLCRN networks are significantly enriched in GBMLSCC OvCa and PrCa diseases respectively (details inSupplementary File S5) This result indicates that severalMSLCRN networks are closely associated with GBM LSCC OvCaand PrCa diseases
Altogether functional and disease enrichment analysis resultsshow that MSLCRN networks are biologically meaningful
Comparison with other PC-based networkinference methods
Based on a parallel version of the PC algorithm [56] the parallelIDA method in the second step of MSLCRN learns the causalstructure from expression data Owing to the popularity of thePC algorithm in causal structure learning some other networkinference methods including PCA-CMI [74] PCA-PMI [75] andCMI2NI [76] have also successfully applied it for network infer-ence Different from the three methods using conditional orpartial mutual information to infer lncRNAndashmRNA regulationsour method estimates causal effects to identify lncRNAndashmRNAregulations For comparisons we also use the PCA-CMI PCA-PMI and CMI2NI methods to infer module-specific lncRNAndashmRNA regulatory relationships Similar to our method (whichuses the parallel IDA method) the strength cutoff of lncRNAndashmRNA regulatory relationships in PCA-CMI PCA-PMI andCMI2NI methods is also set to 045
We evaluate the performance of each method in terms offinding experimentally validated lncRNAndashmRNA regulatoryrelationships functional MSLCRN networks and disease-associated MSLCRN networks As shown in Table 4 in terms ofthe three criteria MSLCRN performs the best in GBM LSCCOvCa and PrCa data sets This result suggests that MSLCRN is auseful method to infer module-specific lncRNAndashmRNA regula-tory network in human cancers
Conclusions and discussion
Notwithstanding lncRNAs do not encode proteins directly theyengage in a wide range of biological processes including cancerdevelopments through their interactions with other biologicalmacromolecules eg DNA RNA and protein Therefore touncover the functions and regulatory mechanisms of lncRNAsit is necessary to investigate lncRNAndashtarget regulatory networkacross different types of biological conditions
As a biological network the lncRNAndashtarget regulatory net-work exhibits a high degree of modularity Each functionalmodule is responsible for implementing specific biological
functions Moreover modularity is an important feature ofhuman cancer development and progression Thus from a net-work community point of view it is necessary to investigatemodule-specific lncRNAndashmRNA regulatory networks
Until now several statistical correlation or associationmeasures eg Pearson Mutual Information and ConditionalMutual Information have been used to infer gene regulatorynetworks However these methods tend to identify indirect reg-ulatory relationships between genes The identified gene regu-latory networks cannot reflect real lsquocausalrsquo regulatoryrelationships To better understand lncRNA regulatory mecha-nism it is vital to investigate how lncRNAs causally influencethe expression levels of their target mRNAs
In this work the computational methods for inferringlncRNAndashmRNA interactions and the publicly available data-bases of lncRNAndashmRNA regulatory relationships are firstreviewed Then to address the above two issues we propose anovel computational method MSLCRN to study module-specific lncRNAndashmRNA causal regulatory networks across GBMLSCC OvCa and PrCa diseases In contrast to other approaches(expression-based and sequence-based methods) MSLCRN hastwo unique features First MSLCRN considers the modularity oflncRNAndashmRNA regulatory networks Instead of studying globalregulatory relationships between lncRNAs and mRNAs wefocus on investigating the regulatory behavior of lncRNAs in themodules of interest Second considering the restrictions withconducting gene knockout experiments MSLCRN uses thecausal inference method IDA to infer causal relationshipsbetween lncRNAs and mRNAs based on expression data Thepromising results suggest that exploiting modularity of generegulatory network and causality-based method could provideanother effective approach to elucidating lncRNA functions andregulatory mechanisms of human cancers
Despite the advantages of MSLCRN there is still room toimprove it First the WGCNA method only allows clusteringgenes across all samples from the matched lncRNA and mRNAexpression data In fact a class of genes may exhibit similarexpression patterns across a subset of samples An alternativesolution of this problem is to use a bi-clustering method to iden-tify lncRNAndashmRNA co-expression modules Second it is stilltime-consuming to estimate causal effects from large expres-sion data sets When constructing the module-specific lncRNAndashmRNA causal regulatory networks the running time of parallelIDA is still high on estimating the causal effects of lncRNAs onmRNAs In future more efficient parallel IDA method is neededto explore lncRNAndashmRNA causal regulatory relationships inlarge-scale expression data Third previous research [38]has shown that the prediction accuracy of lncRNAndashmRNA inter-actions can be improved by integrating both sequence data and
Table 4 Comparison results in terms of experimentally validatedlncRNAndashmRNA regulatory relationships functional MSLCRN net-works and disease-associated MSLCRN networks
Methods GBM (a b c) LSCC (a b c) OvCa (a b c) PrCa (a b c)
MSLCRN (17 15 5) (14 29 7) (20 30 6) (42 20 6)PCA-CMI (2 13 0) (0 11 0) (0 7 1) (0 20 2)PCA-PMI (2 15 1) (0 11 0) (0 8 2) (1 18 1)CMI2NI (2 15 0) (0 11 0) (0 7 1) (0 19 1)
Note afrac14number of experimentally validated lncRNAndashmRNA regulatory relation-
ships bfrac14number of functional MSLCRN networks cfrac14number of disease-asso-
ciated MSLCRN networks
14 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
expression data To improve the accuracy of the predictedlncRNAndashmRNA regulatory relationships it is necessary todevelop an ensemble method (fusing sequence-based andexpression-based methods) to infer lncRNAndashmRNA regulatorynetwork Finally recent studies [77] show that lncRNAs can actas competing endogenous RNAs (ceRNAs) or miRNA sponges toattract miRNAs for bindings by competing with mRNAsTherefore some predicted lncRNAndashmRNA regulatory relation-ships are lncRNA-related ceRNAndashceRNA interactions To furtherimprove the prediction of lncRNAndashmRNA regulatory relation-ships it is necessary to remove the crosstalk relationshipsbetween lncRNAs and mRNAs
Key Points
bull Among ncRNAs lncRNAs are a large and diverse classof RNA molecules and are thought to be a gold mine ofpotential oncogenes anti-oncogenes and newbiomarkers
bull lncRNAs exhibit dynamic positive gene regulationacross human cancers
bull Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers
bull There is still a lack of ground truth for validating pre-dicted lncRNAndashmRNA regulatory relationships
bull There is still room to develop reliable methods for elu-cidating lncRNA regulatory mechanisms
Supplementary Data
Supplementary data are available online at httpsacademicoupcombib
Funding
The National Natural Science Foundation of China (No61702069) the Applied Basic Research Foundation ofScience and Technology of Yunnan Province (No2017FB099) the NHMRC Grant (No 1123042) and theAustralian Research Council Discovery Grant (NoDP140103617)
References1 Pang KC Frith MC Mattick JS Rapid evolution of noncoding
RNAs lack of conservation does not mean lack of functionTrends Genet 200622(1)1ndash5
2 Kung JT Colognori D Lee JT Long noncoding RNAs pastpresent and future Genetics 2013193(3)651ndash69
3 Schmitt AM Chang HY Long noncoding RNAs in cancer path-ways Cancer Cell 201629(4)452ndash63
4 Zhang Y Tao Y Liao Q Long noncoding RNA a crosslink inbiological regulatory network Brief Bioinform 2017 doi 101093bibbbx042
5 Yoon JH Abdelmohsen K Gorospe M Posttranscriptionalgene regulation by long noncoding RNA J Mol Biol 2013425(19)3723ndash30
6 Gerlach W Giegerich R GUUGle a utility for fast exact match-ing under RNA complementary rules including G-U base pair-ing Bioinformatics 200622(6)762ndash4
7 Muckstein U Tafer H Hackermuller J et al Thermodynamicsof RNA-RNA binding Bioinformatics 200622(10)1177ndash82
8 Tafer H Hofacker IL RNAplex a fast tool for RNA-RNA inter-action search Bioinformatics 200824(22)2657ndash63
9 Busch A Richter AS Backofen R IntaRNA efficient predictionof bacterial sRNA targets incorporating target site accessibil-ity and seed regions Bioinformatics 200824(24)2849ndash56
10Kato Y Sato K Hamada M et al RactIP fast and accurate pre-diction of RNA-RNA interaction using integer programmingBioinformatics 201026(18)i460ndash6
11Li J Ma W Zeng P et al LncTar a tool for predicting the RNAtargets of long noncoding RNAs Brief Bioinform 201516(5)806ndash12
12Fukunaga T Hamada M RIblast an ultrafast RNA-RNA inter-action prediction system based on a seed-and-extensionapproach Bioinformatics 201733(17)2666ndash74
13Derrien T Johnson R Bussotti G et al The GENCODE v7 cata-log of human long noncoding RNAs analysis of their genestructure evolution and expression Genome Res 201222(9)1775ndash89
14Gloss BS Dinger ME The specificity of long noncoding RNAexpression Biochim Biophys Acta 20161859(1)16ndash22
15Munshi A Mohan V Ahuja YR Non-coding RNAs a dynamicand complex network of gene regulation J PharmacogenomicsPharmacoproteomics 20167156
16Liao Q Liu C Yuan X et al Large-scale prediction of long non-coding RNA functions in a coding-non-coding gene co-expression network Nucleic Acids Res 201139(9)3864ndash78
17Guo Q Cheng Y Liang T et al Comprehensive analysis oflncRNA-mRNA co-expression patterns identifies immune-associated lncRNA biomarkers in ovarian cancer malignantprogression Sci Rep 20155(1)17683
18Du Y Xia W Zhang J et al Comprehensive analysis of longnoncoding RNA-mRNA co-expression patterns in thyroidcancer Mol Biosyst 201713(10)2107ndash15
19Wu W Wagner EK Hao Y et al Tissue-specific co-expressionof long non-coding and coding RNAs associated with breastCancer Sci Rep 2016632731
20Barabasi AL Oltvai ZN Network biology understanding thecellrsquos functional organization Nat Rev Genet 20045(2)101ndash13
21Langfelder P Horvath S WGCNA an R package for weightedcorrelation network analysis BMC Bioinformatics 20089559
22Maathuis HM Kalisch M Buhlmann P Estimating high-dimensional intervention effects from observational dataAnn Stat 200937(6A)3133ndash64
23Maathuis HM Colombo D Kalisch M et al Predicting causaleffects in large-scale systems from observational data NatMethods 20107(4)247ndash8
24Le T Hoang T Li J et al A fast PC algorithm for high dimen-sional causal discovery with multi-core PCs IEEEACM TransComput Biol Bioinform 2016 doi 101109TCBB20162591526
25Du Z Fei T Verhaak RG et al Integrative genomic analysesreveal clinically relevant long noncoding RNAs in humancancer Nat Struct Mol Biol 201320(7)908ndash13
26Bernhart SH Tafer H Muckstein U et al Partition functionand base pairing probabilities of RNA heterodimersAlgorithms Mol Biol 20061(1)3
27Alkan C Karakoc E Nadeau JH et al RNA-RNA interactionprediction and antisense RNA target search J Comput Biol200613(2)267ndash82
28Seemann SE Richter AS Gesell T et al PETcofold predictingconserved interactions and structures of two multiple align-ments of RNA sequences Bioinformatics 201127(2)211ndash19
29Wenzel A Akbasli E Gorodkin J RIsearch fast RNA-RNAinteraction search using a simplified nearest-neighborenergy model Bioinformatics 201228(21)2738ndash46
Module-specific lncRNA-mRNA causal regulatory networks | 15
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
30Alkan F Wenzel A Palasca O et al RIsearch2 suffix array-based large-scale prediction of RNA-RNA interactions andsiRNA off-targets Nucleic Acids Res 201745e60
31Hu R Sun X lncRNATargets a platform for lncRNA target pre-diction based on nucleic acid thermodynamics J BioinformComput Biol 201614(4)1650016
32Terai G Iwakiri J Kameda T et al Comprehensive predictionof lncRNA-RNA interactions in human transcriptome BMCGenomics 201617(Suppl 1)12
33Liu J Wu S Li M et al LncRNA expression profiles reveal theco-expression network in human colorectal carcinoma Int JClin Exp Pathol 201691885ndash1892
34Huang S Feng C Chen L et al Identification of potential keylong non-coding RNAs and target genes associated withpneumonia using long non-coding RNA sequencing (lncRNA-Seq) a preliminary study Med Sci Monit 2016223394ndash408
35Li J Xu Y Xu J et al Dynamic co-expression network analysisof lncRNAs and mRNAs associated with venous congestionMol Med Rep 201614(3)2045ndash51
36Fu M Huang G Zhang Z et al Expression profile of long non-coding RNAs in cartilage from knee osteoarthritis patientsOsteoarthritis Cartilage 201523(3)423ndash32
37Zhang F Gao C Ma XF et al Expression profile of long non-coding RNAs in peripheral blood mononuclear cells frommultiple sclerosis patients CNS Neurosci Ther 201622(4)298ndash305
38 Iwakiri J Terai G Hamada M Computational prediction oflncRNA-mRNA interactionsby integrating tissue specificity inhuman transcriptome Biol Direct 201712(1)15
39Lv L Wei M Lin P et al Integrated mRNA and lncRNA expres-sion profiling for exploring metastatic biomarkers of humanintrahepatic cholangiocarcinoma Am J Cancer Res 20177688ndash99
40Hao Y Wu W Li H et al NPInter v30 an upgraded databaseof noncoding RNA-associated interactions Database 20162016baw057
41Chen G Wang Z Wang D et al LncRNADisease a databasefor long-non-coding RNA-associated diseases Nucleic AcidsRes 201341D983ndash6
42 Jiang Q Wang J Wu X et al LncRNA2Target a database fordifferentially expressed genes after lncRNA knockdown oroverexpression Nucleic Acids Res 201543D193ndash6
43Zhou Z Shen Y Khan MR et al LncReg a reference resourcefor lncRNA-associated regulatory networks Database 20152015bav083
44Denisenko E Ho D Tamgue O et al IRNdb the database ofimmunologically relevant non-coding RNAs Database 20162016baw138
45Liu CJ Gao C Ma Z et al lncRInter a database of experimen-tally validated long non-coding RNA interaction J GenetGenomics 201744(5)265ndash8
46Li JH Liu S Zhou H et al starBase v20 decoding miRNA-ceRNA miRNA-ncRNA and protein-RNA interaction net-works from large-scale CLIP-Seq data Nucleic Acids Res 201442(D1)D92ndash7
47Liu Y Zhao M lnCaNet pan-cancer co-expression networkfor human lncRNA and cancer genes Bioinformatics 201632(10)1595ndash7
48Zhou QZ Zhang B Yu QY et al BmncRNAdb a comprehen-sive database of non-coding RNAs in the silkworm Bombyxmori BMC Bioinformatics 201617(1)370
49Park C Yu N Choi I et al lncRNAtor a comprehensiveresource for functional investigation of long non-codingRNAs Bioinformatics 201430(17)2480ndash5
50Bhartiya D Pal K Ghosh S et al lncRNome a comprehensiveknowledgebase of human long noncoding RNAs Database20132013bat034
51Zhao Z Bai J Wu A et al Co-LncRNA investigating thelncRNA combinatorial effects in GO annotations and KEGGpathways based on human RNA-Seq data Database 20152015bav082
52 Jiang Q Ma R Wang J et al LncRNA2Function a compre-hensive resource for functional investigation of humanlncRNAs based on RNA-seq data BMC Genomics 201516(Suppl 3)S2
53Chan WL Huang HD Chang JG lncRNAMap a map of puta-tive regulatory functions in the long non-coding transcrip-tome Comput Biol Chem 20145041ndash9
54Langfelder P Horvath S Fast R functions for robust correla-tions and hierarchical clustering J Stat Softw 2012461ndash17
55 Judea P Causality Models Reasoning and Inference New YorkNY Cambridge University Press 2000
56Spirtes P Glymour C Scheines R Causation Prediction andSearch 2nd edn Cambridge MIT Press 2000
57Le T Hoang T Li J et al ParallelPC an R package for efficientconstraint based causal exploration arXiv prepring 2015arXiv151003042v1
58Hahn MW Kern AD Comparative genomics of centrality andessentiality in three eukaryotic protein-interaction networksMol Biol Evol 200522(4)803ndash6
59Song J Singh M Roth FP From hub proteins to hub modulesthe relationship between essentiality and centrality in theyeast interactome at different scales of organization PLoSComput Biol 20139(2)e1002910
60Therneau TM Grambsch PM Modeling Survival Data Extendingthe Cox Model New York Springer Press 2000
61Yu G Wang L-G Han Y He Q-Y clusterProfiler an R packagefor comparing biological themes among gene clusters OMICS201216(5)284ndash7
62Ashburner M Ball CA Blake JA et al Gene ontology tool forthe unification of biology Nat Genet 200025(1)25ndash9
63Kanehisa M Goto S KEGG Kyoto Encyclopedia of Genes andGenomes Nucleic Acids Res 200028(1)27ndash30
64Ning S Zhang J Wang P et al Lnc2Cancer a manually curateddatabase of experimentally supported lncRNAs associatedwith various human cancers Nucleic Acids Res 201644(D1)D980ndash5
65Wang Y Chen L Chen B et al Mammalian ncRNA-diseaserepository a global view of ncRNA-mediated disease net-work Cell Death Dis 20134e765
66Pi~nero J Bravo A Queralt-Rosinach N et al DisGeNET a com-prehensive platform integrating information on humandisease-associated genes and variants Nucleic Acids Res 201745(D1)D833ndash9
67Conway JR Lex A Gehlenborg N UpSetR an R package for thevisualization of intersecting sets and their propertiesBioinformatics 201733(18)2938ndash40
68Wahlestedt C Targeting long non-coding RNA to therapeuti-cally upregulate gene expression Nat Rev Drug Discov 201312(6)433ndash46
69Mantovani G Maccio A Lai P et al Cytokine activity incancer-related anorexiacachexia role of megestrol acetateand medroxyprogesterone acetate Semin Oncol 19982545ndash52
70Dorsam RT Gutkind JS G-protein-coupled receptors and can-cer Nat Rev Cancer 20077(2)79ndash94
71Wang X Lin Y Tumor necrosis factor and cancer buddies orfoes Acta Pharmacol Sin 200829(11)1275ndash88
16 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
72Fajardo AM Piazza GA Tinsley HN The role of cyclic nucleo-tide signaling pathways in cancer targets for prevention andtreatment Cancers 20146(1)436ndash58
73Hanahan D Weinberg RA Hallmarks of cancer the next gen-eration Cell 2011144(5)646ndash74
74Zhang X Zhao XM He K et al Inferring gene regulatory net-works from gene expression data by path consistencyalgorithm based on conditional mutual informationBioinformatics 201228(1)98ndash104
75Zhao J Zhou Y Zhang X et al Part mutual information forquantifying direct associations in networks Proc Natl Acad SciUSA 2016113(18)5130ndash5
76Zhang X Zhao J Hao JK et al Conditional mutual inclusiveinformation enables accurate quantification of associationsin gene regulatory networks Nucleic Acids Re 201543(5)e31
77Le TD Zhang J Liu L et al Computational methods for identi-fying miRNA sponge interactions Brief Bioinform 201718(4)577ndash90
Module-specific lncRNA-mRNA causal regulatory networks | 17
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018View publication statsView publication stats
We further apply a representative sequence-based methodcalled LncTar [11] to the experimentally validated lncRNAndashmRNAcausal regulatory relationships discovered by MSLCRN There aretwo main reasons for choosing LncTar First LncTar does nothave a limit to input RNA size Second LncTar uses a quantitativestandard rather than expert knowledge to determine whetherlncRNAs interact with mRNAs Similar to LncTar we also set -01as normalized binding free energy (ndG) cutoff to determinewhether lncRNAndashmRNA pairs interact with each other In otherwords the lncRNAndashmRNA pairs with ndG01 are regarded as
lncRNAndashmRNA regulatory relationships Among the experimen-tally confirmed lncRNAndashmRNA causal regulatory relationships thatare discovered by MSLCRN the numbers of successfully predictedlncRNAndashmRNA regulations using LncTar are 0 0 1 and 1 in GBMLSCC OvCa and PrCa respectively (details in SupplementaryFile S4) The result indicates that our experimentally confirmedlncRNAndashmRNA causal regulations are mostly bad hits for LncTarMeanwhile this result also suggests that expression-based andsequence-based methods may be complementary with each otherin predicting lncRNAndashmRNA regulations
A
B
Figure 6 Survival analysis of hub lncRNAs (A) Conserved hub lncRNAs in GBM LSCC OvCa and PrCa datasets (B) Survival analysis of cancer-specific hub lncRNAs
Module-specific lncRNA-mRNA causal regulatory networks | 13
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
MSLCRN networks are biologically meaningful
In this section we conduct GO and KEGG enrichment analysisto check whether the MSLCRN networks are associated withsome biological processes and pathways significantlyEnrichment analysis uncovers that 15 of the 23 (6522)MSLCRN networks in GBM 29 of the 38 (7632) MSLCRN net-works in LSCC 30 of the 45 (6667) MSLCRN networks inOvCa and 20 of the 32 (6250) MSLCRN networks in PrCa aresignificantly enriched in at least one GO biological process orKEGG pathway respectively (details in Supplementary File S5)This result implies that most of the MSLCRN networks in eachcancer are functional networks
We further investigate whether the MSLCRN networks aresignificantly enriched in GBM LSCC OvCa and PrCa diseasesrespectively We discover that 5 of the 23 MSLCRN networks7 of the 38 MSLCRN networks 6 of the 45 MSLCRN networks and6 of the 32 MSLCRN networks are significantly enriched in GBMLSCC OvCa and PrCa diseases respectively (details inSupplementary File S5) This result indicates that severalMSLCRN networks are closely associated with GBM LSCC OvCaand PrCa diseases
Altogether functional and disease enrichment analysis resultsshow that MSLCRN networks are biologically meaningful
Comparison with other PC-based networkinference methods
Based on a parallel version of the PC algorithm [56] the parallelIDA method in the second step of MSLCRN learns the causalstructure from expression data Owing to the popularity of thePC algorithm in causal structure learning some other networkinference methods including PCA-CMI [74] PCA-PMI [75] andCMI2NI [76] have also successfully applied it for network infer-ence Different from the three methods using conditional orpartial mutual information to infer lncRNAndashmRNA regulationsour method estimates causal effects to identify lncRNAndashmRNAregulations For comparisons we also use the PCA-CMI PCA-PMI and CMI2NI methods to infer module-specific lncRNAndashmRNA regulatory relationships Similar to our method (whichuses the parallel IDA method) the strength cutoff of lncRNAndashmRNA regulatory relationships in PCA-CMI PCA-PMI andCMI2NI methods is also set to 045
We evaluate the performance of each method in terms offinding experimentally validated lncRNAndashmRNA regulatoryrelationships functional MSLCRN networks and disease-associated MSLCRN networks As shown in Table 4 in terms ofthe three criteria MSLCRN performs the best in GBM LSCCOvCa and PrCa data sets This result suggests that MSLCRN is auseful method to infer module-specific lncRNAndashmRNA regula-tory network in human cancers
Conclusions and discussion
Notwithstanding lncRNAs do not encode proteins directly theyengage in a wide range of biological processes including cancerdevelopments through their interactions with other biologicalmacromolecules eg DNA RNA and protein Therefore touncover the functions and regulatory mechanisms of lncRNAsit is necessary to investigate lncRNAndashtarget regulatory networkacross different types of biological conditions
As a biological network the lncRNAndashtarget regulatory net-work exhibits a high degree of modularity Each functionalmodule is responsible for implementing specific biological
functions Moreover modularity is an important feature ofhuman cancer development and progression Thus from a net-work community point of view it is necessary to investigatemodule-specific lncRNAndashmRNA regulatory networks
Until now several statistical correlation or associationmeasures eg Pearson Mutual Information and ConditionalMutual Information have been used to infer gene regulatorynetworks However these methods tend to identify indirect reg-ulatory relationships between genes The identified gene regu-latory networks cannot reflect real lsquocausalrsquo regulatoryrelationships To better understand lncRNA regulatory mecha-nism it is vital to investigate how lncRNAs causally influencethe expression levels of their target mRNAs
In this work the computational methods for inferringlncRNAndashmRNA interactions and the publicly available data-bases of lncRNAndashmRNA regulatory relationships are firstreviewed Then to address the above two issues we propose anovel computational method MSLCRN to study module-specific lncRNAndashmRNA causal regulatory networks across GBMLSCC OvCa and PrCa diseases In contrast to other approaches(expression-based and sequence-based methods) MSLCRN hastwo unique features First MSLCRN considers the modularity oflncRNAndashmRNA regulatory networks Instead of studying globalregulatory relationships between lncRNAs and mRNAs wefocus on investigating the regulatory behavior of lncRNAs in themodules of interest Second considering the restrictions withconducting gene knockout experiments MSLCRN uses thecausal inference method IDA to infer causal relationshipsbetween lncRNAs and mRNAs based on expression data Thepromising results suggest that exploiting modularity of generegulatory network and causality-based method could provideanother effective approach to elucidating lncRNA functions andregulatory mechanisms of human cancers
Despite the advantages of MSLCRN there is still room toimprove it First the WGCNA method only allows clusteringgenes across all samples from the matched lncRNA and mRNAexpression data In fact a class of genes may exhibit similarexpression patterns across a subset of samples An alternativesolution of this problem is to use a bi-clustering method to iden-tify lncRNAndashmRNA co-expression modules Second it is stilltime-consuming to estimate causal effects from large expres-sion data sets When constructing the module-specific lncRNAndashmRNA causal regulatory networks the running time of parallelIDA is still high on estimating the causal effects of lncRNAs onmRNAs In future more efficient parallel IDA method is neededto explore lncRNAndashmRNA causal regulatory relationships inlarge-scale expression data Third previous research [38]has shown that the prediction accuracy of lncRNAndashmRNA inter-actions can be improved by integrating both sequence data and
Table 4 Comparison results in terms of experimentally validatedlncRNAndashmRNA regulatory relationships functional MSLCRN net-works and disease-associated MSLCRN networks
Methods GBM (a b c) LSCC (a b c) OvCa (a b c) PrCa (a b c)
MSLCRN (17 15 5) (14 29 7) (20 30 6) (42 20 6)PCA-CMI (2 13 0) (0 11 0) (0 7 1) (0 20 2)PCA-PMI (2 15 1) (0 11 0) (0 8 2) (1 18 1)CMI2NI (2 15 0) (0 11 0) (0 7 1) (0 19 1)
Note afrac14number of experimentally validated lncRNAndashmRNA regulatory relation-
ships bfrac14number of functional MSLCRN networks cfrac14number of disease-asso-
ciated MSLCRN networks
14 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
expression data To improve the accuracy of the predictedlncRNAndashmRNA regulatory relationships it is necessary todevelop an ensemble method (fusing sequence-based andexpression-based methods) to infer lncRNAndashmRNA regulatorynetwork Finally recent studies [77] show that lncRNAs can actas competing endogenous RNAs (ceRNAs) or miRNA sponges toattract miRNAs for bindings by competing with mRNAsTherefore some predicted lncRNAndashmRNA regulatory relation-ships are lncRNA-related ceRNAndashceRNA interactions To furtherimprove the prediction of lncRNAndashmRNA regulatory relation-ships it is necessary to remove the crosstalk relationshipsbetween lncRNAs and mRNAs
Key Points
bull Among ncRNAs lncRNAs are a large and diverse classof RNA molecules and are thought to be a gold mine ofpotential oncogenes anti-oncogenes and newbiomarkers
bull lncRNAs exhibit dynamic positive gene regulationacross human cancers
bull Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers
bull There is still a lack of ground truth for validating pre-dicted lncRNAndashmRNA regulatory relationships
bull There is still room to develop reliable methods for elu-cidating lncRNA regulatory mechanisms
Supplementary Data
Supplementary data are available online at httpsacademicoupcombib
Funding
The National Natural Science Foundation of China (No61702069) the Applied Basic Research Foundation ofScience and Technology of Yunnan Province (No2017FB099) the NHMRC Grant (No 1123042) and theAustralian Research Council Discovery Grant (NoDP140103617)
References1 Pang KC Frith MC Mattick JS Rapid evolution of noncoding
RNAs lack of conservation does not mean lack of functionTrends Genet 200622(1)1ndash5
2 Kung JT Colognori D Lee JT Long noncoding RNAs pastpresent and future Genetics 2013193(3)651ndash69
3 Schmitt AM Chang HY Long noncoding RNAs in cancer path-ways Cancer Cell 201629(4)452ndash63
4 Zhang Y Tao Y Liao Q Long noncoding RNA a crosslink inbiological regulatory network Brief Bioinform 2017 doi 101093bibbbx042
5 Yoon JH Abdelmohsen K Gorospe M Posttranscriptionalgene regulation by long noncoding RNA J Mol Biol 2013425(19)3723ndash30
6 Gerlach W Giegerich R GUUGle a utility for fast exact match-ing under RNA complementary rules including G-U base pair-ing Bioinformatics 200622(6)762ndash4
7 Muckstein U Tafer H Hackermuller J et al Thermodynamicsof RNA-RNA binding Bioinformatics 200622(10)1177ndash82
8 Tafer H Hofacker IL RNAplex a fast tool for RNA-RNA inter-action search Bioinformatics 200824(22)2657ndash63
9 Busch A Richter AS Backofen R IntaRNA efficient predictionof bacterial sRNA targets incorporating target site accessibil-ity and seed regions Bioinformatics 200824(24)2849ndash56
10Kato Y Sato K Hamada M et al RactIP fast and accurate pre-diction of RNA-RNA interaction using integer programmingBioinformatics 201026(18)i460ndash6
11Li J Ma W Zeng P et al LncTar a tool for predicting the RNAtargets of long noncoding RNAs Brief Bioinform 201516(5)806ndash12
12Fukunaga T Hamada M RIblast an ultrafast RNA-RNA inter-action prediction system based on a seed-and-extensionapproach Bioinformatics 201733(17)2666ndash74
13Derrien T Johnson R Bussotti G et al The GENCODE v7 cata-log of human long noncoding RNAs analysis of their genestructure evolution and expression Genome Res 201222(9)1775ndash89
14Gloss BS Dinger ME The specificity of long noncoding RNAexpression Biochim Biophys Acta 20161859(1)16ndash22
15Munshi A Mohan V Ahuja YR Non-coding RNAs a dynamicand complex network of gene regulation J PharmacogenomicsPharmacoproteomics 20167156
16Liao Q Liu C Yuan X et al Large-scale prediction of long non-coding RNA functions in a coding-non-coding gene co-expression network Nucleic Acids Res 201139(9)3864ndash78
17Guo Q Cheng Y Liang T et al Comprehensive analysis oflncRNA-mRNA co-expression patterns identifies immune-associated lncRNA biomarkers in ovarian cancer malignantprogression Sci Rep 20155(1)17683
18Du Y Xia W Zhang J et al Comprehensive analysis of longnoncoding RNA-mRNA co-expression patterns in thyroidcancer Mol Biosyst 201713(10)2107ndash15
19Wu W Wagner EK Hao Y et al Tissue-specific co-expressionof long non-coding and coding RNAs associated with breastCancer Sci Rep 2016632731
20Barabasi AL Oltvai ZN Network biology understanding thecellrsquos functional organization Nat Rev Genet 20045(2)101ndash13
21Langfelder P Horvath S WGCNA an R package for weightedcorrelation network analysis BMC Bioinformatics 20089559
22Maathuis HM Kalisch M Buhlmann P Estimating high-dimensional intervention effects from observational dataAnn Stat 200937(6A)3133ndash64
23Maathuis HM Colombo D Kalisch M et al Predicting causaleffects in large-scale systems from observational data NatMethods 20107(4)247ndash8
24Le T Hoang T Li J et al A fast PC algorithm for high dimen-sional causal discovery with multi-core PCs IEEEACM TransComput Biol Bioinform 2016 doi 101109TCBB20162591526
25Du Z Fei T Verhaak RG et al Integrative genomic analysesreveal clinically relevant long noncoding RNAs in humancancer Nat Struct Mol Biol 201320(7)908ndash13
26Bernhart SH Tafer H Muckstein U et al Partition functionand base pairing probabilities of RNA heterodimersAlgorithms Mol Biol 20061(1)3
27Alkan C Karakoc E Nadeau JH et al RNA-RNA interactionprediction and antisense RNA target search J Comput Biol200613(2)267ndash82
28Seemann SE Richter AS Gesell T et al PETcofold predictingconserved interactions and structures of two multiple align-ments of RNA sequences Bioinformatics 201127(2)211ndash19
29Wenzel A Akbasli E Gorodkin J RIsearch fast RNA-RNAinteraction search using a simplified nearest-neighborenergy model Bioinformatics 201228(21)2738ndash46
Module-specific lncRNA-mRNA causal regulatory networks | 15
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
30Alkan F Wenzel A Palasca O et al RIsearch2 suffix array-based large-scale prediction of RNA-RNA interactions andsiRNA off-targets Nucleic Acids Res 201745e60
31Hu R Sun X lncRNATargets a platform for lncRNA target pre-diction based on nucleic acid thermodynamics J BioinformComput Biol 201614(4)1650016
32Terai G Iwakiri J Kameda T et al Comprehensive predictionof lncRNA-RNA interactions in human transcriptome BMCGenomics 201617(Suppl 1)12
33Liu J Wu S Li M et al LncRNA expression profiles reveal theco-expression network in human colorectal carcinoma Int JClin Exp Pathol 201691885ndash1892
34Huang S Feng C Chen L et al Identification of potential keylong non-coding RNAs and target genes associated withpneumonia using long non-coding RNA sequencing (lncRNA-Seq) a preliminary study Med Sci Monit 2016223394ndash408
35Li J Xu Y Xu J et al Dynamic co-expression network analysisof lncRNAs and mRNAs associated with venous congestionMol Med Rep 201614(3)2045ndash51
36Fu M Huang G Zhang Z et al Expression profile of long non-coding RNAs in cartilage from knee osteoarthritis patientsOsteoarthritis Cartilage 201523(3)423ndash32
37Zhang F Gao C Ma XF et al Expression profile of long non-coding RNAs in peripheral blood mononuclear cells frommultiple sclerosis patients CNS Neurosci Ther 201622(4)298ndash305
38 Iwakiri J Terai G Hamada M Computational prediction oflncRNA-mRNA interactionsby integrating tissue specificity inhuman transcriptome Biol Direct 201712(1)15
39Lv L Wei M Lin P et al Integrated mRNA and lncRNA expres-sion profiling for exploring metastatic biomarkers of humanintrahepatic cholangiocarcinoma Am J Cancer Res 20177688ndash99
40Hao Y Wu W Li H et al NPInter v30 an upgraded databaseof noncoding RNA-associated interactions Database 20162016baw057
41Chen G Wang Z Wang D et al LncRNADisease a databasefor long-non-coding RNA-associated diseases Nucleic AcidsRes 201341D983ndash6
42 Jiang Q Wang J Wu X et al LncRNA2Target a database fordifferentially expressed genes after lncRNA knockdown oroverexpression Nucleic Acids Res 201543D193ndash6
43Zhou Z Shen Y Khan MR et al LncReg a reference resourcefor lncRNA-associated regulatory networks Database 20152015bav083
44Denisenko E Ho D Tamgue O et al IRNdb the database ofimmunologically relevant non-coding RNAs Database 20162016baw138
45Liu CJ Gao C Ma Z et al lncRInter a database of experimen-tally validated long non-coding RNA interaction J GenetGenomics 201744(5)265ndash8
46Li JH Liu S Zhou H et al starBase v20 decoding miRNA-ceRNA miRNA-ncRNA and protein-RNA interaction net-works from large-scale CLIP-Seq data Nucleic Acids Res 201442(D1)D92ndash7
47Liu Y Zhao M lnCaNet pan-cancer co-expression networkfor human lncRNA and cancer genes Bioinformatics 201632(10)1595ndash7
48Zhou QZ Zhang B Yu QY et al BmncRNAdb a comprehen-sive database of non-coding RNAs in the silkworm Bombyxmori BMC Bioinformatics 201617(1)370
49Park C Yu N Choi I et al lncRNAtor a comprehensiveresource for functional investigation of long non-codingRNAs Bioinformatics 201430(17)2480ndash5
50Bhartiya D Pal K Ghosh S et al lncRNome a comprehensiveknowledgebase of human long noncoding RNAs Database20132013bat034
51Zhao Z Bai J Wu A et al Co-LncRNA investigating thelncRNA combinatorial effects in GO annotations and KEGGpathways based on human RNA-Seq data Database 20152015bav082
52 Jiang Q Ma R Wang J et al LncRNA2Function a compre-hensive resource for functional investigation of humanlncRNAs based on RNA-seq data BMC Genomics 201516(Suppl 3)S2
53Chan WL Huang HD Chang JG lncRNAMap a map of puta-tive regulatory functions in the long non-coding transcrip-tome Comput Biol Chem 20145041ndash9
54Langfelder P Horvath S Fast R functions for robust correla-tions and hierarchical clustering J Stat Softw 2012461ndash17
55 Judea P Causality Models Reasoning and Inference New YorkNY Cambridge University Press 2000
56Spirtes P Glymour C Scheines R Causation Prediction andSearch 2nd edn Cambridge MIT Press 2000
57Le T Hoang T Li J et al ParallelPC an R package for efficientconstraint based causal exploration arXiv prepring 2015arXiv151003042v1
58Hahn MW Kern AD Comparative genomics of centrality andessentiality in three eukaryotic protein-interaction networksMol Biol Evol 200522(4)803ndash6
59Song J Singh M Roth FP From hub proteins to hub modulesthe relationship between essentiality and centrality in theyeast interactome at different scales of organization PLoSComput Biol 20139(2)e1002910
60Therneau TM Grambsch PM Modeling Survival Data Extendingthe Cox Model New York Springer Press 2000
61Yu G Wang L-G Han Y He Q-Y clusterProfiler an R packagefor comparing biological themes among gene clusters OMICS201216(5)284ndash7
62Ashburner M Ball CA Blake JA et al Gene ontology tool forthe unification of biology Nat Genet 200025(1)25ndash9
63Kanehisa M Goto S KEGG Kyoto Encyclopedia of Genes andGenomes Nucleic Acids Res 200028(1)27ndash30
64Ning S Zhang J Wang P et al Lnc2Cancer a manually curateddatabase of experimentally supported lncRNAs associatedwith various human cancers Nucleic Acids Res 201644(D1)D980ndash5
65Wang Y Chen L Chen B et al Mammalian ncRNA-diseaserepository a global view of ncRNA-mediated disease net-work Cell Death Dis 20134e765
66Pi~nero J Bravo A Queralt-Rosinach N et al DisGeNET a com-prehensive platform integrating information on humandisease-associated genes and variants Nucleic Acids Res 201745(D1)D833ndash9
67Conway JR Lex A Gehlenborg N UpSetR an R package for thevisualization of intersecting sets and their propertiesBioinformatics 201733(18)2938ndash40
68Wahlestedt C Targeting long non-coding RNA to therapeuti-cally upregulate gene expression Nat Rev Drug Discov 201312(6)433ndash46
69Mantovani G Maccio A Lai P et al Cytokine activity incancer-related anorexiacachexia role of megestrol acetateand medroxyprogesterone acetate Semin Oncol 19982545ndash52
70Dorsam RT Gutkind JS G-protein-coupled receptors and can-cer Nat Rev Cancer 20077(2)79ndash94
71Wang X Lin Y Tumor necrosis factor and cancer buddies orfoes Acta Pharmacol Sin 200829(11)1275ndash88
16 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
72Fajardo AM Piazza GA Tinsley HN The role of cyclic nucleo-tide signaling pathways in cancer targets for prevention andtreatment Cancers 20146(1)436ndash58
73Hanahan D Weinberg RA Hallmarks of cancer the next gen-eration Cell 2011144(5)646ndash74
74Zhang X Zhao XM He K et al Inferring gene regulatory net-works from gene expression data by path consistencyalgorithm based on conditional mutual informationBioinformatics 201228(1)98ndash104
75Zhao J Zhou Y Zhang X et al Part mutual information forquantifying direct associations in networks Proc Natl Acad SciUSA 2016113(18)5130ndash5
76Zhang X Zhao J Hao JK et al Conditional mutual inclusiveinformation enables accurate quantification of associationsin gene regulatory networks Nucleic Acids Re 201543(5)e31
77Le TD Zhang J Liu L et al Computational methods for identi-fying miRNA sponge interactions Brief Bioinform 201718(4)577ndash90
Module-specific lncRNA-mRNA causal regulatory networks | 17
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018View publication statsView publication stats
MSLCRN networks are biologically meaningful
In this section we conduct GO and KEGG enrichment analysisto check whether the MSLCRN networks are associated withsome biological processes and pathways significantlyEnrichment analysis uncovers that 15 of the 23 (6522)MSLCRN networks in GBM 29 of the 38 (7632) MSLCRN net-works in LSCC 30 of the 45 (6667) MSLCRN networks inOvCa and 20 of the 32 (6250) MSLCRN networks in PrCa aresignificantly enriched in at least one GO biological process orKEGG pathway respectively (details in Supplementary File S5)This result implies that most of the MSLCRN networks in eachcancer are functional networks
We further investigate whether the MSLCRN networks aresignificantly enriched in GBM LSCC OvCa and PrCa diseasesrespectively We discover that 5 of the 23 MSLCRN networks7 of the 38 MSLCRN networks 6 of the 45 MSLCRN networks and6 of the 32 MSLCRN networks are significantly enriched in GBMLSCC OvCa and PrCa diseases respectively (details inSupplementary File S5) This result indicates that severalMSLCRN networks are closely associated with GBM LSCC OvCaand PrCa diseases
Altogether functional and disease enrichment analysis resultsshow that MSLCRN networks are biologically meaningful
Comparison with other PC-based networkinference methods
Based on a parallel version of the PC algorithm [56] the parallelIDA method in the second step of MSLCRN learns the causalstructure from expression data Owing to the popularity of thePC algorithm in causal structure learning some other networkinference methods including PCA-CMI [74] PCA-PMI [75] andCMI2NI [76] have also successfully applied it for network infer-ence Different from the three methods using conditional orpartial mutual information to infer lncRNAndashmRNA regulationsour method estimates causal effects to identify lncRNAndashmRNAregulations For comparisons we also use the PCA-CMI PCA-PMI and CMI2NI methods to infer module-specific lncRNAndashmRNA regulatory relationships Similar to our method (whichuses the parallel IDA method) the strength cutoff of lncRNAndashmRNA regulatory relationships in PCA-CMI PCA-PMI andCMI2NI methods is also set to 045
We evaluate the performance of each method in terms offinding experimentally validated lncRNAndashmRNA regulatoryrelationships functional MSLCRN networks and disease-associated MSLCRN networks As shown in Table 4 in terms ofthe three criteria MSLCRN performs the best in GBM LSCCOvCa and PrCa data sets This result suggests that MSLCRN is auseful method to infer module-specific lncRNAndashmRNA regula-tory network in human cancers
Conclusions and discussion
Notwithstanding lncRNAs do not encode proteins directly theyengage in a wide range of biological processes including cancerdevelopments through their interactions with other biologicalmacromolecules eg DNA RNA and protein Therefore touncover the functions and regulatory mechanisms of lncRNAsit is necessary to investigate lncRNAndashtarget regulatory networkacross different types of biological conditions
As a biological network the lncRNAndashtarget regulatory net-work exhibits a high degree of modularity Each functionalmodule is responsible for implementing specific biological
functions Moreover modularity is an important feature ofhuman cancer development and progression Thus from a net-work community point of view it is necessary to investigatemodule-specific lncRNAndashmRNA regulatory networks
Until now several statistical correlation or associationmeasures eg Pearson Mutual Information and ConditionalMutual Information have been used to infer gene regulatorynetworks However these methods tend to identify indirect reg-ulatory relationships between genes The identified gene regu-latory networks cannot reflect real lsquocausalrsquo regulatoryrelationships To better understand lncRNA regulatory mecha-nism it is vital to investigate how lncRNAs causally influencethe expression levels of their target mRNAs
In this work the computational methods for inferringlncRNAndashmRNA interactions and the publicly available data-bases of lncRNAndashmRNA regulatory relationships are firstreviewed Then to address the above two issues we propose anovel computational method MSLCRN to study module-specific lncRNAndashmRNA causal regulatory networks across GBMLSCC OvCa and PrCa diseases In contrast to other approaches(expression-based and sequence-based methods) MSLCRN hastwo unique features First MSLCRN considers the modularity oflncRNAndashmRNA regulatory networks Instead of studying globalregulatory relationships between lncRNAs and mRNAs wefocus on investigating the regulatory behavior of lncRNAs in themodules of interest Second considering the restrictions withconducting gene knockout experiments MSLCRN uses thecausal inference method IDA to infer causal relationshipsbetween lncRNAs and mRNAs based on expression data Thepromising results suggest that exploiting modularity of generegulatory network and causality-based method could provideanother effective approach to elucidating lncRNA functions andregulatory mechanisms of human cancers
Despite the advantages of MSLCRN there is still room toimprove it First the WGCNA method only allows clusteringgenes across all samples from the matched lncRNA and mRNAexpression data In fact a class of genes may exhibit similarexpression patterns across a subset of samples An alternativesolution of this problem is to use a bi-clustering method to iden-tify lncRNAndashmRNA co-expression modules Second it is stilltime-consuming to estimate causal effects from large expres-sion data sets When constructing the module-specific lncRNAndashmRNA causal regulatory networks the running time of parallelIDA is still high on estimating the causal effects of lncRNAs onmRNAs In future more efficient parallel IDA method is neededto explore lncRNAndashmRNA causal regulatory relationships inlarge-scale expression data Third previous research [38]has shown that the prediction accuracy of lncRNAndashmRNA inter-actions can be improved by integrating both sequence data and
Table 4 Comparison results in terms of experimentally validatedlncRNAndashmRNA regulatory relationships functional MSLCRN net-works and disease-associated MSLCRN networks
Methods GBM (a b c) LSCC (a b c) OvCa (a b c) PrCa (a b c)
MSLCRN (17 15 5) (14 29 7) (20 30 6) (42 20 6)PCA-CMI (2 13 0) (0 11 0) (0 7 1) (0 20 2)PCA-PMI (2 15 1) (0 11 0) (0 8 2) (1 18 1)CMI2NI (2 15 0) (0 11 0) (0 7 1) (0 19 1)
Note afrac14number of experimentally validated lncRNAndashmRNA regulatory relation-
ships bfrac14number of functional MSLCRN networks cfrac14number of disease-asso-
ciated MSLCRN networks
14 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
expression data To improve the accuracy of the predictedlncRNAndashmRNA regulatory relationships it is necessary todevelop an ensemble method (fusing sequence-based andexpression-based methods) to infer lncRNAndashmRNA regulatorynetwork Finally recent studies [77] show that lncRNAs can actas competing endogenous RNAs (ceRNAs) or miRNA sponges toattract miRNAs for bindings by competing with mRNAsTherefore some predicted lncRNAndashmRNA regulatory relation-ships are lncRNA-related ceRNAndashceRNA interactions To furtherimprove the prediction of lncRNAndashmRNA regulatory relation-ships it is necessary to remove the crosstalk relationshipsbetween lncRNAs and mRNAs
Key Points
bull Among ncRNAs lncRNAs are a large and diverse classof RNA molecules and are thought to be a gold mine ofpotential oncogenes anti-oncogenes and newbiomarkers
bull lncRNAs exhibit dynamic positive gene regulationacross human cancers
bull Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers
bull There is still a lack of ground truth for validating pre-dicted lncRNAndashmRNA regulatory relationships
bull There is still room to develop reliable methods for elu-cidating lncRNA regulatory mechanisms
Supplementary Data
Supplementary data are available online at httpsacademicoupcombib
Funding
The National Natural Science Foundation of China (No61702069) the Applied Basic Research Foundation ofScience and Technology of Yunnan Province (No2017FB099) the NHMRC Grant (No 1123042) and theAustralian Research Council Discovery Grant (NoDP140103617)
References1 Pang KC Frith MC Mattick JS Rapid evolution of noncoding
RNAs lack of conservation does not mean lack of functionTrends Genet 200622(1)1ndash5
2 Kung JT Colognori D Lee JT Long noncoding RNAs pastpresent and future Genetics 2013193(3)651ndash69
3 Schmitt AM Chang HY Long noncoding RNAs in cancer path-ways Cancer Cell 201629(4)452ndash63
4 Zhang Y Tao Y Liao Q Long noncoding RNA a crosslink inbiological regulatory network Brief Bioinform 2017 doi 101093bibbbx042
5 Yoon JH Abdelmohsen K Gorospe M Posttranscriptionalgene regulation by long noncoding RNA J Mol Biol 2013425(19)3723ndash30
6 Gerlach W Giegerich R GUUGle a utility for fast exact match-ing under RNA complementary rules including G-U base pair-ing Bioinformatics 200622(6)762ndash4
7 Muckstein U Tafer H Hackermuller J et al Thermodynamicsof RNA-RNA binding Bioinformatics 200622(10)1177ndash82
8 Tafer H Hofacker IL RNAplex a fast tool for RNA-RNA inter-action search Bioinformatics 200824(22)2657ndash63
9 Busch A Richter AS Backofen R IntaRNA efficient predictionof bacterial sRNA targets incorporating target site accessibil-ity and seed regions Bioinformatics 200824(24)2849ndash56
10Kato Y Sato K Hamada M et al RactIP fast and accurate pre-diction of RNA-RNA interaction using integer programmingBioinformatics 201026(18)i460ndash6
11Li J Ma W Zeng P et al LncTar a tool for predicting the RNAtargets of long noncoding RNAs Brief Bioinform 201516(5)806ndash12
12Fukunaga T Hamada M RIblast an ultrafast RNA-RNA inter-action prediction system based on a seed-and-extensionapproach Bioinformatics 201733(17)2666ndash74
13Derrien T Johnson R Bussotti G et al The GENCODE v7 cata-log of human long noncoding RNAs analysis of their genestructure evolution and expression Genome Res 201222(9)1775ndash89
14Gloss BS Dinger ME The specificity of long noncoding RNAexpression Biochim Biophys Acta 20161859(1)16ndash22
15Munshi A Mohan V Ahuja YR Non-coding RNAs a dynamicand complex network of gene regulation J PharmacogenomicsPharmacoproteomics 20167156
16Liao Q Liu C Yuan X et al Large-scale prediction of long non-coding RNA functions in a coding-non-coding gene co-expression network Nucleic Acids Res 201139(9)3864ndash78
17Guo Q Cheng Y Liang T et al Comprehensive analysis oflncRNA-mRNA co-expression patterns identifies immune-associated lncRNA biomarkers in ovarian cancer malignantprogression Sci Rep 20155(1)17683
18Du Y Xia W Zhang J et al Comprehensive analysis of longnoncoding RNA-mRNA co-expression patterns in thyroidcancer Mol Biosyst 201713(10)2107ndash15
19Wu W Wagner EK Hao Y et al Tissue-specific co-expressionof long non-coding and coding RNAs associated with breastCancer Sci Rep 2016632731
20Barabasi AL Oltvai ZN Network biology understanding thecellrsquos functional organization Nat Rev Genet 20045(2)101ndash13
21Langfelder P Horvath S WGCNA an R package for weightedcorrelation network analysis BMC Bioinformatics 20089559
22Maathuis HM Kalisch M Buhlmann P Estimating high-dimensional intervention effects from observational dataAnn Stat 200937(6A)3133ndash64
23Maathuis HM Colombo D Kalisch M et al Predicting causaleffects in large-scale systems from observational data NatMethods 20107(4)247ndash8
24Le T Hoang T Li J et al A fast PC algorithm for high dimen-sional causal discovery with multi-core PCs IEEEACM TransComput Biol Bioinform 2016 doi 101109TCBB20162591526
25Du Z Fei T Verhaak RG et al Integrative genomic analysesreveal clinically relevant long noncoding RNAs in humancancer Nat Struct Mol Biol 201320(7)908ndash13
26Bernhart SH Tafer H Muckstein U et al Partition functionand base pairing probabilities of RNA heterodimersAlgorithms Mol Biol 20061(1)3
27Alkan C Karakoc E Nadeau JH et al RNA-RNA interactionprediction and antisense RNA target search J Comput Biol200613(2)267ndash82
28Seemann SE Richter AS Gesell T et al PETcofold predictingconserved interactions and structures of two multiple align-ments of RNA sequences Bioinformatics 201127(2)211ndash19
29Wenzel A Akbasli E Gorodkin J RIsearch fast RNA-RNAinteraction search using a simplified nearest-neighborenergy model Bioinformatics 201228(21)2738ndash46
Module-specific lncRNA-mRNA causal regulatory networks | 15
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
30Alkan F Wenzel A Palasca O et al RIsearch2 suffix array-based large-scale prediction of RNA-RNA interactions andsiRNA off-targets Nucleic Acids Res 201745e60
31Hu R Sun X lncRNATargets a platform for lncRNA target pre-diction based on nucleic acid thermodynamics J BioinformComput Biol 201614(4)1650016
32Terai G Iwakiri J Kameda T et al Comprehensive predictionof lncRNA-RNA interactions in human transcriptome BMCGenomics 201617(Suppl 1)12
33Liu J Wu S Li M et al LncRNA expression profiles reveal theco-expression network in human colorectal carcinoma Int JClin Exp Pathol 201691885ndash1892
34Huang S Feng C Chen L et al Identification of potential keylong non-coding RNAs and target genes associated withpneumonia using long non-coding RNA sequencing (lncRNA-Seq) a preliminary study Med Sci Monit 2016223394ndash408
35Li J Xu Y Xu J et al Dynamic co-expression network analysisof lncRNAs and mRNAs associated with venous congestionMol Med Rep 201614(3)2045ndash51
36Fu M Huang G Zhang Z et al Expression profile of long non-coding RNAs in cartilage from knee osteoarthritis patientsOsteoarthritis Cartilage 201523(3)423ndash32
37Zhang F Gao C Ma XF et al Expression profile of long non-coding RNAs in peripheral blood mononuclear cells frommultiple sclerosis patients CNS Neurosci Ther 201622(4)298ndash305
38 Iwakiri J Terai G Hamada M Computational prediction oflncRNA-mRNA interactionsby integrating tissue specificity inhuman transcriptome Biol Direct 201712(1)15
39Lv L Wei M Lin P et al Integrated mRNA and lncRNA expres-sion profiling for exploring metastatic biomarkers of humanintrahepatic cholangiocarcinoma Am J Cancer Res 20177688ndash99
40Hao Y Wu W Li H et al NPInter v30 an upgraded databaseof noncoding RNA-associated interactions Database 20162016baw057
41Chen G Wang Z Wang D et al LncRNADisease a databasefor long-non-coding RNA-associated diseases Nucleic AcidsRes 201341D983ndash6
42 Jiang Q Wang J Wu X et al LncRNA2Target a database fordifferentially expressed genes after lncRNA knockdown oroverexpression Nucleic Acids Res 201543D193ndash6
43Zhou Z Shen Y Khan MR et al LncReg a reference resourcefor lncRNA-associated regulatory networks Database 20152015bav083
44Denisenko E Ho D Tamgue O et al IRNdb the database ofimmunologically relevant non-coding RNAs Database 20162016baw138
45Liu CJ Gao C Ma Z et al lncRInter a database of experimen-tally validated long non-coding RNA interaction J GenetGenomics 201744(5)265ndash8
46Li JH Liu S Zhou H et al starBase v20 decoding miRNA-ceRNA miRNA-ncRNA and protein-RNA interaction net-works from large-scale CLIP-Seq data Nucleic Acids Res 201442(D1)D92ndash7
47Liu Y Zhao M lnCaNet pan-cancer co-expression networkfor human lncRNA and cancer genes Bioinformatics 201632(10)1595ndash7
48Zhou QZ Zhang B Yu QY et al BmncRNAdb a comprehen-sive database of non-coding RNAs in the silkworm Bombyxmori BMC Bioinformatics 201617(1)370
49Park C Yu N Choi I et al lncRNAtor a comprehensiveresource for functional investigation of long non-codingRNAs Bioinformatics 201430(17)2480ndash5
50Bhartiya D Pal K Ghosh S et al lncRNome a comprehensiveknowledgebase of human long noncoding RNAs Database20132013bat034
51Zhao Z Bai J Wu A et al Co-LncRNA investigating thelncRNA combinatorial effects in GO annotations and KEGGpathways based on human RNA-Seq data Database 20152015bav082
52 Jiang Q Ma R Wang J et al LncRNA2Function a compre-hensive resource for functional investigation of humanlncRNAs based on RNA-seq data BMC Genomics 201516(Suppl 3)S2
53Chan WL Huang HD Chang JG lncRNAMap a map of puta-tive regulatory functions in the long non-coding transcrip-tome Comput Biol Chem 20145041ndash9
54Langfelder P Horvath S Fast R functions for robust correla-tions and hierarchical clustering J Stat Softw 2012461ndash17
55 Judea P Causality Models Reasoning and Inference New YorkNY Cambridge University Press 2000
56Spirtes P Glymour C Scheines R Causation Prediction andSearch 2nd edn Cambridge MIT Press 2000
57Le T Hoang T Li J et al ParallelPC an R package for efficientconstraint based causal exploration arXiv prepring 2015arXiv151003042v1
58Hahn MW Kern AD Comparative genomics of centrality andessentiality in three eukaryotic protein-interaction networksMol Biol Evol 200522(4)803ndash6
59Song J Singh M Roth FP From hub proteins to hub modulesthe relationship between essentiality and centrality in theyeast interactome at different scales of organization PLoSComput Biol 20139(2)e1002910
60Therneau TM Grambsch PM Modeling Survival Data Extendingthe Cox Model New York Springer Press 2000
61Yu G Wang L-G Han Y He Q-Y clusterProfiler an R packagefor comparing biological themes among gene clusters OMICS201216(5)284ndash7
62Ashburner M Ball CA Blake JA et al Gene ontology tool forthe unification of biology Nat Genet 200025(1)25ndash9
63Kanehisa M Goto S KEGG Kyoto Encyclopedia of Genes andGenomes Nucleic Acids Res 200028(1)27ndash30
64Ning S Zhang J Wang P et al Lnc2Cancer a manually curateddatabase of experimentally supported lncRNAs associatedwith various human cancers Nucleic Acids Res 201644(D1)D980ndash5
65Wang Y Chen L Chen B et al Mammalian ncRNA-diseaserepository a global view of ncRNA-mediated disease net-work Cell Death Dis 20134e765
66Pi~nero J Bravo A Queralt-Rosinach N et al DisGeNET a com-prehensive platform integrating information on humandisease-associated genes and variants Nucleic Acids Res 201745(D1)D833ndash9
67Conway JR Lex A Gehlenborg N UpSetR an R package for thevisualization of intersecting sets and their propertiesBioinformatics 201733(18)2938ndash40
68Wahlestedt C Targeting long non-coding RNA to therapeuti-cally upregulate gene expression Nat Rev Drug Discov 201312(6)433ndash46
69Mantovani G Maccio A Lai P et al Cytokine activity incancer-related anorexiacachexia role of megestrol acetateand medroxyprogesterone acetate Semin Oncol 19982545ndash52
70Dorsam RT Gutkind JS G-protein-coupled receptors and can-cer Nat Rev Cancer 20077(2)79ndash94
71Wang X Lin Y Tumor necrosis factor and cancer buddies orfoes Acta Pharmacol Sin 200829(11)1275ndash88
16 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
72Fajardo AM Piazza GA Tinsley HN The role of cyclic nucleo-tide signaling pathways in cancer targets for prevention andtreatment Cancers 20146(1)436ndash58
73Hanahan D Weinberg RA Hallmarks of cancer the next gen-eration Cell 2011144(5)646ndash74
74Zhang X Zhao XM He K et al Inferring gene regulatory net-works from gene expression data by path consistencyalgorithm based on conditional mutual informationBioinformatics 201228(1)98ndash104
75Zhao J Zhou Y Zhang X et al Part mutual information forquantifying direct associations in networks Proc Natl Acad SciUSA 2016113(18)5130ndash5
76Zhang X Zhao J Hao JK et al Conditional mutual inclusiveinformation enables accurate quantification of associationsin gene regulatory networks Nucleic Acids Re 201543(5)e31
77Le TD Zhang J Liu L et al Computational methods for identi-fying miRNA sponge interactions Brief Bioinform 201718(4)577ndash90
Module-specific lncRNA-mRNA causal regulatory networks | 17
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018View publication statsView publication stats
expression data To improve the accuracy of the predictedlncRNAndashmRNA regulatory relationships it is necessary todevelop an ensemble method (fusing sequence-based andexpression-based methods) to infer lncRNAndashmRNA regulatorynetwork Finally recent studies [77] show that lncRNAs can actas competing endogenous RNAs (ceRNAs) or miRNA sponges toattract miRNAs for bindings by competing with mRNAsTherefore some predicted lncRNAndashmRNA regulatory relation-ships are lncRNA-related ceRNAndashceRNA interactions To furtherimprove the prediction of lncRNAndashmRNA regulatory relation-ships it is necessary to remove the crosstalk relationshipsbetween lncRNAs and mRNAs
Key Points
bull Among ncRNAs lncRNAs are a large and diverse classof RNA molecules and are thought to be a gold mine ofpotential oncogenes anti-oncogenes and newbiomarkers
bull lncRNAs exhibit dynamic positive gene regulationacross human cancers
bull Hub lncRNAs are discriminative and can distinguishmetastasis risks of human cancers
bull There is still a lack of ground truth for validating pre-dicted lncRNAndashmRNA regulatory relationships
bull There is still room to develop reliable methods for elu-cidating lncRNA regulatory mechanisms
Supplementary Data
Supplementary data are available online at httpsacademicoupcombib
Funding
The National Natural Science Foundation of China (No61702069) the Applied Basic Research Foundation ofScience and Technology of Yunnan Province (No2017FB099) the NHMRC Grant (No 1123042) and theAustralian Research Council Discovery Grant (NoDP140103617)
References1 Pang KC Frith MC Mattick JS Rapid evolution of noncoding
RNAs lack of conservation does not mean lack of functionTrends Genet 200622(1)1ndash5
2 Kung JT Colognori D Lee JT Long noncoding RNAs pastpresent and future Genetics 2013193(3)651ndash69
3 Schmitt AM Chang HY Long noncoding RNAs in cancer path-ways Cancer Cell 201629(4)452ndash63
4 Zhang Y Tao Y Liao Q Long noncoding RNA a crosslink inbiological regulatory network Brief Bioinform 2017 doi 101093bibbbx042
5 Yoon JH Abdelmohsen K Gorospe M Posttranscriptionalgene regulation by long noncoding RNA J Mol Biol 2013425(19)3723ndash30
6 Gerlach W Giegerich R GUUGle a utility for fast exact match-ing under RNA complementary rules including G-U base pair-ing Bioinformatics 200622(6)762ndash4
7 Muckstein U Tafer H Hackermuller J et al Thermodynamicsof RNA-RNA binding Bioinformatics 200622(10)1177ndash82
8 Tafer H Hofacker IL RNAplex a fast tool for RNA-RNA inter-action search Bioinformatics 200824(22)2657ndash63
9 Busch A Richter AS Backofen R IntaRNA efficient predictionof bacterial sRNA targets incorporating target site accessibil-ity and seed regions Bioinformatics 200824(24)2849ndash56
10Kato Y Sato K Hamada M et al RactIP fast and accurate pre-diction of RNA-RNA interaction using integer programmingBioinformatics 201026(18)i460ndash6
11Li J Ma W Zeng P et al LncTar a tool for predicting the RNAtargets of long noncoding RNAs Brief Bioinform 201516(5)806ndash12
12Fukunaga T Hamada M RIblast an ultrafast RNA-RNA inter-action prediction system based on a seed-and-extensionapproach Bioinformatics 201733(17)2666ndash74
13Derrien T Johnson R Bussotti G et al The GENCODE v7 cata-log of human long noncoding RNAs analysis of their genestructure evolution and expression Genome Res 201222(9)1775ndash89
14Gloss BS Dinger ME The specificity of long noncoding RNAexpression Biochim Biophys Acta 20161859(1)16ndash22
15Munshi A Mohan V Ahuja YR Non-coding RNAs a dynamicand complex network of gene regulation J PharmacogenomicsPharmacoproteomics 20167156
16Liao Q Liu C Yuan X et al Large-scale prediction of long non-coding RNA functions in a coding-non-coding gene co-expression network Nucleic Acids Res 201139(9)3864ndash78
17Guo Q Cheng Y Liang T et al Comprehensive analysis oflncRNA-mRNA co-expression patterns identifies immune-associated lncRNA biomarkers in ovarian cancer malignantprogression Sci Rep 20155(1)17683
18Du Y Xia W Zhang J et al Comprehensive analysis of longnoncoding RNA-mRNA co-expression patterns in thyroidcancer Mol Biosyst 201713(10)2107ndash15
19Wu W Wagner EK Hao Y et al Tissue-specific co-expressionof long non-coding and coding RNAs associated with breastCancer Sci Rep 2016632731
20Barabasi AL Oltvai ZN Network biology understanding thecellrsquos functional organization Nat Rev Genet 20045(2)101ndash13
21Langfelder P Horvath S WGCNA an R package for weightedcorrelation network analysis BMC Bioinformatics 20089559
22Maathuis HM Kalisch M Buhlmann P Estimating high-dimensional intervention effects from observational dataAnn Stat 200937(6A)3133ndash64
23Maathuis HM Colombo D Kalisch M et al Predicting causaleffects in large-scale systems from observational data NatMethods 20107(4)247ndash8
24Le T Hoang T Li J et al A fast PC algorithm for high dimen-sional causal discovery with multi-core PCs IEEEACM TransComput Biol Bioinform 2016 doi 101109TCBB20162591526
25Du Z Fei T Verhaak RG et al Integrative genomic analysesreveal clinically relevant long noncoding RNAs in humancancer Nat Struct Mol Biol 201320(7)908ndash13
26Bernhart SH Tafer H Muckstein U et al Partition functionand base pairing probabilities of RNA heterodimersAlgorithms Mol Biol 20061(1)3
27Alkan C Karakoc E Nadeau JH et al RNA-RNA interactionprediction and antisense RNA target search J Comput Biol200613(2)267ndash82
28Seemann SE Richter AS Gesell T et al PETcofold predictingconserved interactions and structures of two multiple align-ments of RNA sequences Bioinformatics 201127(2)211ndash19
29Wenzel A Akbasli E Gorodkin J RIsearch fast RNA-RNAinteraction search using a simplified nearest-neighborenergy model Bioinformatics 201228(21)2738ndash46
Module-specific lncRNA-mRNA causal regulatory networks | 15
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
30Alkan F Wenzel A Palasca O et al RIsearch2 suffix array-based large-scale prediction of RNA-RNA interactions andsiRNA off-targets Nucleic Acids Res 201745e60
31Hu R Sun X lncRNATargets a platform for lncRNA target pre-diction based on nucleic acid thermodynamics J BioinformComput Biol 201614(4)1650016
32Terai G Iwakiri J Kameda T et al Comprehensive predictionof lncRNA-RNA interactions in human transcriptome BMCGenomics 201617(Suppl 1)12
33Liu J Wu S Li M et al LncRNA expression profiles reveal theco-expression network in human colorectal carcinoma Int JClin Exp Pathol 201691885ndash1892
34Huang S Feng C Chen L et al Identification of potential keylong non-coding RNAs and target genes associated withpneumonia using long non-coding RNA sequencing (lncRNA-Seq) a preliminary study Med Sci Monit 2016223394ndash408
35Li J Xu Y Xu J et al Dynamic co-expression network analysisof lncRNAs and mRNAs associated with venous congestionMol Med Rep 201614(3)2045ndash51
36Fu M Huang G Zhang Z et al Expression profile of long non-coding RNAs in cartilage from knee osteoarthritis patientsOsteoarthritis Cartilage 201523(3)423ndash32
37Zhang F Gao C Ma XF et al Expression profile of long non-coding RNAs in peripheral blood mononuclear cells frommultiple sclerosis patients CNS Neurosci Ther 201622(4)298ndash305
38 Iwakiri J Terai G Hamada M Computational prediction oflncRNA-mRNA interactionsby integrating tissue specificity inhuman transcriptome Biol Direct 201712(1)15
39Lv L Wei M Lin P et al Integrated mRNA and lncRNA expres-sion profiling for exploring metastatic biomarkers of humanintrahepatic cholangiocarcinoma Am J Cancer Res 20177688ndash99
40Hao Y Wu W Li H et al NPInter v30 an upgraded databaseof noncoding RNA-associated interactions Database 20162016baw057
41Chen G Wang Z Wang D et al LncRNADisease a databasefor long-non-coding RNA-associated diseases Nucleic AcidsRes 201341D983ndash6
42 Jiang Q Wang J Wu X et al LncRNA2Target a database fordifferentially expressed genes after lncRNA knockdown oroverexpression Nucleic Acids Res 201543D193ndash6
43Zhou Z Shen Y Khan MR et al LncReg a reference resourcefor lncRNA-associated regulatory networks Database 20152015bav083
44Denisenko E Ho D Tamgue O et al IRNdb the database ofimmunologically relevant non-coding RNAs Database 20162016baw138
45Liu CJ Gao C Ma Z et al lncRInter a database of experimen-tally validated long non-coding RNA interaction J GenetGenomics 201744(5)265ndash8
46Li JH Liu S Zhou H et al starBase v20 decoding miRNA-ceRNA miRNA-ncRNA and protein-RNA interaction net-works from large-scale CLIP-Seq data Nucleic Acids Res 201442(D1)D92ndash7
47Liu Y Zhao M lnCaNet pan-cancer co-expression networkfor human lncRNA and cancer genes Bioinformatics 201632(10)1595ndash7
48Zhou QZ Zhang B Yu QY et al BmncRNAdb a comprehen-sive database of non-coding RNAs in the silkworm Bombyxmori BMC Bioinformatics 201617(1)370
49Park C Yu N Choi I et al lncRNAtor a comprehensiveresource for functional investigation of long non-codingRNAs Bioinformatics 201430(17)2480ndash5
50Bhartiya D Pal K Ghosh S et al lncRNome a comprehensiveknowledgebase of human long noncoding RNAs Database20132013bat034
51Zhao Z Bai J Wu A et al Co-LncRNA investigating thelncRNA combinatorial effects in GO annotations and KEGGpathways based on human RNA-Seq data Database 20152015bav082
52 Jiang Q Ma R Wang J et al LncRNA2Function a compre-hensive resource for functional investigation of humanlncRNAs based on RNA-seq data BMC Genomics 201516(Suppl 3)S2
53Chan WL Huang HD Chang JG lncRNAMap a map of puta-tive regulatory functions in the long non-coding transcrip-tome Comput Biol Chem 20145041ndash9
54Langfelder P Horvath S Fast R functions for robust correla-tions and hierarchical clustering J Stat Softw 2012461ndash17
55 Judea P Causality Models Reasoning and Inference New YorkNY Cambridge University Press 2000
56Spirtes P Glymour C Scheines R Causation Prediction andSearch 2nd edn Cambridge MIT Press 2000
57Le T Hoang T Li J et al ParallelPC an R package for efficientconstraint based causal exploration arXiv prepring 2015arXiv151003042v1
58Hahn MW Kern AD Comparative genomics of centrality andessentiality in three eukaryotic protein-interaction networksMol Biol Evol 200522(4)803ndash6
59Song J Singh M Roth FP From hub proteins to hub modulesthe relationship between essentiality and centrality in theyeast interactome at different scales of organization PLoSComput Biol 20139(2)e1002910
60Therneau TM Grambsch PM Modeling Survival Data Extendingthe Cox Model New York Springer Press 2000
61Yu G Wang L-G Han Y He Q-Y clusterProfiler an R packagefor comparing biological themes among gene clusters OMICS201216(5)284ndash7
62Ashburner M Ball CA Blake JA et al Gene ontology tool forthe unification of biology Nat Genet 200025(1)25ndash9
63Kanehisa M Goto S KEGG Kyoto Encyclopedia of Genes andGenomes Nucleic Acids Res 200028(1)27ndash30
64Ning S Zhang J Wang P et al Lnc2Cancer a manually curateddatabase of experimentally supported lncRNAs associatedwith various human cancers Nucleic Acids Res 201644(D1)D980ndash5
65Wang Y Chen L Chen B et al Mammalian ncRNA-diseaserepository a global view of ncRNA-mediated disease net-work Cell Death Dis 20134e765
66Pi~nero J Bravo A Queralt-Rosinach N et al DisGeNET a com-prehensive platform integrating information on humandisease-associated genes and variants Nucleic Acids Res 201745(D1)D833ndash9
67Conway JR Lex A Gehlenborg N UpSetR an R package for thevisualization of intersecting sets and their propertiesBioinformatics 201733(18)2938ndash40
68Wahlestedt C Targeting long non-coding RNA to therapeuti-cally upregulate gene expression Nat Rev Drug Discov 201312(6)433ndash46
69Mantovani G Maccio A Lai P et al Cytokine activity incancer-related anorexiacachexia role of megestrol acetateand medroxyprogesterone acetate Semin Oncol 19982545ndash52
70Dorsam RT Gutkind JS G-protein-coupled receptors and can-cer Nat Rev Cancer 20077(2)79ndash94
71Wang X Lin Y Tumor necrosis factor and cancer buddies orfoes Acta Pharmacol Sin 200829(11)1275ndash88
16 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
72Fajardo AM Piazza GA Tinsley HN The role of cyclic nucleo-tide signaling pathways in cancer targets for prevention andtreatment Cancers 20146(1)436ndash58
73Hanahan D Weinberg RA Hallmarks of cancer the next gen-eration Cell 2011144(5)646ndash74
74Zhang X Zhao XM He K et al Inferring gene regulatory net-works from gene expression data by path consistencyalgorithm based on conditional mutual informationBioinformatics 201228(1)98ndash104
75Zhao J Zhou Y Zhang X et al Part mutual information forquantifying direct associations in networks Proc Natl Acad SciUSA 2016113(18)5130ndash5
76Zhang X Zhao J Hao JK et al Conditional mutual inclusiveinformation enables accurate quantification of associationsin gene regulatory networks Nucleic Acids Re 201543(5)e31
77Le TD Zhang J Liu L et al Computational methods for identi-fying miRNA sponge interactions Brief Bioinform 201718(4)577ndash90
Module-specific lncRNA-mRNA causal regulatory networks | 17
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018View publication statsView publication stats
30Alkan F Wenzel A Palasca O et al RIsearch2 suffix array-based large-scale prediction of RNA-RNA interactions andsiRNA off-targets Nucleic Acids Res 201745e60
31Hu R Sun X lncRNATargets a platform for lncRNA target pre-diction based on nucleic acid thermodynamics J BioinformComput Biol 201614(4)1650016
32Terai G Iwakiri J Kameda T et al Comprehensive predictionof lncRNA-RNA interactions in human transcriptome BMCGenomics 201617(Suppl 1)12
33Liu J Wu S Li M et al LncRNA expression profiles reveal theco-expression network in human colorectal carcinoma Int JClin Exp Pathol 201691885ndash1892
34Huang S Feng C Chen L et al Identification of potential keylong non-coding RNAs and target genes associated withpneumonia using long non-coding RNA sequencing (lncRNA-Seq) a preliminary study Med Sci Monit 2016223394ndash408
35Li J Xu Y Xu J et al Dynamic co-expression network analysisof lncRNAs and mRNAs associated with venous congestionMol Med Rep 201614(3)2045ndash51
36Fu M Huang G Zhang Z et al Expression profile of long non-coding RNAs in cartilage from knee osteoarthritis patientsOsteoarthritis Cartilage 201523(3)423ndash32
37Zhang F Gao C Ma XF et al Expression profile of long non-coding RNAs in peripheral blood mononuclear cells frommultiple sclerosis patients CNS Neurosci Ther 201622(4)298ndash305
38 Iwakiri J Terai G Hamada M Computational prediction oflncRNA-mRNA interactionsby integrating tissue specificity inhuman transcriptome Biol Direct 201712(1)15
39Lv L Wei M Lin P et al Integrated mRNA and lncRNA expres-sion profiling for exploring metastatic biomarkers of humanintrahepatic cholangiocarcinoma Am J Cancer Res 20177688ndash99
40Hao Y Wu W Li H et al NPInter v30 an upgraded databaseof noncoding RNA-associated interactions Database 20162016baw057
41Chen G Wang Z Wang D et al LncRNADisease a databasefor long-non-coding RNA-associated diseases Nucleic AcidsRes 201341D983ndash6
42 Jiang Q Wang J Wu X et al LncRNA2Target a database fordifferentially expressed genes after lncRNA knockdown oroverexpression Nucleic Acids Res 201543D193ndash6
43Zhou Z Shen Y Khan MR et al LncReg a reference resourcefor lncRNA-associated regulatory networks Database 20152015bav083
44Denisenko E Ho D Tamgue O et al IRNdb the database ofimmunologically relevant non-coding RNAs Database 20162016baw138
45Liu CJ Gao C Ma Z et al lncRInter a database of experimen-tally validated long non-coding RNA interaction J GenetGenomics 201744(5)265ndash8
46Li JH Liu S Zhou H et al starBase v20 decoding miRNA-ceRNA miRNA-ncRNA and protein-RNA interaction net-works from large-scale CLIP-Seq data Nucleic Acids Res 201442(D1)D92ndash7
47Liu Y Zhao M lnCaNet pan-cancer co-expression networkfor human lncRNA and cancer genes Bioinformatics 201632(10)1595ndash7
48Zhou QZ Zhang B Yu QY et al BmncRNAdb a comprehen-sive database of non-coding RNAs in the silkworm Bombyxmori BMC Bioinformatics 201617(1)370
49Park C Yu N Choi I et al lncRNAtor a comprehensiveresource for functional investigation of long non-codingRNAs Bioinformatics 201430(17)2480ndash5
50Bhartiya D Pal K Ghosh S et al lncRNome a comprehensiveknowledgebase of human long noncoding RNAs Database20132013bat034
51Zhao Z Bai J Wu A et al Co-LncRNA investigating thelncRNA combinatorial effects in GO annotations and KEGGpathways based on human RNA-Seq data Database 20152015bav082
52 Jiang Q Ma R Wang J et al LncRNA2Function a compre-hensive resource for functional investigation of humanlncRNAs based on RNA-seq data BMC Genomics 201516(Suppl 3)S2
53Chan WL Huang HD Chang JG lncRNAMap a map of puta-tive regulatory functions in the long non-coding transcrip-tome Comput Biol Chem 20145041ndash9
54Langfelder P Horvath S Fast R functions for robust correla-tions and hierarchical clustering J Stat Softw 2012461ndash17
55 Judea P Causality Models Reasoning and Inference New YorkNY Cambridge University Press 2000
56Spirtes P Glymour C Scheines R Causation Prediction andSearch 2nd edn Cambridge MIT Press 2000
57Le T Hoang T Li J et al ParallelPC an R package for efficientconstraint based causal exploration arXiv prepring 2015arXiv151003042v1
58Hahn MW Kern AD Comparative genomics of centrality andessentiality in three eukaryotic protein-interaction networksMol Biol Evol 200522(4)803ndash6
59Song J Singh M Roth FP From hub proteins to hub modulesthe relationship between essentiality and centrality in theyeast interactome at different scales of organization PLoSComput Biol 20139(2)e1002910
60Therneau TM Grambsch PM Modeling Survival Data Extendingthe Cox Model New York Springer Press 2000
61Yu G Wang L-G Han Y He Q-Y clusterProfiler an R packagefor comparing biological themes among gene clusters OMICS201216(5)284ndash7
62Ashburner M Ball CA Blake JA et al Gene ontology tool forthe unification of biology Nat Genet 200025(1)25ndash9
63Kanehisa M Goto S KEGG Kyoto Encyclopedia of Genes andGenomes Nucleic Acids Res 200028(1)27ndash30
64Ning S Zhang J Wang P et al Lnc2Cancer a manually curateddatabase of experimentally supported lncRNAs associatedwith various human cancers Nucleic Acids Res 201644(D1)D980ndash5
65Wang Y Chen L Chen B et al Mammalian ncRNA-diseaserepository a global view of ncRNA-mediated disease net-work Cell Death Dis 20134e765
66Pi~nero J Bravo A Queralt-Rosinach N et al DisGeNET a com-prehensive platform integrating information on humandisease-associated genes and variants Nucleic Acids Res 201745(D1)D833ndash9
67Conway JR Lex A Gehlenborg N UpSetR an R package for thevisualization of intersecting sets and their propertiesBioinformatics 201733(18)2938ndash40
68Wahlestedt C Targeting long non-coding RNA to therapeuti-cally upregulate gene expression Nat Rev Drug Discov 201312(6)433ndash46
69Mantovani G Maccio A Lai P et al Cytokine activity incancer-related anorexiacachexia role of megestrol acetateand medroxyprogesterone acetate Semin Oncol 19982545ndash52
70Dorsam RT Gutkind JS G-protein-coupled receptors and can-cer Nat Rev Cancer 20077(2)79ndash94
71Wang X Lin Y Tumor necrosis factor and cancer buddies orfoes Acta Pharmacol Sin 200829(11)1275ndash88
16 | Zhang et al
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018
72Fajardo AM Piazza GA Tinsley HN The role of cyclic nucleo-tide signaling pathways in cancer targets for prevention andtreatment Cancers 20146(1)436ndash58
73Hanahan D Weinberg RA Hallmarks of cancer the next gen-eration Cell 2011144(5)646ndash74
74Zhang X Zhao XM He K et al Inferring gene regulatory net-works from gene expression data by path consistencyalgorithm based on conditional mutual informationBioinformatics 201228(1)98ndash104
75Zhao J Zhou Y Zhang X et al Part mutual information forquantifying direct associations in networks Proc Natl Acad SciUSA 2016113(18)5130ndash5
76Zhang X Zhao J Hao JK et al Conditional mutual inclusiveinformation enables accurate quantification of associationsin gene regulatory networks Nucleic Acids Re 201543(5)e31
77Le TD Zhang J Liu L et al Computational methods for identi-fying miRNA sponge interactions Brief Bioinform 201718(4)577ndash90
Module-specific lncRNA-mRNA causal regulatory networks | 17
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018View publication statsView publication stats
72Fajardo AM Piazza GA Tinsley HN The role of cyclic nucleo-tide signaling pathways in cancer targets for prevention andtreatment Cancers 20146(1)436ndash58
73Hanahan D Weinberg RA Hallmarks of cancer the next gen-eration Cell 2011144(5)646ndash74
74Zhang X Zhao XM He K et al Inferring gene regulatory net-works from gene expression data by path consistencyalgorithm based on conditional mutual informationBioinformatics 201228(1)98ndash104
75Zhao J Zhou Y Zhang X et al Part mutual information forquantifying direct associations in networks Proc Natl Acad SciUSA 2016113(18)5130ndash5
76Zhang X Zhao J Hao JK et al Conditional mutual inclusiveinformation enables accurate quantification of associationsin gene regulatory networks Nucleic Acids Re 201543(5)e31
77Le TD Zhang J Liu L et al Computational methods for identi-fying miRNA sponge interactions Brief Bioinform 201718(4)577ndash90
Module-specific lncRNA-mRNA causal regulatory networks | 17
Downloaded from httpsacademicoupcombibadvance-article-abstractdoi101093bibbby0084833470by University of Durham useron 01 February 2018View publication statsView publication stats