Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
Research ArticleIdentification of Hub Genes Associated with ImmuneInfiltration in Cardioembolic Stroke by Whole BloodTranscriptome Analysis
Qiaoqiao Li ,1,2,3 Xueping Gao ,4 Xueshan Luo,1,2,3 Qingrui Wu,1,2,3 Jintao He,1,2,3
Yang Liu,1,3 Yumei Xue,1,3 Shulin Wu ,1,3 and Fang Rao 1,2,3
1Guangdong Cardiovascular Institute, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, China2Research Center of Medical Sciences, Guangdong Provincial People’s Hospital, China3Provincial Key Laboratory of Clinical Pharmacology, Guangdong Academy of Medical Sciences, Guangzhou 510080, China4Department of Clinical Laboratory Medicine, Southwest Hospital, Third Military Medical University (Army Medical University),Chongqing 400038, China
Correspondence should be addressed to Fang Rao; [email protected]
Received 3 November 2021; Accepted 11 December 2021; Published 15 January 2022
Academic Editor: Ting Su
Copyright © 2022 Qiaoqiao Li et al. This is an open access article distributed under the Creative Commons Attribution License,which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Cardioembolic stroke (CS) is the most common type of ischemic stroke in the clinic, leading to high morbidity and mortalityworldwide. Although many studies have been conducted, the molecular mechanism underlying CS has not been fully grasped.This study was aimed at exploring the molecular mechanism of CS using comprehensive bioinformatics analysis and providingnew insights into the pathophysiology of CS. We downloaded the public datasets GSE58294 and GSE16561. Differentiallyexpressed genes (DEGs) were screened via the limma package using R software. CIBERSORT was used to estimate theproportions of 22 immune cells based on the gene expression profiling of CS patients. Using weighted gene correlationnetwork analysis (WGCNA) to cluster the genes into different modules and detect relationships between modules and immunecell types, hub genes were identified based on the intersection of the protein-protein interaction (PPI) network analysis andWGCNA, and their clinical significance was then verified using another independent dataset GSE16561. Totally, 319 geneswere identified as DEGs and 5413 genes were clustered into nine modules using WGCNA. The blue module, with the highestcorrelation coefficient, was identified as the key module associated with stroke, neutrophils, and B cells naïve. Based on the PPIanalysis and WGCNA, five genes (MCEMP1, CLEC4D, GPR97, TSPAN14, and FPR2) were identified as hub genes. Correlationanalysis indicated that hub genes had general association with infiltration-related immune cells. ROC analysis also showed theyhad potential clinical significance. The results were verified using another dataset, which were consistent with our analysis. Fivecrucial genes determined using integrative bioinformatics analysis might play significant roles in the pathophysiologicalmechanism in CS and be potential targets for pharmaceutic therapies.
1. Introduction
Stroke is a devastating cerebrovascular disease, containingtwo types: ischemic stroke (IS) and hemorrhagic stroke(HS). Accounting for approximately 80% of all cases, ische-mic stroke is the most common subtype, which is triggeredby arterial embolization or thromboembolism in the cere-brum [1]. A wide range of clinical manifestations of IS
includes physical disability, impaired cognitive and emo-tional abilities [2].
Constituting about 20% of ischemic stroke, cardioem-bolic stroke (CS) was mainly caused by nonvalvular atrialfibrillation, myocardial infarction, and rheumatic heartdisease [3, 4]. Atrial fibrillation is not only the most com-mon sustained cardiac arrhythmia but also one of the mostfrequent risk factors that contribute to CS. Moreover, along
HindawiDisease MarkersVolume 2022, Article ID 8086991, 23 pageshttps://doi.org/10.1155/2022/8086991
with fast economic development, social urbanization, agingof population, and changes of lifestyle, the prevalence ofCS has increased dramatically, imposing a tremendousmedical and social-economic burden on patients. Preventivestrategies are generally recommended for all cardioembolicstroke patients, including universal elements of cardiovascu-lar risk factor management such as treatment of diabetesmellitus, blood pressure control, alcohol and tobacco reduc-tion, and antiplatelet medication [5]. Diagnosis of stroke wasrestricted to history, physical examination, and radiologicalimaging, which were with limited availability and highercost. Furthermore, due to the lack of specific early diagnosticmarkers at the early onset of CS, missed diagnosis andmisdiagnosis forces remain relatively common in patients.Thrombolytic therapies are currently the most effectivetreatment available after CS, regrettably, due to the mainmethod for the treatment of cerebral infarction which islimited by a time window (about 3 h), which results in onlyaround one-third of patients with diagnosed CS which aresuitable for receiving these curative therapies [6]. Hence,identification of specific biomarkers for patients at the earlyonset of CS will prove beneficial. The mechanism of cardio-embolic stroke (CS) is complex and involves a myriad ofdistinct pathogenic factors, consisting of inflammation,oxidative stress, excitotoxicity, apoptosis, excitotoxicity, ionimbalance, and neuroprotection [7]. Nevertheless, thedetailed communicative regulatory mechanisms leading toCS remain incompletely understood. Increasing evidenceindicates that immune cells play a considerable role in thepathogenesis of CS. Immune-mediated inflammatorymarkers such as CRP and IL-6 have been reported to beassociated with CS [8]. However, the specific molecularmechanism underlying immune or inflammatory marker-mediated CS still needs further investigation.
Weighted gene correlation network analysis (WGCNA)is used to build a coexpression network, detect genemodules, and assess the relationships between gene modulesand the biological phenotypes in order to screen the candi-date diagnostic biomarkers or potential therapeutic targets.
In our study, we aimed to explore the association betweenimmune cells and CS using integrated bioinformaticsmethods. CIBERSORT was applied to estimate the propor-tions of 22 immune cells based on the gene expression profil-ing of CS patients, and WGCNA was then used to identifythe key module associated with CS and immune infiltration.Candidate hub genes were then identified within the keymodules. Based on the protein-protein interaction (PPI) net-work, hub genes were identified. Potential clinical signifi-cance of the genes was then determined by using thereceiver operating characteristic curve analysis. We hope thatthis research can offer new insights into significant diagnosticbiomarker and potential therapeutic targets for treating CS.
2. Methods
2.1. Medical Ethics. The raw datasets were available from theNCBI Gene Expression Omnibus repository under accessionnumber (GEO https://www.ncbi.nlm.nih.gov/geo/info/
linking.html; GSE58294 and GSE16561). In our study,neither human trials nor animal experiments were applied.
2.2. Data Acquisition. We downloaded the correspondingdatasets (GSE58294, GSE16561) available from the GEOdatabase for further analysis. In the dataset of GSE58294(GPL570), the expression matrix of a total of 92 individualswas acquired from the blood samples, including 69 cardio-embolic stroke patients and 23 controls. Cardioembolicstroke subjects were analyzed at three time points: less than3 hours, 5 hours, and 24 hours following the event. TheGSE16561 dataset (GPL6883) contains 39 diagnosed withischemic stroke, and 24 healthy control subjects. All sampleswere obtained from the blood of patients.
2.3. Data Preprocessing and Differentially Expressed Gene(DEG) Screening. The origin microarray data preprocessing,including normalization and background correction, wasperformed by using the “Affy” package in R; then, the geneexpression profile was generated [9]. Principal componentanalysis was performed to distinguish the cardioembolicstroke and control samples. Differentially expressed genes(DEGs) between two groups were identified using the Biocon-ductor package Limma (linear models for microarray analy-sis) [10]. Genes with ∣log 2 fold‐change ðFCÞ ∣ >1 and adjustp value < 0.05 were regarded as statistically significant DEGs.
2.4. GO and KEGG Analyses. Gene Ontology (GO) analysisand a Kyoto Encyclopedia of Genes and Genomes (KEGG)term enrichment analysis were performed using Cluster-Profiler software in R language, which showed the biolog-ical processes (BPs), cellular components (CCs), molecularfunctions (MFs), and pathways related to DEGs and genesin the key modules identified in the following. The enrich-ment significance threshold was set to an adjusted p valuebelow 0.05 [11].
2.5. GSEA Analysis. Using the median expression level ofMast Cell Expressed Membrane Protein 1(MCEMP1) as cut-off, cardioembolic stroke samples were divided into low andhigh expression groups. Gene set enrichment analysis(GSEA) was conducted by the “gseaplot2” package to iden-tify the differentially activated signaling pathways in the highMCEMP1 expression group. An ∣NES∣ > 1 and FDR < 0:25were considered as statistically significant (NES: normalizedenrichment score; FDR: false discovery rate) [10].
2.6. Immune Cell Infiltration Analysis. Normalized geneexpression matrixes were utilized to estimate the relativeproportions of 22 types of infiltrating immune cells usingthe CIBERSORT algorithm [12]. The correlation betweenimmune cells was determined by Spearman’s correlationand visualized by heatmap. Next, significant immune cellsbetween cardioembolic stroke and control samples werescreened with the threshold Wilcoxon test at p value < 0.05.
2.7. Weighted Gene Coexpression Network Analysis
2.7.1. Construction of Coexpression Network. The genesranking in the top 25% of the median absolute deviation inthe corresponding expression matrix were selected for
2 Disease Markers
−4
0
4
Nor
mal
ized
sign
al in
tens
ity
Group
Control
Cardioembolic stroke
GSM
1406
033
GSM
1406
034
GSM
1406
035
GSM
1406
036
GSM
1406
037
GSM
1406
038
GSM
1406
039
GSM
1406
040
GSM
1406
041
GSM
1406
042
GSM
1406
043
GSM
1406
044
GSM
1406
045
GSM
1406
046
GSM
1406
047
GSM
1406
048
GSM
1406
049
GSM
1406
050
GSM
1406
051
GSM
1406
052
GSM
1406
053
GSM
1406
054
GSM
1406
055
GSM
1406
056
GSM
1406
057
GSM
1406
058
GSM
1406
059
GSM
1406
060
GSM
1406
061
GSM
1406
062
GSM
1406
063
GSM
1406
064
GSM
1406
065
GSM
1406
066
GSM
1406
067
GSM
1406
068
GSM
1406
069
GSM
1406
070
GSM
1406
071
GSM
1406
072
GSM
1406
073
GSM
1406
074
GSM
1406
075
GSM
1406
076
GSM
1406
077
GSM
1406
078
GSM
1406
079
GSM
1406
080
GSM
1406
081
GSM
1406
082
GSM
1406
083
GSM
1406
084
GSM
1406
085
GSM
1406
086
GSM
1406
087
GSM
1406
088
GSM
1406
089
GSM
1406
090
GSM
1406
091
GSM
1406
092
GSM
1406
093
GSM
1406
094
GSM
1406
095
GSM
1406
096
GSM
1406
097
GSM
1406
098
GSM
1406
099
GSM
1406
100
GSM
1406
101
GSM
1406
102
GSM
1406
103
GSM
1406
104
GSM
1406
105
GSM
1406
106
GSM
1406
107
GSM
1406
108
GSM
1406
109
GSM
1406
110
GSM
1406
111
GSM
1406
112
GSM
1406
113
GSM
1406
114
GSM
1406
115
GSM
1406
116
GSM
1406
117
GSM
1406
118
GSM
1406
119
GSM
1406
120
GSM
1406
121
GSM
1406
122
GSM
1406
123
GSM
1406
124
(a)
−60
−40
−20
0
20
40
60
PC2
(11.
3%)
−50 0 50
PC1 (15.2%)
GroupControlCardioembolic stroke
(b)
−4
−2
0
2
4
6
8
UM
AP2
−2 0 2 4UMAP1
GroupControlCardioembolic stroke
(c)
Figure 1: Continued.
3Disease Markers
Weighted Gene Coexpression Network Analysis (WGCNA)by the “WGCNA” package in R [13]. Briefly, WGCNA wasapplied to construct a coexpression network based on thematrix of pairwise Pearson correlation coefficients. To satisfythe scale-free topology, an appropriate soft thresholdingpower β should be determined. Then, the genes can be clus-tered into different functional modules with different colors,which were clustered and classified by the dynamic tree cutalgorithm with min. Module size was 50, and the minimumheight for merging modules was 0.25. The grey modulerepresented the genes that cannot be merged.
2.7.2. Correlation Analysis and Identification of Key Modules.Module eigengenes (MEs) were considered to be a represen-tation of the corresponding gene expression profile in differ-ent modules. Stroke and immune cell infiltration levels wereselected as the main clinical traits. The module membership(MM) was defined as the correlation of MEs with geneexpression. Gene significance (GS) was defined as the corre-lation coefficient in the Spearman correlation between geneexpression and clinical traits. Modules with the highest GSlevels were regarded as key modules and selected for furtheranalysis. Furthermore, genes with MM> 0:8 and GS > 0:5were defined as hub genes [14].
2.8. Construction of PPI (Protein-Protein Interaction)Network. DEGs were imported to the search tool of theSTRING database to generate the PPI network identifyingthe interactions between the hub genes with the thresholdof interaction score > 0:9. The hub genes’ expression patternand biological function in the PPI network were visualizedby “igraph” (version 1.2.6) and “ggraph” (version 1.0.1)packages in “R” [15, 16]. PPI networks of MCEMP1 werecalculated using the GeneMANIA algorithm [17].
2.9. Identification of Key Genes. To screen out the key genesin the development of cardioembolic stroke, we made inter-section of hub genes in the WGCNA and PPI as candidatehub genes. Heatmaps of the candidate hub gene expressionpatterns were generated with the R package “ComplexHeat-map” (version 2.0.0) [18]. In order to ensure the accuracy androbustness of identification of key genes, CytoHubba, a plug-inCytoscape software (version 3.6.7), was applied to screen thetop 10 key genes in candidate hub genes’ PPI networks viathe degree methods [19]. The intersection of five algorithmsin CytoHubba was employed to generate real key genes.
2.10. Correlation Analysis of Immune Cells and Key Genes.We investigated the relationship between key gene’s expres-sion and relative percentages of immune cells in
UpNo significantDown
0
10
20
30
−2 0 2
−Log
10 (P
.adj)
Log2 (Fold change)
(d)
ZNF595GLOD5SPIBLPAR4LECT2TIMM8AAP000525.9SRCIN1FAM133ASH3GL3LOC100996760FAT3PRTGBC048132ACSM2APOM121L9POVOL2GAGE1HLA−DQA1BTNL3HLA−DRB4CD177VSIG4HTRA1FGF13MAOAMETTL7BBTNL8FCARTOPORS−AS1ARG1OLAHGRB10INSCSLC26A8MCEMP1LILRA5CACNA1EANKRD22NT5DC4
Group
Control
Cardioembolic stroke
−4
−2
0
2
4
(e)
Figure 1: Data preprocessing and identification of DEGs. (a) Box plots after normalization of the raw data between cardioembolic strokeand healthy control samples. (b) PCA for cardioembolic stroke and healthy control samples. (c) UMAP analysis. (d) The volcano plot ofDEGs. (e) The heatmap of top 40 DEGs. DEGs: differentially expressed genes; PCA: principal component analysis; UMAP: uniformmanifold approximation and projection.
4 Disease Markers
cardioembolic stroke samples by using Spearman correlationtest analysis. The results were visualized using the R packageggplot2 in R software [10]. p < 0:05 was considered statisti-cally significant.
2.11. Receiver Operating Characteristic (ROC) Analysis. Toidentify the potential clinical significance of key genes, thediagnostic values of the key genes in GSE58294 were evalu-
ated by applying “pROC” packages [20]. Another dataset(GSE16561) was used for independent verification.
2.12. Validation of the Key Gene Expression.We downloadedthe validation datasets GSE58294 and GSE16561 from theGEO database (http://www.ncbi.nlm.nih.gov/geo/). Allexpression values of genes were normalized. A Wilcoxonrank-sum test was performed by comparing the key gene
GO:0042119
GO:0043312G
O:0002283
GO:0002446
GO:0042581GO:0070820
GO:0070
821
GO
:003
5580
GO:0035
579
GO:1904724
Decreasing Increasing
Z−score
LogFC
Upregulated
IDGO: 0042119
Description
GO: 0043312GO: 0002283GO: 0002446GO: 0042581GO: 0070820GO: 0070821GO: 0035580GO: 0035579GO: 1904724
Neutrophil activationNeutrophil degranulation
Neutrophil activation involved in immune responseNeutrophil mediated immunity
Specific granuleTertiary granule
Tertiary granule membraneSpecific granule lumen
Specific granule membraneTertiary granule lumen
(a)
TSPAN
14
TCIRG1
MM
P8
HEBP2FPR2HPSELRG1GPR84TNFAIP6
SLPIGYG1
DEFA4
IL18RAP
CLEC5A
CFD
FOLR3
ANXA3
MMP9CLE
C4DCD
177
FCAR
MCE
MP1
ARG
1
GO:00
4211
9
GO:0043312
GO:0002283
GO:0042581GO:0070820
GO
:0070821
1.5 2.0LogFC
(b)
0.0
0.2
0.4
Enric
hmen
t sco
re
−3−2−1
012
Rank
ed li
st m
etric
REACTOME_NEUTROPHIL_DEGRANULATIONREACTOME_CELLULAR_RESPONSES_TO_EXTERNAL_STIMULIREACTOME_TOLL_LIKE_RECEPTOR_CASCADESREACTOME_INTERLEUKIN_1_FAMILY_SIGNALINGKEGG_NOD_LIKE_RECEPTOR_SIGNALING_PATHWAY
5000 10000 15000Rank in ordered dataset
(c)
Figure 2: Functional enrichment analysis of DEGs. (a) GOCircle plot showing the top 10 of significantly changed functional terms of DEGs.The height of bars in the inner ring represented the -log 10 adjust P values of GO terms, with higher bars indicating higher significance ofthe GO category, and color corresponded to the z-score. The scatter plots in the out ring showed the expression levels of each gene (log foldchange) in the corresponding GO terms. The description of GO categories was displayed in the table at the bottom. (b) GO chord plotshowed genes linked by ribbons to their corresponding GO terms. Different colors represented different GO terms. (c) Results of geneset enrichment analysis in the cardioembolic stroke group.
5Disease Markers
Group
−2
−1
0
1
2
GroupCardioembolic stroke
Control
(a)
CD4-positive, alpha-beta T cell cytokine production
Positive regulation of triglyceride metabolic process
T cell cytokine production
Triglyceride metabolic process
Glycerolipid biosynthetic process
Regulation of innate immune response
Neutrophil degranulation
Neutrophil activation involved in immune response
Neutrophil mediated immunity
Neutrophil activation
0.05 0.10 0.15 0.20Generatio
0.01
0.02
p.adjust
Counts3
10
17
(b)
Figure 3: Continued.
6 Disease Markers
FBXO30
HECW2
KBTBD7
OSBPL1AUBE2S
ARG1
MMP8
MMP9CD19
FCGR1A
HLA−DQA1
ITCH
CD177CD79B
CFD
CLEC4D
CLEC5A
DEFA4
FCARFOLR3
FPR2
GPR84
GPR97
HEBP2
HPSE
LRG1 MCEMP1
SLPI
TNFAIP6
TSPAN14
HIST1H4D
HIST1H4HAGFG1
COL4A3
COX7B
CUX1
ENAM
F5
FGF9 LIN7A
MAP2K6
MSRB2
NDUFB3
PPFIA1SH3GL3SPIDR
F12
COL5A3
UQCRQ
TGFA GAB1
S1PR3
LPAR4
MSL3SYN2
TXN
PTPRDSPRY1
XRCC2
Co-expression0.00.20.40.60.8
−2
−1
0
1
2
LogFC
Cluster
Adaptive immune systemCollagen metabolic processImmune response regulating signaling pathwayLeukocyte activationNegative regulation of megakaryocyte differentiation
(c)
Hub genes in WGCNA Hub genes in DEGsPPI Network
85 25 33
(d)
TNFAIP6FPR2TGFAHIST1H4DLIN7AAGFG1CLEC4DKBTBD7MSL3GAB1PPFIA1ARG1HECW2MCEMP1MAP2K6HEBP2TXNSPIDRTSPAN14OSBPL1AF5GPR97LRG1HIST1H4HMSRB2
Group
−4
−2
0
2
4
Group
Cardioembolic stroke
Control
(e)
Figure 3: Continued.
7Disease Markers
expression value between stroke and control using a p value< 0.05 to indicate statistical significance. Box plots of theexpression of key genes were illustrated by the ggplot2 Rpackage.
2.13. Statistical Analysis. All data analyses were performed inR v3.6.3. Details of these bioinformatics analyses weredescribed in corresponding subsections. A p value < 0.05was defined as statistically significant.
3. Results
3.1. Data Preprocessing and DEG Screening. Box plots afternormalization of the raw data are illustrated in Figure 1(a).Principal component analysis (PCA) and uniform manifoldapproximation and projection (UMAP) analysis showed agood distinction between cardioembolic stroke and controlsamples (Figures 1(b) and 1(c)). Under the screening criteriaof padjust < 0:05 and ∣ log 2 fold‐change ðFCÞ∣ > 1, a total of319 genes were identified as DEGs, of which 198 genes wereupregulated and 121 genes were downregulated. Thevolcano plot of DEGs is displayed in Figure 1(d). DEGs inGSE58294 were arranged based on the fold change ofexpression values, the top 40 were illustrated by applyingheatmap (Figure 1(e)).
3.2. Functional Enrichment Analyses. All DEGs were selectedfor function enrichment analysis; the top 10 most significantGO terms according to their adjust p values were displayedin GOCircle plots (Figure 2(a)). The majority of terms inthe biological process category were associated with neutro-phil activation (GO:0042119), neutrophil degranulation
(GO:0043312), and neutrophil activation involved inimmune response (GO:0002283). Genes involved in biolog-ical processes that were upregulated in cardioembolic strokewere shown using a chord plot (Figure 2(b)). We furtherexplored our microarray data by using GSEA with the“ggplot2” package in R language; the results indicated thatthe cardioembolic stroke groups were mostly enriched interms of neutrophil degranulation, cellular response toexternal stimuli, toll-like receptor cascades, interleukin-1family signaling, and nod-like receptor signaling pathway(Figure 2(c)). Hub genes in key module were selected to per-form GO enrichment analyses in order to investigate thebiological function, as displayed in Figure 3(b).
3.3. Immune Cell Infiltration Analysis. Applying the CIBER-SORT algorithm, we investigated the difference of immuneinfiltration among cardioembolic stroke and control samplesin 22 subpopulations of immune cells (Figure 4(a)). Further-more, the results of correlation analysis of infiltratedimmune cells implied that T cells CD8 and B cells naïve,neutrophils, and macrophages M2 were positively corre-lated, and T cells CD8 and T cells CD4 naïve, neutrophils,and T cells CD8 were negatively correlated (Figure 4(b)).As shown in the heatmap and violin plot, compared withthe control sample, T cells memory activated (p < 0:001),NK cells resting (p = 0:003), macrophages M0 (p = 0:005),macrophages M2 (p = 0:003), and neutrophils (p = 0:001)infiltrated more in the cardioembolic stroke group, while Bcells naïve (p < 0:001), T cells CD8 (p = 0:005), and T cellsCD memory resting (p = 0:009) showed the opposite results(Figures 4(c) and 4(d)).
MCC
DMNC
MNC
DegreeEPC
0
0
0
15
00
00
0
00 0
0
0
0
0
0
0 0
0
0 0
0
0
0 0 0
00 1
CLEC4D, MCEMP1, GPR97,FPR2, TSPAN14
(f)
Figure 3: Identification of the key genes. (a) Heatmap of 110 hub genes in blue module. (b) GO analysis of 110 hub genes in blue module.(c) PPI network of 58 DEGs. The red color represented the upregulated genes while the blue color represented downregulated genes. DEGswere subsequently divided into different clusters based on their biological functions. The width of intergenic lines indicated the score ofcoexpression. (d) Venn plot of hub genes in WGCNA and DEGs. A total of 25 genes were identified as key genes. (e) Heatmap showingthe expression profile of 25 key genes. (f) A Venn diagram between five algorithms of CytoHubba. CLEC4D, MCEMP1, GPR97, FPR2,and TSPAN14 were determined as the crucial genes in the cardioembolic stroke.
8 Disease Markers
B cells naive
B cells memoryPlasma cellsT cells CD8
T cells CD4 naiveT cells CD4 memory restingT cells CD4 memory activated
T cells regulatory (Tregs)
T cells gamma delta
NK cells resting
NK cells activated
MonocytesMacrophages M0Macrophages M1Macrophages M2
Mast cells restingEosinophils
Neutrophils
Blood cardioembolic stroke Blood control
0%
20%
40%
60%
80%
100%
Dendritic cells restingDendritic cells activated
(a)
B cells naive
B cells memory
Plasma cells
T cells CD8
T cells CD4 naive
T cells CD4 memory resting
T cells CD4 memory activated
T cells gamma delta
NK cells resting
NK cells activated
Monocytes
Macrophages M0
Macrophages M1
Macrophages M2
Dendritic cells resting
Dendritic cells activated
Mast cells resting
Eosinophils −1.0
−0.5
0.0
0.5
1.0
⁎ p < 0.05⁎⁎ p < 0.01
Correlation
Neutrophils
T cells regulatory (Tregs)
⁎⁎
⁎⁎ ⁎⁎
⁎⁎ ⁎⁎
⁎⁎
⁎⁎
⁎⁎
⁎⁎
⁎⁎
⁎⁎⁎⁎
⁎⁎
⁎⁎
⁎⁎
⁎⁎
⁎⁎ ⁎⁎
⁎⁎
⁎⁎
⁎⁎
⁎⁎ ⁎⁎ ⁎⁎
⁎⁎
⁎⁎
⁎⁎
⁎⁎
⁎
⁎
⁎
⁎
⁎
⁎
⁎
⁎
⁎
⁎ ⁎
⁎⁎
⁎
⁎
⁎
⁎
⁎⁎
⁎
⁎
⁎
⁎⁎
(b)
Figure 4: Continued.
9Disease Markers
T cells CD4 naive
T cells CD4 memory resting
T cells CD4 memory activated
NK cells activated
T cells gamma delta
T cells regulatory (Tregs)
Dendritic cells activated
Macrophages M1
Dendritic cells resting
Macrophages M0
Eosinophils
B cells memory
Plasma cells
B cells naive
Macrophages M2
Mast cells resting
Neutrophils
Monocytes
T cells CD8
NK cells resting
Group
0
0.1
0.2
0.3
Group
Cardioembolic_stroke
Control
(c)
Cardioembolic_stroke
Control
0.0
0.1
0.2
0.3
0.4
0.5
Frac
tion
P < 0.001P = 0.05
P = 0.892
P = 0.005
P = 0.433
P = 0.009
P < 0.001
P = 0.246P = 0.53
P = 0.003
P = 0.003
P = 0.2
P = 0.005P = 0.581
P = 0.003
P = 0.422P = 0.105
P = 0.528P = 0.311
P = 0.001
B ce
lls n
aive
B ce
lls m
emor
y
Plas
ma c
ells
T ce
lls C
D8
T ce
lls C
D4
naiv
e
T ce
lls C
D4
mem
ory
resti
ng
T ce
lls C
D4
mem
ory
activ
ated
T ce
lls g
amm
a del
ta
NK
cells
resti
ng
NK
cells
activ
ated
Mon
ocyt
es
Mac
roph
ages
M0
Mac
roph
ages
M1
Den
driti
c cel
ls re
sting
Den
driti
c cel
ls ac
tivat
ed
Mas
t cel
ls re
sting
Eosin
ophi
ls
Neu
trop
hils
T ce
lls re
gula
tory
(Tre
gs)
Mac
roph
ages
M2
(d)
Figure 4: Immune cell infiltration analysis. (a) Relative proportions of 20 types of infiltrated immune cells between cardioembolic strokeand healthy control groups. (b) Correlation heatmap of all 20 immune infiltrated cells. (c) Heatmap of the 20 immune cell proportionsin all samples in GSE58294. (d) Violin plot showing the significant changes of the immune infiltration level in cardioembolic strokecompared to the control group.
10 Disease Markers
GPR97
NeutrophilsMacrophages M0
Macrophages M2NK cells activated
T cells regulatory (Tregs)B cells naive
MonocytesB cells memory
T cells gamma deltaT cells CD4 naive
NK cells restingT cells CD4 memory activated
Dendritic cells restingMacrophages M1Mast cells resting
Plasma cellsT cells CD8
Dendritic cells activatedT cells CD4 memory resting
Eosinophils
Correlation
0.2
0.4
0.6
0.8P
| Cor |0.2
0.4
0.6
−0.4 −0.2 0.0 0.2 0.4 0.6 0.8
(a)
FPR2
NeutrophilsMacrophages M0
T cells CD4 memory activatedT cells gamma delta
B cells naiveMonocytes
NK cells activatedMacrophages M2
T cells regulatory (Tregs)Macrophages M1
B cells memoryDendritic cells activated
T cells CD4 naiveDendritic cells resting
T cells CD4 memory restingNK cells resting
T cells CD8Mast cells resting
EosinophilsPlasma cells
Correlation| Cor |
0.2
0.4
0.6
−0.4 −0.2 0.0 0.2 0.4 0.6 0.8
0.2
0.4
0.6
0.8P
(b)
T cells CD4 memory activatedNeutrophils
Macrophages M0T cells gamma delta
T cells CD4 naiveDendritic cells activated
MonocytesT cells CD4 memory resting
NK cells activatedEosinophils
B cells memoryB cells naivePlasma cells
Dendritic cells restingT cells regulatory (Tregs)
Mast cells restingMacrophages M1Macrophages M2
NK cells restingT cells CD8
Correlation
0.2
0.4
0.6
P
| Cor |0.10.20.3
−0.4 −0.2 0.0 0.2 0.4
CLEC4D
(c)
Figure 5: Continued.
11Disease Markers
3.4. Correlation Analysis between Key Genes and Infiltration-Related Immune Cells. As implied from the correlation anal-ysis, MCEMP1 displayed a positive correlation with neutro-phils (r = 0:489, p < 0:001) and macrophages M0 (r = 0:376,p = 0:001) and a negative correlation with NK cells resting(r = –0:383, p = 0:001). FPR2 displayed a positive correlationwith neutrophils (r = 0:728, p = 1:26E − 12) and a negativecorrelation with T cells CD8 (r = −0:244, p = 0:042).TSPAN14 was positively correlated with neutrophils(r = 0:468, p < 0:001) and macrophages M0 (r = 0:266, p =0:0267). GPR97 positively correlated with neutrophils(r = 0:726, p = 1:66E − 12) and negatively correlated with Tcells CD4 memory resting (r = −0:325, p = 0:006) and T cellsCD8 (r = −0:263, p = 0:029). CLEC4D showed positively cor-related with neutrophils (r = 0:29, p = 0:014) and negativelycorrelated with macrophages M2 (r = −0:321, p = 0:007), Tcells CD8 (r = −0:388, p < 0:001), and NK cells resting(r = −0:331, p = 0:005) (Figures 5(a)–5(e)).
3.5. Construction of Coexpression Network. Based on thescreening criteria above, a total of 5413 genes were subjectedto WGCNA. To detect the possible outlier samples, a clustertree including 92 samples, clinic traits, and infiltration-related immune cells was performed by applying averagelinkage methods. Results indicated that no outlier was foundin the samples included in the WGCNA analysis(Figure 6(a)). We then established a scale-free (scale-freeR2 = 0:90) coexpression network with the soft-thresholding
power β = 3 (Figures 6(b) and 6(c)). After merging thehighly correlated modules by a clustering height cut-off of0.25 (Figure 7(a)), nine modules were finally obtained forfurther analysis. Ultimately, initial modules and mergedmodules display under the clustering tree (Figure 7(b)).Then, the correlations between the modules were analyzed;there was no significant correlation between different mod-ules (Figure 7(c)). The correlation analysis of transcriptswas performed within the modules, and no significant corre-lation between different modules was detected, implying thereliability of the division of modules (Figure 7(d)).
3.6. Identification of the Clinically Significant Modules andHub Genes. The association between the modules and clini-cal traits (disease status and infiltration-related immunecells) was explored by measuring the correlation betweenME values and clinical features. The results indicated thatblue module was positively correlated with cardioembolicstroke (r = 0:86, p = 1e − 27), neutrophils (r = 0:64, p = 4e −12), and macrophages M0 (r = 0:4, p = 8e − 05), and negativecorrelations were observed between blue modules and B cellsnaive (r = −0:58, p = 9e − 10) and blue modules and T cellsCD8 (r = −0:45, p = 7e − 06) (Figure 8(a)). Additionally, themodule significance displayed in bar diagram showed themean gene significance across whole genes of each module.Blue module was identified as the most clinically significantmodule (Figure 8(b)). Scatter plots of GS for stroke vs. MM(Figure 8(c)), GS for neutrophils vs. MM (Figure 8(d)), GSfor B cells naive vs. MM (Figure 8(e)), and GS for T cells
Neutrophils
Macrophages M0
Macrophages M2
Monocytes
NK cells activated
T cells gamma delta
Dendritic cells resting
B cells memory
T cells regulatory (Tregs)
T cells CD4 memory activated
T cells CD4 memory resting
T cells CD4 naive
Eosinophils
Mast cells resting
Dendritic cells activated
Plasma cells
T cells CD8
B cells naive
NK cells resting
Correlation
0.25
0.50
0.75
P
| Cor |0.1
0.2
0.3
0.4
MCEMP1
−0.4 −0.2 0.0 0.2 0.4
(d)
Correlation−0.4 −0.2 0.0 0.2 0.4
NeutrophilsMacrophages M0
B cells naiveMacrophages M2
NK cells restingT cells CD4 memory activated
NK cells activatedMonocytes
T cells regulatory (Tregs)Macrophages M1
Dendritic cells restingT cells CD8
Dendritic cells activatedT cells CD4 memory resting
T cells CD4 naiveT cells gamma delta
Mast cells restingEosinophils
B cells memoryPlasma cells
0.25
0.50
0.75
P
| Cor |0.10.2
0.30.4
TSPAN14
(e)
Figure 5: Correlation analysis between key gene expressions and the relative percentages of immune cells in the cardioembolic stroke group.(a–e) Lollipop plots illustrated the relationship between the relative proportions of infiltration-related immune cells and (a) GPR97, (b)FPR2, (c) CLEC4D, (d) MCEMP1, and (e) TSPAN14.
12 Disease Markers
GSM
1406
118
GSM
1406
076
GSM
1406
095
GSM
1406
111
GSM
1406
121
GSM
1406
078
GSM
1406
098
GSM
1406
082
GSM
1406
107
GSM
1406
120
GSM
1406
117
GSM
1406
075
GSM
1406
094
GSM
1406
065
GSM
1406
112
GSM
1406
071
GSM
1406
087
GSM
1406
070
GSM
1406
102
GSM
1406
104
GSM
1406
110
GSM
1406
124
GSM
1406
086
GSM
1406
037
GSM
1406
106
GSM
1406
058
GSM
1406
081
GSM
1406
122
GSM
1406
064
GSM
1406
099
GSM
1406
088
GSM
1406
060
GSM
1406
113
GSM
1406
059
GSM
1406
085
GSM
1406
069
GSM
1406
084
GSM
1406
109
GSM
1406
072
GSM
1406
090
GSM
1406
115
GSM
1406
067
GSM
1406
103
GSM
1406
074
GSM
1406
093
GSM
1406
097
GSM
1406
077
GSM
1406
073
GSM
1406
092 GSM
1406
068
GSM
1406
083
GSM
1406
108
GSM
1406
045
GSM
1406
055
GSM
1406
051
GSM
1406
041
GSM
1406
046
GSM
1406
040
GSM
1406
042
GSM
1406
036
GSM
1406
053
GSM
1406
048
GSM
1406
049
GSM
1406
033
GSM
1406
043
GSM
1406
052
GSM
1406
054
GSM
1406
047
GSM
1406
044
GSM
1406
050
GSM
1406
034
GSM
1406
035
GSM
1406
038
GSM
1406
039
GSM
1406
114
GSM
1406
061
GSM
1406
089
GSM
1406
119
GSM
1406
063
GSM
1406
096
GSM
1406
066
GSM
1406
101
GSM
1406
116
GSM
1406
062
GSM
1406
091
GSM
1406
056
GSM
1406
079
GSM
1406
100
GSM
1406
123
GSM
1406
080
GSM
1406
057
GSM
1406
105
30
50
70
90Sample dendrogram and trait heatmap
hclust (*, "average")dist(datExpr)
Hei
ght
Stroke
No_Stroke
B.cells.naive
T.cells.CD8
T.cells.CD4.naive
NK.cells.activated
NK.cells.resting
Neutrophils
(a)
5 10 15 20
Scale independence
Soft threshold (power)
Scal
e fre
e top
olog
y m
odel
fit,
signe
d (R
^2)
1
2
3 4 5 6 7 8 9 10 12 13 14 15 16 17 18 19 20
5 10 15 20
0
200
400
600
800
1000
Mean connectivity
Soft threshold (power)
Mea
n co
nnec
tivity
1
2
3
45 6 7 8 9 10 12 13 14 15 16 17 18 19 200.0
0.2
0.4
0.6
0.8
(b)
Histogram of k
k
Freq
uenc
y
0 100 200 300 400 500 600 1.6 1.8 2.0 2.2 2.4 2.6 2.8
Check scale free topology scale R^2= 0.9, slope = −1.1
log10 (k)
log1
0 (p
(k))
−1.5
−0.5
−1.0
0
1500
1000
500
(c)
Figure 6: Clustering of samples and selection of the best fit soft-thresholding power. (a) Clustering according to the expression level ofcardioembolic stroke patients. The color intensity was proportional to disease status (stroke, no stroke) and infiltration-related immunecells. (b) The cut-off for soft-thresholding power β was set to be 0.80, and β =3 was determined. (c) The coexpression network exhibits ascale-free topology.
13Disease Markers
MEp
ink
MEb
lack
MEp
urpl
e
MEm
agen
ta
MEy
ello
w
MEb
lue
MEb
row
n
MEt
urqu
oise
MEg
reen
MEr
ed0.0
0.2
0.4
0.6
0.8
1.0
0.2
0.4
0.6
0.8
1.0
Clustering of ME before combinedH
eigh
t
MEm
agen
ta
MEy
ello
w
MEp
ink
MEb
lack
MEp
urpl
e
MEb
lue
MEt
urqu
oise
MEg
reen
MEr
ed
Clustering of ME after combined
Hei
ght
(a)
Gene dendrogram and module colors
d
Hei
ght
Dynamic tree cut
Merged dynamics
0.5
1.0
0.9
0.8
0.7
0.6
(b)
Figure 7: Continued.
14 Disease Markers
CD8 vs. MM (Figure 8(f)) in the blue module were plotted,respectively. The results indicated that blue module washighly correlated with stoke and immune infiltration.Under the screening criteria of ∣MM∣ > 0:8 and ∣GS∣ > 0:5, 110 highly connected hub genes were determined forfurther analyses.
3.7. Screening Key Genes by Integrating Multiple Analysis.Heatmap showed 110 hub genes in the blue module thatwere highly expressed in the cardioembolic stroke group(Figure 3(a)). To gain further insight into the biologicalfunctions of the hub genes in blue module, GO analyses wereapplied and results indicated they were mainly enriched in“neutrophil activation,” “neutrophil-mediated immunity,”“regulation of immune response,” “T cell cytokine produc-tion,” and “CD4 positive, alpha-beta T cell cytokine produc-tion” (Figure 3(b)). DEGs were imported into the online toolof the STRING database to generate the PPI network, and atotal of 58 genes were identified as hub genes, and theirexpression pattern and biological functions are also visual-ized in Figure 3(c). The results implied that most of themwere associated with immune response and collagen meta-bolic process. In addition, we used the Venn diagram tooverlap the hub genes in the PPI network and blue module
targets and found 25 overlapped DEGs (Figure 3(d)). As dis-played in Figure 3(e), the 25 gene expression patternsbetween stroke and control were also visualized. Five algo-rithms of the CytoHubba, containing MCC, DMNC, MNC,EPC, and Degree, were then used to process the 25 DEGPPI network to screen the top ten genes. A Venn diagram(Figure 3(f)) was made to build the intersection of genesidentified by five algorithms, and CLEC4D, MCEMP1,GPR97, FPR2, and TSPAN14 were determined as key genes.
3.8. MCEMP1 and Its Associated Signal Pathways. We usedthe full information provided by the Gene-MANIA databaseto identify the 20 next neighboring proteins of theMCEMP1-related query genes; CLEC4D and FPR2 wereinvolved in this analysis. Networks were presented inFigure 9(a). To explore the potential biological functionsof MCEMP1 in CS, GSEA was applied to identify the dif-ferentially activated signaling pathways in the highMCEMP1 expression group. Results indicated that theterm of neutrophil degranulation, NFKB pathway, inflam-masomes, the NLRP3 inflammasome, and IL1R pathwaywas significantly enriched in the high expression groupof MCEMP1 (Figure 9(b)). Heatmap displayed the associ-ated significantly enriched genes in the term of NLRP3
MEm
agen
ta
MEy
ello
w MEp
ink
MEb
lack
MEp
urpl
e
MEb
lue
MEt
urqu
oise
MEg
reen
MEr
ed
0
0.2
0.4
0.6
0.8
0.2
0.4
0.6
0.8
1.0
1
(c)
Network heatmap plot, selected genes
(d)
Figure 7: Construction of coexpression network. (a) Clustering dendrograms were cut at a height of 0.25 to detect and combine the similarmodules. (b) Origin and merged modules displaying under the clustering tree. (c) Adjacency heatmap of module eigengenes. Red indicatedhigh correlation, and blue represented the opposite results. (d) Clustering dendrogram of nine module eigengenes.
15Disease Markers
inflammasome (Figure 9(c)), and interleukin signal path-way (Figure 9(d)).
3.9. ROC Analysis of Key Genes. We performed receiveroperating characteristic (ROC) analysis of CLEC4D,MCEMP1, GPR97, FPR2, and TSPAN14 to further validatethe diagnostic value of those key genes. The results indicatedthat all these crucial genes showed potential clinical signifi-cance at 3 h (Figure 10(a)), 5 h (Figure 10(b)), and 24 h(Figure 10(c)) following the cardioembolic stroke event. Fur-thermore, the validation dataset (GSE16561) confirmed theabove-presented results: CLEC4D (AUC 0.913), GPR97(AUC 0.847), MCEMP1 (AUC 0.796), and TSPAN14(AUC 0.718) (Figure 11(c)). To improve the efficiency indistinguishing the capacity of stroke, we constructed thecombined diagnosis model of four crucial genes; the AUCvalue of the stroke reached to 0.946 (95% CI: 0.892–0.999)(Figure 11(b)). These results implied that all crucial genesplayed key roles in stroke.
3.10. Validation of Key Gene Expression. We further vali-dated the expression of these key genes in two datasets. Indataset GSE16561, with the threshold of p < 0:05, CLEC4D,GPR97, MCEMP1, and TSPAN14 were significantly upregu-lated in the stroke group (Figures 11(c) and 11(d)) (Theplatform GPL6883 did not explore FPR2’s expression). Inanother dataset GSE58294, at 3 h, 5 h, and 24 h postonset,key genes (CLEC4D, MCEMP1, GPR97, FPR2, andTSPAN14) were significantly upregulated in the strokegroup, as compared with those in the normal control(Figures 10(d)–10(g)).
4. Discussion
Cardioembolic stroke (CS) results in a high rate of disability,morbidity, and mortality, which is a common centralnervous system disease with poor prognosis [21]. CS is acommon and complex disease with multiple risk factorsand causes, including atrial fibrillation, coronary heartdisease, valvular heart disease, hypertension, obesity, anddiabetes [21, 22]. Previous studies indicated that inflamma-tion and immunity response were involved in the occurrenceand development of CS [23]. Markus et al. have alsoreported potential cardioembolic stroke biomarkers in theirstudy, including common inflammatory markers CRP, inter-leukin-6, interleukin-1β, and tumor necrosis factor-α [24],whereas, currently, there were no specific and highly sensi-tive biomarkers for distinguishing CS from large strokecases. Therefore, it is imperative to find potential new candi-date biomarkers in order to help physicians to develop astrategy for treating CS at early stages.
In this study, we downloaded the GSE58294 datasetfrom the available GEO database and estimated the compo-sition of the immune cells using CIBERSORT algorithmsbased on the expression matrix, then employing WGCNAto determine the modules associated with the immune celltypes. Totally, nine modules were screened; among them,blue module was the most significantly associated with CS,neutrophils, B cells naïve, and T cells CD8. To our knowl-
edge, it is the first time to use WGCNA to explore the rela-tionships between immune cell types and CS. Wesystematically analyzed the proportion of specific types ofimmune cells in CS patients. It may provide a novel insightinto the strategies for diagnosis and immunotherapy of CS.Under the condition of MM> 0:8 and GS > 0:5, 110 candi-date hub genes were then identified within the key modules.We then applied functional enrichment analysis on genes,and results indicated that genes were mainly enriched inneutrophil activation, neutrophil-mediated immunity,regulation of innate immune response, and T cell cytokineproduction. Additionally, DEGs between CS and controlwere also screened and used to construct the PPI network.Genes within the network were clustered into differentsubclades, including the adaptive immune system, collagenmetabolic process, and immune response regulating signal-ing pathway. In order to find potential new biomarkers, wegenerated another new 25 hub genes by taking the intersec-tion of hub genes in DEGs’ PPI network and hub genes inkey module. Based on the CytoHubba, five hub genes weredetermined, including MCEMP1, CLEC4D, TSPAN14,GPR97, and FPR2. The relationship between hub genesand immune cell was also determined, and results showedthat genes were significantly positively correlated withneutrophils and macrophages M0 and negatively corre-lated with T cells CD8. Finally, by using the ROC analysis,we found that not only individual crucial genes but alsothe combined diagnosis model of them had potential diag-nostic significance.
Mast cell expressed membrane protein 1, this geneencodes a single-pass transmembrane protein MCEMP1.Based on its expression pattern, it is thought to be involvedin regulating mast cell differentiation or immune responses.Jian et al. have reported that MCEMP1 was found to behighly expressed in rats with cerebral ischemic stroke [25].Furthermore, silencing MCEMP1 resulted in the upregula-tion of vascular endothelial growth factor (VEGF), whiledownregulation of Caspase3 led to the promotion of micro-vessel density (MVD) in rats with ischemic stroke [25].Moreover, silencing of MCEMP1 could increase Ki67-positive cells and reduce terminal deoxynucleotidyltransferase-mediated d-UTP nick end labeling (TUNEL)positive cells in the marginal zone of cortical infarction inrats. Their study has proved that silenced MCEMP1 couldsuppress neuronal apoptosis and enhance angiogenesis inrats with cerebral ischemic stroke, emphasizing on thatMCEMP1 silencing could serve as a therapeutic target forcerebral ischemic stroke treatment. Raman et al. implicatedthat peripheral blood expression of MCEMP1 within 1month after stroke has been proposed as a diagnosis andprognostic biomarker for primary stroke [26]. With all thisbeing taken into consideration, MCEMP1 is a key moleculein the regulation and maintenance of the cerebral ischemicstroke. However, the role of MCEMP1 in cardioembolicstroke (CS) and its underlying mechanisms remain poorlyunderstood. Our study was first to associate the immuneresponse and CS and prove that MCEMP1 was correlatedwith neutrophils, B cells naïve, and T cells CD8. Besides this,NLRP3 inflammasome and interleukin-1 signal pathway
16 Disease Markers
Module−trait relationships
−1
−0.5
0
0.5
1
MEmagenta
MEyellow
MEpink
MEblack
MEpurple
MEblue
MEturquoise
MEgreen
MEred
−0.06(0.6)
0.06(0.6)
−0.052(0.6)
−0.021(0.8)
0.013(0.9)
0.079(0.5)
−0.014(0.9)
−0.23(0.03)
−0.082(0.4)
0.11(0.3)
0.27(0.01)
0.065(0.5)
−0.065(0.5)
−0.0043(1)
−0.18(0.09)
0.13(0.2)
−0.15(0.1)
−0.068(0.5)
−0.11(0.3)
−0.091(0.4)
0.088(0.4)
0.28(0.007)
0.025(0.8)
−0.025(0.8)
0.11(0.3)
0.61(1e−10)
−0.089(0.4)
0.11(0.3)
0.63(2e−11)
−0.099(0.3)
−0.15(0.2)
−0.22(0.03)
−0.55(2e−08)
−0.56(8e−09)
0.56(8e−09)
0.4(8e−05)
0.32(0.002)
0.46(4e−06)
−0.15(0.2)
−0.14(0.2)
0.3(0.004)
−0.49(7e−07)
−0.088(0.4)
−0.7(8e−15)
−0.52(1e−07)
0.52(1e−07)
0.69(4e−14)
0.41(6e−05)
0.2(0.06)
−0.0093(0.9)
−0.085(0.4)
0.16(0.1)
−0.47(2e−06)
−0.055(0.6)
−0.52(1e−07)
0.86(1e−27)
−0.86(1e−27)
−0.58(9e−10)
−0.45(7e−06)
−0.34(8e−04)
0.37(3e−04)
0.12(0.3)
−0.29(0.005)
0.4(8e−05)
−0.2(0.06)
0.64(4e−12)
−0.056(0.6)
0.056(0.6)
0.083(0.4)
0.016(0.9)
0.34(0.001)
0.3(0.003)
−0.12(0.2)
0.14(0.2)
−0.37(3e−04)
−0.42(3e−05)
−0.36(4e−04)
0.5(4e−07)
−0.5(4e−07)
−0.34(0.001)
−0.28(0.006)
0.18(0.09)
0.34(0.001)
−0.039(0.7)
0.047(0.7)
−0.085(0.4)
−0.48(2e−06)
−0.11(0.3)
0.58(2e−09)
−0.58(2e−09)
−0.41(5e−05)
−0.47(2e−06)
−0.00037(1)
0.54(3e−08)
−0.14(0.2)
−0.12(0.3)
0.088(0.4)
−0.43(2e−05)
0.36(5e−04)
Stro
ke
No_
Stro
ke
B.ce
lls.n
aive
T.ce
lls.C
D8
T.ce
lls.C
D4.
mem
ory.
resti
ng
T.ce
lls.C
D4.
mem
ory.
activ
ated
NK.
cells
.resti
ng
NK.
cells
.activ
ated
Mac
roph
ages
.M0
Mac
roph
ages
.M2
Neu
trop
hils
(a)
black blue green magenta pink purple red turquoise yellow
Gene significance for stroke across module (p−value = 0)
Gen
e sig
nific
ance
0.0
0.6
0.5
0.4
0.3
0.2
0.1
(b)
0.2
0.2
0.4
0.4
0.6
0.6
0.8
0.8
0.0
Module membership in blue module
Module membership vs. gene significancecor =0.9, p < 1e−200
Gen
e sig
nific
ance
for s
trok
e
(c)
0.2 0.4 0.6 0.8
Module membership vs. gene significance cor = 0.75, p <1e−200
Module membership in blue module
Gen
e sig
nific
ance
for n
eutr
ophi
ls
0.2
0.4
0.6
0.8
0.0
(d)
Figure 8: Continued.
17Disease Markers
were significantly enriched in the MCEMP1 high-expressiongroup. Consequently, all of the results have suggested thatthe MCEMP1 was involved in the process of inflammatoryand immune response and it was worthy of additional inves-tigation and development.
CLEC4D (C-Type Lectin Domain Family 4 Member D),a member of the C-type lectin/C-type lectin-like domain(CTL/CTD) superfamily, acted as a pattern recognitionreceptor (PRR) of the innate immune system: recognizeddamage-associated molecular patterns (DAMPs) ofpathogen-associated molecular patterns (PAMPs) of bacte-ria [27, 28]. CLEC4D played vital roles as regulators of celladhesion, cell-cell signaling, inflammation, and immuneresponse [29]. Moreover, studies have shown that the rela-tive mRNA expression of CLEC4D in peripheral blood ofpatients suffering ischemic stroke within 24 h after onsetwas dramatically increased, compared with the normal con-trol group, which was consistent with our analysis results[29]. Additionally, our study indicated that CLEC4D waspositively correlated with neutrophils and T cells CD4 mem-ory activated, while negatively associated with T cells CD8,which implied that CLEC4D might act through an inflam-matory mechanism dependent upon immune effectors incardioembolic stroke.
GPR97, also named as ADGRG3, is especially expressedin whole blood, particularly in neutrophils. GPR97 was a sig-nificant molecule that regulated the development of B celland migration of lymphatic endothelial cells in vitro viathe small GTPases RhoA and CDC42 [30]. Wang et. al havealso verified that GPR97 regulated proinflammatory cyto-kine production in vitro culture assay and played an impor-tant role in the development of experimental autoimmuneencephalomyelitis (EAE), which indicated that it may have
a therapeutic potential for the treatment of CNS autoimmu-nity [31]. However, the role of GPR97 in CS is unclear andneeds to be further explored.
Tetraspanin 14 (TSPAN14), expressed by many types oftissues, especially whole blood, was involved in neutrophildegranulation, positive regulation of notch signaling path-way, and protein maturation. A previous study has reportedthat TSPAN14 was correlated with periventricular whitematter hyperintensities which was an indicator of a historyof cerebrovascular disease [32]. Our results indicated thatTSPAN14 was positively associated with macrophages M0and neutrophils, which suggested TSPAN14 may contributeto CS by participating in immunity and inflammation.
Other genes with high degree in the crucial gene cluster,such as FPR2, also played vital roles in CS pathogenesis.FPR2 is preferentially expressed by monocytes, as previouslydiscussed, and was found to be expressed mainly by mam-malian phagocytic leukocytes and involved in inflammationand antibacterial host defense [33]. Vital et al. found thattargeting the AnxA1/FPR2/ALX pathway represents anattractive therapeutic strategy for the treatment of thrombo-inflammation, counteracting, e.g., stroke in high-risk patientcohorts [34]. The findings of Gavins et al. implicated thatFPR ligands, particularly in the brain, could be novel andexciting anti-inflammatory therapeutics for the treatmentof a variety of clinical conditions, including stroke [35].There are also several limitations still detected in our presentstudy. First, the data we used was from public databases,which were limited in the sample size. Further research withlarger sample sizes should be carried out to validate ourresults. Second, the functions and potential molecular mech-anisms of genes are quite complicated, and further verifica-tion of cellular and animal experiments is required.
0.2 0.4 0.6 0.8
0.0
0.1
0.2
0.3
0.4
0.5
0.6
Module membership vs. gene significancecor = 0.78, p < 1e−200
Module membership in blue module
Gen
e sig
nific
ance
for B
.cells
.nai
ve
(e)
0.2 0.4 0.6 0.8
Module membership vs. gene significance cor = 0.66, p = 2.1e−167
Module membership in blue module
Gen
e sig
nific
ance
for T
.cells
.CD
8
0.0
0.1
0.2
0.3
0.4
0.5
0.6
(f)
Figure 8: Screening of clinical related modules. (a) Heatmap of module-trait correlation. Red represented positive correlation and bluerepresented negative correlation. (b) Gene significance for stroke across all modules, the blue module was determined as clinical relatedmodule. (c–f) Scatter plot for correlation between the module membership (MM) and gene significance (GS) revised MM and GS (c)stroke, (d) neutrophils, (e) B cells naïve, and (f) T cells CD8.
18 Disease Markers
Genetic interactions
Co-localizationPredictedCo-expressionPhysical Interactions
Networks
PathwayShared protein domains
(a)
0.0
0.2
0.4
0.6
Enric
hmen
t Sco
re
−1
0
1
Rank
ed li
st m
etric
5000 10000 15000Rank in ordered dataset
MCEMP1
“High” “Low”
REACTOME_NEUTROPHIL_DEGRANULATIONBIOCARTA_NFKB_PATHWAYREACTOME_INFLAMMASOMESREACTOME_THE_NLRP3_INFLAMMASOMEBIOCARTA_IL1R_PATHWAY
(b)
NLRP3 Inflammasome
NLRP3
MEFV
NFKB2
PYCARD
PSTPIP1
NFKB1
TXNIP
SUGT1
TXN
CASP1
Group
MCEMP1 High
MCEMP1 Low
−2
−1
0
1
2
(c)
Interleukin1 signal pathway
PSMC3NKIRAS2PSMB6PSMD8PSMB7PSMA7PSMD4PSMC1S100A12MYD88NFKB2PSMB8PSMB3UBCPSMD9PSMB4UBBUBA52PSMC2PSMD6PSMA1PELI3TNIP2IL1BMAP2K6NFKBIAPELI2MAP2K4NFKB1PSMD1PSMD13PSME3IKBKGTOLLIPIL1RNMAP3K3IL1AMAP3K8IRAK4IL1R2IL1R1IRAK3
Group
MCEMP1 High
MCEMP1 Low
−4
−2
0
2
4
(d)
Figure 9: MCEMP1 and its associated signal pathways. (a) MCEMP1 and its coexpression network. (b) GSEA results. (c, d) NLRP3inflammasome and interleukin-1 signal pathway were significantly enriched in the MECMP1 high group.
19Disease Markers
0.0
0.2
0.4
0.6
0.8
1.0
Sens
itivi
ty (T
PR)
CLEC4D (AUC = 0.926)
MCEMP1 (AUC = 0.928)
GPR97 (AUC = 0.926)
TSPAN14 (AUC = 0.958)
FPR2 (AUC = 0.862)
0.0 0.2 0.4 0.6 0.8 1.0
1−specificity (FPR)
3h
(a)
0.0
0.2
0.4
0.6
0.8
1.0
Sens
itivi
ty (T
PR)
CLEC4D (AUC = 0.983)
MCEMP1 (AUC = 0.968)
GPR97 (AUC = 0.985)
TSPAN14 (AUC = 0.938)
FPR2 (AUC = 0.924)
0.0 0.2 0.4 0.6 0.8 1.0
1−specificity (FPR)
5h
(b)
24h
0.0
0.2
0.4
0.6
0.8
1.0
Sens
itivi
ty (T
PR)
CLEC4D (AUC = 0.922)
MCEMP1 (AUC = 0.996)
GPR97 (AUC = 0.979)
TSPAN14 (AUC = 0.985)
FPR2 (AUC = 0.909)
0.0 0.2 0.4 0.6 0.8 1.01 − specificity (FPR)
(c)
Card
io_s
trok
e_25
Card
io_s
troke
_29
Card
io_s
troke
_10
Card
io_s
troke
_46
Card
io_s
troke
_56
Card
io_s
troke
_27
Card
io_s
troke
_52
Card
io_s
troke
_33
Card
io_str
oke_
5
Cardio_
strok
e_19
Cardio_
strok
e_23
Cardio_
strok
e_62
Cardio_str
oke_1
Cardio_str
oke_24
Cardio_stroke_15
Cardio_stroke_42
Cardio_stroke_49
Cardio_stroke_9
Cardio_stroke_17
Cardio_stroke_20
Cardio_stroke_14
Cardio_stroke_61
Cardio_stroke_59
Cardio_stroke_54
Cardio_stroke_67Cardio_stroke_36Cardio_stroke_44Cardio_stroke_26Cardio_stroke_48Cardio_stroke_38
Cardio_stroke_30Cardio_stroke_51
Cardio_stroke_4
Cardio_stroke_40
Cardio_stroke_8
Cardio_stroke_41
Cardio_stroke_39
Cardio_stroke_63
Cardio_stroke_60
Cardio_stroke_65
Cardio_stroke_11
Cardio_stroke_64
ctr_16
ctr_6ctr_21
ctr_2ctr_18ctr_20ctr_3ctr_17
1_rtcctr_
10ct
r_7
ctr_
12ct
r_13
ctr_
23ctr_9
ctr_1
4
ctr_1
9
ctr_1
5
Cardio_
strok
e_18
Cardio_
strok
e_37
Cardio_str
oke_31
Cardio_str
oke_47
Cardio_stroke_6
Cardio_stroke_3
Cardio_stroke_34
Cardio_stroke_16
Cardio_stroke_32Cardio_stroke_12
Cardio_stroke_69Cardio_stroke_45Cardio_stroke_68Cardio_stroke_35Cardio_stroke_7
Cardio_stroke_57Cardio_stroke_43
Cardio_stroke_55ctr_5ctr_8
Cardio_stroke_2
Cardio_stroke_50
Cardio_stroke_22
ctr_4
ctr_11
Cardio_stroke_58
Cardio_stroke_21
ctr_22
Cardio_stroke_66
Cardio_stroke_13
Cardio_stroke_28
Cardio_stroke_53
CLEC4DMCEMP1
GPR97TSPAN14
FPR2
−2
−1
0
1
2
(d)
0
5
10
15
20
The e
xpre
ssio
n le
vel o
f mRN
A
CLEC4D MCEMP1 GPR97 TSPAN14 FPR2
3h
ControlCardioembolic stroke
⁎⁎⁎
⁎⁎⁎⁎⁎⁎
⁎⁎⁎
⁎⁎⁎
(e)
0
5
10
15
20
25
The e
xpre
ssio
n le
vel o
f mRN
A
CLEC4D MCEMP1 GPR97 TSPAN14 FPR2
5h
ControlCardioembolic stroke
⁎⁎⁎
⁎⁎⁎
⁎⁎⁎
⁎⁎⁎
⁎⁎⁎
(f)
24h
0
5
10
15
20
The e
xpre
ssio
n le
vel o
f mRN
A
CLEC4D MCEMP1 GPR97 TSPAN14 FPR2
⁎⁎⁎
⁎⁎⁎
⁎⁎⁎
⁎⁎⁎
⁎⁎⁎
ControlCardioembolic stroke
(g)
Figure 10: The potential clinical values and expression level of five key genes. (a-c) Applying ROC analysis of 5 key genes to discriminatebetween cardioembolic stroke and control group, (a) 3 hours, (b) 5 hours, and (c) 24 hours following cardioembolic stroke. (d) Annularheatmap showing the expression of hub genes in each sample. (e–g) Box plots displaying changes in expression levels of CLEC4D,MCEMP1, GPR97, TSPAN14, and FPR2 in 3 h, 5 h, and 24 h after cardioembolic stroke. All crucial genes were significantly increased inpatients with cardioembolic stroke compared to normal individuals.
20 Disease Markers
0.0
0.2
0.4
0.6
0.8
1.0
Sens
itivi
ty (T
PR)
CLEC4D (AUC = 0.913)
GPR97 (AUC = 0.847)
MCEMP1 (AUC = 0.796)
TSPAN14 (AUC = 0.718)
0.0 0.2 0.4 0.6 0.8 1.0
1−specificity (FPR)
(a)
0.0
0.2
0.4
0.6
0.8
1.0
Sens
itivi
ty (T
PR)
modelAUC: 0.946
CI: 0.892−0.999
0.0 0.2 0.4 0.6 0.8 1.01−specificity (FPR)
(b)
TSPAN14
MCEMP1
CLEC4D
GPR97
Group
GroupStroke
Control
−4
−2
0
2
4
(c)
0
2
4
6
8
The e
xpre
ssio
n le
vel o
f mRN
A
CLEC4D GPR97 MCEMP1 TSPAN14
StrokeControl
⁎⁎⁎⁎⁎⁎
⁎⁎⁎
⁎⁎
(d)
Figure 11: The verification of crucial genes as biomarkers for stroke in an independent dataset. (a) Receiver operating characteristic curves forindividual CLEC4D, GPR97, MCEMP1, and TSPAN14 of stroke versus control. (b) Evaluation of clinical diagnostic efficacy of 4 key genesignatures (AUC = 0.946, 95% CI = 0.892–0.999) (Logistic regression model = -14.1075+ 7.9797∗CLEC4D+ 1.8645∗GPR97+ 1.8384∗MCEMP1+ 2.7048∗TSPAN14). (c) Heatmap showing the relative expression levels of CLEC4D, GPR97, MCEMP1, and TSPAN14 in eachsample. (d) Box plot indicated the expression of 4 crucial genes following stroke were all significantly higher than healthy individuals.
21Disease Markers
5. Conclusion
In our study, we performed WGCNA to analyze the rela-tionships between immune cell types and cardioembolicstroke (CS) for the first time. Five crucial genes (MCEMP1,TSPAN14, CLEC4D, GPR97, and FPR2) were identified.These five genes may therefore be potential in CS and areworthy of further investigation.
Data Availability
The datasets used in this study can be acquired in NCBIdatasets (GEO https://www.ncbi.nlm.nih.gov/geo/info/linking.html; GSE58294; GSE16561).
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Authors’ Contributions
Qiaoqiao Li and Fang Rao designed the study, and QiaoqiaoLi, Xueping Gao, and Xueshan Luo conducted the literaturesearch. Qingrui Wu and Jintao He analyzed the data andprepared figures and/or tables. Qiaoqiao Li and Yang Liucarried out the data analysis. Qiaoqiao Li and Xueping Gaowrote the paper. Yumei Xue, Shulin Wu, and Fang Rao per-formed editing and reviewing. All authors contributed ideasand comments, revised the paper, and approved the finalversion. Qiaoqiao Li, Xueping Gao, and Xueshan Luo con-tributed equally to this work.
Acknowledgments
This work was supported by grants from the Medical Scienceand Technology Foundation of Guangdong Province(2019B020230004), the National Natural Science Founda-tion of China (No. 81870254), and the High-Level HospitalConstruction Plan (No. DFJH201808).
References
[1] Y. M. Qiu, C. L. Zhang, A. Q. Chen et al., “Immune cells in theBBB disruption after acute ischemic stroke: targets for immunetherapy?,” Frontiers in Immunology, vol. 12, 2021.
[2] J. Huang, F.-C. Zhou, B. Guan et al., “Predictors of remissionof early-onset poststroke depression and the interactionbetween depression and cognition during follow-up,” Fron-tiers in Psychiatry, vol. 9, 2019.
[3] Q. Xu, B. Zhao, Y. Ye et al., “Relevant mediators involved inand therapies targeting the inflammatory response inducedby activation of the NLRP3 inflammasome in ischemic stroke,”Journal of Neuroinflammation, vol. 18, no. 1, p. 123, 2021.
[4] H. Park, M. Han, Y. D. Kim et al., “Impact of the total numberof carotid plaques on the outcome of ischemic stroke patientswith atrial fibrillation,” Journal of Clinical Medicine, vol. 8,no. 11, p. 1897, 2019.
[5] V. L. Feigin, B. Norrving, M. G. George, J. L. Foltz, G. A. Roth,and G. A. Mensah, “Prevention of stroke: a strategic globalimperative,” Nature Reviews Neurology, vol. 12, no. 9,pp. 501–512, 2016.
[6] D. Petrovic-Djergovic, S. N. Goonewardena, and D. J. Pinsky,“Inflammatory disequilibrium in stroke,” CirculationResearch, vol. 119, no. 1, pp. 142–158, 2016.
[7] C. Cheyuo, A. Jacob, R. Wu, M. Zhou, G. F. Coppa, andP. Wang, “The parasympathetic nervous system in the questfor stroke therapeutics,” Journal of Cerebral Blood Flow andMetabolism, vol. 31, no. 5, pp. 1187–1195, 2011.
[8] B. M. Famakin, “The immune response to acute focal cerebralischemia and associated post-stroke immunodepression: afocused review,” Aging and Disease, vol. 5, no. 5, pp. 307–326, 2014.
[9] L. Gautier, L. Cope, B. M. Bolstad, and R. A. Irizarry, “Affy -analysis of Affymetrix GeneChip data at the probe level,” Bio-informatics, vol. 20, no. 3, pp. 307–315, 2004.
[10] C. Ginestet, “ggplot2: elegant graphics for data analysis,” Jour-nal of the Royal Statistical Society Series a-Statistics in Society,vol. 174, no. 1, pp. 245-246, 2011.
[11] G. Yu, L.-G. Wang, Y. Han, and Q.-Y. He, “clusterProfiler: anR package for comparing biological themes among gene clus-ters,” Omics-a Journal of Integrative Biology, vol. 16, no. 5,pp. 284–287, 2012.
[12] A. M. Newman, C. B. Steen, C. L. Liu et al., “Determining celltype abundance and expression from bulk tissues with digitalcytometry,” Nature Biotechnology, vol. 37, no. 7, pp. 773–782, 2019.
[13] P. Langfelder and S. Horvath, “WGCNA: an R package forweighted correlation network analysis,” BMC Bioinformatics,vol. 9, no. 1, 2008.
[14] Y. Kong, Z.-C. Feng, Y.-L. Zhang et al., “Identification ofimmune-related genes contributing to the development ofglioblastoma using weighted gene co-expression network anal-ysis,” Frontiers in Immunology, vol. 11, 2020.
[15] J. A. Gustavsen, S. Pai, R. Isserlin, B. Demchak, and A. R. Pico,“RCy3: network biology using Cytoscape from within R,”F1000Research, vol. 8, pp. 1774–1774, 2019.
[16] E. T. Bullmore and O. Sporns, “Complex brain networks:graph theoretical analysis of structural and functional sys-tems,” Nature Reviews Neuroscience, vol. 10, no. 3, pp. 186–198, 2009.
[17] D. Warde-Farley, S. L. Donaldson, O. Comes et al., “TheGeneMANIA prediction server: biological network integra-tion for gene prioritization and predicting gene function,”Nucleic Acids Research, vol. 38, suppl_2, pp. W214–W220,2010.
[18] Z. Gu, R. Eils, and M. Schlesner, “Complex heatmaps revealpatterns and correlations in multidimensional genomic data,”Bioinformatics, vol. 32, no. 18, pp. 2847–2849, 2016.
[19] P. Shannon, A. Markiel, O. Ozier et al., “Cytoscape: a softwareenvironment for integrated models of biomolecular interac-tion networks,” Genome Research, vol. 13, no. 11, pp. 2498–2504, 2003.
[20] X. Robin, N. Turck, A. Hainard et al., “pROC: an open-sourcepackage for R and S+ to analyze and compare ROC curves,”BMC Bioinformatics, vol. 12, no. 1, 2011.
[21] W. Liang, X. Huang, andW. Chen, “The effects of baicalin andbaicalein on cerebral ischemia: a review,” Aging and Disease,vol. 8, no. 6, pp. 850–867, 2017.
[22] S.-J. Hong, S. K. Kim, D. H. Yun, J. Chon, and H. J. Park,“Association between microRNA-4669 polymorphism andischemic stroke in a Korean population,” Disease Markers,vol. 2019, 9 pages, 2019.
22 Disease Markers
[23] A. Drieu, D. Levard, D. Vivien, and M. Rubio, “Anti-inflam-matory treatments for stroke: from bench to bedside,” Thera-peutic Advances in Neurological Disorders, vol. 11,p. 175628641878985, 2018.
[24] A. Markus, S. Valerie, and K. Mira, “Promising biomarker can-didates for cardioembolic stroke etiology. A brief narrativereview and current opinion,” Frontiers in Neurology, vol. 12,2021.
[25] R. Jian, M. Yang, and F. Xu, “Lentiviral-mediated silencing ofmast cell-expressed membrane protein 1 promotes angiogene-sis of rats with cerebral ischemic stroke,” Journal of CellularBiochemistry, vol. 120, no. 10, pp. 16786–16797, 2019.
[26] K. Raman, M. J. O'Donnell, A. Czlonkowska et al., “PeripheralBloodMCEMP1Gene expression as a biomarker for strokeprognosis,” Stroke, vol. 47, no. 3, pp. 652–658, 2016.
[27] L.-L. Zhu, X.-Q. Zhao, C. Jiang et al., “C-Type Lectin ReceptorsDectin-3 and Dectin-2 Form a Heterodimeric Pattern- Recog-nition Receptor for Host Defense against Fungal Infection,”Immunity, vol. 39, no. 2, pp. 324–334, 2013.
[28] Y. Miyake, K. Toyonaga, D. Mori et al., “C-type lectin MCL isan FcR gamma-coupled receptor that mediates the adjuvanti-city of mycobacterial cord factor,” Immunity, vol. 38,pp. 1050–1062, 2013.
[29] L. M. Graham, V. Gupta, G. Schafer et al., “The C-type LectinReceptor CLECSF8 (CLEC4D) Is Expressed by Myeloid Cellsand Triggers Cellular Activation through Syk Kinase,” Journalof Biological Chemistry, vol. 287, no. 31, pp. 25964–25974,2012.
[30] N. Valtcheva, A. Primorac, G. Jurisic, M. Hollmen, andM. Detmar, “The Orphan Adhesion G Protein-coupled Recep-tor GPR97 Regulates Migration of Lymphatic EndothelialCells via the Small GTPases RhoA and Cdc42,” Journal of Bio-logical Chemistry, vol. 288, no. 50, pp. 35736–35748, 2013.
[31] J. Wang, X. Wang, X. Chen et al., “Gpr97/Adgrg3 amelioratesexperimental autoimmune encephalomyelitis by regulatingcytokine expression,” Acta Biochimica et Biophysica Sinica,vol. 50, no. 7, pp. 666–675, 2018.
[32] N. J. Armstrong, K. A. Mather, M. Sargurupremraj et al.,“Common genetic variation indicates separate causes for peri-ventricular and deep white matter hyperintensities,” Stroke,vol. 51, no. 7, pp. 2111–2121, 2020.
[33] R. D. Ye, S. L. Cavanagh, O. Quehenberger, E. R. Prossnitz, andC. G. Cochrane, “Isolation of a cDNA that encodes a novelgranulocyte N-formyl peptide receptor,” Biochemical and Bio-physical Research Communications, vol. 184, no. 2, pp. 582–589, 1992.
[34] S. A. Vital, E. Y. Senchenkova, J. Ansari, and F. N. E. Gavins,“Targeting AnxA1/formyl peptide receptor 2 pathway affordsprotection against pathological thrombo-inflammation,” Cell,vol. 9, no. 11, p. 2473, 2020.
[35] F. N. E. Gavins, “Are formyl peptide receptors novel targets fortherapeutic intervention in ischaemia-reperfusion injury?,”Trends in Pharmacological Sciences, vol. 31, no. 6, pp. 266–276, 2010.
23Disease Markers