

Research Article
Feature Optimization of Exhaled Breath Signals Based on Pearson-BPSO

Hindawi, Mobile Information Systems, Volume 2021, Article ID 1478384, 9 pages, https://doi.org/10.1155/2021/1478384

Lijun Hao,1,2 Min Zhang,1 and Gang Huang3

1School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
2Medical Instrumentation College, Shanghai University of Medicine & Health Sciences, Shanghai 201318, China
3Shanghai Key Laboratory of Molecular Imaging, Shanghai University of Medicine & Health Sciences, Shanghai 201318, China

Correspondence should be addressed to Gang Huang; huangg@sumhs.edu.cn

Received 24 September 2021; Revised 3 November 2021; Accepted 10 November 2021; Published 3 December 2021

Academic Editor: Fazlullah Khan

Copyright © 2021 Lijun Hao et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Feature optimization, the theme of this paper, is the selection of input variables when building a predictive model. An improved feature optimization algorithm for breath signals based on Pearson-BPSO was proposed and applied to distinguish hepatocellular carcinoma by electronic nose (eNose). First, the multidimensional features of the breath curves of hepatocellular carcinoma patients and healthy controls in the training samples were extracted; then, the features with less relevance to the classification were removed according to the Pearson correlation coefficient; next, the fitness function was constructed based on K-Nearest Neighbor (KNN) classification error and feature dimension, and the feature optimization transformation matrix was obtained based on BPSO. Furthermore, the transformation matrix was applied to optimize the test samples' features. Finally, the performance of the optimization algorithm was evaluated by the classifier. The experimental results showed that the Pearson-BPSO algorithm could effectively improve the classification performance compared with the BPSO and PCA optimization methods. The accuracy of the SVM and RF classifiers was 86.03% and 90%, respectively, and the sensitivity and specificity were about 90% and 80%. Consequently, the application of the Pearson-BPSO feature optimization algorithm will help improve the accuracy of hepatocellular carcinoma detection by eNose and promote the clinical application of intelligent detection.

1. Introduction

Hepatocellular carcinoma is a malignant tumor with a high incidence rate and a high mortality rate, which seriously endangers human life and quality of life. According to statistics, in 2018, deaths from hepatocellular carcinoma accounted for 8.2% of the total number of cancer deaths in the world [1]. More than half of the world's hepatocellular carcinoma cases and deaths occur in China [2]. Studies have found that the early symptoms of hepatocellular carcinoma are particularly inconspicuous. Many patients with hepatocellular carcinoma have already entered the middle or late stages of the disease when they are diagnosed. Therefore, to reduce mortality, it is urgent to improve the early diagnosis and screening of hepatocellular carcinoma.

Disease can lead to metabolic changes, which can alter the composition of exhaled gas. One study showed significant differences in 6 volatile organic compounds (VOCs) in the exhaled gases of hepatocellular carcinoma patients compared with healthy controls (P < 0.05) [3]. The expiratory detection technology that has emerged in recent years can analyze the exhaled gas of the human body to identify disease, and it can be widely used in early clinical examination [4]. As an expiratory detection device, the eNose records the exhaled gas through the response of its internal sensors to different exhaled gases. It is worth noting that the device does not detect the specific gas composition; it only records the overall response curve of the gas. Researchers need to build models from a large number of gas response curves of different volunteers and find the potential relationship between disease and expiratory response in order to diagnose disease based on eNose.

In the process of constructing a disease classification model using exhalation signals, feature extraction is the first step. In order not to lose information that may affect the accuracy of the test, we usually extract as many features as possible [5]. However, this may cause feature redundancy, increased computation, and decreased computation speed and accuracy. Therefore, the study of feature optimization algorithms has received increasing attention. At present, principal component analysis (PCA) is widely used in feature optimization. It is a linear dimension reduction algorithm that transforms high-dimensional data into low-dimensional data by matrix compression. The algorithm is fast and of low complexity, but it does not perform well for dimensionality reduction of complex nonlinear data [6]. The binary particle swarm optimization (BPSO) algorithm, proposed by Kennedy and Eberhart in 1997, can be used to solve this problem. In recent years, the algorithm has been continuously improved and applied to feature selection by researchers, for example, BPSO based on GA, BPSO based on average fitness, and BPSO combined with the bacterial algorithm and the immune algorithm [7–9]. In these studies, the improvement of BPSO mostly focuses on the design of the fitness function in the particle swarm optimization algorithm without considering the characteristics of feature selection in the actual classification problem. Therefore, the maximum performance of the algorithm cannot be exploited. Some researchers propose using the SVM-RFE method to select part of the initial population of the particle swarm optimization algorithm to reduce the search space of the particles. The algorithm can effectively improve the accuracy of classification and recognition [10]. However, the dimension selected by the SVM-RFE algorithm is different each time, which leads to randomness in the final optimization result. Moreover, due to the uncertainty of the feature optimization factor, the optimization algorithm is not suitable for generalization to feature optimization of new samples.

An improved feature optimization algorithm, Pearson-BPSO, based on the traditional BPSO algorithm and the Pearson correlation coefficient, is proposed in this paper. The purpose of this study is to test the feasibility of the new algorithm. We applied three different algorithms, PCA, BPSO, and Pearson-BPSO, to optimize features. Then, we evaluated the three optimization algorithms by the performance of two different classification models based on support vector machine (SVM) and random forest (RF), respectively. The comparison of the results showed that the new feature optimization algorithm helped improve the accuracy of the classifier. Section 2 describes the materials and methods, Section 3 presents the results, Section 4 provides the discussion, and Section 5 concludes the paper.

2. Materials and Methods

2.1. Signal Acquisition. The eNose device, also known as an artificial olfactory system, can simulate the biological olfactory system through the combination of gas sensors and pattern recognition technology. Its basic principle is to use gas sensors to simulate the olfactory sensory nerve cells of a biological olfactory system and to use a computer or special chips to process the collected information in order to identify a gas or odor [11].

In the project, the response curves of the sensors to exhaled gas were collected by an eNose device named ILD3000, which was designed by the UST Sensors GmbH Company of Germany. As shown in Figure 1, three different gas sensors, RS1, RS2, and RS3, are the core of the hardware system [12]. Different people exhale gases of different composition, and thus the sensor response curves are also distinct. The gas sensors are the reactive part of the measuring system, where each layer of the sensor possesses different sensitivities and selectivity for a variety of gases at varying temperatures. The three gas sensors in the device are the GGS1000 series sensor, which is sensitive to combustible gases; the GGS3000 series sensor, which can detect hydrocarbons, especially C1, C2, ..., C8; and the GGS7000 series sensor, which can detect NO2 [13]. The controllable temperature sensor Rt is used to provide a suitable temperature environment to improve the response of the sensors to the gas.

As shown in Table 1, during the study, the expiratory data of 121 volunteers were collected in Renji Hospital, including 69 patients with hepatocellular carcinoma and 52 healthy controls. All expiratory data were provided voluntarily. During collection, a disposable exhalation nozzle was used, and the procedure was completed in vitro without any interventional device and with no harm to the human body. The inclusion criteria were that patients must have primary liver cancer, no other metastatic cancer, no respiratory diseases, and no history of smoking or drinking in the past three months. In addition, the collection was carried out in the fasting state.

As shown in Figure 2, in the test, we could simultaneously obtain three response curves of exhaled gas (indicated by three different colors: yellow, grey, and orange), corresponding to the three different sensors, and a temperature curve from the eNose device. The temperature varies from 280°C to 420°C, and each response curve represents the response resistance of the sensor to different gases. Compared with the temperature, the resistance values vary widely.

2.2. Signal Preprocessing. As shown in Figure 2, the values and amplitudes of the curves for the three sensors collected in a single test varied greatly. To facilitate comparison, normalization was first applied to reduce the magnitude without changing the waveform. Each curve was transformed by formula (1) into relative values within the range [0, 1] to simplify the subsequent analysis:

$$
y_k(i, j) = \frac{y_k(i, j) - \min(y_k(i))}{\max(y_k(i)) - \min(y_k(i))}
\tag{1}
$$

where $y_k(i)$ represents the i-th sample curve collected by a certain sensor, and the sample length is 60; $k$ in $y_k(i)$ can be A, B, or C, representing the three sensors, respectively; $i$ in $y_k(i)$ represents the i-th sample; and $y_k(i, j)$ is the j-th point of the i-th sample curve. The value of $j$ ranges from 0 to 59, and $\min(y_k(i))$ and $\max(y_k(i))$ represent the minimum and maximum of the signal, respectively.
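The normalization of formula (1) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `min_max_normalize`, the sample readings, and the handling of a completely flat curve (which the paper does not discuss) are all assumptions.

```python
def min_max_normalize(curve):
    """Scale one sensor response curve to [0, 1] as in formula (1)."""
    lo, hi = min(curve), max(curve)
    if hi == lo:
        # Assumption: a flat curve has no range, so map it to all zeros.
        return [0.0 for _ in curve]
    return [(value - lo) / (hi - lo) for value in curve]


# Hypothetical resistance readings (ohms); a real curve has 60 points.
curve_a = [120000.0, 180000.0, 250000.0, 150000.0]
normalized = min_max_normalize(curve_a)
```

The waveform shape is preserved because every point is shifted and scaled by the same two constants taken from that curve alone.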

2.3. Feature Extraction. After the signal curves were normalized, as many features as possible were extracted for each curve. In the study, we extracted time-domain features, frequency-domain features, and statistical features for each curve, as well as correlation features between the three curves obtained by the different sensors. The 15 time-domain features were maximum value and corresponding position, minimum value and corresponding position, mean, peak-to-peak, rectified mean, variance, standard deviation, waveform factor, pulse factor, peak factor, margin factor, and area. The 14 frequency-domain features included center of gravity frequency, frequency variance, root mean square, difference spectrum, and power spectrum calculated by various methods. The 10 statistical features were extreme deviation (range), median, quantile, plurality (mode), coefficient of variation, skewness, kurtosis, autocorrelation coefficient, and information entropy. In addition, the pairwise correlations between the three sensor signals were calculated, giving three features. Finally, for one breath test of each volunteer, we combined all features of the three curves, yielding a 2082-dimensional feature vector.
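As an illustration of the time-domain part of this step, the sketch below computes a subset of the listed features for one normalized curve. The paper does not give the formulas, so the definitions used here (population variance, waveform factor as RMS over rectified mean, peak factor as peak over RMS, pulse factor as peak over rectified mean) are common conventions adopted as assumptions, and the function name `time_domain_features` is hypothetical.

```python
import math


def time_domain_features(curve):
    """Compute a few time-domain features of one curve (assumed definitions)."""
    n = len(curve)
    peak, trough = max(curve), min(curve)
    mean = sum(curve) / n
    rectified_mean = sum(abs(v) for v in curve) / n
    variance = sum((v - mean) ** 2 for v in curve) / n  # population variance
    rms = math.sqrt(sum(v * v for v in curve) / n)
    # Note: an all-zero curve would make the ratio features undefined.
    return {
        "max": peak,
        "argmax": curve.index(peak),
        "min": trough,
        "argmin": curve.index(trough),
        "mean": mean,
        "peak_to_peak": peak - trough,
        "rectified_mean": rectified_mean,
        "variance": variance,
        "std": math.sqrt(variance),
        "waveform_factor": rms / rectified_mean,
        "peak_factor": peak / rms,
        "pulse_factor": peak / rectified_mean,
    }


features = time_domain_features([0.0, 0.5, 1.0, 0.5])
```

Concatenating such dictionaries for the three sensor curves, together with the frequency-domain, statistical, and cross-sensor features, would give one feature vector per breath test.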

Figure 1: Construction of electronic nose (block diagram; labels: gas sensors RS1, RS2, RS3, and Rt; gas response data; temperature values; input amplifier; temperature control and collection; resistance values of three channels; data processing).

Table 1: The basic information of all volunteers.

                            Male   Female   Total number   Age (years)
Hepatocellular carcinoma    58     11       69             55.58 ± 10.63
Healthy control             35     17       52             51.60 ± 14.57
Total number                93     28       121            --

Figure 2: Signal curves collected by different sensors of the eNose device (horizontal axis: sampling point n, 1 to 59; vertical axis: resistance in Ω, 0 to 300000; curves: Sensor T, Sensor A, Sensor B, and Sensor C).


2.4. Feature Optimization. In a classification task, the initial analysis of the sample feature set aims to extract from the original data the features that are most significant for distinguishing different categories, while discarding those features that contribute little to the classification. Thus, feature optimization removes irrelevant factors and reduces their interference with the classification. Selecting the optimal feature set effectively reduces the dimensionality of the feature space. Therefore, feature optimization can reduce the computational effort and increase the computational speed.

As shown in Figure 3, the traditional BPSO feature optimization algorithm was improved so that it not only considers the classification accuracy and the number of features but also exploits the nature of the classification task by considering the correlation between features and categories. Before the BPSO algorithm was used for feature selection, the correlation between features and categories was calculated. According to the Pearson correlation coefficient, a fixed number of the most highly correlated features was selected. Note that the number of features selected can be set by experience. In the study, a relatively large number, 1000, was used. Then, with the optimization objectives of reducing the KNN classification error rate and the feature dimensionality, the fitness function was constructed. The optimal feature subset was then selected based on BPSO, and the feature optimization operator was determined. The specific flow of the algorithm is shown in Figure 3.

2.4.1. Initial Screening of Features Based on Pearson Correlation Coefficient. The Pearson correlation coefficient was proposed and developed by the British statistician Karl Pearson in the 1880s [14]. The coefficient measures the linear correlation between two variables X and Y, and its value is between -1 and 1.

In the study, the feature values of the samples were regarded as the input variable x, and the sample labels were regarded as the variable y. By calculating the correlation between the input features and the output labels, the Pearson correlation coefficient determines the degree of correlation between the label and each feature in a multidimensional feature set. Then, according to the Pearson correlation coefficient, the preliminary screening of features could be realized.

The Pearson correlation coefficient was obtained by the following formula:

$$
\rho_{XY} = \frac{E(XY) - E(X)E(Y)}{\sqrt{E(X^2) - E^2(X)}\,\sqrt{E(Y^2) - E^2(Y)}}
\tag{2}
$$

where E represents the mathematical expectation, and X and Y represent the input feature and the output label, respectively. The value of the correlation coefficient is between -1 and 1. When the value is close to 0, there is no linear correlation between them. When the value is close to 1, there is a significant positive correlation between the feature and the label. Similarly, when the value is close to -1, there is a significant negative correlation between the input variable and the label. That is, when the value of such an input feature rises, the class label tends to change.
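The screening described above can be sketched as follows: formula (2) is evaluated per feature column against the label vector, and the features with the largest absolute coefficient are kept. The helper names `pearson` and `top_k_by_pearson`, the toy data, and the choice to score a constant feature as 0 are assumptions for illustration; the study itself retains the top 1000 features.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences, following formula (2)."""
    n = len(x)
    ex, ey = sum(x) / n, sum(y) / n
    exy = sum(a * b for a, b in zip(x, y)) / n
    var_x = max(sum(a * a for a in x) / n - ex * ex, 0.0)  # guard rounding
    var_y = max(sum(b * b for b in y) / n - ey * ey, 0.0)
    denom = math.sqrt(var_x) * math.sqrt(var_y)
    # Assumption: a constant feature or label carries no information, score 0.
    return 0.0 if denom == 0 else (exy - ex * ey) / denom


def top_k_by_pearson(samples, labels, k):
    """Return indices of the k features with the largest |correlation| to labels."""
    scores = []
    for j in range(len(samples[0])):
        column = [row[j] for row in samples]
        scores.append((abs(pearson(column, labels)), j))
    scores.sort(key=lambda item: (-item[0], item[1]))
    return [j for _, j in scores[:k]]


# Toy data: 4 samples, 3 features; label 1 = patient, 0 = healthy control.
samples = [[0.9, 0.5, 0.1], [0.8, 0.5, 0.3], [0.1, 0.5, 0.7], [0.2, 0.5, 0.9]]
labels = [1, 1, 0, 0]
kept = top_k_by_pearson(samples, labels, k=2)
```

Sorting by the absolute value keeps strongly negative features as well as strongly positive ones, since both are informative for classification.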

2.4.2. Feature Optimization Based on Recognition Error and Feature Dimension. For an M × N dataset, each row represents a sample: the M rows represent M samples, and the N columns represent the N features of a sample. Feature optimization essentially finds the smallest possible subset of these N features that still ensures a high rate of correct classification. This subset of features can be regarded as the optimized features.

By calculating the optimized conversion factor, further sample features can be optimized. The main steps [15, 16] are as follows.

Step 1. Set the features after initial screening as particles, the feature dimension as the dimension of the particles, and the initial number of particles to 300. The positions of the particles and the individual optimal positions are randomly initialized using binary encoding.

Step 2. The fitness function for feature selection is constructed based on the classification error rate and the optimized feature dimension, as shown in the following formula:

$$
\mathrm{fitness}(i) = k_1 \times \mathrm{error}(i) + k_2 \times \frac{\mathrm{Weidu}(i)}{D}
\tag{3}
$$

where fitness(i) is the fitness obtained based on particle i, error(i) is the error rate of classifier recognition after feature selection based on the particle, D is the original feature dimension, Weidu(i) is the feature dimension selected based on the particle, and $k_1$ and $k_2$ are the weights of the classifier recognition error rate and the feature dimension optimization, which can be taken as 0.8 and 0.2, respectively.

Step 3. The fitness value of each particle is calculated according to Step 2, and the individual and global dynamic factors and the inertia weight are updated as in formulae (4) to (6):

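$$
c_1 = \mathrm{rand} \times \left(2.4 - 1.4 \times \frac{\mathrm{iter}}{\mathrm{TNum}}\right)
\tag{4}
$$

$$
c_2 = \mathrm{rand} \times \left(0.9 + 1.6 \times \frac{\mathrm{iter}}{\mathrm{TNum}}\right)
\tag{5}
$$

$$
w = w_{\max} - (w_{\max} - w_{\min}) \times \frac{\mathrm{iter}}{\mathrm{TNum}}
\tag{6}
$$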

Here, $c_1$ and $c_2$ are the dynamic factors of individual adjustment and global adjustment, $w$ is the inertia weight, rand is a random number in [0, 1], iter is the current iteration number, TNum is the preset number of iterations, and $w_{\max}$ and $w_{\min}$ are the maximum and minimum inertia weights, respectively.


According to formulae (4) to (6), the iterative update of the velocity can be calculated by the following formula:

$$
v(i + 1) = w \times v(i) + c_1 \times (p(i) - x(i)) + c_2 \times (g - x(i))
\tag{7}
$$

where v(i + 1) is the updated velocity value, w is the dynamic inertia weight, v(i) is the previous velocity value, x(i) is the current position, p(i) is the individual optimal position, and g is the global optimal position.

Step 4. Multiple iterations are performed, and the updated particle positions are binarized to 0 or 1 according to the velocity using formulae (8) and (9):

$$
\mathrm{sig}(v_i) = \frac{1}{1 + e^{-v_i}}
\tag{8}
$$

$$
x_{ij}(\mathrm{iter} + 1) =
\begin{cases}
0, & \text{if } \mathrm{rand} \ge \mathrm{sig}(v_{ij}(\mathrm{iter} + 1)), \\
1, & \text{otherwise.}
\end{cases}
\tag{9}
$$

In formula (8), a sigmoid function maps the velocity to the interval [0, 1], and the result is the probability that the particle position will take the value 1 in the next iteration.

In formula (9), $x_{ij}(\mathrm{iter} + 1)$ is the resulting binary position, which is set to 1 with that probability and to 0 otherwise.

Step 5. Determine whether the maximum number of iterations has been reached. If the number of iterations has reached TNum, the optimized feature subset is obtained from the best position in the population history, and this position record is used as the feature optimization conversion operator; otherwise, return to Step 3.
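Steps 1 to 5 can be put together in a compact sketch. This is an illustrative reading of formulae (3) to (9) under several assumptions, not the authors' implementation: the classifier error is supplied by a caller-provided function `error_fn` (a toy stand-in replaces the KNN error here), D is taken as the particle length, the inertia bounds `w_max=0.9` and `w_min=0.4` are placeholder values the paper does not state, `c1` and `c2` are drawn once per iteration, and the global best is updated as soon as a better particle is found.

```python
import math
import random


def sigmoid(v):
    """Numerically stable logistic function, formula (8)."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def fitness(mask, error_fn, k1=0.8, k2=0.2):
    """Formula (3): weighted error rate plus fraction of selected features."""
    return k1 * error_fn(mask) + k2 * sum(mask) / len(mask)


def bpso(error_fn, n_features, n_particles=300, t_num=100,
         w_max=0.9, w_min=0.4, seed=0):
    rng = random.Random(seed)
    # Step 1: random binary positions, zero initial velocities.
    xs = [[rng.randint(0, 1) for _ in range(n_features)]
          for _ in range(n_particles)]
    vs = [[0.0] * n_features for _ in range(n_particles)]
    p_best = [x[:] for x in xs]
    p_fit = [fitness(x, error_fn) for x in xs]          # Step 2
    best = min(range(n_particles), key=p_fit.__getitem__)
    g_best, g_fit = p_best[best][:], p_fit[best]
    for it in range(t_num):                              # Step 5 loop
        progress = it / t_num
        c1 = rng.random() * (2.4 - 1.4 * progress)       # formula (4)
        c2 = rng.random() * (0.9 + 1.6 * progress)       # formula (5)
        w = w_max - (w_max - w_min) * progress           # formula (6)
        for i in range(n_particles):
            for j in range(n_features):
                vs[i][j] = (w * vs[i][j]                 # formula (7)
                            + c1 * (p_best[i][j] - xs[i][j])
                            + c2 * (g_best[j] - xs[i][j]))
                # Formula (9): bit is 1 with probability sig(v).
                xs[i][j] = 0 if rng.random() >= sigmoid(vs[i][j]) else 1
            f = fitness(xs[i], error_fn)
            if f < p_fit[i]:
                p_best[i], p_fit[i] = xs[i][:], f
                if f < g_fit:
                    g_best, g_fit = xs[i][:], f
    return g_best, g_fit


USEFUL = {0, 3}  # hypothetical: only these two features matter


def toy_error(mask):
    """Toy stand-in for the KNN error: share of useful features left out."""
    return sum(1 for j in USEFUL if mask[j] == 0) / len(USEFUL)


best_mask, best_fit = bpso(toy_error, n_features=8, n_particles=20, t_num=30)
```

The returned `best_mask` plays the role of the feature optimization conversion operator: a 1 in position j means that feature j is kept. Lower fitness is better, so a mask is rewarded both for a low error and for selecting few features.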

3. Results

After feature optimization, it is necessary to evaluate the effect of the optimization quantitatively. In the paper, we evaluate the effect of feature optimization on the performance of the classifier.

First, we obtained the feature optimization operator from the training exhalation sample features and applied it to optimize the features of the test samples. The specific steps are as follows.

First, the two types of collected samples, totaling 121 breath signals from healthy controls and hepatocellular carcinoma patients, were divided into a training set and a test set after multidimensional feature extraction. Then, the training set was used to determine the feature optimization operator. The specific method was as follows: tag value 1 (representing hepatocellular carcinoma patients) and tag value 0 (representing healthy controls) were used to construct a tag array. Taking this array as the dependent variable y and the high-dimensional sample feature array as the variable x, the relationships between sample features and categories were calculated by Pearson correlation analysis. The sample features were then sorted by the absolute values of their Pearson correlation coefficients, and the top 1000 features were retained. Furthermore, the fitness function was constructed from the KNN classification error rate and the feature dimension, and the optimal subset was obtained based on BPSO. In this way, the feature optimization conversion factor was obtained. Afterwards, feature optimization was performed on the test set using the feature optimization operator derived in the above steps.

Figure 3: Flow diagram of the improved Pearson-BPSO feature optimization algorithm (valid signal extraction; signal normalization; high-dimensional feature extraction; Pearson stage: initial screening of features based on Pearson correlation coefficients and determination of the screening factor P; BPSO stage: construction of the fitness function and update of individual and global optimal values and positions while the number of iterations is below 100; determination of the feature optimization conversion factors).
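Applying the operator learned on the training set to unseen samples amounts to two index selections, which the following sketch illustrates. The function name `apply_operator` and the toy values are hypothetical; `kept_indices` stands for the feature indices retained by the Pearson screening and `mask` for the binary BPSO position.

```python
def apply_operator(samples, kept_indices, mask):
    """Keep only the screened features whose BPSO bit is 1."""
    selected = [idx for idx, bit in zip(kept_indices, mask) if bit == 1]
    return [[row[idx] for idx in selected] for row in samples]


# Toy test samples with 4 raw features each.
test_samples = [[10, 11, 12, 13], [20, 21, 22, 23]]
kept_indices = [3, 0, 2]   # order produced by the Pearson screening
mask = [1, 0, 1]           # BPSO keeps the 1st and 3rd screened features
optimized = apply_operator(test_samples, kept_indices, mask)
```

Because both the index list and the mask are computed from the training samples only, the test samples never influence which features are selected.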

Once the feature optimization was completed, the next step was to build the classifier.

Two different classifiers were constructed to obtain a more reliable evaluation. One was built on the support vector machine (SVM classifier), and the other was built on the random forest method (RF classifier).

Here, we applied the two classifiers to classify the optimized features produced by three optimization methods. By comparing the performance of the classifiers, we found that Pearson-BPSO is more effective for classification than the other two traditional feature optimization methods, PCA and BPSO.

3.1. Performance Comparison of Pearson-BPSO and BPSO. To compare the feature optimization effect of the improved Pearson-BPSO and the traditional BPSO, the search for the optimal subset of features and the determination of the feature transformation factor were performed with each of the two algorithms.

Figure 4 shows the fitness curves of the two BPSO algorithms over 100 iterations. The horizontal coordinate represents the number of iterations, with a maximum of 100; the vertical coordinate is the fitness value, and a smaller value means better optimization performance. Figure 4(a) shows the fitness curve of the improved Pearson-BPSO algorithm: the feature dimension could be reduced to 251, and the fitness value could fall below 0.045. Figure 4(b) shows the fitness curve of the traditional BPSO algorithm, with an optimized feature dimension of 712 and an optimal fitness value of about 0.08.

3.2. Classification Performance. The performance of the feature optimization algorithm can be reflected by the performance of the classifier. Here, we calculated the performance of the SVM classifier and the RF classifier to compare the optimization algorithms.

After feature extraction from the original samples, the positive and negative samples were each divided into 10 folds and combined into 10 groups of sample data. One fold of the data (about 7 hepatocellular carcinoma cases and 5 controls) was taken as the test sample each time, and the rest of the samples were taken as the training samples. Then, the feature transformation factors calculated by the improved feature optimization algorithm Pearson-BPSO, the traditional BPSO, and the PCA optimization algorithm were used to optimize and reduce the feature dimension of the training and test samples, yielding different optimized feature datasets. Furthermore, classifiers were constructed based on SVM and RF, respectively, and the classification performance was calculated each time. The process was repeated ten times, with a different fold taken as the test sample in turn. Finally, the average of each performance measure was taken as the performance metric of the two classifiers under the three feature optimization algorithms, as shown in Tables 2 and 3.

From Table 2, we found that the best accuracy was 86.03% and the best sensitivity was 90.79% when the Pearson-BPSO feature optimization was applied. From Table 3, we found that the best accuracy was 90% and the best sensitivity was 94.83% when the Pearson-BPSO feature optimization was applied.

The performance indicators in Tables 2 and 3 are as follows: Acc measures the accuracy of the classifier in correctly classifying samples; Sens represents the sensitivity of the classifier in recognizing hepatocellular carcinoma samples; Spec is the specificity of the classifier in recognizing normal samples; and F-score represents the comprehensive performance of the classifier, where a higher F-score indicates better performance.
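For reference, the four indicators can be computed from the predictions of one test fold as sketched below. The paper does not state how its F-score is defined, so treating it as the F1 score (harmonic mean of precision and sensitivity) is an assumption, as are the function name `classification_metrics` and the example predictions.

```python
def classification_metrics(y_true, y_pred):
    """Return Acc, Sens, Spec, and F1 for binary labels (1 = patient)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Assumption: F-score is taken to be F1.
    f1 = (2 * precision * sens / (precision + sens)
          if precision + sens else 0.0)
    return {"acc": acc, "sens": sens, "spec": spec, "f_score": f1}


# Hypothetical fold: 7 patients and 5 controls, two misclassified.
y_true = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

Averaging each entry of `metrics` over the ten folds would give one row of the kind reported in Tables 2 and 3.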

4. Discussion

According to the mechanism of breath testing, pathological changes alter the metabolism of hepatocellular carcinoma patients, and the composition of their exhaled gas changes accordingly. Therefore, classifying and recognizing the exhalation data of patients with hepatocellular carcinoma and healthy people is the most important task in the intelligent detection of hepatocellular carcinoma. In the study, we constructed a binary (dichotomous) classification model to distinguish the breath signals of hepatocellular carcinoma patients from those of healthy individuals.

The findings were consistent with other studies showing that volatile breath biomarkers can discriminate persons with malignant solid tumors from noncancer control subjects [17]. However, there is no clear conclusion on the types of volatile marker gases for hepatocellular carcinoma. The present study relies on the fact that the collection device can respond to a large number of volatile exhaled gases, possibly including gases specific to hepatocellular carcinoma [18]. We do not need to know the specific type of gas; we only need to record the overall response to the exhaled gases containing some specific gases. Here, we attempted to construct a binary classifier using the different characteristics of the integrated response curves of exhaled gas in healthy individuals and hepatocellular carcinoma patients. However, to improve the validity of the exhaled gas response detection, we need to use gas chromatography-mass spectrometry (GC-MS) to further compare and analyze exhaled gas from patients and healthy individuals and to determine the specific gas species. Then, a highly sensitive eNose designed specifically to detect certain diseases can be developed.

In addition, the devices used in the study have not yet been applied in the clinic, and there are no clear international standards for the manner and criteria of exhaled gas collection. The amount and sources of clinical data used in the data analysis are relatively limited. Some advanced intelligent algorithms [19], such as deep learning, which rely on big data, cannot be used. Hence, the establishment of a breath database is an essential step to advance the clinical application of eNose research. This depends on the establishment of unified international standards for breath collection. The collection criteria include the type of gas collected and the collection method; the patient's age, gender, fasting status, and even race, as well as other more comprehensive information, also need to be collected.

The study is still in the exploratory stage; the amount of data collected is limited, and the results of the clinical analysis may be one-sided. Because of the limited sample size, traditional machine learning was used for classification, i.e., signal preprocessing, feature extraction, and classification model construction, to distinguish hepatocellular carcinoma patients from healthy individuals. Specifically, because the human exhalation signal collected by the eNose device varies greatly between individuals and within an individual at different moments, the data of hepatocellular carcinoma patients cannot be distinguished visually and effectively from the data of healthy individuals, so the signal is first subjected to feature extraction. Extracting the features of the signal curves helps discover more potential information. However, high-dimensional features may degrade classification accuracy and slow computation, so feature optimization has become a hot research topic.

One way to measure whether a feature optimization method is more effective is to feed the same features, optimized by different algorithms, into the same classifier and test the classification performance. In the study, tenfold cross-validation was used in the data analysis: because the performance of the constructed classifiers varied when different training samples were used, the average performance was taken as the final measure. In addition, because samples are divided at random, an imbalance between positive and negative samples in the training and testing sets may occur. To keep the class proportions consistent, a stratified division method was used. That is, the positive and negative samples were each divided into ten folds separately, and the divided data were then combined into training and testing samples.

In addition, to evaluate the generalization ability of the optimization algorithm, the KNN classifier used inside the algorithm to compute the fitness was avoided when selecting the evaluation classifiers, and the SVM and RF classifiers were chosen instead. From the tables, we found that the classifiers applying the improved optimization algorithm

Figure 4: Adaptation curves (x-axis: number of iterations, 0 to 100; y-axis: fitness value). (a) Pearson-BPSO. (b) BPSO.

Table 2: Performance of SVM classifier based on three different feature optimization algorithms.

Algorithms      Acc (%)   Sens (%)   Spec (%)   F-score
Pearson-BPSO    86.03     90.79      78.64      0.878
BPSO            77.69     87.98      65.38      0.811
PCA             75.19     86.63      73.56      0.795

Table 3: Performance of RF classifier based on three different feature optimization algorithms.

Algorithms      Acc (%)   Sens (%)   Spec (%)   F-score
Pearson-BPSO    90        94.83      85.17      0.841
BPSO            84.42     87.80      81.40      0.800
PCA             82.49     90.90      76.33      0.832


Pearson-BPSO all outperformed the other optimization algorithms.

Although there is still much work to be done in this study, we can draw the following conclusions from the experimental results. First, it is meaningful and feasible for the eNose device to distinguish hepatocellular carcinoma patients from healthy controls. However, there are still many difficulties to overcome in the clinical application of eNose. Second, the improved feature optimization algorithm does improve the detection performance to some extent, as shown by the results of the two different classifiers averaged over the ten folds. From Tables 2 and 3, we can see that both classifiers applying the improved optimization algorithm Pearson-BPSO outperformed those using the other optimization algorithms.

5. Conclusion

Different gases have different response curves, which lead to different sensor measurement signals and different exhalation signals. After the human exhalation signal is collected through an electronic nose, it is difficult to distinguish the data of hepatocellular carcinoma patients from the data of healthy individuals intuitively and effectively because of the large differences between individuals and at different times. Extracting waveform features helps uncover more potential information. However, high-dimensional features may reduce classification accuracy and slow calculation, so feature optimization has become a research hotspot.

In the paper, an improved feature optimization algorithm, Pearson-BPSO, is proposed based on binary particle swarm optimization (BPSO) for the "two-classification" task of distinguishing hepatocellular carcinoma patients from healthy people by breath. Based on the Pearson coefficient of the relationship between each quantifiable feature and the label, the algorithm first screens the features and then optimizes the feature set to minimize the KNN classification error rate and the feature dimension, which improves the classification accuracy and reduces the amount of data. Compared with the traditional BPSO algorithm and the PCA algorithm, this algorithm improves the classification performance to a certain extent and is conducive to improving the classification accuracy and detection speed of electronic nose detection.

6. Future Work

In the next step, we can further analyze the correlation between the features themselves and then combine it effectively with this method to search for the optimal subset in a more directed way and improve the accuracy of the classifier. We will also apply more advanced algorithms and continuously refine the improved feature optimization method Pearson-BPSO to achieve more stable and better classification results [20].

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was financially supported by the Key Program of National Natural Science Foundation of China (Grant no. 81830052) and Shanghai Municipal Education Commission (Class II Plateau Disciplinary Construction Program of Medical Technology of SUMHS, 2018–2020). The data collection work was completed by the team from Shanghai Jiao Tong University and Shanghai University of Technology. The electronic nose department was supported by the German UST Company.

References

[1] F. Bray, J. Ferlay, I. Soerjomataram, R. L. Siegel, L. A. Torre, and A. Jemal, "Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries," CA: A Cancer Journal for Clinicians, vol. 68, no. 6, pp. 394–424, 2018.

[2] L. James and Global Burden of Disease Cancer Collaboration, "Global, regional, and national cancer incidence, mortality, years of life lost, years lived with disability, and disability-adjusted life-years for 29 cancer groups, 1990 to a systematic analysis for the global burden of disease study," Journal JAMA Oncology, vol. 4, no. 11, pp. 1553–1568, 2018.

[3] A. S. Bannaga, H. Tyagi, E. Daulton, J. A. Covington, and R. P. Arasaradnam, "Exploratory study using urinary volatile organic compounds for the detection of hepatocellular carcinoma," Molecules, vol. 26, no. 9, p. 2447, 2021.

[4] M.-A. Galen, A.-M. Lou-Anne, G. David et al., "Breath metabolomics provides an accurate and noninvasive approach for screening cirrhosis, primary, and secondary liver tumor," Journal of Hepatology Communications, vol. 4, no. 7, pp. 1041–1055, 2020.

[5] X. Yu, D. Zhan, L. Liu, H. Lv, L. Xu, and J. Du, "A privacy-preserving cross-domain healthcare wearables recommendation algorithm based on domain-dependent and domain-independent feature fusion," IEEE Journal of Biomedical and Health Informatics, vol. 9, pp. 89344–89359, 2021.

[6] C. D. Tong and X. H. Shi, "Mutual information based on PCA algorithm with application in process monitoring," Journal of CIESC, vol. 66, no. 10, pp. 4101–4106, 2015.

[7] Y. J. Lu and D. Y. Li, "Feature selection algorithm based on label correlation," Pattern Recognition and Artificial Intelligence, vol. 33, no. 8, pp. 716–722, 2020.

[8] B. Liu, M. D. Cai, and Y. C. Bo, "A feature extraction and classification algorithm based on PSO-CSP-SVM for motor imagery EEG signals," Journal of Central South University (Science and Technology), vol. 51, no. 10, pp. 2855–2866, 2020.

[9] L. Ma, C. W. Lu, Q. H. Gu, and S. L. Ruan, "Particle swarm optimization with search operator of improved pigeon-inspired algorithm," Journal of Pattern Recognition and Artificial Intelligence, vol. 31, no. 10, pp. 909–920, 2018.

[10] J. Lin, L. Xu, and L. Liu, "Feature selection method based on SVM-RFE and particle swarm optimization," Journal of Chinese Computer Systems, vol. 36, no. 8, pp. 1865–1868, 2015.

[11] C. G. Waltman, T. A. T. Marcelissen, and J. G. H. van Roermund, "Exhaled-breath testing for prostate cancer based on volatile organic compound profiling using an electronic nose device (Aeonose): a preliminary report," European Urology Focus, vol. 6, no. 6, pp. 1220–1225, 2020.

[12] L. J. Hao and G. Huang, "Research on non-invasive detection method of liver cancer by respiratory based on electronic nose," Journal of Transducer and Microsystem Technologies, vol. 39, no. 4, pp. 46–48, 2020.

[13] V. Andreas, W. Katharina, K. Tobias et al., "Detecting cannabis use on the human skin surface via an electronic nose system," Journal of Sensors, vol. 14, pp. 13256–13272, 2014.

[14] J. Y. Xie, Z. Z. Wu, and Q. Q. Zheng, "An adaptive 2D feature selection algorithm based on information gain and Pearson correlation coefficient," Journal of Shaanxi Normal University (Natural Science Edition), vol. 48, no. 6, pp. 69–81, 2020.

[15] Z. B. Zhu, J. F. Tao, and H. L. Ge, "Passive sonar target classification and recognition technique based on BPSO-KNN algorithm," Journal of Technical Acoustics, vol. 38, no. 2, pp. 219–223, 2019.

[16] J. D. Han, L. Sun, and S. L. Wang, "Android malware application detection method based on BPSO-NB," Journal of Computers and Modernization, vol. 4, pp. 109–113, 2017.

[17] B. Swanson, L. Fogg, W. Julion, and M. T. Arrieta, "Electronic nose analysis of exhaled breath volatiles to identify lung cancer cases: a systematic review," Journal of the Association of Nurses in AIDS Care, vol. 31, no. 1, pp. 71–79, 2020.

[18] G. Danila, C. Sare, D. A. Mario, R. Veronica, S. Antonio, and B. Maurizia, "An E-nose for the monitoring of severe liver impairment: a preliminary study," Journal of Sensor, vol. 19, no. 17, Article ID 3656, 2019.

[19] M. Yu, T. Quan, Q. Peng, X. Yu, and L. Liu, "A model-based collaborate filtering algorithm based on stacked auto encoder," Neural Computing & Applications, vol. 290, 2021.

[20] W. Hamdy, I. Elansary, A. Darwish, and A. E. Hassanien, "An optimized classification model for COVID-19 pandemic based on convolutional neural networks and particle swarm optimization algorithm," Journal of Digital Transformation and Emerging, vol. 3, 2021.


response to realize the purpose of disease diagnosis based on eNose.

In the process of constructing a disease classification model using exhalation signals, feature extraction is the first step. In order not to lose information that may affect the accuracy of the test, we usually extract as many features as possible [5]. However, this may cause feature redundancy, increased computation, and decreased computation speed and accuracy. Therefore, the study of feature optimization algorithms has received increasing attention. At present, principal component analysis (PCA) is widely used in feature optimization. PCA is a linear dimension reduction algorithm that transforms high-dimensional data into low-dimensional data by matrix compression. It is fast and has low complexity, but it does not perform well for dimensionality reduction of complex nonlinear data [6]. The binary particle swarm optimization (BPSO) algorithm, proposed by Kennedy and Eberhart in 1997, can be used to address this problem. In recent years, the algorithm has been continuously improved and applied to feature selection by researchers, for example, BPSO based on GA, BPSO based on average fitness, and BPSO combined with the bacterial algorithm and the immune algorithm [7–9]. In these studies, the improvement of BPSO mostly focuses on the design of the fitness function in the particle swarm optimization algorithm, without considering the characteristics of feature selection in the actual classification problem. Therefore, the full performance of the algorithm cannot be exploited. Some researchers propose using the SVM-RFE method to select part of the initial population of the particle swarm optimization algorithm to reduce the search space of the particles. This approach can effectively improve the accuracy of classification and recognition [10]. However, the dimension selected by the SVM-RFE algorithm differs each time, which leads to randomness in the final optimization result.

Moreover, due to the uncertainty of the feature optimization factor, the optimization algorithm is not suitable for generalization to the feature optimization of new samples.

An improved feature optimization algorithm, Pearson-BPSO, based on the traditional BPSO algorithm and the Pearson correlation coefficient, is proposed in the paper. The purpose of this study is to test the feasibility of the new algorithm. We applied three different algorithms, PCA, BPSO, and Pearson-BPSO, to optimize the features. Then, we evaluated the three optimization algorithms by the performance of two different classification models based on support vector machine (SVM) and random forest (RF), respectively. The comparison of the results showed that the new feature optimization algorithm helped improve the accuracy of the classifier. The rest of the paper is organized as follows: Section 2 describes the materials and methods, Section 3 presents the results, Section 4 gives the discussion, and Section 5 concludes the paper.

2. Materials and Methods

2.1. Signal Acquisition. The eNose device, also known as an artificial olfactory system, can simulate biological olfactory systems through the combination of gas sensors and pattern recognition technology. Its basic principle is to use gas sensors to simulate the olfactory sensory nerve cells of a biological olfactory system and to use a computer or special chips to process the collected information in order to identify a gas or odor [11].

In the project, the response curves of the sensors to exhaled gas were collected by an eNose device named ILD3000, which was designed by UST Sensors GmbH of Germany. As shown in Figure 1, three different gas sensors, RS1, RS2, and RS3, are the core of the hardware system [12]. Different people exhale gases of different composition, and thus the sensor response curves will also differ. The gas sensors are the reactive part of the measuring system, where each layer of a sensor possesses different sensitivities and selectivity for a variety of gases at varying temperatures. The three gas sensors in the device are the GGS1000 series sensor, which is sensitive to combustible gases; the GGS3000 series sensor, which can detect hydrocarbons, especially C1, C2, C8; and the GGS7000 series sensor, which can detect NO2 [13]. The controllable temperature sensor Rt is used to provide a suitable temperature environment to improve the response of the sensors to the gas.

As shown in Table 1, during the study, the expiratory data of 121 volunteers were collected in Renji Hospital, including 69 patients with hepatocellular carcinoma and 52 healthy controls. All expiratory data were collected voluntarily. During the collection process, a disposable exhalation nozzle was used; the collection was completed in vitro, without any interventional device and with no harm to the human body. The inclusion criteria were that the patients must have primary liver cancer, no other metastatic cancer, no respiratory diseases, and no history of smoking or drinking in the past three months. In addition, the collection had to be carried out in a fasting state.

As shown in Figure 2, in each test, the eNose device simultaneously produced three response curves of exhaled gas (indicated by three different colors: yellow, grey, and orange), corresponding to the three different sensors, and a temperature curve. The temperature varies from 280°C to 420°C, and each response curve represents the response resistance of the sensor to different gases. Compared with the temperature, the resistance value varies widely.

2.2. Signal Preprocessing. As shown in Figure 2, the values and amplitudes of the curves for the three sensors collected in a single test varied greatly. To facilitate comparison, normalization was applied first to reduce the magnitude without changing the waveform shape. Each curve was transformed by formula (1) into relative values within the range [0, 1] to simplify the subsequent analysis:

$$y_k(i, j) = \frac{y_k(i, j) - \min(y_k(i))}{\max(y_k(i)) - \min(y_k(i))} \tag{1}$$

where $y_k(i)$ represents the $i$-th sample curve collected by a certain sensor, and the sample length is 60. The subscript $k$ can be A, B, or C, representing the three sensors, respectively; $i$ denotes the $i$-th sample; $y_k(i, j)$ denotes the $j$-th point of the $i$-th sample curve, with $j$ running from 0 to 59; and $\min(y_k(i))$ and $\max(y_k(i))$ represent the minimum and maximum of the signal, respectively.
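As a minimal sketch of formula (1), the normalization of a single sensor curve could look as follows. The function name, the sample readings, and the handling of a completely flat curve are illustrative assumptions, not part of the original method.

```python
def min_max_normalize(curve):
    """Scale one sensor curve into [0, 1], following formula (1)."""
    lo, hi = min(curve), max(curve)
    if hi == lo:
        # Assumption: a flat curve carries no shape information, map it to zeros.
        return [0.0 for _ in curve]
    return [(y - lo) / (hi - lo) for y in curve]


# Hypothetical resistance readings (in ohms) from one sensor.
raw = [120000.0, 150000.0, 300000.0, 210000.0]
normalized = min_max_normalize(raw)
```

The waveform shape is preserved because every point is shifted and scaled by the same two constants of its own curve.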

2.3. Features Extraction. After the signal curves were normalized, as many features as possible were extracted from each curve. In the study, we extracted time-domain features, frequency-domain features, and statistical features for each curve, as well as correlation features between the three curves obtained by the different sensors. The 15 time-domain features were maximum value and corresponding position, minimum value and corresponding position, mean, peak-to-peak value, rectified mean, variance, standard deviation, waveform factor, pulse factor, peak factor, margin factor, and area. The 14 frequency-domain features included center of gravity frequency, frequency variance, root mean square, difference spectrum, and power spectrum calculated by various methods. The 10 statistical features were extreme deviation (range), median, quantile, plurality (mode), coefficient of variation, skewness, kurtosis, autocorrelation coefficient, and information entropy. In addition, the pairwise correlations between the three sensor signals were calculated, and three features were obtained. Finally, for one breath test of each volunteer, we combined all features of the three curves, yielding a 2082-dimensional high-dimensional feature vector.
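To make the time-domain features listed above more concrete, the following sketch computes them for one normalized curve. The paper does not give formulas, so the definitions used here (for example, waveform factor as RMS divided by rectified mean) are the commonly used ones and should be read as assumptions; the function name and the sample curve are hypothetical.

```python
import math


def time_domain_features(curve):
    """Common time-domain descriptors of one curve (assumed definitions)."""
    n = len(curve)
    abs_vals = [abs(y) for y in curve]
    peak = max(abs_vals)
    mean = sum(curve) / n
    rectified_mean = sum(abs_vals) / n
    rms = math.sqrt(sum(y * y for y in curve) / n)
    variance = sum((y - mean) ** 2 for y in curve) / n  # population variance
    sqrt_amplitude = (sum(math.sqrt(a) for a in abs_vals) / n) ** 2
    return {
        "max": max(curve),
        "argmax": curve.index(max(curve)),
        "min": min(curve),
        "argmin": curve.index(min(curve)),
        "mean": mean,
        "peak_to_peak": max(curve) - min(curve),
        "rectified_mean": rectified_mean,
        "variance": variance,
        "std": math.sqrt(variance),
        "waveform_factor": rms / rectified_mean,
        "pulse_factor": peak / rectified_mean,
        "peak_factor": peak / rms,
        "margin_factor": peak / sqrt_amplitude,
        "area": sum(curve),  # rectangle rule with unit sample spacing
    }


# Hypothetical normalized curve; an all-zero curve would divide by zero.
feats = time_domain_features([0.0, 0.5, 1.0, 0.5])
```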

Figure 1: Construction of electronic nose. The diagram shows the gas sensors (RS1, RS2, RS3) and the temperature sensor (Rt), temperature control and collection, an input amplifier, and data processing, with the resistance values of the three channels (gas response data) and the temperature values as outputs.

Table 1: The basic information of all volunteers.

                            Male   Female   Total number   Age
Hepatocellular carcinoma    58     11       69             55.58 ± 10.63
Healthy control             35     17       52             51.60 ± 14.57
Total number                93     28       121            --

Figure 2: Signal curves collected by different sensors of the eNose device (x-axis: sample index n, 1 to 59; y-axis: resistance in Ω, 0 to 300000; curves: Sensor T, Sensor A, Sensor B, Sensor C).


2.4. Feature Optimization. In a classification task, the initial analysis of the sample feature base aims to extract from the original data the features that are most significant for distinguishing different categories, while discarding those features that contribute little to the classification. Thus, feature optimization removes irrelevant factors and reduces their interference with the classification. Selecting the optimal feature set can effectively reduce the dimensionality of the feature space. Therefore, feature optimization can reduce the computational effort and increase the computational speed.

As shown in Figure 3, the traditional BPSO feature optimization algorithm was improved so that it not only considers the classification accuracy and the number of features but also exploits the nature of the classification task by considering the correlation between features and categories. Before the BPSO algorithm was used for feature selection, the correlation between features and categories was calculated. According to the Pearson correlation coefficient, a fixed number of the most highly correlated features was selected. Note that the number of features selected can be set by experience; in the study, a relatively large number, 1000, was used. Then, with the optimization objectives of reducing both the KNN classification error rate and the feature dimensionality, the fitness function was constructed. The optimal feature subset was then selected based on BPSO, and the feature optimization operator was determined. The specific flow of the algorithm is shown in Figure 3.

2.4.1. Initial Screening of Features Based on Pearson Correlation Coefficient. The Pearson correlation coefficient was proposed and developed by the British statistician Karl Pearson in the 1880s [14]. The coefficient can be used to measure the correlation (linear correlation) between two variables X and Y, and its value lies between −1 and 1.

In the study, the feature value of each sample was regarded as the input variable x, and the label of each sample was regarded as the variable y. By calculating the correlation between the input features and the output labels, the Pearson correlation coefficient determines the degree of correlation between the label and each feature in a multidimensional feature set. The preliminary screening of features could then be carried out according to the Pearson correlation coefficient.

The Pearson correlation coefficient was obtained by the following formula:

$$\rho_{XY} = \frac{E(XY) - E(X)E(Y)}{\sqrt{E(X^2) - E^2(X)}\,\sqrt{E(Y^2) - E^2(Y)}} \tag{2}$$

where E represents the mathematical expectation, and X and Y represent the input feature and the output label, respectively. The value of the correlation coefficient lies between −1 and 1. When the value is close to 0, there is no linear correlation between the feature and the label. When the value is close to 1, there is a significant positive correlation between the feature and the label. Similarly, when the value is close to −1, there is a negative correlation between the input variable and the label. That is, in both cases, a change in the value of the input feature tends to be accompanied by a change in the class label.
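The initial screening can be sketched directly from formula (2): compute the coefficient between each feature column and the label, rank by absolute value, and keep the top features. This is a minimal illustration under stated assumptions; the function names, the toy data, and the convention of scoring a constant feature as 0 are hypothetical choices, not taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient via expectations, as in formula (2)."""
    n = len(x)
    ex, ey = sum(x) / n, sum(y) / n
    exy = sum(a * b for a, b in zip(x, y)) / n
    var_x = max(sum(a * a for a in x) / n - ex * ex, 0.0)  # guard rounding
    var_y = max(sum(b * b for b in y) / n - ey * ey, 0.0)
    denom = math.sqrt(var_x) * math.sqrt(var_y)
    # Assumption: a constant feature (zero variance) is scored as uncorrelated.
    return 0.0 if denom == 0 else (exy - ex * ey) / denom


def screen_features(samples, labels, keep):
    """Return the indices of the `keep` features with the largest |rho|."""
    n_features = len(samples[0])
    scores = [abs(pearson([row[j] for row in samples], labels))
              for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])


# Toy data: feature 0 rises with the label, feature 1 is constant,
# feature 2 falls with the label.
samples = [[0.1, 5.0, 0.9], [0.2, 5.0, 0.8], [0.8, 5.0, 0.3], [0.9, 5.0, 0.1]]
labels = [0, 0, 1, 1]
kept = screen_features(samples, labels, keep=2)
```

Ranking by the absolute value keeps strongly negative correlations as well as strongly positive ones, which matches the description of sorting by absolute coefficient in the results.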

2.4.2. Feature Optimization Based on Recognition Error and Feature Dimension. For an M × N dataset, each row represents a sample: the M rows represent M samples, and the N columns represent the N features of a sample. Feature optimization essentially aims to find the smallest possible subset of these N features that still ensures a high rate of correct classification. This subset of features can be regarded as the optimized features.

Once the optimized conversion factor has been calculated, the features of further samples can be optimized with it. The main steps [15, 16] are as follows.

Step 1. Set the features after initial screening as particles, the feature dimension as the dimension of the particles, and the initial number of particles to 300. The positions of the particles and the individual optimal positions are randomly initialized using binary encoding.

Step 2. The fitness function for feature selection is constructed based on the classification error rate and the optimized feature dimension, as shown in the following formula:

$$\mathrm{fitness}(i) = k_1 \times \mathrm{error}(i) + k_2 \times \frac{\mathrm{Weidu}(i)}{D} \tag{3}$$

where fitness(i) is the fitness obtained for particle i, error(i) is the recognition error rate of the classifier after feature selection based on the particle, D is the original feature dimension, Weidu(i) is the feature dimension selected by the particle, and k1 and k2 are the weights of the classifier recognition error rate and the feature dimension, which can be taken as 0.8 and 0.2, respectively.
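A sketch of formula (3) is given below. The paper does not state the value of K or how the KNN error is estimated, so the leave-one-out scheme, k = 3, squared Euclidean distance, and all names and toy data here are assumptions made only for illustration.

```python
def knn_error_rate(samples, labels, mask, k=3):
    """Leave-one-out k-NN error rate on the features selected by `mask`.

    Assumptions: binary labels 0/1, squared Euclidean distance, majority vote.
    """
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 1.0  # assumption: an empty feature subset is maximally bad
    errors = 0
    for i, row in enumerate(samples):
        dists = []
        for m, other in enumerate(samples):
            if m != i:
                d = sum((row[j] - other[j]) ** 2 for j in cols)
                dists.append((d, labels[m]))
        dists.sort(key=lambda t: t[0])
        votes = sum(label for _, label in dists[:k])
        predicted = 1 if 2 * votes > k else 0
        errors += int(predicted != labels[i])
    return errors / len(samples)


def fitness(samples, labels, mask, k1=0.8, k2=0.2):
    """Formula (3): weighted sum of error rate and selected-dimension ratio."""
    error = knn_error_rate(samples, labels, mask)
    return k1 * error + k2 * sum(mask) / len(mask)


# Toy data: feature 0 separates the classes, feature 1 is noise.
samples = [[0.10, 0.7], [0.20, 0.1], [0.15, 0.9],
           [0.90, 0.2], [0.80, 0.8], [0.85, 0.3]]
labels = [0, 0, 0, 1, 1, 1]
fit_first_only = fitness(samples, labels, [1, 0])
fit_empty = fitness(samples, labels, [0, 0])
```

With these weights, a lower fitness means both fewer misclassifications and fewer retained features, with the error term dominating.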

Step 3. The fitness value of each particle is calculated according to Step 2, and the individual and global dynamic factors and the inertia weight are updated according to formulae (4) to (6):

$$c_1 = \mathrm{rand} \times \left(2.4 - 1.4 \times \frac{\mathrm{iter}}{\mathrm{TNum}}\right) \tag{4}$$

$$c_2 = \mathrm{rand} \times \left(0.9 + 1.6 \times \frac{\mathrm{iter}}{\mathrm{TNum}}\right) \tag{5}$$

$$w = w_{\max} - (w_{\max} - w_{\min}) \times \frac{\mathrm{iter}}{\mathrm{TNum}} \tag{6}$$

Here, c1 and c2 are the dynamic factors of individual adjustment and global adjustment, w is the inertia weight, rand is a random number in [0, 1], iter is the current iteration number, TNum is the preset number of iterations, and wmax and wmin are the maximum and minimum inertia weights, respectively.


According to formulae (4) to (6), the iterative update of the velocity can be further calculated by the following formula:

$$v(i + 1) = w \times v(i) + c_1 \times (p(i) - x(i)) + c_2 \times (g - x(i)) \tag{7}$$

where v(i + 1) is the updated velocity, w is the dynamic inertia weight, v(i) is the previous velocity, x(i) is the current position, p(i) is the individual optimal position, and g is the global optimal position.
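Formulae (4) to (7) can be sketched as two small functions. The values of wmax and wmin are not given in the text, so 0.9 and 0.4 below are assumed defaults, and the function names and the `rng` parameter are hypothetical.

```python
import random


def dynamic_coefficients(iteration, total, w_max=0.9, w_min=0.4, rng=random):
    """Formulae (4)-(6): c1 shrinks and c2 grows as the iterations progress.

    Assumption: w_max and w_min are illustrative values, not from the paper.
    """
    progress = iteration / total
    c1 = rng.random() * (2.4 - 1.4 * progress)
    c2 = rng.random() * (0.9 + 1.6 * progress)
    w = w_max - (w_max - w_min) * progress
    return c1, c2, w


def update_velocity(v, x, p_best, g_best, c1, c2, w):
    """Formula (7), applied independently to every dimension of one particle."""
    return [w * vj + c1 * (pj - xj) + c2 * (gj - xj)
            for vj, xj, pj, gj in zip(v, x, p_best, g_best)]


c1, c2, w = dynamic_coefficients(0, 100, rng=random.Random(0))
new_v = update_velocity([0.0], [0], [1], [1], c1=1.0, c2=2.0, w=0.5)
```

Early iterations therefore weight each particle's own best position more heavily, while later iterations pull the swarm toward the global best.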

Step 4. Multiple iterations are performed, and the particle positions are updated and binarized (0 or 1) according to the velocity, using formulae (8) and (9):

$$\mathrm{sig}(v_i) = \frac{1}{1 + e^{-v_i}} \tag{8}$$

$$x_{ij}(\mathrm{iter} + 1) = \begin{cases} 0, & \text{if } \mathrm{rand} \ge \mathrm{sig}(v_{ij}(\mathrm{iter} + 1)) \\ 1, & \text{otherwise} \end{cases} \tag{9}$$

In formula (8), a sigmoid function is used to map the velocity to the interval [0, 1] as a probability; this is the probability that the particle will take the value 1 in the next step.

Also, xij(iter + 1) in formula (9) is the updated binary position of the j-th dimension of particle i, obtained by comparing a random number with this probability.
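Formulae (8) and (9) amount to a stochastic binarization of the velocity, sketched below. Clamping the velocity before the exponential is an added numerical safeguard and an assumption, as are the function names and the `rng` parameter.

```python
import math
import random


def sigmoid(v):
    """Formula (8): map a velocity to a probability in [0, 1]."""
    v = max(-60.0, min(60.0, v))  # assumption: clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-v))


def update_position(velocity, rng=random):
    """Formula (9): each bit becomes 1 with probability sig(v), else 0."""
    return [0 if rng.random() >= sigmoid(vj) else 1 for vj in velocity]


# A large positive velocity almost surely selects the feature,
# a large negative velocity almost surely drops it.
bits = update_position([50.0, -50.0], rng=random.Random(1))
```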

Step 5. Determine whether the maximum number of iterations has been reached. If the number of iterations has reached TNum, the optimized feature subset is obtained from the best position in the population history, and this best position is recorded and used as the feature optimization conversion operator; otherwise, return to Step 3.
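One way to read the "feature optimization conversion operator" is as the list of column indices that survive both the Pearson screening and the final BPSO bit mask; the same indices are then applied to any new sample. The sketch below follows that reading, which is an interpretation rather than the authors' stated implementation, and all names are hypothetical.

```python
def apply_operator(samples, screened_idx, best_mask):
    """Keep only the columns selected by screening and by the best BPSO mask.

    screened_idx: original column indices kept by the Pearson screening.
    best_mask:    0/1 list over screened_idx from the best swarm position.
    """
    selected = [idx for idx, bit in zip(screened_idx, best_mask) if bit]
    return [[row[j] for j in selected] for row in samples]


# Hypothetical test samples with four original features each.
test_samples = [[10, 11, 12, 13], [20, 21, 22, 23]]
reduced = apply_operator(test_samples, screened_idx=[0, 2, 3], best_mask=[1, 0, 1])
```

Because the operator is derived from the training samples only, applying it to the test samples does not leak label information from the test set.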

3. Results

After feature optimization, it is necessary to evaluate the effect of the optimization by quantitative methods. In the paper, we evaluate the effect of feature optimization on the performance of the classifier.

First of all, we obtained the feature optimization operator from the features of the training exhalation samples and applied it to optimize the features of the test samples. The specific steps are as follows.

First, the two types of collected samples, totaling 121 breath signals from healthy controls and hepatocellular carcinoma patients, were divided into a training set and a test set after multidimensional feature extraction. Then, the training set was used to determine the feature optimization operator. The specific method was as follows: a tag array was constructed using tag value 1 (representing hepatocellular carcinoma patients) and tag value 0 (representing healthy controls). Taking it as the dependent variable y and the high-dimensional sample feature array as the variable x, the relationships between sample features and categories were calculated by Pearson correlation analysis. The sample features were then sorted by the absolute values of their Pearson correlation coefficients, and the top 1000 features were retained. Furthermore, the fitness function was constructed from the KNN classification error rate and the feature dimension, and the optimal subset was obtained based on BPSO; at the same time, the feature optimization conversion factor was obtained. Afterwards, feature optimization was performed on the test set using the feature optimization operator derived in the above steps.

Figure 3: Flow diagram of the improved Pearson-BPSO feature optimization algorithm. Steps shown: valid signal extraction; signal normalization; high-dimensional feature extraction; (Pearson) initial screening of features based on Pearson correlation coefficients and determination of the screening factor P; (BPSO) construction of the fitness function; update of individual and global optimal values and positions while the number of iterations is below 100; determination of the feature optimization conversion factors.

Once the feature optimization was completed, the next step was to build the classifier.

Two different classifiers were constructed to obtain a more reliable evaluation. One was built on the support vector machine (SVM classifier), and the other was built on the random forest method (RF classifier).

Here, we applied the two classifiers to the optimized features produced by the three optimization methods. By comparing the performance of the classifiers, we found that Pearson-BPSO is more effective for classification than the other two traditional feature optimization methods, PCA and BPSO.

3.1. Performance Comparison of Pearson-BPSO and BPSO. To compare the feature optimization effect of the improved Pearson-BPSO and the traditional BPSO, the search for the optimal subset of features and the determination of the feature transformation factor were performed with each of the two algorithms.

Figure 4 shows the adaptation curves of the two BPSO algorithms over 100 iterations. The horizontal coordinate represents the number of iterations, with a maximum of 100; the vertical coordinate is the fitness value, and a smaller value means better optimization performance. Figure 4(a) shows the adaptation curve of the improved Pearson-BPSO algorithm. Based on the optimized fitness, the feature dimension could be reduced to 251, and the adaptation value could fall below 0.045. Figure 4(b) shows the adaptation curve of the traditional BPSO algorithm, with an optimized feature dimension of 712 and an optimal adaptation value of about 0.08.

3.2. Classification Performance. The performance of the feature mapping optimization algorithm is reflected in the performance of the classifier. Here, we calculated the performance of the SVM classifier and the RF classifier to compare the optimization algorithms.

After feature extraction from the original samples, the positive and negative samples were each divided into ten folds and combined into 10 groups of sample data. Each time, one fold of the data (about 7 liver cancer cases and 5 control cases) was taken as the test sample, and the rest of the samples were taken as the training samples. Then, the feature transformation factors calculated by the improved feature optimization algorithm Pearson-BPSO, the traditional BPSO, and the PCA optimization algorithm were used to optimize and reduce the feature dimension of the training and test samples, respectively, yielding different optimized feature datasets. Furthermore, classifiers were constructed based on SVM and RF, respectively, and the classification performance was calculated each time. The process was repeated ten times, with a different fold taken as the test sample in turn. Finally, the average of each performance measure was taken as the performance metric of the two classifiers under the three feature optimization algorithms, as shown in Tables 2 and 3.
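The stratified tenfold division described above can be sketched as follows for the 69 positive and 52 negative samples. The round-robin assignment, the fixed seed, and the function name are illustrative assumptions; only the class counts come from the study.

```python
import random


def stratified_folds(labels, n_folds=10, seed=0):
    """Split sample indices into folds, dividing each class separately."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)  # round-robin keeps folds balanced
    return folds


# 69 hepatocellular carcinoma samples (label 1) and 52 controls (label 0).
labels = [1] * 69 + [0] * 52
folds = stratified_folds(labels)

# Each fold serves once as the test set; the other nine form the training set.
splits = []
for k, test_idx in enumerate(folds):
    train_idx = [i for f, fold in enumerate(folds) if f != k for i in fold]
    splits.append((train_idx, test_idx))
```

With these counts, every test fold holds 6 or 7 patient samples and 5 or 6 control samples, in line with the "about 7 and 5" stated in the text.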

From Table 2, we found that the best accuracy was 86.03% and the best sensitivity was 90.79% when the Pearson-BPSO feature optimization was applied. From Table 3, we found that the best accuracy was 90% and the best sensitivity was 94.83% when the Pearson-BPSO feature optimization was applied.

The performance indicators in Tables 2 and 3 are as follows. Acc measures the accuracy of the classifier in correctly classifying samples. Sens represents the sensitivity of the classifier in recognizing hepatocellular carcinoma samples. Spec is the specificity of the classifier in recognizing normal samples. F-score represents the comprehensive performance of the classifier; the higher the F-score, the better the performance of the classifier.
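The four indicators can be computed from the confusion counts as in the sketch below. The paper does not define its F-score, so the usual F1 (harmonic mean of precision and sensitivity) is assumed here; the function names and the toy predictions are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Acc, Sens, Spec, and F-score for binary labels (1 = patient)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        return num / den if den else 0.0  # assumption: report 0 when undefined

    sens = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    return {
        "acc": ratio(tp + tn, len(pairs)),
        "sens": sens,
        "spec": ratio(tn, tn + fp),
        "f_score": ratio(2 * precision * sens, precision + sens),  # assumed F1
    }


# Toy predictions: 3 of 4 patients and 2 of 4 controls are classified correctly.
metrics = classification_metrics([1, 1, 1, 1, 0, 0, 0, 0],
                                 [1, 1, 1, 0, 0, 0, 1, 1])
```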

4. Discussion

According to the mechanism of breath testing, the metabolism of hepatocellular carcinoma patients changes for pathological reasons, and the composition of their exhaled gas changes as well. Therefore, classifying and recognizing the exhalation data of patients with hepatocellular carcinoma and healthy people is the most important task in the intelligent detection of hepatocellular carcinoma. In the study, we identified hepatocellular carcinoma by constructing a dichotomous model to distinguish the breath signals of hepatocellular carcinoma patients from those of healthy individuals.

The findings of this study are consistent with other studies showing that volatile breath biomarkers can discriminate persons with malignant solid tumors from noncancer control subjects [17]. However, there is no clear conclusion on the types of volatile marker gases for hepatocellular carcinoma. The present study relies on the fact that the collection device responds to a large number of volatile exhaled gases, possibly including gases specific to hepatocellular carcinoma [18]. We do not need to know the specific type of gas; we only need to record the overall response to the exhaled gases that contain them. Here, we attempted to construct a dichotomous classifier using the different characteristics of the integrated response curves of exhaled gas in healthy individuals and hepatocellular carcinoma patients. However, to improve the validity of the exhaled gas response detection, we need to use GC-MS to further compare and analyze exhaled gas from patients and healthy individuals and to determine the specific gas species. A highly sensitive eNose specifically designed to detect particular diseases can then be designed.

In addition, the devices used in the study have not been applied in the clinic, and there are no clear international standards for the manner and criteria of exhaled gas collection. The amount and sources of clinical data used in the data analysis are relatively limited, so some advanced intelligent algorithms [19] that rely on big data, such as deep learning, cannot be used. Hence, establishing a breath database is an essential step in advancing the clinical application of eNose research, and it depends on unified international standards for breath collection. The collection criteria include the type of gas collected and the collection method; the patient's age, gender, diet, and even race, along with other more comprehensive information, also need to be recorded.

The study is still in the exploratory stage, the amount of data collected is limited, and the results of the clinical analysis may be one-sided. Because of the limited sample, traditional machine learning methods were used for classification, i.e., signal preprocessing, feature extraction, and classification model construction, to distinguish hepatocellular carcinoma patients from healthy individuals. The human exhalation signal collected by the eNose device varies widely between individuals and, for the same individual, between different moments, so the raw data of hepatocellular carcinoma patients cannot be distinguished visually and effectively from those of healthy people; the signal is therefore first subjected to feature extraction. Extracting features from the signal curves helps uncover more latent information. However, high-dimensional features may degrade classification accuracy and slow computation, so feature optimization has become a hot research topic.

One way to judge whether a feature optimization algorithm is more effective is to feed the features optimized by different algorithms into the same classifier and compare the classification performance. In the study, tenfold cross-validation was used in the data analysis because the performance of the constructed classifiers varied with the training samples used, and the average performance was taken as the final measure. In addition, random division of the samples may leave the positive and negative samples unbalanced between the training and testing sets. To keep the class proportions consistent, a stratified method was used: the positive and negative samples were each divided into ten folds separately, and the divided data were then combined into training and testing samples.
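The stratified division described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `stratified_folds`, the round-robin assignment, and the fixed random seed are all assumptions.

```python
import random


def stratified_folds(labels, n_folds=10, seed=0):
    """Split sample indices into n_folds folds, dividing each class separately.

    Each class is shuffled on its own and dealt out round-robin, so every
    fold receives a near-equal share of positive and negative samples.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


if __name__ == "__main__":
    # 69 hepatocellular carcinoma samples (1) and 52 healthy controls (0).
    labels = [1] * 69 + [0] * 52
    for k, test_idx in enumerate(stratified_folds(labels)):
        train_idx = [i for i in range(len(labels)) if i not in set(test_idx)]
        print(k, len(train_idx), len(test_idx))
```

With 69 positive and 52 negative samples, each fold holds 6 or 7 patients and 5 or 6 controls, which matches the fold sizes reported for the experiment.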

In addition, to evaluate the generalization ability of the optimization algorithm, the KNN classifier used inside the algorithm to compute the fitness was avoided when selecting the classifiers for evaluation; the SVM and RF classifiers were chosen instead. From the tables, we found that the classifiers applying the improved optimization algorithm Pearson-BPSO all outperformed those using the other optimization algorithms.

Figure 4: Adaptation curves (fitness value versus number of iterations, 0 to 100): (a) Pearson-BPSO; (b) BPSO.

Table 2: Performance of SVM classifier based on three different feature optimization algorithms.

Algorithms      Acc (%)   Sens (%)   Spec (%)   F-score
Pearson-BPSO    86.03     90.79      78.64      0.878
BPSO            77.69     87.98      65.38      0.811
PCA             75.19     86.63      73.56      0.795

Table 3: Performance of RF classifier based on three different feature optimization algorithms.

Algorithms      Acc (%)   Sens (%)   Spec (%)   F-score
Pearson-BPSO    90        94.83      85.17      0.841
BPSO            84.42     87.80      81.40      0.800
PCA             82.49     90.90      76.33      0.832

Although much work remains to be done, we can draw the following conclusions from the experimental results. First, it is meaningful and feasible to use the eNose device to distinguish hepatocellular carcinoma patients from healthy controls, although many difficulties remain to be overcome in the clinical application of eNose. Second, the improved feature optimization algorithm does improve detection performance to some extent, as shown by the tenfold average results of the two different classifiers: in Tables 2 and 3, both classifiers applying the improved optimization algorithm Pearson-BPSO outperform those using the other optimization algorithms.

5. Conclusion

Different gases have different response curves, which lead to different sensor measurement signals and different exhalation signals. After the human exhalation signal is collected through an electronic nose, it is difficult to distinguish the data of hepatocellular carcinoma patients from those of healthy people intuitively and effectively because of the large differences between individuals and between different times. Extracting waveform features helps uncover more latent information. However, high-dimensional features may reduce classification accuracy and slow calculation, so feature optimization has become a research hotspot.

In the paper, an improved feature optimization algorithm, Pearson-BPSO, is proposed based on binary particle swarm optimization (BPSO) for the “two-classification” task of distinguishing hepatocellular carcinoma patients from healthy people by breath. The algorithm first screens the features according to the Pearson coefficient between each quantifiable feature and the label, and then optimizes the feature set to minimize the KNN classification error rate and the feature dimension, which improves classification accuracy and reduces the amount of data. Compared with the traditional BPSO algorithm and the PCA algorithm, this algorithm improves the classification performance to a certain extent and is conducive to improving the classification accuracy and detection speed of electronic nose detection.

6. Future Work

In the next step, we can further analyze the correlation between the features and combine it with this method to search for the optimal subset in a more directed way and improve the accuracy of the classifier. We will also apply more advanced algorithms and continue to refine the improved feature optimization method, Pearson-BPSO, to achieve more stable and better classification results [20].

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was financially supported by the Key Program of National Natural Science Foundation of China (Grant no. 81830052) and Shanghai Municipal Education Commission (Class II Plateau Disciplinary Construction Program of Medical Technology of SUMHS, 2018–2020). The data collection work was completed by the team from Shanghai Jiao Tong University and Shanghai University of Technology. The electronic nose department was supported by the German UST Company.

References

[1] F. Bray, J. Ferlay, I. Soerjomataram, R. L. Siegel, L. A. Torre, and A. Jemal, “Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries,” CA: A Cancer Journal for Clinicians, vol. 68, no. 6, pp. 394–424, 2018.

[2] L. James and Global Burden of Disease Cancer Collaboration, “Global, regional, and national cancer incidence, mortality, years of life lost, years lived with disability, and disability-adjusted life-years for 29 cancer groups, 1990 to a systematic analysis for the global burden of disease study,” Journal JAMA Oncology, vol. 4, no. 11, pp. 1553–1568, 2018.

[3] A. S. Bannaga, H. Tyagi, E. Daulton, J. A. Covington, and R. P. Arasaradnam, “Exploratory study using urinary volatile organic compounds for the detection of hepatocellular carcinoma,” Molecules, vol. 26, no. 9, p. 2447, 2021.

[4] M.-A. Galen, A.-M. Lou-Anne, G. David et al., “Breath metabolomics provides an accurate and noninvasive approach for screening cirrhosis, primary, and secondary liver tumor,” Journal of Hepatology communications, vol. 4, no. 7, pp. 1041–1055, 2020.

[5] X. Yu, D. Zhan, L. Liu, H. Lv, L. Xu, and J. Du, “A privacy-preserving cross-domain healthcare wearables recommendation algorithm based on domain-dependent and domain-independent feature fusion,” IEEE Journal of Biomedical and Health Informatics, vol. 9, pp. 89344–89359, 2021.

[6] C. D. Tong and X. H. Shi, “Mutual information based on PCA algorithm with application in process monitoring,” Journal of CIESC, vol. 66, no. 10, pp. 4101–4106, 2015.

[7] Y. J. Lu and D. Y. Li, “Feature selection algorithm based on label correlation,” Pattern Recognition and Artificial Intelligence, vol. 33, no. 8, pp. 716–722, 2020.

[8] B. Liu, M. D. Cai, and Y. C. Bo, “A feature extraction and classification algorithm based on PSO-CSP-SVM for motor imagery EEG signals,” Journal of Central South University (Science and Technology), vol. 51, no. 10, pp. 2855–2866, 2020.

[9] L. Ma, C. W. Lu, Q. H. Gu, and S. L. Ruan, “Particle swarm optimization with search operator of improved pigeon-inspired algorithm,” Journal of Pattern Recognition and Artificial Intelligence, vol. 31, no. 10, pp. 909–920, 2018.

[10] J. Lin, L. Xu, and L. Liu, “Feature selection method based on SVM-RFE and particle swarm optimization,” Journal of Chinese Computer Systems, vol. 36, no. 8, pp. 1865–1868, 2015.

[11] C. G. Waltman, T. A. T. Marcelissen, and J. G. H. van Roermund, “Exhaled-breath testing for prostate cancer based on volatile organic compound profiling using an electronic nose device (aeonose): a preliminary report,” European Urology Focus, vol. 6, no. 6, pp. 1220–1225, 2020.

[12] L. J. Hao and G. Huang, “Research on non-invasive detection method of liver cancer by respiratory based on electronic nose,” Journal of Transducer and Microsystem Technologies, vol. 39, no. 4, pp. 46–48, 2020.

[13] V. Andreas, W. Katharina, K. Tobias et al., “Detecting cannabis use on the human skin surface via an electronic nose system,” Journal of Sensors, vol. 14, pp. 13256–13272, 2014.

[14] J. Y. Xie, Z. Z. Wu, and Q. Q. Zheng, “An adaptive 2D feature selection algorithm based on information gain and person correlation coefficient,” Journal of Shaanxi Normal University (Natural Science Edition), vol. 48, no. 6, pp. 69–81, 2020.

[15] Z. B. Zhu, J. F. Tao, and H. L. Ge, “Passive sonar target classification and recognition technique based on BPSO-KNN algorithm,” Journal of Technical Acoustics, vol. 38, no. 2, pp. 219–223, 2019.

[16] J. D. Han, L. Sun, and S. L. Wang, “Android malware application detection method based on BPSO-NB,” Journal of Computers and Modernization, vol. 4, pp. 109–113, 2017.

[17] B. Swanson, L. Fogg, W. Julion, and M. T. Arrieta, “Electronic nose analysis of exhaled breath volatiles to identify lung cancer cases: a systematic review,” Journal of the Association of Nurses in AIDS Care, vol. 31, no. 1, pp. 71–79, 2020.

[18] G. Danila, C. Sare, D. A. Mario, R. Veronica, S. Antonio, and B. Maurizia, “An E-nose for the monitoring of severe liver impairment: a preliminary study,” Journal of Sensor, vol. 19, no. 17, Article ID 3656, 2019.

[19] M. Yu, T. Quan, Q. Peng, X. Yu, and L. Liu, “A model-based collaborate filtering algorithm based on stacked auto encoder,” Neural Computing & Applications, vol. 290, 2021.

[20] W. Hamdy, I. Elansary, A. Darwish, and A. E. Hassanien, “An optimized classification model for COVID-19 pandemic based on convolutional neural networks and particle swarm optimization algorithm,” Journal of Digital Transformation and Emerging, vol. 3, 2021.

Mobile Information Systems 9

point of the i-th sample curve. The value of j is from 0 to 59, and min(y_k(i)) and max(y_k(i)) represent the minimum and maximum of the signal, respectively.

2.3. Features Extraction. After the signal curve was normalized, as many features as possible were extracted from each curve. In the study, we extracted time-domain features, frequency-domain features, and statistical features for each curve, as well as relevant features between the three curves obtained by different sensors. The 15 time-domain features were maximum value and corresponding position, minimum value and corresponding position, mean, peak-to-peak, rectified mean, variance, standard deviation, waveform factor, pulse factor, peak factor, margin factor, and area. The 14 frequency-domain features included center of gravity frequency, frequency variance, root mean square, difference spectrum, and power spectrum calculated by various methods. The 10 statistical features were extreme deviation, median, quantile, plurality (mode), coefficient of variation, skewness, kurtosis, autocorrelation coefficient, and information entropy. In addition, the pairwise correlations between the three sensor signals were calculated, giving three features. Finally, for one breath test of each volunteer, we combined all features of the three curves, which yielded a 2082-dimensional high-dimensional feature vector.
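To make the time-domain features concrete, the sketch below computes them for one normalized curve. The article lists the features but not their formulas, so the definitions used here are common signal-processing conventions and should be read as assumptions: population variance, area as a unit-spacing sum, and the usual waveform, pulse, peak, and margin factor ratios. The function name `time_domain_features` is hypothetical.

```python
import math


def time_domain_features(curve):
    """Compute common time-domain features of one sensor response curve.

    Assumes a non-empty curve that is not identically zero.
    """
    n = len(curve)
    mean = sum(curve) / n
    peak = max(abs(v) for v in curve)
    rectified_mean = sum(abs(v) for v in curve) / n
    rms = math.sqrt(sum(v * v for v in curve) / n)
    variance = sum((v - mean) ** 2 for v in curve) / n       # population variance
    root_amplitude = (sum(math.sqrt(abs(v)) for v in curve) / n) ** 2
    return {
        "max": max(curve),
        "argmax": curve.index(max(curve)),
        "min": min(curve),
        "argmin": curve.index(min(curve)),
        "mean": mean,
        "peak_to_peak": max(curve) - min(curve),
        "rectified_mean": rectified_mean,
        "variance": variance,
        "std": math.sqrt(variance),
        "waveform_factor": rms / rectified_mean,
        "pulse_factor": peak / rectified_mean,
        "peak_factor": peak / rms,
        "margin_factor": peak / root_amplitude,
        "area": sum(curve),                                   # unit sample spacing
    }


if __name__ == "__main__":
    print(time_domain_features([0.0, 0.2, 0.7, 1.0, 0.8, 0.5]))
```

Applying such a function to each of the three sensor curves and concatenating the results is one way the per-curve part of the feature vector could be assembled.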

Figure 1: Construction of electronic nose (gas sensors RS1, RS2, and RS3 with temperature control and collection; input amplifier; data processing of the gas response data, i.e., the resistance values of three channels, and the temperature values).

Table 1: The basic information of all volunteers.

                            Male   Female   Total number   Age
Hepatocellular carcinoma    58     11       69             55.58 ± 10.63
Healthy control             35     17       52             51.60 ± 14.57
Total number                93     28       121            --

Figure 2: Signal curves collected by different sensors of the eNose device (Sensor T, Sensor A, Sensor B, and Sensor C; vertical axis in Ω, 0 to 300000; horizontal axis 1 to 59).


2.4. Feature Optimization. In a classification task, the initial analysis of the sample feature base extracts from the original data the features that are most significant for distinguishing different categories, while discarding those that contribute little to the classification. Feature optimization thus removes irrelevant factors and reduces their interference with the classification. Selecting the optimal feature set effectively reduces the dimensionality of the feature space; feature optimization therefore reduces the computational effort and increases the computational speed.

As shown in Figure 3, the traditional BPSO feature optimization algorithm was improved so that it not only considers the classification accuracy and the number of features but also exploits the correlation between features and categories. Before the BPSO algorithm was used for feature selection, the correlation between each feature and the category label was calculated, and a set number of the features with the highest Pearson correlation coefficients was selected. Note that the number of features selected can be set by experience; in the study, a relatively large number, 1000, was used. Then, the fitness function was constructed with the optimization objectives of reducing the KNN classification error rate and the feature dimensionality. Finally, the optimal feature subset was selected based on BPSO, and the feature optimization operator was determined. The specific flow of the algorithm is shown in Figure 3.

2.4.1. Initial Screening of Features Based on Pearson Correlation Coefficient. The Pearson correlation coefficient was proposed and developed by the British statistician Karl Pearson in the 1880s [14]. The coefficient measures the linear correlation between two variables X and Y, and its value lies between -1 and 1.

In the study, the feature value of each sample was regarded as the input variable x, and the label of each sample was regarded as the variable y. By calculating the correlation between the input features and the output labels, the Pearson correlation coefficient gives the degree of correlation between the label and each feature in the multidimensional feature set. The preliminary screening of features could then be carried out according to the Pearson correlation coefficient.

The Pearson correlation coefficient was obtained by the following formula:

$$\rho_{X,Y} = \frac{E(XY) - E(X)E(Y)}{\sqrt{E(X^{2}) - E^{2}(X)}\,\sqrt{E(Y^{2}) - E^{2}(Y)}} \tag{2}$$

where E represents the mathematical expectation, and X and Y represent the input feature and the output label, respectively. The value of the correlation coefficient is between -1 and 1. When the value is close to 0, there is no linear correlation between the feature and the label. When the value is close to 1, there is a significant positive correlation between the feature and the label. Similarly, when the value is close to -1, there is a significant negative correlation between the input variable and the label. That is, as the value of an input feature rises, the label tends to change accordingly.
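The screening step can be sketched in a few lines. The code below evaluates formula (2) in its equivalent centered form and keeps the features with the largest absolute correlation to the label, as the Results section describes for the top 1000 features. The function names `pearson` and `screen_features` are hypothetical, and treating a constant feature as having zero correlation is an assumption made here to avoid division by zero.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant variable carries no correlation
    return cov / math.sqrt(var_x * var_y)


def screen_features(samples, labels, keep):
    """Return the column indices of the `keep` features most correlated with the label."""
    n_features = len(samples[0])
    scores = [
        abs(pearson([row[j] for row in samples], labels))
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])


if __name__ == "__main__":
    # Toy data: column 0 tracks the label, column 1 is constant, column 2 is unrelated.
    samples = [[1, 5, 0], [2, 5, 1], [3, 5, 0], [4, 5, 1]]
    labels = [0, 0, 1, 1]
    print(screen_features(samples, labels, keep=1))
```

Ranking by absolute value keeps strongly negative correlations as well as strongly positive ones, since both are informative for the label.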

2.4.2. Feature Optimization Based on Recognition Error and Feature Dimension. For an M × N dataset, each row represents a sample: M rows represent M samples, and N columns represent the N features of a sample. Feature optimization essentially finds the smallest possible subset of these N features that still ensures a high rate of correct classification. This subset of features can be regarded as the optimized features.

By calculating the optimized conversion factor, more sample features could be optimized. The main steps [15, 16] are as follows.

Step 1. Set the features after initial screening as particles, the feature dimension as the dimension of the particles, and the initial number of particles to 300. The positions of the particles and the individual optimal positions were randomly initialized using binary encoding.

Step 2. The fitness function for feature selection is constructed based on the classification error rate and the optimized feature dimension, as shown in the following formula:

$$\mathrm{fitness}(i) = k_{1} \times \mathrm{error}(i) + k_{2} \times \frac{\mathrm{Weidu}(i)}{D} \tag{3}$$

where fitness(i) is the fitness obtained based on the particles, error(i) is the error rate of classifier recognition after feature selection based on the particles, D is the original feature dimension, Weidu(i) is the feature dimension selected based on the particles, and k1 and k2 are the weights of the classifier recognition error rate and the feature dimension optimization, which can be taken as 0.8 and 0.2, respectively.

Step 3. The fitness value of each particle is calculated according to Step 2, and the individual and global dynamic factors and the inertia weight are updated as in formulae (4) to (6):

$$c_{1} = \mathrm{rand} \times \left(2.4 - 1.4 \times \frac{\mathrm{iter}}{\mathrm{TNum}}\right) \tag{4}$$

$$c_{2} = \mathrm{rand} \times \left(0.9 + 1.6 \times \frac{\mathrm{iter}}{\mathrm{TNum}}\right) \tag{5}$$

$$w = w_{\max} - \left(w_{\max} - w_{\min}\right) \times \frac{\mathrm{iter}}{\mathrm{TNum}} \tag{6}$$

Among them, c1 and c2 are the dynamic factors of individual adjustment and global adjustment, w is the inertia weight, rand is a random number in [0, 1], iter is the number of iterations, TNum is the preset number of iterations, and wmax and wmin are the maximum and minimum inertia weights, respectively.


According to formulae (4) to (6), the iterative update value of the velocity can be further calculated by the following formula:

$$v(i+1) = w \times v(i) + c_{1} \times \left(p(i) - x(i)\right) + c_{2} \times \left(g - x(i)\right) \tag{7}$$

where v(i + 1) is the updated velocity value, w is the dynamic inertia weight, v(i) is the last velocity value, x(i) is the current position, p(i) is the individual optimal position, and g is the global optimal position.

Step 4. Multiple iterations are performed, and the particle positions are updated and binarized to (0, 1) according to the velocity condition using formulae (8) and (9):

$$\mathrm{sig}\left(v(i)\right) = \frac{1}{1 + e^{-v(i)}} \tag{8}$$

$$x_{ij}(\mathrm{iter}+1) = \begin{cases} 0, & \text{if } \mathrm{rand} \ge \mathrm{sig}\left(v_{ij}(\mathrm{iter}+1)\right), \\ 1, & \text{otherwise.} \end{cases} \tag{9}$$

In formula (8), a sigmoid function maps the velocity to the interval [0, 1] as a probability, namely the probability that the particle position takes the value 1 in the next step.

In formula (9), x_ij(iter + 1) is the binarized position of the j-th dimension of particle i after the update, obtained by comparing a random number with this probability.

Step 5. Determine whether the maximum number of iterations has been reached. If the number of iterations has reached TNum, the optimized feature subset is obtained from the best position in the population history, and this optimal position record is used as the feature optimization conversion operator; otherwise, return to Step 3.
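Steps 1 to 5 can be put together in a compact sketch. It is an illustration under stated assumptions rather than the authors' implementation: the KNN error is estimated by leave-one-out with k = 3, the inertia bounds are w_max = 0.9 and w_min = 0.4, velocities start at zero, one random number per iteration is drawn for each of c1 and c2, and the constants in formulae (4) and (5) are read as 2.4, 1.4, 0.9, and 1.6. None of these choices is specified in the text, and all function names are hypothetical.

```python
import math
import random


def knn_error(samples, labels, cols, k=3):
    """Leave-one-out error rate of a k-nearest-neighbour vote on the chosen columns."""
    if not cols:
        return 1.0  # an empty feature subset cannot classify anything
    wrong = 0
    for i, row in enumerate(samples):
        neighbours = sorted(
            (sum((row[j] - other[j]) ** 2 for j in cols), labels[m])
            for m, other in enumerate(samples) if m != i
        )
        votes = [label for _, label in neighbours[:k]]
        if max(set(votes), key=votes.count) != labels[i]:
            wrong += 1
    return wrong / len(samples)


def fitness(samples, labels, mask, original_dim, k1=0.8, k2=0.2):
    """Formula (3): weighted error rate plus weighted share of selected features."""
    cols = [j for j, bit in enumerate(mask) if bit]
    return k1 * knn_error(samples, labels, cols) + k2 * len(cols) / original_dim


def sigmoid(v):
    """Formula (8), written so that large negative v does not overflow."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def bpso(samples, labels, n_particles=300, t_num=100,
         w_max=0.9, w_min=0.4, original_dim=None, seed=0):
    """Return (best_mask, best_fitness) found by binary particle swarm optimization."""
    rng = random.Random(seed)
    dim = len(samples[0])
    original_dim = original_dim or dim
    # Step 1: random binary positions; individual bests start at those positions.
    pos = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    p_best = [p[:] for p in pos]
    p_fit = [fitness(samples, labels, p, original_dim) for p in pos]
    best = min(range(n_particles), key=p_fit.__getitem__)
    g_best, g_fit = p_best[best][:], p_fit[best]
    for it in range(t_num):
        # Step 3: formulae (4) to (6).
        c1 = rng.random() * (2.4 - 1.4 * it / t_num)
        c2 = rng.random() * (0.9 + 1.6 * it / t_num)
        w = w_max - (w_max - w_min) * it / t_num
        for n in range(n_particles):
            for j in range(dim):
                # Formula (7): velocity update.
                vel[n][j] = (w * vel[n][j]
                             + c1 * (p_best[n][j] - pos[n][j])
                             + c2 * (g_best[j] - pos[n][j]))
                # Step 4, formulae (8) and (9): binarize the position.
                pos[n][j] = 0 if rng.random() >= sigmoid(vel[n][j]) else 1
            f = fitness(samples, labels, pos[n], original_dim)
            if f < p_fit[n]:
                p_best[n], p_fit[n] = pos[n][:], f
                if f < g_fit:
                    g_best, g_fit = pos[n][:], f
    # Step 5: the best position in the population history is the operator.
    return g_best, g_fit
```

The returned binary mask plays the role of the feature optimization conversion operator: a 1 keeps the corresponding screened feature and a 0 discards it. Passing `original_dim=2082` would follow the statement that D is the original feature dimension.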

3. Results

After feature optimization, the effect of the optimization must be evaluated by quantitative methods. In the paper, we evaluate the effect of feature optimization on the performance of the classifier.

First, we obtain the feature optimization operator from the features of the training exhalation samples and then apply the feature optimization to the test samples. The specific steps are as follows.

First, the two types of collected samples, 121 breath signals in total from healthy controls and hepatocellular carcinoma patients, were divided into a training set and a test set after multidimensional feature extraction. Then, the training set was used to determine the feature optimization operator. The specific method was as follows: a tag array was constructed using tag value 1 (representing hepatocellular carcinoma patients) and tag value 0 (representing healthy controls). Taking this array as the dependent variable y and the high-dimensional sample feature array as the variable x, the relationships between sample features and categories were calculated by Pearson correlation analysis. The sample features were then sorted by the absolute values of their Pearson correlation coefficients, and the top 1000 features were retained. Furthermore, the fitness function was constructed from the KNN classification error rate and the feature dimension, the optimal subset was found based on BPSO, and the feature optimization conversion factor was obtained at the same time. Afterwards, feature optimization was performed on the test set using the feature optimization operator derived in the above steps.

Figure 3: Flow diagram of the improved Pearson-BPSO feature optimization algorithm (valid signal extraction; signal normalization; high-dimensional feature extraction; Pearson stage: initial screening of features based on Pearson correlation coefficients and determination of the screening factor P; BPSO stage: construction of the fitness function, update of individual and global optimal values and positions while the number of iterations is below 100, and determination of the feature optimization conversion factors).
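The key point of this procedure is that the operator is learned on the training set only and then reused unchanged on the test set. A minimal sketch of that reuse is given below; it assumes the operator is stored as the list of column indices kept by the Pearson screening plus the binary BPSO mask over those columns, and the name `apply_operator` is hypothetical.

```python
def apply_operator(samples, screened_idx, mask):
    """Keep only the columns selected by Pearson screening and then by the BPSO mask.

    screened_idx: column indices retained by the Pearson screening.
    mask: binary list of the same length, 1 meaning the screened column is kept.
    """
    if len(screened_idx) != len(mask):
        raise ValueError("mask must have one bit per screened feature")
    columns = [j for j, bit in zip(screened_idx, mask) if bit]
    return [[row[j] for j in columns] for row in samples]


if __name__ == "__main__":
    # Toy test samples with four original features each.
    test_samples = [[10, 11, 12, 13], [20, 21, 22, 23]]
    # Operator derived from training data only: screening kept columns 0, 2, 3,
    # and the BPSO mask then dropped column 2.
    print(apply_operator(test_samples, screened_idx=[0, 2, 3], mask=[1, 0, 1]))
```

Because the same indices are applied to both sets, the training and test features stay aligned, and no label information from the test samples enters the selection.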

Once the feature optimization was completed, the next step was to build the classifier.

Two different classifiers were constructed to obtain a more reliable evaluation. One was built on the support vector machine (SVM classifier), and the other was built on the random forest method (RF classifier).

Here, we applied the two classifiers to classify the optimized features produced by three different optimization methods. By comparing the performance of the classifiers, we found that Pearson-BPSO is more effective for classification than the other two traditional feature optimization methods, PCA and BPSO.

3.1. Performance Comparison of Pearson-BPSO and BPSO. To compare the feature optimization effect of the improved Pearson-BPSO with that of the traditional BPSO, the search for the optimal feature subset and the determination of the feature transformation factor were performed with each of the two algorithms.

Figure 4 shows the adaptation curves of the two BPSO algorithms over 100 iterations. The horizontal coordinate represents the number of iterations, with a maximum setting of 100; the vertical coordinate is the fitness value, and a smaller value means better optimization performance. Figure 4(a) shows the adaptation curve of the improved Pearson-BPSO algorithm: based on the optimized fitness, the feature dimension could be reduced to 251, and the adaptation value fell below 0.045. Figure 4(b) shows the adaptation curve of the traditional BPSO algorithm, with an optimized feature dimension of 712 and an optimal adaptation value of about 0.08.
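As a rough consistency check on these values, the dimension term of formula (3) can be evaluated by hand. This is only a sketch under two assumptions taken from the method description: that D is the 2082-dimensional original feature set and that k2 = 0.2.

```latex
k_{2} \times \frac{251}{2082} \approx 0.2 \times 0.1206 \approx 0.024,
\qquad
k_{2} \times \frac{712}{2082} \approx 0.2 \times 0.3420 \approx 0.068
```

Under these assumptions, the remainder of each reported fitness value (below 0.045 and about 0.08, respectively) would come from the weighted KNN error term k1 × error(i).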

3.2. Classification Performance. The performance of the feature mapping optimization algorithm is reflected in the performance of the classifier. Here, we calculated the performance of the SVM classifier and the RF classifier to compare the optimization algorithms.

After feature extraction from the original samples, the positive and negative samples were each divided into 10 folds and combined into 10 groups of sample data. Each time, one fold of the data (about 7 hepatocellular carcinoma cases and 5 controls) was taken as the test sample, and the rest of the samples were taken as the training samples. Then, the feature transformation factors calculated by the improved feature optimization algorithm Pearson-BPSO, the traditional BPSO, and the PCA optimization algorithm were used to optimize and reduce the feature dimension of the training and test samples, giving different optimized feature datasets. Classifiers were then constructed based on SVM and RF, respectively, and the classification performance was calculated. The process was repeated ten times, with a different fold taken as the test sample in turn and the classification performance calculated separately each time. Finally, the average of each performance measure was taken as the performance metric of the two classifiers under the three feature optimization algorithms, as shown in Tables 2 and 3.

From Table 2, we found that the best accuracy was 86.03% and the best sensitivity was 90.79% when the Pearson-BPSO feature optimization was applied. From Table 3, we found that the best accuracy was 90% and the best sensitivity was 94.83% when the Pearson-BPSO feature optimization was applied.

+e performance indicators in Table 2 and 3 include thefollowing Acc is used to measure the accuracy of theclassifier in correctly classifying samples Sens represents thesensitivity of the classifier in recognizing hepatocellularcarcinoma samples Spec is the specificity of the classifier inrecognizing normal samples F-score represents the com-prehensive performance of the classifier and the higher theF-score value the better the performance of the classifier

4 Discussion

According to the mechanism of breath testing due to path-ological reasons the metabolism of hepatocellular carcinomapatients will change and the composition of exhaled gas willalso change +erefore classification and recognition of ex-halation data of patients with hepatocellular carcinoma andhealthy people were the most important work of intelligentdetection of hepatocellular carcinoma In the study we dis-tinguished hepatocellular carcinoma by constructing a di-chotomous model to distinguish the breath signals ofhepatocellular carcinoma patients and healthy individuals

+e findings from the reviewed studies were consistentwith other studies that have shown that volatile breathbiomarkers can discriminate persons with malignant solidtumors from noncancer control subjects [17] Howeverthere is no clear conclusion on the types of volatile markergases for hepatocellular carcinoma +e present study isbased on the fact that the collection device used can respondto a large number of volatile exhaled gases including pos-sible hepatocellular carcinoma specific exhaled gases amongthem [18] We do not need to know the specific type of gaswe just need to record the overall response of the exhaledgases containing some specific gases Here we attempted toconstruct a dichotomous classifier using the differentcharacteristics of the integrated response curves of exhaledgas in healthy individuals and hepatocellular carcinomapatients However to improve the validity of the exhaled gasresponse detection we need to use GC-MS to further

6 Mobile Information Systems

compare and analyze exhaled gas from patients and healthyindividuals and to determine the specific gas species+us ahighly sensitive eNose specifically designed to detect somediseases then can be designed

In addition the devices used in the study have not beenapplied to the clinic and there are no clear internationalstandards for the manner and criteria of exhaled gas col-lection +e amount and sources of clinical data utilized inthe data analysis are relatively limited Some advanced in-telligent algorithms [19] such as deep learning which werebased on big data cannot be utilized Hence the estab-lishment of a breath database is an essential step to advancethe clinical application of eNose research It depends on theestablishment of international unified standards for breathcollection +e collection criteria include the type of gascollected the collection method the patientrsquos age genderdiet or not and even race and other more comprehensiveinformation needs to be collected

+e study is still in the exploratory stage the amount ofdata collected is limited and the results of clinical analysis

may be one-sided Due to the limited nature of the sampletraditional machine learning algorithms were used in thestudy for classification ie signal preprocessing and featureextraction and classification model construction to distin-guish hepatocellular carcinoma patients from healthy in-dividuals In the specific work because the humanexhalation signal collected by the eNose device has a largeinterindividual and individual variability at different mo-ments which cannot visually and effectively distinguish thedata of hepatocellular carcinoma patients from other healthdata the signal is firstly subjected to feature extraction +eextraction of signal curvesrsquo features helps discover morepotential information However high-dimensional featuresmay lead to the degradation of classification accuracy andslow computation so the optimization of features becomes ahot topic of research

A way to measure whether feature optimization is moreeffective is to feed the same features optimized by differentalgorithms into the same classifier and test the classificationperformance In the study the tenfold cross-validationmethod was used in the data analysis taking into accountthat the performance of the constructed classifiers variedwhen different training samples were used and the averageperformance was used as the final measure In additionbased on the randomness of selection when dividing sam-ples an imbalance between positive and negative samples intraining and testing samples may occur To keep the samplesconsistent a stratified screening method is used +at is thepositive and negative samples are divided by tenfold sepa-rately and the divided data is then further combined intotraining and testing samples

In addition to evaluate the generalization ability of theoptimization algorithm the KNN classifier used in the al-gorithm for optimizing the fitness was avoided whenselecting the classifier and the SVM classifier and RFclassifier were chosen instead From the tables we found thatthe classifiers applying the improved optimization algorithm

0 10 20 30 40 50 60 70 80 90 100004

0045

005

0055

006

0065

007

0075

008

Number of iterations

Fitness curveFi

tnes

s val

ue

(a)

0 10 20 30 40 50 60 70 80 90 100008

0085

009

0095

01

0105

011

Number of iterations

Fitn

ess v

alue

Fitness curve

(b)

Figure 4 Adaptation curves (a) Pearson-BPSO (b) BPSO

Table 2 Performance of SVM classifier based on three differentfeature optimization algorithms

Algorithms Acc Sens Spec F-scorePearson-BPSO 8603 9079 7864 0878BPSO 7769 8798 6538 0811PCA 7519 8663 7356 0795

Table 3 Performance of RF classifier based on three differentfeature optimization algorithms

Algorithms Acc Sens Spec F-scorePearson-BPSO 90 9483 8517 0841BPSO 8442 8780 8140 0800PCA 8249 9090 7633 0832

Mobile Information Systems 7

Pearson-BPSO all outperformed the other optimizationalgorithms

Although much work remains to be done in this study, we can draw the following conclusions from the experimental results. First, it is meaningful and feasible to use the eNose device to distinguish hepatocellular carcinoma patients from healthy controls. However, there are still many difficulties to overcome in the clinical application of eNose. Second, the improved feature optimization algorithm does improve the detection performance to some extent, as shown by the results of the two different classifiers averaged over the ten folds. From Tables 2 and 3, we can see that both classifiers applying the improved optimization algorithm, Pearson-BPSO, outperformed those applying the other optimization algorithms.

5. Conclusion

Different gases have different response curves, which lead to different sensor measurement signals and different exhalation signals. After the human exhalation signal is collected by an electronic nose, it is difficult to distinguish the data of hepatocellular carcinoma patients from those of healthy people intuitively and effectively, because of the large differences between individuals and between measurement times. The extraction of waveform features helps to uncover more potential information. However, high-dimensional features may reduce classification accuracy and slow down computation, so feature optimization has become a research hotspot.

In this paper, an improved feature optimization algorithm, Pearson-BPSO, is proposed based on binary particle swarm optimization (BPSO) for the two-class task of distinguishing hepatocellular carcinoma patients from healthy people by breath. Based on the Pearson correlation coefficient between each quantifiable feature and the label, the algorithm first screens the features and then optimizes the feature set to minimize the KNN classification error rate and the feature dimension, which improves the classification accuracy and reduces the amount of data. Compared with the traditional BPSO algorithm and the PCA algorithm, this algorithm improves the classification performance to a certain extent and is conducive to improving the classification accuracy and detection speed of electronic nose detection.

6. Future Work

In the next step, we can further analyze the correlation between the features and combine it with this method to search for the optimal subset in a more directed way and improve the accuracy of the classifier. We will also apply more advanced algorithms and continue to refine the improved feature optimization method, Pearson-BPSO, to achieve more stable and better classification results [20].

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was financially supported by the Key Program of National Natural Science Foundation of China (Grant no. 81830052) and Shanghai Municipal Education Commission (Class II Plateau Disciplinary Construction Program of Medical Technology of SUMHS, 2018–2020). The data collection work was completed by the team from Shanghai Jiao Tong University and Shanghai University of Technology. The electronic nose department was supported by the German UST Company.

References

[1] F. Bray, J. Ferlay, I. Soerjomataram, R. L. Siegel, L. A. Torre, and A. Jemal, “Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries,” CA: A Cancer Journal for Clinicians, vol. 68, no. 6, pp. 394–424, 2018.

[2] L. James and Global burden of disease cancer collaboration, “Global, regional, and national cancer incidence, mortality, years of life lost, years lived with disability, and disability-adjusted life-years for 29 cancer groups, 1990 to a systematic analysis for the global burden of disease study,” Journal JAMA Oncology, vol. 4, no. 11, pp. 1553–1568, 2018.

[3] A. S. Bannaga, H. Tyagi, E. Daulton, J. A. Covington, and R. P. Arasaradnam, “Exploratory study using urinary volatile organic compounds for the detection of hepatocellular carcinoma,” Molecules, vol. 26, no. 9, p. 2447, 2021.

[4] M.-A. Galen, A.-M. Lou-Anne, G. David et al., “Breath metabolomics provides an accurate and noninvasive approach for screening cirrhosis, primary, and secondary liver tumor,” Journal of Hepatology communications, vol. 4, no. 7, pp. 1041–1055, 2020.

[5] X. Yu, D. Zhan, L. Liu, H. Lv, L. Xu, and J. Du, “A privacy-preserving cross-domain healthcare wearables recommendation algorithm based on domain-dependent and domain-independent feature fusion,” IEEE Journal of Biomedical and Health Informatics, vol. 9, pp. 89344–89359, 2021.

[6] C. D. Tong and X. H. Shi, “Mutual information based on PCA algorithm with application in process monitoring,” Journal of CIESC, vol. 66, no. 10, pp. 4101–4106, 2015.

[7] Y. J. Lu and D. Y. Li, “Feature selection algorithm based on label correlation,” Pattern Recognition and Artificial Intelligence, vol. 33, no. 8, pp. 716–722, 2020.

[8] B. Liu, M. D. Cai, and Y. C. Bo, “A feature extraction and classification algorithm based on PSO-CSP-SVM for motor imagery EEG signals,” Journal of Central South University (Science and Technology), vol. 51, no. 10, pp. 2855–2866, 2020.

[9] L. Ma, C. W. Lu, Q. H. Gu, and S. L. Ruan, “Particle swarm optimization with search operator of improved pigeon-inspired algorithm,” Journal of Pattern Recognition and Artificial Intelligence, vol. 31, no. 10, pp. 909–920, 2018.

[10] J. Lin, L. Xu, and L. Liu, “Feature selection method based on SVM-RFE and particle swarm optimization,” Journal of Chinese Computer Systems, vol. 36, no. 8, pp. 1865–1868, 2015.

[11] C. G. Waltman, T. A. T. Marcelissen, and J. G. H. van Roermund, “Exhaled-breath testing for prostate cancer based on volatile organic compound profiling using an electronic nose device (aeonose): a preliminary report,” European Urology Focus, vol. 6, no. 6, pp. 1220–1225, 2020.

[12] L. J. Hao and G. Huang, “Research on non-invasive detection method of liver cancer by respiratory based on electronic nose,” Journal of Transducer and Microsystem Technologies, vol. 39, no. 4, pp. 46–48, 2020.

[13] V. Andreas, W. Katharina, K. Tobias et al., “Detecting cannabis use on the human skin surface via an electronic nose system,” Journal of Sensors, vol. 14, pp. 13256–13272, 2014.

[14] J. Y. Xie, Z. Z. Wu, and Q. Q. Zheng, “An adaptive 2D feature selection algorithm based on information gain and Pearson correlation coefficient,” Journal of Shaanxi Normal University (Natural Science Edition), vol. 48, no. 6, pp. 69–81, 2020.

[15] Z. B. Zhu, J. F. Tao, and H. L. Ge, “Passive sonar target classification and recognition technique based on BPSO-KNN algorithm,” Journal of Technical Acoustics, vol. 38, no. 2, pp. 219–223, 2019.

[16] J. D. Han, L. Sun, and S. L. Wang, “Android malware application detection method based on BPSO-NB,” Journal of Computers and Modernization, vol. 4, pp. 109–113, 2017.

[17] B. Swanson, L. Fogg, W. Julion, and M. T. Arrieta, “Electronic nose analysis of exhaled breath volatiles to identify lung cancer cases: a systematic review,” Journal of the Association of Nurses in AIDS Care, vol. 31, no. 1, pp. 71–79, 2020.

[18] G. Danila, C. Sare, D. A. Mario, R. Veronica, S. Antonio, and B. Maurizia, “An E-nose for the monitoring of severe liver impairment: a preliminary study,” Journal of Sensor, vol. 19, no. 17, Article ID 3656, 2019.

[19] M. Yu, T. Quan, Q. Peng, X. Yu, and L. Liu, “A model-based collaborate filtering algorithm based on stacked auto encoder,” Neural Computing & Applications, vol. 290, 2021.

[20] W. Hamdy, I. Elansary, A. Darwish, and A. E. Hassanien, “An optimized classification model for COVID-19 pandemic based on convolutional neural networks and particle swarm optimization algorithm,” Journal of Digital Transformation and Emerging, vol. 3, 2021.


2.4. Feature Optimization. In a classification task, the initial analysis of the sample features aims to extract from the original data the features that are most significant for distinguishing the categories, while discarding those that contribute little to the classification. Thus, feature optimization removes irrelevant factors and reduces their interference with the classification. Selecting an optimal feature set also effectively reduces the dimensionality of the feature space. Therefore, feature optimization can reduce the computational effort and increase the computational speed.

As shown in Figure 3, the traditional BPSO feature optimization algorithm was improved so that it considers not only the classification accuracy and the number of features but also the correlation between features and categories. Before the BPSO algorithm was used for feature selection, the correlation between each feature and the category label was calculated. The features were then ranked by Pearson correlation coefficient, and a fixed number of the most highly correlated features was retained. It is important to note that the number of features retained can be set by experience; in this study, it was set to a relatively large number, 1000. Then, with the optimization objectives of reducing the KNN classification error rate and the feature dimensionality, the fitness function was constructed. Finally, the optimal feature subset was selected based on BPSO, and the feature optimization operator was determined. The specific flow of the algorithm is shown in Figure 3.

2.4.1. Initial Screening of Features Based on Pearson Correlation Coefficient. The Pearson correlation coefficient was proposed and developed by the British statistician Karl Pearson in the 1880s [14]. The coefficient measures the linear correlation between two variables X and Y, and its value lies between −1 and 1.

In this study, the value of each feature for each sample was regarded as the input variable x, and the label of each sample was regarded as the variable y. By calculating the correlation between the input features and the output labels, the Pearson correlation coefficient gives the degree of correlation between the label and each feature in the multidimensional feature set. The preliminary screening of features could then be carried out according to the Pearson correlation coefficient.

The Pearson correlation coefficient was obtained by the following formula:

$$\rho_{XY} = \frac{E(XY) - E(X)E(Y)}{\sqrt{E(X^{2}) - E^{2}(X)}\,\sqrt{E(Y^{2}) - E^{2}(Y)}} \tag{2}$$

where E represents the mathematical expectation, and X and Y represent the input feature and the output label, respectively. The value of the correlation coefficient lies between −1 and 1. When the value is close to 0, there is no linear correlation between them. When the value is close to 1, there is a significant positive correlation between the feature and the label. Similarly, when the value is close to −1, there is a significant negative correlation between the input variable and the label. That is, as the value of the input feature rises, the label tends to change from one class to the other.
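Formula (2) and the screening by correlation with the label can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names `pearson` and `screen_features` are hypothetical, features are ranked by the absolute value of the coefficient, and a feature with (near) zero variance is treated as uncorrelated.

```python
import math

def pearson(x, y, eps=1e-12):
    """Formula (2): (E(XY) - E(X)E(Y)) / (sqrt(E(X^2) - E^2(X)) * sqrt(E(Y^2) - E^2(Y)))."""
    n = len(x)
    ex = sum(x) / n
    ey = sum(y) / n
    exy = sum(a * b for a, b in zip(x, y)) / n
    var_x = sum(a * a for a in x) / n - ex * ex
    var_y = sum(b * b for b in y) / n - ey * ey
    if var_x < eps or var_y < eps:
        return 0.0  # assumption: a constant feature (or label) counts as uncorrelated
    return (exy - ex * ey) / (math.sqrt(var_x) * math.sqrt(var_y))

def screen_features(samples, labels, keep):
    """Return the column indices of the `keep` features with the largest |rho|."""
    n_features = len(samples[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in samples]
        scores.append((abs(pearson(column, labels)), j))
    # Highest absolute correlation first; ties are broken by column index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:keep]]
```

Ranking by the absolute value keeps strongly negatively correlated features as well as strongly positively correlated ones, since both carry information about the label.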

2.4.2. Feature Optimization Based on Recognition Error and Feature Dimension. For an M × N dataset, each row represents a sample and each column a feature, so the M rows represent M samples and the N columns represent the N features of a sample. Feature optimization essentially seeks the smallest possible subset of these N features that still ensures a high rate of correct classification. This subset of features can be regarded as the optimized features.

Once the feature optimization conversion factor has been calculated, the features of further samples can be optimized with it. The main steps [15, 16] are as follows.

Step 1. Set the features after initial screening as particles, the feature dimension as the dimension of the particles, and the initial number of particles to 300. The positions of the particles and the individual optimal positions are randomly initialized using binary encoding.

Step 2. The fitness function for feature selection is constructed based on the classification error rate and the optimized feature dimension, as shown in the following formula:

$$\mathrm{fitness}(i) = k_{1} \times \mathrm{error}(i) + k_{2} \times \frac{\mathrm{Weidu}(i)}{D} \tag{3}$$

where fitness(i) is the fitness obtained based on particle i, error(i) is the recognition error rate of the classifier after feature selection based on particle i, D is the original feature dimension, Weidu(i) is the feature dimension selected based on particle i, and k1 and k2 are the weights of the classifier recognition error rate and the feature dimension optimization, which can be taken as 0.8 and 0.2, respectively.

Step 3. The fitness value of each particle is calculated according to Step 2, and the individual and global dynamic factors and the inertia weight are then updated as in formulae (4) to (6):

$$c_{1} = \mathrm{rand} \times \left(2.4 - 1.4 \times \frac{\mathrm{iter}}{\mathrm{TNum}}\right) \tag{4}$$

$$c_{2} = \mathrm{rand} \times \left(0.9 + 1.6 \times \frac{\mathrm{iter}}{\mathrm{TNum}}\right) \tag{5}$$

$$w = w_{\max} - \left(w_{\max} - w_{\min}\right) \times \frac{\mathrm{iter}}{\mathrm{TNum}} \tag{6}$$

where c1 and c2 are the dynamic factors of individual adjustment and global adjustment, w is the inertia weight, rand is a random number in [0, 1], iter is the current number of iterations, TNum is the preset number of iterations, and wmax and wmin are the maximum and minimum inertia weights, respectively.


According to formulae (4) to (6), the iterative update of the velocity can then be calculated by the following formula:

$$v(i+1) = w \times v(i) + c_{1} \times \left(p(i) - x(i)\right) + c_{2} \times \left(g - x(i)\right) \tag{7}$$

where v(i + 1) is the updated velocity, w is the dynamic inertia weight, v(i) is the previous velocity, x(i) is the current position, p(i) is the individual optimal position, and g is the global optimal position.

Step 4. Multiple iterations are performed, and the particle positions are updated and binarized to 0 or 1 according to the velocity, using formulae (8) and (9):

$$\mathrm{sig}(v_{i}) = \frac{1}{1 + e^{-v(i)}} \tag{8}$$

$$x_{ij}(\mathrm{iter}+1) = \begin{cases} 0, & \text{if } \mathrm{rand} \ge \mathrm{sig}\left(v_{ij}(\mathrm{iter}+1)\right) \\ 1, & \text{otherwise} \end{cases} \tag{9}$$

In formula (8), a sigmoid function maps the velocity to the interval [0, 1] as a probability, namely the probability that the particle position takes the value 1 at the next step.

In formula (9), xij(iter + 1) is the updated binary position determined by this probability.

Step 5. Determine whether the maximum number of iterations has been reached. If the number of iterations has reached TNum, the optimized feature subset is obtained from the best position in the population history, and this optimal position record is used as the feature optimization conversion operator; otherwise, return to Step 3.
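Steps 1 to 5 can be sketched as a single loop, as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the name `bpso_select`, the caller-supplied `error_fn` (which stands in for the KNN classification error rate of a feature mask), the zero initial velocities, the inertia limits 0.9 and 0.4, and the drawing of one random number per iteration for c1 and c2 are all choices made for illustration. The defaults of 30 particles and 50 iterations are reduced from the 300 particles and 100 iterations used in the study to keep the example fast.

```python
import math
import random

def bpso_select(error_fn, n_features, n_particles=30, t_num=50,
                k1=0.8, k2=0.2, w_max=0.9, w_min=0.4, seed=0):
    """Binary PSO over feature masks; returns (best_mask, best_fitness)."""
    rng = random.Random(seed)

    def fitness(mask):
        # Formula (3): weighted error rate plus weighted fraction of features kept.
        return k1 * error_fn(mask) + k2 * sum(mask) / n_features

    # Step 1: random binary positions; zero initial velocities (assumption).
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    p_best = [p[:] for p in pos]
    p_fit = [fitness(p) for p in pos]
    g_idx = min(range(n_particles), key=p_fit.__getitem__)
    g_best, g_fit = p_best[g_idx][:], p_fit[g_idx]

    for it in range(t_num):
        frac = it / t_num
        # Formulae (4)-(6): time-varying dynamic factors and inertia weight.
        c1 = rng.random() * (2.4 - 1.4 * frac)
        c2 = rng.random() * (0.9 + 1.6 * frac)
        w = w_max - (w_max - w_min) * frac
        for i in range(n_particles):
            for j in range(n_features):
                # Formula (7): velocity update.
                vel[i][j] = (w * vel[i][j]
                             + c1 * (p_best[i][j] - pos[i][j])
                             + c2 * (g_best[j] - pos[i][j]))
                # Formulae (8)-(9): sigmoid probability, then binarization.
                sig = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 0 if rng.random() >= sig else 1
            # Steps 2-3: evaluate the particle and update the recorded optima.
            fit = fitness(pos[i])
            if fit < p_fit[i]:
                p_best[i], p_fit[i] = pos[i][:], fit
                if fit < g_fit:
                    g_best, g_fit = pos[i][:], fit
    # Step 5: the best position in the population history is the operator.
    return g_best, g_fit
```

The returned mask plays the role of the feature optimization conversion operator: a 1 in position j means that the j-th screened feature is kept.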

3. Results

After feature optimization, the effect of the optimization must be evaluated by quantitative methods. In this paper, we evaluate the effect of feature optimization on the performance of the classifier.

First, we obtain the feature optimization operator from the features of the training exhalation samples and then apply the feature optimization to the test samples. The specific steps are as follows.

First, the two types of collected samples, 121 breath signals in total from healthy controls and hepatocellular carcinoma patients, were divided into a training set and a test set after multidimensional feature extraction. Then, the training set was used to determine the feature optimization operator. The specific method was as follows. A tag array was constructed using tag value 1 (representing hepatocellular carcinoma patients) and tag value 0 (representing healthy controls). Taking the tag array as the dependent variable y and the high-dimensional sample feature array as the variable x, the relationships between sample features and categories were calculated by Pearson correlation analysis. The sample features were then sorted by the absolute values of their Pearson correlation coefficients, and the top 1000 features were retained. Furthermore, the fitness function was constructed from the KNN classification error rate and the feature dimension, and the optimal subset was obtained based on BPSO, which also yielded the feature optimization conversion factor. Afterwards, feature optimization was performed on the test set using the feature optimization operator derived in the above steps.

Figure 3: Flow diagram of the improved Pearson-BPSO feature optimization algorithm. Box labels: valid signal extraction; signal normalization; high-dimensional feature extraction; initial screening of features based on Pearson correlation coefficients and determination of the screening factor P; construction of the fitness function; update of individual and global optimal values and positions; number of iterations < 100; determination of the feature optimization conversion factors. Stage labels: Pearson; BPSO.
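The key point of the procedure above is that the operator is derived from the training set only and is then applied unchanged to the test set. A minimal sketch of the application step follows, assuming the operator is stored as the list of feature indices kept by the Pearson screening together with the binary mask found by BPSO; the function name `apply_operator` and both argument names are hypothetical.

```python
def apply_operator(samples, screened_idx, mask):
    """Keep, for every sample, the screened features whose mask bit is 1."""
    # screened_idx: column indices retained by the Pearson screening (training set).
    # mask: binary position found by BPSO, one bit per screened feature.
    selected = [j for j, bit in zip(screened_idx, mask) if bit == 1]
    return [[row[j] for j in selected] for row in samples]
```

Because `screened_idx` and `mask` are fixed after training, calling the same function on the training and the test samples yields feature vectors with identical columns.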

Once the feature optimization was completed, the next step was to build the classifier.

Two different classifiers were constructed to obtain a more reliable evaluation. One was built on the support vector machine (SVM classifier), and the other was built on the random forest method (RF classifier).

Here, we applied the two classifiers to the optimized features produced by three different optimization methods. By comparing the performance of the classifiers, we found that Pearson-BPSO is more effective for classification than the other two traditional feature optimization methods, PCA and BPSO.

3.1. Performance Comparison of Pearson-BPSO and BPSO. To compare the feature optimization effect of the improved Pearson-BPSO with that of the traditional BPSO, the search for the optimal subset of features and the determination of the feature transformation factor were performed with each of the two algorithms.

Figure 4 shows the adaptation curves of the two BPSO algorithms over 100 iterations. The horizontal coordinate in the figure represents the number of iterations, with a maximum setting of 100; the vertical coordinate is the fitness value, and a smaller value means better optimization performance. Figure 4(a) shows the adaptation curve of the improved Pearson-BPSO algorithm: based on the optimized fitness, the feature dimension could be reduced to 251, and the adaptation value could fall below 0.045. Figure 4(b) shows the adaptation curve of the traditional BPSO algorithm, with an optimized feature dimension of 712 and an optimal adaptation value of about 0.08.
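As a rough consistency check on these values (an illustrative calculation, not part of the original analysis): because error(i) cannot be negative, formula (3) implies fitness(i) ≥ k2 × Weidu(i)/D, and hence D ≥ k2 × Weidu(i)/fitness(i). The helper name `min_original_dimension` is hypothetical, and k2 = 0.2 is the weight stated in Step 2.

```python
def min_original_dimension(fitness_value, n_selected, k2=0.2):
    """Smallest D consistent with formula (3) when error(i) >= 0."""
    # fitness >= k2 * n_selected / D  rearranges to  D >= k2 * n_selected / fitness.
    return k2 * n_selected / fitness_value

# Approximate values read from the description of Figure 4.
bound_bpso = min_original_dimension(0.08, 712)      # traditional BPSO
bound_pearson = min_original_dimension(0.045, 251)  # Pearson-BPSO
```

Under these assumptions, the BPSO values would imply an original dimension D of at least about 1780, and the Pearson-BPSO values a D of more than about 1115. This suggests that D denotes the feature dimension before the Pearson screening; since the fitness values are approximate, the bounds are estimates only.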

3.2. Classification Performance. The performance of the feature mapping optimization algorithm can be reflected by the performance of the classifier. Here, we calculated the performance of the SVM classifier and the RF classifier to compare the optimization algorithms.

After feature extraction from the original samples, the positive and negative samples were each divided into 10 folds and combined into 10 groups of sample data. Each time, one fold of the data (about 7 liver cancer cases and 5 controls) was taken as the test sample, and the rest of the samples were taken as the training samples. Then, the feature transformation factors calculated by the improved feature optimization algorithm Pearson-BPSO, the traditional BPSO, and the PCA optimization algorithm were used to optimize and reduce the feature dimension of the training samples and test samples, respectively, yielding different optimized feature datasets. Furthermore, classifiers were constructed based on SVM and RF, respectively, and the classification performance was calculated. The process was repeated ten times, with a different fold taken as the test sample in turn and the classification performance calculated separately each time. Finally, the average of each performance measure was taken as the performance metric of the two classifiers under the three feature optimization algorithms, as shown in Tables 2 and 3.

From Table 2, we found that the best accuracy was 86.03% and the best sensitivity was 90.79% when the Pearson-BPSO feature optimization was applied. From Table 3, we found that the best accuracy was 90% and the best sensitivity was 94.83% when the Pearson-BPSO feature optimization was applied.

The performance indicators in Tables 2 and 3 are as follows. Acc measures the accuracy of the classifier in correctly classifying samples. Sens represents the sensitivity of the classifier in recognizing hepatocellular carcinoma samples. Spec is the specificity of the classifier in recognizing normal samples. F-score represents the comprehensive performance of the classifier; the higher the F-score, the better the performance of the classifier.
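These indicators can be computed from the confusion counts of one test fold, as sketched below. The paper does not state which F-score definition was used; the sketch assumes the common F1 score (harmonic mean of precision and sensitivity). The function name `classification_metrics` is hypothetical, and the function expects at least one sample of each class and at least one positive prediction.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity, specificity, and F1 from confusion counts."""
    acc = (tp + tn) / (tp + fn + tn + fp)
    sens = tp / (tp + fn)        # fraction of HCC samples recognized
    spec = tn / (tn + fp)        # fraction of control samples recognized
    precision = tp / (tp + fp)
    # Assumption: F-score taken as F1, the harmonic mean of precision and sensitivity.
    f_score = 2 * precision * sens / (precision + sens)
    return acc, sens, spec, f_score
```

For a hypothetical fold of 7 patients and 5 controls with one error in each class, `classification_metrics(tp=6, fn=1, tn=4, fp=1)` gives an accuracy of 10/12, a sensitivity of 6/7, and a specificity of 4/5; averaging such per-fold values over the ten folds would give figures of the kind reported in Tables 2 and 3.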

4. Discussion

According to the mechanism of breath testing, pathological changes alter the metabolism of hepatocellular carcinoma patients, and the composition of their exhaled gas changes accordingly. Therefore, classifying and recognizing the exhalation data of patients with hepatocellular carcinoma and healthy people is the most important task in the intelligent detection of hepatocellular carcinoma. In this study, we identified hepatocellular carcinoma by constructing a two-class model to distinguish the breath signals of hepatocellular carcinoma patients from those of healthy individuals.

The findings from the reviewed studies were consistent with other studies showing that volatile breath biomarkers can discriminate persons with malignant solid tumors from noncancer control subjects [17]. However, there is no clear conclusion on the types of volatile marker gases for hepatocellular carcinoma. The present study relies on the fact that the collection device used can respond to a large number of volatile exhaled gases, possibly including exhaled gases specific to hepatocellular carcinoma [18]. We do not need to know the specific type of gas; we only need to record the overall response to the exhaled gases containing some specific gases. Here, we attempted to construct a two-class classifier using the different characteristics of the integrated response curves of exhaled gas in healthy individuals and hepatocellular carcinoma patients. However, to improve the validity of the exhaled gas response detection, we need to use GC-MS to further compare and analyze exhaled gas from patients and healthy individuals and to determine the specific gas species. A highly sensitive eNose specifically designed to detect particular diseases can then be designed.

In addition, the devices used in the study have not yet been applied in the clinic, and there are no clear international standards for the manner and criteria of exhaled gas collection. The amount and sources of clinical data used in the data analysis are relatively limited, so some advanced intelligent algorithms [19] that rely on big data, such as deep learning, cannot be utilized. Hence, the establishment of a breath database is an essential step in advancing the clinical application of eNose research, and it depends on the establishment of unified international standards for breath collection. The collection criteria include the type of gas collected and the collection method, as well as more comprehensive information on the patient, such as age, gender, diet, and even race.

The study is still in the exploratory stage: the amount of data collected is limited, and the results of the clinical analysis may be one-sided. Owing to the limited sample size, traditional machine learning algorithms were used for classification, i.e., signal preprocessing, feature extraction, and classification model construction, to distinguish hepatocellular carcinoma patients from healthy individuals. In the specific work, because the human exhalation signal collected by the eNose device varies greatly between individuals and, for the same individual, between moments, the data of hepatocellular carcinoma patients cannot be distinguished visually and effectively from those of healthy people; the signal is therefore first subjected to feature extraction. Extracting the features of the signal curves helps to uncover more potential information. However, high-dimensional features may degrade classification accuracy and slow down computation, so the optimization of features has become a hot research topic.


According to formulae (4) to (6), the iterative update value of the velocity can be further calculated by the following formula:

$$v(i+1) = w \times v(i) + c_1 \times (p(i) - x(i)) + c_2 \times (g - x(i)) \tag{7}$$

Here, v(i + 1) is the updated velocity value, w is the dynamic inertia weight, v(i) is the previous velocity value, x(i) is the current position, p(i) is the individual optimal position, and g is the global optimal position.

Step 4. Multiple iterations are performed, and the particle positions are updated and binarized to (0, 1) according to the velocity, using formulae (8) and (9):

$$\mathrm{sig}(v(i)) = \frac{1}{1 + e^{-v(i)}} \tag{8}$$

$$x_{ij}(\mathrm{iter}+1) = \begin{cases} 0, & \text{if } \mathrm{rand} \ge \mathrm{sig}\left(v_{ij}(\mathrm{iter}+1)\right) \\ 1, & \text{otherwise} \end{cases} \tag{9}$$

In formula (8), a sigmoid function is used to map the velocity to the interval [0, 1] as a probability; this is the probability that the particle will take the value 1 in the next iteration.

Also, $x_{ij}(\mathrm{iter}+1)$ in formula (9) is the absolute probability of a change in position.

Step 5. Determine whether the maximum number of iterations has been reached. If the number of iterations has reached TNum, the optimized feature subset is obtained from the best position in the population history, and this best position record is used as the feature optimization conversion operator; otherwise, return to Step 3.
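The update in formulae (7) to (9) can be illustrated with a minimal Python sketch for a single particle. It is not the authors' implementation: the function names, the velocity clamp `v_max` (added only to keep `math.exp` from overflowing), and the way `w`, `c1`, and `c2` are passed in are assumptions; in the paper these quantities come from formulae (4) to (6).

```python
import math
import random


def sigmoid(v):
    """Formula (8): map a velocity to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def update_particle(x, v, p_best, g_best, w, c1, c2, rng, v_max=6.0):
    """One binary-PSO step for a single particle.

    x, v      : current bit vector and velocity vector
    p_best    : best position found by this particle
    g_best    : best position found by the whole swarm
    w, c1, c2 : inertia weight and acceleration terms, supplied by the caller
    v_max     : hypothetical clamp on the velocity, not taken from the paper
    """
    new_x, new_v = [], []
    for xj, vj, pj, gj in zip(x, v, p_best, g_best):
        # Formula (7): velocity update.
        vj_next = w * vj + c1 * (pj - xj) + c2 * (gj - xj)
        vj_next = max(-v_max, min(v_max, vj_next))
        # Formula (9): the bit becomes 0 when rand >= sig(v), otherwise 1.
        bit = 0 if rng.random() >= sigmoid(vj_next) else 1
        new_v.append(vj_next)
        new_x.append(bit)
    return new_x, new_v


if __name__ == "__main__":
    rng = random.Random(0)
    bits, vel = update_particle([0, 1, 0, 1], [0.0] * 4,
                                [1, 1, 0, 0], [1, 0, 0, 1],
                                w=0.8, c1=2.0, c2=2.0, rng=rng)
    print(bits, vel)
```

A bit set to 1 means the corresponding feature is kept, so the best bit vector in the swarm history acts as the feature selection mask.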

3. Results

After feature optimization, it is necessary to evaluate the effect of the optimization quantitatively. In this paper, we evaluate the effect of feature optimization on the performance of the classifier.

First, we obtain the feature optimization operator from the features of the training exhalation samples and then apply the feature optimization to the test samples. The specific steps are as follows.

Figure 3: Flow diagram of the improved Pearson-BPSO feature optimization algorithm (blocks: valid signal extraction; signal normalization; high-dimensional feature extraction; initial screening of features based on Pearson correlation coefficients and determination of the screening factor P; construction of the fitness function; update of individual and global optimal values and positions; check "number of iterations < 100"; determination of the feature optimization conversion factors).

First, the two types of collected samples, totaling 121 breath signals from healthy controls and hepatocellular carcinoma patients, were divided into a training set and a test set after multidimensional feature extraction. Then, the training set was used to determine the feature optimization operator. The specific method was as follows: a tag array was constructed using tag value 1 (representing hepatocellular carcinoma patients) and tag value 0 (representing healthy controls). Taking the tag array as the dependent variable y and the high-dimensional sample feature array as the variable x, the relationships between sample features and categories were calculated by Pearson correlation analysis. The sample features were then sorted by the absolute values of their Pearson correlation coefficients, and the top 1000 features were retained. Furthermore, the fitness function was constructed from the KNN classification error rate and the feature dimension, and the optimal subset was obtained based on BPSO. At the same time, the feature optimization conversion factor was obtained. Afterwards, feature optimization was performed on the test set using the feature optimization operator derived in the above steps.
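The Pearson pre-screening described here (rank features by the absolute correlation with the 0/1 tag array and keep the top ones) can be sketched as follows. This is an illustrative sketch only: the function names `pearson_r` and `pearson_screen` are hypothetical, and treating a constant feature as having zero correlation is an assumption. The paper keeps the top 1000 features; the toy call keeps one.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a constant column counts as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def pearson_screen(features, labels, keep):
    """Return the indices of the `keep` features with the largest |r|.

    features : list of samples, each a list of feature values
    labels   : tag array (1 = patient, 0 = healthy control)
    """
    n_features = len(features[0])
    scores = [abs(pearson_r([row[j] for row in features], labels))
              for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])


if __name__ == "__main__":
    toy_features = [[1, 5, 0], [2, 5, 1], [3, 5, 0], [4, 5, 1]]
    toy_labels = [0, 0, 1, 1]
    print(pearson_screen(toy_features, toy_labels, keep=1))
```

The returned index list must be computed on the training set only and then reused unchanged on the test set, which matches the role of the "feature optimization operator" in the text.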

Once the feature optimization was completed, the next step was to build the classifier.

Two different classifiers were constructed to obtain a more reliable evaluation. One was built on the support vector machine (SVM classifier), and the other was built on the random forest method (RF classifier).

Here, we applied the two classifiers to the features optimized by three different optimization methods. By comparing the performance of the classifiers, we found that Pearson-BPSO is more effective for classification than the other two traditional feature optimization methods, PCA and BPSO.

3.1. Performance Comparison of Pearson-BPSO and BPSO. To compare the feature optimization effect of the improved Pearson-BPSO and the traditional BPSO, the search for the optimal feature subset and the determination of the feature transformation factor were performed with each of the two algorithms.

Figure 4 shows the adaptation curves of the two BPSO algorithms over 100 iterations. The horizontal axis represents the number of iterations, with a maximum setting of 100; the vertical axis is the fitness value, where a smaller value means better optimization performance. Figure 4(a) shows the adaptation curve of the improved Pearson-BPSO algorithm: after optimization, the feature dimension was reduced to 251 and the fitness value fell below 0.045. Figure 4(b) shows the adaptation curve of the traditional BPSO algorithm, with an optimized feature dimension of 712 and an optimal fitness value of about 0.08.
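The fitness value plotted in these curves combines the KNN classification error with the feature dimension, and smaller is better. The exact weighting is not given in this section, so the sketch below is an assumption: it uses a weighted sum with a hypothetical weight `alpha`, estimates the KNN error by leave-one-out on the training samples, and assigns the worst value 1.0 to an empty feature subset.

```python
import math


def knn_predict(train_x, train_y, query, k):
    """Majority vote (labels 0/1) among the k nearest training samples."""
    nearest = sorted((math.dist(row, query), label)
                     for row, label in zip(train_x, train_y))[:k]
    votes_for_one = sum(label for _, label in nearest)
    return 1 if 2 * votes_for_one > k else 0


def fitness(mask, features, labels, k=3, alpha=0.9):
    """Weighted sum of KNN error rate and relative feature dimension.

    mask  : BPSO position, one bit per feature (1 = feature kept)
    alpha : hypothetical trade-off weight, not taken from the paper
    """
    selected = [j for j, bit in enumerate(mask) if bit == 1]
    if not selected:
        return 1.0  # assumption: an empty subset gets the worst fitness
    reduced = [[row[j] for j in selected] for row in features]
    errors = 0
    for i, (row, label) in enumerate(zip(reduced, labels)):
        # Leave-one-out: classify sample i using all other samples.
        rest_x = reduced[:i] + reduced[i + 1:]
        rest_y = labels[:i] + labels[i + 1:]
        if knn_predict(rest_x, rest_y, row, k) != label:
            errors += 1
    error_rate = errors / len(labels)
    return alpha * error_rate + (1 - alpha) * len(selected) / len(mask)


if __name__ == "__main__":
    toy_x = [[0.0, 5.0], [0.1, 1.0], [0.2, 4.0],
             [1.0, 2.0], [1.1, 5.5], [1.2, 0.5]]
    toy_y = [0, 0, 0, 1, 1, 1]
    print(fitness([1, 0], toy_x, toy_y), fitness([1, 1], toy_x, toy_y))
```

With this form, a subset that keeps fewer features at the same error rate always scores lower, which is the behavior the curves in Figure 4 reward.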

3.2. Classification Performance. The performance of the feature optimization algorithm is reflected in the performance of the classifier. Here, we calculated the performance of the SVM and RF classifiers to compare the optimization algorithms.

After feature extraction from the original samples, the positive and negative samples were each divided into 10 folds and combined into 10 groups of sample data. One fold of the data (about 7 hepatocellular carcinoma cases and 5 control cases) was taken as the test sample each time, and the rest of the samples were taken as the training samples. Then, the feature transformation factors calculated by the improved feature optimization algorithm Pearson-BPSO, the traditional BPSO, and the PCA optimization algorithm were used to optimize and reduce the feature dimension of the training and test samples, yielding different optimized feature datasets. Furthermore, classifiers were constructed based on SVM and RF, respectively, and the classification performance was calculated each time. The process was repeated ten times, with a different fold taken as the test sample in turn. Finally, the average of each performance measure was taken as the performance metric of the two classifiers under the three feature optimization algorithms, as shown in Tables 2 and 3.

From Table 2, we found that the best accuracy was 86.03% and the best sensitivity was 90.79% when the Pearson-BPSO feature optimization was applied. From Table 3, we found that the best accuracy was 90% and the best sensitivity was 94.83% when the Pearson-BPSO feature optimization was applied.

The performance indicators in Tables 2 and 3 are as follows. Acc measures the accuracy of the classifier in correctly classifying samples. Sens is the sensitivity of the classifier in recognizing hepatocellular carcinoma samples. Spec is the specificity of the classifier in recognizing normal samples. F-score represents the comprehensive performance of the classifier; the higher the F-score, the better the performance of the classifier.
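As a worked illustration of these four indicators, the sketch below computes them from true and predicted labels. The paper does not state its F-score formula, so the usual F1 definition (harmonic mean of precision and sensitivity) is assumed here; the function name `binary_metrics` and the toy labels are hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Acc, Sens, Spec and F-score for labels 1 (patient) / 0 (control)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    sens = tp / (tp + fn) if tp + fn else 0.0   # recall on patients
    spec = tn / (tn + fp) if tn + fp else 0.0   # recall on controls
    prec = tp / (tp + fp) if tp + fp else 0.0
    # Assumed definition: F1 = harmonic mean of precision and sensitivity.
    f_score = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
    return {"acc": acc, "sens": sens, "spec": spec, "f_score": f_score}


if __name__ == "__main__":
    truth = [1, 1, 1, 1, 0, 0, 0, 0]
    guess = [1, 1, 1, 0, 0, 0, 1, 1]
    print(binary_metrics(truth, guess))
```

In a tenfold evaluation, these values would be computed once per fold and then averaged.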

4. Discussion

According to the mechanism of breath testing, the metabolism of hepatocellular carcinoma patients changes for pathological reasons, and the composition of their exhaled gas changes as well. Therefore, classifying and recognizing the exhalation data of patients with hepatocellular carcinoma and of healthy people is the most important task in the intelligent detection of hepatocellular carcinoma. In this study, we identified hepatocellular carcinoma by constructing a binary classification model that distinguishes the breath signals of hepatocellular carcinoma patients from those of healthy individuals.

The findings from the reviewed studies were consistent with other studies showing that volatile breath biomarkers can discriminate persons with malignant solid tumors from noncancer control subjects [17]. However, there is no clear conclusion on the types of volatile marker gases for hepatocellular carcinoma. The present study is based on the fact that the collection device used can respond to a large number of volatile exhaled gases, possibly including gases specific to hepatocellular carcinoma [18]. We do not need to know the specific type of gas; we only need to record the overall response to the exhaled gases that contain some specific gases. Here, we attempted to construct a binary classifier using the different characteristics of the integrated response curves of exhaled gas in healthy individuals and hepatocellular carcinoma patients. However, to improve the validity of the exhaled gas response detection, we need to use GC-MS to further compare and analyze the exhaled gas of patients and healthy individuals and to determine the specific gas species. A highly sensitive eNose specifically designed to detect particular diseases can then be designed.

In addition, the devices used in the study have not yet been applied in the clinic, and there are no clear international standards for the manner and criteria of exhaled gas collection. The amount and sources of clinical data used in the data analysis are relatively limited. Some advanced intelligent algorithms [19], such as deep learning, which rely on big data, cannot be used. Hence, establishing a breath database is an essential step toward advancing the clinical application of eNose research. It depends on the establishment of unified international standards for breath collection. The collection criteria include the type of gas collected and the collection method; the patient's age, gender, dietary status, and even race, as well as other more comprehensive information, also need to be collected.

The study is still in the exploratory stage, the amount of data collected is limited, and the results of the clinical analysis may be one-sided. Because of the limited sample size, traditional machine learning algorithms were used for classification, i.e., signal preprocessing, feature extraction, and classification model construction, to distinguish hepatocellular carcinoma patients from healthy individuals. In the specific work, because the human exhalation signal collected by the eNose device shows large variability between individuals and within an individual at different moments, the data of hepatocellular carcinoma patients cannot be visually and effectively distinguished from the data of healthy individuals, so the signal is first subjected to feature extraction. Extracting the features of the signal curves helps discover more potential information. However, high-dimensional features may degrade classification accuracy and slow down computation, so feature optimization has become a hot research topic.

One way to measure whether a feature optimization is more effective is to feed the same features, optimized by different algorithms, into the same classifier and test the classification performance. In this study, tenfold cross-validation was used in the data analysis, taking into account that the performance of the constructed classifiers varies when different training samples are used; the average performance was used as the final measure. In addition, because samples are selected randomly when they are divided, an imbalance between positive and negative samples in the training and testing sets may occur. To keep the sample composition consistent, a stratified method is used: the positive and negative samples are each divided into ten folds separately, and the divided data are then combined into training and testing samples.
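The stratified tenfold split described above can be sketched as follows: positives and negatives are shuffled and dealt into ten folds separately, and each fold then serves once as the test set. The function names and the fixed `seed` are hypothetical, and the 70/51 class composition in the demo is only an illustrative guess consistent with 121 samples and roughly 7 patients and 5 controls per fold; it is not a figure stated in the paper.

```python
import random


def stratified_folds(labels, n_folds=10, seed=0):
    """Deal the indices of each class into n_folds folds separately."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in (0, 1):
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        # Round-robin assignment keeps the class ratio similar in every fold.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


def train_test_splits(labels, n_folds=10, seed=0):
    """Yield (train_indices, test_indices), using each fold once as test set."""
    folds = stratified_folds(labels, n_folds, seed)
    for k, test_idx in enumerate(folds):
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield sorted(train_idx), sorted(test_idx)


if __name__ == "__main__":
    demo_labels = [1] * 70 + [0] * 51  # hypothetical class composition
    for train, test in train_test_splits(demo_labels):
        n_pos = sum(demo_labels[i] for i in test)
        print(len(train), len(test), n_pos)
```

Feature screening and BPSO would be rerun on the training indices of each split, so that no information from the test fold leaks into the selected feature subset.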

In addition, to evaluate the generalization ability of the optimization algorithm, the KNN classifier used inside the algorithm to optimize the fitness was avoided when selecting the evaluation classifiers; the SVM and RF classifiers were chosen instead. From the tables, we found that the classifiers applying the improved optimization algorithm, Pearson-BPSO, all outperformed those applying the other optimization algorithms.

Figure 4: Adaptation curves (fitness value versus number of iterations, 0 to 100). (a) Pearson-BPSO. (b) BPSO.

Table 2: Performance of the SVM classifier based on three different feature optimization algorithms.

| Algorithm | Acc (%) | Sens (%) | Spec (%) | F-score |
|---|---|---|---|---|
| Pearson-BPSO | 86.03 | 90.79 | 78.64 | 0.878 |
| BPSO | 77.69 | 87.98 | 65.38 | 0.811 |
| PCA | 75.19 | 86.63 | 73.56 | 0.795 |

Table 3: Performance of the RF classifier based on three different feature optimization algorithms.

| Algorithm | Acc (%) | Sens (%) | Spec (%) | F-score |
|---|---|---|---|---|
| Pearson-BPSO | 90 | 94.83 | 85.17 | 0.841 |
| BPSO | 84.42 | 87.80 | 81.40 | 0.800 |
| PCA | 82.49 | 90.90 | 76.33 | 0.832 |

Although there is still much work to be done in this study, we can draw the following conclusions from the experimental results. First, it is meaningful and feasible to use the eNose device to distinguish hepatocellular carcinoma patients from healthy controls. However, there are still many difficulties to overcome in the clinical application of eNose. Second, the improved feature optimization algorithm does improve the detection performance to some extent, as shown by the results of the two different classifiers averaged over the ten folds. From Tables 2 and 3, both classifiers applying the improved optimization algorithm, Pearson-BPSO, outperformed those applying the other optimization algorithms.

5. Conclusion

Different gases have different response curves, which lead to different sensor measurement signals and different exhalation signals. After the human exhalation signal is collected through an electronic nose, it is difficult to distinguish the data of hepatocellular carcinoma patients from the data of healthy individuals intuitively and effectively because of the large differences between individuals and at different times. Extracting waveform features helps find more potential information. However, high-dimensional features may reduce classification accuracy and slow down calculation, so feature optimization has become a research hotspot.

In this paper, an improved feature optimization algorithm, Pearson-BPSO, is proposed based on binary particle swarm optimization (BPSO) for the “two-classification” task of distinguishing hepatocellular carcinoma patients from healthy people by breath. Based on the Pearson coefficient of the relationship between each quantifiable feature and the label, the algorithm preliminarily screens the features and then optimizes the feature set to minimize the KNN classification error rate and the feature dimension, which improves the classification accuracy and reduces the amount of data. Compared with the traditional BPSO algorithm and the PCA algorithm, this algorithm improves the classification performance to a certain extent and helps improve the classification accuracy and detection speed of electronic nose detection.

6. Future Work

In the next step, we can further analyze the correlation between the features and combine it with this method to search for the optimal subset in a more directed way and improve the accuracy of the classifier. We will also apply more advanced algorithms and continue to refine the improved feature optimization method, Pearson-BPSO, to achieve more stable and better classification results [20].

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was financially supported by the Key Program of National Natural Science Foundation of China (Grant no. 81830052) and Shanghai Municipal Education Commission (Class II Plateau Disciplinary Construction Program of Medical Technology of SUMHS, 2018–2020). The data collection work was completed by the team from Shanghai Jiao Tong University and Shanghai University of Technology. The electronic nose department was supported by the German UST Company.

References

[1] F. Bray, J. Ferlay, I. Soerjomataram, R. L. Siegel, L. A. Torre, and A. Jemal, “Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries,” CA: A Cancer Journal for Clinicians, vol. 68, no. 6, pp. 394–424, 2018.

[2] L. James and Global Burden of Disease Cancer Collaboration, “Global, regional, and national cancer incidence, mortality, years of life lost, years lived with disability, and disability-adjusted life-years for 29 cancer groups, 1990 to a systematic analysis for the global burden of disease study,” Journal JAMA Oncology, vol. 4, no. 11, pp. 1553–1568, 2018.

[3] A. S. Bannaga, H. Tyagi, E. Daulton, J. A. Covington, and R. P. Arasaradnam, “Exploratory study using urinary volatile organic compounds for the detection of hepatocellular carcinoma,” Molecules, vol. 26, no. 9, p. 2447, 2021.

[4] M.-A. Galen, A.-M. Lou-Anne, G. David et al., “Breath metabolomics provides an accurate and noninvasive approach for screening cirrhosis, primary, and secondary liver tumor,” Journal of Hepatology Communications, vol. 4, no. 7, pp. 1041–1055, 2020.

[5] X. Yu, D. Zhan, L. Liu, H. Lv, L. Xu, and J. Du, “A privacy-preserving cross-domain healthcare wearables recommendation algorithm based on domain-dependent and domain-independent feature fusion,” IEEE Journal of Biomedical and Health Informatics, vol. 9, pp. 89344–89359, 2021.

[6] C. D. Tong and X. H. Shi, “Mutual information based on PCA algorithm with application in process monitoring,” Journal of CIESC, vol. 66, no. 10, pp. 4101–4106, 2015.

[7] Y. J. Lu and D. Y. Li, “Feature selection algorithm based on label correlation,” Pattern Recognition and Artificial Intelligence, vol. 33, no. 8, pp. 716–722, 2020.

[8] B. Liu, M. D. Cai, and Y. C. Bo, “A feature extraction and classification algorithm based on PSO-CSP-SVM for motor imagery EEG signals,” Journal of Central South University (Science and Technology), vol. 51, no. 10, pp. 2855–2866, 2020.

[9] L. Ma, C. W. Lu, Q. H. Gu, and S. L. Ruan, “Particle swarm optimization with search operator of improved pigeon-inspired algorithm,” Journal of Pattern Recognition and Artificial Intelligence, vol. 31, no. 10, pp. 909–920, 2018.

[10] J. Lin, L. Xu, and L. Liu, “Feature selection method based on SVM-RFE and particle swarm optimization,” Journal of Chinese Computer Systems, vol. 36, no. 8, pp. 1865–1868, 2015.

[11] C. G. Waltman, T. A. T. Marcelissen, and J. G. H. van Roermund, “Exhaled-breath testing for prostate cancer based on volatile organic compound profiling using an electronic nose device (Aeonose): a preliminary report,” European Urology Focus, vol. 6, no. 6, pp. 1220–1225, 2020.

[12] L. J. Hao and G. Huang, “Research on non-invasive detection method of liver cancer by respiratory based on electronic nose,” Journal of Transducer and Microsystem Technologies, vol. 39, no. 4, pp. 46–48, 2020.

[13] V. Andreas, W. Katharina, K. Tobias et al., “Detecting cannabis use on the human skin surface via an electronic nose system,” Journal of Sensors, vol. 14, pp. 13256–13272, 2014.

[14] J. Y. Xie, Z. Z. Wu, and Q. Q. Zheng, “An adaptive 2D feature selection algorithm based on information gain and Pearson correlation coefficient,” Journal of Shaanxi Normal University (Natural Science Edition), vol. 48, no. 6, pp. 69–81, 2020.

[15] Z. B. Zhu, J. F. Tao, and H. L. Ge, “Passive sonar target classification and recognition technique based on BPSO-KNN algorithm,” Journal of Technical Acoustics, vol. 38, no. 2, pp. 219–223, 2019.

[16] J. D. Han, L. Sun, and S. L. Wang, “Android malware application detection method based on BPSO-NB,” Journal of Computers and Modernization, vol. 4, pp. 109–113, 2017.

[17] B. Swanson, L. Fogg, W. Julion, and M. T. Arrieta, “Electronic nose analysis of exhaled breath volatiles to identify lung cancer cases: a systematic review,” Journal of the Association of Nurses in AIDS Care, vol. 31, no. 1, pp. 71–79, 2020.

[18] G. Danila, C. Sare, D. A. Mario, R. Veronica, S. Antonio, and B. Maurizia, “An E-nose for the monitoring of severe liver impairment: a preliminary study,” Journal of Sensor, vol. 19, no. 17, Article ID 3656, 2019.

[19] M. Yu, T. Quan, Q. Peng, X. Yu, and L. Liu, “A model-based collaborate filtering algorithm based on stacked auto encoder,” Neural Computing & Applications, vol. 290, 2021.

[20] W. Hamdy, I. Elansary, A. Darwish, and A. E. Hassanien, “An optimized classification model for COVID-19 pandemic based on convolutional neural networks and particle swarm optimization algorithm,” Journal of Digital Transformation and Emerging, vol. 3, 2021.

Mobile Information Systems 9

was used to determine the feature optimization operator +especific method was as follows using tag value 1 (representinghepatocellular carcinoma patients) and tag value 0 (repre-senting healthy controls) to construct a tag array Taking it asthe dependent variable y and the high-dimensional samplefeature array as the variable x the interrelationships betweensample features and categories were calculated by Pearsoncorrelation analysis +us the sample feature groups weresorted by the absolute values of Pearson correlation coefficientsand the top 1000-dimensional features were retained Fur-thermore the fitness function was constructed by KNN clas-sification error rate and feature dimension and the optimalsubset was achieved based on BPSO And meanwhile thefeature optimization conversion factor was obtained After-wards feature optimization was performed on the test set usingthe feature optimization operator derived in the above steps

Once the feature optimization was completed the nextstep was to build the classifier

Two different classifiers were constructed to obtain amore respectable evaluation One was a classifier built basedon the support vector mechanism (SVM classifier) and theother was a classifier built based on the random forestmethod (RF classifier)

Here we applied two different classifiers to classify anddetect the optimized features processed by three variousoptimizationmethods By comparing the performance of theclassifiers we found that the Pearson-BPSO is more effectivein classification compared to the other two traditionalfeature optimization methods PCA and BPSO

31 Performance Comparison of Pearson-BPSO and BPSOTo compare the feature optimization effect of the improvedPearson-BPSO and the traditional BPSO the search for theoptimal subset of features and the determination of thefeature transformation factor were performed based on theabove two algorithms respectively

Figure 4 shows the adaptation curves of the two BPSOalgorithms in 100 iterations +e horizontal coordinate in thefigure represents the number of iterations and the maximumsetting is 100 the vertical coordinate is the fitness value and thesmaller value means better optimization performance Amongthem Figure 4(a) shows the adaptation curve of the improvedPearson-BPSO algorithm Based on the optimized fitnessfeature dimension could be reduced to 251 and the adaptationvalue could be lower than 0045 Figure 4(b) shows the ad-aptation curve of the traditional BPSO algorithm with theoptimized feature dimension of 712 and the optimal adaptationvalue of about 008

32 Classification Performance +e performance of thefeature mapping optimization algorithm can be reflected bythe performance of the classifier Here we calculated theperformance of the SVM classifier and RF classifier tocompare the performance of the optimization algorithms

After the feature extraction of the original samples thepositive and negative samples were divided into 10-fold andcombined into 10 groups of sample data Onefold of the data(about 7 cases of liver and 5 cases of control) was taken as the

test sample each time and the rest of the samples were takenas the training samples +en the feature transformationfactors calculated by the improved feature optimizationalgorithm Pearson-BPSO the traditional BPSO and thePCA optimization algorithm were used to optimize andreduce the feature dimension of the training samples and testsamples respectively to obtain different optimized featuredatasets Furthermore the classifiers were constructed basedon SVM and RF respectively and the classification per-formance was calculated for each time +e process wasrepeated ten times and different onefold data was taken astest samples in turn and the classification performance wascalculated separately for each time Finally the average ofeach performance was obtained as the performance metricsof the two classifiers under the three different feature op-timization algorithms as shown in Tables 2 and 3

From Table 2 we found that the best accuracy was 8603and the best sensitivity was 9079 when the Pearson-BPSOfeature optimization was applied From Table 3 we found thatthe best accuracy was 90 and the best sensitivity was 9483when the Pearson-BPSO feature optimization was applied

+e performance indicators in Table 2 and 3 include thefollowing Acc is used to measure the accuracy of theclassifier in correctly classifying samples Sens represents thesensitivity of the classifier in recognizing hepatocellularcarcinoma samples Spec is the specificity of the classifier inrecognizing normal samples F-score represents the com-prehensive performance of the classifier and the higher theF-score value the better the performance of the classifier

4 Discussion

According to the mechanism of breath testing due to path-ological reasons the metabolism of hepatocellular carcinomapatients will change and the composition of exhaled gas willalso change +erefore classification and recognition of ex-halation data of patients with hepatocellular carcinoma andhealthy people were the most important work of intelligentdetection of hepatocellular carcinoma In the study we dis-tinguished hepatocellular carcinoma by constructing a di-chotomous model to distinguish the breath signals ofhepatocellular carcinoma patients and healthy individuals

+e findings from the reviewed studies were consistentwith other studies that have shown that volatile breathbiomarkers can discriminate persons with malignant solidtumors from noncancer control subjects [17] Howeverthere is no clear conclusion on the types of volatile markergases for hepatocellular carcinoma +e present study isbased on the fact that the collection device used can respondto a large number of volatile exhaled gases including pos-sible hepatocellular carcinoma specific exhaled gases amongthem [18] We do not need to know the specific type of gaswe just need to record the overall response of the exhaledgases containing some specific gases Here we attempted toconstruct a dichotomous classifier using the differentcharacteristics of the integrated response curves of exhaledgas in healthy individuals and hepatocellular carcinomapatients However to improve the validity of the exhaled gasresponse detection we need to use GC-MS to further

6 Mobile Information Systems

compare and analyze exhaled gas from patients and healthyindividuals and to determine the specific gas species+us ahighly sensitive eNose specifically designed to detect somediseases then can be designed

In addition the devices used in the study have not beenapplied to the clinic and there are no clear internationalstandards for the manner and criteria of exhaled gas col-lection +e amount and sources of clinical data utilized inthe data analysis are relatively limited Some advanced in-telligent algorithms [19] such as deep learning which werebased on big data cannot be utilized Hence the estab-lishment of a breath database is an essential step to advancethe clinical application of eNose research It depends on theestablishment of international unified standards for breathcollection +e collection criteria include the type of gascollected the collection method the patientrsquos age genderdiet or not and even race and other more comprehensiveinformation needs to be collected

+e study is still in the exploratory stage the amount ofdata collected is limited and the results of clinical analysis

may be one-sided Due to the limited nature of the sampletraditional machine learning algorithms were used in thestudy for classification ie signal preprocessing and featureextraction and classification model construction to distin-guish hepatocellular carcinoma patients from healthy in-dividuals In the specific work because the humanexhalation signal collected by the eNose device has a largeinterindividual and individual variability at different mo-ments which cannot visually and effectively distinguish thedata of hepatocellular carcinoma patients from other healthdata the signal is firstly subjected to feature extraction +eextraction of signal curvesrsquo features helps discover morepotential information However high-dimensional featuresmay lead to the degradation of classification accuracy andslow computation so the optimization of features becomes ahot topic of research

A way to measure whether feature optimization is moreeffective is to feed the same features optimized by differentalgorithms into the same classifier and test the classificationperformance In the study the tenfold cross-validationmethod was used in the data analysis taking into accountthat the performance of the constructed classifiers variedwhen different training samples were used and the averageperformance was used as the final measure In additionbased on the randomness of selection when dividing sam-ples an imbalance between positive and negative samples intraining and testing samples may occur To keep the samplesconsistent a stratified screening method is used +at is thepositive and negative samples are divided by tenfold sepa-rately and the divided data is then further combined intotraining and testing samples

In addition to evaluate the generalization ability of theoptimization algorithm the KNN classifier used in the al-gorithm for optimizing the fitness was avoided whenselecting the classifier and the SVM classifier and RFclassifier were chosen instead From the tables we found thatthe classifiers applying the improved optimization algorithm

Figure 4: Adaptation (fitness) curves, plotting fitness value against number of iterations (0 to 100): (a) Pearson-BPSO, with fitness values between about 0.04 and 0.08; (b) BPSO, with fitness values between about 0.08 and 0.11.

Table 2: Performance of SVM classifier based on three different feature optimization algorithms.

Algorithms     Acc (%)   Sens (%)   Spec (%)   F-score
Pearson-BPSO   86.03     90.79      78.64      0.878
BPSO           77.69     87.98      65.38      0.811
PCA            75.19     86.63      73.56      0.795

Table 3: Performance of RF classifier based on three different feature optimization algorithms.

Algorithms     Acc (%)   Sens (%)   Spec (%)   F-score
Pearson-BPSO   90        94.83      85.17      0.841
BPSO           84.42     87.80      81.40      0.800
PCA            82.49     90.90      76.33      0.832


Pearson-BPSO all outperformed the other optimization algorithms.

Although much work remains to be done in this study, the following conclusions can be drawn from the experimental results. First, it is meaningful and feasible to use the eNose device to distinguish hepatocellular carcinoma patients from healthy controls, although many difficulties remain to be overcome before eNose can be applied clinically. Second, the improved feature optimization algorithm does improve detection performance to some extent, as shown by the tenfold averaged results of the two different classifiers. As Tables 2 and 3 show, both classifiers performed better with the improved optimization algorithm Pearson-BPSO than with the other optimization algorithms.
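The four columns of Tables 2 and 3 can be computed from predicted and true labels as in the following sketch. It assumes the usual confusion-matrix definitions and takes the F-score to be the F1 score (the harmonic mean of precision and sensitivity); this section does not state which F-score variant was used, so that choice, like the function name `binary_metrics`, is an assumption.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity, specificity, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    acc = (tp + tn) / len(pairs)
    sens = tp / (tp + fn) if tp + fn else 0.0   # recall on patients
    spec = tn / (tn + fp) if tn + fp else 0.0   # recall on healthy controls
    prec = tp / (tp + fp) if tp + fp else 0.0
    # F1 is assumed here; the source only labels the column "F-score".
    f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
    return {"acc": acc, "sens": sens, "spec": spec, "f_score": f1}
```

In a cross-validated evaluation, these values would be computed on each test fold and then averaged.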

5. Conclusion

Different gases have different response curves, which lead to different sensor measurement signals and hence different exhalation signals. After the human exhalation signal has been collected with an electronic nose, it is difficult to distinguish the data of hepatocellular carcinoma patients from those of healthy subjects intuitively and effectively, because the signals differ greatly between individuals and over time. Extracting waveform features helps uncover more latent information. However, high-dimensional features may reduce classification accuracy and slow computation, so feature optimization has become a research hotspot.

In this paper, an improved feature optimization algorithm, Pearson-BPSO, is proposed based on binary particle swarm optimization (BPSO) for the binary classification task of distinguishing hepatocellular carcinoma patients from healthy people by breath. The algorithm first screens the features according to the Pearson correlation coefficient between each quantifiable feature and the label, and then optimizes the feature set to minimize both the KNN classification error rate and the feature dimension, which improves classification accuracy and reduces the amount of data. Compared with the traditional BPSO and PCA algorithms, this algorithm improves classification performance to a certain extent and is conducive to improving the classification accuracy and detection speed of electronic nose detection.
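The two stages summarized above (a Pearson screen followed by a BPSO search whose fitness combines KNN error and feature dimension) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the correlation `threshold`, the weight `alpha`, the leave-one-out KNN with `k=3`, the sigmoid transfer function, and the swarm parameters `w`, `c1`, `c2` are all hypothetical choices, since this section does not give the values used in the study.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return 0.0 if vx == 0 or vy == 0 else cov / math.sqrt(vx * vy)


def pearson_filter(X, y, threshold=0.1):
    """Indices of feature columns whose |r| with the label reaches the (assumed) threshold."""
    return [j for j, col in enumerate(zip(*X)) if abs(pearson(col, y)) >= threshold]


def knn_error(X, y, mask, k=3):
    """Leave-one-out KNN error rate using only features with mask[j] == 1."""
    sel = [j for j, m in enumerate(mask) if m]
    if not sel:
        return 1.0  # an empty feature subset is treated as the worst case
    errors = 0
    for i, xi in enumerate(X):
        dists = sorted((sum((xi[j] - xo[j]) ** 2 for j in sel), y[o])
                       for o, xo in enumerate(X) if o != i)
        votes = [label for _, label in dists[:k]]
        pred = max(sorted(set(votes)), key=votes.count)
        errors += pred != y[i]
    return errors / len(X)


def fitness(X, y, mask, alpha=0.9):
    """Weighted sum of KNN error and the fraction of selected features (alpha is assumed)."""
    return alpha * knn_error(X, y, mask) + (1 - alpha) * sum(mask) / len(mask)


def bpso(X, y, n_particles=10, n_iter=20, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO with a sigmoid transfer function; returns (best_mask, best_fitness)."""
    rng = random.Random(seed)
    d = len(X[0])
    pos = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n_particles)]
    vel = [[0.0] * d for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(X, y, p) for p in pos]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(d):
                vel[i][j] = (w * vel[i][j]
                             + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                             + c2 * rng.random() * (gbest[j] - pos[i][j]))
                # The sigmoid maps the velocity to the probability of the bit being 1.
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob else 0
            f = fitness(X, y, pos[i])
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit
```

In use, `pearson_filter` would be applied to the training features first, `bpso` would then search over the surviving columns, and the resulting binary mask would be applied unchanged to the test samples.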

6. Future Work

In the next step, we can further analyze the correlation between the features and combine it with this method to search for the optimal subset in a more directed way and improve the accuracy of the classifier. We will also apply more advanced algorithms and continue to optimize the improved feature optimization method, Pearson-BPSO, to achieve more stable and better classification results [20].

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was financially supported by the Key Program of National Natural Science Foundation of China (Grant no. 81830052) and Shanghai Municipal Education Commission (Class II Plateau Disciplinary Construction Program of Medical Technology of SUMHS, 2018–2020). The data collection work was completed by the team from Shanghai Jiao Tong University and Shanghai University of Technology. The electronic nose department was supported by the German UST Company.

References

[1] F. Bray, J. Ferlay, I. Soerjomataram, R. L. Siegel, L. A. Torre, and A. Jemal, “Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries,” CA: A Cancer Journal for Clinicians, vol. 68, no. 6, pp. 394–424, 2018.

[2] L. James and Global Burden of Disease Cancer Collaboration, “Global, regional, and national cancer incidence, mortality, years of life lost, years lived with disability, and disability-adjusted life-years for 29 cancer groups, 1990 to a systematic analysis for the global burden of disease study,” Journal JAMA Oncology, vol. 4, no. 11, pp. 1553–1568, 2018.

[3] A. S. Bannaga, H. Tyagi, E. Daulton, J. A. Covington, and R. P. Arasaradnam, “Exploratory study using urinary volatile organic compounds for the detection of hepatocellular carcinoma,” Molecules, vol. 26, no. 9, p. 2447, 2021.

[4] M.-A. Galen, A.-M. Lou-Anne, G. David et al., “Breath metabolomics provides an accurate and noninvasive approach for screening cirrhosis, primary, and secondary liver tumor,” Journal of Hepatology Communications, vol. 4, no. 7, pp. 1041–1055, 2020.

[5] X. Yu, D. Zhan, L. Liu, H. Lv, L. Xu, and J. Du, “A privacy-preserving cross-domain healthcare wearables recommendation algorithm based on domain-dependent and domain-independent feature fusion,” IEEE Journal of Biomedical and Health Informatics, vol. 9, pp. 89344–89359, 2021.

[6] C. D. Tong and X. H. Shi, “Mutual information based on PCA algorithm with application in process monitoring,” Journal of CIESC, vol. 66, no. 10, pp. 4101–4106, 2015.

[7] Y. J. Lu and D. Y. Li, “Feature selection algorithm based on label correlation,” Pattern Recognition and Artificial Intelligence, vol. 33, no. 8, pp. 716–722, 2020.

[8] B. Liu, M. D. Cai, and Y. C. Bo, “A feature extraction and classification algorithm based on PSO-CSP-SVM for motor imagery EEG signals,” Journal of Central South University (Science and Technology), vol. 51, no. 10, pp. 2855–2866, 2020.

[9] L. Ma, C. W. Lu, Q. H. Gu, and S. L. Ruan, “Particle swarm optimization with search operator of improved pigeon-inspired algorithm,” Journal of Pattern Recognition and Artificial Intelligence, vol. 31, no. 10, pp. 909–920, 2018.

[10] J. Lin, L. Xu, and L. Liu, “Feature selection method based on SVM-RFE and particle swarm optimization,” Journal of Chinese Computer Systems, vol. 36, no. 8, pp. 1865–1868, 2015.

[11] C. G. Waltman, T. A. T. Marcelissen, and J. G. H. van Roermund, “Exhaled-breath testing for prostate cancer based on volatile organic compound profiling using an electronic nose device (Aeonose): a preliminary report,” European Urology Focus, vol. 6, no. 6, pp. 1220–1225, 2020.

[12] L. J. Hao and G. Huang, “Research on non-invasive detection method of liver cancer by respiratory based on electronic nose,” Journal of Transducer and Microsystem Technologies, vol. 39, no. 4, pp. 46–48, 2020.

[13] V. Andreas, W. Katharina, K. Tobias et al., “Detecting cannabis use on the human skin surface via an electronic nose system,” Journal of Sensors, vol. 14, pp. 13256–13272, 2014.

[14] J. Y. Xie, Z. Z. Wu, and Q. Q. Zheng, “An adaptive 2D feature selection algorithm based on information gain and Pearson correlation coefficient,” Journal of Shaanxi Normal University (Natural Science Edition), vol. 48, no. 6, pp. 69–81, 2020.

[15] Z. B. Zhu, J. F. Tao, and H. L. Ge, “Passive sonar target classification and recognition technique based on BPSO-KNN algorithm,” Journal of Technical Acoustics, vol. 38, no. 2, pp. 219–223, 2019.

[16] J. D. Han, L. Sun, and S. L. Wang, “Android malware application detection method based on BPSO-NB,” Journal of Computers and Modernization, vol. 4, pp. 109–113, 2017.

[17] B. Swanson, L. Fogg, W. Julion, and M. T. Arrieta, “Electronic nose analysis of exhaled breath volatiles to identify lung cancer cases: a systematic review,” Journal of the Association of Nurses in AIDS Care, vol. 31, no. 1, pp. 71–79, 2020.

[18] G. Danila, C. Sare, D. A. Mario, R. Veronica, S. Antonio, and B. Maurizia, “An E-nose for the monitoring of severe liver impairment: a preliminary study,” Journal of Sensor, vol. 19, no. 17, Article ID 3656, 2019.

[19] M. Yu, T. Quan, Q. Peng, X. Yu, and L. Liu, “A model-based collaborate filtering algorithm based on stacked auto encoder,” Neural Computing & Applications, vol. 290, 2021.

[20] W. Hamdy, I. Elansary, A. Darwish, and A. E. Hassanien, “An optimized classification model for COVID-19 pandemic based on convolutional neural networks and particle swarm optimization algorithm,” Journal of Digital Transformation and Emerging, vol. 3, 2021.


compare and analyze exhaled gas from patients and healthy individuals and to determine the specific gas species. Thus, a highly sensitive eNose specifically designed to detect certain diseases can then be designed.

In addition, the devices used in the study have not yet been applied in the clinic, and there are no clear international standards for the manner and criteria of exhaled gas collection. The amount and sources of clinical data used in the data analysis are relatively limited, so some advanced intelligent algorithms [19] that rely on big data, such as deep learning, cannot be used. Hence, establishing a breath database is an essential step toward advancing the clinical application of eNose research, and it depends on the establishment of unified international standards for breath collection. The collection criteria should cover the type of gas collected, the collection method, and the patient's age, gender, dietary status, and even race; other, more comprehensive information also needs to be collected.

The study is still in the exploratory stage, the amount of data collected is limited, and the results of clinical analysis



electronic nose device (aeonose) a preliminary reportrdquo Eu-ropean Urology Focus vol 6 no 6 pp 1220ndash1225 2020

[12] L J Hao and G Huang ldquoResearch on non-invasive detectionmethod of liver cancer by respiratory based on electronicnoserdquo Journal of Transducer and Microsystem Technologiesvol 39 no 4 pp 46ndash48 2020

[13] V Andreas W Katharina K Tobias et al ldquoDetecting can-nabis use on the human skin surface via an electronic nosesystemrdquo Journal of Sensors vol 14 pp 13256ndash13272 2014

[14] J Y Xie Z Z Wu and Q Q Zheng ldquoAn adaptive 2D featureselection algorithm based on information gain and personcorrelation coefficientrdquo Journal of Shaanxi Normal University(Natural Science Edition) vol 48 no 6 pp 69ndash81 2020

[15] Z B Zhu J F Tao and H L Ge ldquoPassive sonar targetclassification and recognition technique based on BPSO-KNNalgorithmrdquo Journal of Technical Acoustics vol 38 no 2pp 219ndash223 2019

[16] J D Han L Sun and S L Wang ldquoAndroid malware ap-plication detection method based on BPSO-NBrdquo Journal ofComputers and Modernization vol 4 pp 109ndash113 2017

[17] B Swanson L Fogg W Julion and M T Arrieta ldquoElectronicnose analysis of exhaled breath volatiles to identify lungcancer cases a systematic reviewrdquo Journal of the Association ofNurses in AIDS Care vol 31 no 1 pp 71ndash79 2020

[18] G Danila C Sare D A Mario R Veronica S Antonio andB Maurizia ldquoAn E-nose for the monitoring of severe liverimpairment a preliminary studyrdquo Journal of Sensor vol 19no 17 Article ID 3656 2019

[19] M Yu T Quan Q Peng X Yu and L Liu ldquoA model-basedcollaborate filtering algorithm based on stacked auto en-coderrdquo Neural Computing amp Applications vol 290 2021

[20] W Hamdy I Elansary A Darwish and A E Hassanien ldquoAnoptimized classification model for COVID-19 pandemicbased on convolutional neural networks and particle swarmoptimization algorithmrdquo Journal of Digital Transformationand Emerging vol 3 2021

Mobile Information Systems 9