Hindawi
Complexity
Volume 2021, Article ID 5533884, 10 pages
https://doi.org/10.1155/2021/5533884

Research Article

Machine Learning-Based Multitarget Tracking of Motion in Sports Video

Xueliang Zhang (1) and Fu-Qiang Yang (2)

(1) Department of PE and Art Education, Zhejiang Yuexiu University, Shaoxing, Zhejiang 312000, China
(2) School of Data and Computer Science, Shandong Women’s University, Jinan, Shandong 250002, China

Correspondence should be addressed to Xueliang Zhang; 20131102@zyufl.edu.cn

Received 30 January 2021; Revised 10 March 2021; Accepted 15 March 2021; Published 22 March 2021

Academic Editor: Wei Wang

Copyright © 2021 Xueliang Zhang and Fu-Qiang Yang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

In this paper, we track the motion of multiple targets in sports videos with a machine learning algorithm and study the tracking technique in depth. For moving target detection, the traditional detection algorithms are analysed theoretically and implemented, and on this basis a fusion algorithm combining the four-frame interframe difference method and the background averaging method is proposed to address the shortcomings of the interframe difference method and the background difference method. The fusion algorithm uses a learning rate to update the background in real time and combines morphological processing to correct the foreground, which copes effectively with slow changes of the background. To meet the requirements of real-time operation, accuracy, and low video memory usage in intelligent video surveillance systems, this paper improves the streamlined version of the algorithm. The experimental results show that the improved multitarget tracking algorithm effectively improves on the Kalman filter-based algorithm and meets the real-time and accuracy requirements of intelligent video surveillance scenarios.

1. Introduction

Target detection and tracking is an emerging technology in computer vision that has become the focus of many scholars in this field and is mainly used in intelligent video surveillance, traffic control, military surveillance, and other areas [1]. The implementation of target detection and tracking relies on a variety of technologies, such as digital image processing and artificial intelligence. Its purpose is to provide the data and prerequisites for subsequent pattern recognition, behaviour understanding, and image analysis through the detection and tracking function.

Network construction has developed rapidly in recent decades, and technology in various computer fields is developing rapidly, so people are eager to let computers replace humans in tedious work, of which information acquisition and processing is one example [2]. Intelligent video surveillance can assist people by simplifying the information acquisition process, extracting key information, and performing information analysis. Compared with traditional video surveillance, it can not only complete simple monitoring tasks but also replace the human brain with a computer for information analysis and behaviour understanding [3]. On this basis, intelligent video surveillance can also analyse and understand the movement behaviour of the target. Intelligent video surveillance is autonomous, preventive, and capable of early warning, and it can accomplish tasks such as tracking specific people and vehicles, detecting the location of unexpected events, and daily monitoring, which has great practical value for modern society [4].

Intelligent video surveillance can be divided into three major modules. Moving target detection, identification, and tracking form the basic module, and the technology in this basic module is mostly derived from image processing [5]. Research in this direction improves the performance of moving target detection and tracking algorithms in video sequences in various aspects, such as robustness and real-time performance [6]. At this stage of intelligent video surveillance, facing real-time background
update, the effectiveness of moving target detection and the tracking problems arising from mutual occlusion of multiple targets still need to be solved and improved. The research and improvement of moving target detection and tracking technology in video images is therefore of great significance and will also have a positive economic impact on the future of the intelligent video surveillance system industry [7].

Given the above background, this topic focuses on the study and exploration of motion target detection and tracking technology, to provide the theoretical basis and technical support for intelligent video surveillance systems.

2. Related Research Work

Many well-known foreign universities and research institutions have conducted continuous and in-depth research on motion target tracking technology [8], and great progress has been made in this field so far. For example, Song et al. conducted an in-depth study of the interframe difference algorithm and proposed an effective improved algorithm [9]. A research group led by De et al. has been working on dynamic targets and improved the traditional background model algorithm [10]. Jiang et al. improved the background model algorithm by making background model building an adaptive process [11]. The National Science Foundation has conducted in-depth research on detection and tracking techniques for targets in complex environments and has done a lot of work in this area [12]. Vehicle and motion tracking have been extensively researched and developed by a team of researchers at the University of Reading, UK [13]. Scholars at the University of Maryland have developed a real-time visual monitoring system that enables not only the localization of human body parts but also the tracking of multiple people [14]. Meanwhile, international academic journals and conferences have made important contributions to the development of motion target tracking technology.

Pourshamsi et al. proposed a region-based fully convolutional neural network that integrates the detection results with the tracker output to produce more tracker-friendly candidates, addressing the problem of possibly unreliable targets in the detector output [15]. Mittal et al. were the first to model the data association problem as a cost network flow with nonoverlapping constraints and to solve the minimum-cost flow in this network to find the globally optimal trajectory association for multiple targets [16]. The globally optimal greedy algorithm, also modelled with a graph-theoretic approach, solves the data association using K shortest paths. By modelling the trajectory-detection association problem as a linear programming problem, Ghamisi et al. used the simplex method to solve it and applied it to the tracking of multiple feature points in stereo vision [17]. Zhang et al. formulated multiobjective tracking as a continuous energy function minimization problem and approximated the globally optimal association by finding strong local minima with the conjugate gradient method [18]. Yuan and Pu proposed designing energy functions to effectively mine the sparse appearance model of the detection results. POI algorithms, combined with retrieval ideas from motion reidentification applications, used the GoogLeNet network trained on a large-scale motion reidentification dataset to obtain extraction parameters for motion appearance features; cosine distance was then used in the testing phase to measure the similarity between different motion appearance features, which effectively improved the performance of multitarget tracking [19].

In related research, little work has been done on the method adopted in this article, and the research in this article has certain advantages. This paper focuses on the visual multitarget tracking problem in video surveillance systems. Based on the study and analysis of related literature at home and abroad, the motion target, which has the most representative practical value and research significance, is taken as the main tracking object. Given the need to adapt to changes in the number of targets and to run in real time, this paper mainly studies online multitarget tracking algorithms for motion targets. The purpose of this paper is to realize fast identification and localization of multiple motion targets in a complex environment, to track multiple targets in the sequence at the same time, to maintain the consistency of target identities between frames accurately, to obtain the motion trajectories of the targets, and to lay the foundation for subsequent tasks such as abnormality warning, behaviour analysis, and traffic monitoring.

3. Research and Analysis of Machine Learning for Motion Multitarget Tracking in Sports Video

3.1. Machine Learning Algorithm for Multiobjective Tracking. Digital image processing consists of a series of information acquisition and processing techniques for images and videos, such as denoising, enhancement, recovery, segmentation, feature extraction, and target recognition. In this paper, we study target detection and tracking algorithms for surveillance video, and the preliminary research work involves image denoising, image enhancement, and morphological transformation in image preprocessing. These image preprocessing techniques are briefly introduced and analysed in the following.

Median filtering is a nonlinear digital filtering method, and both it and homomorphic filtering can enhance the image. Under some conditions, median filtering can alleviate the image blurring caused by linear filtering [20]. It is implemented by first sampling the input signal and checking the result of the sampling. When the result is representative of the signal, the values in the observation window are sorted and the middle value of the sorted series is taken as the output; then a new sample is obtained, and the operation is repeated. Median filtering suppresses salt-and-pepper noise well and also removes part of the high-frequency content in Fourier space: since the grey value of the high-frequency components in the image signal changes rapidly, median filtering can eliminate some of these components. It also limits edge blurring relatively well, so it suits occasions where the edge characteristics must be preserved. For an input image f and observation window W, the output is

$$G(x, y) = \operatorname{Med}\left\{ f(x + k,\, y + l) \mid (k, l) \in W \right\} \tag{1}$$
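The window operation in equation (1) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `median_filter`, the square window, and the border handling (border pixels use only the neighbours that fall inside the image) are assumptions, since the text does not specify them.

```python
from statistics import median


def median_filter(image, size=3):
    """Apply a size x size median filter to a 2-D list of grey values.

    Assumption of this sketch: at the borders, the window is clipped to
    the pixels that lie inside the image.
    """
    h, w = len(image), len(image[0])
    r = size // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Collect f(x + k, y + l) for every offset (k, l) in the window W.
            window = [
                image[y + l][x + k]
                for l in range(-r, r + 1)
                for k in range(-r, r + 1)
                if 0 <= y + l < h and 0 <= x + k < w
            ]
            out[y][x] = median(window)
    return out


# A flat grey patch corrupted by one "salt" pixel (255) and one "pepper" pixel (0).
noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],
    [10, 10, 0, 10],
    [10, 10, 10, 10],
]
filtered = median_filter(noisy)
```

In this small example both outliers are replaced by the surrounding grey level, which is the salt-and-pepper suppression described above.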

Gaussian filtering is a linear smoothing filter that can effectively smooth images and suppress Gaussian noise. The template coefficients in Gaussian filtering vary with the distance from the centre of the template, and the weighted mean of the pixels under the template is used as the output value.

The one-dimensional zero-mean discrete Gaussian filter function is

$$G(x) = \exp\left(-\frac{x^{2}}{2\sigma^{2}}\right) \tag{2}$$

The two-dimensional zero-mean discrete Gaussian filter function is

$$G(x, y) = \exp\left(-\frac{x^{2} + y^{2}}{2\sigma^{2}}\right) \tag{3}$$

The Gaussian template is generally 3 × 3 or 5 × 5; the corresponding distributions of the weight values are, respectively,

$$\frac{1}{32}\begin{bmatrix} 2 & 4 & 2 \\ 4 & 8 & 4 \\ 2 & 4 & 2 \end{bmatrix}, \qquad \frac{1}{546}\begin{bmatrix} 2 & 8 & 14 & 8 & 2 \\ 8 & 32 & 52 & 32 & 8 \\ 14 & 52 & 82 & 52 & 14 \\ 8 & 32 & 52 & 32 & 8 \\ 2 & 8 & 14 & 8 & 2 \end{bmatrix} \tag{4}$$
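As a hedged illustration of how such templates are used, the sketch below normalizes an integer template by the sum of its entries, builds a template directly from the two-dimensional Gaussian function, and smooths an image by taking the weighted mean under the template. The names `normalize`, `gaussian_template`, and `smooth` are hypothetical, and replicating edge pixels at the border is an assumption of this sketch.

```python
import math

TEMPLATE_3X3 = [[2, 4, 2], [4, 8, 4], [2, 4, 2]]


def normalize(template):
    """Scale a template so that its weights sum to 1."""
    total = sum(map(sum, template))
    return [[v / total for v in row] for row in template]


def gaussian_template(size, sigma):
    """Sample exp(-(x^2 + y^2) / (2 sigma^2)) on a size x size grid, then normalize."""
    r = size // 2
    raw = [
        [math.exp(-(x * x + y * y) / (2.0 * sigma * sigma)) for x in range(-r, r + 1)]
        for y in range(-r, r + 1)
    ]
    return normalize(raw)


def smooth(image, kernel):
    """Replace each pixel by the weighted mean of its neighbourhood.

    Assumption of this sketch: pixels outside the image are replaced by
    the nearest edge pixel.
    """
    h, w = len(image), len(image[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j in range(-r, r + 1):
                for i in range(-r, r + 1):
                    yy = min(max(y + j, 0), h - 1)
                    xx = min(max(x + i, 0), w - 1)
                    acc += kernel[j + r][i + r] * image[yy][xx]
            out[y][x] = acc
    return out


kernel = normalize(TEMPLATE_3X3)
flat = [[50] * 4 for _ in range(4)]
smoothed = smooth(flat, kernel)
```

Because the normalized weights sum to 1, a uniform region keeps its grey level after smoothing; only local variations such as noise are attenuated.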

The Kalman filter is implemented as follows: based on the previous state sequence, the next state of the system is optimally estimated through the state equation and the observation equation. Because the observed data often contain disturbances such as noise, the optimal estimation can also be regarded as a filtering process. The optimal value at the present moment is obtained by combining the measured value of the current moment with the estimate from the previous moment and the noise error, and is then updated; similarly, the velocity and position at the next moment can be obtained by iteration. The equations underlying the Kalman filter are as follows:

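$$\begin{aligned} x_i &= A_{i,i-1}\, x_{i-1} + w_i, \\ z_i &= H_i\, x_i + v_i, \end{aligned} \tag{5}$$

where $x_i$ is the state and $z_i$ the observation at moment $i$, $A_{i,i-1}$ is the state transition matrix, $H_i$ is the observation matrix, and $w_i$ and $v_i$ are the process noise and the observation noise.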
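To make the predict and update cycle concrete, the following is a minimal sketch of a one-dimensional constant-velocity Kalman filter written out by hand. The constant-velocity model, the function name `kalman_step`, and the noise variances `q` and `r` are assumptions chosen for illustration; the paper does not state its model or parameters.

```python
def kalman_step(state, cov, z, dt=1.0, q=1e-4, r=1.0):
    """One predict/update cycle of a constant-velocity Kalman filter.

    state = (position, velocity); cov = 2x2 covariance as nested lists;
    z = measured position. q and r are assumed process and measurement
    noise variances.
    """
    p, v = state
    (p00, p01), (p10, p11) = cov

    # Predict with the state equation, A = [[1, dt], [0, 1]].
    p_pred = p + dt * v
    v_pred = v
    s00 = p00 + dt * (p01 + p10) + dt * dt * p11 + q
    s01 = p01 + dt * p11
    s10 = p10 + dt * p11
    s11 = p11 + q

    # Update with the observation equation, H = [1, 0].
    innovation = z - p_pred
    s = s00 + r                    # innovation variance
    k0, k1 = s00 / s, s10 / s      # Kalman gain
    new_state = (p_pred + k0 * innovation, v_pred + k1 * innovation)
    new_cov = [
        [(1 - k0) * s00, (1 - k0) * s01],
        [s10 - k1 * s00, s11 - k1 * s01],
    ]
    return new_state, new_cov


# Track a target that moves 2 units per frame, starting from an uninformed state.
state, cov = (0.0, 0.0), [[10.0, 0.0], [0.0, 10.0]]
for t in range(1, 31):
    state, cov = kalman_step(state, cov, z=2.0 * t)
```

After a few dozen frames the estimated velocity settles close to the true value even though only positions are measured, which is what allows the filter to predict where a target will appear in the next frame.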

In a general neural network, each pixel of the image is usually connected to every neuron in the fully connected layer, while a convolutional neural network connects each hidden node to only a local region of the image, thus reducing the number of parameters to train. In the convolutional layer of a convolutional neural network, the neurons share the same weights, which further reduces the number of trained parameters. Suppose that a target detection system uses components A, B, and C and achieves good results, but it is not known which of them is responsible; one can then keep A and B, remove C, and rerun the experiment to see the role of C in the entire system. Shared weights and biases are also referred to as convolutional kernels or filters. Since the images to be processed are often very large, the most important thing during actual processing is to obtain effective image features without analysing the original image at full size. Therefore, a simple convolutional neural network structure can be used, following an idea similar to image compression: alongside the convolution operations, the image is resized by a downsampling process, as shown in Figure 1.

Theoretically, the fully connected feedforward neural network has rich feature representation capability, with the help of which it can be used for image analysis. However, there are several problems in practical use, such as the difficulty in adapting to the variability of images and the complexity of computation. Convolutional neural networks effectively solve the problems encountered by full connectivity through local connectivity, weight sharing, and downsampling. With local connectivity, each neuron is connected to only some of the neurons in the previous layer, which effectively reduces the number of parameters in the training process. Weight sharing allows a group of neuron connections to share weights, which also reduces the number of parameters and speeds up training. Downsampling uses pooling to reduce the number of samples per layer of the neural network and to improve the robustness of the model. For tasks related to image processing, convolutional neural networks increase the speed of training and ensure processing results by retaining feature parameters and reducing unnecessary parameters.

The convolutional layer reduces the connectivity of neurons through non-full connectivity, thus reducing the computational complexity, but the number of neurons is not significantly reduced, the dimensionality of subsequent computations is still high, and overfitting can easily occur. To solve this problem, a pooling layer, also known as a downsampling layer, is introduced. The downsampling layer can greatly reduce the dimensionality of the features, thus reducing the computational effort and mitigating overfitting.
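The dimensionality reduction performed by the downsampling layer can be illustrated with a short sketch. Max pooling with non-overlapping 2 × 2 windows is assumed here; the text does not say which pooling variant is used, and the function name `max_pool` is hypothetical.

```python
def max_pool(feature_map, size=2):
    """Downsample a 2-D feature map by keeping the maximum of each
    non-overlapping size x size window (rows or columns that do not fill
    a complete window are dropped)."""
    h, w = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[y + j][x + i] for j in range(size) for i in range(size))
            for x in range(0, w - size + 1, size)
        ]
        for y in range(0, h - size + 1, size)
    ]


features = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 1, 8],
]
pooled = max_pool(features)  # 4 x 4 input becomes a 2 x 2 output
```

Each pooling step with a 2 × 2 window reduces the number of values passed to the next layer by a factor of four while keeping the strongest response in each neighbourhood.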

Although the tracking targets differ, they should all have some commonalities, which need to be learned by the network. However, training with tracking data is difficult because the same object that is the target in one video sequence may be the background in another. The targets in each video sequence are completely different, and various challenges arise, such as occlusion, deformation, and rotation. Many existing network models are designed mainly for tasks such as target detection, classification, and segmentation, and because they must distinguish many classes of targets, these networks are large and increase the computational burden. In the tracking problem, the network only needs to distinguish two categories: target and background. Usually, the tracked targets are small, so a very large network is not needed to solve the tracking problem. MDNet was proposed to address these points by learning the commonality of the targets through a multidomain learning network structure.

In the detection of moving targets, the video image sequences contain various kinds of noise, and the binary images obtained after thresholding usually have breaks and holes in the target area; these are processed using opening and closing operations to improve the detection accuracy.

The essence of the opening operation is to erode and then dilate the image, which eliminates burrs and scattered points in the image. The closing operation applies the same steps in the opposite order (dilation followed by erosion) and can be used to fill holes that exist in the image.

Take A as the target object and B as the structuring element (3 × 3 or 5 × 5). The opening of A by the structuring element B is denoted as

$$A \circ B = (A \ominus B) \oplus B \tag{6}$$

where $\ominus$ denotes erosion and $\oplus$ denotes dilation.
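The opening and closing operations can be sketched on a binary mask as follows. This is a minimal illustration under stated assumptions: a square structuring element of radius `r` (3 × 3 for `r = 1`), pixels outside the image treated as background, and hypothetical function names.

```python
def erode(img, r=1):
    """Keep a pixel only if every pixel under the (2r+1) x (2r+1) element is set.
    Pixels outside the image count as background (assumption of this sketch)."""
    h, w = len(img), len(img[0])
    return [
        [
            int(all(
                0 <= y + j < h and 0 <= x + i < w and img[y + j][x + i]
                for j in range(-r, r + 1) for i in range(-r, r + 1)
            ))
            for x in range(w)
        ]
        for y in range(h)
    ]


def dilate(img, r=1):
    """Set a pixel if any pixel under the (2r+1) x (2r+1) element is set."""
    h, w = len(img), len(img[0])
    return [
        [
            int(any(
                0 <= y + j < h and 0 <= x + i < w and img[y + j][x + i]
                for j in range(-r, r + 1) for i in range(-r, r + 1)
            ))
            for x in range(w)
        ]
        for y in range(h)
    ]


def opening(img, r=1):
    return dilate(erode(img, r), r)   # erosion followed by dilation


def closing(img, r=1):
    return erode(dilate(img, r), r)   # dilation followed by erosion


# A 3 x 3 target plus one isolated noise pixel in the top-right corner.
speckled = [[0] * 7 for _ in range(7)]
for y in range(2, 5):
    for x in range(2, 5):
        speckled[y][x] = 1
speckled[0][6] = 1
opened = opening(speckled)

# The same 3 x 3 target with a one-pixel hole in its centre.
holed = [[0] * 7 for _ in range(7)]
for y in range(2, 5):
    for x in range(2, 5):
        holed[y][x] = 1
holed[3][3] = 0
closed = closing(holed)
```

In this example the opening removes the isolated pixel while leaving the target intact, and the closing fills the hole inside the target.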

Most visible light in nature can be composed of three primary colours, and the RGB colour space is constructed on this principle: its three components form the x-, y-, and z-axes, the origin represents black, and the vertex diagonally opposite the origin represents white. Any coloured light D can be composed of R, G, and B in different proportions:

$$D = R(p) + G(p) + B(p) \tag{7}$$

From the schematic diagram of the two-frame difference method, we can see that the algorithm is simple to implement, requires few operations, and places small demands on the system. Because the core idea of the algorithm is to difference two neighbouring frames, it is little affected by light and shadow interference, which makes it well suited to practical scenes such as anti-burglary video surveillance of warehouses. At the same time, because of this core idea, when the contrast between the two images is low and the difference is not obvious, it is difficult to extract the features of the moving target in the image. Moreover, since the time difference between two frames is very small and the result of the difference algorithm depends on the speed of the moving target, the target may go undetected when it moves slowly or is stationary.

$$D_k(x, y) = \left| f_{k+1}(x, y) - f_k(x, y) \right| \tag{8}$$

where $f_k(x, y)$ is the grey value of frame $k$ at pixel $(x, y)$.
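A minimal sketch of the two-frame difference follows. The function name `frame_difference` and the threshold value of 25 grey levels are assumptions; in practice the threshold would be tuned to the scene.

```python
def frame_difference(prev_frame, next_frame, threshold=25):
    """Binary motion mask: 1 where |f_{k+1}(x, y) - f_k(x, y)| exceeds the threshold."""
    return [
        [1 if abs(b - a) > threshold else 0 for a, b in zip(row_prev, row_next)]
        for row_prev, row_next in zip(prev_frame, next_frame)
    ]


frame_k = [[10, 10, 10], [10, 10, 10]]
frame_k1 = [[10, 90, 10], [10, 12, 10]]   # one pixel changed strongly, one only slightly
mask = frame_difference(frame_k, frame_k1)
```

The slightly changed pixel stays below the threshold and is not marked, which also illustrates the weakness noted above: a slow or stationary target produces differences too small to detect.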

The core idea of the average background method is to calculate the mean and standard deviation of each pixel as the background model.

$$\mathrm{Average}(x, y) = \frac{1}{n} \sum_{i=1}^{n} f_i(x, y) \tag{9}$$
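The sketch below illustrates a per-pixel mean background model together with a learning-rate update of the kind mentioned in the abstract. The running-average update rule, the learning rate `alpha`, the threshold, and all function names are assumptions for illustration; the standard deviation part of the model is omitted for brevity.

```python
def average_background(frames):
    """Per-pixel mean over a list of equally sized grey-level frames."""
    n = len(frames)
    h, w = len(frames[0]), len(frames[0][0])
    return [[sum(f[y][x] for f in frames) / n for x in range(w)] for y in range(h)]


def update_background(background, frame, alpha=0.05):
    """Running-average update: B <- (1 - alpha) * B + alpha * F.
    alpha is the learning rate; a small value follows slow background changes."""
    return [
        [(1 - alpha) * b + alpha * f for b, f in zip(row_b, row_f)]
        for row_b, row_f in zip(background, frame)
    ]


def foreground_mask(background, frame, threshold=30):
    """Mark pixels whose grey value differs from the background by more than the threshold."""
    return [
        [1 if abs(f - b) > threshold else 0 for b, f in zip(row_b, row_f)]
        for row_b, row_f in zip(background, frame)
    ]


frames = [
    [[10, 10], [10, 10]],
    [[12, 12], [12, 12]],
    [[14, 14], [14, 14]],
]
background = average_background(frames)
new_frame = [[12, 200], [12, 12]]          # a bright object enters one pixel
mask = foreground_mask(background, new_frame)
updated = update_background(background, new_frame, alpha=0.5)
```

A design choice worth noting: updating the background only where the mask is 0 would keep moving targets from being absorbed into the model, at the cost of slower recovery from false detections.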

When tracking motion targets with complex shapes, the target model created by kernel tracking cannot accurately represent the target, whereas the silhouette tracking method can describe the shape adaptively. The silhouette tracking and kernel tracking algorithms share the same idea: both use the initial frame to build the target model. Silhouette tracking extracts the target contour based on the established target model and then uses contour matching to search the target area of the subsequent video frames.

Figure 1: Network structure of machine learning algorithm for multitarget tracking.

Common silhouette tracking methods are divided into two categories: contour tracking methods and shape matching methods. The silhouette tracking method applies to the case of partial overlap of target silhouettes between consecutive frames, and an energy function and a state-space model are used to estimate the position of the target silhouette in the current frame. The contour tracking algorithm can adjust the centre of gravity and size of the contour according to the changing appearance of the moving target; the shape matching method searches for the target shape by matching the established target model shape to the current video image. All three tracking algorithms have their own applicable conditions; the three methods are compared and analysed in Table 1.

3.2. Multiobjective Scheme Design for Motion in Sports Video. A good multiobjective tracking dataset often determines whether a tracking algorithm is judged good or bad. To allow a fair and objective comparison with other algorithms, the dataset must be complete and rich, contain all kinds of special cases, and reflect the accuracy and robustness of the algorithm. A complete multitarget tracking dataset should contain (1) different viewpoints: for example, SORT-based target tracking has high accuracy for fixed viewpoints in low-altitude surveillance video but low accuracy when the video changes to a moving viewpoint, as in in-vehicle video; (2) target scenes with different densities: if there are too few targets, the dataset cannot reflect the algorithm’s handling of target occlusion, target deformation, and interference from similar targets; (3) different light intensities: in computer vision, the pixel values of the same target at corresponding positions differ greatly between a rainy day and a sunny day, and this difference can be used to evaluate the robustness of an algorithm; (4) different video frame rates: the relevant parameters of the model underlying a multitarget tracking algorithm are sensitive to the video frame rate, and this difference can likewise be used to evaluate the robustness of an algorithm. The cost of our calculation method is about 80% of that of other studies, and the results can be improved by 5%.

To make a comprehensive and fair comparison of different algorithms, this paper uses a publicly available dataset as the benchmark for measuring the performance of multitarget tracking algorithms [21]. It is the dataset used in the multitarget tracking series to benchmark multitarget tracking methods, and it contains a rich set of scenarios that meet the requirements mentioned above. The dataset contains a total of 14 video sequences, of which 7 form the training set and the remaining 7 form the test set, each very different from the others. A sequence may be shot from a top view or a flat view. The footage is taken by high-resolution cameras; each frame contains multiple detection boxes on average, and each sequence contains multiple targets.

Generative models are widely used in image super-resolution, face recognition and generation, image completion, style transfer, and similar tasks, while their application is still in its infancy in scenes with less detailed image features and more background interference, such as multitarget tracking. For this kind of image processing problem, with complex interference information, a low percentage of positive sample pixels in the field of view, and inconspicuous target features, it is usually difficult for the generative model to fully learn the distribution of key data in the input samples, and the complex scene information strongly affects how accurately the generative model judges sample authenticity [22]. While tracking specific targets, the multitarget tracker may drift, follow the wrong target, or lose the identity because of appearance changes caused by the deformation of the target [23–25]. In the online tracking process, the stability and accuracy of the tracker can be effectively improved if more diverse samples (e.g., samples of the same target in multiple views and multiple motion states) can be generated from a limited number of samples of a specific target in the history frames. Therefore, this paper proposes a generative model based on a conditional variational autoencoder and a conditional generative adversarial network, which analyses and adjusts the target to be tracked in a multitarget tracking environment and generates multiview samples for a specific target, expanding the training space of that target to enhance the performance of the tracking algorithm.

The variational autoencoder (VAE), shown in Figure 2, is a type of autoencoder whose structure is improved for generative tasks. In essence, it adds Gaussian noise to the hidden variable output by the encoder of a standard autoencoder (the encoder used to calculate the mean of the distribution), so that the decoder becomes more robust to noise during training, and it adds the KL divergence as a regularization term that constrains the encoder output to have zero mean. In addition, a second encoder dynamically adjusts the strength of the Gaussian noise by calculating the variance of the input distribution: during the training of the decoder, when the reconstruction error is larger than the KL divergence, the noise is appropriately reduced to make fitting easier, while when the reconstruction error is smaller than the KL divergence, the noise is appropriately increased to make fitting harder, so as to improve the decoder’s ability to generate samples.

The principle of the variational autoencoder is based on variational inference, and the overall structure is a complete probabilistic inference model: it can estimate the marginal probability of the input image and maximize it to adjust the parameters of the model during training, and it can also estimate the Gaussian posterior probability of the hidden variable to realize the decoding process from the hidden variable to the generated sample. Based on the theory of variational inference, an estimate of the variational lower bound can be obtained from the model parameters, and this estimate can be optimized using standard stochastic gradient descent; that is, the model can be trained by backpropagation. This also means that the two-part structure of the variational autoencoder can be built from deep learning models to perform most tasks in computer vision.
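Two ingredients of this scheme can be written down compactly: sampling the hidden variable by adding Gaussian noise scaled by the predicted variance, and the KL divergence between the predicted diagonal Gaussian and a standard normal distribution. The sketch below assumes the encoder outputs a mean and a log-variance per latent dimension, which is a common convention rather than something stated in the text; the function names are hypothetical.

```python
import math
import random


def sample_latent(mean, log_var, rng=random):
    """Reparameterized sample z = mean + sigma * eps with eps ~ N(0, 1),
    where sigma = exp(0.5 * log_var) is the predicted noise strength."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mean, log_var)]


def kl_to_standard_normal(mean, log_var):
    """KL( N(mean, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mean, log_var))


# The regularizer vanishes only when the encoder outputs zero mean and unit variance.
kl_zero = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])
kl_shifted = kl_to_standard_normal([1.0, 0.0], [0.0, 0.0])

# With a very small predicted variance, the sample stays close to the mean.
z = sample_latent([1.0, -2.0], [-50.0, -50.0])
```

The training loss of such a model is typically the reconstruction error plus this KL term; shrinking the predicted variance lowers the injected noise but raises the KL penalty, which is the trade-off between fitting difficulty and regularization described above.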

4. Analysis of Results

4.1. Validation of Multitarget Tracking Data Enhancement Effect. The ultimate goal of the method proposed in this paper is to compensate for the lack of data diversity in multitarget tracking tasks. The proposed model is therefore validated on two approaches to multitarget tracking: a tracking method based on single-target tracking enhancement and a tracking method based on twin neural networks. The fundamental difference between the two approaches is the update mechanism.

The twin neural network-based tracking method has no online update process during tracking, avoiding the cumulative error that occurs with constant updating. However, this method relies on offline training, and the training process is directed at the bounding boxes containing the target to be tracked, not the global information of a single frame. For multitarget tracking tasks, the datasets available for training are not very diverse. When the trained twin neural networks are used for tracking, the frequent interactions and occlusions caused by the uncertain number of targets in the field of view can greatly reduce the accuracy of the similarity metric of twin neural networks trained on single samples. The model proposed in this paper expands the dataset during the training process and uses the generative model to generate multiangle and multipose samples of the same identity, which makes the twin neural network more robust to complex situations, as shown in Figure 3.

The above two methods are optimized using their corresponding data generation methods, the final performance is verified on the MOT Challenge 17 dataset, and the results are shown in Figure 3. The experimental results show that expanding data diversity with the generative model proposed in this paper effectively reduces the impact of possible deformation and occlusion on both methods: False Negatives (FN) are reduced by 31 and 48, respectively, and Identity Switches (ID Sw) by 225 and 183, respectively. Because the training space is enriched, the recognition rate and long-term tracking of the two methods also improve, as shown by index improvements of 0.4 and 0.7, respectively. In summary, the proposed method can enrich the diversity of multitarget tracking data as a data generation method, which effectively improves the overall performance of multitarget tracking.

The multitarget tracking algorithm based on spatial information association uses motion estimation and prediction, combines a spatial distance metric and a tracking-gate restriction

Figure 2: Working principle of the variational autoencoder. [Diagram: inputs X1 to X5 are mapped to means 1 to 5 and variances 1 to 5, which define normal distributions 1 to 5; the sampled hidden variables Z1 to Z5 are passed to the builder (generator) to produce outputs Y1 to Y5.]

Table 1: Comparison of the analysis of three tracking methods.

Tracking method | Advantage | Disadvantage
Point tracking | Suitable for translational movements with small targets | When the target shape is obscured, point tracking
Kernel tracking | When the target contour is a geometric rigid body, the effect is better | Inability to effectively track a small number of feature points; high sensitivity to illumination and target morphology changes
Silhouette tracking | It has a significant effect on tracking nonrigid bodies with complex shapes | High requirements for target model establishment


to construct the cost matrix of data association, and uses the Hungarian algorithm to solve for the tracking match under the minimum association cost. The experimental environment is shown in Figure 4. Three surveillance videos from fixed cameras in MOT17 are used as the test set, and only moving targets with visibility greater than 30 are considered. The specific parameters of the three videos are shown in Figure 4.
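The association step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `associate`, the rejection threshold `max_cost`, and the toy cost values are assumptions, and SciPy's `linear_sum_assignment` is used as the minimum-cost assignment solver.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def associate(cost, max_cost):
    """Match tracks (rows) to detections (columns) at minimum total cost.

    Pairs whose cost exceeds max_cost (a hypothetical threshold) are rejected.
    """
    cost = np.asarray(cost, dtype=float)
    rows, cols = linear_sum_assignment(cost)  # optimal one-to-one assignment
    matches = [(int(r), int(c)) for r, c in zip(rows, cols) if cost[r, c] <= max_cost]
    matched_rows = {r for r, _ in matches}
    matched_cols = {c for _, c in matches}
    unmatched_tracks = [r for r in range(cost.shape[0]) if r not in matched_rows]
    unmatched_dets = [c for c in range(cost.shape[1]) if c not in matched_cols]
    return matches, unmatched_tracks, unmatched_dets


# Toy example: 2 tracks, 3 detections (cost values are made up).
cost = [[0.1, 0.9, 0.8],
        [0.7, 0.2, 0.9]]
matches, lost_tracks, new_detections = associate(cost, max_cost=0.5)
```

Unmatched detections would typically start new tracks, and unmatched tracks would be kept or dropped according to the tracker's lifecycle rules, which the paper does not detail here.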

To quantitatively compare the effects of different detection algorithms on multitarget tracking, this paper compares the accuracy, precision, and several tracking quality evaluation indexes for the above three groups of tracking results. The arrow next to the name of each index indicates the expected trend of the tracking algorithm on that index: a downward arrow means that smaller values are better, and an upward arrow means that larger values are better. In the experiments of this chapter, the confidence thresholds of both the Faster RCNN and the improved YOLO detection results are uniformly set to 0.8, and the threshold of DPM is set to −1. The evaluation metrics of the three groups of tracking results are shown in Figure 5.
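As a reminder of how the accuracy index is usually computed, the sketch below uses the standard CLEAR MOT definition of MOTA. The formula is not stated in this paper, and the counts in the example are hypothetical.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Multiple Object Tracking Accuracy under the standard CLEAR MOT definition.

    num_gt is the total number of ground-truth objects summed over all frames.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


# Hypothetical counts, for illustration only.
example_mota = mota(false_negatives=10, false_positives=5, id_switches=5, num_gt=100)
```

Because the three error counts are summed, MOTA can be negative when the tracker makes more errors than there are ground-truth objects.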

From Figure 5, it can be seen that the multitarget tracking results based on the improved YOLO algorithm in this paper perform better on four indexes, MOTA (accuracy), MT (mostly tracked), ML (mostly lost), and missed targets (FN), indicating that the detection and tracking algorithm in this paper not only reduces the number of missed targets but also improves the stability of target tracking. The tracking results based on the Faster RCNN detection algorithm are more accurate than those of DPM and YOLO, mainly because the two-stage Faster RCNN algorithm regresses the target bounding box more accurately, which leads to a higher intersection over union between the successfully tracked target and the ground-truth trajectory bounding box.

This comparison experiment is designed to test the impact of association cost metrics on data association algorithms by using, respectively, the Euclidean distance between target box centroid locations, the intersection over union (IoU) between target boxes, and the proposed tracking-gate-based IoU of target boxes. In this paper, the distance metric for multitarget data association uses the IoU between the detection result located in the tracking gate and the predicted state bounding box, which more than doubles the tracking accuracy compared to using the Euclidean distance between the target centre positions. The change to IoU as the distance metric is the most useful: the metric not only matches better than the Euclidean distance metric but also applies to targets of different sizes, with convenient parameter settings and high portability. With the addition of the tracking-window restriction based on motion state estimation, the number of target identity switches in the tracking process decreases and the number of

Figure 3: Effect of multitarget tracking data enhancement on tracking performance. [Chart over the indexes ID Sw, FN, FP, ML, MT, IDF1, and MOTA; series: MA, MA + DA, RegSiam, and RegSiam + DA.]

Figure 4: Multitarget tracking experimental test set parameters. [Bar chart over the sequences MOT17-02, MOT17-04, MOT17-09, and All; series: number of frames, number of marked track points, and average number of targets per frame.]

Figure 5: Comparison of motion multitarget tracking result indicators with different detection algorithms. [Chart over the indexes MOTA, MOTP, MT, ML, GT, FP, FN, IDsw, and FM, in percent; series: DPM, FRCNN, and improved YOLO.]

false alarms decreases. However, because of the added tracking-gate restriction, the number of misses and tracking fragments in the tracking results also increases slightly.
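A tracking-gate-restricted IoU cost of the kind compared above can be sketched as follows. The paper does not specify the gate shape, so a circular gate of radius `gate_radius` around the predicted box centre is an assumption, as are the function names and the `big` sentinel cost for pairs outside the gate.

```python
import numpy as np


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def gated_iou_cost(predicted, detections, gate_radius, big=1e6):
    """Cost matrix of 1 - IoU; pairs outside the (assumed circular) gate get cost big."""
    cost = np.full((len(predicted), len(detections)), big)
    for i, p in enumerate(predicted):
        pcx, pcy = (p[0] + p[2]) / 2.0, (p[1] + p[3]) / 2.0
        for j, d in enumerate(detections):
            dcx, dcy = (d[0] + d[2]) / 2.0, (d[1] + d[3]) / 2.0
            if np.hypot(pcx - dcx, pcy - dcy) <= gate_radius:
                cost[i, j] = 1.0 - iou(p, d)
    return cost


predicted_boxes = [(0, 0, 10, 10)]
detected_boxes = [(0, 0, 10, 10), (5, 0, 15, 10), (100, 100, 110, 110)]
gated_cost = gated_iou_cost(predicted_boxes, detected_boxes, gate_radius=20.0)
```

Unlike a centre-to-centre Euclidean distance, 1 − IoU is normalized by box size, which is consistent with the observation that the metric applies to targets of different scales with a single threshold.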

4.2. Performance Test Results Analysis of Motion Multitarget Tracking System in Sports Video. In this section, the method proposed in this paper is compared with the methods in the literature. Firstly, experiments are conducted separately using combinations of different viewpoints to observe how much multiple viewpoints improve tracking performance. Then, a fair comparison with existing methods is performed to analyse the advantages and shortcomings of the method proposed in this study; the comparison results are shown in Figure 6.

The input–output evaluation of the experimental results is consistent with the evaluation system in the literature, using the same inputs as well as the same GTs, which ensures the fairness of the comparison. As shown in Figure 6, the tracking system in this study achieves good results; although it does not exceed the methods in the literature overall, its 100% MT and lower IDs surpass the other compared methods. In the more difficult intensive motion scenarios, the best results are achieved by the method proposed in this paper. This indicates that Markov optimization models play a key role in dealing with dense scenes. It can also be seen from Figure 6 that the method in this study makes better use of multiview data for tracking, i.e., the tracking performance using data from all three views is better than that using data from only two views in all three experimental sequences, which agrees with the practical physical meaning and is consistent with the original purpose of multiview research.

In Figure 7, the three subplots of each plot show the simultaneous tracking results for the master view and the two slave views, respectively. The plots show that tracking with information from multiple viewpoints enables a more accurate estimation of the target trajectory. The same targets are also correctly assigned the same tracking trajectory markers in different viewpoints.

A Markov random field (MRF) data association optimization model based on cross-view coupled trajectory fragments is introduced; it has a new potential function enhancement method to effectively associate the trajectory fragments caused by intensive motion. In this paper, the cross-view coupled trajectory fragments are obtained by a data fusion method based on image mutual information. This method calculates the spatial position relationship between cross-view 2D trajectory fragment pairs, considering both position and motion information, and corrects the position coordinates of stumped and deviated target detection data in dense motion scenes using the human key point detection method. Modular and system-level experiments are conducted on the PETS 2009 experimental dataset, and the results demonstrate the improvement brought by the image mutual information method and the MRF optimization method in data fusion and data association. Meanwhile, the tracking method in this study shows good multiview multitarget tracking performance when compared with current research methods.

To test the running speed of the multitarget tracking algorithm in this chapter, and how the overall running speed of the detection and tracking algorithm changes with the number of targets in the scene, three static surveillance videos in MOT17 were tested. The results were collected, and the average time consumed by the algorithm is shown in Figure 8.

Figure 8 shows that the average time consumed by the tracking algorithm in this chapter is only about 9 ms, which is fast. The average number of targets differs across the three test sequences, so scenarios with more targets produce a larger cost matrix in the data association process, which in turn increases the computational effort of the multitarget tracking stage. However, since the tracking stage accounts for only a small proportion of the overall running time, the difference in average frame rate is limited. This indicates that the number of targets in the scene has a limited impact on the tracking algorithm and causes only limited fluctuations in computation time. This paper optimizes the traditional classifier for cosine similarity to achieve the extraction and effective matching of deep appearance features of targets. Then, a fusion and tracking strategy combining appearance feature similarity and spatial similarity in multitarget tracking is proposed.
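The idea of combining appearance similarity with spatial similarity can be illustrated with the sketch below. The paper does not give the fusion rule in this section, so the weighted sum in `fused_cost` and its `weight` parameter are purely hypothetical; only the cosine similarity itself is standard.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def fused_cost(appearance_sim, spatial_sim, weight=0.5):
    """Hypothetical fusion: 1 minus a weighted sum of the two similarities."""
    return 1.0 - (weight * appearance_sim + (1.0 - weight) * spatial_sim)


same_direction = cosine_similarity([1.0, 0.0], [2.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])
example_cost = fused_cost(appearance_sim=1.0, spatial_sim=0.5, weight=0.5)
```

Cosine similarity ignores feature magnitude, which suits features trained with a cross-entropy classifier whose distribution is radial, as the conclusion of the paper notes.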

The construction of the target similarity measure and association cost is the core of data association based on the Hungarian algorithm. The tracking-gate-based intersection over union of target bounding boxes used in this chapter more than doubles tracking accuracy compared to using the Euclidean distance between target centre locations, and the change to intersection over union as a distance metric plays the biggest role: it is not only applicable to multiscale targets and easy to set threshold parameters for but also has

Figure 6: Tracking system performance comparison results. [Line chart of values against epochs, both on a 0 to 100 scale; series: GTs, IDs, and MT.]

better tracking performance than the Euclidean distance metric.

Compared with the latest related research, our results show that efficiency and accuracy have improved by nearly 5. On this basis, the tracking-gate limit based on motion state estimation is introduced to further reduce the number of target identity switches and the number of false alarms in the tracking process. However, the method also has shortcomings: the number of missing targets and tracking fragments in the tracking results increases slightly because of the added limits.

5. Conclusion

This paper focuses on motion multitarget tracking technology in video surveillance and investigates the multitarget online tracking problem from four aspects: motion target detection, motion state estimation, data association, and appearance feature modelling. The construction of a feature pyramid network for multiscale target prediction, combined with a multiscale training mechanism, effectively improves the quality of multiscale motion target detection in complex surveillance scenes; combined with clustering optimization of the prior box parameters, it effectively reduces the rates of missed detection and false detection of motion targets. Based on the target detection results, the overall detection-based multitarget tracking scheme is designed for video surveillance application scenarios, and a multitarget tracking algorithm that fuses the apparent features of the target with spatial information is proposed. When the classifier-based cross-entropy loss function is used for training, the obtained feature distribution shows radial characteristics and does not match the distance of the commonly used feature similarity metric. To address this problem, this paper optimizes the traditional classifier for cosine similarity to achieve the extraction and effective matching of deep appearance features of targets. Then, a fusion and tracking strategy combining apparent feature similarity and spatial similarity in multitarget tracking is proposed. To address the lack of an effective update mechanism for target features in the field of multitarget tracking, a sparse update mechanism with intertarget occlusion judgment is proposed to effectively prevent tracking drift in the case of intertarget occlusion. The effectiveness of this paper's method is verified by the experimental results on three sets of surveillance video sequences.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Figure 7: Trace result graph.

Figure 8: Test elapsed time and average frame rate of multitarget tracking algorithm on MOT17 dataset. [Chart over the sequences MOT17-02, MOT17-04, MOT17-09, and All; series: average frame rate, average time per frame for detection and tracking, tracking average time per frame, and average number of targets per frame.]

References

[1] J. Hulsmann, J. Traub, and V. Markl, “Demand-based sensor data gathering with multi-query optimization,” Proceedings of the VLDB Endowment, vol. 13, no. 12, pp. 2801–2804, 2020.

[2] A. Belhadi, Y. Djenouri, J. C.-W. Lin, and A. Cano, “Trajectory outlier detection,” ACM Transactions on Management Information Systems, vol. 11, no. 3, pp. 1–29, 2020.

[3] A. Zappone, M. Di Renzo, and M. Debbah, “Wireless networks design in the era of deep learning: model-based, AI-based, or both?” IEEE Transactions on Communications, vol. 67, no. 10, pp. 7331–7376, 2019.

[4] A. B. Hamida, A. Benoit, P. Lambert, and C. B. Amar, “3-D deep learning approach for remote sensing image classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 56, no. 8, pp. 4420–4434, 2018.

[5] A. Farasat, G. Gross, R. Nagi, and A. G. Nikolaev, “Social network analysis with data fusion,” IEEE Transactions on Computational Social Systems, vol. 3, no. 2, pp. 88–99, 2016.

[6] H. A. Pierson and M. S. Gashler, “Deep learning in robotics: a review of recent research,” Advanced Robotics, vol. 31, no. 16, pp. 821–835, 2017.

[7] L. Li, K. Ota, and M. Dong, “Deep learning for smart industry: efficient manufacture inspection system with fog computing,” IEEE Transactions on Industrial Informatics, vol. 14, no. 10, pp. 4665–4673, 2018.

[8] A. Jalali and H. Farsi, “A new steganography algorithm based on video sparse representation,” Multimedia Tools and Applications, vol. 79, no. 3-4, pp. 1821–1846, 2020.

[9] H. Song, J. J. Thiagarajan, P. Sattigeri, and A. Spanias, “Optimizing kernel machines using deep learning,” IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 11, pp. 5528–5540, 2018.

[10] S. De, L. Bruzzone, A. Bhattacharya, F. Bovolo, and S. Chaudhuri, “A novel technique based on deep learning and a synthetic target database for classification of urban areas in PolSAR data,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 11, no. 1, pp. 154–170, 2018.

[11] L. Jiang, L. Yan, Y. Xia, Q. Guo, M. Fu, and K. Lu, “Asynchronous multirate multisensor data fusion over unreliable measurements with correlated noise,” IEEE Transactions on Aerospace and Electronic Systems, vol. 53, no. 5, pp. 2427–2437, 2017.

[12] H. Wu, Z. Zhang, C. Jiao, C. Li, and T. Q. S. Quek, “Learn to sense: a meta-learning-based sensing and fusion framework for wireless sensor networks,” IEEE Internet of Things Journal, vol. 6, no. 5, pp. 8215–8227, 2019.

[13] Z. Zhao, X. Wang, and T. Wang, “A novel measurement data classification algorithm based on SVM for tracking closely spaced targets,” IEEE Transactions on Instrumentation and Measurement, vol. 68, no. 4, pp. 1089–1100, 2019.

[14] M. A. Al-Jarrah, A. Al-Dweik, M. Kalil, and S. S. Ikki, “Decision fusion in distributed cooperative wireless sensor networks,” IEEE Transactions on Vehicular Technology, vol. 68, no. 1, pp. 797–811, 2019.

[15] M. Pourshamsi, M. Garcia, M. Lavalle, and H. Balzter, “A machine-learning approach to PolInSAR and LiDAR data fusion for improved tropical forest canopy height estimation using NASA AfriSAR Campaign data,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 11, no. 10, pp. 3453–3463, 2018.

[16] N. Mittal, U. Singh, R. Salgotra, and B. S. Sohi, “An energy efficient stable clustering approach using fuzzy extended grey wolf optimization algorithm for WSNs,” Wireless Networks, vol. 25, no. 8, pp. 5151–5172, 2019.

[17] P. Ghamisi, R. Gloaguen, P. M. Atkinson, et al., “Multisource and multitemporal data fusion in remote sensing: a comprehensive review of the state of the art,” IEEE Geoscience and Remote Sensing Magazine, vol. 7, no. 1, pp. 6–39, 2019.

[18] H. Zhang, X. Zhou, Z. Wang, H. Yan, and J. Sun, “Adaptive consensus-based distributed target tracking with dynamic cluster in sensor networks,” IEEE Transactions on Cybernetics, vol. 49, no. 5, pp. 1580–1591, 2019.

[19] X. Yuan and Y. Pu, “Parallel lensless compressive imaging via deep convolutional neural networks,” Optics Express, vol. 26, no. 2, pp. 1962–1977, 2018.

[20] D. Nada, M. Bousbia-Salah, and M. Bettayeb, “Multi-sensor data fusion for wheelchair position estimation with unscented Kalman Filter,” International Journal of Automation and Computing, vol. 15, no. 2, pp. 207–217, 2018.

[21] V. Radu, C. Tong, S. Bhattacharya, et al., “Multimodal deep learning for activity and context recognition,” Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, vol. 1, no. 4, pp. 1–27, 2018.

[22] Q. Zhou and Y. Zheng, “Long link wireless sensor routing optimization based on improved adaptive ant colony algorithm,” International Journal of Wireless Information Networks, vol. 27, no. 103, pp. 241–252, 2019.

[23] S. Xia, D. Peng, D. Meng, et al., “A fast adaptive k-means with no bounds,” IEEE Transactions on Pattern Analysis and Machine Intelligence, p. 1, 2020.

[24] B. Yong, W. Wei, K. C. Li, et al., “Ensemble machine learning approaches for webshell detection in Internet of things environments,” Transactions on Emerging Telecommunications Technologies, 2020.

[25] J. S. Almeida, P. P. Rebouças Filho, T. Carneiro, et al., “Detecting Parkinson’s disease with sustained phonation and speech signals using machine learning techniques,” Pattern Recognition Letters, vol. 125, pp. 55–62, 2019.


update and the effectiveness of moving target detection and tracking, together with the problems arising from mutual occlusion of multiple targets, need to be solved and improved. The research and improvement of moving target detection and tracking technology in video images are therefore of great significance and will also have a positive economic impact on the future of the intelligent video surveillance system industry [7].

Given the above background, this work focuses on the study and exploration of motion target detection and tracking technology to provide the theoretical basis and technical support for intelligent video surveillance systems.

2. Related Research Work

Many well-known foreign universities and research institutions have conducted continuous and in-depth research on motion target tracking technology [8], and great progress has been made in this field so far. For example, Song et al. conducted an in-depth study of the interframe difference algorithm and proposed an effectively improved algorithm [9]. A research group led by De et al. has been working on dynamic targets and improved the algorithm of the traditional background model [10]. Jiang et al. improved the background model algorithm and turned background model building into an adaptive process [11]. The National Science Foundation has conducted in-depth research on detection and tracking techniques for targets in complex environments and has done a lot of work in this area [12]. Vehicle and motion tracking have been extensively researched and developed by a team of researchers at the University of Reading, UK [13]. Scholars at the University of Maryland have developed a real-time visual monitoring system that enables not only the localization of human body parts but also the tracking of multiple people [14]. Meanwhile, international academic journals and academic conferences have made important contributions to the development of motion target tracking technology.

Pourshamsi et al. proposed a region-based fully convolutional neural network that integrates the detection results with the tracker output to produce more tracker-friendly candidates, addressing the problem of possibly unreliable targets in the detector output [15]. Mittal et al. were the first to propose modelling the data association problem as a cost network flow with nonoverlapping constraints and solving the minimum-cost flow in this network to find the globally optimal trajectory association for multiple targets [16]. The globally optimal greedy algorithm, also modelled using a graph-theoretic approach, solves the data association using the K shortest paths. By modelling the trajectory-detection association problem as a linear programming problem, Ghamisi et al. used the simplex method to solve it and applied it to the tracking of multiple feature points in stereo vision [17]. Zhang et al. formulated multitarget tracking as a continuous energy function minimization problem and approximated the globally optimal association solution by finding strong local minima through the conjugate gradient method [18]. Yuan and Pu proposed designing energy functions to effectively mine the sparse appearance model of detection results. In POI algorithms combined with re-retrieval ideas from motion re-identification applications, the GoogLeNet network was trained on a large-scale motion re-identification dataset to obtain extraction parameters for motion appearance features, and cosine distance was used to measure the similarity between different motion appearance features in the testing phase, which effectively improved the performance of multitarget tracking [19].

In related research, little work has been done on the method adopted in this article, and the research in this article therefore has certain advantages. This paper focuses on the visual multitarget tracking problem in video surveillance systems. Based on the study and analysis of related literature at home and abroad, the motion target, which has the most representative practical value and research significance, is taken as the main tracking object. Given the demands of adapting to a changing number of targets and of real-time operation, this paper mainly studies the online multitarget tracking algorithm based on motion targets. The purpose of this paper is to realize fast identification and localization of multiple motion targets in a complex environment, to track multiple targets in the sequence at the same time, to accurately maintain the consistency of target identities between frames, to obtain the motion trajectories of multiple motion targets, and to lay the foundation for subsequent tasks such as abnormality warning, behaviour analysis, and traffic monitoring.

3. Research and Analysis of Machine Learning for Motion Multitarget Tracking in Sports Video

3.1. Machine Learning Algorithm for Multiobjective Tracking. Digital image processing consists of a series of information acquisition and processing techniques for images and videos, such as denoising, enhancement, recovery, segmentation, feature extraction, and target recognition. In this paper, we study target detection and tracking algorithms for surveillance video, and the preliminary research work involves image denoising, image enhancement, and morphological transformation in image preprocessing. These image preprocessing techniques are briefly introduced and analysed in the following.

Median filtering uses a nonlinear digital filtering method, and both it and the homomorphic filtering method can enhance the image. Under some conditions, median filtering can reduce the image blurring caused by linear filtering [20]. It is implemented by first sampling the input signal and checking the sampled result. When the result is representative of the signal, the values in the observation window are sorted and the middle value of the sorted sequence is taken as the output; a new sample is then obtained, and the operation is repeated. Median filtering suppresses salt-and-pepper noise well and is effective at removing the high-frequency part in Fourier space. Since the grey value of the high-frequency component in the image signal changes rapidly, median filtering can eliminate some of these components. It is also relatively effective where edge blurring must be suppressed, since it preserves edge characteristics.

$$G(x, y) = \operatorname{Med}\{\, f(x + k,\; y + l),\ (k, l) \in W \,\}. \tag{1}$$
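The median filter of equation (1) can be sketched with NumPy as follows. The square window of odd side `size` and the edge-replicating border handling are assumptions, since the paper does not specify the window $W$ or the boundary treatment.

```python
import numpy as np


def median_filter(image, size=3):
    """Replace each pixel by the median of its size x size neighbourhood.

    size is assumed odd; borders are handled by replicating edge pixels.
    """
    image = np.asarray(image, dtype=float)
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.empty_like(image)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            out[y, x] = np.median(padded[y:y + size, x:x + size])
    return out


# A flat image with one salt-noise pixel: the outlier is removed.
noisy_image = np.full((5, 5), 10.0)
noisy_image[2, 2] = 255.0
filtered_image = median_filter(noisy_image)
```

Because the median ignores extreme values in the window, an isolated outlier disappears while a step edge wider than half the window is left in place, which is the edge-preserving behaviour described above.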

Gaussian filtering is a linear smoothing filter that can effectively smooth images and suppress Gaussian noise. The template coefficients in Gaussian filtering vary with the distance from the centre of the template, and the weighted mean of the pixels under the template is used as the output value.

The zero-mean discrete Gaussian filter function is

$$G(x) = \exp\left(-\frac{x^{2}}{2\sigma^{2}}\right). \tag{2}$$

The two-dimensional zero-mean discrete Gaussian filter function is

$$G(x, y) = \exp\left(-\frac{x^{2} + y^{2}}{2\sigma^{2}}\right). \tag{3}$$

The general Gaussian template is 3 × 3 or 5 × 5, and the distributions of the weight values are, respectively,

$$\frac{1}{32} \times \begin{bmatrix} 2 & 4 & 2 \\ 4 & 8 & 4 \\ 2 & 4 & 2 \end{bmatrix}, \qquad \frac{1}{556} \times \begin{bmatrix} 2 & 8 & 14 & 8 & 2 \\ 8 & 32 & 52 & 32 & 8 \\ 14 & 52 & 82 & 52 & 14 \\ 8 & 32 & 52 & 32 & 8 \\ 2 & 8 & 14 & 8 & 2 \end{bmatrix}. \tag{4}$$
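A normalized Gaussian template can also be generated numerically rather than taken from a fixed integer table. The sketch below assumes the standard two-dimensional zero-mean Gaussian $\exp(-(x^2 + y^2)/(2\sigma^2))$; the function name and the choice of `sigma` are illustrative only.

```python
import numpy as np


def gaussian_kernel(size, sigma):
    """size x size Gaussian template (size assumed odd), normalized to sum to 1."""
    half = size // 2
    axis = np.arange(-half, half + 1)
    xx, yy = np.meshgrid(axis, axis)
    kernel = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


kernel_5x5 = gaussian_kernel(5, sigma=1.0)

# The 3 x 3 integer template quoted in the text, with its 1/32 scale factor.
template_3x3 = np.array([[2, 4, 2], [4, 8, 4], [2, 4, 2]]) / 32.0
```

Normalizing the weights to sum to 1 keeps the average brightness of the image unchanged after filtering.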

The specific implementation uses the previous state sequence to make an optimal estimate of the next state of the system through the state equation and the observation equation. Because the observed data often contain disturbances such as noise, the optimal estimation can also be regarded as a filtering process. The optimal value at the present moment is obtained by combining the measured value at the current moment with the value from the previous moment and the noise error, finding the optimal solution, and updating it; similarly, the velocity and position at the next moment can be obtained by iteration. The equations underlying the Kalman filter are as follows:

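$$\begin{aligned} x_i &= A_{i \mid i-1}\, x_{i-1} + w_i, \\ z_i &= H_{i \mid i-1}\, x_i + v_i, \end{aligned} \tag{5}$$

where $x_i$ is the state, $z_i$ is the observation, $A_{i \mid i-1}$ is the state transition matrix, $H_{i \mid i-1}$ is the observation matrix, and $w_i$ and $v_i$ are the process and observation noise, respectively.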
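One predict and update cycle of a linear Kalman filter built on the state and observation equations above can be sketched as follows. The constant-velocity model, the noise covariances `Q` and `R`, and the measurement sequence are assumptions chosen only for illustration.

```python
import numpy as np


def kalman_step(x, P, z, A, H, Q, R):
    """One predict + update cycle of a linear Kalman filter."""
    # Predict the state and its covariance from the previous estimate.
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    # Update with the new measurement z.
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new


# Assumed 1D constant-velocity model: state = [position, velocity].
dt = 1.0
A = np.array([[1.0, dt], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])                   # only position is observed
Q = 1e-4 * np.eye(2)
R = np.array([[0.25]])

state = np.array([0.0, 0.0])
covariance = np.eye(2)
for measurement in [1.0, 2.0, 3.0, 4.0, 5.0]:
    state, covariance = kalman_step(state, covariance, np.array([measurement]), A, H, Q, R)
```

After a few measurements of a target moving one unit per frame, the estimated position approaches the last measurement and the estimated velocity approaches 1, even though velocity is never observed directly.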

For a general neural network, each pixel of the image is usually connected to each neuron in the fully connected layer, while a convolutional neural network connects each hidden node to only a local region of the image, thus reducing the number of parameters to train. In the convolutional layer of a convolutional neural network, the neurons share the same weights, which further reduces the number of trained parameters. Suppose that components A, B, and C are used in a target detection system and good results are achieved, but it is not known which of A, B, and C is responsible; one then keeps A and B, removes C, and repeats the experiment to see the role of C in the entire system. Shared weights and biases are also referred to as convolutional kernels or filters. Since the images to be processed are often very large, the most important thing in actual processing is to obtain effective image features without analysing the original image in full. Therefore, a simple convolutional neural network structure can be used, following the idea of image compression for convolutional operations, where the image is resized by a downsampling process, as shown in Figure 1.

Theoretically, the fully connected feedforward neural network has rich feature representation capability and can therefore be used for image analysis. However, there are several problems in practical use, such as the difficulty of adapting to the variability of images and the complexity of computation. Convolutional neural networks effectively solve the problems encountered by full connectivity through local connectivity, weight sharing, and downsampling. With local connectivity, each neuron is not connected to every neuron in the previous layer, which effectively reduces the number of parameters in the training process. Weight sharing allows a group of neuron connections to share weights, which also reduces the number of parameters and speeds up training. Downsampling uses pooling to reduce the number of samples per layer of the neural network and to improve the robustness of the model. For tasks related to image processing, convolutional neural networks increase the speed of training and ensure processing results by retaining feature parameters and reducing unnecessary parameters.

The convolutional layer reduces the connectivity of neurons through noncomplete connectivity, thus reducing the computational complexity. However, the number of neurons is not significantly reduced, the dimensionality of subsequent computations is still high, and overfitting can easily occur. To solve this problem, a pooling layer, also known as a downsampling layer, is introduced. The downsampling layer can greatly reduce the dimensionality of the features, thus reducing the computational effort and helping to avoid overfitting.
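The dimensionality reduction performed by a pooling layer can be illustrated with a small NumPy sketch. Max pooling with non-overlapping windows is assumed here; the paper does not say which pooling variant it uses.

```python
import numpy as np


def max_pool2d(feature_map, size=2):
    """Non-overlapping size x size max pooling; trailing rows/columns are dropped."""
    fm = np.asarray(feature_map, dtype=float)
    h, w = fm.shape[0] // size, fm.shape[1] // size
    trimmed = fm[:h * size, :w * size]
    # Split each axis into (blocks, within-block) and take the max within blocks.
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))


features = np.arange(16).reshape(4, 4)
pooled = max_pool2d(features, size=2)
```

A 4 × 4 feature map becomes 2 × 2, so the next layer processes a quarter of the values.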

Although the tracking targets are different, they should all have some commonalities, which need to be learned by the network. However, training with tracking data is difficult because the same object that is the target in one video sequence may be the background in another. The targets in each video sequence are completely different, and various challenges arise, such as occlusion, deformation, and rotation. Many existing network models are mainly designed for tasks such as target detection, classification, and segmentation; because they must distinguish many classes of targets, these networks are large and increase the computational burden. In the tracking problem, the network needs to distinguish only two categories: target and background. Usually the tracked targets are small, so a very large network is not needed to solve the tracking problem. MDNet is proposed to address the abovementioned points by learning the commonality of these targets through a multidomain learning network structure.

In the detection of moving targets, the video image sequences contain different kinds of noise, and the binary images obtained after thresholding usually have breaks and holes in the target area. These are processed using open and closed operations to improve the detection accuracy.

The opening operation erodes the image and then dilates it, which eliminates burrs and scattered points. The closing operation applies the same steps in the opposite order (dilation followed by erosion) and can be used to fill holes in the image.

Take A as the target object and B as the structuring element (3 × 3 or 5 × 5). The opening of A by the structuring element B is denoted as

$$A \circ B = (A \ominus B) \oplus B, \tag{6}$$

where $\ominus$ denotes erosion and $\oplus$ denotes dilation.
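A minimal sketch of opening and closing on a binary image is given below. It assumes a square 3 × 3 structuring element and treats pixels outside the image as background; the helper names and the test images are hypothetical and not taken from the paper.

```python
def erode(img, k=3):
    """Binary erosion: a pixel stays 1 only if its whole k x k window is 1."""
    h, w, r = len(img), len(img[0]), k // 2
    return [[int(all(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                     for dy in range(-r, r + 1) for dx in range(-r, r + 1)))
             for x in range(w)] for y in range(h)]


def dilate(img, k=3):
    """Binary dilation: a pixel becomes 1 if any pixel in its window is 1."""
    h, w, r = len(img), len(img[0]), k // 2
    return [[int(any(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                     for dy in range(-r, r + 1) for dx in range(-r, r + 1)))
             for x in range(w)] for y in range(h)]


def opening(img, k=3):
    return dilate(erode(img, k), k)   # erosion followed by dilation


def closing(img, k=3):
    return erode(dilate(img, k), k)   # dilation followed by erosion


def block_image():
    """7 x 7 image containing a 3 x 3 target block in the middle."""
    img = [[0] * 7 for _ in range(7)]
    for y in range(2, 5):
        for x in range(2, 5):
            img[y][x] = 1
    return img


noisy = block_image()
noisy[0][6] = 1             # isolated noise pixel
opened = opening(noisy)     # noise removed, target block kept

holed = block_image()
holed[3][3] = 0             # hole inside the target
closed = closing(holed)     # hole filled
```

Opening removes the isolated pixel because it does not survive the erosion step, while closing fills the interior hole because the dilation step covers it before the erosion restores the original outline.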

Most visible light in nature can be composed of three primary colours, and the RGB colour space is constructed on this principle. The three components of the RGB colour space correspond to the x-, y-, and z-axes; the origin represents black, and the vertex of the colour cube diagonally opposite the origin represents white. Any coloured light D can be composed of R, G, and B in different proportions:

$$D = R(p) + G(p) + B(p). \tag{7}$$

The schematic of the two-frame difference method shows that the algorithm is simple to implement, requires few operations, and places low demands on the system. Because the core idea is to difference two neighbouring frames, the method is less affected by lighting and shadow interference, and it is well suited to scenes such as anti-burglary video surveillance of warehouses. The same core idea also has a drawback: when the contrast between the two images is low and the difference is not obvious, it is difficult to extract the features of the moving target. Moreover, the time interval between two frames is very short, and the result of the difference depends on the speed of the moving target; when the target moves slowly or is stationary, it may not be detected at all.

$$D_k(x, y) = \left| f_{k+1}(x + k, y + l) - f_k(x, y) \right|. \tag{8}$$
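The two-frame difference can be sketched as follows. The sketch assumes that the offsets k and l in equation (8) are zero, so that pixels are compared at the same location, and it adds a binarization threshold of 25 grey levels; both choices are assumptions made for illustration, as are the function name and the frame values.

```python
def frame_difference(prev_frame, next_frame, threshold=25):
    """Binary motion mask: 1 where |next - prev| exceeds the threshold."""
    return [[int(abs(n - p) > threshold) for p, n in zip(prev_row, next_row)]
            for prev_row, next_row in zip(prev_frame, next_frame)]


# A bright target moves one pixel to the right between the two frames.
frame_k = [[10, 10, 10],
           [10, 200, 10],
           [10, 10, 10]]
frame_k1 = [[10, 10, 10],
            [10, 10, 200],
            [12, 10, 10]]

moving_mask = frame_difference(frame_k, frame_k1)
# A stationary target produces an empty mask, which is the weakness noted above.
static_mask = frame_difference(frame_k, frame_k)
```

The mask marks both the position the target left and the position it entered, and small intensity fluctuations below the threshold are ignored.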

The core idea of the average background method is to calculate the mean and standard deviation of each pixel as the background model.

$$\mathrm{Average}(x, y) = \frac{1}{n} \sum_{i=1}^{n} \left| f_{i+1}(x + k, y + l) + f_i(x, y) \right|. \tag{9}$$
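A per-pixel mean and standard deviation background model, together with a learning-rate update of the kind mentioned in the abstract, might look like the sketch below. It is not the authors' fusion algorithm: the threshold factor `k`, the floor `min_std`, the learning rate `alpha`, and all function names are hypothetical.

```python
import math


def build_background(frames):
    """Per-pixel mean and standard deviation over a list of grayscale frames."""
    n, h, w = len(frames), len(frames[0]), len(frames[0][0])
    mean = [[sum(f[y][x] for f in frames) / n for x in range(w)] for y in range(h)]
    std = [[math.sqrt(sum((f[y][x] - mean[y][x]) ** 2 for f in frames) / n)
            for x in range(w)] for y in range(h)]
    return mean, std


def foreground_mask(frame, mean, std, k=3.0, min_std=2.0):
    """Mark pixels that deviate from the background mean by more than k * std."""
    return [[int(abs(frame[y][x] - mean[y][x]) > k * max(std[y][x], min_std))
             for x in range(len(frame[0]))] for y in range(len(frame))]


def update_background(mean, frame, alpha=0.05):
    """Blend the new frame into the background mean with learning rate alpha."""
    return [[(1 - alpha) * m + alpha * p for m, p in zip(mean_row, frame_row)]
            for mean_row, frame_row in zip(mean, frame)]


history = [
    [[100, 100], [100, 100]],
    [[102, 98], [100, 100]],
    [[98, 102], [100, 100]],
]
mean, std = build_background(history)
new_frame = [[101, 180], [100, 100]]
mask = foreground_mask(new_frame, mean, std)
updated_mean = update_background(mean, new_frame)
```

A small learning rate lets the model follow slow background changes without absorbing a briefly visible target into the background.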

When tracking moving targets with complex shapes, the target model created by kernel tracking cannot represent the target accurately, whereas silhouette tracking can describe the shape adaptively. Silhouette tracking and kernel tracking share the same idea: both use the initial frame to build the target model. Silhouette tracking extracts the target contour based on the established target model and then uses contour matching to search the target area in subsequent video frames.

Figure 1: Network structure of machine learning algorithm for multitarget tracking. Labelled elements: template frame (127 × 127 × 3), detection frame (255 × 255 × 3), CNN, classification branch (17 × 17 × 2k), regression branch (17 × 17 × 4k).


Common silhouette tracking methods fall into two categories: contour tracking methods and shape matching methods. Silhouette tracking applies when the target silhouettes partially overlap between consecutive frames; an energy function and a state-space model are used to estimate the position of the target silhouette in the current frame. The contour tracking algorithm adjusts the centre of gravity and size of the contour according to the changing appearance of the moving target, whereas the shape matching method searches for the target shape by matching the established target model to the current video image. Each of the three tracking approaches (point, kernel, and silhouette tracking) has its own applicable conditions; they are compared in Table 1.

3.2. Multiobjective Scheme Design for Motion in Sports Video. The quality of a multitarget tracking dataset often determines how well a tracking algorithm can be judged. For a fair and objective comparison with other algorithms, the dataset must be complete and rich, contain all kinds of special cases, and reflect the accuracy and robustness of the algorithm. A complete multitarget tracking dataset should contain (1) different viewpoints: for example, SORT-based target tracking is highly accurate for fixed viewpoints in low-altitude surveillance video but less accurate when the video comes from a moving viewpoint such as an in-vehicle camera; (2) target scenes with different densities: if there are too few targets, the dataset cannot reflect how the algorithm handles target occlusion, target deformation, and interference from similar targets; (3) different light intensities: the pixel values of the same target at the same position differ greatly between a rainy day and a sunny day, and this difference can be used to evaluate the robustness of an algorithm; and (4) different video frame rates: the parameters of the multitarget tracking model are sensitive to the frame rate, and this difference can likewise be used to evaluate robustness. The cost of our calculation method is about 80% of that of other studies, and the results can be improved by 5%.

To compare different algorithms comprehensively and fairly, this paper uses a publicly available dataset as the benchmark for measuring the performance of multitarget tracking algorithms [21]. This dataset is the standard benchmark in the multitarget tracking series and contains a rich set of scenarios that meet the requirements described above. It contains 14 video sequences in total, of which 7 form the training set and the remaining 7 form the test set, and the sequences differ greatly from one another. The viewpoint may be a top view or a flat view. The videos are captured by high-resolution cameras; each frame contains multiple detection boxes on average, and each sequence contains multiple targets.

Generative models are widely used in image super-resolution, face recognition and generation, image completion, and style transfer, whereas their application is still in its infancy in scenes with less detailed image features and more background interference, such as multitarget tracking. For image processing problems with complex interference, a low percentage of positive sample pixels in the field of view, and inconspicuous target features, it is usually difficult for a generative model to fully learn the distribution of key data in the input samples, and the complex scene information strongly affects the accuracy with which the model judges sample authenticity [22]. While tracking specific targets, a multitarget tracker may drift, follow the wrong target, or lose an identity because the deformation of the target changes its appearance [23–25]. In online tracking, the stability and accuracy of the tracker can be effectively improved if more diverse samples (e.g., samples of the same target in multiple views and multiple motion states) can be generated from a limited number of samples of a specific target in the history frames. Therefore, this paper proposes a generative model based on a conditional variational autoencoder and a conditional generative adversarial network. The model analyses and adjusts the target to be tracked in a multitarget tracking environment and generates multiview samples for a specific target, which expands the training space of that target and enhances the performance of the tracking algorithm.

The variational autoencoder (VAE), shown in Figure 2, is a type of autoencoder with a structure adapted for generative tasks. In essence, it adds Gaussian noise to the hidden variable output by the encoder of a standard autoencoder (this encoder computes the mean of the distribution), so that the decoder becomes more robust to noise during training, and it adds the KL divergence as a regularization term that constrains the encoded distribution toward zero mean. In addition, a second encoder dynamically adjusts the strength of the Gaussian noise by computing the variance of the distribution: during training, when the reconstruction error is larger than the KL divergence, the noise is reduced to make fitting easier, and when the reconstruction error is smaller than the KL divergence, the noise is increased to make fitting harder, which improves the decoder's ability to generate samples.
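The noise injection and the KL regularization term can be sketched with the standard VAE formulation, in which the two encoder outputs are a mean and a log-variance per latent dimension. This is the textbook form and not necessarily the exact variant used by the authors; the function names and sample values are hypothetical.

```python
import math
import random


def reparameterize(mean, log_var, rng):
    """Sample z = mean + sigma * eps with eps ~ N(0, 1), element-wise."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, log_var)]


def kl_to_standard_normal(mean, log_var):
    """KL( N(mean, sigma^2) || N(0, 1) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mean, log_var))


rng = random.Random(0)                      # fixed seed for repeatability
mean, log_var = [0.5, -1.0], [0.0, -2.0]    # hypothetical encoder outputs
z = reparameterize(mean, log_var, rng)      # noisy hidden variable fed to the decoder
kl = kl_to_standard_normal(mean, log_var)   # regularization term added to the loss
kl_at_prior = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])
```

The KL term is zero only when the encoded distribution already matches the standard normal prior, so minimizing it pulls the mean toward zero and the variance toward one.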

The variational autoencoder is based on variational inference, and its overall structure is a complete probabilistic inference model. It can estimate the marginal probability of the input image and maximize it to adjust the model parameters during training; it can also estimate the Gaussian posterior probability of the hidden variable to realize the decoding process from the hidden variable to the generated sample. According to the theory of variational inference, an estimate of the variational lower bound can be computed and optimized with standard stochastic gradient descent, i.e., the model can be trained by backpropagation. This also means that the two parts of the variational autoencoder can be built from deep learning models to perform most tasks in computer vision.

4. Analysis of Results

4.1. Validation of Multitarget Tracking Data Enhancement Effect. The ultimate goal of the proposed method is to compensate for the lack of data diversity in multitarget tracking tasks. The proposed model is therefore validated on two approaches to multitarget tracking: a tracking method based on single-target tracking enhancement and a tracking method based on a twin (Siamese) neural network. The fundamental difference between the two approaches is the update mechanism.

The twin neural network-based tracking method has no online update process during tracking, which avoids the cumulative error that arises from constant updating. However, this method relies on offline training, and the training process is directed at the bounding boxes containing the target to be tracked, not at the global information of a single frame. For multitarget tracking tasks, the datasets available for training are not very diverse. When the trained twin neural networks are used for tracking, the frequent interactions and occlusions caused by the uncertain number of targets in the field of view can greatly reduce the accuracy of the similarity metric of networks trained on single samples. The model proposed in this paper expands the dataset during training and uses the generative model to generate multiangle and multipose samples of the same identity, which makes the twin neural network more robust to complex situations, as shown in Figure 3.

Both methods are optimized with their corresponding data generation methods, and the final performance is verified on the MOT Challenge 17 dataset; the results are shown in Figure 3. The experimental results show that expanding the data diversity with the proposed generative model effectively reduces the impact of deformation and occlusion on both methods: False Negatives (FN) are reduced by 31 and 48, respectively, and Identity Switches (ID Sw) by 225 and 183, respectively. Owing to the enriched training space, the recognition rate and long-term tracking performance of the two methods also improve, as shown by the improvement of the index by 0.4 and 0.7, respectively. In summary, as a data generation method, the proposed approach enriches the diversity of multitarget tracking data and effectively improves the overall performance of multitarget tracking.

Figure 2: Working principle of the variational autoencoder. Labelled elements: inputs X1 to X5; mean and variance 1 to 5; normal distributions 1 to 5; sampled variables Z1 to Z5; builder; outputs Y1 to Y5.

Table 1: Comparison of the three tracking methods.

| Tracking method | Advantage | Disadvantage |
| --- | --- | --- |
| Point tracking | Suitable for translational movements with small targets | When the target shape is obscured, point tracking |
| Kernel tracking | Works better when the target contour is a geometric rigid body | Cannot effectively track targets with few feature points; highly sensitive to changes in illumination and target shape |
| Silhouette tracking | Significantly effective for tracking nonrigid bodies with complex shapes | High requirements for establishing the target model |

The multitarget tracking algorithm based on spatial information association uses motion estimation and prediction, combines a spatial distance metric with a tracking gate restriction to construct the cost matrix for data association, and uses the Hungarian algorithm to solve for the tracking match with the minimum association cost. The experimental environment is shown in Figure 4. Three surveillance videos captured by fixed cameras in MOT17 are used as the test set, and only moving targets with visibility greater than 30% are considered. The specific parameters of the three videos are shown in Figure 4.
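The association step can be illustrated with a small sketch. To stay self-contained, it replaces the Hungarian algorithm with an exhaustive search over permutations, which finds the same minimum-cost assignment on a square matrix but is only practical for a handful of targets. The gate radius, the penalty `GATE_COST`, the function names, and the coordinates are all hypothetical.

```python
import math
from itertools import permutations

GATE_COST = 1e6  # hypothetical penalty for pairs outside the tracking gate


def build_cost_matrix(predicted, detections, gate_radius):
    """Euclidean distance between predicted and detected centres, gated."""
    cost = []
    for px, py in predicted:
        row = []
        for dx, dy in detections:
            dist = math.hypot(px - dx, py - dy)
            row.append(dist if dist <= gate_radius else GATE_COST)
        cost.append(row)
    return cost


def min_cost_assignment(cost):
    """Exhaustive minimum-cost one-to-one assignment on a square matrix.

    Stand-in for the Hungarian algorithm: same optimum, factorial run time.
    Pairs that could only be matched outside the gate are dropped.
    """
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda perm: sum(cost[i][perm[i]] for i in range(n)))
    return [(i, j) for i, j in enumerate(best) if cost[i][j] < GATE_COST]


predicted = [(10.0, 10.0), (50.0, 50.0), (90.0, 20.0)]      # predicted track centres
detections = [(52.0, 49.0), (11.0, 12.0), (200.0, 200.0)]   # detected centres
cost = build_cost_matrix(predicted, detections, gate_radius=15.0)
matches = min_cost_assignment(cost)   # list of (track index, detection index)
```

The third track and the third detection stay unmatched because they fall outside each other's gate, which in a full tracker would lead to a missed track and a candidate new track, respectively.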

To quantitatively compare the effects of different detection algorithms on multitarget tracking, this paper compares the accuracy, precision, and several tracking quality indexes for the three groups of tracking results. The arrow next to the name of each index indicates the desired trend: a downward arrow means that smaller values are better, and an upward arrow means that larger values are better. In the experiments of this section, the confidence thresholds of the Faster RCNN and improved YOLO detection results are both set to 0.8, and the threshold of DPM is set to −1. The evaluation metrics of the three groups of tracking results are shown in Figure 5.

Figure 5 shows that the multitarget tracking results based on the improved YOLO algorithm in this paper perform better on four indexes: MOTA (accuracy), MT (mostly tracked), ML (mostly lost), and missed targets (FN). This indicates that the detection and tracking algorithm in this paper not only reduces the number of missed targets but also improves the stability of target tracking. The tracking results based on the Faster RCNN detection algorithm are more accurate in bounding-box localization than those of DPM and YOLO, mainly because the two-stage Faster RCNN algorithm regresses the target bounding box more accurately, which leads to a higher intersection-over-union (IoU) between the successfully tracked target and the ground-truth trajectory bounding box.

Figure 3: Effect of multitarget tracking data enhancement on tracking performance. Indexes: ID Sw, FN, FP, ML, MT, IDF1, MOTA; series: MA, MA + DA, RegSiam, RegSiam + DA.

Figure 4: Multitarget tracking experimental test set parameters. Horizontal axis: Sequence (MOT17-02, MOT17-04, MOT17-09, All); vertical axis: Values; series: number of frames, number of marked track points, average number of targets per frame.

Figure 5: Comparison of motion multiobjective tracking result indicators with different detection algorithms. Indexes: MOTA, MOTP, MT, ML, GT, FP, FN, IDsw, FM; vertical axis: Percent; series: DPM, FRCNN, improved YOLO.

This comparison experiment tests the impact of the association cost metric on the data association algorithm using, in turn, the Euclidean distance between target box centroids, the intersection-over-union (IoU) between target boxes, and the proposed tracking-gate-based IoU of target boxes. In this paper, the distance metric for multitarget data association is the IoU between the detection results located inside the tracking gate and the predicted state bounding box, which more than doubles the tracking accuracy compared with the Euclidean distance between target centre positions. The change to IoU as the distance metric contributes most: it not only matches better than the Euclidean distance metric but also applies to targets of different sizes, with convenient parameter settings and high portability. Adding the tracking gate restriction based on motion state estimation reduces both the number of target identity switches and the number of false alarms during tracking. However, because of the tracking gate restriction, the numbers of misses and tracking fragments in the tracking results also increase slightly.
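A gated intersection-over-union cost of the kind discussed above could be computed as in the following sketch. Boxes are assumed to be given as (x1, y1, x2, y2) corners, and the gate is assumed to be a circle around the predicted box centre; the paper does not specify either convention, and the function names are hypothetical.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def gated_iou_cost(predicted_box, detection_box, gate_radius):
    """Return 1 - IoU if the detection centre lies inside the gate, else 1.0."""
    pcx = (predicted_box[0] + predicted_box[2]) / 2.0
    pcy = (predicted_box[1] + predicted_box[3]) / 2.0
    dcx = (detection_box[0] + detection_box[2]) / 2.0
    dcy = (detection_box[1] + detection_box[3]) / 2.0
    if math.hypot(pcx - dcx, pcy - dcy) > gate_radius:
        return 1.0  # outside the gate: treated as no overlap
    return 1.0 - iou(predicted_box, detection_box)


predicted = (0.0, 0.0, 10.0, 10.0)
detection = (2.0, 0.0, 12.0, 10.0)
cost_inside = gated_iou_cost(predicted, detection, gate_radius=5.0)
cost_outside = gated_iou_cost(predicted, detection, gate_radius=1.0)
```

Because IoU is a ratio of areas, the same cost threshold works for large and small targets, which is consistent with the portability argument made in the text.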

4.2. Performance Test Results Analysis of Motion Multitarget Tracking System in Sports Video. In this section, the proposed method is compared with methods in the literature. First, experiments are conducted separately with different combinations of viewpoints to observe how multiple viewpoints improve tracking performance. Then, a fair comparison with existing methods is performed to analyse the advantages and shortcomings of the proposed method; the comparison results are shown in Figure 6.

The evaluation of the experimental results follows the evaluation system in the literature, using the same inputs and the same ground truths (GTs), which ensures the fairness of the comparison. As shown in Figure 6, the tracking system in this study achieves good results: although it does not surpass the literature overall, its 100% MT and lower number of identity switches (IDs) exceed the other comparison methods. In the more difficult dense motion scenarios, the proposed method achieves the best results, which indicates that Markov optimization models play a key role in dealing with dense scenes. Figure 6 also shows that the method makes better use of multiview data: in all three experimental sequences, tracking with data from all three views outperforms tracking with data from only two views, which is physically plausible and consistent with the original purpose of multiview research.

In Figure 7, the three subplots of each plot show the simultaneous tracking results for the master view and the two slave views, respectively. The plots show that tracking with information from multiple viewpoints yields a more accurate estimate of the target trajectory, and that the same targets are correctly assigned the same trajectory labels in different viewpoints.

A Markov random field (MRF) data association optimization model based on cross-view coupled trajectory fragments is introduced; it has a new potential function enhancement method that effectively associates the trajectory fragments caused by dense motion. In this paper, the cross-view coupled trajectory fragments are obtained by a data fusion method based on image mutual information. This method calculates the spatial position relationship between pairs of cross-view 2D trajectory fragments, considering both position and motion information, and corrects the position coordinates of occluded and deviated target detections in dense motion scenes using human key point detection. Module-level and system-level experiments on the PETS 2009 dataset demonstrate the improvements brought by the image mutual information method and the MRF optimization method in data fusion and data association. Moreover, the tracking method in this study shows good multiview multitarget tracking performance compared with current research methods.

To test the running speed of the multitarget tracking algorithm and how the overall running speed of detection and tracking changes with the number of targets in the scene, three static surveillance videos in MOT17 were tested. The average time consumed by the algorithm is shown in Figure 8.

Figure 8 shows that the average time consumed by the tracking algorithm is only about 9 ms per frame, which is fast. The average number of targets differs among the three test sequences; with more targets, the cost matrix to be solved in data association has a larger dimension, which increases the computational effort of the tracking stage. However, because the tracking stage accounts for a small proportion of the overall running time, the difference in average frame rate is limited, which indicates that the number of targets in the scene has only a limited impact on the computation time of the tracking algorithm. This paper also optimizes the traditional classifier for cosine similarity to extract and effectively match deep appearance features of targets, and then proposes a strategy that fuses appearance feature similarity and spatial similarity for multitarget tracking.
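The fusion of appearance and spatial similarity can be sketched as below. The paper does not give the fusion rule, so the weighted sum and the weight of 0.5 are assumptions made only for illustration, as are the function names and feature vectors.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two appearance feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def fused_similarity(appearance_sim, spatial_sim, weight=0.5):
    """Convex combination of the two cues; the weight is a hypothetical choice."""
    return weight * appearance_sim + (1.0 - weight) * spatial_sim


track_feature = [1.0, 0.0]
same_direction = cosine_similarity(track_feature, [2.0, 0.0])
orthogonal = cosine_similarity(track_feature, [0.0, 3.0])
score = fused_similarity(same_direction, spatial_sim=0.5)
```

Cosine similarity depends only on the direction of the feature vectors, which is why it suits features whose distribution is radial after classifier training.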

Figure 6: Tracking system performance comparison results. Horizontal axis: Epochs; vertical axis: Values; series: GTs, IDs, MT.

Constructing the target similarity measure and the association cost is the core of data association based on the Hungarian algorithm. The tracking-gate-based bounding box intersection-over-union (IoU) used in this section more than doubles the tracking accuracy compared with the Euclidean distance between target centre locations. The change to IoU as the distance metric contributes most: it is applicable to multiscale targets, makes threshold parameters easy to set, and gives better tracking performance than the Euclidean distance metric.

Compared with the latest related research, our results show that efficiency and accuracy improve by nearly 5%. On this basis, the tracking gate restriction based on motion state estimation is introduced to further reduce the number of target identity switches and the number of false alarms during tracking. However, the method also has shortcomings: the numbers of missed targets and tracking fragments increase slightly because of the added restriction.

5. Conclusion

This paper focuses on motion multitarget tracking technology in video surveillance and investigates the online multitarget tracking problem in depth from four aspects: motion target detection, motion state estimation, data association, and appearance feature modelling. Constructing a feature pyramid network for multiscale target prediction and combining it with a multiscale training mechanism effectively improves the quality of multiscale motion target detection in complex surveillance scenes; combined with clustering optimization of the prior box parameters, it effectively reduces the rates of missed and false detections of motion targets. Based on the target detection results, the detection-based multitarget tracking scheme is designed for video surveillance application scenarios, and a multitarget tracking algorithm that fuses appearance feature metrics with spatial similarity is proposed. When a classifier is trained with the cross-entropy loss function, the resulting feature distribution is radial and does not match the distance used by common feature similarity metrics; to address this problem, this paper optimizes the traditional classifier for cosine similarity to extract and effectively match deep appearance features of targets. A strategy that fuses appearance feature similarity and spatial similarity in multitarget tracking is then proposed. To address the lack of an effective update mechanism for target features in multitarget tracking, a sparse update mechanism with intertarget occlusion judgment is proposed, which effectively prevents tracking drift when targets occlude one another. The effectiveness of the method is verified by experimental results on three sets of surveillance video sequences.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Figure 7: Trace result graph.

Figure 8: Test elapsed time and average frame rate of multitarget tracking algorithm on MOT17 dataset. Horizontal axis: Sequence (MOT17-02, MOT17-04, MOT17-09, All); vertical axis: Values; series: average frame rate, average time per frame for detection and tracking, tracking average time per frame, average number of targets per frame.

References

[1] J. Hulsmann, J. Traub, and V. Markl, “Demand-based sensor data gathering with multi-query optimization,” Proceedings of the VLDB Endowment, vol. 13, no. 12, pp. 2801–2804, 2020.

[2] A. Belhadi, Y. Djenouri, J. C.-W. Lin, and A. Cano, “Trajectory outlier detection,” ACM Transactions on Management Information Systems, vol. 11, no. 3, pp. 1–29, 2020.

[3] A. Zappone, M. Di Renzo, and M. Debbah, “Wireless networks design in the era of deep learning: model-based, AI-based, or both?” IEEE Transactions on Communications, vol. 67, no. 10, pp. 7331–7376, 2019.

[4] A. B. Hamida, A. Benoit, P. Lambert, and C. B. Amar, “3-D deep learning approach for remote sensing image classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 56, no. 8, pp. 4420–4434, 2018.

[5] A. Farasat, G. Gross, R. Nagi, and A. G. Nikolaev, “Social network analysis with data fusion,” IEEE Transactions on Computational Social Systems, vol. 3, no. 2, pp. 88–99, 2016.

[6] H. A. Pierson and M. S. Gashler, “Deep learning in robotics: a review of recent research,” Advanced Robotics, vol. 31, no. 16, pp. 821–835, 2017.

[7] L. Li, K. Ota, and M. Dong, “Deep learning for smart industry: efficient manufacture inspection system with fog computing,” IEEE Transactions on Industrial Informatics, vol. 14, no. 10, pp. 4665–4673, 2018.

[8] A. Jalali and H. Farsi, “A new steganography algorithm based on video sparse representation,” Multimedia Tools and Applications, vol. 79, no. 3-4, pp. 1821–1846, 2020.

[9] H. Song, J. J. Thiagarajan, P. Sattigeri, and A. Spanias, “Optimizing kernel machines using deep learning,” IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 11, pp. 5528–5540, 2018.

[10] S. De, L. Bruzzone, A. Bhattacharya, F. Bovolo, and S. Chaudhuri, “A novel technique based on deep learning and a synthetic target database for classification of urban areas in PolSAR data,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 11, no. 1, pp. 154–170, 2018.

[11] L. Jiang, L. Yan, Y. Xia, Q. Guo, M. Fu, and K. Lu, “Asynchronous multirate multisensor data fusion over unreliable measurements with correlated noise,” IEEE Transactions on Aerospace and Electronic Systems, vol. 53, no. 5, pp. 2427–2437, 2017.

[12] H. Wu, Z. Zhang, C. Jiao, C. Li, and T. Q. S. Quek, “Learn to sense: a meta-learning-based sensing and fusion framework for wireless sensor networks,” IEEE Internet of Things Journal, vol. 6, no. 5, pp. 8215–8227, 2019.

[13] Z. Zhao, X. Wang, and T. Wang, “A novel measurement data classification algorithm based on SVM for tracking closely spaced targets,” IEEE Transactions on Instrumentation and Measurement, vol. 68, no. 4, pp. 1089–1100, 2019.

[14] M. A. Al-Jarrah, A. Al-Dweik, M. Kalil, and S. S. Ikki, “Decision fusion in distributed cooperative wireless sensor networks,” IEEE Transactions on Vehicular Technology, vol. 68, no. 1, pp. 797–811, 2019.

[15] M. Pourshamsi, M. Garcia, M. Lavalle, and H. Balzter, “A machine-learning approach to PolInSAR and LiDAR data fusion for improved tropical forest canopy height estimation using NASA AfriSAR Campaign data,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 11, no. 10, pp. 3453–3463, 2018.

[16] N. Mittal, U. Singh, R. Salgotra, and B. S. Sohi, “An energy efficient stable clustering approach using fuzzy extended grey wolf optimization algorithm for WSNs,” Wireless Networks, vol. 25, no. 8, pp. 5151–5172, 2019.

[17] P. Ghamisi, R. Gloaguen, P. M. Atkinson et al., “Multisource and multitemporal data fusion in remote sensing: a comprehensive review of the state of the art,” IEEE Geoscience and Remote Sensing Magazine, vol. 7, no. 1, pp. 6–39, 2019.

[18] H. Zhang, X. Zhou, Z. Wang, H. Yan, and J. Sun, “Adaptive consensus-based distributed target tracking with dynamic cluster in sensor networks,” IEEE Transactions on Cybernetics, vol. 49, no. 5, pp. 1580–1591, 2019.

[19] X. Yuan and Y. Pu, “Parallel lensless compressive imaging via deep convolutional neural networks,” Optics Express, vol. 26, no. 2, pp. 1962–1977, 2018.

[20] D. Nada, M. Bousbia-Salah, and M. Bettayeb, “Multi-sensor data fusion for wheelchair position estimation with unscented Kalman Filter,” International Journal of Automation and Computing, vol. 15, no. 2, pp. 207–217, 2018.

[21] V. Radu, C. Tong, S. Bhattacharya et al., “Multimodal deep learning for activity and context recognition,” Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, vol. 1, no. 4, pp. 1–27, 2018.

[22] Q. Zhou and Y. Zheng, “Long link wireless sensor routing optimization based on improved adaptive ant colony algorithm,” International Journal of Wireless Information Networks, vol. 27, no. 103, pp. 241–252, 2019.

[23] S. Xia, D. Peng, D. Meng et al., “A fast adaptive k-means with no bounds,” IEEE Transactions on Pattern Analysis and Machine Intelligence, p. 1, 2020.

[24] B. Yong, W. Wei, K. C. Li et al., “Ensemble machine learning approaches for webshell detection in Internet of things environments,” Transactions on Emerging Telecommunications Technologies, 2020.

[25] J. S. Almeida, P. P. Rebouças Filho, T. Carneiro et al., “Detecting Parkinson’s disease with sustained phonation and speech signals using machine learning techniques,” Pattern Recognition Letters, vol. 125, pp. 55–62, 2019.



rapidly, median filtering can help eliminate some of these components. It is also relatively effective at suppressing edge blurring and thereby preserves edge characteristics.

$$G(x, y) = \operatorname{Med}\{\, f(x + k, y + l),\ (k, l) \in W \,\}, \tag{1}$$

where $W$ is the filter window.
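The median filter of equation (1) can be sketched as follows. The sketch assumes a square 3 × 3 window and clamps coordinates at the image border, a boundary rule the paper does not state; the function name and image values are hypothetical.

```python
from statistics import median


def median_filter(img, k=3):
    """Replace each pixel by the median of its k x k window (edges clamped)."""
    h, w, r = len(img), len(img[0]), k // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Gather the window W, clamping indices that fall outside the image.
            window = [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                      for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
            out[y][x] = median(window)
    return out


img = [[10, 10, 10, 10],
       [10, 255, 10, 10],   # single impulse ("salt") noise pixel
       [10, 10, 10, 10]]
filtered = median_filter(img)
```

The impulse disappears because it is never the middle value of any window, whereas a mean filter would spread it over the neighbouring pixels.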

Gaussian filtering is a linear smoothing filter that effectively smooths images and suppresses Gaussian noise. The template coefficients in Gaussian filtering vary according to the distance from the centre of the template, and the weighted mean is used as the output value.

The zero-mean discrete Gaussian filter function is

$$G(x) = \exp\left(-\frac{x^2}{2\sigma^2}\right). \tag{2}$$

The two-dimensional zero-mean discrete Gaussian filter function is

$$G(x, y) = \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right). \tag{3}$$

The Gaussian template is generally 3 × 3 or 5 × 5, and the corresponding distributions of the weight values are, respectively,

$$\frac{1}{32} \times \begin{bmatrix} 2 & 4 & 2 \\ 4 & 8 & 4 \\ 2 & 4 & 2 \end{bmatrix}, \qquad \frac{1}{546} \times \begin{bmatrix} 2 & 8 & 14 & 8 & 2 \\ 8 & 32 & 52 & 32 & 8 \\ 14 & 52 & 82 & 52 & 14 \\ 8 & 32 & 52 & 32 & 8 \\ 2 & 8 & 14 & 8 & 2 \end{bmatrix} \tag{4}$$
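As a small illustration of how such a template is applied, the sketch below normalises the 3 × 3 weights by their sum and computes the weighted mean at one interior pixel. The helper names are hypothetical, and normalising by the sum of the weights is an assumption made so that flat regions keep their brightness.

```python
# 3 x 3 integer weights from the Gaussian template.
KERNEL_3X3 = [
    [2, 4, 2],
    [4, 8, 4],
    [2, 4, 2],
]


def normalise(kernel):
    """Divide every weight by the sum of all weights so they add up to 1."""
    total = sum(sum(row) for row in kernel)
    return [[w / total for w in row] for row in kernel]


def smooth_pixel(image, x, y, weights):
    """Weighted mean of the 3 x 3 neighbourhood centred on interior pixel (x, y)."""
    return sum(
        weights[k + 1][l + 1] * image[x + k][y + l]
        for k in (-1, 0, 1)
        for l in (-1, 0, 1)
    )


weights = normalise(KERNEL_3X3)
flat = [[50] * 3 for _ in range(3)]          # constant region
spike = [[0, 0, 0], [0, 32, 0], [0, 0, 0]]   # isolated bright pixel

flat_value = smooth_pixel(flat, 1, 1, weights)    # constant region is unchanged
spike_value = smooth_pixel(spike, 1, 1, weights)  # spike is attenuated to 8/32 of its height
```

The centre weight of 8/32 means an isolated spike keeps only a quarter of its value, while a constant region passes through unchanged.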

The specific implementation uses the previous state sequence to obtain the optimal estimate of the next state of the system through the state equation and the observation equation. Because the observed data often contain disturbances such as noise, the optimal estimation can also be regarded as a filtering process. The optimal value at the present moment is obtained by combining the measured value of the current moment with the estimate of the previous moment and the noise error, and the estimate is then updated; similarly, the velocity and position at the next moment can be obtained by iteration. The equations underlying the Kalman filter are as follows:

$$\begin{aligned} x_i &= A_{i,i-1}\, x_{i-1} + w_i, \\ z_i &= H_{i,i-1}\, x_i + v_i. \end{aligned} \tag{5}$$

Here $x_i$ is the state, $z_i$ is the observation, $A$ and $H$ are the state transition and observation matrices, and $w_i$ and $v_i$ are the process noise and the observation noise.
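The predict-and-update cycle built on equation (5) can be sketched for the scalar case. This is a minimal illustration, not the authors' implementation: the function name, the scalar model (A = H = 1), and the noise variances `q` and `r` are all assumptions chosen for the example.

```python
def kalman_step(x_est, p_est, z, a=1.0, h=1.0, q=1e-3, r=0.5):
    """One scalar Kalman filter iteration.

    x_est, p_est: previous state estimate and its variance
    z: current measurement
    a, h: state transition and observation coefficients (assumed constant)
    q, r: process and observation noise variances (assumed values)
    """
    # Predict the state and its uncertainty from the previous estimate.
    x_pred = a * x_est
    p_pred = a * p_est * a + q
    # Update the prediction with the current measurement.
    gain = p_pred * h / (h * p_pred * h + r)
    x_new = x_pred + gain * (z - h * x_pred)
    p_new = (1.0 - gain * h) * p_pred
    return x_new, p_new


# Noisy measurements of a quantity that is roughly 1.0.
measurements = [1.2, 0.9, 1.1, 1.0, 0.95]
x, p = 0.0, 1.0  # deliberately poor initial guess with large variance
for z in measurements:
    x, p = kalman_step(x, p, z)
```

After a few iterations the estimate moves from the poor initial guess toward the measured level, and the variance `p` shrinks, which is the iterative behaviour the paragraph above describes.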

In a general neural network, each pixel of the image is usually connected to each neuron in the fully connected layer, whereas a convolutional neural network connects each hidden node to only a local region of the image, thus reducing the number of parameters to train. In the convolutional layer of a convolutional neural network, the neurons share the same weights, which further reduces the number of trained parameters. Suppose that a target detection system uses components A, B, and C and achieves good results, but it is not known which of A, B, and C is responsible; one can then keep A and B, remove C, and repeat the experiment to see the role of C in the entire system. Shared weights and biases are also referred to as convolutional kernels or filters. Since the images to be processed are often very large, the most important thing is to obtain effective image features without analysing the original image at full size during actual processing. Therefore, a simple convolutional neural network structure can be used, similar to the idea of image compression, for convolutional operations in which the image is resized by a downsampling process, as shown in Figure 1.

Theoretically, the fully connected feedforward neural network has rich feature representation capability and can therefore be used for image analysis. However, there are several problems in practical use, such as the difficulty of adapting to the variability of images and the complexity of computation. Convolutional neural networks effectively solve the problems encountered by full connectivity through local connectivity, weight sharing, and downsampling. With local connectivity, each neuron is not connected to every neuron in the previous layer, which effectively reduces the number of parameters in the training process. Weight sharing allows a group of neuron connections to share weights, which also reduces the number of parameters and speeds up training. Downsampling uses pooling to reduce the number of samples per layer of the neural network and to improve the robustness of the model. For tasks related to image processing, convolutional neural networks increase the speed of training and maintain the quality of the results by retaining feature parameters and discarding unnecessary ones.

The convolutional layer reduces the connectivity of neurons through incomplete connectivity, thus reducing the computational complexity, but the number of neurons is not significantly reduced, the dimensionality of subsequent computations is still high, and overfitting can easily occur. To solve this problem, a pooling layer, also known as a downsampling layer, is introduced. The downsampling layer can greatly reduce the dimensionality of the features, thus reducing the computational effort and helping to avoid overfitting.
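The dimensionality reduction performed by a downsampling layer can be shown with a short sketch. It assumes 2 × 2 max pooling with stride 2, which is one common choice; the paper does not state which pooling operator or window size is used.

```python
def max_pool_2x2(feature_map):
    """2 x 2 max pooling with stride 2; a trailing odd row or column is dropped."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]
pooled = max_pool_2x2(feature_map)  # 4 x 4 map becomes 2 x 2
```

Each output value summarises four inputs, so the number of features passed to later layers falls by a factor of four while the strongest responses are kept.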

Although the tracking targets are different, they should all have some commonalities, which need to be learned by the network. However, training with tracking data is difficult because the same object that is the target in one video sequence may be the background in another. The targets in each video sequence are completely different, and various challenges arise, such as occlusion, deformation, and rotation. Many existing network models are designed mainly for tasks such as target detection, classification, and segmentation; because they must distinguish many classes of targets, these networks are large and increase the computational burden. In the tracking problem, the network needs to distinguish only two categories: target and background. The tracked targets are usually small, so a very large network is not needed to solve the tracking problem.


MDNet is proposed to address the abovementioned points by learning the commonality of these targets through a multidomain learning network structure.

In the detection of moving targets, the video image sequences contain various kinds of noise, and the binary images obtained after thresholding usually have breaks and holes in the target area; these are processed using open and closed operations to improve the detection accuracy.

The essence of the open operation is to erode the image and then dilate it, which eliminates burrs and scattered points in the image. The closed operation performs the same steps in the opposite order (dilation followed by erosion) and can be used to fill holes in the image.

Take A as the target object and B as the structuring element (3 × 3 or 5 × 5). The open operation on A with structuring element B is denoted as

$$A \circ B = (A \ominus B) \oplus B \tag{6}$$

where $\ominus$ denotes erosion and $\oplus$ denotes dilation.
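The open and closed operations can be sketched on a binary mask stored as a list of rows. This is an illustrative sketch only: it assumes a square structuring element of ones and treats pixels outside the image as background, neither of which is specified in the paper.

```python
def _neighbourhood(img, x, y, r):
    """Values of the (2r+1) x (2r+1) window around (x, y); outside pixels count as 0."""
    rows, cols = len(img), len(img[0])
    return [
        img[x + k][y + l] if 0 <= x + k < rows and 0 <= y + l < cols else 0
        for k in range(-r, r + 1)
        for l in range(-r, r + 1)
    ]


def erode(img, size=3):
    """A pixel stays 1 only if the whole structuring element fits inside the foreground."""
    r = size // 2
    return [[1 if all(_neighbourhood(img, x, y, r)) else 0
             for y in range(len(img[0]))] for x in range(len(img))]


def dilate(img, size=3):
    """A pixel becomes 1 if the structuring element touches any foreground pixel."""
    r = size // 2
    return [[1 if any(_neighbourhood(img, x, y, r)) else 0
             for y in range(len(img[0]))] for x in range(len(img))]


def opening(img, size=3):
    return dilate(erode(img, size), size)   # erosion, then dilation


def closing(img, size=3):
    return erode(dilate(img, size), size)   # dilation, then erosion


# A 3 x 3 target plus one isolated noise pixel at (5, 5).
mask = [[0] * 7 for _ in range(7)]
for x in range(1, 4):
    for y in range(1, 4):
        mask[x][y] = 1
mask[5][5] = 1
opened = opening(mask)

# A 3 x 3 target with a one-pixel hole in its centre.
holed = [[0] * 7 for _ in range(7)]
for x in range(2, 5):
    for y in range(2, 5):
        holed[x][y] = 1
holed[3][3] = 0
closed = closing(holed)
```

In the first example the opening removes the isolated pixel and leaves the 3 × 3 target intact; in the second the closing fills the hole.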

Most visible light in nature can be composed of three primary colours, and the RGB colour space is constructed on this principle: its three components R, G, and B correspond to the x-, y-, and z-axes, the origin represents black, and the diagonally opposite vertex of the colour cube represents white. Any colour light D can be composed of R, G, and B in different proportions:

$$D = R(p) + G(p) + B(p) \tag{7}$$

From the schematic diagram of the two-frame difference method, we can see that the algorithm has simple implementation steps, a small number of operations, and low system requirements. Because the core idea of the algorithm is to difference two neighbouring frames, it is little affected by light and shadow interference, which makes it well suited to practical scenes such as warehouse burglary surveillance. By the same core idea, however, when the contrast between the two images is low and the difference is not obvious, it is difficult to extract the feature values of the moving target. Moreover, since the time difference between two frames is very small and the result of the two-frame difference depends on the speed of the moving target, a target that moves slowly or is stationary may not be detected at all.

$$D_k(x, y) = \left| f_{k+1}(x + k,\ y + l) - f_k(x, y) \right| \tag{8}$$
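A minimal sketch of two-frame differencing follows. It assumes the difference is taken pixel by pixel at the same coordinates (zero spatial offset) and then thresholded into a binary motion mask; the threshold value and the function name are illustrative assumptions.

```python
def frame_difference(prev_frame, next_frame, threshold=25):
    """Binary motion mask: 1 where the absolute grey-level change exceeds the threshold."""
    return [
        [1 if abs(b - a) > threshold else 0 for a, b in zip(row_prev, row_next)]
        for row_prev, row_next in zip(prev_frame, next_frame)
    ]


# A bright object moves one pixel to the right between frame k and frame k+1.
frame_k = [[10, 10, 10], [10, 200, 10], [10, 10, 10]]
frame_k1 = [[10, 10, 10], [10, 12, 200], [10, 10, 10]]
motion_mask = frame_difference(frame_k, frame_k1)

# A stationary object produces an empty mask, so it is not detected.
static_mask = frame_difference(frame_k, frame_k)
```

The second call shows the limitation noted in the text: when the target does not move between the two frames, the difference image is empty.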

The core idea of the average background method is to calculate the mean and standard deviation of each pixel as the background model:

$$\operatorname{Average}(x, y) = \frac{1}{n} \sum_{i=1}^{n} \left| f_{i+1}(x + k,\ y + l) + f_i(x, y) \right| \tag{9}$$
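The abstract states that the background is updated in real time with a learning rate. One common way to do this is an exponential running average, sketched below. This is an assumption about the form of the update, not the authors' exact rule, and the learning rate, threshold, and function names are hypothetical.

```python
def update_background(background, frame, learning_rate=0.05):
    """Blend the new frame into the background model pixel by pixel."""
    return [
        [(1 - learning_rate) * b + learning_rate * f for b, f in zip(bg_row, fr_row)]
        for bg_row, fr_row in zip(background, frame)
    ]


def foreground_mask(background, frame, threshold=30):
    """Mark pixels that differ from the background model by more than the threshold."""
    return [
        [1 if abs(f - b) > threshold else 0 for b, f in zip(bg_row, fr_row)]
        for bg_row, fr_row in zip(background, frame)
    ]


background = [[100.0, 100.0]]
frame = [[102, 220]]  # small lighting drift on the left, moving object on the right

mask = foreground_mask(background, frame)
background = update_background(background, frame)
```

A small learning rate lets the model absorb slow changes such as lighting drift while a fast-moving object barely alters the background estimate.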

When tracking motion targets with complex shapes, the target model created by kernel tracking cannot accurately represent the target, whereas the silhouette tracking method can describe the shape adaptively. The silhouette tracking and kernel tracking algorithms share the same idea: both use the initial frame to build the target model. Silhouette tracking extracts the target contour based on the established target model and then uses contour matching to search for the target area in subsequent video frames.

Figure 1: Network structure of machine learning algorithm for multitarget tracking. (Diagram labels: template frame, 127 × 127 × 3; detection frame, 255 × 255 × 3; shared CNN; classification branch, 17 × 17 × 2k; regression branch, 17 × 17 × 4k.)

Common silhouette tracking methods are divided into two categories: contour tracking methods and shape matching methods. The silhouette tracking method applies when target silhouettes partially overlap between consecutive frames, and an energy function and a state-space model are used to estimate the position of the target silhouette in the current frame. The contour tracking algorithm can adjust the centre of gravity and size of the contour according to the changing appearance of the moving target, whereas the shape matching method searches for the target shape by matching the established target model shape against the current video image. All three tracking algorithms have their own applicable conditions; the three methods are compared and analysed in Table 1.

3.2. Multiobjective Scheme Design for Motion in Sports Video. The quality of a multitarget tracking dataset often determines whether a tracking algorithm is judged good or bad. For a fair and objective comparison with other algorithms, the dataset must be complete and rich, contain all kinds of special cases, and reflect the accuracy and robustness of the algorithm. A complete multitarget tracking dataset should contain (1) different viewpoints: for example, SORT-based target tracking has high accuracy for fixed viewpoints in low-altitude surveillance video but low accuracy when the video is taken from a moving viewpoint, as in in-vehicle video; (2) target scenes with different densities: if there are too few targets, the dataset cannot reflect how the algorithm handles target occlusion, target deformation, and interference from similar targets; (3) different light intensities: for the same target, the pixel values at the corresponding position differ greatly between a rainy day and a sunny day, and this difference can be used to evaluate the robustness of an algorithm; (4) different video frame rates: the relevant parameters of a multitarget tracking model are sensitive to the video frame rate, and this difference can likewise be used to evaluate robustness. The cost of our calculation method is about 80% of that of other studies, and the results can be improved by 5%.

To make a comprehensive and fair comparison of different algorithms, this paper uses a publicly available dataset as the benchmark for measuring the performance of multitarget tracking algorithms [21]. It is the dataset used as the measurement standard for multitarget tracking methods in the multitarget tracking series, and it contains a rich set of scenarios that meet the requirements mentioned above. The dataset contains a total of 14 video sequences, of which 7 form the training set and the remaining 7 form the test set, and the sequences differ considerably from one another; the viewpoint may be a top view or a flat view. The footage is taken by high-resolution cameras, each frame contains multiple detection boxes on average, and each sequence contains multiple targets.

Generative models are widely used in image super-resolution, face recognition and generation, image completion, style conversion, and similar tasks, but their application is still in its infancy in scenes with less detailed image features and more background interference, such as multitarget tracking. For image processing problems of this kind, with complex interference information, a low percentage of positive sample pixels in the field of view, and inconspicuous target features, it is usually difficult for the generative model to fully learn the distribution of key data in the input samples, and the complex scene information has a large impact on how accurately the model judges sample authenticity [22]. While tracking specific targets, a multitarget tracker may drift, follow the wrong target, or lose an identity because deformation of the target changes its appearance [23–25]. In online tracking, the stability and accuracy of the tracker can be effectively improved if more diverse samples (e.g., samples of the same target in multiple views and multiple motion states) can be generated from a limited number of samples of a specific target in the history frames. Therefore, this paper proposes a generative model based on a conditional variational autoencoder and a conditional generative adversarial network, which analyses and adjusts the target to be tracked in a multitarget tracking environment and generates multiview samples for a specific target, expanding the training space of that target and enhancing the performance of the tracking algorithm.

The variational autoencoder (VAE), shown in Figure 2, is a type of autoencoder with a structure improved for generative tasks. In essence, it adds Gaussian noise to the hidden variable output by the encoder of a standard autoencoder (the encoder used to calculate the mean of the distribution), so that the decoder becomes more robust to noise during training, and it adds the KL divergence as a regularization term that constrains the encoded distribution toward zero mean. In addition, a second encoder dynamically adjusts the strength of the Gaussian noise by calculating the variance of the input distribution; that is, during training of the decoder, when the reconstruction error is larger than the KL divergence, the noise is reduced to make fitting easier, and when the reconstruction error is smaller than the KL divergence, the noise is increased to make fitting harder, which improves the decoder's ability to generate samples.
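The two ingredients described above, Gaussian noise scaled by an encoded variance and a KL regularization term, can be sketched without any deep learning library. The sketch assumes a diagonal Gaussian posterior parameterised by a mean and a log-variance per latent dimension and a standard normal prior; the function names are hypothetical, and no encoder or decoder network is modelled.

```python
import math
import random


def reparameterise(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), one value per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL divergence between N(mu, diag(exp(log_var))) and N(0, I)."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


rng = random.Random(0)                  # fixed seed for a repeatable sample
mu, log_var = [0.5, -1.0], [0.0, -2.0]  # example encoder outputs (made-up values)

z = reparameterise(mu, log_var, rng)    # noisy latent code passed to the decoder
kl = kl_to_standard_normal(mu, log_var)
```

The KL term is zero only when the mean is zero and the variance is one, so adding it to the loss pulls the encoded distribution toward the standard normal prior.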

The variational autoencoder is based on variational inference, and its overall structure is a complete probabilistic inference model: it can estimate the marginal probability of the input image and maximize it to adjust the model parameters during training, and it can also estimate the Gaussian posterior probability of the hidden variable to realize the decoding process from the hidden variable to the generated sample. Based on the theory of variational inference, a lower-bound estimate can be obtained by calculating the parameters of the variational lower bound, and this estimate can be optimized using standard stochastic gradient descent; that is, the model can be trained by backpropagation. This also means that the two-part structure of the variational autoencoder can be reconstructed using a deep learning model to perform most of the tasks in computer vision.
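For reference, the variational lower bound is usually written as shown below. The notation (encoder $q_\phi(z \mid x)$, decoder $p_\theta(x \mid z)$, prior $p(z)$) follows the standard VAE formulation and is an assumption here, since the paper does not define its own symbols.

```latex
\log p_\theta(x) \;\ge\;
\mathbb{E}_{q_\phi(z \mid x)}\!\big[\log p_\theta(x \mid z)\big]
\;-\; D_{\mathrm{KL}}\!\big(q_\phi(z \mid x)\,\|\,p(z)\big)
```

The first term rewards accurate reconstruction and the second is the KL regularizer; maximizing the right-hand side by stochastic gradient descent trains the encoder and decoder jointly.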

4. Analysis of Results

4.1. Validation of Multitarget Tracking Data Enhancement Effect. The ultimate goal of the proposed method is to compensate for the lack of data diversity in multitarget tracking tasks. The proposed model is therefore validated on two approaches to multitarget tracking: a tracking method based on single-target tracking enhancement and a tracking method based on a twin neural network. The fundamental difference between the two approaches is the update mechanism.

The twin neural network-based tracking method has no online update process during tracking, which avoids the cumulative error that arises from constant updating. However, this method relies on offline training, and the training process is directed at the bounding boxes containing the target to be tracked, not the global information of a single frame. For multitarget tracking tasks, the datasets available for training are not very diverse. When the trained twin neural networks are used for tracking, the frequent interactions and occlusions caused by the uncertain number of targets in the field of view can greatly reduce the accuracy of the similarity metric of a twin neural network trained on single samples. The model proposed in this paper expands the dataset during training and uses the generative model to generate multiangle and multipose states for samples of the same identity, which makes the twin neural network more robust to complex situations, as shown in Figure 3.

The two methods above are optimized using their corresponding data generation methods, and the final performance is verified on the MOT Challenge 17 dataset; the results are shown in Figure 3. The experimental results show that expanding the data diversity with the generative model proposed in this paper effectively reduces the impact of deformation and occlusion on both methods: False Negatives (FN) are reduced by 31 and 48, respectively, and Identity Switches (ID Sw) by 225 and 183, respectively. Because the training space is enriched, the recognition rate and long-time tracking performance of the two methods also improve, as shown by the improvement of the index by 0.4 and 0.7, respectively. In summary, as a data generation method, the proposed method enriches the diversity of multitarget tracking data and effectively improves the overall performance of multitarget tracking.

The multitarget tracking algorithm based on spatial information association uses motion estimation and prediction, combines a spatial distance metric with a tracking gate restriction

Figure 2: Working principle of the variational autoencoder. (Diagram: each input X1 to X5 is encoded to a mean and a variance that define a normal distribution, from which a latent sample Z1 to Z5 is drawn and passed through the builder to produce the outputs Y1 to Y5.)

Table 1: Comparison of the analysis of three tracking methods.

| Tracking method | Advantage | Disadvantage |
| --- | --- | --- |
| Point tracking | Suitable for translational movements with small targets | When the target shape is obscured, point tracking |
| Kernel tracking | When the target contour is a geometric rigid body, the effect is better | Inability to effectively track a small number of feature points; high sensitivity to illumination and target morphology changes |
| Silhouette tracking | It has a significant effect on tracking nonrigid bodies with complex shapes | High requirements for target model establishment |

to construct the cost matrix of data association, and uses the Hungarian algorithm to solve for the tracking matching result with the minimum association cost; the experimental environment is shown in Figure 4. Three surveillance videos under a fixed camera in MOT17 are used as the test set, and only motion targets with visibility greater than 30% are considered. The specific parameters of the three videos are shown in Figure 4.
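The association step can be sketched as follows. For brevity this example finds the minimum-cost matching by exhaustive search over permutations, which gives the same result as the Hungarian algorithm on small square problems but is not the Hungarian algorithm itself. The cost (Euclidean distance between centres), the gate value, and the function names are illustrative assumptions.

```python
import math
from itertools import permutations


def centre_distance(a, b):
    """Euclidean distance between two (x, y) centre points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def associate(predicted, detected, gate=50.0):
    """Match each predicted track position to one detection at minimum total cost.

    Assumes equally many tracks and detections. Pairs whose cost exceeds the
    tracking gate are dropped from the result after the optimal matching is found.
    """
    n = len(predicted)
    cost = [[centre_distance(p, d) for d in detected] for p in predicted]
    best = min(
        permutations(range(n)),
        key=lambda perm: sum(cost[i][perm[i]] for i in range(n)),
    )
    return {i: j for i, j in enumerate(best) if cost[i][j] <= gate}


predicted = [(10, 10), (100, 100), (200, 50)]   # predicted track centres
detected = [(102, 98), (12, 9), (400, 400)]     # detections in the current frame
matches = associate(predicted, detected)        # {track index: detection index}
```

Tracks 0 and 1 are matched to the nearby detections, while track 2 is left unmatched because its only remaining candidate lies outside the gate. Exhaustive search grows factorially with the number of targets, so a real tracker would use a polynomial-time assignment solver instead.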

To quantitatively compare the effects of different detection algorithms on multitarget tracking, this paper compares the accuracy, precision, and several tracking quality evaluation indexes for the above three groups of tracking results. The arrow next to the name of each index indicates the desired trend: a downward arrow indicates that a smaller value is better, and an upward arrow indicates that a larger value is better. In the experiments of this section, the confidence thresholds of both the Faster RCNN and the improved YOLO detection results are set to 0.8, and the threshold of DPM is set to −1. The evaluation metrics of the three groups of tracking results are shown in Figure 5.

Figure 5 shows that the multitarget tracking results based on the improved YOLO algorithm in this paper perform better on four indexes, MOTA (accuracy), MT (mostly tracked), ML (mostly lost), and missed targets (FN), indicating that the detection and tracking algorithm in this paper not only reduces the number of missed targets but also improves the stability of target tracking. The tracking results based on the Faster RCNN detection algorithm are more precise than those of DPM and YOLO, mainly because the two-stage Faster RCNN algorithm regresses the target bounding box more accurately, which leads to a higher intersection-over-union between the successfully tracked target and the ground-truth trajectory bounding box.

This comparison experiment is designed to test the impact of association cost metrics on data association algorithms by using, respectively, the Euclidean distance between target box centroid locations, the intersection-over-union (IoU) between target boxes, and the proposed tracking-gate-based IoU of target boxes. In this paper, the distance metric for multitarget data association is the IoU between the detection result located in the tracking gate and the predicted state bounding box, which more than doubles the tracking accuracy compared with using the Euclidean distance between the target centre positions. The change to IoU as the distance metric contributes the most: the metric not only matches better than the Euclidean distance metric but also applies to targets of different sizes, with convenient parameter settings and high portability. With the addition of the tracking gate restriction based on motion state estimation, the number of target identity switches in the tracking process decreases and the number of

Figure 3: Effect of multitarget tracking data enhancement on tracking performance. (Chart of the indexes ID Sw, FN, FP, ML, MT, IDF1, and MOTA on a scale of 0 to 50 for the methods MA, MA + DA, RegSiam, and RegSiam + DA.)

Figure 4: Multitarget tracking experimental test set parameters. (Bar chart of values against sequence for MOT17-02, MOT17-04, MOT17-09, and All; series: number of frames, number of marked track points, and average number of targets per frame. Bar labels as extracted: 600, 1050, 525, 3175; 869, 1065, 958, 2892; 145, 346, 69, 165.)

Figure 5: Comparison of motion multiobjective tracking result indicators with different detection algorithms. (Chart of MOTA, MOTP, MT, ML, GT, FP, FN, IDsw, and FM, in percent, for DPM, FRCNN, and improved YOLO.)

false alarms decreases. However, because of the added tracking gate restriction, the number of misses and tracking fragments in the tracking results also increases slightly.
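The tracking-gate-based intersection-over-union cost compared in this experiment can be sketched as below. Boxes are assumed to be given as (x1, y1, x2, y2) corner coordinates, the gate is modelled as a maximum centre distance, and the cost is taken as 1 minus the IoU; these conventions and the function names are assumptions made for illustration.

```python
import math


def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def gated_iou_cost(predicted_box, detection_box, gate_radius):
    """Association cost 1 - IoU, or infinity when the detection lies outside the gate."""
    pcx = (predicted_box[0] + predicted_box[2]) / 2
    pcy = (predicted_box[1] + predicted_box[3]) / 2
    dcx = (detection_box[0] + detection_box[2]) / 2
    dcy = (detection_box[1] + detection_box[3]) / 2
    if math.hypot(pcx - dcx, pcy - dcy) > gate_radius:
        return math.inf
    return 1.0 - iou(predicted_box, detection_box)


predicted_box = (0, 0, 10, 10)
near_cost = gated_iou_cost(predicted_box, (5, 0, 15, 10), gate_radius=20)
far_cost = gated_iou_cost(predicted_box, (100, 100, 110, 110), gate_radius=20)
```

Because IoU is a ratio of areas, the same threshold works for large and small targets, which is the portability advantage over a raw pixel distance that the text points out. Detections outside the gate receive an infinite cost and can never be matched, which reduces identity switches at the price of a few extra misses.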

4.2. Performance Test Results Analysis of Motion Multitarget Tracking System in Sports Video. In this section, the proposed method is compared with the methods in the literature. First, experiments are conducted separately using combinations of different viewpoints to observe how multiple viewpoints improve tracking performance. Then, a fair comparison with existing methods is performed to analyse the advantages and shortcomings of the proposed method; the comparison results are shown in Figure 6.

The input–output evaluation of the experimental results is consistent with the evaluation system in the literature, using the same inputs and the same ground truths (GTs), which ensures the fairness of the comparison. As shown in Figure 6, the tracking system in this study achieved good results; although it did not exceed the literature overall, its 100% MT and lower number of identity switches (IDs) exceeded the other comparison methods. In the more difficult dense-motion scenarios, the best results were achieved by the method proposed in this paper, which indicates that Markov optimization models play a key role in dealing with dense scenes. Figure 6 also shows that the method in this study makes better use of multiview data for tracking: in all three experimental sequences, the tracking performance using data from all three views is better than that using data from only two views, which agrees with physical intuition and with the original purpose of multiview research.

In Figure 7, the three subplots of each plot show the simultaneous tracking results for the master view and the two slave views, respectively. The plots show that tracking with information from multiple viewpoints enables a more accurate estimation of the target trajectory, and that the same targets are correctly assigned the same tracking trajectory markers in different viewpoints.

A Markov random field (MRF) data association optimization model based on cross-view coupled trajectory fragments is introduced; it uses a new potential function enhancement method to effectively associate the trajectory fragments caused by dense motion. In this paper, the cross-view coupled trajectory fragments are obtained by a data fusion method based on image mutual information. This method calculates the spatial position relationship between cross-view 2D trajectory fragment pairs, considering both position and motion information, and corrects the position coordinates of stumped and deviated target detections in dense motion scenes using human key point detection. Modular and system-level experiments were carried out on the PETS 2009 dataset, and the results demonstrate the improvement that the image mutual information method and the MRF optimization method bring to data fusion and data association. Moreover, the tracking method in this study shows good multiview multitarget tracking performance when compared with current research methods.

To test the running speed of the multitarget tracking algorithm and how the overall running speed of the detection and tracking algorithm changes with the number of targets in the scene, three static surveillance videos in MOT17 were tested; the average time consumed by the algorithm is shown in Figure 8.

Figure 8 shows that the average time consumed by the tracking algorithm is only about 9 ms, which is fast. The average number of targets differs across the three test sequences; in scenes with more targets, the cost matrix to be solved during data association has a larger dimension, which increases the computational effort of the tracking stage. However, since the tracking stage accounts for only a small proportion of the overall running time, the difference in average frame rate is limited, which indicates that the number of targets in the scene has a limited impact on the tracking algorithm and causes only limited fluctuation in computation time. This paper optimizes the traditional classifier for cosine similarity to extract and effectively match deep appearance features of the target. A fusion and tracking strategy combining appearance feature similarity and spatial similarity in multitarget tracking is then proposed.
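The idea of fusing appearance similarity with spatial similarity can be illustrated with a short sketch. The paper does not give its fusion rule, so the weighted sum below and its weight are hypothetical; only the cosine similarity itself is a standard definition.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def fused_similarity(appearance_sim, spatial_sim, weight=0.5):
    """Hypothetical fusion rule: weighted sum of appearance and spatial similarity."""
    return weight * appearance_sim + (1 - weight) * spatial_sim


track_feature = [1.0, 0.0]       # stored appearance feature of a track (made-up values)
detection_feature = [2.0, 0.0]   # same direction, different magnitude
appearance = cosine_similarity(track_feature, detection_feature)
score = fused_similarity(appearance, spatial_sim=0.6)
```

Cosine similarity depends only on the direction of the feature vectors, not their length, which is why it suits features whose distribution is radial, as noted in the conclusion.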

The construction of the target similarity measure and the association cost is the core of data association based on the Hungarian algorithm. The tracking-gate-based intersection-over-union (IoU) of target bounding boxes used in this section more than doubles the tracking accuracy compared with using the Euclidean distance between target centre locations. The change to IoU as the distance metric plays the biggest role: it is applicable to multiscale targets, makes threshold parameters easy to set, and also has

Figure 6: Tracking system performance comparison results. (Line plot of values against epochs, both on a scale of 0 to 100; series: GTs, IDs, and MT.)

better tracking performance compared with the Euclidean distance metric.

Compared with the latest related research, our results show that efficiency and accuracy have improved by nearly 5%. On this basis, the tracking gate limit based on motion state estimation is introduced to further reduce the number of target identity switches and the number of false alarms in the tracking process. However, the method also has shortcomings: the number of missed targets and tracking fragments in the tracking results increases slightly because of the added limits.

5. Conclusion

This paper focuses on motion multitarget tracking technology in video surveillance and investigates the multitarget online tracking problem in depth from four aspects: motion target detection, motion state estimation, data association, and appearance feature modelling. The construction of a feature pyramid network for multiscale target prediction, combined with a multiscale training mechanism, effectively improves the quality of multiscale motion target detection in complex surveillance scenes, and the clustering optimization of the a priori box parameters effectively reduces the rates of missed and false detection of motion targets. Based on the target detection results, the overall detection-based multitarget tracking scheme is designed in combination with video surveillance application scenarios, and a multitarget tracking algorithm that fuses appearance feature metrics with spatial similarity is proposed. When a classifier-based cross-entropy loss function is used for training, the resulting feature distribution is radial and does not match the distance used by common feature similarity metrics; to address this problem, this paper optimizes the traditional classifier for cosine similarity to extract and effectively match deep appearance features of the target. A fusion and tracking strategy combining appearance feature similarity and spatial similarity is then proposed. To address the lack of an effective update mechanism for target features in multitarget tracking, a sparse update mechanism with intertarget occlusion judgment is proposed to effectively prevent tracking drift under intertarget occlusion. The effectiveness of the method is verified by experimental results on three sets of surveillance video sequences.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

[1] J. Hulsmann, J. Traub, and V. Markl, “Demand-based sensor data gathering with multi-query optimization,” Proceedings of the VLDB Endowment, vol. 13, no. 12, pp. 2801–2804, 2020.

[2] A. Belhadi, Y. Djenouri, J. C.-W. Lin, and A. Cano, “Trajectory outlier detection,” ACM Transactions on Management Information Systems, vol. 11, no. 3, pp. 1–29, 2020.

[3] A. Zappone, M. Di Renzo, and M. Debbah, “Wireless networks design in the era of deep learning: model-based, AI-based, or both?” IEEE Transactions on Communications, vol. 67, no. 10, pp. 7331–7376, 2019.

Figure 7: Trace result graph.

Figure 8: Test elapsed time and average frame rate of multitarget tracking algorithm on MOT17 dataset. (Bar chart of values against sequence for MOT17-02, MOT17-04, MOT17-09, and All; series: average frame rate, average time per frame for detection and tracking, tracking average time per frame, and average number of targets per frame.)

[4] A B Hamida A Benoit P Lambert and C B Amar ldquo3-Ddeep learning approach for remote sensing image classifica-tionrdquo IEEE Transactions on Geoscience and Remote Sensingvol 56 no 8 pp 4420ndash4434 2018

[5] A Farasat G Gross R Nagi and A G Nikolaev ldquoSocialnetwork analysis with data fusionrdquo IEEE Transactions onComputational Social Systems vol 3 no 2 pp 88ndash99 2016

[6] H A Pierson and M S Gashler ldquoDeep learning in robotics areview of recent researchrdquo Advanced Robotics vol 31 no 16pp 821ndash835 2017

[7] L Li K Ota andM Dong ldquoDeep learning for smart industryefficient manufacture inspection system with fog computingrdquoIEEE Transactions on Industrial Informatics vol 14 no 10pp 4665ndash4673 2018

[8] A Jalali and H Farsi ldquoA new steganography algorithm basedon video sparse representationrdquo Multimedia Tools and Ap-plications vol 79 no 3-4 pp 1821ndash1846 2020

[9] H Song J J -iagarajan P Sattigeri and A SpaniasldquoOptimizing kernel machines using deep learningrdquo IEEETransactions on Neural Networks and Learning Systemsvol 29 no 11 pp 5528ndash5540 2018

[10] S De L Bruzzone A Bhattacharya F Bovolo andS Chaudhuri ldquoA novel technique based on deep learning anda synthetic target database for classification of urban areas inPolSAR datardquo IEEE Journal of Selected Topics in Applied EarthObservations and Remote Sensing vol 11 no 1 pp 154ndash1702018

[11] L Jiang L Yan Y Xia Q Guo M Fu and K Lu ldquoAsyn-chronous multirate multisensor data fusion over unreliablemeasurements with correlated noiserdquo IEEE Transactions onAerospace and Electronic Systems vol 53 no 5 pp 2427ndash2437 2017

[12] H Wu Z Zhang C Jiao C Li and T Q S Quek ldquoLearn tosense a meta-learning-based sensing and fusion frameworkfor wireless sensor networksrdquo IEEE Internet of ltings Journalvol 6 no 5 pp 8215ndash8227 2019

[13] Z Zhao X Wang and T Wang ldquoA novel measurement dataclassification algorithm based on SVM for tracking closelyspaced targetsrdquo IEEE Transactions on Instrumentation andMeasurement vol 68 no 4 pp 1089ndash1100 2019

[14] M A Al-Jarrah A Al-Dweik M Kalil and S S Ikki ldquoDe-cision fusion in distributed cooperative wireless sensor net-worksrdquo IEEE Transactions on Vehicular Technology vol 68no 1 pp 797ndash811 2019

[15] M Pourshamsi M Garcia M Lavalle and H Balzter ldquoAmachine-learning approach to PolInSAR and LiDAR datafusion for improved tropical forest canopy height estimationusing NASA AfriSAR Campaign datardquo IEEE Journal of Se-lected Topics in Applied Earth Observations and RemoteSensing vol 11 no 10 pp 3453ndash3463 2018

[16] N Mittal U Singh R Salgotra and B S Sohi ldquoAn energyefficient stable clustering approach using fuzzy extended greywolf optimization algorithm for WSNsrdquo Wireless Networksvol 25 no 8 pp 5151ndash5172 2019

[17] P Ghamisi R Gloaguen P M Atkinson et al ldquoMultisourceand multitemporal data fusion in remote sensing a com-prehensive review of the state of the artrdquo IEEE Geoscience andRemote Sensing Magazine vol 7 no 1 pp 6ndash39 2019

[18] H Zhang X Zhou Z Wang H Yan and J Sun ldquoAdaptiveconsensus-based distributed target tracking with dynamiccluster in sensor networksrdquo IEEE Transactions on Cyberneticsvol 49 no 5 pp 1580ndash1591 2019

[19] X Yuan and Y Pu ldquoParallel lensless compressive imaging viadeep convolutional neural networksrdquo Optics Express vol 26no 2 pp 1962ndash1977 2018

[20] D Nada M Bousbia-Salah and M Bettayeb ldquoMulti-sensordata fusion for wheelchair position estimation with unscentedKalman Filterrdquo International Journal of Automation andComputing vol 15 no 2 pp 207ndash217 2018

[21] V Radu C Tong S Bhattacharya et al ldquoMultimodal deeplearning for activity and context recognitionrdquo Proceedings ofthe ACM on Interactive Mobile Wearable and UbiquitousTechnologies vol 1 no 4 pp 1ndash27 2018

[22] Q Zhou and Y Zheng ldquoLong link wireless sensor routingoptimization based on improved adaptive ant colony algo-rithmrdquo International Journal of Wireless Information Net-works vol 27 no 103 pp 241ndash252 2019

[23] S Xia D Peng D Meng et al ldquoA fast adaptive k-means withno boundsrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence p 1 2020

[24] B Yong W Wei K C Li et al ldquoEnsemble machine learningapproaches for webshell detection in Internet of things en-vironmentsrdquo Transactions on Emerging TelecommunicationsTechnologies 2020

[25] J S Almeida P P Rebouccedilas Filho T Carneiro et alldquoDetecting Parkinsonrsquos disease with sustained phonation andspeech signals using machine learning techniquesrdquo PatternRecognition Letters vol 125 pp 55ndash62 2019


MDNet was proposed to address the issues described above by learning the commonality of these targets through a multidomain learning network structure.

In moving target detection, the video image sequences contain various kinds of noise, and the binary images obtained after thresholding usually have breaks and holes in the target area. These are processed with opening and closing operations to improve the detection accuracy.

The opening operation erodes the image and then dilates it, which eliminates burrs and isolated specks. The closing operation applies the same two steps in the opposite order (dilation followed by erosion) and can be used to fill holes in the image.

Take A as the target object and B as the structuring element (3 × 3 or 5 × 5). The opening of A by the structuring element B, where $\ominus$ denotes erosion and $\oplus$ denotes dilation, is denoted as

$$A \circ B = (A \ominus B) \oplus B \tag{6}$$
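The opening and closing operations can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the function names are hypothetical, images are nested lists of 0/1 values, the structuring element is assumed to be a k × k square, and pixels outside the image are assumed to be 0.

```python
def _offsets(k):
    """Offsets covered by a k x k square structuring element (assumed shape)."""
    r = k // 2
    return [(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)]


def erode(img, k=3):
    """Binary erosion: a pixel stays 1 only if every pixel under the element is 1.
    Pixels outside the image count as 0 (an assumption of this sketch)."""
    h, w = len(img), len(img[0])
    offs = _offsets(k)
    return [[int(all(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx] == 1
                     for dy, dx in offs))
             for x in range(w)] for y in range(h)]


def dilate(img, k=3):
    """Binary dilation: a pixel becomes 1 if any pixel under the element is 1."""
    h, w = len(img), len(img[0])
    offs = _offsets(k)
    return [[int(any(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx] == 1
                     for dy, dx in offs))
             for x in range(w)] for y in range(h)]


def opening(img, k=3):
    """Erosion followed by dilation: removes specks smaller than the element."""
    return dilate(erode(img, k), k)


def closing(img, k=3):
    """Dilation followed by erosion: fills holes smaller than the element."""
    return erode(dilate(img, k), k)
```

With a 3 × 3 element, opening removes an isolated noise pixel while keeping a solid 3 × 3 block, and closing fills a one-pixel hole inside a 5 × 5 block that does not touch the image border.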

Most visible light in nature can be composed from three primary colours, and the RGB colour space is built on this principle. Its three components correspond to the x-, y-, and z-axes; the origin represents black, and the diagonally opposite vertex of the cube in the figure represents white. Any coloured light D can be composed of R, G, and B in different proportions:

$$D = R(p) + G(p) + B(p) \tag{7}$$

From the schematic diagram of the two-frame difference method, we can see that the algorithm is simple to implement, requires few operations, and places small demands on the system. Because its core idea is to difference two neighbouring frames, it is less affected by lighting and shadow changes, which makes it well suited to scenes such as antiburglary video surveillance in warehouses. The same idea is also its weakness: when the contrast between the two images is low and the difference is not obvious, it is difficult to extract the features of the moving target. Moreover, since the time interval between two frames is very short and the result depends on the speed of the moving target, a slow-moving or stationary target may not be detected at all.

$$D_k(x, y) = \left| f_{k+1}(x + k, y + l) - f_k(x, y) \right| \tag{8}$$
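As a rough illustration of the two-frame difference, the sketch below thresholds the absolute difference of two greyscale frames stored as nested lists. The function name, the threshold value, and the choice of zero spatial offsets are assumptions made for the example; they are not taken from the paper.

```python
def frame_difference(prev_frame, curr_frame, threshold=25):
    """Return a binary motion mask: 1 where |curr - prev| exceeds the threshold.

    Frames are nested lists of grey levels with identical shape; the spatial
    offsets in the difference formula are assumed to be zero in this sketch.
    """
    return [[int(abs(c - p) > threshold) for p, c in zip(prow, crow)]
            for prow, crow in zip(prev_frame, curr_frame)]


prev_frame = [[10, 10, 10], [10, 10, 10]]
curr_frame = [[10, 200, 10], [10, 10, 10]]
mask = frame_difference(prev_frame, curr_frame)  # [[0, 1, 0], [0, 0, 0]]

# A stationary target produces an empty mask, which is the missed-detection
# case described in the text.
still = frame_difference(curr_frame, curr_frame)  # all zeros
```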

The core idea of the average background method is to calculate the mean and standard deviation of each pixel as the background model:

$$\mathrm{Average}(x, y) = \frac{1}{n} \sum_{i=1}^{n} \left| f_{i+1}(x + k, y + l) + f_i(x, y) \right| \tag{9}$$
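One common way to keep such a background model current is a running average controlled by a learning rate, in the spirit of the real-time background update mentioned in the abstract. The sketch below assumes that form; the function names, the learning rate `alpha`, and the threshold are illustrative choices rather than values from the paper.

```python
def update_background(background, frame, alpha=0.05):
    """Blend the new frame into the background with learning rate alpha.

    A small alpha adapts slowly (stable background); a large alpha adapts fast.
    """
    return [[(1.0 - alpha) * b + alpha * f for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]


def foreground_mask(background, frame, threshold=30):
    """Mark pixels that deviate from the background model by more than threshold."""
    return [[int(abs(f - b) > threshold) for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]


background = [[100.0, 100.0], [100.0, 100.0]]
frame = [[100, 200], [100, 100]]

mask = foreground_mask(background, frame)          # only the changed pixel is foreground
background = update_background(background, frame)  # the changed pixel drifts towards 200
```

Because the update is gradual, a slow change in the scene is absorbed into the background over many frames, while a fast-moving target still stands out in the mask.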

When tracking moving targets with complex shapes, the target model created by kernel tracking cannot represent the target accurately, whereas silhouette tracking can describe the shape adaptively. Silhouette tracking and kernel tracking share the same idea: both use the initial frame to build the target model. Silhouette tracking extracts the target contour based on the established target model and then uses contour matching to search the target area in subsequent video frames.

Figure 1: Network structure of machine learning algorithm for multitarget tracking (panels: template frame, 127 × 127 × 3; detection frame, 255 × 255 × 3; CNN feature extraction; classification branch, 17 × 17 × 2k; regression branch, 17 × 17 × 4k).

Common silhouette tracking methods fall into two categories: contour tracking methods and shape matching methods. Contour tracking applies when the target silhouettes in consecutive frames partially overlap; an energy function and a state-space model are used to estimate the position of the target silhouette in the current frame. The contour tracking algorithm can adjust the centre of gravity and size of the contour as the appearance of the moving target changes, whereas the shape matching method searches for the target by matching the shape of the established target model against the current video image. Each of the three tracking approaches (point, kernel, and silhouette tracking) has its own applicable conditions; they are compared in Table 1.

3.2. Multiobjective Scheme Design for Motion in Sports Video. A good multitarget tracking dataset often determines whether a tracking algorithm is judged good or bad. For a fair and objective comparison with other algorithms, the dataset must be complete and rich, contain all kinds of special cases, and reflect the accuracy and robustness of the algorithm. A complete multitarget tracking dataset should contain (1) different viewpoints: for example, SORT-based target tracking achieves high accuracy for fixed viewpoints in low-altitude surveillance video but low accuracy when the video switches to the moving viewpoint of an in-vehicle camera; (2) target scenes with different densities: if there are too few targets, the dataset cannot reflect how the algorithm handles target occlusion, target deformation, and interference from similar targets; (3) different light intensities: the pixel values of the same target at the same position differ greatly between a rainy day and a sunny day, and this difference can be used to evaluate the robustness of an algorithm; and (4) different video frame rates: the parameters of a multitarget tracking model are sensitive to the video frame rate, and this difference can likewise be used to evaluate robustness. The cost of our calculation method is about 80% of that of other studies, and the results can be improved by 5%.

To compare different algorithms comprehensively and fairly, this paper uses a publicly available dataset as the benchmark for measuring the performance of multitarget tracking algorithms [21]. It is the dataset used as the evaluation standard for multitarget tracking methods in the multitarget tracking series, and datasets of this class contain a rich set of scenarios that meet the requirements listed above. The dataset contains 14 video sequences in total: 7 form the training set and the remaining 7 form the test set, and the sequences differ considerably from one another. The viewpoint may be a top view or a flat view. The footage is captured by high-resolution cameras; each frame contains multiple detection boxes on average, and each sequence contains multiple targets.

Generative models are widely used in image super-resolution, face recognition and generation, image completion, and style transfer, but their application is still in its infancy in scenes with few detailed image features and heavy background interference, such as multitarget tracking. For image processing problems of this kind, with complex interfering information, a low proportion of positive-sample pixels in the field of view, and inconspicuous target features, it is usually difficult for a generative model to fully learn the distribution of the key data in the input samples, and the complex scene information strongly affects how accurately the model judges sample authenticity [22]. While tracking specific targets, a multitarget tracker may drift, follow the wrong target, or lose an identity because deformation of the target changes its apparent identity [23–25]. In online tracking, the stability and accuracy of the tracker can be improved effectively if more diverse samples (e.g., samples of the same target in multiple views and multiple motion states) can be generated from the limited number of samples of a specific target in the history frames. Therefore, this paper proposes a generative model based on a conditional variational autoencoder and a conditional generative adversarial network. It analyses and adjusts the target to be tracked in a multitarget tracking environment and generates multiview samples of a specific target, which expands the training space of that target and enhances the performance of the tracking algorithm.

The variational autoencoder (VAE), shown in Figure 2, is a type of autoencoder with a structure improved for generative tasks. In essence, it adds Gaussian noise to the hidden variable output by the encoder of a standard autoencoder (the encoder that computes the mean of the distribution), so that the decoder becomes more robust to noise during training, and it adds the KL divergence as a regularization term that constrains the encoder output towards zero mean. An additional encoder dynamically adjusts the strength of the Gaussian noise by computing the variance of the input distribution: during training of the decoder, when the reconstruction error is larger than the KL divergence, the noise is reduced to make fitting easier, and when the reconstruction error is smaller than the KL divergence, the noise is increased to make fitting harder, which improves the decoder’s ability to generate samples.

The variational autoencoder is based on variational inference, and its overall structure is a complete probabilistic inference model. It can estimate the marginal probability of the input image and maximize it to adjust the model parameters during training; it can also estimate the Gaussian posterior probability of the hidden variable, which realizes the decoding process from the hidden variable to the generated sample. According to the theory of variational inference, a lower-bound estimate can be obtained by computing the variational lower bound, and this estimate can be optimized with standard stochastic gradient descent, i.e., the model can be trained by backpropagation. This also means that the two-part structure of the variational autoencoder can be rebuilt with deep learning models to perform most tasks in computer vision.
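The two ingredients described above, Gaussian noise injected into the hidden variable and a KL regularization term, can be written down compactly for a diagonal Gaussian encoder. The sketch below is a generic illustration under that assumption; the function names and the use of a log-variance parameterization are choices made for the example, not details given in the paper.

```python
import math
import random


def kl_to_standard_normal(mu, log_var):
    """KL divergence between N(mu, diag(exp(log_var))) and N(0, I),
    summed over the latent dimensions (closed form for diagonal Gaussians)."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def reparameterize(mu, log_var, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), so that the sampling
    step stays differentiable with respect to mu and log_var."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


# The regularizer vanishes when the encoder already outputs a standard normal
# and grows as the mean moves away from zero.
kl_zero = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])  # 0.0
kl_shifted = kl_to_standard_normal([1.0], [0.0])          # 0.5
z = reparameterize([0.3, -1.2], [0.0, 0.0], random.Random(0))
```

The training objective would add this KL term to a reconstruction error; larger variances inject more noise into `z`, which is the trade-off the text describes.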

4. Analysis of Results

4.1. Validation of Multitarget Tracking Data Enhancement Effect. The ultimate goal of the proposed method is to compensate for the lack of data diversity in multitarget tracking tasks. The proposed model is therefore validated on two approaches to multitarget tracking: a tracking method based on single-target tracking enhancement and a tracking method based on a Siamese (twin) neural network. The fundamental difference between the two approaches is the update mechanism.

The Siamese-network-based tracking method has no online update process during tracking, which avoids the error that accumulates under constant updating. However, it relies on offline training, and the training is directed at the bounding boxes containing the target to be tracked, not at the global information of a whole frame. For multitarget tracking tasks, the datasets available for training offer limited diversity. When the trained Siamese networks are used for tracking, the frequent interactions and occlusions caused by the uncertain number of targets in the field of view can greatly reduce the similarity-metric accuracy of networks trained on single samples. The model proposed in this paper expands the dataset during training and uses the generative model to generate multiangle and multipose samples of the same identity, which makes the Siamese network more robust in complex situations, as shown in Figure 3.

Both methods are optimized with their corresponding data generation methods, and the final performance is verified on the MOT Challenge 17 dataset; the results are shown in Figure 3. The experimental results show that expanding the data diversity with the proposed generative model effectively reduces the impact of deformation and occlusion on both methods: False Negatives (FN) are reduced by 31 and 48, respectively, and Identity Switches (ID Sw) by 225 and 183, respectively. Because the training space is enriched, the recognition rate and long-term tracking performance of the two methods also improve, with the index rising by 0.4 and 0.7, respectively. In summary, as a data generation method, the proposed approach enriches the diversity of multitarget tracking data and effectively improves overall multitarget tracking performance.

Figure 2: Working principle of the variational autoencoder (inputs X1–X5 are encoded into means 1–5 and variances 1–5, which define normal distributions 1–5; samples Z1–Z5 pass through the generator to produce outputs Y1–Y5).

Table 1: Comparison of the three tracking methods.

Tracking method | Advantage | Disadvantage
Point tracking | Suitable for translational movements of small targets | Affected when the target shape is obscured
Kernel tracking | Works better when the target contour is a rigid geometric body | Cannot track effectively with a small number of feature points; highly sensitive to changes in illumination and target morphology
Silhouette tracking | Significantly effective for tracking nonrigid bodies with complex shapes | High requirements for establishing the target model

The multitarget tracking algorithm based on spatial information association uses motion estimation and prediction, combines a spatial distance metric with a tracking-gate restriction to construct the cost matrix for data association, and uses the Hungarian algorithm to solve for the tracking match with the minimum association cost. The experimental setup is shown in Figure 4. Three surveillance videos captured by fixed cameras in MOT17 are used as the test set, and only moving targets with visibility greater than 30% are considered. The specific parameters of the three videos are shown in Figure 4.
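To make the data association step concrete, the sketch below finds the minimum-cost one-to-one matching between tracks (rows) and detections (columns) of a small square cost matrix. It uses exhaustive search over permutations as a stand-in, which returns the same optimum the Hungarian algorithm would but scales factorially, so it is only suitable for illustration. The function name and the example costs are hypothetical.

```python
from itertools import permutations


def min_cost_assignment(cost):
    """Return (total_cost, pairs) for a square cost matrix.

    pairs is a list of (track_index, detection_index). Exhaustive search is
    used here for clarity; a real tracker would use the Hungarian algorithm,
    which reaches the same optimum in polynomial time.
    """
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda perm: sum(cost[i][perm[i]] for i in range(n)))
    total = sum(cost[i][best[i]] for i in range(n))
    return total, list(enumerate(best))


# Rows: predicted tracks; columns: detections in the current frame.
cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
total, pairs = min_cost_assignment(cost)  # total == 5, pairs == [(0, 1), (1, 0), (2, 2)]
```

Note that the greedy choice (track 1 taking detection 1 at cost 0) is not part of the optimum here, which is why a global assignment solver is used rather than per-track nearest matching.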

To quantitatively compare how different detection algorithms affect multitarget tracking, this paper compares the accuracy, precision, and several tracking quality indexes for the three groups of tracking results. The arrow next to the name of each index indicates the desired trend: a downward arrow means that smaller values are better, and an upward arrow means that larger values are better. In the experiments of this section, the confidence thresholds for the Faster RCNN and improved YOLO detection results are both set to 0.8, and the threshold of DPM is set to −1. The evaluation metrics of the three groups of tracking results are shown in Figure 5.

Figure 5 shows that the multitarget tracking results based on the improved YOLO algorithm perform better on four indexes, MOTA (accuracy), MT (mostly tracked), ML (mostly lost), and missed targets (FN), indicating that the detection and tracking algorithm in this paper not only reduces the number of missed targets but also improves the stability of target tracking. The tracking results based on the Faster RCNN detector achieve higher precision than those of DPM and YOLO, mainly because the two-stage Faster RCNN algorithm regresses the target bounding box more accurately, which leads to a higher intersection over union between the successfully tracked target and the ground-truth trajectory bounding box.

Figure 3: Effect of multitarget tracking data enhancement on tracking performance (indexes: ID Sw, FN, FP, ML, MT, IDF1, MOTA; series: MA, MA + DA, RegSiam, RegSiam + DA).

Figure 4: Multitarget tracking experimental test set parameters (x-axis: sequence, MOT17-02, MOT17-04, MOT17-09, All; y-axis: values; series: number of frames, number of marked track points, average number of targets per frame).

Figure 5: Comparison of motion multitarget tracking result indicators with different detection algorithms (indexes: MOTA, MOTP, MT, ML, GT, FP, FN, IDsw, FM; y-axis: percent; series: DPM, FRCNN, improved YOLO).

This comparison experiment tests the impact of the association cost metric on the data association algorithm using, in turn, the Euclidean distance between target box centroids, the intersection over union (IoU) between target boxes, and the proposed tracking-gate-based IoU of target boxes. In this paper, the distance metric for multitarget data association is the IoU between the detection result located inside the tracking gate and the bounding box of the predicted state, which more than doubles the tracking accuracy compared with the Euclidean distance between target centre positions. Switching to IoU as the distance metric contributes the most: it not only matches better than the Euclidean distance metric but also applies to targets of different sizes, with convenient parameter settings and high portability. Adding the tracking-gate restriction based on motion state estimation decreases both the number of identity switches and the number of false alarms during tracking. However, because of the tracking-gate restriction, the numbers of misses and track fragments in the tracking results also increase slightly.
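A minimal sketch of an association cost built from the intersection-over-union (IoU) ratio with a tracking gate is given below. Boxes are assumed to be `(x1, y1, x2, y2)` tuples, and the gate is assumed to be a circle of radius `gate_radius` around the predicted box centre; both the gate shape and the function names are illustrative assumptions, since the paper does not specify them here.

```python
import math


def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def centre(box):
    """Centre point of a box."""
    return (box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0


def gated_iou_cost(predicted, detection, gate_radius):
    """Association cost: 1 - IoU if the detection centre lies inside the
    (assumed circular) tracking gate around the prediction, else infinity."""
    (px, py), (dx, dy) = centre(predicted), centre(detection)
    if math.hypot(dx - px, dy - py) > gate_radius:
        return math.inf  # outside the gate: this pair can never be matched
    return 1.0 - iou(predicted, detection)


overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))                     # 1 / 7
far = gated_iou_cost((0, 0, 2, 2), (10, 10, 12, 12), 5.0)      # inf
same = gated_iou_cost((0, 0, 2, 2), (0, 0, 2, 2), 5.0)         # 0.0
```

Because IoU is a ratio of areas, the same threshold works for small and large targets, which is consistent with the portability argument in the text; the gate then rules out geometrically implausible pairs before the assignment is solved.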

4.2. Performance Test Results Analysis of Motion Multitarget Tracking System in Sports Video. In this section, the proposed method is compared with methods from the literature. First, experiments are conducted separately with different combinations of viewpoints to observe how multiple viewpoints improve tracking performance. Then, a fair comparison with existing methods is performed to analyse the advantages and shortcomings of the proposed method; the comparison results are shown in Figure 6.

The evaluation of the experimental results follows the evaluation system in the literature, using the same inputs and the same ground truths (GTs), which ensures a fair comparison. As shown in Figure 6, the tracking system in this study achieves good results: although its overall performance does not exceed that in the literature, its 100% MT and lower number of identity switches (IDs) surpass the other compared methods. In the more difficult dense-motion scenarios, the proposed method achieves the best results, which indicates that the Markov optimization model plays a key role in handling dense scenes. Figure 6 also shows that the method makes better use of multiview data: in all three experimental sequences, tracking with data from all three views outperforms tracking with data from only two views, which is physically reasonable and consistent with the original purpose of multiview research.

In Figure 7, the three subplots of each plot show the simultaneous tracking results for the master view and the two slave views, respectively. The plots show that tracking with information from multiple viewpoints yields a more accurate estimate of the target trajectory and that the same targets are correctly assigned the same trajectory labels in different viewpoints.

A Markov random field (MRF) data association optimization model based on cross-view coupled trajectory fragments is introduced; it uses a new potential function enhancement method to effectively associate the trajectory fragments caused by dense motion. In this paper, the cross-view coupled trajectory fragments are obtained by a data fusion method based on image mutual information. This method computes the spatial position relationship between pairs of cross-view 2D trajectory fragments, considering both position and motion information, and corrects the position coordinates of truncated and deviated target detections in dense motion scenes using human keypoint detection. Module-level and system-level experiments on the PETS 2009 dataset demonstrate the improvement that the image mutual information method and the MRF optimization method bring to data fusion and data association, respectively. The tracking method in this study also shows good multiview multitarget tracking performance compared with current methods.

To test the running speed of the multitarget tracking algorithm in this section, and how the overall running speed of the detection and tracking pipeline changes with the number of targets in the scene, three static surveillance videos in MOT17 were tested; the average time consumed by the algorithm is shown in Figure 8.

Figure 8 shows that the tracking algorithm in this section takes only about 9 ms per frame on average, which is fast. The average number of targets differs across the three test sequences, so scenes with more targets produce a larger cost matrix in the data association step, which increases the computation in the tracking stage. However, because the tracking stage accounts for a small share of the overall running time, the difference in average frame rate is limited, which indicates that the number of targets in the scene has only a limited effect on the computation time of the tracking algorithm. This paper optimizes the traditional classifier for cosine similarity to achieve the extraction and effective matching of deep appearance features of the target. A fusion and tracking strategy combining appearance feature similarity and spatial similarity in multitarget tracking is then proposed.
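The idea of matching appearance features by cosine similarity and fusing the result with a spatial similarity can be sketched as follows. The weighted-sum fusion and the `weight` parameter are assumptions made purely for illustration; the paper does not state the fusion formula in this section, and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def fused_similarity(appearance_sim, spatial_sim, weight=0.5):
    """Assumed fusion rule: a convex combination of appearance and spatial similarity."""
    return weight * appearance_sim + (1.0 - weight) * spatial_sim


# Cosine similarity ignores feature magnitude, so a scaled copy of a feature
# vector still matches perfectly, while orthogonal features score zero.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
score = fused_similarity(appearance_sim=1.0, spatial_sim=0.0, weight=0.5)  # 0.5
```

The spatial term could be, for example, an overlap ratio between the predicted and detected boxes; the fused score would then be converted to a cost before solving the assignment.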

The construction of the target similarity measure and association cost is the core of data association based on the Hungarian algorithm. The tracking-gate-based IoU of target bounding boxes used in this section more than doubles the tracking accuracy compared with the Euclidean distance between target centre locations. Switching to IoU as the distance metric plays the biggest role: it applies to multiscale targets, makes threshold parameters easy to set, and gives better tracking performance than the Euclidean distance metric.

Figure 6: Tracking system performance comparison results (x-axis: epochs; y-axis: values; series: GTs, IDs, MT).

Compared with the latest related research, our results show that efficiency and accuracy improve by nearly 5%. On this basis, the tracking-gate limit based on motion state estimation is introduced to further reduce the number of identity switches and false alarms during tracking. However, the method also has shortcomings: the numbers of missed targets and track fragments in the tracking results increase slightly because of the added limit.

5. Conclusion

This paper focuses on moving multitarget tracking technology in video surveillance and investigates the online multitarget tracking problem in depth from four aspects: moving target detection, motion state estimation, data association, and appearance feature modelling. The construction of a feature pyramid network for multiscale target prediction, combined with a multiscale training mechanism, effectively improves the quality of multiscale moving target detection in complex surveillance scenes and, together with the clustering optimization of the prior (anchor) box parameters, effectively reduces the rates of missed and false detections of moving targets. Based on the target detection results, the overall detection-based multitarget tracking scheme is designed for video surveillance application scenarios, and a multitarget tracking algorithm that fuses appearance feature metrics with spatial similarity is proposed. When a classifier is trained with the cross-entropy loss, the resulting feature distribution is radial and does not match the distance used by common feature similarity metrics; to address this problem, this paper optimizes the traditional classifier for cosine similarity to achieve the extraction and effective matching of deep appearance features of the target. A fusion and tracking strategy that combines appearance feature similarity and spatial similarity in multitarget tracking is then proposed. To address the lack of an effective update mechanism for target features in multitarget tracking, a sparse update mechanism with intertarget occlusion judgment is proposed to effectively prevent tracking drift under intertarget occlusion. The effectiveness of the method is verified by experimental results on three sets of surveillance video sequences.

Data Availability

-e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

-e authors declare that they have no conflicts of interest

References

[1] J Hulsmann J Traub and V Markl ldquoDemand-based sensordata gathering with multi-query optimizationrdquo Proceedings ofthe VLDB Endowment vol 13 no 12 pp 2801ndash2804 2020

[2] A Belhadi Y Djenouri J C-W Lin and A Cano ldquoTrajectoryoutlier detectionrdquo ACM Transactions on Management In-formation Systems vol 11 no 3 pp 1ndash29 2020

[3] A Zappone M Di Renzo and M Debbah ldquoWireless net-works design in the era of deep learning model-based AI-

Figure 7 Trace result graph

MOT17-02 MOT17-04 MOT17-09 All0

20

40

60

80

100

120

140

Val

ues

Sequence

Average frame rate

Average time per frame fordetection and tracking

Tracking average timeper frameAverage number of targetsper frame

Figure 8 Test elapsed time and average frame rate of multitargettracking algorithm on MOT17 dataset

Complexity 9

based or bothrdquo IEEE Transactions on Communicationsvol 67 no 10 pp 7331ndash7376 2019

[4] A B Hamida A Benoit P Lambert and C B Amar ldquo3-Ddeep learning approach for remote sensing image classifica-tionrdquo IEEE Transactions on Geoscience and Remote Sensingvol 56 no 8 pp 4420ndash4434 2018

[5] A Farasat G Gross R Nagi and A G Nikolaev ldquoSocialnetwork analysis with data fusionrdquo IEEE Transactions onComputational Social Systems vol 3 no 2 pp 88ndash99 2016

[6] H A Pierson and M S Gashler ldquoDeep learning in robotics areview of recent researchrdquo Advanced Robotics vol 31 no 16pp 821ndash835 2017

[7] L Li K Ota andM Dong ldquoDeep learning for smart industryefficient manufacture inspection system with fog computingrdquoIEEE Transactions on Industrial Informatics vol 14 no 10pp 4665ndash4673 2018

[8] A Jalali and H Farsi ldquoA new steganography algorithm basedon video sparse representationrdquo Multimedia Tools and Ap-plications vol 79 no 3-4 pp 1821ndash1846 2020

[9] H Song J J -iagarajan P Sattigeri and A SpaniasldquoOptimizing kernel machines using deep learningrdquo IEEETransactions on Neural Networks and Learning Systemsvol 29 no 11 pp 5528ndash5540 2018

[10] S De L Bruzzone A Bhattacharya F Bovolo andS Chaudhuri ldquoA novel technique based on deep learning anda synthetic target database for classification of urban areas inPolSAR datardquo IEEE Journal of Selected Topics in Applied EarthObservations and Remote Sensing vol 11 no 1 pp 154ndash1702018

[11] L Jiang L Yan Y Xia Q Guo M Fu and K Lu ldquoAsyn-chronous multirate multisensor data fusion over unreliablemeasurements with correlated noiserdquo IEEE Transactions onAerospace and Electronic Systems vol 53 no 5 pp 2427ndash2437 2017

[12] H Wu Z Zhang C Jiao C Li and T Q S Quek ldquoLearn tosense a meta-learning-based sensing and fusion frameworkfor wireless sensor networksrdquo IEEE Internet of ltings Journalvol 6 no 5 pp 8215ndash8227 2019

[13] Z Zhao X Wang and T Wang ldquoA novel measurement dataclassification algorithm based on SVM for tracking closelyspaced targetsrdquo IEEE Transactions on Instrumentation andMeasurement vol 68 no 4 pp 1089ndash1100 2019

[14] M A Al-Jarrah A Al-Dweik M Kalil and S S Ikki ldquoDe-cision fusion in distributed cooperative wireless sensor net-worksrdquo IEEE Transactions on Vehicular Technology vol 68no 1 pp 797ndash811 2019

[15] M Pourshamsi M Garcia M Lavalle and H Balzter ldquoAmachine-learning approach to PolInSAR and LiDAR datafusion for improved tropical forest canopy height estimationusing NASA AfriSAR Campaign datardquo IEEE Journal of Se-lected Topics in Applied Earth Observations and RemoteSensing vol 11 no 10 pp 3453ndash3463 2018

[16] N Mittal U Singh R Salgotra and B S Sohi ldquoAn energyefficient stable clustering approach using fuzzy extended greywolf optimization algorithm for WSNsrdquo Wireless Networksvol 25 no 8 pp 5151ndash5172 2019

[17] P Ghamisi R Gloaguen P M Atkinson et al ldquoMultisourceand multitemporal data fusion in remote sensing a com-prehensive review of the state of the artrdquo IEEE Geoscience andRemote Sensing Magazine vol 7 no 1 pp 6ndash39 2019

[18] H Zhang X Zhou Z Wang H Yan and J Sun ldquoAdaptiveconsensus-based distributed target tracking with dynamiccluster in sensor networksrdquo IEEE Transactions on Cyberneticsvol 49 no 5 pp 1580ndash1591 2019

[19] X Yuan and Y Pu ldquoParallel lensless compressive imaging viadeep convolutional neural networksrdquo Optics Express vol 26no 2 pp 1962ndash1977 2018

[20] D Nada M Bousbia-Salah and M Bettayeb ldquoMulti-sensordata fusion for wheelchair position estimation with unscentedKalman Filterrdquo International Journal of Automation andComputing vol 15 no 2 pp 207ndash217 2018

[21] V Radu C Tong S Bhattacharya et al ldquoMultimodal deeplearning for activity and context recognitionrdquo Proceedings ofthe ACM on Interactive Mobile Wearable and UbiquitousTechnologies vol 1 no 4 pp 1ndash27 2018

[22] Q Zhou and Y Zheng ldquoLong link wireless sensor routingoptimization based on improved adaptive ant colony algo-rithmrdquo International Journal of Wireless Information Net-works vol 27 no 103 pp 241ndash252 2019

[23] S Xia D Peng D Meng et al ldquoA fast adaptive k-means withno boundsrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence p 1 2020

[24] B Yong W Wei K C Li et al ldquoEnsemble machine learningapproaches for webshell detection in Internet of things en-vironmentsrdquo Transactions on Emerging TelecommunicationsTechnologies 2020

[25] J S Almeida P P Rebouccedilas Filho T Carneiro et alldquoDetecting Parkinsonrsquos disease with sustained phonation andspeech signals using machine learning techniquesrdquo PatternRecognition Letters vol 125 pp 55ndash62 2019

10 Complexity


Common silhouette tracking methods fall into two categories: contour tracking methods and shape matching methods. Silhouette tracking applies when the target silhouettes partially overlap between consecutive frames; an energy function and a state-space model are used to estimate the position of the target silhouette in the current frame. The contour tracking algorithm can adjust the centre of gravity and size of the contour according to the changing appearance of the moving target, whereas the shape matching method searches for the target shape by matching the established target model shape against the current video image. All three tracking algorithms have their own applicable conditions; the three methods are compared and analysed in Table 1.

3.2. Multiobjective Scheme Design for Motion in Sports Video. The quality of a multitarget tracking dataset largely determines how well a tracking algorithm can be judged. For a fair and objective comparison with other algorithms, the dataset must be complete and rich, contain all kinds of special cases, and reflect the accuracy and robustness of the algorithm. A complete multitarget tracking dataset should contain (1) different viewpoints: for example, SORT-based target tracking achieves high accuracy for fixed viewpoints in low-altitude surveillance video but low accuracy when the video switches to a moving viewpoint such as in-vehicle video; (2) target scenes with different densities: if there are too few targets, the dataset cannot reflect how the algorithm handles target occlusion, target deformation, and interference from similar targets; (3) different light intensities: the pixel values at the corresponding position of the same target differ greatly between a rainy day and a sunny day, and this difference can be used to evaluate the robustness of an algorithm; (4) different video frame rates: the relevant parameters of the tracking model are sensitive to the frame rate, and this difference can likewise be used to evaluate the robustness of an algorithm. The cost of our calculation method is about 80% of that of other studies, and the results can be improved by 5%.

To make a comprehensive and fair comparison of different algorithms, this paper uses a publicly available dataset as the benchmark for measuring the performance of multitarget tracking algorithms [21]. The dataset is the standard benchmark of the multitarget tracking series. It contains a rich set of scenarios that meet the requirements described above. The dataset contains a total of 14 video sequences, of which 7 form the training set and the remaining 7 form the test set, and the sequences differ greatly from one another. The viewpoint may be a top view or a flat view. The footage is captured by high-resolution cameras; on average each frame contains multiple detection boxes, and each sequence contains multiple targets.

Generative models are widely used in image super-resolution, face recognition and generation, image completion, and style transfer, whereas their application is still in its infancy in scenes with less detailed image features and more background interference, such as multitarget tracking. For this kind of image processing problem, with complex interference, a low percentage of positive-sample pixels in the field of view, and inconspicuous target features, it is usually difficult for the generative model to fully learn the distribution of the key data in the input samples, and the complex scene information strongly affects the accuracy with which the model judges sample authenticity [22]. While tracking specific targets, a multitarget tracker may drift, follow the wrong target, or lose an identity when the appearance of the target changes through deformation [23–25]. In online tracking, the stability and accuracy of the tracker can be effectively improved if more diverse samples (e.g., samples of the same target in multiple views and multiple motion states) can be generated from the limited number of samples of a specific target in the history frames. Therefore, this paper proposes a generative model that combines a conditional variational autoencoder with a conditional generative adversarial network. The model analyses and adjusts the target to be tracked in a multitarget tracking environment and generates multiview samples for a specific target, which expands the training space of that target and enhances the performance of the tracking algorithm.

The variational autoencoder (VAE), shown in Figure 2, is a type of autoencoder whose structure is improved for generative tasks. In essence, it adds Gaussian noise to the hidden variable output by the encoder of a standard autoencoder (this encoder computes the mean of the distribution), so that the decoder becomes more robust to noise during training, and it adds the KL divergence as a regularization term that constrains the mean of the hidden variable towards zero. In addition, a second encoder dynamically adjusts the strength of the Gaussian noise by computing the variance of the input distribution: during training of the decoder, when the reconstruction error is larger than the KL divergence, the noise is reduced to make fitting easier, and when the reconstruction error is smaller than the KL divergence, the noise is increased to make fitting harder, which improves the ability of the decoder to generate samples.

The principle of the variational autoencoder is based on variational inference, and the overall structure is a complete probabilistic inference model. It can estimate the marginal probability of the input image and maximize it to adjust the model parameters during training; it can also estimate the Gaussian posterior probability of the hidden variable to realize the decoding process from the hidden variable to the generated sample. Based on the theory of variational inference, an estimate of the variational lower bound can be computed from the model parameters, and this estimate can be optimized with standard stochastic gradient descent; that is, the model can be trained by backpropagation. This also means that both parts of the variational autoencoder can be built from deep learning models to perform most tasks in computer vision.
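The objective described above can be illustrated with a minimal sketch. It assumes a diagonal Gaussian posterior and a squared-error reconstruction term; the paper does not state its exact loss, so these choices and all function names are illustrative only. The functions compute the KL regularizer, draw a hidden variable by adding scaled Gaussian noise to the encoder mean, and combine both terms into the quantity that gradient descent would minimize.

```python
import math
import random


def kl_to_standard_normal(mean, log_var):
    """KL divergence between N(mean, exp(log_var)) and N(0, 1), summed over dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mean, log_var))


def sample_latent(mean, log_var, rng):
    """Reparameterization: z = mean + sigma * eps, with eps drawn from N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mean, log_var)]


def negative_lower_bound(x, x_reconstructed, mean, log_var):
    """Squared reconstruction error plus the KL regularizer (assumed loss form)."""
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_reconstructed))
    return reconstruction + kl_to_standard_normal(mean, log_var)


# Five hidden dimensions, mirroring the five branches drawn in Figure 2.
rng = random.Random(0)
z = sample_latent([0.0] * 5, [0.0] * 5, rng)
```

The variance branch controls how much noise reaches the decoder: a very negative `log_var` makes the sample collapse onto the mean, while `log_var = 0` reproduces unit-variance noise and a zero KL penalty when the mean is also zero.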

4. Analysis of Results

4.1. Validation of Multitarget Tracking Data Enhancement Effect. The ultimate goal of the proposed method is to compensate for the lack of data diversity in multitarget tracking tasks. The proposed model is therefore validated on two approaches to multitarget tracking: a tracking method based on single-target tracking enhancement and a tracking method based on a twin (Siamese) neural network. The fundamental difference between the two approaches is the update mechanism.

The twin neural network-based tracking method has no online update process during tracking, which avoids the cumulative error that arises from constant updating. However, this method relies on offline training, and the training process is directed at the bounding boxes containing the target to be tracked, not at the global information of a single frame. For multitarget tracking tasks, the datasets available for training are not very diverse. When the trained twin neural network is used for tracking, the frequent interactions and occlusions caused by the uncertain number of targets in the field of view can greatly reduce the accuracy of the similarity metric of a network trained on single samples. The model proposed in this paper expands the dataset during training and uses the generative model to generate multiangle and multipose states for samples of the same identity, which makes the twin neural network more robust to complex situations, as shown in Figure 3.

Both methods are optimized using their corresponding data generation methods, and the final performance is verified on the MOT Challenge 17 dataset; the results are shown in Figure 3. The experimental results show that expanding the data diversity with the proposed generative model effectively reduces the impact of deformation and occlusion on both methods: False Negatives (FN) are reduced by 31 and 48, respectively, and Identity Switches (ID Sw) by 225 and 183, respectively. Because the training space is enriched, the recognition rate and long-term tracking performance of the two methods also improve, with the index improving by 0.4 and 0.7, respectively. In summary, as a data generation method, the proposed method enriches the diversity of multitarget tracking data and effectively improves the overall performance of multitarget tracking.

Figure 2: Working principle of the variational autoencoder (inputs X1–X5 are each encoded to a mean and a variance that define normal distributions 1–5; the samples Z1–Z5 are passed through the generator to produce outputs Y1–Y5).

Table 1: Comparison of the analysis of three tracking methods.

| Tracking method | Advantage | Disadvantage |
| --- | --- | --- |
| Point tracking | Suitable for translational movements with small targets | When the target shape is obscured, point tracking |
| Kernel tracking | Works better when the target contour is a geometric rigid body | Cannot effectively track a small number of feature points; highly sensitive to changes in illumination and target morphology |
| Silhouette tracking | Has a significant effect on tracking nonrigid bodies with complex shapes | High requirements for establishing the target model |

The multitarget tracking algorithm based on spatial information association uses motion estimation and prediction, combines a spatial distance metric with a tracking gate restriction to construct the cost matrix for data association, and uses the Hungarian algorithm to solve for the matching with the minimum association cost. The experimental environment is as follows: three surveillance videos shot with a fixed camera in MOT17 are used as the test set, and only moving targets with visibility greater than 30% are considered. The specific parameters of the three videos are shown in Figure 4.
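The gated association step described above can be sketched as follows. This is a hedged illustration, not the authors' implementation: it uses the Euclidean distance between box centres as the spatial metric, a hypothetical `gate_radius` parameter for the tracking gate, and an exhaustive search over assignments in place of the Hungarian algorithm. For the handful of targets in the example the exhaustive search finds the same minimum-cost matching; a real tracker would use a polynomial-time solver.

```python
import math
from itertools import permutations

GATED = 1e6  # cost given to track/detection pairs that fall outside the tracking gate


def build_cost_matrix(predicted, detected, gate_radius):
    """Centre distance between each predicted track and each detection, with gating."""
    cost = []
    for px, py in predicted:
        row = []
        for dx, dy in detected:
            distance = math.hypot(px - dx, py - dy)
            row.append(distance if distance <= gate_radius else GATED)
        cost.append(row)
    return cost


def associate(predicted, detected, gate_radius):
    """Return {track index: detection index} for the minimum total association cost."""
    cost = build_cost_matrix(predicted, detected, gate_radius)
    n_tracks, n_dets = len(predicted), len(detected)
    size = max(n_tracks, n_dets)
    # Pad to a square matrix so every track and detection appears in each assignment.
    padded = [[cost[i][j] if i < n_tracks and j < n_dets else GATED
               for j in range(size)] for i in range(size)]
    best = min(permutations(range(size)),
               key=lambda perm: sum(padded[i][perm[i]] for i in range(size)))
    # Keep only real pairs that passed the gate; the rest stay unmatched.
    return {i: j for i, j in enumerate(best)
            if i < n_tracks and j < n_dets and padded[i][j] < GATED}


matches = associate([(0, 0), (10, 10)], [(11, 10), (1, 0), (50, 50)], gate_radius=5)
```

In this example the third detection lies outside every gate, so it is left unmatched and could start a new track, while a track with no detection inside its gate would simply receive no match for that frame.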

To quantitatively compare the effects of different detection algorithms on multitarget tracking, this paper compares the accuracy, precision, and several tracking quality indexes for the three groups of tracking results. The arrow next to the name of each index indicates the desired trend: a downward arrow means that smaller values are better, and an upward arrow means that larger values are better. In these experiments, the confidence thresholds of the Faster RCNN and improved YOLO detection results are both set to 0.8, and the threshold of DPM is set to −1. The evaluation metrics of the three groups of tracking results are shown in Figure 5.

Figure 5 shows that the multitarget tracking results based on the improved YOLO algorithm perform better on four indexes, namely MOTA (accuracy), MT (mostly tracked), ML (mostly lost), and FN (missed targets), indicating that the detection and tracking algorithm in this paper not only reduces the number of missed targets but also improves the stability of target tracking. The tracking results based on the Faster RCNN detection algorithm are more precise than those of DPM and YOLO, mainly because the two-stage Faster RCNN algorithm regresses the target bounding box more accurately, which leads to a higher intersection over union (IoU) between the successfully tracked target and the ground-truth trajectory bounding box.
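For readers unfamiliar with the indexes discussed above, the sketch below shows how MOTA and the MT/ML labels are commonly computed in the multitarget tracking literature. The paper does not print its formulas, so the standard definitions are assumed here: MOTA aggregates false negatives, false positives, and identity switches over all ground-truth boxes, and a trajectory counts as mostly tracked at 80% coverage or more and mostly lost at 20% or less. The function and argument names are illustrative.

```python
def mota(false_negatives, false_positives, id_switches, ground_truth_boxes):
    """MOTA = 1 - (FN + FP + IDSW) / GT; larger is better and it can be negative."""
    if ground_truth_boxes <= 0:
        raise ValueError("ground_truth_boxes must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / ground_truth_boxes


def trajectory_class(tracked_fraction):
    """Label a ground-truth trajectory by the fraction of its frames that were tracked."""
    if tracked_fraction >= 0.8:
        return "MT"  # mostly tracked
    if tracked_fraction <= 0.2:
        return "ML"  # mostly lost
    return "PT"      # partially tracked


score = mota(false_negatives=20, false_positives=10, id_switches=5, ground_truth_boxes=100)
```

Because FN enters the numerator directly, a detector that misses fewer targets raises MOTA even when the data association step is unchanged, which matches the trend the text reports for the improved detector.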

Figure 3: Effect of multitarget tracking data enhancement on tracking performance (metrics: ID Sw, FN, FP, ML, MT, IDF1, MOTA; series: MA, MA + DA, RegSiam, RegSiam + DA).

Figure 4: Multitarget tracking experimental test set parameters (x-axis: sequence, MOT17-02, MOT17-04, MOT17-09, All; y-axis: values; series: number of frames, number of marked track points, average number of targets per frame).

Figure 5: Comparison of motion multiobjective tracking result indicators with different detection algorithms (x-axis: MOTA, MOTP, MT, ML, GT, FP, FN, IDsw, FM; y-axis: percent; series: DPM, FRCNN, improved YOLO).

This comparison experiment tests the impact of the association cost metric on the data association algorithm by using, in turn, the Euclidean distance between target box centroids, the intersection over union (IoU) between target boxes, and the proposed tracking-gate-based IoU of target boxes. In this paper, the distance metric for multitarget data association is the IoU between a detection located inside the tracking gate and the bounding box of the predicted state, which more than doubles the tracking accuracy compared with the Euclidean distance between target centre positions. The switch to IoU as the distance metric contributes the most: the metric not only matches better than the Euclidean distance but also applies to targets of different sizes, with convenient parameter settings and high portability. Adding the tracking gate restriction, which is based on motion state estimation, reduces both the number of target identity switches and the number of false alarms during tracking. However, because of the tracking gate restriction, the number of misses and tracking fragments in the tracking results also increases slightly.
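The scale argument made above for IoU can be checked with a small sketch. It assumes boxes given as `(x1, y1, x2, y2)` corner coordinates and a circular tracking gate around the predicted box centre with a hypothetical `gate_radius`; the paper does not specify the shape of its gate, so that part is an assumption.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def centre(box):
    """Centre point of a box."""
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)


def gated_iou_cost(predicted_box, detected_box, gate_radius):
    """Return 1 - IoU when the detection centre lies inside the gate, else None."""
    (px, py), (dx, dy) = centre(predicted_box), centre(detected_box)
    if math.hypot(px - dx, py - dy) > gate_radius:
        return None  # outside the tracking gate: not a candidate match
    return 1.0 - iou(predicted_box, detected_box)


# The same 5-pixel centre offset for a small and a large target.
small = iou((0, 0, 10, 10), (5, 0, 15, 10))
large = iou((0, 0, 100, 100), (5, 0, 105, 100))
```

A fixed Euclidean threshold treats both pairs identically because the centre offset is the same, whereas the IoU is much lower for the small target than for the large one. This is why an IoU threshold transfers across target sizes without retuning.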

4.2. Performance Test Results Analysis of Motion Multitarget Tracking System in Sports Video. In this section, the proposed method is compared with methods in the literature. First, experiments are conducted separately with different combinations of viewpoints to observe how multiple viewpoints improve tracking performance. Then, a fair comparison with existing methods is performed to analyse the advantages and shortcomings of the proposed method; the comparison results are shown in Figure 6.

The input–output evaluation of the experimental results is consistent with the evaluation system in the literature, using the same inputs and the same ground truths (GTs), which ensures the fairness of the comparison. As shown in Figure 6, the tracking system in this study achieves good results: although its overall performance does not exceed that reported in the literature, its 100% MT and lower number of identity switches (IDs) surpass the other compared methods. In the more difficult dense-motion scenarios, the proposed method achieves the best results, which indicates that the Markov optimization model plays a key role in dealing with dense scenes. Figure 6 also shows that the method makes better use of multiview data for tracking: in all three experimental sequences, tracking with data from all three views outperforms tracking with data from only two views, which follows the practical physical meaning and is consistent with the original purpose of multiview research.

In Figure 7, the three subplots of each plot show the simultaneous tracking results for the master view and the two slave views, respectively. The plots show that tracking with information from multiple viewpoints enables a more accurate estimate of the target trajectory. The same targets are also correctly assigned the same trajectory markers in different viewpoints.

A Markov random field (MRF) data association optimization model based on cross-view coupled trajectory fragments is introduced, with a new potential function enhancement method that effectively associates the trajectory fragments caused by dense motion. In this paper, the cross-view coupled trajectory fragments are obtained by a data fusion method based on image mutual information. This method computes the spatial position relationship between cross-view 2D trajectory fragment pairs, considering both position and motion information, and uses human key point detection to correct the position coordinates of truncated and deviated target detections in dense motion scenes. Modular and system-level experiments on the PETS 2009 dataset demonstrate the improvements brought by the image mutual information method and the MRF optimization method in data fusion and data association, respectively. The tracking method in this study also shows good multiview multitarget tracking performance compared with current research methods.

To test the running speed of the multitarget tracking algorithm, and how the overall running speed of the detection and tracking algorithm changes with the number of targets in the scene, three static surveillance videos in MOT17 were tested; the average time consumed by the algorithm is shown in Figure 8.

Figure 8 shows that the average time consumed by the tracking algorithm is only about 9 ms, which is fast. The average number of targets differs across the three test sequences, so the cost matrix to be solved during data association has a larger dimension in scenes with more targets, which increases the computational effort of the tracking stage. However, since the tracking stage accounts for only a small proportion of the overall running time, the difference in average frame rate is limited; the number of targets in the scene therefore has a limited impact on the tracking algorithm and causes only limited fluctuations in computation time. This paper optimizes the traditional classifier for cosine similarity to achieve the extraction and effective matching of deep appearance features of targets. Then, a fusion and tracking strategy that combines appearance feature similarity with spatial similarity in multitarget tracking is proposed.
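The fusion of appearance and spatial similarity mentioned above could take many forms, and the paper does not give its formula. As one plausible sketch, the code below combines the cosine similarity of two appearance feature vectors with a spatial similarity in the range 0 to 1 (for example an IoU value) through a weighted sum. The weight `alpha` and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors; 0.0 if either has zero norm."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def fused_similarity(feature_a, feature_b, spatial_similarity, alpha=0.5):
    """Weighted sum of appearance and spatial similarity (assumed fusion rule)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    appearance = cosine_similarity(feature_a, feature_b)
    return alpha * appearance + (1.0 - alpha) * spatial_similarity


# Features that differ only in magnitude point in the same direction.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
combined = fused_similarity([1.0, 0.0], [1.0, 0.0], spatial_similarity=0.5, alpha=0.5)
```

Cosine similarity ignores the length of the feature vector and compares only its direction, which suits features whose distribution is radial, as the text describes for classifiers trained with cross-entropy loss.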

The construction of the target similarity measure and the association cost is the core of data association based on the Hungarian algorithm. The tracking-gate-based bounding box IoU used here more than doubles the tracking accuracy compared with the Euclidean distance between target centre locations. The switch to IoU as the distance metric plays the biggest role: it is applicable to multiscale targets, makes threshold parameters easy to set, and yields better tracking performance than the Euclidean distance metric.

Figure 6: Tracking system performance comparison results (x-axis: epochs; y-axis: values; series: GTs, IDs, MT).

Compared with the latest related research, our results show that efficiency and accuracy improve by nearly 5%. On this basis, the tracking gate limit based on motion state estimation is introduced to further reduce the number of target identity switches and the number of false alarms during tracking. However, the method also has shortcomings: the number of missed targets and tracking fragments in the tracking results increases slightly because of the added limits.

5. Conclusion

This paper focuses on moving multitarget tracking technology in video surveillance and investigates in depth the online multitarget tracking problem from four aspects: moving target detection, motion state estimation, data association, and appearance feature modelling. Constructing a feature pyramid network for multiscale target prediction and combining it with a multiscale training mechanism effectively improves the quality of multiscale moving target detection in complex surveillance scenes; combined with clustering optimization of the prior (anchor) box parameters, it effectively reduces the rates of missed and false detections of moving targets. Based on the target detection results, the overall detection-based multitarget tracking scheme is designed for video surveillance application scenarios, and a multitarget tracking algorithm that fuses appearance feature metrics with spatial similarity is proposed. When a classifier is trained with the cross-entropy loss function, the resulting feature distribution is radial and does not match the distances used by common feature similarity metrics; to address this problem, this paper optimizes the traditional classifier for cosine similarity to achieve the extraction and effective matching of deep appearance features of targets. Then, a fusion and tracking strategy that combines appearance feature similarity with spatial similarity in multitarget tracking is proposed. To address the lack of an effective update mechanism for target features in multitarget tracking, a sparse update mechanism with intertarget occlusion judgment is proposed, which effectively prevents tracking drift under intertarget occlusion. The effectiveness of the method is verified by experimental results on three sets of surveillance video sequences.

Figure 7: Tracking result graph.

Figure 8: Test elapsed time and average frame rate of the multitarget tracking algorithm on the MOT17 dataset (x-axis: sequence, MOT17-02, MOT17-04, MOT17-09, All; y-axis: values; series: average frame rate, average time per frame for detection and tracking, tracking average time per frame, average number of targets per frame).

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

[1] J. Hulsmann, J. Traub, and V. Markl, “Demand-based sensor data gathering with multi-query optimization,” Proceedings of the VLDB Endowment, vol. 13, no. 12, pp. 2801–2804, 2020.

[2] A. Belhadi, Y. Djenouri, J. C.-W. Lin, and A. Cano, “Trajectory outlier detection,” ACM Transactions on Management Information Systems, vol. 11, no. 3, pp. 1–29, 2020.

[3] A. Zappone, M. Di Renzo, and M. Debbah, “Wireless networks design in the era of deep learning: model-based, AI-based, or both?” IEEE Transactions on Communications, vol. 67, no. 10, pp. 7331–7376, 2019.

[4] A. B. Hamida, A. Benoit, P. Lambert, and C. B. Amar, “3-D deep learning approach for remote sensing image classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 56, no. 8, pp. 4420–4434, 2018.

[5] A. Farasat, G. Gross, R. Nagi, and A. G. Nikolaev, “Social network analysis with data fusion,” IEEE Transactions on Computational Social Systems, vol. 3, no. 2, pp. 88–99, 2016.

[6] H. A. Pierson and M. S. Gashler, “Deep learning in robotics: a review of recent research,” Advanced Robotics, vol. 31, no. 16, pp. 821–835, 2017.

[7] L. Li, K. Ota, and M. Dong, “Deep learning for smart industry: efficient manufacture inspection system with fog computing,” IEEE Transactions on Industrial Informatics, vol. 14, no. 10, pp. 4665–4673, 2018.

[8] A. Jalali and H. Farsi, “A new steganography algorithm based on video sparse representation,” Multimedia Tools and Applications, vol. 79, no. 3-4, pp. 1821–1846, 2020.

[9] H. Song, J. J. Thiagarajan, P. Sattigeri, and A. Spanias, “Optimizing kernel machines using deep learning,” IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 11, pp. 5528–5540, 2018.

[10] S. De, L. Bruzzone, A. Bhattacharya, F. Bovolo, and S. Chaudhuri, “A novel technique based on deep learning and a synthetic target database for classification of urban areas in PolSAR data,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 11, no. 1, pp. 154–170, 2018.

[11] L. Jiang, L. Yan, Y. Xia, Q. Guo, M. Fu, and K. Lu, “Asynchronous multirate multisensor data fusion over unreliable measurements with correlated noise,” IEEE Transactions on Aerospace and Electronic Systems, vol. 53, no. 5, pp. 2427–2437, 2017.

[12] H. Wu, Z. Zhang, C. Jiao, C. Li, and T. Q. S. Quek, “Learn to sense: a meta-learning-based sensing and fusion framework for wireless sensor networks,” IEEE Internet of Things Journal, vol. 6, no. 5, pp. 8215–8227, 2019.

[13] Z. Zhao, X. Wang, and T. Wang, “A novel measurement data classification algorithm based on SVM for tracking closely spaced targets,” IEEE Transactions on Instrumentation and Measurement, vol. 68, no. 4, pp. 1089–1100, 2019.

[14] M. A. Al-Jarrah, A. Al-Dweik, M. Kalil, and S. S. Ikki, “Decision fusion in distributed cooperative wireless sensor networks,” IEEE Transactions on Vehicular Technology, vol. 68, no. 1, pp. 797–811, 2019.

[15] M. Pourshamsi, M. Garcia, M. Lavalle, and H. Balzter, “A machine-learning approach to PolInSAR and LiDAR data fusion for improved tropical forest canopy height estimation using NASA AfriSAR Campaign data,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 11, no. 10, pp. 3453–3463, 2018.

[16] N. Mittal, U. Singh, R. Salgotra, and B. S. Sohi, “An energy efficient stable clustering approach using fuzzy extended grey wolf optimization algorithm for WSNs,” Wireless Networks, vol. 25, no. 8, pp. 5151–5172, 2019.

[17] P. Ghamisi, R. Gloaguen, P. M. Atkinson et al., “Multisource and multitemporal data fusion in remote sensing: a comprehensive review of the state of the art,” IEEE Geoscience and Remote Sensing Magazine, vol. 7, no. 1, pp. 6–39, 2019.

[18] H. Zhang, X. Zhou, Z. Wang, H. Yan, and J. Sun, “Adaptive consensus-based distributed target tracking with dynamic cluster in sensor networks,” IEEE Transactions on Cybernetics, vol. 49, no. 5, pp. 1580–1591, 2019.

[19] X. Yuan and Y. Pu, “Parallel lensless compressive imaging via deep convolutional neural networks,” Optics Express, vol. 26, no. 2, pp. 1962–1977, 2018.

[20] D. Nada, M. Bousbia-Salah, and M. Bettayeb, “Multi-sensor data fusion for wheelchair position estimation with unscented Kalman Filter,” International Journal of Automation and Computing, vol. 15, no. 2, pp. 207–217, 2018.

[21] V. Radu, C. Tong, S. Bhattacharya et al., “Multimodal deep learning for activity and context recognition,” Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, vol. 1, no. 4, pp. 1–27, 2018.

[22] Q. Zhou and Y. Zheng, “Long link wireless sensor routing optimization based on improved adaptive ant colony algorithm,” International Journal of Wireless Information Networks, vol. 27, no. 103, pp. 241–252, 2019.

[23] S. Xia, D. Peng, D. Meng et al., “A fast adaptive k-means with no bounds,” IEEE Transactions on Pattern Analysis and Machine Intelligence, p. 1, 2020.

[24] B. Yong, W. Wei, K. C. Li et al., “Ensemble machine learning approaches for webshell detection in Internet of things environments,” Transactions on Emerging Telecommunications Technologies, 2020.

[25] J. S. Almeida, P. P. Rebouças Filho, T. Carneiro et al., “Detecting Parkinson’s disease with sustained phonation and speech signals using machine learning techniques,” Pattern Recognition Letters, vol. 125, pp. 55–62, 2019.

Page 6: Machine Learning-Based Multitarget Tracking of Motion in

reconstructed using a deep learning model to perform most of the tasks in computer vision.

4. Analysis of Results

4.1. Validation of Multitarget Tracking Data Enhancement Effect. The ultimate goal of the proposed method is to compensate for the lack of data diversity in multitarget tracking tasks. The proposed model is therefore used to validate two approaches to multitarget tracking: a tracking method based on single-target tracking enhancement and a tracking method based on twin neural networks. The fundamental difference between the two approaches is the update mechanism.

The twin neural network-based tracking method has no online update process during tracking, which avoids the cumulative error that arises from constant updating. However, this method relies on offline training, and the training process is directed at the edges containing the target to be tracked, not the global information of a single frame. For multitarget tracking tasks, the datasets available for training are not very diverse. When the trained twin neural networks are used for tracking, the frequent interactions and occlusions caused by the uncertain number of targets in the field of view can greatly reduce the similarity-metric accuracy of twin neural networks trained on single samples. The model proposed in this paper expands the dataset during training and uses the generative model to generate multiangle and multipose states for samples of the same identity, which makes the twin neural network more robust to complex situations, as shown in Figure 3.

The above two methods are optimized using their corresponding data generation methods, and the final performance is verified on the MOT Challenge 17 dataset; the results are shown in Figure 3. The experimental results show that expanding the data diversity with the generation model proposed in this paper effectively reduces the impact of possible deformation and occlusion on both methods: False Negatives (FN) are reduced by 3.1% and 4.8%, respectively, and Identity Switches (ID Sw) are reduced by 22.5% and 18.3%, respectively. Because the training space is enriched, the recognition rate and long-term tracking performance of the two methods also improve, as shown by the improvement of the index by 0.4 and 0.7, respectively. In summary, as a data generation method, the proposed method enriches the diversity of multitarget tracking data and thereby improves the overall performance of multitarget tracking.

Figure 2: Working principle of the variational autoencoder (diagram labels: X1–X5; mean 1–5 and variance 1–5; normal distribution 1–5; Z1–Z5; builder; Y1–Y5).

Table 1: Comparison of the analysis of three tracking methods.

| Tracking method | Advantage | Disadvantage |
|---|---|---|
| Point tracking | Suitable for translational movements with small targets | When the target shape is obscured, point tracking |
| Kernel tracking | When the target contour is a geometric rigid body, the effect is better | Inability to effectively track a small number of feature points; high sensitivity to illumination and target morphology changes |
| Silhouette tracking | It has a significant effect on tracking nonrigid bodies with complex shapes | High requirements for target model establishment |

The multitarget tracking algorithm based on spatial information association uses motion estimation and prediction, combines a spatial distance metric with a tracking gate restriction to construct the cost matrix for data association, and uses the Hungarian algorithm to solve for the tracking match with the minimum association cost. The experimental environment is shown in Figure 4. Three surveillance videos recorded by fixed cameras in MOT17 are used as the test set, and only moving targets with visibility greater than 30% are considered. The specific parameters of the three videos are shown in Figure 4.
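As a minimal illustration of the minimum-cost matching step described above, the sketch below solves a small track-to-detection assignment problem. It is not the authors' implementation: for brevity it enumerates all permutations instead of running the Hungarian algorithm (both return a minimum-cost one-to-one matching, but brute force is only practical for a handful of targets), and the function name `min_cost_assignment` and the cost values are hypothetical.

```python
from itertools import permutations


def min_cost_assignment(cost):
    """Return (total_cost, pairs) for the cheapest one-to-one matching.

    cost[i][j] is the association cost between track i and detection j.
    Brute-force stand-in for the Hungarian algorithm; assumes a small
    square matrix (same number of tracks and detections).
    """
    n = len(cost)
    best_total, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_total, best_perm = total, perm
    return best_total, [(i, best_perm[i]) for i in range(n)]


# Hypothetical 3 x 3 cost matrix (rows: tracks, columns: detections).
example_cost = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
example_total, example_pairs = min_cost_assignment(example_cost)
print(example_total, example_pairs)
```

Note that the cheapest matching here does not simply give every track its individually cheapest detection: tracks 0 and 1 both prefer detection 1, so the solver trades off the conflict globally, which is the reason an assignment solver is used rather than a greedy nearest-neighbour rule.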

To quantitatively compare the effects of different detection algorithms on multitarget tracking, this paper compares the accuracy, precision, and several tracking quality evaluation indexes for the above three groups of tracking results. The arrow next to the name of each index indicates the desired trend: a downward arrow indicates that smaller values are better, and an upward arrow indicates that larger values are better. In the experiments of this chapter, the confidence thresholds of the Faster RCNN and improved YOLO detection results are uniformly set to 0.8, and the threshold of DPM is set to −1. The resulting evaluation metrics of the three groups of tracking results are shown in Figure 5.

Figure 5 shows that the multitarget tracking results based on the improved YOLO algorithm in this paper perform better on four indexes: MOTA (accuracy), MT (mostly tracked), ML (mostly lost), and missed targets (FN). This indicates that the detection and tracking algorithm in this paper not only reduces the number of missed targets but also improves the stability of target tracking. The tracking results based on the Faster RCNN detection algorithm are more accurate in bounding-box localization than those of DPM and YOLO, mainly because the two-stage Faster RCNN algorithm regresses the target bounding box more accurately, which leads to a higher intersection over union between the successfully tracked target and the ground-truth trajectory bounding box.
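For readers unfamiliar with these indexes, the following sketch shows how MOTA and the MT/ML labels are commonly computed under the standard CLEAR MOT and MOTChallenge conventions. The paper does not list its formulas, so the definitions used here (MOTA = 1 − (FN + FP + IDSW) / GT, MT for trajectories tracked for at least 80% of their lifespan, ML for at most 20%) are the conventional ones and are assumed rather than taken from the text; the function names and the numbers are hypothetical.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Multiple object tracking accuracy, summed over all frames.

    num_gt is the total number of ground-truth objects over all frames.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def trajectory_label(tracked_frames, lifespan_frames):
    """Label one ground-truth trajectory as MT, ML, or PT (partially tracked)."""
    ratio = tracked_frames / lifespan_frames
    if ratio >= 0.8:
        return "MT"
    if ratio <= 0.2:
        return "ML"
    return "PT"


# Hypothetical counts, for illustration only (not results from the paper).
example_mota = mota(false_negatives=10, false_positives=5, id_switches=5, num_gt=100)
print(example_mota, trajectory_label(90, 100), trajectory_label(10, 100))
```

Because FN, FP, and identity switches all enter the numerator, a detector that misses fewer targets raises MOTA directly, which is consistent with the improved YOLO results leading on both FN and MOTA.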

Figure 3: Effect of multitarget tracking data enhancement on tracking performance (metrics: ID Sw, FN, FP, ML, MT, IDF1, MOTA; series: MA, MA + DA, RegSiam, RegSiam + DA).

Figure 4: Multitarget tracking experimental test set parameters (x-axis: sequence, MOT17-02, MOT17-04, MOT17-09, All; y-axis: values; series: number of frames, number of marked track points, average number of targets per frame).

Figure 5: Comparison of motion multiobjective tracking result indicators with different detection algorithms (x-axis: type, MOTA, MOTP, MT, ML, GT, FP, FN, IDsw, FM; y-axis: percent; series: DPM, FRCNN, improved YOLO).

This comparison experiment is designed to test the impact of association cost metrics on data association algorithms by using, in turn, the Euclidean distance between target frame centroid locations, the intersection over union (IoU) between target frames, and the proposed tracking-gate-based IoU of target frames. In this paper, the distance metric for multitarget data association uses the IoU between the detection result located in the tracking gate and the predicted state bounding box, which more than doubles the tracking accuracy compared with using the Euclidean distance between the target centre positions. The change to IoU as the distance metric contributes most of this gain: the metric not only matches better than the Euclidean distance metric but also applies to targets of different sizes, with convenient parameter settings and high portability. Adding the tracking gate restriction based on the motion state estimation decreases both the number of identity switches and the number of false alarms in the tracking process. However, because of the added tracking gate restriction, the number of misses and tracking fragments in the tracking results also increases slightly.
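The tracking-gate-based IoU cost discussed above can be sketched as follows. This is an illustration under stated assumptions, not the paper's exact formulation: the gate is modelled as a circle of radius `gate_radius` around the centre of the predicted box (the paper does not specify the gate shape), the cost is taken as 1 − IoU, and `BLOCKED`, `gated_iou_cost`, and the box values are hypothetical names and numbers.

```python
import math

BLOCKED = 1e6  # assumed large cost that forbids a match outside the gate


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def centre(box):
    """Centre point of a box given as (x1, y1, x2, y2)."""
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)


def gated_iou_cost(predicted_box, detected_box, gate_radius):
    """Association cost: 1 - IoU inside the gate, BLOCKED outside it."""
    (px, py), (dx, dy) = centre(predicted_box), centre(detected_box)
    if math.hypot(dx - px, dy - py) > gate_radius:
        return BLOCKED
    return 1.0 - iou(predicted_box, detected_box)


predicted = (0, 0, 2, 2)  # hypothetical predicted state bounding box
print(gated_iou_cost(predicted, (1, 1, 3, 3), gate_radius=5.0))
print(gated_iou_cost(predicted, (100, 100, 102, 102), gate_radius=5.0))
```

Because IoU is a ratio of areas, the same cost threshold applies to large and small targets, whereas a Euclidean distance threshold in pixels has to be retuned for each target scale. The gate also explains the trade-off noted above: a detection that falls outside the gate can never be matched, which removes some wrong matches but can also drop a correct one after a fast movement.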

4.2. Performance Test Results Analysis of Motion Multitarget Tracking System in Sports Video. In this section, the proposed method is compared with the methods in the literature. First, experiments are conducted separately using combinations of different viewpoints to observe how multiple viewpoints improve tracking performance. Then, a fair comparison with existing methods is performed to analyse the advantages and shortcomings of the proposed method; the comparison results are shown in Figure 6.

The input–output evaluation of the experimental results is consistent with the evaluation system in the literature and uses the same inputs and the same ground truths (GTs), which ensures the fairness of the comparison. As shown in Figure 6, the tracking system in this study achieved good results; although its overall performance did not exceed that reported in the literature, its 100% MT and lower number of identity switches (IDs) exceeded the other comparison methods. In the more difficult dense motion scenarios, the best results were achieved by the method proposed in this paper. This indicates that Markov optimization models play a key role in dealing with dense scenes. Figure 6 also shows that the method in this study makes better use of multiview data for tracking; i.e., the tracking performance using data from all three views is better than that using data from only two views in all three experimental sequences, which agrees with physical intuition and is consistent with the original purpose of multiview research.

In Figure 7, the three subplots of each plot show the simultaneous tracking results for the master view and the two slave views, respectively. The plots show that tracking with information from multiple viewpoints enables a more accurate estimation of the target trajectory. The same targets are also correctly assigned the same tracking trajectory markers in different viewpoints.

A Markov random field (MRF) data association optimization model based on cross-view coupled trajectory fragments is introduced; it has a new potential function enhancement method to effectively associate the trajectory fragments caused by dense motion. In this paper, the cross-view coupled trajectory fragments are obtained by a data fusion method based on image mutual information. This method calculates the spatial position relationship between cross-view 2D trajectory fragment pairs, considering both position and motion information, and corrects the position coordinates of stumped and deviated target detection data in dense motion scenes using the human key point detection method. Modular and system-level experiments are conducted separately on the PETS 2009 dataset, and the results demonstrate the improvement that the image mutual information method and the MRF optimization method bring to data fusion and data association. Meanwhile, the tracking method in this study shows good multiview multitarget tracking performance when compared with current research methods.

To test the running speed of the multitarget tracking algorithm in this chapter, and how the overall running speed of the detection and tracking algorithm changes with the number of targets in the scene, three static surveillance videos in MOT17 were tested. The average time consumed by the algorithm is shown in Figure 8.

Figure 8 shows that the average time consumed by the tracking algorithm in this chapter is only about 9 ms per frame, which is fast. The average number of targets differs across the three test sequences, so the cost matrix to be solved during data association has a larger dimension in scenes with more targets, which in turn increases the computational effort of the multitarget tracking stage. However, because the tracking stage itself accounts for a small proportion of the overall running time, the difference in the average frame rate is limited; that is, the number of targets in the scene has a limited impact on the tracking algorithm and causes only limited fluctuations in computation time. This paper optimizes the traditional classifier for cosine similarity to achieve the extraction and effective matching of deep apparent features of the target. Then, the fusion and tracking strategy of apparent feature similarity and spatial similarity in multitarget tracking is proposed.
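To make the idea of fusing apparent feature similarity with spatial similarity concrete, the sketch below combines a cosine similarity between feature vectors with a spatial similarity score (for example, an IoU value) into one association cost. The paper does not give its fusion rule in this section, so the weighted sum, the parameter `weight`, and the function names are assumptions for illustration only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def fused_cost(appearance_sim, spatial_sim, weight=0.5):
    """Assumed fusion rule: cost = 1 - weighted sum of the two similarities.

    weight is a hypothetical balance between appearance (weight) and
    spatial (1 - weight) similarity; both inputs are expected in [0, 1].
    """
    return 1.0 - (weight * appearance_sim + (1.0 - weight) * spatial_sim)


# Two feature vectors with the same direction but different magnitudes.
same_direction = cosine_similarity([1.0, 0.0], [2.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal, fused_cost(same_direction, 0.5))
```

Cosine similarity depends only on the direction of the feature vectors, not on their length, which is why it suits features whose distribution is radial: samples of one identity that lie along the same ray score as similar even when their magnitudes differ.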

Figure 6: Tracking system performance comparison results (x-axis: epochs, 0–100; y-axis: values, 0–100; series: GTs, IDs, MT).

The construction of the target similarity measure and association cost is the core of data association based on the Hungarian algorithm. The tracking-gate-based bounding box intersection over union (IoU) used in this chapter more than doubles the tracking accuracy compared with using the Euclidean distance between target centre locations, and the change to IoU as the distance metric plays the biggest role: it is applicable to multiscale targets, makes threshold parameters easy to set, and gives better tracking performance than the Euclidean distance metric.

Compared with the latest related research, our results show that efficiency and accuracy have improved by nearly 5%. On this basis, the tracking gate limit based on the motion state estimation is introduced to further reduce the number of identity switches and the number of false alarms in the tracking process. However, the method also has shortcomings: the number of missed targets and tracking fragments in the tracking results increases slightly because of the added limits.

5. Conclusion

This paper focuses on motion multitarget tracking technology in video surveillance and investigates the multitarget online tracking problem in depth from four aspects: motion target detection, motion state estimation, data association, and apparent feature modelling. Constructing a feature pyramid network for multiscale target prediction and combining it with a multiscale training mechanism effectively improve the quality of multiscale motion target detection in complex surveillance scenes; combined with clustering optimization of the a priori frame parameters, they effectively reduce the missed detection and false detection rates of motion targets. Based on the target detection results, the overall detection-based multitarget tracking scheme is designed in combination with video surveillance application scenarios, and a multitarget tracking algorithm that fuses the apparent features of the target with spatial information is proposed. When the classifier-based cross-entropy loss function is used for training, the obtained feature distribution shows radial characteristics and does not match the distance used by common feature similarity metrics. To address this problem, this paper optimizes the traditional classifier for cosine similarity to achieve the extraction and effective matching of deep apparent features of the target. Then, the fusion and tracking strategy of apparent feature similarity and spatial similarity in multitarget tracking is proposed. To address the lack of an effective update mechanism for target features in multitarget tracking, a sparse update mechanism with intertarget occlusion judgment is proposed to effectively prevent tracking drift under intertarget occlusion. The effectiveness of this paper's method is verified by the experimental results on three sets of surveillance video sequences.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Figure 7: Trace result graph.

Figure 8: Test elapsed time and average frame rate of the multitarget tracking algorithm on the MOT17 dataset (x-axis: sequence, MOT17-02, MOT17-04, MOT17-09, All; y-axis: values; series: average frame rate, average time per frame for detection and tracking, average tracking time per frame, average number of targets per frame).

References

[1] J. Hulsmann, J. Traub, and V. Markl, “Demand-based sensor data gathering with multi-query optimization,” Proceedings of the VLDB Endowment, vol. 13, no. 12, pp. 2801–2804, 2020.

[2] A. Belhadi, Y. Djenouri, J. C.-W. Lin, and A. Cano, “Trajectory outlier detection,” ACM Transactions on Management Information Systems, vol. 11, no. 3, pp. 1–29, 2020.

[3] A. Zappone, M. Di Renzo, and M. Debbah, “Wireless networks design in the era of deep learning: model-based, AI-based, or both?” IEEE Transactions on Communications, vol. 67, no. 10, pp. 7331–7376, 2019.

[4] A. B. Hamida, A. Benoit, P. Lambert, and C. B. Amar, “3-D deep learning approach for remote sensing image classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 56, no. 8, pp. 4420–4434, 2018.

[5] A. Farasat, G. Gross, R. Nagi, and A. G. Nikolaev, “Social network analysis with data fusion,” IEEE Transactions on Computational Social Systems, vol. 3, no. 2, pp. 88–99, 2016.

[6] H. A. Pierson and M. S. Gashler, “Deep learning in robotics: a review of recent research,” Advanced Robotics, vol. 31, no. 16, pp. 821–835, 2017.

[7] L. Li, K. Ota, and M. Dong, “Deep learning for smart industry: efficient manufacture inspection system with fog computing,” IEEE Transactions on Industrial Informatics, vol. 14, no. 10, pp. 4665–4673, 2018.

[8] A. Jalali and H. Farsi, “A new steganography algorithm based on video sparse representation,” Multimedia Tools and Applications, vol. 79, no. 3-4, pp. 1821–1846, 2020.

[9] H. Song, J. J. Thiagarajan, P. Sattigeri, and A. Spanias, “Optimizing kernel machines using deep learning,” IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 11, pp. 5528–5540, 2018.

[10] S. De, L. Bruzzone, A. Bhattacharya, F. Bovolo, and S. Chaudhuri, “A novel technique based on deep learning and a synthetic target database for classification of urban areas in PolSAR data,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 11, no. 1, pp. 154–170, 2018.

[11] L. Jiang, L. Yan, Y. Xia, Q. Guo, M. Fu, and K. Lu, “Asynchronous multirate multisensor data fusion over unreliable measurements with correlated noise,” IEEE Transactions on Aerospace and Electronic Systems, vol. 53, no. 5, pp. 2427–2437, 2017.

[12] H. Wu, Z. Zhang, C. Jiao, C. Li, and T. Q. S. Quek, “Learn to sense: a meta-learning-based sensing and fusion framework for wireless sensor networks,” IEEE Internet of Things Journal, vol. 6, no. 5, pp. 8215–8227, 2019.

[13] Z. Zhao, X. Wang, and T. Wang, “A novel measurement data classification algorithm based on SVM for tracking closely spaced targets,” IEEE Transactions on Instrumentation and Measurement, vol. 68, no. 4, pp. 1089–1100, 2019.

[14] M. A. Al-Jarrah, A. Al-Dweik, M. Kalil, and S. S. Ikki, “Decision fusion in distributed cooperative wireless sensor networks,” IEEE Transactions on Vehicular Technology, vol. 68, no. 1, pp. 797–811, 2019.

[15] M. Pourshamsi, M. Garcia, M. Lavalle, and H. Balzter, “A machine-learning approach to PolInSAR and LiDAR data fusion for improved tropical forest canopy height estimation using NASA AfriSAR Campaign data,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 11, no. 10, pp. 3453–3463, 2018.

[16] N. Mittal, U. Singh, R. Salgotra, and B. S. Sohi, “An energy efficient stable clustering approach using fuzzy extended grey wolf optimization algorithm for WSNs,” Wireless Networks, vol. 25, no. 8, pp. 5151–5172, 2019.

[17] P. Ghamisi, R. Gloaguen, P. M. Atkinson et al., “Multisource and multitemporal data fusion in remote sensing: a comprehensive review of the state of the art,” IEEE Geoscience and Remote Sensing Magazine, vol. 7, no. 1, pp. 6–39, 2019.

[18] H. Zhang, X. Zhou, Z. Wang, H. Yan, and J. Sun, “Adaptive consensus-based distributed target tracking with dynamic cluster in sensor networks,” IEEE Transactions on Cybernetics, vol. 49, no. 5, pp. 1580–1591, 2019.

[19] X. Yuan and Y. Pu, “Parallel lensless compressive imaging via deep convolutional neural networks,” Optics Express, vol. 26, no. 2, pp. 1962–1977, 2018.

[20] D. Nada, M. Bousbia-Salah, and M. Bettayeb, “Multi-sensor data fusion for wheelchair position estimation with unscented Kalman Filter,” International Journal of Automation and Computing, vol. 15, no. 2, pp. 207–217, 2018.

[21] V. Radu, C. Tong, S. Bhattacharya et al., “Multimodal deep learning for activity and context recognition,” Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, vol. 1, no. 4, pp. 1–27, 2018.

[22] Q. Zhou and Y. Zheng, “Long link wireless sensor routing optimization based on improved adaptive ant colony algorithm,” International Journal of Wireless Information Networks, vol. 27, no. 103, pp. 241–252, 2019.

[23] S. Xia, D. Peng, D. Meng et al., “A fast adaptive k-means with no bounds,” IEEE Transactions on Pattern Analysis and Machine Intelligence, p. 1, 2020.

[24] B. Yong, W. Wei, K. C. Li et al., “Ensemble machine learning approaches for webshell detection in Internet of things environments,” Transactions on Emerging Telecommunications Technologies, 2020.

[25] J. S. Almeida, P. P. Rebouças Filho, T. Carneiro et al., “Detecting Parkinson’s disease with sustained phonation and speech signals using machine learning techniques,” Pattern Recognition Letters, vol. 125, pp. 55–62, 2019.


[9] H Song J J -iagarajan P Sattigeri and A SpaniasldquoOptimizing kernel machines using deep learningrdquo IEEETransactions on Neural Networks and Learning Systemsvol 29 no 11 pp 5528ndash5540 2018

[10] S De L Bruzzone A Bhattacharya F Bovolo andS Chaudhuri ldquoA novel technique based on deep learning anda synthetic target database for classification of urban areas inPolSAR datardquo IEEE Journal of Selected Topics in Applied EarthObservations and Remote Sensing vol 11 no 1 pp 154ndash1702018

[11] L Jiang L Yan Y Xia Q Guo M Fu and K Lu ldquoAsyn-chronous multirate multisensor data fusion over unreliablemeasurements with correlated noiserdquo IEEE Transactions onAerospace and Electronic Systems vol 53 no 5 pp 2427ndash2437 2017

[12] H Wu Z Zhang C Jiao C Li and T Q S Quek ldquoLearn tosense a meta-learning-based sensing and fusion frameworkfor wireless sensor networksrdquo IEEE Internet of ltings Journalvol 6 no 5 pp 8215ndash8227 2019

[13] Z Zhao X Wang and T Wang ldquoA novel measurement dataclassification algorithm based on SVM for tracking closelyspaced targetsrdquo IEEE Transactions on Instrumentation andMeasurement vol 68 no 4 pp 1089ndash1100 2019

[14] M A Al-Jarrah A Al-Dweik M Kalil and S S Ikki ldquoDe-cision fusion in distributed cooperative wireless sensor net-worksrdquo IEEE Transactions on Vehicular Technology vol 68no 1 pp 797ndash811 2019

[15] M Pourshamsi M Garcia M Lavalle and H Balzter ldquoAmachine-learning approach to PolInSAR and LiDAR datafusion for improved tropical forest canopy height estimationusing NASA AfriSAR Campaign datardquo IEEE Journal of Se-lected Topics in Applied Earth Observations and RemoteSensing vol 11 no 10 pp 3453ndash3463 2018

[16] N Mittal U Singh R Salgotra and B S Sohi ldquoAn energyefficient stable clustering approach using fuzzy extended greywolf optimization algorithm for WSNsrdquo Wireless Networksvol 25 no 8 pp 5151ndash5172 2019

[17] P Ghamisi R Gloaguen P M Atkinson et al ldquoMultisourceand multitemporal data fusion in remote sensing a com-prehensive review of the state of the artrdquo IEEE Geoscience andRemote Sensing Magazine vol 7 no 1 pp 6ndash39 2019

[18] H Zhang X Zhou Z Wang H Yan and J Sun ldquoAdaptiveconsensus-based distributed target tracking with dynamiccluster in sensor networksrdquo IEEE Transactions on Cyberneticsvol 49 no 5 pp 1580ndash1591 2019

[19] X Yuan and Y Pu ldquoParallel lensless compressive imaging viadeep convolutional neural networksrdquo Optics Express vol 26no 2 pp 1962ndash1977 2018

[20] D Nada M Bousbia-Salah and M Bettayeb ldquoMulti-sensordata fusion for wheelchair position estimation with unscentedKalman Filterrdquo International Journal of Automation andComputing vol 15 no 2 pp 207ndash217 2018

[21] V Radu C Tong S Bhattacharya et al ldquoMultimodal deeplearning for activity and context recognitionrdquo Proceedings ofthe ACM on Interactive Mobile Wearable and UbiquitousTechnologies vol 1 no 4 pp 1ndash27 2018

[22] Q Zhou and Y Zheng ldquoLong link wireless sensor routingoptimization based on improved adaptive ant colony algo-rithmrdquo International Journal of Wireless Information Net-works vol 27 no 103 pp 241ndash252 2019

[23] S Xia D Peng D Meng et al ldquoA fast adaptive k-means withno boundsrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence p 1 2020

[24] B Yong W Wei K C Li et al ldquoEnsemble machine learningapproaches for webshell detection in Internet of things en-vironmentsrdquo Transactions on Emerging TelecommunicationsTechnologies 2020

[25] J S Almeida P P Rebouccedilas Filho T Carneiro et alldquoDetecting Parkinsonrsquos disease with sustained phonation andspeech signals using machine learning techniquesrdquo PatternRecognition Letters vol 125 pp 55ndash62 2019

10 Complexity

Page 8: Machine Learning-Based Multitarget Tracking of Motion in

false alarms decreases. However, due to the addition of the tracking gate qualification, the number of misses and tracking fragments in the tracking results also increases slightly.

4.2. Performance Test Results Analysis of Motion Multitarget Tracking System in Sports Video

In this section, the method proposed in this paper is compared with methods from the literature. First, experiments are conducted separately with combination schemes of different viewpoints to observe how multiple viewpoints improve tracking performance. Then, a fair comparison with existing methods is performed to analyse the advantages and shortcomings of the proposed method; the comparison results are shown in Figure 6.

The input–output evaluation of the experimental results is consistent with the evaluation system in the literature and uses the same inputs and the same GTs, which ensures the fairness of the comparison. As shown in Figure 6, the tracking system in this study achieved good results. Although it did not exceed the literature overall, its 100% MT and lower IDs exceeded those of the other comparison methods. In the more difficult intensive motion scenarios, the best results were achieved by the method proposed in this paper, which indicates that Markov optimization models play a key role in dealing with dense scenes. Figure 6 also shows that the method in this study makes better use of multiview data for tracking: in all three experimental sequences, tracking with data from all three views performs better than tracking with data from only two views. This agrees with the practical physical meaning and with the original purpose of multiview research.

In Figure 7, the three subplots of each plot show the simultaneous tracking results for the master view and the two slave views, respectively. The plots show that tracking with information from multiple viewpoints enables a more accurate estimation of the target trajectory. The same targets are also correctly assigned the same trajectory markers in different viewpoints.

A Markov random field (MRF) data association optimization model based on cross-view coupled trajectory fragments is introduced. It uses a new potential-function enhancement method to effectively associate the trajectory fragments caused by intensive motion. In this paper, the cross-view coupled trajectory fragments are obtained by a data fusion method based on image mutual information. This method calculates the spatial position relationship between cross-view 2D trajectory fragment pairs, considering both position and motion information, and corrects the position coordinates of stumped and deviated target detection data in dense motion scenes using a human key point detection method. Module-level and system-level experiments were carried out on the PETS 2009 dataset, and the results demonstrate the improvement brought by the image mutual information method and the MRF optimization method in data fusion and data association. The tracking method in this study also shows good multiview multitarget tracking performance when compared with current research methods.
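As an illustration of the image mutual information measure mentioned above, the following minimal Python sketch computes a histogram-based mutual information score between two equally sized greyscale patches, each given as a flat list of intensities. It is not the implementation used in this paper: the function name `mutual_information`, the bin count, and the flat-list patch representation are assumptions made only for illustration.

```python
import math
from collections import Counter


def mutual_information(patch_a, patch_b, bins=8, max_value=256):
    """Histogram-based mutual information (in bits) of two grey patches.

    patch_a, patch_b: flat sequences of intensities in [0, max_value).
    bins: number of quantisation levels (an illustrative choice).
    """
    if len(patch_a) != len(patch_b) or not patch_a:
        raise ValueError("patches must be non-empty and of equal length")
    width = max_value / bins
    # Quantise intensities so that the joint histogram is not too sparse.
    qa = [min(int(v / width), bins - 1) for v in patch_a]
    qb = [min(int(v / width), bins - 1) for v in patch_b]
    n = len(qa)
    joint = Counter(zip(qa, qb))   # joint histogram of bin pairs
    hist_a = Counter(qa)           # marginal histogram of patch_a
    hist_b = Counter(qb)           # marginal histogram of patch_b
    score = 0.0
    for (i, j), count in joint.items():
        p_ij = count / n
        p_i = hist_a[i] / n
        p_j = hist_b[j] / n
        score += p_ij * math.log2(p_ij / (p_i * p_j))
    return score
```

Under this sketch, a higher score suggests that two patches taken from different views are more likely to show the same target, which is one plausible way to rank candidate cross-view fragment pairs.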

To test the running speed of the multitarget tracking algorithm in this chapter, and how the overall running speed of the detection and tracking algorithm changes with the number of targets in the scene, three static surveillance videos from MOT17 were tested. The average time consumed by the algorithm is shown in Figure 8.

Figure 8 shows that the average time consumed by the tracking algorithm in this chapter is only about 9 ms per frame, which is fast. The average number of targets differs across the three test sequences, so scenes with more targets produce a larger cost matrix in the data association step, which increases the computational effort of the tracking stage. However, because the tracking stage accounts for a small proportion of the overall running time, the difference in average frame rate is limited. The number of targets in the scene therefore has only a limited impact on the computation time of the tracking algorithm. This paper optimizes the traditional classifier for cosine similarity to achieve the extraction and effective matching of deep appearance features of the target. A fusion and tracking strategy that combines appearance feature similarity with spatial similarity is then proposed.
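To illustrate why a cosine-based measure suits features that spread radially from the origin, and how an appearance score might be fused with a spatial score, a minimal Python sketch follows. The linear weighting and the names `fused_similarity`, `spatial_sim`, and `weight` are assumptions for illustration only; the fusion formula actually used is not stated in this section.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def fused_similarity(feat_a, feat_b, spatial_sim, weight=0.5):
    """Hypothetical fusion: weighted sum of appearance and spatial similarity.

    spatial_sim: a spatial score in [0, 1], for example a bounding box overlap.
    weight: share given to the appearance term (an illustrative parameter).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    appearance = cosine_similarity(feat_a, feat_b)
    return weight * appearance + (1.0 - weight) * spatial_sim


# Two features on the same ray from the origin: same direction, different length.
f1, f2 = [1.0, 0.0], [3.0, 0.0]
print(math.dist(f1, f2))           # Euclidean distance separates them
print(cosine_similarity(f1, f2))   # cosine similarity ignores the length
print(fused_similarity(f1, f2, spatial_sim=0.5))
```

The two vectors differ only in length, which is the kind of variation a radial feature distribution produces within one identity, so a direction-based measure is the more natural match for such features.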

Figure 6: Tracking system performance comparison results (x-axis: epochs; y-axis: values; series: GTs, IDs, MT).

The construction of the target similarity measure and the association cost is the core of data association based on the Hungarian algorithm. The tracking-gate-based intersection over union (IoU) of target bounding boxes used in this chapter more than doubles the tracking accuracy compared with using the Euclidean distance between target centre locations. The change to IoU as the distance metric plays the biggest role: it is applicable to multiscale targets, makes threshold parameters easy to set, and gives better tracking performance than the Euclidean distance metric.
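The IoU-based association described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: boxes are assumed to be `(x1, y1, x2, y2)` tuples, the association cost is taken as `1 - IoU`, the gate is modelled as a minimum IoU threshold `min_iou` applied to the matched pairs, and the helper names `iou`, `hungarian`, and `associate` are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def hungarian(cost):
    """Minimum-cost assignment for an n x m matrix with n <= m (finite costs).

    Returns a sorted list of (row, column) pairs, one per row.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)       # row potentials
    v = [0.0] * (m + 1)       # column potentials
    match = [0] * (m + 1)     # match[j] = row assigned to column j (1-based)
    way = [0] * (m + 1)       # predecessor columns on the augmenting path
    for i in range(1, n + 1):
        match[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = match[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[match[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if match[j0] == 0:
                break
        while j0:                      # flip the augmenting path
            j1 = way[j0]
            match[j0] = match[j1]
            j0 = j1
    return sorted((match[j] - 1, j - 1) for j in range(1, m + 1) if match[j])


def associate(predicted, detections, min_iou=0.3):
    """Match predicted track boxes to detections; returns (track, detection) pairs."""
    if not predicted or not detections:
        return []
    cost = [[1.0 - iou(t, d) for d in detections] for t in predicted]
    transposed = len(cost) > len(cost[0])
    if transposed:                     # hungarian() needs rows <= columns
        cost = [list(col) for col in zip(*cost)]
    pairs = hungarian(cost)
    if transposed:
        pairs = [(c, r) for r, c in pairs]
    # Gate: discard matches whose overlap is below the threshold.
    return sorted((t, d) for t, d in pairs
                  if iou(predicted[t], detections[d]) >= min_iou)
```

Because IoU is a ratio, one threshold such as `min_iou=0.3` behaves similarly for small and large boxes, whereas a pixel distance threshold would have to change with target scale. This is consistent with the observation above that the IoU metric suits multiscale targets and is easy to tune.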

Compared with the latest related research, our results show that efficiency and accuracy have improved by nearly 5%. On this basis, the tracking gate limit based on motion state estimation is introduced to further reduce the number of target identity switches and the number of false alarms in the tracking process. However, the method also has shortcomings: the number of missed targets and tracking fragments in the tracking results increases slightly because of the added limit.

5. Conclusion

This paper focuses on multitarget motion tracking technology in video surveillance and investigates the online multitarget tracking problem in depth from four aspects: moving target detection, motion state estimation, data association, and appearance feature modelling. Constructing a feature pyramid network for multiscale target prediction and combining it with a multiscale training mechanism effectively improves the quality of multiscale moving target detection in complex surveillance scenes. Combined with clustering optimization of the prior box parameters, it effectively reduces the rates of missed and false detections of moving targets. Based on the target detection results, the overall detection-based multitarget tracking scheme is designed for video surveillance application scenarios, and a multitarget tracking algorithm that fuses the appearance features of the target with spatial information is proposed. When a classifier is trained with the cross-entropy loss function, the resulting feature distribution is radial and does not match the distance used by common feature similarity metrics. To address this problem, this paper optimizes the traditional classifier for cosine similarity to achieve the extraction and effective matching of deep appearance features of the target. A fusion and tracking strategy that combines appearance feature similarity with spatial similarity is then proposed. To address the lack of an effective update mechanism for target features in multitarget tracking, a sparse update mechanism with intertarget occlusion judgment is proposed, which effectively prevents tracking drift under intertarget occlusion. The effectiveness of the method is verified by experimental results on three sets of surveillance video sequences.
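One possible reading of the sparse update mechanism with intertarget occlusion judgment is sketched below in Python. The details are assumptions rather than the paper's specification: "sparse" is interpreted as updating only every `interval` frames, occlusion is judged by the fraction of a target box covered by any other target box, and the stored feature is refreshed with an exponential moving average. All function and parameter names are hypothetical.

```python
def overlap_ratio(box, other):
    """Fraction of `box` covered by `other`; boxes are (x1, y1, x2, y2)."""
    ix1, iy1 = max(box[0], other[0]), max(box[1], other[1])
    ix2, iy2 = min(box[2], other[2]), min(box[3], other[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = (box[2] - box[0]) * (box[3] - box[1])
    return inter / area if area > 0 else 0.0


def is_occluded(index, boxes, max_overlap=0.3):
    """Occlusion judgment: True if any other target covers too much of this box."""
    return any(
        overlap_ratio(boxes[index], other) > max_overlap
        for k, other in enumerate(boxes)
        if k != index
    )


def blend(stored, observed, momentum=0.9):
    """Exponential moving average of the stored and the newly observed feature."""
    return [momentum * s + (1.0 - momentum) * o for s, o in zip(stored, observed)]


def sparse_update(features, boxes, observations, frame_index, interval=5):
    """Refresh stored features only on every `interval`-th frame and only for
    targets that are not judged to be occluded; others are left unchanged."""
    if frame_index % interval != 0:
        return [list(f) for f in features]
    return [
        list(f) if is_occluded(i, boxes) else blend(f, o)
        for i, (f, o) in enumerate(zip(features, observations))
    ]
```

Skipping the update for occluded targets keeps features from a partly hidden target, which may be mixed with the occluder's appearance, out of the stored template. This is the intuition behind preventing tracking drift under intertarget occlusion.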

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Figure 7: Trace result graph.

Figure 8: Test elapsed time and average frame rate of multitarget tracking algorithm on MOT17 dataset (x-axis: sequence, with MOT17-02, MOT17-04, MOT17-09, and All; y-axis: values; series: average frame rate, average time per frame for detection and tracking, tracking average time per frame, and average number of targets per frame).

References

[1] J. Hulsmann, J. Traub, and V. Markl, “Demand-based sensor data gathering with multi-query optimization,” Proceedings of the VLDB Endowment, vol. 13, no. 12, pp. 2801–2804, 2020.

[2] A. Belhadi, Y. Djenouri, J. C.-W. Lin, and A. Cano, “Trajectory outlier detection,” ACM Transactions on Management Information Systems, vol. 11, no. 3, pp. 1–29, 2020.

[3] A. Zappone, M. Di Renzo, and M. Debbah, “Wireless networks design in the era of deep learning: model-based, AI-based, or both?” IEEE Transactions on Communications, vol. 67, no. 10, pp. 7331–7376, 2019.

[4] A. B. Hamida, A. Benoit, P. Lambert, and C. B. Amar, “3-D deep learning approach for remote sensing image classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 56, no. 8, pp. 4420–4434, 2018.

[5] A. Farasat, G. Gross, R. Nagi, and A. G. Nikolaev, “Social network analysis with data fusion,” IEEE Transactions on Computational Social Systems, vol. 3, no. 2, pp. 88–99, 2016.

[6] H. A. Pierson and M. S. Gashler, “Deep learning in robotics: a review of recent research,” Advanced Robotics, vol. 31, no. 16, pp. 821–835, 2017.

[7] L. Li, K. Ota, and M. Dong, “Deep learning for smart industry: efficient manufacture inspection system with fog computing,” IEEE Transactions on Industrial Informatics, vol. 14, no. 10, pp. 4665–4673, 2018.

[8] A. Jalali and H. Farsi, “A new steganography algorithm based on video sparse representation,” Multimedia Tools and Applications, vol. 79, no. 3-4, pp. 1821–1846, 2020.

[9] H. Song, J. J. Thiagarajan, P. Sattigeri, and A. Spanias, “Optimizing kernel machines using deep learning,” IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 11, pp. 5528–5540, 2018.

[10] S. De, L. Bruzzone, A. Bhattacharya, F. Bovolo, and S. Chaudhuri, “A novel technique based on deep learning and a synthetic target database for classification of urban areas in PolSAR data,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 11, no. 1, pp. 154–170, 2018.

[11] L. Jiang, L. Yan, Y. Xia, Q. Guo, M. Fu, and K. Lu, “Asynchronous multirate multisensor data fusion over unreliable measurements with correlated noise,” IEEE Transactions on Aerospace and Electronic Systems, vol. 53, no. 5, pp. 2427–2437, 2017.

[12] H. Wu, Z. Zhang, C. Jiao, C. Li, and T. Q. S. Quek, “Learn to sense: a meta-learning-based sensing and fusion framework for wireless sensor networks,” IEEE Internet of Things Journal, vol. 6, no. 5, pp. 8215–8227, 2019.

[13] Z. Zhao, X. Wang, and T. Wang, “A novel measurement data classification algorithm based on SVM for tracking closely spaced targets,” IEEE Transactions on Instrumentation and Measurement, vol. 68, no. 4, pp. 1089–1100, 2019.

[14] M. A. Al-Jarrah, A. Al-Dweik, M. Kalil, and S. S. Ikki, “Decision fusion in distributed cooperative wireless sensor networks,” IEEE Transactions on Vehicular Technology, vol. 68, no. 1, pp. 797–811, 2019.

[15] M. Pourshamsi, M. Garcia, M. Lavalle, and H. Balzter, “A machine-learning approach to PolInSAR and LiDAR data fusion for improved tropical forest canopy height estimation using NASA AfriSAR Campaign data,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 11, no. 10, pp. 3453–3463, 2018.

[16] N. Mittal, U. Singh, R. Salgotra, and B. S. Sohi, “An energy efficient stable clustering approach using fuzzy extended grey wolf optimization algorithm for WSNs,” Wireless Networks, vol. 25, no. 8, pp. 5151–5172, 2019.

[17] P. Ghamisi, R. Gloaguen, P. M. Atkinson et al., “Multisource and multitemporal data fusion in remote sensing: a comprehensive review of the state of the art,” IEEE Geoscience and Remote Sensing Magazine, vol. 7, no. 1, pp. 6–39, 2019.

[18] H. Zhang, X. Zhou, Z. Wang, H. Yan, and J. Sun, “Adaptive consensus-based distributed target tracking with dynamic cluster in sensor networks,” IEEE Transactions on Cybernetics, vol. 49, no. 5, pp. 1580–1591, 2019.

[19] X. Yuan and Y. Pu, “Parallel lensless compressive imaging via deep convolutional neural networks,” Optics Express, vol. 26, no. 2, pp. 1962–1977, 2018.

[20] D. Nada, M. Bousbia-Salah, and M. Bettayeb, “Multi-sensor data fusion for wheelchair position estimation with unscented Kalman Filter,” International Journal of Automation and Computing, vol. 15, no. 2, pp. 207–217, 2018.

[21] V. Radu, C. Tong, S. Bhattacharya et al., “Multimodal deep learning for activity and context recognition,” Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, vol. 1, no. 4, pp. 1–27, 2018.

[22] Q. Zhou and Y. Zheng, “Long link wireless sensor routing optimization based on improved adaptive ant colony algorithm,” International Journal of Wireless Information Networks, vol. 27, no. 103, pp. 241–252, 2019.

[23] S. Xia, D. Peng, D. Meng et al., “A fast adaptive k-means with no bounds,” IEEE Transactions on Pattern Analysis and Machine Intelligence, p. 1, 2020.

[24] B. Yong, W. Wei, K. C. Li et al., “Ensemble machine learning approaches for webshell detection in Internet of things environments,” Transactions on Emerging Telecommunications Technologies, 2020.

[25] J. S. Almeida, P. P. Rebouças Filho, T. Carneiro et al., “Detecting Parkinson’s disease with sustained phonation and speech signals using machine learning techniques,” Pattern Recognition Letters, vol. 125, pp. 55–62, 2019.

Page 9: Machine Learning-Based Multitarget Tracking of Motion in

better tracking performance compared to Euclidean distancemetric

Compared with the latest related research our researchresults show that our efficiency and accuracy have improvedby nearly 5 On this basis the tracking gate limit based onthe motion state estimation is introduced to further reducethe number of target identity transformations and thenumber of false alarms in the tracking process However themethod also has shortcomings and the number of missingtargets and tracking fragments in the tracking results isslightly increased due to the increase of the limits

5 Conclusion

-is paper focuses on the motion multitarget trackingtechnology in video surveillance and deeply investigates themultitarget online tracking problem in video surveillancefrom four aspects motion target detection motion stateestimation data association and epistemic feature modelling-e construction of a feature pyramid network for multiscaletarget prediction and the combination of multiscale trainingmechanism effectively improve the quality of multiscalemotion target detection in complex surveillance scenes and

combine with the clustering optimization of the a priori frameparameters to effectively reduce the rate of missed detectionand false detection of motion targets Based on the targetdetection results the overall design of the detection-basedmultitarget tracking scheme in this paper is designed incombination with video surveillance application scenariosand a multitarget tracking algorithm that fuses apparentfeature metrics with spatial similarity is proposed -is paperproposes a multitarget tracking algorithm that fuses theapparent features of the target with spatial informationWhenthe cross-entropy loss function based on the classifier is usedfor training the obtained feature distribution shows radialcharacteristics and does not match the distance of thecommonly used feature similarity metric and for thisproblem this paper optimizes the traditional classifier forcosine similarity to achieve the extraction and effectivematching of target depth epistatic features -en the fusionand tracking strategy of apparent feature similarity and spatialsimilarity in multitarget tracking is proposed To address theshortage of effective update mechanism of target features inthe field of multitarget tracking a sparse update mechanismwith intertarget occlusion judgment is proposed to effectivelyprevent tracking drift in the case of intertarget occlusion -eeffectiveness of this paperrsquos method is verified by the ex-perimental results on three sets of surveillance videosequences

Data Availability

-e data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

-e authors declare that they have no conflicts of interest

References

[1] J Hulsmann J Traub and V Markl ldquoDemand-based sensordata gathering with multi-query optimizationrdquo Proceedings ofthe VLDB Endowment vol 13 no 12 pp 2801ndash2804 2020

[2] A Belhadi Y Djenouri J C-W Lin and A Cano ldquoTrajectoryoutlier detectionrdquo ACM Transactions on Management In-formation Systems vol 11 no 3 pp 1ndash29 2020

[3] A Zappone M Di Renzo and M Debbah ldquoWireless net-works design in the era of deep learning model-based AI-

Figure 7 Trace result graph

MOT17-02 MOT17-04 MOT17-09 All0

20

40

60

80

100

120

140

Val

ues

Sequence

Average frame rate

Average time per frame fordetection and tracking

Tracking average timeper frameAverage number of targetsper frame

Figure 8 Test elapsed time and average frame rate of multitargettracking algorithm on MOT17 dataset

Complexity 9

based or bothrdquo IEEE Transactions on Communicationsvol 67 no 10 pp 7331ndash7376 2019

[4] A B Hamida A Benoit P Lambert and C B Amar ldquo3-Ddeep learning approach for remote sensing image classifica-tionrdquo IEEE Transactions on Geoscience and Remote Sensingvol 56 no 8 pp 4420ndash4434 2018

[5] A Farasat G Gross R Nagi and A G Nikolaev ldquoSocialnetwork analysis with data fusionrdquo IEEE Transactions onComputational Social Systems vol 3 no 2 pp 88ndash99 2016

[6] H A Pierson and M S Gashler ldquoDeep learning in robotics areview of recent researchrdquo Advanced Robotics vol 31 no 16pp 821ndash835 2017

[7] L Li K Ota andM Dong ldquoDeep learning for smart industryefficient manufacture inspection system with fog computingrdquoIEEE Transactions on Industrial Informatics vol 14 no 10pp 4665ndash4673 2018

[8] A Jalali and H Farsi ldquoA new steganography algorithm basedon video sparse representationrdquo Multimedia Tools and Ap-plications vol 79 no 3-4 pp 1821ndash1846 2020

[9] H Song J J -iagarajan P Sattigeri and A SpaniasldquoOptimizing kernel machines using deep learningrdquo IEEETransactions on Neural Networks and Learning Systemsvol 29 no 11 pp 5528ndash5540 2018

[10] S De L Bruzzone A Bhattacharya F Bovolo andS Chaudhuri ldquoA novel technique based on deep learning anda synthetic target database for classification of urban areas inPolSAR datardquo IEEE Journal of Selected Topics in Applied EarthObservations and Remote Sensing vol 11 no 1 pp 154ndash1702018

[11] L Jiang L Yan Y Xia Q Guo M Fu and K Lu ldquoAsyn-chronous multirate multisensor data fusion over unreliablemeasurements with correlated noiserdquo IEEE Transactions onAerospace and Electronic Systems vol 53 no 5 pp 2427ndash2437 2017

[12] H Wu Z Zhang C Jiao C Li and T Q S Quek ldquoLearn tosense a meta-learning-based sensing and fusion frameworkfor wireless sensor networksrdquo IEEE Internet of ltings Journalvol 6 no 5 pp 8215ndash8227 2019

[13] Z Zhao X Wang and T Wang ldquoA novel measurement dataclassification algorithm based on SVM for tracking closelyspaced targetsrdquo IEEE Transactions on Instrumentation andMeasurement vol 68 no 4 pp 1089ndash1100 2019

[14] M A Al-Jarrah A Al-Dweik M Kalil and S S Ikki ldquoDe-cision fusion in distributed cooperative wireless sensor net-worksrdquo IEEE Transactions on Vehicular Technology vol 68no 1 pp 797ndash811 2019

[15] M Pourshamsi M Garcia M Lavalle and H Balzter ldquoAmachine-learning approach to PolInSAR and LiDAR datafusion for improved tropical forest canopy height estimationusing NASA AfriSAR Campaign datardquo IEEE Journal of Se-lected Topics in Applied Earth Observations and RemoteSensing vol 11 no 10 pp 3453ndash3463 2018

[16] N Mittal U Singh R Salgotra and B S Sohi ldquoAn energyefficient stable clustering approach using fuzzy extended greywolf optimization algorithm for WSNsrdquo Wireless Networksvol 25 no 8 pp 5151ndash5172 2019

[17] P Ghamisi R Gloaguen P M Atkinson et al ldquoMultisourceand multitemporal data fusion in remote sensing a com-prehensive review of the state of the artrdquo IEEE Geoscience andRemote Sensing Magazine vol 7 no 1 pp 6ndash39 2019

[18] H Zhang X Zhou Z Wang H Yan and J Sun ldquoAdaptiveconsensus-based distributed target tracking with dynamiccluster in sensor networksrdquo IEEE Transactions on Cyberneticsvol 49 no 5 pp 1580ndash1591 2019

[19] X Yuan and Y Pu ldquoParallel lensless compressive imaging viadeep convolutional neural networksrdquo Optics Express vol 26no 2 pp 1962ndash1977 2018

[20] D Nada M Bousbia-Salah and M Bettayeb ldquoMulti-sensordata fusion for wheelchair position estimation with unscentedKalman Filterrdquo International Journal of Automation andComputing vol 15 no 2 pp 207ndash217 2018

[21] V Radu C Tong S Bhattacharya et al ldquoMultimodal deeplearning for activity and context recognitionrdquo Proceedings ofthe ACM on Interactive Mobile Wearable and UbiquitousTechnologies vol 1 no 4 pp 1ndash27 2018

[22] Q Zhou and Y Zheng ldquoLong link wireless sensor routingoptimization based on improved adaptive ant colony algo-rithmrdquo International Journal of Wireless Information Net-works vol 27 no 103 pp 241ndash252 2019

[23] S Xia D Peng D Meng et al ldquoA fast adaptive k-means withno boundsrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence p 1 2020

[24] B Yong W Wei K C Li et al ldquoEnsemble machine learningapproaches for webshell detection in Internet of things en-vironmentsrdquo Transactions on Emerging TelecommunicationsTechnologies 2020

[25] J S Almeida P P Rebouccedilas Filho T Carneiro et alldquoDetecting Parkinsonrsquos disease with sustained phonation andspeech signals using machine learning techniquesrdquo PatternRecognition Letters vol 125 pp 55ndash62 2019

10 Complexity

Page 10: Machine Learning-Based Multitarget Tracking of Motion in

based or bothrdquo IEEE Transactions on Communicationsvol 67 no 10 pp 7331ndash7376 2019

[4] A B Hamida A Benoit P Lambert and C B Amar ldquo3-Ddeep learning approach for remote sensing image classifica-tionrdquo IEEE Transactions on Geoscience and Remote Sensingvol 56 no 8 pp 4420ndash4434 2018

[5] A Farasat G Gross R Nagi and A G Nikolaev ldquoSocialnetwork analysis with data fusionrdquo IEEE Transactions onComputational Social Systems vol 3 no 2 pp 88ndash99 2016

[6] H A Pierson and M S Gashler ldquoDeep learning in robotics areview of recent researchrdquo Advanced Robotics vol 31 no 16pp 821ndash835 2017

[7] L Li K Ota andM Dong ldquoDeep learning for smart industryefficient manufacture inspection system with fog computingrdquoIEEE Transactions on Industrial Informatics vol 14 no 10pp 4665ndash4673 2018

[8] A Jalali and H Farsi ldquoA new steganography algorithm basedon video sparse representationrdquo Multimedia Tools and Ap-plications vol 79 no 3-4 pp 1821ndash1846 2020

[9] H Song J J -iagarajan P Sattigeri and A SpaniasldquoOptimizing kernel machines using deep learningrdquo IEEETransactions on Neural Networks and Learning Systemsvol 29 no 11 pp 5528ndash5540 2018

[10] S De L Bruzzone A Bhattacharya F Bovolo andS Chaudhuri ldquoA novel technique based on deep learning anda synthetic target database for classification of urban areas inPolSAR datardquo IEEE Journal of Selected Topics in Applied EarthObservations and Remote Sensing vol 11 no 1 pp 154ndash1702018

[11] L Jiang L Yan Y Xia Q Guo M Fu and K Lu ldquoAsyn-chronous multirate multisensor data fusion over unreliablemeasurements with correlated noiserdquo IEEE Transactions onAerospace and Electronic Systems vol 53 no 5 pp 2427ndash2437 2017

[12] H Wu Z Zhang C Jiao C Li and T Q S Quek ldquoLearn tosense a meta-learning-based sensing and fusion frameworkfor wireless sensor networksrdquo IEEE Internet of ltings Journalvol 6 no 5 pp 8215ndash8227 2019

[13] Z Zhao X Wang and T Wang ldquoA novel measurement dataclassification algorithm based on SVM for tracking closelyspaced targetsrdquo IEEE Transactions on Instrumentation andMeasurement vol 68 no 4 pp 1089ndash1100 2019

[14] M A Al-Jarrah A Al-Dweik M Kalil and S S Ikki ldquoDe-cision fusion in distributed cooperative wireless sensor net-worksrdquo IEEE Transactions on Vehicular Technology vol 68no 1 pp 797ndash811 2019

[15] M Pourshamsi M Garcia M Lavalle and H Balzter ldquoAmachine-learning approach to PolInSAR and LiDAR datafusion for improved tropical forest canopy height estimationusing NASA AfriSAR Campaign datardquo IEEE Journal of Se-lected Topics in Applied Earth Observations and RemoteSensing vol 11 no 10 pp 3453ndash3463 2018

[16] N Mittal U Singh R Salgotra and B S Sohi ldquoAn energyefficient stable clustering approach using fuzzy extended greywolf optimization algorithm for WSNsrdquo Wireless Networksvol 25 no 8 pp 5151ndash5172 2019

[17] P Ghamisi R Gloaguen P M Atkinson et al ldquoMultisourceand multitemporal data fusion in remote sensing a com-prehensive review of the state of the artrdquo IEEE Geoscience andRemote Sensing Magazine vol 7 no 1 pp 6ndash39 2019

[18] H Zhang X Zhou Z Wang H Yan and J Sun ldquoAdaptiveconsensus-based distributed target tracking with dynamiccluster in sensor networksrdquo IEEE Transactions on Cyberneticsvol 49 no 5 pp 1580ndash1591 2019

[19] X Yuan and Y Pu ldquoParallel lensless compressive imaging viadeep convolutional neural networksrdquo Optics Express vol 26no 2 pp 1962ndash1977 2018

[20] D Nada M Bousbia-Salah and M Bettayeb ldquoMulti-sensordata fusion for wheelchair position estimation with unscentedKalman Filterrdquo International Journal of Automation andComputing vol 15 no 2 pp 207ndash217 2018

[21] V Radu C Tong S Bhattacharya et al ldquoMultimodal deeplearning for activity and context recognitionrdquo Proceedings ofthe ACM on Interactive Mobile Wearable and UbiquitousTechnologies vol 1 no 4 pp 1ndash27 2018

[22] Q Zhou and Y Zheng ldquoLong link wireless sensor routingoptimization based on improved adaptive ant colony algo-rithmrdquo International Journal of Wireless Information Net-works vol 27 no 103 pp 241ndash252 2019

[23] S Xia D Peng D Meng et al ldquoA fast adaptive k-means withno boundsrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence p 1 2020

[24] B Yong W Wei K C Li et al ldquoEnsemble machine learningapproaches for webshell detection in Internet of things en-vironmentsrdquo Transactions on Emerging TelecommunicationsTechnologies 2020

[25] J S Almeida P P Rebouccedilas Filho T Carneiro et alldquoDetecting Parkinsonrsquos disease with sustained phonation andspeech signals using machine learning techniquesrdquo PatternRecognition Letters vol 125 pp 55ndash62 2019

10 Complexity