
Overview of Crowd Behaviour Analysis Methods

Franjo Matkovic
Faculty of Electrical Engineering and Computing
University of Zagreb
Zagreb, Croatia
[email protected]

Abstract—This paper provides an overview of crowd analysis methods and their components. With the increase in surveillance systems and crowd simulation applications, the need for available crowd surveillance models increases. The topic of interest in crowd analysis is abnormal crowd behavior, as timely detection of its occurrence is a very important factor for a quick and effective reaction. In conventional computer vision, most methods for abnormal crowd detection use physics-inspired approaches such as fluid dynamics and social force models to analyze crowd behavior in the scene. An overview of the perspectives, such as microscopic and macroscopic, which tackle different problems, is presented. Finally, possible directions for future research are discussed.

Index Terms—crowd behavior analysis, abnormal crowd analysis, motion segmentation, physics-inspired methods, fluid dynamics, social force model, optical flow, macroscopic, microscopic

I. INTRODUCTION

Crowd analysis is a research area of computer vision that is gaining momentum. The steady increase in population size and density creates problems for public safety and management. Solutions are offered by the widespread use of video cameras for public surveillance. However, relying on human operators makes these systems infeasible, as humans cannot monitor multiple feeds simultaneously, become fatigued and incur high operating costs. That is why there are attempts to introduce automatic surveillance systems. Besides visual surveillance for anomaly detection, crowd analysis is used in public space design, crowd management and virtual environments. Figure 1 shows an example structure of crowd scene analysis algorithms. Survey [1] shows an increase in the number of survey papers on the subject. As the number of papers increases, so does the need for unified definitions. An attempt at basic definitions is made in the glossary [2]. A crowd is an aggregation of people that share the same goal [2]. Some papers distinguish crowds by size: small crowds (up to 15 people), medium crowds (up to 50 people) and dense crowds (more than 50 people) [3]. Because crowd analysis is interdisciplinary, there are many different definitions and approaches to problem solving. The most common approaches come from physics-inspired models and from a sociological standpoint. Many deep learning models that mimic such approaches have been proposed.

This work has been supported by the Croatian Science Foundation under project IP-2018-01-7619 Knowledge-based Approach to Crowd Analysis in Video Surveillance (KACAVIS).

Survey [3] discusses not only computer vision in crowd analysis but also how it relates to and intertwines with computer graphics. An interesting part of that survey concerns holistic approaches to behaviour understanding. Since holistic approaches describe the crowd as a single large entity, they have advantages and drawbacks. The biggest advantage, and also the biggest drawback, is the absence of human detection. If we are not interested in a single person in the crowd, or the crowd is so dense that a single person cannot be pinpointed, we do not use human detection and thereby automatically exclude the errors and noise of that step. However, if humans are not detected, holistic analysis cannot distinguish between humans and, for example, cars.

Survey [4] focuses on crowd motion pattern learning, crowd behaviour and activity analysis, and anomaly detection in crowds. The survey notes that, due to the nature of crowds, conventional computer vision and traditional approaches are not enough to tackle the problem efficiently, and that special considerations are needed when modeling such systems. "Ali [5] stated that the mechanics of crowd analysis are complex, because crowds exhibit both dynamics and psychological characteristics such that are goal directed" [5]. Crowds can also be partitioned into structured and unstructured. Structured crowds share a goal and are coherent, while unstructured crowds are chaotic in their movements and goals. The authors split crowd analysis into two approaches: i) the continuum-based approach and ii) the agent-based approach.

Survey [1] provides an overview of physics-inspired methods in crowd analysis, such as fluid dynamics, interaction forces and complex motion systems. The focus is on crowd motion analysis. The survey points out that approaches to crowd analysis overlap and states that there are three methods for crowd analysis: i) controlled experiments, ii) crowd modeling and simulation, and iii) crowd video surveillance.

With the development of deep learning and its growing use in all areas of computer vision and computer science in general, it is also used in crowd analysis. The survey by Tripathi et al. [6] shows a steady increase in the use of deep learning in crowd analysis. Despite the datasets mentioned in the survey, the authors state that deep learning is not more widely applied to crowd analysis because a good dataset for training deep networks is lacking.

Another division of crowd analysis is based on the problems being solved, some of which are:

• Density estimation
• Crowd Counting
• Tracking in crowds (Motion Segmentation)
• Crowd Behavior Recognition (Abnormal Behavior Detection)

Fig. 1. Structure of the crowd scene analysis (Taken from: [4])

Many of these subjects are intertwined. The results of crowd density estimation can be used as an additional parameter when counting pedestrians in the scene. Analyzing crowd behaviour and detecting abnormal crowd behaviour can be based on tracking people in the scene and analyzing segmented motion. Anomaly detection and localization are challenging tasks because the definition of "anomaly" is subjective and depends on the context in which it is used. Generally, anomalous events are those that occur rarely or unexpectedly [41]. This paper provides an overview of current state-of-the-art physics-inspired crowd analysis models.

II. CROWD ANALYSIS APPROACHES

From the computer vision standpoint, approaches to crowd analysis are classified as microscopic (object-level) and macroscopic (holistic), while in the literature on crowd modeling in computer graphics they are classified as agent-based and continuum-based [3], [4]. An additional approach is the mesoscopic level, which incorporates both microscopic and macroscopic approaches in an effort to mitigate their downsides and use their advantages. The following subsections give a general review of the methods, with a more in-depth analysis later in the paper.

A. Microscopic (object-level; agent-based)

The microscopic approach to crowd analysis focuses on object-level and texture-level analysis. It is better suited to low-density crowds of two to three dozen individuals in the scene, where occlusion and other external factors can be mitigated or neglected. One example is the use of such systems in controlled environments like subways, where illumination is constant. An example of such a scene is shown in Figure 2, although, as can be seen, it is not perfect because the interactive signpost causes occlusion. The advantage of this approach is the ease of tracking individuals in the scene, since they are detected in every frame.

Fig. 2. Example of crowd in the subway (Taken from: https://pixabay.com/photos/train-commuter-commerce-people-1285358/)

B. Macroscopic (holistic; continuum-based)

As interest in crowd behavior analysis grew, researchers turned to the macroscopic approach because of its main advantage over the microscopic approach: it does not need to detect pedestrians in the scene in order to analyze it. As mentioned before, that approach has its upsides and downsides. One downside is the tracking of individuals. Because no pedestrians are detected, it is hard to discern individuals, especially as density increases: the number of pixels per individual decreases, and tracking in dense crowds breaks down because of occlusions caused by inter-object interactions. Tracking every individual is also hard because humans exhibit goal-directed dynamics and psychological characteristics [7], which influence how each person behaves in the crowd.

Crowd analysis methods based on optical flow were introduced to sidestep this tracking problem. Optical flow is widely used in computer vision and in computer science in general. It computes instantaneous motion between frames, and quite a number of methods exist for computing it. Its drawback is that it does not capture long-range temporal dependencies, because it is computed between two frames, so temporal and spatial features based on pure optical flow are not applicable in general applications.

Typical physics-inspired macroscopic methods are those concerning fluid analysis. Many fluid analysis methods, such as time-dependent and time-independent dynamical systems, are used. The best-known approaches to fluid dynamic analysis are those based on pathlines, streamlines and streakflow.

An advantage of these methods is that they can be combined with optical flow to model and segment movements in the scene. Streamlines are curves that are instantaneously tangent to the velocity vector at every point of the flow. Pathlines are the trajectories that individual particles trace while moving through the flow. The streakline representation of the flow is superior to other flow methods because it recognizes spatio-temporal changes in the scene more accurately [8].
Most of these methods are well known in fluid dynamics as flow visualization and analysis tools. Figure 3 shows the difference between streamlines, pathlines and streaklines.

Fig. 3. Example of flow representations: 1. a streamline, which can be calculated using just optical flow; 2. pathlines, which follow every particle that moved from the start point; 3. an example of a streakline. (Taken from: http://www.cs.ucf.edu/∼ramin/projects/streaklineeccv/ramin eccv streakline.html)

If a streakline were visualized as a flow analysis tool, it would be dyed material injected into the flow: the trace that the dye leaves in the flow is the streakline. According to Mehran: "streaklines represent the locations of all particles at a given time that passed through a particular point" [8]. For a steady, unchanging flow, all three representations are the same, while they differ for an unsteady, changing flow. When analyzing crowd behaviour, we mostly look for changes in flows, and the method that best describes spatio-temporal changes is streakline flow.

Interesting research areas in macroscopic analysis are tracking in crowded scenes (higher-density crowds) [7], [9] and crowd behaviour analysis [10]–[13]. Survey [4] treats these methods under flow-based motion representation.

C. Mesoscopic

The mesoscopic approach tries to take advantage of both the macroscopic and microscopic approaches while negating their shortcomings. Survey [4] treats these methods under spatio-temporal motion representation.

III. MICROSCOPIC

Helbing and Molnar [14] presented a new method for microscopic (agent-based) crowd modeling. The method is known as the Social Force Model (SFM), and it describes the movements and decisions of an agent based on its environment. This work has inspired many extensions.

The Social Force Model is based on interaction forces between a pedestrian and the environment, and between a pedestrian and other pedestrians. SFM describes a pedestrian's intention to move towards a target while adapting to the current situation. The Social Force Model can be described with the following formula:

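$$m_i \frac{d\vec{v}_i}{dt} = \vec{f}_i^{\,\mathrm{desired}} + \vec{f}_{iw}^{\,\mathrm{obstacle}} + \sum_j \vec{f}_{ij}^{\,\mathrm{interaction}} \tag{1}$$

where $\vec{f}_i^{\,\mathrm{desired}}$ is the desired force of the pedestrian, which expresses the velocity with which the pedestrian wants to move:

$$\vec{f}_i^{\,\mathrm{desired}} = \frac{v_i^{\mathrm{desired}}\,\vec{e}_i^{\,\mathrm{desired}} - \vec{v}_i}{\tau_i} \tag{2}$$

where $v_i^{\mathrm{desired}}$ is the speed at which the pedestrian wants to move, $\vec{e}_i^{\,\mathrm{desired}}$ is the normalized vector pointing in the direction of the target location, $\vec{v}_i$ is the current velocity of the pedestrian and $\tau_i$ is the relaxation parameter [15].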
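As a minimal sketch of Eq. (2), the desired force of one pedestrian in 2D can be computed as below. The function name `desired_force`, the tuple-based vectors and the handling of a pedestrian already standing on the target are assumptions made for illustration; they do not come from [14] or [15].

```python
import math


def desired_force(v_desired, target, position, velocity, tau):
    """Eq. (2): (v_desired * e_desired - v) / tau for one pedestrian in 2D.

    All vectors are (x, y) tuples; this layout is an illustrative assumption.
    """
    dx, dy = target[0] - position[0], target[1] - position[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        # Assumption: no preferred direction once the target is reached.
        ex, ey = 0.0, 0.0
    else:
        # Normalized vector pointing towards the target location.
        ex, ey = dx / dist, dy / dist
    return ((v_desired * ex - velocity[0]) / tau,
            (v_desired * ey - velocity[1]) / tau)
```

A pedestrian who already moves at the desired speed in the desired direction receives zero force, while a slower pedestrian is accelerated towards the target; a smaller relaxation parameter gives a stronger correction.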

$$\vec{f}_{iw}^{\,\mathrm{obstacle}}(\vec{x}_i, \vec{x}_w, r_i) = \left[A_i \exp\!\left(\frac{r_i - d_{iw}}{B_{iw}}\right)\right] \vec{e}_{iw} \tag{3}$$

where $\vec{x}_i$ is the position of the pedestrian, $\vec{x}_w$ is the position of the nearest obstacle and $r_i$ is the radius of the pedestrian [15]. $A_i$ and $B_{iw}$ are constants, and $d_{iw} = \|\vec{x}_i - \vec{x}_w\|$ is the distance between the pedestrian and the nearest obstacle [15]. $\vec{e}_{iw}$ is a normalized vector pointing from the obstacle to the pedestrian, thus giving the direction of the negative (repulsive) force.
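The obstacle term of Eq. (3) can be sketched in the same way. The function name `obstacle_force` and the parameter values used in the example are hypothetical; only the formula itself follows the text above.

```python
import math


def obstacle_force(x_i, x_w, r_i, A_i, B_iw):
    """Eq. (3): A_i * exp((r_i - d_iw) / B_iw) * e_iw for one pedestrian in 2D."""
    dx, dy = x_i[0] - x_w[0], x_i[1] - x_w[1]
    d_iw = math.hypot(dx, dy)  # distance between pedestrian and obstacle
    if d_iw == 0.0:
        raise ValueError("pedestrian and obstacle positions coincide")
    magnitude = A_i * math.exp((r_i - d_iw) / B_iw)
    # e_iw points from the obstacle to the pedestrian (repulsion).
    return (magnitude * dx / d_iw, magnitude * dy / d_iw)
```

When the pedestrian just touches the obstacle (d_iw equal to r_i) the magnitude equals A_i, and it decays exponentially with distance at a rate set by B_iw.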

$\vec{f}_{ij}^{\,\mathrm{interaction}}$ is the force between pedestrians [15]:

$$\vec{f}_{ij}^{\,\mathrm{interaction}}(\vec{x}_i, \vec{x}_j, \vec{v}_{ij}, r_{ij}) = r_{ij}(\vec{x}_i, \vec{x}_j, \vec{v}_{ij}) + t_{ij}(\vec{x}_i, \vec{x}_j, \vec{v}_{ij}, r_{ij}) \tag{4}$$

where $r_{ij}(\vec{x}_i, \vec{x}_j, \vec{v}_{ij})$ is a repulsive function by which a pedestrian slows down to avoid colliding with another pedestrian [15]:

$$r_{ij}(\vec{x}_i, \vec{x}_j, \vec{v}_{ij}) = -A_i \exp\!\left(\frac{r_{ij} - d_{ij}}{B_{ij}} - (n\,B_{ij}\,\theta)^2\right) \vec{t}_{ij} \tag{5}$$

where the $-A_i \exp(\ldots)$ term gives the strength of the force and $\vec{t}_{ij}$ gives the movement direction [15]. $A_i$ is an individual interaction strength constant. $B_{ij} = \gamma \|\vec{T}_{ij}\|$ is the interaction range, with $\vec{T}_{ij} = \lambda \vec{v}_{ij} + \vec{e}_{ij}$, where $\lambda$ and $\gamma$ are constants. $n$ is an angular interaction range constant, and $\theta$ is the angle between the interaction vector and the relative velocity vector (between $-\pi$, full left, and $\pi$, full right) [15]. $\vec{t}_{ij} = \vec{T}_{ij} / \|\vec{T}_{ij}\|$ is the normalized interaction vector.

$$t_{ij}(\vec{x}_i, \vec{x}_j, \vec{v}_{ij}) = -A_i K_\theta \exp\!\left(\frac{r_{ij} - d_{ij}}{B_{ij}} - (n'\,B_{ij}\,\theta)^2\right) \vec{n}_{ij} \tag{6}$$

This term describes a pedestrian changing the direction of movement when near other pedestrians. It differs from $r_{ij}(\ldots)$ in that it has the turn sign $K_\theta = \theta / |\theta|$, which indicates the direction in which the pedestrian should move ($-1$: left, $1$: right), and the direction vector $\vec{n}_{ij}$, which is perpendicular to the vector $\vec{t}_{ij}$ [15]. Figure 4 illustrates the social forces in play.
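Equations (4) to (6) can be combined into one small routine, sketched below under several assumptions that are not stated in the text: the unit vector e_ij, the relative velocity v_ij and the angle theta are passed in precomputed, n_ij is taken as t_ij rotated by +90 degrees, and the turn sign is set to zero when theta is zero (where theta/|theta| is undefined). The function name `interaction_force` is hypothetical.

```python
import math


def interaction_force(e_ij, v_ij, theta, r_ij, d_ij, A_i, lam, gamma, n, n_prime):
    """Sum of the deceleration term (Eq. 5) and the turning term (Eq. 6) in 2D."""
    # Interaction vector T_ij = lambda * v_ij + e_ij and its norm.
    Tx = lam * v_ij[0] + e_ij[0]
    Ty = lam * v_ij[1] + e_ij[1]
    norm_T = math.hypot(Tx, Ty)
    if norm_T == 0.0:
        raise ValueError("interaction vector has zero length")
    tx, ty = Tx / norm_T, Ty / norm_T   # normalized interaction vector t_ij
    B_ij = gamma * norm_T               # interaction range
    nx, ny = -ty, tx                    # assumption: n_ij is t_ij rotated by +90 degrees
    # Turn sign K_theta = theta / |theta|; assumption: 0 when theta == 0.
    k_theta = 0.0 if theta == 0.0 else math.copysign(1.0, theta)
    base = (r_ij - d_ij) / B_ij
    decel = -A_i * math.exp(base - (n * B_ij * theta) ** 2)
    turn = -A_i * k_theta * math.exp(base - (n_prime * B_ij * theta) ** 2)
    return (decel * tx + turn * nx, decel * ty + turn * ny)
```

With this convention, flipping the sign of theta leaves the decelerating component along t_ij unchanged and mirrors the turning component along n_ij, which matches the role of the turn sign described above.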

Fig. 4. Illustration of Social Forces. (Taken from: http://futurict.blogspot.com/2014/12/social-forces-revealing-causes-of.html)

Rittscher et al. [9] proposed a framework for segmenting individuals in the scene. To obtain the segmentation, a variant of the EM algorithm is used. An advantage of this approach is that no prior knowledge about the scene is necessary. The authors showed that the framework was robust to partial occlusion, shadows and clutter.

Ge et al. [16] proposed an algorithm for small-group analysis in crowds. They state that people often walk in pairs or in groups and only rarely alone. The proposed algorithm consists of a detector (head-shoulder or reversible-jump MCMC) and a particle filter tracker for tracking the detected persons. Once trajectories were obtained, possible groups were identified using hierarchical clustering within a sliding time window.

Zhou et al. [17] presented a model for crowd behavior analysis. The model comprises a mixture of dynamic pedestrian-agents that learns behaviour patterns. It is agent-based because each pedestrian is modeled as an independent agent; the whole crowd is then modeled as a mixture of pedestrian-agents. Because it models the beliefs of pedestrians, it can learn from fragmented data. This approach models the collective dynamics of a crowd.

Sent et al. [15] proposed an agent-based model for crowd simulation. The method applies fuzzy logic to the social force model. Using fuzzy rules, they described the mathematical equations used to model interaction forces in the conventional SFM. The fuzzy sets were calibrated empirically, and the 3 mathematical functions were translated into 6 fuzzy systems: the desired force is composed of 2 fuzzy sets (angle, intensity), the obstacle force is described with 1 fuzzy set (intensity) and the social force between pedestrians is composed of 3 fuzzy sets (angle, intensity, deceleration). Their proposed model inherited all the merits of SFM while reducing complexity without adding computational cost.

Huang et al. [18] presented an extension of the social force model (SFM) known as the social group force model (SGFM). This extension tackles a drawback of SFM, which does not include the psychological component of intra-group behaviour: people do not behave in the same way when walking with strangers as when walking with someone they know. Additional components modeling social groups were introduced into the basic SFM. Social groups anticipate collisions with other groups, and smaller groups initiate avoidance.

IV. MACROSCOPIC

Ali et al. [7] presented an algorithm for tracking individuals in high-density (structured) crowds. They overcame the problems of high-density tracking by introducing a "scene structure based force model". Using three floor fields (DFF, SFF and BFF), they learn properties of the scene: the SFF captures static properties (the "background"), while the BFF and DFF are dynamic, with values 20 and 5, respectively. With these properties they can track particles moving through the scene. The algorithm was inspired by evacuation procedures in case of fire.

Cheriyadat et al. [19] presented an algorithm for automatically identifying dominant motion patterns in densely crowded scenes. The authors tracked feature points found with the Shi-Tomasi-Kanade (STK) detector using the Kanade-Lucas-Tomasi (KLT) tracker in a preset area of the video sequence. Using the longest common subsequence (LCS), they created trajectories, which were then grouped by similar velocity, direction and spatial closeness. The algorithm did not run in real time and has quite a number of parameters, which the authors said would be improved in the future.

Ali and Shah [11] proposed a framework for high-density crowd flow segmentation using Lagrangian particle dynamics. Using optical flow and particle dynamics, they segmented the video by movement through time using the finite-time Lyapunov exponent (FTLE), and obtained meaningful results. The method works by calculating flow fields and then the mean field (the average over n frames). Particles are advected under the influence of the mean fields. Flow maps and FTLE fields are then calculated and, lastly, the FTLE field is segmented. A drawback of the framework is that it does not run in real time.

Andrade et al. [10] presented an automatic technique for abnormal crowd detection. Using dense optical flow and background subtraction, they consider only foreground objects, thus reducing noise. The features used in the HMM are based on principal component analysis (PCA). The paper showed that the method effectively detects simulated scenarios.

Benabbas et al. [20] proposed a global motion model based on optical flow. The models in the algorithm use mixture distributions estimated with an online approach. Using 2 classifiers allows for overlap in events (walking-running/merge-split-...). They showed that their approach was applicable but lacked parameter refinement (for finer or coarser analysis) and did not run in real time (4 fps).

Direkoglu et al. [21] presented a framework for abnormality detection based on novel features, which include the angular difference between consecutive dense optical flows. An abnormal situation is one where each individual in the group runs around.

Brox et al. [22] presented a high-accuracy method for optical flow calculation. The method is based on warping theory; the authors demonstrate its success and argue for its use in optical flow calculation. They stated that although the focus was on accuracy, the method can also be fast because of implicit minimization.

Hospedales et al. [23] presented a method for identifying and localizing rare and subtle behaviours. Their model, weakly-supervised joint topic modeling (WS-JTM), is capable of one-shot learning of events that are rare and subtle among events that occur regularly. Another advantage of the framework is that it is capable of online classification and localization in a video.

Hu et al. [24] devised a method for discovering motion patterns in crowded scenes. The method uses an optical flow map created from the whole video. Hierarchical clustering and neighborhood graphs are used to segment motion patterns from the global optical flow field. An obvious drawback is that this method is not online.

Jodoin et al. [25] created a novel method for extracting dominant motion patterns and main entry/exit areas. They used a motion histogram for every pixel, based on optical flow calculated on every frame of the video. "Meta-tracking" is then initiated to find meta-tracks of particles initialized randomly on the frame. Tracks are clustered hierarchically to obtain the final motion patterns. According to the authors, their method works better than some object-tracker-based methods but, again, it is not online.

Mehran et al. [26] introduced a method for detecting abnormal behaviours in crowd scenes. Using optical flow with the holistic approach, particle advection was used to model the social force between inserted particles, and their interaction was then classified as normal or abnormal behaviour. Their results indicated that the method can detect and localize abnormal behaviour in crowded scenes.

Mehran et al. [8] proposed a streakline representation of flow in crowded scenes. The proposal addresses problems of representations more commonly used with optical flow, such as streamlines and pathlines. Streamlines have gaps in the flow because they are created from instantaneous velocity vectors. Pathlines do not have this problem, but they do not allow detection of local spatial changes and suffer from time lag. The authors showed the advantages of using streaklines in segmentation and abnormal behavior detection.

The difference between pathlines, streamlines and streaklines was explained in the previous section. Streaklines are calculated by advecting particles through the flow:

$$\vec{x}_i^{\,p}(t+1) = \vec{x}_i^{\,p}(t) + \vec{v}\left(\vec{x}_i^{\,p}(t), t\right)$$

where $\vec{x}_i^{\,p} = (x_i, y_i)$ is the particle position vector and $\vec{v}_i^{\,p} = (u_i, v_i)$ is the velocity field along each axis. Since the velocity field has the same size as the original frame (dense optical flow), the velocity at the particle's current location determines its location at the next moment $t+1$. This yields a family of curves, all starting at a point $p$ [8]. An unsteady flow can be represented with multiple pathlines (one pathline for each particle) or with a single streakline containing the same number of particles. For example, a streakline with $T$ particles corresponds to $T$ pathlines with $T, T-1, T-2, \ldots, 1$ points, i.e. $T(T-1)/2$ path segments, so the streakline is more memory efficient [8]. Streaklines also have a downside: if they are tracked for too long, the shape they adopt can be inconsistent with the true flow.
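The advection rule above can be sketched as follows. This is only an illustration: the flow is assumed to be available as a callable `flow(x, y, t)` returning `(u, v)`, standing in for a lookup into the dense optical flow field, and the helper names `advect`, `pathline` and `streakline` are hypothetical.

```python
def advect(point, flow, t):
    """One step x(t+1) = x(t) + v(x(t), t); flow(x, y, t) returns (u, v)."""
    u, v = flow(point[0], point[1], t)
    return (point[0] + u, point[1] + v)


def pathline(seed, flow, steps):
    """Trajectory of a single particle released at `seed` at time 0."""
    points = [seed]
    for t in range(steps):
        points.append(advect(points[-1], flow, t))
    return points


def streakline(seed, flow, steps):
    """Positions at time `steps` of particles released at `seed` on every frame."""
    particles = []
    for t in range(steps):
        particles.append(seed)  # release a new particle at the seed point
        particles = [advect(p, flow, t) for p in particles]
    particles.append(seed)      # particle released at the final time
    return particles            # ordered from oldest to newest particle
```

For a steady flow the streakline returned here contains exactly the points of the pathline in reverse order, while for a time-varying flow the two differ, in line with the earlier remark that the representations coincide only for steady flow.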

Moore et al. [27] describe hydrodynamic approaches to video crowd surveillance on three different scales (microscopic: one particle; mesoscopic: multiple particles in a group; macroscopic: all particles). The advantage of using the social force model on particles was demonstrated.

Raghavendra et al. [28] introduced a new scheme for detecting and localizing abnormal crowd behavior. The proposed method uses the Social Force Model (SFM) as the feature to analyze, optimized by Particle Swarm Optimization (PSO) to advect randomly initialized particles. Particles that did not fit the estimated distribution were considered outliers and thus abnormal. An advantage of this method is that it does not require a learning phase.

Solmaz et al. [29] proposed a method for identifying five different crowd behaviors (bottlenecks, fountainheads, lanes, arches and blocking). The method consists of a dynamical system defined by optical flow, which is used to create trajectories of particles inserted in the scene. These trajectories are then analyzed using linear algebra to identify ROIs and their properties. An advantage of this method is that it does not require learning. Its drawbacks are that it cannot identify overlapping behaviors and that the system is deterministic and cannot capture randomness in the behavior.

Treuille et al. [13] presented a real-time method for crowd simulation based on continuum dynamics. They showed the advantage of simulating the crowd as a continuum instead of with agents, although they noted that it is not a perfect method for all crowd behavior (tightly packed people). The method is formulated in terms of goal, speed and discomfort level (people want to walk on the walkway, not the road).

Wang et al. [30] presented an unsupervised method for crowd analysis and abnormal crowd behavior detection. The method is based on low-level features such as optical flow, which are further processed with Latent Dirichlet Allocation (LDA) and the hierarchical Dirichlet process (HDP). Low-level features are clustered into simple "atomic" activities, which are in turn clustered into multi-agent interactions. HDP is preferred because it can automatically discover the correct number of activities.

Wang et al. [31] presented an improved version of the Mehran (M. Shah) flow segmentation method. The improvement is based on a variational model for computing optical flow, used instead of the Lucas-Kanade method, and on an improved formulation of the streakflow similarity.
Results showed that this method was less susceptible to interference.

Wu et al. [32] presented a method for anomaly detection and localization and crowd flow modelling for both coherent and incoherent scenes. The framework models flow using particle advection based on optical flow. Chaotic dynamics analysis of the trajectories generated by particle advection is used to characterize crowd motions. A probabilistic framework using Gaussian Mixture Models is used for anomaly detection and localization.

Wu et al. [33] presented a crowd descriptor inspired by curl and divergence and their interaction. Both curl and divergence are computed from the normalized optical flow of the video sequences. The descriptor vector is obtained as the outer product of the local curl and divergence descriptors. With this procedure the number of descriptors can grow very quickly, so Fisher pooling with PCA is used to obtain an optimal size. The authors showed the effectiveness of the proposed method over some older ones.

Yang et al. [34] proposed an unsupervised approach to motion segmentation (modeling). Low-level optical flow is quantized into words, which are then screened for useful ones (those that contain information) for pattern detection. Motion patterns are detected at different scales using diffusion maps. Clustering is used for dominant motion detection.

Yuan et al. [35] presented a novel statistical method for abnormal crowd behavior detection. The method is based on statistical modelling of optical flow and the histogram of oriented gradients (HOG). Normal motion patterns are learned from the training set, while abnormal motion patterns are inferred online from the test sets. It is noted that test frames are composed of both normal and abnormal motion patterns. Noise in the scenes is modeled using a Mixture of Gaussians (MoG). Normal events in the testing scenes are used to refine the learned model to better distinguish normal from abnormal.

Zhang et al. [36] presented a method for crowd motion segmentation based on the computation of streaklines. Streaklines are computed using optical flow, and trajectories advected with streaklines are grouped using a novel similarity calculation method. The drawbacks are that groups cannot be segmented properly when motion in the scene is slow, and that the efficiency of the method depends on the learning set (it performs well if the crowd density in the learning scene is the same as in the testing scene).

Zhang et al. [37] proposed an abnormal crowd behavior detector based on energy levels. They first calculated dense optical flow and the quality of every particle based on its distance from the camera (so that the algorithm is insensitive to perspective). The entropy of the image is calculated and thresholded with Otsu's method to obtain the foreground. Kinetic energy is calculated from particle quality and velocity. The energy level is described by a co-occurrence matrix. From this matrix 3 different descriptors are calculated, and for behavior to be classed as abnormal, all descriptors must be abnormal for more than 10 frames.
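The persistence rule in the last sentence can be sketched as a simple run-length check. How each descriptor is judged abnormal is not described here, so the sketch assumes that a per-frame tuple of booleans (one per descriptor) is already available; the function name `abnormal_frames` and the `min_run` parameter are hypothetical.

```python
def abnormal_frames(descriptor_flags, min_run=10):
    """Flag a frame once all descriptors have been abnormal for more than
    `min_run` consecutive frames.

    descriptor_flags: iterable of per-frame tuples of booleans, one boolean per
    descriptor (assumed to come from an upstream test not shown here).
    """
    run = 0
    result = []
    for flags in descriptor_flags:
        # The run continues only while every descriptor is abnormal.
        run = run + 1 if all(flags) else 0
        result.append(run > min_run)
    return result
```

A single frame in which any descriptor returns to normal resets the counter, so short bursts are not reported as abnormal behavior.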
An advantage of the method of Zhang et al. [37] is that it is insensitive to changes in point of view.

In [38], a spatio-temporal convolutional neural network (STCNN) model was presented. The idea of the model was to take raw pixels as input and then classify abnormal behaviour first at the frame level (temporal dimension) and then at the patch (pixel) level (spatial dimension). To increase precision and decrease computational cost, the authors performed preprocessing using optical flow as a low-level feature. If a patch contained less than a certain amount of movement, it was marked as noise and rejected. Figure 5 displays this approach. To tackle the problem of the temporal dimension, some convolution layers contain an additional dimension in their filters. Figure 6 shows the proposed structure of the STCNN with 4 convolutional layers (first, third, fourth and sixth), 2 subsampling layers (second and fifth), and 2 final fully connected layers. They reported results better than state-of-the-art algorithms, but note that a drawback of their model is that it cannot detect “abnormalities” not seen in the training dataset, or ones that are obscured or occluded by the surrounding pedestrians or objects.

Another deep learning approach was proposed by Zhuang et al. [39]. They proposed a model that can learn long-term events. The model consists of two parts. In the first part, they adopted the VGG-16 neural

Fig. 5. Proposed approach for anomaly detection using STCNN. (Taken from: https://www.sciencedirect.com/science/article/abs/pii/S0923596516300935)

network for computing a 4096-dimensional vector that was then the input for the differential long short-term memory (DLSTM). The second part consists of a differential variant of the long short-term memory (LSTM) module, the DLSTM, which can express spatio-temporal structures better than a plain LSTM. Three modules were stacked because the first DLSTM has only spatial information, while deeper layers learn temporal information and model complex spatio-temporal structures. A comparison was made between using only the second part of the differential recurrent convolutional neural network (DRCNN) and using the whole DRCNN, and it was shown that the whole model worked better, as it was an end-to-end architecture and required no trajectory detection or any additional input besides the raw image. Figure 7 shows the structure of the DRCNN. The authors concluded that, when no human supervision is included in the process of state-of-the-art methods, DRCNN outperforms those methods.

A framework for learning deep event models for crowd anomaly detection was presented in [40]. It is based on unsupervised deep neural networks. Features for the neural networks are extracted automatically using PCANet from 3D gradients, which contain both motion and appearance. With these features, a deep GMM is learned. The presented experiments showed an improvement over state-of-the-art traditional methods that used hand-crafted features. An advantage of this approach was that the GMM was capable of learning the distribution of normal events, so when abnormal events happened, it could successfully detect them without having seen them before.

Another example of the macroscopic deep learning approach was presented in [41] by Sabokrou et al. In their work, they used a fully convolutional neural network for fast anomaly detection in crowded scenes. They argue that the use of hand-crafted features is a poor idea because such features cannot represent normal scenarios efficiently, whereas CNNs have proved effective in various data analysis tasks.
Also, there are numerous ways to describe region properties, while trajectory-based methods have been used to define behaviours of objects. These methods, while modeling spatio-temporal properties such as histogram of gradients (HoG) and histogram of optical flow (HoF), have two disadvantages: they cannot handle occlusion and they have high computational complexity [41]. In their results on the Subway and UCSD datasets, they showed that their approach, while working in near real-time, was superior to state-of-the-art methods.
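To make the hand-crafted features discussed above more concrete, the following is a minimal sketch of a histogram of optical flow (HoF) for a single patch. It is a generic illustration rather than the formulation of any cited paper: the number of bins, the magnitude weighting, the noise threshold `min_mag` and the function name `hof` are all assumptions.

```python
import numpy as np

def hof(flow, n_bins=8, min_mag=1e-3):
    """Magnitude-weighted histogram of flow directions for one patch.

    flow: (H, W, 2) array holding the (x, y) flow vector of each pixel.
    Returns a vector of length n_bins that sums to 1, or all zeros
    when no pixel moves faster than min_mag.
    """
    fx = flow[..., 0].ravel()
    fy = flow[..., 1].ravel()
    mag = np.hypot(fx, fy)
    ang = np.arctan2(fy, fx) % (2 * np.pi)  # direction in [0, 2*pi)
    keep = mag > min_mag                    # ignore near-static pixels
    # Map each direction to a bin index; clip guards the 2*pi edge case.
    bins = np.minimum((ang[keep] / (2 * np.pi) * n_bins).astype(int),
                      n_bins - 1)
    hist = np.bincount(bins, weights=mag[keep], minlength=n_bins)
    total = hist.sum()
    return hist / total if total > 0 else hist
```

Computing such a descriptor for every patch of every frame is one source of the computational cost noted above, which is part of the motivation for learned features.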

V. MESOSCOPIC

Kratz and Nishino [12] presented a novel framework based on modeling of the local spatio-temporal motion patterns. Their approach was to model spatio-temporal changes via

Fig. 6. Structure of the STCNN. (Taken from: https://www.sciencedirect.com/science/article/abs/pii/S0923596516300935)

Fig. 7. Structure of the DRCNN. (Taken from: https://ieeexplore.ieee.org/document/7961786)

cuboids, which would then be used to train HMMs to detect unusual motion patterns. Temporal relationships between local spatio-temporal motion patterns are captured by a distribution-based HMM, while spatial relationships are captured by a coupled HMM.

Kratz and Nishino [42], inspired by their previous work, proposed a method for tracking pedestrians. Pedestrians were tracked using a particle filter that used prior knowledge from modeled motion patterns. Motion patterns were modeled from training frames of the same scene taken at earlier time stamps. A drawback of the system is that the crowd in the video has to be structured.

Another approach to mesoscopic level representation was presented in [43], [44]. The analysis starts at the microscopic level with an agent-based approach to detecting pedestrians, and the resulting bounding boxes are used for further analysis. The macroscopic approach is used when calculating dense optical flow inside the bounding box of each detection, thus analyzing where every pixel is moving. To predict movement in the next frame, a point density function and dominant motion vectors are calculated. These features are used when tracking pedestrians. The generated trajectories are then used as inputs to the fuzzy systems for abnormal behaviour detection. The drawback of this approach is that it performs only frame-level analysis (it only detects the time at which such a situation is happening), and it does not run in real time.
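The step of extracting a dominant motion vector from the dense optical flow inside a detection can be sketched as follows. This is an illustrative simplification, not the procedure of [43], [44]: it assumes the dominant motion is the mean flow of the pixels falling in the most populated direction bin, and the function name `dominant_motion` and its parameters are hypothetical.

```python
import numpy as np

def dominant_motion(flow, box, n_bins=8, min_mag=0.5):
    """Estimate the dominant motion vector inside a bounding box.

    flow: (H, W, 2) dense optical flow for the frame.
    box:  (x0, y0, x1, y1) pixel coordinates, x1 and y1 exclusive.
    Returns the mean flow vector of the pixels whose direction falls
    in the most populated direction bin, or zeros if nothing moves.
    """
    x0, y0, x1, y1 = box
    patch = flow[y0:y1, x0:x1].reshape(-1, 2)
    mag = np.hypot(patch[:, 0], patch[:, 1])
    moving = patch[mag > min_mag]           # drop near-static pixels
    if len(moving) == 0:
        return np.zeros(2)
    ang = np.arctan2(moving[:, 1], moving[:, 0]) % (2 * np.pi)
    bins = np.minimum((ang / (2 * np.pi) * n_bins).astype(int),
                      n_bins - 1)
    top = np.bincount(bins, minlength=n_bins).argmax()
    return moving[bins == top].mean(axis=0)
```

Voting over direction bins before averaging keeps background pixels or an occluding neighbour inside the box from pulling the estimate away from the pedestrian's own motion.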

VI. CONCLUSION

In the previous sections of this work, an overview of some methods was presented. Core physics-based methods, such as the Social Force Model (SFM) and the streakline representation, were presented in detail, while other methods that improved upon them were pointed out. For the mesoscopic representation, the advantage of using microscopic methods on a macroscopic scale was presented, along with its possibilities for commercial usage.

In light of all of these methods, there is still room for improvement, especially in including deep learning methods for feature extraction and in modeling fuzzy logic to be used in inference engines. Figure 8 shows an example of what the structure of such a system could look like.

Fig. 8. Example of the proposed structure.

Most of the presented methods tackle problems of spatio-temporal analysis using traditional methods. While some of the presented deep learning approaches try to handle the complexity of spatio-temporal relationships, there are not many approaches that include fuzzy logic [45], [46]. It could be an approach that effectively tackles the complexity of crowds by using integrated fuzzy spatio-temporal knowledge and reasoning. Although this structure looks generic, and it leaves room to intervene in any of the presented modules, the focus of the research would be on creating an effective fuzzy system that could describe and model crowd behaviors. Additional value would be gained if it could detect and recognize (localize) multiple events happening simultaneously.

ACKNOWLEDGMENT

This work has been supported by the Croatian Science Foundation under project IP-2018-01-7619 Knowledge-based Approach to Crowd Analysis in Video Surveillance (KACAVIS).

REFERENCES

[1] Xuguang Zhang, Qinan Yu, and Hui Yu. Physics inspired methods for crowd video surveillance and analysis: a survey. IEEE Access, 6:66816–66830, 2018.

[2] Juliane Adrian, Nikolai Bode, Martyn Amos, Mitra Baratchi, Mira Beermann, Maik Boltes, Alessandro Corbetta, Guillaume Dezecache, John Drury, Zhijian Fu, Roland Geraerts, Steve Gwynne, Gesine Hofinger, Aoife Hunt, Tinus Kanters, Angelika Kneidl, Krisztina Konya, Gerta Köster, Mira Küpper, Georgios Michalareas, Fergus Neville, Evangelos Ntontis, Stephen Reicher, Enrico Ronchi, Andreas Schadschneider, Armin Seyfried, Alastair Shipman, Anna Sieben, Michael Spearpoint, Gavin Brent Sullivan, Anne Templeton, Federico Toschi, Zeynep Yücel, Francesco Zanlungo, Iker Zuriguel, Natalie van der Wal, Frank van Schadewijk, Cornelia von Krüchten, and Nanda Wijermans. A glossary for research on human crowd dynamics. Collective Dynamics, 4:1–13, 2019.

[3] Julio Cezar Silveira Jacques Junior, Soraia Raupp Musse, and Claudio Rosito Jung. Crowd analysis using computer vision techniques. IEEE Signal Processing Magazine, 27(5):66–77, 2010.

[4] Teng Li, Huan Chang, Meng Wang, Bingbing Ni, Richang Hong, and Shuicheng Yan. Crowded scene analysis: A survey. IEEE Transactions on Circuits and Systems for Video Technology, 25(3):367–386, 2014.

[5] Saad Ali. Taming crowded visual scenes.

[6] Gaurav Tripathi, Kuldeep Singh, and Dinesh Kumar Vishwakarma. Convolutional neural networks for crowd behaviour analysis: a survey. The Visual Computer, 35(5):753–776, May 2019.

[7] Saad Ali and Mubarak Shah. Floor fields for tracking in high density crowd scenes. In David Forsyth, Philip Torr, and Andrew Zisserman, editors, Computer Vision – ECCV 2008, pages 1–14, Berlin, Heidelberg, 2008. Springer Berlin Heidelberg.

[8] Ramin Mehran, Brian E. Moore, and Mubarak Shah. A streakline representation of flow in crowded scenes. In Kostas Daniilidis, Petros Maragos, and Nikos Paragios, editors, Computer Vision – ECCV 2010, pages 439–452, Berlin, Heidelberg, 2010. Springer Berlin Heidelberg.

[9] Jens Rittscher, P.H. Tu, and N. Krahnstoever. Simultaneous estimation of segmentation and shape. Volume 2, pages 486–493, July 2005.

[10] E. L. Andrade, S. Blunsden, and R. B. Fisher. Modelling crowd scenes for event detection. In 18th International Conference on Pattern Recognition (ICPR'06), volume 1, pages 175–178, Aug 2006.

[11] S. Ali and M. Shah. A lagrangian particle dynamics approach for crowd flow segmentation and stability analysis. In 2007 IEEE Conference on Computer Vision and Pattern Recognition, pages 1–6, June 2007.

[12] L. Kratz and K. Nishino. Anomaly detection in extremely crowded scenes using spatio-temporal motion pattern models. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 1446–1453, June 2009.

[13] Adrien Treuille, Seth Cooper, and Zoran Popovic. Continuum crowds. ACM Trans. Graph., 25(3):1160–1168, July 2006.

[14] Dirk Helbing and Peter Molnar. Social force model for pedestrian dynamics. Physical Review E, 51, May 1998.

[15] A. D. Sent, M. Roisenberg, and P. J. de Freitas Filho. Simulation of crowd behavior using fuzzy social force model. In 2015 Winter Simulation Conference (WSC), pages 3901–3912, Dec 2015.

[16] W. Ge, R. T. Collins, and R. B. Ruback. Vision-based analysis of small groups in pedestrian crowds. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(5):1003–1016, May 2012.

[17] B. Zhou, X. Wang, and X. Tang. Understanding collective crowd behaviors: Learning a mixture model of dynamic pedestrian-agents. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 2871–2878, June 2012.

[18] Lin Huang, Jianhua Gong, Wenhang Li, Tao Xu, Shen Shen, Jianming Liang, Quanlong Feng, Dong Zhang, and Jun Sun. Social force model-based group behavior simulation in virtual geographic environments. ISPRS International Journal of Geo-Information, 7(2), 2018.

[19] A. M. Cheriyadat and R. J. Radke. Detecting dominant motions in dense crowds. IEEE Journal of Selected Topics in Signal Processing, 2(4):568–581, Aug 2008.

[20] Yassine Benabbas, Nacim Ihaddadene, and Chaabane Djeraba. Motion pattern extraction and event detection for automatic visual surveillance. EURASIP Journal on Image and Video Processing, 2011(1):163682, Dec 2010.

[21] C. Direkoglu, M. Sah, and N. E. O'Connor. Abnormal crowd behavior detection using novel optical flow-based features. In 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pages 1–6, Aug 2017.

[22] Thomas Brox, Andres Bruhn, Nils Papenberg, and Joachim Weickert. High accuracy optical flow estimation based on a theory for warping. In Tomas Pajdla and Jiří Matas, editors, Computer Vision - ECCV 2004, pages 25–36, Berlin, Heidelberg, 2004. Springer Berlin Heidelberg.

[23] Timothy Hospedales, Jian Li, Shaogang Gong, and Tao Xiang. Identifying rare and subtle behaviors: A weakly supervised joint topic model. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33, Apr 2011.

[24] M. Hu, S. Ali, and M. Shah. Learning motion patterns in crowded scenes using motion flow field. In 2008 19th International Conference on Pattern Recognition, pages 1–5, Dec 2008.

[25] Pierre-Marc Jodoin, Yannick Benezeth, and Yi Wang. Meta-tracking for video scene understanding. Pages 1–6, Aug 2013.

[26] R. Mehran, A. Oyama, and M. Shah. Abnormal crowd behavior detection using social force model. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 935–942, June 2009.

[27] Brian Moore, Saad Ali, Ramin Mehran, and Mubarak Shah. Visual crowd surveillance through a hydrodynamics lens. Commun. ACM, 54:64–73, Dec 2011.

[28] R. Raghavendra, Alessio Del Bue, Marco Cristani, and Vittorio Murino. Abnormal crowd behavior detection by social force optimization. Pages 134–145, Nov 2011.

[29] B. Solmaz, B. E. Moore, and M. Shah. Identifying behaviors in crowd scenes using stability analysis for dynamical systems. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(10):2064–2070, Oct 2012.

[30] X. Wang, X. Ma, and E. Grimson. Unsupervised activity perception by hierarchical bayesian models. In 2007 IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8, June 2007.

[31] Xiaofei Wang, Xiaomin Yang, Xiaohai He, Qizhi Teng, and Mingliang Gao. A high accuracy flow segmentation method in crowded scenes based on streakline. Optik, 125(3):924–929, 2014.

[32] S. Wu, B. E. Moore, and M. Shah. Chaotic invariants of lagrangian particle trajectories for anomaly detection in crowded scenes. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 2054–2060, June 2010.

[33] Shuang Wu, Hang Su, Hua Yang, Shibao Zheng, Yawen Fan, and Qin Zhou. Bilinear dynamics for crowd video analysis. Journal of Visual Communication and Image Representation, 48:461–470, 2017.

[34] Yang Yang, Jingen Liu, and Mubarak Shah. Video scene understanding using multi-scale analysis. Pages 1669–1676, Sep 2009.

[35] Y. Yuan, Y. Feng, and X. Lu. Statistical hypothesis detector for abnormal event detection in crowded scenes. IEEE Transactions on Cybernetics, 47(11):3597–3608, Nov 2017.

[36] Dongping Zhang, Jiao Xu, Min Sun, and Zhiyu Xiang. High-density crowd behaviors segmentation based on dynamical systems. Multimedia Syst., 23(5):599–606, October 2017.

[37] Xuguang Zhang, Qian Zhang, Shuo Hu, Chunsheng Guo, and Hui Yu. Energy level-based abnormal crowd behavior detection. Sensors, 18(2):423, Feb 2018.

[38] Shifu Zhou, Wei Shen, Dan Zeng, Mei Fang, Yuanwang Wei, and Zhijiang Zhang. Spatial-temporal convolutional neural networks for anomaly detection and localization in crowded scenes. Signal Processing: Image Communication, 47, July 2016.

[39] N. Zhuang, T. Yusufu, J. Ye, and K. A. Hua. Group activity recognition with differential recurrent convolutional neural networks. In 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), pages 526–531, May 2017.

[40] Yachuang Feng, Yuan Yuan, and Xiaoqiang Lu. Learning deep event models for crowd anomaly detection. Neurocomputing, 219, Sep 2016.

[41] Mohammad Sabokrou, Mohsen Fayyaz, Mahmood Fathy, Zahra Moayedd, and Reinhard Klette. Deep-anomaly: Fully convolutional neural network for fast anomaly detection in crowded scenes, 2016.

[42] L. Kratz and K. Nishino. Tracking pedestrians using local spatio-temporal motion patterns in extremely crowded scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(5):987–1002, May 2012.

[43] Darijan Marcetic and Slobodan Ribaric. A fuzzy logic-based approach to detection of abnormal crowd behaviour. Proceedings ELMAR-2019, pages 143–146.

[44] Franjo Matkovic, Darijan Marcetic, and Slobodan Ribaric. Abnormal crowd behaviour recognition in surveillance videos. 2019 15th International Conference on Signal-Image Technology and Internet-Based Systems, pages 428–435.

[45] Slobodan Ribaric and Tomislav Hrkac. A model of fuzzy spatio-temporal knowledge representation and reasoning based on high-level petri nets. Information Systems, 37(3):238–256, 2012.

[46] Haitao Cheng, Li Yan, Zongmin Ma, and Slobodan Ribaric. Fuzzy spatio-temporal ontologies and formal construction based on fuzzy petri nets. Computational Intelligence, 35(1):204–239, 2019.