
Full Terms & Conditions of access and use can be found at https://www.tandfonline.com/action/journalInformation?journalCode=tadr20

Advanced Robotics

ISSN: 0169-1864 (Print) 1568-5535 (Online) Journal homepage: https://www.tandfonline.com/loi/tadr20

Visual detection and tracking with UAVs, following a mobile object

Diego A. Mercado-Ravell, Pedro Castillo & Rogelio Lozano

To cite this article: Diego A. Mercado-Ravell, Pedro Castillo & Rogelio Lozano (2019) Visual detection and tracking with UAVs, following a mobile object, Advanced Robotics, 33:7-8, 388-402, DOI: 10.1080/01691864.2019.1596834

To link to this article: https://doi.org/10.1080/01691864.2019.1596834

Published online: 27 Mar 2019.



ADVANCED ROBOTICS, 2019, VOL. 33, NOS. 7–8, 388–402. https://doi.org/10.1080/01691864.2019.1596834

FULL PAPER

Visual detection and tracking with UAVs, following a mobile object

Diego A. Mercado-Ravell a, Pedro Castillo b and Rogelio Lozano c

aCátedra CONACyT, Centro de Investigación en Matemáticas CIMAT, Zacatecas, Mexico; bSorbonne Universités, Université de technologie de Compiègne, CNRS, Heudiasyc UMR 7253, CS 60 319, Compiègne cedex, France; cLaboratoire Franco Mexicain d'Informatique et Automatique, Unité Mixte Internationale LAFMIA UMI 3175 CINVESTAV-CNRS, Mexico City, Mexico

ABSTRACT
Conception and development of an Unmanned Aerial Vehicle (UAV) capable of detecting, tracking and following a moving object with unknown dynamics is presented in this work, considering a human face as a case study. Object detection is accomplished by a Haar cascade classifier. Once an object is detected, it is tracked with the help of a Kalman Filter (KF), and an estimation of the relative position with respect to the target is obtained. A linear controller is used to validate the proposed vision scheme and to regulate the aerial robot's position in order to keep a constant distance with respect to the mobile target, also employing the extra information available from the embedded sensors. The proposed system was extensively tested in real-time experiments, under different conditions, using a commercial quadcopter connected wirelessly to a ground station running the Robot Operating System (ROS). The proposed overall strategy shows good performance even under disadvantageous conditions such as outdoor flight, being robust against illumination changes, image noise and the presence of other people in the scene.

ARTICLE HISTORY
Received 9 October 2018; Revised 14 January 2019; Accepted 6 March 2019

KEYWORDS
Aerial vehicles; object detection and tracking; vision-based control; human–robot interaction

1. Introduction

The astonishingly fast developments in the fields of computer science, electronics, mechanics and, more particularly, robotics over recent years allow us to believe that the science fiction vision of a world surrounded by all kinds of robots interacting with humans in all sorts of tasks is no longer a fairy dream, but a real possibility in the near future.

Until now, the high risk of injuring humans has prevented the use of robots in several applications, constraining them to industrial tasks where the environment is controlled and safety can be maximized, or to dangerous scenarios where humans can hardly operate. Otherwise, small robots that are harmless to humans have to be employed, mainly for entertainment and research. With respect to aerial robots, better known as Unmanned Aerial Vehicles (UAVs), some interesting applications have emerged from new technologies [1]; for example, using the Myo bracelet produced by Thalmic Labs, we are able to control UAVs just with the movements of the arm [2]. This is a device equipped with Electromyography (EMG) sensors along with a 9-axis Inertial Measurement Unit (IMU) that observes muscle activity to detect hand gestures and orientation.

CONTACT Diego A. Mercado-Ravell [email protected]; [email protected]

Another interesting tool is the Microsoft Kinect sensor. It has been conceived to recognize gestures and body postures in order to control the navigation of UAVs in a friendly way through a Natural User Interface (NUI) [3,4]. Additionally, haptic devices offer a different way of interaction with human operators by providing force feedback, improving the piloting experience and assisting the user in complicated tasks such as simultaneously controlling multiple UAVs [5]. A study on the integration of speech, touch and gesture for the control of UAVs from a ground control station is also presented in [6].

The use of computer vision for human–drone interaction appears as a powerful alternative, and some examples can be found in the literature, as in [7], where a study on how to naturally interact with drones using hand gestures is presented. More examples can be found in [8], where a survey on computer vision for UAVs is presented. Target tracking using monocular vision for aerial vehicles has also been studied, as in [9], where a nonlinear adaptive observer and a guidance law for object tracking are proposed and validated in simulations. Another key related problem is the detection of people in the imagery provided by drones. This is of particular interest in search and rescue missions using UAVs. For instance, Blondel et al. [10] propose a detection algorithm using information from visual and infrared spectrum cameras.

© 2019 Informa UK Limited, trading as Taylor & Francis Group and The Robotics Society of Japan



In the present work, we intend to offer a solution to the problem of a UAV capable of following a moving target, in this case a human face, using a camera as the main sensor. As a first approximation to the problem, the simplest solution was preferred over the sophisticated one, as long as it was effective in accomplishing its objective, always keeping the user's safety as the design priority. To do so, a wide variety of problems, all of great interest in the robotics field, were successfully solved, and the resulting system can be safely used. They range from perception using monocular computer vision, to automatic control and state estimation, passing through signal filtering.

Detection and tracking of moving objects is envisioned for several interesting applications, including following a suspect or a fugitive in a pursuit, or video recording people in hazardous situations, such as while practicing extreme sports. This idea was already conceived by the commercial quadcopter LILY [11], but in that case the system depends on an external tracking device to localize the human user, while our proposed algorithm depends only on an already integrated frontal camera. Moreover, following a person can be used to interact with the human user, for example, trying to hide and run away from the drone, or even to play popular games with it such as tag or hot potato. Furthermore, the classifier can be trained to detect any other mostly rigid object to be followed by the drone.

Recent efforts towards people following using computer vision can be found in the literature, as in the early case of the Joggobot [12], a UAV designed to assist a person while jogging, improving the jogging experience. This was accomplished by using special markers on the jogger's T-shirt. Later on, in [13] the person-following problem is tackled using a depth camera, stabilizing the depth image by warping it to a virtual static camera and feeding the stabilized video to the OpenNI tracker, but a second camera looking upwards is needed to detect special markers and determine the absolute position of the UAV. Simple commands are provided through hand gestures as well. Danelljan et al. [14] propose a detection algorithm combining color and shape information. Detected objects are then used to initialize an Adaptive Color Tracker (ACT) to keep track of multiple objects. The tracking results are verified using the detector and filtered using a Probability Hypothesis Density (PHD) filter. Distance to the target is estimated assuming a horizontal ground plane and a fixed person height. Position control is performed by means of a proportional controller.

In [15], a general object-following strategy using a commercial AR.Drone 2.0 and the OpenTLD tracker is presented. An Image-Based Visual Servoing (IBVS) scheme is then applied to follow the target. This approach was tested to follow people using the logos on the target's clothes. The advantage lies in the capability to follow different objects, but a human operator must initialize the object to be tracked, and this architecture is not able to estimate the depth to the target. Meanwhile, in [16] a study on multiple object trackers is presented to detect and follow a person using the AR.Drone 2.0. In particular, they extended the Discriminative Scale Space Tracker (DSST) and the Kernelized Correlation Filter (KCF) based tracker in order to detect target loss and re-detect targets. They employ the same IBVS as [15]. More recently, in [17] an end-to-end human–robot interaction with an uninstrumented human is presented. First, the drone, a commercial Parrot Bebop, detects from far away potential humans waving their hands to interact with. Then, it approaches the selected target and obeys simple hand-gesture commands such as taking pictures and landing. Hand gestures are detected using optical flow. During the approaching phase, the same long-term visual tracker from [16] is used. Depth estimation is performed similarly to [14]. IBVS is used along with a predictive controller to position the drone, but the lateral movements are not controlled. Finally, Yao et al. [18] present a vision-based human-following UAV, but using a blimp instead of a multirotor.

In contrast to other works, as depicted in Table 1, our approach does not require special markers or logos on the human's clothes [12,13,15]. Also, a simple depth estimation is provided, removing the horizontal ground plane assumption [14,17]. Finally, unlike [15–17] where IBVS is used, we propose a relative position controller after estimation of the relative position between the drone and the human target in the world frame, where the drone is commanded as an omnidirectional robot, controlling lateral movements instead of yaw. The proposed architecture has proved to be effective in real-time experiments, while using computationally efficient algorithms well suited for embedded applications on UAVs.

The contribution of this work can be summarized in the following points:

• Several powerful tools and techniques are merged together to offer a full working solution for a mobile-target tracking drone.

• A complete computer vision algorithm is developed to detect and track a face in 3D, despite the presence of false positives or other faces in the image. This includes estimating the distance from the image plane to the face, using the size of the detected face in the image.


Table 1. Qualitative comparison with related works.

Property/work | Target type       | Special markers | Tracking method | Reach                       | Depth estimation            | Prototype
[12]          | T-shirt           | Yes             | Unknown         | Following in straight lines | No                          | AR.Drone
[13]          | Full body         | Yes             | OpenNI          | Following + hand gestures   | Depth camera                | AscTec Pelican
[14]          | People            | No              | ACT + PHD       | Following                   | Horizontal plane assumption | LinkQuad
[15]          | Manually selected | T-shirt logo    | OpenTLD         | Following                   | No                          | AR.Drone
[16]          | Person            | No              | DSST & KCF      | Following (IBVS [15])       | No                          | AR.Drone
[17]          | People            | No              | [16]            | Hand-gesture commands       | [14]                        | Bebop
This work     | Face              | No              | Haar + KF       | Following                   | Yes                         | AR.Drone

• A real-time relative position control algorithm is proposed for a UAV following a moving object with unknown dynamics, including the observation of the missing state.

The outline of this paper is the following: the general operation of the overall system is described in Section 2. In Section 3, the computer vision algorithm for face detection and tracking, as well as the necessary transformation from the image to the real world, are presented. The relative position control is introduced in Section 4, along with some techniques for estimating the altitude speed needed for the state feedback. Section 5 contains the experimental results of the proposed algorithms, and last but not least, Section 6 concludes with the perspectives and conclusions of this work.

2. Overall system description

A commercial quadcopter, the AR.Drone Parrot, shown in Figure 1, is selected for this work. This quadcopter offers a good solution to work close to people without major danger, thanks to its small size and weight of 53 × 52 cm and 0.42 kg, respectively, and above all because it is protected with a soft hull which also increases its robustness against crashes. It is equipped with three-axis gyroscopes and accelerometers, an ultrasound altimeter, an air pressure sensor and a magnetic compass. Furthermore, it contains two video cameras, one looking downwards and one forward. The former, with a resolution of 320 × 240 pixels at a rate of 60 fps, is used to estimate the horizontal velocities using optic flow, while the latter, with a resolution of 1280 × 720 at 30 fps, is normally intended for real-time video streaming and image recording, but in this case it is further used by the vision algorithm to detect a moving object of interest.

Figure 1. Detection and tracking of a mobile object using a UAV: application to following a human user. The UAV is able to autonomously follow the target of interest, keeping a constant distance to it.



The drawback of this kind of commercial UAV resides in its lack of flexibility to be modified, since both software and hardware come as closed architectures and it is not straightforward to customize them. This is solved by designing position control laws that employ reference roll and pitch angles along with altitude and yaw speeds as virtual control inputs (φd, θd, żd, ψ̇d). Such virtual control signals are then fed to the inner autopilot as desired references. In this case, a linear PD controller is proposed. All sensor measurements are sent to a ground station at a frequency of 200 Hz. The image processing and the control algorithms are computed in real time on ROS at a rate of 30 Hz.

Three main nodes running in ROS on a ground station computer are employed to achieve object detection and tracking with a UAV.

Figure 2. Overall system description. The drone communicates wirelessly with a ground station composed of a computer running ROS. Three main ROS nodes are executed in the ground station: the drone's driver, the face detection using OpenCV and the control node. The drone's driver provides information from the embedded sensors, such as the ultrasound altimeter and the optic flow sensor, along with the video stream from two cameras. Object detection is accomplished using a Haar cascade classifier on OpenCV; once a target is detected, scale is estimated based on the target size, and a KF is used for tracking the objective. Relative position control is performed through a PD controller, using the extra information from the sensors, where a Luenberger observer is implemented to recover the altitude speed. For safety reasons, manual recovery is available with a joystick.

Communication with the drone is performed by means of the AR.Drone driver node, already available as open source, which allows the ground station to recover information from the embedded sensors on the drone, along with the video streams from both cameras, and to send control inputs. A second node is in charge of the image processing from the frontal camera to detect and track the target, providing the target's position and size. Object detection is accomplished using a Haar feature-based cascade classifier. Once a target is detected, a Kalman Filter (KF) is used to track it over time, adding robustness against the presence of other target-like objects and false-positive detections. A third node was implemented for relative position control. First, the estimated relative position of the tracked object with respect to the drone is transformed from the image space to the real world by a suitable change of coordinates, where the image depth is estimated using the a-priori knowledge of the object size. A PD controller is used to keep a constant distance between the UAV and the target, where the horizontal velocities (ẋ, ẏ) are obtained from the embedded optical flow sensor, and the altitude speed (ż) is estimated by means of a Luenberger observer and the ultrasound altitude sensor. Both the visual object detection and tracking node and the relative position control node were developed for this work.

Moreover, a Graphical User Interface (GUI) node is available for online parameter tuning, switching between operation modes and real-time monitoring. Also, in case any problem arises, and taking the user's safety as the design priority, manual recovery is possible at any time using a six-axis wireless joystick. The overall system architecture is presented in Figure 2.
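As a rough illustration of this node layout, the following sketch shows how a Python ground-station node could subscribe to the frontal camera and publish the virtual control inputs. The topic names (/ardrone/front/image_raw, /cmd_vel) are assumptions based on the open-source ardrone_autonomy driver and are not stated in the text; the detection, tracking and control steps of Sections 3 and 4 would be called inside the image callback.

```python
#!/usr/bin/env python
# Sketch of the ground-station node layout (topic names assume the
# open-source ardrone_autonomy driver and may need to be adapted).
import rospy
from sensor_msgs.msg import Image
from geometry_msgs.msg import Twist
from cv_bridge import CvBridge


class FaceFollower(object):
    def __init__(self):
        self.bridge = CvBridge()
        # The virtual control inputs (roll/pitch references, vertical and yaw
        # rates) are sent to the inner autopilot as a Twist message.
        self.cmd_pub = rospy.Publisher('/cmd_vel', Twist, queue_size=1)
        rospy.Subscriber('/ardrone/front/image_raw', Image,
                         self.image_cb, queue_size=1, buff_size=2**22)

    def image_cb(self, msg):
        frame = self.bridge.imgmsg_to_cv2(msg, desired_encoding='bgr8')
        # Detection, tracking and relative-position control (Sections 3-4)
        # would be executed here; a zero command is published as a placeholder.
        self.cmd_pub.publish(Twist())


if __name__ == '__main__':
    rospy.init_node('face_follower')
    FaceFollower()
    rospy.spin()
```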

3. Visual object detection and tracking

3.1. Computer vision algorithm for object detection

Object detection and tracking is performed using computer vision, with the help of the OpenCV libraries on ROS, using the cv_bridge node. In brief, the UAV's frontal camera streams video at 30 Hz in BGR format, which is converted to grayscale. Then, the image is smoothed by means of a blur filter with a 5 × 5 kernel to eliminate white noise in the image, helping to diminish the number of false positive detections due to high-frequency noise. Afterwards, a histogram equalization is applied to improve the contrast of the image. Finally, the pre-trained object detector is used, and the selected target is tracked with the help of a KF. Outliers are disregarded before the KF. A flow chart describing the complete visual object detection and tracking algorithm is depicted in Figure 3.
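The preprocessing chain described above maps directly onto standard OpenCV calls. The sketch below is a minimal version of that chain using the frontal-face cascade distributed with OpenCV; the detection parameters (scale factor, minimum neighbours, minimum size) are illustrative values, not the ones used by the authors.

```python
import cv2

# Pre-trained frontal-face Haar cascade shipped with OpenCV (the classifier
# actually used in the paper may differ).
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')


def detect_faces(frame_bgr):
    """Preprocess one BGR frame and return the detected face rectangles."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)   # BGR -> grayscale
    gray = cv2.blur(gray, (5, 5))                        # 5x5 blur against white noise
    gray = cv2.equalizeHist(gray)                        # histogram equalization
    # Each detection is returned as a rectangle (x, y, w, h) in pixels.
    return face_cascade.detectMultiScale(
        gray, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30))
```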


Figure 3. Flowchart of the image processing algorithm. The video stream is provided by the frontal camera on the UAV and converted to grayscale. After a blur filter and histogram equalization, the Haar cascade classifier is used to detect the objective in the scene. A KF is used to track the moving object over time, rejecting other similar objects in the image as well as false positives.

Object detection is accomplished through the Haar classifier in OpenCV [19]. It is a machine learning technique first developed in [20], which uses Haar-like features in cascade through different levels of the image to determine whether or not a pre-specified rigid object, for which it was trained a priori, is present in the image. The Haar classifier is a supervised classifier that uses a form of AdaBoost organized as a rejection cascade and designed to have a high detection rate at the cost of a low rejection rate, producing many false positives. One of the main advantages of this method is the computational speed achieved in real-time detection, once the classifier has been trained off-line for the desired object, in this case a face.

It is important to notice that this method can be trained for almost any mostly rigid object with distinguishing views.

In this case, the classifier is trained to detect human faces. The idea is to create an application to interact with the user by keeping a constant distance. Only frontal faces are considered for now. In order to detect different face poses, a classifier for each pose can be trained and run sequentially. For future developments, it is of particular interest to detect faces rotated in yaw with respect to the drone, to achieve relative yaw control such that the UAV would be able to autonomously locate itself in front of the target.

3.2. Kalman filter for object tracking

One of the main drawbacks of the object detection algorithm is its low rejection rate, resulting in a high number of false positive detections. To overcome this issue, and in order to add robustness against missed detections and the presence of other objects similar to the target in the scene, one solution is to track the detected object over time. To do so, let us consider the use of the Kalman Filter (KF), a powerful technique for optimal state estimation and filtering of linear systems perturbed with Gaussian noise [21].

In this case, the discrete-time version of the KF [22,23] is applied to the kinematic model of the detected target (see Figure 4), i.e.

\[
\begin{bmatrix} \chi_k \\ V_k \end{bmatrix}
=
\begin{bmatrix} 1 & T_s \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} \chi_{k-1} \\ V_{k-1} \end{bmatrix}
+
\begin{bmatrix} 0 \\ a_{k-1} \end{bmatrix},
\tag{1}
\]

where χ = [x y d]^T represents the position of the center of the circle enclosing the object of interest, directly in the image, together with its diameter d, while V = [ẋ ẏ ḋ]^T is its time derivative. k is the discrete time index and Ts defines the sampling period. Finally, the process noise a ∈ ℝ³ is used as a tuning parameter, analogous to the acceleration, which determines how fast the variables can move.
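For reference, the constant-velocity model of Equation (1) can be written as a standard six-state Kalman filter. The sketch below uses numpy; the process and measurement noise values are illustrative tuning parameters, not the ones reported by the authors.

```python
import numpy as np

Ts = 1.0 / 30.0                                  # sampling period (30 Hz video)

# State [x, y, d, vx, vy, vd]: circle centre and diameter plus their velocities.
F = np.block([[np.eye(3), Ts * np.eye(3)],
              [np.zeros((3, 3)), np.eye(3)]])    # Eq. (1) written for the full state
H = np.hstack([np.eye(3), np.zeros((3, 3))])     # only (x, y, d) is measured
Q = np.diag([1.0, 1.0, 1.0, 10.0, 10.0, 10.0]) * 1e-2   # process noise ("acceleration" tuning)
R = np.eye(3) * 5.0                              # measurement noise, in pixels


def kf_predict(x, P):
    return F @ x, F @ P @ F.T + Q


def kf_update(x, P, z):
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(6) - K @ H) @ P
    return x, P
```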

The measurement in the KF is updated according to four different cases, depending on the state of the vision algorithm:

• At least one object is detected.
• No objects are detected.
  ◦ Iterations since last detection ≤ n.
  ◦ Iterations since last detection > n.

for a certain constant n ∈ ℤ+ denoting the maximum number of iterations without detection before the object is considered lost.


Figure 4. Visual object detection and tracking with a flying quadcopter. Detected objects are represented by dashed red circles, including false detections. The yellow solid circle represents the objective tracked by the KF. The algorithm is robust against false positive detections and the presence of other similar objects in the scene.

For the first case, if more than one object is detected in the same scene, either due to a false positive detection or to the presence of other similar objects in the image, the detection closest to the previous estimate is chosen. If no object is detected for a few iterations, a missed detection is assumed and the measurement is updated with the last estimate, allowing the target to continue with the same motion for a while and avoiding sharp displacements in the closed-loop system. However, if the target is not detected for several iterations, the object of interest is assumed lost; consequently, the measurement vector is updated with its initialization value χinit, and the quadcopter holds its position. Therefore, the update equation for the measurement vector ζ takes the form

\[
\zeta_k =
\begin{cases}
\chi_{\min_k} & N_{\mathrm{faces}} > 0 \\
\chi_{k-1} & (N_{\mathrm{faces}} = 0) \;\&\; (\iota_{\mathrm{lost}} \le n) \\
\chi_{\mathrm{init}} & (N_{\mathrm{faces}} = 0) \;\&\; (\iota_{\mathrm{lost}} > n)
\end{cases}
\tag{2}
\]

with

\[
\chi_{\min} = \operatorname*{arg\,min}_{\chi_{\delta i} \in \Delta} \sqrt{(x - x_{\delta i})^2 + (y - y_{\delta i})^2},
\tag{3}
\]

where Nfaces is the number of objects detected in the present frame and ιlost is the number of iterations since the last valid detection. Δ represents the set of all the objects detected in the image, each described by χδ = [xδ yδ dδ]^T, and χmin denotes the closest detection to the previous estimate.

Figure 4 illustrates a possible scenario, where the UAV captures a scene with multiple object detections. Positive detections obtained by the classifier are identified by the dashed red circles, including false positive detections where there are no objects of interest. Then, the KF is employed to track the chosen target over time, and its result is displayed by a unique solid yellow circle with center at (x, y) and diameter d. Only the closest detection to the previous estimate is used to update the measurement vector of the KF, and the rest are discarded.
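The selection of the measurement fed to the KF, following Equations (2)-(3), can be sketched as below. The re-initialization value and the loss threshold n are placeholders; once the target is declared lost, the measurement is reset and the drone simply holds its position.

```python
import numpy as np

N_LOST_MAX = 15                                   # n: iterations before the target is lost
CHI_INIT = np.array([160.0, 120.0, 40.0])         # example initialization value chi_init


def select_measurement(detections, chi_prev, iters_lost):
    """Build the KF measurement according to Eq. (2)-(3).

    detections : list of (x, y, d) circles returned by the Haar classifier
    chi_prev   : previous KF estimate of (x, y, d)
    Returns (measurement, updated iteration counter).
    """
    if detections:                                # at least one object detected
        dists = [np.hypot(x - chi_prev[0], y - chi_prev[1])
                 for (x, y, _) in detections]
        best = detections[int(np.argmin(dists))]  # closest to the previous estimate
        return np.asarray(best, dtype=float), 0
    iters_lost += 1
    if iters_lost <= N_LOST_MAX:                  # short miss: reuse the last estimate
        return chi_prev.copy(), iters_lost
    return CHI_INIT.copy(), iters_lost            # target lost: reset the measurement
```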

3.3. Relative position estimation: from image to real world

Let us consider, for simplicity, the idealized pinhole camera model to describe the relationship between coordinates in the real world pw = (xw, yw, zw) and their projection on the image pim = (xim, yim), see Figure 5, according to the following expressions [24]:

\[
x_{im} = f_x \left[ \frac{x_w}{y_w} \right] + c_x; \qquad
y_{im} = f_y \left[ \frac{z_w}{y_w} \right] + c_y;
\tag{4}
\]

where the constant parameters for the focal lengths fx = Fsx, fy = Fsy are actually the product of the physical focal length F and the number of pixels per meter sx, sy along each image axis. These constants, together with the principal point position cx, cy, determine the intrinsic parameters of the camera and are known from calibration. Note that, in contrast to the standard notation used in computer vision, here z stands for the altitude and y for the depth between the camera and the point; this is done just for consistency with the rest of the coordinate frames.

From the previous equation, it is straightforward to recover from the image projection the real-world position of a point along the xw and zw axes, normalized by the depth yw. However, this depth yw remains unknown and cannot normally be obtained from the information given by a single camera.


Figure 5. From real-world to image coordinates transformation. A pinhole camera model is used to map the image projection to its real-world source, without scale. The scale is estimated using the a-priori knowledge of the average size of the object.

The use of several cameras for stereo vision is precluded due to the inflexibility of the selected hardware. Another option is to obtain extra information from the already available sensors, such as altitude and inertial measurements, and fuse it with the vision data to estimate this depth, similar to the work developed in [25]. Nevertheless, in our considered scenario, the depth can be estimated from the a-priori knowledge of the average size of the object of interest. Even though every person's face is different in size and shape, let us assume the face of an adult has an average size whose enclosing circle has a diameter of df ≈ 0.241 m. Also, consider a diameter d of the enclosing circle for the face projected in the image, see Figure 4. Then, it is easy to show that

\[
y_w \approx \frac{F\, d_f}{d}.
\tag{5}
\]

Finally, substituting in (4), the position of the object in the physical world can be obtained with

\[
x_w = \frac{(x_{im} - c_x)}{s_x}\,\frac{d_f}{d};
\tag{6}
\]

\[
z_w = \frac{(y_{im} - c_y)}{s_y}\,\frac{d_f}{d}.
\tag{7}
\]

4. Relative position control

Provided that full state feedback is available, the mission is to control the drone to autonomously keep a constant distance with respect to a moving target with unknown dynamics. This is equivalent to a three-dimensional position control with a time-varying reference. A hierarchical control is proposed to deal with this problem, where a PD position controller with gravity compensation is added in cascade with the inner orientation control loop available with the autopilot on the selected experimental platform. This strategy is compatible with most commercial autopilots and allows for easy implementation with other drones.

Let us consider a simplified version of the well-known dynamic model of a quadcopter [26]:

\[
\begin{bmatrix} \ddot{x} \\ \ddot{y} \\ \ddot{z} \end{bmatrix}
\approx \frac{T}{m}
\begin{bmatrix}
s_\psi s_\phi + c_\psi s_\theta c_\phi \\
-c_\psi s_\phi + s_\psi s_\theta c_\phi \\
c_\theta c_\phi
\end{bmatrix}
-
\begin{bmatrix} 0 \\ 0 \\ g \end{bmatrix},
\tag{8}
\]

\[
\begin{bmatrix} \ddot{\phi} \\ \ddot{\theta} \\ \ddot{\psi} \end{bmatrix}
\approx
\begin{bmatrix} \tau_\phi \\ \tau_\theta \\ \tau_\psi \end{bmatrix},
\tag{9}
\]

where x, y, z are the position of the quadcopter with respect to an inertial frame, T ∈ ℝ+ defines the total thrust produced by the motors, and m and g represent the mass and gravity constant, respectively. φ, θ and ψ stand for the Euler angles roll, pitch and yaw, and τφ, τθ and τψ describe the control torques produced by the differential velocities of the rotors. The short notation sα = sin(α) and cα = cos(α) is used. Remember that the AR.Drone includes an internal autopilot that deals with the attitude (φ, θ and ψ) and altitude (z) controllers, following the desired references. Therefore, our control input vector is u = [φd θd żd ψ̇d]^T, i.e. the desired roll and pitch angles, and the desired altitude and yaw rates. Given that the rotational dynamics are much faster than the translational ones [27], a time-scale separation of the translational and rotational dynamics permits the use of the desired roll and pitch references (φd and θd) as virtual control inputs for the unactuated states (x and y).


4.1. Control law

Since the available autopilot already provides a suitable attitude controller, the study herein will focus on the horizontal position dynamics x and y. Assuming small variations in the roll and pitch angles, i.e. only non-aggressive maneuvers are considered, in compliance with the safety demands while working close to humans, a linearization around the equilibrium point φd ≈ θd ≈ 0 yields:

\[
m \begin{bmatrix} \ddot{x} \\ \ddot{y} \end{bmatrix}
\approx T
\begin{bmatrix} s_\psi & c_\psi \\ -c_\psi & s_\psi \end{bmatrix}
\begin{bmatrix} \phi_d \\ \theta_d \end{bmatrix}.
\tag{10}
\]

Note the introduction of the desired angles instead of the real ones; this is possible since φd ≈ φ and θd ≈ θ due to the action of the autopilot attitude control loop. Denote now

\[
\begin{bmatrix} \bar{\phi} \\ \bar{\theta} \end{bmatrix}
=
\begin{bmatrix} s_\psi & c_\psi \\ -c_\psi & s_\psi \end{bmatrix}
\begin{bmatrix} \phi_d \\ \theta_d \end{bmatrix},
\tag{11}
\]

then

\[
\begin{bmatrix} \ddot{x} \\ \ddot{y} \end{bmatrix}
= \frac{T}{m}
\begin{bmatrix} \bar{\phi} \\ \bar{\theta} \end{bmatrix},
\tag{12}
\]

where φ̄ and θ̄ are used as the new virtual control inputs for the position, and are chosen such that the position follows certain references xd, yd. Thus a PD control can be proposed as follows:

\[
\begin{bmatrix} \bar{\phi} \\ \bar{\theta} \end{bmatrix}
= \frac{m}{T}
\begin{bmatrix}
-k_{px}(x - x_d) - k_{dx}(\dot{x} - \dot{x}_d) \\
-k_{py}(y - y_d) - k_{dy}(\dot{y} - \dot{y}_d)
\end{bmatrix},
\tag{13}
\]

with the control gains kpx, kpy, kdx, kdy ∈ ℝ+. Transforming back to the original coordinates,

\[
\begin{bmatrix} \phi_d \\ \theta_d \end{bmatrix}
= \frac{m}{T}
\begin{bmatrix} s_\psi & -c_\psi \\ c_\psi & s_\psi \end{bmatrix}
\begin{bmatrix}
-k_{px}(x - x_d) - k_{dx}(\dot{x} - \dot{x}_d) \\
-k_{py}(y - y_d) - k_{dy}(\dot{y} - \dot{y}_d)
\end{bmatrix}.
\tag{14}
\]

In this work, the interest lies in the vehicle position relative to the mobile target, rather than global localization. Also, it is supposed that the objective's pose is aligned with the drone's yaw (the objective is in view of the frontal camera on the vehicle), hence the relative yaw rotation always remains small. To remove this constraint, other classifiers can be trained for different yaw rotations; then a relative yaw control can be implemented to add robustness to rotational movements and increase the applications of the system.

As for the yaw ψ and altitude z controllers, the following control laws are proposed:

\[
\dot{\psi}_d = -k_{p\psi}(\psi - \psi_d) - k_{d\psi}\dot{\psi},
\tag{15}
\]

\[
\dot{z}_d = -k_{pz}(z - z_d) - k_{dz}\dot{z},
\tag{16}
\]

for some desired ψd, zd. Also, kpψ, kpz, kdψ and kdz ∈ ℝ+ are suitable control gains. For the problem of following a mobile object, the desired position [xd yd zd]^T is obtained from the relative position between the drone and the object (Equation (7)), which is estimated by the vision algorithm presented in Section 3, i.e.

\[
\begin{bmatrix} x_d \\ y_d \end{bmatrix}
=
\begin{bmatrix} x_w \\ y_w + y_{\mathrm{off}} \end{bmatrix},
\tag{17}
\]

where yoff is a predefined safe distance to the objective.
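Putting Equations (14)-(17) together, a relative-position PD controller can be sketched as follows. The gains, the near-hover thrust approximation T ≈ mg, the zero desired yaw, and neglecting the target velocity in the derivative terms are all choices made for illustration, not values taken from the paper.

```python
import numpy as np

# Illustrative gains; the gains used on the real platform are not given in the text.
KPX, KDX = 0.4, 0.2
KPY, KDY = 0.4, 0.2
KPZ, KDZ = 1.0, 0.4
KPPSI, KDPSI = 1.0, 0.3
Y_OFF = 2.0                         # predefined safe distance to the objective [m]
G = 9.81


def relative_position_control(x_w, y_w, z_w, vx, vy, vz, psi, dpsi):
    """Return the virtual control inputs (phi_d, theta_d, dz_d, dpsi_d).

    (x_w, y_w, z_w): relative target position from the vision algorithm.
    (vx, vy, vz)   : drone velocities (optic flow and Luenberger observer).
    The drone is taken as the origin of the relative frame, so x = y = z = 0.
    """
    x_d = x_w                        # Eq. (17)
    y_d = y_w + Y_OFF                # sign of Y_OFF depends on the frame convention
    # Eq. (13): PD terms in the rotated frame (target velocity neglected)
    ux = -KPX * (0.0 - x_d) - KDX * vx
    uy = -KPY * (0.0 - y_d) - KDY * vy
    s, c = np.sin(psi), np.cos(psi)
    m_over_T = 1.0 / G               # near hover, T is approximately m*g
    # Eq. (14): rotate back to obtain the desired roll and pitch references
    phi_d = m_over_T * (s * ux - c * uy)
    theta_d = m_over_T * (c * ux + s * uy)
    # Eqs. (15)-(16): yaw-rate and vertical-speed references (psi_d = 0, z_d = z_w)
    dpsi_d = -KPPSI * (psi - 0.0) - KDPSI * dpsi
    dz_d = -KPZ * (0.0 - z_w) - KDZ * vz
    return phi_d, theta_d, dz_d, dpsi_d
```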

4.2. Altitude velocity estimation

It is important to notice that the previous feedback control strategy relies on a good measurement of the system states and their derivatives. The relative position is used directly from the vision algorithm. However, velocities from the vision algorithm turn out to be imprecise and noisy, and not to be trusted for control feedback. This is not a problem for the x and y coordinates, since the platform is equipped with a downward-looking camera from which the optic flow is computed to estimate the horizontal velocities ẋ and ẏ. Nevertheless, problems arise because no measurement is available for the altitude velocity ż.

To overcome this issue, two estimation techniques were implemented and tested for the altitude velocity, based on the altitude measurements zm from the ultrasound sensor integrated on the bottom of the helicopter. The first method consists of the classical Euler derivative of the measurement zm, i.e.

\[
\dot{z}_m(k) = \frac{z_m(k) - z_m(k - \Delta t)}{\Delta t},
\tag{18}
\]

where k and Δt stand for the discrete time variable and its increment. Although it offers a simple solution, this derivative is known to amplify the noise from the measurements and cannot be directly used for control feedback. For this reason, it was decided to implement a fourth-order Chebyshev type I low-pass filter with a cutoff frequency of 8 Hz. This is a kind of recursive filter and, as such, it offers a fast response [28]. The following equation presents the filter algorithm with input zm and output żm:

\[
\dot{z}_m(k) = \sum_{i=0}^{4} a_i\, z_m(k - i\Delta t) + b_i\, \dot{z}_m(k - i\Delta t),
\tag{19}
\]

where the constant coefficients ai and bi are obtained with the help of MATLAB's fdatool.
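One plausible way to reproduce this first method in Python is sketched below: the Euler derivative of Eq. (18) followed by a fourth-order Chebyshev type I low-pass at 8 Hz, with coefficients obtained from scipy as a stand-in for MATLAB's fdatool. The 200 Hz sampling rate is the sensor rate mentioned in Section 2, and the 1 dB passband ripple is an assumption.

```python
import numpy as np
from scipy.signal import cheby1

FS = 200.0                         # altimeter / navdata rate [Hz]
FC = 8.0                           # cutoff frequency [Hz]
# 4th-order Chebyshev type I low-pass (1 dB passband ripple assumed here).
b, a = cheby1(N=4, rp=1.0, Wn=FC / (FS / 2.0), btype='low')


class FilteredAltitudeRate(object):
    """Euler derivative of z_m (Eq. 18) smoothed by the recursive filter (Eq. 19)."""

    def __init__(self, dt=1.0 / FS):
        self.dt = dt
        self.z_prev = None
        self.x_hist = np.zeros(len(b))        # raw derivative history
        self.y_hist = np.zeros(len(a) - 1)    # filtered output history

    def update(self, z_m):
        if self.z_prev is None:
            self.z_prev = z_m
        dz_raw = (z_m - self.z_prev) / self.dt          # Eq. (18)
        self.z_prev = z_m
        self.x_hist = np.roll(self.x_hist, 1)
        self.x_hist[0] = dz_raw
        # Direct-form I IIR recursion: y[k] = (b.x_hist - a[1:].y_hist) / a[0]
        y = (np.dot(b, self.x_hist) - np.dot(a[1:], self.y_hist)) / a[0]
        self.y_hist = np.roll(self.y_hist, 1)
        self.y_hist[0] = y
        return y
```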


The result is a smooth and acceptable estimation of the speed, except for a small delay of about 4 steps produced by the filter. This response is good enough for several applications where the dynamics are slow or the sampling frequency is large enough to neglect this delay. However, for UAV control feedback this can cause instability or poor performance in the closed-loop system.

The second explored method consists of a Luenberger state observer [29]. A state observer provides estimates of the internal states of the system from the input and output measurements. Let us consider the discrete-time kinematic model of the altitude at instant k:

\[
\begin{bmatrix} z_k \\ \dot{z}_k \end{bmatrix}
=
\begin{bmatrix} 1 & \Delta t \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} z_{k-1} \\ \dot{z}_{k-1} \end{bmatrix}
+
\begin{bmatrix} 0 \\ u_z \end{bmatrix},
\tag{20}
\]

\[
\gamma = \begin{bmatrix} 1 & 0 \end{bmatrix}
\begin{bmatrix} z_{k-1} \\ \dot{z}_{k-1} \end{bmatrix},
\tag{21}
\]

where γ is the output and the input is uz = z̈k Δt. Then, a Luenberger observer is proposed, according to the following equations:

\[
\begin{bmatrix} \hat{z}_k \\ \hat{\dot{z}}_k \end{bmatrix}
=
\begin{bmatrix} 1 & \Delta t \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} \hat{z}_{k-1} \\ \hat{\dot{z}}_{k-1} \end{bmatrix}
+
\begin{bmatrix} L_1 \\ L_2 \end{bmatrix}
\left[ \gamma - \hat{\gamma} \right]
+
\begin{bmatrix} 0 \\ u_z \end{bmatrix},
\tag{22}
\]

\[
\hat{\gamma} = \begin{bmatrix} 1 & 0 \end{bmatrix}
\begin{bmatrix} \hat{z}_{k-1} \\ \hat{\dot{z}}_{k-1} \end{bmatrix}.
\tag{23}
\]

Note the use of a hat on the variables to indicate that they are state estimates rather than the real ones. In order to guarantee that the estimated states converge asymptotically to the real ones, the observer gains L1, L2 are chosen such that the matrix

\[
\begin{bmatrix} 1 - L_1 & \Delta t \\ -L_2 & 1 \end{bmatrix}
\]

has all its eigenvalues inside the unit circle.

Figure 6 allows us to compare both estimation techniques. It is clear that the two of them follow, in general, the same behavior; however, the estimation given by the Euler derivative combined with the Chebyshev filter (dashed blue line) slightly attenuates the signal amplitude and introduces a small undesired delay. This is corrected by using a Luenberger observer instead (solid red line), which proves to be an excellent option for this case.
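The observer of Equations (20)-(23) amounts to a few lines of code. The gains below are illustrative; the assertion checks the eigenvalue condition stated above, and u_z is set to zero when no vertical acceleration input is available.

```python
import numpy as np

DT = 1.0 / 200.0                       # altimeter sampling period (200 Hz navdata)
A = np.array([[1.0, DT],
              [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
L = np.array([[0.1],
              [0.5]])                  # illustrative observer gains L1, L2

# Eigenvalues of (A - L C) must lie inside the unit circle for convergence.
assert np.all(np.abs(np.linalg.eigvals(A - L @ C)) < 1.0)


class AltitudeObserver(object):
    """Discrete-time Luenberger observer for altitude and vertical speed."""

    def __init__(self):
        self.x_hat = np.zeros((2, 1))          # [z_hat, dz_hat]

    def update(self, z_measured, u_z=0.0):
        gamma_hat = float(C @ self.x_hat)      # Eq. (23)
        # Eq. (22): propagate the kinematic model and correct with the innovation.
        self.x_hat = (A @ self.x_hat
                      + L * (z_measured - gamma_hat)
                      + np.array([[0.0], [u_z]]))
        return float(self.x_hat[1, 0])         # estimated vertical speed
```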

5. Real-time experimental results

The performance of the full strategy was extensively studied under different conditions, in indoor and outdoor trials, as can be observed in the video at https://youtu.be/xbpMx4o6gY0. There, we can appreciate four stages.

First, indoor following is demonstrated with motion along the three axes, where the user even covers his face during his motion and uncovers it in a different position, proving the fast response of the system. We consider that covering the face during the motion is a harsher condition than fast user motion. The second stage, also indoors, proves the robustness of the system in the presence of other target-like objects, in this case, other faces. The drone was able to keep a constant distance with respect to the objective even in the presence of another person trying to perturb the system. The third scenario takes place outdoors in the presence of other faces. Note that there is no control over certain outdoor conditions, such as illumination and wind. In this part of the video, we can observe several false positive detections, signaled as red circles, in addition to the target detection and another face in the scene. The use of a KF for tracking allows us to keep the right objective over time. Finally, outdoor operation is tested one more time, in the presence of strong wind gusts that considerably perturb the UAV. In all scenarios, the system was able to successfully accomplish its objective in spite of the imposed harsh conditions.

To quantitatively evaluate the system performance, other experiments were conducted indoors, similar to the first stage in the video. Some results are presented in Figures 7–12. The mission consisted in detecting a human face and keeping a constant distance of 2 m in front of it, while the target moves along a rectangular trajectory at human walking speed, as can be appreciated in Figure 7. Once the second trajectory is completed, the interacting user descends and rises a few times in order to test the altitude response. An OptiTrack motion capture system, composed of 12 infrared cameras, was used only as a ground truth, providing millimeter precision.

Two results are of particular interest in this study. On the one hand, the performance of the computer vision detection plus the KF tracking in estimating the relative position of the mobile object with respect to the camera can be studied from Figures 8–10. Figure 8 illustrates the effect of the KF for tracking the moving target, mainly adding robustness to false positives and the presence of other target-like objects in the scene (other humans in this case). The result is a smoother estimation, where the KF is able to handle wrong detections (see for example the peak around second 13 in the x coordinate). Please note that this figure is the only one that does not correspond to the same experiment, but to a separate one where harder conditions were imposed in order to highlight the action of the KF, by increasing the number of false positive detections.

The relative position estimation is illustrated in Figure 9, where the visual detection and tracking algorithm is compared against the ground truth measurement for the three axes. There, it can be observed that the estimated relative position is consistent with reality, but small errors are present, mainly for the x coordinate. The good performance of the depth estimation using the a-priori knowledge of the average target size can also be noted. Figure 10 completes the study, showing the relative position estimation errors with a Root Mean Square Error (RMSE) of 0.1294 m in x, 0.2392 m in y and 0.1802 m in z, which are acceptable for this kind of application, taking into account the use of a low-cost camera subject to relatively fast motions.


Figure 6. Altitude speed estimation comparison. The Luenberger estimator (solid red line) shows a faster response and smaller attenuation compared to the filtered Euler derivative (dashed blue line).

Figure 7. 3D trajectory. The mobile target (solid red line) moves describing a rectangular trajectory and the aerial vehicle (dashed blue line) is able to follow the objective with good performance.


Figure 8. Kalman filter tracking estimation (solid lines) against direct detection by the vision algorithm (dashed lines). The use of a KF to track the detected object helps to filter out false detections and adds robustness against other similar objects in the scene; see for instance second 13 in the x variable (top plot).

Figure 9. Relative position estimated by the computer vision algorithm (dashed line) versus ground truth from a motion capture system (solid line). The proposed algorithm proves to be consistent with respect to the ground truth, but presents a small error.


Figure 10. Computer vision estimation error. The RMSE values are 0.1294 m in x, 0.2392 m in y and 0.1802 m in z.

Figure 11. Relative position control performance measured by a motion capture system. The target's position (solid lines) versus the aerial vehicle position (dashed lines). The offset in the y coordinate corresponds to a predefined desired distance yoff = 2 m between the tracked object and the drone.



Figure 12. Drone's position relative to the target. The UAV is able to follow the objective and keep a constant distance of 2 m in front of it.


On the other hand, the performance of the relative position control strategy can be observed in Figures 11 and 12. Figure 11 shows the position of the tracked object along with the position of the UAV as measured by the motion capture system. Please note that the objective is to keep a constant distance of 2 m in front of the target, which explains the offset in the y coordinate. Finally, the relative position between the drone and the moving object is presented in Figure 12, where the validity of the proposed algorithms to detect, track and control a UAV following a mobile object with unknown dynamics is confirmed.

The proposed algorithms demonstrated satisfactory performance during the experiments, even under uncontrolled conditions outdoors where illumination changes and wind gusts are a major concern. The system also proved to function effectively in spite of the presence of other target-like objects in the scene and false positive detections, thanks to the use of a KF for tracking. Furthermore, the system showed a fast response to the objective's movement, being able to follow the face as long as it stays in sight. Please note that the range of operation of the detection algorithm is limited by the field of view of the camera lens and by the camera resolution for detecting far away targets. Nevertheless, at the current stage the system is only capable of detection and tracking of frontal faces, lacking robustness against target rotations, in particular in yaw.

6. Conclusion and future work

An implementation of a UAV which is able to detect, track and follow a moving object with unknown dynamics was conceived and successfully developed in this work, using a human face as a case study. In order to do so, several tools and techniques were merged together to offer a full working solution.

The relative position from the mobile target to the quadcopter was estimated by a computer vision algorithm. Object detection was accomplished by means of a Haar cascade classifier, while a KF was implemented to keep track of the object of interest, adding robustness against false positive detections and other similar objects in the image. Then a suitable transformation was proposed, using the previously known average size of the objective, to compute the depth to the target, normally unknown for monocular vision algorithms.

A PD control strategy was implemented to deal in real time with the relative position regulation to a time-varying reference.


To complete the state feedback for the controller, a Luenberger observer was employed to estimate the missing altitude velocity. The overall system performance was tested in numerous real-time experiments indoors and outdoors, under different conditions, and proved to be a good solution for the studied problem, despite the use of a low-cost quadcopter and the simplicity of the algorithms.

It is envisioned to implement the proposed algorithms on a fully embedded UAV. Also, a wide-angle lens would help to increase the field of view, allowing faster motions.

Another important improvement is to extend the computer vision algorithm to also detect the object's rotation, and control the drone's yaw angle accordingly. To do so, it is necessary to train various classifiers for different orientations and run them sequentially.

This work can be used as a basis for future applications and developments, looking to improve the human–robot interaction experience. To cite some examples:

• The detection algorithm can be trained to track almost any other kind of mostly rigid object that has distinguishing views. Then, the proposed system can be used as it is to follow the desired object.

• Detecting face gestures and body movements could be used to give further commands to the quadcopter, improving the user experience.

• A powerful and interesting application would be to detect other quadcopters and use the developed system for formation flight applications.

Disclosure statement

No potential conflict of interest was reported by the authors.

Funding

This work has been sponsored by the French National Networks of Robotics Platforms ROBOTEX (ANR-10-EQPX-44).

Notes on contributors

Diego Alberto Mercado-Ravell was born in Mexico City. He received his B.S. degree in Mechatronics Engineering from the Universidad Panamericana in Mexico in 2010, the M.Sc. degree in Electrical Engineering, option Mechatronics, from CINVESTAV-IPN, Mexico City, in 2012, and the Ph.D. in Automation, Embedded Systems and Robotics from the University of Technology of Compiègne, France, in 2015. Dr. Mercado has held post-doctoral positions at the Mechanical and Aerospace department at Rutgers, the State University of New Jersey, and at the UMI 3175 LAFMIA laboratory at CINVESTAV, Mexico. He is currently a full-time professor at the research center in mathematics CIMAT-Zacatecas in Mexico, and a member of the Mexican National Research System (SNI-C) since 2018. His research topics include modeling and control of unmanned aerial and/or underwater vehicles, autonomous navigation, real-time embedded applications, data fusion and computer vision.

Pedro Castillo was born in Morelos, Mexico, on January 8, 1975. He received the B.S. degree in electromechanical engineering from the Instituto Tecnologico de Zacatepec, Morelos, Mexico, in 1997, the M.Sc. degree in electrical engineering from the Centro de Investigación y de Estudios Avanzados (CINVESTAV), Mexico, in 2000, and the Ph.D. degree in automatic control from the University of Technology of Compiègne, France, in 2004. His research topics include real-time control applications, nonlinear dynamics and control, aerospace vehicles, vision, and underactuated mechanical systems.

Rogelio Lozano was born in Monterrey, Mexico, on July 12, 1954. He received the B.S. degree in electronic engineering from the National Polytechnic Institute of Mexico in 1975, the M.S. degree in electrical engineering from the Centro de Investigación y de Estudios Avanzados (CINVESTAV), Mexico, in 1977, and the Ph.D. degree in automatic control from the Laboratoire d'Automatique de Grenoble, France, in 1981. He joined the Department of Electrical Engineering at CINVESTAV, Mexico, in 1981, where he worked until 1989. He was Head of the Section of Automatic Control from June 1985 to August 1987. He has held visiting positions at the University of Newcastle, Australia, from November 1983 to November 1984, NASA Langley Research Center, VA, from August 1987 to August 1988, and the Laboratoire d'Automatique de Grenoble, France, from February 1989 to July 1990. Since 1990, he has been a CNRS (Centre National de la Recherche Scientifique) Research Director at the University of Technology of Compiègne, France. He was Associate Editor of Automatica in the period 1987–2000. He has been Associate Editor of the Journal of Intelligent and Robotic Systems since 2012 and Associate Editor of the International Journal of Adaptive Control and Signal Processing since 1988. He has coordinated or participated in numerous French projects dealing with UAVs. He has recently organized two international workshops on UAVs (IFAC RED UAS 2013 and IEEE RAS RED UAS 2015). He has participated in the organization of the annual international conference ICUAS (International Conference on Unmanned Aircraft Systems) since 2010. He has been IPC Chairman of the ICSTCC in Romania since 2012. He was Head of the Heudiasyc Laboratory in the period 1995–2007. Since 2008 he has been Head of the Joint Mexican-French UMI 3175 CNRS. His areas of expertise include UAVs, mini-submarines, exoskeletons and automatic control. He has been the advisor or co-advisor of more than 35 Ph.D. theses and has published more than 130 international journal papers and 10 books.

ORCID

Diego A. Mercado-Ravell http://orcid.org/0000-0002-7416-3190

References

[1] Peshkova E, Hitz M, Kaufmann B. Natural interaction techniques for an unmanned aerial vehicle system. IEEE Pervas Comput. 2017;16(1):34–42.
[2] https://www.myo.com/.
[3] ETH Zurich. Controlling a quadrotor using Kinect; 2011.
[4] Sanna A, Lamberti F, Paravati G, et al. A Kinect-based natural interface for quadrotor control. Entertain Comput. 2013;4(3):179–186.
[5] Lee D, Franchi A, Hyoung I, et al. Semiautonomous haptic teleoperation control architecture of multiple unmanned aerial vehicles. IEEE ASME Trans Mechatron. 2013;18(4):1334–1345.
[6] Yao C, Xiaoling L, Zhiyuan L, et al. Research on the UAV multi-channel human–machine interaction system. In: IEEE, editor. 2nd Asia-Pacific Conference on Intelligent Robot Systems, Wuhan, China, June 2017.
[7] Cauchard J, Jane LE, Zhai K, et al. Drone & me: an exploration into natural human–drone interaction. In: Proceedings of the ACM International Joint Conference on Pervasive and Ubiquitous Computing, Osaka, Japan, September 2015.
[8] Kanellakis C, Nikolakopoulos G. Survey on computer vision for UAVs: current developments and trends. J Intell Robot Syst. 2017;87:141–168.
[9] Choi H, Kim Y. UAV guidance using a monocular-vision sensor for aerial target tracking. Control Eng Pract. 2014;22:10–19.
[10] Blondel P, Potelle A, Pégard C, et al. Dynamic collaboration of far-infrared and visible spectrum for human detection. In: International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 2016.
[11] https://www.lily.camera/.
[12] Graether E, Mueller F. Joggobot: a flying robot as jogging companion. In: CHI, Austin, TX, 2012.
[13] Naseer T, Sturm J, Cremers D. Follow me: person following and gesture recognition with a quadrocopter. In: Intelligent Robots and Systems (IROS), Tokyo, Japan, November 2013. p. 624–630.
[14] Danelljan M, Shahbaz F, Felsberg M, et al. A low-level active vision framework for collaborative unmanned aircraft systems. In: Agapito L, Bronstein M, Rother C, editors. Computer Vision – ECCV 2014 Workshops, Zurich, Switzerland, September 2014.
[15] Sanchez-Lopez JL, Pestana J, Saripalli S, et al. Computer vision based general object following for GPS-denied multirotor unmanned vehicles. In: American Control Conference (ACC), Portland, USA, June 2014.
[16] Haag K, Dotenco S, Gallwitz F. Correlation filter based visual trackers for person pursuit using a low-cost quadrotor. In: 15th International Conference on Innovations for Community Services (I4CS), Nuremberg, Germany, July 2015.
[17] Monajjemi M, Mohaimenianpour S, Vaughan R. UAV, come to me: end-to-end, multi-scale situated HRI with an uninstrumented human and a distant UAV. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Korea, October 2016.
[18] Yao N, Anaya E, Tao Q, et al. Monocular vision-based human following on miniature robotic blimp. In: IEEE International Conference on Robotics and Automation (ICRA), Singapore, June 2017.
[19] Bradski G, Kaehler A. Learning OpenCV. O'Reilly; 2008. Chapter 13, Machine learning; p. 459–520.
[20] Viola P, Jones MJ. Rapid object detection using a boosted cascade of simple features. In: IEEE Computer Vision and Pattern Recognition (CVPR), Kauai, Hawaii, USA, 2001.
[21] Kalman R. A new approach to linear filtering and prediction problems. J Basic Eng. 1960;82(Series D):35–45.
[22] Grewal M, Andrews A. Kalman filtering: theory and practice using MATLAB. New York, USA: John Wiley & Sons; 2001.
[23] Brown R, Hwang Y. Introduction to random signals and applied Kalman filtering. 4th ed. Chichester, UK: John Wiley & Sons; 2012.
[24] Bradski G, Kaehler A. Learning OpenCV. 1st ed. O'Reilly; 2008. Chapter, Camera models and calibration; p. 370–404.
[25] Achtelik M, Weiss S, Siegwart R. Onboard IMU and monocular vision based control for MAVs in unknown in- and outdoor environments. In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China, 2011.
[26] Castillo P, Lozano R, Dzul A. Modelling and control of mini-flying machines. 1st ed. London: Springer-Verlag; 2005. Chapter 3, The quad-rotor rotorcraft; p. 39–59.
[27] Bertrand S, Guénard N, Hamel T, et al. A hierarchical controller for miniature VTOL UAVs: design and stability analysis using singular perturbation theory. Control Eng Pract. 2011;19(10):1099–1108.
[28] Smith SW. The scientist and engineer's guide to digital signal processing. San Diego, CA: California Technical Publishing; 1997.
[29] Luenberger D. An introduction to observers. IEEE Trans Automat Contr. 1971;AC-16(6):596–602.