Upload
others
View
5
Download
0
Embed Size (px)
Citation preview
Evaluation of 3D markerless motion capture accuracyusing OpenPose with multiple video cameras
Nobuyasu Nakanoa,b,∗, Tetsuro Sakuraa, Kazuhiro Uedaa, Leon Omuraa,b,Arata Kimuraa, Yoichi Iinoa, Senshi Fukashiroa, Shinsuke Yoshiokaa
aThe University of Tokyo, Tokyo, Japan.bResearch Fellow of the Japan Society for the Promotion of Science, Tokyo, Japan.
Abstract
There is a need within human movement sciences for a markerless motion cap-
ture system, which is easy to use and sufficiently accurate to evaluate motor
performance. This study aims to develop a 3D markerless motion capture tech-
nique, using OpenPose with multiple synchronized video cameras, and examine
its accuracy in comparison with optical marker-based motion capture. Partic-
ipants performed three motor tasks (walking, countermovement jumping, and
ball throwing), with these movements measured using both marker-based opti-
cal motion capture and OpenPose-based markerless motion capture. The differ-
ences in corresponding joint positions, estimated from the two different methods
throughout the analysis, were presented as a mean absolute error (MAE). The
results demonstrated that, qualitatively, 3D pose estimation using markerless
motion capture could correctly reproduce the movements of participants. Quan-
titatively, of all the mean absolute errors calculated, approximately 47% were
less than 20 mm and 80% were less than 30 mm. However, 10% were greater
than 40 mm. The primary reason for mean absolute errors exceeding 40mm
was that OpenPose failed to track the participant’s pose in 2D images owing to
failures, such as recognition of an object as a human body segment, or replacing
∗Corresponding authorEmail address: [email protected] (Nobuyasu Nakano)
Preprint submitted to Journal of Biomechanics November 6, 2019
.CC-BY-NC-ND 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/842492doi: bioRxiv preprint
one segment with another depending on the image of each frame. In conclu-
sion, this study demonstrates that, if an algorithm that corrects all apparently
wrong tracking can be incorporated into the system, OpenPose-based marker-
less motion capture can be used for human movement science with an accuracy
of 30mm or less.
Keywords: OpenPose, markerless, motion capture
1. Introduction1
Motion capture systems have been used extensively as a fundamental tech-2
nology within biomechanics research. However, traditional marker-based ap-3
proaches experience significant environmental constraints. For example, mea-4
surements are difficult to perform in environments wherein wearing markers5
during the activity is not ideal (such as sporting games). Markerless measure-6
ments without such environmental constraints can facilitate new learnings about7
human movements (Mundermann et al., 2006); however, complex information8
processing technology is required to make an algorithm that recognizes human9
poses or skeletons from images. Therefore, it is desirable to many biomechanics10
researchers to develop a markerless motion capture that is easy to use.11
Recently, automatic human pose estimation using deep learning techniques12
have attracted attention amongst computer vision researchers. OpenPose is one13
of the most popular technologies (Cao et al., 2018) and is deemed easy to use14
for biomechanics researchers. It is open source software that automatically esti-15
mates human joint centers and skeletons from 2D RGB images, outputting the16
2D coordinates in the images. Kinect is another easy-to-use markerless motion17
capture that has been used in many studies (Clark et al., 2012; Pfister et al.,18
2014; Gao et al., 2015; Schmitz et al., 2014). OpenPose, when compared to19
Kinect, has less constraints on the distance between the camera and the target20
2
.CC-BY-NC-ND 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/842492doi: bioRxiv preprint
to be measured as well as the sampling rate of video recording, because it can21
estimate the human pose from RGB image without using a depth sensor.22
Seethapathi et al. (2019), which reviewed pose tracking studies from the per-23
spective of movement science, pointed out that human pose tracking algorithms,24
such as OpenPose, did not prioritize the quantities that matter for movement25
science. It remains unclear whether the accuracy of the OpenPose-based 3D26
markerless motion capture is appropriate for human movement studies such27
as sports biomechanics or clinical biomechanics. The aim of this study is to28
develop a 3D markerless motion capture using OpenPose with multiple syn-29
chronized video cameras, then assess the accuracy of the 3D markerless motion30
capture by comparing with an optical marker-based motion capture.31
2. Materials and Methods32
Participants. Two healthy male volunteers participated in this experiment. The33
mean age, height, and body mass of the participants were 22.0 years, 173.5 cm,34
and 69.5 kg, respectively. The participants provided written informed consent35
prior to the commencement of the study, and the experimental procedure used36
in this study was approved by the Ethics Committee of the university with37
which the authors were affiliated.38
Overview of data collection. Participants performed three motor tasks in the39
following order: walking, countermovement jumping, and ball throwing. These40
movements were measured using both a marker-based optical motion capture41
and a video camera-based (markerless) motion capture. A light was used to42
synchronize the data obtained from all the video cameras and the two differ-43
ent measurement systems. All methods and instrumentation details are in the44
following subsections.45
3
.CC-BY-NC-ND 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/842492doi: bioRxiv preprint
Marker-based motion capture. Forty-eight reflective markers were attached onto46
body landmarks (Figure 1). The coordinates of these reflective markers upon47
the participants’ bodies were recorded using a 16-camera motion capture system48
(Motion Analysis Corp, Santa Rosa, CA, USA) at a sampling rate of 200 Hz.49
The elbow, wrist, knee, and ankle joint centers were assigned to the mid-points50
of the lateral and medial markers, while the shoulder joint centers were assigned51
to the mid-points of the anterior and posterior shoulder markers. The hip joint52
centers were estimated using the method described by Harrington et al. (2007).53
The raw kinematic data was smoothed using a zero-lag fourth order Butterworth54
low-pass filter. The cut-off frequency of the filter was determined using a residual55
analysis (Winter, 2009). Data analysis was performed using MATLAB (v2019a,56
MathWorks, Inc., Natick, MA, USA).57
Markerless motion capture. The experimental setup and overview of the mark-58
erless motion capture are shown in Figure 2. The markerless motion capture59
consisted of five video cameras (GZ-RY980, JVCKENWOOD Corp, Yokohama,60
Kanagawa, Japan). Two measurement conditions, i.e. combinations of video61
camera resolutions and sampling frequencies, were implemented: 1920 × 108062
pixels at 120 Hz (1K condition) and 3840×2160 pixels at 30 Hz (4K condition).63
OpenPose (version 1.4.0) was installed from GitHub (CMU-Perceptual-Computing-Lab,64
2017) and run with GPU (GEFORCE RTX 2080 Ti, Nvidia Corp, Santa Clara,65
CA, USA) under default settings. Twenty-five keypoints (Figure 3) of the partic-66
ipant’s body were outputted independently for each frame. The control points,67
at which 3D global coordinates could be identified, were measured using the68
video cameras with use of a calibration pole. The 2D video camera coordinates69
obtained from OpenPose were transformed to 3D global coordinates using a70
direct linear transformation (DLT) method (Miller et al., 1980). The raw kine-71
matic data was smoothed using a zero-lag fourth order Butterworth low-pass72
4
.CC-BY-NC-ND 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/842492doi: bioRxiv preprint
filter. The cut-off frequency of the filter was determined using residual analysis73
(Winter, 2009) with its ranges of 5-8 Hz and 2-3 Hz in the 1K and 4K conditions,74
respectively.75
Data analysis. The position data obtained using the marker-based motion cap-76
ture was downsampled using the spline function to alter the number of frames77
such that they are the same as that obtained using markerless motion capture.78
The analysis period durations were defined for each individual motor task as79
follows: from the second step heel contact to the next heel contact of the same80
leg in a walking task, from the start of the squatting motion to the recovery of81
the initial upright stance in a jumping task, and from the toe-off on the oppo-82
site side of the throwing arm to the end of the arm-swing in a throwing task.83
The differences in the corresponding joint positions that were estimated from84
the two different motion captures throughout the analysis durations were cal-85
culated. Mean absolute error (MAE) was used as the indicator of the difference86
as described by Equation 1, where, xm and xo are the positions estimated by87
the marker-based and OpenPose-based approaches, respectively.88
3. Results89
Examples of the 3D pose estimations obtained by the two different motion90
captures are depicted in Figure 4. In addition, video examples that show the91
participant’s pose during movements are provided as supplementary materials92
to this paper. The landmarks’ positions tracked by OpenPose (Figure 3) do93
not necessarily correspond to the points estimated by the marker-based ap-94
proach. The joint positions of the shoulder, elbow, wrist, hip, knee, and ankle95
tracked by OpenPose are considered to be approximately the same as those96
estimated by the marker-based approach. However, the positions of other land-97
marks tracked by OpenPose are considered to be different from those estimated98
5
.CC-BY-NC-ND 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/842492doi: bioRxiv preprint
by the marker-based approach. The representative time-series profiles of joint99
positions estimated by both the marker-based motion capture (Mocap) and the100
OpenPose-based markerless motion capture (OpenPose) can be seen in Figure 5.101
The mean absolute error (MAE) of the two plots throughout the duration of102
analysis is shown in each panel. The shapes of the time-series profiles were103
found to be approximately the same.104
Quantitatively, the MAE of joint positions in Figure 5a and Figure 5b were105
less than 20 mm; however, the MAEs of joint positions in Figure 5c and Fig-106
ure 5d were greater than 40 mm. The MAEs were particularly large at specific107
moments within the analysis duration (i.e. at 45% and 80% time in Figure 5c,108
described in Figure 4c). The MAEs of the corresponding joint positions, es-109
timated from the two different motion captures for all trials, are presented in110
Table 1. Of these MAEs in Table 1, approximately 47% are less than 20 mm,111
80% are less than 30 mm, and 10% is greater than 40 mm.112
The accuracy of the 3D pose estimation using the markerless motion capture113
depends on 2D pose tracking by OpenPose. Because the algorithm that tracks114
the human pose was applied to each frame of the video independently, within115
a single trial, there are frames where the participant’s pose was well tracked,116
whereas in others, the participant’s pose was not well tracked. Examples of pose117
estimation successes and failures using OpenPose are depicted in Figure 6.118
4. Discussion119
This study aimed to examine the accuracy of 3D markerless motion capture120
using OpenPose with multiple video cameras, through comparison with an op-121
tical marker-based motion capture. Qualitatively, 3D pose estimation using the122
markerless motion capture approach can correctly reproduce the movements of123
participants (Figure 4 and Supplementary movies). The MAE in 80% of all124
6
.CC-BY-NC-ND 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/842492doi: bioRxiv preprint
trials conducted was found to be less than 30 mm. This small error may be due125
to the shortcomings in the OpenPose tracking precision. Because the major-126
ity of computer vision-based pose tracking algorithms, including OpenPose, are127
based on the supervised learning using manually labeled data, it is inevitable128
that small errors in 3D pose are caused by inherent noise in the training data.129
Proportionately large MAEs exceeding 40 mm were observed in certain cases130
(Figure 5). Observing the estimated pose during movements reveals that when131
the correct joint positions (including noises) were estimated in the trial, an error132
of less that 30 mm existed (e.g. Figure 4 a1, a2, a4). However, when apparently133
incorrect joint positions were estimated, a comparatively large error of greater134
thatn 40 mm was observed (e.g. Figure 4 a3).135
Observing the estimated pose during movements reveals that the correct136
joint positions (including noises) are estimated in the trial including a small137
error that is less than 30 mm (e.g. Figure 4 a1,a2,a4); however, apparently in-138
correct joint positions are estimated in the trial including a relatively large error139
that is more than 40 mm (e.g. Figure 4 a3). The primary reason for estimating140
apparently incorrect 3D positions is that OpenPose failed to track the partici-141
pant’s pose depending on the image of each individual frame (Figure 6). Du to142
the 2D tracking failures, correction of such failures is required to achieve a more143
accurate 3D pose estimation. Within this study, for example, the interchange of144
the left and right segments was retrospectively corrected because without this145
correction, the 3D pose was completely different from the human shape. How-146
ever, recognizing an object as a human body segment (e.g. failures in Figure 6)147
was not corrected because it may have required manual tracking; moreover,148
without the correction, the 3D pose accuracy could be evaluated. Therefore, to149
use the OpenPose-based markerless motion capture in human movement science150
studies, it is considered necessary to incorporate algorithms that can correct all151
7
.CC-BY-NC-ND 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/842492doi: bioRxiv preprint
such a tracking failures.152
Other sources of error may be data processing, such as time synchronization.153
The error in the movement direction (i.e. Y direction for walking task and Z154
direction for jumping task), especially in the 30 Hz measurement (4K condi-155
tion), tends to be large (Table 1). Within the time series profiles, the timing156
of synchronization appears to affect to the error of the two motion captures157
(Figure 5d). However, because this is a problem caused by the process of com-158
paring the two motion captures, the effect of this error on the accuracy of the159
markerless system measurement should be relatively small.160
This study is preliminary work, and thus requires further examination. The161
accuracy of other biomechanical parameters, such as joint angle, joint angular162
velocity, and joint torque, needs to be investigated. Additionally, a program that163
is capable of correcting the OpenPose tracking failures described in this study,164
which can improve the accuracy of the 3D estimation, needs to be developed.165
While limitations still exist, the OpenPose-based markerless motion capture is166
expected to be applied in the future to sporting games that have been considered167
difficult to measure with marker-based motion capture.168
Acknowledgements169
This work was supported by JST-Mirai Program Grant Number JPMJMI18C7,170
Japan.171
Conflict of interest172
There are no conflicts of interest to declare.173
8
.CC-BY-NC-ND 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/842492doi: bioRxiv preprint
Supplementary material174
Supplementary videos are attached with this paper. The videos show the175
human pose during walking, jumping, and throwing obtained by two different176
motion capture.177
References178
Cao, Z., Hidalgo, G., Simon, T., Wei, S.-E., & Sheikh, Y. (2018). Openpose:179
realtime multi-person 2d pose estimation using part affinity fields. arXiv180
preprint arXiv:1812.08008 , .181
Clark, R. A., Pua, Y. H., Fortin, K., Ritchie, C., Webster, K. E.,182
Denehy, L., & Bryant, A. L. (2012). Validity of the Microsoft Kinect183
for assessment of postural control. Gait and Posture, 36 , 372–377.184
doi:10.1016/j.gaitpost.2012.03.033.185
CMU-Perceptual-Computing-Lab (2017). OpenPose: Real-time multi-person186
keypoint detection library for body, face, hands, and foot estimation.187
https://github.com/CMU-Perceptual-Computing-Lab/openpose.188
Gao, Z., Yu, Y., Zhou, Y., & Du, S. (2015). Leveraging two kinect sen-189
sors for accurate full-body motion capture. Sensors, 15 , 24297–24317.190
doi:10.3390/s150924297.191
Harrington, M. E., Zavatsky, A. B., Lawson, S. E. M., Yuan, Z., & Theologis,192
T. N. (2007). Prediction of the hip joint centre in adults, children, and193
patients with cerebral palsy based on magnetic resonance imaging. Journal194
of Biomechanics, 40 , 595–602. doi:10.1016/j.jbiomech.2006.02.003.195
Miller, N. R., Shapiro, R., & McLaughlin, T. M. (1980). A technique for obtain-196
ing spatial kinematic parameters of segments of biomechanical systems from197
cinematographic data. Journal of Biomechanics, 13 , 535–547.198
9
.CC-BY-NC-ND 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/842492doi: bioRxiv preprint
Mundermann, L., Corazza, S., & Andriacchi, T. P. (2006). The evolution of199
methods for the capture of human movement leading to markerless motion200
capture for biomechanical applications. Journal of NeuroEngineering and201
Rehabilitation, 3 , 1–11. doi:10.1186/1743-0003-3-6.202
Pfister, A., West, A. M., Bronner, S., & Noah, J. A. (2014). Compara-203
tive abilities of Microsoft Kinect and Vicon 3D motion capture for gait204
analysis. Journal of Medical Engineering and Technology , 38 , 274–280.205
doi:10.3109/03091902.2014.909540.206
Schmitz, A., Ye, M., Shapiro, R., Yang, R., & Noehren, B. (2014). Ac-207
curacy and repeatability of joint angles measured using a single camera208
markerless motion capture system. Journal of Biomechanics, 47 , 587–591.209
doi:10.1016/j.jbiomech.2013.11.031.210
Seethapathi, N., Wang, S., Saluja, R., Blohm, G., & Kording, K. P. (2019).211
Movement science needs different pose tracking algorithms. arXiv preprint212
arXiv:1907.10226 , .213
Winter, D. A. (2009). Biomechanics and motor control of human movement .214
John Wiley & Sons.215
10
.CC-BY-NC-ND 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/842492doi: bioRxiv preprint
MAE =1
n
n∑i=1
|xm(i)− xo(i)| (1)
Table 1: The differences of corresponding joint positions estimated from the two differentmotion captures.
Walk1K Walk4K Jump1K Jump4K Throw1K Throw4K
X 28.4 24.6 27.1 20.9 21.7 19.3ShoulderR Y 17.0 49.7 12.5 11.9 27.7 32.5
Z 17.9 16.5 29.5 30.8 15.8 13.9X 4.32 6.96 9.36 8.25 47.3 45.2
ElbowR Y 37.0 66.8 27.0 20.5 28.7 35.1Z 21.7 22.2 26.9 35.3 38.0 38.9X 5.78 7.52 8.96 8.31 40.6 47.5
WristR Y 19.0 44.2 13.2 20.6 28.7 40.3Z 15.7 16.8 23.8 38.9 24.7 26.7X 9.65 7.67 8.77 6.01 29.5 25.0
HipR Y 21.3 49.4 15.0 14.2 13.5 20.8Z 24.4 20.6 31.0 32.1 27.2 23.5X 6.41 4.09 7.74 6.47 15.2 13.1
KneeR Y 25.9 48.2 8.48 18.3 13.8 19.1Z 10.1 11.4 14.8 20.9 24.4 20.4X 9.68 8.73 9.82 6.67 12.3 19.1
AnkleR Y 28.6 58.1 9.31 11.0 17.7 22.2Z 11.7 20.7 20.6 27.9 12.4 20.3
11
.CC-BY-NC-ND 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/842492doi: bioRxiv preprint
Figure 1: Positions of reflective markers attached on the landmarks of the body. Forty-eight markers were placed on the fingertip (right), third metacarpal, styloid process of ulna,styloid process of radius, humerus-medial epicondyle, humerus-lateal epicondyle, humerus-lesser tubercle, under the scapula-acromial angle, scapula-acromion, toe, first metatarsal, fifthmetatarsal, calcaneus, malleolus medialis, malleolus lateralis, knee medial side, knee lateralside, trochanter major, top of head, ear, upper margin of sternum, C7 vertebra, lowest edgeof rib, sternum-xiphoid process, T8/12 vertebra, anterior superior iliac spine and posteriorsuperior iliac spine.
12
.CC-BY-NC-ND 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/842492doi: bioRxiv preprint
Video Camera
Calibration Pole
Control Point
Measurement
Movement
Measurement
2D CoordinatesCamera Parameter
OpenPose
3D Global Coordinates
Camera �
Camera �
DLT transform・・・
�� �� �� ,��
Figure 2: Experimental setup and overview of the markerless motion capture.
13
.CC-BY-NC-ND 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/842492doi: bioRxiv preprint
Figure 3: Twenty-five keypoints (nose, neck, shoulder, elbow, wrist, hip, knee, an-kle, eye, ear, big-toe, small-toe, and heel) of human body tracked by OpenPose(CMU-Perceptual-Computing-Lab, 2017).
14
.CC-BY-NC-ND 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/842492doi: bioRxiv preprint
(a)
Ball T
hro
win
g(a2)(a1)
(a3) (a4)
(b)
Walk
ing
(b1) (b2)
(b3) (b4)
Figure 4: Examples of participant’s pose estimated by the marker-based motion capture(Mocap) and by the markerless motion capture using OpenPose (OpenPose) during a (a) ballthrowing task and (b) walking task. 15
.CC-BY-NC-ND 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/842492doi: bioRxiv preprint
X Y ZAnkle
(w
alk
4K)
Elb
ow
(th
row
1K)
Knee (
jum
p1K)
Ankle
(th
row
1K)
(a)
(b)
(c)
(d)
Figure 5: Time series profiles of joint positions estimated by the marker-based motion capture(Mocap) and by the markerless motion capture using OpenPose (OpenPose). The meanabsolute error (MAE) through analysis duration is shown in each panel.
16
.CC-BY-NC-ND 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/842492doi: bioRxiv preprint
Success Failure
(b)
Walk
ing
(a)
Ball T
hro
win
g
Figure 6: Examples of pose estimation using OpenPose during a (a) ball throwing task and(b) walking task. Left panels show the frame where the tracking of all joints was succeeded(to some extent), and right panels show the frame where the tracking of right arm joints wasmissed.
17
.CC-BY-NC-ND 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/842492doi: bioRxiv preprint