Evaluation of 3D markerless motion capture accuracy using OpenPose with multiple video cameras

Nobuyasu Nakano (a, b, ∗), Tetsuro Sakura (a), Kazuhiro Ueda (a), Leon Omura (a, b), Arata Kimura (a), Yoichi Iino (a), Senshi Fukashiro (a), Shinsuke Yoshioka (a)

(a) The University of Tokyo, Tokyo, Japan.
(b) Research Fellow of the Japan Society for the Promotion of Science, Tokyo, Japan.

∗Corresponding author. Email address: [email protected] (Nobuyasu Nakano)

Preprint submitted to Journal of Biomechanics, November 6, 2019

bioRxiv preprint doi: https://doi.org/10.1101/842492. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Abstract

There is a need within human movement sciences for a markerless motion capture system that is easy to use and sufficiently accurate to evaluate motor performance. This study aims to develop a 3D markerless motion capture technique, using OpenPose with multiple synchronized video cameras, and examine its accuracy in comparison with optical marker-based motion capture. Participants performed three motor tasks (walking, countermovement jumping, and ball throwing), with these movements measured using both marker-based optical motion capture and OpenPose-based markerless motion capture. The differences in corresponding joint positions, estimated from the two different methods throughout the analysis, were presented as a mean absolute error (MAE). The results demonstrated that, qualitatively, 3D pose estimation using markerless motion capture could correctly reproduce the movements of participants. Quantitatively, of all the mean absolute errors calculated, approximately 47% were less than 20 mm and 80% were less than 30 mm. However, 10% were greater than 40 mm. The primary reason for mean absolute errors exceeding 40 mm was that OpenPose failed to track the participant’s pose in 2D images, owing to errors such as recognizing an object as a human body segment or replacing one segment with another, depending on the image of each frame. In conclusion, this study demonstrates that, if an algorithm that corrects all apparently wrong tracking can be incorporated into the system, OpenPose-based markerless motion capture can be used for human movement science with an accuracy of 30 mm or less.

Keywords: OpenPose, markerless, motion capture

1. Introduction

Motion capture systems have been used extensively as a fundamental technology within biomechanics research. However, traditional marker-based approaches are subject to significant environmental constraints. For example, measurements are difficult to perform in environments wherein wearing markers during the activity is not ideal (such as sporting games). Markerless measurements without such environmental constraints can facilitate new insights into human movements (Mundermann et al., 2006); however, complex information processing technology is required to make an algorithm that recognizes human poses or skeletons from images. Therefore, many biomechanics researchers would benefit from a markerless motion capture system that is easy to use.

Recently, automatic human pose estimation using deep learning techniques has attracted attention amongst computer vision researchers. OpenPose is one of the most popular technologies (Cao et al., 2018) and is deemed easy to use for biomechanics researchers. It is open source software that automatically estimates human joint centers and skeletons from 2D RGB images, outputting the 2D coordinates in the images. Kinect is another easy-to-use markerless motion capture that has been used in many studies (Clark et al., 2012; Pfister et al., 2014; Gao et al., 2015; Schmitz et al., 2014). OpenPose, when compared to Kinect, has fewer constraints on the distance between the camera and the target to be measured, as well as on the sampling rate of video recording, because it can estimate the human pose from RGB images without using a depth sensor.

Seethapathi et al. (2019), who reviewed pose tracking studies from the perspective of movement science, pointed out that human pose tracking algorithms, such as OpenPose, did not prioritize the quantities that matter for movement science. It remains unclear whether the accuracy of OpenPose-based 3D markerless motion capture is appropriate for human movement studies such as sports biomechanics or clinical biomechanics. The aim of this study is to develop a 3D markerless motion capture using OpenPose with multiple synchronized video cameras, and then to assess its accuracy by comparison with an optical marker-based motion capture.

2. Materials and Methods

Participants. Two healthy male volunteers participated in this experiment. The mean age, height, and body mass of the participants were 22.0 years, 173.5 cm, and 69.5 kg, respectively. The participants provided written informed consent prior to the commencement of the study, and the experimental procedure used in this study was approved by the Ethics Committee of the university with which the authors were affiliated.

Overview of data collection. Participants performed three motor tasks in the following order: walking, countermovement jumping, and ball throwing. These movements were measured using both a marker-based optical motion capture and a video camera-based (markerless) motion capture. A light was used to synchronize the data obtained from all the video cameras and the two different measurement systems. All methods and instrumentation details are described in the following subsections.


Marker-based motion capture. Forty-eight reflective markers were attached to body landmarks (Figure 1). The coordinates of these reflective markers on the participants’ bodies were recorded using a 16-camera motion capture system (Motion Analysis Corp, Santa Rosa, CA, USA) at a sampling rate of 200 Hz. The elbow, wrist, knee, and ankle joint centers were assigned to the mid-points of the lateral and medial markers, while the shoulder joint centers were assigned to the mid-points of the anterior and posterior shoulder markers. The hip joint centers were estimated using the method described by Harrington et al. (2007). The raw kinematic data were smoothed using a zero-lag fourth-order Butterworth low-pass filter. The cut-off frequency of the filter was determined using a residual analysis (Winter, 2009). Data analysis was performed using MATLAB (v2019a, MathWorks, Inc., Natick, MA, USA).
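The smoothing step can be illustrated with a minimal Python sketch. It assumes that "zero-lag fourth-order" means a second-order Butterworth low-pass filter applied once forward and once backward, with a commonly used dual-pass correction of the cut-off; the paper does not state how the filter was implemented in MATLAB, so the function names, the seeding of the recursion with the first two samples, and the 6 Hz / 120 Hz values in the demonstration are illustrative assumptions only.

```python
import math


def butter2_lowpass(x, k):
    """One pass of a second-order Butterworth low-pass filter.

    k is the pre-warped cut-off, tan(pi * fc / fs), from the bilinear transform.
    """
    norm = 1.0 / (1.0 + math.sqrt(2.0) * k + k * k)
    b0 = k * k * norm
    b1 = 2.0 * b0
    b2 = b0
    a1 = 2.0 * (k * k - 1.0) * norm
    a2 = (1.0 - math.sqrt(2.0) * k + k * k) * norm
    # Assumption: seed the recursion with the first two raw samples.
    y = list(x[:2])
    for n in range(2, len(x)):
        y.append(b0 * x[n] + b1 * x[n - 1] + b2 * x[n - 2]
                 - a1 * y[n - 1] - a2 * y[n - 2])
    return y


def zero_lag_lowpass(x, fc, fs):
    """Forward-backward filtering: zero phase lag, fourth-order overall."""
    # Two passes lower the effective cut-off, so each pass uses a cut-off
    # raised by 1 / (sqrt(2) - 1) ** 0.25 (about 1 / 0.802).
    correction = (math.sqrt(2.0) - 1.0) ** 0.25
    k = math.tan(math.pi * fc / fs) / correction
    forward = butter2_lowpass(x, k)
    backward = butter2_lowpass(forward[::-1], k)
    return backward[::-1]


if __name__ == "__main__":
    fs = 120.0  # Hz, hypothetical sampling rate
    # A 1 Hz movement-like signal plus 30 Hz noise, values in mm.
    raw = [100.0 * math.sin(2 * math.pi * 1.0 * n / fs)
           + 5.0 * math.sin(2 * math.pi * 30.0 * n / fs) for n in range(240)]
    smooth = zero_lag_lowpass(raw, fc=6.0, fs=fs)
    print(round(raw[100], 2), round(smooth[100], 2))
```

Running the filter in both directions cancels the phase shift of a single pass, which is why the smoothed trajectory stays aligned in time with the raw one.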

Markerless motion capture. The experimental setup and overview of the markerless motion capture are shown in Figure 2. The markerless motion capture consisted of five video cameras (GZ-RY980, JVCKENWOOD Corp, Yokohama, Kanagawa, Japan). Two measurement conditions, i.e. combinations of video camera resolutions and sampling frequencies, were implemented: 1920 × 1080 pixels at 120 Hz (1K condition) and 3840 × 2160 pixels at 30 Hz (4K condition). OpenPose (version 1.4.0) was installed from GitHub (CMU-Perceptual-Computing-Lab, 2017) and run on a GPU (GEFORCE RTX 2080 Ti, Nvidia Corp, Santa Clara, CA, USA) under default settings. Twenty-five keypoints (Figure 3) of the participant’s body were output independently for each frame. The control points, at which 3D global coordinates could be identified, were measured using the video cameras with a calibration pole. The 2D video camera coordinates obtained from OpenPose were transformed to 3D global coordinates using a direct linear transformation (DLT) method (Miller et al., 1980). The raw kinematic data were smoothed using a zero-lag fourth-order Butterworth low-pass filter. The cut-off frequency of the filter was determined using residual analysis (Winter, 2009), which gave values of 5-8 Hz in the 1K condition and 2-3 Hz in the 4K condition.
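The 2D-to-3D step can be sketched as follows, assuming the common 11-parameter DLT formulation in which each camera maps a 3D point (X, Y, Z) to image coordinates (u, v) through u = (L1 X + L2 Y + L3 Z + L4) / (L9 X + L10 Y + L11 Z + 1) and v = (L5 X + L6 Y + L7 Z + L8) / (L9 X + L10 Y + L11 Z + 1). The paper does not give its implementation, so the function names and the two sets of camera parameters below are hypothetical; in practice the parameters would come from the control-point calibration.

```python
def project(params, point):
    """Project a 3D point with 11 DLT parameters (L1..L11) to (u, v)."""
    X, Y, Z = point
    w = params[8] * X + params[9] * Y + params[10] * Z + 1.0
    u = (params[0] * X + params[1] * Y + params[2] * Z + params[3]) / w
    v = (params[4] * X + params[5] * Y + params[6] * Z + params[7]) / w
    return u, v


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, rhs):
    """Solve a 3x3 linear system with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("camera geometry does not constrain the point")
    solution = []
    for k in range(3):
        mk = [[rhs[i] if j == k else m[i][j] for j in range(3)]
              for i in range(3)]
        solution.append(det3(mk) / d)
    return solution


def reconstruct_point(cameras, observations):
    """Least-squares 3D position from two or more calibrated views.

    Each view contributes two linear equations in (X, Y, Z); the stacked
    system is solved through its 3x3 normal equations.
    """
    ata = [[0.0] * 3 for _ in range(3)]
    atb = [0.0] * 3
    for L, (u, v) in zip(cameras, observations):
        rows = [
            ([L[0] - u * L[8], L[1] - u * L[9], L[2] - u * L[10]], u - L[3]),
            ([L[4] - v * L[8], L[5] - v * L[9], L[6] - v * L[10]], v - L[7]),
        ]
        for a, b in rows:
            for i in range(3):
                atb[i] += a[i] * b
                for j in range(3):
                    ata[i][j] += a[i] * a[j]
    return solve3(ata, atb)


# Hypothetical DLT parameters for two cameras (not taken from the paper).
CAM_A = (1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.1)
CAM_B = (0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.1, 0.0, 0.0)

if __name__ == "__main__":
    true_point = (1.0, 2.0, 3.0)
    views = [project(CAM_A, true_point), project(CAM_B, true_point)]
    print(reconstruct_point([CAM_A, CAM_B], views))
```

With more than two cameras the same normal equations simply accumulate more rows, so a keypoint that OpenPose places badly in one view pulls the least-squares estimate away from the true position rather than being rejected.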

Data analysis. The position data obtained using the marker-based motion capture were downsampled using the spline function so that the number of frames matched that obtained using the markerless motion capture. The analysis periods were defined for each individual motor task as follows: from the second step heel contact to the next heel contact of the same leg in the walking task, from the start of the squatting motion to the recovery of the initial upright stance in the jumping task, and from the toe-off on the opposite side of the throwing arm to the end of the arm-swing in the throwing task. The differences in the corresponding joint positions estimated from the two different motion captures were calculated throughout the analysis periods. Mean absolute error (MAE) was used as the indicator of the difference, as described by Equation 1, where x_m and x_o are the positions estimated by the marker-based and OpenPose-based approaches, respectively, and n is the number of frames in the analysis period.
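The comparison can be sketched in a few lines of Python. Two simplifications are assumed here: the resampling uses linear interpolation as a plain stand-in for the spline function used in the study, and the trajectories are synthetic. The function names are illustrative and not taken from the authors' MATLAB code.

```python
def resample_linear(values, fs_in, fs_out):
    """Resample a uniformly sampled signal to a new rate.

    Linear interpolation is used as a simplified stand-in for the
    spline-based downsampling described in the text.
    """
    if len(values) < 2:
        raise ValueError("need at least two samples")
    duration = (len(values) - 1) / fs_in
    n_out = int(duration * fs_out + 1e-9) + 1
    out = []
    for k in range(n_out):
        pos = k / fs_out * fs_in            # fractional index into the input
        i = min(int(pos), len(values) - 2)  # left neighbour, clamped at the end
        frac = pos - i
        out.append(values[i] * (1.0 - frac) + values[i + 1] * frac)
    return out


def mean_absolute_error(x_m, x_o):
    """Equation 1: mean of |x_m(i) - x_o(i)| over the n analysed frames."""
    if len(x_m) != len(x_o) or not x_m:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(x_m, x_o)) / len(x_m)


if __name__ == "__main__":
    # Synthetic one-axis trajectory in mm: 1 s of marker data at 200 Hz.
    marker_200hz = [5.0 * i for i in range(201)]
    marker_120hz = resample_linear(marker_200hz, 200.0, 120.0)
    # Synthetic markerless data at 120 Hz with a constant 10 mm offset.
    openpose_120hz = [1000.0 * k / 120.0 + 10.0 for k in range(121)]
    print(mean_absolute_error(marker_120hz, openpose_120hz))
```

The same function would be applied separately to the X, Y, and Z coordinates of each joint, which is how one value per joint, axis, task, and condition arises.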

3. Results

Examples of the 3D pose estimations obtained by the two different motion captures are depicted in Figure 4. In addition, video examples that show the participant’s pose during movements are provided as supplementary materials to this paper. The landmark positions tracked by OpenPose (Figure 3) do not necessarily correspond to the points estimated by the marker-based approach. The joint positions of the shoulder, elbow, wrist, hip, knee, and ankle tracked by OpenPose are considered to be approximately the same as those estimated by the marker-based approach. However, the positions of other landmarks tracked by OpenPose are considered to be different from those estimated by the marker-based approach. Representative time-series profiles of joint positions estimated by both the marker-based motion capture (Mocap) and the OpenPose-based markerless motion capture (OpenPose) can be seen in Figure 5. The mean absolute error (MAE) between the two profiles throughout the duration of analysis is shown in each panel. The shapes of the time-series profiles were found to be approximately the same.

Quantitatively, the MAEs of joint positions in Figure 5a and Figure 5b were less than 20 mm; however, the MAEs of joint positions in Figure 5c and Figure 5d were greater than 40 mm. The errors were particularly large at specific moments within the analysis duration (i.e. at 45% and 80% time in Figure 5c, described in Figure 4c). The MAEs of the corresponding joint positions, estimated from the two different motion captures for all trials, are presented in Table 1. Of the MAEs in Table 1, approximately 47% are less than 20 mm, 80% are less than 30 mm, and 10% are greater than 40 mm.
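These proportions can be checked directly against the 108 values in Table 1 (18 joint-axis rows by 6 task-condition columns). The short Python sketch below transcribes the table and counts the entries on each side of the thresholds; the variable and function names are illustrative.

```python
# MAE values (mm) transcribed from Table 1. Each row lists, in order:
# Walk1K, Walk4K, Jump1K, Jump4K, Throw1K, Throw4K.
TABLE_1 = {
    ("ShoulderR", "X"): [28.4, 24.6, 27.1, 20.9, 21.7, 19.3],
    ("ShoulderR", "Y"): [17.0, 49.7, 12.5, 11.9, 27.7, 32.5],
    ("ShoulderR", "Z"): [17.9, 16.5, 29.5, 30.8, 15.8, 13.9],
    ("ElbowR", "X"): [4.32, 6.96, 9.36, 8.25, 47.3, 45.2],
    ("ElbowR", "Y"): [37.0, 66.8, 27.0, 20.5, 28.7, 35.1],
    ("ElbowR", "Z"): [21.7, 22.2, 26.9, 35.3, 38.0, 38.9],
    ("WristR", "X"): [5.78, 7.52, 8.96, 8.31, 40.6, 47.5],
    ("WristR", "Y"): [19.0, 44.2, 13.2, 20.6, 28.7, 40.3],
    ("WristR", "Z"): [15.7, 16.8, 23.8, 38.9, 24.7, 26.7],
    ("HipR", "X"): [9.65, 7.67, 8.77, 6.01, 29.5, 25.0],
    ("HipR", "Y"): [21.3, 49.4, 15.0, 14.2, 13.5, 20.8],
    ("HipR", "Z"): [24.4, 20.6, 31.0, 32.1, 27.2, 23.5],
    ("KneeR", "X"): [6.41, 4.09, 7.74, 6.47, 15.2, 13.1],
    ("KneeR", "Y"): [25.9, 48.2, 8.48, 18.3, 13.8, 19.1],
    ("KneeR", "Z"): [10.1, 11.4, 14.8, 20.9, 24.4, 20.4],
    ("AnkleR", "X"): [9.68, 8.73, 9.82, 6.67, 12.3, 19.1],
    ("AnkleR", "Y"): [28.6, 58.1, 9.31, 11.0, 17.7, 22.2],
    ("AnkleR", "Z"): [11.7, 20.7, 20.6, 27.9, 12.4, 20.3],
}


def share(values, predicate):
    """Fraction of values for which the predicate holds."""
    return sum(1 for v in values if predicate(v)) / len(values)


all_maes = [v for row in TABLE_1.values() for v in row]
below_20 = share(all_maes, lambda v: v < 20.0)
below_30 = share(all_maes, lambda v: v < 30.0)
above_40 = share(all_maes, lambda v: v > 40.0)

if __name__ == "__main__":
    for label, value in [("< 20 mm", below_20),
                         ("< 30 mm", below_30),
                         ("> 40 mm", above_40)]:
        print(f"{label}: {100.0 * value:.1f}%")
```

Most of the entries above 40 mm fall in the Y direction of the Walk4K column and in the elbow and wrist rows of the throwing columns.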

The accuracy of the 3D pose estimation using the markerless motion capture depends on 2D pose tracking by OpenPose. Because the algorithm that tracks the human pose was applied to each frame of the video independently, a single trial can contain frames in which the participant’s pose was well tracked and frames in which it was not. Examples of pose estimation successes and failures using OpenPose are depicted in Figure 6.

4. Discussion

This study aimed to examine the accuracy of 3D markerless motion capture using OpenPose with multiple video cameras, through comparison with an optical marker-based motion capture. Qualitatively, 3D pose estimation using the markerless motion capture approach can correctly reproduce the movements of participants (Figure 4 and Supplementary movies). Approximately 80% of the MAEs calculated were less than 30 mm. This small error may be due to shortcomings in the OpenPose tracking precision. Because the majority of computer vision-based pose tracking algorithms, including OpenPose, are based on supervised learning using manually labeled data, it is inevitable that small errors in 3D pose are caused by inherent noise in the training data. Comparatively large MAEs exceeding 40 mm were observed in certain cases (Figure 5). Observing the estimated pose during movements reveals that when the correct joint positions (including noise) were estimated in the trial, an error of less than 30 mm existed (e.g. Figure 4 a1, a2, a4). However, when apparently incorrect joint positions were estimated, a comparatively large error of greater than 40 mm was observed (e.g. Figure 4 a3).

The primary reason for estimating apparently incorrect 3D positions is that OpenPose failed to track the participant’s pose in some individual frames, depending on the image (Figure 6). Because of these 2D tracking failures, correction is required to achieve a more accurate 3D pose estimation. Within this study, for example, the interchange of the left and right segments was retrospectively corrected because, without this correction, the 3D pose was completely different from the human shape. However, the recognition of an object as a human body segment (e.g. failures in Figure 6) was not corrected because it may have required manual tracking; moreover, the 3D pose accuracy could still be evaluated without this correction. Therefore, to use the OpenPose-based markerless motion capture in human movement science studies, it is considered necessary to incorporate algorithms that can correct all such tracking failures.
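The paper does not describe how the left-right interchange was corrected. As one possible heuristic, and not the authors' procedure, the Python sketch below swaps the left and right keypoints of a frame whenever the swapped assignment lies closer to the previous (already corrected) frame. The frame layout, a dictionary with hypothetical "left" and "right" 2D coordinates in pixels, is an assumption made for illustration.

```python
import math


def distance(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def fix_left_right_swaps(frames):
    """Undo frame-wise left/right interchanges of one keypoint pair.

    frames: list of dicts like {"left": (x, y), "right": (x, y)}.
    The first frame is trusted; each later frame is compared with the
    previous corrected frame, and the labels are swapped when that gives
    the smaller total displacement.
    """
    if not frames:
        return []
    fixed = [dict(frames[0])]
    for cur in frames[1:]:
        prev = fixed[-1]
        keep_cost = (distance(cur["left"], prev["left"])
                     + distance(cur["right"], prev["right"]))
        swap_cost = (distance(cur["right"], prev["left"])
                     + distance(cur["left"], prev["right"]))
        if swap_cost < keep_cost:
            fixed.append({"left": cur["right"], "right": cur["left"]})
        else:
            fixed.append(dict(cur))
    return fixed


if __name__ == "__main__":
    # Hypothetical knee keypoints; the second frame has its labels swapped.
    frames = [
        {"left": (0.0, 0.0), "right": (10.0, 0.0)},
        {"left": (10.5, 0.0), "right": (0.5, 0.0)},
        {"left": (1.0, 0.0), "right": (11.0, 0.0)},
    ]
    for frame in fix_left_right_swaps(frames):
        print(frame)
```

A continuity check of this kind cannot repair frames in which an object is recognized as a body segment, because no relabeling of the detected points recovers the true joint position in that case.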

Other sources of error may lie in data processing steps such as time synchronization. The error in the movement direction (i.e. Y direction for the walking task and Z direction for the jumping task), especially in the 30 Hz measurement (4K condition), tends to be large (Table 1). Within the time series profiles, the timing of synchronization appears to affect the error between the two motion captures (Figure 5d). However, because this is a problem caused by the process of comparing the two motion captures, the effect of this error on the accuracy of the markerless system measurement should be relatively small.

This study is preliminary work, and thus requires further examination. The accuracy of other biomechanical parameters, such as joint angle, joint angular velocity, and joint torque, needs to be investigated. Additionally, a program that is capable of correcting the OpenPose tracking failures described in this study, and that can thereby improve the accuracy of the 3D estimation, needs to be developed. While limitations still exist, the OpenPose-based markerless motion capture is expected to be applied in the future to sporting games that have been considered difficult to measure with marker-based motion capture.

Acknowledgements

This work was supported by JST-Mirai Program Grant Number JPMJMI18C7, Japan.

Conflict of interest

There are no conflicts of interest to declare.


Supplementary material

Supplementary videos are attached to this paper. The videos show the human pose during walking, jumping, and throwing obtained by the two different motion captures.

References

Cao, Z., Hidalgo, G., Simon, T., Wei, S.-E., & Sheikh, Y. (2018). OpenPose: realtime multi-person 2D pose estimation using part affinity fields. arXiv preprint arXiv:1812.08008.

Clark, R. A., Pua, Y. H., Fortin, K., Ritchie, C., Webster, K. E., Denehy, L., & Bryant, A. L. (2012). Validity of the Microsoft Kinect for assessment of postural control. Gait and Posture, 36, 372–377. doi:10.1016/j.gaitpost.2012.03.033.

CMU-Perceptual-Computing-Lab (2017). OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation. https://github.com/CMU-Perceptual-Computing-Lab/openpose.

Gao, Z., Yu, Y., Zhou, Y., & Du, S. (2015). Leveraging two Kinect sensors for accurate full-body motion capture. Sensors, 15, 24297–24317. doi:10.3390/s150924297.

Harrington, M. E., Zavatsky, A. B., Lawson, S. E. M., Yuan, Z., & Theologis, T. N. (2007). Prediction of the hip joint centre in adults, children, and patients with cerebral palsy based on magnetic resonance imaging. Journal of Biomechanics, 40, 595–602. doi:10.1016/j.jbiomech.2006.02.003.

Miller, N. R., Shapiro, R., & McLaughlin, T. M. (1980). A technique for obtaining spatial kinematic parameters of segments of biomechanical systems from cinematographic data. Journal of Biomechanics, 13, 535–547.

Mundermann, L., Corazza, S., & Andriacchi, T. P. (2006). The evolution of methods for the capture of human movement leading to markerless motion capture for biomechanical applications. Journal of NeuroEngineering and Rehabilitation, 3, 1–11. doi:10.1186/1743-0003-3-6.

Pfister, A., West, A. M., Bronner, S., & Noah, J. A. (2014). Comparative abilities of Microsoft Kinect and Vicon 3D motion capture for gait analysis. Journal of Medical Engineering and Technology, 38, 274–280. doi:10.3109/03091902.2014.909540.

Schmitz, A., Ye, M., Shapiro, R., Yang, R., & Noehren, B. (2014). Accuracy and repeatability of joint angles measured using a single camera markerless motion capture system. Journal of Biomechanics, 47, 587–591. doi:10.1016/j.jbiomech.2013.11.031.

Seethapathi, N., Wang, S., Saluja, R., Blohm, G., & Kording, K. P. (2019). Movement science needs different pose tracking algorithms. arXiv preprint arXiv:1907.10226.

Winter, D. A. (2009). Biomechanics and motor control of human movement. John Wiley & Sons.


\[ \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| x_m(i) - x_o(i) \right| \qquad (1) \]

Table 1: The differences (MAE, mm) of corresponding joint positions estimated from the two different motion captures.

Joint      Axis  Walk1K  Walk4K  Jump1K  Jump4K  Throw1K  Throw4K
ShoulderR  X     28.4    24.6    27.1    20.9    21.7     19.3
ShoulderR  Y     17.0    49.7    12.5    11.9    27.7     32.5
ShoulderR  Z     17.9    16.5    29.5    30.8    15.8     13.9
ElbowR     X     4.32    6.96    9.36    8.25    47.3     45.2
ElbowR     Y     37.0    66.8    27.0    20.5    28.7     35.1
ElbowR     Z     21.7    22.2    26.9    35.3    38.0     38.9
WristR     X     5.78    7.52    8.96    8.31    40.6     47.5
WristR     Y     19.0    44.2    13.2    20.6    28.7     40.3
WristR     Z     15.7    16.8    23.8    38.9    24.7     26.7
HipR       X     9.65    7.67    8.77    6.01    29.5     25.0
HipR       Y     21.3    49.4    15.0    14.2    13.5     20.8
HipR       Z     24.4    20.6    31.0    32.1    27.2     23.5
KneeR      X     6.41    4.09    7.74    6.47    15.2     13.1
KneeR      Y     25.9    48.2    8.48    18.3    13.8     19.1
KneeR      Z     10.1    11.4    14.8    20.9    24.4     20.4
AnkleR     X     9.68    8.73    9.82    6.67    12.3     19.1
AnkleR     Y     28.6    58.1    9.31    11.0    17.7     22.2
AnkleR     Z     11.7    20.7    20.6    27.9    12.4     20.3


Figure 1: Positions of reflective markers attached on the landmarks of the body. Forty-eight markers were placed on the fingertip (right), third metacarpal, styloid process of ulna, styloid process of radius, humerus-medial epicondyle, humerus-lateral epicondyle, humerus-lesser tubercle, under the scapula-acromial angle, scapula-acromion, toe, first metatarsal, fifth metatarsal, calcaneus, malleolus medialis, malleolus lateralis, knee medial side, knee lateral side, trochanter major, top of head, ear, upper margin of sternum, C7 vertebra, lowest edge of rib, sternum-xiphoid process, T8/12 vertebra, anterior superior iliac spine and posterior superior iliac spine.


[Figure 2 diagram labels: Video Camera; Calibration Pole; Control Point Measurement; Movement Measurement; Camera Parameter; 2D Coordinates (OpenPose); DLT transform; 3D Global Coordinates]

Figure 2: Experimental setup and overview of the markerless motion capture.


Figure 3: Twenty-five keypoints (nose, neck, shoulder, elbow, wrist, hip, knee, ankle, eye, ear, big-toe, small-toe, and heel) of human body tracked by OpenPose (CMU-Perceptual-Computing-Lab, 2017).


[Figure 4 panels: (a) Ball throwing, frames (a1) to (a4); (b) Walking, frames (b1) to (b4)]

Figure 4: Examples of participant’s pose estimated by the marker-based motion capture (Mocap) and by the markerless motion capture using OpenPose (OpenPose) during a (a) ball throwing task and (b) walking task.



[Figure 5 layout: columns X, Y, Z; row labels Ankle (walk4K), Elbow (throw1K), Knee (jump1K), Ankle (throw1K); panels (a) to (d)]

Figure 5: Time series profiles of joint positions estimated by the marker-based motion capture (Mocap) and by the markerless motion capture using OpenPose (OpenPose). The mean absolute error (MAE) throughout the analysis duration is shown in each panel.


[Figure 6 layout: columns Success, Failure; rows (a) Ball throwing, (b) Walking]

Figure 6: Examples of pose estimation using OpenPose during a (a) ball throwing task and (b) walking task. Left panels show frames where the tracking of all joints succeeded (to some extent), and right panels show frames where the tracking of the right arm joints failed.
