MSc Thesis
Ekundayo Olufemi A.
Contactless Measurement in Smart Environment for the Elderly People
Using Kinect v2 Sensor
School of Computer Science
International Master's Degree Programme in Information Technology
February 2018
Foreword
This thesis was done at the School of Computing, University of Eastern Finland during the autumn of 2017.
I want to extend my gratitude to my parents, friends, teachers, and especially my
supervisor Prof. Pekka Toivanen.
List of abbreviations
AAL Ambient Assisted Living
API Application Programming Interface
CMOS Complementary Metal–Oxide–Semiconductor
GDL Gesture Description Language
IR Infrared
IMU Inertial Measurement Unit
LIDAR Light Detection and Ranging
RFID Radio-frequency Identification
RGB Red Green Blue
SDK Software Development Kit
SEAL Smart Environment for Assisted Living
SIFT Scale Invariant Feature Transform
SSIM Structural Similarity Index Measure
TOF Time of Flight
Contents
1 Introduction to Kinect v2 Sensor ............................................................... 1
1.1 Evolution of Kinect Sensor ................................................................. 1
1.2 Technology of Kinect ......................................................................... 2
1.3 Kinect (2.0, 2013) – Designed for Xbox One ..................................... 3
1.4 Non-Commercial Kinect Designed for Microsoft Windows .............. 3
1.5 Kinect Versions from 1.5 to 1.8 .......................................................... 3
1.6 Kinect v2 ............................................................................................. 3
1.7 Significance of Kinect v2 in Assisted Living Facilities ..................... 4
1.8 Potential Use of Kinect v2 in Assisted Living .................................... 4
1.8.1 Different Spheres for Application of Kinect v2 .................. 5
2 Review of related literature ......................................................................... 6
2.1 Introduction ......................................................................................... 6
2.2 Use of Wireless Sensor Networks ...................................................... 6
2.3 Kinect v2 Depth Sensor ...................................................................... 6
2.4 Use in Karate Techniques ................................................................... 6
2.5 Advantages of v2 over v1 ................................................................... 7
2.6 Pose Estimation of Human Body Part Using Multiple Cameras ........ 8
2.7 An Innovative Hearing System Utilizing the Human Body ............... 8
2.8 Accuracy and Reliability of Optimum Distance in Kinect v2 ............ 9
2.9 Integration of Microsoft Kinect with Simulink ................................ 10
2.10 Utility and Usability of Kinect v2 and Leap Motion ........................ 12
2.11 A Depth-Based Fall Detection System Using a Kinect Sensor ........ 13
2.12 Experimental Studies on Human Body ............................................. 15
2.13 Body Movement Analysis and Recognition ..................................... 15
2.14 An Integrated Platform for Live 3-D Human Reconstruction .......... 18
2.15 Automated Training and Maintenance through Kinect .................... 19
2.16 Kinect in the Kitchen and other Practical Home Environments ....... 20
2.17 Kinect Gaming and Physiotherapy ................................................... 21
3 Research Methodology ............................................................................. 23
3.1 Introduction ....................................................................................... 23
3.2 Model of the research ....................................................................... 24
3.3 Research Design ............................................................................... 24
3.4 Primary Data ..................................................................................... 25
3.5 Summary ........................................................................................... 25
4 Data analysis and presentation .................................................................. 26
4.1 Introduction ....................................................................................... 26
4.2 Smart Home environments ............................................................... 27
4.3 Movement detection models ............................................................. 39
4.4 Skeletal Tracking systems ................................................................ 53
5 Findings and conclusion ........................................................................... 57
5.1 Findings ............................................................................................ 57
5.2 Conclusions ....................................................................................... 60
References ......................................................................................................... 62
Appendices
Appendix 1: Checklist (2 pages)
Table of Figures and Illustrations
Figure 1-Xbox 360, Kinect v1. Klesistern (2014) ....................................................... 1
Figure 2- CMOS sensor, Primesense. Journal of Sensors (2014) ................................ 2
Figure 3-Kinect sensor components. Journal of Sensors (2013) ................................. 4
Figure 4- GDL illustration. Teng et al. (2013) ............................................................ 7
Figure 5- Medical application. Lim et al. (2014) ......................................................... 9
Figure 6- Simulink Kinect. Joshua et al. (2014) ........................................................ 11
Figure 7- Leap Motion Sensor. Hughes et al. (2015) ................................................ 12
Figure 8- Motion Sensor illustration. Hughes et al. (2015) ....................................... 13
Figure 9- Fall detection illustration. Samuele et al. (2014) ....................................... 14
Figure 10- Movement analysis Glove. Yang et al. (2012) ......................................... 16
Figure 11- Humanoid robotics illustration. Clingal et al. (2014) .............................. 16
Figure 12- RGDB illustration. Immitrios et al. (2014) .............................................. 19
Figure 13- Smart Home System illustration. Berkley University Journal (2013) ...... 20
Figure 14- Pose Experiments, Kinect tests. (2013) .................................................. 23
Figure 15- Research design ........................................................................................ 25
Figure 16- Gradinaru (2016) graphical representation of system .............................. 26
Figure 17- Conceptual Framework of a smart home environment ............................ 28
Figure 18- Smart home environment layered description ......................................... 28
Figure 19- Smart home environment layout .............................................................. 30
Figure 20- Hondori et al (2013) system set up including inertia sensors and Kinect
sensors ........................................................................................................................ 31
Figure 21- Hondori et al (2013) 3-D trajectories ....................................................... 32
Figure 22- Hondori et al (2013) experimental data on body movements .................. 33
Figure 23- Hondori et al (2013) limb changes in task like drinking and eating ........ 34
Figure 24- Hondori et al (2013) inertia sensor data from individual’s items ............ 34
Figure 25- Mohamed et al (2013) smart house used in the experiment .................... 35
Figure 26- Mohamed et al (2013) Natural User Interface ......................................... 36
Figure 27- Mohamed et al (2013) Waist detection posture ....................................... 37
Figure 28- Mohamed et al (2013) Waist detection posture ....................................... 37
Figure 29- Mohamed et al (2013) Kinect procedure for gesture recognition ............ 38
Figure 30- Mohamed et al (2013) .............................................................................. 38
Figure 31- Mohamed et al (2013) Kinect toolbox recognition of circle gestures ...... 39
Figure 32- Chin et al (2013) Three Kinect sensors, IR light, RGB camera, IR detector ... 40
Figure 33- Chin et al (2013) Depth sensor distance .................................................. 40
Figure 34- Chin et al (2013) Depth frame bit pixel ................................................... 41
Figure 35- Chin et al (2013) Algorithm depth distance ............................................. 41
Figure 36- Chin et al (2013) Average depth distance vs Actual distance .................. 44
Figure 37- Chin et al (2013) Accuracy analysis AMPE vs Distance ......................... 44
Figure 38- Chin et al (2013) Precision analysis std vs Distance ............................... 45
Figure 39- Alexiadis et al (2017) 3-D Camera and sensor setup ................................. 47
Figure 40- Alexiadis et al (2017) Stages for the proposed model ............................. 48
Figure 41- Alexiadis et al (2017) Image quality reconstruction; Kinect data,
waterlight geometry and Poisson ............................................................................... 48
Figure 42- Tahavori et al (2013) Kinect for Xbox vs Windows ................................ 49
Figure 43- Sengupta and Ohya (1996) Two staged pose estimation illustration ....... 51
Figure 44- Sengupta and Ohya (1996) back projection method estimation .............. 51
Figure 45- Sengupta and Ohya (1996) images used for the experiment ................... 52
Figure 46- Sengupta and Ohya (1996) extracted silhouette images .......................... 52
Figure 47- Sengupta and Ohya (1996) rendered images from the parameter set ...... 53
Figure 48- Sengupta and Ohya (1996) rendered images of the transferred model .... 53
Figure 49- Tao et al (2013) constant camera error .................................................... 54
Figure 50- Tao et al (2013) variable camera error ..................................................... 55
Figure 51- Choe et al (2014) invariability of IR and RGB images under different lighting conditions ..................................................................................................... 56
Figure 52- Choe et al (2014) Data capturing system, used to obtain the base mesh . 56
Figure 53- Choe et al (2014) input shading image, projected mesh and depth map . 56
1 Introduction to Kinect v2 Sensor
Kinect, initially code-named Project Natal during development, is a series of motion-sensing input devices developed by Microsoft for its video game consoles, the Xbox 360 and later the Xbox One. The device uses gestures and spoken commands to give users a natural interface for interacting with the console or a computer (Lange, 2011). Kinect was developed to broaden the audience of the Xbox 360, and ahead of its 2010 launch it was rumored that it would ship alongside a new Xbox 360 console [1]. Microsoft dismissed these reports, stating at the time that the Xbox 360 platform would last until 2015. Following the announcement, demonstrations were staged to show the stability of the device: at the 2009 Tokyo Game Show, games including Beautiful Katamari and Space Invaders Extreme were shown running on Kinect (Stowers, 2011). It was initially planned that the sensor unit would be accompanied by a dedicated microprocessor for operations such as skeletal mapping; this was later dropped, and the work was instead assigned to the console's own processor cores. Research by Stowers (2011) further showed that Kinect used only 10–15% of the console's computing resources. In the same timeframe, the development of Kinect-like gadgets became a trend.
Figure 1-Xbox 360, Kinect v1. Klesistern (2014)
1.1 Evolution of Kinect Sensor
Kinect became the official name of the device after the "World Premiere 'Project Natal' for Xbox 360 Experience" event of 2010; the word is a blend of kinetic and connect. The initiative was considered important, and Microsoft initially set the launch date as November 2010 [3], although this slipped as the project faced delays. A redesigned Xbox 360, announced later, shipped from mid-2010 with a dedicated connector port ready for Kinect.
At the time of Kinect's release, many companies were working in collaboration with Microsoft to explore its possibilities, applications, and compatibility with other gadgets. Villaroman (2011) argued that, because of its immense appeal and attention, Microsoft announced it would launch a commercial version along with a Software Development Kit (SDK) for these companies [1]. Microsoft eventually released the Kinect for Windows SDK, the commercial version of Kinect, and different companies went on to build a variety of applications on top of it.
1.2 Technology of Kinect
Kinect v1 was a combination of hardware and software whose depth-sensing technology was developed by the Israeli company PrimeSense. Kinect v1 generated a 3-D view of an object through a combination of gadgetry including an RGB camera, an infrared projector, and a microchip specially designed for the purpose. 3-D reconstruction of the scene was performed by a structured-light scanner system called Light Coding. To capture video data in 3-D regardless of lighting conditions, the depth sensor pairs the infrared projector with a monochrome Complementary Metal–Oxide–Semiconductor (CMOS) sensor. The depth sensor was an innovative addition that fitted well with most applications: the Kinect software can automatically calibrate for the player's physical environment and gameplay, taking into account the presence of furniture or other obstacles, and can also adjust the sensing range of the depth sensor.
Figure 2- CMOS sensor, Primesense. Journal of Sensors (2014)
The developer, PrimeSense, clarified that the number of people the software can track is restricted only by how many fit in the camera's field of view. According to Microsoft, the software can track up to six players simultaneously, with up to 20 joints tracked per skeleton. The key features regarded as the success of Kinect were its voice recognition, facial recognition, and most importantly, gesture recognition.
1.3 Kinect (2.0, 2013) – Designed for Xbox One
Kinect 2.0 was released in November 2013. The old PrimeSense technology was replaced by Microsoft's own time-of-flight sensor. According to analysts such as Azzari (2013), this design uses a time-of-flight camera and can process 2 GB of data per second. It has three times the accuracy of its predecessor, tracks with the help of an infrared (IR) sensor, and can track six skeletons at a time. Kinect v2 also came with improved video communication and applications specifically developed for video analytics, and its accompanying microphone is used for voice commands.
1.4 Non-Commercial Kinect Designed for Microsoft Windows
In February 2012, Microsoft released a new version with Windows 7-compatible PC drivers. This version let developers build applications using C++, C#, and Visual Basic, and gave access to low-level streams from the depth sensor and other sensors. Almost 50 companies worked with Microsoft on this Kinect release (Chang, 2012). The enhanced capabilities covered skeletal tracking and advanced audio: skeletal tracking allowed gesture-driven applications to track people, while the audio capabilities were integrated with the Windows speech recognition application programming interface (API).
1.5 Kinect Versions from 1.5 to 1.8
Kinect for Windows v1.5, including the new Kinect Studio application, was released in 2012 and launched in 19 countries. Kinect Studio allowed developers to record, play back, and debug clips of users interacting with their applications. This version also introduced seated skeletal tracking, which tracks the head, neck, and arms of the person using Kinect. Versions 1.6 to 1.8 brought further minor improvements.
1.6 Kinect v2
Kinect v2 for Windows was released in 2014 and was designed on the same technology as the Kinect for Xbox One.
Figure 3-Kinect sensor components. Journal of Sensors (2013)
1.7 Significance of Kinect v2 in Assisted Living Facilities
According to Biswass (2011), Kinect v2 is an advanced motion sensor capable of measuring 3-D motion of a person. The Microsoft-made Kinect for Windows SDK provides the application programming interface through which software interacts with the Kinect hardware.
An assisted living residence is for people with a disability, or people of advanced age, who cannot or have chosen not to live independently (El-laithy, 2012). With scientific developments in this field, the recent past has seen a transformation from 'care as a service' to 'care as a business'. Assisted living has evolved into a huge industry: a 2012 survey of US facilities counted 22,500 of them. These can be standalone services or part of a multi-level senior living community. The Kinect v2 sensor has emerged as a potential contributor to improving the standards of assisted living. The relevant features of v2 are an enhanced field of view, improved picture resolution, enhanced skeletal tracking, and recognition of joints.
1.8 Potential Use of Kinect v2 in Assisted Living
Most researchers, such as Stowers (2014), agree that Kinect v2 can potentially contribute to many more domains that raise the standard of assisted living. It can be used in building smart home environments, in detecting driver fatigue with multi-sensor signal-based methods, and in modelling human body movement using the twin-cylinder method. Kinect v2 can also provide a platform for live 3-D human reconstruction and motion capture. It can help monitor patients during external beam radiotherapy and assist in the recognition of karate techniques and similar domains.
In rehabilitation, it can perform skeletal tracking in virtual reality rehabilitation systems. Kinect v2 has also been widely used for the geometry refinements required in motion fields, and for human body tracking based on the discrete wavelet transform (DWT). Moreover, it can be used in shadow detection and classification, and in estimating the movements of human body parts and propagation along them (Rowe, 2011).
1.8.1 Different Spheres for Application of Kinect v2 Sensor
This thesis capitalizes on the potential of Kinect v2 with regard to assisted living. Kinect v2 can facilitate life in assisted living environments for elderly people and support the treatment of illnesses. People can perform their routine exercises in view of the Kinect sensor, which can analyze the movements, detect mistakes, and pass on corrective instructions; this can provide much-needed motivation for elderly people to exercise regularly. Another innovation based on the v2 sensor in assisted living is a hearing system that uses the human body as a transmission medium, with the Kinect v2 sensor taking over the role of the conventional sound transmitter and transmission line.
Kinect v2 can also help in the treatment of Parkinson's disease. It can accurately measure clinically relevant movements such as hand clasping and finger tapping, and relative improvement or worsening of these movements over time can likewise be measured with Kinect v2.
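The exercise-monitoring idea above can be sketched numerically. Kinect v2 reports each tracked joint as a 3-D coordinate; a minimal, hypothetical feedback rule might compute the elbow angle from the shoulder–elbow–wrist triple and check that a repetition reaches both the flexed and extended positions. The thresholds and function names below are illustrative, not taken from any Kinect SDK:

```python
import math

def joint_angle(a, b, c):
    """Angle at joint b (degrees) formed by 3-D points a-b-c,
    e.g. shoulder-elbow-wrist for an elbow-flexion exercise."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    cos_t = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos_t))

def check_repetition(angle_deg, lo=40.0, hi=160.0):
    """Hypothetical feedback rule: a full repetition should pass
    below `lo` (flexed) and above `hi` (extended)."""
    if angle_deg < lo:
        return "flexed"
    if angle_deg > hi:
        return "extended"
    return "in between"
```

A monitoring loop would evaluate `joint_angle` on every skeleton frame and prompt the user whenever a repetition stalls between the two thresholds.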
2 Review of related literature
2.1 Introduction
A variety of studies have reviewed the efficacy of the Kinect v2 sensor, and researchers have gone on to recommend uses and applications. However, work applying Kinect v2 to assisted living is rarely found. Selected studies are reviewed below.
2.2 Use of Wireless Sensor Networks
Hemant and Ghayvat (2013) analyzed, proposed, and implemented a Wireless Sensor Network (WSN) based smart home for assisted living. According to them, WSNs today form the backbone of many systems, and smart home systems that provide assisted living to patients already use them. The researchers designed a protocol for providing smart homes for assisted living and described its implementation in an old home built specifically to test a wireless sensor network. The protocol targets event- and communication-based operation and provides smart home solutions. However, sensors alone were not found to be enough: intelligent sampling and control algorithms must be designed according to sensor type and structure.
2.3 Kinect v2 Depth Sensor
Lin and Longyu (2013) extensively described the use of the Kinect depth sensor since its launch. Although Microsoft released a new version with improved hardware, in their view its accuracy still needed testing. They performed experiments to check the Kinect v2 depth sensor and its accuracy, observed some variation in its depth measurements, and proposed a tolerance method to enhance accuracy when evaluating depth. [2]
2.4 Use in Karate Techniques
Marek and Tomasz (2010) compared the effectiveness of Kinect v1 and Kinect v2 for the recognition of Oyama karate techniques. The purpose of the study was to evaluate how well each sensor recognizes the actions of these techniques. Initially, multimedia cameras were popular, and cheap, for personal computers and game consoles; Kinect has since given the concept a far wider array of uses, and its application to human-computer interaction gave it a new dimension.
According to their research, Kinect can be used in medicine, in education, and for controlling robotic arms. Kinect v2 has emerged as one of the best intelligent home solutions and has much potential yet to be explored and fully utilized. Postural segmentation and assessment of postural control capabilities are the most common approaches in use. A classification method makes gesture recognition possible: to perform tracking and generate motion capture data, Kinect sensor data is preprocessed by the Kinect libraries. Kinect v2 thus enhances the capabilities of its predecessor.
2.5 Advantages of v2 over v1
In Kinect v2, the Gesture Description Language (GDL) has been used as a classification algorithm. The data was recorded from two professional belt instructors, collecting 200 movement samples per person. The data was divided into training and evaluation sections and thoroughly assessed. Taking stock of the recognition rates of the GDL classifier and the error classification cases, Kinect v2 proved more reliable than Kinect v1; its major advantage over Kinect v1 was the accurate calculation of leg joint positions. [3]
Figure 4- GDL illustration. Teng et al. (2013)
A different study, conducted at the University of North Carolina at Chapel Hill, illustrated the functions and classification of Kinect shadow detection. The research shows that Kinect depth maps often contain holes, missing data, or similar gaps. The authors advocate a different idea: turning the holes themselves into useful information (Teng and Hui, 2014). They proposed different types of shadows based on the local patterns implied by scene geometry, after which the shadow information can be fully used. [4]
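The hole-as-information idea can be illustrated with a minimal sketch. Kinect depth frames commonly encode pixels with no valid measurement as 0; a first processing step is simply to locate those pixels and measure how much of the frame they cover. This is an illustrative fragment, not the classification method of Teng and Hui (2014):

```python
import numpy as np

def find_holes(depth, invalid=0):
    """Boolean mask of 'hole' pixels (no depth reading).

    Kinect depth frames commonly report 0 where no valid
    measurement exists (IR shadow, absorption, range limits)."""
    return depth == invalid

def hole_fraction(depth, invalid=0):
    """Fraction of the frame with missing depth - a rough proxy
    for how much shadow/occlusion information is available."""
    return float(find_holes(depth, invalid).mean())
```

A shadow classifier in the spirit of the cited work would then examine the local geometry around each hole region rather than discarding it.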
2.6 Pose Estimation of Human Body Part Using Multiple Cameras
There is a large body of existing research on estimating pose from multiple 2-D and 3-D images, using the objects themselves as a starting point (Kuntal and Jun, 2014). In this research, an approximate 3-D volume is obtained by projecting the silhouettes observed in the images. The authors note that existing means of communication, such as video conferencing systems, have limitations when users are far apart; one proposed remedy is to create a feeling of co-location between the participants [2], which they tackle by modelling the shared space in 3-D. The paper gives a worked example of pose estimation for a human body part, with pose parameters explored by random selection. The authors conducted experiments using a CAD model of a human head, captured by four cameras placed at equal distances along a semi-circle. An arbitrary pose estimation algorithm is difficult to extend to this application, and the silhouette edges for the experiments were separated manually. Three randomly chosen points in the volume are taken, together with every fifth point on the edge of the silhouette. The results were initially poor but improved later, and the algorithms developed can readily be reused in future work [3].
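The silhouette-based evaluation step can be sketched as follows: given a binary silhouette rendered from the model under a candidate pose and the observed silhouette, score the candidate by their overlap. The intersection-over-union criterion below is a common generic choice, not necessarily the one used by the authors:

```python
import numpy as np

def silhouette_iou(rendered, observed):
    """Intersection-over-union between two binary silhouette masks.

    A pose-estimation loop would render the 3-D model under each
    candidate pose and keep the pose maximizing this score."""
    r = rendered.astype(bool)
    o = observed.astype(bool)
    union = np.logical_or(r, o).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(r, o).sum() / union)
```

With randomly sampled pose parameters, as in the cited experiments, the loop simply retains the best-scoring candidate seen so far.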
2.7 An Innovative Hearing System Utilizing the Human Body as a
Transmission Medium
Some researchers have proposed an innovative hearing system that uses the human body as the transmission medium (Son and Kwang, 2013). The concept replaces the sound transmitter with the human body itself. Audible sound is generated by self-demodulation: when two ultrasonic waves pass through a non-linear medium, the self-demodulation effect produces an audio signal at the difference of their frequencies. In this scheme a user can hear sound without a conventional transmitter, and without radiating audible noise. The authors thus present a concept for wireless sound transmission; using ultrasound also reduces distortions in the propagation process [19]. The paper successfully establishes the human body as a transmission medium for the proposed system.
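The difference-frequency mechanism can be demonstrated numerically. Passing the sum of two ultrasonic tones through a quadratic non-linearity (a simple stand-in for the medium's non-linear response) produces a component at the difference of their frequencies; the carrier frequencies below are illustrative only:

```python
import numpy as np

fs = 400_000                     # sample rate, Hz (illustrative)
f1, f2 = 40_000.0, 41_000.0      # two ultrasonic carriers, Hz
t = np.arange(20_000) / fs       # 50 ms of signal

s = np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t)
demod = s ** 2                   # quadratic non-linearity (self-demodulation model)

# Locate the strongest spectral component in the audible band.
spec = np.abs(np.fft.rfft(demod))
freqs = np.fft.rfftfreq(len(demod), 1 / fs)
band = (freqs > 100) & (freqs < 20_000)
audible_peak = freqs[band][np.argmax(spec[band])]
# audible_peak lands at |f1 - f2| = 1 kHz; the carriers themselves
# and the sum/harmonic terms (80-82 kHz) fall outside the audible band.
```

Only the 1 kHz difference component survives in the audible band, which is exactly why a pair of inaudible ultrasonic carriers can deliver audible sound through a non-linear medium such as body tissue.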
Figure 5- Medical application. Lim et al. (2014)
2.8 Accuracy and Reliability of Optimum Distance for High-Performance Kinect Sensor
Lim and Shafriza (2013) analyzed the sensor from a different, performance-oriented perspective. In a depth camera, each pixel represents a distance that corresponds directly to some point in the physical world [20]. Biomedical application is one of the successful uses of the Microsoft Kinect sensor, as it provides the tools needed to measure volume, length, and other quantities. Such range-camera technologies, including time-of-flight (TOF) cameras and the Microsoft Kinect sensor, have become popular over time and are applicable in the biomedical field. The working principle of a TOF camera is to emit modulated light onto the scene [17]; the reflected light is measured against a reference signal, and correlating it with the modulated light yields depth information. The Kinect sensor uses a different technique, combining an infrared structured-light projector with a CMOS camera to compute the depth of the scene. 3-D technologies have now come to market built around depth cameras and the Kinect sensor, and a primary aim of Kinect development is its use in biomedical applications; by its specifications, the Kinect sensor behaves much like a camera [14]. The authors focused on whether the Kinect sensor can provide accurate and reliable depth values that match actual distance, an analysis of great importance for the sensor's accuracy. The depth array computed by the researchers had a precision of up to 11 bits, so the depth measurements of the Kinect sensor are expected to be a non-linear function of distance [18]. The research also covered the default range and near range of distances from the Kinect sensor.
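The TOF principle mentioned above admits a simple closed form: a continuous-wave TOF camera measures the phase shift Δφ between the emitted and reflected modulated light, and since the light covers the distance twice, depth follows as d = c·Δφ/(4π·f_mod). The modulation frequency below is illustrative (Kinect v2 is reported to use several modulation frequencies in the tens of MHz):

```python
import math

C = 299_792_458.0  # speed of light, m/s

def tof_depth(phase_shift_rad, f_mod_hz):
    """Depth (m) from a continuous-wave time-of-flight measurement.

    The light travels to the target and back, so the round trip
    covers 2*d; a phase shift of 2*pi corresponds to one full
    modulation period: 2*d = C * phase / (2*pi*f)."""
    return C * phase_shift_rad / (4 * math.pi * f_mod_hz)

def ambiguity_range(f_mod_hz):
    """Maximum unambiguous depth: beyond C / (2*f) the phase
    wraps around and the measurement aliases."""
    return C / (2 * f_mod_hz)
```

For example, at a 15 MHz modulation frequency a phase shift of π corresponds to roughly 5 m of depth, and the unambiguous range is about 10 m; higher modulation frequencies trade range for finer depth resolution, which is one reason multi-frequency schemes are used.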
The authors then investigated the depth data of the Kinect sensor, carrying out a reliability analysis of the sensor's specification as claimed by Microsoft; this provided insight into the authenticity of the data. Experiments with these sensors have shown that the error in depth measurements grows as the distance to the sensor increases, with variations ranging from a few mm up to 40 mm [15]. The Kuder-Richardson formula was used for the reliability calculations. The study proved very useful, as it provided a methodology for 3-D pose estimation in human motion applications based on accurate, precise, and reliable depth distances.
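The text names the Kuder-Richardson formula without further detail; for binary (0/1) items the usual KR-20 form is k/(k−1)·(1 − Σpq/σ²). The sketch below is a generic implementation of that formula, not necessarily the authors' exact procedure:

```python
import numpy as np

def kr20(items):
    """Kuder-Richardson formula 20 for a persons x items 0/1 matrix.

    KR-20 = k/(k-1) * (1 - sum(p*q) / var(total scores)),
    where p is each item's proportion of 1s and q = 1 - p."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    p = items.mean(axis=0)          # per-item proportion correct
    q = 1.0 - p
    total_var = items.sum(axis=1).var()  # population variance of totals
    return (k / (k - 1)) * (1.0 - (p * q).sum() / total_var)
```

Values near 1 indicate internally consistent (reliable) measurements, which is the sense in which the cited study uses the statistic for the sensor's repeated depth readings.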
2.9 Integration of Microsoft Kinect with Simulink: Real-Time Object Tracking Example
Microsoft Kinect has great potential in system applications because it introduced low-cost, high-resolution 3-D sensing (Joshua and Tyler, 2015). The purpose of their study was to develop a Kinect block for Simulink, providing access to the depth image stream and the sensor cameras. The available Kinect drivers expose only a C-language interface, which is an impediment to Kinect application development; the new block makes it possible to incorporate Kinect without difficulty into Simulink-based image processing. The study also addressed implementation issues, one being sensor calibration and another being a demonstration of the Kinect block's utility through a 3-D object tracking example [9].
Figure 6- Simulink Kinect. Joshua et al. (2014)
Detecting both moving and stationary obstacles depends largely on a system's ability to navigate in uncertain conditions. Among the available sensors, sonar is a low-cost option but is prone to false echoes and reflections because of its poor angular resolution. Infrared and laser range finders are also inexpensive, but their weakness is that they provide measurements from only one point in the scene. Radar and Light Detection and Ranging (LIDAR) systems can provide precise measurements with good angular resolution [14]; however, they too have weaknesses, the most important being high power consumption and high cost. This situation, together with the arrival of low-cost digital cameras, has produced interest in vision-based setups for autonomous vehicles, although estimating distance then requires stereoscopic cameras. The release of the Microsoft Kinect addresses this issue by providing both a camera image and a depth image. The Kinect was aimed primarily at the entertainment market, but its powerful capabilities have made it popular in the sensing and robotics community, with applications in human-robot interaction, 3-D virtual environment construction, medicine, and robot tracking and sensing. Most Kinect applications are coded in C [15]. In industry and academia, image-processing tools are now commonplace, and even inexperienced users can apply them, targeting hardware implementations through automatic code generation. Simulink provides a widely accepted environment for designing image-processing algorithms as well: in the automotive industry, for example, Simulink-based code-generation tools translate a final design into real-time executable code for the target hardware. Such tools are also useful in education, as they let students concentrate on the main concepts rather than low-level details. The contributions of that study fall into three areas: an interface allowing the Kinect to be used in refined Simulink designs, which makes it accessible to more users; Linux-based targets for mobile autonomous robots; and real-time parallel streaming of the Kinect's camera and depth images.
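Since the Kinect supplies a depth image registered with the camera image, even a minimal obstacle check can threshold the depth frame directly. The following Python sketch illustrates the idea on a synthetic frame; the function name, range limit and pixel threshold are assumptions chosen for illustration, not code from the cited work:

```python
import numpy as np

def detect_obstacles(depth_mm, max_range_mm=1500, min_pixels=50):
    """Flag a potential obstacle from a single depth frame.

    depth_mm: 2-D array of per-pixel depth readings in millimetres
    (0 marks invalid pixels, as in raw Kinect depth frames).
    Returns True if enough valid pixels fall inside the danger range.
    """
    valid = depth_mm > 0                       # discard invalid readings
    near = valid & (depth_mm < max_range_mm)   # pixels inside danger zone
    return int(near.sum()) >= min_pixels

# Synthetic 64x64 depth frame: background at 4 m, a 10x10 object at 1 m.
frame = np.full((64, 64), 4000, dtype=np.uint16)
frame[20:30, 20:30] = 1000
print(detect_obstacles(frame))  # True: 100 near pixels >= 50
```

A real system would run this per frame on the live depth stream and combine it with the camera image for classification; the sketch only shows why a depth image makes the range check trivial compared with a single-point range finder.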
2.10 Comparing the utility and usability of the Microsoft Kinect and
Leap Motion sensor devices in the context of their application
for gesture control of biomedical images
A study conducted by Hughes and Nextorov (2014) investigated the interaction with medical images in the operating room, where asepsis must be maintained. This requirement has led to a cumbersome arrangement in which scrubbed clinicians must direct non-scrubbed personnel to operate the mouse and keyboard [16]. The Microsoft Kinect or the Leap Motion could instead give clinicians direct control of medical image navigation and manipulation.
Figure 7- Leap Motion Sensor. Hughes et al. (2015)
The authors noted that several studies had already examined the use of the Leap Motion and the Microsoft Kinect in the operating room, but that no study had compared the two sensors directly. Their study therefore compared the usability, utility, accuracy and acceptance of the two motion sensors. Forty-two people participated; 30 % were diagnostic radiologists and 70 % were surgeons or interventional radiologists. All participants had good computer skills but limited gaming experience. In the utility analysis, 50 % of participants rated the Microsoft Kinect v2 as very useful for their routine practice, compared with 38 % for the Leap Motion. Among surgeons and interventional radiologists, 54 % rated the Kinect as useful [13]. Younger participants found the Leap Motion interface more useful than older participants did, and for 37.5 % of participants the perception of the Leap Motion deteriorated after they had used the Kinect. System acceptability was higher for the Kinect than for the Leap Motion. With respect to utility and usability the Kinect was rated better, whereas the Leap Motion was found to be more accurate. The Kinect was more acceptable to users, although it was physically more tiring. More than half of the surgeons and interventional radiologists rated the Kinect v2 as very useful, and in this study vascular and orthopedic surgeons found the sensors the most useful. The measurement accuracy was not of a good standard, which can be attributed to several factors, including the system's field of view.
With the Leap Motion, the user had to place the cursor at the start or end point of the anatomical structure and keep the hand stable until the selection indicator appeared [5], so more time was needed before a measurement point could be selected. The Kinect proved better in this respect, as it took less time. In some cases a participant moved the hand before selecting the end point, so the measurement command completed prematurely. A few gestures were initially available for both sensors, but these were later disabled and replaced by a discrete input, or click. Because the measurement command required four seconds at both the start and the end point, both sensors were slower than the average time. In terms of time to task completion, prior studies have shown that with adequate practice motion sensors can outperform the mouse. The fastest participant times were 6.38 s for the Leap Motion and 7.54 s for the Microsoft Kinect v2, both lower than the overall average time to indicate and measure [11].
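The dwell-based selection described above, holding the cursor still until a selection indicator appears, can be sketched as a simple timing check over cursor samples. The function, the radius and the sample data are illustrative assumptions; only the four-second hold comes from the description above:

```python
import math

def dwell_select(samples, radius=10.0, dwell_s=4.0):
    """Return the time at which a dwell selection fires, or None.

    samples: list of (t_seconds, x, y) cursor positions.
    A selection fires once the cursor has stayed within `radius`
    pixels of an anchor point for `dwell_s` seconds.
    """
    anchor, anchor_t = None, None
    for t, x, y in samples:
        if anchor is None or math.hypot(x - anchor[0], y - anchor[1]) > radius:
            anchor, anchor_t = (x, y), t      # hand moved: restart the dwell
        elif t - anchor_t >= dwell_s:
            return t                          # held still long enough
    return None

# Hand steadies at t = 1 s, so the selection fires four seconds later.
steady = [(0.0, 50, 50), (1.0, 100, 100)] + [(1.0 + i, 101, 100) for i in range(1, 6)]
print(dwell_select(steady))  # 5.0
```

The sketch also shows why premature hand movement truncates a measurement: any move beyond the radius restarts the dwell timer, exactly the failure mode the participants experienced.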
Figure 8- Motion Sensor illustration. Hughes et al. (2015)
System usability influences how useful surgeons find a system: the relationship between use and utility shows that poor usability leads to poor utility. The study found that the Leap Motion could not match the Kinect v2, although younger doctors were more comfortable with the Leap Motion than with the Kinect [9].
2.11 A Depth-Based Fall Detection System Using a Kinect Sensor
Researchers have also tested Kinect sensors in fall detection systems. Samuele and Enea (2014), for instance, proposed an automatic, privacy-preserving fall detection system based on the Microsoft Kinect. The raw depth data provided by the sensor is analyzed by an ad-hoc algorithm that classifies all the blobs in the scene. Whenever a person is identified, a tracking algorithm follows that person across frames. Using the depth frame makes it possible to extract the human body even when it is interacting with other objects, such as a wall or a tree, and an inter-frame processing algorithm efficiently solves the problem of blob fusion [14]. A fall is detected when the depth blob associated with a person is near the floor.
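The detection rule, a person blob staying near the floor, can be sketched as a simple per-frame check. The threshold, frame count and function below are assumed values for illustration; the cited system's actual algorithm is more elaborate:

```python
def detect_fall(blob_heights_m, floor_threshold_m=0.4, min_frames=5):
    """Detect a fall from per-frame heights of a tracked person blob.

    blob_heights_m: height of the blob's centroid above the floor,
    one value per depth frame. A fall is reported when the centroid
    stays below floor_threshold_m for min_frames consecutive frames,
    which filters out brief dips such as a quick bend.
    """
    run = 0
    for h in blob_heights_m:
        run = run + 1 if h < floor_threshold_m else 0
        if run >= min_frames:
            return True
    return False

# Person standing (~1.7 m), then lying near the floor (~0.2 m).
heights = [1.7] * 10 + [0.2] * 6
print(detect_fall(heights))  # True
```

The consecutive-frame requirement stands in for the temporal tracking the real system performs between frames; a single low reading is never enough to raise an alarm.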
Figure 9- Fall detection illustration. Samuele et al. (2014)
The study proposed a method of automatic fall detection using a Kinect sensor in a top-view configuration. Without relying on wearable sensors, and by exploiting privacy-preserving depth data only, the approach can detect a fall event. With the help of an ad-hoc discrimination algorithm, the system can identify and separate stationary objects from human subjects within the scene, and several human subjects can be tracked and monitored simultaneously. The authors confirmed through experiments the capability of identifying the human body during a fall event, as well as the capability of the recommended algorithm to handle blob fusion in the depth domain.

The system proposed in this research was implemented and tested on a PC running Windows 7 with an i5 processor and 4 GB of RAM. The proposed algorithm can be adapted to different depth sensors, since it needs only depth information as input. Moreover, an embedded real-time implementation was realized on a Cortex-A9 with 2 GB of RAM running Linaro 12.11. The authors foresee that future research will focus on managing several depth sensors simultaneously and on improving the performance of the algorithm, so that the system can keep tracking subjects whenever they cross the areas covered by adjacent sensors.
2.12 Experimental Studies on Human Body Communication Characteristics based upon Capacitive Coupling
Researchers at the Academy of Sciences in Shenzhen, China, studied Human Body Communication and regarded it as a transmission technology for short-range sensor network applications (Wen-cheng and Ze-dong, 2014). Few full-scale measurements have described body-channel propagation based on capacitive coupling [11]. The study focused on experiments with various body parts to investigate the characteristics of the body channel. Using the coupling technique, the channel characteristics can be measured in both the frequency and the time domain. The measurement results showed that the body channel maintained stable characteristics, while the elbow, wrist and knee affected its attenuation [19].
2.13 Body Movement Analysis and Recognition
Different studies have also proposed human-robot interaction based on innovative combinations of sensors. Yang and Hui (2014) studied non-verbal communication between robots and humans through the understanding of human body gestures. The robot can express itself through body movements such as facial expressions and movements of body parts, as well as through verbal expression. For this communication, twelve upper-body gestures are used, and interactions between objects and humans are included among them. The gestures are characterized by head, arm and hand posture information: a CyberGlove II captures the hand posture, while the Microsoft Kinect provides head and arm posture information [12]. This is an up-to-date solution for combining sensors to capture human gestures. Based on the body posture data, the authors proposed a human gesture recognition method that is both real-time and effective, and experiments were conducted to demonstrate the efficacy and effectiveness of the proposed approach.
Figure 10- Movement analysis Glove. Yang et al. (2012)
Human-computer interaction, a field that only emerged in the 1990s, has recently gained the attention of the industrial and academic communities. It draws contributions from mechanical engineering, computer science and mathematics. Unlike earlier forms of interaction, human-robot interaction must incorporate more of the social dynamics of communication: because people want to interact with robots as they do with other humans, the interaction needs to be made more believable, and robots should be able to use verbal and body language as well as facial expressions [10]. Some robots already pursue this goal; the Nao humanoid robot, for instance, can use gestures and body expressions. The main concern of the study was to establish communication between robot and human through body language, and one of its main purposes was to bring non-verbal language into social human-robot interaction. Twelve upper-body gestures, all intuitive and natural, are involved in the recommended system. They are characterized by arm, head and posture information, and human-object interactions are among them.
Figure 11- Humanoid robotics illustration. Clingal et al. (2014)
A human body gesture dataset was constructed to evaluate the recommended recognition method. It was built from 25 subjects of different body sizes, cultural backgrounds and genders, and the experiments demonstrated the efficiency and effectiveness of the recommended system. The major aspects of the study are:
The Kinect and the CyberGlove II are integrated to capture arm, head and hand posture; a combined human gesture-capture sensor is recommended.
A real-time and effective method for recognizing upper-body gestures is recommended.
A gesture understanding and human-robot interaction (GUHRI) system is built to help humans interact with robots.
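The idea of characterizing gestures by arm, head and hand posture can be illustrated with a minimal feature-extraction step: expressing each tracked joint relative to the torso yields a pose descriptor that does not depend on where the user stands. The joint names and coordinates below are invented for illustration and are not taken from the GUHRI system:

```python
# A toy skeleton: joint name -> (x, y, z) position in metres, as a
# Kinect skeletal stream might report them (values are made up).
skeleton = {
    "torso": (0.0, 1.0, 2.0),
    "head": (0.0, 1.6, 2.0),
    "left_hand": (-0.4, 1.2, 1.8),
    "right_hand": (0.5, 1.3, 1.7),
}

def gesture_features(joints, reference="torso"):
    """Build a simple pose feature vector: each joint expressed
    relative to the torso, making the features invariant to the
    user's position in the room."""
    rx, ry, rz = joints[reference]
    feats = []
    for name in sorted(joints):          # fixed joint order
        if name == reference:
            continue
        x, y, z = joints[name]
        feats.extend((x - rx, y - ry, z - rz))
    return feats

features = gesture_features(skeleton)
print(len(features))  # 9: three joints, three coordinates each
```

A gesture classifier would then be trained on such vectors, one per frame; the actual GUHRI system additionally fuses CyberGlove hand-posture data, which this sketch omits.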
A scenario was established in which a user and a robot interact in a classroom, as a case study of the GUHRI system. The user acts as a student and the robot as a lecturer. The robot understands the twelve upper-body gestures and, like a human, can react by combining facial expression, verbal language and body movement. The robot's behavior in class is triggered by the user's body language [7], and all its actions are consistent with the established scenario. The GUHRI system can also handle unexpected situations: if a user suddenly answers a phone call, it reacts appropriately. Regarding the understanding of upper-body gestures, dynamic gestures are important body-language components in daily life; they provide cues that enhance communication. To make human-robot interaction natural, the robot should understand both static and dynamic gestures through movement analysis and gesture recognition. Combined 3-D information about the human body can be obtained in real time with Microsoft's Kinect SDK, and motion information can be derived from the change in body-joint positions along the temporal axis. Activity recognition has already been performed with this joint-motion information, but there remains a possibility of missing hand gestures. Future work is likely to address the recognition of upper-body gestures and body motion together with the information carried by hand gestures. Another direction is recognition from an egocentric point of view. In the recommended GUHRI system, the Kinect serves as the vision sensor. The system is not perfect and has several limitations, such as the inability to change viewpoint because the Kinect is in a fixed position; as a result, the robot cannot always obtain the best viewpoint of the human body's gestures. One way to solve this problem is to obtain gesture information from the robot's egocentric perspective. This allows the viewpoint to change, but it introduces new problems, since it becomes difficult for the robot to distinguish camera motion from real body motion [11]. In the future, further work could also integrate verbal cues into the GUHRI system to enrich human-robot interaction; a robot that is more autonomous in seeing and hearing will be more human-like.
Overall, this paper recommended the GUHRI system, offering an innovative understanding of gestures for human-robot interaction. The robot can comprehend twelve upper-body gestures, and it can express itself through facial expressions, body movements and verbal expression. A sensor combination of the Microsoft Kinect and the CyberGlove is recommended to capture head, arm and hand posture simultaneously [3], yielding an effective, real-time gesture recognition mechanism. A human body gesture dataset was built for the experiments, and the results demonstrated the efficiency of the gesture recognition. So far, the gestures involved are static ones, such as asking a question, appreciating, calling or drinking. The study recommends, as future work, understanding dynamic gestures such as saying no, clapping or waving, and adding speech recognition to make the interaction more realistic.
2.14 An Integrated Platform for Live 3-D Human Reconstruction
and Motion Capturing
There are also experiments and studies showing how Kinect technology can be used for live 3-D human reconstruction and motion capture. In their research, Imitrios and Alexadis (2011) investigated developments in 3-D capturing and processing and provided ways to open pathways for 3-D applications. Their study addresses the tasks of real-time capturing and motion tracking by explaining the main features of an integrated platform targeting future 3-D applications, and it also discusses an innovative sensor-calibration method. Based on a variant of a volumetric Fourier-transform-based method, an innovative method for reconstruction from RGB-D data is recommended in the paper. The paper also proposes a qualitative evaluation of 3-D reconstruction mechanisms, since existing evaluation methods were found largely irrelevant. Overall, an accurate mechanism for real-time human body tracking is recommended, based on a generic, multiple-depth-camera approach. The experiments conducted in the study supported its conclusions.
In this study, applications including multi-Kinect-v2 capture and reconstruction of moving humans, fast reconstruction of humans, and skeleton-based motion tracking with depth cameras are described, and the main elements of the integrated system are presented in detail. Based on these elements, innovative approaches are recommended and existing approaches are discussed. An innovative mechanism for evaluating 3-D reconstruction systems is also recommended, and some limitations of ongoing research are examined. One of the main limitations is the imperfect synchronization of the RGB-D sensors, which may lower the reconstruction quality. In the skeleton-tracking mechanism, shortcomings caused by topology changes are to be overcome by fitting a skeleton scheme [2]; moreover, the limitations can be reduced by splitting the body into upper and lower parts and fusing the data with inertial measurements.
Figure 12- RGB-D illustration. Immitrios et al. (2014)
2.15 Automated Training and Maintenance through Kinect
The availability of the Kinect at a low cost, together with its high-quality sensors, has enabled researchers such as Saket and Jagannath (2011) to study how to reduce the burden on mechanics involved in automobile maintenance in centralized workshops [1]. A system prototype that works with the Kinect is recommended. The system has two modes of operation, speech and gesture. In speech mode it is controlled by various audio commands; in gesture mode, gesture recognition is performed by the Kinect, which, together with its RGB-depth camera, processes skeletal data by keeping track of the body joints. Gestures are recognized by checking the user's movements against predefined patterns. Real-time image data streams are captured by a high-density camera, and a 3-D model is generated and superimposed on the data being received in real time.
The Kinect plays an important role in the recommended system, working as the tracking instrument for the developed augmented reality system [6]. The system utilizes several important Kinect features: speech recognition, joint estimation and skeletal tracking. Skeletal tracking is one of the most important, because it locates the user's position, which is used to guide the user through the assembly procedure, and it is also used for gesture recognition. An assembly brings individual parts together and joins them into a single product; assemblies can further be divided into full and partial assemblies. The basic mode, also called full assembly mode, teaches technicians the procedure for assembling a particular product. In partial assembly mode the role of the Kinect becomes more important, as the technician is guided in detail through the assembly of the parts; when the assembly of one part is completed, the assembly of the next part can begin [12]. The system can work in two different modes, gesture and speech, and the user can select the mode according to his or her acquaintance and experience with the system. If speech mode is selected, the user commands the system by speaking; in gesture mode, the user interacts through gestures while the system guides with voice commands. For example, the word START commands the system to begin.
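The two-mode control flow described above can be sketched as a small state machine. The class, the NEXT command and the step counter below are illustrative assumptions; only the START command and the speech/gesture modes are mentioned in the cited work:

```python
class MaintenanceAssistant:
    """Minimal sketch of a two-mode assembly assistant.

    In speech mode the command string stands for a recognized
    spoken word; in gesture mode it would be the label produced
    by the Kinect's gesture recognition.
    """
    def __init__(self):
        self.mode = None       # "speech" or "gesture"
        self.running = False
        self.step = 0          # current assembly step

    def select_mode(self, mode):
        if mode not in ("speech", "gesture"):
            raise ValueError("unknown mode: " + mode)
        self.mode = mode

    def handle(self, command):
        if command == "START":
            self.running = True
        elif command == "NEXT" and self.running:
            self.step += 1     # one part assembled, move to the next
        return self.step

assistant = MaintenanceAssistant()
assistant.select_mode("speech")
assistant.handle("START")
print(assistant.handle("NEXT"))  # 1
```

Keeping the command handling independent of the input modality is what lets the same procedure be driven by either speech or gestures, as the prototype intends.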
The research discusses in detail the use of the Kinect sensor for tracking and detection. The Kinect is used not only as a tracking device but also as an input device. The study is a step towards automating the repair and maintenance of vehicles. The recommended system will help reduce the workload that routine activities place on skilled experts, since it can be used for small jobs instead. Documentation also becomes simpler: the system checks each step, so step-wise verification is easier, and the supervisor no longer needs to walk around [2]. The system recommended in the study is likely to create many opportunities for engineering companies that use Augmented Reality to simplify their complex tasks. Overall, it can contribute a great deal to improving repair and maintenance processes.
2.16 Kinect in the Kitchen: Testing Depth Camera Interactions in Practical Home Environments
Galen (2013), from the University of California, Berkeley, carried out a study observing that depth cameras are now present in millions of homes thanks to developments around the Microsoft Kinect.
Figure 13- Smart Home System illustration. Berkeley University Journal (2013)
This study took the Kinect into real kitchens. Although touchless gestural controls can be difficult for some, they enable commands to be merged into the movements of cooking. The smart kitchen lets users alter the control scheme and operate it with other limbs when their hands are full. The recommended system was tested with five different people, who cooked in their own kitchens; they found that placing the Kinect was simple, which contributed to its success. An important challenge was accidental commands in the kitchen [12].
The experiment showed that users found the system easy and pleasing, with low levels of frustration. The system also let them load music and recipes, which was helpful because the interaction style was generic. All subjects said that although cooking with it was difficult and messy, they were quite happy with the experience. The researchers' own observations were less favorable: accidental use of the navigation controls caused considerable trouble. Besides accidental button presses, sweeping the hand while changing direction also caused problems, and some errors occurred when subjects pushed buttons while focusing elsewhere. Another problem was that subjects often pushed the wrong buttons, mostly because they pressed too quickly; the authors attributed this to the Kinect SDK's smoothing. The subjects liked the lock buttons on the screens but rarely used them [17]; during the experiment, a few subjects did not realize that the lock was not automatic but the result of an accidental button push. For future use it is recommended that locking be made automatic, especially when the subject turns sideways (so that the joint axis collapses inward) towards the side counters or the counters behind, and that unlocking become a two-step rather than a one-step process. The Kinect proved extremely useful during the experiment, and the ease of positioning it surprised the users. The camera was placed so that the subject generally remained in the frame. One important aspect was the distance the sensor requires; to satisfy it, the cart was generally placed out of the kitchen and out of the way.
2.17 Kinect Gaming and Physiotherapy
Research conducted by Sachin and Singh (2014) from the University of Pune recommended a system that joins two applications of the Kinect: gaming and physiotherapy. The recommended system performs its tasks using critical features such as depth recognition, skeletal tracking and gesture recognition. According to the study, the Kinect camera is the key instrument through which all the operations are implemented [2]. The subject's body movement is tracked by skeletal tracking, identifying key points on the human skeleton. Depth recognition, another important feature of the system, segments the foreground and background of the image and can also separate a person from the background based on pixel color. The Kinect is required for these operations, mainly because it can produce RGB and depth streams at a lower cost than the sensors in common use. With its time-of-flight camera, the Kinect can measure the distance of any given point from the sensor; an open Kinect driver framework capable of generating depth images is implemented for this purpose. Normally the Kinect is used together with a console device [12], but since consoles are quite costly, this study attempts to do away with the console and instead tackles human skeletal tracking using the Kinect alone. The aim is to make the most of the hardware: by eliminating the console, the procedures are carried out by combining the Kinect with developed and refined system programs that perform the particular set of operations [15]. The study panel recommended the final project implementation, which can be used for further development of applications.
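The depth-based separation of a person from the background can be illustrated by keeping only the pixels inside a plausible depth band. The band limits and function below are assumed values for illustration; the cited system's actual segmentation also exploits pixel color, which this sketch omits:

```python
import numpy as np

def segment_person(depth_mm, near_mm=500, far_mm=2500):
    """Separate a subject from the background using depth alone.

    Keeps pixels whose depth lies inside a band plausible for a
    person standing in front of the sensor; everything nearer or
    farther is treated as background. Returns a boolean mask.
    """
    return (depth_mm > near_mm) & (depth_mm < far_mm)

# Synthetic frame: wall at 3 m, a person-sized region at 1.5 m.
frame = np.full((48, 64), 3000, dtype=np.uint16)
frame[8:40, 20:44] = 1500
mask = segment_person(frame)
print(int(mask.sum()))  # 32 * 24 = 768 foreground pixels
```

The resulting mask can then be applied to the RGB stream to isolate the subject, which is the step a color-based refinement would build on.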
3 Research Methodology
3.1 Introduction
This section lays out the procedures and methods employed in this research, which primarily uses documentary analysis. The section outlines results and facts from previous research, covering methods such as sampling, research design and data analysis. Additionally, concerns have been raised about the applicability of the different Kinect innovations and discoveries (Bevilacqua, 2014), and this research addresses those concerns. An experimental analysis of the effectiveness of the Kinect in assisted living environments is crucial, as it helps Ambient Assisted Living (AAL) organizations benchmark against best standards and practices. In his research, Konstantinidis (2015) expressed the need for AAL organizations to adapt to external environments and patient needs as a strategy for improving both the technical and the practical application of the Kinect. This is particularly important as most smart home environments are shifting towards a service culture and staff-reduction strategy with a more demanding clientele. This research analyzes results from clinical experiments with Kinect devices such as camera tracking; Anastatiou (2011), for example, analyzed the efficacy of the Kinect camera in tracking hand, elbow and trunk movements.
In addition, a glimpse of the available literature shows that Kinect devices have been extensively researched and documented. Experimental research has been done on improvements in 3-D mapping technology and on body tracking. In this context, this research analyzes consequential advances in related technologies, such as GPU systems and sensors, that enable technological improvements and new Kinect applications. Technologies such as motion capture (mo-cap), Kinect v1 and Kinect v2 have been used to perform experiments in assisted living environments; the tests for these systems involve sitting, walking and standing.
Figure 14- Pose Experiments, Kinect tests. (2013)
3.2 Model of the research
This research employs a documentary analysis strategy and primarily uses experimental and clinical studies. Experimental results are used to determine the impact of the Kinect and its different applications in assisted living environments. The main advantage of documentary analysis is that it is cost-effective and relies on scientifically approved approaches (Clembers, 2001). Documentary analysis also tends to work with an unlimited scope, making the research simple and logistically easier than other research methods. Results from clinical tests and applications were also used to answer the research objectives.

The Statistical Package for the Social Sciences (SPSS) was used to analyze all the collected data, after which descriptive metrics such as means, percentages and frequencies were used for further analysis. Data interpretation was conducted with respect to the frame of reference of the research problem and objectives.
According to researchers such as Robinson (2003), the validity and reliability of the data collection methods directly determine the accuracy of the collected data. Reliability ensures that the instruments used yield consistent results. To ensure the objectivity and accuracy of the research, a separate department was tasked with auditing and inspecting the documents used. Cronbach's alpha was used to check the consistency of the obtained results. The alpha, which ranges from 0 to 1, measures reliability on an increasing scale; according to Dristern (1990), the minimum acceptable reliability for a research study is 0.6.

The research team also corrected inconsistencies and errors and modified the formulas used to increase accuracy.
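Cronbach's alpha can be computed directly from the item scores. The sketch below uses invented example data; the 0.6 minimum follows the threshold cited above:

```python
import statistics

def cronbach_alpha(items):
    """Cronbach's alpha for internal-consistency reliability.

    items: a list of columns, one per questionnaire item, each
    holding the scores of all respondents.
    alpha = k/(k-1) * (1 - sum(item variances) / total variance)
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]   # per-respondent totals
    item_var = sum(statistics.variance(col) for col in items)
    total_var = statistics.variance(totals)
    return k / (k - 1) * (1 - item_var / total_var)

# Three items, four respondents; scores are fairly consistent.
scores = [
    [4, 3, 5, 2],
    [4, 3, 4, 2],
    [5, 3, 5, 1],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 2), alpha >= 0.6)  # 0.95 True
```

A value this close to 1 indicates that the items measure the same underlying construct; values below 0.6 would, per the criterion above, call the instrument's reliability into question.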
3.3 Research Design
The research design employed in this study outlines the blueprint and plan for answering the research questions and fulfilling the research objectives. According to Blumberg (2005), a research design shows the plan that guides researchers in answering the research questions.

Although researchers concur that performing research through documentary analysis can be technically demanding, they agree that it is an important approach that can give researchers deeper insights, especially when combined with other methodologies (Flinter, 2009).
Figure 15- Research design
3.4 Primary Data
In the collection of data, more emphasis was placed on data that could be analyzed.

Quantitative: numerical data collected from questionnaires, interviews and surveys. Quantitative data are easy to analyze and can be used to show patterns and trends; graphs, pie charts and tables can illustrate them further and support inferences. An email survey was used because of its easy administration and its potential to reach a large number of respondents.

Qualitative: non-numerical data collected through methods such as one-on-one interviews and observations. Qualitative data can help correct any bias that may result from quantitative data collection methods; questions are asked directly of the interviewee or respondent.
3.5 Summary
Results from various journals, books and other literature are used to form an opinion on the use of the Kinect and its application in smart living environments. Importantly, this research seeks to outline future trends in Kinect applications and use in AAL environments. Although researchers in this field, such as Webster (2014), believe that the application of the Kinect to AAL is still in its infancy, this research delves into the future of such applications and their relationship with other technologies, such as the Internet of Things (IoT) and the Olympus camera.
4 Data analysis and presentation
4.1 Introduction
For comprehensive analysis, the following sections of the paper are organized in a documentary-analysis manner. Here, documentary analysis is used as a tool to gather evidence centering on the use of Microsoft Kinect applications, their weaknesses and their use in assisted living environments. This section analyzes laboratory results from conducted experiments, surveys and studies on the Kinect and its components.

The most important reason documentary analysis was used for this research is its efficiency: documented research papers and journals are easily accessible and their documented results verifiable. In this section, different research papers are analyzed to form an opinion on the future and the applications of the Kinect in assisted living environments. General research data were used to design the final data analysis technique. This section analyzes existing protocols used in assisted living environments and proposes new protocols and areas of research. A key approach for this research is to build on the work of previous researchers such as Yang et al. (2015) and Gradinaru (2016), both of whom proposed new technologies for 3-D representation using sensors. In his research, Gradinaru (2016) designed new systems and software for capturing and displaying animated information.

Other related technologies involved in the development of Kinect applications, such as 3-D sensing tools for video and still cameras, were also analyzed.
Figure 16- Gradinaru (2016) graphical representation of system
Some of the key areas targeted for analysis include:
Smart Home environments
Movement Detection Models
Internet of things and its impact on Kinect
Skeletal Tracking systems
4.2 Smart Home environments
Smart home systems play a critical role in the creation and continuity of Kinect op-
erations in assisted living environments. According to Kawatsu (2014), a smart home
environment is one that creates interconnections between a physical environment and
the devices within it. In a smart home environment, people expect that the technolo-
gies can be used to improve their everyday life. Applications of smart home systems
can be in communication, safety, welfare and appliances. The devices used in home
system environments consist of communication modules, cameras, sensors and ac-
tuators. Overall, a server is used to manage all the operations of the smart home
environment.
In their research, Baeg et al (2007) constructed from scratch a smart home environ-
ment in the research building of KITECH (Korea Institute of Industrial Technology).
This research aimed to demonstrate the efficacy and practicability of a robot-assisted
home environment. The research featured custom-made sensors, actuators, a robot
and a database.
The researchers made use of RFID (radio-frequency identification) technology to
identify, track and follow objects within the home system. RFID uses radio frequen-
cy to track objects, and RFID tags were used to identify objects in the environment.
Basically, objects with a tag were considered smart appliances. Apart from the smart
environment, the conceptual framework consisted of servers and a robot. Smart ob-
jects were assigned sensor capabilities, which meant they could communicate with
both the server and the robots.
The figure below shows the conceptual environment:
Figure 17- Conceptual Framework of a smart home environment
The smart environment was divided into layers. The first layer consisted of the real
home environment, with its scattered arrangement of objects and appliances. The
second layer consisted of actuators and wireless sensors, including additional sen-
sors such as temperature sensors, RFID readers, smart lights, and humidity and secu-
rity sensors. Level three contained devices such as tables, chairs and shelves, all fit-
ted with RFID sensors for ease of identification. In the fourth level there was a com-
munication protocol which ensured reliable and accurate communication between
the home server and other devices in the vicinity. The server which managed the
relationship between the devices and the sensors was in level five.
Figure 18- Smart home environment layered description
In this experiment, the main use of the robot was to provide several key functions:
mapping, localization, object recognition, and interaction. To that end, the robot was
equipped with ultraviolet sensors, cameras, ultrasound, sufficient processing speed
and adequate memory.
For this experiment, specific home services were selected that replicated real home
services. The objective of the smart home environment was to give users close-to-
real-life services. Some of the functions to be performed in the smart environment
included object cleaning, running home errands and executing home security func-
tions.
Object cleaning: in this scenario, the service robot is tasked with tidying up the room
or environment. The robot does this by arranging objects in a required or preset way.
RFID tags installed in the ceiling of the home direct the robot's navigation and indi-
cate which objects to handle. The purpose of this part of the experiment was to in-
vestigate the potential use of robots in tasks such as laundry, home arrangement and
doing dishes.
Performing errands: in this case, robots are tasked with identifying and fetching spe-
cific objects or smart items around the smart home. Fetched objects carry RFID tags,
which means they are easily identifiable within the network. The fetch function
works after receiving a command from a person: the robot requests the position of
the object to be fetched and, after receiving that information, moves to the object,
grabs it and brings it back.
In this research, the researchers used two key modules: RFID interfaces and commu-
nication modules. The protocol used to operate the communication module was
ZigBee, an open standard protocol based on IEEE 802.15.4b. The ZigBee protocol
provides low-power wireless interconnection for different applications, and it was
used for all the devices. For the RFID modules, EPCglobal Gen2 was used;
EPCglobal Gen2 defines a standard for the operation and application of RFID mod-
ules.
The team used the physical layout below for the research:
Figure 19- Smart home environment layout
This paper outlines innovative ways which can help improve assisted living envi-
ronments. The architecture employed and the use of RFID systems show that smart
home systems can be created from available materials and technology. Scenarios
performed by robots, such as cleaning and arranging, can be employed in assisted
living environments. According to the researchers, the goal was to create an envi-
ronment where people are served by robots, with the robots keeping the environment
in its required state. The robots employed in this research can be used to help indi-
viduals in assisted living environments perform basic functions like cleaning, wash-
ing or house arranging.
With such developments in robotics and the creation of smart homes, Kinect v2 can
be employed both for navigation and for dense map creation. The Kinect v2, as op-
posed to v1, is built on the time-of-flight principle, which means that it can even be
used outside homes. The RFID sensors employed in this research can be particularly
useful for mobile robot movement.
For robotic applications, the Kinect v2 sensor has been used by researchers to pro-
vide much better results, primarily because of the ToF technology employed. By us-
ing ToF, accurate measurements of objects can be obtained. Also, due to the high-
resolution cameras, a lot of information is captured. The result is that home envi-
ronments are accurately mapped with fine detail and minimal errors. With Kinect
v2's active illumination, surrounding images are captured even in dark environ-
ments.
Research conducted by Hondori et al (2013) gave important insights into the applica-
tion of Microsoft Kinect v2 in a smart home setting. The research focused on ges-
tures and made use of sensor fusion between Kinect and inertial sensors. The goal of
the research was to assess the significance of smart home systems in helping post-
stroke patients complete day-to-day activities. To achieve this, Microsoft Kinect was
used to monitor quantities such as spoon acceleration, wrist position, elbow position,
shoulder joints and angular positions. The purpose was to distinguish between
healthy and paralyzed individuals, which is a complex problem in assisted living
environments. Microsoft Kinect and inertial sensors were successfully tested in these
environments. The use of smart home systems to assist stroke patients was driven by
the high cost associated with visiting rehab facilities. The convenience of having
smart home systems would allow doctors and therapists to remotely assist clients,
and would help therapists monitor patients and analyze improvements and progress.
As opposed to the smart home systems developed by Zheng et al (2013), the systems
developed and tested by Hondori et al (2013) did not rely on numerical integration of
inertial measurement unit (IMU) data. This research made use of inertial and Kinect
sensors simultaneously. The main activities used to record movements were intake
gestures: critical body functions like eating and drinking were selected. The setup
combined a Microsoft Kinect sensor with inertial sensors through sensor fusion. In-
ertial sensors were placed on items such as utensils, recording the movements of
both the subject and the items they were using. A Kinect sensor was also placed on
the table to monitor individual movements while eating and drinking.
Figure 20- Hondori et al (2013) system set up including inertia sensors and
Kinect sensors
Individuals were asked to perform different tasks in order to record the experimental
data.
Eating and drinking task: activities such as eating, cutting steak and drinking water
were performed and repeated several times. These movements were then analysed as
3-D trajectories, as seen below.
Figure 21- Hondori et al (2013) 3-D trajectories
The body movements are measured in degrees:
Right elbow: changes in the range of 50°-110°
Left elbow: changes in the range of 65°-115°
Kinect sensor data analysis: the figure above shows body movements and changes.
The movements of the wrist and joints illustrate the individual's limb movements
while the head remains still.
Figure 22- Hondori et al (2013) experimental data on body movements
Figure 23- Hondori et al (2013) limb changes in task like drinking and eating
Figure 24- Hondori et al (2013) inertia sensor data from individual’s items
Data measured from the inertia sensors is illustrated by figure 23. The bias on the
signal is approximately 9.81 m/s² due to gravity; this is adjusted for and factored into
each of the three measurement axes. It was found that during cutting of the steak the
recorded frequency was highest while the magnitude remained steady, and the fre-
quency during drinking was constant.
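The gravity adjustment described above can be sketched as follows. This is a minimal illustrative example, not the authors' implementation; it assumes the constant 9.81 m/s² bias can be estimated from an initial at-rest window and subtracted per axis, and the function name is hypothetical.

```python
import numpy as np

def remove_gravity_bias(accel, rest_samples=100):
    """Subtract the constant gravity component from a 3-axis
    accelerometer stream (shape: n x 3). The bias is estimated as
    the mean reading over an initial at-rest window, which at rest
    is approximately the gravity vector (magnitude ~9.81 m/s^2)."""
    accel = np.asarray(accel, dtype=float)
    bias = accel[:rest_samples].mean(axis=0)  # at-rest mean ~= gravity vector
    return accel - bias
```

In practice the at-rest window must be chosen so the sensor is genuinely stationary; any motion in that window leaks into the bias estimate.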
This research showed that smart home environments could lessen the burden in-
curred by post-stroke patients. The systems could also give vital data to physicians
for proper monitoring and study of patients. Microsoft Kinect and inertial sensors are
vital for the system. The researchers demonstrated that it is possible to capture
movements and positions such as angular displacement and limb gestures. While
other researchers have performed similar research using on-body sensing techniques,
this research relied solely on Kinect and inertial sensors.
A different research study conducted by Mohamed et al (2013) assessed how smart
home systems making use of Microsoft Kinect can be used to assist individuals with
disabilities. The proposed systems aimed at monitoring elderly individuals: they rec-
ognized gestures and body actions and gave feedback through a network. The key
goal of the experiment was to monitor elderly individuals in their natural environ-
ment. To this end, two projects were initiated, DOMUS and GUARDIAN ANGEL.
The objective of the GUARDIAN ANGEL project was to produce sensors that could
be integrated into any media type. Monitoring of all the various object parameters
was a key objective of the experiment. According to the researchers, Microsoft
Kinect was used because of its advantages over other sensors, which include an RGB
camera, a depth camera and an infrared transmitter.
Figure 25- Mohamed et al (2013) smart house used in the experiment
Some favorable characteristics of the Microsoft Kinect are shown in the table below:
Property  Specification
Field of view (horizontal, vertical, diagonal)  58° H, 45° V, 70° D
Depth image size  VGA (640×480)
Spatial x/y resolution  3 mm
Depth (z) resolution at 2 m from the sensor  1 cm
Maximum image throughput (frame rate)  60 fps
Color image size  UXGA (1600×1200)
Data interface / power supply  USB 2.0
Power consumption  2.25 W
Operating environment  Indoors
Table 1-Characteristics of Microsoft Kinect Components
Processing of the recorded data was done via three data streams generated by IR light
reflected from the scene; below is an illustration of the natural user interface. The
data is transmitted in three streams: image, depth and audio. The Kinect system was
relied upon to give accurate 3-D information.
[Figure content, layered stack: Application; Processed data (Natural Human Interaction Library); Data streams (image stream, depth stream, audio stream); Kinect sensor]
Figure 26- Mohamed et al (2013) Natural User Interface
Tested activities included gestures made using hand positions. The Kinect sensor
assessed position using 20 joints, and the X and Y coordinates of each joint were
calculated. Below are images of the gestures and postures the application could de-
tect. Two methods were used to recognize gestures: algorithm-based and template-
based. Because flexibility was needed, the 1 dollar and N dollar algorithms were
used; these algorithms can be implemented in different environments, even in a pro-
totyping context. In this case, an act performed by an individual is recognized and
compared to previously recorded sets of points. The 1 dollar algorithm handles sin-
gle continuous strokes ("unistroke"), while the N dollar algorithm extends this to
gestures composed of several strokes ("multistroke"). In the 1 dollar unistroke rec-
ognizer, four steps are used to normalize candidate gestures before they are com-
pared to stored templates:
Resampling
Rotation based on indicative angle
Scaling and translation
Score calculation
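The four steps above can be sketched as a minimal single-stroke recognizer in the spirit of the 1 dollar algorithm. This is an illustrative simplification (the full recognizer additionally searches over rotations, for example with golden-section search), and all function names are hypothetical:

```python
import math

def resample(pts, n=32):
    """Step 1: resample the stroke to n equally spaced points."""
    pts = [tuple(p) for p in pts]
    total = sum(math.dist(pts[i - 1], pts[i]) for i in range(1, len(pts)))
    step, acc, out = total / (n - 1), 0.0, [pts[0]]
    i = 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if d > 0 and acc + d >= step:
            t = (step - acc) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)   # continue measuring from the new point
            acc = 0.0
        else:
            acc += d
        i += 1
    while len(out) < n:        # guard against floating-point shortfall
        out.append(pts[-1])
    return out[:n]

def normalize(pts, n=32, size=250.0):
    """Steps 2-3: rotate by the indicative angle, scale to a
    reference square and translate the centroid to the origin."""
    pts = resample(pts, n)
    cx = sum(p[0] for p in pts) / n
    cy = sum(p[1] for p in pts) / n
    theta = math.atan2(pts[0][1] - cy, pts[0][0] - cx)  # indicative angle
    c, s = math.cos(-theta), math.sin(-theta)
    pts = [((p[0] - cx) * c - (p[1] - cy) * s,
            (p[0] - cx) * s + (p[1] - cy) * c) for p in pts]
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    w, h = (w if w > 1e-9 else 1.0), (h if h > 1e-9 else 1.0)
    pts = [(p[0] * size / w, p[1] * size / h) for p in pts]
    cx = sum(p[0] for p in pts) / n
    cy = sum(p[1] for p in pts) / n
    return [(p[0] - cx, p[1] - cy) for p in pts]

def score(candidate, template):
    """Step 4: average point-to-point distance; lower means a
    closer match to the stored template."""
    a, b = normalize(candidate), normalize(template)
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)
```

A gesture is classified by computing this score against each stored template and taking the minimum.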
Figure 27- Mohamed et al (2013) Waist detection posture
Figure 28- Mohamed et al (2013) Waist detection posture
The recognition scenario is performed as shown in figure 29 below.
Figure 29- Mohamed et al (2013) Kinect procedure for gesture recognition
A toolbox with the Kinect SDK was used for the experiment. The toolbox utilizes
both golden-section search and the 1 dollar method. Together, the two methods fa-
cilitate recognition of gestures such as a circle, as illustrated below.
Figure 30- Mohamed et al (2013)
Kinect toolbox recognition of circle gestures
The researchers concluded that ultimately many Kinect sensors may be needed to
properly monitor a complete home environment such as a large hospital building.
The researchers made use of a WiFi network in mesh topology. When a gesture was
detected, an alert was sent via text or a simple notification; continuous transmission
of all frames was therefore not required. The system worked such that in case of an
emergency a text alert was transmitted.
[Figure content, recognition flow: waiting for push/pull gesture; waiting for skeleton movement; recognizer algorithm; unistroke gesture detection; network communication; device action]
The figure below shows the communication process.
Figure 31- Mohamed et al (2013) Communication process
In general, the program can detect gestures and communicate them via text transmis-
sion. Unlike other smart home systems, in this research the sensors were non-
intrusive to the users. The researchers successfully found an appropriate algorithm
for gesture commands. For future experiments, the researchers aimed to use an
EIB/KNX Ethernet gateway to accommodate many actuators.
4.3 Movement detection models
There exist several researches that have devolved into movement detection models
and its application to Kinect environments. Some of the research like Chin et al
(2015), focuses on optimum desistance for Kinect model detection. The researchers
focused on accuracy and reliability of Kinect cameras and sensors. Apart from giving
insights on the quality of the pictures. Calculations were conducted to analyze abso-
lute error percentages at varying distance.
The researchers studied the Kinect camera as a research based camera. There exists
little research that illustrate the reliability and accuracy of the Kinect camera as a
research camera.
The Kinect camera hardware components are as shown below.
Figure 32- Chin et al (2013) Three Kinect sensors, IR light, RGB camera, IR
detector
According to the product specifications, the Kinect sensor has a dual depth range:
default and near range. In both ranges the depth sensor returns 3-D images with x, y
and z coordinates. In default range there is a blind spot at approximately 0-0.8 m,
where the camera cannot return accurate depth data, and no data can be generated
beyond 4 m. In near range the blind spot is at 0-0.4 m, and the camera cannot gener-
ate raw depth data beyond 3 m.
The distance analysis is as seen below.
Figure 33- Chin et al (2013) Depth sensor distance
The C programming language is used to program against the SDK of the Kinect sen-
sor. The developer kit gives access to the source code and other technical resources
such as Kinect Studio; these tools enable easier development of applications.
The sensor calculates the distance along a straight line between the sensor and the
object, obtained as a perpendicular from the sensor plane. When an image frame is
captured, the Kinect sensor returns the maximum and minimum depth ranges in mm.
The diagram below shows the 16-bit raw depth frame returned by the Kinect sensor;
Figure 34- Chin et al (2013) Depth frame bit pixel
Technically, each bit has a specific function: the first three bits are used as player
identifiers while the following 13 give the distance in mm. The following program-
ming operations are used to extract these fields.
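Given that bit layout, extracting the two fields is a pair of shift-and-mask operations. The following is a sketch with hypothetical helper names, assuming the layout described above (low 3 bits player index, upper 13 bits depth in mm):

```python
def unpack_depth_pixel(raw16):
    """Split a 16-bit depth pixel into its player index (low 3 bits)
    and its depth distance in millimetres (upper 13 bits)."""
    player = raw16 & 0b111     # bits 0-2: player identifier (0 = no player)
    depth_mm = raw16 >> 3      # bits 3-15: distance from sensor in mm
    return player, depth_mm

def pack_depth_pixel(player, depth_mm):
    """Inverse operation, useful for testing the unpacking logic."""
    return (depth_mm << 3) | (player & 0b111)
```
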
Figure 35- Chin et al (2013) Algorithm depth distance
To calculate depth distance, five tests were done for each of the two ranges, default
and near. The distance ranged from 200 mm to 4000 mm in increments of 100 mm.
The objective of the five tests was to approximate the average distance: equation (1)
gives the average distance for the experiment. Additionally, the AMPE (absolute
mean percentage error) was calculated to establish the accuracy of the estimates, and
the standard deviation was calculated to ascertain the precision of the provided depth
data. To analyze consistency, the Kuder-Richardson formula was used.
Summary of the equations used for the experiment is as shown below.
Average: x̄ (mm) = (Σᵢ xᵢ) / n (1)
AMPE (%) = |(x̄ − X) / X| × 100 (2)
Standard deviation: s = √( Σᵢ (xᵢ − x̄)² / (n − 1) ) (3)
where
i is the index of each test, i = 1, 2, 3, 4, 5
xᵢ is the depth reading in test i and x̄ is the average at each distance
X is the actual distance
Σ indicates a sum
n is the total number of tests taken, n = 5
rKR20 = (k / (k − 1)) · (1 − Σ p·q / σ²) (4)
where
rKR20 is the Kuder-Richardson formula 20
k is the total number of test items
p is the proportion of tests that pass (within ±5 mm)
q = 1 − p is the proportion of tests that fail
σ² is the variance of the entire test
Equation 1- Chin et al (2013) Experiment equations
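Equations (1)-(3) can be computed directly from the five readings at each distance. The sketch below uses Python's statistics module and hypothetical names; the KR-20 consistency check is omitted because it needs per-item pass/fail data:

```python
import statistics

def depth_stats(measurements, actual_mm):
    """Summary statistics for repeated depth readings at one actual
    distance: the mean (eq. 1), the absolute mean percentage error
    against the true distance (eq. 2) and the sample standard
    deviation (eq. 3, n-1 denominator)."""
    mean = statistics.fmean(measurements)               # eq. (1)
    ampe = abs((mean - actual_mm) / actual_mm) * 100.0  # eq. (2)
    sd = statistics.stdev(measurements)                 # eq. (3)
    return mean, ampe, sd
```

For example, five readings of a target at 1000 mm would be summarized as `depth_stats([995, 1000, 1005, 1000, 1000], 1000)`.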
To conduct the experiment, the researchers placed a cardboard box in the field of
view of the sensor, within the application range of 200 mm to 4000 mm. The exper-
iment was designed to focus primarily on the centre of view. Cardboard bodies were
used instead of human bodies in order to channel the focus to depth distance as op-
posed to detection of a human frame; human bodies may introduce errors due to
their curved outline surfaces.
The experiment used the Microsoft SDK as the software framework for the pro-
gramming part. The SDK, available for Windows, allows for depth-based skeleton
tracking which creates animated avatar images. The SDK package indicates that
depth values of up to 4000 mm are supported, with an upper limit depth of 800 mm
and a lower limit of 500 mm. Below is an experimental comparison between meas-
ured depth distance and actual distance.
Equation 2- Chin et al (2013) Default range vs near range
Figure 36- Chin et al (2013) Average depth distance vs Actual distance
With the default range, the Kinect sensor was able to show images of objects as far
as 4000 mm in front of the camera and as close as 500 mm. At these distances, the
sensor was capable of assuring accuracy, reliability and precision. Further, less than
1% error (AMPE) was recorded between 600 mm and 2900 mm. The graph above
shows a similar quadratic shape that plots near the actual range; the depth error is at
1.5%.
The experiment concluded that the two ranges produce different depth image quali-
ty. The default range returns all 20 human joints, while the near range returns 10
joints. In near range, the sensor tends to focus on the user's head, hands and torso,
because at near range the sensor has a limited view due to the close distance. The
default range can be used for many applications such as facial recognition, human
pose estimation and robotics.
Figure 37- Chin et al (2013) Accuracy analysis AMPE vs Distance
Figure 38- Chin et al (2013) Precision analysis std vs Distance
In general, the researchers concluded that the Kinect sensor provides object infor-
mation with a high level of precision and accuracy, and that the Kinect can be relied
upon to provide accurate distances. Additionally, the following conclusions were
made by the researchers:
The Kinect sensor has low errors in measurement of depth distance; the error
only becomes more pronounced below 600 mm and above 2900 mm.
The random error of the sensor increases quadratically with distance, up to a
maximum of 40 mm.
The Kinect sensor shows consistency across different distance ranges.
The researchers recommend about 600 mm to 3000 mm for biomedical applications.
Other researchers, such as Alexiadis et al (2017), have proposed alternative methods
for motion and 3-D body detection using RGB-D streams. The method is based on a
volumetric Fourier transform approach. The researchers also proposed a qualitative
evaluation framework for real-time 3-D reconstruction systems. In their paper, they
propose elements and methods for capturing and reconstructing human 3-D appear-
ance.
For the system setup, the devices were placed on a circle of radius 2-4 m, pointing
towards the location of the object being captured. Because of the limitation present-
ed by Kinect v2 (one sensor per computer), a networked architecture was set up. For
image storage, RGB JPEG and LZ4 compression were used. The models allowed for
online construction of 3-D images and higher quality results.
The mapping calibration was approximated using a fixed KRT matrix, with the ap-
proximations based on a dense 3-D rigid registration. The Kinect SDK package was
used for the programming part. The figure below shows the calibration setup.
The researchers achieved external calibration through the use of a "novel registration
model". The model uses an easy-to-build structure that works as an anchor to which
all registration is referenced, and is built on the Scale Invariant Feature Transform
(SIFT). The advantage of the calibration is that once set up, no further human input
was required. For the calibration object, the researchers chose an easily obtainable
material with unique patterns that could support SIFT features. A standard IKEA
package box was used as the calibration structure; the image of the box is seen in
figure-40, and the size of the box used was 56×33×41 cm.
The calibration procedure involved placing the calibration structure at the center of
the room where it could be properly captured by all the sensors. For each viewpoint,
a color image and a depth image are captured. Since more than one Kinect sensor
covered the space, the researchers concluded it was better not to operate the sensors
simultaneously, to avoid interference. Additionally, the researchers performed a
quick post-synchronization procedure to align the data obtained; the data was syn-
chronized to within 16 ms.
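A post-synchronization step of this kind can be sketched as nearest-timestamp matching. The helper below is hypothetical (not the authors' procedure): it pairs each frame of one stream with the closest-in-time frame of another, keeping only pairs within a tolerance such as the 16 ms mentioned above.

```python
import bisect

def nearest_frames(ts_a, ts_b, tolerance_ms=16):
    """Pair each frame index in sorted timestamp list ts_a with the
    closest-in-time frame index in sorted list ts_b, dropping pairs
    further apart than tolerance_ms."""
    pairs = []
    for i, t in enumerate(ts_a):
        j = bisect.bisect_left(ts_b, t)
        # the nearest neighbour is either the insertion point or its predecessor
        candidates = [k for k in (j - 1, j) if 0 <= k < len(ts_b)]
        if not candidates:
            continue
        best = min(candidates, key=lambda k: abs(ts_b[k] - t))
        if abs(ts_b[best] - t) <= tolerance_ms:
            pairs.append((i, best))
    return pairs
```

Frames with no partner within the tolerance are simply dropped, which matches the idea of keeping only data that can be aligned across sensors.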
As for the 3-D texture, the vertices were clearly visible from multiple RGB cameras,
and the colors from the different RGB cameras were combined to produce every
reconstructed vertex. The researchers found that the color quality of the image de-
pended significantly on the angle of view; to this end, they assigned a smaller
weight to color information at the object boundaries.
Evaluation: the researchers used a capturing system fitted with calibrated RGB-D
sensors for performance evaluation. The Kinect sensors are also used in the recon-
struction procedure and serve as checks to ensure accuracy. The capturing system is
as shown below.
Figure 39-Alexiadis et al (2017)3-D Camera and sensor setup
In terms of volume, the researchers observed that the image was distorted: the 3-D
image suffered from cut limbs, holes and other distortions. The appearance quality,
defined by the image quality, was measured using the Structural Similarity Index-
based Measure (SSIM).
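The SSIM formula can be illustrated with a single-window (global) variant for grayscale images. The standard measure averages this statistic over a sliding window, so the sketch below is an illustrative simplification with a hypothetical function name:

```python
import numpy as np

def global_ssim(x, y, L=255.0):
    """Single-window Structural Similarity index between two
    grayscale images with dynamic range L. Returns 1.0 for
    identical images and smaller values as similarity drops."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    C1, C2 = (0.01 * L) ** 2, (0.03 * L) ** 2   # standard stabilizing constants
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + C1) * (2 * cov + C2)) / \
           ((mx ** 2 + my ** 2 + C1) * (vx + vy + C2))
```
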
After evaluating and determining the algorithm to be used in volume-based tracking,
the researchers proposed the following stages for the model.
Figure 40- Alexiadis et al (2017) Stages for the proposed model
Results of the experiments show a reliably high-quality reconstruction technique.
The results were mostly obtained through Kinect v2 sensors in different configura-
tions and spatial setups. Even though the Kinect v2 provided good-quality pictures
on its own, the reconstruction done by the researchers presents image quality with
less distortion. The reconstruction method employed resembles Poisson reconstruc-
tion, as it renders images that are properly blended in color and texture, and the
quality of the reconstruction is better than TSDF-based reconstruction.
Figure 41- Alexiadis et al (2017) Image quality reconstruction; Kinect data,
watertight geometry and Poisson
Generally, the researchers set out to describe the key elements of a system that
tracks and captures real-time 3-D images, including skeletal motion. The researchers
propose a novel system for 3-D reconstruction that is replicable, since the elements
used are widely available. Limitations such as the imperfect synchronization of the
RGB cameras are discussed and expounded. Further, the researchers recommended
areas of improvement such as visual quality and frame rates; for instance, a pre-
scanned user's face can be used to reconstruct the face, since it is one of the most
important parts of the body reconstruction.
For the experiment, the Kinect sensor was the most important component, since it
was able to correctly recreate and provide high-quality images.
In the field of motion tracking, researchers such as Tahavori et al (2013) have also
contributed substantially to testing the technical capabilities of the Kinect sensor.
The researchers used both Kinect for Windows and Kinect for Xbox to determine the
better device for detecting respiratory motion in patients. The result was that Kinect
for Windows gave more accurate detection, with errors of less than 2 mm. The goal
of the experiment was to use Kinect to measure the depth distribution over the pa-
tient's body, which then allows monitoring of the patient's motion. The researchers
wanted to know the potential of using Kinect for measuring and detecting respirato-
ry motion.
To investigate the technical capabilities of the Kinect, the researchers used a planar
object mounted on an optical rail; the rail ensured the precision of the measure-
ments. The researchers also made use of the Gail motion controller to investigate
respiratory displacement, and volunteers took part in the experiment.
To compare the technical capabilities of Kinect for Xbox and Kinect for Windows,
both devices were mounted on the rail.
The graph below shows the performance of the two devices.
Figure 42- Tahavori et al (2013) Kinect for Xbox vs Windows
The above data was analyzed using Matlab and the Kinect SDK. To reduce noise,
the data was averaged over 1000 depth frames. For the experiment, the distance was
varied over a range of 40-140 cm, and data was recorded for both devices in normal
and near mode. As seen from figure-43, both devices have a lower limit of 50 cm.
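Averaging depth frames to suppress per-frame sensor noise can be sketched as follows. This minimal example is not the authors' code; it assumes frames arrive as equally shaped arrays and that a zero value encodes an invalid reading, as in raw Kinect depth data:

```python
import numpy as np

def average_depth_frames(frames):
    """Average a stack of depth frames pixel-wise to suppress
    per-frame sensor noise, ignoring zero (invalid) readings.
    Pixels that are invalid in every frame average to 0."""
    stack = np.asarray(frames, dtype=float)
    valid = stack > 0
    counts = valid.sum(axis=0)
    summed = np.where(valid, stack, 0.0).sum(axis=0)
    return np.where(counts > 0, summed / np.maximum(counts, 1), 0.0)
```

Averaging N frames reduces independent random noise by roughly a factor of √N, which is why a long window such as 1000 frames gives a much smoother depth estimate.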
The results showed that Kinect for Windows had higher accuracy and precision
compared to the Xbox Kinect. To further check the performance of Kinect for Win-
dows, the researchers conducted further tests using a rectangular box measuring
20 cm × 20 cm × 5 cm. The box was placed on the rail and the distance was varied
around the range of 80-100 cm. The box was then moved away from the sensor in
steps of 2 mm, 5 mm and 10 mm, both in normal and near mode. It was concluded
that the Kinect sensor for Windows in near mode showed an error of less than 1 mm.
The researchers also analyzed the rotational accuracy of the Kinect sensor. This was
done using a rectangular object placed in front of the sensor: the known rotational
difference between the Kinect sensor and the test object was compared with the
measured values. The results obtained were as below:
Ground Truth  Normal Mode  Near Mode
3°  1.4°  2.5°
4°  2.4°  4.8°
7°  3.6°  6.8°
Table 2- Tahavori et al (2013) Normal mode vs near mode rotational results
Research also exists on other movement detection aspects, such as pose estimation.
A study conducted by Sengupta and Ohya (1996) showed how multiple cameras can
be used effectively to analyze a person's pose. The aim of the paper was to introduce
a method for easily obtaining an approximation of the pose of a 3-D or 2-D image.
The researchers make use of a 3-D CAD model on which they hypothesize a set of
models using a spatial extent function; the hypothesized points are then used to de-
rive the pose parameters.
In the experimental setting, the researchers assume everything in the space is 3-D
and modelled in advance. To estimate the pose of a human body image, a new two-
stage edge-based approach is proposed, as shown in the image below.
Figure 43- Sengupta and Ohya (1996) Two staged pose estimation illustration
Figure 44- Sengupta and Ohya (1996) back projection method estimation
The process of approximating the pose first involves processing the images obtained
from the multiple cameras, as seen in figure-46. The images are obtained by back-
ground subtraction and thresholding. To obtain a pose estimate, the image is treated
as a 3-D image and not as a 2-D CAD image. The 4×3 camera calibration matrix is
obtained and then used to calculate the back-projected ray.
When finding the approximate pose estimate, the researchers obtain a 3×3 rotation
matrix and a 3×1 translation vector which, when calibrated, map a 3-D point with
coordinate values X in the CAD model. The exact mapping between the three non-
collinear points is projected within volume V.
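The back-projection step can be sketched as follows. This minimal example assumes the common 3×4 projection-matrix convention P = K[R|t] (the paper writes the calibration matrix as 4×3, which may simply be the transposed convention), and the function name is hypothetical:

```python
import numpy as np

def back_project(P, u, v):
    """Back-project pixel (u, v) through a 3x4 projection matrix
    P = K[R|t] into a world-space ray. Returns the camera centre
    (the ray origin) and a unit direction vector; every 3-D point
    on that ray projects to the same pixel."""
    M, p4 = P[:, :3], P[:, 3]
    center = -np.linalg.solve(M, p4)                    # camera centre: -M^-1 p4
    direction = np.linalg.solve(M, np.array([u, v, 1.0]))
    return center, direction / np.linalg.norm(direction)
```

Intersecting such rays from several calibrated cameras is what lets the pose be solved in 3-D rather than per-image.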
Experimental results showed that the cameras could successfully extract the edges of
the model and transform them to the appropriate pose parameter set defined by the
researchers. Results from this experiment provided the basis upon which several
later experiments were conducted in regard to Kinect development.
To illustrate the pose estimation technique, the researchers conducted the experiment
in a controlled environment with a CAD model of a human head positioned at equal
intervals from cameras arranged in a semicircle. The edges of the model were found
using a zero-crossing edge detector, and the silhouettes were separated manually.
Transformation parameters were calculated for each rigidity constraint. The figures
obtained from the experiments are seen below.
Figure 45- Sengupta and Ohya (1996) images used for the experiment
Figure 46- Sengupta and Ohya (1996) extracted silhouette images
Figure 47- Sengupta and Ohya (1996) rendered images from the parameter set
Figure 48- Sengupta and Ohya (1996) rendered images of the transferred model
Through this experiment, the researchers presented a theoretical technique of pose
estimation. The designed algorithm could extract and estimate the edges of the silhou-
ettes through the use of a spatial extent function. To verify the pose parameters, each
image was projected onto the model images, which led to a better refinement of the
pose parameters. Finally, a stable value is obtained by repeating this process until a
reasonable pose is reached.
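The repeat-until-stable refinement loop can be sketched generically. This is a minimal greedy random-search sketch of my own (not the paper's actual optimizer): perturb the pose parameters, keep any candidate that lowers the reprojection error, and repeat until the pose is stable.

```python
import random

def refine_pose(pose, reprojection_error, step=0.1, iters=500):
    """Greedy random refinement: perturb the pose parameters and keep any
    candidate that lowers the reprojection error, repeating until stable."""
    best = list(pose)
    best_err = reprojection_error(best)
    for _ in range(iters):
        cand = [p + random.uniform(-step, step) for p in best]
        err = reprojection_error(cand)
        if err < best_err:
            best, best_err = cand, err
    return best, best_err
```

In practice the error function would measure the distance between the projected model silhouette and the extracted image silhouette; here any callable over the pose vector will do.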
4.4 Skeletal Tracking systems
Several studies have investigated how Kinect can be applied to skeletal tracking.
Tao et al. (2013) researched the kinematic validity of Microsoft Kinect in skeletal
tracking for application in virtual limb rehabilitation. The research investigated the
extent to which Kinect can be used to track hand position, limbs, ankles and body
trunk. For the experiment, cameras were positioned between 1.45 m and 1.75 m from
the user. The goal of the experiment was to determine the extent to which the Kinect
sensor can be used for limb rehabilitation through preset and repetitive tasks.
Additionally, the precision of the Kinect sensor was determined and analyzed.
For the experiment, the researchers used an Optotrak 3-D motion capture system
placed at different locations. The participating individuals then performed different
movements such as leaning backwards, elbow flexing and trunk leaning. All of this
was captured by a Kinect sensor placed at a height of 135 cm.
The results obtained from the experiment showed a simultaneous comparison between
the sensor result and the motion capture system, from which the mean squared
difference error was obtained. For hand movements, the constant error was 6.3 cm
and the variable error 2.4 cm; the constant error in all positions was found to be less
than 9.8 cm. For trunk movements, the mean error was 3.9 cm with a variable error
of 2.5 cm.
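The two error figures reported in such comparisons can be computed directly from paired samples. The sketch below is my own illustration (assuming "constant error" means the mean signed difference to the reference system and "variable error" its standard deviation, the usual definitions):

```python
from statistics import mean, stdev

def tracking_errors(kinect_cm, reference_cm):
    """Constant error = mean signed difference to the reference system;
    variable error = standard deviation of those differences;
    also returns the root-mean-square difference."""
    diffs = [k - r for k, r in zip(kinect_cm, reference_cm)]
    rms = mean(d * d for d in diffs) ** 0.5
    return mean(diffs), stdev(diffs), rms
```

Given per-frame positions from the Kinect and from the reference tracker, one call yields all three summary statistics.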
The figure below shows the constant camera error with respect to distance:
Figure 49- Tao et al (2013) constant camera error
Overall, the data obtained from the sensor closely matched that from the Optotrak
motion tracker, except for the elbow. The elbow tracking showed varying results
because of the limitations of the Kinect sensor in modeling the elbow.
The research concluded that for Kinect skeletal tracking the camera ought to be
placed within a 30 × 30 square, 1.45 m to 1.75 m from the user; the camera can also
be offset by up to 0.15 m to the left or to the right.
Figure 50- Tao et al (2013) variable camera error
In regard to geometric refinement and pose estimation, research was conducted by
Choe et al. (2014), whose aim was to improve the accuracy of the Kinect camera for
depth recognition and image reconstruction. The researchers used a 3-D mesh to
optimize the geometric refinement process. The approach the researchers took does
not require an additional Kinect camera or a complex setup. Effectively, the
researchers were able to utilize shading information to refine the geometry. They
used different lighting conditions to verify the invariability of Kinect IR images.
Figure 51- Choe et al (2014) invariability of IR images and RGB under different
lighting conditions
The data capturing process the researchers used consisted of discrete IR shading
image acquisition. Kinect Fusion was used to obtain the first mesh, as shown in
Figure 53 below. The Kinect SDK is used to register the depth map with a
reconstructed surface.
Figure 52- Choe et al (2014) Data capturing system, used to obtain the base
mesh
Figure 53- Choe et al (2014) input shading image, projected mesh and depth
map
The research demonstrated that the captured IR images do not overlap with the
visible spectrum. The researchers also described a method of radiometrically
calibrating the Kinect IR camera. The research assumed a Lambertian BRDF, which
made the results erratic in some cases.
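The Lambertian BRDF assumption mentioned above reduces to a simple shading model: the observed intensity is the albedo scaled by the cosine between the surface normal and the light direction, clamped at zero. A minimal sketch of that model (my own illustration, not the authors' code):

```python
def lambertian_shading(albedo, normal, light_dir):
    """Shading intensity under a Lambertian BRDF: I = albedo * max(0, n.l),
    with n and l normalized to unit length."""
    def unit(v):
        n = sum(c * c for c in v) ** 0.5
        return [c / n for c in v]
    n, l = unit(normal), unit(light_dir)
    return albedo * max(0.0, sum(a * b for a, b in zip(n, l)))
```

Real surfaces such as skin are not perfectly Lambertian, which is one way the assumption can make refinement results erratic.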
5 Findings and conclusion
5.1 Findings
Microsoft Kinect presents an important technology that can be used in a wide array
of applications, including assisting patients in assisted living environments. A great
deal of research and experimentation has been conducted to test the clinical and
technical capability of the Kinect in physical therapy and body part rehabilitation.
The documentary analysis conducted in chapter 4 covered some of the largest and
most important experiments conducted to assess the technical capabilities of Kinect
components such as the IR camera. The increase in research interest in the Microsoft
Kinect signifies its relevance in applications such as assisted living environments.
For ease of research, studies of the application of Kinect in assisted living
environments can be classified into:
1. Experiments that evaluated accuracy, reliability and precision of Microsoft
Kinect
2. Experiments that evaluated the application of Kinect in Clinical settings and
Smart Home environments
3. Experiments that investigated use of Kinect for Movement Detection Models
4. Experiments that investigated use of Kinect in Skeletal tracking systems
Normally, in assisted living environments assessment is done by people, who may be
doctors, nurses or hospital volunteers. The assessment therefore relies heavily on a
human touch, which means higher labor costs and low scalability. For instance, an
activity like therapy requires a specialised, trained physical therapist (PT) or
occupational therapist (OT). Because these kinds of clinical assessments are
performed by people, they are subject to errors and inaccuracies.
To solve this problem, researchers are testing motion sensors. Notably, motion
sensors have received significant interest over the past few years because of their
affordability and practicality. The technologies commonly used for motion sensing
are optoelectronic and non-optoelectronic sensors. While optoelectronic sensors use
markers, non-optoelectronic sensors do not. In instances where markers are used,
they are placed on the bodies of the individuals and are then tracked by a camera
sensor. Where markers are not used, the sensors apply inertial, mechanical and
magnetic techniques to track motion.
Our findings show that Kinect can be used in both optoelectronic and non-
optoelectronic experiments. For inertial systems, as seen in chapter 4, researchers
use sensor fusion algorithms and human skeletal algorithms. Magnetic systems, on
the other hand, use motion capture technologies to transmit and receive signals from
which the position, orientation and pose of the receiver can be derived. In the studied
experiments, the sensors provide six degrees of freedom (6 DoF) per receiver, which
enables 3-D positioning.
The findings also show that Kinect can be used in combination with wearable
technologies such as wearable sensors, smart suits and music gloves. Together with
Kinect sensors, these devices are able to follow the user's motion passively or
actively.
A review of vision-based motion trackers shows that they use either contrast-based
or depth-based imaging. Contrast-based systems work by tracking different colour
markers attached to the bodies or hands being tracked. Depth-sensing systems use
depth imagery segmentation and vision algorithms to detect and track human motion.
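At its simplest, the depth-imagery segmentation step is a per-pixel range test on the depth map. The sketch below is an illustrative minimal version (hypothetical function name; real systems add connected-component analysis and noise filtering on top):

```python
def segment_depth(depth_mm, near_mm=800, far_mm=2500):
    """Binary foreground mask: keep pixels whose depth (in mm) lies in
    [near_mm, far_mm]; a reading of 0 marks invalid depth."""
    return [[1 if (d != 0 and near_mm <= d <= far_mm) else 0 for d in row]
            for row in depth_mm]
```

The resulting mask isolates objects in the working volume, after which skeleton fitting or blob tracking can be applied.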
Importance of the Microsoft Kinect compared to previous cameras
Compared to other cameras, we found that Kinect has a lot of advantages and fea-
tures that make it ideal for motion tracking. For instance, Microsoft Kinect provides
a Software Development Kit that gives developers important access to body joint
positions.
Specifications of the Kinect that make it ideal for motion tracking include its RGB
camera, multi-array microphone, infrared projector and CMOS sensor. According to
the experiments analysed in chapter 4, the Kinect sensor can handle both depth and
infrared streams at 640×480 pixels, which can be increased when needed to
1280×1024. The stream supports 8-bit resolution and can accommodate the VGA or
UYVY colour format.
The sensor can be adjusted to near range or default range. At near range, people
within 0.4–3 m are visible, while in default range visibility is 0.8–2.5 m. The
microphone is capable of processing four channels of 16-bit audio at a rate of
16 kHz. The sensor can detect six people but is only capable of tracking two people
at a time.
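The two range modes above amount to a simple visibility test. A minimal sketch using the figures quoted in this section (the function name is mine, not from the SDK):

```python
NEAR_RANGE_M = (0.4, 3.0)     # near-range bounds quoted above, in metres
DEFAULT_RANGE_M = (0.8, 2.5)  # default-range bounds quoted above

def is_visible(distance_m, near_mode=False):
    """True when a subject at distance_m falls inside the active range."""
    lo, hi = NEAR_RANGE_M if near_mode else DEFAULT_RANGE_M
    return lo <= distance_m <= hi
```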
Reliability and Accuracy of Kinect
From the analysed papers, it is evident that many researchers have tried to ascertain
the reliability and accuracy of the Kinect sensor. Generally, most researchers agree
that Kinect is a good motion capture device, mainly because it is easily available and
affordable. However, researchers point out that the technology suffers from
occlusion. It has been observed that, at times, the Kinect sensor recognises chair legs
as if they were human legs. This means that for successful tracking, the problems
brought about by occlusion need to be effectively addressed.
It is important to note that accuracy tests of the Kinect show that its sensors are
accurate enough for use in smart living or assisted living environments. In a trial
testing Kinect's application in assisted living environments, Dutta (2014) compared
Kinect to Vicon for motion tracking. The results showed that Kinect was accurate
enough to be used for monitoring falls of the elderly. In separate research on
accuracy, Kurillo et al. (2013) found that Microsoft Kinect provided greater
reliability than a MoCap system. In terms of range-of-motion measurements, Kinect
proved to be the more accurate measure compared to MoCap, against a backdrop of
research in areas such as hip abduction, elbow flexion, knee flexion and shoulder
abduction. Other researchers, such as Hawi et al. (2014), showed that Kinect had
exemplary test-retest reliability but low accuracy compared to goniometers.
The most important finding from all the literature studied in chapter 4 was that
Kinect can be reliably used as a depth sensor. However, developers should factor in
occlusion issues and the noise usually experienced in skeletal tracking. Researchers
also agree that, to solve most of the challenges presented by Kinect, Kalman filters,
sensor fusion and calibration should be used.
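To make the Kalman filtering suggestion concrete, a scalar filter can smooth one noisy joint coordinate from the skeletal stream. This is a minimal sketch with assumed noise variances (a real deployment would tune q and r and typically filter full 3-D positions):

```python
class Kalman1D:
    """Scalar Kalman filter for smoothing one noisy joint coordinate."""
    def __init__(self, q=1e-3, r=1e-1):
        self.q = q        # process-noise variance (how fast the joint moves)
        self.r = r        # measurement-noise variance (sensor jitter)
        self.x = 0.0      # state estimate
        self.p = 1.0      # estimate variance

    def update(self, z):
        self.p += self.q                # predict: uncertainty grows
        k = self.p / (self.p + self.r)  # Kalman gain
        self.x += k * (z - self.x)      # correct toward the measurement
        self.p *= (1.0 - k)
        return self.x
```

Feeding each new joint reading through `update` yields a smoothed estimate whose variance shrinks as evidence accumulates.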
Findings on Application of Kinect to patients with Neurological Disorders
A key application highlighted in this research is the use of Kinect in assisted living
environments. Assisted living environments usually host patients with different
needs, such as those with chronic diseases that require specialised care. Researchers
such as Llorens et al. (2013) have pioneered research in this area with encouraging
findings. The researchers created a game that promotes rehabilitation activities in
patients with neurological disorders. Clinical tests of the game showed significant
improvements in the body balance and mobility of the patients.
Research conducted by Exell et al. (2014) showed that electrical stimulation could be
used to rehabilitate a patient's arm. In conclusion, the researchers held that Kinect is
accurate enough, although further research is needed.
5.2 Conclusions
This thesis reviewed different literature and experimental results on Microsoft Kinect
in the field of assisted living. First, similar experiments using other technologies that
aimed to provide solutions in assisted living environments were reviewed. The
limitations and errors presented by these systems were analyzed and discussed.
Previous systems used in motion tracking were not as effective as Kinect: they were
only able to track specific body parts such as the palm, hand and face, they were not
interactive, and they did not offer the ease of programming that the Kinect does
through its Software Development Kit (SDK), which provides programmable access
to skeletal tracking.
The arrival of the Kinect has ushered in a new age for motion sensors. Today, there
exists a large body of research on the application of Kinect in motion sensing and in
smart environments. Kinect has proven to be more accurate, precise and reliable than
RGB systems. However, Kinect is not without limitations and faults. Issues such as
occlusion and noise still exist and require improvement. In assisted living
environments, these problems can be significantly reduced by Kalman filtering,
calibration and sensor fusion.
Further, this research discussed the evaluation and performance of Kinect in assisted
living environments with patients requiring different levels of attention. Studies in
this area targeted different monitoring architectures and infrastructure, in both home
environments and specialized hospital monitoring environments. The experiments
utilized different body movements, games, cognitive therapy and exercises. Some
experiments resulted in successful assessment of falls, movements and even postures.
However, other studies lacked clinical evaluation of the results, which raises
questions about the effectiveness of those experiments.
In addition, this research compared Kinect with other sensor technologies, both as a
whole and component by component. Examples of the other analyzed devices
included the Leap Motion, Asus Xtion and Intel Creative cameras. Although the
different cameras were suited to particular narrow functions, Kinect proved to be the
better option for full-body tracking.
The rapid growth in the field of smart environments and assisted living, and the
continuous advancement in the field of artificial intelligence, have opened up many
options for further work on this study. A viable option is testing the Microsoft Kinect
sensor with Smart Environment for Assisted Living (SEAL) applications. In this
case, Kinect could be the eyes and brains of the SEAL app, helping to monitor the
patient in real time. It would also send real-time updates about the patient to the
SEAL app and activate the SEAL app alarm in cases where the patient seems to be in
danger. The alarm could call for help whenever the patient falls and is not able to get
up, and it could also send messages to the doctors or nurses when no activity
(movement, breathing, etc.) is recorded from the patient for a period of time. The
Kinect sensor could also be used for other machine learning related studies.
References
Hemant Ghayvat, Jie Liu, Subhas Chandra Mukhopadhyay, and Xiang Gui, "Well-
ness Sensor Networks: A Proposal and Implementation for Smart Home for Assisted
Living," IEEE Sensors Journal, vol. 15, no. 12, pp. 17341-17344, December 2015.
Lin Yang, Longyu Zhang, Haiwei Dong, Abdulhameed Alelaiwi, and Abdulmotaleb
El Saddik, "Evaluating and Improving the Depth Accuracy of Kinect for Windows
v2," IEEE Sensors Journal, vol. 15, no. 8, pp. 4275-4277, August 2015.
Marek R. Ogiela, Tomasz Hachaj and Katarzyna Koptyra “Effectiveness comparison
of Kinect and Kinect 2 for recognition of Oyama karate techniques” 18th Interna-
tional Conference on Network-Based Information Systems, 2015.
Teng Deng, Hui Li, Jianfei Cai, Tat-Jen Cham, and Henry Fuchs, "Kinect Shadow
Detection and Classification," 2013 IEEE International Conference on Computer
Vision Workshops.
Joshua Fabian, Tyler Young, and James C. Peyton Jones, "Integration of Microsoft
Kinect With Simulink: Real-Time Object Tracking Example," IEEE/ASME Transac-
tions on Mechatronics, vol. 19, no. 1, February 2014.
Zhang, Z., 2012. Microsoft kinect sensor and its effect. IEEE multimedia, 19(2),
pp.4-10.
Han, J., Shao, L., Xu, D. and Shotton, J., 2013. Enhanced computer vision with mi-
crosoft kinect sensor: A review. IEEE transactions on cybernetics, 43(5), pp.1318-
1334.
Lange, B., Chang, C.Y., Suma, E., Newman, B., Rizzo, A.S. and Bolas, M., 2011,
August. Development and evaluation of low cost game-based balance rehabilitation
tool using the Microsoft Kinect sensor. In Engineering in medicine and biology soci-
ety, EMBC, 2011 annual international conference of the IEEE (pp. 1831-1834).
IEEE.
Stowers, J., Hayes, M. and Bainbridge-Smith, A., 2011, April. Altitude control of a
quadrotor helicopter using depth map from Microsoft Kinect sensor. In Mechatronics
(ICM), 2011 IEEE International Conference on (pp. 358-362). IEEE.
Galna, B., Barry, G., Jackson, D., Mhiripiri, D., Olivier, P. and Rochester, L., 2014.
Accuracy of the Microsoft Kinect sensor for measuring movement in people with
Parkinson's disease. Gait & posture, 39(4), pp.1062-1068.
Villaroman, N., Rowe, D. and Swan, B., 2011, October. Teaching natural user inter-
action using OpenNI and the Microsoft Kinect sensor. In Proceedings of the 2011
conference on Information technology education (pp. 227-232). ACM.
Azzari, G., Goulden, M.L. and Rusu, R.B., 2013. Rapid characterization of vegeta-
tion structure with a microsoft kinect sensor. Sensors, 13(2), pp.2384-2398.
Chang, C.Y., Lange, B., Zhang, M., Koenig, S., Requejo, P., Somboon, N., Sawchuk,
A.A. and Rizzo, A.A., 2012, May. Towards pervasive physical rehabilitation using
Microsoft Kinect. In Pervasive Computing Technologies for Healthcare
(PervasiveHealth), 2012 6th International Conference on (pp. 159-162). IEEE.
Biswas, K.K. and Basu, S.K., 2011, December. Gesture recognition using microsoft
kinect®. In Automation, Robotics and Applications (ICARA), 2011 5th International
Conference on (pp. 100-103). IEEE.
El-laithy, R.A., Huang, J. and Yeh, M., 2012, April. Study on the use of Microsoft
Kinect for robotics applications. In Position Location and Navigation Symposium
(PLANS), 2012 IEEE/ION (pp. 1280-1288). IEEE.
Gonzalez-Jorge, H., Riveiro, B., Vazquez-Fernandez, E., Martínez-Sánchez, J. and
Arias, P., 2013. Metrological evaluation of microsoft kinect and asus xtion sen-
sors. Measurement, 46(6), pp.1800-1806.
Moazzam, I., Kamal, K., Mathavan, S., Usman, S. and Rahman, M., 2013, October.
Metrology and visualization of potholes using the microsoft kinect sensor.
In Intelligent Transportation Systems-(ITSC), 2013 16th International IEEE Confer-
ence on (pp. 1284-1291). IEEE.
Kawatsu, C., Li, J. and Chung, C.J., 2013. Development of a fall detection system
with Microsoft Kinect. In Robot Intelligence Technology and Applications 2012 (pp.
623-630). Springer Berlin Heidelberg.
Weerasinghe, I.T., Ruwanpura, J.Y., Boyd, J.E. and Habib, A.F., 2012. Application
of Microsoft Kinect sensor for tracking construction workers. In Construction Re-
search Congress 2012: Construction Challenges in a Flat World (pp. 858-867).
Araujo, R.M., Graña, G. and Andersson, V., 2013, March. Towards skeleton bio-
metric identification using the microsoft kinect sensor. In Proceedings of the 28th
Annual ACM Symposium on Applied Computing (pp. 21-26). ACM.
Bevilacqua, V., Nuzzolese, N., Barone, D., Pantaleo, M., Suma, M., D'Ambruoso,
D., ... & Stroppa, F. (2014, June). Fall detection in indoor environment with kinect
sensor. In Innovations in Intelligent Systems and Applications (INISTA) Proceedings,
2014 IEEE International Symposium on (pp. 319-324). IEEE.
Konstantinidis, E. I., Antoniou, P. E., Bamparopoulos, G., & Bamidis, P. D. (2015).
A lightweight framework for transparent cross platform communication of controller
data in ambient assisted living environments. Information Sciences, 300, 124-139.
Anastasiou, D. (2011, May). Gestures in assisted living environments.
In International Gesture Workshop (pp. 1-12). Springer, Berlin, Heidelberg.
Baeg, S. H., Park, J. H., Koh, J., Park, K. W., & Baeg, M. H. (2007, October). Build-
ing a smart home environment for service robots based on RFID and sensor net-
works. In Control, Automation and Systems, 2007. ICCAS'07. International Confer-
ence on (pp. 1078-1082). IEEE.
Hondori, H.M., Khademi, M., Dodakian, L., Cramer, S.C. and Lopes, C.V., 2013. A
spatial augmented reality rehab system for post-stroke hand rehabilitation.
In MMVR (pp. 279-285).
Mohamed, A.B.H., Val, T., Andrieux, L. and Kachouri, A., 2013, January. Assisting
people with disabilities through Kinect sensors into a smart house. In Computer Med-
ical Applications (ICCMA), 2013 International Conference on (pp. 1-5). IEEE.
Chin, L.C., Basah, S.N., Yaacob, S., Din, M.Y. and Juan, Y.E., 2015, March. Accu-
racy and reliability of optimum distance for high performance Kinect Sensor.
In Biomedical Engineering (ICoBE), 2015 2nd International Conference on (pp. 1-
7). IEEE.
Alexiadis, D.S., Chatzitofis, A., Zioulis, N., Zoidi, O., Louizis, G., Zarpalas, D. and
Daras, P., 2017. An Integrated Platform for Live 3D Human Reconstruction and Mo-
tion Capturing. IEEE Transactions on Circuits and Systems for Video Technolo-
gy, 27(4), pp.798-813.
Tahavori, F., Alnowami, M., Jones, J., Elangovan, P., Donovan, E. and Wells, K.,
2013, October. Assessment of Microsoft Kinect technology (Kinect for Xbox and
Kinect for Windows) for patient monitoring during external beam radiotherapy.
In Nuclear Science Symposium and Medical Imaging Conference (NSS/MIC), 2013
IEEE (pp. 1-5). IEEE.
Sengupta, K. and Ohya, J., 1996, November. Pose estimation of human body part
using multiple cameras. In Robot and Human Communication, 1996., 5th IEEE In-
ternational Workshop on (pp. 146-151). IEEE.
Tao, G., Archambault, P.S. and Levin, M.F., 2013, August. Evaluation of Kinect
skeletal tracking in a virtual reality rehabilitation system for upper limb hemiparesis.
In Virtual Rehabilitation (ICVR), 2013 International Conference on (pp. 164-165).
IEEE.
Choe, G., Park, J., Tai, Y.W. and So Kweon, I., 2014. Exploiting shading cues in
kinect ir images for geometry refinement. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition (pp. 3922-3929).