Arindam Batabyal: Literature Review Presentation

3-D Mapping Using Microsoft Kinect

Kinect is a motion-sensing input device by Microsoft, first released in 2010, for the Xbox 360 and Xbox One video game consoles and Windows PCs.

Built around a webcam-style add-on peripheral, it enables users to control and interact with their console or computer through a natural user interface using gestures and spoken commands, without the need for a game controller.

It has an infrared receiver (camera), an infrared transmitter (projector), an RGB camera and a microphone array for capturing real-time input from the environment. The Kinect's depth sensor produces depth images at 640x480 pixels and has a field of view of 58° horizontally and 45° vertically. The optimal operating range of the sensor is reported to be 0.8 to 3.5 m, which can be extended to roughly 0.6 to 6 m.
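The resolution and field-of-view figures above are enough to approximate the depth camera's focal lengths, because half the image width spans half the horizontal field of view: f_x = (W/2) / tan(θ_h/2). The sketch below assumes an ideal pinhole camera with the principal point at the image centre; the helper names are hypothetical, and a real Kinect still needs per-device calibration.

```python
import math

def focal_from_fov(pixels: int, fov_deg: float) -> float:
    """Pinhole focal length (in pixels) that makes `pixels` span `fov_deg`."""
    return (pixels / 2.0) / math.tan(math.radians(fov_deg) / 2.0)

def footprint(depth_m: float, fov_deg: float) -> float:
    """Width (in metres) of the viewing frustum at a given depth."""
    return 2.0 * depth_m * math.tan(math.radians(fov_deg) / 2.0)

fx = focal_from_fov(640, 58.0)  # roughly 577 px
fy = focal_from_fov(480, 45.0)  # roughly 579 px
print(f"fx = {fx:.1f} px, fy = {fy:.1f} px")
print(f"width covered at 3.5 m: {footprint(3.5, 58.0):.2f} m")
```

The footprint helper shows why the operating range matters for room mapping: at the 3.5 m end of the optimal range, one frame covers a strip of wall close to 3.9 m wide.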

This paper describes the use of Microsoft Kinect for building maps of rooms. These maps can be used for navigation and robot path finding. The Kinect system is arguably the most popular 3-D camera technology currently on the market.

Room mapping has become a popular research trend since the release of Microsoft Kinect.

The wide availability of affordable RGB-D cameras is causing a revolution in perception and changing the landscape of robotics and related fields.

The Kinect's depth sensor has become a popular replacement for expensive systems such as tilting laser range finders and stereoscopic camera rigs. In this paper, we explore some of the recent trends in mapping using the Kinect.

Flow Chart of the System

• Teleoperation means controlling robotic vehicles from a remote or distant location.

• It implies a client-server architecture in which one machine controls the vehicle and collects the telemetry, while another machine, used by the operator, visualizes the telemetry and sends commands to the vehicle.
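The client-server split described in the bullets can be sketched with two connected sockets standing in for the vehicle and the operator station. This is a minimal illustration, assuming a hypothetical newline-delimited JSON message format; the message fields (`type`, `x`, `y`, `obstacle_m`, `linear`, `angular`) and helper names are invented for the example and are not taken from the reviewed work.

```python
import json
import socket

def send_msg(sock: socket.socket, msg: dict) -> None:
    """Send one JSON message terminated by a newline (hypothetical wire format)."""
    sock.sendall((json.dumps(msg) + "\n").encode("utf-8"))

def recv_msg(reader) -> dict:
    """Read one newline-terminated JSON message from a file-like reader."""
    return json.loads(reader.readline())

# A connected socket pair stands in for the network link between the machines.
vehicle_sock, operator_sock = socket.socketpair()
vehicle_in = vehicle_sock.makefile("r", encoding="utf-8")
operator_in = operator_sock.makefile("r", encoding="utf-8")

# Vehicle side: collect telemetry and send it to the operator station.
send_msg(vehicle_sock, {"type": "telemetry", "x": 1.2, "y": 0.4, "obstacle_m": 0.9})

# Operator side: "visualize" (here, print) the telemetry and reply with a command.
telemetry = recv_msg(operator_in)
print("operator sees:", telemetry)
linear = 0.0 if telemetry["obstacle_m"] < 1.0 else 0.3  # stop near obstacles
send_msg(operator_sock, {"type": "command", "linear": linear, "angular": 0.5})

# Vehicle side: receive the command that it would then apply to its motors.
received = recv_msg(vehicle_in)
print("vehicle executes:", received)

for f in (vehicle_in, operator_in, vehicle_sock, operator_sock):
    f.close()
```

In a real deployment the two ends would run on separate machines over TCP or UDP; the socket pair only keeps the sketch self-contained.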

Region of Interest (ROI): The region of interest is a part of the image that is specifically chosen for subsequent image analysis operations by the robot. All pixels outside the ROI are discarded. For our particular robot, which navigates around its surroundings, the region of interest is the "available space" for movement.
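The discard step can be sketched on a depth image stored as a list of rows. This sketch assumes, purely for illustration, that the "available space" ROI is a fixed rectangle in the lower part of the image combined with the sensor's optimal 0.8 to 3.5 m depth range; the function name `apply_roi` and its parameters are hypothetical.

```python
def apply_roi(depth, top, bottom, left, right, min_m=0.8, max_m=3.5):
    """Keep only pixels inside rows [top, bottom) and columns [left, right)
    whose depth lies in [min_m, max_m]; every other pixel is discarded (0.0)."""
    out = []
    for r, row in enumerate(depth):
        out_row = []
        for c, z in enumerate(row):
            inside = top <= r < bottom and left <= c < right
            out_row.append(z if inside and min_m <= z <= max_m else 0.0)
        out.append(out_row)
    return out

# Tiny 4x4 depth image in metres; 0.0 means "no reading".
depth = [
    [2.0, 2.0, 2.0, 2.0],
    [1.5, 4.2, 1.5, 1.5],
    [1.0, 1.0, 0.5, 1.0],
    [0.9, 0.9, 0.9, 0.0],
]
roi = apply_roi(depth, top=2, bottom=4, left=0, right=4)  # lower half only
for row in roi:
    print(row)
```

The 0.5 m reading inside the rectangle is also dropped, because it falls below the assumed minimum range.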

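The Gaussian image in computer vision lets machines distinguish between primitive geometric objects. It maps each surface patch of an object according to its normal direction, weighted by its area; the complex variant also encodes the patch's position, so that area and position are stored together as a complex value.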
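A minimal sketch of the complex variant follows, assuming the common complex extended Gaussian image formulation in which a planar patch with unit normal n, area A and signed distance d = n · p from the origin contributes the weight A·e^{i d} at direction n. The patch representation and the helper names `cegi_weights` and `unit_cube` are hypothetical.

```python
import cmath

def cegi_weights(patches):
    """Complex EGI sketch: each planar patch (unit normal n, area A, point p on
    the patch) adds A * exp(i * d) at its normal direction, with d = n . p."""
    weights = {}
    for normal, area, point in patches:
        d = sum(n * q for n, q in zip(normal, point))
        weights[normal] = weights.get(normal, 0) + area * cmath.exp(1j * d)
    return weights

def unit_cube(offset=(0.0, 0.0, 0.0)):
    """Six faces of an axis-aligned unit cube centred at `offset`."""
    faces = []
    for axis in range(3):
        for sign in (1.0, -1.0):
            n = [0.0, 0.0, 0.0]
            n[axis] = sign
            p = list(offset)
            p[axis] += 0.5 * sign
            faces.append((tuple(n), 1.0, tuple(p)))
    return faces

centred = cegi_weights(unit_cube())
shifted = cegi_weights(unit_cube(offset=(0.3, 0.0, 0.0)))
for n in centred:
    # The magnitude (area) is unchanged by translation; only the phase moves.
    print(n, round(abs(centred[n]), 3), round(abs(shifted[n]), 3),
          round(cmath.phase(centred[n]), 3), round(cmath.phase(shifted[n]), 3))
```

The magnitudes alone form the ordinary (area-weighted) Gaussian image, which cannot tell the two cubes apart; the phases record where each face sits.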

Kalman Filter Framework: This algorithm estimates the state of a linear dynamic system from noisy measurements; variants such as the Extended Kalman Filter handle non-linear systems. It is widely used in robotics and computer vision because it is computationally efficient. The Kalman filter also provides a statistically robust framework for fusing different measurement modalities. The filter maintains an estimate of the uncertainty in the tracked parameters, which can be useful for evaluating tracking performance.
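The predict/update cycle and the tracked uncertainty can be seen in a scalar Kalman filter. This is a minimal sketch, assuming a state that is modelled as constant (for example, the range to a wall); the readings and the noise variances `q` and `r` are invented for illustration.

```python
def kalman_1d(measurements, x0, p0, q, r):
    """Scalar Kalman filter for a (nearly) constant state.
    x0, p0: initial estimate and its variance
    q: process-noise variance added at each prediction step
    r: measurement-noise variance"""
    x, p = x0, p0
    history = []
    for z in measurements:
        # Predict: the state is modelled as constant, so only uncertainty grows.
        p = p + q
        # Update: blend prediction and measurement using the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        history.append((x, p))
    return history

# Hypothetical noisy range readings (metres) of a wall about 2.0 m away.
readings = [2.1, 1.9, 2.05, 1.97, 2.02]
history = kalman_1d(readings, x0=0.0, p0=1.0, q=0.0, r=0.01)
for est, var in history:
    print(f"estimate = {est:.3f} m, variance = {var:.5f}")
```

The printed variance is the filter's own uncertainty estimate; watching it shrink (or grow, when `q` is large) is one way to judge tracking performance.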

Obstacle Detection & Avoidance System

In Simultaneous Localization and Mapping (SLAM), the robot starts at an unknown location. As it moves, it builds a consistent map of its surroundings while localizing itself within that map. Apart from the algorithms mentioned above, the Extended Kalman Filter, FastSLAM and occupancy grid maps are used. Most of these algorithms use odometry and proximity sensors to implement localization and mapping.
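Occupancy grid maps are commonly updated in log-odds form, so that repeated observations simply add up. The sketch below uses a single row of cells and one range beam; the hit/miss probabilities (0.7 and 0.4) and the helper names are hypothetical choices for illustration, not values from the reviewed work.

```python
import math

# Hypothetical sensor-model parameters, expressed as log-odds increments.
L_OCC = math.log(0.7 / 0.3)   # evidence added when a beam ends in a cell
L_FREE = math.log(0.4 / 0.6)  # evidence added when a beam passes through a cell

def integrate_beam(grid, hit_index):
    """Update a 1-D row of log-odds cells for one range reading: cells before
    the hit are observed free, the cell at the hit is observed occupied."""
    for i in range(hit_index):
        grid[i] += L_FREE
    grid[hit_index] += L_OCC

def probability(logodds):
    """Convert log-odds back to an occupancy probability."""
    return 1.0 - 1.0 / (1.0 + math.exp(logodds))

grid = [0.0] * 6            # log-odds 0 means p = 0.5 (unknown)
for _ in range(3):          # three readings, each ending in cell 4
    integrate_beam(grid, hit_index=4)
print([round(probability(l), 2) for l in grid])
```

Cells behind the obstacle (cell 5) are never observed and stay at 0.5, which is how the grid distinguishes "unknown" from "free".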

RGB-D SLAM

RGB-D SLAM is divided into three steps:

Combining the RGB and depth images: The RGB and depth images of the current frame must be combined to create a 3D point cloud relative to the Kinect camera. The depth data is obtained directly from the IR camera, and the IR and RGB images can be mapped to one another in the same manner as in a traditional stereo camera setup, provided the cameras are calibrated.
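The combination step can be sketched as back-projecting each depth pixel into 3D with the IR camera's pinhole model, moving the point into the RGB camera's frame with the calibrated rotation and translation, and projecting it into the colour image to pick up its colour. All calibration numbers below (both sets of intrinsics, the identity rotation and the 2.5 cm offset) are hypothetical placeholders, lens distortion is ignored, and the function names are invented for the example.

```python
# Hypothetical calibration values; a real setup uses per-device calibration.
FX_D, FY_D, CX_D, CY_D = 580.0, 580.0, 319.5, 239.5   # depth (IR) intrinsics
FX_C, FY_C, CX_C, CY_C = 525.0, 525.0, 319.5, 239.5   # RGB intrinsics
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # IR-to-RGB rotation
T = [-0.025, 0.0, 0.0]                                   # IR-to-RGB translation (m)

def depth_pixel_to_point(u, v, z):
    """Back-project depth pixel (u, v) with depth z (m) into the IR camera frame."""
    return ((u - CX_D) * z / FX_D, (v - CY_D) * z / FY_D, z)

def point_to_rgb_pixel(p):
    """Move a 3-D point into the RGB camera frame and project it to pixel coords."""
    x, y, z = (sum(R[i][j] * p[j] for j in range(3)) + T[i] for i in range(3))
    return (FX_C * x / z + CX_C, FY_C * y / z + CY_C)

def build_cloud(depth, rgb):
    """Pair each valid depth pixel's 3-D point with the colour it projects onto."""
    cloud = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0.0:
                continue  # 0 marks "no depth reading"
            p = depth_pixel_to_point(u, v, z)
            uc, vc = point_to_rgb_pixel(p)
            ui, vi = int(round(uc)), int(round(vc))
            if 0 <= vi < len(rgb) and 0 <= ui < len(rgb[0]):
                cloud.append((p, rgb[vi][ui]))
    return cloud

p = depth_pixel_to_point(320, 240, 2.0)
print("3-D point:", p, "-> RGB pixel:", point_to_rgb_pixel(p))
```

Because the two cameras sit apart, the same surface point lands on different pixel coordinates in the two images; the calibrated transform is what lines them up.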

RGB-D SLAM Algorithm

A top-down, orthographic view of a three-dimensional map generated from Kinect data by the SLAM library.

Feature Extraction and Matching: This step is a major part of the SLAM front-end, which is responsible for establishing spatial relations from the sensor data. It has two phases: extracting interest points, or features, from the RGB image of the current frame (converted to grayscale), and then matching or tracking those points back to the RGB image of the previous frame. An additional constraint is added to the feature matching algorithm: each matching pair of features must have a corresponding 3D point in its respective frame. When implemented on mobile hardware, this is the most computationally expensive step of the algorithm, so the speed of the system as a whole depends on the methods used for feature detection and matching.
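The matching phase and its extra depth constraint can be sketched with binary descriptors compared by Hamming distance and accepted only when they are mutual nearest neighbours. The sketch assumes each feature is a hypothetical `(descriptor, depth)` pair, with descriptors stored as small integers and a depth of 0 meaning no 3D point; real systems would obtain descriptors from a feature detector, and `max_dist` is an illustrative threshold.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")

def match_features(prev, curr, max_dist=10):
    """Mutual nearest-neighbour matching of (descriptor, depth) features.
    A pair is kept only if both features have a valid depth (> 0), i.e. a
    corresponding 3-D point exists in each frame."""
    def nearest(d, pool):
        return min(range(len(pool)), key=lambda j: hamming(d, pool[j][0]))

    matches = []
    for i, (desc_c, depth_c) in enumerate(curr):
        j = nearest(desc_c, prev)
        if nearest(prev[j][0], curr) != i:
            continue  # not mutual: reject ambiguous match
        if hamming(desc_c, prev[j][0]) > max_dist:
            continue  # descriptors too different
        if depth_c <= 0.0 or prev[j][1] <= 0.0:
            continue  # the extra constraint: both features need a 3-D point
        matches.append((i, j))
    return matches

prev = [(0b11110000, 1.2), (0b00001111, 0.0), (0b10101010, 2.0)]
curr = [(0b11110001, 1.3), (0b00001111, 1.1), (0b10101011, 2.1)]
print(match_features(prev, curr))
```

The second feature matches perfectly by appearance but is rejected because the previous frame has no depth for it, which is exactly the constraint described above.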

Graph Optimization and Map Building: The pairwise transformations between sensor poses, as computed by the front-end, form the edges of a pose graph. Because of estimation errors, these edges do not form a globally consistent trajectory. To obtain one, we optimize the pose graph using the g2o framework, an easily extensible graph optimizer that can be applied to a wide range of problems, including several variants of SLAM and bundle adjustment.
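The idea behind the optimization can be shown on a toy one-dimensional pose graph solved by linear least squares. This sketch does not use g2o (a C++ library); it is a stand-in that assumes unit weights, three odometry edges of 1.0 m and a hypothetical loop-closure edge saying the last pose is only 2.7 m from the first. The optimizer spreads the 0.3 m disagreement evenly over all four edges.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def optimize_pose_graph(num_poses, edges):
    """Least-squares 1-D pose graph: each edge (i, j, z) says x_j - x_i should
    be z. Pose 0 is anchored at 0 to remove the global gauge freedom."""
    H = [[0.0] * num_poses for _ in range(num_poses)]
    b = [0.0] * num_poses
    for i, j, z in edges:
        # Normal-equation contributions of the residual (x_j - x_i - z)^2.
        H[i][i] += 1.0
        H[j][j] += 1.0
        H[i][j] -= 1.0
        H[j][i] -= 1.0
        b[i] -= z
        b[j] += z
    H[0][0] += 1.0  # prior that anchors x_0 at 0
    return solve(H, b)

edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0),  # odometry
         (0, 3, 2.7)]                             # loop closure
poses = optimize_pose_graph(4, edges)
print([round(x, 3) for x in poses])
```

Real RGB-D SLAM edges are full 6-DoF transformations with information matrices, which is why a general non-linear optimizer such as g2o is used instead of a single linear solve.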

It computes a globally consistent trajectory.

Figure: error sources (vehicle pose error, sensor random error).

These errors are reduced using OctoMap. The OctoMap library implements a probabilistic 3D occupancy grid mapping approach, providing data structures and mapping algorithms in C++ that are particularly suited for robotics.
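An octree map stores occupancy in nested voxels, so each 3D point must first be turned into an integer voxel key and a path of child indices from the root. The sketch below is a hypothetical illustration of that addressing scheme, not OctoMap's API: the resolution of 0.5 m, the tree depth of 16 and the helper names `voxel_key` and `child_path` are assumptions chosen so the arithmetic is exact.

```python
import math

def voxel_key(point, resolution, depth=16):
    """Integer key of the leaf voxel containing `point`; the offset keeps keys
    non-negative so the map is centred on the origin."""
    offset = 1 << (depth - 1)
    return tuple(int(math.floor(c / resolution)) + offset for c in point)

def child_path(key, depth=16):
    """Child index (0-7) chosen at each level when descending from the root:
    bit `level` of the x, y and z keys selects the x/y/z half of the node."""
    path = []
    for level in range(depth - 1, -1, -1):
        kx, ky, kz = ((k >> level) & 1 for k in key)
        path.append(kx | (ky << 1) | (kz << 2))
    return path

key = voxel_key((0.75, -0.25, 2.0), resolution=0.5)
print(key)                 # → (32769, 32767, 32772)
print(child_path(key)[:3]) # → [5, 2, 2]
```

Nearby points that fall in the same voxel share a key, so repeated noisy measurements accumulate in one leaf instead of producing duplicate points.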

3-D Mapping: The three-dimensional mapping used by the researchers can be improved upon in several ways.

Latency Reduction: In teleoperation, there is some latency between commanding the vehicle and receiving updated telemetry.

Color Image Integration: Adding the data from the Kinect's color image would increase the photorealism of the teleoperation telemetry.

FUTURE WORK

Military Healthcare Engineering Design

In this presentation, SLAM for three-dimensional mapping has been explored in detail.

RGB-D SLAM extracts visual keypoints from the color images and uses the depth images to localize the keypoints in 3D.

In future work, the researchers plan to attempt shadow tracking, in which the Kinect will be mounted on CSUF's Unmanned Utility Robotic Ground Vehicle (UURGV) to follow a person around until that person stops.

A major advantage is that the Kinect sensor is inexpensive, costing about $150 at the time of writing, whereas other depth cameras and laser sensors can cost several thousand dollars.

I would like to thank Dr. Sudeep Pasricha for teaching us Embedded Design of Hardware/Software systems and making it a wonderful experience.