
From Sense to Print: Towards Automatic 3D Printing from 3D Sensing Devices


Nadia Figueroa∗, Haiwei Dong∗, Abdulmotaleb El Saddik∗†
∗Division of Engineering, New York University Abu Dhabi

P.O. Box 129188, Abu Dhabi, UAE

Email: {nadia.figueroa, haiwei.dong}@nyu.edu
†School of Electrical Engineering and Computer Science, University of Ottawa

800 King Edward, Ottawa, Ontario, Canada

Email: [email protected]

Abstract—In this paper, we introduce the From Sense to Print system, in which a 3D sensing device connected to the cloud is used to reconstruct an object or a human and to generate 3D CAD models that are sent automatically to a 3D printer. In other words, we generate ready-to-print 3D models of objects without manual intervention in the processing pipeline. Our proposed system is validated with an experimental prototype using the Kinect sensor as the 3D sensing device, the KinectFusion algorithm as our reconstruction algorithm, and a fused deposition modeling (FDM) 3D printer. In order for the pipeline to be automatic, we propose a semantic segmentation algorithm applied to the 3D reconstructed object, based on the tracked camera poses obtained from the reconstruction phase. The segmentation algorithm works both with inanimate objects lying on a table or floor and with humans. Furthermore, we automatically scale the model to fit the maximum build volume of the 3D printer at hand. Finally, we present initial results from our experimental prototype and discuss its current limitations.

I. INTRODUCTION

In the past few years, researchers have actively explored 3D vision technologies, even more so since the release of the Kinect sensor [1]. These technologies have enabled us to create 3D models of environments, objects and even humans. Having 3D models enables us to create 3D virtual worlds that not only resemble the real world but actually emulate it. We can also create 3D Computer-Aided-Design (CAD) models of real-world objects to analyze them or create replicas. In this paper, we are interested in 3D printing of objects and humans from 3D reconstructed models.

3D printers have been around for about 30 years; however, only recently have they become available to the general public. Moreover, companies like iMaterialise [2] or Shapeways [3] offer 3D printing services where you can simply upload your CAD model online, choose a material, and in a few weeks your 3D printed object is delivered to your address. This procedure is quite straightforward when you have a watertight (hole-free) polygon mesh of your CAD model, which is the case when an object is designed in modeling software. However, a 3D model created from a reconstructed mesh requires post-processing steps such as segmentation, triangulation and scaling of the object to the dimensions of the 3D printer.

Our goal is to introduce a system where a 3D sensing device connected to the cloud is used to reconstruct an object or a human and to generate 3D CAD models, which are then automatically sent to a 3D printer. Given the recent announcements of Samsung developing a depth CMOS sensor [4], Apple Inc. filing a patent for a 3D imaging camera for iOS devices [5], and Primesense [6] releasing a miniature version of their 3D sensing device suited for embedding in mobile devices and consumer robotics, we predict that smartphones and smart devices will carry these capabilities in the near future. Therefore, our effort is aimed at studying the workflow and the issues that could arise from such a process, with prototypes and experiments resembling these future technologies.

In the following section we provide an overview of the state of the art for the three main components of our system: (i) 3D sensing devices, (ii) 3D model reconstruction and (iii) 3D printers. In Section III we describe the proposed system architecture and present the implementation choices made to demonstrate the applicability of the From Sense to Print system. In Section III-E, we present results from experiments on objects and humans. Finally, we present our conclusions and future work in Section IV.

II. RELATED WORK

A. 3D Sensing Devices

The goal of 3D sensing devices is to generate 3D representations of the world from the viewpoint of a sensor. These are generally in the form of 3D point clouds. Each point p of a 3D point cloud has (x, y, z) coordinates relative to the fixed coordinate system at the origin of the sensor. Depending on the sensing device, the point p can additionally carry color information such as (r, g, b) values. 3D point clouds are generated from depth images or depth maps. In a depth image, each pixel has a depth value assigned to it; these depth values are the distances of surfaces in the world to the origin of the camera. The result is a 2.5D representation of the world, which can easily be converted to a 3D representation using the known geometry of the sensor. The most common approaches for the acquisition of depth images have been Time-of-Flight (TOF) systems and triangulation-based systems [7].
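As an illustration of this conversion, the following minimal numpy sketch back-projects a depth image into a 3D point cloud using the pinhole camera model; the intrinsic parameters (fx, fy, cx, cy) used below are placeholders, not the Kinect's calibrated values.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image (meters) into an Nx3 point cloud
    using the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack((x, y, z), axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop pixels with no depth reading

# Example with placeholder intrinsics (not an actual Kinect calibration).
depth = np.random.uniform(0.5, 4.0, size=(480, 640))
cloud = depth_to_point_cloud(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
```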

TOF systems emit signals (i.e. sound, light, laser) and measure the time it takes for the signal to bounce back from a surface. Sensors such as LIDAR (Light Detection and Ranging), radars, sonars, TOF cameras and Photonic-Mixing-Device (PMD) cameras fall into this category. Triangulation-based systems measure distances to surfaces by matching correspondences viewed by two sensors, which need to be calibrated to each other. Stereo and multi-camera stereo vision systems lie in this category. A new category of depth acquisition systems has grown popular since the launch of the Kinect sensor. It consists of an infrared laser projector combined with a monochrome CMOS sensor, which captures video data in 3D. The technology is called light coding [6] and is a variant of image-based 3D reconstruction. Since we want our system to be as reproducible as possible, we have opted to use the Kinect sensor as our 3D sensing device.

B. 3D Model Reconstruction

In this subsection, we address the problem of reconstructing 3D models from multiple views or point clouds obtained by the 3D sensing device. These multiple views can be obtained by moving the 3D sensing device around the object until a full set of 360◦ views has been acquired, or by spinning the object 360◦ around its axis while the 3D sensing device stays fixed at a certain viewpoint. Full 3D models of the objects can be reconstructed by registering these multiple views. Registration is the estimation of the rigid motion (translation and rotation) of a set of points, which can be image pixels or 3D points, with respect to another set of points. This rigid motion can be estimated using coarse or fine registration methods, or a combination of both. Coarse registration methods are RANSAC-based algorithms that use sparse feature matching, first introduced by Chen et al. [8] and Feldmar and Ayache [9]. These generally provide an initial guess to fine registration algorithms, which rely on minimizing point-to-point, point-to-plane or plane-to-plane correspondences. The common methods for solving this are (i) genetic algorithms and (ii) Iterative Closest Point (ICP) variants [10]–[12]. In some cases a coarse registration is not necessary: when the point clouds are already very close to each other or semi-aligned, a fine registration suffices.

For the problem of registering multiple point clouds, many approaches have been presented. An offline multiple-view registration method was introduced by Pulli [13]. This method computes pair-wise registrations as an initial step and uses their alignments as constraints for a global optimization step, which registers the complete set of point clouds simultaneously and diffuses the pair-wise registration errors. A similar approach was also presented by Nishino and Ikeuchi [14]. Chen and Medioni [11] developed a metaview approach to register and merge views incrementally. Masuda [15] introduced a method to bring pre-registered point clouds into fine alignment using signed distance functions. A simple pair-wise incremental registration would suffice to obtain a full model if the views contained no alignment errors; this becomes a challenging task when dealing with noisy datasets. Some approaches use an additional offline optimization step to compensate for the alignment errors in the set of rigid transformations [16].

All of the previously mentioned algorithms are targeted at raw or filtered data from the 3D sensing device (i.e. 3D points), and the resulting 3D models lack a tight surface representation of the object. Thus, in order to convert these 3D reconstructions into 3D CAD models, several post-processing steps need to be applied. Initially, the set of points needs to be transformed into a surface, which can be done by meshing algorithms; popular choices are greedy triangulation [17], marching cubes [18] and Poisson reconstruction [19]. Once the 3D model is in mesh form with no holes, it can be directly imported into any CAD software.

Recently, Newcombe et al. [20], [21] introduced their novel reconstruction system, KinectFusion, which fuses dense depth data streamed from a Kinect into a single global implicit surface model in real time. They use a volumetric representation called the truncated signed distance function (TSDF) and combine it with a fast Iterative Closest Point (ICP) algorithm. The TSDF representation is suitable for generating 3D CAD models: the surface is extracted beforehand (via the TSDF representation), and the classical ICP-based registration approach is then performed to generate the full reconstruction. A commercial application released soon after KinectFusion is ReconstructMe [22]. This software is based on the same principle of incrementally aligning a TSDF from Kinect data on a dedicated GPU. As this technology has demonstrated promising results, we use it as the main 3D model reconstruction algorithm in our system. Specifically, our implementation is based on an open-source implementation of KinectFusion found in the Point Cloud Library (PCL) [23].

C. 3D Printers

3D printing is an additive technology in which 3D objects are created by layering different materials, such as plastic, metal, etc. The first 3D printing technology, developed in the 1980s, was stereolithography (SLA) [24]. This technique uses an ultraviolet (UV) curable polymer resin and a UV laser to build the object layer by layer. Since then, other 3D printing technologies have been introduced. For example, the PolyJet technology works like an inkjet document printer, but instead of jetting drops of ink it jets layers of liquid photopolymer and cures them with UV light [25]. Another 3D printing technology is fused deposition modeling (FDM), based on material extrusion: a thermoplastic material is heated into a semi-liquid state and extruded from a computer-controlled print head [25]. This technology has become especially popular for commercial 3D printers.

III. FROM SENSE TO PRINT

In this section, we present our proposed architecture for an automatic 3D printing from 3D sensing system (Fig. 1).


Fig. 1: System architecture: From sense to print

The first component of the system is a 3D sensor, either embedded in a mobile device connected to the cloud or a standalone sensing device with networking capabilities. This sensing device streams 3D data to the cloud. The 3D data is then fed to a 3D reconstruction algorithm running on a dedicated 3D processing machine. Depending on the nature of the object and the scans, segmentation and scaling algorithms are applied accordingly. Once the 3D reconstruction is a watertight mesh, it is converted to a 3D-printing-compatible format (generally an .stl file). This system emulates the process of everyday document printing in an office environment, where documents are generated on local machines and the print job is sent to the closest available printer. Nevertheless, with 3D objects this is not as straightforward. In the following, the details of a prototype implementation of this system are described.

Next, we provide a detailed description of our prototype implementation of the From Sense to Print system. To test our proposed system architecture, we use the low-cost Kinect sensor together with a tablet for visualization purposes. For surface reconstruction, we use an open-source implementation of the KinectFusion algorithm, and for 3D printing we use an FDM-based desktop 3D printer, the Dimension 1200es from Stratasys [26].

A. 3D Sensing with Microsoft Kinect

We chose the Microsoft Kinect sensor as our 3D sensing device for two reasons. Firstly, it is a cheap and widely available 3D sensor, so anyone can reproduce our experiments. Secondly, the chosen reconstruction algorithm is tailored to Kinect data. The Kinect uses an infrared projector and a CMOS image sensor to reconstruct the 3D environment of the scene using light coding. The principle of light coding is to project a known pattern onto the scene; this pattern is then observed by the image sensor, and depth information is acquired by estimating the deformation of the pattern caused by the scene.
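As a simplified illustration of how an observed pattern shift yields depth, the sketch below applies the classical triangulation relation z = f·b/d between disparity and depth; the focal length, baseline and disparity values are made up and do not reflect the Kinect's actual calibration or its proprietary pattern decoding.

```python
import numpy as np

def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Classical stereo/structured-light triangulation: depth is inversely
    proportional to the observed shift (disparity) of the projected pattern."""
    disparity_px = np.asarray(disparity_px, dtype=float)
    depth = np.full_like(disparity_px, np.inf)
    valid = disparity_px > 0
    depth[valid] = focal_px * baseline_m / disparity_px[valid]
    return depth

# Made-up numbers, purely for illustration.
print(disparity_to_depth([20.0, 40.0, 80.0], focal_px=580.0, baseline_m=0.075))
```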

We stripped down a Kinect and mounted it on a touch-screen tablet (Samsung Galaxy Tab) (Fig. 2). The purpose of this Kinect-tablet assembly is to emulate future mobile devices with embedded 3D sensing capabilities. The tablet is used to visualize the data stream of the Kinect, in order to have a hands-on experience of a freely moving hand-held mobile 3D sensing device.

Fig. 2: 3D sensor with real-time visualization

B. 3D Model Reconstruction with KinectFusion

To reconstruct a 3D model from the 3D data obtained by a freely moving hand-held sensing device, we use the KinectFusion reconstruction algorithm introduced by Newcombe et al. [20], [21]. As explained previously, it is based on incrementally fusing consecutive frames of depth data into a 3D volumetric representation of an implicit surface, the truncated signed distance function (TSDF) [27]. The TSDF is stored in GPU memory as a voxelized 3D grid in which each voxel holds a truncated signed distance to the estimated surface. The global TSDF is updated every time a new depth image frame is acquired and the current camera pose with respect to the global model is estimated. Initially, the depth image from the Kinect sensor is smoothed with a bilateral filter [28], which up-samples the raw data and fills depth discontinuities. Then the camera pose of the current depth image frame is estimated with respect to the global model by applying a fast Iterative Closest Point (ICP) algorithm between the currently filtered depth image and a predicted surface model of the global TSDF extracted by ray casting. Once the camera pose is estimated, the current depth image is transformed into the coordinate system of the global TSDF, and the global TSDF is updated. In the following, we describe the camera pose estimation method and the global TSDF update procedure in detail.

1) Camera Pose Estimation: The principle of the ICP algorithm is to find a data association between a subset of the source points $P_s$ and a subset of the target points $P_t$ [10], [11]. Let us define a homogeneous transformation $T(\cdot)$ of a point $p_s \in P_s$ with respect to a point $p_t \in P_t$ as

$$p_t = T(p_s) = \begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix} p_s \qquad (1)$$

where $R$ is a rotation matrix and $t$ is a translation vector. Thus, ICP can be formulated as

$$T^* = \arg\min_T \sum_{p_s \in P_s} \left( T(p_s) - p_t \right)^2 \qquad (2)$$

$$\phantom{T^*} = \arg\min_T \sum \left\| T(P_s) - P_t \right\|^2 \qquad (3)$$

In our implementation, we use a special variant of the ICP algorithm, the point-to-plane ICP [12]. It minimizes the error along the surface normals $n_t$ of the target points, as in the following equation:

$$T^* = \arg\min_T \sum_{p_s \in P_s} \left\| n_t \cdot \left( T(p_s) - p_t \right) \right\|^2 \qquad (4)$$

where $n_t \cdot (T(p_s) - p_t)$ is the projection of $(T(p_s) - p_t)$ onto the subspace spanned by the surface normal $n_t$.
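To make Eq. (4) concrete, the following numpy sketch performs a single linearized (small-angle) Gauss-Newton step of point-to-plane ICP on already-associated correspondences; it is an illustration of the objective only, not the GPU implementation used by KinectFusion.

```python
import numpy as np

def point_to_plane_icp_step(src, tgt, tgt_normals):
    """One linearized Gauss-Newton step of point-to-plane ICP (Eq. 4),
    assuming src[i] already corresponds to tgt[i]. Returns a 4x4 transform."""
    # Residuals: signed distance of each source point to the target tangent plane.
    r = np.einsum('ij,ij->i', tgt_normals, src - tgt)
    # Jacobian w.r.t. x = [rx, ry, rz, tx, ty, tz] under a small-angle rotation.
    J = np.hstack((np.cross(src, tgt_normals), tgt_normals))
    x, *_ = np.linalg.lstsq(J, -r, rcond=None)
    rx, ry, rz, tx, ty, tz = x
    # Small-angle rotation approximation R = I + [w]_x.
    R = np.array([[1.0, -rz,  ry],
                  [ rz, 1.0, -rx],
                  [-ry,  rx, 1.0]])
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, (tx, ty, tz)
    return T

# Toy usage: recover a small known offset along the normals.
rng = np.random.default_rng(0)
tgt = rng.uniform(-1, 1, size=(200, 3))
normals = np.tile([0.0, 0.0, 1.0], (200, 1))
src = tgt + np.array([0.0, 0.0, 0.05])      # source shifted 5 cm along z
print(point_to_plane_icp_step(src, tgt, normals)[:3, 3])  # approx. [0, 0, -0.05]
```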

2) Global TSDF Updating: After computing the transformation $T$, the new depth image is transformed into the coordinate system of the global TSDF by $T(P_s)$. The global model is represented in a voxelized 3D grid and integrated using a simple weighted running average. For each voxel point $x$, we have signed distance values $d_1(x), d_2(x), \dots, d_n(x)$ from the $n$ depth images $d_i$ acquired in a short time interval. To fuse them, we define $n$ weights $w_1(x), w_2(x), \dots, w_n(x)$. Thus, the weighted correspondence matching can be written in the form

$$w_n^* = \arg\min_k \sum_{k=1}^{n-1} \left\| W_k D_k - D_n \right\|^2 \qquad (5)$$

where

$$D_{k+1} = \frac{W_k D_k + w_{k+1} d_{k+1}}{W_k + w_{k+1}} \qquad (6)$$

$$W_{k+1} = W_k + w_{k+1} \qquad (7)$$

$D_{k+1}$ is the cumulative TSDF and $W_{k+1}$ is the weight function after the integration of the current depth image frame. Furthermore, by truncating the update weights to a certain value $W_\alpha$, a moving average reconstruction is obtained.
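The running average of Eqs. (6)-(7) can be written compactly as a per-voxel update; the sketch below applies it to a whole TSDF volume with numpy, truncating the weights at a cap W_alpha as described above. The grid size, noise level and cap are illustrative values only.

```python
import numpy as np

def integrate_tsdf(D, W, d_new, w_new, w_alpha=50.0):
    """Weighted running-average TSDF update (Eqs. 6-7), applied per voxel.
    D, W: current TSDF values and weights; d_new, w_new: new frame's values."""
    D_next = (W * D + w_new * d_new) / (W + w_new)
    W_next = np.minimum(W + w_new, w_alpha)  # truncate weights -> moving average
    return D_next, W_next

# Illustrative 64^3 volume fused with a noisy new measurement.
D = np.zeros((64, 64, 64)); W = np.ones_like(D)
d_new = D + np.random.normal(0.0, 0.01, D.shape)
D, W = integrate_tsdf(D, W, d_new, np.ones_like(D))
```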

3) Meshing with Marching Cubes: The final global TSDF can be converted to a point cloud or a polygon mesh representation. The polygon mesh is extracted by applying the marching cubes algorithm to the voxelized grid representation of the 3D reconstruction [18]. The marching cubes algorithm extracts a polygon mesh by subdividing the point cloud, or set of 3D points, into small cubes (voxels) and marching through each of these cubes to place polygons that represent the isosurface of the points lying within the cube. This results in a smooth surface that approximates the isosurface of the voxelized grid representation.
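For reference, the zero level set of a TSDF volume can be meshed with an off-the-shelf marching cubes routine; the sketch below uses scikit-image on a synthetic sphere TSDF. This is only an illustration of the technique; our pipeline uses the PCL implementation instead.

```python
import numpy as np
from skimage import measure

# Synthetic TSDF: truncated signed distance to a sphere of radius 20 voxels.
grid = np.indices((64, 64, 64)).astype(float)
dist = np.sqrt(((grid - 32.0) ** 2).sum(axis=0))
tsdf = np.clip(dist - 20.0, -3.0, 3.0)

# Extract the zero isosurface as a triangle mesh (vertices + face indices).
verts, faces, normals, values = measure.marching_cubes(tsdf, level=0.0)
print(verts.shape, faces.shape)
```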

In Fig. 3, a successful reconstruction of a human head can be seen; we walked around the subject, closing a 360◦ loop. The resulting polygon mesh can be used as a CAD model to virtualize the scanned object or to replicate it with a 3D printer.

Fig. 3: 3D reconstructed human head model. (a) In point cloud format and (b) as a polygon mesh.

However, as can be seen in Fig. 3, this final 3D reconstructed model includes portions of the table or the environment that we do not want to print or virtualize. This holds for scans of people as well: when we scan humans and want to print a 3D statue or bust, we need to manually trim the 3D models. This is an obstacle to creating an automatic 3D sensing to printing system. Thus, in the next subsection we present our proposed approach for automatic 3D model post-processing, which generates a ready-to-print polygon mesh by applying semantic segmentation algorithms to the 3D point cloud of the reconstructed models.

C. Automatic 3D Model Processing for 3D Printing

This is one of the most important contributions of our paper, since until now there has not been a system that can generate ready-to-print 3D models of objects without the manual intervention of a human. Our method is based on two assumptions. The first is that the scanned object, be it an inanimate object or a human, is standing, sitting or lying on a plane. The second is that the 3D sensing device must close, or approximately close, a loop around the object; in other words, the person driving the 3D sensor must walk or rotate around the scanned object in a 360◦ fashion. The method can be used for inanimate objects lying on a table and for humans, with slight modifications between the two cases.


Fig. 4: Result of applying RANSAC-based planar model fitting to the scene. (a) Full reconstructed model with camera poses. (b) Top view and (c) side view of the segmented table-top plane (in green). Camera poses ci are represented by the multiple coordinate frames.

1) Segmentation for an Inanimate Object Lying on a Ground Plane: The first step in segmenting the reconstructed object (under the assumption that it lies on a table) is to find the table-top area where the object is located. We use a Random Sample Consensus (RANSAC)-based method to iteratively estimate the parameters of the mathematical model of a plane from the set of 3D points of the scene [29]. The plane is specified in the Hessian normal form:

$$ax + by + cz + d = 0 \qquad (8)$$

where $a, b, c$ are the normalized coefficients of the plane's normal along the $x, y, z$ axes and $d$ is the Hessian component of the plane's equation. The largest fitted plane is segmented from the point cloud; this plane represents the object-supporting surface (i.e. table or counter) of the scene. Now that the table-top plane has been identified, we are interested in extracting the set of points that lie on top of this plane and below the maximum z-value of the set of camera poses C with respect to the table plane (Fig. 4).
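A minimal RANSAC plane fit in the spirit of Eq. (8) is sketched below with numpy: it samples three points per iteration and keeps the plane with the most inliers. Our implementation relies on the PCL routines instead, and the iteration count and inlier threshold here are arbitrary choices.

```python
import numpy as np

def ransac_plane(points, n_iters=500, inlier_thresh=0.01):
    """Fit ax + by + cz + d = 0 (Eq. 8) to a point cloud with RANSAC."""
    rng = np.random.default_rng(0)
    best_inliers, best_model = None, None
    for _ in range(n_iters):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-9:                      # degenerate (collinear) sample
            continue
        normal /= norm
        d = -normal.dot(p0)
        dist = np.abs(points @ normal + d)   # point-to-plane distances
        inliers = dist < inlier_thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers, best_model = inliers, np.append(normal, d)
    return best_model, best_inliers          # (a, b, c, d), boolean inlier mask

# Example: noisy points on the z = 0.4 plane.
pts = np.column_stack((np.random.rand(500, 2),
                       0.4 + 0.002 * np.random.randn(500)))
model, mask = ransac_plane(pts)
```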

In order to extract the points "on top" of the table without removing any useful information, we transform the plane so that its surface normal directions ni are parallel to the z-axis, i.e. the plane becomes orthogonal to the z-axis and parallel to the x-y plane of the world coordinate system. The surface normals ni of every point pi ∈ Pplane of the plane have the same orientation throughout the whole surface.

Fig. 5: Global coordinate axis transformation to the table plane. (a) Inclined plane: plane inclination with respect to the global coordinate system. (b) Transformed plane with the normal direction parallel to the z-axis and the y-axis orthogonal to the z-axis.

To achieve the transformed plane shown in Fig. 5, we need to find the unique 3D rotation R that rotates the z_direction of the plane (i.e. the normal direction) into (0, 0, 1) (the z-axis) and makes the y_direction of the plane orthogonal to the z-axis. R is formulated as follows:

$$R = \begin{bmatrix} R_x, R_y, R_z \end{bmatrix}^T \qquad (9)$$

where R_x is the rotation around the x-axis, R_y around the y-axis and R_z around the z-axis. The z_direction of the plane is obtained from the normalized coefficients of the plane model in Eq. (8):

$$z_{direction} = \begin{bmatrix} a & b & c \end{bmatrix}^T \qquad (10)$$

The y_direction of the plane is formulated as follows:

$$y_{direction} = \begin{bmatrix} 0 & p_{max}(y) - p_{min}(y) & p_{min}(z) - p_{max}(z) \end{bmatrix}^T \qquad (11)$$

where p_min and p_max are the minimum and maximum values with respect to each axis. Once z_direction and y_direction are defined for the inclined plane, each individual rotation is estimated as follows:

$$R_x = \frac{y_{direction} \times z_{direction}}{\| y_{direction} \times z_{direction} \|}, \quad R_y = \frac{z_{direction} \times R_x}{\| z_{direction} \times R_x \|}, \quad R_z = \frac{z_{direction}}{\| z_{direction} \|} \qquad (12)$$

Finally, we construct a homogeneous transformation matrix Tp = [R, t], where t = [0, 0, 0]^T. The procedure to extract the object from the full 3D reconstruction is listed in Algorithm 1.
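The rotation of Eqs. (10)-(12) and the transform Tp can be assembled in a few lines; the sketch below builds them from the plane coefficients of Eq. (8) with numpy, taking the point extents needed for the y_direction from the plane inliers, as described above.

```python
import numpy as np

def plane_alignment_transform(a, b, c, plane_points):
    """Build T_p that rotates the fitted plane so its normal becomes the z-axis
    (Eqs. 9-12); the translation is zero as in the text."""
    z_dir = np.array([a, b, c], dtype=float)
    p_min, p_max = plane_points.min(axis=0), plane_points.max(axis=0)
    y_dir = np.array([0.0, p_max[1] - p_min[1], p_min[2] - p_max[2]])
    r_x = np.cross(y_dir, z_dir); r_x /= np.linalg.norm(r_x)
    r_y = np.cross(z_dir, r_x);   r_y /= np.linalg.norm(r_y)
    r_z = z_dir / np.linalg.norm(z_dir)
    T = np.eye(4)
    T[:3, :3] = np.vstack((r_x, r_y, r_z))   # R = [R_x, R_y, R_z]^T
    return T
# Applying T to homogeneous points maps the inclined plane onto the x-y plane.
```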


Algorithm 1 Object Extraction Procedure

Input: T (rigid transformation), Pplane (point cloud of the segmented plane), Pfull (full point cloud of the reconstruction), C (tracked camera poses from the reconstruction)
Output: Pobject (point cloud representing the object on top of the table)

  P*plane = T · Pplane
  P*full = T · Pfull
  C* = T · C
  3DPrism ← construct3DPrism(P*plane)
  P*object ← extractPointsWithinPrism(C*, P*full, 3DPrism)
  Pobject = T⁻¹ · P*object

Initially, Pplane, Pfull and C are transformed by T so that the plane is orthogonal to the z-axis. Then we create a 3D prism between the convex hull of Pplane and the convex hull of the loop generated from the camera poses C. As seen in Fig. 6, the face of the 3D prism is constructed from the convex hull of the loop of camera poses projected onto the plane. This shape is then extruded in the negative z direction until it crosses the table plane. The points within this 3D bounding prism are extracted and transformed back to the original world coordinate system, resulting in a point cloud containing only the object on the table top.
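One illustrative way to realize the prism test, sketched below, is to work in the plane-aligned frame: build the 2D convex hull of the projected camera loop with scipy, and keep points whose (x, y) fall inside that polygon and whose z lies between the table plane and the highest camera pose. This is an assumption-laden stand-in for the procedure of Algorithm 1, not our actual code.

```python
import numpy as np
from scipy.spatial import ConvexHull
from matplotlib.path import Path

def extract_within_prism(points_aligned, cam_positions_aligned, table_z):
    """Keep points inside the prism spanned by the camera-loop hull (in x-y)
    between the table plane and the highest camera position (in z)."""
    hull = ConvexHull(cam_positions_aligned[:, :2])           # 2D hull of the loop
    polygon = Path(cam_positions_aligned[hull.vertices, :2])  # hull as a polygon
    in_footprint = polygon.contains_points(points_aligned[:, :2])
    z = points_aligned[:, 2]
    in_height = (z > table_z) & (z < cam_positions_aligned[:, 2].max())
    return points_aligned[in_footprint & in_height]
```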

Fig. 6: Object extraction procedure. (a) Constructed 3D prism, which represents the scanned object. (b) Segmented 3D model (blue).

2) Segmentation for Humans: There are two ways of printing a 3D human: either a full-body statue or a bust. When printing a full-body statue, the previous algorithm holds, since the human is standing on the floor, which acts as the ground plane analogous to the table plane for an inanimate object. However, when creating a bust of a human, most of the segmentation is usually done manually. Thus, we propose an automatic segmentation for this 3D printing case as well. When scanning a human to print a bust, we generally concentrate on the face and head, whether the person is standing or sitting down, and there is no ground plane in either case. The only spatial knowledge we have about the scan is the camera poses and the fact that the human is sitting or standing upright. Therefore, in order to create the 3D prism for segmentation, as introduced in the last section, we create a virtual plane on top of the subject's head. The procedure is listed in Algorithm 2.

Algorithm 2 Segmentation for Human Bust

Input: Pfull (full point cloud of the reconstruction), C (tracked camera poses from the reconstruction)
Output: Phuman (point cloud representing the human bust)

  Ccentroid = compute3DCentroid(C)
  Headtop = nearestNeighbors(Ccentroid, k, Pfull)
  Pplane = fitPlane(Headtop)
  T = findPlaneTransform(Pplane)
  P*plane = T · Pplane
  P*full = T · Pfull
  C* = T · C
  3DPrism ← construct3DPrism(C*, P*plane, offsethead)
  P*human ← extractPointsWithinPrism(P*full, 3DPrism)
  Phuman = T⁻¹ · P*human

Fig. 7: Segmentation for a human bust. (a) Side view of the top head points (Headtop), the virtual plane (Pplane) and the camera pose centroid (Ccentroid). (b) Constructed 3D prism, which represents the human bust.

Fig. 8: Automatic scaling of the reconstructed model to the volume of a 3D printer. (a) Real spatial dimensions of the reconstructed model. (b) Spatial dimensions scaled to the printer size.

As can be seen, the first four lines of the algorithm are the only difference between this method and the segmentation for objects on a table (Algorithm 1). This is the part of the algorithm where we create the virtual plane used to construct the 3D prism for segmentation. Initially, we compute the 3D centroid of the camera poses. Then, we find the k nearest neighbors of this centroid within the human reconstruction; a reasonable value for k varies with the resolution of the reconstruction and the total number of points. These points (Headtop) represent the set of points on top of the head of the human subject that are closest to the centroid of the camera poses. Once Headtop is obtained, we fit a planar model as in Section III-C1 and estimate the plane transformation matrix T for Pplane. The resulting 3D prism and human bust segmentation can be seen in Fig. 7.
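The virtual head-top plane of Algorithm 2 can be prototyped with a k-d tree query, as sketched below: the k reconstruction points closest to the camera-pose centroid are selected and a least-squares plane is fitted to them. The value k = 500 is an arbitrary illustration, and the random data in the usage example merely stands in for a reconstruction and a camera loop.

```python
import numpy as np
from scipy.spatial import cKDTree

def head_top_plane(cloud, camera_positions, k=500):
    """Find the k reconstruction points nearest to the camera-pose centroid
    (the top of the head) and fit a least-squares plane to them."""
    centroid = camera_positions.mean(axis=0)
    _, idx = cKDTree(cloud).query(centroid, k=k)
    head_top = cloud[idx]
    # Plane through the points' mean; normal = direction of least variance.
    mean = head_top.mean(axis=0)
    _, _, vt = np.linalg.svd(head_top - mean)
    normal = vt[-1]
    a, b, c = normal
    d = -normal.dot(mean)
    return head_top, (a, b, c, d)

# Toy usage with random data.
cloud = np.random.rand(20000, 3)
cams = np.random.rand(36, 3) + np.array([0.0, 0.0, 1.0])
head_pts, plane = head_top_plane(cloud, cams)
```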

3) Scaling: In order to directly scale the world to the build volume of the 3D printer, we create a circle which represents the closed loop of measurements on the x-y plane, i.e. the approximate camera pose loop. The diameter of this approximate loop, d_loop, is obtained by computing the maximum extent of the camera poses along the axes:

$$d_{loop} = \max\left( \| x_{min} - x_{max} \|, \| y_{min} - y_{max} \| \right) \qquad (13)$$

Since the maximum possible diameter of the scaled world in the printer's volume is the length l_vol of the face of the model base, the scaling factor s_f is computed as follows:

$$s_f = l_{vol} / d_{loop} \qquad (14)$$

The final segmented reconstructed model is then scaled by s_f.
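Eqs. (13)-(14) amount to a few lines of numpy, sketched below; the printer base length l_vol = 0.254 m is an arbitrary example value, not the Dimension 1200es specification.

```python
import numpy as np

def scale_to_printer(model_points, camera_positions, l_vol=0.254):
    """Scale the segmented model so the camera-loop diameter (Eq. 13)
    fits within the printer base length (Eq. 14)."""
    extent = camera_positions.max(axis=0) - camera_positions.min(axis=0)
    d_loop = max(extent[0], extent[1])   # Eq. (13): largest x/y extent
    s_f = l_vol / d_loop                 # Eq. (14)
    return model_points * s_f
```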

D. 3D Printing

After applying segmentation and scaling to the 3D point cloud of the reconstructed model, we generate a polygon mesh using the marching cubes algorithm, as described in Section III-B3. We automatically send the resulting .stl file to the machine on the network that is connected to the 3D printer. This is the only step that currently needs human intervention: once the model has been imported into the printer's modeling software (Fig. 9), the layers as well as the necessary support material are computed, but even if we automate this procedure, a confirmation button has to be pushed before the ready part is sent to the printer and a new model base has to be loaded. If printer manufacturers provided the option of a remote interface, we could fully accomplish the automatic pipeline.
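For completeness, the marching cubes output can be written to the .stl format with the numpy-stl package, as sketched below; verts and faces are assumed to come from a marching cubes call such as the one shown in Section III-B3, and the choice of package is ours for illustration, not part of the described system.

```python
import numpy as np
from stl import mesh  # provided by the numpy-stl package

def save_stl(verts, faces, path="model.stl"):
    """Write a triangle mesh (vertices + face indices) to a binary STL file."""
    solid = mesh.Mesh(np.zeros(faces.shape[0], dtype=mesh.Mesh.dtype))
    for i, face in enumerate(faces):
        solid.vectors[i] = verts[face]   # 3x3 array: the triangle's corners
    solid.save(path)
```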

Fig. 9: Generated layers and support material for the final 3D model. (a) 3D part for printing of an inanimate human head model (object). (b) 3D part for printing of a human bust.

E. Experimental Validation

In this section, we present initial results towards automatic 3D printing from a 3D sensing device, based on the implementation of our proposed system. The 3D reconstruction algorithm ran on a laptop running Ubuntu 12.10 with an Intel Core i7-3720QM processor, 24 GB of RAM and an NVIDIA GeForce GTX 670M with 3 GB of GDDR5 VRAM. Currently, the reconstruction algorithm runs in near real time (i.e. 15 fps) and the processing for 3D printing takes approximately 60 seconds (due to the marching cubes algorithm running on the CPU). Thus, after recording the 3D data while closing a loop around the object or human, a ready-to-print 3D CAD model is generated in less than a minute. The 3D printing time depends on the complexity and size of the model, as well as on the printer itself (examples of successful prints can be seen in Fig. 10). For example, the head model took approximately 7 hours to print, whereas the human bust model took approximately 12 hours.

IV. CONCLUSION AND FUTURE WORK

In this paper, we proposed the From Sense to Print system, a system that can automatically generate ready-to-print 3D CAD models of objects or humans from 3D reconstructions using the low-cost Kinect sensor. Currently, the bottleneck of such an automatic pipeline is the manual segmentation or post-processing of the reconstructed objects. We proposed an automatic segmentation algorithm that can be applied to objects lying on a table or on the ground and to humans, and we used semantic information derived from the camera poses to automatically scale the reconstruction to the size of the 3D printer. One limitation of our present system is the lack of connectivity provided by 3D printer manufacturers; if we could send the model directly to an interface embedded in the printer, the process would be more straightforward. Furthermore, the 3D prints currently generated are solid models; in order to save material we could instead generate 3D shells. Thus, we are currently working on an automatic approach to generate 3D shells and to create replicas in more expensive materials such as copper or other metals.


Fig. 10: Flow of a 3D printing from 3D sensing trial with an object lying on a table (top row) and a human head model (bottom row). (a) Real object/human. (b) Reconstructed 3D model. (c) Segmented 3D model. (d) 3D printed model.

REFERENCES

[1] Microsoft, "Kinect," http://www.xbox.com/en-us/kinect, [Online; accessed April 13, 2013].
[2] Materialise, "iMaterialise," http://i.materialise.com, 2013, [Online; accessed April 13, 2013].
[3] "Shapeways," http://www.shapeways.com/, [Online; accessed April 13, 2013].
[4] "Samsung depth camera sensor," http://ow.ly/ghJ1A, [Online; accessed April 13, 2013].
[5] B. Bilbrey, M. F. Culbert, D. I. Simon, R. DeVau, M. Sarwar, and D. S. Gere, "Image capture using three-dimensional reconstruction," US Patent 20120075432, Mar. 29, 2012.
[6] Primesense, "Primesense 3D sensor," http://www.primesense.com/, [Online; accessed April 13, 2013].
[7] A. Nuchter, 3D Robotic Mapping: The Simultaneous Localization and Mapping Problem with Six Degrees of Freedom, 1st ed., ser. Springer Tracts in Advanced Robotics, vol. 52. Springer Publishing Company, Incorporated, 2009.
[8] C. S. Chen, Y. P. Hung, and J. B. Cheng, "A fast automatic method for registration of partially-overlapping range images," in Proceedings of the Sixth International Conference on Computer Vision, 1998, pp. 242–248.
[9] J. Feldmar and N. Ayache, "Rigid, affine and locally affine registration of free-form surfaces," International Journal of Computer Vision, vol. 18, no. 2, pp. 99–119, May 1994.
[10] P. Besl and H. McKay, "A method for registration of 3-D shapes," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 2, pp. 239–256, Feb. 1992.
[11] Y. Chen and G. Medioni, "Object modeling by registration of multiple range images," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 1991, pp. 2724–2729.
[12] Z. Zhang, "Iterative point matching for registration of free-form curves and surfaces," International Journal of Computer Vision, vol. 13, no. 2, pp. 119–152, October 1994.
[13] K. Pulli, "Multiview registration for large data sets," in Proceedings of the Second International Conference on 3-D Digital Imaging and Modeling, 1999, pp. 160–168.
[14] K. Nishino and K. Ikeuchi, "Robust simultaneous registration of multiple range images," in Proceedings of the Fifth Asian Conference on Computer Vision (ACCV 2002), 2002, pp. 454–461.
[15] T. Masuda, "Object shape modelling from multiple range images by matching signed distance fields," in Proceedings of the First International Symposium on 3D Data Processing Visualization and Transmission, 2002, pp. 439–448.
[16] T. Weise, T. Wismer, B. Leibe, and L. Van Gool, "In-hand scanning with online loop closure," in Proceedings of the IEEE 12th International Conference on Computer Vision Workshops, 2009, pp. 1630–1637.
[17] M. T. Dickerson, R. L. S. Drysdale, S. A. McElfresh, and E. Welzl, "Fast greedy triangulation algorithms," in Proceedings of the 10th Annual Symposium on Computational Geometry, 1994, pp. 211–220.
[18] W. E. Lorensen and H. E. Cline, "Marching cubes: A high resolution 3D surface construction algorithm," Computer Graphics, vol. 21, no. 4, pp. 163–169, 1987.
[19] M. Kazhdan, M. Bolitho, and H. Hoppe, "Poisson surface reconstruction," in Proceedings of the Fourth Eurographics Symposium on Geometry Processing (SGP '06), 2006, pp. 61–70.
[20] R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. J. Davison, P. Kohli, J. Shotton, S. Hodges, and A. W. Fitzgibbon, "KinectFusion: Real-time dense surface mapping and tracking," in Proceedings of the 10th IEEE International Symposium on Mixed and Augmented Reality (ISMAR), 2011, pp. 127–136.
[21] S. Izadi, D. Kim, O. Hilliges, D. Molyneaux, R. A. Newcombe, P. Kohli, J. Shotton, S. Hodges, D. Freeman, A. J. Davison, and A. W. Fitzgibbon, "KinectFusion: Real-time 3D reconstruction and interaction using a moving depth camera," in Proceedings of the 24th ACM Symposium on User Interface Software and Technology (UIST '11), 2011, pp. 559–568.
[22] PROFACTOR, "ReconstructMe," http://reconstructme.net/, [Online; accessed April 13, 2013].
[23] R. Rusu and S. Cousins, "3D is here: Point Cloud Library (PCL)," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2011, pp. 1–4.
[24] C. W. Hull, "Apparatus for production of three-dimensional objects by stereolithography," US Patent 4,575,330, Mar. 11, 1986.
[25] Stratasys, "3D printing technologies," http://www.stratasys.com/3d-printers/technology/, [Online; accessed April 13, 2013].
[26] ——, "Dimension 1200es," http://www.stratasys.com/3d-printers/design-series/performance/dimension-1200es, [Online; accessed April 13, 2013].
[27] B. Curless and M. Levoy, "A volumetric method for building complex models from range images," in Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '96), 1996, pp. 303–312.
[28] Q. X. Yang, "Recursive bilateral filtering," in Proceedings of the 12th European Conference on Computer Vision (ECCV), 2012, pp. 399–413.
[29] M. A. Fischler and R. C. Bolles, "Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography," Communications of the ACM, vol. 24, pp. 381–395, June 1981.
