
Towards Robust Product Packing with a Minimalistic End-Effector

Rahul Shome*, Wei N. Tang*, Changkyu Song, Chaitanya Mitash, Hristiyan Kourtev, Jingjin Yu, Abdeslam Boularias, and Kostas E. Bekris

Abstract— Advances in sensor technologies, object detection algorithms, planning frameworks and hardware designs have motivated the deployment of robots in warehouse automation. A variety of such applications, like order fulfillment or packing tasks, require picking objects from unstructured piles and carefully arranging them in bins or containers. Desirable solutions need to be low-cost, easily deployable and controllable, making minimalistic hardware choices desirable. The challenge in designing an effective solution to this problem relates to appropriately integrating multiple components, so as to achieve a robust pipeline that minimizes failure conditions. The current work proposes a complete pipeline for solving such packing tasks, given access only to RGB-D data and a single robot arm with a vacuum-based end-effector, which is also used as a pushing finger. To achieve the desired level of robustness, three key manipulation primitives are identified, which take advantage of the environment and simple operations to successfully pack multiple cubic objects. The overall approach is demonstrated to be robust to execution and perception errors. The impact of each manipulation primitive is evaluated by considering different versions of the proposed pipeline, which incrementally introduce reasoning about object poses and corrective manipulation actions.

I. INTRODUCTION

The past decade has witnessed a vast growth of intelligent robotic solutions for logistics and warehouse automation tasks, with the Amazon Kiva mobile robotic fulfillment system serving as a prime example [1]. Nevertheless, the completion of many tasks still relies on the use of repetitive human labor, such as for picking and packing of products and building customer orders. In particular, tightly packing products that are picked from an unstructured pile, the focal task of this work, still remains primarily manual, even though it is integral to warehouse automation and manufacturing.

Packing objects to fit in confined spaces, such as a small shipping box as the one in Fig. 1, is highly challenging. It can be argued that it is more difficult than clearing clutter, since packing requires placing objects in close vicinity to each other, in an ordered manner, and well aligned with the boundary of the container. This demands high levels of accuracy from the perception component as well as the manipulation strategy. Indeed, surprisingly little prior work seems to exist that explicitly addresses this problem, let alone using a simple, suction-based end-effector.

The authors are affiliated with the Computer Science Dept. of Rutgers University, New Brunswick, NJ, USA. email: {jingjin.yu,abdeslam.boularias,kostas.bekris}@cs.rutgers.edu

*These authors have equally contributed to the work. The authors would like to acknowledge the support of NSF IIS:1617744 and the JD-X Research and Development Center (RDC). Any opinions expressed here or findings do not reflect those of the sponsor.

Fig. 1. The product packing problem for cuboid products: initial configuration (left), and achieved goal configuration (right).

To help narrow the application gap and enable the reliable, fully autonomous execution of such tasks, this system paper:

A. Proposes a complete hardware stack and an accompanying software pipeline for developing and testing algorithmic solutions for robot-enabled packing tasks. The hardware setup involves a single robotic arm as shown in Fig. 2, which depends only on depth-imaging technology to sense its environment. The result is a fully autonomous integrated system for picking objects from unstructured piles and placing them to satisfy a desirable packing arrangement, such as the one shown in Fig. 1.

B. Explores the use of a suction-based end-effector, which is also treated as a pushing finger, for product packing. The placement of objects, given the packing objective, requires the vacuum-based end-effector to pick objects from specific orientations, which may not be immediately accessible. Nevertheless, the paper demonstrates that the end-effector can still address such challenges in a reliable manner.

C. Develops and evaluates corrective prehensile and non-prehensile manipulation primitives that increase the pipeline's robustness despite uncertainty regarding the pose of the objects. The uncertainty arises from the end-effector's minimalistic nature and the use of only visual input. A critical aspect of the primitives is the intentional use of collisions, which exploit the inherent compliance of objects, the bins, and the end-effector for enhancing accuracy. Furthermore, the proposed primitives are tightly integrated with sensing to achieve in real-time:

i) toppling of objects in the initial unstructured bin to expose a desirable surface of the object for picking;

ii) pulling an object towards its target placement while pushing neighboring objects to further pack by operating directly over point cloud data; and

iii) real-time monitoring of potential failures and corrective pushing to achieve tight packing.

The evaluation uses the platform of Fig. 2. The experiments execute the pipeline in real-world scenes and show that the proposed manipulation primitives provide robustness despite the minimalism of the setup.

arXiv:1903.00984v1 [cs.RO] 3 Mar 2019

Fig. 2. Experiments are performed using a Kuka LBR iiwa arm equipped with a suction-based end-effector and depth-sensing cameras SR300. Two bins are within the reachability region of the arm given overhand picks.

II. RELATED WORK

Picking Objects in Clutter. Most efforts in robot picking relate to grasping with form- or force-closure using fingered hands [2]. While early works focused on standalone objects [3], recent efforts are oriented toward objects in clutter [4]–[6]. Either analytical or empirical [4], grasping techniques typically identify points on object surfaces and hand configurations that allow grasping. Analytical methods rely on mechanical and geometric object models to identify stable grasps. Empirical techniques rely on examples of successful or failed grasps to train a classifier to predict the success probabilities of grasps on novel objects [7]. Analytical methods can be applied in many setups but are prone to modeling errors. Empirical methods are model-free and efficient [8] but require large amounts of data. The heuristic picking strategy presented here inherits the properties of analytical methods while reducing computational burden.

Bin Packing. The 3D bin packing problem, which is NP-hard [9], [10], requires objects of different or similar volumes to be packed into a cubical bin. Most strategies search for ε-optimal solutions via greedy algorithms [11]. While bin packing is studied extensively, to the best of the authors' knowledge there are few attempts to deploy bin packing solutions on real robots, where inaccuracies in vision and control are taken into account. Such inaccuracies have been considered in the context of efforts relating to the Amazon Robotics Challenge [12]–[17], but most of these systems do not deal with bin packing. Most deployments of automatic packing use mechanical components, such as conveyor trays, that are specifically designed for certain products [18], rendering them difficult to customize and deploy. Industrial packing systems also assume that the objects are already mechanically sorted before packing. While this work makes the assumption that the objects have the same dimensions, the setup is more challenging as the objects are randomly thrown in the initial bin.

Non-prehensile Manipulation. Non-prehensile manipulation, such as pushing, has been shown to help grasping objects in clutter [19], [20]. In these works, pushing is used to reduce the uncertainty of a target object's pose. Through pushing, target objects move into graspable regions. This work follows the same principle and relies on pushing actions

to counter the effects of inaccurate localization and point cloud registration. The problem is different because pushing is used for placing instead of grasping objects. The proposed method also uses pushing actions for toppling picked objects to change their orientations as well as to re-arrange misaligned objects. Other efforts have also considered pushing as a primitive in rearrangement tasks [21], [22]. The proposed system takes advantage of the compliance properties of the end-effector and leverages collisions with the environment to accurately place objects or topple them. A closely related approach [23] performs within-hand manipulation of a grasped object by pushing it against its environment.

6D Pose Estimation. Recent efforts in 6D pose estimation use deep learning. In particular, regression over quaternions and centers can predict the rotations and centers of objects in images [24]. An alternative first predicts 3D object coordinates, followed by a RANSAC scheme to predict the object's pose [25]. Similarly, geometric consistency has been used to refine the predictions from learned models [26]. Other approaches use object segmentation to guide a global search process for estimating 6D poses [27]–[29]. This work is based on the authors' prior effort [27], where uncertainty over pixel labels returned by a convolutional neural network is taken into account when registering 3D models into the point cloud. The approach used here is different from the previous work as it can handle multiple instances of the same category of objects.

III. PROBLEM SETUP AND NOTATION

Consider a robotic arm in a workspace W with known static obstacles, two bins Binit and Bgoal, as well as n movable, uniform objects O = {o1, . . . , on} of known cubic dimensions. The bins Binit and Bgoal are static but also compliant bodies in known poses that define a cuboid volume in W where the objects can be placed.

A labeled arrangement A = {p1, . . . , pn} is an assignment of poses pi ∈ SE(3) to each object oi. Initially, the objects are in a start arrangement Ainit, where the objects are inside Binit in random but "stable" poses, i.e., the objects are stably resting and not moving. The Ainit arrangement is not known a priori to the robot.

The objective is to move O to an unlabeled arrangement Agoal = {p1, . . . , pn}, which achieves a tight packing in Bgoal. Agoal depends on the pose of Bgoal, its dimensions and the object dimensions. The target arrangement and the picking order are input for the proposed process. The target arrangement maximizes the number of objects inside Bgoal, while the objects rest on a stable face, and minimizes the convex hull of the volume the objects O occupy in W. This is typically a grid-like regular packing for cuboid objects. The unlabeled nature of Agoal means it is satisfied as long as one of the objects is placed at each target pose, i.e.,

∀pj ∈ Agoal : ∃ oi ∈ O so that D(pj , pi) < ε, (1)

where ε is a threshold for achieving the target poses; D(·, ·) is the distance between object poses, which should consider the 3-axis symmetry of the cubic objects. In particular, if two


Fig. 3. Left: Pipeline in terms of control, data flow (green lines) and failure handling (red lines). The blocks identify the modules of the system. Sensing receives an RGBD image of Binit and object CAD models to return a grasp point. Based on the picking surface, the object is either transferred to Bgoal or is handled by the Toppling module, which flips the object and places it back in Binit. When the object is transferred, a robust Placement module places the object at the target pose pi. The Packing module validates and corrects the placement to achieve tight packing. Right: a) Instance segmentation. b) Pose estimation and picking point selection are provided by sensing, c) Picking, d) Toppling, e) Placement and f) Packing.

poses result in an object occupying the same volume, then their distance is 0. For instance, rotation by π about the vertical axis for a stably resting cube on a flat surface results in distance 0. A popular distance metric for 6D poses is the ADI metric [30]. The evaluation section will describe the distance used in the experimental process for evaluating the error of the final arrangement given point cloud data.
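A symmetry-aware pose distance of this kind can be sketched as follows. This is an illustrative implementation, not the authors' code: the particular combination of translation error plus minimum rotation angle over the cube's 24 proper symmetries is an assumption, standing in for whatever metric the paper's evaluation uses.

```python
import numpy as np
from itertools import product

def cube_symmetry_rotations():
    """Enumerate the 24 rotation matrices of the cube's proper symmetry group
    (signed permutation matrices with determinant +1)."""
    rots = []
    for entries in product([-1, 0, 1], repeat=9):
        R = np.array(entries, dtype=float).reshape(3, 3)
        if np.allclose(R @ R.T, np.eye(3)) and np.isclose(np.linalg.det(R), 1.0):
            rots.append(R)
    return rots

def symmetric_pose_distance(p_i, p_j, sym_rots):
    """Distance between two SE(3) poses, each given as (R, t), modulo cube
    symmetry: translation error plus the smallest relative rotation angle
    over all symmetry rotations of the cube."""
    R_i, t_i = p_i
    R_j, t_j = p_j
    trans_err = np.linalg.norm(t_i - t_j)
    best_ang = np.inf
    for S in sym_rots:
        R_rel = R_i @ S @ R_j.T
        cos_ang = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
        best_ang = min(best_ang, np.arccos(cos_ang))
    return trans_err + best_ang
```

Under this metric, a cube rotated by π about its vertical axis has distance 0 from its unrotated pose, matching the example in the text.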

The arm has d joints that define the arm's configuration space Cfull ⊂ Rd, which has a collision-free subset Cfree. Valid arm motions correspond to a continuous C-space curve π : [0, 1] → Cfree. The arm has an end-effector, such as a suction cup, for which discrete operations {pick, release} give rise to discrete modes: M = {transfer, transit}. No within-hand manipulation operations are available. The state space of the arm is X = Cfree × M. Sensing is used to reason about the current object poses. Overall, the robot operations involve (i) moving the joints, (ii) picking or releasing objects and (iii) sensing.

The arm's forward kinematics define a mapping FK : Cfull → SE(3), which provides the pose g ∈ SE(3) of the end-effector given q ∈ Cfull. The reachable task space defines the end-effector poses the arm can reach without collisions: T = {FK(q) ∈ SE(3) : q ∈ Cfree}. For the arm to pick oi at pi, the arm's end-effector pose g ∈ T has to satisfy a binary output function: is_pick_feasible(oi, pi, g). For instance, the pose g of a suction cup must align it with at least one of the surfaces of an object oi at pi. Then, it is possible to define the set of end-effector poses that allow picking an object at a specific pose:

G(oi, pi) = {g ∈ T : is_pick_feasible(oi, pi, g) = true}.

Assume the sets G(oi, pi) are non-empty for all objects O and poses in Ainit or Agoal (and their vicinity inside bins Binit and Bgoal). Otherwise, the task is not feasible. Note that it may be necessary to reconfigure the objects inside the initial bin so as to pick them from the appropriate face before placing them. This is due to the lack of within-hand manipulation.

Given the above definitions, the task is to identify a sequence of motions for the arm as well as end-effector and sensing operations to transfer the objects O from the unknown initial stable arrangement Ainit inside Binit to a tight, grid-based packing inside Bgoal defined by an unlabeled arrangement Agoal, so as to satisfy Eq. 1.

IV. SYSTEM COMPONENTS

This section describes critical components that influence the design of the proposed pipeline:

a) Hardware Setup: The robot used in the setup is the Kuka IIWA14 7-DoF arm (Fig. 2). A customized end-effector solution extrudes a cylindrical end-effector that ends with a compliant suction cup, to engage vacuum grasps. Two RealSense SR300 cameras are mounted on a frame and pointed at containers Binit and Bgoal from the other side relative to the robot, as Fig. 2 shows. While far from the robot, the cameras' frame is statically attached to the robot's base such that calibration errors are minimized in estimating the camera poses in the robot's coordinate system.

b) Workspace Design: Fig. 2 shows the setup designed for the target task. The annular blue region represents the subset of the reachable workspace that allows for top-down picks with the robot's end-effector. This region is computed by extensively calling an inverse kinematics (IK) solver for top-down picks with the end-effector. The IK solutions indicate that the radial region between 40cm and 70cm from the robot center maximizes reachability and IK solution success given the setup. The bins (red rectangles) are placed so that they lie inside the optimal reachable region.

c) Software: MoveIt! [31] is used for motion planning. Most of the motions are performed using Cartesian Control, which guides the arm using end-effector waypoints. Ensuring the motions occur in reachable parts of the space increases the success of Cartesian Control, simplifies motion planning and speeds up execution. To decrease planning time, motion between the bins is precomputed using the RRT∗ algorithm [32] and simply replayed at appropriate times.


V. PROPOSED SOLUTION

The proposed pipeline, described in Fig. 3, provides the steps undertaken to perform the desired packing. The baseline steps correspond to:

a) (Sensing): sense and select a target object oi to pick up;
b) (Picking): execute action {pick};
c) (Transfer): move oi to the next available target pose pj and execute action {release}.

The experimental section considers this baseline pipeline, where it is observed that it performs poorly due to multiple sources of uncertainty, ranging from pose estimation and calibration errors, to object non-uniformity and object interaction with the bin and other objects, among others. These issues necessitate the introduction of remedies, which actively reduce the impact of the uncertainty. To this end, three manipulation primitives for a simplistic end-effector are designed to increase robustness and are integrated with the overall architecture:

i) Toppling;
ii) Robust Placement; and
iii) Corrective Packing.

A. Baseline: Pose Estimation and Picking

Given an RGB-D image of the source bin and a CAD model of the object, the objective is to retrieve (oi, pi) such that it maximizes the chance of achieving target configuration pj, where D(pi, pj) ≤ ε. To achieve this, the image is passed through a MaskRCNN convolutional neural network [33], which is trained to perform segmentation and retrieve the set of object instances O. An image segment is ignored if it has a number of pixels below a threshold. It is also ignored if MaskRCNN has small confidence that the segment corresponds to the target object. Among the remaining segments, instances oi ∈ O are arranged in descending order of the mean global Z-coordinate of all the RGBD pixels in the corresponding segment. Then, 6D pose estimation is performed for the selected instance [34], [35].

If, given the detected 6D pose of the instance, the top-facing surface does not allow the placement of the object via a top-down pick, the next segment instance is evaluated in order of the mean global Z-coordinate. If no object reveals a top-facing surface, then the first object in terms of the maximum mean global Z-coordinate is chosen for picking.

For the selected object, a picking point, i.e., a point where the suction cup will be attached to the object, is computed over the set of points registered against the object model. The computation utilizes a picking score associated in a pre-processing step with each model point, which indicates the stability of the pick on the object's surface. The score calculates the distance to the center of the object mesh. A continuous neighborhood of planar pickable points is required to make proper contact between the suction cup and the object surface. Thus, a local search is performed around the best pick-score point to maximize the pickable surface.

Fig. 4. Adaptive pushing: (left) The black border is the bin and the gray rectangle a previously placed object. The green rectangle represents the target pose for the current object. The light green boundary represents an ε-expanded model that intersects the point cloud at the purple points. These points result in the black vector that pushes the object away from them. (right) A screenshot of a scene's point cloud, where the white points are collision points with previously placed objects and the red volume shows the computed pre-push pose for the new object.

B. Toppling

The toppling primitive is invoked if there exists no object that exposes the desirable top-facing surface, or if the object was erroneously picked from the wrong face. The latter is detected after the pick by performing pose estimation once the object is attached to the suction cup. For instance, this can happen for the soaps shown in Figure 3 (right)(d), if the thin side is available for a pick but the soap needs to be placed on its wider side. In this case, a toppling primitive is used to reorient the object.

Given a starting pose of an object pstart and a toppling action of the arm, the object ends up at a new pose ptopple. The objective is to allow the existence of a pick g ∈ G(oi, ptopple) so that there is a transfer action from pick g that achieves the final placement pend close enough to the desired target placement pj, i.e., D(pend, pj) ≤ ε. For the considered setup, this means that the top-facing surface of the object at ptopple and pj is the same.

The toppling module inspects the visible point cloud in the source bin to identify the best toppling plane, which is sufficiently empty and flat. The accompanying implementation restricts actions to ones that change the pose of the cubic objects by shifting the most upward-facing surface to only one of the adjacent surfaces. While this does not guarantee toppling the object to all possible poses, the symmetry of cubic products resolves this issue.

Prior work [36] has shown the efficacy of minimal end-effectors used in tandem with the environment to achieve toppling. In the previous work, the friction against a conveyor belt is used to topple an object about a resting surface. The conveyor belt's motion is parallel to the initial resting surface plane. In the current setup, the compliance of the suction cup is used to emulate the same effect using a lateral motion on the same plane as the top surface, along the direction of the desired transformation between pstart and ptopple. Due to symmetry, at least one neighboring surface allows top-down picks, so a successful toppling action exists. Using the pose of the object, the lateral motion is executed once the object is placed on the detected plane, and the object is released during this action. Results show that this is highly effective in the target setup.


C. Point Cloud Driven Adaptive Pushing

Directly placing objects at the goal pose pi in bin Bgoal is prone to placement failures due to errors in perception as well as prior placements. This may result in damaging the objects. A safer alternative is to drop the object from a certain height, right above the goal pose, so as to avoid pressing against previously erroneously placed objects. Still, however, this alternative results in low-quality packing. A key realization is that during placement, the object being manipulated will inevitably approach or even collide with other objects or the target container. To sidestep undesirable collisions, an adaptive pushing primitive is developed, which directly operates over point cloud data for the target bin.

The process is shown in Fig. 4. The adaptive pushing begins by growing the object model at the target pose pi. Given the uncertainty value ε, the model is enlarged by 2ε along each dimension. The enlarged model is intersected with the point cloud to retrieve collision points. A collision vector is computed as a vector pointing from a collision point to the center of the object model. Summing all collision vectors yields the displacement vector. By iteratively moving the object model along the unit displacement vector, where the displacement vector is recomputed after each movement, a collision-free pre-push pose for the object is obtained. During the execution, the robot first moves the object above the pre-push pose, then lowers the object to the pre-push pose and finally pushes the object to the target pose.
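The iterative retreat to a collision-free pre-push pose can be sketched as follows. This is an illustrative reading of the described procedure, with simplifying assumptions: the model is treated as an axis-aligned box (so the ε-inflation becomes an inflation of the half-extents), and the step size and iteration cap are invented parameters.

```python
import numpy as np

def compute_pre_push_pose(model_points, target_center, cloud,
                          eps=0.01, step=0.005, max_iters=200):
    """Adaptive-pushing sketch: inflate the object model at the target pose,
    collect point cloud points inside the inflated model, sum the vectors from
    those collision points to the model center (the displacement vector), and
    step the model along that direction until no collisions remain."""
    center = np.asarray(target_center, dtype=float).copy()
    # half-extents of the axis-aligned model, grown by eps per side (2*eps per dim)
    half = (model_points.max(axis=0) - model_points.min(axis=0)) / 2.0 + eps
    for _ in range(max_iters):
        inside = np.all(np.abs(cloud - center) < half, axis=1)
        collisions = cloud[inside]
        if len(collisions) == 0:
            return center  # collision-free pre-push pose found
        disp = (center - collisions).sum(axis=0)  # sum of collision vectors
        center += step * disp / (np.linalg.norm(disp) + 1e-9)
    return center
```

The robot would then move above the returned pose, lower to it, and push toward the original target, as the text describes.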

D. Fine Correction using Push and Pull Primitives

The final primitive deals with the remaining failure cases. Fine corrections are required because objects can be placed in incorrect poses due to unexpected collisions as well as calibration and pose estimation errors. The proposed corrective manipulation procedure continuously monitors the scene and triggers corrective actions whenever necessary.

Fig. 5. Fine layout adjustment and correction.

The process first removes the background, the box, the robot's arm and end-effector from the observed point cloud and computes its surface normals. The observed point cloud is then compared against the desired alignment of the objects in their target poses. As shown in Fig. 5, two types of misalignment errors can occur. The first type occurs when the top surface normal of an object is not perpendicular to the support surface. This error is corrected by pushing the object along a direction and for a distance computed based on its surface normal, in a manner that makes it aligned with the support surface. The second type of error happens when a peripheral object is not entirely within the desired footprint of the pile. The proposed procedure systematically detects pivot points that are outside the desired footprint of the pile and pulls their objects inside accordingly. The correction is repeated until the point cloud is aligned with the desired goal poses given a threshold ε, or a timeout occurs.
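The detection of the two misalignment types can be sketched as follows. This is a hypothetical illustration, not the system's code: the tilt tolerance, the use of the mean top-surface normal, and the pull direction toward the footprint center are all assumptions made for the sketch.

```python
import numpy as np

def correction_actions(points, normals, footprint_min, footprint_max,
                       tilt_tol=0.1, up=np.array([0.0, 0.0, 1.0])):
    """Fine-correction sketch: flag the two misalignment types from a
    segmented object point cloud. (1) Tilt: the mean top-surface normal
    deviates from vertical, fixed by a push along the normal's horizontal
    component. (2) Overhang: points fall outside the desired XY footprint,
    fixed by a pull from the outlying pivot toward the footprint center."""
    actions = []
    n = normals.mean(axis=0)
    n = n / np.linalg.norm(n)
    if np.arccos(np.clip(n @ up, -1.0, 1.0)) > tilt_tol:
        push_dir = n - (n @ up) * up  # horizontal component of the tilt
        actions.append(("push", push_dir / np.linalg.norm(push_dir)))
    outside = np.any((points[:, :2] < footprint_min) |
                     (points[:, :2] > footprint_max), axis=1)
    if outside.any():
        pivot = points[outside].mean(axis=0)[:2]
        center = (footprint_min + footprint_max) / 2.0
        pull_dir = center - pivot
        actions.append(("pull", pull_dir / np.linalg.norm(pull_dir)))
    return actions
```

In the full pipeline, such actions would be re-evaluated after every correction until the residual error drops below ε or a timeout occurs.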

VI. EVALUATION

To evaluate the performance of the proposed pipeline, extensive experiments were executed on the Kuka platform of Fig. 2, to accurately represent the motivating complications that arise in real-world setups. The experiments are designed to showcase the hardness of tight packing as well as the benefits of adding robust environment-aware manipulation primitives that aid in increasing success rate and accuracy.

For consistency, an identical version of the problem is tested, with "dove soap bars" that are randomly thrown into the source bin Binit, which is placed on one side of the robot's reachable workspace (Fig. 2). Only top-down grasps are allowed within a given alignment threshold. The start arrangement Ainit of objects is intended to reflect a random pile, with 10 repetitions of each experimental condition. The target bin Bgoal contains a 3 × 3 grid arrangement of 9 objects, on the same plane, with the stable face of the object targeted for placement. The complete pipeline uses a) corrective actions for fine adjustments, b) push-to-place actions for robust placement, c) toppling actions for increasing successes, and d) pose estimation for adjusting the object. The improvements introduced by these strategies are evaluated through the following comparison points, within the context of the proposed pipeline:

V1 - Full pipeline: The complete pipeline with all the primitives achieves the highest accuracy and success rate.
V2 - No corrective actions: The experiment corresponds to the use of V1 without the fine correction module of Fig. 5.
V3 - No push-to-place actions: This version is V2 without the use of the robust placement module (Fig. 4) that performs push actions to achieve robust placement.
V4 - No toppling actions: These experiments used V2 without considering toppling actions to deal with objects not exposing a valid top surface that allows the target placement.
V5 - (Baseline) No push-to-place, toppling, or pose estimation: The naive baseline that solely uses a pose-unaware grasping module that reports locally graspable points and drops the grasped object at an end-effector pose raised from the center of the desired object position, with no adjustment in orientation.

The metrics evaluated include the fraction of object transfers that succeed in moving objects to the target bin. The accuracy is captured in the threshold mentioned in Eq. (1), expressed as a percentage of unoccupied volume within the ideal target placement volume. This was measured with a voxel discretization sufficient to elucidate the difference between the methods. The average data recorded is reported in Fig. 6. Lower values of this error measure correspond to higher accuracy. The points to note for every version are detailed as follows.
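The voxel-based error measure can be sketched as follows. This is an assumed reading of the metric, not the authors' measurement code: the voxel size and the rule that a voxel counts as occupied if any observed point falls inside it are illustrative choices.

```python
import numpy as np

def unoccupied_volume_error(cloud, target_min, target_max, voxel=0.005):
    """Packing-error sketch: voxelize the ideal target placement volume
    (an axis-aligned box from target_min to target_max) and report the
    percentage of voxels not covered by any observed point."""
    dims = np.ceil((target_max - target_min) / voxel).astype(int)
    occupied = np.zeros(dims, dtype=bool)
    inside = np.all((cloud >= target_min) & (cloud < target_max), axis=1)
    idx = ((cloud[inside] - target_min) / voxel).astype(int)
    occupied[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return 100.0 * (1.0 - occupied.mean())
```

A perfectly tight packing whose observed points cover every voxel scores 0%, while an empty target volume scores 100%, consistent with the reported best/worst error rates.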

The low error for V1 corroborates the final bin placement evidence. On average, 7 corrective actions per experiment were invoked to achieve the high degree of accuracy.

The accuracy improvement obtained from corrective actions is evaluated in V2. While this version succeeded in dropping all the objects close to the correct target poses,


Fig. 6. (left) The final set of object poses in the target bin at the end of every experiment. Each column represents a different version; the top row is the best case and the bottom row is the worst case. (middle) The blue bar represents the fraction of successful object transfers; the orange bar represents the percentage of unoccupied volume within the ideal target placement volume. (right) The blue bar represents the average number of correction actions per experiment; the green bar represents the average number of toppling actions per experiment.

application use-cases where a higher degree of accuracy is desired motivate the use of corrective actions. The integration of the corrective actions was done with a higher error threshold during intermediate steps, and a much finer one for the final adjustment. Errors can typically arise from execution failure and pose misalignments. The less accurate these underlying processes are, the more important corrective actions become.

V3 only performs adjustments using pose estimation and toppling. While this is sufficient to successfully transfer all the objects, any difference in accuracy relative to V2 would be introduced by the lack of push-to-place actions. Here there is complete reliance on the exactness of the execution and pose adjustments. Due to the proximity of adjacent object surfaces in the target grid arrangement, even minor errors get aggravated. However, due to the ability to reason about toppling, all the objects can be transferred to the target bin, even with this low accuracy. This is reflected in the absence of failures to transfer all the objects.

In V4, any object that does not expose a permitted picking surface that makes the prehensile placement possible is not picked. Any instance of the source bin which ends up with no such objects results in no valid picking actions that can make the approach proceed, and a failure is declared. The current behavior of V4 drops the object if it is mistakenly grasped from the wrong surface. This can itself be used as a naive toppling primitive. It is important to note that there might be other alternative strategies that can deal with this failure, but the intent of this comparison is to demonstrate the importance of having a deliberate toppling strategy in the pipeline that can change the object's orientation in the context of random starting arrangements of the objects. On average, over V1, V2, and V3, the toppling primitive was required 4 times per experiment. This highlights the necessity of this reasoning. Deliberate toppling, however, requires at least one additional pick action. The number of pick attempts per successful object transfer was 2.56 for V4, whereas in V2 the same was 1.98. This indicates that toppling is indeed necessary both in terms of success and efficiency of actions.

Expectedly, V5 has the lowest accuracy. However, since there is no reasoning about the pick surface, every object was transferred to the space of the bin. This has no guarantee to work if the object is larger. This drives the motivation for using a set of robust primitives for the packing problem.

Overall, the time for the experiments shows a trend of increasing with the complexity of the pipeline. The trade-off of accuracy versus time persists. On average, V1 ran for 945s while V5 ran for 323s.

Fig. 7. Results for toothpaste and two-layer packing.

Multi-layer packing: With a sufficiently tall bin, objects can be packed in multiple layers. Running the method beyond the first plane of the grid effectively shifts all the operations to the higher plane. This is demonstrated in a standalone run.

Different objects: To validate the applicability of the method to other cuboidal objects, V1 was performed for toothpastes. Over 5 experiments, with 4 objects, every run succeeded in placing the objects inside the bin.

VII. DISCUSSION

The proposed pipeline indicates intriguing nuances of the packing problem. The use of a minimal, suction-based end-effector is a cost-effective, simple, and relatively robust way to pick objects, but does not easily allow for complex grasp reasoning, regrasping, or within-hand manipulation. The proposed pipeline achieves high accuracy by leveraging the compliance of the suction cup and the environment while the object is attached. It uses robust reasoning to incrementally correct errors, instead of compounding them. While it can be argued that better baseline components can be developed to minimize uncertainty, the overall philosophy of robust, minimal, and compliant reasoning remains unchanged.

The proposed system and primitives can also deal with cubic objects of different sizes and can be extended to non-cubic objects by adapting the object models. The key adaptation corresponds to identifying an appropriate packing arrangement Agoal (potentially labeled in this case) in the target bin and the corresponding picking order from the initial bin. Future work will also explore speeding up performance and dealing with algorithmic challenges: the combinatorial reasoning over possible placements, physics-based reasoning to further improve pushing and toppling, as well as extending to more adaptive end-effectors. The platform can also be utilized as a training testbed for reinforcement learning to automatically discover robust primitives for solving packing tasks.


REFERENCES

[1] J. J. Enright and P. R. Wurman, "Optimization and coordinated autonomy in mobile fulfillment systems," in Proc. of the 9th AAAI Conference on Automated Action Planning for Autonomous Mobile Robots, 2011, pp. 33–38.

[2] A. Sahbani, S. El-Khoury, and P. Bidaud, "An overview of 3D object grasp synthesis algorithms," Robot. Auton. Syst., vol. 60, no. 3, pp. 326–336, Mar. 2012.

[3] K. Shimoga, "Robot grasp synthesis algorithms: A survey," The International Journal of Robotics Research, vol. 15, no. 3, pp. 230–266, 1996.

[4] J. Bohg, A. Morales, T. Asfour, and D. Kragic, "Data-driven grasp synthesis: A survey," Trans. Rob., vol. 30, no. 2, pp. 289–309, Apr. 2014.

[5] A. Boularias, J. A. D. Bagnell, and A. T. Stentz, "Efficient optimization for autonomous robotic manipulation of natural objects," in Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence (AAAI). AAAI, November 2014, pp. 2520–2526.

[6] A. Boularias, J. A. Bagnell, and A. Stentz, "Learning to manipulate unknown objects in clutter by reinforcement," in Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, ser. AAAI'15. AAAI Press, 2015, pp. 1336–1342. [Online]. Available: http://dl.acm.org/citation.cfm?id=2887007.2887192

[7] D. Morrison, J. Leitner, and P. Corke, "Closing the loop for robotic grasping: A real-time, generative grasp synthesis approach," in Proceedings of Robotics: Science and Systems, Pittsburgh, Pennsylvania, June 2018.

[8] J. Mahler and K. Goldberg, "Learning deep policies for robot bin picking by simulating robust grasping sequences," in Proceedings of the 1st Annual Conference on Robot Learning, ser. Proceedings of Machine Learning Research, S. Levine, V. Vanhoucke, and K. Goldberg, Eds., vol. 78. PMLR, 13–15 Nov 2017, pp. 515–524.

[9] J. O. Berkey and P. Y. Wang, "Two-dimensional finite bin-packing algorithms," Journal of the Operational Research Society, vol. 38, no. 5, pp. 423–429, May 1987. [Online]. Available: https://doi.org/10.1057/jors.1987.70

[10] S. Martello, D. Pisinger, and D. Vigo, "The three-dimensional bin packing problem," Oper. Res., vol. 48, no. 2, pp. 256–267, Mar. 2000. [Online]. Available: http://dx.doi.org/10.1287/opre.48.2.256.12386

[11] S. Albers and M. Mitzenmacher, "Average-case analyses of first fit and random fit bin packing," in Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, ser. SODA '98. Philadelphia, PA, USA: Society for Industrial and Applied Mathematics, 1998, pp. 290–299. [Online]. Available: http://dl.acm.org/citation.cfm?id=314613.314718

[12] N. Correll, K. E. Bekris, D. Berenson, O. Brock, A. Causo, K. Hauser, K. Okada, A. Rodriguez, J. M. Romano, and P. R. Wurman, "Analysis and observations from the first Amazon Picking Challenge," IEEE Transactions on Automation Science and Engineering, 2016.

[13] M. Schwarz, C. Lenz, G. M. García, S. Koo, A. S. Periyasamy, M. Schreiber, and S. Behnke, "Fast object learning and dual-arm coordination for cluttered stowing, picking, and packing," in 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018, pp. 3347–3354.

[14] D. Morrison, A. W. Tow, M. McTaggart, R. Smith, N. Kelly-Boxall, S. Wade-McCue, J. Erskine, R. Grinover, A. Gurman, T. Hunn, et al., "Cartman: The low-cost Cartesian manipulator that won the Amazon Robotics Challenge," in 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018, pp. 7757–7764.

[15] A. Zeng, S. Song, K.-T. Yu, E. Donlon, F. R. Hogan, M. Bauza, D. Ma, O. Taylor, M. Liu, E. Romo, et al., "Robotic pick-and-place of novel objects in clutter with multi-affordance grasping and cross-domain image matching," in 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018, pp. 1–8.

[16] Z. Littlefield, S. Zhu, C. Kourtev, Z. Psarakis, R. Shome, A. Kimmel, A. Dobson, A. Ferreira De Souza, and K. E. Bekris, "Evaluating end-effector modalities for warehouse picking: A vacuum gripper vs a 3-finger underactuated hand," in CASE, 2016.

[17] C. Rennie, R. Shome, K. E. Bekris, and F. A. De Souza, "A dataset for improved RGBD-based object detection and pose estimation for warehouse pick-and-place," IEEE Robotics and Automation Letters (RA-L), 2016.

[18] S. Hayashi, S. Yamamoto, S. Tsubota, Y. Ochiai, K. Kobayashi, J. Kamata, M. Kurita, H. Inazumi, and R. Peter, "Automation technologies for strawberry harvesting and packing operations in Japan," Journal of Berry Research, vol. 4, no. 1, pp. 19–27, 2014.

[19] M. Dogar and S. Srinivasa, "A framework for push-grasping in clutter," in Proceedings of Robotics: Science and Systems, Los Angeles, CA, USA, June 2011.

[20] M. Dogar and S. Srinivasa, "A planning framework for non-prehensile manipulation under clutter and uncertainty," Autonomous Robots, vol. 33, no. 3, pp. 217–236, Oct 2012.

[21] J. E. King, J. A. Haustein, S. S. Srinivasa, and T. Asfour, "Nonprehensile whole arm rearrangement planning on physics manifolds," in Robotics and Automation (ICRA), 2015 IEEE International Conference on. IEEE, 2015, pp. 2508–2515.

[22] A. Cosgun, T. Hermans, V. Emeli, and M. Stilman, "Push planning for object placement on cluttered table surfaces," in Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ International Conference on. IEEE, 2011, pp. 4627–4632.

[23] N. Chavan-Dafle and A. Rodriguez, "Prehensile pushing: In-hand manipulation with push-primitives," in Intelligent Robots and Systems (IROS), 2015 IEEE/RSJ International Conference on. IEEE, 2015, pp. 6215–6222.

[24] Y. Xiang, T. Schmidt, V. Narayanan, and D. Fox, "PoseCNN: A convolutional neural network for 6D object pose estimation in cluttered scenes," in Proceedings of Robotics: Science and Systems, Pittsburgh, Pennsylvania, June 2018.

[25] E. Brachmann, A. Krull, F. Michel, S. Gumhold, J. Shotton, and C. Rother, "Learning 6D object pose estimation using 3D object coordinates," in European Conference on Computer Vision. Springer, 2014, pp. 536–551.

[26] F. Michel, A. Kirillov, E. Brachmann, A. Krull, S. Gumhold, B. Savchynskyy, and C. Rother, "Global hypothesis generation for 6D object pose estimation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 462–471.

[27] C. Mitash, A. Boularias, and K. Bekris, "Robust 6D object pose estimation with stochastic congruent sets," arXiv preprint arXiv:1805.06324, 2018.

[28] V. Narayanan and M. Likhachev, "Discriminatively-guided deliberative perception for pose estimation of multiple 3D object instances," in Robotics: Science and Systems, 2016.

[29] C. Mitash, A. Boularias, and K. E. Bekris, "Improving 6D pose estimation of objects in clutter via physics-aware Monte Carlo tree search," arXiv preprint arXiv:1710.08577, 2017.

[30] S. Hinterstoisser, V. Lepetit, S. Ilic, S. Holzer, G. Bradski, K. Konolige, and N. Navab, "Model based training, detection and pose estimation of texture-less 3D objects in heavily cluttered scenes," in Asian Conference on Computer Vision. Springer, 2012, pp. 548–562.

[31] S. Chitta, I. Sucan, and S. Cousins, "MoveIt!" IEEE Robotics & Automation Magazine, vol. 19, no. 1, 2012.

[32] S. Karaman and E. Frazzoli, "Sampling-based algorithms for optimal motion planning," IJRR, vol. 30, no. 7, June 2011.

[33] K. He, G. Gkioxari, P. Dollár, and R. Girshick, "Mask R-CNN," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2961–2969.

[34] C. Mitash, K. E. Bekris, and A. Boularias, "A self-supervised learning system for object detection using physics simulation and multi-view pose estimation," in IROS, 2017.

[35] C. Mitash, A. Boularias, and K. E. Bekris, "Robust 6D object pose estimation with stochastic congruent sets," in British Machine Vision Conference, 2018.

[36] K. M. Lynch, "Toppling manipulation," in Proceedings 1999 IEEE International Conference on Robotics and Automation, vol. 4. IEEE, 1999, pp. 2551–2557.
