XXIII Congreso Español de Informática Gráfica

CEIG 2013

Editors

Mª Carmen Juan and Diego Borro

ISBN: 978-84-695-8333-3


Conference Chair

Miguel A. Otaduy Universidad Rey Juan Carlos

Program Chairs

Diego Borro, Ceit y Tecnun (Universidad de Navarra)
M. Carmen Juan, Universidad Politecnica de Valencia

Local Committee

Miguel A. Otaduy, Universidad Rey Juan Carlos
Marcos Novalbos, Universidad Rey Juan Carlos
Jorge Gascon, Universidad Rey Juan Carlos

Program Committee

Francisco Abad, Universidad Politecnica de Valencia
Aiert Amundarain, Ceit - Centro de Estudios e Investigaciones Tecnicas
Carlos Andujar, Universitat Politecnica de Catalunya
Dolors Ayala Vallespi, Universitat Politecnica de Catalunya
Imma Boada, Universitat de Girona
Diego Borro, Ceit y Tecnun (Universidad de Navarra)
Carles Bosch, Universitat de Girona
Pere Brunet, Universitat Politecnica de Catalunya
Eva Cerezo, Universidad de Zaragoza
Antonio Chica Calaf, Universitat Politecnica de Catalunya
Miguel Chover, Universitat Jaume I
Francisco R. Feito, Universidad de Jaen
Julian Flores, Universidad de Santiago de Compostela
Alex Garcia-Alonso, Universidad del País Vasco
Marcos García Lorenzo, Universidad Rey Juan Carlos
Diego Gutierrez, Universidad de Zaragoza
Juan Jose Jimenez Delgado, Universidad de Jaen
M. Carmen Juan, Universidad Politecnica de Valencia
Domingo Martín, Universidad de Granada
Luis Matey Munoz, Ceit y Tecnun (Universidad de Navarra)
Fco. Javier Melero Rus, Universidad de Granada
Ramon Molla, Universidad Politecnica de Valencia
Adolfo Munoz Orbananos, Universidad de Zaragoza
Miguel A. Otaduy, Universidad Rey Juan Carlos
Gustavo Patow, Universitat de Girona
Francisco J. Perales, Universitat de les Illes Balears
Anna Puig Puig, Universitat de Barcelona
Immaculada Remolar, Universitat Jaume I
Mateu Sbert, Universitat de Girona
Rafael Jesus Segura, Universidad de Jaen
Francisco J. Seron, Universidad de Zaragoza
Antonio Susín, Universitat Politecnica de Catalunya
Juan Carlos Torres, Universidad de Granada
Pere-Pau Vazquez, Universitat Politecnica de Catalunya
Roberto Vivo, Universidad Politecnica de Valencia

Additional Reviewers

Iker Aguinaga, Ceit y Tecnun (Universidad de Navarra)
Anton Bardera, Universitat de Girona
Jose Iglesias, Universidad de Zaragoza
Juan Roberto Jimenez, Universidad de Jaen
Juan Antonio Magallon, Universidad de Zaragoza
Milan Magdics, Universitat de Girona
Jose María Noguera Rozua, Universidad de Jaen


Preface

This volume contains the proceedings of the XXIII Spanish Computer Graphics Conference, held in Madrid on September 17-20, 2013. The goal of this conference is to bring together the research results of the different Spanish groups on a wide range of topics, from animation to rendering, procedural modeling, medicine, and augmented reality. In addition to providing a place to communicate new results, CEIG fosters interactions and hopes to help define new productive directions for research and applications. We received 32 submissions this year. Each submission was reviewed by at least three members of the International Program Committee or assigned external reviewers. Based on the reviews, we accepted 17 as full papers, 3 as educational papers, 2 as posters (modality 1), and 7 as posters (modality 2); scientific interest and innovation were the only selection criteria. All the accepted papers are to be presented orally at the conference, grouped in 6 sessions. Posters will be presented in one of these sessions. Each poster presentation will be preceded by a FastForward session in which the authors briefly present their work. The regular papers cover a wide range of topics. We have placed them in 5 sessions:

• Geometry and Levels of Detail

• Measurement and Visualization

• Vision and Imaging

• Animating Objects and Characters

• Games and Education

CEIG 2013 will also enjoy invited talks by renowned international researchers. This year, we are pleased to announce two highly successful young researchers in Europe: Prof. Christian Theobalt from Saarland University and the Max Planck Institute, and Prof. Niloy Mitra from University College London. Incidentally, both are recipients of a 2013 ERC Starting Grant. Prof. Theobalt's research lies at the crossroads of computer graphics and computer vision, and in his talk he will discuss the capture, reconstruction, and modification of reality in motion. Prof. Mitra, also the recipient of the 2013 ACM SIGGRAPH Significant New Researcher award, is pushing the boundaries of geometric analysis of shapes, 3D modeling techniques, and computational design tools.

New this year to CEIG, the program also includes activities to strengthen the collaboration between academia and industry in Spain. The program features two roundtables. First, a selected group of entrepreneurs will share their experience in creating startup companies in the field of computer graphics, ranging from videogame studios to technology providers. They will debate, among other topics, the role played by technological innovation in the creation of their startups. Second, representatives from the major computer graphics companies in Spain will debate novel ways to collaborate with academia, including both research and educational aspects.

At the conference, the best papers will be selected, based on both the reviews and the presentations, and their authors will be invited to submit an extended version to the Computer Graphics Forum journal. All these papers will be reviewed again to ensure that they contain a sufficiently large amount of new material not covered in the CEIG version. We would like to thank everybody involved in organizing this conference, the authors of all submissions, the International Program Committee members, and the external reviewers. This year, a primary reviewer was assigned to each paper. This primary reviewer was in charge of checking whether the reviewers' suggestions were taken into account in the final version of the paper. It has been an honor to serve as General Chair and Program Chairs of CEIG 2013, and we hope we have met the high standards that the conference demands.

CEIG 2013 General Chair
Miguel A. Otaduy (Universidad Rey Juan Carlos)

CEIG 2013 Program Chairs
M. Carmen Juan (Universidad Politecnica de Valencia, Spain)
Diego Borro (CEIT y Tecnun-Universidad de Navarra, Spain)


Table of Contents

Papers 1: Geometry and Levels of Detail

FractalHull: An incremental convex hull algorithm for 2D points
Manuel García and Alejandro Leon

EBP-Octree: An optimized bounding volume hierarchy for massive polygonal models
Angel Aguilera García, Francisco Feito Higueruela and Francisco Javier Melero Rus

NavMeshes with exact clearance for different character sizes
Ramon Oliva and Nuria Pelechano

City-Level Level-of-Detail
Gonzalo Besuievsky and Gustavo A. Patow

Papers 2: Measurement and Visualization

Morpho-Volumetric measurement tools for abdominal distension diagnose
Eva Monclus, Imanol Munoz-Pandiella, Pere-Pau Vazquez, Isabel Navazo, Elisabeth Barba Orozco, Anna Accarino, Sergi Quiroga and Fernando Azpiroz

Extending neuron simulation visualizations with haptic feedback
Laura Raya, Pablo Aguilar, Marcos García and Juan B. Hernando

Human-like Recognition of Straight Lines in Sketched Strokes
Raquel Plumed, Pedro Company and Peter Varley

Measuring Surface Roughness on Cultural Heritage 3D models
Luis Lopez, Juan Carlos Torres and German Arroyo

Papers 3: Vision and Imaging

A Study of Octocopters for 3D Digitization from Photographs in Areas of Difficult Access
German Arroyo, Alejandro Rodríguez and Juan Carlos Torres

Optimized generation of stereoscopic CGI films by 3D image warping
Jose María Noguera, Antonio Rueda, Miguel A. Espada and Maximo Martín

A Client-Server Architecture for the Interactive Inspection of Segmented Volume Models
Jordi Surinyac and Pere Brunet

Rendering Relativistic Effects in Transient Imaging
Adrian Jarabo, Belen Masía, Andreas Velten, Christopher Barsi, Ramesh Raskar and Diego Gutierrez

Papers 4: Animating Objects and Characters

Anisotropic Strain Limiting
Fernando Hernandez, Gabriel Cirio, Alvaro G. Perez and Miguel A. Otaduy

Simulation of Hyperelastic Materials Using Energy Constraints
Jesus Perez, Alvaro G. Perez and Miguel A. Otaduy

An interactive graphical tool for dressing virtual bodies based on mass-spring model, Verlet integration and raycasting
Jose Ignacio Blanco Cruzado, Jose Pascual Molina Masso, Pascual Gonzalez, Jonatan Martínez Munoz and Arturo Simon García Jimenez

Dynamic Footstep Planning for Multiple Characters
Alejandro Beacco, Nuria Pelechano and Mubbasir Kapadia

Papers 5: Games and Education

An extensible framework for teaching Computer Graphics with Java and OpenGL
Carlos Javier Ogayar, Juan Jose Jimenez Delgado and Jose María Noguera

Nuevo Grado en Diseño y Desarrollo de Videojuegos
Inmaculada Remolar, Cristina Rebollo and Miguel Chover

Graphics Systems in a Software Engineering Curriculum
Fco. Javier Melero

A Computer-Based Learning Game for Studying History
Juan-Fernando Martín-Sanjose, M. Carmen Juan, Juan Cano and M. Gimenez

Posters

Towards a 3D Cadastre
María Dolores Robles Ortega, Francisco Feito and Lidia Ortega

Illumination of large urban scenes
María Dolores Robles Ortega, Juan Roberto Jimenez Perez and Lidia Ortega

*Cages: A MultiLevel, MultiCage Based System for Mesh Deformation
Francisco Gonzalez García, Teresa Paradinas, Narcís Coll and Gustavo A. Patow

Customizable LoD for Procedural Architecture
Gonzalo Besuievsky and Gustavo A. Patow

Modeling and Estimation of Internal Friction in Cloth
Eder Miguel, Rasmus Tamstorf, Derek Bradley, Sara C. Schvartzman, Bernhard Thomaszewski, Bernd Bickel, Wojciech Matusik, Steve Marschner and Miguel A. Otaduy

Rendering Light Transport in Transient-State
Julio Marco, Adrian Jarabo, Wojciech Jarosz and Diego Gutierrez

Depicting Stylized Materials with Vector Shade Trees
Jorge Lopez-Moreno

What is the best interface to edit a light field? An Evaluation of Interaction Paradigms for Light Field Editing
Adrian Jarabo, Belen Masía, Adrien Bousseau, Fabio Pellacini and Diego Gutierrez

Large-Scale Multilevel Fluid Simulation with Localized FLIP
Ivan Alduan Iniguez and Miguel A. Otaduy


Keynote: Prof. Dr. Christian Theobalt

Short bio

Christian Theobalt is a Professor of Computer Science and the head of the research group "Graphics, Vision, & Video" at the Max-Planck-Institut fuer Informatik, Saarbruecken, Germany. From 2007 until 2009 he was a Visiting Assistant Professor in the Department of Computer Science at Stanford University. He received his MSc degree in Artificial Intelligence from the University of Edinburgh, Scotland, and his Diplom (MS) degree in Computer Science from Saarland University, in 2000 and 2001 respectively. From 2001 to 2005 he was a researcher and PhD candidate in Hans-Peter Seidel's Computer Graphics Group at MPI Informatik. In 2005, he received his PhD (Dr.-Ing.) from Saarland University and MPI.

Most of his research deals with algorithmic problems that lie on the boundary between the fields of Computer Vision and Computer Graphics, such as dynamic 3D scene reconstruction and marker-less motion capture, computer animation, appearance and reflectance modeling, machine learning for graphics and vision, new sensors for 3D acquisition, advanced video processing, as well as image- and physically-based rendering.

For his work, he has received several awards, including the Otto Hahn Medal of the Max-Planck Society in 2007, the EUROGRAPHICS Young Researcher Award in 2009, and the German Pattern Recognition Award 2012. He is also a Principal Investigator and a member of the Steering Committee of the Intel Visual Computing Institute in Saarbruecken.


Keynote: Dr. Niloy Mitra

Short bio

Dr. Mitra is a Reader in Geometric Modeling and Computer Graphics in the Department of Computer Science, University College London (UCL). He was a Senior Lecturer at UCL from 2011 to 2012. Earlier, Dr. Mitra co-founded the Geometric Modeling and Scientific Visualization (GMSV) center at KAUST and was an Assistant Professor at the Indian Institute of Technology (IIT) Delhi. Before that, Dr. Mitra was a postdoctoral scholar with Prof. Helmut Pottmann at Technical University Vienna. He received his MS (2002) and PhD (Sept. 2006) in Electrical Engineering from Stanford University under the guidance of Prof. Leonidas Guibas and Prof. Marc Levoy (associate advisor). He received his BS (advisor Prof. Prabir Biswas) from the Indian Institute of Technology (IIT) Kharagpur.

Dr. Mitra's research primarily centers around algorithmic issues in shape understanding and geometry processing. He is equally interested in applying analysis findings (e.g., relations, constraints) to enable simple, smart, and captivating interaction possibilities, shape design, and design space exploration in general.

Dr. Mitra serves on the editorial boards of Transactions on Graphics (TOG), Computer Graphics Forum (CGF), Visual Computer, and Computers & Graphics. He was the program co-chair for the Symposium on Geometry Processing (SGP) 2012 and Shape Modeling International (SMI) 2011. He is also the recipient of the 2013 ACM SIGGRAPH Significant New Researcher award.

Session 1

Geometry and Levels of Detail

CEIG - Spanish Computer Graphics Conference (2013)
M. Carmen Juan and Diego Borro (Editors)

FractalHull: Incremental computation of the convex hull of a set of two-dimensional points

M. García Sánchez (1,2) and A. León (1,2)

(1) Grupo de Investigación en Informática Gráfica - Universidad de Granada
(2) Laboratorio de Realidad Virtual - Universidad de Granada

Abstract

This paper proposes a new way to compute the convex hull of a set of two-dimensional points without any preprocessing. The algorithm (FractalHull) is based on an incremental partition of space, which improves on the efficiency of previous proposals and provides additional functionality. The convex hull is computed in less time than with current algorithms. In addition, the new representation based on spatial indexing means that, when points are added and the hull must be recomputed, it is not necessary to consider all the points again; the computation relies only on the current hull.

In addition, as part of our improved solution, we have designed a generalization of the rebalancing of AVL trees (the trees chosen to represent the spatial indexing of the hull). This new functionality, not available so far for any binary search tree structure, lets us prune a complete branch of the tree while keeping the tree balanced and with logarithmic efficiency.

Categories and Subject Descriptors (according to ACM CCS): I.3.5 [Computer Graphics]: Computational Geometry and Object Modeling: Geometric algorithms, languages, and systems

1. Introduction

One of the main problems in computational geometry is computing the convex hull of a set of points. The convex hull is used as a building block by many algorithms in diverse application fields, from computer vision to shape analysis [Mar07], including GIS [RSV01], collision detection [MS97], and ray tracing [Fos07]. It is therefore very important to compute it efficiently.

Many solutions have been proposed to date for computing convex hulls of point sets, both in two-dimensional (2D) and in three-dimensional (3D) space. Our work focuses on computing the convex hull of a set of 2D points. The algorithms that solve this problem can be classified according to whether or not they first sort the points. We therefore distinguish between algorithms with preprocessing ([Gra72], [And79]) and algorithms without preprocessing ([Jar73], [PH77], [KS86], [BDH96], and [Cha96]). Under this classification, our proposal (FractalHull) belongs to the group of algorithms without preprocessing.

Among the algorithms with preprocessing, the works of Graham [Gra72] and Andrew [And79] stand out. Graham sorts the points by the angle they form with an arbitrarily chosen reference point. Andrew, in contrast, sorts along either of the two spatial dimensions, x or y. In both algorithms, the time spent computing the hull is much smaller than the time spent sorting the points. However, the time required by the sort is, in most cases, much greater than the total time taken by other proposals that require no preprocessing of the point set. Consequently, these alternatives are good whenever a previously sorted point set is available, but they are not worthwhile if the point set must be sorted first, in which case a method without preprocessing is preferable.
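To illustrate the sort-then-scan pattern described above, the following is a minimal Python sketch of the lower-hull half of a monotone-chain scan in the style of Andrew [And79]. It is not code from the paper: the names `cross` and `lower_hull_sorted` are hypothetical, and points are assumed to be `(x, y)` tuples.

```python
def cross(u, v, n):
    """Signed area test: > 0 when n lies to the left of the directed line u -> v."""
    return (v[0] - u[0]) * (n[1] - u[1]) - (v[1] - u[1]) * (n[0] - u[0])


def lower_hull_sorted(points):
    """Lower hull by sorting on x (the preprocessing step) and one linear scan."""
    pts = sorted(set(points))          # the sort dominates the running time
    hull = []
    for p in pts:
        # Pop the last vertex while it does not make a left (counter-clockwise) turn.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull
```

Once the points are sorted, the scan touches each point a constant number of times, which is why the sort is the expensive part of this family of algorithms.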

Algorithms without preprocessing are generally iterative. Barber's algorithm [BDH96] (commonly known as QuickHull) and Kirkpatrick's [KS86] compute the hull in several passes. Each pass detects and adds a point or a segment that belongs to the hull, and discards a set of points as interior to the hull. Preparata [PH77], however, proposes a pure divide-and-conquer algorithm, based on computing partial hulls and then merging them. The algorithm presented by Chan [Cha96] follows the same line. In this case two algorithms are combined: one with preprocessing that computes partial hulls, and another without preprocessing that computes the hull of the previously computed hulls.

© The Eurographics Association 2013.

Some of the solutions mentioned above, with or without preprocessing, express the resulting convex hull as two partial hulls: the lower hull and the upper hull ([And79], [KS86], [BDH96]). We call lower hull the part of the hull that joins the point of smallest abscissa to the point of largest abscissa counter-clockwise. Analogously, the upper hull is the part of the hull that joins the point of smallest abscissa to the point of largest abscissa clockwise. FractalHull shares this approach and returns, as its final result, two structures that represent both sub-hulls.

As for the representation, all the algorithms mentioned above store the hull as an ordered sequence of points, or at most two sequences (the algorithms that compute the upper and lower hulls). FractalHull stores the sub-hulls in two binary search trees (one for the upper hull and one for the lower hull). The ordering of the points is preserved and can be obtained by an in-order traversal of the tree. This additional indexing speeds up the computation of the solution and also provides a feature that none of the previously cited algorithms offers: incrementality. FractalHull is incremental in the sense that it does not need to consider the complete point set when new points are added and the convex hull has to be recomputed.

The main contributions of our work are:

• An algorithm without preprocessing for computing the convex hull of a set of 2D points that significantly improves on the running times of previous proposals.

• A new representation of the lower and upper sub-hulls by means of AVL trees, which makes our solution incremental in the sense indicated above.

The rest of this paper is structured as follows. The first section explains the idea behind FractalHull and how it behaves, including details of the data structure, how the spatial partitions are made, and how the hull is updated. The next section focuses on AVL trees and the new functionality designed for this purpose. The paper continues with the experiments carried out and ends with brief conclusions and future work.

2. The method

Our algorithm splits the problem of computing the convex hull of a set of 2D points into two subproblems: computing the upper hull and computing the lower hull. This paper explains how the lower hull is computed; the computation of the upper hull is equivalent.

The fundamental idea behind the computation is as follows: given a hull (the current hull), a point of the set that does not belong to that hull is chosen and tested. If the point is enclosed by the current hull, it is discarded; otherwise, the hull is updated to include it. This process is repeated until all the points of the set have been checked.

Algorithm 1 Basis of FractalHull

Input: Set of points C = {p1, ..., pk} with k ≥ 3
Output: E = hull of C indexed in an AVL tree

    define xmin = point of smallest abscissa
    define xmax = point of largest abscissa
    initialize E with xmin and xmax
    for all n ∈ C do
        define P = fragment of E that must be updated
        if Size(P) > 0 then
            Update P
        end if
    end for
    return E
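The incremental loop of Algorithm 1 can be sketched in Python under several loudly stated simplifications, none of which come from the paper: the hull is kept in a plain sorted list rather than an AVL tree, the inclusion test uses `bisect_left` to find the edge below the point rather than the spatial partition of Section 2.1, and the update walks the neighbours one at a time rather than pruning a tree branch. All abscissas are assumed to be distinct, and `lower_hull_incremental` is a hypothetical name.

```python
from bisect import bisect_left


def cross(u, v, n):
    """Signed area test: > 0 when n lies to the left of the directed line u -> v."""
    return (v[0] - u[0]) * (n[1] - u[1]) - (v[1] - u[1]) * (n[0] - u[0])


def lower_hull_incremental(points):
    """Lower hull built point by point, with no prior sort of the input."""
    lo, hi = min(points), max(points)   # points of smallest and largest abscissa
    hull = [lo, hi]                     # E, kept ordered by abscissa
    for n in points:
        if n == lo or n == hi:
            continue
        i = bisect_left(hull, n)        # n falls between hull[i-1] and hull[i]
        if cross(hull[i - 1], hull[i], n) >= 0:
            continue                    # n is enclosed: discard it
        hull.insert(i, n)
        # Remove the consecutive neighbours that break convexity, right then left.
        while i + 2 < len(hull) and cross(hull[i], hull[i + 1], hull[i + 2]) <= 0:
            del hull[i + 1]
        while i >= 2 and cross(hull[i - 2], hull[i - 1], hull[i]) <= 0:
            del hull[i - 1]
            i -= 1
    return hull
```

List insertion and deletion are linear-time, so this sketch shows only the logic of the loop (discard, insert, remove a contiguous run), not the cost the paper attributes to its tree-based representation.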

The computational cost of FractalHull lies in testing whether a point n is enclosed by the current hull and in updating the hull when necessary.

The inclusion of n is tested by means of a spatial partition induced by a point p of the hull, as explained in Section 2.1. If n is not enclosed by the current hull, it must be added. In many cases, after the hull is updated, a run of consecutive points $p_i, p_{i+1}, \ldots, p_{j-1}, p_j$ must be removed from the current hull to preserve convexity. This update process is detailed in Section 2.3.

2.1. Spatial partition

The spatial partition induced by a point p of the current hull is the basis of our algorithm. This partition provides an efficient point-in-convex-polygon inclusion test. p is always the central point of the partial hull.

We first define the terms used in this section and the following ones to explain the method.

• C: Set of points whose convex hull is to be computed.

• E: Ordered and indexed sequence of points representing the lower hull of the point set C.

• n: Point to be tested for enclosure by the hull E.

• Partial hull $P_{ij}$: Connected subsequence of E that starts at point $p_i$ and ends at point $p_j$.

• Central point of $P_{ij}$: Point that divides $P_{ij}$ into two partial hulls with the same number of components. When this is not possible, the numbers of components differ by at most one.

• o: Point of smallest abscissa of a partial hull. It is initialized with the point of smallest abscissa of C.

• f: Point of largest abscissa of a partial hull. It is initialized with the point of largest abscissa of C.

• p: Point of E used as the origin of the half-lines that define the spatial partition. p is always the central point of the partial hull being processed.

• a: Point of E immediately preceding p.

• s: Point of E immediately following p.

• Segment: Segment defined by two points of E, $\overrightarrow{p_i p_j}$, with i < j. It defines a directed line, oriented from $p_i$ to $p_j$, that divides the plane into two half-planes.

• Exterior of $\overrightarrow{p_i p_j}$: Half-plane to the right of the directed segment $\overrightarrow{p_i p_j}$.

• Interior of $\overrightarrow{p_i p_j}$: Half-plane to the left of the directed segment $\overrightarrow{p_i p_j}$.
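The interior and exterior of a directed segment can be tested with the sign of a 2D cross product. The following is a minimal sketch, assuming points are `(x, y)` tuples; the function names are hypothetical, and treating collinear points as neither interior nor exterior is a choice made here, not one stated in the paper.

```python
def cross(pi, pj, n):
    """Twice the signed area of the triangle (pi, pj, n).

    Positive when n lies to the left of the directed line pi -> pj,
    negative when it lies to the right, zero when the three are collinear.
    """
    return (pj[0] - pi[0]) * (n[1] - pi[1]) - (pj[1] - pi[1]) * (n[0] - pi[0])


def is_interior(pi, pj, n):
    """True when n is in the interior (left half-plane) of segment pi -> pj."""
    return cross(pi, pj, n) > 0


def is_exterior(pi, pj, n):
    """True when n is in the exterior (right half-plane) of segment pi -> pj."""
    return cross(pi, pj, n) < 0
```

For a lower hull traversed from left to right, the interior (left) side of every edge is the side above it, which is where the rest of the point set lies.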

Below we list a series of properties satisfied by a convex hull [PS85]. They make it possible to show how the position of a point within each of the four zones into which we divide the space affects the result.

• Property 1: Any pair of consecutive points $(p_i, p_{i+1})$ of a convex hull E defines a segment $\overrightarrow{p_i p_{i+1}}$ that leaves the whole point set C in its interior (see Figure 1).

• Property 2: Any pair of points $(p_i, p_j)$ of a convex hull E defines a segment $\overrightarrow{p_i p_j}$ that leaves in its exterior every point belonging to the partial hull $P_{ij}$ (see Figure 2).

• Property 3: Any pair of points $(p_i, p_j)$ of a convex hull E defines a segment $\overrightarrow{p_i p_j}$ that leaves in its interior every point of E that does not belong to the partial hull $P_{ij}$ (see Figure 2).

Taking these properties into account, we now establish the spatial partition on which our method is based. We describe the different subspaces and how they affect a point n that we want to test for enclosure by the current convex hull.

Figure 1: Graphical illustration of Property 1 of a convex hull.

Figure 2: Graphical illustration of Properties 2 and 3 of a convex hull.

The spatial partition is based on four half-lines. These half-lines originate at a point p of the current hull and take as their direction the segments $\overrightarrow{op}$, $\overrightarrow{ap}$, $\overrightarrow{ps}$, and $\overrightarrow{pf}$ (see Figure 3). Depending on the subspace in which point n lies, we proceed as follows:

• Zone A: A point n is in Zone A if it is in the interior of segment $\overrightarrow{op}$ and in the interior of segment $\overrightarrow{pf}$. By Property 3, n cannot belong to the sub-hull $P_{op}$, because it is in the interior of $\overrightarrow{op}$. Likewise, it cannot belong to the sub-hull $P_{pf}$, because it is in the interior of $\overrightarrow{pf}$. Together, these two conclusions indicate that n cannot belong to E. Therefore E need not be modified, and we can conclude that the current hull encloses the point.

• Zone B: A point n is in Zone B if it is in the exterior of segment $\overrightarrow{ap}$ and in the exterior of segment $\overrightarrow{ps}$. Property 1 indicates that the hull must be modified, since there is a point in the exterior of a segment defined by two consecutive points, either $\overrightarrow{ap}$ or $\overrightarrow{ps}$. This modification involves adding n to the current hull and possibly updating the hull to preserve convexity, as explained in Section 2.3. To conclude that the hull must be updated, it would suffice for n to be in the exterior of $\overrightarrow{ap}$ or in the exterior of $\overrightarrow{ps}$. Both conditions are used, however, to guarantee that at least point p must be removed.

• Zone C: A point n is in Zone C if it is in the exterior of segment $\overrightarrow{op}$ and in the interior of segment $\overrightarrow{ps}$. In this zone, n may or may not be enclosed by the hull. Since it is in the exterior of $\overrightarrow{op}$, Property 2 tells us that if n were part of the hull it would lie in the stretch $P_{op}$. Moreover, because the hull is convex, every point with an abscissa smaller than that of p that is in the interior of $\overrightarrow{ps}$ is also in the interior of $\overrightarrow{pf}$. From this fact and Property 3, we can state that n, if it belonged to E, could not belong to $P_{pf}$. We can therefore conclude that if a point is in Zone C, further refinement is needed to decide whether it is enclosed, but only $P_{op}$ has to be explored, since we know with certainty that the point will not belong to $P_{pf}$.

• Zone D: The location of a point n in Zone D is handled with reasoning similar to that for Zone C. In this case, the search continues in $P_{pf}$ instead of $P_{op}$.

Figure 3: Partition of the space induced by a point p of a convex hull.
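The four-zone test can be sketched as a small Python function. The names are hypothetical, and two points are assumptions rather than statements from the text: Zone D is taken, by symmetry with Zone C, to be the exterior of $\overrightarrow{pf}$ together with the interior of $\overrightarrow{ap}$, and points lying exactly on a dividing line are counted as interior.

```python
def cross(u, v, n):
    """Signed area test: > 0 when n lies to the left of the directed line u -> v."""
    return (v[0] - u[0]) * (n[1] - u[1]) - (v[1] - u[1]) * (n[0] - u[0])


def classify_zone(o, a, p, s, f, n):
    """Return 'A', 'B', 'C' or 'D' for point n in the partition induced by p."""
    def interior(u, v):
        return cross(u, v, n) >= 0      # boundary counted as interior (assumption)

    if interior(o, p) and interior(p, f):
        return "A"                      # enclosed by the current hull
    if not interior(a, p) and not interior(p, s):
        return "B"                      # n must be added and p removed
    if not interior(o, p) and interior(p, s):
        return "C"                      # refine the search in P_op
    return "D"                          # by symmetry: refine the search in P_pf
```

For a convex lower hull the four conditions are mutually exclusive away from the dividing lines, so the order of the checks only matters for boundary points.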

Whenever a point n lies in Zone A or B, no further search is needed. However, if the point lies in Zone C or D, a finer partition is required, iterating over the right (or left) half of the current sequence. The search ends when n lies in a Zone A or a Zone B. If the search sequence cannot be subdivided any further, the spatial partition reduces, by definition, to Zone A or Zone B only, so the iteration is guaranteed to terminate. Figure 4 shows the space partitioning process for a specific point.

Algorithm 2 shows the complete idea of the spatial partition and the behavior in each of its cases.
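The search described by Algorithm 2 can be sketched iteratively over a plain Python list sorted by abscissa, where index halving stands in for descending the tree. This is an illustration under assumptions, not the authors' implementation: `locate` and its return values are hypothetical, boundary points are counted as interior, and the function only reports the outcome instead of modifying the hull.

```python
def cross(u, v, n):
    """Signed area test: > 0 when n lies to the left of the directed line u -> v."""
    return (v[0] - u[0]) * (n[1] - u[1]) - (v[1] - u[1]) * (n[0] - u[0])


def locate(hull, n):
    """Classify n against a lower hull given as a list ordered by abscissa.

    Returns ("enclosed", None), ("zone_b", k) when n falls in Zone B of hull[k],
    or ("insert", i) when n only has to be inserted between hull[i] and hull[i+1].
    """
    lo, hi = 0, len(hull) - 1                 # indices of o and f
    while hi - lo >= 2:
        mid = (lo + hi) // 2                  # central point p of the partial hull
        o, a, p = hull[lo], hull[mid - 1], hull[mid]
        s, f = hull[mid + 1], hull[hi]
        in_op = cross(o, p, n) >= 0
        in_pf = cross(p, f, n) >= 0
        if in_op and in_pf:
            return ("enclosed", None)         # Zone A
        if cross(a, p, n) < 0 and cross(p, s, n) < 0:
            return ("zone_b", mid)            # Zone B
        if not in_op:
            hi = mid                          # Zone C: continue in P_op
        else:
            lo = mid                          # Zone D: continue in P_pf
    # Only two consecutive points remain: a single edge decides the outcome.
    if cross(hull[lo], hull[hi], n) >= 0:
        return ("enclosed", None)
    return ("insert", lo)
```

Each iteration halves the index range, so the number of zone tests grows logarithmically with the number of hull points.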

Figure 4: Parts into which the space is subdivided at each iteration until a point is finally inserted with no need for an update.

Algorithm 2 Pseudocode of the computation needed to determine whether a point n is enclosed by the convex hull using our spatial partition

Input: E = partial hull of C
       n = point to be tested
Output: P = subset of E
        0 if the point is enclosed

    if Size of E == 2 then
        insert n between o and f
        return 0
    end if
    define p = central point of E
    if n ∈ Zone A then return 0
    if n ∈ Zone B then return E
    if n ∈ Zone C then
        return Engloba(P_op, n)
    end if
    if n ∈ Zone D then
        return Engloba(P_pf, n)
    end if

2.2. Representation

All algorithms to date represent the hull as an ordered list of points. In our case, a more advanced structure is needed, one that gives access to the central point of a convex hull (partial or complete). To this end, the list is indexed in a binary tree, specifically an AVL tree [Sed84]. As Figure 5 shows, the points are indexed in the conventional way: all nodes hanging to the left of any given node have a smaller abscissa than the parent, and those hanging to the right have a larger abscissa. Therefore, any node representing a point p of the convex hull has the partial hull $P_{op}$ to its left and the partial hull $P_{pf}$ to its right. The central points of these partial hulls are precisely the left and right children of p.
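The node layout can be sketched as follows. This is a simplified illustration, not the paper's data structure: the tree is built once from points already ordered by abscissa instead of through AVL insertions and rotations, the balance field is only a placeholder, and `HullNode`, `build`, and `inorder` are hypothetical names. The `prev` and `next` links are the neighbour pointers that each node stores alongside its point.

```python
class HullNode:
    """Tree node for one hull point, with links to its hull neighbours."""

    def __init__(self, point):
        self.point = point
        self.left = None      # subtree with smaller abscissas
        self.right = None     # subtree with larger abscissas
        self.balance = 0      # placeholder; an AVL tree maintains it on rotations
        self.prev = None      # previous point on the hull
        self.next = None      # next point on the hull


def inorder(node):
    """Yield the nodes in order of abscissa."""
    if node is not None:
        yield from inorder(node.left)
        yield node
        yield from inorder(node.right)


def build(points):
    """Build a height-balanced tree from hull points ordered by abscissa."""
    def rec(lo, hi):
        if lo > hi:
            return None
        mid = (lo + hi) // 2              # middle point of the range becomes the root
        node = HullNode(points[mid])
        node.left = rec(lo, mid - 1)
        node.right = rec(mid + 1, hi)
        return node

    root = rec(0, len(points) - 1)
    last = None
    for node in inorder(root):            # thread the prev/next links
        node.prev = last
        if last is not None:
            last.next = node
        last = node
    return root
```

With the links in place, the neighbours a and s of any point p are available in constant time, at the cost of two extra pointers per node.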

Figure 5: Traditional representation of a hull (top right) and the representation of the hull provided by FractalHull (bottom). The example hull is formed by points A, C, E, H, K and L.

It is important to note that each node stores (leaving aside the variables inherent to an AVL tree: right child, left child, balance value) a pointer to the point it represents and two additional pointers to the next and previous points of the hull. The last two pointers are not strictly necessary, since this kind of structure offers functions to compute them. However, because these data are used constantly, we chose to accept the storage cost of these pointers in exchange for computation time. Thus, taking Figure 5 as an example, node C would store the information of point C plus a pointer to point A as its previous point and another to point E as its next point.

The AVL structure lets us compute both the inclusion test of a point in a hull and the update of the hull efficiently.
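As a rough illustration of the node layout described above, the following sketch builds a balanced tree over an already sorted hull and threads the previous/next pointers. The class `HullNode`, the helper `build_index` and the coordinates assigned to points A, C, E, H, K and L are hypothetical, chosen only to mirror the Figure 5 example; the structure in the paper is an AVL tree maintained incrementally, not built in one pass.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass(eq=False, repr=False)
class HullNode:
    """One hull vertex: tree links plus threaded neighbours (hypothetical layout)."""
    point: Point
    left: Optional["HullNode"] = None
    right: Optional["HullNode"] = None
    balance: int = 0                    # depth(right) - depth(left)
    prev: Optional["HullNode"] = None   # previous vertex along the hull
    next: Optional["HullNode"] = None   # next vertex along the hull


def build_index(points: List[Point]) -> Optional[HullNode]:
    """Index a hull that is already sorted by abscissa and return the root."""
    nodes = [HullNode(p) for p in points]
    for a, b in zip(nodes, nodes[1:]):  # thread the prev/next pointers
        a.next, b.prev = b, a

    def link(lo: int, hi: int):
        if lo > hi:
            return None, 0
        mid = (lo + hi) // 2            # central point of the (sub)hull
        node = nodes[mid]
        node.left, depth_l = link(lo, mid - 1)
        node.right, depth_r = link(mid + 1, hi)
        node.balance = depth_r - depth_l
        return node, 1 + max(depth_l, depth_r)

    root, _ = link(0, len(nodes) - 1)
    return root


# Hypothetical coordinates for the hull A, C, E, H, K, L of Figure 5.
A, C, E, H, K, L = (0, 0), (1, 2), (2, 3), (3, 3), (4, 2), (5, 0)
root = build_index([A, C, E, H, K, L])
node_c = root.left.right                # node C in this particular tree shape
```

With this layout, the neighbours of C are reached in constant time through `node_c.prev` and `node_c.next`, at the cost of two extra pointers per node, which is the trade-off the text describes.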

2.3. Updating the hull

When a point n is analyzed spatially with respect to a given hull, the point may already be enclosed by that hull (Zone A), in which case nothing else needs to be done. It may only be necessary to add the new point to the hull (stopping criterion). Or, in the worst case, the new point must be added and certain consecutive points of the current hull must be removed because, after the new insertion, they no longer belong to the final hull. In the first two cases no points need to be deleted from the hull, and both are covered by the spatial partition. In the third case, however, one or more points do have to be removed from the hull. This update is detailed below.

Because the hull is convex, the points to be removed are always contiguous. This can be shown from the properties stated earlier. We know that the origin point and the final point are never removed. Thus, if we classify the whole hull into points that must be removed and points that must not, three sets of consecutive points can be established. The first set has one or more points that are kept (it includes the origin point); the second contains zero or more points that must be removed; and the third has one or more points that are kept (it includes the final point).
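The contiguity property can be checked by brute force on a small example. The sketch below is not the update method of this paper: it recomputes an upper hull with Andrew's monotone chain [And79] and reports which vertices of the old hull disappear. It assumes that only the upper hull is considered; the function names and the sample coordinates are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Positive if o -> a -> b turns left, negative if it turns right."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def upper_hull(points: List[Point]) -> List[Point]:
    """Upper hull, left to right, by the monotone chain scan."""
    hull: List[Point] = []
    for p in sorted(set(points)):
        # Drop the last vertex while it does not make a strict right turn.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def removed_indices(hull: List[Point], n: Point) -> List[int]:
    """Indices of the vertices of `hull` that vanish when n is added."""
    survivors = set(upper_hull(hull + [n]))
    return [i for i, v in enumerate(hull) if v not in survivors]


hull = [(0, 0), (1, 2), (2, 3), (3, 3), (4, 2), (5, 0)]
outside = removed_indices(hull, (2.5, 4))   # point above the hull
inside = removed_indices(hull, (2.5, 1))    # point already enclosed
```

For the point above the hull, the removed vertices form a single run in the middle of the list, with kept vertices on both sides, which matches the three sets described in the text.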

Before describing the point-removal method, Figure 6 shows schematically how the spatial partition behaves from the point of view of the hull. We know from the start that neither point o nor point f will ever need to be removed, since they are the origin and the end of the hull. The status of the remaining points is undefined when the function starts. If the point to insert, n, lies in Zone A, we know that no point of E needs to be removed. If it lies in Zone C or D, we know for certain that part of E will not be modified, so that part (the left or right half, depending on whether the zone is C or D) does not enter the next iteration. Finally, if the point lies in Zone B, at least the root of P must be removed, and possibly more. The lower part of Figure 6 shows the fragment to update: it has one or more points to remove at its center and zero or more points to keep at its ends.

Figure 6: Nodes to update depending on the zone in which point n lies.

The update functions aim to find the two “inflection points” that separate the zone to be removed from the zones that are kept. One of these points lies to the left of the root of P and the other to its right. Recall that fragment P is a binary tree, so one inflection point is found by traversing the tree through the root's left child and the other through its right child. Only the modification of the left subtree is detailed below, since the two processes are symmetric.




We first focus on the correspondence between the linear and tree representations of the fragment (top of Figure 7). Since there can only be three sets (NO-YES-NO), if the left child (hi) of p had to be removed, we know that the whole fragment between p and hi would have to be removed as well. That is, hi and its right branch, B, are deleted, and the left child of hi, A, becomes the new left child of p (right of Figure 7). Likewise, if the left child (hi) does not have to be removed, A need not be checked because it would not have to be removed either. In that case we descend through the right branch and check the root of B (left of Figure 7). The update ends when the bottom of the tree is reached.

Figure 7: Recursive removal of a fraction of the hull.

Algorithm 3 Pseudocode of ActualizaIzquierda

Input: P = partial hull of C; n = point to add

define p = raiz(P)
if p = ∅ then return ∅
if p must be removed then
    delete the right branch of p
    raiz(P) = HijoIzquierdo(p)
    return ActualizaIzquierda(P, n)
else
    return ActualizaIzquierda(right child of P, n)
end if
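The left-side descent can be sketched on a plain binary search tree. This is a minimal sketch under two assumptions: the geometric test "p must be removed" is not spelled out in this section, so it is passed in as a hypothetical predicate `must_remove`; and the result is not rebalanced here, since rebalancing after pruning is the subject of Section 3. The names `Node`, `build`, `update_left` and `inorder` are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(eq=False, repr=False)
class Node:
    key: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(keys: List[float]) -> Optional[Node]:
    """Balanced search tree over sorted keys (stands in for the left subtree of p)."""
    if not keys:
        return None
    mid = len(keys) // 2
    return Node(keys[mid], build(keys[:mid]), build(keys[mid + 1:]))


def update_left(t: Optional[Node], must_remove: Callable[[float], bool]) -> Optional[Node]:
    """Prune from the left subtree of p every vertex that must be removed."""
    if t is None:
        return None
    if must_remove(t.key):
        # t and its whole right branch lie between t and p: drop them
        # and keep examining what was the left child of t.
        return update_left(t.left, must_remove)
    # t is kept, so everything to its left is kept too; look to the right.
    t.right = update_left(t.right, must_remove)
    return t


def inorder(t: Optional[Node]) -> List[float]:
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)


# Vertices with abscissa >= 5 are assumed to be hidden by the new point.
pruned = update_left(build([1, 2, 3, 4, 5, 6, 7]), lambda x: x >= 5)
# Same tree, but now everything from abscissa 3 onwards must go.
pruned_more = update_left(build([1, 2, 3, 4, 5, 6, 7]), lambda x: x >= 3)
```

Each call visits a single root-to-leaf path, so the number of predicate evaluations is bounded by the depth of the subtree even when a whole branch is discarded.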

2.4. Extreme cases

Our method starts by initializing the hull with the points of smallest and largest abscissa of C. This simplifies the spatial partition from six subspaces to four, but requires computing those points beforehand. In addition, the method is limited once the hull has been computed: points lying outside the range defined by those two points cannot be added to the initial set, because the method would not be prepared for such cases. For these reasons we have extended the method so that the hull E can be initialized with two random points. This section explains the modifications required for that extension.

There are two extreme cases: the point to add, n, lies to the left of the origin of the current hull, or it lies to the right of the final point of the current hull. Again, the two cases are symmetric.

The first modification must be made in the main algorithm (Algorithm 1), adding the two new subspaces (Algorithm 4).

Updates at the extremes are again similar to normal updates, although in this case, if we divide E into points that must be removed and points that must not, we obtain only two sets instead of three. For an update at the left extreme we would have one or more points that must be removed (among them the current origin point) and one or more points that are kept (among them the current final point). Two new extreme-update functions are therefore needed. One of them is illustrated in Figure 8 and expressed in Algorithm 5; the other is analogous.

Figure 8: Recursive removal of a fraction of the hull.

2.5. Complexity

Having detailed the method, we can estimate the complexity of the algorithm. From Algorithm 1 we can state that the method is linear in the number of points of the set. In addition, for each point we must take into account the cost of detecting the fragment to update (Algorithm 2) and, when an update is needed, the cost of the update itself (Algorithm 3).

The complexity of Algorithm 1 is O(n), where n is the number of points. The complexity of Algorithms 2 and 3 is O(log k) because they operate on balanced trees, where k is the number of elements in the tree. The total complexity is therefore O(n log k), where n is the number of elements of the set C and k the number of elements of E.
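The bound can be written out as a sum over the points of C. This is only a restatement of the argument above, under the assumptions that each point triggers one call to Algorithm 2 and at most one update (Algorithm 3), and that k bounds the number of hull vertices stored in the tree at any time during the run.

```latex
T(n, k) \;=\; \sum_{i=1}^{n} \Bigl( \underbrace{O(\log k)}_{\text{Algorithm 2}}
        \;+\; \underbrace{O(\log k)}_{\text{Algorithm 3}} \Bigr)
        \;=\; O(n \log k),
\qquad n = \#C, \quad k = \#E .
```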




Algorithm 4 Extended base of FractalHull

Input: set of points C = p1, ..., pk with k ≥ 3
Output: E = hull of C indexed in an AVL tree

define xmin = point of smallest abscissa
define xmax = point of largest abscissa
initialize E with xmin and xmax
for all n ∈ C do
    if n < origin of E then
        ActualizaExtremoIzquierda(E)
        add n to E
    else if n > end of E then
        ActualizaExtremoDerecha(E)
        add n to E
    else
        define P = fragment of E that must be updated
        if size(P) > 0 then
            update P
        end if
    end if
end for
return E

Algorithm 5 Pseudocode of ActualizaExtremoIzquierda

Input: E = complete hull of C; n = point to add

define p = raiz(E)
if p = ∅ then return ∅
if p must be removed then
    delete the left branch of p
    raiz(E) = HijoDerecho(p)
    return ActualizaExtremoIzquierda(E, n)
else
    return ActualizaExtremoIzquierda(left child of E, n)
end if

3. AVL trees

As we have seen, our method uses an AVL tree as the data structure that stores the hull. In principle the kind of binary tree is irrelevant, but the tree must remain balanced, or partially balanced, after inserting a node, after deleting a node and, most importantly, after deleting a complete branch.

Several kinds of binary trees were studied: BST, AVL [Sed84], red-black tree [Bay72] and AA-tree [And93]. None of them defines an operation for removing a complete branch, so deleting a branch in any of them means removing it node by node, with the corresponding rebalancing between successive deletions. Since most deletions in our method involve complete branches, the solution offered by these structures is not advisable, because the time spent rebalancing the tree is excessive. We therefore set out to extend the data structure with a way of removing a whole branch and performing a single global rebalancing after the deletion.

This task was not simple: red-black trees and AA-trees (which the community commonly regards as the most efficient) have restrictions that prevent it. BSTs are not balanced and are therefore not optimal for this case. AVL trees, however, have no complex restrictions and allowed us to design a generalization for branch deletion.

3.1. Removal of complete branches

The only restriction of an AVL tree is that, for every node of the tree, the difference in depth between the left child and the right child can never be greater than 1. If the depths of the children of any node differ by 2 or more, the tree is unbalanced and must be rebalanced.

Existing insertion and deletion functions for a single node keep the tree balanced. They rely on the fact that inserting a node increases the depth of a branch by at most 1 and deleting a node decreases it by at most 1. In other words, there are functions that rebalance the tree when the difference between the depths of the left and right children equals 2, but none that guarantee balance when the difference is greater than 2. For this algorithm we designed a function that returns a balanced tree after a complete branch has been removed from it. The tree must be balanced before pruning; this ensures that after pruning at most one node is unbalanced.

Suppose a node of the tree is unbalanced by a value k toward the right; that is, if the depth of its left child is n, the depth of its right child is n+k, with k ≥ 2. There are three different cases (Figure 9):

1. Case 1. The right child is balanced. Both children of the right child (grandchildren of the unbalanced node) therefore have depth n+k−1. In this case a single left rotation balances the current node and leaves its left child (the previously unbalanced node) unbalanced by one unit less than the imbalance the parent had.

2. Case 2. The right child is balanced toward the right. The two children of the right child (grandchildren of the unbalanced node) therefore have depths n+k−2 (left) and n+k−1 (right). In this case a single left rotation balances the current node and leaves its left child (the previously unbalanced node) unbalanced by two units less than the imbalance the parent had.

3. Case 3. The right child is balanced toward the left. The two children of the right child (grandchildren of the unbalanced node) therefore have depths n+k−1 (left) and n+k−2 (right). In this case the parent node cannot be balanced directly. The problem would be solved if the right child went from being balanced toward the left to being balanced toward the right: instead of case 3 we would be in case 2, which does have a solution. The function CambiarPeso, explained below, was designed for exactly this purpose.

Of these three cases, two turn an imbalance of k into a smaller one (k−1 or k−2). The third has no direct solution, but it can be converted into the second by successively applying the function CambiarPeso. By applying this function as many times as necessary, the imbalance is progressively reduced until it reaches an “imbalance” of 1 or 0, which is considered balanced.

Figure 9: Behavior of the balancing function for an imbalance of k toward the right.

The initial imbalance may also be toward the left instead of toward the right. The solution is analogous.

3.2. Weight change

The weight-change function makes a node that is balanced toward the right become balanced toward the left; that is, its deepest branch moves from one side to the other. Again, suppose that the node whose weight we want to change is balanced toward the right: if the depth of its left child is n, that of its right child is n+1. There are again three cases (Figure 10):

1. Case 1. The right child is balanced. Both children of the right child (grandchildren of the balanced node) therefore have depth n. In this case a single left rotation balances the current node toward the left without unbalancing the tree.

2. Case 2. The right child is balanced toward the right. The two children of the right child (grandchildren of the balanced node) therefore have depths n−1 (left) and n (right). In this case a single left rotation balances the current node toward the left without unbalancing the tree.

3. Case 3. The right child is balanced toward the left. The two children of the right child (grandchildren of the balanced node) therefore have depths n (left) and n−1 (right). In this case the weight cannot be changed directly, since a single left rotation would unbalance the tree. Again, the problem would be solved if the right child went from being balanced toward the left to being balanced toward the right: instead of case 3 we would be in case 2, which does have a solution.

Figure 10: Behavior of the function that changes the weight of a node.

Again we have three cases: the first two have an immediate solution, while the third requires first applying the function CambiarPeso to obtain one.
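The case analysis of Sections 3.1 and 3.2 can be turned into a short recursive sketch. This is one possible reading of the text, not the authors' implementation: "depth" is stored as `height`, `shift_weight_left` and `shift_weight_right` are hypothetical names for the two mirror versions of CambiarPeso, and `rebalance` assumes that both subtrees of the node it receives are already valid AVL trees.

```python
from typing import List, Optional


class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.height = 1 + max(height(left), height(right))


def height(t: Optional[Node]) -> int:
    return t.height if t else 0


def balance(t: Node) -> int:
    return height(t.right) - height(t.left)   # > 0 means heavier on the right


def update(t: Node) -> Node:
    t.height = 1 + max(height(t.left), height(t.right))
    return t


def rotate_left(t: Node) -> Node:
    r = t.right
    t.right = r.left
    update(t)
    r.left = t
    return update(r)


def rotate_right(t: Node) -> Node:
    l = t.left
    t.left = l.right
    update(t)
    l.right = t
    return update(l)


def shift_weight_left(t: Node) -> Node:
    """Right-heavy node (+1) becomes left-heavy (-1); its height is unchanged."""
    if balance(t.right) < 0:                  # case 3: fix the child first
        t.right = shift_weight_right(t.right)
    return rotate_left(t)                     # cases 1 and 2


def shift_weight_right(t: Node) -> Node:
    """Mirror image: left-heavy node (-1) becomes right-heavy (+1)."""
    if balance(t.left) > 0:
        t.left = shift_weight_left(t.left)
    return rotate_right(t)


def rebalance(t: Optional[Node]) -> Optional[Node]:
    """Repair a node whose imbalance may be any k >= 2."""
    if t is None:
        return None
    if balance(t) >= 2:
        if balance(t.right) < 0:              # case 3 -> case 2
            t.right = shift_weight_right(t.right)
        t = rotate_left(t)                    # imbalance moves to the new left child
        t.left = rebalance(t.left)            # ... where it is k-1 or k-2
    elif balance(t) <= -2:
        if balance(t.left) > 0:
            t.left = shift_weight_left(t.left)
        t = rotate_right(t)
        t.right = rebalance(t.right)
    return update(t)


def build(keys: List[int]) -> Optional[Node]:
    if not keys:
        return None
    mid = len(keys) // 2
    return Node(keys[mid], build(keys[:mid]), build(keys[mid + 1:]))


def inorder(t: Optional[Node]) -> List[int]:
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)


def check_avl(t: Optional[Node]) -> int:
    """Return the height of t, failing if any node is out of balance."""
    if t is None:
        return 0
    hl, hr = check_avl(t.left), check_avl(t.right)
    assert abs(hr - hl) <= 1, "unbalanced node"
    return 1 + max(hl, hr)


# A node whose left branch was pruned away: imbalance k = 4, left-heavy right child.
right = Node(8, build([1, 2, 3, 4, 5, 6, 7]), build([9, 10, 11]))
tree = rebalance(Node(0, None, right))
```

In this example the right child is left-heavy, so the sketch first applies the weight change (case 3), then a left rotation, and then recurses on the new left child, whose imbalance has dropped from 4 to 2.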

4. Experiments

To verify the efficiency of this proposal, we carried out a comparative study of several existing algorithms to determine their running times. The study consisted of implementing each algorithm and running test cases on synthetically generated point sets. The algorithms studied were Jarvis, Graham Scan, Monotone Chain and QuickHull ([Jar73], [Gra72], [And79] and [BDH96]).

Theoretical studies and the data reported by the authors of previous work indicate that the efficiency of these algorithms depends mainly on two factors: the number of points over which the convex hull must be computed, #C (cardinality of C), and the number of points that make up the hull, #E (cardinality of E). The test cases were therefore designed to study the relationship between both parameters.

4.1. Test set

To test all the algorithms we designed a test set that takes the two relevant parameters into account. The point clouds used are regular polygons of #E vertices plus as many points distributed randomly in their interior as needed to reach the desired value of #C. This guarantees that the hull has the desired size and that the number of points of the final set is known. We call this kind of point cloud a polygonal cloud. We also designed another kind of point cloud, which we call a star cloud, obtained by distributing #C points uniformly in a star with #E tips. These sets are designed precisely to obtain presumably concave point clouds. See Figure 11.

Figure 11: Star clouds on the right, polygonal clouds on the left.
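A polygonal cloud of the kind described above could be generated roughly as follows. This is a sketch, not the generator used in the experiments: the function name `polygonal_cloud`, its parameters, the fixed seed and the choice of sampling the interior points inside a disk slightly smaller than the inscribed circle are all assumptions. Star clouds are not covered.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def polygonal_cloud(hull_size: int, total: int,
                    radius: float = 1.0, seed: int = 0) -> List[Point]:
    """Regular polygon with `hull_size` vertices (#E) padded up to `total` points (#C)."""
    if hull_size < 3 or total < hull_size:
        raise ValueError("need hull_size >= 3 and total >= hull_size")
    rng = random.Random(seed)
    vertices = [
        (radius * math.cos(2 * math.pi * i / hull_size),
         radius * math.sin(2 * math.pi * i / hull_size))
        for i in range(hull_size)
    ]
    # Stay strictly inside the inscribed circle so that no interior point
    # can reach an edge of the polygon and alter the hull.
    inner = 0.99 * radius * math.cos(math.pi / hull_size)
    interior = []
    for _ in range(total - hull_size):
        r = inner * math.sqrt(rng.random())   # sqrt gives uniform density over the disk
        angle = 2 * math.pi * rng.random()
        interior.append((r * math.cos(angle), r * math.sin(angle)))
    return vertices + interior


cloud = polygonal_cloud(hull_size=12, total=500)
```

Because every padded point lies inside the inscribed circle, the convex hull of the cloud is exactly the polygon, so #E and #C can be set independently, which is the property the test set relies on.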

Figure 12 shows the results of the previous algorithms for polygonal clouds. Results for star clouds are not shown because the differences between the two are negligible. Each plot shows the behavior (in seconds) of one of the algorithms as a function of the number of points of the hull, #E (ordinate), and the number of points of the set, #C (abscissa).

Analyzing Figure 12 confirms the data reported by the authors of each algorithm. Graham and Monotone (on the right of the image), being algorithms with a prior sorting step, give times that increase with #C but remain constant as #E varies. QuickHull and Jarvis, in contrast, take longer both when the number of hull points increases and when the number of points of the set increases. Nevertheless, looking at the time scale, we see that for similar behavior QuickHull takes a maximum of approximately three seconds, whereas Jarvis takes 16 s.

Figure 12: Running times of the previous algorithms as a function of #C and #E. Note that the scales differ.

After analyzing these results we confirmed that QuickHull is the best of the algorithms tested. From here on, we compare our proposal with QuickHull. The same test set was used, but the number of points was increased considerably, both in the hull and overall, up to a maximum of 30,000 points overall and 1,000 in the hull. This change was made so that the difference between the results is substantial and they are easier to compare. Figure 13 shows the behavior of both algorithms and makes clear the improvement offered by our proposal.

Finally, we show the results obtained by fixing either the number of points of the hull or the number of points of the set. In the upper part of Figure 14, #E is fixed at 100,000 while #C varies from 100,000 to 1,000,000. In the lower part, #C is fixed at 100,000 while #E varies from 1,000 to 100,000. In both cases FractalHull improves on QuickHull. Moreover, the improvement grows with the number of points of the hull. This is because QuickHull stores the hull linearly, whereas FractalHull stores it in an AVL tree, so insertions are of logarithmic order.

5. Conclusions and future work

We have presented a proposal for computing the convex hull of a set of two-dimensional points, based on a spatial partition induced by the central point of a hull (or sub-hull), that improves on the times of the proposals made in previous work, which are commonly accepted as the most efficient solutions.

Figure 13: Running times of FractalHull and QuickHull as a function of #C and #E.

Figure 14: Running times of FractalHull and QuickHull with #E fixed (top image) and with #C fixed (bottom image).

The removal of nodes has been generalized for a specific data structure, the AVL tree, within the binary tree ADT. The generalization allows complete branches to be removed with a single rebalancing after all the nodes of the branch have been deleted.

In addition, a spatial index of the resulting convex hull is maintained. This allows the point set to grow without recomputing the hull from scratch: if new points are added to the set, it is not necessary to process all the points of the initial set, only the hull of the set prior to the increment.

As future work, we are studying the generalization of our convex hull method to three-dimensional space.

References

[And79] ANDREW A.: Another Efficient Algorithm for Convex Hulls in Two Dimensions. Information Processing Letters (1979).

[And93] ANDERSSON A.: Balanced Search Trees Made Simple. In WADS '93: Proceedings of the Third Workshop on Algorithms and Data Structures (Aug. 1993), Springer-Verlag.

[Bay72] BAYER R.: Symmetric binary B-Trees: Data structure and maintenance algorithms. Acta Informatica 1, 4 (1972), 290–306.

[BDH96] BARBER C. B., DOBKIN D. P., HUHDANPAA H.: The quickhull algorithm for convex hulls. Transactions on Mathematical Software (TOMS) 22, 4 (1996).

[Cha96] CHAN T.: Optimal output-sensitive convex hull algorithms in two and three dimensions. Discrete & Computational Geometry (1996).

[Fos07] FOSCARI P.: The Realtime Raytracing Realm. ACM Transactions on Graphics (2007).

[Gra72] GRAHAM R.: An efficient algorithm for determining the convex hull of a finite planar set. Information Processing Letters (1972).

[Jar73] JARVIS R.: On the identification of the convex hull of a finite set of points in the plane. Information Processing Letters (1973).

[KS86] KIRKPATRICK D. G., SEIDEL R.: The ultimate planar convex hull algorithm. SIAM Journal on Computing 15, 1 (Feb. 1986).

[Mar07] MARTINSKY O.: Algorithmic and mathematical principles of automatic number plate recognition systems. Brno University of Technology (2007).

[MS97] MEERAN S., SHARE A.: Optimum path planning using convex hull and local search heuristic algorithms. Mechatronics 7, 8 (1997), 737–756.

[PH77] PREPARATA F. P., HONG S. J.: Convex hulls of finite sets of points in two and three dimensions. Communications of the ACM 20, 2 (Feb. 1977), 87–93.

[PS85] PREPARATA F. P., SHAMOS M. I.: Computational Geometry: An Introduction (Monographs in Computer Science). Springer, Aug. 1985.

[RSV01] RIGAUX P., SCHOLL M., VOISARD A.: Spatial Databases: With Application to GIS. The Morgan Kaufmann Series in Data Management Systems. Elsevier Science, 2001.

[Sed84] SEDGEWICK R.: Algorithms. Algorithms (June 1984), 199.




EBP-Octree: An Optimized Bounding Volume Hierarchy for Massive Polygonal Models

A. Aguilera¹, F. Feito¹ and F.J. Melero²

¹Dpto. Informática, Universidad de Jaén
²Dpto. Lenguajes y Sistemas Informáticos, Universidad de Granada

angel,[email protected], [email protected]

Abstract

This paper presents a data structure to efficiently handle a hierarchy of bounding volumes on massive polygonal models, such as those obtained from 3D scanning devices. The Extended Bounding-Planes Octree (EBP-Octree) is able to manage massive polygonal models by using a spatial indexation of the surface and storing a hierarchy of bounding volumes composed of planes from the original surface. In this work we detail the geometric and design criteria that have been considered in order to create, just once for each model, an out-of-core data structure that is dynamically loaded at run time while traversing the octree in environments such as collision detection or progressive transmission. Given the applications that might use this data structure, the main goals are to obtain a tight volume at each node and a fast transition between disk and main memory while loading and releasing the octree branches.

Categories and Subject Descriptors (according to ACM CCS): Computer Graphics [I.3.5]: Computational Geometry and Object Modeling

1. Introduction

With the widespread use of 3D scanners, it is common to obtain highly detailed models made up of several tens of millions of polygons. Working with these models interactively and in real time requires very powerful computers with large processing capacity and main memory, even though most operations do not need the whole model loaded at its highest resolution. For this reason, the literature offers different methods and techniques for representing and visualizing models progressively, loading the model at different resolutions according to criteria such as the observer's viewpoint and the proximity to a given part of the model [Hop96] [Hop97].

This work presents an extension of the bounding volume hierarchy BP-Octree [MCT08], modified so that it can handle models made up of several tens of millions of polygons. The new hierarchy, called EBP-Octree (Extended Bounding Planes Octree), works with 64-bit data and supports trees up to 20 levels deep. In addition, the algorithms for selecting and building the bounding volumes have been optimized, a file structure has been defined to manage external memory during the construction and manipulation of the data structure, and a study of time and memory versus enclosed volume has been carried out to justify the plane selection criterion used.

The EBP-Octree is computed only once for each polygonal model and is stored in a set of files from which the tree can be loaded into main memory as many times as needed without repeating the computations. Another feature of the EBP-Octree is that the data structure is not loaded into main memory in full. Instead, the data are managed like a cache, so that only a small part of the model is in memory, and data are moved between memory and disk as the needs at a given moment change.



A. Aguilera, F. Feito & F.J. Melero / EBP-Octree: An Optimized Bounding Volume Hierarchy for Massive Polygonal Models

2. Previous work

The work presented here belongs to the field of interactive management of large models. According to [Sha02], published model representation methods can be classified into two groups: implicit and constructive representations, and enumerative and combinatorial representations. Implicit and constructive representations define a function that classifies a set of points in space, indicating whether or not they belong to the object. Enumerative and combinatorial representations provide a set of rules for generating a set of points that belong to the model.

Enumerative and combinatorial representations include a set of techniques that represent the model with different levels of detail (LOD) in its geometry. Some store several levels of detail for the same object [Cla76] and use whichever is most convenient in each case; others adapt the object to the position of the observer [Hop96] [Hop97], loading at higher resolution the parts closest to the observer. Other techniques for representing solid objects decompose the object spatially by means of hierarchical data structures [ABJN85] [SW83], for example by enclosing the object in a voxel and subdividing this initial voxel into eight subvoxels, which yields a hierarchical structure represented by a tree with eight children per node (octree).

Regardless of how the objects in the scene are visualized, we want the EBP-Octree to be a structure that lets the user interact with the represented model in real time and even detect collisions, not only in the context of a haptic interaction system but also in collision scenarios involving several high-resolution models. Several classical techniques exist for this, such as bounding volume hierarchies (AABB [Ben97], OOBB [GLM96], k-DOPs [KHM∗98], etc.), spatial indexing trees (octrees [SW83], BSP-Trees [RLVN91], KD-Trees [Ben75]) or trees that hierarchically index bounding volumes (SP-Octrees [MCT05], BP-Octrees [MCT08, MCT10]).

In all the works cited above, the sizes of the models used are rather modest. Current work on very large models focuses either on visualization [CGG∗04] [GM05] [YSGM04] or on collision detection in dynamic environments [LGS∗09] [VMTS10] [SPO10] [PKS10] [TMHT10] [BJ10], but the latter work with models of fewer than two and a half million polygons. The only exception is [YSLM04], which handles models of a size similar to those handled by the EBP-Octree, although its authors acknowledge a large number of false positives, and its performance would not allow it to be used in haptic environments.

2.1. The BP-Octree

This work extends and improves the BP-Octree [MCT08], an octree with four types of node: white and black nodes, as in classical octrees; gray nodes, which store a set of planes whose interior half-spaces intersect to form a convex volume enclosing the original geometry of the model; and leaf nodes, which contain, besides the set of bounding planes, the corresponding part of the original geometry of the polygonal model. Figure 1 shows, on the left, the root node of a BP-Octree with its convex hull drawn in a translucent tone and, on the right, a leaf node with the original geometry in green and, in translucent, the bounding volume formed by the selected planes.

In the BP-Octree, the planes that form the bounding volume must belong to the original geometry of the model. They are used either in their original position or displaced by an offset until they enclose all the geometry of the node (for leaf nodes) or all the bounding volumes of the child nodes (for gray nodes). This guarantees that, going up the tree, the bounding volume at level n−1 completely contains all the bounding volumes of the nodes at level n.

Figure 1: (a) Bounding volume at level 0 of a BP-Octree. (b) Leaf node [Mel08].
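The displacement of a plane until it encloses the geometry of a node can be illustrated with a few lines of vector arithmetic. This is a sketch under assumed conventions, not the BP-Octree code: a plane is stored as a unit outward normal and an offset d, the interior half-space is taken to be normal · x ≤ d, and all function names are hypothetical.

```python
import math
from typing import Iterable, List, Tuple

Vec = Tuple[float, float, float]
Plane = Tuple[Vec, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def plane_from_triangle(a: Vec, b: Vec, c: Vec) -> Plane:
    """Supporting plane of a triangle of the model, as (unit normal, offset)."""
    n = cross(sub(b, a), sub(c, a))
    length = math.sqrt(dot(n, n))
    normal = (n[0] / length, n[1] / length, n[2] / length)
    return normal, dot(normal, a)


def offset_to_enclose(normal: Vec, d: float, points: Iterable[Vec]) -> float:
    """Smallest offset >= d that leaves every point in the interior half-space."""
    return max(d, max(dot(normal, p) for p in points))


def encloses(planes: List[Plane], p: Vec, eps: float = 1e-9) -> bool:
    """True if p lies inside the intersection of the interior half-spaces."""
    return all(dot(normal, p) <= d + eps for normal, d in planes)


triangle = ((0, 0, 0), (1, 0, 0), (0, 1, 0))
geometry = list(triangle) + [(0.2, 0.2, 0.5), (0.5, 0.1, -1.0)]
normal, d = plane_from_triangle(*triangle)
d_pushed = offset_to_enclose(normal, d, geometry)
```

For a gray node the same routine would be fed the vertices of the children's bounding volumes instead of raw geometry, which is what makes each level contain the level below it.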

3. Construction of the EBP-Octree

The fundamental idea of the EBP-Octree is to handle models made up of several tens of millions of polygons by building a structure of bounding volumes defined by the planes of the original model itself. This data structure is computed and stored on disk only once for each model, so that the files generated for a model can be loaded quickly into main memory at run time without recomputing any of its elements. The steps followed to compute and store this hierarchical structure of bounding planes are described below. Since many of these steps are shared with the BP-Octree, we focus on the novel aspects, starting from the fact that the EBP-Octree uses 64 bits rather than the 32 of its predecessor.

Figure 2: 64-bit octcode.

3.1. Sizing the octree

The root node of our data structure is the axis-aligned bounding box (AABB, [Ben97]) of the model. The tree is built bottom-up, so its maximum level must be defined beforehand. This level is estimated from the size of the triangles of the original model, so that a leaf node can hold a significant number of triangles and allow a fast simplification of the model. The maximum level of the tree is the one whose cells can completely contain an equilateral triangle whose edge length equals the mean side length of the triangles of the polygonal model. To avoid traversing the millions of polygons handled by the EBP-Octree, the mean side length is computed from 1% of the polygons of the original model, taken at scattered positions in the face vector of the original model to avoid the bias that the spatial coherence of the sample could introduce.
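The level estimate described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the root cell is treated as a cube of side `root_side`, "can completely contain the triangle" is simplified to "cell side is at least the mean edge length", the 1% sample is taken with a fixed stride over the face vector, and the cap of 19 subdivisions is derived from the 57 path bits of the octcode (3 bits per level). All function and parameter names are hypothetical.

```python
import math


def mean_edge_length(vertices, faces, sample_fraction=0.01):
    """Estimate the mean triangle edge length from a strided sample of faces.

    A fixed stride spreads the sample over the whole face vector, which is
    one simple way to take faces "at scattered positions" (assumption).
    """
    step = max(1, round(1 / sample_fraction))  # every 100th face for 1%
    total, count = 0.0, 0
    for face in faces[::step]:
        a, b, c = (vertices[i] for i in face)
        for p, q in ((a, b), (b, c), (c, a)):
            total += math.dist(p, q)
            count += 1
    return total / count


def max_octree_level(root_side, edge_length, level_limit=19):
    """Deepest level whose cell side (root_side / 2**level) is >= edge_length.

    Simplified fitting criterion (assumption); level_limit is a hypothetical
    cap of 57 path bits / 3 bits per level.
    """
    if edge_length >= root_side:
        return 0
    level = int(math.floor(math.log2(root_side / edge_length)))
    return min(level, level_limit)
```

With this simplification, a larger mean edge length yields a shallower tree, which matches the intent of keeping a meaningful number of triangles in each leaf.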

Each node of the octree is referenced by an octcode based on Morton codes [Mor66]. As mentioned above, 64-bit octcodes allow 20 levels of depth and therefore models of very high resolution. For example, a digital model of a football pitch (100 x 65 m) would be represented at the maximum level of detail with cells 0.19 mm on a side, which is currently beyond the reach of 3D capture devices. As shown in figure 2, the five most significant bits store the level of the node, and the 57 least significant bits encode the path from the root to the node. This octcode is stored as a long integer.
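The octcode layout can be illustrated with a short sketch. The exact bit positions are an assumption, since only the field widths are given in the text: here the level occupies bits 63..59, the path occupies bits 56..0 with three bits per level, and the two remaining bits are left unused. The function names are hypothetical. Note that 57 path bits hold 19 child indices, and 100 m / 2^19 is roughly 0.19 mm, which agrees with the cell size quoted above.

```python
LEVEL_BITS = 5
PATH_BITS = 57
LEVEL_SHIFT = 64 - LEVEL_BITS      # assumed: level stored in bits 63..59
PATH_MASK = (1 << PATH_BITS) - 1   # assumed: path stored in bits 56..0


def make_octcode(path):
    """Pack a root-to-node path (child indices 0..7) into a 64-bit integer."""
    level = len(path)
    if 3 * level > PATH_BITS:
        raise ValueError("path does not fit in 57 bits")
    bits = 0
    for child in path:
        if not 0 <= child <= 7:
            raise ValueError("child index must be in 0..7")
        bits = (bits << 3) | child  # append three bits per level
    return (level << LEVEL_SHIFT) | bits


def octcode_level(code):
    """Return the level stored in the five most significant bits."""
    return code >> LEVEL_SHIFT


def octcode_path(code):
    """Recover the list of child indices from the root down to the node."""
    level = octcode_level(code)
    bits = code & PATH_MASK
    return [(bits >> (3 * (level - 1 - i))) & 7 for i in range(level)]
```

Because the level sits in the high bits under this layout, sorting octcodes as plain integers groups nodes by level first and by path second.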

3.2. Classification of the triangles of the original model and creation of leaf nodes

In the BP-Octree, the spatial indexing of the triangles was performed entirely in memory, since the models it worked with were not very large (<2M polygons). Working with models of several tens of millions of polygons requires auxiliary structures on disk, in the form of 200 MB temporary files. The number of these files depends on the size of the original model.

Each triangle is indexed by computing the octcodes of the leaf nodes it crosses, with each octcode corresponding to one leaf node. The tree cell that completely contains the triangle is determined, and triangle-in-box inclusion tests are then performed for each of its children, recursing on every positive result until the maximum level set for the tree is reached. For each leaf node reached, a tuple <octcode, idTriangulo> is stored in a vector which, on reaching 200 MB of memory, is sorted by octcode and saved to disk.

This ordering makes it possible to group all the triangles of a node by applying a classical merge algorithm, as shown in figure 3, so that one leaf node is created for each octcode that has triangles associated with it.
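The sort-then-merge scheme can be sketched as a standard external merge. This is an illustrative sketch, not the authors' code: the on-disk record format (pickled tuples), the function names, and the use of a pair count instead of a 200 MB byte threshold are all assumptions made to keep the example short.

```python
import heapq
import os
import pickle
import tempfile
from itertools import groupby


def write_sorted_run(pairs, directory):
    """Sort one in-memory buffer of (octcode, triangle_id) pairs and save it."""
    pairs.sort()
    fd, path = tempfile.mkstemp(dir=directory, suffix=".run")
    with os.fdopen(fd, "wb") as f:
        for pair in pairs:
            pickle.dump(pair, f)
    return path


def read_run(path):
    """Stream the pairs of one sorted run back from disk."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return


def index_triangles(pairs, directory, max_pairs_in_memory):
    """Yield (octcode, [triangle ids]) using sorted runs on disk plus a merge."""
    runs, buffer = [], []
    for pair in pairs:
        buffer.append(pair)
        if len(buffer) >= max_pairs_in_memory:  # stands in for the 200 MB limit
            runs.append(write_sorted_run(buffer, directory))
            buffer = []
    if buffer:
        runs.append(write_sorted_run(buffer, directory))
    # k-way merge of the sorted runs; equal octcodes become adjacent
    merged = heapq.merge(*(read_run(p) for p in runs))
    for octcode, group in groupby(merged, key=lambda pair: pair[0]):
        yield octcode, [triangle for _, triangle in group]
```

Each yielded group corresponds to one leaf node, so memory use during the merge is bounded by the number of runs rather than by the size of the model.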

3.3. Creation of bounding volumes and internal nodes

The creation of the bounding volumes follows the philosophy of the BP-Octree, although some algorithms had to be adapted to manage the huge amount of data now involved. One of them is the classification of the corners of the leaf nodes [Mel08] with respect to the original surface of the model (inside or outside), which is now also performed out-of-core and in three passes: first the edges aligned with the X axis, then those aligned with the Y axis, and finally those aligned with the Z axis. This classification is necessary because the structure of created leaves initially has no black or white nodes.

For each of the twelve edges of each leaf node, the number of times it is cut by a polygon must be computed, and the vertices of each edge classified according to the number and orientation of the cuts. Because these edges are shared by several leaf nodes, the approach of [Mel08] is changed to save space and computation time: the traversal runs over the edges of the leaf nodes rather than over the nodes themselves. Figure 4 shows how node 1 and node 2 share an edge; working on edges avoids traversing the nodes and computing the common cut points twice.

Once the corners of all the leaf nodes have been correctly classified, the bounding volume of each leaf node is created, following the same philosophy as in the BP-Octree.

Although the BP-Octree already provides a criterion for selecting the planes of the bounding volume, based on the k-medians algorithm [KR90] applied to the set of planes of the node, we have carried out a comparative study with other alternatives for selecting the percentage of planes to keep:

• Choosing the k% of planes with the smallest applied offset to form part of the bounding volume.
• Choosing k% of the planes at random.

Figure 3: Structure of temporary files for spatial indexing.

According to the occupied-volume plots (figure 7 and table 2), it is clear that the higher the percentage of planes used in the bounding volumes, the better they fit the original model, although this comes at the cost of longer computation time and more disk space, as shown in tables 4 and 3 respectively. Since the construction time is incurred only once, when the EBP-Octree is built, and the model loading times vary by less than 10% between random selection and k-medians, we conclude that selecting planes by clustering them according to their orientation is the most suitable criterion because of the advantages it provides.

Figure 4: Edge shared by two nodes, and its cut points.

As for the value of the parameter k, that is, how many planes to select, determining the optimal threshold is left to later studies with practical applications (e.g. haptic interaction), since a priori we have no measure other than the bounding volume.

To compute the bounding volumes of the internal nodes, exactly the same procedure as in the classical BP-Octree is followed, so that the bounding volume of each node is formed by a subset of the planes that form the bounding volumes of its child nodes, following the selection criterion described above. The fundamental difference from the BP-Octree approach is the use of external memory to manage resources efficiently.

4. Organization of EBP-Octree files

To work with models of several tens of millions of polygons, a file structure is created that gives efficient access to the required data. Seven files are used to store all the information needed by the EBP-Octree. This division makes it possible to access the desired information directly, since, for example, the data needed to display the model differ from those needed to compute an inclusion test. The extensions of the files are:

• .vtx, which stores the vertices that form the polygons of the original model.
• .tgl, which stores all the polygons of the model. Each line of the file holds the number of vertices of a polygon and the offsets in the .vtx file from which the coordinates of each vertex are read.
• .fcn, which stores the normals of the polygons of the original polygonal model.
• .geo, which stores, for each leaf node, the number of triangles that cut it and the position of each triangle in the .tgl file.
• .bvp, which stores the vertices that form the computed bounding volume of the nodes. More specifically, for each plane it stores an offset into the .fcn file giving the position of the plane's normal, the number of vertices of the bounding-volume polygon defined by that plane, and the coordinates of each of these vertices.
• .bpl, which stores the planes selected to define the bounding volume. It holds the number of selected planes and, for each plane, the pointer to its normal in the .fcn file and the offset applied to the plane so that it can be part of the bounding volume. It also stores the number of planes that form the clipped bounding volume of the node and a pointer into the .bvp file, where the vertices of the planes obtained after clipping the node are stored.
• .oct, the file that stores the octree itself, with the information needed to build the tree. The nodes are stored by level, starting with the root node and ending with the leaf nodes. The information stored for interior nodes differs from that stored for leaf nodes.

For an interior node, the following is stored:

• Two bytes that encode the type of its child nodes: black, white, grey or leaf (2 bits per child).

• A pointer or offset, within this same file, to the position where the first child node is stored.

• Another pointer or offset into the .bpl file, where the planes that form the bounding volume of the node are stored consecutively.

For leaf nodes, the following is stored:

• A pointer or offset into the .geo file to the actual triangles of the model that cut the leaf node.

• Another pointer or offset into the .bpl file, where the planes that form the bounding volume of the node are stored consecutively.

This whole file structure and the relationships between the files are shown graphically in figure 5.
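The interior-node record of the .oct file can be sketched as below. Only the two-byte child-type field (2 bits per child, 8 children) comes from the description above; the numeric codes for the node types, the bit order, the little-endian byte order and the 64-bit width of the two offsets are assumptions made for illustration, and the function names are hypothetical.

```python
import struct

# Assumed numeric codes for the four node types (not given in the text).
WHITE, BLACK, GREY, LEAF = 0, 1, 2, 3


def pack_child_types(types):
    """Pack the types of the 8 children into two bytes, 2 bits per child."""
    if len(types) != 8:
        raise ValueError("an octree node has exactly 8 children")
    word = 0
    for i, node_type in enumerate(types):
        word |= (node_type & 0b11) << (2 * i)  # child i uses bits 2i and 2i+1
    return struct.pack("<H", word)


def unpack_child_types(data):
    """Recover the 8 child types from the two-byte field."""
    (word,) = struct.unpack("<H", data)
    return [(word >> (2 * i)) & 0b11 for i in range(8)]


def pack_interior_node(types, first_child_offset, bpl_offset):
    """Child types, offset of the first child in .oct, offset into .bpl.

    The two offsets are written as unsigned 64-bit integers (assumption).
    """
    return pack_child_types(types) + struct.pack(
        "<QQ", first_child_offset, bpl_offset
    )
```

Under these assumptions an interior node occupies 18 bytes, and a reader can decide from the two type bytes alone which children need to be followed.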

5. Analysis of disk space and construction time

Table 1 shows the disk space occupied by the set of files for the Amazona model, of 25 million triangles, for each of the three plane selection criteria proposed above. The largest file is the one containing the points of the bounding volume of each node, obtained by intersecting the planes with one another. This file is needed only for collision computations between two EBP-Octrees and for visualization, so it could be discarded if the structure is used for haptic interaction.

As for the construction time of the whole structure, table 4 shows that some models require about an hour of processing. Subsequent loading always takes less than 10 seconds, so the computation time is acceptable because it is incurred only once. The overhead of running the k-medians algorithm is notable, although its good selection of planes (shown graphically in figure 10) is especially noticeable in the tree levels that are always kept in memory, which will result in collision detection with fewer cache misses or in a low-level-of-detail visualization closer to the original model. The difference between the Amazona model and the Moldura model is due to the number of polygons of the original model that cut the leaf nodes: the EBP-Octree of the Amazona was forced to a maximum octree level of 10 and has only 3 polygons on average cutting each leaf node, whereas the Moldura, at a maximum octree level of 10, has 22, which results in a longer computation time for the k-medians algorithm in each node. It can also be inferred that the Amazona could be managed in a tree with a lower maximum octree level.

File    Random   K-medians   K-medians*   Offset
.bpl    240.2    360.4       333.7        238.30
.bpo    76.3     76.3        76.3         76.3
.bvp    4505.6   5632.0      5427.2       4505.6
.geo    164.9    164.9       164.9        164.9
.oct    107.3    107.3       107.3        107.3
.tgl    19.1     19.1        19.1         19.1
.vtx    171.7    171.7       171.7        171.7
Total   5285.1   6531.7      6300.2       5283.2

Table 1: Disk space of the EBP-Octree structure in MBytes. 40% of planes selected except in the column marked with *, where 30% were selected.

The fundamental difference between the BP-Octree and the EBP-Octree is that the algorithms of the former are not designed to work with very large data models, whereas those of the EBP-Octree are, so it can handle models of several tens of millions of polygons. In this respect, the EBP-Octree is on the order of 10 times faster for the models on which the two could be compared (table 4).

Of the three plane selection methods, k-medians generates the best bounding volume, although the construction times of the EBP-Octree are longer. To choose the percentage of bounding planes for k-medians, we studied the bounding volume generated for the different models, using percentages between 10% and 40% in steps of 10. As table 2 shows for the Amazona model, the higher the percentage, the better the fit, which can be observed visually in figure 6. The drawback of taking a higher percentage of bounding planes is that the bounding volumes become more complex and need more points to be stored, which is why the generated files are larger (table 3). From figures 8 and 7 it can be deduced that an optimal percentage of selected bounding planes could lie between 30 and 40 percent.

Figure 5: File structure for managing the EBP-Octree.

Figure 6: Level-6 bounding volume generated by selecting the planes with the k-medians method and varying the percentage of selected planes: (a) 10%, (b) 20%, (c) 30%, (d) 40%.

The volume enclosed at each level of the EBP-Octree, relative to the original volume of the polygonal model, gives a metric of how well the bounding volumes fit the original surface. Figure 10 shows the volume contained at each level. Although from level 6 onwards the numerical values (table 5) fall below 10% of extra volume in all configurations, the k-medians algorithm with 30% of the planes selected achieves a better fit from level 3 onwards than random selection or the offset criterion with 40% of the candidate planes.

Level   k-m 40%   k-m 30%   k-m 20%   k-m 10%
0       384.84%   540.02%   623.28%   619.99%
1       301.92%   417.87%   523.44%   534.85%
2       187.08%   244.53%   301.42%   318.92%
3       137.75%   156.10%   188.38%   190.79%
4       115.47%   122.85%   134.91%   136.17%
5       106.78%   109.29%   113.42%   114.04%
6       102.92%   103.74%   105.14%   105.39%
7       101.34%   101.63%   102.04%   102.12%
8       100.75%   100.84%   100.95%   100.96%
9       100.52%   100.54%   100.57%   100.57%
10      100.44%   100.44%   100.45%   100.45%

Table 2: Volume of the EBP-Octree bounding volume for the Amazona model relative to the volume of the original polygonal model, using the k-medians algorithm for plane selection and varying the percentage of selected planes.

Model       Size   Level   AvgTri   EBPO-Random   EBPO-Km   EBPO-Km*   EBPO-Offset   BPO
Amazona     25M    11      3        570.4         733.1     670.7      568.6         -
Moldura     26M    11      22       2735.9        4161.5    3508.6     2697.0        -
Lucy        28M    11      14       2847.9        4142.0    3595.9     2845.0        -
Armadillo   150K   7       22       15.1          23.4      19.9       15.0          283.7
Gárgola     1.7M   9       12       140.2         206.2     180.6      138.7         2658.8

Table 4: Construction time of the EBP-Octree structure in seconds. 40% of planes selected except in the column marked with *, where 30% were selected. Size in triangles. Level: maximum level reached in the EBPO. AvgTri: mean number of triangles per leaf node. Models shown in Figure 9.

The analysis of disk space, construction time and enclosed volume leads us to conclude that the enclosed volume should be the predominant criterion since, as figure 6 shows, a tighter volume also yields a visualization closer to the original, with fewer distorting elements.

Figure 7: Volume of the bounding volumes up to level 6 for Amazona.

File    k-m 40%   k-m 30%   k-m 20%   k-m 10%
.bpl    360.4     333.7     315.5     314.4
.bpo    76.3      76.3      76.3      76.3
.bvp    5632      5427.2    5227.7    5219.5
.geo    164.9     164.9     164.9     164.9
.oct    107.3     107.3     107.3     107.3
.tgl    19.1      19.1      19.1      19.1
.vtx    171.7     171.7     171.7     171.7
Total   6531.7    6300.2    6082.4    6073.2

Table 3: Size in MB of the files generated with EBP-Octree for the Amazona model.

All data were obtained by running the programs and models on a personal computer with an i3-2310M processor at 2.1 GHz and 6 GB of DDR3 main memory.

6. Dynamic loading of the EBP-Octree

The process described in the previous section needs to be carried out only once per model. Once the EBP-Octree has been built, the model can be loaded as many times as desired without repeating the preliminary computations. It is enough to read the information from the files that are needed, depending on whether the model is to be displayed or used for point-in-solid inclusion tests, collision detection or ray tracing, to name some of the applications.

Figure 8: Size of the files generated with EBP-Octree for Amazona.

Level   Random    K-medians   K-medians*   Offset
0       472.18%   384.84%     540.02%      533.13%
1       374.86%   301.92%     417.87%      423.41%
2       230.47%   187.08%     244.53%      273.47%
3       161.37%   137.75%     156.10%      178.84%
4       127.32%   115.47%     122.85%      133.80%
5       111.70%   106.78%     109.29%      113.56%
6       104.88%   102.92%     103.74%      105.25%
7       102.12%   101.34%     101.63%      102.14%
8       101.04%   100.75%     100.84%      100.99%
9       100.62%   100.52%     100.54%      100.58%
10      100.46%   100.44%     100.44%      100.45%

Table 5: Volume of the EBP-Octree of the Amazona model relative to the volume of the original polygonal model. 40% of planes selected except in the column marked with *, where 30% were selected.

When working with models of several tens of millions of polygons, the data needed, for example, to display the model at its maximum resolution would overflow main memory. To avoid this problem, and given that only a small part of the model is used at any given time, the octree is not loaded into memory in full: only the nodes of the first levels, plus those deeper subtrees that are strictly necessary, are kept in memory, as represented schematically in figure 11. The cut level is defined as the maximum level of the octree that is kept in main memory. It is computed from the size of the main memory of the computer where the model is to be loaded, so that a computer with more memory loads more levels of the octree than one with less.

Figure 9: Some of the models used in the tests (Amazona, Lucy, Moldura and Gárgola).

Figure 10: Volume of the EBP-Octree relative to the polygonal model for Moldura (top) and Lucy (bottom).

Figure 11: Dynamic loading scheme.

Figure 12: Dynamic loading of level of detail.

The subtree cache is well suited to applications such as adaptive visualization or haptic interaction with the model, where large jumps in space, and hence massive loading and unloading of data, are very unlikely. In figure 12, the nodes at maximum depth are shown in green, those in a preloading state in red, and the nodes at the cut level of the tree in blue (in this case level 5 for the Amazona and 4 for Lucy).

The first tests of this dynamic loading show that the tree kept permanently in memory occupies between 8 MB and 12 MB, depending on the plane selection algorithm used, and that each maximum-level subtree in memory takes about 250 KB, amounts that are easily affordable even on mobile devices.
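A subtree cache of this kind can be sketched as follows. The text does not state the replacement policy, so least-recently-used eviction is an assumption here, as are the class name, the byte budget and the `load_subtree` callback (a hypothetical function that reads the subtree rooted at a cut-level node from disk and returns it together with its size in bytes).

```python
from collections import OrderedDict


class SubtreeCache:
    """Keeps recently used subtrees in memory within a byte budget.

    LRU eviction is an assumed policy; the permanent top levels of the
    tree are expected to live outside this cache.
    """

    def __init__(self, budget_bytes, load_subtree):
        self.budget_bytes = budget_bytes
        self.load_subtree = load_subtree  # hypothetical: octcode -> (subtree, size)
        self.used_bytes = 0
        self._entries = OrderedDict()     # octcode -> (subtree, size)

    def get(self, octcode):
        """Return the subtree rooted at octcode, loading it if necessary."""
        if octcode in self._entries:
            self._entries.move_to_end(octcode)  # mark as most recently used
            return self._entries[octcode][0]
        subtree, size = self.load_subtree(octcode)
        self._entries[octcode] = (subtree, size)
        self.used_bytes += size
        # Evict least recently used subtrees until the budget is respected,
        # always keeping the subtree that was just requested.
        while self.used_bytes > self.budget_bytes and len(self._entries) > 1:
            _, (_, old_size) = self._entries.popitem(last=False)
            self.used_bytes -= old_size
        return subtree
```

With subtrees of about 250 KB, a budget of a few megabytes would hold roughly a dozen of them, which fits the observation that large spatial jumps are rare in haptic interaction or adaptive visualization.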


7. Conclusions

We have presented a method for building a structure of spatially indexed bounding volumes (EBP-Octree) that can handle models of several tens of millions of polygons in real time. Because the construction of the EBP-Octree is independent of its loading and subsequent management, the file structure can be generated on a more powerful computer if there are time constraints, and then used on computers with less computing power.

We have also described a method for the adaptive dynamic loading of the volume hierarchy, which keeps in memory a structure enclosing around 110% of the original volume and loads, as a cache, those subtrees that are being used in more detail (as in haptic interaction or adaptive visualization).

Future work includes the parallelization of the EBP-Octree construction, the application of the bounding volume hierarchy to point-in-solid inclusion tests, which will be directly applicable to haptic interaction, and collision detection between two or more objects of several tens of millions of polygons. Work is also under way to optimize the size of the .bvp file, which contains a great deal of redundant information.

8. Acknowledgements

We thank Francisco Soler for his invaluable help in discussing and solving the problems encountered in implementing the external memory management, and the reviewers for their valuable comments, which helped improve and extend this work.

We also thank the Stanford 3D scanning repository for the use of the Lucy model, VCG-ISTI for the Gárgola model, and the Museo Municipal de Ecija for the model of La Amazona Herida, which is the property of the Junta de Andalucía.

This work has been partially funded by the Ministerio de Economía y Competitividad and the European Union (through FEDER funds) under research project TIN2011-25259.

References

[ABJN85] AYALA D., BRUNET P., JOAN R., NAVAZO I.: Object representation by means of nonminimal division of quadtrees and octrees. ACM Transactions on Graphics 4, 1 (1985).

[Ben75] BENTLEY J. L.: Multidimensional binary search trees used for associative searching. Commun. ACM 18, 9 (1975), 509–517. doi:http://doi.acm.org/10.1145/361002.361007.

[Ben97] VAN DEN BERGEN G.: Efficient collision detection of complex deformable models using AABB trees. Journal of Graphics Tools 2, 4 (1997), 1–13.

[BJ10] BARBIC J., JAMES D. L.: Subspace self-collision culling. SIGGRAPH 10 ACM SIGGRAPH 2010 papers (2010).

[CGG∗04] CIGNONI P., GANOVELLI F., GOBBETTI E., MARTON F., PONCHIO F., SCOPIGNO R.: Adaptive tetrapuzzles: efficient out-of-core construction and visualization of gigantic multiresolution polygonal models. ACM Transactions on Graphics 23, 3 (2004), 796–803. doi:http://doi.acm.org/10.1145/1015706.1015802.

[Cla76] CLARK J. H.: Hierarchical geometric models for visible surface algorithms. Commun. ACM 19, 10 (1976), 547–554. doi:http://doi.acm.org/10.1145/360349.360354.

[GLM96] GOTTSCHALK S., LIN M., MANOCHA D.: OBB-tree: A hierarchical structure for rapid interference detection. Proceedings SIGGRAPH'96 (1996), 171–180.

[GM05] GOBBETTI E., MARTON F.: Far voxels: a multiresolution framework for interactive rendering of huge complex 3d models on commodity graphics platforms. ACM Trans. Graph. 24, 3 (2005), 878–885. doi:http://doi.acm.org/10.1145/1073204.1073277.

[Hop96] HOPPE H.: Progressive meshes. In SIGGRAPH '96: Proceedings of the 23rd annual conference on Computer graphics and interactive techniques (New York, NY, USA, 1996), ACM Press, pp. 99–108.

[Hop97] HOPPE H.: View-dependent refinement of progressive meshes. Computer Graphics 31, Annual Conference Series (1997), 189–198.

[KHM∗98] KLOSOWSKI J., HELD M., MITCHELL J., SOWIZRAL H., ZIKAN K.: Efficient collision detection using bounding volume hierarchies of k-dops. IEEE Transactions on Visualization and Computer Graphics 4, 1 (1998), 21–36.

[KR90] KAUFMAN L., ROUSSEEUW P. J.: Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, 1990.

[LGS∗09] LAUTERBACH C., GARLAND M., SENGUPTA S., LUEBKE D., MANOCHA D.: Fast bvh construction on gpus. Computer Graphics Forum 28 (2009), 375–384.

[MCT05] MELERO F., CANO P., TORRES J.: Combining sp-octrees and impostors for multiresolution visualization. Computers & Graphics 29 (2005), 225–233.

[MCT08] MELERO F., CANO P., TORRES J.: Bounding-planes octree: A new volume-based lod scheme. Computers & Graphics 32, 4 (2008), 385–392.

[MCT10] MELERO F., CANO P., TORRES J.: Detección de colisiones en grandes modelos geométricos. In Actas del Congreso Español de Informática Gráfica 2010 (2010).

[Mel08] MELERO F.: BP-Octree: Una estructura jerárquica de volúmenes envolventes. PhD thesis, Univ. Granada, 2008.

[Mor66] MORTON G.: A computer oriented geodetic data base and a new technique in file sequencing. Tech. rep., IBM Ltd., 1966.

[PKS10] PABST S., KOCH A., STRASSER W.: Fast and scalable cpu/gpu collision detection for rigid and deformable surfaces. Computer Graphics Forum 29 (2010), 1605–1612.

[RLVN91] RADHA H., LEONARDI R., VETTERLI M., NAYLOR B.: Binary space partitioning tree representation of images. Journal of Visual Communications and Image Processing 2(3) (1991).

[Sha02] SHAPIRO V.: Handbook of Computer Aided Geometric Design. Elsevier, 2002, ch. Solid Modeling, pp. 473–518.

[SPO10] SCHVARTZMAN S. C., PEREZ A. G., OTADUY M. A.: Star-contours for efficient hierarchical self-collision detection. SIGGRAPH 10 ACM SIGGRAPH 2010 papers 29 (2010).

[SW83] SAMET H., WEBBER R.: Hierarchical data structures and algorithms for computer graphics. IEEE Comp. Graphics and Applications 8, 3 (1983), 48–68.

[TMHT10] TANG M., MANOCHA D., HILL C., TONG R.: Fast continuous collision detection using deforming non-penetration filters. I3D '10 Proceedings of the 2010 ACM SIGGRAPH symposium on Interactive 3D Graphics and Games (2010), 7–13.

[VMTS10] VOGIANNOU A., MOUSTAKAS K., TZOVARAS D., STRINTZIS M. G.: Enhancing bounding volumes using support plane mappings for collision detection. Computer Graphics Forum 29 (2010), 1595–1604.

[YSGM04] YOON S.-E., SALOMON B., GAYLE R., MANOCHA D.: Quick-vdr: interactive view-dependent rendering of massive models. 131–138. doi:10.1109/VISUAL.2004.86.

[YSLM04] YOON S., SALOMON B., LIN M. C., MANOCHA D.: Fast collision detection between massive models using dynamic simplification. Eurographics Symposium on Geometry Processing (2004), 136–146.



CEIG - Spanish Computer Graphics Conference (2013)

M. Carmen Juan and Diego Borro (Editors)

NavMeshes with Exact Clearance for Different Character Sizes

R. Oliva & N. Pelechano

Universitat Politècnica de Catalunya

Abstract

Navigation in virtual environments for autonomous characters is typically handled by the combination of a path planning algorithm, which decides the cells to walk through in the navigation mesh, and a local movement algorithm that carries out the frame-to-frame trajectory within each cell. Local movement is driven by intermediate goals (attractors) along the portals that connect cells in the navigation mesh. In both cases, clearance should be taken into consideration, since it is relevant when choosing the right sequence of cells that each character can walk through, and also when deciding the location of goals within portals. Previous work has considered clearance for path planning, but it has not been taken into account when assigning attractors within portals. We demonstrate in this work that although a path with clearance guarantees that the character can walk through, it does not guarantee a collision-free trajectory. In this work we present three novelties: first, a general method for calculating clearance in Navigation Meshes consisting of convex cells of any type, which allows for a small degree of concavities; second, a novel method for assigning attractors within portals that guarantees collision-free paths; and third, a new method to dynamically locate attractors over portals based on current trajectory, destination, and clearance.

Categories and Subject Descriptors (according to ACM CCS): I.3.7 [Computer Graphics]: Three-Dimensional

Graphics and Realism—Animation I.2.9 [Artificial Intelligence]: Robotics—Workcell organization and planning

1. Introduction

Simulation of autonomous agents is an important topic for applications such as video games where visually convincing paths are required. Characters need to move towards their destination through the most suitable path, maintain a certain amount of clearance with respect to obstacles and smoothly avoid collisions with other characters.

The most popular approach to deal with navigation of autonomous characters consists of creating a data structure, typically a Navigation Mesh, that encodes the free space of the scene by splitting it into convex polygons, known as cells. A Cell-and-Portal Graph (CPG) is obtained where a node represents a cell of the partition and a portal is an edge of the graph that connects two adjacent cells. Then, given a start and a goal position, paths can be calculated through a variant of the classic A* algorithm. Finally, at every step of the simulation, a local movement algorithm is applied in order to guide the agent through the obtained path by computing intermediate goal positions (commonly known as waypoints) that connect the different nodes of the path.

When using a variety of characters it is convenient to be able to calculate the shortest route for the characters based on their size. If we think of applications such as video games, this would allow a skinny character to escape from a large monster by running through a narrow passage. Efficiency is also a key aspect, since we probably need many characters updating their path computation in the same frame over a large scenario. Path computation should take a very small amount of time (in the order of milliseconds), since the system also needs to deal with physical simulation, blending and rendering. Previous work is either bounded to a specific amount of clearance, or only works with a specific type of Navigation Mesh.
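The idea of size-dependent routes over a Cell-and-Portal Graph can be illustrated with a small A* sketch. It is a deliberately simplified stand-in and not the clearance method of this paper: each portal is assumed to carry a precomputed `clearance` value (a hypothetical field), a portal is usable when that value is at least the character's diameter, and cells are reduced to a representative point such as the centroid. All names are hypothetical.

```python
import heapq
import math


def find_path(cells, portals, start, goal, radius):
    """A* over a cell-and-portal graph, skipping portals too narrow for the agent.

    cells:   {cell_id: (x, y)} representative point of each cell (assumed centroid)
    portals: {cell_id: [(neighbour_id, clearance), ...]} with assumed
             precomputed clearance values
    Returns the list of cell ids from start to goal, or None if no route fits.
    """
    def heuristic(cell):
        return math.dist(cells[cell], cells[goal])

    open_heap = [(heuristic(start), 0.0, start)]
    best_cost = {start: 0.0}
    parent = {start: None}
    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        if cost > best_cost[cell]:
            continue  # stale heap entry
        for neighbour, clearance in portals.get(cell, []):
            if clearance < 2 * radius:
                continue  # the character does not fit through this portal
            new_cost = cost + math.dist(cells[cell], cells[neighbour])
            if new_cost < best_cost.get(neighbour, math.inf):
                best_cost[neighbour] = new_cost
                parent[neighbour] = cell
                heapq.heappush(
                    open_heap, (new_cost + heuristic(neighbour), new_cost, neighbour)
                )
    return None
```

In this toy setting a small character takes the short route through a narrow portal while a larger one is forced onto a longer detour, which is the behaviour motivated by the skinny character and large monster example above.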

Another key focus to achieve plausible paths consists of assigning the right waypoints to steer local movement. Most of the previous work is based on computing a single point over the portal (usually at the center, or close to one endpoint). In this case, since all agents share the same waypoint, they tend to line up near the portals, increasing the probability of collisions and bottlenecks. This problem would be alleviated if attractors could be set per character along the whole length of the portal.

Main Contributions. We present a novel method for calculating clearance in Navigation Meshes consisting of convex cells of any shape. We also introduce a new method that assigns attractors to portals in a dynamic fashion, thus guaranteeing collision-free trajectories when crossing portals, as well as making use of the whole length of the portal when possible to avoid forcing all agents to walk through a fixed location. We have integrated our local movement algorithm with arbitrary clearance and dynamic attractors into the Navigation Meshes obtained with NEOGEN [OP13][Appendix: A]. The algorithm is straightforward and computationally efficient enough to allow simulation of crowds. The average time taken by each agent to calculate the attractor for each portal is 4.5µs.

2. Related Work

Path planning of autonomous characters in virtual environ-

ments is a central problem in the fields of robotics, video

games and crowd simulation. The most popular solutions

are based on a combination of Global and Local Movement

techniques. The target of Global Navigation Techniques is

to provide a representation of the free space of the scene,

which is usually obtained by either constructing a RoadMap

or a Navigation Mesh. The main objective of both methods

is to generate a graph that can be used by a search algorithm

(usually the A* [HNR68]) to find a path free of obstacles

between two points in the scene.

The Roadmap approach [ACF01] [You01] [SGA∗07]

[RA11] captures the connectivity of the free space by using

a network of standardized paths (lines, curves). The main

limitation of this representation is that it does not describe the

geometry of the scene, nor where the obstacles are. Consequently,

the avoidance of dynamic obstacles is usually a hard

task and not always possible, as discussed in [SGA∗07].

The Navigation Mesh approach [Sno00] [Toz02] [Kal05]

[vTCG12] [OP13] consists of the partition of the naviga-

ble space of the scene into convex regions, guaranteeing

that a character can move between two points of a cell fol-

lowing a straight line, without getting stuck in local min-

ima. Currently NavMeshes have become more popular than

RoadMaps as the representation of the free space is more

intuitive and provides a better description of the free space

and the location of the obstacles. Therefore we focus on this

Global Navigation Technique.

The target of Local Movement Techniques is to provide

a mechanism for the autonomous characters to move from

one location to the next one in the path in a smooth and nat-

ural manner, while avoiding collisions against dynamic ob-

stacles. These methods are generally driven by setting way-

points within the portals of the NavMesh that work as attrac-

tors to steer the agents in the right direction [Rey87] [Rey99]

[PAB07] [vdBLM08] [vdBPS∗08] [SvdBGM11]. The main

problem with this approach is that characters tend to line

up as they share the same attractor point over the portals.

An improvement to traditional waypoints was introduced

in [CSM12] by using way portals where the whole length

of the portal can be used to attract the local movement of the

agents, thus resulting in more natural-looking paths. However,

the problem of clearance is not properly addressed in

that work, since it assumes that a cell is accessible by a character

if the length of its portals is greater than or equal to the

diameter of the character, which is not always the case, as we

will show in this paper.

In order to carry out path planning while guaranteeing that the

resulting paths have an arbitrary amount of clearance,

a common solution consists of enlarging the obstacles by

a specific amount of clearance, an operation known as the Minkowski

sum. An example of an application using this method is Recast

[Mon09]. The main advantage of this approach is that

every calculated path has the desired amount of clearance

and it is calculated offline, so it does not have an impact on

the performance of the path finding algorithm being used.

However, its major drawback is that it is bound to a specific

value of clearance, so all characters must have the same

size, or at most the specific clearance size.

[Kal10] introduced a new type of triangulation called Lo-

cal Clearance Triangulation (LCT) that computes paths free

of obstacles with arbitrary clearance. Such triangulation is

obtained by a process that iteratively refines the Constrained

Delaunay Triangulation (CDT) resulting from the starting

set of obstacles. The resulting structure allows the algorithm

to determine if there exists a path free of obstacles for a given

clearance value. However, it introduces more cells in the partition

of the scene, thus degrading the performance of the path

finding algorithm. Another limitation of the method is that it

only works for the described LCT and cannot be generalized

to any navigation mesh.

In [Ger10] the Medial Axis of the set of obstacles is ex-

tracted, and a particular type of NavMesh called Explicit

Corridor Map (ECM) is constructed by adding extra edges.

The ECM allows computing the shortest path with the largest

amount of clearance, or any path in between. Straight skele-

tons have also been used to calculate roadmaps for path find-

ing of multiple characters [HLD07].

3. Cell Clearance Values

The clearance value of a cell depends on how we cross this

cell. Given a cell CX , an entry portal P1 and an exit por-

tal P2, we can classify the obstacle edges of the cell into

edges to the left (leftString) and edges to the right (right-

String) with respect to the path that crosses the cell from the

entry portal to the exit portal (see Figure 1). Notice that it is

not necessary to have strictly convex cells, as cells generated

by NEOGEN [OP13] are allowed to have certain concavi-

ties depending on the convexity relaxation threshold chosen




when creating the mesh. There are two elements that can

make the clearance of the cell smaller than the length of its

portals: small concavities that are allowed by the user when

creating the NavMesh because they barely affect the local

movement algorithm, and the endpoints of the portals, as we

can see in the example shown in Figure 1.

The algorithm proceeds by iterating over every notch (i.e.,

a vertex whose internal angle is greater than π) present

in leftString and determining, for each notch, the closest edge in rightString.

The distance between the notch and this closest edge

is the clearance value of this notch. Notice that in this case,

the endpoints of each string must be treated as if they were

notches. The clearance value of the left string cLeft is the

minimum of those distances. To compute the clearance value

of the right string cRight, we proceed in the same way. Fi-

nally, the clearance value of the described path is computed

as follows:

$$cl(P_1, P_2) = \min\left(\mathrm{length}(P_1),\ \mathrm{length}(P_2),\ \mathit{cLeft},\ \mathit{cRight}\right) \qquad (1)$$

Figure 1: Clearance calculation for a given cell with clear-

ance given by allowed concavities (top) or by an endpoint of

a portal (bottom)

Notice that it is only necessary to check the distance of the

notches of the string against the edges of the opposite string,

as in the case of a convex vertex the distance to the opposite

string must be greater than or equal to the clearance value of the

cell. This process is done offline once the NavMesh of the

virtual scenario has been generated. For each cell we store

in a table the clearance value of every possible cell crossing

(meaning every combination of entry/exit portals within

the cell), since it is possible to have a cell with three or more

portals, where an agent with a large radius can walk for ex-

ample from portal P1 to P2, but not from portal P1 to P3

(see Figure 2).
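The computation behind Equation (1) can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the implementation evaluated in this paper: the function names are hypothetical, each string is assumed to be a list of 2D vertices ordered along the obstacle boundary, and for simplicity every vertex of a string (rather than only its notches and endpoints) is tested against the edges of the opposite string. Testing the extra convex vertices costs a little more work but cannot lower the result below the true distance between the two strings.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    denom = dx * dx + dy * dy
    if denom == 0.0:  # degenerate segment: a single point
        return math.dist(p, a)
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / denom
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))

def string_clearance(string, opposite):
    """Minimum distance from the vertices of `string` to the edges of `opposite`."""
    edges = list(zip(opposite, opposite[1:])) or [(opposite[0], opposite[0])]
    return min(point_segment_distance(v, a, b) for v in string for a, b in edges)

def crossing_clearance(entry_portal, exit_portal, left_string, right_string):
    """Clearance of one cell crossing, following the structure of Equation (1)."""
    c_left = string_clearance(left_string, right_string)
    c_right = string_clearance(right_string, left_string)
    return min(math.dist(*entry_portal), math.dist(*exit_portal), c_left, c_right)
```

In an offline pass, a function like this would be called once per combination of entry and exit portals of each cell, and the results stored in the per-cell table described above.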

4. Portal Clearance

With the process described above, given a starting position,

a goal position, and the radius r of the character, the path

finding algorithm can determine if there is a path from the

starting position to the goal position with enough clearance.

Figure 2: Example of different clearance depending on the

crossing path through a cell.

Suppose that such a path exists. Then let CA be the cell where

the character is currently located, CB be the next cell in the

path, and P the portal that joins both cells. We want to calculate

the sub-segment P ′ of P such that all points in P ′

have enough clearance. The idea is to shrink the portal P by

displacing the obstacle edges of CA and CB a distance of r

towards the interior of each cell.

Figure 3: CA and CB separated by P (left) are in fact two in-

dependent polygons with their vertices oriented in counter-

clockwise order, so P is the overlapping of PAB and PBA

(right).

CA and CB are two oriented polygons where the vertices

are given in counter-clockwise order. P can then be treated

as two identical overlapping segments given in opposite order,

depending on which cell they belong to. We can then refer

to them as PAB for the oriented edge that belongs to CA, and PBA for the oriented edge that belongs to CB. Figure 3

depicts this situation.

Given a cell $C_X$, we will have a set of vertices in counter-clockwise order $v_0, v_1, \ldots, v_n$, where each consecutive pair of vertices in the sequence defines an oriented edge of the cell, i.e., $\vec{e}_{(i,i+1)}$ is the edge starting at vertex $v_i$ and ending at vertex $v_{i+1}$, for $i \in [0, n-1]$. We define the shrinking direction of an edge, $\vec{s}_{(i,i+1)}$, as the vector perpendicular to the edge with its direction pointing towards the interior of the cell.

The algorithm for finding portals with enough clearance

proceeds by reducing the size of the original portals based

on the following three cases:

1. Portal length limited simply by r (it always applies).

2. Limitations given by other portals.

3. Limitations given by edges of the current cell.

Case 1: The algorithm starts by displacing each endpoint of $P_{AB}$ a distance of r units towards the center of the portal. The resulting sub-segment $P'_{AB}$ has enough clearance only if the rest of the edges in $C_A$ are at a distance greater than r from $P'_{AB}$, as we can see in Figure 4. Otherwise, this sub-segment must be further refined to guarantee collision-free traversability.

Figure 4: In this example, the portal $P'_{BA}$ of the cell $C_B$ is initialized by displacing the endpoints of the original portal $P_{BA}$ a distance r towards its center.
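As an illustration of Case 1, the following minimal Python sketch moves both endpoints of a portal a distance r towards its center. The function name is hypothetical, and returning None for a portal shorter than 2r is an assumption of this sketch rather than something stated in the text.

```python
import math

def shrink_by_radius(portal, r):
    """Move each endpoint of `portal` a distance r towards the portal center.

    `portal` is a pair of 2D points. Returns None when no sub-segment is left
    (assumption of this sketch: a portal shorter than 2r cannot be crossed).
    """
    (ax, ay), (bx, by) = portal
    length = math.hypot(bx - ax, by - ay)
    if length == 0.0 or length < 2.0 * r:
        return None
    ux, uy = (bx - ax) / length, (by - ay) / length  # unit direction a -> b
    return ((ax + r * ux, ay + r * uy), (bx - r * ux, by - r * uy))
```

The result is only the starting point of the shrinking process: it still has to be checked against the remaining edges of the cell, as the other two cases describe.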

Case 2: Let $C_B$ be an intermediate cell on the computed path, i.e., a cell that is neither the starting cell of the path nor the final one. In this case, we have to cross the cell through two portals, an entry portal $P_{BA}$ and an exit portal $P_{BC}$. In such a situation it is possible that the endpoints of $P'_{BA}$ are determined by the endpoints of $P_{BC}$. This occurs when one (or both) of the endpoints of $P_{BC}$ is at a distance less than the desired clearance value from the entry portal $P_{BA}$. To handle this situation, we check if a circumference centered on an endpoint of $P_{BC}$ intersects $P'_{BA}$. If this intersection exists, we update $P'_{BA}$ accordingly. Let us define $P_{BA}[0]$ as the origin of the oriented portal $P_{BA}$, and $P_{BA}[1]$ as the end. Since the portals are also given in counter-clockwise order, we can state that the origin of any portal can only limit the clearance of the end of the portal for which we are calculating clearance: $P_{BC}[0]$ can only shorten $P'_{BA}[1]$, and $P_{BC}[1]$ can only shorten $P'_{BA}[0]$. The algorithm to further shorten $P'_{BA}$ continues through the following two cases:

• If the circumference centered on $P_{BC}[0]$ intersects $P'_{BA}$ in a single point, then $P'_{BA}[1]$ is set to be this intersection point.

• If the circumference centered on $P_{BC}[0]$ intersects $P'_{BA}$ in two points, then $P'_{BA}[1]$ is set to be the intersection point that is furthest from $P_{BA}[1]$.

Symmetrically, a circumference centered on $P_{BC}[1]$ is checked for intersections against $P'_{BA}$ to determine if $P'_{BA}[0]$ needs to be updated (see Figure 5).

Figure 5: The endpoint $P'_{BA}[1]$ of the entry portal is determined by the endpoint $P_{BC}[0]$ of the exit portal, since the circumference centered on $P_{BC}[0]$ intersects $P'_{BA}$. The other end of $P'_{BA}$ is not modified, since the circumference centered on $P_{BC}[1]$ does not intersect $P'_{BA}$.
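The Case 2 test can be sketched as a circle-segment intersection. The Python code below is a minimal sketch, not the authors' code: the function names are hypothetical, and the circle radius is assumed to be the character radius r, which the text does not state explicitly. Intersections are expressed as a parameter t along the shrunk entry portal, with t = 0 at its origin and t = 1 at its end, so "furthest from the end" becomes "smallest t".

```python
import math

def circle_segment_params(center, radius, seg):
    """Sorted parameters t in [0, 1] where seg[0] + t*(seg[1]-seg[0]) lies on the circle."""
    (x0, y0), (x1, y1) = seg
    dx, dy = x1 - x0, y1 - y0
    fx, fy = x0 - center[0], y0 - center[1]
    a = dx * dx + dy * dy
    b = 2.0 * (fx * dx + fy * dy)
    c = fx * fx + fy * fy - radius * radius
    disc = b * b - 4.0 * a * c
    if a == 0.0 or disc < 0.0:
        return []
    root = math.sqrt(disc)
    candidates = {(-b - root) / (2.0 * a), (-b + root) / (2.0 * a)}
    return sorted(t for t in candidates if 0.0 <= t <= 1.0)

def point_at(seg, t):
    (x0, y0), (x1, y1) = seg
    return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))

def limit_by_exit_portal(entry_shrunk, exit_portal, r):
    """Shorten the shrunk entry portal using the endpoints of the exit portal."""
    start, end = entry_shrunk
    # The origin of the exit portal can only shorten the end of the entry portal.
    ts = circle_segment_params(exit_portal[0], r, (start, end))
    if ts:
        end = point_at((start, end), ts[0])  # intersection furthest from the end
    # Symmetrically, the end of the exit portal can only shorten the origin.
    seg = (start, end)
    ts = circle_segment_params(exit_portal[1], r, seg)
    if ts:
        start = point_at(seg, ts[-1])  # intersection furthest from the origin
    return (start, end)
```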

Case 3: The algorithm proceeds by displacing each obstacle edge $\vec{e}_{(i,i+1)}$ a distance of r units along its shrinking direction $\vec{s}_{(i,i+1)}$. (Note that this step only needs to be performed if the shrinking direction points towards the portal of interest; otherwise there is no possible intersection after the displacement and we can skip it.) Then, for each displaced edge $\vec{e}'_{(i,i+1)}$, we calculate whether there is an intersection with $P'_{AB}$, and if so the corresponding endpoint of $P'_{AB}$ is updated depending on the direction of $\vec{e}'_{(i,i+1)}$ as follows:

• If $v_i$ is at the left side of $P'_{AB}$ and $v_{i+1}$ is at the right side of $P'_{AB}$, $P'_{AB}[0]$ is set to be the intersection point.

• If $v_i$ is at the right side of $P'_{AB}$ and $v_{i+1}$ is at the left side of $P'_{AB}$, $P'_{AB}[1]$ is set to be the intersection point.

The same process is performed for $P'_{BA}$ and finally, $P'$ is computed as the resulting sub-segment of the intersection between $P'_{AB}$ and $P'_{BA}$. Every point in $P'$ is guaranteed to have enough clearance. Figure 6 shows the result of the algorithm.
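Case 3 and the final intersection step can be sketched as follows. This is a minimal Python sketch with hypothetical function names. It assumes counter-clockwise cells, so that the shrinking direction of an edge is its left normal, and it applies the left/right test to the endpoints of the displaced edge, which is our reading of the rule above rather than a detail confirmed by the text.

```python
import math

def cross(o, a, b):
    """Z component of (a - o) x (b - o); positive when b is left of the line o -> a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def displace_edge(v0, v1, r):
    """Translate the edge v0 -> v1 by r along its left normal (cell interior for CCW cells)."""
    dx, dy = v1[0] - v0[0], v1[1] - v0[1]
    n = math.hypot(dx, dy)
    nx, ny = -dy / n, dx / n
    return (v0[0] + r * nx, v0[1] + r * ny), (v1[0] + r * nx, v1[1] + r * ny)

def limit_by_obstacle_edges(shrunk, obstacle_edges, r):
    """Shorten the shrunk portal wherever a displaced obstacle edge crosses it."""
    start, end = shrunk
    for v0, v1 in obstacle_edges:
        d0, d1 = displace_edge(v0, v1, r)
        s0, s1 = cross(start, end, d0), cross(start, end, d1)
        if s0 * s1 >= 0.0:
            continue  # displaced edge does not cross the portal line
        t = s0 / (s0 - s1)
        hit = (d0[0] + t * (d1[0] - d0[0]), d0[1] + t * (d1[1] - d0[1]))
        px, py = end[0] - start[0], end[1] - start[1]
        u = ((hit[0] - start[0]) * px + (hit[1] - start[1]) * py) / (px * px + py * py)
        if not 0.0 <= u <= 1.0:
            continue  # crossing lies outside the current sub-segment
        if s0 > 0.0:
            start = hit  # edge goes from the left side to the right side
        else:
            end = hit    # edge goes from the right side to the left side
    return (start, end)

def overlap(seg_ab, seg_ba):
    """Common sub-segment of two collinear segments, oriented like seg_ab, or None."""
    (ax, ay), (bx, by) = seg_ab
    dx, dy = bx - ax, by - ay
    len2 = dx * dx + dy * dy
    params = [((p[0] - ax) * dx + (p[1] - ay) * dy) / len2 for p in seg_ba]
    lo, hi = max(0.0, min(params)), min(1.0, max(params))
    if lo > hi:
        return None
    return ((ax + lo * dx, ay + lo * dy), (ax + hi * dx, ay + hi * dy))
```

For example, with a portal from (0, 0) to (4, 0), r = 1, and a wall running from (2, 4) down to (0, 0), the origin of the shrunk portal moves from x = 1 to roughly x = 1.118, the first point on the portal that is a full unit away from the slanted wall.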

To accelerate the computation of shrinking portals, we

only perform this computation the first time that a given

clearance is needed for a cell, and then we store the results

in a look-up table. Also for those cases where P ′ matches

the result of case 1 for r equal to the clearance of the cell, it

is not necessary to carry out any further calculations for any

character size.
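The look-up table mentioned above can be sketched as a small cache keyed by the cell crossing and the clearance value. This is a minimal Python sketch: the class name, the shape of the crossing key, and the `compute` callback are all hypothetical.

```python
class ShrunkPortalCache:
    """Stores shrunk portals so that each (crossing, r) pair is computed only once."""

    def __init__(self, compute):
        self._compute = compute  # hypothetical callable: (crossing, r) -> shrunk portal
        self._table = {}

    def get(self, crossing, r):
        key = (crossing, r)
        if key not in self._table:  # first request for this clearance value
            self._table[key] = self._compute(crossing, r)
        return self._table[key]
```

Here `crossing` could be any hashable identifier, for instance a tuple (cell id, entry portal id, exit portal id). Using the clearance value directly as part of the key works well when characters come in a small number of discrete sizes, as in the experiments of this paper.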

5. Computing Dynamic Way Points

In the case of Navigation Meshes, the method used to steer

the character from one cell to another is a key step to create

natural, good-looking routes. First of all, we check if the

goal position of the character is visible from its current po-

sition, i.e., the segment joining the current and the goal po-

sition of the character only produces intersection with portal

type edges. In that case, the attractor point is simply the goal

position. If the segment does intersect with at least one ob-

stacle edge, we need to compute an attractor point over the

next portal in the path to steer the character towards the next

cell of the path. Our target is to avoid characters having the




Figure 6: This figure shows the shrinking process that affects

a portal P calculated for each of the connected cells and

the final result P ′ after calculating the intersection of the

intermediate solutions $P'_{AB}$ and $P'_{BA}$.

same attractor point, so we compute the orthogonal projec-

tion point q of the current position of the character p over

P ′, where P ′ is the shrunk portal after applying the algo-

rithm described in the previous section over the portal P . If

q lies outside the limits of P ′, then the furthest endpoint of

P ′ with respect to the current position of the character is selected

as a temporary attractor, until q is valid. Notice that since

the current position of the characters changes from frame to

frame affected by the local movement rules, it is unlikely

that two different characters share the same attractor point

over the portal. In addition, P ′ guarantees that any attractor

point will have enough clearance.

Figure 7: The attractor point of character ch1 is the projection

of its current location over P ′. In the case of ch2, the

temporary attractor point is one of the endpoints of P ′ (q2), as

the projection point of the current location of ch2 lies outside

P ′.

We have determined empirically that in the case of q being

invalid, the furthest endpoint of P ′ is a better candidate as

temporary attractor than the closest one. This is because when

the steering attractor is the nearest endpoint, the character

tends to move too close to the walls, producing a poor-quality

route.
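The attractor selection described in this section can be sketched in a few lines of Python. The function name is hypothetical, and the sketch covers only the projection step: the preliminary visibility test against the goal position is assumed to have been done by the caller.

```python
import math

def attractor(position, shrunk_portal):
    """Attractor on the shrunk portal for a character at `position`.

    Returns the orthogonal projection q of the position when it falls inside
    the shrunk portal, and otherwise the endpoint furthest from the character.
    """
    a, b = shrunk_portal
    dx, dy = b[0] - a[0], b[1] - a[1]
    len2 = dx * dx + dy * dy
    if len2 == 0.0:  # degenerate portal: only one candidate point
        return a
    t = ((position[0] - a[0]) * dx + (position[1] - a[1]) * dy) / len2
    if 0.0 <= t <= 1.0:
        return (a[0] + t * dx, a[1] + t * dy)
    return max((a, b), key=lambda endpoint: math.dist(position, endpoint))
```

Because the function depends only on the current position, it can be evaluated every frame, and the attractor slides along the portal as the local movement rules push the character around.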

6. Results

The following results have been obtained on an Intel Core 2

Quad Q9300 @ 2.50GHz with 4GB of RAM. Table 1 shows the

time spent per query in milliseconds by the A* with clearance

algorithm, as well as the number of successful paths

that have been obtained over the total. Each of the maps used

for the experimental results has been tested with a total of

1000 path queries with randomly chosen start and goal cells,

and a clearance value in the set {0.5, 1.0, 1.5}.

Map    Metric        cl = 0   cl = 0.5   cl = 1.0   cl = 1.5
Map1   Time/Q (ms)   0.492    0.472      0.420      0.239
       Paths         1000     933        761        491
Map2   Time/Q (ms)   0.761    0.704      0.648      0.223
       Paths         1000     969        867        227
Map3   Time/Q (ms)   2.215    2.099      1.975      1.648
       Paths         1000     967        877        741

Table 1: For each of the test maps, we show the time spent

per query in milliseconds, and the number of successful

paths found from the total of 1000 random queries, depend-

ing on the clearance value required.

Map1 is a very simple scene that represents two rooms

connected through two corridors of different width with

some simple obstacles; it contains 10 cells and 12 portals.

Map2 is a NavMesh of a single layer environment contain-

ing some columns randomly distributed and orientated, con-

taining a total of 104 cells and 149 portals. Map3 is a scene

of similar characteristics to that in Map2, but is larger, con-

taining a total of 141 cells and 208 portals. All Navigation

Meshes have been obtained using NEOGEN [OP13].

As expected, the performance of A* depends on the num-

ber of successful queries. As we increase the clearance

value, more nodes will be discarded at an early stage, thus

resulting in faster searches. We can see in the table how A*

with increasing clearance results in fewer successful paths in

the Navigation Mesh, thus reducing the computational time.

Maps 2 and 3 are similar in complexity (and so the total num-

ber of paths found depending on clearance values are rela-

tively similar) but Map3 contains a higher number of cells

and that is why the time needed to find paths is much higher.
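The early discarding of nodes mentioned above can be illustrated with a small A* sketch in Python. This is not the implementation measured in Table 1. It assumes a hypothetical graph in which nodes stand for portals (plus the start and goal positions), each edge stands for one cell crossing annotated with its precomputed clearance, and `required` is the clearance demanded by the character.

```python
import heapq
import math

def astar_with_clearance(graph, positions, start, goal, required):
    """A* search that ignores crossings whose stored clearance is too small.

    graph[u] is a list of (v, cost, clearance) tuples and positions[u] is a 2D
    point used by the straight-line heuristic. Returns a list of nodes or None.
    """
    h0 = math.dist(positions[start], positions[goal])
    open_heap = [(h0, 0.0, start, [start])]
    best = {start: 0.0}
    while open_heap:
        _, g, node, path = heapq.heappop(open_heap)
        if node == goal:
            return path
        if g > best.get(node, math.inf):
            continue  # stale heap entry
        for nxt, cost, clearance in graph.get(node, []):
            if clearance < required:
                continue  # crossing too narrow for this character: discard early
            ng = g + cost
            if ng < best.get(nxt, math.inf):
                best[nxt] = ng
                h = math.dist(positions[nxt], positions[goal])
                heapq.heappush(open_heap, (ng + h, ng, nxt, path + [nxt]))
    return None
```

With a larger `required` value more edges are skipped, so the open list stays smaller and failed queries terminate sooner, which is consistent with the trend reported in Table 1.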

Table 2 shows the time spent per query (microseconds)

for Map1 described previously. Multiple versions of the al-

gorithm have been implemented:




• Fast(-): A fast but non-exact solution used as a reference,

that simply calculates the waypoint as the midpoint of the

shrunk portal.

• Fast(+): The fast version described above but with the dy-

namic waypoint computed as described in section 5.

• Exact(-): The exact clearance solution described in sec-

tion 4 with the waypoint calculated as the midpoint of the

shrunk portal.

• Exact(+): The exact clearance version (section 4) combined

with the dynamic waypoint computation (section

5).

In all implemented versions of the algorithm, the previ-

ously computed portals are stored in a look-up table to avoid

redundant queries. Each test case consisted of a set of queries

where, for each query, we randomly chose a cell of the Nav-

igation Mesh, a way to cross this cell and a clearance value

(0.5, 1, or 1.5).

#queries   Fast(-)   Fast(+)   Exact(-)   Exact(+)
10         6.222     6.596     8.222      8.596
50         5.782     6.156     7.062      7.436
100        5.042     5.416     6.242      6.616
500        4.092     4.466     4.668      5.042
1000       3.794     4.168     4.115      4.489

Table 2: Time spent per query in microseconds for the different

implementations of the portal shrinking algorithm.

The results of this experiment highlight the efficiency of

our method. The efficiency of the algorithm increases with

the number of queries as the chance of producing a redundant

query is higher, and eventually, every query will be redundant.

In the example shown in Table 2, for the case with

1000 randomized queries, the cost of Exact(+) is just 1.08

times the cost of Fast(+) and 1.09 times the cost of Exact(-).

So the algorithm presented in this paper (Exact(+)) is prac-

tically as efficient as other simpler implementations, but it

greatly enhances the quality of the resulting paths. With our

method, the calculated sub-portals are the largest ones that

respect the clearance required by the character and the way-

point computation helps to avoid collision with other agents

since the underlying local movement algorithm applied to

the agents will lead them to different attractors over the

crossing portals.
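The relative costs quoted above can be checked directly from the 1000-query row of Table 2. The short Python snippet below only repeats that arithmetic; the dictionary name is arbitrary.

```python
# Time per query in microseconds, 1000-query row of Table 2.
times_us = {"Fast(-)": 3.794, "Fast(+)": 4.168, "Exact(-)": 4.115, "Exact(+)": 4.489}

# Cost of Exact(+) relative to every implementation.
ratios = {name: times_us["Exact(+)"] / t for name, t in times_us.items()}

for name, ratio in ratios.items():
    print(f"Exact(+) / {name}: {ratio:.3f}")
```

The ratios against Fast(+) and Exact(-) come out at approximately 1.077 and 1.091 respectively, so the exact method with dynamic waypoints costs less than 10% more than either simpler variant in this setting.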

7. Conclusion

We have presented a general technique to compute paths free

of obstacles with an arbitrary value of clearance that can be

easily integrated in any existing Navigation Mesh system.

Our method can be divided into the following steps:

Firstly, during the construction of the NavMesh, the clear-

ance value of each cell is computed in order to obtain paths

that guarantee clearance when applying the A* algorithm.

Secondly, the portals of the path are refined by shrinking

them depending on the clearance required for each charac-

ter and the surrounding geometry. Finally, an attractor point

over the shrunk portal is computed that depends on the char-

acter position and hence, avoids two characters sharing the

same attractor point.

Results show that our method is nearly as fast as

much simpler implementations, but produces paths of

higher quality as it takes into account clearance for both

path planning and waypoint calculations. Dynamically assigning

waypoints along portals prevents characters from forming

lines when crossing portals, since the avoidance behavior

of the local movement algorithm will steer their trajectories

correctly, thus modifying the projection of their positions

over portals.

As future work we would like to enhance the local behavior

against dynamic obstacles, since our current implementation

is based on a simple rule-based model which does not

predict trajectories of other moving obstacles.

Acknowledgements

This work has been partially funded by the Spanish Ministry

of Science and Innovation under Grant TIN2010-20590-

C01-01.

Appendix A: NEOGEN

NEOGEN [OP13] is an automatic method for generating

near optimal Navigation Meshes from 3D multi-layered vir-

tual environments (with slopes, steps and other obstacles).

The algorithm starts by performing a GPU voxelization of

the whole scene to determine the different layers and cal-

culate a cutting shape, CS. The CS is a depth filter used by

the fragment shader to flatten each layer’s geometry into a

2D high resolution texture encoding the depth map of each

layer. Then each layer is encoded as a single simple poly-

gon with holes which is input to the core NavMesh gener-

ator [OP11] to obtain a convex decomposition with a near-

optimal number of cells. The resulting Cell and Portal Graph

(CPG) is optimized with a novel convexity relaxation tech-

nique which further reduces the number of cells. Finally, all

the layers’ CPGs are automatically linked together to obtain

the final CPG of the entire scene. Figure 10 illustrates an

example of the NavMesh generation process.

NEOGEN overcomes most of the limitations that were

present in previous work such as not being able to handle

degeneracies (e.g., holes and intersections), NavMeshes with

too many unnecessary cells, ill-conditioned cells, T-joints

between portals or NavMeshes that do not cover accurately

the geometry of the environment. NEOGEN provides an ef-

ficient pipeline to go automatically from a 3D multilayered

environment given as a polygon soup, to the final NavMesh

which adjusts tightly to the original geometry. It is efficient

enough to allow the scene modeler to make changes and ob-

serve the impact on the final NavMesh at interactive rates.




Figure 8: Test maps used for our experiments. From left to right: Map1, Map2 and Map3.

Figure 9: Map3 with two different radius characters (green for smaller radius, and yellow for larger radius) moving between

cells in the graph.

Figure 10: Example of NavMesh generation process carried

out by NEOGEN.

References

[ACF01] ARIKAN O., CHENNEY S., FORSYTH D. A.: Efficient multi-agent path planning. In Proceedings of the Eurographic workshop on Computer animation and simulation (New York, NY, USA, 2001), Springer-Verlag New York, Inc., pp. 151–162.

[CSM12] CURTIS S., SNAPE J., MANOCHA D.: Way portals: efficient multi-agent navigation with line-segment goals. In Proceedings of the ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games (New York, NY, USA, 2012), I3D ’12, ACM, pp. 15–22.

[Ger10] GERAERTS R.: Planning short paths with clearance using explicit corridors. In IEEE International Conference on Robotics and Automation, ICRA 2010, Anchorage, Alaska, USA, 3-7 May 2010 (2010), IEEE, pp. 1997–2004.

[HLD07] HACIOMEROGLU M., LAYCOCK R. G., DAY A. M.: Distributing pedestrians in a virtual environment. vol. 0, IEEE Computer Society, pp. 152–159.

[HNR68] HART P. E., NILSSON N. J., RAPHAEL B.: A Formal Basis for the Heuristic Determination of Minimum Cost Paths. Systems Science and Cybernetics, IEEE Transactions on 4, 2 (July 1968), 100–107.

[Kal05] KALLMANN M.: Path planning in triangulations. In Proceedings of the IJCAI Workshop on Reasoning, Representation, and Learning in Computer Games (Edinburgh, Scotland, July 31 2005).

[Kal10] KALLMANN M.: Shortest paths with arbitrary clearance from navigation meshes. In Proceedings of the 2010 ACM SIGGRAPH/Eurographics Symposium on Computer Animation (Aire-la-Ville, Switzerland, 2010), SCA ’10, Eurographics Association, pp. 159–168.

[Mon09] MONONEN M.: Recast navigation toolkit. http://code.google.com/p/recastnavigation/, 2009.

[OP11] OLIVA R., PELECHANO N.: Automatic generation of suboptimal navmeshes. In Proceedings of the 4th international conference on Motion in Games (Berlin, Heidelberg, 2011), MIG’11, Springer-Verlag, pp. 328–339.

[OP13] OLIVA R., PELECHANO N.: Neogen: Near optimal generator of navigation meshes for 3d multi-layered environments. Computer And Graphics (Apr. 2013).

[PAB07] PELECHANO N., ALLBECK J. M., BADLER N. I.: Controlling individual agents in high-density crowd simulation. In Proceedings of the 2007 ACM SIGGRAPH/Eurographics symposium on Computer animation (Aire-la-Ville, Switzerland, 2007), SCA ’07, Eurographics Association, pp. 99–108.

[RA11] RODRIGUEZ S., AMATO N. M.: Roadmap-based level clearing of buildings. In Proceedings of the 4th international conference on Motion in Games (Berlin, Heidelberg, 2011), MIG’11, Springer-Verlag, pp. 340–352.

[Rey87] REYNOLDS C. W.: Flocks, herds and schools: A distributed behavioral model. SIGGRAPH Comput. Graph. 21, 4 (Aug. 1987), 25–34.

[Rey99] REYNOLDS C. W.: Steering behaviors for autonomous characters. In Proceedings of Game Developers Conference 1999 (San Francisco, California, 1999), GDC ’99, Miller Freeman Game Group, pp. 763–782.

[SGA∗07] SUD A., GAYLE R., ANDERSEN E., GUY S., LIN M., MANOCHA D.: Real-time navigation of independent agents using adaptive roadmaps. In Proceedings of the 2007 ACM symposium on Virtual reality software and technology (New York, NY, USA, 2007), VRST ’07, ACM, pp. 99–106.

[Sno00] SNOOK G.: Simplified 3d movement and pathfinding using navigation meshes. In Game Programming Gems. Charles River Media, 2000, pp. 288–304.

[SvdBGM11] SNAPE J., VAN DEN BERG J., GUY S. J., MANOCHA D.: The hybrid reciprocal velocity obstacle. Trans. Rob. 27, 4 (Aug. 2011), 696–706.

[Toz02] TOZOUR P.: Ai game programming wisdom. In Building a Near-Optimal Navigation Mesh, Rabin S., (Ed.). Charles River Media, 2002, pp. 171–185.

[vdBLM08] VAN DEN BERG J., LIN M., MANOCHA D.: Reciprocal Velocity Obstacles for real-time multi-agent navigation. In 2008 IEEE International Conference on Robotics and Automation (May 2008), IEEE, pp. 1928–1935.

[vdBPS∗08] VAN DEN BERG J., PATIL S., SEWALL J., MANOCHA D., LIN M.: Interactive navigation of multiple agents in crowded environments. In Proceedings of the 2008 symposium on Interactive 3D graphics and games (New York, NY, USA, 2008), I3D ’08, ACM, pp. 139–147.

[vTCG12] VAN TOLL W., COOK IV A. F., GERAERTS R.: A navigation mesh for dynamic environments. Journal of Visualization and Computer Animation 23, 6 (2012), 535–546.

[You01] YOUNG T.: Expanded geometry for points-of-visibility pathfinding. In Game Programming Gems 2. Charles River Media, 2001, pp. 317–323.



City-Level Level-of-Detail

Gonzalo Besuievsky and Gustavo Patow

ViRVIG, Universitat de Girona, Spain

Figure 1: LoD for City Level. A full resolution city model (left) and a view-dependent LoD (right), from the referenced red ball.

Abstract

Modeling large, detailed cities with complex buildings is now feasible with current procedural modeling tech-

niques, which allow their use in large game and movie productions. However, this possibility of generating almost

infinite amounts of detailed geometry can become a serious problem when generating a large urban model. In

this paper we propose a new LoD technique that precisely selects the detail of the geometry to generate, reducing

the geometric quality of those areas that accept simpler representations, according to user-defined criteria. Our

technique operates at all urban levels: at the block level, the building level, and it smoothly combines with previous

asset-level efforts [BP13].

1. Introduction

Current major game and movie productions seriously challenge modern content creation pipelines. Games like the Assassin’s Creed series [Ubi13] and the modern Grand Theft Auto series [Roc13] show large urban environments resulting from thousands of man-hours. As an alternative, procedural modeling techniques [PM01] have emerged to enable artists to focus on a few simple input parameters, releasing them from the tedious and error-prone manual process of creating each building from scratch. In recent years these techniques have flourished in a rich set of tools that have enabled large-scale productions like Disney’s blockbuster movie Cars 2 [WAMV11]. Procedural modeling techniques are now firmly established as a practical means to generate large urban landscapes [WMV∗08, VAW∗10].

However, the greatest strength of procedural techniques, their ability to generate large amounts of geometry from a tiny set of rules, becomes one of their greatest weaknesses: when generating a large detailed city it is very easy to overrun the system capabilities in terms of both computational power and storage costs. Simply put, once an urban model is generated, it is not clear how to handle a model with millions of polygons [PB13]. Thus, designers are faced with a contradiction between needing large and detailed models, and their inability to handle the resulting massive geometry. This challenging problem has promoted research along two main lines: on-the-fly generation of buildings taking advantage of the power of current graphics hardware [HWM∗10, KK11] and client-server architectures [CO11], which distribute workload among various servers. A third option that has recently emerged is the use of specifically tailored Level-Of-Detail (LoD) techniques [BP13], but their application up to now has been limited to asset-level (e.g., on windows, doors, and balconies) operations.

In this paper we take the opportunity to investigate this complex but fundamental question. We propose a new LoD technique that accurately selects the detail of the geometry to generate, and reduces the geometric quality of those areas that accept simpler representations according to user-defined criteria. Our technique operates at all urban levels, block and building, and combines smoothly with previous asset-oriented efforts. The block LoD technique selects those blocks in a city that need further processing, and those that can be safely generated with the lowest possible level of detail. The building technique operates similarly, but inside each block. Finally, we combine these two techniques with an asset-level LoD technique (introduced elsewhere [BP13]), to effectively create an implicit city-block-building-asset hierarchy, covering the whole range of urban structures in a smooth and integrated way. We allow the user to control these techniques through a simple, yet powerful programmable interface. The main contribution of this paper lies in the generalization of previous, asset-only techniques to the urban level, taking into account a hierarchy consisting of the city, blocks, buildings and assets.

2. Previous Work

The seminal work by Parish and Müller [PM01] about urban modeling, followed by the key works by Wonka et al. [WWSR03] and Müller et al. [MWH∗06] about procedural buildings, led to a blossoming of urban modeling research. All these efforts resulted in the appearance of commercial packages, like Esri’s CityEngine [Esr12], or Epic’s UDK [Epi12], focused on, or with modules for, procedural urban design. The interested reader is referred to the surveys by Watson et al. [WMV∗08], and Vanegas et al. [VAW∗10] for an in-depth review of the state-of-the-art literature in urban modeling.

However, as mentioned, this paper focuses on a different problem: the classification of the areas in an urban landscape that need different amounts of detail, and a mechanism that focuses their computation to produce only the required geometry. Previous work on level of detail for urban models can be found in the area of urban generalization, like the cartographic generalization proposed by Anders [And05], or the face collapse from known constructive structures such as walls and roofs [RCT∗06]. Chang et al. [CBZ∗08] simplified the urban landscape based on "urban legibility" to preserve the urban recognizability at any LoD. The CityGML standard [Kol09] proposes the usage of five different LoD levels, but does not provide a mechanism to generate them, nor an adaptive LoD scheme, as we do here.

Recently, Besuievsky and Patow [BP13] proposed a mechanism to rewrite the rulesets for the buildings in an urban environment, replacing geometry by LoD-aware operators which produced the right level of detail for each asset according to some user-defined parameters. The approach we propose in this paper builds on this work for the building assets, and integrates it in a full urban-level hierarchy that enables selection, from entire buildings up to whole blocks, for geometric reduction.

3. Background

3.1. Procedural Modeling

The seminal works by Wonka et al. [WWSR03] and Müller et al. [MWH∗06] introduced grammar-based procedural modeling for buildings. The main concept of this technique is a shape grammar, which is based on a ruleset: starting from an initial axiom primitive (e.g. a building outline), rules are iteratively applied, replacing shapes with other shapes. A rule has a labeled shape on the left hand side, called predecessor, and one or multiple shapes (also called primitives) and commands on the right hand side, called successor:

predecessor → CommandA, CommandB : labelB;

labelB → CommandC : labelC;

The resulting geometry is formed by shapes that can be optionally assigned new labels with the purpose of being further processed. In our system, this geometry carries all labels that the shape or any ancestor has received during the production process. The main commands, the macros that create new shapes in the classic approach, are:

• Subdivision, which splits the current shape into multiple shapes,

• Repeat, which performs a repeated subdivision of one shape multiple times,

• Component split, which creates new component shapes (faces or edges) from initial volumes,

• Insert, which places a pre-made asset (e.g., a window, a door, a chimney) on the current predecessor.

Traditionally, during a rule application, a hierarchy of shapes is generated corresponding to a particular instance created by the grammar while inserting rule successor shapes as children of the rule predecessor shape. This production process is executed until only terminal shapes are left.
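The production process described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Shape` class, the `derive` function and the toy ruleset are hypothetical names chosen for illustration, and commands are reduced to a plain list of successor labels. The sketch only shows the two ideas stated in the text: successors become children of their predecessor, and each shape carries its own label plus those of its ancestors.

```python
from dataclasses import dataclass, field


@dataclass
class Shape:
    """A labeled shape; `labels` accumulates the labels of all ancestors."""
    label: str
    labels: tuple = ()
    children: list = field(default_factory=list)

    def __post_init__(self):
        if not self.labels:
            self.labels = (self.label,)


def derive(axiom, rules):
    """Apply rules until only terminal shapes (labels without a rule) remain."""
    pending = [axiom]
    terminals = []
    while pending:
        shape = pending.pop()
        successors = rules.get(shape.label)
        if successors is None:          # no rule: this is a terminal shape
            terminals.append(shape)
            continue
        for label in successors:        # successors become children
            child = Shape(label, shape.labels + (label,))
            shape.children.append(child)
            pending.append(child)
    return terminals


# Toy ruleset (hypothetical): predecessor label -> successor labels.
rules = {
    "building": ["facade", "roof"],
    "facade": ["floor", "floor"],
    "floor": ["window", "wall"],
}
leaves = derive(Shape("building"), rules)
```

In this toy derivation every `window` terminal ends up carrying the labels `building`, `facade`, `floor` and `window`, which is the property later exploited to select geometry by label.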

3.2. Simplification criteria

In the work by Besuievsky and Patow [BP13], a set of user-defined simplification criteria was introduced to select the right LoD level for each asset. There, assets were reduced to different LoDs, from full resolution (100%) to the lowest level (0%), where assets were replaced by textured quads. See Figure 2. Here we generalize these criteria to the different scales at the urban level:


Figure 2: Five different stages in our asset-level LoD: 100%, 75%, 50%, 25% and 0%. Observe that the latter is just a quad with a pre-computed texture.

View-Dependent: For the view-dependent criterion, the system uses a virtual viewpoint as a basis to define different possible criteria:

Distance_LoD(viewpoint): This case evaluates reductions using the distance (in the Euclidean sense) from the viewpoint to the shape center. The products that are closer than a user-defined threshold minDist are set to full resolution (i.e., 100%), while the ones farther than maxDist are set to textures (i.e., 0%). Reduction factors for in-between distances (i.e., dist ∈ [minDist, maxDist]) are linearly interpolated with the formula:

$$\mathrm{LoD} = \frac{\mathrm{dist}(\mathit{viewpoint}, \mathrm{center}(\mathit{asset}))}{\mathit{maxDist} - \mathit{minDist}} \qquad (1)$$
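As a hedged illustration of the distance criterion, the following Python sketch maps a distance to a detail percentage. Equation (1) as printed divides the raw distance by the range; the sketch additionally assumes that the distance is offset by minDist and that the result is inverted, so that the endpoints match the behavior stated in the text (100% at or below minDist, 0% at or beyond maxDist). The function name and signature are hypothetical.

```python
import math


def distance_lod(viewpoint, center, min_dist, max_dist):
    """Detail percentage in [0, 100] from the viewpoint-to-center distance.

    Assumption (not taken from the paper): linear interpolation between
    100% at min_dist and 0% at max_dist.
    """
    d = math.dist(viewpoint, center)    # Euclidean distance
    if d <= min_dist:
        return 100.0                    # full resolution
    if d >= max_dist:
        return 0.0                      # textured quad
    t = (d - min_dist) / (max_dist - min_dist)
    return 100.0 * (1.0 - t)
```

For example, with minDist = 10 and maxDist = 50, a shape whose center is 30 units away lies halfway through the range and receives 50%.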

Silhouette_LoD(viewpoint): We adapted the work by Luebke and Erikson [LE97] for silhouette preservation when evaluating reductions. By computing the viewing cone and an approximate cone of front-facing normals of the oriented bounding box of the product, we can decide whether it is front-facing, back-facing, or potentially lies on the silhouette. In the last case, a higher preservation parameter is used:

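$$\mathit{threshold} = \begin{cases} \mathit{baseThreshold} \cdot 2/3 & \text{if } \mathrm{testSilhouette}(\mathit{asset}) \\ \mathit{baseThreshold} & \text{otherwise} \end{cases} \qquad (2)$$

with testSilhouette(asset) defined as in Luebke and Erikson [LE97], but for the asset bounding box.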

ScreenSize_LoD(viewpoint): Computes the reduction factor based on the size projected on screen [LE97]. Our implementation uses a conservative approach, computing a bounding sphere and then obtaining its on-screen projected size.
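A possible way to obtain the projected size of a bounding sphere is sketched below in Python. The paper does not give its formula, so this is an assumption: it uses the common perspective approximation in which a sphere of radius r at distance d covers roughly r / (d tan(fov/2)) of half the screen height. The function names, the pixel thresholds and the linear mapping to a percentage are all hypothetical.

```python
import math


def projected_radius_px(radius, distance, fov_y_deg, screen_height_px):
    """Approximate on-screen radius (pixels) of a bounding sphere."""
    if distance <= radius:
        return float("inf")             # viewpoint inside the sphere
    half_fov = math.radians(fov_y_deg) / 2.0
    return radius / (distance * math.tan(half_fov)) * (screen_height_px / 2.0)


def screen_size_lod(radius, distance, fov_y_deg, screen_height_px,
                    min_px, max_px):
    """Detail percentage: 0% below min_px on screen, 100% above max_px."""
    size = 2.0 * projected_radius_px(radius, distance, fov_y_deg,
                                     screen_height_px)
    t = (size - min_px) / (max_px - min_px)
    return 100.0 * min(1.0, max(0.0, t))
```

Using the bounding sphere is conservative in the sense that it never underestimates the projected size of the enclosed shape, so detail is not reduced too early.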

Semantic_Selection: Here we select geometry based on some semantic criteria, like selecting a landmark building and computing distances with respect to this specific building. We observed that this can be used as an aggressive two-level reduction strategy: maximum level for the selected geometry and texture for the rest, but in this paper we decided to use a range of reduction factors.

Programmable Interface: We created a programming interface in Python that allows calling all the above-mentioned criteria. This interface also allows the designer to combine different criteria in any arbitrary way, providing a flexible approach to design model reductions. One example of this is the SemanticDist_LoD function that uses the distance relative to the selected product to evaluate the reduction factor.

selLandmark = Semantic_Selection(selectCriteria)

dist = dist(center(selLandmark), currentElement)

return clamp(0, 1, dist / maxDist) * 100

where selectCriteria is the semantic selection criterion given by the user (e.g., selecting a given landmark building). The following example evaluates the reduction factor as the maximum of a viewpoint screen-size factor and a semantic distance factor:

scrFactor = ScreenSize_LoD(vp, currentElement)

semFactor = SemanticDist_LoD(selectCriteria)

return max(scrFactor, semFactor)

where vp is a viewpoint and currentElement is an element (a block or a building) currently selected that acts as semantic reference.

Conflicts between different reduction criteria can be solved through user-defined methods using functions like min, max or avg. As an implementation note, this "code" is stored in user-provided text fields in the user interface, and evaluated through the Python command eval, which evaluates the provided string as code in the current programming context.
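The combination mechanism can be sketched in plain Python as follows. This is an illustration under assumptions, not the system's code: `semantic_dist_lod`, `avg` and `evaluate_user_criterion` are hypothetical helpers, the factor values are made up, and the expression is evaluated in an explicit namespace rather than in the host application's context. The argument order of `clamp` follows the snippet above. Note that `eval` executes arbitrary code, so such text fields should only be filled by trusted users.

```python
import math


def clamp(lo, hi, x):
    """Clamp x to [lo, hi]; argument order follows the paper's snippet."""
    return max(lo, min(hi, x))


def avg(*values):
    return sum(values) / len(values)


def semantic_dist_lod(landmark_center, element_center, max_dist):
    """Reduction factor (0..100) from the distance to a selected landmark."""
    d = math.dist(landmark_center, element_center)
    return clamp(0, 1, d / max_dist) * 100


def evaluate_user_criterion(expression, context):
    """Evaluate a user-provided expression string against named factors."""
    # Builtins are emptied so that only the names in `context` are visible.
    return eval(expression, {"__builtins__": {}}, context)


context = {
    "min": min, "max": max, "avg": avg,
    "scrFactor": 40.0,                                    # made-up value
    "semFactor": semantic_dist_lod((0, 0, 0), (50, 0, 0), 200.0),
}
combined = evaluate_user_criterion("max(scrFactor, semFactor)", context)
```

Replacing the expression string by `"avg(scrFactor, semFactor)"` or `"min(scrFactor, semFactor)"` switches the conflict resolution policy without touching any other code.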

The input to our system is the user-provided LoD configuration script, which describes in a textual interface all criteria and parameters for the model reduction. This should include, if needed, a reference to a camera (or any other reference) object. If a semantic criterion is used, this script must provide a valid selection from a list of urban landmarks used in the model. The main kinds of configurations can be summarized as: only view-dependent, only semantic, or a combination, as described before.

4. Urban-level LoD

4.1. Preprocessing

Our system does not require any extra preprocessing beyond the one described in the work by Besuievsky and Patow [BP13]. Basically, before any actual LoD computation begins, all the textures for the asset replacement for the maximum reduction case should be precomputed. We do this by running a script that selects each insert node in turn and generates for its selected asset a normalized view (i.e., the same size for all assets), from where a simple rendering is performed to get the final image. This image can include illumination effects like ambient occlusion, but not shadows cast by other assets, as these will be incorporated by the final rendering stage. Of course, images cannot include projected shadows, but we did not observe any problem, as these images are only used at large distances with respect to the observer. Please observe that our technique only requires images at the asset level, so no ruleset evaluation is needed in this stage.

Figure 3: Close view of an urban model street at full resolution (top) and at low LoD (bottom), where all asset geometry has been replaced by textures.

4.2. Block-level LoD

Our content creation pipeline follows the typical steps in the creation of procedural cities [VAW∗10]: first major roads are generated, dividing the city into districts or quarters, then minor roads are created, resulting in a set of blocks that should be checked against the area of interest to determine the right LoD to be used in each case. For this purpose, one could use this two-level hierarchy to decide first the LoD for each quarter, and then for each block inside the quarters that passed the first test. However, in our experience this was not necessary, since directly verifying each block was fast enough for our purposes, especially when compared with the time needed to later evaluate the rules for all the procedural buildings.

This way, the purpose of the block-oriented LoD verification is to quickly determine whether a block will be rendered at full LoD (i.e., 100%), at the lowest LoD possible (0%), or at any intermediate LoD level. In our approach, we established a maximum range of affectation that defines a volume enclosed by a sphere. The affectation region depends on the criteria used. For the Distance_LoD and Semantic_Selection criteria, the user-provided radius is used. For the Silhouette_LoD and ScreenSize_LoD criteria, an upper bound was computed taking into account the size of the largest element in the scene (e.g., the largest block size). If a block lies completely outside of the affectation area, it is directly reduced to the lowest LoD. Otherwise it is set for further processing. See Section 4.3.

To compute the actual intersections with the sphere defined by the above-defined range, a full intersection of the sphere with the block volume is needed. To speed computations up, we tested the intersection against the block convex hull instead. In case the reference point is at the ground level, which is true for pedestrian or car-driver walkthroughs, a 2D intersection would suffice by testing the 2D floor plan of the block against the circle defined by the range radius at the ground level. If we assume that the range volume is large with respect to the size of the urban blocks, it is reasonably safe to simplify the test to just the floor plan vertices, which simply implies computing their distance to the reference point and determining whether this distance is larger or not than the range radius already computed. In Figure 3 we can see a close view of a part of our urban model. Here, the buildings are set to maximum (100%) and to minimum (0%) LoD, the latter case showing the replacement of each asset by its respective texture.
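The simplified vertex-only test can be sketched in Python as follows. The function names and the dictionary layout of the blocks are hypothetical; only the test itself (distance from each floor-plan vertex to the reference point, compared with the range radius) comes from the text. As noted there, the shortcut relies on the range being large with respect to the blocks: a small circle could overlap a block edge without containing any vertex.

```python
import math


def block_outside_range(floor_plan, reference, radius):
    """True if every 2D floor-plan vertex is farther than `radius`."""
    return all(math.dist(v, reference) > radius for v in floor_plan)


def classify_blocks(blocks, reference, radius):
    """Split block names into (lowest_lod, further_processing) lists."""
    lowest, further = [], []
    for name, floor_plan in blocks.items():
        if block_outside_range(floor_plan, reference, radius):
            lowest.append(name)         # directly reduced to 0% LoD
        else:
            further.append(name)        # passed on to the building level
    return lowest, further


# Two square blocks (hypothetical coordinates, ground-level 2D).
blocks = {
    "near": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "far": [(100, 100), (110, 100), (110, 110), (100, 110)],
}
lowest, further = classify_blocks(blocks, reference=(5, -5), radius=20)
```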

4.3. Building-level LoD

Once a block has been chosen for further processing, each of its buildings is evaluated in turn. The first test to perform is to evaluate the building convex hull against the range volume. If a building lies completely outside the volume, its LoD is set to 0%. However, in our implementation we decided to use the building base model (i.e., before any procedural rule evaluation) instead. This test underestimates the reduction factor to be applied, but it is much faster to perform: using the building bounding volume implies we have to compute the final building (evaluate its ruleset), generate its bounding volume, and then use this result for the comparison. The way we implemented it simply means taking the base mass model, without any further computations. If a building lies partially or completely inside the affectation volume we use its modified ruleset, which includes the asset-level LoD criteria presented elsewhere [BP13]. An example can be observed in Figure 4, where a reference point (a ball) is changed to different positions, thus resulting in some buildings selected for minimum LoD and others for further processing.

Figure 4: Close view of our street urban model at 4 different frames in the walkthrough animation. Observe how the building farthest on the right is set to LoD 0% until the reference point (the ball) is close enough to activate the asset-level verifications.

4.4. City-Block-Building-Asset Hierarchy

In procedural modeling of urban environments it is a common practice to define architectural styles, which represent patterns that sets of buildings follow inside a city, even whole neighborhoods. These styles are usually represented by a single set of rules. On the other hand, graph rewriting techniques are fast to evaluate, and the rewriting of the ruleset for a given style only takes a few seconds. Then, the time to evaluate any ruleset (either the original or the rewritten one) depends on the particular system implementation. Taking all these factors into account, it is easy to realize that, in our system, it would be prohibitive to modify rulesets on a building-by-building basis. Instead, we process all rulesets at once, at preprocessing time, storing the rewritten rulesets for the whole city. In our implementation, these rulesets were rewritten taking LoD into account, replacing assets by LoD-aware operators that evaluate the corresponding LoD to compute for each asset at runtime.

This evaluation system would imply each asset must access the simplification criteria, evaluate the asset with respect to them, choose a geometrical reduction level and compute the corresponding LoD for the given asset. Even with the use of an efficient cache system, this operation can become prohibitive if evaluated for all the assets in all buildings in the city. We can see that efficiency is only achieved when the full LoD hierarchy is used: block-level LoD selects blocks for minimum LoD (0%). The blocks that pass this test will be evaluated by the building-level LoD step, classifying buildings into those that fail the test and are set to 0% LoD, and those that pass the test. Finally, the ones that passed are further processed for asset-level evaluation by the rewritten sets of rules.
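The early-out behavior of the hierarchy can be illustrated with the following Python sketch. It is a simplified model under assumptions, not the Houdini implementation: the city is a nested dictionary, the three tests are passed in as plain callables, and all names are hypothetical. The counter makes visible how many per-asset evaluations actually take place.

```python
def evaluate_hierarchy(city, block_in_range, building_in_range, asset_lod):
    """Walk a {block: {building: [assets]}} dictionary top-down.

    Returns (result, evaluations). `result` maps (block, building) to 0 when
    the whole building falls to the lowest LoD, or to an {asset: lod}
    dictionary otherwise. `evaluations` counts per-asset criterion calls.
    """
    result, evaluations = {}, 0
    for block, buildings in city.items():
        block_ok = block_in_range(block)            # block-level test
        for building, assets in buildings.items():
            # The building test is skipped entirely when the block failed.
            if not (block_ok and building_in_range(block, building)):
                result[(block, building)] = 0
                continue
            result[(block, building)] = {a: asset_lod(a) for a in assets}
            evaluations += len(assets)
    return result, evaluations


city = {
    "B1": {"g1": ["window", "door"], "g2": ["window"]},
    "B2": {"g3": ["window"]},
}
result, evaluations = evaluate_hierarchy(
    city,
    block_in_range=lambda b: b == "B1",
    building_in_range=lambda b, g: g == "g1",
    asset_lod=lambda a: 75,
)
```

In this toy city only two of the four assets reach the asset-level criterion; the rest are discarded by the block and building tests.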

4.5. User-defined criteria

As we have mentioned in Section 3.2, the user is free to select the criteria to be applied by a simple selection of a fixed criterion, or through a programming user interface. As we already mentioned, the selection of the criteria might affect the radius used for the affectation volume (e.g., as used by the block- and building-level criteria). To evaluate the affectation radius directly from the code would imply not only identifying the criteria in play, which can be easily achieved by locating the criteria command tokens, but also determining if there are any further modifications to these values later on in the code. As this kind of evaluation is beyond the scope of this paper, we decided to use an upper bound by selecting the maximum of all defined criteria in the code, and let the user provide an alternative value (also in a programmable way inside the same block of code as the LoD reduction) in case the automatic parsing gives wrong results. In all our experiments we did not observe any problem, but it must be taken into account that different users may come up with countless different pieces of code, with wildly different behaviors. Thus, we always leave open the possibility for the user to provide the value to be used as affectation distance.
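One way to realize this upper bound is sketched below in Python. It is only an illustration under assumptions: the function name, the per-criterion radius table and its values are hypothetical, and the token search is a plain regular-expression scan rather than whatever parsing the actual system performs.

```python
import re


def affectation_radius(code, radius_by_criterion, user_override=None):
    """Upper bound of the affectation radius for a user LoD script.

    Scans `code` for known criterion tokens and returns the largest
    associated radius, unless the user supplies an explicit override.
    """
    if user_override is not None:
        return user_override
    used = [name for name in radius_by_criterion
            if re.search(rf"\b{re.escape(name)}\b", code)]
    if not used:
        raise ValueError("no known criterion found in the user code")
    return max(radius_by_criterion[name] for name in used)


# Hypothetical radii per criterion, in scene units.
radii = {"Distance_LoD": 100.0, "ScreenSize_LoD": 250.0,
         "SemanticDist_LoD": 180.0}
script = "return max(ScreenSize_LoD(vp, currentElement), SemanticDist_LoD(sel))"
radius = affectation_radius(script, radii)
```

Because the scan ignores how the returned factors are later modified, the result is conservative; the override argument mirrors the escape hatch described in the text.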

5. Results

We implemented our system on top of SideFX’s Houdini [Sid12] using its own nodes, embedded Python scripts and external Python methods.

In order to test and evaluate the system we created two urban models: a simple one composed of 10 blocks, 142 buildings and 1.27 million polygons (see Figure 3), and a complex one composed of 1436 buildings, structured into 103 blocks, with a total of 17.0 million polygons (see Figure 5).

For designing an animation sequence, we proceeded in a two-step approach. First, we designed the animation path and pre-processed the city model, computing the level of detail of the affected blocks. In this step we stored into index files the block geometries that were affected, with the corresponding reduction. In the second step, which can be regarded as a rendering stage, we executed the animation, loading the corresponding blocks for each frame. This way, we can even generate different views of the sequence by designing a different camera for the same animation. For the complex city, the preprocessing time was 4 minutes for each frame of the sequence. The affectation radius of the sphere in this case was set to the size of one block, meaning that, for a given viewpoint, the algorithm takes the four closest blocks into account (see Figure 6).

Information about the simplification of the urban model used in three different animation frames can be found in Tables 1 and 2. For the complex model at the lowest possible simplification level, where all assets can be replaced with textures, it is reduced to only 129866 polygons, which is 0.76% of the whole model. See Figure 7.

Figure 5: A complete urban model with our multi-level LoD system. Observe the influence volume that can be observed in the assets on the rightmost building.

                Full Urban Model    low LoD    fr1      fr60     fr180
# Polygons      1.27M               16715      29943    29143    25269
% Reduction     -                   1.31%      2.35%    2.29%    1.98%

Table 1: Factor reductions of the urban model animation related to the simple city model. Frames refer to the images in Figure 4.

                Full Urban Model    low LoD    fr1      fr60
# Polygons      17M                 129K       501K     436K
% Reduction     -                   0.76%      2.9%     2.5%

Table 2: Factor reductions of the complex city model sequence. Frames refer to the images in Figure 6.

6. Discussion and Future Work

We have presented an automatic LoD system for urban models that smoothly integrates with state-of-the-art asset-level LoD techniques. This system is based on a LoD evaluation hierarchy, from blocks to individual buildings, and then, at each building, an evaluation for each of its constituent assets. The hierarchy guarantees that only the minimum number of assets arrives at the final evaluation, discarding blocks and buildings early in the process to be replaced by textures instead of the original geometric assets. The assets that arrive at the final evaluation use specifically tailored LoD commands introduced in a graph rewriting preprocessing stage.

There is an important aspect of our LoD evaluation mechanism that should be discussed: the distinction between local and global visibility evaluation approaches. In a global visibility evaluation, besides viewing or semantic criteria, inter-building or inter-asset occlusion should be taken into account. That is, if a given building blocks the visibility between an asset and the viewpoint, then this asset should be immediately set to minimum LoD. In a local approach, reduction factors are computed without taking this issue into account. Computing global visibility is possible by doing a partial evaluation of all rulesets, constructing only the mass models, saving the result, and then continuing the ruleset evaluation including the visibility verifications. After some preliminary studies, we decided to implement a local visibility approach because we have observed that computing these occlusion factors, even taking into account the urban hierarchy we created, incurs too large a computational cost. We think this still is a promising avenue for improvement, but further research is needed to obtain a method that is both versatile and fast to evaluate.

The presented LoD mechanism is specifically intended for use in rendering. However, other computations, like urban wind simulations or solar studies (e.g., for installing solar panels), require a completely different LoD approach. In these cases, only some protruding structures might be of interest, like balconies or roofs, while others can be completely neglected for simulation purposes. Finding automatic LoD techniques that suit these diverse needs is still an open research area, where we do not know of any other previous research efforts.

Figure 6: The affectation region on city models for two different frames. The big red sphere encloses the affectation region (left column), and it is centered at the viewpoint shown with the small sphere (right column).

Acknowledgements

Our city is based on the Urban Sprawl model from Daz3D (http://www.daz3d.com/). This work was partially funded by the TIN2010-20590-C02-02 project from Ministerio de Ciencia e Innovación, Spain.

References

[And05] ANDERS K.-H.: Level of detail generation of 3d building groups by aggregation and typification. In Proceedings of the XXII International Cartographic Conference, La Coruna (2005). 2

[BP13] BESUIEVSKY G., PATOW G.: Customizable lod for procedural architecture. Computer Graphics Forum, - (2013), accepted for publication. 1, 2, 4

[CBZ∗08] CHANG R., BUTKIEWICZ T., ZIEMKIEWICZ C., WARTELL Z., POLLARD N., RIBARSKY W.: Legible simplification of textured urban models. IEEE Comput. Graph. Appl. 28 (May 2008), 27–36. URL: http://dl.acm.org/citation.cfm?id=1373099.1373117, doi:10.1109/MCG.2008.56. 2

[CO11] CULLEN B., O’SULLIVAN C.: A caching approach to real-time procedural generation of cities from gis data. Journal of WSCG 19, 3 (2011), 119–126. 1

[Epi12] EPICGAMES: Unreal development kit (udk), 2012. http://udk.com. 2

[Esr12] ESRI: Cityengine, 2012. http://www.esril.com/software/cityengine/index.html. 2

[HWM∗10] HAEGLER S., WONKA P., MÜLLER S., VAN GOOL L., MÜLLER P.: Grammar-based encoding of facades. Computer Graphics Forum 29 (2010), 1479–1487. 1

[KK11] KRECKLAU L., KOBBELT L.: Realtime compositing of procedural facade textures on the gpu. In Invited Paper at 3D-Arch 2011 (ISPRS) (2011). 1

[Kol09] KOLBE T. H.: Representing and exchanging 3d city models with citygml. In Proceedings of the 3rd International Workshop on 3D Geo-Information, Lecture Notes in Geoinformation and Cartography (Seoul, Korea, 2009), Lee J., Zlatanova S., (Eds.), Springer Verlag, p. 20. 2

[LE97] LUEBKE D., ERIKSON C.: View-dependent simplification of arbitrary polygonal environments. In Proceedings of the 24th annual conference on Computer graphics and interactive techniques (New York, NY, USA, 1997), SIGGRAPH ’97, ACM Press/Addison-Wesley Publishing Co., pp. 199–208. URL: http://dx.doi.org/10.1145/258734.258847, doi:10.1145/258734.258847. 3

[MWH∗06] MÜLLER P., WONKA P., HAEGLER S., ULMER A., VAN GOOL L.: Procedural modeling of buildings. ACM Trans. Graph. 25, 3 (2006), 614–623. 2

[PB13] PATOW G., BESUIEVSKY G.: Challenges in Procedural Modeling of Buildings. In Eurographics Workshop on Urban Data Modelling and Visualisation (Girona, Spain, 2013), Tourre V., Besuievsky G., (Eds.), Eurographics Association, pp. 25–28. URL: http://diglib.eg.org/EG/DL/WS/UDMV/UDMV13/025-028.pdf, doi:10.2312/UDMV/UDMV13/025-028. 1

[PM01] PARISH Y. I. H., MÜLLER P.: Procedural modeling of cities. In SIGGRAPH ’01: Proceedings of the 28th annual conference on Computer graphics and interactive techniques (2001), pp. 301–308. 1, 2

[RCT∗06] RAU J.-Y., CHEN L.-C., TSAI F., HSIAO K.-H., HSU W.-C.: Lod generation for 3d polyhedral building model. In Advances in Image and Video Technology, Chang L.-W., Lie W.-N., (Eds.), vol. 4319 of Lecture Notes in Computer Science. Springer Berlin / Heidelberg, 2006, pp. 44–53. URL: http://dx.doi.org/10.1007/119495345. 2

[Roc13] ROCKSTAR GAMES: Grand Theft Auto game series, 2013. http://www.rockstargames.com/grandtheftauto/. 1

[Sid12] SIDEFX: Houdini 12, 2012. http://www.sidefx.com. 6

[Ubi13] UBISOFT: Assassin’s Creed game series, 2013. http://www.assassinscreed.com/. 1

[VAW∗10] VANEGAS C. A., ALIAGA D. G., WONKA P., MÜLLER P., WADDELL P., WATSON B.: Modelling the appearance and behaviour of urban spaces. Comput. Graph. Forum 29, 1 (2010), 25–42. 1, 2, 4

[WAMV11] WONKA P., ALIAGA D., MÜLLER P., VANEGAS C.: Modeling 3d urban spaces using procedural and simulation-based techniques. In ACM SIGGRAPH 2011 Courses (New York, NY, USA, 2011), SIGGRAPH ’11, ACM, pp. 9:1–9:261. URL: http://doi.acm.org/10.1145/2037636.2037645, doi:10.1145/2037636.2037645. 1

[WMV∗08] WATSON B., MÜLLER P., VERYOVKA O., FULLER A., WONKA P., SEXTON C.: Procedural urban modeling in practice. IEEE Computer Graphics and Applications 28 (2008), 18–26. 1, 2

[WWSR03] WONKA P., WIMMER M., SILLION F., RIBARSKY W.: Instant architecture. ACM Transaction on Graphics 22, 3 (July 2003), 669–677. Proceedings ACM SIGGRAPH 2003. 2

Figure 7: Four fully rendered frames integrating the full LoD hierarchy in the complex city model.

submitted to CEIG - Spanish Computer Graphics Conference (2013)



Session 2

Measurement and Visualization

Morpho-Volumetric measurement tools for abdominal distension diagnose

E. Monclús1, I. Muñoz1, I. Navazo1, P.-P. Vázquez1, A. Accarino2, E. Barba2, S. Quiroga3 and F. Azpiroz2

1 ViRVIG and MOVING Group, UPC-BarcelonaTech, Barcelona
2 Digestive System Research Unit, University Hospital Vall d’Hebron, Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (Ciberehd); Department of Medicine, Universitat Autònoma de Barcelona
3 Servicio de Radiología, Hospital Universitario Valle Hebron, Barcelona

Abstract

Digestive bloating is a very common disease that, though tightly linked to other better-known functional diseases such as functional dyspepsia and irritable intestine syndrome, often appears as an isolated dysfunction itself. Patients report episodes of abdominal pressure that are difficult to explain. Through the analysis of CT captures of the patients, using a series of measuring tools developed ad hoc, we have obtained a better comprehension of these functional digestive diseases, which has led to a proper diagnosis and treatment of such patients. In this paper we present the tools that have been developed to assist physicians in obtaining measures of different morpho-volumetric parameters of the abdominal and pulmonary structures, and how these are used in clinical practice to effectively diagnose digestive bloating.

Categories and Subject Descriptors (according to ACM CCS): I.3.3 [Computer Graphics]: Picture/Image Generation; I.2.1 [Applications and Expert Systems]: Medicine and science

1. Introduction

Functional digestive diseases are defined by the presence of chronic digestive symptoms [APA∗08]. Most of these symptoms were thought to originate in the digestive tube. Due to their relatively low danger, little attention has been devoted to them by the research community, although these pathologies are very frequent and noticeably affect patients’ quality of life. One such problem is intestinal bloating. It is known that around 30% of healthy people suffer this abdominal distension, and this percentage grows to 60-88% among patients with other digestive diseases. Abdominal cavities adapt to different levels of gas content [VAS∗08], but these changes do not necessarily accompany the bloating perception. Initial studies were oriented to measuring the amount of gas in the intestine, since the bloating seemed to originate from abdominal pressure that could be caused by an excess of gas in the intestine. Commercial systems do not permit accurate automatic or semi-automatic gas measurements for these regions. Thus, an ad hoc module was developed for this objective (Abdometry). This tool has been crucial to reveal that organic diseases generate a significant increase of gas in the gut, as predicted, but, contrary to the predictions, functional disorders do not seem to produce a relevant increase in the abdominal gas, while still generating the bloating perception [APA∗09]. As a consequence, a second connected module was developed (Thoraxmetry) that facilitates the acquisition of several parameters of the lung cavities with minimal effort from physicians. A second study led to the conclusion that, during abdominal distension, the diaphragm moves down, the lungs’ vertical size increases, and an abdominal protrusion is produced at the same time.

Both modules now form the diagnostic application. This application is nowadays used by physicians on a regular basis in order to classify patients’ pathologies as organic or functional, and therefore serves to guide the treatment applied to them.

In this paper we present the main functionalities of the application. Its objectives are:





• Calculation of the amount of gas in the abdominal and pulmonary cavities.

• The introduction of several (around 15) fiducial points.

• Measurement of several (around 20) morpho-volumetric distances and volumes such as the length of the pulmonary cavity in certain directions, or the distance from the column to the anterior wall.

These computations are performed with minimal user intervention and, where possible, fully automatically. To provide these tools the application incorporates different techniques of volume rendering, 3D segmentation, and custom interaction. In order to adapt to the physicians’ usual workflow, the system had the following requirements:

• Reduced or null preprocess: Physicians prefer to avoid preprocesses, although this might imply extra user input, provided that it is kept moderate.

• Step by step measurements: Physicians feel more comfortable validating each of the steps the software has carried out. Therefore, the system must permit a progressive evaluation of each of the measures obtained automatically or semi-automatically.

• Undoable tasks: if the intermediate results are not satisfactory, the user must be able to undo the last steps.

The application has been experimentally validated using volunteers and synthetic models [VAS∗08].

The rest of the paper is structured as follows: First, Section 2 presents the diagnostic application and the protocol used to diagnose patients. Then, in Section 3, we present the morpho-volumetric parameters measured by the Thoraxmetry package and give more detail on how these are obtained in our tool. The same analysis is performed for the Abdometry module in Section 4. Finally, we conclude the paper with an outline of the results drawn from the clinical studies.

2. Overview of the system

As commented in the introduction, the studies have demonstrated that abdominal distension is produced by one of two different kinds of disease: functional and organic. Functional disease is produced by a change in the position of the diaphragm and may be corrected through a treatment consisting of breathing exercises. Organic disease is caused by a malfunction in the intestines whose origin may range from a neuronal disorder to a hormonal disorder. The developed application is used to determine the typology of the disorder, and it is also used as another source of information to help the physicians determine the diagnosis.

2.1. Protocol

After the validation of the system, its use has been incorporated into the regular diagnostic sessions using the following protocol (properly approved by the ethics committee of the center):

• First, subjects are scanned during basal conditions. Typically, this happens in the morning, when patients do not present distension symptoms.

• Then, a second CT scan is performed when subjects are in an episode of maximum distension.

Distension perception is measured using a typical questionnaire with a range of 0 to 6.

The CT scans are then loaded into our system and the different measurements are taken with the aid of the implemented tools. Once the measurements are taken, the physician is able to diagnose which kind of disorder the subject has: organic or functional. According to the diagnosis, different treatments are prescribed.

2.2. Validation

Before using the software in clinical practice, it was validated to ensure that the measurements taken were trustworthy. The system was tested on nine subjects by comparing measurements before and after rectal administration of known volumes of air via a Foley catheter. The balloon was inflated with 5 mL of water to prevent air leakage. CT scans were obtained during basal conditions and after infusion of either 100 mL (N = 6) or 400 mL of air (N = 3). The nine healthy subjects were three women and six men aged 21 to 37 years.

In the validation studies, the accuracy of volume detection was calculated as the difference between the expected volume (basal gas volume measured in the first scan plus air volume infused) and the volume measured in the second scan after rectal air infusion. The system was found to have an error of only 0.3 ± 0.5% of the total volume, which is precise enough for the diagnosis of the patients [VAS∗08].
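The accuracy check described above can be illustrated with a small sketch. It is not taken from the validated software: the function name, the choice of the expected volume as the denominator of the percentage, and the sample numbers are assumptions made for illustration only.

```python
def detection_error_percent(basal_ml, infused_ml, measured_ml):
    """Signed detection error as a percentage of the expected volume.

    Assumption: the percentage is taken relative to the expected volume
    (basal + infused); the text only says "of the total volume".
    """
    expected_ml = basal_ml + infused_ml
    return 100.0 * (measured_ml - expected_ml) / expected_ml


# Hypothetical subject: 150 mL of basal gas, 100 mL infused, 251 mL measured.
error = detection_error_percent(150.0, 100.0, 251.0)
print(f"{error:.2f}%")
```

A positive value means that the second scan measured more gas than expected, a negative value that it measured less.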

2.3. Architecture of the application

The architecture of the application is shown in Figure 1. As introduced previously, the diagnosis is based on the differences in several parameters measured in two different conditions: basal and distension. The magnitude of these differences determines quantitatively the type of abdominal disorder. The patient is therefore scanned in both conditions (see left images in Figure 1). First, the pulmonary cavity is studied using the CT images. In this process, the gas inside the body is identified and the amount lying inside the pulmonary cavity is classified. The remainder, that is, the amount not belonging to the lungs, is labeled as abdominal gas. This second set is further refined in the Abdometry system, which provides tools to remove gas that lies outside the intestines, such as the gas in the stomach.

Once the pulmonary cavity is identified, the Interac-

tion phase starts, where physicians introduce some fiducial

markers (see Section 3). The system then computes several




Figure 1: Architecture of the system. This figure shows the overall framework. The patient is scanned in both conditions (basal and distension). First, the medical doctor performs the pulmonary analysis using the Thoraxmetry module. Next, the abdominal analysis is carried out using the Abdometry module. Finally, the diagnosis is deduced by analysing both results.

morpho-volumetric measurements that serve to determine, if present, the functional disorder. The results of the thorax analysis are passed to the abdominal module, and the evaluation continues with the previous gas classification and the CT scans. In this step, the gas is further classified to eliminate the gas that does not belong to the abdominal cavity. Then, a second set of fiducial points is introduced by the physician. This allows the calculation of a second battery of numerical measurements. The comparison of those numerical values helps the team of physicians determine the final diagnosis.

Throughout the system development, the technicians and the physicians worked together to define not only the measurements required, but also the level of user intervention at each step. Therefore, some tasks that could otherwise be performed automatically are broken into several phases so that the physician can validate the different steps.

The final application borrows techniques from medical visualisation [PB07] and uses some modules of ITK [nSNC03] and VTK [SML04]. The Graphical User Interface has been developed using Qt [The07].

3. Thoraxmetry

The Thoraxmetry package obtains its measurements from a set of around 360 CT images of 512 × 512 resolution, with a voxel size of 0.916 × 0.916 × 1.6 mm. These images are captured from the patients in each of the two conditions: basal and distension. No previous patient preparation (i.e. contrast) is required.

The process followed by the medical doctors is depicted in Figure 2. The images are first loaded and the meshes for the skin and the skeleton are extracted. The former is used to introduce fiducial points and the latter is used to determine the condition (basal or distension) of the analysed study. These extractions are performed automatically using a Marching Cubes algorithm [LC87] with a predefined threshold (the same scanner is always used for capture). Then, the diagnosis follows these steps:

1. Automatic gas detection. A region growing algorithm determines the gas lying in the interior of the lungs. The process searches for a seed point in the top slices of the model and then generates the tagged volume using a flood fill algorithm [HGM09]. Then, the lung surface is extracted by a Marching Cubes algorithm. In approximately 90% of cases this operation is fully automatic. In the remaining cases, the physician must intervene (see Section 3.1).

2. Fiducial point markers: The medical doctors use the skeleton to introduce fiducial markers at relevant positions in the middle of vertebrae T4, T7 and T10. These markers are later used to automatically calculate a series of numerical values, which are in turn used to compare the basal and distension conditions. The skeleton is used as a reference because it is one of




Figure 2: Workflow of the Thoraxmetry module. From the CT scans, we apply a Marching Cubes algorithm to extract the

skin and the skeleton. Then, physicians interact with the system to introduce several fiducial markers that will later be used to

calculate a number of measurements. The colour of the text indicates the degree of interactivity required: Green means that the

process is fully automatic, blue means that it requires a small user intervention, and in orange we encode the tasks that require

a higher level of user input.

the anatomical structures that remain invariant in both conditions.

3. Relevant axial planes: From the fiducial markers, the sys-

tem automatically generates three axial planes that are

used for measurement.

4. Lung contour calculation: This is performed automatically for each CT image on the relevant planes (see Figure 3).

5. Distance measurements: On a selected plane, the system calculates the antero-posterior and lateral diameters. It also determines the distance from the fiducial marker to the skin along the antero-posterior axis.

6. Height of the lung: This is computed semi-automatically. First, the physician places a point near the top of the right lung lobe. Our system then refines the point so that it lies on the actual lung apex, and computes the lung’s height.
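Step 1 of the list above (region growing by flood fill) can be sketched as follows. This is a minimal illustration and not the code of the application: the volume is assumed to be a nested list indexed as volume[z][y][x] holding Hounsfield values, the seed is assumed to be given (the automatic seed search in the top slices is not shown), 6-connectivity is assumed, and the default threshold is the −500 HU value reported in Section 3.1.

```python
from collections import deque

GAS_THRESHOLD_HU = -500  # value reported in Section 3.1

# 6-connected neighbourhood (an assumption; the text does not state it).
NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def grow_gas_region(volume, seed, threshold=GAS_THRESHOLD_HU):
    """Return the set of (z, y, x) voxels below `threshold` connected to `seed`."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    sz, sy, sx = seed
    if volume[sz][sy][sx] >= threshold:
        return set()  # the seed itself is not gas
    region = {seed}
    queue = deque([seed])
    while queue:
        z, y, x = queue.popleft()
        for dz, dy, dx in NEIGHBOURS:
            nz, ny, nx = z + dz, y + dy, x + dx
            inside = 0 <= nz < depth and 0 <= ny < height and 0 <= nx < width
            if inside and (nz, ny, nx) not in region and volume[nz][ny][nx] < threshold:
                region.add((nz, ny, nx))
                queue.append((nz, ny, nx))
    return region


# Tiny synthetic example: one row in which a tissue voxel separates two gas runs.
volume = [[[-900, -800, 40, -900, -900]]]
print(sorted(grow_gas_region(volume, (0, 0, 0))))
```

A breadth-first traversal with an explicit queue avoids the recursion depth limits that a recursive flood fill would hit on volumes of this size.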

Figure 2 shows the different steps and the information required to perform each function. The tasks are colour-coded according to the amount of interaction required: orange means that the task requires a high degree of interaction, blue means that it works semi-automatically, and green indicates fully automatic processes. As noted previously, the degree of interaction could be reduced, but the physicians participating in this project preferred to have step-by-step control over the process.

One of the most crucial steps in this stage is the determi-

nation of the volume of the gas in the lungs. We proceed to

explain how it is achieved in the following section.

Figure 3: Left shows the plane where the computations

are carried out, while the right images show how antero-

posterior and lateral diameters are evaluated.

3.1. Lungs Segmentation and Gas Volume Computation

One crucial aspect of lung segmentation is the automatic classification of voxels containing gas. Gas and tissue are automatically separated using a user-defined threshold. Previous validation studies showed that Tgas = −500 HU (Hounsfield Units) provides the best accuracy in detecting gas volumes infused into the gut, with an error within the ±40 mL range [APA∗08].

Sometimes the gas segmentation process determines a connected volume that spreads beyond the lungs (this may be caused by another condition of the patient or by the accuracy of the acquisition process), and the medical doctors need to limit the amount of gas to be measured. This is




achieved with some extra editing tools. The first one is a plane placer that is used to isolate gas regions that might spread outside the lungs into other regions. We also adapted a 2D Eraser tool, common in many commercial imaging packages, to erase parts of the gas in an image. This tool allows the user to erase the air connection between two different anatomical regions by modifying the contents of the DICOM image: the area of the air connection is painted with a value different from the one assigned to air. This prevents the propagation through these regions (see Figure 4).

Figure 4: Lung cavity determination process. In order to separate the lung cavity from another anatomical structure containing gas, a semiautomatic process is required in which these structures are manually disconnected.

3.1.1. Gas Volume Computation

Measurements of the X-ray attenuation of a tissue are referenced to the relative attenuation of water and air using Hounsfield Units (HU). On this scale, the attenuation produced by pure air is −1,000 HU and the attenuation produced by water is 0 HU. Hence, 1 HU is 0.1% of the difference in attenuation between water and air. The attenuation of the tissues depends on their composition. By analysing the density value (HU scale) of a voxel in the CT images, it is therefore possible to compute the proportion of gas inside it. Voxels of −1,000 HU are considered to contain 100% gas. A voxel characterised by a mean density value of −500 HU is considered to be composed of 50% gas and 50% tissue. A voxel characterised by a mean CT value of −200 HU is considered to be composed of 20% gas and 80% tissue. Using this analysis and taking into account the segmentation of the air content in the lungs (see Section 3.1), it is possible to compute the volume of gas and tissue present in the lungs [BRL∗03]. Combining the first equations in [BRL∗03], the following equations are derived:

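\begin{align}
V_{gas} &= \sum_{i}^{\#pixels} \left[\, CT_i \geq 0 \;?\; 0 \;:\; \frac{-CT_i}{1000} \,\right] V_{pixel} \tag{1} \\
&= -\frac{V_{pixel}}{1000} \sum_{i}^{\#pixels} \min(0, CT_i) \tag{2} \\
V_{tissue} &= \sum_{i}^{\#pixels} \left[\, CT_i \geq 0 \;?\; 1 \;:\; \left( 1 + \frac{CT_i}{1000} \right) \right] V_{pixel} \notag
\end{align}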
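The equations above translate almost directly into code. The following sketch is illustrative only: the function name and the flat list of CT values are assumptions, the gas volume uses the min form of Equation (2), and the tissue volume uses the piecewise expression for Vtissue.

```python
def gas_and_tissue_volume(ct_values_hu, voxel_volume):
    """Return (gas_volume, tissue_volume) for the segmented voxels.

    ct_values_hu: CT numbers (HU) of the voxels inside the segmented region.
    voxel_volume: volume of a single voxel (Vpixel in the equations).
    """
    # Equation (2): only negative CT values contribute to the gas volume.
    gas = -voxel_volume / 1000.0 * sum(min(0, ct) for ct in ct_values_hu)
    # Vtissue: a voxel at or above 0 HU counts as pure tissue.
    tissue = voxel_volume * sum(
        1.0 if ct >= 0 else 1.0 + ct / 1000.0 for ct in ct_values_hu
    )
    return gas, tissue


# Voxel size quoted in Section 3: 0.916 x 0.916 x 1.6 mm.
voxel_mm3 = 0.916 * 0.916 * 1.6
gas, tissue = gas_and_tissue_volume([-1000, -500, -200, 30], voxel_mm3)
print(gas, tissue)
```

For CT values between −1,000 and 0 HU the two volumes of a voxel add up to Vpixel, which gives a quick sanity check on any implementation.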

3.1.2. Relevant measurements

As already stated, throughout the development, several tools

were created to obtain a relatively large number of mea-

surements. In this section we will only highlight the mea-

surements that required special tools that are dependent on

the type of problem and data. All of them were added to

the volume renderer. First, we introduce the architecture of

this module, and then, the relevant details of some of these

morpho-volumetric measurements.

Physicians considered the following measurements as rel-

evant:

• Height of the right lobe of the lungs.

• Maximum length of the pulmonary cavity in the antero-posterior direction on a given axial plane (see Figure 3-top right).

• Maximum length of the pulmonary cavity in the lateral di-

rection on a given axial plane (see Figure 3-bottom right).

• Area and perimeter of the pulmonary cavity in a certain

axial plane.

• Total amount of gas in the pulmonary cavity.

Figure 5: Thoraxmetry module. At this point, the module shows the relevant planes, and the different measurements are available from the menu on the right.

Using the segmentation of the lungs, we can calculate au-

tomatically, or with a minimum user input, different mea-

sures in order to study the variability in morphology in both




conditions. As explained before, some of the calculations are performed on specific axial planes.

1. Lung width along the sagittal axis: Assuming that the sagittal axis corresponds to one of the main axes of the image, the width can be computed as the maximum length between two points belonging to the isocontour.

2. Distance from the reference point of the axial plane to the skin along the coronal axis.

3. Lobe height: This is computed semi-automatically. Once the user has set a fiducial point lying on the surface of the lung at its highest part (the part closest to the trachea), the application searches for two points, P1 and P2. These points belong to the pulmonary surface and define a line parallel to the axis of the images. They must fulfil the following: P1 is the highest point on the lung surface (pulmonary vertex) and P2 is the point placed at the greatest distance from P1 (this latter point is also called the diaphragmatic cusp).
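As an illustration of the first measurement, the sketch below computes the extent of a segmented region along the two image axes of an axial slice. It is a simplified reading of the description, not the method of the application: the slice is assumed to be a binary mask (a list of rows), the image x axis is assumed to be the lateral direction and the y axis the antero-posterior one, and the extent is taken over the whole region rather than along a single line.

```python
def axis_extents_mm(mask, spacing_x_mm, spacing_y_mm):
    """Return (lateral, antero_posterior) extents in mm of the nonzero pixels.

    mask: list of rows, each a list of 0/1 values (1 = pulmonary cavity).
    """
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for row in mask for x, value in enumerate(row) if value]
    if not rows:
        return 0.0, 0.0  # empty segmentation
    # +1 so that a single pixel has an extent of one pixel spacing.
    lateral = (max(cols) - min(cols) + 1) * spacing_x_mm
    antero_posterior = (max(rows) - min(rows) + 1) * spacing_y_mm
    return lateral, antero_posterior


mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]
# In-plane pixel spacing quoted in Section 3 (0.916 mm in both directions).
print(axis_extents_mm(mask, 0.916, 0.916))
```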

4. Abdometry

Similar to the Thoraxmetry module, Abdometry takes as input a set of CT images in basal and distension conditions. It also receives the segmentation of the skin and the skeleton, provided by the Thoraxmetry module. These elements are used to take measurements and set fiducial points. Furthermore, the Thoraxmetry module also supplies the segmentation of the gas content inside the body (the gas inside the pulmonary cavity has already been identified). Then, physicians proceed to refine the gas classification and to measure different elements, as detailed in the following sections.

The analysis of the data is depicted in Figure 6 and pro-

ceeds as follows: First, the gas in the abdominal cavity is

analysed and properly tagged (see Section 4.1). Next, the

volume of the abdominal cavity is measured.

From the medical point of view, the abdominal cavity is the “Area between the chest and the hips that contains the stomach, small intestine, large intestine, liver, gallbladder, pancreas, and spleen” [hyp]. After several discussions with the specialists, the abdominal cavity was defined as the region limited by a set of planes, as shown in Figure 7.

As in the previous case, we first describe the process of gas segmentation and measurement, and then present the most relevant measurement tools.

4.1. Gas volume measurement: Split and label Bubbles

As previously explained, the Thoraxmetry module first detects the gas and classifies the part belonging to the pulmonary cavity. The remainder of the gas is assumed to be abdominal at this point. However, not all of this gas is relevant for analysing the abdominal cavity. Therefore, a labelling tool has been developed to help physicians perform its fine classification.

Figure 7: Delimiting planes that determine the abdominal cavity region.

In order to do so, the system visualises the remaining gas

through the use of a Transfer Function that shows the voxels

whose value is within a known range. From now on, the con-

nected elements in these renderings are called bubbles. Us-

ing this visualisation, the physician may remove the remain-

ing external bubbles and classify the valid ones according to

the anatomical region they belong to. From the valid bub-

bles, physicians may partially remove or split regions that

are not significant for measurements with a tool provided to

this end. Bubbles are also labeled according to the gut region

that holds them.

First of all, a semantic representation of all the bubbles present in the volume dataset is calculated. To do so, a region-growing algorithm is applied repeatedly until all the bubbles are tagged. Due to the sampling resolution, different anatomical bubbles may appear merged with each other or joined to another anatomical structure such as the stomach or the lungs. Moreover, even when bubbles are not connected for this reason, it is also necessary to split some bubbles in order to classify them as belonging to different parts of the gut.

In this task, the user specifies the bubble to cut and two points that define where to cut it. These points define a plane by taking the coordinates of both (on the near plane) and defining a third one projected onto the same pixel but at maximum depth (Z = Z_far). The plane is defined in world coordinates.
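The construction of the cutting plane can be sketched as follows, assuming that the three points have already been unprojected to world coordinates (the two picked points on the near plane and the third at maximum depth). The function names are hypothetical and the unprojection itself, which depends on the rendering camera, is not shown.

```python
import math


def plane_from_points(p1, p2, p3):
    """Return (unit_normal, offset) such that normal . x + offset = 0 on the plane."""
    u = [p2[i] - p1[i] for i in range(3)]
    v = [p3[i] - p1[i] for i in range(3)]
    # The cross product of two in-plane vectors gives the plane normal.
    normal = [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    ]
    length = math.sqrt(sum(c * c for c in normal))
    if length == 0.0:
        raise ValueError("the three points are collinear")
    normal = tuple(c / length for c in normal)
    offset = -sum(normal[i] * p1[i] for i in range(3))
    return normal, offset


def signed_distance(point, normal, offset):
    """Positive on the side the normal points to, negative on the other side."""
    return sum(normal[i] * point[i] for i in range(3)) + offset


# Hypothetical world-space points: two picked points and the far-depth point.
normal, offset = plane_from_points((0, 0, 0), (1, 0, 0), (0, 0, 1))
print(normal, signed_distance((0, 2, 0), normal, offset))
```

The sign of the distance is what decides on which side of the cut a voxel center lies.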

The user may also define a set of planes by repeating the previous process. This is useful because sometimes a bubble must be divided by a shape that adapts to a certain anatomical geometry (e.g. when the bubble belongs to two different organs).




Figure 6: Workflow of the Abdometry module. The Thoraxmetry module provides the information shared by the two modules. First, the physicians analyze the segmentation of the gas content and refine it appropriately, in order to calculate the volume of the gas inside the different anatomical parts into which the gut is divided. Then, physicians interact with the system to introduce several fiducial markers that will later be used to calculate a number of measurements. Again, the colour of the text labelling the operations indicates the degree of interactivity required: green means that the process is fully automatic, while orange indicates the tasks that require a higher level of user input.

The result of cutting a bubble with the cutting planes can be two or more bubbles, depending on its topology and provided that the cutting plane actually intersects the original bubble. A flood-fill traversal is then performed starting from the seed of the bubble to cut. At each traversed pixel we check whether its center lies in the positive half-space defined by the cutting plane (or polygonal surface). If the test is positive, the pixel is given a new label, and a seed is established for each new bubble at its first newly found pixel. Figure 8 shows the bubble view of the module and an example of cutting a bubble.
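A simplified version of this relabelling step is sketched below. It is not the implementation of the application: the bubble is assumed to be a set of integer voxel coordinates that also serve as voxel centers, a single cutting plane is given as a normal and an offset, 6-connectivity is assumed, and all voxels on the positive side receive the same new label (the per-bubble seeds for several resulting bubbles are not tracked).

```python
from collections import deque

NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def split_bubble(bubble, seed, normal, offset, old_label, new_label):
    """Flood-fill `bubble` from `seed` and relabel voxels on the positive side.

    bubble: set of (x, y, z) voxels forming the bubble to cut (seed included).
    normal, offset: cutting plane, normal . p + offset = 0.
    Returns a dict mapping every reached voxel to its label.
    """
    labels = {}
    visited = {seed}
    queue = deque([seed])
    while queue:
        voxel = queue.popleft()
        side = sum(n * c for n, c in zip(normal, voxel)) + offset
        labels[voxel] = new_label if side > 0 else old_label
        for dx, dy, dz in NEIGHBOURS:
            neighbour = (voxel[0] + dx, voxel[1] + dy, voxel[2] + dz)
            if neighbour in bubble and neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return labels


# A six-voxel bubble along x, cut by the plane x = 2.5.
bubble = {(x, 0, 0) for x in range(6)}
labels = split_bubble(bubble, (0, 0, 0), (1, 0, 0), -2.5, old_label=1, new_label=2)
print([labels[(x, 0, 0)] for x in range(6)])
```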

From this point, the medical doctor proceeds to select sev-

eral relevant measurements.

4.2. Required measurements

For this module (see a screenshot in Figure 9), physicians

considered that the following measures should be computed:

• Total amount of gas in the gut and the volume of the ab-

dominal cavity (see Figure 10b).

• The antero-posterior abdominal diameter, measured at the

level of the iliac crests.

• The position of the diaphragm, measured as the distance (along the vertical axis) between the left diaphragmatic dome and the line connecting the iliac crests.

• Lumbar lordosis, measured in the sagittal plane as the angle between lines tangential to the cranial end-plate of the first and the caudal end-plate of the fifth lumbar vertebra [JFK∗96].

Figure 8: The Split and Label Bubbles view. The top image

shows the main visualization in order to select and label the

different bubbles. On the right side, the volume of the dif-

ferent labels is shown. On the bottom, the split process is

shown.

• The girth, measured by averaging the perimeter of the abdominal surface measured in 10 axial slices 4 mm apart,




starting tangentially to the iliac crest in the cephalad di-

rection (see Figure 10a).

Figure 9: Abdometry module. Once the anatomical structures of interest have been classified and the fiducial markers have been introduced, the physician may perform the different calculations using the widgets on the right.

4.3. Implementation

To determine the abdominal region, we compute three planes

from a set of fiducial points set by the physicians. These

planes are depicted in Figure 7. In this case, the relevant fidu-

cial points are:

• A: The apophysis of the first vertebra under the lungs and

the diaphragm.

• B: The apophysis of the vertebra next to A.

• C: The right cusp of the pelvis.

• D: The left cusp of the pelvis.

• E: A point in the air in front of the junction of the pelvic bones, at its most prominent frontal part.

From these fiducial points, we define three planes:

• Plane Aab is the plane that passes through A and whose

normal is the line that goes from A to B. This is a cranial

plane that ideally should lie above the diaphragm, i.e. it

should contain the two lung lobes.

• Plane CDE: Caudad plane referenced to the pelvis. Specifically, it passes through markers C, D and E.

• Plane Ecd: Final bottom plane, which passes through E and is rotated around the free vector defined by the line going from Marker C to Marker D. The angle of rotation depends on the subject and is specified by the user; the goal is to obtain a “horizontal” cutting plane.

When Plane Aab cuts the diaphragm (in other words, when portions of the lungs lie below Plane Aab), a pair of extra markers is needed. These markers make it possible to take into account the abdominal volume above the highest cutting plane, for a more accurate estimation of the total abdominal volume.

• Marker Lu1: A half ellipsoid (the northern portion) whose

radius and height can be interactively modified by the

user, that is placed enclosing most of the lung lobe above

Plane Aab, and as tight to the lobe as possible.

• Marker Lu2: Similar to Marker Lu1 but for the other lung.

In order to get rid of the volume of the heart (Plane Aab can cut the lungs, as noted above), two more entities have to be specified. Heart A and Heart B are two cylinders whose combined volume is intended to mimic the heart. The centers and radii of the cylinders can be easily established by considering the “interior” limits of the lungs; to facilitate their placement, the application shows the axial slice and sets the camera direction to +Z. Since the top of these cylinders is automatically set to the border (in the Z dimension) of the input image, only the bases of the cylinders remain to be set. To facilitate their placement in the 3D scene, the application can show the coronal cutting plane, with the camera position set to +Y, so that the bases can be matched to the abdominal limit along the cylinder axis (i.e. the vertical line).

Once all of these entities have been established, the application can calculate the abdominal volume by counting all the voxels belonging to body tissue that lie in the positive half-space of all three planes and do not belong to the heart entities. For voxels above Plane Aab, it must also be checked whether the voxel is inside either of the two ellipsoids.
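The counting step can be sketched as follows under several assumptions that are not stated in the text: body-tissue voxels are given as a list of center coordinates in mm, each plane is a tuple (nx, ny, nz, d) whose positive half-space is nx·x + ny·y + nz·z + d > 0, and each heart cylinder is a Z-aligned tuple (cx, cy, radius, z_base) extending from its base to the image border in the +Z direction. The additional ellipsoid test for voxels above Plane Aab is deliberately left out of this sketch.

```python
def abdominal_volume(body_voxels_mm, planes, heart_cylinders, voxel_volume_mm3):
    """Count body voxels inside all half-spaces and outside the heart cylinders."""
    count = 0
    for x, y, z in body_voxels_mm:
        inside_planes = all(
            nx * x + ny * y + nz * z + d > 0 for nx, ny, nz, d in planes
        )
        in_heart = any(
            (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2 and z >= z_base
            for cx, cy, radius, z_base in heart_cylinders
        )
        if inside_planes and not in_heart:
            count += 1
    return count * voxel_volume_mm3


# Hypothetical configuration: the slab 0 < z < 10, the half-space y > -100,
# and one cylinder standing in for the heart.
planes = [(0, 0, 1, 0), (0, 0, -1, 10), (0, 1, 0, 100)]
heart = [(50, 0, 5, 4)]
voxels = [(0, 0, 1), (0, 0, 5), (0, 0, 12), (50, 0, 5)]
print(abdominal_volume(voxels, planes, heart, 1.5))
```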

(a) (b)

Figure 10: Two of the main measurements for the Abdom-

etry module: Left shows the different contours for the girth

calculation. Right shows the measurement of the total ab-

dominal volume, used to compare intra-individual changes.

This volume is measured as the body volume between two

planes minus the volume of the lungs (pulmonary air below

the cranial plane and the heart).

4.4. Other measurements

The system allows the physicians to define other relevant

fiducial points that are later used for more measurements.

These fiducial points include:

• L1: Point placed on the center of the first vertebra under

T12 in the front zone.


L1 in the front zone.










• S1: The upper salient point of the first sacral vertebra, at the junction.

Since we have a binarization of the different anatomical structures, and using the fiducial markers, we are able to automatically calculate (at no extra cost) several measures that are useful to the physicians:

• Girth: It is measured by averaging the perimeter of the

abdominal surface measured in 10 axial slices 4 mm apart,

starting tangentially to the iliac crest in the cephalad di-

rection. At each site, girth was measured as the length of

a polyline following the body contour.

• Lumbar Lordosis: The Lumbar Lordosis (LL) or Lumbar Lordotic Angle (LLA) is measured in different ways in the literature; we measure it as the angle from the cephalad end plate of the first lumbar vertebra to the caudal end plate of the fifth lumbar vertebra. Since we have these fiducial markers, we can measure it automatically, using an interactive spline to approximate the lordosis curve. In this way, the user can interactively change the shape of the lordosis curve (depending on the anatomical shape of the patient) (see Figure 12).

• Column to anterior wall distance (C2AWD): This measure is taken at a set of different heights, derived from the morphology of the column. Specifically, there are six samples: one at the salient point of the bottom of each of the vertebrae L1, L2, L3, L4 and L5, and one at the salient point of S1. Apart from giving the C2AWD measure for each of those markers (or heights), the system outputs a final measure that is the mean of those values. To compute it, we take advantage of the volume dataset generated to calculate the abdominal volume and extract the isocontour of the abdominal wall (“air” to “skin” transition). Once the corresponding contours have been extracted, we calculate the maximum distance along the sagittal axis (see Figure 11).
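Two of the measurements above reduce to simple geometry and can be sketched as follows. The sketch is illustrative only: contours are assumed to be lists of 2D points in mm, already extracted for each slice, and the lordosis angle is computed between two straight end-plate direction vectors in the sagittal plane (the literature definition), not from the interactive spline used in the application. It requires Python 3.8 or later for math.dist.

```python
import math


def closed_polyline_length(points):
    """Perimeter of a closed polyline given as a list of (x, y) points."""
    return sum(
        math.dist(points[i], points[(i + 1) % len(points)])
        for i in range(len(points))
    )


def girth(contours):
    """Average perimeter over the body contours of the selected axial slices."""
    return sum(closed_polyline_length(c) for c in contours) / len(contours)


def angle_between_lines_deg(direction_a, direction_b):
    """Unsigned angle in degrees between two 2D direction vectors."""
    cross = direction_a[0] * direction_b[1] - direction_a[1] * direction_b[0]
    dot = direction_a[0] * direction_b[0] + direction_a[1] * direction_b[1]
    return math.degrees(math.atan2(abs(cross), dot))


# Hypothetical data: two identical square contours and two end-plate directions.
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(girth([square, square]))
print(angle_between_lines_deg((1, 0), (1, 1)))
```

In the protocol described above, the list passed to girth would hold the 10 contours taken 4 mm apart.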

Figure 11: Column to Anterior Wall Distance computation.

Figure 12: Lordosis computation. The left image shows the way the lordosis is calculated in the literature. The right image shows our calculation. Users can adjust the spline widget in order to recompute the measurement.

5. Results and Conclusions

We have developed a system to aid in the diagnosis of patients with digestive bloating. The system comprises two modules that facilitate the measurement of several relevant morpho-volumetric parameters of the abdominal (Abdometry) and pulmonary (Thoraxmetry) cavities.

As a result, the system is used to diagnose organic and functional diseases that present abdominal distension as one of their main symptoms. Organic disorders are determined through the analysis of the abdominal cavity with the Abdometry module, since such patients present a significant increase of gas in the gut (247 ± 39 ml vs 151 ± 28 ml in basal condition; p < 0.001, 35 patients), whose consequence is a diaphragmatic ascension and a thoracic expansion to compensate for it.

Functional distension is characterised by a relevant increase of gas in the lungs (2,125 ± 112 ml vs 1,658 ± 88 ml in basal condition; p < 0.001, 35 patients) that moves the diaphragm down, increases the lungs’ vertical size, and produces abdominal protrusion. The determination of this disorder was only possible thanks to the development of the Thoraxmetry module. Therefore, a group of patients with a previously unknown (actually misclassified) pathology, named functional distension, is now effectively diagnosed.

The system has proven very useful for medical research and has also been incorporated into clinical practice. Nowadays, the physicians use these tools to diagnose the patients and guide their treatments. For instance, functional patients are treated by teaching them breathing exercises. At the time of writing, the Abdometry module has been used with more than 150 patients. The overall system (the combination of the two modules) has been used for the diagnosis of 92 patients. The average time was 30 minutes for each of the conditions to be studied (basal and distension).

For the time being, the application works on demand: the user has to request each task explicitly, whether extraction of the different structures, calculation of the different volumes, or the different measurements. Zhou et al. [ZHF∗04] proposed an automatic method for the identification of the human torso region from CT images by separating it into seven parts (skin, subcutaneous fat, muscle, bone, diaphragm, thoracic cavity and abdominal cavity) based on the CT number distribution and the spatial relations between different organ and tissue regions. Our system follows a similar workflow but takes into account the singularities of the CT images, since we are dealing with patients with abdominal diseases, and offers a repair process for the abnormal cases. In the future, we will continue the development of the system by increasing the automation of the tasks, as physicians become comfortable with them.

6. Acknowledgments

This work has been supported by the Spanish Ministry of Economy and Competitiveness (project TIN2010-20590-C01-01) and the Spanish Ministry of Education (Dirección General de Investigación, SAF 2009-07416). Ciberehd is funded by the Instituto de Salud Carlos III.

The authors want to thank all the participants involved in the project, especially Frederic Perez, who provided the initial software for this project. He was the scientist in charge of developing the first version of the Abdometry software module.

References

[APA∗08] ACCARINO A., PEREZ F., AZPIROZ F., QUIROGA S., MALAGELADA J.: Intestinal gas and bloating: Effect of prokinetic stimulation. Gastroenterology 103 (2008), 2036–2042. 1, 4

[APA∗09] ACCARINO A., PEREZ F., AZPIROZ F., QUIROGA S., MALAGELADA J.: Abdominal distention results from caudo-ventral redistribution of contents. Gastroenterology 136 (2009), 1544–1551. 1

[BRL∗03] BOUHEMAD B., RICHECOEUR J., LU Q., MALBOUISSON L., CLUZEL P., ROUBY J.-J.: Effects of Contrast Material on Computed Tomographic Measurements of Lung Volumes in Patients with Acute Lung Injury. Crit Care 7, 1 (2003), 63–71. 5

[HGM09] HU, GROSSBERG, MAGERAS: Survey of recent volumetric medical image segmentation techniques. Biomedical Engineering (2009), 321–346. 3

[hyp] HYPERDICTIONARY.COM: Medical dictionary. www.hyperdictionary.com. 6

[JFK∗96] JR P. D., FX K., KA M., LM A., M M., AS C.: Measurement of lumbar lordosis. Evaluation of intraobserver, interobserver, and technique variability. Spine 13, 21 (1996), 1530–5. 7

[LC87] LORENSEN W., CLINE H. E.: Marching cubes: A high resolution 3d surface construction algorithm. SIGGRAPH Comput. Graph. 21, 4 (Aug. 1987), 163–169. 3

[nSNC03] IBÁÑEZ L., SCHROEDER W., NG L., CATES J.: The ITK Software Guide: The Insight Segmentation and Registration Toolkit. Kitware, 2003. 3

[PB07] PREIM B., BARTZ D.: Visualization in Medicine: Theory, Algorithms, and Applications. Elsevier, 2007. 3

[SML04] SCHROEDER W., MARTIN K., LORENSEN B.: The Visualization Toolkit. Kitware, 2004. 3

[The07] THELIN: Foundations of Qt Development. Apress, 2007. 3

[VAS∗08] VILLORIA A., AZPIROZ F., SOLDEVILLA A., PEREZ F., MALAGELADA J.-R.: Abdominal accommodation: a coordinated adaptation of the abdominal wall to its content. Am J Gastroenterol 103, 11 (2008), 2807–15. 1, 2

[ZHF∗04] ZHOU X., HARA T., FUJITA H., YOKOYAMA R., KIRYU T., HOSHI H.: Automated segmentations of skin, soft-tissue, and skeleton, from torso ct images. In Medical Imaging 2004 (2004), International Society for Optics and Photonics, pp. 1634–1639. 10




Extending neuron simulation visualizations with

haptic feedback

Laura Raya1 & Pablo Aguilar2 & Marcos García1 & Juan B. Hernado3

1 Grupo de Modelado y Realidad Virtual (GMRV), Universidad Rey Juan Carlos

2 Universidad Politécnica de Madrid

3 Centro de Supercomputación y Visualización de Madrid (CeSViMa), Universidad Politécnica de Madrid

Abstract

Thanks to the advances in supercomputing and large-scale data acquisition technologies, neuroscientists are adopting simulation-based research as an alternative to wet laboratory experiments. In this scenario, the size and complexity of the data produced by simulations of detailed cortical circuits have shifted the bottleneck from data acquisition to the analysis phase. In this paper, we present a multimodal system to help in the exploratory visualization of the data obtained from these simulations. Such a tool has to map simulation results onto a complex and cluttered network topology, which poses challenges in both the rendering of and the interaction with the data. The direct visualization of the circuits is full of clutter and occlusion; transparency and pruning can help reduce the visual complexity, but they do not address other problems such as interaction and navigation. During the exploration of the circuit, selecting and exploring a specific path using just the visual modality is one of the most complex tasks, unless additional visual cues are included in the already saturated visual channel. Motivated by this challenge, one of our main objectives is to incorporate haptic techniques for path exploration. Tactile information helps during fine-scale exploration by constraining the motion to lie on connected parts of the topology. Additionally, our force feedback system reinforces the saturated visual modality by representing data using haptic metaphors.

Categories and Subject Descriptors (according to ACM CCS): I.3.3 [Computer Graphics]: Design Tools and Techniques/User Interface; D.2.2 [Graphics Utilities]; I.3.4 Virtual device interfaces

1. Introduction

The combination of growing computing power and the development of industrial-scale experimental data acquisition systems has, in recent years, made simulation-based research viable in traditionally experimental disciplines. Many disciplines within the Life Sciences are adopting this form of experimentation, known as in silico, as a pillar complementary to theory and traditional experimentation.

In simulation-based research, it is essential that the domain expert be able to inspect the results of virtual experiments interactively (in real time if possible) in order to refine models and test hypotheses. However, analysis capacity has not grown at the same pace as data production, and Visualization is one of the fields trying to address this bottleneck.

In Neuroscience specifically, recent advances in techniques for acquiring quantitative information have led to the reconstruction and synthesis of realistic models of significant portions of neural tissue [Mar06, LDSO11] that can be simulated and analyzed with the help of supercomputers. Many freely available simulators (NEURON [CH06], Moose [Bha06], [KW11]) have been improving their numerical methods to scale to massively parallel supercomputers; however, the visualization and analysis tools they provide are far more basic. The alternative is to carry out the analysis with more generic tools, but these do not offer domain-specific techniques that let scientists focus on scientific rather than technical questions. Nor do these tools provide visualization techniques suited to the geometric complexity of the modeled circuits. This complexity affects not only image synthesis but also the effectiveness of interaction.



This work is framed within the Blue Brain Project, which is currently modeling a portion of the cerebral cortex of the young rat. To enable interactive visualization of the models, high-performance visualization technology has been developed [EBA∗12, HPS12, HBB∗13]. The main goal so far has been to improve rendering efficiency and the flexibility of the visual representation. Nevertheless, the amount of visual information can be large, and navigating and selecting elements poses problems of its own that basic interaction methods do not address.

Interpreting complex, large-scale data volumes can be simplified by incorporating multisensory information. In this regard, adding the sense of touch to visualization can improve user performance and ease data interpretation [BOYBK90, vRBKB03]. In the simulation of cortical columns, the circuit topology is a determining factor of its behavior. For this reason, when analyzing the results of a simulation, a correct visualization of the results is just as important as the scientist being able to understand the topology of the neural network under study. The characteristics of neural circuits, both their large number of elements and their topological and geometric structure, make visual analysis of the system topology difficult. Constrained haptic navigation, planned navigation and path search methods can facilitate the exploration of this topology [ROG11a, BC11].

Figure 1: Image of 5 cortical minicolumns comprising 500 neurons in total, with visual cues applied according to the voltage value, using the multimodal interface presented here.

This work presents a visualization system built on two pillars: a visualization engine with specific techniques for the interactive rendering of cortical circuits, and a haptic system that supports interaction, navigation and interpretation of the data using haptic guidance methods and haptic metaphors. The result is a multimodal interface that can facilitate the visualization and exploration of cortical neural circuits.

The main advantages of using haptic devices in this multimodal interface over traditional ones such as the mouse can be summarized in three points: (i) a stylus-type device combined with our navigation techniques offers an intuitive 3D interface for navigating the neural environment; (ii) tactile information aids the topological exploration of the circuit by constraining the user's navigation to connected elements and guiding the user through complex branchings to different points of interest [ROG11a]; (iii) haptic metaphors represent variables and values, offloading the visual channel.

2. Simulation of cortical circuits

The simulations visualized in this work come from a synthetic model of a portion of the rat cortex corresponding to a cylinder perpendicular to the cortex, about 0.5 mm thick and 1.5 mm high. The construction of this circuit is presented in the work of Hill et al. [HWR∗12]. Figure 1 shows a visualization of a subset of neurons from a simulated circuit.

The morphological skeletons of the neurons are manual reconstructions made by technicians. From these skeletons, polygonal meshes of the cell membrane can be obtained [LHS∗12]. On average, the morphological skeletons have 4,200 segments and the meshes 140,000 triangles. Figure 2 shows a portion of a skeleton and its corresponding mesh. The skeletons are instantiated in space according to biological constraints to form a biologically plausible circuit. A cylinder of 0.5 mm radius requires 10,000 neurons, which are organized into small cylinders of about 100 neurons (known as minicolumns). After obtaining the contacts between cells and converting them into synapses using available statistical information, the electrical behavior is simulated using a specialized version of the NEURON simulator [CH06, HES08]. The result of the simulations is, on the one hand, the sequence of firing times of each neuron and, on the other, the electrical variables associated with each compartment, chiefly the membrane potential.

For visualization purposes, the direct representation of the circuit has a visual complexity that makes it hard to understand. Neurons are geometrically complex objects with very small local details compared to their convex hull. Moreover, a cortical circuit is a dense tangle in which branches overlap one another heavily. Pruning branches and using transparency help reduce the visual complexity but do not simplify navigation and interaction with the data. Because of the density and size of the branches relative to the circuit, visually navigating and following branches demands great attention and can even be impossible without additional visual cues (coloring, stereoscopy, camera movements) or multimodal interaction.

Figure 2: Morphological skeleton of a neuron (in red) and resulting mesh (in gray).

3. Related work

In scientific computing, the growth in the amount and complexity of generated data outstrips our ability to interpret this kind of information quickly and easily, which has motivated new techniques for better visualization of scientific data [Max05, BCE∗92]. A greater challenge has been designing methods that yield an intuitive visual representation in multivariate settings [BH07, LJH03, DKS∗05].

When analyzing complex data, the sense of sight may become saturated if too much information is presented at once [Rob00]. To mitigate this problem, multimodal interfaces use a larger number of modal channels to represent the different types of information [LJ99]. Adding the sense of touch to information visualization can increase user performance and ease data interpretation [BOYBK90, vRBKB03]. In the specific case of Haptic Data Visualization (HDV), the aim is not only to use the haptic device to increase the realism of a virtual object by providing tactile characteristics, but also for the user to understand data represented through the haptic channel, obtain the values of the variables and draw conclusions from those data [PR09, Fra07, KTP05].

Within the scope of this article, several tools allow interactive visualization of modeled neural circuits, such as NEURON [CH06], Whole Brain Catalog [LAM∗10], Neuroconstruct [GSS07] and RTNEURON [HSMd08]. In many cases, these are the same tools used to build and simulate the circuits, and none of them provides advanced, domain-specific interaction techniques.

From the standpoint of structural analysis, several authors [SEI10, OVVDW10] have presented visualization techniques to ease the understanding of connectivity between brain regions. However, those works focus on white matter, which, unlike the cortical circuits used in this work, has a much more organized appearance.

4. RTHNeuron

The application on which this work builds, RTNeuron, is based on OpenSceneGraph [OB∗13] and provides interactive visualization of neural circuit simulations; it is specifically designed to handle the geometric complexity of the data.

Visualizing simulation results consists of translating the simulation values along the branches into a color per vertex. The propagation of the electrical impulse along the axon can also be visualized as a modulation of color/transparency and thickness along the axon.

To speed up image generation, RTNeuron implements several levels of detail as well as parallel rendering techniques with image-space (sort-first) and object-space (sort-last) parallelization, and some combinations of both; Equalizer [EMP09] is the tool chosen for this purpose.

To simplify and modulate visual complexity, RTNeuron implements an algorithm for rendering transparent geometry [BM08] and prunes neurons according to the number of bifurcations of a branch from the soma.
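Pruning by the number of bifurcations from the soma can be illustrated with a minimal sketch. The `Section` type, the `prune` function and the `max_order` parameter are hypothetical names introduced here for illustration; they are not part of RTNeuron. Each section is assumed to be the stretch of a branch between two bifurcations, so its depth in the tree equals the number of bifurcations separating it from the soma.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """A stretch of a branch between two bifurcations (hypothetical type)."""
    name: str
    children: list = field(default_factory=list)


def prune(section, max_order, order=0):
    """Return a copy of the tree keeping only sections whose branch order
    (number of bifurcations from the soma) is at most max_order."""
    kept = []
    if order + 1 <= max_order:
        kept = [prune(child, max_order, order + 1) for child in section.children]
    return Section(section.name, kept)


def count_sections(section):
    """Count the sections in a (possibly pruned) tree."""
    return 1 + sum(count_sections(child) for child in section.children)
```

A lower `max_order` keeps only the trunk of each neuron near the soma, which reduces clutter at the cost of hiding distal branches.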

4.1. Haptic rendering and interaction

The image synthesis techniques presented in the previous section ease the analysis and interpretation of data from neural simulations. However, when the data volume is large, the visual channel saturates relatively easily.

For this reason we created RTHNeuron, a multimodal tool that aims to simplify data analysis by combining the rendering techniques developed for RTNeuron with new haptic techniques (see Figure 3). For this tool, haptic integration algorithms have been used to simplify the following tasks:

• Improve understanding of the neural topology. To this end, three haptic guidance methods are presented: free navigation, constrained navigation and planned navigation. The three methods help the user follow the topology of each neuron by constraining the user's movement according to the force exerted by the haptic device.

• Simplify the visual representation. Algorithms have been developed that represent data through haptic metaphors, offloading the visual channel.

• Provide the user with a friendly 3D interface for neural analysis. Haptic devices allow the workspace to be moved, rotated and scaled naturally.

The large number of interconnected neural filaments makes the visual analysis of a specific structure difficult. The haptic navigation methods included in this tool aim to ease that task. In free navigation mode, no haptic force is returned to the user. The user can therefore move freely through all the data with the degrees of freedom the device allows, without force feedback. This makes it easy to place the virtual workspace at the point to be analyzed, and to observe the structure as a whole without connectivity restrictions between filaments. In this mode, the haptic device is used only to control the camera, letting the user frame the neural structure from the desired viewpoint through rotations, translations and zooms, with all scene transformations centered on the position of the virtual tool.

Figure 3: Top image, haptic navigation through 20 neurons. Bottom image, user testing the multimodal interface.

Translations are performed using the three positional degrees of freedom of the haptic device used, the Phantom Omni. The user can also rotate the virtual environment (about the proxy position) using the degrees of freedom the device provides. However, tests with users indicate that rotation about the y axis is the most intuitive, so the tool disables rotation about the x axis by default. Finally, zoom (centered on the proxy position) is controlled with the device's two buttons (zoom in and zoom out). The speed of all three movements can be configured by the user. In addition, to reduce the limitations imposed by the size of the haptic device's workspace, the working area is selected using [DLmB∗05].

Constrained navigation mode limits the user's movement to following the connected neural path, providing topological information through the haptic channel. The neural structure is represented haptically as a set of nodes connected by linear segments. Let h denote the unconstrained position of the haptic device in the virtual world (known as the probe). Let p denote the position of the proxy, which locally minimizes the distance to h and is constrained to lie on the neural structure. In each iteration of the haptic update loop, the new proxy position is computed along with a force returned to the user:

f = K(p − h),   (1)

where K is the proportionality constant, which is adjusted according to the characteristics of the user.

The constrained proxy position prevents the user from accidentally switching to another, unconnected neuron, giving a better perception of the position of each neuron and easing the topological exploration of neurons with many branchings [ROG11a].
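One iteration of the constrained update can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in RTHNeuron: the local minimization is approximated by searching only the current segment and the segments that share a node with it, and the names `closest_point_on_segment`, `update_proxy` and `spring_force` are hypothetical.

```python
import math


def closest_point_on_segment(a, b, h):
    """Closest point to h on the segment from a to b (3D tuples)."""
    ab = tuple(bi - ai for ai, bi in zip(a, b))
    ah = tuple(hi - ai for ai, hi in zip(a, h))
    denom = sum(c * c for c in ab)
    if denom == 0.0:  # degenerate segment
        return tuple(a)
    t = sum(x * y for x, y in zip(ah, ab)) / denom
    t = max(0.0, min(1.0, t))  # clamp the projection to the segment
    return tuple(ai + t * c for ai, c in zip(a, ab))


def update_proxy(nodes, segments, current, h):
    """Move the proxy to the closest point on the current segment or on a
    segment sharing a node with it (assumed notion of 'local' search), so
    the proxy cannot jump to an unconnected branch.
    Returns (segment index, proxy position)."""
    i, j = segments[current]
    best = None
    for k, (a, b) in enumerate(segments):
        if not {a, b} & {i, j}:
            continue  # not connected to the current segment
        p = closest_point_on_segment(nodes[a], nodes[b], h)
        d = math.dist(p, h)
        if best is None or d < best[0]:
            best = (d, k, p)
    return best[1], best[2]


def spring_force(p, h, K):
    """Equation (1): f = K (p - h)."""
    return tuple(K * (pi - hi) for pi, hi in zip(p, h))
```

Restricting the search to connected segments is what keeps the proxy on the same neuron even when the probe passes closer to a different, unconnected branch.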

Figure 4: The user moves the probe (h) in a straight line. The proxy (p) is constrained to lie on the neural structure.

Planned navigation mode allows semi-automatic navigation through the structure toward a target point goal. The haptic device exerts a force that compels the user to navigate toward a given point of interest. Points of interest can be specified manually by the user or semi-automatically. Manual mode lets the user specify the target point directly with the haptic device. From the current proxy position, the user can topologically analyze the entire path despite any visual occlusions between the two points. Semi-automatic mode finds points of interest among the nodes of the neural structure (soma, high potassium concentrations, presence or absence of myelin, etc.). An A* path planning algorithm [DSSW09] is used, with a heuristic function h′(wayPoint) to obtain the expected path. This function computes the Euclidean distance between the current node and the target node:

h′(wayPoint) = ||wayPoint − goal||

In this way, the shortest path between the current position and the point of interest is sought. The proxy position in this case depends on the points that make up the planned path and not on the probe position. In each iteration of the haptic loop, the planning algorithm computes a new wayPoint node toward which the probe is pulled. The force is computed as f = K(proxy − h), which compels the user to follow the planned path (see Figure 5). The rate at which the proxy position is updated along the planned path is set by a user parameter, which can be adjusted to the type of analysis desired or to the user's physical characteristics. However, unlike in constrained navigation, neither the proxy speed nor the target point can be modified during execution.
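The path search can be sketched with a textbook A* over the node graph, using the Euclidean heuristic h′ given above. This is a generic illustration, not the planner used in RTHNeuron; the graph representation (`nodes` as a list of 3D positions, `adjacency` as a dictionary of neighbor lists) and the function name `a_star` are assumptions.

```python
import heapq
import math


def a_star(nodes, adjacency, start, goal):
    """Shortest path from start to goal over a node graph.
    nodes: list of 3D positions; adjacency: {node index: [neighbor indices]}.
    Edge cost is segment length; the heuristic is h'(n) = ||n - goal||."""
    def heuristic(n):
        return math.dist(nodes[n], nodes[goal])

    open_heap = [(heuristic(start), start)]
    cost = {start: 0.0}    # best known cost from start to each node
    parent = {start: None}
    closed = set()
    while open_heap:
        _, n = heapq.heappop(open_heap)
        if n == goal:
            path = []
            while n is not None:  # walk back through the parents
                path.append(n)
                n = parent[n]
            return path[::-1]
        if n in closed:
            continue
        closed.add(n)
        for m in adjacency[n]:
            new_cost = cost[n] + math.dist(nodes[n], nodes[m])
            if new_cost < cost.get(m, math.inf):
                cost[m] = new_cost
                parent[m] = n
                heapq.heappush(open_heap, (new_cost + heuristic(m), m))
    return None  # goal not reachable from start
```

Because the straight-line distance never overestimates the length of a path made of straight segments, the heuristic is admissible and the returned path is the shortest one. The way points of that path can then be fed one by one as the proxy position in f = K(proxy − h).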

To make planned navigation better suited to the user's needs, the user can pause, resume and stop this type of navigation at any time during execution. The user can also switch among the three navigation modes at any time during the analysis of the circuit through keyboard events.

Figure 5: The path planned by the algorithm toward a given point of interest is highlighted. The proxy position is computed automatically through the nodes that form the path.

As already noted, using multiple visual cues in multivariate systems can saturate the visual channel. In our system, different visual icons, colors and animations are already used to identify various values. We therefore propose using haptic textures to discriminate proximity to the soma or to other points of interest. According to [ME03], the characteristics that best discriminate between textures are, in order, differences in frequency, then in shape and, lastly, in amplitude. Following this principle, a sinusoidal texture y = A sin(ωx) has been designed (where A is the amplitude, x is the position along a branch and ω is the spatial frequency of the texture) to distinguish axons, basal dendrites and apical dendrites. Axons are assigned an amplitude of 0 (no texture), while dendrites are given a fixed amplitude. Apical and basal dendrites are distinguished by using different spatial frequencies.

Moreover, since the neural structures are simulated as filiform structures or filaments, the haptic texture cannot be computed along the surface normal, as is usually done. For this reason, the direction vector of the texture is computed as the unit vector between the proxy position and the probe position.
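The texture described above can be sketched as a force of magnitude y = A sin(ωx) applied along the unit vector between proxy and probe. The amplitude and frequency values in `TEXTURES` are hypothetical placeholders, and the sign convention (from the proxy toward the probe) is an assumption, since the text does not fix either.

```python
import math

# Hypothetical (amplitude A, spatial frequency omega) per branch type.
# Axons get amplitude 0 (no texture); both dendrite types share one fixed
# amplitude and differ only in spatial frequency.
TEXTURES = {
    "axon": (0.0, 0.0),
    "basal": (0.5, 2.0),
    "apical": (0.5, 6.0),
}


def texture_force(kind, x, proxy, probe):
    """Texture force at position x along a branch of the given kind.
    Magnitude is y = A sin(omega x); direction is the unit vector from the
    proxy to the probe (assumed sign convention)."""
    amplitude, omega = TEXTURES[kind]
    d = tuple(b - a for a, b in zip(proxy, probe))
    norm = math.sqrt(sum(c * c for c in d))
    if amplitude == 0.0 or norm == 0.0:
        return (0.0, 0.0, 0.0)  # no texture, or direction undefined
    y = amplitude * math.sin(omega * x)
    return tuple(y * c / norm for c in d)
```

In a full system this term would be added to the guidance force of Equation (1) inside the haptic loop, so the user feels the branch type while being held on the structure.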

5. Results

Regarding system performance, to obtain a realistic response for the sense of touch, the loop that computes the force the device must return has to run at a much higher frequency than the visual channel. An update rate of 30 Hz is enough to give a sense of continuity in visual perception. The sense of touch, however, needs 1,000 updates per second so that the user does not perceive odd variations. While haptic navigation can offer many advantages for interpreting the data, as seen in the previous section, its computational needs can disturb the performance of the interface. To avoid this problem, a common practice is to implement the visual part and the haptic part in two different threads of execution. It is nonetheless essential to evaluate the communication load between the two as well as overall performance.
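The two-thread arrangement can be sketched schematically as below. This only illustrates the structure (a fast haptic loop publishing state, a slower visual loop reading snapshots through a lock); the names `SharedState`, `haptic_loop` and `visual_loop` are hypothetical, the published proxy position is a placeholder, and a Python thread with `time.sleep` cannot guarantee a true 1 kHz rate the way a dedicated haptic servo loop does.

```python
import threading
import time


class SharedState:
    """State exchanged between the haptic and visual threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._step = 0
        self._proxy = (0.0, 0.0, 0.0)

    def publish(self, step, proxy):
        with self._lock:
            self._step, self._proxy = step, proxy

    def snapshot(self):
        with self._lock:
            return self._step, self._proxy


def haptic_loop(state, stop, rate_hz=1000.0):
    """Fast loop: would compute proxy and force; here it publishes a placeholder."""
    period = 1.0 / rate_hz
    step = 0
    while not stop.is_set():
        step += 1
        state.publish(step, (0.001 * step, 0.0, 0.0))
        time.sleep(period)


def visual_loop(state, frames, rate_hz=30.0):
    """Slow loop: reads one consistent snapshot per rendered frame."""
    seen = []
    for _ in range(frames):
        seen.append(state.snapshot())
        time.sleep(1.0 / rate_hz)
    return seen
```

Keeping the critical section to a single tuple copy is one way to keep the communication load between the two threads small.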

In addition, because the haptic part is handled separately, a haptic scene and a visual scene must both be created at run time. The haptic scene is simpler than the visual one. Even so, generating two scenes also adds load to system initialization.

For these two reasons, and to evaluate the performance difference between RTHNeuron (multimodal interface) and RTNeuron, a series of tests was carried out. The test system has an Intel Core i5 3.2 GHz CPU, 4 GB of RAM and an Nvidia GeForce 210 card with driver version 311.06, running 64-bit Windows 7 Professional. The haptic device used is a Phantom Omni from Sensable.

The test scene consists of different subsets of a synthetic cortical circuit of 10,000 neurons (built as described in Section 2). To evaluate performance as a function of the number of neurons, tests were run with circuits of 1, 10, 20, 50, 100, 200 and 500 neurons.

Two tables follow. The first shows the data obtained with the RTHNeuron tool. The second shows the same results with RTNeuron alone. The scene complexity is the same in both cases, so vertex and triangle counts are included only in the first. Likewise, the numbers of nodes and segments refer to the haptic scene, so they do not appear in the second table.

Figure 6: On segments representing dendrites, the user perceives a haptic metaphor indicating their type.

Neurons   Startup time (s)   FPS (far/near)
1         0.25               60/60
10        0.94               60/60
20        1.37               60/60
50        1.67               60/16
100       16.46              30/6
200       17.61              19/3
500       35.68              9/2

Table 2: Results of the performance tests on RTNeuron

Figure 7: Test scenario with 500 neurons

The following conclusions can be drawn from the data obtained:

The FPS performance of the new tool is very similar to that of the original RTNeuron. This indicates that adding the haptic execution thread causes no significant loss of interactive performance.

The startup time of the multimodal interface increases by more than 3.5 times on average, as we expected. However, this increase occurs in the preprocessing at startup, so it does not limit subsequent real-time navigation.
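The startup-time increase can be checked directly from the two tables. The sketch below assumes that both startup columns are expressed in the same unit, as the direct comparison in the text suggests; the variable names are illustrative.

```python
# Startup times by number of neurons, copied from Table 1 (RTHNeuron)
# and Table 2 (RTNeuron). Both are assumed to be in the same unit.
rthneuron_startup = {1: 0.60, 10: 2.62, 20: 5.25, 50: 10.90,
                     100: 64.93, 200: 102.03, 500: 213.65}
rtneuron_startup = {1: 0.25, 10: 0.94, 20: 1.37, 50: 1.67,
                    100: 16.46, 200: 17.61, 500: 35.68}

# Per-scene ratio between the multimodal tool and the original one.
ratios = {n: rthneuron_startup[n] / rtneuron_startup[n]
          for n in rthneuron_startup}
mean_ratio = sum(ratios.values()) / len(ratios)

for n, r in ratios.items():
    print(f"{n:>4} neurons: x{r:.2f}")
print(f"mean ratio: x{mean_ratio:.2f}")
```

With these values the per-scene ratios range from about 2.4 to about 6.5 and the mean is about 4.5, consistent with the increase of more than 3.5 times stated above.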

Furthermore, since rendering makes intensive use of only one core and the process that manages the haptic scene runs in its own thread, navigating a single neuron poses no interactivity problems. To navigate a circuit while allowing jumps from one neuron to another through synapses, the context of the haptic process would have to be reduced to avoid saturating the processing capacity of that thread.

Neurons   Triangles   Vertices   Nodes     Segments   Startup time (ms)   FPS (far/near)
1         71120       35562      2081      2116       0.60                60/60
10        1298944     649492     39640     40142      2.62                60/60
20        2731680     1365880    83934     85036      5.25                60/60
50        6304192     3152196    195345    197726     10.90               60/16.5
100       17690960    8845720    538807    544176     64.93               30/6.4
200       34988992    17494962   1038714   1049769    102.03              19.4/3.3
500       78149008    39075674   2322929   2347081    213.65              8.8/1.5

Table 1: Results of running the tests on RTHNeuron

Regarding the study of ease of use and the productivity gains in analyzing these structures thanks to the multimodal interface, although no specific evaluation tests with neuroscientists have been carried out so far, the system has been tried by a pilot group of four subjects, who reported a notable improvement in navigation, interaction and the search for points of interest compared with other single-modality alternatives [HSMd08, ROG11a]. Moreover, the inclusion of basic haptic interaction methods in neural analysis, with real scenes ranging from 644 to 33,519 nodes, has been evaluated previously both with computer science subjects and with neuroscientists from the Instituto Cajal in Madrid [ROG11b], with satisfactory results in terms of adaptation and usefulness.

6. Conclusions

Analyzing and visualizing massive, complex data is a challenge for neurobiologists, who need specialized tools to ease their work. The multimodal interface, RTHNeuron, makes it possible to render, visualize and interact with neural circuits, easing the understanding of this type of structure. Guided navigation through the neurons in search of points of interest hidden by occlusions (due to the large number of structures displayed), together with the different types of visualization, including haptic visualization, allows the number of variables to be interpreted to be increased in the future.

A series of experiments is currently being designed to evaluate the subjects' perception. As future work, new types of visual cues and haptic metaphors are being developed to categorize different variables, both quantitatively and qualitatively, more intuitively and quickly.

Acknowledgments

This work has been partially funded by the Ministerio de Ciencia e Innovación (TIN2010-21289-C02-01&02) and by the Cajal Blue Brain Project.

We also thank the Blue Brain Project for providing the neural models used in this work.

References

[BC11] BROWN C. L., CONDRON B.: Path2path: Hierarchical path-based analysis for neuron matching. In Biomedical Imaging: From Nano to Macro, 2011 IEEE International Symposium on (2011), pp. 996–999. 2

[BCE∗92] BRODLIE K., CARPENTER L., EARNSHAW R., GALLOP J., HUBBOLD R., MUMFORD A., C.D. OSLAND P. Q.: Scientific Visualization: Techniques and Applications. Springer-Verlag, 1992. 3

[BH07] BÜRGER R., HAUSER H.: Visualization of multi-variate scientific data. In EUROGRAPHICS 2007 (2007). 3

[Bha06] BHALLA U. S.: Multiscale Object-Oriented Simulation Environment (MOOSE), May 2006. 1

[BM08] BAVOIL L., MEYERS K.: Order Independent Transparency with Dual Depth Peeling. Tech. rep., NVIDIA Corporation, 2008. 3

[BOYBK90] BROOKS F. J., OUH-YOUNG M., BATTER J., KILPATRICK P. J.: Project grope: Haptic displays for scientific visualization. In Proceedings of SIGGRAPH ’90. Computer Graphics (1990), vol. 24, pp. 177–185. 2, 3

[CH06] CARNEVALE N. T., HINES M. L.: The NEURON Book. Cambridge University Press, 2006. 1, 2, 3

[DKS∗05] DENNIS B. M., KOCHERLAKOTA S., SAWANT A. P., TATEOSIAN L., HEALEY C. G.: Designing a visualization framework for multidimensional data. IEEE Computer Graphics and Applications 25 (2005), 10–15. 3

[DLmB∗05] DOMINJON L., LECUYER A., MARIE BURKHARDT J., ANDRADE-BARROSO G., RICHIR S.: The bubble technique: interacting with large virtual environments using haptic devices with limited workspace. In World Haptics Conference (joint Eurohaptics Conference and Haptics Symposium) (2005), pp. 639–640. 4

[DSSW09] DELLING D., SANDERS P., SCHULTES D., WAGNER D.: Engineering route planning algorithms. Algorithmics of large and complex networks. Springer. 28, 3 (July 2009), 117–139. 5

[EBA∗12] EILEMANN S., BILGILI A., ABDELLAH M., HERNANDO J., MAKHINYA M., PAJAROLA R., SCHÜRMANN F.: Parallel rendering on hybrid multi-gpu clusters. In Proceedings Eurographics Symposium on Parallel Graphics and Visualization (2012), pp. 109–117. 2

[EMP09] EILEMANN S., MAKHINYA M., PAJAROLA R.: Equalizer: A scalable parallel rendering framework. IEEE Transactions on Visualization and Computer Graphics (May/June 2009). 3

[Fra07] FRANLIN K.: Non-visual data visualization: towards a better design, 2007. 3




[GSS07] GLEESON P., STEUBER V., SILVER R. A.: neuroConstruct: a tool for modeling networks of neurons in 3D space. Neuron 54, 2 (Apr. 2007), 219–235. 3

[HBB∗13] HERNANDO J. B., BIDDISCOMBE J., BOHARA B., EILEMANN S., SCHÜRMANN F.: Practical parallel rendering of detailed neuron simulations. In Proceedings Eurographics Symposium on Parallel Graphics and Visualization (2013). 2

[HES08] HINES M. L., EICHNER H., SCHÜRMANN F.: Fully implicit parallel simulation of single neurons. Journal of Computational Neuroscience 25, 3 (Aug. 2008), 439–448. 2

[HPS12] HERNANDO J. B., PASTOR L., SCHÜRMANN F.: Towards real-time visualization of detailed neural tissue models: view frustum culling for parallel rendering. In BioVis 2012: 2nd IEEE Symposium on biological data visualization (2012). 2

[HSMd08] HERNANDO J. B., SCHÜRMANN F., MARKRAM H., DE MIGUEL P.: RTNeuron, an application for interactive visualization of detailed cortical column simulations. XVIII Jornadas de Paralelismo, Spain (2008). 3, 7

[HWR∗12] HILL S. L., WANG Y., RIACHI I., SCHÜRMANN F., MARKRAM H.: Statistical connectivity provides a sufficient foundation for specific functional connectivity in neocortical neural microcircuits. Proceedings of the National Academy of Sciences 109, 42 (Sep. 2012), E2885–94. 2

[KTP05] KAHOL K., TRIPATHI P., PANCHANATHAN S.: Tactile cueing in haptic visualization. In Proceedings of the AMC Workshop on haptic visualization at ACM Computer Human Conference (New York, 2005). 3

[KW11] KOZLOSKI J., WAGNER J.: An ultrascalable solution to large-scale neural tissue simulation. Frontiers in Neuroinformatics 5, 15 (2011). 1

[LAM∗10] LARSON S., APREA C., MARTINEZ J., LITTLE D., ASTAKHOV V., KIM H., ZASLAVSKY I., MARTONE M., ELLISMAN M.: An open google earth for neuroinformatics: The whole brain catalog. In Neuroinformatics 2010 (Spotlight demo presentation) (2010), Frontiers in Neuroscience. 3

[LDSO11] LANG S., DERCKSEN V. J., SAKMANN B., OEBERLAENDER M.: Simulation of signal flow in 3D reconstructions of an anatomically realistic neural network in rat vibrissal cortex. Neural Networks 24, 9 (2011), 998–1011. 1

[LHS∗12] LASSERRE S., HERNANDO J., SCHÜRMANN F., DE MIGUEL ANASAGASTI P., ABOU-JAOUDÉ G., MARKRAM H.: A neuron membrane mesh representation for visualization of electrophysiological simulations. IEEE Transactions on Visualization and Computer Graphics 18, 2 (2012), 214–227. 2

[LJ99] LAVIOLA J. J., JR.: MSVT: A virtual reality-based multimodal scientific visualization tool, 1999. 3

[LJH03] LARAMEE R. S., JOBARD B., HAUSER H.: Image space based visualization of unsteady flow on surfaces. In Proceedings IEEE Visualization ’03 (2003), IEEE Computer Society, pp. 131–138. 3

[Mar06] MARKRAM H.: The Blue Brain Project. Nature Reviews Neuroscience 7, 2 (2006), 153–160. http://bluebrain.epfl.ch. 1

[Max05] MAX N.: Progress in scientific visualization. The Visual Computer (2005), 979–984. 3

[ME03] MACLEAN K., ENRIQUEZ M.: Perceptual design of haptic icons. In Proceedings of Eurohaptics (2003), pp. 351–363. 5

[OB∗13] OSFIELD R., BURNS D., ET AL.: OpenSceneGraph. http://www.openscenegraph.org/, 2001-2013. 3

[OVVDW10] OTTEN R., VILANOVA A., VAN DE WETERING H.: Illustrative white matter fiber bundles. Computer Graphics Forum 29, 3 (2010), 1013–1022. 3

[PR09] PANEELS S., ROBERTS J. C.: Review of designs for haptic data visualization. IEEE Transaction on haptics 3, 2 (2009), 119–137. 3

[Rob00] ROBERTS J. C.: Visualization display models and ways to classify visual representation. Computer Integrated Design and Construction 2, 4 (Dec 2000), 1–10. 3

[ROG11a] RAYA L., OTADUY M., GARCÍA M.: Haptic navigation along filiform neural structures. In IEEE World Haptics Conference 2011. Fourth Joint Eurohaptics Conference and IEEE Haptics Symposium (2011). 2, 4, 7

[ROG11b] RAYA L., OTADUY M., GARCÍA M.: Neural data exploration with force feedback. In Ibero-American Symposium in Computer Graphics SIACG 2011 (2011). 7

[SEI10] SVETACHOV P., EVERTS M. H., ISENBERG T.: DTI in context: Illustrating brain fiber tracts in situ. Computer Graphics Forum 29, 3 (2010), 1023–1032. 3

[vRBKB03] VAN REIMERSDAHL T., BLEY F., KUHLEN T., BISCHOF C. H.: Haptic rendering techniques for the interactive exploration of cfd datasets in virtual environments. In Proceeding EGVE ’03 Proceedings of the workshop on Virtual environments (2003). 2, 3



CEIG – Spanish Computer Graphics Conference (2013) M. Carmen Juan and Diego Borro (Editors)

Human-like Recognition of Straight Lines in Sketched Strokes

R. Plumed1, P. Company2 and P. Varley2

1 Department of Mechanical Engineering and Construction, Universitat Jaume I, Castellón de la Plana, Spain
2 Institute of New Imaging Technology, Universitat Jaume I, Castellón de la Plana, Spain

Abstract

In this study we consider approaches for recognising straight lines in sketches. We argue that the computer must attempt to match human perception rather than arbitrary geometric criteria. We describe an experimental procedure for comparing human and machine perception of straight lines, in order to determine which predictions from automatic recognition of straight lines are “good” (match human perception) and which are “bad”.

We evaluate and compare two well-known computational approaches: chord length and Hough transform, and conclude that both correlate moderately well with human perception of straight lines, but neither is good enough to consider this a solved problem. We propose instead a Normalised Hough Transform (NHT), which reliably produces acceptable results. We identify tuning parameters which allow this algorithm to replicate the human ability to accept and reject strokes.

We find that the NHT algorithm can produce reasonably good results with a single tuning parameter, but that by resolving borderline cases with two tuneable criteria we can improve performance still further: rejecting borderline cases with large oscillations and undulations helps to reject false positives, and the obliqueness of a stroke also has a slight but measurable influence on its perception as a straight line.

Categories and Subject Descriptors (according to ACM CCS): J.6 [Computer-Aided Engineering]: Computer-Aided Design, I.5.2 [Design Methodology]: Classifier design and evaluation, I.4.6 [Segmentation]: Edge and feature detection.

1. Introduction

In this paper, we revisit the problem of straight line recognition. Traditionally, the first step in analysing a sketch is vectorisation: the input is a stroke, and the output is a line. Once lines have been identified (e.g. as straight lines or circumference arcs) the Sketch-Based Modelling (SBM) process may continue. We note that with some input devices converting sketched input into discrete strokes is itself non-trivial; this is discussed elsewhere, e.g. [HT06] and [BCF*08].

Various approaches have been proposed for stroke classification. Shpitalni and Lipson [SL97] apply linear least squares fitting to a conic section equation; the resulting ellipse or hyperbola is arbitrarily classified as a straight line if its aspect ratio exceeds 20:1. Qin [Qin05] proposes a method for classifying pen strokes based on adaptive thresholds and fuzzy knowledge with respect to curves' linearity and convexity. Zhang et al. [ZSD*06] summarise older approaches, and propose a seeded segment growing algorithm for extracting graphical primitives from a stroke. They try to refine their control parameters by using relationships between primitives. Their algorithm is reportedly reliable for detecting straight segments.

However, the main conclusion here is that, to the best of our knowledge, the thresholds used in the literature were estimated by the authors without taking into account how well they correlate with human perception.

In this paper, we seek to identify those parameters which help humans to recognise a stroke as a straight line.

We revisit two proposed solutions: the simplest, which compares chord lengths [QWJ01], and the most popular, using the Hough transform [DH72]. Since our ideal of a reliable algorithm is one which "perceives" exactly as humans do, we compare the two approaches with human interpretations of the same input data. Both approaches depend on tuning parameters, and we analyse the influence of tuning parameters on reliability. In contrast to previous attempts to tune these algorithms, we attempt to match human perception rather than a mathematical ideal.

We find that, although both correlate moderately well with human perception of straight lines, neither is good enough to consider this a solved problem.

We propose a new algorithm based on a modification to the Hough transform. Our results demonstrate that even with a single tuning parameter this fits better with human perception of straight lines. By applying more sophisticated criteria to resolve borderline cases, we can improve this performance still further.

Section 2 presents our test data: the human interpretation of strokes. Section 3 describes the three algorithms. Section 4 presents our results: how the algorithms interpret the same input data, and how the algorithms can be tuned so that machine interpretations match human interpretations.

2. Human perception

As stated above, the main goal of this paper is to describe an algorithm which replicates the way humans recognise scribbled lines as depicting straight lines. One immediate difficulty is that human perception is influenced by different types of stimulus: by the drawing skill of the author, and by the observer’s knowledge and capability in interpreting drawings.

Another problem we have noted is that different sketch recognition algorithms are typically tuned manually to fit their implementers’ own drawing style and perception. Our challenge is to find a more general method, based on what most humans perceive.

Figure 1: Example strokes (not to scale)

Our first step is to study and analyse human stroke recognition by performing experiments with groups of humans who are then interviewed to make their perceptions explicit.

Section 2.1 shows our input data. Sections 2.2, 2.3 and 2.4 describe the experiments performed using such questionnaires.

2.1 Initial data

This section describes the test data which we presented both to humans and to computer algorithms. We collated 30 example sketched strokes which include strokes of different nature and length (Figure 1). Each stroke is a list of sampled points, stored by their Cartesian x,y-coordinates.

Some of them depict horizontal and vertical lines (with differing degrees of accuracy), others are clearly slanted lines. Some of them have high values of curvature; others have little curvature.

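| Example | Points | Stroke length | Density (%) | Speed | Slope (degrees) |
|---|---|---|---|---|---|
| 1 | 53 | 782 | 7 | 1.39 | -1.08 |
| 2 | 60 | 311 | 19 | 0.57 | -88.51 |
| 3 | 67 | 214 | 31 | 0.31 | 47.34 |
| 4 | 136 | 413 | 33 | 0.29 | 36.22 |
| 5 | 393 | 904 | 43 | 0.25 | -30.82 |
| 6 | 257 | 767 | 33 | 0.29 | -31.09 |
| 7 | 93 | 120 | 78 | 0.07 | -37.28 |
| 8 | 120 | 187 | 64 | 0.12 | 38.06 |
| 9 | 48 | 60 | 80 | 0.07 | -0.69 |
| 10 | 73 | 183 | 40 | 0.23 | -51.13 |
| 11 | 387 | 1801 | 21 | 0.54 | -32.66 |
| 12 | 110 | 375 | 29 | 0.34 | -149.54 |
| 13 | 101 | 167 | 60 | 0.11 | -0.54 |
| 14 | 80 | 260 | 31 | 0.32 | -91.71 |
| 15 | 57 | 163 | 35 | 0.18 | 22.31 |
| 16 | 210 | 286 | 73 | 0.09 | 40.35 |
| 17 | 75 | 394 | 19 | 0.28 | -41.99 |
| 18 | 172 | 410 | 42 | 0.20 | -1.92 |
| 19 | 121 | 175 | 69 | 0.11 | 31.55 |
| 20 | 194 | 619 | 31 | 0.34 | 54.33 |
| 21 | 257 | 540 | 48 | 0.21 | 41.66 |
| 22 | 76 | 170 | 45 | 0.20 | 48.38 |
| 23 | 23 | 171 | 13 | 0.76 | -115.46 |
| 24 | 15 | 366 | 4 | 2.26 | -0.86 |
| 25 | 23 | 244 | 9 | 0.74 | 31.75 |
| 26 | 98 | 137 | 72 | 0.10 | 48.27 |
| 27 | 18 | 203 | 9 | 1.00 | -45.27 |
| 28 | 198 | 262 | 76 | 0.10 | 94.50 |
| 29 | 99 | 176 | 56 | 0.15 | -32.93 |
| 30 | 38 | 197 | 19 | 0.32 | 38.06 |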

Table 1: Characteristics of example strokes.

Table 1 shows some characteristics of the examples. The columns list:

· Example number (as shown in Figure 1).

· The number of points (i.e. x,y-coordinate pairs).

· The bounding length of the stroke (distance between endpoints).

· The density of points defining the stroke (calculated as the ratio of number of points to stroke length)—if the stroke has significant variations in its path, these variations can remain hidden by a lack of density.

· Drawing speed, calculated as the ratio of the length of the stroke to the time taken to draw it. A higher speed produces a lower density (as shown by the negative correlation in Table 3).

· Slope of the line (in degrees) which best fits the data points (using the linear regression method explained in Section 3.3).
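The density column can be checked against the other two: for example 1, 53 points over a length of 782 give about 6.8%, which Table 1 lists as 7. The following is a minimal Python sketch of these measurements. The function name, the dictionary keys and the `duration` argument are illustrative assumptions, since the text does not say how drawing time was recorded or in which units.

```python
import math

def stroke_characteristics(points, duration):
    """Basic measurements of a stroke.

    points   -- list of (x, y) samples in drawing order
    duration -- drawing time (hypothetical argument; units are an assumption)
    """
    (x0, y0), (x1, y1) = points[0], points[-1]
    # Bounding length: distance between the two endpoints.
    length = math.hypot(x1 - x0, y1 - y0)
    # Density: number of points per unit of stroke length, as a percentage.
    density = 100.0 * len(points) / length
    # Speed: stroke length divided by the time taken to draw it.
    speed = length / duration
    return {"points": len(points), "length": length,
            "density": density, "speed": speed}
```

A stroke whose endpoints coincide has zero bounding length, so a caller would need to handle that case before computing the density.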

2.2 First experiment

The purpose of our first experiment is to determine which strokes are perceived by human beings as straight lines. We can guess that length, obliqueness, and drawing irregularities such as undulations, oscillations and high curvature ratio might influence human perception.




The examples listed in Figure 1 were distributed in three questionnaires with ten pictures each (Figure 2). A total of 97 questionnaires were returned.

Figure 2: Questionnaire for the first experiment.

Most of the subjects were undergraduate students of industrial engineering or engineering design. Some subjects were academics from different technological areas. We also included a few subjects with no technical drawing training and a few subjects with no education beyond secondary level. Males and females were represented roughly equally.

The subjects were asked to classify the strokes of a questionnaire as depicting: straight lines (Yes), not straight lines (No) or uncertain (?). The results are tabulated in Table 2 as the percentage of subjects who gave each answer. For each example we also list its length (already listed in Table 1, but reproduced here to ease comparisons), the linearity (obtained from the chord length algorithm explained in Section 3.1), the obliqueness and the tolerance.

Example  Yes (%)  No (%)  ? (%)  Stroke length  Linearity (%)  Obliq.  Tol. (%)

1 97 3 782 99.66 0.024 1.93

2 97 3 311 98.39 0.033 1.96

5 97 3 904 93.06 0.685 2.62

24 94 6 366 99.65 0.019 3.30

14 91 9 260 97.23 0.038 1.59

20 91 9 619 94.89 0.793 2.56

28 91 9 262 94.97 0.100 3.63

13 78 19 3 167 96.79 0.012 3.70

9 76 21 3 597 93.93 0.015 5.36

18 75 25 410 96.92 0.043 3.79

22 47 44 9 170 91.83 0.925 5.77

23 41 47 1 171 97.58 0.566 6.59

3 39 52 9 214 94.58 0.948 6.44

21 28 72 540 87.60 0.926 6.57

12 25 59 1 375 94.92 0.677 5.07

19 25 69 6 175 86.61 0.701 5.96

29 22 69 9 176 89.15 0.732 8.28

10 18 73 9 183 93.58 0.864 8.88

27 16 75 9 203 97.76 0.994 9.64

25 9 81 9 244 96.79 0.706 10.13

7 9 82 9 120 83.80 0.829 11.00

17 6 84 9 394 95.79 0.933 8.49

15 6 91 3 163 91.07 0.496 10.14

11 3 94 3 1801 94.58 0.726 6.46

30 94 6 197 94.49 0.846 10.81

4 97 3 413 92.24 0.805 13.16

6 97 3 767 93.49 0.691 7.42

8 97 3 187 79.45 0.846 14.09

16 100 286 82.06 0.897 8.55

26 100 137 85.23 0.927 16.52

Table 2: Results of the first experiment.

Obliqueness is a parameter which measures how slanted a stroke is (we cannot correlate slope directly because of its non-linear behaviour). We define obliqueness as a normalised value in the range 0 (horizontal or vertical) to 1 (slope of 45 degrees). It is calculated from the slope of the regression line fitted to the stroke, as listed in Table 1 (it ranges between -180° and 180°), by applying the following rules in order:

if slope ∈ -180° .. 0°, then slope ← 180° + slope;

if slope ∈ 90° .. 180°, then slope ← 180° - slope;

if slope ∈ 45° .. 90°, then slope ← 90° - slope;

Obliqueness ← slope / 45°
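The rules above can be sketched in Python as follows. This is a minimal illustration: the function name is an assumption, and the treatment of the exact boundary values (0°, 45°, 90°) is a choice made here that does not change the result.

```python
def obliqueness(slope_deg):
    """Map a slope in [-180, 180] degrees to a value in [0, 1].

    0 means horizontal or vertical, 1 means a slope of 45 degrees.
    """
    s = slope_deg
    if s < 0:        # fold negative slopes into 0..180
        s = 180 + s
    if s > 90:       # fold 90..180 onto 0..90
        s = 180 - s
    if s > 45:       # fold 45..90 onto 0..45
        s = 90 - s
    return s / 45


# Slopes of examples 1, 3 and 12 from Table 1
for slope in (-1.08, 47.34, -149.54):
    print(slope, round(obliqueness(slope), 3))
```

For these three slopes the function gives 0.024, 0.948 and 0.677, which are the obliqueness values listed for examples 1, 3 and 12 in Table 2.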

Tolerance is a well-known concept in Geometric Dimensioning and Tolerancing for measuring the “straightness” of a line. Given the bounding box of the line, defining x-range as the length of the side nearly parallel to the line and y-range as that of the side nearly perpendicular to the line, the absolute tolerance of straightness is the value of y-range, and the relative tolerance of straightness is the ratio y-range/x-range. The lower these parameters are, the straighter the stroke is considered to be. These parameters do not distinguish whether the lack of straightness results from oscillations or undulations (higher or lower frequency respectively).

Figure 3: Regression line

The tolerance parameter measures the minimum bound-ing box which contains all the stroke points.
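As a minimal Python sketch of the two tolerance measures, the function below assumes that the stroke has already been rotated so that its regression line is horizontal (as in Figure 3), so that the axis-aligned bounding box is the one described above. The function name and the choice to return the relative tolerance as a percentage are illustrative assumptions.

```python
def straightness_tolerance(rotated_points):
    """Return (absolute, relative %) straightness tolerance.

    rotated_points -- list of (x, y) samples of a stroke that has already
                      been rotated so its fitted line is horizontal
                      (assumption of this sketch).
    """
    xs = [x for x, _ in rotated_points]
    ys = [y for _, y in rotated_points]
    x_range = max(xs) - min(xs)   # side nearly parallel to the line
    y_range = max(ys) - min(ys)   # side nearly perpendicular to the line
    return y_range, 100.0 * y_range / x_range
```

A stroke spanning 100 units horizontally and 2 units vertically would therefore have an absolute tolerance of 2 and a relative tolerance of 2%.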

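| Variable | Statistic | Stroke length | Density (%) | Speed | Lin (%) | Obliq | Tol (rel) | Tol (abs) | Yes |
|---|---|---|---|---|---|---|---|---|---|
| Stroke length | Pear | 1 | -.320 | .187 | .198 | .018 | -.274 | .824** | .053 |
| | Sig. | | .085 | .323 | .295 | .926 | .143 | .000 | .782 |
| Density (%) | Pear | -.320 | 1 | -.703** | -.721** | .044 | .221 | -.237 | -.093 |
| | Sig. | .085 | | .000 | .000 | .818 | .240 | .208 | .627 |
| Speed | Pear | .187 | -.703** | 1 | .556** | -.311 | -.276 | .018 | .313 |
| | Sig. | .323 | .000 | | .001 | .095 | .140 | .923 | .092 |
| Linearity (%) | Pear | .198 | -.721** | .556** | 1 | -.527** | -.609** | -.029 | .552** |
| | Sig. | .295 | .000 | .001 | | .003 | .000 | .879 | .002 |
| Obliq | Pear | .018 | .044 | -.311 | -.527** | 1 | .647** | .316 | -.760** |
| | Sig. | .926 | .818 | .095 | .003 | | .000 | .088 | .000 |
| Tol (rel) | Pear | -.274 | .221 | -.276 | -.609** | .647** | 1 | .217 | -.856** |
| | Sig. | .143 | .240 | .140 | .000 | .000 | | .25 | .000 |
| Tol (abs) | Pear | .824** | -.237 | .018 | -.029 | .316 | .217 | 1 | -.446** |
| | Sig. | .000 | .208 | .923 | .879 | .088 | .25 | | .013 |
| Yes | Pear | .052 | -.092 | .314 | .552** | -.760** | -.856** | -.446** | 1 |
| | Sig. | .785 | .628 | .092 | .002 | .000 | .000 | .013 | |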

Table 3: Pearson correlation

Figure 3 shows an original stroke and the computed regression line rotated to a horizontal orientation.

We applied Pearson correlation analysis to those parameters. Table 3 shows the results.
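For readers who want to check one of these coefficients, the following Python sketch computes the Pearson coefficient between the relative tolerance and the Yes percentage. The two lists are transcribed from the Tol. (%) and Yes (%) columns of Table 2, in the row order of that table, with blank Yes cells read as 0; the function and variable names are illustrative assumptions.

```python
import math

def pearson(xs, ys):
    """Pearson product-moment correlation coefficient of two sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Relative tolerance (%) per example, transcribed from Table 2
tolerance = [1.93, 1.96, 2.62, 3.30, 1.59, 2.56, 3.63, 3.70, 5.36, 3.79,
             5.77, 6.59, 6.44, 6.57, 5.07, 5.96, 8.28, 8.88, 9.64, 10.13,
             11.00, 8.49, 10.14, 6.46, 10.81, 13.16, 7.42, 14.09, 8.55, 16.52]
# Percentage of subjects answering Yes, same row order
yes = [97, 97, 97, 94, 91, 91, 91, 78, 76, 75,
       47, 41, 39, 28, 25, 25, 22, 18, 16, 9,
       9, 6, 6, 3, 0, 0, 0, 0, 0, 0]

r = pearson(tolerance, yes)
print(round(r, 2))
```

The result is strongly negative, in line with the value of -.856 reported in Table 3 for this pair of variables.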

We find that relative tolerance correlates better with human perception (YES) than does absolute tolerance. For this reason, from now on we use only relative tolerance and we abbreviate it to tolerance.

Although Table 2 shows that no stroke was always perceived as a straight line, examples 1, 2, 5, 24, 14, 20 and 28 were considered straight lines by more than 90% of subjects. All of these examples have a tolerance below 4%. Hence, we can conclude that low tolerance generally leads to high levels of perception of straightness.

Examples 13, 9 and 18 depict horizontal strokes with a medium tolerance due to a slight curvature and medium values of linearity, but even so, they were classified as straight lines by around 75% of the subjects. In contrast, example 12, which shows a medium relative tolerance but is oblique, was only perceived as a straight line by around 25%.

The rest of the examples were considered as straight lines by less than 50% of the subjects. These strokes were drawn with different combinations of lengths and tolerances.

Examples 16 and 26 were invariably classified as not straight lines. These represent short strokes with noticeably high tolerances and low values of linearity. We can conclude that the combination of these factors is enough on its own to cause a stroke to be interpreted as not a straight line.

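This preliminary analysis suggests that stroke length has no measurable influence on human perception: its correlation with the Yes percentage in Table 3 is negligible (Pearson coefficient .053, Sig. = .782).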

The analysis suggests that obliqueness does indeed affect the human perception of straight lines. Proving this requires a specific experiment, which should avoid any possible corruption in the sample due to the differing ability of the drawer to draw lines with different slopes. We discuss the result of this experiment in the next section.

2.3 Second experiment

Family 1 (example 10) Family 2 (example 20)



Figure 4: Strokes which define each family of lines

The goal in this section is to analyse and discuss the influence of the stroke's obliqueness on the human perception of straightness. This general goal is specified in two hypotheses:

1. Slopes with no obliqueness (horizontal and vertical lines) are perceived in a similar way.

2. Vertical and horizontal directions are considered special directions which invoke in people the perception of straightness much more than do other values of obliqueness.

We created a test set of 72 examples which include 6 different families of stroke. Each family was generated by rotating an original horizontal stroke. The horizontal strokes of each family are shown in Figure 4.

Each family is characterised by parameters such as number of points, length, linearity and tolerance. The values of these parameters for each family are shown in Table 4.

| Id family | Points | Length | Linearity (%) | Tol. |
|---|---|---|---|---|
| Family 1 | 641 | 760.79 | 94.51 | 1.71 |
| Family 2 | 193 | 743.60 | 97.63 | 4.67 |
| Family 3 | 578 | 686.10 | 91.54 | 2.20 |
| Family 4 | 111 | 658.28 | 97.91 | 5.67 |
| Family 5 | 301 | 799.64 | 95.93 | 3.80 |
| Family 6 | 433 | 771.42 | 93.21 | 3.35 |

Table 4: General parameters of each family.

Each original stroke was rotated so that the regression line which best fits the stroke was oriented at the angles listed in Table 5 (thus each family contains twelve strokes). Each example is labelled by the Id angle defined in Table 5 followed by the number of its family.

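| Id angle | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Angle | 0° | 9° | 27° | 45° | 54° | 72° | 90° | 99° | 117° | 135° | 144° | 162° |
| Obliq. | 0 | 0.2 | 0.6 | 1 | 0.8 | 0.4 | 0 | 0.2 | 0.6 | 1 | 0.8 | 0.4 |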

Table 5: Values of the angles used in the experiment

With regard to the questionnaires:

1. Each questionnaire contained twelve different examples chosen randomly, without any repetition. Each example appears in two different questionnaires. In total, we created twelve different questionnaires.

2. The answer form contained the instructions for the experiment. It also contained a Likert-type scale to score the “straightness” of each figure. Each subject scored each example with a value from 5 (the figure was perceived as a straight line) down to 1 (the figure was perceived as not a straight line).
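One simple way to obtain a design with these properties (72 examples, twelve questionnaires of twelve examples, each example in exactly two different questionnaires) is sketched below in Python. This is only an illustration of the constraints; the paper does not describe the procedure actually used, and the function name, parameters and seed are assumptions.

```python
import random

def build_questionnaires(n_examples=72, per_sheet=12, seed=0):
    """Randomly assign examples to questionnaires.

    Two independent shuffles of all examples are each cut into sheets of
    `per_sheet` items, so every example appears once per pass, i.e. in two
    different questionnaires, and never twice on the same sheet.
    """
    rng = random.Random(seed)
    sheets = []
    for _ in range(2):                      # every example is used twice
        order = list(range(n_examples))
        rng.shuffle(order)
        sheets += [order[i:i + per_sheet]
                   for i in range(0, n_examples, per_sheet)]
    return sheets

questionnaires = build_questionnaires()
print(len(questionnaires), [len(q) for q in questionnaires])
```

With the default arguments this yields twelve sheets of twelve examples each.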

Figure 5 shows an example of a questionnaire and an answer form.

Figure 5: Example of questionnaire and answer form for the second experiment.

We collected a total of 144 answer forms, and obtained 24 perception data for each type of questionnaire and 48 perception data for each stroke (i.e. combination of family and obliqueness).

Statistically, as we have the same number of data in each classification, the power of the method is maximised. In addition, our questionnaire design ensures that observations are independent. However, the number of samples was not enough for the requirements of each population’s normal distribution and equality of variance to be satisfied.

To verify the first hypothesis, we used an ANOVA [HAT*98] taking as main factors the family identification and the slopes, using only the angles 0º and 90º, to obtain a 2x6 classification table. The ANOVA results are shown in Table 6.

Tests of between-subjects effects

Dependent variable: Straightness score

| Source | Sum of squares (type III) | df | Mean square | F | Sig. |
|---|---|---|---|---|---|
| Corrected model | 205.927 (a) | 11 | 18.721 | 20.141 | 0.000 |
| Intercept | 2194.531 | 1 | 2194.531 | 2360.983 | 0.000 |
| Idfamily | 195.323 | 5 | 39.065 | 42.028 | 0.000 |
| Slope | 3.337 | 1 | 3.337 | 3.590 | 0.059 |
| Idfamily * Slope | 7.267 | 5 | 1.453 | 1.564 | 0.170 |
| Error | 256.542 | 276 | 0.929 | | |
| Total | 2657.000 | 288 | | | |
| Corrected total | 462.469 | 287 | | | |

a. R squared = 0.445 (adjusted R squared = 0.423)

Table 6: ANOVA results for first hypothesis

We deduce that whereas the groups defined by the factor Idfamily have significantly different mean perception scores (Sig = 0.000 < 0.05), the factor Slope does not have a significant effect on the mean perception score (Sig = 0.059 > 0.05) and neither does the interaction Idfamily*Slope (Sig = 0.17 > 0.05).

Therefore, the first hypothesis has been confirmed: horizontal and vertical slopes are perceived similarly.
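The F ratios in Table 6 follow from the sums of squares and degrees of freedom: each mean square is the sum of squares divided by its degrees of freedom, and F is the mean square of the effect divided by the mean square of the error. A short Python check of this arithmetic, using only values printed in Table 6 (the function name is an illustrative assumption):

```python
def anova_f(ss_effect, df_effect, ss_error, df_error):
    """Return (mean square of effect, mean square of error, F ratio)."""
    ms_effect = ss_effect / df_effect
    ms_error = ss_error / df_error
    return ms_effect, ms_error, ms_effect / ms_error

SS_ERROR, DF_ERROR = 256.542, 276           # Error row of Table 6

_, _, f_family = anova_f(195.323, 5, SS_ERROR, DF_ERROR)
_, _, f_slope = anova_f(3.337, 1, SS_ERROR, DF_ERROR)
_, _, f_interaction = anova_f(7.267, 5, SS_ERROR, DF_ERROR)

# R squared: corrected-model sum of squares over corrected total
r_squared = 205.927 / 462.469

print(f_family, f_slope, f_interaction, r_squared)
```

Up to rounding, these ratios agree with the F column of Table 6, and the last value agrees with the R squared given in the table footnote.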

To test the second hypothesis we applied an ANOVA, taking as main factors the family identification and all six levels of obliqueness, to obtain a 6x6 classification table. The results are shown in Table 7.

Tests of between-subjects effects

Dependent variable: Straightness score

| Source | Sum of squares (type III) | df | Mean square | F | Sig. |
|---|---|---|---|---|---|
| Corrected model | 1141.298 (a) | 35 | 32.609 | 35.320 | 0.000 |
| Intercept | 13139.598 | 1 | 13139.598 | 14232.213 | 0.000 |
| Idfamily | 1055.621 | 5 | 211.124 | 228.680 | 0.000 |
| Obliq. | 47.385 | 5 | 9.477 | 10.265 | 0.000 |
| Idfamily * Obliq. | 38.292 | 25 | 1.532 | 1.659 | 0.022 |
| Error | 1562.104 | 1692 | 0.923 | | |
| Total | 15843.000 | 1728 | | | |
| Corrected total | 2703.402 | 1727 | | | |

a. R squared = 0.422 (adjusted R squared = 0.410)

Table 7: ANOVA results for second hypothesis

The results suggest that both factors have a significant effect on the perception of straightness (in both cases Sig = 0.000 < 0.05). Therefore, the different families of lines are perceived as having different straightness, and the different members of the same family are also perceived differently: both the quality and the direction of the line influence the perception of straightness.

In addition, the interaction factor Idfamily*Obliqueness shows a significance level of Sig = 0.022 (lower than 0.05), which means that even within the same family, the perception of straightness differs according to the obliqueness.

However, the weight of the Obliqueness variable is nowhere near as strong as that of the Idfamily variable, as shown by the high F value for Idfamily (F = 228.68) as opposed to the low value for Obliqueness (F = 10.265).

We can accept then that although obliqueness seems to affect the way people tend to perceive the straightness of a line, it should be considered as a secondary factor, not as important as those parameters which characterise the different families. The model with only these two main factors would only explain 42.2% (value of R squared) of the variations in the answers.

Straightness score, Tukey B (a, b)

| Obliqueness | N | Subset 1 | Subset 2 | Subset 3 |
|---|---|---|---|---|
| 0.2 | 288 | 2.54 | | |
| 0.4 | 288 | 2.56 | | |
| 0.0 | 288 | | 2.76 | |
| 0.6 | 288 | | 2.80 | |
| 0.8 | 288 | | 2.88 | 2.88 |
| 1.0 | 288 | | | 3.01 |

The table shows the means of the groups in homogeneous subsets.
a. Uses the harmonic mean sample size = 288.000
b. Alpha = 0.05

Table 8: Post hoc analysis, homogeneous groups.

Table 8 shows the results of a Tukey post-hoc analysis [DV99] which groups the different levels of obliqueness according to the similarity of their scoring means. As we can see, strokes with low but non-zero value of obliqueness have the lowest mean; their scores are nearly 5% lower than the mean scores of the second subset (which includes strokes with zero obliqueness); in contrast, the third subset, which groups the highest levels of obliqueness, has mean scores around 3% higher than the second subset.

Thus, regarding our second hypothesis, we can conclude that obliqueness does indeed seem to influence the perception of straightness, but, contrary to our expectations, people seem to be more sensitive to lack of straightness for lines close to horizontal or vertical. They are less demanding when the line depicts a slope around 45° or 135°.

3. Algorithms

In this section, we describe three algorithms for detecting straight lines from strokes. Our goal is to demonstrate the feasibility of defining perceptually-rooted parameters and their significance thresholds for such algorithms, taking into account the analysis of human behaviour described in Section 2. In addition we shall compare the reliability of the algorithms.

In each case, our input data is a temporally-ordered list of sampled points captured in a single sequence: pen-down, pen-move, and pen-up.




3.1 Chord length

We study the chord length algorithm, based on the linearity parameter used by Qin et al. [QWJ01] to identify straight lines from an input data polygon. We have chosen this algorithm because of its ease of implementation and its very low computational cost.

Linearity of a stroke is the ratio of the distance between the two end points to the sum of the distances between consecutive points.

The value of linearity lies between 0 and 1. A strict straight line has a linearity of 1. True straight lines rarely occur in freehand sketches, so we need to determine a tolerance in order to classify an input stroke. Our algorithm will identify a stroke as a straight line if its linearity is greater than a threshold. Section 4 explains how this threshold is set.
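A minimal Python sketch of the linearity measure and the threshold test follows. The function names are illustrative assumptions, and the default threshold of 0.95 is the value attributed to Qin et al. [QWJ01] in Section 4.2.1, used here only as a placeholder.

```python
import math

def linearity(points):
    """Ratio of the endpoint distance (chord) to the polyline path length."""
    chord = math.dist(points[0], points[-1])
    path = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    # A degenerate stroke with no extent is treated as non-linear here.
    return chord / path if path > 0 else 0.0

def is_straight(points, threshold=0.95):
    """Classify a stroke as a straight line if its linearity exceeds threshold."""
    return linearity(points) > threshold

print(linearity([(0, 0), (5, 0), (10, 0)]))   # collinear samples
print(linearity([(0, 0), (1, 0), (1, 1)]))    # right-angle corner
```

The first stroke has linearity 1 because its samples are collinear; the second, a right-angle corner, has a linearity of about 0.71 and would be rejected by the default threshold.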

3.2 Hough Transform

The Standard Hough Transform (SHT) is an algorithm widely used to solve line detection problems in image processing and computer vision.

This was introduced by Hough in 1962, but all versions of the algorithm in use today are based on the Standard Hough Transform (SHT) of Duda and Hart [DH72]. Here, we study an adaptation of the SHT for rapid processing of a single pen input stroke [Lee06].

The algorithm represents a line as a linear equation in normal form, where the normal for a given line is the shortest segment between the line and the origin:

$\rho = x \cos\theta + y \sin\theta \qquad (1)$

In this expression, θ represents the angle of inclination of the normal and ρ is the length of the normal. With these parameters fixed, x and y represent the Cartesian coordinates of each point which belongs to this straight line.

Using the normal form, we can represent each point in (x, y) space as a sinusoidal curve in (ρ, θ) space. Applying this procedure to every point, we obtain a set of sinusoidal curves.

All the sinusoidal curves which intersect at a particular point in (ρ, θ) space represent points which belong to the same straight line. The algorithm proposed by Lee [Lee06] discretises (ρ, θ) space into a finite number of cells, using an accumulator ρ-θ array where each cell is a counter which is incremented whenever a sinusoidal curve passes through it.

In the matrix, θ takes values between 0° and 180° with one cell per degree. The range of ρ is determined by doubling the length of the diagonal which frames the input stroke, and adding one to get an odd value:

$\rho = 2\sqrt{x_{\mathrm{range}}^{2} + y_{\mathrm{range}}^{2}} + 1 \qquad (2)$
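The voting step can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the implementation of [Lee06]: it uses one θ cell per degree from 0° to 179°, unit-width ρ cells, and translates the points to the centre of their bounding box so that every ρ value fits in the array sized by equation (2). The function names and the `peak` helper are hypothetical.

```python
import math

def hough_accumulator(points):
    """Fill a theta-by-rho vote array for one stroke (list of (x, y))."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_range, y_range = max(xs) - min(xs), max(ys) - min(ys)
    half = math.ceil(math.hypot(x_range, y_range))  # bounding-box diagonal
    n_rho = 2 * half + 1                            # odd size, as in eq. (2)
    # Assumption: measure rho from the bounding-box centre.
    cx, cy = (max(xs) + min(xs)) / 2, (max(ys) + min(ys)) / 2
    acc = [[0] * n_rho for _ in range(180)]
    for x, y in points:
        for t in range(180):
            theta = math.radians(t)
            rho = (x - cx) * math.cos(theta) + (y - cy) * math.sin(theta)
            acc[t][round(rho) + half] += 1          # one vote per curve sample
    return acc

def peak(acc):
    """Return (votes, theta in degrees, rho) of the fullest cell."""
    half = (len(acc[0]) - 1) // 2
    votes, t, r = max((v, t, r) for t, row in enumerate(acc)
                      for r, v in enumerate(row))
    return votes, t, r - half

stroke = [(x, 5) for x in range(21)]     # a perfectly horizontal stroke
votes, theta_deg, rho = peak(hough_accumulator(stroke))
print(votes, theta_deg, rho)
```

For a perfectly horizontal stroke every point votes for the same cell near θ = 90°, so the peak collects as many votes as there are points. Because of the cell width, neighbouring θ values can tie with the true direction, which illustrates why the discretisation matters.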

The known advantage of the SHT algorithm is its robustness. However, the algorithm also has weaknesses. First, as we work with freehand strokes, we need some flexibility to tolerate inaccuracies. Furthermore, the accuracy of the results depends on the stroke length: the longer the stroke, the higher the value of ρ, and the better the accuracy. Secondly, stroke inclination affects ρ, as the parameters xrange and yrange (equation 2) change with inclination, so the same stroke is evaluated differently if its inclination changes. Finally, the algorithm requires more computation than the chord length algorithm, and increasing the precision requires more columns in the ρ-θ matrix, further increasing the computational cost.

In order to avoid these problems, we propose a modification to the Standard Hough Transform, the Normalised Hough Transform (NHT), where the difference is how the matrix parameters are defined. In the NHT, the discretisations of ρ and θ are fixed before running the algorithm, so the algorithm does not depend on the number of points in the stroke. In addition, the stroke is pre-rotated until we get its most likely horizontal direction (this process is described in Section 3.3). This allows us to determine a fixed tolerance value, independent of the length, the slope and number of points in the stroke.

The size of the ρ-θ matrix remains to be determined (this will be done in Section 4).

3.3 Stroke Pre-Rotation for NHT algorithm

In order to get the rotated stroke, the original stroke must be rotated so that it is (more or less) horizontal. But strokes are not straight lines, so determining the rotation angle is non-trivial: we must fit a straight line to the stroke data, and use its orientation as the rotation angle.

For this, we used orthogonal regression (OR), which minimises the sum of the squared orthogonal distances from the stroke data points to the fitting line. This is a natural generalisation of the least-squares approximation when the data in both variables, x and y, is perturbed. Another closely related method is Principal Component Analysis (PCA). Both methods obtain the same fitting line. We used the OR method because it is easy to apply to our input data.

In order to convert our orthogonal regression into a simple linear regression problem, we adapt Brown’s idea of seeking the principal directions of the data points [Bro12], but in our case we rotate the entire set of n points about the centroid. The rotation angle θ, which rotates the regression line so that its perpendicular corresponds to the vertical, is then calculated by minimising the sum of squares of the vertical heights of the n transformed data points.
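A minimal Python sketch of this pre-rotation follows. Instead of a numerical minimisation it uses the standard closed-form solution of the same least-squares problem, θ = ½·atan2(2·Sxy, Sxx − Syy), where Sxx, Syy and Sxy are the centred second moments of the points; this substitution, and the function names, are assumptions of the sketch. Note that the angle returned lies between -90° and 90°, so it does not encode drawing direction as the slopes in Table 1 appear to.

```python
import math

def centroid(points):
    n = len(points)
    return sum(x for x, _ in points) / n, sum(y for _, y in points) / n

def regression_angle(points):
    """Angle (radians) of the orthogonal-regression line through the points."""
    cx, cy = centroid(points)
    sxx = sum((x - cx) ** 2 for x, _ in points)
    syy = sum((y - cy) ** 2 for _, y in points)
    sxy = sum((x - cx) * (y - cy) for x, y in points)
    # Direction that minimises the summed squared orthogonal distances.
    return 0.5 * math.atan2(2 * sxy, sxx - syy)

def rotate_to_horizontal(points):
    """Rotate the stroke about its centroid so its fitted line is horizontal."""
    cx, cy = centroid(points)
    theta = regression_angle(points)
    c, s = math.cos(-theta), math.sin(-theta)
    return [((x - cx) * c - (y - cy) * s + cx,
             (x - cx) * s + (y - cy) * c + cy) for x, y in points]

pts = [(x, 2 * x + 1) for x in range(10)]      # samples on the line y = 2x + 1
print(math.degrees(regression_angle(pts)))
```

For samples lying exactly on y = 2x + 1 the fitted angle is atan(2), and after rotation all points share the same y coordinate.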

4. Comparison between human perception and algorithmic classification

In this section we analyse the influence of rotation and obliqueness on the ability of algorithms to mimic human perception.




4.1 Influence of stroke rotation

To measure the influence of stroke rotation on the performance of the algorithms, we chose two strokes (16 and 18) and rotated them to get three different variants (Figure 6). We evaluated the six resulting strokes with four algorithms (Chord Length, SHT, Unrotated NHT, Rotated NHT). Table 9 collates the results of how those strokes were evaluated by the different algorithms.

16 (a) 16 (b) 16 (c)

18 (a) 18 (b) 18 (c)

Figure 6: Strokes presented in different orientations.

As Table 9 shows, the Chord Length and the Rotated NHT algorithms can be considered robust, because (unlike the SHT and Unrotated NHT) they give a consistent value independent of the stroke orientation. Hence, we can discard Unrotated NHT, and all subsequent references to NHT are to Rotated NHT. SHT is also sensitive to rotation, but we shall not yet discard it, as we still wish to identify its other strengths and weaknesses.

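| Example | Slope | Linearity (%) | SHT STIR (%) | NHT STIR (%), not rotated | NHT STIR (%), rotated |
|---|---|---|---|---|---|
| 16.a | 89° | 95.93 | 10.30 | 50.83 | 50.83 |
| 16.b | 64° | 95.93 | 12.29 | 55.48 | 50.83 |
| 16.c | 142° | 95.93 | 14.62 | 53.82 | 50.83 |
| 18.a | 28° | 93.21 | 9.24 | 47.11 | 50.12 |
| 18.b | 94° | 93.21 | 9.70 | 45.73 | 50.12 |
| 18.c | 123° | 93.21 | 9.70 | 52.19 | 50.12 |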

Table 9: Results of algorithms applied to Figure 6.

4.2 Tuning the algorithms

In order to find accuracy thresholds for the algorithms described in Section 3, we compare the output of each algorithm with the human perception results of experiment 1. Table 10 shows our results. It is subdivided into three groups:

· examples considered as straight lines by human perception in more than 90% of cases.

· examples considered as straight lines in 50% to 90% of cases.

· examples considered as straight lines in less than 50% of cases.

Table 11 shows the Pearson correlation coefficients between the output of the algorithms and the initial input parameters. Table 12 shows the correlation coefficients between the results obtained with each algorithm and the human perception results.

| Example | Yes % | Linearity % (chord length) | STIR % (SHT) | STIR % (NHT) | Obliq | Tol*COB |
|---|---|---|---|---|---|---|
| 1 | 97 | 99.66 | 28.30 | 88.68 | 0.024 | 1.89 |
| 2 | 97 | 98.39 | 30.00 | 58.33 | 0.033 | 1.92 |
| 5 | 97 | 93.06 | 10.69 | 61.83 | 0.685 | 2.46 |
| 24 | 94 | 99.65 | 46.67 | 60.00 | 0.019 | 3.23 |
| 14 | 91 | 97.23 | 33.75 | 67.50 | 0.038 | 1.56 |
| 20 | 91 | 94.89 | 12.89 | 71.13 | 0.793 | 2.35 |
| 28 | 91 | 94.97 | 28.79 | 66.16 | 0.100 | 3.60 |
| 13 | 78 | 96.79 | 54.46 | 67.33 | 0.012 | 3.62 |
| 9 | 76 | 93.93 | 52.08 | 52.08 | 0.015 | 5.24 |
| 18 | 75 | 96.92 | 19.19 | 53.49 | 0.043 | 3.72 |
| 22 | 47 | 91.83 | 27.63 | 38.16 | 0.925 | 5.21 |
| 23 | 41 | 97.58 | 26.09 | 47.83 | 0.566 | 6.34 |
| 3 | 39 | 94.58 | 22.39 | 38.81 | 0.948 | 5.80 |
| 21 | 28 | 87.6 | 11.67 | 49.42 | 0.926 | 5.93 |
| 12 | 25 | 94.92 | 17.27 | 43.63 | 0.677 | 4.76 |
| 19 | 25 | 86.61 | 22.31 | 39.67 | 0.701 | 5.57 |
| 29 | 22 | 89.15 | 24.24 | 46.46 | 0.732 | 7.69 |
| 10 | 18 | 93.58 | 27.40 | 46.58 | 0.864 | 8.06 |
| 27 | 16 | 97.76 | 33.33 | 50.00 | 0.994 | 8.68 |
| 25 | 9 | 96.79 | 26.09 | 47.83 | 0.706 | 9.45 |
| 7 | 9 | 83.8 | 24.73 | 35.48 | 0.828 | 10.04 |
| 17 | 6 | 95.79 | 18.67 | 26.32 | 0.933 | 9.15 |
| 15 | 6 | 91.07 | 19.30 | 40.00 | 0.496 | 8.28 |
| 11 | 3 | 94.58 | 6.20 | 51.68 | 0.726 | 6.00 |
| 30 | 0 | 94.49 | 23.68 | 37.50 | 0.846 | 11.98 |
| 4 | 0 | 79.45 | 18.33 | 43.19 | 0.805 | 6.80 |
| 6 | 0 | 92.24 | 17.65 | 29.17 | 0.691 | 13.19 |
| 8 | 0 | 93.49 | 12.84 | 32.86 | 0.846 | 7.78 |
| 16 | 0 | 85.23 | 22.45 | 39.80 | 0.897 | 14.94 |
| 26 | 0 | 82.06 | 14.76 | 36.84 | 0.927 | 9.75 |

Table 10: Algorithm results against human perception

4.2.1 Chord Length

The Linearity column of Table 10 shows the results obtained using the chord length algorithm. Qin et al. [QWJ01] used a threshold of 95%, a value in accordance with the middle group of examples. With this value, we get one false negative (example 5) and six false positives (examples 13, 18, 23, 27, 25 and 17).

We note a high negative correlation (-0.721) between the Linearity and the Density (see Table 11), which suggests that Density influences the false results. Examples 17, 23, 25 and 27 have very low densities, and all depict lines with high curvature. It appears that piecewise linear approximation of a scribbled line input is unreliable in these cases: the chord length algorithm will classify smooth curved strokes as straight lines even when the curvature is high enough for humans not to perceive them as straight lines.

Stroke length has no effect on the results of the chord length algorithm. So, in this aspect, the algorithm behaves as humans do.
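The thresholding discussed above can be illustrated with a minimal sketch. It assumes that linearity is the chord length (distance between the stroke's end points) divided by the length of the polyline through the sampled points, expressed as a percentage; the actual definition is given in Section 3 and is only assumed here, and the function names are hypothetical.

```python
import math


def linearity(points):
    """Chord length over polyline length, in percent (assumed definition)."""
    path = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    if path == 0.0:
        return 100.0  # degenerate stroke; treated as straight in this sketch
    chord = math.dist(points[0], points[-1])
    return 100.0 * chord / path


def is_straight_by_chord(points, threshold=95.0):
    """Apply the 95% threshold attributed to [QWJ01] in the text."""
    return linearity(points) >= threshold


stroke = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.0)]
scaled = [(10.0 * x, 10.0 * y) for x, y in stroke]
print(linearity(stroke), linearity(scaled))
```

Because the measure is a ratio of two lengths, scaling a stroke leaves it unchanged, which is consistent with the observation that stroke length has no effect on this algorithm.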




                      Stroke length  Density   Obliq.    Tol.      SHT columns  Linearity  SHT STIR  NHT STIR
Stroke length  Pear.  1              -.320     .018      -.274     .999**       .198       -.514**   .256
               Sig.                  .085      .926      .143      .000         .295       .004      .172
Density        Pear.  -.320          1         .043      .221      -.347        -.721**    .062      -.212
               Sig.   .085                     .820      .240      .060         .000       .744      .261
Obliq.         Pear.  .018           .043      1         .647**    -.010        -.527**    -.607**   -.674**
               Sig.   .926           .820                .000      .959         .003       .000      .000
Tol.           Pear.  -.274          .221      .647**    1         -.295        -.609**    -.240     -.714**
               Sig.   .143           .240      .000                .114         .000       .202      .000
SHT columns    Pear.  .999**         -.347     -.010     -.295     1            .236       -.490**   .280
               Sig.   .000           .060      .959      .114                   .209       .006      .133
Linearity      Pear.  .198           -.721**   -.527**   -.609**   .236         1          .370*     .547**
               Sig.   .295           .000      .003      .000      .209                    .044      .002
SHT STIR       Pear.  -.514**        .062      -.607**   -.240     -.490**      .370*      1         .335
               Sig.   .004           .744      .000      .202      .006         .044                 .070
NHT STIR       Pear.  .256           -.212     -.674**   -.714**   .280         .547**     .335      1
               Sig.   .172           .261      .000      .000      .133         .002       .070

*. Correlation is significant at the 0.05 level (two-tailed). **. Correlation is significant at the 0.01 level (two-tailed). Sample size is N=30.

Table 11: Pearson correlation between output and input parameters.

Table 12 shows specifically the correlations between the different approaches and the results of human perception (YES). In the case of linearity, the correlation is positive (0.552) but insufficiently strong.

                    Linearity  SHT      NHT      Yes
Linearity  Pearson  1          .370*    .547**   .552**
           Sig.                .044     .002     .002
SHT        Pearson  .370*      1        .335     .444*
           Sig.     .044                .070     .014
NHT        Pearson  .547**     .335     1        .827**
           Sig.     .002       .070              .000
Yes        Pearson  .552**     .444*    .827**   1
           Sig.     .002       .014     .000

*. Correlation is significant at the 0.05 level (two-tailed). **. Correlation is significant at the 0.01 level (two-tailed).

Table 12: Pearson correlation between output and perception of experiment 1.
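For readers who wish to recompute coefficients such as those in Tables 11 and 12, the following is a minimal sketch of the sample Pearson correlation. The function name is hypothetical, and the statistical package used by the authors is not stated in this section. The significance values (Sig.) are not computed here.

```python
import math


def pearson(xs, ys):
    """Sample Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Illustration only: the first five rows of Table 10 (Yes % and NHT STIR %).
# The coefficients reported in Table 12 refer to all 30 examples.
yes_pct = [97, 97, 97, 94, 91]
nht_stir = [88.68, 58.33, 61.83, 60.00, 67.50]
print(pearson(yes_pct, nht_stir))
```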

We conclude that although the linearity parameter is easy to calculate using the chord length algorithm, it produces occasional false positives and negatives and, even worse, systematic false positives for smooth curves of low density.

4.2.2 Standard Hough Transform

The results of the Standard Hough Transform algorithm are shown in the SHT column of Table 10. The parameter shown is the signal-to-input ratio (STIR), i.e. the number of sinusoidal curves which intersect at a particular cell of the ρ-θ matrix.

It is not obvious how to find an appropriate threshold from these results.

It appears that the false results depend on the stroke length (and the derived parameter SHT columns in Table 1). Examples 5 and 20 have very high values of SHT columns, so the SHT algorithm is stricter than humans when classifying long strokes as lines. The false positive examples have low values of SHT columns, making the SHT algorithm less strict than humans when classifying short strokes as lines.

Table 11 shows that there is no significant correlation between the signal-to-input ratio of SHT and the tolerance (-0.240, Sig. 0.202).

Table 12 shows that, overall, SHT has a weak correlation with human straight line perception, and if we also take into account the fact that the algorithm evaluation depends on the stroke direction, we can conclude that this algorithm does not allow us to obtain a good estimation of stroke straightness.

4.2.3 Normalised Hough Transform

First, as noted in Section 3.2, the Normalised Hough Transform requires additional tuning parameters: the number of rows and columns of the ρ-θ matrix. To this end, we analysed the signal-to-input ratio for each example in experiment 1 with several versions of the algorithm:

- We analysed the influence of the number of rows (i.e. discretisation of θ, which corresponds to fidelity in rotation/inclination), by varying this parameter from 91 to 361 in steps of 30.

- We analysed the influence of the number of columns (i.e. discretisation of ρ, which corresponds to fidelity in stroke length), by varying this parameter from 31 to 1199 in steps of 4.

We obtained the STIR for each stroke example and each combination of parameters, and compared them with the percentage of human perception as a straight line (see Table 10).

We find that human beings are more decisive than algorithms when evaluating good quality or bad quality strokes. The NHT algorithm frequently finds some residual straightness when humans completely reject a bad stroke, and when the size of the accumulator matrix cells is small enough, the algorithm will find some imperfections even when most human beings ignore them and perceive the stroke as a good straight line.

As a consequence, we assume that the discrepancy between humans and the algorithm is higher for very good and very bad quality strokes than it is for average quality ones. This means that the tuning parameters which fit the NHT algorithm to human perception are different for the three ranges. Since the threshold which distinguishes between strokes representing straight and not straight lines clearly belongs in the intermediate range, we removed the good and bad strokes and concentrated our further analysis on average strokes.

The best matches between NHT STIR and human perception occur when threshold values are between 5% and 95%.

Considering only those examples within this range, we summarised the results of each combination (each particular pair of number of rows and number of columns) as a single parameter: the absolute differences between the results as calculated by the NHT algorithm and as perceived by humans (%YES). Figures 7 and 8 respectively show how this parameter varies with respect to different numbers of rows and columns. The lower the difference, the better STIR correlates to human perception. Therefore, the minimum function value gives us the best choices for rows and columns.

Figure 7: Influence of number of rows.

Figure 8: Influence of number of columns.

Analysing Figures 7 and 8, we chose a 180x143 matrix size, which minimises the difference between STIR and %YES (the minimum value is 423.94). In subsequent experiments with NHT, we used a 180x143 ρ-θ matrix.
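The selection step can be sketched as follows, assuming that the summary parameter is the sum over examples of the absolute difference between NHT STIR and %YES. The function names and the toy STIR values are hypothetical; computing the STIR itself requires the NHT of Section 3 and is not reproduced here.

```python
def total_abs_difference(stir_values, yes_percentages):
    """Sum of |STIR - %YES| over the examples kept for tuning."""
    return sum(abs(s - y) for s, y in zip(stir_values, yes_percentages))


def best_matrix_size(stir_by_size, yes_percentages):
    """Return the (rows, columns) pair whose STIR values are closest to %YES.

    stir_by_size maps (rows, columns) -> list of STIR percentages,
    one per example, in the same order as yes_percentages.
    """
    return min(
        stir_by_size,
        key=lambda size: total_abs_difference(stir_by_size[size], yes_percentages),
    )


# Toy data (hypothetical): three examples and two candidate matrix sizes.
yes_pct = [78, 76, 75]
candidates = {
    (91, 31): [50.0, 50.0, 50.0],
    (181, 143): [70.0, 70.0, 70.0],
}
print(best_matrix_size(candidates, yes_pct))
```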

The results of the STIR of the Normalised Hough Transform algorithm are shown in the NHT column in Table 10. As Table 12 shows, this algorithm has a high (0.827) and significant (at 0.01 level) correlation with human perception. Using a threshold of 52%, there are no false negatives, and no false positives.

However, there are two borderline examples (9 and 18) which are close to being false negatives, with NHT STIR of 52.08 and 53.49 respectively.

There are also two examples (11 and 27) for which the STIR value is very close to the threshold value. These are characterised by having a high value of tolerance (6.46 and 9.64 respectively). According to Table 3, tolerance and positive results of human perception (YES) maintain a high negative correlation (-0.858) at 0.01 level. If we modify the algorithm by adding the extra condition that the tolerance must be lower than a certain threshold value for us to consider a borderline stroke as a straight line, all of the doubtful cases can easily be resolved.

Finally, it seems that obliqueness influences perception of strokes as straight lines, so perhaps obliqueness should also be considered when determining the threshold value for tolerance.

4.2.4 Replicating obliqueness distortion in straightness perception

As a conclusion of our second experiment (Section 2.3), we stated that the orientation of the line influences the human perception of straightness. We want our algorithm to replicate this. Hence, we define a variable tolerance threshold which becomes stricter for orientations easily perceived by humans, and relaxes for orientations poorly perceived by humans.

In the light of the ANOVA analysis results (Table 7), the Obliqueness factor could be considered a secondary factor, as its weight (F-test value) is roughly 20 times lower than the weight of the factor which defines the type of stroke (NHT STIR). For this reason, we defined a Coefficient of Obliqueness (COB) which affects the threshold value depending on the stroke’s obliqueness.

According to Table 8, people seem to be strictest when obliqueness is around 0.2 (strokes close to the horizontal or vertical directions), and least strict when obliqueness reaches 1 (slope of 45º). From Table 8, the perceptual variation between the maximum value (3.01 for obliqueness of 0.8) and the minimum one (2.54 when obliqueness is 0.2) is 9.4%, so we suggest use of one coefficient which will affect the threshold value by up to 10% of its value.

In order to model our Coefficient of Obliqueness so that it behaves similarly to humans, we use a sinusoidal function where the input variable (x) is the stroke’s obliqueness (Figure 9). The maximum value (at obliqueness 0.2) is 1.0, and the minimum value (at obliqueness 1.0) is 0.9. This function has been chosen for its continuity between two extreme values, and because it is easy to adapt to the behaviour we seek.

COB = sin ((x+0.125) (4π/3))/20 + 0.95 (3)

where the frequency of the wave is 2/3 (hence the factor 4π/3 = 2π · 2/3), the peak deviation of the function from the average value is 5%, which means that the amplitude must be 1/20, and the average value around which the wave oscillates is 0.95. In order to get the maximum value when obliqueness is around 0.2, we introduced a phase lag of 0.125 in the wave.
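Equation 3 translates directly into code. The sketch below is only a transcription of the formula; the function name is hypothetical.

```python
import math


def coefficient_of_obliqueness(x):
    """Equation 3: COB = sin((x + 0.125) * 4*pi/3) / 20 + 0.95, for x in [0, 1]."""
    return math.sin((x + 0.125) * (4.0 * math.pi / 3.0)) / 20.0 + 0.95


for obliqueness in (0.0, 0.2, 0.25, 1.0):
    print(obliqueness, round(coefficient_of_obliqueness(obliqueness), 4))
```

Evaluating the function shows that it reaches exactly 0.9 at obliqueness 1.0, while its exact peak of 1.0 falls at obliqueness 0.25; at obliqueness 0.2 the value is about 0.999, so the stated maximum at 0.2 holds to within rounding.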

Figure 9: Coefficient of Obliqueness.




We apply the COB only to the NHT tolerance parameter. We do not apply it to the NHT primary threshold since NHT is a robust algorithm and the influence of the stroke’s direction has been removed by pre-rotation to the horizontal. We do not apply it to the Chord Length algorithm because this only depends on chord and edge lengths, which do not depend on the orientation of the stroke. And we do not apply it to the SHT because this already depends on the orientation of the stroke.

The results of tolerance for the first experiment taking into account the varying threshold are shown in the column “Tol*COB” of Table 10.

As can be seen in Table 10, all examples perceived as straight lines by more than 75% have a tol*COB maximum value of 5.3. Therefore our preferred algorithm includes a secondary condition such that a stroke must have a NHT STIR higher than 52% and also a tol*COB parameter lower than 5.3 to be classified as a straight line.
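The resulting two-condition rule can be sketched as below. The function takes the values listed in the NHT STIR % and Tol*COB columns of Table 10; the function and constant names are hypothetical, and how the tolerance itself is obtained (Section 3) is not reproduced.

```python
STIR_THRESHOLD = 52.0     # NHT STIR must be higher than this percentage
TOL_COB_THRESHOLD = 5.3   # tolerance scaled by COB must be lower than this value


def is_straight_line(nht_stir, tol_cob):
    """Classify a stroke from its NHT STIR (%) and its Tol*COB value."""
    return nht_stir > STIR_THRESHOLD and tol_cob < TOL_COB_THRESHOLD


# Rows taken from Table 10: (example, NHT STIR %, Tol*COB)
rows = [(1, 88.68, 1.89), (9, 52.08, 5.24), (11, 51.68, 6.00), (27, 50.00, 8.68)]
for example, stir, tol_cob in rows:
    print(example, is_straight_line(stir, tol_cob))
```

With these rows, the borderline examples 9 (accepted) and 11 and 27 (rejected) fall on the sides described in the text.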

5. Conclusions

In this study we propose that recognition of straight lines should match human perception, rather than the intentions of the designer who produced the sketch.

We compare two algorithms for detecting straight lines: the simplest (chord length) and the most popular (Hough transform). We conclude that neither can be easily tuned so that machine interpretations replicate human interpretations. Instead, we propose a new algorithm based on a modification of the Hough Transform which matches human interpretations acceptably well.

Chord length (when tuned with a linearity threshold of 95%) has a reasonable correlation with human perception, but some false positives and negatives still appear. More worryingly, systematic false positives occur for smooth curves of high curvature (lines with undulations and without oscillations). We conclude that this approach should be avoided, as it ignores important perceptual assumptions.

SHT correlates poorly with human perception, and even finding an appropriate threshold for the signal-to-input ratio (STIR) value is problematic. Small variations in the STIR threshold produce very different classification results. Instead, we should use an approach which is less sensitive to the analysed parameters.

The proposed Rotated NHT algorithm solves these problems. With a STIR threshold of 52%, the algorithm replicates the way humans accept and reject strokes as lines in all cases tested in our experiments. However, some examples have STIR values near the defined threshold, so we include an additional condition to discriminate such doubtful cases: the tolerance, scaled by the Coefficient of Obliqueness, must be lower than 5.3 in order to prevent false positives.

We have also determined that the obliqueness of strokes appears to have a slight influence on the perception of a straight line. For this reason, we vary the threshold of tolerance by up to 10% depending on the stroke’s obliqueness, using the Coefficient of Obliqueness described in section 4.2.4.

Acknowledgements

This work was partially funded by financial support from the Ramon y Cajal Scholarship Programme and by the "Pla de Promoció de la Investigació de la Universitat Jaume I", project P1 1B2010-01.

References

[HT06] HILAIRE X., TOMBRE K.: Robust and accurate vectorization of line drawings, IEEE Trans. Pattern Analysis and Machine Intelligence, 28 (6), (2006) 890-904.

[BCF*08] BARTOLO A., CAMILLERI K. P., FABRI S. G., BORG J. C.: Line tracking algorithm for scribbled drawings. 3rd Int. Symposium on Communications, Control and Signal Processing, (2008), pp 554-559.

[SL97] SHPITALNI S., LIPSON H.: Classification of sketch strokes and corner detection using conic sections and adaptive clustering. Trans. ASME J. Mech. Design, 119 (2), (1997) 131–135.

[Qin05] QIN S. F.: Intelligent Classification of Sketch Strokes. IEEE EUROCON2005 “Computer as a Tool”, (2005) 1374-1377.

[ZSD*06] ZHANG X., SONG J., DAI G., LYU M.R.: Extraction of line segments and circular arcs from freehand strokes based on segmental homogeneity features. IEEE Trans. Systems, Man, and Cybernetics, 36 (2), (April 2006)

[QWJ01] QIN S. F., WRIGHT D. K., JORDANOV I. N.: On-Line Segmentation of Freehand Sketches by Knowledge-Based Nonlinear Thresholding Operations. Pattern Recognition, 34 (10), (2001) 1885-1893.

[DH72] DUDA R. O., HART P.E.: Use of the Hough Transformation to Detect Lines and Curves in Pictures. Communications of the ACM, 15 (1), (1972) 11-15.

[HAT*98] HAIR J. F., ANDERSON R. E., TATHAM R. L., BLACK W.: Multivariate Data Analysis, Fifth edition. Prentice Hall International Inc., 1998.

[DV99] DEAN A., VOSS D.: Design and Analysis of Experiments, Springer-Verlag New York, 1999, pp. 78-85

[Lee06] LEE K.: Application of the Hough Transform, Unpublished Research Paper No. 2006-005, University of Massachusetts Lowell, Dept. of Computer Science, Lowell, MA 01854, (2006).

[Bro12] BROWN K.: Perpendicular Regression of a Line. in MathPages. www.mathpages.com/home/kmath110.htm Accessed, February 2012.



Measuring Surface Roughness on Cultural Heritage 3D models

L. López1, J.C. Torres1, G. Arroyo1

1Lab. Realidad Virtual, Universidad de Granada, Spain

Abstract

Surface roughness is an important feature of cultural heritage models. It is a measure of the surface texture and is related to erosion and restoration processes. Although measuring roughness is a common task in industrial applications, there is no standard method for quantifying it in cultural heritage applications. This paper introduces a new definition of surface roughness suitable for this field and proposes a method to compute it from 3D digital models using a Cultural Heritage Information System.

Categories and Subject Descriptors (according to ACM CCS): I.3.3 [Computer Graphics]: Picture/ImageGeneration—Line and curve generation

1. Introduction

We understand the roughness of a surface as the relief perceived when moving over that surface. More formally, roughness can be understood as a measure of the height differences of the surface. Roughness depends on the material, on the way it was machined, and on the physical and chemical processes that have taken place on its surface.

In heritage conservation, surface roughness is a relevant parameter when analysing elements, as it can help to determine the origin or nature of the materials, their state of conservation and the techniques used to produce them.

Moreover, roughness is influenced by the state of conservation and, in turn, influences vulnerability to external agents. Figure 1 shows a photograph of two lions from the Patio de los Leones in the Alhambra, in which erosion can be seen on the forehead areas, caused by fracturing due to frost. The deterioration process accelerates with roughness and with the horizontality of the surface and, in turn, produces an increase in roughness.

Van Griegen analysed the deterioration process of stone monuments, identifying roughness as one of the parameters directly related to the deposition of pollutants [SBT∗05], and recommended monitoring surface roughness in order to control the deterioration process.

Figure 1: Fountain of the Patio de los Leones in the Alhambra (Alaskan Duke).

Tiano defines the concept of bioreceptivity as the capacity of an element to host flora and fauna, which are also agents responsible for its degradation [Tia02], and identifies roughness as the most important factor influencing the bioreceptivity of stone.

L. López, J.C. Torres, G. Arroyo / Roughness on Cultural Heritage

In both cases, degradation by chemical agents and by organic agents, roughness is a determining factor. Moreover, the deterioration produced by these agents and by weathering contributes to increasing surface roughness, closing the circle.

Roughness is therefore both a symptom of deterioration and a factor that accelerates it. For this reason, it is essential to be able to measure and analyse roughness on monuments. Given the current trend towards digitising heritage elements, it makes sense to try to measure the roughness level on the digital model generated, for example, with a laser scanner. This makes it possible to obtain detailed information about an entire monument and to carry out periodic checks, covering a larger area than is feasible with dedicated measuring devices.

In this work we present a roughness estimation method for digital models of heritage elements. Its main contributions are:

- A proposed model for computing roughness in cultural heritage applications.

- An algorithm that computes roughness with this model for indexed polygonal meshes.

- The description and evaluation of the implementation of the proposed method in CHISel [TLRS12].

2. Definition of roughness

Many ways of measuring roughness have been proposed; most of them determine it from the height differences of the surface [ACGI02]. Among these, the measure most widely used in industrial applications is the roughness Ra, which is defined over a linear cut of the surface as the mean of the deviations at peaks and valleys with respect to the straight path along that cut. The straight path is determined by a least-squares fit [GKM∗02]. With Y_i denoting the deviation of sample i from that straight path:

R_a = \frac{1}{n} \sum_{i=1}^{n} |Y_i|   (1)
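As an illustration of Equation 1, the sketch below fits a straight line to a sampled profile by ordinary least squares and averages the absolute deviations from it. It assumes a profile given as positions and heights along the cut; the function name is hypothetical and the sampling conventions of any particular standard are not modelled.

```python
def roughness_ra(positions, heights):
    """Ra of a sampled profile: mean absolute deviation from the least-squares line."""
    n = len(positions)
    if n != len(heights) or n < 2:
        raise ValueError("need at least two (position, height) samples")
    mean_x = sum(positions) / n
    mean_y = sum(heights) / n
    sxx = sum((x - mean_x) ** 2 for x in positions)
    if sxx == 0.0:
        raise ValueError("positions must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(positions, heights))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Y_i in Equation 1: deviation of each sample from the fitted straight path
    deviations = [y - (slope * x + intercept) for x, y in zip(positions, heights)]
    return sum(abs(dev) for dev in deviations) / n


print(roughness_ra([0.0, 1.0, 2.0, 3.0], [1.0, -1.0, -1.0, 1.0]))
```

A perfectly straight but tilted profile yields zero, since the fitted line absorbs the slope; only deviations around it contribute.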

Defining roughness in this way makes it easy to measure. Industrial devices exist for measuring roughness: roughness meters. Roughness meters measure the Ra roughness directly by mechanical means (see Figure 2). More recently, optical devices have been developed that can measure roughness over an area.

Roughness measurement standards have been defined for industrial applications, in an attempt to standardise the relief produced by machining marks. The DIN-4766-1 standard defines twelve roughness levels. Levels 1 to 4 correspond to specular surfaces with a roughness below 0.4 µm. Levels 5 and 6 correspond to surfaces with machining marks that cannot be seen with the naked eye but are visible with a magnifying glass (roughness between 0.4 µm and 0.8 µm). The following levels correspond to roughness visible to the eye (between 0.8 µm and 3.2 µm). Above this value, roughness can be felt by touch.

Figure 2: Operating diagram of a mechanical roughness meter.

The Ra roughness is useful for characterising flat mechanical parts. On curved surfaces, its value is influenced by curvature. In this work we are interested in characterising roughness for cultural heritage, that is, on the surface of sculptures, façades and other tangible elements.

Roughness has also been used to characterise terrain [Bur01]. In such applications, roughness is computed from an elevation model (a raster representation of terrain height). Several measures exist for defining this kind of roughness. The most widely used is based on the distances from the neighbouring cells to the plane tangent to the terrain at the point considered.

Roughness has also been used in Computer Graphics to characterise polygonal meshes [KG00]. It has made it possible to measure quantitatively the result of mesh simplification or of watermarking 3D models. In this context, roughness has classically been defined from the dihedral angles formed by the mesh at each face. From this value, Corsini proposed a roughness measure associated with vertices [CGE05], computed as a weighted mean of the values on the adjacent faces.

Among other problems, this kind of measure is sensitive to the topology of the mesh. To address this, Lavoué proposed measures derived from curvature [Lav09]. Specifically, roughness is computed as the difference between the local curvature and the mean of the neighbours' curvatures.

However, neither Corsini's proposal nor Lavoué's improvement suits the problem at hand, since we need the measurements to have a physical meaning and to be independent of the mesh topology. Moreover, the proposed method has been integrated into CHISel, a cultural heritage information system in which the model is indexed with an octree, so roughness information must be computed at voxel level rather than per vertex or per triangle.

Therefore, building on the mechanical concept of roughness described above, we compute it from the distance between the real surface and the ideal (roughness-free) surface. This yields a roughness measure with physical meaning, expressed in mm or µm.

In addition, to make our measure independent of curvature, we define roughness as the mean of the distances from the surface to the sphere tangent at the point, within a neighbourhood of that point. In this way the roughness of a sphere is zero, and the roughness of a flat surface is the same as that given by the Ra measure.

It is worth noting that, when a mechanical roughness meter is used, the measured result is not affected by curvature, because the user moves the device over the surface rather than over the tangent plane.

This measure is easy to implement on a model indexed by a regular grid, generating one roughness value per cell. Obviously, the measurement is conditioned by the level of detail of the discretisation.

The following sections detail the proposed roughness estimation algorithm. We begin by studying how to determine curvature, since our concept of roughness is based on it.

3. Curvature computation

CHISel uses octrees to spatially index the geometry of the scanned 3D models it takes as input. Each cell of the octree stores several geometric properties, including: (1) normals, understood as the mean of the normals associated with the vertices contained in the cell; (2) centroids, obtained as the average of the centroids of the triangles that intersect the cell. CHISel also performs the operations needed to establish the topological neighbourhood relations between cells.

With this structure the surface is discretised (as is done to represent raster maps). The process is illustrated in Figures 3 and 4. Figure 3 shows the cross-section of an object. The space occupied by the object is divided into cubic cells, which appear as squares in the section shown in Figure 3. Of all these cells, we keep those that are crossed by the surface.

The information layers used by CHISel are structures in which each value is associated with a particular cell. To establish this correspondence between the cells on the object's surface and the layer values, the cells crossed by the surface are numbered (see Figure 4). In this way, information layers can be stored as a numbered sequence of values, where each value is associated with the cell corresponding to its position in the sequence. Thanks to this arrangement, it is possible both to find the value associated with a point on the surface and to locate the regions of the surface associated with a given value.

Figure 3: Diagram of the process of dividing the object's surface into cells.

Figure 4: Assignment of identifiers to cells in order to associate information layers with the surface.

This subdivision into cells is also used to index the geometric elements of the model (vertices and triangles), thereby establishing a correspondence between geometric elements and properties through the spatial index formed by the cells (see Figure 5).

Figure 5: Association between geometry and properties through the spatial index.

Figure 6: The correspondence between values associated with the same cell in different layers allows operations between layers.

This structure also makes it possible to establish correspondences between layers. In all the information layers associated with the same model, the values at the same position correspond to the same cell (see Figure 6). This allows operations between layers. Any kind of information can be stored in layers, including database records. In that case, the linear structure of the layer stores the primary key of the record, which allows one record to be shared among several cells.
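The layer arrangement described above can be illustrated with a minimal sketch in which a layer is a plain list indexed by cell identifier. This is not CHISel's API: the function names, the 0-based identifiers and the sample values are hypothetical.

```python
def combine_layers(layer_a, layer_b, operation):
    """Apply an operation cell by cell to two layers of the same model."""
    if len(layer_a) != len(layer_b):
        raise ValueError("layers of the same model must have the same length")
    return [operation(a, b) for a, b in zip(layer_a, layer_b)]


def cells_matching(layer, predicate):
    """Identifiers of the cells whose layer value satisfies the predicate."""
    return [cell_id for cell_id, value in enumerate(layer) if predicate(value)]


# Hypothetical roughness layers of the same model at two dates (one value per cell)
roughness_before = [0.1, 0.4, 0.2]
roughness_after = [0.3, 0.4, 0.6]

increase = combine_layers(roughness_before, roughness_after, lambda a, b: b - a)
print(cells_matching(increase, lambda value: value > 0.1))
```

Because position in the sequence identifies the cell, an operation between layers reduces to an element-wise pass, and locating regions with a given value reduces to a scan over positions.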

Figure 7: Two-dimensional diagram of the spatial indexing of geometry performed by CHISel. Each cell has normals and centroids, among other geometric properties.

Using these structures and querying the neighbourhood relations, curvature is computed with the following algorithm:

For each cell c of the 3D model:

1. Obtain the neighbouring cells of c, i.e. those cells whose centroids lie at a distance d smaller than the one specified by the user as input to the algorithm.

2. For each neighbouring cell v:

a) Compute the angle between the normals of cells c and v:

a = normal(v) \times normal(c)
y = normal(v) \cdot (a \times normal(c))
x = normal(v) \cdot normal(c)
\alpha = \arctan2(y, x)   (2)

b) Compute the radius of the sphere tangent to the centroids of cells c and v (Figure 8):

radio(v) = \frac{d/2}{\sin(\alpha/2)}   (3)

c) Compute the weight as a function of its area and its distance to c:

peso(v) = \frac{area(v)}{maxArea} \times \frac{1.0}{d}   (4)

where maxArea is the largest area contained in a cell.

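Figure 8: Computation of the radius of the sphere tangent to the centroids of two neighbouring cells. The diagram labels the half-angle α/2, the half-distance d/2, radio(v), the centroids centr(v) and centr(c), and the sphere centre o.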




Figure 9: Polychrome terracotta sculpture of an angel from a nativity scene, Sicilian school (18th century).

3. Compute the weighted mean of the radii obtained:

radio = \frac{\sum_{i=1}^{n} peso(i) \, radio(i)}{\sum_{i=1}^{n} peso(i)}   (5)

where n is the number of neighbours found.

4. The curvature is the inverse of this mean:

curvatura(c) = \frac{1.0}{radio}   (6)
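The steps above can be sketched as follows. This is an illustrative reading of the algorithm, not CHISel's implementation: cells are plain dictionaries with hypothetical keys, the angle is taken as the unsigned angle atan2(|n_v × n_c|, n_v · n_c) instead of transcribing Equation 2 literally, Equation 5 is read as a weighted mean normalised by the sum of weights, and parallel normals are assigned an infinite radius (zero curvature). Neighbour search (step 1) is assumed to have been done already.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a):
    return math.sqrt(dot(a, a))


def mean_tangent_radius(cell, neighbours, max_area):
    """Weighted mean radius (Equations 3 to 5) for a cell and its neighbours."""
    weighted_sum = 0.0
    weight_total = 0.0
    for v in neighbours:
        d = norm(sub(v["centroid"], cell["centroid"]))  # assumed non-zero
        # Unsigned angle between normals (assumption replacing Equation 2)
        alpha = math.atan2(norm(cross(v["normal"], cell["normal"])),
                           dot(v["normal"], cell["normal"]))
        half_sin = math.sin(alpha / 2.0)
        radius = math.inf if half_sin == 0.0 else (d / 2.0) / half_sin  # Eq. 3
        weight = (v["area"] / max_area) * (1.0 / d)                     # Eq. 4
        weighted_sum += weight * radius
        weight_total += weight
    return weighted_sum / weight_total  # Eq. 5; requires at least one neighbour


def curvature(cell, neighbours, max_area):
    """Equation 6: inverse of the weighted mean radius."""
    return 1.0 / mean_tangent_radius(cell, neighbours, max_area)


def sphere_cell(theta, r=2.0, area=1.0):
    """Hypothetical cell lying on a sphere of radius r centred at the origin."""
    n = (math.sin(theta), 0.0, math.cos(theta))
    return {"centroid": tuple(r * c for c in n), "normal": n, "area": area}


print(curvature(sphere_cell(0.0), [sphere_cell(0.3), sphere_cell(0.5, area=0.5)], 1.0))
```

For cells sampled on a sphere, the chord between two centroids has length 2r·sin(α/2), so Equation 3 recovers r for every neighbour regardless of the weights.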

To illustrate the usefulness of this computation we used the scanned 3D model shown in Figure 9. It is a polychrome terracotta sculpture from the Sicilian school of the 18th century that has been partially restored. Figure 10 compares the results obtained on the angel's wings; the differences between the curvature of the right wing, which has undergone restoration, and the left wing, which remains untouched, can be clearly seen.

4. Roughness computation

The computation of roughness is closely related to that of curvature since, as in the latter, the weighted mean of the radii of the tangent circles must be obtained. For this reason, the steps shared by both algorithms are only restated, and equations are given only for the steps specific to roughness.

Figure 10: Comparison between the curvatures of the restored right wing and the untreated left wing.

For each cell c of the 3D model:

1. Obtain the neighbouring cells of c, i.e. those cells whose centroids lie at a distance d smaller than the one specified by the user as input to the algorithm.

2. For each neighbouring cell v:

a) Compute the angle between the normals of cells c and v (Equation 2).

b) Compute the radius of the sphere tangent to the centroids of cells c and v (Equation 3).

c) Compute the weight as a function of its area and its distance to c (Equation 4).

3. Compute the weighted mean of the radii obtained (Equation 5).

4. Compute the centre of the tangent sphere:

centro(c) = centr(c) - radio \times normal(c)   (7)

5. For each neighbouring cell v:

a) Compute the distance d between the centre of the tangent sphere of cell c and the centroid of cell v.

b) Compute the difference dif between the distance d and the radius (Figure 11).

6. The roughness is the weighted mean of the differences obtained:

rugosidad = \frac{\sum_{i=1}^{n} peso(i) \, |dif(i)|}{\sum_{i=1}^{n} peso(i)}   (8)

where n is the number of neighbours found.
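Steps 4 to 6 can be sketched as below, taking the weighted mean radius of step 3 and the neighbour weights of Equation 4 as inputs. The function name and argument layout are hypothetical, Equation 8 is read as a weighted mean normalised by the sum of weights, and the planar case (infinite radius) is deliberately not handled in this sketch.

```python
import math


def roughness_from_sphere(centroid_c, normal_c, radius, neighbours):
    """Weighted mean of |distance to sphere centre - radius| (Equations 7 and 8).

    neighbours is a list of (centroid_v, weight_v) pairs; radius must be finite.
    """
    # Equation 7: centre of the tangent sphere, displaced against the normal
    centre = tuple(p - radius * n for p, n in zip(centroid_c, normal_c))
    weighted_sum = 0.0
    weight_total = 0.0
    for centroid_v, weight in neighbours:
        dif = math.dist(centre, centroid_v) - radius  # steps 5a and 5b
        weighted_sum += weight * abs(dif)
        weight_total += weight
    return weighted_sum / weight_total  # Equation 8


# Hypothetical data: a unit sphere centred at the origin, and one raised centroid
on_sphere = [((1.0, 0.0, 0.0), 1.0), ((0.0, 1.0, 0.0), 2.0)]
bumpy = [((1.0, 0.0, 0.0), 1.0), ((0.0, 1.1, 0.0), 1.0)]
print(roughness_from_sphere((0.0, 0.0, 1.0), (0.0, 0.0, 1.0), 1.0, on_sphere))
print(roughness_from_sphere((0.0, 0.0, 1.0), (0.0, 0.0, 1.0), 1.0, bumpy))
```

Neighbours lying exactly on the tangent sphere contribute nothing, which matches the stated property that the roughness of a sphere is zero.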

Using the same case as in the previous section, Figure 12 shows an example comparing the results obtained on the wings of the sculpture. As with curvature, the disparity between the roughness of the restored wing and that of the untreated wing is clearly visible.




Figure 11: Computation of the difference between the distance from the centre of the tangent sphere, centro(c), to the centroid, centr(v), and its radius, radio.

Figure 12: Comparison between the roughness of the restored right wing and the untreated left wing.

5. Evaluation

To check the quality of our computation, we synthetically generated an approximate 3D model of a sphere and obtained a layer with its roughness. As Figure 13 shows, and despite the detail exposed by the palette used, most of the surface has a roughness of zero or very close to zero. The cells with somewhat larger values are those lying on the boundaries of the triangles that make up the mesh used by CHISel; these values are readily explained as the errors introduced by approximating the geometry of the sphere with a polygonal mesh.

Figure 13: Roughness computed on a synthetically generated sphere.

The proposed method computes roughness cell by cell, so the computation time is expected to be proportional to the number of nodes of the model. To verify this, we processed several models of different complexity. The models were captured with the same laser scanner, a Minolta Vivid-910, and processed with the same software tool, Geomagic. Table 1 compares the results. The second column shows the number of levels of the generated octree; the third, the number of nodes crossed by the surface (that is, the cells of the layer); the fourth, the size of these cells in mm; the fifth, the figure in which the model can be seen; and the last, the computation time in seconds. The tests were run on a personal computer with an Intel Core i3-530 processor at 2.93 GHz, an NVIDIA GeForce GTX 460 graphics card, 4 GB of RAM and the Kubuntu 12.04 Linux distribution.

Figure 14 compares the results of computing roughness on the sculpture of a lion's head using different cell sizes, visualized with the color palette shown in Figure 14d. As can be seen, roughness levels grow with cell size: larger cells mean that the roughness value is computed over larger areas, so the sum of deviations is larger. The white sections visible in Figure 14a correspond to values above the maximum of the assigned palette.

Roughness information is useful for analyzing and forecasting the level of deterioration. The method proposed in this work computes the roughness level at every point of the surface of a sculpture. With this information the surface can be segmented, identifying areas with high roughness (likely to be more deteriorated) or with very low roughness, which may indicate an area that has been restored or treated with a different material or technique.




Model   Levels   Nodes     Size (mm)   Figure   Time (s)
Angel   8        29,440    0.787       9        0.64
Angel   9        118,319   0.394                2.73
Angel   10       474,217   0.197                12.66
Lion    8        41,308    3.302       14a      0.85
Lion    9        164,898   1.651       14b      3.68
Lion    10       659,543   0.825       14c      16.32
Vase    8        48,188    0.822       15       1.04
Vase    9        192,948   0.411                4.85
Vase    10       772,191   0.205                22.42

Table 1: Generation times of the roughness layer for several models.
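One quick way to examine the proportionality claim is to divide each time in Table 1 by its node count. The sketch below does only that arithmetic; the per-node cost is a derived quantity that the paper does not report, and the function name is illustrative.

```python
# (model, octree levels, nodes, time in seconds), copied from Table 1.
TABLE_1 = [
    ("Angel", 8, 29440, 0.64), ("Angel", 9, 118319, 2.73), ("Angel", 10, 474217, 12.66),
    ("Lion", 8, 41308, 0.85), ("Lion", 9, 164898, 3.68), ("Lion", 10, 659543, 16.32),
    ("Vase", 8, 48188, 1.04), ("Vase", 9, 192948, 4.85), ("Vase", 10, 772191, 22.42),
]


def microseconds_per_node(nodes: int, seconds: float) -> float:
    """Average time spent per layer cell, in microseconds."""
    return seconds / nodes * 1e6


for model, levels, nodes, seconds in TABLE_1:
    cost = microseconds_per_node(nodes, seconds)
    print(f"{model:5s} level {levels:2d}: {cost:5.1f} us per node")
```

For these nine runs the per-node cost stays between roughly 20 and 30 microseconds, rising moderately with octree depth, which is broadly consistent with a near-linear dependence on the number of nodes.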

(a) Roughness with 8 octree levels.

(b) Roughness with 9 octree levels.

(c) Roughness with 10 octree levels.

(d) Palette used for the three cases.

Figure 14: Comparison of the roughness obtained using the same model, a single color palette and different cell sizes.

Figure 15: Ceramic vase, showing roughness information.

This function has been used to identify restored areas in the sculpture of the nativity-scene angel of Figure 9. To do so, the roughness of the sculpture is first obtained (the Rugosidad layer shown in Figure 16).

Figure 16: Roughness layer. Scale values are in mm.

From this roughness layer, the areas with a high roughness value (above 0.06) are computed using the algebraic expression:

RugosidadAlta = if(Rugosidad > 0.06, 1, null())   (9)




Figure 17: Areas with high roughness, computed in the RugosidadAlta layer.

The result is shown in Figure 17. Next, small areas that may be due to the morphology of the model itself are removed. To do so, a distance field to this layer, BufferAlta, is computed (see Figure 18).

We then compute the areas lying farther than 1.3 mm from the high-roughness areas, using the expression:

NucleoBaja = if(BufferAlta > 1.3, 1, null())   (10)

This layer represents the low-roughness areas that are far from high-roughness areas. We now add the low-roughness areas on the contour of this last layer by computing a new distance field from it (see Figure 19).

The final result is generated by combining this last distance field with the original roughness layer (see Figure 20), using the expression:

Restaurado = if(BufferBaja < 1.3 && Rugosidad < 0.06, 1, null())   (11)
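The chain of expressions (9) to (11) can be illustrated on a toy example. The sketch below is not CHISel's layer algebra: it assumes a hypothetical one-dimensional row of cells, uses None in place of null(), and computes the distance fields by brute force. Only the thresholds 0.06 and 1.3 mm come from the text.

```python
import math
from typing import List, Optional

Layer = List[Optional[float]]


def distance_field(mask: Layer, positions: List[float]) -> List[float]:
    """Distance from every cell to the nearest non-null cell of `mask`."""
    sources = [p for p, m in zip(positions, mask) if m is not None]
    if not sources:
        return [math.inf] * len(positions)   # nothing to measure against
    return [min(abs(p - s) for s in sources) for p in positions]


def restored_zones(rugosidad: List[float], positions: List[float],
                   rough_thr: float = 0.06, dist_thr: float = 1.3) -> Layer:
    # Expression (9): cells with high roughness.
    rugosidad_alta = [1 if r > rough_thr else None for r in rugosidad]
    buffer_alta = distance_field(rugosidad_alta, positions)
    # Expression (10): low-roughness cores far from any high-roughness cell.
    nucleo_baja = [1 if d > dist_thr else None for d in buffer_alta]
    buffer_baja = distance_field(nucleo_baja, positions)
    # Expression (11): grow the cores back, keeping only low-roughness cells.
    return [1 if d < dist_thr and r < rough_thr else None
            for d, r in zip(buffer_baja, rugosidad)]


positions = [float(i) for i in range(10)]            # cell centers, 1 mm apart
rugosidad = [0.01, 0.01, 0.01, 0.10, 0.01, 0.10, 0.01, 0.01, 0.01, 0.01]
print(restored_zones(rugosidad, positions))
```

In this toy row the single smooth cell trapped between two rough cells (index 4) is not marked as restored, which is the effect the distance-field step is meant to achieve: small smooth patches are discarded, while the contours of the large smooth areas are recovered.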

Figure 18: Distance field to the high-roughness areas. BufferAlta layer.

Figure 19: Distance field to the low-roughness areas. BufferBaja layer.

Figure 20: Estimate of the areas that have been restored.

5.1. Conclusions

Having information about the surface of a model makes it possible to analyze and simulate its condition. One of the most important parameters in the restoration and conservation of cultural assets is surface roughness. This work has proposed a simple method for computing roughness from geometric information in a voxelized model, and has presented an algorithm that computes roughness with this model for indexed polygonal meshes.

The algorithm has been integrated into CHISel [TLRS12], an information system for cultural heritage.

At present the method is implemented using the immediate neighborhood of each cell. In future work we plan to extend it to neighborhoods of variable size and to evaluate the influence of the computation area on the results.

Acknowledgements

This work was funded by the Consejería de Innovación, Ciencia y Empresa of the Junta de Andalucía through the excellence project PE09-TIC-5276, in collaboration with the Patronato de la Alhambra y del Generalife and the Conjunto Arqueológico de Itálica.

The models shown are the property of the Patronato de la Alhambra y del Generalife, the Conjunto Arqueológico de Itálica, the Museo Histórico Municipal de Écija, the Museo de Puebla de Don Fadrique, the Centro Andaluz de Arqueología Ibérica and the Department of Sculpture of the Faculty of Fine Arts of the Universidad de Granada.

References

[ACGI02] AMARAL R., CHONG L. H., GUNA D., INTRODUCTION S.: Surface roughness, 2002. 2

[Bur01] BURROUGH P.: Gis and geostatistics: Essential partners for spatial analysis. Environmental and Ecological Statistics 8, 4 (2001), 361–377. doi:10.1023/A:1012734519752. 2

[CGE05] CORSINI M., GELASCA E. D., EBRAHIMI T.: A multi-scale roughness metric for 3d watermarking quality assessment. In Workshop on Image Analysis for Multimedia Interactive Services 2005 (2005). 2

[GKM∗02] GADELMAWLA E., KOURA M., MAKSOUD T., ELEWA I., SOLIMAN H.: Roughness parameters. Journal of Materials Processing Technology 123, 1 (2002), 133–145. 2

[GS94] GRIMMOND C., SOUCH C.: Surface description for urban climate studies: a gis based methodology. Geocarto International 9, 1 (1994), 47–59.

[KG00] KARNI Z., GOTSMAN C.: Spectral compression of mesh geometry. In Proceedings of the 27th annual conference on Computer graphics and interactive techniques (New York, NY, USA, 2000), SIGGRAPH '00, ACM Press/Addison-Wesley Publishing Co., pp. 279–286. URL: http://dx.doi.org/10.1145/344779.344924, doi:10.1145/344779.344924. 2

[Lav09] LAVOUÉ G.: A local roughness measure for 3d meshes and its application to visual masking. ACM Trans. Appl. Percept. 5, 4 (Feb. 2009), 21:1–21:23. URL: http://doi.acm.org/10.1145/1462048.1462052, doi:10.1145/1462048.1462052. 2

[SBT∗05] SALVADÓ N., BUTÍ S., TOBIN M. J., PANTOS E., PRAG A. J. N., PRADELL T.: Advantages of the use of sr-ft-ir microspectroscopy: applications to cultural heritage. Analytical chemistry 77, 11 (2005), 3444–3451. 1

[STLL12] SOLER F., TORRES J. C., LEÓN A. J., LUZÓN V.: Design of a cultural heritage information system. In Congreso Español de Informática Gráfica (2012).

[Tia02] TIANO P.: Biodegradation of cultural heritage: decay mechanisms and control methods. In Seminar article, New University of Lisbon, Department of Conservation and Restoration (2002), pp. 7–12. 1

[TLRS12] TORRES J., LÓPEZ L., ROMO C., SOLER F.: An information system to analize cultural heritage information. Progress in Cultural Heritage Preservation (2012), 809–816. 2, 9

[VGDG∗98] VAN GRIEKEN R., DELALIEUX F., GYSELS K., ET AL.: Cultural heritage and the environment. Pure and applied chemistry 70, 12 (1998), 2327–2331.



Session 3

Vision and Imaging



A Study of Octocopters for 3D Digitization from Photographs in Areas of Difficult Access

Germán Arroyo1,2,3, Alejandro Rodríguez3, Juan Carlos Torres1,2,3


1 Departamento de Lenguajes y Sistemas Informáticos, Universidad de Granada
2 Laboratorio de Realidad Virtual, Universidad de Granada
3 Virtum Graphics S.L.

Abstract

Reconstruction of 3D models from 3D scanners is a well-consolidated process, but it is sometimes expensive because positioning the scanners is difficult or even impossible. This is an issue when scanning tall buildings or inaccessible areas. Unmanned Aerial Vehicles (UAVs) can solve the accessibility problem, but the process followed for both recording and data processing needs to be systematized. In this paper, we present a study of methodology, algorithms and parameters for the 3D reconstruction of building parts from photographs taken with octocopters. The results show under what conditions this technology can be used effectively as a low-cost alternative to 3D scanners, considerably reducing the time spent on scanning. We also present some examples successfully reconstructed by means of this technique with open source algorithms.

Categories and Subject Descriptors (according to ACM CCS): I.4.1 [Computer Graphics]: Image Processing and Computer Vision/Image processing software: 3D reconstruction, Surface reconstruction, Unmanned Aerial Vehicles, Octocopter, Microcopter

1. Introduction

Three-dimensional digitization is the generation of an accurate three-dimensional digital model of an object. Creating three-dimensional models from scanner data is a well-consolidated process within the field of 3D digitization, for which surface reconstruction algorithms that work stably already exist.

One of the problems of 3D scanning is its cost, which depends on the positioning of the scanner and hence on the terrain. In many cases the lights needed to capture the color later used to texture the model must also be positioned. For tall buildings, cranes or neighboring balconies are additionally needed to obtain proper visibility, and in some cases the reconstruction is impossible because the scanner cannot be placed. All this considerably increases the cost of scanning in both time and money.

On the other hand, the use of unmanned aerial vehicles (UAVs) for acquiring three-dimensional data has recently proliferated thanks to the low cost and usefulness of these devices, especially in areas such as photogrammetry [LCT∗12, Eis06]. The advantage of some of these devices for scanning the facades of tall buildings is evident, especially in the case of microcopters, which come in different models depending on the number of propellers and arms (quadcopter, octocopter, etc.) and can hover around objects for more than 20 minutes [Qua13].

Algorithms have also lately been developed for reconstructing point clouds from plain photographs, from both calibrated and uncalibrated images [ND10]. Although their use has recently been tested for reconstructing 3D models with cameras mounted on UAVs flying over buildings [rS12], there is neither a methodology nor a comparative study of techniques that ensures a good scan from a single flight.



This paper presents a study of the possibilities of these techniques and a method for combining computer vision techniques and UAV devices, together with algorithms that generate surfaces from point clouds, as an alternative to the scanner for reconstructing surfaces in areas of difficult access. A new algorithm for smoothing the normals in the final reconstruction step is also presented.

2. Related work

Any method capable of measuring a sufficiently large set of positions on the surface of an object can be used to digitize models. Although the purpose of most shape-capture techniques is to reconstruct a surface, there are methods for directly visualizing point clouds from specific data acquisition systems [PGC11, MGB∗11]. Of all existing techniques, non-intrusive ones are the most widely used for scanning buildings and heritage items, since they do not alter the original work. Most of these technologies have been built in the form of 3D scanners.

Laser scanners have problems when the beam hits highly translucent surfaces (subsurface scattering) [LPC∗00, LES∗07], which makes them unsuitable for certain kinds of materials. Time-of-flight scanners [CSC∗10] have often been used to replace laser scanners at very long distances; their main drawbacks are that the achievable precision is limited by the wavelength of the laser, and their high price. Structured-light scanners [RCM∗01] have recently gained popularity; however, the system has to be close to the surface, which is not always possible. In any case, the critical problem with all these devices is their weight, which is excessive for a small microcopter able to carry at most 2 kg without losing stability in flight [Qua13]. Their high electrical power consumption also makes them unviable for a portable system. It is therefore necessary to use small or medium-sized photographic cameras for data acquisition.

Regarding the use of uncalibrated cameras for 3D reconstruction, several algorithms allow reconstruction provided there are enough photographs and they contain enough detail. The main algorithm used is SfM (Structure from Motion), which assumes that objects are static and that it is the camera that moves through the scene [DSTT00]; the same object is thus assumed to be seen in different camera frames from different angles, which makes it possible to estimate the camera position at each moment. Most SfM algorithms proceed sequentially: they start from a minimal or sparse reconstruction and incrementally add new views, using algorithms for pose estimation and triangulation. However, there is no guarantee that the reconstruction converges to a globally optimal solution [MMM12].

According to Bernardini and co-authors [BR02], a 3D scanning system can be considered a truly effective technique if the following factors are taken into account: a) well-defined planning methods for data acquisition are needed; b) it must be possible to capture large objects with complicated surfaces; c) every possible step should be automated, minimizing user input; d) accuracy must be guaranteed after registration and processing of the scanned data. The following sections introduce the method to follow in order to scan effectively from photographs using cameras in accordance with these principles.

3. Scanning procedure and fieldwork

The priorities when scanning are the quality of the data (in our case, the overlap between images) and minimizing the time needed to acquire them (in our case, flight time). With this philosophy, we first studied how the positioning and path of the camera affect the result, and then carried out real test flights that maximize data while minimizing flight time.

In principle, since we use algorithms that take images from uncalibrated cameras as input, the order in which the photographs or frames are taken does not affect the quality of the process, although it could reduce computation times. Since the computation time of the reconstruction does not affect scanning time, the order in which photographs are taken was not considered a relevant parameter for saving scanning and fieldwork time. What is relevant, however, is the way in which the photographs are taken, so as to optimize the flight time of the device and take as many photographs as possible for later processing.

Therefore, before test-flying the device we carried out a study using tripod photography, with three basic shooting techniques following the scheme in Figure 1: first, the camera was moved along a single axis, photographing the building at a constant height; next, the camera was rotated with a pitch movement (a); different variants with yaw and different axes were repeated (b, c); and finally the photographs were captured using a combination of all the techniques.

Once the data were captured, several reconstructions were carried out and their results compared empirically. All reconstructions that combined pitch or yaw improved coherence, yielding on the order of 40% more correspondences than when the camera was not rotated, owing to the high pixel overlap obtained (above 80%). We also found no difference between applying pitch, yaw or both, as long as it was done consistently before moving the UAV to another distant position.

Figure 1: Photographing process on a plane. The techniques were tested both without rotating the camera and rotating it: a) movement along the x axis with pitch; b) movement along the y axis with yaw; c) movement along the x axis with yaw; d) combination of x and y movement with yaw and pitch.

In addition, photographs were taken at different distances from the building. Although in theory the algorithms used are scale-independent, in practice a considerable change of scale results in no correspondences being found and is detrimental to the reconstruction. Tests were also made registering shots taken on different days and using them to reconstruct the same model; this affected only the coloring of the texture. Although this type of device is very stable, at the cost of low energy efficiency, it is best to avoid strong wind or rain and take advantage of stable weather conditions.

Once these facts were known, the photographs were captured with an octocopter. The device used was an i-UAV octocopter designed by the company Intelligenia Dynamics (footnote I), with a flight autonomy of approximately 25 minutes and a payload of up to 2 kg. Three cameras with very different characteristics were tested, as shown in Figure 2. The most relevant technical differences are that the three cameras had very different lenses and that one of them recorded high-definition video, whereas the other two were still cameras. The combined price of the three cameras and the UAV did not exceed 6,000 euros.

After analyzing the point clouds obtained, the following flight-planning methodology was developed: in all flights we use the zig-zag flight pattern shown in Figure 3, in which, at each new position along the trajectory, the UAV is stabilized for a couple of seconds while being rotated horizontally, always trying to keep the camera pointed at one of the most interesting parts of the building (areas with details such as windows, frames, etc.) so that the subsequent algorithms can use it as a reference.

I Intelligenia Dynamics S.L.: http://www.iuavs.com/

Figure 2: a) Octocopter used for data acquisition; b) cameras attached in the different UAV flights, from left to right: Go-Pro Hero 3 wide-angle webcam; Canon EOS 600D reflex camera with EF-S18-55 IS II lens; Canon PowerShot SX50 HS compact camera.

Figure 3: Path followed by the octocopter for data acquisition. It should follow a zig-zag pattern, starting from either end. If the octocopter runs out of battery, after replacing it the flight should resume from where it stopped. Ideally the device always flies over well-delimited planes, guaranteeing a minimum overlap of 80% between horizontal shots and 20% with the vertical shots.
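The overlap figures in the caption translate directly into the spacing between shots. The following sketch generates zig-zag waypoints over a flat facade under stated assumptions: the facade is a rectangle, each photograph covers a rectangular footprint of known size on it, and all names and dimensions are hypothetical rather than taken from the paper.

```python
import math
from typing import List, Tuple


def zigzag_waypoints(facade_w: float, facade_h: float,
                     foot_w: float, foot_h: float,
                     overlap_h: float = 0.80,
                     overlap_v: float = 0.20) -> List[Tuple[float, float]]:
    """Centers of the photo footprints, swept row by row in alternating directions."""
    if not (0.0 <= overlap_h < 1.0 and 0.0 <= overlap_v < 1.0):
        raise ValueError("overlaps must be in [0, 1)")
    step_x = foot_w * (1.0 - overlap_h)   # spacing between shots in a row
    step_y = foot_h * (1.0 - overlap_v)   # spacing between rows
    # round() guards against floating-point noise before taking the ceiling.
    n_cols = max(1, math.ceil(round((facade_w - foot_w) / step_x, 9)) + 1)
    n_rows = max(1, math.ceil(round((facade_h - foot_h) / step_y, 9)) + 1)
    waypoints: List[Tuple[float, float]] = []
    for row in range(n_rows):
        y = min(foot_h / 2 + row * step_y, facade_h - foot_h / 2)
        xs = [min(foot_w / 2 + col * step_x, facade_w - foot_w / 2)
              for col in range(n_cols)]
        if row % 2 == 1:
            xs.reverse()                   # odd rows are flown back
        waypoints.extend((x, y) for x in xs)
    return waypoints


# Hypothetical 10 m x 6 m facade and a 5 m x 3 m footprint per photograph.
wps = zigzag_waypoints(10.0, 6.0, 5.0, 3.0)
print(len(wps), wps[0])
```

Clamping the last column and row to the facade edge can only increase the overlap there, so the minimum overlaps are still respected.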

In total, 6 flights were carried out over two buildings with very different characteristics: the Virtual Reality Laboratory of the Universidad de Granada and the Monasterio de la Cartuja in Granada (Figure 4). These buildings were selected for two reasons. First, they have very different levels of texture: building a) is practically smooth, whereas b) is very rough. Second, a) has large glass windows, whereas b) has highly specular stone-like materials. An automatic shutter was triggered every 5 seconds, yielding about 12 photographs per minute. Flights lasted about 22 minutes on average; the shortest took 10 minutes and the longest 30 minutes. Each flight yielded about 260 photographs on average. Afterwards, on the ground, the photographs were downloaded to the base station and the blurred or duplicated ones, about 20 on average, were discarded. This process was manual and took approximately 10 minutes in every case.
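The relation between shutter interval, flight time and number of photographs is simple arithmetic; a small sketch makes the reported figures easy to check. The function name is illustrative, and the calculation assumes the shutter fires at a constant interval for the whole flight.

```python
def photos_per_flight(flight_minutes: float, interval_s: float = 5.0) -> int:
    """Number of shots taken when the shutter fires every `interval_s` seconds."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return int(flight_minutes * 60 // interval_s)


average = photos_per_flight(22)      # average flight reported in the text
usable = average - 20                # about 20 blurred or duplicated shots discarded
print(average, usable)
```

For the 22-minute average flight this gives 264 shots, in line with the roughly 260 photographs per flight reported above.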




Figure 4: Buildings used for scanning: a) Virtual Reality Laboratory (Universidad de Granada); b) door of the Monasterio de la Cartuja.

4. Algorithms used in the study

Once the data have been acquired, reconstruction software must be used. In classical reconstruction with a 3D scanner, the phases are traditionally: a) processing the point clouds so that they all share the same coordinate system; b) minimizing the gap between scans; c) creating the triangle mesh; d) filling possible holes.

However, when the point cloud is obtained from photographs, steps a) and b) are unnecessary: in a first phase the algorithms recover the 3D position of the camera that took each photograph, so the relative position of the points is inherently known.

Once this is done, the intermediate positions of the points recognized in the camera shots can be interpolated to obtain a dense point cloud, which passes to a third phase where the surface is reconstructed. This cloud is not georeferenced, so the position and orientation of at least three points must be known in order to correct the scale.
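As an illustration of the scale correction, the sketch below estimates a single scale factor from three or more reference points whose real positions are known. It is a simplified assumption-laden example, not the procedure used by the authors: it recovers only the scale, as the mean ratio of real to reconstructed pairwise distances, and ignores rotation and translation.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def estimate_scale(model_pts: Sequence[Vec3], real_pts: Sequence[Vec3]) -> float:
    """Mean ratio between real and reconstructed pairwise distances."""
    if len(model_pts) != len(real_pts) or len(model_pts) < 3:
        raise ValueError("at least three corresponding points are required")
    ratios = []
    for i, j in combinations(range(len(model_pts)), 2):
        d_model = math.dist(model_pts[i], model_pts[j])
        d_real = math.dist(real_pts[i], real_pts[j])
        if d_model > 0.0:
            ratios.append(d_real / d_model)
    if not ratios:
        raise ValueError("reference points in the model coincide")
    return sum(ratios) / len(ratios)


# Hypothetical reference points measured on site (meters) and in the cloud.
real = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
model = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 1.5, 0.0)]
scale = estimate_scale(model, real)
print(scale)
```

Multiplying every coordinate of the cloud by the resulting factor brings it to real-world units; a full georeferencing would additionally need the rigid transformation between both coordinate systems.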

The following subsections discuss the algorithms used for each of the reconstruction steps mentioned above.

4.1. Algorithms for point cloud reconstruction

Reconstructing the point cloud requires a series of steps, such as those shown in the scheme of Figure 5. These steps involve finding information about each pixel in the rest of the photographs taken and, finally, the 3D reconstruction of the most characteristic points of the image.

The selection of these algorithms was not arbitrary: we used those that, being available to the scientific community or under free software terms, rank among the algorithms with the best results in the most recent comparisons [Wan11].

In the first step, an algorithm detects the most invariant parts of the image, called features; these features are usually related to the corners of the image, which are regions where two edges cross [Jai88]. Also in this phase, once features are detected they are matched across photographs, so that several points of one image are related to the same points in other images of the same object taken from different angles. This can be done with the SIFT algorithm [Low04a]; for this work we used the GPUSift version, released under a free software license and optimized for NVIDIA graphics cards (footnote II), because the original software is patented and the GPU version is considerably faster than the original [SmFPG06].

Figure 5: Processing pipeline for obtaining the dense point cloud.

In a second step, the possible camera positions for each photograph are statistically fitted, taking into account the matches and the features associated with it, based on recent SfM algorithms [VGS07]. Specifically, the algorithm with the best results according to the aforementioned comparisons is Bundler [SSS06], which is available in an open version (footnote III) and in at least one well-known commercial software product (footnote IV).

The problem with Bundler is that it depends heavily on the feature detector, which normally obtains no more than 30% of the features of the image [SSS06]; this results in less than a third of the points that could be obtained from the information in the image pixels. The output of Bundler is always a sparse point cloud, so in a following step the 3D positions of the pixels must be recomputed using the camera information. Because this algorithm is extremely slow, what is done in this step is to compute each of the pieces detected by Bundler and afterwards recombine them all into a single model. If this is not possible, the algorithms return several models. The algorithms that work best for these two steps according to the aforementioned comparisons are PMVS2 (footnote V) and CMVS (footnote VI), developed by Yasutaka Furukawa and Jean Ponce [FP10], both available under an open license.

II NVIDIA©: http://www.nvidia.com
III Bundler: http://phototour.cs.washington.edu/bundler/
IV Photo Tourism©: http://phototour.cs.washington.edu/

The result of these algorithms is a dense point cloud with estimated normals, on which mesh reconstruction algorithms can be run. The subsequent process therefore resembles the one followed for point clouds from 3D scanners, except that point cloud registration is unnecessary, since CMVS performs this step automatically and manual adjustment is needed only if it fails. This manual step was not required for any of the reconstructions shown in this work.

4.2. Algorithms for mesh reconstruction

The output of computer vision algorithms for point cloud reconstruction is always unstructured, the only additional information being the normal at each vertex, as with point clouds acquired with a scanner. Unlike scanner clouds, however, the points obtained are not separated by a regular distance (they are unorganized points), which causes most algorithms designed for scans to have difficulties with the reconstruction.

We tested 4 algorithms that are widely used in surface reconstruction and are a sample of the techniques used to reconstruct surfaces from point clouds:

1. A method based on Voronoi graphs, which builds on an improvement to the Delaunay triangulation algorithm [GKS00].

2. A volumetric method, based on a modification of the algorithm of Curless [CL96], implemented in MeshLab (footnote VII) under the name VCG.

3. A version of the Poisson surface reconstruction algorithm [KBH06a] under an open license, which is a hybrid between a volumetric method and a surface-fitting method.

4. An implementation of the Ball-Pivoting algorithm [BMR∗99], a heuristic algorithm also based on the Delaunay triangulation algorithm.

V PMVS2: http://www.di.ens.fr/pmvs/
VI CMVS: http://www.di.ens.fr/cmvs/
VII MeshLab: http://meshlab.sourceforge.net/

Figure 6 shows a reconstruction with default parameters for these 4 techniques. After intensive parameter tuning we were unable to make the Delaunay or Curless algorithms improve significantly, so we discarded them from the comparison. This was more or less to be expected, since these algorithms do not rely on the per-vertex normals, so they work with less information and therefore obtain worse results. The problem with the remaining techniques is that not all normals are correctly computed, owing to precision problems in the PMVS2 and CMVS algorithms, which try to compute the normals from the camera position [FP10]. The next section explains the solution we used to resolve this.

With any of the techniques, the main problem is that the holes in the dense cloud are large and the points are very irregularly distributed.

As we observed experimentally, reconstruction with the Poisson method fills all holes, including information that does not correspond to reality (for example by filling the openings of the door or the shadowed areas), which is undesirable because it is simpler to close holes afterwards with another algorithm. In addition, Poisson does not reconstruct the surface smoothly and invents information where none exists; parameter tuning can improve the appearance of the object, but reduces accuracy [KBH06b]. For this reason the Ball-Pivoting algorithm is always used in the subsequent comparisons, although, as we have seen, the Poisson-based algorithm can also give acceptable results if the real holes of the model, which it fills automatically, can be restored.

The computation time of all these algorithms, including the reconstruction of both the point cloud and the surface, is high (4 to 7 hours in some cases). However, time is not a serious problem in terms of economic cost in this second phase: once the parameters have been set there is no human interaction in the process, only computing time. These processing costs can become extremely low by taking advantage of current cloud computing capabilities [Rac13].

4.3. Algorithms for data measurement and process refinement

To smooth the normals and reduce the error by increasing the connectivity of the point cloud, we have designed an algorithm based on a graph structure that we call GDCC (Grafo Denso de Conectividad de Cámaras, dense camera-connectivity graph).




Figure 6: Top: reconstructions obtained with different methods from the same cloud of 1,033,120 points: a) Delaunay (7,714,178 triangles); b) Curless (12,881,940 triangles); c) Ball-Pivoting (1,349,715 triangles); d) Poisson (2,065,641 triangles).

Before analyzing the structure, it is important to note that the algorithms that generate the sparse cloud work only on intensity levels, for which color is not relevant. PMVS2 and CMVS, however, use the image color, treating each channel of the RGB space separately. This produces slightly different results when the image is desaturated, as can be seen in Figure 7. It is worth noting that, although the cloud is not as dense, several small holes are filled by additional points, which could help us close gaps in a first phase.

Like CMVS, our algorithm uses color to build the GDCC but, unlike CMVS, it does not split the image into channels and process each one independently.

The GDCC is an undirected graph composed of weighted nodes, where each node is a reconstructed point of the dense cloud. Each weight is computed from the reliability of the camera information, and the node also stores the camera information itself and the normal obtained. The links between nodes are also used to move the Ball-Pivoting sphere and close small gaps.

The algorithm that builds the GDCC starts by iteratively comparing pairs of points: it computes the Euclidean distance between nearby neighbors inside a sphere whose radius is the one given by the Ball-Pivoting reconstruction algorithm, and keeps only those neighbors that are close in intensity and color. Closeness can be computed with a distance in any color space (Euclidean if we use RGB, polar distance in the case of HSV, etc.). A neighbor is considered connected if its color distance is less than or equal to a constant (which we experimentally set to 5% for all reconstructions) of the point it is compared with.
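The connection test described above can be sketched as follows. This is only an illustration under stated assumptions: colors are RGB triples normalized to [0, 1], the 5% threshold is read as a fraction of the largest possible RGB distance, and the function name `connected_neighbors` and the brute-force scan are hypothetical, not the authors' implementation.

```python
import math

# Assumption: RGB channels are normalized to [0, 1], so the largest possible
# Euclidean color distance is sqrt(3).
MAX_RGB_DISTANCE = math.sqrt(3.0)


def connected_neighbors(points, colors, index, radius, threshold=0.05):
    """Return the indices of the points linked to points[index].

    A candidate is linked when it lies inside the sphere of the given radius
    and its relative color distance does not exceed the threshold.
    """
    center, center_color = points[index], colors[index]
    linked = []
    for j, (point, color) in enumerate(zip(points, colors)):
        if j == index:
            continue
        if math.dist(center, point) > radius:
            continue  # outside the Ball-Pivoting sphere
        if math.dist(center_color, color) / MAX_RGB_DISTANCE <= threshold:
            linked.append(j)
    return linked


points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 0.0, 0.0), (0.1, 0.1, 0.0)]
colors = [(0.5, 0.5, 0.5), (0.52, 0.5, 0.5), (0.5, 0.5, 0.5), (0.9, 0.1, 0.1)]
print(connected_neighbors(points, colors, index=0, radius=1.0))
```

In this toy data the third point is rejected because it is too far away and the fourth because its color differs too much. A real point cloud would need a spatial index rather than a linear scan.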

We extract the color using a mask centered on the position where the projection falls. This mask has a constant size; for all comparisons on high-resolution images we used a 5x5 mask, because it is the standard size for noise removal with Gaussian filters [Jai88]. An extraction method is also set, which can be the mean of the pixels under the mask; in our case we used a mean weighted by a Gaussian function, in the same way as is usually done in feature-extraction algorithms [Low04b].

Figure 7: Reconstruction using the same photographs: a) in color; b) in 256 gray levels. The grayscale photographs were obtained by desaturating the color images based on luminosity.
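A Gaussian-weighted color extraction over a 5x5 mask could look like the sketch below. The value of `sigma`, the renormalization at image borders and the names `gaussian_kernel` and `sample_color` are assumptions made for illustration; the text does not specify them.

```python
import math


def gaussian_kernel(size=5, sigma=1.0):
    """Build a normalized size x size Gaussian kernel (sigma is an assumption)."""
    half = size // 2
    kernel = [[math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
               for dx in range(-half, half + 1)]
              for dy in range(-half, half + 1)]
    total = sum(sum(row) for row in kernel)
    return [[value / total for value in row] for row in kernel]


def sample_color(image, x, y, size=5, sigma=1.0):
    """Gaussian-weighted mean color of the mask centered on pixel (x, y).

    `image` is a list of rows of (r, g, b) tuples. Mask cells that fall
    outside the image are skipped and the weights are renormalized.
    """
    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    height, width = len(image), len(image[0])
    acc = [0.0, 0.0, 0.0]
    weight_sum = 0.0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            px, py = x + dx, y + dy
            if 0 <= px < width and 0 <= py < height:
                w = kernel[dy + half][dx + half]
                weight_sum += w
                for channel in range(3):
                    acc[channel] += w * image[py][px][channel]
    return tuple(a / weight_sum for a in acc)


flat = [[(0.2, 0.4, 0.6)] * 7 for _ in range(7)]
print(sample_color(flat, 3, 3))
```

On a uniform image the weighted mean returns the image color itself, which is a quick sanity check that the kernel is normalized.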

The algorithm is as follows:

1. For each vertex without camera information:
2. Its neighbors with camera information are explored.
3. The pixel is projected onto the images of the n cameras that are common to all the neighbors.
4. For each camera in which the pixel is found, the weight of the node is increased by 1.
5. The camera information common to all its neighbors is added to the node.
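One possible reading of the five steps above is sketched here. The data layout (dictionaries keyed by node), the callback `is_visible`, which stands in for the projection test of step 3, and the function name `propagate_camera_info` are all hypothetical; the original description leaves these details open.

```python
def propagate_camera_info(cameras_of, neighbors, is_visible):
    """Assign weights and camera sets to nodes that lack camera information.

    cameras_of: dict node -> set of camera ids (empty set = no information)
    neighbors:  dict node -> iterable of neighboring nodes in the graph
    is_visible: hypothetical callback (node, camera) -> bool that projects the
                point into the camera image and reports whether it is found
    """
    weights, added = {}, {}
    for node, cams in cameras_of.items():
        if cams:
            continue  # step 1: only vertices without camera information
        # step 2: explore the neighbors that do have camera information
        informed = [cameras_of[n] for n in neighbors.get(node, ())
                    if cameras_of[n]]
        if not informed:
            continue
        # step 3: cameras shared by all informed neighbors
        common = set.intersection(*informed)
        # step 4: one unit of weight per camera in which the point is found
        weights[node] = sum(1 for cam in common if is_visible(node, cam))
        # step 5: the node inherits the common camera information
        added[node] = common
    return weights, added


cameras_of = {"a": {1, 2}, "b": {2, 3}, "c": set()}
neighbors = {"c": ["a", "b"]}
print(propagate_camera_info(cameras_of, neighbors, lambda node, cam: True))
```

The sketch reads only the original camera sets, so the result does not depend on the order in which nodes are visited; whether the authors propagate information incrementally is not stated.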

Afterwards, using the GDCC, we apply Ball-Pivoting again, choosing the graph cycles with the highest weight in case of a tie, and smoothing the normals according to the connection density of each node following the method of Nelson Max [Max99], but using the weights already computed and normalized instead of computing them from a surface. Vertices with camera information always have the maximum weight.
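As a rough illustration of weight-driven normal smoothing, the sketch below averages a node's normal with those of its graph neighbors, using the normalized node weights as coefficients. This is a plain weighted average chosen for clarity, not the exact formula of [Max99], and the name `smooth_normal` is hypothetical.

```python
import math


def smooth_normal(normals, weights, node, neighbors):
    """Weighted average of the normals of `node` and its neighbors.

    normals: dict node -> (nx, ny, nz) unit normal
    weights: dict node -> normalized weight in [0, 1]
    """
    acc = [weights[node] * c for c in normals[node]]
    for n in neighbors:
        acc = [a + weights[n] * c for a, c in zip(acc, normals[n])]
    length = math.sqrt(sum(a * a for a in acc))
    if length == 0.0:
        return normals[node]  # degenerate case: keep the original normal
    return tuple(a / length for a in acc)


normals = {0: (0.0, 0.0, 1.0), 1: (1.0, 0.0, 0.0)}
weights = {0: 1.0, 1: 1.0}
print(smooth_normal(normals, weights, 0, [1]))
```

With equal weights the result points halfway between the two input normals; lowering the weight of a neighbor reduces its influence accordingly.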

Since the GDCC is built from the points of the dense cloud, each point of the cloud has a one-to-one relationship with a node of the GDCC.

We can then run a hole-closing algorithm [BNK02], obtaining, fully automatically, a smoothed version with fewer holes, as shown in Figure 8.

Figure 8: Post-processing to correct the normals on flat surfaces: a) Ball-Pivoting; b) the normal modification proposed in Section 4.3 applied to Ball-Pivoting.

The next section analyzes and discusses how the images affect the dense cloud, since Ball-Pivoting behaves according to the density of the points obtained.

5. Results

First of all, the algorithms depend entirely on the photographs and on how they were taken, so we followed the procedure described in Section 3 and took the photographs at a distance of 3±0.5 meters.

We used 240 photographs for both models. We also verified experimentally that lens distortion does not affect the algorithms. This was predictable, since all of these algorithms correct the image distortion; for this reason the reconstruction mixes photographs from different cameras without affecting the result.

The reconstruction algorithms are extremely sensitive to resolution. Figure 9 compares reconstructions from the same photographs at different resolutions. It also includes a chart (orange) with the number of vertices obtained by the sparse and dense reconstructions, as well as the number of vertices whose normals were corrected and to which camera information was added thanks to the GDCC. The green chart shows the maximum and the mean number of connections of the GDCC rebuilt for each resolution. It can be seen that the higher the resolution, the less connected the graph is, which means that the reconstruction is more precise.
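The two connectivity figures plotted in the green chart (maximum and mean number of connections per node) can be computed from an adjacency structure as in this small sketch. The dictionary-of-sets representation and the name `degree_stats` are assumptions for illustration only.

```python
def degree_stats(adjacency):
    """Return (maximum, mean) number of neighbors over all nodes.

    adjacency: dict node -> set of neighboring nodes (undirected graph,
    so every link appears in the sets of both of its endpoints).
    """
    degrees = [len(linked) for linked in adjacency.values()]
    return max(degrees), sum(degrees) / len(degrees)


graph = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
print(degree_stats(graph))
```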

This agrees with the measurements taken on the buildings. Figure 10 shows measurements taken at characteristic points, together with the chart of the differences found at the vertices of the dense cloud. At the maximum resolution the mean error is ±0.5 cm, the maximum error is 3 cm and the minimum is 0.03 cm.

It is very important to stress that lossy compression (such as that of the JPEG or MPEG standards) has a very negative effect on the result, as Figure 11 shows.

Finally, Figure 12 shows that the Ball-Pivoting algorithm can close quite a few holes, although high values of its distance parameter are inadvisable, to avoid errors when closing the mesh. The algorithm nevertheless clearly depends entirely on the dense mesh obtained, so it is essential to start from good photograph-based reconstruction algorithms.

6. Conclusions

In conclusion, we have shown that scanning with a photographic camera mounted on unmanned aerial vehicles is feasible, reaching resolutions close to one centimeter at a distance of 3 meters without zoom. The technology has also proved cheap compared with the standard costs of a scanner, especially in areas that are difficult to access, where the scan takes no more than half an hour and all the processing is done afterwards at a remote station. In addition, we have given a guide on how to carry out the scan, together with the algorithms to use and the parameters to apply for an optimal reconstruction. Finally, we have designed an algorithm that smooths the meshes and closes small holes to give the reconstructed mesh a cleaner appearance.

[Figure 9 charts. Panels: "Puerta del monasterio" (monastery door) and "Laboratorio de realidad virtual" (virtual reality laboratory). Horizontal axis: image resolution (432x288, 864x576, 1728x1152, 3456x2304). Series: neighbors (maximum), neighbors (mean); vertices (sparse), vertices (dense), vertices (GDCC), in units of 10^6.]

Figure 9: Top: dense point-cloud reconstructions of the two buildings from the same images at different resolutions (indicated above each model). Bottom: charts comparing the number of vertices obtained in the different clouds (orange) and the connectivity of the generated graph (green).

References

[BMR∗99] BERNARDINI F., MITTLEMAN J., RUSHMEIER H., SILVA C., TAUBIN G.: The ball-pivoting algorithm for surface reconstruction. Visualization and Computer Graphics, IEEE Transactions on 5, 4 (1999), 349–359. doi:10.1109/2945.817351. 5

[BNK02] BORODIN P., NOVOTNI M., KLEIN R.: Progressive gap closing for mesh repairing. In Advances in Modelling, Animation and Rendering, Vince J., Earnshaw R., (Eds.). Springer Verlag, July 2002, pp. 201–213. 7

[BR02] BERNARDINI F., RUSHMEIER H. E.: The 3d model acquisition pipeline. Comput. Graph. Forum 21, 2 (2002), 149–172. 2

[CL96] CURLESS B., LEVOY M.: A volumetric method for building complex models from range images. In Proceedings of the 23rd annual conference on Computer graphics and interactive techniques (New York, NY, USA, 1996), SIGGRAPH ’96, ACM, pp. 303–312. URL: http://doi.acm.org/10.1145/237170.237269, doi:10.1145/237170.237269. 5

[CSC∗10] CUI Y., SCHUON S., CHAN D., THRUN S., THEOBALT C.: 3d shape scanning with a time-of-flight camera. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on (2010), pp. 1173–1180. doi:10.1109/CVPR.2010.5540082. 2

[DSTT00] DELLAERT F., SEITZ S. M., THORPE C. E., THRUN S.: Structure from motion without correspondence. In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) (2000), pp. 557–564. 2

[Figure 10 chart: error at measurement points a) to h) for resolutions 432x288, 864x576, 1728x1152 and 3456x2304; vertical axis from −2 to 2; annotated value ±0.51 cm.]

Figure 10: Real measurements taken at different areas of the laboratory façade, compared with the point clouds at different resolutions. The chart shows that at maximum resolution the error is ±0.51 cm, which indicates that photographs taken at a distance of 3±0.5 m with a resolution of 3456x2304 give a mean error of about one centimeter.

Figure 11: Dense point-cloud reconstructions of the two buildings from the same images at different compression ratios (indicated above each model).

[Eis06] EISENBEISS H.: Applications of photogrammetric processing using an autonomous model helicopter. In International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences (2006), vol. 36. 1

[FP10] FURUKAWA Y., PONCE J.: Accurate, dense, and robust multiview stereopsis. IEEE Transactions on Pattern Analysis and Machine Intelligence 32, 8 (2010), 1362–1376. doi:http://doi.ieeecomputersociety.org/10.1109/TPAMI.2009.161. 5

[GKS00] GOPI M., KRISHNAN S., SILVA C.: Surface reconstruction based on lower dimensional localized delaunay triangulation. Computer Graphics Forum 19, 3 (2000), 467–478. URL: http://dx.doi.org/10.1111/1467-8659.00439, doi:10.1111/1467-8659.00439. 5

[Jai88] JAIN A. K.: Fundamentals of Digital Image Processing. Prentice Hall, october 1988. 4, 6

[KBH06a] KAZHDAN M., BOLITHO M., HOPPE H.: Poisson surface reconstruction. In Proceedings of the fourth Eurographics symposium on Geometry processing (Aire-la-Ville, Switzerland, 2006), SGP ’06, Eurographics Association, pp. 61–70. URL: http://dl.acm.org/citation.cfm?id=1281957.1281965. 5

[KBH06b] KAZHDAN M., BOLITHO M., HOPPE H.: Poisson surface reconstruction. In Proceedings of the fourth Eurographics symposium on Geometry processing (Aire-la-Ville, Switzerland, 2006), SGP ’06, Eurographics Association, pp. 61–70. URL: http://dl.acm.org/citation.cfm?id=1281957.1281965. 5

[LCT∗12] LIU X., CHEN P., TONG X., LIU S., LIU S., HONG Z., LI L., LUAN K.: Uav-based low-altitude aerial photogrammetric application in mine areas measurement. In Earth Observation and Remote Sensing Applications (EORSA), 2012 Second International Workshop (june 2012), pp. 240–242. doi:10.1109/EORSA.2012.6261173. 1

Figure 12: Results of varying the distance parameter (d) of the Ball-Pivoting algorithm.

[LES∗07] LAMBERS K., EISENBEISS H., SAUERBIER M., KUPFERSCHMIDT D., GAISECKER T., SOTOODEH S., HANUSCH T.: Combining photogrammetry and laser scanning for the recording and modelling of the late intermediate period site of pinchango alto. In Journal of archaeological science (Palpa, Peru, 2007). 2

[Low04a] LOWE D. G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision 60, 2 (Nov. 2004), 91–110. URL: http://dx.doi.org/10.1023/B:VISI.0000029664.99615.94, doi:10.1023/B:VISI.0000029664.99615.94. 4

[Low04b] LOWE D. G.: Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60, 2 (2004), 91–110. 7

[LPC∗00] LEVOY M., PULLI K., CURLESS B., RUSINKIEWICZ S., KOLLER D., PEREIRA L., GINZTON M., ANDERSON S., DAVIS J., GINSBERG J., SHADE J., FULK D.: The digital michelangelo project: 3d scanning of large statues. In Proceedings of the 27th annual conference on Computer graphics and interactive techniques (New York, NY, USA, 2000), SIGGRAPH ’00, ACM Press/Addison-Wesley Publishing Co., pp. 131–144. URL: http://dx.doi.org/10.1145/344779.344849, doi:10.1145/344779.344849. 2

[Max99] MAX N.: Weights for computing vertex normals from facet normals. J. Graph. Tools 4, 2 (Mar. 1999), 1–6. URL: http://dx.doi.org/10.1080/10867651.1999.10487501, doi:10.1080/10867651.1999.10487501. 7

[MGB∗11] MARTON F., GOBBETTI E., BETTIO F., GUITIAN J., PINTUS R.: A real-time coarse-to-fine multiview capture system for all-in-focus rendering on a light-field display. In 3DTV Conference: The True Vision - Capture, Transmission and Display of 3D Video (3DTV-CON), 2011 (2011), pp. 1–4. doi:10.1109/3DTV.2011.5877176. 2

[MMM12] MOULON P., MONASSE P., MARLET R.: Adaptive structure from motion with a contrario model estimation. In Computer Vision – ACCV 2012 (2012), pp. 257–270. 2

[ND10] NEWCOMBE R., DAVISON A.: Live dense reconstruction with a single moving camera. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference (june 2010), pp. 1498–1505. doi:10.1109/CVPR.2010.5539794. 1

[PGC11] PINTUS R., GOBBETTI E., CALLIERI M.: Fast low-memory seamless photo blending on massive point clouds using a streaming framework. In ACM Journal on Computing and Cultural Heritage (2011), vol. 4. 2

[Qua13] QUADROCOPTER: Customization of a microcopter: Cinestar 8. http://www.quadrocopter.com/Custom-CineStar-8-Ready-to-Fly_p_627.html, april 2013. 1, 2

[Rac13] RACKSPACE: Rackspace cloud computing services. http://www.rackspace.com/cloud/servers/pricing_b/, april 2013. 5

[RCM∗01] ROCCHINI C., CIGNONI P., MONTANI C., PINGI P., SCOPIGNO R.: A low cost 3d scanner based on structured light. In EUROGRAPHICS 2001 (2001). 2

[rS12] 33RD SQUARE: Autodesk creates 3d scan of its headquarters using robot microcopter. http://www.33rdsquare.com/2012/04/autodesk-creates-3d-scan-of-its.html, april 2012. 1

[SmFPG06] SINHA S. N., MICHAEL FRAHM J., POLLEFEYS M., GENC Y.: GPU-based Video Feature Tracking and Matching. Tech. rep., In Workshop on Edge Computing Using New Commodity Architectures, 2006. 4

[SSS06] SNAVELY N., SEITZ S. M., SZELISKI R.: Photo tourism: exploring photo collections in 3d. ACM Trans. Graph. 25, 3 (July 2006), 835–846. URL: http://doi.acm.org/10.1145/1141911.1141964, doi:10.1145/1141911.1141964. 4

[VGS07] VEDALDI A., GUIDI G., SOATTO S.: Moving forward in structure from motion. In Computer Vision and Pattern Recognition, 2007. CVPR ’07. IEEE Conference on (2007), pp. 1–7. doi:10.1109/CVPR.2007.383117. 4

[Wan11] WANG Y.-F.: A Comparison Study of Five 3D Modeling Systems Based on the SfM Principles. Technical Report, TR 2011-01. Tech. rep., Visualsize Inc., 2011. 4



Optimized generation of stereoscopic CGI films by 3D image warping

José M. Noguera1, Antonio J. Rueda1, Miguel A. Espada2, Máximo Martín2

1Grupo de Gráficos y Geomática, Universidad de Jaén. Campus Las Lagunillas, Edificio A3. 23071 Jaén, Spain. jnoguera, [email protected]

2Kandor Graphics SL. BIC Granada. PTS, Avda de la Innovación, 1. 18100 Armilla (Granada), Spain. ridli, [email protected]

Abstract

The generation of a stereoscopic animation film requires doubling the rendering times and hence the cost. In this paper we address this problem and propose an automatic system for generating a stereo pair from a given image and its depth map. Although several solutions exist in the literature, the high standards of image quality required in the context of a professional animation studio forced us to develop specially crafted algorithms that avoid artifacts caused by occlusions, anti-aliasing filters, etc. This paper describes all the algorithms involved in our system and provides their GPU implementation. The proposed system has been tested with real-life working scenarios. Our experiments show that the second view of the stereoscopic pair can be computed with as little as 20% of the effort of the original image while guaranteeing a similar quality.

Categories and Subject Descriptors (according to ACM CCS): I.3.3 [Computer Graphics]: Picture/Image Generation—Display algorithms

1. Introduction

The year 2010 marked the definitive take-off of 3D cinema. Most commercial cinemas are equipped with this technology, and a blockbuster that does not support this format is now inconceivable.

In this paper we focus on the production of stereoscopic content for computer-generated animation films [SKK∗11]. In this context, a specific configuration of parallel stereoscopic cameras is required, as described in [WDK93]. This configuration captures two views of the same scene. When each view is delivered separately to one of the viewer's eyes, the brain is able to form the illusion of a 3D image.

Although the fundamentals of stereoscopic animation film production are already well known, generating animations of cinematic quality from 3D models requires very long computation times, which translate into high production costs. Each frame usually takes several hours to process, a time that must be doubled for a stereoscopic film.

In this work we have developed techniques that reduce this overhead. To that end, we have developed a technique based on the paradigm known as “Depth Image Based Rendering” (DIBR) [Feh04], which lets us reuse the image of one eye to generate the image of the other without running the full rendering process.

Although techniques of this kind had been used before, they had never been applied to the generation of stereoscopic images for animation films. In this work we solve the problems inherent in these techniques and adapt them to the high quality requirements of the film context. Finally, we evaluate the results obtained with two real production scenes.

This paper thus presents a system that extends and combines existing algorithms to achieve the robustness and versatility needed to operate in high-quality film production environments. The novelties are:

• An efficient and simple GPU-based method for inferring the image of the second camera from the image of the first camera and its depth map.





J.M. Noguera, A.J. Rueda, M.A. Espada & M. Martín / Optimized generation of stereoscopic CGI films by 3D image warping

• A novel technique for determining a mask with the holes present in the inferred image.
• Methods to guarantee the high quality demanded in the film environment, covering problems caused by overlaps, anti-aliasing filtering, etc.

The rest of the paper is organized as follows. Section 2 describes the state of the art and puts our work in context. The proposed method consists of four steps, which are described in Section 3. Section 4 then presents the empirical evaluation of the method. Section 5 concludes the paper and outlines future lines of work.

2. Previous work

Image-based modeling and rendering techniques have received a great deal of attention in the literature as an alternative to traditional geometry-based techniques [SK00]. Depending on how much geometric information is used to generate an image, image-based rendering methods can be classified into [CSN07]:

• Methods that use only images, with no explicit geometry. These methods determine the geometry implicitly through the correspondences between a small set of images, which makes it possible to generate new views; see [LH96, GGSC96]. Methods of this kind are used to generate new views of an object from a set of photographs.
• Methods in which the geometric information is explicitly available, either as the depth of each pixel or as explicit 3D coordinates. Depending on the geometric properties that are known, there are several techniques of this kind [SKK∗11], for example the already mentioned DIBR [Feh04], layered depth images (LDI) [SGHS98], and intermediate view reconstruction (IVR) [ZKU∗04].

In the particular case of animation films we are interested in the second kind of technique, because we only have the synthesized image for one of the two views and our goal is to generate the second view. Moreover, the depth map associated with the pixels of the image (the distance of each pixel from the observer) is easy to obtain, since it is widely used in several stages of post-production.

Among the methods listed in this second category, DIBR methods can be used to synthesize images from nearby viewpoints by means of the depth map; see [MMB97, McM97]. Conceptually, this process consists of two steps. First, the pixels of the reference image are “unprojected” to their original 3D location, using their respective depth values. Second, these points in 3D space are projected according to the new view. This concatenation of a 2D-to-3D projection and the subsequent 3D-to-2D projection is usually called 3D image warping. The main problem of these techniques is that holes appear in the new derived image, because parts of the scene that are hidden from the original view are visible from the new one.

Several solutions to avoid these holes exist in the literature. Most authors, such as [MMB97, PSM04], use multiple reference images that must be blended to obtain the derived image. For animation films this solution is impractical, since only one reference image is available. Vázquez et al. [VTS06] present a study of several hole-filling techniques. Although these techniques can hide the holes to some extent (for example, by linearly interpolating the color of the borders), they cannot generate information that was not present in the original image. Their chances of success therefore depend on the specific scene and on the size of the hole, and the results are hardly acceptable for a film production.

3. Description of the method

In this section we describe our method for synthesizing stereoscopic images without having to render explicitly the images of both stereoscopic cameras. Our solution also avoids the visual artifacts introduced by overlap problems and by anti-aliasing filtering.

First, the image of one of the two cameras (either the right or the left one) must be rendered in the conventional way with a professional render engine. Starting from this image, our method consists of the following steps:

1. The image of the second camera is derived from the first by a depth-based 3D image warping technique.
2. The new image contains holes and overlap errors [CW93]. These are detected and covered with a mask.
3. The mask is enlarged to avoid jagged edges at the borders of objects.
4. The render engine is used again to fill the parts of the image covered by the mask, and the final complete image is obtained by compositing.

From now on, we call the conventionally generated image the reference image, and the image inferred by our method the derived image.

The following sections explain these steps in more detail and describe how they can be implemented efficiently on the GPU. Figure 1 illustrates the stages of this implementation.

[Figure 1 flowchart. Pass 1. CPU: create the triangle strip; clear the framebuffer with color (0,0,0,0). Vertex processor: unproject the vertices according to camera A; translate the vertices according to the depth map; project the vertices according to camera B. Geometry processor: orthogonal triangle? If so, mark the triangle. Fragment processor: fragment of a marked triangle? If so, color it (0,0,0,0); otherwise color it with the color of the reference image. Framebuffer: layer 1, derived image 1 and mask 1; layer 2, depth map 1. Pass 2. Vertex processor: bypass. Fragment processor: fragment on a border? If so, color it (0,0,0,0); otherwise color it with the color of derived image 1. Framebuffer: layer 1, derived image 2 and mask 2; layer 2, depth map 2.]

Figure 1: Stages of the GPU implementation of the method.

3.1. Step 1: 3D image warping

We start from a pair of cameras in a parallel stereoscopic configuration [WDK93]. The two cameras, which we call A and B, share the same intrinsic parameters and the same viewing direction, but are placed along the horizontal line that contains their respective focal points. This camera arrangement guarantees that the images obtained through each of them show no vertical parallax.

Consider an observed and sampled reference image that represents the projection onto a 2D plane of a 3D scene through the projection defined by camera A. This image is made up of a regular grid of pixels. Let Z be a depth map defined on the same grid, containing, for each pixel, the distance from the focal point of camera A to the point of the 3D scene projected onto that pixel.

The goal of the 3D image warping method is to obtain a new derived image equal to the one that would be obtained by projecting the scene through camera B.

Figure 2: The lines show the frusta of the stereoscopic cameras. (a) Triangle mesh placed in the frustum of camera A. (b) The same mesh after applying the translation given by the depth map. (c) In black, the triangles that make up the mask.

    //output to the geometry processor
    out VertexData {
        vec2 texCoord;
        float worldDepth;
    } VOut;

    //matrices of camera A
    uniform mat4 Mp;
    uniform mat4 Mv;
    //position of camera A
    uniform vec3 O;
    //matrices of camera B (M′p and M′v in the text)
    uniform mat4 Mp_B;
    uniform mat4 Mv_B;
    //depth map
    uniform sampler2D Z;

    //vertex in screen coordinates
    in vec4 Vp;
    //texture coordinates
    in vec2 texCoord;

    void main() {
        float depth = texture2D(Z, texCoord).r;
        //unproject the vertex to world coordinates
        mat4 inverseM = inverse(Mp * Mv);
        vec4 Vm = inverseM * Vp;
        Vm = Vm/Vm.w;
        //translate along the ray from camera A (V′m in the text)
        vec3 Vm_B = O + depth * normalize(Vm.xyz-O);
        //project according to camera B
        gl_Position = Mp_B * Mv_B * vec4(Vm_B, 1.0);
        VOut.texCoord = texCoord;
        VOut.worldDepth = Vm_B.z;
    }

Listing 1: GLSL vertex program of the first pass of the method.

The proposed method can be implemented efficiently on the vertex processor of the GPU (see Figure 1). Listing 1 shows the GLSL code of this implementation. First, a flat triangle mesh is generated with its texture coordinates. The dimension of this mesh in vertices matches the resolution of the reference image in pixels. The mesh is placed in space so that it is parallel to the projection plane of camera A and completely fills its viewing frustum (see Figure 2a). To achieve this, in our solution the vertices are generated in the screen coordinate system, regularly distributed between (-1, -1) and (1, 1). The z component is irrelevant as long as it lies in the interval (0, 1), that is, between the near and far clipping planes of the camera.

Each vertex $V_p$ of the mesh is received by the vertex processor (see Listing 1) and “unprojected” to obtain its equivalent $V_m$ in world coordinates. To do so, the vertex is multiplied by the inverse of the product of the projection matrix $M_p$ and the view matrix $M_v$. Both matrices belong to camera A:

$$V_m = (M_p M_v)^{-1} V_p$$

Next, in a similar way to [MMB97], the vertices of this mesh are displaced, giving a continuous surface that approximates the 3D scene. The displacement translates each vertex $V_m$ along the line joining that vertex (in world coordinates) to the center of projection of camera A, up to a new position $V'_m$, as illustrated in Figure 2b. After this translation, the distance between each vertex and the camera equals the value given by the depth map Z for the corresponding pixel.

As a result of this deformation, some triangles of the mesh are stretched. The continuity of the mesh is nevertheless preserved, and the stretched triangles are colored by linear interpolation between the colors of their vertices. This avoids the holes that would appear if the mesh were drawn with points instead of triangles as the primitive.

The translated vertex must then be projected again using the matrices that define camera B, $M'_p$ and $M'_v$:

$$V'_p = M'_p M'_v V'_m$$

To finish this stage, the vertex is emitted to the next stage of the graphics pipeline.
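The unproject, translate and reproject sequence of this step can be checked on the CPU with a simplified pinhole model, as in the sketch below. It is not the matrix-based GLSL path of the paper: camera A is assumed to sit at the origin looking down the -z axis, camera B is the same camera shifted along x, and the names `warp_pixel`, `focal` and `baseline` (with its default value) are hypothetical.

```python
import math


def warp_pixel(x, y, depth, focal=1.0, baseline=0.065):
    """Map image-plane coordinates (x, y) of camera A to camera B.

    Camera A sits at the origin and looks down the -z axis; camera B is the
    same camera shifted by `baseline` along the x axis. `depth` is the
    distance from the focal point of A to the scene point.
    """
    # 2D -> 3D: move along the ray through the pixel until the given depth
    norm = math.sqrt(x * x + y * y + focal * focal)
    px, py, pz = (depth * x / norm, depth * y / norm, -depth * focal / norm)
    # 3D -> 2D: perspective projection as seen from camera B
    return (focal * (px - baseline) / -pz, focal * py / -pz)


for depth in (2.0, 4.0, 8.0):
    print(depth, warp_pixel(0.3, 0.2, depth))
```

In this model the vertical coordinate is unchanged, which matches the absence of vertical parallax in the parallel configuration, and the horizontal shift shrinks as the depth grows.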

3.2. Step 2: Mask determination

The main problem of the 3D image warping technique is that visual anomalies may appear in the derived image. This section details how we deal with them. Our general solution consists of determining which pixels of the derived image must be discarded and building a mask that contains them. The mask can be computed efficiently on the GPU through the combined use of the geometry and vertex processors (see Figure 1). Listings 2 and 3 show our GLSL implementation.

The mask is represented by an image with the same dimensions as the derived image but storing one bit per pixel. Each bit indicates whether the corresponding pixel of the derived image belongs to the mask. In our GPU implementation, if $I$ is the derived image stored in the RGBA framebuffer at the end of the graphics pipeline and $P_{i,j}$ is one of its pixels, then the mask $M$ is given by:

$$M = \{ P_{i,j} \in I : P_{i,j} = (0,0,0,0) \}$$

    //lambda, in OpenGL world units
    uniform float lambda;

    layout(triangles) in;
    layout(triangle_strip, max_vertices=3) out;

    //input from the vertex processor
    in VertexData
    {
        vec2 texCoord;
        float worldDepth;
    } VIn[];

    //output to the fragment processor
    out VertexData
    {
        vec2 texCoord;
        float worldDepth;
        float rubber_sheet;
    } VOut;

    //definitions to simplify the text
    #define v0 gl_in[0].gl_Position
    #define v1 gl_in[1].gl_Position
    #define v2 gl_in[2].gl_Position

    void main()
    {
        //depth range over the three edges of the triangle
        float depthRange = max(
            max( abs(v0.z-v1.z), abs(v0.z-v2.z) ),
            abs(v1.z-v2.z) );

        //1.0 flags an elongated ("rubber sheet") triangle
        float rubberSheet = ( depthRange >= lambda ) ? 1.0 : 0.0;

        for(int i = 0; i < gl_in.length(); i++)
        {
            gl_Position = gl_in[i].gl_Position;
            VOut.texCoord = VIn[i].texCoord;
            VOut.worldDepth = VIn[i].worldDepth;
            //outputs are undefined after EmitVertex(), so set the flag per vertex
            VOut.rubber_sheet = rubberSheet;
            EmitVertex();
        }
        EndPrimitive();
    }

Listing 2: GLSL geometry program of the first pass of the method.

The mask is needed because the 3D image warping technique usually introduces anomalies at the boundaries between objects in the scene, for example between the edge of a foreground object and the background. Since the mesh is a continuous surface, artificial surfaces are generated that do not actually exist in the original 3D scene. These surfaces are formed by very elongated mesh triangles, known in the literature as “rubber sheet” triangles [MMB97]. They can be clearly seen in Figure 2b.

    //input from the geometry processor
    in VertexData
    {
        vec2 texCoord;
        float worldDepth;
        float rubber_sheet;
    } VIn;

    //output: derived image and mask
    layout(location = 0) out vec4 colorOut;
    //output: new depth map
    layout(location = 1) out float depthOut;

    uniform sampler2D RGB;

    void main()
    {
        if( VIn.rubber_sheet == 1.0 )
        {
            //elongated triangle: the pixel belongs to the mask
            colorOut = vec4(0,0,0,0);
            depthOut = 0.0;
        }
        else
        {
            colorOut = vec4(
                texture(RGB, VIn.texCoord).rgb,
                1.0);
            depthOut = -VIn.worldDepth;
        }
    }

Listing 3: GLSL fragment program of the first pass of the method.

To solve this problem, we must be able to distinguish between surfaces that really exist in the scene and the artificial surfaces formed by the elongated triangles. In this work we use an efficient technique, implemented on the GPU, based on the orthogonality test proposed by Pajarola et al. [PSM04].

It relies on a property shared by all elongated triangles: the triangle normal is almost perpendicular to the vector from the camera position to the center of the triangle. Consequently, elongated triangles are almost parallel to the line of sight, so their depth range Δz in world coordinates is very large compared with that of the remaining triangles. In our implementation, we use the geometry processor to detect elongated triangles simply by checking whether Δz > λ holds for each triangle and a given threshold λ (see Listing 2).
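The depth-range test can be sketched in a few lines of Python. This is only a CPU illustration of the criterion Δz > λ, not the GPU code of the paper; the function name `is_rubber_sheet` and the sample depths are hypothetical.

```python
def is_rubber_sheet(vertex_depths, lam):
    """Return True when the depth range of a triangle exceeds the threshold.

    vertex_depths: world-space depths of the three vertices.
    lam: threshold lambda, in the same units as the depths.
    """
    depth_range = max(vertex_depths) - min(vertex_depths)
    return depth_range > lam

# Hypothetical triangles: one lying on a real surface, one bridging
# a foreground object and the background.
triangles = {
    "surface": (10.0, 10.2, 10.1),
    "stretched": (2.0, 2.1, 9.5),
}
flags = {name: is_rubber_sheet(depths, lam=0.5)
         for name, depths in triangles.items()}
```

Taking the difference between the largest and the smallest depth covers all three edges of the triangle at once.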

Once a triangle has been detected as elongated, an immediate solution could be to remove it from the mesh. This can be implemented efficiently by discarding the triangle in the geometry processor. Unfortunately, discarding a triangle breaks the continuity of the mesh, so holes appear in its surface. Other parts of the mesh can be seen through these holes, which may cause overlap errors [CW93]. This phenomenon occurs when the object seen through the hole should actually be hidden behind another object that was not visible in the reference image. Therefore, overlaps must be treated as if they were holes and added to the mask. For example, Figure 3a shows an overlap: through one of the holes an object (in blue) is seen where the body of the model should be.

Figure 3: The elongated triangles themselves make it possible to add unwanted regions to the mask, avoiding overlap anomalies.

We propose to take advantage of these unwanted elongated triangles to generate the mask in a simple and direct way. Instead of being deleted, all triangles are emitted to the next stage of the graphics pipeline. The elongated ones, however, are flagged beforehand so that the fragment processor (see Listing 3) paints them with color (0,0,0,0), indicating that they belong to the mask. In Figure 2c all elongated triangles have been colored black. Projecting this mesh from camera B yields an image like the one in Figure 3b. Here we can see how the elongated triangles themselves mark the region of the image that must be redrawn in step 4, avoiding the appearance of holes and overlaps.

This criterion may flag for removal some triangles that belong to legitimate surfaces that are almost parallel to the line of sight. Given the high quality requirements of film productions, it is preferable to discard too many surfaces (which can later be regenerated) than to allow errors to appear in the final scene. Moreover, a legitimate surface that is almost parallel to the line of sight will be extremely undersampled, so discarding it and generating it again is not a problem.

It is important to note that the threshold λ depends on the scene, and the final result of the process depends to a large extent on its value. Its value must therefore be chosen carefully.

    //input from the vertex processor
    in VertexData
    {
        vec2 texCoord;
        vec4 color;
    } VIn;

    layout(location = 0) out vec4 colorOut;
    layout(location = 1) out vec4 maskOut;

    uniform sampler2D RGB;
    uniform sampler2D Z;

    //lambda, in OpenGL world units
    uniform float lambda;
    //size in pixels of the silhouette to detect
    uniform int B;
    //inverse of the image size in pixels
    uniform vec2 invImagen;

    //true if any neighbor within B pixels differs in depth by more than lambda
    bool descartar( float actualZ )
    {
        for( int i=-B; i<=B; i++ )
            for( int j=-B; j<=B; j++ )
            {
                vec2 texC = vec2(
                    VIn.texCoord.x + invImagen.x*float(i),
                    VIn.texCoord.y + invImagen.y*float(j) );
                float vecinoZ = texture(Z, texC).r;
                if( abs(vecinoZ - actualZ) > lambda )
                    return true;
            }
        return false;
    }

    void main()
    {
        colorOut = texture(RGB, VIn.texCoord);
        //pixels outside the silhouette are not part of the mask
        maskOut = vec4(0,0,0,0);
        float z = texture(Z, VIn.texCoord).r;
        if( descartar(z) )
        {
            maskOut = vec4(1,1,1,1);
            colorOut = vec4(0,0,0,0);
        }
    }

Listing 4: GLSL fragment program of the second pass of the method.

3.3. Step 3: Handling anti-aliasing

In computer graphics, aliasing is a problem that arises when images with high frequencies are represented in an image of lower resolution. For this reason, anti-aliasing filters are commonly used to blend the pixels at the boundary between objects. This reduces the unpleasant staircase effect on lines and produces smoother contours.

Figure 4: (a) The boundaries between objects can produce jagged edges. (b) To avoid them, they are added to the mask.

As Chen and Williams [CW93] point out, the reference image should not undergo anti-aliasing filtering, because the color blend at the boundary between two objects is view dependent. Worse still, the 3D image warping method requires a single depth value per pixel, and the depth value of a filtered pixel is ambiguous because it belongs to more than one object.

Unfortunately, in the context of animated films it is not possible to generate the reference scenes without an anti-aliasing filter, because their quality would not be acceptable for cinematographic purposes. Our method must therefore be able to handle the color ambiguity of the boundary pixels produced by this filtering.

In our proposal we use a reference image with anti-aliasing filtering, but the associated depth map Z is generated without it. This solves the problem of the depth ambiguity of boundary pixels without reducing the quality of the reference image. Unfortunately, this does not completely eliminate the problem. The derived image may also include new aliasing errors that are not present in the reference image. They result from the triangle mesh overlapping itself after being displaced in step 1 (see Section 3.1). For example, in Figure 4a the face of the model has moved to the observer's right, overlapping the blue background. A pronounced jagged-edge effect is evident along this new boundary.

Our proposal for removing jagged edges at the boundaries consists of performing a second pass on the GPU (see Figure 1) that takes as input the derived image and the mask generated in steps 1 and 2. This additional pass applies a filter that detects the silhouettes between objects based on the depth map and adds them to the mask. Listing 4 shows the GLSL code that implements this proposal.

In more detail, for each pixel of the image the filter computes the maximum depth increment Δz between the pixel and its neighborhood. All pixels for which Δz > λ holds, with the threshold λ already defined in Section 3.2, are added to the mask. Figure 4b illustrates the result of applying this filter.
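The silhouette filter can be mimicked on the CPU with the following Python sketch. It operates on a depth map stored as a list of rows and clamps neighbor coordinates to the image edge; the clamping policy, the function name `silhouette_mask` and the sample depth map are assumptions made for illustration.

```python
def silhouette_mask(depth, border, lam):
    """Mark every pixel whose depth differs by more than `lam` from any
    neighbor located at most `border` pixels away."""
    rows, cols = len(depth), len(depth[0])
    mask = [[False] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            for dy in range(-border, border + 1):
                for dx in range(-border, border + 1):
                    # Clamp neighbor coordinates to the image (assumption).
                    ny = min(max(y + dy, 0), rows - 1)
                    nx = min(max(x + dx, 0), cols - 1)
                    if abs(depth[ny][nx] - depth[y][x]) > lam:
                        mask[y][x] = True
    return mask

# Hypothetical depth map: a near object (depth 1) next to a far one (depth 9).
depth = [
    [1.0, 1.0, 9.0, 9.0],
    [1.0, 1.0, 9.0, 9.0],
]
mask = silhouette_mask(depth, border=1, lam=0.5)
```

With a border of one pixel, only the two columns adjacent to the depth discontinuity are marked; a larger border widens the marked band on both sides.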

Admittedly, many of these boundaries do not exhibit aliasing problems and will be added to the mask anyway. However, a slight increase in processing time in step 4 is preferable to generating images whose quality is not acceptable for cinematographic use.

The size of the neighborhood to consider depends on the size of the scene in pixels. The larger the neighborhood, the thicker the silhouette. A thicker silhouette reduces the chances of jagged edges surviving this stage and appearing in the final image. In exchange, the cost of filling the holes in step 4 will be higher.

3.4. Step 4: Final composition

As a result of steps 1-3, we obtain a derived image that contains a partial view of the scene visible from camera B. Part of this image is covered by a mask.

In this step the professional rendering software is used again to draw the scene from camera B, but only the pixels included in the mask are drawn. Note that the mask typically covers a small fraction of the area of the derived image, so the cost of this rendering is much lower than that of generating the complete image. The result is a new image with the content of the mask.

Finally, this new image is used to complete the derived image, yielding the final scene visible from camera B. The reference and derived images can then be combined to obtain the desired stereoscopic view.
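The composition can be sketched as a per-pixel selection. The Python example below assumes that images are lists of rows of RGBA tuples and that a masked pixel is one whose value is (0,0,0,0), as defined in Section 3.2; the function name `compose` and the pixel values are hypothetical.

```python
MASK_COLOR = (0, 0, 0, 0)  # RGBA value that marks a masked pixel

def compose(derived, rerendered):
    """Replace every masked pixel of `derived` with the pixel that the
    renderer produced for the same position."""
    return [
        [new if old == MASK_COLOR else old
         for old, new in zip(derived_row, new_row)]
        for derived_row, new_row in zip(derived, rerendered)
    ]

derived = [[(255, 0, 0, 255), MASK_COLOR],
           [MASK_COLOR, (0, 255, 0, 255)]]
# The renderer only fills masked positions; None marks pixels it skipped.
rerendered = [[None, (0, 0, 255, 255)],
              [(9, 9, 9, 255), None]]
final = compose(derived, rerendered)
```

Pixels outside the mask are taken unchanged from the warped image, so the renderer never needs to produce them.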

4. Results

In this section we describe the experimental results obtained with our method. As stated above, the motivation of our work is to obtain an efficient method for generating stereoscopic images for animated films without having to render both images completely. Our goal is therefore twofold.

1. On the one hand, to ensure that the quality of the automatically generated image is comparable to the one that would be obtained with the conventional method.

2. On the other hand, to reduce the long computation times required to render the stereoscopic image with the conventional method.

To show that our technique achieves both goals, we have applied it to 3D scenes typical of production work. We also evaluate the quality of the resulting images through a quantitative study. Figure 5 shows the two scenes used in our experiments.

All experiments were run on a PC with an Intel Core2 Quad processor at 2.40 GHz, equipped with a GeForce 8800GT GPU and 4 GB of RAM. The rendering software used to generate the images was Arnold 4.0.10.2 for 64-bit Windows.

4.1. Performance

Table 1 shows the results obtained with our method when inferring the image for the left eye (derived image) from the image for the right eye and its depth map (reference image). From left to right, the columns list the name of the scene, the image resolution, the parameter λ (defined in Section 3.2), the border size in pixels used to avoid aliasing (as described in Section 3.3), the percentage of image pixels included in the mask (which must therefore be rendered by Arnold), the time in seconds taken by Arnold to render the complete reference image, and the total time taken by our method to generate the complete derived image.

The total time of our method reported in Table 1 includes the whole process: a) reading the Arnold scene files from disk and adapting them appropriately; b) performing the 3D image warping; c) rendering the holes with Arnold; d) composing the final scene; and e) deleting all intermediate files. Note that, in our experiments, the time required by the GPU to perform step b) was negligible compared with the remaining steps listed above.

The values of the threshold λ and of the border were chosen to minimize the size of the mask while ensuring good quality in the derived image. Note that the cost of initializing Arnold for a given scene (loading from disk, etc.) is the same whether we render the whole image or only the subset indicated by the mask.

As Table 1 shows, the times needed to generate the image with our method are much lower than those required by the usual method, with speed-ups between 3.2x and 4.5x depending on the scene.
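The speed-up figures follow directly from the timings in Table 1, as the short Python check below shows. Only the variable names are introduced here; the timings are the ones reported in the table.

```python
# (reference image time, derived image time) in seconds, from Table 1
timings = {
    "Tres mimos": (245.0, 75.50),
    "Parque": (2725.0, 610.72),
}

# Speed-up = time of the conventional render / time of our method
speedups = {scene: reference / derived
            for scene, (reference, derived) in timings.items()}

for scene, factor in speedups.items():
    print(f"{scene}: {factor:.1f}x")
```

The ratios are roughly 3.2 for “Tres mimos” and 4.5 for “Parque”, which matches the range quoted in the text.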

In the case of the “Tres mimos” scene, the mask covers about half of the image. This means that half of the scene must be rendered again in the conventional way to complete the image. This high percentage is due to the fact that the background of the scene is a blue plane that is steeply inclined with respect to the line of sight. As a consequence, a large part of it is detected as an artificial surface and included in the mask. Nevertheless, the geometric complexity of this background is so low that re-rendering it takes hardly any time.

Figure 5: Scenes used in our experiments. Top: “Tres mimos” scene. Bottom: “Parque” scene. Left: mask. Right: stereoscopic view generated with our method.

Scene        Resolution   λ     Border   Mask      T(s) reference img.   T(s) derived img.
Tres mimos   2048×872     0.5   2        52.88%    245                   75.50
Parque       2358×1080    6     4        24.487%   2725                  610.72

Table 1: Results of generating the derived image for different scenes.

In the case of the “Parque” scene, the mask covers slightly less than a quarter of the image, mainly object edges. This leads to a drastic reduction in the time needed to generate the derived image compared with generating the whole image with the traditional method.

4.2. Image quality

Given the importance of image quality for a major film production, our method must guarantee that the derived image is indistinguishable from its counterpart generated with the usual methods. In practice, mathematical equality between both images is not required, only that they are indistinguishable to the human eye.

Table 2 reports the absolute error for each scene with a tolerance of 5%. That is, the image generated by our method and the one generated conventionally were compared pixel by pixel, and the table reports the percentage of pixels whose color differs by more than 5% between the two. For this purpose we used the “compare” utility of the “ImageMagick” package with the parameters “compare -metric AE -verbose -fuzz 5%”.

Scene        Absolute error
Tres mimos   0.123%
Parque       1.258%

Table 2: Absolute error: percentage of pixels of the derived image whose color differs by more than 5% from the same scene generated conventionally.
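The idea behind the measurement can be sketched in Python as follows. This is a simplified stand-in, not a reimplementation of ImageMagick: it assumes 8-bit channels and treats a pixel as different when any single channel deviates by more than the tolerance, whereas the fuzz option of “compare” works with a distance in color space. The function name `percent_differing` and the sample pixels are hypothetical.

```python
def percent_differing(image_a, image_b, tolerance=0.05):
    """Percentage of pixels for which at least one 8-bit channel differs
    by more than `tolerance` of the full range (simplified criterion)."""
    limit = tolerance * 255
    total = 0
    differing = 0
    for row_a, row_b in zip(image_a, image_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += 1
            if any(abs(a - b) > limit for a, b in zip(pixel_a, pixel_b)):
                differing += 1
    return 100.0 * differing / total

# Hypothetical 2x2 RGB images: one pixel is within tolerance, one is not.
conventional = [[(100, 100, 100), (100, 100, 100)],
                [(50, 60, 70), (0, 0, 0)]]
derived = [[(110, 100, 100), (120, 100, 100)],
           [(50, 60, 70), (0, 0, 0)]]
error = percent_differing(conventional, derived)
```

In this toy case a single pixel out of four exceeds the tolerance, so the reported error is 25%.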

We can consider that a difference of less than 5% in a color represented with 24 bits is imperceptible to human vision. Therefore, virtually the entire image generated by our technique will, in practice, be indistinguishable for the viewer. The difference will be even less noticeable when the derived image is presented to the viewer combined with the reference image in a stereoscopic projection.


In this work we have presented an efficient depth-based method for generating stereoscopic images from the image of one of the two cameras and its depth map. The proposed technique drastically reduces computation time. It is also robust and automatic: it only requires a human operator to set two parameters, the threshold λ and the border size of the boundaries. These parameters remain constant for a given shot as long as the scene does not change.

As for the limitations of the technique described, we believe that visual effects that depend on the viewpoint (for example, motion blur, specular highlights, reflections, refractions, caustics, etc.) could cause problems when projected from the position of the second camera. These problems could appear as visible discontinuities in the image regions at the boundary of the mask. As future work, we therefore want to study the behavior of our method in such situations and to develop solutions to any problems that may arise.

Acknowledgements

We would like to thank Solid Angle SL for providing us with a license of their Arnold renderer for this research.

References

[CSN07] CHAN S., SHUM H.-Y., NG K.-T.: Image-based rendering and synthesis. Signal Processing Magazine, IEEE 24, 6 (2007), 22–33. doi:10.1109/MSP.2007.905702.

[CW93] CHEN S. E., WILLIAMS L.: View interpolation for image synthesis. In Proceedings of the 20th annual conference on Computer graphics and interactive techniques (New York, NY, USA, 1993), SIGGRAPH ’93, ACM, pp. 279–288. doi:10.1145/166117.166153.

[Feh04] FEHN C.: Depth-Image-Based Rendering (DIBR), Compression and Transmission for a New Approach on 3D-TV. In Proceedings of SPIE Stereoscopic Displays and Virtual Reality Systems XI (2004), vol. 5291, pp. 93–104.

[GGSC96] GORTLER S. J., GRZESZCZUK R., SZELISKI R., COHEN M. F.: The lumigraph. In Proceedings of the 23rd annual conference on Computer graphics and interactive techniques (New York, NY, USA, 1996), SIGGRAPH ’96, ACM, pp. 43–54. doi:10.1145/237170.237200.

[LH96] LEVOY M., HANRAHAN P.: Light field rendering. In Proceedings of the 23rd annual conference on Computer graphics and interactive techniques (New York, NY, USA, 1996), SIGGRAPH ’96, ACM, pp. 31–42. doi:10.1145/237170.237199.

[McM97] MCMILLAN JR. L.: An image-based approach to three-dimensional computer graphics. PhD thesis, Chapel Hill, NC, USA, 1997. UMI Order No. GAX97-30561.

[MMB97] MARK W. R., MCMILLAN L., BISHOP G.: Post-rendering 3d warping. In Proceedings of the 1997 symposium on Interactive 3D graphics (New York, NY, USA, 1997), I3D ’97, ACM, pp. 7–ff. doi:10.1145/253284.253292.

[PSM04] PAJAROLA R., SAINZ M., MENG Y.: Dmesh: Fast depth-image meshing and warping. International Journal of Image and Graphics 04, 04 (2004), 653–681. doi:10.1142/S0219467804001580.

[SGHS98] SHADE J., GORTLER S., HE L.-W., SZELISKI R.: Layered depth images. In Proceedings of the 25th annual conference on Computer graphics and interactive techniques (New York, NY, USA, 1998), SIGGRAPH ’98, ACM, pp. 231–242. doi:10.1145/280814.280882.

[SK00] SHUM H., KANG S. B.: Review of image-based rendering techniques. In Communications and Image Processing (2000), Ngan K. N., Sikora T., Sun M.-T., (Eds.), SPIE, pp. 2–13.

[SKK∗11] SMOLIC A., KAUFF P., KNORR S., HORNUNG A., KUNTER M., MULLER M., LANG M.: Three-dimensional video postproduction and processing. Proceedings of the IEEE 99, 4 (2011), 607–625. doi:10.1109/JPROC.2010.2098350.

[VTS06] VAZQUEZ C., TAM W. J., SPERANZA F.: Stereoscopic imaging: filling disoccluded areas in depth image-based rendering. In Proc. SPIE 6392, Three-Dimensional TV, Video, and Display V (2006). doi:10.1117/12.685047.

[WDK93] WOODS A., DOCHERTY T., KOCH R.: Image distortions in stereoscopic video systems. In Proceedings of SPIE: Stereoscopic Displays and Applications IV (1993), vol. 1915, pp. 36–48.

[ZKU∗04] ZITNICK C. L., KANG S. B., UYTTENDAELE M., WINDER S., SZELISKI R.: High-quality video view interpolation using a layered representation. ACM Trans. Graph. 23, 3 (Aug. 2004), 600–608. doi:10.1145/1015706.1015766.




A Client-Server Architecture for the Interactive Inspection of Segmented Volume Models

J. Surinyac1, P. Brunet2

1 Department of Digital Technologies and Information, University of Vic, Vic, Spain
2 Department of Software, Polytechnic University of Catalonia, Barcelona, Spain

Abstract

Interactive inspection of segmented volume models and anatomy atlases in client-server architectures is becoming more and more popular as 3D medical volume models are improving in quality and detail. In this paper we present a novel technique for the progressive transmission and region-based inspection of these medical models. Our scheme is based on a forest of multiresolution octrees. Every segmented organ in the region of interest is preprocessed and represented by an octree of surfels which code normals, color and color gradients. Individual octrees can be progressively transmitted to the client on demand, depending on the importance of the organs and on specific interaction queries. The performance of the proposed algorithms is discussed on a number of practical examples.

Keywords: volume visualization, progressive transmission, client-server architectures, segmented volume models, octrees, surfels, interactive inspection.

Categories and Subject Descriptors (according to ACM CCS): I.3.3 [Computer Graphics]: Picture/Image Generation - Line and curve generation

1. Introduction

Interactive inspection of segmented volume models and anatomy atlases in client-server architectures is becoming more and more popular, now that 3D medical volume models are continuously improving in quality and detail. However, advanced new algorithms for compression and progressive transmission are required, because users are interested in displaying the models and interacting with them on low-end, portable and mobile clients.

In this paper we present a novel technique for the progressive transmission and region-based inspection of these medical models. Our scheme is based on a forest of multiresolution octrees. Every segmented organ in the region of interest is preprocessed and represented by an octree of surfels which code normals, color and color gradients. Individual octrees can be progressively transmitted to the client on demand, depending on the importance of the organs and on specific interaction queries. The main contributions of our approach are:

• A new multiresolution volume data structure for binary segmented volumetric models. Individual organs are modeled by octrees of surfels with smoothed normal vectors and non-constant color per surfel.

• A new surfel representation scheme which is based on the efficient coding of a linear scalar field in the cubic domain of each octree node. Surfel simplification is performed bottom-up by interpolating their linear scalar fields under the assumption of volume preservation in octree nodes.

• A simple progressive transmission paradigm based on surfel compression and sequential transmission of individual levels of organ octrees in the multiresolution octree forest.

• A set of specific and interactive inspection tools in the client, including variable resolutions for different organs, selection of organs and a higher resolution spherical region.

The paper is organized as follows. After reviewing previous work in the next section, Section 3 presents an overview of the proposed algorithms. Sections 4 and 5 are devoted to the generation of the Surfels Octree in the server, to the progressive transmission and to the interactive, region-based inspection in the client. Finally, Section 6 presents and discusses a number of examples while showing the main interaction paradigms. Conclusions and future work are listed in Section 7.

2. Previous work

We present related work grouped into three categories: binary models and plane parameterizations, hierarchical models and progressive transmission, and visualization and interaction in low-end client devices. Our literature review does not aim for completeness, but rather selects a few papers in the different categories to position our work in the context of previous research.

Binary Models and Plane Parameterizations. Binary volumetric models have become more important over the last few years. Binary voxelizations are commonly generated as the result of segmentation algorithms working on volumetric medical data (see for instance [TSH98]) that must assign each voxel to a specific organ. Binary volume models lack information on the scalar field of densities and only represent in/out binary information on each of the segmented organs. Discrete volume objects are also created by the voxelization of different input models. Many reconstruction [EBV05] and model repair [BPK05] algorithms are based on binary volume models, and a number of volume operations such as splitting produce binary information in the modified regions.

Chica et al. [CWA∗06] designed an algorithm to extract smooth surfaces from binary models while preserving features and sharp edges. The algorithm works by identifying sticks (edges of voxels that connect in-out vertices). Smoothing displacements are computed using a bilaplacian operator and averaging the displacements in two orthogonal 2D planes.

In most cases, the efficiency of the algorithms is strongly related to the correct choice of the parameterization of planes. Our approach requires the parameterization to be linear (so that key computations reduce to simple plane-point linear tests) and, if discretized, faithful (the plane representation does not give more importance to some regions of the Universe Cube to the detriment of others; a regular grid in the parameterization space should map onto a uniformly distributed family of planes in 3D space). However, the Hough representation (or spherical coordinates, like those used in [DDSD03]) must be discretized very finely to be faithful: the density of planes required in sparse regions forces an unnecessarily high density of planes in other regions. Furthermore, the Hough parameterization is not linear [ABC∗04]. In contrast with Hough's representation, the Connected-Cubes Plane Parameterization (CCPP) proposed by Andújar et al. in [ACB∗04] and [ABC∗04] is a tetrahedron-based parameterization of planes. Given four non-coplanar vertices of the unit cube that define a tetrahedron, an oriented plane can be represented by a weight at each vertex; the weights also define a linear scalar field whose zero isosurface is the plane. In the 4D space of the parameters, a plane is represented by any point on the straight line defining it. The CCPP represents planes by the crossing point between that line and the hypercube centered at the origin. This parameterization is linear, that is, the constraint that a point in real space lies inside or outside the plane also becomes a linear constraint in the 4D space of the parameters, and it is also faithful. It has been used to find the largest planar region that approximates a 3D object [ABC∗04].
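The tetrahedron-based representation can be illustrated with a small Python sketch. It is not taken from [ACB∗04]: it assumes, for illustration only, that the tetrahedron is formed by the unit-cube vertices (0,0,0), (1,0,0), (0,1,0) and (0,0,1) and that the hypercube is [-1,1]^4, so that the weights are normalized by their largest absolute value. The function names `plane_to_weights` and `field` are hypothetical.

```python
def plane_to_weights(a, b, c, d):
    """Weights of the linear field f(p) = a*x + b*y + c*z + d at the four
    tetrahedron vertices, scaled onto the boundary of [-1, 1]^4."""
    weights = [d, a + d, b + d, c + d]  # f at (0,0,0), (1,0,0), (0,1,0), (0,0,1)
    scale = max(abs(w) for w in weights)
    if scale == 0.0:
        raise ValueError("degenerate plane")
    # A positive scale keeps the orientation of the plane.
    return [w / scale for w in weights]

def field(weights, point):
    """Evaluate the linear scalar field by barycentric interpolation."""
    x, y, z = point
    w0, w1, w2, w3 = weights
    return w0 * (1.0 - x - y - z) + w1 * x + w2 * y + w3 * z

# Oriented plane x = 0.5, positive towards increasing x.
weights = plane_to_weights(1.0, 0.0, 0.0, -0.5)
on_plane = field(weights, (0.5, 0.0, 0.0))
positive_side = field(weights, (1.0, 0.0, 0.0))
negative_side = field(weights, (0.0, 0.0, 0.0))
```

Because the field is linear in the weights, testing on which side of the plane a point lies is a linear inequality in the four parameters, which is the linearity property the text refers to.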

Hierarchical Models and Progressive Transmission. Client-server architectures and progressive transmission algorithms started almost two decades ago with mesh-specific methods. Hoppe [Hop96] proposed an algorithm for the transmission of triangle meshes as a sequence of progressively smaller meshes.

Gobetti and Marton [GM04] work on client-server architectures for huge point data clouds, using a k-d tree hierarchical data structure. In their approach, each tree node contains a region of space with a similar amount of equally distributed samples that are complemented by the samples in upper levels. Rendering traverses the tree from the top and continues until a predefined level/density is reached.

Callahan et al. [CBPS06] implemented a client-server architecture for progressive volume rendering of unstructured data. They used the server as a data repository and the clients as rendering machines that accumulate the incoming geometry and display it in a progressively improving way. The server processed the model into an octree and traversed it by depth ranges from front to back. When the client requested a certain packet, the server culled the geometry outside the current depth range and sent the visible volumes to the client, also incrementing the depth ranges for future steps.

Gobetti et al. [GIM12] have recently proposed an algorithm for rendering gigantic volume models. In their client-server approach, the volume is decomposed into a multiresolution hierarchy of bricks. Each brick is further subdivided into smaller blocks, which are compactly described by sparse linear combinations of prototype blocks stored in an overcomplete dictionary. The dictionary is learned, using limited computational and memory resources, by applying the K-SVD algorithm to a re-weighted non-uniformly sampled subset of the input volume.

For a more complete survey on volume rendering architectures, client-server schemes and compression techniques, we refer to the recent work of Balsa et al. [BGI∗13].

Visualization and interaction in low-end client devices. Nowadays, hospitals are more and more interested in tele-medicine and tele-diagnosis. Client-server applications enable these functionalities. Sometimes the use of mobile and low-end devices is necessary because of their portability and easy maintenance. However, the low performance of their hardware makes it quite complex to build efficient visualization systems on these devices.

Lamberti et al. [LZS∗03] implemented a technique in which storage and computation resources are provided by a server system while mobile devices are used only as client front ends. User interaction in the mobile device is encoded and sent to the server which, in turn, produces the corresponding image in its frame buffer. The image is sent via TCP to the client, which shows it on its display. With low bandwidth, the image is compressed to jpeg or gzip, but decoding it on the client takes time and decreases the frame rate. The image can also be sent at half resolution while the model is being rotated.

To visualize interior and exterior parts of a volume at the same time, Monclús et al. [MDNV09] propose a metaphor named Virtual Magic Lantern which uses a 3D pointing device to define a visualization cone. The volume outside the cone is rendered using the original Transfer Function whereas the volume inside can use a different function, for instance rendering only specific organs or parts of the model. Although this metaphor was designed for a raycasting algorithm, it can be adapted to other visualization methods. Moser and Weiskopf [MW08] introduced an interactive technique for volume rendering on mobile devices that adopts the 2D texture slicing approach. Noguera et al. [NJOS12] propose an algorithm that overcomes the 3D texture limitation of mobile devices and achieves interactive frame rates by caching the geometry of the slices in a vertex buffer object. The method proposed by Díaz et al. [DMNV12] produces expressive medical illustrations. Segmented medical datasets can be cut by a clipping plane defined by the user, and selected organs can then be extruded out of the plane by dragging them with the mouse.

An interesting application for mobile devices can be found in [VB12]. The authors analyze current graphics hardware in most high-end Android mobile devices and perform a practical comparison of volume rendering, a well-known GPU-intensive task. The paper discusses implementations of three different classical algorithms to show how current state-of-the-art mobile GPUs behave in volume rendering. Campoalegre et al. [CBN12] proposed a visualization scheme based on equal-sized blocks which are compressed via wavelets and sent to the client device, which only decompresses the blocks in the region of interest. Last but not least, Movania and Feng [MF12] presented a single-pass ray caster and a 3D texture slicer rendering algorithm for the WebGL platform, capable of handling dynamic transfer functions on mobile devices.

Our approach is aimed at binary segmented volume models. It relies on a set of hierarchical data structures and represents individual surfels with the plane parameterization from [ACB∗04], which turns out to be especially well suited to this case.

3. Overview of the proposed approach

Figure 1 presents the overall approach of our proposal. The algorithm starts from a segmented binary volume model V without density information. In binary segmented visualizations, every voxel stores an index to the organ at its specific location. In our case, we also assume that voxels in V contain the color attribute of their tissue. In a preprocess, organs in the region of interest are detected, and a surfel octree (named Organ Octree in what follows) is computed for each organ. The server stores the set of all organ octrees (Octree Forest from now on) and sends parts of it to the clients based on their interactive requests.

The computation of Organ Octrees is repeated for each organ in the region being studied. For each octree, surfels in the leaf octree nodes are first computed and smoothed. In a second step, grey octree nodes are computed by a bottom-up simplification process. The Organ Octree generation algorithm is detailed in Section 4.

The Octree Forest is rather compact and can be stored in the server main memory. It is represented as an array of surfel octrees, each of them being indexed by its organ label. Transmission to the clients is performed on demand and in a progressive way. Every client c has a lower resolution version of the Octree Forest, as represented by its resolution array $R_c = (d_{c,1}, d_{c,2}, \ldots, d_{c,n})$, where $d_{c,k}$ is the depth of the octree corresponding to organ k in the local representation of client c. Client requests are driven by their interaction events. A client with a resolution array $R_c$ can ask for a refinement of a certain organ k; the next octree level for organ k in the server Octree Forest is then sent to this client, and $d_{c,k}$ is incremented by one. Clients render their volume models by octree traversal and surfel splatting. Progressive transmission and region-based inspection in the clients are detailed in Section 5.

4. Generation of the surfels octree

Our initial volume model is a segmented human point data set where the points have color and are arranged in a uniform, axis-oriented regular grid defining a voxelization. This section describes the preprocessing phase which computes the Octree Forest. As already mentioned, the Octree Forest is a set of Organ Octrees. We process one organ at a time in order to compute its corresponding octree. Grid points are classified as in points when they belong to the given organ and as out points otherwise.

A stick is any segment of the grid connecting an in grid point to an out grid point. For any voxel, its sticks form a subset of the edges of its cube. The surface of the organ must stab any stick at an undetermined point within it. Any given stick belongs to four adjacent voxels, which constitute the stick's ring.
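The definition of a stick can be illustrated with a minimal sketch. It assumes the in/out classification of one organ is available as a 3D boolean NumPy array (an assumption for illustration; the function name `find_sticks` and the `(grid_point, axis)` stick encoding are hypothetical and not taken from the paper).

```python
import numpy as np

def find_sticks(inside):
    """Return sticks as (grid_point, axis) pairs.

    Each pair denotes the grid edge from grid_point to its +1 neighbour
    along axis, where exactly one of the two endpoints is an in point.
    """
    inside = np.asarray(inside, dtype=bool)
    sticks = []
    for axis in range(3):
        # A non-zero difference marks an in/out transition along this axis.
        changes = np.diff(inside.astype(np.int8), axis=axis) != 0
        for index in np.argwhere(changes):
            sticks.append((tuple(int(i) for i in index), axis))
    return sticks

# Toy example: a single in point at the centre of a 3x3x3 grid.
grid = np.zeros((3, 3, 3), dtype=bool)
grid[1, 1, 1] = True
sticks = find_sticks(grid)
```

In this toy grid the single in point is connected to six out neighbours, so six sticks are found, two per axis.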

We first compute the set of sticks of the organ. Surface stabbing points are initially located at stick middle points. Then, stabbing points in all sticks are relaxed in order to increase the smoothness of the estimated organ surface. We have successfully used the bi-Laplacian Pressing smoothing algorithm detailed in [CWA∗06] to obtain fair organ surfaces. We also estimate and relax the normal vector at each stick stabbing point, see Figure 2. To do this, for each stick s we access the stabbing points in the sticks of the four voxels of the stick ring. Sticks in the ring of s create a triangle fan, with its central vertex at the stabbing point of s. Normals of each triangle in the fan are computed, averaged, and assigned to s. Normals of all sticks are finally smoothed by applying a Laplacian filter which takes into account the normals of the other sticks in the ring.

Figure 1: Overview of the approach. Per organ preprocess and Client-Server interaction.

Figure 2: a) Voxel with vertices marked as in (black) or out (white). Thick lines are sticks. b) Stabbing points after Pressing. c) Normals of the sticks at the stabbing points. d) Oriented surfel of the voxel.

Organ octrees are constructed bottom-up, with voxels being the lowest level nodes. Only voxels with sticks are considered. The data structure of a node includes a surfel and the eight pointers to the non-void descendants. A surfel s is determined by its supporting plane s.pl, its color s.c and a color gradient s.g. We first compute one surfel per organ voxel (organ voxels are voxels having sticks for this particular organ), see Figure 2. The term surfel is an abbreviation of "surface element" [PZvBG00]. By using surfels, objects can be represented by a dense set of discs holding local, first-order surface information, which includes position and normal vectors, as described in [PZvBG00].

By taking advantage of the octree structure, we are able to encode surfels in a very compact way. Surfel geometries are simply represented as oriented planes in the cubes of their octree nodes. We have chosen the Connected Cubes Plane Parameterization (CCPP from now on) from [ABC∗04] to encode surfel planes with respect to their cubic domain. The parameterization is based on four weights $w_k$ which define a linear scalar field, the weights being attributes of the vertices of the base tetrahedron defined by alternate vertices of the cubic domain, see Figure 3a). The represented plane is the zero-isosurface of the linear scalar field. The plane can be represented by a line (the plane line) through the origin in the four-dimensional $w_0 \ldots w_3$ space, as the field isosurface is invariant to any common scaling of the weights. To remove redundancy, the CCPP represents planes by the intersection between their plane lines and the 3D faces of a hypercube centered at the origin. This is a set of eight 3D cubes, corresponding to the maximum and minimum values of the four weights, see Figure 3b). The CCPP is faithful (it uniformly represents all possible planes in the domain) and 2-concise, see [ABC∗04]. The 3D position s.P of the surfel is implicit in its CCPP representation. We can always compute it from the surfel plane s.pl by defining it as the centroid of the intersections between the surfel plane and the edges of the boundary of its cubic domain.
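The mapping described above can be sketched in a few lines, as an illustration rather than the reference CCPP implementation. The sketch assumes a unit cubic domain, a particular choice of the four alternate vertices, and a cube numbering `2*k` / `2*k + 1` for the faces $w_k = +1$ / $w_k = -1$; all of these, together with the function names, are hypothetical conventions chosen for the example.

```python
import numpy as np

# Four alternate vertices of the unit cube (assumed choice of base tetrahedron).
TETRA = np.array([[0, 0, 0], [1, 1, 0], [1, 0, 1], [0, 1, 1]], dtype=float)

def plane_to_ccpp(normal, offset):
    """Map the oriented plane normal . p + offset = 0 to (cube index, 3 coords)."""
    w = TETRA @ np.asarray(normal, dtype=float) + offset  # field value at each vertex
    k = int(np.argmax(np.abs(w)))                         # dominant weight
    scaled = w / abs(w[k])                                # positive scaling keeps orientation
    cube = 2 * k + (0 if scaled[k] > 0 else 1)            # one of the eight 3D cubes
    coords = np.delete(scaled, k)                         # remaining coordinates in [-1, 1]
    return cube, coords

def ccpp_to_plane(cube, coords):
    """Recover (normal, offset), up to a positive scale factor."""
    k, negative = divmod(cube, 2)
    w = np.insert(np.asarray(coords, dtype=float), k, -1.0 if negative else 1.0)
    # Solve for the linear field that takes the value w[i] at tetrahedron vertex i.
    system = np.hstack([TETRA, np.ones((4, 1))])
    solution = np.linalg.solve(system, w)
    return solution[:3], solution[3]

normal, offset = np.array([0.3, -0.5, 0.8]), -0.2
cube, coords = plane_to_ccpp(normal, offset)
normal2, offset2 = ccpp_to_plane(cube, coords)
```

Because the plane line passes through the origin of weight space, dividing by the largest absolute weight selects exactly one point on a hypercube face, which is why a cube index plus three coordinates suffice.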

Figure 3: a) The four vertices of the base tetrahedron for the node-based Connected Cubes plane parameterization. b) The eight CCPP cubes. In red, the connection corresponding to the significant face ($w_2^+ \ldots w_3^-$).

Surfels in organ voxels are computed as an average of the information in their stick stabbing points. The central point s.P of the surfel in a voxel is the centroid of the stabbing points of the sticks in the voxel, its normal vector being computed by averaging the stick normals.

Surfels in upper octree levels are computed by averaging and volume preservation. The normal vector of the parent node is computed as the normal of the centroid of the surfels of the son nodes, computed in CCPP space. To fix the surfel plane location, we work from the observation that the plane of the surfel partitions the volume of its cubic node in two parts: the volume that is inside the organ and the one which is outside. In upper levels of the octree, we accumulate the volume of the son nodes. Then, the exact position of the surfel plane s.pl is adjusted by translating it until the volume it determines coincides with the total volume of its descendants, see green volumes in Figure 4.
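The translation step can be sketched as a one-dimensional search on the plane offset. The paper does not state how the clipped volume is evaluated; the sketch below assumes a unit cubic node, estimates the inside volume by regular sampling, and uses bisection, all of which are illustrative choices (as are the function names and the convention that points with a negative field value are inside).

```python
import numpy as np

def inside_fraction(normal, offset, samples=32):
    """Sampled fraction of the unit cube where normal . p + offset < 0."""
    t = (np.arange(samples) + 0.5) / samples
    x, y, z = np.meshgrid(t, t, t, indexing="ij")
    field = normal[0] * x + normal[1] * y + normal[2] * z + offset
    return float(np.mean(field < 0.0))

def fit_offset(normal, target_fraction, iterations=40):
    """Translate the plane (bisect its offset) until the inside fraction
    matches the target, e.g. the accumulated volume of the descendants."""
    normal = np.asarray(normal, dtype=float)
    corners = np.array([[i, j, k] for i in (0, 1) for j in (0, 1) for k in (0, 1)],
                       dtype=float)
    projections = corners @ normal
    lo, hi = -projections.max(), -projections.min()  # cube fully inside / fully outside
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if inside_fraction(normal, mid) > target_fraction:
            lo = mid   # too much volume inside: push the plane inwards
        else:
            hi = mid
    return 0.5 * (lo + hi)

fitted = fit_offset([0.0, 0.0, 1.0], 0.25)
```

With the normal along z and a target of one quarter of the node volume, the fitted plane lies close to z = 0.25, within the resolution of the sampling grid.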

Figure 4: 2D example of volume preservation: a) Surfels in the octants. b) Son volumes. c) Parent surfel and volume.

When surfels are almost opposite, they can produce a null average for the parent. Then the parent surfel is computed as a double-sided surfel, visible from both sides and placed between the initial surfels. If descendants define a closed volume, their average is null again and there is no surfel defined for the parent node.

The color of a stick is the color of its in point in the input data set. In voxels, the color of a surfel is computed as the average of the colors of all sticks in the voxel. In upper levels of the octree, the color s.c of a surfel is also the average of the colors of its descendants. Colors are computed and averaged in YCoCg color space.
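As an illustration of the color averaging, the following sketch uses the standard (non-reversible) RGB to YCoCg transform. The function names are hypothetical, and colors are assumed to be RGB triples in [0, 1].

```python
import numpy as np

def rgb_to_ycocg(rgb):
    r, g, b = rgb
    return np.array([0.25 * r + 0.5 * g + 0.25 * b,    # Y  (luminance)
                     0.5 * r - 0.5 * b,                # Co (orange chroma)
                     -0.25 * r + 0.5 * g - 0.25 * b])  # Cg (green chroma)

def ycocg_to_rgb(ycocg):
    y, co, cg = ycocg
    return np.array([y + co - cg, y + cg, y - co - cg])

def average_color(rgb_colors):
    """Average a list of RGB colors in YCoCg space and return the result as RGB."""
    mean_ycocg = np.mean([rgb_to_ycocg(c) for c in rgb_colors], axis=0)
    return ycocg_to_rgb(mean_ycocg)

mixed = average_color([(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)])
```

Since the transform is linear, the plain average coincides with an RGB average; the practical benefit of YCoCg in this scheme is that the luminance Y is available as a separate channel for the gradient described next.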

Constant surfel colors result in a piecewise flat appearance which is unpleasant. Instead, in our approach the color of each fragment in the surfel is interpolated through a precomputed color gradient which is stored as part of the surfel information. The color gradient is computed from the color of four neighboring nodes. Given the surfel's normal vector $\vec{n} = (n_x, n_y, n_z)$ with dominant component $n_a$, that is, $|n_a| \ge |n_b| \ge |n_c|$ where (a, b, c) denotes a permutation of (x, y, z), neighbors are looked up in both directions of axes b and c. If any of those neighbors has no sticks (due to the bending of the surface), then four more candidates for that neighbor are sequentially sought: an offset of +/− times the side of the node in the direction of axis a is applied to the void neighbor, and new candidate nodes are obtained. Finally, duplicates are rejected. The central points of the neighbors are then projected onto the surfel plane of the considered node (Figure 5). A linear gradient of the luminance component of the colors is computed from these points and the color of their surfel center by solving a linear system of two equations, as the gradient is a 2D vector which lies in the plane of the surfel. The plane frame for gradient representation consists of two orthogonal vectors, the first one, $e_1$, being the projection onto the surfel plane of the world coordinate direction next to a in the cyclic ordering x-y-z-x. The second intrinsic direction of the plane frame is the cross product between $\vec{n}$ and $e_1$. By using this implicit frame we are able to decode and use color gradients in the client without sending any information on the surfel plane frame.
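The implicit plane frame and the gradient fit can be sketched as follows. The frame construction follows the description above; the least-squares fit over the neighbors is an assumption about how the two-unknown system is obtained, and the function names are hypothetical.

```python
import numpy as np

def plane_frame(normal):
    """Implicit frame (e1, e2) of the surfel plane, derived from the normal only."""
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    a = int(np.argmax(np.abs(n)))        # dominant axis
    axis = np.zeros(3)
    axis[(a + 1) % 3] = 1.0              # next axis in the cyclic order x-y-z-x
    e1 = axis - np.dot(axis, n) * n      # projection onto the surfel plane
    e1 /= np.linalg.norm(e1)
    e2 = np.cross(n, e1)
    return e1, e2

def luminance_gradient(center, center_lum, normal, neighbor_points, neighbor_lums):
    """Fit the 2D luminance gradient in the plane frame (assumed least squares)."""
    e1, e2 = plane_frame(normal)
    offsets = np.asarray(neighbor_points, dtype=float) - np.asarray(center, dtype=float)
    matrix = np.column_stack([offsets @ e1, offsets @ e2])
    rhs = np.asarray(neighbor_lums, dtype=float) - center_lum
    gradient, *_ = np.linalg.lstsq(matrix, rhs, rcond=None)
    return gradient

# Synthetic check: luminance 0.5 + 0.3 x - 0.1 y on the plane z = 0.
points = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)]
lums = [0.8, 0.2, 0.4, 0.6]
gradient = luminance_gradient((0, 0, 0), 0.5, (0, 0, 1), points, lums)
```

Because both server and client can rebuild the same frame from the decoded normal, only the two gradient components need to be transmitted.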

Figure 5: Central node and four neighbors. Nodes are clipped by the plane of the central node. Points a and d are central points of neighbors projected to the plane. Surfels c and d are not shown for clarity.

5. Progressive transmission and region-based inspection

Surfel information is encoded for fast and efficient transmission. Surfel planes s.pl are encoded to 4 bytes by using the connected cubes plane parameterization [ABC∗04]. We use 3 bits to encode the cube ($W_0^+, W_0^-, \ldots, W_3^+, W_3^-$) and 29 bits to encode the coordinates (s.pl.x, s.pl.y, s.pl.z) of the plane in its cube (s.pl.x and s.pl.y are encoded with 10 bits each, whereas 9 bits are used for s.pl.z, z being the direction which is orthogonal to the initial images of the CT or MRI scan). To maximize the encoding accuracy in the connected cubes plane parameterization, we locally relate this parameterization to the cube of the octree node, by using four alternate vertices of the surfel's octree node to define the vertices of the base tetrahedron (see Figure 3). Surfel color s.c requires 3 bytes, while the 2D gradient s.g (in the plane s.pl) of the color luminance Y in YCoCg space is quantized to 4 bytes (2 bytes per gradient component). The chromaticity gradient is neglected, and the surfel location s.P is not encoded. The total memory requirement per surfel is therefore 4+3+4=11 bytes.
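The 11-byte layout can be sketched with Python's `struct` module. The bit order inside the 32-bit plane word, the linear quantization of coordinates in [-1, 1], and the use of signed 16-bit integers for already-quantized gradient components are assumptions made for illustration; the paper only fixes the bit and byte counts.

```python
import struct

def quantize(value, bits):
    """Map a value in [-1, 1] to an unsigned integer with the given bit width."""
    levels = (1 << bits) - 1
    clamped = min(max(value, -1.0), 1.0)
    return round((clamped + 1.0) * 0.5 * levels)

def dequantize(q, bits):
    return q / ((1 << bits) - 1) * 2.0 - 1.0

def pack_surfel(cube, coords, color, gradient_q):
    """cube: 0..7, coords: 3 floats in [-1, 1], color: 3 bytes, gradient_q: 2 int16."""
    x, y, z = coords
    plane = (cube << 29) | (quantize(x, 10) << 19) | (quantize(y, 10) << 9) | quantize(z, 9)
    # '<' disables padding: 4 (plane) + 3 (color) + 4 (gradient) = 11 bytes.
    return struct.pack("<I3B2h", plane, *color, *gradient_q)

def unpack_surfel(data):
    plane, c0, c1, c2, gx, gy = struct.unpack("<I3B2h", data)
    coords = (dequantize((plane >> 19) & 0x3FF, 10),
              dequantize((plane >> 9) & 0x3FF, 10),
              dequantize(plane & 0x1FF, 9))
    return plane >> 29, coords, (c0, c1, c2), (gx, gy)

packed = pack_surfel(5, (0.25, -0.5, 1.0), (200, 120, 30), (-300, 1200))
cube, coords, color, gradient_q = unpack_surfel(packed)
```

The round trip recovers the cube index, color and gradient exactly, and the plane coordinates up to the quantization step of their bit width.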

Progressive octree transmission is performed on demand. Every client c stores a lower resolution version of the Octree Forest, as represented by its resolution array $R_c = (d_{c,1}, d_{c,2}, \ldots, d_{c,n})$, where $d_{c,k}$ is the depth of the octree corresponding to organ k in the local representation of client c. The server stores a two-dimensional array containing the resolution arrays of all clients. When a client with a resolution array $R_c$ sends a query for a refinement of a certain organ k, the server increases $d_{c,k}$ by one and sends the next octree level for organ k from its Octree Forest to this client, with compressed surfels. Client queries include the requested level of the corresponding organ octree to avoid potential synchronization problems. The client receives the level and also increments its $d_{c,k}$ by one. Section 6 discusses memory requirements for the progressive transmission.
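The server-side bookkeeping can be sketched as follows. This is a minimal in-memory model, assuming that each octree level is available as an opaque payload; the class name `ForestServer`, its methods and the rule of rejecting queries whose requested level is not the next one are hypothetical illustrations of how the requested level can guard against synchronization problems.

```python
class ForestServer:
    """Toy model of the server state for progressive octree transmission."""

    def __init__(self, forest, initial_depth):
        self.forest = forest                # forest[organ][depth] -> encoded level
        self.initial_depth = initial_depth
        self.resolution = {}                # client id -> resolution array R_c

    def connect(self, client):
        self.resolution[client] = [self.initial_depth] * len(self.forest)

    def refine(self, client, organ, requested_depth):
        depths = self.resolution[client]
        if requested_depth != depths[organ] + 1:
            return None                     # stale or duplicated query
        if requested_depth >= len(self.forest[organ]):
            return None                     # organ already at maximum resolution
        depths[organ] = requested_depth     # server-side d_{c,k} += 1
        return self.forest[organ][requested_depth]

server = ForestServer([["a0", "a1", "a2"], ["b0", "b1"]], initial_depth=0)
server.connect("client-1")
first = server.refine("client-1", organ=0, requested_depth=1)
repeat = server.refine("client-1", organ=0, requested_depth=1)
```

A repeated query for a level that was already delivered returns nothing, so the resolution array on the server cannot drift ahead of the client's.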

After receiving one extra layer for any organ octree, the client decodes surfel planes s.pl and computes the position of their representative points s.P. For any surfel s, its surfel point s.P can be directly derived from the size and location of the tree node associated to the surfel (node locations and sizes are implicitly coded in the octree data structure). Surfel positions s.P are simply computed as the centroids of the intersection points between the surfel plane s.pl and the edges of the node. Once surfel point locations and surfel planes and normals have been decoded, the new layer is inserted in the corresponding organ octree.
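The derivation of s.P on the client can be sketched as below, together with the object-space splat size used later for rendering. The sketch assumes the plane is given in world coordinates as normal . p + offset = 0 and the node as a minimum corner plus a side length; the function names are hypothetical, and edges that merely touch the plane at a vertex are ignored for simplicity.

```python
import itertools
import numpy as np

def plane_edge_intersections(corner, size, normal, offset):
    """Intersections of the plane normal . p + offset = 0 with the 12 node edges."""
    corner = np.asarray(corner, dtype=float)
    normal = np.asarray(normal, dtype=float)
    points = []
    for start in itertools.product((0, 1), repeat=3):
        for axis in range(3):
            if start[axis] == 1:
                continue                      # visit each edge once, from its low end
            end = list(start)
            end[axis] = 1
            a = corner + size * np.array(start, dtype=float)
            b = corner + size * np.array(end, dtype=float)
            fa, fb = normal @ a + offset, normal @ b + offset
            if fa * fb < 0.0:                 # strict sign change along the edge
                points.append(a + (fa / (fa - fb)) * (b - a))
    return np.array(points)

def surfel_point_and_radius(corner, size, normal, offset):
    """Return s.P (centroid of the intersections) and the maximum distance to them."""
    points = plane_edge_intersections(corner, size, normal, offset)
    if len(points) == 0:
        return None, 0.0
    center = points.mean(axis=0)
    return center, float(np.max(np.linalg.norm(points - center, axis=1)))

center, radius = surfel_point_and_radius((0, 0, 0), 2.0, (0, 0, 1), -1.0)
```

For a node of side 2 cut by the plane z = 1, the four vertical edges are intersected, the surfel point is the node center, and the radius is the distance to the intersection points.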

At any moment, client c stores a set of multiresolution organ octrees with maximum resolutions given by its present resolution array $R_c = (d_{c,1}, d_{c,2}, \ldots, d_{c,n})$. Interaction exploits these multiresolution octrees by supporting a number of different paradigms:

• Organs can be individually selected and made either visible or invisible, to highlight the remaining structures. The rendering algorithm simply traverses the subset of visible organ octrees.

• Visible organs can be assigned specific and different octree resolutions. Organs at lower resolutions will be rendered with coarser surfels which correspond to higher levels in their organ octrees.

• The user can use a "higher resolution sphere" and move it through the volume. The sphere acts as a "resolution magnifying lens", all surfels in the sphere being rendered at the maximum resolution available in the client at that moment.

• Distance-based view-dependent multiresolution is also possible. In this case, the level of the surfel in the octree is computed from its distance to the observer, as in other view-dependent algorithms.

At any frame, the surfels to be rendered are the surfels of all visible organ octrees. The octree level for each surfel is determined by the resolution of the corresponding organ and by detecting whether the surfel is inside the higher resolution sphere or not. Surfels are finally rendered by splatting, with a per-fragment color computed from the surfel color s.c and the luminance gradient s.g. The splat size in object space is computed as the maximum distance between s.P and the intersections of s.pl with the edges of the octree node. The fragment which renders the point s.P inherits the surfel color s.c, whereas the color f.c of any other fragment f is computed from s.c and s.g, depending on the vector from s.P to the fragment position f.P within the plane s.pl: $f.lum = s.lum + \langle s.g,\ f.P - s.P \rangle$, where f.lum and s.lum are Y luminances in YCoCg color space.
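The two per-frame decisions described above can be sketched with plain Python. The level selection assumes that each organ has a user-assigned depth and that the client knows the maximum depth it holds; the luminance evaluation assumes the gradient is expressed in a plane frame (e1, e2). Function and parameter names are hypothetical.

```python
import math

def render_depth(surfel_point, organ_depth, max_depth,
                 sphere_center=None, sphere_radius=0.0):
    """Octree depth used for a surfel: maximum inside the sphere, organ depth outside."""
    if sphere_center is not None and math.dist(surfel_point, sphere_center) <= sphere_radius:
        return max_depth
    return organ_depth

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fragment_luminance(s_lum, gradient, e1, e2, s_point, f_point):
    """f.lum = s.lum + <s.g, f.P - s.P>, with the offset expressed in the plane frame."""
    offset = [f - s for f, s in zip(f_point, s_point)]
    return s_lum + gradient[0] * dot(offset, e1) + gradient[1] * dot(offset, e2)

inside = render_depth((0, 0, 0), organ_depth=3, max_depth=6,
                      sphere_center=(0.5, 0, 0), sphere_radius=1.0)
outside = render_depth((5, 0, 0), organ_depth=3, max_depth=6,
                       sphere_center=(0.5, 0, 0), sphere_radius=1.0)
lum = fragment_luminance(0.5, (0.3, -0.1), (1, 0, 0), (0, 1, 0), (0, 0, 0), (1, 2, 0))
```

In a real renderer the luminance evaluation would run per fragment on the GPU; the scalar version here only illustrates the arithmetic.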

Section 6 presents several examples which show the potential of these interaction paradigms.

6. Results and discussion

We present examples on two different regions of the Visible Human model. Figure 6 shows a rectangular region of 75 x 67 x 50 mm representing part of the neck. In this case, the total number of processed organs is 49, therefore resulting in an Octree Forest of 49 components. Figure 7 shows a rectangular region of 67 x 94 x 80 mm representing part of the head. In this case, the total number of components in the Octree Forest is 32. Resolutions of the model regions presented in Figures 6 and 7 are 225 x 200 x 150 voxels and 195 x 280 x 240 voxels respectively.

The images have been generated by a client with an Intel Core i7 CPU with 4 cores at 2 GHz, 4 GB of RAM and an NVidia GeForce GT550M GPU with 2 GB of RAM.

Tables 1 and 2 show preprocess information in the two cases presented in Figures 6 and 7: a rectangular region of 75 x 67 x 50 mm in the neck with a total of 49 organ octrees and a rectangular region of 67 x 94 x 80 mm in the head with a total of 32 organ octrees. These tables display information for a subset of six organs in each case. The total number of surfels in levels n−3, n−2, n−1 and n is shown (columns 3, 2, 1 and 0 respectively) together with the total of surfels per octree level and per organ. Level n is the maximum depth of the corresponding organ octree. Per-organ preprocessing times are all below one minute on a server with an Intel Core i7 CPU at 2 GHz and with 4 GB of RAM.

Rendering and interaction snapshots with these two Octree Forests are presented in Figures 6 and 7. In both figures, the first column shows the volume with all organs, while connective tissue has been removed in the second column for the sake of clarity. Users can turn organs on and off directly, either to have a global view or to inspect a selected subset of them. The second and third rows in Figures 6 and 7 show several interaction possibilities. Users can interactively select the proper resolution level for every organ to highlight some of them at the maximum resolution, selected organs can be rendered with semi-transparent surfels, and a high-resolution sphere can be positioned to inspect some organs at maximum detail.

Rendering all organs in Figure 6 at the maximum resolution requires a total of 1.89 Msplats, while rendering them at resolution level n−3 requires a total of 23 Ksplats. In the case shown in Figure 7, the number of splats is 1.81 Msplats and 22 Ksplats respectively.

Organ                        Number of surfels per level               Total    Time (sec)
                              3        2        1         0
Parotid gland              1706     7503    32227    132460     174367      46.90
Masseter                    858     3988    17392     71467      93924      16.25
Mandible                    557     2466    10157     41299      54619      24.92
Cerebellum                  106      479     2048      8398      11070       2.53
...                         ...      ...      ...       ...        ...        ...
Intervertebral disk C3       12       32      147       724        924       0.22
Buccinator                    2       12       59       254        335       3.43
Box total                 23244   106474   459964   1898287    2494097     562.56

Table 1: Preprocessing times of six organs from Figure 6. Times are between less than one second and 47 seconds.

Organ                        Number of surfels per level               Total    Time (sec)
                              3        2        1         0
Skull                      2927    13919    59356    241746     318817      53.46
Orbicularis                1346     6828    30102    124097     162752      53.34
Masseter                   1105     4792    20115     82118     108424      14.453
Cerebrum                    445     1855     7832     32152      42439      12.92
...                         ...      ...      ...       ...        ...        ...
Mandible                    167      719     2838     11601      15377      17.35
Oblique superior             10       51      213       859       1137      12.89
Box total                 21912   100725   437877   1812313    2378778     569.91

Table 2: Preprocessing times of six organs from Figure 7. Times are between 12 and 53 seconds.

We start the progressive transmission of the organ octrees at resolution level n−3 and then successively send next levels on demand until sending the maximum resolution surfels at level n. In our two cases, level n−3 contains 23 Ksurfels and 22 Ksurfels, as already mentioned. With a compression rate of 11 bytes/surfel, this is equivalent to 253 KB and 242 KB respectively, which give an acceptable quality with a reduced data flow. Organ octrees are afterwards progressively sent to the clients and refined, and some of them will finally reach the maximum resolution level n. The transmitted data flow depends on the subset of organs selected for high resolution rendering. The worst case (transmission of all maximum resolution surfels) requires 1.89*11 = 20.8 MB in the case depicted in Figure 6 and 1.81*11 = 19.9 MB in the case in Figure 7.
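The size estimates can be reproduced from the "Box total" rows of Tables 1 and 2 with a few lines of arithmetic. The sketch below uses the exact table counts rather than the rounded Ksurfel figures quoted in the text, so the results differ slightly from the rounded values; the dictionary layout and function name are illustrative.

```python
BYTES_PER_SURFEL = 11  # 4 (plane) + 3 (color) + 4 (luminance gradient)

# Surfel counts from the "Box total" rows of Tables 1 and 2.
box_totals = {
    "Figure 6": {"n-3": 23244, "n": 1898287},
    "Figure 7": {"n-3": 21912, "n": 1812313},
}

def level_size_bytes(surfel_count):
    """Payload size of one octree level, ignoring any protocol overhead."""
    return surfel_count * BYTES_PER_SURFEL

sizes = {
    case: {level: level_size_bytes(count) for level, count in levels.items()}
    for case, levels in box_totals.items()
}

for case, levels in sizes.items():
    print(case, {level: f"{size / 1e6:.2f} MB" for level, size in levels.items()})
```

The coarsest level stays around a quarter of a megabyte in both cases, while the full-resolution level is roughly 20 MB, which is why sending it in the background during inspection matters.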

The overall application runs at interactive rates with a frame rate always above 30 fps, as shown in the accompanying video. Transmission times are reasonable and compatible with real-time interaction. In the examples in Tables 1 and 2, the amount of transmitted information is 0.25 MBytes for level n−3 and 1 MByte for level n−2, resulting in transmission times between 1 and 4 seconds in our tests. The maximum resolution requires the transmission of 20 MBytes (around 50 seconds, depending on the internet conditions), but it can be performed in parallel with real-time inspection. Users can interact with models at a resolution level of n−2 (or n−1) while the full resolution level is being received. Observe that network bandwidth limitations do not significantly affect the frame rate, as several resolutions of the inspected organs always reside in the client memory, ready for inspection and interaction.

Computing times both in preprocessing and transmission are proportional to the number of surfels at maximum resolution, which is a small fraction of the volume resolution (between 14% and 28% in our experiments). This fraction decreases as the resolution increases, based on the properties of boundary-encoding octrees.

One of the limitations of the present implementation is related to user interaction, which must still be improved in order to meet user expectations and the requirements of medical doctors. Coarse simplification at upper tree levels can result in some artifacts in regions containing thin organs, but we succeed in avoiding them by restricting the visualization to the octree levels between n−3 and n, for instance (note that the information size at level n−3 is only about 250 KBytes). We also plan to improve the rendering algorithm by including surfel clipping at the boundary of the high resolution sphere, to avoid artifacts in this region. The presented algorithms are currently being tested on a PC server and on mobile device clients.


Figure 6: Left part of the neck. The left column (a, c, e) shows all organs, while in the right column (b, d, f) only a subset of the organs has been selected. In the top row (a and b) all organs are displayed at maximum resolution. In the middle row (c and d) bones are shown at maximum resolution while the rest of the organs are displayed with surfels from upper levels of the octree. In the bottom row (e and f) the effect of the high resolution sphere interaction is shown.


Figure 7: Left part of the head. The left column (a, c, e) shows all organs, while in the right column (b, d, f) only a subset of the organs has been selected. In the top row (a and b) all organs are displayed at maximum resolution. In the middle row (c and d) the eye and the optic nerve are shown at maximum resolution while the rest of the organs are displayed with surfels from upper levels of the octree. In the bottom row (e and f) the effect of the high resolution sphere interaction is shown.




7. Conclusions and future work

We present a new technique for the progressive transmission and region-based inspection of segmented medical models. The overall scheme is based on a forest of multiresolution octrees, every segmented organ in the region of interest being preprocessed and represented by an octree of surfels which encode surfel planes, color and color gradients for non-constant color rendering. Individual octrees can be progressively transmitted to the client on demand, depending on the importance of the organs and on specific interaction queries. The progressive transmission paradigm is based on surfel compression and sequential transmission of individual levels of organ octrees in the multiresolution octree forest.

One of the key ingredients in our scheme is a compact surfel representation which is based on the efficient coding of a linear scalar field in the cubic domain of each octree node. Surfel simplification is performed bottom-up by interpolating their linear scalar fields under the assumption of volume preservation in octree nodes. Our scheme also provides a set of specific and interactive inspection tools in the client, including variable resolutions for different organs, selection of organs and a higher resolution spherical region.

In our future work we will perform user studies with medical doctors to experimentally determine the perceptual threshold and to fix the maximum allowable splat size in display coordinates under different conditions (no color gradient, luminance gradient, gradient of all color components). We plan to use LRU strategies to handle cases where the client has no more space for the higher-resolution data. We will also implement improved clipping algorithms for surfels at the boundary of the high resolution sphere, while also focusing on the design of a user-friendly client interface for selecting organs, resolutions and regions of interest. Finally, current tests will lead to future implementations on mobile device clients.

8. Acknowledgments

This work has been partially funded by the project TIN2010-20590-C02-01 of the Spanish Government. The authors thank the anonymous reviewers for their valuable comments.

References

[ABC∗04] ANDÚJAR C., BRUNET P., CHICA A., NAVAZO I., ROSSIGNAC J., VINACUA A.: Computing maximal tiles and application to impostor-based simplification. Computer Graphics Forum 23, 3 (2004), 401–410.

[ACB∗04] ANDÚJAR C., CHICA A., BRUNET P., NAVAZO I., ROSSIGNAC J., VINACUA A.: The Connected-Cubes Plane Parameterization. Tech. rep., Polytechnic University of Catalonia, 2004. http://www.lsi.upc.edu/~pere/Planes

[BGI∗13] BALSA M., GOBBETTI E., IGLESIAS J., MAKHINYA M., MARTON F., PAJAROLA R., SUTER S.: A survey of compressed GPU-based direct volume rendering state of the art report. In Eurographics STARS (2013).

[BPK05] BISCHOFF S., PAVIC D., KOBBELT L.: Automatic restoration of polygon models. ACM Transactions on Graphics 24, 4 (2005), 1332–1352.

[CBN12] CAMPOALEGRE L., BRUNET P., NAVAZO I.: Interactive Visualization of Medical Volume Models in Mobile Devices. Springer-Verlag, 2012.

[CBPS06] CALLAHAN S. P., BAVOIL L., PASCUCCI V., SILVA C. T.: Progressive volume rendering of large unstructured grids. IEEE Transactions on Visualization and Computer Graphics (2006), 1307–1314.

[CWA∗06] CHICA A., WILLIAMS J., ANDÚJAR C., BRUNET P., NAVAZO I., ROSSIGNAC J., VINACUA A.: Pressing: Smooth isosurfaces with flats from binary grids. Computer Graphics Forum 27, 1 (2006), 36–46.

[DDSD03] DÉCORET X., DURAND F., SILLION F., DORSEY J.: Billboard clouds for extreme model simplification. ACM Transactions on Graphics 22, 3 (2003), 689–696.

[DMNV12] DÍAZ J., MONCLÚS V., NAVAZO I., VÀZQUEZ P.: Adaptive cross-sections of anatomical models. Computer Graphics Forum (Proc. Pacific Graphics) 31, 7 (2012), 2155–2164.

[EBV05] ESTEVE J., BRUNET P., VINACUA A.: Approximation of a variable density cloud of points by shrinking a discrete membrane. Computer Graphics Forum 24, 4 (2005), 791–808.

[GIM12] GOBBETTI E., IGLESIAS J., MARTON F.: Covra: A compression-domain output-sensitive volume rendering architecture based on a sparse representation of voxel blocks. EuroVis 2012, Computer Graphics Forum 31, 3 (2012), 1315–1324.

[GM04] GOBBETTI E., MARTON F.: Layered point clouds: a simple and efficient multiresolution structure for distributing and rendering gigantic point-sampled models. Computers & Graphics 28, 6 (December 2004), 815–826.

[Hop96] HOPPE H.: Progressive meshes. ACM SIGGRAPH Proceedings (1996), 99–108.

[LZS∗03] LAMBERTI F., ZUNINO C., SANNA A., FIUME A., MANIEZZO M.: An accelerated remote graphics architecture for PDAs. Proc. of the 8th International Conference on 3D Web Technology. ACM (2003), 55–61.

[MDNV09] MONCLÚS V., DÍAZ J., NAVAZO I., VÀZQUEZ P.: The virtual magic lantern: An interaction metaphor for enhanced medical data inspection. VRST (2009), 119–122.

[MF12] MOVANIA M., FENG L.: Mobile visualization of biomedical volume datasets. Journal of Internet Technology and Secured Transactions (JITST) 1, 1 (2012).

[MW08] MOSER M., WEISKOPF D.: Interactive volume rendering on mobile devices. In Workshop on Vision, Modelling and Visualization VMV'08 (2008).

[NJOS12] NOGUERA J., JIMÉNEZ J., OGÁYAR C., SEGURA R.: Volume rendering strategies on mobile devices. International Conference on Computer Graphics Theory and Applications (GRAPP) (2012).

[PZvBG00] PFISTER H., ZWICKER M., VAN BAAR J., GROSS M.: Surfels: Surface elements as rendering primitives. Proc. of SIGGRAPH 2000, ACM Transactions on Graphics 19 (2000).

[TSH98] TIEDE U., SCHIEMANN T., HOEHNE K.: High quality rendering of attributed volume data. Proceedings of the IEEE Visualization Conference (1998), 255–262.

[VB12] VÀZQUEZ P., BALSA M.: Practical volume rendering in mobile devices. International Symposium on Visual Computing, Lecture Notes in Computer Science 7431 (2012), 708–718.



Rendering Relativistic Effects in Transient Imaging

Adrian Jarabo (1), Belen Masia (1, 2), Andreas Velten (2, 3), Christopher Barsi (2), Ramesh Raskar (2), Diego Gutierrez (1)

(1) Universidad de Zaragoza, (2) MIT Media Lab, (3) Morgridge Institute for Research

Abstract

We present a real-time framework which allows interactive visualization of relativistic effects for time-resolved light transport. We leverage data from two different sources: real-world data acquired with an effective exposure time of less than 2 picoseconds, using a novel imaging technique termed femto-photography, and a transient renderer based on ray-tracing. We overcome the two main limitations of existing models for relativistic effects, namely the assumption that surface irradiance is constant over time, and that all frames of reference are purely translational. We explore the effects of time dilation, light aberration, frequency shift and radiance accumulation, as an unconstrained virtual camera explores a reconstructed 3D scene depicting dynamic illumination. We modify existing models of these relativistic effects to take into account the time-resolved nature of our data, and introduce the first model of relativistic sensor rotation in computer graphics.

1. Introduction

Analyzing and synthesizing light transport is a core research topic in computer graphics, computer vision and scientific imaging [GNJJ08]. One of the most common simplifications, rarely challenged, is the assumption that the speed of light is infinite. While this is a valid assumption in most cases, it is certainly not true: light travels extremely fast, but with finite speed. In this paper, we lift this assumption and explore the consequences of dealing with time-resolved data (finite speed of light), focusing on the relativistic effects that occur when the camera moves at speeds comparable to the speed of light.

Transient imaging has recently emerged as a vibrant, exciting area of research. Being able to analyze light transport at picosecond scale has already helped gain a better understanding of the complexities of light propagation [VWJ∗12, VWJ∗13], to approximate the shape of hidden objects [VWG∗12] or reflectance properties of planar patches [NZV∗11]. In this paper, we offer a novel contribution by visualizing relativistic effects of time-varying irradiance. Beyond the pure scientific interest of advancing the field of relativistic visualization, our work has direct applications in games (see for instance OpenRelativity from the MIT Game Lab [KTS13]) and education. Additionally, it can also help set the ground to derive a time-resolved theory of light transport.

Relativistic rendering is not new [CLC96, WBE∗06]. However, our time-resolved framework implies by definition that surface irradiance is not constant in the temporal domain, so existing models must be revised and redefined. We describe here our technique to render and inspect scenes where relativistic effects take place: in particular, we address time dilation, light aberration, the Doppler effect and the searchlight effect. Moreover, no model of relativistic rotation exists in the literature, which hinders free exploration of scenes; we additionally introduce the first model of relativistic sensor rotation in computer graphics.

To obtain input data, we rely on two sources of information. On the one hand, we use the recent imaging technique by Velten and colleagues called femto-photography [VWJ∗13]. Guided by a femto-second laser as a light source, it has an effective exposure time per frame of less than 2 ps, which makes it possible to visualize the propagation of light through a scene. This is real-world captured data, which we leverage using image-based rendering techniques. Since the camera cannot be moved in Velten’s setup (please refer to Section 3 and the original paper for more details), our technique makes it possible to visualize novel viewpoints, synthesizing light transport in a physically accurate manner. On the other hand, we also employ the transient renderer by Jarabo et al. [JMG13], which allows us to create novel scenes and render simulations of time-resolved light transport. Both approaches can help gain a deeper understanding of light transport at picosecond scale.

In summary, we have developed a rendering and visual-

ization tool for transient light transport, capable of simulat-





Jarabo et al. / Rendering Relativistic Effects in Transient Imaging

ing generalized relativistic effects, freed from the restrictions

of previous works. Our contributions can be summarized as

follows:

• We revise and correct well-established concepts about relativistic rendering, to take into account that irradiance can no longer be assumed to be constant over time.

• Previous techniques were also limited to linear velocities of the (virtual) cameras. We propose the first approximate solution for the case of a rotating sensor, so the camera can be freely moved in 3D space.

• We implement a fully working prototype, which allows interactive visualization and exploration of both real and simulated data.

2. Related Work

Time-Resolved Light Transport A modified rendering equation can account for the finite speed of light and handle transient effects [Arv93, SSD08]. However, these works do not derive a practical rendering system from the proposed transient rendering framework. A fully functional time-resolved rendering system was recently presented by Jarabo and colleagues [JMG13]. Wu et al. [WWB∗12] perform a rigorous analysis of transient light transport in frequency space. They derive an analytic expression that models the information transfer between dimensions, and show that this derivation can be exploited to build a bare-sensor imaging system.

Time-resolved imaging is further analyzed by Wu et

al. [WOV∗12] to separate direct and global illumination

components in macroscopic table-top scenes. The authors

analyze the time profile for each pixel and decompose it into

direct, subsurface scattering and interreflection components.

Kirmani et al. [KHDR09] utilized global information in

time-of-flight images to infer geometries of hidden objects,

not directly visible by the camera, by using the encoded

time-of-flight of diffuse indirect illumination. This work

was further improved by Velten et al. [VWG∗12]. Material

BRDFs of distant patches were reconstructed [NZV∗11] via

light-bounce analysis from ultrafast image measurements.

Last, Velten et al. [VWJ∗12, VWJ∗13] developed femto-

photography, a technique that allows ultra-fast (in the order

of picoseconds) capture of transient light transport, by us-

ing a streak sensor, a femto-second laser, and computational

techniques. We explain this system in more detail in Sec-

tion 3, since we rely on the data it provides to render some

of the relativistic effects shown in this paper. This femto-

photography technique has inspired new approaches in tran-

sient imaging: recently, Heide et al. [HHGH13] developed

a system based on photonic mixer devices. While the hard-

ware employed is cheaper, the temporal resolution is not as

good, and the system relies on heavy optimization which can

take several hours.

Relativistic Rendering Here we discuss the most relevant

work on relativistic rendering. For a wider survey, we refer

to [WBE∗06], where the different proposed techniques for

both general and special relativistic rendering are discussed,

including their application as educational tools. Chang et

al. [CLC96] introduced the theory of Special Relativity in

the field of computer graphics. Their work accounts for

geometric and radiance transformations due to fast mov-

ing objects or camera. However, their formulation mod-

eled the searchlight and Doppler effects incorrectly; these

were later corrected by Weiskopf et al. [WKR99]. Subsequent work [WKR00] simulates relativistic effects in real captured scenes modeled with image-based techniques, by applying the relativistic transformations directly on the light

field. However, the authors assume light incoming from in-

finitely far away light sources with constant radiance, so both

the effects of distance and time-varying irradiance are ig-

nored. This allows them to make some simplifying assump-

tions about the radiance in the scene, which no longer hold

in the context of the time-resolved data we deal with. Finally, visualization approaches and games have been created with a didactic goal, aiming to help students understand relativity. The game A Slower Speed of Light, notable among these, uses the open-source toolkit OpenRelativity, which works with the Unity engine and can simulate special relativity effects [KTS13]. However, to our knowledge, they do not deal with time-varying irradiance either.

3. Time-Resolved Data

In this section we briefly introduce our two sources of time-resolved light transport data: the novel femto-photography technique of Velten et al. [VWJ∗13] allows us to capture real data, while the rendering system of Jarabo et al. [JMG13]

provides simulated results. Note that Velten et al.’s paper de-

scribes the capture setup, while this work deals with synthe-

sizing new viewpoints based on the captured data, and taking

into account the associated relativistic effects that arise. We

refer the reader to the original references for more details.

3.1. Femto-Photography

The term femto-photography [VWJ∗12, VWJ∗13] refers to a novel imaging technique that encodes the time of arrival of individual photons in a spatial coordinate of a regular sensor. The technique has an effective exposure time as short as 1.85 picoseconds, which makes it possible to image the propagation of light as it interacts with objects in a scene, opening up new and exciting possibilities in forward and inverse light transport analysis.

The system works as follows (see Figure 1 for a schematic overview): a Ti:Sapphire femto-second laser pulse is repeatedly shot against a diffuser, which reflects it into the scene as a spherical wave. Light interacts with the scene, and photons enter the camera through a horizontal slit (thus only a single scan line is imaged at a time). Within the camera, which is known as a streak camera and is synchronized with the laser pulse by means of a beam splitter, photons are converted into electrons and then deflected vertically by a time-varying voltage. In this way, photons arriving at different times are imaged onto different parts of the sensor along its vertical coordinate, effectively coding time as a spatial coordinate in the sensor. This yields one x-t streak image (see the inset in Figure 1). A rotating mirror progressively scans the whole scene along its y-coordinate, as more laser pulses are shot. This generates a 3D volume of x-y-t time-resolved data which, when visualized along the t coordinate, produces the final videos†.

Figure 1: Schematic view of the femto-photography setup. The inset shows an example streak image, as captured by the sensor. The streak camera encodes time of arrival of individual photons in the y-dimension by means of a time-varying voltage.

3.2. Transient Rendering

Jarabo et al. [JMG13] build on the classical rendering equation by introducing the time domain:

$$L(x,\omega_o,t) = L_e(x,\omega_o,t) + \int_{\Omega^+} L_i(x,\omega_i,t)\,\rho(x,\omega_i,\omega_o)\,(-\omega_i \cdot n)\,d\omega_i \qquad (1)$$

where $x$ is the point in the scene being illuminated and $n$ its normal; $\omega_i$ and $\omega_o$ are the incoming and outgoing directions, respectively; $L_e(x,\omega_o,t)$ is the emitted radiance in direction $\omega_o$ at time instant $t$; $L_i(x,\omega_i,t)$ is the incoming radiance at $x$ from direction $\omega_i$ at instant $t$; $\rho(x,\omega_i,\omega_o)$ represents the BRDF at $x$; and $\Omega^+$ is the hemisphere centered at $n$. The solution to this equation is computed by Monte Carlo ray tracing, taking into account the distance traveled by a ray from its origin to the next intersection, as well as the index of refraction $\eta$ of the medium. This affects the speed of light $v$ in the medium according to the equation $v = c/\eta$, where $c$ is the speed of light in a vacuum.

† Videos and data from scenes captured with this setup can be found online at: http://femtophotography.info
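The relation v = c/η can be illustrated with a minimal sketch of how a time-resolved ray tracer might accumulate time of flight along a path. This is not the renderer of [JMG13]; the function names `segment_time` and `path_time`, and the representation of a path as (distance, index of refraction) pairs, are assumptions made only for this example.

```python
C_VACUUM = 299_792_458.0  # speed of light in a vacuum, in m/s


def segment_time(distance_m, eta=1.0):
    """Time in seconds for light to travel distance_m in a medium of index eta."""
    v = C_VACUUM / eta  # speed of light in the medium, v = c / eta
    return distance_m / v


def path_time(segments):
    """Total time of flight for a path given as (distance_m, eta) pairs.

    Hypothetical path representation: one pair per ray segment between
    consecutive intersections.
    """
    return sum(segment_time(d, eta) for d, eta in segments)


# One meter in air (eta ~ 1) followed by one meter in glass (eta ~ 1.5):
# the second segment takes 1.5 times as long as the first.
t_total = path_time([(1.0, 1.0), (1.0, 1.5)])
```

At this scale, a path difference of about 0.3 mm in a vacuum corresponds to one picosecond of delay.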

Figure 2 shows some results for the bunny scene. A spher-

ical wavefront of light advances towards the bunny inside a

Cornell box. The first two frames show the primary wave-

front reaching the floor, the bunny and the left wall, while

some secondary fronts reflecting from the bunny appear on

the floor. The front on the left wall appears mostly white

due to dynamic range issues, since the primary wavefront

has much more energy than the secondary reflections. The

third and fourth frames show the primary wavefront past the

bunny and reaching the farthest wall, plus the rich combina-

tion of multiple reflections.

4. Relativistic Rendering

Time-resolved data allows us to explore light transport like never before, no longer being constrained by the assumption that light speed is infinite. While this is indeed a valid assumption in most cases, the possibilities that open up when analyzing the dynamics of light at picosecond resolution are fascinating.

4.1. Frames of Reference

Assuming that the geometry in the scene is known (which

can be easily acquired with a digitizer arm or from time-

of-flight data), we can synthesize new viewpoints and an-

imations of the scene by taking an image-based rendering

approach, using x-y textures from the x-y-t data cube and

projecting them onto the geometry. This allows us to visu-

alize real-world events from new, interesting angles. How-

ever, visualizing light transport events at this time scale

yields counter-intuitive results, as observed by Velten et

al. [VWJ∗13]. Due to the finite speed of light, events are

not captured in the sensor as they occur, which leads to un-

expected apparent distortions in the propagation of light.

Figure 3 illustrates this. From this observation, it follows

that two different temporal frames of reference must be em-

ployed: one for the world (when the events occur) and one

for the camera (when the events are actually captured).
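The apparent reversal of the wavefront can be checked with a small numeric sketch. The 2D layout below (a light at (2, 1), a camera at (−3, 1), floor points on y = 0, and c = 1 in arbitrary units) is an assumption chosen for illustration, not the geometry of the captured scenes.

```python
import math


def arrival_times(light, camera, floor_xs, c=1.0):
    """Return (x, world_time, camera_time) for each floor point (x, 0).

    world_time: when light from the source reaches the floor point.
    camera_time: when the bounced light from that point reaches the camera.
    """
    result = []
    for x in floor_xs:
        p = (x, 0.0)
        t_world = math.dist(light, p) / c
        t_camera = t_world + math.dist(p, camera) / c
        result.append((x, t_world, t_camera))
    return result


times = arrival_times(light=(2.0, 1.0), camera=(-3.0, 1.0),
                      floor_xs=[0.0, 1.0, 2.0])

# Order in which floor points are lit in the world frame.
world_order = [x for x, t_w, t_c in sorted(times, key=lambda r: r[1])]
# Order in which the same points are seen by the camera.
camera_order = [x for x, t_w, t_c in sorted(times, key=lambda r: r[2])]
```

In this layout `world_order` is `[2.0, 1.0, 0.0]` (right to left) while `camera_order` is `[0.0, 1.0, 2.0]` (left to right), which is why separate world and camera time frames are needed.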

As a consequence, sensor data acquired by the femto-

photography technique appears warped in the temporal do-

main, and must be time-unwarped to take into account the

finite speed of light. So for each frame in the synthesized an-

imations, we access the original warped data and apply the

following transformation [VWJ∗13]:

$$t'_{ij} = t_{ij} + \frac{z_{ij}}{c/\eta} \qquad (2)$$

where $t'_{ij}$ and $t_{ij}$ are camera and world times respectively, $z_{ij}$ is the depth from each point $(i, j)$ to the new camera position, and $\eta$ the index of refraction of the medium. Note how a naive approach based on simply sticking the textures from the first frame to the geometry through the animation would produce wrong results; the distance from each geometry point to the center of projection of the camera varies for each frame, and thus a different transformation must be applied each time to the original, warped x-y-t data (see Figure 4). We assume a pinhole model for the camera.

Figure 2: The first four images show selected frames of a time-resolved rendering for the bunny scene [JMG13]. The rightmost image shows the classic view of the scene, with all light integrated on the sensor during the simulated exposure.

Figure 3: Counter-intuitive results in time-resolved imaging. Left: photons are shot simultaneously from the light source towards the floor. Because their traveled distances are different, they reach the floor at slightly different times, and a wavefront appears traveling right to left (color-coded blue to red). Right: Since the distances to the sensor are also different, the bounced photons reach the sensor in inverse order. The result is that the wavefront imaged on the sensor travels in the reverse direction, left to right.
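As a minimal sketch of Equation 2, the following applies the world-to-camera time transformation per pixel. The function names and the representation of the per-pixel times and depths as nested lists are assumptions for illustration only.

```python
C_VACUUM = 299_792_458.0  # speed of light in a vacuum, in m/s


def world_to_camera_time(t_world, depth, eta=1.0, c=C_VACUUM):
    """Equation 2: t' = t + z / (c / eta)."""
    return t_world + depth / (c / eta)


def rewarp_frame(t_world, depths, eta=1.0, c=C_VACUUM):
    """Apply Equation 2 to 2D grids of world times and per-pixel depths.

    t_world and depths are lists of rows with matching shapes; the depths
    must be recomputed whenever the virtual camera moves.
    """
    return [
        [world_to_camera_time(t, z, eta, c) for t, z in zip(t_row, z_row)]
        for t_row, z_row in zip(t_world, depths)
    ]


# Toy units with c = 1: pixels farther from the camera are seen later.
camera_times = rewarp_frame([[0.0, 0.0]], [[1.0, 3.0]], eta=1.0, c=1.0)
```

Because the depths depend on the camera position, the same world-time data yields a different camera-time image for every new viewpoint.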

4.2. Relativistic Effects

Apart from the time-warping of data, macroscopic camera movement at pico-second time scales, like the one synthesized in Figure 4, would give rise to relativistic effects.

This requires a relativistic framework to correctly repre-

sent and visualize light traveling through the 3D scene. Al-

though simulations of relativistic effects have existed for a

while [CLC96, WBE∗06], visualizing our particular time-

resolved datasets requires departing from the common sim-

plifying assumption of constant irradiance on surfaces. As

we will see in the following paragraphs, this has direct im-

plications on how radiance gets imaged onto the sensor.

According to special relativity, light aberration, the Doppler effect, and the searchlight effect need to be taken into account when simulating motion at fast speeds. Light aberration accounts for the apparent geometry deformation caused by two space-time events measured in two reference frames moving at relativistic speeds with respect to each other. The Doppler effect produces a wavelength shift given by the Doppler factor. Last, the searchlight effect increases or decreases radiance, according to whether the observer is approaching or moving away from a scene. We modify existing models for the three effects to support time-resolved irradiance, and propose an approximate solution for the as-yet-unsolved case of camera rotation.

Figure 4: Time unwarping between camera time and world time for synthesized new views of a cube scene. Top row, left: Scene rendered from a novel view keeping the unwarped camera time from the first frame (the small inset shows the original viewpoint). Right: The same view, warping data according to the new camera position. Notice the large changes in light propagation, in particular the wavefronts on the floor not visible in the previous image. Bottom row: Isochrones visualization of the cube scene for a given virtual camera (color encodes time); from left to right: original x-y-t volume in the time frame of the capturing camera, unwarped x-y-t data in world time frame, and re-warped data for the new virtual camera. Note the striking differences between corresponding isochrones.

We build our relativistic visualization framework on the derivations by Weiskopf et al. [WKR99]. We consider two inertial frames, $O$ and $O'$, where $O'$ (the sensor) is moving with velocity $v = \beta c$ with respect to $O$, with $|\beta| \in [0, 1)$. $L$ represents radiance measured in $O$, defined by direction $(\theta,\phi)$ (defined with respect to the motion direction) and wavelength $\lambda$. The corresponding primed variables $(\theta',\phi')$ and $\lambda'$ define radiance $L'$ measured in $O'$. To obtain the modified radiance $L'$ given $L$ and the speed of the sensor, we need to apply the following equation:

$$L'(\theta',\phi',\lambda') = D^{-5}\, L\!\left(\arccos\frac{\cos\theta'+\beta}{1+\beta\cos\theta'},\ \phi',\ \frac{\lambda'}{D}\right) \qquad (3)$$

where $D = \gamma(1+\beta\cos\theta')$ and $\gamma = 1/\sqrt{1-\beta^2}$. This equation accounts for all three factors: light aberration, the Doppler effect, and the searchlight effect. However, it cannot model explicitly the effect of special relativity on time-resolved irradiance. In the following paragraphs we explain each effect separately, and discuss the modifications needed to handle time-resolved irradiance.
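Equation 3 can be written down directly as a short sketch. The radiance `L` is passed in as a hypothetical callable of (θ, φ, λ) in the world frame; this interface and the function names are assumptions for illustration, not part of the implementation described in Section 5.

```python
import math


def lorentz_gamma(beta):
    """gamma = 1 / sqrt(1 - beta^2), for |beta| < 1."""
    return 1.0 / math.sqrt(1.0 - beta * beta)


def doppler_factor(beta, theta_prime):
    """D = gamma * (1 + beta * cos(theta'))."""
    return lorentz_gamma(beta) * (1.0 + beta * math.cos(theta_prime))


def transformed_radiance(L, beta, theta_prime, phi_prime, lambda_prime):
    """Equation 3: radiance L' seen by the moving sensor.

    L is a callable L(theta, phi, wavelength) giving radiance in frame O.
    """
    D = doppler_factor(beta, theta_prime)
    cos_theta = (math.cos(theta_prime) + beta) / (1.0 + beta * math.cos(theta_prime))
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding error
    theta = math.acos(cos_theta)
    return D ** -5 * L(theta, phi_prime, lambda_prime / D)


# Constant radiance of 2.0, sensor at beta = 0.5, looking along the motion axis.
value = transformed_radiance(lambda th, ph, lam: 2.0, 0.5, 0.0, 0.0, 670.0)
```

With β = 0 the factor D equals 1 and the expression reduces to L′ = L, as expected for a static sensor.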

Time dilation: Breaking the assumption of constant irradiance means that we cannot ignore the effect of time dilation [Ein61]. Time dilation is directly related to Lorentz contraction, and is defined as the difference in elapsed time ∆t between two events observed in different inertial frames; for our world and camera frames of reference, this translates into ∆t′ = γ∆t. This means that time in these two frames advances at different rates, making time in the stationary frame (the world) advance faster than in the moving frame (the camera). Thus, we need to keep track of both world time t and camera time t′, since they differ depending on the motion speed.
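A short sketch of the relation ∆t′ = γ∆t as stated above; the function names are assumptions for illustration.

```python
import math


def lorentz_gamma(beta):
    """gamma = 1 / sqrt(1 - beta^2), for |beta| < 1."""
    return 1.0 / math.sqrt(1.0 - beta * beta)


def dilated_interval(dt, beta):
    """Apply dt' = gamma * dt for a relative speed v = beta * c."""
    return lorentz_gamma(beta) * dt


# At beta = 0.6, gamma is 1.25, so an interval of 2 ps maps to 2.5 ps.
dt_prime = dilated_interval(2.0, 0.6)
```

For everyday speeds γ is indistinguishable from 1, which is why the two clocks only need to be tracked separately at relativistic velocities.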

Light aberration: An easy example to understand light

aberration is to visualize how we see rain drops when travel-

ing on a speeding train. When the train is not moving, rain-

drops fall vertically; but as the train picks up speed, rain-

drop trajectories become increasingly diagonal as a function

of the train’s speed. This is because the speed of the train

is comparable with the speed of raindrops. A similar phe-

nomenon occurs with light if moving at relativistic speeds.

However, as opposed to rain drops, relativistic light aberra-

tion cannot be modeled with classical physics aberration; the

Lorentz transformation needs to be applied instead.

Light aberration is computed by transforming θ′ and φ′

with the following equations, which provide the geometric

transformation between two space-time events measured in

two reference frames which move at relativistic speeds with

respect to each other:

$$\cos\theta' = \frac{\cos\theta-\beta}{1-\beta\cos\theta} \qquad (4)$$

$$\phi' = \phi \qquad (5)$$

The end result is that light rays appear curved, with more

curvature as velocity increases. Given this curvature, light

rays reaching the sensor from behind the camera become

visible. Finally, as β approaches 1, and thus v ≈ c, most in-

coming light rays are compressed towards the motion direc-

tion; this makes the scene collapse into a single point as the

camera moves towards it (note that this produces the wrong

impression that the camera is moving away from the scene).

The first two rows in Figure 5 show the effects of light aber-

ration with increasing velocity as the sensor moves at rel-

ativistic speeds, towards and away from the scene respec-

tively.
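Equations 4 and 5 translate into a few lines of code. The sketch below also includes the inverse mapping that appears inside Equation 3; the function names are assumptions for illustration.

```python
import math


def _clamp_unit(x):
    """Clamp to [-1, 1] to protect acos from rounding error."""
    return max(-1.0, min(1.0, x))


def aberrate(theta, phi, beta):
    """Equations 4 and 5: map a direction (theta, phi) in O to (theta', phi') in O'."""
    cos_tp = (math.cos(theta) - beta) / (1.0 - beta * math.cos(theta))
    return math.acos(_clamp_unit(cos_tp)), phi


def unaberrate(theta_prime, phi_prime, beta):
    """Inverse mapping, as used inside Equation 3."""
    cos_t = (math.cos(theta_prime) + beta) / (1.0 + beta * math.cos(theta_prime))
    return math.acos(_clamp_unit(cos_t)), phi_prime


# A direction perpendicular to the motion (theta = 90 degrees) at beta = 0.5
# maps to cos(theta') = -0.5, that is, theta' = 120 degrees.
theta_p, phi_p = aberrate(math.pi / 2, 0.3, 0.5)
```

The azimuth φ is unchanged, so the distortion is rotationally symmetric around the motion direction.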

Doppler effect: The Doppler effect is better known for

sound, and it is not a phenomenon restricted to relativistic

velocities. In our case, the Doppler effect alters the observed

frequency of the captured events in the world when seen by

a fast-moving camera, which produces a wavelength shift, as

defined by the Doppler factor D:

λ′ = Dλ (6)

The overall result is a color shift as a function of the ve-

locity of the sensor relative to the scene. Somewhat less

known, the Doppler effect also creates a perceived speed-

up (or down, depending on the direction of camera motion)

of the captured events. This means that the frame rate of the

time-varying irradiance f in world frame is Doppler shifted,

making the perceived frame rate f ′ in camera frame become

f ′ = f/D. Figure 5 (third row) shows an example of the

Doppler effect.
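Both consequences of the Doppler factor, the wavelength shift of Equation 6 and the perceived frame rate f′ = f/D, can be sketched as follows. The function names are assumptions for illustration.

```python
import math


def doppler_factor(beta, theta_prime):
    """D = gamma * (1 + beta * cos(theta')), with gamma = 1 / sqrt(1 - beta^2)."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return gamma * (1.0 + beta * math.cos(theta_prime))


def shifted_wavelength(wavelength, beta, theta_prime):
    """Equation 6: lambda' = D * lambda."""
    return doppler_factor(beta, theta_prime) * wavelength


def perceived_frame_rate(frame_rate, beta, theta_prime):
    """Perceived frame rate in the camera frame: f' = f / D."""
    return frame_rate / doppler_factor(beta, theta_prime)


# The 670 nm laser used for visualization, at beta = 0.5 and theta' = 0:
# D = sqrt(3), so the wavelength is scaled by about 1.73.
lam_prime = shifted_wavelength(670.0, 0.5, 0.0)
```

Since D depends on θ′, the shift varies across the image, which produces the color gradients visible in the third row of Figure 5.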

Searchlight effect: Due to the searchlight effect, photons from several instants are captured at the same time differential, in part as a consequence of the Doppler shift on the camera’s perceived frame rate. This results in increased (if the observer is approaching the scene) or decreased (if the observer is moving away) brightness (see Figure 5, bottom row):

$$L'(\theta',\phi',\lambda') = D^{-5} L(\theta,\phi,\lambda) \qquad (7)$$

Intuitively, continuing with our previous rain analogy, it is similar to what occurs in a vehicle driving in the rain: the front windshield will accumulate more water than the rear windshield. For our time-varying streak data, this means that irradiance from several frames in world time interval $dt$ is integrated over the same camera differential time $dt'$, such that $dt = dt'/D$. Note that the $D^{-5}$ factor is only valid for the case in which the directions of the velocity vector $v$ and the normal to the detector are parallel. We later show how to approximate a rotation of the sensor.
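The temporal integration described above can be sketched for a single pixel. Here `profile` is a hypothetical list of irradiance samples taken at uniform world-time spacing `frame_dt`; the brute-force loop only illustrates the relation dt = dt′/D and is not the pre-integrated lookup described in Section 5. Any subsequent scaling of the integrated value is left out.

```python
def world_interval(dt_camera, D):
    """World-time interval covered by one camera time differential: dt = dt' / D."""
    return dt_camera / D


def integrate_window(profile, frame_dt, t_start, dt_world):
    """Sum the samples whose world time lies in [t_start, t_start + dt_world).

    Sample i of profile is assumed to be taken at world time i * frame_dt.
    """
    t_end = t_start + dt_world
    return sum(v for i, v in enumerate(profile) if t_start <= i * frame_dt < t_end)


# With D = 0.5, a camera differential of 2 time units spans 4 world-time units,
# so four consecutive world frames contribute to the same camera frame.
profile = list(range(10))
total = integrate_window(profile, 1.0, 2.0, world_interval(2.0, 0.5))
```

In this toy case the window covers samples 2 through 5, so more world frames are accumulated into one camera frame as D decreases.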

Finally, Figures 6 and 7 show the result of combining

all these relativistic effects, both for the cube scene (data

captured with femto-photography techniques) and the bunny

scene (simulated data by rendering) respectively. The laser

wavelength is set at 670 nm for visualization purposes. We

refer the reader to the supplementary videos to see the full

animations.

4.3. Relativistic Rotation

Providing free navigation of a scene depicting time-resolved light transport implies that the viewers should be allowed to rotate the camera. However, there is no universally accepted theory of relativistic rotation [RR04]. We propose a suitable approximation based on limiting the rotation to very small angles per frame, so the differential rotation of the camera’s viewing direction between frames can be neglected. However, for non-infinitesimal sensors this small rotation causes the sensor’s differential surfaces to move at different speeds: it creates a continuous linear velocity field Ψ on the sensor, with a zero-crossing at the axis of rotation.

Figure 5: Relativistic effects shown separately for the cube scene. First row: Distortion due to light aberration as the camera moves towards the scene at different velocities, with β = 0, 0.3, 0.6, 0.9, 0.99. We assume a laser wavelength of 670 nm for visualization purposes. Second row: The same effect as the sensor moves away from the scene, with the opposite velocity from the previous row. Notice how in both cases light aberration produces counter-intuitive results as the camera appears to be moving in the opposite direction. Third row: Doppler effect, showing the shift in color as a consequence of the frequency shift of light reaching the sensor, with β = 0, 0.15, 0.25, 0.35, 0.50, 0.55. Fourth row: Searchlight effect, resulting in an apparent increase in brightness as the speed of the approaching camera increases, with β = 0, 0.2, 0.3, 0.4, 0.5 (simulated laser at 508 nm). All images have been tone-mapped to avoid saturation.

To simulate the rotation of the camera we therefore first divide the sensor $S$ into different areas $s \in S$. Our approximation effectively turns each of them into a different translational frame, with linear velocity $\psi_s$. Then, for each $s$ we render the scene applying the novel relativistic transformations introduced in this section, with a different $\beta_s$ for each $s$ (trivially obtained from an input $\beta$ measured at the edge of the sensor). As a result, the incoming radiance is deformed differently depending on the position on the sensor where it is imaged. Figure 8 shows an example, where the sensor is rotating clockwise.
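One way to obtain the per-area values of β is sketched below. It assumes that the rotation axis passes through the center of the sensor, that the sensor is split into equal-width strips, and that the velocity grows linearly with the signed distance to the axis (as for a rigid rotation); these choices and the function name are assumptions for illustration.

```python
def beta_field(beta_edge, n_areas):
    """Per-strip beta values for a sensor rotating about its central axis.

    beta_edge is the value measured at the edge of the sensor. Strip i is
    assigned the velocity at its center, which varies linearly from
    -beta_edge to +beta_edge and crosses zero at the axis.
    """
    betas = []
    for i in range(n_areas):
        # Signed, normalized position of the strip center in (-1, 1).
        offset = 2.0 * (i + 0.5) / n_areas - 1.0
        betas.append(beta_edge * offset)
    return betas


# Four strips with beta = 0.8 at the edge.
betas = beta_field(0.8, 4)
```

Each strip is then rendered as its own translational frame, with opposite signs of β on either side of the axis.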

5. Implementation

Our implementation allows for real-time visualization of relativistic effects, both from real and simulated data. It is implemented in OpenGL as a stand-alone application, taking as input the reconstructed geometry of the scene, as well as the time-resolved data. The system is based on classic image-based rendering (IBR) techniques, where the shading of the surface is modeled by the images projected over the surface.

Figure 6: Relativistic phenomena for the cube scene (real data) including light aberration, Doppler effect and the searchlight effect, as the camera approaches the scene at increasing relativistic velocities v = βc (with β increasing from 0 to 0.77).

Figure 7: Relativistic phenomena for the bunny scene (simulated data) including light aberration, Doppler effect and the searchlight effect, as the camera approaches the scene at increasing relativistic velocities v = βc (with β increasing from 0.2 to 0.9). Note that we transform the RGB computed radiance into luminance.

Figure 8: Relativistic rotation. Left: assuming that the rotation angle θ can be neglected between frames, we model the rotation as a continuous linear velocity field Ψ on the sensor, so each differential area is assigned a different velocity $\psi_s$. As a result, different relativistic transformations are applied to the scene depending on the position on the sensor. The rest of the frames show the effects of a clockwise rotation of the sensor, with β = 0, 0.4, 0.8, 0.99 (measured at the edge of the sensor). The small inset shows the original scene.

In our case, we use x-y images from the x-y-t data cube to shade the geometry. The cube is stored as a 3D texture on the GPU in world time coordinates. This allows us to apply time-warping to adapt it to the new viewpoint at render time, by simply applying the transformation defined in Equation 2 (see Section 4.1).
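The render-time lookup can be sketched on the CPU as the inverse of Equation 2: for a given camera time and per-pixel depth, compute the world time and select the corresponding slice of the data cube. The function name, the uniform frame spacing `frame_dt`, and the clamping at the ends of the cube are assumptions; the actual application performs this lookup on the GPU.

```python
import math

C_VACUUM = 299_792_458.0  # speed of light in a vacuum, in m/s


def world_frame_index(t_camera, depth, frame_dt, n_frames, eta=1.0, c=C_VACUUM):
    """Index of the world-time slice seen at camera time t_camera.

    Inverts Equation 2 (t = t' - z / (c / eta)) and clamps the result to the
    valid range of slices [0, n_frames - 1].
    """
    t_world = t_camera - depth / (c / eta)
    index = math.floor(t_world / frame_dt)
    return min(max(index, 0), n_frames - 1)


# Toy units with c = 1: at camera time 10, a point at depth 4 shows the
# world-time slice for t = 6, which is slice 3 when slices are 2 units apart.
idx = world_frame_index(10.0, 4.0, frame_dt=2.0, n_frames=100, c=1.0)
```

Storing the cube in world time means that only the depth term changes when the virtual camera moves.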

Due to light aberration, the geometry viewed from the camera is distorted. This distortion causes straight lines to become curved, so the geometry has to be re-tessellated. Image-space warping, which has been used in many scenarios [CSHD11, TDR∗12, MWA∗13] and may appear as an alternative, is not viable here because of the large extent of the deformations, which makes well-known problems of warping such as disocclusions clearly apparent. Our implementation performs the re-tessellation off-line on the CPU, but it would be straightforward to tessellate on the GPU on the fly. Then, at render time, each vertex is transformed according to Equation 3.

The Doppler effect is introduced by modifying the wavelength of the outgoing illumination from the surfaces. To avoid the complexity of a full-fledged spectral renderer, we assume light with energy in only one wavelength of the spectrum. To display radiance we use a simple wavelength-to-RGB conversion encoded as a 1D texture. Wavelengths outside the visible spectrum are displayed as gray-scale values.

Finally, when modeling the searchlight effect, we avoid the straightforward approach of accessing all frames in the streak data cube bounded by dt and integrating them. This would require several accesses to the 3D texture, which would hinder interactivity. Instead, we pre-integrate irradiance values in the temporal domain, and use anisotropic mipmapping to access the pre-integrated irradiance values, using dt to select the mipmap level in the time dimension.
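The idea of temporal pre-integration can be illustrated on the CPU for a single pixel. The sketch below builds a mipmap-like pyramid of pairwise averages along time and selects a level from the number of frames spanned by dt. The function names, the padding of odd-length levels, and the rounding to the nearest power-of-two window are assumptions; the actual system relies on GPU anisotropic mipmapping of the 3D texture.

```python
import math


def build_time_mips(profile):
    """Pyramid of pairwise averages along time; level 0 is the raw profile."""
    levels = [list(profile)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        if len(prev) % 2:  # pad odd-length levels by repeating the last sample
            prev = prev + [prev[-1]]
        levels.append([(prev[i] + prev[i + 1]) / 2 for i in range(0, len(prev), 2)])
    return levels


def integrated_irradiance(levels, frame_index, frames_in_window):
    """Approximate the sum over a window of frames with a single lookup.

    The level is chosen so that one texel covers about frames_in_window
    frames; the stored average is scaled back to a sum.
    """
    level = round(math.log2(max(frames_in_window, 1)))
    level = max(0, min(len(levels) - 1, level))
    texel = min(frame_index >> level, len(levels[level]) - 1)
    return levels[level][texel] * (2 ** level)


levels = build_time_mips([1, 2, 3, 4, 5, 6, 7, 8])
# A window of 4 frames starting at frame 4 covers the samples 5, 6, 7, 8.
window_sum = integrated_irradiance(levels, 4, 4)
```

A single lookup replaces the per-frame loop, at the cost of restricting windows to aligned power-of-two spans in this simplified version.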

6. Conclusions and Future Work

In this paper we visualize light transport effects from an en-

tirely new perspective, no longer constrained by the assump-

tion of infinite speed of light. We hope this will spur future

research and help to better understand the complex behav-

ior of time-resolved interactions between light and matter.

We have used real data from the recent femto-photography

technique [VWJ∗13], as well as simulation data produced by

a physically-based ray tracing engine especially designed to

support transient rendering [JMG13].

To visualize this data, we have developed an interactive image-based rendering application that allows free navigation through the reconstruction of the captured scenes, including physically-based relativistic effects due to fast camera motion. We have introduced, for the first time in computer graphics, the modified equations necessary to render surfaces when irradiance is not constant over time, as well as an approximate solution for the case of rotation, for which a definitive solution does not exist in the physics literature.

Of course there is plenty of exciting future work ahead.

Our current implementation assumes Lambertian surfaces,

so the viewing angle with respect to the normal has no influ-

ence in the result. This assumption can be relaxed by using

more sophisticated IBR techniques e.g. [BG01]. Addition-

ally, right now we only use radiance as captured by the sen-

sor. When camera movement reveals surfaces which were

originally occluded, we simply render them black. How-

ever, the use of time-resolved photographic techniques has

already demonstrated promising results at recovering hid-

den information, including both geometry [VWG∗12] and

a parametric model of reflectance [NZV∗11]. A promising

avenue of research we are already working on involves gen-

eralizing these seminal works to be able to obtain both ge-

ometry and reflectance at the same time for hidden objects.

Acknowledgements

This research has been funded by the European Commis-

sion, Seventh Framework Programme, through the projects

GOLEM (Marie Curie IAPP, grant agreement no.: 251415)

and VERVE (Information and Communication Technolo-

gies, grant agreement no.: 288914), the Spanish Ministry

of Science and Technology (TIN2010-21543), by the Media

Lab Consortium Members, MIT Lincoln Labs and the Army

Research Office through the Institute for Soldier Nanotech-

nologies at MIT. Belen Masia was additionally funded by

an FPU grant from the Spanish Ministry of Education and

by an NVIDIA Graduate Fellowship. Ramesh Raskar was

supported by an Alfred P. Sloan Research Fellowship and a

DARPA Young Faculty Award.

References

[Arv93] ARVO J.: Transfer equations in global illumination. In Global Illumination, SIGGRAPH ’93 Course Notes (1993).

[BG01] BOIVIN S., GAGALOWICZ A.: Image-based rendering of diffuse, specular and glossy surfaces from a single image. In Proceedings of the 28th annual conference on Computer graphics and interactive techniques (2001), SIGGRAPH ’01, pp. 107–116.

[CLC96] CHANG M.-C., LAI F., CHEN W.-C.: Image shading taking into account relativistic effects. ACM Trans. Graph. 15, 4 (Oct. 1996), 265–300.

[CSHD11] CHAURASIA G., SORKINE-HORNUNG O., DRETTAKIS G.: Silhouette-aware warping for image-based rendering. Computer Graphics Forum (Proceedings of the Eurographics Symposium on Rendering) 30, 4 (2011).

[Ein61] EINSTEIN A.: Relativity: the special and the general theory. Crown Publishers, 1961.

[GNJJ08] GUTIERREZ D., NARASIMHAN S., JENSEN H., JAROSZ W.: Scattering. In ACM SIGGRAPH Asia Courses, 18 (2008).

[HHGH13] HEIDE F., HULLIN M., GREGSON J., HEIDRICH W.: Low-budget transient imaging using photonic mixer devices. ACM Trans. Graph. 32, 4 (2013).

[JMG13] JARABO A., MASIA B., GUTIERREZ D.: Transient Rendering and Relativistic Visualization. Tech. Rep. TR-01-2013, Universidad de Zaragoza, April 2013.

[KHDR09] KIRMANI A., HUTCHISON T., DAVIS J., RASKAR R.: Looking around the corner using transient imaging. In ICCV (2009).

[KTS13] KORTEMEYER G., TAN P., SCHIRRA S.: A slower speed of light: Developing intuition about special relativity with games. In Proceedings of the International Conference on the Foundations of Digital Games (FDG) (2013).

[MWA∗13] MASIA B., WETZSTEIN G., ALIAGA C., RASKAR R., GUTIERREZ D.: Display adaptive 3D content remapping. Computers & Graphics (2013).

[NZV∗11] NAIK N., ZHAO S., VELTEN A., RASKAR R., BALA K.: Single view reflectance capture using multiplexed scattering and time-of-flight imaging. ACM Trans. Graph. 30 (Dec. 2011), 171:1–171:10.

[RR04] RIZZI G., RUGGIERO M. L.: Relativity in Rotating Frames. Kluwer Academic, 2004.

[SSD08] SMITH A., SKORUPSKI J., DAVIS J.: Transient Rendering. Tech. Rep. UCSC-SOE-08-26, School of Engineering, University of California, Santa Cruz, February 2008.

[TDR∗12] TEMPLIN K., DIDYK P., RITSCHEL T., MYSZKOWSKI K., SEIDEL H.-P.: Highlight microdisparity for improved gloss depiction. ACM Trans. Graph. 31, 4 (July 2012), 92:1–92:5.

[VWG∗12] VELTEN A., WILLWACHER T., GUPTA O., VEERARAGHAVAN A., BAWENDI M. G., RASKAR R.: Recovering three-dimensional shape around a corner using ultrafast time-of-flight imaging. Nature Communications, 3 (July 2012).

[VWJ∗12] VELTEN A., WU D., JARABO A., MASIA B., BARSI C., LAWSON E., JOSHI C., GUTIERREZ D., BAWENDI M. G., RASKAR R.: Relativistic ultrafast rendering using time-of-flight imaging. In ACM SIGGRAPH 2012 Talks (2012).

[VWJ∗13] VELTEN A., WU D., JARABO A., MASIA B., BARSI C., JOSHI C., LAWSON E., BAWENDI M., GUTIERREZ D., RASKAR R.: Femto-photography: Capturing and visualizing the propagation of light. ACM Trans. Graph. 32, 4 (2013).

[WBE∗06] WEISKOPF D., BORCHERS M., ERTL T., FALK M., FECHTIG O., FRANK R., GRAVE F., KING A., KRAUS U., MULLER T., NOLLERT H.-P., RICA MENDEZ I., RUDER H., SCHAFHITZEL T., SCHAR S., ZAHN C., ZATLOUKAL M.: Explanatory and illustrative visualization of special and general relativity. IEEE Transactions on Visualization and Computer Graphics 12, 4 (July 2006), 522–534.

[WKR99] WEISKOPF D., KRAUS U., RUDER H.: Searchlight and doppler effects in the visualization of special relativity: A corrected derivation of the transformation of radiance. ACM Trans. Graph. 18, 3 (1999).

[WKR00] WEISKOPF D., KOBRAS D., RUDER H.: Real-world relativity: Image-based special relativistic visualization. In IEEE Visualization (2000), pp. 303–310.

[WOV∗12] WU D., O’TOOLE M., VELTEN A., AGRAWAL A., RASKAR R.: Decomposing global light transport using time of flight imaging. In IEEE Computer Vision and Pattern Recognition (2012), pp. 366–373.

[WWB∗12] WU D., WETZSTEIN G., BARSI C., WILLWACHER T., O’TOOLE M., NAIK N., DAI Q., KUTULAKOS K., RASKAR R.: Frequency Analysis of Transient Light Transport with Applications in Bare Sensor Imaging. In European Conference on Computer Vision 2012 (2012).



Session 4

Animating Objects and Characters

Anisotropic Strain Limiting

F. Hernandez, G. Cirio, A.G. Perez, and M.A. Otaduy

URJC Madrid, Spain

Abstract

Many materials, such as textiles or finger flesh, exhibit highly nonlinear elastic behavior. An efficient way of enforcing the nonlinearity of these materials is through strain-limiting constraints, which are often the model of choice in computer graphics. Strain limiting models highly nonlinear stiff materials by eliminating degrees of freedom from the computations and enforcing a set of constraints instead. However, many nonlinear elastic materials, such as composites, wood or flesh, exhibit anisotropic behaviors, with different material responses depending on the deformation direction. This anisotropic behavior has not been addressed in the past in the context of strain limiting, and naïve approaches, such as applying a different constraint on each component of the principal axes of deformation, produce unrealistic results. In this paper, we enable anisotropic behaviors when using strain-limiting constraints to model nonlinear elastic materials. We compute the limits for each principal axis of deformation through the rotation and hyperbolic projection of the deformation limits defined in the global reference frame. The limits are used to formulate the strain-limiting constraints, which are then seamlessly combined with frictional contact constraints in a standard constrained dynamics solver.

Categories and Subject Descriptors (according to ACM CCS): I.3.5 [Computer Graphics]: Physically based modeling

1. Introduction

Highly nonlinear elastic materials, such as flesh and fabrics, can be modeled very accurately using hyperelasticity. However, hyperelastic models exhibit very high numerical stiffness, which requires very small simulation time steps. An efficient alternative to hyperelasticity is the use of strain-limiting constraints. In essence, strain limiting eliminates degrees of freedom from the computations and, in exchange, enforces a set of constraints. Therefore, strain-limiting methods enable larger time steps, and shift the complexity to the enforcement of constraints. They are often the model of choice for highly nonlinear elasticity in computer graphics [Pro95, BMF03, TPS09].

Many nonlinear stiff materials exhibit anisotropic behaviors. Wood, for instance, has different deformation and strength properties along three clearly defined directions: longitudinal (parallel to the grain), radial (across the growth rings) and tangential (tangent to the growth rings). Muscles are anisotropic, with different properties according to the direction of the muscular fibers. When modeling heterogeneous objects through a single mesh, such as a human finger, the presence of flesh, skin and bones generates a highly nonlinear and anisotropic behavior, with different amounts of deformation depending on the position and direction of the applied loads.

This anisotropic behavior has not been addressed in the past in the context of constraint-based strain limiting. While setting common limits for all possible directions is a solved problem, it is not clear how to set different limits for arbitrary directions. There are simple and straightforward approaches used in other contexts to model anisotropy, such as projecting the principal deformations onto orthogonal directions, or computing a linear interpolation of limits defined for fixed orthogonal directions. However, these solutions produce unrealistic results due to over- or under-constrained axes.

In this paper, we introduce a novel hyperbolic projection function to compute stretch and compress limits along any deformation direction, and formulate the strain-limiting constraints based on this interpolation. Since we enforce the constraints following a constrained optimization formulation, we show how to compute the Jacobians of the constraints w.r.t. the generalized coordinates of the system. Strain-limiting and frictional contact constraints are then seamlessly combined in a standard constrained dynamics solver. We compare our approach to naïve solutions and different approaches found in the literature, and show that our approach produces predictable and more realistic results.

2. Related Work

Strain limiting was initially applied to cloth simulation based on the mass-spring model [Pro95, BMF03], and later extended to finite element methods [TPS09], where strain is measured and later limited by computing a correcting velocity vector that enforces the strain limits on each component. Wang et al. [WOR10] propose an approach independent from the underlying parametrization by computing the principal strains of each mesh element, which are later constrained to predefined limits in an isotropic fashion. They also improve the convergence of relaxation compared to [TPS09] by following a multi-resolution scheme. Principal strains are also computed for the simulation of invertible hyperelastic materials [ITF04], and gradients of principal strains are needed for robust implicit integration of such hyperelastic materials [SZL∗11]. Recently, Perez et al. [PCH∗13] proposed to directly constrain the deformation tensor, and satisfy strain-limiting constraints using a Lagrange-multiplier formulation. Such a formulation leverages implicit integration, which makes the relaxation steps global and improves convergence, and treats strain-limiting constraints just like other constraints such as contact, allowing them to be solved simultaneously using standard solvers.

The anisotropic behavior of real-world hyperelastic materials has been scarcely addressed in the past in the context of strain limiting, yet many materials exhibit different material responses depending on the deformation direction. Anisotropic behaviors are hard to implement in edge-based strain-limiting approaches [Pro95, BMF03], since edges need to be aligned with the deformation direction that is being constrained, requiring extensive remeshing. In continuum-based approaches, Thomaszewski et al. [TPS09] use different limits for each strain value component of a cloth simulation (weft, warp and shear strains). With this approach, limits and strain values are always defined on undeformed axes, hence they do not distinguish well the various deformation modes under large deformations. Picinbono et al. [PDA03] allow transverse anisotropic strain limiting (with a transverse and a radial privileged direction) by adding an energy term to a hyperelasticity formulation, penalizing stretch deformations in the transverse direction. This formulation does not suffer from the same problems as full anisotropy, since strain limiting is only enforced on one axis, the radial axis being free to deform. Therefore, no interpolation is required, but only a projection of the strain tensor along the transverse direction.

Anisotropic behaviors can also be found in other dynamic phenomena. For instance, in the context of anisotropic fracture propagation, Allard et al. [AMC09] define two fracture stress thresholds in reference orthogonal directions. In order to define the threshold for other directions, they interpolate between the reference thresholds based on the angle between directions, and favor directions close to the reference by using a peak function with a controllable steepness.

In Wang et al. [WOR10], as well as our previous approach [PCH∗13], strain limiting is achieved by constraining principal strains with given maximal and minimal values. These two approaches are therefore isotropic. In order to make them anisotropic, in this paper we design a novel hyperbolic projection function for stretch and compress limits for any deformation direction, and we take into account the resulting constraint formulation in our implicit solver.

3. Formulation of Anisotropic Strain Limiting

In this section, we present our formulation of anisotropic strain limiting using a hyperbolic projection method. We first recall the formulation of strain limiting, which limits the principal axes of deformation inside each tetrahedron. We then define the problem of computing strain limits along arbitrary directions, and present our solution using hyperbolic projection. We also formulate the strain-limiting constraints using these anisotropic limits and describe the computation of constraint Jacobians, necessary for the constrained optimization solver.

3.1. Basic Formulation of Strain Limiting

As the underlying elasticity model, we use a linear co-rotational strain formulation [MG04] with a linear Hookean material model. We discretize the continuum elasticity equations using the finite element method (FEM) and a tetrahedral mesh with linear basis functions. With these assumptions, the strain and stress tensors are constant inside each tetrahedral element.

Given the four nodes $x_1, x_2, x_3, x_4$ of a tetrahedral element, we define its volume matrix

$$X = \begin{pmatrix} x_1 - x_4 & x_2 - x_4 & x_3 - x_4 \end{pmatrix}. \quad (1)$$

For convenience, we express the inverse of the rest-state volume matrix based on its rows:

$$X_0^{-1} = \begin{pmatrix} r_1 \\ r_2 \\ r_3 \end{pmatrix}. \quad (2)$$

It is also convenient to define a fictitious row $r_4 = -(r_1 + r_2 + r_3)$.

Using the volume matrix, the deformation gradient $G = \partial x / \partial x_0$ of a tetrahedron can be computed as

$$G = X X_0^{-1}. \quad (3)$$
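Eqs. (1)-(3) can be illustrated with a minimal Python sketch. It is not the implementation used in the paper: the helper names, the row-major list-of-lists storage and the degeneracy tolerance are assumptions made for this example, and the test element is hypothetical.

```python
def sub(a, b):
    """Component-wise difference of two 3D points."""
    return [a[k] - b[k] for k in range(3)]


def volume_matrix(x1, x2, x3, x4):
    """Eq. (1): 3x3 matrix whose columns are x1-x4, x2-x4 and x3-x4."""
    cols = [sub(x1, x4), sub(x2, x4), sub(x3, x4)]
    # Transpose the list of columns into row-major storage.
    return [[cols[c][r] for c in range(3)] for r in range(3)]


def inverse3(m):
    """Inverse of a 3x3 matrix through the adjugate formula."""
    a, b, c = m[0]
    d, e, f = m[1]
    g, h, i = m[2]
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:  # assumed tolerance for a degenerate element
        raise ValueError("degenerate tetrahedron")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def matmul3(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


def deformation_gradient(rest_nodes, current_nodes):
    """Eq. (3): G = X * X0^{-1}; the rows of X0^{-1} are r1, r2, r3 of Eq. (2)."""
    x0_inv = inverse3(volume_matrix(*rest_nodes))
    return matmul3(volume_matrix(*current_nodes), x0_inv)


# Hypothetical element: the rest tetrahedron is stretched by 10% along the first axis.
rest = [(2.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, 0.0)]
current = [(2.2, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, 0.0)]
G = deformation_gradient(rest, current)
print(G)
```

In a simulation, the inverse rest-state matrix would typically be precomputed once per element, since only the current node positions change between steps.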




Following Perez et al. [PCH∗13], we limit strain effectively by limiting the deformation gradient of each tetrahedron in the finite element mesh. To this end, we compute a singular value decomposition (SVD) of the deformation gradient of each tetrahedron:

$$G = U S V^T \;\Rightarrow\; S = \begin{pmatrix} s_1 & 0 & 0 \\ 0 & s_2 & 0 \\ 0 & 0 & s_3 \end{pmatrix} = U^T G V, \quad (4)$$

where the singular values $s_1, s_2, s_3$ capture deformations along principal axes. $U$ and $V$ are rotation matrices, and $S$ is a scaling matrix. Unit singular values in all directions (i.e., $s_i = 1$) imply no deformation. We enforce strain limiting by applying a lower limit $s_{min}$ (i.e., compression constraint) and an upper limit $s_{max}$ (i.e., stretch constraint) on each singular value of the deformation gradient:

$$s_{min} \le s_i \le s_{max}. \quad (5)$$

3.2. Definition of Strain Limits

In the isotropic case, computing the limits for any principal axis of deformation is straightforward, since the limits are all the same no matter the direction. In the anisotropic case, however, limits and deformation values are defined on different sets of axes. Stretch and compress limits are defined on each axis of the global reference frame ($s^j_{max}$ and $s^j_{min}$, with $j \in \{1,2,3\}$). Deformation values are defined along the principal axes of deformation computed through the SVD ($s_i$, with $i \in \{1,2,3\}$). In general, the frames do not match. Yet, we need to know the value of stretch and compress limits along the principal axes of deformation to be able to formulate the constraints as in Eq. (5). In the following, we describe our method for the computation of deformation limits along the principal axes from deformation limits given on a global frame.

Fig. 1 illustrates the problem and our solution. Let $F_d$ be the orthonormal frame representing the principal axes of deformation. According to the SVD in Eq. (4), in order to transform a vector from the global frame (where the limits are defined) to the frame $F_d$ (where the deformation values are defined), the vector has to be rotated by matrix $V^T$. Since the limits are defined on the global frame, which uses a canonical basis $(e_1\ e_2\ e_3)$, $V^T$ provides the three directions along which the limits are known in $F_d$. However, the deformation values to be limited are known along the axes of $F_d$. Hence, our problem is reduced to finding what the limits are along these axes.

For the general case, we require a function $p$ that projects each rotated limit onto the axes of $F_d$, thus providing stretch and compress limits to apply to each deformation value $s_i$. Since there are three directions $(e_1\ e_2\ e_3)$ with two limits each (stretch and compress), and each direction has to be projected on each axis of $F_d$, there is a total of 18 limits to be computed (6 for each deformation value $s_i$).

Figure 1: Illustration of our hyperbolic projection method, which projects the limits from the rotated global axes onto the principal axes of deformation.

3.3. Hyperbolic Projection Function

Naïve approaches for $p$, such as orthogonal projection or linear interpolation, produce incorrect or unrealistic results, as shown later in Section 5. Naturally, we want a nonlinear interpolation where the limit remains unchanged if deformation and limit directions match, and where the limit vanishes (i.e., becomes infinitely large) when deformation and limit directions are orthogonal. Therefore, we define $p$, which scales the offset of a limit from the undeformed value 1, as:

$$p(\theta) = \frac{1}{|\cos(\theta)|}, \quad (6)$$

where $\theta$ is the angle between a given rotated limit direction and a given axis of $F_d$, as illustrated in Fig. 1.

Let us consider, for instance, axis $e_j$ of the global frame, where $s^j_{min}$ and $s^j_{max}$ are defined. The limit direction in $F_d$ is $V^T e_j$, and the axes of $F_d$ are $((1,0,0)^T\ (0,1,0)^T\ (0,0,1)^T) = (e_1\ e_2\ e_3)$. This results in the following stretch and compress values for each deformation value $s_i$:

$$s^{j,i}_{min} = 1 + \frac{s^j_{min} - 1}{|e_i^T V^T e_j|}, \quad (7)$$

$$s^{j,i}_{max} = 1 + \frac{s^j_{max} - 1}{|e_i^T V^T e_j|}. \quad (8)$$

Eqs. (7)-(8) provide stretch and compress values for each limit defined on a global axis ($j \in \{1,2,3\}$) and each principal axis of deformation ($i \in \{1,2,3\}$).
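The evaluation of Eqs. (7)-(8) can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' code: the function name `hyperbolic_limits`, the row-major storage of V, the tolerance used to detect orthogonal directions and the example rotation are all assumptions. The limits in the example are the finger limits reported in Section 5.1.

```python
import math


def hyperbolic_limits(V, s_min, s_max):
    """Return (lim_min, lim_max), where lim_*[j][i] is the limit that global
    axis j induces on principal axis i, following Eqs. (7)-(8)."""
    lim_min = [[0.0] * 3 for _ in range(3)]
    lim_max = [[0.0] * 3 for _ in range(3)]
    for j in range(3):
        for i in range(3):
            # e_i^T V^T e_j is entry (i, j) of V^T, i.e. entry (j, i) of V.
            c = abs(V[j][i])
            if c < 1e-12:  # assumed tolerance: directions are orthogonal
                # The limit vanishes: no bound on s_i from this axis.
                lim_min[j][i] = -math.inf
                lim_max[j][i] = math.inf
            else:
                lim_min[j][i] = 1.0 + (s_min[j] - 1.0) / c
                lim_max[j][i] = 1.0 + (s_max[j] - 1.0) / c
    return lim_min, lim_max


# Hypothetical rotation V: 45 degrees around the third axis.
c45 = math.cos(math.pi / 4)
s45 = math.sin(math.pi / 4)
V = [[c45, -s45, 0.0],
     [s45, c45, 0.0],
     [0.0, 0.0, 1.0]]

# Finger limits from Section 5.1, one pair per global axis.
s_min = (0.95, 0.75, 0.98)
s_max = (1.05, 1.25, 1.02)

lim_min, lim_max = hyperbolic_limits(V, s_min, s_max)
for j in range(3):
    print(j, lim_min[j], lim_max[j])
```

The tightest bound on a given deformation value is the most restrictive of the three limits induced by the global axes; entries set to infinity simply never become active.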




3.4. Constraint Formulation

In the isotropic case, the constraints are defined as:

$$C^{i}_{min} = s_i - s_{min} \ge 0, \quad (9)$$

$$C^{i}_{max} = s_{max} - s_i \ge 0. \quad (10)$$

Based on Eqs. (7)-(8), we reformulate our constraints to take into account each interpolated limit, resulting in:

$$C^{j,i}_{min} = |e_i^T V^T e_j|\,(s_i - 1) - (s^j_{min} - 1) \ge 0, \quad (11)$$

$$C^{j,i}_{max} = (s^j_{max} - 1) - |e_i^T V^T e_j|\,(s_i - 1) \ge 0. \quad (12)$$
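A short Python sketch can make the 18 constraints of Eqs. (11)-(12) concrete. It is only an illustration under stated assumptions: the function name, the data layout and the example deformation are hypothetical, while the limits are the finger limits of Section 5.1. Note that, written in this form, the constraints involve no division, so orthogonal directions need no special case.

```python
def anisotropic_constraints(V, s, s_min, s_max):
    """Evaluate Eqs. (11)-(12) for every pair (j, i).

    V      3x3 rotation from the SVD, row-major
    s      singular values (s_1, s_2, s_3)
    s_min  compress limits on the global axes
    s_max  stretch limits on the global axes
    Returns (c_min, c_max); a negative entry marks a violated constraint."""
    c_min = [[0.0] * 3 for _ in range(3)]
    c_max = [[0.0] * 3 for _ in range(3)]
    for j in range(3):
        for i in range(3):
            w = abs(V[j][i])  # |e_i^T V^T e_j|
            c_min[j][i] = w * (s[i] - 1.0) - (s_min[j] - 1.0)
            c_max[j][i] = (s_max[j] - 1.0) - w * (s[i] - 1.0)
    return c_min, c_max


# Hypothetical state: principal axes aligned with the global frame,
# 10% stretch along the first principal axis.
V = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
s = (1.10, 1.0, 1.0)
s_min = (0.95, 0.75, 0.98)
s_max = (1.05, 1.25, 1.02)

c_min, c_max = anisotropic_constraints(V, s, s_min, s_max)
violated = [(j, i) for j in range(3) for i in range(3)
            if c_min[j][i] < 0.0 or c_max[j][i] < 0.0]
print(violated)
```

In this example only the stretch constraint of the first global axis on the first principal axis is violated, since the 10% stretch exceeds the 5% limit along e1 while the other axes impose no bound on that direction.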

3.5. Constraint Jacobians

We enforce strain-limiting constraints following a constrained optimization formulation [PCH∗13], summarized later in Section 4. This formulation requires the computation of constraint Jacobians w.r.t. the generalized coordinates of the system (i.e., the nodal positions of the finite element mesh) for two reasons. First, constraints are nonlinear, and we locally linearize them in each simulation step. Second, we enforce constraints using the method of Lagrange multipliers, which applies forces in the direction normal to the constraints.

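Taking the derivatives of Eqs. (11)-(12) w.r.t. a node $x_n$ requires computing the derivatives of $s_i$ and $V^T$ w.r.t. $x_n$. For the differentiation of $s_i$, we show in [PCH∗13] that:

$$\frac{\partial s_i}{\partial x_n} = r_n v_i\, u_i^T. \quad (13)$$

Papadopoulo and Lourakis [PL00] define the derivative of $V$ w.r.t. each component $g_{kl}$ of the deformation gradient $G$ as:

$$\frac{\partial V}{\partial g_{kl}} = -V\, \Omega_v^{k,l}, \quad (14)$$

where $\Omega_v^{k,l}$ is found by solving a 2×2 linear system for each $g_{kl}$. Since we need the derivative of the transpose of $V$, and knowing that $\Omega_v^{k,l}$ is antisymmetric, we have:

$$\frac{\partial V^T}{\partial g_{kl}} = \Omega_v^{k,l}\, V^T. \quad (15)$$

We can now use the chain rule to get the derivatives w.r.t. tetrahedral nodes $x_n$. To avoid dealing with rank-3 tensors, we directly formulate the derivatives of $V^T e_j$ instead:

$$\frac{\partial V^T e_j}{\partial x_n} = \sum_l \begin{pmatrix} \Omega_v^{1,l} V^T e_j & \Omega_v^{2,l} V^T e_j & \Omega_v^{3,l} V^T e_j \end{pmatrix} \cdot r_{n,l}. \quad (16)$$

Using Eq. (13) and Eq. (16), we can compute the derivatives of the constraints in Eqs. (11)-(12) w.r.t. the nodal positions of the mesh:

$$\frac{\partial C^{j,i}_{min}}{\partial x_n} = (s_i - 1)\,\mathrm{sign}(e_i^T V^T e_j)\, e_i^T \frac{\partial V^T e_j}{\partial x_n} + |e_i^T V^T e_j|\, \frac{\partial s_i}{\partial x_n}, \quad (17)$$

$$\frac{\partial C^{j,i}_{max}}{\partial x_n} = (1 - s_i)\,\mathrm{sign}(e_i^T V^T e_j)\, e_i^T \frac{\partial V^T e_j}{\partial x_n} - |e_i^T V^T e_j|\, \frac{\partial s_i}{\partial x_n}. \quad (18)$$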

4. Simulation Algorithm

In Perez et al. [PCH∗13], we describe our algorithm for simulating deformation dynamics with strain limiting. We formulate the simulation as a constrained optimization problem, namely a linear complementarity problem, and we apply standard solvers.

Given the nodal positions and velocities at the beginning of a simulation step, we perform an unconstrained dynamics step by integrating the unconstrained dynamics equations with backward Euler implicit integration and linearized forces. We then check whether strain-limiting constraints are violated. We formulate the constraints using Eqs. (11)-(12), and linearize them at the beginning of the simulation step using the constraint Jacobians in Eqs. (17)-(18).

The resulting linear complementarity problem (LCP) is solved using projected Gauss-Seidel (PGS) relaxation [CPS92]. Frictional contact is incorporated by computing non-penetration constraints with contact friction using Coulomb’s model. Contact constraints are linearized and seamlessly combined with strain-limiting constraints, and the entire constraint set is solved simultaneously.
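For readers unfamiliar with PGS relaxation, the following Python sketch shows the basic iteration on a generic LCP. It is a textbook-style illustration, not the solver of [PCH∗13]: the matrix A and vector b are small placeholders rather than quantities assembled from the constraint Jacobians, the fixed iteration count is an assumption, and Coulomb friction is not handled.

```python
def projected_gauss_seidel(A, b, iterations=100):
    """Approximately solve the LCP: find lam >= 0 such that
    w = A*lam + b >= 0 and lam_i * w_i = 0 for every i.

    A is assumed to be square with a positive diagonal."""
    n = len(b)
    lam = [0.0] * n
    for _ in range(iterations):
        for i in range(n):
            # Residual of row i, using the most recent values of lam.
            r = b[i] + sum(A[i][k] * lam[k] for k in range(n))
            # Gauss-Seidel update followed by projection onto lam_i >= 0.
            lam[i] = max(0.0, lam[i] - r / A[i][i])
    return lam


# Placeholder problem with two constraints.
A = [[2.0, 1.0],
     [1.0, 2.0]]
b = [-1.0, 1.0]
lam = projected_gauss_seidel(A, b)
w = [b[i] + sum(A[i][k] * lam[k] for k in range(2)) for i in range(2)]
print(lam, w)
```

In this placeholder problem the first constraint ends up active (positive multiplier, zero residual) and the second inactive (zero multiplier, positive residual), which is the complementarity pattern the solver has to discover.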

5. Results and Discussion

In this section, we present a set of simulation scenarios to illustrate and qualitatively assess our anisotropic strain-limiting approach. We also compare our work with simple but naïve ways of addressing anisotropic strain limiting, such as orthogonal projection and linear interpolation of limits.

Simulations were run on a 3.4 GHz quad-core Intel Core i7-3770 CPU with 32 GB of memory.

5.1. Animation Tests

In order to qualitatively test the effect of anisotropic strain limiting, we ran different simulations with a 1 m × 0.2 m × 0.2 m beam, fixed at one of its ends, with 200 tetrahedra, and a mass density of 1,000 kg/m³. Fig. 2 (middle) shows the results for a highly compliant beam (Young modulus of E = 5 kPa) with anisotropic strain limiting (unrestricted deformation in the horizontal axis and restricted to 4% stretch and compress in the other two axes). For comparison, on the right we show the same beam with isotropic 4% strain limiting, and on the left a stiffer unrestricted beam (E = 200 kPa) with similar vertical deformation. Besides being significantly more stretched than the others, the anisotropic beam manages to preserve its wobbly elastic behavior along its main axis due to its very low stiffness and its lack of restriction, as observed in the video accompanying this paper.

Figure 2: A deformed beam with three different materials. From left to right: stiff without restrictions (E = 200 kPa), compliant anisotropic (E = 5 kPa, without restriction in the horizontal axis and 0.96 < si < 1.04 in the other two), and compliant isotropic (E = 5 kPa, 0.96 < si < 1.04 in all axes).

A real-world finger is a clear example of anisotropic nonlinear elastic behavior, particularly under compression. Due to the presence of skin, flesh and bones, it is very compliant under light loading, but soon becomes almost rigid. This is true when the fingertip is pressed flat against a surface. When pressed on the side, there is hardly any deformation, showing a high anisotropy.

We simulate these highly nonlinear, highly anisotropic conditions using a finger model of approximately 7 cm with 347 tetrahedra, simulated with a mass density of 1,000 kg/m³ (roughly the average mass density of human flesh), and a Young modulus of E = 2 MPa. The finger model is initialized with its longitudinal direction aligned with the horizontal axis (e1), and the nail facing up along the vertical axis (e2). Limits are defined as 0.95 < s1 < 1.05 (stiff along e1), 0.75 < s2 < 1.25 (compliant along e2) and 0.98 < s3 < 1.02 (almost incompressible along e3). These simulation parameters were selected by trial and error to approximately match the behavior of a real finger. Fig. 3 shows some results of the deformations when the finger is pressed against a table along each axis. We compared our model with an isotropic model using 0.75 < si < 1.25. As expected, for the same motions we obtained similar results along e2, and overly compliant behavior along the other axes. The differences across the models are clearly visible in the accompanying video.

Regarding performance, our approach is currently quite expensive. However, we have not tried to optimize the convergence of the solver. In the scenarios presented above, the simulation runs in real time for a low number of constraints (fewer than about 5) and drops below interactive rates for highly constrained configurations. In the finger scenarios, the frame rate dropped below 1 Hz during highly constrained motions (more than 40 tetrahedra with constraints).

5.2. Comparison with Other Approaches

In order to justify the use of our hyperbolic projection function for the computation of limits along an arbitrary direction, in this section we show that straightforward approaches do not yield correct results. We compare our hyperbolic projection method with two simple but naïve approaches, one from the projection category and one from the interpolation category: orthogonal projection and linear interpolation.

Orthogonal projection works by simply rotating the limits defined in the global frame to frame $F_d$, and then projecting these limits onto the axes of $F_d$, where the deformation values are defined. Therefore, there is a total of 18 limits and constraints, as in our approach, with three stretch limits $\mathrm{s\_orthoproj}^{j,i}_{max}$ and three compress limits $\mathrm{s\_orthoproj}^{j,i}_{min}$ per principal axis of deformation:

$$\mathrm{s\_orthoproj}^{j,i}_{min} = 1 + (s^j_{min} - 1)\,|e_i^T V^T e_j|, \quad (19)$$

$$\mathrm{s\_orthoproj}^{j,i}_{max} = 1 + (s^j_{max} - 1)\,|e_i^T V^T e_j|. \quad (20)$$

Figure 3: A finger is pressed against a table in three different configurations. Top: the finger has anisotropic limits simulating the behavior of a real finger (compliant when pressed flat, stiff otherwise). Bottom: the finger with isotropic compliant limits.

Linear interpolation, on the other hand, interpolates the values defined in the global frame to find the limits along an arbitrary direction. Instead of rotating the global frame to $F_d$, we proceed the other way around: we apply the inverse rotation to $F_d$ to get the principal axes of deformation in the global frame. This allows us to easily compute the interpolations by simply computing the intersection of the line defined by each principal axis of deformation with the ellipsoid defined by the global frame and its limits. Therefore, there is a total of 6 limits and constraints, as in the isotropic case, with a stretch limit $\mathrm{s\_linearint}^{i}_{max}$ and a compress limit $\mathrm{s\_linearint}^{i}_{min}$ per principal axis of deformation:

$$\mathrm{s\_linearint}^{i}_{min} = \left\| \begin{pmatrix} s^1_{min} & 0 & 0 \\ 0 & s^2_{min} & 0 \\ 0 & 0 & s^3_{min} \end{pmatrix} V e_i \right\|, \quad (21)$$

$$\mathrm{s\_linearint}^{i}_{max} = \left\| \begin{pmatrix} s^1_{max} & 0 & 0 \\ 0 & s^2_{max} & 0 \\ 0 & 0 & s^3_{max} \end{pmatrix} V e_i \right\|. \quad (22)$$
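As a rough numerical illustration of Eqs. (21)-(22), the sketch below evaluates the interpolated limits in Python. The function name, the row-major storage of V, the 45-degree rotation and the stand-in value 2.0 for a loosely restricted axis are assumptions chosen for this example; the paper does not state how unrestricted axes are represented numerically.

```python
import math


def linear_interp_limits(V, s_global):
    """Eqs. (21)-(22): norm of diag(s_global) * V * e_i for each principal axis i.

    V * e_i is column i of V, so component k of the scaled vector
    is s_global[k] * V[k][i]."""
    return [math.sqrt(sum((s_global[k] * V[k][i]) ** 2 for k in range(3)))
            for i in range(3)]


# Hypothetical stretch limits: 5% along the first axis, loose elsewhere.
s_max = (1.05, 2.0, 2.0)

# Aligned frames: each principal axis recovers its own global limit.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned = linear_interp_limits(identity, s_max)

# Frames rotated by 45 degrees around the third axis.
c45 = math.cos(math.pi / 4)
s45 = math.sin(math.pi / 4)
V = [[c45, -s45, 0.0],
     [s45, c45, 0.0],
     [0.0, 0.0, 1.0]]
rotated = linear_interp_limits(V, s_max)
print(aligned, rotated)
```

With the rotated frame, the limit on the first principal axis becomes a blend of the restrictive 5% limit and the loose one, which is the relaxation effect discussed for Fig. 4.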

We highlight the limitations of both aforementioned approaches in the simple scenario of a compliant vertical beam (E = 10 kPa), fixed at its bottom, and compressing due to gravity, shown in Fig. 4. Poisson’s ratio is set to ν = 0.3. The vertical axis and one of the transverse axes are unrestricted. The remaining transverse axis can only deform up to 5% (i.e., 0.95 < si < 1.05). Fig. 4 shows the state of the beam when a constraint is violated for the first time for orthogonal projection (left), our approach (middle), and linear interpolation (right).

In the case of orthogonal projection, constraints are already violated during the first frame of simulation, thus clearly yielding an overly stiff material. The reason behind this erroneous behavior is the absence of weights to reduce the influence of the limits defined on axes that are far from the principal axes of deformation. In our scenario, the SVD yielded a vertical principal axis of deformation matching the global vertical axis. Therefore, the other two global axes, where stretch and compress limits are defined, are orthogonal to the vertical principal axis of deformation. Since the orthogonal projection between orthogonal axes is zero, according to Eqs. (19)-(20) there are two stretch and two compress limits on the vertical principal axis of deformation that are equal to 1, meaning that no deformation is allowed along that axis. The beam is therefore frozen in its initial configuration.
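The difference between Eqs. (19)-(20) and Eqs. (7)-(8) for a single limit can be checked numerically with a few lines of Python. This is a sketch under assumptions: the helper names and the handling of exactly orthogonal directions are hypothetical, and the 5% stretch limit mirrors the beam scenario above.

```python
import math


def ortho_limit(s_global, cos_abs):
    """Eqs. (19)-(20): the offset from 1 is multiplied by |cos(theta)|."""
    return 1.0 + (s_global - 1.0) * cos_abs


def hyper_limit(s_global, cos_abs):
    """Eqs. (7)-(8): the offset from 1 is divided by |cos(theta)|."""
    if cos_abs == 0.0:
        # Orthogonal directions: the limit vanishes (assumed convention).
        return math.inf if s_global >= 1.0 else -math.inf
    return 1.0 + (s_global - 1.0) / cos_abs


s_max = 1.05  # 5% stretch limit on the restricted transverse axis
for cos_abs in (1.0, math.cos(math.pi / 4), 0.0):
    print(cos_abs, ortho_limit(s_max, cos_abs), hyper_limit(s_max, cos_abs))
```

Both functions agree when the directions match. For orthogonal directions, orthogonal projection collapses the limit to 1 and forbids any deformation, whereas hyperbolic projection removes the bound.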

In the case of linear interpolation, constraints are violated very late, when the beam has almost completely collapsed on itself and artifacts start to appear, clearly beyond the expected 5% maximal transversal deformation. This is due to the weighted combination of unrestrictive limits and very restrictive ones. Since values are interpolated, the very restrictive limit (in this case, the 5% limit) is progressively relaxed to the unrestrictive limit as the principal axis of deformation moves from the restricted to the unrestricted axis. Since in this vertical beam scenario the rotation V resulted in a 180-degree rotation around the vertical axis, the transversal unrestrictive limit overly relaxed the transversal restrictive limit, thus resulting in an overly compliant material.

When using our approach, the state of the beam is coherent with the 5% transversal deformation limit.

6. Conclusion

In this paper, we have presented a model for simulating anisotropic behaviors in highly nonlinear elastic materials using strain-limiting constraints. The core novelty of our approach is the use of a hyperbolic projection method to compute limits along any deformation direction given a set of limits defined in the global axes. Using our model, we are able to simulate the highly anisotropic and nonlinear elastic behavior of a finger, which is initially compliant when pressed flat against a surface but extremely stiff when pressed on the side. We compared our projection method with simple solutions such as orthogonal projection or linear interpolation, and showed that our approach produces predictable and more realistic results.

Figure 4: State of a deforming beam when the first constraint violation is detected, for different ways of computing the limits. The beam (E = 10 kPa, ν = 0.3) is resting on the floor and is compressing under gravity. The vertical axis and one of the transverse axes are unrestricted. The remaining transverse axis can deform up to 5% (0.95 < si < 1.05). From left to right: orthogonal projection, hyperbolic projection (our approach), and linear interpolation.

Nevertheless, our hyperbolic projection approach exhibits some limitations, since it does not exactly preserve the limits in the case of isotropic behavior. If the principal deformation axes do not match the global axes, limits are scaled as expected according to the angle between the axes. In an isotropic scenario, however, limits should not be scaled since they are the same for every direction. In the worst case scenario (half-way between axes, i.e., an angle of 45 degrees), the compress limit, for instance, is equal to $1 + (s_{min} - 1)\sqrt{2}$ instead of simply $s_{min}$.
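To give a sense of the magnitude of this deviation, the following Python lines evaluate the 45-degree case for the 4% isotropic limits used in the beam test of Section 5.1. The function name is hypothetical and the snippet only restates the expression above.

```python
import math


def limit_at_45_degrees(s_iso):
    """Limit given by Eqs. (7)-(8) when |cos(theta)| = cos(45 degrees),
    i.e. 1 + (s_iso - 1) * sqrt(2)."""
    return 1.0 + (s_iso - 1.0) / math.cos(math.pi / 4)


compress = limit_at_45_degrees(0.96)  # nominal compress limit of 4%
stretch = limit_at_45_degrees(1.04)   # nominal stretch limit of 4%
print(compress, stretch)
```

Under these assumptions the nominal 4% limits become roughly 5.7% at 45 degrees, so the material is somewhat more compliant than specified in that configuration.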

In addition, we observed some cases of locking when the limits were too restrictive, resulting in an overconstrained system. Future work will address these locking issues, as well as investigate ways of limiting other deformation modes such as shear. In addition, we would like to automatically estimate and place anisotropic limits in a given model using real-world measurements [BBO∗09], thus avoiding ad-hoc tuning and improving the quality of the deformations. Finally, we would like to explore the use of more efficient solvers, ideally reaching interactive rates for high-resolution models.

Acknowledgements

This work was supported in part by grants from the Spanish Ministry of Economy (TIN2012-35840) and the EU FP7 project WEARHAP (601165).

References

[AMC09] ALLARD J., MARCHAL M., COTIN S.: Fiber-based fracture model for simulating soft tissue tearing. In Proc. of Medicine Meets Virtual Reality (MMVR) (2009). 2

[BBO∗09] BICKEL B., BÄCHER M., OTADUY M. A., MATUSIK W., PFISTER H., GROSS M.: Capture and modeling of non-linear heterogeneous soft tissue. ACM Trans. Graph. 28, 3 (July 2009), 89:1–89:9. 7

[BMF03] BRIDSON R., MARINO S., FEDKIW R.: Simulation of clothing with folds and wrinkles. Proc. of ACM SIGGRAPH / Eurographics Symposium on Computer Animation (2003). 1, 2

[CPS92] COTTLE R., PANG J., STONE R.: The Linear Complementarity Problem. Academic Press, 1992. 4

[ITF04] IRVING G., TERAN J., FEDKIW R.: Invertible finite elements for robust simulation of large deformation. Proc. of ACM (2004), 131–140. 2

[MG04] MÜLLER M., GROSS M.: Interactive virtual materials. Proc. of Graphics Interface (2004). 2

[PCH∗13] PEREZ A. G., CIRIO G., HERNANDEZ F., GARRE C., OTADUY M. A.: Strain limiting for soft finger contact simulation. In Proc. of IEEE World Haptics Conference (2013). 2, 3, 4

[PDA03] PICINBONO G., DELINGETTE H., AYACHE N.: Non-linear anisotropic elasticity for real-time surgery simulation. Graph. Models 65, 5 (2003), 305–321. 2

[PL00] PAPADOPOULO T., LOURAKIS M. I. A.: Estimating the jacobian of the singular value decomposition: Theory and applications. In European Conference on Computer Vision (2000). 4

[Pro95] PROVOT X.: Deformation constraints in a mass-spring model to describe rigid cloth behavior. Proc. of Graphics Interface (1995). 1, 2

[SZL∗11] SIN F., ZHU Y., LI Y., SCHROEDER D., BARBIC J.: Invertible isotropic hyperelasticity using SVD gradients. In ACM SIGGRAPH / Eurographics Symposium on Computer Animation (Posters) (2011). 2

[TPS09] THOMASZEWSKI B., PABST S., STRASSER W.: Continuum-based strain limiting. Computer Graphics Forum 28, 2 (2009), 569–576. 1, 2

[WOR10] WANG H., O’BRIEN J., RAMAMOORTHI R.: Multi-resolution isotropic strain limiting. Proc. of ACM SIGGRAPH Asia (2010). 2



CEIG - Spanish Computer Graphics Conference (2013), pp. 1–8
M. Carmen Juan and Diego Borro (Editors)

Simulation of Hyperelastic Materials Using Energy Constraints

Jesus Perez, Alvaro G. Perez and Miguel A. Otaduy

URJC Madrid, Spain

Abstract

Real-world materials exhibit highly nonlinear mechanical behavior, but computer animation often neglects such nonlinearities. Hyperelasticity, or strain-dependent material stiffness, is one of the clear sources of nonlinearity. Correctly modeling real-world materials would require capturing strain-dependent elasticity, but hyperelasticity induces stiff differential equations that may complicate simulation, in particular for real-time computer animation. In this paper, we propose a method based on constrained optimization for the simulation of hyperelastic materials. The key novelty of our method lies in limiting elastic energy to model extremely nonlinear elasticity within a common linear co-rotational formulation. Our method is designed on a hexahedral FEM discretization to avoid locking phenomena, and is capable of solving energy-limiting and frictional contact constraints together. We show that our approach enables the simulation of a large range of hyperelastic material behaviors.

1. Introduction

Linear material models prevail in the field of computer animation, but they cannot faithfully capture the huge range of physical behaviors of real-world materials. For example, soft biological tissues are usually heterogeneous and highly incompressible; moreover, they normally require sophisticated constitutive models including features such as anisotropy or hyperelasticity [Ogd97].

The use of hyperelastic models comes with drawbacks,as they usually exhibit higher numerical stiffness. Conse-quently, small simulation time steps are required, whichcannot be afforded by interactive applications. Commonlyaccepted elasticity models in computer animation includethe linear co-rotational model [MG04] and the St. Venant-Kirchhoff model with non linear Green-Lagrange strain buta linear constitutive model [ITF04]. In contrast, hyperelas-ticity in computer animation is more often addressed bythe more efficient alternative of using either soft or hardconstraints. Constraints may be added to limit deforma-tion [Pro95, BMF03, TPS09, PCH∗13] or to preserve vol-ume [ISF07,PMS12].

In this paper, we propose a constraint-based method for the simulation of highly non-linear hyperelastic materials. Contrary to previous works, our approach is based on the use of constrained optimization to directly limit the elastic energy of each simulated element. This allows us to robustly reproduce the behavior of very compliant materials, which suddenly become rigid under conditions of large deformation.

Figure 1: Simulation of hyperelastic deformations on a hexahedral lattice embedding.

Our framework is based on the usual co-rotational strain formulation with Hookean elasticity, and discretized using the finite element method (FEM). More precisely, we employ a hexahedral simulation mesh with trilinear basis functions per element, where we embed complex geometry as shown in Fig. 1. Our choice of hexahedral discretization is motivated by two reasons. First, nonlinear shape functions overcome the severe locking problems suffered by simple linear tetrahedra when modeling constrained materials [ISF07]. Second, hexahedral elements produce a smaller number of constraints per degree of freedom, hence the use of hexahedra turns out to be less computationally expensive.

Our overall simulation algorithm is simple and relies on standard solvers, allowing the solution of dynamics with robust implicit integration. Embedded contact and Coulomb friction are also formulated in a constraint-based manner, and are treated together with energy-limiting constraints in the same solver.

Finally, we have tested our method on different example simulations, highlighting the diversity of nonlinear behaviors that can be achieved in contrast to linear materials.

2. Related Work

Hyperelastic materials, also called Green-elastic materials, extend the properties of linear elastic materials, and allow the computation of elastic stress from arbitrary energy functions. Some examples of real-world hyperelastic materials are, among others, rubber, wood, woven fabrics and soft biological tissues. St. Venant-Kirchhoff, which is the simplest hyperelastic material, is a common elasticity model used in computer graphics to capture nonlinear elasticity [ITF04, BJ05]. However, in the field of mechanics, more sophisticated mathematical models have been designed to describe a wide variety of physical phenomena, such as Hookean, Ogden, or Mooney-Rivlin models [BW97, Ogd97, Hol00, BW00]. A recent approach to model nonlinear materials in computer graphics is to interpolate linear elastic models estimated from measured deformation examples [BBO∗09, WOR11, MBT∗12].

Modeling highly nonlinear hyperelastic materials is computationally complex, and more efficient alternatives have been proposed recently, including nonlinear model reduction [Bec12] or strain limiting using constraints [TPS09]. Geometric constraints are attractive ways to model invariant properties in computer animation, and they can even improve the stability of animation in contrast to traditional numerical integration of Newtonian mechanics [BMOT13].

Strain limiting is an approach for the simulation of biphasic hyperelastic materials, which can be described by a linear compliant behavior under moderately small strains, and quasi-rigid behavior beyond a limit strain. Several authors in computer graphics have proposed strain-limiting methods for mass-spring systems, by limiting the elongation of spring elements [Pro95, DSB99, BMF03, GHF∗07]. Thomaszewski et al. [TPS09] extended the use of strain limiting to continuum elasticity, by setting constraints on the components of the strain tensor. Recently, Wang et al. [WOR10] proposed a geometric approach to strain limiting, while Perez et al. [PCH∗13] formulated strain limiting as a constrained dynamics problem.

All these approaches are formulated on linear elements, and rely on the definition of a constant strain per element. However, the simulation of linear finite elements with constraints may suffer from locking, when the (local) ratio of constraints to degrees of freedom is too high and the motion appears too rigid [ISF07]. In this work, we propose a hyperelastic model based on constraints for hexahedral finite elements, which does not suffer from locking. Previous strain-limiting approaches for tetrahedra cannot be directly applied to hexahedra though. In hexahedra, strain is not unique, and setting constraints on an average strain may not be sufficient for constraining deformations, as positive and negative strains present in higher-order deformation modes may simply cancel out. Instead, we introduce energy constraints, which accurately capture high local strain even for higher-order deformation modes.
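The cancellation argument can be illustrated with a small numerical sketch. It is not taken from the paper: the displacement field u = (a s1 s2, 0, 0), the unit-scale element, and the use of the mean squared strain as a stand-in for elastic energy are all assumptions chosen for illustration.

```python
import numpy as np

# 2x2x2 Gauss points of the cube [-1, 1]^3 (all weights equal to 1).
GAUSS = (1.0 / np.sqrt(3.0)) * np.array(
    [[(-1.0) ** i, (-1.0) ** j, (-1.0) ** k]
     for i in (1, 2) for j in (1, 2) for k in (1, 2)])

def bending_gradient(s, a):
    """Displacement gradient of the hypothetical field u = (a*s1*s2, 0, 0)."""
    g = np.zeros((3, 3))
    g[0, 0] = a * s[1]  # du1/ds1 changes sign across the element
    g[0, 1] = a * s[0]  # du1/ds2 changes sign across the element
    return g

def strain(g):
    """Linear (Cauchy) strain: symmetric part of the displacement gradient."""
    return 0.5 * (g + g.T)

def average_strain(a):
    """Element-averaged strain tensor; vanishes for this bending mode."""
    return np.mean([strain(bending_gradient(s, a)) for s in GAUSS], axis=0)

def mean_squared_strain(a):
    """Energy-like measure (illustrative stand-in); stays strictly positive."""
    return np.mean([np.sum(strain(bending_gradient(s, a)) ** 2) for s in GAUSS])
```

For any amplitude a, the averaged strain is the zero tensor while the energy-like measure equals a²/2, so a limit placed on the average strain would never activate for this mode.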

Recently, Patterson et al. [PMS12] have described a general framework for the simulation of nonlinear elastic objects, including anisotropy and volume conservation, on hexahedral lattices. Interestingly, they combine various quadrature schemes for improved performance. Specifically, they propose a novel second-order scheme with 4-point quadrature for accurate boundary treatment, but they speed up computations when possible using a first-order scheme with a one-point quadrature rule [MZS∗11].

3. FEM for Elasticity

In this section, we present the basics of our elasticity model, without the addition of constraints. We first introduce the formulation of continuum elasticity, followed by a description of the FE discretization using hexahedral meshes. Finally, we describe the computation of elastic forces.

3.1. Elasticity Model

In continuum mechanics, object deformation is described by a displacement map u : X → x, from initial (material) coordinates X to deformed (world) coordinates x. We consider elasticity models for which internal forces are a function of the deformation gradient G(X) = ∂x/∂X, along with material properties. In particular, in this work we assume an isotropic Hookean material, where stress σ linearly depends on strain as follows:

σ = Eε, (1)

where ε is the so-called Cauchy (linear) strain tensor:

$$\varepsilon = \frac{1}{2}\left(\nabla u + \nabla u^T\right), \qquad (2)$$

and E is a factor that is solely determined by material properties, namely Young's modulus Y and Poisson's ratio ν. Under these assumptions, elastic forces can be easily derived from the stress field σ as $f_{elastic} = \nabla \cdot \sigma$.

To discretize the elasticity equations, FEM partitions the material space into elements $\Omega \equiv \bigcup \Omega_e$, such as tetrahedra or hexahedra. This partition provides a framework to interpolate variables inside the volume of each element from values defined at its vertices (i.e., nodes). The vector of nodal forces f can then be defined as a linear function of the vector of nodal displacements u, through a stiffness matrix K:

f =−Ku. (3)

3.2. Hexahedral Discretization

The formulation of shape functions is simplified by the use of per-element iso-parametric natural coordinates $s = (s_1\ s_2\ s_3)^T \in \Omega_e$ [Hol00]. In the case of hexahedra, the material of a hexahedron in natural coordinates is given by the cube $\Omega_e \equiv [-1,+1]^3$, with the coordinates of its eight nodes $s_n = ((-1)^i\ (-1)^j\ (-1)^k)^T$, for $i, j, k = 1, 2$ and $n = 4(i-1) + 2(j-1) + k$ (see Fig. 2). In this context, the value of any variable y(s) within the element can be interpolated as:

$$y(s) = \sum_{n=1}^{8} y_n\, N_n(s), \qquad (4)$$

where $y_n$ are nodal values and $N_n(s)$ are trilinear interpolation (shape) functions associated with each node:

$$N_n(s) = \frac{1}{8}(1 + s_{n1} s_1)(1 + s_{n2} s_2)(1 + s_{n3} s_3). \qquad (5)$$
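The interpolation of Eq. (4) with the shape functions of Eq. (5) can be sketched as follows. This is a minimal illustration, not the authors' code; the function names `hex_nodes`, `shape_functions` and `interpolate` are hypothetical.

```python
import numpy as np

def hex_nodes():
    """Natural coordinates of the 8 nodes, ordered n = 4(i-1) + 2(j-1) + k."""
    return np.array([[(-1.0) ** i, (-1.0) ** j, (-1.0) ** k]
                     for i in (1, 2) for j in (1, 2) for k in (1, 2)])

def shape_functions(s, nodes):
    """Trilinear shape functions N_n(s) of Eq. (5), returned as 8 values."""
    return np.prod(1.0 + nodes * np.asarray(s, dtype=float), axis=1) / 8.0

def interpolate(values, s, nodes):
    """Interpolate nodal values at natural coordinates s, as in Eq. (4)."""
    return shape_functions(s, nodes) @ np.asarray(values, dtype=float)
```

Two properties are easy to check with this sketch: the shape functions sum to one at any point of the element, and each one equals one at its own node and zero at the other seven.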

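With the FE discretization, we can compute a discrete approximation of the deformation gradient G(s), which results as follows:

$$G(s) = \frac{\partial x}{\partial s}\frac{\partial s}{\partial X} = \left(\sum_{n=1}^{8} x_n \frac{\partial N_n(s)}{\partial s}\right)\left(\sum_{n=1}^{8} X_n \frac{\partial N_n(s)}{\partial s}\right)^{-1}. \qquad (6)$$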

For regular hexahedral meshes, such as the ones we used in this work, F = ∂X/∂s reduces to a constant scale matrix from natural to material coordinates. As suggested in [ITF06], we assemble the world positions of each element's nodes in a 3×8 matrix D; analogously, the derivatives $\partial N_n/\partial s$ are assembled into an 8×3 matrix H(s). Under these conventions, the deformation gradient can be written as

$$G(s) = D\,H(s)\,F^{-1}. \qquad (7)$$

Note that material and natural coordinates are time-invariant, hence $H(s)F^{-1}$ can be precomputed for efficiency.
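A minimal sketch of Eq. (7) is given below, assuming a cubic rest element of side `edge` so that F = (edge/2) I. The helper names and the `edge` parameter are assumptions made for this illustration.

```python
import numpy as np

# Node coordinates in natural space, ordered n = 4(i-1) + 2(j-1) + k.
NODES = np.array([[(-1.0) ** i, (-1.0) ** j, (-1.0) ** k]
                  for i in (1, 2) for j in (1, 2) for k in (1, 2)])

def shape_gradients(s):
    """8x3 matrix H(s): row n holds dN_n/ds at natural coordinates s."""
    f = 1.0 + NODES * np.asarray(s, dtype=float)  # factors (1 + s_nd * s_d)
    H = np.empty((8, 3))
    H[:, 0] = NODES[:, 0] * f[:, 1] * f[:, 2]
    H[:, 1] = NODES[:, 1] * f[:, 0] * f[:, 2]
    H[:, 2] = NODES[:, 2] * f[:, 0] * f[:, 1]
    return H / 8.0

def deformation_gradient(D, s, edge):
    """G(s) = D H(s) F^{-1} for a cubic rest element of side `edge`.

    D is the 3x8 matrix of deformed node positions (one node per column).
    """
    F_inv = np.eye(3) * (2.0 / edge)  # F = dX/ds = (edge / 2) I
    return D @ shape_gradients(s) @ F_inv
```

For an affine deformation x = A X + t, this returns A at every point of the element, which is a convenient sanity check. In a simulation loop the product `shape_gradients(s) @ F_inv` would be computed once per quadrature point and reused.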

Figure 2: Left: the nodes of a hexahedron expressed in iso-parametric natural coordinates, where shape functions are defined. Right: innermost tetrahedron, which arises from the Coxeter-Kuhn-Freudenthal cut, used for the estimation of the element's rotation.

In contrast to tetrahedral discretizations, hexahedral shape functions are nonlinear w.r.t. s; therefore, the derivative matrix H(s) is not constant throughout an element. In practice, this implies that magnitudes integrated over elements must be evaluated at several quadrature points $s_q \in \Omega_e$, $q = 1, \ldots, n_q$. As commonly done for hexahedral elements, we employ a second-order Gaussian quadrature with points:

$$s_q = \frac{1}{\sqrt{3}}\left((-1)^i\ (-1)^j\ (-1)^k\right)^T, \qquad (8)$$

with $q = 4(i-1) + 2(j-1) + k$ and $i, j, k = 1, 2$. In this case, the point weights are trivially $w_q = 1$.
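The quadrature rule of Eq. (8) can be sketched as below. The function names and the `det_F` argument (the volume scale between natural and material coordinates) are hypothetical names chosen for this example.

```python
import numpy as np

def gauss_points():
    """The 8 points of Eq. (8) and their unit weights."""
    g = 1.0 / np.sqrt(3.0)
    pts = g * np.array([[(-1.0) ** i, (-1.0) ** j, (-1.0) ** k]
                        for i in (1, 2) for j in (1, 2) for k in (1, 2)])
    return pts, np.ones(8)

def integrate(fn, det_F=1.0):
    """Approximate the integral of fn over the element as a weighted sum."""
    pts, weights = gauss_points()
    return det_F * sum(w * fn(s) for s, w in zip(pts, weights))
```

With unit weights the rule returns the volume 8 of the natural cube for a constant integrand, and it integrates terms such as s1² exactly (8/3).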

3.3. Elastic Force Computation

For hexahedra defined in natural coordinates, the per-element stiffness matrix $K_e$ is integrated over the volume of the hexahedron as follows:

$$K_e = \int_{\Omega_{e,X}} B^T E\, B\, dV_X = \int_{\Omega_{e,s}} B^T E\, B \det(F)\, dV_s, \qquad (9)$$

where B(s) is a 3 × 24 matrix that reassembles the matrix of shape function derivatives H as $B(s) = (\mathrm{diag}(\partial N_1/\partial s)\ \ldots\ \mathrm{diag}(\partial N_8/\partial s))$. The determinant of the scale matrix F relates hexahedral volumes in material and natural coordinates. As introduced earlier, the integral can be approximated and precomputed as a weighted sum of values evaluated at quadrature points:

$$K_e = \sum_{q=1}^{8} w_q\, B(s_q)^T E\, B(s_q) \det(F). \qquad (10)$$

To better handle large rotations, we apply a co-rotational strain formulation [MG04], in which a rotation matrix $R_e$ is estimated per element, and the strain is measured in the unrotated setting. Then, the per-element matrix is effectively warped as $K'_e = R_e K_e R_e^T$. Following suggestions in [NL08], we tessellate each hexahedron using the Coxeter-Kuhn-Freudenthal cut shown in Fig. 2, and then select the innermost tetrahedron to estimate the rotation from the polar decomposition of its deformation gradient.
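The rotation extraction and stiffness warping can be sketched as follows. The paper does not state how the polar decomposition is computed; the SVD-based route below is one common choice and is an assumption, as are the function names. The input G would be the deformation gradient of the innermost tetrahedron, and the block-diagonal form of $R_e$ assumes a 24×24 element matrix ordered node by node.

```python
import numpy as np

def polar_rotation(G):
    """Rotation factor R of the polar decomposition G = R S, via SVD."""
    U, _, Vt = np.linalg.svd(G)
    R = U @ Vt
    if np.linalg.det(R) < 0.0:
        # Flip the axis of the smallest singular value to avoid a reflection.
        U[:, -1] *= -1.0
        R = U @ Vt
    return R

def warp_stiffness(K_e, R):
    """K'_e = R_e K_e R_e^T, with R_e = blockdiag(R, ..., R) over 8 nodes."""
    R_e = np.kron(np.eye(8), R)
    return R_e @ K_e @ R_e.T
```

Since $R_e$ is orthogonal, the warped matrix keeps the symmetry of $K_e$, and an identity rotation leaves it unchanged.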

4. Energy Constraints

In this section, we present our approach for achieving nonlinear elastic behavior using constraints. We first describe the energy-limiting constraint used for that purpose, followed by a mathematical derivation of the constraint Jacobians required by our solver.

Figure 3: Simulation of a beam under different settings. From left to right: (i) compliant linear-elastic (Y = 150KPa), (ii) compliant constrained (Y = 150KPa and Ue ≤ 1.0J), (iii) compliant constrained (Y = 150KPa and Ue ≤ 0.5J), (iv) compliant constrained (Y = 150KPa and Ue ≤ 0.1J), and (v) stiffer linear-elastic (Y = 2MKPa). Varying the energy limit produces diverse hyperelastic behaviors.

4.1. Constraint Definition

We aim to control the magnitude of each element's deformation by imposing constraints over its elastic energy $U_e$. This energy is computed by integrating over the element the strain energy density e = σ · ε, i.e., the amount of elastic energy in the deformed configuration per unit volume. The strain is computed as a function of the displacement gradient ∇u, as shown in Eq. (2), which in the co-rotational setting is computed as

$$\nabla u(s) = R_e^T G - I. \qquad (11)$$

The element’s energy is integrated as:

$$U_e = \int_{\Omega_{e,X}} e(\varepsilon)\, dV_X = \int_{\Omega_{e,s}} e(\varepsilon) \det(F)\, dV_s. \qquad (12)$$

And it can be approximated using Gaussian quadrature as:

$$U_e = \sum_{q=1}^{8} w_q\, e(\varepsilon_q) \det(F). \qquad (13)$$

Under the assumption of a regular initial mesh, the total elastic energy of a hexahedron depends only on the energy density function $e(\varepsilon_q)$ at the quadrature points.

Based on the energy definition, we introduce our deformation-limiting constraints, which simply restrict each element's energy under a maximum value $U_{max}$. Formally, each element's energy limit is formulated as a unilateral constraint:

$$C_u = 1 - \frac{U_e}{U_{max}} \ge 0. \qquad (14)$$

Fig. 3 shows the simulation of a beam using different per-element energy limits.
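As a rough sketch, Eq. (13) and Eq. (14) reduce to a few lines once the energy densities at the eight quadrature points are known. The function names and the sample values used in the note below are hypothetical.

```python
import numpy as np

def element_energy(densities, det_F, weights=None):
    """Eq. (13): weighted sum of energy densities at the quadrature points."""
    densities = np.asarray(densities, dtype=float)
    if weights is None:
        weights = np.ones_like(densities)  # w_q = 1 for the 2x2x2 Gauss rule
    return det_F * float(np.dot(weights, densities))

def energy_constraint(U_e, U_max):
    """Eq. (14): the constraint is satisfied when the returned value is >= 0."""
    return 1.0 - U_e / U_max
```

For instance, a cubic element of side 0.1 m has det(F) = 0.05³, so a uniform density of 1000 J/m³ gives an element energy of about 1 J; a limit of 0.5 J would then be violated, whereas a limit of 2 J would be satisfied with a margin of 0.5.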

4.2. Constraint Jacobians

We enforce deformation constraints following a constrained optimization described in the next section. This formulation requires the computation of constraint Jacobians w.r.t. the degrees of freedom of the system (i.e., the nodal positions of the FE mesh) for two reasons. First, constraints are nonlinear, and we locally linearize them in each simulation step. Second, we enforce constraints using the method of Lagrange multipliers, which applies forces in the direction normal to the constraints.

Based on the observation that elastic forces are, by definition, nothing else but the negative gradient of elastic energy, i.e., $f_{elastic} = -\nabla U$, we could simply use per-element elastic forces (scaled by $1/U_{max}$) as the Jacobians of energy-limiting constraints. However, as shown by Chao et al. [CPSS10], warped elastic forces are just an approximation of the co-rotational energy gradient. Indeed, as shown in detail in Section 6, we have observed that using warped elastic forces as Jacobians introduces excessive error and negatively affects the convergence of the constrained optimization.

We compute constraint Jacobians by substituting Eq. (13) into Eq. (14) and differentiating the resulting expression. Then, the constraint Jacobian w.r.t. a node $x_j$ can be written as:

$$\frac{\partial C_u}{\partial x_j} = -\frac{\det(F)}{U_{max}} \sum_q w_q \frac{\partial e(\varepsilon_q)}{\partial x_j}. \qquad (15)$$

To differentiate the strain energy density e(ε), we find it convenient to express it as the sum of six terms, based on the components $u_{ij}$ of the displacement gradient ∇u and the coefficients of the matrix of material parameters E:

$$e(\varepsilon) = e_1 + e_2 + e_3 + e_{12} + e_{13} + e_{23}, \qquad (16)$$

with $e_i = u_{ii}(\alpha_i u_{11} + \beta_i u_{22} + \gamma_i u_{33})$ and $e_{ik} = \frac{1}{2} E_3 (u_{ik} + u_{ki})^2$.

We define $E_1 = \frac{Y(1-\nu)}{(1+\nu)(1-2\nu)}$, $E_2 = \frac{Y\nu}{(1+\nu)(1-2\nu)}$, and $E_3 = \frac{Y}{2(1+\nu)}$ as the three different coefficients of the matrix of material parameters, based on Young's modulus Y and Poisson's ratio ν. Then, the coefficients α, β, γ take different values for each energy density component $e_i$, namely: $\alpha_1 = E_1, \beta_1 = E_2, \gamma_1 = E_2$ for $e_1$; $\alpha_2 = E_2, \beta_2 = E_1, \gamma_2 = E_2$ for $e_2$; and $\alpha_3 = E_2, \beta_3 = E_2, \gamma_3 = E_1$ for $e_3$.
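The six-term energy density of Eq. (16) can be evaluated with a short sketch like the one below. The function names are hypothetical, and the coefficient table `C` (row i holding $\alpha_i, \beta_i, \gamma_i$) is simply a compact way of storing the values listed above.

```python
import numpy as np

def material_coefficients(Y, nu):
    """Coefficients E1, E2, E3 from Young's modulus Y and Poisson's ratio nu."""
    E1 = Y * (1.0 - nu) / ((1.0 + nu) * (1.0 - 2.0 * nu))
    E2 = Y * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))
    E3 = Y / (2.0 * (1.0 + nu))
    return E1, E2, E3

def energy_density(grad_u, Y, nu):
    """Six-term sum of Eq. (16) for a 3x3 displacement gradient."""
    E1, E2, E3 = material_coefficients(Y, nu)
    u = np.asarray(grad_u, dtype=float)
    diag = np.diag(u)
    # Row i of C holds (alpha_i, beta_i, gamma_i): E1 on the diagonal, E2 elsewhere.
    C = np.full((3, 3), E2) + (E1 - E2) * np.eye(3)
    normal = diag @ C @ diag  # e_1 + e_2 + e_3
    shear = sum(0.5 * E3 * (u[i, k] + u[k, i]) ** 2
                for i, k in ((0, 1), (0, 2), (1, 2)))  # e_12 + e_13 + e_23
    return normal + shear
```

An antisymmetric displacement gradient (an infinitesimal rotation) yields zero energy density, since both the diagonal entries and the sums $u_{ik} + u_{ki}$ vanish.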

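From the expressions in Eq. (7), Eq. (11), and Eq. (16), we derive the derivatives of the energy density terms w.r.t. nodal positions:

$$\frac{\partial e_i}{\partial x_j} = \delta\, h_{ji}\, r_i + u_{ii}\,(\delta_1\ \delta_2\ \delta_3), \qquad (17)$$

$$\frac{\partial e_{ik}}{\partial x_j} = E_3\,(u_{ik} + u_{ki})(h_{jk}\, r_i + h_{ji}\, r_k),$$

where $r_k$ is the k-th row of $R_e^T$, $h_{ij}$ represents an element of H, and the coefficients δ and $\delta_k$ are defined as:

$$\delta_k = \alpha_i r_{1k} h_{j1} + \beta_i r_{2k} h_{j2} + \gamma_i r_{3k} h_{j3}, \qquad (18)$$

$$\delta = \alpha_i u_{11} + \beta_i u_{22} + \gamma_i u_{33}.$$

The shear derivative carries the coefficient $E_3$ because it follows from differentiating $e_{ik} = \frac{1}{2} E_3 (u_{ik} + u_{ki})^2$.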

In this derivation, we discard the change of the element rotation R. We found that this approximation did not endanger the convergence of our method in our tests.
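The derivatives above can be checked numerically. The sketch below compares them with central finite differences while holding the rotation fixed, mirroring the approximation just described. It is an illustration only: all names are hypothetical, the test configuration (random perturbation, rotation about one axis, unit Young's modulus) is arbitrary, and the constant scale $F^{-1}$ is folded into the entries $h_{ij}$ through the product `HF`.

```python
import numpy as np

NODES = np.array([[(-1.0) ** i, (-1.0) ** j, (-1.0) ** k]
                  for i in (1, 2) for j in (1, 2) for k in (1, 2)])
PAIRS = ((0, 1), (0, 2), (1, 2))

def shape_gradients(s):
    """8x3 matrix H(s) of trilinear shape-function derivatives."""
    f = 1.0 + NODES * s
    cols = [NODES[:, 0] * f[:, 1] * f[:, 2],
            NODES[:, 1] * f[:, 0] * f[:, 2],
            NODES[:, 2] * f[:, 0] * f[:, 1]]
    return np.stack(cols, axis=1) / 8.0

def grad_u(D, HF, Rt):
    """Co-rotational displacement gradient of Eq. (11): R^T G - I."""
    return Rt @ D @ HF - np.eye(3)

def density(u, C, E3):
    """Energy density of Eq. (16); row i of C holds (alpha_i, beta_i, gamma_i)."""
    d = np.diag(u)
    shear = sum(0.5 * E3 * (u[i, k] + u[k, i]) ** 2 for i, k in PAIRS)
    return d @ C @ d + shear

def density_gradient(D, HF, Rt, C, E3):
    """Analytic de/dx_j for the 8 nodes (rows), with the rotation held fixed."""
    u = grad_u(D, HF, Rt)
    d = np.diag(u)
    grad = np.zeros((8, 3))
    for j in range(8):
        h = HF[j]  # (h_j1, h_j2, h_j3)
        for i in range(3):
            delta = C[i] @ d               # scalar delta of Eq. (18)
            delta_vec = (C[i] * h) @ Rt    # (delta_1, delta_2, delta_3)
            grad[j] += delta * h[i] * Rt[i] + u[i, i] * delta_vec
        for i, k in PAIRS:
            grad[j] += E3 * (u[i, k] + u[k, i]) * (h[k] * Rt[i] + h[i] * Rt[k])
    return grad

def finite_difference(D, HF, Rt, C, E3, eps=1e-6):
    """Central-difference gradient of the density w.r.t. every node coordinate."""
    num = np.zeros((8, 3))
    for j in range(8):
        for a in range(3):
            Dp, Dm = D.copy(), D.copy()
            Dp[a, j] += eps
            Dm[a, j] -= eps
            num[j, a] = (density(grad_u(Dp, HF, Rt), C, E3)
                         - density(grad_u(Dm, HF, Rt), C, E3)) / (2.0 * eps)
    return num

def max_gradient_error(Y=1.0, nu=0.3, edge=0.1, seed=0):
    """Largest mismatch between analytic and numeric gradients at one Gauss point."""
    E1 = Y * (1 - nu) / ((1 + nu) * (1 - 2 * nu))
    E2 = Y * nu / ((1 + nu) * (1 - 2 * nu))
    E3 = Y / (2 * (1 + nu))
    C = np.full((3, 3), E2) + (E1 - E2) * np.eye(3)
    rng = np.random.default_rng(seed)
    HF = shape_gradients(np.array([1.0, -1.0, 1.0]) / np.sqrt(3.0)) * (2.0 / edge)
    D = (0.5 * edge * NODES).T + 0.01 * edge * rng.standard_normal((3, 8))
    c, s = np.cos(0.3), np.sin(0.3)
    Rt = np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])
    diff = density_gradient(D, HF, Rt, C, E3) - finite_difference(D, HF, Rt, C, E3)
    return float(np.abs(diff).max())
```

Because the energy density is quadratic in the nodal positions when R is fixed, the central difference is exact up to rounding, so `max_gradient_error()` should be close to machine precision.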

5. Simulation Algorithm

The unconstrained dynamics of our system follow Newton's second law of motion, $M\dot{v} = F$, where M is the mass matrix of the system, v the vector of concatenated nodal velocities and F the vector of all nodal forces. We use a backward Euler implicit integration method, which yields the following unconstrained velocity update:

$$A v^* = b, \quad \text{with} \quad A = M - h\frac{\partial F}{\partial v} - h^2 \frac{\partial F}{\partial x} \quad \text{and} \quad b = \left(M - h\frac{\partial F}{\partial v}\right) v_0 + hF. \qquad (19)$$

Vectors $x_0$ and $v_0$ denote the nodal positions and velocities at the beginning of a simulation step of size h.
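To make Eq. (19) concrete, the sketch below applies it to a linear force model F = -K x - Dmp v, where x is the displacement from rest. This force model, the damping matrix `Dmp` and the function name are assumptions made for illustration; the paper's forces come from the co-rotational FEM model.

```python
import numpy as np

def unconstrained_velocity(M, K, Dmp, x0, v0, h):
    """Solve A v* = b (Eq. 19) for the assumed forces F = -K x - Dmp v."""
    dF_dx, dF_dv = -K, -Dmp            # force Jacobians of the linear model
    F0 = -K @ x0 - Dmp @ v0            # forces at the start of the step
    A = M - h * dF_dv - h ** 2 * dF_dx
    b = (M - h * dF_dv) @ v0 + h * F0
    return np.linalg.solve(A, b), A
```

For this linear model, the result satisfies the backward Euler relation M (v* - v0) = h F(x0 + h v*, v*) exactly, which is a useful consistency check on the signs in Eq. (19).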

Adding constraint forces to this linear system through themethod of Lagrange multipliers, we have:

$$A v = b + J^T \lambda, \qquad (20)$$

where the constrained velocities are expressed as

$$v = v^* + A^{-1} J^T \lambda. \qquad (21)$$

J is the Jacobian matrix of constraints.

We use the unconstrained velocity $v^*$ to integrate the nodal positions forward in time, $x^* = x_0 + h v^*$, and evaluate constraints, as explained in the previous section, for nodal positions $x^*$. Energy-limiting constraints are then linearized at this point, using the generalized constraint Jacobian $\frac{\partial C}{\partial x} = J$, and grouped in a vector $C_0$:

$$J \Delta v \ge -\frac{1}{h} C_0. \qquad (22)$$

Replacing in Eq. (22) the velocity correction due to the constraints, $\Delta v = A^{-1} J^T \lambda$, we obtain the following linear complementarity problem (LCP):

$$0 \le \lambda \perp J A^{-1} J^T \lambda + \frac{1}{h} C_0 \ge 0. \qquad (23)$$

We solve the LCP using a projected Gauss-Seidel approach [CPS92]. Although here we refer only to energy constraints, in practice we also evaluate non-penetration constraints at the unconstrained positions $x^*$ through collision detection. We solve energy-limiting and non-penetration constraints with friction in just one projected Gauss-Seidel loop. In addition, in practice we found that the solution to the LCP often yields excessive visual error due to the linearization of energy-limiting constraints. To better approximate the full nonlinear constraints, we iterate the LCP formulation and solution until the nonlinear constraints satisfy an overall error threshold. This iteration can be regarded as a particular case of Sequential Quadratic Programming (SQP).
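A minimal projected Gauss-Seidel sketch for the LCP of Eq. (23) is given below. It covers unilateral constraints only and omits friction and the outer SQP-style iteration; the function names, iteration count and tolerance are assumptions, and a dense solve stands in for whatever linear solver an implementation would use.

```python
import numpy as np

def projected_gauss_seidel(B, c, iters=200, tol=1e-10):
    """Solve 0 <= lam, B lam + c >= 0, lam . (B lam + c) = 0.

    Assumes B has a positive diagonal, as J A^{-1} J^T does for SPD A.
    """
    c = np.asarray(c, dtype=float)
    lam = np.zeros_like(c)
    for _ in range(iters):
        max_change = 0.0
        for i in range(len(c)):
            # Residual of row i without the contribution of lam[i].
            r = c[i] + B[i] @ lam - B[i, i] * lam[i]
            new = max(0.0, -r / B[i, i])   # project onto lam_i >= 0
            max_change = max(max_change, abs(new - lam[i]))
            lam[i] = new
        if max_change < tol:
            break
    return lam

def constrained_velocity(A, J, v_star, C0, h):
    """Apply Eq. (21) with multipliers obtained from the LCP of Eq. (23)."""
    Ainv_Jt = np.linalg.solve(A, J.T)
    lam = projected_gauss_seidel(J @ Ainv_Jt, np.asarray(C0, dtype=float) / h)
    return v_star + Ainv_Jt @ lam, lam
```

A violated constraint (negative entry in $C_0$) produces a positive multiplier and a velocity correction that restores the linearized constraint, while a satisfied constraint leaves the unconstrained velocity untouched.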

6. Experiments and Results

In this section, we present the results on a set of animation tests to assess the quality of our energy-limiting approach. Moreover, we provide some empirical data on the solver's convergence. Simulations were run on a 2.4 GHz Intel Quad-Core i7-3517U with 4GB of DDR3 RAM.

6.1. Animation Tests

We have tested the effect of energy-limiting constraints on three different animation scenarios. First, we ran a series of simulations with a 1m × 12cm × 12cm cylindrical beam, fixed at one of its ends and subject to gravity. The model was discretized using a low-resolution hexahedral mesh of 56 elements with mass density fixed at 500Kg/m³. Fig. 3 shows the maximum deformation of the beam considering five different materials. The example illustrates that our energy-limiting approach works effectively under rotated configurations; it also shows the variety of nonlinear behaviors that the method can produce. The left-most and right-most beams are smoothly deformed up to a limit that depends on the material stiffness. The inner beams only show this behavior within the deformation range for which the local elastic energy does not exceed a limit. Beyond this limit, the deformation suddenly stops, resulting in the characteristic hyperelastic behavior. This can be easily appreciated from the fact that all elements within each beam are almost equally deformed at the maximum deformation point.

Second, we have dropped a sphere of radius R = 1m and a mass density of 500Kg/m³ onto the ground, from a height of 2m. The model was discretized using a very low-resolution mesh of 27 hexahedra. Fig. 4 shows snapshots of the sphere animation with three different material configurations: (i) a compliant linear-elastic material (Y = 150KPa), (ii) a stiffer linear-elastic material (Y = 1MPa), and (iii) a compliant constrained material (Y = 150KPa and Ue ≤ 500J). The example illustrates that our framework solves frictional contact and energy-limiting constraints together within a single solve. The compliant linear sphere (top) is severely deformed when hitting the ground. The stiffer linear material (bottom) prevents the sphere from deforming, but strengthens the bouncing behavior as a consequence. In contrast, the constrained sphere (middle) is only deformed up to a limit and does not bounce. It shows the highly nonlinear behavior of a hard object surrounded by a soft material layer. Fig. 5 shows the performance of our method for extremely compliant materials (Y = 50KPa and Ue ≤ 50J), when using a high-resolution discretization. In this case, the differences are more noticeable as the constrained sphere (bottom) clearly reaches its deformation limit. However, the sphere elements are still allowed to undergo small deformations, with subtle waves appearing throughout the surface of the sphere, as a consequence of the sudden rigidity.

Figure 4: Falling sphere demo captured at three subsequent time instants. Top: elastic compliant material; center: energy-limited material; bottom: elastic stiffer material. Hyperelastic behavior in the energy-constrained material prevents the sphere from getting deformed but, contrary to the stiffer material, softens its motion.

Finally, we also tested our method on models with more complex topologies. Fig. 6 shows an armadillo discretized with a regular hexahedral mesh of 365 elements. The model is approximately 2m high with a constant mass density of 500Kg/m³. As in the other tests, we compared three different materials: (i) a compliant linear-elastic material (Y = 500KPa), (ii) a stiffer linear-elastic material (Y = 5MPa), and (iii) a compliant constrained material (Y = 500KPa and Ue ≤ 100J). Differences are especially clear at the second time instant. With the linear compliant material (top), limbs show their natural rotation at joints, but easily collapse when hitting the ground (right leg). The stiffer material (bottom) results in an excessively rigid behavior, which does not allow the overall pose of the armadillo to change. The energy-constrained material (center) avoids extreme limb deformations while maintaining some rotational mobility at joints. It is particularly noticeable how the right arm rotates when the ground is hit.

Figure 5: Collapsing sphere captured at two subsequent time instants. Top: linear-elastic compliant material; bottom: compliant material with energy constraints. Our method robustly constrains elements under large deformations and/or volume loss.

6.2. Performance Evaluation

To roughly evaluate the method's performance, we have run some tests using the armadillo demo described above. Our method took a total of 128.125s to simulate the 250 frames of the animation, with an average time per frame of 512.5ms. Considering only frames with active constraints, the average number of constraints was 35.19, with only 6.36 energy-limiting constraints. In those frames, the average solving time per frame was 520.59ms.

As discussed earlier in Section 4.2, we have discarded the use of warped elastic forces as constraint Jacobians due to their poor convergence. To support this choice, we have compared them with our method on the beam test described above. We have used a compliant material (Y = 150KPa) with four different energy limits: Ue ≤ 0.5J, Ue ≤ 0.1J, Ue ≤ 0.05J and Ue ≤ 0.01J. Our results show that true constraint derivatives provide better convergence, with an average performance gain of 80.4%. Moreover, this gain increases with the number of constraints, reaching a maximum of 130.8% in our tests. For a low number of constraints, both approaches provide similar performance, as the computation of constraint Jacobians is more expensive with our method.

7. Discussion and Future Work

In this paper, we have proposed a constrained dynamics algorithm for the simulation of hyperelastic materials. The key novelty of our algorithm lies in the formulation of elastic energy constraints, which avoid limitations of previous strain-limiting approaches. Our method discretizes the problem using FEM with hexahedral elements and trilinear basis functions, to avoid the so-called locking effect. As shown in our results, our approach is capable of robustly simulating a large variety of extremely nonlinear behaviors.

Figure 6: Falling armadillo demo, captured at four subsequent time instants. Top: linear-elastic compliant material; center: energy-limited material; bottom: elastic stiffer material. Our hyperelasticity simulation method avoids the collapse of the right leg into the ground, while maintaining the dynamics of upper limbs.

Our method has limitations too. First, our current solver implementation should be further optimized for the purpose of achieving interactive frame rates.

Most importantly, in our examples we succeed in showing hyperelastic behavior using energy constraints, but the energy limits were chosen arbitrarily and depend heavily on the granularity of the discretization. One interesting improvement is to make the selection of energy limits more artist-friendly. Another interesting improvement is to estimate energy limits from force-deformation measurements, and thus mimic the behavior of real-world materials.

In addition, our current implementation does not support inhomogeneous limits or irregular hexahedral meshes. Future work could also be devoted to increasing the flexibility of our framework, by enforcing limits only on particular components of the elastic energy. This would allow our method to simulate more complex features such as anisotropy or to prioritize specific deformation modes, among others.

Acknowledgements

This work was supported in part by grants from the Spanish Ministry of Economy (TIN2012-35840) and the EU FP7 project WEARHAP (601165).

References

[BBO∗09] BICKEL B., BÄCHER M., OTADUY M. A., MATUSIK W., PFISTER H., GROSS M.: Capture and modeling of non-linear heterogeneous soft tissue. ACM Trans. Graph. 28, 3 (July 2009), 89:1–89:9.

[Bec12] BECKER U.: Efficient time integration and nonlinear model reduction for incompressible hyperelastic materials. PhD thesis, Technische Universitat Kaiserslautern, 2012.




[BJ05] BARBIC J., JAMES D.: Real-time subspace integration for St. Venant-Kirchhoff deformable models. ACM Trans. Graph. 24, 3 (Aug. 2005), 982–990.

[BMF03] BRIDSON R., MARINO S., FEDKIW R.: Simulation of clothing with folds and wrinkles. Proc. of ACM SIGGRAPH / Eurographics Symposium on Computer Animation (2003).

[BMOT13] BENDER J., MÜLLER M., OTADUY M. A., TESCHNER M.: Position-based methods for the simulation of solid objects in computer graphics. In EUROGRAPHICS 2013 State of the Art Reports (2013).

[BW97] BONET J., WOOD R. D.: Nonlinear Continuum Mechanics for Finite Element Analysis. Cambridge University Press, 1997.

[BW00] BASAR Y., WEICHERT D.: Nonlinear Continuum Mechanics of Solids. Springer, 2000.

[CPS92] COTTLE R., PANG J., STONE R.: The Linear Complementarity Problem. Academic Press, 1992.

[CPSS10] CHAO I., PINKALL U., SANAN P., SCHRODER P.: A simple geometric model for elastic deformations. ACM Transactions on Graphics (2010), 1–6.

[DSB99] DESBRUN M., SCHRÖDER P., BARR A.: Interactive animation of structured deformable objects. Proc. of Graphics Interface (1999).

[GHF∗07] GOLDENTHAL R., HARMON D., FATTAL R., BERCOVIER M., GRINSPUN E.: Efficient simulation of inextensible cloth. Proc. of ACM SIGGRAPH (2007).

[Hol00] HOLZAPFEL G. A.: Nonlinear Solid Mechanics: A Continuum Approach for Engineering. Wiley, 2000.

[ISF07] IRVING G., SCHROEDER C., FEDKIW R.: Volume conserving finite element simulations of deformable models. Proc. of ACM SIGGRAPH (2007).

[ITF04] IRVING G., TERAN J., FEDKIW R.: Invertible finite elements for robust simulation of large deformation. Proc. of ACM (2004), 131–140.

[ITF06] IRVING G., TERAN J., FEDKIW R.: Tetrahedral and hexahedral invertible finite elements. Graphical Models (2006), 66–89.

[MBT∗12] MIGUEL E., BRADLEY D., THOMASZEWSKI B., BICKEL B., MATUSIK W., OTADUY M. A., MARSCHNER S.: Data-driven estimation of cloth simulation models. Computer Graphics Forum (Proc. of Eurographics) 31, 2 (May 2012).

[MG04] MÜLLER M., GROSS M.: Interactive virtual materials. Proc. of Graphics Interface (2004).

[MZS∗11] MCADAMS A., ZHU Y., SELLE A., EMPEY M., TAMSTORF R., TERAN J., SIFAKIS E.: Efficient elasticity for character skinning with contact and collisions. ACM Transactions on Graphics (2011).

[NL08] NGAN W.-H. W., LLOYD J. E.: Efficient deformable body simulation using stiffness-warped nonlinear finite elements. In 2008 ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games (2008).

[Ogd97] OGDEN R. W.: Non-Linear Elastic Deformations. Courier Dover Publications, 1997.

[PCH∗13] PEREZ A. G., CIRIO G., HERNANDEZ F., GARRE C., OTADUY M. A.: Strain limiting for soft finger contact simulation. Proc. of World Haptics Conference (2013).

[PMS12] PATTERSON T., MITCHELL N., SIFAKIS E.: Simulation of complex nonlinear elastic bodies using lattice deformers. ACM Transactions on Graphics (2012).

[Pro95] PROVOT X.: Deformation constraints in a mass-spring model to describe rigid cloth behavior. Proc. of Graphics Interface (1995).

[TPS09] THOMASZEWSKI B., PABST S., STRASSER W.: Continuum-based strain limiting. Computer Graphics Forum 28, 2 (2009), 569–576.

[WOR10] WANG H., O’BRIEN J., RAMAMOORTHI R.: Multi-resolution isotropic strain limiting. Proc. of ACM SIGGRAPH Asia (2010).

[WOR11] WANG H., O’BRIEN J., RAMAMOORTHI R.: Data-driven elastic models for cloth: Modeling and measurement. ACM Transactions on Graphics (2011).


An interactive graphical tool for dressing virtual bodies based on mass-spring model, Verlet integration and raycasting

J.I. Blanco, J.P. Molina, P. González, J. Martínez and A. S. García

Departamento de Sistemas Informáticos, Universidad de Castilla-La Mancha

Abstract

This paper presents a visual and interactive tool for defining clothes and simulating their reaction to external forces, such as gravity, wind and collision with other objects. More importantly, this tool can also import patterns and sew them onto objects, that is, it can dress virtual bodies: the user defines the seams and lets the simulation move the patterns closer to each other and fill the gaps between seams. The physical modeling of the cloth is based on a mass-spring particle system, and Verlet integration is used for the simulation. With regard to detecting collisions with objects, two solutions based on the raycasting technique are presented. The performance and potential of the tool are illustrated in several case studies.

Categories and Subject Descriptors (according to ACM CCS): I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism–Animation; I.3.5 [Computer Graphics]: Computational Geometry and Object Modeling–Physically based modeling;

Keywords: cloth, particle model, Verlet, mass-spring, collision detection, raycasting, bounding box hierarchy, physically based animation

1. Introduction

Nowadays, computer cloth simulation has gained a foothold in the textile industry and the fashion world. Likewise, film special effects involving clothing must be fully believable; animated films require characters to be dressed in virtual clothing, and the animated sequences of a video game demand the same final quality.

In this work we implement a mass-spring particle model, with the aim of creating fabrics and tailoring garments that dress virtual mannequins; to this end, we have developed a tool that makes the process interactive.

2. Previous Work

There are many methods for cloth simulation, and they fall into two broad groups: geometric methods and physical methods. Geometric methods focus on the appearance of cloth and garments ([Wei86], [LV05]), whereas physical methods attempt to simulate their behaviour ([TPBF87], [BW98], [CT05]).

Among physical methods, mass-spring particle systems have been widely used ([Pro95], [BFA02], [SSIF09], [Jak01]).

Regarding garment generation, [HM90] proposed a method for interactive design in which garment patterns are assembled on a virtual mannequin. Later works have followed this model, such as [YTT92], [VMT97], [VMT00], [BGR00] and [DG07].

A different approach, known as the sketch-based method, can be found in [WWY03] and [TWB*07].

[VCMT05] gives an extensive review of the evolution of garment creation techniques, and [MTV05] describes the choices available when developing a cloth design and simulation tool.

More recently, [Yu10] extensively reviews the different techniques for dressing virtual bodies and develops a sketch-based model, while [LZY10] present various advanced techniques and developments, together with a discussion of the solutions proposed throughout the development of this field.

Recently, [GRH*12] developed a new method called DRAPE, which generates garments such as T-shirts, shorts, skirts, long-sleeved shirts and long trousers. It is based on training data organised into two sets: several human bodies of different shapes in a single pose, and a single body in different poses. For each garment to be generated, the system learns a model that represents rigid rotations of garment parts, pose-independent garment variations, and pose-dependent non-rigid garment deformations; finally, a mapping from body parameters to garment parameters is established, yielding a garment suited to each body. Penetrations between cloth and body are corrected by solving a linear system of equations.

One of the greatest difficulties in garment creation stems from such penetrations, that is, from detecting cloth-object and cloth-cloth collisions, which is why the problem has been studied extensively. [BFA02] combine a collision method with a repulsion method, [VSC01] perform image-space tests, and [Pro97] uses a hierarchy of bounding boxes, handling the multiple collisions that occur within a previously computed impact zone. [BWK03] analyse the global state of a piece of cloth, resolving intersections based on the state of the simulation. [VMT06] resolve the intersection between two surfaces by minimising the length of the intersection contour between them.





3. Interactive tool

Simulating cloth that interacts with objects is a complex process involving a series of tasks, each of which requires tuning multiple parameters. This work therefore proposes a tool for handling the different aspects of cloth simulation and the reaction of cloth to external forces such as gravity, wind or collision with other objects. In addition to creating isolated pieces of cloth, it allows dresses to be created and fitted to the body of a mannequin. The tool is implemented in C++, with the OpenGL and GLUT libraries as the graphics engine, while the GLUI library [Rad06] was used to build the interface (figures 1 and 2).

Figure 1: Editing window.

Figure 2: Tool panel.

The following sections explain in detail the methods used for each task in the cloth simulation process, together with the interface elements that help control those tasks and the parameters governing their execution. Section 4 analyses the models used to handle cloth, covering both its physical behaviour and its rendering. Section 5 explains the dynamic behaviour of the system, which simulates the effects produced when different forces are applied and when the cloth collides with other objects. Finally, section 6 describes the process followed to assemble, or sew, different pieces of cloth into a garment that fits the body of a mannequin. These sections lead to several case studies that demonstrate the capabilities and versatility of the tool.

4. Cloth modelling

This work uses a particle-based physical model to create the cloth and a graphical model based on triangle meshes to render it.

4.1. Physical model: mass-spring

The physical model chosen is the mass-spring particle model, similar to the one proposed in [Pro95], in which a mesh of m x n particles is connected by a set of springs (figure 3).

Figure 3: Mass-spring model.

There are three types of springs: structural springs, which maintain the structure of the mesh; shear springs, which maintain the structure between diagonal elements of the mesh; and bend springs, which prevent the mesh from folding excessively.
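The three spring types can be sketched for a regular grid as follows. This is a minimal illustration, not the tool's code: the names `Spring`, `SpringKind` and `buildGridSprings` are hypothetical, particles are assumed to be stored row by row, and bend springs are assumed to link particles two steps apart along each axis, as is common in [Pro95]-style models.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical types for illustration; not taken from the paper's tool.
enum class SpringKind { Structural, Shear, Bend };

struct Spring {
    std::size_t a, b;   // indices of the two particles joined by the spring
    SpringKind kind;
};

// Builds the spring list for a rows x cols grid of particles stored row by row.
std::vector<Spring> buildGridSprings(std::size_t rows, std::size_t cols) {
    std::vector<Spring> springs;
    auto id = [cols](std::size_t r, std::size_t c) { return r * cols + c; };
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            // Structural: direct horizontal and vertical neighbours.
            if (c + 1 < cols) springs.push_back({id(r, c), id(r, c + 1), SpringKind::Structural});
            if (r + 1 < rows) springs.push_back({id(r, c), id(r + 1, c), SpringKind::Structural});
            // Shear: both diagonals of each grid cell.
            if (r + 1 < rows && c + 1 < cols) {
                springs.push_back({id(r, c), id(r + 1, c + 1), SpringKind::Shear});
                springs.push_back({id(r, c + 1), id(r + 1, c), SpringKind::Shear});
            }
            // Bend (assumed layout): neighbours two steps away along each axis.
            if (c + 2 < cols) springs.push_back({id(r, c), id(r, c + 2), SpringKind::Bend});
            if (r + 2 < rows) springs.push_back({id(r, c), id(r + 2, c), SpringKind::Bend});
        }
    }
    return springs;
}
```

Under these assumptions, a 3 x 3 grid yields 12 structural, 8 shear and 6 bend springs.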

The main drawback of this model is super-elasticity, which causes the cloth to stretch excessively. [Pro95] proposes a solution based on the positions of the particles, as does [CMC01] with a different method, whereas [VSC01] relies on modifying their velocities. Figure 4 illustrates the problem. Increasing the number of iterations per time step reduces super-elasticity, but at a higher computational cost. This problem has not been addressed in this work, although work on it is under way.

Figure 4: Cloth with 400 particles. (a) Initial configuration. (b) Simulation after 4 iterations: super-elasticity is evident. (c) After 30 iterations, the problem is considerably reduced.

4.2. Graphical model: triangle meshes

The cloth is defined using triangle meshes, although other representations could be chosen. The triangulation of the cloth may be regular or irregular.

J.I.Blanco, J.P.Molina, P.González, J.Martinez & A.S.Garcia / An interactive graphical tool for dressing virtual bodies

© The Eurographics Association 2013.

The density of the mesh affects the definition of the folds that can form: a cloth made of about ten particles draped over a sphere will not produce very detailed folds, whereas with a higher-resolution cloth, for example 10000 particles, wrinkles and folds become more evident and gain in quality, as shown in figure 5.

Figure 5: Detailed wrinkles and folds.

The drawback of working with a large number of polygons is the computation time, so irregular triangulation is used for garment creation ([VCMT05]); with fewer polygons, this triangulation adapts better to a collision object because of its non-uniform deformation.

For this type of triangulation, [SSIF09] propose joining with a spring the vertices of two triangles that share an edge, as in figure 6. This defines two types of links: structural links, which correspond to the triangle edges, and bend links, which correspond to the new spring.

Figure 6: Bend springs for irregular triangulation.

5. Simulation

5.1. Forces

The motion of the particles is governed by Newton's second law:

$f = m \cdot a$ (1)

The force f has two components, the internal force and the external force, that is:

$f_{total} = f_{internal} + f_{external}$ (2)

Internal forces result from the tension of the springs and are generated by the system itself.

There are different types of external forces, depending on the simulation to be achieved. Some very common ones are gravity, wind force and collision response.

The wind force is modelled with the following equation, as presented in [MDDB01]:

$F = \lVert v \rVert \, (n \cdot v) \, n$ (3)

where v is the wind vector and n is the normal of the triangle affected by the wind. Other wind models can be found in [TSZ09].

Figure 7 shows the tool's controls for the external forces wind and gravity.

Figure 7: External forces.
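Equation (3) translates almost directly into code. The sketch below is illustrative only: `Vec3` and `windForce` are hypothetical names, and the triangle normal is assumed to be already normalised.

```cpp
#include <cmath>

// Minimal 3D vector for illustration; not the tool's own type.
struct Vec3 { double x, y, z; };

inline double dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 scale(const Vec3& a, double s) { return {a.x * s, a.y * s, a.z * s}; }
inline double length(const Vec3& a) { return std::sqrt(dot(a, a)); }

// Equation (3): F = ||v|| (n . v) n, with n assumed to be a unit normal.
Vec3 windForce(const Vec3& wind, const Vec3& unitNormal) {
    return scale(unitNormal, length(wind) * dot(unitNormal, wind));
}
```

With this form, a triangle facing the wind receives the largest force, and a triangle whose normal is perpendicular to the wind receives none.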

5.2. Integration method: Verlet

Simulating the physical model described above requires an integration method. [BWK03] use the implicit Euler method, while [CT05] use the Finite Element Method; [VMT01] compares several methods. Implicit methods and the finite element method involve a large amount of computation, which increases the accuracy of the final result but makes them unsuitable for real-time settings.

Since the initial goal of the tool is to create cloth and garments in real time, a method such as explicit Euler [Jak01] is a natural first choice. However, Verlet integration [Jak01] was chosen instead, because it avoids a problem of explicit Euler: since that method keeps the velocity and the position of each particle as separate state variables, the two can fall out of sync and lead to undesired results. Verlet, in contrast, represents velocity implicitly, which makes it very hard for the two to become inconsistent. This makes it much more stable than simple Euler integration.

Thus, with x the current position of a particle and x* its previous position, the next position x' is computed as follows:

$x' = 2x - x^{*} + a \cdot \Delta t^{2}$ (4)
$x^{*} = x$

where a is the acceleration due to gravity and Δt is the time step.

If the particles are subject only to the integration scheme described, they form a set of independent particles, that is, the position of one does not affect the others. A relationship can be established between them, however, so that they respond as a group.

Consider two individual particles at positions x1 and x2 that are required to stay at a distance of 100. This can be expressed as the following constraint:

$|x_2 - x_1| = 100$ (5)

Initially, the particles are correctly placed and satisfy constraint (5); but after one integration step the distance between them is no longer valid, because Verlet changes their positions. To restore the correct distance, the particles must be moved by projecting them onto the set of solutions described by (5). This means pulling the particles together or pushing them apart, depending on whether the erroneous distance is too large or too small. Figure 8 illustrates this situation.

Figure 8.a shows the original positions of the particles and the distance between them; if integration leads to the situation in figure 8.b, where the particles have moved apart, they must be brought closer together; conversely, in 8.c they must be pushed apart.




Figure 8: Particles displaced by integration.

The distance between particles is modelled with springs, and their positions are corrected by iterating over the distance constraints imposed by the springs. The steps to satisfy one constraint are as follows:

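delta = x2 - x1;                                   // vector from particle 1 to particle 2
deltalength = sqrt(delta * delta);                 // delta * delta denotes the dot product
diff = (deltalength - restlength) / deltalength;   // relative deviation from the rest length
x1 += delta * 0.5 * diff;                          // each particle takes half of the correction
x2 -= delta * 0.5 * diff;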

With this, each particle attached to a spring absorbs half of the total displacement needed for both to recover the rest distance. Iterating this process brings them to that distance.

The constraints can now be solved by iterating a number of times until the result converges to the desired solution.

If a series of particles p1, p2, …, pn is placed at different positions, each separated by a distance d, Verlet integration will change their positions and the constraints will keep them in formation. Extending the particles along a second axis yields the configuration of a piece of cloth.
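The integration step of equation (4) and the constraint relaxation described above can be combined as in the following sketch. It is a minimal illustration in the spirit of [Jak01], not the tool's implementation: `Particle`, `Spring`, `verletStep` and `satisfyConstraints` are hypothetical names, and a single acceleration is assumed for all particles.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };
Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 operator*(Vec3 a, double s) { return {a.x * s, a.y * s, a.z * s}; }
double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct Particle { Vec3 pos, prev; };                      // current and previous position
struct Spring { std::size_t a, b; double restLength; };   // distance constraint

// Equation (4): x' = 2x - x* + a dt^2, followed by x* = x.
void verletStep(std::vector<Particle>& ps, Vec3 accel, double dt) {
    for (Particle& p : ps) {
        Vec3 next = p.pos * 2.0 - p.prev + accel * (dt * dt);
        p.prev = p.pos;
        p.pos = next;
    }
}

// Repeatedly projects each pair of particles back onto its rest length.
void satisfyConstraints(std::vector<Particle>& ps,
                        const std::vector<Spring>& springs, int iterations) {
    for (int it = 0; it < iterations; ++it) {
        for (const Spring& s : springs) {
            Vec3 delta = ps[s.b].pos - ps[s.a].pos;
            double len = std::sqrt(dot(delta, delta));
            if (len == 0.0) continue;                     // coincident particles: no direction
            double diff = (len - s.restLength) / len;
            ps[s.a].pos = ps[s.a].pos + delta * (0.5 * diff);
            ps[s.b].pos = ps[s.b].pos - delta * (0.5 * diff);
        }
    }
}
```

A simulation loop would call `verletStep` once per time step and then `satisfyConstraints` with the chosen iteration count; a single spring is solved exactly in one pass, while a mesh of coupled springs only approaches its rest lengths as the count grows.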

A small variation of (4) makes it possible to simulate effects such as air friction. Note that this air friction is unrelated to the wind, since the latter is an external force with its own model. The formula becomes:

$x' = (2x - x^{*}) \cdot k_d + a \cdot \Delta t^{2}$ (6)

where k_d is a value between 0 and 1. As k_d approaches 1 the cloth offers less resistance to the air, whereas values closer to 0 imply greater resistance.
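The damped update can be sketched in two ways. The first function below is a literal reading of equation (6). The second is an assumed alternative, not taken from the paper, in which the coefficient scales only the implicit velocity term x - x*, so that a particle at rest stays at rest when no force acts. The text does not state which form the tool uses; both function names are hypothetical.

```cpp
struct Vec3 { double x, y, z; };
Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 operator*(Vec3 a, double s) { return {a.x * s, a.y * s, a.z * s}; }

// Literal reading of equation (6): x' = (2x - x*) kd + a dt^2.
Vec3 dampedStepLiteral(Vec3 x, Vec3 prev, Vec3 a, double dt, double kd) {
    return (x * 2.0 - prev) * kd + a * (dt * dt);
}

// Assumed alternative (not from the paper): x' = x + (x - x*) kd + a dt^2.
Vec3 dampedStepVelocityOnly(Vec3 x, Vec3 prev, Vec3 a, double dt, double kd) {
    return x + (x - prev) * kd + a * (dt * dt);
}
```

With kd = 1 both functions reduce to the undamped step of equation (4).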

Figure 9 shows the controls for the coefficients k_d and Δt.

Figure 9: Simular: enables integration. Damping: coefficient k_d. TimeStep: coefficient Δt.

5.3. Collision detection

This work considers three types of collision objects: boundary planes, geometric solids and triangles. Planes exist only as a way of bounding the simulation in space; a geometric solid such as a sphere is a mathematical object defined by a radius and a centre, while a triangle is defined by three vectors. Collisions are handled differently depending on the type of object involved.

5.3.1. Collision with boundary planes

Consider a scene inside a cube with corners (0, 0, 0) and (1000, 1000, 1000), aligned with the world axes, and a set of particles inside it moving in the direction of gravity. The particle positions can be kept inside the cube using the projection method of [Jak01].

When a collision with a face of the cube occurs, the particles that penetrate the obstacle are projected out of it, that is, they are moved just far enough to leave the obstacle, perpendicularly to the collision surface.

The cube can be viewed as a set of constraints on the particle positions that must be satisfied at all times. With x the position of a particle, the constraint is:

$x_i \geq 0, \quad x_i \leq 1000, \quad i = 1, 2, 3$ (7)

Satisfying (7) keeps the particles inside the cube: positions that cross the obstacle are modified, projecting the particles back inside.
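For an axis-aligned cube, the projection of constraint (7) amounts to clamping each coordinate. A minimal sketch, assuming a hypothetical `projectIntoBox` helper and the cube bounds given in the text as defaults:

```cpp
#include <algorithm>

struct Vec3 { double x, y, z; };

// Projects a position back into the axis-aligned cube [lo, hi]^3.
// Each coordinate is clamped independently, which moves the particle
// perpendicularly to the face it crossed.
Vec3 projectIntoBox(Vec3 p, double lo = 0.0, double hi = 1000.0) {
    p.x = std::min(std::max(p.x, lo), hi);
    p.y = std::min(std::max(p.y, lo), hi);
    p.z = std::min(std::max(p.z, lo), hi);
    return p;
}
```

Applied to every particle after each integration step, this keeps the whole cloth inside the scene bounds.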

Figure 10 shows a cloth hanging from two corners, colliding with the plane that simulates the floor.

Figure 10: Collision with the floor.

5.3.2. Collision with geometric solids

For a sphere defined by a centre and a radius, this type of collision can be handled as shown in figure 11.

Figure 11.a shows a sphere of radius r and a particle falling under gravity at three different instants, t1, t2 and t3.

At instant t1 (figure 11.b), the distance l1 between the particle and the centre of the sphere is greater than the radius r, so there is no intersection; the same holds at instant t2. At instant t3 (figure 11.c), however, the particle is inside the sphere, because the distance l3 between its position and the centre is smaller than r. A collision has therefore been detected, and the particle must be moved out of the obstacle. With dp the displacement to apply, the calculation is:

$dp = v \, (r - l_3)$ (8)

where v is the normalised vector from the centre of the sphere to the position of the particle at instant t3.
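The sphere test and the displacement of equation (8) can be sketched as follows. The function name `pushOutOfSphere` is hypothetical, and the handling of a particle exactly at the centre (where no direction is defined) is an assumption of this sketch.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };
Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 operator*(Vec3 a, double s) { return {a.x * s, a.y * s, a.z * s}; }

// Returns the particle position after resolving a collision with the sphere.
Vec3 pushOutOfSphere(Vec3 p, Vec3 centre, double r) {
    Vec3 d = p - centre;
    double l = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    if (l >= r || l == 0.0) return p;   // outside the sphere, or direction undefined
    Vec3 v = d * (1.0 / l);             // unit vector from the centre to the particle
    return p + v * (r - l);             // equation (8): dp = v (r - l)
}
```

The particle ends up exactly on the surface of the sphere, along the line joining it to the centre.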



Figure 11: Collision with a sphere.

Figure 12 shows different stages of the collision between a sphere and a cloth that responds to the wind force.

Figure 12: Cloth-sphere collision and reaction to wind.

5.3.3. Collision with objects defined by triangle meshes

In this work, the solid objects against which the cloth collides can also be made of triangles. Collisions with these objects are handled by raycasting, following the ray-triangle intersection algorithm detailed in [MT97].

Two approaches have been developed for this type of collision. In both, the triangles of the collision object are first preprocessed with the algorithm of [KK86]: following a median-cut scheme, the algorithm builds a binary tree top-down by recursively subdividing the scene until each leaf node contains a single triangle.
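A top-down median-cut construction of this kind can be sketched as below. This is an illustration under stated assumptions rather than the [KK86] algorithm itself: axis-aligned boxes are used as bounding volumes, the split is made at the median triangle centroid along the longest box axis, and all names (`Node`, `build`, `countLeaves`) are hypothetical. The heap-based ray traversal is not shown.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <memory>
#include <vector>

struct Vec3 { double x, y, z; };
struct Triangle { std::array<Vec3, 3> v; };
struct Box { Vec3 lo, hi; };

struct Node {
    Box box;
    std::unique_ptr<Node> left, right;
    const Triangle* tri = nullptr;   // set only on leaf nodes
};

static double coord(const Vec3& p, int axis) { return axis == 0 ? p.x : axis == 1 ? p.y : p.z; }
static double centroid(const Triangle& t, int axis) {
    return (coord(t.v[0], axis) + coord(t.v[1], axis) + coord(t.v[2], axis)) / 3.0;
}

// Bounding box of the triangles in tris[begin, end); requires begin < end.
static Box bounds(const std::vector<const Triangle*>& tris, std::size_t begin, std::size_t end) {
    Box b{tris[begin]->v[0], tris[begin]->v[0]};
    for (std::size_t i = begin; i < end; ++i)
        for (const Vec3& p : tris[i]->v) {
            b.lo = {std::min(b.lo.x, p.x), std::min(b.lo.y, p.y), std::min(b.lo.z, p.z)};
            b.hi = {std::max(b.hi.x, p.x), std::max(b.hi.y, p.y), std::max(b.hi.z, p.z)};
        }
    return b;
}

// Recursively splits tris[begin, end) at the median until one triangle per leaf.
std::unique_ptr<Node> build(std::vector<const Triangle*>& tris, std::size_t begin, std::size_t end) {
    auto node = std::make_unique<Node>();
    node->box = bounds(tris, begin, end);
    if (end - begin == 1) { node->tri = tris[begin]; return node; }
    const Vec3 e{node->box.hi.x - node->box.lo.x, node->box.hi.y - node->box.lo.y,
                 node->box.hi.z - node->box.lo.z};
    const int axis = (e.x >= e.y && e.x >= e.z) ? 0 : (e.y >= e.z ? 1 : 2);
    const std::size_t mid = begin + (end - begin) / 2;
    std::nth_element(tris.begin() + begin, tris.begin() + mid, tris.begin() + end,
                     [axis](const Triangle* a, const Triangle* b) {
                         return centroid(*a, axis) < centroid(*b, axis);
                     });
    node->left = build(tris, begin, mid);
    node->right = build(tris, mid, end);
    return node;
}

std::size_t countLeaves(const Node& n) {
    return n.tri ? 1 : countLeaves(*n.left) + countLeaves(*n.right);
}
```

Calling `build(refs, 0, refs.size())` on a non-empty vector of triangle pointers returns the root; since every split halves the range, the tree is balanced and has exactly one leaf per triangle.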

In the first approach, when the cloth starts moving, a ray is cast from the centre of each of its triangles in the direction of the triangle's normal. The rays traverse the tree using a heap, as described in [KK86].

When a ray cast from a cloth triangle hits a node of the tree (a collision triangle), the collision response is triggered only if the distance separating the two triangles is below a given threshold. If so, the cloth triangle is moved just far enough to avoid the collision.

Figure 13 shows the collision between a cloth and a sphere; figure 14 shows the corresponding controls.

Figure 13: Ray-triangle collision (rays visible).

Figure 14: Collision detection has been enabled. The rays need to be inverted, since by default they point in the opposite direction.

The second approach is based on the work of [Etz02] (similarly, [Jak01] suggests using cylinders instead of rays). The ray starts at the previous position of the triangle and ends at its current position, so that any intersection along that segment means the cloth has necessarily passed through a collision object.

Figure 15 illustrates this process: at the top, a triangle moves from instant t0 to t1 without the ray r hitting the square (the collision object); at the bottom, the ray r cast from the position of the triangle at t0 to its position at t1 hits the square, and the collision is detected at point i.

Once the collision has been detected, the triangle is moved just far enough to avoid the overlap.
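Both approaches rest on a ray-triangle test. The sketch below assumes that [MT97] refers to the Möller-Trumbore intersection test and shows how the two approaches might use it; `rayTriangle`, `nearHit` and `crossed` are hypothetical names, and the tree traversal and the collision response are omitted.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };
Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Möller-Trumbore test. On a hit, t satisfies: hit point = orig + t * dir.
bool rayTriangle(Vec3 orig, Vec3 dir, Vec3 v0, Vec3 v1, Vec3 v2, double& t) {
    const double eps = 1e-12;
    Vec3 e1 = v1 - v0, e2 = v2 - v0;
    Vec3 p = cross(dir, e2);
    double det = dot(e1, p);
    if (std::fabs(det) < eps) return false;       // ray parallel to the triangle plane
    double inv = 1.0 / det;
    Vec3 s = orig - v0;
    double u = dot(s, p) * inv;                   // first barycentric coordinate
    if (u < 0.0 || u > 1.0) return false;
    Vec3 q = cross(s, e1);
    double v = dot(dir, q) * inv;                 // second barycentric coordinate
    if (v < 0.0 || u + v > 1.0) return false;
    t = dot(e2, q) * inv;
    return t >= 0.0;                              // only hits in front of the origin
}

// First approach: ray from the triangle centre along its unit normal;
// respond only when the hit lies closer than the threshold.
bool nearHit(Vec3 centre, Vec3 unitNormal, Vec3 v0, Vec3 v1, Vec3 v2, double threshold) {
    double t;
    return rayTriangle(centre, unitNormal, v0, v1, v2, t) && t < threshold;
}

// Second approach: segment from the previous to the current position;
// a hit with t <= 1 means the collision triangle was crossed during the step.
bool crossed(Vec3 prev, Vec3 cur, Vec3 v0, Vec3 v1, Vec3 v2) {
    double t;
    return rayTriangle(prev, cur - prev, v0, v1, v2, t) && t <= 1.0;
}
```

In `crossed` the direction is left unnormalised on purpose, so that t = 1 corresponds exactly to the current position of the cloth triangle.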

Figure 15: Top, no collision. Bottom, a collision occurs.

In both cases, the ray cast from a triangle to detect a collision has the same direction as the normal of that triangle. But cloth has two sides, and in most situations both come into play. For this reason, collision computation must allow for casting rays in both directions.

6. Garment construction

The standard method for sewing a garment is followed, similar to the one used in the real world: patterns are drawn on a piece of cloth, cut out and sewn together ([DG07], [VCMT05]).

In this first version, the tool does not support pattern design (patterns are also called panels), so externally loaded patterns (described in .OBJ files) are used.

6.1. Seams

To sew a garment in the real world, the seams must first be marked on the cloth before cutting and sewing. In this work, seams are handled as follows.

A seam is defined as a set of particles that belong to the same segment, straight or curved, bounded by a start point and an end point, as shown in figure 16, left. Sewing two pieces of cloth requires two seams, as shown in figure 16, right (in the image both seams are parallel, but this is not a requirement).

Once the panels that define the future garment are in place, sewing them requires applying a force that pulls them towards each other. This force is generated with the structural springs already defined (figure 17.a).

As stated above, within the cloth configuration a spring tries to keep two particles at a certain distance. This property can be used to create a spring between two particles of two different seams, with a rest length given not by the original positions of the particles but by how close together or far apart the two panels should end up.

Figure 16: Left, definition of a seam (dotted line). Right, two seams defined (green lines).

Figure 17 illustrates this approach. In 17.a, the panels are in their original positions, and two seams and the springs needed to sew them have been defined; these springs are configured to leave the panels at a distance d = 0.01. In 17.b the panels are drawn towards each other as the constraints are satisfied at each time step. Finally, in 17.c the two panels are sewn at the distance configured for the seam spring.

To guarantee the stability of the algorithm, the length of the springs used for sewing ranges from 0.01 upwards; there is therefore never a spring that holds two particles at distance 0, which leaves a gap between the two panels.
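The creation of seam springs can be sketched as follows. The names `Spring`, `sewSeams` and `kMinSeamLength` are hypothetical; a seam is assumed to be an ordered list of particle indices, and particles are paired in order, which is an assumption of this sketch rather than a documented detail of the tool.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Spring { std::size_t a, b; double restLength; };

// Lower bound on the seam spring length quoted in the text.
const double kMinSeamLength = 0.01;

// Pairs the i-th particle of one seam with the i-th particle of the other.
// The rest length is the requested gap, never below the minimum.
std::vector<Spring> sewSeams(const std::vector<std::size_t>& seamA,
                             const std::vector<std::size_t>& seamB, double gap) {
    std::vector<Spring> springs;
    const double rest = std::max(gap, kMinSeamLength);
    const std::size_t n = std::min(seamA.size(), seamB.size());
    for (std::size_t i = 0; i < n; ++i)
        springs.push_back({seamA[i], seamB[i], rest});
    return springs;
}
```

The resulting springs can be fed to the same constraint relaxation used for the cloth itself. When the two seams differ in length, the surplus particles of the longer seam receive no spring in this sketch.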

Figure 17: (a) Newly created structural springs and particles in their original positions. (b) Particles drawn together by the spring forces. (c) Final state.

To close this gap, new springs are added to secure the two parts, this time bend springs; in addition, new geometry must be generated. This geometry is one triangle wide, since the panels of a garment end up so close together that additional resolution is unnecessary; the new geometry is regular, leaving irregular triangulation as a possible future improvement.

Moreover, it is important that the seams have the same number of particles; otherwise some regions will remain unsewn (figure 18).

Figure 18: Unsewn region.

Once the seams have been defined, they must be sewn in a specific order (figure 19): the seams to be joined are selected two at a time.

Figure 19: (a) Correct order. (b) Incorrect order.

7. Case studies

The following scenes were developed to show the results of the method described.

7.1. Interactive simulation: curtain

This scene shows a simple model of a curtain that can be opened or closed with a control in the application. Along the top edge of the model, particles are fixed in consecutive pairs, leaving three free particles between each fixed pair.

Once the simulation starts, gravity acts on the curtain (figure 20).

The fixed particles serve as handles for moving the whole mass-spring system, which allows the curtain to be opened and closed. Figure 21 shows two stages of this animation.

Figure 20: Left, the smooth curve of some of the folds can be seen. Right, the curtain control, shown when the curtain is selected as the scene.

Figure 21: Different shapes of the curtain as it opens.

Figure 22: Final render of the curtain.

7.2. Collision detection: tablecloth

Figure 23 shows a tablecloth and a collision object (in purple) that simulates a table.

While the first ray-triangle collision method (section 5.3.3) is adequate for the previous examples, it is not robust enough to stop the tablecloth triangles when contact with the table is detected. Figure 24 shows that the collision starts correctly but, as the particles that are not colliding fall under gravity, they drag the rest of the cloth with them and it passes through the collision object.

Figure 23: (a) Side view of the tablecloth and its collision object. (b) Wireframe model of the tablecloth.

Figure 24: Failure of the intersection method.

For this reason the second approach presented in section 5.3.3 was developed, and it proved more effective (figure 25).

Figure 25: Result of the second collision method.

Nevertheless, some triangles still pass through the collision object, so a two-step simulation was adopted (figure 26): first, the particles that collide with the table are frozen; once the cloth has settled, the second step allows the frozen particles to react again to the applied forces.

Figure 26: The tablecloth remains stable.

Figure 27 shows how super-elasticity overstretches some of the cloth springs, so that the deformation looks incorrect.

Figure 27: Triangles affected by super-elasticity (in red) when the wind force is applied to the tablecloth.

Figure 28: Final render of the tablecloth.

7.3. Garment construction: cushion

The first construction model is not a garment as such but a cushion. In the real world, a cushion cover and a simple dress both follow the same procedure of cutting patterns and then sewing them.

Figure 29 shows the initial state of the model. The cushion consists of two panels, one on top of the other, and is sewn so that it adapts to the shape of a collision object that simulates the filling. Figure 29.b shows the seams defined and the springs created to sew the cushion along those seams.

Figure 29: Cushion panels and collision object. (a) The seams are shown along the edges, in green. (b) The springs are created between pairs of seam particles.

Figure 30 shows how the panels are drawn together by the forces generated by the springs; in the final stage, some triangles of the cushion pass through the collision object because of the resolution of the patterns.

Figure 30: Different stages of sewing the cushion.

Figure 31: Final render of the cushion.

7.4. Garment construction: dress

The second construction model is a simple dress that is sewn onto the body of a doll. The dress consists of two panels designed in the manner of the sewing patterns of a real dress.

Figure 32 shows the pattern already in place. The two panels are placed facing each other so that they can be sewn while fitting the body of the collision model.

Figure 32: Initial position of the dress patterns.

Figure 33 shows how the panels take shape as their positions are modified by the spring forces and as they collide with the virtual mannequin on which the dress is being sewn.

Figure 33: Different stages of the dress sewing process, starting from its initial position.

Figure 34: Three views of the dress.

Figure 35 shows the small gap between the panels and how adding triangles corrects the problem.

Figure 35: Merging the panels into a single mesh.

Throughout this process, the cloth has had to fit a geometric object by detecting collisions while the constraints try to keep the particles in formation. As a result, some of the cloth triangles may end up wholly or partly inside the geometry of the model being dressed. The resolution of the cloth also plays an important role: the lower the resolution, the harder it is for the cloth to adapt to a geometric shape and the more likely such penetrations become. Figure 36 shows this problem.

Figure 36: Part of the strap penetrates the shoulder.

To solve this, the application allows particle positions to be modified independently of the simulation. Once the adjustment is finished, re-running the integration and the constraints makes the new particle positions settle naturally. Figure 37 shows the process of correcting the strap.

Figure 37: (a) Problem vertex moved along the +Y axis. (b) Displacement along the -Z axis. (c) Displacement along the +Y axis. (d) Final result.

[DG07] suggest attaching parts of the garment to the body so as not to lose efficiency. To do so, they use a type of constraint that keeps a particle attached to the nearest polygon of the virtual body, so that these parts of the garment move with the model. Animation of the dressed model has not been considered in this work, but this type of constraint has been emulated by keeping some particles fixed, especially in the areas most critical for collision, such as the straps. Otherwise, the weight of the particles under gravity pulls too hard on the dress and drags it to the floor. Figure 38 shows the finished dress.

Figure 38: Finished dress.

8. GPU implementation

Constantly updating the particle positions at every time step places a heavy load on the CPU; moreover, collision detection is one of the most expensive computations in cloth simulation, so a GPU implementation is well suited to these tasks. It is therefore of great interest to develop a version of the methods presented here using this technology, taking the works cited below as a reference.

The Verlet integrator can easily be ported to the GPU, as shown in [Gre03].

[RNSS05] implement on the GPU both the cloth simulation and the collisions with a moving human body. They base their work on the GPU implementation of the Verlet integrator by [Gre03] and the cloth simulation of [Jak01]. They introduce a new technique called the quasi-feedback method, which ends the iterative process that satisfies the cloth constraints once the deformation is no longer significant.

[VR08] propose data structures and an algorithm for implementing a general mass-spring system, based on simple Euler integration.

APEX Clothing is the cloth simulation module of the NVIDIA APEX framework. [KCMF12] developed the algorithm incorporated in the PhysX SDK 3.x Cloth solver. Observing that in video games a large number of the particles making up the cloth geometry are usually attached to part of the character, whose kinematics drive their animation, the authors devise an algorithm called Long Range Attachment Constraints, which projects freely moving particles back into the set of valid solutions whenever they exceed a precomputed initial distance. A description of the algorithm is included, using CUDA for the GPU implementation.

9. Conclusions and future work

This work has presented a tool for handling the different aspects of cloth simulation and the reaction of cloth to external forces such as gravity, wind or collision with other objects. To this end, a model capable of creating a cloth that responds to these stimuli has been described. In addition to creating isolated pieces of cloth, the tool allows dresses to be created and fitted to the body of a mannequin, making it possible to construct simple garments on static objects.

Although the results obtained with the current implementation are fairly satisfactory, as the images show, work is under way to improve several aspects of the tool. First, the collision detection system is being improved. The aim is to avoid situations in which the implemented algorithm misses collisions because it only tests rays cast from the barycentre of each triangle, so a collision at a triangle vertex is likely to go undetected.

One possible solution is to cast rays from every particle, which adds a significant computational overhead. For this reason, the possibility of performing these additional computations only at the collision boundary is being studied, that is, in the region of the cloth where it stops or starts colliding with another object.

In addition, the sewing method is being improved by analysing an approach that leaves no gaps between panels, merging the particles that belong to each pair of seams.

Other future work includes correcting super-elasticity, handling self-collisions, modelling friction between cloth and objects, and designing and positioning patterns. Likewise, Verlet integration is insufficient if a more robust and accurate simulation is required, so work on implicit methods is under way.

References




Dynamic Footsteps Planning for Multiple Characters

A. Beacco¹, N. Pelechano¹ & M. Kapadia²

¹Universitat Politècnica de Catalunya
²University of Pennsylvania

Abstract

Animating multiple interacting characters in real-time dynamic scenarios is a challenging task that requires not only positioning the root of the character, but also placing the feet in the right spatio-temporal state. Prior work either controls agents as cylinders, ignoring feet constraints and thus introducing visual artifacts, or uses a small set of animations, which limits the granularity of agent control. In this work we present a planner that, given any set of animation clips, outputs a sequence of footsteps to follow from an initial position to a goal such that it guarantees obstacle avoidance and correct spatio-temporal foot placement. We use a best-first search technique that dynamically repairs the output footstep trajectory based on changes in the environment. We show results of how the planner works in different dynamic scenarios, with trade-offs between accuracy of the resulting paths and computational speed that can be used to adjust the search parameters accordingly.

Categories and Subject Descriptors (according to ACM CCS): I.3.7 [Computer Graphics]: Three-Dimensional

Graphics and Realism—Animation

1. Introduction

Animating groups of human characters in real time is a difficult but necessary task in many computer graphics applications, such as video games, training and immersive virtual environments. There is a large amount of work in the crowd simulation and pedestrian dynamics literature, but most applications still lack convincing character animation that offers a variety of animation styles without noticeable artifacts.

Humans walking in the real world have a cognitive map of the environment, which they use to calculate their path through waypoints (doors, corners, etc.). They then navigate along the path by choosing footsteps that avoid collisions with nearby humans and obstacles. Likewise, a virtual character can be simulated within an environment by first deciding a high level path (a sequence of waypoints) using a navigation mesh [Mon09] [OP13] and then calculating the exact trajectory to walk from one waypoint to the next. That trajectory is defined by the chosen steering behavior algorithm, whose output encodes the state of the agent over time. An agent state can be modeled at different granularities, ranging from a simple point and radius with a velocity vector in a low level representation, to a complete high resolution mesh with joint velocity vectors, rotational angles, torques and any other elements that might improve the simulation in a higher level representation. Intermediate representations [SKRF11] can perform simulations in real time by using an inverted pendulum model of the lower body of a biped, which can be controlled to generate biomechanically plausible footstep trajectories.

This paper focuses on the computation of natural footstep trajectories for groups of agents. Most work in the literature uses crowd simulation approaches (rule-based models, social forces, cellular automata models, continuum forces) to calculate the root displacement between two consecutive waypoints. This leads to smooth root trajectories, but with many artifacts due to the lack of constraints between the feet and the floor. Some approaches do focus on correct foot placement, but in most cases they are quite limited in the range of animations available or can only deal with a small number of agents. Our work enforces foot placement constraints and uses motion capture data to produce natural animations, while still meeting real-time constraints for many interacting characters.

Figure 1 illustrates an example of four agents planning their footstep trajectories towards their goals while avoiding collisions with other agents, and re-planning when necessary. The resulting trajectories not only respect ground contact constraints, but also create more natural paths than traditional multi-agent simulation methods.

Figure 1: Footstep trajectory planning for four agents reaching goals in opposite directions.

This paper is organized as follows. We first examine previous approaches to crowd simulation and their methods. Next we give an overview of our framework and explain in detail our pre-process step, planning algorithm and animation system. Finally we show some of our results and discuss the strengths and limitations of our method, along with conclusions and future work.

2. Related Work

Crowd simulation approaches can be classified into two main sets, based on whether they only calculate the position of the root and ignore the animations, or whether they plan respecting the underlying animations. The first set focuses on simulating realistic behaviors regarding overall character navigation and does not consider animations. In fact, sometimes the goal is simply to model agents as cylinders that move around a virtual environment avoiding collisions. The second set, which plans while being aware of the available animation clips, needs a pre-process that analyzes the set of clips in order to plan paths respecting the constraints between the feet and the floor. In some cases, if the animation set is handmade, the analysis is not necessary because the animations have already been built with specific parameters (such as speed, angle of movement and distance between feet), which are taken into consideration when planning.

The first group works with root velocities and forces or rules on a continuous space, or with displacements within a grid. Different models include social forces [HFV00], rule-based models [Rey87], cellular automata [TLCDC01], flow tiles [Che04], roadmaps [SAC∗07], continuum dynamics [TCP06], local fields [KSHF09], hybrid methods [SKH∗11], and force models parameterized by psychological and geometrical rules [PAB07]. They can easily represent agents by discs or cylinders to illustrate their steering behavior, but they do not address the final representation using 3D animated characters, so the output trajectory needs to be used to synthesize an animation that follows it. Synthesizing the animation from a small database can cause artifacts such as foot sliding that need additional work to be removed [PSB11].

The second group works directly with the set of available animations to construct motion graphs [KGP08, ZS09, RZS10, MC12] or precomputed search trees [LK06]. These approaches try to reach the goal by connecting motions to each other [WP95], sometimes limiting the movements of the agents. Other methods try to use motion graphs in the first group, combining them with path planners [vBEG11]. Having a large animation database reduces the limitations in terms of freedom of movement, but also makes the planning more time consuming. The ideal solution would find a good trade-off between these two goals: freedom of movement and fast planning.

Some approaches have tried to change the simulation paradigm by using more complex agent representations, such as footsteps. These can be physically based but generated off-line [FM12], generated online from an input path computed by a path planner [EvB10], or planned using an inverted pendulum model instead of root positions [SKRF11]. Recent work [KBG∗13] proposes the use of multiple domains of control, focusing searches in more complex domains only when necessary. The resulting behavior gives characters better interactivity with the environment and other agents, but these approaches fall in the first group of our classification, since they do not take animation into account and need another process to synthesize it.

Some locomotion controllers are able to synthesize animations in real time according to velocity and orientation parameters [TLP07]. Other locomotion controllers can accurately follow a footstep trajectory by extracting and parameterizing the steps of a motion capture database [vBPE10]. However, they all need a very large database, and their computational cost does not allow many characters to be simulated in real time.

Our work belongs to the second group of the classification, since it uses an animation-based path planner. However, instead of pre-computing a search tree with a few handmade animation clips, we pre-process motion capture data (which allows us to have more natural looking animations and a larger variety) and extract actions from the input animations to compute a graph on the fly, with an intelligent pruning based on logical transitions and a collision prediction system. Collisions are predicted and avoided for both static and deterministic dynamic obstacles, as well as for other agents, since we expose all known trajectories.

Figure 2: Diagram showing the process required for the dynamic footstep planning algorithm.

3. Overview

Figure 2 illustrates the process of dynamic footstep planning for each character in real time. The framework iterates over all characters in the simulation to calculate each individual footstep trajectory, considering obstacles in the environment as well as the calculated trajectories of other agents.

The Preprocess phase is responsible for extracting annotated animation clips from a motion capture database. The real-time Planner uses the annotated animations as transitions between state nodes in order to perform a path planning task from an input Start State to a Goal State. The output of the planner is a Plan consisting of a sequence of actions $A_0, A_1, \dots, A_n$, which are clips that the Animation Engine must play in order to move the Character along the computed path. Both the state and the plan of the Character are then input to the World State and thus exposed to other agents' planners, together with the nearby static or dynamic obstacles. The World State is used to prune and accelerate the search in order to predict and avoid potential collisions. The Time Manager is responsible for checking the elapsed time between frames to keep track of the expiration time of the current plan. Finally, the Events Monitor is in charge of detecting events that will force the planner to compute a new path. The Events Monitor receives information from the World State, the Time Manager, the Goal State and the character's current Plan. Events include a possibly invalid plan, the detection of a new dynamic obstacle, and a change in the goal position.

3.1. Events Monitor

The events monitor is the module of the system in charge of deciding when a new path needs to be computed. The elements that trigger an event are:

• Goal state changed: the goal changes its position or a new goal is assigned to the current character.

• New agent or deterministic dynamic obstacle nearby: other agents or dynamic obstacles enter the surrounding area of our character. A new path needs to be calculated to take the potential collision into account.

• Collision against a non-deterministic obstacle: sometimes an unpredictable dynamic obstacle (for example, one moved by the user) could lead to a collision, so when the events monitor detects such a situation it triggers an event in order to react to it.

• Plan expiration: one way to ensure that each agent takes into account the latest plans of every other agent is to give every plan an expiration time and force re-planning when it is reached. A time manager helps monitor this task, but instead of a time parameter this event can also be triggered by a maximum number of actions that we want to perform (play) before re-planning.
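The four triggers listed above can be illustrated with a small sketch. It is only one possible reading of the events monitor: the `MonitorInput` fields, the `Event` names and the two thresholds are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Event(Enum):
    GOAL_CHANGED = auto()
    NEW_NEARBY_OBSTACLE = auto()
    UNPREDICTED_COLLISION = auto()
    PLAN_EXPIRED = auto()


@dataclass
class MonitorInput:
    """Hypothetical per-frame snapshot of what the events monitor observes."""
    goal: tuple               # current goal position (x, z)
    planned_goal: tuple       # goal the current plan was computed for
    nearby_ids: frozenset     # agents/deterministic obstacles in the area now
    known_ids: frozenset      # those already considered by the current plan
    collision_imminent: bool  # raised for non-deterministic obstacles
    plan_age: float           # seconds since the plan was computed
    actions_played: int       # actions consumed since the plan was computed


def detect_events(m, max_plan_age=2.0, max_actions=4):
    """Return the set of events that should force a re-plan.

    max_plan_age and max_actions are illustrative values only.
    """
    events = set()
    if m.goal != m.planned_goal:
        events.add(Event.GOAL_CHANGED)
    if m.nearby_ids - m.known_ids:
        events.add(Event.NEW_NEARBY_OBSTACLE)
    if m.collision_imminent:
        events.add(Event.UNPREDICTED_COLLISION)
    # Expiration by elapsed time or by number of actions played.
    if m.plan_age >= max_plan_age or m.actions_played >= max_actions:
        events.add(Event.PLAN_EXPIRED)
    return events
```

A planner loop would call `detect_events` once per frame and re-plan whenever the returned set is non-empty.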

4. Preprocess

During an offline stage, we analyze a set or a database of animation clips in order to extract the actions that our planner will then use as transitions between states. Each action consists of a sequence of skeleton configurations that performs a single animation step, i.e., it starts with one foot on the floor and ends when the other foot (the swing foot) is completely resting on the floor. Our preprocess should work with any animation clip; we tried both handmade and motion capture clips (from the CMU database [CMU13]). After analyzing each animation clip, we calculate mirrored animations, so that each analyzed clip is available starting with either foot on the floor. The output of this stage is a set of annotated animations that can be used by the planner and the animation engine. This set can easily be serialized and stored to be reused for all instances of the same character type (same skeleton and same scale; otherwise, even if they share animations, these could produce displacements of different magnitudes), reducing both the preprocess time and the global memory consumption.

4.1. Locomotion Modes

In order to give our characters a wider variety and agility of movements, we define different locomotion modes that need to be treated differently. Each animation clip is tagged with its locomotion mode. We have the following set of locomotion modes:

• Walking: these are the main actions used by the planner and the agents, since they represent the most common way to move. We therefore have a wide variety of walks, ranging from very slow to fast and covering different angles (not just forwards and backwards).

• Running: these are treated in the same way as the walking actions, with an additional cost penalty (since running consumes more energy than walking). We have also noticed empirically that running actions do not require as many different displacement angles as walking actions.

• Turns: clips of animation where the agent turns in place or with a very small root displacement. They are defined by their turning angle and velocity.

• Platform Actions: this group contains actions like jumping or crouching in order to avoid some obstacles. Such actions should have a high energy cost and should only be used in case of imminent danger of collision.

Turns and platform actions need to be performed completely from start to end, and they do not have any intrinsic pattern we can easily detect, whereas walking and running animations can be segmented into clips containing a single step. Animations of both the walking and running locomotion modes therefore receive special treatment: we extract the footsteps and keep only the frames of the animation covering a single step.

4.2. Footsteps Extraction

As mentioned above, an action starts with one foot on the floor and ends when the other foot is planted on the floor. But animation clips, especially motion capture animations, do not always start and end in this very specific way. Therefore we need a foot plant extraction process to determine the beginning and end of each animation clip that will be used as an action.

Simply checking the height of the feet in the motion capture data is not enough, since the data usually contains noise and artifacts due to targeting. In most cases, when the foot swings forwards while walking, it can come very close to the ground, or even pass through it.

Other techniques also incorporate the velocity of the foot during the foot plant, which should be small. However, this solution can also fail, since foot skating can introduce a large velocity. We detect foot plants using a height and velocity based detector similar to the method described in [vBE09], where foot plant detection is based on both height and time. First, the height-based test provides a set of candidate foot plants, but only those where the foot plant occurs in a group of adjacent frames are kept.

Our method combines this idea with changes in velocity for more accurate results: over a discretized set of frames, we detect a foot plant when the foot is close to the ground for a few adjacent frames and shows a characteristic change in velocity (a deceleration, followed by being still for a few frames, and finishing with an acceleration). Notice that this method works for any kind of locomotion, ranging from slow walking to running and including turns in any direction.
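As a rough illustration of the detector described above, the following sketch scans per-frame foot heights and speeds for runs of adjacent frames where the foot is low and nearly still, bounded by a deceleration and an acceleration. The function name, the input layout and all thresholds are assumptions made for this example; the paper does not give concrete values.

```python
def detect_foot_plants(heights, speeds, max_height=0.05, max_speed=0.15,
                       min_frames=3):
    """Return (start, end) frame ranges (end exclusive) of detected foot plants.

    heights[i] and speeds[i] are the foot height (m) and speed (m/s) at frame i.
    All thresholds are illustrative assumptions, not values from the paper.
    """
    plants = []
    n = len(heights)
    i = 0
    while i < n:
        # A frame is a candidate when the foot is both low and nearly still.
        if heights[i] <= max_height and speeds[i] <= max_speed:
            start = i
            while i < n and heights[i] <= max_height and speeds[i] <= max_speed:
                i += 1
            end = i
            long_enough = end - start >= min_frames
            # Deceleration before the run and acceleration after it, checked
            # only when neighbouring frames exist.
            decel = start == 0 or speeds[start - 1] > speeds[start]
            accel = end == n or speeds[end] > speeds[end - 1]
            if long_enough and decel and accel:
                plants.append((start, end))
        else:
            i += 1
    return plants
```

Requiring a minimum run length rejects the case mentioned earlier in which a swinging foot briefly comes close to the ground, while the speed threshold rejects low but fast-moving frames.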




4.3. Clip annotation

An analysis is performed by computing some variables over the whole duration of the animation. Each analyzed animation clip is annotated with the following information:

Symbol          Description
$L_{mod}$       Locomotion mode
$F_{sp}$        Supporting foot
$F_{sw}$        Swing foot
$\vec{v}_r$     Root velocity vector
$\vec{f}$       Foot displacement
$t$             Time duration
$t_0$           Initial time
$t_{end}$       End time
$\alpha$        Movement angle
$\theta$        Rotation angle
$P$             Set of sampled positions

Table 1: Information stored in each annotated animation clip.

The locomotion mode indicates the type of animation (walk short step, walk long step, run, walk jump, climb, turn, etc.). The supporting foot is the foot that is initially in contact with the floor, and the swing foot is the foot that moves through the air towards the next footstep. The supporting foot is calculated automatically based on its height and velocity vector from frame to frame.

The root velocity vector indicates, taking the starting frame of the extracted clip as reference, the total local displacement vector of the root during the whole step. We therefore know the magnitude of the displacement, its speed in m/s and the angle of the movement. Similarly, the foot displacement tracks the movement of the swing foot.

The movement angle, in degrees, is the angle between the swing foot displacement vector and the initial root orientation. An angle of 0 therefore means an action moving forwards, and 180 means a backward action. An angle of 90 means an action moving to the left if the swing foot is the left one, or to the right if the swing foot is the right one. Finally, the rotation angle is the angle between the root orientation vectors in the first and last frames of the clip.

$t$ indicates the total time duration of the extracted clip, while $t_0$ and $t_{end}$ store the start and end points, within the original animation, of the segment that the extracted clip covers. These values are used by the animation engine to play the extracted clip.

$P$ is a set of sampled positions for certain joints of the character within an animation clip, and it is used for collision detection (see Section 5.5).
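The annotation of Table 1 maps naturally onto a small record type. The sketch below is one possible layout, not the authors' data structure: the field names are invented for illustration, displacements are assumed to be 2D (x, z) vectors in the local frame of the first frame, and the initial root orientation is assumed to be the local +z axis.

```python
import math
from dataclasses import dataclass, field


@dataclass
class AnnotatedClip:
    """One extracted step; field names mirror Table 1 but are assumptions."""
    mode: str                 # L_mod: "walk", "run", "turn", ...
    support_foot: str         # F_sp: "left" or "right"
    swing_foot: str           # F_sw
    root_velocity: tuple      # v_r: root displacement (x, z), local frame
    foot_displacement: tuple  # f: swing foot displacement (x, z), local frame
    t0: float                 # start time in the original animation (s)
    t_end: float              # end time in the original animation (s)
    rotation_angle: float     # theta, degrees
    samples: list = field(default_factory=list)  # P: sampled joint positions

    @property
    def duration(self):
        """t: time duration of the extracted clip."""
        return self.t_end - self.t0

    @property
    def movement_angle(self):
        """alpha: unsigned angle, in degrees, between the swing foot
        displacement and the initial root orientation (assumed +z)."""
        x, z = self.foot_displacement
        return math.degrees(math.atan2(abs(x), z))
```

With this convention a forward step yields 0, a backward step 180 and a pure side step 90, matching the description of the movement angle; the side of the step is given by `swing_foot` rather than by the sign of the angle.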

5. Planning Footstep Trajectories

In this section, we first present the high level path planning on the navigation mesh. Then we define the problem domain we are dealing with when planning footstep trajectories. Next we give details of the real-time search algorithm that we use, as well as the pruning carried out to accelerate the search. Finally we explain how collision detection and prediction are performed.

Figure 3: High level path with local footstep trajectory between consecutive visible waypoints.

5.1. High Level Path Planning

Footstep trajectories are calculated between waypoints of the high level path (see Figure 3). This path is calculated over the navigation mesh using Recast [Mon09]. An A* algorithm is used to compute the high level path, and then footstep trajectories are calculated between consecutive visible waypoints. Given a sequence of waypoints $w_i, w_{i+1}, w_{i+2}, \dots, w_{i+n}$, if there is a collision-free straight line between $w_i$ and $w_{i+n}$, the footstep trajectory is calculated between those two waypoints and every intermediate waypoint is ignored. This provides more natural trajectories, as it avoids zig-zagging over unnecessary waypoints. Waypoints are considered by the planner as goal states, and each time we change waypoint the change of goal is detected by the events monitor, thus forcing a new path to be computed.
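The waypoint skipping described above can be sketched as a search for the farthest waypoint that is visible from the current position. The visibility test `is_line_free` is a hypothetical callback supplied by the caller (for instance a ray cast against the static world); it is not a Recast function.

```python
def farthest_visible_waypoint(position, waypoints, start, is_line_free):
    """Return the index of the farthest waypoint, from `start` onwards, that
    can be reached from `position` along a collision-free straight line.

    `is_line_free(a, b)` is a caller-supplied (hypothetical) visibility test.
    Falls back to `start` when no waypoint is directly visible.
    """
    best = start
    for i in range(start, len(waypoints)):
        if is_line_free(position, waypoints[i]):
            best = i
    return best
```

The returned waypoint would become the planner's goal state, and intermediate waypoints are simply never used as goals.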

5.2. Problem Definition

The algorithm for planning footstep trajectories needs to calculate the sequence of actions that each agent must follow in order to go from its start position to its goal position. This means solving the problem of moving in a footstep domain between two given positions in a specific amount of time. Therefore, characters calculate the best trajectory based on their current state, the cost of moving to their destination and a given heuristic. The cost associated with each action is given by the biomechanical effort required to move (e.g., walking has a smaller cost than running, and stopping for a few seconds may have a lower cost than wandering around a moving obstacle). The problem domain that we are dealing with is thus defined as:

$$\Omega = \left( S, A, c(s, s'), h(s, s_{goal}) \right)$$




Here, $S$ is the state space, defined as the set of states composed of the character's own state (self), the composition of the world (environment), and the state of the other agents. The action space $A$ is the set of possible transitions in the state space and thus has an impact on the branching factor of the planner. Each transition is an action, so we have as many transitions as extracted clips times the speed variations we allow (for example, we can play a clip at half speed to obtain its displacement two times slower). Actions are defined by their corresponding annotated animation. $c(s, s')$ is the cost associated with moving from state $s$ to state $s'$. Finally, $h(s, s_{goal})$ is the heuristic function estimating the cost to go from $s$ to $s_{goal}$.

5.3. Real-Time Planning Algorithm

Planning footstep trajectories in real time requires finding a solution in the problem domain $\Omega$ described above. The planner solution consists of a sequence of actions $A_0, A_1, \dots, A_n$. Our planner interleaves planning with execution, because we want to be able to replan while consuming (playing) the actions. For this purpose, we use a best-first search technique (e.g., A*) in the footstep problem domain, defined as follows:

• $S$: the state space is composed of the character's own state (defined by position, velocity, and the collision model chosen), the state of the other agents plus their plans, and the state and trajectory of the deterministic dynamic obstacles. For more details about collision models and obstacle avoidance see Section 5.5.

• $A$: the action space consists of every possible action that can be concatenated with the current one without leading to a collision, so before adding an action we perform all necessary collision checks.

• $c(s, s')$: the cost of going from one state to another is given by the energy effort necessary to perform the animation:

$$c(s, s') = M \int_{t=0}^{t=T} \left( e_s + e_w |v|^2 \right) dt$$

where $M$ is the agent mass, $T$ is the total time of the animation or action being calculated, $v$ is the speed of the agent in the animation, and $e_s$ and $e_w$ are per-agent constants (for an average human, $e_s = 2.23\ \frac{J}{kg \cdot s}$ and $e_w = 1.26\ \frac{J \cdot s}{kg \cdot m^2}$) [KWRF11].

• $h(s, s_{goal})$: the heuristic to reach the goal comes from the optimal effort formulation:

$$h(s, s_{goal}) = 2 M c_{opt}(s, s_{goal}) \sqrt{e_s e_w}$$

where $c_{opt}(s, s_{goal})$ is the cost of the optimal path from $s$ to $s_{goal}$; in our case we chose the Euclidean distance between $s$ and $s_{goal}$ [KWRF11]. The optimal effort for an agent in a scenario is defined as the energy consumed in taking the optimal route to the target while traveling at the average walking speed $v_{av} = \sqrt{e_s / e_w} = 1.33\ m/s$.
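The cost and heuristic above can be checked numerically with a short sketch. It assumes, for simplicity, that the speed is constant over an action, so the integral reduces to a product; the function names are chosen for this example only.

```python
import math

E_S = 2.23  # J/(kg*s), constant for an average human [KWRF11]
E_W = 1.26  # J*s/(kg*m^2), constant for an average human [KWRF11]


def action_cost(mass, duration, speed):
    """Effort of an action played at constant speed: M * (e_s + e_w*v^2) * T."""
    return mass * (E_S + E_W * speed ** 2) * duration


def heuristic(mass, position, goal):
    """Optimal-effort estimate 2 * M * d * sqrt(e_s * e_w), with d the
    Euclidean distance to the goal."""
    d = math.dist(position, goal)
    return 2.0 * mass * d * math.sqrt(E_S * E_W)


def average_walking_speed():
    """Speed minimising the effort per metre: sqrt(e_s / e_w)."""
    return math.sqrt(E_S / E_W)
```

Under the constant-speed assumption, walking a straight line of length d at the average walking speed costs exactly the heuristic value, and any other constant speed costs more, which is why the heuristic does not overestimate the effort of a straight-line path.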

Taking all these components into consideration, the planner can search for the path with the least cost and output the footstep positions with their time marks, which the animation engine will follow by playing the planned sequence of actions (see Figure 4).

Figure 4: Footstep trajectory with time constraints that needs to be followed by the animation controller.

5.4. Pruning Rules

In order to accelerate the search we can add simple rules that help prune the tree and reduce the branching factor. A straightforward way to halve the size of the tree is to consider only consecutive actions that start with the opposite foot: given a current node with a supporting foot, we expand the node only for transitions that have that same foot as the swing foot. Actions that are not possible due to locomotion constraints on speed or rate of turning are also pruned to ensure natural character motion (for example, after a standing-still animation we do not allow a fast running animation). The next pruning rule is based on collision prediction, as described in the following section: when a node is expanded and a collision is detected, the whole graph that could be expanded from it is automatically pruned. The pruning process reduces the branching factor of the search and also ensures natural footstep selection.
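A heavily simplified sketch of a best-first search with two of these pruning rules (alternating feet and collision-based pruning) is given below. It is not the authors' planner: actions are reduced to fixed 2D displacements with a scalar cost, the heuristic is the plain Euclidean distance rather than the effort heuristic, and `is_free` is a hypothetical collision predicate supplied by the caller.

```python
import heapq
import itertools
import math
from collections import namedtuple

# A hypothetical, heavily simplified action: one step with a fixed displacement.
Action = namedtuple("Action", "name support swing dx dz cost")


def plan_footsteps(start, goal, first_support, actions, is_free,
                   goal_radius=0.5, max_expansions=10000):
    """A* over footstep actions. Returns a list of action names, or None."""
    def h(p):
        return math.dist(p, goal)

    tie = itertools.count()  # tie-breaker so the heap never compares states
    # Node: (f, tie, g, position, supporting foot of the next action, plan)
    frontier = [(h(start), next(tie), 0.0, start, first_support, [])]
    closed = set()
    for _ in range(max_expansions):
        if not frontier:
            return None
        _, _, g, pos, support, plan = heapq.heappop(frontier)
        if math.dist(pos, goal) <= goal_radius:
            return plan
        key = (round(pos[0], 2), round(pos[1], 2), support)
        if key in closed:
            continue
        closed.add(key)
        for a in actions:
            # Pruning rule: feet alternate, so the next action must be
            # supported by the foot that has just been planted.
            if a.support != support:
                continue
            nxt = (pos[0] + a.dx, pos[1] + a.dz)
            # Pruning rule: a predicted collision discards the node and,
            # with it, everything that could be expanded from it.
            if not is_free(pos, nxt):
                continue
            g2 = g + a.cost
            heapq.heappush(frontier, (g2 + h(nxt), next(tie), g2, nxt,
                                      a.swing, plan + [a.name]))
    return None
```

For example, with one forward step per foot, a goal four steps ahead and no obstacles, the returned plan alternates the two actions four times. The `max_expansions` bound is an assumption added so that the sketch always terminates.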

5.5. Collision Prediction

While expanding nodes, the planning algorithm must check for each expanded node whether the future state is collision free. If it is, the node is kept and the search continues expanding it. Otherwise, the node is discarded. In order to run large simulations in complex environments, this pruning process must be very fast.

In order to predict collisions against other agents or obstacles (either dynamic or static), we introduce a multi-resolution collision detection scheme that performs collision checks at two resolution levels. Our lowest resolution collision detection model is a simple cylinder of fixed radius centered at the root of the agent. The higher resolution model consists of five cylinders around the end joints (head, hands and feet) that are used to make finer collision tests (see Figure 5).




We could introduce more collision models, where high

resolution ones will be executed only in case of detecting

collisions using the coarser ones. At the highest complexity

mode we could have the full mesh collision check, but for

the purpose of our simulation the 5 cylinders model gives us

enough precision to avoid agents walking with their arms in-

tersecting against other agents as they swing back and forth.

Compared against simpler approaches that only consider ob-

stacle detection against a cylinder, our method gives better

results since it allows us to have closer interactions between

agents. All obstacles have simple colliders (boxes, spheres,

capsules) to accelerate the collision checks by using a fast

physics ray casting test.
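The coarse-to-fine test described above can be sketched as follows. This is a simplified illustration under stated assumptions: all cylinders are vertical and compared only in the ground plane (heights are ignored), the obstacle is itself approximated by a cylinder rather than the box, sphere or capsule colliders and ray casts used in the paper, and the coarse cylinder is assumed to enclose the five end-joint cylinders. The names `Cylinder`, `overlaps` and `agent_collides` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Cylinder:
    x: float
    z: float
    radius: float


def overlaps(a, b):
    """Two vertical cylinders overlap if their ground-plane circles overlap."""
    return math.hypot(a.x - b.x, a.z - b.z) < a.radius + b.radius


def agent_collides(root, end_joints, obstacle):
    """Two-level test: cheap root cylinder first, end joints only if needed."""
    # Level 1 (coarse): a miss against the root cylinder is a definite miss,
    # assuming the root cylinder bounds all end-joint cylinders.
    if not overlaps(root, obstacle):
        return False
    # Level 2 (fine): head, hands and feet are tested individually.
    return any(overlaps(joint, obstacle) for joint in end_joints)


root = Cylinder(0.0, 0.0, 1.0)
end_joints = [
    Cylinder(0.0, 0.0, 0.15),    # head
    Cylinder(-0.4, 0.0, 0.10),   # left hand
    Cylinder(0.4, 0.0, 0.10),    # right hand
    Cylinder(-0.15, 0.1, 0.10),  # left foot
    Cylinder(0.15, 0.1, 0.10),   # right foot
]

far = Cylinder(3.0, 0.0, 0.5)          # rejected by the coarse test
near_miss = Cylinder(0.9, 0.9, 0.3)    # inside the root cylinder, hits no joint
arm_hit = Cylinder(0.6, 0.0, 0.15)     # touches the right hand
print(agent_collides(root, end_joints, far),
      agent_collides(root, end_joints, near_miss),
      agent_collides(root, end_joints, arm_hit))
```

The `near_miss` case is the one the finer model exists for: a single-cylinder model would report a collision, while the five-cylinder model lets the two bodies pass close to each other.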

It is also important to mention that collision tests are not

only performed using the initial and end positions of the

expanded node, but also with sub-sampled positions inside

the animation (for the 5 cylinder positions). For example, consider an agent whose current walk-forward step starts facing a thin wall and ends on the other side of it. If we only checked for collisions at the start and end positions, we would not detect that the agent is actually going through the wall.

The sub-sampling of each animation is performed off-line and stored in the annotated animation. To save memory, the sampling is performed at a low frequency, and intermediate positions are estimated at run time by linear interpolation.
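A small sketch of how low-frequency samples can be interpolated at run time, and why checking only the start and end positions misses a thin wall. The sample layout `(time, (x, z))`, the function names and the wall model (an infinite wall at a fixed `x` with a given half thickness) are assumptions made for this illustration only.

```python
import math
from bisect import bisect_right


def position_at(samples, t):
    """Linearly interpolate a position from samples sorted by time.

    samples: list of (time, (x, z)) pairs stored with the annotated animation.
    """
    times = [s[0] for s in samples]
    if t <= times[0]:
        return samples[0][1]
    if t >= times[-1]:
        return samples[-1][1]
    i = bisect_right(times, t)  # index of the first sample strictly after t
    (t0, p0), (t1, p1) = samples[i - 1], samples[i]
    w = (t - t0) / (t1 - t0)
    return tuple(a + w * (b - a) for a, b in zip(p0, p1))


def crosses_wall(samples, wall_x, half_thickness, step):
    """Check interpolated positions every `step` seconds against the wall."""
    t_start, t_end = samples[0][0], samples[-1][0]
    steps = max(1, math.ceil((t_end - t_start) / step))
    for k in range(steps + 1):
        t = t_start + (t_end - t_start) * k / steps
        x, _ = position_at(samples, t)
        if abs(x - wall_x) <= half_thickness:
            return True
    return False


# One forward step sampled off-line at a low frequency (every 0.5 s).
samples = [(0.0, (0.0, 0.0)), (0.5, (0.5, 0.0)), (1.0, (1.0, 0.0))]

print(position_at(samples, 0.25))
print(crosses_wall(samples, wall_x=0.7, half_thickness=0.05, step=0.1))
```

Neither the stored samples nor the two end positions touch the wall at `x = 0.7`; only the interpolated positions reveal that the step passes through it.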

Finally, we provide each character with a surrounding view area used to maintain a list of the obstacles and agents that are potential threats to its path (see Figure 6). For each agent, we only consider the obstacles and agents that fall within its view area, which avoids running unnecessary collision tests.
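As a rough sketch of this filtering step, the view area can be approximated by a circle around the agent. The circular shape, the radius value and the function names are assumptions for illustration; the text does not specify the shape of the view area used in the paper.

```python
import math


def in_view_area(agent_pos, other_pos, radius):
    """True if `other_pos` lies inside a circular view area (assumed shape)."""
    return math.dist(agent_pos, other_pos) <= radius


def potential_threats(agent_pos, others, radius):
    """Keep only the obstacles and agents worth testing for collisions."""
    return [name for name, pos in others.items()
            if in_view_area(agent_pos, pos, radius)]


# Labels follow the Figure 6 caption: obstacles A-D, agents a-c (positions invented).
others = {
    "A": (1.0, 0.0), "B": (0.0, 2.0), "C": (5.0, 5.0), "D": (-6.0, 1.0),
    "a": (-1.0, -1.0), "b": (10.0, 0.0), "c": (0.0, -8.0),
}
print(potential_threats((0.0, 0.0), others, radius=3.0))
```

Only the entries returned by `potential_threats` would then be passed to the collision tests of the expanded nodes.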

5.5.1. Static World

Static obstacles are part of the same static world that is used to compute the navigation mesh with Recast [Mon09]. They do not need special treatment, since the high-level path produces waypoints that avoid collisions with static obstacles.

Figure 5: Collision model of 5 cylinders around the head, the left and right hands, and the left and right feet.

Figure 6: When planning we only consider obstacles and agents that are inside the view area. Obstacles A, B and agent a are inside it and the agent will try to avoid them, while it will ignore obstacles C, D and agents b and c.

5.5.2. Deterministic Dynamic Obstacles and Other Agents

Deterministic obstacles move along a predefined trajectory. Other agents have precomputed paths that can be queried to predict their future state. To avoid interfering with those paths, we allow access to their temporal trajectories: for each expanded node with state time t, we check for collisions with every obstacle and agent that falls inside the agent's view area, at their trajectory positions at time t. Figure 7 shows an example of an agent avoiding two dynamic obstacles.
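The time-indexed check can be sketched as below. The constant-velocity trajectory, the single-radius collision test and the names `linear_trajectory` and `node_is_collision_free` are assumptions chosen to keep the example short; any function that maps a time to a position could stand in for a predefined trajectory or a precomputed agent path.

```python
import math


def linear_trajectory(start, velocity):
    """Return a function t -> (x, z) for an obstacle moving at constant velocity."""
    def position(t):
        return (start[0] + velocity[0] * t, start[1] + velocity[1] * t)
    return position


def node_is_collision_free(node_pos, node_time, agent_radius, movers):
    """Test a node against each mover at the node's own state time.

    movers: list of (trajectory, radius) pairs, assumed to be already
    filtered by the agent's view area.
    """
    for trajectory, radius in movers:
        if math.dist(node_pos, trajectory(node_time)) < agent_radius + radius:
            return False
    return True


# An obstacle that starts at x = 4 and moves towards the origin at 1 m/s.
movers = [(linear_trajectory((4.0, 0.0), (-1.0, 0.0)), 0.5)]

for t in (0.0, 2.0, 4.0):
    print(t, node_is_collision_free((2.0, 0.0), t, 0.3, movers))
```

The same position `(2, 0)` is free at `t = 0` and `t = 4` but occupied at `t = 2`, which is why nodes carry a state time in addition to a position.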

5.5.3. Unpredictable Dynamic Obstacles

Unlike deterministic dynamic obstacles and other agents, unpredictable dynamic obstacles cannot be accounted for while planning. They are therefore ignored when expanding nodes, but we need a fast way to react to them. This is why we need the events monitor, which detects immediate collisions and forces re-planning. Figure 8 shows an example where a wall is arbitrarily moved by the user and the agent needs to continuously re-plan its trajectory.

6. Animation Engine

The animation engine is in charge of playing the output sequence of actions given by the planner. These actions contain all the data in the annotated animation. When a new action is played, the engine sets t_0 as the initial time of the animation. When the current animation reaches t_end, the animation engine blends the current animation with the next one in the queue.

Figure 7: An agent planning with two dynamic obstacles in front of it (top). After executing some steps the path is re-planned (bottom). The obstacle shown in blue is no longer in the agent's nearby area, so it is not considered in the collision checks of the new plan.

The animation engine also tracks the global root position and orientation, and applies rotation corrections by rotating the whole character using the rotation values of the annotated animation (rotation angle θ). The blending time between actions can be set by the user to a short duration (for example 0.5 s).

7. Results

Figure 8: An agent reacting to a non-deterministic obstacle by re-planning its path.

The presented framework has been implemented using the ADAPT simulation platform [SMKB13], which works with the Unity game engine [Uni13] and C# scripts. Our current framework can simulate around 20 agents at approximately 59-164 frames per second (depending on the maximum planning time allowed), and 40 agents at 22-61 frames per second (Intel Core i7-2600K CPU @ 3.40 GHz and 16 GB RAM). Figure 9 shows the average frame rates achieved for an increasing number of agents. The black line corresponds to a maximum planning time of 0.01 s, and the red line to 0.05 s. Additionally, by setting planner parameters such as the horizon of the search, we can achieve a significant speedup at the expense of solution fidelity. For example, we can produce purely reactive simulations where the character only plans one footstep ahead by reducing the search horizon to 1.
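The interplay between the maximum planning time and the search horizon can be illustrated with a small budgeted search. This is a hedged sketch, not the planner of the paper: it uses a plain greedy best-first search on a toy one-dimensional problem, and the names `plan`, `max_time` and `horizon` as well as the heuristic are assumptions. When the budget runs out, or the horizon stops expansion, the best partial path found so far is returned.

```python
import heapq
import itertools
import time


def plan(start, goal_test, successors, heuristic, max_time=0.01, horizon=None):
    """Greedy best-first search with a time budget and an optional horizon.

    Returns a path to the goal if one is found in time, otherwise the path
    to the most promising node (lowest heuristic value) popped so far.
    """
    deadline = time.perf_counter() + max_time
    counter = itertools.count()  # tie-breaker so paths are never compared
    best = (heuristic(start), [start])
    frontier = [(heuristic(start), next(counter), [start])]
    while frontier and time.perf_counter() < deadline:
        h, _, path = heapq.heappop(frontier)
        if h < best[0]:
            best = (h, path)
        if goal_test(path[-1]):
            return path
        # Horizon: do not expand nodes that are already `horizon` steps deep.
        if horizon is not None and len(path) - 1 >= horizon:
            continue
        for nxt in successors(path[-1]):
            heapq.heappush(frontier, (heuristic(nxt), next(counter), path + [nxt]))
    return best[1]


# Toy problem: walk along a line from 0 to 5 with steps of length 1 or 2.
def successors(s):
    return [s + 1, s + 2]


def heuristic(s):
    return abs(5 - s)


def goal_test(s):
    return s == 5


print(plan(0, goal_test, successors, heuristic, max_time=5.0))
print(plan(0, goal_test, successors, heuristic, max_time=5.0, horizon=1))
```

With `horizon=1` the search behaves reactively: it commits to the single most promising next step instead of a full path to the goal.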

Figure 9: Average frames per second for simulations with an increasing number of agents. We used two values for the maximum planning time: 0.01 s, resulting in higher frame rates, and 0.05 s, resulting in lower frame rates but better quality paths.

The results shown were produced with a database of 28 motion-captured animations. This is a small number compared to approaches based on motion graphs (which generally have around 400 animation clips), but a large number compared with techniques based on handmade animation (such as pre-computed search trees). This choice allows us to achieve results that look natural and yet can be used in real-time applications.

Our approach handles different scenarios in which several agents are simulated in real time, achieving natural-looking paths while avoiding obstacles and other characters (see accompanying videos). The quality of the results, in terms of natural paths and collision avoidance, depends on the planner. The planner is given a specific amount of time to find a solution, which translates into how many nodes of the graph are expanded. When we allow longer search times (a larger number of nodes to expand), the resulting trajectory looks more natural and is collision free, at a higher computational cost. Conversely, if we drastically reduce the search time (a smaller number of nodes to expand), we may end up with collisions, as can be seen in the accompanying videos and in Figure 10.

Interleaving planning with execution provides smooth animations, since not all the characters plan their paths simultaneously. At any time, a new plan is calculated with the end position of the current action as its start position.

We have also shown how the events monitor can successfully plan routes when deterministic obstacles invalidate a character's plan, as well as efficiently react to non-deterministic obstacles (see Figures 7 and 8).

8. Conclusions and Future Work

We have presented a multi-agent simulation approach where

planning is done in the action space of available animations.

Animation clips are analyzed and actions are extracted and

annotated, in order to be used in real time to expand a search

tree. Nodes are only expanded if they are collision free. To

predict collisions we sample animations and use a new col-

lision model with colliders for each end joint (head, hands

and feet). This way we are able to simulate agents avoiding

more detailed collisions. The presented framework handles both deterministic and non-deterministic obstacles: the former can be taken into consideration when planning, while the latter require completely reactive behavior.

Unlike pre-computed search trees our set of transitions is

composed of actions, and mainly footsteps, which allows us

to build online the search tree and to dynamically prune it,

considering not only start and goal positions, but also de-

parture and arrival times. An events monitor can help us to

decide when to re-plan the path, based on the environment

situation such as obstacle proximity or velocity.

We would like to further extend the hierarchical nature of

this work to add granularity (both in models and domains)

to adaptively switch between them [KCS10, Lac02, SG10].

Solutions from a coarser domain could also be reused to

accelerate the search into a finer domain, using techniques

such as tunneling [GCB∗11]. Another idea would be to have

a special class of actions constituting a reactive domain that

would only be used in case of an imminent threat. Since non-deterministic obstacles that invalidate the current plan force constant re-planning, it would be interesting to carry out a quantitative study of the impact of the number of non-deterministic obstacles on the frame rate obtained for different numbers of agents.

As illustrated in Figure 9, the computational complexity of our framework scales linearly with the number of agents. By reducing the search depth and maximum planning time, we

can simulate a larger crowd of characters at interactive rates.

Choosing the optimal value of these parameters that balance

computational speed and agent behavior is an interesting re-

search direction, and the subject of future work. Our frame-

work is not memory bound, and is amenable to paralleliza-

tion with each agent planning on an independent thread.

Notice that memory is required per animation (to store

sub-sampled animations) and not per agent in the simula-

tion, therefore increasing the size of the simulated group of

agents would not have an impact on the memory require-

ments of our system. If we wanted to simulate crowds of

characters we would need more CPU power, but not mem-

ory as long as we had more instances of characters sharing

the same skeleton and animations.

We would also like to replace our base search algorithm with a faster one that has repair capabilities, such as ARA* [LGT03]. Having more characters and different sets of actions that can be used depending on the

situation, like a reaction domain, would also accelerate the

search and give better results to our simulations in constantly

changing dynamic virtual environments.

Acknowledgements

This work has been partially funded by the Spanish Min-

istry of Science and Innovation under Grant TIN2010-

20590-C01-01. A. Beacco is also supported by the grant

FPUAP2009-2195 (Spanish Ministry of Education). We

would also like to acknowledge Francisco Garcia for his im-

plementation of the Best First Search algorithm.

References

[Che04] CHENNEY S.: Flow tiles. In Proceedings of the 2004 ACM SIGGRAPH/Eurographics symposium on Computer animation (2004), Eurographics Association, pp. 233–242. 2

[CMU13] CMU: CMU graphics lab motion capture database, 2013. http://mocap.cs.cmu.edu/. 4

[EvB10] EGGES A., VAN BASTEN B.: One step at a time: Animating virtual characters based on foot placement. The Visual Computer 26, 6-8 (Apr. 2010), 497–503. 2

[FM12] FELIS M., MOMBAUR K.: Using Optimal Control Methods to Generate Human Walking Motions. Motion in Games (2012), 197–207. 2




Figure 10: Example with four agents crossing paths with a drastically reduced search time, resulting in agents a and c not being able to avoid intersection, as seen in the last two images of this sequence. Also notice how agent b walks straight towards c, steps back and then continues, instead of following a smooth curve around c.

[GCB∗11] GOCHEV K., COHEN B., BUTZKE J., SAFONOVA A., LIKHACHEV M.: Path planning with adaptive dimensionality. In Fourth Annual Symposium on Combinatorial Search (2011). 9

[HFV00] HELBING D., FARKAS I., VICSEK T.: Simulating dynamical features of escape panic. Nature 407, 6803 (2000), 487–490. 2

[KBG∗13] KAPADIA M., BEACCO A., GARCIA F., REDDY V., PELECHANO N., BADLER N. I.: Multi-Domain Real-time Planning in Dynamic Environments. In Proceedings of the 2013 ACM SIGGRAPH/EUROGRAPHICS Symposium on Computer Animation (2013), SCA. 2

[KCS10] KRING A. W., CHAMPANDARD A. J., SAMARIN N.: DHPA* and SHPA*: Efficient hierarchical pathfinding in dynamic and static game worlds. In Sixth Artificial Intelligence and Interactive Digital Entertainment Conference (2010). 9

[KGP08] KOVAR L., GLEICHER M., PIGHIN F.: Motion graphs. In ACM SIGGRAPH 2008 classes (2008), ACM, p. 51. 2

[KSHF09] KAPADIA M., SINGH S., HEWLETT W., FALOUTSOS P.: Egocentric affordance fields in pedestrian steering. In Proceedings of the 2009 symposium on Interactive 3D graphics and games (New York, NY, USA, 2009), I3D ’09, ACM, pp. 215–223. 2

[KWRF11] KAPADIA M., WANG M., REINMAN G., FALOUTSOS P.: Improved benchmarking for steering algorithms. In Motion in Games. Springer, 2011, pp. 266–277. 6

[Lac02] LACAZE A.: Hierarchical planning algorithms. In AeroSense 2002 (2002), International Society for Optics and Photonics, pp. 320–331. 9

[LGT03] LIKHACHEV M., GORDON G., THRUN S.: ARA*: Anytime A* with provable bounds on sub-optimality. Advances in Neural Information Processing Systems (NIPS) 16 (2003). 9

[LK06] LAU M., KUFFNER J. J.: Precomputed search trees: planning for interactive goal-driven animation. In Proceedings of the 2006 ACM SIGGRAPH/Eurographics symposium on Computer animation (2006), Eurographics Association, pp. 299–308. 2

[MC12] MIN J., CHAI J.: Motion Graphs++. ACM Transactions on Graphics 31, 6 (Nov. 2012), 1. 2

[Mon09] MONONEN M.: Recast navigation toolkit webpage, 2009. http://code.google.com/p/recastnavigation/. 1, 5, 7

[OP13] OLIVA R., PELECHANO N.: Neogen: Near optimal generator of navigation meshes for 3d multi-layered environments. Computers & Graphics 37, 5 (2013), 403–412. 1

[PAB07] PELECHANO N., ALLBECK J. M., BADLER N. I.: Controlling individual agents in high-density crowd simulation. In Proceedings of the 2007 ACM SIGGRAPH/Eurographics symposium on Computer animation (Aire-la-Ville, Switzerland, 2007), SCA ’07, Eurographics Association, pp. 99–108. 2

[PSB11] PELECHANO N., SPANLANG B., BEACCO A.: Avatar locomotion in crowd simulation. In International Conference on Computer Animation and Social Agents (CASA) (Chengdu, China, 2011), vol. 10, pp. 13–19. 2

[Rey87] REYNOLDS C. W.: Flocks, herds and schools: A distributed behavioral model. In ACM SIGGRAPH Computer Graphics (1987), vol. 21, ACM, pp. 25–34. 2

[RZS10] REN C., ZHAO L., SAFONOVA A.: Human Motion Synthesis with Optimization-Based Graphs. Computer Graphics Forum (Proceedings of Eurographics 2010) 29, 2 (2010). 2

[SAC∗07] SUD A., ANDERSEN E., CURTIS S., LIN M., MANOCHA D.: Real-time path planning for virtual agents in dynamic environments. In Virtual Reality Conference, 2007. VR ’07. IEEE (2007), IEEE, pp. 91–98. 2

[SG10] STURTEVANT N. R., GEISBERGER R.: A comparison of high-level approaches for speeding up pathfinding. Artificial Intelligence and Interactive Digital Entertainment (AIIDE) (2010), 76–82. 9

[SKH∗11] SINGH S., KAPADIA M., HEWLETT B., REINMAN G., FALOUTSOS P.: A modular framework for adaptive agent-based steering. In Symposium on Interactive 3D Graphics and Games (New York, NY, USA, 2011), I3D ’11, ACM, pp. 141–150. 2

[SKRF11] SINGH S., KAPADIA M., REINMAN G., FALOUTSOS P.: Footstep Navigation for Dynamic Crowds. Computer Animation And Virtual Worlds 22, April (2011), 151–158. 1, 2

[SMKB13] SHOULSON A., MARSHAK N., KAPADIA M., BADLER N. I.: ADAPT: the agent development and prototyping testbed. In Proceedings of the ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games (New York, NY, USA, 2013), I3D ’13, ACM, pp. 9–18. 8

[TCP06] TREUILLE A., COOPER S., POPOVIC Z.: Continuum crowds. In ACM Transactions on Graphics (TOG) (2006), vol. 25, ACM, pp. 1160–1168. 2

[TLCDC01] TECCHIA F., LOSCOS C., CONROY-DALTON R., CHRYSANTHOU Y.: Agent behaviour simulator (abs): A platform for urban behaviour development. 2

[TLP07] TREUILLE A., LEE Y., POPOVIC Z.: Near-optimal character animation with continuous control. In ACM Transactions on Graphics (TOG) (2007), vol. 26, ACM, p. 7. 2




[Uni13] UNITY: Unity - game engine, 2013. http://unity3d.com/. 8

[vBE09] VAN BASTEN B. J. H., EGGES A.: Evaluating distance metrics for animation blending. In Proceedings of the 4th International Conference on Foundations of Digital Games (New York, NY, USA, 2009), FDG ’09, ACM, pp. 199–206. 4

[vBEG11] VAN BASTEN B., EGGES A., GERAERTS R.: Combining Path Planners and Motion Graphs. Computer Animation and Virtual Worlds 22, 1 (2011), 59–78. 2

[vBPE10] VAN BASTEN B. J. H., PEETERS P. W. A. M., EGGES A.: The step space: example-based footprint-driven motion synthesis. Computer Animation and Virtual Worlds 21, 3-4 (May 2010), 433–441. 2

[WP95] WITKIN A., POPOVIC Z.: Motion warping. In Proceedings of the 22nd annual conference on Computer graphics and interactive techniques (1995), ACM, pp. 105–108. 2

[ZS09] ZHAO L., SAFONOVA A.: Achieving good connectivity in motion graphs. Graphical Models 71, 4 (2009), 139–152. 2



Session 5

Games and Education

An extensible framework for teaching Computer Graphics

with Java and OpenGL

Carlos J. Ogáyar, Juan José Jiménez, José M. Noguera

Grupo de Gráficos y Geomática, Universidad de Jaén. Campus Las Lagunillas, Edificio A3. 23071 Jaén, Spain

cogayar, juanjo, [email protected]

Abstract

This paper reports our experience teaching realistic visualization techniques to undergraduate students during

four academic years. Our students use a framework designed by the professors and implemented in Java and

JOGL (Java Binding for OpenGL). This framework combines the ease of use of the Java programming language

with a number of auxiliary classes that provide starting points for 2D, 3D and user interface elements. This

framework can also be combined with standard low-level OpenGL calls in order to allow students to learn all the

basic concepts but in a faster and more efficient way than with classic tools such as plain C++/OpenGL. Since we

introduced our framework, our students managed to achieve more complex results with the same working time.

Categories and Subject Descriptors (according to ACM CCS): K.3.2 [Computer Graphics]: Computing milieux/Computers and education - Computer and Information Science Education

1. Introduction

Computer Graphics plays an important role in several higher-education degrees in Computer Science. Most courses last one semester, during which a considerable number of concepts must be taught, some of them highly demanding. For the practical part, the usual classic approach is based on C/C++, OpenGL and a simple library such as GLUT [Kil96], especially when access to low-level graphics functionality is sought. The main problem with these tools is the amount of time required to develop any moderately complex graphics application. Everything is the student's responsibility, from camera setup to object texturing and shader creation. If more advanced features such as real-time animation, collision detection or dynamic scene loading are also wanted, the task becomes impossible without additional software.

This work presents an experience spanning four academic years (2009/2010 to 2012/2013) using a framework based on Java+Swing and JOGL [XYC05] that we developed specifically for teaching purposes. The framework is versatile enough to be used in any course related to Computer Graphics. Nevertheless, the teaching experience described in this article focuses on a course called “Visualization and Realism” (“Visualización y Realismo”).

Our framework implements a class hierarchy that lets the student encapsulate almost everything related to the use of OpenGL. Its great advantage is that the whole library is open to the students, so they can adapt and extend any part to suit their needs. Besides benefiting from a scene graph and multiple helper classes for visualization, students still have to use OpenGL at a low level, so the teaching of the basic concepts is not diminished.

2. Previous work

The idea of developing this software for teaching Computer Graphics arose from similar experiences at other universities. For example, the need to make students “not reinvent the wheel” and to encourage the use of libraries for basic tasks was already pointed out by Anderson and Peters [AP10]. Their work classifies a large number of standard libraries (mostly free software) that can be useful in a graphics course. But these libraries are generic and do not always fit the needs of a university classroom. Therefore, several authors have preferred to implement their own libraries and frameworks for teaching graphics [RY09, GC09] or even programming in general [FLCS10, Yan09].

Regarding the programming language used as a vehicle for teaching graphics concepts, Noguera et al. [NSO10] already noted that the complexity of C++/OpenGL could take time away from acquiring the target skills of the course. For that reason, they proposed using the declarative language X3D and combining it with video game development to encourage the playful side of the practical work. Rhodes and Yan [RY09] also agree on the advantages of using Java instead of C++/OpenGL for teaching graphics, and present a framework focused on developing a complete graphics system.

3. Materials

The main goal of the framework is to ease the programming of small graphics applications for the practical work of courses related to Computer Graphics. The minimum requirements set from the outset are the following:

Ability to program OpenGL at a low level. This is essential for teaching 3D graphics properly. The need to acquire the competences and skills for high-performance, low-level graphics programming with OpenGL/OpenGL-ES is undeniable. The most important jobs in the sector, such as simulation, video games or mobile applications, demand a high degree of control over the graphics pipeline and a careful design aimed at maximum performance.

Matching the student's effort to the time available. Some tasks, such as loading textures, shaders or 3D models, managing the scene graph, etc., are very laborious but not overly complicated, so it is better to offer them to students already solved at a basic level. If the teaching focuses on these aspects, the student will improve the existing system by learning from the supplied code. A set of helper classes encapsulates much of the graphics application, but in an extensible way, which lets the student keep full control.

Replacing the typical GLUT/GLUI interface with a more complete and flexible alternative. Programming the window interface of the practical applications is interesting and illustrative with regard to software design (a basic competence to be developed by the student). However, this aspect falls outside the contents of Computer Graphics courses. By supplying an extensible framework, the student can integrate only the interface components needed (checkboxes, combos, frames, etc.).

The framework we have developed consists of an application-library that the student extends to implement their own OpenGL-based 3D virtual environment software. Its classes are grouped into packages that follow the model-view-controller pattern. Thus, there are classes in charge of the data that define the 3D environment (model), the rendering of 3D data and the window interface (views), and the associated controllers that handle input events.

In addition, to ease the practical work, we provide the code needed to define a control interface and a basic 3D environment, as a template, which speeds up the first stages of development and spares students the fear of facing an “empty screen” (see Figure 1). Note that our goal is not to develop a generic 3D scene editor, but to give the student an aid for quickly implementing small OpenGL-based applications, including a complete user interface.

The most important classes of the framework are:

Graphics classes. These classes encapsulate OpenGL code and some utilities. They include the so-called gadgets, which are graphical helper classes for visualization: virtual floor, axes, artificial horizon, panoramas, etc. This category also includes the classes for drawing objects such as spheres, boxes, OBJ models, etc.

User interface classes. The framework bases its interface on Swing. An initial environment is provided with a main window containing a series of panels with various contents: button bars, status windows, a log, and a side panel where students place the controls they need for their application (see Figure 1). This layout is, of course, highly adaptable. Nevertheless, it is useful to offer a starting interface that covers most of the interface needs of a basic application.

Main classes. To ease basic practical assignments, two main classes are used, one for the 3D scene view and another for the user interface view (the side panel). The idea is that basic assignments can be solved by modifying the application at only two points. Obviously this is a template-like approach that can easily be extended. It was designed this way so that small applications can be implemented as quickly as with the classic C++/GLUT tandem.

Figure 1: Initial template provided to the student. It shows the main panel, the button and status bar, and some gadgets (panorama, grid, coordinate axes).

From a functional point of view, the most relevant components of the system are the following:

View window management and camera control. The OpenGL code for the view window is encapsulated. Camera management encapsulates the operating parameters of a virtual camera. It supports perspective and parallel projections and provides the projection and modeling matrices that OpenGL needs. It also includes methods for 3D camera movements, such as rotation around the target point, rotation of the target point, pitch, translation and dolly, look-at, and fit to bounding box. A controller that accepts mouse and keyboard events for camera control is also implemented.

Object creation. Recreating a 3D environment requires creating instances of different objects. Classes are provided for creating basic objects such as planes, boxes, cylinders, cones, spheres, the traditional teapot, and models in OBJ format loaded from secondary storage. All objects are created with customizable texture coordinate generation. For OBJ models, loading materials and textures is supported.

Material management. Creating materials is quite simple. The available base class includes the typical parameters of a Phong-based local illumination system. For more advanced rendering, the student extends this class to adapt it to their needs. This is very common when custom shaders are implemented, since the semantic connection between material and shader imposes certain constraints.

Texture and shader management. As with 3D objects loaded from secondary storage, texture and shader management is centralized. Thus, if the program loads the same file several times, only one copy is kept in memory and the remaining uses are resolved through references. This allows the same texture map (or the same shader) to be used from independent points of the application (different 3D objects) very efficiently.

Scene graph management. The classic way to organize elements in a 3D scene is with a graph. In classic OpenGL/GLUT-based applications the graph is implicit in the source code and is generated through the relevant drawing instructions, that is, the matrix stack, the OpenGL state machine, etc. This approach remains valid in the framework. In addition, scene graph management is included to help organize the different elements hierarchically, including objects, shaders, lights, etc.

Gadgets. These are graphical elements that help with visualization and debugging of the virtual environment, such as virtual coordinate axes, a virtual grid floor that adapts to the camera position, an artificial horizon, a frames-per-second meter, etc.

Other utility classes. Classes are included for light manipulation, transformation matrix management (apart from the scene graph), rendering state management, animations, and support for additional effects such as particle systems, reflections or texture-based shadows, among others.

Logging system. Through a logging class and the corresponding window integrated into the interface, the student can manage information, warning and error messages in an ordered and categorized way. This makes it possible to keep a message log for specific parts of the software, which greatly helps debugging and control.
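The centralized texture and shader management listed among the components above amounts to a keyed cache: the first request for a file loads it, and later requests return a reference to the same object. The sketch below shows the idea in Python for brevity; the framework itself is written in Java, and the class name `ResourceCache`, its `get` method and the fake loader are hypothetical and do not correspond to the framework's actual API.

```python
class ResourceCache:
    """Keeps one in-memory copy per file path and hands out references to it."""

    def __init__(self, loader):
        self._loader = loader  # function that reads a file and builds the resource
        self._items = {}       # path -> already loaded resource

    def get(self, path):
        # Load on first use only; later calls reuse the stored object.
        if path not in self._items:
            self._items[path] = self._loader(path)
        return self._items[path]


# A stand-in loader that records how many times a file is really loaded.
loads = []


def fake_texture_loader(path):
    loads.append(path)
    return {"path": path}


textures = ResourceCache(fake_texture_loader)
floor_texture = textures.get("wood.png")
table_texture = textures.get("wood.png")   # second object using the same file
print(floor_texture is table_texture, loads)
```

Both objects end up sharing one texture instance, and the file is loaded a single time, which is the behaviour described for repeated loads of the same file.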




Figure 2: Some practical assignments by our students.

4. Methods

The framework was designed with the intention of using it in the various Computer Graphics courses taught at the Universidad de Jaén. However, in order to evaluate its teaching value, it was initially incorporated into the one-semester course “Visualization and Realism”, taught in the first year of the second cycle of the Computer Engineering degree (Ingeniería Informática). Most students of this course have previous experience in Computer Graphics, although this is not a prerequisite.

Antes de implantar el nuevo framework, las prácticas con-sistían en la elaboración de un software que visualizara unentorno virtual en 3D utilizando C++ y OpenGL. La inter-faz solía resolverse con GLUT, GLUI o alguna librería másavanzada. Los requisitos de las prácticas eran implementaruna escena con un número mínimo de elementos (objetos)y una serie de efectos de visualización realista (iluminaciónglobal, sombras, reflexiones, etc). Las principales desventa-jas de este enfoque eran la falta de una idea clara de cómoconseguir la implementación del software, la dificultad deri-vada del uso de C++ (algo más problemático que Java) o elenorme tiempo dedicado a construir las soluciones a proble-mas más básicos (como el control de la cámara, la gestión detexturas y shaders, la carga de modelos de disco, etc). Losalumnos más avezados buscaban tutoriales en la web paracada uno de los problemas y construían su software a par-tir de código existente. Esto último nunca estuvo exento degrandes problemas, ya que los programas de ejemplo de laweb suelen tener implementaciones poco claras y muy malestructuradas. En general, podemos determinar que el prin-cipal problema era la falta de tiempo para la implementaciónde una aplicación medianamente compleja.

Introducing our framework into the practical sessions allowed us to overcome these problems by giving students complete, unified documentation, as well as a tutorial for taking their first steps with a sample application. In addition, the supplied source code is properly documented (JavaDoc style).

Specifically, the teaching methodology we have used for the practical part of the course is based on an individual long-term project. Its development runs throughout the semester and consists of creating a 3D scene on a freely chosen theme. The requirements a student must meet to pass are as follows. First, the scene must have a context in the form of a set or environment (more or less detailed depending on the student's effort). The scene must include a number of objects with materials and textures, as well as several lights of different types. To add the lights, the student must implement several local illumination models (Lambertian, Phong, Oren-Nayar, Strauss, anisotropic, etc.) with different parameters. Finally, the student must also create user controls (with Swing) to control aspects of the scene of their choice (for example, light parameters). Figure 2 shows some screenshots of applications built by our students using our framework as a base.
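To make the two simplest of these local illumination models concrete, the sketch below evaluates the Lambertian diffuse factor and the Phong specular factor on the CPU in plain Java. It is only an illustration under our own assumptions: the class and method names are hypothetical, vectors are plain `double[3]` arrays, and in the coursework these formulas would presumably be written in shader code rather than in Java.

```java
// Hypothetical sketch of two local illumination terms; not code from the framework.
class LocalIllumination {

    /** Dot product of two 3D vectors. */
    static double dot(double[] a, double[] b) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }

    /** Returns a unit-length copy of v (v must not be the zero vector). */
    static double[] normalize(double[] v) {
        double len = Math.sqrt(dot(v, v));
        return new double[] { v[0] / len, v[1] / len, v[2] / len };
    }

    /** Lambertian diffuse factor: max(0, N.L). */
    static double lambert(double[] normal, double[] toLight) {
        return Math.max(0.0, dot(normalize(normal), normalize(toLight)));
    }

    /** Phong specular factor: max(0, R.V)^shininess, with R = 2(N.L)N - L. */
    static double phongSpecular(double[] normal, double[] toLight,
                                double[] toViewer, double shininess) {
        double[] n = normalize(normal);
        double[] l = normalize(toLight);
        double[] v = normalize(toViewer);
        double nDotL = dot(n, l);
        if (nDotL <= 0.0) {
            return 0.0; // the light is behind the surface, so there is no highlight
        }
        // Reflect the light direction about the normal.
        double[] r = {
            2 * nDotL * n[0] - l[0],
            2 * nDotL * n[1] - l[1],
            2 * nDotL * n[2] - l[2]
        };
        return Math.pow(Math.max(0.0, dot(r, v)), shininess);
    }
}
```

Both factors would then be multiplied by the material and light colors; the other models named above (Oren-Nayar, Strauss, anisotropic) replace these terms with more elaborate expressions.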

It should be noted that we start from the assumption that a motivated student will learn more [FLCS10]. Therefore, to earn a higher grade, students may implement additional elements of their own choosing. The list is left open to encourage students' curiosity and desire to investigate. The most frequent optional exercises include particle systems, volumetric fog, shadows, animations, and other content covered in the lectures or related to the syllabus.

In addition, students must submit documentation that includes usage instructions, a description of the interactions and additional commands implemented, and UML diagrams of the class structure of the part they developed.

Assessment consists of a public presentation of each project in class. The student being assessed must describe the main elements implemented in the application, both conceptually and at the code level. Students are asked to place special emphasis on the main difficulties encountered and the solutions developed. The presentation ends with a live demonstration of the project, followed by a round of questions (from both the teacher and classmates).

5. Results

So far, the proposed framework has been used in the course “Visualización y Realismo” at the Universidad de Jaén for four academic years (2009/2010 to 2012/2013). Of the 111 students assessed, 89 (80.2%) were men and 22 (19.8%) were women. The mean age was 21.76 years. Most students (74.2%) already had prior knowledge of Computer Graphics from having taken the elective course “Introducción a la Informática Gráfica” (Introduction to Computer Graphics) in their third year. All students gave their consent to take part in this study.

At the end of the course, students were asked to complete an anonymous questionnaire to collect their impressions of the practical work. The questionnaire had three aims: to determine how satisfied students were, to find out how difficult our framework is to use, and to discover how students rate it as a teaching tool.

The questionnaire contained 11 closed-ended questions. Each question was answered on a five-point Likert scale, where 1 means “very bad” and 5 means “very good”. The results were analyzed using descriptive statistics. Table 1 shows the results obtained. For each question, the mean (X) and the standard deviation (σ) are given.
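The two descriptive statistics reported per question can be computed as in the sketch below. It is an illustration only: the class name is hypothetical, the sample answers used to try it out are invented, and it assumes the sample standard deviation (dividing by n - 1), since the text does not say whether the sample or the population formula was used.

```java
// Hypothetical sketch: mean and standard deviation of five-point Likert answers.
class LikertStats {

    /** Rejects inputs that are too short or outside the 1..5 scale described in the text. */
    private static void check(int[] responses) {
        if (responses.length < 2) {
            throw new IllegalArgumentException("at least two responses are needed");
        }
        for (int r : responses) {
            if (r < 1 || r > 5) {
                throw new IllegalArgumentException("response out of range: " + r);
            }
        }
    }

    /** Arithmetic mean of the responses. */
    static double mean(int[] responses) {
        check(responses);
        double sum = 0.0;
        for (int r : responses) {
            sum += r;
        }
        return sum / responses.length;
    }

    /** Sample standard deviation (n - 1 in the denominator); an assumption, see lead-in. */
    static double sampleStdDev(int[] responses) {
        double m = mean(responses);
        double squared = 0.0;
        for (int r : responses) {
            squared += (r - m) * (r - m);
        }
        return Math.sqrt(squared / (responses.length - 1));
    }
}
```

For example, the invented answers {5, 4, 5, 3, 4} have a mean of 4.2 and a sample standard deviation of about 0.837.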

6. Discussion

In view of the results in Table 1, we can state that students' satisfaction with carrying out the practical part of the course using the new framework has been very high. In most cases, the students' responses were very positive. It is worth noting that the two highest-rated questions (Q4 and Q5) point to the ease and efficiency of doing the practical work with the proposed framework rather than with the classic C++/OpenGL approach, which most students already knew from previous courses.

Using our framework in the practical sessions has effectively simplified the software development process for students. In the years before the framework was introduced, students spent approximately two thirds of the total time available during the semester implementing a basic program skeleton in C++/OpenGL that gave them a minimum of functionality. Now, by contrast, students face a much gentler learning curve (Q4), which lets them obtain initial results much more quickly. This immediacy encourages students to continue their work and not drop the course.

Motivation has also increased notably, since more room is left for the creative and playful side of building a virtual environment. By reducing the time devoted to the tedious tasks of the C++ language (Q5), there is much more time to experiment with complex objects (imported directly from 3D editors), shader creation, and the implementation of animations and other effects. Many students have successfully implemented additional low-level effects, such as stencil-based reflections and shadows, which the framework intentionally does not fully solve.

The two lowest-rated questions (Q7 and Q8) point to aspects to improve in future versions of our framework. In general, students have asked for it to be expanded to cover more OpenGL functionality in a simple way from Java. However, this result should be taken with due caution. Students tend to look for the most comfortable route to the result. The purpose of this framework is not to provide ready-made results, but simply to give students the means to reach them on their own.

We also want to point out that the public defense of the practical work has proved to be a very useful way to stimulate competitiveness among students. From informally collected opinions and comments in class, we observed that students made an extra effort to develop a more complex project. This was due, quite simply, to the desire to show off in front of their classmates. In fact, the project defense was seen as a festive event rather than an assessment. Students enjoyed showing their personal creations to the rest of the class.

On the academic side, grades have remained at a level similar to that before our framework was introduced. This is because the assessment criteria and the minimum level required have been adapted accordingly. Nevertheless, we can state that the level of the work submitted has increased very notably (and with it the level required). In other words, in the same amount of time, the number of concepts put into practice and the complexity of the virtual environments built have multiplied.

7. Conclusions

Using the Java-based framework in the practical sessions has allowed students to focus much more on building a 3D scene, including object modeling and shaders. Previously, using C++/OpenGL, they lost a lot of time on the problems intrinsic to that language, especially dynamic memory, pointers, and bug hunting. Although Java also has its own problems, students have adapted to this language much more quickly. Using Swing for the user interface has been very effective, since it has allowed each student to customize the interface and build controls to manipulate different aspects of their 3D environment, such as the position of lights, activation of groups of objects, changes to the properties of some shaders, etc.

Question  X      σ      Text
Q1        3.823  0.635  Rate the ease of installing the library for use in the practical sessions.
Q2        4.235  0.664  Rate the ease of use of the library in the practical sessions.
Q3        4.294  0.771  Rate the user manual (installation and use of the library) in the development environment used in the practical sessions.
Q4        4.702  0.587  Rate the ease of creating and displaying a basic 3D scene.
Q5        4.705  0.685  Rate whether you were more efficient and lost less time programming in Java than programming in C++.
Q6        3.937  0.928  Rate not having to write lines of code directly in OpenGL.
Q7        2.882  0.857  Considering the theory and the low-level OpenGL examples, were the classes provided useful to you in achieving results?
Q8        3.176  1.074  Rate the naming of the methods to be implemented in the practical sessions in terms of how easy it is to interpret the purpose of each one.
Q9        4.235  0.903  Was the experience with Swing better than with GLUT in other courses?
Q10       4.529  0.717  Rate the ease of doing the practical work using the library, compared with practical work in other courses where it is not used.
Q11       4.062  0.853  Overall, did the effort invested produce the expected results within the time available?

Table 1: Evaluation questionnaire.

The use of OpenGL has been much more productive than in previous years. The classes provided by the software handle the most tedious tasks, so students have been able to focus on the graphics aspects without giving up a powerful application interface. By using the classes provided in the framework, each student has been able to concentrate on what matters most without having to start from scratch. Most notably, they have worked with an application of medium complexity at every level, instead of the typical GLUT-based “mini-program” like those in the small tutorials found on the Web. This is important for practicing the essential competences in software engineering, high-level programming, and design patterns.

Finally, the satisfaction questionnaires completed by the students have shown us that the effort the teachers invested in developing this software has been rewarded by greater motivation and overall satisfaction among students.

Acknowledgements

This work has been partially funded by the Spanish Ministry of Economy and Competitiveness (Ministerio de Economía y Competitividad) and the European Union (through FEDER funds) under research project TIN2011-25259, and by the Universidad de Jaén under teaching innovation project PID43B.

References

[AP10] ANDERSON E. F., PETERS C. E.: No More Reinventing the Virtual Wheel: Middleware for Use in Computer Games and Interactive Computer Graphics Education. In Eurographics 2010 - Education Papers (2010), pp. 33–40.

[FLCS10] FERNÁNDEZ LEIVA A. J., CIVILA SALAS A. C.: Practices of advanced programming: Tradition versus innovation. Computer Applications in Engineering Education (2010). doi:10.1002/cae.20465.

[GC09] GANOVELLI F., CORSINI M.: envymycar: A multiplayer car racing game for teaching computer graphics. Computer Graphics Forum 28, 8 (2009), 2025–2032. doi:10.1111/j.1467-8659.2009.01425.x.

[Kil96] KILGARD M. J.: The OpenGL Utility Toolkit (GLUT) Programming Interface, 1996. [accessed 17 Jun 2013]. URL: http://www.opengl.org/resources/libraries/.

[NSO10] NOGUERA J. M., SEGURA R. J., OGAYAR C. J.: Uso de X3D como herramienta de aprendizaje de técnicas de Realidad Virtual y Animación. In Actas del XX Congreso Español de Informática Gráfica (CEIG) (2010), pp. 301–304.

[RY09] RHODES P. J., YAN B.: Easel: A Java Based Top-Down Approach to 3D Graphics Education. In Eurographics 2009 - Education Papers (2009), pp. 29–36.

[XYC05] XU Z., YAN Y., CHEN J. X.: OpenGL Programming in Java. Computing in Science and Engineering 7, 1 (2005), 51–55. doi:10.1109/MCSE.2005.20.

[Yan09] YAN L.: Teaching object-oriented programming with games. Information Technology: New Generations, Third International Conference on (2009), 969–974. doi:10.1109/ITNG.2009.13.




New Degree in Video Game Design and Development

I. Remolar , C. Rebollo, M. Chover

Departamento de Lenguajes y Sistemas Informáticos, Universitat Jaume I, Castellón, Spain

Abstract

The video game sector is an emerging sector in which players look for new quality content every day. Given the large number of video games available today, users are more demanding when choosing a game, valuing aspects as technologically diverse as the realism of the graphics, the interaction design, or even the portability of the game. These demands require developers to have multidisciplinary training, because creating a video game spans several fields of knowledge: essentially technical, artistic, and communication. For this reason, a specific degree for this field was needed in the public university system, so that this pioneering training would be within everyone's reach. To that end, this article analyzes and details the implementation of the new Degree in Video Game Design and Development (Grado en Diseño y Desarrollo de Videojuegos) at the Universitat Jaume I in Castellón.

Categories and Subject Descriptors (according to ACMCCS): I.3.3 [Computer Graphics]: Education—Videogaming

1. Introduction

Video games, considered a curiosity forty years ago, are today one of the most popular forms of entertainment and a dominant component of global culture [cas10]. The ubiquity and growth of video game technology make it necessary to understand games not only as commercial products, but also as aesthetic objects, learning contexts, technical constructions, and cultural phenomena. Moreover, video games are an important part of digital entertainment systems because users can interact with the application's content and share their experiences with friends. This gives them a strong relationship with social networks, virtual environments, and mobile devices, among others.

ISFE, the Interactive Software Federation of Europe [int10], has published the results of a study reflecting changing trends among European video game players. The respondents, from 15 markets and aged between 16 and 49, stated that they spend as much time on video games as on watching television or socializing with family and friends, and indicated that they choose games as a fun way to pass the time while stimulating the imagination and sharpening the mind.

The growth opportunities of the video game development sector are vast: new technologies and devices, rising game use and consumption, new business models, and connection with social networks are some of the key aspects. According to the Global Entertainment and Media Outlook study [glo10] for the 2009-2013 period, the worldwide video game sector had estimated revenues of close to 45,000 million euros in 2009, with the highest consumption recorded in Europe, the Middle East, and Africa. In this sector, 11 countries have already exceeded 1,000 million dollars (813 million euros) in annual revenue. To put the size of this sector in perspective against the rest of the entertainment industries: in the US, for example, music and film revenues account for roughly 45.4% of the entertainment sector's revenues, whereas video games account for 54.6%. According to industry data, the industry is expected to grow worldwide by more than 9% per year.

At the national level, the 2012 Informe Anual de los Contenidos Digitales en España (Annual Report on Digital Content in Spain) [dig12] states that Spain remained the seventh country in the world, and the fourth in Europe, in revenue from video game consumption. In Spain, video game software sales reached 638 million euros in 2009, with an average annual growth rate of 5% between 2005 and 2009. The video game sector thus accounts for 53% of the Spanish audiovisual and interactive entertainment market. Entertainment software sales in Spain are estimated to keep rising steadily, with average growth of 10% per year over the coming years.

Since career opportunities are varied in this clearly growing sector, several studies analyzed the profile of the video game designer. This profile had to have a technical side, because of the programming implicit in any interactive application; an artistic side, because of the creativity and design every game requires; and a narrative side, allowing games to be built around a theme that appeals to the user. Reviewing the literature on this topic, the “Profesionales Digitales” program of the Secretaría de Estado de Telecomunicaciones y para la Sociedad de la Información [pro10], launched through the public business entity red.es and the CRUE (Conferencia de Rectores de las Universidades Españolas), set as its main objective in 2011: “Development by universities of training programs for students and professionals in the sector with a hybrid technical-artistic profile, defining training modules, master's degrees, or specialization courses in these subjects to promote training and culture in this field”. Likewise, the report on Digital Content published by the Ministerio de Industria, Energía y Turismo [ofe13] highlights that “companies in the digital content sector demand specific training in graphic design, computer programming, and languages”.

Along the same lines, the report “La industria del desarrollo de videojuegos en España 2010. Resumen actualizado de las oportunidades de negocio del Sector” [ind10], produced by the Asociación Española de Empresas Desarrolladoras de Videojuegos y Software de Entretenimiento, states that “the challenge now is for public universities to offer this type of degree as well, with development companies playing an important role”. At present, companies themselves are the main training centers in this sector.

At the Universitat Jaume I, we analyzed the state of training in video game development and decided to take up the challenge posed by these associations by proposing a new, specific technological degree aimed at this sector. To this end, we analyzed the educational landscape both abroad and in Spain. The analysis showed that, given the high level of activity in the video game field abroad, it was easy to find external references in other countries to support our proposal. Internationally, numerous universities currently offer bachelor's and master's degrees related to video game development and digital entertainment.

At the national level, by contrast, the postgraduate offering in video game development was found to be broad. At the undergraduate level, however, it was limited to one official degree at a private university and two degrees at centers affiliated with foreign universities. What was missing, therefore, was an offering from the public university system, and this is what was launched: the Degree in Video Game Design and Development (Figure 1).

Figure 1: Degree in Video Game Design and Development at the Universitat Jaume I.

2. Profile of the Graduate in Video Game Design and Development

The general objective of the Degree in Video Game Design and Development is conceived as training for the design and development of new digital interactive entertainment applications or interactive systems in general. The purpose of this degree is to foster professionals for the digital content creation industries, so that products brought to market carry an added value that makes them competitive in a market increasingly attentive to new technologies.

The graduate in Video Game Design and Development is a professional able to take part in the whole process of creating and developing video games. He or she can design a game's story or gameplay with aesthetic and functional judgment, and can also program it, adapting it to the latest technologies and market trends. Students trained in this degree will be able to build systems that have an attractive design, can be easily configured to adapt to any requirement, and let users interact with the digital content presented.

Figure 2: Organization of knowledge areas in the Degree in Video Game Design and Development.

For this reason, the new degree provides multidisciplinary training that qualifies graduates to work in sectors beyond those directly involved in creating video games, such as computing and design. On the one hand, graduates will be able to manage and plan complete software projects and will handle design methodologies and techniques for script development and level design. On the other, they will understand the video game within a context when designing it and will know how to choose the most suitable strategy for creating it, taking into account all the factors involved in the process.

In summary, graduates must know and study several knowledge areas, shown graphically in Figure 2, which comprise:

• Basic training.
• Art creation.
• Information and communication technologies.
• Video game programming.
• Digital content production.

The “IGDA Curriculum Framework. The Study of Games and Game Development” [IGD10] defines a series of possible career paths suited to the profile of the proposed degree. These are varied enough to cover both video game design and development and tasks more typical of computer scientists or game quality auditors.

Some examples of career paths, grouped into categories, are:

• Analyst / Programming.
• Designer and developer of design software.
• Production.
• 3D design, modeling, and animation.
• Game scriptwriter.
• Game quality auditor.
• Network and communications operator.
• Other profiles.

2.1. Objectives, general and specific competences

The objectives of the Degree in Video Game Design and Development and Interactive Systems are to train professionals who fit the profile described above. Since the degree falls within the technological field, it also covers all the basic competences of the Computing branch. In addition, a series of general and specific competences have been defined, some of which are listed below.

Among the general competences, the most notable are creativity, information management skills, organization and planning skills, and initiative and entrepreneurial spirit, among others.

The most prominent specific competences are those based on programming, algorithmics, the use of techniques and tools for artistic expression, the design and creation of graphic elements, and the generation and analysis of expressive and narrative resources applied to audiovisual methods.

These specific competences clearly show three basic lines of knowledge: computing, design, and communication. These lines are what ultimately defined the proposed degree.

3. Organization of the Degree in Video Game Design and Development

The curriculum designed for the title of Graduate in Video Game Design and Development runs over four academic years, with 60 ECTS credits in each, distributed evenly at 30 ECTS credits per semester.

Each academic year consists of 40 teaching weeks, divided into 2 semesters of 20 weeks each, including the assessment period. In this curriculum, one ECTS credit is equivalent to 25 hours of student work, which amounts to an annual workload of 1,500 hours.

Of the degree's 240 ECTS credits in total, 60 are basic training and 138 are compulsory subjects, leaving 18 ECTS credits of elective subjects. One of these elective subjects, worth 6 ECTS credits, can be recognized for extracurricular activities or subjects taken in mobility programs. The remaining 24 credits are assigned to the External Internship and Final Degree Project subjects.

Figure 3: Distribution of subjects in the Degree in Video Game Design and Development.

All subjects in the degree are worth 6 ECTS credits, except the External Internship and the Final Degree Project, which are worth 12 credits each, as noted above. A 6-credit subject has a workload of 150 hours (25 hours × 6 ECTS credits). Classroom activity accounts for between 45 and 60 hours of work and non-classroom activity for between 90 and 105 hours, which means classroom activity represents between 30% and 40% of the total, except in the External Internship and Final Degree Project subjects.
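The workload figures quoted in this section all follow from the stated equivalence of 25 hours per ECTS credit. As a quick consistency check (our own arithmetic on the numbers given in the text, written with the amsmath `align*` environment):

```latex
\begin{align*}
\text{Annual workload}  &= 60~\text{ECTS} \times 25~\text{h/ECTS} = 1500~\text{h} \\
\text{Subject workload} &= 6~\text{ECTS} \times 25~\text{h/ECTS} = 150~\text{h} \\
\text{Classroom share}  &: \quad \tfrac{45}{150} = 30\% \quad\text{to}\quad \tfrac{60}{150} = 40\% \\
\text{Non-classroom hours} &: \quad 150 - 60 = 90~\text{h} \quad\text{to}\quad 150 - 45 = 105~\text{h} \\
\text{Total credits}    &= 60 + 138 + 18 + 24 = 240~\text{ECTS}
\end{align*}
```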

Of the 60 ECTS credits of basic training, 42 could be recognized from other official degree programs within the same branch of knowledge. The other 18 ECTS credits correspond to two subjects from the Arts and Humanities branch (Artistic Expression and Modern Language) and one from the Social and Legal Sciences branch (Business). The Modern Language subject, together with Entrepreneurial Initiative, a compulsory fourth-year subject, allows the proposed curriculum to comply with the guidelines of the Universitat Jaume I's Documento de Estilo (Style Document).

3.1. Overall structure of the degree

Figure 3 shows which subjects are taught in each year and semester of the proposed degree, organized by the knowledge areas shown in Figure 2.

The first year of the degree consists of 60 ECTS credits, of which 42 are basic training. In this first year, students acquire knowledge and develop competences belonging to the basic subjects of the Engineering and Architecture branch, such as Computing (18 credits), Mathematics (6 credits), Physics (6 credits), and Artistic Expression (6 credits). They also take content from the Modern Language subject (6 credits), in which English is taught. The remaining ECTS credits of the year correspond to Graphic Content Design (6 credits), Communication (6 credits), and Computer Technology (6 credits).

The second year includes another 18 ECTS credits of basic training, comprising Business (6 credits), Mathematics (6 credits), and Graphic Expression (6 credits) from the Engineering and Architecture branch. The remaining subjects are compulsory and include Hypermedia Narrative and Video Game Analysis (6 credits), Algorithms and Data Structures (6 credits), Graphic Content Design (6 credits), and a group of computing-related subjects (Databases, Video Game Consoles and Devices, and Web Game Design and Development) totaling 18 credits.

The third year is made up of four groups of subjects. The first group includes four computing-related subjects (Software Engineering, Operating Systems, Networks and Multiplayer Systems, and Applications for Mobile Devices). The second group contains the subjects related to graphics programming: Computer Graphics (6 credits) and Game Engines (6 credits). The third group includes two subjects from the field of communication (Theory and Technique of Audiovisual Production and Direction, and Conceptual Design of Video Games). Finally, the fourth group includes Graphic Content Design and Video Game Art.

The fourth year consists of three compulsory subjects (Entrepreneurial Initiative, Legal Aspects of Video Games, and Artificial Intelligence) and three electives (to be chosen from the six that will be offered). In addition, the External Internship (12 credits) and the Final Degree Project (12 credits) are scheduled for this year.

4. Implementation results

The degree was first offered in the 2012/13 academic year and was very well received by students. The Universitat Jaume I set an enrollment limit of 60 students, but the number of pre-registered applicants was well over three times the number of available places: once the enrollment period ended, 63 students had enrolled and 170 remained on the waiting list. This made it possible to set an admission cut-off mark above 8 out of 10.

The great success of pre-registration has demonstrated the broad demand for these studies in Spain: the students who showed interest in the degree, and those who finally enrolled, come from a wide variety of Spanish provinces, with 30% of current students coming from outside the Comunidad Valenciana.

As a final conclusion, and as teachers of the first year of this degree, we would like to highlight that the degree's students have very broad self-taught knowledge of games, because they belong to the generation known as digital natives. For this reason, they show great interest in all the subjects they take, analyzing how each one ultimately applies to the design and development of the video games they build in each year.

References

[cas10] Casual gaming market update, 2010. URL: http://www.parksassociates.com.

[dig12] Informe anual de los contenidos digitales en España, 2012. URL: http://www.ontsi.red.es/ontsi/es/estudios-informes/.

[glo10] Global entertainment and media outlook, 2010. URL: http://www.pwc.com.

[IGD10] IGDA Curriculum Framework. The Study of Games and Game Development. International Game Developers Association, 2010.

[ind10] La industria del desarrollo de videojuegos en España 2010. Resumen actualizado de las oportunidades de negocio del sector. Asociación Española de Empresas Desarrolladoras de Videojuegos y Software de Entretenimiento, 2010. URL: http://www.dev.org.es/files/resume_informe_2010.pdf.

[int10] Federación Europea de Software Interactivo, 2010. URL: http://www.isfe-eu.org.

[ofe13] Contenidos digitales: oferta y demanda de profesionales en contenidos digitales, 2013. URL: http://www.ontsi.red.es/ontsi/es/estudios-informes/.

[pro10] Programa 'Profesionales Digitales'. Secretaría de Estado de Telecomunicaciones y para la Sociedad de la Información, 2010.




Graphics Systems in a Software Engineering Curriculum

F.J. Melero

Dpto. Lenguajes y Sistemas Informáticos, Universidad de Granada

[email protected]

Abstract

In this paper we describe the approach taken to integrate, in a natural manner, the development of graphics systems into the Software Engineering specialization of the new degree in Computer Engineering. Given that Computer Graphics is a mandatory course for students of every specialization, the Graphics Systems course is oriented towards high-level software design and programming, and serves as a basis for other existing optional courses (e.g. Animation, Game Programming, Human-Computer Interaction). The approach taken during the first year of this course has been quite successful, judging by the results of a survey given to students before the end of the semester and hence before grades were awarded. The lab work for the whole semester revolved around a single problem, a solar system, which was developed using several different technologies.

Categories and Subject Descriptors (according to ACM CCS): Computers and education [K.3.2]: Computers and Information Science Education

1. Introduction

The Degree in Computer Engineering (Grado en Ingeniería Informática) at the Universidad de Granada comprises 240 credits (4 years) and has been in force since its publication in the BOE on 19/02/2011. Its fundamental objective is scientific, technological and socio-economic training, together with preparation for professional practice in the development and application of information and communication technologies (ICT) in the field of Computing [UGR].

This curriculum is an integrated offering of the training required to enter the profession of Ingeniero Técnico en Informática (technical engineer in computing). Our university is the only one in the autonomous community at which students can take any of the five 48-credit specializations, which correspond to all the specific technologies defined in the Ministerial Order (OM) published in the BOE of 4 August 2009. That order establishes the requirements for the verification of official university degrees that qualify graduates to practise as Ingeniero Técnico en Informática; it has the status of a national guideline and determines 75% (180 credits) of the curriculum. The specializations are:

Computing and Intelligent Systems

Computer Engineering

Software Engineering

Information Systems

Information Technologies

The general structure of the curriculum is shown in Figure 1, which also gives the ECTS credit load of each block. The following courses in the curriculum are directly related to Computer Graphics and Virtual Reality:

Computer Graphics (Informática Gráfica). Mandatory for the whole branch (5th semester).

Graphics Systems (Sistemas Gráficos). Mandatory in the Software Engineering specialization (6th semester).

New Interaction Paradigms (Nuevos Paradigmas de Interacción). Mandatory in the Computing and Intelligent Systems specialization (7th semester).

Video Game Graphics Programming (Programación Gráfica de Videojuegos). Optional in the Software Engineering specialization (7th semester).

Computer Animation (Animación por Ordenador). Optional in the Software Engineering specialization (8th semester).

Geographic Information Systems (Sistemas de Información Geográficos). Optional in the Information Systems specialization (7th semester).

Figure 1: Structure of the curriculum of the Degree in Computer Engineering at the Universidad de Granada.

2. Graphics Systems as a course

Within this curriculum, Graphics Systems is taught in the second semester of the third year to all students who have chosen the Software Engineering specialization; 60 students enrolled in the 2012/2013 academic year. A good number of them said they had chosen the specialization precisely because of its graphics and video game courses, although others, a minority, expressed displeasure at having to study something they regarded as foreign to Software Engineering. Anticipating these conflicting interests and motivations, we structured the course to try to meet the expectations of most students, both those interested in going deeper into computer graphics and those who want to steer their careers towards the analysis and design of software projects, while achieving the objectives (stated as learning outcomes) set out in the course description, namely:

Know the methods for representing large geometric models and the methods for spatial indexing.

Know applications in which large models can be generated, and be able to assess their requirements.

Know the fundamentals of 3D digitization.

Know the concept of a volumetric model and the process of generating volumetric models.

Be able to design scene graphs as a representation in graphics applications and use them on a graphics engine.

Know the fundamentals of virtual reality.

Be able to design applications for processing medical models.

Be able to use volume visualization tools.

Know how the GPU works.

Be able to design, implement and evaluate GPU algorithms.

For the theory classes we wrote a study guide containing part of the course notes and a series of interleaved exercises that students had to hand in before each lecture, in order to encourage them to read the topic beforehand and to spark a debate in class afterwards. These preliminary assignments consisted of solving exercises, giving reasoned answers to questions, analysing classic scientific papers, or completing certain sections of the notes that had been intentionally left blank, using the recommended bibliography.

The theory syllabus for this first year of the course consisted of the following topics:

1. Applications of graphics systems. An introductory, motivating topic on the situations that may call for a graphics system.

2. Solid modelling. Spatial indexing. Management of large models. Introduction to the representation of polygonal models and to solving the various problems that may arise.

3. Volume modelling and visualization. Introduction to the representation and visualization of volumetric models and their applications.

4. Scene graphs. Theoretical approach and study of several scene-graph-based libraries, with special emphasis on OpenSceneGraph.

5. Graphics systems on the Web. Analysis of the problems of 3D visualization in web browsers and study of existing solutions: WebGL at a low level and X3D at a high level. Introduction to other, proprietary technologies.

6. High-performance graphics. Introduction to the programmable pipeline.

Figure 2: Example of a solar system built by a student.

3. A solar system as the backbone of the lab work

Lab assignments carry the greatest weight in the assessment of the course: 55% of the final grade, plus a further 20% if an optional assignment is completed.

So that, by the end of the course, students would have an overall view of several technologies with which to build the same graphics system, and could assess the possibilities and difficulties of each, the problem was the same throughout the course: a moving solar system, with its planets and satellites, and a spaceship travelling through it. It was developed with at least three technologies, OpenGL, OpenSceneGraph and X3D, plus an optional fourth, Java3D. The aim was to climb progressively in level of abstraction while reducing the complexity of developing the same system. Figure 2 shows a screenshot of one student's work.

3.1. C++ and OpenGL

The first exercise is to develop a solar system from scratch, with all the functionality and features that students are expected to know how to implement after taking Computer Graphics, such as animation, hierarchies, camera control, materials and textures, and lighting. In addition, the system had to be developed in C++ following a class diagram, produced by the student, that was coherent and correct from a Software Engineering standpoint. This meant developing the classes Object3D, Textura, Camara, Luz and others for managing graphics elements, as well as the classes Universo, Planeta, GrupoOrbital, Satelite and similar ones for managing the scene.
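To make the kind of class design described above more concrete, the following is a minimal sketch of how a few of the scene-management classes might relate to one another. Only the class names Object3D, Planeta and GrupoOrbital come from the text; every member, the 2D simplification and the circular-orbit model are assumptions made for illustration, not the design the students were given.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical 2D position; a real assignment would use 3D transforms.
struct Vec2 { double x; double y; };

// Base class for anything that can be placed in the scene.
class Object3D {
public:
    virtual ~Object3D() = default;
    // Position at time t, expressed relative to the parent's origin.
    virtual Vec2 localPosition(double t) const = 0;
};

// A body on a circular orbit around its parent (assumed orbit model).
class Planeta : public Object3D {
public:
    Planeta(std::string name, double orbitRadius, double angularSpeed)
        : name_(std::move(name)), radius_(orbitRadius), speed_(angularSpeed) {
        assert(orbitRadius >= 0.0);
    }
    Vec2 localPosition(double t) const override {
        return { radius_ * std::cos(speed_ * t), radius_ * std::sin(speed_ * t) };
    }
    const std::string& name() const { return name_; }
private:
    std::string name_;
    double radius_;
    double speed_;  // radians per time unit
};

// Groups a central body with the objects that orbit it, so that each class
// keeps a single responsibility: bodies know their orbit, groups compose them.
class GrupoOrbital {
public:
    explicit GrupoOrbital(std::shared_ptr<Object3D> centre)
        : centre_(std::move(centre)) {}
    void add(std::shared_ptr<Object3D> child) { children_.push_back(std::move(child)); }
    // World position of child i: centre position plus the child's local offset.
    Vec2 worldPosition(std::size_t i, double t) const {
        const Vec2 c = centre_->localPosition(t);
        const Vec2 p = children_.at(i)->localPosition(t);
        return { c.x + p.x, c.y + p.y };
    }
private:
    std::shared_ptr<Object3D> centre_;
    std::vector<std::shared_ptr<Object3D>> children_;
};
```

In this sketch a satellite would simply be another orbiting body added to the GrupoOrbital whose centre is its planet, which is essentially a hand-rolled scene graph.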

Requiring a good system design proved somewhat more laborious than expected, because this was the first time the students had tackled the design of a graphics system from scratch (in Computer Graphics they started from pre-written skeletons, in C or C++ depending on the assignment). Moreover, despite the large number of credits devoted to Software Analysis and Design, a lack of design experience was evident throughout the assignment, particularly in how responsibilities were delegated among classes. The teaching staff's job was to guide the design towards quality criteria sufficient for students to understand the importance of a good class hierarchy, with well-defined inheritance and relationships.

Insisting on a proper class design has helped students mature and refine the concepts acquired in Computer Graphics, and has led them to separate representation from visualization in a natural way. It has also exposed some gaps in basic concepts, which we will try to address in coordination with the Computer Graphics lecturers.

This assignment was handed in by 36 of the 60 students, although several of those who did not hand it in have said they intend to submit it in September to obtain a better grade.

3.2. OpenSceneGraph

The second assignment uses OpenSceneGraph (OSG). In a first session we analysed in class the code of the solar system example that ships with OSG, in which everything lives in a single class, SolarSystem. The students themselves were able to conclude that, technically, it is a valid example in that it demonstrates the use of several OSG nodes, but that its design leaves much room for improvement.

Over the following month the students began developing a solar system, again from scratch, but this time using OSG and adding, with respect to the requirements of the previous assignment, a particle system and spaceship movement from planet to planet. They were given a week-by-week plan of the new functionality to include, and the nodes involved in each feature were explained beforehand in the theory classes.

A correct class design was again enforced, although here all of the classes inherited from some OSG element. In this assignment the students came to see that in the previous one they had built their own scene graph without realizing it. It was handed in by 40 of the 60 students.

3.3. Java3D

Because some students had not taken Computer Graphics, had not obtained the results they expected in the first assignment, or simply wanted to aim for a higher grade, they were also offered the option of developing the same system with Java3D, since in the theory part of the course they were asked to complete their notes on OSG with the Java3D equivalents. Judging from informal comments in class, most students point to how much easier development is with Java3D, as well as to the larger amount of documentation available for it.

3.4. Graphics on the web: X3D

One possible approach to developing web graphics systems would have been to program with WebGL, but this would have required a disproportionate effort for its share of the final grade, since the students had no knowledge at all of the programmable pipeline: Computer Graphics covers OpenGL 2.0 with the fixed pipeline. We therefore chose X3D [ISOIEC06], with X3DOM (http://www.x3dom.org) as the rendering library and X3D-Edit (http://savage.nps.edu/X3D-Edit/) as the development tool for validating the XML. In barely three weeks the students were able to develop a solar system, with certain limitations imposed by X3DOM's lack of support for some features of the X3D standard, such as the TouchSensor node.

In this assignment it would have been desirable to go deeper into dynamic management of the scene graph with JavaScript and jQuery, but because the students had no knowledge whatsoever of these technologies we discarded this option and focused only on the basic aspects of interaction.

This assignment was handed in by 40 of the 60 students.

4. Student evaluation of the course

About two weeks before the end of the semester, students were given an anonymous online survey to evaluate their own attitude, the course and the work of the teaching staff. The survey was answered by 38 of the 60 students (about 63%). It was a long survey, with more than 50 questions, some with free-text fields.

As regards motivation before and after taking the course, Figure 3 shows that it remains high, even though the vast majority of students rate the course as more difficult than the other courses they have taken (see Figure 4). In the free-text field attached to the question on difficulty, more than half of the enrolled students agree that their grasp of OpenGL and of the graphics concepts assumed at the start of the course was insufficient or not well consolidated. For future years, an Assignment 0 has already been planned to review the necessary Computer Graphics concepts.

Figure 3: Students' level of motivation before (left) and after (right) taking Graphics Systems.

Figure 4: Difficulty of the course relative to other courses in the degree.

Figure 5: Responses to the statement "The content of the course is up to date and will be useful in my professional life" (1: strongly disagree; 5: strongly agree).

Figure 6: Responses to "Given your experience with the course, and before being graded, would you recommend the course to classmates in later cohorts?" (1: No, under no circumstances; 5: Absolutely).

Another significant result is the students' impression of how up to date the syllabus and lab content are and how they relate to future professional practice, shown in Figure 5. These responses are consistent with the final perception of the course: despite the reported difficulty, 30 of the students who answered the survey would recommend it to classmates in later cohorts, as shown in the chart in Figure 6.

5. Conclusions

Setting up a course in its first year of teaching is an extremely laborious task, and not one free of setbacks. Nevertheless, the students' response in terms of class participation (an average attendance of 60%) and the answers given in the quality surveys indicate that the work is on the right track. At the time of writing we are not yet in a position to evaluate academic results, and since the surveys described above do not allow conclusions to be drawn about the real impact of the course from an educational standpoint, we will have to wait until the curriculum progresses to properly assess the content and how well students have made use of it. In coming years, part of the explanations will be moved from the theory classes to the lab classes, to encourage attendance at the latter, and the theory syllabus will be complemented with the fundamentals of GPU programming.

Another cross-cutting lesson learned by the students, arising from the use of free technologies in the course, is the importance of documentation and maintenance in free software, since they ran into certain problems with the documentation of OSG and X3DOM.

References

[ISOIEC06] INTERNATIONAL STANDARDS ORGANIZATION, INTERNATIONAL ENGINEERING CONSORTIUM: Extensible 3D (X3D) Part 1, ISO 19775: Architecture and base components, Scene Access Interface (SAI). 2006. URL: www.web3d.org/x3d/specifications.

[UGR] UGR: Grado en Ingeniería Informática. Universidad de Granada. URL: http://grados.ugr.es/informatica/.



CEIG – Spanish Computer Graphics Conference (2013) M. Carmen Juan and Diego Borro (Editors)


© The Eurographics Association 2013.


A Computer-Based Learning Game for Studying History

J.F. Martín-SanJosé1, M.C. Juan1, M. Giménez2 and J. Cano2

1Instituto Universitario de Automática e Informática Industrial, Universitat Politècnica de València, Camino de Vera, s/n. 46022 Valencia, Spain

2Escola d'Estiu, Universidad Politécnica de Valencia, Camino de Vera, s/n. 46022 Valencia, Spain

Abstract

New types of visualization and interaction are changing our daily lives. The 3D visualization of applications and movies gives users the opportunity to have an experience similar to the real world. When 3D perception is combined with natural interaction, applications can come close to feeling real. In this paper, a computer graphics system for learning about a period of history is presented. It combines autostereoscopic visualization and Natural User Interfaces (NUI) for interaction. A study was carried out with one hundred and twenty-seven children aged 7 to 11. From the results, we can affirm that the children appreciated the combination of 3D and NUI. The children gave high scores when asked about satisfaction and interaction, rating the game 9.61 out of 10 on average. When asked about 3D perception, the children also gave a high score, indicating that depth was perceived well. Therefore, the use of autostereoscopic visualization and NUI is a promising combination that can be exploited not only for learning but also in many other areas.

Categories and Subject Descriptors (according to ACM CCS): H.5.1 [Information Interfaces and Presentation]: Multimedia Information Systems—Artificial, augmented, and virtual realities

1. Introduction

Visualization systems have changed very quickly in recent years. Nowadays it is common to find films shot specifically to be played back with a sense of three-dimensionality. Devices that can display content in three dimensions are also available: game consoles, mobile phones and televisions, among others. This lets viewers see the elements of a scene as if it were the real world, with the same perception of depth. That the digital world increasingly resembles the real one is by now almost a fact. In addition to 3D vision, systems can now be operated through natural interaction, in which the user's own body conveys commands to applications, as if a real-life object were being handled. This kind of interaction is known as Natural User Interfaces (NUI).

To date, a number of studies have been carried out on autostereoscopic displays. For example, the work of Wang is based on displays built by combining lenticular lenses [WWL08] [WWZ*11] [WWLL12]. Maimone et al. [MBPF12] presented an autostereoscopic telepresence system using depth cameras. Kim et al. [KLY*12] proposed an autostereoscopic platform for sharing data between two or more users using two displays; this platform also used NUI technology through the Kinect device and the OpenNI library. Nocent et al. [NPB*12] used autostereoscopic visualization to create an immersive platform for the World Wide Web, in which they also used user-tracking devices. Lee et al. [LHP*07] built a high-resolution 15.1-inch LCD display capable of generating 36 views, which allowed a larger number of users to share the three-dimensional experience. Taherkhani and Kia [TK12] presented an eye-tracking display using an LCD monitor and the parallax barrier technique.

As for NUI, Fishkin [Fis04] argued that they make it easier for users to accept applications, although older people are more reluctant and need more effort to get started with this technology. NUI have been incorporated into a large number of applications, such as video conferencing with depth perception [DYIR11], multi-touch games to introduce older people to new technologies [CBO*12], interaction with 3D objects to study how people move their hands when manipulating objects [CH12], and navigation assistance for Google Earth [KBW*11]. Anacleto et al. [AFS12] presented a system for converting a paper-document-based process into a new process using NUI in a hospital for chronically ill patients. Other work has focused on cognitive [CCC11] or physical [CCH11] [LCS*11] rehabilitation.

The aim of our study is to determine how the use of these technologies, combined with the process of learning a specific topic (the Timeline), affects the satisfaction of primary school children. Our hypothesis is that the children will like the system and will give high scores on the satisfaction questions.

This paper is organized as follows: Section 2 details the hardware and software used to implement the application, the implementation itself and a description of the game's content. Section 3 describes the study that was carried out. Section 4 presents the results obtained. Finally, Section 5 closes the paper with the conclusions. Appendix A contains the questionnaire used in the system validation phase.

2. Development

2.1. Hardware

The system was developed on a computer with the following hardware components:

• Intel Core i7-2600 CPU @ 3.40GHz processor

• NVIDIA GeForce GT545 graphics card

Apart from the computer's internal components, the following special hardware was used:

• A Microsoft XBOX Kinect with a resolution of 640×480 pixels.

• A 46-inch autostereoscopic display, model XYZ3D8V46, with full HD resolution (1920×1080).

To generate the autostereoscopic view, this type of display uses a method called LCD/Lenticular [OSM98], with which 8 different views can be shown at the same time. This means that there are 8 positions around the display from which users can perceive depth without wearing external devices such as glasses or headsets.

2.2. Implementation

The system was developed entirely in the C++ programming language, as were all the libraries used in its implementation.

2.2.1. Graphics

Since this is a video game, displaying the elements is one of its most important parts. For this we used OpenSceneGraph (OSG) in its latest version (3.0.1). This open-source scene graph makes it easy to build the scenes for each mini-game and to switch between them. Each scene loads the 3D models of the elements to be displayed, all of them stored in .osg and .osgt format for maximum compatibility. Most were modelled or modified with the Blender modelling tool in order to remove the light source from each one, since only the scene graph's own light source should exist for the scene to be displayed correctly. So that elements could be resized within the system, rescaling of the 3D models' normals was enabled in OSG, which keeps rendering correct when elements are made larger or smaller.

Another feature of OSG is the grouping of elements into osg::Group nodes, which can be easily enabled or disabled through osg::Switch nodes. This is how the application changes between its mini-games. Each mini-game consists of a group node from which all the elements of the scene (images and 3D objects) hang. A separate group node holds the buttons the user interacts with, so that they can be disabled easily. All scene graph nodes were handled through smart pointers of type osg::ref_ptr, which manage resources within the scene graph more efficiently.
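The group-and-switch arrangement described above can be illustrated without OSG. The sketch below uses plain C++ stand-ins: the Group and Switch types here are hypothetical simplifications written for this example, not the osg::Group and osg::Switch classes themselves, and the element names are invented. The method name setSingleChildOn mirrors the one offered by osg::Switch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a group node: a named container of scene elements.
struct Group {
    std::string name;
    std::vector<std::string> elements;
};

// Hypothetical stand-in for a switch node: keeps an on/off flag per child
// and only exposes the children whose flag is on.
class Switch {
public:
    std::size_t addChild(Group g, bool enabled = false) {
        children_.push_back(std::move(g));
        enabled_.push_back(enabled);
        return children_.size() - 1;
    }
    // Enable exactly one child and disable all the others.
    void setSingleChildOn(std::size_t index) {
        assert(index < children_.size());
        for (std::size_t i = 0; i < enabled_.size(); ++i) enabled_[i] = (i == index);
    }
    // Elements that would be traversed (and therefore drawn) right now.
    std::vector<std::string> visibleElements() const {
        std::vector<std::string> out;
        for (std::size_t i = 0; i < children_.size(); ++i) {
            if (!enabled_[i]) continue;
            out.insert(out.end(), children_[i].elements.begin(),
                       children_[i].elements.end());
        }
        return out;
    }
private:
    std::vector<Group> children_;
    std::vector<bool> enabled_;
};
```

Changing mini-game then amounts to a single setSingleChildOn call, because a disabled child and everything hanging from it is skipped during traversal.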

One of the strengths of the system is that users can see themselves on the screen. This image must stay behind every other element displayed at all times. To achieve this, the depth buffer must be disabled while the background is drawn. As a result, when the rest of the scene is drawn, the depth of the background is not taken into account, so the other elements are always drawn on top. The image captured by the Kinect camera is wrapped in an osg::ImageStream object, which is drawn in an ortho2D camera.
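Why disabling the depth buffer keeps the background behind everything else can be seen in a one-pixel software model. This is a simplified sketch: the Pixel type and drawFragment function are hypothetical and only imitate the usual "nearer fragment wins" depth test, not OSG or OpenGL calls.

```cpp
#include <cassert>
#include <limits>

// One pixel of a hypothetical software framebuffer.
struct Pixel {
    int colour = 0;
    double depth = std::numeric_limits<double>::infinity();  // "far" by default
};

// Draw a fragment. With depth testing enabled, the fragment is kept only if
// it is nearer than the stored depth, and its depth is recorded. With it
// disabled, the colour is written unconditionally and the stored depth is
// left untouched, so later fragments never compete with this one.
inline bool drawFragment(Pixel& px, int colour, double depth, bool depthTest) {
    assert(depth >= 0.0);
    if (depthTest) {
        if (depth >= px.depth) return false;  // hidden by something nearer
        px.depth = depth;
    }
    px.colour = colour;
    return true;
}
```

If the background were drawn with the depth test on at a small depth, a farther game element would be rejected; drawn with the test off, it leaves no depth behind and every later element passes.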

All the game's images are encoded in .png format, which lets us take advantage of its support for transparency and its compact file size. To obtain the expected result, blending must be enabled for the images that require transparency by setting their GL_BLEND mode to osg::StateAttribute::ON. To display an image in the game, the image file is first loaded into an osg::Image object, which is then placed in an osg::Texture2D texture, and this texture is applied to the quad that will be rendered. To speed up the video game, the rescaling that OSG applies to textures before rendering them was disabled. This is done by calling the setResizeNonPowerOfTwoHint(false) method on each texture object.
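As a reminder of what enabling blending buys for transparent .png images, the sketch below computes the common "source over" blend for one pixel. The text does not say which blend function the game configures, so the formula used here (source weighted by alpha, destination by one minus alpha) is an assumption, and the Rgb type and blendOver name are invented for the example.

```cpp
#include <cassert>

// Colour with components in [0, 1].
struct Rgb { double r, g, b; };

// "Source over" blend: result = src * alpha + dst * (1 - alpha).
// alpha = 1 shows only the image; alpha = 0 shows only what is behind it.
inline Rgb blendOver(const Rgb& src, double alpha, const Rgb& dst) {
    assert(alpha >= 0.0 && alpha <= 1.0);
    const double k = 1.0 - alpha;
    return { src.r * alpha + dst.r * k,
             src.g * alpha + dst.g * k,
             src.b * alpha + dst.b * k };
}
```

With blending disabled the alpha channel is ignored, which is why fully transparent regions of an image would otherwise appear as opaque rectangles over the camera background.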

The scene graph is updated by a state machine that continually checks the current state of the game and moves to the next state when the interaction calls for it. On a state change, the current group is disabled and the next one enabled through the switch node. In this way all the elements of the current scene disappear and the new ones are displayed. In addition, a new quad is created on which a video explaining the next stage is played.
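A state machine of this kind can be sketched in a few lines. The stage names below are hypothetical, since the paper does not list the game's actual states, and the mapping from stage to switch child index is an assumption made to keep the example short.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical game stages; the real game's states are not given in the text.
enum class Stage { Intro, MiniGame1, MiniGame2, Finished };

class GameStateMachine {
public:
    Stage current() const { return stage_; }

    // Index of the switch child that should be visible for the current stage
    // (assumes the groups were added to the switch in stage order).
    std::size_t activeGroup() const { return static_cast<std::size_t>(stage_); }

    // Called once per frame. Advances only when the interaction has completed
    // the current stage; returns true when a transition happened, so the
    // caller knows it must toggle the switch node and start the next video.
    bool update(bool stageCompleted) {
        if (!stageCompleted || stage_ == Stage::Finished) return false;
        stage_ = static_cast<Stage>(static_cast<int>(stage_) + 1);
        return true;
    }

private:
    Stage stage_ = Stage::Intro;
};
```

Keeping the transition logic separate from the rendering calls means the same update can be driven once per frame from the main loop.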

For updates to happen correctly, the OSG main loop must be written as an explicit loop that repeatedly calls the osgViewer::frame() method, instead of the usual approach of a single call to osgViewer::run(). This makes it possible to run code between frames in order to update audio, video, the image obtained from the Kinect camera, multimedia synchronization and the log file.
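The shape of such an explicit loop is sketched below. To keep the example standalone, the viewer is a hypothetical StubViewer that only counts frames; it offers done() and frame(), the two calls the loop relies on, and runMainLoop is an invented helper name.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for the viewer: renders a fixed number of frames.
class StubViewer {
public:
    explicit StubViewer(int framesToRun) : remaining_(framesToRun) {}
    bool done() const { return remaining_ <= 0; }
    void frame() { --remaining_; ++rendered_; }
    int rendered() const { return rendered_; }
private:
    int remaining_;
    int rendered_ = 0;
};

// Explicit main loop: application code runs before every rendered frame.
// Returns the number of frames rendered.
template <typename Viewer>
int runMainLoop(Viewer& viewer, const std::function<void()>& betweenFrames) {
    int frames = 0;
    while (!viewer.done()) {
        betweenFrames();  // e.g. update audio, video, camera image, log file
        viewer.frame();   // render exactly one frame
        ++frames;
    }
    return frames;
}
```

A single blocking run-style call hides this loop inside the library, which is why it leaves no place to insert per-frame application code.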

2.2.2. Autostereoscopic display

To give users the sensation of depth we used the MirageSDK library, which was developed specifically for the display used in this work; a refresh rate of 59Hz must be set on the display for correct viewing. The library runs a shader-based system that generates the 8 views, from which a different image reaches each of the user's eyes, so that 3D can be perceived. To use this SDK, the appropriate filter for the display type must first be set (in our case by calling the osgStereo::loadXYZ46Filter() method). Next, an instance of the osgStereo::MultiViewNode object provided by the SDK must be created and added to the scene graph. Every element hanging from this multi-view node is displayed in 3D when the application runs on the display that matches the previously loaded filter (in our case the 46-inch XYZ display).

One way of enhancing the three-dimensional perception of the elements is to give them motion. To this end, the 3D models include rotation and sinusoidal swaying animations that make depth easier to perceive. The user interaction buttons, which sway gently, are one example. To trigger the animation of 3D objects from the source code we use the osgAnimation::BasicAnimationManager class, which manages the different animations a 3D object has and starts them when needed through a call to the playAnimation() method.
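A sinusoidal sway reduces to evaluating a sine of time. In the game the animations are stored in the models themselves, so the function below is only an illustrative sketch of the underlying formula; the name swayAngle and its parameters are assumptions.

```cpp
#include <cassert>
#include <cmath>

// Sway angle in radians at time t (seconds):
//   angle(t) = amplitude * sin(2 * pi * frequencyHz * t)
// The angle oscillates smoothly between -amplitude and +amplitude.
inline double swayAngle(double t, double amplitude, double frequencyHz) {
    assert(frequencyHz >= 0.0);
    const double kTwoPi = 6.283185307179586;
    return amplitude * std::sin(kTwoPi * frequencyHz * t);
}
```

For example, with an amplitude of 0.2 rad and a frequency of 0.5 Hz, a button would reach its maximum tilt half a second after starting and return to rest after one second.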

So that generating the 8 autostereoscopic views does not interfere with drawing the real-world image in the background of the application, the background is drawn in a separate thread of execution, implemented by extending the OpenThreads::Thread class. This prevents the background from being drawn 8 times, which saves system resources.

2.2.3. Multimedia

Multimedia files are essential for a system of this kind. Apart from 3D display, our system can play audio and video to provide a richer user experience and increase user satisfaction.

Sound is present whenever the user interacts with the system, either to tell them what to do next or to indicate whether the action performed was correct or incorrect. For this we used audio files in .wav format containing recordings of real voices for every situation covered in the game. Audio files are played with the FMOD library, which is wrapped in a Sonido class using the Singleton design pattern. The class can thus be used to play sounds from anywhere in the code without creating unnecessary instances of it.
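The Singleton wrapper can be sketched as follows. Only the class name Sonido and the use of the Singleton pattern come from the text; the member names are assumptions, and the audio backend is replaced by a simple log of requested files so that the example does not depend on FMOD.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Sketch of a Singleton audio facade. The real class wraps an audio library;
// here "playing" a file just records its name.
class Sonido {
public:
    // Single shared instance, created on first use.
    static Sonido& instance() {
        static Sonido single;
        return single;
    }
    Sonido(const Sonido&) = delete;
    Sonido& operator=(const Sonido&) = delete;

    void play(const std::string& wavFile) {
        assert(!wavFile.empty());
        played_.push_back(wavFile);  // a real implementation would call the audio library here
    }
    const std::vector<std::string>& played() const { return played_; }

private:
    Sonido() = default;  // private: instances cannot be created from outside
    std::vector<std::string> played_;
};
```

Any part of the code can then write Sonido::instance().play("some_file.wav") (the file name is illustrative) without passing an audio object around.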

All video files are in .mpeg format and were normalized to 29 frames per second, with audio at 44kHz, so that they look and sound right when played inside the scene graph. Video files are played using the ffmpeg library. When working with audiovisual material it is very important that the two parts, audio and video, play in sync. When this kind of file is used in OSG, the video is held in the osg::ImageStream class and the audio in the osg::AudioStream class. To synchronize the two we used the Simple DirectMedia Layer (SDL) library.

2.2.4. Interaction

Users interact with our system through NUI. With this technology the user needs no device other than their own body to control the system, which is made possible by camera-equipped devices such as the Kinect. To combine this mode of interaction with everything described above, a library is needed that is compatible both with the programming language of the other components and with the scene graph used, in this case OSG. We used the Open Natural Interaction (OpenNI) library through its C++ wrapper, XnCppWrapper, together with the Kinect drivers for Windows.

This library provides methods to detect users, calibrate and track them, detect poses, and locate each of the points of the user's body. These points are called SkeletonJoints; in our case we use those corresponding to the hands, so the position of the users' hands is known at all times. However, this position is expressed in real-world coordinates, whereas the system needs the point in the projected world (drawn as the application background). OpenNI provides the ConvertRealWorldToProjective() method for this conversion. The system can thus place hand-shaped cursors on the users' own hands, which gives them visual feedback and helps them interact more simply and intuitively.
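To give an idea of what a real-world-to-projective conversion involves, the following sketch applies a generic pinhole-style projection. It is not OpenNI's implementation: the function name, the structures and the field-of-view and resolution parameters are all assumptions introduced for illustration, and the depth p.z is assumed to be positive.

```cpp
#include <cmath>

struct Point3 { double x, y, z; };   // real-world coordinates (e.g. millimetres)
struct Pixel  { double u, v; };      // projective (image) coordinates in pixels

// Pinhole-style projection. hFov and vFov are the horizontal and vertical
// fields of view in radians; width and height are the image resolution.
// All four values are assumptions supplied by the caller.
Pixel realWorldToProjective(const Point3& p, double hFov, double vFov,
                            int width, int height) {
    // Extent of the visible area at depth z = 1, per axis.
    const double xToZ = 2.0 * std::tan(hFov / 2.0);
    const double yToZ = 2.0 * std::tan(vFov / 2.0);
    Pixel out;
    out.u = width / 2.0 + (p.x / p.z) * (width / xToZ);
    // Image rows grow downwards while real-world y grows upwards.
    out.v = height / 2.0 - (p.y / p.z) * (height / yToZ);
    return out;
}
```

A point on the optical axis maps to the image centre, and a hand that moves to the right or upwards in the room moves the cursor to the right or upwards in the image.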

User detection is controlled through callbacks that OpenNI lets the application register for when a new user enters the camera's field of view, when a user is lost, when a user adopts a known pose, or when the calibration process is about to start. These callbacks define the actions to execute when each of these events occurs.

Calibrating the users who are going to play is practically mandatory: OpenNI can detect several people at once, and to keep the system stable it must be told who the players are so that it tracks only them. Calibration uses the Psi pose, named for its resemblance to the Greek letter Ψ, which consists of standing upright with the arms raised at a 90º angle.

2.2.5. Other

To keep track of the actions users perform while interacting with the application, we created a logging system that writes to a file a line of text with information such as the time elapsed since the start of execution, the current game state, the user who performed the action, and whether the action was correct or wrong. The data are stored in .csv files whose names follow the format day-month-hour.minute.csv, which prevents data from being overwritten and keeps the records dated. The file is written through an instance of the std::ofstream class, which receives text assembled with std::stringstream so that the elements to be written can be concatenated simply and conveniently.
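The logging scheme can be sketched with the two standard classes named above. The function names, the zero-padded file name, the semicolon separator and the order of the fields are assumptions; the text only fixes the name pattern and the kinds of information stored.

```cpp
#include <fstream>
#include <iomanip>
#include <sstream>
#include <string>

// Builds a name of the form day-month-hour.minute.csv.
// Zero-padding to two digits is an assumption.
std::string logFileName(int day, int month, int hour, int minute) {
    std::stringstream ss;
    ss << std::setfill('0')
       << std::setw(2) << day << '-' << std::setw(2) << month << '-'
       << std::setw(2) << hour << '.' << std::setw(2) << minute << ".csv";
    return ss.str();
}

// Concatenates the fields of one log entry. Separator and field order are
// assumptions made for this sketch.
std::string logLine(double seconds, const std::string& state,
                    int user, bool correct) {
    std::stringstream ss;
    ss << std::fixed << std::setprecision(3) << seconds << ';'
       << state << ';' << user << ';' << (correct ? "correct" : "wrong");
    return ss.str();
}

// Appends one line to the log file, creating the file if needed.
void appendToLog(const std::string& fileName, const std::string& line) {
    std::ofstream out(fileName.c_str(), std::ios::app);
    out << line << '\n';
}
```

Because the file name contains the start date and time, two sessions started in different minutes write to different files.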

The time elapsed since the application started is measured with a timer encapsulated in OSG's osg::Timer class. The counter is started when the application launches with the setStartTick() method, and the elapsed seconds are obtained with the time_s() function, to a precision of 3 decimal places.

Because of the length of the audio and video files, it is not advisable to debug the code by waiting for them to play to the end. A keyboard shortcut was therefore added that stops multimedia playback and jumps to the next phase of the game. It relies on the OSG class osgGA::GUIEventHandler, which gives control over the events that occur. Pressing the space bar stops playback, which makes debugging less costly.

Switching between the mini-games is implemented internally as a state machine, which makes it possible to move from one state to another and even to start the game in a given state. Thanks to this state machine the game can be started in the Middle Ages, for example, without running the whole game to debug each historical age.
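A state machine with a selectable start state could look like the sketch below. The state names and the class interface are illustrative assumptions; the actual game presumably has more states (calibration, videos, individual mini-games) than the five ages shown here.

```cpp
// Game phases in chronological order. The names are illustrative.
enum class Era {
    Prehistory, AncientAge, MiddleAges, ModernAge, ContemporaryAge, Finished
};

class GameStateMachine {
public:
    // The starting state can be chosen, which allows debugging a single age.
    explicit GameStateMachine(Era start = Era::Prehistory) : current_(start) {}

    Era current() const { return current_; }

    // Moves to the next age; Finished is a terminal state.
    void advance() {
        if (current_ != Era::Finished) {
            current_ = static_cast<Era>(static_cast<int>(current_) + 1);
        }
    }

private:
    Era current_;
};
```

Constructing the machine with Era::MiddleAges skips the earlier ages, which is the debugging shortcut mentioned in the text.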

2.3. Game description

The game consists of travelling along the timeline through the five historical ages that have marked history: Prehistory (cavemen), the Ancient Age (Roman times), the Middle Ages (castles and medieval knights), the Modern Age (the time of the navigators and Christopher Columbus), and the Contemporary Age (the period in which we live today).

The system is divided into several mini-games, each of which belongs to one historical age. Apart from these mini-games, the system includes video- and audio-based explanations to reinforce the children's acquisition of knowledge. For example, at the start of each historical age a video presents the most relevant events and facts of the age about to be played, and the children need that same information to complete the activities each mini-game proposes. To make the children feel informed about what they must do at each stage, the game has an avatar-like character called “El señor Tic Tac”, modelled as an alarm clock. This character tells the children which actions to carry out in each mini-game. As a result, the game on its own is intuitive enough for children to use it the first time: there is no need to read an instruction manual or receive instructions from an instructor. With only the information received from the game itself, the children can interact with it without problems.

Interaction with the system takes place through buttons. Buttons are activated by placing a hand over them or by drag&drop, with which the children can literally pick up game elements and carry them where required. The buttons are placed at the sides of the screen, as shown in Figure 1, to prevent accidental activation. The Figure also shows the 3D model used as a hand-shaped cursor, which is placed over the users' hands as a feedback mechanism. The buttons are also far enough apart that one is not activated by mistake when another is intended. Initially all buttons have a beige texture, indicating that they have not been activated. Once a button is activated, its texture turns green or red depending on whether, in that game state, the button represents a correct or an incorrect action. Figure 1 also shows the difference between the game with autostereoscopic vision disabled (Figure 1a) and enabled (Figure 1b). With autostereoscopic vision enabled and the application shown on our XYZ display the sensation of depth can be perceived, whereas on a normal screen the game looks as in Figure 1b. In addition, because the users see themselves in the background of the screen, the visual feedback makes playing much easier.
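The button feedback described above (beige while idle, green or red once activated) can be captured in a few lines. This is a sketch under assumptions: the rectangle-based hit test, the member names and the rule that a button is activated only once are illustrative and are not taken from the actual implementation.

```cpp
// Visual state of a button, mirroring the three texture colours in the text.
enum class ButtonColor { Beige, Green, Red };

struct Button {
    Button(double x, double y, double w, double h, bool correct)
        : x(x), y(y), width(w), height(h), correct(correct),
          color(ButtonColor::Beige) {}

    double x, y, width, height;   // screen-space rectangle (illustrative units)
    bool correct;                 // is this a right answer in the current state?
    ButtonColor color;

    // True when the hand cursor lies inside the button rectangle.
    bool contains(double cx, double cy) const {
        return cx >= x && cx <= x + width && cy >= y && cy <= y + height;
    }

    // Activates the button if the cursor is over it and it has not been
    // activated yet. Returns true when an activation happened.
    bool update(double cx, double cy) {
        if (color != ButtonColor::Beige || !contains(cx, cy)) return false;
        color = correct ? ButtonColor::Green : ButtonColor::Red;
        return true;
    }
};
```

Calling update() once per frame with the projected hand position is enough to drive the colour change; keeping buttons near the screen edges, as in Figure 1, reduces the chance that contains() is true by accident.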

a) Autostereoscopic vision disabled

b) Autostereoscopic vision enabled

Figure 1: Layout of the buttons in the game area

At the start of the game the children are calibrated, a process that makes it possible to recognize only those children who are going to play. Once the players are calibrated, the game begins. An explanatory video introduces the children to the prehistoric period, where they can see cavemen making cave paintings and other customs of the time. When the video ends, the two mini-games of this age begin. In the first they must find two cave paintings and colour them with the colours used in Prehistory. In the second they must choose one of those colours and literally place their hand on the prehistoric cave, where it remains printed in the chosen colour. When they finish, the players must choose the next historical age: the Ancient Age. In this mini-game the children must build an ancient Roman city by placing constructions such as an aqueduct, an amphitheatre, a Roman circus or a Roman road. Once all elements of the Roman city are in place, they are asked about the purpose of these constructions. They then move on to the Middle Ages. In this stage the children must build a medieval castle by answering a series of questions about its parts. As each question is answered correctly the corresponding part is placed, until at the end the children see a complete 3D medieval castle that they have built themselves. The next historical age is the Modern Age. In this stage the age of the navigators is explained and, in particular, the discovery of America by Christopher Columbus. The mini-game consists of finding, among 6 different items, the 3 that were used for ship navigation in the Modern Age. When the children find a correct one, an explanatory video shows the purpose and operation of that item, namely the compass, the astrolabe and maps. After this stage comes the Contemporary Age, which is explained as the period in which we currently live. Finally, a last mini-game asks them to build the timeline as if it were a jigsaw puzzle, picking up the historical ages as pieces and placing them in the correct position on the timeline.

3. Description of the study

3.1. Participants

A total of 127 children took part in this study, of whom 54 were boys (42.5%) and 73 were girls (57.5%). All of them were between 7 and 11 years old, corresponding to the second to fifth grades of primary school. The mean age was 8.46 ± 1.02 years. All the children who took part attended summer schools in different towns.

3.2. Measures

The study used a questionnaire of 13 questions about the subjective satisfaction obtained from playing and the usability the children perceived when using the system. All the questions are listed in Table 4, together with the options the children could choose.

3.3. Procedure

All the children considered in this study interacted with our system in one of the following two ways:

• In pairs: some children played in pairs; each one stood at one side in front of the screen and interacted with the buttons on their side. Pairs of boys, pairs of girls and mixed pairs all played.

• Individually: the remaining children played alone, using the right hand for the buttons on their right side and the left hand for the buttons on their left side.

Figure 2 shows the arrangement of the autostereoscopic display and the Kinect, and a pair of children playing with the system.

Figure 2: A pair of children playing with the system. The activity shown is of the drag&drop type

Once the children had finished playing, they filled in the questionnaire in Table 4 on-line. The questionnaire was an HTML form stored on our server, so the data were collected automatically, which simplified the subsequent statistical analysis.

4. Results

The data were analysed with the open source toolkit R.

4.1. Interaction and satisfaction analysis

The questions in the questionnaire fall into two groups: the questions answered on a Likert scale (Q01 to Q12), and question Q13, in which the children could mark more than one option.

For the first group of questions we computed the mean and standard deviation of each one. These values are shown in Table 1. All of them are fairly close to the maximum of their scale, which means that satisfaction in the area covered by each question was fairly high. The final rating of the game was 9.61 ± 0.76 points out of 10, which confirms that the game was well received.




#     Mean ± Std. Dev.   Scale
Q01   4.81 ± 0.56        1-5
Q02   4.24 ± 1.16        1-5
Q03   3.95 ± 0.79        1-5
Q04   4.57 ± 0.69        1-5
Q05   3.92 ± 1.01        1-5
Q06   4.64 ± 0.51        1-5
Q07   4.46 ± 0.76        1-5
Q08   4.06 ± 1.08        1-5
Q09   4.39 ± 0.80        1-5
Q10   6.43 ± 1.68        1-7
Q11   5.73 ± 2.30        1-7
Q12   9.61 ± 0.76        1-10

Table 1: Mean and standard deviation of Q01-Q12

For a more thorough analysis of the data we created the variable Satisfaction, whose value is the sum of each child's answers from Q01 to Q12. This gives a single variable with which to compare different groups of children, whether by age or by gender.
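As a small illustration, the Satisfaction variable is a plain sum over the twelve answers. The type and function names below are hypothetical. With the scales of Table 1 (nine questions on 1-5, two on 1-7, one on 1-10) the variable ranges from 12 to 69.

```cpp
#include <array>
#include <numeric>

// One child's answers to Q01..Q12, in questionnaire order.
using Answers = std::array<int, 12>;

// Satisfaction is the plain sum of the twelve answers.
int satisfaction(const Answers& a) {
    return std::accumulate(a.begin(), a.end(), 0);
}
```

Note that the questions are summed without rescaling, so Q12 (1-10) and Q10-Q11 (1-7) weigh more than each 1-5 question.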

Looking at the Satisfaction variable in more detail, we can examine the differences between children grouped by gender and by age. Figure 3 shows that the satisfaction values are almost identical whether all children are taken together or split into boys and girls. The values are less similar, however, when the children are grouped by age. Figure 4 shows the results of this comparison: the highest values correspond to ages 8 to 10, while the lowest belong to the children aged 7 and 11.

To test the differences between groups we ran a multifactorial ANOVA comparing the factors Gender, Age and Grade, and the interactions between them. The results are shown in Table 2. They indicate that there are no statistically significant differences for any of the factors involved in the study. The effect size (η²G) of every factor and interaction of factors is also small.

Figure 5 shows the interaction plot for the Satisfaction variable with the data grouped by gender and by age. It shows a trend similar to that of the boxplot in Figure 4, except that here the high values at age 8 correspond to the boys and the high values at age 10 correspond to the girls; these are the ages with the highest values of Satisfaction.

The second group consists of question Q13, in which we asked the children which mini-games they had liked most. They could mark more than one. The most voted mini-game was Prehistory (80 votes), closely followed by the Ancient Age (75 votes). The remaining mini-games, from most to least voted, were the Contemporary Age (66 votes), the Middle Ages (61 votes) and, last, the Modern Age with 57 votes.

Figure 3: Values of the Satisfaction variable overall and by gender

Figure 4: Values of the Satisfaction variable by age

Factor         d.f.   F      p       η²G
Gender         1      0.10   0.747   0.0009
Age            4      0.89   0.469   0.031
Grade          3      0.87   0.456   0.023
Gender:Age     3      1.61   0.189   0.042
Gender:Grade   3      1.02   0.382   0.027
Age:Grade      2      0.78   0.459   0.014

Table 2: Multifactorial ANOVA of the Satisfaction variable and Eta-squared effect size

[Figure 3 plot: x-axis Participants (All, Boys, Girls), y-axis Satisfaction (45-70). Figure 4 plot: x-axis Age (7-11), y-axis Satisfaction (45-70).]

Figure 5: Interaction by gender and age

To analyse the children's favourite mini-games we ran a Chi-square analysis. The results of the test are shown in Table 3. They show a statistically significant difference between boys and girls in the vote count for the Prehistory mini-game: 52 girls marked Prehistory as one of their favourite mini-games, compared with 28 boys. This is a large difference for this mini-game, whereas for the other mini-games the difference between boys and girls was less evident.
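The Prehistory comparison can be reproduced approximately from the counts given in the text (28 of 54 boys and 52 of 73 girls). The sketch below is not the R code used in the study. It assumes a 2x2 table of voted / not voted per gender, and the function names are hypothetical. Under the further assumption that Yates' continuity correction was applied, the statistic comes out at roughly 4.20, in line with Table 3, while the uncorrected statistic (about 5.00) gives a Cramér's V of roughly 0.20.

```cpp
#include <algorithm>
#include <cmath>

// 2x2 contingency table: rows are groups, columns are voted / not voted.
//   a b   (first group:  voted, not voted)
//   c d   (second group: voted, not voted)
struct Table2x2 { double a, b, c, d; };

// Pearson chi-square statistic for a 2x2 table, optionally with Yates'
// continuity correction.
double chiSquare(const Table2x2& t, bool yates) {
    const double n = t.a + t.b + t.c + t.d;
    double diff = std::fabs(t.a * t.d - t.b * t.c);
    if (yates) diff = std::max(0.0, diff - n / 2.0);
    const double denom =
        (t.a + t.b) * (t.c + t.d) * (t.a + t.c) * (t.b + t.d);
    return n * diff * diff / denom;
}

// For a 2x2 table Cramer's V reduces to sqrt(chi2 / n).
double cramersV(double chi2, double n) {
    return std::sqrt(chi2 / n);
}
```

For the Prehistory votes the table is {28, 26, 52, 21}, since 54 - 28 = 26 boys and 73 - 52 = 21 girls did not mark that mini-game.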

Age        Boys   Girls   χ²     p        V
Prehist.   1      1       4.20   0.04**   0.20
Ancient    1      1       1.74   0.188    0.13
Middle     0      0       0.00   1.000    0.00
Modern     0      0       0.00   1.000    0.01
Contemp.   0      1       0.32   0.574    0.07

Table 3: Modes of the children's favourite mini-games, Chi-square analysis and Cramér's V. d.f. = 1

4.2. Correlation analysis

To check the correlation between the satisfaction questions we ran a correlation analysis across all of them. The strongest correlation found was between question Q10 (The castle came out of the screen) and question Q11 (Sensation of being able to touch the castle), with a correlation of 0.527, p < 0.001. These two questions were intended to measure the sensation of depth perceived by the children. The correlation between them indicates that the children perceived that sensation.
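The text does not state which correlation coefficient was used. As an illustration only, the sketch below computes the Pearson coefficient between the answers to two questions; the choice of Pearson and the function name are assumptions.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation coefficient between two equally long samples.
// Returns 0 when the sizes differ, the samples are empty, or either
// sample has no variance.
double pearson(const std::vector<double>& x, const std::vector<double>& y) {
    const std::size_t n = x.size();
    if (n == 0 || y.size() != n) return 0.0;

    double mx = 0.0, my = 0.0;
    for (std::size_t i = 0; i < n; ++i) { mx += x[i]; my += y[i]; }
    mx /= static_cast<double>(n);
    my /= static_cast<double>(n);

    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double dx = x[i] - mx, dy = y[i] - my;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    if (sxx == 0.0 || syy == 0.0) return 0.0;
    return sxy / std::sqrt(sxx * syy);
}
```

Applied to the Q10 and Q11 columns of the questionnaire data, a value near 0.5 would indicate that children who felt the castle came out of the screen also tended to feel they could touch it.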

Within groups of children of the same gender, different correlations appear. For the boys, the strongest correlations are between questions Q01 (How was your experience) and Q12 (Rate the game from 1 to 10) (0.527, p < 0.001), between Q02 (Would you recommend this game to your friends) and Q06 (Did you like the images that appear) (0.541, p < 0.001), between Q02 and Q07 (Did you like señor Tic-Tac) (0.520, p < 0.001), and between Q10 and Q11 (0.517, p < 0.001). For the girls, the correlations found are between questions Q06 and Q12 (0.520, p < 0.001), between Q02 and Q05 (Selecting the answers) (0.531, p < 0.001), and between Q10 and Q11 (0.539, p < 0.001).

Within groups of children of the same age, different correlations also appear. For the 7-year-olds, the strongest correlations are between questions Q01 and Q12 (0.557, p = 0.005), between Q06 and Q07 (0.529, p = 0.009), between Q10 and Q11 (0.537, p = 0.008), and between Q03 (Difficulty of the game) and Q06 (0.559, p = 0.005). For the 8-year-olds, there is a correlation between Q07 and Q08 (Tic-Tac helped you) (0.540, p < 0.001). For the 9-year-olds the correlations are stronger; they appear between questions Q01 and Q02 (0.616, p < 0.001), between Q01 and Q05 (0.634, p < 0.001), between Q01 and Q07 (0.635, p < 0.001), between Q01 and Q12 (0.663, p < 0.001), between Q02 and Q07 (0.643, p < 0.001), between Q02 and Q12 (0.678, p < 0.001), between Q06 and Q08 (0.768, p < 0.001) and between Q07 and Q12 (0.680, p < 0.001). Finally, for the 10-year-olds, the correlations found are between questions Q02 and Q12 (0.542, p = 0.02), between Q09 (How much you learned) and Q12 (0.651, p < 0.001), and between Q10 and Q11 (0.809, p < 0.001). The 11-year-olds were not considered because of their small number.

5. Conclusions

This paper has presented a system that supports learning the timeline by combining autostereoscopy and NUI, with the whole system integrated in the OpenSceneGraph scene graph. Children can learn with a system in which they perceive a three-dimensional sensation, and all they need to interact is their own body, a process that is metaphorically similar to playing in the real world. The children see themselves inside the game, and its elements are endowed with depth.

From the data of the study we can state that our hypothesis has been corroborated: we predicted that the children would like the application and would give high values to the satisfaction questions. All the questions obtained very high mean scores, from which it can be inferred that the children had a very good time, that they would recommend the game to their friends, that they understood the rules of the game very well, that they felt they learned a lot during the game, and that they liked the game very much. The analysis of the Satisfaction variable showed no statistically significant differences for any of the factors involved in the study. However, the youngest children (7 years) and the oldest (11 years) gave lower scores. Our view is that the most suitable age range for this system, or for similar systems, is 8 to 10 years. For younger children the system should be simpler, especially in its concepts. For older children it should be more motivating, with a stronger play component and some form of competition.

Regarding the questions on depth perception, we asked the children whether the models seemed to come out of the screen and whether they had the impression that they could actually touch them. These two questions obtained very high ratings. In addition, the correlation analysis showed that the two questions are strongly correlated. These data indicate that the children perceived the sensation of depth. In our view this result is very promising, and it suggests that this sensation of depth could be exploited in subjects where it is important for understanding the concepts, for example geometry.

As for the mini-games the children voted as favourites, the most voted was Prehistory, followed by the Ancient Age. The least voted was the Modern Age. Prehistory and the Ancient Age are, together with the Middle Ages, the periods that include the most animation and the most interaction by the children. The Modern Age, in contrast, is one of the periods in which the content is conveyed mainly through videos, with little interaction. Given these characteristics and the data, our recommendation for this kind of system is to convey the content with interactive material and to limit the use of videos.

As for future work, one improvement would be to add more interactive material to the historical ages in which almost all the information is conveyed through explanatory videos, as suggested above. Another point for a future version of the system would be to display the videos in 3 dimensions as well, since for now they are shown in 2 dimensions, as a video on a normal screen would be. The system could also be taken to primary school classrooms, where it could be used to teach children the timeline or any other concept covered in class. While a computer with a powerful graphics card and a Kinect device do not require a large investment, the autostereoscopic display does, for the time being, require a considerable one. However, the system can run on normal screens with autostereoscopic vision disabled. Another point to consider when taking the system to classrooms is the number of users who can play at once. Although the Kinect device can detect more players, this game was designed for 1 or 2 players at most. This means that while one pair of children is playing, the others can be present and receive the same audio and video explanations that the system offers.

References

[AFS12] ANACLETO J. C., FELS S., SILVESTRE R.: Transforming a paper based process to a natural user interfaces process in a chronic care hospital. Procedia Computer Science 14 (2012), 173–180.

[CBO*12] CARVALHO D., BESSA M., OLIVEIRA L., GUEDES C., PERES E., MAGALHÃES L.: New interaction paradigms to fight the digital divide: A pilot case study regarding multi-touch technology. Procedia CS 14 (2012), 128–137.

[CCC11] CHANG Y.-J., CHEN S.-F., CHUANG A.: A gesture recognition system to transition autonomously through vocational tasks for individuals with cognitive impairments. Research in Developmental Disabilities 32, 6 (2011), 2064–2068.

[CCH11] CHANG Y.-J., CHEN S.-F., HUANG J.-D.: A kinect-based system for physical rehabilitation: A pilot study for young adults with motor disabilities. Research in Developmental Disabilities 32, 6 (2011), 2566–2570.

[CH12] COHÉ A., HACHET M.: Beyond the mouse: Understanding user gestures for manipulating 3D objects from touchscreen inputs. Computers & Graphics 36, 8 (2012), 1119–1131.

[DYIR11] DEVINCENZI A., YAO L., ISHII H., RASKAR R.: Kinected conference: augmenting video imaging with calibrated depth and audio. In Proceedings of the ACM 2011 conference on Computer supported cooperative work (2011), pp. 621–624.

[Fis04] FISHKIN K. P.: A taxonomy for and analysis of tangible interfaces. Personal Ubiquitous Computing 8, 5 (2004), 347–358.

[KBW*11] KAMEL BOULOS M. N., BLANCHARD B. J., WALKER C., MONTERO J., TRIPATHY A., GUTIERREZ-OSUNA R.: Web GIS in practice X: a Microsoft Kinect natural user interface for Google Earth navigation. International Journal of Health Geographics 10, 45 (2011), 1–14.

[KLY*12] KIM H., LEE G., YANG U., KWAK T., KIM K.-H.: Dual Autostereoscopic Display Platform for Multi-user Collaboration with Natural Interaction. ETRI Journal 34 (2012), 466–469.

[LCS*11] LANGE B., CHANG C. Y., SUMA E., NEWMAN B., RIZZO A. S., BOLAS M.: Development and evaluation of low cost game-based balance rehabilitation tool using the microsoft kinect sensor. In Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society (2011), vol. 2011, pp. 1831–1834.

[LHP*07] LEE B., HONG H., PARK J., PARK H., SHIN H., JUNG I.: Multiview autostereoscopic display of 36view using an ultra-high resolution LCD. In Proceedings of SPIE, the International Society for Optical Engineering. (2007).

[MBPF12] MAIMONE A., BIDWELL J., PENG K., FUCHS H.: Enhanced personal autostereoscopic telepresence system using commodity depth cameras. Computers & Graphics 36, 7 (2012), 791–807.

[NPB*12] NOCENT O., PIOTIN S., BENASSAROU A., JAISSON M., LUCAS L.: Toward an immersion platform for the world wide web using autostereoscopic displays and tracking devices. In Proceedings of the 17th International Conference on 3D Web Technology (2012), pp. 69–72.

[OSM98] OMURA K., SHIWA S., MIYASATO T.: Lenticular autostereoscopic display system: multiple images for multiple viewers. Journal of the Society for Information Display 6, 4 (1998), 313–324.

[TK12] TAHERKHANI R., KIA M.: Designing a high accuracy 3D auto stereoscopic eye tracking display, using a common LCD monitor. 3D Research 3, 3 (2012), 1–7.

[WWL08] WANG A., WANG Q., LI D.: Three dimensional display technology. Electronic Devices 31, 1 (2008), 299.

[WWLL12] WANG A., WANG Q., LI X., LI D.: Combined lenticular lens for autostereoscopic three dimensional display. Optik - International Journal for Light and Electron Optics 123, 9 (2012), 827–830.

[WWZ*11] WANG Q., WANG A., ZHAO W., TAO Y., LI D.: Autostereoscopic display based on multi-layer lenticular lens. Optik - International Journal for Light and Electron Optics 122, 15 (2011), 1326–1328.

A. Questionnaire

This appendix presents the questionnaire designed for this study. The possible answers are shown below each question. The column labelled # shows the number of each question.

#     Question
Q01   How was your experience?
      [1. Very bad / 2. Bad / 3. Average / 4. Good / 5. Very good]
Q02   Would you recommend this game to your friends?
      [1. To none / 2. To almost none / 3. I don't know / 4. To some / 5. To all]
Q03   How did you find the game?
      [1. Very difficult / 2. Difficult / 3. Average / 4. Easy / 5. Very easy]
Q04   Rate how well you understood the rules of the game
      [1. Very badly / 2. Badly / 3. Average / 4. Well / 5. Very well]
Q05   Selecting the answers was…
      [1. Very difficult / 2. Difficult / 3. Average / 4. Easy / 5. Very easy]
Q06   Did you like the images that appear?
      [1. Very little / 2. Little / 3. Average / 4. A lot / 5. Very much]
Q07   Did you like señor Tic-Tac (the clock)?
      [1. Very little / 2. Little / 3. Average / 4. A lot / 5. Very much]
Q08   Did señor Tic-Tac help you during the game?
      [1. Very little / 2. Little / 3. Average / 4. A lot / 5. Very much]
Q09   How much did you learn during the game?
      [1. Very little / 2. Little / 3. Average / 4. A lot / 5. Very much]
Q10   Rate on a scale from 1 to 7 the sensation you had when you saw the castle/bridge. Did it seem to come out of the screen?
      [1-7]
Q11   Did it seem that you could touch the castle/bridge?
      [1-7]
Q12   Rate the game from 1 to 10
      [1-10]
Q13   Of all the timeline games, which ones did you like most? (You can mark several options.)
      [Cave paintings / Roman city / Castles / Columbus's voyage / Timeline puzzle]

Table 4: Satisfaction questions


Posters

Towards a 3D Cadastre

M.D. Robles-Ortega1 and F.R.Feito1 and L. Ortega1

1Department of Computer Science. Campus Las Lagunillas, s/n. University of Jaén, Spain

Abstract

Urban modeling is an important research area with a wide range of applications like emergency management or urban planning. In this paper we propose a semi-automatic method to create a 3D urban model using real cadastral 2D data. We create the 3D models of the street surface by means of our own method and the 3D buildings using the CityEngine software. Finally, we integrate both models and generate a realistic 3D web scene in which users can move freely.

Categories and Subject Descriptors (according to ACM CCS): I.3.5 [Computer Graphics]: Computational Geometry and Object Modeling; I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism

1. Introduction

Urban modeling of cities has a wide range of applications, such as the reconstruction of urban spaces, entertainment or emergency management [VAW∗10]. However, generating a realistic model from 2D data has traditionally been considered a manual process.

This paper describes the initial work carried out to automatically generate a 3D urban model from real Cadastre data (http://www.catastro.meh.es/). The process has three phases: generating the streets with our own method, creating the 3D building models from the cadastral parcels with the CityEngine software, and integrating both scenes.

2. Street generation

From the street polylines and the building polygons, a set of entities is generated from which the 3D model of the street surface is created, following the procedure described in [ROOC∗13]. The algorithm is robust and works in approximately 98% of the intersections; a manual procedure is used for the rest. After execution, two tables in MapInfo format are obtained: the intersection polygons and the street sections.

3. Building generation

For building generation we also have our own algorithm, explained in [ROOC∗13]. In this paper, however, we used the CityEngine software, which is based on the procedural modeling of buildings [MWH∗06]. The facades were generated by defining a set of rules that determine their general appearance. An alternative is to use real images, as in [AYLM13], which proposes a semi-automatic tool that generates new textures by analysing the most common architectural elements of a facade.
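
As an illustration of what rule-based facade generation means, the following Python sketch splits a facade rectangle into floors and then into tiles. It is only a sketch: it does not use CityEngine's rule language, and the names `Rect`, `repeat_split` and `facade_rule`, as well as the floor height and tile width, are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rect:
    x: float       # left edge along the facade (metres)
    y: float       # bottom edge above the ground (metres)
    width: float
    height: float


def repeat_split(length: float, target: float) -> List[float]:
    """Split `length` into equal parts whose size is as close as possible to `target`."""
    count = max(1, round(length / target))
    return [length / count] * count


def facade_rule(facade: Rect, floor_height: float = 3.0,
                tile_width: float = 2.5) -> List[Rect]:
    """Rule chain: facade -> floors (vertical split) -> tiles (horizontal split)."""
    tiles = []
    y = facade.y
    for fh in repeat_split(facade.height, floor_height):
        x = facade.x
        for tw in repeat_split(facade.width, tile_width):
            tiles.append(Rect(x, y, tw, fh))
            x += tw
        y += fh
    return tiles


# Hypothetical facade: 10 m wide and 9 m high.
tiles = facade_rule(Rect(0.0, 0.0, 10.0, 9.0))
print(len(tiles))  # 12 (3 floors x 4 tiles)
```

Each resulting tile could then receive a window or door model. Changing the two parameters changes the general appearance of the facade without editing any geometry by hand.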

4. Integration of the generated streets with the CityEngine buildings

Once the street and building models have been generated, the next step is to integrate both elements into the same scene. CityEngine was used for this phase, since we wanted to export the final model with the web viewer it provides.

CityEngine does not allow MapInfo tables to be imported directly. It was therefore necessary to export the files to a format compatible with both programs: ESRI Shapefile (SHP). However, the SHP files created by MapInfo are two-dimensional, since they contain no height information. The ArcGIS software was used to assign height values to the street polygons from the DEM (digital elevation model) of the city and to generate the 3D model.
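
The height assignment step was done with ArcGIS, whose workflow is not reproduced here. The following Python sketch only illustrates the underlying idea under stated assumptions: the DEM is a regular grid stored as a list of rows, heights are obtained by bilinear interpolation, and the names `sample_dem` and `drape` are hypothetical.

```python
from typing import List, Sequence, Tuple


def sample_dem(dem: Sequence[Sequence[float]], origin: Tuple[float, float],
               cell: float, x: float, y: float) -> float:
    """Bilinearly interpolate the height at (x, y).

    dem[row][col] is the height at (origin_x + col * cell, origin_y + row * cell).
    The point is assumed to lie inside the grid.
    """
    gx = (x - origin[0]) / cell
    gy = (y - origin[1]) / cell
    # Clamp so that points on the upper border still have a neighbouring cell.
    col = min(max(int(gx), 0), len(dem[0]) - 2)
    row = min(max(int(gy), 0), len(dem) - 2)
    tx, ty = gx - col, gy - row
    bottom = dem[row][col] * (1 - tx) + dem[row][col + 1] * tx
    top = dem[row + 1][col] * (1 - tx) + dem[row + 1][col + 1] * tx
    return bottom * (1 - ty) + top * ty


def drape(polygon: List[Tuple[float, float]], dem, origin, cell):
    """Turn a 2D polygon into a 3D one by attaching a DEM height to each vertex."""
    return [(x, y, sample_dem(dem, origin, cell, x, y)) for x, y in polygon]


# Hypothetical 2 x 2 DEM with 10 m cells and a triangular street polygon.
dem = [[0.0, 10.0],
       [20.0, 30.0]]
street = [(0.0, 0.0), (5.0, 5.0), (10.0, 10.0)]
print(drape(street, dem, (0.0, 0.0), 10.0))
```

Sampling only at the vertices keeps the polygon count unchanged; long street sections over steep terrain would need extra vertices to follow the relief.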

After this process, the information on the blocks and the streets is available in a single CityEngine scene. The procedure concludes with the generation of the 3D model composed of both elements.





Figure 1: Generated layers of the city model. (a) Streets (method described in [ROOC∗13]). (b) Buildings (CityEngine).

5. Results

This section presents the results obtained using the Cadastre information for the city of Jaén as input data. Although the model does not reproduce the buildings of Jaén exactly, the images obtained can be considered realistic, since they resemble the general appearance of a city and are produced by a semi-automatic process that requires hardly any user intervention.

Figure 1(a) shows the streets generated with our own method, described above. The image shows that both the street sections and the intersections have been created correctly, so they can be used as a 3D version of a traditional street map. As in reality, some streets show slight deviations in apparently straight sections. The second layer (the blocks modeled with CityEngine) is shown in Figure 1(b).

In addition to viewing the streets and the buildings independently, it is possible to move freely through the scene from both a pedestrian and an aerial perspective (Figure 2).


6. Conclusions and future work

This paper has described the initial work carried out to generate a 3D cadastre using as input the 2D information currently available on streets and buildings. The streets were generated with our own method, described in [ROOC∗13], whereas CityEngine was used for the buildings. Finally, both scenes were integrated, obtaining realistic results.

In future work we would like to extend this process with new functionality that enlarges the generated model. It would be interesting to add the information already available in the cadastral data in order to improve interactivity and let users obtain additional data about the buildings. Another aspect we wish to study is improving the procedural construction of buildings with the historical aesthetics and geometry characteristic of the Mediterranean area, in order to increase the final realism.

Figure 2: Final images of the generated scene. (a) Pedestrian perspective. (b) Aerial perspective.

Acknowledgements

This work has been partially funded by the Junta de Andalucía through Research Project P07-TIC-02773, and by the Ministerio de Economía y Competitividad and the European Union through FEDER funds under Research Project TIN2011-25259.

References

[AYLM13] ALHALAWANI S., YANG Y.-L., LIU H., MITRA N. J.: Interactive facades analysis and synthesis of semi-regular facades. Computer Graphics Forum 32, 2 (2013), 215–224.

[MWH∗06] MÜLLER P., WONKA P., HAEGLER S., ULMER A., VAN GOOL L.: Procedural modeling of buildings. ACM Trans. Graph. 25, 3 (July 2006), 614–623.

[ROOC∗13] ROBLES-ORTEGA M. D., ORTEGA L., COELHO A., FEITO F., DE SOUSA A.: Automatic street surface modeling for web-based urban information systems. Journal of Urban Planning and Development 139, 1 (2013), 40–48.

[VAW∗10] VANEGAS C. A., ALIAGA D. G., WONKA P., MÜLLER P., WADDELL P., WATSON B.: Modelling the appearance and behaviour of urban spaces. Computer Graphics Forum 29, 1 (2010), 25–42.



Illumination of large urban scenes

M.D. Robles-Ortega¹, J.R. Jiménez¹ and L. Ortega¹

¹ Department of Computer Science, University of Jaén

Abstract

Illumination and shadows are essential to obtain realistic virtual environments in dark scenes. In this paper we propose a novel real-time method to determine the shadowed and illuminated areas in large scenes in the dark, especially suitable for urban environments. Our approach uses the polar diagram as a tessellation plane, and a ray-casting process to obtain the visible areas. This solution derives the exact illuminated area with a higher performance than classical real-time shadow methods like shadow map or shadow volume. Moreover, our approach is also used to determine the visible portion of the scene from a pedestrian point of view. As a result, we only have to render the visible part of the scene, which is considerably smaller than the global scene. This feature is especially important in client-server systems.

Categories and Subject Descriptors (according to ACM CCS): I.3.5 [Computer Graphics]: Computational Geometry and Object Modeling: Geometric algorithms, languages, and systems

1. Introduction

Illumination in night-time urban scenes is especially important in many applications, such as simulation and video games. For this reason, shadow generation techniques are a recurring topic in scene rendering [ESAW12]. In urban scenarios that represent real cities through which the user can navigate freely, shadow generation should be solvable for any location in the city, which is a complex problem, especially in web systems.

This work addresses the illumination of large night-time urban environments by means of street lamps, which cast light on the nearby buildings and leave the rest of the scene in shadow. The light reaching a building may come from several light sources (assumed to be point lights), depending on their arrangement and distance. The urban model is generated from 2D cadastral information, so the buildings are 2.5D objects obtained by extruding that 2D geometry. Although the problem is complex, because neither the size of the city nor the number of light sources is limited, it can be tackled by solving a classical visibility problem. On the one hand, each street lamp lights only a small area around it; on the other, buildings are a priori large occluders, which also reduces the set visible to the observer. Using a visibility method, or rather an occlusion method, reduces the amount of geometry to render, making it possible to send the scene over the network and reach a certain level of real-time interaction.
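
To make the notion of a 2.5D building concrete, the following Python sketch extrudes a 2D footprint into a prism. It is a minimal illustration, not the code used in this work; the function name `extrude` and the sample footprint are assumptions.

```python
from typing import List, Tuple

Point2 = Tuple[float, float]
Point3 = Tuple[float, float, float]


def extrude(footprint: List[Point2], height: float,
            base: float = 0.0) -> Tuple[List[Point3], List[List[int]]]:
    """Return (vertices, faces) of the prism obtained by extruding a footprint.

    Vertices 0..n-1 form the bottom ring and n..2n-1 the top ring.
    Faces are lists of vertex indices: n wall quads followed by the roof.
    """
    n = len(footprint)
    bottom = [(x, y, base) for x, y in footprint]
    top = [(x, y, base + height) for x, y in footprint]
    walls = [[i, (i + 1) % n, n + (i + 1) % n, n + i] for i in range(n)]
    roof = [list(range(n, 2 * n))]
    return bottom + top, walls + roof


# Hypothetical 4 m x 3 m cadastral parcel extruded to a height of 10 m.
vertices, faces = extrude([(0, 0), (4, 0), (4, 3), (0, 3)], 10.0)
print(len(vertices), len(faces))  # 8 5
```

Because every building is fully described by its footprint and one height, visibility and illumination can be reasoned about on the 2D plan, which is what the method described below exploits.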

2. Previous work

The literature contains several methods for rendering large urban scenarios using traditional acceleration strategies [CDBG∗07]. Most of them exploit the particular characteristics of these scenes: buildings treated as 2.5D objects, regular structure and strong occlusion. However, classical shadow techniques such as shadow maps or shadow volumes [ESAW12] are not suitable for large urban scenarios, particularly in web systems.

Shadow generation for realism is a fruitful research field, with numerous published works addressing both the generation of shadows and the inclusion of multiple light sources. The main novelty of this paper lies in the significant reduction of the geometry that the method has to process.

3. Description of the method

The proposed method works in two phases:





Figure 1: Ray casting to obtain the visible area.

1. Visibility is resolved over the whole scene for the position of the observer.

2. Starting from the previous scene in shadow, the illumination problem is solved for each light source:

   - Cast the rays following the method described in [ROOF09].

   - Determine the intersection points of the rays with the scene and classify them as primary or secondary.

   - Generate the illuminated areas and render them.

The first part of the process is carried out with the polar diagram [GMO06], a tessellation of the plane that, used as preprocessing, accelerates angle-related problems such as the computation of the visibility map [ROOF09]. The method casts non-uniformly distributed tangent rays from the observer until their collisions with the objects are detected. Thanks to the angular properties and the topology of this tessellation, the fan of rays reaches every visible object without missing any. The result is the exact set of buildings visible from the observer's point of view.
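
The following Python sketch illustrates the fan of tangent rays on 2D building footprints. It is a brute-force illustration under stated assumptions: it does not use the polar diagram, so every ray is tested against every edge, and it adds two slightly rotated rays per vertex (offset `eps`) to look past tangent corners. The names `ray_segment` and `visible_buildings` and the sample scene are hypothetical.

```python
import math
from typing import Dict, List, Optional, Set, Tuple

Point = Tuple[float, float]


def ray_segment(origin: Point, angle: float, a: Point, b: Point) -> Optional[float]:
    """Distance along the ray at which it hits segment ab, or None if it misses."""
    dx, dy = math.cos(angle), math.sin(angle)
    ex, ey = b[0] - a[0], b[1] - a[1]
    denom = dx * ey - dy * ex
    if abs(denom) < 1e-12:            # ray parallel to the segment
        return None
    ax, ay = a[0] - origin[0], a[1] - origin[1]
    t = (ax * ey - ay * ex) / denom   # distance along the ray
    s = (ax * dy - ay * dx) / denom   # position along the segment, 0..1
    return t if t > 1e-9 and 0.0 <= s <= 1.0 else None


def visible_buildings(observer: Point, buildings: Dict[str, List[Point]],
                      eps: float = 1e-4) -> Set[str]:
    """Identifiers of the buildings that are the first hit of at least one ray."""
    angles = []
    for polygon in buildings.values():
        for vx, vy in polygon:
            base = math.atan2(vy - observer[1], vx - observer[0])
            angles.extend((base - eps, base, base + eps))
    visible = set()
    for angle in angles:
        nearest, hit = math.inf, None
        for name, polygon in buildings.items():
            for i in range(len(polygon)):
                t = ray_segment(observer, angle, polygon[i],
                                polygon[(i + 1) % len(polygon)])
                if t is not None and t < nearest:
                    nearest, hit = t, name
        if hit is not None:
            visible.add(hit)
    return visible


# Hypothetical scene: B lies completely behind A as seen from the origin.
scene = {
    "A": [(2, -1), (3, -1), (3, 1), (2, 1)],
    "B": [(5, -0.5), (6, -0.5), (6, 0.5), (5, 0.5)],
    "C": [(2, 3), (3, 3), (3, 4), (2, 4)],
}
print(sorted(visible_buildings((0.0, 0.0), scene)))  # ['A', 'C']
```

Only rays aimed at building vertices are cast, which is why the fan is non-uniform: its density follows the geometry instead of a fixed angular step. The cost of the inner loops is what a preprocessing structure such as the polar diagram is meant to reduce.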

The polar diagram is used in a similar way for the second part of the problem. The same fan of rays, cast from the light source instead of the observer, also yields the directly illuminated area (Figure 1). The next step is to classify the intersection points between the cast rays and the objects of the scene (the buildings). A cast ray r_i that is tangent to an object o_t generates a primary intersection point if it intersects another object o_c before reaching o_t. This case can be seen in Figure 1, where, for example, the rays tangent to object 5 first intersect objects 2 and 3. Finally, an intersection point is secondary when the ray touches the tangent point of object o_t before intersecting o_c, as happens with the two intersection points on object 5, produced by rays that are tangent to objects 2 and 3.

Figure 2: Example in which only 0.1% of the whole scene is rendered.

After the intersection points of the fan of rays have been classified, each angular range between a pair of consecutive rays is determined to be illuminated or in shadow:

- If both intersection points are primary, the area between them is illuminated.

- If one point is primary and the other secondary, they also define an illuminated area.

- If both are secondary, the area is illuminated if the tangent points touch different objects (illuminated area of object 5). Otherwise (they touch the same object), the area is in shadow.
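
The three rules above can be sketched in Python as follows. The `Hit` record and the function `range_is_lit` are hypothetical names introduced for the example; each hit stores its classification and the object to which its ray is tangent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    kind: str         # "primary" or "secondary"
    tangent_obj: int  # identifier of the object the ray is tangent to


def range_is_lit(a: Hit, b: Hit) -> bool:
    """Decide whether the angular range between two consecutive rays is lit."""
    if a.kind == "primary" or b.kind == "primary":
        return True                          # rules 1 and 2
    return a.tangent_obj != b.tangent_obj    # rule 3: both hits are secondary


# Case of object 5 in Figure 1: two secondary hits from rays tangent to objects 2 and 3.
print(range_is_lit(Hit("secondary", 2), Hit("secondary", 3)))  # True
# Two secondary hits whose rays are tangent to the same object: shadow.
print(range_is_lit(Hit("secondary", 2), Hit("secondary", 2)))  # False
```

Since the decision depends only on two consecutive hits, the whole fan can be classified in a single pass over the rays sorted by angle.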

Applying this method yields real-time results, shown in Figure 2, in which the areas directly lit by the street lamps match what is expected. In some areas the light from several street lamps is added together, producing realistic images.

4. Conclusions and future work

This work describes a procedure for efficiently illuminating a night-time urban scene, so that users can navigate freely through it on mobile devices and over the Internet. In the future we intend to extend the results to moving lights, such as those of vehicles driving through the city.

5. Acknowledgements

This work has been partially funded by the Junta de Andalucía through Research Project P07-TIC-02773, and by the Ministerio de Economía y Competitividad and the European Union through FEDER funds under Research Project TIN2011-25259.

References

[CDBG∗07] CIGNONI P., DI BENEDETTO M., GANOVELLI F., GOBBETTI E., MARTON F., SCOPIGNO R.: Ray-casted blockmaps for large urban models visualization. Computer Graphics Forum 26, 3 (2007), 405–413.

[ESAW12] EISEMANN E., SCHWARZ M., ASSARSSON U., WIMMER M.: Real Time Shadows. AK Peters/CRC Press, 2012.

[GMO06] GRIMA C., MARQUEZ A., ORTEGA L.: A new 2d tessellation for angle problems: The polar diagram. Computational Geometry. Theory and Applications 34, 2 (2006), 58–74.

[ROOF09] ROBLES-ORTEGA M. D., ORTEGA L., FEITO F.: An exact occlusion culling method for navigation in virtual architectural environments. In Proceedings of the IV Iberoamerican Symposium in Computer Graphics (2009), pp. 23–32.


