Transformative Reality: Augmented reality for visual prostheses

Wen Lik Dennis Lui, Damien Browne, Lindsay Kleeman, Tom Drummond, Wai Ho Li∗

ECSE and Monash Vision Group, Monash University, Australia

ABSTRACT

Visual prostheses such as retinal implants provide bionic vision that is limited in spatial and intensity resolution. This limitation is a fundamental challenge of bionic vision as it severely truncates salient visual information. We propose to address this challenge by performing real time transformations of visual and non-visual sensor data into symbolic representations that are then rendered as low resolution vision; a concept we call Transformative Reality. For example, a depth camera allows the detection of empty ground in cluttered environments that is then visually rendered as bionic vision to enable indoor navigation. Such symbolic representations are similar to virtual content overlays used in Augmented Reality but are registered to the 3D world via the user's sense of touch. Preliminary user trials, where a head mounted display artificially constrains vision to a 25x25 grid of binary dots, suggest that Transformative Reality provides practical and significant improvements over traditional bionic vision in tasks such as indoor navigation, object localisation and people detection.

1 BACKGROUND

In 1968, Brindley and Lewin discovered that electrical stimulation of the visual cortex caused patients to perceive bright dots of light called phosphenes, which occur in predictable locations within the visual field [1]. Subsequently, it was found that phosphenes can be elicited through electrical stimulation of earlier stages in the visual pathway, such as the retina and optic nerve. Visual prostheses such as retinal and cortical implants apply electrical stimulation to the visual pathway using an electrode array to generate a 2D pattern of phosphenes similar to a low resolution dot pattern.

The spatial and intensity resolution of the phosphene patterns produced by a visual prosthesis is constrained by biology, technology and safety. Cell damage during surgery, spatial spreading of electrical charge during neural stimulation and material technology limit electrode density and thereby reduce spatial resolution. Next generation prostheses, such as the cortical implant being developed by the Monash Vision Group, are expected to deliver a pattern of 25x25 phosphenes. From existing reports of bionic vision, it is unclear whether a range of phosphene intensities can be produced reliably and repeatably. Therefore, we assume binary phosphenes that are either on or off. Phosphenes are visualised using an isotropic Gaussian as recommended by the extensive survey of patient reports and simulated prosthetic vision compiled by Chen et al. [2]. From here, we shall refer to this phosphene pattern as bionic vision.
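
A concrete illustration of this visualisation is the C++ sketch below (not the authors' implementation; the grid scale and Gaussian width are assumed parameters), which renders a binary phosphene grid as isotropic Gaussian dots using OpenCV.

#include <opencv2/opencv.hpp>

// Render a binary phosphene grid as isotropic Gaussian dots.
// grid: CV_8U binary pattern (e.g. 25x25); scale: display pixels per phosphene.
cv::Mat renderPhosphenes(const cv::Mat& grid, int scale = 20, double sigma = 4.0)
{
    cv::Mat display = cv::Mat::zeros(grid.rows * scale, grid.cols * scale, CV_32F);
    for (int r = 0; r < grid.rows; ++r)
        for (int c = 0; c < grid.cols; ++c)
            if (grid.at<uchar>(r, c))
                display.at<float>(r * scale + scale / 2, c * scale + scale / 2) = 1.0f;
    // Spread each lit phosphene centre into an isotropic Gaussian dot.
    cv::GaussianBlur(display, display, cv::Size(0, 0), sigma);
    cv::normalize(display, display, 0.0, 1.0, cv::NORM_MINMAX);
    return display;
}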

Traditionally, images from a head-worn camera are converted into low resolution bionic vision using simple image processing. An example is shown in Figure 1a. The camera image on the left is converted to greyscale, down sampled by averaging patches of pixels and then adaptively thresholded into binary, resulting in the bionic vision output on the right. The visual locations of objects on the table have been lost in the process. Low spatial frequency sampling of the textured table cloth has also caused phosphene noise. Such severe truncation of salient information makes it very difficult to discern any meaning from the bionic vision output.
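
A minimal OpenCV sketch of this traditional pipeline is given below; it is not the authors' code, and the 25x25 grid size and adaptive threshold window are illustrative assumptions.

#include <opencv2/opencv.hpp>

// Convert a camera image into a low resolution binary phosphene pattern.
cv::Mat traditionalBionicVision(const cv::Mat& cameraImage, int gridSize = 25)
{
    cv::Mat grey, lowRes, binary;
    // Convert to greyscale.
    cv::cvtColor(cameraImage, grey, cv::COLOR_BGR2GRAY);
    // Down sample by averaging patches of pixels (area interpolation).
    cv::resize(grey, lowRes, cv::Size(gridSize, gridSize), 0, 0, cv::INTER_AREA);
    // Adaptively threshold into binary phosphenes (on/off).
    cv::adaptiveThreshold(lowRes, binary, 255, cv::ADAPTIVE_THRESH_MEAN_C,
                          cv::THRESH_BINARY, 5, 0.0);
    return binary;
}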

∗Email: [email protected]

(a) Traditional Bionic Vision

(b) Transformative Reality - Symbolic rendering of structural edges

Figure 1: Traditional bionic vision compared to Transformative Reality bionic vision, where a depth camera image is transformed into a symbolic visual rendering of structural edges.

2 TRANSFORMATIVE REALITY

We propose the Transformative Reality concept to improve the saliency of visual information provided through low resolution visual displays such as bionic vision. Transformative Reality (TR) works by performing real time transformations of visual and/or non-visual sensor data into multiple user-selectable modes of symbolic representations of the world that are then visually rendered in low resolution. The TR concept is similar to Steve Mann's mediated reality [4] concept with the addition of multimodal sensing and explicit output resolution constraints.

An example TR output for the scene in Figure 1a is shown in Figure 1b. The depth image of the tabletop scene (left) has been transformed into a symbolic rendering of structural edges (right), where phosphenes represent non-planar regions in the 3D scene. Non-planar regions are detected by applying Principal Component Analysis to regions of the depth image corresponding to spatial locations of phosphenes in the bionic vision output. Structural edges include both depth discontinuities and crease edges such as folds in fabric. The inclusion of crease edges was suggested by primate vision experts in the Monash Vision Group. Notice the improvement of TR bionic vision when compared to traditional bionic vision, despite both making use of the same resolution visual output.
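
The per-phosphene planarity test can be sketched as follows, assuming the 3D points behind each phosphene cell are available from the depth image; the eigenvalue threshold is an illustrative assumption.

#include <Eigen/Dense>
#include <vector>

// Returns true if the 3D points in one phosphene cell are non-planar, i.e.
// the smallest principal component of their covariance is not close to zero.
bool isNonPlanar(const std::vector<Eigen::Vector3f>& points, float threshold = 1e-4f)
{
    if (points.size() < 3) return false;  // too little data to decide
    Eigen::Vector3f mean = Eigen::Vector3f::Zero();
    for (const auto& p : points) mean += p;
    mean /= static_cast<float>(points.size());

    Eigen::Matrix3f cov = Eigen::Matrix3f::Zero();
    for (const auto& p : points)
        cov += (p - mean) * (p - mean).transpose();
    cov /= static_cast<float>(points.size());

    // Eigenvalues are returned in increasing order; the smallest one measures
    // the spread of the points away from their best-fit plane.
    Eigen::SelfAdjointEigenSolver<Eigen::Matrix3f> solver(cov);
    return solver.eigenvalues()(0) > threshold;
}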

Figure 2 compares traditional bionic vision against another TR mode, which renders the empty space on the ground as contiguously filled regions of phosphenes. The dark regions represent non-ground areas, which either delineate the edge of the floor or represent potential obstacles. The use of non-visual sensors allows the detection of low contrast obstacles, such as the legs of the computer chair at the bottom left of the scene. This TR mode was designed specifically for the navigation of cluttered indoor environments.


(a) Traditional Bionic Vision

(b) Transformative Reality - Symbolic rendering of empty ground

Figure 2: Traditional bionic vision compared to a TR rendering of empty ground. The red pixels in the bottom left image represent RANSAC inliers, which are rendered as empty ground using TR in the bottom right image.

The empty ground TR mode operates by first generating a ground plane estimate using a depth image and the direction of gravity provided by an accelerometer. The use of an accelerometer allows the algorithm to run in real time by restricting the search space of potential ground planes to those with normals that point upwards against gravity. The ground plane inliers found during RANSAC [3] plane fitting (painted red in the left image of Figure 2b) are considered to be empty ground. Phosphenes are rendered at corresponding locations in the TR bionic vision output.
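
A simplified sketch of gravity-constrained RANSAC plane fitting is shown below; the iteration count, inlier distance and tilt tolerance are assumptions rather than values from the paper.

#include <Eigen/Dense>
#include <vector>
#include <random>
#include <cmath>

struct Plane { Eigen::Vector3f normal; float d; };  // points p satisfy normal.dot(p) + d = 0

// Fit a ground plane to depth points, rejecting candidate planes whose
// normals deviate too far from the accelerometer's "up" direction.
Plane fitGroundPlane(const std::vector<Eigen::Vector3f>& points,
                     const Eigen::Vector3f& up,   // gravity direction from accelerometer
                     int iterations = 200, float inlierDist = 0.03f, float maxTiltDeg = 15.0f)
{
    std::mt19937 rng(0);
    std::uniform_int_distribution<size_t> pick(0, points.size() - 1);
    Plane best{up.normalized(), 0.0f};
    size_t bestInliers = 0;
    const float cosMaxTilt = std::cos(maxTiltDeg * 3.14159265f / 180.0f);

    for (int i = 0; i < iterations; ++i)
    {
        // Hypothesise a plane from three random points.
        const Eigen::Vector3f& a = points[pick(rng)];
        const Eigen::Vector3f& b = points[pick(rng)];
        const Eigen::Vector3f& c = points[pick(rng)];
        Eigen::Vector3f n = (b - a).cross(c - a);
        if (n.norm() < 1e-6f) continue;              // degenerate sample
        n.normalize();
        if (n.dot(up.normalized()) < 0.0f) n = -n;   // orient the normal upwards
        if (n.dot(up.normalized()) < cosMaxTilt) continue;  // not an upward-facing plane

        const float d = -n.dot(a);
        size_t inliers = 0;
        for (const auto& p : points)
            if (std::abs(n.dot(p) + d) < inlierDist) ++inliers;  // count plane inliers

        if (inliers > bestInliers) { bestInliers = inliers; best = {n, d}; }
    }
    return best;  // inliers of this plane are treated as empty ground
}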

Apart from structural edges and empty ground, we have also developed a TR mode for people detection. An example result of the people TR mode is shown in Figure 3. A colour camera is used to detect frontal human faces using the Viola-Jones algorithm [5]. Detected faces are represented using a symbolic avatar. A person's body is found by searching below a detected face for a continuous segment in the depth image, which is then symbolically rendered as a contiguous filled region.
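
A sketch of the face detection stage, using OpenCV's Viola-Jones cascade classifier, is given below; the cascade file name is an assumption, and the downward body search in the depth image is only indicated by a comment.

#include <opencv2/opencv.hpp>
#include <vector>

// Detect frontal faces in a colour image with a Viola-Jones cascade.
std::vector<cv::Rect> detectFrontalFaces(const cv::Mat& colourImage)
{
    static cv::CascadeClassifier cascade("haarcascade_frontalface_default.xml");
    cv::Mat grey;
    cv::cvtColor(colourImage, grey, cv::COLOR_BGR2GRAY);
    cv::equalizeHist(grey, grey);
    std::vector<cv::Rect> faces;
    cascade.detectMultiScale(grey, faces);
    // Each face is rendered as a symbolic avatar; the body would then be
    // grown downwards from the face as a continuous segment in the depth
    // image and rendered as a contiguous filled region.
    return faces;
}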

Figure 3: TR rendering of people. An avatar represents frontal faces and a contiguous filled region represents a human body.

3 PRELIMINARY USER TRIALS AND DISCUSSION

In Section 2, we described three TR modes that transformed sensor data into symbolic visual representations of the world: structural edges, empty ground and people detection. These modes were fully implemented in a functional prototype as described by the system diagram in Figure 4. A Microsoft Kinect is used for sensing and TR algorithms are implemented using C++ on a consumer-level PC. All three TR modes are able to run concurrently in real time with negligible processing delay. The user is able to select different modes in real time including enabling multiple modes, which are automatically blended by our system.
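
The paper does not specify the blending rule; one straightforward possibility, combining the binary phosphene grids of all enabled modes with a per-phosphene OR, is sketched below.

#include <opencv2/opencv.hpp>
#include <vector>

// Combine the binary phosphene grids produced by the enabled TR modes.
cv::Mat blendModes(const std::vector<cv::Mat>& modeOutputs)
{
    CV_Assert(!modeOutputs.empty());
    cv::Mat blended = cv::Mat::zeros(modeOutputs[0].size(), CV_8U);
    for (const cv::Mat& grid : modeOutputs)
        cv::bitwise_or(blended, grid, blended);  // phosphene on if any mode lit it
    return blended;
}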

Figure 4: Transformative Reality system used in HMD user trials

Preliminary user trials were conducted using the equipment shown in Figure 5. A bare Kinect is mounted in front of an NVIS SX-60 head mounted display (HMD) to enable an immersive visual experience of bionic vision while allowing tactile interactions with the environment, including bipedal navigation of indoor environments and manipulation of objects on tabletops.

Figure 5: Head mounted display with Kinect used in user trials.

User trial results suggest that TR provides practical and significant improvements over traditional bionic vision for indoor navigation, object localisation and people detection. There appears to be a learning effect where user performance improves quickly over the first 10 to 15 minutes. Future work includes new TR modes such as gesture recognition as well as improved visualisations of bionic vision based directly on models of cortical stimulation. Psychophysics trials conducted in collaboration with medical researchers and Vision Australia are planned for the near future.

Note: The attached video is highly recommended as it contains moving images that give a better sense of the TR user experience.

ACKNOWLEDGEMENTS

The authors thank Ben Yong for his help with the HMD. This work was supported by the Australian Research Council Special Research Initiative in Bionic Vision and Sciences (SRI 1000006).

REFERENCES

[1] G. S. Brindley and W. S. Lewin. The sensations produced by electrical stimulation of the visual cortex. The Journal of Physiology, 196:479-493, 1968.

[2] S. C. Chen, G. J. Suaning, J. W. Morley, and N. H. Lovell. Simulating prosthetic vision: I. Visual models of phosphenes. Vision Research, 49(12):1493-1506, June 2009.

[3] M. A. Fischler and R. C. Bolles. RANSAC: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381-395, June 1981.

[4] S. Mann. Mediated Reality. Technical Report 1, M.I.T. Media Lab, Cambridge, Massachusetts, Mar. 1994.

[5] P. Viola and M. J. Jones. Robust Real-Time Face Detection. International Journal of Computer Vision, 57(2):137-154, May 2004.
