
The Virtues of Virtual Reality
Artur Filipowicz and Nayan Bhat
Princeton University
May 18th, 2017

Uses for Virtual Reality in SmartDrivingCars

Train

- Develop new algorithms and software

Test

- Individual components and whole vehicles

Improve

- Recreate crashes and fix points of failure

Virtual Reality and Deep Learning

Learning with respect to a task

Optimization with respect to an objective function

(X,Y)

- X may not capture the full spectrum of conditions the system will encounter
- Y may not be easily accessible
- Y may not contain enough information
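One common way to write the setup above is as empirical risk minimization. This is a sketch under assumptions: the model f_θ, the per-example loss ℓ, and the sample count N are symbols introduced here for illustration and do not appear on the slide.

```latex
\theta^{*} = \arg\min_{\theta} \; \frac{1}{N} \sum_{i=1}^{N} \ell\big(f_{\theta}(x_i),\, y_i\big),
\qquad (x_i, y_i) \in (X, Y)
```

Read this way, the three concerns map onto the formula directly: if X misses conditions, the sum never sees them; if Y is hard to obtain or uninformative, the loss term cannot be evaluated or carries little signal.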

Benefits and Drawbacks of Vision

Benefits

- Robust to weather conditions [4]
- Provides appearance information (color)
- Inexpensive

Drawback

[2]

Virtual Environments

● Can generate Big Data
  ○ Variations in weather and lighting conditions
  ○ Variations in stop sign appearance
  ○ Variations in landscape (urban, suburban, rural)
● Can generate Smart Data
  ○ Precise and correct labels
  ○ Variety of labels
● Event and scene reconstruction
  ○ Tesla and Uber crashes

Simulator

Which set would you use for humans?

Training Set A    Training Set B

Images from [3].

Environment - Grand Theft Auto 5*

- Dynamic and “realistic” source of data
- Highways, rural roads, intersections, ramps
- 250 models of vehicles, pedestrians, bicyclists, motorcyclists
- Traffic lights, street signs
- Time and weather control

* Grand Theft Auto 5 is property of Rockstar Games, Inc. Used in this work for research and educational purposes.

Image from [3].

Images from [3]

GTA and Real World Comparison

A stop sign at 10 meters in GTA (left) and Princeton NJ (right). Image from [3].

The Problem

● Detect the presence of a stop sign and determine the distance to it based on an image.

Our Solution

● Collect images from a virtual environment

● Collect ground truth labels along with the images

● Train a deep convolutional neural network to predict the labels

Input Image [3]

Output Prediction
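The slides do not say which loss the network is trained with. As a minimal sketch, assuming the output is a sign-presence probability plus a distance in meters, one plausible objective combines binary cross-entropy on presence with a squared distance error that only counts when a sign is actually present. The function name, argument names, and the weight `dist_weight` are hypothetical.

```python
import math

def stop_sign_loss(pred_prob, pred_dist_m, has_sign, true_dist_m, dist_weight=0.01):
    """Hypothetical per-image loss: presence cross-entropy + masked distance error."""
    eps = 1e-7
    # Clamp the probability so log() never receives 0.
    p = min(max(pred_prob, eps), 1.0 - eps)
    # Binary cross-entropy on "is there a stop sign in the image?"
    presence = -(math.log(p) if has_sign else math.log(1.0 - p))
    # Distance is only meaningful when a sign exists, so mask it otherwise.
    distance = (pred_dist_m - true_dist_m) ** 2 if has_sign else 0.0
    return presence + dist_weight * distance
```

Masking the distance term keeps frames without a stop sign from pushing the distance output toward an arbitrary value.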

Datasets Used for This Research

Synthetic Data From GTA 5 [3]

● Camera: 800x600 pixels, focal length 8.79 mm
● 24 hours of the day, rain, snow, thunder, sunny, clear
● Training Set - 494,483 images (18 locations)
● Testing Set - 193,285 images (6 locations)

Real Data From MIT 2007 DARPA Challenge [2]

● Camera: 720x576 pixels, focal length 2.879 mm
● Sunny afternoon
● Training Set - 3,472 images (11 locations)
● Testing Set - 5,064 images (11 locations)
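To illustrate how the synthetic capture conditions listed above could be enumerated, here is a small sketch. It assumes a full grid over hour, weather, and training location, which the slides do not confirm; the location names are placeholders.

```python
from itertools import product

HOURS = range(24)                                          # 24 hours of the day
WEATHER = ["rain", "snow", "thunder", "sunny", "clear"]    # conditions named on the slide
LOCATIONS = [f"location_{i:02d}" for i in range(18)]       # hypothetical names, 18 training locations

def capture_conditions():
    """Yield one capture setting per (location, hour, weather) combination."""
    for location, hour, weather in product(LOCATIONS, HOURS, WEATHER):
        yield {"location": location, "hour": hour, "weather": weather}

conditions = list(capture_conditions())
```

Under this full-grid assumption there are 18 × 24 × 5 = 2,160 distinct settings, so each would contribute a few hundred of the 494,483 training images on average.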

Training and Testing on Real Data

● Performance
  ○ 96% detection accuracy within 10 m of the stop sign
  ○ Less than 50% accuracy beyond 10 m
  ○ 40% false positive rate
● Adding synthetic data to training:
  ○ 94% detection accuracy within 10 m of the stop sign
  ○ Less than 75% accuracy beyond 10 m
  ○ 2% false positive rate
● There is a faded stop sign in the test set
  ○ Using our training data, the CNN cannot detect it
  ○ This stop sign is removed from the test set for the following results
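The slides do not define how these figures are computed. A minimal sketch, assuming detection accuracy is the fraction of sign-containing frames in which the sign is detected (split at 10 m) and the false positive rate is the fraction of sign-free frames flagged as containing a sign:

```python
def detection_metrics(records, near_limit_m=10.0):
    """records: iterable of (has_sign, true_dist_m, predicted_sign) tuples."""
    near_hits = near_total = far_hits = far_total = 0
    false_pos = negatives = 0
    for has_sign, dist_m, predicted in records:
        if not has_sign:
            negatives += 1
            false_pos += int(predicted)      # predicted a sign where none exists
        elif dist_m <= near_limit_m:
            near_total += 1
            near_hits += int(predicted)
        else:
            far_total += 1
            far_hits += int(predicted)

    def ratio(hits, total):
        return hits / total if total else float("nan")

    return {
        "near_accuracy": ratio(near_hits, near_total),
        "far_accuracy": ratio(far_hits, far_total),
        "false_positive_rate": ratio(false_pos, negatives),
    }

# Toy records, made up purely to exercise the function.
toy = [
    (True, 5.0, True), (True, 8.0, True),
    (True, 20.0, False), (True, 30.0, True),
    (False, None, True), (False, None, False),
    (False, None, False), (False, None, False),
]
metrics = detection_metrics(toy)
```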

Training and Testing on Synthetic Data

False Positive Rate: 4%


Direct Application to Real Data

False Positive Rate: <1%

Direct Application to Real Data

http://www.youtube.com/watch?v=YigXg5luM-U

Domain Adaptation

● Domain adaptation [7]: methods for making the training and testing domains more alike
● Methods: experiment, fine tuning [8], or learn a transformation [9, 10]

Image from [3]. Image from [2].
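Fine tuning can be illustrated with a deliberately tiny stand-in. The sketch below is not the authors' CNN pipeline: it fits a one-variable linear model on made-up "synthetic" data, then continues training on a handful of made-up "real" samples whose labels are shifted, using a smaller learning rate and far fewer steps. All data values and hyperparameters are assumptions.

```python
def mse(w, b, data):
    """Mean squared error of the line y = w*x + b on (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def gradient_descent(w, b, data, lr, steps):
    """Full-batch gradient descent on the mean squared error."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b

# Toy source domain: plentiful samples from y = 2x + 1.
synthetic = [(k / 10, 2 * (k / 10) + 1.0) for k in range(11)]
# Toy target domain: only three samples, from y = 2x + 1.5.
real = [(0.0, 1.5), (0.5, 2.5), (1.0, 3.5)]

# Step 1: train from scratch on the synthetic domain.
w, b = gradient_descent(0.0, 0.0, synthetic, lr=0.1, steps=2000)
error_before = mse(w, b, real)

# Step 2: fine tune on the small real set, starting from the synthetic weights.
w_ft, b_ft = gradient_descent(w, b, real, lr=0.05, steps=50)
error_after = mse(w_ft, b_ft, real)
```

Starting from the synthetic-domain weights means the short second phase only has to correct the domain shift rather than learn the task from three samples.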

Performance On Real Data After Fine Tuning

False Positive Rate: 5%

Performance On Real Data After Fine Tuning

http://www.youtube.com/watch?v=LcGXCko3knk

Effect of Fine Tuning on Performance On Real Data


Vehicle Following Distance

● Collision Avoidance

● Key Affordance in Autonomous Driving Systems

Paper Highlights

● Virtual Driving

○ Data Generation in GTAV

○ Transfer Learning / Domain Adaptation

● Neural Network-based Regression

○ Directly estimating distance rather than classifying distance buckets

● 3D Convolutional Neural Network

○ Compare performance to 2D model
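One reason to regress distance directly instead of classifying distance buckets can be shown with a short sketch. It assumes a hypothetical 10 m bucket width and a classifier that always picks the correct bucket and reports its midpoint; neither detail comes from the slides.

```python
def bucket_estimate(distance_m, bucket_width_m=10.0):
    """Midpoint of the bucket containing distance_m (a perfect bucket classifier)."""
    index = int(distance_m // bucket_width_m)
    return (index + 0.5) * bucket_width_m

def mean_abs_error(estimates, truths):
    return sum(abs(e - t) for e, t in zip(estimates, truths)) / len(truths)

# Toy ground-truth following distances in meters (made up).
true_distances = [2.0, 12.0, 27.0, 33.0]
bucketed = [bucket_estimate(d) for d in true_distances]
quantization_error = mean_abs_error(bucketed, true_distances)
```

Even with perfect classification the bucketed estimate can be off by up to half a bucket width, whereas a regression output has no such built-in floor.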

Data Set

● 1.3M images labeled across 13 dimensions

● 1,660 distinct driving sequences

● Split: 980K Training, 150K Validation, 180K Testing

Preservation of Time Dependencies

Images from [19], [4].
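A 3D convolutional network consumes short stacks of consecutive frames, so time ordering has to survive data preparation. As a sketch of one way to do that, assuming frames are already sorted by driving sequence and then by time, the helper below builds fixed-length clips that never cross a sequence boundary. The function name and the clip length are hypothetical.

```python
def make_clips(frames, clip_len=3):
    """frames: list of (sequence_id, frame_index), sorted by sequence then time.

    Returns every window of clip_len consecutive frames from a single sequence.
    """
    clips = []
    for start in range(len(frames) - clip_len + 1):
        window = frames[start:start + clip_len]
        # Keep the window only if all frames share one driving sequence.
        if all(seq_id == window[0][0] for seq_id, _ in window):
            clips.append(window)
    return clips

# Toy input: two short sequences, "a" with four frames and "b" with three.
frames = [("a", 0), ("a", 1), ("a", 2), ("a", 3), ("b", 0), ("b", 1), ("b", 2)]
clips = make_clips(frames)
```

Shuffling whole clips rather than individual frames keeps the within-clip order intact while still randomizing the training order.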

Prediction Error Distribution

Statistics from [4]

Testing Video

https://docs.google.com/file/d/0BwA1Oys6n-iVYzlqZUp2R1huZTA/preview

High Percentage Error Coordinates

Transfer Learning

● Models: 10 motorbikes, 1 golf cart, 3 bicycles

● 100K Images

Real World Driving

https://docs.google.com/file/d/0BwA1Oys6n-iVSmpLeVFkRjV0dkk/preview

Future Research

● Stop Object Recognition and Localization
  ○ Meet the performance goals across all of the distances
  ○ Recognize non-ideal stop signs, such as faded or graffiti-covered ones
  ○ Test in various real-world conditions
  ○ Expand the range of stop objects: red lights, pedestrians, police officers
● Simulators
  ○ Efficiently parameterize scene generation
  ○ Determine minimum level-of-fidelity and level-of-detail [6]
  ○ Develop a label-free domain adaptation technique
● Convolutional Neural Networks
  ○ Architecture selection
  ○ Solution to catastrophic forgetting [5]

Future Research

● Improve distance estimation
  ○ Use a real-world sensor (lidar / radar) to quantitatively test domain adaptation
  ○ Train separate CNNs for low distance and open road detection
● Development of real-time, online driving model within GTAV
  ○ Repeat process for each affordance in Direct Perception [4]
  ○ Study impact of model structure on stability

References

[1] FHWA. Manual on uniform traffic control devices, 2016.

[2] Albert S Huang, Matthew Antone, Edwin Olson, Luke Fletcher, David Moore, Seth Teller, and John Leonard. A high-rate, heterogeneous data set from the DARPA Urban Challenge. The International Journal of Robotics Research, 29(13):1595–1601, 2010.

[3] Grand Theft Auto 5 is property of Rockstar Games, Inc.

[4] Chenyi Chen. Extracting Cognition out of Images for the Purpose of Autonomous Driving. PhD thesis, Princeton University, 2016.

[5] Robert M French. Catastrophic forgetting in connectionist networks. Trends in cognitive sciences, 3(4):128–135, 1999.

[6] VSR Veeravasarapu, Constantin Rothkopf, and Visvanathan Ramesh. Model-driven simulations for deep convolutional neural networks. arXiv preprint arXiv:1605.09582, 2016.

[7] David Vazquez, Antonio M Lopez, Javier Marin, Daniel Ponsa, and David Geronimo. Virtual and real world adaptation for pedestrian detection. IEEE transactions on pattern analysis and machine intelligence, 36(4):797–809, 2014.

[8] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 580–587, 2014.


[9] Kate Saenko, Brian Kulis, Mario Fritz, and Trevor Darrell. Adapting visual category models to new domains. In European conference on computer vision, pages 213–226. Springer, 2010.

[10] Ashish Shrivastava, Tomas Pfister, Oncel Tuzel, Josh Susskind, Wenda Wang, and Russ Webb. Learning from simulated and unsupervised images through adversarial training. arXiv preprint arXiv:1612.07828, 2016.

[11] Ravi Shanker, Adam Jonas, Scott Devitt, Katy Huberty, Simon Flannery, William Greene, Benjamin Swinburne, Gregory Locraft, Adam Wood, Keith Weiss, et al. Autonomous cars: Self-driving the new auto industry paradigm. Morgan Stanley Blue Paper, November, 2013.

[12] National Highway Traffic Safety Administration et al. 2015 motor vehicle crashes: overview. Traffic safety facts research note, 2016:1–9, 2016.

[13] Santokh Singh. Critical reasons for crashes investigated in the national motor vehicle crash causation survey. Technical report, 2015.

[14] Statistics Department National Safety Council. NSC motor vehicle fatality estimates. http://www.nsc.org/NewsDocuments/2017/12-month-estimates.pdf, 2017. Accessed: 2017-2-26.

[15] Andrew Bacha, Cheryl Bauman, Ruel Faruque, Michael Fleming, Chris Terwelp, Charles Reinholtz, Dennis Hong, Al Wicks, Thomas Alberi, David Anderson, et al. Odin: Team VictorTango’s entry in the DARPA Urban Challenge. Journal of Field Robotics, 25(8):467–492, 2008.

[16] David Schrank, Bill Eisele, and Tim Lomax. TTI’s 2012 urban mobility report. Technical report, 2012.

[17] James J Flink. The Automobile Age. MIT Press, 1990.

[18] Kareem Habib. ODI resume the automatic emergency braking. Technical report, 2017.

[19] Grzegorz Gwardys. Convolutional Neural Networks backpropagation: from intuition to derivation, 2016.