Detection and Segmentation of Road Images with Deep Learning
GTC Europe, October 2017, Talk #23304
Frank Geujen – Senior Product Manager
William Raveane – Computer Vision Engineer
Mapscape, a NavInfo company
CONTENTS
Who is NavInfo
SD & HD Map Making
Road Feature Extraction
Traffic Sign Detection
Looking Ahead
NavInfo is the leading map provider in China, focusing on location big data platforms, HD/SD maps, and comprehensive telematics and ADAS solutions.
• Established in 2002 in Beijing, China
• More than 4,500 employees globally
NavInfo Introduction
Automated Driving
Connected Car Navigation
Branches
• International business expansion
• Advanced technology research
• America
• Netherlands
• Mapscape: compilation technology (NDS)
• EU Technology Centre: computer vision, deep learning
• China: 31 localization bases for data collection and technology service; 4 R&D centers (Shanghai, Xian, Shenyang, Wuhan); Beijing headquarters
Challenge of Map Making
Ingestion / Extraction
Data sources:
• 100+ collection vehicles
• 600+ dispersed field staff in China
• 360+ cities
• Big data mining: 99% highway, 80% main road
• 22M+ community contributions
• 3.2+ million signs processed per year
• 20+ million POIs updated per year
• 4+ million road distance updated per year
Map Creation
• >500 production staff
• 4,000+ pages of specifications
Delivery
• 6.16+ million kilometers
• 24.95+ million core POIs
• 260+ attributes in the portfolio
• 60+ cities of ground truth testing in 2017 Q1
• 80%/70%: 80% of POIs and 70% of road links in China updated per year
• 31 local field offices
• Map data: Hong Kong, Laos, Macao, Cambodia
SD Map
POI
Standard Definition Map: primarily used for A-to-B routing and guidance; a simplified representation of the road as links and nodes.
HD Map
High Definition Map: used for automated/autonomous driving; includes highly accurate lane and road features.
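To make the "links and nodes" idea concrete, here is a minimal sketch of A-to-B routing over such a graph. The `Link` record, its fields and the use of plain Dijkstra on link length are illustrative assumptions, not NavInfo's or NDS's actual data model; production SD maps also carry turn restrictions, speed profiles and many other attributes that are ignored here.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """A directed road link between two nodes (hypothetical minimal schema)."""
    start: str
    end: str
    length_m: float


def shortest_route(links, origin, destination):
    """Return (total_length_m, [node, ...]) for the shortest A-to-B route.

    Plain Dijkstra over the link/node graph; returns (inf, []) if unreachable.
    """
    graph = {}
    for link in links:
        graph.setdefault(link.start, []).append((link.end, link.length_m))

    queue = [(0.0, origin, [origin])]
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == destination:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, length in graph.get(node, []):
            if neighbour not in visited:
                heapq.heappush(queue, (dist + length, neighbour, path + [neighbour]))
    return float("inf"), []


links = [
    Link("A", "B", 400.0),
    Link("B", "C", 300.0),
    Link("A", "C", 900.0),
    Link("C", "D", 200.0),
]
print(shortest_route(links, "A", "D"))  # (900.0, ['A', 'B', 'C', 'D'])
```

The direct link A to C looks shorter in hop count, but the route through B wins on total length, which is exactly what link lengths on an SD map are for.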
Field drive use cases
• Live-feed mono camera → Real-time sign extraction → (Semi-)automated core map updating → On-board SD map
• 2 FPS stored mono camera imagery → Offline feature extraction → (Semi-)automated core map updating → Off-board SD map
• Panoramic imagery → Road feature extraction → Projection onto LIDAR data + Feature extraction from LIDAR → HD map creation/updating → Off-board HD map
Real-Time Traffic Sign Detection
Use case: In-car data collection on an NVIDIA Jetson TX2
• Over 180 traffic sign classes supported today
• Up to 32 fps at 1920x1080 in 15W MAX-N mode
• Detection based on Single Shot Detector (SSD)
• Training on a Titan X GPU server
• Inference through TensorRT
Supported Features
Supported today• Speed Limits• Warning Signs• Information Signs• Prohibition Signs• Directional Signs
In development• Gantry Sign Boards• Traffic Lights• Digital Traffic Signs
Optimization of SSD on Jetson TX2
• With TensorRT:
  • 6x speedup in inference performance
  • 3x reduction in memory consumption
• Additionally, with our in-house CUDA kernels:
  • A further 3x speedup in inference performance
  • Full utilization of GPU resources
Implementation of SSD
Two-stage system:
  • ResNet-based SSD for detection
  • ResNet for fine classification
Custom Layer API:
  • Bridges both TensorRT stages
Detector: SSD
Classifier: ResNet
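The two-stage design can be sketched as follows: the detector proposes boxes, each box is cropped, and the crops are fed to the classifier in batches. The `detect_and_classify` helper, the box format and the toy detector/classifier stubs are hypothetical stand-ins for the TensorRT engines on the slide; only the control flow is meant to be illustrative.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels (assumed format)
Image = List[List[int]]          # toy grayscale image: rows of pixel values


def crop(image: Image, box: Box) -> Image:
    """Cut the box region out of the image."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def batched(items: Sequence[Image], batch_size: int) -> Iterator[Sequence[Image]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def detect_and_classify(
    image: Image,
    detector: Callable[[Image], List[Box]],
    classifier: Callable[[Sequence[Image]], List[str]],
    batch_size: int = 8,
) -> List[Tuple[Box, str]]:
    boxes = detector(image)                   # stage 1: where are the signs?
    crops = [crop(image, box) for box in boxes]
    labels: List[str] = []
    for batch in batched(crops, batch_size):  # stage 2: which sign is it?
        labels.extend(classifier(batch))
    return list(zip(boxes, labels))


# Toy usage: a bright square in the top-left corner of an 8x8 image.
image = [[9 if (r < 4 and c < 4) else 0 for c in range(8)] for r in range(8)]
batch_sizes: List[int] = []  # records how the classifier was called


def toy_detector(img: Image) -> List[Box]:
    return [(0, 0, 4, 4), (4, 4, 8, 8), (0, 4, 4, 8)]


def toy_classifier(batch: Sequence[Image]) -> List[str]:
    batch_sizes.append(len(batch))
    labels = []
    for patch in batch:
        pixels = [p for row in patch for p in row]
        labels.append("speed_limit" if sum(pixels) / len(pixels) > 5 else "warning")
    return labels


print(detect_and_classify(image, toy_detector, toy_classifier, batch_size=2))
```

Batching the crops keeps the classifier busy with a few larger calls instead of one call per detected sign, which is usually the cheaper option on a GPU.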
SSD Custom Layers
• Implementation of SSD layers as custom CUDA kernels:
  • Executed by the Custom Layer API
  • Priors replaced by on-demand calculations
  • Softmax calculated only when required
  • Non-maximum suppression replaced by a batched data feeder for the classifier
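One plausible reading of the first two optimizations is sketched below, in Python rather than CUDA for readability: each default box is recomputed from its flat index using the standard SSD centre/scale/aspect-ratio formula instead of being read from a stored prior table, and the softmax is evaluated only for anchors whose highest logit is not the background class. The talk does not detail its kernels; the index layout, the function names and background index 0 are assumptions. In a CUDA kernel each thread would typically handle one index.

```python
import math
from typing import Optional, Sequence, Tuple


def prior_box(index: int, fmap_size: int, scale: float,
              aspect_ratios: Sequence[float]) -> Tuple[float, float, float, float]:
    """Recompute one SSD default box (cx, cy, w, h), normalised to [0, 1],
    from its flat index instead of looking it up in a stored prior table.
    Assumed layout: index = ((row * fmap_size) + col) * len(aspect_ratios) + k."""
    cell, k = divmod(index, len(aspect_ratios))
    row, col = divmod(cell, fmap_size)
    ar = aspect_ratios[k]
    cx = (col + 0.5) / fmap_size
    cy = (row + 0.5) / fmap_size
    return cx, cy, scale * math.sqrt(ar), scale / math.sqrt(ar)


def lazy_class_score(logits: Sequence[float],
                     background: int = 0) -> Optional[Tuple[int, float]]:
    """Softmax only when required: the argmax of the logits equals the argmax
    of the softmax, so anchors won by background are dropped before any exp()."""
    best = max(range(len(logits)), key=lambda c: logits[c])
    if best == background:
        return None
    m = logits[best]                              # subtract max for numerical stability
    denom = sum(math.exp(x - m) for x in logits)
    return best, 1.0 / denom                      # = exp(l_best - m) / denom


print(prior_box(3, fmap_size=4, scale=0.2, aspect_ratios=[1.0, 2.0]))
print(lazy_class_score([2.0, 0.5, 0.1]))  # None: background wins, no softmax computed
print(lazy_class_score([0.0, 3.0, 0.0]))
```

Since most anchors in a road scene are background, skipping their softmax removes the bulk of the exponentials.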
SSD on the Jetson TX2
Profile visualization of SSD inference:
• SSD in Caffe: 510 ms
• TensorRT + our CUDA kernels: 31 ms
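The slide figures can be cross-checked with simple arithmetic. Speedups that stack multiply, so 6x from TensorRT followed by a further 3x gives 18x in total; the profiled drop from 510 ms to 31 ms is about 16.5x, the same order of magnitude (the two figures may not share an identical baseline or scope). A 31 ms frame time also corresponds to roughly 32 frames per second, in line with the "up to 32 fps" quoted for the Jetson TX2.

```python
baseline_ms = 510.0    # SSD in Caffe (slide figure)
optimized_ms = 31.0    # TensorRT + custom CUDA kernels (slide figure)

measured_speedup = baseline_ms / optimized_ms
compounded_speedup = 6 * 3          # TensorRT 6x, then a further 3x from CUDA kernels
fps = 1000.0 / optimized_ms         # frames per second at this latency

print(f"{measured_speedup:.1f}x measured, {compounded_speedup}x compounded, {fps:.1f} fps")
# 16.5x measured, 18x compounded, 32.3 fps
```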
Detection Accuracy
• Single image:
  • Precision: 92.5%
  • Recall: 98%
• Tracking over time:
  • Precision: 96.0%
  • Recall: 98.5%
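Precision is TP / (TP + FP) and recall is TP / (TP + FN), where a detection counts as a true positive only if it matches a not-yet-matched ground-truth sign. The talk does not state its matching rule; the sketch below assumes greedy matching at an IoU threshold of 0.5 (the PASCAL VOC convention), with detections already sorted by descending confidence.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(detections: Sequence[Box], ground_truth: Sequence[Box],
                     iou_threshold: float = 0.5) -> Tuple[float, float]:
    """Greedy matching: each detection (highest confidence first) claims the
    unmatched ground-truth box it overlaps most, if IoU >= iou_threshold."""
    matched = set()
    tp = 0
    for det in detections:
        best_j, best_iou = None, iou_threshold
        for j, gt in enumerate(ground_truth):
            if j in matched:
                continue
            overlap = iou(det, gt)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    fp = len(detections) - tp      # detections with no matching sign
    fn = len(ground_truth) - tp    # signs that were missed
    precision = tp / (tp + fp) if detections else 0.0
    recall = tp / (tp + fn) if ground_truth else 0.0
    return precision, recall


gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(1, 1, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]  # last one is a false positive
print(precision_recall(dets, gt))
```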
[Figures: Single Image Per-Class Detection Accuracy and Single Image Per-Class Classification Accuracy (accuracy from 0 to 100 vs. class ID); Single Image Detection PR Curve]
Road Feature Extraction
• Road feature and object extraction
• Semantic segmentation network architecture
• Automatic lane grouping
• Training & inference on an NVIDIA Titan X GPU server
[Figure: Lane numbering]
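The talk does not describe how automatic lane grouping works. As a hypothetical illustration only: once lane-marking pixels are in a top view, their lateral positions can be clustered into marking lines with a simple gap rule, and lanes numbered left to right as the spaces between neighbouring markings. The `max_gap` value, the metre units and both function names are assumptions.

```python
from typing import Dict, List, Sequence, Tuple


def group_markings(xs: Sequence[float], max_gap: float = 0.5) -> List[float]:
    """Cluster lateral positions (metres, top view) of marking pixels into lines.

    Sorted positions within max_gap of the previous one join the same
    cluster; each cluster is summarised by its mean position.
    """
    if not xs:
        return []
    ordered = sorted(xs)
    clusters = [[ordered[0]]]
    for x in ordered[1:]:
        if x - clusters[-1][-1] <= max_gap:
            clusters[-1].append(x)
        else:
            clusters.append([x])
    return [sum(c) / len(c) for c in clusters]


def number_lanes(marking_centres: Sequence[float]) -> Dict[int, Tuple[float, float]]:
    """Lane n (1-based, left to right) lies between markings n and n + 1."""
    centres = list(marking_centres)
    return {
        n: (left, right)
        for n, (left, right) in enumerate(zip(centres, centres[1:]), start=1)
    }


# Toy input: three painted lines roughly 3.5 m apart.
pixels_x = [0.0, 0.1, 3.5, 3.6, 7.0, 7.1]
lanes = number_lanes(group_markings(pixels_x))
print(sorted(lanes))  # [1, 2]
```

A real system would also have to handle dashed lines, curves and merging lanes, which a one-dimensional gap rule does not.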
[Figure: Road features; gantry sign boards]
Supported today:
• Surface level:
  • Lane markings
  • Text, numbers, speed limits
  • Arrows
• Road objects:
  • Gantry sign boards
  • Guard rails
  • Curbs
In development:
• Poles
• Traffic lights
• Tunnels
Road Features
The on-road feature extraction process:
• Crop from panoramic image
• Camera calibration transformation to top view
• Segmentation network (semantic segmentation deep neural network)
• Transformation to front view
• Lane number grouping
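The top-view and front-view steps are commonly implemented as a planar homography (inverse perspective mapping), which assumes a flat road surface. The sketch below maps points with a 3x3 matrix and back with its inverse; the matrix values are made up, whereas in practice the matrix would be derived from the camera calibration and whole images would be warped with a routine such as OpenCV's `cv2.warpPerspective`. Whether the talk's pipeline uses exactly this formulation is an assumption.

```python
import math
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]
Point = Tuple[float, float]


def apply_homography(H: Matrix, points: Sequence[Point]) -> List[Point]:
    """Map 2-D points through H using homogeneous coordinates."""
    out = []
    for x, y in points:
        u = H[0][0] * x + H[0][1] * y + H[0][2]
        v = H[1][0] * x + H[1][1] * y + H[1][2]
        w = H[2][0] * x + H[2][1] * y + H[2][2]
        out.append((u / w, v / w))
    return out


def invert_3x3(m: Matrix) -> List[List[float]]:
    """Inverse via the adjugate; used to go from top view back to front view."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("homography is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


# Made-up front-view -> top-view homography (illustrative values only).
H_top = [[1.0, 0.2, 5.0], [0.0, 1.5, 2.0], [0.0, 0.001, 1.0]]
H_front = invert_3x3(H_top)

front_pts = [(100.0, 200.0), (640.0, 360.0)]
top_pts = apply_homography(H_top, front_pts)    # segmentation would run in this view
back_pts = apply_homography(H_front, top_pts)   # results projected back to the camera view
print(all(math.isclose(p, q, abs_tol=1e-9)
          for orig, back in zip(front_pts, back_pts)
          for p, q in zip(orig, back)))
```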
Semantic Segmentation Performance
• Inference at 5 images per second on an NVIDIA Titan X GPU
• Common lane marking classes:
  • Recall: 92.8%
  • Precision: 82%
• Common road arrow marking classes:
  • Recall: 85.6%
  • Precision: 72.8%
[Figure: Confusion matrix]
Performance of the system is expected to improve further as development continues.
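Per-class recall and precision can be read directly off a confusion matrix. This sketch assumes rows are ground-truth classes and columns are predicted classes, with pixel counts as entries; how the talk aggregates classes into "common lane marking" and "common road arrow marking" groups is not stated, so the example shows only the per-class step, on made-up counts and hypothetical class labels.

```python
from typing import Dict, Sequence, Tuple


def per_class_metrics(confusion: Sequence[Sequence[int]]) -> Dict[int, Tuple[float, float]]:
    """Return {class: (recall, precision)} from a square confusion matrix,
    where confusion[t][p] counts pixels of true class t predicted as class p."""
    n = len(confusion)
    metrics = {}
    for k in range(n):
        tp = confusion[k][k]
        actual = sum(confusion[k])                          # row sum: all true-k pixels
        predicted = sum(confusion[t][k] for t in range(n))  # column sum: all predicted-k pixels
        recall = tp / actual if actual else 0.0
        precision = tp / predicted if predicted else 0.0
        metrics[k] = (recall, precision)
    return metrics


# Made-up counts for 3 hypothetical classes: 0 = background, 1 = lane marking, 2 = arrow.
confusion = [
    [50, 2, 3],
    [4, 40, 6],
    [1, 9, 30],
]
for cls, (rec, prec) in per_class_metrics(confusion).items():
    print(f"class {cls}: recall {rec:.3f}, precision {prec:.3f}")
```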
Looking Ahead
• Deep learning is becoming increasingly integrated into our:
  • Field collection
  • Map creation
  • Distribution processes
• Ongoing developments:
  • Real-time semantic segmentation system on board vehicles
  • Crowdsourced data processing supporting self-healing maps
  • Applications for crowdsourcing