Damage Detection from Aerial Images
via Convolutional Neural Networks
*National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
†Nagoya University, Aichi, Japan
5/8/2017 MVA2017@Nagoya University 1
Aito Fujita*
Riho Ito*
Ken Sakurada† Tomoyuki Imaizumi*
Shuhei Hikosaka* Ryosuke Nakamura*
Aerial images for damage detection
In the event of a catastrophic disaster, fast assessment of the extent of damage is crucial for recovery.
E.g., "Which buildings were washed away by the tsunami? And how many?"
Satellite/aerial images as a source for damage assessment:
◦ Their wide coverage can facilitate the assessment.
◦ However, the assessment is currently performed manually (by human eye).
Aerial view of Tohoku quake
Pre-quake image / Post-quake image
(2 km / 4 km)
Damage assessment in practice
Pre-quake image / Post-quake image
This building is "washed-away".
This building is "surviving".
Repeated for all buildings.
Manual assessment
RED: washed-away bldg.
YELLOW: surviving bldg.
Manual assessment takes too much time.
Can we automate such an assessment process?
Background and Contributions
Background of damage detection:
◦ Lack of labeled datasets
◦ No prior study of deep learning applied to satellite/aerial images for this task
Related works in this context:
◦ Gueguen+(CVPR2015): hand-designed features (tree-of-shapes); satellite images
◦ Cooner+(Remote Sensing 2016): shallow neural networks; aerial images
◦ Nia+(CRV2017): Convolutional networks but ground-level images
Our contributions:
◦ A new labeled dataset (ABCD dataset)
◦ Comprehensive analyses of CNNs for washed-away building detection
Image coverage
New dataset for washed-away building detection
AIST Building Change Detection (ABCD) dataset:
◦ Based on images taken before/after the Tohoku earthquake
◦ Over 10K pairs of pre/post-tsunami patches
◦ Collected from 66km2 of aerial images of the Tohoku coastal region
◦ A target building at the center of the patch
◦ A damage label assigned to each pair
Washed-away buildings / Surviving buildings
◦ The damage labels are based on the survey of the Great East Japan earthquake conducted by the Japanese government (MLIT) after March 11, 2011.
◦ The dataset is being prepared for public release.
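The structure described above (a pre/post patch pair centred on a target building, plus a binary damage label) can be sketched as follows. This is an illustrative assumption of the sample layout, not the released ABCD format; field names and the 160-pixel size follow the fixed-scale setting used later in the deck.

```python
import numpy as np

# Hypothetical sketch of one ABCD-style sample: a pre/post-tsunami patch
# pair with the target building at the patch centre, plus a damage label.
# Field names and array layout are assumptions, not the released format.
def make_sample(rng, size=160):
    return {
        "pre": rng.random((size, size, 3), dtype=np.float32),   # pre-tsunami RGB patch
        "post": rng.random((size, size, 3), dtype=np.float32),  # post-tsunami RGB patch
        "label": int(rng.integers(0, 2)),  # 0 = surviving, 1 = washed-away
    }

rng = np.random.default_rng(0)
dataset = [make_sample(rng) for _ in range(4)]  # stands in for >10K real pairs
```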
Overview of washed-away detection framework
Practical considerations
From a practical viewpoint, we investigated the following:
Input scenario (to address availability of pre-tsunami image)
Input scale (to address variability of building size)
Practical considerations: Input scenario
Input scenario (to address availability of the pre-tsunami image):
◦ Both pre- and post-tsunami images are available.
  ◦ Input to the CNN: a pair of pre/post patches.
  ◦ Two configurations (6-channel and Siamese), inspired by Zagoruyko+ (CVPR15), Lin+ (CVPR15), and Simo-Serra+ (ICCV15).
  ◦ Layer types: convolution, max pooling, fully-connected.
◦ Only post-tsunami images are available.
  ◦ In practice, pre-tsunami images may not be available.
  ◦ Input to the CNN: a post-tsunami patch only.
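The two pair-input configurations described above can be sketched in PyTorch as follows. The slides only specify the layer pattern (Conv-Pool-Conv-Pool-Conv-Conv-FC-FC with ReLU), so all channel widths, kernel sizes, and the global-pooling step before the FC head are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Shared conv trunk following the Conv-Pool-Conv-Pool-Conv-Conv pattern
# from the slides; the widths (32/64) are illustrative assumptions.
def conv_trunk(in_ch: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # simplification before FC-FC
    )

class SixChannelNet(nn.Module):
    """Stack pre/post patches channel-wise into one 6-channel input."""
    def __init__(self):
        super().__init__()
        self.trunk = conv_trunk(6)
        self.head = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 2))
    def forward(self, pre, post):
        return self.head(self.trunk(torch.cat([pre, post], dim=1)))

class SiameseNet(nn.Module):
    """Run pre/post patches through one shared trunk, then fuse features."""
    def __init__(self):
        super().__init__()
        self.trunk = conv_trunk(3)  # weights shared between the two branches
        self.head = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2))
    def forward(self, pre, post):
        feats = torch.cat([self.trunk(pre), self.trunk(post)], dim=1)
        return self.head(feats)

pre = torch.randn(2, 3, 160, 160)   # batch of pre-tsunami patches
post = torch.randn(2, 3, 160, 160)  # batch of post-tsunami patches
```

For the post-image-only scenario, the same trunk with 3 input channels and a single branch suffices.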
Practical considerations: Input scale
Input scale (to address variability of building size): multiple crop sizes.
◦ Fixed-scale: 160 x 160 pixels (64 m x 64 m), chosen so that most buildings are encompassed.
◦ Size-adaptive: the crop size depends on the building, which makes tiny buildings more conspicuous (size-adaptively crop, then resize to the network input size).
◦ Multi-scale CNN: combines both scales; inspired by Zagoruyko+ (CVPR15).
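The fixed-scale and size-adaptive cropping strategies above can be sketched with plain numpy. The margin factor and the nearest-neighbour resize are illustrative assumptions; the deck does not specify how the adaptive crop size is derived from the building footprint.

```python
import numpy as np

# Fixed-scale: always cut a 160x160 window (64 m x 64 m) around the
# target building's centre (cy, cx).
def fixed_crop(img, cy, cx, size=160):
    h = size // 2
    return img[cy - h:cy + h, cx - h:cx + h]

# Size-adaptive: the crop scales with the building footprint (in pixels),
# then is resized to the network input size, so a tiny building fills
# the patch. margin=1.5 is a hypothetical context factor.
def adaptive_crop(img, cy, cx, building_px, margin=1.5, out_size=160):
    half = max(1, int(building_px * margin / 2))
    patch = img[cy - half:cy + half, cx - half:cx + half]
    # nearest-neighbour resize back to out_size x out_size
    idx = (np.arange(out_size) * patch.shape[0] / out_size).astype(int)
    return patch[np.ix_(idx, idx)]

img = np.arange(400 * 400, dtype=np.float32).reshape(400, 400)
fx = fixed_crop(img, 200, 200)                       # 160x160 window
ad = adaptive_crop(img, 200, 200, building_px=40)    # 60x60 crop -> 160x160
```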
Experiment
Experimental setting, the same across all conditions (input scenario and scale):
Number of data
◦ Randomly sampled 8,500 pairs; balanced class distribution
Evaluation
◦ 5-fold cross-validation (train/val per fold = 6,800 : 1,700)
◦ Classification accuracy
CNN hyper-parameters
◦ Conv-Pool-Conv-Pool-Conv-Conv-FC-FC with ReLU nonlinearity
◦ No batch normalization (due to degradation in performance)
◦ SGD with momentum; constant learning rate; weight decay
◦ Shared or unshared weights between branches in the Siamese case
Data pre-processing
◦ Zero mean and unit variance (pixel-wise)
◦ Augmentation with vertical and horizontal flips
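The pre-processing steps above can be sketched as follows. Computing the statistics per pixel position across the training batch is one reading of "pixel-wise" (per-image standardisation would be the other); that choice, and applying flips to whole batches, are assumptions for illustration.

```python
import numpy as np

# Zero mean and unit variance, computed per pixel position over the batch.
def standardize(batch, eps=1e-8):
    mean = batch.mean(axis=0, keepdims=True)
    std = batch.std(axis=0, keepdims=True)
    return (batch - mean) / (std + eps)

# Augmentation: original + horizontal flip + vertical flip (3x the data).
def augment_flips(batch):
    return np.concatenate(
        [batch, batch[:, :, ::-1], batch[:, ::-1, :]], axis=0
    )

batch = np.random.default_rng(0).random((8, 160, 160, 3)).astype(np.float32)
x = standardize(batch)
x_aug = augment_flips(x)
```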
Accuracy
(bar charts: classification accuracy on a 93.0%–96.0% axis for the Fixed, Adaptive, and Ensemble crops, without vs. with multi-scale, under the 6-channel, Siamese, and only-post-image scenarios)
Accuracy
(examples, fixed-scale vs. adaptive: a building wrongly classified as "surviving" at one scale is correctly classified as "washed-away" at the other, and vice versa)
The different scales are complementary w.r.t. prediction.
Accuracy
(bar charts, 93.0%–96.0% axis: 6-channel, Siamese, and only-post-image scenarios; without multi-scale vs. with multi-scale)
Accuracy: in terms of input scenario
Only post-tsunami images may be sufficient in the context of washed-away building detection.
(bar chart, 93.0%–96.0% axis: both pre/post images vs. only post image)
Accuracy: in terms of input scale
(bar chart, 93.0%–96.0% axis: Fixed, Adaptive, Ensemble)
There is no clear winner between fixed-scale and adaptive-scale; the ensemble is always better.
Accuracy: in terms of input scale
(bar chart, 93.0%–96.0% axis: without vs. with multi-scale for Fixed, Adaptive, Ensemble)
Multi-scale CNNs always beat their single-scale counterparts.
Qualitative result
CNN prediction / Ground truth
RED: washed-away
YELLOW: surviving
Conclusion
This work in a nutshell:
◦ Effective use of CNNs for washed-away detection was explored.
◦ To this end, we compiled a new labeled dataset.
◦ On this dataset, we performed several experiments from an application viewpoint (input scenario and input scale).
◦ Overall, the accuracy was reasonably good.
Future work:
◦ Generalizability (other regions, other events, other data and so on)
◦ End-to-end framework from localization to classification such as instance-segmentation (Pinheiro+ECCV16)
References:
◦ Cooner et al.: Detection of urban damage using remote sensing and machine learning algorithms: revisiting the 2010 Haiti earthquake, in Remote Sensing, 2016.
◦ Gueguen and Hamid: Large-scale damage detection using satellite imagery, in CVPR, 2015.
◦ Simo-Serra et al.: Discriminative learning of deep convolutional feature point descriptors, in ICCV, 2015.
◦ Zagoruyko and Komodakis: Learning to compare image patches via convolutional neural networks, in CVPR, 2015.
◦ Lin et al.: Learning deep representations for ground-to-aerial geolocalization, in CVPR, 2015.
◦ Ministry of Land, Infrastructure, Transport, and Tourism: First report on an assessment of the damage caused by the Great East Japan earthquake, http://www.mlit.go.jp/common/000162533.pdf (published in Japanese)
◦ Pinheiro et al.: Learning to refine object segments, in ECCV, 2016.
◦ Nia and Mori: Building damage assessment using deep learning and ground-level image data, in CRV, 2017.
Acknowledgement: This presentation is based on results obtained from a project commissioned by the New Energy and
Industrial Technology Development Organization (NEDO).
Appendix
Related work
Damage detection
◦ Gueguen & Hamid (CVPR, 2015)
◦ Pre- and post-disaster satellite images
◦ Semi-supervised classification (BoF + SVM)
◦ Cooner et al. (Remote Sensing, 2016)
◦ Pre- and post-disaster satellite images
◦ Two-layer neural network
◦ Nia & Mori (CRV, 2017)
◦ Only post-disaster ground-level image
◦ Dilated CNN
Related work
Two image patches as input to a CNN
◦ Simo-Serra et al. (ICCV, 2015)
◦ Lin et al. (CVPR, 2015)
◦ Zagoruyko & Komodakis (CVPR, 2015)
114 days before the quake (17/Nov/2010) / 9 days after the quake (20/Mar/2011)
Human annotations derived from the field survey are superimposed.
Red arrows mark buildings that were washed away by the tsunami.
The visual changes of buildings caused by the tsunami are clear, but how can we describe them?
Cross validation result
(learning curves for fixed_size/noAug: accuracy, test loss, and train loss on a 0–1 axis, plotted against iteration ×10)
Size-adaptive classification
Improved:
Worsened:
Effect of # of shared layers on accuracy
(plot: cross-validation mean of accuracy, 0.80–1.00 axis, vs. number of conv layers shared, 0–4)
Visualization via Global Average Pooling
Global Average Pooling (GAP) is a technique for:
1) Regularizing big networks by forgoing FC layers (e.g., NIN and GoogLeNet)
2) Visualizing the feature maps learned by networks in a way that is intuitive for humans
3) Localizing object instances even under a weakly-supervised setting
Recent papers that delve into GAP:
◦ Learning Deep Features for Discriminative Localization (Zhou+, CVPR16)
◦ Grad-CAM: Why did you say that? Visual Explanations from Deep Networks via Gradient-based Localization (Selvaraju+, arXiv16)
Visualization using GAP: Class Activation Map, in Zhou+ (CVPR16)
Replaces FC layers with GAP and requires retraining (leading to some drop in classification accuracy, but improved localization power)
Visualization using GAP: Class Activation Map, in Selvaraju+ (arXiv16)
Retraining is unnecessary thanks to gradient backprop (the model is kept intact, so classification performance is not impaired)
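The Class Activation Map construction in the style of Zhou+ (CVPR16) can be sketched as follows: the map for class c applies that class's post-GAP FC weights across the spatial grid of the last conv feature maps. All shapes here are illustrative assumptions.

```python
import numpy as np

# CAM sketch: weight each last-conv feature map by the FC weight its
# GAP-pooled value would receive for the chosen class, then sum.
def class_activation_map(feature_maps, fc_weights, class_idx):
    # feature_maps: (C, H, W) activations of the last conv layer
    # fc_weights:   (num_classes, C) weights of the FC layer after GAP
    cam = np.tensordot(fc_weights[class_idx], feature_maps, axes=([0], [0]))
    cam -= cam.min()
    if cam.max() > 0:
        cam /= cam.max()  # normalise to [0, 1] for display as a heat map
    return cam            # (H, W)

feats = np.random.default_rng(0).random((64, 10, 10))  # stand-in activations
w = np.random.default_rng(1).random((2, 64))           # 2 classes
cam = class_activation_map(feats, w, class_idx=1)
```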
(Class Activation Map visualizations at the resized and fixed scales)