AeroSynth: Aerial Scene Synthesis from Images
David Nilosek, Lt. Col. Karl Walli
Rochester Institute of Technology, AFIT/CI
Carlson Center for Imaging Science
Digital Imaging and Remote Sensing Lab
Introduction
Automated synthetic terrain and architecture generation is now
becoming feasible with calibrated-camera remote sensing. This
poster shows recently popularized computer vision techniques for
extracting 'structure from motion' of a calibrated camera with
respect to a target. This process builds off of Microsoft's popular
"Photosynth" technique and applies it to geographic scenes imaged
from an airborne platform. Images taken from an airborne platform
often have wide baselines and only a sparse number of images
covering the desired target. Both sparse and dense point clouds
are generated to increase the fidelity of the 3D structure for
realistic scene modeling.
Our Approach
Scenes are reconstructed on both a macro (sparse) and a
micro (dense) scale
The macro process establishes an initial correspondence and
from that derives the epipolar geometry between images and a
sparse point cloud of scene coordinates
The micro process uses the found epipolar geometry and a
region of interest over a target to generate a dense point cloud of
scene coordinates
Epipolar Geometry & RANSAC (RANdom SAmple Consensus)
A point in the left image corresponds to a line in the right image; this is
called the epipolar constraint [1]
The fundamental matrix F describes this relationship between the
two images [1]:

$$x_R^T F\, x_L = 0$$

Using RANSAC with this equation, the outliers in the
initial correspondence from SIFT are removed [1]
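As a minimal sketch of this step (using OpenCV as an assumed toolchain, not necessarily the authors' implementation), F can be estimated with RANSAC from the putative SIFT matches; pts_left and pts_right are the matched pixel locations from the initial correspondence:

```python
import cv2
import numpy as np

def robust_fundamental(pts_left, pts_right):
    """Estimate F with RANSAC so that inliers satisfy x_R^T F x_L ~ 0."""
    pts_left = np.asarray(pts_left, dtype=np.float64)
    pts_right = np.asarray(pts_right, dtype=np.float64)

    # RANSAC repeatedly fits F to random minimal samples and keeps the
    # model with the most points lying near their epipolar lines.
    F, mask = cv2.findFundamentalMat(
        pts_left, pts_right, cv2.FM_RANSAC,
        ransacReprojThreshold=1.0,   # max distance to epipolar line (px)
        confidence=0.99)

    inliers = mask.ravel() == 1
    return F, pts_left[inliers], pts_right[inliers]
```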
The points are put into UTM coordinates by projecting the
scene coordinates through the collinearity equations onto the
base image; these equations use the calibrated camera
information
Finding the scene coordinates
Photogrammetry is used to calculate an initial estimate of the
scene coordinates [4]
Derived from Figure 3:

$$Z = H - \frac{fB}{x_L - x_R}, \qquad X = \frac{B\,x_L}{x_L - x_R}, \qquad Y = \frac{B\,y_L}{x_L - x_R}$$

where B is the baseline between exposures, f is the focal length, H is
the flying height, and $(x_L, y_L)$, $(x_R, y_R)$ are the image
coordinates of the point in the left and right images
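A direct transcription of these parallax equations (an illustrative sketch; f, H, and B are assumed known from the calibrated camera and flight line):

```python
def parallax_to_scene(x_left, y_left, x_right, f, H, B):
    """Initial scene-coordinate estimate for one point in a stereo pair.

    x_left, y_left: image coordinates in the left photo; x_right: the x
    coordinate of the same point in the right photo. f is the focal
    length, H the flying height, and B the baseline between exposures.
    """
    p = x_left - x_right   # parallax
    Z = H - f * B / p      # elevation
    X = B * x_left / p     # ground X
    Y = B * y_left / p     # ground Y
    return X, Y, Z
```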
The scene coordinate estimates are refined by minimizing the total
reprojection error:

$$\min_{a_j,\,b_i} \sum_{i=1}^{n} \sum_{j=1}^{m} v_{ij}\, d\big(P(a_j, b_i),\, x_{ij}\big)^2$$

where $a_j$ and $b_i$ are vectors of camera and scene-point
information, $x_{ij}$ is image point $i$ in image $j$, $P$ is the
predicted projection of point $i$ onto image $j$, $d$ represents the
Euclidean distance operator, and $v_{ij}$ is a binary operator that is 1 if
the point exists in the image and 0 otherwise
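The authors use the sparse bundle adjustment package of Lourakis and Argyros [2]. Purely to illustrate the objective, here is a toy residual function in SciPy with a simplified pinhole camera (7 parameters per camera, not the full (ω, φ, κ, x, y, z, f, p, k) model of the poster):

```python
import numpy as np
from scipy.optimize import least_squares

def reprojection_residuals(params, n_cams, n_pts, cam_idx, pt_idx, observed):
    """Residuals d(P(a_j, b_i), x_ij) for every observation with v_ij = 1.

    params packs 7 numbers per camera (a Rodrigues rotation vector, a
    translation, and focal length f), followed by the 3D points b_i.
    """
    cams = params[:n_cams * 7].reshape(n_cams, 7)
    pts = params[n_cams * 7:].reshape(n_pts, 3)

    rvec = cams[cam_idx, :3]          # per-observation camera rotation
    tvec = cams[cam_idx, 3:6]         # per-observation camera translation
    f = cams[cam_idx, 6]              # per-observation focal length
    p = pts[pt_idx]                   # per-observation 3D point

    # Rodrigues rotation: rotate each point into its camera's frame.
    theta = np.linalg.norm(rvec, axis=1, keepdims=True)
    k = rvec / np.where(theta == 0, 1.0, theta)
    p_rot = (p * np.cos(theta)
             + np.cross(k, p) * np.sin(theta)
             + k * np.sum(k * p, axis=1, keepdims=True) * (1 - np.cos(theta)))
    p_cam = p_rot + tvec

    # Pinhole projection P(a_j, b_i) and its deviation from the observed x_ij.
    projected = f[:, None] * p_cam[:, :2] / p_cam[:, 2:3]
    return (projected - observed).ravel()

# Only visible observations (v_ij = 1) appear in cam_idx/pt_idx/observed;
# least_squares then minimizes the sum of squared residuals:
# result = least_squares(reprojection_residuals, x0,
#                        args=(n_cams, n_pts, cam_idx, pt_idx, observed))
```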
Initial correspondence
The scale-invariant feature transform (SIFT) is a scale- and
orientation-invariant feature detector [3]
The image is convolved with Gaussian kernels of different widths
Features are detected by finding local extrema in the
differences between the Gaussian-blurred images
Features are described by the scale of the Gaussian curve
and by the relative orientation of the area around the feature to
create scale- and orientation-invariant description vectors
Description vectors are matched using a nearest-neighbor
approach
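A sketch of this step with OpenCV (SIFT_create requires OpenCV >= 4.4; the 0.75 ratio threshold is a common choice for Lowe's nearest-neighbor test, not a value given in the poster):

```python
import cv2

def initial_correspondence(img_left, img_right):
    """SIFT features matched with a nearest-neighbor ratio test [3]."""
    sift = cv2.SIFT_create()
    kp_l, des_l = sift.detectAndCompute(img_left, None)   # DoG extrema + descriptors
    kp_r, des_r = sift.detectAndCompute(img_right, None)

    # For each left descriptor, find its two nearest right descriptors and
    # keep the match only when the best is clearly better than the runner-up.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    good = []
    for pair in matcher.knnMatch(des_l, des_r, k=2):
        if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
            good.append(pair[0])

    pts_l = [kp_l[m.queryIdx].pt for m in good]
    pts_r = [kp_r[m.trainIdx].pt for m in good]
    return pts_l, pts_r
```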
The collinearity equations used for the UTM projection [4]:

$$x_a = x_0 - f\,\frac{m_{11}(X_A - X_0) + m_{12}(Y_A - Y_0) + m_{13}(Z_A - H)}{m_{31}(X_A - X_0) + m_{32}(Y_A - Y_0) + m_{33}(Z_A - H)}$$

$$y_a = y_0 - f\,\frac{m_{21}(X_A - X_0) + m_{22}(Y_A - Y_0) + m_{23}(Z_A - H)}{m_{31}(X_A - X_0) + m_{32}(Y_A - Y_0) + m_{33}(Z_A - H)}$$

with the rotation matrix m built from the camera orientation angles $(\omega, \phi, \kappa)$:

$$m = \begin{pmatrix} \cos\phi\cos\kappa & \cos\omega\sin\kappa + \sin\omega\sin\phi\cos\kappa & \sin\omega\sin\kappa - \cos\omega\sin\phi\cos\kappa \\ -\cos\phi\sin\kappa & \cos\omega\cos\kappa - \sin\omega\sin\phi\sin\kappa & \sin\omega\cos\kappa + \cos\omega\sin\phi\sin\kappa \\ \sin\phi & -\sin\omega\cos\phi & \cos\omega\cos\phi \end{pmatrix}$$

where $(X_A, Y_A, Z_A)$ are the scene coordinates of the point,
$(X_0, Y_0, H)$ is the camera position, $(x_0, y_0)$ is the principal
point, and $f$ is the focal length
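A direct transcription of these equations (a sketch; angles in radians, symbol names as defined above):

```python
import numpy as np

def rotation_matrix(omega, phi, kappa):
    """The photogrammetric rotation matrix m(omega, phi, kappa) above."""
    so, co = np.sin(omega), np.cos(omega)
    sp, cp = np.sin(phi), np.cos(phi)
    sk, ck = np.sin(kappa), np.cos(kappa)
    return np.array([
        [cp * ck,  co * sk + so * sp * ck,  so * sk - co * sp * ck],
        [-cp * sk, co * ck - so * sp * sk,  so * ck + co * sp * sk],
        [sp,      -so * cp,                 co * cp]])

def project_to_image(X, Y, Z, x0, y0, f, X0, Y0, H, omega, phi, kappa):
    """Project scene point (X, Y, Z) into the base image via collinearity."""
    m = rotation_matrix(omega, phi, kappa)
    dX, dY, dZ = X - X0, Y - Y0, Z - H
    denom = m[2, 0] * dX + m[2, 1] * dY + m[2, 2] * dZ
    x = x0 - f * (m[0, 0] * dX + m[0, 1] * dY + m[0, 2] * dZ) / denom
    y = y0 - f * (m[1, 0] * dX + m[1, 1] * dY + m[1, 2] * dZ) / denom
    return x, y
```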
Dense Correspondence
Using principles of epipolar geometry, every point over the ROI
is matched along its epipolar line in the other camera to generate a
dense correspondence
Once the correspondence is determined, the scene coordinates
are extracted using the previously mentioned methods
The resulting model is facetized and an image is projected onto the
model
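One way to realize this dense matching (an illustrative brute-force normalized cross-correlation search over grayscale images; the poster does not specify the authors' actual matcher):

```python
import cv2
import numpy as np

def match_along_epipolar_line(img_l, img_r, pt, F, half_win=5):
    """Match one ROI pixel of the left image along its epipolar line.

    Running this for every pixel in the ROI yields the dense correspondence.
    """
    x, y = int(pt[0]), int(pt[1])
    template = img_l[y - half_win:y + half_win + 1, x - half_win:x + half_win + 1]

    # Epipolar line a*x' + b*y' + c = 0 in the right image for the left point.
    line = cv2.computeCorrespondEpilines(
        np.array([[x, y]], dtype=np.float32).reshape(-1, 1, 2), 1, F)
    a, b, c = line.ravel()           # assumes the line is not near-vertical

    best, best_score = None, -1.0
    for xr in range(half_win, img_r.shape[1] - half_win):
        yr = int(round(-(a * xr + c) / b))   # y on the epipolar line
        if yr < half_win or yr + half_win + 1 > img_r.shape[0]:
            continue
        patch = img_r[yr - half_win:yr + half_win + 1,
                      xr - half_win:xr + half_win + 1]
        score = cv2.matchTemplate(patch, template, cv2.TM_CCOEFF_NORMED)[0, 0]
        if score > best_score:
            best, best_score = (xr, yr), score
    return best
```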
The estimates, along with the calibrated camera information
and the correspondences, are put through a sparse bundle
adjustment to refine the scene coordinate estimation [2]
AeroSynth output
Combination of macro and micro scene reconstruction with a
comparison to a hand-created CAD model
[Figure 1 workflow: Unordered Images (A, B, C) → Feature Extraction → Image Correspondences → Image Group Relationship → Sparse Bundle Adjustment → Sparse Point Cloud & Coordinate Transformation → 3D Models; relevant AeroSynth data is passed from the macro process to the micro process, which produces the camera parameters and a dense point cloud]
[Figure 3 labels: Cam1 (Base), Camera 2, and Camera 3, each with parameters (ω, φ, κ, x, y, z, f, p, k); flying height H; Region of Overlap (ROO) with corners UL, UR, LL, LR; Region of Interest (ROI)]
References
[1] Hartley, R., and Zisserman, A. 2003. Multiple View Geometry in Computer Vision.
Cambridge University Press.
[2] Lourakis, M., and Argyros, A. 2004. The design and implementation of a
generic sparse bundle adjustment software package based on the Levenberg-
Marquardt algorithm. ICS/FORTH Technical Report TR 340.
[3] Lowe, D. 1999. Object recognition from local scale-invariant features. In International
Conference on Computer Vision, vol. 2, Corfu, Greece, 1150–1157.
[4] Wolf, P., and Dewitt, B. 1983. Elements of Photogrammetry. McGraw-Hill,
Singapore.
*Environment used to display models is Google Earth
Figure 2: Basic epipolar geometry showing the epipolar constraint
Figure 1: The AeroSynth workflow
Figure 3: Simple geometry of straight baseline aerial photography
Figure 4: TOP: Five images overlapping a scene of a wastewater treatment plant. BOTTOM: The images
projected onto a map to show the overlapping regions*
Figure 5: TOP: The five overlapping regions and initial scene coordinate estimate before sparse bundle adjustment.
BOTTOM LEFT: The five overlapping regions after the sparse bundle adjustment. BOTTOM RIGHT: The points
projected back onto the base image to calculate the UTM coordinates for each point.
Figure 6: LEFT: Target chosen in the base image with a single point selected. MIDDLE/RIGHT:
The corresponding epipolar line
Figure 7: LEFT: Point cloud derived from dense correspondence. RIGHT: Facetized point cloud with image
texture map overlaid
Figure 8: AeroSynth output with comparison models (comparison models provided by Pictometry
International Corp)