17
Full Terms & Conditions of access and use can be found at http://www.tandfonline.com/action/journalInformation?journalCode=tgei20 Download by: [Bulent Ecevit University] Date: 27 July 2017, At: 08:05 Geocarto International ISSN: 1010-6049 (Print) 1752-0762 (Online) Journal homepage: http://www.tandfonline.com/loi/tgei20 Novel fusion approach on automatic object extraction from spatial data: case study Worldview-2 and TOPO5000 U. G. Sefercik , S. Karakis, C. Atalay, I. Yigit & U. Gokmen To cite this article: U. G. Sefercik , S. Karakis, C. Atalay, I. Yigit & U. Gokmen (2017): Novel fusion approach on automatic object extraction from spatial data: case study Worldview-2 and TOPO5000, Geocarto International, DOI: 10.1080/10106049.2017.1353646 To link to this article: http://dx.doi.org/10.1080/10106049.2017.1353646 Accepted author version posted online: 12 Jul 2017. Published online: 27 Jul 2017. Submit your article to this journal Article views: 6 View related articles View Crossmark data

Novel fusion approach on automatic object extraction from spatial … · 2019. 2. 15. · borders and manual extraction, NetCAD 5.2 software was utilized. Segmentation and automatic

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

  • Full Terms & Conditions of access and use can be found athttp://www.tandfonline.com/action/journalInformation?journalCode=tgei20

    Download by: [Bulent Ecevit University] Date: 27 July 2017, At: 08:05

    Geocarto International

    ISSN: 1010-6049 (Print) 1752-0762 (Online) Journal homepage: http://www.tandfonline.com/loi/tgei20

    Novel fusion approach on automatic objectextraction from spatial data: case studyWorldview-2 and TOPO5000

    U. G. Sefercik , S. Karakis, C. Atalay, I. Yigit & U. Gokmen

    To cite this article: U. G. Sefercik , S. Karakis, C. Atalay, I. Yigit & U. Gokmen (2017): Novel fusionapproach on automatic object extraction from spatial data: case study Worldview-2 and TOPO5000,Geocarto International, DOI: 10.1080/10106049.2017.1353646

    To link to this article: http://dx.doi.org/10.1080/10106049.2017.1353646

    Accepted author version posted online: 12Jul 2017.Published online: 27 Jul 2017.

    Submit your article to this journal

    Article views: 6

    View related articles

    View Crossmark data

    http://www.tandfonline.com/action/journalInformation?journalCode=tgei20http://www.tandfonline.com/loi/tgei20http://www.tandfonline.com/action/showCitFormats?doi=10.1080/10106049.2017.1353646http://dx.doi.org/10.1080/10106049.2017.1353646http://www.tandfonline.com/action/authorSubmission?journalCode=tgei20&show=instructionshttp://www.tandfonline.com/action/authorSubmission?journalCode=tgei20&show=instructionshttp://www.tandfonline.com/doi/mlt/10.1080/10106049.2017.1353646http://www.tandfonline.com/doi/mlt/10.1080/10106049.2017.1353646http://crossmark.crossref.org/dialog/?doi=10.1080/10106049.2017.1353646&domain=pdf&date_stamp=2017-07-12http://crossmark.crossref.org/dialog/?doi=10.1080/10106049.2017.1353646&domain=pdf&date_stamp=2017-07-12

  • Geocarto InternatIonal, 2017https://doi.org/10.1080/10106049.2017.1353646

    Novel fusion approach on automatic object extraction from spatial data: case study Worldview-2 and TOPO5000

    U. G. Sefercika  , S. Karakisa, C. Atalaya, I. Yigitb and U. Gokmenc

    aDepartment of Geomatics engineering, Bulent ecevit University, Zonguldak, turkey; bDepartment of civil engineering, Bulent ecevit University, Zonguldak, turkey; cGraduate School of natural and applied Sciences, Bulent ecevit University, Zonguldak, turkey

    ABSTRACTThe automatic extraction of information content from remotely sensed data is always challenging. We suggest a novel fusion approach to improve the extraction of this information from mono-satellite images. A Worldview-2 (WV-2) pan-sharpened image and a 1/5000-scaled topographic vector map (TOPO5000) were used as the sample data. Firstly, the buildings and roads were manually extracted from WV-2 to point out the maximum extractable information content. Subsequently, object-based automatic extractions were performed. After achieving two-dimensional results, a normalized digital surface model (nDSM) was generated from the underlying digital aerial photos of TOPO5000, and the automatic extraction was repeated by fusion with the nDSM to include individual object heights as an additional band for classification. The contribution was tested by precision, completeness and overall quality. Novel fusion technique increased the success of automatic extraction by 7% for the number of buildings and by 23% for the length of roads.

    1. Introduction

    At the end of the 1990s, with the launch of the first high-resolution commercial satellite IKONOS, space-borne optical remote sensing gained considerable popularity. High-spatial resolution especially benefited the geometric accuracy and acquired information content of spatial imagery. As a conse-quence, a large variety of professional disciplines began to prefer spatial imagery in such applications as temporal change detection (Baiocchi et al. 2010), the establishment of geographical information systems (GISs) (Font et al. 2010), the creation of three-dimensional views and relief maps (Fraser 2003), disaster monitoring and management (Vassilopoulou et al. 2002; Navalgund et al. 2007), for-estry (Stereńczak and Kozak 2011), agriculture (Thompson et al. 2001; Schmidt and Persson 2003) and drainage (Jain and Singh 2005). In parallel, satellite manufacturers developed new satellites equipped with advanced imaging technologies, and the spatial resolution of panchromatic (PAN) optical imagery reached

  • 2 U. G. SEFERCIK ET AL.

    high flight altitudes and limited imaging capabilities of optical sensors limit the estimation of terrain and non-terrain objects. To minimize these disadvantages, numerous studies have been conducted on improving the quality of spatial imagery (Meenakshisundaram 2005). The basic parameter of the quality of an image is the heterogeneity of visual objects which simplifies their identification. To improve the heterogeneity, two basic criteria are significant, namely, spatial resolution, which affects the detection of the shape of the object, and spectral (colour) structure (Aguilar et al. 2012). In space-borne optical imagery, high-spatial resolution is obtained from the PAN band, and colour information is obtained from multispectral (MS) bands through the visual wavelengths of red, green, and blue (RGB) (Maxwell 2005). Ordinarily, the spatial resolution of MS imagery is two or four times lower than PAN in advanced optical sensors; however, MS and PAN data are fused with the ‘pan- sharpening’ method, and colourful imagery can be obtained with the spatial resolution of PAN (Hong 2007). Pan-sharpening can be applied using various methods, such as principle component analysis (PCA) (Hong 2007; Rani and Sharma 2013) and hue intensity saturation (HIS) (Rahmani et al. 2010), Wavelet, Gram Schmidt and University of New Brunswick (UNB) (Nikolakopoulos 2008). Increasingly, VHR pan-sharpened space-borne optical imagery is used for object extraction and the revision of vector maps (Sportouche et al. 2009; Bhadauria et al. 2013).

    To extract the information content of space imagery, the two main methods commonly preferred are manual and automatic (Theng 2006; Benarchid et al. 2013). In comparison with the manual extraction process, the automatic extraction process is rapid and time saving and is more reasonable for wide areas. In addition, this process enables the extraction of vector semantic data, which can be easily analysed in GIS software and applications (Baltsavias et al. 2001). However, the basic principle of automatic object extraction is heterogeneity, but considerable errors are caused when objects with similar shape and spectral reflectance that should be in different classes are placed into the same class. Therefore, missing or falsely extracted objects causes the production of misleading topographic vector maps (Shaw and Burke 2003; Pedelty et al. 2004; Bedard et al. 2012).

    In this study, we aimed to minimize the automatic object extraction faults using a novel fusion approach, and maximize the information content obtained from spatial data. Accordingly, a large coverage Worldview-2 (WV-2) VHR mono-image was chosen as the sample data, and performance improvement and verification steps were realized utilizing a 1/5000 scale topographic vector map (TOPO5000). In our approach, the elevation information of non-terrain objects was provided by generating a normalized digital surface model (nDSM) derived from digital aerial photos used in the production of TOPO5000, and later fused with WV-2 data as an additional band in classifica-tion. NDSM is a differential model, calculated by the subtraction of a digital terrain model (DTM), which represents only the three-dimensional (3D) bare earth topography, from a digital surface model (DSM), the 3D earth surface including all terrain and non-terrain objects, such as buildings, vegetation and roads (Niederöst 2000; Straub et al. 2001). NDSM is mainly used in the discipline of forestry for height detection of trees and forest stands (Stereńczak et al. 2008; Smreček 2012; Sefercik and Atesoglu 2013).

    The contribution of the novel fusion approach was validated by standard quantitative measures, such as precision (correctness), completeness and overall quality (Heipke et al. 1997; Karantzalos and Paragios 2009), in comparison with the information content from the reference TOPO5000. Considering the objectives of the research, the paper is organized as follows: Section 2 describes the study area and materials used followed by the methodology section. The results and discussion will be presented in Section 4 followed by the conclusions.

    2. Study area and materials

    The research was carried out for a national TOPO5000 map named ‘H21C05D’, which has a cover-age area of approx. 6 km2 (2.2 × 2.8 km). The area is located in Bursa, which is the fourth biggest

  • GEOCARTO INTERNATIONAL 3

    metropolitan in Turkey with a population of 2.8 million and an annual increase of 5%. The city is located in the Marmara region in the northwest of Turkey, with various topographic formations and orthometric altitudes of up to 2500 m. The study area is a rapidly changing residential area, and a difficult one for automatic object extraction, containing different roof types and materials. The Bursa Greater Municipality completed a digital photogrammetric mapping project called Metropolitan 3 (M3) between 2008 and 2011. Using a digital camera, the image quality was improved, allowing for the increase of speed and accuracy of photogrammetric triangulation and processing. Additionally, more qualitative orthophoto and vector maps were generated. In the project, approximately 4500 1/1000 scale topographic vector maps (TOPO1000) and 300 TOPO5000 maps were produced in an area of 1650 km2 using traditional photogrammetry under the supervision of the first and second authors of this paper. The aerial photos of the study area were obtained using Vexcel UltraCam-Xp high-quality digital camera during 14–20 June 2009 having 11 cm ground sampling distance (GSD). The TOPO5000 maps were obtained by generalization from TOPO1000. As known, the seeing limit of human eye is approx. ‘0.2 mm’ which corresponds ‘1 m’ in 1/5000 scale and an object can be recognized individually at least by 2 pixels. Considering this parameters, some areal thresholds are determined by the Countries in generalization from TOPO1000 to TOPO5000 to eliminate over small objects which cause noise in the maps. In Turkey, the areal threshold is 25 m2 for TOPO5000 and accordingly, the objects that have the area ≤25 m2 were eliminated in generalization. Some of the eliminated details by this process are the manholes, rainwater removals, electric and phone poles and outbuildings.

    In addition to this project, to ensure full coverage of the Bursa city area (~12,000 km2), the Greater Municipality provided the WV-2 image with a physical GSD of 46 cm in the nadir view. In Figure 1, showing Bursa city in Turkey, the full city coverage WV-2 image and the study area are shown. The information about the UltraCam-Xp, photogrammetric flight, and WV-2 imaging are given in Table 1.

    Figure 1. (a) Map of turkey (red circle is Bursa city), (b) used WV-2 image of entire Bursa city, (c) ortho-photo of the study area derived from stereo digital aerial photos of toPo5000.

  • 4 U. G. SEFERCIK ET AL.

    3. Methodology

    The main steps of the methodology applied to improve the performance of automatic object extraction are shown as a workflow diagram in Figure 2. During the workflow: to narrow over large WV-2 image borders and manual extraction, NetCAD 5.2 software was utilized. Segmentation and automatic object extraction were completed in Ecognition 8.8.2. To achieve DSM and DEM point clouds from aerial photos and their processing and filtering, Agisoft Photoscan 1.1.6 and TerraScan Microstation V8i were used, respectively. DSM, DTM and nDSM generation and all of visualizations were realized by Surfer 11 and LISA 4.7. For WV-2 Wallis filtering and the entire DSM and DTM processes, Bundle Block Adjustment Leibniz University Hannover (BLUH) version 2015 were employed. Finally, for calculating the number of buildings and road lengths, ArcGIS 9.1 software was preferred.

    3.1. Image enhancement

    The enhancement process of the WV-2 image was completed in two steps as pan-sharpening by the UNB algorithm and Wallis filtering. The pan-sharpening method facilitates using the sensors’ spectral capabilities simultaneously with high spatial resolution. The PAN WV-2 images are obtained in the wavelength of 0.4–0.8 μm, are sensitive for green colour, and partially cover the near infrared band. In comparison with PCA, HIS, Wavelet and Gram Schmidt pan-sharpening algorithms, UNB has the advantage of green band sensitivity. The algorithm works on a statistical basis, and uses the least squares method to calculate the contribution of each imaging band to the final product to find the best coherence between the combined imaging bands’ grey values, and reduce the spectral distortion independent of the data-set (Zhang 2002). In the first step, a 0.5 m pan-sharpened image was achieved with 0.5 m PAN and 2 m MS imagery of WV-2 by UNB. Second, objects which cannot be identified due to low contrast or shadow, are unveiled by Wallis filtering. Wallis is a local statistical filter, used for local contrast improvement by modifying the grey value of the pixels towards the direction of the desired mean grey value in the pixel’s neighbourhood (Jazayeri and Fraser 2008; Ballabeni et al. 2015).

    Table 1. Flight and imaging information.

    Parameter UltraCam-Xp Aerial Flight (M3) WV-2 Instrument

    Date – 2009 2011Image acquisition type – Mono and stereo MonoImaging bands Pan + MS + nIr Pan-sharpened (l3) Pan-sharpenedImage size (W×l) 11,310 × 17,310 pixel (Pan) – 320,000 × 240,000 pixel

    3770 × 5770 pixel (rGB)Pixel size 6 μm – 0.5 mGSD ((Ps/f ) × h)) 0.1034 m –coverage – 1650 km2 11,500 km2

    Distance between columns – 1100 m –camera focal length (f ) 100.5 mm – 13.3 mBase – 340 m –Flight height (h) – ~57,000 ft = 1733 m 770 kmPhoto-scale – ~(100.5/1,733,000) = 1/17,300 –Direct sensor orientation GPS/IMU – GPS/IMU/Star trackerFrontlap and slidelap of aerial

    photos– 70–30% –

    radiometric resolution >12bit – 11bitWeight 55 kg – 2615 kg

  • GEOCARTO INTERNATIONAL 5

    Therefore, the filtered image histogram will be normally distributed. Equation (1) gives the calculation of a pixel grey value by Wallis filtering;

    In the formula, Gin and Gout are the grey value of a pixel before and after Wallis filtering, respectively. Min and Mout are the mean and desired mean grey values of neighbour pixels. Sin and Sout are the standard deviation and desired standard deviation of grey values, respectively. The relation between Sin and Sout is limited by ‘gain limit’. This formula can be modified as follows by the addition of a factor for desired mean ‘F’, reducing the change from the Wallis filter.

    Wallis filtering was applied to the pan-sharpened WV-2 image with several parameter combinations, and the outputs were interpreted based on the visual quality of the heterogeneity of different land forms. In Figure 3, the contribution of the Wallis filter can be clearly seen in the WV-2 image of the study area.

    3.2. Manual and automatic object extraction

    After the enhancement processes, buildings and roads were manually and automatically extracted from the WV-2 image. Manual extraction provided an opportunity to validate the pros and cons of the automatic process in comparison with the human eye. In addition, this approach facilitates deter-mining the real information content of the WV-2 image with the advantage of one by one vectorisa-tion and classification of visible buildings and road centrelines. Automatic extraction was realized as object-based using the combination of shape and spectral (colour) heterogeneity (Baatz et al. 2004). The heterogeneity is determined by fused adjacent objects in a stable direction (e.g. top-down, bot-tom to up), as limited by the pre-determined scale parameter. In principle, the fused objects continue

    (1)Gout = Mout + (Gin −Min) ×SoutSin

    (2)Gout = Mout × F +Min × (1 − F) + (Gin −Min) ×SoutSin

    Figure 2. Methodology.

  • 6 U. G. SEFERCIK ET AL.

    as the first object (obj1) and the second object (obj2), and the fusion of obj1 and obj2 is named as obj1 for the next merge, and merge with another obj2 again. The shape heterogeneity consists of two sub-heterogeneities as compactness, which represents the closeness of the pixels clustered in an object by comparing it to a circle; and smoothness, the similarity between the image object borders and a perfect square (UTSA 2013). The spectral heterogeneity is the standard deviation of digital numbers (DN) of the pixels that cover the object. As a consequence, the overall heterogeneity can be calculated by Equations (3)–(7). In the formulas, Δhoverall, Δhsp, Δhsh, Δhcm, Δhsm are the overall, spectral, shape, compactness, and smoothness heterogeneity, respectively. N is the number of channels of the seg-mented satellite image, W represents the channel weight, n describes the number of pixels belonging to the object 1 and 2 (obj1 and obj2), σ is the standard deviation, l and b are the factual border length and the perimeter of the bounding obj1, obj2 (Tian and Chen 2007).

    (3)Δhsp =N∑

    i=1

    Wi

    (

    nmerge�merge

    i−

    (

    nobj1�obj1

    i+ nobj2�

    obj2

    i

    ))

    (4)Δhcm = nmergelmerge

    nmerge−

    nobj1

    lobj1√

    nobj1+ nobj2

    lobj2√

    nobj2

    (5)Δhsm = nmergelmerge

    bmerge−

    (

    nobj1

    lobj1

    bobj1+ nobj2

    lobj2

    bobj2

    )

    (6)Δhsh = WcmΔhcm + (1 −Wcm)Δhsm

    Figure 3. Pan-sharpened WV-2 image before (left) and after (right) Wallis filtering.

  • GEOCARTO INTERNATIONAL 7

    Table 2 shows the segmentation parameters in four levels used for the most successful automatic extraction on WV-2 and the Wallis filtered version. The optimal values for segmentation parameters are defined considering the image characteristics used (e.g. resolution, viewing geometry, radiometry, and distortions), object properties (e.g. length, width, and roof types), and the resulting segmentation outcome. In the table, ‘level’ determines whether the newly generated image level will either overwrite a current level, or whether the generated objects shall become sub- or super-objects of a still existing level. The order of generating the levels affects the objects’ shape (top-down, bottom-up segmenta-tion). ‘Scale’ describes a threshold value to stop the fusion of objects and segmentation. That means, if the larger scale is used, more objects are fused in segmentation and larger objects occur. ‘Colour and Shape’ are the weight of spectral and shape homogeneity in segmentation and their total has to be 1. Accordingly, a high spectral homogeneity value indicates a lower influence of shape homogeneity in segmentation. ‘Compactness and Smoothness’ define whether the objects shall become more compact (fringed) or more smooth. ‘Weight of image channels’ determines the weight of an image channel on the segmentation. For the images that have comparable channels in size and content, such as WV-2, each channel should be weighted equally (Hofmann 2001; Benz et al. 2004). The used WV-2 image is not a true-ortho, which is why the nDSM version was not utilized in the segmentation, but used in classification functions. In image classifications, several customized object features, which are arith-metically calculated on an average band, and shape values were defined. For instance, different multi-plication functions such as the ratio of band to brightness (mean of all bands), nDSM and red, nDSM and brightness were determined. To ease the vectorization in classification, angular and rectangular membership functions were utilized, and the objects were extracted without additional defuzzification.

    In segmentation, the fusion of shape and spectral heterogeneity is designed on a two dimensional (2D) base. However, space-borne imagery has some distortions that cause misleading results in 2D segmentation. In Figure 4, on the WV-2 image, there are two instances of distortions as spectral explosions (4(c) and (d)), and unclear building borders mix with roads (4(a) and (b)), disrupting the spectral heterogeneity and preventing the acquisition of regular segments. To eliminate the influence of distortions and provide the correct separation of buildings and roads, we aimed to add elevation information as an additional band to the automatic classification.

    The performance of object extractions was validated by common classification criteria as precision (correctness), completeness, and the overall quality presented in Equations (8)–(10) as P, C and Q, respectively. In the equations, TP, FP, FN mean true positives, false positives, and false negatives (Clode et al. 2004; Mancini et al. 2009). In this research, TP represents the number of correctly extracted objects, FP symbolises incorrectly extracted objects, and FN stands for not extracted objects. For the calculations in the road layer, length is used instead of the number.

    (7)Δhoverall = (1 −Wsh)Δhsp +WshΔhsh

    (8)P =TP

    (TP + FP)

    Table 2. Segmentation parameters and applied thresholds for WV-2 imagery.

    DataSegmentation

    levels Scale Colour Shape Compactness Smoothness

    Weight

    R G BWV-2 1 10 0.4 0.6 0.5 0.5 1,1,0.1

    2 20 0.1 0.9 0.2 0.8 1,1,13 30 0.5 0.5 0.5 0.5 1,1,14 8 Spectral difference 1,1,1

    WV-2 Wallis 1 10 0.4 0.6 0.5 0.5 1,1,0.12 20 0.2 0.8 0.5 0.5 1,1,13 30 0.5 0.5 0.5 0.5 1,1,14 8 Spectral difference 1,1,1

  • 8 U. G. SEFERCIK ET AL.

    3.3. DSM, DTM, and nDSM generation and refinement

    The elevation of non-terrain objects was obtained by the generation of nDSM formulized in Equation (11). As seen in the equation, nDSM is a derivative product, obtained from DSM and DTM height differences (Hashemi 2008). DSMs and the DTMs are the basic components for mapping and are largely utilized for the production of credible ortho and true ortho-images bringing 3D functionality to aerial and space-borne imagery through ortho-rectification. The quality of these models directly effects the quality of the orthos. The WV-2 mono image used was ortho-rectified by image distributor Digital Globe.

    In DSM and DTM generation, first point-clouds were obtained from the automatic semi-global match-ing (SGM) of 24 aerial photos that cover the study area completely (Hirschmüller 2005; Guidi et al. 2014). The variables of automatic image matching are the non-terrain objects on the stereo-imagery and the most significant parameter is the correlation coefficient (r). The absolute r changes between −1 and 1, and depends on the linear relationship between the same objects in the stereo-imagery. The strongest linear relationship is determined with −1 or 1 which can be achieved from totally coherent

    (9)C =TP

    (TP + FN)

    (10)Q =TP

    (TP + FP + FN)

    (11)nDSM = DSM − DTM

    Figure 4. Distortions of WV-2 image; (a) unclear building borders, (b) influence of unclear building borders on segmentation, (c and d) spectral explosion.

  • GEOCARTO INTERNATIONAL 9

    objects. Conversely, 0 describes no matching and, if high quality matching is needed, a value of r near 0 is unacceptable. Normally, the absolute r is expected to be strong in open and flat areas, and weak in forest and rough terrain. The study area includes very low forest cover (3%), and the main targets are buildings and roads, which is why a high absolute r was preferred in the SGM. The point-clouds were determined under two classes as ‘ground’ and ‘others’. The ground points were used as DTM data, and the total of ground and others were utilized as DSM data. The outliers occurred by automatic matching algorithmic errors were filtered in two steps, before and after DSM/DTM generation. In the first step, the elevation limits of DSM and DTM data were calculated by determining the minimum and maximum elevations in vertical profiles. In this way, outliers out of elevation limits were excluded in DSM/DTM generation. The second step was carried out on created DSM and DTM raster data using automatic median filtering on neighbour pixels. With this method, the median of relative height differences between neighbour pixels was calculated, and disparities were detected and eliminated. The gaps which occur due to eliminated pixels are filled by a smooth least square interpolation. Figure 5(b) and (c) shows part of the DSM after the filtering of outliers and refinement, respectively.

    For the generation of raster DSMs and DTMs, due to the irregular nature of point clouds, interpo-lation is required. The interpolation of a DSM is more difficult than a DTM due to sudden changes in the height depending on buildings and vegetation (Clark et al. 2004). This is why a kriging algorithm, effective for dense point clouds, and a narrow search radius to detect sudden changes, were preferred for DSM generation. Conversely, considering the nearly flat topography, the nearest neighbour algorithm

    Figure 5. (a) Focused instance area on WV-2, (b) filtered DSM, (c) refined DSM.

  • 10 U. G. SEFERCIK ET AL.

    and a twice as large search radius were applied in DTM generation (Li and Heap 2008). During the creation of nDSM, the most significant parameter was the minimum height difference, which helps to separate buildings and vegetation. While the low values of this parameter cause a noisy nDSM because of non-eliminated vegetation, high thresholds may induce missing buildings or details apart from the highest parts of the roofs. In the study, considering minimum building heights in the area, different thresholds were used to get the most realistic nDSM. Normally, the minimum building height in the area is approx. 4 m, but given possible small height errors in the DSM and the DTM, the threshold was applied as 2.5 m. In this way, the heterogeneity of buildings and roads were increased and effect of vegetation was considerably reduced. Figure 6 illustrates the outcomes obtained from 2.5 to 5 m minimum height difference. In 5 m, the vegetation cover was considerably reduced; however, certain buildings and significant contents were lost.

    4. Results

    4.1. Manual extraction results

    In Figure 7 and Table 3, the manual extraction results are presented visually and statistically for the building and road classes. The average of precision, completeness and overall quality in manual extraction is approximately 95%. This high value is greatly significant in this study as it is a unique confidence level to interpret the real information content of WV-2.

    In Table 3, TP and related indicator completeness determine the information content of WV-2 coherent with TOPO5000. Completeness is approx. 97% both for buildings and roads, which demon-strates a considerable coherence. Conversely, 118 buildings and 7620.62 m of road were described as FP, indicating that they were newly constructed between the acquisition dates of TOPO5000 and WV-2 (2009 and 2011). These results were used to interpret the automatic object extraction and caused errors.

    Figure 6. effect of minimum height difference on nDSM; (a) 2.5 m, (b) 5 m.notes: For the view of WV-2 to the same area, please see Figure 3.

  • GEOCARTO INTERNATIONAL 11

    4.2. Automatic extraction results

    Automatic object extraction was run for three data-sets as the WV-2 image, the Wallis filtered WV-2 image, and the Wallis filtered WV-2 image with nDSM. Figure 8 shows the generated DSM, DTM, and the nDSM, which was included in the classification as an additional band after segmentation. As seen in the Figure, the generated nDSM does not have any remarkable gaps or distortions, and shows the objects in the study area with an elevation over 2.5 m. Figure 9 displays the automatic extraction results of the WV-2 image, the Wallis filtered WV-2 image, and the Wallis filtered WV-2 image with nDSM. Table 4 shows the precision, completeness and overall quality of the automatic extractions separately.

    The previously mentioned overhanging building roofs in Figure 4(a) and (b) make parts of the roads appear rusty, disrupting the spectral heterogeneity and preventing the acquisition of regular segments. Therefore, in Figure 9(a), several roads were not correctly extracted and were included in falsely extracted buildings as FP. In Figure 9(b), it is clear that the Wallis filter curbed this problem through contrast adjustment, and the roads were mostly detected. However, a part of the buildings were detected as roads due to unusual roof properties. As mentioned before, the study area is a diffi-cult one for automatic object extraction, since there are six different roof types: flat, m-shaped, dome, hip, gable, hip and valley and six different roof materials: clay tile, asphalt shingle, aluminium, lead sheeting, glazed tile and gravel. For the m-shaped aluminium roofs in industrial areas, depending

    Figure 7. Manual extraction results; (a) building, (b) road centreline.

    Table 3. Precision, completeness, and overall quality of manual extraction.

    Building (number)

    P (%) C (%) Q (%)

    Road (length) (m)

    P (%) C (%) Q (%)TP FN FP TP FN FPref 3116 0 0 96.2 96.6 93.0 123,495.77 0 0 94.0 97.2 91.6WV-2 3009 107 118 120,080.52 3415.25 7620.62

  • 12 U. G. SEFERCIK ET AL.

    Figure 8. (a) DtM, (b) DSM, and (c) nDSM with height scale.

    Figure 9. automatic extracted classes (building = red, road = grey, vegetation = green); (a) WV-2, (b) Wallis filtered WV-2, (c) Wallis filtered WV-2 with nDSM.

  • GEOCARTO INTERNATIONAL 13

    upon the viewing geometry of the satellite, the grey values of pixels in the two parts of the m-shape behave differently (dark and bright). After Wallis filtering, the segments split and decreased because of contrast enhancement, and some of the isotropic parts were extracted as road (yellow rectangular in Figure 9(b)). Another cause of errors in industrial areas is the overly long and narrow factories with aluminium roofs. Given their shape and similar spectral structure with roads, they cannot be correctly extracted with Wallis filtering (black rectangular in Figure 9(b)). Another instance, the domes of mosques are lead sheeting and shine in the sunlight, explaining why they are not included in the building class (white rectangular in Figure 9(b)). All of these problems were solved with the nDSM-based novel fusion approach (yellow, black and white rectangles in Figure 9(c)).

    In Table 4, only the road centrelines which fully overlap with the reference TOPO5000 road layer are accepted and calculated as road length. Considering the temporal decorrelation between WV-2 and TOPO5000 (2009 and 2011), the TP and the related completeness parameter are assumed to be the main indicators for the validation of extraction performance. The large portion of FP in out-comes with nDSM is newly constructed buildings and roads that can also be utilized for the revision of TOPO5000. It is clear in Table 4, the novel approach improves the extraction of TP and decreases the FP at the same time both for buildings and roads. In comparison with one of the most common filtering method Wallis, the overall qualities using nDSM fusion are considerably higher.

    5. Conclusions

    Accurate automatic object extraction is always a challenging issue for space-borne remotely sensed imagery due to imaging geometry and the properties of the CCD sensors. In this study, we suggested a novel approach to improve the performance of automatic object extraction on the basis of fusing WV-2 VHR mono-image and TOPO5000 data. Utilizing digital stereo aerial photos, used in the production of TOPO5000; DSM, DTM, and consequently nDSM, were generated and validated in the classification as an additional band. Automatic object extraction was performed three times on a WV-2 ortho-image, a Wallis filtered version and a Wallis + nDSM version, respectively. Additionally, manual object extraction was realized to determine the information content of the used WV-2 image, and more accurate interpretation of the automatic extraction outcomes. The improvement in the performance of an automatic building and road extraction was monitored visually and statistically using common indicators as precision, completeness and overall quality. Wallis filtering increased the heterogeneity of the objects, and consequently the potential of automatic extraction, particularly for the road layer through well-balanced contrast enhancement. On the other hand, contrast enhance-ment caused segmentation problems on different roof types and materials, explaining why a group of buildings was counted in the road layer. These challenges were overcome through the inclusion of elevation information on the objects, derived from the nDSM fusion in the classification. The results demonstrated that the novel fusion approach improved the overall quality of the automatic object extraction by 7% for buildings and 23% for roads.

    Table 4. Precision, completeness and overall quality of automatic extractions.

    Data

    Building (number)

    P (%) C (%) Q (%)

    Road midline (length)(m)

    P (%) C (%) Q (%)TP FN FP TP FN FPref 3116 0 0 100 100 100 123,495.77 0 0 100 100 100WV-2 2487 629 573 81.3 79.8 67.4 42,874.93 80,620.84 9064.59 82.5 34.7 32.3WV-2 2460 656 355 87.4 79.0 70.9 87,649.80 35,845.97 55,380.46 61.3 71.0 49.0WallisWV-2 2538 578 296 89.6 81.5 74.4 90,074.60 33,421.17 38,467.71 70.1 72.9 55.61Wallis + nDSM

  • 14 U. G. SEFERCIK ET AL.

    The used WV-2 image is an ortho-image, ortho-rectified by the distributor, which is why nDSM is incorporated in the classification. If our technique is used on a true ortho-image, nDSM can be used in the segmentation with a greater improvement in automatic object extraction. This investigation is a future goal of our research group.

    The users, who do not have space-borne stereo data, can increase the automatic extraction perfor-mance of information content from mono data by following the processing steps that we proposed. The proposed techniques may be beneficial especially in underdeveloped and developing countries which have limited financial sources for achieving stereo satellite imagery.

    AcknowledgementsThe authors are grateful to the Bursa Greater Municipality for providing the Worldview-2 image, digital aerial photos and TOPO5000. Additionally, many thanks are due to Dr Karsten Jacobsen and Dr Wilfred Linder for BLUH and LISA software.

    Disclosure statementNo potential conflict of interest was reported by the authors.

    ORCIDU. G. Sefercik   http://orcid.org/0000-0003-2403-5956

    ReferencesAguilar MA, Saldana MM, Aguilar FJ. 2012. GeoEye-1 and WorldView-2 pan-sharpened imagery for object-based

    classification in urban environments. Int J Remote Sens. 34(7):2583–2606.Baatz M, Benz U, Dehghani S, Heynen M, Höltje A, Hofmann P, Lingenfelder I, Mimler M, Sohlbach M, Weber M,

    et al. 2004. Ecognition professional: user guide: 4. Munih: Definiens Imaging; p. 486.Baiocchi V, Brigante R, Radicioni F. 2010. Three-dimensional multispectral classification and its application to early

    seismic damage assessment. Eur J Remote Sens. 42:49–65.Ballabeni, A, Apollonio, FI, Gaiani, M, Remondino, F. 2015. Advances in image pre-processing to improve automated

    3d reconstruction. In: ISPRS – International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences; 25–27 Feb; Avila, Spain, 40(5), p. 315–323.

    Baltsavias EP, Gruen A, Van Gool L. 2001. Automatic extraction of man-made objects from aerial and space images (III). Ascona: Taylor & Francis. A. A. Balkema.

    Bedard N, Hagen N, Gao L, Tkaczyk TS. 2012. Image mapping spectrometry: calibration and characterization. Optical Eng. 51(11):14.

    Benarchid O, Raissouni N, El Adib S, Abbous A, Azyat A, Achhab NB, Lahraoua M, Chahboun A. 2013. Building extraction using object-based classification and shadow information in very high resolution multispectral images, a case study: Tetuan, Morocco. Can J Image Process Comput Vision. 4(1):1–8.

    Benz UC, Hofmann P, Willhauck G, Lingenfelder I, Heynen M. 2004. Multi-resolution, object-oriented fuzzy analysis of remote sensing data for GIS-ready information. ISPRS J Photogramm Remote Sens. 58(3–4):239–258.

    Bhadauria A, Bhadauria H, Kumar A. 2013. Building extraction from satellite images. IOSR J Comput Eng. 12(2):76–81.Clark ML, Clark DB, Roberts DA. 2004. Small-footprint lidar estimation of sub-canopy elevation and tree height in a

    tropical rain forest landscape. Remote Sens Environ. 91:68–89.Clode, S, Kootsookos, P, Rottensteiner F. 2004. The automatic extraction of roads from LiDAR data. In: XXth ISPRS

    Congress; 12–23 Jul; Istanbul 35. p. 231–236.Font M, Amorese D, Lagarde J-L. 2010. DEM and GIS analysis of the stream gradient index to evaluate effects of tectonics:

    the normandy intraplate area (NW France). Geomorphology. 119(3–4):172–180.Fraser CS. 2003. Prospects for mapping from high resolution satellite imagery. Asian J Geoinformatics. 4(1):3–10.Guidi, G, Gonizzi, S, Micoli, LL. 2014. Image pre-processing for optimizing automated photogrammetry performances.

    In: ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences; 23–25 Jun; Riva del Garda, 2(5).

    Hashemi, SAM. 2008. Automatic peaks extraction from normalized digital surface model (ndsm). In: International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences; 3–11 Jul; Beijing, 37(3), pp. 491–496.

    http://orcid.orghttp://orcid.org/0000-0003-2403-5956

  • GEOCARTO INTERNATIONAL 15

    Heipke, C, Mayer, H, Wiedemann, C, Jamet, O. 1997. Evaluation of automatic road extraction. In: 3D Reconstruction and Modeling of Topographic Objects; 17–19 Sep; Stuttgart, IAPRS, 32(3) pp. 47–56.

    Hirschmüller, H. 2005. Accurate and efficient stereo processing by semi-global matching and mutual information. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 20–26 Jun; San Diego, CA, USA.

    Hofmann, P. 2001. Detecting informal settlements from Ikonos image data using methods of object oriented image analysis-an example from Cape Town (South Africa). In: Jürgens C, editor. Remote Sensing of Urban Areas / Fernerkundung in urbanen Rumen, Regensburger Geographische Schriften; Munich: Regensburger Geographische Schriften; p. 107–118.

    Hong G. 2007. Image fusion, image registration, and radiometric normalization for high resolution image processing [PhD]. Fredericton: University of New Brunswick.

    Jain MK, Singh VP. 2005. DEM-based modelling of surface runoff using diffusion wave equation. J Hydrol. 302 (1–4):107–126.

    Jazayeri, I, Fraser, CS. 2008. Interest operators in close-range object reconstruction. In: The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences; 3–11 Jul; Beijing, 37(5), pp. 69–74.

    Karantzalos, K, Paragios, N. 2009. Competing 3D priors for object extraction in remote sensing data, In: Stilla U, Rottensteiner F, Paparoditis N, editors. Object Extraction for 3D City Models, Road Databases and Traffic Monitoring – Concepts, Algorithms and Evaluation; 3–4 Sep; Paris, 38(3), pp. 127–132.

    Li, J, Heap, AD. 2008. A review of spatial interpolation methods for environmental scientists. Geoscience Australia, Record 2008/23.

    Mancini, A, Frontoni, E, Zingaretti, P. 2009. A winner takes all mechanism for automatic object extraction from multi-source data. In: Geoinformatics, 17th International Conference, 12–14 Aug. p. 1–6; Fairfax, VA.

    Maxwell T. 2005. Object-oriented classification of pan-sharpening quickbird imagery and a fuzzy approach to improving image segmentation efficiency [PhD]. Fredericton: University of New Brunswick.

    Meenakshisundaram V. 2005. Quality assesment of ikonos and quickbird fused images for urban mapping [PhD]. Calgary: University of Calgary.

    Navalgund RR, Jayaraman V, Roy PS. 2007. Remote sensing applications: an overview. Curr Sci. 93(12):1747–1766.Niederöst, M. 2000. Reliable reconstruction of buildings for digital map revision. In: International Archives of

    Photogrammetry and Remote Sensing; 16–23 Jul; Amsterdam, 38(3), pp. 635–642.Nikolakopoulos KG. 2008. Comparison of nine fusion techniques for very high resolution data. Photogramm Eng

    Remote Sens. 74(5):647–659.Pedelty JA, Morisette JT, Smith JA. 2004. Comparison of Landsat-7 enhanced thematic mapper plus (ETM+) and earth

    observing one (EO-1) advanced land imager. Optical Eng. 43(4):954–962.Rahmani S, Strait M, Merkurjev D, Moeller M, Wittman T. 2010. An adaptive IHS pan-sharpening method. IEEE Geosci

    Remote Sens Lett. 7(4):746–750.Rani K, Sharma R. 2013. Study of different image fusion algorithm. Int J Emerg Technol Adv Eng. 3(5):288–291.Schmidt F, Persson A. 2003. Comparison of DEM data capture and topographic wetness indices. Precis Agric. 4(2):

    179–192.Sefercik, U, Atesoglu, A. 2013. Uydu verileri kullanılarak meşcere boyu belirlenmesine yönelik yeni bir yaklaşım.

    TUFUAB 7thTechnical Symposium; 23–25 May; Trabzon. p. 5.Shaw GA, Burke HK. 2003. Spectral imaging for remote sensing. Lincoln Lab J. 14(1):3–28.Smreček R. 2012. Utilization of ALS data for forestry purposes. In: Jekel T, Car A, Strobl J, Griesebner G, editors. GI

    forum 2012: geovizualisation, society and learning; Berlin: Herbert Wichmann Verlag; p. 365–375.Sportouche H, Tupin F, Denise L 2009. Building extraction and 3D reconstruction in urban areas from high resolution

    optical and SAR imagery. In: Urban Remote Sensing Event, 20–22 May. Shanghai: Institute of Electrical and Electronics Engineers (IEEE); p. 1–11.

    Stereńczak K, Kozak J. 2011. Evaluation of digital terrain models generated in forest conditions from airborne laser scanning data acquired in two seasons. Scand J Forest Res. 26(4):374–384.

    Stereńczak K, Będkowski K, Weinacker H. 2008. Accuracy of crown segmentation and estimation of selected trees and forest stand parameters in order to resolution of used DSM and nDSM models generated from dense small footprint LIDAR data. In: International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences; 3–11 Jul; Beijing, 37(6), pp. 27–32.

    Straub BM, Gerke M, Koch A. 2001. Automatic extraction of trees and buildings from image and height data in an urban environment. In: Proceedings of the International Workshop on Geo-Spatial Knowledge Processing for Natural Resource Management, 28–29 Jun; Varese.

    Theng LB. 2006. Automatic building extraction from satellite imagery. Eng Lett. 13(3):255–259.Thompson JA, Bell JC, Butler CA. 2001. Digital elevation model resolution: effects on terrain attribute calculation and

    quantitative soil-landscape modeling. Geoderma. 100(1–2):67–89.Tian J, Chen DM. 2007. Optimization in multi‐scale segmentation of high‐resolution satellite images for artificial feature

    recognition. Int J Remote Sens. 28(20):4625–4644.UTSA. 2013. Object-oriented classification, Lecture 11. [online] (updated May 2016). [accessed 2016 Jun 6] http://www.

    utsa.edu/lrsg/Teaching/EES5083/L11.ppt.

    http://www.utsa.edu/lrsg/Teaching/EES5083/L11.ppthttp://www.utsa.edu/lrsg/Teaching/EES5083/L11.ppt

  • 16 U. G. SEFERCIK ET AL.

    Vassilopoulou S, Hurni L, Dietrich V, Baltsavias E, Pateraki M, Lagios E, Parcharidis I. 2002. Orthophoto generation using IKONOS imagery and high resolution DEM: a case study on volcanic hazard monitoring of Nisyros Island (Greece). ISPRS J Photogramm Remote Sens. 57(1–2):24–38.

    Zhang, Y. 2002. A new automatic approach for effectively fusing Landsat 7 as well as IKONOS images. In: Geoscience and Remote Sensing Symposium, IGARSS; 24–28 Jun; Toronto.

    Abstract1. Introduction2. Study area and materials3. Methodology3.1. Image enhancement3.2. Manual and automatic object extraction3.3. DSM, DTM, and nDSM generation and refinement

    4. Results4.1. Manual extraction results4.2. Automatic extraction results

    5. ConclusionsAcknowledgementsDisclosure statementReferences