arXiv:1811.07769v1 [cs.CV] 10 Nov 2018 · MIT Media Lab Abstract More than half of the world’s roads lack adequate street addressing systems. Lack of addresses is even more visible

Addressing the Invisible: Street Address Generationfor Developing Countries with Deep Learning

Ilke DemirFacebook

Ramesh RaskarMIT Media Lab

Abstract

More than half of the world’s roads lack adequate street addressing systems. Lackof addresses is even more visible in daily lives of people in developing countries.We would like to object to the assumption that having an address is a luxury, byproposing a generative address design that maps the world in accordance withstreets. The addressing scheme is designed considering several traditional streetaddressing methodologies employed in the urban development scenarios around theworld. Our algorithm applies deep learning to extract roads from satellite images,converts the road pixel confidences into a road network, partitions the road networkto find neighborhoods, and labels the regions, roads, and address units using graph-and proximity-based algorithms. We present our results on a sample US city, andseveral developing cities, compare travel times of users using current ad hoc andnew complete addresses, and contrast our addressing solution to current industrialand open geocoding alternatives.

1 Introduction

Street addresses enhance precise physical presence and effectively increase the connectivity all aroundthe world. Currently 75% of the roads in the world are not mapped [16], and this number is increasingin developing countries. United Nations claims to have 4 billion invisible people in the world dueto the addressing problem in developing countries. This problem is even more critical in disasterzones, areas with limited resources, and geographically challenging locations. As the remote sensingtechnology has been significantly improving over the past decade, the organic growth of urban andsuburban areas outruns the deployment of addressing schemes. We want to address the invisible. Weimagine an algorithm that automatically creates meaningful addresses for unmapped areas, areas withno street name or address.

In order to realize our goal, we designed and implemented a generative addressing system to bridgethe gap between grid-based digital addressing schemes and traditional street addresses. We (i) designa physical addressing scheme, which is linear, hierarchical, flexible, intuitive, perceptible, robust, (ii)propose a segmentation method to obtain road segments and regions from satellite imagery, usingdeep learning and graph-partitioning, (iii) implement a labeling method to name urban elementsbased on current addressing schemes and distance fields, and (iv) develop ready-to-deploy prototypeapplications supporting forward and inverse geoqueries.

2 Related Work

Geocoding approaches: Popular geocoding solutions are either not in human-readable form (e.g.,GooglePlaceID and OkHi), or tend to de-correlate from the topological structure (e.g., [16], Zipprand MapTags), and they lack essential properties of a street addressing system, such as linearityand hierarchy. Those properties as well as their human intersection become even more crucial indeveloping countries [1].

32nd Conference on Neural Information Processing Systems (NIPS 2018), Montréal, Canada.

arX

iv:1

811.

0776

9v1

[cs

.CV

] 1

0 N

ov 2

018

Procedural generation: Automating the generation of maps is extensively studied in the urbanprocedural modeling world [8, 29, 20, 3, 26], creating detailed and structurally realistic models,but none of them are applicable as a synthesis method on real world. On the other hand, inverseprocedural modeling approaches [2] process real-world data for generative representations. We followthis last path and rely on satellite imagery as the input of our synthesis approach.

Remote sensing: Several approaches automate the extraction of geospatial information using alreadyexisting data resources [31, 25, 19, 18, 35]. Similar approaches extract road networks using neuralnetworks [30, 36, 17, 33, 21, 22, 12].Processing the road topology has also been studied by clusteringand graph partitioning approaches [32, 4, 5].

Addresses around the world: Traditional street addresses and names are usually the result ofcultural dynamics, politics, economies, and other long-term processes adopted by urban authorities.We conducted some research on the current addressing methods such as the London postal codesystem [27], South Korea street naming, Berlin house numbering, and more [13].

3 The Street Address Format

Semantic properties emphasize user-friendly features, implying linear, hierarchical, universal, andmemorable addresses. Structural properties enable the format of street addresses to be computerfriendly, necessitating linear, hierarchical, compressible, robust, extendible, and queryable codes.Following these design principles (Figure 1), the last field indicates version, the fourth field containsthe country and state information when applicable, preceded by the city information in the third field.The second field contains the road name, which starts with the region label, followed by the roadnumber. Lastly, the first field is composed of the meter marker along the road and the block letterfrom the road, animating the house number and apartment number consecutively.

Figure 1: Street Address Format: house number, road name, city, state (if applicable), country,followed by the version year.

4 Our Generative Addressing System

The segmentation step extracts roads, breaks them into road segments and clusters them into regions.The labeling step names the regions, road segments, and place markers and assigns block letters toindividual addressable units. Details of each module can be found in our extended journal paper [10].

Predicting Road Pixels: The first step of our approach creates binary road prediction images fromthree channel satellite images of 0.5 m resolution and of size 19 K * 19 K. Both training and testingare done with patches of 192 × 192. The training set includes 4–16 tiles per country, and the testset includes all the rest of the tiles, manually spatially distributed to sample all areas, keeping theratios mostly at 70% to 30%. We use a modified version of SegNet [6], which consists of the first13 convolutional layers of the VGG16 network for the encoder, having a corresponding decoderlayer for each encoder. We modify the last soft-max layer to change the multi-class structure to havebinary classes for road detection, by substituting it with a convolutional layer. We experimented witharchitectures (Figure 2) such as VGG [24], U-Net [23], and ResNet [15] variations; however, weachieved the best result with the SegNet model trained on dense and diverse tiles, resulting in 72.6%precision and 57.2% recall. We also experimented with DeepLab [9] variations and achieved 75.4%precision and 75.9% recall; however the model showed signs of overfitting after epoch 30.

2

Figure 2: Comparison of NN Models. An example (a) satellite image and (b) ground truth; and roadpredictions using (c) VGG; (d) U-Net; (e) ResNet50; (f) ResNet101; (g) SegNet; and (h) DeepLab.

Creating the road graph: The post-processing includes binarizing the image with thresholding,running a depth-first search to join connected roads using the confidences, applying an orientation-based adaptive median filtering on the road end points, bucketing the road segments based on theirorientations, and processing intersections. Then we convert the road segments into a road graph,nodes as intersections, edges as road segments, and weights as the segment distance.

Defining regions: We partition the road graph into communities that have the maximum intercon-nectivity and minimum intraconnectivity. We experimented with normalized min-cut [34], Newman–Girvan [14], and optimal modularity-based [7] graph partitioning approaches, concluding on [34],being accurate and significantly more efficient choice.

Labeling regions, roads, and address units: We compute the most dense region by averagingnumber of roads per unit area, and we name this region “CA” for the city center. We divide all otherregions into four orientations: N(orth), S(outh), W(est), and E(ast) and assign letters in that specificorder, following the spiral pattern of the London post code system. The roads in each region aredivided into two main directions: odd for north–south bound, and even for east–west bound, thennumbered according to their order. For each road segment, we place a virtual meter marker every 5m, on the left and right sides of the roads by even and odd numbering. Finally, we compute a distancefield of the roads and discretize that field by a 5 m step size, as the block letter.

5 Results and Applications

Our system is written in Python and C++; the implementation is on the CPU (except road prediction).We use Chainer [28], networkx, and sci-kit libraries. We used our approach to process more than 10cities, totaling up to more than 16 K km2. The source code to convert .osm files and geotiffs to streetaddresses is available on our repository1.

We compare the road predictions with ground truth for the extracted roads of an unmapped suburbanarea. Our SegNet model and post-processing approach were able to learn 90.51% of the roads. Thissuccess ratio was close to 80% on average per city.

Figure 3: We show (a) input satellite tile; (b) extracted roads; (c) created regions; and (d) generatedmap; compared to (e) OpenStreetMap (OSM) of the same area.

We evaluate the usefulness of our generative maps with some treasure hunt-like user experiences.We compared the travel times using the old and new addressing schemes. Overall, the travel times

1https://github.com/facebookresearch/street-addresses

3

decreased by 21.7%, with our system producing a 52.4 s improvement on average and decreasing thelast mile of activity, proving the accuracy of our addresses.

We compare street segments dictated by the traditional addresses and our generative addresses(Figure 3). Comparing the road segments, we accomplished extracting 95% of the roads in thatparticular city tile. Comparing the addresses, although traditional addresses are more established, ouraddresses are easier to remember and support intuitive self-location and navigation.

Figure 4: Satellite image, extracted roads, labeled regions and roads, and meter markers and blocksof three example developing cities.

However, keeping the motivation of providing street addresses to the approximately 4 billion uncon-nected people, our results in fact shine for developing countries. Figure 4 shows our generative mapsin the same format, on three different unmapped developing cities. We accomplished automaticallyaddressing more than 80% of the populated areas, which significantly improved map coverage.

We compare our maps to other popular addressing solutions. For the same point on earth, [16] outputsthree random words parrot.failed.casino, Google Maps contains some unlabelled roads; however itoutputs Green Park for a couple of kilometers around the point. OSM does not contain roads, andthe point can be reached only by its latitude and longitude. However, our approach extract the roadsalmost completely and assign a unique address as 715D.NE127.Dhule.MhIn.

6 Conclusions

We demonstrated that deep learning can be used in a world-wide system for automatically creatingstreet addresses from satellite imagery for developing countries in the world. Physically connectingthe unconnected populations should increase the economic, juridical, and life-sustaining involvementof people all around the world. It improves the outreach of businesses and the economy, as well asthe accuracy and efficiency of providing first aid in disaster zones. More evaluation, results, anddetails about determinism and complexity analysis of submodules, constructing a global addressspace, versioning and updating, handling missing city boundaries, overflowing regions, and 3D roadscan be found in our previously published works [10, 11].

4

References[1] What is the right addressing scheme for india? http://mitemergingworlds.com/blog/2017/11/

22/what-is-the-right-addressing-scheme-for-india. Accessed: 2017-12-01.[2] D. G. Aliaga, I. Demir, B. Benes, and M. Wand. Inverse procedural modeling of 3d models for virtual

worlds. In ACM SIGGRAPH 2016 Courses, SIGGRAPH ’16, pages 16:1–16:316, New York, NY, USA,2016. ACM.

[3] D. G. Aliaga, C. A. Vanegas, and B. Benes. Interactive example-based urban layout synthesis. ACM Trans.Graph., 27(5):160:1–160:10, Dec. 2008.

[4] R. Alshehhi and P. R. Marpu. Hierarchical graph-based segmentation for extracting road networks fromhigh-resolution satellite images. {ISPRS} Journal of Photogrammetry and Remote Sensing, 126:245 – 260,2017.

[5] T. Anwar, C. Liu, H. L. Vu, and C. Leckie. Partitioning road networks using density peak graphs: Efficiencyvs. accuracy. Information Systems, 64:22 – 40, 2017.

[6] V. Badrinarayanan, A. Kendall, and R. Cipolla. Segnet: A deep convolutional encoder-decoder architecturefor image segmentation. CoRR, abs/1511.00561, 2015.

[7] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities in largenetworks. Journal of Statistical Mechanics: Theory and Experiment, (10):P10008+, Oct. 2008.

[8] G. Chen, G. Esch, P. Wonka, P. Müller, and E. Zhang. Interactive procedural street modeling. ACM Trans.Graph., 27(3):103:1–103:10, Aug. 2008.

[9] L. C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. Deeplab: Semantic imagesegmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Transactionson Pattern Analysis and Machine Intelligence, PP(99):1–1, 2017.

[10] I. Demir, F. Hughes, A. Raj, K. Dhruv, S. M. Muddala, S. Garg, B. Doo, and R. Raskar. Generative streetaddresses from satellite imagery. ISPRS International Journal of Geo-Information, 7(3), 2018.

[11] I. Demir, F. Hughes, A. Raj, K. Tsourides, D. Ravichandran, S. Murthy, K. Dhruv, S. Garg, J. Malhotra,B. Doo, G. Kermani, and R. Raskar. Robocodes: Towards generative street addresses from satellite imagery.In IEEE International Conference on Computer Vision and Pattern Recognition Workshops, July 2017.

[12] I. Demir, K. Koperski, D. Lindenbaum, G. Pang, J. Huang, S. Basu, F. Hughes, D. Tuia, and R. Raskar.Deepglobe 2018: A challenge to parse the earth through satellite images. In IEEE Conference on ComputerVision and Pattern Recognition (CVPR) Workshops, June 2018.

[13] C. Farvacque-Vitkovic and W. Bank. Street addressing and the management of cities / Catherine Farvacque-Vitkovic ... [et al.]. World Bank Washington, D.C, 2005.

[14] M. Girvan and M. E. Newman. Community structure in social and biological networks. Proceedings of thenational academy of sciences, 99(12):7821–7826, 2002.

[15] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In 2016 IEEEConference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, June 2016.

[16] G. R. Jones. Human friendly coordinates. GeoInformatics, 18(5):10–12, Jul 2015.[17] P. Li, Y. Zang, C. Wang, J. Li, M. Cheng, L. Luo, and Y. Yu. Road network extraction via deep learning

and line integral convolution. In 2016 IEEE International Geoscience and Remote Sensing Symposium(IGARSS), pages 1599–1602, July 2016.

[18] G. Mattyus, W. Luo, and R. Urtasun. Deeproadmapper: Extracting road topology from aerial images. InThe IEEE International Conference on Computer Vision (ICCV), Oct 2017.

[19] G. Mattyus, S. Wang, S. Fidler, and R. Urtasun. Enhancing road maps by parsing aerial images around theworld. In 2015 IEEE International Conference on Computer Vision (ICCV), pages 1689–1697, Dec 2015.

[20] Y. I. H. Parish and P. Müller. Procedural modeling of cities. In Proceedings of the 28th Annual Conferenceon Computer Graphics and Interactive Techniques, SIGGRAPH ’01, pages 301–308, New York, NY, USA,2001. ACM.

[21] R. Peteri, J. Celle, and T. Ranchin. Detection and extraction of road networks from high resolution satelliteimages. In Proceedings 2003 International Conference on Image Processing (Cat. No.03CH37429),volume 1, pages I–301–4 vol.1, Sept 2003.

[22] C. Poullis, S. You, and U. Neumann. A vision-based system for automatic detection and extraction of roadnetworks. In 2008 IEEE Workshop on Applications of Computer Vision, pages 1–8, Jan 2008.

[23] O. Ronneberger, P. Fischer, and T. Brox. U-net: Convolutional networks for biomedical image segmentation.CoRR, abs/1505.04597, 2015.

[24] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition.CoRR, abs/1409.1556, 2014.

[25] G. Skoumas, D. Pfoser, A. Kyrillidis, and T. Sellis. Location estimation using crowdsourced spatialrelations. ACM Trans. Spatial Algorithms Syst., 2(2):5:1–5:23, June 2016.

[26] J. Sun, X. Yu, G. Baciu, and M. Green. Template-based generation of road networks for virtual citymodeling. In Proceedings of the ACM Symposium on Virtual Reality Software and Technology, VRST ’02,pages 33–40, New York, NY, USA, 2002. ACM.

5

http://mitemergingworlds.com/blog/2017/11/22/what-is-the-right-addressing-scheme-for-india

http://mitemergingworlds.com/blog/2017/11/22/what-is-the-right-addressing-scheme-for-india

[27] The City of London. London postal code system, 2016. https://www.london.gov.uk/sites/default/files/gla_postcode_map_a3_map1.pdf.

[28] S. Tokui, K. Oono, S. Hido, and J. Clayton. Chainer: a next-generation open source framework for deeplearning. In Proceedings of Workshop on Machine Learning Systems (LearningSys) in The Twenty-ninthAnnual Conference on Neural Information Processing Systems (NIPS), 2015.

[29] C. A. Vanegas, T. Kelly, B. Weber, J. Halatsch, D. G. Aliaga, and P. Muller. Procedural generation ofparcels in urban modeling. Comput. Graph. Forum, 31(2pt3):681–690, May 2012.

[30] J. Wang, J. Song, M. Chen, and Z. Yang. Road network extraction: a neural-dynamic framework basedon deep learning and a finite state machine. International Journal of Remote Sensing, 36(12):3144–3169,2015.

[31] Y. Wang, X. Liu, H. Wei, G. Forman, and Y. Zhu. Crowdatlas: Self-updating maps for cloud and personaluse. In Proceeding of the 11th Annual International Conference on Mobile Systems, Applications, andServices, MobiSys ’13, pages 469–470, New York, NY, USA, 2013. ACM.

[32] J. D. Wegner, J. A. Montoya-Zegarra, and K. Schindler. A higher-order crf model for road networkextraction. In 2013 IEEE Conference on Computer Vision and Pattern Recognition, pages 1698–1705,June 2013.

[33] L. Xu, T. Jun, Y. Xiang, C. JianJie, and G. LiQian. The rapid method for road extraction from high-resolution satellite images based on usm algorithm. In 2012 International Conference on Image Analysisand Signal Processing, pages 1–6, Nov 2012.

[34] S. X. Yu and J. Shi. Multiclass spectral clustering. In Proceedings of the Ninth IEEE InternationalConference on Computer Vision - Volume 2, ICCV ’03, pages 313–, Washington, DC, USA, 2003. IEEEComputer Society.

[35] D. Zeng, T. Zhang, R. Fang, W. Shen, and Q. Tian. Neighborhood geometry based feature matching forgeostationary satellite remote sensing image. Neurocomputing, 236:65 – 72, 2017. Good Practices inMultimedia Modeling.

[36] J. Zhao and S. You. Road network extraction from airborne lidar data using scene context. In 2012 IEEEComputer Society Conference on Computer Vision and Pattern Recognition Workshops, pages 9–16, June2012.

6

https://www.london.gov.uk/sites/default/files/gla_postcode_map_a3_map1.pdf

https://www.london.gov.uk/sites/default/files/gla_postcode_map_a3_map1.pdf

Documents

arXiv:1811.07769v1 [cs.CV] 10 Nov 2018 · MIT Media Lab Abstract More than half of the world’s roads lack adequate street addressing systems. Lack of addresses is even more visible