
Page 1:


U.S. Department of the Interior
U.S. Geological Survey
Michael P. Finn

Briefing to a pre-conference workshop of the 27th International Cartographic Conference: Spatial data infrastructures, standards, open source and open data for geospatial (SDI-Open 2015)
20-21 August 2015, Brazilian Institute of Geography and Statistics (IBGE), Rio de Janeiro, Brazil

Using the Message Passing Interface (MPI) and Parallel File Systems for Processing Large Files in the LAS Format

Page 2:

Co-Authors

• Jeffrey Wendel – Lead Author, U.S. Geological Survey (USGS), Center of Excellence for Geospatial Information Science (CEGIS)
• John Kosovich – USGS, Core Science Analytics, Synthesis, & Libraries (CSAS&L)
• Jeff Falgout – USGS, CSAS&L
• Yan Liu – CyberInfrastructure and Geospatial Information Laboratory (CIGI), National Center for Supercomputing Applications (NCSA), University of Illinois at Urbana-Champaign (UIUC)
• Frank E. Velasquez – USGS, CEGIS

Page 3:

Outline

• Objective
• Test Environment / XSEDE
• Study Areas
• Creating DEMs from Point Clouds
• Modifying open source software
• p_las2las Implementation
• p_las2las Results
• p_points2grid Implementation
• p_points2grid Results
• Literature
• Questions

Page 4:

Objective

• Problem: Create high-resolution DEMs from large lidar datasets

• Approach: Modify open source software to run in parallel and test on XSEDE supercomputers (clusters)

Page 5:

XSEDE – Extreme Science and Engineering Discovery Environment

• A 5-year, $127 million (1.27 × 10⁸) project funded by NSF
• Supports 16 supercomputers and high-end visualization and data analysis resources
• One of those is “Stampede” at the Texas Advanced Computing Center (TACC)
• Accessed through University of Illinois (Liu Co-PI) allocations
• Current (yearly) allocation of 8 million (8.0 × 10⁶) computing hours
• Accessed through USGS “Campus Champion” allocations

Page 6:

Stampede

• Peak Performance (tflops): 9,600
• Number of Cores: 522,080
• Memory (TB): 270
• Storage (PB): 14
• https://portal.xsede.org/tacc-stampede

Page 7:

What is a “Supercomputer” Anyway?

• Collection of nodes (Xeon X5-based computers)
• Networked together (InfiniBand)
• Running a common OS (Linux)
• Shared parallel file system (Lustre)
• Running a scheduler (SLURM)
• An inter-process communication (IPC) mechanism (MPI: MPI_Send, MPI_Recv, MPI_File_write_at, ...) – a toy example follows this list
• A good example is Stampede
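The MPI calls named above are only listed on the slide; the toy program below (not part of the briefing, with an illustrative file name and message) sketches what they look like in practice: one point-to-point exchange and a write into a single shared file at a rank-dependent offset.

// Minimal MPI sketch: point-to-point messaging plus shared-file output.
// Illustrative only; file name, message, and record size are assumptions.
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Point-to-point: rank 0 hands a token to rank 1, if rank 1 exists.
    int token = 42;
    if (rank == 0 && size > 1) {
        MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    // Parallel I/O: every rank writes a fixed-size record into one shared
    // file at an offset derived from its rank, so no rank overwrites another.
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "shared_output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    char record[32] = {};
    std::snprintf(record, sizeof(record), "rank %d was here\n", rank);
    MPI_Offset offset = static_cast<MPI_Offset>(rank) * sizeof(record);
    MPI_File_write_at(fh, offset, record, static_cast<int>(sizeof(record)),
                      MPI_CHAR, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}

Built with mpicxx and launched with mpirun (or, on Stampede, through a SLURM job script), each rank lands its record at a distinct offset in the shared file; p_las2las and p_points2grid rely on the same offset-based pattern, just with LAS points and DEM rows instead of a toy string.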

Page 8:


Page 9:

Great Smoky Mtn. Study Area

Page 10:

Grand Canyon Study Area

Page 11:

Create high-resolution DEMs from large lidar datasets

• DEM resolution of 1 meter
• Lidar datasets with coverage over about a 15 × 15 minute footprint
• This results in a raster size of approximately 27,000 rows × 27,000 columns, about 750 million cells (the arithmetic is worked out after this list)
• Obtained two datasets in LAS 1.2 format
  • A 16 GB file with 500 million (5.7 × 10⁸) points in the Smoky Mountains, over a 40,000 × 20,000 meter area
  • A 120 GB file with 4 billion (4.2 × 10⁹) points over the Grand Canyon, over a 25,000 × 30,000 meter area
• Both files are somewhat sparse in that some “tiles” within the coverage are missing
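For scale, 27,000 rows × 27,000 columns works out to 27,000 × 27,000 = 729,000,000 cells, so each DEM holds roughly three-quarters of a billion elevation values.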

Page 12:

Modify open source software to run in parallel

(and test on XSEDE clusters)

• las2las from the LAStools suite to filter all but ground points

• points2grid to make the DEM from the filtered LAS file

• Test on Stampede, at TACC

Page 13:

p_las2las Implementation

• The las2las application and the supporting LASlib library were extended with the MPI API to allow the application to run in parallel on a cluster

• The goal was an application that would scale to arbitrarily large input

• Limited only by the amount of disk space needed to store the input and output files

• No intermediate files are generated, and individual process memory requirements are not determined by the size of the input or output

Page 14:

p_las2las Implementation
Comparison of Native las2las versus p_las2las

• Native las2las algorithm:
  • For all points:
    • Read the point
    • Apply a filter and/or transformation
    • Write the possibly transformed point if it passes the filter

• p_las2las algorithm (a simplified offset-calculation sketch follows below):
  • Processes determine their point range and set input offsets
  • For all points in a process's point range:
    • Read the point and apply the filter
    • Keep a count of points that pass the filter
  • Processes gather filtered point counts from the other processes
  • Processes can then set write offsets
  • Processes set read offsets back to their beginning point and begin a second read over all points:
    • Read the point and apply the filter and transformation
    • Write the possibly transformed point if it passes the filter
  • Gather and reduce point counts, return counts, and min and max x values
  • Update the header with the rank 0 process
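To make the offset bookkeeping in the p_las2las column concrete, here is a simplified sketch of how a rank could derive its point range and, after the first filtering pass, turn the gathered per-rank counts into a byte offset for the second, writing pass. This is our own illustration, not the project source; the helper names and the choice of MPI_Allgather are assumptions.

// Sketch of per-rank point ranges and write offsets for a two-pass filter.
// Names and the MPI_Allgather-based exchange are illustrative assumptions.
#include <mpi.h>
#include <algorithm>
#include <cstdint>
#include <vector>

struct Range { int64_t first; int64_t count; };

// Assign rank `rank` a contiguous slice of `total_points` points.
Range point_range(int64_t total_points, int rank, int nprocs) {
    int64_t base = total_points / nprocs;
    int64_t rem  = total_points % nprocs;
    int64_t first = rank * base + std::min<int64_t>(rank, rem);
    int64_t count = base + (rank < rem ? 1 : 0);
    return {first, count};
}

// After the first pass, `kept_here` points survived the filter on this rank.
// Its output starts after the header plus all kept points of lower ranks.
int64_t write_offset(int64_t kept_here, int64_t header_size,
                     int64_t point_record_size, MPI_Comm comm) {
    int rank = 0, nprocs = 0;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nprocs);
    std::vector<int64_t> kept(nprocs);
    MPI_Allgather(&kept_here, 1, MPI_INT64_T,
                  kept.data(), 1, MPI_INT64_T, comm);
    int64_t before = 0;
    for (int r = 0; r < rank; ++r) before += kept[r];
    return header_size + before * point_record_size;
}

An MPI_Exscan over the kept counts would compute the same prefix sum with less traffic; the point of the sketch is only that every rank can determine its own write position without temporary files.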

Page 15:

p_las2las Implementation

• The high-level view of the p_las2las application:

• The vertical flow describes the job flow

• The processes across the top run in parallel within that flow

Page 16:

p_las2las Results
Smoky Mountains (16 GB)

Number of Processes | Filter / Transformation | Output Size | Elapsed Time (seconds)
Native* | None | 16 GB | 138
Native* | Keep Class 2 | 2 GB | 73
Native* | Reproject | 16 GB | 502
64 | None | 16 GB | 20
64 | Keep Class 2 | 2 GB | 6
64 | Reproject | 16 GB | 26
256 | None | 16 GB | 8
256 | Keep Class 2 | 2 GB | 4
256 | Reproject | 16 GB | 9
1024 | None | 16 GB | 8
1024 | Keep Class 2 | 2 GB | 5
1024 | Reproject | 16 GB | 8

* Native: unmodified las2las source code from LAStools, compiled on Stampede with the Intel C++ compiler

Page 17:

p_las2las Results
Grand Canyon (120 GB)

Number of Processes | Filter / Transformation | Output Size | Elapsed Time (seconds)
Native* | None | 120 GB | 1211
Native* | Keep Class 2 | 25 GB | 623
Native* | Reproject | 120 GB | 6969
64 | None | 120 GB | 128
64 | Keep Class 2 | 25 GB | 59
64 | Reproject | 120 GB | 150
256 | None | 120 GB | 33
256 | Keep Class 2 | 25 GB | 18
256 | Reproject | 120 GB | 42
1024 | None | 120 GB | 18
1024 | Keep Class 2 | 25 GB | 9
1024 | Reproject | 120 GB | 24

* Native: unmodified las2las source code from LAStools, compiled on Stampede with the Intel C++ compiler

Page 18:

p_points2grid Implementation

• The points2grid application was extended with the MPI API to allow the application to run in parallel on a cluster

• The goal was an application that would scale to arbitrarily large input

• Limited only by the amount of disk space needed to store the input and output files, and by the number of processes available

• No intermediate files are generated, and individual process memory needs are not determined by the size of the input or output

Page 19:

p_points2grid Implementation
Comparison of Native points2grid versus p_points2grid

• Native points2grid algorithm:
  • For each point:
    • Update output raster cells when the point falls within a circle defined by the cell corner and a given radius
  • Optionally fill null cells with adjacent cell values
  • Write the output raster cells

• p_points2grid algorithm (a small point-routing sketch follows this list):
  • Processes are designated as readers or writers
  • Reader processes are assigned a range of points
  • Writer processes are assigned a range of rows
  • Reader processes read LAS points from the input file and send them to the appropriate writer processes, based on whether the point falls within a circle defined by the cell corner and a given radius
  • Writer processes receive LAS points from reader processes and update cell contents with elevation values
  • When all points have been sent and received, writer processes apply an optional window-filling parameter to fill null values
    • (This involves writer-to-writer communication when the window size overlaps two writers)
  • Writer processes determine and set write offsets and write their range of rows to the output file; the first writer rank is responsible for writing the output file header and, in the case of TIFF output, the TIFF directory contents
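As a concrete (and deliberately simplified) picture of the reader-to-writer traffic described above, the hypothetical sketch below batches points destined for each writer and issues a single MPI_Send when a batch fills. The point struct, batch size, message tag, writer numbering, and even row partitioning are all assumptions for illustration, not details taken from p_points2grid.

// Reader-side batching sketch: one MPI_Send per full batch, not per point.
// All names, sizes, and the 0-based writer numbering are assumptions.
#include <mpi.h>
#include <cstddef>
#include <vector>

struct GridPoint { double x, y, z; };        // hypothetical decoded LAS point

const std::size_t BATCH_POINTS = 4096;       // assumed per-writer batch size

// Map a grid row to the writer that owns it, assuming rows are divided
// evenly among writers numbered 0..num_writers-1.
int writer_for_row(int row, int rows_total, int num_writers) {
    int rows_per_writer = (rows_total + num_writers - 1) / num_writers;
    return row / rows_per_writer;
}

// Append a point to the batch for `writer`; ship the batch when it fills.
void buffer_and_send(const GridPoint& p, int writer,
                     std::vector<std::vector<GridPoint>>& batches,
                     MPI_Comm comm) {
    std::vector<GridPoint>& batch = batches[writer];
    batch.push_back(p);
    if (batch.size() == BATCH_POINTS) {
        // Tag 0 is an arbitrary choice for point batches in this sketch.
        MPI_Send(batch.data(),
                 static_cast<int>(batch.size() * sizeof(GridPoint)),
                 MPI_BYTE, writer, 0, comm);
        batch.clear();
    }
}

On the writer side a matching MPI_Recv delivers the same bytes as a batch, which the writer unpacks to update its raster rows; a final flush of every partially filled batch, as noted on the slide, closes out the exchange.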

Page 20:

p_points2grid Implementation

• The high-level view of the p_points2grid application

• The job flow is described by the boxes on the right side

• The processes along the left are the internal processes of the flow functions

Page 21:

p_points2grid Results
Smoky Mountains (16 GB)

12 1-meter resolution DEMs totaling 70 GB of output for the p_points2grid runs
12 6-meter resolution DEMs totaling 2 GB of output for the native run

Number of Processes | Number of Readers | Number of Writers | Time: Reading, Communication (s) | Time: Writing (s) | Elapsed Time (seconds)
Native | 1 | 1 | NA | NA | 328
128 | 32 | 96 | 33 | 56 | 105
128 | 64 | 64 | 26 | 84 | 125
512 | 32 | 480 | 20 | 13 | 40
512 | 64 | 448 | 10 | 17 | 33
512 | 128 | 384 | 8 | 23 | 36
512 | 256 | 256 | 7 | 26 | 40
512 | 384 | 128 | 11 | 44 | 68
1024 | 64 | 940 | 10 | 11 | 32
1024 | 384 | 640 | 2 | 14 | 29
1024 | 768 | 256 | 6 | 28 | 46

Page 22:

p_points2grid Results
Grand Canyon (120 GB)

Number of Processes | Number of Readers | Number of Writers | Time: Reading, Communication (s) | Time: Writing (s) | Elapsed Time (seconds)
Native | 1 | 1 | NA | NA | 1548
512 | 64 | 448 | 104 | 15 | 135
512 | 128 | 384 | 55 | 21 | 90
512 | 256 | 256 | 60 | 30 | 110
1024 | 64 | 960 | 80 | 7 | 101
1024 | 128 | 896 | 39 | 11 | 62
1024 | 256 | 768 | 27 | 8 | 51
1024 | 384 | 640 | 24 | 15 | 53
1024 | 512 | 512 | 26 | 19 | 56
1024 | 768 | 256 | 47 | 24 | 90
1024 | 896 | 128 | 89 | 44 | 167
4096 | 256 | 3840 | 17 | 10 | 63
4096 | 512 | 3584 | 10 | 11 | 53
4096 | 1024 | 3072 | 8 | 8 | 46
4096 | 2048 | 2048 | 8 | 18 | 76

12 1-meter resolution DEMs totaling 71 GB of output for the p_points2grid runs
12 6-meter resolution DEMs totaling 2 GB of output for the native run

Native: unmodified points2grid source code compiled on Stampede with the Intel C++ compiler

Page 23:

Conclusions

• Demonstration of a novel solution to the handling, exploitation, and processing of enormous lidar datasets
  • Especially at a time when their availability is increasing rapidly in the natural sciences
• Expansion of existing tools so that they now run in parallel processing modes
  • Improves upon previous attempts that used MapReduce to exploit lidar data in the parallel processing environment
  • Creating parallel processing algorithms based on the open source las2las and points2grid code bases
• Greatly reduced run times when processing extremely large lidar point cloud datasets
  • Over 100 GB in file size
  • Both in classifying the points and in generating DEMs
• p_las2las and p_points2grid provide approximately two or more orders of magnitude reduction in processing time
• Demonstrated scalability up to 4,096 processes

Page 24:

References

•The Apache Software Foundation (2014) Welcome to Apache™ Hadoop®! Internet at: http://hadoop.apache.org. Last accessed 24 November 2014.

•Arrowsmith, J.R., N. Glenn, C. J. Crosby, and E. Cowgill (2008) Current Capabilities and Community Needs for Software Tools and Educational Resources for Use with LiDAR High Resolution Topography Data. Proceedings of the OpenTopography Meeting held in San Diego, California, on August 8, 2008. San Diego: San Diego Supercomputer Center.

•ASPRS (American Society for Photogrammetry and Remote Sensing) (2008) LAS Specification, Version 1.2. Internet at http://www.asprs.org/a/society/committees/standards/asprs_las_format_v12.pdf. Last accessed 24 November 2014.

•ASPRS (American Society for Photogrammetry and Remote Sensing) (2011) LASer (LAS) File Format Exchange Activities. Internet at http://www.asprs.org/Committee-General/LASer-LAS-File-Format-Exchange-Activities.html. Last accessed 05 March 2015.

•Behzad, B., Y. Liu, E. Shook, M. P. Finn, D. M. Mattli, and S. Wang (2012). A Performance Profiling Strategy for High-Performance Map Re-Projection of Coarse-Scale Spatial Raster Data. Abstract presented at Auto-Carto 2012, A Cartography and Geographic Information Society Research Symposium, Columbus, OH.

•CGR (Center for Geospatial Research), University of Georgia, Athens, GA. Internet at: http://www.crms.uga.edu/wordpress/?page_id=1212. Last accessed 05 March 2015.

•Dean, J., and S. Ghemawat (2004) MapReduce: Simplified Data Processing on Large Clusters. Proceedings of OSDI ’04: 6th Symposium on Operating System Design and Implementation, San Francisco, CA, Dec. 2004.

•Dewberry (2011) Final Report of the National Enhanced Elevation Assessment (revised 2012). Fairfax, Va., Dewberry, 84 p. plus appendixes. Internet at http://www.dewberry.com/services/geospatial/national-enhanced-elevation-assessment. Last accessed 24 November 2014.

•Factor, M., K. Meth, D. Naor, O. Rodeh, and J. Satran (2005) Object storage: the future building block for storage systems. In LGDI ’05: Proceedings of the 2005 IEEE International Symposium on Mass Storage Systems and Technology, pages 119–123, Washington, DC, USA. IEEE Computer Society.

•Finn, Michael P., Yan Liu, David M. Mattli, Babak Behzad, Kristina H. Yamamoto, Qingfeng (Gene) Guan, Eric Shook, Anand Padmanabhan, Michael Stramel, and Shaowen Wang (2015). High-Performance Small-Scale Raster Map Projection Transformation on Cyberinfrastructure. Paper accepted for publication as a chapter in CyberGIS: Fostering a New Wave of Geospatial Discovery and Innovation, Shaowen Wang and Michael F. Goodchild, editors. Springer-Verlag.

•Isenburg, Martin (2014) lasmerge: Merge Multiple LAS Files into a Single File. Internet at http://www.liblas.org/utilities/lasmerge.html. Last accessed 03 March 2015.

•Kosovich, John J. (2014). Vertical Forest Structure from Lidar Point-cloud Data for the Tennessee Portion of Great Smoky Mountains National Park. Abstract presented at the 2014 International Lidar Mapping Forum, Denver, CO. Internet at: http://www.lidarmap.org/international/images/conference/Program/ILMF_2014_Grid_with_Abstracts.pdf.

•Krishnan, Sriram, Chaitanya Baru, and Christopher Crosby (2010). Evaluation of MapReduce for Gridding LIDAR Data. 2nd IEEE International Conference on Cloud Computing Technology and Science.

•Piernas, J., J. Nieplocha, and E. Felix (2007). Evaluation of active storage strategies for the Lustre parallel file system. Proceedings of the ACM/IEEE Conference on Supercomputing. ACM, New York.

•rapidlasso GmbH (2014) LAStools. Internet at http://rapidlasso.com/lastools/. Last accessed 24 November 2014.

•Rose, Eli T., John J. Kosovich, Alexa J. McKerrow, and Theodore R. Simons (2014). Characterizing Vegetation Structure in Recently Burned Forests of the Great Smoky Mountains National Park. Abstract presented at the ASPRS 2014 Annual Conference, Louisville, KY. Internet at: http://conferences.asprs.org/Louisville-2014/T1-Mar-26-3-30.

•Sakr, Sherif, Anna Liu, and Ayman G. Fayoumi (2014). MapReduce Family of Large-Scale Data-Processing Systems. Chapter 2 in Large Scale and Big Data, Sherif Sakr and Mohamed Medhat Gaber, editors. CRC Press.

•Towns, John, Timothy Cockerill, Maytal Dahan, Ian Foster, Kelly Gaither, Andrew Grimshaw, Victor Hazlewood, Scott Lathrop, Dave Lifka, Gregory D. Peterson, Ralph Roskies, J. Ray Scott, and Nancy Wilkins-Diehr (2014) XSEDE: Accelerating Scientific Discovery. Computing in Science & Engineering, vol. 16, no. 5, pp. 62-74, Sept.-Oct., doi:10.1109/MCSE.2014.80

•US Army Corps of Engineers (2014). CRREL/points2grid. Internet at: https://github.com/CRREL/points2grid.

•Yoo, Andy B., Morris A. Jette, and Mark Grondona (2003). SLURM: Simple Linux utility for resource management. In Job Scheduling Strategies for Parallel Processing (pp. 44-60). Springer Berlin Heidelberg.

Page 25:


U.S. Department of the Interior
U.S. Geological Survey
Michael P. Finn

Briefing to a pre-conference workshop of the 27th International Cartographic Conference: Spatial data infrastructures, standards, open source and open data for geospatial (SDI-Open 2015)
20-21 August 2015, Brazilian Institute of Geography and Statistics (IBGE), Rio de Janeiro, Brazil

Using the Message Passing Interface (MPI) and Parallel File Systems for Processing Large Files in the LAS Format

Questions?

Page 26:

Backup slides

Page 27:


p_las2las Implementation

Detailed Explanation of the p_las2las Implementation

Each process opens the LAS input file and reads the file header to determine the number of points in the input file, the point size, and the size of the header. Based on the point count, process rank, and process count, each process calculates the range of LAS points for which it will be responsible. Since the point size and the header size are known, each process can calculate and set its file pointer to its beginning point.

Each process then reads each point in its range and applies any filter passed to the program, keeping a count of points that pass the filter. After reading and filtering the last point, all processes gather from one another the number of points that have passed the filter and thus will be written. Each process uses the results from this gather and its rank order to calculate and set its output file pointer.

Each process then sets its read pointer back to the beginning of its range of points. A second read and filtering of its point range begins, but this time the points that pass the filter are written to the output file. It is this second read pass that allows the program to scale to arbitrary input and output size without allocating extra memory or writing temporary files.

The process with rank 0 is charged with writing the output file header with data gathered from the input header, and with gathering and reducing process-dependent data such as minx, maxx, miny, maxy, minz, and maxz from the other processes. To minimize the number of calls to MPI_File_write, each process allocates a buffer of configurable size and only calls MPI_File_write when its buffer is full, along with a final flush to disk after the last point is processed.
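A minimal sketch of that buffered-write pattern, with invented names and no error handling, might look like the class below. The real code differs in detail, but the idea is the one just described: append records locally, call MPI_File_write_at only when the buffer fills, and flush once after the last point.

// Buffered output sketch: accumulate point records and write in large chunks.
// Class and member names are illustrative, not taken from p_las2las.
#include <mpi.h>
#include <cstddef>
#include <vector>

class BufferedPointWriter {
public:
    BufferedPointWriter(MPI_File fh, MPI_Offset start, std::size_t capacity)
        : fh_(fh), next_offset_(start), capacity_(capacity) {
        buf_.reserve(capacity_);
    }

    // Append one serialized point record; write out the buffer when full.
    void append(const char* record, std::size_t len) {
        if (buf_.size() + len > capacity_) flush();
        buf_.insert(buf_.end(), record, record + len);
    }

    // Write whatever is buffered at this rank's running offset.
    void flush() {
        if (buf_.empty()) return;
        MPI_File_write_at(fh_, next_offset_, buf_.data(),
                          static_cast<int>(buf_.size()), MPI_BYTE,
                          MPI_STATUS_IGNORE);
        next_offset_ += static_cast<MPI_Offset>(buf_.size());
        buf_.clear();
    }

private:
    MPI_File fh_;
    MPI_Offset next_offset_;
    std::size_t capacity_;
    std::vector<char> buf_;
};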

Page 28:


p_points2grid Implementation

Detailed Explanation of the p_points2grid Implementation

Initialization: Each reader process allocates a LAS point buffer for each writer process. These buffers are necessary to keep process-to-process communication at reasonable levels, since without them an MPI send/receive would occur with every LAS point. The size of these buffers depends on the writer count and is calculated and capped at run time so as not to exceed available memory.

Each writer process allocates memory to hold the grid cell values for the rows for which it is responsible. The row count is determined by the number of rows in the grid divided by the writer process count. This introduces a memory dependency that our current implementation does not address; as a practical matter, the number of writer processes can be increased to address this limitation. Each writer process also allocates write buffers of configurable size to limit the number of calls to MPI_File_write. When a window-filling parameter is specified, writer processes allocate and fill two-dimensional raster cell buffers of up to three rows before and after their range. This is necessary to keep process-to-process communication at reasonable levels.

Reading and Communication: Each reader process calculates and sets its input file pointer to the beginning of its range of points. It then reads each point and determines which raster cells are affected, i.e., those for which the point falls within a circle defined by the cell corner and a radius given either by default or as a program input parameter. For each overlap, the point is added to the appropriate LAS point buffer, that is, the buffer corresponding to the writer responsible for processing that cell. When a buffer fills, it is sent with an MPI_Send to the appropriate writer. The writer receives the buffer with MPI_Recv and updates its raster cell values with the point buffer data. When a reader process completes, it flushes its point buffers to all writers one last time.
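The overlap test mentioned above can be pictured with the small serial sketch below (names are illustrative, not the points2grid source): a bounding window limits the scan to the few cells the search circle can reach, and each candidate cell corner is checked against the radius.

// Find the raster cells whose corner lies within `radius` of a lidar point.
// Illustrative sketch; grid origin, cell size, and names are assumptions.
#include <algorithm>
#include <cmath>
#include <vector>

struct CellIndex { int row, col; };

std::vector<CellIndex> cells_in_radius(double px, double py,
                                       double origin_x, double origin_y,
                                       double cell_size, double radius,
                                       int rows, int cols) {
    std::vector<CellIndex> hits;
    int col_lo = std::max(0, (int)std::floor((px - radius - origin_x) / cell_size));
    int col_hi = std::min(cols - 1, (int)std::ceil((px + radius - origin_x) / cell_size));
    int row_lo = std::max(0, (int)std::floor((py - radius - origin_y) / cell_size));
    int row_hi = std::min(rows - 1, (int)std::ceil((py + radius - origin_y) / cell_size));
    for (int r = row_lo; r <= row_hi; ++r) {
        for (int c = col_lo; c <= col_hi; ++c) {
            double cx = origin_x + c * cell_size;   // cell corner x
            double cy = origin_y + r * cell_size;   // cell corner y
            double dx = px - cx, dy = py - cy;
            if (dx * dx + dy * dy <= radius * radius) hits.push_back({r, c});
        }
    }
    return hits;
}

Each hit is then routed to the writer that owns that cell's row, which is where the per-writer point buffers described in the previous paragraph come in.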

Writer Processing: Once all points have been sent by the readers and received, each writer iterates over its raster cells and calculates mean and standard deviation values. If a window size parameter has been passed, each writer iterates over its cells and attempts to fill null cell values with weighted averages of values from adjacent cells up to three cells away. When cells fall near a writer's beginning or ending row, these values are retrieved from the rows of adjacent writer processes.

Writing: Each writer first determines the total number of bytes it will write for each DEM output type and each raster cell type. Since the supported output types are ASCII text, each writer must iterate over its cell values and sum their output lengths. Once all writers have determined their output length counts, each gathers these counts from all other writer processes and uses them, along with rank order and header size, to set its output file pointer position. Each process then iterates over its values again, but this time writes the ASCII form of each value to a buffer. When the buffer fills, it is written to disk with MPI_File_write. The first writer process is responsible for writing the file header.
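The offset arithmetic in that last step can be sketched as follows (again with assumed names and an assumed number format): every writer measures the exact ASCII length of its own cells, learns every other writer's length, and starts writing at the header size plus the lengths of all lower-ranked writers.

// Sketch of ASCII output offsets: measure local bytes, gather, prefix-sum.
// The "%.2f " format and all names are assumptions, not the project's.
#include <mpi.h>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Byte length of one cell value formatted the way it will later be written.
std::size_t formatted_length(double value) {
    char tmp[64];
    return static_cast<std::size_t>(std::snprintf(tmp, sizeof(tmp), "%.2f ", value));
}

MPI_Offset ascii_write_offset(const std::vector<double>& my_cells,
                              MPI_Offset header_bytes, MPI_Comm writer_comm) {
    int64_t my_bytes = 0;
    for (double v : my_cells) my_bytes += static_cast<int64_t>(formatted_length(v));

    int rank = 0, nprocs = 0;
    MPI_Comm_rank(writer_comm, &rank);
    MPI_Comm_size(writer_comm, &nprocs);
    std::vector<int64_t> all_bytes(nprocs);
    MPI_Allgather(&my_bytes, 1, MPI_INT64_T,
                  all_bytes.data(), 1, MPI_INT64_T, writer_comm);

    MPI_Offset offset = header_bytes;
    for (int r = 0; r < rank; ++r) offset += static_cast<MPI_Offset>(all_bytes[r]);
    return offset;                 // where this writer's ASCII rows begin
}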

Page 29: