Norwegian Institute of Bioeconomy Research, WWW.NIBIO.NO
(On 1 July 2015, Skogoglandskap was merged into NIBIO together with 2 other institutes.)
Lars Aksel Opsahl ([email protected]), developer.
7/18/15 1 billion points in Postgis Topology
Is this possible?
Move 1 billion points into Postgis Topology.
The answer is YES!
How long does it take to add 1 billion points? 15-16 hours.
Is it possible to edit this topology layer? Yes.
Does an edit take a long time? 1 second or more.
The rest of the slides go into detail about how we solved this and why Topology is a good alternative for our case.
This presentation will focus on:
● WHAT type of data we test on.
● WHY use Postgis Topology for this layer.
● HOW we use Postgis Topology.
● HOW we fill this Postgis Topology layer with data.
● HOW we plan to update this Topology layer.
AR5 is a high resolution land resource map that covers all of Norway.
● The map describes land resources based on land type, site index, tree species and ground conditions.
● As simple features it is 8 million polygons with a total of 1 billion points.
View map changes
What you see. What's the history of the map?
Added by aeb, 10/01/2011
Added by lop, 16/06/2015
Rollback a user map update
User adds a new line and surface attribute
Moderator deletes the new line
The new map
Initial map
No overlaps or gaps when the map is edited
User adds a new line and surface attribute
The new map
Initial map
This new line will not cause any overlap or gap with the existing surface.
Old lines will keep their history and original points (2 new points).
HOW TO ILLUSTRATE
A good picture may say more than any text, but for some people
a SQL fragment may say more than any text or picture.
When you see SQL fragments, I will explain the meaning. You
can actually think of this as a picture.
Database structure for border (lines/edges)
CREATE UNLOGGED TABLE topo_ar5.ar5_topo_linje (
    id serial PRIMARY KEY not null
);
SELECT topology.AddTopoGeometryColumn('topo_ar5_sysdata', 'topo_ar5', 'ar5_topo_linje', 'geo', 'LINESTRING') As new_layer_id;

-- create a new table for linestring attributes
CREATE UNLOGGED TABLE topo_ar5.ar5_topo_linje_attr (
    id serial PRIMARY KEY not null,
    -- could be a foreign key to topo_ar5_sysdata.edge_data, but since
    -- it is updated outside our scope we cannot use a foreign key here
    edge_id int not null,
    objtype_kode smallint not null CONSTRAINT objtype_kode_1_2_m1 CHECK (objtype_kode in (1,2,-1)),
    aravgrtype smallint not null,
    -- contains felles egenskaper from ar5
    felles_egenskaper topo_ar5.sosi_felles_egenskaper,
    -- used for temp data, will be deleted after the data is added
    sl_sdeid int
);
ar5_topo_linje: table that holds the Topo object for lines.
ar5_topo_linje_attr: holds attributes for edges.
Why store attributes in a separate table for lines?
● We want to be sure that any edge can have only one attribute value.
● After a discussion with Sandro Santilli we will look at other ways to do this: my update code becomes complicated, and many of the same tests are already done in the Topology package by Sandro Santilli. The way I have solved this now needs to be redesigned.
Database structure for surfaces
CREATE UNLOGGED TABLE topo_ar5.ar5_topo_flate (
    id serial PRIMARY KEY not null,
    artype int4 CONSTRAINT artype_between_0_100 CHECK (artype > 0 and artype < 100),
    arskogbon int4 CONSTRAINT arskogbon_between_0_100 CHECK (arskogbon > 0 and arskogbon < 100),
    artreslag int4 CONSTRAINT artreslag_between_0_100 CHECK (artreslag > 0 and artreslag < 100),
    argrunnf int4 CONSTRAINT argrunnf_between_0_100 CHECK (argrunnf > 0 and argrunnf < 100),
    -- contains felles egenskaper from ar5
    felles_egenskaper topo_ar5.sosi_felles_egenskaper,
    simple_geo geometry(MultiPolygon,4258) NULL
);

-- add a topogeometry column that holds a reference to the polygon surface
SELECT topology.AddTopoGeometryColumn('topo_ar5_sysdata', 'topo_ar5', 'ar5_topo_flate', 'geo', 'POLYGON') As new_layer_id;
Used for performance.
Adding the topo geometry
HOW we fill this Postgis Topology layer with data.
● Content balanced grid.
● Parallelize with GNU parallel and the grid cells.
● All code is wrapped in PL/pgSQL functions.
● We use simple feature lines and surface representation points when we create the Postgis Topology.
-- Core grid creation code; we use the && operator to increase index use
sql := 'SELECT count(*) FROM ' || table_name || ' WHERE ' || geo_column_name || ' && '
    || 'ST_MakeEnvelope(' || x_min || ',' || y_min || ',' || x_max || ',' || y_max || ',' || source_srid || ')';
EXECUTE sql INTO num_rows_table_tmp;
IF num_rows_table < max_rows THEN
    sectors[0] := grid_geom;
ELSE
    x_delta := (x_max - x_min)/2;
    y_delta := (y_max - y_min)/2;
    x_center := x_min + x_delta;
    y_center := y_min + y_delta;
    sectors[0] := func_grid.SL_make_contert_based_grid(table_name_column_name_array,
        ST_MakeEnvelope(x_min,y_min,x_center,y_center, ST_SRID(grid_geom)), min_distance, max_rows);
    sectors[1] := func_grid.SL_make_contert_based_grid(table_name_column_name_array,
        ST_MakeEnvelope(x_center,y_min,x_max,y_center, ST_SRID(grid_geom)), min_distance, max_rows);
    sectors[2] := func_grid.SL_make_contert_based_grid(table_name_column_name_array,
        ST_MakeEnvelope(x_min,y_center,x_center,y_max, ST_SRID(grid_geom)), min_distance, max_rows);
    sectors[3] := func_grid.SL_make_contert_based_grid(table_name_column_name_array,
        ST_MakeEnvelope(x_center,y_center,x_max,y_max, ST_SRID(grid_geom)), min_distance, max_rows);
END IF;
Create content balanced grid for AR5 in Norway
-- Create a grid with at most around 4000 lines in each cell
SL_make_content_based_balanced_grid01(ARRAY['org_ar5.ar5_linje geo'],4000)
Too big, split in 4
Below limit, OK to use
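The recursive split can also be tried outside the database. The following is a minimal Python sketch, assuming features are reduced to points and counted by brute force rather than with the indexed && test. The names `count_in`, `balanced_grid` and `min_size` are illustrative and are not the func_grid functions from the slides.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def count_in(points: List[Point], box: Box) -> int:
    """Count points in a half-open box (stand-in for the indexed && count)."""
    x_min, y_min, x_max, y_max = box
    return sum(1 for x, y in points if x_min <= x < x_max and y_min <= y < y_max)


def balanced_grid(points: List[Point], box: Box, max_rows: int,
                  min_size: float = 1e-9) -> List[Box]:
    """Split a cell in 4 until every cell holds fewer than max_rows points."""
    x_min, y_min, x_max, y_max = box
    # Hypothetical safety stop so identical points cannot recurse forever.
    too_small = (x_max - x_min) <= min_size or (y_max - y_min) <= min_size
    if count_in(points, box) < max_rows or too_small:
        return [box]  # below limit, OK to use
    x_center = x_min + (x_max - x_min) / 2
    y_center = y_min + (y_max - y_min) / 2
    quadrants = [
        (x_min, y_min, x_center, y_center),
        (x_center, y_min, x_max, y_center),
        (x_min, y_center, x_center, y_max),
        (x_center, y_center, x_max, y_max),
    ]
    cells: List[Box] = []
    for quadrant in quadrants:  # too big, split in 4
        cells.extend(balanced_grid(points, quadrant, max_rows, min_size))
    return cells
```

Dense areas end up with many small cells and empty areas with a few large ones, which is what keeps the work per cell roughly even.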
Linestring and surface distribution for the grid used.
● Covered by a single cell (does not touch any cell border lines)
    ● Single cell edges: 18988984
    ● Single cell surfaces: 7093814
● Crosses/touches cell border lines
    ● Multi cell edges: 635048
    ● Multi cell surfaces: 534455
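To see how small the multi cell share is, the counts above can be turned into fractions. This is a minimal Python sketch; the variable and function names are illustrative.

```python
# Counts taken from the slide.
single_cell_edges = 18988984
multi_cell_edges = 635048
single_cell_surfaces = 7093814
multi_cell_surfaces = 534455


def multi_cell_share(single: int, multi: int) -> float:
    """Fraction of features that cross or touch a cell border."""
    return multi / (single + multi)


edge_share = multi_cell_share(single_cell_edges, multi_cell_edges)
surface_share = multi_cell_share(single_cell_surfaces, multi_cell_surfaces)
print(f"multi cell edges:    {edge_share:.1%}")
print(f"multi cell surfaces: {surface_share:.1%}")
```

Roughly 3% of the edges and 7% of the surfaces cross a cell border, so most of the data can be processed cell by cell.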
4 different operation types
● A: Process lines covered by single cells.
● B: Merge cells to include lines that cross cell borders (then do the same as in A for the lines found).
● C: Process surfaces covered by single cells.
● D: Merge cells to include surfaces that cross cell borders (then do the same as in C for the surfaces found).
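The split between steps A/C and B/D comes down to whether a feature lies inside one cell. The following is a minimal Python sketch, assuming features and cells are reduced to bounding boxes; `strictly_inside` and `classify` are illustrative names, and the real code uses ST_Contains on the geometries.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def strictly_inside(cell: Box, bbox: Box) -> bool:
    """True when bbox lies inside the cell without touching its border."""
    return (cell[0] < bbox[0] and cell[1] < bbox[1]
            and bbox[2] < cell[2] and bbox[3] < cell[3])


def classify(cells: List[Box], features: Dict[str, Box]):
    """Return (single, multi): cell index per single cell feature, and the rest."""
    single: Dict[str, int] = {}
    multi: List[str] = []
    for name, bbox in features.items():
        owner = next((i for i, cell in enumerate(cells)
                      if strictly_inside(cell, bbox)), None)
        if owner is None:
            multi.append(name)    # wait for the merged cells (steps B and D)
        else:
            single[name] = owner  # process inside its own cell (steps A and C)
    return single, multi
```

Features in `single` can be handled by parallel workers without conflicts, because no two cells share them.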
A: Only process data covered by each cell
WAIT TO PROCESS: LINE NOT COVERED BY SINGLE CELL
START TO PROCESS: LINE COVERED BY SINGLE CELL
B: Merge cells to include lines that cross cell borders.
OK TO PROCESS NOW: LINE COVERED BY SET OF MERGED CELLS
DON'T PROCESS: DON'T TOUCH ANY ORIGINAL BORDERS
Process lines covered by single cells : 1. create topo.
SELECT topology.toTopoGeom(geo, 'topo_ar5_sysdata', 1, 0.0000000001) as geo, sl_sdeid
FROM (
    select arl.sl_sdeid, arl.geo
    from org_ar5.ar5_linje arl
    where cell_geo_in && arl.geo and
        ST_Contains(cell_geo_in, arl.geo) and
        arl.objType not in ('KantUtsnitt') and
        NOT EXISTS ( select sl_sdeid from topo_ar5.added_edges f where arl.sl_sdeid=f.sl_sdeid)
) AS a
Create the topo object. Extreme performance. Snap to value
Use to find attributes
Merge cells and collect cell borders
-- merge cells
( SELECT ST_union(cell.geo) as cell_union
  FROM topo_ar5.cell_ad as cell
  WHERE cell.id >= cell_min_in and cell.id < (stop_cell_id)
) AS r2

-- get cell borders
FROM (
    SELECT (ST_Dump(grid_lines)).geom AS grid_line
    FROM (
        SELECT ST_Collect(ST_ExteriorRing(cell.geo)) as grid_lines
        FROM topo_ar5.cell_ad as cell
        WHERE cell.id >= cell_min_in and cell.id < (stop_cell_id)
    ) AS r
) AS r,
Use merged cells and cell borders to find new lines
....
WHERE ST_intersects(r.grid_line, arl.geo) AND
    NOT EXISTS ( select edge_id from topo_ar5_sysdata.edge_data where ST_Intersects(geom, arl.geo) and ST_Intersects(geom, r.grid_line) ) AND
    arl.objType not in ('KantUtsnitt') AND
    NOT EXISTS ( select sl_sdeid from topo_ar5.added_edges f where arl.sl_sdeid=f.sl_sdeid)
...
WHERE ST_Contains(r2.cell_union, arl.geo) AND
    NOT EXISTS ( select sl_sdeid from topo_ar5.added_edges f where arl.sl_sdeid=f.sl_sdeid)
Covered by merged cell
Process lines covered by single cells : 2. add attributes
SELECT distinct ON (edge_id) edge_id,
    topo_ar5.ar5_omkod_objtype_2_kode(b.objtype) as objtype_kode,
    aravgrtype,
    b.datafangstdato,
    ARRAY[b.informasjon] as informasjon,
    (b.maalemetode,b.noyaktighet,b.synbarhet)::topo_ar5.sosi_kvalitet as kvalitet,
    b.opphav,
    b.verifiseringsdato,
    (b.registreringsversjon,4.5)::topo_ar5.sosi_registreringsversjon as registreringsversjon,
    b.sl_sdeid
FROM (
    select r.element_id as edge_id, arl.*
    FROM relation_ids_added ra, topo_ar5_sysdata.relation r, org_ar5.ar5_linje arl
    WHERE ra.topogeo_id = r.topogeo_id and ra.layer_id = r.layer_id and
        arl.sl_sdeid = ra.sl_sdeid
) AS b
Map by id.
Add attributes using user defined types.
Process surfaces covered by single cells: 1 add topo
INSERT INTO topo_ar5.ar5_topo_flate (geo)
SELECT topology.CreateTopoGeom('topo_ar5_sysdata',3,2,topoelementarray) as geo
from (
    select distinct ST_GetFaceGeometry('topo_ar5_sysdata',l.face_id) as geo,
        topology.TopoElementArray_Agg(ARRAY[l.face_id,3]) as topoelementarray,
        ST_union(l.mbr) as union_face
    From topo_ar5_sysdata.face as l, topo_ar5.cell_ad cell
    where cell.id = cell_nr_in and ST_Contains(cell.geo,l.mbr) and
        NOT EXISTS (select re.element_id from topo_ar5_sysdata.relation re where re.layer_id = 2 and re.element_id = l.face_id)
    group by l.face_id
) as r1, topo_ar5.cell_ad cell
where cell.id = cell_nr_in and ST_Contains(cell.geo, ST_Boundary(r1.union_face));
Build surface created
Find surfaces inside current cell
Create surface Topo geo
Process surfaces covered by single cells: 2 update simple geo
update topo_ar5.ar5_topo_flate AS f
set simple_geo = geo::geometry
from arf_id as ft
where f.id = ft.id_temp;
Just cast from topo geometry.
Process surfaces covered by single cells: 3. update attributes
-- update the rest of the attributes
update topo_ar5.ar5_topo_flate as f
SET (artype, arskogbon, artreslag, argrunnf, felles_egenskaper) =
    (c.artype, c.arskogbon, c.artreslag, c.argrunnf,
     (datafangstdato,informasjon,null,kvalitet,null,opphav,null,registreringsversjon,verifiseringsdato)::topo_ar5.sosi_felles_egenskaper)
FROM (
    SELECT b.artype, b.arskogbon, b.artreslag, b.argrunnf, b.id_temp, b.datafangstdato,
        ARRAY[b.informasjon] as informasjon,
        (b.maalemetode,b.noyaktighet,b.synbarhet)::topo_ar5.sosi_kvalitet as kvalitet,
        b.opphav, b.verifiseringsdato,
        (b.registreringsversjon,'4.5')::topo_ar5.sosi_registreringsversjon as registreringsversjon
    FROM (
        select p.*, ft.id_temp
        from org_ar5.ar5_punkt as p, arf_id as ft, topo_ar5.ar5_topo_flate as f2
        where f2.id = ft.id_temp and ST_Covers(f2.simple_geo,p.geo)
    ) as b
) AS c
where f.id = c.id_temp;
Find data by using Representation point
Test performance for the migration process (16 dual-core CPUs and SSD disks)
1 parallel thread: function_create_topo_ar5.sh vroom2 1 13000 200
15 parallel threads: function_create_topo_ar5.sh vroom2 15 13000 200
20 parallel threads: function_create_topo_ar5.sh vroom2 20 13000 200
Decreasing processing time when increasing number of parallel threads
Number of threads Total runtime in hours
1 108
15 16
20 18
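The runtimes above can be turned into speedup and per-thread efficiency figures. This is a minimal Python sketch; the function name `scaling` is illustrative, and the single-thread run is taken as the baseline.

```python
from typing import Dict, Tuple

# Threads -> total runtime in hours, from the table above.
runtimes_hours = {1: 108, 15: 16, 20: 18}


def scaling(runtimes: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Return threads -> (speedup vs. 1 thread, efficiency per thread)."""
    baseline = runtimes[1]
    return {n: (baseline / hours, baseline / hours / n)
            for n, hours in runtimes.items()}


for threads, (speedup, efficiency) in scaling(runtimes_hours).items():
    print(f"{threads:>2} threads: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

Going from 15 to 20 threads makes the total run slower, so on this hardware the useful limit is around 15 threads.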
Average operations per second for the 4 different operation types with different numbers of threads.
Number of threads | A: Single cell linestrings | B: Multi cell linestrings | C: Single cell surfaces | D: Multi cell surfaces
1 | 91 | 9 | 305 | 5
15 | 1043 | 48 | 972 | 21
20 | 814 | 48 | 934 | 27
Average operations per second at every hour when running single threaded.
[Chart: x-axis "Hours and opr. type", one label per hour running through operation types A, B, C and D in that order; y-axis "Opr. pr. sec.", scaled 0 to 500.]
Average operations per second at every hour when running 15 parallel threads.
[Chart: x-axis "Hours and opr. type", one label per hour running through operation types A, B, C and D in that order; y-axis "Opr. pr. sec.", scaled 0 to 1800.]
Summary: converting AR5 to Postgis Topology
● Content balanced grid and parallel threads.
● Two parallel threads cannot work in the same area.
● Function-based index topo_ar5.get_relation_id(geo TopoGeometry) and indexes on the relation table.
● Heavy use of the && operator.
● OK with 16 hours of processing time since this is a one-time operation.
● ValidateTopology('topo_ar5_sysdata') shows no errors.
HOW to update the Postgis Topology layer.
● Draw a line and set attribute values.
● Use stored procedures.
● Use one single transaction.
● Roll back if any errors occur.
● Java backend with JSON API.
● Simple test client using this API.
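The "one single transaction, roll back on error" rule can be shown with a small stand-in. This is a minimal Python sketch that uses the standard-library sqlite3 module in place of PostgreSQL; the tables `line` and `surface` and the function `apply_edit` are hypothetical and much simpler than the real stored procedures. The CHECK constraint mirrors artype_between_0_100 from the surface table.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a toy in-memory database with one surface (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE line (id INTEGER PRIMARY KEY, wkt TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE surface (id INTEGER PRIMARY KEY, "
        "artype INTEGER NOT NULL CHECK (artype > 0 AND artype < 100))"
    )
    conn.execute("INSERT INTO surface (id, artype) VALUES (1, 30)")
    conn.commit()
    return conn


def apply_edit(conn: sqlite3.Connection, wkt: str, surface_id: int, artype: int) -> bool:
    """Add a line and update a surface attribute as one unit of work."""
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.execute("INSERT INTO line (wkt) VALUES (?)", (wkt,))
            conn.execute(
                "UPDATE surface SET artype = ? WHERE id = ?", (artype, surface_id)
            )
        return True
    except sqlite3.Error:
        return False  # nothing from this edit is left in the database
```

If the attribute update fails, the line inserted just before it is rolled back as well, so the map never ends up half edited.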
Two comments about update
1) Jostein, head of AR5: “Don't delete old lines, it's nice to know the history behind changes.”
2) Ingvild, my boss: “Why do I have to move old lines around with many hundreds of points? Why can't I just give you a new simple line that just shows the difference?”
Edit Topology data with surface data
Draw a polygon
Split a polygon
Update surface attributes
Extend a polygon
Edit Topology: Split a polygon
- Input: point, line, attribute values
Edit Topology: What happens when you have a split surface operation.
Java backend calls: apply_line_on_topo_flate(geo_in geometry, p_in geometry, artype_in int, arskogbon_in int, artreslag_in int, argrunnf_in int)
And the following happens:
- Adjust the input line to the current data, taking into account that equal surfaces must stay equal
- Compute the area to be updated
- Take a copy of the unchanged data
- Take a copy of the data that may change
- Clear data from the line attribute table
- Clear data from the topo surface layer and delete the rows to be changed
- Add the adjusted line with topology.toTopoGeom
- Update the line attribute table
- Create new surfaces with the new attribute values
- Create old surfaces with the old values
- Check that the unchanged area is still the same
Edit Topology: Timing issues when you have a split surface operation.
Java backend calls this function:
topo_ar5.apply_line_on_topo_flate(geo_in geometry, p_in geometry, artype_in int, arskogbon_in int, artreslag_in int, argrunnf_in int)
Small operations that include few changes take about 1000 ms, but bigger operations may take minutes.
http://trac.osgeo.org/postgis/ticket/2083
Edit Topology : Extend a polygon.
Java backend calls this function:
apply_line_on_topo_flate( geo_in geometry, p_in geometry,artype_in int, arskogbon_in int, artreslag_in int,argrunnf_in int)
Where p_in (0.0) means not set.
Edit Topology : Draw a new polygon.
Java backend calls this function:
apply_polygon_on_topo_flate(geo_in geometry, artype_in int, arskogbon_in int, artreslag_in int, argrunnf_in int)
Further plans this year
● Add many new layers to Postgis Topology this fall and adjust the Topology model to new requirements.
● Create a client that uses the JSON API for updating topology layers.
● Extend the update API with more functionality.
● We have to work more on performance, topology usage and the update client for AR5.
Postgis Topology is a great tool: you can add one billion points, and it is possible to update it afterwards.
Thanks to everybody that has contributed to Postgis Topology and other open source tools.
Questions ?