19
Optimal insert methods Optimal insert methods of geographical of geographical information to Spatio- information to Spatio- temporal DB temporal DB Final Presentation Industrial Project 234313 June 17,2012 Students: Michael Tsalenko & Belal Kabat Supervisor: Flora Gilboa

Optimal insert methods of geographical information to Spatio- temporal DB Final Presentation Industrial Project 234313 June 17,2012 Students: Michael Tsalenko

Embed Size (px)

Citation preview

Page 1: Optimal insert methods of geographical information to Spatio- temporal DB Final Presentation Industrial Project 234313 June 17,2012 Students: Michael Tsalenko

Optimal insert methods of Optimal insert methods of geographical information to geographical information to Spatio-temporal DBSpatio-temporal DB

Final Presentation

Industrial Project 234313

June 17,2012

Students: Michael Tsalenko & Belal KabatSupervisor: Flora Gilboa

Page 2: Optimal insert methods of geographical information to Spatio- temporal DB Final Presentation Industrial Project 234313 June 17,2012 Students: Michael Tsalenko

IntroductionIntroductionResearch ways of inserting and

querying geographical information related to video represented as polygons or circles into geographic DB.

Page 3: Optimal insert methods of geographical information to Spatio- temporal DB Final Presentation Industrial Project 234313 June 17,2012 Students: Michael Tsalenko

GoalsGoalsFinding the optimal way of

inserting Geo-data by determining the best representation of the captured polygon.

What representation is the optimal when querying the Data Base.

Determine whether amount of vertices of the polygon affect the querying efficiency.

Page 4: Optimal insert methods of geographical information to Spatio- temporal DB Final Presentation Industrial Project 234313 June 17,2012 Students: Michael Tsalenko

Methodology: Main StepsMethodology: Main StepsData creationReading data into memory (from file

system).Measuring insertion times.Measuring querying times.Adding tables with index using R-

tree.Parallelizing insertion and querying.Statistical examination of the results.

Page 5: Optimal insert methods of geographical information to Spatio- temporal DB Final Presentation Industrial Project 234313 June 17,2012 Students: Michael Tsalenko

Methodology-Methodology-Data creation Data creation (1):(1): We chose 10 different locations, and

randomly created 10,000 quad polygons around

each location. Our data setconsists of 100,000 polygons in total.

Page 6: Optimal insert methods of geographical information to Spatio- temporal DB Final Presentation Industrial Project 234313 June 17,2012 Students: Michael Tsalenko

Methodology-Methodology-Data creation Data creation (2):(2):

Snapshot of the random data

A closer look at Haifa

Page 7: Optimal insert methods of geographical information to Spatio- temporal DB Final Presentation Industrial Project 234313 June 17,2012 Students: Michael Tsalenko

Methodology-Methodology-Data creation Data creation (3):(3): Intuitively circle data manipulation

should be more efficient than quad data as it has less fields (3 vs. 8).

So we bounded each quad of the 100,000 with a circle and later improved it to a minimal bounding circle (to compare between different data representations of same locations).

(32.75762277268422, 34.97107551062562),(32.660576514078066, 34.97748364184535),(32.828346159124294, 35.086360632091974),(32.91352284093535, 35.023992366269674)

((32.78704967750671, 35.028718071358796),(0.13898966103824473))

Page 8: Optimal insert methods of geographical information to Spatio- temporal DB Final Presentation Industrial Project 234313 June 17,2012 Students: Michael Tsalenko

Methodology- Reading into Methodology- Reading into memorymemoryIn order to improve the insertion

time we have created a pool which included all the data, so when we insert the data to the data-base we read immediately from the pool and not from the files, to make as less operations as possible after the connection to the server was made.

Page 9: Optimal insert methods of geographical information to Spatio- temporal DB Final Presentation Industrial Project 234313 June 17,2012 Students: Michael Tsalenko

Methodology- Measuring Methodology- Measuring insertion time.insertion time.We have created 2 tables, one

for quad polygons and one for circles.For instance, the circles table:

id Asset name

Area

1

2

3

Circle#1

Circle#2

Circle#3

((32.7870496671, 35.02871358796), (0.138989664473))((29.927900734077, 31.2508816016), (0.27799441977))

((33.87026129199, 35.5400182072), (0.2372748702324))

Page 10: Optimal insert methods of geographical information to Spatio- temporal DB Final Presentation Industrial Project 234313 June 17,2012 Students: Michael Tsalenko

Methodology- Measuring Methodology- Measuring insertion and querying time.insertion and querying time.We started to insert 100,000 objects

to each table, but after few hours of running we stopped the process (single thread), and reduced the data to 10,000 objects, the difference between the times of both insertion and querying was not significant, so we decided to check with indexed tables* for both circles and polygons.

(and started to plan parallelizing the research due to the 100,000 objects insertion inefficiency)

Page 11: Optimal insert methods of geographical information to Spatio- temporal DB Final Presentation Industrial Project 234313 June 17,2012 Students: Michael Tsalenko

Methodology- Measuring Methodology- Measuring insertion time.insertion time. Indexed tables:

Are tables which are built using R-tree Data structure, which is a tree data structure used for spatial access methods, similar to the B-tree, R-tree is also a balanced search tree, so allleaf nodes at thesame height.

Page 12: Optimal insert methods of geographical information to Spatio- temporal DB Final Presentation Industrial Project 234313 June 17,2012 Students: Michael Tsalenko

Methodology- Measuring Methodology- Measuring insertion time.insertion time.The difference between the results of the

querying and insertion to indexed and non-indexed tables were not significant:

(*) Hybrid circle in polygon means the query was to select a circle intersections in polygons table, and Hybrid polygon in circle means the opposite.

Page 13: Optimal insert methods of geographical information to Spatio- temporal DB Final Presentation Industrial Project 234313 June 17,2012 Students: Michael Tsalenko

Parallelizing inserting.Parallelizing inserting.

We created 10 threads, each thread inserted 10,000 objects to each one of the 2 indexed tables (quads, circles).

This parallelizing enabled us to insert 100,000 objects in a reasonable time (~15mins instead of more than 2 hours when single thread).

We stopped inserting into the non-indexed tables since we saw that indexed tables give better results.

We can see that the circles insertion is better than quads insertion by ~13%.

Page 14: Optimal insert methods of geographical information to Spatio- temporal DB Final Presentation Industrial Project 234313 June 17,2012 Students: Michael Tsalenko

Parallelizing querying.Parallelizing querying.We created 10 threads, each thread

inserted 10,000 object to each one of the 4 tables (quads, circles)

The difference between the results of the tables were not significant.

Page 15: Optimal insert methods of geographical information to Spatio- temporal DB Final Presentation Industrial Project 234313 June 17,2012 Students: Michael Tsalenko

Methodology- vertices Methodology- vertices amountamountIn order to check if the number of vertices in a

polygon makes a difference, we have converted each polygon to an octagon in the following way:We added a vertex in the middle of each edge of original quadrate and added a factor of 0.001, which will prevent DB vendor optimizations

Page 16: Optimal insert methods of geographical information to Spatio- temporal DB Final Presentation Industrial Project 234313 June 17,2012 Students: Michael Tsalenko

Methodology- vertices Methodology- vertices amountamount

Insertion results:

We can see that the insertion to octagon is worse than to the quad polygons by ~7% and to the cycles by ~18%

Page 17: Optimal insert methods of geographical information to Spatio- temporal DB Final Presentation Industrial Project 234313 June 17,2012 Students: Michael Tsalenko

Methodology- vertices Methodology- vertices amountamountQueries results:

The results show that there is no significant difference between the querying to the tables. But it may be expected that on average intersecting a circle with quad polygons geo-data yields the best results.

Page 18: Optimal insert methods of geographical information to Spatio- temporal DB Final Presentation Industrial Project 234313 June 17,2012 Students: Michael Tsalenko

Achievements & Achievements & Conclusions:Conclusions:We helped the IBM researchers to

decide that the best way to represent the geo-data (for both insertion and querying) is just to keep its original representation (no mistake areas) – as long as it is up to 8 vertices (or a circle), and if it has more than 8 vertices – further research is needed.

There is no need to look for other heuristics for data manipulation (bounding or bounded circles, dividing into several geo-objects, etc…).

Page 19: Optimal insert methods of geographical information to Spatio- temporal DB Final Presentation Industrial Project 234313 June 17,2012 Students: Michael Tsalenko

Achievements & Achievements & Conclusions:Conclusions:We learned about:

◦The importance of making a good work plan – founding the basics, ensure all works and building on them the needed extensions.

◦How the indexed tables work◦Using JDBC with geodetic extension ◦Statistical tools and analyzing results◦It’s very helpful to have a supervisor

who is an expert in the project field and can direct us properly and save lots of time.