Spatial Indexing of large astronomical databases
László Dobos, István Csabai, Márton Trencséni
ELTE, Hungary
Typical datasets
~200M data points
multidimensional parameter space: real space, magnitudes, colors, redshift, etc.
over several hundred GB
Typical tasks
Classify object types by a set of linear inequalities in magnitude space (n-dimensional polyhedra)
Compute a histogram of the whole parameter space
Find similar objects
Find clusters
Compare the distributions of two very large datasets
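
The first task above can be illustrated with a minimal sketch. It assumes each inequality is written as a . x <= b; the function name `inside_polyhedron` and the two-colour cut are hypothetical, with made-up numbers rather than a real SDSS selection.

```python
from typing import List, Sequence, Tuple

# A half-space is stored as (a, b) and read as: a . x <= b
HalfSpace = Tuple[Sequence[float], float]


def inside_polyhedron(x: Sequence[float], halfspaces: List[HalfSpace]) -> bool:
    """Return True if the point x satisfies every inequality a . x <= b."""
    return all(
        sum(ai * xi for ai, xi in zip(a, x)) <= b
        for a, b in halfspaces
    )


# Hypothetical cut in a 2-D colour space (c1, c2), illustrative numbers only:
#   c1 <= 0.6   and   c2 >= 0.2   (the latter written as -c2 <= -0.2)
color_cut = [((1.0, 0.0), 0.6), ((0.0, -1.0), -0.2)]

points = [(0.5, 0.3), (0.7, 0.3), (0.5, 0.1)]

# Per-point evaluation: this is the work a table scan does for every row.
selected = [p for p in points if inside_polyhedron(p, color_cut)]
```

Evaluated this way, the cost grows with the total number of rows, regardless of how few rows are returned.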
Database servers
Ideal for storing large amounts of data, even when the data structure is not very complex
Optimized access to data over file systems
Clever caching methods
MSSQL 2005 can be programmed efficiently
Problems
DB servers evaluate data points against inequalities on a per-row basis
With traditional storage, points close together in magnitude space may be far apart on the disk(s)
When the expected result set is small, running a table scan is not optimal
Main idea
Divide the parameter space into small cells; two main methods:
  hierarchical: kd-tree, modified kd-tree, etc.
  adaptive: Voronoi tessellation
Intersect the cells with the search polyhedra instead of checking every point
Check on a per-point basis only when needed
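
A minimal sketch of the cell test, assuming axis-aligned box cells (as in a kd-tree) and a query written as inequalities a . x <= b. The name `classify_box` is hypothetical. The sketch uses a simple corner test per inequality rather than linear programming, so it is conservative: a box it cannot rule out is reported as "partial" and its points are checked one by one.

```python
from typing import List, Sequence, Tuple

HalfSpace = Tuple[Sequence[float], float]        # (a, b), read as a . x <= b
Box = Tuple[Sequence[float], Sequence[float]]    # (lower corner, upper corner)


def classify_box(box: Box, halfspaces: List[HalfSpace]) -> str:
    """Classify a box as 'inside', 'outside' or 'partial' w.r.t. the query."""
    lo, hi = box
    fully_inside = True
    for a, b in halfspaces:
        # A linear function takes its extreme values on a box at corners:
        # pick the lower or upper bound per coordinate by the sign of a_i.
        min_val = sum(ai * (l if ai >= 0 else h) for ai, l, h in zip(a, lo, hi))
        max_val = sum(ai * (h if ai >= 0 else l) for ai, l, h in zip(a, lo, hi))
        if min_val > b:
            return "outside"      # the whole box violates this inequality
        if max_val > b:
            fully_inside = False  # some corner violates it
    return "inside" if fully_inside else "partial"


# Query: the unit square 0 <= x <= 1, 0 <= y <= 1 as four inequalities.
unit_square = [((1.0, 0.0), 1.0), ((0.0, 1.0), 1.0),
               ((-1.0, 0.0), 0.0), ((0.0, -1.0), 0.0)]

labels = [
    classify_box(((0.25, 0.25), (0.5, 0.5)), unit_square),  # take all points
    classify_box(((2.0, 2.0), (3.0, 3.0)), unit_square),    # skip the cell
    classify_box(((0.5, 0.5), (1.5, 1.5)), unit_square),    # check each point
]
```

Only the points in "partial" cells need the per-point inequality check; "inside" cells are returned whole and "outside" cells are never read.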
Steps for creating the tessellation
1. For adaptive methods (like Voronoi): choose starting points (randomly?)
2. Calculate the cells and store them in the database
3. Look up the cell of each data point
4. Create a DB index on the cell ID (this orders the dataset on the disk)
5. Run queries
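
The steps above can be sketched in memory as follows. This is an illustration under stated assumptions, not the MSSQL implementation: the seeds are a random sample of the data, a Voronoi cell is represented only by its seed (the cell geometry of step 2 is not computed), and sorting by cell ID stands in for the DB index. All names are hypothetical.

```python
import random
from collections import defaultdict
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest_seed(p: Sequence[float], seeds: List[Point]) -> int:
    """Index of the closest seed, i.e. the Voronoi cell that contains p."""
    return min(
        range(len(seeds)),
        key=lambda i: sum((pi - si) ** 2 for pi, si in zip(p, seeds[i])),
    )


rng = random.Random(0)  # fixed seed so the sketch is repeatable
points = [(rng.random(), rng.random()) for _ in range(1000)]

# Step 1: choose starting points (assumption: a random sample of the data).
seeds = rng.sample(points, 16)

# Steps 2-3: each seed implies one cell; look up the cell of every point.
cell_of = [nearest_seed(p, seeds) for p in points]

# Step 4: ordering rows by cell ID stands in for the index on the cell ID,
# so points of the same cell end up next to each other.
order = sorted(range(len(points)), key=lambda i: cell_of[i])
table = [(cell_of[i], points[i]) for i in order]

# Step 5: a query that touches a few cells reads only those rows.
rows_by_cell = defaultdict(list)
for cell_id, p in table:
    rows_by_cell[cell_id].append(p)
```

The brute-force nearest-seed lookup costs one distance computation per seed for every point; it is kept here only because it makes the cell assignment easy to read.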
Technical details
Done in MSSQL using the new SQL CLR features
This allows running programs within the process of the database server: very fast!
Linear programming, Voronoi, etc. libraries are ported to .NET/C#
Preliminary results
[Figure: BoxTree vs. standard SQL query times. x-axis: # returned rows / # total rows (0 to 0.35); y-axis: query time [msec] (0 to 80000); series: BoxTree durations, SQL durations]
Scientific ideas
SDSS photometry – 5D – 300M points
  finding all objects with similar colors
  source classification
  star – quasar separation
  blue – red galaxy locus, etc.
Karhunen-Loeve (PCA) coeffs of Bruzual-Charlot models – 5-15D – 100K-100M p
  quick match with observed spectra