HPCLatAm 2013

Permutation Index and GPU to Solve Efficiently Many Queries

AUTHORS: Mariela Lopresti, Natalia Miranda, Fabiana Piccoli, Nora Reyes

UNIVERSIDAD NACIONAL DE SAN LUIS



OBJECTIVES

Speed up multimedia database queries by combining a search index with High Performance Computing.

Search Index: Permutation.

High Performance Computing: Parallel programming on NVIDIA GPU.


INTRODUCTION

Multimedia Data.

How to resolve queries?

Similarity Search.

Metric Space Model: a paradigm that can model all similarity search problems.

Metric Database: stores the objects of a metric space and supports similarity searches over them.


INTRODUCTION

A metric space (X, d) is composed of a universe of valid objects X and a distance function d : X × X → R+ defined among them.

The distance function determines the similarity (or dissimilarity) between two given objects and satisfies the properties that make it a metric (strict positivity, symmetry, and the triangle inequality).

Similarity Search: given a dataset U of |U| = n objects, a query can be trivially answered by performing n distance evaluations.

There are two main queries of interest:

Range Searching.

The k Nearest Neighbors (k-NN).
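As an illustration of the trivial solution with n distance evaluations, here is a minimal Python sketch of both query types over a small word list. The function names, the sample words, and the use of "distance less than or equal to the radius" for range search are assumptions made for this example; the edit distance is used because it is the metric named later in the experiments.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def range_search(dataset, query, radius, dist=edit_distance):
    """Return every object within `radius` of the query (one distance per object)."""
    return [obj for obj in dataset if dist(obj, query) <= radius]


def knn_search(dataset, query, k, dist=edit_distance):
    """Return the k objects closest to the query (one distance per object)."""
    return sorted(dataset, key=lambda obj: dist(obj, query))[:k]


# Hypothetical sample data, not taken from the experiments.
words = ["house", "mouse", "horse", "table", "cable"]
print(range_search(words, "louse", 1))
print(knn_search(words, "hose", 2))
```

Both functions scan the whole dataset, which is the cost a search index tries to avoid.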


SEARCH INDEX

The information stored in the index can vary: some indices store a subset of the distances between objects, while others keep only a range of distance values.

The goal is to preprocess the dataset such that queries can be answered with as few distance computations as possible.

One of these indices is the Permutation Index.


INDEX: PERMUTATION

The permutation-based algorithm is a probabilistic algorithm.

It predicts the proximity between elements using their permutations.

If two elements are similar, then their permutations are expected to be similar.

Preprocessing step: compute the permutation of each element of the database.

All permutations are stored to form the index.
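A minimal Python sketch of the preprocessing step for a single element, assuming the usual definition in which the permutation of an element is the list of permutant indices ordered by increasing distance to it. The permutants, the tie-breaking rule (lower index first), and the function names are assumptions for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two words."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def permutation(obj, permutants, dist=edit_distance):
    """Indices of the permutants, ordered from closest to farthest from obj.

    Ties are broken by permutant index (an assumption of this sketch).
    """
    return sorted(range(len(permutants)),
                  key=lambda i: (dist(obj, permutants[i]), i))


# Hypothetical permutants chosen only for this example.
permutants = ["cat", "house", "tree"]
print(permutation("mouse", permutants))  # closest permutant is "house"
print(permutation("car", permutants))    # closest permutant is "cat"
```

Storing one such list per database element gives the index described above.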


GPU - CUDA

The GPU was developed with a highly parallel structure and high memory bandwidth.

The GPU achieves high throughput because of the compute capability of thousands of threads.

GPU characteristics:

Several streaming multiprocessors.

CPU-GPU memory hierarchy.

Threads running in parallel.


PERMUTATION ON GPU

Build a search index: Permutants.

Solve similarity queries on a database.


GPU-CUDA PERMUTATION INDEX


The Indexing process has two stages:

1- Calculate the distance between every object in the database and the permutants.

2- Set up the signatures of all objects in the database, i.e. all object permutations.

Each thread computes one object permutation.
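The two indexing stages can be sketched sequentially in Python as follows. This is a CPU illustration only, not the CUDA kernels of the presentation: each iteration over an object stands in for the work that one GPU thread would do. The 1-D integer data and the absolute-difference metric are stand-ins chosen to keep the example short.

```python
def build_index(database, permutants, dist):
    """Build a permutation index in two stages."""
    # Stage 1: distance from every object to every permutant.
    distances = [[dist(obj, p) for p in permutants] for obj in database]

    # Stage 2: signature of each object, i.e. the permutant indices sorted by
    # distance. One loop iteration corresponds to the work of one thread.
    index = []
    for row in distances:
        signature = sorted(range(len(permutants)), key=lambda i: (row[i], i))
        index.append(signature)
    return index


def abs_diff(a, b):
    """Stand-in metric on integers (an assumption of this sketch)."""
    return abs(a - b)


# Hypothetical data.
database = [1, 8, 15]
permutants = [0, 10, 20]
index = build_index(database, permutants, abs_diff)
print(index)
```

Because every signature depends only on its own row of distances, the objects can be processed independently, which is what makes a one-thread-per-object mapping natural.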


SOLVING APPROXIMATE QUERIES

1- Compute the permutation of the query object. Each thread computes one permutation.

2- Compare the permutation of the query object with the index, according to the footrule distance.

3- Sort the footrule distances. They are sorted with a quicksort implemented in parallel.
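Steps 2 and 3 can be sketched in Python as below, assuming the Spearman footrule (the sum, over all permutants, of the absolute difference between their positions in the two permutations). Python's built-in `sorted` stands in for the parallel quicksort; the index contents and function names are hypothetical.

```python
def footrule(perm_a, perm_b):
    """Spearman footrule distance between two permutations of the same items."""
    pos_b = {p: i for i, p in enumerate(perm_b)}
    return sum(abs(i - pos_b[p]) for i, p in enumerate(perm_a))


def rank_candidates(index, query_perm):
    """Object ids ordered by the footrule distance of their signature to the query's."""
    return sorted(range(len(index)),
                  key=lambda obj_id: footrule(index[obj_id], query_perm))


# Hypothetical index with three objects and three permutants.
index = [[0, 1, 2], [1, 0, 2], [2, 1, 0]]
query_perm = [1, 0, 2]
print([footrule(sig, query_perm) for sig in index])
print(rank_candidates(index, query_perm))
```

The resulting order is only a prediction of proximity; no real distance to a database object has been computed at this point.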


SOLVING APPROXIMATE QUERIES

4- Depending on the type of query, we evaluate the selected objects.

4.1- Range search: select the items whose distance is less than a reference range.

4.2- k-NN search:

4.2.1: compute the edit distance.

4.2.2: sort the distances with the quicksort and select the first k items of the sorted list.
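A minimal Python sketch of step 4, assuming the candidate order comes from the footrule sort of step 3 and that only the first `budget` candidates are compared directly with the query. The `budget` parameter, the candidate list, and the function names are assumptions for illustration; the slides do not say how many objects are selected.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two words."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def evaluate_range(candidates, database, query, radius, budget):
    """Step 4.1: keep reviewed objects whose real distance is less than the radius."""
    reviewed = candidates[:budget]
    return [database[i] for i in reviewed
            if edit_distance(database[i], query) < radius]


def evaluate_knn(candidates, database, query, k, budget):
    """Step 4.2: compute real distances, sort them, keep the first k objects."""
    reviewed = candidates[:budget]
    scored = sorted((edit_distance(database[i], query), database[i])
                    for i in reviewed)
    return [word for _, word in scored[:k]]


# Hypothetical data: `candidates` plays the role of the sorted list from step 3.
database = ["house", "mouse", "horse", "table", "cable"]
candidates = [1, 0, 2, 4, 3]
print(evaluate_range(candidates, database, "louse", radius=2, budget=3))
print(evaluate_knn(candidates, database, "louse", k=2, budget=3))
```

A smaller budget means fewer edit-distance computations but a higher chance of missing a true answer, which is why the method is approximate.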


SOLVING MANY QUERIES IN PARALLEL


Speeding up the answer to a single query is not enough: the capabilities of the GPU must be leveraged to answer several queries in parallel.

The permutation index is built once and is then used to answer many queries.

The GPU receives the set of queries and has to solve all of them.
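The "build once, answer many" structure can be sketched on the CPU as follows. A thread pool stands in for the GPU here, so the sketch shows only how a batch of queries shares one index, not the speedups reported later. The integer data, the absolute-difference metric, and all function names are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def abs_diff(a, b):
    """Stand-in metric on integers (an assumption of this sketch)."""
    return abs(a - b)


def permutation(obj, permutants, dist):
    """Permutant indices ordered by increasing distance to obj."""
    return sorted(range(len(permutants)),
                  key=lambda i: (dist(obj, permutants[i]), i))


def footrule(perm_a, perm_b):
    """Spearman footrule distance between two permutations."""
    pos_b = {p: i for i, p in enumerate(perm_b)}
    return sum(abs(i - pos_b[p]) for i, p in enumerate(perm_a))


def solve_query(query, index, permutants, dist):
    """Rank all database objects for one query using the shared index."""
    q_perm = permutation(query, permutants, dist)
    return sorted(range(len(index)), key=lambda i: footrule(index[i], q_perm))


def solve_many(queries, index, permutants, dist, workers=4):
    """Answer a whole batch of queries against the same index."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(
            lambda q: solve_query(q, index, permutants, dist), queries))


# Hypothetical data; the index is built a single time.
permutants = [0, 10, 20]
database = [1, 8, 15, 19]
index = [permutation(obj, permutants, abs_diff) for obj in database]

results = solve_many([2, 18], index, permutants, abs_diff)
print(results)
```

The index is read-only during the query phase, so the queries do not need to synchronize with each other.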


ANALYSIS OF EXPERIMENTAL RESULTS

We ran experiments with:

Database size: 4 KB, 29 KB and 84 KB.

Metric database: English words.

Distance function: edit distance.

CPU characteristics: Intel Core i3, 2.13 GHz, 3 GB of memory.


ANALYSIS OF EXPERIMENTAL RESULTS

GPU CHARACTERISTICS:

GeForce GPU   Global Memory   SM   SP    Clock Rate   Compute capability
GTX330        512 MB          6    48    1.04 GHz     1.2
GTX550Ti      1024 MB         4    192   1.96 GHz     2.1
GTX520MX      1024 MB         1    48    1.8 GHz      2.1


ANALYSIS OF EXPERIMENTAL RESULTS


Range Search Throughput

#permutants   GT520MX     GTX550Ti    GTX330
128           27639.72    29310.63    16973.21
64            29539.57    29362.77    16379.24
5             28197.27    29604.32    164740.46

k-NN Throughput

#permutants   GT520MX     GTX550Ti    GTX330
128           19824.25    19377.68    10850.85
64            19797.83    18857.32    11137.65
5             19906.59    19121.16    11262.48


ANALYSIS OF EXPERIMENTAL RESULTS


The next figure shows the acceleration obtained in range queries and k-NN queries for 80 queries solved in parallel.

Range queries show greater improvements than k-NN queries.

The best case is for the largest database and the maximum number of permutants.


ANALYSIS OF EXPERIMENTAL RESULTS


Speedup of Range search Queries on three different GPUs.

Speedup of k-NN Search Queries for different number of parallel queries


ANALYSIS OF EXPERIMENTAL RESULTS


Speedup of GPU-Qsort and Thrust on three different GPUs

Our implementation obtains a better speedup than the solution using the Thrust library.

It is important to note the independence of GPU-Qsort from the GPU characteristics: it works well on all the GPUs.


CONCLUSIONS

Implementation of an index: Permutants, used for approximate similarity searches in databases of words.

Empirical evaluation: improvements obtained on the different architectures considered.


FUTURE WORK

We plan to carry out an exhaustive experimental evaluation considering other kinds of databases and comparing with other solutions that apply GPUs to metric space similarity searches.

We also need to evaluate the retrieval effectiveness of the answers of the Permutation Index, as the number of objects directly compared with the query grows, using recall and precision measures.

Exploit the power of GPUs using optimization techniques to increase performance when solving many queries in parallel.
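For the planned effectiveness evaluation, recall and precision of an approximate answer against the exact one could be computed as in the Python sketch below. The function name, the sample answers, and the convention of returning 1.0 when a set is empty are assumptions for illustration, not results from this work.

```python
def recall_precision(retrieved, relevant):
    """Recall and precision of an approximate answer versus the exact answer.

    recall    = |retrieved AND relevant| / |relevant|
    precision = |retrieved AND relevant| / |retrieved|
    Empty sets yield 1.0 by convention (an assumption of this sketch).
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 1.0
    precision = hits / len(retrieved) if retrieved else 1.0
    return recall, precision


# Hypothetical answers to one query.
exact_answer = ["house", "mouse", "louse"]
approx_answer = ["house", "mouse", "horse", "table"]
recall, precision = recall_precision(approx_answer, exact_answer)
print(recall, precision)
```

Repeating the measurement while increasing the number of objects compared directly with the query would show how quickly the approximate answer approaches the exact one.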


THANKS FOR YOUR ATTENTION

Questions?


Mariela Lopresti: [email protected]

Natalia Miranda: [email protected]

Fabiana Piccoli: [email protected]

Nora Reyes: [email protected]