Upload
others
View
6
Download
0
Embed Size (px)
Citation preview
Similarity of mobile usersbased on sparse location history
Pasi FräntiRadu Mariescu-Istodor
Karol Waga
21.2.2019
P. Fränti, R. Mariescu-Istodor and K. Waga
”Similarity of mobile users based on sparse location history”
Int. Conf. Artificial Intelligence and Soft Computing (ICAISC)
Zakopane, Poland, 593-603, June 2018
Introduction
Motivation
•
Cold start users in Recommender systems
•
Activity in the same area •
Opinions of local experts more valuable [Bao’2012]
•
Identify the same user
Existing methods
•
Complete trajectories [Ying’10]•
Longest common subsequence with speed and turn points [Liu’12]
•
Stop points (>30 min) [Zheng’10]
•
Most common stops: home and work [Biagioni’13]
•
User given priority for parts more [Wang’12]
Problems?
•
Full GPS trajectories not always stored
•
Areas of activity are often more important than the exact tracks
•
Explicit activities such as visits, check-ins and photo taking are widely recorded.
Test case: APR trio
Andrei 201214
412
8
4
5
3
4
20
7
9 20
5
3
Andrei 2013
46
11
6
8
3
Andrei 2014
Pasi 2012
10
12
Pasi 2013
8
5
8
5
Pasi 2014
9
9
Radu 2012
18
28
Radu 2013
13
11
16
Radu 2014
6
21
8
11
A11
A13
A12
A14
46
11
6
8
3
20
7
9 20
5
3
14
412
8
4
5
3
410
6
6
11
Example of User A
Pasi 2011
Pasi 2013
Pasi 2012
Pasi 2014
5
610
12
8
4
5
8
59
9
Example of User P
Radu 2011
Radu 2013
Radu 2012
Radu 2014
19
32
18
28
13
11 6
16 21
8
11
Example of User R
Two approaches
1. Histogram comparison-
Construct histogram from some common data
-
Map points to the bins-
Histogram matching of the two counts
2. Clustering comparison-
Cluster each data separately
-
Map centroids between the data-
Calculate similarity based on the mapping
Selected approaches
1. Histogram comparison-
Construct histogram from some common data
-
Map points to the bins-
Histogram matching of the two counts
2. Clustering comparison-
Cluster each data separately
-
Map centroids between the data-
Calculate similarity based on the mapping
Approach 1:Histograms
Location historyActivity points of users
Histogram matchingCreate bins from popular locations
Location similarityCount visit statistics
1 0 0
1 0 0
0 0 0
0 0 3
0 0 2
2 2 0
1 1 1
4 3 19
76
Histogrambins
Visitcounts
Histogram of the toy example
4 3
1
2 2
0
0 0
3
1 1
1
0 0
2
1 0
0
1 0
0
0 0
0
9 6 7
8
43
3
2
1
1
0
0.44 0.50
0.14
0.22 0.33
0.00
0.00 0.00
0.43
0.11 0.17
0.14
0.00 0.00
0.28
0.11 0.00
0.00
0.11 0.00
0.00
0.00 0.00
0.00
Pekka Bartek Julia Pekka Bartek JuliaFrequencies Relative frequencies
Distance measures
Bhattacharyya distance
4 3
1
2 2
0
0 0
3
1 1
1
0 0
2
1 0
0
1 0
0
0 0
0
9 6 7
8
43
3
2
1
1
0
0.44 0.50
0.14
0.22 0.33
0.00
0.00 0.00
0.43
0.11 0.17
0.14
0.00 0.00
0.28
0.11 0.00
0.00
0.11 0.00
0.00
0.00 0.00
0.00
ii qpD ln
0.47
0.27
0.00
0.14
0.00
0.00
0.00
0.00
= 0.88
-ln = 0.13
0.26
0.00
0.00
0.15
0.00
0.00
0.00
0.000.410.89
Pekka Bartek Julia Pekka Bartek JuliaFrequencies Relative frequencies
Pekkavs
Bartek
Bartekvs
Julia
L1 distance
4 3
1
2 2
0
0 0
3
1 1
1
0 0
2
1 0
0
1 0
0
0 0
0
9 6 7
8
43
3
2
1
1
0
0.44 0.50
0.14
0.22 0.33
0.00
0.00 0.00
0.43
0.11 0.17
0.14
0.00 0.00
0.28
0.11 0.00
0.00
0.11 0.00
0.00
0.00 0.00
0.00
0.06
0.11
0.00
0.06
0.00
0.11
0.11
0.00
= 0.42
0.36
0.33
0.43
0.03
0.28
0.00
0.00
0.001.43
i
ii qpL1
4 3
1
2 2
0
0 0
3
1 1
1
0 0
2
1 0
0
1 0
0
0 0
0
9 6 7
8
43
3
2
1
1
0
0.44 0.50
0.14
0.22 0.33
0.00
0.00 0.00
0.43
0.11 0.17
0.14
0.00 0.00
0.28
0.11 0.00
0.00
0.11 0.00
0.00
0.00 0.00
0.00
0.36%
1.21%
0.00%
0.36%
0.00%
1.21%
1.21%
0.00%
= 0.04
13%
11%
18%
0%
8%
0%
0%
0%0.50
i
ii qpL 22
L2 distance
•
Uniform global grid (e.g. 2525 m)•
Some known places (Mopsi services)
•
All user data from Mopsi•
The two data as such:-
Cluster the joint activity data
-
Each cluster represents one bin-
Count blues and reds in the cluster using any histogram matching technique
How to create histogram bins
Approach 2:Clustering
Clustering of two distributions
6
10
4
1 7/1 = 7
9/6 = 1.5
4/0.5 = 8
6/0.5 = 12
Experimental results
Collected data
Municipalities:Municipalities:JoensuuJoensuuLiperiLiperiOutokumpuOutokumpuPolvijPolvijäärvirviKontiolahtiKontiolahtiIlomantsiIlomantsiJuukaJuuka
63.44N28.65E
62.25N31.58E
Joensuu sub-region bounding box
•
Histogram from 293 places (Mopsi services)•
User activities until 31.12.2014‒
Photos taken
‒
Tracking started or ended
Summary of APR trio data 2011-2014
Distinct users:
3Total users:
12
Sample sizes:
37 -
1263
14
412
8
4
5
3
4
Andrei2012
Andrei2011
10
6
6
11
Andrei2011
10
6
6
11
20
7
9 20
5
3
Andrei2013
20
7
9 20
5
3
Andrei2013
46
11
6
8
3
Andrei201446
11
6
8
3
Andrei2014
19
32Radu201119
32Radu2011
18
28
10
Radu201218
28
10
Radu2012
13
11
16 Radu2013
13
11
16 Radu2013
6
21
8
11
9
Radu20146
21
8
11
9
Radu2014
5
64
Pasi2011 5
64
Pasi2011
10
12Pasi201210
12Pasi2012
9
9
Pasi20149
9
Pasi2014
8
5
8
5
Pasi20138
5
8
5
Pasi2013
Ten most popular histogram bins Frequencies shown
Work
Home
Home
Test protocolExpected result
True positive:
13
11
16 Radu2013
13
11
16 Radu2013 5
64
Pasi2011 5
64
Pasi2011
True negative:
vs.9
9
Pasi20149
9
Pasi20145
64
Pasi2011 5
64
Pasi2011 vs.
Distance Distance
SAME DIFFERENT
• 12*12 = 144 comparisons in total• 3*4*4 = 48 true positives (33%)• 3*8*8 = 96 true negatives (67%)
??
Comparison of all pairs
vs.
Distance
0.01 0.01 0.02 0.03 0.05 0.06 …
0.74 0.76 0.81 0.82 0.88
Same user
threshold
Different user
Sort
Algorithm result:
Classification error
0.01 0.01 0.02 0.03 0.05 0.06 …
0.74 0.76 0.81 0.82 0.88
Same user Different user
same same not same same same …
not not not not not
Algorithm result:
Ground truth:
Errors made:
3Total comparisons:
144
Classification error:
2%
Classification results
Averagedistance A priori
Top-33% Best possible
Fuzzy worse than crisp
Error Error
What if top-10 places removed?
Conclusions
Main results:•
People’s identity can be recognized with high accuracy
•
All except L2
and L•
Fuzzy works worse than crisp
Future research:•
Clustering-based approach
•
K-NN classifier
Thank you
Time for Time for questions!questions!