15
Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomerative Clustering Algorithm Based on Browsing History Priagung Khusumanegara, Rischan Mafrur, and Deokjai Choi School of Electronics & Computer Engineering Chonnam National University {priagung.123, rischanlab}@gmail.com, [email protected] The 23rd IFIP World Computer Congress (WCC 2015) Tuesday, October 6

Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomerative Clustering Algorithm Based on Browsing History

Embed Size (px)

Citation preview

Page 1: Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomerative Clustering Algorithm Based on Browsing History

Profiler for Smartphone Users Interests Using Modified Hierarchical

Agglomerative Clustering Algorithm Based on Browsing History

Priagung Khusumanegara, Rischan Mafrur, and Deokjai Choi

School of Electronics & Computer Engineering Chonnam National University

{priagung.123, rischanlab}@gmail.com, [email protected]

The 23rd IFIP World Computer Congress (WCC 2015)Tuesday, October 6

Page 2: Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomerative Clustering Algorithm Based on Browsing History

Outline• Introduction• Proposed System• Data Collection• Data Extraction• URL Filtering Categories• Modified Hierarchical Agglomerative

Clustering• Experimental Results

Page 3: Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomerative Clustering Algorithm Based on Browsing History

IntroductionMotivations:• Smartphone provides many applications to support our activity

which one of the applications is web browser applications.• People spend much time on browsing activity for finding useful

information that they are interested on it. • It is not easy to find the particular pieces of information that they

interested on it.

Proposed System:

• We proposed a Modified Hierarchical Agglomerative Clustering Algorithms on a server-based application to provides smartphone users profiling for interests-focused based on browsing history.

Page 4: Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomerative Clustering Algorithm Based on Browsing History

Proposed System

Figure: Illustration of Proposed System

Page 5: Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomerative Clustering Algorithm Based on Browsing History

Data Collection• Data Collection Application

– It is built using the Funf framework. Funf framework is an extensible sensing and data processing framework for mobile devices.

• Number of participants– 30 Chonnam National University Students (aged 19-22)

• Period of data collection– 1 month (July 2014)

Figure: Data Collection Application

Page 6: Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomerative Clustering Algorithm Based on Browsing History

Data Extraction• Example of Collected Data

• We extract collected data to observe a resource name part of URL structure which is useful information to analyze user interest.

• Example of data extraction

Page 7: Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomerative Clustering Algorithm Based on Browsing History

URL Filtering Categories• URL Labeling

The URL data is labeled based on a popular Web directory.Open Directory Project (ODP) (dmoz.org).

• URL Filtering Categories

• The matrix of filtering categories

Category Group Category Type

Business Business/Economy, Job Search/Careers, Real Estate, and Shopping

Communications and search Blog/Web Communication, Social Networks, Email, and Search Engines

General Computer/Internet, Education, News/Media, and Reference

Lifestyle Entertainment, Games, Arts, Humor, Religion, Restaurants/Food, and Travel

𝐶4𝑥𝑛 = ۏ����������ێ?????ێ?????ێ?????𝑘𝑒𝑦1,1ۍ����������������� 𝑘𝑒𝑦1,2 … 𝑘𝑒𝑦1,𝑛𝑘𝑒𝑦2,1 𝑘𝑒𝑦2,2 … 𝑘𝑒𝑦2,𝑛𝑘𝑒𝑦3,1𝑘𝑒𝑦4,1

𝑘𝑒𝑦3,2𝑘𝑒𝑦4,2…… 𝑘𝑒𝑦3,𝑛𝑘𝑒𝑦4,𝑛 ۑ��������������ے������������������

ۑ��������������ۑ��������������ې?????

Page 8: Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomerative Clustering Algorithm Based on Browsing History

Modified Hierarchical Agglomerative Clustering

Input: 𝐴 𝑠𝑒𝑡 𝑋 𝑜𝑓 𝑒𝑥𝑡𝑟𝑎𝑐𝑡𝑒𝑑 𝑈𝑅𝐿 𝑑𝑎𝑡𝑎 {𝑈𝑅𝐿1,…,𝑈𝑅𝐿𝑛} 𝑈𝑅𝐿 𝐹𝑖𝑙𝑡𝑒𝑟𝑖𝑛𝑔 𝐶𝑎𝑡𝑒𝑔𝑜𝑟𝑖𝑒𝑠 𝐶4𝑥𝑛 =

ۏ����������

𝑘𝑒𝑦1,1 𝑘𝑒𝑦1,2 ⋯ 𝑘𝑒𝑦1,𝑛𝑘𝑒𝑦2,1 𝑘𝑒𝑦2,2 … 𝑘𝑒𝑦2,𝑛𝑘𝑒𝑦3,1 𝑘𝑒𝑦3,2 … 𝑘𝑒𝑦3,𝑛𝑘𝑒𝑦4,1 𝑘𝑒𝑦4,2 … 𝑘𝑒𝑦4,1 ے������������������

𝐴 𝐿𝑎𝑣𝑒𝑛𝑠ℎ𝑡𝑒𝑖𝑛 𝑑𝑖𝑠𝑡𝑎𝑛𝑐𝑒 𝑓𝑢𝑛𝑐𝑡𝑖𝑜𝑛 𝑑𝑖𝑠𝑡(𝑐1,𝑐2) 𝒐𝒖𝒕𝒑𝒖𝒕: 𝐷𝑒𝑔𝑟𝑒𝑒 𝑜𝑓 𝐼𝑛𝑡𝑒𝑟𝑒𝑠𝑡, 𝑢𝑠𝑒𝑟 𝑖𝑛𝑡𝑒𝑟𝑒𝑠𝑡𝑠 1: 𝐶𝑔𝑟𝑜𝑢𝑝1= [] 2: 𝐶𝑔𝑟𝑜𝑢𝑝2= [] 3: 𝐶𝑔𝑟𝑜𝑢𝑝3= [] 4: 𝐶𝑔𝑟𝑜𝑢𝑝4= [] 5: 𝒇𝒐𝒓 𝑖 = 1 𝑡𝑜 𝑛 6: 𝑐𝑖 = {𝑈𝑅𝐿𝑖} 7: 𝑒𝑛𝑑 𝑓𝑜𝑟 8: 𝐶 = {𝑐1,…,𝑘} 9: 𝑙 = 𝑛+ 1 10: 𝑾𝒉𝒊𝒍𝒆 𝐶.𝑠𝑖𝑧𝑒 > 1 𝑑𝑜 11: (𝑐𝑚𝑖𝑛1,𝑐𝑚𝑖𝑛2) = 𝑚𝑖𝑛.𝑑𝑖𝑠𝑡𝑎𝑛𝑐𝑒 (𝑐𝑖,𝑐𝑗) 𝑓𝑜𝑟 𝑎𝑙𝑙 𝑐𝑖,𝑐𝑗 𝑖𝑛 𝐶 12: 𝑅𝑒𝑚𝑜𝑣𝑒 𝑐𝑚𝑖𝑛1 𝑎𝑛𝑑 𝑐𝑚𝑖𝑛2 𝑓𝑟𝑜𝑚 𝐶 13: 𝐴𝑑𝑑 {𝑐𝑚𝑖𝑛1,𝑐𝑚𝑖𝑛2} 𝑡𝑜 𝐶 14: 𝐼𝑓 {𝑐𝑚𝑖𝑛1,𝑐𝑚𝑖𝑛2} 𝑎𝑛𝑦 [𝑘𝑒𝑦1,𝑛],𝑤ℎ𝑒𝑟𝑒 𝑛 = 1,2,..,𝑛: 15: 𝐴𝑑𝑑 {𝑐𝑚𝑖𝑛1,𝑐𝑚𝑖𝑛2} 𝑡𝑜 𝐶𝑔𝑟𝑜𝑢𝑝1 16: 𝑒𝑙𝑖𝑓 {𝑐𝑚𝑖𝑛1,𝑐𝑚𝑖𝑛2} 𝑎𝑛𝑦 [𝑘𝑒𝑦2,𝑛],𝑤ℎ𝑒𝑟𝑒 𝑛 = 1,2,..,𝑛: 17: 𝐴𝑑𝑑 ሼ𝑐𝑚𝑖𝑛1,𝑐𝑚𝑖𝑛2ሽ 𝑡𝑜 𝐶𝑔𝑟𝑜𝑢𝑝2 18: 𝑒𝑙𝑖𝑓 {𝑐𝑚𝑖𝑛1,𝑐𝑚𝑖𝑛2} 𝑎𝑛𝑦 [𝑘𝑒𝑦3,𝑛],𝑤ℎ𝑒𝑟𝑒 𝑛 = 1,2,..,𝑛: 19: 𝐴𝑑𝑑 ሼ𝑐𝑚𝑖𝑛1,𝑐𝑚𝑖𝑛2ሽ 𝑡𝑜 𝐶𝑔𝑟𝑜𝑢𝑝3 20: 𝑒𝑙𝑖𝑓 {𝑐𝑚𝑖𝑛1,𝑐𝑚𝑖𝑛2} 𝑎𝑛𝑦 [𝑘𝑒𝑦4,𝑛],𝑤ℎ𝑒𝑟𝑒 𝑛 = 1,2,..,𝑛: 21: 𝐴𝑑𝑑 ሼ𝑐𝑚𝑖𝑛1,𝑐𝑚𝑖𝑛2ሽ 𝑡𝑜 𝐶𝑔𝑟𝑜𝑢𝑝4 22: 𝑙 = 𝑙 + 1 23: 𝒆𝒏𝒅 𝒘𝒉𝒊𝒍𝒆 24: 𝐷𝑒𝑔𝑟𝑒𝑒 𝑜𝑓 𝐼𝑛𝑡𝑒𝑟𝑒𝑠𝑡 𝐶𝑔𝑟𝑜𝑢𝑝1 =

𝑙𝑒𝑛𝑔𝑡ℎ.𝐶𝑔𝑟𝑜𝑢𝑝 1σ 𝑙𝑒𝑛𝑔𝑡 ℎ.𝐶𝑔𝑟𝑜𝑢𝑝 𝑖4𝑖=1

25: 𝐷𝑒𝑔𝑟𝑒𝑒 𝑜𝑓 𝐼𝑛𝑡𝑒𝑟𝑒𝑠𝑡 𝐶𝑔𝑟𝑜𝑢𝑝2 = 𝑙𝑒𝑛𝑔𝑡ℎ.𝐶𝑔𝑟𝑜𝑢𝑝 2σ 𝑙𝑒𝑛𝑔𝑡 ℎ.𝐶𝑔𝑟𝑜𝑢𝑝 𝑖4𝑖=1

26: 𝐷𝑒𝑔𝑟𝑒𝑒 𝑜𝑓 𝐼𝑛𝑡𝑒𝑟𝑒𝑠𝑡 𝐶𝑔𝑟𝑜𝑢𝑝3 = 𝑙𝑒𝑛𝑔𝑡ℎ.𝐶𝑔𝑟𝑜𝑢𝑝 3σ 𝑙𝑒𝑛𝑔𝑡 ℎ.𝐶𝑔𝑟𝑜𝑢𝑝 𝑖4𝑖=1

27: 𝐷𝑒𝑔𝑟𝑒𝑒 𝑜𝑓 𝐼𝑛𝑡𝑒𝑟𝑒𝑠𝑡 𝐶𝑔𝑟𝑜𝑢𝑝4 = 𝑙𝑒𝑛𝑔𝑡ℎ.𝐶𝑔𝑟𝑜𝑢𝑝 4σ 𝑙𝑒𝑛𝑔𝑡 ℎ.𝐶𝑔𝑟𝑜𝑢𝑝 𝑖4𝑖=1

28: set 𝐶𝑔𝑟𝑜𝑢𝑝 𝑖 which has max ሺ𝐷𝑒𝑔𝑟𝑒𝑒 𝑜𝑓 𝐼𝑛𝑡𝑒𝑟𝑒𝑠𝑡 ሻ𝑎𝑠 𝑢𝑠𝑒𝑟 𝑖𝑛𝑡𝑒𝑟𝑒𝑠𝑡𝑠

Page 9: Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomerative Clustering Algorithm Based on Browsing History

Experimental Results

Figure: Degree of Users Interests

Figure: Users Interests Profile

Page 10: Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomerative Clustering Algorithm Based on Browsing History

Experimental Results (Cont’d)• C 4.5 is the best known classification

algorithm used to generate decision trees for continuous and discrete attributes.

• Attributes:– Total session time– Total time a user stays at the site– Total number of accessed pages during the

whole session

Page 11: Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomerative Clustering Algorithm Based on Browsing History

Experimental Results (Cont’d)

Figure : Execution time comparison with C4.5 algorithm

Page 12: Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomerative Clustering Algorithm Based on Browsing History

Experimental Results (Cont’d)

Memory Size (MB) MHAC (%) C4.5 (%)

25 83.3 76.7

50 86.7 80.0

75 93.3 80.0

100 96.7 83.3

125 96.7 86.7

Table. Accuracy comparison with C4.5 algorithm

Page 13: Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomerative Clustering Algorithm Based on Browsing History

Conclusion and Futures Work• We proposed a modified Hierarchical Agglomerative Clustering

that can automatically provides a user interests profile.

• The proposed algorithms can measure degree of users’ interests and inferring particular pieces of information that they interested on it based on browsing history

• Our work can outperforms the C4.5 algorithm in execution time and accuracy on all memory utilization.

• In the future, we need to implement Map-Reduce algorithm on modified Hierarchical Agglomerative Clustering to enhance performance of clustering algorithm.

Page 14: Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomerative Clustering Algorithm Based on Browsing History

AcknowledgementsThis research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2012R1A1A2007014).

Page 15: Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomerative Clustering Algorithm Based on Browsing History

Thank You감사합니다