International Journal of Emerging Trends & Technology in Computer Science (IJETTCS) Web Site: www.ijettcs.org Email: [email protected]
Volume 6, Issue 3, May- June 2017 ISSN 2278-6856
Volume 6, Issue 3, May – June 2017 Page 395
Abstract
The success of Twitter and the opportunities it provides have led to the creation of new web-based social networking applications and opened new frontiers. Discovering the usage patterns of social media sites can therefore inform decisions about the design and implementation of such applications, as well as of educational tools. In this study we extract the tweets posted by users, analyze them, and predict the age group and gender of Twitter users. The classification model is built using lexical features and learning algorithms. The ability to classify latent user attributes such as gender and age solely from the informal language of Twitter users has important applications in personalization, advertising, and recommendation. The work investigates classification algorithms over a rich set of features [1] for classifying these user attributes, and analyzes which features and approaches are effective for casual written genres, which differ from the commonly studied spoken genres. Because Twitter provides only the names of its users, latent attributes are not directly available on the site; they are hidden. A prediction system is therefore needed that infers the latent attributes of a Twitter user from his or her tweets. The same data set is also classified with several of WEKA's classification algorithms, and the results of the authors' method and of WEKA's classifiers are compared and analyzed. The results indicate that our method performs better than the other classifiers on this data set.
Keywords: Latent Attribute, SVM, WEKA
1. INTRODUCTION
A social network is an interactive network that connects many individuals through relationships. These relationships may initially appear simple, but closer scrutiny often reveals interesting structures. As social media becomes more integrated into our environment, it plays a bigger role in our daily lives, and researchers have come to regard it as a major area of study. The topic is all the more worthwhile because economic and business opportunities have sprung up around these sites: social media sites use user information to automate profile customization, advertisement targeting, and interest prediction. However, the information users provide on a site is often fragmentary and incomplete. Users may give their name and workplace but leave out other details such as interests, gender, and sometimes age. It is therefore valuable to predict the missing information from the subset that users do provide.
Predicting users' preferences and demographic information has a long history. Gender may seem a basic piece of information, but predicting it opens the door to many other applications of information prediction. Given a user's actual name, we can often judge whether it is a male or a female name, but only when the name is not ambiguous (e.g. Taylor, Sam, Kiran). Moreover, we sometimes have no access to a user's actual name and must rely on other information, such as the screen name and followers (or friends), to predict gender. Since Twitter provides only the name and location of its users, we develop a classification system that predicts the latent attributes of a Twitter user from his or her tweets. The system predicts the age group and gender of Twitter users of a particular region, using a classification model built on lexical features and learning algorithms. We thus propose a system for “Automated identification of user attributes from social media site: case-study Twitter”. Social media outlets such as Twitter have become an important forum for peer interaction, and the designed system can classify user attributes, including gender and age, solely from tweets.
2. FLOW MODEL OF THE PROPOSED SYSTEM
The flow model of the proposed system is shown in Fig. 1. The steps in the flow are as follows:
Figure 1 Data Flow Diagram of system
Automated Identification of Latent Attributes of Twitter users
Karuna C. Gull¹, Sudip Padhye², Dr. Subodh Jain³
¹Department of Computer Science and Engineering, K.L.E. Institute of Technology, V.T.U., Hubli - 580030, India.
²Business Intelligence, Digital Transformation Unit, KPIT Technologies Ltd., Navi Mumbai - 400710, India.
³Department of Computer Science, SVN University, Sagar, MP, India.
1. The application first authenticates with Twitter to access the relevant data [7].
2. The extracted data is preprocessed [8].
3. The preprocessed data serves as training data for feature selection.
4. Information related to the user's screen name or actual name is collected and analyzed.
5. The analyzed data is fed to the Support Vector Machine (SVM) algorithm to predict user attributes such as gender and age.
6. The predicted output is sent to the display module, which displays the result.
5. ANALYSIS OF THE DESIGNED SYSTEM
Fig. 2 elaborates the proposed method for identifying user attributes. The steps are as follows.
Step 1: The authentication process issues an access token and an access token secret [7], which the application uses to extract tweets and user information separately.
Step 2: The data extracted from Twitter is preprocessed to train the system, for example by removing stop words and counting the occurrences of each word in the tweets. The preprocessed data is then converted into the vector form accepted by the machine learning algorithm, the Support Vector Machine (SVM).
Step 3: The SVM algorithm is used to identify user attributes, specifically age and gender, from the collected metadata.
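A minimal Python sketch of Step 2 follows, turning a user's tweets into a word-count vector. The stop-word list, the whitespace tokenization and the helper names count_words and to_vector are assumptions for illustration; the paper does not specify them.

```python
from collections import Counter

# Hypothetical stop-word list; the paper does not give the list it used.
STOP_WORDS = {"and", "or", "as", "the", "a", "an", "is", "to", "of", "in"}


def count_words(tweets):
    """Count occurrences of each non-stop word across a user's tweets."""
    counts = Counter()
    for tweet in tweets:
        for word in tweet.lower().split():
            if word not in STOP_WORDS:
                counts[word] += 1
    return counts


def to_vector(counts, vocabulary):
    """Map word counts onto a fixed vocabulary order, giving a dense vector."""
    return [counts.get(word, 0) for word in vocabulary]


tweets = ["Watch the interview now", "the interview and the city"]
counts = count_words(tweets)
vocabulary = sorted(counts)  # ['city', 'interview', 'now', 'watch']
print(to_vector(counts, vocabulary))  # [1, 2, 1, 1]
```

In practice the vocabulary would be fixed from the training users so that every user's vector has the same length and feature order.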
Figure 2 Elaboration of Proposed System
6. SUPPORT VECTOR MACHINE
The support vector machine is one of the most widely used classification methods; its popularity stems largely from its high classification accuracy. The support vector machine [2] is a non-probabilistic binary linear classifier. It represents the training data as points in a multidimensional space and tries to separate the two classes with a hyperplane. If the classes are not linearly separable in that space, a kernel function can implicitly map the data into a higher-dimensional feature space in which a separating hyperplane may exist [6], and a soft-margin formulation tolerates a limited number of training points on the wrong side of the boundary. Fig. 3 shows a basic representation of how the SVM splits the data.
Figure 3 SVM basic operation (Anon., 2011)
Working: Support vectors are the training observations that lie closest to the separating boundary; they alone determine its position. The support vector machine [3] finds the boundary (a hyperplane, or a line in two dimensions) that best segregates the two classes. Each data item is plotted as a point in an n-dimensional space, where n is the number of features. For a given dataset there are usually many, sometimes infinitely many, hyperplanes that separate the classes; the SVM algorithm [5] chooses the one that provides the maximum separation between the classes, i.e. the maximal-margin hyperplane, which minimizes an upper bound on the classification error. A new point is then classified according to the side of this hyperplane on which it falls.
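To make the decision rule concrete, the Python sketch below classifies points by the sign of w·x + b, picks out the support vectors (the points with y(w·x + b) = 1) and computes the margin width 2/‖w‖. The hyperplane x1 + x2 = 3 is hand-picked for a toy data set rather than learned, and all names are hypothetical.

```python
import math


def decision(w, b, x):
    """Signed score w.x + b; its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, b, x):
    return 1 if decision(w, b, x) >= 0 else -1


def margin_width(w):
    """Width 2/||w|| of the margin of a canonical hyperplane."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


def support_vectors(w, b, X, y, tol=1e-9):
    """Points lying exactly on the margin, i.e. y_i (w.x_i + b) = 1."""
    return [x for x, yi in zip(X, y) if abs(yi * decision(w, b, x) - 1) <= tol]


w, b = [1.0, 1.0], -3.0  # hand-picked canonical hyperplane x1 + x2 = 3
X = [[0, 0], [1, 1], [2, 2], [3, 3]]
y = [-1, -1, 1, 1]
print([predict(w, b, x) for x in X])  # [-1, -1, 1, 1]
print(support_vectors(w, b, X, y))    # [[1, 1], [2, 2]]
print(round(margin_width(w), 4))      # 1.4142
```

The two support vectors sit at distance √2 from each other, which equals the computed margin width, as expected for the maximal-margin hyperplane of these points.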
Figure 4 Hyper-plane separating the two classes [9]
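The objective function for the linear SVM is:
$$\min_{\theta}\; C \sum_{i=1}^{m} \left[ y^{(i)}\, \mathrm{cost}_1\!\left(\theta^{T} x^{(i)}\right) + \left(1 - y^{(i)}\right) \mathrm{cost}_0\!\left(\theta^{T} x^{(i)}\right) \right] + \frac{1}{2} \sum_{j=1}^{n} \theta_j^{2}$$
where $\theta$ is the parameter vector, $x$ is the feature vector, $n$ is the number of features, $m$ is the number of training examples, $C$ is the regularization parameter, and $\mathrm{cost}_1(\theta^{T} x)$ and $\mathrm{cost}_0(\theta^{T} x)$ are the costs when $y = 1$ and $y = 0$, respectively.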
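The objective above can be evaluated directly. This Python sketch assumes the common hinge-shaped choices cost₁(z) = max(0, 1 - z) and cost₀(z) = max(0, 1 + z), which the paper does not state explicitly, and prepends a constant 1 to each feature vector so that θ₀ acts as a bias that, matching the sum over j = 1..n, is not regularized. The function names are hypothetical.

```python
def cost1(z):
    """Hinge-style cost for y = 1: zero once z >= 1 (assumed form)."""
    return max(0.0, 1.0 - z)


def cost0(z):
    """Hinge-style cost for y = 0: zero once z <= -1 (assumed form)."""
    return max(0.0, 1.0 + z)


def svm_objective(theta, X, y, C):
    """C * sum_i [y_i cost1(theta.x_i) + (1 - y_i) cost0(theta.x_i)]
    + 0.5 * sum_{j>=1} theta_j^2.
    Each x_i starts with a constant 1, so theta[0] is the unregularized bias."""
    data_term = 0.0
    for x, yi in zip(X, y):
        z = sum(t * xj for t, xj in zip(theta, x))
        data_term += yi * cost1(z) + (1 - yi) * cost0(z)
    reg_term = 0.5 * sum(t * t for t in theta[1:])
    return C * data_term + reg_term


theta = [-3.0, 1.0, 1.0]
X = [[1, 0, 0], [1, 1, 1], [1, 2, 2], [1, 1.5, 1.5]]
y = [0, 0, 1, 1]
print(svm_objective(theta, X, y, C=1.0))  # 2.0
```

Only the last example, which sits on the decision boundary (θᵀx = 0), contributes to the data term; the other three lie on or beyond their margins. A larger C penalizes such margin violations more heavily relative to the size of θ.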
7. IMPLEMENTATION PROCESS
After the application is authenticated with the Twitter account, the Twitter database is queried with a given screen name, and at least 500 status updates (tweets) are collected for that screen name. Fig. 5 shows the process used to train the system.
Figure 5 Process to collect the information needed for training the system
The status information, together with the screen name and tweets, is inserted into a database table. The system's database contains a tweet_info table, whose sample contents are shown in Table 1.
Table 1: Status information in the tweet_info table
| tweet_id | screen_name | tweet | timing | re_tweet | favourite |
|---|---|---|---|---|---|
| 583640176238277000 | _AkshataShetty | Nashville Recap: Smash Landings: I’m so glad they started right after the slap-hug combo, aren’t you? ... http://t.co/wDRTOvXogh #music | 2015-04-03 08:40:36 | 0 | 0 |
| 583640176917754000 | _AkshataShetty | GRRM Posts a New Winds of Winter Preview Chapter, and It’s All About Sansa: Sansa Starks story line o... http://t.co/hnG4kQvisG #music | 2015-04-03 08:40:36 | 0 | 0 |
| 583640177593098000 | _AkshataShetty | True Story Author Michael Finkel on His Relationship With the Murderer Who Inspired the James Franco–Jo... http://t.co/9DUGbIzE0e #music | 2015-04-03 08:40:36 | 1 | 1 |
| ... | ... | ... | ... | ... | ... |
Next, each user's gender, age range and category (train or test) are entered to create the user_info table, which serves as training data for the system. Table 2 shows sample data.
Table 2: Creation of user_info table
| User_name | Gender | Age | Category |
|---|---|---|---|
| _AkshataShetty | Female | 21-40 | Test |
| AkanchaS | female | 41-60 | train |
| akrout81 | male | 21-40 | train |
| AmarRamesh | male | 21-40 | train |
| amitnimade | male | 41-60 | train |
| Amitvele | male | 21-40 | train |
| ... | ... | ... | ... |
| cadrsunilgupta | male | 41-60 | train |
| cartoonistpai | male | 21-40 | train |
| chakraberty | male | 41-60 | train |
| ChefAroraBhakti | female | 21-40 | train |
| crazydiode | male | 21-40 | train |
| ... | ... | ... | ... |
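One way to hold the two tables and carry out the first two training steps below is sketched here with Python's built-in sqlite3 module. The column names follow Tables 1 and 2, but the column types, the INSERT OR IGNORE de-duplication and the function training_documents are assumptions; the sample rows are shortened stand-ins, not the paper's data.

```python
import sqlite3

SCHEMA = """
CREATE TABLE tweet_info (
    tweet_id    INTEGER PRIMARY KEY,   -- 64-bit Twitter status ID
    screen_name TEXT NOT NULL,
    tweet       TEXT NOT NULL,
    timing      TEXT,
    re_tweet    INTEGER DEFAULT 0,
    favourite   INTEGER DEFAULT 0
);
CREATE TABLE user_info (
    user_name TEXT PRIMARY KEY,
    gender    TEXT,
    age       TEXT,                    -- age range, e.g. '21-40'
    category  TEXT                     -- 'train' or 'test'
);
"""


def training_documents(conn):
    """Steps 1-2: build one document (list of tweets) per user in the 'train' category."""
    rows = conn.execute(
        """SELECT u.user_name, t.tweet
           FROM user_info AS u
           JOIN tweet_info AS t ON t.screen_name = u.user_name
           WHERE lower(u.category) = 'train'
           ORDER BY u.user_name, t.tweet_id"""
    )
    docs = {}
    for name, tweet in rows:
        docs.setdefault(name, []).append(tweet)
    return docs


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO user_info VALUES (?, ?, ?, ?)", [
    ("_AkshataShetty", "Female", "21-40", "Test"),
    ("AkanchaS", "female", "41-60", "train"),
    ("akrout81", "male", "21-40", "train"),
])
# INSERT OR IGNORE silently skips a repeated tweet_id, so a duplicated row is stored once.
conn.executemany(
    "INSERT OR IGNORE INTO tweet_info (tweet_id, screen_name, tweet) VALUES (?, ?, ?)",
    [(1, "AkanchaS", "first tweet"), (2, "AkanchaS", "second tweet"),
     (2, "AkanchaS", "second tweet"), (3, "akrout81", "hello"),
     (4, "_AkshataShetty", "a test-set tweet")],
)
print(training_documents(conn))
# {'AkanchaS': ['first tweet', 'second tweet'], 'akrout81': ['hello']}
```

The lower() call makes the category filter tolerant of the mixed capitalization seen in Table 2 ("Test" vs "train").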
System training
1. Retrieve the users whose category is “train” and place their documents in the set Docs.
2. For every such user, retrieve the user's tweets from the tweet_info table and place them in one document d ∈ Docs (d1 holds the tweets of user1, d2 the tweets of user2, and so on).
Figure 6 Process to predict the user attributes
3. Pre-process the data: clean the tweets by removing links, @ mentions, newlines, punctuation, numbers, etc.; concatenate the tweets of each user into a single text; and tokenize the concatenated text.
4. Fill each document with all of its tokens.
5. Eliminate words of one or two characters (e.g. I, a, an, am) and keep only words of more than two characters.
These steps build the vocabulary: the tweet_info data is pre-processed by cleaning, concatenating and tokenizing, which yields the processed_tweets data. Table 3 shows sample pre-processed tweets.
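Steps 3 to 5 can be sketched in Python with the standard re module. The regular expressions and the lowercasing are assumptions chosen to reproduce the kind of output in Table 3 (for example, “today's” becomes “todays” and a hashtag keeps its word); note that they also drop non-Latin characters, which the paper does not say it does. The helper names are hypothetical.

```python
import re


def clean_tweet(text):
    """Step 3 cleaning: drop links, @mentions, '#' signs, digits and punctuation."""
    text = re.sub(r"https?://\S+", " ", text)  # links
    text = re.sub(r"@\w+", " ", text)          # @mentions
    text = text.replace("#", " ")              # keep the hashtag word itself
    text = re.sub(r"[^A-Za-z\s]", "", text)    # digits, punctuation, apostrophes
    return " ".join(text.lower().split())      # lowercase, collapse newlines/spaces


def preprocess_user(tweets, min_len=3):
    """Steps 3-5: concatenate a user's cleaned tweets, tokenize on whitespace,
    and keep only tokens of at least `min_len` characters."""
    concatenated = " ".join(clean_tweet(t) for t in tweets)
    return [token for token in concatenated.split() if len(token) >= min_len]


tweets = ["Great going @richa! Today's #DelhiTimes http://t.co/abc",
          "Watch now at 8 pm\n#MustWatch"]
print(preprocess_user(tweets))
# ['great', 'going', 'todays', 'delhitimes', 'watch', 'now', 'mustwatch']
```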
Table 3: Sample pre-processed tweets
| Tweet | Age | Gender |
|---|---|---|
| are you serious isupportmsg | 21-40 | Female |
| katju | 21-40 | Female |
| great going richa todays delhi times | 21-40 | Female |
| only the brave come here faces thirddegree at am watch now | 21-40 | Female |
| kejriwal ke saamne kiran pm halla bol delhiassemblyelections watch live at aajtakin | 21-40 | Female |
| hahahaha | 21-40 | Female |
| first exclusive interview of on right now at pm mustwatch | 21-40 | Female |
| first exclusive interview of former ips on right now at pm mustwatch kiranbedi | 21-40 | Female |
| former ips kiranbedi says she will make delhi a world class city watch full interview at pm watch now | 21-40 | Female |
| modi at centre bedi in delhi kiranbedi says its modi at centre make bjp govt in delhi to get centres support at pm | 21-40 | Female |
| ... | ... | ... |
Using the tweet_info, user_info and processed_tweets data, the gender_vocabulary table is created. Table 4 shows sample data from the gender vocabulary.
Table 4: Sample gender vocabulary table
| word | frequency | gender |
|---|---|---|
| aab | 2 | female |
| aadarshliberal | 1 | female |
| aadhar | 1 | female |
| aadiguru | 1 | female |
| aadmi | 1 | female |
| aagyani | 1 | female |
| aahuti | 1 | female |
| aaj | 5 | female |
| aajtak | 7 | female |
| aajtakdillikadil | 1 | female |
| aajtakin | 4 | female |
| aaka | 1 | female |
| aakar | 1 | female |
| ... | ... | ... |
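A table like Table 4 can be produced by counting words per gender. This Python sketch assumes that “frequency” is the total number of times a word occurs in the processed tweets of training users of that gender; the paper does not define it precisely, and the dictionary layout and function name are hypothetical.

```python
from collections import Counter


def build_gender_vocabulary(processed_tweets, user_gender):
    """Return (word, frequency, gender) rows, where frequency is the total number
    of times the word occurs in the processed tweets of users of that gender.

    processed_tweets: dict screen_name -> list of tokens (training users only)
    user_gender:      dict screen_name -> 'female' or 'male' (from user_info)
    """
    counts = {}
    for name, tokens in processed_tweets.items():
        counts.setdefault(user_gender[name], Counter()).update(tokens)
    rows = []
    for gender in sorted(counts):
        for word in sorted(counts[gender]):
            rows.append((word, counts[gender][word], gender))
    return rows


processed_tweets = {"userA": ["aaj", "aajtak", "aaj"], "userB": ["aaj", "delhi"]}
user_gender = {"userA": "female", "userB": "male"}
for row in build_gender_vocabulary(processed_tweets, user_gender):
    print(row)
# ('aaj', 2, 'female')
# ('aajtak', 1, 'female')
# ('aaj', 1, 'male')
# ('delhi', 1, 'male')
```

An age vocabulary follows the same pattern with the age range from user_info in place of the gender.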
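Formulas (1) and (2) are then used to compute the term frequency (TF) and inverse document frequency (IDF) of each word; their product gives the TF-IDF value, and these values form the training set from which the SVM model file is constructed.
$$\mathrm{TF}(w) = \frac{\text{number of times } w \text{ appears in a document}}{\text{total number of terms in the document}} \qquad (1)$$
$$\mathrm{IDF}(w) = \log_{10}\frac{\text{total number of documents}}{\text{number of documents containing } w} \qquad (2)$$
$$\mathrm{TFIDF}(w) = \mathrm{TF}(w) \times \mathrm{IDF}(w)$$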
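Formulas (1) and (2) translate directly into Python. The base-10 logarithm is an inference: the IDF values in Table 5 are consistent with IDF(w) = log₁₀(73 / df(w)), i.e. 73 training documents (for example log₁₀ 73 ≈ 1.86332 for a word found in one document and log₁₀(73/2) ≈ 1.56229 for a word found in two). The toy corpus and function names below are hypothetical.

```python
import math


def tf(word, doc_tokens):
    """Formula (1): occurrences of `word` in the document / total terms in it."""
    return doc_tokens.count(word) / len(doc_tokens)


def idf(word, docs):
    """Formula (2): log10(number of documents / documents containing `word`).
    Assumes `word` occurs in at least one document."""
    df = sum(1 for doc in docs if word in doc)
    return math.log10(len(docs) / df)


def tfidf(word, doc_tokens, docs):
    """TF-IDF weight of `word` in one document."""
    return tf(word, doc_tokens) * idf(word, docs)


# Hypothetical toy corpus: one token list per user.
docs = [["delhi", "watch", "delhi"], ["watch", "now"], ["city", "watch"], ["delhi"]]
print(round(tf("delhi", docs[0]), 4))           # 0.6667
print(round(idf("delhi", docs), 4))             # 0.301
print(round(tfidf("delhi", docs[0], docs), 4))  # 0.2007

# With N = 73 documents (inferred from Table 5, not stated in the paper):
print(round(math.log10(73 / 1), 5))  # 1.86332, a word found in 1 document
print(round(math.log10(73 / 2), 5))  # 1.56229, a word found in 2 documents
```

A word that appears in every document, such as "watch" in three of the four toy documents, gets a low IDF and therefore contributes little to the TF-IDF vector.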
Table 5 shows the calculated IDF values for the keywords.
Table 5: Calculated IDF values for keywords
| Count | Word | idf |
|---|---|---|
| 0 | sanjay | 0.82193 |
| 1 | champions | 1.26126 |
| 2 | nietzsche | 1.86332 |
| 3 | event | 0.5209 |
| 4 | liberalisation | 1.86332 |
| 5 | generation | 0.960233 |
| 6 | islamists | 1.86332 |
| 7 | told | 0.562293 |
| 8 | possibly | 1.56229 |
| 9 | meet | 0.21987 |
| 10 | somebody | 1.08517 |
| 11 | rejection | 1.86332 |
| 12 | biggest | 0.5209 |
| 13 | relief | 1.16435 |
| 14 | bangladesh | 0.82193 |
| 15 | times | 0.14732 |
| 16 | pity | 1.56229 |
| ... | ... | ... |
The age and gender model files that serve as input to the SVM are shown in Table 6.
Table 6: Age/Gender model file for SVM
Age model file (input to SVM):
+1 7:7.0E-4 13:0.0014 14:0.001 15:2.0E-4 19:6.0E-4 20:1.0E-4 25:3.0E-4 26:6.0E-4 35:1.0E-4 37:9.0E-4 39:8.0E-4 41:1.0E-4 45:3.0E-4 54:9.0E-4 57:1.0E-4 58:0.0013 60:1.0E-4 61:6.0E-4 62:2.0E-4 66:3.0E-4 67:1.0E-4 71:0.0 72:3.0E-4 76:4.0E-4 77:0.0011 78:1.0E-4 79:6.0E-4 80:0.0 82:3.0E-4 83:1.0E-4 87:3.0E-4 90:9.0E-4 91:2.0E-4 96:3.0E-4 98:0.0 99:1.0E-4 100:1.0E-4 101:5.0E-4 102:1.0E-4 103:0.0
...
-1 7:6.0E-4 9:2.0E-4 14:9.0E-4 18:4.0E-4 24:1.0E-4 25:3.0E-4 26:5.0E-4 29:4.0E-4 35:1.0E-4 37:8.0E-4 39:7.0E-4 41:1.0E-4 45:3.0E-4 46:7.0E-4 50:3.0E-4 51:6.0E-4 57:1.0E-4 60:0.0 61:5.0E-4 62:2.0E-4 64:7.0E-4 66:3.0E-4 67:1.0E-4 71:0.0 78:1.0E-4 80:0.0 83:1.0E-4 89:0.0011 91:2.0E-4
...
Gender model file (input to SVM):
+1 1:0.0013 2:0.0019 3:5.0E-4 4:0.0019 5:0.001 6:0.0019 7:6.0E-4 8:0.0016 9:2.0E-4 10:0.0011 11:0.0019 12:5.0E-4 13:0.0012 14:8.0E-4 15:1.0E-4 16:0.0016 17:0.0019 18:4.0E-4 19:5.0E-4 20:1.0E-4 21:0.0019 22:0.0019 23:0.0019 24:1.0E-4 25:3.0E-4 26:5.0E-4 27:0.0019 28:0.0012 29:4.0E-4 30:0.001 31:9.0E-4 32:0.0016 33:0.0019 34:0.0016
...
-1 1:0.0017 7:7.0E-4 9:3.0E-4 15:2.0E-4 20:2.0E-4 24:1.0E-4 25:3.0E-4 26:6.0E-4 29:5.0E-4 35:1.0E-4 39:9.0E-4 41:1.0E-4 45:3.0E-4 46:9.0E-4 48:0.0014 57:1.0E-4 60:1.0E-4 64:9.0E-4 67:1.0E-4 71:0.0 76:5.0E-4 78:1.0E-4 79:7.0E-4 80:0.0 83:1.0E-4 87:3.0E-4 91:3.0E-4 96:4.0E-4 98:0.0 99:1.0E-4
...
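Each line in Table 6 follows the sparse “label index:value” layout used by SVM tools such as LIBSVM and SVMlight: the class label (+1 or -1) comes first, followed by pairs of a word's integer index and its TF-IDF weight; indices that do not appear are treated as zero. A small Python sketch for writing and reading such lines, with hypothetical helper names:

```python
def to_libsvm_line(label, features):
    """Format one training example as '<+1|-1> <index>:<value> ...'.

    features: dict mapping a word's integer index to its TF-IDF weight.
    Indices are written in ascending order, as LIBSVM requires."""
    parts = ["+1" if label > 0 else "-1"]
    parts += [f"{index}:{features[index]:.4g}" for index in sorted(features)]
    return " ".join(parts)


def parse_libsvm_line(line):
    """Inverse of to_libsvm_line: return (label, {index: value}).
    Indices missing from the line are implicitly zero."""
    head, *pairs = line.split()
    features = {}
    for pair in pairs:
        index, value = pair.split(":")
        features[int(index)] = float(value)
    return int(head), features


print(to_libsvm_line(1, {13: 0.0014, 7: 0.0007}))  # +1 7:0.0007 13:0.0014
print(parse_libsvm_line("-1 7:6.0E-4 9:2.0E-4"))   # (-1, {7: 0.0006, 9: 0.0002})
```

Storing only non-zero weights keeps the model files small, since each user's tweets use only a small fraction of the whole vocabulary.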
8. EXPERIMENTAL RESULTS
The steps and results of the work are described below, with screenshots.
Step 1 (training): The application connects to Twitter using the consumer key and secret generated on the Twitter developer site, and fetches tweets. Tweets typically contain repeated words, numbers, symbols, etc., so they are preprocessed: words such as and, or, as and symbols such as @, & and * are removed. Training requires at least a few hundred Twitter users. During training, the tweets collected from the Twitter server for each screen name are stored in the database together with the user's gender and age group, so that tweets of male and female users and of each age group can be separated. The tweet_info table contains the tweet ID, screen name, tweet, timing, re-tweet count and favourite count. The collected tweets and status information for each screen name are inserted into the tweet_info table (Table 1). Each user's gender, age range and category are entered to create the user_info table (Table 2), which serves as training data for the system. The tweet_info data is pre-processed by cleaning, concatenating and tokenizing to create processed_tweets (Table 3). Using the tweet_info, user_info and processed_tweets data, the gender vocabulary table (Table 4) and an analogous age vocabulary table are created. The age and gender model files are then created in the format required by the SVM algorithm, and a linear SVM is used to separate male from female users and one age group from the other. Because the SVM implementation requires numeric features, each word is represented by an integer feature index, as shown in Table 6.
For example, entering the screen name BeingSalmanKhan extracts that user's tweets (Fig. 7). The gender and age group for this screen name are then entered to train the system (Fig. 8), and the data is inserted into the database (Fig. 9).
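The paper does not say which linear SVM solver it used. As one possible stand-in, the Python sketch below trains a linear SVM with the Pegasos stochastic sub-gradient method on the hinge loss. It appends a constant feature to act as the bias (so, unlike the objective in Section 6, the bias is regularized) and uses dense toy vectors instead of the sparse TF-IDF vectors of Table 6. The +1/-1 labels could stand for, say, female/male, but that mapping and all names here are hypothetical.

```python
def train_linear_svm(X, y, lam=0.1, epochs=50):
    """Pegasos-style stochastic sub-gradient descent on
        lam/2 * ||w||^2 + (1/m) * sum_i max(0, 1 - y_i * w.x_i)
    X: list of feature vectors; y: labels in {+1, -1}.
    A constant 1.0 is appended to each vector so the last weight acts as a bias.
    Examples are visited in a fixed order, so the result is deterministic."""
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        for x, yi in zip(data, y):
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            violated = yi * sum(wj * xj for wj, xj in zip(w, x)) < 1
            w = [(1.0 - eta * lam) * wj for wj in w]  # step along the regularizer gradient
            if violated:  # hinge loss active: move towards y_i * x_i
                w = [wj + eta * yi * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    """Sign of w.[x, 1]: +1 or -1."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Toy, linearly separable data (dense vectors standing in for TF-IDF features).
X = [[2, 2], [3, 1], [1, 3], [-2, -2], [-3, -1], [-1, -3]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```

Pegasos minimizes the same kind of regularized hinge loss as the objective in Section 6, with λ playing the role of 1/(C·m). Two such models, one for age and one for gender, would be trained on the respective model files.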
Figure 7 Screenshot to extract the tweets for given screen name
Figure 8 Screenshot to train the system by giving options such as gender and age for a screen name
Figure 9 Screenshot to insert the data into the database for a given screen name
Step 2 (testing): Testing is similar to training. Tokens are generated and the number of unique words in the document is counted. The user's Twitter ID (screen name) is then entered to find that user's age group and gender. For test case 1, entering the screen name iamsrk yields male and age group 41-60, as shown in Figs. 10 to 12.
Figure 10 Screenshot to test the system by giving a screen name
Figure 11 Screenshot to extract the tweets for a given screen name under test
Figure 12 Screenshot to display the age and gender for a screen name under test
For test case 2, entering the screen name deepikapadukone yields female and age group 21-40, as shown in Figs. 13 and 14.
Figure 13 Screenshot to extract the tweets for another screen name under test
Figure 14 Screenshot to display the age and gender for another screen name under test
For test case 3, entering the screen name seemag produces the result shown in Fig. 15: because there were not enough tweets for this screen name, an error message is displayed.
Figure 15 Screenshot to display an error message for a screen name with too few tweets
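The three test cases combine two binary predictions with a check on the number of available tweets. The Python sketch below shows that control flow only. The minimum-tweet threshold, the mapping of score signs to age groups and genders, and the callables vectorize, age_score and gender_score are all hypothetical stand-ins, since the paper states none of them.

```python
def classify_user(tweets, vectorize, age_score, gender_score, min_tweets=50):
    """Predict (age_group, gender) for one screen name, or raise ValueError
    when there are too few tweets (the situation of test case 3)."""
    if len(tweets) < min_tweets:
        raise ValueError(f"only {len(tweets)} tweets; at least {min_tweets} needed")
    x = vectorize(tweets)
    # Hypothetical label mapping: a non-negative score means '21-40' / 'female'.
    age_group = "21-40" if age_score(x) >= 0 else "41-60"
    gender = "female" if gender_score(x) >= 0 else "male"
    return age_group, gender


# Toy stand-ins for the feature extraction and the two trained linear SVMs.
def vectorize(tweets):
    return [len(tweets)]


def age_score(x):
    return -1.0


def gender_score(x):
    return 1.0


print(classify_user(["t"] * 60, vectorize, age_score, gender_score))  # ('41-60', 'female')
try:
    classify_user(["t"] * 3, vectorize, age_score, gender_score)
except ValueError as err:
    print("error:", err)  # error: only 3 tweets; at least 50 needed
```

Raising an exception rather than returning a default label keeps an unreliable prediction from being shown for accounts with very little text.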
9. COMPARATIVE RESULTS
The data collected for this work is also given as input to the WEKA tool. The data sets in Table 7 are used as input to several of WEKA's algorithms for analysis.
Table 7: Data set for training and testing for both age and gender classification
| Classification | Training data | Testing data |
|---|---|---|
| Age | 42000 (age group 21-40) + 20000 (age group 40+) = 62000 tweets | 20 (age group 21-40) + 20 (age group 40+) = 40 tweets |
| Gender | 160000 (male group) + 160000 (female group) = 320000 tweets | 40 (male group) + 40 (female group) = 80 tweets |
The results of WEKA's Naïve Bayes, Decision Table and J48 classifiers for age on the data set in Table 7 are shown in Figs. 16 to 18.
Figure 16 Weka Tool’s Naïve Bayes Analysis for age
Figure 17 Weka Tool’s Decision table Analysis for age
Figure 18 Weka Tool’s J48 classification Analysis for age
Table 8 shows the true positive, false negative, false positive and true negative counts for age for WEKA's algorithms and for our SVM algorithm.
Table 8: TP, FN, FP and TN age values for WEKA's algorithms and SVM algorithm
| | Naïve Bayes | J48 | Decision Tables | SVM |
|---|---|---|---|---|
| True Positives | 40 | 40 | 0 | 40 |
| False Negatives | 0 | 0 | 40 | 4 |
| False Positives | 40 | 40 | 0 | 6 |
| True Negatives | 0 | 0 | 40 | 30 |
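The counts in Table 8 can be summarized as accuracy = (TP + TN)/(TP + FN + FP + TN), precision = TP/(TP + FP) and recall = TP/(TP + FN). The short Python sketch below computes them from the values in Table 8; reporting 0.0 when a denominator is zero is a convention chosen here, not something the paper specifies. Table 9 lists the same counts, so the gender figures come out identical.

```python
def metrics(tp, fn, fp, tn):
    """Return (accuracy, precision, recall); 0.0 where a denominator is zero."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# (TP, FN, FP, TN) per classifier, copied from Table 8.
table8 = {"Naive Bayes": (40, 0, 40, 0), "J48": (40, 0, 40, 0),
          "Decision Tables": (0, 40, 0, 40), "SVM": (40, 4, 6, 30)}
for name, counts in table8.items():
    acc, prec, rec = metrics(*counts)
    print(f"{name}: accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
# Naive Bayes: accuracy=0.500 precision=0.500 recall=1.000
# J48: accuracy=0.500 precision=0.500 recall=1.000
# Decision Tables: accuracy=0.500 precision=0.000 recall=0.000
# SVM: accuracy=0.875 precision=0.870 recall=0.909
```

On these counts Naïve Bayes and J48 label every test item positive and Decision Tables labels every item negative, so each reaches only 50% accuracy, while the SVM reaches 87.5%.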
Figure 19 Graph of true positive, false negative, false positive and true negative age values for WEKA's algorithms and the SVM algorithm
The results of WEKA's Naïve Bayes, Decision Table and J48 classifiers for gender on the data set in Table 7 are shown in Figs. 20 to 22.
Figure 20 Weka Tool’s Naïve Bayes Analysis for gender
Figure 21 Weka Tool’s Decision table Analysis for gender
Figure 22 Weka Tool’s J48 classification Analysis for gender
Table 9 shows the true positive, false negative, false positive and true negative counts for gender for WEKA's algorithms and for our SVM algorithm.
Table 9: TP, FN, FP and TN gender values for WEKA's algorithms and SVM algorithm
| | Naïve Bayes | J48 | Decision Tables | SVM |
|---|---|---|---|---|
| True Positives | 40 | 40 | 0 | 40 |
| False Negatives | 0 | 0 | 40 | 4 |
| False Positives | 40 | 40 | 0 | 6 |
| True Negatives | 0 | 0 | 40 | 30 |
Figure 23 Graph of true positive, false negative, false positive and true negative gender values for WEKA's algorithms and the SVM algorithm
10. CONCLUSION
Social media outlets such as Twitter have become an important forum for peer interaction. The ability to classify latent user attributes, including gender, age and regional origin, solely from the informal language of Twitter users therefore has important applications in personalization, advertising, and recommendation. This work investigates classification algorithms over a rich set of features [1] for classifying these user attributes. It also analyzes which features and approaches are effective in classifying user attributes from casual written genres, which differ from the commonly studied spoken genres. Since Twitter provides only the names of its users, latent attributes are not directly available on the site; they are
hidden elements. The classification system developed here helps predict the latent attributes of a Twitter user from his or her tweets. It finds both the age (within the given ranges) and the gender of a user name or screen name simultaneously, which, to our knowledge, had not been done before: previous work could find either the age or the gender of a given user name. The same data set was also classified with several of WEKA's classification algorithms, and the results of the authors' method and of WEKA's classifiers were compared and analyzed. The results indicate that our method performs better than the other classifiers on this data set. For sentiment analysis of the data, sentences are first classified into three classes (positive, negative and neutral), and the results are presented as pie charts. The list of screen names whose sentiments have been classified by the sentiment analysis module is given as input to the predictive module, whose output is a table of screen names with their gender and age classification. This table is linked to the marketing module: the list of screen names with their sentiment, gender and age group is given as input to the marketing module, which finally sends messages to the user groups selected by the user.
References
[1] Bo Pang and Lillian Lee, “Thumbs up? Sentiment classification using machine learning techniques”, in Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing (EMNLP '02), Volume 10, pp. 79-86, Association for Computational Linguistics, Stroudsburg, PA, USA, 2002.
[2] Miles N. Wernick, Robert M. Nishikawa, Nikolas P. Galatsanos, “A Support Vector Machine Approach for Detection of Microcalcifications”, IEEE Transactions on Medical Imaging, Vol. 21, No. 12, December 2002.
[3] Durgesh K. Srivastava, Lekha Bhambhu, “Data classification using support vector machine”, Journal of Theoretical and Applied Information Technology (JATIT), 2005-2009. Available: www.jatit.org
[4] Christopher J.C. Burges, “A Tutorial on Support Vector Machines for Pattern Recognition”, Bell Laboratories, Lucent Technologies, Kluwer Academic Publishers, Boston.
[5] Mingmin Chi, Rui Feng, Lorenzo Bruzzone, “Classification of hyperspectral remote-sensing data with primal SVM for small-sized training dataset problem”, COSPAR, Elsevier Ltd., 2008. doi:10.1016/j.asr.2008.02.012
[6] Dee Shi and Xiaojun Yang, “Support Vector Machines for Landscape Mapping from Remote Sensor Imagery”, Proceedings of AutoCarto 2012, Columbus, Ohio, USA, September 16-18, 2012.
[7] Narashima S. Purohit, Meghana Bhat, Akshata B. Angadi, Karuna C. Gull, “Crawling through Web to Extract the Data from Social Networking Site - Twitter”, IEEE National Conference on Parallel Computing Technologies (PARCOMPTECH), India, 2015, pp. 1-6. doi:10.1109/PARCOMPTECH.2015.7084522, ISBN: 978-1-4799-6916-6. Available: http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7084522
[8] Karuna Gull, Sudip Padhye, Dr. Subodh Jain, “A Comparative Analysis of Lexical/NLP Method with WEKA’s Bayes Classifier”, International Journal on Recent and Innovation Trends in Computing and Communication (IJRITCC), Volume 5, Issue 2, February 2017, pp. 221-227, ISSN: 2321-8169. Available: http://www.ijritcc.org or http://www.researcherid.com/rid/A-9769-2016
[9] http://dni-institute.in/blogs/building-predictive-model-using-svm-and-r/
Authors
Karuna C. Gull received the B.E. degree in Electronics and Communication from Karnataka University, India, in 1996 and the M.Tech. degree in Computer Science and Engineering from Visvesvaraya Technological University, India, in 2008. She has been working in the area of data mining and social networking since 2013. She has published 10 papers in international journals, 6 in international conference proceedings and 6 in national conference proceedings on data mining and image processing. She has also attended many workshops and conferences on topics including High Impact Teaching Skills, Embedded Systems Using Microcontrollers, Information Storage and Management (ISM) and Data Mining. She worked as a Lecturer and Senior Lecturer for about 15 years and is currently an Assistant Professor at K.L.E.I.T., Hubli, India.
Sudip S. Padhye received the Bachelor of Engineering (B.E.) degree in Computer Science & Engineering from Visvesvaraya Technological University (V.T.U.), India, in 2016. He is currently working as a Business Intelligence (BI) Developer at KPIT Technologies Ltd., with extensive experience in ETL and reporting tools such as Informatica Data Center and Oracle Business Intelligence Enterprise Edition (OBIEE), respectively. He has also worked on several projects, including “Movies & Books Recommender system using Collaborative Filtering”, “Rainfall predictor using Regression techniques” and “Context-based Attitude Scrutiny using NLP”. He has published 2 international papers in the field of data mining. He is interested in R, Python and Java and holds several MOOC certifications from universities such as Stanford University, USA, and Johns Hopkins University, USA.