Twitter Sentiments Analysis on Fitbit
Zeyuanyu Long | Di Dai
UC San Diego
School of Global Policy and Strategy
Table of Contents
1. Executive Summary
2. Introduction
3. Literature Review
4. Data
4.1. Source of Data
4.2. Dataset Description
4.3. Removing Variables and Data Cleaning
5. Important Findings
5.1. Top 30 Frequent Terms
5.2. Popularity Change of “Charge”
5.3. User Location Distribution
5.4. Words Association
6. Methodologies
6.1. Sentiments Analysis
6.2. Models Building
7. Variables Manipulation and Model Results
7.1. Target Variable Description
7.2. Adding New Independent Variables
7.3. Model 1: Logistic Regression
7.4. Model 2: Random Forest
8. Discussion
9. Limitations
10. Conclusion
Bibliography
Appendices
1. Executive Summary
In the era of “Big Data”, sentiment analysis of social media has become a valuable tool for
corporate customer experience management, product improvement, marketing strategy
optimization, and so on. This study conducted a systematic analysis of Fitbit-related
sentiments on the social media platform Twitter and built multiple predictive models to
identify the influence of different factors on positive and negative sentiments. Through a
close study of Fitbit, the healthcare industry, and Twitter users’ language preferences, a
Fitbit-focused domain-dependent dictionary was built as the classification criterion for
positive or negative attitudes. Neutral sentiments are omitted from this analysis because they
carry minimal information for explaining customer preferences.
To assess the binary target variable, sentiment, which takes the positive outcome “1” and the
negative outcome “0”, Logistic Regression and Random Forest models were applied to
predict how gender, location, words about product functionalities, health-related topics,
hashtags, and holidays influence the variance in users’ sentiments. For example, the Logistic
Regression results revealed that “syncing” and “charging” seem to be the most problematic
features of Fitbit, as they have the strongest correlation with negative sentiments. This link is
attributed to the large number of tweets complaining about Fitbit’s inaccurate syncing of data
to users’ mobile app and its failure to charge. For the Random Forest model, independent
variables with significant coefficients in the Logistic Regression results were selected as
input variables for the tree classification. Without tuning the sample size, the error rate for
predicting negative tweets is 70.91%, which is too inaccurate for Fitbit to detect negative
tweets. After adjusting the sample size and tuning the parameters, the optimal model reduces
the error rate to 33.32% for true positives and 26.25% for true negatives. Although
oversampling the negative tweets lowers the accuracy on positive tweets, the trade-off is still
worthwhile, since negative tweets stir emotions more easily on social media and are also
important alerts for further improvement of Fitbit.
2. Introduction
Microblogging is not only a communication tool for individuals; it is also a powerful platform
that companies can use to gather feedback on their products, market trends, and so on.
Twitter is one of the most popular microblogging platforms used for research in opinion
mining and sentiment analysis. For companies, sentiment analysis is extremely important
because it reveals public perceptions of products, services, and even advertisements.
Companies that make full use of it can gain a great advantage in improving products to meet
market demand, which is the motivation of this study.
Fitbit is a leading producer of wireless sports wristbands that track users’ daily steps, sleep
quality, heart rate, and so on. It is the biggest competitor of Apple Watch and Jawbone. There
are two main reasons for choosing Fitbit as the target of our sentiment analysis. First,
healthcare is a long-term booming industry that has kept growing rapidly in recent years.
Beyond the healthcare system itself, people have started to pay more attention to nutrition
and exercise. Second, Fitbit generated $409.3 million in revenue in the third quarter of 2015,
a growth rate of 168% year over year. This signals that Fitbit is a fast-growing company in the
healthcare industry, which has drawn great research interest. Knowing which product features
are the most influential on social media, and what needs attention in order to improve
products and services to meet future demand, will help Fitbit become more competitive. By
building different models for the sentiment analysis, this study aims to help Fitbit achieve
that goal through its final discussion and conclusions.
3. Literature Review
This paper builds heavily on Professor Bohn’s help and on previous scholars’ work. Most
scholars who conducted sentiment analysis focused on text mining of tweets and on revising
algorithms to obtain the most accurate sentiment scores. However, little work has been done
on one specific product of one company.
In “An Analysis of Twitter Data on E-cigarette Sentiments and Promotion”, the authors cleaned
105,605 tweets expressing opinions on e-cigarettes. They then built their own domain-
dependent dictionary for sentiment classification, since the emotional polarity of common
English words does not always apply to this specific product.1 This motivates this paper to
create a domain-specific dictionary for Fitbit, since the emotional terms used online for sports
and healthcare can differ slightly from standard English. Revising the dictionary and
changing the evaluation standards should improve the accuracy of the sentiment analysis and
the model results.
The common method is to assign a sentiment score to each tweet in order to judge whether its
overall character is negative, positive, or neutral. However, this method overlooks the
underlying features that prompt users to tweet. “Twitter Power: Tweets as Electronic Word of
Mouth” goes beyond this limit. The authors adopted an “action-object” approach, which
categorizes the objects into several types, such as “service”, “appearance”, “duration”, and so
on. The action can then be any expression relevant to that category. Based on the matched
pairs, it is possible to determine which category has the most positive effect and which
features need further improvement.2 This method gave this paper the idea of testing the
importance of different features in shaping public sentiment on Twitter.
1 Godea, Andreea Kamiana, Cornelia Caragea, Florin Adrian Bulgarov, and Suhasini Ramisetty-Mikler. "An Analysis of Twitter Data on E-cigarette Sentiments and Promotion." Artificial Intelligence in Medicine Lecture Notes in Computer
“Localized Twitter Opinion Mining Using Sentiment Analysis” analyzed the trends in, and
customers’ reactions to, the iPhone 6 on social media and offered companies an innovative
approach to market analysis. The authors not only focused on the main functions of the
iPhone 6 but also analyzed the distribution of sentiment by gender and city. Based on that
information, companies could refine their marketing strategy demographically.3
4. Data Description
4.1. Source of Data
All the data used in this project was downloaded from Crimson Hexagon
(www.crimsonhexagon.com), a company dedicated to social media data analytics. We built a
public monitor in Crimson Hexagon using Twitter as the data source. The data selection
criteria are presented in Table 1 below:
Table 1: Summary of Dataset Description
Key words     fitbit
Country       United States
Language      English
Date Range    12/1/2014 - 3/31/2015
2 Jansen, Bernard J., Mimi Zhang, Kate Sobel, and Abdur Chowdury. "Twitter Power: Tweets as Electronic Word of Mouth." Journal of the American Society for Information Science and Technology 60.11 (2009): 2169-188. Web. 17 Mar. 2016.
3 Hridoy, Syed, M. Ekram, Mohammad Islam, Faysal Ahmed, and Rashedur Rahman. "Localized Twitter Opinion Mining Using Sentiment Analysis." Decision Analytics. 2015. Web. 18 Mar. 2016.
4.2. Dataset Description
Based on the search criteria, we collected 45 raw datasets on different dates and compiled
them into one dataset containing 347,241 observations and 16 variables over the 4 months.
We selected this time range because one of Fitbit’s popular products, “Charge HR”, was
released in January 2015. By focusing on this period, we aim to learn the market’s reaction to
the new product and the change in topic density over the 4 months.
Table 2: Description of Input Variables
Name              Description
GUID              Globally unique identifier for each observation
Date..GMT.        Date in m/d/y h:m format in Greenwich Mean Time
URL               Original URL of each tweet
Contents          Text content of each tweet
Author            The author who posted the tweet
Name              The Twitter username associated with the tweet
Country           This variable has only one unique value: United States
State.Region      The state where the tweet was posted
City.Urban.Area   The city where the tweet was posted
Category          Sentiment results based on Crimson Hexagon’s algorithm
Source            This variable has only one unique value: Twitter
Klout.Score       Score of sentiments based on Crimson Hexagon’s algorithm
Gender            Gender of the user
Posts             Total number of posts by the user
Followers         Total number of followers of the user
Following         Total number of other accounts the user is following
4.3. Removing Variables and Data Cleaning
4.3.1. Removing Variables
One objective of this project is to create a new domain-dependent dictionary for more
accurate sentiment analysis. Therefore, we deleted the variables “Category” and
“Klout.Score”, which were the sentiment analysis outcomes generated by Crimson Hexagon’s
built-in algorithm. Furthermore, the variables “URL”, “Country”, and “Source” were removed
because they are irrelevant or redundant for our analysis.
4.3.2. Removing Fitbit Records Tweets and Ads
One of Fitbit’s functionalities integrates with Twitter to post a record of the user’s daily steps
as a tweet. As a result, a large number of tweets in our dataset have the format “My fitbit
#Fitstats for m/d/y, xxx steps and xxx miles travelled”. To avoid this noise in our analysis, we
used the R function “grepl” to separate the tweets containing this format from our dataset. In
addition, a quick sort of the data led us to several suspicious advertisement tweets that were
repeated more than 100 times. We also eliminated those tweets from our dataset to improve
the accuracy of our text analysis. The key contents of those high-volume ad tweets and their
frequencies are collected in Table 3:
Table 3: Key Contents of Automated and Ads Tweets
Content                                                                            Number of times posted
“RT @FitBookFan Getting #fitbit...”                                                158
“I want to win the Fitbit Zip from @GetMyHealthyOn for #NationalNutritionMonth!”   266
“I'm in the running to win a new @fitbit Charge from @sleepopolis”                 137
After the preliminary data cleaning, the new dataset contains 287,381 observations, free of
Fitbit record tweets and repeatedly posted advertisement tweets.
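The filtering step described above was done in R with “grepl”. As an illustration only, the following Python sketch applies the same idea: it drops tweets that match an assumed #Fitstats pattern and then drops any text posted more than 100 times. The regular expression, the function name, and the sample tweets are our own assumptions, not the original code.

```python
import re
from collections import Counter

# Assumed pattern for Fitbit's auto-posted daily records, e.g.
# "My fitbit #Fitstats for 1/5/2015, 8,012 steps and 3.5 miles travelled".
FITSTATS_PATTERN = re.compile(r"my fitbit #fitstats", re.IGNORECASE)


def remove_automated_tweets(tweets, max_repeats=100):
    """Drop #Fitstats record tweets, then any text posted more than max_repeats times."""
    kept = [t for t in tweets if not FITSTATS_PATTERN.search(t)]
    counts = Counter(kept)
    return [t for t in kept if counts[t] <= max_repeats]


# Hypothetical sample: one record tweet, one ad repeated 101 times, two organic tweets.
sample = (
    ["My fitbit #Fitstats for 1/5/2015, 8,012 steps and 3.5 miles travelled"]
    + ["I want to win the Fitbit Zip!"] * 101
    + ["Loving my new fitbit", "My fitbit will not sync"]
)
cleaned = remove_automated_tweets(sample)
print(cleaned)
```

Only the two organic tweets survive in this toy sample. Exact-text matching is the simplest way to detect repeats; retweets with small variations would need a looser comparison.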
4.3.3. Text Mining
To remove information in the tweet contents that is irrelevant to sentiment analysis, we used
the “tm” package in R to conduct text mining. After building a corpus with “Contents”
specified as the source, we cleaned the text by removing URLs, punctuation, stopwords, and
non-English words and by converting all letters to lowercase. The processed text was finally
converted into a term-document matrix.
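The cleaning pipeline above was built with the R “tm” package. The sketch below is a rough Python analogue under stated assumptions: the stopword list is a tiny hypothetical one, non-English word removal is omitted, and the term-document matrix is represented as a plain dictionary mapping each term to its counts per tweet.

```python
import re
import string
from collections import Counter

# Tiny illustrative stopword list (hypothetical); real lists are much longer.
STOPWORDS = {"a", "an", "the", "my", "is", "to", "and", "of", "i", "it", "with"}


def clean_tweet(text):
    """Lowercase, strip URLs and punctuation, and drop stopwords."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs before punctuation
    text = text.translate(str.maketrans("", "", string.punctuation))
    return [word for word in text.split() if word not in STOPWORDS]


def term_document_matrix(tweets):
    """Return {term: [count in tweet 0, count in tweet 1, ...]}."""
    docs = [Counter(clean_tweet(t)) for t in tweets]
    terms = sorted(set().union(*docs))
    return {term: [doc[term] for doc in docs] for term in terms}


tweets = [
    "My Fitbit is AMAZING! http://t.co/abc",
    "Fitbit sync problem, sync failed again...",
]
tdm = term_document_matrix(tweets)
for term, counts in tdm.items():
    print(term, counts)
```

Because punctuation is stripped, hashtags and mentions lose their “#” and “@” and are counted as ordinary words, which matches how hashtag words such as “getfit” appear in the frequent-terms list.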
5. Important Findings
5.1. Top 30 Frequent Terms
After cleaning the text, we were able to identify which words Twitter users frequently used
when talking about Fitbit. We set the minimum frequency to 200, and the result contains
more than 800 words, of which we focus only on the top 30. Since one popular feature of
Twitter is using hashtags to label words connected with specific topics or themes, we noticed
that the hashtag words “getfit”, “weightloss”, and “motivation” rank No. 1, No. 7, and No. 23
in the list. Besides the hashtag words, we also found that the words “workout”, “fitness”,
“calories”, and “diet” are frequently associated with Fitbit-related tweets. This result helps in
understanding users’ interests and popular topics, which in turn helps Fitbit capture potential
market trends and create marketing strategies that appeal more to users.
Figure 1: Top 30 Frequent Terms
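The frequent-term selection above (minimum frequency of 200, then the top 30) can be sketched in Python as follows. This is an illustration with made-up tokens and smaller thresholds, not the original R code; the function name and sample data are assumptions.

```python
from collections import Counter


def frequent_terms(token_lists, min_freq=200, top_n=30):
    """Count terms across all tweets, keep those with count >= min_freq, return the top_n."""
    counts = Counter(token for tokens in token_lists for token in tokens)
    frequent = [(term, n) for term, n in counts.most_common() if n >= min_freq]
    return frequent[:top_n]


# Hypothetical tokenized tweets; thresholds are lowered to fit the toy data.
docs = [["getfit", "fitbit"], ["getfit", "workout"], ["getfit", "workout", "diet"]]
top = frequent_terms(docs, min_freq=2, top_n=2)
print(top)
```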
5.2. Popularity Change of “Charge”
In October 2014, Fitbit announced a new model named “Charge HR”, which was officially
released in January 2015. By calculating the change in the volume of tweets that contain the
product name, we can see that the overall volume of “Charge HR” tweets is very low. There
are some small peaks on 7 December 2014, 6 January 2015, and 28 March 2015. Looking
back at the original dataset, these peaks result primarily from large numbers of commercial
and promotional tweets. For example, the tweet “#Fitbit #Charge - It's been more than a year
since the ill-fated Fitbit Force tracker was pulled off… http://t.co/iBQ8KGLkpf #OtherTech”
appeared more than 1000 times on 7 December 2014. These commercial tweets are not
excluded because they also deliver the strong message that Fitbit and other interested parties
are trying to promote the new product. However, as shown in Figure 2 below, even with the
strong intervention of promotional tweets, the daily tweets about “Charge HR” make up a
very small percentage.
Comparatively, December and January have more tweets about “Charge HR” than February
and March, which means that more users were following the new product then, but there was
no heated discussion on Twitter regarding its new functions. There are two possible
explanations. On one hand, “Charge HR” offers no revolutionary improvement or
breakthrough worth discussing. On the other hand, the promotion of “Charge HR” was not
effective: even with all the promotional and automated tweets, it still failed to encourage
discussion.
Figure 2: Popularity of Charge During 2014/12/01-2015/3/31
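A daily popularity series of this kind can be computed as the share of each day's tweets that mention the product. The Python sketch below assumes tweets are available as (date, text) pairs and that a simple case-insensitive substring match on “charge hr” is adequate; both are our assumptions rather than details given in the report.

```python
from collections import defaultdict


def daily_mention_share(tweets, keyword="charge hr"):
    """tweets: iterable of (date_string, text). Returns {date: share mentioning keyword}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for date, text in tweets:
        totals[date] += 1
        if keyword in text.lower():
            hits[date] += 1
    return {date: hits[date] / totals[date] for date in sorted(totals)}


# Hypothetical sample using ISO dates so that sorting is chronological.
sample = [
    ("2014-12-07", "New Fitbit Charge HR review"),
    ("2014-12-07", "my fitbit died"),
    ("2014-12-08", "walked 10k steps"),
    ("2014-12-08", "fitbit sync issue"),
]
shares = daily_mention_share(sample)
print(shares)
```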
5.3. User Location Distribution
Not all Twitter users share their location when tweeting, so the location information
collected for this paper is limited. A total of 179,710 observations include a valid location,
making up 62.5% of the dataset. The general distribution across states can still be seen in
Figure 3. California and New York have the most tweets about Fitbit, which makes sense
since these two areas have higher incomes than other states and healthy lifestyles are
promoted more there. However, more tweets do not necessarily mean that Fitbit is more
popular or receives more positive feedback in these two states. Therefore, in the model
section, “California” and “New York” are included as two independent variables to test
whether Fitbit is more welcome in these two places.
Table 4: Number of Location Values
Missing value        107,671
Unique value         51
Valid observations   179,710
Figure 3: Distribution of tweets volumes in top 10 States
5.4. Words Association
Table 5: Words association with “Fitbitsupport”
Among our frequent terms, we noticed that “fitbitsupport” is one of the popular tags. Based
on our research, this is the Twitter account for Fitbit customer service. Most of the time,
when people mention @fitbitsupport, they are either seeking a solution from Fitbit customer
service for issues they encountered with the product or services, or giving feedback on the
service that “fitbitsupport” provides. To understand which issues and topics people want
customer service to address, we ran a word association test to find the words whose
correlation with “fitbitsupport” exceeds 0.05. To reach a meaningful conclusion, we focus on
the words related to Fitbit’s product features and functionalities. According to the result,
which is presented in Table 5, “reset”, “restart”, “restarting”, and “restarted” all carry a similar
meaning in this context. On further scrutiny of the original dataset, we realized that users
have been having trouble resetting or restarting their Fitbit. Similarly, other major problems
that consumers frequently raise with “fitbitsupport” are “sync” and “charge”. A large number
of people had problems syncing their Fitbit data with the app on their phone, or found that
the synced data were inaccurate. In addition, people complained that their Fitbit failed to
charge properly even when the product was still new. This information drew our attention to
product functionality and customer service quality, which play a critical role in maintaining
customer loyalty and growing market demand. Besides pointing out the importance of
customer service, this result also alerts Fitbit to the need for better product design and
quality, which are essential for building strong competitiveness in the market, especially in
the healthcare and tech industries.
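The report does not say how the word association test was computed. One common reading, and the one assumed here, is the Pearson correlation between the per-tweet counts of two terms in the term-document matrix, keeping terms at or above a lower limit such as 0.05. The toy matrix and function names below are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def associations(tdm, target, min_corr=0.05):
    """Terms whose correlation with target is at least min_corr, strongest first."""
    base = tdm[target]
    found = {}
    for term, counts in tdm.items():
        if term != target:
            r = pearson(base, counts)
            if r >= min_corr:
                found[term] = round(r, 2)
    return dict(sorted(found.items(), key=lambda item: -item[1]))


# Hypothetical term-document matrix: each list holds counts across six tweets.
tdm = {
    "fitbitsupport": [1, 1, 0, 0, 1, 0],
    "sync":          [1, 1, 0, 0, 0, 0],
    "love":          [0, 0, 1, 1, 0, 1],
    "charge":        [0, 1, 0, 0, 1, 0],
}
assoc = associations(tdm, "fitbitsupport")
print(assoc)
```

In this toy data “sync” and “charge” co-occur with “fitbitsupport” and are returned, while “love” never appears in the same tweet and is filtered out by its negative correlation.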
A few examples of such complaints have been listed below to help explain the problems.
“@FitbitSupport Hi. Fitbit flex not holding charge (3mth old). have reset it 3x, cleaned
it...Also will not sync anymore. can you help?”
“@FitbitSupport @fitbit is there anything I can do to avoid having to restart/reset my
#fitbitflex everyday so it will sync?”
On the other hand, we are glad to see that the words “thanks” and “thank” have a correlation
of around 0.05 with “fitbitsupport”. Those expressions came from users whose problems
were solved by “fitbitsupport” after they mentioned the account. This is a sign of Fitbit’s
efforts to provide successful customer service, which the company should pursue continually
in order to win its customers back.
“@FitbitSupport Much appreciated, thankfully found my Fitbit this morning. You have the
best customer service EVER! You helped me before, THX”
“@FitbitSupport @fitbit I am speechless with your amazing gesture. Thank you so much
for sending me a brand new Fitbit for the one I'd lost.”
“Had the best customer service with @fitbit @FitbitSupport thanks Abby P!”
Beyond customer service, this result also aroused our curiosity about several notable Twitter
accounts, including “talkmaster”, “seeshawnlive”, and “everybodywalk”, all of which have a
correlation of more than 0.05 with “fitbitsupport”. Through our research, we determined that
the frequent occurrence of “talkmaster” was related to Fitbit’s failure to provide a satisfactory
experience for a popular user, Neal Boortz, a former talk radio host with more than 184,000
followers. On Dec 17, 2014, “talkmaster” mentioned @fitbitsupport asking for help with a
setup problem (picture below). The popularity of this user drew attention from his followers,
who joined and commented on the conversation between “talkmaster” and “fitbitsupport” by
quoting both of them on Twitter. This is another alert for Fitbit about the level of influence
that prominent customers can have on other customers. While striving to provide good
service for those prominent customers, Fitbit should also be aware that their negative
reviews can easily be amplified, especially through social media.
Another frequently quoted username, “seeshawnlive”, was related to Fitbit’s sponsorship of a
show whose content degrades women. This criticism, started by the user “seeshawnlive”, was
retweeted and spread rapidly on Twitter during December, putting heavy pressure on Fitbit’s
public relations. The original content of the tweet from “seeshawnlive” is “@fitbit
@FitbitSupport stop sponsoring #SororitySisters, the show degrades women. Pull your ads.
#BoycottSororitySisters @dstinc1913”. This test enabled us to identify the unfavorable public
relations situation that Fitbit faced. More importantly, it showed how important it is for Fitbit
to select an appropriate marketing strategy and build a responsible corporate image.
Additionally, the result led us to discover a user called “everybodywalk”, which stands for an
award-winning campaign aimed at getting Americans up and moving. The campaign provides
news and resources on walking, health information, a personal pledge form to start walking,
and a place to share stories about individual experiences with walking.4 This discovery
shows how Fitbit could use public campaigns or organizations that share healthcare
information to reinforce its presence among users.
4 http://everybodywalk.org/about/
6. Methodologies
6.1. Sentiments Analysis
6.1.1. Building Domain-dependent Dictionary for Fitbit
As mentioned before, building Fitbit’s own domain-dependent dictionary is necessary to
obtain more accurate sentiment scores. The base dictionary is from Minqing Hu and Bing
Liu.5 Beyond common-sense adjustments, most changes to the dictionary were based on
referring back to the original tweets whenever something unusual appeared in the model
results. The revised part is shown in Table 6 below, while the full version of the dictionary is
included in the Appendix.
Table 6: Revised Words in Domain-Dictionary for Sentiment Analysis
● More revisions were made to the positive words than to the negative ones. Under
specific conditions, generally negative words actually express positive feedback on
the product, whereas general positives rarely turn negative.
● “Loss” is negative in general, but in the Fitbit case it more likely refers to losing
weight, which shows a positive attitude towards Fitbit’s function. The same applies to
“burn”, which means that users feel motivated to burn calories, as in this tweet:
“What's good abt shoveling #Snow off an extra-wide,paved driveway in
#Winter?! Good workout/burns cals & calibrates my Fitbit! ___ #MONTANA”
● “Obsessed” and “addicted” show strong positive emotion towards Fitbit, since users
feel that using Fitbit and checking their daily status has become a habit they cannot
live without. A common example using these words in the dataset is
“@hijinksandhalos . @hijinksandhalos TaraLynne71 I'm FitBit obsessed so get my
10,000 steps a day in or am going for a walk at 11:00 pm! #girlstravel”.
● “Win” is deleted from the positive words dictionary, since most of the tweets
containing the word are advertisements, such as “Enter here to win a new Fitbit
Charge”.
● Considering the unique characteristics of a sports band, accuracy in detecting steps
and a constant Bluetooth connection to the phone are extremely important. If Fitbit
fails to achieve these basic goals, consumers’ confidence in it would suffer greatly.
Therefore, “inaccurate” and “disconnect” are added to the negative words dictionary
to track users’ disappointment.
● Some words cannot be classified clearly because they are used in different ways. For
example, “killing” can show a strongly positive attitude in a tweet like “@fitbit I love
the #Charge, but when will the #ChargeHR be available!? _ The suspense is killing
me!!! __”, express huge disappointment, as in “@Fitbit please start supporting
@Google Fit. cc: @FitbitSupport. You’re LITERALLY killing me, since I can’t see all
health stats. ;-)”, or refer to something irrelevant to the Fitbit product itself, as in “This
Fitbit thingy would be more useful if it could tell me how many times I picked up my
20 lb son. My back is killing me.” Therefore, words like “killing” are removed from the
dictionary lists, since assigning them to either category would heavily bias the
outcomes.
5 Hu, Minqing, and Bing Liu. "Mining Opinion Features in Customer Reviews - UIC." Web. 18 Mar. 2016.
6.1.2. Sentiments Score
To calculate the sentiment score, every word is extracted from each tweet and matched
against the positive and negative words in the domain-dependent dictionary. Each matching
positive word is assigned a value of “+1”, and each matching negative word “-1”. The final
sentiment score of a tweet is the sum of these values. In order to run Logistic Regression and
Random Forests in the next section, the sentiment scores are converted into a binary
variable, where “1” stands for tweets with positive scores and “0” stands for tweets with
negative scores. Tweets with a score of zero are neutral and are treated as missing values,
since they do not contain much information to study.
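The scoring rule above can be written in a few lines. The Python sketch below is illustrative: the two word sets are small stand-ins, with “obsessed”, “addicted”, “loss”, “burn”, “inaccurate”, and “disconnect” taken from the revisions described earlier and the remaining words assumed for the example. Neutral tweets map to None, mirroring their treatment as missing values.

```python
# Small stand-in dictionaries; the full domain-dependent lists are in the Appendix.
POSITIVE = {"love", "amazing", "obsessed", "addicted", "loss", "burn"}
NEGATIVE = {"inaccurate", "disconnect", "broken", "fail"}


def sentiment_score(tokens):
    """Sum +1 for each positive word and -1 for each negative word in the tweet."""
    return sum((token in POSITIVE) - (token in NEGATIVE) for token in tokens)


def to_binary(score):
    """Map a score to 1 (positive), 0 (negative), or None (neutral, treated as missing)."""
    if score > 0:
        return 1
    if score < 0:
        return 0
    return None


examples = [
    ["fitbit", "obsessed", "love"],     # two positive words
    ["steps", "inaccurate"],            # one negative word
    ["love", "fitbit", "inaccurate"],   # one of each, so the score is zero
]
scores = [sentiment_score(tokens) for tokens in examples]
labels = [to_binary(score) for score in scores]
print(scores, labels)
```

Note that a tweet with equal numbers of positive and negative words ends up neutral and is dropped, even though it carries mixed sentiment.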
It is worth noting that the automated tweets are excluded from this sentiment analysis. Table
7 below shows that positive tweets make up a large share of the total, 79%, while negative
tweets account for only 21%. This indicates that Twitter users generally have a positive
perception of Fitbit. Nevertheless, the negative tweets are more valuable for Fitbit in
improving the product's functions and revising its marketing strategy.
6.2. Models Building
Logistic Regression and Random Forests are the two models this paper adopts to conduct
the sentiment analysis. Logistic Regression makes each independent variable’s association
with the target variable clear. Unlike a traditional regression analysis, which focuses on
whether the variables are significant, this paper pays more attention to which variables have
the most significant impact on sentiment. By looking back at the original tweets and studying
the unusual or most important variables, the specific reasons behind the outcome can be
discovered. Random Forests have the advantage of handling large datasets. Tuning
parameters such as the number of trees and the number of variables can dramatically change
the model’s accuracy. Moreover, by oversampling critical events, in this case the negative
tweets, Random Forests can put more weight on the negative comments that the target
company cares more about.
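The report does not specify how the negative tweets were oversampled. A simple option, assumed here purely for illustration, is random oversampling with replacement until the minority class matches the majority class in size. The function name, the “sentiment” key, and the toy rows are hypothetical.

```python
import random


def oversample_minority(rows, label_key="sentiment", minority=0, seed=42):
    """Duplicate randomly chosen minority rows until both classes have equal size."""
    rng = random.Random(seed)
    minority_rows = [row for row in rows if row[label_key] == minority]
    majority_rows = [row for row in rows if row[label_key] != minority]
    if not minority_rows or len(minority_rows) >= len(majority_rows):
        return list(rows)
    extra = rng.choices(minority_rows, k=len(majority_rows) - len(minority_rows))
    balanced = majority_rows + minority_rows + extra
    rng.shuffle(balanced)
    return balanced


# Toy data mirroring the 79% positive / 21% negative class balance.
rows = [{"sentiment": 1}] * 79 + [{"sentiment": 0}] * 21
balanced = oversample_minority(rows)
negatives = sum(1 for row in balanced if row["sentiment"] == 0)
print(len(balanced), negatives)
```

Oversampling should be applied to the training partition only, so that the validation and test sets keep the original class balance.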
7. Variables Manipulation and Model Results
7.1. Target Variable Description
By defining “positive”, “negative”, and “neutral” from the sentiment score, we created a new
variable named “Sentiments”. We assigned “positive” the value “1” and “negative” the value
“0”. Since our main objective is to understand how non-neutral sentiments vary with different
independent variables, we excluded the “neutral” observations from consideration. Our
target variable is therefore the newly built binary variable “Sentiments”, which contains 2
unique values across 287,381 total observations. The 150,636 valid observations, consisting
of 31,984 with value “0” (21%) and 118,652 with value “1” (79%), are used in our final
models.
Table 7: Values of Sentiments Scores
7.2. Adding New Independent Variables
To understand what factors are associated with each sentiment, we created numerous new
binary variables and assessed the correlation between them and the sentiments. In the
frequent-terms list, which we compiled using a frequency benchmark of over 200, we found
certain words that align with our research interests, based on our knowledge of their
importance. We reviewed the list and used the related terms to create binary variables in 5
categories: feature, topic, hashtag, product, and holiday. Each variable takes the value “1” if
the tweet contains the corresponding words and “0” otherwise. To maintain accuracy in
building the models, each variable covers all forms of those words. Besides the term
variables, we are also interested in whether being in California or New York has a significant
influence on the target variable. Two more independent variables, “california” and “new
york”, were created and assigned 1 when the user’s location matches. To capture the
combined influence of location and gender, we created two interaction terms: “Gender” with
“california” and “Gender” with “new york”. To make the analysis more interpretable, we
transformed the variables “Followers” and “Following” with the natural logarithm. Detailed
information about the variables and the words they contain is summarized in Table 8 below:
Table 8: Input Variables Description
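The variable construction described in this section can be sketched as follows. This Python example is an illustration under several assumptions: only two term groups are shown with guessed word forms, gender is coded as “M” for male, and the log transform adds 1 so that users with zero followers do not cause an error (the report states only that a natural log was used).

```python
import math

# Hypothetical term groups: each binary variable covers several forms of a word.
TERM_GROUPS = {
    "sync": ("sync", "syncing", "synced"),
    "charge": ("charge", "charging", "charged"),
}


def build_features(tweet):
    """Turn one tweet record into binary term, location, interaction, and log features."""
    tokens = set(tweet["tokens"])
    features = {
        name: int(any(form in tokens for form in forms))
        for name, forms in TERM_GROUPS.items()
    }
    state = (tweet.get("state") or "").lower()
    male = int(tweet.get("gender") == "M")
    features["california"] = int(state == "california")
    features["new_york"] = int(state == "new york")
    features["gender_m"] = male
    features["gender_m_x_california"] = male * features["california"]
    features["gender_m_x_new_york"] = male * features["new_york"]
    # Assumption: log(x + 1) is used to keep zero counts finite.
    features["log_followers"] = math.log(tweet["followers"] + 1)
    features["log_following"] = math.log(tweet["following"] + 1)
    return features


row = build_features({
    "tokens": ["fitbit", "syncing", "again"],
    "state": "California",
    "gender": "M",
    "followers": 99,
    "following": 0,
})
print(row)
```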
7.3. Model 1: Logistic Regression
7.3.1. Interpretation of Estimated Coefficients
To assess the correlation between each independent variable and the sentiments, we used
logistic regression, which models the relationship between predictors and a binary dependent
outcome. The dataset was partitioned into training, validation, and testing sets in proportions
of 60/30/10 respectively. The coefficient of each variable is attached as an appendix.
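A 60/30/10 partition can be produced by shuffling the rows once and slicing. The sketch below is a minimal Python illustration; the seed and function name are assumptions, and integer arithmetic is used for the cut points to avoid floating-point rounding surprises.

```python
import random


def split_60_30_10(rows, seed=42):
    """Shuffle rows and split them into training (60%), validation (30%), and test (10%)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 60 // 100
    n_valid = n * 30 // 100
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


train, valid, test = split_60_30_10(range(1000))
print(len(train), len(valid), len(test))
```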
Although the majority of our coefficients are positively correlated with the sentiments, for
business-cost reasons we are more interested in identifying the variables with negative
coefficients, which include “wireless”, “sync”, “sleep”, “fitbitsupport”, “Christmas”,
“GenderM”, “California”, “New York”, and the gender and location interaction terms. The
results for those variables are listed below, ordered by the significance of the coefficient.
Table 9: Results of Negative Coefficients
The coefficients represent the change in the log odds of a tweet expressing positive
sentiment, so a negative coefficient corresponds to an odds ratio below 1. According to the
results, the words “sync”, “wireless”, and “charge” are more likely to be associated with
negative emotions. As noted in our word association analysis, these words point to product
feature problems that have become increasingly annoying to users. The majority of the issues
users complained about were Fitbit’s inability to sync data accurately or charge properly.
Since “wireless” is a word people mention when talking about syncing, its negative
coefficient is not unexpected. Similarly, “fitbitsupport” is the official account for Fitbit
customer service. By tagging “fitbitsupport”, users ask questions and seek solutions to the
problems they encounter while using Fitbit. It is therefore natural to expect “fitbitsupport” to
be linked to negative expressions.
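In a logistic regression, exponentiating a coefficient gives the odds ratio for a one-unit change in that variable, and the fitted probability follows from the logistic function. The Python sketch below shows both calculations. The intercept and coefficient values are hypothetical and chosen only for illustration; they are not the estimates from this study.

```python
import math


def odds_ratio(coefficient):
    """Convert a logistic regression coefficient (log odds) into an odds ratio."""
    return math.exp(coefficient)


def predicted_probability(intercept, coefficients, features):
    """Probability of a positive tweet given binary feature values."""
    z = intercept + sum(coefficients[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical values for illustration only.
intercept = 1.3
coefficients = {"sync": -0.8, "holiday": 0.4}

p_baseline = predicted_probability(intercept, coefficients, {"sync": 0, "holiday": 0})
p_with_sync = predicted_probability(intercept, coefficients, {"sync": 1, "holiday": 0})
print(round(odds_ratio(coefficients["sync"]), 3), round(p_baseline, 3), round(p_with_sync, 3))
```

With a negative coefficient, the odds ratio falls below 1 and the predicted probability of a positive tweet drops when the word is present.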
We found an interesting discrepancy in how the gender of the user influences expressed
sentiment. Among the 162,768 observations that specify the user’s gender, around 60% are
female users and the remaining 40% are male users. However, the result indicates that, on
average, a male user is more likely than a female user to express negative sentiments, with a
reported difference of log (0.17). This variable is further interacted with the two popular
location variables, California and New York City. Surprisingly, male users in California are
more likely to express negative feelings about Fitbit than female users in California. In New
York City, although the difference between genders is less significant, female users are more
likely than male users to express negative sentiments. Our assumption is that this
discrepancy stems from a fundamental difference in the demographics of the two locations:
New York City in general has a larger female than male population, while in California the
situation is the opposite. Nevertheless, it is a non-negligible finding that can help Fitbit better
understand users’ preferences.
Although the word “sleep” is also among our negative coefficients, this result is actually
attributable to users realizing, after using Fitbit, how poor the quality of their sleep is.
Since measuring sleep efficiency is one of Fitbit’s functions, this result will be useful for
Fitbit in understanding customers’ demands and preferences, and thus in improving user
experience through better product design.
This paper defined a series of holiday-related terms to see how they are associated with
sentiments. Generally, people enjoy buying Fitbit as a gift during the holidays, so it is not
surprising that holiday terms predict positive sentiments. The only outlier is “Christmas”,
whose negative coefficient indicates that people who mentioned “Christmas” tended to express
negative emotions. Since this is a notable difference, we looked up the tweets mentioning
Christmas in the dataset. It turned out that on Christmas Day 2014 the Fitbit website was
under planned maintenance. Many users expressed their disappointment that this inconvenience
happened right on the holiday, causing a significant deviation from the generally positive
holiday emotions.
7.3.2. Model Evaluation
By running a confusion matrix evaluation on the validation data, which is 30% of the total
data, we obtained the error matrix for our model shown below. In general, our model has high
accuracy in predicting positive sentiments but much lower accuracy for negative sentiments.
This is largely due to the unbalanced distribution of the target variable in our model. As
79% of the valid observations contain positive sentiments while only 21% are negative, it is
not surprising to see differences in predictive ability between the classes. When the model
predicted positive sentiment, 10,041 observations matched the prediction while 1,955 turned
out to be negative. When the prediction was negative, 372 observations matched the prediction
and 276 did not. The error rate for the negative class is therefore 84% (1,955 of the 2,327
actual negatives were predicted as positive), which is much higher than the error rate for
the positive class. This result is risky, considering that wrongly predicting a positive
result incurs more cost than wrongly predicting a negative one. Our model therefore still
needs to be modified in order to predict more accurately.
Table 10: Error Matrix Result for Logistic Regression Model 1
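The class error rates quoted above can be reproduced from the four counts given in the text.
The following is a minimal sketch, assuming the usual definition of a per-class error rate
(misclassified observations of a class divided by all actual observations of that class); the
function name `class_error_rates` is ours.

```python
def class_error_rates(pred_pos_true_pos, pred_pos_true_neg,
                      pred_neg_true_neg, pred_neg_true_pos):
    """Return (positive class error, negative class error, average class error)."""
    actual_pos = pred_pos_true_pos + pred_neg_true_pos
    actual_neg = pred_neg_true_neg + pred_pos_true_neg
    pos_error = pred_neg_true_pos / actual_pos  # actual positives predicted negative
    neg_error = pred_pos_true_neg / actual_neg  # actual negatives predicted positive
    return pos_error, neg_error, (pos_error + neg_error) / 2


# Counts quoted in the text for the validation data of Model 1.
pos_err, neg_err, avg_err = class_error_rates(
    pred_pos_true_pos=10041,
    pred_pos_true_neg=1955,
    pred_neg_true_neg=372,
    pred_neg_true_pos=276,
)
print(f"positive class error: {pos_err:.1%}")
print(f"negative class error: {neg_err:.1%}")
print(f"average class error:  {avg_err:.1%}")
```

With the reported counts this gives about 2.7% for the positive class, about 84% for the
negative class, and about 43% for the average class error.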
7.3.3. Adjustment of Model: Model on Sample Dataset
To avoid the unevenly distributed target variable and to improve the predictive ability of
our model, we created a dataset consisting of 50% positive and 50% negative sentiments by
drawing, from the full set of positive observations, a sample equal in size to the full set
of negative observations. The newly created sample dataset contains 63,968 observations,
31,984 for each of the positive and negative outcomes.
The coefficient results are attached in the appendix.
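The balancing step described above can be sketched as follows. This is an illustration only,
assuming simple random sampling without replacement from the majority class; the function
name `balance_by_undersampling`, the `sentiment` key and the toy data are hypothetical and do
not come from the original analysis.

```python
import random


def balance_by_undersampling(rows, label_key="sentiment", seed=0):
    """Keep every minority-class row and an equally sized random sample of the majority."""
    positives = [r for r in rows if r[label_key] == 1]
    negatives = [r for r in rows if r[label_key] == 0]
    minority, majority = sorted([positives, negatives], key=len)
    rng = random.Random(seed)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)  # avoid leaving the rows ordered by class
    return balanced


# Toy data with the same 79% / 21% imbalance as the full dataset.
toy = [{"id": i, "sentiment": 1 if i < 79 else 0} for i in range(100)]
balanced = balance_by_undersampling(toy)
print(len(balanced), sum(r["sentiment"] for r in balanced))
```

On the toy data the result holds 21 rows of each class, mirroring the 31,984 and 31,984
observations of the sample dataset.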
Table 11: Negative coefficients of the adjusted model
The new model did not significantly change the correlations identified in the first model,
except that the term “sync” replaced “wireless” as the term most strongly correlated with
negative sentiments, and the gender difference made male users more highly correlated with
negative sentiments. Another noticeable change concerns the variable “Follower”, which has
been transformed into natural log form. The results indicate a negative correlation between
the number of followers a user has and the sentiments, which is an alert for Fitbit to
maintain good customer relationships, especially with influential users.
7.3.4. Model Evaluation
Evaluated with the same error matrix method on the 30% validation data, the model’s
predictive ability is less impaired by the uneven distribution of the target variable.
Table 12: Error Matrix for Adjusted Model
The average class error rate fell significantly, from 43% to 28%, although the error rate for
predicting positive sentiments rose greatly compared with the model that used all the
positive observations. On the other hand, the error rate for predicting negative sentiments
dropped to 17%, compared with 84% in the first model. The results also show that the error
count is lowest (814) for cases where a positive prediction turned out to be negative, which
is considered the most costly situation in this case. This new model therefore has a more
desirable predictive capacity for future analysis.
7.4. Model 2: Random Forests
7.4.1. Input Variables Selection
The input variables for the Random Forests model were selected mostly on the basis of the
logistic regression coefficients. Variables with larger coefficients, which have a
significant impact, were selected for the Random Forests model. After we re-examined the
error rates for different combinations of variables, the optimal input variables used to
build the final model are listed below:
In this section, the base model uses 60% of the total sentiments dataset for training, 30%
for validation, and 10% for testing. The model uses a seed of 800, 500 trees, and 5 variables
at each node. As shown in Table 14 below, without tuning the sample size, the results are
largely driven by the positive tweets, since they make up 70% of the total dataset. The error
rate for positive tweets is 1.57%, which is extremely small. Negative tweets have an error
rate of 70.91%, which is worrisome: the model can hardly detect the negative tweets, and so
can hardly help Fitbit improve its products.
Table 14: Confusion matrix of base model
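The 60% / 30% / 10% partition used for the base model can be sketched as below. The seed of
800 is taken from the text; the shuffling procedure and the function name `split_60_30_10`
are assumptions made for illustration, and a different tool would produce a different
partition from the same seed.

```python
import random


def split_60_30_10(n_rows, seed=800):
    """Shuffle row indices and cut them into training, validation and test sets."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_train = n_rows * 60 // 100
    n_valid = n_rows * 30 // 100
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]  # the remaining rows, about 10%
    return train, valid, test


train, valid, test = split_60_30_10(1000)
print(len(train), len(valid), len(test))
```

Fixing the seed makes the partition reproducible, so that error rates from different model
settings are compared on the same validation rows.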
The accuracy rate for negative tweets is more valuable for Fitbit. By tuning the sample size
and the number of trees, this paper seeks a higher accuracy rate for the model, especially
for the negative tweets. By oversampling the negative tweets in different proportions, the
trend in the model’s error rates can be captured. As shown in Table 15 and Figure 4 below, as
more negative tweets are oversampled, the accuracy rate for true negatives increases
significantly, from 73.10% at sample size (40,45) to 87.24% at sample size (55,45). However,
this significant increase comes at the expense of a lower true positive accuracy rate.
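One way to picture the sample sizes above is as per-class draw counts. The sketch below
assumes that a pair such as (55,45) means 55 draws from the negative tweets and 45 draws from
the positive tweets, taken with replacement; this reading, the function name
`stratified_sample` and the toy labels are our assumptions, not a description of the tool
used in the original analysis.

```python
import random


def stratified_sample(labels, n_negative, n_positive, seed=800):
    """Draw row indices with replacement, a fixed number from each class."""
    rng = random.Random(seed)
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    drawn_neg = [rng.choice(neg_idx) for _ in range(n_negative)]
    drawn_pos = [rng.choice(pos_idx) for _ in range(n_positive)]
    return drawn_neg + drawn_pos


# Toy labels: 70 positive (1) and 30 negative (0) tweets.
labels = [1] * 70 + [0] * 30
sample = stratified_sample(labels, n_negative=55, n_positive=45)
print(sum(labels[i] == 0 for i in sample), sum(labels[i] == 1 for i in sample))
```

Raising the first number gives the minority class more weight in each training sample, which
is consistent with the higher true negative accuracy and lower true positive accuracy
reported here.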
Figure 4 shows that the average of the true positive and true negative accuracy rates is very
stable, ranging from 68.84% to 70.13%, because the increase in the negative accuracy rate is
offset by the decrease in the true positive rate. Since the average accuracy rate is almost
constant, Fitbit could adopt models with different sample sizes according to its needs at the
time. The sample size (55,45) does a very impressive job of classifying negative tweets, so
Fitbit could continually learn which characteristics of the product are criticized most on
Twitter and revise them accordingly. Nevertheless, this sample size also has shortcomings.
Because of the low positive accuracy rate, the company may spend a considerable amount of
human and capital resources dealing with false alarms. In the long term, weighing the
advantages and disadvantages, oversampling the negative tweets still has the edge, since
losing real-time feedback, especially critical feedback, would put Fitbit in a more dangerous
position given the heated competition in the health care sports band field.
Table 15 Confusion Matrix for different sample sizes
Figure 4: Accuracy Rates trend for different sample sizes
Apart from the sampling ratio in the training dataset, the number of trees in the model also
plays an important part in improving the accuracy rate. This section uses the sample size
(40,45): since its true positive and true negative accuracy rates are very close, it is
easier to detect changes in both rates as the number of trees changes.
As shown in Table 16 and Figure 5, adding more trees does not necessarily improve the
accuracy rates. There is a threshold beyond which accuracy decreases. In this case, the
average accuracy rate declines after peaking at 70.21% with 600 trees. The model with 400
trees achieves the highest true positive rate, 68.65%, but does a poor job of detecting true
negative tweets, with the lowest accuracy rate, 71.28%. The 400-tree model therefore would
not be chosen. It is worth noting that beyond 400 trees, the accuracy rate for negative
tweets increases as more trees are added, while the accuracy rate for positive tweets
decreases. This suggests that, in this model, adding more trees helps more in detecting
negative tweets than positive ones.
Table 16: Confusion Matrix for various numbers of trees
Figure 5: Accuracy Rates Results for different number of trees
8. Discussion
● Failure to generate heated discussion of ChargeHR
ChargeHR was announced in October 2014 and officially released in January 2015. However, from
December to March there was none of the expected heated discussion about, for example, the
new functions or user experiences. The level of discussion was far below that seen when the
Apple Watch was released. Most of the daily tweets about ChargeHR are automated tweets,
possibly posted by Fitbit or its partners. Therefore, even though Fitbit has enjoyed
tremendous revenue growth year after year, it lacks public popularity, so there was little
discussion on social media even of a new product. Several approaches could improve public
popularity: interacting more with users on Twitter, introducing revolutionary functions, and
cooperating with popular Twitter users to increase exposure.
● Disappointing Functions
Most negative tweets are about failures to sync Fitbit to the phone and failures to charge.
These two functions urgently need to be fixed, since they had very strong negative effects on
the overall sentiments.
● Demographic Marketing
This paper included gender, New York City and California as the demographic variables in the
models. California males tend to post more negative tweets than females. One possible
explanation is that, compared with other areas, California males are more sporty and know
more about sports bands, and are therefore more critical. New York females are more likely
than males to be negative towards Fitbit. This information is interesting in two respects. On
the one hand, Fitbit could include more cities and states to study its popularity across
regions. On the other hand, it would be useful to learn which features males care about more,
and which functions females pay more attention to, in particular regions. Customized products
could then be produced to meet market demand.
9. Limitations
This paper assigns equal weight to all positive and negative words in a tweet, and sums the
values of the positive and negative words to obtain the sentiment score. This method has
several limitations. First, some words express stronger emotions than others; for example,
“wtf” certainly conveys a higher level of anger than “doubtful”. If scores could be assigned
to words according to their strength, the sentiment analysis would be more accurate. Second,
this paper initially intended to compare users’ tweets about Fitbit’s competitors, such as
Apple Watch and Jawbone. However, it is hard to determine exactly what the emotional words in
a tweet refer to.
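The first limitation above can be illustrated with a small sketch that scores one tweet under
an equal-weight lexicon and under a weighted lexicon. The two dictionaries, the weight of -3
for “wtf” and the example tweet are hypothetical values chosen for illustration; they are not
entries from the domain-dependent dictionary used in this paper.

```python
def sentiment_score(tweet, lexicon):
    """Sum the lexicon values of the words in a tweet; unknown words count as 0."""
    tokens = [tok.strip(".,!?#@") for tok in tweet.lower().split()]
    return sum(lexicon.get(tok, 0) for tok in tokens)


# Hypothetical lexicons: equal weights versus strength-based weights.
equal_weights = {"great": 1, "love": 1, "doubtful": -1, "wtf": -1}
strength_weights = {"great": 1, "love": 1, "doubtful": -1, "wtf": -3}

tweet = "wtf my fitbit is great"
print(sentiment_score(tweet, equal_weights))     # one positive and one negative word cancel
print(sentiment_score(tweet, strength_weights))  # the stronger negative word dominates
```

With equal weights the tweet scores 0 and would be treated as neutral, while the weighted
lexicon scores it -2 and classifies it as negative.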
10. Conclusion
This paper presents a detailed quantitative and qualitative analysis of public perception of
Fitbit on Twitter. Generally speaking, Fitbit enjoyed a good reputation on social media,
since positive tweets make up 79% of the total. Nevertheless, the negative tweets deserve
more attention, because they indicate possible directions for improving product features and
marketing strategies. Based on term frequencies from 12/01/2014 to 03/31/2015, the motivation
to get fit and lose weight generated the most discussion on Twitter, since “getfit”,
“fitness” and “workout” are among the top 30 most frequent terms. The models built in this
paper show that the functions of tracking exercise and getting fit have a more positive
effect on comments about Fitbit, while syncing and charging lead to more negative comments
and are the functions that Fitbit needs to improve.
Bibliography
Kouloumpis, Efthymios, Theresa Wilson, and Johanna Moore. "Twitter Sentiment Analysis:
The Good the Bad and the OMG! - Edinburgh Research Explorer." Fifth International AAAI
Conference on Weblogs and Social Media. Web. 18 Mar. 2016.
Hu, Mingqing, and Bing Liu. "Mining Opinion Features in Customer Reviews - UIC." Web.
18 Mar. 2016.
Barbosa, Luciano, and Junlan Feng. "Robust Sentiment Detection on Twitter from Biased
and ..." AT&T Labs -Research, 2015. Web. 18 Mar. 2016.
Davidov, Dmitry, Oren Tsur, and Ari Rappoport. "Enhanced Sentiment Learning Using
Twitter Hashtags and Smileys." The Hebrew University, 2010. Web. 18 Mar. 2016.
O'Connor, Brendan, Ramnath Balasubramanyan, Bryan Routledge, and Noah Smith. "From
Tweets to Polls: Linking Text Sentiment to Public ..." International AAAI Conference on
Weblogs and Social Media, May 2010. Web. 18 Mar. 2016.
Jansen, Bernard J., Mimi Zhang, Kate Sobel, and Abdur Chowdury. "Twitter Power: Tweets
as Electronic Word of Mouth." J. Am. Soc. Inf. Sci. Journal of the American Society for
Information Science and Technology 60.11 (2009): 2169-188. Web. 17 Mar. 2016.
Bifet, Albert, and Eibe Frank. "Sentiment Knowledge Discovery in Twitter Streaming Data."
Web. 18 Mar. 2016.
Alahmadi, Dimah Hussain, and Xiao-Jun Zeng. "Twitter-Based Recommender System to
Address Cold-Start: A Genetic Algorithm Based Trust Modelling and Probabilistic
Sentiment Analysis." 2015 IEEE 27th International Conference on Tools with Artificial
Intelligence (ICTAI) (2015). Web. 17 Mar. 2016.
Gao, Huiji, Jalal Mahmud, Jilin Chen, Jeffrey Nichols, and Michelle Zhou. "Modeling User
Attitude toward Controversial Topics in ..." Association for the Advancement of Artificial
Intelligence. 2012. Web. 18 Mar. 2016.
Yu, Yang, and Xiao Wang. "World Cup 2014 in the Twitter World: A Big Data Analysis of
Sentiments in U.S. Sports Fans’ Tweets." Computers in Human Behavior 48 (2015): 392-
400. Web. 17 Mar. 2016.
Hridoy, Syed, M. Ekram, Mohammad Islam, Faysal Ahmed, and Rashedur Rahman.
"Localized Twitter Opinion Mining Using Sentiment Analysis." Decision Analytics. 2015.
Web. 18 Mar. 2016.
Dacres, Shana, Hamed Haddadi, and Matthew Purver. "Topic and Sentiment Analysis on
OSNs: A Case Study of Advertising Strategies on Twitter." May 2014. Web. 18 Mar. 2016.
Godea, Andreea Kamiana, Cornelia Caragea, Florin Adrian Bulgarov, and Suhasini
Ramisetty-Mikler. "An Analysis of Twitter Data on E-cigarette Sentiments and Promotion."
Artificial Intelligence in Medicine Lecture Notes in Computer Science (2015): 205-15. Web.
18 Mar. 2016.
Appendix:
Coefficients of Logistic Regression Model 1 (Unadjusted)
Variable Estimated Coefficient
wireless(0,1] -0.911
sync(0,1] -0.885
New York(0,1] -0.677
fitbitsupport(0,1] -0.619
sleep(0,1] -0.496
charge(0,1] -0.436
Christmas(0,1] -0.295
GenderM -0.178
Posts -0.107
Gender CaliforniaF [0,0] -0.044
California(0,1] -0.012
Gender New YorkF [0,0] 0.003
Followers 0.049
Following 0.117
getfit(0,1] 0.118
badge(0,1] 0.129
calor(0,1] 0.151
surge(0,1] 0.387
diet(0,1] 0.397
birthday(0,1] 0.485
step(0,1] 0.553
compet(0,1] 0.561
Chargehr(0,1] 0.683
monitor(0,1] 0.806
fitstat(0,1] 1.101
health(0,1] 1.129
valentine(0,1] 1.254
gift(0,1] 1.360
flex(0,1] 1.365
weightloss(0,1] 1.407
motivat(0,1] 2.163
exercis(0,1] 2.287
weight(0,1] 2.356
scale(0,1] 2.370
workout(0,1] 2.492
dodger(0,1] 7.585
Sample tweets with negative sentiments about “sync”
@fitbit very disappointed wont sync with iphone and website down? Very bad. #Amazon
return!
#Fitbit is not syncing on computer, iPad, or iPhone. I am filled with rage. #rage #fitness
@fitbitsupport Whats wrong with storing data locally or an established cloud account?
Sync only to #FitBit is not bulletproof design C-today
Sample tweets with negative sentiments about “charge”
RT @shanselman I'm so sick of FitBit. Wife and my Flex will no longer charge. Lasted 12
months 3 weeks, just after the warranty. Built in batteries suck.
Annoyed with my fitbit charge. Kept waking me up last night due to battery dying. Doesn't
help that there's no indication on the screen.
@FitbitSupport Sent email Sun re my fitbit won't charge after cleaning etc. This is 2nd
failed unit in <1yr. How about an upgrade? Miss it!!
Sample tweets with negative sentiments about “Christmas”
@fitbit maintenance on Christmas Day?? Bad move! Worst first impression for the
people I purchased fitbits for. Thanks for messing up!
Considering how hard @Fitbit was pushing their products as Christmas gifts, they picked a
really bad day for server maintenance.
RT @Dave_Saba Seriously @fitbit - site down on Christmas morning for "maintenance" --
worst customer experience to start out ownership. Ridiculous
Coefficients of the Adjusted Model Using Sample Data
Variables Estimated Coefficient
sync(0,1] -0.854
Gender_ New_YorkM_(0,1] -0.803
fitbitsupport(0,1] -0.591
wireless(0,1] -0.532
sleep(0,1] -0.508
charge(0,1] -0.271
Christmas(0,1] -0.143
Gender_ CaliforniaF_[0,0] -0.088
getfit(0,1] -0.013
Posts 0.000
Followers 0.000
Following 0.000
calor(0,1] 0.332
badge(0,1] 0.335
diet(0,1] 0.339
surge(0,1] 0.368
monitor(0,1] 0.513
Gender_ CaliforniaM_(0,1] 0.577
step(0,1] 0.583
Gender_ CaliforniaM_[0,0] 0.600
Chargehr(0,1] 0.649
compet(0,1] 0.844
Gender_ New_YorkF_[0,0] 0.868
fitstat(0,1] 0.949
flex(0,1] 1.101
health(0,1] 1.126
valentine(0,1] 1.280
weightloss(0,1] 1.314
gift(0,1] 1.434
weight(0,1] 2.195
scale(0,1] 2.202
motivat(0,1] 2.343
workout(0,1] 2.373
exercis(0,1] 2.401
dodger(0,1] 8.132
Sample of Domain-dependent Dictionary for Fitbit [6]
[6] The dictionary contains too many words to list in full in this paper; please contact the
authors for further information.