Twitter Sentiments Analysis on Fitbit - BigData@UCSD

Twitter Sentiments Analysis on Fitbit Zeyuanyu Long | Di Dai

UC San Diego

School of Global Policy and Strategy


Table of Contents

1. Executive Summary

2. Introduction

3. Literature Review

4. Data

4.1. Source of Data

4.2. Dataset Description

4.3. Removing Variables and Data Cleaning

5. Important Findings

5.1. Top 30 Frequent Terms

5.2. Popularity Change of “Charge”

5.3. User Location Distribution

5.4. Words Association

6. Methodologies

6.1. Sentiments Analysis

6.2. Models Building

7. Variables Manipulation and Model Results

7.1. Target Variable Description

7.2. Adding New Independent Variables

7.3. Model 1: Logistic Regression

7.4. Model 2: Random Forest

8. Discussion

9. Limitations

10. Conclusion

Bibliography

Appendices


1. Executive Summary

In the era of “Big Data”, sentiment analysis of social media has become a valuable tool for companies' customer experience management, product improvement, marketing strategy optimization and more. This study conducted a systematic analysis of Fitbit-related sentiment on the social media platform Twitter and built multiple predictive models to identify how different factors influence positive and negative sentiment. Based on a close study of Fitbit, the healthcare industry and the language preferences of Twitter users, a Fitbit-focused domain-dependent dictionary was built as the criterion for classifying attitudes as positive or negative. Neutral sentiments are omitted from this analysis because they carry little information about customer preferences.

To model the binary target variable, sentiment, which takes the value “1” for positive and “0” for negative outcomes, Logistic Regression and Random Forests models were applied to predict how gender, location, words about product functionalities, health-related topics, hashtags and holidays influence users' sentiments. For example, the Logistic Regression results revealed that “syncing” and “charging” appear to be the most problematic features of Fitbit, since they have the strongest correlation with negative sentiment. This link is attributed to the large number of tweets complaining about Fitbit's inaccuracy when syncing data to users' mobile app and about charging malfunctions. For the Random Forest model, the independent variables with significant coefficients in the Logistic Regression were selected as input variables for the tree classification. Without adjusting the sample size, the error rate for predicting negative tweets is 70.91%, which is too high for Fitbit to detect negative tweets reliably. After adjusting the sample size and tuning the parameters, the optimal model reduces the error rate to 33.32% for positive tweets and 26.25% for negative tweets. Although the higher error rate on positive tweets is the cost of oversampling the negative tweets, the trade-off is still worthwhile, since negative tweets spur emotions more easily on social media and are also important alerts for further improvement of Fitbit.
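The error rates quoted above are per-class rates: the share of actual negative (or positive) tweets that the model labels incorrectly. A minimal Python sketch of that computation follows; the function name `per_class_error_rates` and the toy labels are illustrative assumptions, not the study's code or data.

```python
def per_class_error_rates(y_true, y_pred):
    """Return {label: error rate}, where each rate is the share of that
    class's observations that were misclassified."""
    totals, errors = {}, {}
    for actual, predicted in zip(y_true, y_pred):
        totals[actual] = totals.get(actual, 0) + 1
        if actual != predicted:
            errors[actual] = errors.get(actual, 0) + 1
    return {label: errors.get(label, 0) / n for label, n in totals.items()}


# Toy labels: 1 = positive tweet, 0 = negative tweet (not the study's data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]
print(per_class_error_rates(y_true, y_pred))  # {1: 0.25, 0: 0.5}
```

Reporting the negative-class rate separately matters here because negative tweets are only about a fifth of the non-neutral tweets, so overall accuracy alone would hide poor detection of them.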

2. Introduction

Microblogging is not only a communication tool for individuals; it is also a strong platform that companies can use to gather feedback on their products, market trends and so on. Twitter is one of the most popular microblogging platforms used for research in opinion mining and sentiment analysis. For companies, sentiment analysis is extremely important since it reveals public perceptions of products, services and even advertisements. Companies that make full use of it can gain great advantages in improving their products to meet market demand, which is the motivation of this study.

Fitbit is a leading producer of wireless sports wristbands, whose functions track users' daily steps, sleep quality, heart rate and more. It is the biggest competitor of Apple Watch and Jawbone. There are two main reasons for choosing Fitbit as the target of our sentiment analysis. First, healthcare is a long-term booming industry that has kept growing rapidly in recent years; beyond the healthcare system itself, people have started to pay more attention to nutrition and exercise. Second, Fitbit generated $409.3 million in revenue in the third quarter of 2015, a year-over-year growth rate of 168%. This shows that Fitbit is a fast-growing company in the healthcare industry, which has drawn great research interest. Knowing which features of its products are the most influential on social media, and what it needs to pay attention to in order to improve its products and services to meet future demand, will help Fitbit build stronger competitiveness. By building different models for the sentiment analysis, this study aims to help Fitbit achieve this goal through its final discussion and conclusions.

3. Literature Review

This paper builds heavily on Professor Bohn's help and previous scholars' work. Most scholars who conducted sentiment analysis focused on text mining of the tweets and on revising the algorithms to obtain the most accurate sentiment scores. However, little work has been done on one specific product of one company.

In “Analysis of Twitter Data on E-cigarette Sentiments and Promotion”, the authors cleaned 105,605 tweets expressing opinions about e-cigarettes. They then built their own domain-dependent dictionary for sentiment classification, since the emotional categories of common English words do not all apply to this specific product.1 This motivates this paper to create a domain-specific dictionary for Fitbit, since the emotional terms used online for sports and healthcare can differ slightly from standard English. Revising the dictionary and changing the evaluation standards should improve the accuracy of the sentiment analysis and of the model results.

The common method assigns a sentiment score to each tweet to judge whether its overall character is negative, positive, or neutral. However, this method overlooks the underlying product features that users are tweeting about. “Twitter Power: Tweets as Electronic Word of Mouth” goes beyond this limitation. The authors adopted an “action-object” approach, which categorizes the objects into several types, such as “service”, “appearance” and “duration”, while the action can be any expression relevant to that category. Based on these pairs, it is possible to determine which category has the most positive effect and which features need further improvement.2 This method gave this paper the idea of testing how important different features are in shaping public sentiment on Twitter.

1 Godea, Andreea Kamiana, Cornelia Caragea, Florin Adrian Bulgarov, and Suhasini Ramisetty-Mikler. "An Analysis of Twitter Data on E-cigarette Sentiments and Promotion." Artificial Intelligence in Medicine Lecture Notes in Computer

“Localized twitter opinion mining using sentiment analysis” analyzed trends and customer reactions to the iPhone 6 on social media, and provided the company with an innovative approach to market analysis. The authors not only focused on the main functions of the iPhone 6 but also analyzed the distribution of sentiment by gender and city. Based on that information, companies could refine their market strategy demographically.3

4. Data Description

4.1. Source of Data

All the data used in this project were downloaded from Crimson Hexagon (www.crimsonhexagon.com), a company dedicated to social media data analytics. We built a public monitor in Crimson Hexagon using Twitter as the data source. The data selection criteria are presented in Table 1 below:

Table 1: Summary of Dataset Description

Key words fitbit

Country United States

Language English

Date Range 12/1/2014 - 3/31/2015

2 Jansen, Bernard J., Mimi Zhang, Kate Sobel, and Abdur Chowdury. "Twitter Power: Tweets as Electronic Word of Mouth." Journal of the American Society for Information Science and Technology 60.11 (2009): 2169-2188. Web. 17 Mar. 2016.

3 Hridoy, Syed, M. Ekram, Mohammad Islam, Faysal Ahmed, and Rashedur Rahman. "Localized Twitter Opinion Mining Using Sentiment Analysis." Decision Analytics. 2015. Web. 18 Mar. 2016.

4.2. Dataset Description

Based on the search criteria, we collected 45 raw datasets on different dates and compiled them into one dataset that contains 347,241 observations and 16 variables covering the 4 months. We selected this time range because one of Fitbit's popular products, “Charge HR”, was released in January 2015. By focusing on this period, we aimed to learn how the market reacted to this new product and how topic density changed during the 4 months.

Table 2: Description of Input Variables

Name Description

GUID Globally unique identifier for each observation

Date..GMT. Date in m/d/y h:m format in Greenwich Mean Time

URL Original URL of each tweet

Contents Text content of each tweet

Author The author who posted the tweet

Name The Twitter username of the tweet's author

Country This variable only has 1 unique value: United States

State.Region The state where the tweet was posted

City.Urban.Area The city where the tweet was posted


Category Sentiment results based on Crimson Hexagon’s algorithm

Source This variable only has one unique value: Twitter

Klout.Score Score of sentiments based on Crimson Hexagon’s algorithm

Gender Gender of users

Posts Total number of posts the user has published

Followers Total number of followers of the user

Following Total number of other accounts the user is following

4.3. Removing Variables and Data Cleaning

4.3.1 Removing Variables

One objective of this project is to create a new domain-dependent dictionary for more accurate sentiment analysis. Therefore, we deleted the variables “Category” and “Klout.Score”, which were sentiment analysis outcomes generated by Crimson Hexagon's built-in algorithm. Furthermore, the variables “URL”, “Country” and “Source” were removed because they are irrelevant or redundant for our analysis.

4.3.2. Removing Fitbit Records Tweets and Ads

One of Fitbit's features integrates with Twitter to post a record of the user's daily steps as a tweet. As a result, a large number of tweets in our dataset follow the format “My fitbit #Fitstats for m/d/y, xxx steps and xxx miles travelled”. To avoid this noise in our analysis, we used the R function “grepl” to separate tweets matching this format from our dataset. In addition, a quick sort of the data revealed several suspicious advertisement tweets that were repeated more than 100 times. We also eliminated those tweets from our dataset to improve the accuracy of our text analysis. The key content of those high-volume advertisement tweets and their frequency are listed in Table 3:

Table 3: Key Contents of Automated and Ads Tweets

Content Number of times posted

“RT @FitBookFan Getting #fitbit...” 158

“I want to win the Fitbit Zip from @GetMyHealthyOn for #NationalNutritionMonth!” 266

“I'm in the running to win a new @fitbit Charge from @sleepopolis” 137

After this preliminary data cleaning, the new dataset contains 287,381 observations, with the automated Fitbit record tweets and the repeatedly posted advertisement tweets removed.
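The study performed this filtering in R with `grepl`. A rough Python equivalent is sketched below with the standard `re` module; the regular expression and the exact-duplicate rule (drop any identical text posted more than 100 times) are assumptions about how the filtering might be reproduced, not the authors' code.

```python
import re
from collections import Counter

# Assumed pattern for Fitbit's automated step tweets, e.g.
# "My fitbit #Fitstats for 12/1/2014: 10,000 steps and 4.6 miles traveled".
FITSTATS = re.compile(r"my fitbit #fitstats for", re.IGNORECASE)


def remove_noise(tweets, max_repeats=100):
    """Drop automated #Fitstats tweets and any exact text that appears
    more than max_repeats times (treated here as repeated advertisements)."""
    counts = Counter(tweets)
    return [
        t for t in tweets
        if not FITSTATS.search(t) and counts[t] <= max_repeats
    ]


tweets = (["My fitbit #Fitstats for 12/1/2014: 9,000 steps"]
          + ["I want to win the Fitbit Zip!"] * 266
          + ["Love my new Charge HR"])
print(remove_noise(tweets))  # ['Love my new Charge HR']
```

Matching on exact text is a simplification: repeated ads that differ only in a shortened URL would need to be normalized before counting.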

4.3.3. Text Mining

To remove information in the tweet contents that is irrelevant for sentiment analysis, we used the “tm” package in R to conduct text mining. After building a corpus with the “Contents” variable as its source, we cleaned the text by removing URLs, punctuation, stopwords and non-English words, and by converting all letters to lowercase. The processed text was then converted into a term-document matrix.
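The cleaning steps above were done with R's `tm` package. The Python sketch below mirrors them with the standard library only; the short stopword list is a placeholder (tm ships a much longer one), and dropping every non-ASCII character and digit is only a crude stand-in for removing non-English words.

```python
import re
from collections import Counter

# Placeholder stopword list; a real run would use a full English list.
STOPWORDS = {"a", "an", "the", "and", "or", "is", "my", "to", "of", "i", "it", "for", "in", "on"}


def clean(text):
    """Lowercase, strip URLs, punctuation, digits and non-ASCII, then drop stopwords."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = text.replace("'", "")                # keep "won't" as one token
    text = re.sub(r"[^a-z\s]", " ", text)       # punctuation, digits, non-ASCII
    return [w for w in text.split() if w not in STOPWORDS]


def term_document_matrix(docs):
    """Return (terms, matrix) where matrix[i][j] is the count of terms[i] in docs[j]."""
    counts = [Counter(clean(d)) for d in docs]
    terms = sorted(set().union(*counts))
    return terms, [[c[t] for c in counts] for t in terms]


terms, tdm = term_document_matrix(["My #Fitbit won't sync! http://t.co/x", "Love my fitbit"])
print(terms)  # ['fitbit', 'love', 'sync', 'wont']
print(tdm)    # [[1, 1], [0, 1], [1, 0], [1, 0]]
```

Note that stripping the "#" turns hashtags into ordinary words, which is consistent with hashtag terms such as “getfit” appearing in the frequent-term list.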

5. Important Findings

5.1. Top 30 Frequent Terms


By cleaning up the text, we can identify which words are frequently used when Twitter users talk about Fitbit. We set the minimum frequency to 200, which returned more than 800 words; we focus only on the top 30. Since one of the popular features of Twitter is the hashtag, which labels words connected with specific topics or themes, we noted that the hashtag words “getfit”, “weightloss” and “motivation” rank No.1, No.7 and No.23 in the list. Besides the hashtag words, we also found that the words “workout”, “fitness”, “calories” and “diet” are frequently associated with Fitbit-related tweets. This result helps in understanding users' interests and popular topics, which can help Fitbit capture potential trends within the market and create marketing strategies that users prefer.
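The frequency cut-off and top-30 selection can be expressed compactly. In tm, `findFreqTerms` applies such a lower bound; the Python version below is an independent illustration, assuming tweets have already been tokenized as in the cleaning step.

```python
from collections import Counter


def frequent_terms(token_lists, min_freq=200, top_n=30):
    """Total each term over all tweets, keep terms whose count is at least
    min_freq, and return the top_n (term, count) pairs, most frequent first."""
    totals = Counter()
    for tokens in token_lists:
        totals.update(tokens)
    kept = [(term, n) for term, n in totals.most_common() if n >= min_freq]
    return kept[:top_n]


tweets = [["getfit", "fitbit"], ["getfit", "diet"], ["getfit"]]
print(frequent_terms(tweets, min_freq=2))  # [('getfit', 3)]
```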

Figure 1: Top 30 Frequent Terms

5.2. Popularity Change of “Charge”

In October 2014, Fitbit announced a new version of its tracker named “Charge HR”, which was officially released in January 2015. By calculating the daily volume of tweets that contain the product's name, we can see that the general volume of “Charge HR” tweets is very low. There are some small peaks on 07/12/2014, 06/01/2015 and 28/03/2015. Looking back at the original dataset, these peaks primarily resulted from large numbers of commercial and promotional tweets. For example, the tweet “#Fitbit #Charge - It’s been more than a year since the ill-fated Fitbit Force tracker was pulled off… http://t.co/iBQ8KGLkpf #OtherTech” appeared more than 1,000 times on 07/12/2014. These commercial tweets were not excluded because they also deliver the strong message that Fitbit and other interested parties are trying to promote the new product. However, as Figure 2 below shows, even with the strong intervention of promotional tweets, tweets about “Charge HR” make up a very small percentage of daily tweets.
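The daily share of tweets mentioning the product can be computed by grouping tweets by calendar day. A sketch, assuming timestamps in the `Date..GMT.` column look like `12/7/2014 10:00` (month/day/four-digit year); the matching rule, a plain case-insensitive substring search for the phrase, is a simplification that would miss a hashtag such as #ChargeHR.

```python
from collections import defaultdict
from datetime import datetime


def daily_share(rows, phrase="charge hr"):
    """rows: (timestamp, text) pairs with timestamps like '12/7/2014 10:00'.
    Returns {date: share of that day's tweets whose text contains phrase}."""
    total = defaultdict(int)
    hits = defaultdict(int)
    for stamp, text in rows:
        day = datetime.strptime(stamp, "%m/%d/%Y %H:%M").date()
        total[day] += 1
        if phrase in text.lower():
            hits[day] += 1
    return {day: hits[day] / total[day] for day in sorted(total)}


rows = [
    ("12/7/2014 10:00", "Pre-order the new Fitbit Charge HR"),
    ("12/7/2014 11:30", "10,000 steps today"),
    ("1/6/2015 09:15", "Morning walk"),
]
for day, share in daily_share(rows).items():
    print(day, share)
# 2014-12-07 0.5
# 2015-01-06 0.0
```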

Comparatively, December and January have more tweets about “Charge HR” than February and March, which suggests that more users were following the new product, but there was no heated discussion on Twitter about its new functions. Two possible reasons may account for this. On one hand, “Charge HR” had no revolutionary improvement or breakthrough worth discussing. On the other hand, the promotion of “Charge HR” was not effective: even with all these promotional and automated tweets, it still failed to encourage discussion.


Figure 2: Popularity of Charge During 2014/12/01-2015/3/31

5.3. User Location Distribution

Not all Twitter users share their location while tweeting, so the location information collected in this paper is limited. The total number of valid observations with location is 179,710, which is 62.5% of the total dataset. Still, Figure 3 captures the general distribution across states. California and New York have the most tweets about Fitbit, which makes sense since these two states have higher incomes than other states and healthy lifestyles are more widely promoted there. However, more tweets do not necessarily mean that Fitbit is more popular or receives more positive feedback in these two states. Therefore, in the model section, “California” and “New York” are included as two independent variables to test whether Fitbit is more welcome in these two places.


Table 4: Number of Location Values

Missing value 107,671

Unique Value 51

Valid Observation 179,710
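Table 4 can be reproduced from the `State.Region` column. A sketch, assuming a missing location is stored as an empty value; with the study's counts, 179,710 valid out of 287,381 observations gives the 62.5% coverage stated in the text.

```python
from collections import Counter


def location_summary(states, top_n=10):
    """states: one State.Region value per tweet (None or '' when missing)."""
    valid = [s for s in states if s]
    counts = Counter(valid)
    return {
        "missing": len(states) - len(valid),
        "unique": len(counts),
        "valid": len(valid),
        "share_valid": len(valid) / len(states),
        "top": counts.most_common(top_n),
    }


summary = location_summary(["California", "New York", None, "California", ""])
print(summary["missing"], summary["unique"], summary["valid"])  # 2 2 3
print(summary["top"][0])                                       # ('California', 2)
print(round(179_710 / 287_381, 3))                             # 0.625
```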

Figure 3: Distribution of tweets volumes in top 10 States


5.4. Words Association

Table 5: Words association with “Fitbitsupport”

Among our frequent terms, we noticed that the word “fitbitsupport” is among the popular tags. Based on our research, this is the Twitter account for Fitbit customer service. Most of the time, when people mention @fitbitsupport, they are either seeking solutions from Fitbit customer service for issues they encountered with the product or services, or expressing feedback about the service that “fitbitsupport” provides. To understand which issues and topics people want customer service to answer, we ran a word association test to find the words that have a correlation of more than 0.05 with the word “fitbitsupport”. To reach meaningful conclusions, we focus on the words related to Fitbit's product features and functionalities. According to the result, presented in Table 5, “reset”, “restart”, “restarting” and “restarted” all carry a similar meaning in this context. On further scrutiny of the original dataset, we found that users have been having trouble resetting or restarting their Fitbit. Similarly, other major problems that consumers frequently complain about when they expect “fitbitsupport” to help them are “sync” and “charge”. A large number of people had problems syncing their Fitbit data with the app on their phone, or found that the synced data were inaccurate. In addition, people have complained about Fitbit failing to charge properly even when the product is still new. This information drew our attention to product functionality and customer service quality, which play a critical role in maintaining customer loyalty and growing market demand. Besides pointing out the importance of customer service, this result also alerts Fitbit to the necessity of better product design and quality, which is essential for building strong competitiveness in the market, especially in the healthcare and tech industries.
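The word association test corresponds to tm's `findAssocs`, which reports terms whose correlation with a given term across the documents of a term-document matrix is at or above a limit. A Python sketch of the same idea, using Pearson correlation over term-count rows; treat it as an illustration rather than a reimplementation guaranteed to match tm's output.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length count vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def find_assocs(terms, matrix, target, corlimit=0.05):
    """matrix[i][j] is the count of terms[i] in tweet j. Returns (term, r)
    pairs with r >= corlimit, strongest first."""
    row = matrix[terms.index(target)]
    pairs = [(t, pearson(row, r)) for t, r in zip(terms, matrix) if t != target]
    return sorted(((t, round(c, 2)) for t, c in pairs if c >= corlimit),
                  key=lambda p: p[1], reverse=True)


terms = ["fitbitsupport", "sync", "love"]
matrix = [[1, 1, 0, 0],   # fitbitsupport
          [1, 1, 0, 1],   # sync
          [0, 0, 1, 1]]   # love
print(find_assocs(terms, matrix, "fitbitsupport"))  # [('sync', 0.58)]
```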

A few examples of such complaints have been listed below to help explain the problems.

“@FitbitSupport Hi. Fitbit flex not holding charge (3mth old). have reset it 3x, cleaned

it...Also will not sync anymore. can you help?”


“@FitbitSupport @fitbit is there anything I can do to avoid having to restart/reset my

#fitbitflex everyday so it will sync?”


On the other hand, we were glad to see that the words “thanks” and “thank” have correlations of around 0.05 with “fitbitsupport”. Those expressions came from users whose problems were solved by “fitbitsupport” after they mentioned the account. This is a sign of Fitbit's efforts to provide successful customer service, which the company should pursue continually in order to win its customers back.

“@FitbitSupport Much appreciated, thankfully found my Fitbit this morning. You have the

best customer service EVER! You helped me before, THX”

“@FitbitSupport @fitbit I am speechless with your amazing gesture. Thank you so much

for sending me a brand new Fitbit for the one I'd lost.”

“Had the best customer service with @fitbit @FitbitSupport thanks Abby P!”

Besides customer service, this result also drew our attention to several important Twitter accounts, including “talkmaster”, “seeshawnlive” and “everybodywalk”, all of which have a correlation of more than 0.05 with “fitbitsupport”. Through our research, we found that the frequent occurrence of “talkmaster” was related to Fitbit's failure to provide a satisfactory experience for a popular user, Neal Boortz, a former talk radio host with more than 184,000 followers. On Dec 17, 2014, “talkmaster” tweeted at @fitbitsupport asking for help with a setup problem (picture below). The popularity of this user drew attention from his followers, who joined and commented on the conversation between “talkmaster” and “fitbitsupport” by quoting both of them on Twitter. This could be another alert for Fitbit about the level of influence that important customers can have on other customers. While striving to provide good service for those important customers, Fitbit should also be aware that negative reviews from them can easily be amplified, especially through social media.

Another frequently quoted username, “seeshawnlive”, was related to Fitbit's sponsorship of a show whose content degrades women. This criticism, started by the user “seeshawnlive”, was retweeted and spread rapidly on Twitter during December, putting heavy pressure on Fitbit's public relations. The original tweet from “seeshawnlive” reads: “@fitbit @FitbitSupport stop sponsoring #SororitySisters, the show degrades women. Pull your ads. #BoycottSororitySisters @dstinc1913”. This test enabled us to identify the unfavorable public relations issues that Fitbit faces. More importantly, it showed how important it is for Fitbit to select an appropriate marketing strategy and build a responsible corporate image.

Additionally, the result led us to discover a user called “everybodywalk”, which stands for an award-winning campaign aimed at getting Americans up and moving. It provides news and resources on walking, health information, a personal pledge form to start walking, and a place to share stories about individual experiences with walking.4 This discovery suggests how Fitbit could use public campaigns or organizations that share healthcare information to reinforce its presence among users.

4 http://everybodywalk.org/about/


6. Methodologies

6.1. Sentiments Analysis

6.1.1. Building Domain-dependent Dictionary for Fitbit

As mentioned before, to obtain more accurate sentiment scores, it is necessary to build a domain-dependent dictionary for Fitbit. The base dictionary is from Minqing Hu and Bing Liu.5 Beyond common-sense adjustments, further changes to the dictionary were made by referring to the original tweets whenever something unusual appeared in the model's results. The revised part is shown in Table 6 below, while the full version of this dictionary is included in the Appendix.

Table 6: Revised Words in Domain-Dictionary for Sentiment Analysis

● There are more revisions to the positive words than to the negative ones. In specific contexts, some negative words actually express positive feedback about the product. However, it is rare for general positive words to turn negative.

● “Loss” generally has a negative meaning, but in the Fitbit case it most likely refers to losing weight, which shows a positive attitude towards Fitbit's function. The same applies to “burn”, which means that users feel motivated to burn calories, as in this tweet: “What's good abt shoveling #Snow off an extra-wide,paved driveway in #Winter?! Good workout/burns cals & calibrates my Fitbit! #MONTANA”

5 Hu, Minqing, and Bing Liu. "Mining Opinion Features in Customer Reviews." Web. 18 Mar. 2016.

● “Obsessed” and “addicted” show strong positive emotion towards Fitbit, since users feel that using Fitbit and checking their daily status has become a habit they cannot live without. A typical example using these words in the dataset is “@hijinksandhalos . @hijinksandhalos TaraLynne71 I'm FitBit obsessed so get my 10,000 steps a day in or am going for a walk at 11:00 pm! #girlstravel”;

● “Win” is deleted from the positive words dictionary, since most of the tweets containing the word are advertisements, such as “Enter here to win a new Fitbit Charge”;

● Given the unique characteristics of a sports band, accurate step detection and a constant Bluetooth connection to the phone are extremely important. If Fitbit fails to achieve this basic goal, it would greatly undermine consumers' confidence in it. Therefore, “inaccurate” and “disconnect” are added to the negative words dictionary to track users' disappointment with these functions.

● There are also some words that cannot be well defined because of their different usages. For example, “killing” can show a strong positive attitude, as in “@fitbit I love the #Charge, but when will the #ChargeHR be available!? The suspense is killing me!!!”; express huge disappointment, as in “@Fitbit please start supporting @Google Fit. cc: @FitbitSupport. You’re LITERALLY killing me, since I can’t see all health stats. ;-)”; or express something irrelevant to the Fitbit product itself, as in “This Fitbit thingy would be more useful if it could tell me how many times I picked up my 20 lb son. My back is killing me.” Therefore, words like “killing” are eliminated from the dictionary lists, since including them in either category would heavily bias the outcomes.
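The dictionary revisions in Table 6 amount to a few set operations on the base lexicon. A sketch with hypothetical excerpts standing in for the Hu and Liu word lists; the real lists contain thousands of words, and which base list each word starts in is an assumption here.

```python
# Hypothetical excerpts standing in for the Hu & Liu positive/negative lists.
BASE_POSITIVE = {"love", "great", "win", "amazing", "killing"}
BASE_NEGATIVE = {"loss", "burn", "obsessed", "addicted", "broken", "killing"}


def revise_dictionary(positive, negative):
    """Apply Fitbit-specific revisions and return new (positive, negative) sets."""
    pos, neg = set(positive), set(negative)
    # Words that signal a positive experience in the Fitbit context.
    for word in ("loss", "burn", "obsessed", "addicted"):
        neg.discard(word)
        pos.add(word)
    pos.discard("win")                        # mostly giveaway advertisements
    neg.update({"inaccurate", "disconnect"})  # core tracking failures
    for word in ("killing",):                 # too ambiguous: remove from both lists
        pos.discard(word)
        neg.discard(word)
    return pos, neg


pos, neg = revise_dictionary(BASE_POSITIVE, BASE_NEGATIVE)
print(sorted(pos))  # ['addicted', 'amazing', 'burn', 'great', 'loss', 'love', 'obsessed']
print(sorted(neg))  # ['broken', 'disconnect', 'inaccurate']
```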

6.1.2. Sentiments Score

To calculate the sentiment score, we extract every word from each tweet and match it against the positive and negative words in the domain-dependent dictionary. Each matching positive word is assigned a value of “+1”, while each matching negative word gets “-1”. The final sentiment score of a tweet is the sum of these values. To run Logistic Regression and Random Forests in the next section, the sentiment scores are transformed into a binary variable, where “1” stands for tweets with positive scores and “0” for tweets with negative scores. Tweets with a score of 0 are neutral and are treated as missing values in this case, since they do not contain much information to study.
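The scoring rule is simple enough to state directly in code. A minimal sketch, assuming tweets are already cleaned and tokenized and that the positive and negative word lists are Python sets (all names are illustrative):

```python
def sentiment_score(tokens, positive, negative):
    """+1 for every positive-word match, -1 for every negative-word match, summed."""
    return sum(int(t in positive) - int(t in negative) for t in tokens)


def sentiment_label(score):
    """Binary target: 1 = positive, 0 = negative, None = neutral (treated as missing)."""
    if score > 0:
        return 1
    if score < 0:
        return 0
    return None


positive = {"love", "burn", "obsessed"}
negative = {"inaccurate", "disconnect"}
for tweet in ["love my fitbit", "sync inaccurate after disconnect", "new fitbit today"]:
    score = sentiment_score(tweet.split(), positive, negative)
    print(score, sentiment_label(score))
# 1 1
# -2 0
# 0 None
```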

It is worth noting that automated tweets are excluded from this sentiment analysis. Table 7 below shows that positive tweets make up a large share of the non-neutral tweets, 79%, while negative tweets make up only 21%. This suggests that Twitter users in general have a positive perception of Fitbit. Nevertheless, the negative tweets are more valuable for Fitbit in improving the product's functions and revising its marketing strategy.

6.2. Models Building

Logistic Regression and Random Forests are the two models this paper adopts to conduct the sentiment analysis. Through Logistic Regression, each independent variable's association with the target variable can be described clearly. Unlike traditional regression analysis, which focuses on whether the variables are significant, this paper pays more attention to which variables have the most significant impact on the sentiment scores. By looking back at the original tweets and studying the unusual or most important variables, the specific reasons behind an outcome can be discovered. Random Forests has the advantage of handling large datasets well. Tuning parameters such as the number of trees and the number of variables can change the model's accuracy dramatically. Moreover, by oversampling critical events, in this case the negative tweets, Random Forests can put more weight on the negative comments that the target company cares more about.
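One common way to oversample is to duplicate randomly chosen minority-class rows until the classes reach a chosen ratio. The paper does not say exactly how its resampling was done, so the sketch below (function name and parameters hypothetical) shows plain random oversampling with Python's `random` module.

```python
import random


def oversample_minority(rows, labels, minority=0, ratio=1.0, seed=42):
    """Duplicate randomly chosen minority-class rows (with replacement) until
    the minority count reaches ratio * majority count. Returns new lists."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_n = len(labels) - len(minority_idx)
    needed = int(ratio * majority_n) - len(minority_idx)
    if not minority_idx or needed <= 0:
        return list(rows), list(labels)
    extra = [rng.choice(minority_idx) for _ in range(needed)]
    return (list(rows) + [rows[i] for i in extra],
            list(labels) + [labels[i] for i in extra])


labels = [1] * 79 + [0] * 21          # roughly the study's 79/21 split
rows = list(range(len(labels)))       # stand-ins for feature rows
_, new_labels = oversample_minority(rows, labels)
print(new_labels.count(1), new_labels.count(0))  # 79 79
```

Oversampling is normally applied to the training split only, so that duplicated negative tweets do not leak into the validation data used to measure error rates.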

7. Variables Manipulation and Model Results

7.1. Target Variable Description

By classifying the sentiment scores as “positive”, “negative” and “neutral”, we created a new variable named “Sentiments”. We then assigned “positive” the value “1” and “negative” the value “0”. Since our main objective is to understand how non-neutral sentiments vary with different independent variables, we excluded the “neutral” tweets from consideration. Thus, our target variable is the newly built binary variable “Sentiments”, which contains 2 unique values across 287,381 total observations. The 150,636 valid observations, consisting of 31,984 values of “0” (21%) and 118,652 values of “1” (79%), are used in our final models.

Table 7: Values of Sentiments Scores


7.2. Adding New Independent Variables

To understand which factors are associated with each sentiment, we created numerous new binary variables to assess their correlation with the sentiments. From the frequent-term list built with a frequency benchmark of over 200, we selected words that align with our research interests based on our knowledge of their importance, and used them to create binary variables in 5 categories: feature, topic, hashtag, product and holiday. Each variable takes the value “1” if the tweet contains one of its words and “0” otherwise. To maintain accuracy in building the models, each variable covers all forms of its words. Besides the term variables, we are also interested in whether being located in California or New York has a significant influence on the target variable. Two more independent variables are therefore created, “california” and “new york”, which are assigned 1 when the user's location matches them. To further examine the interaction between location and gender, we created two interaction terms, one between “Gender” and “california” and one between “Gender” and “new york”. To make the analysis more interpretable, we transformed the variables “Followers” and “Following” into natural log format. Detailed information about the variables and the words they contain is summarized in Table 8 below:


Table 8: Input Variables Description
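A sketch of how indicator, interaction and log-transformed variables like these could be built for one tweet. The word patterns below are hypothetical examples (Table 8 lists the study's actual terms), the gender coding is assumed, and `math.log1p` (the natural log of 1 + x) is used so users with zero followers stay defined; the report itself only says “natural log”.

```python
import math
import re

# Hypothetical word families; each pattern covers several forms of a word.
CATEGORIES = {
    "sync": r"\bsync(?:s|ed|ing)?\b",
    "charge": r"\bcharg(?:e|es|ed|ing)\b",
    "christmas": r"\bchristmas\b",
}


def build_features(text, state, gender, followers, following):
    """Return one row of model inputs for a single tweet."""
    lowered = text.lower()
    row = {name: int(re.search(p, lowered) is not None) for name, p in CATEGORIES.items()}
    row["california"] = int(state == "California")
    row["new_york"] = int(state == "New York")
    male = int(gender == "male")                  # assumed gender coding
    row["male_x_california"] = male * row["california"]
    row["male_x_new_york"] = male * row["new_york"]
    row["log_followers"] = math.log1p(followers)  # natural log of (1 + count)
    row["log_following"] = math.log1p(following)
    return row


print(build_features("Fitbit won't SYNC after charging", "California", "male", 0, 10))
```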

7.3. Model 1: Logistic Regression

7.3.1. Interpretation of Estimated Coefficients

To assess the correlation between each independent variable and the sentiments, we used logistic regression, which models the relationship between the predictors and a binary outcome. The dataset was partitioned into training, validation and testing sets in proportions of 60/30/10, respectively. The coefficient of each variable is attached as an appendix.
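A reproducible 60/30/10 partition can be sketched as follows. The seed and function name are hypothetical, and the paper does not say whether its split was stratified by sentiment (this one is not); integer arithmetic keeps the cut points exact.

```python
import random


def split_60_30_10(rows, seed=1):
    """Shuffle rows and split them into training (60%), validation (30%)
    and test (10%) sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    train_end, valid_end = n * 6 // 10, n * 9 // 10
    return shuffled[:train_end], shuffled[train_end:valid_end], shuffled[valid_end:]


train, valid, test_set = split_60_30_10(range(1000))
print(len(train), len(valid), len(test_set))  # 600 300 100
```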

Although the majority of our coefficients are positive, for business purposes we are more interested in identifying variables with negative coefficients, which include “wireless”, “sync”, “sleep”, “fitbitsupport”, “Christmas”, “GenderM”, “California”, “New York” and the gender and location interaction terms. The results for those variables are listed below according to the significance of the coefficient.


Table 9: Results of Negative Coefficients

The coefficients represent each variable's effect on the log-odds of a tweet having positive sentiment, so a negative coefficient corresponds to an odds ratio below 1. According to the results, the words “sync”, “wireless” and “charge” are more likely to be associated with negative emotions. As in our earlier word association analysis, these words point to product feature problems that have become more and more annoying to users. Most of the issues users complained about were Fitbit's inability to sync data accurately or to charge properly. Since “wireless” is a word people mention when talking about syncing, its negative coefficient is not unexpected. Similarly, “fitbitsupport” is the official account for Fitbit customer service. By tagging “fitbitsupport”, users ask questions and seek solutions for problems they encountered while using Fitbit. Therefore, it is natural to expect “fitbitsupport” to be more likely linked to negative expressions.
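Because logistic-regression coefficients are on the log-odds scale, exponentiating a coefficient gives its odds ratio, and a negative coefficient means lower odds of a positive tweet. A short sketch; the coefficient value -0.17 is only illustrative.

```python
import math


def odds_ratio(coef):
    """Odds ratio implied by a logistic-regression coefficient (log-odds scale)."""
    return math.exp(coef)


def percent_change_in_odds(coef):
    """Percentage change in the odds of a positive tweet for a one-unit increase
    in the variable (e.g., a binary term switching from 0 to 1)."""
    return (math.exp(coef) - 1) * 100


coef = -0.17  # illustrative negative coefficient
print(round(odds_ratio(coef), 3))              # 0.844
print(round(percent_change_in_odds(coef), 1))  # -15.6
```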

We found that the user's gender shows an interesting discrepancy in its influence on expressed sentiment. Among the 162,768 observations that specify the user's gender, around 60% are female users and the remaining 40% are male users. However, the result indicates that, on average, male users have about 0.17 lower log-odds of positive sentiment than female users, i.e., they are more likely to express negative sentiment. This variable is further combined with the two popular location variables, California and New York City. Surprisingly, male users in California are more likely to express negative feelings about Fitbit than female users in California. In New York City, although the difference between genders is less significant, female users are more likely than male users to express negative sentiment. We assume that this discrepancy stems from fundamental demographic differences between the two locations: New York City in general has more female than male residents, while in California the situation is the opposite. Nevertheless, this is a non-negligible finding that can help Fitbit better understand users' preferences.

Although the word “sleep” is also among our negative coefficients, this result is attributable to users realizing, after using Fitbit, how poor the quality of their sleep is. Since measuring sleep efficiency is one of Fitbit’s functions, this result can help Fitbit understand customers’ demands and preferences and improve the user experience through better product design.

This paper defined a series of holiday-related terms to see how they are associated with sentiments. Generally, people enjoy buying Fitbit as a gift during the holidays, so it is not surprising that holiday terms predict positive sentiments. The only outlier is “Christmas”, whose negative coefficient indicates that people mentioning “Christmas” tended to express negative emotions. Given this notable difference, we examined the tweets about Christmas in the dataset and found that on Christmas Day 2014 the Fitbit website was down for planned maintenance. Many users expressed their disappointment that this inconvenience happened right on the holiday, which caused the significant deviation from the generally positive holiday sentiment.

7.3.2. Model Evaluation

We evaluated the model with a confusion matrix on the validation data, which is 30% of the total data; the resulting error matrix is shown in Table 10. In general, the model predicts positive sentiments with high accuracy but negative sentiments much less accurately. This is largely due to the unbalanced distribution of the target variable: 79% of the valid observations are positive and only 21% are negative, so the difference in predictive ability between the two classes is not surprising. Among the tweets predicted as positive, 10,041 are actually positive and 1,955 are actually negative. Among the tweets predicted as negative, 372 are actually negative and 276 are actually positive. The error rate for actual negative tweets is therefore 84% (1,955 of 2,327), far higher than the 16% that are correctly identified. This is risky, because predicting a tweet as positive when it is actually negative (a missed complaint) is more costly than predicting it as negative when it is actually positive. The model therefore still needs to be modified to achieve more accurate predictions.

Table 10: Error Matrix Result for Logistic Regression Model 1
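The per-class error rates quoted above can be recomputed from the four cells of the error matrix. The Python sketch below assumes that the “average class error” reported in Section 7.3.4 is the simple mean of the two per-class error rates; the function name is illustrative. With the Model 1 counts it gives about 84.0% error for actual negative tweets, about 2.7% for actual positive tweets, and an average of about 43%.

```python
def class_error_rates(tn, fp, fn, tp):
    """Per-class error rates for a binary sentiment model (1 = positive, 0 = negative).

    tn: actually negative, predicted negative
    fp: actually negative, predicted positive (the costly "missed complaint")
    fn: actually positive, predicted negative (a false alarm)
    tp: actually positive, predicted positive
    """
    negative_error = fp / (tn + fp)  # share of actual negatives that were missed
    positive_error = fn / (fn + tp)  # share of actual positives that were misclassified
    average_error = (negative_error + positive_error) / 2
    return negative_error, positive_error, average_error


# Counts from the Model 1 validation results described in the text.
neg_err, pos_err, avg_err = class_error_rates(tn=372, fp=1955, fn=276, tp=10041)
print(f"negative-class error {neg_err:.1%}, positive-class error {pos_err:.1%}, average {avg_err:.1%}")
```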

7.3.3. Adjustment of Model: Model on Sample Dataset

To address the uneven distribution of the target variable and to improve the model’s predictive ability, we created a balanced dataset with 50% positive and 50% negative sentiments by sampling, from all the positive tweets, a subset of the same size as the full set of negative tweets. The new sample dataset contains 63,968 observations, 31,984 for each of the positive and negative outcomes.
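The balancing step above amounts to random undersampling of the majority (positive) class. A minimal Python sketch, assuming the tweets are held in two in-memory lists and that positives are drawn without replacement; the function name, default seed, and toy data are hypothetical, since the report does not give them. Applied to the paper’s data, it would keep all 31,984 negative tweets and draw 31,984 positive ones.

```python
import random


def undersample_majority(positives, negatives, seed=0):
    """Balance a binary dataset by keeping every negative (minority) tweet
    and drawing an equally sized random subset of positive tweets
    without replacement."""
    rng = random.Random(seed)
    kept_positives = rng.sample(positives, k=len(negatives))
    return kept_positives + negatives


# Toy example (hypothetical data): 100 positive and 30 negative tweets.
positives = [("pos", i) for i in range(100)]
negatives = [("neg", i) for i in range(30)]
balanced = undersample_majority(positives, negatives)
print(len(balanced), sum(1 for label, _ in balanced if label == "pos"))
```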

The full coefficient results are included in the Appendix.

Table 11: Negative Coefficients of the Adjusted Model

The new model does not substantially change the correlations identified in the first model, except that the term “sync” overtakes “wireless” as the strongest correlate of negative sentiments, and the gender difference now makes male users more strongly correlated with negative sentiments. Another noticeable change involves the variable “Followers”, which has been transformed to its natural log. The results indicate a negative correlation between a user’s number of followers and sentiments, which alerts Fitbit to the importance of maintaining good customer relationships, especially with influential users.

7.3.4. Model Evaluation

Using the same error matrix method on the 30% validation data, we find that the adjusted model’s predictive ability is less impaired by the uneven distribution of the target variable.

Table 12: Error Matrix for Adjusted Model

The average class error rate drops substantially, from 43% to 28%, although the error rate for positive sentiments rises considerably compared with the model that used all the positive tweets. On the other hand, the error rate for negative sentiments falls to 17%, compared with 84% in the first model. The results also show that the number of tweets predicted as positive but actually negative, which is considered the most costly error in this case, is the lowest error count (814). This new model therefore has more desirable predictive capacity for future analysis.

7.4. Model 2: Random Forests

7.4.1. Input Variables Selection

The input variables for the Random Forest model are selected mostly on the basis of the logistic regression coefficients: variables with larger coefficients, and therefore greater impact, are included in the Random Forest model. After re-examining the error rates of different combinations of variables, the optimal input variables used to build the final model are listed below:

In this section, the base model uses 60% of the sentiment dataset for training, 30% for validation, and 10% for testing. The random seed is 800, the forest has 500 trees, and 5 variables are tried at each node. As Table 14 shows, without tuning the sample size the results are driven largely by the positive tweets, which make up 70% of the dataset. The error rate for positive tweets is only 1.57%, while negative tweets have an error rate of 70.91%. This is worrisome, since most negative tweets go undetected, which limits how much the model can help Fitbit improve its products.

Table 14: Confusion matrix of base model
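The 60/30/10 partition described above can be sketched as a reproducible shuffle followed by a cut. This minimal Python sketch assumes the tweets are already in a list of rows; the function name is hypothetical, and because the report does not name its software, reusing seed 800 here will not reproduce the authors’ exact split. For the forest itself, scikit-learn’s `RandomForestClassifier(n_estimators=500, max_features=5, random_state=800)` would express analogous settings (500 trees, 5 candidate variables per split).

```python
import random


def split_60_30_10(rows, seed=800):
    """Shuffle rows reproducibly, then cut them into 60% training,
    30% validation, and 10% test partitions."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 60 // 100  # integer arithmetic avoids float rounding surprises
    n_valid = n * 30 // 100
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # the remainder (about 10%) is held out for testing
    return train, valid, test


train, valid, test = split_60_30_10(range(1000))
print(len(train), len(valid), len(test))
```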

Because accurately identifying negative tweets is more valuable to Fitbit, this paper tunes the sample size and the number of trees to achieve higher accuracy, especially for negative tweets. By oversampling the negative tweets in different proportions, we can capture how the model’s error rates change. As Table 15 and Figure 4 show, as more negative tweets are oversampled, the true negative accuracy rate increases substantially, from 73.10% with sample size (40,45) to 87.24% with sample size (55,45). However, this increase comes at the expense of a lower true positive accuracy rate.
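Per-class sampling of this kind can be sketched as drawing a fixed number of tweets from each class for every tree. The report does not say how a pair such as (40,45) is turned into draws, or which software was used, so this Python sketch simply takes per-class sample sizes as inputs; the function name and the toy data are hypothetical. It illustrates the idea behind stratified per-tree sampling, such as the `strata` and `sampsize` arguments of R’s randomForest package, without claiming to reproduce the authors’ setup.

```python
import random


def stratified_bootstrap(neg_rows, pos_rows, n_neg, n_pos, rng):
    """Draw one per-tree training sample with a fixed number of rows per class.

    Rows are drawn with replacement (a bootstrap), so n_neg may exceed
    len(neg_rows); raising n_neg relative to n_pos oversamples the
    negative (minority) class.
    """
    sample = rng.choices(neg_rows, k=n_neg) + rng.choices(pos_rows, k=n_pos)
    rng.shuffle(sample)  # interleave the two classes
    return sample


# Toy data (hypothetical): 30 negative and 70 positive tweets.
negatives = [("neg", i) for i in range(30)]
positives = [("pos", i) for i in range(70)]

rng = random.Random(800)
tree_sample = stratified_bootstrap(negatives, positives, n_neg=55, n_pos=45, rng=rng)
n_negative = sum(1 for label, _ in tree_sample if label == "neg")
print(len(tree_sample), n_negative)
```

Raising `n_neg` while holding `n_pos` fixed shifts each tree toward the negative class, which is the trade-off described in the text: higher true negative accuracy at the cost of lower true positive accuracy.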

Figure 4 shows that the average of the true positive and true negative accuracy rates is very stable, ranging only from 68.84% to 70.13%, because gains in true negative accuracy are offset by losses in true positive accuracy. Since the average accuracy rate is almost constant, Fitbit could choose among the sample-size settings according to its needs at the time. The (55,45) setting does an impressive job of classifying negative tweets, so Fitbit could continuously learn which product features are criticized most on Twitter and revise them accordingly. Nevertheless, this sample size also has shortcomings: because of its low true positive accuracy rate, the company may spend considerable human and capital resources dealing with false alarms. In the long term, weighing the advantages and disadvantages, oversampling the negative tweets still has the edge, since missing real-time feedback, especially critical feedback, would put Fitbit in a more dangerous position given the heated competition in the health and fitness band market.

Table 15: Confusion Matrix for Different Sample Sizes

Figure 4: Accuracy Rates trend for different sample sizes

Apart from the sampling ratio in the training dataset, the number of trees in the model plays an equally important part in improving the accuracy rate. This section uses the sample size (40,45): because its true positive and true negative accuracy rates are very close, changes in both rates are easier to detect as the number of trees varies.

As Table 16 and Figure 5 show, adding more trees does not necessarily improve accuracy. Accuracy peaks at a certain number of trees and declines beyond it; in this case, the average accuracy rate declines after reaching 70.21% at 600 trees. With 400 trees the model achieves the highest true positive rate (68.65%) but does a poor job of detecting negative tweets, with the lowest true negative rate (71.28%), so 400 trees would not be chosen. Notably, beyond 400 trees the accuracy rate for negative tweets increases as more trees are added, while the accuracy rate for positive tweets decreases. This suggests that, in this model, adding trees helps detect negative tweets more than positive ones.

Table 16: Confusion Matrix for various numbers of trees

Figure 5: Accuracy Rates Results for different number of trees

8. Discussion

● Failure to generate lively discussion of ChargeHR

ChargeHR was announced in October 2014 and officially released in January 2015. However, from December to March there was not the lively discussion one might expect, such as tweets introducing the new functions or describing user experiences. The level of discussion fell far short of that surrounding the Apple Watch release. Most of the daily tweets about ChargeHR were automated, machine-generated tweets, possibly posted by Fitbit or its partners. Therefore, even though Fitbit has enjoyed tremendous year-over-year revenue growth, it lacks public popularity, so there was little discussion on social media even about a new product. Several approaches could improve public popularity: interacting more with users on Twitter, introducing revolutionary functions, and cooperating with popular Twitter users to increase exposure.

● Disappointing Functions:

Most of the negative tweets concern failures to sync Fitbit with the phone and failures to charge. These two functions urgently need to be fixed, since they have strong negative effects on overall sentiment.

● Demographic Marketing

This paper included gender, New York City, and California as demographic variables in the models. California males tend to post more negative tweets than California females. One possible explanation is that, compared with people in other areas, California males are more sporty and know more about sports bands, so they are more critical. In New York, females are more likely than males to be negative about Fitbit. These findings suggest two directions. On the one hand, Fitbit could extend the analysis to more cities and states to study its popularity in different regions. On the other hand, it would be useful to learn which features males in certain regions care about more, and which functions draw more attention from females. Fitbit could then produce customized products to meet these markets.

9. Limitations

This paper assigns equal weight to all positive and negative words in the tweets and sums their values to obtain the sentiment scores. This method has several limitations. First, some words express stronger emotions than others; for example, “wtf” signals a higher level of anger than “doubtful”. Assigning scores to words according to their strength would make the sentiment analysis more accurate.

Second, this paper initially intended to compare users’ tweets about Fitbit’s competitors, such as Apple Watch and Jawbone. However, it is hard to determine which product the emotional words in a tweet actually refer to.
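The first limitation can be made concrete by contrasting equal-weight scoring with a weighted alternative. This is a minimal Python illustration under stated assumptions: the word lists and weights are hypothetical stand-ins (the paper’s Fitbit dictionary is not reproduced here), each positive word is assumed to count +1 and each negative word -1, and a score of zero is treated as neutral and dropped, consistent with the paper’s binary target.

```python
# Hypothetical lexicon entries for illustration only; not the paper's Fitbit dictionary.
POSITIVE_WORDS = {"great", "love", "motivated"}
NEGATIVE_WORDS = {"wtf", "doubtful", "broken"}

# Hypothetical intensity weights for the weighted variant (stronger words get larger magnitudes).
WORD_WEIGHTS = {
    "great": 1.0, "love": 2.0, "motivated": 1.0,
    "wtf": -3.0, "doubtful": -1.0, "broken": -2.0,
}


def equal_weight_score(tokens):
    """Each positive word adds 1 and each negative word subtracts 1."""
    return sum((t in POSITIVE_WORDS) - (t in NEGATIVE_WORDS) for t in tokens)


def weighted_score(tokens):
    """Sum per-word intensity weights; words not in the lexicon contribute 0."""
    return sum(WORD_WEIGHTS.get(t, 0.0) for t in tokens)


def to_label(score):
    """Map a score to 1 (positive) or 0 (negative); None marks a neutral tweet to drop."""
    if score > 0:
        return 1
    if score < 0:
        return 0
    return None


tokens = "great band but wtf it will not sync".split()
print(equal_weight_score(tokens), to_label(equal_weight_score(tokens)))
print(weighted_score(tokens), to_label(weighted_score(tokens)))
```

In this example the equal-weight score cancels to zero, so the tweet would be discarded as neutral, while the weighted score keeps it as a negative tweet.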

10. Conclusion

This paper presents a detailed quantitative and qualitative analysis of public perception of Fitbit on Twitter. Generally speaking, Fitbit enjoys a good reputation on social media, since positive tweets make up 79% of the total. Nevertheless, the negative tweets deserve more attention, because they point to possible directions for improving product features and marketing strategies. Based on term frequencies from 12/01/2014 to 03/31/2015, the motivation of getting fit and losing weight generated the most discussion on Twitter, as “getfit”, “fitness”, and “workout” are among the top 30 most frequent terms. The models built in this paper show that exercise tracking and getting fit have a more positive effect on comments about Fitbit, while syncing and charging lead to more negative comments; these are the functions Fitbit needs to improve.

Bibliography

Kouloumpis, Efthymios, Theresa Wilson, and Johanna Moore. "Twitter Sentiment Analysis: The Good the Bad and the OMG!" Fifth International AAAI Conference on Weblogs and Social Media. Web. 18 Mar. 2016.

Hu, Minqing, and Bing Liu. "Mining Opinion Features in Customer Reviews." Web. 18 Mar. 2016.

Barbosa, Luciano, and Junlan Feng. "Robust Sentiment Detection on Twitter from Biased and ..." AT&T Labs - Research, 2015. Web. 18 Mar. 2016.

Davidov, Dmitry, Oren Tsur, and Ari Rappoport. "Enhanced Sentiment Learning Using Twitter Hashtags and Smileys." The Hebrew University, 2010. Web. 18 Mar. 2016.

O'Connor, Brendan, Ramnath Balasubramanyan, Bryan Routledge, and Noah Smith. "From Tweets to Polls: Linking Text Sentiment to Public ..." International AAAI Conference on Weblogs and Social Media, May 2010. Web. 18 Mar. 2016.

Jansen, Bernard J., Mimi Zhang, Kate Sobel, and Abdur Chowdury. "Twitter Power: Tweets as Electronic Word of Mouth." Journal of the American Society for Information Science and Technology 60.11 (2009): 2169-2188. Web. 17 Mar. 2016.

Bifet, Albert, and Eibe Frank. "Sentiment Knowledge Discovery in Twitter Streaming Data." Web. 18 Mar. 2016.

Alahmadi, Dimah Hussain, and Xiao-Jun Zeng. "Twitter-Based Recommender System to Address Cold-Start: A Genetic Algorithm Based Trust Modelling and Probabilistic Sentiment Analysis." 2015 IEEE 27th International Conference on Tools with Artificial Intelligence (ICTAI) (2015). Web. 17 Mar. 2016.

Gao, Huiji, Jalal Mahmud, Jilin Chen, Jeffrey Nichols, and Michelle Zhou. "Modeling User Attitude toward Controversial Topics in ..." Association for the Advancement of Artificial Intelligence, 2012. Web. 18 Mar. 2016.

Yu, Yang, and Xiao Wang. "World Cup 2014 in the Twitter World: A Big Data Analysis of Sentiments in U.S. Sports Fans’ Tweets." Computers in Human Behavior 48 (2015): 392-400. Web. 17 Mar. 2016.

Hridoy, Syed, M. Ekram, Mohammad Islam, Faysal Ahmed, and Rashedur Rahman. "Localized Twitter Opinion Mining Using Sentiment Analysis." Decision Analytics, 2015. Web. 18 Mar. 2016.

Dacres, Shana, Hamed Haddadi, and Matthew Purver. "Topic and Sentiment Analysis on OSNs: A Case Study of Advertising Strategies on Twitter." May 2014. Web. 18 Mar. 2016.

Godea, Andreea Kamiana, Cornelia Caragea, Florin Adrian Bulgarov, and Suhasini Ramisetty-Mikler. "An Analysis of Twitter Data on E-cigarette Sentiments and Promotion." Artificial Intelligence in Medicine, Lecture Notes in Computer Science (2015): 205-215. Web. 18 Mar. 2016.

Appendix:

Coefficients of Logistic Regression Model 1 (Unadjusted)

Variable Estimated Coefficient

wireless(0,1] -0.911

sync(0,1] -0.885

New York(0,1] -0.677

fitbitsupport(0,1] -0.619

sleep(0,1] -0.496

charge(0,1] -0.436

Christmas(0,1] -0.295

GenderM -0.178

Posts -0.107

Gender CaliforniaF [0,0] -0.044

California(0,1] -0.012

Gender New YorkF [0,0] 0.003

Followers 0.049

Following 0.117

getfit(0,1] 0.118

badge(0,1] 0.129

calor(0,1] 0.151

surge(0,1] 0.387

diet(0,1] 0.397

birthday(0,1] 0.485

step(0,1] 0.553

compet(0,1] 0.561

Chargehr(0,1] 0.683

monitor(0,1] 0.806

fitstat(0,1] 1.101

health(0,1] 1.129

valentine(0,1] 1.254

gift(0,1] 1.360

flex(0,1] 1.365

weightloss(0,1] 1.407

motivat(0,1] 2.163

exercis(0,1] 2.287

weight(0,1] 2.356

scale(0,1] 2.370

workout(0,1] 2.492

dodger(0,1] 7.585

Sample tweets with negative sentiments about “sync”

@fitbit very disappointed wont sync with iphone and website down? Very bad. #Amazon

return!

#Fitbit is not syncing on computer, iPad, or iPhone. I am filled with rage. #rage #fitness

@fitbitsupport Whats wrong with storing data locally or an established cloud account?

Sync only to #FitBit is not bulletproof design C-today

Sample tweets with negative sentiments about “charge”

RT @shanselman I'm so sick of FitBit. Wife and my Flex will no longer charge. Lasted 12

months 3 weeks, just after the warranty. Built in batteries suck.

Annoyed with my fitbit charge. Kept waking me up last night due to battery dying. Doesn't help that there's no indication on the screen.

@FitbitSupport Sent email Sun re my fitbit won't charge after cleaning etc. This is 2nd

failed unit in <1yr. How about an upgrade? Miss it!!

Sample tweets with negative sentiments about “Christmas”

@fitbit maintenance on Christmas Day?? Bad move! Worst first impression for the

people I purchased fitbits for. Thanks for messing up!

Considering how hard @Fitbit was pushing their products as Christmas gifts, they picked a

really bad day for server maintenance.

RT @Dave_Saba Seriously @fitbit - site down on Christmas morning for "maintenance" --

worst customer experience to start out ownership. Ridiculous

Coefficients of the Adjusted Model Using Sample Data

Variables Estimated Coefficient

sync(0,1] -0.854

Gender_ New_YorkM_(0,1] -0.803

fitbitsupport(0,1] -0.591

wireless(0,1] -0.532

sleep(0,1] -0.508

charge(0,1] -0.271

Christmas(0,1] -0.143

Gender_ CaliforniaF_[0,0] -0.088

getfit(0,1] -0.013

Posts 0.000

Followers 0.000

Following 0.000

calor(0,1] 0.332

badge(0,1] 0.335

diet(0,1] 0.339

surge(0,1] 0.368

monitor(0,1] 0.513

Gender_ CaliforniaM_(0,1] 0.577

step(0,1] 0.583

Gender_ CaliforniaM_[0,0] 0.600

Chargehr(0,1] 0.649

compet(0,1] 0.844

Gender_ New_YorkF_[0,0] 0.868

fitstat(0,1] 0.949

flex(0,1] 1.101

health(0,1] 1.126

valentine(0,1] 1.280

weightloss(0,1] 1.314

gift(0,1] 1.434

weight(0,1] 2.195

scale(0,1] 2.202

motivat(0,1] 2.343

workout(0,1] 2.373

exercis(0,1] 2.401

dodger(0,1] 8.132

Sample of Domain-dependent Dictionary for Fitbit⁶

6 The dictionary is too large to list in full in this paper; please contact the authors for further information.
