Data Driven: Home Care Reviews In-Depth Analysis

ANALYSIS OF BUSINESSES

Being in the home care assistance business, we decided to analyze millions of reviews from the web.

Note that the data contains only businesses with at least one review. Yelp lists businesses near a given location, so searches near cities close to the border can return businesses outside the US (e.g. in Canada). Furthermore, a business near two different cities may be listed twice, so we also have to remove duplicates.

From the National Study of Long-Term Care Providers (PDF):

“12,400 home health agencies [in the United States]

At least 60% of home health agencies, hospices, nursing homes, and residential care communities were for profit, while about 40% of adult day services centers were for profit.”
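The clean-up mentioned earlier (dropping listings outside the US and listings that appear twice) could look roughly like the sketch below. The field names `business_id`, `city` and `country` are assumptions made for illustration; the original data layout is not shown in this analysis.

```python
def clean_businesses(records):
    """Keep US businesses only and drop repeated listings.

    Each record is a dict with hypothetical keys 'business_id' and 'country'.
    The first occurrence of every business is kept.
    """
    seen_ids = set()
    cleaned = []
    for record in records:
        if record["country"] != "US":
            continue  # search near a border city returned a foreign business
        if record["business_id"] in seen_ids:
            continue  # same business already found near another city
        seen_ids.add(record["business_id"])
        cleaned.append(record)
    return cleaned


# Toy records, invented for this example only.
scraped = [
    {"business_id": "a1", "city": "Seattle", "country": "US"},
    {"business_id": "b2", "city": "Vancouver", "country": "CA"},
    {"business_id": "a1", "city": "Tacoma", "country": "US"},
]
unique_us = clean_businesses(scraped)
```

Deduplicating on an identifier rather than on the business name avoids merging unrelated businesses that happen to share a name.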

That means roughly 0.6 × 12,400 = 7,440 home care assistance providers in the United States operate for profit. By drawing many random samples of Yelp pages (a Monte Carlo simulation), we can estimate the proportion of home health businesses with reviews: approximately 1 in 4 businesses has at least one review on Yelp. Applied to our figure, this gives approximately 0.25 × 7,440 = 1,860 home care assistance businesses with reviews on Yelp, which roughly matches the 1,922 businesses in our data. On average, a business has 8,997 / 1,922 ≈ 4.7 reviews.
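For readers who want to retrace the arithmetic, here is a small sketch. The constants come from the figures quoted in this analysis. The `estimate_share` helper and its toy population are assumptions: they only illustrate the idea of averaging many random samples and are not the sampling procedure actually used.

```python
import random

HOME_HEALTH_AGENCIES = 12_400
FOR_PROFIT_PERCENT = 60      # "at least 60%" in the quoted study
REVIEWED_PERCENT = 25        # roughly 1 in 4 businesses has a review
OBSERVED_BUSINESSES = 1922
OBSERVED_REVIEWS = 8997

# Integer arithmetic keeps the results exact.
for_profit = HOME_HEALTH_AGENCIES * FOR_PROFIT_PERCENT // 100
expected_reviewed = for_profit * REVIEWED_PERCENT // 100
reviews_per_business = OBSERVED_REVIEWS / OBSERVED_BUSINESSES


def estimate_share(has_review_flags, sample_size, draws, seed=0):
    """Average the share of reviewed businesses over many random samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(draws):
        sample = rng.sample(has_review_flags, sample_size)
        estimates.append(sum(sample) / sample_size)
    return sum(estimates) / draws


# Toy population (invented): 250 reviewed businesses out of 1000.
toy_population = [True] * 250 + [False] * 750
toy_estimate = estimate_share(toy_population, sample_size=100, draws=200)
```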

HOME CARE REVIEWS: LOOKING DEEPER

Let's check the distribution of businesses across US states. For readability's sake, only the first 30 states are shown in the chart.

California leads with a disproportionate number of home care assistance businesses: almost 900! On the other hand, there are states, like Vermont, with only 11 businesses. This last fact can be checked quite easily: for example, some online directories list 11 businesses in Vermont, but only one has reviews (just one review, actually). It’s interesting to visualize the density of home care assistance businesses on a map. We first compute the number of businesses per city, then pair each city with its coordinates. Here’s a home health map across the nation, showing the density of home health care companies by city/region.
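The two steps just described (count businesses per city, then attach coordinates) can be sketched as follows. The `city` key, the `city_coords` lookup and the sample values are illustrative assumptions, and the coordinates are approximate.

```python
from collections import Counter


def businesses_per_city(businesses, city_coords):
    """Return (city, count, (lat, lon)) tuples, most businesses first.

    Cities missing from the coordinate lookup are skipped, since they
    cannot be placed on the map.
    """
    counts = Counter(b["city"] for b in businesses)
    return [
        (city, n, city_coords[city])
        for city, n in counts.most_common()
        if city in city_coords
    ]


# Toy input, invented for this example.
toy_businesses = [
    {"city": "San Francisco"},
    {"city": "San Francisco"},
    {"city": "Austin"},
    {"city": "Nowhere"},
]
toy_coords = {
    "San Francisco": (37.77, -122.42),
    "Austin": (30.27, -97.74),
}
map_points = businesses_per_city(toy_businesses, toy_coords)
```

Each resulting tuple holds what a density map needs: a position and a weight.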

As expected, San Francisco, Los Angeles and San Diego (all in California) have the highest concentration of home care businesses. New York, Miami, Chicago and Austin follow closely. From the map, it's also clear that the West Coast has a very high concentration in a few spots, while on the other side of the country the businesses are more evenly distributed. This trend follows the population density:

Let's confirm our hypothesis with raw numbers:

It’s not surprising that California reaches almost 900 businesses, given that Californian cities make up half of the top 15 and take the first three spots. We can also confirm what we anticipated before: with the exception of California, most of the top cities offering in-home care are in the Midwest, the South or the Northeast. Let’s examine the distribution of ratings amongst them. Rather than using the business ratings found directly online, we calculated ratings using a custom formula that takes into consideration only the recommended reviews.
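The custom formula itself is not spelled out here. As a minimal sketch, and purely as an assumption, one simple possibility is the mean star value over the reviews flagged as recommended. The keys `stars` and `recommended` are hypothetical.

```python
def recommended_rating(reviews):
    """Mean star value over recommended reviews only.

    NOTE: this is an assumed stand-in, not the formula used in the analysis.
    Returns None when a business has no recommended reviews.
    """
    stars = [r["stars"] for r in reviews if r["recommended"]]
    if not stars:
        return None
    return sum(stars) / len(stars)


# Toy reviews, invented for this example.
toy_reviews = [
    {"stars": 5, "recommended": True},
    {"stars": 1, "recommended": False},
    {"stars": 4, "recommended": True},
]
toy_rating = recommended_rating(toy_reviews)
```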

The distribution of the ratings is, very interestingly, U-shaped. This property of star ratings is well known and has been studied extensively (here is one short paper on the topic). It is probably one reason why YouTube switched from its five-star system to a likes/dislikes system: the most popular star rating on YouTube was 5, and people also gave 1-star ratings, but very few gave 2-4 star ratings (source). As to why this happens, some possible explanations are:

Vote Stuffing – Some people are more interested in changing the average rating for an item than in giving their true opinion on it. (IMDb tries to address this problem, as explained here.)

Selection Bias – People who vote 2-4 stars don't have a strong enough desire to represent themselves in the dataset.

Controversiality – Some things are just naturally controversial. Check the reviews of some pens made for women on Amazon for a hilarious example.
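To make "U-shaped" concrete, here is a small sketch that tallies ratings per star and checks whether both extremes outnumber every middle value. It assumes ratings rounded to whole stars from 1 to 5, and the sample ratings are invented for illustration.

```python
from collections import Counter


def rating_counts(ratings):
    """Return the number of ratings for 1, 2, 3, 4 and 5 stars, in order."""
    counts = Counter(ratings)
    return [counts.get(stars, 0) for stars in range(1, 6)]


def is_u_shaped(counts):
    """True when both ends are higher than every value in between."""
    return min(counts[0], counts[-1]) > max(counts[1:-1])


toy_ratings = [5, 5, 5, 1, 1, 3, 4, 5, 1, 2]
toy_counts = rating_counts(toy_ratings)
toy_is_u = is_u_shaped(toy_counts)
```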

REVIEWS OF HOME CARE COMPANIES IN THE USA

We'll now analyze the reviews. We will start with a very general analysis, then do a frequency analysis of the words in the reviews, and finally perform sentiment analysis on them. Let's plot the distribution of review length in words:

Amazingly enough, there are reviews that surpass 1000 words. The reviewers clearly went in-depth on those! Review length peaks at around 100 words, which is roughly a paragraph. This page lists a number of examples that show what different word counts look like.
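A length distribution like the one discussed can be computed by counting whitespace-separated words and grouping the counts into bins. The bin width of 50 words is an arbitrary assumption, and the sample reviews are placeholders.

```python
from collections import Counter


def length_histogram(reviews, bin_width=50):
    """Map each bin start (0, 50, 100, ...) to the number of reviews in it."""
    bins = Counter(
        (len(review.split()) // bin_width) * bin_width for review in reviews
    )
    return dict(sorted(bins.items()))


# Placeholder reviews of 10, 60, 120 and 1 words.
toy_reviews = ["good " * 10, "bad " * 60, "ok " * 120, "fine"]
toy_histogram = length_histogram(toy_reviews)
```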

FREQUENCY ANALYSIS OF WORDS USED IN HOME HEALTH CARE REVIEWS

We want to extract the unique words. To do so, we first join all the reviews. The first thing we’ll do is create a word cloud, where each word's size is proportional to its frequency in the reviews.

We can clearly see that 'care' is overwhelmingly present in the reviews. Other frequent words are 'time', 'home', 'one', 'service', 'need' and 'mother'. Some of those words aren't really meaningful, while others, like 'mother' and 'time', can offer real insight into what the reviewers were thinking. As we did before, let's check the actual frequencies of the 15 most common words:
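The frequency count can be sketched with the standard library alone: join the reviews, split them into lowercase words, drop a few filler words and keep the most common ones. The tiny `STOPWORDS` set and the regular expression are assumptions; a real analysis would likely use a fuller stop-word list.

```python
import re
from collections import Counter

# Deliberately small, illustrative stop-word list.
STOPWORDS = {"the", "a", "and", "was", "for", "my", "at", "to", "of", "is"}


def top_words(reviews, n=15):
    """Return the n most frequent non-stop-words as (word, count) pairs."""
    text = " ".join(reviews).lower()
    words = re.findall(r"[a-z']+", text)
    return Counter(w for w in words if w not in STOPWORDS).most_common(n)


# Placeholder reviews, invented for this example.
toy_reviews = [
    "The care was great",
    "Great care for my mother",
    "Care at home",
]
toy_top = top_words(toy_reviews, n=2)
```

The same counts can feed a word cloud, since each word's size only depends on its frequency.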

SENTIMENT ANALYSIS OF HOME CARE REVIEWS

To perform sentiment analysis we'll use the textblob package with NLTK's NaiveBayesClassifier, which is trained on a movie reviews corpus and is therefore quite suitable for analyzing our reviews. Let's plot the distribution of positive vs negative reviews.
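A sketch of how such a classification could be wired up is shown here. `textblob_classifier` uses textblob's `NaiveBayesAnalyzer`, which needs the textblob package and its NLTK corpora installed. The helper names, the counting function and the keyword stub are our own assumptions, added so the example runs without those downloads.

```python
def textblob_classifier():
    """Build a text -> 'pos' / 'neg' function backed by textblob.

    Imports are done lazily so the rest of this sketch works even when
    textblob is not installed.
    """
    from textblob import TextBlob
    from textblob.sentiments import NaiveBayesAnalyzer

    analyzer = NaiveBayesAnalyzer()  # trained on NLTK's movie reviews corpus

    def classify(text):
        return TextBlob(text, analyzer=analyzer).sentiment.classification

    return classify


def count_sentiments(reviews, classify):
    """Count how many reviews the given classifier labels 'pos' and 'neg'."""
    counts = {"pos": 0, "neg": 0}
    for text in reviews:
        counts[classify(text)] += 1
    return counts


def keyword_stub(text):
    """Stand-in classifier for demonstration only, not a real model."""
    return "neg" if "bad" in text.lower() else "pos"


toy_counts = count_sentiments(
    ["Bad service", "Great care", "Lovely staff"], keyword_stub
)
```

With textblob installed, `count_sentiments(reviews, textblob_classifier())` would use the Naive Bayes model instead of the stub.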

We can see that the majority of reviews carry a positive sentiment: approximately 1 review in 8 is negative. Could it be that users review businesses more willingly when their expectations are met with great service? It's also interesting to plot the proportion of negative to positive reviews for each star rating.

From the above chart we see that, on average, a 1-star review is much more negative than a 5-star one. Note that our classifier still estimated that some 1-star reviews were positive. This could be because a review considers many different aspects: perhaps only one feature is criticized, while the others are considered good enough. However, if that single feature is very important to the reviewer, they will rate the service 1 star even though the language used is more or less positive.
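The per-star breakdown can be sketched by grouping classifier labels by star rating and computing the negative share in each group. The `(stars, label)` input format and the sample pairs are assumptions made for illustration.

```python
from collections import defaultdict


def negative_share_by_stars(labelled_reviews):
    """Map each star rating to the fraction of its reviews labelled 'neg'.

    labelled_reviews is an iterable of (stars, label) pairs, where label
    is 'pos' or 'neg' as produced by a sentiment classifier.
    """
    totals = defaultdict(int)
    negatives = defaultdict(int)
    for stars, label in labelled_reviews:
        totals[stars] += 1
        if label == "neg":
            negatives[stars] += 1
    return {stars: negatives[stars] / totals[stars] for stars in sorted(totals)}


# Invented pairs: note the 1-star review that was still labelled positive.
toy_labelled = [
    (1, "neg"), (1, "neg"), (1, "pos"),
    (5, "pos"), (5, "pos"), (5, "neg"), (5, "pos"),
]
toy_shares = negative_share_by_stars(toy_labelled)
```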