Discovering Trending Topics in News


DISCOVERING TRENDING TOPICS IN NEWS WITH MACHINE LEARNING AND R

Kory Becker – December 2014

DATA SCIENCE

Data Science vs Machine Learning

Data Science: Generalizable extraction of knowledge from data

Predictive models and patterns

Machine Learning: Algorithms for modeling data

SVMs, neural networks, clustering

Artificial Intelligence: Creating intelligent machines, human-like or not

Data science, machine learning, and a lot more

Unsupervised Learning

Exploratory data analysis

Discovers patterns in unlabeled data

No training set

No error rate to evaluate a potential solution

Clustering, Markov chains, feature extraction, dimensionality reduction (principal component analysis, etc.)

K-Means

K-Means Example

http://www.naftaliharris.com/blog/visualizing-k-means-clustering/
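The linked visualization animates K-means. As a rough sketch of what it shows, here is Lloyd's algorithm in plain Python: pick k starting centroids, assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until assignments stop changing. The function names and the sample points are hypothetical, chosen only for illustration; the talk itself works in R.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps.

    Returns (assignments, centroids), where assignments[i] is the
    cluster index of points[i].
    """
    rng = random.Random(seed)
    # Initialise centroids by picking k distinct points at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = mean(members)
    return assignments, centroids


# Two well-separated, made-up groups of 2-D points.
points = [[1.0, 1.0], [1.5, 2.0], [1.2, 0.8], [8.0, 8.0], [8.5, 9.0], [9.0, 8.2]]
assignments, centroids = kmeans(points, k=2)
print(assignments)  # the first three points share one label, the last three the other
```

The cluster numbers themselves are arbitrary; only the grouping matters. Results can depend on the starting centroids, which is why real implementations often run several random restarts.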

What about Text?

Natural language processing

Term document matrix

Reduces text to an array of 1s and 0s, marking whether each term appears in each document

Remove sparse terms (words that appear in only a few documents)

Reduced dimensionality => compressed data => speed!
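A minimal Python sketch of the two ideas above, assuming naive lowercase/whitespace tokenization: build a binary matrix (rows are documents, columns are terms), then drop any term found in fewer than `min_docs` documents. The helper names, the `min_docs` parameter, and the headlines are hypothetical. In R, the tm package's `removeSparseTerms()` plays a similar role, though it takes a sparsity fraction rather than a document count.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def binary_term_document_matrix(docs):
    """Return (terms, matrix) where matrix[d][t] is 1 if terms[t] occurs in docs[d]."""
    token_sets = [set(tokenize(doc)) for doc in docs]
    terms = sorted(set().union(*token_sets))
    matrix = [[1 if term in tokens else 0 for term in terms] for tokens in token_sets]
    return terms, matrix


def remove_sparse_terms(terms, matrix, min_docs=2):
    """Drop columns (terms) that appear in fewer than min_docs documents."""
    keep = [
        t for t in range(len(terms))
        if sum(row[t] for row in matrix) >= min_docs
    ]
    kept_terms = [terms[t] for t in keep]
    kept_matrix = [[row[t] for t in keep] for row in matrix]
    return kept_terms, kept_matrix


headlines = [
    "Obama meets Congress on budget",
    "Congress passes budget deal",
    "Storm hits coast",
]
terms, matrix = binary_term_document_matrix(headlines)
terms, matrix = remove_sparse_terms(terms, matrix, min_docs=2)
print(terms)   # ['budget', 'congress']
print(matrix)  # [[1, 1], [1, 1], [0, 0]]
```

Ten distinct words shrink to two columns, which is the "reduced dimensionality => speed" point in miniature.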

Unigrams vs Bigrams

Unigrams

George

Bush

Clooney

Bigrams

George Bush

George Clooney

N-grams?
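To make the unigram/bigram contrast concrete, here is a small Python sketch that extracts word n-grams from two hypothetical headlines. As unigrams, the two headlines overlap on "george"; as bigrams, "george bush" and "george clooney" are distinct features. The `ngrams` helper is illustrative, not from the talk.

```python
def ngrams(text, n):
    """Return the word n-grams of text as space-joined, lowercased strings."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


headlines = ["George Bush speaks", "George Clooney speaks"]

shared_unigrams = set(ngrams(headlines[0], 1)) & set(ngrams(headlines[1], 1))
shared_bigrams = set(ngrams(headlines[0], 2)) & set(ngrams(headlines[1], 2))

print(shared_unigrams)  # {'george', 'speaks'} (set order may vary)
print(shared_bigrams)   # set()
```

Larger n gives more specific features but also sparser ones, so the sparse-term filter removes more of them.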

Machine Learning + AP + ??? = Profit!

Read the AP Video Hub MongoDB database

Build corpus from headlines

Use bigrams (word pairs)

Strip sparse terms

Apply K-means clustering
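The steps above can be strung together roughly as follows. This is a Python sketch, not the talk's R code: the MongoDB read is replaced by a hardcoded list of made-up headlines, the helper names are hypothetical, and K-means is seeded with the first k distinct rows purely so the result is reproducible (random restarts are more typical).

```python
from collections import Counter


def bigrams(text):
    """Set of lowercased word pairs in a headline."""
    words = text.lower().split()
    return {" ".join(pair) for pair in zip(words, words[1:])}


def build_matrix(headlines, min_docs=2):
    """Binary document-term matrix over bigrams, keeping only bigrams found
    in at least min_docs headlines (i.e. stripping sparse terms)."""
    doc_bigrams = [bigrams(h) for h in headlines]
    counts = Counter(b for bs in doc_bigrams for b in bs)
    terms = sorted(b for b, c in counts.items() if c >= min_docs)
    matrix = [[1 if t in bs else 0 for t in terms] for bs in doc_bigrams]
    return terms, matrix


def kmeans(rows, k, iterations=50):
    """Lloyd's k-means, seeded with the first k distinct rows."""
    centroids = []
    for row in rows:
        if row not in centroids:
            centroids.append(list(row))
        if len(centroids) == k:
            break
    labels = None
    for _ in range(iterations):
        # Assign each row to its nearest centroid (squared Euclidean distance).
        new_labels = [
            min(range(len(centroids)),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(row, centroids[c])))
            for row in rows
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Move each centroid to the mean of its assigned rows.
        for c in range(len(centroids)):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical stand-in for headlines read from the database.
headlines = [
    "Ebola outbreak spreads in West Africa",
    "Ebola outbreak prompts airport screening",
    "New Ebola outbreak fears grow",
    "Midterm elections results are in",
    "Midterm elections turnout low",
    "Midterm elections shift Senate control",
]
terms, matrix = build_matrix(headlines)
labels = kmeans(matrix, k=2)

clusters = {}
for headline, label in zip(headlines, labels):
    clusters.setdefault(label, []).append(headline)
for label, members in sorted(clusters.items()):
    print(label, members)
```

Only "ebola outbreak" and "midterm elections" survive the sparse-term filter here, so each cluster is effectively labeled by the bigram its headlines share.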

... and what do we get?

Visualizing News Clusters

October 6, 2014

Visualizing News Clusters

November 5, 2014

Visualizing News Clusters

December 1, 2014

Conclusion

Data Mining

Machine Learning

Exploratory Data Analysis

Unsupervised Learning

Beyond human intelligence?

Discovering Trending Topics in News: http://primaryobjects.com/CMS/Article162

Mirroring Your Twitter Personal with Intelligence: http://primaryobjects.com/CMS/Article160

TF*IDF with .NET: http://primaryobjects.com/CMS/Article157

Time for Some Hacking in R

Now, the fun begins!
