DISCOVERING TRENDING TOPICS IN NEWS WITH MACHINE LEARNING AND R
Kory Becker – December 2014
DATA SCIENCE
Data Science vs Machine Learning
Data Science: Generalizable extraction of knowledge from data
Predictive models and patterns
Machine Learning: Algorithms for modeling data
SVM, neural networks, clustering
Artificial Intelligence: Creating intelligent machines, human-like or not
Data science, machine learning, and a lot more
Unsupervised Learning
Exploratory data analysis
Discovers patterns in unlabeled data
No training set
No error rate for potential solution
Clustering, Markov chains, feature extraction, dimensionality reduction (principal component analysis, etc.)
K-Means
K-Means Example
http://www.naftaliharris.com/blog/visualizing-k-means-clustering/
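A minimal K-Means sketch in plain Python, to make the assign-then-update loop concrete. The talk works in R; this is only an illustration. The sample points and the farthest-first seeding are assumptions chosen to keep the run deterministic, not something taken from the slides.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=100):
    """Return (centroids, labels) for a list of numeric tuples."""
    # Seeding (assumption): start at the first point, then repeatedly add
    # the point farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # no point changed cluster, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(d) / len(members) for d in zip(*members))
    return centroids, labels

# Hypothetical 2-D points forming two well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

No training labels are involved: the grouping comes only from distances between points.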
What about Text?
Natural language processing
Term document matrix
Reduces each document to an array of 1s and 0s, one entry per term (1 = term present)
Remove sparse terms (words existing in just a few documents)
Reduced dimensionality => compressed data => speed!
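A small sketch of the two steps above: build a binary matrix, then drop sparse terms. The three documents, the helper names, and the `min_docs` threshold are hypothetical. Rows are documents and columns are terms here, which is one possible orientation.

```python
import re

def term_document_matrix(docs):
    """Return (terms, matrix) where matrix[i][j] is 1 if docs[i] contains terms[j]."""
    tokenized = [set(re.findall(r"[a-z']+", d.lower())) for d in docs]
    terms = sorted(set().union(*tokenized))
    matrix = [[1 if t in toks else 0 for t in terms] for toks in tokenized]
    return terms, matrix

def remove_sparse_terms(terms, matrix, min_docs=2):
    """Drop terms that appear in fewer than min_docs documents."""
    keep = [j for j in range(len(terms)) if sum(row[j] for row in matrix) >= min_docs]
    return [terms[j] for j in keep], [[row[j] for j in keep] for row in matrix]

# Hypothetical documents.
docs = [
    "George Bush visits Texas",
    "George Clooney visits Venice",
    "Bush speaks in Texas",
]
terms, matrix = term_document_matrix(docs)
dense_terms, dense_matrix = remove_sparse_terms(terms, matrix, min_docs=2)
print(len(terms), "terms before,", len(dense_terms), "after")
```

Each dropped column is one fewer dimension for the clustering step to process.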
Unigrams vs Bigrams
Unigrams
George
Bush
Clooney
Bigrams
George Bush
George Clooney
N-grams?
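One function covers unigrams, bigrams, and longer n-grams. This is a sketch that splits on whitespace only; the headline is a hypothetical example.

```python
def ngrams(text, n):
    """Return the n-word sequences in text, in order of appearance."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

headline = "George Clooney meets George Bush"  # hypothetical headline
unigrams = ngrams(headline, 1)
bigrams = ngrams(headline, 2)
trigrams = ngrams(headline, 3)
print(bigrams)
```

As unigrams, both occurrences of "George" look identical. As bigrams, "George Clooney" and "George Bush" are different terms.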
Machine Learning + AP + ??? = Profit!
Read the AP Video Hub MongoDB database
Build corpus from headlines
Use bigrams (word pairs)
Strip sparse terms
Apply K-means clustering
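The steps above can be sketched end to end in Python. This is an illustration under stated assumptions, not the talk's R code: the headlines are made up and stand in for the database read, sparse means "appears in fewer than 2 headlines", and the K-Means seeding is a deterministic farthest-first choice.

```python
from collections import Counter

def bigrams(text):
    """Set of lowercase word pairs in a headline."""
    words = text.lower().split()
    return {" ".join(words[i:i + 2]) for i in range(len(words) - 1)}

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=100):
    # Farthest-first seeding (assumption, keeps the example deterministic).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(d) / len(members) for d in zip(*members))
    return centroids, labels

# Hypothetical headlines standing in for the corpus read from the database.
headlines = [
    "Ebola outbreak spreads in West Africa",
    "Doctors fight Ebola outbreak in West Africa",
    "Ebola outbreak reaches West Africa capital",
    "Hong Kong protests continue downtown",
    "Police clear Hong Kong protests",
    "Students join Hong Kong protests",
]

doc_bigrams = [bigrams(h) for h in headlines]                 # use bigrams
doc_freq = Counter(t for grams in doc_bigrams for t in grams)
terms = sorted(t for t, n in doc_freq.items() if n >= 2)      # strip sparse terms
vectors = [tuple(1 if t in grams else 0 for t in terms) for grams in doc_bigrams]
centroids, labels = kmeans(vectors, k=2)                      # apply K-means

# Label each cluster with the bigrams present in at least half of its headlines.
topics = [[t for t, w in zip(terms, c) if w >= 0.5] for c in centroids]
for i, topic in enumerate(topics):
    print(i, topic)
```

The bigrams that dominate each centroid act as a rough label for that cluster's topic.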
... and what do we get?
Visualizing News Clusters
October 6, 2014
November 5, 2014
December 1, 2014
Conclusion
Data Mining
Machine Learning
Exploratory Data Analysis
Unsupervised Learning
Beyond human intelligence?
Discovering Trending Topics in News: http://primaryobjects.com/CMS/Article162
Mirroring Your Twitter Persona with Intelligence: http://primaryobjects.com/CMS/Article160
TF*IDF with .NET: http://primaryobjects.com/CMS/Article157
Time for Some Hacking in R
Now, the fun begins!