KDD'14 Debrief
24th - 27th August, 2014, New York City, US
WING Monthly Meeting (Oct 24, 2014)
Presented by Xiangnan He
Slide 2
Opening Ceremony
Slide 3
Welcome Words
"Do not spend your precious time asking 'Why isn't the world a better place?' It will only be time wasted. The question to ask is 'How can I make it better?' To that there is an answer." --- Leo Buscaglia
Slide 4
Overview
The largest KDD conference ever:
- Number of attendees: 2,200+ (last year: 1,176)
- 151 research papers (20% growth over KDD'13) and 43 industry & government papers (30% growth)
- 26 workshops (75% growth)
- 12 tutorials (100% growth)
What's new?
- Paper spotlights every morning (1 min/paper)
- All papers are required to have a poster presented.
- Networking session: Building a Career in Data Science
Slide 5
Research Track
Slide 6
Reviewing Process
Slide 7
Submissions per Country
Slide 8
Acceptance by Subject Area
Slide 9
Predicting Paper Acceptance
Slide 10
Slide 11
Academia vs. Industry
Slide 12
Review Statistics
Slide 13
Slide 14
Research Topics
Some technical topics that I found especially notable/popular:
- Topic/graphical modeling (not only for text mining; many tasks are addressed with this method)
- Deep learning (2 tutorials, but no full papers)
- Social networks and graph analytics (popular for the last 10 years, and even more so this year)
- Recommendations
- Workforce analytics
Slide 15
Best Paper Awards
Best paper: Reducing the Sampling Complexity of Topic Models. Aaron Q. Li (Carnegie Mellon University); Amr Ahmed, Sujith Ravi, Alexander J. Smola (Google).
Best student paper: An Efficient Algorithm for Weak Hierarchical Lasso. Yashu Liu, Jie Wang, Jieping Ye (Arizona State University).
Slide 16
Test of Time Award
Integrating Classification and Association Rule Mining [KDD 1998], cited over 2,000 times.
Slide 17
Some Interesting Papers
- Mining Topics in Documents: Standing on the Shoulders of Big Data. Zhiyuan Chen, Bing Liu (University of Illinois at Chicago).
- Matching Users and Items Across Domains to Improve the Recommendation Quality. Chung-Yi Li, Shou-De Lin (National Taiwan University).
- FoodSIS: A Text Mining System to Improve the State of Food Safety in Singapore. Kiran Kate, Sneha Chaudhari, Andy Prapanca, Jayant Kalagnanam (IBM Research).
Slide 18
Mining Topics in Documents: Standing on the Shoulders of Big Data. Zhiyuan Chen, Bing Liu (University of Illinois at Chicago).
Proposed a variant of the topic model that generates more accurate and coherent topics by integrating knowledge.
Two kinds of knowledge: must-links and cannot-links.
Knowledge is mined through frequent itemset mining. But mined knowledge can be wrong, so the authors further propose rules to clean it up.
Knowledge can easily be integrated into the inference algorithm with the generalized Pólya urn model.
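To give an intuition for how a generalized Pólya urn encourages linked words to share a topic, here is a minimal toy sketch (my own illustration, not the paper's inference code; the words, the `must_links` table, and the `PROMOTION` weight are made-up assumptions): drawing a word puts it back with one extra copy, and additionally promotes its must-linked words.

```python
import random

# Toy generalized Polya urn (illustrative only, not the paper's algorithm).
# Hypothetical vocabulary and must-links; PROMOTION is an assumed weight.
must_links = {"price": ["cost"], "cost": ["price"]}
PROMOTION = 0.3  # extra mass given to each must-linked word per draw

def draw_and_update(counts):
    """Sample a word proportionally to its count, then update the urn."""
    total = sum(counts.values())
    r = random.uniform(0.0, total)
    acc = 0.0
    for w, c in counts.items():
        acc += c
        if r <= acc:
            break
    counts[w] += 1.0                      # simple Polya urn: ball back plus one
    for linked in must_links.get(w, []):  # generalized urn: promote linked words
        counts[linked] += PROMOTION
    return w

random.seed(0)
counts = {"price": 1.0, "cost": 1.0, "battery": 1.0}
for _ in range(1000):
    draw_and_update(counts)
# "price" and "cost" reinforce each other through the promotion step,
# so their counts tend to grow together relative to the unlinked word.
print(counts)
```

In the paper's setting this update happens inside Gibbs sampling for the topic model, so must-linked words accumulate probability in the same topics while cannot-links do the opposite.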
Slide 19
Innovation Award Talk
Principles of Very Large Scale Modeling, by Pedro Domingos, University of Washington.
Three principles:
1. Model the whole, not just parts. People (customers) influence each other, so model the whole network, not each person separately.
2. Tame complexity via hierarchical decomposition. We can make two assumptions: 1) subparts are independent given the part; 2) the probability for a class is the average over its subclasses. Using the hierarchy and these two assumptions makes inference tractable. Example: Markov Logic Network + Sum-Product Theorem = Tractable Markov Logic.
3. Time and space should not depend on data size.
Slide 20
THANK YOU! Video recordings of KDD:
http://videolectures.net/kdd2014_newyork/