Klout as an Example Application of Topics-oriented NLP APIs

Topics-oriented APIs

May 2015 – APIdays Barcelona

Tyler Singletary - @harmophone

Director of Platform


HI.

HI. WITH CONTEXT.

1. A Practical Application of Social Media Machine Learning and NLP

WHAT IS KLOUT, REALLY?

• Klout is an API client application of the social web.

• Federated identity across platforms

• Macro and micro understanding of profile, conversation, and content.

People linked by Topics.

UNIFYING PRINCIPLE: TOPICS

• TBs of Social Interactions a Day

• NLP applied to posts

• Aggregated to profiles:
  – effects are Klout Score, topical strengths
  – The what becomes topics
  – The why becomes TopicSets

• Links crawled, NLP summarization

Content and people linked by Topics.

TOPIC SETS + USERS + SCORING

• Allow for time-series slicing

• Aggregate counting

• Slicing of set to create ordered list
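The three operations above can be pictured with a minimal sketch. It assumes each scored topic assignment is a record of (user, topic, score, day); the record layout and the function names are illustrative assumptions, not Klout's data model.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TopicScore:
    """One scored topic for one user on one day (hypothetical record layout)."""
    user_id: int
    topic: str
    score: float
    day: date


def time_slice(records, start, end):
    """Time-series slicing: keep records whose day falls in [start, end]."""
    return [r for r in records if start <= r.day <= end]


def count_users_per_topic(records):
    """Aggregate counting: number of distinct users seen for each topic."""
    seen = {(r.user_id, r.topic) for r in records}
    return Counter(topic for _, topic in seen)


def top_topics(records, user_id, n):
    """Slice one user's set into an ordered list of their n strongest topics."""
    best = {}
    for r in records:
        if r.user_id == user_id:
            best[r.topic] = max(best.get(r.topic, 0.0), r.score)
    return sorted(best, key=best.get, reverse=True)[:n]


# Made-up sample data, for illustration only.
records = [
    TopicScore(1, "APIs", 0.99, date(2015, 5, 1)),
    TopicScore(1, "Twitter", 0.95, date(2015, 5, 2)),
    TopicScore(2, "APIs", 0.40, date(2015, 5, 2)),
    TopicScore(2, "Marketing", 0.80, date(2015, 4, 1)),
]
may = time_slice(records, date(2015, 5, 1), date(2015, 5, 31))
```

Slicing by time first and counting or ranking afterwards keeps each step independent, so the same records can feed a per-user view or a per-topic view.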

Topic-oriented view

2. NLP-based Building Blocks

KLOUT DEALS WITH RIDICULOUS AMOUNTS OF DATA

o Topic assignment at scale:
  o ~650 M new pieces of data daily
  o hundreds of millions of profiles
  o ~10,000 topics in 3-level hierarchy
  o Daily update

o Multiple Social networks and various data sources:
  o Twitter, Facebook, LinkedIn, Google+, Wikipedia
  o User activity, profiles, connections

o Topics normalized to an evolving, managed ontology

WEIGHTING, NORMALIZATION, CALIBRATION

Signals are weighted and normalized to mirror real-world influence
  – Machine-learned weighting based on regression analysis of survey data

Advanced algorithm based on 1500 signal combinations of relationships and ratios

  – Where: Which network is the action taking place?
  – What: What action was taken?
  – Who: Who acted on your content?
  – How much: How many actions and unique actors?
  – When: When was the action performed?
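As a toy illustration of how the five questions could each become a factor in a score, here is a minimal sketch. Every number in it is a made-up placeholder: the deck says the real weights are learned by regression on survey data and span roughly 1500 signal combinations, none of which are reproduced here. The multiplicative form, the half-life decay, and the unique-actor damping are all assumptions.

```python
from dataclasses import dataclass

# Hypothetical weights, chosen only to make the example readable.
NETWORK_WEIGHT = {"twitter": 1.0, "facebook": 0.8}   # where
ACTION_WEIGHT = {"retweet": 1.0, "like": 0.3}        # what
HALF_LIFE_DAYS = 30.0                                # when


@dataclass(frozen=True)
class Action:
    network: str         # where the action took place
    kind: str            # what action was taken
    actor_id: int        # who acted
    actor_weight: float  # how influential the actor is, assumed in [0, 1]
    age_days: float      # when: days since the action


def raw_signal(actions):
    """Sum weighted, time-decayed actions, damped by the share of unique actors."""
    if not actions:
        return 0.0
    total = 0.0
    for a in actions:
        decay = 0.5 ** (a.age_days / HALF_LIFE_DAYS)
        total += (NETWORK_WEIGHT[a.network] * ACTION_WEIGHT[a.kind]
                  * a.actor_weight * decay)
    # How much: repeated actions from the same actor count for less.
    unique_actors = len({a.actor_id for a in actions})
    return total * unique_actors / len(actions)
```

A real system would then normalize and calibrate this raw value; the sketch stops before that step.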

TOPIC SETS FOR CONTEXT

User’s Influence

With various Scores

User’s Interests

With various Scores

User’s Self-selection

Based on registered self-declared interest

Audience Influence

Rollup of User’s Influence within a user’s downlevel and uplevel networks

Audience Interests

Rollup of User’s Interests within a User’s downlevel (and uplevel) networks
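A rollup of this kind can be sketched as a simple aggregation. The version below assumes per-user topic scores are held in a dict and that "rollup" means a plain average over the audience; the deck does not say which aggregation Klout uses, so both the data shape and the averaging are assumptions.

```python
from collections import defaultdict


def rollup(topic_scores, audience_ids):
    """Average each topic's score over an audience.

    topic_scores: {user_id: {topic: score}} (hypothetical shape).
    A topic a user lacks contributes 0 for that user.
    """
    if not audience_ids:
        return {}
    totals = defaultdict(float)
    for uid in audience_ids:
        for topic, score in topic_scores.get(uid, {}).items():
            totals[topic] += score
    return {topic: total / len(audience_ids) for topic, total in totals.items()}
```

Calling it once with a user's downlevel network and once with the uplevel network would give the two audience views named on the slide.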

CHALLENGES IN BIG DATA

● Message size: Overall data size may be huge, but message size per user may be small.

● Text Sparsity: Many users may be passive consumers of content.

● Noise: colloquial language, slang, grammatical errors, abbreviations.

● Context: Need to expand context to get more information

● False positives are embarrassing when user-facing

CHALLENGES TO SCALE

NLP* - StanfordNLP english.conll.4class.distsim.crf.ser.gz

● Speed Matters (650M messages a day):
  ○ Stanford Named Entity Extraction - 10.959 ms (82.0 CPU days)
  ○ Dictionary - 0.056 ms (0.42 CPU days)
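The CPU-day figures follow from per-message latency times daily volume. A small sketch of the arithmetic, using only the numbers on the slide (the function name is illustrative):

```python
MESSAGES_PER_DAY = 650_000_000
MS_PER_DAY = 24 * 60 * 60 * 1000


def cpu_days(ms_per_message, messages=MESSAGES_PER_DAY):
    """CPU time, in days, needed to process one day of messages on one core."""
    return ms_per_message * messages / MS_PER_DAY


ner = cpu_days(10.959)        # Stanford Named Entity Extraction
dictionary = cpu_days(0.056)  # dictionary lookup
print(f"NER: {ner:.1f} CPU days, dictionary: {dictionary:.2f} CPU days")
```

This works out to about 82.4 and 0.42 CPU days, close to the slide's figures. Anything above 1.0 means a single core cannot keep up with the daily volume.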

● Corpus
  ○ Stanford Named Entity Extraction:
    ■ {‘the rule of law’=1.0}
  ○ Dictionary based:
    ■ {‘the rule of law’=1.0, ‘nsa’=1.0, ‘eff’=1.0}
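A dictionary-based extractor of the kind compared above can be as small as a longest-match phrase lookup. The sketch below is an assumption about the general technique, not Klout's implementation; the three-entry dictionary just mirrors the output shown on the slide, and the sample sentence is invented.

```python
import re

# Hypothetical dictionary: surface phrase -> weight. A managed ontology would
# map phrases to topic IDs; this only mirrors the output shape on the slide.
DICTIONARY = {"the rule of law": 1.0, "nsa": 1.0, "eff": 1.0}
MAX_PHRASE_WORDS = max(len(phrase.split()) for phrase in DICTIONARY)


def extract(text):
    """Return {phrase: weight} for every dictionary phrase found in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    found = {}
    i = 0
    while i < len(words):
        # Try the longest candidate phrase starting at word i first.
        for n in range(min(MAX_PHRASE_WORDS, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in DICTIONARY:
                found[phrase] = DICTIONARY[phrase]
                i += n
                break
        else:
            i += 1  # no phrase starts here; move on one word
    return found


print(extract("The NSA and the EFF disagree about the rule of law."))
```

Matching on whole words means "efficient" does not trigger "eff", which matters given the slide's point that false positives are embarrassing when user-facing.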

WEBSTER

MACHINE LEARNING AT KLOUT

We leverage our past machine learning and NLP classification assets to:

• Train new models for additional data sources

• Retrain Topics classification

• Predict “actionability” of support

• Predict virality of content [macro and micro]

• Predict the “personhood” of a social media account

• Content-targeting based on downlevel predictions

3. How do you productize this in APIs?

INPUTS AND OUTPUTS

People-Specific Insights

Input: People(s)

Output: TopicSet(s)

Topic-Specific People

Input: Topic(s)

Output: People

Topic-Aggregate Insights

Input: Topic(s)

Output: Metadata, Aggregation

People-Aggregate Insights

Input: User(s)

Output: Metadata, Aggregate Sets

GET user.json/[id]/insights/influence-topics

GET user.json/insights/aggregated/influence-topics?userIds=1,2,3

GET topic.json/[ids]/people

GET topic.json/[ids]/insights
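The four routes map onto the four input/output pairs above. As a minimal client-side sketch, the helpers below build those paths. The base URL is a placeholder, the helper names are hypothetical, and joining multiple topic IDs with commas in the path is an assumption modelled on the `userIds=1,2,3` query shown on the slide.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.example.com/v2"  # placeholder, not a real endpoint


def user_influence_topics(user_id):
    """People-specific insights: one user in, topic sets out."""
    return f"{BASE_URL}/user.json/{user_id}/insights/influence-topics"


def aggregated_influence_topics(user_ids):
    """People-aggregate insights: several users in, aggregate sets out."""
    ids = ",".join(str(u) for u in user_ids)
    query = urlencode({"userIds": ids}, safe=",")  # keep commas unescaped
    return f"{BASE_URL}/user.json/insights/aggregated/influence-topics?{query}"


def topic_resource(topic_ids, resource):
    """Topic-specific people or topic-aggregate insights."""
    if resource not in ("people", "insights"):
        raise ValueError(f"unknown topic resource: {resource!r}")
    ids = ",".join(str(t) for t in topic_ids)  # comma-joining is an assumption
    return f"{BASE_URL}/topic.json/{ids}/{resource}"
```

Authentication, paging, and error handling are left out because the slides do not describe them.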

PAYLOADS

{
  topicSetType: "expertise",
  topicSet: [
    {
      topicId: "7516448513106795305",
      score: 0.999596145670965,
      strength: "strong",
      displayName: "APIs",
      name: "APIs",
      slug: "api",
      imageUrl: "http://kcdn3.klout.com/static/images/topics/api_6bae2a67e1a5a9b68d526b4d483c4eb8.png",
      displayType: "visible",
      topicType: "entity"
    },
    {
      topicId: "10000000000000008253",
      score: 0.9992839644220868,
      strength: "strong",
      displayName: "Twitter",
      name: "Twitter",
      slug: "twitter",
      imageUrl: "http://kcdn3.klout.com/static/images/icons/generic-topic.png",
      displayType: "visible",
      topicType: "entity"
    },
    {
      topicId: "8961164588331655920",
      score: 0.9992326280041798,
      strength: "strong",
      displayName: "Klout",
      name: "Klout",
      slug: "klout",
      imageUrl: "http://kcdn3.klout.com/static/images/klout-topic-image-1333588028647.jpg",
      displayType: "visible",
      topicType: "entity"

  topicSetType: "interest",
  topicSet: [
    {
      topicId: "10000000000000008253",
      score: 0.9946672348339362,
      strength: "strong",
      displayName: "Twitter",
      name: "Twitter",
      slug: "twitter",
      imageUrl: "http://kcdn3.klout.com/static/images/icons/generic-topic.png",
      displayType: "visible",
      topicType: "entity"
    },
    {
      topicId: "6485494992525344250",
      score: 0.9918719149780779,
      strength: "strong",
      displayName: "Marketing",
      name: "Marketing",
      slug: "marketing",
      imageUrl: "http://kcdn3.klout.com/static/images/topics/people.png",
      displayType: "visible",
      topicType: "sub"
    },
    {
      topicId: "7516448513106795305",
      score: 0.9888798650771197,
      strength: "strong",
      displayName: "APIs",
      name: "APIs",
      slug: "api",
      imageUrl: "http://kcdn3.klout.com/static/images/topics/api_6bae2a67e1a5a9b68d526b4d483c4eb8.png",
      displayType: "visible",
      topicType: "entity"
    },
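A client consuming these payloads would typically rank topics by score and compare topic sets. The sketch below uses trimmed copies of the two payloads (a few fields each, with keys quoted so they are valid Python literals); the helper names are illustrative.

```python
# Trimmed copies of the slide's payloads; values are taken from the slide.
expertise = {
    "topicSetType": "expertise",
    "topicSet": [
        {"topicId": "7516448513106795305", "score": 0.999596145670965, "name": "APIs"},
        {"topicId": "10000000000000008253", "score": 0.9992839644220868, "name": "Twitter"},
        {"topicId": "8961164588331655920", "score": 0.9992326280041798, "name": "Klout"},
    ],
}
interest = {
    "topicSetType": "interest",
    "topicSet": [
        {"topicId": "10000000000000008253", "score": 0.9946672348339362, "name": "Twitter"},
        {"topicId": "6485494992525344250", "score": 0.9918719149780779, "name": "Marketing"},
        {"topicId": "7516448513106795305", "score": 0.9888798650771197, "name": "APIs"},
    ],
}


def ranked_names(payload):
    """Topic names ordered from highest to lowest score."""
    topics = sorted(payload["topicSet"], key=lambda t: t["score"], reverse=True)
    return [t["name"] for t in topics]


def shared_topic_ids(a, b):
    """Topic IDs that appear in both topic sets."""
    return ({t["topicId"] for t in a["topicSet"]}
            & {t["topicId"] for t in b["topicSet"]})
```

The payload carries topic IDs as strings. Keeping them as strings is the safe choice: a value such as 10000000000000008253 does not fit in a signed 64-bit integer and would lose precision in clients that parse numbers as doubles.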

4. Let’s get practical, prescriptive and talk about the future

PARAMETERIZATION

• Topics Scoring uses different models in each topic set

• Overall Topic Scoring is based on hundreds of features, weights, decays, spanning short and long term

• Parameterize scoring for different contexts
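One way to read "parameterize scoring for different contexts" is that the same events are scored with a different parameter set per context. The sketch below is a toy illustration of that idea: the two contexts, the parameter names, and the short/long split are assumptions, and it stands in for a model that the deck says uses hundreds of features.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoringParams:
    """Hypothetical per-context parameters."""
    short_term_weight: float
    long_term_weight: float
    half_life_days: float


# Made-up contexts, for illustration only.
CONTEXTS = {
    "trending": ScoringParams(short_term_weight=0.8, long_term_weight=0.2, half_life_days=7.0),
    "evergreen": ScoringParams(short_term_weight=0.2, long_term_weight=0.8, half_life_days=180.0),
}


def topic_score(events, params, short_window_days=7.0):
    """Score a topic from (age_days, value) events under one parameter set."""
    short_term = 0.0
    long_term = 0.0
    for age_days, value in events:
        decayed = value * 0.5 ** (age_days / params.half_life_days)
        if age_days <= short_window_days:
            short_term += decayed
        else:
            long_term += decayed
    return (params.short_term_weight * short_term
            + params.long_term_weight * long_term)
```

With a single event from six months ago, the "evergreen" parameters still give a meaningful score while "trending" gives almost none, which is the kind of context-dependent behaviour the slide describes.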

EXAMPLES

Use interchangeable, specified models, with rule modifiers

EXAMPLES

• When an API is treated like a product, you must think through the implementations others would make.

• Maybe even make them your own.

POLICY

• Data is great.

• Representation of data is hard.

• Raw data rarely if ever needs to be displayed.

• Balance innovation on data assets with brand, utility, and allowed use cases.

KLOUT RESEARCH ONLINE

• LASTA

Bye!

May 2015 – APIdays

Tyler Singletary - @harmophone

Director of Platform
