Website: parse.ly Blog: blog.parse.ly Email: [email protected]
We help you understand audience attention.
Follow me: @amontalenti Our research: @parsely Our podcast: @attnpod
How? Parse.ly Analytics.
Web content visits represent attention at global scale.
+ hundreds of other companies who run thousands of high-traffic sites. + the long tail.
Sites with content and audience Platforms
Parse.ly measures content and audience …
Page views Visitors Engaged time Social shares
Audience loyalty Devices Video Titles
Authors Sections Tags Referrers
Campaigns Publish dates Channels
+Much more
… to tell the story behind the story.
Our dashboard can answer this question: What’s gaining attention on your sites and apps?
Provide a real-time and historical window into what’s happening with your content when it comes to audience attention.
• 30,000 monthly active users across 350+ media companies.
• Measures attention across more than 2 million page views per minute at peak.
• Sub-second data latency with 99.99% internal SLO.
We make data accessible and essential.
Parse.ly Analytics: What’s running under the hood?
Powered by mage:
• 100+ Elasticsearch nodes storing over 20 terabytes of production live query data.
• 3,600+ real-time processing CPU cores using Storm.
• Kafka and Cassandra for rock-solid distributed streaming data.
• Elastic scalability for hourly and nightly jobs using Spark.
Parse.ly Analytics: What does the team release publicly?
We love open source!
• streamparse is our publicly-maintained and popular project for running production parallel computation systems with Python 2.x and 3.x, using Apache Storm.
• PyKafka is the community’s fastest and most production-tested Python driver for Apache Kafka.
+ PyKafka
+ parsely_raw_data
+ time-engaged
+ others
Why now? Parse.ly Currents.
Aggregate attention data already guides the industry.
And answers questions it could never answer on its own.
Our network data can answer this question: What do people care about?
A front-row seat to the web interests of over 1 billion people per month and 150 million people per day.
Categories include: news, entertainment, finance, politics, sports, opinion, culture, and more.
Apply modern machine learning and natural language processing techniques.
Parse.ly Currents: What is our petabyte-scale analysis stack?
Petabytes of event data and terabytes of web crawl data.
• BigQuery used with day-partitioned tables to do fast aggregation over petabyte-scale event data without running a cluster.
• PyData stack used for statistics and machine learning over time series data.
• Natural language processing on text data using Python, leveraging a web-based ontology (knowledge graph), domain-specific keyword/entity lists, word vectors, document classifiers, unsupervised clustering, and more.
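To illustrate why day-partitioned tables make aggregation over large event data fast, here is a minimal Python sketch. It is an in-memory stand-in, not BigQuery's API: the `partitions` dictionary, `insert_event`, and `page_views_by_url` are hypothetical names, and the event rows are invented for illustration. The only point is that a date-range query reads the partitions for the requested days and skips every other day.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta

# Hypothetical stand-in for a day-partitioned event table:
# each key is a day, each value holds that day's (url, action) rows.
partitions = defaultdict(list)


def insert_event(day, url, action):
    """Append one event row to the partition for its day."""
    partitions[day].append((url, action))


def page_views_by_url(start, end):
    """Count page views per URL for start <= day <= end.

    Returns (counts, partitions_read). Only partitions inside the
    date range are read; all other days are never touched.
    """
    counts = Counter()
    partitions_read = 0
    day = start
    while day <= end:
        if day in partitions:
            partitions_read += 1
            for url, action in partitions[day]:
                if action == "pageview":
                    counts[url] += 1
        day += timedelta(days=1)
    return counts, partitions_read


# Invented sample data, for illustration only.
insert_event(date(2017, 1, 1), "/a", "pageview")
insert_event(date(2017, 1, 2), "/a", "pageview")
insert_event(date(2017, 1, 2), "/b", "pageview")
insert_event(date(2017, 1, 2), "/b", "heartbeat")
insert_event(date(2017, 1, 3), "/a", "pageview")

counts, partitions_read = page_views_by_url(date(2017, 1, 2), date(2017, 1, 3))
print(dict(counts), partitions_read)
```

In a real warehouse the same idea shows up as a filter on the partition column, so the cost of a query tracks the days requested rather than the full size of the table.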
1 billion unique visitors per month
20 billion page views per month
5 billion clicks from search, social, & others
900k posts published and analyzed each day
2 million topics, categories, and keywords
Does discovery vary by topic?
Referral share by topic, with the number of posts for each topic:

Topic                        Posts   Facebook   Other   Google
Job Postings                 2.7k    11.9%      3.7%    84.4%
Business & Finance           39k     14.1%      39.0%   14.1%
Sports                       210k    19.2%      30.4%   50.4%
Technology                   67k     21.3%      18.0%   60.8%
State & Local Politics       17k     35.5%      22.3%   42.2%
World Economy                26k     36.3%      20.7%   43.0%
National Security            49k     41.3%      28.9%   29.7%
Local Crime & Incidents      98k     52.7%      22.6%   24.6%
Criminal Justice             55k     53.5%      22.2%   24.4%
Education & Research         36k     58.9%      19.8%   21.3%
U.S. Presidential Politics   110k    59.5%      15.9%   24.6%
Entertainment                190k    60.8%      10.1%   29.1%
Local Events                 96k     61.4%      12.3%   26.2%
Lifestyle                    110k    87.1%      6.2%    6.7%
Topics are derived from posts in the Parse.ly network of sites from 2016 using a topic modeling algorithm called LDA (Latent Dirichlet Allocation). For more information: parsely.com/authority
U.S. Presidential Politics
Number of posts: 110k
Device traffic breakdown: Desktop 43%, Mobile 47%, Tablet 10%
Common words in posts: TRUMP, CLINTON, PRESIDENT, CAMPAIGN, DONALD, PRESIDENTIAL, OBAMA, ELECTION, PARTY, HILLARY, STATE, POLITICAL, DEMOCRATIC, WHITE, CANDIDATE, VOTE, SANDERS, HOUSE, VOTERS, FORMER, AMERICAN, NEWS, STATES, COUNTRY, NATIONAL, DEBATE, WOMEN, AMERICA, CRUZ, REPUBLICAN
Referral sources: Facebook 59.5%, Google Search 24.6%, Other 15.9%
External referral sources: news.google.com 4.3%, twitter.com 4.1%, drudgereport.com 1.9%, yahoo! 1.1%, bing 0.9%, reddit.com 0.7%

World Economy
Number of posts: 26k
Device traffic breakdown: Desktop 46%, Mobile 45%, Tablet 9%
Common words in posts: CHINA, OIL, EU, PERCENT, CHINESE, ENERGY, SINCE, PER, EUROPEAN, TRADE, STOCKS, BREXIT, PRICES, DEAL, BANK, CENT, NFLAP, UK, ACCORDING, MARKETS, TRADING, BILLION, BRITAIN, MARKET, STOCK, WORLD, GLOBAL, POWER
Referral sources: Google Search 43.0%, Facebook 36.3%, Other 20.7%
External referral sources: news.google.com 4.6%, twitter.com 4.0%, yahoo! 2.4%, drudgereport.com 1.4%, flipboard.com 1.1%, bing 0.9%, linkedin.com 0.9%, reddit.com 0.8%, traffic.outbrain.com 0.7%
Can Internet attention predict public opinion?
Can Internet attention predict a film’s revenue?
Figure: three scatter plots of cumulative box office gross revenue (y-axis, 100k to 600k), distinguishing movies rated PG from movies not rated PG. Pearson correlation coefficients are computed when excluding PG rated movies.
Revenue Compared to Unique Views for Related Web Posts 3 Days Prior to Release (x-axis: Unique Views): Pearson correlation coefficient 0.955
Revenue Compared to Print Ad Cost in US $ (x-axis: Print Ad Cost in US $): Pearson correlation coefficient 0.474
Revenue Compared to Production Cost in US $ (x-axis: Negative Cost in US $): Pearson correlation coefficient 0.829
Total unique views for posts related to a movie three days prior to its release correlate more strongly with box office revenue than either production cost or advertising budget.
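For readers who want to see how a Pearson correlation coefficient like the ones quoted here is computed, the following is a minimal Python sketch. The `pearson` helper and the sample numbers are assumptions made for illustration; they are not the movie data behind the slide, and the deck does not say which tooling was used for the original analysis.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance numerator) and
    # sums of squared deviations (variance numerators).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Invented numbers, for illustration only (not the data in the figure).
unique_views = [210_000, 380_000, 520_000, 760_000, 940_000]
box_office = [120_000, 230_000, 290_000, 450_000, 560_000]

print(round(pearson(unique_views, box_office), 3))
```

A value near 1 means the two quantities rise together almost linearly; it describes association in the sample and does not by itself show that attention causes revenue.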
We are a partner you can trust. 400+ paying clients. 3000+ big sites. 1B+ network visitors.
We’re small and nimble, yet we operate with scale and integrity. We are 70+ people.
• A client services, support, and ops team of 40 people, with a head office in NYC.
• A fully distributed product team of engineers, data scientists, and designers. 30 people across US, Canada, and Europe.
• $12M+ USD in financing raised from 2011 to 2017.
Three asks for the audience today.
Sign up free, give us feedback!
http://parse.ly/currents
Follow me on Twitter!
@amontalenti
Let’s continue the conversation about internet attention.