jason-packer
WHAT DRIVES OUR METRICS?
*Note: all metrics may be inaccurate by some amount.**
**But we’re not sure which ones, or by how much.
DATA COLLECTION 1.0: SERVER LOGS, HITS, IP ADDRESSES
• Server logs: the same format was valid in 1996 and in 2016
• Basic, but still contains highly useful data!
• Unanalyzed raw logs get big, fast.
128.135.189.9 - - [15/Feb/1996:15:16:27] "GET / HTTP/1.1" 200 5397 "Mozilla/1.0 (Win3.1)"
65.60.216.104 - - [15/Feb/2016:15:16:27] "GET / HTTP/1.1" 200 5397 "Mozilla/5.0 (Mac OS)"
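These lines roughly follow the Apache/NCSA combined log layout, minus the timezone offset and referrer field. A minimal sketch of pulling the useful fields out of them, assuming exactly the simplified layout shown here; real logs need a pattern matched to the server's configured format, and parseLogLine is a hypothetical helper name.

```javascript
// Pattern for the simplified layout on this slide (assumption: no timezone
// offset and no referrer field, unlike the full combined log format).
const LOG_LINE = /^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) (\S+)" (\d{3}) (\d+|-) "([^"]*)"$/;

// Hypothetical helper: returns an object of fields, or null if the line doesn't match.
function parseLogLine(line) {
  const m = LOG_LINE.exec(line.trim());
  if (m === null) return null;
  return {
    ip: m[1],
    timestamp: m[2],
    method: m[3],
    path: m[4],
    protocol: m[5],
    status: Number(m[6]),
    bytes: m[7] === "-" ? 0 : Number(m[7]), // "-" means no body was sent
    userAgent: m[8],
  };
}

const lines = [
  '128.135.189.9 - - [15/Feb/1996:15:16:27] "GET / HTTP/1.1" 200 5397 "Mozilla/1.0 (Win3.1)"',
  '65.60.216.104 - - [15/Feb/2016:15:16:27] "GET / HTTP/1.1" 200 5397 "Mozilla/5.0 (Mac OS)"',
];

for (const line of lines) {
  console.log(parseLogLine(line));
}
```

Twenty years apart, the two lines parse identically; only the IP and user-agent differ.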
WEB ANALYST, CIRCA 2000
flickr: boston_public_library, CC BY-NC-ND 2.0
DATA COLLECTION 2.0: CLIENT-SIDE JAVASCRIPT, COOKIES
• Easier to implement (“just a few lines of JavaScript…”)
• Cookies identify returning users more reliably than IP addresses
• Much more information is available client-side (screen size, viewport, language)
HOW DOES CLIENT-SIDE JS WORK? …SPECIFICALLY GOOGLE ANALYTICS
Two requests: the first loads the tracking code, the second sends the measurement (the “hit”)
TRACKING CODE SNIPPETS
• Sets up a command queue, so ga() calls made before the library loads are queued, not lost
• Loads analytics.js, which does the real work.
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-34128028-1', 'auto'); ga('send', 'pageview');
</script>
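The snippet’s core trick is that ga starts out as a stub: each call is pushed onto ga.q until analytics.js loads and replays the queue. A minimal sketch of that queue-then-replay pattern, runnable outside a browser; fakeWindow and loadLibrary are hypothetical stand-ins for window and the real analytics.js, which does far more than this.

```javascript
// Hypothetical stand-in for the browser's window object.
const fakeWindow = {};

// Same stub logic as the snippet: define ga() only if it doesn't exist yet,
// and have it queue its arguments on ga.q.
fakeWindow.GoogleAnalyticsObject = "ga";
fakeWindow.ga = fakeWindow.ga || function () {
  (fakeWindow.ga.q = fakeWindow.ga.q || []).push(arguments);
};
fakeWindow.ga.l = 1 * new Date(); // load timestamp, as the snippet records it

// Page code can call ga() right away, before the library has loaded.
fakeWindow.ga("create", "UA-34128028-1", "auto");
fakeWindow.ga("send", "pageview");

// Hypothetical stand-in for analytics.js: replay queued commands in order,
// then replace the stub so later calls run immediately.
function loadLibrary(win) {
  const executed = [];
  const run = (...args) => executed.push(args.join(" "));
  const queued = (win.ga && win.ga.q) || [];
  for (const args of queued) run(...args);
  win.ga = run;
  return executed;
}

const executed = loadLibrary(fakeWindow);
fakeWindow.ga("send", "event", "video", "play"); // runs directly, no queue
console.log(executed);
```

Because the stub is only defined when ga doesn’t already exist, the snippet is safe to include even if the library happens to be loaded already.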
MEASUREMENT PROTOCOL
https://www.google-analytics.com/collect?v=1&_v=j41&a=702618035&t=pageview&_s=1&dl=https%3A%2F%2Fwww.quantable.com%2F&ul=en-us&de=UTF-8&dt=Quantable%20-%20Analytics%20%26%20Optimization&sd=24-bit&sr=1680x1050&vp=1442x464&je=0&_u=SCCAAUAjK~&jid=&cid=157092037.1441829013&tid=UA-34128028-1&z=823826407
This hit…
Once made readable, is this data…
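One way to make a hit readable is to let the URL parser decode it and attach labels to the documented Measurement Protocol parameters. A minimal sketch; the label wording and decodeHit are illustrative, and the keys _v, a, _s, _u and jid are left raw because they are not part of the public parameter reference.

```javascript
// The hit from the slide, with dl and dt percent-encoded as they travel on the wire.
const hit =
  "https://www.google-analytics.com/collect?v=1&_v=j41&a=702618035&t=pageview&_s=1" +
  "&dl=https%3A%2F%2Fwww.quantable.com%2F&ul=en-us&de=UTF-8" +
  "&dt=Quantable%20-%20Analytics%20%26%20Optimization&sd=24-bit&sr=1680x1050" +
  "&vp=1442x464&je=0&_u=SCCAAUAjK~&jid=&cid=157092037.1441829013" +
  "&tid=UA-34128028-1&z=823826407";

// Human-readable names for documented Measurement Protocol parameters.
const LABELS = {
  v: "Protocol version",
  t: "Hit type",
  dl: "Document location",
  ul: "User language",
  de: "Document encoding",
  dt: "Document title",
  sd: "Screen colors",
  sr: "Screen resolution",
  vp: "Viewport size",
  je: "Java enabled",
  cid: "Client ID",
  tid: "Tracking ID",
  z: "Cache buster",
};

// Hypothetical helper: decode %XX escapes and relabel known keys.
function decodeHit(url) {
  const params = new URL(url).searchParams;
  const readable = {};
  for (const [key, value] of params) {
    readable[LABELS[key] || key] = value;
  }
  return readable;
}

console.log(decodeHit(hit));
```

Note how much a single pageview carries: screen and viewport size, language, and a client ID (cid) stored in a cookie, none of which is in a server log line.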
SEEMS GREAT, WHAT COULD POSSIBLY GO WRONG?
Some data is still only available server-side:
• Bot traffic (mostly)
• HTTP errors
• Pages we forgot to tag
• Users running content blockers
SERVER LOGS, AGAIN
• Distributed systems, distributed logs
• Same data as before, but with somewhat different consumers
AS ANALYSTS, WHAT’S GIVING US GRIEF?
• Cookie-Deleting Users
• Bots
• Analytics “Referrer” Spam
• Ad-Blocking Users
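COOKIE-DELETING USERS: IS IT STILL ~30%?
• Artificially inflates user counts (one person can show up as several users)
• A return visit after deletion looks like a new Direct visit, with no attribution to the original source
• Base stats on user accounts (logins) instead?
flickr: diskant, CC BY-NC 2.0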
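To see why a deletion rate around 30% matters, here is a toy model. The assumption that each deletion followed by a return visit yields exactly one extra counted user is a simplification for illustration, not a measured figure; reportedUsers and estimateRealUsers are hypothetical helpers.

```javascript
// Toy model (assumption): each cookie deletion followed by a return visit
// creates one extra client ID, i.e. one extra counted "user".
function reportedUsers(realUsers, deletionRate, deletionsPerDeleter) {
  const deleters = Math.round(realUsers * deletionRate);
  return realUsers + deleters * deletionsPerDeleter;
}

// Approximate inverse of the same model: estimate real users from a reported count.
function estimateRealUsers(reported, deletionRate, deletionsPerDeleter) {
  return reported / (1 + deletionRate * deletionsPerDeleter);
}

console.log(reportedUsers(1000, 0.3, 1)); // 1300
console.log(reportedUsers(1000, 0.3, 2)); // 1600
console.log(Math.round(estimateRealUsers(1300, 0.3, 1))); // 1000
```

Under this model, 1,000 real people look like 1,300 users when 30% delete cookies once, and the inflation grows with each additional deletion.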
BROWSER FINGERPRINTS
• Survives cookie deletion
• 2010 EFF Panopticlick: 84% of browsers unique
• Invasive?
• Piwik uses a browser fingerprint + IP as a fallback when cookies aren’t available
• Can be thought of as next gen User-Agent + IP
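The basic idea is to hash a stable set of browser attributes, optionally with the IP, into an ID that doesn’t depend on cookies. A minimal sketch using SHA-256 from Node’s crypto module; the attribute set and fingerprintId are hypothetical, and this is not the algorithm used by Piwik or fingerprintjs2, which collect more signals and use their own hashing.

```javascript
const crypto = require("crypto");

// Hypothetical helper: canonicalize attributes, append the IP, hash to a short ID.
function fingerprintId(attributes, ip) {
  // Sort keys so the same attributes always produce the same string.
  const canonical = Object.keys(attributes)
    .sort()
    .map((key) => `${key}=${attributes[key]}`)
    .join("|");
  return crypto
    .createHash("sha256")
    .update(`${canonical}|ip=${ip}`)
    .digest("hex")
    .slice(0, 16); // shortened ID, similar in spirit to a cookie value
}

// Illustrative attribute values, several taken from the hit decoded earlier.
const browser = {
  userAgent: "Mozilla/5.0 (Mac OS)",
  language: "en-us",
  screen: "1680x1050",
  colorDepth: "24-bit",
  timezoneOffset: "300",
};

const before = fingerprintId(browser, "65.60.216.104");
// Clearing cookies changes none of these inputs, so the ID is unchanged.
const after = fingerprintId({ ...browser }, "65.60.216.104");
console.log(before === after); // true
```

The flip side is fragility: a browser update, a new monitor, or a different network changes the inputs and therefore the ID.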
BOTS
• About 50% of all traffic may be bots (48.5%, Incapsula 2015)
• Most of these don’t show up in GA (yet?), since most bots don’t execute the JavaScript tracking code
• The smaller the site, the higher the bot share (85% for sites with <1k visits/day)
flickr: skynoir, CC BY-NC 2.0
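Since most bots never run JavaScript, server logs are where they show up. A minimal sketch of estimating the share of self-identified bots from log user-agents; BOT_PATTERN is a hypothetical, deliberately small pattern, and bots that spoof a browser user-agent will slip straight through it.

```javascript
// Hypothetical pattern matching only bots that identify themselves.
const BOT_PATTERN = /bot|crawl|spider|slurp|curl|wget|python-requests/i;

// Fraction of user-agent strings that match the bot pattern.
function botShare(userAgents) {
  if (userAgents.length === 0) return 0;
  const bots = userAgents.filter((ua) => BOT_PATTERN.test(ua)).length;
  return bots / userAgents.length;
}

const sample = [
  "Mozilla/5.0 (Mac OS)",
  "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
  "curl/7.43.0",
  "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
];

console.log(botShare(sample)); // 0.75
```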
ANALYTICS SPAM
• free-social-buttons.biz, top-seo-blah-blah-blah.com, number-one-analytics.fail
• A way to get traffic, SEO, and lulz since before 2009
• Not GA-specific; GA is just the #1 target
• Two kinds: Crawler (actually visits your site) & Ghost (never visits; sends hits straight to GA)
WHO’S SPAMMING US TODAY?
List of 2016 GA spammers from Analytics Edge
Google is blocking offenders, but often not quickly.
WHY IS IT SO PREVALENT?
“Ghost” version via Measurement Protocol abuse:
$ curl "https://www.google-analytics.com/collect?v=1&t=pageview&tid=UA-XXXX-X&cid=fa0c8140-eef8-47c5-a244-b4c60cf46f74&dr=http%3A%2F%2Fmyspamsite.pizza&dp=%2Fhome"
Spammers just iterate through UA-XXXX-1 tracking IDs, varying XXXX.
HOW DO I FIX IT?
• Filters for new traffic, segments for historical data
• Tool available on my site: quantable.com/spamfilter
• For a new site, use a property tracking ID higher than UA-XXXX-1 (e.g. UA-XXXX-2)
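A common filtering approach (not necessarily what the tool above does) combines an include rule on valid hostnames with an exclude rule on known spam referrers. A minimal sketch of that logic applied to exported report rows; the hostnames, spam pattern, and row shape are illustrative assumptions, and the "(not set)" hostname reflects how ghost hits without a hostname typically appear.

```javascript
// Hypothetical configuration: replace with your own hostnames and spam domains.
const VALID_HOSTNAMES = ["www.quantable.com", "quantable.com"];
const SPAM_REFERRERS = /free-social-buttons\.biz|top-seo-blah-blah-blah\.com|number-one-analytics\.fail/i;

// Ghost hits never touch your site, so they usually carry a missing or foreign
// hostname; crawler spam does visit, so it is caught by the referrer rule instead.
function isSpam(row) {
  const hostOk = VALID_HOSTNAMES.includes(row.hostname);
  const referrerSpam = SPAM_REFERRERS.test(row.referrer || "");
  return !hostOk || referrerSpam;
}

const rows = [
  { hostname: "www.quantable.com", referrer: "https://www.google.com/" },
  { hostname: "(not set)", referrer: "http://myspamsite.pizza" },
  { hostname: "www.quantable.com", referrer: "http://free-social-buttons.biz/" },
];

const clean = rows.filter((row) => !isSpam(row));
console.log(clean.length); // 1
```

The hostname rule is the one that holds up over time, since it doesn’t depend on keeping a spammer list current.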
AD BLOCKING IS MAKING SOME OF OUR USERS DISAPPEAR
• Blockers such as AdBlock Plus, Ghostery, uBlock Origin, and Purify can block analytics tools, not just ads
• ABP has the largest install base (300M downloads)
• These users are still in your server logs, but may never show up in your web analytics
HOW DOES THE BLOCKING WORK?
• Long lists of URL patterns to block from loading, e.g.:
  google-analytics.com/analytics.js
  /piwik.php
  ?[AQB]&ndh=1&t=
  com/0.gif?
• EasyPrivacy list (used by ABP and others) is over 10,000 lines long and very actively maintained
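At its simplest, a blocker checks each outgoing request URL against such patterns before letting it load. A minimal sketch treating the example patterns as plain substrings; this is a simplification, since real Adblock Plus/EasyPrivacy filters also use anchors (||, |), separators (^), wildcards (*) and $ options, none of which this sketch handles. isBlocked is a hypothetical helper.

```javascript
// Example patterns from the slide, treated as plain substrings.
const BLOCK_PATTERNS = [
  "google-analytics.com/analytics.js",
  "/piwik.php",
  "?[AQB]&ndh=1&t=",
  "com/0.gif?",
];

// Hypothetical helper: true if any pattern appears anywhere in the URL.
function isBlocked(url) {
  return BLOCK_PATTERNS.some((pattern) => url.includes(pattern));
}

console.log(isBlocked("https://www.google-analytics.com/analytics.js")); // true
console.log(isBlocked("https://stats.example.com/piwik.php?idsite=1")); // true
console.log(isBlocked("https://www.example.com/app.js")); // false
```

Blocking the first request (the library) is enough: with no analytics.js, the second request carrying the measurement is never made.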
HOW DO I COUNT BLOCKERS?
• Can’t really be “fixed” client-side
• They still show up server-side, so server logs can be compared against GA
• Working around blockers may be against GA’s terms (you can’t circumvent the Opt-Out Add-on)
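Comparing the two sources is simple arithmetic once the server-side count has had the other server-only items removed (bots, HTTP errors, pages without the tag). A minimal sketch; the counts are hypothetical, and any gap left over still mixes blockers with other causes such as JavaScript being disabled.

```javascript
// Share of server-side pageviews missing from GA for the same pages and period.
// Assumes serverPageviews has already been cleaned of bots, errors, and untagged pages.
function blockedShare(serverPageviews, gaPageviews) {
  if (serverPageviews <= 0) throw new Error("serverPageviews must be positive");
  // Clamp at 0: GA can exceed server counts if the two are filtered differently.
  return Math.max(0, (serverPageviews - gaPageviews) / serverPageviews);
}

console.log(blockedShare(10000, 9000)); // 0.1
```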
THANKS! Slides & recap to be posted at cbuswaw.com
References & Further Reading
Quantable GA Blocking Analysis: https://www.quantable.com/analytics/how-many-users-block-google-analytics/
GA Tracking Code walkthrough: http://code.stephenmorley.org/javascript/understanding-the-google-analytics-tracking-code/
GA Measurement Protocol Hit Builder: https://ga-dev-tools.appspot.com/hit-builder/
Fingerprintjs2: http://valve.github.io/fingerprintjs2/
Incapsula 2015 Bot Report: https://www.incapsula.com/blog/bot-traffic-report-2015.html
Analytics Edge’s Guide to GA Spam: http://help.analyticsedge.com/spam-filter/definitive-guide-to-removing-google-analytics-spam/