Frontend at Scale - The Tumblr Story

DESCRIPTION

Growing to become one of the largest sites on the Internet comes with a unique set of problems. Learning how to adapt, and doing so without losing sight of content creators' voices, proves tricky. This talk details some of the frontend tools we've built and approaches we've taken to serve our millions of users at scale.

Frontend at Scale: the Tumblr story

What is Tumblr?

→ A platform for you to express yourself

→ ~200 million blogs

→ 83+ billion posts

→ HQ in NYC

→ Founded in 2007

→ 100+ engineers

What is Tumblr?

→ Three ways to surface content:

→ The dashboard

→ Search

→ Blog network

(Example: http://16-bitch.tumblr.com/)

Who am I?

→ Chris Miller

→ Product Engineering Manager

→ Content Consumption (a.k.a., The Dashboard)

Our stack

→ Frontend

→ Backbone (+ lodash, underscore, etc.)

→ jQuery (+ some plugins)

→ SASS (+ Bourbon)

→ a bit of VelocityJS

→ Gulp for build

Our stack

→ Backend

→ PHP application layer

→ Some specialized services (Scala, C, etc.)

→ Data: MySQL, Redis, memcache, HDFS

How does it work?

→ Thousands of servers

→ Deploy dozens of times per day

→ Monitor and measure everything

→ Hadoop

→ OpenTSDB (backed by HBase)

Our process

→ Teams are small

→ Iterate quickly

→ Release early and often, usually to a percentage of users

→ Two code-review “OKs” required for every pull request

Feature Flagging

What is it?

→ Segment your users so that only some of them see certain features

→ Control who sees what (and when)

Implementation

→ Server-side feature flagging

→ Client-side feature flagging
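
To make the server-side/client-side split concrete, here is a minimal sketch, assuming a hypothetical setup in which the server evaluates every flag for the current user and embeds only the resulting booleans in the page for client code to read. `FlagRule`, `evaluateFlags`, `serializeForClient`, and `ClientFlags` are illustrative names, not Tumblr's actual code.

```typescript
// Hypothetical flag definition; the fields are illustrative, not Tumblr's schema.
interface FlagRule {
  enabled: boolean;   // global on/off for the flag
  staffOnly: boolean; // limit the flag to staff accounts
}

interface User {
  id: string;
  isStaff: boolean;
}

// Server side: evaluate every flag for this user once per request.
function evaluateFlags(rules: Record<string, FlagRule>, user: User): Record<string, boolean> {
  const result: Record<string, boolean> = {};
  const names = Object.keys(rules);
  for (let i = 0; i < names.length; i++) {
    const rule = rules[names[i]];
    result[names[i]] = rule.enabled && (!rule.staffOnly || user.isStaff);
  }
  return result;
}

// Server side: serialize only the evaluated booleans, so the rules themselves
// never reach the browser. Escaping "<" stops a string such as "</script>"
// from closing the inline script tag the JSON is embedded in.
function serializeForClient(flags: Record<string, boolean>): string {
  return JSON.stringify(flags).replace(/</g, "\\u003c");
}

// Client side: read the embedded flags; any flag the server did not send is off.
class ClientFlags {
  private readonly flags: Record<string, boolean>;

  constructor(json: string) {
    this.flags = JSON.parse(json) as Record<string, boolean>;
  }

  isEnabled(name: string): boolean {
    return this.flags[name] === true;
  }
}
```

Because client code only ever sees the result the server already computed, a server-side check and a client-side check of the same flag should agree within a single page load.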

Usage

→ Provides:

→ A/B testing

→ Run beta code alongside production code

→ Kill switch

A/B Testing

→ Injected recommendations

→ A/B/n testing of their position

→ Which position is the best? Why?
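
One common way to run an A/B/n test like this is deterministic bucketing: hash a user ID together with the experiment name and use the result to pick a variant, so each user sees the same position on every visit. The sketch below assumes that approach; `assignPosition`, the experiment name, and the hash choice (FNV-1a followed by MurmurHash3's 32-bit finalizer) are illustrative assumptions, with positions 2 through 9 taken from the results slide.

```typescript
// Candidate slots for an injected recommendation (one variant per position).
const POSITIONS: number[] = [2, 3, 4, 5, 6, 7, 8, 9];

// Non-cryptographic 32-bit string hash: FNV-1a, then MurmurHash3's fmix32
// finalizer so the low bits that `%` depends on are well mixed.
function hash32(input: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0; // as an unsigned 32-bit integer
}

// The same user always lands in the same variant for a given experiment;
// salting with the experiment name keeps one experiment's split from
// mirroring another's.
function assignPosition(experiment: string, userId: string): number {
  return POSITIONS[hash32(experiment + ":" + userId) % POSITIONS.length];
}
```

Comparing a metric (for example, engagement with the recommendation) across the eight groups is then what answers "which position is the best?"; the "why" still takes human analysis.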

A/B Test Results

[Chart: injected-recommendation results by position, Positions 2 through 9]

Ramping & Kill Switch

→ Ramping new features

→ Deploy to only “admin” (staff)

→ …then 1% of users… then 5%… 10%… 25%…

→ Kill switch

→ Completely turn off a feature that’s breaking the site… poof
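
Ramping and the kill switch can share one check. This is a minimal sketch, assuming a hypothetical per-feature config with a `percent` field (0 meaning staff only) and a `killed` flag. Bucketing hashes the feature name with the user ID so each user keeps a stable bucket from 0 to 99; the names `RampConfig`, `rampBucket`, and `isFeatureOn` are illustrative.

```typescript
// Hypothetical ramp configuration for one feature.
interface RampConfig {
  killed: boolean; // kill switch: true turns the feature off for everyone
  percent: number; // 0 = staff only; then 1, 5, 10, 25, ... 100 = everyone
}

interface User {
  id: string;
  isStaff: boolean;
}

// Non-cryptographic 32-bit hash: FNV-1a plus MurmurHash3's fmix32 finalizer.
function hash32(input: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0;
}

// Stable bucket in [0, 100) per (feature, user). Because a user's bucket never
// changes, raising percent only adds users; nobody loses the feature mid-ramp.
function rampBucket(feature: string, userId: string): number {
  return hash32(feature + ":" + userId) % 100;
}

function isFeatureOn(feature: string, config: RampConfig, user: User): boolean {
  if (config.killed) return false; // the kill switch overrides every other rule
  if (user.isStaff) return true;   // staff ("admin") see it from the first stage on
  return rampBucket(feature, user.id) < config.percent;
}
```

Keeping the kill switch as the first check means flipping one field turns the feature off regardless of how far the ramp has progressed.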

Use Carefully

→ Feature flagging certain functionality can give a mixed experience

→ Can cause user confusion:

→ “Why does my mom see this and I don’t?” — Confused teenager

→ Easy to build complex dependencies — don’t

Error Logging

Launching Features

→ New features usually have bugs

→ (Well, not my code)

→ (just kidding)

→ Server-side errors: easy to find

→ Client-side errors: also easy to find…

→ …in my browser

→ Client-side errors in your browser: not easy to find

→ …until recently

Capture Errors

→ We built exceptions.js

→ Really, it’s just a window.onerror handler

→ Build it dependency-free

→ Build it to be defensive
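
A dependency-free, defensive `window.onerror` logger could look roughly like the sketch below. This is not exceptions.js itself; the report fields, the `maxReports` cap, and the `send` callback are assumptions. To keep it testable outside a browser, it takes any object with an `onerror` property; in a page that object would be `window`.

```typescript
// Shape of a window.onerror-style handler: (message, source, lineno, colno, error).
type ErrorHandler = (
  message: unknown,
  source?: string,
  lineno?: number,
  colno?: number,
  error?: Error
) => unknown;

// Anything with an onerror slot; in a browser this is `window`.
interface ErrorTarget {
  onerror: ErrorHandler | null;
}

// Hypothetical report format sent to the logging backend.
interface ErrorReport {
  message: string;
  source: string;
  line: number;
  column: number;
  stack: string;
}

function installErrorLogger(
  target: ErrorTarget,
  send: (report: ErrorReport) => void,
  maxReports: number = 10
): void {
  const previous = target.onerror; // keep whatever handler was there before us
  let sent = 0;

  target.onerror = function (message, source, lineno, colno, error) {
    try {
      // Cap reports per page so an error thrown in a loop cannot flood the backend.
      if (sent < maxReports) {
        sent++;
        send({
          message: String(message),
          source: source || "",
          line: lineno || 0,
          column: colno || 0,
          stack: error && error.stack ? String(error.stack) : "",
        });
      }
    } catch {
      // The logger must never throw from inside the error handler itself.
    }
    // Chain to the previous handler and return its result, so its decision
    // about suppressing the browser's default error report still applies.
    return previous ? previous.call(target, message, source, lineno, colno, error) : false;
  };
}
```

In a browser, `send` might serialize the report and hand it to `navigator.sendBeacon` with a (hypothetical) logging URL, which avoids pulling in any AJAX library.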

→ Where the logs end up matters less than how you use them

→ We log errors to Scribe…

→ …throw them into Hadoop

→ …and count frequency with OpenTSDB
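
The counting step can be sketched as a rollup of raw error events into per-minute counts, written out in the JSON datapoint shape that OpenTSDB's HTTP `/api/put` endpoint accepts (`metric`, `timestamp`, `value`, `tags`). This assumes hypothetical event fields (`timestampMs`, `page`) and a made-up metric name; it is not a description of how Tumblr's Scribe/Hadoop pipeline actually performs the aggregation.

```typescript
// One client-side error as it might come out of the log pipeline (hypothetical fields).
interface LoggedError {
  timestampMs: number; // when the error happened, in Unix epoch milliseconds
  page: string;        // which part of the site it came from
}

// A datapoint in the JSON shape accepted by OpenTSDB's /api/put endpoint.
interface OpenTsdbPoint {
  metric: string;
  timestamp: number;            // Unix epoch seconds
  value: number;
  tags: Record<string, string>; // OpenTSDB requires at least one tag
}

// Keep only ASCII characters OpenTSDB allows in tag values; replace the rest with "_".
function toTagValue(raw: string): string {
  const cleaned = raw.replace(/[^A-Za-z0-9._\/-]/g, "_");
  return cleaned.length > 0 ? cleaned : "unknown";
}

// Count errors per (minute, page) and emit one datapoint per bucket.
function toOpenTsdbPoints(errors: LoggedError[], metric: string): OpenTsdbPoint[] {
  const counts: Record<string, number> = {};
  for (let i = 0; i < errors.length; i++) {
    const minute = Math.floor(errors[i].timestampMs / 60000) * 60; // seconds, minute-aligned
    const key = minute + "|" + toTagValue(errors[i].page);
    counts[key] = (counts[key] || 0) + 1;
  }

  const points: OpenTsdbPoint[] = [];
  const keys = Object.keys(counts);
  for (let i = 0; i < keys.length; i++) {
    const sep = keys[i].indexOf("|"); // the minute is numeric, so the first "|" splits the key
    points.push({
      metric: metric,
      timestamp: Number(keys[i].slice(0, sep)),
      value: counts[keys[i]],
      tags: { page: keys[i].slice(sep + 1) },
    });
  }
  return points;
}
```

The resulting array could then be POSTed as JSON to `/api/put`, after which OpenTSDB can graph the error rate per page over time.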

Error Data

→ With Hive, we can query Hadoop

→ With this, I can see we log around 1.4 million errors per day

→ With OpenTSDB, we can plot the frequency of logged errors

We Love Graphs

→ We make pretty graphs with OpenTSDB and graph everything

Getting it Right

→ Sometimes we find errors before our users do.

→ Sometimes.

→ And it makes us feel good.

→ So we dance.

Thank You

Email: cmiller@tumblr.com

Follow me: ee99ee.com