Stig Workshop Architect of Scalable Infrastructure - Jason Lucas

Wed 1315 lucas_jason_color

A graph of user session analytics

Session Analytics Graph

A sample of session flows stored in Stig:

  <Source, 'generated', Clickthrough>
  <Clickthrough, 'clicked_by', User>
  <Clickthrough, 'served', Page>

Inferred Edges

We can define rules for edges that are not stored in the graph but are inferred from the edges that are. For example, we can infer that a person requested a page:

  <person, 'requested', page> = walk clickthrough:
    [ <clickthrough, 'clicked_by', person>;
      <clickthrough, 'served', page> ];

Or that one page was reached from another:

  <page1, 'clicked_to', page2> = walk clickthrough:
    [ <page1, 'generated', clickthrough>;
      <clickthrough, 'served', page2> ];
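The two inference rules above can be mimicked outside Stig with a small Python sketch. It assumes edges are plain (subject, label, object) tuples and that an inferred edge is produced once per matching clickthrough; the sample nodes (`home`, `ct1`, `alice`, and so on) and the helper names are hypothetical, not part of Stig.

```python
from collections import defaultdict

# Hypothetical sample session data as (subject, label, object) triples.
EDGES = {
    ("home", "generated", "ct1"), ("ct1", "clicked_by", "alice"), ("ct1", "served", "games"),
    ("home", "generated", "ct2"), ("ct2", "clicked_by", "alice"), ("ct2", "served", "games"),
    ("games", "generated", "ct3"), ("ct3", "clicked_by", "alice"), ("ct3", "served", "pets"),
}


def objects_by_subject(edges, label):
    """Index the edges carrying `label` as subject -> list of objects."""
    index = defaultdict(list)
    for subject, lbl, obj in edges:
        if lbl == label:
            index[subject].append(obj)
    return index


def infer_requested(edges):
    """<person, 'requested', page>: one result per clickthrough clicked by person that served page."""
    served = objects_by_subject(edges, "served")
    result = []
    for ct, lbl, person in edges:
        if lbl == "clicked_by":
            for page in served.get(ct, []):
                result.append((person, "requested", page))
    return sorted(result)


def infer_clicked_to(edges):
    """<page1, 'clicked_to', page2>: one result per clickthrough generated on page1 that served page2."""
    served = objects_by_subject(edges, "served")
    result = []
    for page1, lbl, ct in edges:
        if lbl == "generated":
            for page2 in served.get(ct, []):
                result.append((page1, "clicked_to", page2))
    return sorted(result)
```

Returning lists rather than sets keeps duplicates, so Alice's two trips from the home page to Games count twice; that multiplicity is what a counting function over these edges would need.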

Queries of Interest

•  How often does a person visit the same page?
•  How often is such a visit an inconvenient clickthrough or redirect?
•  Do people visit a given page more often at a certain time of day?
•  Does their IP address (home or work PC) influence their usage patterns?
•  Do friends’ usage patterns on the site influence this specific user? (e.g., a friend started playing a new game)
•  General usage statistics.
•  Have links on our site been moved or removed in such a way that users can’t find the features they want to use?
•  The biggest question is…

How can we improve our users’ experience?

User Experience Optimization

•  Find common usage patterns in a specific user’s session flow.
•  Look for inconveniences within this flow, experiment with removing them, and acquire user feedback.
  –  For example, a user who comes to our site to play a specific game and has to click 3-4 times to reach that game’s main page.
•  How can we identify users whom we want to put in an experimental flow?
  –  A simple identifier: 90% of their flows lead to a particular feature.
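One way to apply the 90% identifier is sketched below in Python. It assumes each session flow has already been reduced to a hypothetical (user, feature) pair and requires the share to exceed the threshold strictly, as the `> 0.9` rule in this deck does; the function name and data shape are illustrative only.

```python
from collections import Counter, defaultdict


def experiment_candidates(flows, threshold=0.9):
    """Return {user: feature} for users whose flows lead to one feature more than `threshold` of the time.

    `flows` is a hypothetical list of (user, feature) pairs, one per session flow.
    """
    per_user = defaultdict(Counter)
    for user, feature in flows:
        per_user[user][feature] += 1
    candidates = {}
    for user, counts in per_user.items():
        feature, hits = counts.most_common(1)[0]  # the user's most frequent destination
        if hits / sum(counts.values()) > threshold:
            candidates[user] = feature
    return candidates
```

With a strict comparison, a user who reaches a feature exactly 90% of the time is not selected; lowering `threshold` widens the experimental group.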

Simple Stig Functions

How many times did Page A lead to Page B?

  numClicksFromTo source dest =
    count (solve : [ <source, 'clicked_to', dest> ]);

How many times was Page A served?

  numServes dest =
    count (solve (clickthrough) : [ <clickthrough, 'served', dest> ]);

How many times did a particular user access Page A?

  numUserRequestsPage person page =
    count (solve : [ <person, 'requested', page> ]);

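How many times did this particular user go from Page A to Page B? The user, the source page, and the destination page must all belong to the same clickthrough:

  numUserRequestsPageFrom person source page =
    count (solve (clickthrough) : [ <source, 'generated', clickthrough>;
                                    <clickthrough, 'clicked_by', person>;
                                    <clickthrough, 'served', page> ]);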
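For readers more at home in a general-purpose language, here is a Python sketch of the same four counts over raw triples. It assumes every clickthrough node has at most one `generated`, `clicked_by`, and `served` edge, and it counts each clickthrough once, so the last function ties the user, source page, and destination page to the same click. The snake_case names and sample data are hypothetical counterparts, not Stig functions.

```python
from collections import defaultdict

# Hypothetical sample data: (subject, label, object) triples.
EDGES = [
    ("home", "generated", "ct1"), ("ct1", "clicked_by", "alice"), ("ct1", "served", "games"),
    ("home", "generated", "ct2"), ("ct2", "clicked_by", "bob"), ("ct2", "served", "games"),
    ("games", "generated", "ct3"), ("ct3", "clicked_by", "alice"), ("ct3", "served", "pets"),
]


def _clickthroughs(edges):
    """Rebuild one record per clickthrough: {ct: {'source': ..., 'user': ..., 'page': ...}}."""
    cts = defaultdict(dict)
    for subject, label, obj in edges:
        if label == "generated":
            cts[obj]["source"] = subject
        elif label == "clicked_by":
            cts[subject]["user"] = obj
        elif label == "served":
            cts[subject]["page"] = obj
    return cts


def num_clicks_from_to(edges, source, dest):
    return sum(1 for c in _clickthroughs(edges).values()
               if c.get("source") == source and c.get("page") == dest)


def num_serves(edges, dest):
    return sum(1 for _, label, obj in edges if label == "served" and obj == dest)


def num_user_requests_page(edges, person, page):
    return sum(1 for c in _clickthroughs(edges).values()
               if c.get("user") == person and c.get("page") == page)


def num_user_requests_page_from(edges, person, source, page):
    # All three conditions are checked on the same clickthrough record.
    return sum(1 for c in _clickthroughs(edges).values()
               if c.get("user") == person and c.get("source") == source and c.get("page") == page)
```

Rebuilding each clickthrough as one record is a single-machine convenience; a distributed engine would instead evaluate the pattern where the edges live.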

Let’s take a look at some sample user session flows…

Alice loves to play Pets!

“Alice comes to our site to play Pets. She always hits the home page and has to click through to her favorite game. Plenty of room for improvement here.”

Bob interacts a lot with his newsfeed

“Bob likes to go straight to the newsfeed and expand or like people’s comments. Can we improve his experience by expanding his favorite comments or displaying the newsfeed on his home page?”

Carol receives an e-mail about a specific newsfeed post

“Carol received an e-mail notification that her friend mentioned her in a newsfeed comment. She had to click expand on the comments to see it. We should have known better!”

Dave checks his Cafe thanks to an e-mail notification

“Dave received an e-mail notification telling him his cookers were ready in Cafe. When he clicked the link in the e-mail, it took him to the home page! Now he has to click on his alert, get redirected to the Cafe game, and then start playing.”

“What about Alice?”

•  Alice is identified as a candidate for our user experience optimization study.
•  Let’s look at Alice’s patterns…
•  Alice goes from the home page straight to Games 90% of the time, so we want to just give her the Games page when she visits our site.
•  So we’ll be adding her to our experimental user base.
•  Also, we want to let her give us a “Thumbs Up!” or “Thumbs Down!” on the new flow.
•  How can we make the database do this?

Modifying the User Flow In Real Time

We update the database to set Alice’s status as participating in the user flow experiment:

  when ((numUserRequestsPageFrom 'Alice' 'Home Page' 'Games Page' /
         numUserRequestsPage 'Alice' 'Home Page') > 0.9)
  do {
    FlowExperiment@'Alice' := {
      running = True;
      startDate = Now;
      HasVoted = False;
    };
    <'Alice', 'participates_in_experiment', 'Games Page'> := True;
  };

We also add an edge to the graph showing that Alice is part of the Games Page experiment; we can find all the users in the experiment by collecting the 'participates_in_experiment' edges that point to the 'Games Page' node.
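The enrollment rule can be approximated in Python as a one-shot check. This is a sketch under assumptions: the two counts are passed in rather than computed, records and edges live in hypothetical in-memory structures, and unlike a Stig `when` rule it does not re-fire as new data arrives. It also guards against division by zero for a user who never visited the source page, a case the Stig rule leaves implicit.

```python
from datetime import datetime, timezone

flow_experiment = {}  # hypothetical stand-in for the FlowExperiment@user records
edges = set()         # (subject, label, object) triples


def maybe_enroll(user, from_to_count, from_count, feature, threshold=0.9):
    """Enroll `user` if from_to_count / from_count > threshold; return True if enrolled."""
    # A user who never visited the source page cannot qualify (and would divide by zero).
    if from_count == 0 or from_to_count / from_count <= threshold:
        return False
    flow_experiment[user] = {
        "running": True,
        "startDate": datetime.now(timezone.utc),
        "HasVoted": False,
    }
    edges.add((user, "participates_in_experiment", feature))
    return True


def participants(feature):
    """Find every user in an experiment by following the edges that point at its node."""
    return sorted(subject for subject, label, obj in edges
                  if label == "participates_in_experiment" and obj == feature)
```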

Analyzing the New Flow

•  Alice gets a friendly pop-up asking how she likes the new flow, and whether to keep it.
  –  Thumbs Up! -> Keep the new structure! I love it!
  –  Thumbs Down! -> Give me back my old flow!

•  We can update the database with simple Stig functions:

  thumbsUpUpdate user = do {
    FlowExperiment@user := { HasVoted = True; };
    ThumbsUp@'Flow Experiment' += 1;
  };

  thumbsDownUpdate user = do {
    FlowExperiment@user := { HasVoted = True;
                             running = False; };
    ThumbsDown@'Flow Experiment' += 1;
  };
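A Python sketch of the two vote handlers, assuming that `FlowExperiment@user := { ... }` updates only the fields it names (the deck does not say whether it merges into or replaces the record). The dictionaries and function names are hypothetical stand-ins for the database and the Stig functions.

```python
from collections import Counter

# Hypothetical in-memory stand-ins for FlowExperiment@user and the vote totals.
flow_experiment = {
    "Alice": {"running": True, "HasVoted": False},
    "Bob": {"running": True, "HasVoted": False},
}
votes = Counter()


def thumbs_up_update(user):
    # Only HasVoted changes; the new flow keeps running.
    flow_experiment[user]["HasVoted"] = True
    votes["ThumbsUp"] += 1


def thumbs_down_update(user):
    # Record the vote and switch the user back to the old flow.
    flow_experiment[user].update(HasVoted=True, running=False)
    votes["ThumbsDown"] += 1
```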

We Snuck Something in There: Improving Update Concurrency

•  x += 1 is better than x = x + 1: the increment states a change without reading x first, so concurrent updates need not conflict.
•  We can take in many thumb updates, and only need to evaluate the ThumbsUp or ThumbsDown total when we need it.
•  Common terminology:
  –  Database theorists would call this a Field Call
  –  Escrowing
  –  Write without read
  –  Commutative operations

Don’t ask for things you don’t need! “If you don’t care about the result, don’t make Stig compute it.”
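To illustrate write-without-read, the Python sketch below keeps a list of pending deltas and folds them into the total only when `value()` is called. `DeltaCounter` is a hypothetical name, not a Stig construct; a lock guards the append, but no writer ever reads the current total, which is the property that lets increments commute.

```python
import threading


class DeltaCounter:
    """Accumulates commutative += deltas; the total is computed only when read."""

    def __init__(self, base=0):
        self._base = base
        self._pending = []
        self._lock = threading.Lock()

    def add(self, delta):
        # Write without read: record the delta, never look at the current total.
        with self._lock:
            self._pending.append(delta)

    def value(self):
        # Fold pending deltas into the base only when someone actually asks.
        with self._lock:
            self._base += sum(self._pending)
            self._pending.clear()
            return self._base
```

By contrast, `x = x + 1` reads `x` before writing it, so two concurrent writers that read the same old value can lose an update unless the database serializes them.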

Calling Stig

•  Our client API is available from:
  –  Python
  –  PHP
  –  Perl
  –  Java
  –  C / C++
  –  and we can serve HTTP directly
•  Our focus is the web:
  –  Almost all calls are asynchronous and return futures
  –  Sessions are durable and progress while you’re not connected
  –  Most interface objects have a Time To Live
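The futures-based calling style can be illustrated with Python's standard `concurrent.futures`. `fake_count_query` and its numbers are hypothetical stand-ins for a remote Stig call; the actual client APIs are not shown in this deck, so this only demonstrates the pattern of issuing calls first and blocking on results later.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_count_query(page):
    """Hypothetical stand-in for a remote call such as numServes; not a real Stig client API."""
    time.sleep(0.05)  # simulate network and cluster latency
    return {"Home Page": 120, "Games Page": 95}.get(page, 0)


with ThreadPoolExecutor(max_workers=4) as pool:
    # submit() returns a Future immediately, so all three queries run concurrently.
    futures = {page: pool.submit(fake_count_query, page)
               for page in ["Home Page", "Games Page", "Cafe"]}
    # ...a web handler could render other parts of the page here...
    # result() blocks only when the value is actually needed.
    serves = {page: future.result(timeout=5) for page, future in futures.items()}
```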

Sessions

•  Sessions are durable.
  –  You can close a session and re-open it later.
  –  This is to facilitate HTTP.
  –  Access is controlled by a security token.
  –  Sessions eventually die of old age if left alone.
•  Sessions have synthetic nodes in the graph.
  –  Use the session’s node to store session-specific data.
•  Sessions are replicated.
  –  If a session server goes down, one of its backups will take over.
  –  This might require your client to restart its cursors.
•  Progress happens when you’re not looking.
  –  Queries and updates continue to make progress, even when a session is closed.
  –  Notifications accumulate in your in-box and are waiting for you when you re-open.
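The session behavior described above (token-based access, expiry after inactivity, and an in-box that fills while the client is away) can be sketched as an in-memory Python class. `SessionStore` and its methods are hypothetical; replication and cursors are not modeled, and the clock is injectable so expiry can be tested.

```python
import secrets
import time


class SessionStore:
    """Hypothetical in-memory model of durable, token-protected sessions."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._sessions = {}  # token -> {"last_seen": float, "inbox": list, "data": dict}

    def open(self):
        """Create a session and return the security token that grants access to it."""
        token = secrets.token_hex(16)
        self._sessions[token] = {"last_seen": self.clock(), "inbox": [], "data": {}}
        return token

    def _get(self, token):
        session = self._sessions.get(token)
        if session is None or self.clock() - session["last_seen"] > self.ttl:
            self._sessions.pop(token, None)  # unknown token, or the session died of old age
            raise KeyError("unknown or expired session")
        return session

    def notify(self, token, message):
        """Called by background work; results land in the in-box even while the client is away."""
        session = self._sessions.get(token)
        if session is not None:
            session["inbox"].append(message)

    def data(self, token):
        """Session-specific storage, standing in for the session's synthetic node."""
        return self._get(token)["data"]

    def reopen(self, token):
        """Re-attach with the token, refresh the TTL, and drain accumulated notifications."""
        session = self._get(token)
        session["last_seen"] = self.clock()
        inbox, session["inbox"] = session["inbox"], []
        return inbox
```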

Writing in Stig

•  Stig is a compiled language.
  –  What we can do through analysis, most databases have to do through data-definition languages and DBA tweaking.
  –  Feedback from the compiler is not just about program correctness but also about expected performance.
  –  Application programmers become aware of scaling problems before they happen, but are not required to be scalability engineers in order to fix them.
•  Stig is a programming language, not a query language.
  –  Stig harnesses the computation power of the cluster, not just its storage capacity.
  –  The more of your program is written in Stig, the more you can take advantage of distributed evaluation.
  –  Stig programs are stored in the database and can call each other, enabling a strategy of library development.
•  Stig marries logical pattern matching and functional evaluation.
  –  That is: search + computation = Stig.

Logical Pattern Matching

•  Writing a pattern:
  –  walk person, friend, hobby, site:
       [ <person, 'is friend of', friend>;
         <friend, 'has interest in', hobby>;
         <hobby, 'is advertised on', site> ];
•  Yields a sequence:
  –  [ { person=/users/alice, friend=/users/bob, hobby=/subjects/gardening, site="http://www.plantastic.com" },
       { person=/users/bob, friend=/users/carol, hobby=/subjects/yoga, site="http://lowfatyoga.com" } ]
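A minimal Python sketch of conjunctive pattern matching: each triple pattern is matched against the edges in turn, and variable bindings are carried forward by backtracking. The `?name` convention for variables is an assumption of this sketch, not Stig syntax, and the edge list is hypothetical data implied by the example. Run on those edges, it yields the same two bindings shown above.

```python
def is_var(term):
    """In this sketch, pattern variables are strings starting with '?'."""
    return isinstance(term, str) and term.startswith("?")


def match(pattern, edges, binding=None):
    """Yield every binding of ?variables that satisfies all triple patterns."""
    binding = binding or {}
    if not pattern:
        yield dict(binding)
        return
    first, rest = pattern[0], pattern[1:]
    for edge in edges:
        new = dict(binding)
        ok = True
        for term, value in zip(first, edge):
            if is_var(term):
                # Bind a fresh variable, or check consistency with an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, edges, new)


EDGES = [
    ("/users/alice", "is friend of", "/users/bob"),
    ("/users/bob", "has interest in", "/subjects/gardening"),
    ("/subjects/gardening", "is advertised on", "http://www.plantastic.com"),
    ("/users/bob", "is friend of", "/users/carol"),
    ("/users/carol", "has interest in", "/subjects/yoga"),
    ("/subjects/yoga", "is advertised on", "http://lowfatyoga.com"),
]
PATTERN = [
    ("?person", "is friend of", "?friend"),
    ("?friend", "has interest in", "?hobby"),
    ("?hobby", "is advertised on", "?site"),
]
results = list(match(PATTERN, EDGES))
```

This nested-loop search scans every edge per pattern; it shows the semantics only, whereas an engine would use indexes and evaluate the walk across the cluster.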

Composing Sequences for Distributed Evaluation

•  chain()      // concatenate sequences
•  collect()    // collect a sequence into a list
•  reverse()    // collect a sequence into a list in reverse order
•  map()        // apply an arity-1 function to each element in a sequence, yielding a sequence
•  reduce()     // apply an arity-2 function to each element in a sequence, yielding a scalar
•  zip()        // convert a tuple of sequences into a sequence of tuples
•  group()      // collect a sequence into a sequence of lists
•  enumerate()  // convert a sequence into a sequence of lists with ordinals
•  sort()       // convert a sequence into a sorted list
•  filter()     // convert a sequence into a sequence with some elements filtered out
•  count()      // count the number of elements in a sequence, yielding an integer
•  product()    // yield as a sequence the Cartesian product of a tuple of sequences
•  range()      // yield as a sequence a range of integers
•  slice()      // slice a subsequence from a sequence
•  cycle()      // repeat a sequence over and over
•  select()     // filter elements from a sequence using a second sequence of true/false
•  group_by()   // collect a sequence into a set of subsequences by key
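Most of these combinators have close analogues in Python's `functools` and `itertools`; the sketch below maps several of them onto hypothetical click data. It runs on one machine and says nothing about how Stig distributes evaluation.

```python
from functools import reduce
from itertools import chain, compress, cycle, groupby, islice, product

clicks = [("alice", "games"), ("bob", "newsfeed"), ("alice", "games"), ("carol", "home")]
more = [("dave", "cafe")]

everything = list(chain(clicks, more))                    # chain()
pages = list(map(lambda c: c[1], everything))             # map()
collected = list(pages)                                   # collect()
backwards = list(reversed(pages))                         # reverse()
ordered = sorted(pages)                                   # sort()
games_only = list(filter(lambda p: p == "games", pages))  # filter()
total = reduce(lambda acc, _: acc + 1, pages, 0)          # reduce(), here acting as count()
# group_by(): itertools.groupby only groups adjacent items, so sort by key first.
by_user = {user: [page for _, page in group]
           for user, group in groupby(sorted(everything), key=lambda c: c[0])}
first_two = list(islice(pages, 2))                        # slice()
picked = list(compress(pages, [True, False, True, False, True]))  # select()
numbered = list(enumerate(pages))                         # enumerate()
pairs = list(zip(pages, range(len(pages))))               # zip() and range()
grid = list(product(["a", "b"], [1, 2]))                  # product()
repeated = list(islice(cycle(["x", "y"]), 5))             # cycle(), bounded by islice
```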

Pure Graph vs. Fat Graph

Pure Graph
•  Stores exactly one kind of data in each node.
•  Doesn’t usually support aggregate types (structures).
•  Tends to have lots of ‘has attribute’ edges, which makes its edges fuzzy and inefficient.

Fat Graph
•  Stores multiple kinds of data in each node.
•  Allows structures and can see into them.
•  Coalesces aggregated data into a single component, suitable for re-constitution as a program object.
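The difference can be made concrete in Python. In this sketch (hypothetical node IDs and field values), the pure graph needs one edge per attribute and a traversal per field to rebuild a user, while the fat graph stores the user as a single structured value that maps directly onto a program object.

```python
from dataclasses import dataclass

# Pure graph: each node holds a single scalar; attributes hang off 'has ...' edges.
pure_nodes = {"/users/alice": "user", "n1": "Alice", "n2": 31, "n3": "SF"}
pure_edges = [
    ("/users/alice", "has name", "n1"),
    ("/users/alice", "has age", "n2"),
    ("/users/alice", "has city", "n3"),
]


def load_user_pure(uid):
    # One edge traversal plus one node lookup per field to rebuild the object.
    return {label.split(" ", 1)[1]: pure_nodes[obj]
            for subj, label, obj in pure_edges if subj == uid}


# Fat graph: the node itself holds a structure the program can use directly.
@dataclass
class User:
    name: str
    age: int
    city: str


fat_nodes = {"/users/alice": User(name="Alice", age=31, city="SF")}


def load_user_fat(uid):
    # A single lookup returns the whole aggregate.
    return fat_nodes[uid]
```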

Evolving into Stig

•  Treat Stig like a filesystem
  –  Store unstructured information at locations.
•  Treat Stig like a KV store
  –  Nodes are essentially key-value stores, except that they are typed.
  –  Ignore edges and just access nodes by their IDs.
•  Treat Stig like SQL
  –  Simple walks are like table scans.
  –  Keep your data in separate, scalar fields and the results will look tabular.
•  Over time…
  –  Begin with edges as ad-hoc indices.
  –  Add more complex edge-driven behavior as you grow more comfortable.
  –  Evolve your schema freely by adding facets.
•  Even if you ignore everything, you still get:
  –  ACID-like guarantees at scale.
  –  A single path to data.
  –  Stand-alone development.

Moving Forward

•  Open Source
  –  We plan to offer Stig under an extremely liberal license (Apache).
  –  We are seeking engagement with the open source community and with other companies in this space.
  –  Introducing Yaakov, Stig’s voice in open source.
•  At our booth
  –  Copies of our decks
•  Find us at stigdb.org
  –  Sign up there for updates.
  –  The open source release is planned for Q4 2011.
•  Workshops
  –  At our office in SF
  –  See the website for the schedule.
•  Ideas?
  –  If you have the killer Stig app, we want to hear from you.