Smalltalk and Big Data - Avi Bryant

Smalltalk and Big Data
and the web
and stuff

Avi Bryant
Twitter

2004-2011

View

Controller

Model

Web Client

Web Server

Storage

Web Client

GET/POST → HTML (the same round trip, repeated for every page)

Web Client

GET/POST → HTML+JS (once), then XHR → JSON (repeated)

Ten days to implement the lexer, parser, bytecode emitter, interpreter, built-in classes, and decompiler.

Ten days without much sleep to build JS from scratch, "make it look like Java" (I made it look like C), and smuggle in its saving graces: first class functions (closures came later but were part of the plan), Self-ish prototypes (one per instance, not many as in Self).

I'll do better in the next life.

— Brendan Eich

Lars Bak

150M+ active users

Web Client

Web Server

Storage

Web Server

• Continuation-based flow control
• HTML generation
• Stateful UI components
• callbacks with unique IDs

CSRF

<img src="http://mail.google.com/mail/?logout" />

http://mail.google.com/mail/?logout&token=ab4367de

http://mail.google.com/seaside/mail?_k=ab4367de

Burn the disk packs

• No components, continuations, or canvas
• JSON builder w/ callbacks

Web Client

Web Server

Storage

Storage

Stone
Shared Page Cache (×4), each with Gem, Gem

=~

MySQL
Memcache (×4), each with Ruby, Ruby

Gem+SPC+Stone = Transparent Management

Ruby+Memcache+MySQL = Explicit Management
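One reading of "explicit management" is that application code itself decides when to consult the cache and when to go to the database. A minimal sketch of that pattern (often called cache-aside), with plain dicts standing in for Memcache and MySQL. `ExplicitStore` and its methods are hypothetical names, not part of any library mentioned in the talk:

```python
class ExplicitStore:
    """Cache-aside reads and writes, managed by hand in application code."""

    def __init__(self, database, cache):
        self.database = database  # stand-in for MySQL
        self.cache = cache        # stand-in for Memcache
        self.db_reads = 0         # counts how often the database is actually hit

    def get(self, key):
        if key in self.cache:         # 1. try the cache first
            return self.cache[key]
        self.db_reads += 1
        value = self.database[key]    # 2. on a miss, read the database
        self.cache[key] = value       # 3. populate the cache for next time
        return value

    def put(self, key, value):
        self.database[key] = value    # write the database...
        self.cache.pop(key, None)     # ...and invalidate the stale cache entry


store = ExplicitStore(database={"user:1": "avi"}, cache={})
store.get("user:1")  # miss: goes to the database
store.get("user:1")  # hit: served from the cache
```

Every step in `get` and `put` is the application's responsibility here. In the transparent arrangement those steps would happen below the level of application code.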

Storage

MySQL
Memcache (×4), each with Ruby, Ruby
MySQL MySQL MySQL

Sharding?

OOCL: 3B objects, 500GB data = 3 weeks of tweets

Stone Slave
Shared Page Cache (×4), each with Gem, Gem
Stone Stone Stone

Web Client

Web Server

Online Storage Offline Storage

Offline Storage

15TB per day: Mon, Tue, Wed, Thu, Fri, Sat, Sun (three weeks shown)

Hadoop

tweets.tsv

tweets.tsv/part0
tweets.tsv/part1
tweets.tsv/part2

tweets.tsv/part0, tweets.tsv/part1
tweets.tsv/part1, tweets.tsv/part2
tweets.tsv/part2, tweets.tsv/part0

MAP REDUCE

grep smalltalk tweets.tsv > st.tsv

MAP

grep smalltalk tweets.tsv/part0 > st.tsv/part0
grep smalltalk tweets.tsv/part1 > st.tsv/part1
grep smalltalk tweets.tsv/part2 > st.tsv/part2

wc -l st.tsv > count.tsv

REDUCE

wc -l st.tsv/part0
wc -l st.tsv/part1
wc -l st.tsv/part2

sum > count.tsv/part0
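The two steps above can be sketched locally: the map step filters each partition on its own, and the reduce step counts lines per partition and sums the partial counts. This is a minimal in-memory sketch, not Hadoop code. The sample lines and the function names `map_grep` and `reduce_count` are made up for illustration:

```python
def map_grep(lines, needle):
    """Map step: keep the lines of one partition that contain `needle`."""
    return [line for line in lines if needle in line]


def reduce_count(filtered_partitions):
    """Reduce step: count lines per partition, then sum the partial counts."""
    partial_counts = [len(part) for part in filtered_partitions]
    return sum(partial_counts)


# Hypothetical stand-ins for tweets.tsv/part0, part1 and part2.
partitions = [
    ["1\tlearning smalltalk today", "2\tlunch"],
    ["3\tsmalltalk and seaside", "4\thadoop cluster is up"],
    ["5\tcoffee", "6\tsmalltalk images"],
]

# Each partition is filtered independently, so the map step can run in parallel.
st_parts = [map_grep(part, "smalltalk") for part in partitions]
count = reduce_count(st_parts)
print(count)
```

Because no partition needs to see another partition's lines during the map step, the work can be spread across the machines that already hold each part.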

count-words st.tsv/* | sort | sum > count.tsv

squeak 3
smalltalk 5
visualworks 10
squeak 6
smalltalk 4
visualworks 7
squeak 1
visualworks 3

squeak 1
squeak 3
squeak 6
smalltalk 4
smalltalk 5
visualworks 3
visualworks 7
visualworks 10

squeak 10
smalltalk 9
visualworks 20
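The sort-then-sum step can be reproduced with the numbers from the slide. This is a minimal sketch. The grouping of the eight pairs into three map outputs is an assumption (the slide lists them as one flat sequence), and `sort_and_sum` is a hypothetical name:

```python
from itertools import groupby
from operator import itemgetter

# The (word, count) pairs from the slide, assumed to come from three map outputs.
map_output = [
    [("squeak", 3), ("smalltalk", 5), ("visualworks", 10)],
    [("squeak", 6), ("smalltalk", 4), ("visualworks", 7)],
    [("squeak", 1), ("visualworks", 3)],
]


def sort_and_sum(parts):
    """Flatten the partial counts, sort by word, then sum each run of equal words."""
    pairs = [pair for part in parts for pair in part]
    pairs.sort(key=itemgetter(0))  # sorting brings equal words next to each other
    return {
        word: sum(count for _, count in group)
        for word, group in groupby(pairs, key=itemgetter(0))
    }


totals = sort_and_sum(map_output)
print(totals)
```

The totals agree with the final column on the slide: squeak 10, smalltalk 9, visualworks 20. Sorting is what lets the sum run as a single pass, since all counts for one word arrive together.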

count-words st.tsv | sort | sum

count-words st.tsv/part0
count-words st.tsv/part1
count-words st.tsv/part2

REDUCE

sum > count.tsv/part0
sum > count.tsv/part1
sum > count.tsv/part2

(word, count)
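One way to read the three `sum > count.tsv/partN` outputs is that each reducer handles a disjoint subset of the words. A minimal sketch, assuming words are assigned to reducers by a hash of the key. The slides do not say how the assignment is made, and the function names here are hypothetical:

```python
import zlib


def reducer_for(word, num_reducers):
    """Pick a reducer from a stable hash of the key (an assumed scheme)."""
    return zlib.crc32(word.encode("utf-8")) % num_reducers


def partition(pairs, num_reducers):
    """Route every (word, count) pair to the bucket of its reducer."""
    buckets = [[] for _ in range(num_reducers)]
    for word, count in pairs:
        buckets[reducer_for(word, num_reducers)].append((word, count))
    return buckets


def sum_bucket(bucket):
    """What one reducer does: total the counts for the words it was given."""
    totals = {}
    for word, count in bucket:
        totals[word] = totals.get(word, 0) + count
    return totals


pairs = [
    ("squeak", 3), ("smalltalk", 5), ("visualworks", 10),
    ("squeak", 6), ("smalltalk", 4), ("visualworks", 7),
    ("squeak", 1), ("visualworks", 3),
]
reducer_outputs = [sum_bucket(bucket) for bucket in partition(pairs, 3)]
print(reducer_outputs)
```

Since the same word always hashes to the same reducer, each reducer can finish its totals without talking to the others, and the union of their outputs is the complete count.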

MAP and REDUCE stages chained into a multi-stage pipeline (diagram)

Join

Group & Count

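public static class TokenizerMapper
    extends Mapper<Object, Text, Text, IntWritable> {

  // Reused output objects, as in the stock Hadoop WordCount example
  // (not shown on the slide).
  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

  // Emit (token, 1) for every whitespace-separated token in the input line.
  public void map(Object key, Text value, Context context)
      throws IOException, InterruptedException {
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      context.write(word, one);
    }
  }
}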

:(

:/

users := 'users.csv' loadFromHadoop.
users_1825 := users select: [:ea | ea age between: 18 and: 25].
joined := users_1825 joinedWith: pages by: ...

?

!HadoopCollection categoriesFor: 'map/reduce'!
map: mapBlock thenReduce: reduceBlock
...
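The idea of a collection protocol layered over map/reduce can be sketched with an in-memory stand-in. This is not the `HadoopCollection` from the slide, whose body is elided. The class `PartitionedCollection`, its methods and the sample users are all hypothetical, and everything runs locally rather than on a cluster:

```python
from functools import reduce


class PartitionedCollection:
    """In-memory stand-in for a collection stored as several partitions."""

    def __init__(self, partitions):
        self.partitions = [list(part) for part in partitions]

    def select(self, predicate):
        """Filter each partition independently (a map-only step)."""
        return PartitionedCollection(
            [[item for item in part if predicate(item)] for part in self.partitions]
        )

    def map_then_reduce(self, map_fn, reduce_fn, initial):
        """Apply map_fn to every item, then fold all results with reduce_fn."""
        mapped = (map_fn(item) for part in self.partitions for item in part)
        return reduce(reduce_fn, mapped, initial)


users = PartitionedCollection([
    [{"name": "ann", "age": 17}, {"name": "bob", "age": 22}],
    [{"name": "cy", "age": 25}, {"name": "dee", "age": 40}],
])

# Mirrors `users select: [:ea | ea age between: 18 and: 25]` (bounds inclusive).
users_1825 = users.select(lambda user: 18 <= user["age"] <= 25)
count = users_1825.map_then_reduce(lambda user: 1, lambda a, b: a + b, 0)
print(count)
```

The caller passes small functions, much like Smalltalk blocks, and never mentions partitions. That is the property that would let the same calls be shipped to a cluster.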

Doesn’t Need

Raw performance
Extensive libraries
Concurrency/Async IO
Wide industry acceptance
Fast startup time

Should Have

Lightweight functions/blocks
Dynamic OO
Process migration
Good debugging
(JVM integration)

?
