Prophet, your path out of the... (assets.en.oreilly.com/1/event/12/)

Prophet

a path out of the cloud

http://syncwith.us

jesse@bestpractical.com

1

You may know me from...

RT (Request Tracker)

Jifty

SVK

Hiveminder

Perl 6

Shirts

2

I’ve been hacking on an open source database

called “Prophet”

3

It has an API like Amazon SimpleDB or Google App Engine’s...

4

It’s designed for “team-scale” apps

5

It’s built for P2P replication and

disconnected use

6

App #1 is the canonical “offline bug tracker”

7

App #2 will probably be a BBS you can sync

over sneakernet

8

But first, a brief digression...

9

...about cloud computing

10

Living in the cloud =

sharecropping

11

(That’s bad)

12

This is a rant

13

The bad old days:

14

Pic of sharecroppers

15

You farmed land you didn’t own...

16

...with tools you couldn’t really afford

17

You paid for it with part of your harvest...

18

It sounded like a pretty sweet deal...

19

...until things got bad

20

(Things always got bad)

21

In a bad year, you got further in debt to the landowner

22

23

The (more recent) bad old days:

24

pic of mainframes

25

You ran code you didn’t own on hardware you

didn’t own

26

Things got a little better:

27

Pic of PCs

28

Things weren’t all rosy:

29

Pic of BSOD

30

Sometimes new versions of software

killed features...

31

...so you were locked into old versions

32

pic of win 31?

33

Things got ‘better’:

34

rmsche

35

Now, things are getting worse again...

36

37

What happens when your favorite service

goes down?

38

pic of twitter being down

39

...or stops accepting new signups?

40

41

...or gives all your data to the secret police?

42

Pic of yahoo.cn

43

...or starts making arbitrary choices about what’s ‘safe’ content?

44

45

You don’t own the services you use

46

When the service provider cuts you off, that’s it. No recourse.

47

Not-so-secret shame: I’m a really bad zealot

48

My calendar lives at google.com

49

50

I make a web 2.0 tasklist service called

Hiveminder.com

51

pic of hiveminder

52

Using hosted apps is going to hurt you

53

Data access is important

54

APIs are great

55

...but easy access to a service just makes it

easier to get locked in

56

What about Google Gears, Adobe AIR, etc.?

57

Great. Now you can use your word processor while you’re offline!

58

Pic of wordperfect

59

Real offline apps shouldn’t need servers

60

Real offline apps should sync like you do

61

I might be a nut job

62

...but smart people seem to agree with me

63

If we want people to have the same degree of user autonomy as we've come to expect from the world, we may have to sit down and code alternatives to Google Docs, Twitter, and EC3 that can live with us on the edge, not be run by third parties.

- Danny O’Brien, http://www.oblomovka.com/entries/2008/07/16

64

Back to that database thing...

65

Jesse Vincent

66

Chia-liang Kao

67

We work together

68

CL lives in Taipei

Jesse lives in Boston

69

Sometimes we need to work face to face

70

TPE - BOS: 9,410 mi

TPE - HNL: 5,095 mi

BOS - HNL: 5,069 mi

71

Step 1: Go to Hawaii for “work”

Step 2: ???

Step 3: Prophet!

Our Plan

72

The Plan Backfired

We were there for 8 days

We wrote 8000 lines of Perl

We figured out step 2

73

Step 2:

Build a Disconnected Syncable Database

74

Prophet

75

A grounded, semirelational,

peer to peer replicated,

disconnected, versioned,

property database with

self-healing conflict resolution

Prophet

77

What do all those buzzwords mean?

78

grounded

Runs here

79

grounded

Not here

80

grounded

Runs at the edge

Doesn’t need to run in the cloud

Syncs with services you already use

(We call the adaptors “Foreign Replicas”)

81

Joins are expensive

(They’re still possible)

semirelational

82

Update any replica

Pull from any replica

Push to any replica

Publish a replica

Changes will propagate

peer-to-peer replicated

83

Real-time replication is hard to scale

It only “works” with constant connectivity

I don’t have constant connectivity

Neither do you

Prophet sync can happen whenever

disconnected

84

Every update is recorded as a changeset

Changesets don’t lose any data

(so you can use them to go backwards)

All history is introspectable

Replication just replays changesets

versioned

85
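The "versioned" slide says changesets keep enough information to go backwards. A minimal sketch of that idea, assuming a hypothetical change format (not Prophet's real one) in which every property change stores both its old and its new value, so any changeset can be inverted:

```perl
use strict;
use warnings;

# Hypothetical change format: each property change keeps [old, new], e.g.
#   { type => 'Person', uuid => 'u1', props => { age => [31, 32] } }
# An undef new value means "property removed".

# Apply changes to an in-memory store laid out as $store->{type}{uuid}{prop}.
sub apply_changes {
    my ( $store, @changes ) = @_;
    for my $change (@changes) {
        my ( $type, $uuid ) = @{$change}{qw(type uuid)};
        my $record = $store->{$type}{$uuid} //= {};
        for my $prop ( keys %{ $change->{props} } ) {
            my $new = $change->{props}{$prop}[1];
            if ( defined $new ) { $record->{$prop} = $new }
            else                { delete $record->{$prop} }
        }
        # A record with no properties left no longer exists.
        delete $store->{$type}{$uuid} unless %$record;
    }
    return $store;
}

# Invert a changeset: swap old and new values, and undo changes in reverse order.
sub invert_changes {
    my @changes = @_;
    return map {
        my $c = $_;
        +{
            type  => $c->{type},
            uuid  => $c->{uuid},
            props => { map { ( $_ => [ reverse @{ $c->{props}{$_} } ] ) } keys %{ $c->{props} } },
        };
    } reverse @changes;
}
```

Creating a record is just a change whose old values are all undef, so inverting it removes the record again.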

Atomic operations

CREATE, READ, UPDATE, DELETE, SEARCH

Record types can have optional validation and canonicalization

Records of the same type do not need to have the same properties

Add and remove properties at will

property database

86
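A minimal sketch of a "property database" record, assuming hypothetical per-type hook tables: properties are free-form, and a type may optionally canonicalize or validate individual properties. The hook tables and the `prepare_props` helper are illustrative names, not Prophet's API.

```perl
use strict;
use warnings;

# Hypothetical per-type hooks. Types and properties without an entry accept
# any value, so two Person records can carry completely different properties.
my %CANONICALIZE = (
    Person => { name => sub { my $v = shift; $v =~ s/^\s+|\s+$//g; return $v } },
);
my %VALIDATE = (
    Person => { age => sub { return $_[0] =~ /^\d+$/ } },
);

# Return a cleaned copy of the properties, or die if a validator rejects one.
sub prepare_props {
    my ( $type, %props ) = @_;
    for my $prop ( sort keys %props ) {
        if ( my $canon = $CANONICALIZE{$type}{$prop} ) {
            $props{$prop} = $canon->( $props{$prop} );
        }
        if ( my $check = $VALIDATE{$type}{$prop} ) {
            die "invalid $prop for $type: $props{$prop}\n"
                unless $check->( $props{$prop} );
        }
    }
    return \%props;
}
```

Canonicalization runs before validation, so a validator always sees the cleaned value.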

Remembers all conflict resolutions

Syncs all resolutions with your peers

Detects identical conflicts

Uses your peers’ resolutions to “vote” for the winner of a conflict

self-healing conflict resolution

87
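Detecting identical conflicts needs a stable key for "the same conflict" across peers. One way to sketch it, assuming a conflict is described by a record UUID, a property name and the competing values (a hypothetical shape): hash a canonical form of that description, then count which value each peer chose. This is an illustration, not Prophet's resolution database format.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# Key for "the same conflict": record UUID, property, and the competing values.
# Values are sorted so the order in which a peer saw them doesn't matter;
# undef is encoded distinctly from any defined string.
sub conflict_fingerprint {
    my (%conflict) = @_;    # uuid => ..., prop => ..., values => [ ... ]
    my @values = sort { $a cmp $b }
        map { defined $_ ? "=$_" : 'undef' } @{ $conflict{values} };
    return sha1_hex( join "\0", $conflict{uuid}, $conflict{prop}, @values );
}

# Hypothetical resolution database: fingerprint => { chosen value => votes }.
my %RESOLUTIONS;

sub record_resolution {
    my ( $fingerprint, $chosen ) = @_;
    $RESOLUTIONS{$fingerprint}{$chosen}++;
    return;
}

sub votes_for {
    my ($fingerprint) = @_;
    return %{ $RESOLUTIONS{$fingerprint} // {} };
}
```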

Working with Prophet

88

RESTy API

GET /records.json

GET /records/Cars.json

GET /records/Cars/716499-5F9-4AC4-827.json

GET /records/Cars/716499-5F9-4AC4-827/wheels.json

POST /records/Cars.json

POST /records/Cars/716499-5F9-4AC4-827.json

POST /records/Cars/716499-5F9-4AC4-827/wheels.json

89
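A small client sketch for the GET routes above, using the core `HTTP::Tiny` and `JSON::PP` modules. The base URL is an assumption (wherever your server happens to be listening), and the sketch makes no assumption about the shape of the returned JSON. POST is left out because the request body format isn't shown on the slide.

```perl
use strict;
use warnings;
use HTTP::Tiny;
use JSON::PP qw(decode_json);

# Build one of the GET paths listed above. Each optional argument narrows
# the request by one URL segment: type, then record UUID, then property.
sub record_path {
    my ( $type, $uuid, $prop ) = @_;
    my @segments = ('records');
    for my $part ( $type, $uuid, $prop ) {
        last unless defined $part;
        push @segments, $part;
    }
    return '/' . join( '/', @segments ) . '.json';
}

# Fetch and decode a resource. $base is an assumption: point it at wherever
# your Prophet server is listening.
sub get_json {
    my ( $base, @path_args ) = @_;
    my $res = HTTP::Tiny->new->get( $base . record_path(@path_args) );
    die "GET failed: $res->{status} $res->{reason}\n" unless $res->{success};
    return decode_json( $res->{content} );
}
```

For example, `get_json('http://localhost:8080', 'Cars')` would request `/records/Cars.json`, if a server were listening on that (hypothetical) port.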

RESTy API

Yes, we should be using PUT and DELETE

Yes, you can have a commit bit and help us fix it :)

90

Native API (Yes, the core is Perl.)

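use strict;
use warnings;
use Prophet::CLI;
use Prophet::Record;
use Prophet::Collection;

# Connect to the replica configured for this app
my $cli = Prophet::CLI->new();
my $cxn = $cli->app_handle->handle;

# Create a Person record, then change one of its properties
my $record = Prophet::Record->new( handle => $cxn, type => 'Person' );
my $uuid = $record->create( props => { name => 'Jesse', age => 31 } );
$record->set_prop( name => 'age', value => 32 );

# Find every Person that isn't a cat (records need not have a 'species' prop)
my $people = Prophet::Collection->new( handle => $cxn, type => 'Person' );
$people->matching( sub { my $species = shift->prop('species'); !defined $species || $species ne 'cat' } );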

91

What could you build with Prophet?

92

A bug tracker: “simple defects”

• ID, Status, Summary

• (Arbitrary other properties too)

• History

• Comments

• Attachments

sd

93

./bin/sd ticket create -- summary="Can't sync sd with Google Code" status=new

Created ticket 5 (93BF979E-08C1-11DD-94C3-D4B1FCEE7EC4)

Create

94

./bin/sd ticket search --regex publish

29 } new the online help doesn't describe publish

34 } new publish a static html view of records

35 } new publish should create a static rss file

List and Search

95

./bin/sd ticket update --uuid 93BF979E-08C1-11DD-94C3-D4B1FCEE7EC4 -- status=resolved

Updates

96

Bugs on my laptop aren’t interesting.

97

Jesse

sd publish --to fsck.com:public_html/sd/

CL

sd pull --from http://my.com/~jesse/sd

Sync!

98

My project has a bug tracker

99

Actually, mine uses two:

• RT

• hiveminder.com

My project has a bug tracker

99

Foreign Replicas

Prophet makes Foreign Replicas easy

SD gets them "for free"

100

(Using only the public REST API)

It took an afternoon

Mirror an RT instance into SD

Share it with your peers using prophet

Sync changes back from your peers to RT

Supports Comments and Attachments

Wrote an RT Replica for SD

101

(Using only the public REST API)

...and one for Hiveminder

102

I can sync my bugs with RT or Hiveminder

103

Actually, it’s better

104

I can sync between RT and Hiveminder

105

I can sync between two different RTs, too

106

• Trac

• Launchpad

• Google Code

• SourceForge

• Bugzilla

• Jira

• GForge

• debbugs

• GNATS

• todo.txt

• Lighthouse

• Redmine

• FogBugz

• What else?

We need more replica definitions:

107

What else can you use Prophet for?

108

All your “social” databases

109

• CRM

• Bug tracking

• Sales orders

• Phone book

• Blog

• Trading Card Database

• Ideas?

All the databases you want while offline.

110

How about a P2P BBS?

Prophet doesn’t need a server.

You can sync over sneakernet.

“Private” Social Networks

111

A look inside Prophet

112

Anatomy of a Prophet Replica

113

The bits and pieces

Database UUID

Replica UUID

Record Store

Changeset Store

Resolution Database

Configuration metadata

114

The Record Store

Stores individual records by type

Not guaranteed to have all old versions

115

The Changeset Store

Stores every change to a set of records

Guaranteed to have all old changesets

Replaying all changesets will create an exact clone of the replica

116
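The last bullet above implies the record store is a cache that the changeset store can always rebuild. A toy sketch of that relationship, using a hypothetical in-memory replica where each change is `[uuid, property, value]` (not Prophet's on-disk layout):

```perl
use strict;
use warnings;

# Toy replica (hypothetical): an append-only list of changesets plus a record
# store that only caches the latest value of each property.
sub new_replica { return { changesets => [], records => {} } }

# Each change is [ $uuid, $prop, $value ]; an undef value removes the property.
sub apply_changeset {
    my ( $records, @changes ) = @_;
    for my $change (@changes) {
        my ( $uuid, $prop, $value ) = @$change;
        if ( defined $value ) { $records->{$uuid}{$prop} = $value }
        else                  { delete $records->{$uuid}{$prop} }
    }
    return;
}

# Commit: append to the changeset store first, then update the record store.
sub commit {
    my ( $replica, @changes ) = @_;
    push @{ $replica->{changesets} }, [@changes];
    apply_changeset( $replica->{records}, @changes );
    return;
}

# Clone a replica by replaying every changeset, in order, into an empty one.
sub clone_by_replay {
    my ($source) = @_;
    my $clone = new_replica();
    commit( $clone, @$_ ) for @{ $source->{changesets} };
    return $clone;
}
```

Because the record store is derived data, it can drop old versions freely; only the changeset store has to keep everything.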

Replica Backends

117

Filesystem

Readable

Flat files

Compact

Fast

(Not yet fully atomic)

118

HTTP

Designed to let you “publish” databases

Flat files, currently read-only.

Same format as the filesystem replica type.

119

Subversion (DEPRECATED)

Slow

Steady

Robust

Supports remote sync

Requires Subversion Perl Bindings

120

Backends are pluggable!

The filesystem is cheap and easy

The filesystem is portable

Help us write new backends:

CouchDB, SQLite, MySQL, Postgres, S3, AppEngine, $YOUR_FAVORITE_DB

121
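One way to picture a pluggable backend is as a small duck-typed interface that everything above the storage layer calls. This sketch invents its own four method names and an in-memory backend (`My::Backend::Memory`); Prophet's real replica classes define their own interface.

```perl
use strict;
use warnings;

# A hypothetical in-memory backend. The four method names are invented for
# this sketch; Prophet's replica classes define their own interface.
package My::Backend::Memory;

sub new { my ($class) = @_; return bless { records => {}, changesets => [] }, $class }

sub write_record {
    my ( $self, $type, $uuid, $props ) = @_;
    $self->{records}{$type}{$uuid} = {%$props};    # store a copy
    return;
}

sub read_record {
    my ( $self, $type, $uuid ) = @_;
    my $record = $self->{records}{$type}{$uuid};
    return $record ? {%$record} : undef;
}

# Returns the new changeset's sequence number (1, 2, 3, ...).
sub append_changeset {
    my ( $self, $changeset ) = @_;
    push @{ $self->{changesets} }, $changeset;
    return scalar @{ $self->{changesets} };
}

# Every changeset after sequence number $seq.
sub changesets_since {
    my ( $self, $seq ) = @_;
    my @all = @{ $self->{changesets} };
    return @all[ $seq .. $#all ];
}

package main;

# Code above the storage layer only relies on these methods existing.
my @REQUIRED_METHODS = qw(read_record write_record append_changeset changesets_since);

sub check_backend {
    my ($backend) = @_;
    my @missing = grep { !$backend->can($_) } @REQUIRED_METHODS;
    die "backend is missing: @missing\n" if @missing;
    return $backend;
}
```

A CouchDB or SQLite backend would implement the same methods against its own storage.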

Prophet is designed to sync with “other” databases and systems

They don’t need to support all of Prophet’s features; Prophet knows how to interpret mumbo-jumbo from the Cloud

Foreign Replicas will usually be app specific

All current examples are for SD

Foreign Replicas

122
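A foreign replica has to turn another system's state into Prophet-style property changes. A minimal sketch, assuming you already have two snapshots of one foreign ticket (fetched through that service's public API) and a hypothetical field map; this is not SD's actual RT replica code.

```perl
use strict;
use warnings;

# Hypothetical map from a foreign tracker's field names to SD-style properties.
my %FIELD_MAP = ( Subject => 'summary', Status => 'status', Owner => 'owner' );

sub same_value {
    my ( $x, $y ) = @_;
    return ( !defined $x && !defined $y ) || ( defined $x && defined $y && $x eq $y );
}

# Diff two snapshots of one foreign ticket into { prop => [ old, new ] },
# ignoring fields the map doesn't know about and fields that didn't change.
sub foreign_diff {
    my ( $before, $after ) = @_;
    my %changes;
    for my $field ( sort keys %FIELD_MAP ) {
        my ( $old, $new ) = ( $before->{$field}, $after->{$field} );
        next if same_value( $old, $new );
        $changes{ $FIELD_MAP{$field} } = [ $old, $new ];
    }
    return \%changes;
}
```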

Synchronization

123

Publish

Serialize and export all of a replica's resolutions and changesets

124

Pull

Integrate unseen resolutions and then unseen changesets from a replica

125

Push

Integrate new resolutions and changesets into a replica

126
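A toy sketch of pull and push, assuming hypothetical in-memory replicas whose resolutions and changesets carry globally unique ids of the form `replica-uuid:sequence`. It follows the order on the Pull slide (resolutions before changesets), remembers how far it has read from each peer, and skips items it already holds, such as its own changes coming back.

```perl
use strict;
use warnings;

# Toy replica (hypothetical): resolutions and changesets are lists of items
# with globally unique ids ("replica-uuid:sequence").
sub new_replica {
    my ($uuid) = @_;
    return { uuid => $uuid, next_seq => 0, resolutions => [], changesets => [], have => {}, seen => {} };
}

# Record a local resolution or changeset.
sub commit {
    my ( $replica, $kind, $data ) = @_;
    my $id = $replica->{uuid} . ':' . ++$replica->{next_seq};
    $replica->{have}{$id} = 1;
    push @{ $replica->{$kind} }, { id => $id, data => $data };
    return $id;
}

# Pull: integrate unseen resolutions first, then unseen changesets. Remember
# how far we have read from this peer, and skip items we already hold.
sub pull {
    my ( $target, $source ) = @_;
    for my $kind (qw(resolutions changesets)) {
        my $mark  = "$source->{uuid}/$kind";
        my @items = @{ $source->{$kind} };
        my $from  = $target->{seen}{$mark} // 0;
        for my $item ( @items[ $from .. $#items ] ) {
            next if $target->{have}{ $item->{id} }++;
            push @{ $target->{$kind} }, $item;
        }
        $target->{seen}{$mark} = scalar @items;
    }
    return $target;
}

# Push is the same integration, run against the other replica.
sub push_to { my ( $source, $target ) = @_; return pull( $target, $source ) }
```

Pulling resolutions first means a peer's earlier decisions are already available when its changesets turn up conflicts.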

Conflicts

127

Figures out the best resolution

“Nullifies” the conflict so the changeset can be cleanly integrated

Integrates the conflicting changeset

Records the resolution as a new changeset

Records the resolution decision in the resolution database

Resolving Conflicts

128
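The steps above, sketched for a single property change `[prop, old, new]` on an in-memory record. The shapes are hypothetical; `$choose` stands in for "figures out the best resolution", and the resolution database here is keyed by property name only, where a real one would key on the conflict itself.

```perl
use strict;
use warnings;

sub same_value {
    my ( $x, $y ) = @_;
    return ( !defined $x && !defined $y ) || ( defined $x && defined $y && $x eq $y );
}

# Integrate one incoming change [ $prop, $old, $new ] into a hash-based record.
# $choose->( $prop, $ours, $theirs ) picks the winning value; every change we
# make is appended to @$log; chosen values are appended to $resdb->{$prop}.
sub integrate {
    my ( $record, $incoming, $choose, $log, $resdb ) = @_;
    my ( $prop, $old, $new ) = @$incoming;
    my $current = $record->{$prop};

    if ( same_value( $current, $old ) ) {    # no conflict: apply as-is
        $record->{$prop} = $new;
        push @$log, [ $prop, $old, $new ];
        return $new;
    }

    my $winner = $choose->( $prop, $current, $new );    # figure out the best resolution
    push @$log, [ $prop, $current, $old ];              # nullify: back to the state the change expects
    push @$log, [ $prop, $old, $new ];                  # integrate the conflicting change cleanly
    push @$log, [ $prop, $new, $winner ];               # record the resolution as a new change
    push @{ $resdb->{$prop} }, $winner;                 # remember the decision
    $record->{$prop} = $winner;
    return $winner;
}
```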

Prophet has clever ways to figure out the best resolution.

If there are previous resolutions for the same conflict and a majority agree, use that

If the merger has specified a “prefer this side” choice, use that

Prompt the user to make a decision, giving them info about previous decisions for this conflict

“The Best Resolution”

129
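The decision order on this slide, as a minimal sketch. The argument names are hypothetical, and "majority" is read as a strict majority of the recorded votes:

```perl
use strict;
use warnings;

# Pick a winner for one conflict, in the order listed on the slide:
#   1. a majority of earlier resolutions of this same conflict,
#   2. the merger's "prefer this side" choice,
#   3. otherwise, ask the user, passing along the earlier votes.
# Arguments (hypothetical): votes => { value => count }, prefer => 'ours' or
# 'theirs', ours => ..., theirs => ..., prompt => sub { ... }.
sub best_resolution {
    my (%args) = @_;
    my %votes = %{ $args{votes} // {} };
    my $total = 0;
    $total += $_ for values %votes;
    for my $value ( sort keys %votes ) {
        return $value if $votes{$value} * 2 > $total;    # strict majority
    }
    if ( defined $args{prefer} ) {
        return $args{ $args{prefer} };
    }
    return $args{prompt}->( \%votes );
}
```

A tie in the votes falls through to the next rule rather than picking arbitrarily.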

Scaling

130

Scaling to giant clusters is boring

(Can I play the “They’re not Green” card here?)

Scales to many weakly connected peers

You are not Google.

Does anyone here work for Google?

Current target is databases of O(50k) records

How does it scale?

131

We have a political agenda.

Cloud computing is not Open Source.

APIs for “export” are not good enough.

You should always have full control.

You probably don’t need to store 10 billion records in one database.

Why not, then?

132

Do you have 10 billion bugs, customer contacts

or sales orders?

133

That said, we'd love to see a scalable, high-performance Prophet replica store

134

Getting Involved

135

Project Status

Simple, well-defined Perl API

RESTy web API (with microserver)

Fast, lightweight backend

Small, active dev community

Great test coverage

...less than great documentation coverage

136

Better ergonomics

Improved search and indexing

(Including full-text indexing)

Client libraries for other languages

Proper security model

More apps

Our Plans

137

Prophet

6937 lines of code and doc

1952 lines of tests

sd

2121 lines of code and doc

973 lines of tests

Codebase

138

Prophet is very young

Prophet designed in April

Prophet core implemented in April

SD designed in April

SD built in June and July

139

We need your help!

Kick-ass functional and text indexing

Backend data store improvements

Slick GUIs for syncing

More Foreign Replicas for SD

Documentation improvements

A clever logo

New applications

140

Prophet

http://syncwith.us/prophet/download

SD

http://syncwith.us/sd/download

Getting Prophet

141

http://syncwith.us

prophet-subscribe@lists.bestpractical.com

#prophet on freenode IRC

Thanks!

142