BADCamp 2008 DB Sync

http://badcamp.net/session/database-synchronization

Database Synchronization

Shaun Haber, Warner Bros. Records

What is it?

• Merging content between a dev site and a production site

Disclaimer

• No single answer

• No “Drupally” solution

• Not exclusive to Drupal

• Not magic

Who the hell am I?

Warner Music Group

Warner Bros. Records

• Subsidiary of Warner Music Group

• Family of labels (Reprise, Sire, etc.)

• Over 100 artists

• Top-selling albums

• It’s music biz after all!

So what?

WBR Tech

• Only label with an in-house Tech team

• “Start-up” mentality

• Fast-paced, hectic, and fun!

• We use Drupal... religiously

93 Drupal sites, 1 new site every week

Launching like crazy!

Source: http://flickr.com/photos/krosinsky/2848288562/

Web sites in the wild!

Websites in the wild

• Always collecting new data!

[Chart: data collected over time, growing after launch]

Not a bad thing, obviously

• Want websites to grow

• More users + more data = PROFIT

But...

• How do we keep the site updated?

- New content

- New features

- Code fixes

- <insert your own update here>

Source: http://flickr.com/photos/nimboo/132386298

Minor updates

Major updates

Minor Updates

• CSS tweak

• template.php change

• Add a new Block

• Change settings on a View

• Install a new module

Major Updates

• Schema changes

• Information re-architecture

• Significant configuration changes

• User flow changes

• New theme integration

Maintain a separate Dev site!

Strategy?

[Diagram: timeline of the Dev and Prod servers. The new site is built and QA'd on Dev, launched on Prod, then copied back to Dev for further work, ending in a question mark.]

Syncing Databases Sucks

• Code: easy

• Files: easy

• Database: hard

[Diagram: the same Dev/Prod timeline, now ending with the updated site relaunched on Prod as "Prod 2.0"]

Order of Events

1. Develop a new site

2. Launch site

3. Take snapshot of prod site

4. Develop on snapshot

5. Magic? => Relaunch new version of site

But it’s not Magic!

1. Take dev site down

2. Shift sequenced IDs on Dev

3. Take prod site down

4. Merge content from Prod to Dev

5. QA “new” dev site

6. Copy dev site to prod site

7. Bring “new” prod site live

It’s Database Surgery!

Source: http://flickr.com/photos/interplast/6339098/

2 Step Process

• Step 1 - Shift Sequenced IDs

• Step 2 - Merge content

[Diagram: Dev and Prod share items 1, 2, 3 at snapshot time. Afterwards both sites create items starting at 4, so the IDs collide. Shifting Dev's new items to 10 and 11 frees 4, 5, 6 for Prod's new items, and item 2, edited on Prod after the snapshot, shows up as 2a.]

Step 1 - Shifting IDs

• comments_cid

• files_fid

• node_revisions_vid

• node_nid

• users_uid

Need to know

• Highest common ID between Dev and Prod

• Delta value to shift

• Reference of known tables and fields

Highest Common ID

• Top item on the “stack” at time of the snapshot.

[Diagram: Dev and Prod stacks, each holding 1, 2, 3; the highest common ID is 3]

Delta value

• Amount to shift the conflicted items, with extra padding

[Diagram: highest common ID 3; shifted IDs 10 and 11; delta 7]

UPDATE table SET id = id + $delta WHERE id > $common
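The two numbers can be sketched in Python, with the standard library's `sqlite3` standing in for the MySQL database Drupal ran on. The delta rule used here (clear Prod's highest ID, then add padding) is an assumption consistent with the slide, not necessarily the author's exact formula, and `compute_delta` and `shift_ids` are hypothetical helpers.

```python
import sqlite3


def compute_delta(common: int, prod_max: int, padding: int = 3) -> int:
    """Hypothetical rule: move Dev's post-snapshot IDs (common + 1 and up)
    past Prod's highest ID, plus padding for items Prod adds meanwhile."""
    return max(prod_max - common, 0) + padding


def shift_ids(conn: sqlite3.Connection, table: str, col: str,
              common: int, delta: int) -> None:
    """UPDATE table SET col = col + delta WHERE col > common."""
    # Identifiers cannot be bound as parameters, so table/col must come
    # from a trusted reference list, never from user input.
    conn.execute(
        f"UPDATE {table} SET {col} = {col} + ? WHERE {col} > ?",
        (delta, common),
    )


if __name__ == "__main__":
    dev = sqlite3.connect(":memory:")
    dev.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT)")
    dev.executemany(
        "INSERT INTO node VALUES (?, ?)",
        [(1, "a"), (2, "b"), (3, "c"), (4, "dev new"), (5, "dev new")],
    )
    delta = compute_delta(common=3, prod_max=6)  # (6 - 3) + 3 padding = 6
    shift_ids(dev, "node", "nid", common=3, delta=delta)
    print(delta, [nid for (nid,) in dev.execute("SELECT nid FROM node ORDER BY nid")])
    # prints: 6 [1, 2, 3, 10, 11]
```

With a common ID of 3 and Prod already at 6, Dev's rows 4 and 5 land on 10 and 11. If Dev has created more rows than the delta, a row-by-row UPDATE can hit an existing primary key partway through; MySQL accepts `ORDER BY id DESC` on a single-table UPDATE so the highest IDs move first.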

And that’s it for Step 1

Actually, it’s MUCH more complicated...

What tables have nid?

comments.nid
content_field_*.nid, field_*_nid
content_type_*.nid, field_*_nid
files.nid
forum.nid
forward_log.nid
history.nid
node.nid
node_access.nid
node_comment_statistics.nid
node_counter.nid
node_revisions.nid
nodefamily.parent_nid, child_nid
panels_node.nid
poll.nid
poll_choices.nid
poll_votes.nid
term_node.nid
uc_cart_products.nid
uc_order_products.nid
uc_product_features.nid
uc_products.nid
uc_roles_products.nid
usernode.nid
webform.nid
webform_component.nid
webform_submissions.nid
webform_submitted_data.nid

Also...

• Special tables:

• location, sequences, url_alias, etc.

• node-nid.tpl.php

• Serialized PHP variables in DB

• PHP code in DB

• URLs in DB or elsewhere (e.g., /node/123)

Well shit!

Do the best we can!

• Reference of all known tables

• Reference of all known sequence fields

• Reference of all known “special cases”

• Automate as much as possible

Scripting Time!

Check for unknown tables

$rs = db_query("SHOW TABLES");
while ($row = db_fetch_row($rs)) {
  if (!is_known_table($row[0])) { log_unknown_table($row[0]); }
}
if (found_unknown_tables()) { print_unknown_tables(); exit; }
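A Python version of the same check, assuming the known-tables file allows `*` wildcards (as the `cache*` and `content_field_*` entries in the list suggest) and using SQLite's `sqlite_master` catalog in place of MySQL's `SHOW TABLES`. The helper names mirror the PHP but are otherwise hypothetical.

```python
import fnmatch
import sqlite3


def load_known_tables(lines):
    """Known-table list, one name per line; '*' wildcards such as 'cache*'
    or 'content_field_*' are allowed. Blank lines are ignored."""
    return [line.strip() for line in lines if line.strip()]


def is_known_table(name, patterns):
    return any(fnmatch.fnmatchcase(name, p) for p in patterns)


def find_unknown_tables(conn, patterns):
    # sqlite_master query stands in for MySQL's SHOW TABLES.
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [name for (name,) in rows if not is_known_table(name, patterns)]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    for table in ("node", "cache_page", "mystery_module_data"):
        conn.execute(f"CREATE TABLE {table} (x)")
    known = load_known_tables(["node", "cache*", ""])
    unknown = find_unknown_tables(conn, known)
    if unknown:
        # Stop before shifting anything, as the PHP check does with exit.
        print("Unknown tables:", ", ".join(unknown))
        raise SystemExit(1)
```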

Store all known tables in a txt file

access
accesslog
audio_widget_thumbnail
audio_widget_track
authmap
blocks
blocks_roles
boxes
buddylist
buddylist_buddy_group
buddylist_groups
buddylist_pending_requests
cache*
comments
contact
content_field_*
content_type_*
devel_queries
devel_times
...

Store all fields in separate txt files

comments.nid
content_field_*.nid, field_*_nid
content_type_*.nid, field_*_nid
files.nid
forum.nid
forward_log.nid
history.nid
node.nid
node_access.nid
node_comment_statistics.nid
node_counter.nid
node_revisions.nid
nodefamily.parent_nid, child_nid
panels_node.nid
poll.nid
poll_choices.nid
poll_votes.nid
...
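One way to load such a per-sequence field file (say, a hypothetical `nid.txt`) in Python. The comma convention for extra columns on the same table is inferred from entries like `nodefamily.parent_nid, child_nid`, so treat the format as an assumption.

```python
def parse_field_reference(lines):
    """Parse 'table.column' lines into {table: [columns]}.

    Assumed format: extra columns on the same table follow a comma
    ('nodefamily.parent_nid, child_nid'); '*' wildcards are kept as-is;
    blank lines and '#' comments are skipped.
    """
    reference = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        table, sep, columns = line.partition(".")
        if not sep:
            raise ValueError(f"expected table.column, got {line!r}")
        cols = [c.strip() for c in columns.split(",") if c.strip()]
        reference.setdefault(table.strip(), []).extend(cols)
    return reference


if __name__ == "__main__":
    nid_txt = """
    comments.nid
    nodefamily.parent_nid, child_nid
    content_field_*.nid, field_*_nid
    """
    print(parse_field_reference(nid_txt.splitlines()))
```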

Now we can shift IDs!

• Iterate thru DB tables

• If table has known fields, shift IDs (remember that SQL command?)

• Rinse and repeat for each sequenced ID

UPDATE table SET id = id + $delta WHERE id > $common
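The loop can be sketched in Python with SQLite: walk every table, collect the reference patterns that match it, and run the shift once per matching column. `NID_FIELDS` is a small hypothetical excerpt of the nid reference, and `PRAGMA table_info` is SQLite's way of listing columns (MySQL would use `SHOW COLUMNS`).

```python
import fnmatch
import sqlite3

# Hypothetical excerpt of the nid reference: table pattern -> column patterns.
NID_FIELDS = {
    "node": ["nid"],
    "comments": ["nid"],
    "nodefamily": ["parent_nid", "child_nid"],
    "content_field_*": ["nid", "field_*_nid"],
}


def shift_sequence(conn, reference, common, delta):
    """Shift every referenced column in every matching table exactly once.
    Returns the (table, column) pairs touched, for the run log."""
    touched = []
    tables = [name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # Gather column patterns from every reference entry matching this table.
        patterns = [p for t_pat, cols in reference.items()
                    if fnmatch.fnmatchcase(table, t_pat) for p in cols]
        if not patterns:
            continue
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt, pk).
        columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
        for col in columns:
            if any(fnmatch.fnmatchcase(col, p) for p in patterns):
                conn.execute(
                    f"UPDATE {table} SET {col} = {col} + ? WHERE {col} > ?",
                    (delta, common))
                touched.append((table, col))
    return touched
```

Run it once per sequence (nid, uid, cid and so on), each with that sequence's own common ID and delta.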

Special Cases

Sequences table

• Simply reset the value to new highest ID

• Do this after shifting IDs in the “primary” table (node.nid, users.uid, etc.)

UPDATE sequences SET id = $max WHERE name = '$seq'
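A sketch of the reset, assuming Drupal 5's `sequences` layout of one row per sequence with `name` and `id` columns. `reset_sequence` is a hypothetical helper, and SQLite again stands in for MySQL.

```python
import sqlite3


def reset_sequence(conn, seq_name, table, col):
    """Point a sequences row (name, id) at the new highest ID.
    Run it after the primary table (node.nid, users.uid, ...) is shifted."""
    (max_id,) = conn.execute(
        f"SELECT COALESCE(MAX({col}), 0) FROM {table}").fetchone()
    conn.execute("UPDATE sequences SET id = ? WHERE name = ?",
                 (max_id, seq_name))
    return max_id


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sequences (name TEXT PRIMARY KEY, id INTEGER)")
    conn.execute("INSERT INTO sequences VALUES ('node_nid', 5)")
    conn.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO node VALUES (?)", [(1,), (2,), (3,), (10,), (11,)])
    print(reset_sequence(conn, "node_nid", "node", "nid"))  # prints: 11
```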

Location table

• Stores ID val in column `eid`

• Stores sequence type in column `type`

• type = node, user

UPDATE location SET `eid` = `eid` + $delta WHERE `eid` > $common AND `type` = '$type'
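The same statement with bound parameters, sketched in Python with SQLite. The `type` filter matters because nodes and users each have their own common ID and delta; binding it as a parameter also avoids hand-quoting the string value. `shift_location` is a hypothetical helper.

```python
import sqlite3


def shift_location(conn, entity_type, common, delta):
    """Shift location.eid only for rows of one entity type ('node' or
    'user'). Returns the number of rows changed."""
    cur = conn.execute(
        "UPDATE location SET eid = eid + ? WHERE eid > ? AND type = ?",
        (delta, common, entity_type))
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE location (eid INTEGER, type TEXT)")
    conn.executemany("INSERT INTO location VALUES (?, ?)",
                     [(2, "node"), (4, "node"), (4, "user")])
    shift_location(conn, "node", common=3, delta=6)  # node 4 -> 10; user 4 untouched
```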

Url_alias table

• ID values are embedded as strings

• Use pattern matching to parse the ID

• node: node/nid

• user: user/uid, blog/uid

• Add the delta, update new alias

Pseudo-code

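SELECT pid, src FROM url_alias WHERE src LIKE 'node/%'
preg_match('/^node\/(\d+)/', $src, $matches)
$id = $matches[1]
if ($id <= $common) continue  // shared before the snapshot, leave it alone
$id = $id + $delta
$src = preg_replace('/^node\/\d+/', "node/$id", $src)  // keeps suffixes such as /edit
UPDATE url_alias SET src = '$src' WHERE pid = $pid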
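A runnable take on the url_alias pass, in Python with SQLite standing in for MySQL. It keeps any trailing path (such as `node/123/edit`), leaves IDs at or below the common ID alone, and handles the user sequence's `user/` and `blog/` prefixes with a second pattern. The regular expressions and helper name are assumptions.

```python
import re
import sqlite3

# node/<nid>, or user/<uid> and blog/<uid>, optionally followed by /more/path.
NODE_SRC = re.compile(r"^(node/)(\d+)(?=/|$)")
USER_SRC = re.compile(r"^(user/|blog/)(\d+)(?=/|$)")


def shift_alias_ids(conn, pattern, common, delta):
    """Add delta to IDs above common embedded in url_alias.src.
    Returns the number of aliases rewritten."""
    rows = conn.execute("SELECT pid, src FROM url_alias").fetchall()
    changed = 0
    for pid, src in rows:
        m = pattern.match(src)
        if not m or int(m.group(2)) <= common:
            continue  # not this sequence, or an ID both sites already share
        new_src = f"{m.group(1)}{int(m.group(2)) + delta}{src[m.end():]}"
        conn.execute("UPDATE url_alias SET src = ? WHERE pid = ?",
                     (new_src, pid))
        changed += 1
    return changed


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE url_alias (pid INTEGER PRIMARY KEY, src TEXT, dst TEXT)")
    conn.executemany("INSERT INTO url_alias VALUES (?, ?, ?)",
                     [(1, "node/2", "about"), (2, "node/4/edit", "news/edit")])
    shift_alias_ids(conn, NODE_SRC, common=3, delta=6)  # node/4/edit -> node/10/edit
```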

Manually

• Rename any node-nid.tpl.php files

• Search for ID vals in DB:

• Eval’ed PHP code

• Serialized PHP code

• URLs

• anything else?
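The search half of the manual pass can be partly automated. This Python sketch (SQLite standing in for MySQL; the helper name and the `node/<nid>` pattern are assumptions) reports every `node/<nid>` reference above the common ID found in a text column, so a person can decide how to fix each one.

```python
import re
import sqlite3

NODE_REF = re.compile(r"\bnode/(\d+)\b")


def find_embedded_node_ids(conn, common):
    """Return (table, column, rowid, nid) for each node/<nid> reference with
    nid > common found in a text column, for manual review."""
    hits = []
    tables = [name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    for table in tables:
        info = conn.execute(f"PRAGMA table_info({table})").fetchall()
        for _cid, col, ctype, *_rest in info:
            if not any(k in ctype.upper() for k in ("CHAR", "TEXT", "CLOB")):
                continue  # only text columns can hold URLs or PHP code
            query = f"SELECT rowid, {col} FROM {table} WHERE {col} LIKE '%node/%'"
            for rowid, value in conn.execute(query).fetchall():
                for m in NODE_REF.finditer(value):
                    if int(m.group(1)) > common:
                        hits.append((table, col, rowid, int(m.group(1))))
    return hits
```

Hits inside serialized PHP need extra care: `serialize()` records each string's byte length (`s:6:"node/5";`), so turning `node/5` into `node/11` also means updating that length, or `unserialize()` will fail.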

Step 1 Recap

• Maintain indexes for tables and fields

• Automate using the indexes

• Review indexes before each shift

• Inspect for manual cases after each shift

• Document every new case you find!

At least most of this can be automated!

Step 2 - Merging Content

Merging Content

[Diagram: Prod's new items 4, 5, 6 are copied into Dev, alongside Dev's shifted item 10]

What to merge?

• Content

• Really, just the content

• No variables, settings, etc.

Need to know

• Highest Common ID (same from Step 1)

• Reference of tables

Process

• Iterate thru Prod tables:

• Skip

• INSERT IGNORE (I)

• REPLACE (R)

• DROP and INSERT (A)
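The per-table modes map onto close SQLite analogues, which this Python sketch uses in place of MySQL: `INSERT OR IGNORE` for INSERT IGNORE and `INSERT OR REPLACE` for REPLACE. For (A) it deletes Dev's rows and copies all of Prod's instead of dropping the table, so Dev's schema survives. It assumes Prod's database is attached to the same connection under the schema name `prod` and that both copies share column order; the mode letters come from the slide, while the entries in `MERGE_PLAN` are only examples.

```python
import sqlite3

# Example plan (hypothetical entries): S = skip, I = INSERT IGNORE,
# R = REPLACE, A = replace Dev's rows with all of Prod's.
MERGE_PLAN = {"variable": "S", "node": "I", "node_counter": "R", "accesslog": "A"}


def merge_from_prod(conn, plan):
    """Copy content from the attached 'prod' schema into 'main' (Dev).
    SELECT * assumes both schemas list columns in the same order."""
    for table, mode in plan.items():
        if mode == "S":
            continue
        if mode == "I":
            conn.execute(f"INSERT OR IGNORE INTO main.{table} SELECT * FROM prod.{table}")
        elif mode == "R":
            conn.execute(f"INSERT OR REPLACE INTO main.{table} SELECT * FROM prod.{table}")
        elif mode == "A":
            conn.execute(f"DELETE FROM main.{table}")
            conn.execute(f"INSERT INTO main.{table} SELECT * FROM prod.{table}")
        else:
            raise ValueError(f"unknown merge mode {mode!r} for table {table}")
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")                   # stands in for Dev
    conn.execute("ATTACH DATABASE ':memory:' AS prod")   # stands in for Prod
    for schema in ("main", "prod"):
        conn.execute(f"CREATE TABLE {schema}.node (nid INTEGER PRIMARY KEY, title TEXT)")
    conn.execute("INSERT INTO main.node VALUES (10, 'dev item')")
    conn.execute("INSERT INTO prod.node VALUES (4, 'prod item')")
    merge_from_prod(conn, {"node": "I"})
    print(conn.execute("SELECT nid, title FROM main.node ORDER BY nid").fetchall())
```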

Special Cases

• Url_alias table

• Sequences table

• Some nodes

Url_Alias table

• Don’t go by pid

• REPLACE INTO url_alias SET src = '$src', dst = '$dst'

Sequences table

• Manually inspect sequence values!

Node timestamps

• Get timestamp of Highest Common nid

• Check for older nodes on Prod that have been modified recently

SELECT nid FROM node WHERE changed > $timestamp AND nid <= $common

Replace those nodes on Dev with their Prod versions
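A Python/SQLite sketch of the timestamp check. Which timestamp stands for "the snapshot" is an assumption here: it takes the `created` time of the highest common node, so later edits to that node itself are caught as well. The returned nids are the candidates to copy over from Prod; `nodes_changed_since_snapshot` is a hypothetical helper.

```python
import sqlite3


def nodes_changed_since_snapshot(prod, common):
    """Nodes that existed at snapshot time (nid <= common) but were edited
    on Prod afterwards. The common node's `created` time is used as a
    stand-in for the snapshot time (an assumption)."""
    row = prod.execute("SELECT created FROM node WHERE nid = ?", (common,)).fetchone()
    if row is None:
        raise LookupError(f"node {common} not found on Prod")
    rows = prod.execute(
        "SELECT nid FROM node WHERE changed > ? AND nid <= ? ORDER BY nid",
        (row[0], common))
    return [nid for (nid,) in rows]


if __name__ == "__main__":
    prod = sqlite3.connect(":memory:")
    prod.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, created INTEGER, changed INTEGER)")
    prod.executemany("INSERT INTO node VALUES (?, ?, ?)",
                     [(1, 100, 100), (2, 110, 500), (3, 200, 200), (4, 300, 300)])
    print(nodes_changed_since_snapshot(prod, common=3))  # prints: [2]
```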

That’s it... for now.

Future

• Share sequences table between Dev and Prod

• Even/odd IDs (Drupal 6+)

• Macro recordings and playbacks
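For the even/odd idea, MySQL's `auto_increment_increment` and `auto_increment_offset` server variables let each server hand out a disjoint set of auto-increment values, which becomes useful once IDs come from auto-increment columns (as Drupal 6 largely does). The Python sketch below only simulates the values two servers would generate with increment 2 and offsets 1 and 2; it does not configure MySQL, and `server_ids` is a hypothetical helper.

```python
from itertools import count, islice


def server_ids(offset, increment=2):
    """IDs generated on an empty table with auto_increment_offset=offset and
    auto_increment_increment=increment: offset, offset + increment, ..."""
    return count(offset, increment)


if __name__ == "__main__":
    prod_ids = list(islice(server_ids(offset=1), 5))  # odd IDs on Prod
    dev_ids = list(islice(server_ids(offset=2), 5))   # even IDs on Dev
    print(prod_ids, dev_ids)  # prints: [1, 3, 5, 7, 9] [2, 4, 6, 8, 10]
    assert not set(dev_ids) & set(prod_ids)  # the two sites never collide
```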

Questions?

• Shaun Haber, shaun.haber@wbr.com

http://srhaber.com

Twitter: @srhaber
