57
The Digital Mayflower Data Pilgrimage with the Migrate Module 1

Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

Embed Size (px)

Citation preview

Page 1: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

The Digital MayflowerData Pilgrimage with the Migrate Module

1

Page 2: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

Erich BeyrentWorking with Drupal since 2004!

Drupal: http://drupal.org/u/ebeyrent Twitter: http://twitter.com/ebeyrent Github: http://github.com/ebeyrent

2

Page 3: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

Agenda

3

Page 4: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

Agenda

• Prerequisites

4

Page 5: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

Agenda

• Prerequisites • Migrate Concepts

5

Page 6: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

Agenda

• Prerequisites • Migrate Concepts • Implementation

6

Page 7: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

Agenda

• Prerequisites • Migrate Concepts • Implementation • Classes & Organization

7

Page 8: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

Agenda

• Prerequisites • Migrate Concepts • Implementation • Classes & Organization • Execution

8

Page 9: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

Agenda

• Prerequisites • Migrate Concepts • Implementation • Classes & Organization • Execution • Performance

9

Page 10: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

Agenda

• Prerequisites • Migrate Concepts • Implementation • Classes & Organization • Execution • Performance • Q & A

10

Page 11: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

11

Page 12: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

12

Page 13: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

13

Page 14: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

14

Page 15: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

15

• A working Drupal 7 / Drupal 8 installation • Basic understanding of PHP OO • Understanding of Drupal modules, hooks,

and the API • Understanding of how to write SQL

and the QueryInterface

Prerequisites

Page 16: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

16

Page 17: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

Migrate Goals

17

• Accurate • Repeatable • Performant • Safe

Page 18: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

Concepts

18

Source

The source is the interface to the data that is being migrated.

Page 19: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

Concepts

19

Destination

Destinations are responsible for saving data in Drupal.

Page 20: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

Concepts

20

Map Connects the source to the destination

Page 21: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

Implementation

21

The ModuleImplements hook_migrate_api() Registers classes in the .info file

Page 22: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

22

; ** ; MY_MODULE.info ; ** dependencies[] = migratefiles[] = migrate_newsletters.module files[] = migrate_newsletters.migrate.inc files[] = base.inc files[] = newsletters.inc files[] = subscribers.inc files[] = sources.inc files[] = subscriptions.inc

Page 23: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

23

/** * Implements hook_migrate_api(). */ function MY_MODULE_migrate_api() { $api = array( 'api' => 2, 'groups' => array( 'newsletters' => array( 'title' => t('Newsletter'), ), ), 'field handlers' => array( 'DateMigrateFieldHandler', ), 'migrations' => array( 'Newsletters' => array( 'class_name' => 'NewslettersMigration', 'group_name' => 'newsletters', ), 'Sources' => array( 'class_name' => 'SourcesMigration', 'group_name' => 'newsletters', ), ), ); return $api; }

Page 24: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

24

/** * Implements hook_migrate_api(). */ function MY_MODULE_migrate_api() { $api = array( 'api' => 2, 'groups' => array( 'newsletters' => array( 'title' => t('Newsletter'), ), ), 'field handlers' => array( 'DateMigrateFieldHandler', ), 'migrations' => array( 'Newsletters' => array( 'class_name' => 'NewslettersMigration', 'group_name' => 'newsletters', ), 'Sources' => array( 'class_name' => 'SourcesMigration', 'group_name' => 'newsletters', ), ), ); return $api; }

Page 25: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

25

/** * Implements hook_migrate_api(). */ function MY_MODULE_migrate_api() { $api = array( 'api' => 2, 'groups' => array( 'newsletters' => array( 'title' => t('Newsletter'), ), ), 'field handlers' => array( 'DateMigrateFieldHandler', ), 'migrations' => array( 'Newsletters' => array( 'class_name' => 'NewslettersMigration', 'group_name' => 'newsletters', ), 'Sources' => array( 'class_name' => 'SourcesMigration', 'group_name' => 'newsletters', ), ), ); return $api; }

Page 26: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

26

/** * Implements hook_migrate_api(). */ function MY_MODULE_migrate_api() { $api = array( 'api' => 2, 'groups' => array( 'newsletters' => array( 'title' => t('Newsletter'), ), ), 'field handlers' => array( 'DateMigrateFieldHandler', ), 'migrations' => array( 'Newsletters' => array( 'class_name' => 'NewslettersMigration', 'group_name' => 'newsletters', ), 'Sources' => array( 'class_name' => 'SourcesMigration', 'group_name' => 'newsletters', ), ), ); return $api; }

Page 27: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

27

/** * Implements hook_migrate_api(). */ function MY_MODULE_migrate_api() { $api = array( 'api' => 2, 'groups' => array( 'newsletters' => array( 'title' => t('Newsletter'), ), ), 'field handlers' => array( 'DateMigrateFieldHandler', ), 'migrations' => array( 'Newsletters' => array( 'class_name' => 'NewslettersMigration', 'group_name' => 'newsletters', ), 'Sources' => array( 'class_name' => 'SourcesMigration', 'group_name' => 'newsletters', ), ), ); return $api; }

Page 28: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

Implementation

28

The ClassesMigration classes extend the abstract Migration class

Page 29: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

29

abstract class NewslettersBaseMigration extends Migration { /** * @var array */ protected $sourceConfiguration = array(); /** * Class constructor. */ public function __construct($group = NULL) { // Always call the parent constructor first for basic setup. parent::__construct($group); $this->team = array( new MigrateTeamMember(‘Joe Migrate', ‘[email protected]', t('admin')), ); $this->issuePattern = 'https://example.com/issues/:id:'; $this->sourceConfiguration = array( 'servername' => 'localhost', 'username' => 'admin', 'password' => 'bubbles', 'database' => ‘source_database', ); }}

Page 30: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

30

abstract class NewslettersBaseMigration extends Migration { /** * @var array */ protected $sourceConfiguration = array(); /** * Class constructor. */ public function __construct($group = NULL) { // Always call the parent constructor first for basic setup. parent::__construct($group); $this->team = array( new MigrateTeamMember(‘Joe Migrate', ‘[email protected]', t('admin')), ); $this->issuePattern = 'https://example.com/issues/:id:'; $this->sourceConfiguration = array( 'servername' => 'localhost', 'username' => 'admin', 'password' => 'bubbles', 'database' => ‘source_database', ); }}

Page 31: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

31

abstract class NewslettersBaseMigration extends Migration { /** * @var array */ protected $sourceConfiguration = array(); /** * Class constructor. */ public function __construct($group = NULL) { // Always call the parent constructor first for basic setup. parent::__construct($group); $this->team = array( new MigrateTeamMember(‘Joe Migrate', ‘[email protected]', t('admin')), ); $this->issuePattern = 'https://example.com/issues/:id:'; $this->sourceConfiguration = array( 'servername' => 'localhost', 'username' => 'admin', 'password' => 'bubbles', 'database' => ‘source_database', ); }}

Page 32: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

32

abstract class NewslettersBaseMigration extends Migration { /** * @var array */ protected $sourceConfiguration = array(); /** * Class constructor. */ public function __construct($group = NULL) { // Always call the parent constructor first for basic setup. parent::__construct($group); $this->team = array( new MigrateTeamMember(‘Joe Migrate', ‘[email protected]', t('admin')), ); $this->issuePattern = 'https://example.com/issues/:id:'; $this->sourceConfiguration = array( 'servername' => 'localhost', 'username' => 'admin', 'password' => 'bubbles', 'database' => ‘source_database', ); }}

Page 33: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

Anatomy of a migration class

33

Migration classes typically contain four methods: • __construct() • prepare() • prepareRow() • complete()

Page 34: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

34

class NewslettersMigration extends NewslettersBaseMigration { /** * Class constructor method. * * @param array $arguments */ public function __construct(array $arguments = array()) { parent::__construct($arguments); $this->description = t(‘Newsletters’); $options = array( 'track_changes' => 1 );

Page 35: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

35

class NewslettersMigration extends NewslettersBaseMigration { /** * Class constructor method. * * @param array $arguments */ public function __construct(array $arguments = array()) { parent::__construct($arguments); $this->description = t(‘Newsletters’); $options = array( 'track_changes' => 1 );

Page 36: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

36

// List of source fields to import. $source_fields = array( ‘email’ => t(‘Subscriber email address’), // … ); $select_query = Database::getConnection('default', ‘legacy') ->select('profile', 'p'); $select_query->fields('p', array()); $select_query->orderBy(‘email’);

$count_query = Database::getConnection('default', ‘legacy') ->select('profile', 'p'); $count_query->fields('p', array()); $count_query->orderBy(‘email’); $count_query->addExpression(‘COUNT(email)’, ‘count’);

$this->source = new MigrateSourceSQL( $select_query, $source_fields, $count_query, $options );

Page 37: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

37

$this->destination = new MigrateDestinationTable(‘subscribers’);

/* * Other examples of destination: * * $this->destination = new MigrateDestinationNode(‘newsletter'); * * $this->destination = new MigrateDestinationTerm(‘advertising_source'); * */

Page 38: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

38

// Source and destination relation for rollbacks.$this->map = new MigrateSQLMap( $this->machineName, array( 'id' => array( 'type' => 'int', 'not null' => TRUE, 'description' => 'Newsletter ID.', 'alias' => 'n', ) ), MigrateDestinationNode::getKeySchema());

Page 39: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

39

// Source and destination relation for rollbacks.$this->map = new MigrateSQLMap( $this->machineName, array( 'member_id' => array( 'type' => 'varchar', 'length' => 128, 'not null' => TRUE, 'description' => 'Subscriber Email ID.', 'alias' => 'member_id', ), ), MigrateDestinationTable::getKeySchema('subscribers') );

Page 40: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

40

// Source and destination relation for rollbacks.$this->map = new MigrateSQLMap( $this->machineName, array( 'Source_ID' => array( 'type' => 'int', 'not null' => TRUE, 'description' => 'Source ID.', 'alias' => 't', ), ), MigrateDestinationTerm::getKeySchema());

Page 41: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

41

// Basic field mapping. $this->addFieldMapping('email', 'email') ->description(t('Map the subscriber email from the source’));

// More advanced field mapping. $this->addFieldMapping('uuid', 'uuid') ->description(t('UUID for the subscriber’)) ->defaultValue(uuid_generate()) ->issueGroup(t('Client questions')) ->issueNumber(765736) ->issuePriority(MigrateFieldMapping::ISSUE_PRIORITY_OK);

// Getting fancy - convert string to an array. $this->addFieldMapping(‘field_tags', ‘tags') ->separator(‘,’);

// Process the field with a callback. $this->addFieldMapping(‘json_field’, ‘json’) ->callback(‘json_decode’);

// De-duplicate data. $this->addFieldMapping('name', 'username') ->dedupe('users', 'name');

Page 42: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

42

// Map the field to a value from another migration. $this->addFieldMapping('uid', 'author_id') ->sourceMigration('Users');

Map fields to previously migrated fields

Page 43: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

43

// Unmapped source fields. $this->addUnmigratedSources( array( 'first_name', 'last_name', )); // Unmapped destination fields.$this->addUnmigratedDestinations( array( 'eid', ));

Skip source and destination fields

Page 44: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

44

// Define the highwater field. $this->highwaterField = array( 'name' => 'last_changed', 'alias' => 'w', 'type' => 'int', );

Define the highwater field

Page 45: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

Anatomy of a migration class

45

• prepare($entity, stdClass $row) This method is called just before saving the entity.

• complete($entity, stdClass $row) This method is called immediately after the entity is saved.

Page 46: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

Anatomy of a migration class

46

• prepareRow(stdClass $row) This method is for conditionally skipping rows,and for modifying fields before other handlers

• createStub($migration, array $id) This method is for creating stub objects to be referenced by the current row

Page 47: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

Anatomy of a migration class

47

source: http://alexrayu.com/blog/drupal-migration-onion-skin-or-node

Page 48: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

Migration flow

48

Iterator prepareRow()

Mappings & field handlers

prepare()

complete()

Page 49: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

Execution

49

Use drush, not the web interface https://www.drupal.org/node/1806824

Run a single migration:drush migrate-import Subscribers

Run all the migrations in a migration group:

drush migrate-import --group newsletters

Page 50: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

Performance

50

Benchmarking

$ drush mi Migration --instrument [timer, memory, all]

$ drush mi Newsletters \ --limit="100 items" \ --instrument=“all" \ --feedback=“30 seconds"

Page 51: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

Performance

51

Database indexes add a lot of overhead on writes, especially with multiple multi-column indexes

db_drop_index($table, $index) db_add_index($table, $index, array $fields)

Page 52: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

52

class MyMigration extends MigrationBase {

/** * Code to execute before first article row is imported. */ public function preImport() { parent::preImport(); $this->dropIndexes(); }

/** * Code to execute after the last article row has been imported. */ public function postImport() { parent::postImport(); $this->restoreIndexes(); }

protected function dropIndexes() { db_drop_index('my_table', 'name_of_index'); return $this; }

protected function restoreIndexes() { db_add_index('my_table', 'name_of_index', array('name_of_field')); return $this; } }

Page 53: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

Performance

53

MySQL Tuning

innodb_flush_log_at_trx_commit = 0 innodb_doublewrite = 0 innodb_support_xa = 0

Page 54: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

Performance

54

MySQL Tuningkey_buffer_size = 128M max_allowed_packet = 20M query_cache_size = 128M table_open_cache = 64 read_buffer_size = 2M read_rnd_buffer_size = 8M myisam_sort_buffer_size = 16M join_buffer_size = 4M tmp_table_size = 92M max_heap_table_size = 92M sort_buffer_size = 4M innodb_additional_mem_pool_size = 8M

Page 55: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

Performance

55

MySQL Tuning

innodb_file_per_table

Page 56: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

Resources

56

Project page - http://drupal.org/project/migrate Migrate Handbook - http://drupal.org/node/415260

Drush migrate commands - http://drupal.org/node/1561820 Using stubs - http://drupal.org/node/1013506 Database Tuning - http://www.drupal.org/node/1994584

Page 57: Digital Mayflower - Data Pilgrimage with the Drupal Migrate Module

Questions & Answers

57