An Introduction to Stubs

A common need in our projects is the ability to migrate data that references other content. This can take the form of a taxonomy hierarchy (i.e., a parent > child relationship) or of attached content such as images, videos, or other nodes that exist as standalone entities in the system.

When importing referenced content, there is usually no guarantee that the parent entity will exist before the child entity that points to it is created. Without stubs, if an item references content that does not yet exist in the destination site, the connection between the two entities is simply not made. Picking up these missing relationships would mean running the migration again, and depending on the size of the data set, that extra pass could roughly double the time required to complete a migration.

Stubs, or placeholders, solve this “chicken and egg” problem: a stub is referenced in the system until the full entity can be imported. For more detail on the chicken and egg structure, see “Chickens and eggs: using stubs” on Drupal.org.

Stubs in Drupal 8

For our examples, we’re going to look at the most common hierarchical relationship in Drupal sites: Taxonomy Terms.

In Drupal 7, stubs could be created for any migration by implementing a createStub() method on the migration class:

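class CustomMigration extends Migration {
  // ... Normal migration definitions here ...

  /**
   * Creates a placeholder term when a referenced term has not been imported.
   */
  protected function createStub(Migration $migration, array $source_id) {
    $term = new stdClass();
    // Give stubs a predictable name so leftovers are easy to find later.
    $term->name = t('Stub for @id', array('@id' => $source_id[0]));
    // Save the stub into the same vocabulary this migration imports into.
    $vocabulary = taxonomy_vocabulary_machine_name_load($this->destination->getBundle());
    $term->vid = $vocabulary->vid;
    taxonomy_term_save($term);
    // Return the new term ID so Migrate can map the source ID to it.
    if (isset($term->tid)) {
      return array($term->tid);
    }
    else {
      return FALSE;
    }
  }
}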

In the above example, any migration that listed “CustomMigration” as a source migration (via sourceMigration()) in a field mapping would look up the terms already imported by CustomMigration and create a stub if the term did not exist yet. I find it helpful to name my stubs in a standard way. This makes it easier to find any stubs that were never updated when the full term records were imported.

In Drupal 8, this looks very different.

In the YAML file that controls this migration, the standard taxonomy fields under “process” should look something like this:

  vid:
    plugin: default_value
    default_value: your_vocabulary_id
  title:
    -
      plugin: get
      source: name
    -
      plugin: default_value
      default_value: Placeholder Term
  name:
    -
      plugin: get
      source: name
    -
      plugin: default_value
      default_value: Placeholder Term
  parent:
    plugin: migration
    migration: your_migration_id
    source: your_migration_identifier

Plugin Declarations Explained

Under vid, we set a default value of “your_vocabulary_id”; replace it with the machine name of the vocabulary the terms should be imported into.

For title and name, we first declare the “get” process plugin, which is the default plugin Migrate uses to copy data: it defines a one-to-one pairing between a destination field and a source property. Normally, these mappings are written in the short form field_name: imported_field_name. When creating stubs, it is important to use the long form, because we need to chain a fallback plugin after get.

The fallback plugin is “default_value”, which accepts a string and writes it to the name and title fields whenever the incoming value is empty. When the system creates a stub, the name and title are naturally not known yet, so these settings write “Placeholder Term” as the name and title of the stub term instead.

At this point, the record is marked as needing an update, and Migrate automatically updates it when the full term is finally processed. The placeholder values are then overwritten along with the rest of the data being mapped onto the term.

This isn’t limited to default taxonomy fields! The same approach works for custom fields added to the taxonomy term.
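To make the fallback chain concrete, here is a minimal standalone PHP sketch that models the get → default_value pipeline for the name field. It is an illustration, not Drupal’s code: the function names get_plugin(), default_value_plugin(), and process_name() are hypothetical, and the fallback check assumes the non-strict behavior of default_value, where any empty value is replaced by the default.

```php
<?php
// Hypothetical, standalone model of the "get" -> "default_value" chain.
// This is an illustration only, not Drupal's process plugin code.

const PLACEHOLDER = 'Placeholder Term';

// "get": copy the source property as-is, or NULL when the row lacks it.
function get_plugin(array $row, string $property)
{
    return $row[$property] ?? null;
}

// "default_value" (non-strict): keep a non-empty value, otherwise
// substitute the configured default.
function default_value_plugin($value, string $default)
{
    return $value ?: $default;
}

// The long-form pipeline for "name": get first, then the fallback.
function process_name(array $row): string
{
    return default_value_plugin(get_plugin($row, 'name'), PLACEHOLDER);
}

// A stub row only carries the source ID, so the name falls back.
echo process_name(['tid' => 42]), PHP_EOL; // Placeholder Term
// The full row processed later supplies the real name.
echo process_name(['tid' => 42, 'name' => 'Biology']), PHP_EOL; // Biology
```

The short form (name: name) would only ever run the first step, which is why the long form is needed as soon as a fallback is involved.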

The final entry, parent, is what actually triggers the stub logic. It uses the “migration” process plugin (renamed “migration_lookup” in Drupal 8.3 and later), which looks up the parent’s destination ID among the items already imported by the named migration. If the lookup fails, the plugin creates a stub using the fallback logic defined on the migration’s fields: every field with a fallback is populated, and the new stub term is saved.
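The lookup-then-stub flow can be modeled with plain PHP arrays. In this sketch, which is a simplification and not the plugin’s real implementation, $idMap stands in for the migration’s map table and $terms for saved taxonomy terms; lookup_or_stub() and import_term() are hypothetical names. The rows are deliberately processed child first, so the parent has to be stubbed and later filled in.

```php
<?php
// Hypothetical model of "look up the parent, or create a stub".
// $idMap: source ID => ['tid' => int, 'needs_update' => bool]
// $terms: tid => ['name' => string, 'parent' => int|null]

function lookup_or_stub($sourceId, array &$idMap, array &$terms): int
{
    // Already imported (or already stubbed): reuse its term ID.
    if (isset($idMap[$sourceId])) {
        return $idMap[$sourceId]['tid'];
    }
    // Not imported yet: save a placeholder and flag it for update.
    // Term IDs are sequential here because nothing is ever deleted.
    $tid = count($terms) + 1;
    $terms[$tid] = ['name' => 'Placeholder Term', 'parent' => null];
    $idMap[$sourceId] = ['tid' => $tid, 'needs_update' => true];
    return $tid;
}

function import_term(array $row, array &$idMap, array &$terms): int
{
    // Resolve the parent first; this may create a stub.
    $parentTid = null;
    if ($row['parent'] !== null) {
        $parentTid = lookup_or_stub($row['parent'], $idMap, $terms);
    }
    // If this row was stubbed earlier, overwrite the stub in place.
    $tid = $idMap[$row['id']]['tid'] ?? count($terms) + 1;
    $terms[$tid] = ['name' => $row['name'], 'parent' => $parentTid];
    $idMap[$row['id']] = ['tid' => $tid, 'needs_update' => false];
    return $tid;
}

$idMap = [];
$terms = [];
// The child (source ID 11) arrives before its parent (source ID 10).
import_term(['id' => 11, 'name' => 'Genetics', 'parent' => 10], $idMap, $terms);
import_term(['id' => 10, 'name' => 'Biology', 'parent' => null], $idMap, $terms);

foreach ($terms as $tid => $term) {
    echo $tid, ': ', $term['name'], ' (parent: ', $term['parent'] ?? 'none', ')', PHP_EOL;
}
// Prints:
// 1: Biology (parent: none)
// 2: Genetics (parent: 1)
```

Because the stub keeps its term ID when the full record arrives, the child’s parent reference never has to be repaired, which is why no second migration pass is needed.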

This is similar to the field mapping logic from Drupal 7:

$this->addFieldMapping('parent', 'your_migration_identifier')
  ->sourceMigration('your_migration_id');

Pitfalls and Performance

When dealing with data, there is always the potential for bad data and outdated references. What if the parent term was deleted in the external data, but the child terms were never updated to reflect that? A stub would be created for the missing parent and never updated.

As with any migration, it is important to review the results early and often. Once a full migration has been created and executed, comb through it looking for any issues that might have arisen. While stubs are powerful, they can also require more processing time, especially at the beginning, when data is first being imported.
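One way to start that review is to look for stubs that never received their full record, which consistent placeholder naming makes easy. The sketch below is a hypothetical standalone check over a plain tid => name array; on a real site the list would come from the database (for example, an entity query on taxonomy_term filtered by name), and find_leftover_stubs() is an illustrative name only.

```php
<?php
// Hypothetical post-migration check: find terms still carrying the
// placeholder name, i.e. stubs whose full record never arrived
// (for example, children that point at a deleted parent).

const PLACEHOLDER_NAME = 'Placeholder Term';

/**
 * @param array $terms tid => term name, e.g. loaded from the database.
 * @return int[] The term IDs that are still placeholders.
 */
function find_leftover_stubs(array $terms): array
{
    $stubs = array_filter($terms, function ($name) {
        return $name === PLACEHOLDER_NAME;
    });
    // array_filter() preserves keys, so the keys are the term IDs.
    return array_keys($stubs);
}

$terms = [
    1 => 'Biology',
    2 => 'Placeholder Term', // Parent deleted upstream; never filled in.
    3 => 'Genetics',
];
echo implode(', ', find_leftover_stubs($terms)), PHP_EOL; // 2
```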

Depending on the order in which the data is imported and how deep the references go, the migration may create many stubs to fill in the references the terms require, and then update them as their full records arrive. This is normal when using stubs. In that case, the command-line output can look something like this:

Processed 263 items (230 created, 33 updated, 0 failed, 0 ignored) - done with 'subjects'

Modules and Documentation

The following are helpful contributed modules:

  • Migrate Tools – provides Drupal 8 Drush commands (missing from core) similar to those available in Drupal 7, plus a limited UI.

  • Migrate Plus – provides additional source parsers (such as JSON feeds, HTTP, and other dynamic sources).

The Drupal.org documentation team has created extensive documentation on Migrate and the various process plugins available.