Custom Preprocessing

In Drupal 7 there were numerous ways to preprocess data before migrating values. This was especially useful when inconsistencies in the data source needed to be addressed. One example I have dealt with was migrating email addresses from an old system that didn’t validate the format of the emails in its user fields. Left alone, these errors would have produced faulty data in the new Drupal site. I preprocessed the field data to correct the most common mistakes, which reduced the number of erroneous addresses brought over.

Custom preprocessing logic can handle filling in empty fields that Drupal requires, converting zeros and ones to no or yes, setting a published state based on a timestamp, and more.
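
To make those cases concrete, here is a minimal sketch in plain PHP with no Drupal dependencies. The function names, the default value, and the rule "published once the timestamp has passed" are all assumptions made for illustration; they are not taken from the original project or from any Drupal API.

```php
<?php

declare(strict_types=1);

// Hypothetical helper: fall back to a default when a field that Drupal
// requires arrives empty (NULL, '' or whitespace only).
function default_if_empty(?string $value, string $default): string {
  $value = trim((string) $value);
  return $value === '' ? $default : $value;
}

// Hypothetical helper: convert a 0/1 flag (integer or numeric string) into
// 'no' or 'yes'. Anything other than 1 is treated as 'no'.
function flag_to_label($flag): string {
  return ((int) $flag) === 1 ? 'yes' : 'no';
}

// Hypothetical helper: return 1 (published) when the publish timestamp is
// at or before $now, otherwise 0 (unpublished).
function published_from_timestamp(int $publish_on, int $now): int {
  return $publish_on <= $now ? 1 : 0;
}
```

Inside a migration, helpers like these would typically be called on values read with $row->getSourceProperty() and written back with $row->setSourceProperty().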

prepareRow

The prepareRow() function still exists in Drupal 8 but has been moved into source plugins. This is in keeping with Drupal 8’s approach of splitting the migration workflow, formerly a single large class, into smaller source, process, and destination plugins.

To use this functionality, create a custom class that extends the source plugin you’re using. Our example was pulled from a recent project that accessed a JSON feed of category data, complete with extra fields and parent term data. To read the JSON data, install the contributed module Migrate Plus, which provides the Url source plugin class.

Our custom class could look something like the following:

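// Placeholder namespace: replace my_module with your module's machine name.
// The file belongs in that module's src/Plugin/migrate/source directory.
namespace Drupal\my_module\Plugin\migrate\source;

use Drupal\migrate_plus\Plugin\migrate\source\Url;
use Drupal\migrate\Row;

/**
 * Source plugin that retrieves data via URLs and adds a parent_uuid property.
 *
 * @MigrateSource(
 *   id = "custom_process_plugin"
 * )
 */
class CustomProcessPlugin extends Url {

  /**
   * {@inheritdoc}
   */
  public function prepareRow(Row $row) {
    // Set the parent uuid on the term itself.
    if ($row->hasSourceProperty('parent')) {
      $parent = $row->getSourceProperty('parent');
      // Skip rows whose parent value has no usable URL.
      if (is_array($parent) && !empty($parent['url'])) {
        // The UUID is everything after the last slash in the URL.
        $parent_uuid = substr($parent['url'], strrpos($parent['url'], '/') + 1);
        $row->setSourceProperty('parent_uuid', $parent_uuid);
      }
    }
    return parent::prepareRow($row);
  }

}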

In the above example, the source feed had a value called parent, which was an array containing the name of the parent term and a URL pointing to it. The URL contained the UUID from the external system, which uniquely identified each term (parent or otherwise). This UUID was always at the very end of the URL, so we extracted it and set it as parent_uuid in the source data. It is important to note that, in our case, this value did not exist in the original JSON data. In this way, custom field values can be added to the source data as needed. The setSourceProperty() method doesn’t require that the property being written exist in the source, so you are free either to overwrite the values of existing properties or to create new properties on the fly.
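
The extraction step can be tried on its own, outside Drupal. The sketch below is a slightly more defensive variant of the one-liner in the class, which returns an empty value if the URL happens to end in a slash. The function name and the example.com URLs in the usage note are hypothetical and only stand in for the real feed.

```php
<?php

declare(strict_types=1);

// Hypothetical helper: return the last path segment of a URL, or NULL when
// the URL has no path segment to extract.
function last_url_segment(string $url): ?string {
  // Keep only the path, which drops any query string or fragment, then
  // strip trailing slashes so ".../abc/" behaves like ".../abc".
  $path = rtrim((string) parse_url($url, PHP_URL_PATH), '/');
  if ($path === '') {
    return NULL;
  }
  $pos = strrpos($path, '/');
  return $pos === FALSE ? $path : substr($path, $pos + 1);
}
```

For a made-up URL such as http://example.com/terms/abc-123, this returns abc-123, which is the value that would be stored as parent_uuid. If the feed's URLs always end in the UUID, as ours did, the shorter substr() and strrpos() approach is sufficient.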

In our case, we wanted to extract the UUID of a parent term in order to preserve the relationship between the parent and child taxonomy terms being created. This value is used as a placeholder and is discussed in a previous blog post.

For the final step, the migrate YAML file needs to be updated to use this custom class. The plugin’s machine name comes from the id in the annotation on our custom source plugin class.

source:
 plugin: custom_process_plugin
 data_fetcher_plugin: http
 data_parser_plugin: json
 urls: 'http://url_of_json_source'

And that should do it! If the preprocessing you’re doing in the prepareRow() function is needed for all incoming data, you can reuse this plugin at any time, or create new ones as necessary.

More Reading

Registering Migrations in Drupal 8
Migrating Content References in Drupal 8

Modules and Documentation

The following are helpful contributed modules:

  • Migrate Tools – provides the Drush commands that were available in Drupal 7 but are missing from Drupal 8, plus a limited UI.

  • Migrate Plus – provides additional source plugins, including support for JSON feeds, HTTP, and other dynamic sources.

The Drupal.org documentation team has created a lot of great documentation on Migrate and the various process plugins available.