Creating Content with YAML Content Module

In my previous post, I introduced the YAML Content module and described the goal and usage for it at a high level. In this post, I aim to provide a more in-depth look at how to write content for the module and how to take advantage of a couple of the more advanced options included.

Import Data Structure: Creating a Node

To start, we'll look at creating a basic Node to explore the data structure being used. The following is an example of YAML content that could be included directly in a content file for import, assuming a matching Node type with the same fields exists in the database. For these examples, a basic Drupal install with the Standard profile meets this requirement.

# Add a basic article page with simple values.
- entity: "node"
  type: "article"
  title: "Basic Article"
  status: 1
  # Rich text fields contain multiple keys that must be provided.
  body:
    - format: "basic_html"
      # Using a pipe we can define content across multiple lines.
      value: |
        <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed vobis
        voluptatum perceptarum recordatio vitam beatam facit, et quidem corpore
        perceptarum. Tum Quintus: Est plane, Piso, ut dicis, inquit.</p>
        <p>Primum cur ista res digna odio est, nisi quod est turpis? Duo Reges:
        constructio interrete. Rhetorice igitur, inquam, nos mavis quam
        dialectice disputare?</p>

Using this structure, any entity defined in Drupal may be created with or without field values populated. An entity is flagged for creation using the entity key, whose value is the machine name of the entity type to be created. The remaining keys at that level are either property names or field names on the entity. The available property names vary by entity type, but most true properties are assigned directly: the key is the property name, and its value is stored as given.

Populating fields is a bit more complex, and the body key is an example of this. The body field is a special case in that its name does not follow the field_* naming convention typical of custom fields, but it still passes through the field architecture and is populated in the same way.

Under the field name, we assign an array of field item values. In YAML, each item in this array is introduced by a - prefix at the next level of indentation. Depending on the type of field being populated, the properties assigned to each field item may vary. In the case of rich text fields, like the body field, there are two properties: value and format. At the field item level, we assign these properties in the same way as at the entity level.
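As a rough illustration of the array structure described above, the sketch below assigns two items to a multi-value field. The field name field_highlights is hypothetical and assumes a multi-value plain text field has been added to the article type; it is not part of the Standard profile.

```yaml
# A sketch only: `field_highlights` is a hypothetical multi-value plain text
# field that would need to be added to the article type before import.
- entity: "node"
  type: "article"
  title: "Article with Multiple Field Items"
  status: 1
  field_highlights:
    # Each `-` begins one field item in the field's list of values.
    - value: "First highlight"
    - value: "Second highlight"
```

A single-value field uses the same shape with just one item in the list.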

How to Determine a Data Structure

To understand and map the entity data structure, it is important to understand the hierarchy of classes it represents.

ContentEntityBase
- FieldItemList
  - FieldItemBase

Within this structure, the top level of data containing the entity key maps to the Entity object being created. Most of the properties to be assigned at this level may be found within the entity_keys of the entity type definition. A look at the plugin annotation for the Node class can give some insight into this. When looking at these keys, it's important to note that the properties to use in your data files should be the ones specific to the entity type being created. That is to say, your property keys should match the values from the entity_keys array, not its keys.
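For instance, the taxonomy term entity type maps its bundle entity key to vid and its label entity key to name, so a term would be described with those property names rather than type and title. The following is a minimal sketch, assuming the tags vocabulary from the Standard profile exists:

```yaml
# Taxonomy terms use `vid` (bundle) and `name` (label) as their property
# names, matching the entity_keys values in the Term entity type definition.
- entity: "taxonomy_term"
  vid: "tags"
  name: "Generated content"
```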

The next level below the entity is where we've indicated a field name for assignment. In the architecture described above, a field name used as a key takes us to the level of a FieldItemList. Loosely, the FieldItemList can be thought of as an array of values assigned to a specific field. Even a single-value field passes through a FieldItemList class if it uses the Typed Data API. While this may be confusing at first, it is actually fortunate, since it allows all field values to be referenced and assigned using the same structure. To map through this layer, each field item to be assigned to a field must be contained within an array. In YAML this is represented by prefixing each item with a -.

At the individual field item level, we're mapping more specifically to an extension of the FieldItemBase class. At this level, the properties available for assignment may become more specialized. Examples of this include rich text fields (TextItem), entity reference fields (EntityReferenceItem), and link fields (LinkItem). While the property keys needed for assignment of each of these field types may vary, it is possible to identify the keys by determining the FieldItem class corresponding to the field being assigned. Once the FieldItem class is identified, inspecting the propertyDefinitions() method will describe the properties for assignment.
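As a sketch of how these property keys differ by field type, the example below populates a text field, a link field, and an entity reference field side by side. The names field_link and field_related are hypothetical fields assumed to have been added to the article type, and the target ID of 1 assumes an entity with that ID already exists. The keys shown are the main properties defined by each field item class.

```yaml
# A sketch only: `field_link` and `field_related` are hypothetical fields.
- entity: "node"
  type: "article"
  title: "Field Item Property Examples"
  status: 1
  # Text item: `value` and `format`.
  body:
    - format: "basic_html"
      value: "<p>Example body text.</p>"
  # Link item: `uri` and `title`.
  field_link:
    - uri: "https://www.drupal.org"
      title: "Drupal.org"
  # Entity reference item: `target_id` (assumes an entity with ID 1 exists).
  field_related:
    - target_id: 1
```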

Advanced Value Assignments

More advanced value assignments may be used throughout content to leverage some of the utilities built into the YAML Content module to create more dynamically interconnected or enriched content.

Processing Functions

Processing functions may be used throughout content being imported to provide dynamic content values to be populated during the import process. At the time of this writing, the following processing callbacks are available for use within field items:

- Reference: Query for an entity ID to populate as the target ID in an entity reference field.
- File: Query for an existing file by file name, and upload the asset if it doesn't exist.

Entity References

Nested Content Creation

Referencing other content can be done in a couple of ways. The most convenient method takes advantage of the entity save system to handle nested entities during the save process. If the content being referenced doesn't exist yet elsewhere in the content file, or doesn't need to exist as an independent entity (an individual Paragraph entity, for example), it may be defined fully within the parent field as an item value. See the code snippet below for a basic example of this.

- entity: "node"
  type: "page"
  title: "Paragraph Example"
  status: 1
  # Populate an example paragraph field.
  field_paragraph_content:
    # Define a nested entity directly as a field item value.
    - entity: 'paragraph'
      type: 'rich_text'
      field_title:
        - value: "Paragraph Headline"
      field_body:
        - value: |
            <p>Lorem ipsum...</p>
          format: 'full_html'

In the snippet above, the parent entity is defined normally. Within the paragraph field, a nested entity is then defined directly as an item value. This works by populating the nested field architecture of the overall node. Once the parent entity is saved, the nested structure is traversed recursively to save all child entities, which determines the entity IDs to be stored in the parent's entity reference fields.

As long as entity existence checking is enabled for the import operations (it is by default), this approach should work fine as an alternative for the more complex usage of an entity reference callback. The only exceptions to this as of the time of this writing are Paragraph and Media entities which are never updated in place due to the need for more specialized logic to uniquely identify instances of them (see issue #2893055 for more detail).
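The same nesting pattern may be sketched for other entity types. The example below assumes the field_tags field and tags vocabulary from the Standard profile, and defines the term inline instead of using a processing callback. Whether an existing term with the same values is reused depends on the existence checking described above.

```yaml
# A sketch only: the term name below is a placeholder value.
- entity: "node"
  type: "article"
  title: "Article with a Nested Term"
  status: 1
  field_tags:
    # Define the referenced term inline as the field item value.
    - entity: "taxonomy_term"
      vid: "tags"
      name: "Nested example term"
```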

Entity Reference Processing

In cases where referenced content needs to be identified more dynamically, the reference callback may be used to query existing entities and build the target_id value required for entity reference fields.

- entity: "node"
  type: "article"
  title: "Tagged Article"
  status: 1
  body:
    - format: "full_html"
      # Using a pipe we can define content across multiple lines.
      value: |
        <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed vobis
        voluptatum perceptarum recordatio vitam beatam facit, et quidem corpore
        perceptarum. Tum Quintus: Est plane, Piso, ut dicis, inquit. Primum cur
        ista res digna odio est, nisi quod est turpis? Duo Reges: constructio
        interrete. Rhetorice igitur, inquam, nos mavis quam dialectice disputare?</p>
  # Using the tags below assumes the tags were created manually or imported earlier.
  field_tags:
    # This is done via a preprocessor.
    - '#process':
        # First we designate the processor callback to be used.
        callback: 'reference'
        # Each callback may require a set of arguments to configure its behavior.
        args:
          # Indicate the machine name of the entity type to be referenced.
          - 'taxonomy_term'
          # Provide a list of conditions to filter the content matches.
          # Each property filter maps directly to an EntityQuery condition.
          - vid: 'tags'
            name: 'Generated content'
    # Processors may be called multiple times to fill in any content requirements.
    - '#process':
        callback: 'reference'
        args:
          - 'taxonomy_term'
          - vid: 'tags'
            name: 'Imported demo content'

The code snippet above demonstrates usage of the reference processor to query existing taxonomy terms to be applied to the article being created. If either taxonomy term does not already exist, it will be created with the basic values provided to the callback query. Behind the scenes, the reference callback builds an entity query to search for existing content of the entity type designated in the first callback argument. Each keyed value in the second item of the args array is then added as a condition on the entity query before it is executed. This mechanic offers a great deal of flexibility, since the entity query system was greatly expanded in Drupal 8. As the list of options available for query conditions shows, very complex queries may be built by providing the field condition as the argument key and the search value as the argument value.
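As a further sketch, the same callback could be pointed at a different entity type by changing its arguments. The field name field_related_content below is hypothetical, and the example assumes a page node titled "Paragraph Example" is available to the query. Each key under the second argument is passed as a condition on the entity query.

```yaml
# A sketch only: `field_related_content` is a hypothetical entity reference
# field that would need to be added to the article type before import.
- entity: "node"
  type: "article"
  title: "Article Referencing a Page"
  status: 1
  field_related_content:
    - '#process':
        callback: 'reference'
        args:
          # The entity type to query.
          - 'node'
          # Conditions: match on both bundle and title.
          - type: 'page'
            title: 'Paragraph Example'
```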

File and Image References

Adequate demonstration of a site's functionality often requires files and image assets to be intermingled with content. Using the file processor, YAML Content supports the inclusion of files and images throughout content. To begin with, any images referenced from content files are expected to be located within an images/ directory beside the content/ directory containing the content files being imported. Likewise, any file or media assets being included with imported content are assumed to be located within a data_files/ directory on the same level.

# Files like images can even be referenced and added within content.
- entity: "node"
  type: "article"
  title: "Article with an Image"
  status: 1
  body:
    - format: "full_html"
      # Using a pipe we can define content across multiple lines.
      value: |
        <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed vobis
        voluptatum perceptarum recordatio vitam beatam facit, et quidem corpore
        perceptarum. Tum Quintus: Est plane, Piso, ut dicis, inquit. Primum cur
        ista res digna odio est, nisi quod est turpis? Duo Reges: constructio
        interrete. Rhetorice igitur, inquam, nos mavis quam dialectice disputare?</p>
  field_tags:
    - '#process':
        callback: 'reference'
        args:
          - 'taxonomy_term'
          - vid: 'tags'
            name: 'Generated content'
  field_image:
    # To lookup and add files we'll need to use a different callback function.
    - '#process':
        # In this case we're looking up a file, so we'll use the `file` callback.
        callback: 'file'
        args:
          # Our first argument is the bundle of the file being loaded.
          - 'image'
          # For this callback our additional arguments are telling what file we want.
          # By default, images are searched for within an `images` directory beside the
          # `content` directory containing our content files.
          - filename: 'demo-image.jpg'
      # Additional properties needed for a reference field may be defined at the same
      # level as the process indicator.
      alt: "Don't forget the alt text."

The example above again demonstrates a use of the reference processor to populate a taxonomy term field, but in the next field it provides an example of including an image file within an image field. Like before, the processor is defined within the individual field item it should populate. The rest of the definition is very similar in overall structure to the previous reference callback.

First, we define the specific callback to be used. In this case, we're looking up a file. Next we define the arguments required for this callback. The first of these is the bundle of the file being loaded. If this bundle is defined as image, the file will be searched for in the images/ directory. Otherwise, the file will be searched for within the data_files/ directory. The only other argument required in this case is the file name to be searched for. Given this definition, the processor will walk through the following steps before proceeding through the rest of the import process:

1. Search for the file at images/demo-image.jpg.
2. Save the file as a managed file in Drupal.
3. Return the new managed file ID for reference in the parent entity reference field.
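For comparison, a non-image file would follow the second branch described above. In this sketch, field_attachment is a hypothetical file field, the bundle value 'file' is an assumption standing in for any bundle other than image, and demo-document.pdf is a placeholder file assumed to exist in the data_files/ directory.

```yaml
# Assumed directory layout, following the conventions described above:
#   content/      <- YAML content files
#   images/       <- files loaded with the `image` bundle
#   data_files/   <- all other files
- entity: "node"
  type: "article"
  title: "Article with an Attachment"
  status: 1
  # `field_attachment` is a hypothetical file field.
  field_attachment:
    - '#process':
        callback: 'file'
        args:
          # Any bundle other than `image` is looked up in `data_files/`.
          - 'file'
          - filename: 'demo-document.pdf'
      # File fields accept an optional description on each item.
      description: "Example attachment"
```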

Hopefully this has shed some light onto a few more advanced methods of creating content with the YAML Content module. For any other questions feel free to reach out through the issue queue or the comments section below!

Additional Resources
- Previous Article: Introducing the YAML Content Module
- YAML Content module
- Module documentation
- Reference card
- Wikipedia: YAML