How can I index arbitrarily related entity data in a Search API index the same as if it were an entity reference?

This is possible with custom search_api processors.

First, I created an abstract class to hold the shared functionality, i.e. a method that indexes arbitrary entity data along with a piece of content.

namespace Drupal\my_module\Plugin\search_api\processor;

use Drupal\Core\Entity\ContentEntityInterface;
use Drupal\search_api\Datasource\DatasourceInterface;
use Drupal\search_api\Item\ItemInterface;
use Drupal\search_api\Processor\EntityProcessorProperty;
use Drupal\search_api\Processor\ProcessorPluginBase;
use Drupal\search_api\Utility\Utility;

/**
 * Base plugin class for indexing arbitrarily related entity data.
 *
 * This can be helpful to index properties of entities referencing an entity or
 * entities related in some other arbitrary way.
 *
 * @package Drupal\my_module\Plugin\search_api\processor
 */
abstract class RelatedEntityBase extends ProcessorPluginBase {

  /**
   * {@inheritdoc}
   */
  public function getPropertyDefinitions(DatasourceInterface $datasource = NULL) {
    $plugin_definition = $this->getPluginDefinition();
    $properties = [];

    if (!$datasource || $datasource->getEntityTypeId() !== $this->getIndexedEntityTypeId()) {
      return $properties;
    }

    $definition = [
      'label' => $plugin_definition['label'],
      'description' => $plugin_definition['description'],
      'type' => 'entity:' . $this->getRelatedEntityTypeId(),
      'processor_id' => $this->getPluginId(),
      'is_list' => TRUE,
    ];
    $property = new EntityProcessorProperty($definition);
    $property->setEntityTypeId($this->getRelatedEntityTypeId());
    $properties[$this->getPluginId()] = $property;

    return $properties;
  }

  /**
   * {@inheritdoc}
   */
  public function addFieldValues(ItemInterface $item) {
    /** @var \Drupal\Core\Entity\ContentEntityInterface $entity */
    $entity = $item->getOriginalObject()->getValue();

    $to_extract = [];
    foreach ($item->getFields() as $field) {
      $datasource = $field->getDatasource();
      $property_path = $field->getPropertyPath();
      [$direct, $nested] = Utility::splitPropertyPath($property_path, FALSE);
      if ($datasource && $datasource->getEntityTypeId() === $entity->getEntityTypeId() && $direct === $this->getPluginId()) {
        $to_extract[$nested][] = $field;
      }
    }

    foreach ($this->getRelatedEntities($entity) as $relation) {
      $this->getFieldsHelper()
        ->extractFields($relation->getTypedData(), $to_extract, $item->getLanguage());
    }
  }

  /**
   * Get an array of related entities.
   *
   * This should return an array of fully loaded entities that relate to the
   * $entity being indexed.
   *
   * @param \Drupal\Core\Entity\ContentEntityInterface $entity
   *   The entity being indexed.
   *
   * @return array
   *   An array of entities related to $entity.
   */
  abstract protected function getRelatedEntities(ContentEntityInterface $entity): array;

  /**
   * Get the entity type id of the entity being indexed.
   *
   * This is the entity type of the $entity passed to
   * $this->getRelatedEntities().
   *
   * @return string
   *   An entity type id string, e.g. 'node', 'media', or 'taxonomy_term'.
   */
  abstract protected function getIndexedEntityTypeId(): string;

  /**
   * Get the entity type id of the related entities.
   *
   * This is the entity type of the items returned from
   * $this->getRelatedEntities().
   *
   * @return string
   *   An entity type id string, e.g. 'node', 'media', or 'taxonomy_term'.
   */
  abstract protected function getRelatedEntityTypeId(): string;

}

Next, I created plugin classes that extended my abstract class for each case (Collection’s Authors, Article’s Collections, Author’s Collections). For example, to index data from an Article’s Collections as part of the Article’s indexed data:

namespace Drupal\my_module\Plugin\search_api\processor;

use Drupal\Core\Entity\ContentEntityInterface;
use Drupal\my_module\Plugin\search_api\processor\RelatedEntityBase;

/**
 * Index properties from Collections referencing an Article.
 *
 * @SearchApiProcessor(
 *   id = "my_module_article_collections",
 *   label = @Translation("Article's Collections"),
 *   description = @Translation("Index properties from Collections referencing this Article."),
 *   stages = {
 *     "add_properties" = 0,
 *   },
 * )
 */
class ArticleCollections extends RelatedEntityBase {

  /**
   * {@inheritdoc}
   */
  protected function getRelatedEntities(ContentEntityInterface $entity): array {
    return my_function_to_get_article_collections($entity);
  }

  /**
   * {@inheritdoc}
   */
  protected function getIndexedEntityTypeId(): string {
    return 'node';
  }

  /**
   * {@inheritdoc}
   */
  protected function getRelatedEntityTypeId(): string {
    return 'node';
  }

}

This allowed me to index data from a Collection as part of an Article’s data, for example the Article’s Collection IDs (i.e. the IDs of Collections referencing the Article). I can index any field from the Collection by selecting it in the UI, the same as if the Article had an entity reference field pointing at the Collection. (Note: before you can index any fields with the custom processor, you must first enable it on the Processors tab for your index.)
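The plugin above delegates to my_function_to_get_article_collections(), which is not shown. A minimal sketch of what such a helper might look like follows. It assumes Collection nodes have an entity reference field named field_articles and a bundle named collection; both names are assumptions for illustration, so substitute your own.

```php
<?php

use Drupal\Core\Entity\ContentEntityInterface;

/**
 * Hypothetical helper: loads the Collections that reference an Article.
 *
 * Assumes Collection nodes carry an entity reference field named
 * "field_articles" (an illustrative name, not taken from the original code).
 *
 * @param \Drupal\Core\Entity\ContentEntityInterface $entity
 *   The Article being indexed.
 *
 * @return \Drupal\Core\Entity\EntityInterface[]
 *   Collection nodes keyed by node ID, or an empty array.
 */
function my_function_to_get_article_collections(ContentEntityInterface $entity): array {
  $storage = \Drupal::entityTypeManager()->getStorage('node');

  $ids = $storage->getQuery()
    // Indexing is not tied to the current user, so skip access checks.
    ->accessCheck(FALSE)
    ->condition('type', 'collection')
    ->condition('field_articles', $entity->id())
    ->execute();

  // loadMultiple() returns entities keyed by ID.
  return $ids ? $storage->loadMultiple($ids) : [];
}
```

Returning entities keyed by ID is a deliberate choice here: it makes it safe to combine results from several calls with the array union operator.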

This all worked well, but my indexed data did not stay in sync with reality. For example, if I added an Article to a Collection, the Article’s indexed data was not updated with information from that Collection, because an Article is not re-indexed when a Collection referencing it is saved. I resolved this with a hook_ENTITY_TYPE_update() implementation that marks dependent Articles for re-indexing whenever a Collection is saved.

use Drupal\node\NodeInterface;

/**
 * Implements hook_ENTITY_TYPE_update().
 */
function my_module_node_update(NodeInterface $node) {
  if ($node->bundle() == 'collection') {
    // Gather all Articles that this Collection references. The helper is
    // assumed to return nodes keyed by node ID, so the union below neither
    // drops nor duplicates entries.
    $articles = my_function_to_get_collection_articles($node);
    // Also gather any Articles that were referenced before this save, but are
    // no longer referenced.
    $original_node = $node->original ?? NULL;
    if ($original_node instanceof NodeInterface) {
      $articles += my_function_to_get_collection_articles($original_node);
    }

    // Mark the articles to be re-indexed.
    foreach ($articles as $article) {
      /** @var \Drupal\search_api\Plugin\search_api\datasource\ContentEntityTrackingManager $search_api_tracking_manager */
      $search_api_tracking_manager = \Drupal::service('search_api.entity_datasource.tracking_manager');

      $indexes = $search_api_tracking_manager->getIndexesForEntity($article);
      if (!empty($indexes)) {
        $item_ids = [];
        foreach ($article->getTranslationLanguages() as $langcode => $language) {
          $item_ids[] = $article->id() . ':' . $langcode;
        }
        foreach ($indexes as $index) {
          $index->trackItemsUpdated('entity:node', $item_ids);
        }
      }
    }
  }
}
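Two details in the hook above are easy to get wrong, so here is a small standalone sketch of both in plain PHP, with no Drupal required. First, the raw item IDs passed to trackItemsUpdated() take the form "entity ID:langcode". Second, the += operator unions arrays by key, so it only merges the current and previous Articles correctly when both arrays are keyed by node ID. The function name example_build_item_ids() and the sample data are hypothetical and exist only for illustration.

```php
<?php

/**
 * Hypothetical helper: builds raw item IDs ("<entity id>:<langcode>").
 */
function example_build_item_ids(string $entity_id, array $langcodes): array {
  $item_ids = [];
  foreach ($langcodes as $langcode) {
    $item_ids[] = $entity_id . ':' . $langcode;
  }
  return $item_ids;
}

// Stand-ins for loaded Article nodes, keyed by node ID.
$current = [12 => 'Article 12', 15 => 'Article 15'];
$previous = [12 => 'Article 12', 9 => 'Article 9'];

// Union by key: Article 12 is kept once and Article 9 is added.
$articles = $current + $previous;

// With list-style (0-indexed) arrays the union silently drops entries,
// because key 0 already exists on the left-hand side.
$lost = ['Article 12', 'Article 15'] + ['Article 9'];

print_r(example_build_item_ids('12', ['en', 'fr']));
print_r(array_keys($articles));
print_r($lost);
```

If your helper returns a 0-indexed list (as referencedEntities() on a field item list does), re-key the result by ID before using +=, or the Articles that were removed from the Collection may never be marked for re-indexing.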

After all of this, I can safely index data from arbitrarily related entities.