April 5, 2024

Structuring the site's content for Algolia - Drupal integration


The traditional approach to building search pages in Drupal uses Views, which offers several advantages. For instance, when a content type includes a media field, only the media ID needs to be indexed, because the Drupal Render API can render the actual image.

Similarly, when there's a taxonomy reference field, only the term ID needs to be indexed, because the Render API can render the term's name. Views can also be used to create a search page for an Algolia index. But the true power of Algolia can be harnessed only when the search UI is built using JavaScript.

However, this decouples the search UI from Drupal, eliminating the assistance of the Render API. So, we must develop our own strategies for structuring the data according to the content.

In Part 1 of this blog, we covered the fundamental steps of integrating Algolia search with Drupal and created a search UI for the Umami profile. But apart from the title and image_url fields, we indexed only the body and field_ingredients fields.

In real scenarios, content types often have multiple text fields, paragraphs, and so on, each holding crucial data that should be searchable. In this blog, we will explore two strategies for structuring the data:

  • When the content type features only a few text fields.
  • When the content type comprises multiple text fields and paragraphs.

Before we start:

  • Consider checking out part 1 of this blog series if you haven’t already. We have discussed many important concepts and terminologies related to Algolia there.
  • It's crucial to understand the code in this repository. Refer to part 1 for a quick refresher if you need to.
  • Read the official documentation on Search API processors, as we will be developing custom ones. But feel free to skip it if you prefer ‘coding first, theory later’.

When the content type features only a few text fields.

Add one more body field, “body_2”, to the “Article” content type and create a new article. Now both fields contain data that users should be able to search. We can add the new field to our index and re-index all content. But how will we display the data from the new field in the search result cards?

Before starting the implementation, let's take a look at the design of our search result cards.

[Image: searchable field]

  • Each card has a title, image and body fields. 
  • The body field holds most of the searchable data. So contents from the new field “body_2” should also get displayed there.

The solution is to combine the text from both the “body” and “body_2” fields into a single field. We will use the “Aggregated field” provided by the Search API module for this. We can then configure this new field as a searchable attribute (searchableAttributes) in the Algolia dashboard and render its value in the result card. These are the steps to follow:

  • Go to /admin/config/search/search-api/index/demo_umami/fields, remove the previously added “body” field, and save the settings.
  • Click “Add fields” and add an “Aggregated field”.
  • Select “Concatenation” as the aggregation type, then select the “Body” and “Body 2” fields under “Contained fields”.
    [Image: aggregated field]
  • Save the field and re-index all content.

If you check the records in Algolia now, all of them will have the “aggregated_field” property. Go to Configuration -> Searchable attributes, remove the “body” attribute, and add “aggregated_field”. Make the same change in “Attributes to snippet” as well.
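
For readers who prefer to see the dashboard change as data, here is a minimal sketch of the same swap. searchableAttributes and attributesToSnippet are real Algolia index settings; the starting values in part1Settings (including the 30-word snippet length) and the helper name swapAttribute are assumptions made for illustration, not the settings actually saved in Part 1.

```javascript
// Hypothetical starting point: settings assumed to be left over from Part 1.
const part1Settings = {
  searchableAttributes: ['title', 'body'],
  attributesToSnippet: ['body:30'],
};

// Return a copy of the settings with one attribute name swapped for another,
// keeping any ":<word count>" suffix used by attributesToSnippet.
// Modifiers such as "unordered(body)" are not handled in this sketch.
function swapAttribute(settings, oldName, newName) {
  const rename = (entry) => {
    const [name, ...rest] = entry.split(':');
    return name === oldName ? [newName, ...rest].join(':') : entry;
  };
  return {
    ...settings,
    searchableAttributes: settings.searchableAttributes.map(rename),
    attributesToSnippet: settings.attributesToSnippet.map(rename),
  };
}

const updatedSettings = swapAttribute(part1Settings, 'body', 'aggregated_field');
console.log(updatedSettings);
```

The original object is left untouched, which makes it easy to compare the settings before and after the change.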

Next, we need to replace the “body” property with “aggregated_field” in search.js from the custom module.

[Image: Algolia aggregated field]
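
Since the actual search.js is not reproduced here, the following is only a sketch of what reading the new property in a result card could look like. The function names renderCard and escapeHtml and the card markup are hypothetical; _snippetResult is the property under which Algolia returns snippets for attributes listed in attributesToSnippet.

```javascript
// Escape text that is placed into HTML as-is.
function escapeHtml(value) {
  return String(value ?? '')
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build the HTML for one result card from an Algolia hit.
// `bodyAttribute` names the record property that holds the searchable text.
function renderCard(hit, bodyAttribute = 'aggregated_field') {
  // The snippet already contains highlight tags, so it is inserted as-is
  // (this assumes the indexed text is trusted). Without a snippet, fall
  // back to the escaped raw value.
  const snippet = hit._snippetResult?.[bodyAttribute]?.value;
  const body = snippet ?? escapeHtml(hit[bodyAttribute]);
  return [
    '<article class="search-card">',
    `  <img src="${escapeHtml(hit.image_url)}" alt="">`,
    `  <h3>${escapeHtml(hit.title)}</h3>`,
    `  <p>${body}</p>`,
    '</article>',
  ].join('\n');
}
```

Passing the attribute name as a parameter means the same function keeps working if the body property is renamed later.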

Clear the cache and visit the search page again. Search for any value from the “body_2” field of the new article. The new article will appear in the search results, and the text we searched for will be highlighted in the result card.

[Image: search field and search result card]

When the content type comprises multiple text fields and paragraphs.

The “Aggregated field” provided by the Search API module is very useful for combining values from different fields. But on a site with lots of content, there is a high chance that paragraphs are used for building content, and any of those paragraphs might contain important data that should be searchable. Let’s take a look at a slightly more complex scenario.

Install the Paragraphs module in the Umami site and create the following paragraph types.

| Paragraph name | Field | Field type |
|---|---|---|
| Banner | field_media | Media reference (Image only) |
| Banner with text | field_media | Re-use existing field |
| Banner with text | field_text | Text (formatted, long) |
| Accordion | field_title | Text (plain) |
| Accordion | field_text | Re-use existing field |
| Tabs | field_title | Re-use existing field |
| Tabs | field_text | Re-use existing field |

  • Edit the Article content type and add a new paragraph reference field, “field_paragraphs”.
  • Allow it to reference the “Banner with text”, “Accordion” and “Tabs” paragraph types.
  • Next, create a Blog content type with the following fields.

| Field name | Field type | Referenced items |
|---|---|---|
| title | - | - |
| body | - | - |
| field_banner | Paragraph | Banner, Banner with text |
| field_paragraphs | Re-use existing field | Re-use existing field |

Add a few more Article and Blog nodes. Now all the searchable data is spread across multiple fields. How should we structure the data and index it in Algolia in this scenario?

We need the following basic properties in each record when we index content.

  • Title - Holds the title of the page.
  • Image - Holds the image to be displayed in search result cards.
  • Body - Holds all other searchable data.
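
As a rough illustration of that target shape, the sketch below shows one record and a small check for the three properties. The property names image_url and aggregated_text_field are the ones this post ends up using for the image and body; the objectID value, the sample values, and the helper missingProperties are made up for the example.

```javascript
// Properties every record needs so that a result card can be rendered.
const REQUIRED_PROPERTIES = ['title', 'image_url', 'aggregated_text_field'];

// Illustrative record only: the objectID and all values are invented.
const sampleRecord = {
  objectID: 'entity:node/42:en',
  title: 'Sample article title',
  image_url: '/sites/default/files/sample.jpg',
  aggregated_text_field: 'Body text followed by text from every paragraph.',
};

// Return the names of required properties that are absent or blank.
function missingProperties(record) {
  return REQUIRED_PROPERTIES.filter((name) => {
    const value = record[name];
    return typeof value !== 'string' || value.trim() === '';
  });
}

console.log(missingProperties(sampleRecord));
console.log(missingProperties({ title: 'Only a title' }));
```

A check like this can be run against a few exported records after re-indexing to spot content types whose image or body did not make it into the index.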

Most of our work will go into adding the image and body properties. We will create Search API processor plugins to accomplish that. Let’s add them one by one.

First, edit the “demo_umami” index we added to the site and delete all fields except “title” and “page_url”.

[Image: search api processor plugins]

Adding the image field

The image is stored in “field_media_image” for both the Article and Recipe content types. But for Blog, the image is in either the “Banner” or the “Banner with text” paragraph.

| Content type | Field that stores the image |
|---|---|
| Recipe | field_media_image |
| Article | field_media_image |
| Blog | field_banner (Paragraph reference field) |

We need a single image_url field that stores the image URL for all content types, so we have to create a custom Search API processor plugin. In simple terms, Search API processor plugins manipulate the data before it is indexed.

Add the following code to umami_site_search/src/Plugin/search_api/processor/UmamiSearchCommonImageField.php



<?php

namespace Drupal\umami_site_search\Plugin\search_api\processor;

use Drupal\Core\Entity\EntityInterface;
use Drupal\Core\Entity\EntityTypeManagerInterface;
use Drupal\media\Entity\Media;
use Drupal\search_api\Datasource\DatasourceInterface;
use Drupal\search_api\Item\ItemInterface;
use Drupal\search_api\Processor\ProcessorPluginBase;
use Drupal\search_api\Processor\ProcessorProperty;
use Symfony\Component\DependencyInjection\ContainerInterface;

/**
 * Adds 'common_image_field' field.
 *
 * @SearchApiProcessor(
 *   id = "common_image_field",
 *   label = @Translation("Common image field"),
 *   description = @Translation("Common field for all content types."),
 *   stages = {
 *     "add_properties" = 0,
 *   },
 * )
 */
class UmamiSearchCommonImageField extends ProcessorPluginBase {

  /**
   * The entity type manager.
   *
   * @var \Drupal\Core\Entity\EntityTypeManagerInterface
   */
  protected $entityTypeManager;

  /**
   * {@inheritdoc}
   */
  public static function create(ContainerInterface $container, array $configuration, $plugin_id, $plugin_definition) {
    /** @var static $plugin */
    $plugin = parent::create($container, $configuration, $plugin_id, $plugin_definition);

    $plugin->setEntityTypeManager($container->get('entity_type.manager'));
    return $plugin;
  }

  /**
   * Sets entity type manager service.
   *
   * @param \Drupal\Core\Entity\EntityTypeManagerInterface $entity_type_manager
   *   The entity type manager service.
   *
   * @return $this
   */
  public function setEntityTypeManager(EntityTypeManagerInterface $entity_type_manager) {
    $this->entityTypeManager = $entity_type_manager;
    return $this;
  }

  /**
   * Retrieves the entity type manager service.
   *
   * @return \Drupal\Core\Entity\EntityTypeManagerInterface
   *   The entity type manager service.
   */
  protected function getEntityTypeManager() {
    return $this->entityTypeManager ?: \Drupal::service('entity_type.manager');
  }

  /**
   * {@inheritdoc}
   */
  public function getPropertyDefinitions(?DatasourceInterface $datasource = NULL) {
    $properties = [];

    if (!$datasource) {
      $definition = [
        'label' => $this->t('Common image field'),
        'description' => $this->t('Common field for all content types'),
        'type' => 'string',
        'processor_id' => $this->getPluginId(),
      ];
      $properties['common_image_field'] = new ProcessorProperty($definition);
    }

    return $properties;
  }

  /**
   * {@inheritdoc}
   */
  public function addFieldValues(ItemInterface $item) : void {
    $node = $item->getOriginalObject()->getValue();
    $uri = '';
    switch ($node->bundle()) {
      case 'page':
        // Skip Basic page contents as they are not included.
        break;

      case 'article':
      case 'recipe':
        // Image will be present in “field_media_image” for both Article and
        // Recipe content types.
        $media = $this->getMediaEntity($node, 'field_media_image');
        if ($media instanceof Media) {
          $uri = $this->getUrlFromMedia($media);
        }
        break;

      case 'blog':
        // Image will be present in paragraphs added in "field_banner" field in
        // blog content type.
        $banners = $node->hasField('field_banner') ? $node->get('field_banner')->referencedEntities() : [];
        if (!empty($banners)) {
          // Get the media entity from the first banner paragraph.
          $media = $this->getMediaEntity(reset($banners), 'field_media');
          if ($media instanceof Media) {
            $uri = $this->getUrlFromMedia($media);
          }
        }
        // Keep the break outside the if so the case always terminates.
        break;
    }
    // Save the URL to the field.
    if ($uri) {
      $fields = $this->getFieldsHelper()
        ->filterForPropertyPath($item->getFields(), NULL, 'common_image_field');
      foreach ($fields as $field) {
        $field->addValue($uri);
      }
    }

  }

  /**
   * Helper function to get the media entity referenced by a field.
   *
   * @param \Drupal\Core\Entity\EntityInterface $entity
   *   The entity object.
   * @param string $field_name
   *   The media reference field name.
   *
   * @return \Drupal\media\Entity\Media|null
   *   The media entity.
   */
  public function getMediaEntity(EntityInterface $entity, string $field_name) {
    if ($entity->hasField($field_name)) {
      return $entity->$field_name->entity;
    }

    return NULL;
  }

  /**
   * Helper function to get the media URL from media entity.
   *
   * @param \Drupal\media\Entity\Media $media
   *   The media item.
   *
   * @return string
   *   URI of the media.
   */
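  public function getUrlFromMedia(Media $media) {
    $url = '';
    switch ($media->bundle()) {
      case 'image':
        // The file reference can be empty, so fall back to an empty string.
        $url = $media->field_media_image->entity?->createFileUrl() ?? '';
        break;
    }
    // Always return a string, even for unsupported media bundles.
    return $url;
  }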

}


  • This processor plugin creates a new field with the property path “common_image_field” that stores image URLs from all content types.
  • To add this field to the “demo_umami” index, go to “admin/config/search/search-api/index/demo_umami/processors” and enable the “Common image field” processor.
    [Image: managing processors for search plugins]
  • Then add the field from “admin/config/search/search-api/index/demo_umami/fields”.
    [Image: adding fields]
  • Change the machine name to “image_url” and save.
    [Image: updating and changing the name field]
  • Index the content and check the records in the Algolia dashboard. Every record whose content has an image should now have an image_url property.
    [Image: Algolia dashboard records]

Adding the body field

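The body field of each record should hold the content from all the “text” fields, including paragraph fields.

  • For that, we first need a custom service, umami_site_search/src/ParagraphsContentAggregator.php, that returns the text from all the paragraphs in a node. The processor further below requests it by the service ID umami_site_search.aggregate_paragraph_contents, so it has to be registered under that ID in the module's services file.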


<?php

namespace Drupal\umami_site_search;

use Drupal\Core\Entity\EntityDisplayRepositoryInterface;
use Drupal\Core\Entity\EntityInterface;
use Drupal\node\Entity\Node;
use Drupal\paragraphs\Entity\Paragraph;

/**
 * Helper service that gives content from all paragraphs in a node.
 */
class ParagraphsContentAggregator {

  /**
   * The entity display repository.
   *
   * @var \Drupal\Core\Entity\EntityDisplayRepositoryInterface
   */
  protected $entityDisplayRepository;

  /**
   * Constructs a new object.
   *
   * @param \Drupal\Core\Entity\EntityDisplayRepositoryInterface $entity_display_repository
   *   The entity display repository.
   */
  public function __construct(EntityDisplayRepositoryInterface $entity_display_repository) {
    $this->entityDisplayRepository = $entity_display_repository;
  }

  /**
   * Loads all text contents from paragraphs and returns the concatenated text.
   *
   * @param \Drupal\node\Entity\Node $node
   *   The node object.
   *
   * @return string
   *   Aggregated text from all paragraphs in the provided entity.
   */
  public function getContentFromAllParagraphs(Node $node): string {
    $concatanated_string = '';
    // Get the paragraph reference fields of the node.
    $node_paragraph_fields = $this->getEntityFieldsAsPerWeight($node, 'node');
    if ($node_paragraph_fields) {
      foreach ($node_paragraph_fields as $field_name) {
        $field_definition = $node->getFieldDefinition($field_name);
        if ($field_definition) {
          $field_storage_definition = $field_definition->getFieldStorageDefinition();
          $field_settings = $field_storage_definition->getSettings();
          if (isset($field_settings['target_type']) && $field_settings['target_type'] == "paragraph") {
            if (!$node->get($field_name)->isEmpty()) {
              foreach ($node->get($field_name)->referencedEntities() as $paragraph_item) {
                $concatanated_string .= $this->getContentsFromParagraph($paragraph_item);
              }
            }
          }
        }
      }
    }
    return $concatanated_string;
  }

  /**
   * Retrieves field names of an entity as per weight in form display.
   *
   * @param \Drupal\Core\Entity\EntityInterface $entity
   *   The node/paragraph entity.
   * @param string $type
   *   Node or paragraph.
   */
  public function getEntityFieldsAsPerWeight(EntityInterface $entity, string $type = 'paragraph'): array {
    $form_display = $this->entityDisplayRepository->getFormDisplay($type, $entity->bundle(), 'default');
    $fields = $form_display?->get('content');
    if (!empty($fields)) {
      // Sort the fields according to weight.
      uasort($fields, function ($a, $b) {
        return $a['weight'] - $b['weight'];
      });

      if ($type == 'node') {
        // Return only the paragraph reference fields.
        $paragraph_fields = array_filter($fields, function ($field) {
          if (isset($field['type'])) {
            return ($field['type']) ? str_contains($field['type'], 'paragraph') : FALSE;
          }
          else {
            return FALSE;
          }
        });

        return array_keys($paragraph_fields);
      }
      else {
        return array_keys($fields);
      }
    }
    return [];
  }

  /**
   * Get contents from the paragraph.
   *
   * @param \Drupal\paragraphs\Entity\Paragraph $paragraph_item
   *   The paragraph entity.
   */
  public function getContentsFromParagraph(Paragraph $paragraph_item): string {
    $string = '';
    $fields_as_per_form_display = $this->getEntityFieldsAsPerWeight($paragraph_item);
    foreach ($fields_as_per_form_display as $field_name) {
      $field_definition = $paragraph_item->getFieldDefinition($field_name);
      if ($field_definition) {
        $field_storage_definition = $field_definition->getFieldStorageDefinition();
        // Skip base fields.
        if ($field_storage_definition->isBaseField()) {
          continue;
        }

        // Handle text fields.
        $field_type = $field_definition->getType();
        if (in_array($field_type, ['text_long', 'string', 'string_long'])) {
          if (!$paragraph_item->get($field_name)->isEmpty()) {
            $value = $paragraph_item->get($field_name)->getValue();
            $value = strip_tags($value[0]['value']);
            $string .= trim($value) . ' ';
          }
        }
        // Handle other paragraph fields.
        elseif ($field_type == 'entity_reference_revisions') {
          $field_settings = $field_storage_definition->getSettings();
          if (isset($field_settings['target_type']) && $field_settings['target_type'] == "paragraph") {
            if (!$paragraph_item->get($field_name)->isEmpty()) {
              foreach ($paragraph_item->get($field_name)->referencedEntities() as $inner_paragraph_item) {
                $string .= $this->getContentsFromParagraph($inner_paragraph_item);
              }
            }
          }
        }
      }
    }
    return $string;
  }

}

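  • Next, create a new Search API processor that aggregates the text. Following the namespace and class name below, it goes in umami_site_search/src/Plugin/search_api/processor/UmamiSearchAggregatedTextField.php.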


<?php

namespace Drupal\umami_site_search\Plugin\search_api\processor;

use Drupal\search_api\Datasource\DatasourceInterface;
use Drupal\search_api\Item\ItemInterface;
use Drupal\search_api\Processor\ProcessorPluginBase;
use Drupal\search_api\Processor\ProcessorProperty;
use Drupal\umami_site_search\ParagraphsContentAggregator;
use Symfony\Component\DependencyInjection\ContainerInterface;

/**
 * Adds 'aggregated_text_field' field.
 *
 * @SearchApiProcessor(
 *   id = "aggregated_text_field",
 *   label = @Translation("Aggregated text field"),
 *   description = @Translation("Aggregates text from all fields"),
 *   stages = {
 *     "add_properties" = 0,
 *   },
 * )
 */
class UmamiSearchAggregatedTextField extends ProcessorPluginBase {

  /**
   * The helper service.
   *
   * @var \Drupal\umami_site_search\ParagraphsContentAggregator
   */
  protected $paragraphsContentAggregator;

  /**
   * {@inheritdoc}
   */
  public static function create(ContainerInterface $container, array $configuration, $plugin_id, $plugin_definition) {
    /** @var static $plugin */
    $plugin = parent::create($container, $configuration, $plugin_id, $plugin_definition);

    $plugin->setParagraphsContentAggregator($container->get('umami_site_search.aggregate_paragraph_contents'));
    return $plugin;
  }

  /**
   * Sets the helper service.
   *
   * @param \Drupal\umami_site_search\ParagraphsContentAggregator $paragraphs_content_aggregator
   *   The paragraphs content aggregator service.
   *
   * @return $this
   */
  public function setParagraphsContentAggregator(ParagraphsContentAggregator $paragraphs_content_aggregator) {
    $this->paragraphsContentAggregator = $paragraphs_content_aggregator;
    return $this;
  }

  /**
   * Retrieves the paragraphs content aggregator service.
   *
   * @return \Drupal\umami_site_search\ParagraphsContentAggregator
   *   The paragraphs content aggregator service.
   */
  protected function getParagraphsContentAggregator() {
    return $this->paragraphsContentAggregator ?: \Drupal::service('umami_site_search.aggregate_paragraph_contents');
  }

  /**
   * {@inheritdoc}
   */
  public function getPropertyDefinitions(?DatasourceInterface $datasource = NULL) {
    $properties = [];

    if (!$datasource) {
      $definition = [
        'label' => $this->t('Aggregated text field'),
        'description' => $this->t('Aggregates text from all fields'),
        'type' => 'string',
        'processor_id' => $this->getPluginId(),
      ];
      $properties['aggregated_text_field'] = new ProcessorProperty($definition);
    }

    return $properties;
  }

  /**
   * {@inheritdoc}
   */
  public function addFieldValues(ItemInterface $item) : void {
    /** @var \Drupal\node\Entity\Node $node */
    $node = $item->getOriginalObject()->getValue();
    // First, get the text from all content type fields.
    $fields_in_order = [
      'body', 'field_body_2', 'field_recipe_instruction', 'field_ingredients',
    ];
    $aggregated_text = '';
    foreach ($fields_in_order as $field) {
      if ($node->hasField($field) && !$node->get($field)->isEmpty()) {
        // The 'field_ingredients' field in recipe content type is a list field
        // and needs separate logic to get data.
        if ('field_ingredients' === $field) {
          $aggregated_text .= trim(strip_tags($node->get($field)->getString())) . ' ';
        }
        else {
          $aggregated_text .= trim(strip_tags($node->get($field)->value)) . ' ';
        }
      }
    }

    // Concatenate the result with aggregated text from paragraphs.
    $aggregated_text .= $this->getParagraphsContentAggregator()->getContentFromAllParagraphs($node);
    // Save the value to the field.
    $fields = $this->getFieldsHelper()
      ->filterForPropertyPath($item->getFields(), NULL, 'aggregated_text_field');
    foreach ($fields as $field) {
      $field->addValue($aggregated_text);
    }

  }

}


  • For simplicity, all the non-paragraph text fields are hard-coded. Text from those fields is concatenated in a specific order, the text from all paragraphs is appended at the end, and the result is stored in the “aggregated_text_field” field.
  • Enable the processor and add the “aggregated_text_field” field to the “demo_umami” index.
    [Image: aggregated fields for managing processors]
    [Image: updated fields]
  • Index the content again and check the records. Every record will have an “aggregated_text_field” property.
    [Image: aggregated text field property]
  • Now all we have to do is update searchableAttributes and add the “aggregated_text_field” property to attributesToSnippet from the Algolia dashboard.
    [Image: attributes to snippet in the Algolia dashboard]
  • Finally, change “aggregated_field” to “aggregated_text_field” in search.js. Clear the cache and visit the search page.

We've reached the end of Part 2 in our journey of integrating Algolia search into Drupal. Let's do a quick recap of what we have accomplished so far.

  • We revisited the design of our 'search result cards' and identified the essential elements: 'title,' 'image,' and 'body' fields.
  • Adding the title field was straightforward: we simply added the field from the Search API UI.
  • For the image field, we wanted to accommodate different content types. So, we used a Search API processor to create a custom field that could store image URLs from all content types.
  • Most of the searchable data resides in the 'body' field. So our approach was to concatenate the content from all ‘text’ fields into a single field.
  • We used the ‘Aggregated field’ provided by the Search API module to combine the text when paragraphs were not used.
  • In situations where our content incorporated paragraphs, we developed a custom 'Aggregated text field' processor to effectively combine text from various fields and paragraphs.

We hope this part gave you a better understanding of structuring data. The entire code can be found here

In the next and final part of this blog series, we will look at how to split records when individual records (nodes) contain a large volume of content. We will look into the record size limits in Algolia and explore the concepts of ‘deduplication and grouping’. It is a must-read if you are dealing with ‘content rich’ websites.

Link to part 3

Written by
Editor