heyday/silverstripe-elastica

Provides Elastic Search integration for SilverStripe DataObjects using Elastica

Type:silverstripe-vendormodule

4.3.1 2024-10-25 01:27 UTC

Facilitates searching and indexing of SilverStripe CMS using ElasticSearch. We use Elastica to do all the heavy lifting in terms of communication with the elastic search server.

This module makes it easy to use ElasticSearch with SilverStripe without limiting any of the functionality found in Elastica. Basically anything that can be done with Elastica alone can be done in conjunction with this module.

This module supersedes Symbiote's Elastica Module, which was only supported up to SilverStripe 3.

Features

  • Uses Elastica to communicate with the ElasticSearch Server
  • Uses PSR/Log interface for logging purposes (optional)
  • Uses YAML configuration to index Data Objects and Pages
  • Can handle has_many, many_many, and has_one relationships in the indexed ElasticSearch document
  • Can handle invalidation and reindexing of related data objects
  • Can handle custom fields that are not in the database but only exist as part of an object instance
  • Infers ElasticSearch document field type from the database field type defined in the corresponding SilverStripe model

Compatibility

This release requires SilverStripe 4.x and should be compatible with ElasticSearch 7.0 and above. It may also work with ElasticSearch 6.

If you need to work with earlier versions of ElasticSearch (2.x) and SilverStripe (3.x), please try the 1.0 release of this module.

Installation

$ composer require heyday/silverstripe-elastica

Usage

Elastica Service configuration example:

mysite/_config/search.yml

Heyday\Elastica\ElasticaService: # Example of customising the index config on the elastic search server (completely optional).
  index_config:
    settings:
      analysis:
        analyzer:
          default:
            type: custom
            tokenizer: standard
            filter:
              - lowercase
              - stemming_filter
        filter:
          stemming_filter:
            type: snowball
            language: English

---
Only:
  environment: dev
---
SilverStripe\Core\Injector\Injector:
  Elastica\Client:
    constructor:
      - host: localhost # hostname of the elastic search server
        port: 9200 # port number of the elastic search server

  Heyday\Elastica\ElasticaService:
    constructor:
      - "%$Elastica\Client"
      - "name-of-index"  # name of the index on the elastic search server
      - "%$Logger"  # your error logger (must implement psr/log interface)
      - "64MB"      # increases memory limit while indexing

Index configuration example:

mysite/_config/search.yml

# PageTypes

Your\Namespace\Page:
  extensions:
    - Heyday\Elastica\Searchable
  indexed_fields: &page_defaults
    - Title
    - MenuTitle
    - Content
    - MetaDescription

Your\Namespace\SpecialPageWithAdditionalFields:
  extensions:
    - Heyday\Elastica\Searchable # only needed if this page does not extend the 'Page' configured above
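  indexed_fields:
    # YAML merge keys (<<) only work on mappings, not lists, so the shared fields are listed explicitly
    - Title
    - MenuTitle
    - Content
    - MetaDescription
    - BannerHeading
    - BannerCopy
    - SubHeading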

Your\Namespace\SpecialPageWithRelatedDataObject:
  extensions:
    - Heyday\Elastica\Searchable
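  indexed_fields:
    # Shared fields are listed explicitly because YAML merge keys (<<) do not work on lists
    - Title
    - MenuTitle
    - Content
    - MetaDescription
    -
      RelatedDataObjects:
        type: nested
        relationClass: App\DataObjects\Tags # Will be pulled from has_many / many_many, but you can specify it here too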

Your\Namespace\RelatedDataObject:
  extensions:
    - Heyday\Elastica\Searchable
  indexed_fields:
    - Title
    - SomeOtherField
  dependent_classes:
    - Your\Namespace\SpecialPageWithRelatedDataObject # invalidates the index for SpecialPageWithRelatedDataObject when a RelatedDataObject is updated/created

Custom field index configuration example:

mysite/_config/search.yml

# PageTypes

Your\Namespace\Page:
  extensions:
    - Heyday\Elastica\Searchable
  indexed_fields:
    - Title
    - SomeOtherField
    -
      TitleAlias:
        type: text
        field: Title # You can specify a custom internal field value with 'field'
    -
      SomeCustomFieldSimple:
        type: text
    -
      SomeCustomFieldComplicatedConfig:
        type: text
        analyzer: nGram_analyser # Must reference analyzer defined on index_config
        search_analyzer: whitespace_analyser # Must reference analyzer defined on index_config
        store: true

mysite/code/PageTypes/Page.php

<?php

use SilverStripe\CMS\Model\SiteTree;

class Page extends SiteTree
{
    public function getSomeCustomFieldSimple()
    {
        return 'some dynamic text or something';
    }

    public function getSomeCustomFieldComplicatedConfig()
    {
        return 'the config does not have anything to do with me';
    }
}
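
The custom field names in `indexed_fields` match the getters above without the `get` prefix, because SilverStripe exposes a `getFoo()` method as the property `Foo`. The following is a plain PHP sketch of that naming convention only. `ExamplePage` and `resolveCustomField` are hypothetical names used for illustration, and this is not the module's actual lookup code.

```php
<?php

// Stand-in for a page class; deliberately not a real SilverStripe class.
class ExamplePage
{
    public function getSomeCustomFieldSimple(): string
    {
        return 'some dynamic text or something';
    }
}

/**
 * Hypothetical resolver: returns the result of "get{$field}()" when the
 * object defines that method, or null when it does not.
 */
function resolveCustomField(object $record, string $field)
{
    $getter = 'get' . $field;

    return method_exists($record, $getter) ? $record->$getter() : null;
}

// A configured custom field resolves to the getter's return value.
var_dump(resolveCustomField(new ExamplePage(), 'SomeCustomFieldSimple'));

// A field without a matching getter resolves to null in this sketch.
var_dump(resolveCustomField(new ExamplePage(), 'Missing'));
```

The analyzer and `store` options in the YAML only affect how the returned value is mapped in the index, which is why the second getter in the example above can ignore them.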

Simple search controller configuration/implementation example:

mysite/_config/search.yml

SilverStripe\Core\Injector\Injector:
  SearchController:
    properties:
      SearchService: "%$Heyday\Elastica\ElasticaService"

mysite/code/Controllers/SearchController.php

<?php

use SilverStripe\Core\Convert;

class SearchController extends PageController
{
    /**
     * @var array
     */
    private static $allowed_actions = [
        'index'
    ];

    /**
     * @var \Heyday\Elastica\ElasticaService
     */
    protected $searchService;

    /**
     * Search results page action
     *
     * @return \SilverStripe\ORM\FieldType\DBHTMLText
     */
    public function index()
    {
        return $this->renderWith(['SearchResults', 'Page']);
    }

    /**
     * @param \Heyday\Elastica\ElasticaService $searchService
     */
    public function setSearchService(\Heyday\Elastica\ElasticaService $searchService)
    {
        $this->searchService = $searchService;
    }

    /**
     * @return bool|\Heyday\Elastica\PaginatedList
     */
    public function Results()
    {
        $request = $this->getRequest();

        if ($string = $request->requestVar('for')) {

            $query = new \Elastica\Query\BoolQuery();

            $query->addMust(
                new \Elastica\Query\QueryString(strval($string))
            );

            $results = $this->searchService->search($query);

            return new \Heyday\Elastica\PaginatedList($results, $request);
        }

        return false;
    }

    /**
     * Query all Page fields and RelatedObjects nested fields.
     *
     * @return bool|\SilverStripe\ORM\PaginatedList
     */
    public function ResultsWithRelatedObjects()
    {
        $request = $this->getRequest();

        if ($string = $request->requestVar('for')) {

            $queryString = new \Elastica\Query\QueryString(strval($string));

            $boolQuery = new \Elastica\Query\BoolQuery();

            $nestedQuery = new \Elastica\Query\Nested();
            $nestedQuery->setPath('RelatedDataObjects');
            $nestedQuery->setQuery($queryString);

            $boolQuery->addShould($queryString);
            $boolQuery->addShould($nestedQuery);

            $results = $this->searchService->search($boolQuery);

            return new \SilverStripe\ORM\PaginatedList($results, $request);
        }

        return false;
    }

    /**
     * @return mixed
     */
    public function SearchString()
    {
        return Convert::raw2xml($this->getRequest()->requestVar('for'));
    }
}
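
Both result methods pass the raw `for` parameter to `QueryString`, so characters such as `:` or `(` are read as query syntax, and malformed input can be rejected by the server. If you would rather have user input matched literally, a sketch like the following escapes it first. `escapeQueryString` is a hypothetical helper, not part of this module or Elastica, and the character list is an assumption based on the reserved characters of the query string syntax.

```php
<?php

/**
 * Hypothetical helper: escapes characters that the query_string syntax
 * treats as operators so the input is matched literally.
 */
function escapeQueryString(string $input): string
{
    // "<" and ">" cannot be escaped, so they are removed entirely.
    $input = str_replace(['<', '>'], '', $input);

    // Prefix each remaining reserved character (and "&&" / "||") with a backslash.
    return (string) preg_replace(
        '/([+\-=!(){}\[\]^"~*?:\\\\\/]|&&|\|\|)/',
        '\\\\$1',
        $input
    );
}

echo escapeQueryString('C++ (draft)'), PHP_EOL;
echo escapeQueryString('title:hello'), PHP_EOL;
```

In the controller this would wrap the input, for example `new \Elastica\Query\QueryString(escapeQueryString(strval($string)))`. Skip it if you want visitors to be able to use operators such as `AND` or field prefixes.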

Reindexing

To run a full reindex, use:

./vendor/bin/sake dev/tasks/ElasticaReindexTask

Subsite support

Our typical setup is to add SubsiteID as an indexed_field.

  indexed_fields:
    - ..
    - SubsiteID

And filter by this at query time.

$query = new \Elastica\Query();
$bool = new \Elastica\Query\BoolQuery();

$match = new \Elastica\Query\MultiMatch();
$match
    ->setQuery(strval($searchString))
    ->setFields($this->getESFields())
    ->setType('most_fields')
    ->setFuzziness('AUTO');
$bool->addMust($match);

$subsiteId = SubsiteState::singleton()->getSubsiteId();

if ($subsiteId) {
    $bool->addFilter(
        new \Elastica\Query\Term(['SubsiteID' => $subsiteId])
    );
} else {
    $bool->addFilter(
        new \Elastica\Query\Term(['SubsiteID' => 0])
    );
}

...
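
For reference, the bool query above corresponds roughly to the raw query body built below. This is a plain PHP sketch with no Elastica dependency. `buildSubsiteQueryBody` is a hypothetical helper, the `$fields` argument stands in for whatever `$this->getESFields()` returns in your project, and the boosted field names are made up for illustration. Elastica may serialise the `term` clause in a slightly more verbose form.

```php
<?php

/**
 * Hypothetical helper: builds a raw query body roughly equivalent to the
 * BoolQuery with a MultiMatch "must" clause and a SubsiteID term filter.
 */
function buildSubsiteQueryBody(string $searchString, array $fields, ?int $subsiteId): array
{
    return [
        'bool' => [
            'must' => [
                ['multi_match' => [
                    'query' => $searchString,
                    'fields' => $fields,
                    'type' => 'most_fields',
                    'fuzziness' => 'AUTO',
                ]],
            ],
            'filter' => [
                // A null subsite ID falls back to 0, mirroring the if/else above.
                ['term' => ['SubsiteID' => (int) $subsiteId]],
            ],
        ],
    ];
}

// Assumed field list: "Title^2" boosts matches on Title over Content.
$body = buildSubsiteQueryBody('annual report', ['Title^2', 'Content'], null);

echo json_encode(['query' => $body], JSON_PRETTY_PRINT), PHP_EOL;
```

Because the `else` branch filters on `0`, both branches of the original `if` reduce to casting the subsite ID to an integer, which is what the sketch does.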

At indexing time

Using Queues

You can make use of queues to have your reindex processes run in the background.

We use silverstripe-queuedjobs (https://github.com/symbiote/silverstripe-queuedjobs) and a job to reindex on publish has been created.

To turn on queues, you will need the following config:

SilverStripe\Core\Injector\Injector:
  Heyday\Elastica\Searchable:
    properties:
      queued: true

You will also need to set up a cronjob (I know, not very queue-like...).

Run the jobs in the queue every minute:

*/1 * * * * php /path/to/silverstripe/framework/cli-script.php dev/tasks/ProcessJobQueueTask

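To clean up the jobs, add the cleanup job once by running the following (it is then automatically scheduled to run once a day). The argument is quoted so that the shell does not strip the backslashes from the class name:

./vendor/bin/sake 'dev/tasks/CreateQueuedJobTask?name=Symbiote\QueuedJobs\Jobs\CleanupJob'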