Search API for QuickAppsCMS. Allows you to index table entities and locate them using that index.

dev-master 2018-05-15 17:56 UTC

This package is not auto-updated.

Last update: 2025-01-18 19:14:23 UTC


Search Plugin
#############

The Search Plugin makes entities searchable through an auto-generated
index of words. You can make any table "indexable" by attaching the
``SearchableBehavior`` to it.

Searchable Behavior
===================

This `behavior <http://book.cakephp.org/3.0/en/orm/behaviors.html>`__ is
provided by the Search plugin and makes entities "searchable" through
interchangeable search engines. These engines are responsible for indexing each
entity within your tables, and they also allow you to locate any of those entities
using an engine-specific query language.

Using this Behavior
-------------------

You must attach the Searchable behavior and tell it which search engine should be
used. By default the ``Generic Engine`` is used, which should cover most use
cases; however, new "Engine Adapters" can be created to cover specific needs:

.. code:: php

    $this->addBehavior('Search.Searchable', [
        'engine' => [
            'className' => 'Search\Engine\Generic\GenericEngine',
            'config' => [
                'bannedWords' => []
            ]
        ]
    ]);

This particular engine (GenericEngine) applies a series of filters (converting
to lowercase, removing line breaks, etc.) to the list of words extracted from each
entity being indexed. For more details check the "Generic Engine" documentation.

Searching Entities
------------------

When this behavior is attached, every entity under your table gets indexed, *depending
on the search engine being used*. The idea is that you can use this index to locate any
entity indexed in that way. To search entities you should use the ``search()`` method:

.. code:: php

    $query = $this->Articles->search($criteria);

This method interacts with the engine being used. The ``$criteria`` must be a valid
search-query compatible with the engine being used.

Indexing Events
---------------

Whichever search engine is being used, Searchable Behavior automatically triggers
some events when an entity is being indexed or when its index is being removed. You
can catch these events in your table and alter the index information as you need:

- ``Model.beforeIndex``: Triggered before the entity gets indexed by the configured
  search engine adapter. The first argument is the entity instance being indexed.

- ``Model.afterIndex``: Triggered after the entity was indexed by the configured
  search engine adapter. The first argument is the entity instance that was indexed,
  and the second indicates whether the indexing process completed correctly or not.

- ``Model.beforeRemoveIndex``: Triggered before the entity's index is removed. The
  first argument is the affected entity instance.

- ``Model.afterRemoveIndex``: Triggered after the entity's index is removed. The first
  argument is the affected entity instance, and the second indicates whether the
  index-removing process completed correctly or not.
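
As a minimal sketch of catching one of these events, the table below registers a
listener for ``Model.afterIndex``. It assumes the behavior dispatches its events
through the table's own event manager (CakePHP 3.0-style ``eventManager()``); the
table name, the handler name and the logging are hypothetical:

.. code:: php

    use Cake\Event\Event;
    use Cake\Log\Log;
    use Cake\ORM\Table;

    class ArticlesTable extends Table
    {
        public function initialize(array $config)
        {
            $this->addBehavior('Search.Searchable');

            // Assumption: indexing events are dispatched on this table's
            // event manager, so the table can subscribe to them here.
            $this->eventManager()->on('Model.afterIndex', [$this, 'afterIndex']);
        }

        // $entity is the indexed entity, $success tells whether indexing worked.
        public function afterIndex(Event $event, $entity, $success)
        {
            if (!$success) {
                // Hypothetical reaction: record the failure for later inspection.
                Log::warning('Indexing failed for an Articles entity.');
            }
        }
    }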

Search Criteria
---------------

In most cases ``$criteria`` will be a string representing a search query. For
instance: ``chris AND pratt AND -rat``. Criteria syntax depends exclusively on
the search engine being used. Search plugin provides a generic criteria parsing
API for defining new criteria syntax.

By default Search plugin comes with one built-in language parser: "Mini-Language
Parser" which is used by the built-in "Generic Engine" search engine.

A criteria parser must satisfy the ``Search\Parser\ParserInterface`` interface;
basically it must provide the ``parser()`` method which must return an array
list of "token" objects (``Search\Parser\TokenInterface``).

Search Operators
----------------

An ``Operator`` is a search-criteria command which allows you to perform very
specific SQL filter conditions. An operator is composed of **two parts**, a
``name`` and its ``arguments``, both parts separated using the ``:`` symbol
e.g.:

::

    // operator name is: "created"
    // operator arguments are: "2013..2016"
    created:2013..2016

.. note::

    Operator names are treated as **lowercase_and_underscored**, so ``AuthorName``,
    ``AUTHOR_NAME`` or ``AuThoR_naMe`` are all treated as: ``author_name``.

You can define custom operators for your table by using the ``addSearchOperator()``
method. For example, you might need to create a custom operator ``author`` which would
allow you to search an ``Article`` entity by its author name. A search-criteria
using this operator may look as follows:

::

    // get all articles containing `this phrase` and created by `John Locke`
    "this phrase" author:"John Locke"

You can define an operator method in your table and register it into this behavior
under the ``author`` name. A full working example may look as follows:

.. code:: php

    class MyTable extends Table {
        public function initialize(array $config)
        {
            // attach the behavior
            $this->addBehavior('Search.Searchable');

            // register a new operator for handling `author:<author_name>` expressions
            $this->addSearchOperator('author', 'operatorAuthor');
        }

        public function operatorAuthor(Query $query, Token $token)
        {
            // $query: The query object to alter
            // $token: Token representing the operator to apply.
            // Scope query using $token information and return.
            return $query;
        }
    }

You can also define an operator as a callable function:

.. code:: php

    class MyTable extends Table
    {
        public function initialize(array $config)
        {
            $this->addBehavior('Search.Searchable');

            $this->addSearchOperator('author', function(Query $query, Token $token) {
                // Scope query and return.
                return $query;
            });
        }
    }

Built-in Operators
~~~~~~~~~~~~~~~~~~

The Search Plugin comes with a few built-in operators that should cover the most
common use cases:

Date Operator
^^^^^^^^^^^^^

Allows filtering by date-based column types, for example, ``created``, ``modified``,
etc. Date ranges are fully supported as follows: ``created:2014..2015``.

To use this operator you should indicate the column you wish to scope as follows:

.. code:: php

    $this->addSearchOperator('created', 'Search.Date', ['field' => 'created_on']);

Once the operator is attached you should be able to filter using the ``created``
operator in your search criteria, for example:

.. code:: php

    $criteria = "created:2015..2016";
    $this->Articles->search($criteria);

Generic Operator
^^^^^^^^^^^^^^^^

Provides generic scoping for any column type. Usage:

.. code:: php

    $this->addSearchOperator('name', 'Search.Generic', ['field' => 'name']);

Supported options:

-   conjunction: Indicates which conjunction type should be used when scoping the
    column. Defaults to ``auto``. Accepted values are:

    - ``LIKE``: Useful when matching string values. Accepts the wildcard ``*`` for
      matching "any" sequence of chars and ``!`` for matching any single char, e.g.
      ``author:c*`` or ``author:ca!``, or mixing both: ``author:c!r*``.

    - ``IN``: Useful when the operator accepts a list of possible values, e.g.
      ``author:chris,carter,lisa``.

    - ``=``: Used for strict matching (equal to).

    - ``<>``: Used for strict non-matching (not equal to).

    - ``auto``: Auto-detects the conjunction. ``IN`` is used if a comma is found in
      the given value, ``LIKE`` is used otherwise. E.g. for ``author:chris,peter``
      the ``IN`` conjunction is used, and for ``author:chris`` the ``LIKE``
      conjunction is used instead.
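
For illustration, the ``conjunction`` option could be passed alongside ``field`` when
attaching the operator. This is only a sketch: it assumes the generic operator is
registered as ``Search.Generic`` (following the naming of the other built-in
operators), and the ``author_name`` column is hypothetical:

.. code:: php

    // Force strict matching instead of relying on "auto" detection.
    // Assumptions: 'Search.Generic' operator name, 'author_name' column.
    $this->addSearchOperator('author', 'Search.Generic', [
        'field' => 'author_name',
        'conjunction' => '=',
    ]);

With such a setup, a criteria like ``author:chris`` would be expected to match only
rows whose column value is exactly ``chris``.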

Limit Operator
^^^^^^^^^^^^^^

Limits the number of results. Usage:

.. code:: php

    $this->addSearchOperator('num_articles', 'Search.Limit');

Once the operator is attached you should be able to filter using the ``num_articles``
operator in your search criteria, for example:

.. code:: php

    $criteria = "num_articles:6";
    $this->Articles->search($criteria);


Order Operator
^^^^^^^^^^^^^^

Orders results by the given columns. When attaching this operator you must
indicate which columns are allowed to be ordered by, for example:

.. code:: php

    $this->addSearchOperator('order_articles_by', 'Search.Order', [
        'fields' => ['title', 'created_on']
    ]);

In this example, results can be sorted only by the "title" and "created_on" columns.
Once the operator is attached you should be able to use the ``order_articles_by``
operator in your search criteria, indicating the column and the ordering direction
("asc" or "desc"). If no direction is given, "asc" is used by default. For example:

.. code:: php

    $criteria = "order_articles_by:title,asc";
    $this->Articles->search($criteria);

Ordering by multiple columns is supported. In these cases each order command must be
separated using the ``;`` symbol:

.. code:: php

    $criteria = "order_articles_by:title;created_on,desc";
    $this->Articles->search($criteria);

Range Operator
^^^^^^^^^^^^^^

Scopes results to those matching a given range constraint, in other words, the
equivalent of SQL's ``BETWEEN``. Lower and upper values must be separated using "..".
Example:

.. code:: php

    $this->addSearchOperator('comments_count', 'Search.Range', [
        'field' => 'num_comments'
    ]);

Once the operator is attached you should be able to filter using the ``comments_count``
operator in your search criteria, for example:

.. code:: php

    $criteria = "comments_count:6..10";
    $this->Articles->search($criteria);

This example should return only articles with 6 to 10 comments.
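
To make the ``lower..upper`` convention concrete, here is a small self-contained
sketch that splits a range argument into its two bounds. The ``parseRange()`` helper
is hypothetical and is not part of the plugin; it only illustrates the syntax:

.. code:: php

    // Hypothetical helper: split "lower..upper" into its two bounds.
    // Returns null when the argument is not a complete range.
    function parseRange($argument)
    {
        $parts = explode('..', $argument, 2);
        if (count($parts) !== 2 || $parts[0] === '' || $parts[1] === '') {
            return null;
        }

        return [$parts[0], $parts[1]];
    }

    $bounds = parseRange('6..10'); // lower bound "6", upper bound "10"

The two bounds are what a ``BETWEEN lower AND upper`` condition would receive.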


Creating Reusable Operators
~~~~~~~~~~~~~~~~~~~~~~~~~~~

If your application has operators that are commonly reused, it is helpful to package
those operators into reusable classes extending ``\Search\BaseOperator``, for
instance:

.. code:: php

    // in MyPlugin/Model/Search/CustomOperator.php
    namespace MyPlugin\Model\Search;

    use Search\BaseOperator;

    class CustomOperator extends BaseOperator
    {
        public function scope($query, $token)
        {
            // Scope $query
            return $query;
        }
    }

    // In any table class:

    // Add the custom operator,
    $this->addSearchOperator('operator_name', 'MyPlugin.Custom', ['opt1' => 'val1', ...]);

    // OR passing a constructed operator
    use MyPlugin\Model\Search\CustomOperator;
    $this->addSearchOperator('operator_name', new CustomOperator($this, ['opt1' => 'val1', ...]));


Fallback Operators
~~~~~~~~~~~~~~~~~~

When an operator is detected in the given search criteria but no operator
callable was defined using ``addSearchOperator()``, the
``Search.operator<OperatorName>`` event is triggered globally, so other
plugins may respond and handle any undefined operator. For example, given the
search criteria below, let's suppose the ``date`` operator **was not defined** earlier:

::

    "this phrase" author:"John Locke" date:2013-06-06..2014-06-06

The ``Search.operatorDate`` event will be fired. A plugin may respond to
this call by implementing this event:

.. code:: php

    // ...

    public function implementedEvents() {
        return [
            'Search.operatorDate' => 'operatorDate',
        ];
    }

    // ...

    public function operatorDate($event, $query, $token)
    {
        // alter $query object and return it
        return $query;
    }

    // ...

.. note::

    -  Event handler method should always return the modified $query object.
    -  The event’s context, that is ``$event->subject``, is the table instance that
       triggered the event.
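
Since the event is triggered globally, one way to respond to it is a listener
attached to CakePHP's global event manager. The sketch below assumes exactly that;
the ``DateOperatorListener`` class name and the place where it is registered are
hypothetical:

.. code:: php

    use Cake\Event\EventListenerInterface;
    use Cake\Event\EventManager;

    // Hypothetical listener class handling the undefined "date" operator.
    class DateOperatorListener implements EventListenerInterface
    {
        public function implementedEvents()
        {
            return [
                'Search.operatorDate' => 'operatorDate',
            ];
        }

        public function operatorDate($event, $query, $token)
        {
            // Scope $query using $token here, then always return it.
            return $query;
        }
    }

    // For instance in a plugin's config/bootstrap.php:
    EventManager::instance()->on(new DateOperatorListener());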


Interacting With The Engine
---------------------------

You can get an instance of the Search Engine being used by invoking the
``searchEngine()`` method. This allows you, for instance, to manually index an entity,
get index information for a specific entity, etc.

.. code:: php

    $engine = $this->Articles->searchEngine();
    $engine->search( ... );
    $engine->get( ... );
    $engine->index( ... );
    $engine->delete( ... );


You can also use the ``searchEngine()`` method to change the engine on the fly:

.. code:: php

    $config = [ ... ];
    $engine = $this->Articles->searchEngine(new CustomSearchEngine($this->Articles, $config));


Engine Adapters
###############

New search engine adapters can be created; such adapters must simply extend the
class ``Search\Engine\BaseEngine``. These adapters must provide methods for
indexing, retrieving and removing indexes. This allows you, for instance, to use
different search engines for indexing different tables.

This plugin comes with one built-in Search Engine adapter: ``Generic Engine``, which
should be enough in most cases. However, when working with big tables a more
efficient approach is recommended, such as ``Elasticsearch``, ``Apache SOLR``,
``Sphinx``, etc.
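
A rough skeleton of such an adapter might look as follows. Only the base class name
and the method names (``index``, ``get``, ``delete``, ``search``) come from this
document; the ``RemoteEngine`` class, its namespace, and every method signature and
return value are assumptions and should be checked against ``BaseEngine``:

.. code:: php

    namespace MyPlugin\Engine;

    use Cake\Datasource\EntityInterface;
    use Cake\ORM\Query;
    use Search\Engine\BaseEngine;

    // Hypothetical adapter; all signatures below are assumptions.
    class RemoteEngine extends BaseEngine
    {
        public function index(EntityInterface $entity)
        {
            // Send the entity's searchable content to the external service.
            return true;
        }

        public function get(EntityInterface $entity)
        {
            // Fetch the index information stored for this entity.
            return null;
        }

        public function delete(EntityInterface $entity)
        {
            // Remove the entity from the external index.
            return true;
        }

        public function search($criteria, Query $query)
        {
            // Ask the external service for matches and scope $query accordingly.
            return $query;
        }
    }

Such a class would then be selected through the ``engine.className`` option when
attaching the Searchable behavior.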



Generic Engine
##############

The Search plugin comes with one built-in Engine which should cover most use cases.
This Search Engine allows entities to be searchable through an auto-generated list
of words using ``LIKE`` SQL expressions, and optionally ``fulltext`` based searches.
If you need to hold a very large amount of index information you should either create
your own Engine adapter to work with third-party solutions such as "Elasticsearch",
"Sphinx", etc., or enable a ``fulltext`` index to speed up Generic Engine.


Using Generic Engine
--------------------

You must tell the Searchable behavior to use this engine when attaching it to your
table. For example, when attaching the Searchable behavior to the ``Articles``
table:

.. code:: php

    $this->addBehavior('Search.Searchable', [
        'engine' => [
            'className' => 'Search\Engine\Generic\GenericEngine',
            'config' => [
                'bannedWords' => []
            ]
        ]
    ]);

This engine applies a series of filters (converting to lowercase, removing line
breaks, etc.) to the list of words extracted from each entity being indexed.


Banned Words
------------

You can use the ``bannedWords`` option to tell which words should not be indexed by
this engine. For example:


.. code:: php

    $this->addBehavior('Search.Searchable', [
        'engine' => [
            'className' => 'Search\Engine\Generic\GenericEngine',
            'config' => [
                'bannedWords' => ['of', 'the', 'and']
            ]
        ]
    ]);

If you need to ban a really specific list of words you can set the ``bannedWords``
option to a callable that returns true or false to tell whether a word should be
indexed or not. For example:

.. code:: php

    $this->addBehavior('Search.Searchable', [
        'engine' => [
            'className' => 'Search\Engine\Generic\GenericEngine',
            'config' => [
                'bannedWords' => function ($word) {
                    return strlen($word) > 3;
                }
            ]
        ]
    ]);

- Returning TRUE indicates that the word is safe for indexing (not banned).
- Returning FALSE indicates that the word should NOT be indexed (banned).

In the example above, any word of 4 or more characters will be indexed (e.g. "home",
"name", "quickapps", etc). Any word of 3 or fewer characters will be banned (e.g.
"and", "or", "the").
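
The two approaches can also be combined in a single callable. The following sketch
bans short words as well as a few longer stop words; the ``$stopWords`` list and the
``$isIndexable`` variable name are purely illustrative:

.. code:: php

    // Illustrative stop words that are long enough to pass the length check.
    $stopWords = ['this', 'that', 'with', 'from'];

    // Returns true when the word may be indexed, false when it is banned.
    $isIndexable = function ($word) use ($stopWords) {
        $word = strtolower($word);

        return strlen($word) > 3 && !in_array($word, $stopWords, true);
    };

The closure can then be passed as the ``bannedWords`` option in place of the array
or the function shown above.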


Searching Entities
------------------

When using this engine, every entity under your table gets a list of indexed words.
The idea behind this is that you can use this list of words to locate any entity
based on a customized search-criteria. A search-criteria looks as follows:

::

    "this phrase" OR -"not this one" AND this

Use wildcard searches to broaden results; asterisk (``*``) matches any one or more
characters, exclamation mark (``!``) matches any single character:

::

    "thisrase" OR wor* AND thi!
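
Because this engine relies on ``LIKE`` expressions, the two wildcards map naturally
onto SQL's ``%`` and ``_`` placeholders. The helper below is a hypothetical sketch of
such a translation, not necessarily how Generic Engine implements it:

.. code:: php

    // Hypothetical helper: turn a search word into a SQL LIKE pattern.
    function wildcardToLike($term)
    {
        // Escape characters that already have a meaning in LIKE patterns.
        $escaped = str_replace(['\\', '%', '_'], ['\\\\', '\\%', '\\_'], $term);

        // "*" matches any sequence of characters, "!" matches a single one.
        return str_replace(['*', '!'], ['%', '_'], $escaped);
    }

    $pattern = wildcardToLike('wor*'); // "wor%"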

Anything containing space (" ") characters must be wrapped in quotation marks:

::

    "this phrase" my_operator:100..500 -word -"more words" -word_1 word_2

The search criteria above will be treated as if it were composed of the following
parts:

::

    [
        this phrase,
        my_operator:100..500,
        -word,
        -more words,
        -word_1,
        word_2,
    ]
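
The splitting shown above can be approximated with a single regular expression. The
``splitCriteria()`` function below is a simplified, hypothetical sketch and not the
plugin's actual Mini-Language Parser; it only reproduces the list of parts:

.. code:: php

    // Hypothetical sketch: split a search criteria into its parts.
    function splitCriteria($criteria)
    {
        // Optional "-" prefix, optional "operator:" name, then either a
        // quoted phrase or a run of non-space characters.
        $pattern = '/(-?)((?:\w+:)?)(?:"([^"]*)"|([^\s"]+))/';
        preg_match_all($pattern, $criteria, $matches, PREG_SET_ORDER);

        $parts = [];
        foreach ($matches as $m) {
            // Group 4 holds a bare word, group 3 the content of a quoted phrase.
            $value = (isset($m[4]) && $m[4] !== '') ? $m[4] : $m[3];
            $parts[] = $m[1] . $m[2] . $value;
        }

        return $parts;
    }

    $parts = splitCriteria(
        '"this phrase" my_operator:100..500 -word -"more words" -word_1 word_2'
    );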

Search criteria allow you to express complex search conditions in a human-readable
way. This lets you, for example, create user-friendly search forms, or create an RSS
feed just by building a friendly URL from a search-criteria, e.g.:
``http://example.com/rss/category:music created:2014``

You must use the Searchable Behavior's ``search()`` method to scope any query using
a search-criteria. For example, in some controller using ``Articles`` model:

.. code:: php

    $criteria = '"this phrase" OR -"not this one" AND this';
    $query = $this->Articles->find();
    $query = $this->Articles->search($criteria, $query);

The above will alter the given ``$query`` object according to the given criteria.
The second argument (query object) is optional. If it is not provided, the Searchable
Behavior automatically generates a find-query for you. The previous example and the
one below are equivalent:

.. code:: php

    $criteria = '"this phrase" OR -"not this one" AND this';
    $query = $this->Articles->search($criteria);
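
Since the result is a regular query object, it can be refined further before being
executed. A minimal sketch, assuming hypothetical ``created_on`` and ``title``
columns on the ``Articles`` table:

.. code:: php

    $criteria = '"this phrase" OR -"not this one" AND this';

    // Refine the scoped query with the usual ORM query methods.
    $query = $this->Articles->search($criteria)
        ->order(['Articles.created_on' => 'DESC']) // hypothetical column
        ->limit(10);

    // The query is executed lazily, when iterated.
    foreach ($query as $article) {
        echo $article->title; // hypothetical column
    }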


Fulltext Search
---------------

By default the Generic Engine uses ``LIKE`` SQL statements when searching through the
index, which should be enough for small websites. For large websites, however, a
``fulltext`` index is recommended in order to improve search speed. You can enable
fulltext search by simply creating a ``fulltext index`` on the ``words`` column of
the ``search_datasets`` table.

.. note::

    This feature is currently supported for MySQL databases only.
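
One possible way to create that index is to run the statement through CakePHP's
connection object. This is only a sketch: it assumes the table lives on the
``default`` connection, and the index name ``words_fulltext`` is hypothetical:

.. code:: php

    use Cake\Datasource\ConnectionManager;

    // Assumptions: "default" connection, hypothetical index name.
    $connection = ConnectionManager::get('default');
    $connection->execute(
        'ALTER TABLE search_datasets ADD FULLTEXT INDEX words_fulltext (words)'
    );

The same statement can of course be executed directly from any MySQL client or
placed in a migration.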