ankane/mitie

Named-entity recognition for PHP

v0.2.0 2024-07-01 00:43 UTC


Last update: 2025-01-09 09:34:29 UTC



MITIE - named-entity recognition, binary relation detection, and text categorization - for PHP

  • Finds people, organizations, and locations in text
  • Detects relationships between entities, like PERSON was born in LOCATION


Installation

Run:

composer require ankane/mitie

Add scripts to composer.json to download the shared library:

    "scripts": {
        "post-install-cmd": "Mitie\\Vendor::check",
        "post-update-cmd": "Mitie\\Vendor::check"
    }

Run:

composer install

And download the pre-trained models for your language:

Getting Started

Named Entity Recognition

Load an NER model

$model = new Mitie\NER('ner_model.dat');

Create a document

$doc = $model->doc('Nat works at GitHub in San Francisco');

Get entities

$doc->entities();

This returns

[
    ['text' => 'Nat',           'tag' => 'PERSON',       'score' => 0.3112371212688382, 'offset' => 0],
    ['text' => 'GitHub',        'tag' => 'ORGANIZATION', 'score' => 0.5660115198329334, 'offset' => 13],
    ['text' => 'San Francisco', 'tag' => 'LOCATION',     'score' => 1.3890524313885309, 'offset' => 23]
]
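The `offset` field appears to be the position of each entity in the input string. In the output above, slicing the input at each offset for the length of the entity text gives back that text. The sketch below checks this against the example output copied verbatim and does not call the library. The helper `entitySpans` is hypothetical. It uses `substr()`, which counts bytes. For ASCII input like this, bytes and characters coincide. For non-ASCII text, check whether the package reports byte or character offsets before choosing between `substr()` and `mb_substr()`.

```php
<?php
// Input text and entity output copied from the README example above.
$text = 'Nat works at GitHub in San Francisco';
$entities = [
    ['text' => 'Nat',           'tag' => 'PERSON',       'score' => 0.3112371212688382, 'offset' => 0],
    ['text' => 'GitHub',        'tag' => 'ORGANIZATION', 'score' => 0.5660115198329334, 'offset' => 13],
    ['text' => 'San Francisco', 'tag' => 'LOCATION',     'score' => 1.3890524313885309, 'offset' => 23],
];

// Hypothetical helper: recover each entity's span from the source text
// using its offset and the length of its text. substr() works on bytes,
// which matches character positions only for ASCII input like this one.
function entitySpans(string $text, array $entities): array
{
    $spans = [];
    foreach ($entities as $entity) {
        $spans[] = substr($text, $entity['offset'], strlen($entity['text']));
    }
    return $spans;
}

echo implode(' | ', entitySpans($text, $entities)), PHP_EOL; // prints "Nat | GitHub | San Francisco"
```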

Get tokens

$doc->tokens();

Get tokens and their offset

$doc->tokensWithOffset();

Get all tags for a model

$model->tags();

Training

Create a trainer from a word feature extractor

$trainer = new Mitie\NERTrainer('total_word_feature_extractor.dat');

Create training instances

$tokens = ['You', 'can', 'do', 'machine', 'learning', 'in', 'PHP', '!'];
$instance = new Mitie\NERTrainingInstance($tokens);
$instance->addEntity(3, 4, 'topic');    // machine learning
$instance->addEntity(6, 6, 'language'); // PHP
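In the example above, `addEntity` takes the indices of the first and last tokens of the entity, and both ends are included. For instance, `3, 4` covers `machine learning` and `6, 6` covers the single token `PHP`. Counting indices by hand is error-prone. The sketch below finds the inclusive range for a phrase instead. The helper `findTokenRange` is hypothetical and not part of the package. Its result would be passed to `addEntity`.

```php
<?php
// Hypothetical helper (not part of ankane/mitie): return the inclusive
// [start, end] token indices of the first occurrence of $phrase in $tokens,
// or null if the phrase does not occur.
function findTokenRange(array $tokens, array $phrase): ?array
{
    $n = count($phrase);
    if ($n === 0) {
        return null;
    }
    $last = count($tokens) - $n;
    for ($i = 0; $i <= $last; $i++) {
        // array_slice() reindexes from 0, so a strict comparison works
        if (array_slice($tokens, $i, $n) === $phrase) {
            return [$i, $i + $n - 1]; // end index is inclusive
        }
    }
    return null;
}

$tokens = ['You', 'can', 'do', 'machine', 'learning', 'in', 'PHP', '!'];
[$start, $end] = findTokenRange($tokens, ['machine', 'learning']);
echo "$start, $end\n"; // prints "3, 4"
// With the package installed: $instance->addEntity($start, $end, 'topic');
```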

Add the training instances to the trainer

$trainer->add($instance);

Train the model

$model = $trainer->train();

Save the model

$model->saveToDisk('ner_model.dat');

Binary Relation Detection

Detect relationships between two entities, like:

  • PERSON was born in LOCATION
  • ORGANIZATION was founded in LOCATION
  • FILM was directed by PERSON

There are 21 detectors for English. You can find them in the binary_relations directory in the model download.
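The detector files appear to follow a `rel_classifier_<relation>.svm` naming pattern, as in the `rel_classifier_organization.organization.place_founded.svm` file loaded in the next step. The sketch below assumes every file in `binary_relations` follows that pattern. It lists the detectors in a directory and derives a relation name from each filename. The function `listRelationDetectors` and the directory path are hypothetical. Each returned path could then be passed to `new Mitie\BinaryRelationDetector(...)`.

```php
<?php
// Hypothetical helper: map each detector file in $dir to a relation name,
// assuming files are named rel_classifier_<relation>.svm.
function listRelationDetectors(string $dir): array
{
    $detectors = [];
    // glob() may return false on error; treat that as "no files"
    foreach (glob($dir . '/rel_classifier_*.svm') ?: [] as $path) {
        $name = basename($path, '.svm');
        $relation = substr($name, strlen('rel_classifier_'));
        $detectors[$relation] = $path;
    }
    ksort($detectors); // stable, alphabetical order by relation name
    return $detectors;
}

// Example: print the relations found in a local copy of the model download.
// The path below is an assumption; point it at your binary_relations directory.
foreach (listRelationDetectors('MITIE-models/english/binary_relations') as $relation => $path) {
    echo $relation, PHP_EOL;
}
```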

Load a detector

$detector = new Mitie\BinaryRelationDetector('rel_classifier_organization.organization.place_founded.svm');

And create a document

$doc = $model->doc('Shopify was founded in Ottawa');

Get relations

$detector->relations($doc);

This returns

[['first' => 'Shopify', 'second' => 'Ottawa', 'score' => 0.17649169745814464]]
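Each relation comes with a `score`. The sketch below assumes that higher scores mean more confident detections. Under that assumption, a caller might keep only relations at or above a threshold chosen for its own data. The `filterByScore` helper and the threshold `0.1` are illustrative and not part of the package. The input array is the output shown above. The same helper works on the entity arrays returned by `entities()`, since they also carry a `score` key.

```php
<?php
// Output copied from the relation example above.
$relations = [['first' => 'Shopify', 'second' => 'Ottawa', 'score' => 0.17649169745814464]];

// Hypothetical helper: keep items whose score is at least $minScore.
// array_values() reindexes the result so it is a plain list again.
function filterByScore(array $items, float $minScore): array
{
    return array_values(array_filter(
        $items,
        fn (array $item): bool => $item['score'] >= $minScore
    ));
}

foreach (filterByScore($relations, 0.1) as $r) {
    printf("%s -> %s (%.3f)\n", $r['first'], $r['second'], $r['score']); // prints "Shopify -> Ottawa (0.176)"
}
```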

Training

Load an NER model into a trainer

$trainer = new Mitie\BinaryRelationTrainer($model);

Add positive and negative examples to the trainer

$tokens = ['Shopify', 'was', 'founded', 'in', 'Ottawa'];
$trainer->addPositiveBinaryRelation($tokens, [0, 0], [4, 4]);
$trainer->addNegativeBinaryRelation($tokens, [4, 4], [0, 0]);
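The negative example above reuses the positive sentence with the two arguments swapped, since Ottawa did not found Shopify. If your training data is a list of known relations, a sketch like the one below could produce the arguments for both calls from each record. The record format and the `relationExamples` function are hypothetical. Each returned entry holds the tokens and the two inclusive `[start, end]` ranges used by `addPositiveBinaryRelation` and `addNegativeBinaryRelation`. Swapping only yields a valid negative for directional relations like "founded in", not for symmetric ones.

```php
<?php
// Hypothetical record format: tokens plus inclusive [start, end] ranges
// for the first and second argument of a known (positive) relation.
$records = [
    ['tokens' => ['Shopify', 'was', 'founded', 'in', 'Ottawa'], 'first' => [0, 0], 'second' => [4, 4]],
];

// Hypothetical helper: for each record, emit the positive example and a
// negative example with the arguments swapped.
function relationExamples(array $records): array
{
    $positive = [];
    $negative = [];
    foreach ($records as $r) {
        $positive[] = [$r['tokens'], $r['first'], $r['second']];
        $negative[] = [$r['tokens'], $r['second'], $r['first']];
    }
    return ['positive' => $positive, 'negative' => $negative];
}

$examples = relationExamples($records);
// With a BinaryRelationTrainer in $trainer, these could be added as:
// foreach ($examples['positive'] as [$t, $a, $b]) { $trainer->addPositiveBinaryRelation($t, $a, $b); }
// foreach ($examples['negative'] as [$t, $a, $b]) { $trainer->addNegativeBinaryRelation($t, $a, $b); }
echo count($examples['positive']), ' positive, ', count($examples['negative']), " negative\n"; // prints "1 positive, 1 negative"
```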

Train the detector

$detector = $trainer->train();

Save the detector

$detector->saveToDisk('binary_relation_detector.svm');

Text Categorization

Create a trainer from a word feature extractor

$trainer = new Mitie\TextCategorizerTrainer('total_word_feature_extractor.dat');

Add labeled text to the trainer

$trainer->add('This is super cool', 'positive');

Train the model

$model = $trainer->train();

Save the model

$model->saveToDisk('text_categorization_model.dat');

Load a saved model

$model = new Mitie\TextCategorizer('text_categorization_model.dat');

Categorize text

$model->categorize('What a super nice day');

History

View the changelog

Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

To get started with development:

git clone https://github.com/ankane/mitie-php.git
cd mitie-php
composer install
composer test