rajentrivedi / tokenizer-x
TokenizerX calculates required tokens for given prompt
Installs: 56,218
Dependents: 2
Suggesters: 0
Security: 0
Stars: 70
Watchers: 2
Forks: 5
Open Issues: 0
Requires
- php: ^8.1
- illuminate/console: ^9.52.16 || ^10.28.0 || ^11.0
- illuminate/contracts: ^10.0 || ^11.0
- illuminate/support: ^9.52.16 || ^10.28.0 || ^11.0
- spatie/laravel-package-tools: ^1.14.0
- yethee/tiktoken: ^0.5.1
Requires (Dev)
- laravel/pint: ^1.0
- nunomaduro/collision: ^7.0 || ^8.0 || ^9.0 || ^10.0
- nunomaduro/larastan: ^2.9.2
- orchestra/testbench: ^7.33.0 || ^8.13.0 || ^9.0.0
- pestphp/pest: ^2.0
- pestphp/pest-plugin-arch: ^2.0
- pestphp/pest-plugin-laravel: ^2.0
- phpstan/extension-installer: ^1.1
- phpstan/phpstan-deprecation-rules: ^1.0
- phpstan/phpstan-phpunit: ^1.0
README
TokenizerX supports Laravel 11 and Laravel 10.
TokenizerX
TokenizerX is a Laravel package designed to streamline tokenization in your applications. With the latest update, TokenizerX supports the newest GPT-4 models, providing advanced natural language processing capabilities.
It calculates the number of tokens a given prompt requires before you send a request to the OpenAI REST API. This helps ensure you stay within the OpenAI API token limit and receive accurate responses.
To access the OpenAI REST API, consider the excellent OpenAI PHP Laravel package.
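As an illustration, here is a minimal sketch of gating an API call on the token count. The 4,096-token budget and the commented-out client call are assumptions for illustration, not part of this package:

use Rajentrivedi\TokenizerX\TokenizerX;

// Illustrative budget; look up the actual context limit of your model.
$maxTokens = 4096;

$prompt = file_get_contents('prompt.txt');

if (TokenizerX::count($prompt, 'gpt-4') <= $maxTokens) {
    // Within budget: safe to send, e.g. via openai-php/laravel:
    // $response = OpenAI::chat()->create([...]);
} else {
    // Over budget: trim or split the prompt before calling the API.
}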
Supported OpenAI Models
- gpt-4o
- gpt-4
- gpt-3.5-turbo
- text-davinci-003
- text-davinci-002
- text-davinci-001
- text-curie-001
- text-babbage-001
- text-ada-001
- davinci
- curie
- babbage
- ada
- code-davinci-002
- code-davinci-001
- code-cushman-002
- code-cushman-001
- davinci-codex
- cushman-codex
- text-davinci-edit-001
- code-davinci-edit-001
- text-embedding-ada-002
- text-similarity-davinci-001
- text-similarity-curie-001
- text-similarity-babbage-001
- text-similarity-ada-001
- text-search-davinci-doc-001
- text-search-curie-doc-001
- text-search-babbage-doc-001
- text-search-ada-doc-001
- code-search-babbage-code-001
- code-search-ada-code-001
Supported Encoding
- r50k_base
- p50k_base
- p50k_edit
- cl100k_base
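These encodings come from the underlying yethee/tiktoken dependency, which maps each model name to its encoding. If you ever need an encoder directly, a sketch using that library's EncoderProvider (bypassing TokenizerX) looks like this:

use Yethee\Tiktoken\EncoderProvider;

$provider = new EncoderProvider();

// Pick an encoding explicitly...
$encoder = $provider->get('cl100k_base');

// ...or resolve it from a model name.
$encoder = $provider->getForModel('gpt-4');

// Returns an array of integer token IDs.
$tokens = $encoder->encode('how are you?');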
Installation
You can install the package via composer:
composer require rajentrivedi/tokenizer-x
Usage
By default, the package assumes the GPT-3 model:

use Rajentrivedi\TokenizerX\TokenizerX;

TokenizerX::count("how are you?");
If you want the token count for a specific OpenAI model, pass the model name as a second argument, choosing from the supported list above:

use Rajentrivedi\TokenizerX\TokenizerX;

TokenizerX::count("how are you?", "gpt-4");
You can also read the text from a file:
TokenizerX::count(file_get_contents('path_to_file'));
Please make sure the file's text is not altered when it is read programmatically; this can happen due to encoding. You can inspect the generated token IDs with the following:
TokenizerX::tokens(file_get_contents('path_to_file'));
This returns an array of the generated token IDs, which you can compare against the OpenAI Tokenizer to double-check the package's token counts.
Support
If you find TokenizerX helpful and would like to support its ongoing development, you can contribute by buying me a coffee! Your support helps in maintaining and improving the package for the Laravel community.
Testing
composer test
Changelog
Please see CHANGELOG for more information on what has changed recently.
Contributing
Please see CONTRIBUTING for details.
Security Vulnerabilities
Please review our security policy on how to report security vulnerabilities.
⭐ Star the Repository ⭐
If you find this project useful or interesting, I kindly request you to give it a ⭐ star on GitHub. Your support will encourage and motivate me to continue improving and maintaining this project.
By starring the repository, you can show appreciation for the work put into developing this open-source project. It also helps to increase its visibility, making it more accessible to other developers and potentially attracting contributors.
To give a ⭐ star, simply click on the Star button at the top-right corner of the repository page.
Credits
License
TokenizerX is developed using spatie/laravel-package-tools.
The MIT License (MIT). Please see License File for more information.