wikimedia / message
This library provides interfaces and value objects for internationalization (i18n) of applications in PHP, using 'en' as the default language code.
Requires
- php: ^7.4 || ^8.0
- ext-json: *
- wikimedia/json-codec: ^3.0
- wikimedia/testing-access-wrapper: ^3.0
Requires (Dev)
- mediawiki/mediawiki-codesniffer: 46.0.0
- mediawiki/mediawiki-phan-config: 0.15.1
- mediawiki/minus-x: 1.1.3
- ockcyp/covers-validator: 1.6.0
- php-parallel-lint/php-console-highlighter: 1.0.0
- php-parallel-lint/php-parallel-lint: 1.4.0
- phpunit/phpunit: 9.6.21
- wikimedia/update-history: ^1.0
Last update: 2025-02-12 18:16:42 UTC
README
This library provides interfaces and value objects for internationalization (i18n) of applications in PHP.
It is based on the i18n code used in MediaWiki, and is also intended to be compatible with jQuery.i18n, a JavaScript i18n library.
Installation
Use Composer to install the Message library:
composer require wikimedia/message
Concepts
Any text string that is needed in an application is a message. This might be something like a button label, a sentence, or a longer text. Each message is assigned a message key, which is used as the identifier in code.
Each message is translated into various languages, each represented by a language code. The message's text (as translated into each language) can contain formatting commands and placeholders, where each placeholder marks a place in the message where a parameter is to be inserted. Apart from these placeholders and formatting commands, the text might be plain text, or it might be in a markup language such as wikitext or Markdown.
A formatter is used to convert the message key and parameters (that is, a message specifier) into a text representation in a particular language and output format.
The library itself imposes few restrictions on all of these concepts; this document contains recommendations to help various implementations operate in compatible ways.
Usage
use Wikimedia\Message\MessageValue;
use Wikimedia\Message\ParamType;
use Wikimedia\Message\ScalarParam;

// Constructor interface
$message = new MessageValue( 'message-key', [
    'parameter',
    new MessageValue( 'another-message' ),
    // MessageParam is abstract, so a numeric parameter is built as a ScalarParam
    new ScalarParam( ParamType::NUM, 12345 ),
] );

// Fluent interface
$message = MessageValue::new( 'message-key' )
    ->params( 'parameter', new MessageValue( 'another-message' ) )
    ->numParams( 12345 );

// Formatting ($serviceContainer is supplied by the host environment)
$messageFormatter = $serviceContainer->get( 'MessageFormatterFactory' )->getTextFormatter( 'de' );
$output = $messageFormatter->format( $message );
Class Overview
Messages
Messages and their parameters are represented by newable value objects.
MessageValue represents an instance of a message, holding the key and any parameters. It is mutable in that parameters can be added to the object after creation.
MessageSpecifier is an interface implemented by MessageValue (and, outside of the Wikimedia\Message namespace, also MediaWiki\Message\Message), which only provides getter methods for the key and parameters, and no way to mutate the object. It should be used in methods that output or inspect messages, but aren't supposed to modify them.
MessageParam is an abstract value class representing a parameter to a message. It has a type (using constants defined in the ParamType class) and a value. It has two implementations:
- ScalarParam represents a single-valued parameter, such as a text string, a number, or another message.
- ListParam represents a list of values, which will be joined together with appropriate separators. It has a “list type” (using constants defined in the ListType class) defining the desired separators.
Machine-readable messages
DataMessageValue represents a message with additional machine-readable data. In addition to the key and message parameters, it holds a "code" and structured data that would be a useful representation of the message in an API response or the like.
For example, a message for an "integer out of range" error might have one of three different keys depending on whether the range has a minimum, maximum, or both. But all should have the same code (representing the concept of "integer out of range") and should likely have structured data representing the range directly as [ 'min' => 1, 'max' => 10 ] rather than as a flat array of MessageParam objects.
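The idea can be sketched without the library itself. In the following minimal example, the function name, the three message keys, the 'outofrange' code, and the returned array shape are all hypothetical; in real code the four pieces would be held by a DataMessageValue.

```php
<?php
// Hypothetical helper: key names, code string, and array shape are
// illustrative assumptions, not part of the library.
function integerOutOfRangeMessage( int $value, ?int $min, ?int $max ): array {
    if ( $min === null && $max === null ) {
        throw new InvalidArgumentException( 'A range needs at least one bound' );
    }
    if ( $min !== null && $max !== null ) {
        $key = 'example-int-outofrange-minmax';
        $params = [ $value, $min, $max ];
    } elseif ( $min !== null ) {
        $key = 'example-int-outofrange-min';
        $params = [ $value, $min ];
    } else {
        $key = 'example-int-outofrange-max';
        $params = [ $value, $max ];
    }
    // Keep only the bounds that are set, so the data mirrors the real range.
    $data = array_filter(
        [ 'min' => $min, 'max' => $max ],
        static function ( $bound ) {
            return $bound !== null;
        }
    );
    // The key varies with the range shape; the code stays the same.
    return [ 'key' => $key, 'params' => $params, 'code' => 'outofrange', 'data' => $data ];
}

$both = integerOutOfRangeMessage( 42, 1, 10 );
$minOnly = integerOutOfRangeMessage( -5, 1, null );
```

An API consumer can then branch on the shared code and read the bounds from the data without parsing the human-readable text.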
Formatters
A formatter for a particular language is obtained from an implementation of IMessageFormatterFactory. No implementation of this interface is provided by this library. If an environment needs its formatters to vary behavior on things other than the language code, for example selecting among multiple sources of messages or markup language used for processing message texts, it should define a MessageFormatterFactoryFactory of some sort to provide appropriate IMessageFormatterFactory implementations.
There is no one base interface for all formatters; the intent is that type hinting will ensure that the formatter being used will produce output in the expected output format. The defined output formats are:
- ITextFormatter produces plain text output.
No implementations of these interfaces are provided by this library.
Formatter implementations are expected to perform the following procedure to generate the output string:
1. Fetch the message's translation in the formatter's language. Details of this fetching are unspecified here.
   - If no translation is found in the formatter's language, it should attempt to fall back to appropriate other languages. Details of the fallback are unspecified here.
   - If no translation can be found in any fallback language, a string should be returned that indicates at minimum the message key that was unable to be found.
2. Replace placeholders with parameter values.
   - Note that placeholders must not be replaced recursively. That is, if a parameter's value contains text that looks like a placeholder, it must not be replaced as if it really were a placeholder.
   - Certain types of parameters are not substituted directly at this stage. Instead their placeholders must be replaced with an opaque representation that will not be misinterpreted during later stages.
     - Parameters of type RAW or PLAINTEXT
     - TEXT parameters with a MessageValue as the value
     - LIST parameters with any late-substituted value as one of their values.
3. Process any formatting commands.
4. Process the source markup language to produce a string in the desired output format. This may be a no-op, and may be combined with the previous step if the markup language implements compatible formatting commands.
5. Replace any opaque representations from step 2 with the actual values of the corresponding parameters.
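The placeholder-replacement step and the final restore step of this procedure can be sketched as follows. This is a simplified illustration, not the behavior of any particular formatter: the parameter shape ( [ 'type' => ..., 'value' => ... ] ), the lowercase type names, and the marker format are assumptions, and the sketch handles at most nine parameters.

```php
<?php
// Hypothetical sketch: parameter shape, type names, and marker format are
// assumptions made for illustration only.
function substituteParams( string $text, array $params ): string {
    $late = [];
    // preg_replace_callback() scans the input once and never rescans the
    // replacement text, so placeholder-like text inside a value is left alone.
    $text = preg_replace_callback(
        '/\$([1-9])/',
        static function ( array $m ) use ( $params, &$late ) {
            $index = (int)$m[1] - 1;
            if ( !isset( $params[$index] ) ) {
                // No such parameter: leave the text for later stages.
                return $m[0];
            }
            $param = $params[$index];
            if ( $param['type'] === 'raw' || $param['type'] === 'plaintext' ) {
                // Late substitution: emit an opaque marker for now.
                $marker = "\x7fLATE-" . count( $late ) . "\x7f";
                $late[$marker] = (string)$param['value'];
                return $marker;
            }
            return (string)$param['value'];
        },
        $text
    );
    // Formatting commands and markup processing would run here.
    // Finally, swap the opaque markers for the real values.
    return $late ? strtr( $text, $late ) : $text;
}

$out = substituteParams( 'Hello $1, see $2', [
    [ 'type' => 'text', 'value' => '$2' ],
    [ 'type' => 'raw', 'value' => '<b>x</b>' ],
] );
```

Because raw values only reappear after the markup stage, nothing in them can be reinterpreted as markup or as a formatting command.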
Guidelines for Interoperability
Besides allowing for libraries to safely supply their own translations for every app using them, and apps to easily use libraries' translations instead of having to retranslate everything, following these guidelines will also help open source projects use translatewiki.net for crowdsourced volunteer translation into many languages.
Language codes
BCP 47 language tags should be used for language codes. If a supplied language tag is not recognized, at minimum the corresponding tag with all optional subtags stripped should be tried as a fallback.
All messages must have a translation in English (code "en"). All languages should fall back to English as a last resort.
The English translations should use {{PLURAL:...}} and {{GENDER:...}} even when English doesn't make a grammatical distinction, to signal to translators that plural/gender support is available.
Language code "qqq" is reserved for documenting messages. Documentation should describe the context in which the message is used and the values of all parameters used with the message. Generally this is written in English. Attempting to obtain a message formatter for "qqq" should return one for "en" instead.
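A minimal sketch of these language-code rules, assuming a hypothetical helper named languageFallbackChain() and lowercase-normalized tags (BCP 47 tags are case-insensitive). Real environments usually define richer, language-specific fallback chains.

```php
<?php
// Hypothetical helper: returns the language codes to try, in order.
function languageFallbackChain( string $tag ): array {
    $tag = strtolower( $tag );
    if ( $tag === 'qqq' ) {
        // Documentation pseudo-language: use the English formatter instead.
        return [ 'en' ];
    }
    // The tag itself, then the tag with all optional subtags stripped,
    // then English as the last resort.
    $chain = [ $tag, explode( '-', $tag )[0], 'en' ];
    return array_values( array_unique( $chain ) );
}

$chain = languageFallbackChain( 'de-AT' );
```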
Language code "qqx" is reserved for debugging. Rather than retrieving translations from some underlying storage, every key should act as if it were translated as something like (key-name: $1, $2, $3), with the number of placeholders depending on how many parameters are included in the MessageValue.
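The "qqx" pseudo-translation can be generated rather than stored. A minimal sketch, assuming a hypothetical helper formatQqx(); the output for a message with no parameters is an assumption, since the text above only describes the parameterized form.

```php
<?php
// Hypothetical helper: builds the debugging "translation" for a key.
function formatQqx( string $key, int $paramCount ): string {
    if ( $paramCount === 0 ) {
        // Assumption: with no parameters, show only the key.
        return '(' . $key . ')';
    }
    $placeholders = [];
    for ( $i = 1; $i <= $paramCount; $i++ ) {
        $placeholders[] = '$' . $i;
    }
    return '(' . $key . ': ' . implode( ', ', $placeholders ) . ')';
}

$debugText = formatQqx( 'key-name', 3 );
```

The generated text then goes through normal placeholder replacement, so the output shows both the key and the actual parameter values.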
Message keys
Message keys intended for use with external implementations should follow certain guidelines for interoperability:
- Keys should be restricted to the regular expression /^[a-z][a-z0-9-]*$/. That is, a key should consist of lowercase ASCII letters, digits, and hyphens only, and should begin with a letter.
- Keys should be prefixed to help avoid collisions. For example, a library named "ApplePicker" should prefix its message keys with "applepicker-".
- Common values needing translation, such as names of months and weekdays, should not be prefixed by each library. Libraries needing these should use keys from the Common Locale Data Repository and document this requirement, and environments should provide these messages.
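A library could enforce the first two guidelines in its own tests. A minimal sketch, assuming a hypothetical helper isInteroperableKey(); shared keys such as month names would be exempt from the prefix check.

```php
<?php
// Hypothetical helper: checks a library-specific key against the guidelines.
function isInteroperableKey( string $key, string $prefix ): bool {
    // The D modifier stops "$" from also matching before a trailing newline.
    return preg_match( '/^[a-z][a-z0-9-]*$/D', $key ) === 1
        && strpos( $key, $prefix ) === 0;
}

$ok = isInteroperableKey( 'applepicker-title', 'applepicker-' );
```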
Message format
Placeholders are represented by $1, $2, $3, and so on. Text like $100 is interpreted as a placeholder for parameter 100 if 100 or more parameters were supplied, as a placeholder for parameter 10 followed by text "0" if between ten and 99 parameters were supplied, and as a placeholder for parameter 1 followed by text "00" if between one and nine parameters were supplied.
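One way to get this longest-match behavior in PHP is strtr() with an array of pairs: it tries the longest keys first and never rescans text it has already replaced. The helper name replacePlaceholders() is hypothetical, and the sketch assumes every parameter is a plain scalar.

```php
<?php
// Hypothetical helper: replaces $1, $2, ... with the supplied scalar values.
function replacePlaceholders( string $text, array $params ): string {
    $pairs = [];
    foreach ( array_values( $params ) as $i => $value ) {
        $pairs['$' . ( $i + 1 )] = (string)$value;
    }
    // Only placeholders for supplied parameters are keys, so "$100" with
    // twelve parameters matches "$10" and leaves the trailing "0" as text.
    return $pairs ? strtr( $text, $pairs ) : $text;
}

$twelve = range( 'a', 'l' ); // twelve parameters; the tenth is "j"
$result = replacePlaceholders( '$100', $twelve );
```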
All formatting commands look like {{NAME:$value1|$value2|$value3|...}}. Braces are to be balanced, e.g. {{NAME:foo|{{bar|baz}}}} has $value1 as "foo" and $value2 as "{{bar|baz}}". The name is always case-insensitive.
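Brace balancing means the values cannot be found by splitting on every pipe. The following sketch splits only on pipes at nesting depth zero; parseCommand() is a hypothetical helper that expects one complete command as input and does not try to resolve ambiguous runs of three or more braces.

```php
<?php
// Hypothetical helper: splits "{{NAME:a|b|...}}" into a name and values.
function parseCommand( string $command ): ?array {
    if ( substr( $command, 0, 2 ) !== '{{' || substr( $command, -2 ) !== '}}' ) {
        return null;
    }
    $inner = substr( $command, 2, -2 );
    $colon = strpos( $inner, ':' );
    if ( $colon === false ) {
        return null;
    }
    // Names are case-insensitive, so normalize to upper case.
    $name = strtoupper( substr( $inner, 0, $colon ) );
    $rest = substr( $inner, $colon + 1 );
    $args = [];
    $current = '';
    $depth = 0;
    $len = strlen( $rest );
    for ( $i = 0; $i < $len; $i++ ) {
        $two = substr( $rest, $i, 2 );
        if ( $two === '{{' ) {
            $depth++;
            $current .= $two;
            $i++;
        } elseif ( $two === '}}' && $depth > 0 ) {
            $depth--;
            $current .= $two;
            $i++;
        } elseif ( $rest[$i] === '|' && $depth === 0 ) {
            // A top-level pipe ends the current value.
            $args[] = $current;
            $current = '';
        } else {
            $current .= $rest[$i];
        }
    }
    $args[] = $current;
    return [ 'name' => $name, 'args' => $args ];
}

$parsed = parseCommand( '{{NAME:foo|{{bar|baz}}}}' );
```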
Anything syntactically resembling a placeholder or formatting command that does not correspond to an actual parameter or known command should be left unchanged for processing by the markup language processor.
Libraries providing messages for use by externally-defined formatters should generally assume no markup language will be applied, and should avoid constructs used by common markup languages unless they also make sense when read as plain text.
Formatting commands
The following formatting commands should be supported.
PLURAL
{{PLURAL:$count|$formA|$formB|...}} is used to produce plurals.
$count is a number, which may have been formatted with ParamType::NUM.
The number of forms and which count corresponds to which form depend on the language; for example, English uses {{PLURAL:$1|one|other}} while Arabic uses {{PLURAL:$1|zero|one|two|few|many|other}}. Details are defined in CLDR.
It is not possible to "skip" positions while still supplying later ones. If too few values are supplied, the final form is repeated for subsequent positions.
If there is an explicit plural form to be given for a specific number, it may be specified with syntax like {{PLURAL:$1|one egg|$1 eggs|12=a dozen eggs}}.
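The selection logic can be sketched as follows. Both function names are hypothetical, englishPluralIndex() is an English-only stand-in for the CLDR plural rules, and the forms are assumed to have had their placeholders substituted already.

```php
<?php
// English-only stand-in for the CLDR plural rules (assumption of this sketch).
function englishPluralIndex( int $count ): int {
    return $count === 1 ? 0 : 1;
}

// Hypothetical helper: picks the form to display for $count.
function choosePlural( int $count, array $forms ): string {
    $positional = [];
    foreach ( $forms as $form ) {
        if ( preg_match( '/^(\d+)=(.*)$/s', $form, $m ) ) {
            // Explicit "N=text" form: wins when N equals the count.
            if ( (int)$m[1] === $count ) {
                return $m[2];
            }
        } else {
            $positional[] = $form;
        }
    }
    if ( !$positional ) {
        return '';
    }
    // Too few forms: the final form is reused for later positions.
    $index = min( englishPluralIndex( $count ), count( $positional ) - 1 );
    return $positional[$index];
}

$dozen = choosePlural( 12, [ 'one egg', '12 eggs', '12=a dozen eggs' ] );
```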
GENDER
{{GENDER:$name|$masculine|$feminine|$unspecified}} is used to handle grammatical gender, typically when messages refer to user accounts.
This supports three grammatical genders: "male", "female", and a third option for cases where the gender is unspecified, unknown, or neither male nor female. It does not attempt to handle animate-inanimate or T-V distinctions.
$name is a user account name or other similar identifier. If the name given does not correspond to any known user account, it should probably use the $unspecified gender.
If $feminine and/or $unspecified is not specified, the value of $masculine is normally used in its place.
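The fallback between forms can be sketched as below. Looking up the gender for a given $name is environment-specific and not shown; chooseGenderForm() is a hypothetical helper that receives the already-resolved gender.

```php
<?php
// Hypothetical helper: $gender is "male", "female", or anything else
// for the unspecified case; $forms holds the values from the command.
function chooseGenderForm( string $gender, array $forms ): string {
    $masculine = $forms[0] ?? '';
    switch ( $gender ) {
        case 'male':
            return $masculine;
        case 'female':
            // A missing feminine form falls back to the masculine one.
            return $forms[1] ?? $masculine;
        default:
            // A missing unspecified form falls back to the masculine one too.
            return $forms[2] ?? $masculine;
    }
}

$form = chooseGenderForm( 'female', [ 'his', 'her', 'their' ] );
```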
GRAMMAR
{{GRAMMAR:$form|$term}} converts a term to an appropriate grammatical form.
If no mapping for $term to $form exists, $term should be returned unchanged.
See jQuery.i18n § Grammar for details.
BIDI
{{BIDI:$text}} applies directional isolation to the wrapped text, to attempt to avoid errors where directionally-neutral characters are wrongly displayed when between LTR and RTL content.
This should output U+202A (left-to-right embedding) or U+202B (right-to-left embedding) before the text, depending on the directionality of the first strongly-directional character in $text, and U+202C (pop directional formatting) after, or do something equivalent for the target output format.
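A rough sketch for plain-text output follows. It approximates "strongly-directional" with Unicode letters and treats only a few scripts as RTL, which is a simplification of the Unicode Bidirectional Algorithm's character classes. Returning the text unchanged when it has no letters is also an assumption, and bidiIsolate() is a hypothetical name.

```php
<?php
// Hypothetical helper: wraps $text in embedding controls based on the
// first letter found. The RTL script list is deliberately incomplete.
function bidiIsolate( string $text ): string {
    if ( !preg_match( '/\p{L}/u', $text, $m ) ) {
        // No letters at all: nothing to base the direction on.
        return $text;
    }
    $isRtl = preg_match(
        '/^[\p{Hebrew}\p{Arabic}\p{Syriac}\p{Thaana}]$/u',
        $m[0]
    ) === 1;
    // U+202B = right-to-left embedding, U+202A = left-to-right embedding,
    // U+202C = pop directional formatting.
    return ( $isRtl ? "\u{202B}" : "\u{202A}" ) . $text . "\u{202C}";
}

$hebrew = "123 \u{05E9}\u{05DC}\u{05D5}\u{05DD}";
$wrapped = bidiIsolate( $hebrew );
```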
Supplying translations
Code intending its messages to be used by externally-defined formatters should supply the translations as described by jQuery.i18n § Message File Format.
In brief, the base directory of the library should contain a directory named "i18n". This directory should contain JSON files named by code such as "en.json", "de.json", "qqq.json", each with contents like:
{
    "@metadata": {
        "authors": [ "Alice", "Bob", "Carol", "David" ],
        "last-updated": "2012-09-21"
    },
    "appname-title": "Example Application",
    "appname-sub-title": "An example application",
    "appname-header-introduction": "Introduction",
    "appname-about": "About this application",
    "appname-footer": "Footer text"
}
Formatter implementations should be able to consume message data supplied in this format, either directly via registration of i18n directories to check or by providing tooling to incorporate it during a build step.
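Reading such a file directly might look like the sketch below. The helper name loadMessages() is hypothetical, and the sketch assumes $languageCode has already been validated, so it is safe to use in a file path.

```php
<?php
// Hypothetical helper: returns key => text pairs for one language,
// or an empty array if the file is missing or is not a JSON object.
function loadMessages( string $i18nDir, string $languageCode ): array {
    $file = $i18nDir . '/' . $languageCode . '.json';
    if ( !is_file( $file ) ) {
        return [];
    }
    $data = json_decode( file_get_contents( $file ), true );
    if ( !is_array( $data ) ) {
        return [];
    }
    // "@metadata" describes the file itself and is not a message.
    unset( $data['@metadata'] );
    return $data;
}

$messages = loadMessages( __DIR__ . '/i18n', 'en' );
```

A formatter would call this once per language in its fallback chain and use the first file that defines the requested key.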
Machine-readable data
Libraries producing MessageValues as error messages should generally produce DataMessageValues instead. Codes should be similar to message keys but need not be prefixed. Data should be restricted to values that will produce valid output when passed to json_encode().
Libraries producing MessageValues in other contexts should consider whether the same applies to those contexts.
Scripts
The following scripts are defined for testing and code fixing purposes:
- test: Run linting and code checks.
- fix: Automatically fix code style issues.
- phan: Run the Phan static analyzer.
- phpcs: Run the PHP Code Sniffer.
Example usage:
composer run-script test
composer run-script fix
composer run-script phan
composer run-script phpcs
License
This project is licensed under the GPL-2.0-or-later. See the LICENSE file for details.
Contributing
Please read the CONTRIBUTING file for details on our code of conduct, and the process for submitting pull requests to us.
Bugs
Report bugs at Phabricator.
Homepage
For more information, visit the homepage.