hyvor / phrosemirror
Prosemirror in PHP
Requires
- php: ^8.1
- ext-dom: *
- ext-libxml: *
- myclabs/deep-copy: ^1.11
Requires (Dev)
- pestphp/pest: ^2.0
- phpstan/phpstan: ^1.8
This package is auto-updated.
Last update: 2024-12-28 17:57:51 UTC
README
Phrosemirror is a PHP library to work with Prosemirror (or TipTap) JSON content in an easy and type-safe way.
Here is what this library can do:
- Convert Prosemirror JSON into a Document with typed Nodes, Marks, and Attributes
- Analyze and change Documents
- Convert a Document to HTML
- Convert a Document to Text
- Parse HTML to a Document
- content and group for more strict schema conformity
Installation
composer require hyvor/phrosemirror
1. Schema
This library is unopinionated, which means there is no default schema. To start, define a schema that mirrors your front-end Prosemirror configuration.
You can find an example schema in the /example directory in this repo. It is similar to the schema of the prosemirror-schema-basic package.
use Hyvor\Phrosemirror\Types\Schema; $schema = new Schema( [ new Doc, new Text, new Paragraph, new Blockquote, new Image ], [ new Strong, new Italic, ] );
In the Schema constructor, the first argument is an array of Node Types and the second is an array of Mark Types.
Node Types
A basic node type looks like this:
use Hyvor\Phrosemirror\Types\NodeType; class Doc extends NodeType { public string $name = 'doc'; public ?string $content = 'block+'; }
They can contain content
and group
properties. If content
is not set, no content is allowed in this node. See Content & Grouping below for more information on how these properties work.
Here is another example of a Node Type:
class Paragraph extends NodeType { public string $name = 'paragraph'; public ?string $content = 'inline*'; public string $group = 'block'; }
Mark Types
A basic mark type looks like this:
use Hyvor\Phrosemirror\Types\MarkType; class Strong extends MarkType { public string $name = 'strong'; }
Attributes (Attrs)
One main goal of this library is to achieve type-safety. Therefore, attributes are defined in a typed class.
use Hyvor\Phrosemirror\Types\AttrsType; class ImageAttrs extends AttrsType { public string $src; public ?string $alt; }
By defining explicit types, we can be sure that the src attribute of the Image is always a string, while alt can be a string or null.
You can also define default values for attributes, which will be used if they are not present in the JSON document.
class ImageAttrs extends AttrsType { public string $src = 'https://hyvor.com/placeholder.png'; }
Then, in the Node Type or Mark Type, you have to mention the Attrs class.
use Hyvor\Phrosemirror\Types\NodeType;

class Image extends NodeType
{
    // ...
    public string $attrs = ImageAttrs::class;
}
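To illustrate why typed properties help, here is a plain-PHP sketch that uses no library classes. The class DemoImageAttrs and the function fillAttrs() are hypothetical; the library's own hydration logic and its InvalidAttributeTypeException are not shown. The sketch only demonstrates that PHP itself rejects a value of the wrong type for a typed property.

```php
<?php
// Hypothetical stand-in for an attrs class; it does not extend AttrsType.
class DemoImageAttrs
{
    public string $src = 'placeholder.png'; // default used when 'src' is absent
    public ?string $alt = null;             // nullable attribute
}

// Hypothetical helper: copies known keys from the input onto the typed properties.
function fillAttrs(array $input): DemoImageAttrs
{
    $attrs = new DemoImageAttrs();
    foreach ($input as $name => $value) {
        if (property_exists($attrs, $name)) {
            $attrs->$name = $value; // PHP enforces the declared type here
        }
    }
    return $attrs;
}

// 'src' is missing, so it keeps its default value.
$ok = fillAttrs(['alt' => 'A cat']);

try {
    fillAttrs(['src' => ['not', 'a', 'string']]);
    $rejected = false;
} catch (TypeError $e) {
    // An array cannot be assigned to a string property.
    $rejected = true;
}
```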
2. Document
Once the Schema is ready, we can start working with Documents.
use Hyvor\Phrosemirror\Types\Schema;
use Hyvor\Phrosemirror\Document\Document;

$schema = new Schema($nodes, $marks);
$json = '{}'; // <- this is the JSON from the front-end
$document = Document::fromJson($schema, $json);
$json can be a JSON string, a PHP array, or a PHP object. If the given JSON is valid, $document will be an instance of Hyvor\Phrosemirror\Document\Document. If not, an exception will be thrown. See Error Handling below.
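For reference, here is a minimal sketch of the three input shapes, built with PHP's own JSON functions only. The document content is a made-up example in the standard Prosemirror JSON format, and the variable names are hypothetical. Any of the three values could be passed as $json, assuming the schema defines the doc, paragraph, and text nodes and the strong mark.

```php
<?php
// A small Prosemirror document: one paragraph containing bold text.
$asArray = [
    'type' => 'doc',
    'content' => [
        [
            'type' => 'paragraph',
            'content' => [
                ['type' => 'text', 'text' => 'Hello', 'marks' => [['type' => 'strong']]],
            ],
        ],
    ],
];

// The same document as a JSON string (what a front-end typically sends).
$asString = json_encode($asArray, JSON_THROW_ON_ERROR);

// The same document as a PHP object (stdClass), as returned by json_decode().
$asObject = json_decode($asString, false, 512, JSON_THROW_ON_ERROR);
```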
Node
A Document
is just a Node
with the doc
type. These are the properties of a Node.
namespace Hyvor\Phrosemirror\Document;

use Hyvor\Phrosemirror\Types\NodeType;
use Hyvor\Phrosemirror\Types\AttrsType;

class Node
{
    public NodeType $type;
    public AttrsType $attrs;
    public Fragment $content;
    /** @var Mark[] */
    public array $marks;
}
- NodeType $type is the type of the node, which you defined in the schema.
- AttrsType $attrs holds the attributes of the Node. This will be an object of the class you set in the Node Type's attrs property. For example, if the node is an Image, $attrs will be an object of ImageAttrs.
- Fragment $content is a collection of child Nodes.
- Mark[] $marks is an array of Marks assigned to this node.
TextNode is a special Node that represents the text node type in Prosemirror. It has the string $text property in addition to the above properties. Also, $marks only makes sense in the context of TextNode.
Checking Node Type
Use isOfType()
to check if a Node
is of a particular NodeType
defined in your schema.
$json = ['type' => 'paragraph'];
$node = Node::fromJson($schema, $json);

$node->isOfType(Paragraph::class); // true
$node->isOfType(Image::class); // false
$node->isOfType([Paragraph::class, Image::class]); // true
Accessing Attributes
Use the attr()
method to access an attribute of the Node.
$json = ['type' => 'image', 'attrs' => ['src' => 'image.png']];
$image = Node::fromJson($schema, $json);

// html-escaped (safe to use in HTML output)
$src = $image->attr('src');

// not html-escaped
$src = $image->attr('src', escape: false);
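To see why the escaped form is the default, consider an attribute value that contains a double quote. The sketch below uses PHP's built-in htmlspecialchars() as a stand-in. The source does not say which escaping function the library uses internally, so treat this as an illustration of the idea rather than of attr() itself.

```php
<?php
// A hostile attribute value, as it might arrive in tampered JSON.
$rawSrc = 'x.png" onerror="alert(1)';

// Escaping converts the quotes into entities so they cannot close the attribute.
$safeSrc = htmlspecialchars($rawSrc, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

$unsafeHtml = "<img src=\"$rawSrc\">";  // the onerror handler becomes a real attribute
$safeHtml   = "<img src=\"$safeSrc\">"; // the whole value stays inside src
```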
Traversing Through Nested Nodes
You can traverse through nested nodes using the traverse()
method with a callback. Here is an example that traverses all nodes and collects every image node.
$document = Document::fromJson($schema, $json);

$images = [];
$document->traverse(function (Node $node) use (&$images) {
    if ($node->isOfType(Image::class)) {
        $images[] = $node;
    }
});
traverse() traverses through TextNodes too!
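As a rough mental model of the visiting order, here is a depth-first walk over a plain decoded JSON array. The function walkNodes() is hypothetical and works on arrays, not on the library's Node objects. That parents are visited before their children is an assumption of this sketch, not something the source states about traverse().

```php
<?php
// Hypothetical helper: calls $visit for a node, then for each of its descendants.
function walkNodes(array $node, callable $visit): void
{
    $visit($node);
    foreach ($node['content'] ?? [] as $child) {
        walkNodes($child, $visit);
    }
}

$doc = [
    'type' => 'doc',
    'content' => [
        ['type' => 'paragraph', 'content' => [['type' => 'text', 'text' => 'Hi']]],
        ['type' => 'image', 'attrs' => ['src' => 'a.png']],
    ],
];

// Record the type of every node visited, text nodes included.
$visited = [];
walkNodes($doc, function (array $node) use (&$visited): void {
    $visited[] = $node['type'];
});
```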
Traversing Through Direct Children
Use foreach
with $node->content
.
foreach ($node->content as $child) { if ($child->isOfType(Image::class)) { echo "I found an image!"; } }
Finding Nodes
Earlier, we used traverse()
to find nodes, but there is the getNodes()
method to make it easier. It searches through all nested nodes and returns Node[]
of matched nodes.
// images
$node->getNodes(Image::class);

// all nodes (including TextNodes)
$node->getNodes();

// nodes of multiple types
$node->getNodes([Paragraph::class, Blockquote::class]);

// images (only direct children)
$node->getNodes(Image::class, false);
Finding Marks
Similar to getNodes()
you can use getMarks()
to find marks within the current node. It searches all nested nodes and returns Mark[]
of matched marks.
// links
$node->getMarks(Link::class);

// all marks
$node->getMarks();

// multiple types
$node->getMarks([Strong::class, Italic::class]);

// without nesting (marks of the current node only)
$node->getMarks(Link::class, false);
JSON Serialize
You can serialize a Node/Document back to JSON.
$node->toJson(); // JSON string
$node->toArray(); // PHP array
Mark
namespace Hyvor\Phrosemirror\Document; use Hyvor\Phrosemirror\Types\MarkType; use Hyvor\Phrosemirror\Types\AttrsType; class Mark { public MarkType $type; public AttrsType $attrs; }
$type and $attrs are analogous to those of Node.
Mark has isOfType(), attr(), toArray(), and toJson(), which work similarly to Node's methods.
$mark = Mark::fromJson(['type' => 'link', 'attrs' => ['src' => 'https://hyvor.com']]);
$mark->isOfType(Strong::class); // false
$mark->attr('src'); // https://hyvor.com
Fragment
$node->content is a Fragment. It contains an array of child nodes. You can think of it as an array with helper methods that make things easier.
$fragment = $node->content;

// READ
$fragment->first(); // Node | null
$fragment->last(); // Node | null
$fragment->nth(2); // Node | null
$fragment->count(); // int

// get all Nodes in the Fragment as an array
$fragment->all(); // Node[]

// loop through each node
$fragment->each(fn (Node $node) => false);

// WRITE (be careful, these methods change the document)
$fragment->addNodeToStart($node);
$fragment->addNodeToEnd($node);
$fragment->addNode($node); // same as addNodeToEnd
$fragment->setNodes($nodes);
$fragment->map(fn (Node $node) => $node); // update nodes in a callback
3. HTML
Next, let's convert your document to HTML. To do this, you have to define the toHtml()
method in Node Types and Mark Types.
use Hyvor\Phrosemirror\Document\Node; use Hyvor\Phrosemirror\Types\NodeType; class Paragraph extends NodeType { public function toHtml(Node $node, string $children) : string { return "<p>$children</p>"; } }
toHtml()
should return the HTML string of the node, placing the $children
string in it.
Here is another example using the attributes of that Node.
use Hyvor\Phrosemirror\Document\Node;
use Hyvor\Phrosemirror\Types\NodeType;

class Image extends NodeType
{
    public function toHtml(Node $node, string $children) : string
    {
        // attr() returns the HTML-escaped value
        $src = $node->attr('src');
        // <img> is a void element, so $children is not used
        return "<img src=\"$src\">";
    }
}
Do not directly use $node->attrs->src, as the raw attributes are not HTML-escaped. Always use $node->attr() or $node->attrs->get().
HTML: Document -> HTML
Use the toHtml()
method to serialize a document (or any node) to HTML.
$document = Document::fromJson($schema, $json); $html = $document->toHtml();
Parsing HTML
The HtmlParser
class is responsible for parsing HTML to a Document. It takes the Schema and some parsing rules to parse the HTML.
<?php
use Hyvor\Phrosemirror\Converters\HtmlParser\HtmlParser;
use Hyvor\Phrosemirror\Converters\HtmlParser\ParserRule;
use Hyvor\Phrosemirror\Types\Schema;

// this is the same schema you create for the document
$schema = new Schema($nodes, $marks);

$parser = new HtmlParser($schema, [
    new ParserRule(tag: 'p', node: 'paragraph'),
    new ParserRule(tag: '#text', node: 'text'),
    // ... other rules
]);

$doc = $parser->parse($html);
However, in most cases, you only need one rule set to parse from multiple HTML inputs. Therefore, you can directly define rules in the Schema (in the fromHtml()
method of Nodes and Marks).
use Hyvor\Phrosemirror\Types\NodeType; use Hyvor\Phrosemirror\Converters\HtmlParser\ParserRule; use Hyvor\Phrosemirror\Document\Node; class Paragraph extends NodeType { public string $name = 'paragraph'; public ?string $content = 'inline*'; public string $group = 'block'; public function toHtml(Node $node, string $children): string { return "<p>$children</p>"; } public function fromHtml(): array { return [ new ParserRule(tag: 'p'), ]; } }
The fromHtml()
method should return ParserRule[]
. Here, the node
property is not required as it is the same as the Node Type's name.
Then, use the fromSchema()
method to create the parser.
$parser = HtmlParser::fromSchema($schema); $doc = $parser->parse($html);
Parsing HTML Attributes to Node Attributes
Use the getAttrs()
method to parse attributes from the HTML element.
use DOMElement; class Image extends NodeType { public string $name = 'image'; public string $attrs = ImageAttrs::class; public function fromHtml() : array { return [ new ParserRule( tag: 'img', getAttrs: fn (DOMElement $element) => ImageAttrs::fromArray([ 'src' => $element->getAttribute('src'), 'alt' => $element->getAttribute('alt'), ]) ) ]; } }
The getAttrs()
callback should return one of the following:
- false to ignore the element
- null if the attributes are not found
- AttrsType if the attributes are found
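One detail worth keeping in mind when writing a getAttrs callback: PHP's DOMElement::getAttribute() returns an empty string, not null, when the attribute is missing. The sketch below uses only ext-dom. readImageAttrs() is a hypothetical helper that returns a plain array (or false) instead of an ImageAttrs object, so that it runs without the library.

```php
<?php
// Hypothetical helper: extracts image attributes from an <img> element.
// Returns false when there is no usable src, mirroring "ignore the element".
function readImageAttrs(DOMElement $element): array|false
{
    $src = $element->getAttribute('src'); // '' when the attribute is absent
    if ($src === '') {
        return false;
    }
    return [
        'src' => $src,
        // Map a missing alt to null instead of ''.
        'alt' => $element->hasAttribute('alt') ? $element->getAttribute('alt') : null,
    ];
}

$dom = new DOMDocument();
$dom->loadHTML('<p><img src="a.png"><img alt="no source"></p>');
$images = $dom->getElementsByTagName('img');

$first  = readImageAttrs($images->item(0)); // has src, no alt
$second = readImageAttrs($images->item(1)); // has alt, no src
```

Inside a real ParserRule, the same checks would sit in the getAttrs closure, with the array passed to ImageAttrs::fromArray() as in the example above.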
Content & Grouping
Defining content
and group
properties in Node Types is important for parsing HTML.
For example, let's say we have a blockquote
node with content
set to block+
. This means that the blockquote
node can only contain block nodes. Therefore, the following HTML does not conform to the schema.
<blockquote>Hello World</blockquote>
This is where the content and group properties come in handy. Because we know that the blockquote node can only contain block nodes such as paragraphs, the HTML parser will automatically wrap the text in a paragraph when parsing. The resulting HTML would be:
<blockquote><p>Hello World</p></blockquote>
This logic is handled by the Sanitizer class. In short, it does the following to ensure content and group conformity:
- Tries to wrap nodes
- Tries to promote children
- Tries to connect inline nodes
- If all fails, it will remove the node
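To make the first step concrete, here is a rough sketch of wrapping on plain decoded JSON arrays. This is not the Sanitizer's actual implementation. wrapStrayText() is a hypothetical helper that handles only the blockquote case described above, wrapping each run of consecutive text nodes in a single paragraph.

```php
<?php
// Hypothetical helper: wraps runs of text nodes in paragraphs so that the
// children satisfy a "block+" content expression.
function wrapStrayText(array $children): array
{
    $result = [];
    $run = []; // consecutive text nodes waiting to be wrapped

    foreach ($children as $child) {
        if ($child['type'] === 'text') {
            $run[] = $child;
            continue;
        }
        if ($run !== []) {
            $result[] = ['type' => 'paragraph', 'content' => $run];
            $run = [];
        }
        $result[] = $child; // non-text nodes are kept as they are
    }
    if ($run !== []) {
        $result[] = ['type' => 'paragraph', 'content' => $run];
    }
    return $result;
}

// Equivalent of <blockquote>Hello World</blockquote>
$blockquote = [
    'type' => 'blockquote',
    'content' => [['type' => 'text', 'text' => 'Hello World']],
];
$blockquote['content'] = wrapStrayText($blockquote['content']);
```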
content expressions support everything the Prosemirror front-end library supports. Here are some examples:
- paragraph
- paragraph|heading
- paragraph?
- paragraph*
- paragraph+
- paragraph{1,3}
- paragraph{1,}
- paragraph heading (subsequent nodes)
- paragraph (heading | code_block)+
- block+ (using groups)
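Content expressions behave much like regular expressions over the sequence of child node names. As an illustration only (this is not how the library or its Sanitizer is implemented), the sketch below translates an expression into a PCRE pattern and tests a list of child names against it. matchesContent() and the $groups map are hypothetical.

```php
<?php
// Hypothetical helper: does the sequence of child names satisfy the expression?
// $groups maps a group name to the node names that belong to it.
function matchesContent(string $expr, array $children, array $groups = []): bool
{
    // Replace each name with an atom that matches "name," (or any group member).
    $pattern = preg_replace_callback(
        '/[a-z_][a-z0-9_]*/i',
        function (array $m) use ($groups): string {
            $names = $groups[$m[0]] ?? [$m[0]];
            $quoted = array_map(fn (string $n): string => preg_quote($n, '/'), $names);
            return '(?:(?:' . implode('|', $quoted) . '),)';
        },
        $expr
    );
    // Whitespace between atoms means "followed by", which is plain concatenation.
    $pattern = preg_replace('/\s+/', '', $pattern);

    // Encode the children as "a,b,c," and require the whole string to match.
    $subject = implode('', array_map(fn (string $c): string => $c . ',', $children));
    return preg_match('/^(?:' . $pattern . ')$/', $subject) === 1;
}

$groups = ['block' => ['paragraph', 'blockquote']];

$a = matchesContent('paragraph+', ['paragraph', 'paragraph']);            // one or more
$b = matchesContent('paragraph{1,3}', array_fill(0, 4, 'paragraph'));     // four is too many
$c = matchesContent('paragraph (heading | code_block)+', ['paragraph', 'heading', 'code_block']);
$d = matchesContent('block+', ['paragraph', 'blockquote'], $groups);      // group members
$e = matchesContent('block+', [], $groups);                               // empty content
```

The quantifiers ?, *, +, and {m,n} carry over to PCRE unchanged, which is why only names and whitespace need translating in this sketch.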
Note: This sanitization process is only run when parsing a document from HTML. It is not run when parsing a document from JSON, because we expect the JSON (usually from your front-end) to be valid. However, you can still run the sanitization process as follows if needed:
$doc = Document::fromJson($schema, $json); $sanitizedDoc = Sanitizer::sanitize($schema, $doc);
Disabling Sanitization
You can disable content
sanitization when parsing HTML by setting sanitize: false
.
$parser = HtmlParser::fromSchema($schema); $doc = $parser->parse($html, sanitize: false);
⚠️ Warning: Disabling sanitization can result in invalid documents.
Error Handling
This library is strict, and it expects correct input from the front-end. It can throw the following exceptions:
- InvalidJsonException - on invalid JSON
- InvalidAttributeTypeException - on invalid attribute type
Both exceptions extend the PhrosemirrorException class. Therefore, the best practice is to catch it when building the document.
use Hyvor\Phrosemirror\Exception\PhrosemirrorException;

try {
    $document = Document::fromJson($schema, $json);
} catch (PhrosemirrorException $e) {
    // invalid document
}
If the front-end (JS) and back-end (PHP) schemas match, the only way an exception can occur is when the Prosemirror JSON has been altered. Therefore, it is good practice to stop processing here.
Who uses this Library?
- Hyvor Talk
- Hyvor Blogs
- Add yours with a PR