kuria / dom
Wrappers around the PHP DOM classes
Requires
- php: >=7.1
- kuria/simple-html-parser: ^2.0
Requires (Dev)
- kuria/dev-meta: ^0.3.0
README
Wrappers around the PHP DOM classes that handle the common DOM extension pitfalls.
Contents
- Features
- Requirements
- Container methods
- Usage examples
Features
- HTML documents
- encoding sniffing
- optional tidy support (automatically fix broken HTML)
- HTML fragments
- XML documents
- XML fragments
- XPath queries
- creating documents from scratch
- optional error suppression
- helper methods for common tasks, such as:
- querying multiple or a single node
- checking for containment
- removing a node
- removing all nodes from a list
- prepending a child node
- inserting a node after another node
- fetching <head> and <body> elements (HTML)
- fetching root elements (XML)
Requirements
- PHP 7.1+
Container methods
These methods are shared by both HTML and XML containers.
Loading documents
<?php

use Kuria\Dom\HtmlDocument; // or XmlDocument, HtmlFragment, etc.

// using loadString()
$dom = new HtmlDocument();
$dom->setLibxmlFlags($customLibxmlFlags); // optional
$dom->setIgnoreErrors($ignoreErrors); // optional
$dom->loadString($html);

// using static loadString() shortcut
$dom = HtmlDocument::fromString($html);

// using existing document instance
$dom = new HtmlDocument();
$dom->loadDocument($document);

// using static loadDocument() shortcut
$dom = HtmlDocument::fromDocument($document);

// creating an empty document
$dom = new HtmlDocument();
$dom->loadEmpty();
Getting or changing document encoding
<?php

// get encoding
$encoding = $dom->getEncoding();

// set encoding
$dom->setEncoding($newEncoding);
Note
The DOM extension uses UTF-8 encoding.
This means that text nodes, attributes, etc.:
- will be encoded using UTF-8 when read (e.g. $elem->textContent)
- should be encoded using UTF-8 when written (e.g. $elem->setAttribute())
The encoding configured by setEncoding() is used when saving the document, see Saving documents.
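The UTF-8 rule can be observed with the plain DOM extension, independent of this library. The following is a minimal sketch using only the native DOMDocument class; the sample markup is made up for illustration. The input is declared as ISO-8859-1, yet the value read from the DOM is UTF-8, while the document remembers its declared encoding.

```php
<?php

// ISO-8859-1 encoded input: "\xE9" is "é" in that encoding
$xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n<word>caf\xE9</word>";

$document = new DOMDocument();
$document->loadXML($xml);

// values read from the DOM are UTF-8 ("é" becomes the two bytes C3 A9)
$text = $document->documentElement->textContent;
var_dump(bin2hex($text)); // string(10) "636166c3a9"

// the declared encoding is remembered by the document
var_dump($document->encoding); // string(10) "ISO-8859-1"
```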
Saving documents
<?php

// entire document
$content = $dom->save();

// single element
$content = $dom->save($elem);

// children of a single element
$content = $dom->save($elem, true);
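To make the difference between saving an element and saving only its children concrete, here is a rough sketch using the native DOMDocument class. It is not the library's implementation, only an illustration of the two kinds of output; the markup is made up.

```php
<?php

$document = new DOMDocument();
$document->loadXML('<div><b>bold</b> text</div>');
$elem = $document->documentElement;

// a single element, including its own tag
$outer = $document->saveXML($elem);

// only the children of the element
$inner = '';
foreach ($elem->childNodes as $child) {
    $inner .= $document->saveXML($child);
}

var_dump($outer); // string(27) "<div><b>bold</b> text</div>"
var_dump($inner); // string(16) "<b>bold</b> text"
```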
Getting DOM instances
After a document has been loaded, the DOM instances are available via getters:
<?php

$document = $dom->getDocument();
$xpath = $dom->getXpath();
Running XPath queries
<?php

// get a DOMNodeList
$divs = $dom->query('//div');

// get a single DOMNode (or null)
$div = $dom->queryOne('//div');

// check if a query matches
$divExists = $dom->exists('//div');
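For comparison, the three kinds of query shown above (a node list, a single node or null, and an existence check) can be approximated with the native DOMXPath class. This is a sketch of roughly equivalent native calls, not a description of how the container implements them; the sample document is made up.

```php
<?php

$document = new DOMDocument();
$document->loadXML('<root><div>first</div><div>second</div></root>');
$xpath = new DOMXPath($document);

// all matches as a DOMNodeList
$divs = $xpath->query('//div');
var_dump($divs->length); // int(2)

// a single match: item() returns null when the index does not exist
$div = $xpath->query('//div')->item(0);
var_dump($div->textContent); // string(5) "first"
var_dump($xpath->query('//span')->item(0)); // NULL

// existence check using the XPath boolean() function
var_dump($xpath->evaluate('boolean(//div)')); // bool(true)
```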
Escaping strings
<?php

$escapedString = $dom->escape($string);
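This README does not describe what escape() does internally. As a rough illustration only, and assuming the result is comparable to PHP's htmlspecialchars() with UTF-8 (an assumption, not something confirmed here), escaping a string for safe use in markup looks like this:

```php
<?php

$string = '<a href="#">Tom & Jerry</a>';

// assumption: escaping comparable to htmlspecialchars() with UTF-8
$escapedString = htmlspecialchars($string, ENT_QUOTES, 'UTF-8');

echo $escapedString, "\n";
// &lt;a href=&quot;#&quot;&gt;Tom &amp; Jerry&lt;/a&gt;
```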
DOM manipulation and traversal helpers
Helpers for commonly needed tasks that aren't easily achieved via existing DOM methods:
<?php

// check if the document contains a node
$hasNode = $dom->contains($node);

// check if a node contains another node
$hasNode = $dom->contains($node, $parentNode);

// remove a node
$dom->remove($node);

// remove a list of nodes
$dom->removeAll($nodes);

// prepend a child node
$dom->prependChild($newNode, $existingNode);

// insert a node after another node
$dom->insertAfter($newNode, $existingNode);
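To show why such helpers are useful, here is a sketch of how prepending, inserting after a node, and a containment check can be written against the native DOM classes, which only offer insertBefore() and appendChild(). The function names ending in "Sketch" are hypothetical and are not part of this library.

```php
<?php

// hypothetical helper: insert $newNode as the first child of $parent
function prependChildSketch(DOMNode $newNode, DOMNode $parent): void
{
    // insertBefore() appends when the reference node is null (empty parent)
    $parent->insertBefore($newNode, $parent->firstChild);
}

// hypothetical helper: insert $newNode right after $existing
function insertAfterSketch(DOMNode $newNode, DOMNode $existing): void
{
    // a null nextSibling means $existing is last, so the node is appended
    $existing->parentNode->insertBefore($newNode, $existing->nextSibling);
}

// hypothetical helper: check whether $ancestor contains $node
function containsSketch(DOMNode $node, DOMNode $ancestor): bool
{
    for ($current = $node->parentNode; $current !== null; $current = $current->parentNode) {
        if ($current->isSameNode($ancestor)) {
            return true;
        }
    }

    return false;
}

$document = new DOMDocument();
$document->loadXML('<list><item id="b"/></list>');
$list = $document->documentElement;
$b = $list->firstChild;

$a = $document->createElement('item');
$a->setAttribute('id', 'a');
$c = $document->createElement('item');
$c->setAttribute('id', 'c');

prependChildSketch($a, $list);
insertAfterSketch($c, $b);

echo $document->saveXML($list), "\n";
// <list><item id="a"/><item id="b"/><item id="c"/></list>

var_dump(containsSketch($a, $list)); // bool(true)
```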
Usage examples
HTML documents
Loading an existing document
<?php

use Kuria\Dom\HtmlDocument;

$html = <<<HTML
<!doctype html>
<html>
    <head>
        <meta charset="UTF-8">
        <title>Example document</title>
    </head>
    <body>
        <h1>Hello world!</h1>
    </body>
</html>
HTML;

$dom = HtmlDocument::fromString($html);

var_dump($dom->queryOne('//title')->textContent);
var_dump($dom->queryOne('//h1')->textContent);
Output:
string(16) "Example document"
string(12) "Hello world!"
Optionally, the markup can be fixed by Tidy prior to being loaded.
<?php

$dom = new HtmlDocument();
$dom->setTidyEnabled(true);
$dom->loadString($html);
Note
HTML documents ignore errors by default, so there is no need to call $dom->setIgnoreErrors(true).
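For context, this is what error suppression looks like when using the DOM extension directly. The sketch below relies only on PHP's native libxml functions and is not taken from the library's source; the markup and the custom-box tag are made up. Whether a given parser version reports an error for this input may vary, so the example only shows that loading succeeds without emitting warnings.

```php
<?php

// collect libxml errors internally instead of emitting PHP warnings
$previous = libxml_use_internal_errors(true);

$document = new DOMDocument();
$document->loadHTML('<!doctype html><html><body><custom-box><p>Hello</p></custom-box></body></html>');

// inspect the collected errors if needed, then clear the buffer
$errors = libxml_get_errors();
libxml_clear_errors();

// restore the previous error handling mode
libxml_use_internal_errors($previous);

var_dump($document->getElementsByTagName('p')->length); // int(1)
```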
Creating a new document
<?php

use Kuria\Dom\HtmlDocument;

// initialize empty document
$dom = new HtmlDocument();
$dom->loadEmpty(['formatOutput' => true]);

// add <title>
$title = $dom->getDocument()->createElement('title');
$title->textContent = 'Lorem ipsum';

$dom->getHead()->appendChild($title);

// save
echo $dom->save();
Output:
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Lorem ipsum</title>
</head>
<body>
</body>
</html>
HTML fragments
Loading an existing fragment
<?php

use Kuria\Dom\HtmlFragment;

$dom = HtmlFragment::fromString('<div id="test"><span>Hello</span></div>');

$element = $dom->queryOne('/div[@id="test"]/span');

if ($element) {
    var_dump($element->textContent);
}
Output:
string(5) "Hello"
Note
HTML fragments ignore errors by default, so there is no need to call $dom->setIgnoreErrors(true).
Creating a new fragment
<?php

use Kuria\Dom\HtmlFragment;

// initialize empty fragment
$dom = new HtmlFragment();
$dom->loadEmpty(['formatOutput' => true]);

// add <a>
$link = $dom->getDocument()->createElement('a');
$link->setAttribute('href', 'http://example.com/');
$link->textContent = 'example';

$dom->getBody()->appendChild($link);

// save
echo $dom->save();
Output:
<a href="http://example.com/">example</a>
XML documents
Loading an existing document
<?php

use Kuria\Dom\XmlDocument;

$xml = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<library>
    <book name="Don Quixote" author="Miguel de Cervantes" />
    <book name="Hamlet" author="William Shakespeare" />
    <book name="Alice's Adventures in Wonderland" author="Lewis Carroll" />
</library>
XML;

$dom = XmlDocument::fromString($xml);

foreach ($dom->query('/library/book') as $book) {
    /** @var \DOMElement $book */
    var_dump("{$book->getAttribute('name')} by {$book->getAttribute('author')}");
}
Output:
string(34) "Don Quixote by Miguel de Cervantes"
string(29) "Hamlet by William Shakespeare"
string(49) "Alice's Adventures in Wonderland by Lewis Carroll"
Creating a new document
<?php

use Kuria\Dom\XmlDocument;

// initialize empty document
$dom = new XmlDocument();
$dom->loadEmpty(['formatOutput' => true]);

// add <users>
$document = $dom->getDocument();
$document->appendChild($document->createElement('users'));

// add some users
$bob = $document->createElement('user');
$bob->setAttribute('username', 'bob');
$bob->setAttribute('access-token', '123456');

$john = $document->createElement('user');
$john->setAttribute('username', 'john');
$john->setAttribute('access-token', 'foobar');

$dom->getRoot()->appendChild($bob);
$dom->getRoot()->appendChild($john);

// save
echo $dom->save();
Output:
<?xml version="1.0" encoding="UTF-8"?>
<users>
  <user username="bob" access-token="123456"/>
  <user username="john" access-token="foobar"/>
</users>
Handling XML namespaces in XPath queries
<?php

use Kuria\Dom\XmlDocument;

$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<lib:root xmlns:lib="http://example.com/">
    <lib:book name="Don Quixote" author="Miguel de Cervantes" />
    <lib:book name="Hamlet" author="William Shakespeare" />
    <lib:book name="Alice's Adventures in Wonderland" author="Lewis Carroll" />
</lib:root>
XML;

$dom = XmlDocument::fromString($xml);

// register namespace in XPath
$dom->getXpath()->registerNamespace('lib', 'http://example.com/');

// query using the prefix
foreach ($dom->query('//lib:book') as $book) {
    /** @var \DOMElement $book */
    var_dump($book->getAttribute('name'));
}
Output:
string(11) "Don Quixote"
string(6) "Hamlet"
string(32) "Alice's Adventures in Wonderland"
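Registering a prefix matters most when a document uses a default (unprefixed) namespace, because XPath 1.0 treats an unprefixed name as belonging to no namespace. The sketch below demonstrates this with the native DOMXPath class; the sample markup is made up, and the prefix name lib is an arbitrary choice.

```php
<?php

$xml = '<root xmlns="http://example.com/"><book name="Hamlet"/></root>';

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);

// an unprefixed name only matches elements that are in no namespace
var_dump($xpath->query('//book')->length); // int(0)

// bind a prefix of our choice to the default namespace URI
$xpath->registerNamespace('lib', 'http://example.com/');
var_dump($xpath->query('//lib:book')->length); // int(1)
```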
XML fragments
Loading an existing fragment
<?php

use Kuria\Dom\XmlFragment;

$dom = XmlFragment::fromString('<fruits><fruit name="Apple" /><fruit name="Banana" /></fruits>');

foreach ($dom->query('/fruits/fruit') as $fruit) {
    /** @var \DOMElement $fruit */
    var_dump($fruit->getAttribute('name'));
}
Output:
string(5) "Apple"
string(6) "Banana"
Creating a new fragment
<?php

use Kuria\Dom\XmlFragment;

// initialize empty fragment
$dom = new XmlFragment();
$dom->loadEmpty(['formatOutput' => true]);

// add a new element
$person = $dom->getDocument()->createElement('person');
$person->setAttribute('name', 'John Smith');

$dom->getRoot()->appendChild($person);

// save
echo $dom->save();
Output:
<person name="John Smith"/>