README

Wrappers around the PHP DOM classes that handle the common DOM extension pitfalls.

https://travis-ci.com/kuria/dom.svg?branch=master

Contents

Features
Requirements
Container methods
Usage examples

Features

HTML documents
- encoding sniffing
- optional tidy support (automatically fix broken HTML)
HTML fragments
XML documents
XML fragments
XPath queries
creating documents from scratch
optional error suppression
helper methods for common tasks, such as:
- querying multiple or a single node
- checking for containment
- removing a node
- removing all nodes from a list
- prepending a child node
- inserting a node after another node
- fetching <head> and <body> elements (HTML)
- fetching root elements (XML)

Requirements

PHP 7.1+

Container methods

These methods are shared by both HTML and XML containers.

Loading documents

<?php

use Kuria\Dom\HtmlDocument; // or XmlDocument, HtmlFragment, etc.

// using loadString()
$dom = new HtmlDocument();
$dom->setLibxmlFlags($customLibxmlFlags); // optional
$dom->setIgnoreErrors($ignoreErrors); // optional
$dom->loadString($html);

// using static loadString() shortcut
$dom = HtmlDocument::fromString($html);

// using existing document instance
$dom = new HtmlDocument();
$dom->loadDocument($document);

// using static loadDocument() shortcut
$dom = HtmlDocument::fromDocument($document);

// creating an empty document
$dom = new HtmlDocument();
$dom->loadEmpty();

Getting or changing document encoding

<?php

// get encoding
$encoding = $dom->getEncoding();

// set encoding
$dom->setEncoding($newEncoding);

Note

The DOM extension uses UTF-8 encoding.

This means that text nodes, attributes, etc.:

will be encoded using UTF-8 when read (e.g. $elem->textContent)
should be encoded using UTF-8 when written (e.g. $elem->setAttribute())

The encoding configured by setEncoding() is used when saving the document, see Saving documents.

Saving documents

<?php

// entire document
$content = $dom->save();

// single element
$content = $dom->save($elem);

// children of a single element
$content = $dom->save($elem, true);

Getting DOM instances

After a document has been loaded, the DOM instances are available via getters:

<?php

$document = $dom->getDocument();
$xpath = $dom->getXpath();

Running XPath queries

<?php

// get a DOMNodeList
$divs = $dom->query('//div');

// get a single DOMNode (or null)
$div = $dom->query('//div');

// check if a query matches
$divExists = $dom->exists('//div');

Escaping strings

<?php

$escapedString = $dom->escape($string);

DOM manipulation and traversal helpers

Helpers for commonly needed tasks that aren't easily achieved via existing DOM methods:

<?php

// check if the document contains a node
$hasNode = $dom->contains($node);

// check if a node contains another node
$hasNode = $dom->contains($node, $parentNode);

// remove a node
$dom->remove($node);

// remove a list of nodes
$dom->removeAll($nodes);

// prepend a child node
$dom->prependChild($newNode, $existingNode);

// insert a node after another node
$dom->insertAfter($newNode, $existingNode);

Usage examples

HTML documents

Loading an existing document

<?php

use Kuria\Dom\HtmlDocument;

$html = <<<HTML
<!doctype html>
<html>
    <head>
        <meta charset="UTF-8">
        <title>Example document</title>
    </head>
    <body>
        <h1>Hello world!</h1>
    </body>
</html>
HTML;

$dom = HtmlDocument::fromString($html);

var_dump($dom->queryOne('//title')->textContent);
var_dump($dom->queryOne('//h1')->textContent);

Output:

string(16) "Example document"
string(12) "Hello world!"

Optionally, the markup can be fixed by Tidy prior to being loaded.

<?php

$dom = new HtmlDocument();
$dom->setTidyEnabled(true);
$dom->loadString($html);

Note

HTML documents ignore errors by default, so there is no need to call $dom->setIgnoreErrors(true).

Creating an new document

<?php

use Kuria\Dom\HtmlDocument;

// initialize empty document
$dom = new HtmlDocument();
$dom->loadEmpty(['formatOutput' => true]);

// add <title>
$title = $dom->getDocument()->createElement('title');
$title->textContent = 'Lorem ipsum';

$dom->getHead()->appendChild($title);

// save
echo $dom->save();

Output:

<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Lorem ipsum</title>
</head>
<body>
    </body>
</html>

HTML fragments

Loading an existing fragment

<?php

use Kuria\Dom\HtmlFragment;

$dom = HtmlFragment::fromString('<div id="test"><span>Hello</span></div>');

$element = $dom->queryOne('/div[@id="test"]/span');

if ($element) {
    var_dump($element->textContent);
}

Output:

string(5) "Hello"

Note

HTML fragments ignore errors by default, so there is no need to call $dom->setIgnoreErrors(true).

Creating a new fragment

<?php

use Kuria\Dom\HtmlFragment;

// initialize empty fragment
$dom = new HtmlFragment();
$dom->loadEmpty(['formatOutput' => true]);

// add <a>
$link = $dom->getDocument()->createElement('a');
$link->setAttribute('href', 'http://example.com/');
$link->textContent = 'example';

$dom->getBody()->appendChild($link);

// save
echo $dom->save();

Output:

<a href="http://example.com/">example</a>

XML documents

Loading an existing document

<?php

use Kuria\Dom\XmlDocument;

$xml = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<library>
    <book name="Don Quixote" author="Miguel de Cervantes" />
    <book name="Hamlet" author="William Shakespeare" />
    <book name="Alice's Adventures in Wonderland" author="Lewis Carroll" />
</library>
XML;

$dom = XmlDocument::fromString($xml);

foreach ($dom->query('/library/book') as $book) {
   /** @var \DOMElement $book */
   var_dump("{$book->getAttribute('name')} by {$book->getAttribute('author')}");
}

Output:

string(34) "Don Quixote by Miguel de Cervantes"
string(29) "Hamlet by William Shakespeare"
string(49) "Alice's Adventures in Wonderland by Lewis Carroll"

Creating a new document

<?php

use Kuria\Dom\XmlDocument;

// initialize empty document
$dom = new XmlDocument();
$dom->loadEmpty(['formatOutput' => true]);

// add <users>
$document = $dom->getDocument();
$document->appendChild($document->createElement('users'));

// add some users
$bob = $document->createElement('user');
$bob->setAttribute('username', 'bob');
$bob->setAttribute('access-token', '123456');

$john = $document->createElement('user');
$john->setAttribute('username', 'john');
$john->setAttribute('access-token', 'foobar');

$dom->getRoot()->appendChild($bob);
$dom->getRoot()->appendChild($john);

// save
echo $dom->save();

Output:

<?xml version="1.0" encoding="UTF-8"?>
<users>
  <user username="bob" access-token="123456"/>
  <user username="john" access-token="foobar"/>
</users>

Handling XML namespaces in XPath queries

<?php

use Kuria\Dom\XmlDocument;

$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<lib:root xmlns:lib="http://example.com/">
    <lib:book name="Don Quixote" author="Miguel de Cervantes" />
    <lib:book name="Hamlet" author="William Shakespeare" />
    <lib:book name="Alice's Adventures in Wonderland" author="Lewis Carroll" />
</lib:root>
XML;

$dom = XmlDocument::fromString($xml);

// register namespace in XPath
$dom->getXpath()->registerNamespace('lib', 'http://example.com/');

// query using the prefix
foreach ($dom->query('//lib:book') as $book) {
    /** @var \DOMElement $book */
    var_dump($book->getAttribute('name'));
}

Output:

string(11) "Don Quixote"
string(6) "Hamlet"
string(32) "Alice's Adventures in Wonderland"

XML fragments

Loading an existing fragment

<?php

use Kuria\Dom\XmlFragment;

$dom = XmlFragment::fromString('<fruits><fruit name="Apple" /><fruit name="Banana" /></fruits>');

foreach ($dom->query('/fruits/fruit') as $fruit) {
    /** @var \DOMElement $fruit */
    var_dump($fruit->getAttribute('name'));
}

Output:

string(5) "Apple"
string(6) "Banana"

Creating a new fragment

<?php

use Kuria\Dom\XmlFragment;

// initialize empty fragment
$dom = new XmlFragment();
$dom->loadEmpty(['formatOutput' => true]);

// add a new element
$person = $dom->getDocument()->createElement('person');
$person->setAttribute('name', 'John Smith');

$dom->getRoot()->appendChild($person);

// save
echo $dom->save();

Output:

<person name="John Smith"/>

kuria / dom

Maintainers

Details

README

Features

Requirements

Container methods

Loading documents

Getting or changing document encoding

Saving documents

Getting DOM instances

Running XPath queries

Escaping strings

DOM manipulation and traversal helpers

Usage examples

HTML documents

Loading an existing document

Creating an new document

HTML fragments

Loading an existing fragment

Creating a new fragment

XML documents

Loading an existing document

Creating a new document

Handling XML namespaces in XPath queries

XML fragments

Loading an existing fragment

Creating a new fragment