benmorel / xml-streamer
Stream large XML files as DOM nodes with low memory consumption
BenMorel
Requires
- php: ^7.4 || ^8.0
- ext-dom: *
- ext-xmlreader: *
Requires (Dev)
- ext-simplexml: *
- php-coveralls/php-coveralls: ^2.4
- phpunit/phpunit: ^8.0 || ^9.0
- vimeo/psalm: 4.29.0
README
Stream large XML files as individual DOM elements with low memory consumption.
Installation
This library is installable via Composer:
composer require benmorel/xml-streamer
Requirements
This library requires:
- PHP 7.4 or later
- the DOM extension
- the XMLReader extension
These extensions are enabled by default, and should be available in most PHP environments.
Project status & release process
This library is under development.
The current releases are numbered 0.x.y. When a non-breaking change is introduced (adding new methods, optimizing existing code, etc.), y is incremented.
When a breaking change is introduced, a new 0.x version cycle is always started.
It is therefore safe to lock your project to a given release cycle, such as 0.5.*.
If you need to upgrade to a newer release cycle, check the release history for a list of changes introduced by each further 0.x.0 version.
Quickstart
Let's say you have a product feed containing a list of one million products, in the following format:
    <?xml version="1.0" encoding="UTF-8"?>
    <feed>
        <products>
            <product>
                <id>1</id>
                <name>foo</name>
                ...
            </product>
            ...
            <product>
                <id>1000000</id>
                <name>bar</name>
                ...
            </product>
        </products>
    </feed>
To read it product by product, you instantiate an XMLStreamer with the path to a <product> element:
    use BenMorel\XMLStreamer\XMLStreamer;

    $streamer = new XMLStreamer('feed', 'products', 'product');
Any element in the document that does not match this path will be ignored.
You can then stream the file with a generator that yields a DOMElement object for each <product>:
    foreach ($streamer->stream('product-feed.xml') as $product) {
        /** @var DOMElement $product */
        echo $product->getElementsByTagName('name')->item(0)->textContent; // foo, ..., bar
    }
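To illustrate how path-based streaming of this kind can keep memory low, here is a minimal sketch built directly on PHP's XMLReader and DOM extensions. It is not the library's actual implementation: the function name streamElements() and its parameters are hypothetical, and the approach (tracking element names by depth, then expanding only matching elements) is one assumption about how such a reader could work.

```php
<?php

/**
 * Hypothetical helper, for illustration only (not part of XMLStreamer).
 * Yields one DOMElement per element found at the given path and returns
 * the number of elements yielded.
 *
 * @param string   $file Path to the XML file.
 * @param string[] $path Element names from the root down to the target.
 */
function streamElements(string $file, array $path): Generator
{
    $reader = new XMLReader();

    if (! $reader->open($file)) {
        throw new RuntimeException("Unable to open $file");
    }

    $targetDepth = count($path) - 1;
    $names = []; // element name seen most recently at each depth
    $count = 0;

    try {
        $more = $reader->read();

        while ($more) {
            if ($reader->nodeType === XMLReader::ELEMENT) {
                $depth = $reader->depth;
                $names[$depth] = $reader->name;

                // Match only when the whole ancestor chain equals the path.
                if ($depth === $targetDepth && array_slice($names, 0, $depth + 1) === $path) {
                    $node = $reader->expand();

                    if ($node === false) {
                        throw new RuntimeException('Unable to expand the current node');
                    }

                    // Copy the subtree into its own small document, so only
                    // one matching element is held in memory at a time.
                    $document = new DOMDocument();
                    $count++;
                    yield $document->importNode($node, true);

                    // Skip the subtree that was just yielded.
                    $more = $reader->next();
                    continue;
                }
            }

            $more = $reader->read();
        }
    } finally {
        $reader->close();
    }

    return $count;
}
```

Calling streamElements('product-feed.xml', ['feed', 'products', 'product']) would iterate over the same <product> elements as the quickstart, while a <product> nested under any other parent is skipped because its ancestor chain does not match.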
Querying with SimpleXML
If you prefer to work with SimpleXML, you can use simplexml_import_dom(). SimpleXML requires that you wrap your element in a DOMDocument before importing it:
    foreach ($streamer->stream('product-feed.xml') as $product) {
        /** @var DOMElement $product */
        $document = new \DOMDocument();
        $document->appendChild($product);
        $element = simplexml_import_dom($product);
        echo $element->name; // foo, ..., bar
    }
This requires the SimpleXML extension, which is enabled by default.
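The wrapping step can be tried in isolation, without a feed file. The sketch below uses a hand-built <product> element as a stand-in for one yielded by the streamer; the sample data is hypothetical. Because this stand-in is owned by another document, it is copied with importNode() before being appended, which is a small departure from the loop above.

```php
<?php

// Stand-in for a streamed element: build a <product> by hand.
// (Hypothetical sample data, for illustration only.)
$source = new DOMDocument();
$source->loadXML('<product><id>1</id><name>foo</name></product>');
$product = $source->documentElement;

// Wrap the element in a fresh DOMDocument. importNode() copies it into the
// new document first, since it currently belongs to $source.
$document = new DOMDocument();
$document->appendChild($document->importNode($product, true));

$element = simplexml_import_dom($document->documentElement);

echo $element->name, PHP_EOL;     // foo
echo (int) $element->id, PHP_EOL; // 1
```

Child elements are then available as properties, and casting them to string or int gives their text content.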
Return value
After all elements have been processed, the generator returns the number of streamed elements:
    $products = $streamer->stream('product-feed.xml');

    foreach ($products as $product) {
        /* ... */
    }

    $productCount = $products->getReturn();
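The yield-then-return pattern is standard PHP generator behaviour and can be seen with a plain generator. In this sketch, yieldAndCount() is a hypothetical function that mimics the pattern; it does not use the library.

```php
<?php

/**
 * Hypothetical generator mimicking the pattern: yields each item,
 * then returns how many items were yielded.
 */
function yieldAndCount(iterable $items): Generator
{
    $count = 0;

    foreach ($items as $item) {
        $count++;
        yield $item;
    }

    return $count;
}

$generator = yieldAndCount(['foo', 'bar', 'baz']);

foreach ($generator as $item) {
    // process $item
}

// Safe only once the loop has run to completion:
// getReturn() throws if the generator has not returned yet.
$total = $generator->getReturn();

echo $total, PHP_EOL; // 3
```

One consequence worth keeping in mind: if you break out of the foreach loop early, the generator has not returned, and calling getReturn() throws an exception.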
Configuration options
Limiting the number of elements
If you need to get just a preview of the XML file, you can set the maximum number of elements to stream:
$streamer->setMaxElements(10);
With this configuration, XMLStreamer would yield at most 10 elements, and ignore further entries.
Configuring the encoding
The encoding of the source file is automatically read from the XML declaration:
<?xml version="1.0" encoding="UTF-8"?>
If your XML file is missing the encoding, you can specify it manually:
$streamer->setEncoding('ISO-8859-1');
Note that this only specifies the input file encoding. The DOMElement output is always UTF-8.
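The difference between input encoding and UTF-8 output can be observed with PHP's XMLReader extension alone. The sketch below does not use the library; it writes a small hypothetical Latin-1 file whose declaration names its encoding, reads it back, and prints the bytes of the resulting string.

```php
<?php

// Hypothetical sample file: Latin-1 bytes, with the encoding declared.
$file = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents(
    $file,
    "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><product><name>caf\xE9</name></product>"
);

$reader = new XMLReader();
$reader->open($file);

$name = null;

while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'name') {
        $name = $reader->readString(); // text content of <name>
        break;
    }
}

$reader->close();
unlink($file);

// The single Latin-1 byte 0xE9 comes back as the UTF-8 sequence 0xC3 0xA9.
echo bin2hex($name), PHP_EOL; // 636166c3a9
```

Any string handling downstream of the parser can therefore assume UTF-8, whatever the encoding of the source file.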
Error handling
If an error occurs at any point (error opening or reading the file, malformed document), an XMLReaderException is thrown.
Note that streaming may already have started when the exception is thrown, so the generator may already have yielded a number of elements.
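The practical consequence is that work done on earlier elements is not undone when the exception arrives. The sketch below shows this with a hypothetical generator, failingStream(), that fails partway through; RuntimeException stands in for the library's exception class so that the example runs without the library installed.

```php
<?php

/**
 * Hypothetical stand-in for a stream that fails partway through.
 * RuntimeException is used in place of the library's exception class.
 */
function failingStream(): Generator
{
    yield 'product 1';
    yield 'product 2';

    throw new RuntimeException('Malformed document');
}

$processed = [];

try {
    foreach (failingStream() as $product) {
        $processed[] = $product; // side effects happen before the failure
    }
} catch (RuntimeException $e) {
    // Decide here whether to keep or roll back the partial work.
    echo 'Failed after ', count($processed), ' elements: ', $e->getMessage(), PHP_EOL;
}
```

If each element triggers a side effect such as a database insert, it may be worth wrapping the whole loop in a transaction, or recording progress so that an import can be resumed or discarded.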