tomzx / html-parser
An HTML parser written in PHP
v0.1.0
2016-01-24 13:58 UTC
Requires
- php: >=5.4.0
- tomzx/abstract-parser: ~0.1
Requires (Dev)
- phpunit/phpunit: ~4
Last update: 2025-01-10 08:48:02 UTC
README
An HTML parser written in PHP. Based on nikic's PHP Parser.
Getting started
The goal of HTML parser is to simplify traversing and modifying an HTML tree using the visitor pattern.
First, you'll want to parse your HTML using the Parser in order to generate a data structure appropriate for the NodeTraverser.
Once that is done, you specify one or more visitors that implement the operation you want to apply to the HTML elements.
Then, you traverse the HTML tree structure, which calls each visitor whenever an element is entered and exited.
Finally, you may print back the final output as a string.
<?php

$code = file_get_contents('input.html');

$parser = new Parser();
$statements = $parser->parse($code);

$traverser = new NodeTraverser();
// A visitor which removes any element of a specific type
$traverser->addVisitor(new ElementStripper(['head', 'a']));
$statements = $traverser->traverse($statements);

$printer = new Printer();
$printer->output($statements);
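To give a rough idea of what a visitor such as ElementStripper does during traversal, here is a self-contained sketch of the visitor pattern on a toy tree. Every name in it (SketchElement, SketchVisitor, SketchStripper, sketchTraverse, sketchPrint) is hypothetical and made up for illustration; none of it is this package's actual API. It also assumes that stripping an element removes its children too, which may not match the real ElementStripper.

```php
<?php

// Hypothetical stand-ins for illustration only; these are NOT the classes
// shipped with this package.
final class SketchElement
{
    public $name;
    public $children; // array of SketchElement objects and plain strings

    public function __construct($name, array $children = [])
    {
        $this->name = $name;
        $this->children = $children;
    }
}

interface SketchVisitor
{
    // Called before the element's children are visited.
    public function enter(SketchElement $element);

    // Called after the children are visited; returning false drops the element.
    public function leave(SketchElement $element);
}

final class SketchStripper implements SketchVisitor
{
    private $names;

    public function __construct(array $names)
    {
        $this->names = $names;
    }

    public function enter(SketchElement $element)
    {
        // Nothing to do on entry for this visitor.
    }

    public function leave(SketchElement $element)
    {
        // Keep the element only if its name is not on the strip list.
        return !in_array($element->name, $this->names, true);
    }
}

// Depth-first traversal that calls the visitor on entry and exit of each element.
function sketchTraverse(array $nodes, SketchVisitor $visitor)
{
    $kept = [];
    foreach ($nodes as $node) {
        if (!$node instanceof SketchElement) {
            $kept[] = $node; // plain text passes through untouched
            continue;
        }
        $visitor->enter($node);
        $node->children = sketchTraverse($node->children, $visitor);
        if ($visitor->leave($node) !== false) {
            $kept[] = $node;
        }
    }
    return $kept;
}

// Serializes the toy tree back to a string (no attributes, no escaping).
function sketchPrint(array $nodes)
{
    $out = '';
    foreach ($nodes as $node) {
        if ($node instanceof SketchElement) {
            $out .= "<{$node->name}>" . sketchPrint($node->children) . "</{$node->name}>";
        } else {
            $out .= $node;
        }
    }
    return $out;
}

$tree = [
    new SketchElement('html', [
        new SketchElement('head', [new SketchElement('title', ['Hi'])]),
        new SketchElement('body', [
            new SketchElement('p', ['Hello ', new SketchElement('a', ['link'])]),
        ]),
    ]),
];

$result = sketchPrint(sketchTraverse($tree, new SketchStripper(['head', 'a'])));
echo $result, "\n"; // prints <html><body><p>Hello </p></body></html>
```

The sketch makes the removal decision on exit rather than on entry, so a visitor gets to see an element's already-processed children before deciding whether to keep it.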
License
The code is licensed under the MIT license. See LICENSE.