berlioz/html-selector

Berlioz HTML Selector is a PHP library for querying HTML documents (converted to SimpleXMLElement objects), much as jQuery queries the DOM.

v2.1.0 2024-10-18 11:53 UTC

Last update: 2025-01-18 12:17:48 UTC


Berlioz HTML Selector is a PHP library for querying HTML documents with CSS selectors, much as jQuery queries the DOM.

Installation

Composer

You can install Berlioz HTML Selector with Composer; this is the recommended installation method.

$ composer require berlioz/html-selector

Dependencies

  • PHP ^8.0
  • PHP libraries:
    • dom
    • libxml
    • mbstring
    • simplexml

Usage

Load HTML

You can load an HTML string, an HTML file, or a SimpleXMLElement with the HtmlSelector::query() method. To load a file, set the method's second parameter, contentsIsFile, to true.

$htmlSelector = new \Berlioz\HtmlSelector\HtmlSelector();

$query = $htmlSelector->query('<html><body>...</body></html>');
$query = $htmlSelector->query('path-of-my-file/file.html', contentsIsFile: true);
$query = $htmlSelector->query(new SimpleXMLElement(/*...*/));

Load from ResponseInterface

HtmlSelector::queryFromResponse() loads the HTML from a response body.

$htmlSelector = new \Berlioz\HtmlSelector\HtmlSelector();

/** @var \Psr\Http\Message\ResponseInterface $response */
$query = $htmlSelector->queryFromResponse($response);

Do a query

Querying the loaded HTML with a selector works much as it does in jQuery.

/** @var \Berlioz\HtmlSelector\Query\Query $query */
$query = $query->find('body > .wrapper h2');
$query = $query->filter(':first');
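
The two steps above can be put together in a short end-to-end sketch. It uses only methods listed in this README (query(), find(), filter(), count(), text()); the markup is hypothetical, and the autoloader path assumes the package was installed with Composer next to the script.

```php
<?php
// Assumption: Composer's autoloader lives in ./vendor next to this script.
require __DIR__ . '/vendor/autoload.php';

use Berlioz\HtmlSelector\HtmlSelector;

// Hypothetical markup, used only for illustration.
$html = <<<HTML
<html>
  <body>
    <div class="wrapper">
      <h2>First title</h2>
      <h2>Second title</h2>
    </div>
  </body>
</html>
HTML;

$htmlSelector = new HtmlSelector();
$query = $htmlSelector->query($html);

// All <h2> elements below .wrapper, then only the first of them.
$titles = $query->find('body > .wrapper h2');
$first = $titles->filter(':first');

echo $titles->count(), PHP_EOL;    // number of matched <h2> elements
echo trim($first->text()), PHP_EOL; // text of the first match
```

Each call to find() or filter() returns a new query result, so the broader selection ($titles) stays usable after it has been narrowed.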

Selectors

CSS Simple selectors

  • type: selection of elements with their type.
  • #id: selection of an element with its ID.
  • .class: selection of elements with their class.
  • Attributes selections.
    • [attribute]: elements that have the attribute 'attribute'.
    • [attribute=foo]: attribute value equals 'foo'.
    • [attribute^=foo]: attribute value starts with 'foo'.
    • [attribute$=foo]: attribute value ends with 'foo'.
    • [attribute*=foo]: attribute value contains 'foo'.
    • [attribute!=foo]: attribute value differs from 'foo'.
    • [attribute~=foo]: attribute value contains the word 'foo'.
    • [attribute|=foo]: attribute value has the prefix 'foo'.
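
As a rough illustration of the attribute selectors listed above, the following sketch counts the matches of several of them against a small hypothetical document. The markup and the unquoted attribute values are assumptions made for the example; no expected counts are stated for the less common operators, since their exact behaviour should be checked against the library.

```php
<?php
// Assumption: Composer's autoloader lives in ./vendor next to this script.
require __DIR__ . '/vendor/autoload.php';

use Berlioz\HtmlSelector\HtmlSelector;

// Hypothetical markup, used only for illustration.
$html = <<<HTML
<html>
  <body>
    <a href="https://example.com/docs" lang="en-US" class="link external">Docs</a>
    <a href="/about.html" lang="fr" class="link">About</a>
    <a name="anchor">Anchor</a>
  </body>
</html>
HTML;

$query = (new HtmlSelector())->query($html);

$selectors = [
    'a[href]',            // has an href attribute
    'a[href^=https]',     // href starts with "https"
    'a[href$=html]',      // href ends with "html"
    'a[href*=example]',   // href contains "example"
    'a[class~=external]', // class contains the word "external"
    'a[lang|=en]',        // lang has the prefix "en"
];

// Collect and print the number of matches for each selector.
$counts = [];
foreach ($selectors as $selector) {
    $counts[$selector] = $query->find($selector)->count();
    printf("%-22s %d\n", $selector, $counts[$selector]);
}
```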

CSS Ascendants, descendants, multiples

  • selector selector or selector >> selector: descendant selector (all descendants).
  • selector > selector: direct descendant selector (children only).
  • selector ~ selector: sibling selector.
  • selector, selector: multiple selectors.
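
The difference between these combinators is easiest to see on a nested list. The sketch below is only an illustration on hypothetical markup; it prints the number of matches for each form rather than asserting what the sibling and multiple selectors return.

```php
<?php
// Assumption: Composer's autoloader lives in ./vendor next to this script.
require __DIR__ . '/vendor/autoload.php';

use Berlioz\HtmlSelector\HtmlSelector;

// Hypothetical markup: a list with one nested list.
$html = <<<HTML
<html>
  <body>
    <ul id="menu">
      <li class="first">Home</li>
      <li>Products
        <ul>
          <li>Nested item</li>
        </ul>
      </li>
      <li>Contact</li>
    </ul>
  </body>
</html>
HTML;

$query = (new HtmlSelector())->query($html);

$selectors = [
    '#menu li',           // every <li> below #menu, at any depth
    '#menu > li',         // only the <li> that are children of #menu
    'li.first ~ li',      // siblings of li.first
    'li.first, #menu ul', // two selectors combined
];

$counts = [];
foreach ($selectors as $selector) {
    $counts[$selector] = $query->find($selector)->count();
    printf("%-20s %d\n", $selector, $counts[$selector]);
}
```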

CSS Pseudo Classes

  • :any(selector, selector): only elements matching one of the selectors given as arguments.
  • :any-link: only elements of type <a>, <area> and <link>, with [href] attribute.
  • :blank: only elements without children and without text (whitespace excepted).
  • :checked: only elements with attribute [checked].
  • :dir: only elements with directional text given (default: ltr).
  • :disabled: only elements of type <button>, <input>, <optgroup>, <select> or <textarea> with [disabled] attribute.
  • :empty: only elements without children.
  • :enabled: only elements of type <button>, <input>, <optgroup>, <option>, <select>, <textarea> , <menuitem> or <fieldset> without [disabled] attribute.
  • :first: only first result of complete selection.
  • :first-child: only elements that are the first child of their parent.
  • :first-of-type: only elements that are the first of their type in their parent.
  • :has(selector, selector): only elements for which the given child selector matches.
  • :lang(x): only elements with attribute [lang] prefixed by or equal to the given value.
  • :last-child: only elements that are the last child of their parent.
  • :last-of-type: only elements that are the last of their type in their parent.
  • :not(selector, selector): only elements that do not match the given selectors.
  • :nth-child(): n elements in selector result.
  • :nth-last-child(): n elements in selector result, start at end of list.
  • :nth-of-type(): n elements of given type in selector result.
  • :nth-last-of-type(): n elements of given type in selector result, start at end of list.
  • :only-child: only elements that are the only child of their parent.
  • :only-of-type: only elements that are the only child of their type in their parent.
  • :optional(): only input elements without [required] attribute.
  • :read-only(): only elements that the user cannot edit.
  • :read-write(): only elements with editable property.
  • :required(): only elements with [required] attribute.
  • :root(): get root element.
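
A few of the pseudo-classes above can be tried on a small form and list. This is a hedged sketch on hypothetical markup: it prints match counts instead of stating them, because the exact results depend on the library's implementation of each pseudo-class.

```php
<?php
// Assumption: Composer's autoloader lives in ./vendor next to this script.
require __DIR__ . '/vendor/autoload.php';

use Berlioz\HtmlSelector\HtmlSelector;

// Hypothetical markup, used only for illustration.
$html = <<<HTML
<html>
  <body>
    <form>
      <input type="text" name="login">
      <input type="checkbox" name="terms" checked="checked">
      <input type="text" name="legacy" disabled="disabled">
    </form>
    <ul>
      <li class="done">Write</li>
      <li>Review <span>(pending)</span></li>
      <li>Publish</li>
    </ul>
  </body>
</html>
HTML;

$query = (new HtmlSelector())->query($html);

$selectors = [
    'input:checked',  // inputs with a [checked] attribute
    'input:disabled', // inputs with a [disabled] attribute
    'li:first-child', // <li> that is the first child of its parent
    'li:last-child',  // <li> that is the last child of its parent
    'li:not(.done)',  // <li> without the class "done"
    'li:has(span)',   // <li> for which the selector "span" matches
];

$counts = [];
foreach ($selectors as $selector) {
    $counts[$selector] = $query->find($selector)->count();
    printf("%-16s %d\n", $selector, $counts[$selector]);
}
```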

Additional CSS Pseudo Classes (not in CSS specifications) from jQuery library

  • :button: only elements of type <button> without attribute value [type=submit] or <input type="button">.
  • :checkbox: only elements with attribute [type=checkbox].
  • :contains(x): only elements that contain the given text.
  • :eq(x): only the result with the given index (index starts at 0).
  • :even: only even results in selection.
  • :file: only elements with attribute [type=file].
  • :gt(x): only results with an index greater than the given index (index starts at 0).
  • :gte(x): only results with an index greater than or equal to the given index (index starts at 0).
  • :header: only heading elements, such as <h1>, <h2>...
  • :image: only elements with attribute [type=image].
  • :input: only elements of type <input>, <textarea>, <select> or <button>.
  • :last: only last result of complete selection.
  • :lt(x): only results with an index less than the given index (index starts at 0).
  • :lte(x): only results with an index less than or equal to the given index (index starts at 0).
  • :odd: only odd results in selection.
  • :parent: only elements with one child or more.
  • :password: only elements with attribute [type=password].
  • :radio: only elements with attribute [type=radio].
  • :reset: only elements with attribute [type=reset].
  • :selected: only elements of type <option> with attribute [selected].
  • :submit: only elements of type <button> or <input> with attribute [type=submit].
  • :text: only elements of type <input> with attribute [type=text] or without [type] attribute.
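
The index-based pseudo-classes in this list operate on the whole selection, with indexes starting at 0. The following sketch, on a hypothetical four-item list, prints what several of them select; the unquoted argument to :contains() is an assumption of the example.

```php
<?php
// Assumption: Composer's autoloader lives in ./vendor next to this script.
require __DIR__ . '/vendor/autoload.php';

use Berlioz\HtmlSelector\HtmlSelector;

// Hypothetical markup, used only for illustration.
$html = <<<HTML
<html>
  <body>
    <ul>
      <li>Apple</li>
      <li>Banana</li>
      <li>Cherry</li>
      <li>Date</li>
    </ul>
  </body>
</html>
HTML;

$query = (new HtmlSelector())->query($html);

$selectors = [
    'li:eq(1)',            // the result at index 1 (the second <li>)
    'li:last',             // the last result of the selection
    'li:even',             // even results of the selection
    'li:odd',              // odd results of the selection
    'li:gt(1)',            // results with an index greater than 1
    'li:lt(2)',            // results with an index less than 2
    'li:contains(Cherry)', // results containing the text "Cherry"
];

// Print the match count and the concatenated text for each selector.
foreach ($selectors as $selector) {
    $result = $query->find($selector);
    printf("%-20s %d  %s\n", $selector, $result->count(), trim($result->text()));
}
```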

Additional CSS Pseudo Classes (not in CSS specifications)

  • :count(x): only elements that are x children in the parent; used in the :has(selector) pseudo-class.

Full example of selectors

select > option:selected
div#myId.class1.class2[name1=value1][name2=value2]:even:first

Functions

Default functions

Some default functions are available on the Query object to interact with results. They should return the same results as their jQuery counterparts.

  • attr(name): get attribute value
  • attr(name, value): set attribute value
  • children(): get children of elements in result.
  • count(): count the number of elements in query result.
  • data(nameOfData): get data value (name is with camelCase syntax without the 'data-' prefix).
  • filter(selector): filter elements in result.
  • find(selector): find elements matching the selector within the elements in result.
  • get(i): get the DOM element at index i in result.
  • hasClass(class_name): check whether at least one element in result has the given classes.
  • html(): get html of first element in result.
  • index(selector): get the index of given selector in result elements.
  • is(selector): check whether at least one element in result matches the selector.
  • isset(i): return a boolean indicating whether an element key exists in result.
  • next(selector): get next element after each element in result.
  • nextAll(selector): get all next elements after each element in result.
  • not(selector): remove the elements matching the selector from result.
  • parent(): get the direct parent of each element in the current selection.
  • parents(selector): get all parents of the elements in the current selection.
  • prev(selector): get the previous element before each element in result.
  • prevAll(selector): get all previous elements before each element in result.
  • prop(name): get property boolean value of an attribute, used for example for disabled attribute.
  • prop(name, value): set property boolean value of an attribute, used for example for disabled attribute.
  • serialize(): serialize input values of a form. Return a string.
  • serializeArray(): serialize input values of a form. Return an array.
  • text(): get text of each element concatenated.
  • val(): get value of a form element.
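
To show how several of these functions fit together, here is a minimal sketch that reads values from a hypothetical form. It relies only on the functions listed above; return values are dumped rather than asserted, since details such as the exact format of serialize() are not described in this README.

```php
<?php
// Assumption: Composer's autoloader lives in ./vendor next to this script.
require __DIR__ . '/vendor/autoload.php';

use Berlioz\HtmlSelector\HtmlSelector;

// Hypothetical markup, used only for illustration.
$html = <<<HTML
<html>
  <body>
    <form id="signup">
      <input type="text" name="email" value="user@example.com"
             class="field required" data-field-kind="contact">
      <input type="checkbox" name="terms" value="1" checked="checked">
    </form>
  </body>
</html>
HTML;

$query = (new HtmlSelector())->query($html);

$form = $query->find('form#signup');
$email = $form->find('input[name=email]');

var_dump($email->attr('name'));         // value of the "name" attribute
var_dump($email->hasClass('required')); // whether the class is present
var_dump($email->data('fieldKind'));    // reads the data-field-kind attribute
var_dump($email->val());                // value of the form element
var_dump($form->children()->count());   // number of children of the form
var_dump($form->serialize());           // form values serialized as a string
```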