vanilla/htmlawed

A composer wrapper for the htmLawed library to purify & filter HTML. Tested with PHPUnit and PhantomJS!

v2.2.15 2023-09-29 18:41 UTC

License: LGPL-3.0

Why use htmLawed?

If your website has any user-generated content, then you need to worry about cross-site scripting (XSS). htmLawed takes a piece of potentially malicious HTML and removes the malicious code, leaving the rest of the HTML intact.

Beyond the base htmLawed library, this package makes htmLawed a composer package and wraps it in an object so that it can be autoloaded.

Installation

htmLawed requires PHP 5.4 or higher.

htmLawed is PSR-4 compliant and can be installed using composer. Just add vanilla/htmlawed to your composer.json.

"require": {
    "vanilla/htmlawed": "~1.0"
}

Example

echo Htmlawed::filter('<h1>Hello world!');
// Outputs: '<h1>Hello world!</h1>'.

echo Htmlawed::filter('<i>nothing to see</i><script>alert("xss")</script>');
// Outputs: '<i>nothing to see</i>alert("xss")'

Configs and Specs

The htmLawed filter takes two optional parameters: $config and $spec. This library provides sensible defaults to these parameters, but you can override them in Htmlawed::filter().

$xss = "<i>nothing to see <script>alert('xss')</script>";

// Pass an empty config and spec for no filtering of malicious code.
echo Htmlawed::filter($xss, [], []);
// Outputs: '<i>nothing to see <script type="text/javascript">alert("xss")</script></i>'

// Pass safe=1 to turn on all the safe options.
echo Htmlawed::filter($xss, ['safe' => 1]);
// Outputs: '<i>nothing to see alert("xss")</i>'

// We provide a convenience method that strips all tags that aren't supposed to be in RSS feeds.
echo Htmlawed::filterRSS('<html><body><h1>Hello world!</h1></body></html>');
// Outputs: '<h1>Hello world!</h1>'

See the htmLawed documentation for the full list of options.
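
As a rough sketch of overriding the defaults, the snippet below builds a stricter $config that combines safe with an explicit list of permitted tags. It assumes the elements option accepts a comma-separated list of tag names, as described in the htmLawed documentation. commentConfig() is a hypothetical helper name, not part of this package, and the autoloader path is an assumption about your project layout.

```php
<?php
// Load Composer's autoloader if it exists (assumed project layout).
$autoload = __DIR__ . '/vendor/autoload.php';
if (is_file($autoload)) {
    require $autoload;
}

/**
 * Build a strict htmLawed $config for user comments (hypothetical helper).
 *
 * @param string[] $allowedElements Tag names to permit.
 * @return array Config array to pass as the second argument of Htmlawed::filter().
 */
function commentConfig(array $allowedElements)
{
    return [
        // Turn on all the safe options.
        'safe' => 1,
        // Explicit list of permitted tags, comma-separated without spaces.
        'elements' => implode(',', $allowedElements),
    ];
}

$config = commentConfig(['a', 'b', 'em', 'i', 'p', 'strong']);

$html = "<p>Hi <i>there</i><script>alert('xss')</script></p>";

// Only call the wrapper when the package is actually installed.
if (class_exists('Htmlawed')) {
    echo Htmlawed::filter($html, $config);
}
```

Passing a $config replaces the library defaults rather than extending them, so any option you still want has to be listed explicitly.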

Differences in Vanilla's version of Htmlawed

We try to use the most recent version of htmLawed with as few changes as possible so that bug fixes and security releases can be merged from the main project. However, we've made a few changes in the source code.

  • Balance tags (hl_bal) before validating tags (hl_tag). We found some cases where an unbalanced script tag would not get removed and this addresses that issue.
  • Don't add an extra <div> inside of <blockquote> tags.
  • Remove naked <span>.
  • Change indentation from 1 space to 4 spaces.

If the original author of htmLawed wants to make any of these changes upstream, please contact support@vanillaforums.com.