New in Symfony 6.1: HtmlSanitizer Component


Symfony 6.1 will be released at the end of May 2022 and it will require
PHP 8.1 or higher. This is the first article of the series that shows the most
important new features introduced by Symfony 6.1.


Contributed by Titouan Galopin in #44681.

Web applications often need to work with HTML content generated by users, and
doing so safely is difficult. Rendering that untrusted HTML in a Twig template,
or injecting it via JavaScript into the innerHTML property of an element, can
lead to unwanted and dangerous JavaScript code execution.
HTML sanitization is "the process of examining an HTML document and
producing a new HTML document that preserves only whatever tags or attributes
that are designated safe and desired".
Most of the time, this sanitization process is used to protect against attacks
such as cross-site scripting (XSS). However, sanitization is also about fixing
malformed HTML content in the best way possible:


Original: <div><em>foo</div>

Sanitized: <div><em>foo</em></div>

Original: <textarea><em>foo</textarea>

Sanitized: <textarea>&lt;em&gt;foo</textarea>

In Symfony 6.1 we're adding a PHP-based HTML sanitizer so you can transform
user-generated HTML content into safe HTML content. This new component is similar
to the upcoming W3C HTML Sanitizer API and we even use the same method names
whenever possible to ease the learning curve.

use Symfony\Component\HtmlSanitizer\HtmlSanitizerConfig;

// By default, any elements not included in the allowed or blocked elements
// will be dropped, including their children
$config = (new HtmlSanitizerConfig())
    // Allow "safe" elements and attributes. All scripts will be removed
    // as well as other dangerous behaviors like CSS injection
    ->allowSafeElements()

    // Allow the "div" element, with no attributes permitted on it
    ->allowElement('div')

    // Allow the "a" element, and the "title" attribute to be on it
    ->allowElement('a', ['title'])

    // Allow the "span" element with any attribute allowed by the Sanitizer API
    // (see https://wicg.github.io/sanitizer-api/#default-configuration)
    ->allowElement('span', '*')

    // Drop the "div" element (this overrides the allowElement('div') call above):
    // this element will be removed, including its children
    ->dropElement('div')
;

In addition to adding and removing HTML elements and attributes, you can force
the value of some attributes to improve the resulting HTML contents:

$config = (new HtmlSanitizerConfig())
    // ...

    // Forcefully set the value of all "rel" attributes on "a"
    // elements to "noopener noreferrer"
    ->forceAttribute('a', 'rel', 'noopener noreferrer')

    // Drop the "data-custom-attr" attribute from all elements:
    // this attribute will be removed
    ->dropAttribute('data-custom-attr', '*')

    // Transform all HTTP URLs into HTTPS URLs
    ->forceHttpsUrls()

    // Configure which hosts are allowed in img/audio/video/iframe (by default all are allowed)
    ->allowMediaHosts(['youtube.com', 'example.com'])
;
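As a quick check of what forced attributes and HTTPS rewriting do, here is a minimal sketch that runs one hypothetical link through a sanitizer. It assumes the component was installed with Composer (composer require symfony/html-sanitizer) and that the autoloader lives in vendor/autoload.php; the tiny configuration is only illustrative, not a recommended setup. The exact serialized markup may vary, so the comments only describe what should and should not appear in the result.

```php
<?php

use Symfony\Component\HtmlSanitizer\HtmlSanitizer;
use Symfony\Component\HtmlSanitizer\HtmlSanitizerConfig;

// Assumption: Composer autoloader at this path
require __DIR__.'/vendor/autoload.php';

// Illustrative config: allow links with an "href" attribute, force a "rel"
// value on every link and upgrade plain HTTP URLs to HTTPS
$config = (new HtmlSanitizerConfig())
    ->allowElement('a', ['href'])
    ->forceAttribute('a', 'rel', 'noopener noreferrer')
    ->forceHttpsUrls();

$sanitizer = new HtmlSanitizer($config);

// Hypothetical user input: a plain HTTP link carrying an inline event handler
$userInput = '<a href="http://example.com" onclick="steal()">Example</a>';

// The "onclick" attribute is not allowed, so it should be removed; the URL
// should be rewritten to https:// and the forced "rel" value added
$clean = $sanitizer->sanitize($userInput);

echo $clean, PHP_EOL;
```

Note that "rel" is forced rather than allowed: it ends up on the link even though allowElement() only permits "href".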

In addition to these, there are many other configuration options; check out the
HtmlSanitizer component docs. Once configured, use the sanitizer as follows:

use Symfony\Component\HtmlSanitizer\HtmlSanitizer;

$sanitizer = new HtmlSanitizer($config);

// this sanitizes contents in the context of the <body> element, removing any
// tags that are only allowed inside the <head> element
$sanitizer->sanitize($userInput);

// this sanitizes contents to include them inside a <head> tag
$sanitizer->sanitizeFor('head', $userInput);

// this sanitizes contents in the best way possible for the HTML element
// provided as the first argument (sometimes it will add missing tags and
// other times it will HTML-encode the unclosed tags)
$sanitizer->sanitizeFor('textarea', $userInput); // it will encode as HTML entities
$sanitizer->sanitizeFor('div', $userInput); // it will sanitize same as <body>
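To tie the pieces together, here is a minimal, self-contained sketch that sanitizes the same hypothetical input for two different contexts. It assumes a Composer install of the component with the autoloader in vendor/autoload.php, and it only calls allowSafeElements() so the effect of the context is easy to see; the input string is made up for illustration.

```php
<?php

use Symfony\Component\HtmlSanitizer\HtmlSanitizer;
use Symfony\Component\HtmlSanitizer\HtmlSanitizerConfig;

// Assumption: Composer autoloader at this path
require __DIR__.'/vendor/autoload.php';

// Start from the "safe" elements; any other element (e.g. <script>)
// is dropped together with its children
$sanitizer = new HtmlSanitizer(
    (new HtmlSanitizerConfig())->allowSafeElements()
);

// Hypothetical untrusted input mixing harmless markup and a script
$userInput = '<p>Hello <strong>world</strong><script>alert("XSS")</script></p>';

// <body> context: the <script> element and its content should be removed,
// while <p> and <strong> are kept
$forBody = $sanitizer->sanitize($userInput);

// <textarea> context: the whole input is treated as text and HTML-encoded
$forTextarea = $sanitizer->sanitizeFor('textarea', $userInput);

echo $forBody, PHP_EOL, $forTextarea, PHP_EOL;
```

The two results differ because textarea content is plain text for the browser, so the markup is encoded rather than filtered.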
