wpseek.com
A WordPress-centric search engine for devs and theme authors

wp_kses_normalize_entities › WordPress Function

Since: 1.0.0

Deprecated: n/a

wp_kses_normalize_entities( $content, $context = 'html' )

Parameters: (2)
	(string) $content: Content to normalize entities. Required: Yes.
	(string) $context: Context for normalization. Can be either 'html' or 'xml'. Required: No. Default: 'html'.
Returns:	(string) Content with normalized entities.
Defined at:	wp-includes/kses.php, line 2080
Codex:	developer.wordpress.org / wp_kses_normalize_entities
Change Log:	5.5.0: Added the $context parameter.

Converts and fixes HTML entities.

This function normalizes HTML entities. It converts `AT&T` to the correct `AT&amp;T`, `&#00058;` to `&#058;`, `&#XYZZY;` to `&amp;#XYZZY;`, and so on. When $context is set to 'xml', named HTML entities are converted to the literal characters (code points) they represent. For example, `AT&T&hellip;&#XYZZY;` is converted to `AT&amp;T…&amp;#XYZZY;`.
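The behavior described above can be approximated outside WordPress with a standalone sketch. This is not the core implementation: the function names `sketch_valid_codepoint()` and `sketch_normalize_entities()` are hypothetical, the code-point check only approximates WordPress's own validity test, and the two entity-name lists are tiny samples standing in for the much longer global lists WordPress uses. It follows the same pass order and regular expressions as the source listing.

```php
<?php
// Standalone sketch, NOT WordPress core. All names here are hypothetical.
// The entity lists below are small samples, not WordPress's real lists.
const SKETCH_HTML_NAMES = ['amp', 'lt', 'gt', 'quot', 'nbsp', 'hellip'];
const SKETCH_XML_NAMES  = ['amp', 'lt', 'gt', 'quot', 'apos'];

// Approximates a "is this a valid Unicode code point for markup" check.
function sketch_valid_codepoint(int $cp): bool {
    return $cp === 0x9 || $cp === 0xA || $cp === 0xD
        || ($cp >= 0x20 && $cp <= 0xD7FF)
        || ($cp >= 0xE000 && $cp <= 0xFFFD)
        || ($cp >= 0x10000 && $cp <= 0x10FFFF);
}

function sketch_normalize_entities(string $content, string $context = 'html'): string {
    // 1. Disarm every ampersand.
    $content = str_replace('&', '&amp;', $content);

    // 2. Decimal references: re-arm valid ones, trimmed then padded to 3 digits.
    $content = preg_replace_callback('/&amp;#(0*[0-9]{1,7});/', function (array $m): string {
        $digits = $m[1];
        if (!sketch_valid_codepoint((int) $digits)) {
            return "&amp;#$digits;";
        }
        return '&#' . str_pad(ltrim($digits, '0'), 3, '0', STR_PAD_LEFT) . ';';
    }, $content);

    // 3. Hexadecimal references: re-arm valid ones with leading zeros trimmed.
    $content = preg_replace_callback('/&amp;#[Xx](0*[0-9A-Fa-f]{1,6});/', function (array $m): string {
        $hex = $m[1];
        if (!sketch_valid_codepoint((int) hexdec($hex))) {
            return "&amp;#x$hex;";
        }
        return '&#x' . ltrim($hex, '0') . ';';
    }, $content);

    // 4. Named references, handled last (see the ordering comment in the source).
    return preg_replace_callback('/&amp;([A-Za-z]{2,8}[0-9]{0,2});/', function (array $m) use ($context): string {
        $name = $m[1];
        if ($context === 'xml') {
            if (in_array($name, SKETCH_XML_NAMES, true)) {
                return "&$name;";
            }
            if (in_array($name, SKETCH_HTML_NAMES, true)) {
                // XML does not know this name, so emit the literal character.
                return html_entity_decode("&$name;", ENT_HTML5, 'UTF-8');
            }
            return "&amp;$name;";
        }
        return in_array($name, SKETCH_HTML_NAMES, true) ? "&$name;" : "&amp;$name;";
    }, $content);
}
```

With this sketch, `AT&T` becomes `AT&amp;T`, `&#00058;` becomes `&#058;`, `&#XYZZY;` stays disarmed as `&amp;#XYZZY;`, and in the 'xml' context `&hellip;` is replaced by the literal `…` character.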

Source

function wp_kses_normalize_entities( $content, $context = 'html' ) {
	// Disarm all entities by converting & to &amp;
	$content = str_replace( '&', '&amp;', $content );

	/*
	 * Decode any character references that are now double-encoded.
	 *
	 * It's important that the following normalizations happen in the correct order.
	 *
	 * At this point, all `&` have been transformed to `&amp;`. Double-encoded named character
	 * references like `&amp;amp;` will be decoded back to their single-encoded form `&amp;`.
	 *
	 * First, numeric (decimal and hexadecimal) character references must be handled so that
	 * `&amp;#09;` becomes `&#9;`. If the named character references were handled first, there
	 * would be no way to know whether the double-encoded character reference had been produced
	 * in this function or was the original input.
	 *
	 * Consider the two examples, first with named entity decoding followed by numeric
	 * entity decoding. We'll use U+002E FULL STOP (.) in our example, this table follows the
	 * string processing from left to right:
	 *
	 * | Input        | &-encoded        | Named ref double-decoded  | Numeric ref double-decoded |
	 * | ------------ | ---------------- | ------------------------- | -------------------------- |
	 * | `&#x2E;`     | `&amp;#x2E;`     | `&amp;#x2E;`              | `&#x2E;`                   |
	 * | `&amp;#x2E;` | `&amp;amp;#x2E;` | `&amp;#x2E;`              | `&#x2E;`                   |
	 *
	 * Notice in the example above that different inputs result in the same result. The second case
	 * was not normalized and produced HTML that is semantically different from the input.
	 *
	 * | Input        | &-encoded        |  Numeric ref double-decoded | Named ref double-decoded |
	 * | ------------ | ---------------- | --------------------------- | ------------------------ |
	 * | `&#x2E;`     | `&amp;#x2E;`     | `&#x2E;`                    | `&#x2E;`                 |
	 * | `&amp;#x2E;` | `&amp;amp;#x2E;` | `&amp;amp;#x2E;`            | `&amp;#x2E;`             |
	 *
	 * Here, each input is normalized to an appropriate output.
	 */
	$content = preg_replace_callback( '/&amp;#(0*[0-9]{1,7});/', 'wp_kses_normalize_entities2', $content );
	$content = preg_replace_callback( '/&amp;#[Xx](0*[0-9A-Fa-f]{1,6});/', 'wp_kses_normalize_entities3', $content );
	if ( 'xml' === $context ) {
		$content = preg_replace_callback( '/&amp;([A-Za-z]{2,8}[0-9]{0,2});/', 'wp_kses_xml_named_entities', $content );
	} else {
		$content = preg_replace_callback( '/&amp;([A-Za-z]{2,8}[0-9]{0,2});/', 'wp_kses_named_entities', $content );
	}

	return $content;
}
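The ordering argument in the source comment can be replayed with a small standalone script. The helper names (`disarm()`, `decode_hex_refs()`, `decode_named_refs()`, `named_then_numeric()`, `numeric_then_named()`) are hypothetical, and the model is deliberately reduced: only hexadecimal references and the `amp` entity name are handled, with no validity checks or leading-zero trimming. It reuses the function's regular expressions so the two pass orders can be compared directly.

```php
<?php
// Hypothetical helpers replaying the two orderings from the comment tables.
// Simplifications: hex references and the `amp` name only; no validity checks.

function disarm(string $s): string {
    return str_replace('&', '&amp;', $s);
}

function decode_hex_refs(string $s): string {
    // `&amp;#x2E;` becomes `&#x2E;`.
    return preg_replace('/&amp;#[Xx](0*[0-9A-Fa-f]{1,6});/', '&#x$1;', $s);
}

function decode_named_refs(string $s): string {
    // `&amp;amp;` becomes `&amp;`; any other name stays disarmed.
    return preg_replace_callback(
        '/&amp;([A-Za-z]{2,8}[0-9]{0,2});/',
        fn (array $m): string => $m[1] === 'amp' ? '&amp;' : $m[0],
        $s
    );
}

function named_then_numeric(string $input): string {
    return decode_hex_refs(decode_named_refs(disarm($input)));
}

function numeric_then_named(string $input): string {
    return decode_named_refs(decode_hex_refs(disarm($input)));
}

foreach (['&#x2E;', '&amp;#x2E;'] as $input) {
    printf("%-12s named-first: %-12s numeric-first: %s\n",
        $input, named_then_numeric($input), numeric_then_named($input));
}
```

Named-first collapses both inputs to `&#x2E;`, while numeric-first yields `&#x2E;` and `&amp;#x2E;` respectively, matching the two tables in the comment.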