Moodle  2.2.1
http://www.collinsharper.com
HTMLPurifier_Encoder Class Reference

Static Public Member Functions

static muteErrorHandler ()
static cleanUTF8 ($str, $force_php=false)
static unichr ($code)
static convertToUTF8 ($str, $config, $context)
static convertFromUTF8 ($str, $config, $context)
static convertToASCIIDumbLossless ($str)
static testEncodingSupportsASCII ($encoding, $bypass=false)

Detailed Description

A UTF-8 specific character encoder that handles cleaning and transforming.

Note:
All functions in this class should be static.

Definition at line 7 of file Encoder.php.


Member Function Documentation

static cleanUTF8 ($str, $force_php=false) [static]

Cleans a UTF-8 string for well-formedness and SGML validity.

It parses the input as UTF-8 and returns a valid UTF-8 string, with non-SGML code points excluded.

Note:
Just for reference, the non-SGML code points are 0 to 31 and 127 to 159, inclusive. However, we allow code points 9, 10 and 13, which are the tab, line feed and carriage return respectively. Code points 128 and above map to multibyte UTF-8 representations.
Fallback code adapted from utf8ToUnicode by Henri Sivonen (hsivonen@iki.fi) at <http://iki.fi/hsivonen/php-utf8/>, under the LGPL license. Notes on what changed are inside, but in general, the original code transformed UTF-8 text into an array of integer Unicode code points. Understandably, transforming that back to a string would be somewhat expensive, so the function was modified to operate directly on the string. However, this discourages code reuse, and the logic enumerated here would be useful for any function that needs to understand UTF-8 characters. As of right now, only smart lossless character encoding converters would need that, and I'm probably not going to implement them. Once again, PHP 6 should solve all our problems.
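
The cleaning rule described in the notes above can be sketched in a few lines. This is a minimal illustration in Python, not the PHP code in Encoder.php: the names `is_allowed_codepoint` and `clean_utf8` are hypothetical, and the filter follows only the code point ranges stated in the note (the real function may apply further checks).

```python
def is_allowed_codepoint(cp: int) -> bool:
    """Apply the ranges from the note: drop 0-31 and 127-159, keep 9, 10, 13."""
    if cp in (9, 10, 13):          # tab, line feed, carriage return
        return True
    if cp <= 31 or 127 <= cp <= 159:
        return False
    return True


def clean_utf8(data: bytes) -> str:
    """Hypothetical helper: discard malformed UTF-8, then non-SGML code points."""
    # errors="ignore" silently drops byte sequences that are not valid UTF-8.
    text = data.decode("utf-8", errors="ignore")
    return "".join(ch for ch in text if is_allowed_codepoint(ord(ch)))


cleaned = clean_utf8(b"a\x00b\tc\x7fd")
```

Decoding first and filtering second keeps the two concerns (well-formedness and SGML validity) separate, at the cost of an intermediate string.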

Definition at line 47 of file Encoder.php.


static convertFromUTF8 ($str, $config, $context) [static]

Converts a string from UTF-8 based on configuration.

Note:
Currently, this is a lossy conversion: characters that cannot be expressed in the target encoding are omitted.
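
To make the lossy behaviour concrete, here is a minimal Python sketch, not the library's PHP implementation. The function name `convert_from_utf8` and its `encoding` argument are assumptions standing in for whatever the configuration object supplies.

```python
def convert_from_utf8(text: str, encoding: str) -> bytes:
    """Hypothetical helper: encode text, dropping characters the target lacks."""
    # errors="ignore" omits any character the target encoding cannot express.
    return text.encode(encoding, errors="ignore")


# The euro sign has no Latin-1 equivalent, so it is silently dropped.
latin1_bytes = convert_from_utf8("caf\u00e9 \u20ac", "latin-1")
```

Because the dropped characters leave no trace, a caller cannot recover the original text from the output.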

Definition at line 299 of file Encoder.php.


static convertToASCIIDumbLossless ($str) [static]

Lossless (character-wise) conversion of HTML to ASCII

Parameters:
$str    UTF-8 string to be converted to ASCII
Returns:
ASCII-encoded string with non-ASCII characters converted to numeric entities
Warning:
Adapted from MediaWiki, claiming fair use: this is a common algorithm. If you disagree with this license fudgery, implement it yourself.
Note:
Uses decimal numeric entities since they are best supported.
This is a DUMB function: it makes no attempt to keep characters that the target character encoding can represent. We could possibly implement a smart version, but that would require it to also know which Unicode code points the charset supports (not an easy task).
Somewhat similar to cleanUTF8(), but it assumes that $str is well-formed UTF-8.
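
The character-wise conversion described above can be illustrated with a short Python sketch. It is not the PHP code adapted from MediaWiki; the name `to_ascii_entities` is hypothetical, and the input is assumed to be an already-decoded, well-formed string.

```python
def to_ascii_entities(text: str) -> str:
    """Hypothetical helper: replace each non-ASCII character with &#NNN;."""
    parts = []
    for ch in text:
        cp = ord(ch)
        # ASCII passes through untouched; everything else becomes a
        # decimal numeric character reference.
        parts.append(ch if cp < 128 else f"&#{cp};")
    return "".join(parts)


encoded = to_ascii_entities("caf\u00e9 \u20ac")
```

Python's built-in `"xmlcharrefreplace"` error handler for `str.encode` produces the same decimal references, so it can serve as a cross-check.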

Definition at line 345 of file Encoder.php.


static convertToUTF8 ($str, $config, $context) [static]

Converts a string to UTF-8 based on configuration.
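
As a rough illustration of the idea, the following Python sketch re-encodes bytes from a source encoding into UTF-8. It is not the library's PHP code; `convert_to_utf8` and its `encoding` argument are assumptions standing in for the value read from the configuration.

```python
def convert_to_utf8(raw: bytes, encoding: str) -> bytes:
    """Hypothetical helper: reinterpret raw bytes in `encoding` as UTF-8."""
    # Decode with the source encoding, then serialise the text as UTF-8.
    return raw.decode(encoding).encode("utf-8")


utf8_bytes = convert_to_utf8(b"caf\xe9", "latin-1")
```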

Definition at line 266 of file Encoder.php.


static muteErrorHandler () [static]

Error handler that mutes errors; an alternative to the shut-up operator (@).

Definition at line 20 of file Encoder.php.

static testEncodingSupportsASCII ($encoding, $bypass=false) [static]

This expensive function tests whether or not a given character encoding supports ASCII. 7/8-bit encodings like Shift_JIS will fail this test, and require special processing. Variable width encodings shouldn't ever fail.

Parameters:
string  $encoding   Encoding name to test, as per iconv format
bool    $bypass     Whether or not to bypass the precompiled arrays.
Returns:
Array mapping UTF-8 characters to their corresponding ASCII characters, which can be used to "undo" any overzealous iconv action.
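
One way to picture the test is the Python sketch below. It is only an approximation under stated assumptions: it uses Python's codecs instead of iconv, checks only printable ASCII (0x20 to 0x7E), has no precompiled arrays, and the name `ascii_mismatches` is hypothetical.

```python
def ascii_mismatches(encoding: str) -> dict:
    """Hypothetical helper: map decoded characters back to the ASCII they shadow.

    An empty result means every printable ASCII character keeps its ASCII
    byte value in `encoding`.
    """
    mismatches = {}
    for code in range(0x20, 0x7F):
        ch = chr(code)
        encoded = ch.encode(encoding, errors="ignore")
        if encoded == bytes([code]):
            continue  # this character is ASCII-compatible in `encoding`
        # Find out what the ASCII byte means when read in the tested encoding.
        decoded = bytes([code]).decode(encoding, errors="ignore")
        if decoded:
            mismatches[decoded] = ch
    return mismatches


utf8_result = ascii_mismatches("utf-8")      # ASCII-compatible
ebcdic_result = ascii_mismatches("cp037")    # EBCDIC, not ASCII-compatible
```

The loop performs two conversions per character, which hints at why the original is described as expensive and why precompiled results are worth having.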

Definition at line 387 of file Encoder.php.


static unichr ($code) [static]

Translates a Unicode codepoint into its corresponding UTF-8 character.

Note:
Based on Feyd's function at <http://forums.devnetwork.net/viewtopic.php?p=191404#191404>, which is in the public domain.
While we're going to do code point parsing anyway, a good optimization would be to refuse to translate code points that are non-SGML characters. However, this could lead to duplication.
This is very similar to the unichr function in maintenance/generate-entity-file.php (although this is superior, due to its sanity checks).
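
The translation from a code point to UTF-8 bytes follows the standard UTF-8 bit layout, sketched here in Python. This is not Feyd's function or the PHP code in Encoder.php; the name `codepoint_to_utf8` is hypothetical, and returning an empty result for out-of-range values and surrogates is an assumption about what the sanity checks do.

```python
def codepoint_to_utf8(code: int) -> bytes:
    """Hypothetical helper: encode one Unicode code point as UTF-8 bytes."""
    # Sanity checks: reject values outside Unicode and the surrogate range.
    if code < 0 or code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
        return b""
    if code < 0x80:                      # 1 byte:  0xxxxxxx
        return bytes([code])
    if code < 0x800:                     # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code >> 6),
                      0x80 | (code & 0x3F)])
    if code < 0x10000:                   # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code >> 12),
                      0x80 | ((code >> 6) & 0x3F),
                      0x80 | (code & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code >> 18),
                  0x80 | ((code >> 12) & 0x3F),
                  0x80 | ((code >> 6) & 0x3F),
                  0x80 | (code & 0x3F)])


euro = codepoint_to_utf8(0x20AC)
```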

Definition at line 226 of file Encoder.php.



The documentation for this class was generated from the file Encoder.php.