PeterO.Text.Encodings

public static class Encodings

Contains methods for converting text from one character encoding to another. This class also contains convenience methods for converting strings and other character inputs to sequences of bytes and vice versa. The Encoding Standard, which is a Candidate Recommendation as of early November 2015, defines algorithms for the most common character encodings used on Web pages and recommends the UTF-8 encoding for new specifications and Web pages. Calling the GetEncoding(name) method returns one of the character encodings with the given name under the Encoding Standard.

Now let's define some terms.

Encoding Terms

There are several kinds of character encodings:

Getting an Encoding

The Encoding Standard includes UTF-8, UTF-16, and many legacy encodings, and gives each one of them a name. The GetEncoding(name) method takes a name string and returns an ICharacterEncoding object that implements that encoding, or null if the name is unrecognized.

However, the Encoding Standard is designed to include only encodings commonly used on Web pages, not in other protocols such as email. For email, the Encodings class includes an overload, GetEncoding(name, forEmail). Setting forEmail to true uses rules modified from the Encoding Standard to better suit encoding and decoding text from email messages.

Classes for Character Encodings

This Encodings class provides access to common character encodings through classes as described below:

Custom Encodings

Classes that implement the ICharacterEncoding interface can provide additional character encodings not included in the Encoding Standard. Some examples of these include the following:

(Note that this library doesn't implement either encoding.)

Member Summary

UTF8

public static readonly PeterO.Text.ICharacterEncoding UTF8;

Character encoding object for the UTF-8 character encoding, which represents each code point in the universal coded character set using 1 to 4 bytes.
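The 1-to-4-byte range follows directly from how UTF-8 partitions the code space. The following sketch computes the UTF-8 length of a single code point; the class and method names are hypothetical and are not part of PeterO.Text.

```csharp
using System;

public static class Utf8LengthSketch
{
    // Returns the number of bytes UTF-8 uses to encode one code point.
    // Surrogate code points (U+D800-U+DFFF) are not Unicode scalar values,
    // so they are rejected here instead of being encoded.
    public static int Utf8Length(int codePoint)
    {
        if (codePoint < 0 || codePoint > 0x10FFFF ||
            (codePoint >= 0xD800 && codePoint <= 0xDFFF))
        {
            throw new ArgumentOutOfRangeException(nameof(codePoint));
        }
        if (codePoint < 0x80) return 1;     // U+0000-U+007F (ASCII)
        if (codePoint < 0x800) return 2;    // U+0080-U+07FF
        if (codePoint < 0x10000) return 3;  // U+0800-U+FFFF, minus surrogates
        return 4;                           // U+10000-U+10FFFF
    }
}
```

For example, U+0041 'A' takes 1 byte, U+00E9 'é' takes 2, U+20AC '€' takes 3, and U+1F600 takes 4.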

DecodeToString

public static string DecodeToString(
    this PeterO.Text.ICharacterEncoding enc,
    byte[] bytes);

Reads a byte array from a data source and converts the bytes from a given encoding to a text string. Errors in decoding are handled by replacing erroneous bytes with the replacement character (U+FFFD).

In the .NET implementation, this method is implemented as an extension method to any object implementing ICharacterEncoding and can be called as follows: enc.DecodeToString(bytes). If the object's class already has a DecodeToString method with the same parameters, that method takes precedence over this extension method.
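With this library the call is simply Encodings.UTF8.DecodeToString(bytes). To show what the U+FFFD replacement behavior means in practice, the sketch below reproduces it with the .NET base class library's UTF8Encoding rather than PeterO.Text. The class and method names are hypothetical, and the library's own decoder may differ in edge cases such as how many U+FFFD characters a long invalid sequence produces.

```csharp
using System.Text;

public static class DecodeSketch
{
    // Decodes UTF-8 bytes, replacing each ill-formed sequence with U+FFFD
    // instead of throwing. throwOnInvalidBytes: false selects the BCL's
    // replacement fallback.
    public static string DecodeUtf8Lenient(byte[] bytes)
    {
        var utf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false,
                                    throwOnInvalidBytes: false);
        return utf8.GetString(bytes);
    }
}
```

Decoding the bytes 0x61, 0xFF, 0x62 yields "a\uFFFDb", since 0xFF never appears in well-formed UTF-8.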

Parameters:

Return Value:

A string consisting of the decoded text.

Exceptions:

DecodeToString

public static string DecodeToString(
    this PeterO.Text.ICharacterEncoding enc,
    byte[] bytes,
    int offset,
    int length);

Reads a portion of a byte array from a data source and converts the bytes from a given encoding to a text string. Errors in decoding are handled by replacing erroneous bytes with the replacement character (U+FFFD).

In the .NET implementation, this method is implemented as an extension method to any object implementing ICharacterEncoding and can be called as follows: enc.DecodeToString(bytes, offset, length). If the object's class already has a DecodeToString method with the same parameters, that method takes precedence over this extension method.

Parameters:

Return Value:

A string consisting of the decoded text.

Exceptions:

DecodeToString

public static string DecodeToString(
    this PeterO.Text.ICharacterEncoding enc,
    System.IO.Stream input);

Decodes data read from a data stream into a text string, using the given character encoding.

In the .NET implementation, this method is implemented as an extension method to any object implementing ICharacterEncoding and can be called as follows: enc.DecodeToString(input). If the object's class already has a DecodeToString method with the same parameters, that method takes precedence over this extension method.
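A rough base-class-library analog of decoding a whole stream, assuming UTF-8 as the encoding, reads the stream to its end through a StreamReader. This page doesn't say whether this overload sniffs a byte-order mark, so the sketch deliberately turns BOM detection off and leaves the caller's stream open; both are assumptions, and the names are hypothetical.

```csharp
using System.IO;
using System.Text;

public static class StreamDecodeSketch
{
    // Reads the rest of the stream and decodes it strictly as UTF-8.
    // The encoding has no preamble and BOM detection is off, so a leading
    // byte-order mark is neither sniffed nor stripped: it decodes to U+FEFF.
    // Invalid bytes become U+FFFD (throwOnInvalidBytes: false).
    public static string DecodeStreamUtf8(Stream input)
    {
        var utf8 = new UTF8Encoding(false, false);
        using (var reader = new StreamReader(input, utf8,
                   detectEncodingFromByteOrderMarks: false,
                   bufferSize: 4096, leaveOpen: true))
        {
            return reader.ReadToEnd();
        }
    }
}
```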

Parameters:

Return Value:

A string consisting of the decoded text.

Exceptions:

DecodeToString

public static string DecodeToString(
    this PeterO.Text.ICharacterEncoding encoding,
    PeterO.IByteReader input);

Reads bytes from a data source and converts the bytes from a given encoding to a text string. In the .NET implementation, this method is implemented as an extension method to any object implementing ICharacterEncoding and can be called as follows: encoding.DecodeToString(input). If the object's class already has a DecodeToString method with the same parameters, that method takes precedence over this extension method.

Parameters:

Return Value:

The converted string.

Exceptions:

EncodeToBytes

public static byte[] EncodeToBytes(
    this PeterO.Text.ICharacterInput input,
    PeterO.Text.ICharacterEncoder encoder);

Reads Unicode characters from a character input and writes them to a byte array encoded using the given character encoder. When writing to the byte array, any characters that can't be encoded are replaced with the byte 0x3f (the question mark character).

In the .NET implementation, this method is implemented as an extension method to any object implementing ICharacterInput and can be called as follows: input.EncodeToBytes(encoder). If the object's class already has an EncodeToBytes method with the same parameters, that method takes precedence over this extension method.
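The question-mark fallback can be sketched for a simple target encoding. Assuming US-ASCII as the target (a stand-in for any encoding that can't represent every character), each code point outside U+0000 to U+007F becomes the single byte 0x3F. The helper below is hypothetical and independent of PeterO.Text.

```csharp
using System.Collections.Generic;

public static class QuestionMarkFallbackSketch
{
    // Encodes text as US-ASCII. Any code point outside U+0000-U+007F cannot
    // be encoded, so it is written as the byte 0x3F ('?'). A surrogate pair
    // is one code point and therefore yields one '?'.
    public static byte[] EncodeAscii(string text)
    {
        var bytes = new List<byte>(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c < 0x80)
            {
                bytes.Add((byte)c);
                continue;
            }
            // Consume the low half of a surrogate pair so the pair maps to one '?'.
            if (char.IsHighSurrogate(c) && i + 1 < text.Length &&
                char.IsLowSurrogate(text[i + 1]))
            {
                i++;
            }
            bytes.Add(0x3F);
        }
        return bytes.ToArray();
    }
}
```

Encoding "café" this way produces the bytes 0x63 0x61 0x66 0x3F.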

Parameters:

Return Value:

A byte array.

Exceptions:

EncodeToBytes

public static byte[] EncodeToBytes(
    this PeterO.Text.ICharacterInput input,
    PeterO.Text.ICharacterEncoder encoder,
    bool htmlFallback);

Reads Unicode characters from a character input and writes them to a byte array encoded using the given character encoder and fallback strategy. In the .NET implementation, this method is implemented as an extension method to any object implementing ICharacterInput and can be called as follows: input.EncodeToBytes(encoder, htmlFallback). If the object's class already has an EncodeToBytes method with the same parameters, that method takes precedence over this extension method.
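The parameter description for htmlFallback is missing from this page. Assuming it selects the Encoding Standard's "html" error mode, in which an unencodable character is written as a decimal numeric character reference such as &#233;, the strategy looks roughly like the sketch below. US-ASCII again stands in for the target encoding, lone surrogates are first treated as U+FFFD (following the string overloads' description), and the names are hypothetical.

```csharp
using System.Text;

public static class HtmlFallbackSketch
{
    // Encodes text as US-ASCII; each unencodable code point is written as
    // "&#" + decimal code point + ";" (an HTML numeric character reference).
    public static byte[] EncodeAsciiHtml(string text)
    {
        var sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c < 0x80)
            {
                sb.Append(c);
                continue;
            }
            int codePoint;
            if (char.IsHighSurrogate(c) && i + 1 < text.Length &&
                char.IsLowSurrogate(text[i + 1]))
            {
                codePoint = char.ConvertToUtf32(c, text[i + 1]);
                i++;
            }
            else if (char.IsSurrogate(c))
            {
                codePoint = 0xFFFD; // unpaired surrogate
            }
            else
            {
                codePoint = c;
            }
            sb.Append("&#").Append(codePoint).Append(';');
        }
        // Every character in sb is now ASCII, so this conversion is lossless.
        return Encoding.ASCII.GetBytes(sb.ToString());
    }
}
```

With this strategy, "café" becomes the ASCII text "caf&#233;" instead of "caf?".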

Parameters:

Return Value:

A byte array containing the encoded characters.

Exceptions:

EncodeToBytes

public static byte[] EncodeToBytes(
    this PeterO.Text.ICharacterInput input,
    PeterO.Text.ICharacterEncoding encoding);

Reads Unicode characters from a character input and writes them to a byte array encoded in the given character encoding. When writing to the byte array, any characters that can't be encoded are replaced with the byte 0x3f (the question mark character). In the .NET implementation, this method is implemented as an extension method to any object implementing ICharacterInput and can be called as follows: input.EncodeToBytes(encoding). If the object's class already has an EncodeToBytes method with the same parameters, that method takes precedence over this extension method.

Parameters:

Return Value:

A byte array containing the encoded text.

Exceptions:

EncodeToBytes

public static byte[] EncodeToBytes(
    this string str,
    PeterO.Text.ICharacterEncoding enc);

Reads Unicode characters from a text string and writes them to a byte array encoded in a given character encoding. When reading the string, any unpaired surrogate characters are replaced with the replacement character (U+FFFD), and when writing to the byte array, any characters that can't be encoded are replaced with the byte 0x3f (the question mark character).

In the .NET implementation, this method is implemented as an extension method to any String object and can be called as follows: str.EncodeToBytes(enc). If the object's class already has an EncodeToBytes method with the same parameters, that method takes precedence over this extension method.
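The unpaired-surrogate rule can be illustrated on its own. This hypothetical helper (not a PeterO.Text API) keeps well-formed surrogate pairs and replaces every lone high or low surrogate with U+FFFD, which is the cleanup applied when reading the string, before any bytes are written.

```csharp
using System.Text;

public static class SurrogateSketch
{
    // Returns a copy of str in which every unpaired surrogate code unit is
    // replaced with U+FFFD; well-formed surrogate pairs are kept intact.
    public static string ReplaceUnpairedSurrogates(string str)
    {
        var sb = new StringBuilder(str.Length);
        for (int i = 0; i < str.Length; i++)
        {
            char c = str[i];
            if (char.IsHighSurrogate(c) && i + 1 < str.Length &&
                char.IsLowSurrogate(str[i + 1]))
            {
                sb.Append(c).Append(str[i + 1]); // valid pair: copy both halves
                i++;
            }
            else if (char.IsSurrogate(c))
            {
                sb.Append('\uFFFD'); // lone high or low surrogate
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

Encoded as UTF-8, each replaced surrogate then becomes the three bytes 0xEF 0xBF 0xBD.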

Parameters:

Return Value:

A byte array containing the encoded text string.

Exceptions:

EncodeToBytes

public static byte[] EncodeToBytes(
    this string str,
    PeterO.Text.ICharacterEncoding enc,
    bool htmlFallback);

Reads Unicode characters from a text string and writes them to a byte array encoded in a given character encoding, using the given encoder fallback strategy. When reading the string, any unpaired surrogate characters are replaced with the replacement character (U+FFFD). In the .NET implementation, this method is implemented as an extension method to any String object and can be called as follows: str.EncodeToBytes(enc, htmlFallback). If the object's class already has an EncodeToBytes method with the same parameters, that method takes precedence over this extension method.

Parameters:

Return Value:

A byte array containing the encoded text string.

Exceptions:

EncodeToWriter

public static void EncodeToWriter(
    this PeterO.Text.ICharacterInput input,
    PeterO.Text.ICharacterEncoder encoder,
    PeterO.IWriter writer);

Reads Unicode characters from a character input and writes them to a byte writer, encoded using the given character encoder. When writing to the byte writer, any characters that can't be encoded are replaced with the byte 0x3f (the question mark character).

In the .NET implementation, this method is implemented as an extension method to any object implementing ICharacterInput and can be called as follows: input.EncodeToWriter(encoder, writer). If the object's class already has an EncodeToWriter method with the same parameters, that method takes precedence over this extension method.

Parameters:

Exceptions:

EncodeToWriter

public static void EncodeToWriter(
    this PeterO.Text.ICharacterInput input,
    PeterO.Text.ICharacterEncoder encoder,
    System.IO.Stream output);

Reads Unicode characters from a character input and writes them to a data stream, encoded using the given character encoder. When writing to the data stream, any characters that can't be encoded are replaced with the byte 0x3f (the question mark character).

In the .NET implementation, this method is implemented as an extension method to any object implementing ICharacterInput and can be called as follows: input.EncodeToWriter(encoder, output). If the object's class already has an EncodeToWriter method with the same parameters, that method takes precedence over this extension method.
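Writing to a stream in pieces raises one subtlety: a surrogate pair may be split across two pieces. A minimal sketch using the .NET base class library's stateful System.Text.Encoder, with UTF-8 standing in for the given encoding and a hypothetical chunk size, shows how holding back a trailing high surrogate keeps the output identical to one-shot encoding. It is an illustration, not the library's implementation.

```csharp
using System;
using System.IO;
using System.Text;

public static class ChunkedWriteSketch
{
    // Encodes text as UTF-8 in chunks of chunkSize UTF-16 code units and
    // writes each chunk's bytes to output as soon as it is encoded.
    public static void WriteUtf8InChunks(string text, Stream output, int chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        var utf8 = new UTF8Encoding(false);
        Encoder encoder = utf8.GetEncoder();
        char[] chars = text.ToCharArray();
        // GetMaxByteCount also covers a surrogate held over from the previous chunk.
        byte[] buffer = new byte[utf8.GetMaxByteCount(chunkSize)];
        for (int start = 0; start < chars.Length; start += chunkSize)
        {
            int count = Math.Min(chunkSize, chars.Length - start);
            bool isLast = start + count == chars.Length;
            // flush: false keeps a trailing high surrogate inside the encoder
            // so it can be combined with the low surrogate in the next chunk.
            int written = encoder.GetBytes(chars, start, count, buffer, 0, isLast);
            output.Write(buffer, 0, written);
        }
    }
}
```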

Parameters:

Exceptions:

EncodeToWriter

public static void EncodeToWriter(
    this PeterO.Text.ICharacterInput input,
    PeterO.Text.ICharacterEncoding encoding,
    PeterO.IWriter writer);

Reads Unicode characters from a character input and writes them to a byte writer, encoded in the given character encoding. When writing to the byte writer, any characters that can't be encoded are replaced with the byte 0x3f (the question mark character).

In the .NET implementation, this method is implemented as an extension method to any object implementing ICharacterInput and can be called as follows: input.EncodeToWriter(encoding, writer). If the object's class already has an EncodeToWriter method with the same parameters, that method takes precedence over this extension method.

Parameters:

Exceptions:

EncodeToWriter

public static void EncodeToWriter(
    this PeterO.Text.ICharacterInput input,
    PeterO.Text.ICharacterEncoding encoding,
    System.IO.Stream output);

Reads Unicode characters from a character input and writes them to a data stream, encoded in the given character encoding. When writing to the data stream, any characters that can't be encoded are replaced with the byte 0x3f (the question mark character).

In the .NET implementation, this method is implemented as an extension method to any object implementing ICharacterInput and can be called as follows: input.EncodeToWriter(encoding, output). If the object's class already has an EncodeToWriter method with the same parameters, that method takes precedence over this extension method.

Parameters:

Exceptions:

EncodeToWriter

public static void EncodeToWriter(
    this string str,
    PeterO.Text.ICharacterEncoding enc,
    PeterO.IWriter writer);

Converts a text string to bytes and writes the bytes to an output byte writer. When reading the string, any unpaired surrogate characters are replaced with the replacement character (U+FFFD), and when writing to the byte writer, any characters that can't be encoded are replaced with the byte 0x3f (the question mark character).

In the .NET implementation, this method is implemented as an extension method to any String object and can be called as follows: str.EncodeToWriter(enc, writer). If the object's class already has an EncodeToWriter method with the same parameters, that method takes precedence over this extension method.

Parameters:

Exceptions:

EncodeToWriter

public static void EncodeToWriter(
    this string str,
    PeterO.Text.ICharacterEncoding enc,
    System.IO.Stream output);

Converts a text string to bytes and writes the bytes to an output data stream. When reading the string, any unpaired surrogate characters are replaced with the replacement character (U+FFFD), and when writing to the data stream, any characters that can't be encoded are replaced with the byte 0x3f (the question mark character).

In the .NET implementation, this method is implemented as an extension method to any String object and can be called as follows: str.EncodeToWriter(enc, output). If the object's class already has an EncodeToWriter method with the same parameters, that method takes precedence over this extension method.

Parameters:

Exceptions:

GetDecoderInput

public static PeterO.Text.ICharacterInput GetDecoderInput(
    this PeterO.Text.ICharacterEncoding encoding,
    PeterO.IByteReader stream);

Converts a character encoding into a character input stream, given a streamable source of bytes. The input stream doesn't check the first few bytes for a byte-order mark indicating a Unicode encoding such as UTF-8; it always uses the given character encoding's decoder. In the .NET implementation, this method is implemented as an extension method to any object implementing ICharacterEncoding and can be called as follows: encoding.GetDecoderInput(stream). If the object's class already has a GetDecoderInput method with the same parameters, that method takes precedence over this extension method.

Parameters:

Return Value:

An ICharacterInput object.

Exceptions:

GetDecoderInput

public static PeterO.Text.ICharacterInput GetDecoderInput(
    this PeterO.Text.ICharacterEncoding encoding,
    System.IO.Stream input);

Converts a character encoding into a character input stream, given a data stream. The input stream doesn't check the first few bytes for a byte-order mark indicating a Unicode encoding such as UTF-8; it always uses the given character encoding's decoder.

In the .NET implementation, this method is implemented as an extension method to any object implementing ICharacterEncoding and can be called as follows: encoding.GetDecoderInput(input). If the object's class already has a GetDecoderInput method with the same parameters, that method takes precedence over this extension method.

Parameters:

Return Value:

An ICharacterInput object.

Exceptions:

GetDecoderInputSkipBom

public static PeterO.Text.ICharacterInput GetDecoderInputSkipBom(
    this PeterO.Text.ICharacterEncoding encoding,
    PeterO.IByteReader stream);

Converts a character encoding into a character input stream, given a streamable source of bytes. However, if the input stream starts with a UTF-8 or UTF-16 byte order mark, the input is decoded as UTF-8 or UTF-16, as the case may be, rather than in the given character encoding. This method implements the "decode" algorithm specified in the Encoding Standard.

In the .NET implementation, this method is implemented as an extension method to any object implementing ICharacterEncoding and can be called as follows: encoding.GetDecoderInputSkipBom(stream). If the object's class already has a GetDecoderInputSkipBom method with the same parameters, that method takes precedence over this extension method.
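The byte-order-mark check at the start of the Encoding Standard's "decode" algorithm can be sketched as follows. The helper works on a byte array rather than a stream for brevity, and its name and return shape are hypothetical; it returns the encoding to use and how many BOM bytes to skip.

```csharp
public static class BomSniffSketch
{
    // BOM check from the Encoding Standard's "decode" algorithm:
    // EF BB BF selects UTF-8, FE FF selects UTF-16BE, FF FE selects UTF-16LE.
    // Without a BOM, the caller's fallback encoding is used and nothing is skipped.
    public static (string Name, int BomLength) Sniff(byte[] data, string fallback)
    {
        if (data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
            return ("UTF-8", 3);
        if (data.Length >= 2 && data[0] == 0xFE && data[1] == 0xFF)
            return ("UTF-16BE", 2);
        if (data.Length >= 2 && data[0] == 0xFF && data[1] == 0xFE)
            return ("UTF-16LE", 2);
        return (fallback, 0);
    }
}
```

The decoder then starts reading after BomLength bytes, so the BOM itself never appears in the decoded text.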

Parameters:

Return Value:

An ICharacterInput object.

GetDecoderInputSkipBom

public static PeterO.Text.ICharacterInput GetDecoderInputSkipBom(
    this PeterO.Text.ICharacterEncoding encoding,
    System.IO.Stream input);

Converts a character encoding into a character input stream, given a readable data stream. However, if the input stream starts with a UTF-8 or UTF-16 byte order mark, the input is decoded as UTF-8 or UTF-16, as the case may be, rather than in the given character encoding. This method implements the "decode" algorithm specified in the Encoding Standard. In the .NET implementation, this method is implemented as an extension method to any object implementing ICharacterEncoding and can be called as follows: encoding.GetDecoderInputSkipBom(input). If the object's class already has a GetDecoderInputSkipBom method with the same parameters, that method takes precedence over this extension method.

Parameters:

Return Value:

An ICharacterInput object.

GetEncoding

public static PeterO.Text.ICharacterEncoding GetEncoding(
    string name);

Returns a character encoding from the given name.

Parameters:

Return Value:

An object implementing a character encoding (gives access to an encoder and a decoder).

GetEncoding

public static PeterO.Text.ICharacterEncoding GetEncoding(
    string name,
    bool forEmail);

Returns a character encoding from the given name.

Parameters:

Return Value:

An object that enables encoding and decoding text in the given character encoding. Returns null if the name is null or empty, or if it names an unrecognized or unsupported encoding.

GetEncoding

public static PeterO.Text.ICharacterEncoding GetEncoding(
    string name,
    bool forEmail,
    bool allowReplacement);

Returns a character encoding from the given name.

Parameters:

Return Value:

An object that enables encoding and decoding text in the given character encoding. Returns null if the name is null or empty, or if it names an unrecognized or unsupported encoding.

InputToString

public static string InputToString(
    this PeterO.Text.ICharacterInput reader);

Reads Unicode characters from a character input and converts them to a text string. In the .NET implementation, this method is implemented as an extension method to any object implementing ICharacterInput and can be called as follows: reader.InputToString(). If the object's class already has an InputToString method with the same parameters, that method takes precedence over this extension method.
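Turning a character input back into a string means converting code points into UTF-16 code units, with code points above U+FFFF becoming surrogate pairs. The sketch below models the input as a delegate that returns the next code point or -1 at the end; that delegate is an assumption for illustration, not the ICharacterInput interface itself, and the names are hypothetical.

```csharp
using System;
using System.Text;

public static class InputToStringSketch
{
    // Drains a source of code points (terminated by a negative value) into a
    // string. Code points above U+FFFF are appended as surrogate pairs.
    public static string Drain(Func<int> readCodePoint)
    {
        var sb = new StringBuilder();
        while (true)
        {
            int cp = readCodePoint();
            if (cp < 0) break; // end of input
            if (cp <= 0xFFFF)
            {
                sb.Append((char)cp);
            }
            else
            {
                sb.Append(char.ConvertFromUtf32(cp)); // two UTF-16 code units
            }
        }
        return sb.ToString();
    }
}
```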

Parameters:

Return Value:

A text string containing the characters read.

Exceptions:

ResolveAlias

public static string ResolveAlias(
    string name);

Resolves a character encoding's name to a standard form. This involves changing aliases of a character encoding to a standardized name. In several Internet specifications, this name is known as a "charset" parameter. In HTML and HTTP, for example, the "charset" parameter indicates the encoding used to represent text in the HTML page, text file, etc.
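Under the Encoding Standard, resolving a label means trimming leading and trailing ASCII whitespace, lowercasing ASCII letters, and looking the result up in a table of labels. The sketch below uses a small excerpt of that table, including the UTF-16 to UTF-16LE alias; the helper is hypothetical, and the exact names PeterO.Text returns may differ from the Encoding Standard names used here.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class AliasSketch
{
    // A tiny excerpt of the Encoding Standard's label table; the real table
    // has far more labels and encodings.
    private static readonly Dictionary<string, string> Labels =
        new Dictionary<string, string>(StringComparer.Ordinal)
        {
            ["utf-8"] = "UTF-8",
            ["utf8"] = "UTF-8",
            ["unicode-1-1-utf-8"] = "UTF-8",
            ["utf-16"] = "UTF-16LE",
            ["utf-16le"] = "UTF-16LE",
            ["utf-16be"] = "UTF-16BE",
            ["latin1"] = "windows-1252",
            ["iso-8859-1"] = "windows-1252",
            ["us-ascii"] = "windows-1252",
        };

    // Trims ASCII whitespace, lowercases ASCII letters only, and looks the
    // label up. Null, empty, or unknown names yield the empty string.
    public static string Resolve(string name)
    {
        if (string.IsNullOrEmpty(name)) return "";
        string label = name.Trim('\t', '\n', '\f', '\r', ' ');
        var sb = new StringBuilder(label.Length);
        foreach (char c in label)
        {
            sb.Append(c >= 'A' && c <= 'Z' ? (char)(c + 32) : c);
        }
        return Labels.TryGetValue(sb.ToString(), out string canonical) ? canonical : "";
    }
}
```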

Parameters:

The UTF-8, UTF-16LE, and UTF-16BE encodings don't encode a byte-order mark at the start of the text (doing so is not recommended for UTF-8, while in UTF-16LE and UTF-16BE, the byte-order mark character U+FEFF is treated as an ordinary character, unlike in the UTF-16 encoding form). The Encoding Standard aliases UTF-16 to UTF-16LE "to deal with deployed content".

Return Value:

A standardized name for the encoding. Returns the empty string if name is null or empty, or if the encoding name is unsupported.

ResolveAliasForEmail

public static string ResolveAliasForEmail(
    string name);

Resolves a character encoding's name to a canonical form, using rules more suitable for email.

Parameters:

Return Value:

A standardized name for the encoding. Returns the empty string if name is null or empty, or if the encoding name is unsupported.

StringToBytes

public static byte[] StringToBytes(
    this PeterO.Text.ICharacterEncoder encoder,
    string str);

Converts a text string to a byte array using the given character encoder. When reading the string, any unpaired surrogate characters are replaced with the replacement character (U+FFFD), and when writing to the byte array, any characters that can't be encoded are replaced with the byte 0x3f (the question mark character).

In the .NET implementation, this method is implemented as an extension method to any object implementing ICharacterEncoder and can be called as follows: encoder.StringToBytes(str). If the object's class already has a StringToBytes method with the same parameters, that method takes precedence over this extension method.

Parameters:

Return Value:

A byte array.

Exceptions:

StringToBytes

public static byte[] StringToBytes(
    this PeterO.Text.ICharacterEncoding encoding,
    string str);

Converts a text string to a byte array encoded in a given character encoding. When reading the string, any unpaired surrogate characters are replaced with the replacement character (U+FFFD), and when writing to the byte array, any characters that can't be encoded are replaced with the byte 0x3f (the question mark character). In the .NET implementation, this method is implemented as an extension method to any object implementing ICharacterEncoding and can be called as follows: encoding.StringToBytes(str). If the object's class already has a StringToBytes method with the same parameters, that method takes precedence over this extension method.

Parameters:

Return Value:

A byte array containing the string encoded in the given text encoding.

Exceptions:

StringToInput

public static PeterO.Text.ICharacterInput StringToInput(
    this string str);

Converts a text string to a character input. The resulting input can then be used to encode the text to bytes, or to read the string code point by code point, among other things. When reading the string, any unpaired surrogate characters are replaced with the replacement character (U+FFFD).

In the .NET implementation, this method is implemented as an extension method to any String object and can be called as follows: str.StringToInput(). If the object's class already has a StringToInput method with the same parameters, that method takes precedence over this extension method.

Parameters:

Return Value:

An ICharacterInput object.

Exceptions:

StringToInput

public static PeterO.Text.ICharacterInput StringToInput(
    this string str,
    int offset,
    int length);

Converts a portion of a text string to a character input. The resulting input can then be used to encode the text to bytes, or to read the string code point by code point, among other things. When reading the string, any unpaired surrogate characters are replaced with the replacement character (U+FFFD).

In the .NET implementation, this method is implemented as an extension method to any String object and can be called as follows: str.StringToInput(offset, length). If the object's class already has a StringToInput method with the same parameters, that method takes precedence over this extension method.
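Reading a portion of a string code point by code point can be sketched as below. The helper is hypothetical: it validates the range, combines surrogate pairs, and replaces unpaired surrogates with U+FFFD. Treating a pair split by the range boundary as two unpaired surrogates is a choice made for this sketch, not documented library behavior.

```csharp
using System;
using System.Collections.Generic;

public static class CodePointSketch
{
    // Returns the code points of str[offset .. offset + length), replacing
    // each unpaired surrogate with U+FFFD. A pair is combined only when both
    // halves fall inside the requested range.
    public static List<int> CodePoints(string str, int offset, int length)
    {
        if (str == null) throw new ArgumentNullException(nameof(str));
        if (offset < 0 || length < 0 || offset > str.Length - length)
            throw new ArgumentException("offset and length must describe a range inside str");
        var result = new List<int>(length);
        int end = offset + length;
        for (int i = offset; i < end; i++)
        {
            char c = str[i];
            if (char.IsHighSurrogate(c) && i + 1 < end && char.IsLowSurrogate(str[i + 1]))
            {
                result.Add(char.ConvertToUtf32(c, str[i + 1]));
                i++;
            }
            else
            {
                result.Add(char.IsSurrogate(c) ? 0xFFFD : c);
            }
        }
        return result;
    }
}
```

For "x\uD83D\uDE00y", the range (1, 2) yields the single code point 0x1F600, while (1, 1) yields 0xFFFD.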

Parameters:

Return Value:

An ICharacterInput object.

Exceptions:
