## PeterO.Text.NormalizingCharacterInput
public sealed class NormalizingCharacterInput : PeterO.Text.ICharacterInput
Deprecated. Renamed to NormalizerInput.
A character input class that implements the Unicode normalization algorithm and contains methods and functionality to test and convert text strings for normalization. This is similar to the deprecated Normalizer class, except it implements the ICharacterInput interface.
- NFD (Normalization Form D) decomposes combined forms to their constituent characters (E plus acute, for example), then reorders combining marks to a standardized order. This is called canonical decomposition.
- NFC does canonical decomposition, then combines certain constituent characters to their composites (E-acute, for example). This is called canonical composition.
- Two normalization forms, NFKC and NFKD, are similar to NFC and NFD, except they also “decompose” certain characters, such as ligatures, font or positional variants, and subscripts, whose visual distinction can matter in some contexts. This is called compatibility decomposition.
For more information, see Unicode Standard Annex #15 at http://www.unicode.org/reports/tr15/.
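To make the difference between the forms concrete, the following sketch prints the UTF-16 code units produced by each form for two sample strings. Note that it uses .NET's built-in `string.Normalize` rather than this class, purely so the example runs without any extra package; `NormalizationFormsDemo` and `CodeUnits` are illustrative names, not part of this library.

```csharp
using System;
using System.Text;

public static class NormalizationFormsDemo
{
    // Formats each UTF-16 code unit of s as U+XXXX, separated by spaces.
    public static string CodeUnits(string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s)
        {
            if (sb.Length > 0)
            {
                sb.Append(' ');
            }
            sb.Append("U+").Append(((int)c).ToString("X4"));
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string eAcute = "\u00C9";   // precomposed E-acute
        string ligature = "\uFB01"; // the "fi" ligature

        // NFD: canonical decomposition splits E-acute into E plus combining acute.
        Console.WriteLine(CodeUnits(eAcute.Normalize(NormalizationForm.FormD)));
        // prints "U+0045 U+0301"

        // NFC: canonical decomposition followed by canonical composition.
        Console.WriteLine(CodeUnits("E\u0301".Normalize(NormalizationForm.FormC)));
        // prints "U+00C9"

        // NFC leaves the ligature alone; only the compatibility forms expand it.
        Console.WriteLine(CodeUnits(ligature.Normalize(NormalizationForm.FormC)));
        // prints "U+FB01"
        Console.WriteLine(CodeUnits(ligature.Normalize(NormalizationForm.FormKC)));
        // prints "U+0066 U+0069"
    }
}
```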
Thread safety: This class is mutable; its properties can be changed. None of its instance methods are designed to be thread safe. Therefore, access to objects from this class must be synchronized if multiple threads can access them at the same time.
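One way to follow the synchronization advice above is to route every read through a single lock. The following is a minimal sketch, not part of the library: `SynchronizedCharacterInput` is a hypothetical wrapper name, and the code assumes that the PeterO.Encoding package is referenced and that `ICharacterInput` declares exactly the `ReadChar` and `Read` methods documented on this page.

```csharp
using System;
using PeterO.Text;

// Hypothetical wrapper (not part of the library): serializes all reads
// on a wrapped ICharacterInput through one private lock object.
public sealed class SynchronizedCharacterInput : ICharacterInput
{
    private readonly ICharacterInput inner;
    private readonly object syncRoot = new object();

    public SynchronizedCharacterInput(ICharacterInput inner)
    {
        this.inner = inner ?? throw new ArgumentNullException(nameof(inner));
    }

    // Reads one code point while holding the lock.
    public int ReadChar()
    {
        lock (this.syncRoot)
        {
            return this.inner.ReadChar();
        }
    }

    // Reads up to length code points into chars while holding the lock.
    public int Read(int[] chars, int index, int length)
    {
        lock (this.syncRoot)
        {
            return this.inner.Read(chars, index, length);
        }
    }
}
```

The lock only makes each individual call atomic. Callers that need several reads to stay together (for example, a whole line of text) would still have to coordinate at a higher level.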
NOTICE: While this class’s source code is in the public domain, the class uses an internal class, called NormalizationData, that includes data derived from the Unicode Character Database. In case doing so is required, the permission notice for the Unicode Character Database is given here:
COPYRIGHT AND PERMISSION NOTICE.
Copyright (c) 1991-2014 Unicode, Inc. All rights reserved. Distributed under the Terms of Use in http://www.unicode.org/copyright.html.
Permission is hereby granted, free of charge, to any person obtaining a copy of the Unicode data files and any associated documentation (the “Data Files”) or Unicode software and any associated documentation (the “Software”) to deal in the Data Files or Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, and/or sell copies of the Data Files or Software, and to permit persons to whom the Data Files or Software are furnished to do so, provided that (a) this copyright and permission notice appear with all copies of the Data Files or Software, (b) this copyright and permission notice appear in associated documentation, and (c) there is clear notice in each modified Data File or in the Software as well as in the documentation associated with the Data File(s) or Software that the data or software has been modified.
THE DATA FILES AND SOFTWARE ARE PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF THIRD PARTY RIGHTS. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS NOTICE BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THE DATA FILES OR SOFTWARE.
Except as contained in this notice, the name of a copyright holder shall not be used in advertising or otherwise to promote the sale, use or other dealings in these Data Files or Software without prior written authorization of the copyright holder.
### Member Summary
- [GetChars(PeterO.Text.ICharacterInput, PeterO.Text.Normalization)](#GetChars_PeterO_Text_ICharacterInput_PeterO_Text_Normalization) - Deprecated: Instead of this method, create a NormalizerInput on the input and call ReadChar to get the normalized string’s code points.
- [GetChars(string, PeterO.Text.Normalization)](#GetChars_string_PeterO_Text_Normalization) - Deprecated: Instead of this method, create a NormalizerInput on the string and call ReadChar to get the normalized string’s code points.
- [IsNormalized(int[], PeterO.Text.Normalization)](#IsNormalized_int_PeterO_Text_Normalization) - Deprecated: Either convert the array to a string or wrap it in an ICharacterInput and call the corresponding overload instead.
- [IsNormalized(PeterO.Text.ICharacterInput, PeterO.Text.Normalization)](#IsNormalized_PeterO_Text_ICharacterInput_PeterO_Text_Normalization) - Determines whether the text provided by a character input is normalized.
- [IsNormalized(string, PeterO.Text.Normalization)](#IsNormalized_string_PeterO_Text_Normalization) - Determines whether the given string is in the given Unicode normalization form.
- [IsNormalized(System.Collections.Generic.IList, PeterO.Text.Normalization)](#IsNormalized_System_Collections_Generic_IList_PeterO_Text_Normalization) - Deprecated: Either convert the list to a string or wrap it in an ICharacterInput and call the corresponding overload instead.
- [Normalize(string, PeterO.Text.Normalization)](#Normalize_string_PeterO_Text_Normalization) - Converts a string to the given Unicode normalization form.
- [Read(int[], int, int)](#Read_int_int_int) - Reads a sequence of Unicode code points from a data source.
- [ReadChar()](#ReadChar) - Reads a Unicode character from a data source.
### NormalizingCharacterInput Constructor
public NormalizingCharacterInput( PeterO.Text.ICharacterInput input);
Initializes a new instance of the PeterO.Text.NormalizingCharacterInput class.
Parameters:
- input: The parameter input is an ICharacterInput object.
### NormalizingCharacterInput Constructor
public NormalizingCharacterInput( PeterO.Text.ICharacterInput stream, PeterO.Text.Normalization form);
Initializes a new instance of the PeterO.Text.NormalizingCharacterInput class.
Parameters:
- stream: The parameter stream is an ICharacterInput object.
- form: The parameter form is a Normalization object.
### NormalizingCharacterInput Constructor
public NormalizingCharacterInput( string str);
Initializes a new instance of the PeterO.Text.NormalizingCharacterInput class.
Parameters:
- str: The parameter str is a text string.
### NormalizingCharacterInput Constructor
public NormalizingCharacterInput( string str, int index, int length, PeterO.Text.Normalization form);
Initializes a new instance of the PeterO.Text.NormalizingCharacterInput class.
Parameters:
- str: The parameter str is a text string.
- index: An index, starting at 0, showing where the desired portion of str begins.
- length: The length, in code units, of the desired portion of str (but not more than str's length).
- form: The parameter form is a Normalization object.
Exceptions:
- System.ArgumentException: Either index or length is less than 0 or greater than str's length, or str's length minus index is less than length.
- System.ArgumentNullException: The parameter str is null.
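As an illustration of the index and length parameters, the sketch below normalizes only a portion of a longer string and shows the documented ArgumentException for an out-of-range length. It assumes the PeterO.Encoding package is referenced and that the Normalization enumeration has a member named NFC (the member names are not listed on this page); `SubstringNormalizationDemo` and `NormalizePortion` are hypothetical names.

```csharp
using System;
using System.Text;
using PeterO.Text;

public static class SubstringNormalizationDemo
{
    // Normalizes the length code units of str that start at index
    // and returns the normalized portion as a string.
    public static string NormalizePortion(
        string str, int index, int length, Normalization form)
    {
        var input = new NormalizingCharacterInput(str, index, length, form);
        var sb = new StringBuilder();
        int codePoint;
        // ReadChar returns -1 at the end of the source.
        while ((codePoint = input.ReadChar()) >= 0)
        {
            sb.Append(char.ConvertFromUtf32(codePoint));
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // "E" and the combining acute accent sit at indexes 3 and 4.
        string text = "id=E\u0301;";
        string part = NormalizePortion(text, 3, 2, Normalization.NFC);
        Console.WriteLine(part == "\u00C9"); // the portion composes to E-acute

        try
        {
            // length exceeds the string's length, so the constructor rejects it.
            NormalizePortion(text, 3, 99, Normalization.NFC);
        }
        catch (ArgumentException ex)
        {
            Console.WriteLine("Rejected: " + ex.GetType().Name);
        }
    }
}
```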
### NormalizingCharacterInput Constructor
public NormalizingCharacterInput( string str, PeterO.Text.Normalization form);
Initializes a new instance of the PeterO.Text.NormalizingCharacterInput class.
Parameters:
- str: The parameter str is a text string.
- form: The parameter form is a Normalization object.
Exceptions:
- System.ArgumentNullException: The parameter str is null.
### NormalizingCharacterInput Constructor
public NormalizingCharacterInput( System.Collections.Generic.IList characterList);
Initializes a new instance of the PeterO.Text.NormalizingCharacterInput class.
Parameters:
- characterList: The parameter characterList is an IList object.
### NormalizingCharacterInput Constructor
public NormalizingCharacterInput( System.Collections.Generic.IList characterList, PeterO.Text.Normalization form);
Initializes a new instance of the PeterO.Text.NormalizingCharacterInput class.
Parameters:
- characterList: The parameter characterList is an IList object.
- form: The parameter form is a Normalization object.
### GetChars
public static System.Collections.Generic.IList GetChars( PeterO.Text.ICharacterInput chars, PeterO.Text.Normalization form);
Deprecated. Instead of this method, create a NormalizerInput on the input and call ReadChar to get the normalized string’s code points.
Gets a list of normalized code points after reading from a character stream.
Parameters:
- chars: An object that implements a stream of Unicode characters.
- form: Specifies the normalization form to use when normalizing the text.
Return Value:
A list of the normalized Unicode characters.
Exceptions:
- System.ArgumentNullException: The parameter chars is null.
### GetChars
public static System.Collections.Generic.IList GetChars( string str, PeterO.Text.Normalization form);
Deprecated. Instead of this method, create a NormalizerInput on the string and call ReadChar to get the normalized string’s code points.
Gets a list of normalized code points after reading from a string.
Parameters:
- str: The parameter str is a text string.
- form: Specifies the normalization form to use when normalizing the text.
Return Value:
A list of the normalized Unicode characters.
Exceptions:
- System.ArgumentNullException: The parameter str is null.
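The replacement suggested by the deprecation notes (create a NormalizerInput and call ReadChar) might look like the sketch below. Treat it as an assumption-laden illustration: this page does not document NormalizerInput's constructors, so the `NormalizerInput(string, Normalization)` call is assumed to mirror the corresponding constructor above, and the enumeration member `NFD` is likewise assumed. `GetCharsReplacement` and `ReadAllCodePoints` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using PeterO.Text;

public static class GetCharsReplacement
{
    // Collects every code point from input until ReadChar signals the end (-1).
    public static IList<int> ReadAllCodePoints(ICharacterInput input)
    {
        if (input == null)
        {
            throw new ArgumentNullException(nameof(input));
        }
        var codePoints = new List<int>();
        int codePoint;
        while ((codePoint = input.ReadChar()) >= 0)
        {
            codePoints.Add(codePoint);
        }
        return codePoints;
    }

    public static void Main()
    {
        // Assumption: NormalizerInput offers a (string, Normalization) constructor.
        var input = new NormalizerInput("\u00C9", Normalization.NFD);
        foreach (int codePoint in ReadAllCodePoints(input))
        {
            // NFD of U+00C9 is U+0045 followed by U+0301.
            Console.WriteLine(codePoint.ToString("X4"));
        }
    }
}
```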
### IsNormalized
public static bool IsNormalized( int[] charArray, PeterO.Text.Normalization form);
Deprecated. Either convert the array to a string or wrap it in an ICharacterInput and call the corresponding overload instead.
Determines whether the given array of characters is in the given Unicode normalization form.
Parameters:
- charArray: An array of Unicode code points.
- form: Specifies the normalization form to check.
Return Value:
true if the given array of characters is in the given Unicode normalization form; otherwise, false.
Exceptions:
- System.ArgumentNullException: The parameter charArray is null.
### IsNormalized
public static bool IsNormalized( PeterO.Text.ICharacterInput chars, PeterO.Text.Normalization form);
Determines whether the text provided by a character input is normalized.
Parameters:
- chars: An object that implements a streamable character input.
- form: Specifies the normalization form to check.
Return Value:
true if the text is in the given normalization form; otherwise, false.
Exceptions:
- System.ArgumentNullException: The parameter chars is null.
### IsNormalized
public static bool IsNormalized( string str, PeterO.Text.Normalization form);
Determines whether the given string is in the given Unicode normalization form.
Parameters:
- str: An arbitrary string.
- form: Specifies the normalization form to check.
Return Value:
true if the given string is in the given Unicode normalization form; otherwise, false. Returns false if the string contains an unpaired surrogate code point.
Exceptions:
- System.ArgumentNullException: The parameter str is null.
### IsNormalized
public static bool IsNormalized( System.Collections.Generic.IList charList, PeterO.Text.Normalization form);
Deprecated. Either convert the list to a string or wrap it in an ICharacterInput and call the corresponding overload instead.
Determines whether the given list of characters is in the given Unicode normalization form.
Parameters:
- charList: A list of Unicode code points.
- form: Specifies the normalization form to check.
Return Value:
true if the given list of characters is in the given Unicode normalization form; otherwise, false.
Exceptions:
- System.ArgumentNullException: The parameter charList is null.
### Normalize
public static string Normalize( string str, PeterO.Text.Normalization form);
Converts a string to the given Unicode normalization form.
Parameters:
- str: An arbitrary string.
- form: The Unicode normalization form to convert to.
Return Value:
The parameter str converted to the given normalization form.
Exceptions:
- System.ArgumentException: The parameter str contains an unpaired surrogate code point.
- System.ArgumentNullException: The parameter str is null.
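The documented behavior of IsNormalized and Normalize can be combined into a small helper, sketched below, that skips the conversion when the string is already normalized and treats unpaired surrogates as a failure instead of letting the exception escape. The sketch assumes the PeterO.Encoding package is referenced and that the Normalization enumeration has a member named NFC; `NormalizeDemo` and `TryToNfc` are hypothetical names.

```csharp
using System;
using PeterO.Text;

public static class NormalizeDemo
{
    // Returns str in NFC, or null if str contains an unpaired surrogate.
    public static string TryToNfc(string str)
    {
        if (str == null)
        {
            throw new ArgumentNullException(nameof(str));
        }
        // Already normalized: no conversion needed.
        if (NormalizingCharacterInput.IsNormalized(str, Normalization.NFC))
        {
            return str;
        }
        try
        {
            return NormalizingCharacterInput.Normalize(str, Normalization.NFC);
        }
        catch (ArgumentException)
        {
            // Documented case: str contains an unpaired surrogate code point.
            return null;
        }
    }

    public static void Main()
    {
        // E plus combining acute composes to the single character E-acute.
        Console.WriteLine(TryToNfc("E\u0301") == "\u00C9");
        // A lone high surrogate cannot be normalized by Normalize.
        Console.WriteLine(TryToNfc("\uD800") == null);
    }
}
```

Returning null is only one possible design. Replacing the offending code units before normalizing, or letting the exception propagate, may suit other callers better.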
### Read
public sealed int Read( int[] chars, int index, int length);
Reads a sequence of Unicode code points from a data source.
Parameters:
- chars: Output buffer.
- index: An index starting at 0 showing where the desired portion of chars begins.
- length: The number of elements in the desired portion of chars (but not more than chars's length).
Return Value:
The number of Unicode code points read, or 0 if the end of the source is reached.
Exceptions:
- System.ArgumentNullException: The parameter chars is null.
- System.ArgumentException: Either index or length is less than 0 or greater than chars's length, or chars's length minus index is less than length.
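A typical way to use Read is to drain the source in fixed-size chunks until it returns 0, as in the sketch below. It assumes the PeterO.Encoding package is referenced and that the Normalization enumeration has a member named NFKC; `BufferedReadDemo`, `ReadAll`, and the buffer size of 64 are illustrative choices, not anything the library prescribes.

```csharp
using System;
using System.Collections.Generic;
using PeterO.Text;

public static class BufferedReadDemo
{
    // Drains input in chunks; Read returns 0 once the source is exhausted.
    public static List<int> ReadAll(ICharacterInput input)
    {
        var result = new List<int>();
        var buffer = new int[64];
        int count;
        while ((count = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Only the first count elements of the buffer are valid.
            for (int i = 0; i < count; i++)
            {
                result.Add(buffer[i]);
            }
        }
        return result;
    }

    public static void Main()
    {
        // NFKC expands the "fi" ligature (U+FB01) into "f" and "i".
        var input = new NormalizingCharacterInput("\uFB01", Normalization.NFKC);
        List<int> codePoints = ReadAll(input);
        Console.WriteLine(codePoints.Count); // prints "2"
    }
}
```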
### ReadChar
public sealed int ReadChar();
Reads a Unicode character from a data source.
Return Value:
Either a Unicode code point (from 0 through 0xd7ff or from 0xe000 through 0x10ffff), or the value -1 indicating the end of the source.