com.upokecenter.text.NormalizerInput

com.upokecenter.text.NormalizerInput

public final class NormalizerInput extends Object implements com.upokecenter.text.ICharacterInput

A character input class that implements the Unicode normalization algorithm and contains methods and functionality to test and convert text strings for normalization. This is similar to the deprecated Normalizer class, except it implements the ICharacterInput interface.

The Unicode Standard includes characters, such as an acute accent, that can be combined with other characters to make new characters. For example, the letter E combines with an acute accent to make E-acute (É). In some cases, the combined form (E-acute) should be treated as equivalent to the uncombined form (E plus acute). For this reason, the standard defines four normalization forms that convert strings to a single equivalent form:

For more information, see Standard Annex 15 at http://www.unicode.org/reports/tr15/.

Thread safety: This class is mutable; its properties can be changed. None of its instance methods are designed to be thread safe. Therefore, access to objects from this class must be synchronized if multiple threads can access them at the same time.

NOTICE: While this class's source code is in the public domain, the class uses an class, called NormalizationData, that includes data derived from the Unicode Character Database. In case doing so is required, the permission notice for the Unicode Character Database is given here:

COPYRIGHT AND PERMISSION NOTICE.

Copyright (c) 1991-2014 Unicode, Inc. All rights reserved. Distributed under the Terms of Use in http://www.unicode.org/copyright.html.

Permission is hereby granted, free of charge, to any person obtaining a copy of the Unicode data files and any associated documentation (the "Data Files") or Unicode software and any associated documentation (the "Software") to deal in the Data Files or Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, and/or sell copies of the Data Files or Software, and to permit persons to whom the Data Files or Software are furnished to do so, provided that (a) this copyright and permission notice appear with all copies of the Data Files or Software, (b) this copyright and permission notice appear in associated documentation, and (c) there is clear notice in each modified Data File or in the Software as well as in the documentation associated with the Data File(s) or Software that the data or software has been modified.

THE DATA FILES AND SOFTWARE ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF THIRD PARTY RIGHTS. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR HOLDERS INCLUDED IN THIS NOTICE BE LIABLE FOR ANY CLAIM, OR ANY SPECIAL INDIRECT OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THE DATA FILES OR SOFTWARE.

Except as contained in this notice, the name of a copyright holder shall not be used in advertising or otherwise to promote the sale, use or other dealings in these Data Files or Software without prior written authorization of the copyright holder.

Constructors

Methods

Method Details

IsNormalized

public static boolean IsNormalized(com.upokecenter.text.ICharacterInput chars, Normalization form)

Determines whether the text provided by a character input is normalized.

Parameters:

Returns:

Throws:

Normalize

public static String Normalize(String str, Normalization form)

Converts a string to the given Unicode normalization form.

Parameters:

Returns:

Throws:

IsNormalized

public static boolean IsNormalized(String str, Normalization form)

Determines whether the given string is in the given Unicode normalization form.

Parameters:

Returns:

Throws:

ReadChar

public int ReadChar()

Reads a Unicode character from a data source.

Specified by:

Returns:

Read

public int Read(int[] chars, int index, int length)

Reads a sequence of Unicode code points from a data source.

Specified by:

Parameters:

Returns:

Throws:

Back to MailLib start page.