Changes between Version 3 and Version 4 of CharsetSupport
- Timestamp: 04/05/10 18:31:03 (13 years ago)
CharsetSupport
v3  v4
22  22  === Conflicts between the CLHS and the Unicode standard ===
23  23
24      The CLHS specifies that characters may have 'case'. When characters have case they are required to exist in pairs: an upper case and a lower case variant. Unicode does not satisfy this requirement. As an example, the characters LATIN SMALL LETTER I and LATIN SMALL LETTER DOTLESS I both map to LATIN CAPITAL LETTER I.
    24  The CLHS specifies that characters may have 'case'. When characters have case they are required to exist in pairs: an upper case and a lower case variant. E.g., the lowercase character #\a is uniquely associated with the uppercase character #\A. Converting #\a to uppercase will always return #\A and the other way around, converting #\A to lower case. Unicode does not satisfy this requirement. As an example, the characters LATIN SMALL LETTER I and LATIN SMALL LETTER DOTLESS I both map to LATIN CAPITAL LETTER I.
25  25
26  26  Other examples are the ESZET character which uppercases to "SS" (a two character string) and the GREEK CAPITAL LETTER SIGMA which converts to different characters depending on whether it's the last character in the converted word.
27  27
    28  The ESZET violates the CLHS requirement that case conversion takes exactly one character as input and produces exactly one character on output: it produces 2 characters. The dotless i violates the CLHS requirement that characters are associated in pairs; after all, there are 3 characters in the conversion set of the dotless i.
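
The three Unicode behaviours described in the changed text can be observed from any runtime that applies Unicode's full case mappings. The sketch below uses Python 3 purely as an illustration; it is not part of the wiki page and says nothing about how a Lisp implementation handles these characters. The helper name upcase_is_one_to_one is hypothetical and exists only for this example.

```python
# Illustration only: Python's str.upper()/str.lower() apply Unicode's full
# case mappings, unlike the one-character-in, one-character-out conversion
# that the CLHS requires of characters with case.

DOTLESS_I = "\u0131"  # LATIN SMALL LETTER DOTLESS I
ESZET = "\u00df"      # LATIN SMALL LETTER SHARP S
SIGMA = "\u03a3"      # GREEK CAPITAL LETTER SIGMA
ALPHA = "\u0391"      # GREEK CAPITAL LETTER ALPHA, used only as a neighbour


def upcase_is_one_to_one(chars):
    """Return True if no two characters in chars share an uppercase form.

    Hypothetical helper for this example, not an existing API.
    """
    upper_forms = [c.upper() for c in chars]
    return len(set(upper_forms)) == len(upper_forms)


# Two distinct lowercase letters share one uppercase letter, so the
# characters cannot be associated in unique pairs.
i_upper = "i".upper()
dotless_upper = DOTLESS_I.upper()

# One input character becomes a two-character string.
eszet_upper = ESZET.upper()

# Capital sigma lowercases differently depending on its position in a word.
sigma_inside = (SIGMA + ALPHA).lower()  # sigma followed by another letter
sigma_final = (ALPHA + SIGMA).lower()   # sigma as the last letter

print(repr(i_upper), repr(dotless_upper))
print(repr(eszet_upper), len(eszet_upper))
print(sigma_inside != sigma_final[::-1])
```

The first two results show the pairing problem (both inputs give the same capital letter), the third shows the length change, and the last pair shows that the lowercase form of sigma depends on its context rather than on the character alone.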