Changes between Version 3 and Version 4 of CharsetSupport


Timestamp:
04/05/10 18:31:03 (13 years ago)
Author:
ehuelsmann
Comment:

Be more explicit about the problems between Unicode and CLHS

  • CharsetSupport

    v3 v4  
    2222=== Conflicts between the CLHS and the Unicode standard ===
    2323
    24 The CLHS specifies that characters may have 'case'. When characters have case they are required to exist in pairs: an upper case and a lower case variant. Unicode does not satisfy this requirement. As an example, the characters LATIN SMALL LETTER I and LATIN SMALL LETTER DOTLESS I both map to LATIN CAPITAL LETTER I.
     24The CLHS specifies that characters may have 'case'. Characters that have case are required to exist in pairs: an uppercase and a lowercase variant. For example, the lowercase character #\a is uniquely associated with the uppercase character #\A: converting #\a to uppercase always returns #\A, and converting #\A to lowercase always returns #\a. Unicode does not satisfy this requirement. As an example, the characters LATIN SMALL LETTER I and LATIN SMALL LETTER DOTLESS I both map to LATIN CAPITAL LETTER I.
    2525
    2626Other examples are the eszett (LATIN SMALL LETTER SHARP S), which uppercases to "SS" (a two-character string), and the GREEK CAPITAL LETTER SIGMA, which lowercases to different characters depending on whether it is the last character in the converted word.
    2727
     28The eszett violates the CLHS requirement that case conversion takes exactly one character as input and produces exactly one character as output: uppercasing it produces two characters. The dotless i violates the CLHS requirement that characters with case are associated in pairs: its conversion set contains three characters (the dotless i, LATIN SMALL LETTER I and LATIN CAPITAL LETTER I).
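
The three conflicts described in this change can be reproduced in any environment that applies the Unicode case mappings to whole strings. The sketch below uses Python 3 purely as an illustration; it is an assumption that its built-in str.upper() and str.lower() are a fair stand-in for the Unicode mappings, and it says nothing about how a particular Lisp implementation behaves. All variable names are made up for this example.

```python
# Illustration only: Python 3 strings stand in for the Unicode case mappings.

# The CLHS model: #\a and #\A form a unique pair.
ascii_pair_holds = "a".upper() == "A" and "A".lower() == "a"

# Dotless i: two lowercase characters share one uppercase character.
dotted_i = "\u0069"    # LATIN SMALL LETTER I
dotless_i = "\u0131"   # LATIN SMALL LETTER DOTLESS I
capital_i = "\u0049"   # LATIN CAPITAL LETTER I
both_map_to_capital_i = (
    dotted_i.upper() == capital_i and dotless_i.upper() == capital_i
)
# Uppercasing and lowercasing again does not return the dotless i.
dotless_round_trip = dotless_i.upper().lower()

# Eszett: one character in, two characters out.
eszett = "\u00df"      # LATIN SMALL LETTER SHARP S
eszett_upper = eszett.upper()

# Capital sigma: the lowercase form depends on the position in the word.
sigma_at_start = "\u03a3\u0391".lower()   # sigma followed by alpha
sigma_at_end = "\u0391\u03a3".lower()     # alpha followed by sigma

print(ascii_pair_holds, both_map_to_capital_i, repr(dotless_round_trip))
print(repr(eszett_upper), len(eszett_upper))
print(repr(sigma_at_start), repr(sigma_at_end))
```

None of these results can be expressed by a function that maps one character to exactly one character and back, which is the shape the CLHS case functions assume.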