Re: [mkgmap-dev] [PATCH] Alpha code for Highway Symbols

6 Apr 2009

      On Mon, Apr 06, 2009 at 02:38:15PM +0200, Johann Gail wrote:
...
> \u syntax is Java syntax, and is *NOT* UTF-8 encoding!
Correct.  For example, \u2020 (the dagger symbol, †) would be
\xe2\x80\xa0 or \342\200\240 in the UTF-8 encoding and
\x20\x20 or \40\40 in UTF-16 (no matter if big or little endian,
in this case).  The octal and hex notations are 8-bit byte codes.
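
The byte values above can be checked with a minimal sketch. Python is used
here only for illustration (mkgmap itself is Java), and the helper names
as_hex and as_octal are hypothetical, made up for this example.

```python
# "\u2020" names the code point U+2020 (dagger); it says nothing about bytes.
dagger = "\u2020"

# The same code point serialized in different encodings.
utf8 = dagger.encode("utf-8")
utf16_be = dagger.encode("utf-16-be")
utf16_le = dagger.encode("utf-16-le")


def as_hex(data: bytes) -> str:
    """Render bytes in \\xHH notation (hypothetical helper)."""
    return "".join(f"\\x{b:02x}" for b in data)


def as_octal(data: bytes) -> str:
    """Render bytes in \\OOO octal notation (hypothetical helper)."""
    return "".join(f"\\{b:o}" for b in data)


print(as_hex(utf8))          # \xe2\x80\xa0
print(as_octal(utf8))        # \342\200\240
print(as_hex(utf16_be))      # \x20\x20
print(as_octal(utf16_be))    # \40\40
print(utf16_be == utf16_le)  # True: both bytes are 0x20, so byte order is invisible here
```

The last line only holds because the two bytes of U+2020 happen to be
equal; for most code points the big- and little-endian forms differ.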

I think that it is much more readable to write \u2020 for U+2020 than
\xe2\x80\xa0.  The \u notation will apparently also be in the next
C and C++ syntax.
...
> Both of them are Unicode, but the encoding scheme is different. At the
> moment it works fine, if you use an editor which can handle Unicode
> properly.
I'm not sure if I understand your comment.  I have understood that
java.lang.String uses something like UTF-16 internally.  I have never
seen a text file containing Unicode characters that would be encoded
in anything other than UTF-8.  As far as I understand, the MySQL database
(which I develop for a living) accepts UTF-16 string literals (called
"ucs2"), but the bug reports I've seen have always been in ASCII,
ISO 8859-1, or UTF-8.
...
> But it is a good idea, instead of introducing a new proprietary ~[xx]
> style, to use an existing standard, e.g. the \u4 notation.
That was exactly my point.  It should be trivial to implement all
three notations (\x hex bytes, \ octal bytes, \u hex unicode).
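
To give an idea of how little code that takes, here is a rough sketch of
a decoder for the three notations.  It is not mkgmap code: it is written
in Python for brevity, the function name decode_escapes is hypothetical,
and it assumes that runs of \x and octal byte escapes are meant to be
interpreted as UTF-8, which the thread does not settle.

```python
import re

# One alternative per notation: \uXXXX (4 hex digits), \xHH, \OOO (1-3 octal digits).
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\x([0-9A-Fa-f]{2})|\\([0-7]{1,3})")


def decode_escapes(text: str, byte_encoding: str = "utf-8") -> str:
    """Expand \\u, \\x and octal escapes in text (hypothetical helper).

    ASSUMPTION: consecutive byte escapes form one byte sequence that is
    decoded with byte_encoding; \\u escapes name a code point directly.
    """
    out = []
    pending = bytearray()  # bytes collected from adjacent \x / octal escapes

    def flush() -> None:
        if pending:
            out.append(pending.decode(byte_encoding))
            pending.clear()

    pos = 0
    for m in _ESCAPE.finditer(text):
        if m.start() > pos:
            # Literal text between escapes ends any pending byte run.
            flush()
            out.append(text[pos:m.start()])
        uni, hx, oc = m.groups()
        if uni is not None:
            flush()
            out.append(chr(int(uni, 16)))
        elif hx is not None:
            pending.append(int(hx, 16))
        else:
            # Octal values above \377 do not fit in a byte; append() raises ValueError.
            pending.append(int(oc, 8))
        pos = m.end()
    flush()
    out.append(text[pos:])
    return "".join(out)


# All three spellings of the dagger decode to the same character.
print(decode_escapes(r"\u2020") == decode_escapes(r"\xe2\x80\xa0"))    # True
print(decode_escapes(r"\u2020") == decode_escapes(r"\342\200\240"))    # True
```

A real implementation would still have to decide what a lone backslash
means and how to report invalid byte sequences; the sketch simply lets
the decoding error propagate.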

	Marko

Marko Mäkelä