
While reading the code, I stumbled upon code such as if ("unicode".equals(charset)) in src/uk/me/parabola/imgfmt/app/labelenc/CodeFunctions.java. mkgmap --help=options does not mention this option, but there’s a line #OSMDATA = --charset=unicode localtest/osm/czech_test.osm in the Makefile. So I tried this option, but what I got looks like all characters transliterated to plain ASCII. Oops? I guess what I actually want to ask: since we’ve seen that newer devices have a rich character repertoire containing Latin-based, Greek, Cyrillic and Arabic characters, I wonder whether it is only possible to access them via code pages, limiting us to a subset every time. Do you know whether there’s Unicode support in recent devices? rj
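To illustrate the "subset every time" limitation of code pages, here is a minimal sketch using only the standard Java charset API. The class and method names are made up for this illustration and are not part of mkgmap. Whether the Windows code pages are available depends on the JVM, so the helper reports false for charsets the runtime does not ship.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not mkgmap code: checks which charsets can
// represent a mixed-script sample string.
final class CodePageCoverage {
    /** Returns true if every character of text can be encoded in the named charset. */
    static boolean canEncode(String charsetName, String text) {
        if (!Charset.isSupported(charsetName)) {
            return false; // this JVM does not provide the charset at all
        }
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // a-umlaut, Greek alpha, Cyrillic Ya, Arabic beh
        String sample = "\u00e4\u03b1\u042f\u0628";
        String[] names = {"windows-1250", "windows-1251", "windows-1253", "windows-1256", "UTF-8"};
        for (String name : names) {
            System.out.println(name + " can encode the whole sample: " + canEncode(name, sample));
        }
    }
}
```

Each single-byte code page covers only part of the sample, while UTF-8 covers all of it, which is why a Unicode-encoded map would avoid choosing one script per map.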

On 13-02-26 23:33:44 CET, Robert Joop wrote:
Do you know whether there’s unicode support in recent devices?
Just a side note, because it is not about a map: I just tried a waypoint, set the keyboard language to Croatian first and entered a few letters, then changed to German and added some more. As usual for this device, the waypoint ends up in the file Waypoints_26-FEB-13.gpx, which is UTF-8 encoded. The waypoint contains the <cmt>Čćžšäöüßabcd</cmt> I entered. With the device mounted in mass storage mode, I edited the file and added a few Greek and Hebrew letters. With the device rebooted, the waypoint shows the comment with the Greek letters, but nothing for the Hebrew ones. This matches the earlier findings for the code page maps (Greek yes, Hebrew no). Using waypoints may be easier for users to investigate their devices’ character repertoires than having to create maps. (But the question remains: does the img format support any unicode encoding and if so, do any devices support it?) rj
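For anyone who wants to repeat the waypoint test without typing on the device, the following is a rough sketch that writes a one-waypoint GPX 1.1 file in UTF-8. The class name, the output file name, the coordinates and the creator string are all assumptions for illustration; where the file has to be placed on a given device is not covered here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical test-file generator, not part of mkgmap or any Garmin tool.
final class GpxWaypointSample {
    /** Escapes the characters that are not allowed verbatim in XML text. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Builds a minimal GPX 1.1 document containing a single waypoint. */
    static String buildGpx(double lat, double lon, String name, String comment) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<gpx version=\"1.1\" creator=\"charset-test\" xmlns=\"http://www.topografix.com/GPX/1/1\">\n"
            + "  <wpt lat=\"" + lat + "\" lon=\"" + lon + "\">\n"
            + "    <name>" + escapeXml(name) + "</name>\n"
            + "    <cmt>" + escapeXml(comment) + "</cmt>\n"
            + "  </wpt>\n"
            + "</gpx>\n";
    }

    public static void main(String[] args) throws IOException {
        // Croatian and German letters, then a few Greek and Hebrew ones
        String comment = "\u010c\u0107\u017e\u0161\u00e4\u00f6\u00fc\u00df "
            + "\u03b1\u03b2\u03b3 \u05d0\u05d1\u05d2";
        Path out = Paths.get(args.length > 0 ? args[0] : "charset-test.gpx");
        Files.write(out, buildGpx(52.5, 13.4, "charset test", comment)
            .getBytes(StandardCharsets.UTF_8));
    }
}
```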

Hi
While reading the code, I stumbled upon code such as if ("unicode".equals(charset)) in src/uk/me/parabola/imgfmt/app/labelenc/CodeFunctions.java.
That was all just experimenting, right at the beginning before we had any proper i18n support at all.
I guess what I actually want to ask: since we’ve seen that newer devices have a rich character repertoire containing latin based, greek, cyrillic and arabic, I wonder whether it is only possible to access them via code pages, limiting us to a subset every time. Do you know whether there’s unicode support in recent devices?
A Google search confirms that they can, e.g.: https://forums.garmin.com/showthread.php?19806-Is-Mapsource-compatible-with-... ..Steve

On 13-02-27 12:14:31 CET, Steve Ratcliffe wrote:
Hi
While reading the code, I stumbled upon code such as if ("unicode".equals(charset)) in src/uk/me/parabola/imgfmt/app/labelenc/CodeFunctions.java.
That was all just experimenting, right at the beginning before we had any proper i18n support at all.
So, no unicode map support so far?
I guess what I actually want to ask: since we’ve seen that newer devices have a rich character repertoire containing latin based, greek, cyrillic and arabic, I wonder whether it is only possible to access them via code pages, limiting us to a subset every time. Do you know whether there’s unicode support in recent devices?
A google search confirms that they can eg: https://forums.garmin.com/showthread.php?19806-Is-Mapsource-compatible-with-...
How embarrassing, I should have done that myself before. On the other hand, I Googled for the Garmin character repertoires before starting to test and ask and came up empty-handed, so I have a lame excuse. ;-) The term “Unicode maps” can also be found in some Garmin Mapinstall release notes (which mention added support for it). GPSMapEdit release notes also mention unicode, but unfortunately they say: “Note: UTF-8 is not supported by cgpsmapper and Garmin IMG format”. What I haven’t been able to find out after over an hour of searching: Is it publicly known how such “unicode maps” are encoded, or is this a mystery hidden in encrypted maps? rj

On 13-02-27 23:35:49 CET, Robert Joop wrote:
What I haven’t been able to find out after over an hour of searching: Is it publicly known how such “unicode maps” are encoded, or is this a mystery hidden in encrypted maps?
At least getting unicode into street names turned out to be easy: I brutally patched the code to use the codepage 65001 and put UTF-8 bytes for “äαЯب” into the code, i.e. German, Greek, Cyrillic and Arabic, which is usually contained in four codepages (1250, 1253, 1251, 1256). The street names show up with these four characters at the same time. Yeah!

I can’t program Java, so:
- I don’t see the reason why the Utf8Encoder encodeText() seemingly results in ASCII instead of UTF-8.
- with my patching I didn’t go beyond the point of demonstrating a minimal example.

The option was “--code-page=65001”. I believe the lines that accomplished this are no more than these:

Index: src/uk/me/parabola/imgfmt/app/labelenc/CodeFunctions.java
===================================================================
--- src/uk/me/parabola/imgfmt/app/labelenc/CodeFunctions.java (revision 2501)
+++ src/uk/me/parabola/imgfmt/app/labelenc/CodeFunctions.java (working copy)
@@ -97,6 +97,11 @@
     funcs.setEncodingType(ENCODING_FORMAT10);
     funcs.setEncoder(new Utf8Encoder());
     funcs.setDecoder(new Utf8Decoder());
+ } else if ("cp65001".equals(charset)) {
+     funcs.setEncodingType(ENCODING_FORMAT10);
+     funcs.setEncoder(new Utf8Encoder());
+     funcs.setDecoder(new Utf8Decoder());
+     funcs.setCodepage(65001);
  } else if ("simple8".equals(charset)) {
     funcs.setEncodingType(ENCODING_FORMAT9);
     funcs.setEncoder(new Simple8Encoder());
Index: src/uk/me/parabola/imgfmt/app/labelenc/Utf8Encoder.java
===================================================================
--- src/uk/me/parabola/imgfmt/app/labelenc/Utf8Encoder.java (revision 2501)
+++ src/uk/me/parabola/imgfmt/app/labelenc/Utf8Encoder.java (working copy)
@@ -43,9 +43,22 @@
     byte[] res = new byte[buf.length + 1];
     System.arraycopy(buf, 0, res, 0, buf.length);
     res[buf.length] = 0;
+    if (buf.length >= 8){
+        res[0] = (byte)195;
+        res[1] = (byte)164;
+        res[2] = (byte)206;
+        res[3] = (byte)177;
+        res[4] = (byte)208;
+        res[5] = (byte)175;
+        res[6] = (byte)216;
+        res[7] = (byte)168;
+    }
+//System.out.println("copied utf-8 bytes "+res[0]+" "+res[1]+" "+res[2]);
     et = new EncodedText(res, res.length);
+//System.out.println("encoded utf-8 bytes: "+et);
  } catch (UnsupportedEncodingException e) {
     // As utf-8 must be supported, this can't happen
+System.out.println(" // As utf-8 must be supported, this can't happen");
     byte[] buf = uctext.getBytes();
     et = new EncodedText(buf, buf.length);
  }
Index: src/uk/me/parabola/imgfmt/app/srt/Sort.java
===================================================================
--- src/uk/me/parabola/imgfmt/app/srt/Sort.java (revision 2501)
+++ src/uk/me/parabola/imgfmt/app/srt/Sort.java (working copy)
@@ -253,6 +253,8 @@
     this.codepage = codepage;
     if (codepage == 0)
         charset = Charset.forName("cp1252");
+    else if (codepage == 65001)
+        charset = Charset.forName("UTF-8");
     else if (codepage == 932)
         // Java uses "ms932" for code page 932
         // (Windows-31J, Shift-JIS + MS extensions)
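As a cross-check of the eight hard-coded byte values in the Utf8Encoder hunk, this small standalone sketch (class and method names are invented for illustration, not mkgmap code) encodes the test string “äαЯب” as UTF-8 and prints the bytes as unsigned values.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of mkgmap.
final class Utf8SampleBytes {
    /** Returns the UTF-8 encoding of text as unsigned byte values (0..255). */
    static int[] utf8Unsigned(String text) {
        byte[] raw = text.getBytes(StandardCharsets.UTF_8);
        int[] out = new int[raw.length];
        for (int i = 0; i < raw.length; i++) {
            out[i] = raw[i] & 0xff; // Java bytes are signed, so mask to 0..255
        }
        return out;
    }

    public static void main(String[] args) {
        // a-umlaut, Greek alpha, Cyrillic Ya, Arabic beh
        String sample = "\u00e4\u03b1\u042f\u0628";
        System.out.println(Arrays.toString(utf8Unsigned(sample)));
        // prints [195, 164, 206, 177, 208, 175, 216, 168]
    }
}
```

Each of the four characters takes two bytes in UTF-8, which is why the patch only overwrites labels that are at least eight bytes long.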

Hi
At least getting unicode into street names turned out to be easy: I brutally patched the code to use the codepage 65001 and put UTF-8 bytes for “äαЯب” into the code, i.e. German, Greek, Cyrillic and Arabic, which is usually contained in four codepages (1250, 1253, 1251, 1256). The street names show up with these four characters at the same time. Yeah!
Wonderful, well done.
I can’t program Java, so: - I don’t see the reason why the Utf8Encoder encodeText() seemingly results in ASCII instead of UTF-8.
Because the default transliterator is an ASCII one, you have to explicitly set a different one, e.g. NullTransliterator. With that change your patch seems to work. I did a little test with your test string on my Etrex 30 and it turns out that it is supported. I'll commit your patch with the added null transliterator. Do you think from your research that it is likely that any device that supports unicode maps will also support lower case? If so then we can also set that by default. ..Steve
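The effect described here can be reproduced outside mkgmap with a simplified sketch. The two methods below are crude stand-ins, not mkgmap's actual transliterators: asciiFold strips accents and replaces everything else outside ASCII with '?', while identity passes the text through unchanged, as a null transliterator would. Once the text has been folded to ASCII, encoding it as UTF-8 can only yield ASCII bytes.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

// Hypothetical stand-ins for illustration; mkgmap's real classes differ.
final class TransliterationSketch {
    /** Crude ASCII folding: drop combining accents, replace other non-ASCII with '?'. */
    static String asciiFold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < decomposed.length(); i++) {
            char c = decomposed.charAt(i);
            if (c < 128) {
                sb.append(c);
            } else if (Character.getType(c) != Character.NON_SPACING_MARK) {
                sb.append('?'); // no ASCII equivalent known to this sketch
            }
            // combining marks such as the diaeresis are simply dropped
        }
        return sb.toString();
    }

    /** Pass-through, playing the role of a null transliterator. */
    static String identity(String s) {
        return s;
    }

    static byte[] encodeUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String sample = "\u00e4\u03b1\u042f\u0628";
        System.out.println("folded:   " + encodeUtf8(asciiFold(sample)).length + " bytes");
        System.out.println("identity: " + encodeUtf8(identity(sample)).length + " bytes");
    }
}
```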

On 13-02-28 16:38:51 CET, Steve Ratcliffe wrote:
With that change your patch seems to work, I did a little test with your test string on my Etrex 30 and it turns out that it was supported.
I'll commit your patch with the added null transliterator.
Isn’t my patch utterly incomplete? First I hacked around in the srt before I noticed that the code did not get called at all for my test map, I suspect because it hasn’t got any index? Then I went at the labelenc and did not bother about the srt for the test.
Do you think from your research that it is likely that any device that supports unicode maps, will also support lower case? If so then we can also set that by default.
My research… Concerning the use of actual devices: I tested it on my new device which is probably about as new as your Etrex 30. And my old GPSmap 60CSx doesn’t even support any 12xx codepage other than 1252. So this research has a coverage of a single device. ;-) But my web search turned up that unicode maps seem to be supported on many nüvi devices, but not older ones. Perhaps somebody with a nüvi can contribute test results. On the Garmin web site, I couldn’t find them distinguishing City Navigator Europe NT and NTU, which I’ve read about elsewhere. But I found one NTU map on their site, Pan-Africa: https://buy.garmin.com/shop/shop.do?pID=117106 Its long list of compatible devices includes yours, mine, other recent ones like Dakota, GPSmap 62, Oregon and a looong list of nüvis. I’d say if somebody with a nüvi from this list confirms lower-case to be working we should assume that all unicode map supporting devices do. Hey, lower-case is even half working on older devices, for POIs, unfortunately not for streets. rj

Hi
Isn’t my patch utterly incomplete?
Not really, it's complete in terms of encoding labels so that they show up on the map. Sure, searching is not going to work well, or at all, but that is no different than for Japanese.
But I found one NTU map on their site, Pan-Africa: https://buy.garmin.com/shop/shop.do?pID=117106 Its long list of compatible devices includes yours, mine, other recent ones like Dakota, GPSmap 62, Oregon and a looong list of nüvis.
OK that is interesting. It lists Nuvi 2xxx and 3xxx but not 1xxx models. ..Steve