Option --latin1 not working any more?


Jan-Ole Spahn

5 Feb 2009

12:58 p.m.

I'm sorry if this was discussed before, I just subscribed!

In version route-r806 the option --latin1 worked to get German umlauts displayed correctly (using the nocodepage option for osm2mp).

Now I tried r858 and r861, and running with the same options the umlauts are displayed as ??, on both a Garmin Oregon and QLandkarte.

This worked fine for umlauts:

java -Xmx512M -enableassertions -jar mkgmap-route-r806/mkgmap.jar --latin1 --route --gmapsupp map1.mp

This produces ?? instead of umlauts:

java -Xmx512M -enableassertions -jar mkgmap-r861/mkgmap.jar --latin1 --route --gmapsupp map1.mp

Is anything known about this?

Best regards, Jan


Ben Konrath

5 Feb

2:51 p.m.

Hi Jan-Ole,

On Thu, Feb 5, 2009 at 6:58 AM, Jan-Ole Spahn <Jampl@gmx.de> wrote: <snip>

...

In version route-r806 the option --latin1 worked to get German umlauts displayed correctly (using the nocodepage option for osm2mp).

Now I tried r858 and r861, and running with the same options the umlauts are displayed as ??, on both a Garmin Oregon and QLandkarte.

This worked fine for umlauts: java -Xmx512M -enableassertions -jar mkgmap-route-r806/mkgmap.jar --latin1 --route --gmapsupp map1.mp

This produces ?? instead of umlauts: java -Xmx512M -enableassertions -jar mkgmap-r861/mkgmap.jar --latin1 --route --gmapsupp map1.mp

I had the same problem, except with Spanish accents, when I switched from the route branch to trunk. It seems that some code was added to the PolishMapDataSource class which assumes the characters are iso-8859-1. To fix it I changed '--codepage' to '--codepage 1252' in my call to osm2mp.pl. If I understand things correctly, this should work for you as well.

Cheers, Ben

Jan-Ole Spahn

4:28 p.m.

Hi Ben,

many thanks, that works!

Jan

Steve Ratcliffe

9:14 p.m.

...

added to the PolishMapDataSource class which assumes the characters are iso-8859-1. To fix it I changed '--codepage' to '--codepage 1252'

Yes, mkgmap used to have bugs in recognising the codepage in the .mp file, and people came up with various workarounds that didn't work for everyone.

Now that mkgmap is fixed to use the codepage that is in the .mp file, you have to give the correct code page to osm2mp.

Note that the default codepage with osm2mp is 1251, which is for Russian, so it is essential to give the --codepage 1252 option.

..Steve
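
As a rough illustration of why the 1251 default matters for Western European text, the following Java sketch decodes the same single byte under both code pages. It is not mkgmap or osm2mp code; the class and method names are made up for this example, and it assumes the JDK in use ships the windows-1251 and windows-1252 charsets (standard JDKs do).

```java
import java.nio.charset.Charset;

// Illustrative only: shows how one byte reads under two Windows code pages.
final class CodePageByteDemo {

    // Decode a single byte value (0..255) using the named charset.
    static String decodeByte(int value, String charsetName) {
        byte[] bytes = {(byte) value};
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        int b = 0xFC;
        String western = decodeByte(b, "windows-1252");  // U+00FC, u with diaeresis
        String cyrillic = decodeByte(b, "windows-1251"); // U+044C, Cyrillic soft sign

        // Print code points rather than glyphs so the console encoding does not matter.
        System.out.printf("cp1252: U+%04X%n", (int) western.charAt(0));
        System.out.printf("cp1251: U+%04X%n", (int) cyrillic.charAt(0));
    }
}
```

A file written for one of these code pages and declared as the other therefore still decodes without error, just to the wrong letters.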

Ævar Arnfjörð Bjarmason

10 Feb

6:24 p.m.

On Thu, Feb 5, 2009 at 9:14 PM, Steve Ratcliffe <steve@parabola.demon.co.uk> wrote:

...

...
added to the PolishMapDataSource class which assumes the characters are iso-8859-1. To fix it I changed '--codepage' to '--codepage 1252'

Yes mkgmap used to have bugs in recognising the codepage in the .mp file and people came up with various workarounds that didn't work for everyone.

Now that mkgmap is fixed to use the codepage that is in the .mp file, you have to give the correct code page to osm2mp.

Note that the default codepage with osm2mp is 1251 which is for Russian and so it is essential to give the --codepage 1252 option.

I was hit by this bug as well. I used to call osm2mp.pl with --nocodepage, which resulted in a UTF-8 .mp file being written, but now I need to call osm2mp.pl with --codepage 1252 as you suggest before mkgmap will grok what encoding it's in.

Here's the difference between a --nocodepage and a --codepage 1252 file written by osm2mp.pl:

"""
--- nocodepage.mp 2009-02-10 17:46:53.000000000 +0000
+++ codepage.mp 2009-02-10 17:59:53.000000000 +0000
@@ -3,7 +3,8 @@
 Name=OSM routable
-; UTF-8 encoding
+LblCoding=9
+CodePage=1252
 POINumberFirst=N
@@ -28,7 +29,7 @@
"""

If --nocodepage is used the file will be in UTF-8, but nothing is written in the file to indicate this. Is this an osm2mp.pl bug, or are .mp files supposed to be in UTF-8 if nothing defines them as being in another encoding?

I'd rather produce a UTF-8 .mp file and have mkgmap read that file than produce a Windows 1252 encoded file.

Before version 31 of osm2mp.pl it used to write this out if called with --nocodepage:

LblCoding=9
CodePage=1251

Now it'll write out nothing. This was changed in revision 31:

"""
$ svn diff -r 30:31
Index: header.tpl
===================================================================
--- header.tpl (revision 30)
+++ header.tpl (revision 31)
@@ -2,8 +2,12 @@
 ID=[% mapid %]
 Name=[% mapname %]
+[% IF codepage %]
 LblCoding=9
 CodePage=[% codepage %]
+[% ELSE %]
+; UTF-8 encoding
+[% END %]
 POINumberFirst=N
 DefaultCityCountry=[% defaultcountry %]
Index: osm2mp.pl
===================================================================
--- osm2mp.pl (revision 30)
+++ osm2mp.pl (revision 31)
@@ -78,6 +78,10 @@
 "background!", => \$background,
 );
+undef $codepage if ($nocodepage);
+
+
+
 #### Action
 use strict;
"""

However, the current mkgmap supports neither file: with osm2mp.pl version 30 it won't pick up that the file is in UTF-8, and with version 31 it'll presume that UTF-8 encoded data is in Windows 1252 (or something like that) and write question mark characters where non-ASCII characters occur.
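
To make the mismatch described above concrete, here is a minimal Java sketch (not taken from mkgmap or osm2mp; the class and method names are invented for illustration). It encodes a string as UTF-8 and then decodes the bytes under a different assumed charset. It does not try to reproduce how mkgmap ended up writing question marks; it only shows that the text no longer round-trips once the assumed charset is wrong.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only: UTF-8 bytes read back under a different assumed charset.
final class WrongCharsetDemo {

    // Encode the text as UTF-8, then decode those bytes with the assumed charset.
    static String misread(String text, Charset assumed) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, assumed);
    }

    public static void main(String[] args) {
        String original = "\u00FC"; // u with diaeresis, written as an escape

        String asUtf8 = misread(original, StandardCharsets.UTF_8);
        String asCp1252 = misread(original, Charset.forName("windows-1252"));

        System.out.println("round-trips as UTF-8:   " + asUtf8.equals(original));
        System.out.println("round-trips as cp1252:  " + asCp1252.equals(original));
        System.out.println("chars when read as cp1252: " + asCp1252.length());
    }
}
```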

Johann Gail

7:12 p.m.

Yes, I see the bug and I'm currently looking for a solution. The solution should be that mkgmap opens the .mp file as UTF-8 by default and only changes the encoding if a codepage tag is found.

Johann Gail

7:38 p.m.

osm2mp writes no codepage label into the file if called with --nocodepage. Instead it writes the comment '; UTF-8 encoding'. So I assume that if no codepage is declared, UTF-8 is the default. With the following patch mkgmap assumes this too. For me it now works perfectly if I call osm2mp with the option --nocodepage.

Index: src/uk/me/parabola/mkgmap/reader/polish/PolishMapDataSource.java
===================================================================
--- src/uk/me/parabola/mkgmap/reader/polish/PolishMapDataSource.java (revision 869)
+++ src/uk/me/parabola/mkgmap/reader/polish/PolishMapDataSource.java (working copy)
@@ -58,7 +58,7 @@
 public class PolishMapDataSource extends MapperBasedMapDataSource implements LoadableMapDataSource {
     private static final Logger log = Logger.getLogger(PolishMapDataSource.class);
-    private static final String READING_CHARSET = "iso-8859-1";
+    private static final String READING_CHARSET = "UTF-8";
     private static final int S_IMG_ID = 1;
     private static final int S_POINT = 2;
@@ -470,8 +470,7 @@
                 if (fc == 'm' || fc == 'M')
                     elevUnits = 'm';
             } else if (name.equals("CodePage")) {
-                if (!value.equals("1252"))
-                    dec = Charset.forName("cp" + value).newDecoder();
+                dec = Charset.forName("cp" + value).newDecoder();
             }
         }
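
The rule the patch aims for (UTF-8 unless the header declares a code page) can be sketched in isolation as follows. This is a simplified stand-in, not mkgmap's actual reader: the class name, the method name and the idea of passing the header as a list of lines are assumptions made for this example. Only the `"cp" + value` lookup mirrors the patch.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: picks a charset from .mp header lines.
final class MpCharsetSketch {

    private static final String KEY = "CodePage=";

    // Return the charset named by a CodePage= line, or UTF-8 if none is present.
    static Charset charsetFor(List<String> headerLines) {
        for (String line : headerLines) {
            String trimmed = line.trim();
            if (trimmed.startsWith(KEY)) {
                String value = trimmed.substring(KEY.length()).trim();
                // Same lookup style as the patch: "cp1252", "cp1251", ...
                return Charset.forName("cp" + value);
            }
        }
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        List<String> declared = Arrays.asList("Name=OSM routable", "LblCoding=9", "CodePage=1252");
        List<String> undeclared = Arrays.asList("Name=OSM routable", "; UTF-8 encoding");

        System.out.println(charsetFor(declared));   // canonical name of cp1252
        System.out.println(charsetFor(undeclared)); // UTF-8
    }
}
```

Note that `Charset.forName` throws for an unknown or malformed code page value; a real reader would need to decide how to handle that case.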

Steve Ratcliffe

10:27 p.m.

Hi

Yes, as you say, previously --nocodepage was useless as it still put CodePage=1251 into the file, so you couldn't tell the difference.

As osm2mp is now changed, I will apply the patch that Johann posted.

However, when there is no code page, the .mp file labels would normally be expected to be in ASCII and not UTF-8. Fortunately, as UTF-8 is backward compatible with ASCII, that is probably not going to affect any other file.

..Steve
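
The backward-compatibility point can be checked with a few lines of Java. This is only a sketch with invented names: it decodes the same bytes once as US-ASCII and once as UTF-8 and compares the results, which agree as long as every byte is below 0x80.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: pure-ASCII bytes decode identically as US-ASCII and as UTF-8.
final class AsciiCompatDemo {

    // True if the bytes give the same string under both charsets.
    static boolean sameUnderAsciiAndUtf8(byte[] bytes) {
        String asAscii = new String(bytes, StandardCharsets.US_ASCII);
        String asUtf8 = new String(bytes, StandardCharsets.UTF_8);
        return asAscii.equals(asUtf8);
    }

    public static void main(String[] args) {
        byte[] plain = "Label=Main Street".getBytes(StandardCharsets.US_ASCII);
        byte[] umlaut = {(byte) 0xC3, (byte) 0xBC}; // UTF-8 encoding of U+00FC

        System.out.println(sameUnderAsciiAndUtf8(plain));  // true
        System.out.println(sameUnderAsciiAndUtf8(umlaut)); // false
    }
}
```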

Ævar Arnfjörð Bjarmason

10:54 p.m.

On Tue, Feb 10, 2009 at 10:27 PM, Steve Ratcliffe <steve@parabola.demon.co.uk> wrote:

...

Yes, as you say, previously, the --nocodepage was useless as it still put CodePage=1251 into the file, so you couldn't tell the difference.

As osm2mp is now changed, I will apply the patch that Johann posted.

However, when there is no code page, then the .mp file labels would normally be expected to be in ascii and not utf-8. Fortunately as utf-8 is backward compatible with ascii that is probably not going to affect any other file.

The cGPSmapper v2.4.3 manual[1] seems to specify a complete list of allowed codepages in section 6.1 (page 55). The format doesn't support non-8-bit pseudo-codepages[2], and the spec seems to suggest that a file without a CodePage= declaration should be read as ASCII plus 8-bit garbage and not UTF-8, as you suggest. And if a user wants interoperable non-ASCII files, Windows-1252 is recommended.

That's not very useful and I think the patch by Johann Gail should be kept, but that's what the spec seems to say on the matter.

1. http://www.cgpsmapper.com/download/cGPSmapper-UsrMan-v02.4.3.pdf
2. http://en.wikipedia.org/wiki/Code_page#Other_code_pages_of_note

Ævar Arnfjörð Bjarmason

12 Mar

2:32 p.m.

On Tue, Feb 10, 2009 at 10:27 PM, Steve Ratcliffe <steve@parabola.demon.co.uk> wrote:

...

Hi

Yes, as you say, previously, the --nocodepage was useless as it still put CodePage=1251 into the file, so you couldn't tell the difference.

As osm2mp is now changed, I will apply the patch that Johann posted.

His patch:

 } else if (name.equals("CodePage")) {
- if (!value.equals("1252"))
- dec = Charset.forName("cp" + value).newDecoder();
+ dec = Charset.forName("cp" + value).newDecoder();

seems to only set the character set when there's a CodePage key in the file, and osm2mp.pl does not output one when run with --nocodepage.

I tried running osm2mp.pl --nocodepage and mkgmap --latin1 on an OSM export recently and all the non-ASCII characters were screwed up; it worked again with --codepage 1252, though.

Johann Gail

5:45 p.m.

...

His patch:

 } else if (name.equals("CodePage")) {
- if (!value.equals("1252"))
- dec = Charset.forName("cp" + value).newDecoder();
+ dec = Charset.forName("cp" + value).newDecoder();

Seems to only set the character set when there's a CodePage key in the file, osm2mp.pl does not output any when run with --nocodepage.

I tried running osm2mp.pl --nocodepage and mkgmap --latin1 on an osm export recently and all the non-ascii characters were screwed up, it worked again with --codepage 1252 though.

The important part of the patch is not this part, but the line at the beginning of the file. With this the default encoding gets changed to UTF-8.

- private static final String READING_CHARSET = "iso-8859-1";
+ private static final String READING_CHARSET = "UTF-8";

Only if there is an explicit codepage given in the .mp file does the encoding get changed to another charset.

If it does not work for you, then please check the encoding of the .mp file. Each occurrence of an umlaut should take up two bytes; then it is UTF-8. If this is not the case, then it has some other encoding.
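
One way to run the suggested check programmatically is a strict UTF-8 decode, sketched below in Java. The class and method names are made up for this example and nothing here comes from mkgmap. A decoder configured to report errors throws on byte sequences that are not valid UTF-8, which is what a lone ISO-8859-1 or Windows-1252 umlaut byte looks like.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Illustrative only: strict check of whether a byte sequence is well-formed UTF-8.
final class Utf8CheckSketch {

    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String umlaut = "\u00FC"; // u with diaeresis
        byte[] utf8 = umlaut.getBytes(StandardCharsets.UTF_8);        // two bytes
        byte[] latin1 = umlaut.getBytes(StandardCharsets.ISO_8859_1); // one byte

        System.out.println(utf8.length + " byte(s), valid UTF-8: " + isValidUtf8(utf8));
        System.out.println(latin1.length + " byte(s), valid UTF-8: " + isValidUtf8(latin1));
    }
}
```

For a real file the bytes could be obtained with `java.nio.file.Files.readAllBytes`. Keep in mind that a pure-ASCII file also passes this check, so passing does not by itself prove that the umlauts were written as UTF-8.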

Ævar Arnfjörð Bjarmason

7:49 p.m.

On Thu, Mar 12, 2009 at 5:45 PM, Johann Gail <johann.gail@gmx.de> wrote:

...

...
His patch:

 } else if (name.equals("CodePage")) {
- if (!value.equals("1252"))
- dec = Charset.forName("cp" + value).newDecoder();
+ dec = Charset.forName("cp" + value).newDecoder();

Seems to only set the character set when there's a CodePage key in the file, osm2mp.pl does not output any when run with --nocodepage.

I tried running osm2mp.pl --nocodepage and mkgmap --latin1 on an osm export recently and all the non-ascii characters were screwed up, it worked again with --codepage 1252 though.

The important part of the patch is not this part, but the line at the beginning of the file. With this the default encoding gets changed to UTF-8.

- private static final String READING_CHARSET = "iso-8859-1";
+ private static final String READING_CHARSET = "UTF-8";

Only if there is an explicit codepage given in the .mp file does the encoding get changed to another charset.

If it does not work for you, then please check the encoding of the .mp file. Each occurrence of an umlaut should take up two bytes; then it is UTF-8. If this is not the case, then it has some other encoding.

Looks like I screwed something up, --nocodepage will produce a UTF-8 .mp file:

mkdir -p /tmp/garmin-export
cd !$
wget "http://api.openstreetmap.org/api/0.5/map?bbox=-22.093,64.025,-21.594,64.203" -O reykjavik.osm
cd ~/src/osm2mp
avar@t:~/src/osm2mp$ svn up
At revision 49.
avar@t:~/src/osm2mp$ perl osm2mp.pl --nocodepage /tmp/garmin-export/reykjavik.osm > /tmp/garmin-export/reykjavik.mp
---| OSM -> MP converter 0.70a (c) 2008,2009 liosha, xliosha@gmail.com
Processing file /tmp/garmin-export/reykjavik.osm
Loading nodes... 33070 loaded, 769 POIs dumped
Loading relations... 3 multipolygons, 0 turn restrictions
Loading holes... 4 loaded
Processing ways... 4372 roads and 0 coastlines loaded 63 lines and 326 polygons dumped
Merging roads... 389 merged
Detecting road nodes... 7083 found
Detecting duplicates... 73 segments, 37 roads
Splitting roads... 247 self-intersections, 0 long roads
Fixing close nodes... 302 pairs fixed
Writing roads... 4230 written
Writing restrictions... 0 written
All done!!
avar@t:/tmp/garmin-export$ file reykjavik.mp
reykjavik.mp: UTF-8 Unicode English text
avar@t:/tmp/garmin-export$ grep Álftanes$ reykjavik.mp |hexdump -C
00000000  4c 61 62 65 6c 3d c3 81  6c 66 74 61 6e 65 73 0a  |Label=..lftanes.|
00000010
avar@t:/tmp/garmin-export/mp$ java -jar ~/src/mkgmap/dist/mkgmap.jar --description="OSM Reykjavik MP" --latin1 --gmapsupp ../reykjavik.mp

And the resulting gmapsupp.img doesn't have encoding problems:

http://u.nix.is/mp/gmapsupp.img
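
For readers who want to verify the hexdump by hand, the sixteen bytes shown above can be decoded as UTF-8 with a short Java sketch. The byte values are copied from the dump; the class and method names are invented for this example.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: decodes the bytes from the hexdump above as UTF-8.
final class HexdumpDecodeDemo {

    // Byte values copied from the hexdump line in the message above.
    static final int[] DUMP = {
        0x4c, 0x61, 0x62, 0x65, 0x6c, 0x3d, 0xc3, 0x81,
        0x6c, 0x66, 0x74, 0x61, 0x6e, 0x65, 0x73, 0x0a
    };

    // Convert the int values to bytes and decode them as UTF-8.
    static String decodeUtf8(int[] values) {
        byte[] bytes = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            bytes[i] = (byte) values[i];
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String line = decodeUtf8(DUMP);
        // The pair c3 81 becomes a single character, U+00C1 (capital A with acute).
        System.out.printf("chars: %d, char 6: U+%04X%n", line.length(), (int) line.charAt(6));
    }
}
```

Sixteen bytes decode to fifteen characters, because the two bytes c3 81 form one character.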
