
On Thu, Feb 5, 2009 at 9:14 PM, Steve Ratcliffe <steve@parabola.demon.co.uk> wrote:
added to the PolishMapDataSource class which assumes the characters are iso-8859-1. To fix it I changed '--codepage' to '--codepage 1252'
Yes, mkgmap used to have bugs in recognising the codepage in the .mp file, and people came up with various workarounds that didn't work for everyone.
Now that mkgmap is fixed to use the codepage that is in the .mp file, you have to give the correct code page to osm2mp.
Note that the default codepage with osm2mp is 1251, which is for Russian, so it is essential to give the --codepage 1252 option.
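
To illustrate why the default matters, here is a small Python sketch (not part of osm2mp or mkgmap; the sample word is made up). It decodes the same bytes once as code page 1252 and once as 1251, which is roughly what happens when a file is labelled with the wrong code page.

```python
# Illustrative only: one byte string, read under two different code pages.
raw = "café".encode("cp1252")    # b'caf\xe9'

as_1252 = raw.decode("cp1252")   # Western European reading
as_1251 = raw.decode("cp1251")   # Cyrillic reading of the very same bytes

print(as_1252)
print(as_1251)
```

The byte 0xE9 is "é" in 1252 but a Cyrillic letter in 1251, so a Western European map labelled as 1251 ends up with the wrong characters in its names.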
I was hit by this bug as well. I used to call osm2mp.pl with --nocodepage, which resulted in a UTF-8 .mp file being written, but now I need to call osm2mp.pl with --codepage 1252, as you suggest, before mkgmap will grok what encoding it's in.

Here's the difference between a --nocodepage and a --codepage 1252 file written by osm2mp.pl:

"""
--- nocodepage.mp 2009-02-10 17:46:53.000000000 +0000
+++ codepage.mp 2009-02-10 17:59:53.000000000 +0000
@@ -3,7 +3,8 @@
 Name=OSM routable
-; UTF-8 encoding
+LblCoding=9
+CodePage=1252
 POINumberFirst=N
@@ -28,7 +29,7 @@
"""

If --nocodepage is used the file will be in UTF-8, but nothing is written in the file to indicate this. Is this an osm2mp.pl bug, or are .mp files supposed to be in UTF-8 if nothing defines them as being in another encoding? I'd rather produce a UTF-8 .mp file and have mkgmap read that file than produce a Windows 1252 encoded file.

Before version 31 of osm2mp.pl it used to write this out if called with --nocodepage:

LblCoding=9
CodePage=1251

Now it'll write out nothing; this was changed in revision 31:

"""
$ svn diff -r 30:31
Index: header.tpl
===================================================================
--- header.tpl (revision 30)
+++ header.tpl (revision 31)
@@ -2,8 +2,12 @@
 ID=[% mapid %]
 Name=[% mapname %]
+[% IF codepage %]
 LblCoding=9
 CodePage=[% codepage %]
+[% ELSE %]
+; UTF-8 encoding
+[% END %]
 POINumberFirst=N
 DefaultCityCountry=[% defaultcountry %]
Index: osm2mp.pl
===================================================================
--- osm2mp.pl (revision 30)
+++ osm2mp.pl (revision 31)
@@ -78,6 +78,10 @@
 "background!", => \$background,
 );
+undef $codepage if ($nocodepage);
+
+
+
 #### Action
 use strict;
"""

However, the current mkgmap supports neither file: with osm2mp.pl version 30 it won't pick up that the file is in UTF-8, and with version 31 it'll presume UTF-8 encoded data is in Windows 1252 (or something like that) and write question mark characters where non-ASCII occurs.
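
As a rough illustration of the behaviour being asked about, here is a Python sketch of a reader that uses the `CodePage=` header line when it is present and falls back to UTF-8 when it is not. This is not mkgmap's actual code. `detect_mp_encoding` is a hypothetical helper, the sample headers are cut down from the diff above, and treating a missing `CodePage=` as UTF-8 is an assumption, not something the .mp format is known to guarantee.

```python
import re


def detect_mp_encoding(data: bytes, default: str = "utf-8") -> str:
    """Guess the text encoding of an .mp file from its header bytes.

    Hypothetical helper: finds a `CodePage=N` line and maps it to the
    Python codec name `cpN`; returns `default` when no such line exists.
    """
    # Header keys are plain ASCII, so scanning the raw bytes is safe.
    match = re.search(rb"^CodePage=(\d+)\s*$", data, flags=re.MULTILINE)
    if match is None:
        return default
    return "cp" + match.group(1).decode("ascii")


# Cut-down headers modelled on the two osm2mp.pl outputs in the diff.
with_codepage = b"Name=OSM routable\nLblCoding=9\nCodePage=1252\nPOINumberFirst=N\n"
without_codepage = b"Name=OSM routable\n; UTF-8 encoding\nPOINumberFirst=N\n"

print(detect_mp_encoding(with_codepage))
print(detect_mp_encoding(without_codepage))

# What goes wrong when UTF-8 bytes are read as Windows 1252 instead:
misread = "é".encode("utf-8").decode("cp1252")
print(misread)
```

With a fallback like this, the revision 31 output (no `CodePage=` line) would be read as UTF-8, while a file written with --codepage 1252 would still be read as Windows 1252.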