patched polish file charset and multipolygon handling

Dear developers,
I'm new to mkgmap; I'm replacing cgpsmapper because it's slow, buggy and closed source. I tried to compile from an .mp file. The resulting .img file had two serious problems:
1. Wrong character encoding.
I used an .mp source file in code page ISO-8859-2 (or Windows-1250). I was unable to get the Hungarian accented characters (áéíóúöőüű) right, although I tried every possible combination of the --charset and --code-page options. I even tried converting the .mp file to UTF-8 with iconv, but that was also the wrong way: the img file then had UTF-8 labels.
Finally, I changed READING_CHARSET in mkgmap/reader/polish/PolishMapDataSource.java from "UTF-8" to "ISO-8859-2", and the accented characters started to work. The configuration used was --charset=cp1252 --code-page=1252. The same worked when I specified ISO-8859-1, because the listed Hungarian characters are in the same positions in the two charsets. I think READING_CHARSET should become a new option or be linked to the --charset option.
2. Multipolygon errors.
Polygons with holes looked bizarre: long triangles appeared between the endpoints of parts, with alternating (negative/positive) rendering. Testing revealed these specific problems:
a. parts of multipart polygons were not closed at the last point
b. parts of multipart polygons were appended to each other, not stored as separate parts
c. the multipolygon splitter treated "new" holes as areas, not holes
Investigating problem "b" showed that mkgmap stores a polygon as a single List<Coord>. The Garmin img format doesn't support holes or multipolygons either.
For a working solution, I created a workaround model for multipart polygons: the first point of the first part serves as a global starting point. The second part (hole or area, it doesn't matter) is connected to this starting point by a zero-width capillary at its start and at its end. All further parts are connected in the same way. Every part must be closed (first node = last node); this was not checked before (problem "a"). Problem "b" became the backbone of the solution: parts are appended. The connections between parts are only visible if the polygon border is rendered.
The old cgpsmapper handled holes the same way; the only difference is that cgpsmapper looked up the closest nodes for the connection. I did not want to implement such a computationally expensive algorithm; in theory it does not matter where holes are connected to areas. Another difference from cgpsmapper: there, multipolygons made of disjoint areas were written to the img as two independent polygons, resulting in two labels on the map.
I have implemented this model in two polygon processing classes. The resulting .img file is now correct; even islands appear in holes. Modified files:
mkgmap/reader/polish/PolishMapDataSource.java
mkgmap/filters/PolygonSplitterBase.java
QUESTION: Where can I post my patch? Can I use SVN? Which branch? The attached patch is against r1846.
Regards, András Kolesár
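The joining model described above (close every part, then splice each further part onto the global start point with an out-and-back zero-width capillary) can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the attached patch: `CapillaryJoin`, `Pt` (a stand-in for mkgmap's Coord), `ensureClosed` and `join` are hypothetical names, every ring is assumed to be non-empty, and the sketch does not check whether a straight capillary crosses another part.

```java
import java.util.ArrayList;
import java.util.List;

class CapillaryJoin {
    /** Hypothetical stand-in for mkgmap's Coord (integer map units). */
    record Pt(int lat, int lon) {}

    /** Returns a copy of the ring with its first point repeated at the end if missing (problem "a"). */
    static List<Pt> ensureClosed(List<Pt> ring) {
        List<Pt> out = new ArrayList<>(ring);
        if (!out.isEmpty() && !out.get(0).equals(out.get(out.size() - 1))) {
            out.add(out.get(0));
        }
        return out;
    }

    /**
     * Joins the parts of a multipolygon into one point list. The first part is
     * copied (closed); every further part, hole or area alike, is reached from the
     * global start point and left back to it along the same straight line, so the
     * connection has zero width. Assumes at least one part and non-empty rings.
     */
    static List<Pt> join(List<List<Pt>> parts) {
        List<Pt> result = new ArrayList<>(ensureClosed(parts.get(0)));
        Pt start = result.get(0);
        for (int i = 1; i < parts.size(); i++) {
            // The list currently ends at start, so appending the part creates
            // the inbound capillary: start -> part's first point
            result.addAll(ensureClosed(parts.get(i)));
            // Outbound capillary: part's last point (equal to its first) -> global start
            result.add(start);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Pt> outer = List.of(new Pt(0, 0), new Pt(10, 0), new Pt(10, 10), new Pt(0, 10));
        List<Pt> hole = List.of(new Pt(3, 3), new Pt(3, 7), new Pt(7, 7), new Pt(7, 3));
        List<Pt> joined = join(List.of(outer, hole));
        System.out.println(joined.size()); // 11
        System.out.println(joined);
    }
}
```

For a square outer ring and one square hole, each given unclosed with four points, `join` returns 11 points: the closed outer ring (5), the closed hole (5) and the return to the start, so the first and last points of the whole list coincide.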

On 21.02.2011 13:22, Kolesár András wrote:
I'm new to mkgmap, just replacing cgpsmapper because it's slow, buggy and closed source. I have tried to compile from an .mp file.
Hi, you can compile .osm files directly, and we already have a working multipolygon splitter for those. Chris

My data is in an .mp file, not .osm. I think mkgmap would be better with working .mp file support.
András
-------- Original message --------
From: Chris66 <chris66nrw@gmx.de>
To: mkgmap-dev@lists.mkgmap.org.uk
Date: 2011.02.21 13:56
Subject: Re: [mkgmap-dev] patched polish file charset and multipolygon handling

Dear András,
thanks for your patch and for your comments about mkgmap. I cannot say anything about mkgmap's processing of the mp format because I don't have any mp-formatted input files.
Regarding the multipolygon errors: I can confirm that there is a bug in the PolygonSplitterBase class. Split polygons are not closed (often? every time?). I found that a few weeks ago while implementing a new strategy for splitting the data into Garmin subdivisions. As it does not have any effect on the output of mkgmap, I haven't committed a fix for it to the trunk yet.
I tried your patch for the PolygonSplitterBase class with osm input files. I could not see any visual problems, although I would expect some. Your zero-width capillary often crosses the border of the polygon, so you get self-intersecting polygons. I wonder why this does not cause problems. I expect there will be problems when a polygon is split twice, because the Java2D Area class used in the PolygonSplitterBase class removes such capillaries automatically. Before committing such a patch we have to check it very carefully.
Have fun!
WanMil
P.S.: This mailing list is the correct place to post your patches. Maybe you can create patches without the top directory. That makes it easier to apply patches to differently named workspaces.
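The point about the Java2D Area class can be illustrated with a small, self-contained experiment (not mkgmap code): build one path in the capillary style, a square with a square hole joined by a line that runs out and back along the same track, convert it to a `java.awt.geom.Area`, and inspect the result. The class and method names are hypothetical, and a horizontal capillary is chosen only to keep the example simple.

```java
import java.awt.geom.Area;
import java.awt.geom.Path2D;
import java.awt.geom.PathIterator;

class CapillaryAreaDemo {
    /**
     * Builds a single path in the capillary style: a 10x10 outer square and a
     * 4x4 hole, joined out and back along the zero-width line y = 5.
     */
    static Path2D.Double capillaryPolygon() {
        Path2D.Double p = new Path2D.Double(Path2D.WIND_EVEN_ODD);
        p.moveTo(0, 5);   // global start point on the outer ring
        p.lineTo(0, 0);
        p.lineTo(10, 0);
        p.lineTo(10, 10);
        p.lineTo(0, 10);
        p.lineTo(0, 5);   // outer ring closed
        p.lineTo(3, 5);   // capillary in
        p.lineTo(3, 7);
        p.lineTo(7, 7);
        p.lineTo(7, 3);
        p.lineTo(3, 3);
        p.lineTo(3, 5);   // hole closed
        p.lineTo(0, 5);   // capillary back to the start
        p.closePath();
        return p;
    }

    /** Counts SEG_MOVETO segments, i.e. the number of separate subpaths in the outline. */
    static int countSubpaths(Area a) {
        int n = 0;
        double[] coords = new double[6];
        for (PathIterator it = a.getPathIterator(null); !it.isDone(); it.next()) {
            if (it.currentSegment(coords) == PathIterator.SEG_MOVETO) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        Area a = new Area(capillaryPolygon());
        System.out.println("hole point inside? " + a.contains(5, 6));
        System.out.println("single subpath? " + a.isSingular());
        System.out.println("subpaths: " + countSubpaths(a));
    }
}
```

The Area keeps only the enclosed region: the hole is correctly empty, but the zero-width capillary does not survive, and the outline comes back as two separate subpaths (outer ring and hole). That is consistent with the concern that a polygon split a second time would no longer carry its capillaries.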
_______________________________________________ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

On 21/02/11 12:22, Kolesár András wrote: Hello, Welcome to the list.
Finally, I have modified READING_CHARSET in mkgmap/reader/polish/PolishMapDataSource.java from "UTF-8" to "ISO-8859-2" and accented characters started to work. The used config
Yes, you are correct. The way it was meant to work was that, since you don't know the code page before reading the file, you always read the file in iso-8859-1. When you save a label, you recover the bytes from the string that you have read (it was read incorrectly because the character set is different, but you can always recover the actual bytes that were in the file) and decode them into Unicode using the correct charset. The recode() method does this. But then READING_CHARSET was changed to utf-8 to deal with a commonly found kind of file, and the recode() method only works properly if READING_CHARSET is iso-8859-1 (or a similar 8-bit-only charset). The change to utf-8 was made, I believe, because there are files that do not contain a CodePage and have the strings in utf-8 (produced by osm2mp). I've never used cgpsmapper, so I don't know if there is a standard way to say that the file is in utf-8 in this case. So I guess we should change READING_CHARSET back to iso-8859-1 and find some other way to deal with utf-8 files if that is still an important use case. Best wishes ..Steve
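The ISO-8859-1 round trip described above works because ISO-8859-1 maps every byte from 0x00 to 0xFF to exactly one character, so the original bytes can always be recovered from the string; decoding as UTF-8 does not have that property, because invalid byte sequences are replaced during decoding. A minimal sketch in plain Java follows; the `RecodeDemo` class and its `recode` method are hypothetical and do not reproduce the signature of mkgmap's actual recode().

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class RecodeDemo {
    /**
     * Hypothetical helper: takes a label that was decoded as ISO-8859-1,
     * recovers the original file bytes, and decodes them with the charset
     * the file was really written in.
     */
    static String recode(String readAsLatin1, Charset actual) {
        // ISO-8859-1 maps each char U+0000..U+00FF back to a single byte, so this is lossless
        byte[] raw = readAsLatin1.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, actual);
    }

    public static void main(String[] args) {
        Charset latin2 = Charset.forName("ISO-8859-2");
        String label = "\u00e1\u0151\u0171";       // a-acute, o-double-acute, u-double-acute
        byte[] fileBytes = label.getBytes(latin2); // what an ISO-8859-2 .mp file contains
        String readLatin1 = new String(fileBytes, StandardCharsets.ISO_8859_1);
        String readUtf8 = new String(fileBytes, StandardCharsets.UTF_8);
        System.out.println(recode(readLatin1, latin2).equals(label)); // true
        System.out.println(recode(readUtf8, latin2).equals(label));   // false: bytes were lost
    }
}
```

Reading ISO-8859-2 bytes as ISO-8859-1 and recoding them restores the Hungarian label; reading the same bytes as UTF-8 first turns them into replacement characters, after which recoding can no longer recover the label.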
participants (4)
- Chris66
- Kolesár András
- Steve Ratcliffe
- WanMil