
Hi Ticker,
I think I found an example in a map for China. A road name contains U+200E LEFT-TO-RIGHT MARK: https://www.fontspace.com/unicode/analyzer#e=55Sw6LSd4oCO5LiJ6Lev
Another road name doesn't contain it. Both strings are TERTIARY equal but not identical. It should be possible to create some test maps to find out how Garmin treats this :)
Gerd
________________________________________ From: mkgmap-dev <mkgmap-dev-bounces@lists.mkgmap.org.uk> on behalf of Ticker Berkin <rwb-mkgmap@jagit.co.uk> Sent: Saturday, 27 November 2021 13:05 To: Development list for mkgmap Subject: Re: [mkgmap-dev] Small problem with global index
Hi Gerd
The drastic case which mdrUnicode_v9b fixes is the index byte size crash, and this is most easily demonstrated by having enough ignorable characters, eg shields, or Chinese & Unicode without the original Sort fix. Until this crash point is reached, I've no idea if there is a problem that can be demonstrated. However, the data structure whereby Mdr25 shares the same byte-size pointer to Mdr5 strongly indicates that these should be kept in step, and anything that allows Mdr25 to be bigger than Mdr5 must be wrong.
The new version of Sort for Unicode assumes that if ordering has been defined for any characters in a [256] page, then any characters in this page that are not defined will get a zero/ignore sortOrder. If nothing has been defined for the page, the code will invent a sortOrder. So some diag code in Sort should be able to list some other characters that will get a zero sortOrder; hopefully there might be some nice name-like chars amongst them.
Apart from the ignored sortOrder chars making a difference between TERTIARY and .equals(): a significant consideration is that Sort.java doesn't sort higher than the TERTIARY level, so it is possible to end up with a run of TERTIARY-equal records in which adjacent records might be .equals(), or there might be a non-equal record between two equal ones.
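The adjacency point above can be illustrated with a minimal sketch. It uses the JDK's java.text.Collator as a stand-in, not mkgmap's own Sort class, and assumes the JDK's default rules treat U+200E as ignorable. The class name, method names and sample strings are hypothetical.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

class DedupSketch {
    /** Drops a record only when it is identical (String.equals) to the previously kept one. */
    static List<String> dedupByEquals(List<String> sorted) {
        List<String> kept = new ArrayList<>();
        for (String s : sorted) {
            if (kept.isEmpty() || !kept.get(kept.size() - 1).equals(s)) {
                kept.add(s);
            }
        }
        return kept;
    }

    /** Drops a record when the comparator sees no difference from the previously kept one. */
    static List<String> dedupByComparator(List<String> sorted, Comparator<? super String> cmp) {
        List<String> kept = new ArrayList<>();
        for (String s : sorted) {
            if (kept.isEmpty() || cmp.compare(kept.get(kept.size() - 1), s) != 0) {
                kept.add(s);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Stand-in collator (assumption): JDK root rules at TERTIARY strength.
        Collator collator = Collator.getInstance(Locale.ROOT);
        collator.setStrength(Collator.TERTIARY);

        // Hypothetical names: the middle one contains U+200E LEFT-TO-RIGHT MARK.
        List<String> names = new ArrayList<>(Arrays.asList("Alma", "Al\u200Ema", "Alma"));

        // The sort only orders up to TERTIARY, so identical strings inside a
        // TERTIARY-equal run are not guaranteed to end up next to each other.
        names.sort(collator);

        System.out.println("kept by equals():   " + dedupByEquals(names).size());
        System.out.println("kept by compare():  " + dedupByComparator(names, collator).size());
    }
}
```

When the two identical strings are separated by the variant, the adjacent .equals() check never fires, while the comparator-based check collapses the whole run.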
Again, no idea if this will matter in areas like the repeat flag setting, but it strongly indicates that we should use collator.compare rather than .equals() for dedup.
Ticker
On Sat, 2021-11-27 at 10:54 +0000, Gerd Petermann wrote:
Hi Ticker,
running in circles, aren't we? I ask for sample data to show that mdrUnicode_v9b.patch makes a difference in some special case. I totally agree that either Mdr5 or Mdr25 should be changed, and probably other places, too.
What I would really like to have is a (small) example that shows a difference between TERTIARY and EQUAL, so that I am able to compare mkgmap results with what Garmin does. The highway shield codes may not be a good choice in case Garmin treats them specially, but I would also like to understand that special handling if it exists.
I think all I need is a way to find a String that gives TERTIARY == 0 and String.equals() returns false for a given codepage. Maybe this is totally clear to you but it is not for me.
Gerd
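One way to look for such a string, as a minimal sketch: insert candidate code points into a base string and keep those the collator ignores. This uses the JDK's java.text.Collator as a stand-in, so it shows the technique only. For a given Garmin codepage the collator would have to come from mkgmap's own Sort class, which is not shown here. The class and method names are hypothetical.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class TertiaryProbe {
    /** True when the collator reports equality but the strings are not identical. */
    static boolean tertiaryEqualButDistinct(Collator collator, String a, String b) {
        return collator.compare(a, b) == 0 && !a.equals(b);
    }

    /**
     * Lists the code points in [from, to] that the collator ignores when they
     * are inserted into the middle of base (base is assumed to be plain BMP text).
     */
    static List<Integer> ignoredCodePoints(Collator collator, String base, int from, int to) {
        List<Integer> found = new ArrayList<>();
        int mid = base.length() / 2;
        for (int cp = from; cp <= to; cp++) {
            boolean surrogate = cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE;
            if (surrogate || !Character.isDefined(cp)) {
                continue; // skip lone surrogates and unassigned code points
            }
            String candidate = new StringBuilder(base).insert(mid, Character.toChars(cp)).toString();
            if (tertiaryEqualButDistinct(collator, base, candidate)) {
                found.add(cp);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Stand-in collator (assumption): JDK root rules at TERTIARY strength.
        Collator collator = Collator.getInstance(Locale.ROOT);
        collator.setStrength(Collator.TERTIARY);

        // Scan the General Punctuation block, which contains U+200E.
        for (int cp : ignoredCodePoints(collator, "ab", 0x2000, 0x206F)) {
            System.out.printf("U+%04X is ignored at TERTIARY%n", cp);
        }
    }
}
```

Any code point reported this way yields a pair of strings that compare as 0 at TERTIARY while String.equals() returns false.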
________________________________________ From: mkgmap-dev <mkgmap-dev-bounces@lists.mkgmap.org.uk> on behalf of Ticker Berkin <rwb-mkgmap@jagit.co.uk> Sent: Saturday, 27 November 2021 11:23 To: Development list for mkgmap Subject: Re: [mkgmap-dev] Small problem with global index
Hi Gerd
mdrUnicode_v9b.patch isn't related to the issue of case-variants; it is about keeping consistency between Mdr5 and Mdr25 indexes. This will go wrong when there is a difference between TERTIARY and EQUAL in Country, Region and City names. It may be that this doesn't matter to Garmin software, or, more likely, will introduce slight errors in what is findable.
If you don't want to accept this patch, I think changes would be needed to Mdr5 to replace TERTIARY collator use with .equals().
Ticker
On Fri, 2021-11-26 at 18:04 +0000, Gerd Petermann wrote:
Hi Ticker,
sorry, meant r4718 instead of 4717 before.
Gerd
________________________________________ From: mkgmap-dev <mkgmap-dev-bounces@lists.mkgmap.org.uk> on behalf of Gerd Petermann <gpetermann_muenchen@hotmail.com> Sent: Friday, 26 November 2021 18:06 To: Development list for mkgmap Subject: Re: [mkgmap-dev] Small problem with global index
Hi Ticker,
I tried this: use your command to build a gmapi and gmapsupp, but replace r4810 by a binary compiled from mkgmap r4717 + mdrUnicode_v9b.patch (I still see no difference in the output compared to unpatched r4717)
I then use MapSource to create another gmapsupp. I run MdrDisplay and MdrCheck on both gmapsupp.img and see different repeat flags. MdrCheck + my patch display-no-secondary.patch complains a lot about the gmapsupp with your patch but reports only 1 problem about a city without name. When I try this with the unpatched display tool it complains a lot about the gmapsupp from MapSource but not about the one from mkgmap. I think that shows that unpatched MdrCheck is wrong.
I tried this also with a binary from r4817 with attached mkgmap-no-secondary-v2.patch with your command. The two outputs from MdrCheck are identical, and I think the outputs for MdrDisplay differ only in offsets. I consider this very good. In MapSource the search for "Baybride Lane" and "Alma Lane" both return wherWell, so that's also good.
I prefer a patch that changes mkgmap to produce the same index as MapSource.
I hope I've done nothing wrong during testing. Do you get other results?
Gerd
________________________________________ From: mkgmap-dev <mkgmap-dev-bounces@lists.mkgmap.org.uk> on behalf of Ticker Berkin <rwb-mkgmap@jagit.co.uk> Sent: Friday, 26 November 2021 16:27 To: Development list for mkgmap Subject: Re: [mkgmap-dev] Small problem with global index
Hi Gerd
I was sort of thinking the opposite: PlacesFile uses some method (eg what it does now) to dedupe city/region/country, and this is extended to the POI/MDRBuilder logic, such that combinations of these 3 have unique sets of index values regardless of case.
Then the relevant MdrX sections should be able to do a SECONDARY dedup on these, to cope with case-variants coming from different tiles.
Then checking that this does actually work with Garmin software (ie hope nothing cares that the index entries might not match the LBL data in some of the tiles)
If this works, there should only be one city presented in the find options - eg, from the original problem data, it might be "De Wijk" or "de Wijk"
Then making MdrCheck tolerant of this as well.
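As a minimal sketch of what a SECONDARY comparison does with the case-variants mentioned above: this again uses the JDK's java.text.Collator as a stand-in, not mkgmap's Sort or any MdrX code, and the class and method names are hypothetical.

```java
import java.text.Collator;
import java.util.Locale;

class StrengthDemo {
    /** True when the two names compare as equal at the given collator strength. */
    static boolean sameAtStrength(String a, String b, int strength) {
        // Stand-in collator (assumption): JDK root rules, not mkgmap's Sort.
        Collator collator = Collator.getInstance(Locale.ROOT);
        collator.setStrength(strength);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        String a = "De Wijk";
        String b = "de Wijk";
        // Case is a tertiary difference, so SECONDARY ignores it and TERTIARY does not.
        System.out.println("SECONDARY same: " + sameAtStrength(a, b, Collator.SECONDARY));
        System.out.println("TERTIARY same:  " + sameAtStrength(a, b, Collator.TERTIARY));
    }
}
```

A dedup keyed on the SECONDARY comparison would keep one entry for both spellings. Which spelling survives depends on which record the dedup sees first.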
An alternative is just to ignore the whole issue - no one else has ever noticed and complained.
I was hoping to get mdrUnicode_v9b.patch accepted before tackling this. Its purpose is to fix the crash when pathological city / region / country names or incomplete sortorder codepage data causes enough difference between TERTIARY & EQUAL to make Mdr25 index size too big.
Ticker
On Fri, 2021-11-26 at 10:56 +0000, Gerd Petermann wrote:
Hi Ticker,
Regarding --lower-case and city/region/country names with different capitalization: I think it would be good to keep the different capitalization within a single tile, so yes, the .toUpperCase() in PlacesFile is probably not a good idea. Results seem better when this is not done. When the global index is created we can log warnings for those cases, but I don't yet see how we can create a valid index which doesn't require the user to decide whether wherWell or Wherwell should be searched.
Gerd
_______________________________________________ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev