java.lang.AssertionError while building index from unicode tiles
Hi all

I'm getting the error below while building the index from this tile: https://files.mkgmap.org.uk/download/523/31177029.o5m.

The minimum mkgmap options that trigger the error are:

    java -jar mkgmap-trunk.jar --bounds=bounds.zip --index --unicode 31177029.o5m

Another tile from the same splitter run also fails, but the other 65 of 67 build fine. With --code-page=936 they all build fine.

Command output:

    Exception in thread "main" java.lang.AssertionError: 10586
        at uk.me.parabola.imgfmt.app.FileBackedImgFileWriter.putNu(FileBackedImgFileWriter.java:215)
        at uk.me.parabola.imgfmt.app.mdr.Mdr29.writeSectData(Mdr29.java:94)
        at uk.me.parabola.imgfmt.app.mdr.MDRFile.writeSection(MDRFile.java:424)
        at uk.me.parabola.imgfmt.app.mdr.MDRFile.writeSections(MDRFile.java:388)
        at uk.me.parabola.imgfmt.app.mdr.MDRFile.write(MDRFile.java:270)
        at uk.me.parabola.mkgmap.combiners.MdrBuilder.onFinish(MdrBuilder.java:331)
        at uk.me.parabola.mkgmap.main.Main.endOptions(Main.java:690)
        at uk.me.parabola.mkgmap.CommandArgsReader.readArgs(CommandArgsReader.java:126)
        at uk.me.parabola.mkgmap.main.Main.mainStart(Main.java:147)
        at uk.me.parabola.mkgmap.main.Main.main(Main.java:118)
Hi Carlos,

I can reproduce the crash. Not sure where to fix this yet...

Gerd
Hi Carlos,

I think there are at least two problems:

1) Something is wrong with the Unicode string comparison, but I have no clue how it should work. The tile contains > 10000 city POIs, but mkgmap detects only 145 different names with Unicode. The method Mdr5Record.isSameByName(Collator collator, Mdr5Record other) returns true for names which look very different to me.

2) We don't use the method Mdr5Record.isSameByName() when section Mdr25 is written (cities are sorted by country and then by the mdr5 city record number). Instead the normal Java String.equals() is used, and thus the list contains the expected > 10000 entries. This list requires a two-byte value in the index.

The crash happens because we try to write the position in the mdr25 list with only one byte, because of this code in Mdr29.java:

    int size25 = sizes.getSize(5); // NB appears to be size of 5 (cities), not 25 (cities with country).

The comment already shows that this is probably only correct when both lists have the same number of entries.

I hope Steve or Ticker has an idea what's wrong.

Gerd
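The byte-width mismatch described above can be illustrated with a small, self-contained sketch. It is not mkgmap code: the class IndexWidth and its methods bytesNeeded and fits are hypothetical names, and it only assumes that an index width is derived from the size of the list it points into. The two numbers are taken from this thread (145 distinct names, and the value 10586 reported by the AssertionError).

```java
class IndexWidth {
    /** Smallest number of bytes (1..4) able to hold every value in 0..maxValue. */
    static int bytesNeeded(int maxValue) {
        if (maxValue < (1 << 8)) return 1;
        if (maxValue < (1 << 16)) return 2;
        if (maxValue < (1 << 24)) return 3;
        return 4;
    }

    /** True if a non-negative value can be written as nBytes unsigned bytes. */
    static boolean fits(int value, int nBytes) {
        if (value < 0) return false;
        // 1 << 32 would overflow an int, so four bytes is handled separately
        return nBytes >= 4 || value < (1 << (8 * nBytes));
    }

    public static void main(String[] args) {
        int size5 = 145;      // distinct city names found via the collator (from the thread)
        int value25 = 10586;  // value reported by the AssertionError (from the thread)

        int width = bytesNeeded(size5); // width chosen from the wrong list
        System.out.println("width derived from mdr5 size: " + width);
        System.out.println("mdr25 value fits in that width: " + fits(value25, width));
        System.out.println("width the mdr25 value needs: " + bytesNeeded(value25));
    }
}
```

If the width is chosen from the shorter list, any position beyond 255 in the longer list cannot be written, which matches the kind of failure that an assertion in a fixed-width writer would report.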
Hi I can also reproduce this. I'll investigate, but am no expert on java sort/collation. Ticker
Hi

It is most likely that this problem is because Chinese requires 2 UTF-16 chars to encode many of its characters - see https://softwareengineering.stackexchange.com/questions/102205/should-utf-16-be-considered-harmful

I think it is only --index processing where this is a problem in mkgmap.

I'll investigate more

Ticker
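For readers unfamiliar with the UTF-16 point, here is a minimal sketch using only standard java.lang calls (the class name SurrogateDemo is made up for the example). Characters inside the Basic Multilingual Plane, which includes a large share of commonly used Chinese characters, take one Java char; characters in the supplementary planes, such as CJK Extension B, take a surrogate pair of two chars.

```java
class SurrogateDemo {
    /** Number of UTF-16 code units, which is what String.length() counts. */
    static int utf16Units(String s) {
        return s.length();
    }

    /** Number of Unicode code points in the string. */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String bmp = "\u4E2D"; // U+4E2D, inside the BMP
        String supplementary = new String(Character.toChars(0x20000)); // U+20000, CJK Extension B

        System.out.println("U+4E2D:  units=" + utf16Units(bmp) + " codePoints=" + codePoints(bmp));
        System.out.println("U+20000: units=" + utf16Units(supplementary)
                + " codePoints=" + codePoints(supplementary));
        System.out.println("first unit is a high surrogate: "
                + Character.isHighSurrogate(supplementary.charAt(0)));
    }
}
```

Code that walks a string char by char therefore sees two separate 16-bit values for one supplementary character.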
In that case, it seems strange that only 2 of the 67 tiles of the China map cause problems, doesn't it?
Hi Carlos,

no, the index is probably wrong for the other tiles as well. It is just that the special case that causes the exception doesn't occur when, e.g., the list of Mdr5 entries has more than 256 items.

Gerd
Hi

Although 2 16-bit items (surrogate pairs in UTF-16 speak) are required to represent many Chinese characters, this isn't the significant problem in this case.

The problem is that resources/sort/cp65001.txt doesn't give an ordering to lots of characters; it looks like it covers only about 10,500 of the 1,112,064 possible code-points. Many of these non-ordered characters are used by the names in the tile in question.

The basic handling for other codings (e.g. cp125*) uses a missing sort as the basis for ignoring the character; it won't be represented in the output, so there is no point in considering it in the sorting.

This isn't the case with Unicode, as all characters should show. More importantly for this crash, stable sorting is required for de-duplication of some of the index structures, and this isn't happening because characters are being ignored.

Assuming the actual ordering of unspecified code-points doesn't really matter, I propose to change the logic slightly so that undefined Unicode is sorted on its 16-bit value after the range of known sorts.

I also need to make SortKey generation consistent in a similar way, fix some of the uniqueness tests to be consistent with the sort, and verify that the size of mdr5 is >= mdr25, so that this type of problem is detected before it is exposed when mdr25 indexes can't be represented in the same number of bytes as mdr5 indexes.

Ticker
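The proposal above can be sketched roughly as follows. This is an illustration only, not the mkgmap Sort class: FallbackSortSketch, KNOWN and MAX_KNOWN are hypothetical, and the three-entry table stands in for whatever a sort description file defines. With the fallback switched off, characters without a sort entry are dropped from the key, so two unrelated names compare as equal; with it switched on, each unknown character gets a position after all known ones, based on its 16-bit value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

class FallbackSortSketch {
    // Hypothetical stand-in for a loaded sort table: only these chars have a position.
    static final Map<Character, Integer> KNOWN = Map.of('a', 1, 'b', 2, 'c', 3);
    static final int MAX_KNOWN = 3;

    /** Primary-strength key; unknown chars are skipped unless fallback is true. */
    static int[] key(String s, boolean fallback) {
        List<Integer> out = new ArrayList<>();
        for (char ch : s.toCharArray()) {
            Integer pos = KNOWN.get(ch);
            if (pos != null) {
                out.add(pos);
            } else if (fallback) {
                out.add(MAX_KNOWN + 1 + ch); // ordered after every known position
            }
        }
        return out.stream().mapToInt(Integer::intValue).toArray();
    }

    /** Compare two names by their keys, element by element. */
    static int compare(String x, String y, boolean fallback) {
        return Arrays.compare(key(x, fallback), key(y, fallback));
    }

    public static void main(String[] args) {
        String beijing = "\u5317\u4EAC";  // neither char is in KNOWN
        String shanghai = "\u4E0A\u6D77"; // neither char is in KNOWN
        System.out.println("ignoring unknown chars: " + compare(beijing, shanghai, false));
        System.out.println("with fallback ordering: " + compare(beijing, shanghai, true));
    }
}
```

In this sketch the keys are plain ints, so there is no upper limit on a position; a real implementation with a fixed key width would have to decide what happens when the offset value no longer fits.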
Hi Ticker,

thanks for looking into this. I have no clue how to test if the index really works with those characters as I don't know how to type them. If I got you right mkgmap isn't able to sort the city names so I wonder how the index can be of any use? I assume we have the same problem for other names like those for highways, POI etc?

Gerd
Hi Gerd

Yes - I don't know how we could test Garmin device/software use of these indexes. Does the mkgmap ordering have to agree with something Garmin is going to presume? Maybe it doesn't matter as long as there is consistency where one ordered mdr structure points into another ordered mdr.

So, I propose not to worry about the actual ordering, but just to make it use all available information so that sort/unique dedupe works correctly, and to do this consistently wherever necessary. This also side-steps the issue of surrogate pairs, which would need more significant changes in code structure to deal with.

It's interesting that, with --unicode, the existing code would have generated a more-or-less unsorted mdr5 and rubbish mdr25/mdr29 for chars without sort entries, and no one has complained.

Ticker
Hi Gerd

In imgfmt/app/srt/Sort.java around line 853:

    // Get the first non-ignorable at this level
    int c = chars[pos++ & 0xff];
    if (!hasPage(c >>> 8)) {

I'm at a loss to understand the 0xff mask! Am I missing something?

Ticker
Hi Ticker,

I've never tried to understand that code, but yes, masking a position looks wrong.

Gerd
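To make the concern concrete: in Java the postfix ++ binds tighter than &, so chars[pos++ & 0xff] masks the array index, not the character that is read. The sketch below (MaskDemo and its methods are made-up names, unrelated to the mkgmap sources) contrasts masking the index, masking the value, and no mask at all. It makes no claim about what the original code was meant to do.

```java
class MaskDemo {
    /** Index is masked: positions of 256 and above wrap back to the start of the array. */
    static int maskedIndex(char[] chars, int pos) {
        return chars[pos & 0xff];
    }

    /** Value is masked: only the low byte of the char at pos is kept. */
    static int maskedValue(char[] chars, int pos) {
        return chars[pos] & 0xff;
    }

    /** No mask: the char at pos is returned unchanged. */
    static int unmasked(char[] chars, int pos) {
        return chars[pos];
    }

    /** 300 consecutive chars starting at U+4E00, so each index has a distinct value. */
    static char[] sample() {
        char[] chars = new char[300];
        for (int i = 0; i < chars.length; i++) {
            chars[i] = (char) (0x4E00 + i);
        }
        return chars;
    }

    public static void main(String[] args) {
        char[] chars = sample();
        int pos = 260;
        System.out.printf("masked index: U+%04X%n", maskedIndex(chars, pos));
        System.out.printf("masked value: 0x%02X%n", maskedValue(chars, pos));
        System.out.printf("unmasked:     U+%04X%n", unmasked(chars, pos));
    }
}
```

With the index masked, any string position from 256 onwards silently reads a character from earlier in the array; shorter inputs are unaffected, which may explain why such a mask could go unnoticed.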
Hi Gerd

Here is the first version of the changes to improve MDR Unicode handling and stop the crash.

It always provides a PRIMARY strength sort value, both in the key for sorting and in direct comparison when using the collator. Previously neither of these would have anything for a Unicode character not mentioned in the sort/cp65001.txt file.

In an attempt to stop ordering clashes between the specified sort and the ones fudged from the actual Unicode value, it orders anything unknown after the known values. Unfortunately these can then become larger than 2 bytes and, as this is all the space available without re-structuring, they have to wrap onto the known sort region. I only found 1 character that did this, and I don't know if it conflicted with an existing sort.

Regardless of the character set used, in all the places where sorting is used for de-dupe, I've used the SECONDARY strength collator to detect similar records instead of name.equals(lastName).

I also noticed that my source base included an optimisation for LargeListSorter, its use of a key cache, and some tidy-up of this in mdr7 & mdr11, so these are here as well.

Ticker
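The de-dupe change described above can be sketched with the JDK's own java.text.Collator as a stand-in; mkgmap builds its collator from its Sort class, so this only shows the general idea. DedupeSketch and its methods are hypothetical names. The list is sorted with a SECONDARY strength collator, and neighbours are then compared either with the same collator or with String.equals.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class DedupeSketch {
    /** JDK collator used as a stand-in; SECONDARY strength ignores case differences. */
    static Collator secondary() {
        Collator c = Collator.getInstance(Locale.ENGLISH);
        c.setStrength(Collator.SECONDARY);
        return c;
    }

    /** Sort with the collator, then drop neighbours the same collator calls equal. */
    static List<String> dedupeWithCollator(List<String> names, Collator collator) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(collator);
        List<String> out = new ArrayList<>();
        String last = null;
        for (String name : sorted) {
            if (last == null || collator.compare(name, last) != 0) {
                out.add(name);
            }
            last = name;
        }
        return out;
    }

    /** Same sort, but neighbours are compared with String.equals (the mixed approach). */
    static List<String> dedupeWithEquals(List<String> names, Collator collator) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(collator);
        List<String> out = new ArrayList<>();
        String last = null;
        for (String name : sorted) {
            if (!name.equals(last)) {
                out.add(name);
            }
            last = name;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> names = List.of("Paris", "PARIS", "Rome", "paris");
        Collator collator = secondary();
        System.out.println("collator de-dupe: " + dedupeWithCollator(names, collator));
        System.out.println("equals de-dupe:   " + dedupeWithEquals(names, collator));
    }
}
```

When one structure is de-duplicated with the collator and another with equals, the two lists end up with different lengths even though they were built from the same names, which is the kind of inconsistency between index sections discussed in this thread.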
Hi Ticker,

please remove the unrelated changes. I think we discussed them with patch mdrSort.patch in May, subject "MDR building out-of-memory".

Gerd
Hi Gerd

I'd removed the change relating to clearing the reference to the Sort object to allow garbage collection; as you said, this won't happen because Sort is shared. I do notice, however, that on a typical mkgmap run, Sort is created/read 3 times - it isn't shared as fully as possible.

The other changes (LargeListSorter) are slight improvements to memory usage and/or processing time - I can remove them if you want.

Ticker
Hi Ticker,

yes, please remove all unrelated optimizations.

Gerd
Hi Gerd

Here it is

Ticker
data:image/s3,"s3://crabby-images/f0134/f0134b5004a2a90c1324ff9331e4ce1f20ff1c83" alt=""
Hi Ticker, please double check Mdr25: I just wonder why we compare the region name when we sort by the country name. Looks wrong (also in the unpatched code) Gerd ________________________________________ Von: mkgmap-dev <mkgmap-dev-bounces@lists.mkgmap.org.uk> im Auftrag von Ticker Berkin <rwb-mkgmap@jagit.co.uk> Gesendet: Dienstag, 19. Oktober 2021 12:10 An: Development list for mkgmap Betreff: Re: [mkgmap-dev] java.lang.AssertionError while building index from unicode tiles Hi Gerd Here it is Ticker On Tue, 2021-10-19 at 09:22 +0000, Gerd Petermann wrote:
Hi Ticker,
yes, please remove all unrelated optimizations.
Gerd
________________________________________ Von: mkgmap-dev <mkgmap-dev-bounces@lists.mkgmap.org.uk> im Auftrag von Ticker Berkin <rwb-mkgmap@jagit.co.uk> Gesendet: Dienstag, 19. Oktober 2021 11:03 An: Development list for mkgmap Betreff: Re: [mkgmap-dev] java.lang.AssertionError while building index from unicode tiles
Hi Gerd
I'd removed the change relating to clearing the reference to the Sort object to allow garbage garbage collection; as you said, this won't happen because Sort is shared. I do notice, however, that on a typical mkgmap run, Sort is created/read 3 times - it isn't shared as fully as possible.
The other changes (LargeListSorter) are slight improvements to memory usage and/or processing time - I can remove them if you want.
Ticker
On Tue, 2021-10-19 at 08:13 +0000, Gerd Petermann wrote:
Hi Ticker,
please remove the unrelated changes. I think we discussed them with patch mdrSort.patch in May, subject "MDR building out-of-memory".
Gerd
________________________________________ Von: mkgmap-dev <mkgmap-dev-bounces@lists.mkgmap.org.uk> im Auftrag von Ticker Berkin <rwb-mkgmap@jagit.co.uk> Gesendet: Montag, 18. Oktober 2021 16:36 An: Development list for mkgmap Betreff: Re: [mkgmap-dev] java.lang.AssertionError while building index from unicode tiles
Hi Gerd
Here is the first version of the changes to improve MDR unicode and stop the crash.
It always provides a PRIMARY strength sort value, both in the key for sorting and in direct comparison when using the collator. Previously neither of these would have anything for a unicode character not mentioned in the sort/cp65001.txt file.
In an attempt to stop ordering clashes between the specified sort and the ones fudged from the actual unicode value, it orders anything unknown after the known values. Unfortunately these can then become larger than 2 bytes and, as this is all the space available without re-structuring, they have to wrap onto the known sort region. I only found 1 character that did this and I don't know if it conflicted with an existing sort.
Regardless of the character set used, in all the places where sorting is used for de-dupe, I've used the SECONDARY strength collator to detect similar records instead of name.equals(lastName).
I also noticed that my source base included optimisation for LargeListSorter, its use of a key cache and some tidy-up of this in mdr7 & mdr11 so these are here as well.
Ticker
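As an illustration of the de-dupe change described in the quoted message above (detecting a change with the collator rather than name.equals(lastName)), here is a minimal sketch. It uses the stock java.text.Collator as a stand-in for mkgmap's own collator, and the class and method names (CollatorDedupe, uniqueByCollator, uniqueByEquals) are invented for this example; it is not the code from the patch.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical example class, not part of mkgmap.
class CollatorDedupe {

    /** Sorts with the collator, then keeps one entry per run of names the collator treats as equal. */
    static List<String> uniqueByCollator(List<String> names, Collator collator) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(collator);
        List<String> unique = new ArrayList<>();
        String last = null;
        for (String name : sorted) {
            // A run ends only when the same collator that sorted the list sees a difference.
            if (last == null || collator.compare(name, last) != 0) {
                unique.add(name);
                last = name;
            }
        }
        return unique;
    }

    /** Same sort, but runs are split with String.equals(), so case variants stay separate. */
    static List<String> uniqueByEquals(List<String> names, Collator collator) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(collator);
        List<String> unique = new ArrayList<>();
        String last = null;
        for (String name : sorted) {
            if (!name.equals(last)) {
                unique.add(name);
                last = name;
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        // Assumption: an English java.text.Collator stands in for the map's sort.
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setStrength(Collator.SECONDARY);
        List<String> names = List.of("Paris", "PARIS", "paris", "Lyon");
        System.out.println("collator: " + uniqueByCollator(names, collator).size());
        System.out.println("equals:   " + uniqueByEquals(names, collator).size());
    }
}
```

With a SECONDARY-strength collator the three spellings of Paris should collapse into one entry, while the equals() variant keeps all of them. Two tables built from the same names with the two different tests can therefore end up with different sizes.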
Hi Ticker & Steve,
I don't understand the mixed use of collator.compare() and String.equals() in the Mdr classes. When we use the collator to sort the data we probably also have to use it to compare for equality while grouping?
I also see differences between the code in MdrCheck and the classes in mkgmap.
Gerd
Hi
In the changes I've just made, I hope I've been consistent and fixed all instances to use collator.compare() where scanning the results of a sort on the same table for a change. Also consistently setting strength to SECONDARY (generally case-insensitive).
There may be places where an indirect test should also use collator.compare(). Maybe this should be tackled next.
I didn't look at MdrCheck.
Ticker
Hi Ticker,
so far I don't understand most of the changes in mdrUnicode_v2.patch
Setting strength to SECONDARY (instead of the default TERTIARY) means that e.g. a and ä (German umlaut) are treated the same, right? Why do you describe it "generally case-insensitive"?
This doesn't seem to be related to unicode maps only, so I wonder what side effects this has.
Gerd
Hi Gerd
In the existing code, Mdr20, Mdr2x, and Mdr7 set the strength to SECONDARY, PrefixIndex set it to PRIMARY and Mdr5 didn't set it.
The Java manual doesn't say what the default strength is for a new Collator: https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/text/Colla... but I've seen reference to Collator.getInstance() being locale-dependent and/or TERTIARY.
Generally, SECONDARY distinguishes between accents and TERTIARY between case. Case-insensitive seems to be the correct option for mkgmap / map indexing.
Ticker
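For anyone following the strength question, a small sketch of how the three levels treat case and accents may help. It assumes the standard java.text.Collator for an English locale, which is only a stand-in and may not behave exactly like the collator mkgmap builds from its sort files; the class name CollatorStrengthDemo and the helper sameAt are made up for this example.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical example class, not part of mkgmap.
class CollatorStrengthDemo {

    /** Returns true if the two strings compare as equal at the given collator strength. */
    static boolean sameAt(int strength, String a, String b) {
        // Assumption: English rules from the JDK, not mkgmap's own sort description.
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setStrength(strength);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        String[] labels = {"PRIMARY", "SECONDARY", "TERTIARY"};
        int[] strengths = {Collator.PRIMARY, Collator.SECONDARY, Collator.TERTIARY};
        for (int i = 0; i < strengths.length; i++) {
            // "\u00e4" is a with diaeresis, written as an escape to avoid source-encoding issues.
            System.out.printf("%-9s a==A: %b, a==\u00e4: %b%n",
                    labels[i],
                    sameAt(strengths[i], "a", "A"),
                    sameAt(strengths[i], "a", "\u00e4"));
        }
    }
}
```

With the stock JDK rules, SECONDARY should treat "a" and "A" as equal but keep "a" and "ä" apart, which matches the "generally case-insensitive" description; only PRIMARY ignores the accent as well.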
Hi Ticker,
I agree that the original code isn't clear, what I don't understand is this: Do we need the changes reg. the collator to fix the problem regarding unicode or are these two separate problems? The changes in Sort seem to be needed (and I have no clue if your approach is good or not), the others seem to be OK, but not needed to avoid the crash.
I don't mind to commit the change to class Sort soon as long as only unicode maps are affected. For all other changes I'd prefer to have a new branch and maybe find a way to verify if they are improvements or not.
Gerd
Hi Ticker,
I've committed the patch as is. Reg. Mdr25: The current code doesn't make much sense, but maybe there is no Garmin software that uses this index? I have only two maps (AdriaTOPO 2.40 and a Topomap Benelux from 2009) where this section is filled. Both maps are locked, so I cannot analyse the content further.
Gerd
Hi Gerd
I was just starting to reply to your previous mail about which parts were necessary - what I had was:
Mdr5 and Mdr25 need to use the same sort/unique algorithm to ensure Mdr25 isn't bigger than Mdr5. Regardless of the character set and the logic changes to Sort.java, it is possible, but very unlikely, to come across a set of city names that cause this problem while the algorithms are different. Given Mdr5 is the most significant, Mdr25 is changed to use a Collator and Mdr29.java now includes an assert for the relative sizes.
Mdr23, 24 and 28 use the same logic (sort followed by detecting a change) to get a unique list, so I think they should be fixed in the same way as Mdr25, also bringing them into line with Mdr7, Mdr20 and Mdr2x, using the same collator strength.
...
The way that Mdr29Record just takes the first reference per country to mdr17/22/24/25/26, maybe the region/country complexity doesn't matter in the mdr25 logic.
Ticker
Hi Gerd
I didn't understand this either - Mdr29 with lowest refs to Mdr17, Mdr22, Mdr24, Mdr25 and Mdr26 is beyond me so I thought it best to leave that part untouched.
Ticker
On Wed, 2021-10-20 at 07:59 +0000, Gerd Petermann wrote:
Hi Ticker,
please double check Mdr25: I just wonder why we compare the region name when we sort by the country name.
Looks wrong (also in the unpatched code)
Gerd
Hi Ticker
Problem is that resources/sort/cp65001.txt doesn't give ordering to lots of characters; it looks like it covers only about 10,500 of the 1,112,064 possible code-points. Many of these non-ordered characters are being used by the names in the tile in question.
I used the program in extra/src/uk/me/parabola/util/CollationRules.java to generate some of the tables. This uses the file "allkeys.txt" (https://www.unicode.org/Public/UCA/latest/allkeys.txt). The document that explains the Unicode collation rules and references that file is http://www.unicode.org/reports/tr10/ and it includes a section on programmatically deriving the weights for characters that do not have explicit entries in the table.
Assuming the actual ordering of unspecified code-points doesn't really matter, I propose to change the logic slightly so undefined Unicode is sorted on its 16-bit value after the range of known sorts.
I think that is a good initial approach to get things working. Steve
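To make the proposal above concrete, here is a rough sketch of one way such a fallback could look: characters without an entry in the table get a primary weight derived from their 16-bit value and placed after all known weights, and anything that no longer fits in two bytes wraps back to the start of the range. Everything here is an assumption for illustration (the class FallbackWeightSketch, the method primaryWeight, the known map, and the exact wrap rule); it is not the logic in Sort.java.

```java
import java.util.Map;

// Hypothetical example class, not part of mkgmap.
class FallbackWeightSketch {

    /** Assumption: only two bytes are available for a primary weight. */
    static final int MAX_WEIGHT = 0xFFFF;

    /**
     * Returns the table weight for a known character, otherwise a weight
     * ordered by the 16-bit value and placed after every known weight.
     */
    static int primaryWeight(char ch, Map<Character, Integer> known, int maxKnown) {
        Integer weight = known.get(ch);
        if (weight != null) {
            return weight; // explicit ordering from the sort table
        }
        int fallback = maxKnown + 1 + ch;
        // If the value no longer fits in two bytes, wrap to the start of the range,
        // where it can collide with a known weight.
        return fallback > MAX_WEIGHT ? fallback - MAX_WEIGHT : fallback;
    }

    public static void main(String[] args) {
        // Toy table: two known characters with weights 1 and 2.
        Map<Character, Integer> known = Map.of('a', 1, 'b', 2);
        System.out.println(primaryWeight('a', known, 2));      // known character
        System.out.println(primaryWeight('\u4e2d', known, 2)); // unknown, sorted after the known ones
        System.out.println(primaryWeight('\ufffd', known, 2)); // unknown, wraps onto the known region
    }
}
```

In this toy setup only characters near the top of the 16-bit range wrap, which is in line with the wrap being a rare case, but the real boundary depends on how many weights the table already uses.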
participants (4)
- Carlos Dávila
- Gerd Petermann
- Steve Ratcliffe
- Ticker Berkin