MDR building out-of-memory
Hi Gerd

Since downloading britain-and-ireland-latest.osm.pbf I have been unable to build a gmapsupp because of running out of heap (my hardware is 32-bit; -Xmx1540M is the largest value allowed).

My problem is mainly because I have 1731146 cities (along with 1046096 streets).

Looking at Mdr5 processing, I've changed it in 3 ways to improve memory usage and garbage collection.

1/ Use trimToSize() after all the cities are loaded from the individual tile .img. I presume that the growth factor gradually increases as it runs out of allocated array space. I had to change the declaration from List<Mdr5Record> to ArrayList<Mdr5Record> to allow this, but I can't see any problem in this.

2/ Move the main part of preWriteImpl into its own method so the first sortKeys ArrayList and Sort can be freed before calcMdr20/1/2() each create another massive SortKeys and Sort.

3/ Move the scope of mdr20s to a class variable. This is referenced by all the Mdr5Records, and the scope where it was declared before seemed to cause the garbage collector major problems - it churned for 5 mins using all the processors before running out of memory. After moving it, the whole of mdr is built in a couple of mins with CPU usage mostly < 125%.

Ticker
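Points 1/ and 2/ rely on two general Java behaviours: ArrayList.trimToSize() releases unused capacity in the backing array, and objects referenced only from a method's local variables become unreachable once that method returns. The following is a minimal sketch of both, not the actual patch. The class CityListSketch, the nested City class (a stand-in for Mdr5Record) and all method names are assumptions made for illustration and are not taken from the mkgmap source.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class CityListSketch {
    // Hypothetical stand-in for Mdr5Record; not the real mkgmap class.
    static final class City {
        final String name;
        City(String name) { this.name = name; }
    }

    // Declared as ArrayList rather than List so that trimToSize() is callable.
    private final ArrayList<City> allCities = new ArrayList<>();

    void addCity(String name) {
        allCities.add(new City(name));
    }

    // Call once after the last tile has been read: drops the unused tail
    // of the backing array left over from the most recent growth step.
    void finishLoading() {
        allCities.trimToSize();
    }

    // The temporary key list is local to this method, so it becomes
    // unreachable as soon as the method returns, before any later stage
    // allocates its own large sort structures.
    void sortByName() {
        List<Map.Entry<String, City>> keys = new ArrayList<>(allCities.size());
        for (City c : allCities) {
            keys.add(new AbstractMap.SimpleEntry<>(c.name, c));
        }
        keys.sort(Map.Entry.comparingByKey());
        allCities.clear(); // keeps the capacity, so no reallocation below
        for (Map.Entry<String, City> e : keys) {
            allCities.add(e.getValue());
        }
    }

    List<String> names() {
        List<String> out = new ArrayList<>(allCities.size());
        for (City c : allCities) {
            out.add(c.name);
        }
        return out;
    }
}
```

Whether the garbage collector actually reclaims the temporary list early depends on the JVM and its settings; the sketch only shows that nothing keeps it reachable.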
Hi Ticker,

thanks, good findings!

The patch doesn't use trimToSize() on cities. Did you change your mind? The part about the scope is interesting. At first glance I thought this should make no difference, but it probably helps the GC to detect that this array cannot be garbage collected.

The SortKeys stuff is really eating memory and it would be good to find a better solution. One approach is to use the cache as in the attached patch, but that only helps when memory is really the problem; it slows down processing in other situations. Maybe it would be better to first collect all combinations of region+country, sort them, and use the position in that list to sort other objects?

Gerd

________________________________________
From: mkgmap-dev <mkgmap-dev-bounces@lists.mkgmap.org.uk> on behalf of Ticker Berkin <rwb-mkgmap@jagit.co.uk>
Sent: Monday, 10 May 2021 23:05
To: mkgmap development
Subject: [mkgmap-dev] MDR building out-of-memory
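The region+country idea from Gerd's reply can be sketched as follows: collect the distinct strings, sort them once, and give every record the integer position of its string, so later sorts compare small integers instead of holding one sort key per record. This is only an illustration of the idea, not code from mkgmap. The class RegionRankSketch, the method rankDistinct and the "region|country" string form are assumptions.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

class RegionRankSketch {
    // Maps each distinct name to its position in sorted order.
    // Names that compare as equal under 'order' share the same rank.
    static Map<String, Integer> rankDistinct(Collection<String> names,
                                             Comparator<? super String> order) {
        List<String> distinct = new ArrayList<>(new HashSet<>(names));
        distinct.sort(order);

        Map<String, Integer> rank = new HashMap<>();
        int pos = -1;
        String prev = null;
        for (String s : distinct) {
            if (prev == null || order.compare(prev, s) != 0) {
                pos++; // a new group of equal-sorting names begins here
            }
            rank.put(s, pos);
            prev = s;
        }
        return rank;
    }
}
```

With only a few hundred distinct regions and countries, the rank map stays tiny no matter how many cities refer to it.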
Hi Gerd

allCities.trimToSize() is at the top of preWriteImpl(). cities is empty at this point. I should trim it at the end of genCitiesAndMdr20s or delay allocation so it can be made the same size as allCities.

Is there any reason not to share "sort" between genCitiesAndMdr20s and calcMdr20/1/2SortPos()? I'll try it.

Your patch wasn't attached. I had given up trying to understand the different combinations of sort keys for mdr20/1/2 and, until it runs out of memory again, I'd rather not touch it.

I'll experiment a bit more and send another patch in a while.

Just seen your change to show the stack trace on OOM - good.

Ticker

(In reply to Gerd Petermann, Tue, 2021-05-11 at 07:04 +0000.)

mkgmap-dev mailing list
mkgmap-dev@lists.mkgmap.org.uk
https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
Hi Ticker,

sorry, here's the patch.

Gerd

________________________________________
From: mkgmap-dev <mkgmap-dev-bounces@lists.mkgmap.org.uk> on behalf of Ticker Berkin <rwb-mkgmap@jagit.co.uk>
Sent: Tuesday, 11 May 2021 10:07
To: Development list for mkgmap
Subject: Re: [mkgmap-dev] MDR building out-of-memory
Hi Gerd

Here is an updated version of the patch.

Changes from the last one:

Uses your cache code for region and country (in 2 places). For the British Isles, there are 190 regions and 7 countries, so I don't think the extra memory will be a problem and there should be some performance benefit.

Delays allocating cities until it can use sortKeys.size() for initial allocation. For the above map this is 0.07% too big, so I don't think trimToSize() is worthwhile.

Shares the Sort object between the 4 methods.

Ticker
Hi Ticker,

I've committed the patch as is. I've not seen big changes in performance, but I used a different (already existing) set of files which was created with my own style. For me, Mdr11.preWriteImpl() is the most problematic part regarding OOM errors.

Maybe look at the code which uses LargeListSorter.

Gerd

________________________________________
From: mkgmap-dev <mkgmap-dev-bounces@lists.mkgmap.org.uk> on behalf of Ticker Berkin <rwb-mkgmap@jagit.co.uk>
Sent: Tuesday, 11 May 2021 13:27
To: Development list for mkgmap
Subject: Re: [mkgmap-dev] MDR building out-of-memory
Hi Gerd

I've looked at this, and if Mdr5 space becomes a problem again, I'll consider converting to it.

A couple of comments:

My map had 2909735 POIs, so the sort chunk size was ~727433. The cache sizes after each chunk were 563095, 603595, 597239 & 605718. Is the cache worthwhile for this low hit rate? Just running the Gmapsupp combiner on existing tiles (without --route, so no streets), I got a run time of 1 min 44 secs with the cache and 1 min 30 secs without! However, most of this time is spent copying the tiles into gmapsupp.img, so it is not an accurate statistic.

You could pre-allocate List<> "merged" with the correct size.

Ticker

(In reply to Gerd Petermann, Tue, 2021-05-11 at 14:40 +0000.)
Hi Ticker,

I think the cache is not meant to improve run time; it is used to deduplicate and thus reduce memory. Maybe it would be better to use a smaller chunk size and no cache. No idea why I didn't use merged = new ArrayList<>(len);

Gerd

________________________________________
From: mkgmap-dev <mkgmap-dev-bounces@lists.mkgmap.org.uk> on behalf of Ticker Berkin <rwb-mkgmap@jagit.co.uk>
Sent: Tuesday, 11 May 2021 19:19
To: Development list for mkgmap
Subject: Re: [mkgmap-dev] MDR building out-of-memory
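To make the deduplication argument concrete, here is a small sketch of a key cache that counts its own hit rate. It is not LargeListSorter's cache. The class KeyCacheSketch, its methods and the use of UTF-8 bytes as a "sort key" are assumptions chosen only to keep the example self-contained.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class KeyCacheSketch {
    private final Map<String, byte[]> cache = new HashMap<>();
    private int lookups;
    private int hits;

    // Returns one shared key object per distinct name, so repeated
    // names do not each hold their own copy of the key bytes.
    byte[] keyFor(String name) {
        lookups++;
        byte[] key = cache.get(name);
        if (key != null) {
            hits++;
            return key;
        }
        // Placeholder key: plain UTF-8 bytes, NOT a real collation key.
        key = name.getBytes(StandardCharsets.UTF_8);
        cache.put(name, key);
        return key;
    }

    int size() {
        return cache.size();
    }

    double hitRate() {
        return lookups == 0 ? 0.0 : (double) hits / lookups;
    }
}
```

Applied to the figures quoted in the thread, and assuming the cache is emptied for each chunk and every lookup either hits or adds one entry, the first chunk gives (727433 - 563095) / 727433, or roughly 23% hits. The cache therefore saves under a quarter of the keys while adding the cost of the map itself.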
Hi Gerd

Certainly no cache. Maybe reduce the chunk size, but this might increase copying.

It could be improved by doing a linear chunk split/sort, then a multi-way merge. This would avoid lots of copying, assuming the following:

Using the original array to store sorted chunks demands that another array of the same full size is needed for the final merge. If each sorted key chunk is converted to an object chunk and these are merged, then although the same total size is needed, it is made up of a number of smaller arrays.

The most space-efficient solution might be to have Mdr11 "implements Comparable", generate pairs of sort keys on the fly, and let the Java sort take care of all the details.

The other use of LargeListSorter is Mdr7. I get a higher hit rate (~50%) for the first/partialSorter (1046096 allstreets).

However, for the repeated fullNameSorter on the partial results, most of the lists are just 1 long, with very few more than 10. I guess this depends on --road-name-config/split-name and use of shields etc., but LargeListSorter seems overkill.

Ticker

(In reply to Gerd Petermann, Tue, 2021-05-11 at 19:15 +0000.)
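The "linear chunk split/sort, then multi-way merge" suggestion can be sketched with a priority queue holding one cursor per sorted chunk. This is a generic illustration, not the LargeListSorter implementation. The class ChunkMergeSketch and the method sortInChunks are assumptions, and the sketch sorts the objects directly rather than going through sort keys.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class ChunkMergeSketch {
    static <T> List<T> sortInChunks(List<T> input, int chunkSize,
                                    Comparator<? super T> order) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        // 1. Split linearly and sort each chunk on its own, so only one
        //    chunk's worth of sorting work is active at a time.
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < input.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, input.size());
            List<T> chunk = new ArrayList<>(input.subList(from, to));
            chunk.sort(order);
            chunks.add(chunk);
        }

        // 2. Multi-way merge. Each heap entry is {chunkIndex, offset};
        //    ties fall back to chunk index, which keeps the merge stable.
        PriorityQueue<int[]> heads = new PriorityQueue<>((a, b) -> {
            int c = order.compare(chunks.get(a[0]).get(a[1]),
                                  chunks.get(b[0]).get(b[1]));
            return c != 0 ? c : Integer.compare(a[0], b[0]);
        });
        for (int i = 0; i < chunks.size(); i++) {
            heads.add(new int[] {i, 0});
        }

        // Pre-sized result list: no regrowth while merging.
        List<T> merged = new ArrayList<>(input.size());
        while (!heads.isEmpty()) {
            int[] head = heads.poll();
            List<T> chunk = chunks.get(head[0]);
            merged.add(chunk.get(head[1]));
            if (head[1] + 1 < chunk.size()) {
                heads.add(new int[] {head[0], head[1] + 1});
            }
        }
        return merged;
    }
}
```

The total storage is still about twice the input, as noted in the message, but it is spread over several smaller arrays plus one result list instead of two full-size arrays.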
Hi Ticker,

I think the MultiSortKeys were introduced because the on-the-fly solution was far too slow, at least with the normal Java sort. It could well be that the problem was solved with Java 8 or newer releases.

I think there was a special case with maps containing huge numbers of equally named roads causing extreme run times. This depends on the style. Some styles add a name like "tr2" for each unnamed track with tracktype=grade2.

Gerd

________________________________________
From: mkgmap-dev <mkgmap-dev-bounces@lists.mkgmap.org.uk> on behalf of Ticker Berkin <rwb-mkgmap@jagit.co.uk>
Sent: Wednesday, 12 May 2021 10:48
To: Development list for mkgmap
Subject: Re: [mkgmap-dev] MDR building out-of-memory
Hi Gerd

I did a bit more experimenting with LargeListSorter and Mdr7/11 and have some changes - patch attached.

1/ explicit param to LargeListSorter to useCache
2/ do nothing if 0 or 1 records to sort
3/ reduce chunkSize to reduce memory usage, increase maxDepth so same overall limit
4/ allocate temp "merged" with correct length
5/ useCache parameters and remarks on usage in Mdr7 & 11
6/ prevent a copy of allStreets
7/ share the collator and allow it and "sort" to be garbage collected
8/ a comment that the sort of partial streets must use sortkey, even if only sorting a few records. LargeListSorter is a handy way of doing this for a simple/single sortKey per record.

Sortkey simply makes a byte array of the string converted into the target charset, and sort does a byte comparison on this. collator.compare(str1, str2) ignores the shield chars (and possibly the other prefix/suffix markers) during the comparison and has extra dynamic processing to handle char aliases (PRIMARY strength etc).

The comment at the top of Sort.java:

 * found that sorting with the sort keys and the collator gave different results in some
 * cases. This implementation does not.

is not really correct.

An alternative collator string comparison could be provided that behaves in the same way as sort.createSortKey. This would speed up 'on-the-fly' sorts, maybe making them feasible.

Ticker

(In reply to Gerd Petermann, Wed, 2021-05-12 at 09:08 +0000.)
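The trade-off between precomputed sort keys and on-the-fly collator comparison can be illustrated with the JDK's own java.text.Collator and CollationKey. These are used here only as a stand-in: mkgmap has its own Sort class and collator, whose behaviour (as the message above points out) differs between the two paths. The class SortKeySketch and both method names are assumptions.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortKeySketch {
    // On-the-fly: no extra memory per string, but the collation rules
    // are re-applied on every comparison the sort performs.
    static List<String> sortOnTheFly(List<String> names, Collator collator) {
        List<String> out = new ArrayList<>(names);
        out.sort(collator); // Collator implements Comparator<Object>
        return out;
    }

    // Precomputed keys: one key object per string is held in memory,
    // but each comparison is then a cheap comparison of key data.
    static List<String> sortWithKeys(List<String> names, Collator collator) {
        List<CollationKey> keys = new ArrayList<>(names.size());
        for (String n : names) {
            keys.add(collator.getCollationKey(n));
        }
        Collections.sort(keys); // CollationKey is Comparable<CollationKey>
        List<String> out = new ArrayList<>(keys.size());
        for (CollationKey k : keys) {
            out.add(k.getSourceString());
        }
        return out;
    }
}
```

In the JDK the two paths are specified to give the same ordering, which is the property a replacement string comparison matching sort.createSortKey would need in order to make on-the-fly sorting a safe substitute.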
Hi Ticker, reg. Sort instances: My understanding is that there is only one instance. So, no case for GC, right? Gerd ________________________________________ Von: mkgmap-dev <mkgmap-dev-bounces@lists.mkgmap.org.uk> im Auftrag von Ticker Berkin <rwb-mkgmap@jagit.co.uk> Gesendet: Mittwoch, 12. Mai 2021 17:59 An: Development list for mkgmap Betreff: Re: [mkgmap-dev] MDR building out-of-memory Hi Gerd I did a bit more experimenting with LargeListSorter and Mdr7/11 and have some changes - patch attached. 1/ explicit param to LargeListSorted to useCache 2/ do nothing if 0 or 1 records to sort 3/ reduce chunkSize to reduce memory usage, increase maxDepth so same overall limit 4/ allocate temp "merged" with correct length 5/ useCache parameters and remarks on usage in Mdr7 & 11 6/ prevent a copy of allStreets 7/ share the collator and allow it and "sort" to be garbage collected 8/ A comments that sort of partial streets must use sortkey, even if only sorting a few record. LargeListSorter is a handy way of doing this for a simple/single sortKey per record. Sortkey simply makes a byte array of the string converted into the target charset and sort does a byte comparison on this. collator.compare(str1, str2) ignores the shield chars (and possibly the other prefix/suffix markers) during the comparison and has extra dynamic processing looking to handle char aliases (PRIMARY strength etc). The comment at the top of Sort.java: * found that sorting with the sort keys and the collator gave different results in some * cases. This implementation does not. is not really correct. An alternative collator string comparison could be provided that behaves in the same way as sort.createSortKey. This would speed up 'on -the-fly' sorts, maybe becoming feasible. Ticker On Wed, 2021-05-12 at 09:08 +0000, Gerd Petermann wrote:
Hi Ticker,
I think the MultiSortKeys were introduced because the on-the-fly solution was far too slow, at least with the normal Java sort. Could well be that the problem was solved with Java 8 or newer releases.
I think there was a special case with maps containing huge numbers of equally named roads causing extreme run times. This depends on the style. Some styles add a name like "tr2" for each unnamed track with tracktype=grade2.
Gerd
________________________________________ From: mkgmap-dev <mkgmap-dev-bounces@lists.mkgmap.org.uk> on behalf of Ticker Berkin <rwb-mkgmap@jagit.co.uk> Sent: Wednesday, 12 May 2021 10:48 To: Development list for mkgmap Subject: Re: [mkgmap-dev] MDR building out-of-memory
Hi Gerd
Certainly no cache. Maybe reduce the chunk size, but this might increase copying.
It could be improved by doing a linear chunk split/sort followed by a multi-way merge. This would avoid lots of copying, assuming the following:
Using the original array to store the sorted chunks means that another array of the same full size is needed for the final merge. If each sorted key chunk is converted to an object chunk and these are merged, then although the same total size is needed, it is made up of a number of smaller arrays.
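The split/sort/merge idea above could be sketched roughly as follows. This is only an illustration under assumptions: `ChunkMergeSketch` and `Cursor` are hypothetical names, not mkgmap classes, and the sketch works on plain lists with a caller-supplied comparator rather than on sort keys.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical helper (not part of mkgmap): sort fixed-size chunks, then merge them in one pass. */
final class ChunkMergeSketch {

    /** Read position within one already sorted chunk. */
    private static final class Cursor<T> {
        final List<T> chunk;
        int pos;
        Cursor(List<T> chunk) { this.chunk = chunk; }
        T head() { return chunk.get(pos); }
    }

    static <T> List<T> sort(List<T> input, int chunkSize, Comparator<? super T> cmp) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be >= 1");
        }
        // 1. Linear split: each chunk is copied once and sorted on its own.
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < input.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, input.size());
            List<T> chunk = new ArrayList<>(input.subList(from, to));
            chunk.sort(cmp);
            chunks.add(chunk);
        }
        // 2. Multi-way merge: the heap holds one cursor per chunk, ordered by its current head.
        PriorityQueue<Cursor<T>> heap =
                new PriorityQueue<>((a, b) -> cmp.compare(a.head(), b.head()));
        for (List<T> chunk : chunks) {
            heap.add(new Cursor<>(chunk));
        }
        List<T> merged = new ArrayList<>(input.size()); // allocated once with the final size
        while (!heap.isEmpty()) {
            Cursor<T> c = heap.poll();
            merged.add(c.head());
            if (++c.pos < c.chunk.size()) {
                heap.add(c); // re-insert the cursor with its next element as head
            }
        }
        return merged;
    }
}
```

Each element is copied once into a chunk and once into the result, and the only full-size allocation is the final list; the intermediate storage is a set of smaller arrays, as described above. Equal elements from different chunks are not guaranteed to keep their input order in this sketch.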
The most space-efficient solution might be to have Mdr11 "implements Comparable", generate pairs of sortkeys on the fly, and let the Java sort take care of all the details.
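A minimal sketch of that idea, with several assumptions: `PoiRecordSketch` and its fields are hypothetical, and `java.text.Collator` / `CollationKey` stand in for mkgmap's own Sort and sort-key classes, which behave differently in detail.

```java
import java.text.Collator;
import java.util.Locale;

/** Hypothetical stand-in for an Mdr11-style record; the field names are assumptions. */
final class PoiRecordSketch implements Comparable<PoiRecordSketch> {
    // One shared collator for all records; java.text.Collator is used here
    // only as a stand-in for mkgmap's Sort class.
    private static final Collator COLLATOR = Collator.getInstance(Locale.ROOT);
    static {
        COLLATOR.setStrength(Collator.PRIMARY);
    }

    final String name;
    final int mapIndex;

    PoiRecordSketch(String name, int mapIndex) {
        this.name = name;
        this.mapIndex = mapIndex;
    }

    @Override
    public int compareTo(PoiRecordSketch other) {
        // The pair of sort keys is created on the fly and discarded after the
        // comparison, so no list of keys is held for the whole data set.
        int c = COLLATOR.getCollationKey(name)
                .compareTo(COLLATOR.getCollationKey(other.name));
        // Tie-break on a cheap integer field so equal names sort deterministically.
        return c != 0 ? c : Integer.compare(mapIndex, other.mapIndex);
    }
}
```

With this in place, `list.sort(null)` or `Collections.sort(list)` would do the rest. The trade-off is the one discussed in this thread: no extra memory for stored keys, but a pair of keys is rebuilt for every comparison, which may be too slow for very large lists.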
The other use of LargeListSorter is Mdr7. I get a higher hit-rate (~50%) for the first/partialSorter (1046096 allstreets).
However, for the repeated fullNameSorter on the partial results, most of the lists are just 1 long, with very few more than 10. I guess this depends on --road-name-config/split-name and use of shields etc, but LargeListSorter seems overkill.
Ticker
On Tue, 2021-05-11 at 19:15 +0000, Gerd Petermann wrote:
Hi Ticker,
I think the cache is not meant to improve run time, it is used to deduplicate and thus reduce memory. Maybe it would be better to use a smaller chunk size and no cache. No idea why I didn't use merged = new ArrayList<>(len);
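To illustrate what a deduplicating cache of this kind does, here is a small sketch. It is not the LargeListSorter code: `KeyCacheSketch` and the key factory are hypothetical, and the hit counter is only there to make the hit rate visible.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical cache: records with the same name share one key object. */
final class KeyCacheSketch<K> {
    private final Map<String, K> cache = new HashMap<>();
    private final Function<String, K> keyFactory;
    private int hits;

    KeyCacheSketch(Function<String, K> keyFactory) {
        this.keyFactory = keyFactory;
    }

    K keyFor(String name) {
        K key = cache.get(name);
        if (key != null) {
            hits++;            // duplicate name: reuse the existing key object
            return key;
        }
        key = keyFactory.apply(name);
        cache.put(name, key);  // first occurrence: create and remember the key
        return key;
    }

    int size() { return cache.size(); }

    int hits() { return hits; }

    /** Drop all cached keys, e.g. after a chunk has been sorted. */
    void clear() {
        cache.clear();
        hits = 0;
    }
}
```

The memory saved is roughly one key per hit, while the map itself costs an entry per distinct name, so a cache like this only pays off when many names repeat.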
Gerd
________________________________________ From: mkgmap-dev <mkgmap-dev-bounces@lists.mkgmap.org.uk> on behalf of Ticker Berkin <rwb-mkgmap@jagit.co.uk> Sent: Tuesday, 11 May 2021 19:19 To: Development list for mkgmap Subject: Re: [mkgmap-dev] MDR building out-of-memory
Hi Gerd
I've looked at this, and if Mdr5 space becomes a problem again, I'll consider converting to it.
A couple of comments:
My map had 2909735 POIs, so the sort chunk size was ~727433. The cache sizes after each chunk were 563095, 603595, 597239 & 605718. Is the cache worthwhile for this low hit-rate? Just running the Gmapsupp combiner on existing tiles (without --route, so no streets), I got a run time of 1 min 44 secs with the cache and 1 min 30 secs without! However, most of this time is spent copying the tiles into gmapsupp.img, so it is not an accurate statistic.
You could pre-allocate List<> "merged" with the correct size.
Ticker
On Tue, 2021-05-11 at 14:40 +0000, Gerd Petermann wrote:
Hi Ticker,
I've committed the patch as is. I've not seen big changes in performance, but I've used a different (already existing) set of files which was created with my own style. For me, Mdr11.preWriteImpl() is the most problematic part regarding OOM errors.
Maybe look at the code which uses LargeListSorter.
Gerd
________________________________________ From: mkgmap-dev <mkgmap-dev-bounces@lists.mkgmap.org.uk> on behalf of Ticker Berkin <rwb-mkgmap@jagit.co.uk> Sent: Tuesday, 11 May 2021 13:27 To: Development list for mkgmap Subject: Re: [mkgmap-dev] MDR building out-of-memory
Hi Gerd
Here is an updated version of the patch.
Changes from last:
Uses your cache code for region and country (in 2 places). For British Isles, there are 190 regions and 7 countries, so I don't think the extra memory will be a problem and there should be some performance benefit.
Delays allocating cities until it can use sortKeys.size() for initial allocation. For above map this is 0.07% too big, so I don't think trimToSize() is worthwhile.
Shares the Sort object between the 4 methods.
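The delayed-allocation change can be pictured with the sketch below. It is an assumption-laden illustration: `PresizeSketch` is hypothetical, strings stand in for the sort keys and city records, and dropping adjacent duplicates is only a guess at why the final list ends up slightly smaller than the number of keys.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of sizing a list once instead of growing it and trimming later. */
final class PresizeSketch {

    /** Builds the result list with a capacity taken from the already known key count. */
    static ArrayList<String> collect(List<String> sortedKeys) {
        // Capacity is an upper bound: the backing array is allocated once and never regrown.
        ArrayList<String> cities = new ArrayList<>(sortedKeys.size());
        String last = null;
        for (String key : sortedKeys) {
            if (!key.equals(last)) { // assumption: adjacent duplicates produce no new record
                cities.add(key);
                last = key;
            }
        }
        // cities.trimToSize() would release the few unused slots, at the cost of one more copy.
        return cities;
    }
}
```

With about 1.7 million cities, an overshoot of 0.07% is on the order of 1,200 unused slots, which is why the extra array copy done by trimToSize() hardly seems worth it here.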
Ticker _______________________________________________ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
participants (2)
- Gerd Petermann
- Ticker Berkin