
Hi all,
I've just committed r229. I've fixed all reported errors regarding --keep-complete and found some nice performance improvements for Henning's special case (planet as input, split-file with > 800 tiles, many of them overlapping each other), plus some general improvements regarding memory requirements. The larger the input file and the number of tiles, the larger the improvement, so I assume that Henning will now be able to split planet within 3 hours.
A few users wanted to see the generated problem list, so the program now writes the generated candidates to the output directory. If you always use the same split-file, it might be useful to reuse the list with the problem-file parameter instead of using keep-complete. For a two-week-old germany.osm.pbf I get > 1.800.000 lines (32 MB), so I doubt that anyone wants to edit such a file ;-)
ciao, Gerd
-- View this message in context: http://gis.19327.n5.nabble.com/splitter-improvements-tp5735230.html Sent from the Mkgmap Development mailing list archive at Nabble.com.

Hi Gerd, it took a lot of RAM, but it works for me now. It took 2:30 h, so it's OK. Thanks a lot! Henning

Henning Scholland wrote
Hi Gerd, it took a lot of RAM, but it works for me now. It took 2:30 h, so it's OK. Thanks a lot!
Great! Please send me the log; I'd like to find out where it needs the most memory. Today I've created a version of SparseLong2ShortMap that requires ~10% less memory, but is a bit slower. But I guess the problem is the TreeMaps that are filled in pass 4 of the MultiTileProcessor. They can be replaced by HashMaps and a sort at a later time.
Gerd
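The idea of replacing the TreeMaps with HashMaps plus a later sort can be sketched roughly as follows. This is only an illustration, not splitter's code: the class name `DeferredSortSketch`, the method `sortedIds`, and the sample ids are hypothetical. A TreeMap keeps its keys ordered on every insert, while a HashMap does not, so the ordering work is done once at the end, when it is actually needed. Whether this saves memory in practice depends on the JVM and the data.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of splitter: collect id -> writer index in a
// HashMap while parsing and produce the ids in ascending order only once.
class DeferredSortSketch {
    /** Returns the keys of the map in ascending order. */
    static long[] sortedIds(Map<Long, Integer> writers) {
        long[] ids = new long[writers.size()];
        int i = 0;
        for (long id : writers.keySet()) {
            ids[i++] = id;
        }
        Arrays.sort(ids); // the single "sort at a later time"
        return ids;
    }

    public static void main(String[] args) {
        Map<Long, Integer> nodeWriters = new HashMap<>();
        nodeWriters.put(42L, 3); // sample ids, inserted out of order
        nodeWriters.put(7L, 1);
        nodeWriters.put(19L, 2);
        for (long id : sortedIds(nodeWriters)) {
            System.out.println(id + " -> " + nodeWriters.get(id));
        }
    }
}
```

The sorted id array is a temporary copy of the keys, so it costs 8 bytes per entry while it exists.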

Hi, On Sat, Nov 10, GerdP wrote:
I've just committed r229.
I've fixed all reported errors regarding --keep-complete and found some nice performance improvements for Henning's special case (planet as input, split-file with > 800 tiles, many of them overlapping each other) and some general improvements regarding memory requirements.
Could it be that r229 needs much more memory than r224? r224 had no problems, but with r229 I get for the same input file:
java.lang.OutOfMemoryError: Java heap space
    at java.lang.Integer.valueOf(Unknown Source)
    at uk.me.parabola.splitter.DataStorer$WriterMapper.put(DataStorer.java:171)
    at uk.me.parabola.splitter.DataStorer.putWriterIdx(DataStorer.java:99)
    at uk.me.parabola.splitter.MultiTileProcessor.addOrMergeWriters(MultiTileProcessor.java:603)
    at uk.me.parabola.splitter.MultiTileProcessor.processWay(MultiTileProcessor.java:145)
    at uk.me.parabola.splitter.BinaryMapParser.parseWays(BinaryMapParser.java:170)
It's a DACH extract.
Thorsten
-- Thorsten Kukuk, Project Manager/Release Manager SLES SUSE LINUX Products GmbH, Maxfeldstr. 5, D-90409 Nuernberg GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 16746 (AG Nürnberg)

Hi Thorsten, that's strange; the newer version should require less memory, especially in that part of the program. Did you use exactly the same input files and parameters? I will check whether D A CH is somehow special; please let me know your parameters. Please try again with r231, which I've just committed.
Ciao, Gerd

Hi, On Mon, Nov 12, GerdP wrote:
Hi Thorsten,
that's strange, the newer version should require less memory, esp. in that part of the program. Did you use exactly the same input files and parameters?
Yes, I did, and verified it twice.
I will check if D A CH is somehow special, please let me know your parameters.
java -jar /usr/share/java/splitter-r224.jar --mapid=71200001 --max-nodes=1000000 --overlap=0 --keep-complete=true --geonames-file=osmmaps/scripts/cities/DACH.txt --description=TK-DACH-Tile --output=pbf --output-dir=build/DACH/tiles data/osm/DACH.osm.pbf
cache=
description=TK-DACH-Tile
geonames-file=osmmaps/scripts/cities/DACH.txt
keep-complete=true
mapid=71200001
max-areas=255
max-nodes=1000000
max-threads=4 (auto)
mixed=false
no-trim=false
output=pbf
output-dir=build/DACH/tiles
overlap=0
problem-file=
resolution=13
split-file=
status-freq=120
write-kml=
Please try again with r231, which I've just committed.
I will do so tomorrow. Thorsten

Hi Thorsten,
java -jar /usr/share/java/splitter-r224.jar --mapid=71200001 --max-nodes=1000000 --overlap=0 --keep-complete=true --geonames-file=osmmaps/scripts/cities/DACH.txt --description=TK-DACH-Tile --output=pbf --output-dir=build/DACH/tiles data/osm/DACH.osm.pbf
If you really execute splitter without any -Xmx parameter, I wonder how it gets so far?
Ciao, Gerd

On Tue, Nov 13, Gerd Petermann wrote:
Hi Thorsten,
java -jar /usr/share/java/splitter-r224.jar --mapid=71200001 --max-nodes=1000000 --overlap=0 --keep-complete=true --geonames-file=osmmaps/scripts/cities/DACH.txt --description=TK-DACH-Tile --output=pbf --output-dir=build/DACH/tiles data/osm/DACH.osm.pbf
if you really execute splitter without any -Xmx parm I wonder how it gets so far ?
Sorry, forgot to Cut&Paste it: -Xmx=3000M Thorsten
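For readers who want to verify which heap limit the JVM actually applied, a small check like the one below may help. It is a sketch, not something used in the thread: the class name `HeapLimitCheck` and its methods are hypothetical, and only the standard `Runtime.maxMemory()` call is relied on. Note that the usual HotSpot spelling of the option has no equals sign, for example `-Xmx3000M`.

```java
// Hypothetical helper, not part of splitter: report the maximum heap size
// that the running JVM will attempt to use.
class HeapLimitCheck {
    /** Maximum heap in MiB as reported by the JVM. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    /** True if the JVM may use at least the given number of MiB. */
    static boolean hasAtLeast(long requiredMiB) {
        return maxHeapMiB() >= requiredMiB;
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
        // 2900 is an arbitrary illustrative threshold, not a splitter requirement.
        if (!hasAtLeast(2900)) {
            System.out.println("Heap limit is below 2900 MiB; check the -Xmx option.");
        }
    }
}
```

Running it with the same JVM options as the real job, for example `java -Xmx3000M HeapLimitCheck`, shows whether the option took effect. The reported value is typically a little below the requested size.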
_______________________________________________ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

Hi Thorsten, no problem. Just to make sure: I want to reproduce the problem with an extract of Europe using the DACH.poly from your package http://osm.thkukuk.de/tk-osm.tar.bz2
I did not try to understand or run all the scripts. Is it correct that you do not merge other data into this file?
Ciao, Gerd

On Mon, Nov 12, GerdP wrote:
Hi Thorsten,
no problem. Just to make sure: I want to reproduce the problem with an extract of Europe using the DACH.poly from your package http://osm.thkukuk.de/tk-osm.tar.bz2
I did not try to understand or run all the scripts. Is it correct that you do not merge other data into this file?
Correct, all I do is:
osmconvert planet.osm.pbf -B=osmmaps/scripts/poly/DACH.poly --drop-author --drop-version --drop-broken-refs --out-pbf -o=data/osm/DACH.osm.pbf
Thorsten

Thorsten Kukuk wrote
Correct, all I do is:
osmconvert planet.osm.pbf -B=osmmaps/scripts/poly/DACH.poly --drop-author --drop-version --drop-broken-refs --out-pbf -o=data/osm/DACH.osm.pbf
Ok. My first test with r224 crashed with an NPE. I did not use the --drop-broken-refs parameter, so I used r225 to test the older version. Both versions passed the point where splitter crashed on your machine, but both came very close to the limit, so I assume that you were just lucky to finish with r224.
Some numbers with java version "1.6.0_24":
r225: Max Heap Space: 2803840K, Max Heap Occupancy: 2614966K
r229: Max Heap Space: 2799168K, Max Heap Occupancy: 2592023K
r231: Max Heap Space: 2834944K, Max Heap Occupancy: 2599999K
Ahh, didn't I write that r229 uses less memory? Yes, I did. The explanation is that r225 detected fewer problem cases:
r225: Number of detected problem ways: 133.404, Number of detected problem rels: 32.552
r229 + r231: Number of detected problem ways: 1.501.122, Number of detected problem rels: 51.845
More interesting are the final numbers of pass 4:
r225:
    TreeMap<Long,Integer> node-Writers: 15.833.747
    TreeMap<Long,Integer> way-Writers: 1.298.226
    TreeMap<Long,Integer> rel-Writers: 44.763
r229 + r231:
    TreeMap<Long,Integer> node-Writers: 28.442.937
    TreeMap<Long,Integer> way-Writers: 2.608.846
    TreeMap<Long,Integer> rel-Writers: 62.263
So r225 required almost the same memory to store only *half* of the data. I was not aware of these big differences. I assume that r229 does a better job, meaning that r225 did not find enough problem cases. The results are so different that it's difficult to say. I will try to find out using a smaller set of test data. Maybe mkgmap will help me to detect data that is not used. Besides that, I have an idea how to reduce the memory needs in the critical pass :-)
Gerd
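The message does not say how the heap figures were collected. One way to obtain a comparable "peak occupancy" number from inside a Java program is the standard `java.lang.management` API, sketched below. The class name `HeapPeakSketch` and the method `peakHeapBytes` are hypothetical. The sum of per-pool peaks is only an upper bound, because the individual pools need not reach their peaks at the same moment.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper, not part of splitter: estimate the peak heap usage
// of the running JVM from the per-pool peak values.
class HeapPeakSketch {
    /** Sums the peak "used" bytes of all heap memory pools. */
    static long peakHeapBytes() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                MemoryUsage peak = pool.getPeakUsage();
                if (peak != null) { // null if the pool is no longer valid
                    total += peak.getUsed();
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long[] filler = new long[1000000]; // allocate something measurable
        filler[0] = 1;
        System.out.println("Peak heap occupancy (upper bound): "
                + peakHeapBytes() / 1024 + "K");
    }
}
```

Calling `peakHeapBytes()` at the end of each pass would show which pass drives the maximum, which is the question raised for pass 4.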

On Tue, Nov 13, GerdP wrote:
Ahh, didn't I write that r229 uses less memory? Yes, I did. The explanation is here: r225 detected fewer problem cases:
Ok, thanks for the analysis!
Besides that I have an idea how to reduce the memory needs in the critical pass :-)
Good to hear :-) Thorsten

Thorsten Kukuk wrote
Besides that I have an idea how to reduce the memory needs in the critical pass :-)
Good to hear :-)
One drawback: this better solution will not work if the input is mixed (ids not strictly ordered). I see two options:
1) Use two alternative methods to store the data: the existing one with HashMaps and a new, faster one with simple arrays and binary search. In fact, the latter requires less memory and less CPU, at least in my tests. Binary search doesn't work if the data is not sorted, and sorting again might require too much heap, so mixed data is a big problem here.
2) Stop with an error message if the user tries to use mixed data with the parameters --keep-complete or --problem-file.
I still have to find out why r231 finds so many more problem polygons.
Gerd
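The array-plus-binary-search approach described in option 1 could look roughly like the sketch below. It is not splitter's implementation: the class `SortedIdMapSketch`, its methods, and the choice of -1 as the "not found" value (which assumes writer indexes are never negative) are all hypothetical. It also shows why mixed input is a problem, since an id that arrives out of order has to be rejected to keep the binary search valid.

```java
import java.util.Arrays;

// Hypothetical sketch, not splitter's actual class: an append-only map from
// long ids to short writer indexes, backed by two parallel primitive arrays.
class SortedIdMapSketch {
    private long[] ids = new long[16];
    private short[] writers = new short[16];
    private int size;

    /** Appends an entry; ids must arrive in strictly ascending order. */
    void append(long id, short writerIdx) {
        if (size > 0 && id <= ids[size - 1]) {
            throw new IllegalArgumentException(
                    "ids not strictly ordered (mixed input?): " + id);
        }
        if (size == ids.length) { // grow both arrays together
            ids = Arrays.copyOf(ids, size * 2);
            writers = Arrays.copyOf(writers, size * 2);
        }
        ids[size] = id;
        writers[size] = writerIdx;
        size++;
    }

    /** Returns the writer index for id, or -1 if the id was never stored. */
    int get(long id) {
        int pos = Arrays.binarySearch(ids, 0, size, id);
        return pos >= 0 ? writers[pos] : -1;
    }

    public static void main(String[] args) {
        SortedIdMapSketch map = new SortedIdMapSketch();
        map.append(10L, (short) 1);
        map.append(20L, (short) 2);
        System.out.println(map.get(20L)); // stored id
        System.out.println(map.get(15L)); // missing id
    }
}
```

Each entry costs 10 bytes of array space (8 for the id, 2 for the index) plus unused capacity, with no per-entry objects or boxing, which is consistent with the lower memory use reported above.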
participants (4)
- Gerd Petermann
- GerdP
- Henning Scholland
- Thorsten Kukuk