
http://gis.638310.n2.nabble.com/file/n6958659/splitter_memory.patch splitter_memory.patch
This patch reduces the memory requirement of splitter by ~50% by replacing class SparseInt2ShortMultiMap with class Node2AreaMap. It also includes small performance tweaks.
Still not solved by this patch:
- Memory usage depends on the highest node id in the data.
- Node ids > Integer.MAX_VALUE do not work (not a problem at this moment).
If this small patch is okay for you, I can provide a bigger one that also fixes these issues but touches many more files.
Ciao, Gerd
-- View this message in context: http://gis.638310.n2.nabble.com/PATCH-splitter-memory-usage-tp6958659p695865... Sent from the Mkgmap Development mailing list archive at Nabble.com.
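A minimal Java sketch of the second open point, i.e. what happens when a node id above Integer.MAX_VALUE is stored in an int. It is not taken from the patch; the class name, the method names and both id values are made up for illustration.

```java
public class NodeIdOverflowSketch {
    /** Narrowing cast, as it would happen if a long id were stored in an int field. */
    public static int storeAsInt(long nodeId) {
        return (int) nodeId;
    }

    /** True if the id survives a round trip through an int unchanged. */
    public static boolean fitsInInt(long nodeId) {
        return nodeId == (long) storeAsInt(nodeId);
    }

    public static void main(String[] args) {
        long smallId = 1_500_000_000L;   // hypothetical id below Integer.MAX_VALUE
        long largeId = 2_200_000_000L;   // hypothetical id above Integer.MAX_VALUE
        System.out.println(fitsInInt(smallId));   // true
        System.out.println(storeAsInt(largeId));  // -2094967296
        System.out.println(fitsInInt(largeId));   // false
    }
}
```

The large id silently wraps to a negative number, which is why such ids need long fields throughout.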

Thanks for your analysis and your patch.
I do not have enough time to check whether the patch produces the same valid results, but it sounds like a good approach to me.
Therefore I created a branch memory_optimization that includes your patch, so that your changes can be tested more easily by a bigger group of people. The compiled release of the branch should appear automatically at http://www.mkgmap.org.uk/splitter/. If that's not the case, Steve might have to do some configuration on the web server.
WanMil
_______________________________________________ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

Hello WanMil,
thanks. I tested the patch by comparing the output files with those from r181 for different osm.pbf and old XML input files and different combinations of the parameters --max-areas and --max-nodes.
By the way, the patch only saves up to 50% memory when rather small files are processed, but especially on Win XP this seems to help a lot, because the VM doesn't allow much more than -Xmx1600m (at least on my machine).
I am currently benchmarking r181, r181+patch and the latest version on my machine. I'll post the results and the patch soon.
Gerd

Hello WanMil,
attached is the larger patch, together with a modified fastutil.jar. The patch must be applied to the memory_optimization branch.
Added features compared to r183:
- Allows node ids up to 2^37 - 1. I (hopefully) changed all places where node ids were stored in int or Integer to long or Long. I tested it with a modified OSM.gz file.
- Is a bit faster when the default methods are used and enough heap is available. During the 1st phase the highest used node id is saved, and this value is used to allocate the needed arrays. The previous version used ArrayLists, which were resized quite often because osm.pbf data delivers the node ids in order.
- With the new parameter optimize-mem it uses a hash map instead of the huge arrays to store chunk data. This makes it possible to split files on machines with a small available Java heap, but it is usually slower.
Remarks:
a) As mentioned before, I am a newbie to Java, so I am pretty sure that experts will find some nonsense in my code, especially regarding coding style.
b) Someone should review how I collect the information about the highest node id and the number of nodes.
c) Regarding the new parameter: I would prefer to choose the best algorithm within the program, but found that very hard to do, so this is the simple solution.
d) If the limit of 2^37 - 1 is a problem, I'd suggest changing the code that stores the data in the chunks. I played with larger CHUNK_SIZE values combined with a (very simple) run-length encoding to compress the chunks. I'll continue analysing in that direction because it might be faster as well.
regards, Gerd
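The trade-off between the default arrays and the optimize-mem hash map can be sketched in Java roughly as follows. This is only an illustration under assumptions: the class name, the field names, the chunk size of 64 and the UNASSIGNED marker are hypothetical and are not taken from the patch, which also relies on a modified fastutil.jar rather than java.util.HashMap.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; class and field names are not taken from the patch. */
public class ChunkedNodeMapSketch {
    public static final int CHUNK_SIZE = 64;                 // assumed chunk size
    public static final short UNASSIGNED = Short.MIN_VALUE;  // marker for "no value stored"

    private final short[][] chunkArray;         // direct index, used by default
    private final Map<Long, short[]> chunkMap;  // hash index, used with optimizeMem

    /** highestId is only used in array mode; hash mode ignores it. */
    public ChunkedNodeMapSketch(boolean optimizeMem, long highestId) {
        if (optimizeMem) {
            chunkArray = null;
            chunkMap = new HashMap<>();
        } else {
            // One slot per possible chunk, allocated up front from the highest id.
            chunkArray = new short[(int) (highestId / CHUNK_SIZE) + 1][];
            chunkMap = null;
        }
    }

    private short[] findChunk(long chunkId) {
        return chunkMap != null ? chunkMap.get(chunkId) : chunkArray[(int) chunkId];
    }

    public void put(long id, short value) {
        long chunkId = id / CHUNK_SIZE;
        short[] chunk = findChunk(chunkId);
        if (chunk == null) {
            // Chunks are created lazily, so unused id ranges cost no chunk memory.
            chunk = new short[CHUNK_SIZE];
            Arrays.fill(chunk, UNASSIGNED);
            if (chunkMap != null) {
                chunkMap.put(chunkId, chunk);
            } else {
                chunkArray[(int) chunkId] = chunk;
            }
        }
        chunk[(int) (id % CHUNK_SIZE)] = value;
    }

    public short get(long id) {
        short[] chunk = findChunk(id / CHUNK_SIZE);
        return chunk == null ? UNASSIGNED : chunk[(int) (id % CHUNK_SIZE)];
    }
}
```

In array mode the fixed cost grows with the highest id even when few ids are used. In hash mode the cost grows only with the number of chunks actually touched, at the price of a hash lookup per access.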
Date: Sun, 6 Nov 2011 14:13:18 +0100 From: wmgcnfg@web.de To: mkgmap-dev@lists.mkgmap.org.uk Subject: Re: [mkgmap-dev] [PATCH]splitter memory usage
Thanks for your analysis and your patch.
I do not have enough time to check whether the patch produces the same valid results, but it sounds like a good approach to me.
Therefore I created a branch memory_optimization that includes your patch, so that your changes can be tested more easily by a bigger group of people. The compiled release of the branch should appear automatically at http://www.mkgmap.org.uk/splitter/. If that's not the case, Steve might have to do some configuration on the web server.
WanMil

Thanks Gerd!
I haven't reviewed the patch yet, but I have committed it (r184), so it's available at http://www.mkgmap.org.uk/splitter/
I am looking forward to your patches improving splitter for several use cases.
WanMil

Hello WanMil,
please wait, I am going to send another patch today. I think we also have to check the ids of ways when searching for the maximum id.
Also, I think it is better to merge the two classes SparseLong2ShortMapInline and SparseLong2ShortMapFix and always use the method that stores the chunk mask together with the chunk data. I also found a few more simple performance tweaks.
Ciao, Gerd
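The thread does not describe how the chunk mask and the chunk data are laid out. One possible layout, shown here purely as an assumption, keeps a 64-bit mask in the first four shorts of an array and appends only the values of the slots that are actually used. The class name, the constants and the layout itself are hypothetical and may differ from SparseLong2ShortMapInline.

```java
/** Hypothetical layout sketch; not the actual SparseLong2ShortMapInline code. */
public class MaskedChunkSketch {
    public static final int CHUNK_SIZE = 64;
    public static final short UNASSIGNED = Short.MIN_VALUE;
    static final int MASK_SHORTS = 4;  // 64-bit mask split into four 16-bit pieces

    /** Packs a dense 64-slot chunk into mask + used values only. */
    public static short[] pack(short[] dense) {
        long mask = 0L;
        int used = 0;
        for (int i = 0; i < CHUNK_SIZE; i++) {
            if (dense[i] != UNASSIGNED) {
                mask |= 1L << i;
                used++;
            }
        }
        short[] packed = new short[MASK_SHORTS + used];
        for (int k = 0; k < MASK_SHORTS; k++) {
            packed[k] = (short) (mask >>> (16 * k));
        }
        int out = MASK_SHORTS;
        for (int i = 0; i < CHUNK_SIZE; i++) {
            if (dense[i] != UNASSIGNED) {
                packed[out++] = dense[i];
            }
        }
        return packed;
    }

    /** Reassembles the 64-bit mask from the first four shorts. */
    static long readMask(short[] packed) {
        long mask = 0L;
        for (int k = 0; k < MASK_SHORTS; k++) {
            mask |= (packed[k] & 0xFFFFL) << (16 * k);
        }
        return mask;
    }

    /** Looks up one slot; the value index is the number of used slots below it. */
    public static short get(short[] packed, int slot) {
        long mask = readMask(packed);
        long bit = 1L << slot;
        if ((mask & bit) == 0) {
            return UNASSIGNED;
        }
        int index = Long.bitCount(mask & (bit - 1));
        return packed[MASK_SHORTS + index];
    }
}
```

With this layout a chunk that holds two values takes six shorts instead of 64, and a lookup needs no second data structure because the mask travels with the data.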

Date: Wed, 9 Nov 2011 19:24:37 +0100 From: wmgcnfg@web.de To: mkgmap-dev@lists.mkgmap.org.uk Subject: Re: [mkgmap-dev] [PATCH]splitter memory usage
Thanks Gerd!
I haven't reviewed the patch yet, but I have committed it (r184), so it's available at http://www.mkgmap.org.uk/splitter/
I am looking forward to your patches improving splitter for several use cases.
WanMil
Hello WanMil,
please apply the attached patch to the memory_optimization branch. It contains:
a) code simplification
b) a correction for the possibility that a way id is higher than the highest node id
c) small performance tweaks
ciao, Gerd

On 11.11.2011 14:58, Gerd Petermann wrote:
Hello WanMil,
please apply the attached patch to the memory_optimization branch. It contains:
a) code simplification
b) a correction for the possibility that a way id is higher than the highest node id
c) small performance tweaks
ciao, Gerd
Hi Gerd,
committed it to r187.
Have fun! WanMil

Hello WanMil,
I fear I went the wrong way with the idea of calculating the highest node id or the number of nodes just to avoid re-hashing or resizing. I did not notice the option split-file, which allows skipping the 1st phase that I used to calculate these numbers :-(
So I am going to revert this part of my fix. (I already suspected that I missed something ...)
ciao, Gerd
Date: Sat, 12 Nov 2011 12:41:11 +0100 From: wmgcnfg@web.de To: mkgmap-dev@lists.mkgmap.org.uk Subject: Re: [mkgmap-dev] [PATCH]splitter memory usage
Hi Gerd,
committed it to r187.
Have fun! WanMil

Hello WanMil,
the attached patch reverts the changes that calculated the MaxNodeId and nodeCount values. The default is now to assume a highest node id of 2^31 and to resize when needed. This still needs much less memory than r181 because it allocates exactly two of these large arrays, while r181 typically allocated 8. So, on my machine, splitting germany.osm.pbf took 938 secs with r181 (GC very busy because of -Xmx1600m) and only 638 secs with patchv4 (optimize-mem=false). With optimize-mem=true, the program is typically ~10% slower.
I've also increased the allowed number of areas per pass, because this is now limited only by the available memory and by the number of elements in the dictionary. If the latter reaches 32767 the program will crash, but I doubt that this will happen with real OSM data.
Ciao, Gerd
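"Assume a highest id and resize when needed" could look roughly like the following Java sketch. The class name, the chunk size of 64 and the doubling growth policy are assumptions for illustration only; the patch may grow its arrays differently.

```java
import java.util.Arrays;

/** Hypothetical sketch; names and growth policy are assumptions, not the patch's code. */
public class GrowableChunkIndexSketch {
    public static final int CHUNK_SIZE = 64;
    public static final short UNASSIGNED = Short.MIN_VALUE;

    private short[][] chunks;

    /** Sizes the index for an assumed highest id; it can still grow later. */
    public GrowableChunkIndexSketch(long assumedHighestId) {
        chunks = new short[(int) (assumedHighestId / CHUNK_SIZE) + 1][];
    }

    public int capacity() {
        return chunks.length;
    }

    public void put(long id, short value) {
        long chunkId = id / CHUNK_SIZE;
        if (chunkId >= chunks.length) {
            long limit = Integer.MAX_VALUE - 8;  // conservative cap on array length
            if (chunkId >= limit) {
                throw new IllegalArgumentException("id too large for this sketch: " + id);
            }
            // Grow to at least double, so that ascending ids trigger few copies.
            long wanted = Math.min(limit, Math.max(chunkId + 1, 2L * chunks.length));
            chunks = Arrays.copyOf(chunks, (int) wanted);
        }
        short[] chunk = chunks[(int) chunkId];
        if (chunk == null) {
            chunk = new short[CHUNK_SIZE];
            Arrays.fill(chunk, UNASSIGNED);
            chunks[(int) chunkId] = chunk;
        }
        chunk[(int) (id % CHUNK_SIZE)] = value;
    }

    public short get(long id) {
        long chunkId = id / CHUNK_SIZE;
        if (chunkId >= chunks.length || chunks[(int) chunkId] == null) {
            return UNASSIGNED;
        }
        return chunks[(int) chunkId][(int) (id % CHUNK_SIZE)];
    }
}
```

No first pass over the input is needed to size the index, so this approach also works when the 1st phase is skipped.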

Hi GerdP,
I'm a little bit confused about the changes of the changes.
GerdP wrote:
The default is now to assume a highest node id of 2^31 and resize when needed.
Does this mean that the 'old' limit is back?
Regards Klaus

Hello Klaus,
toc-rox wrote:
I'm a little bit confused about the changes of the changes.
The default is now to assume a highest node id of 2^31 and resize when needed.
Does this mean that the 'old' limit is back?
No, the default is to allocate two arrays with 2^31 / 64 elements, which means 256 MByte. Splitter version r181 typically allocates 8 arrays with ~1.5*2^30, so that was ~800 MByte as a fixed part. With the new parameter --optimize-mem the program doesn't even allocate these arrays, but uses hash maps. This parameter may help when you have only 1 GB or less (e.g. with a netbook) and small input files.
BUT this is only the storage needed to address the so-called "chunks". There is also a variable part that depends on the size of the input file and on the number of areas that are processed in one part. A version that also reduces the number of bytes needed for that variable part is work in progress.
ciao, Gerd
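The two figures can be reproduced with a small Java calculation. Two things here are assumptions rather than statements from the thread: that each array element takes 4 bytes, and that the r181 arrays were also sized as (id range / 64). Both were chosen because they reproduce the quoted numbers. The class and method names are hypothetical.

```java
public class IndexMemorySketch {
    /** Fixed index cost in bytes: one element per chunk, for each of the arrays. */
    public static long indexBytes(long idRange, int chunkSize, int arrays, int bytesPerElement) {
        return idRange / chunkSize * arrays * bytesPerElement;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        // Patched default: 2 arrays for an assumed highest id of 2^31.
        long patched = indexBytes(1L << 31, 64, 2, 4);
        // r181: 8 arrays for ~1.5 * 2^30 ids (3L << 29 equals 1.5 * 2^30).
        long r181 = indexBytes(3L << 29, 64, 8, 4);
        System.out.println(patched / mib + " MiB");  // 256 MiB
        System.out.println(r181 / mib + " MiB");     // 768 MiB, about 805 million bytes
    }
}
```

Under these assumptions the fixed part drops from about 805 million bytes to 256 MiB, which matches the "~800 MByte" and "256 MByte" figures above.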

Thanks Gerd, r188 contains your patch. WanMil

GerdP wrote:
... Still not solved by this patch:
- Memory usage depends on the highest node id in the data.
- Node ids > Integer.MAX_VALUE do not work (not a problem at this moment)
If this small patch is okay for you, I can provide a bigger one that also fixes these issues, but touches many more files ...
I'm also interested in these additional fixes. Reason: I want to integrate elevation lines.
Regards Klaus

I can report that Splitter 183 works for me. I have not picked up any adverse effects as yet. My map has built fine, and everything looks and feels normal and right. Address search works as normal. Routing, as far as tested, worked as normal. As far as I can tell: no problem.
For the first time I could build my map on my older (favorite) machine. The map was always too big for splitter to handle on this machine: Pentium 4; 3.2 GHz; 1 GB RAM; XP SP3 (that was astronomical when I got it!). Splitter would just hang, and I would have to start up my bigger, Windows 7-infected machine to get the job done.
Largest PBF tile size after split: 15 702 kB
Total map PBF: approx. 39 MB (6 tiles)
GMapSupp: 58 MB
(I split Southern Africa from a GeoFabrik download of Africa.)
I like it. Thanks for developing it. Thanks for making it available to me, a technophobe, to test.
BennieD

toc-rox wrote:
GerdP wrote:
... Still not solved by this patch:
- Memory usage depends on the highest node id in the data.
- Node ids > Integer.MAX_VALUE do not work (not a problem at this moment)
If this small patch is okay for you, I can provide a bigger one that also fixes these issues, but touches many more files ...
I'm also interested in these additional fixes. Reason: I want to integrate elevation lines.
Regards Klaus
I fear I don't understand why elevation lines are a problem for version 181, so maybe my version will not work with them either. What do I have to test to verify it?
Gerd
participants (5): Bennie du Plessis, Gerd Petermann, GerdP, toc-rox, WanMil