[PATCH v5] make maps in parallel

Here's a better fix than last night's effort for the problem where the mapname and description for each job were getting clobbered due to the way that the command args are processed. Each job now gets a "snapshot" of the command args so it doesn't matter if they subsequently get changed.

---------

Whoops! Fixed a bad bug whereby each map was being output to the same file. Not sure if the fix is very elegant but at least it's not being silly any more.

Now limits the default value of max-jobs to 4 no matter how many cores you have, as further testing shows that having more threads just burns CPU cycles but doesn't actually finish any quicker. I guess the memory system is limiting the performance and the CPUs are spinning waiting for access.

Now showing a real speedup of around 240% (my earlier higher claim was based on CPU usage and I now realise that was erroneous, sorry).

--------

Now defaults to creating a thread per core, so without doing anything you should see a speedup on an SMP box when processing multiple maps.

You can use --max-jobs=N to limit the concurrency - you may want to specify that if you can't increase the VM size to what is required. However, it occurs to me that if you can afford a box with more than 2 cores, then you can probably afford a reasonable amount of memory (otherwise, what's the point in having more cores?)

Added help blurb.

--------

OK, let it not be said that I don't listen to others!

The attached patch provides support for making maps in parallel. By default, the behaviour is the same as before but if you specify --num-threads=N where N is greater than 1, it will process N maps at the same time and then combine the results (if required). Don't forget to increase the heap size appropriately.

A quick test on the big box shows good speedup - specifying --num-threads=4 and 2GB VM size. I was seeing better than 380% utilisation with 8 cores in use. I suspect the performance limitation here will be VM size and memory system bandwidth.
BTW - I don't think num-threads is actually the best name for the option, so please suggest alternatives. Cheers, Mark
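
For readers following the thread, here is a minimal Java sketch of the two ideas described above: giving each job its own snapshot of the command args, and capping the default number of worker threads at 4. It is not the code from the patch. The class ParallelJobsSketch, the methods defaultMaxJobs and makeAll, and the "mapname" key are hypothetical names chosen for illustration, and each "job" only derives an output file name instead of building a map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelJobsSketch {

    // Assumed policy, taken from the description above: one thread per core, never more than 4.
    static int defaultMaxJobs(int cores) {
        return Math.max(1, Math.min(cores, 4));
    }

    // Submits one job per input file and returns the output names in input order.
    static List<String> makeAll(List<String> files, int maxJobs) {
        // Stands in for the mutable command args that are updated as each file is read.
        Map<String, String> sharedArgs = new HashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(maxJobs);
        List<Future<String>> pending = new ArrayList<>();
        try {
            for (String file : files) {
                sharedArgs.put("mapname", file.replace(".osm.gz", ""));
                // Snapshot: the job keeps its own copy, so later changes to sharedArgs cannot clobber it.
                final Map<String, String> snapshot = new HashMap<>(sharedArgs);
                Callable<String> job = () -> snapshot.get("mapname") + ".img";
                pending.add(pool.submit(job));
            }
            List<String> outputs = new ArrayList<>();
            for (Future<String> result : pending) {
                outputs.add(result.get()); // blocks until that job has finished
            }
            return outputs;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("job failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("63240001.osm.gz", "63240002.osm.gz", "63240003.osm.gz");
        int maxJobs = defaultMaxJobs(Runtime.getRuntime().availableProcessors());
        System.out.println(makeAll(files, maxJobs));
    }
}
```

If the jobs read sharedArgs directly instead of their snapshot, every job could see whichever mapname was written last, which is the kind of clobbering the v5 notes describe.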

On Tue, May 12, 2009 at 9:57 AM, Mark Burton <markb@ordern.com> wrote:
> Here's a better fix than last night's effort for the problem where the mapname and description for each job were getting clobbered due to the way that the command args are processed. Each job now gets a "snapshot" of the command args so it doesn't matter if they subsequently get changed.

Hm... I tested the v5 patch on a 2 core machine (MS Vista).

Although the processing time appeared to be greatly increased :-), many files were skipped :-(.

I attempted to compile a map of Germany split into 37 tiles, numbered 63240001.osm.gz to 63240037.osm.gz.

- mkgmap only created 21 img files, skipping 63240001.img, 63240017.img to 63240026.img, and 63240032.img to 63240036.img

The command line looked like this:

    java -Xmx1536M -jar \mkgmap.jar [options] 63*.osm.gz

Does this help?

Cheers.

Hi Clinton,
> Hm... I tested the v5 patch on a 2 core machine (MS Vista).
> Although the processing time appeared to be greatly increased :-), many files were skipped :-(.
> I attempted to compile a map of Germany split into 37 tiles, numbered 63240001.osm.gz to 63240037.osm.gz.
> - mkgmap only created 21 img files, skipping 63240001.img, 63240017.img to 63240026.img, and 63240032.img to 63240036.img
> The command line looked like this:
> java -Xmx1536M -jar \mkgmap.jar [options] 63*.osm.gz
> Does this help?
Not a lot!

Seriously, something is badly wrong there. Just to confirm, you haven't changed how you invoke mkgmap from the previously successful runs, have you?

Please try specifying --max-jobs=1 so that only one thread is active and see if it's OK.

Also, you should get diagnostic messages at the INFO level that say when the processing of each map starts and ends.

Cheers, Mark

On Wed, May 13, 2009 at 3:32 PM, Mark Burton <markb@ordern.com> wrote:
> Seriously, something is badly wrong there. Just to confirm, you haven't changed how you invoke mkgmap from the previously successful runs, have you?
No, I made no changes to the command line. I am currently recompiling with the patch reversed (this will take some time), but with the same command line options: this appears to be working correctly.
> Please try specifying --max-jobs=1 so that only one thread is active and see if it's OK.
I'll reapply the patch and test with --max-jobs=1 (the next time I have a chance).
> Also, you should get diagnostic messages at the INFO level that say when the processing of each map starts and ends.
Are these messages sent to STDOUT or do I have to invoke some kind of Java logging to get these diagnostic messages? Cheers.

Clinton,
> No, I made no changes to the command line. I am currently recompiling with the patch reversed (this will take some time), but with the same command line options: this appears to be working correctly.
OK
>> Please try specifying --max-jobs=1 so that only one thread is active and see if it's OK.
> I'll reapply the patch and test with --max-jobs=1 (the next time I have a chance).
Thanks.
>> Also, you should get diagnostic messages at the INFO level that say when the processing of each map starts and ends.
> Are these messages sent to STDOUT or do I have to invoke some kind of Java logging to get these diagnostic messages?
Add this to the java args:

    -Dlog.config=/SOMEPATH/logging.properties

I attach a sample logging.properties file. This currently sends stuff to the console but you can send it to a file by un-commenting the appropriate handlers: line.

Cheers, Mark

On Wed, May 13, 2009 at 3:54 PM, Mark Burton <markb@ordern.com> wrote:
> Please try specifying --max-jobs=1 so that only one thread is active and see if it's OK.
I tested again on a Vista machine with --max-jobs=1. The output was fine: all files were generated. I then tested on a Mac OS X (Intel) machine. The patch had more or less the same behaviour: without --max-jobs=1, files were skipped; with --max-jobs=1 all files were generated.
> -Dlog.config=/SOMEPATH/logging.properties
I logged to file, and then grepped the log files for the map compile information. The log only indicated that the maps were started and finished. Tiles which were not generated were also not mentioned in the log. The following is an excerpt where the compilation skips from 63240015 to 63240025.

2009/05/14 00:26:59 INFO (MapMaker): Started making 63240014 (osm-cg-de)
2009/05/14 00:27:03 INFO (MapMaker): Started making 63240015 (osm-cg-de)
2009/05/14 00:27:32 INFO (MapMaker): finished making map 63240014.img closing
2009/05/14 00:27:50 INFO (MapMaker): finished making map 63240015.img closing
2009/05/14 00:29:50 INFO (MapMaker): Started making 63240025 (osm-cg-de)
2009/05/14 00:30:25 INFO (MapMaker): finished making map 63240025.img closing
2009/05/14 00:30:58 INFO (MapMaker): Started making 63240027 (osm-cg-de)
2009/05/14 00:31:04 INFO (MapMaker): Started making 63240026 (osm-cg-de)
2009/05/14 00:31:30 INFO (MapMaker): finished making map 63240027.img closing
2009/05/14 00:32:00 INFO (MapMaker): finished making map 63240026.img closing

I hope this can help you with finding the cause of the error. Please let me know if I can perform other tests or provide further information.

Cheers.
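
The check described above (comparing the tiles mentioned in the log with the tiles that were expected) can be sketched in Java roughly as follows. This is only an illustration: the class MissingTilesSketch and the method missingTiles are hypothetical, and the regular expression assumes that every started map produces a line shaped like the "Started making" lines in the excerpt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MissingTilesSketch {

    // Assumed line shape: "... INFO (MapMaker): Started making 63240014 (osm-cg-de)"
    private static final Pattern STARTED = Pattern.compile("Started making (\\d+)");

    // Returns every tile number in [first, last] that has no "Started making" line.
    static List<Integer> missingTiles(List<String> logLines, int first, int last) {
        Set<Integer> started = new HashSet<>();
        for (String line : logLines) {
            Matcher m = STARTED.matcher(line);
            if (m.find()) {
                started.add(Integer.parseInt(m.group(1)));
            }
        }
        List<Integer> missing = new ArrayList<>();
        for (int tile = first; tile <= last; tile++) {
            if (!started.contains(tile)) {
                missing.add(tile);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Three lines copied from the excerpt; a real run would read the whole log file.
        List<String> log = Arrays.asList(
            "2009/05/14 00:27:03 INFO (MapMaker): Started making 63240015 (osm-cg-de)",
            "2009/05/14 00:27:50 INFO (MapMaker): finished making map 63240015.img closing",
            "2009/05/14 00:29:50 INFO (MapMaker): Started making 63240025 (osm-cg-de)");
        System.out.println(missingTiles(log, 63240015, 63240025));
    }
}
```

A tile that is missing from this list but has no matching "finished making map" line would point to a job that started and then failed, which is a different symptom from a job that was never started.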

Hi Clinton,

Just got back - very tired so I can't work on this tonight. Thanks for the info.

One question: in the case where maps are missed out, did you get all of the messages that look like "Submitting job FILENAME"? These are generated at the very start of the run. I would expect that you did, just checking.

I shall be looking into this over the weekend.

Cheers, Mark

Hi Clinton,
> I then tested on a Mac OS X (Intel) machine. The patch had more or less the same behaviour: without --max-jobs=1, files were skipped; with --max-jobs=1 all files were generated.
What Java versions are you using on the Windows box and the Mac? I'm using a variety of Java VMs; the oldest is 1.6.0_07-b06. Cheers, Mark

On a 2 core Linux box I just split a map into 142 tiles and processed it OK in parallel (all output files created). Routing didn't work very well in MapSource but that may be because of the tiny tiles. I am not saying there's no problem, but I can't reproduce it here yet (perhaps I need to go out and buy a Windows box?) Cheers, Mark

On Wed, May 13, 2009 at 4:48 PM, Mark Burton <markb@ordern.com> wrote:
> I am not saying there's no problem, but I can't reproduce it here yet (perhaps I need to go out and buy a Windows box?)
This may be caused by the different manner in which the shells interpret the command line options. If memory serves me correctly, bash type shells automatically expand file name wildcards, changing 63*.osm.gz to a list of matching file names. DOS type shells do not do this. Or something. I'll see if I can try this later on my Mac using bash. Cheers.

> This may be caused by the different manner in which the shells interpret the command line options. If memory serves me correctly, bash type shells automatically expand file name wildcards, changing 63*.osm.gz to a list of matching file names. DOS type shells do not do this.
Yes, *nix shells expand wildcards but DOS shells don't and leave it up to the application to do the expansion.
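
As an illustration of what "leaving it up to the application" can look like, here is a small Java sketch that expands a wildcard argument itself. It does not describe how mkgmap or the Java launcher actually treats wildcards; the class GlobExpandSketch and the method expand are hypothetical names, and the matching relies on the standard java.nio.file "glob" syntax.

```java
import java.io.File;
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class GlobExpandSketch {

    // Returns the candidate file names that match the glob pattern, sorted by name.
    static List<String> expand(String pattern, List<String> candidateNames) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + pattern);
        List<String> matched = new ArrayList<>();
        for (String name : candidateNames) {
            if (matcher.matches(Paths.get(name))) {
                matched.add(name);
            }
        }
        Collections.sort(matched);
        return matched;
    }

    public static void main(String[] args) {
        // Candidates are the entries of the current directory; list() returns null on failure.
        String[] entries = new File(".").list();
        List<String> candidates = entries == null ? new ArrayList<String>() : Arrays.asList(entries);

        List<String> inputs = new ArrayList<>();
        for (String arg : args) {
            if (arg.contains("*") || arg.contains("?")) {
                inputs.addAll(expand(arg, candidates)); // the shell left the wildcard in place
            } else {
                inputs.add(arg); // already a plain file name
            }
        }
        System.out.println(inputs);
    }
}
```

With this approach the list of input files is the same whether or not the shell has already expanded the pattern, which removes one variable when comparing runs across platforms.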
> I'll see if I can try this later on my Mac using bash.
That would be good. Cheers, Mark
Participants (2): Clinton Gladstone, Mark Burton