[PATCH v1] Experimental support for multi-threading

Hi Folks,
I have recently gained access to a monster box containing 16 3GHz cores and 32 GB of memory, so I thought it would make an ideal machine to test a multi-threaded version of mkgmap.
As a lot of the processing has to be done in sequence, the opportunities for using multiple threads are somewhat limited. However, the attached patch parallelises:
1 - processing of ways from OSM form to MapLine/MapRoad
2 - processing of cities
3 - processing of POIs
4 - processing of polylines and polyshapes within each division
At this time, the performance gain is useful but not stunning (typically a 25-30% speedup when using at least 4 cores). There is a small gain even when using only 2 cores.
With the patch in place, mkgmap defaults to using as many threads as you have cores. If you wish, you can explicitly specify the number of threads with the --num-threads=N option. You may specify more threads than you have cores, but it won't make it any faster. Specifying fewer threads than cores is useful for testing, or for disabling the parallel processing altogether (by specifying just 1 thread).
One side effect of the multi-threading is that the elements are written to the output file in a random order. If you specify the (existing) --preserve-element-order option, the order will be preserved, but all of the threading will be disabled!
The patch has received some testing using MapSource and I think I have removed the worst of the bugs, but I would very much appreciate it if people could try it out and see if it causes any breakage. Obviously, I would also like to know how well it performs. As it has only been tested under Linux, any reports from other platforms would be especially useful.
Cheers, Mark
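The threading behaviour described above (one thread per core by default, an explicit thread count, a single thread when order must be preserved, and random output order otherwise) can be illustrated with a minimal Java sketch. This is not the code from the patch: the class `ParallelConvert`, its methods and the precedence between the two options are hypothetical, it needs Java 9+ for `List.of`, and error handling is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

public class ParallelConvert {
    // Hypothetical policy mirroring the post: preserving element order forces
    // a single thread, an explicit --num-threads=N comes next, and otherwise
    // one thread per available core is used.
    public static int threadCount(Integer numThreadsOption, boolean preserveElementOrder) {
        if (preserveElementOrder) {
            return 1;
        }
        if (numThreadsOption != null && numThreadsOption > 0) {
            return numThreadsOption;
        }
        return Runtime.getRuntime().availableProcessors();
    }

    // Converts each input independently on a fixed-size pool. Results are
    // appended as each task finishes, so with more than one thread the output
    // order can differ from the input order; with one thread it matches.
    public static <I, O> List<O> convertAll(List<I> inputs, Function<I, O> convert, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        ConcurrentLinkedQueue<O> finished = new ConcurrentLinkedQueue<>();
        for (I input : inputs) {
            pool.execute(() -> finished.add(convert.apply(input)));
        }
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
            throw new IllegalStateException("conversion did not finish in time");
        }
        return new ArrayList<>(finished);
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> wayIds = List.of(1, 2, 3, 4, 5, 6, 7, 8);
        int threads = threadCount(null, false);
        List<String> lines = convertAll(wayIds, id -> "line-" + id, threads);
        System.out.println(threads + " threads: " + lines);
    }
}
```

Running `main` several times on a multi-core machine should show the converted items in varying order, which is the same effect that makes the patched mkgmap write elements in a random order.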

I didn't get this exception prior to applying the patch:
/--------
| Exception in thread "main" java.lang.IllegalStateException: Offset not known yet.
|   at uk.me.parabola.imgfmt.app.lbl.POIRecord.getOffset(POIRecord.java:377)
|   at uk.me.parabola.imgfmt.app.trergn.Point.write(Point.java:61)
|   at uk.me.parabola.imgfmt.app.trergn.RGNFile.addMapObject(RGNFile.java:96)
|   at uk.me.parabola.imgfmt.app.map.Map.addMapObject(Map.java:237)
|   at uk.me.parabola.mkgmap.build.MapBuilder.processPoints(MapBuilder.java:817)
|   at uk.me.parabola.mkgmap.build.MapBuilder.makeSubdivision(MapBuilder.java:697)
|   at uk.me.parabola.mkgmap.build.MapBuilder.makeMapAreas(MapBuilder.java:633)
|   at uk.me.parabola.mkgmap.build.MapBuilder.makeMap(MapBuilder.java:178)
|   at uk.me.parabola.mkgmap.main.MapMaker.makeMap(MapMaker.java:90)
|   at uk.me.parabola.mkgmap.main.MapMaker.makeMap(MapMaker.java:56)
|   at uk.me.parabola.mkgmap.main.Main.processFilename(Main.java:163)
|   at uk.me.parabola.mkgmap.CommandArgs$Filename.processArg(CommandArgs.java:340)
|   at uk.me.parabola.mkgmap.CommandArgs.readArgs(CommandArgs.java:119)
|   at uk.me.parabola.mkgmap.main.Main.main(Main.java:98)
\--------
Could that be caused by the parallelisation?

Hi Toby,
I didn't get this exception prior to applying the patch:
[stack trace snipped]
Could that be caused by the parallelisation?
Absolutely! I have weeded out a few similar bugs but, obviously, some are still lurking and are not being triggered by my test maps. I shall investigate.
Cheers, Mark
PS - if any other diagnostic messages appeared before the exception, please post them here.

0> In article <20090509172811.36c3c8b8@crow>,
0> Mark Burton <URL:mailto:markb@ordern.com> ("Mark") wrote:
Mark> PS - if any other diagnostic messages appeared before the exception,
Mark> please post them here.
Just these, which were previously present:
/--------
| SEVERE (StyledConverter): Way Rosscoff - Cork (OSM id 27279778) contains a segment that is longer than 16383 (routing will fail for that way)
| SEVERE (StyledConverter): Way Rosslare - Cherbourg by IrishFerries.com (OSM id 33408220) contains a segment that is longer than 16383 (routing will fail for that way)
| SEVERE (StyledConverter): Way Rosslare - Cherbourg by IrishFerries.com (OSM id 33408221) contains a segment that is longer than 16383 (routing will fail for that way)
\--------
Also, I forgot to mention that mkgmap hangs after the exception, rather than exiting with an error.
Thanks for looking into it. FWIW, I do see the exception each time I run it (with 4 CPUs).
ADDENDUM - just tried with --num-threads=1 and I still get the exception - any ideas?

Hi Toby,
Also, I forgot to mention that mkgmap hangs after the exception, rather than exiting with an error.
Yes, that's quite possible when an exception occurs.
Thanks for looking into it. FWIW, I do see the exception each time I run it (with 4 CPUs). ADDENDUM - just tried with --num-threads=1 and I still get the exception - any ideas?
Not yet, I can't reproduce it here. What options are you calling mkgmap with? Cheers, Mark

I'm calling mkgmap as
/--------
| java -ea -Xmx3500m -jar /home/tms/maps/mkgmap/trunk/dist/mkgmap.jar
|   --num-threads=1 --description="UK" --latin1 --style-file=toby-style
|   --net --route --country-name="UNITED KINGDOM" --country-abbr="GBR"
|   --family-id=6324 --product-id=6324 --area-name="Great Britain"
|   --family-name="Open Streetmap" --gmapsupp --net --route
|   63240???.osm.gz 50??????.osm.gz
\--------
I'll try reverting to SVN HEAD and see if I can repro the symptoms there. I'll also try downloading a newer great_britain.osm.bz2 (after midnight, when it's untolled!).

On Sat, 09 May 2009 19:57:38 +0100 Toby Speight <T.M.Speight.90@cantab.net> wrote:
I'm calling mkgmap as
[command line snipped]
I'll try reverting to SVN HEAD and see if I can repro the symptoms there. I'll also try downloading a newer great_britain.osm.bz2 (after midnight, when it's untolled!).
Thanks for that info, it all looks perfectly innocuous. I can't see the problem at the moment - I shall have to add some more debugging output. Cheers, Mark

Hi Toby, I just can't see it at the moment. What version of Java are you using? Cheers, Mark

Toby, how about adding -Dlog.config=/SOMEPATH/logging.properties to the java command line, using the attached logging.properties file, and posting what it produces? It may contain something useful. Cheers, Mark

</panic> <apology> I just tried reverting the changed files to confirm that an unpatched r1033 doesn't suffer from the same symptoms (it doesn't), then re-applied the patch. And it all seems to be working okay! Which is good, of course, but I've rather wasted your time a bit - sorry!

Hi Toby,
</panic> <apology> I just tried reverting the changed files to confirm that an unpatched r1033 doesn't suffer from the same symptoms (it doesn't), then re-applied the patch. And it all seems to be working okay! Which is good, of course, but I've rather wasted your time a bit - sorry!
Hmm, weird. Well, if it comes back please let me know. Cheers, Mark

Mark Burton wrote:
Hi Folks,
I have recently gained access to a monster box containing 16 3GHz cores and 32 G of memory so I thought it would make an ideal machine to test a multi-threaded version of mkgmap.
The patch has received some testing using mapsource and I think I have removed the worst of the bugs but I would very much appreciate it if people could try it out and see if it causes any breakage.
Obviously, I would also like to know how well it performs.
As it has only been tested under Linux any reports from other platforms would be especially useful.
Using a patched r1033 on openSUSE 11.1 64-bit with 4GB RAM, I successfully compiled a map from today's great_britain.osm from Geofabrik. The command line is:
java -Xmx2048M -jar mkgmap.jar --latin1 --gmapsupp --route --region-name="Great Britain" --region-abbr="GBR" --net --code-page=1252 --country-name="UNITED KINGDOM" --country-abbr=UK --tdbfile --tdb-v4 --stylefile=./mystylefiles --description="Map of GB" 700000??.osm
I have no actual timings for the run time, but the system monitor showed both cores being used extensively throughout.
Cheers Paul

Hi Paul,
I have no actual timings for run time but system monitor showed both cores extensively used throughout.
Thanks for the feedback. At least it didn't blow up! With 2 cores the gain is quite small, but worth having nonetheless (as long as it's reliable).
You could just use "time" to output the process times. The ratio of CPU time (user + system) to elapsed time will give you an indication of how well the 2 cores were utilised. The time command may be built into the shell, but if you use /usr/bin/time you can give it the -v option and get some useful extra info out of it.
The other useful command is mpstat, which can print out times for all of the cores separately, like this:
mpstat -P ALL 1
That reports the times for all cores every second.
Cheers, Mark
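The CPU-to-elapsed ratio mentioned above is simple arithmetic; a small Java sketch makes it concrete. The class and method names are hypothetical, and the figures in `main` are made-up example values, not measurements from this thread. A ratio close to the number of cores means all cores were kept busy; a ratio near 1.0 means the run was effectively single-threaded.

```java
public class CpuUtilisation {
    // Ratio of CPU time (user + system) to elapsed wall-clock time.
    public static double ratio(double userSec, double sysSec, double elapsedSec) {
        if (elapsedSec <= 0) {
            throw new IllegalArgumentException("elapsed time must be positive");
        }
        return (userSec + sysSec) / elapsedSec;
    }

    // The same ratio expressed as a percentage of the available cores.
    public static double percentOfCores(double ratio, int cores) {
        return 100.0 * ratio / cores;
    }

    public static void main(String[] args) {
        // Hypothetical figures, e.g. as read off the output of /usr/bin/time.
        double r = ratio(150.0, 10.0, 100.0);
        System.out.printf("ratio %.2f, %.0f%% of 2 cores%n", r, percentOfCores(r, 2));
    }
}
```

With those example numbers, 160 s of CPU time over 100 s of elapsed time gives a ratio of 1.6, i.e. the two cores were about 80% utilised.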

As a lot of the processing has to be done in sequence, the opportunities for using multiple threads are somewhat limited. However, the attached patch parallelises:
1 - processing of ways from OSM form to MapLine/MapRoad.
2 - processing of cities
3 - processing of POIs
4 - processing of polylines and polyshapes within each division
In this situation (a lot of sequential processing), wouldn't it be a better approach to start several mkgmap processes, each working on a different tile? These could run fully in parallel without dependencies. Afterwards, the different tiles would have to be merged into a single img file; that merge could be started as an extra process as soon as the first tile is finished.
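The per-tile approach suggested above could be driven from Java roughly as follows: one OS process per tile, with a bounded number running at once, followed by a merge step. This is only a sketch; the class `TileRunner`, its methods, the file names and the heap size are hypothetical placeholders, and the merge command assumes that mkgmap will combine already-built .img files into a gmapsupp.img when given --gmapsupp, which should be checked against the mkgmap documentation. It needs Java 9+ for `List.of`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class TileRunner {
    // Hypothetical per-tile invocation: one JVM running mkgmap on one tile.
    public static List<String> tileCommand(String jar, String tile, int heapMb) {
        return List.of("java", "-Xmx" + heapMb + "m", "-jar", jar, tile);
    }

    // Hypothetical merge step, ASSUMING mkgmap combines existing .img files
    // into a single gmapsupp.img when given --gmapsupp.
    public static List<String> mergeCommand(String jar, List<String> imgFiles) {
        List<String> cmd = new ArrayList<>(List.of("java", "-jar", jar, "--gmapsupp"));
        cmd.addAll(imgFiles);
        return cmd;
    }

    // Runs the commands as separate OS processes, at most maxParallel at once,
    // and returns their exit codes in the same order as the commands.
    public static List<Integer> runAll(List<List<String>> commands, int maxParallel)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(maxParallel);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (List<String> cmd : commands) {
                pending.add(pool.submit(() -> new ProcessBuilder(cmd).inheritIO().start().waitFor()));
            }
            List<Integer> exitCodes = new ArrayList<>();
            for (Future<Integer> f : pending) {
                exitCodes.add(f.get());
            }
            return exitCodes;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Placeholder file names; this only prints the commands, it does not run them.
        for (String tile : List.of("63240001.osm.gz", "63240002.osm.gz")) {
            System.out.println(String.join(" ", tileCommand("mkgmap.jar", tile, 1000)));
        }
        System.out.println(String.join(" ", mergeCommand("mkgmap.jar",
                List.of("63240001.img", "63240002.img"))));
    }
}
```

Using separate processes rather than threads means each tile gets its own JVM heap, which is why the total memory available limits how many can run at once.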

Johann Gail wrote:
In this situation (a lot of sequential processing), wouldn't it be an better approach to start several processes of mkgmap with different tiles?
This could be fully parallelized without dependencies.
Afterwards the different tiles had to be merged into an single img file. This process could be started in another extra process with the finishing of the first tile.
I would support that approach too. If you pass several .osm or .gz files to mkgmap in one command, it should try to run them in several processes while respecting the maximum Java heap space you give (i.e. split the heap space between the processes). When running two instances of mkgmap I often get problems with Java heap space (even though there is enough available). I suspect there is some other error.
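Splitting a heap budget between processes, as suggested above, might look like the following Java sketch. The policy is a hypothetical one, not something mkgmap implements: run at most one process per core and per tile, never give a process less than a chosen minimum heap, and share the total budget equally as each process's -Xmx. The 3500 MB and 1000 MB figures in `main` are purely illustrative.

```java
public class HeapSplit {
    // Hypothetical policy: as many concurrent processes as there are cores
    // and tiles, but never so many that a process gets less than minHeapMb.
    public static int processes(int totalHeapMb, int minHeapMb, int cores, int tiles) {
        int byMemory = Math.max(1, totalHeapMb / minHeapMb);
        return Math.max(1, Math.min(Math.min(cores, tiles), byMemory));
    }

    // Each concurrent process gets an equal share of the budget (its -Xmx value).
    public static int heapPerProcessMb(int totalHeapMb, int processes) {
        return totalHeapMb / processes;
    }

    public static void main(String[] args) {
        int n = processes(3500, 1000, 4, 10);
        System.out.println(n + " processes, -Xmx" + heapPerProcessMb(3500, n) + "m each");
    }
}
```

With a 3500 MB budget, a 1000 MB minimum, 4 cores and 10 tiles, memory rather than the core count is the limit, so this sketch would run 3 processes at about 1166 MB each.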
_______________________________________________ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

Hi Johann,
In this situation (a lot of sequential processing), wouldn't it be an better approach to start several processes of mkgmap with different tiles?
This could be fully parallelized without dependencies.
Afterwards the different tiles had to be merged into an single img file. This process could be started in another extra process with the finishing of the first tile.
That's fine as long as you have the memory available and are processing more than one tile. I wrote the patch as I was intrigued as to how much performance gain could be achieved using multi-threading. It's not a stunning improvement but certainly good enough to consider doing more work on it. Cheers, Mark
participants (5)
- Felix Hartmann
- Johann Gail
- Mark Burton
- Paul
- Toby Speight