
On Sun, Apr 10, 2011 at 07:42:53PM +0200, WanMil wrote:
* The advantage of creating multiple img files in one mkgmap run is that the OSM data has to be parsed and prepared only once. Do you have numbers on what percentage of the time is spent in these steps?
No, I haven't collected any profiling data yet. Which tool would you recommend? For C and C++, which is what I have mainly been developing in, I have been using OProfile and before that, gprof. Which Java tools would come closest?
I would add some time logging to the mkgmap source code. There are profilers for Java, but I don't think they are useful for such a job.
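As a rough illustration of the time logging suggested here, the following is a minimal sketch of a per-phase timer. The class PhaseTimer and the phase names are hypothetical and not part of mkgmap; the sketch only assumes that the parsing and style steps can be wrapped in a callback.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of mkgmap: accumulates wall-clock time per named phase.
class PhaseTimer {
    private final Map<String, Long> nanosByPhase = new LinkedHashMap<>();

    // Runs the given work and adds its elapsed time to the named phase.
    void time(String phase, Runnable work) {
        long start = System.nanoTime();
        try {
            work.run();
        } finally {
            nanosByPhase.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    // Accumulated nanoseconds for a phase, or 0 if the phase was never timed.
    long nanos(String phase) {
        return nanosByPhase.getOrDefault(phase, 0L);
    }

    // Share of the total measured time spent in the phase, in percent.
    double percent(String phase) {
        long total = 0;
        for (long n : nanosByPhase.values()) {
            total += n;
        }
        return total == 0 ? 0.0 : 100.0 * nanos(phase) / total;
    }

    // One line per phase: name, milliseconds, share of the measured total.
    String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : nanosByPhase.entrySet()) {
            sb.append(String.format("%-10s %10.1f ms %5.1f%%%n",
                    e.getKey(), e.getValue() / 1e6, percent(e.getKey())));
        }
        return sb.toString();
    }
}
```

Wrapping the parsing step in something like timer.time("parse", ...) and the rest in timer.time("style", ...) would give the percentage asked about above, at the cost of touching the source in a few places.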
Please be aware that mkgmap is optimized in such a way that it loads only the tags which are needed in the style file.
Based on the log files, multipolygon relations seem to be processed even though the style does not contain any rules for polygons. Could you fix this?
Yes, that's an additional optimization. I can add an easy solution: do not process a multipolygon if the relation itself does not contain any additional tag (other than type=multipolygon) and if no member way contains any tag.
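The rule described here can be written as a small predicate. This is only a sketch under the assumption that tags are available as plain string maps; MultipolygonFilter and canSkip are hypothetical names and do not reflect mkgmap's actual classes.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not mkgmap code: decides whether a multipolygon can be skipped.
class MultipolygonFilter {
    // Returns true if the relation has no tag other than type=multipolygon
    // and none of its member ways carries any tag.
    static boolean canSkip(Map<String, String> relationTags,
                           List<Map<String, String>> memberWayTags) {
        for (Map.Entry<String, String> tag : relationTags.entrySet()) {
            boolean isTypeTag = "type".equals(tag.getKey())
                    && "multipolygon".equals(tag.getValue());
            if (!isTypeTag) {
                return false; // the relation carries a tag a style rule might use
            }
        }
        for (Map<String, String> wayTags : memberWayTags) {
            if (!wayTags.isEmpty()) {
                return false; // a member way is tagged
            }
        }
        return true;
    }
}
```

Since only the tags needed by the style file are loaded, a check like this would in effect skip multipolygons that no style rule can match.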
* I think the layer concept will be too complicated. I would prefer to have multiple styles (as Torsten Leistikow proposed in a separate mail).
Yes, doing it with styles would be better from the compatibility point of view. My example of "ski routes", "bus routes", "hiking routes", "mtb routes" could be implemented with a parameterized style, something like --style=routes --style-param route=bus.
* From my point of view it won't be possible to run mkgmap with different options for the different styles (or layers in your concept). Will this remove the timing advantages?
Why wouldn't it be possible to use style-specific options? I think that there could be a command line option for associating subsequent options with an output file. Something like this:
java -jar mkgmap.jar \
  --output-file 10000001.img -c family1.args \
  --output-file 20000001.img -c family2.args \
  --output-file 30000001.img -c family3.args \
  --input-file 10000001.osm.gz \
  --output-file 10000002.img -c family1.args \
  --output-file 20000002.img -c family2.args \
  --output-file 30000002.img -c family3.args \
  --input-file 10000002.osm.gz \

and so on, for every input file.
The problem arises with options that change the processing before the style comes into play. Maybe that's only a small number of options:

family1.args: generate-sea=multipolygon
family2.args: generate-sea=polygon

or

family1.args: remove-short-arcs=3
family2.args: remove-short-arcs=-1

One needs to check the complete list of options to see which ones will cause a problem.
Basically, we would have a producer (input file parser) and several consumers (img file generators). Each img file generator would have its own set of options.
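To make the producer/consumer idea a bit more concrete, here is a minimal sketch in which the input is parsed once and the same parsed data is handed to one generator task per output file. All names (SharedParsePipeline, OutputConfig, run) are hypothetical, and the parser and generator are passed in as plain functions because this does not assume anything about mkgmap's internals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical sketch, not mkgmap code: one parse, several img generators.
class SharedParsePipeline {
    // Per-output settings: the target file and the options file that applies to it.
    static final class OutputConfig {
        final String outputFile;
        final String argsFile;

        OutputConfig(String outputFile, String argsFile) {
            this.outputFile = outputFile;
            this.argsFile = argsFile;
        }
    }

    // Parses the input once, then runs one generator task per output on up to maxJobs threads.
    // Results are returned in the same order as the outputs list.
    static <D> List<String> run(String inputFile,
                                Function<String, D> parser,
                                BiFunction<D, OutputConfig, String> generator,
                                List<OutputConfig> outputs,
                                int maxJobs) {
        D parsed = parser.apply(inputFile); // producer: runs exactly once
        ExecutorService pool = Executors.newFixedThreadPool(maxJobs);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (OutputConfig cfg : outputs) {
                // consumer: each task reads the shared parsed data with its own options
                futures.add(pool.submit(() -> generator.apply(parsed, cfg)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The sketch only works if the parsed data is not modified by the generators and does not depend on per-output options. Options that change the processing before the style is applied would force a separate parse for each distinct setting.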
Currently, another problem with multiple styles (apart from the alleged parsing overhead) is that the processing cannot be parallelized in an optimal way. For example, if I have an N-core machine, --max-jobs does the right thing for splitter and each mkgmap run, but my script would invoke mkgmap several times in succession, once for each output style. If I have M output tiles so that M is not an integer multiple of N, some cores would be sitting idle when a mkgmap run is close to completion.
I usually have more problems with memory limitations than with CPU limitations (though that's probably due to my hardware configuration). Your example with the N-core machine is not completely correct. You assume that each tile needs a fixed time T for compilation, so that some cores sit idle at the end. But that's not realistic. Each tile needs a different time T(x) for compilation, so the CPU utilization decreases from 100% to 100%/N while the last N tiles are compiled. After this, the index and the other stuff (gmapsupp, nsis file etc.) are created with a single CPU only. I assume your new idea would not improve this point.
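The effect discussed here can be estimated with a small simulation. The sketch below assumes a simple greedy model in which each tile goes to whichever core becomes free first; it is not a measurement of mkgmap, and TileSchedule and its methods are hypothetical names. It does not model the single-threaded index, gmapsupp and nsis steps at the end.

```java
import java.util.PriorityQueue;

// Hypothetical sketch: estimates how busy N cores are while compiling tiles.
class TileSchedule {
    // Greedy scheduling: each tile is assigned to the core that becomes free first.
    // Returns the wall-clock time until the last tile finishes.
    static double makespan(double[] tileTimes, int cores) {
        PriorityQueue<Double> freeAt = new PriorityQueue<>();
        for (int i = 0; i < cores; i++) {
            freeAt.add(0.0);
        }
        double end = 0.0;
        for (double t : tileTimes) {
            double finish = freeAt.poll() + t; // earliest free core takes this tile
            freeAt.add(finish);
            end = Math.max(end, finish);
        }
        return end;
    }

    // Fraction of the available core time (cores * makespan) spent compiling.
    static double utilization(double[] tileTimes, int cores) {
        double busy = 0.0;
        for (double t : tileTimes) {
            busy += t;
        }
        double span = makespan(tileTimes, cores);
        return span == 0.0 ? 0.0 : busy / (cores * span);
    }
}
```

With five tiles of equal time 1 on four cores, this model gives a makespan of 2 and a utilization of 62.5%, which is the fixed-time case. Feeding in measured per-tile times T(x) instead would show how much of the idle time remains with realistic tiles, and whether putting the tiles of all styles into one run changes it.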
Best regards,
Marko
Have fun! WanMil