
It'll be a few days before I have time to look at this in much detail and give you my thoughts, but from what I can see it sounds promising. We actually use protobufs at my job for shunting around vast amounts of data, so I'm pretty familiar with their pros and cons. One concern is that even if your binary format becomes reasonably standardised in the OSM world, we'll still need to maintain .osm support for the foreseeable future. Perhaps that can be tackled by swapping out the existing cache format with your protobuf format.
I envisioned it as an alternate format that could exist in parallel with .osm, while being smaller and faster. I still need to clean up the osmosis patches and turn them into a plugin.
Even that I'm not sure about in the longer term; tackling some of the harder problems with the splitter will require more than just a linear protobuf or cache format. Data structures such as B-tree indexes and so on will likely be required, though I'm sure we can cross that bridge when we come to it.
What harder problems do you think the splitter needs solved?
I'm not so sure about your lazy HashMap initialisation in Element.java, though it does highlight a bug that was introduced by the recent r109 multithreaded patch! Prior to r109, Element objects were never recreated; instead .reset() was called to reset their state for each new node/way/rel.
The reason for this was that the cost of initialisation was showing up in the profile.
r109 changed that to create new instances each time, hence some of the object churn you were probably seeing that led you to your patch. I'll have a think about what to do here and probably run a few benchmarks to see what works best (e.g. keeping one current Node/Way/Rel per thread might be best).
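For what it's worth, the per-thread option would look roughly like this (a sketch only; the NodePool class and obtain() method are placeholders, not the splitter's actual code):

// Sketch only: keep one reusable Node per worker thread so the parser can
// call reset() instead of allocating a fresh object for every element.
public class NodePool {
    private static final ThreadLocal<Node> CURRENT_NODE = new ThreadLocal<Node>() {
        @Override
        protected Node initialValue() {
            return new Node();
        }
    };

    // Each caller gets its own thread's instance, cleared and ready to fill.
    public static Node obtain() {
        Node node = CURRENT_NODE.get();
        node.reset();
        return node;
    }
}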
May I suggest holding off on this for a bit? The branch that adds support for reading the binary file format refactors this interface so that node/way/rel objects are passed directly instead of through an interface that looks like XML callbacks.
[As an aside, note that in your lazy HashMap patch Collections.EMPTY_SET results in an unchecked assignment warning. That of course is no big deal, easily fixed with Collections.<String, String>emptyMap().entrySet(). The sting though is that the call to Collections.EMPTY_SET.iterator() actually creates a new iterator object each time(!). Better would be to create a single static immutable iterator and return that.]
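Something along these lines is what I mean (a rough sketch only; the field and method names are made up rather than what's actually in Element.java):

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;

public class Element {
    // Shared, stateless iterator for elements that have no tags. Reusing a
    // single instance is safe because hasNext() is always false, so there is
    // no per-call state to corrupt.
    private static final Iterator<Map.Entry<String, String>> EMPTY_TAG_ITERATOR =
            new Iterator<Map.Entry<String, String>>() {
                public boolean hasNext() { return false; }
                public Map.Entry<String, String> next() { throw new NoSuchElementException(); }
                public void remove() { throw new UnsupportedOperationException(); }
            };

    private Map<String, String> tags; // created lazily on first addTag()

    public void addTag(String key, String value) {
        if (tags == null)
            tags = new HashMap<String, String>();
        tags.put(key, value);
    }

    public Iterator<Map.Entry<String, String>> tagsIterator() {
        return tags == null ? EMPTY_TAG_ITERATOR : tags.entrySet().iterator();
    }
}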
Good catch. Should I respin the patch or will you?
My multithread changes reduced the real and user CPU time by 30-50% on a UK extract. Other tests, e.g., a whole planet, show almost no effect. I think this is because a lot of my changes increase the efficiency of code that is parallelized, which paradoxically ends up reducing the relative benefits of parallelization.
This sounds promising, I'll look at what you've done further in the next couple of days and hopefully get it committed this weekend.
The single biggest remaining slowdown is that for each point, the splitter sends each Node to each area (represented by an OSMWriter), which writes it to the output file if it is within the bounds of the region (with overlap), or does nothing otherwise. On a whole planet, with 840 areas and 550M nodes, that is roughly 460 billion bounds checks, none of which run in parallel. An index that can map a node to the regions that may contain it could avoid 99% of these checks.
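To put the shape of the problem in code, the dispatch amounts to something like this (a simplified sketch, not the splitter's actual classes or method names):

// Illustrative only: every node is offered to every writer, and each writer
// does its own bounds check, writing the node only if it falls inside its
// area plus overlap.
void writeAll(Iterable<Node> nodes, List<OSMWriter> writers) {
    for (Node node : nodes) {              // ~550M nodes for a whole planet
        for (OSMWriter writer : writers) { // ~840 areas
            writer.write(node);            // bounds check happens inside write()
        }
    }
}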
This is something I've been aware of for a long time but haven't found the time to tackle yet. One idea I have is to generate an r-tree in memory and run the node comparisons against that to find out what area(s) each node falls into.
JOSM already includes a quadtree implementation. I wish it could be disentangled and reused across the different OSM applications. It'd be a perfect solution here.
I'd estimate it would reduce the number of comparisons required from 840/node to less than 10. I suspect that would be a win over generating a node-to-area index, which would be expensive to create initially, plus expensive in terms of disk access (or alternatively, require lots of RAM). Also, with a bit of effort these r-tree comparisons could be parallelised too.
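If we went that way, an off-the-shelf structure such as the JTS STRtree would do for a first experiment. This is only a sketch of the lookup pattern; the Area class and its accessors are invented for illustration, and I'm not suggesting we necessarily pull in the dependency:

import com.vividsolutions.jts.geom.Envelope;
import com.vividsolutions.jts.index.strtree.STRtree;
import java.util.List;

// Sketch only: bulk-load each area's overlap-extended bounding box into an
// in-memory R-tree once, then query it per node instead of testing all ~840
// areas linearly.
public class AreaIndex {
    private final STRtree tree = new STRtree();

    public void add(Area area) {
        tree.insert(new Envelope(area.getMinLon(), area.getMaxLon(),
                                 area.getMinLat(), area.getMaxLat()), area);
    }

    // Returns only the handful of candidate areas whose boxes cover the node;
    // the caller still runs the exact bounds check on each candidate.
    @SuppressWarnings("unchecked")
    public List<Area> candidates(double lat, double lon) {
        return tree.query(new Envelope(lon, lon, lat, lat));
    }
}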
Perfection isn't necessary. How about a 64x64 array? Divide the world into 4096 regions. Each region contains a list of all areas (including their overlap) that intersect it. Then, for each node, you only need to iterate the OSMWriters in that region. Getting the round-offs right would be tricky, but it should be good enough to filter out almost all non-matching areas.
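Roughly like this (a sketch only; the OSMWriter accessors are made up to illustrate the idea, not the splitter's real API):

import java.util.ArrayList;
import java.util.List;

public class GridIndex {
    private static final int GRID = 64;

    @SuppressWarnings("unchecked")
    private final List<OSMWriter>[][] cells = new List[GRID][GRID];

    public GridIndex(List<OSMWriter> writers) {
        for (int x = 0; x < GRID; x++)
            for (int y = 0; y < GRID; y++)
                cells[x][y] = new ArrayList<OSMWriter>();

        for (OSMWriter w : writers) {
            // Register the writer (including its overlap) with every cell its
            // bounds touch: the cell containing minLon through the cell
            // containing maxLon covers all intersecting cells, and likewise
            // for latitude.
            for (int x = cellX(w.getMinLon()); x <= cellX(w.getMaxLon()); x++)
                for (int y = cellY(w.getMinLat()); y <= cellY(w.getMaxLat()); y++)
                    cells[x][y].add(w);
        }
    }

    // Only the writers registered for the node's cell need to run their
    // exact (and still authoritative) bounds check.
    public List<OSMWriter> candidates(double lat, double lon) {
        return cells[cellX(lon)][cellY(lat)];
    }

    private static int cellX(double lon) {
        return (int) Math.min(GRID - 1, Math.max(0, (lon + 180.0) / 360.0 * GRID));
    }

    private static int cellY(double lat) {
        return (int) Math.min(GRID - 1, Math.max(0, (lat + 90.0) / 180.0 * GRID));
    }
}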
I'll follow up in a while once I've had a bit more time to investigate and respond. Many many thanks for your work on all this, very much appreciated!
Chris
Thanks for a useful program. Glad I could help make it better.
Scott