I can't say in detail what the pbf reader is doing, but it certainly creates a
lot more temporary objects which have to be GCed. In the o5m reader I tried
to avoid that. In fact, in the current implementation, the o5m reader still
pbf reader.
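
A minimal sketch of the idea of avoiding temporary objects, not the actual splitter code: the byte layout (a count byte followed by length-prefixed UTF-8 strings) and all class and method names are made up for illustration. readTags() allocates a byte[] and a String per entry, which become garbage when the caller ignores the tags; skipTags() moves past the same bytes without allocating anything.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class TagSkipSketch {
    // Hypothetical encoding: 1 count byte, then per string 1 length byte + UTF-8 bytes.
    // This is NOT the real o5m or pbf layout; strings must be shorter than 128 bytes.
    static ByteBuffer encode(String... strings) {
        ByteBuffer buf = ByteBuffer.allocate(1024);
        buf.put((byte) strings.length);
        for (String s : strings) {
            byte[] b = s.getBytes(StandardCharsets.UTF_8);
            buf.put((byte) b.length);
            buf.put(b);
        }
        buf.flip();
        return buf;
    }

    // Materialises every string: one byte[] and one String per entry.
    static List<String> readTags(ByteBuffer buf) {
        int n = buf.get();
        List<String> out = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            byte[] b = new byte[buf.get()];
            buf.get(b);
            out.add(new String(b, StandardCharsets.UTF_8));
        }
        return out;
    }

    // Advances past the same bytes without creating any objects.
    static void skipTags(ByteBuffer buf) {
        int n = buf.get();
        for (int i = 0; i < n; i++) {
            int len = buf.get();
            buf.position(buf.position() + len);
        }
    }
}
```

Both methods leave the buffer at the same position, so a pass that does not need the tags can call skipTags() and continue with the next element.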
I'll look at your logs soon. I am working on a tuning guide for splitter, because
I found a lot of nonsense on the net when searching for splitter.
> Date: Thu, 13 Dec 2012 15:27:07 +0100
> From: wmgcnfg@web.de
> To: mkgmap-dev@lists.mkgmap.org.uk
> Subject: [mkgmap-dev] Splitter pbf vs o5m processing
>
> > Hi Steve,
> >
> >
> > Steve Ratcliffe wrote
> >> Hello Gerd
> >>
> >>> no, it is not (yet). I plan to add o5m support to mkgmap soon. With my
> >>> patch you can use splitter
> >>
> >> As an aside, what do you think it is about the o5m format that makes
> >> it quicker than pbf?
> >
> > Well, not easy to say. I think it's a combination of many small points:
> > 1) pbf uses (by default) compressed blocks, so you have to unzip a complete
> > block before you can use any information in the block.
> > 2) pbf read routines create a lot of temporary objects, this seems to stress
> > GC
> > 3) pbf doesn't allow skipping the processing of node tags or way tags, but
> > splitter's read passes often don't need them. So, with pbf we create lists
> > of tags and return them to GC, with o5m we can simply skip them.
> >
> > To be fair, using the --drop-version parm in osmconvert removes a lot of
> > info which is ignored by splitter and mkgmap. I never tried what effect it
> > has to use pbf input that was created with this parm.
> >
> > When writing, o5m is probably only faster because it doesn't zip the data.
> > As long as mkgmap doesn't understand o5m I see no benefit in using this.
> >
> > Maybe other computers show different results, esp. if the CPU is much faster
> > than mine and the Disk access is slower.
> > By the way: my patch also speeds up pbf reading a little bit.
> >
> > Ciao,
> > Gerd
>
> Hi Gerd
>
> I've done some tests with the latest splitter version r255.
> I have split the Geofabrik europe extract in pbf and o5m format.
>
> As you pointed out o5m processing is much quicker (8528s vs. 12939s).
> I also observed that pbf seemed to use more memory than o5m and
> therefore I activated gc logging and checked it with garbagecat.
>
> The interesting values are
> Throughput
> o5m: 94%
> pbf: 61%
> So 3400m seems to be too small for pbf processing of the europe
> extract, so that the GC runs almost permanently.
>
> Total Pause:
> o5m: 527816ms = 528s
> pbf: 5093916ms = 5094s
> Wow, so for pbf GC requires 4566s more time.
>
> Subtracting the GC time from the total processing time, o5m and pbf need
> nearly the same time:
> o5m: 8528s - 528s = 8000s
> pbf: 12939s - 5094s = 7845s
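
The figures above can be cross-checked with a small sketch. It assumes that the throughput reported by garbagecat is simply the share of the total run time not spent in GC pauses, which matches the reported 94% and 61% after rounding; the class and method names are made up for illustration.

```java
class GcThroughputCheck {
    // Assumed definition: percentage of wall time not spent in GC pauses.
    static long throughputPercent(long totalSeconds, long gcPauseSeconds) {
        return Math.round(100.0 * (totalSeconds - gcPauseSeconds) / totalSeconds);
    }

    // Processing time that remains once the GC pauses are subtracted.
    static long withoutGc(long totalSeconds, long gcPauseSeconds) {
        return totalSeconds - gcPauseSeconds;
    }

    public static void main(String[] args) {
        // Totals and pause times as quoted in the mail (seconds).
        System.out.println("o5m: " + throughputPercent(8528, 528) + "% throughput, "
                + withoutGc(8528, 528) + "s without GC");
        System.out.println("pbf: " + throughputPercent(12939, 5094) + "% throughput, "
                + withoutGc(12939, 5094) + "s without GC");
        System.out.println("extra GC time for pbf: " + (5094 - 528) + "s");
    }
}
```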
>
> Obviously a part of the difference in GC time can be explained with your
> thoughts (pbf must extract all parts and must read tags which are thrown
> away directly afterwards). But do you think that the whole difference
> can be explained with that?
>
> I will post my logfiles directly to you because they are too big to be
> posted on the mailing list.
>
> WanMil
> _______________________________________________
> mkgmap-dev mailing list
> mkgmap-dev@lists.mkgmap.org.uk
> http://lists.mkgmap.org.uk/mailman/listinfo/mkgmap-dev