Hi WanMil,

I can't say in detail what the pbf reader is doing, but it certainly creates
a lot more temporary objects which have to be GCed. In the o5m reader I tried
to avoid that. That said, in the current implementation the o5m reader still
reads the tags and saves them to its internal string table, so in that respect
it is similar to the pbf reader.
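
To illustrate what I mean by avoiding temporary objects, here is a minimal
sketch. It is not the actual splitter code: the class TagSkipSketch, its
methods and the simplified tag layout (zero-terminated key followed by
zero-terminated value) are all made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT the splitter source. Assumed layout: each tag is
// a zero-terminated key followed by a zero-terminated value.
class TagSkipSketch {
    private final byte[] data;
    private int pos;

    TagSkipSketch(byte[] data) {
        this.data = data;
    }

    int position() {
        return pos;
    }

    // Materialising variant: allocates one String per key and per value,
    // all of which become garbage if the caller ignores the tags.
    List<String> readTags(int tagCount) {
        List<String> tags = new ArrayList<>(2 * tagCount);
        for (int i = 0; i < 2 * tagCount; i++) {
            int start = pos;
            while (data[pos] != 0) {
                pos++;
            }
            tags.add(new String(data, start, pos - start, StandardCharsets.UTF_8));
            pos++; // step over the terminating zero
        }
        return tags;
    }

    // Skipping variant: only moves the read position, allocates nothing.
    void skipTags(int tagCount) {
        for (int i = 0; i < 2 * tagCount; i++) {
            while (data[pos] != 0) {
                pos++;
            }
            pos++; // step over the terminating zero
        }
    }
}
```

Both variants end at the same position in the input, but only the first one
leaves work for the GC. A pass that doesn't need the tags can call the second.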

I'll look at your logs soon. I am working on a tuning guide for splitter, because
I found a lot of nonsense on the net when searching for splitter.

Ciao,
Gerd

> Date: Thu, 13 Dec 2012 15:27:07 +0100
> From: wmgcnfg@web.de
> To: mkgmap-dev@lists.mkgmap.org.uk
> Subject: [mkgmap-dev] Splitter pbf vs o5m processing
>
> > Hi Steve,
> >
> >
> > Steve Ratcliffe wrote
> >> Hello Gerd
> >>
> >>> no, it is not (yet). I plan to add o5m support to mkgmap soon. With my
> >>> patch you can use splitter
> >>
> >> As an aside, what do you think it is about the o5m format that makes
> >> it quicker than pbf?
> >
> > Well, not easy to say. I think it's a combination of many small points:
> > 1) pbf uses (by default) compressed blocks, so you have to unzip a complete
> > block before you can use any information in the block.
> > 2) pbf read routines create a lot of temporary objects, which seems to
> > stress the GC.
> > 3) pbf doesn't allow skipping the processing of node tags or way tags, but
> > splitter's read passes often don't need them. So, with pbf we create lists
> > of tags and return them to GC, with o5m we can simply skip them.
> >
> > To be fair, using the --drop-version parameter in osmconvert removes a lot
> > of info which is ignored by splitter and mkgmap. I never tried what effect
> > it has to use pbf input that was created with this parameter.
> >
> > When writing, o5m is probably only faster because it doesn't zip the data.
> > As long as mkgmap doesn't understand o5m I see no benefit in using this.
> >
> > Maybe other computers show different results, especially if the CPU is much
> > faster than mine and the disk access is slower.
> > By the way: my patch also speeds up pbf reading a little bit.
> >
> > Ciao,
> > Gerd
>
> Hi Gerd
>
> I've done some tests with the latest splitter version r255.
> I have split the Geofabrik Europe extract in pbf and o5m format.
>
> As you pointed out, o5m processing is much quicker (8528s vs. 12939s).
> I also observed that pbf seemed to use more memory than o5m and
> therefore I activated gc logging and checked it with garbagecat.
>
> The interesting values are:
> Throughput:
> o5m: 94%
> pbf: 61%
> So 3400m seems to be too small for pbf processing to work through the Europe
> extract, with the result that the GC runs constantly.
>
> Total Pause:
> o5m: 527816ms = 528s
> pbf: 5093916ms = 5094s
> Wow, so GC requires 4566s more time for pbf.
>
> Subtracting the GC time from the total processing time, o5m and pbf need
> nearly the same time:
> o5m: 8528s - 528s = 8000s
> pbf: 12939s - 5094s = 7845s
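
As a cross-check, the throughput figures quoted above follow from these same
numbers, assuming throughput means the share of the run time that is not
spent in GC pauses. A small sketch (the class GcThroughput and its method
are made up for illustration):

```java
// Hypothetical helper, assuming throughput = (total - pause) / total.
class GcThroughput {
    // Share of the run time not spent in GC pauses, in percent.
    static double throughputPercent(double totalSeconds, double pauseSeconds) {
        return 100.0 * (totalSeconds - pauseSeconds) / totalSeconds;
    }

    public static void main(String[] args) {
        // Total run times and GC pause totals as reported in this mail.
        System.out.println("o5m: " + throughputPercent(8528, 528));
        System.out.println("pbf: " + throughputPercent(12939, 5094));
    }
}
```

This gives roughly 93.8% for o5m and 60.6% for pbf, which round to the 94%
and 61% shown by garbagecat.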
>
> Obviously a part of the difference in GC time can be explained by the
> points you made (pbf must unpack every block and must read tags which are
> thrown away immediately afterwards). But do you think that the whole
> difference can be explained by that?
>
> I will post my logfiles directly to you because they are too big to be
> posted on the mailing list.
>
> WanMil
> _______________________________________________
> mkgmap-dev mailing list
> mkgmap-dev@lists.mkgmap.org.uk
> http://lists.mkgmap.org.uk/mailman/listinfo/mkgmap-dev