
Hi WanMil,
I can't say for sure what the pbf reader is doing in detail, but it certainly creates a lot more temporary objects which have to be GCed. In the o5m reader I tried to avoid that. In fact, in the current implementation, the o5m reader still reads and saves the tags to the internal string table, so in that respect it is similar to the pbf reader. I'll look at your logs soon. I am working on a tuning guide for splitter, because I found a lot of nonsense on the net when searching for splitter.
Ciao, Gerd
Date: Thu, 13 Dec 2012 15:27:07 +0100
From: wmgcnfg@web.de
To: mkgmap-dev@lists.mkgmap.org.uk
Subject: [mkgmap-dev] Splitter pbf vs o5m processing
Hi Steve,
Steve Ratcliffe wrote
Hello Gerd
no, it is not (yet). I plan to add o5m support to mkgmap soon. With my patch you can use splitter
As an aside, what do you think it is about the o5m format that makes it quicker than pbf?
Well, not easy to say. I think it's a combination of many small points:
1) pbf uses (by default) compressed blocks, so you have to unzip a complete block before you can use any information in the block.
2) pbf read routines create a lot of temporary objects, which seems to stress the GC.
3) pbf doesn't allow skipping the processing of node tags or way tags, but splitter's read passes often don't need them. So, with pbf we create lists of tags and return them to the GC, while with o5m we can simply skip them.
To be fair, using the --drop-version parameter in osmconvert removes a lot of info which is ignored by splitter and mkgmap. I have never tried what effect it has to use pbf input that was created with this parameter.
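To illustrate point 3, here is a minimal sketch in Python (splitter itself is Java). It uses a made-up toy record format, not the real o5m or pbf layout: each record is a node id followed by a length-prefixed tag block. The names `HEADER`, `encode`, `read_with_tags` and `read_ids_only` are hypothetical. The first reader builds a tag list for every record, the second jumps over the tag block without creating any tag objects.

```python
import io
import struct

# Hypothetical toy layout: 8-byte node id, 2-byte length of the tag block.
HEADER = struct.Struct("<qH")

def encode(node_id, tags):
    """Serialize one record; tags are stored as key\\0value\\0 pairs."""
    block = b"".join(k.encode() + b"\0" + v.encode() + b"\0" for k, v in tags)
    return HEADER.pack(node_id, len(block)) + block

def read_with_tags(stream):
    """Materialize every tag: one list plus several strings per record."""
    while (head := stream.read(HEADER.size)):
        node_id, length = HEADER.unpack(head)
        parts = stream.read(length).split(b"\0")[:-1]
        tags = [(parts[i].decode(), parts[i + 1].decode())
                for i in range(0, len(parts), 2)]
        yield node_id, tags

def read_ids_only(stream):
    """Skip the tag block entirely; no tag objects are created."""
    while (head := stream.read(HEADER.size)):
        node_id, length = HEADER.unpack(head)
        stream.seek(length, io.SEEK_CUR)
        yield node_id

data = (encode(1, [("highway", "residential")])
        + encode(2, [])
        + encode(3, [("name", "A"), ("ref", "B")]))

ids = list(read_ids_only(io.BytesIO(data)))
full = list(read_with_tags(io.BytesIO(data)))
print(ids)
print(full[2])
```

Skipping only works because the length of the tag block is known before it is parsed; a reader that has to decode the whole block to find the next record cannot take this shortcut.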
When writing, o5m is probably only faster because it doesn't zip the data. As long as mkgmap doesn't understand o5m I see no benefit in using this.
Maybe other computers show different results, esp. if the CPU is much faster than mine and the Disk access is slower. By the way: my patch also speeds up pbf reading a little bit.
Ciao, Gerd
Hi Gerd
I've done some tests with the latest splitter version r255. I have split the Geofabrik Europe extract in pbf and o5m format.
As you pointed out, o5m processing is much quicker (8528s vs. 12939s). I also observed that pbf seemed to use more memory than o5m, so I activated GC logging and checked the log with garbagecat.
The interesting values are:
Throughput: o5m: 94%, pbf: 61%
So 3400m of memory seems to be too small for pbf processing to work through the Europe extract, so that the GC runs constantly.
Total Pause:
o5m: 527816ms = 528s
pbf: 5093916ms = 5094s
Wow, so for pbf the GC requires 4566s more time.
Subtracting the GC time from the total processing time, o5m and pbf need almost the same time:
o5m: 8528s - 528s = 8000s
pbf: 12939s - 5094s = 7845s
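The arithmetic above can be re-checked with a few lines of Python. This is only a sketch using the figures quoted in this message; it assumes that the throughput value means the share of run time not spent in GC pauses, and that the total processing time is close to the JVM run time. The names `runs` and `summarize` are made up for the example.

```python
# Figures quoted in this message, in seconds.
runs = {
    "o5m": {"total_s": 8528, "gc_pause_s": 528},
    "pbf": {"total_s": 12939, "gc_pause_s": 5094},
}

def summarize(total_s, gc_pause_s):
    """Return (seconds outside GC pauses, throughput in percent)."""
    net_s = total_s - gc_pause_s
    # Assumption: throughput = time not spent in GC pauses / total time.
    throughput_pct = 100.0 * net_s / total_s
    return net_s, throughput_pct

results = {name: summarize(**values) for name, values in runs.items()}
for name, (net_s, pct) in results.items():
    print(f"{name}: {net_s}s outside GC, throughput about {pct:.0f}%")
```

Under that assumption the computed throughput rounds to the 94% and 61% reported above, so the pause totals and the throughput figures are consistent with each other.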
Obviously, part of the difference in GC time can be explained by the points you raised (pbf must extract all parts and must read tags which are thrown away directly afterwards). But do you think that the whole difference can be explained by that?
I will post my logfiles directly to you because they are too big to be posted on the mailing list.
WanMil