Exactly, decompressing is pretty hard on the CPU, so ideally we'd want to make as few passes as possible. There are other factors too though, e.g. I'm thinking about doing the decompression on one thread and the parsing/splitting on another thread if there is more than one core available. I haven't yet tried to profile the splitter, but once I've got rid of most of the low-hanging fruit I'll explore options like that a bit further. At this stage it's still hard to know whether additional passes are going to be faster or slower than paging some of the data to disk.
Chris
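To make the two-thread idea concrete, here is a rough sketch in Python (not the splitter's own code or language, just an illustration). One thread decompresses the bz2 file and pushes chunks onto a bounded queue, while the main thread consumes them. The names `decompress_worker` and `count_nodes`, the chunk size, and the queue depth are all made up for this example, and counting `<node` occurrences is only a stand-in for real parsing.

```python
import bz2
import queue
import threading

CHUNK_SIZE = 1 << 20  # 1 MiB of decompressed data per queue item (arbitrary choice)
_DONE = object()      # sentinel that marks the end of the stream


def decompress_worker(path, out_queue):
    """Producer: decompress the file and hand chunks to the consumer thread."""
    try:
        with bz2.open(path, "rb") as f:
            while True:
                chunk = f.read(CHUNK_SIZE)
                if not chunk:
                    break
                out_queue.put(chunk)  # blocks when the queue is full
    finally:
        out_queue.put(_DONE)  # always signal completion, even on error


def count_nodes(path):
    """Consumer: stand-in for a real parser; counts '<node' occurrences."""
    # Bounded queue so the decompressor cannot run arbitrarily far ahead.
    chunks = queue.Queue(maxsize=8)
    worker = threading.Thread(
        target=decompress_worker, args=(path, chunks), daemon=True
    )
    worker.start()

    pattern = b"<node"
    count = 0
    tail = b""  # last few bytes of the previous chunk, to catch split matches
    while True:
        chunk = chunks.get()
        if chunk is _DONE:
            break
        data = tail + chunk
        count += data.count(pattern)
        # Keep len(pattern) - 1 bytes: enough to complete a match that
        # straddles the boundary, too short to recount a full match.
        tail = data[-(len(pattern) - 1):]
    worker.join()
    return count
```

Calling `count_nodes("some-extract.osm.bz2")` (a hypothetical file name) reads the file in a single pass. Whether this actually beats a single-threaded loop depends on how expensive the parsing side is relative to decompression, so it would need profiling before drawing any conclusions.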
Just a thought from reading the thread: multiple parsing runs over a bz2-compressed file could hurt performance, because each run means decompressing the input files again. And in my experience, decompressing bz2 costs a lot of resources. (In my case I'm using the osm.bz2 files from Geofabrik directly as input.)
Regards, Johann