
bz2 is *very* slow to decompress, so yes, if you have the space I'd recommend decompressing the osm file before running the splitter. The splitter has to make a minimum of two passes over the file, so a compressed input gets decompressed at least twice. The (limited and simple) benchmarks I tried with .bz2 vs .osm showed that splitting a .bz2 file takes ~6 times longer than splitting an uncompressed .osm file.

As for gz, it is quite a lot faster to deal with than bz2, though I haven't done any benchmarking with it as far as the splitter is concerned. My guess is that uncompressed will still win out unless you have fairly slow disks and a very fast CPU.

Interestingly, it's theoretically possible to parallelise the bz2 compression/decompression algorithm to give an almost linear performance improvement per core. Implementing this would be a big job, but on a 4+ core machine it would make a pretty significant difference. It's on my todo list but please don't hold your breath!

Chris
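For anyone who wants to check the trade-off on their own hardware, here is a rough sketch in Python (not the splitter's own code, which this does not touch). It reads the same data twice as plain, gz and bz2 files to imitate the two-pass access pattern described above. The helper names `time_passes` and `make_sample_files` are made up for illustration, and the sample data is synthetic and highly repetitive, so the timings it prints are not representative. Point `time_passes` at a real extract for meaningful numbers.

```python
import bz2
import gzip
import os
import tempfile
import time


def time_passes(path, opener, passes=2, chunk_size=1 << 20):
    """Read the whole file `passes` times; return (elapsed seconds, bytes read)."""
    total = 0
    start = time.perf_counter()
    for _ in range(passes):
        # Each pass reopens the file, so compressed input is decompressed again.
        with opener(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
    return time.perf_counter() - start, total


def make_sample_files(directory, repeats=20000):
    """Write the same synthetic OSM-like bytes as .osm, .osm.gz and .osm.bz2."""
    data = b'<node id="1" lat="51.5" lon="-0.1"/>\n' * repeats
    files = {}
    for name, opener in (
        ("sample.osm", open),
        ("sample.osm.gz", gzip.open),
        ("sample.osm.bz2", bz2.open),
    ):
        path = os.path.join(directory, name)
        with opener(path, "wb") as f:
            f.write(data)
        files[name] = (path, opener)
    return files, len(data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        files, size = make_sample_files(d)
        for name, (path, opener) in files.items():
            elapsed, total = time_passes(path, opener)
            print(f"{name}: {elapsed:.3f}s for 2 passes ({total} bytes read)")
```

To time a real file instead, call something like `time_passes("extract.osm.bz2", bz2.open)`, where the file name is a placeholder. Note that this only measures read and decompress time, not the XML parsing the splitter also has to do.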
Just out of interest, what performance gains (or disadvantages) would there be to working with uncompressed files, instead of bz2 and gz files?
Would this be faster for those of us with copious amounts of disk space, or would the extra IO negate any CPU-related performance gains?
I know that Osmosis performance on multi-core systems can apparently be improved by piping the OSM file through a decompression program, but I assume that would not be practical for Splitter which must make several passes through the file.
Cheers.