
It is also worth noting something that osmcut (another map splitter) does. It has two modes: one that uses a map, for when the input file is small, and another that uses a list in which the node-id is the index into the list. The latter is used for splitting the whole planet file.
Ah I didn't even realise there was another map splitter! Thanks for the pointer and info. I've already been considering various ways to trade off memory & performance based on the size and content of the dataset (including using an indexed array), but comparing my ideas with what osmcut does looks helpful.
The second method uses less memory when the memory wasted on node-ids that are not in use is less than the memory a hash map would spend storing the node-id itself plus the free space it has to keep. As most node-ids are currently in use (about 88%), this is a big win on the whole planet file, while still giving access to the node ids.
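To make that trade-off concrete, here is a rough back-of-the-envelope sketch in Java. It is not taken from osmcut or the splitter; the class name, the per-entry byte sizes and the load factor are all assumptions chosen for illustration, and a real hash map has further per-entry overhead that is ignored here.

```java
// Hypothetical estimator, not code from osmcut or the splitter.
class NodeStoreEstimate {
    // Flat list indexed by node-id: one slot per possible id, used or not.
    static long arrayBytes(long maxNodeId, int valueBytes) {
        return maxNodeId * valueBytes;
    }

    // Open-addressing hash map (assumed layout): every slot holds key + value,
    // and only a fraction `loadFactor` of the slots is occupied.
    static long mapBytes(long usedNodes, int keyBytes, int valueBytes, double loadFactor) {
        long slots = (long) Math.ceil(usedNodes / loadFactor);
        return slots * (keyBytes + valueBytes);
    }

    // True when the indexed list is the cheaper of the two.
    static boolean arrayWins(long maxNodeId, long usedNodes,
                             int keyBytes, int valueBytes, double loadFactor) {
        return arrayBytes(maxNodeId, valueBytes)
                < mapBytes(usedNodes, keyBytes, valueBytes, loadFactor);
    }
}
```

With 88% of ids in use the list comes out well ahead under these assumptions; for a small extract that touches only a sliver of the id range, the map does.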
Makes sense. From what I can see osmcut appears to be only subdividing into evenly sized tiles though, rather than quadtree partitioning. The advantage they have there is they only need to hold 2 bytes/node (as a tile ID) rather than 4 bytes/node (lat+lon), and they don't need to perform a secondary pass to determine which way belongs in which tile. That's OK though, it's a more interesting challenge taking the quadtree approach :)
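For illustration, an evenly sized grid lets a node's tile be computed directly from its coordinates, which is why a single small tile ID per node can be enough. The sketch below is only my reading of that idea, not osmcut's actual code; the `UniformGrid` class and its column/row parameters are hypothetical.

```java
// Hypothetical uniform grid over the whole world; not osmcut's implementation.
class UniformGrid {
    final int cols;
    final int rows;

    UniformGrid(int cols, int rows) {
        // Keep the tile count small enough for the ID to fit in 2 bytes.
        if (cols <= 0 || rows <= 0 || (long) cols * rows > 65536) {
            throw new IllegalArgumentException("tile count must be in 1..65536");
        }
        this.cols = cols;
        this.rows = rows;
    }

    // Maps a coordinate in degrees to a tile ID in the range 0..cols*rows-1.
    int tileId(double lat, double lon) {
        int col = (int) ((lon + 180.0) / 360.0 * cols);
        int row = (int) ((lat + 90.0) / 180.0 * rows);
        // Clamp so the edges (lat = 90, lon = 180) fall into the last tile.
        col = Math.max(0, Math.min(cols - 1, col));
        row = Math.max(0, Math.min(rows - 1, row));
        return row * cols + col;
    }
}
```

A quadtree cannot do this in one step, because the tile boundaries are only known once the node density has been measured, hence the extra pass.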
I've committed the patch.
Thanks! The next patch I produce will likely include a third party XPP parser jar file and associated changes to the IDEA project file. What's the best way to deal with this - create a build.xml so non-IDEA users can build it too?
Chris