
2009/8/15 Chris Miller <chris.miller@kbcfp.com>:
Hi Paul,
This is because the .osm file you are splitting has a node with no 'lat' attribute. As far as I'm aware this shouldn't happen; something is probably wrong with your .osm file? I downloaded the europe.osm file a few hours after you posted your message and it processes without a problem. My europe.osm file is 28,626,280,448 bytes in size.
My file was about 200KB smaller...
Anyway, I've put in a check for this, since in this case the change is simple and it doesn't have much effect on performance. The splitter will now output details of the problem, ignore the node, and carry on with the split. Generally speaking, though, it's not such a good idea to put too much validation of the XML into the splitter because it will just complicate the code and slow things down. I guess if we hit any further problems like this we'll have to decide what's best on a case-by-case basis.
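For illustration only, the "report, ignore the node and carry on" behaviour described above could look roughly like the sketch below. This is not the splitter's actual code. It is a minimal Python example, and the function name iter_valid_nodes and the sample data are made up for the purpose. It assumes an OSM XML file whose <node> elements normally carry 'id', 'lat' and 'lon' attributes.

```python
import io
import sys
import xml.etree.ElementTree as ET


def iter_valid_nodes(source, log=sys.stderr):
    """Yield (id, lat, lon) for each <node>, skipping nodes without coordinates.

    Hypothetical helper for illustration; not part of the splitter.
    """
    # iterparse streams the document, so each node is seen as soon as it closes.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "node":
            continue
        node_id = elem.get("id")
        lat, lon = elem.get("lat"), elem.get("lon")
        if lat is None or lon is None:
            # Report the problem, drop the node, and keep going.
            print(f"Skipping node {node_id}: missing 'lat' or 'lon'", file=log)
        else:
            yield node_id, float(lat), float(lon)
        elem.clear()  # release the node's attributes and children


# Made-up sample: node 2 has no 'lat' attribute.
sample = io.BytesIO(
    b"<osm>"
    b'<node id="1" lat="51.5" lon="-0.1"/>'
    b'<node id="2" lon="4.9"/>'
    b'<node id="3" lat="48.9" lon="2.3"/>'
    b"</osm>"
)
nodes = list(iter_valid_nodes(sample))
```

The extra cost per node here is one attribute lookup and a comparison, which fits the point that this particular check is cheap. Heavier validation would add up over a file of this size.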
I think additional validation in the splitter is useful when it only reports and ignores bad data. I understand the performance concern you mentioned; it is almost always a compromise between robustness and speed. The alternative would be a separate application that filters out the bad data ("bad" as defined by the splitter, not necessarily by OSM) and writes valid XML to a file for the splitter to process. You can almost never assume that data coming from outside your own framework is valid.

The use case I imagine is this: run the splitter on the data and, if it fails, preprocess the data with a "cleaner" application and run the splitter again on the cleaned data. It is just an idea. If the necessary robustness checks cause a significant slowdown, then separating validation/cleaning from processing might be a good way to go.

Thanks for the fix :)

Paul

--
Don't take life too seriously; you will never get out of it alive.
-- Elbert Hubbard