Possible splitter bug

Hi,

I am processing a contours file that is 95MB in Gzipped format. If I don't use the --cache option then splitter gets confused and doesn't count the nodes properly, putting all the data into a single file:

    java -Xmx1500m -ea -jar "../../splitter/splitter-r97/splitter.jar" --mapid=63240201 Scotland_contours.osm
    cache=
    description=
    geonames-file=
    legacy-mode=false
    mapid=63240201
    max-areas=255
    max-nodes=1600000
    mixed=false
    overlap=2000
    resolution=13
    split-file=
    write-kml=
    Time started: Fri Oct 16 12:25:48 BST 2009
    Map is being split for resolution 13:
     - area boundaries are aligned to 0x800 map units
     - areas are multiples of 0x1000 map units wide and high
    The input osm file(s) will be re-parsed during the split (slower) because no --cache parameter was specified
    Processing Scotland_contours.osm
    A total of 21,813 nodes, 0 ways and 0 relations were processed in 1 file
    Min node ID = 1000000000
    Max node ID = 1000021812
    Time: Fri Oct 16 12:25:49 BST 2009
    Exact map coverage is (56.000404357910156,-7.9987335205078125) to (60.38789749145508,-0.0012445449829101562)
    Rounded map coverage is (55.986328125,-8.0419921875) to (60.46875,0.0439453125)
    Splitting nodes into areas containing a maximum of 1,600,000 nodes each...
    1 areas:
    Area 63240201 covers (0x27d000,0xfffa4800) to (0x2b0000,0x800)
    Writing out split osm files Fri Oct 16 12:25:49 BST 2009
    Processing 1 areas in a single pass
    Starting pass 1 of 1, processing 1 areas (63240201 to 63240201)
    Processing Scotland_contours.osm
    Writing ways Fri Oct 16 12:25:49 BST 2009
    2,500,000 nodes processed...
    5,000,000 nodes processed...
    7,500,000 nodes processed...
    Wrote 8,591,927 nodes, 113,945 ways, 0 relations
    Time finished: Fri Oct 16 12:27:47 BST 2009
    Total time taken: 118s

If I use the --cache option, it works properly. Here's the output this time:

    java -Xmx1500m -ea -jar "../../splitter/splitter-r97/splitter.jar" --cache=cache --mapid=63240201 Scotland_contours.osm
    cache=cache
    description=
    geonames-file=
    legacy-mode=false
    mapid=63240201
    max-areas=255
    max-nodes=1600000
    mixed=false
    overlap=2000
    resolution=13
    split-file=
    write-kml=
    Time started: Fri Oct 16 12:00:00 BST 2009
    Checking for an existing cache and verifying contents...
    No suitable cache was found. A new cache will be created to speed up the splitting stage
    Map is being split for resolution 13:
     - area boundaries are aligned to 0x800 map units
     - areas are multiples of 0x1000 map units wide and high
    Processing Scotland_contours.osm
    2,500,000 nodes processed...
    5,000,000 nodes processed...
    7,500,000 nodes processed...
    A total of 8,591,927 nodes, 113,945 ways and 0 relations were processed in 1 file
    Min node ID = 1000000000
    Max node ID = 1008591926
    Time: Fri Oct 16 12:01:30 BST 2009
    Exact map coverage is (56.000404357910156,-7.9987335205078125) to (60.38789749145508,-0.0012445449829101562)
    Rounded map coverage is (55.986328125,-8.0419921875) to (60.46875,0.0439453125)
    Splitting nodes into areas containing a maximum of 1,600,000 nodes each...
    7 areas:
    Area 63240201 covers (0x27d000,0xfffa8800) to (0x28a000,0xfffc3800)
    Area 63240202 covers (0x27d000,0xfffc3800) to (0x28a000,0xfffcc800)
    Area 63240203 covers (0x28a000,0xfffa8800) to (0x290000,0xfffcc800)
    Area 63240204 covers (0x290000,0xfffa4800) to (0x2ab000,0xfffcc800)
    Area 63240205 covers (0x27d000,0xfffcc800) to (0x28e000,0xfffd6800)
    Area 63240206 covers (0x27d000,0xfffd6800) to (0x28e000,0xffff5800)
    Area 63240207 covers (0x28e000,0xfffcc800) to (0x2b0000,0x800)
    Writing out split osm files Fri Oct 16 12:01:30 BST 2009
    Processing 7 areas in a single pass
    Starting pass 1 of 1, processing 7 areas (63240201 to 63240207)
    Loading and processing nodes
    2,500,000 nodes processed...
    5,000,000 nodes processed...
    7,500,000 nodes processed...
    Loading and processing ways
    Writing ways Fri Oct 16 12:02:54 BST 2009
    Loading and processing relations
    Wrote 8,591,927 nodes, 113,945 ways, 0 relations
    Time finished: Fri Oct 16 12:03:14 BST 2009
    Total time taken: 193s

--
Charlie

nope, you simply have not read instructions. If you include contourlines you have to use --mixed parameter.

Felix Hartmann wrote:
> nope, you simply have not read instructions. If you include contourlines you have to use --mixed parameter.

I'm always eager to read instructions, but there's no mention of --mixed on the splitter instruction page at http://www.mkgmap.org.uk/page/tile-splitter

Anyway...

I've also compiled other contour maps fine without using --mixed (though in those cases I was using my own areas.list).

The --help output for splitter, on the subject of --mixed, says: "Specify this if the input osm file has nodes, ways and relations intermingled". Should it actually say "Specify this if the input osm file has nodes, ways and relations intermingled and you are not providing your own areas definitions through the --split-file option"?

Charlie

I've been away for a few weeks and won't have much time to work on the splitter for a bit longer yet but I've made a note to update the docs on the website to bring it in line with the latest splitter version.

As for your original query, there's a few different scenarios so let me try to explain each of them:

First off, when --cache isn't specified, the splitter scans through the osm file in an initial pass to find all the nodes so it can figure out how they are distributed, and hence generate the areas.list file. Because parsing the XML is quite slow, as an optimisation the splitter by default stops looking for more nodes the moment it discovers the first <way> or <relation> tag because it assumes the <node> tags all appear at the top of the file. If you happen to have an osm file where the tags are mixed in with each other (as is the case with your contour file) you have to specify --mixed option to force the splitter to keep scanning the XML right to the very end so all the nodes are found on this first pass. Without that, the splitter will only find a few nodes and end up generating a bad areas.list file, thus messing up the rest of the split.

When the --cache parameter is used, the splitter behaves quite differently. It first makes a pass through the *entire* osm file, building up a cache of all nodes/ways/rels (while still gathering the node information it needs to generate areas.list). Because the entire file is processed during the cache generation, it doesn't matter if the XML contains mixed nodes/ways/rels or not. So with --cache enabled, the --mixed parameter becomes redundant.

Similarly (and as you point out), if you use --split-file then --mixed becomes redundant too because the first pass isn't required.

Hopefully that explains the behaviour you're experiencing. I'll try to get the docs (and code) updated to make it a bit clearer what's going on.

Chris
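
[Editor's illustration] The first-pass behaviour Chris describes can be sketched in a few lines of Python. This is not splitter's actual Java code: the function name count_nodes_first_pass, its mixed argument, and the tiny sample document are all made up for illustration, and the sketch only assumes the behaviour stated above (stop counting at the first <way> or <relation> unless --mixed is given).

```python
import io
import xml.etree.ElementTree as ET


def count_nodes_first_pass(osm_source, mixed=False):
    """Count <node> elements the way the first pass is described above.

    Hypothetical helper for illustration only, not splitter's real code.
    """
    count = 0
    # "start" events fire in document order as each opening tag is read.
    for _event, elem in ET.iterparse(osm_source, events=("start",)):
        if elem.tag == "node":
            count += 1
        elif elem.tag in ("way", "relation") and not mixed:
            # Assumed optimisation: nodes are expected to come first,
            # so stop scanning as soon as a way or relation appears.
            break
    return count


# Made-up sample in which nodes and ways are intermingled,
# as in the contour file discussed in this thread.
SAMPLE = b"""<osm>
  <node id="1" lat="56.0" lon="-7.9"/>
  <node id="2" lat="56.1" lon="-7.8"/>
  <way id="10"><nd ref="1"/><nd ref="2"/></way>
  <node id="3" lat="56.2" lon="-7.7"/>
  <node id="4" lat="56.3" lon="-7.6"/>
  <way id="11"><nd ref="3"/><nd ref="4"/></way>
</osm>"""

print(count_nodes_first_pass(io.BytesIO(SAMPLE)))              # without --mixed
print(count_nodes_first_pass(io.BytesIO(SAMPLE), mixed=True))  # with --mixed
```

On the sample, the default scan sees only the nodes before the first way, which mirrors the 21,813 versus 8,591,927 node counts in the two logs at the top of the thread.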

Chris,

Thanks for explaining it... what you've written would be a fantastic addition to the wiki even if it just went in verbatim!

I guess one suggestion would be that if you need to use --mixed on certain types of data, it would be good if splitter told you that*, rather than it having a go but not really doing anything and then not obviously failing. In other words (and I stress imho) it's preferable from a user perspective for software to fail cleanly with an explanation of why it failed, rather than for it to seemingly continue to work but not do what it's supposed to do.

* Obviously, if it could do this, it ought to be able to assume --mixed implicitly and proceed anyway outputting a warning that this is what it was doing!

Charlie
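
[Editor's illustration] One way the check suggested above might look, as a rough Python sketch rather than anything splitter actually does. The names has_node_after_way and effective_mixed, the warning text, and the two sample documents are hypothetical. The idea is simply to notice a <node> that appears after a <way> or <relation>, then either warn or switch to mixed handling.

```python
import io
import sys
import xml.etree.ElementTree as ET


def has_node_after_way(osm_source):
    """Return True if a <node> starts after a <way> or <relation> was seen."""
    seen_way_or_relation = False
    for _event, elem in ET.iterparse(osm_source, events=("start",)):
        if elem.tag in ("way", "relation"):
            seen_way_or_relation = True
        elif elem.tag == "node" and seen_way_or_relation:
            return True  # first out-of-order node is enough
    return False


def effective_mixed(osm_source, mixed_requested):
    """Decide whether to treat the input as mixed, warning if it was implied."""
    if mixed_requested:
        return True
    if has_node_after_way(osm_source):
        # Hypothetical warning text, not a real splitter message.
        print("warning: nodes found after ways; assuming --mixed",
              file=sys.stderr)
        return True
    return False


# Made-up samples: one with nodes first, one with nodes and ways intermingled.
SORTED = b"""<osm>
  <node id="1" lat="56.0" lon="-7.9"/>
  <node id="2" lat="56.1" lon="-7.8"/>
  <way id="10"><nd ref="1"/><nd ref="2"/></way>
</osm>"""

MIXED = b"""<osm>
  <node id="1" lat="56.0" lon="-7.9"/>
  <way id="10"><nd ref="1"/></way>
  <node id="2" lat="56.1" lon="-7.8"/>
</osm>"""

print(effective_mixed(io.BytesIO(SORTED), mixed_requested=False))
print(effective_mixed(io.BytesIO(MIXED), mixed_requested=False))
```

The catch is cost: for a well-ordered file this check has to read to the end before it can say "not mixed", which gives up the very optimisation Chris describes. A real implementation would presumably need to bound the check or fold it into a pass that is already being made.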