Bug report for new splitter

Hello!

I have just tested the latest SVN version of splitter (r64) and I get the following exception:

12,000,000 ways processed...
[GC 926169K->330369K(1422720K), 0.0038790 secs]
Writing relations Sun Aug 09 11:11:14 CEST 2009
[GC 837769K->592513K(1423296K), 0.1517410 secs]
Exception in thread "main" java.lang.IndexOutOfBoundsException: bitIndex < 0: -2147483627
    at java.util.BitSet.set(BitSet.java:262)
    at uk.me.parabola.splitter.SplitParser.processRelation(SplitParser.java:189)
    at uk.me.parabola.splitter.SplitParser.startElement(SplitParser.java:96)
    at uk.me.parabola.splitter.AbstractXppParser.parse(AbstractXppParser.java:38)
    at uk.me.parabola.splitter.Main.writeAreas(Main.java:231)
    at uk.me.parabola.splitter.Main.split(Main.java:98)
    at uk.me.parabola.splitter.Main.main(Main.java:78)

The splitter was called as follows:

java -Xmx3800m -verbose:gc -jar ../splitter.jar --max-areas=100 --max-nodes=300000 ../../europe.osm

with today's (uncompressed) Europe extract from Geofabrik.

I have tried to use

java -Xmx3800m -verbose:gc -jar ../splitter.jar --max-areas=100 --max-nodes=300000 ../../europe.osm

but after "Writing relations" the CPU was at 100% for an hour without reporting anything on the console or writing any data to disk.

I have been waiting a few months to update the maps on my eTrex... I already thought it would finally be possible again... Thanks a lot for fixing splitter. It seems to be really close! I'll try to report any problems I encounter with new versions ASAP.

Paul
-- 
Don't take life too seriously; you will never get out of it alive. -- Elbert Hubbard

On Aug 9, 2009, at 11:24, Paul Ortyl wrote:
I have just tested the latest svn version of splitter (r64) and I get the following exception:
12,000,000 ways processed... [GC 926169K->330369K(1422720K), 0.0038790 secs] Writing relations Sun Aug 09 11:11:14 CEST 2009 [GC 837769K->592513K(1423296K), 0.1517410 secs] Exception in thread "main" java.lang.IndexOutOfBoundsException
Hm... I reported a similar exception using splitter with the Geofabrik Europe extract. In my case, the exception occurred in the bz2 code, as I was working with the compressed file. I was at least able to confirm that the bz2 file was corrupt; you seem to have successfully decompressed the file though. I still wonder if the problem is with Geofabrik, and not with splitter? Cheers.

I still wonder if the problem is with Geofabrik, and not with splitter?
I haven't started digging into this bug yet but superficially at least it does appear to be a problem with some of the new code in the splitter. Possibly the problem is triggered by malformed XML, but more likely I think is that I've just overlooked a corner-case somewhere :)

Hi Paul,

Thanks for the detailed bug report, and sorry for the inconvenience this one has caused you. It looks like there's a bug in the new code for handling ways that span more than 4 areas. I'll take a look shortly and hopefully have a fix later today.

As an aside, I see you have set --max-areas=100. Given that you have a 4GB heap and --max-nodes at just 300000, you should easily be able to get away with the default of --max-areas=255. From my (limited) experiments so far, with a 4GB heap you shouldn't need to worry about reducing max-areas until your max-nodes gets up somewhere near 1,000,000. Setting it to a lower value will just make the split run slower.

Chris

Have a try with splitter r65; I've just checked in a fix that should solve your problem. I reran it against today's europe.osm and it completed successfully:

..
12,000,000 ways processed...
Writing relations Sun Aug 09 14:19:27 BST 2009
50,000 relations processed...
100,000 relations processed...
Wrote 118,698,421 nodes, 12,124,642 ways, 110,327 relations
Time finished: Sun Aug 09 14:19:34 BST 2009
Total time taken: 1836s

(I forgot to set --max-nodes=300000 to make it a fair comparison with your run; however, I'm rerunning it with 300000 right now just to be absolutely sure the problem is fixed.)

Chris

Hmm, running with max-nodes=300000 on r65 now makes it further, but fails with another (related) problem. I'm going to try and find the problematic way(s)/relation(s) so I can make a test case. Please hold off on big splits with r65 for now.

2009/8/9 Chris Miller <chris.miller@kbcfp.com>:
Hmm, running with max-nodes=300000 on r65 now makes it further, but fails with another (related) problem. I'm going to try and find the problematic way(s)/relation(s) so I can make a test case. Please hold off on big splits with r65 for now.
Thanks! I'll wait for the green light from you then :)

BTW: a run with r64 using 'java -Xmx3800m -verbose:gc -jar ../splitter.jar --max-areas=100 --max-nodes=600000 ../../europe.osm' hit the same exception.

Paul

OK, I've isolated the secondary problem and checked in a fix. It was a subtle bug: I was using a 1 (int) instead of a 1L (long) during some bit shifting, which overflowed in certain situations. Sorry for any trouble this bug may have caused; hopefully it's all working now as I originally intended.

One thing I noticed while testing this fix is that setting max-nodes=300000 for europe.osm triggers a lot of "Node in too many areas" warnings, because a fair number of nodes end up in 5+ areas. I can partially or even completely fix this as long as the number of affected nodes doesn't get too high (otherwise memory usage will go through the roof).

"Partially fix" = nodes get written to all 5+ areas correctly, but ways/rels only see 4 of the node's areas. No memory impact, very minor performance impact. Could print warnings about the affected nodes.

"Completely fix" = nodes/ways/rels see all areas a node belongs to, even if it's 5+. Slightly higher but still small performance impact; the memory impact depends on the number of nodes in more than 4 areas. If there are a lot of these nodes (say 10^5), memory starts taking quite a big hit (100s of MB).

What are people's thoughts on this? Perhaps we'd be better off just recommending increasing --max-nodes until the problem goes away? Or I could cap the number of nodes that are allowed to be in 5+ areas and then start printing warnings? All comments appreciated; I don't have a lot of experience with how this might benefit or adversely affect mkgmap.

Chris
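Chris doesn't post the offending line, so the following is only a generic Java sketch of the class of bug he describes (an `int` literal `1` shifted into a 64-bit mask), using hypothetical helpers `setBitBuggy` and `setBitFixed`. Java takes an `int` shift distance modulo 32, and an `int` result with bit 31 set is negative, so widening it to `long` sign-extends into the upper 32 bits.

```java
public class ShiftOverflowDemo {

    // Hypothetical helper: sets bit `bit` in a 64-bit mask using an int literal (the bug).
    // For an int, the shift distance is taken mod 32, and a result with bit 31 set is
    // negative, so widening it to long sign-extends into bits 32..63.
    static long setBitBuggy(long mask, int bit) {
        return mask | (1 << bit);
    }

    // Same operation with a long literal (the fix): the shift is done in 64-bit
    // arithmetic and the distance is taken mod 64.
    static long setBitFixed(long mask, int bit) {
        return mask | (1L << bit);
    }

    public static void main(String[] args) {
        for (int bit : new int[] {5, 31, 40}) {
            System.out.printf("bit %2d: buggy=%d, fixed=%d%n",
                    bit, setBitBuggy(0L, bit), setBitFixed(0L, bit));
        }
    }
}
```

Both versions agree for bit 5, but for bit 31 the buggy one yields -2147483648 instead of 2147483648, and for bit 40 it yields 256 (that is, bit 8) instead of 1099511627776. Errors like these only show up once indices grow past 30, which would fit a bug that appears only "in certain situations".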

Hi
One thing I noted during the testing of this fix is that setting max-nodes=300000 for europe.osm triggers a lot of "Node in too many areas" warnings, due to a fair number of nodes ending up in 5+ areas. I can partially or even completely fix this as long as the number of nodes suffering from this doesn't get too high (otherwise memory usage will go through the roof).
How does a node get to be in more than four areas? ..Steve

How does a node get to be in more than four areas?
..Steve
Good question, I was wondering that myself. It looks like with so few nodes per area we end up with some very thin areas. For example, if a node is in the centre of a thin area and close to its top, the two areas on each side plus the two adjacent areas above can all end up included in the extended bounds/overlap. Here's an example of a node that wants to be in 5 areas:

<node id="447665000" lat="46.0753181" lon="13.1930056" version="1" changeset="1930879" user="Stefano Salvador" uid="86130" visible="true" timestamp="2009-07-25T06:50:50Z"/>

And here are the areas taken from areas.list, as generated from today's europe.osm file by running the splitter with --max-nodes=300000 (and the default overlap of 2000):

63240332: 2142208,608256 to 2148352,614400 # : 45.966797,13.051758 to 46.098633,13.183594
63240333: 2142208,614400 to 2146304,620544 # : 45.966797,13.183594 to 46.054688,13.315430
63240334: 2146304,614400 to 2148352,620544 # : 46.054688,13.183594 to 46.098633,13.315430
63240335: 2148352,608256 to 2154496,614400 # : 46.098633,13.051758 to 46.230469,13.183594
63240336: 2148352,614400 to 2154496,620544 # : 46.098633,13.183594 to 46.230469,13.315430

All the examples I've seen so far hit 5 areas at most, but given the above I can imagine it's possible to get 6 or more if there's an especially densely populated area on the map.

Chris
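The five-area case can be checked by hand against the numbers above. This is a minimal Java sketch, not splitter's own code (all class and method names are hypothetical). It assumes 24-bit map units of 360/2^24 degrees, which the areas.list comments support (2142208 corresponds to 45.966797). It also assumes that the overlap of 2000 is in the same units, and that an area's extended bounds are simply its bounds grown by the overlap on all four sides.

```java
import java.util.ArrayList;
import java.util.List;

public class OverlapCheck {

    // Assumption: 24-bit map units, i.e. 2^24 units per 360 degrees.
    static final double UNITS_PER_DEGREE = (1 << 24) / 360.0;

    static int toMapUnits(double degrees) {
        return (int) Math.round(degrees * UNITS_PER_DEGREE);
    }

    // One areas.list entry: id: minLat,minLon to maxLat,maxLon (map units).
    static final class Area {
        final int id, minLat, minLon, maxLat, maxLon;

        Area(int id, int minLat, int minLon, int maxLat, int maxLon) {
            this.id = id;
            this.minLat = minLat;
            this.minLon = minLon;
            this.maxLat = maxLat;
            this.maxLon = maxLon;
        }

        // True if (lat, lon) lies inside the area grown by `overlap` on every side.
        boolean containsWithOverlap(int lat, int lon, int overlap) {
            return lat >= minLat - overlap && lat <= maxLat + overlap
                    && lon >= minLon - overlap && lon <= maxLon + overlap;
        }
    }

    // The five areas quoted from areas.list in the message above.
    static final List<Area> AREAS = List.of(
            new Area(63240332, 2142208, 608256, 2148352, 614400),
            new Area(63240333, 2142208, 614400, 2146304, 620544),
            new Area(63240334, 2146304, 614400, 2148352, 620544),
            new Area(63240335, 2148352, 608256, 2154496, 614400),
            new Area(63240336, 2148352, 614400, 2154496, 620544));

    // Ids of all areas whose extended bounds contain the point.
    static List<Integer> areasFor(int lat, int lon, List<Area> areas, int overlap) {
        List<Integer> ids = new ArrayList<>();
        for (Area a : areas) {
            if (a.containsWithOverlap(lat, lon, overlap)) {
                ids.add(a.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Node 447665000 from the message above.
        int lat = toMapUnits(46.0753181);
        int lon = toMapUnits(13.1930056);
        System.out.println("overlap 2000: " + areasFor(lat, lon, AREAS, 2000));
        System.out.println("overlap 0:    " + areasFor(lat, lon, AREAS, 0));
    }
}
```

The node converts to roughly (2147265, 614839) in map units. With an overlap of 2000 it lands in all five areas. With no overlap, only 63240334 matches: the 2048-unit-tall strip that actually contains it.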

Hi
Good question, I was wondering that myself. It looks like with so few nodes per area, we end up with some very thin areas that for example result in the two areas on each side, plus two adjacent areas above, being included in the extended bounds/overlap if a node is in the centre of the thin area and close to the top.
OK, if that is the case, I was thinking that the overlap might be better as a percentage of the size of the area. In areas where a straight road continues for miles and miles, nodes might be widely spaced, but that is unlikely in densely mapped areas. Garmin map units have different ground sizes depending on direction and latitude, but 2000 is over 4km(?) and that might be too much in any circumstances.
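To put Steve's "2000 is over 4km(?)" into numbers, here is a rough Java sketch. It assumes the overlap is counted in the same 24-bit map units as areas.list (360/2^24 degrees per unit) and a spherical Earth with a mean radius of 6371 km. The `percentageOverlap` helper is only a hypothetical rendering of his suggestion, not anything splitter implements.

```java
import java.util.Locale;

public class OverlapSize {

    // Assumptions: 2^24 map units per 360 degrees; spherical Earth of mean radius 6371 km.
    static final double DEGREES_PER_UNIT = 360.0 / (1 << 24);
    static final double KM_PER_DEGREE = 2 * Math.PI * 6371.0 / 360.0;

    // North-south extent of `units` map units (the same at any latitude on a sphere).
    static double northSouthKm(int units) {
        return units * DEGREES_PER_UNIT * KM_PER_DEGREE;
    }

    // East-west extent shrinks with the cosine of the latitude.
    static double eastWestKm(int units, double latitudeDeg) {
        return northSouthKm(units) * Math.cos(Math.toRadians(latitudeDeg));
    }

    // Hypothetical: overlap as a percentage of the tile's size in the same direction.
    static int percentageOverlap(int tileSizeUnits, int percent) {
        return tileSizeUnits * percent / 100;
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT, "2000 units north-south: %.2f km%n", northSouthKm(2000));
        System.out.printf(Locale.ROOT, "2000 units east-west at 46N: %.2f km%n", eastWestKm(2000, 46.0));
        // A 2048-unit-tall tile (like 63240334 in areas.list) with a 10% overlap:
        System.out.println("10% of 2048 units: " + percentageOverlap(2048, 10) + " units");
    }
}
```

Under these assumptions, 2000 units is about 4.77 km north-south but only about 3.31 km east-west at 46°N, which illustrates the direction and latitude dependence. A 10% rule would give the thin 2048-unit tile an overlap of just 204 units.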
Here's an example of a node that wants to be in 5 areas:
<node id="447665000" lat="46.0753181" lon="13.1930056" version="1"
That is an interesting area, as it appears that buildings are mapped so there is a much higher node density than normal, but only the main roads are there. ..Steve

Hey Steve
OK, if that is the case I was thinking that the overlap might be better as a percentage of the size of the area. In areas where there is a straight road that continues for miles and miles nodes might be widely spaced, but that is unlikely in densely mapped areas.
Makes some sense, though I don't suppose we'll know what it affects until we try. My suspicion is that it will introduce other problems (like dropping too many overlap nodes from just outside dense areas, or including far too many overlap nodes in a sparse area if a dense area falls just outside it). And even with a percentage-based approach it is still possible (though less likely?) to see this problem. If I implement special handling for 5+ areas per node as described (perhaps with an upper limit/warning on the number of these special nodes), the problem goes away anyway. Another option might be to set the overlap in each of the 4 directions to Math.min(2000, smallestAdjacentTile<Width|Height> / 2 - delta). That should ensure a node can't fall into more than 4 areas.
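Chris's alternative, Math.min(2000, smallestAdjacentTile<Width|Height> / 2 - delta), could look roughly like this in Java. This is a hypothetical sketch: the class and method names, the choice of delta = 1 and the clamp at zero are assumptions, and finding the smallest adjacent tile in each direction is left to the caller.

```java
public class CappedOverlap {

    static final int DEFAULT_OVERLAP = 2000; // map units

    // Overlap to use towards one neighbour direction: never more than the default,
    // and always less than half the smallest tile adjacent in that direction
    // (pass its height for north/south, its width for east/west).
    static int cappedOverlap(int smallestAdjacentSize, int delta) {
        int limit = smallestAdjacentSize / 2 - delta;
        return Math.max(0, Math.min(DEFAULT_OVERLAP, limit));
    }

    public static void main(String[] args) {
        // Tile heights from the areas.list example: 6144, 4096 and the thin 2048 strip.
        for (int size : new int[] {6144, 4096, 2048}) {
            System.out.println(size + " -> " + cappedOverlap(size, 1));
        }
    }
}
```

In the areas.list example, only the thin 2048-unit strip (63240334) pulls the overlap below 2000, to 1023 with delta = 1. The 6144- and 4096-unit neighbours keep the full 2000.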
The Garmin units have different sizes depending on direction and latitude, but 2000 is over 4km(?) and that might be too much in any circumstances.
Given that we only run into trouble when a relatively low number is given for --max-nodes, maybe it's best to leave it at 2000 for now. Is there a reason why people would want to use such a low number? Smaller tiles, so there's more flexibility about exactly what they upload to their Garmin, perhaps?
Here's an example of a node that wants to be in 5 areas:
<node id="447665000" lat="46.0753181" lon="13.1930056" version="1"
That is an interesting area, as it appears that buildings are mapped so there is a much higher node density than normal, but only the main roads are there.
Yeah, I saw that (and compared the area with what Google Maps has). I suppose someone traced the buildings from some other source but not the streets, for whatever reason. Chris

Hi

On 10/08/09 10:52, Chris Miller wrote:
Given that it's only when a relatively low number is given for --max-nodes that we run into any trouble, maybe it's best to leave at 2000 for now. Is there a reason why people would want to use such a low number? Smaller tiles so more flexibility about exactly what they upload to their Garmin perhaps?
Well, I'm just saying that there was no particular thought behind choosing 2000 in the first place. For really old devices with limited memory you may only be able to load a few tiles at a time, and then it would be more flexible to have smaller tiles. But this really dates from before OSM was started; I've never had anyone complain that tiles were too big, in fact quite the opposite. So ideally you want to be able to make the tiles as large as possible. However, if you only look at the number of nodes, you find that you have to set it quite low to cope with one particular area when a higher value would have been fine everywhere else. ..Steve

Well I'm just saying that there was no particular thought in choosing 2000 in the first place.
I see. Obviously 2000 can't be working too badly since otherwise I'd assume more people would have complained by now :)
For really old devices that have limited memory you may only be able to load a few tiles at a time and then it would be more flexible to have smaller tiles. But this is really from before OSM was started, I've never had anyone complain that tiles were too big, in fact quite the opposite.
So ideally you want to be able to make the tiles as large as possible. However if you only look at the number of nodes then you find that you have to have it quite low to cope with one particular area when a higher value would have been fine everywhere else. ..Steve
Thanks for the explanation. Do you know of anywhere I can read more about the known limits on tile sizes/content/quantities? I've seen various comments about a 2025 map segment limit (is a map segment the same as a map tile?) and a 2048MB limit (due to 32-bit indexing in the file format and/or file system?) on forums like these:

http://garminoregon.wikispaces.com/message/view/home/10590340
http://forums.groundspeak.com/GC/index.php?showtopic=170615&st=54#

But it's still not completely clear to me what's going on. In particular, what determines the maximum tile size? You're saying it's not the number of nodes, but perhaps the number of ways? Or is it even more complex than that, i.e. a function of the node/way/relation count combined with the number of nodes per way and/or the complexity of the ways? If the exact criteria were known, then maybe we could come up with a better approach to choosing the area boundaries to split on.

Thanks,
Chris

Chris Miller <chris.miller@kbcfp.com> writes:
For really old devices that have limited memory you may only be able to load a few tiles at a time and then it would be more flexible to have smaller tiles. But this is really from before OSM was started, I've never had anyone complain that tiles were too big, in fact quite the opposite.
So ideally you want to be able to make the tiles as large as possible. However if you only look at the number of nodes then you find that you have to have it quite low to cope with one particular area when a higher value would have been fine everywhere else. ..Steve
Thanks for the explanation. Is there anywhere you know of where I can read more about what the known limits on tile sizes/content/quantities are? I've seen various comments about a 2025 map segment limit (is a map segment the same as a map tile?), a 2048MB limit (due to 32 bit indexing in the file format and/or file system?) on forums like these:
http://garminoregon.wikispaces.com/message/view/home/10590340 http://forums.groundspeak.com/GC/index.php?showtopic=170615&st=54#
But it's still not completely clear to me what's going on. In particular, what determines the maximum tile size? You're saying it's not the number of nodes, but perhaps the number of ways? Or is it even more complex than that, ie a factor of the node/way/relation count combined with the number of nodes per way and/or complexity of the ways?
If the exact criteria was known then maybe we can come up with a better approach to choosing the area boundaries to split on.
I suspect there are a bunch of limits in the img format, and maybe in the Garmin firmware that parses it, and if you exceed any of them there is a problem. Certainly the ones above exist, but I wouldn't be surprised if there are more. For receivers with a 2GB uSD, I think one wants tiles pretty big. I have 2009-vintage Garmin proprietary maps, and all of New England is in 2 tiles; the .img files are, I think, about 25 MB each. I also have a 2002 or 2003 vintage receiver and proprietary map data, and that has tiles of about 1-4 MB. This lets me choose what I want to fit in the 19 MB internal memory. There are still some devices like that around and useful, so I can see a demand for ~3 MB tiles. But for the 2GB types, tiles that are more like 25 MB seem better. There may also be an effect where smaller tiles make the receiver draw maps faster, but I think the internal TRE scheme means that isn't true.

2009/8/10 Greg Troxel <gdt@ir.bbn.com>:
For receivers with a 2GB uSD, I think one wants tiles pretty big. I have 2009 vintage Garmin proprietary maps, and all of New England is in 2 tiles, and the .img I think are about 25 MB each. I also have a 2002 or 2003 vintage receiver and proprietary map data, and that has tiles that are about 1-4MB. This lets me choose what I want to fit in the 19 MB internal memory. There are still some devices like that around and useful, so I can see a demand for ~3 MB tiles. But, for the 2GB types, tiles that are more like 25 MB seem better.
Greg, is that map with the larger tiles in NT format? I've noticed that these tend to be bigger, and indeed, the devices that support this format are also newer and more powerful.

Dermot
-- 
--------------------------------------
Iren sind menschlich

Dermot McNally <dermotm@gmail.com> writes:
2009/8/10 Greg Troxel <gdt@ir.bbn.com>:
For receivers with a 2GB uSD, I think one wants tiles pretty big. I have 2009 vintage Garmin proprietary maps, and all of New England is in 2 tiles, and the .img I think are about 25 MB each. I also have a 2002 or 2003 vintage receiver and proprietary map data, and that has tiles that are about 1-4MB. This lets me choose what I want to fit in the 19 MB internal memory. There are still some devices like that around and useful, so I can see a demand for ~3 MB tiles. But, for the 2GB types, tiles that are more like 25 MB seem better.
Greg, is that map with the larger tiles in NT format? I've noticed that these tend to be bigger, and indeed, the devices that support this format are also newer and more powerful.
Yes, the 2003 one is the old format (I think), and the larger tiles are definitely NT. But you can get an eTrex without the uSD (why you would, I don't know, but you can), and those have, I think, 24MB of internal memory, which is not all that different from 19MB. I think one of the NT tiles would fit in that, which is great unless you are on the border between two tiles. I don't know if NT is more or less space efficient; I can't imagine it's all that different.

Greg Troxel wrote:
Dermot McNally <dermotm@gmail.com> writes:
2009/8/10 Greg Troxel <gdt@ir.bbn.com>:
For receivers with a 2GB uSD, I think one wants tiles pretty big. I have 2009 vintage Garmin proprietary maps, and all of New England is in 2 tiles, and the .img I think are about 25 MB each. I also have a 2002 or 2003 vintage receiver and proprietary map data, and that has tiles that are about 1-4MB. This lets me choose what I want to fit in the 19 MB internal memory. There are still some devices like that around and useful, so I can see a demand for ~3 MB tiles. But, for the 2GB types, tiles that are more like 25 MB seem better.
Greg, is that map with the larger tiles in NT format? I've noticed that these tend to be bigger, and indeed, the devices that support this format are also newer and more powerful.
Yes, the 2003 is the old format (I think), and the larger tiles definitely NT. But, you can get an etrex without the uSD (why you would, I don't know, but you can) and those have I think 24MB of internal memory, which is not all that different from 19MB. I think the NT tiles will fit one of them in that, which is great unless you are on the border between two tiles.
I don't know if NT is more or less space efficient. I can't imagine it's all that different.
NT saves about 30% (compare the sizes of City Navigator Classic and City Navigator). However, it needs much more processing power on the GPS to display. On eTrex or 60CSx units you should always use non-NT maps. However, there are many pseudo-NT maps from Garmin around (you can get the same effect with gmaptool's "create pseudo NT map"). I don't know of any advantage of NT other than size, so I don't think it should be a high priority to decipher NT maps so that mkgmap can write NT .img maps. Currently, depending on the number of POIs, you can create maps of up to around 30MB with mkgmap. Also, most 3rd-party .img viewers can't show NT maps.

By now nearly all Garmin units running the newest firmware can handle a 4GB gmapsupp.img. With the Oregon/Colorado/Nuvi and all other Garmin GPS units built on the Nuvi platform, the maps don't need to be called gmapsupp.img; gmapsup1.img or even random names will work too. It therefore becomes pointless to be able to write mapsets bigger than 2GB (City Navigator Classic Europe and the newest CN are over 2GB; I don't know of any other single map that large).

One great use for a huge tile would be to flash the unit's basemap with an OSM basemap. But the internal memory for the basemap on old units is pretty small, and on Nuvi-platform GPS units a basemap does not offer much advantage...
------------------------------------------------------------------------
_______________________________________________ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

Felix Hartmann wrote:
It therefore becomes pointless to be able to write mapsets bigger than 2GB (City Navigator Classic Europe and newest CN are over 2GB, I don't know any other single map to be so large).

An OpenStreetMap Garmin map is much larger: the 231 tiles covering the whole world, based on the planet file from July 23rd, come to 3.45 GB.
participants (8)
- Chris Miller
- Clinton Gladstone
- Dermot McNally
- Felix Hartmann
- Greg Troxel
- Lambertus
- Paul Ortyl
- Steve Ratcliffe