
At the moment I try to understand how the splitter works to make the mulitpolygon code splitter compatible. I have taken the OSM dump of austria from Jan 8th and run splitter on it (with the default settings). Splitter generated 6 tiles. So far so good. To get a visual feeling I wrote a simple gpx dump at the entry point of Osm5XmlHandler.endDocument(). That makes it easy to visualize the data (e.g. with JOSM). Attached you find the files: bbox.gpx: this is the bounding box of one tile 39612875.gpx, 45656928.gpx and 456456929.gpx: three ways taken from the tile. *.png : visualization of the ways including the bbox. The splitter homepage (http://www.mkgmap.org.uk/page/tile-splitter) tells me "Tiles join exactly with no overlap or gaps." Now my questions: * Why does the tile contain points of way 45656928 and 45656929? They don't intersect the bounding box and they are contained in the adjacent tiles. For me this does not comply to the splitter homepage statement. * Why does 39612875 contain some points within the bbox and some out of the bbox? I think because the points are exported for the two other ways? * But why are the points of way 39612875 (point 474857138 and 474857142) that lie next to but in the bounding box are not contained in the tile? The described behaviour of the tile splitter makes it complicated to implement a reasonable behaviour of the mulitpolygon code on the borders of the tiles. WanMil

W> At the moment I try to understand how the splitter works to make the W> mulitpolygon code splitter compatible. Hi WanMil, The statement "tiles join exactly with no overlap or gaps" is referring to the bounding boxes of the resultant tiles, not the nodes/ways/rels that end up in each tile. There is some deliberate overlap (see the --overlap parameter) that causes nodes/ways/rels to be retained even if they fall just outside a tile boundary, so that mkgmap can reliably cut ways exactly on the boundaries. It's quite possible that a way contains nodes that only fall into the overlap area and not the bounding box itself - due to memory & performance concerns there's currently no effort made in the splitter to remove such ways & nodes. The basic algorithm is this: 1) determine where the tile boundaries lie, based on the distribution of nodes in the osm file. None of these tiles will overlap. 2) for each node, determine which tile (or tiles) it belongs to. A node is considered to belong to a tile if it falls anywhere within the tile itself OR the overlap area. It's quite common for a single node to be assigned to four tiles - one real tile plus three overlap areas (or to even more tiles, if a tile is thinner than double the overlap). 3) for each way, lookup its nodes and find out what tile(s) they were assigned to. Assign the way to those tile(s) too. 4) for each relation, find out what tile(s) its nodes and ways were assigned to. Assign the relation to those tile(s) too. I realise this isn't perfect - nodes and ways that fall completely outside the tile are included when they don't need to be, and nodes that fall outside the overlap are excluded when we may not want them to be. There are also problems with ways that cross a tile without actually having any nodes contained within the tile (even including the overlap). I'm not sure how much of a problem these are in practice. Any suggestions you have for improvements would be welcome. Chris

I realise this isn't perfect - nodes and ways that fall completely outside the tile are included when they don't need to be, and nodes that fall outside the overlap are excluded when we may not want them to be. There are also problems with ways that cross a tile without actually having any nodes contained within the tile (even including the overlap). I'm not sure how much of a problem these are in practice.
Any suggestions you have for improvements would be welcome.
Chris
Chris, ok, I understand. For me it's no problem if points that fall outside the bounding box are included in the tile. They can be easily filtered. I have the following requirements: 1st and most important: All nodes that fall inside the bounding box should reliably be contained in the tile. In my previous mail I mentioned two nodes where this wasn't the case. 2nd: Ways should be included if they intersect the bounding box. For me it would be perfect if ways are included with all nodes. But it is also ok if all inner nodes and for all parts of the way the first node out of the bounding box is included. This enables me (and others) to calculate the intersection point with the bounding box. 3rd: The relation handling is ok. Include them if one or more nodes/ways are contained in the tile. WanMil

Hi WanMil, W> ok, I understand. W> For me it's no problem if points that fall outside the bounding box W> are included in the tile. They can be easily filtered. W> I have the following requirements: W> 1st and most important: All nodes that fall inside the bounding box W> should reliably be contained in the tile. In my previous mail I W> mentioned two nodes where this wasn't the case. Ah, I hadn't realised this from your original description, apologies for missing that. Indeed, this should not be happening. Would it be possible for you to make the OSM file you're using available, or send me a link to it? It's not obvious to me why nodes are being excluded so having the same OSM file as you will make debugging much easier. W> 2nd: Ways should be included if they intersect the bounding box. W> For me it would be perfect if ways are included with all nodes. But I've investigated doing this in the past however it's very tricky to do correctly without having a huge impact on performance and/or memory requirements. I do have one idea that might work without too much slowdown. If I was to build up a big disk index of all nodes (there is already a disk cache of node details, but it's not currently indexed), I could then lookup all nodes that weren't found in the current tile and write them out too. It's a reasonable amount of work to build the b-tree index efficiently but it's sort-of on the todo list already anyway since I'm aware of other uses for something like this too. W> it W> is also ok if all inner nodes and for all parts of the way the first W> node out of the bounding box is included. This enables me (and W> others) W> to calculate the intersection point with the bounding box. Sure, as I understand it that's why the overlap handling exists in the first place. As it currently stands it's not robust though, there are various corner cases where it gets things wrong (eg if the first node of a way that falls outside a tile also falls outside the overlap area then the way will appear to stop before the tile edge). I can think of a few ways to improve the splitter to help with this, will try to take a look once I find a bit of spare time. Cheers, Chris

W> ok, I understand. W> For me it's no problem if points that fall outside the bounding box W> are included in the tile. They can be easily filtered. W> I have the following requirements: W> 1st and most important: All nodes that fall inside the bounding box W> should reliably be contained in the tile. In my previous mail I W> mentioned two nodes where this wasn't the case.
Ah, I hadn't realised this from your original description, apologies for missing that. Indeed, this should not be happening. Would it be possible for you to make the OSM file you're using available, or send me a link to it? It's not obvious to me why nodes are being excluded so having the same OSM file as you will make debugging much easier.
Aah, I don't have any server where I can put the files. I have retried the splitting. http://download.geofabrik.de/osm/europe/austria.osm.bz2 from today (Jan 13th) and that gave the same results. This is my splitter call (v103):
java -Xmx950m -jar splitter.jar data/*.osm.bz2
These are the generated areas: # List of areas # Generated Wed Jan 13 19:44:27 CET 2010 # 63240001: 2174976,442368 to 2228224,581632 # : 46.669922,9.492188 to 47.812500,12.480469 63240002: 2164736,581632 to 2213888,659456 # : 46.450195,12.480469 to 47.504883,14.150391 63240003: 2213888,581632 to 2275328,659456 # : 47.504883,12.480469 to 48.823242,14.150391 63240004: 2160640,659456 to 2226176,798720 # : 46.362305,14.150391 to 47.768555,17.138672 63240005: 2226176,659456 to 2287616,724992 # : 47.768555,14.150391 to 49.086914,15.556641 63240006: 2226176,724992 to 2283520,802816 # : 47.768555,15.556641 to 48.999023,17.226563 The attached way 39612875 is still missing two nodes 474857138 and 474857142. I hope you can reproduce this. Otherwise I have to look for a server where I can upload the files (Does anyone have a temporary FTP-account for ~200MB? ;-)
W> it W> is also ok if all inner nodes and for all parts of the way the first W> node out of the bounding box is included. This enables me (and W> others) W> to calculate the intersection point with the bounding box.
Sure, as I understand it that's why the overlap handling exists in the first place. As it currently stands it's not robust though, there are various corner cases where it gets things wrong (eg if the first node of a way that falls outside a tile also falls outside the overlap area then the way will appear to stop before the tile edge).
I can think of a few ways to improve the splitter to help with this, will try to take a look once I find a bit of spare time.
I think it would improve the splitter files if the overlap is removed and the ways are bounding box complete plus first nodes outside the bounding box. Anyhow I was impressed how fast the splitter worked on my lame old computer. That's good work! WanMil

W> Aah, I don't have any server where I can put the files. I have W> retried the splitting. W> http://download.geofabrik.de/osm/europe/austria.osm.bz2 from today W> (Jan 13th) and that gave the same results. Thanks for the additional details. I've grabbed a copy of austria.osm.bz2 and will try to reproduce the problem when I get a chance in the next couple of days. W> I hope you can reproduce this. Otherwise I have to look for a server W> where I can upload the files (Does anyone have a temporary W> FTP-account for ~200MB? ;-) I'm sure I can; even if not I can always set up an FTP account for you to upload to directly. W> I think it would improve the splitter files if the overlap is removed W> and the ways are bounding box complete plus first nodes outside the W> bounding box. Agreed however it's difficult to do that robustly without big performance penalties. The overlap is a bit of a hack that captures most of those first nodes (and even ways that cross the tile without actually having a node within the tile) without having much impact on the split. W> Anyhow I was impressed how fast the splitter worked on my lame old W> computer. That's good work! Thanks. I still have a few more ideas for improving and speeding it up further, it's just a matter of finding more time to implement them.

Hi WanMil, W> The attached way 39612875 is still missing two nodes 474857138 and W> 474857142. I've just taken a look at this. Those two nodes don't appear in the austria.osm file at all, so it's not too surprising they're not in the splitter output either :) Regards, Chris

Hi WanMil,
W> The attached way 39612875 is still missing two nodes 474857138 and W> 474857142.
I've just taken a look at this. Those two nodes don't appear in the austria.osm file at all, so it's not too surprising they're not in the splitter output either :)
Regards, Chris
Oh, so that's an OSMOSIS problem or the austria shape used by OSMOSIS has some shortcuts. Thanks a lot for your investigation! WanMil

I've investigated doing this in the past however it's very tricky to do correctly without having a huge impact on performance and/or memory requirements. I do have one idea that might work without too much slowdown. If I was to build up a big disk index of all nodes (there is already a disk cache of node details, but it's not currently indexed), I could then lookup all nodes that weren't found in the current tile and write them out too. It's a reasonable amount of work to build the b-tree index efficiently but it's sort-of on the todo list already anyway since I'm aware of other uses for something like this too.
You know that there are free implementations for b-trees? Look e.g. at http://bplusdotnet.sourceforge.net/ which offers you a java class, which includes storage to a file stream. Regards, Johann

2nd: Ways should be included if they intersect the bounding box. For me it would be perfect if ways are included with all nodes. But it is also ok if all inner nodes and for all parts of the way the first node out of the bounding box is included. This enables me (and others) to calculate the intersection point with the bounding box.
You don't have to reinvent the wheel. This code is already in the LineClipper and AreaClipper/PolygonClipper classes. As far as I know, it is called at import of osm files. Looking at it, I wonder why there are problems. The poylgonClipper does an java.awt.geom.Area.intersect(bounding box). I would expect it to return an closed polygon. So why are there incoomplete polygons?? Regards, Johann

2nd: Ways should be included if they intersect the bounding box. For me it would be perfect if ways are included with all nodes. But it is also ok if all inner nodes and for all parts of the way the first node out of the bounding box is included. This enables me (and others) to calculate the intersection point with the bounding box.
You don't have to reinvent the wheel.
I don't want to!
This code is already in the LineClipper and AreaClipper/PolygonClipper classes. As far as I know, it is called at import of osm files.
Good, I didn't know the AreaClipper and PolygonClipper classes. Lot's of my questions are caused by my little knowledge of the mkgmap classes and their interaction. What do you mean with "at import of osm files"? The Osm5XmlHandler does not clip things, does it?
Looking at it, I wonder why there are problems. The poylgonClipper does an java.awt.geom.Area.intersect(bounding box). I would expect it to return an closed polygon. So why are there incoomplete polygons??
The incomplete polygons are in the raw tile file. This data is parsed by the Osm5XmlHandler and afterwards processed by the MultipolygonRelation code. At this unclipped stage there are incomplete polygons. After a short review of the PolygonClipper code I think it has the same problem like the PolygonSplitter code. In case a polygon has an inner hole which is constructed with two overlapping lines connecting the outer polygon with the inner hole then the overlapping lines are removed and the inner hole is handled as independent polygon. I don't know if that's a problem. WanMil

Hi WanMil,
What do you mean with "at import of osm files"? The Osm5XmlHandler does not clip things, does it?
Osm5XmlHandler does a "soft clip" - what I mean by that is it adds nodes to ways where they intersect the tile's boundaries. This is only for lines, not areas. By adding those boundary nodes there, it can avoid generating short routing arcs when the ways are really (hard) clipped later in StyledConverter. To be honest, the boundary between Osm5XmlHandler and StyledConverter has become rather blurred over time (mostly my fault) and it really could do with refactoring. Still, it works tolerably well so it's not too high on the list of stuff to rewrite! Cheers, Mark

1) determine where the tile boundaries lie, based on the distribution of nodes in the osm file. None of these tiles will overlap. 2) for each node, determine which tile (or tiles) it belongs to. A node is considered to belong to a tile if it falls anywhere within the tile itself OR the overlap area. It's quite common for a single node to be assigned to four tiles - one real tile plus three overlap areas (or to even more tiles, if a tile is thinner than double the overlap). 3) for each way, lookup its nodes and find out what tile(s) they were assigned to. Assign the way to those tile(s) too. 4) for each relation, find out what tile(s) its nodes and ways were assigned to. Assign the relation to those tile(s) too.
I realise this isn't perfect - nodes and ways that fall completely outside the tile are included when they don't need to be, and nodes that fall outside the overlap are excluded when we may not want them to be. Couldn't you ignore the overlapping region completely and insert instead a step 3a) After lookup the nodes of a way, add them to all touched tiles too.
E.g. nodes 11,12,13 assigned to tiles 1,2,3. While processing the line you detect, that the line belongs to tile 1,2,3 and add the line to the tiles. At this time you can see, that the node 12 belongs to tiles 1,2,3 but is assigned to tile 2 only. So assign them additionally to tile 1 and 3. (I don't know if this is possible, or if there are readonly cache structures) This should not increase resources much. With this alway the complete line is included into the tiles. There will not be missing nodes.
There are also problems with ways that cross a tile without actually having any nodes contained within the tile (even including the overlap). I'm not sure how much of a problem these are in practice.
If the line crosses an corner of a tile, it will be clipped at the boundaries or the other tiles. So there will be a small gap exactly in the corner. I have never seen such a thing and think this is neglectable. (On the other hand, if this gap is a piece of a motorway, routing will became a little unexpected)
Any suggestions you have for improvements would be welcome.
Chris
_______________________________________________ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev .

JG> Couldn't you ignore the overlapping region completely and insert JG> instead a step 3a) After lookup the nodes of a way, add them to all JG> touched tiles too. Sort of... this is roughly the approach I'm considering, but it's not as straightforward as it first appears. The problem is that it's not possible to hold all the information required in the XML file in memory for each node. Currently that's not a problem; the first pass of the splitter figures out what the resultant split areas will be and the second pass scans through the source OSM file (or the disk cache), reads in each node and writes all of its info to the appropriate output file. But as soon as you need to write out additional nodes during processing of the ways, the following happens: 1) Nodes and ways get intermingled in the output OSM file. I don't think this is a problem at all since there still won't be any forward references from eg ways to nodes, but currently the splitter writes out <node/> tags first, then <way/> tags, then <rel/> tags so maybe this will break/annoy something/someone? 2) We have to somehow find out all the information about the additional node so we can write it out as XML. This is the killer. It means the cache becomes required rather than optional because it's impractical to rescan the source OSM file, and a disk-based index into the cache is required because even custom code mapping from node ID to cache index (long) takes up several GB of memory when splitting a large OSM file. There will be a lot of these lookups required, so a very fast index with minimum disk access is essential. My intention is to experiment with a b-tree variant that uses node IDs as keys (the example you posted uses Strings for both keys and values so isn't very suitable), and additionally is able to take some advantage of the fact that sequential node IDs are strongly correlated to their coordinates (and hence the tile they'll end up in). Careful prefetch caching will make a huge difference here. Another optimisation will be to cache either all the node data (or at least the IDs, coordinates, and cache index) in memory for nodes that fall into the overlap area of each tile since these will almost always be the nodes we're most interested in when processing the ways. Only if we can't find the node in that cache would we need to hit the disk. The downside is that this could add significant memory overhead to an already strained heap. An unrelated improvement I'd also like to build is an R-tree (in memory) of the split tiles to further speed mapping nodes to areas. JG> If the line crosses an corner of a tile, it will be clipped at the JG> boundaries or the other tiles. So there will be a small gap exactly JG> in the corner. I have never seen such a thing and think this is JG> neglectable. (On the other hand, if this gap is a piece of a JG> motorway, routing will became a little unexpected) You don't see this happen currently because the overlap almost always catches both of the nodes that fall outside the tile. Without the overlap, for each line segment in a way that crossed a tile boundary we'd need to test whether it clipped another tile or not. As far as I can see, this test would be required even if I implement all the above but disable the current overlap fucntionality. Hope that wasn't too much of a waffle and helps explain where I see this heading. Chris

For what it is worth, my 2 cents: On Thu, Jan 14, 2010 at 10:36:54AM +0000, Chris Miller wrote:
1) Nodes and ways get intermingled in the output OSM file. I don't think this is a problem at all since there still won't be any forward references from eg ways to nodes, but currently the splitter writes out <node/> tags first, then <way/> tags, then <rel/> tags so maybe this will break/annoy something/someone?
Relations can contain forward references to member relations (subrelations). The forward references are unavoidable in the general case, because there may be circular references, for example. Even small map extracts downloaded by JOSM may contain forward subrelation references even for tree-like relation hierarchies. I implemented deferred lookup of subrelations in mkgmap's OSM XML parser. The parser still throws away forward references to nodes and ways. It has been pointed out earlier that JOSM refuses to load the files from splitter. Sometimes, it might be useful to load files to JOSM for troubleshooting. But JOSM cannot load big files anyway. Could there be some filter for "normalizing" the output? And another filter for throwing out uninteresting elements or attributes, so that the tile becomes small enough for JOSM? Perhaps something like this already exists? Best regards, Marko

Hi Marko, MM> Relations can contain forward references to member relations MM> (subrelations). The forward references are unavoidable in the MM> general case, because there may be circular references, for example. MM> Even small map extracts downloaded by JOSM may contain forward MM> subrelation references even for tree-like relation hierarchies. Currently the splitter throws away references between relations. It's on my todo list to fix - initially I thought this would be quite tricky to support efficiently but I have since realised I can solve it without too much trouble. I hadn't realised that forward and even circular references were common though so thanks for pointing that out, I'll need to bear that in mind when I add support for it. I don't think it'll be too much of a problem to handle given there are far fewer relations than there are nodes. MM> I implemented deferred lookup of subrelations in mkgmap's OSM XML MM> parser. The parser still throws away forward references to nodes and MM> ways. Do you know of any cases where ways or rels have forward references to nodes, ie have you seen that happen in the wild? I think the splitter can actually handle these sorts of forward references now, as long as the cache is used. The split files that are output won't have any forward references. I guess this means that "splitting" one of these files to a single output file would clean it up so mkgmap could then use it :) MM> It has been pointed out earlier that JOSM refuses to load the files MM> from splitter. Sometimes, it might be useful to load files to JOSM MM> for troubleshooting. But JOSM cannot load big files anyway. Could Hmm, do you know why it can't load them? Maybe it's an easy fix. I'll add it to the todo list to investigate. MM> there be some filter for "normalizing" the output? And another MM> filter for throwing out uninteresting elements or attributes, so MM> that the tile becomes small enough for JOSM? Perhaps something like MM> this already exists? Nothing like this already exists that I know of but it's something I've thought about in the past (with the aim of generating more appropriate tile sizes). I guess ideally the splitter could take a filter from mkgmap and split based on the filtered data. Unfortunately I suspect it will still be quite difficult to throw away nodes that are no longer referenced because the only way(s) and/or rel(s) that were referring to those nodes were filtered out. Chris

Hi Chris,
I hadn't realised that forward and even circular references were common though so thanks for pointing that out, I'll need to bear that in mind when I add support for it.
I don't know about circular references in practice. I ran into the forward references when learning about defining stops in bus route relations. I learned that in some applications, role=stop members of type=route relations should be nodes on the ways that constitute the route, while in others it would be useful to have POIs for passenger wait areas (next to the street) as members in the relation. I further understood that the preferred way is by subrelations, like I did for this short ferry line that you might call a water bus: http://www.openstreetmap.org/browse/relation/155054 (The passenger wait areas are tagged as highway=bus_stop nodes.) The route relation has type=site subrelations in role=stop. Each type=site subrelation has two nodes, one in role=vehicle and another in role=passengers. (The latter two role names were not mentioned in the wiki, I made them up.) I downloaded the area around the above mentioned relation a few times in JOSM. Each time, the type=route relation was defined before its type=site subrelations.
MM> I implemented deferred lookup of subrelations in mkgmap's OSM XML MM> parser. The parser still throws away forward references to nodes and MM> ways.
Do you know of any cases where ways or rels have forward references to nodes, ie have you seen that happen in the wild? I think the splitter can actually handle these sorts of forward references now, as long as the cache is used.
Sorry, no. I don't even know if the OSM XML schema (if one exists; I was unable to find a copy) allows node, way, relation in an arbitrary order within the osm element.
MM> It has been pointed out earlier that JOSM refuses to load the files MM> from splitter. Sometimes, it might be useful to load files to JOSM MM> for troubleshooting. But JOSM cannot load big files anyway. Could
Hmm, do you know why it can't load them? Maybe it's an easy fix. I'll add it to the todo list to investigate.
I have not investigated, but it could be as simple as replacing version='0.5' with version='0.6' in the osm element.
MM> there be some filter for "normalizing" the output? And another MM> filter for throwing out uninteresting elements or attributes, so MM> that the tile becomes small enough for JOSM? Perhaps something like MM> this already exists?
Nothing like this already exists that I know of but it's something I've thought about in the past (with the aim of generating more appropriate tile sizes). I guess ideally the splitter could take a filter from mkgmap and split based on the filtered data. Unfortunately I suspect it will still be quite difficult to throw away nodes that are no longer referenced because the only way(s) and/or rel(s) that were referring to those nodes were filtered out.
Yes, you would have to store all nodes and flag those that are referenced by way(s) or rel(s), and not start outputting nodes until all the input has been consumed. Marko

Thank you for the detailed explanation.
1) Nodes and ways get intermingled in the output OSM file. I don't think this is a problem at all since there still won't be any forward references from eg ways to nodes, but currently the splitter writes out <node/> tags first, then <way/> tags, then <rel/> tags so maybe this will break/annoy something/someone?
This is true. Maybe it could be worked around, if you write to three separate files and append them afterwards.
participants (5)
-
Chris Miller
-
Johann Gail
-
Mark Burton
-
Marko Mäkelä
-
WanMil