Coastline issues - analysis and possible solution

Hi list
I spent the last 12 hours debugging sea generation problems in Europe. After digging through a lot of mkgmap and splitter code, I believe I understand the source of the issues now.
Since data is processed in tiles, the sea generator will often encounter coastlines clipped at the tile boundaries. There are heuristics in the code that should produce valid sea multipolygons for tiles with incomplete coastlines as well. However, the heuristics fail for the Geofabrik extracts of several European countries. Here is what I believe the problem to be:
The sea generator correctly identifies incomplete coastlines. It then attempts to determine for each such coastline whether it was clipped by a tile boundary. It is this test that fails for some countries, leading to missing or inverted sea. The source of this issue can be traced back to the splitter. Here, tile boundaries are expanded to coarse multiples of Garmin map units. Tiles therefore become larger than originally requested. If map data for the entire expanded area is available, this is OK. But when dealing with country extracts, there may be no data available for some of the expanded regions.
Here is an ASCII art example, a rectangular extract with a coastline inside, shown as a double line:
+---------+
|         |
|         |
|=========|
|         |
+---------+
If this data is fed to mkgmap directly, the sea generator will correctly determine that the coastline was clipped by the left and right tile boundaries.
If the data is passed through the splitter first, even if no splitting into multiple tiles should be needed, the tile boundaries are rounded to the multiples of Garmin map units mentioned above. The tile may thus grow:
+-----------+
|           |
|           |
|           |
| ========= |
|           |
|           |
+-----------+
The problem is that now, the coastline no longer touches the tile boundaries. The mkgmap sea generator is confused by this and produces invalid or no sea at all.
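The rounding effect described above can be sketched in a few lines of Python. This is only an illustration, not the splitter's or mkgmap's actual code: the grid size of 2048 map units, the function names and the sample longitudes are assumptions made for the example. The only fixed fact used is that one Garmin map unit is 360 / 2^24 degrees.

```python
import math

# One Garmin map unit is 360 / 2**24 degrees (24-bit coordinates).
DEGREES_PER_UNIT = 360.0 / (1 << 24)

# Assumed rounding grid, chosen for illustration only.
ALIGN_UNITS = 2048


def to_units(degrees):
    """Convert degrees to (fractional) Garmin map units."""
    return degrees / DEGREES_PER_UNIT


def round_out(lo, hi, align=ALIGN_UNITS):
    """Expand the interval [lo, hi] (map units) outward to multiples of align."""
    return math.floor(lo / align) * align, math.ceil(hi / align) * align


def touches_sides(coast_min_x, coast_max_x, tile_min_x, tile_max_x):
    """True if the coastline reaches both the left and the right tile boundary."""
    return coast_min_x <= tile_min_x and coast_max_x >= tile_max_x


# Hypothetical extract spanning 5.0 to 6.0 degrees of longitude, with a
# coastline that was clipped exactly at the extract's left and right edges.
extract = (to_units(5.0), to_units(6.0))
coastline = extract

rounded = round_out(*extract)

print(touches_sides(*coastline, *extract))  # True: clipped at the extract edges
print(touches_sides(*coastline, *rounded))  # False: the tile grew past the data
```

With these sample values the tile grows from roughly 233017..279620 to 231424..280576 map units, so a test against the rounded boundary no longer sees the coastline as clipped.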
An obvious but incorrect solution is for the sea generator to check whether the coastline reaches the *original* tile boundary, not the rounded one. This would require the splitter to pass the original boundary to mkgmap along with each tile. The real issue with this solution would be that even if mkgmap correctly detects a clipped coastline, there is no valid data between the original and rounded boundaries. To construct a proper sea multipolygon, mkgmap would have to synthesize a coastline in that space, likely introducing ugly artifacts.
A correct solution, already implemented in mkgmap today, is to read coastlines from a separate file. The file should contain coastlines that extend at least as far as the rounded tile boundaries. From the mail archives, it appears that WanMil has identified the same issue before and proposed this very solution [1]. Coastlines for a larger area can be extracted using:
    osmosis \
      --rb larger_area.osm.pbf \
      --tf accept-ways natural=coastline \
      --tf reject-relations \
      --used-node \
      --wb coastlines.osm.pbf omitmetadata=true
The coastlines.osm.pbf file can then be fed to mkgmap via the --coastlinefile option.
While WanMil described this solution in December 2010 already, it seems to have been buried in the mailing list without becoming common knowledge. Another mailing list post [2] shows that the --coastlinefile option had been broken for a while without anyone noticing, confirming that it is not in common use.
After spending 12 hours on this today, I hope that others will benefit from this write-up instead of having to repeat my odyssey.
Anyone wanting to use the --coastlinefile option will require coastlines for a larger region than the extract they are processing. While the option does allow several files to be specified whose contents are then concatenated, it is easiest to load a single file with coastlines for a larger area.
I have prepared two such files, one from today's Geofabrik europe.osm.pbf extract, the other from the most recent planet-latest.osm.pbf. Both files can be found at [3]. I am providing these files so that anyone working with extracts can download compact coastline-only files instead of having to extract them from the huge europe.osm.pbf or planet-latest.osm.pbf files.
So far, the files have been generated as a one-off. If they are found to be useful by others, I am happy to automate the process and make updated coastline files available on a regular basis.
Finally, a word of warning: When using coastlines_planet.osm.pbf, mkgmap needs a *lot* of memory. I found that even for small tiles, 4.5GB of RAM are consumed by the Java process. With coastlines_europe.osm.pbf, the memory consumption is about 2.5GB.
- Bartosz
[1] http://www.mkgmap.org.uk/pipermail/mkgmap-dev/2010q4/009636.html
[2] http://www.mkgmap.org.uk/pipermail/mkgmap-dev/2011q1/010138.html
[3] http://www.fabianowski.eu/osm/coastlines/

My feeling with the problem of flooded tiles always was that the geofabrik extracts are to blame - they're too "tight" in some places, so the coastline breaks in the extraction process already, not during the splitting. That happens for instance in the northwest of germany around emden and in the northwest of india in gujarat. I generated poly files for these countries which simply extend roughly 10 km into the neighbouring countries and the broken coastlines vanish, even if the splitter splits the coastline in these places.

On Mon, Aug 29, 2011 at 08:07:04AM +0200, michael lohr wrote:
My feeling with the problem of flooded tiles always was that the geofabrik extracts are to blame - they're too "tight" in some places, so the coastline breaks in the extraction process already, not during the splitting. That happens for instance in the northwest of germany around emden and in the northwest of india in gujarat.
The Geofabrik cutting polygons are not set in stone. A long time ago, it took me a couple of iterations with Frederik Ramm to get a good extract of Finland that would include all of the country border, plus some lake multipolygons in the neighbour countries. I did this in Osmosis and JOSM by extracting the country borders from the Geofabrik extract and by downloading some data in JOSM, and finally editing the cutting polygon in a separate JOSM layer.
It still takes a little tweaking to get the coastlines right. I have manually chosen the tile borders so that the coastline will end outside the tile border. Only at the Swedish/Finnish border I am using extend-sea-sectors to make up some coastline in Sweden.
I think that it could be useful to have the tile-splitter support a set of fixed tiles that it would split itself further as needed.
Best regards,
Marko

It still takes a little tweaking to get the coastlines right. I have manually chosen the tile borders so that the coastline will end outside the tile border. Only in the Swedish/Finnish border I am using extend-sea-sectors to make up some coastline in Sweden.
This is why I collected the coastlines for all of Europe in a single file. When all European coastlines are available, you no longer have to tweak tile boundaries.
I think that it could be useful to have the tile-splitter support a set of fixed tiles that it would split itself further as needed.
I presume you would then use this so that you extract a larger region and then let the splitter generate tiles for a subset of this region? With a sufficient margin of safety between the two regions, this should work by providing the splitter with coastline data even when tiles grow due to rounding. - Bartosz

fixed tiles may produce better coastlines, but you'll get "map too big" at some point as the map grows. so you could either set a large safety margin and get a lot more tiles or you'd have to fix the tiles anew more often.
On 29.08.2011 09:08, Bartosz Fabianowski wrote:
It still takes a little tweaking to get the coastlines right. I have manually chosen the tile borders so that the coastline will end outside the tile border. Only in the Swedish/Finnish border I am using extend-sea-sectors to make up some coastline in Sweden.
This is why I collected the coastlines for all of Europe in a single file. When all European coastlines are available, you no longer have to tweak tile boundaries.
I think that it could be useful to have the tile-splitter support a set of fixed tiles that it would split itself further as needed.
I presume you would then use this so that you extract a larger region and then let the splitter generate tiles for a subset of this region? With a sufficient margin of safety between the two regions, this should work by providing the splitter with coastline data even when tiles grow due to rounding.
- Bartosz

fixed tiles may produce better coastlines, but you'll get "map too big" at some point as the map grows.
The way I understand Marko's suggestion, you would give the splitter an initial list of tiles (or just a region) and it would then subdivide that into smaller tiles as it does today. The difference would be that you could provide the splitter with a large dataset (say all of Europe) while telling it to cut out tiles for a small region (say the Netherlands) only. This would essentially integrate the osmosis cutting step into the splitter.
The advantage of this is that the splitter has data for the tiles it cuts out, including ones that expand when rounding. The disadvantages are the need to push around larger datasets, the need to specify tile boundaries each time the splitter is invoked and a loss of the UNIX principle that each tool handles one small thing and handles it well.
- Bartosz

On Mon, Aug 29, 2011 at 10:08:26AM +0300, Bartosz Fabianowski wrote:
This is why I collected the coastlines for all of Europe in a single file. When all European coastlines are available, you no longer have to tweak tile boundaries.
There are a few drawbacks to that approach. First, you will have to download and process much more data than just the country extract. Second, if the continent extract was generated at a different time than the country extract and there have been edits to the coastline, your map might not correspond to any snapshot of the OSM database.
I think that it could be useful to have the tile-splitter support a set of fixed tiles that it would split itself further as needed.
I presume you would then use this so that you extract a larger region and then let the splitter generate tiles for a subset of this region?
Right, the idea is to make the combined bounding box of the splitter-generated tiles something else than a rectangle. If we ensure that all coastlines that were "cut open" in the Geofabrik extract will have their endpoints outside the splitter-generated coastline, the flooding should be gone.
With a sufficient margin of safety between the two regions, this should work by providing the splitter with coastline data even when tiles grow due to rounding.
Right. I wonder if Geofabrik or some other entity could start providing a little wider coastline extracts that are in sync with the Geofabrik country extracts. Boundary extracts could be nice too, but I suppose that the Geofabrik country extracts already include all the boundary=administrative lines of the country. Marko

First, you will have to download and process much more data than just the country extract.
This is why I shared my extracted coastlines. The extraction has to be done only once. Everyone else can just download the (relatively small) coastline files.
Second, if the continent extract was generated at a different time than the country extract and there have been edits to the coastline, your map might not correspond to any snapshot of the OSM database.
Absolutely. What I uploaded yesterday were snapshots based on the most recent Geofabrik Europe extract and the most recent full planet. If these prove useful, I will make the generation a batch job so that the coastlines are updated whenever a new Geofabrik Europe extract or full planet is made.
Right, the idea is to make the combined bounding box of the splitter-generated tiles something else than a rectangle. If we ensure that all coastlines that were "cut open" in the Geofabrik extract will have their endpoints outside the splitter-generated coastline, the flooding should be gone.
There is no need to make the combined bounds non-rectangular. The simplest solution actually is a rectangle, one large enough to cover all splitter-generated tiles plus the padding added to these by boundary rounding.
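As a rough sketch of such a covering rectangle: the Python below takes a list of tile bounding boxes in degrees and returns the smallest grid-aligned rectangle around all of them. The grid size of 2048 map units, the function name and the sample tiles are assumptions for illustration only and are not taken from the splitter. The resulting rectangle is the minimum area for which coastlines would have to be available.

```python
import math

DEGREES_PER_UNIT = 360.0 / (1 << 24)  # one Garmin map unit in degrees
ALIGN_UNITS = 2048                    # assumed rounding grid, for illustration


def covering_rectangle(tiles, align=ALIGN_UNITS):
    """Smallest grid-aligned rectangle containing every tile.

    tiles: iterable of (min_lat, min_lon, max_lat, max_lon) in degrees.
    Returns (min_lat, min_lon, max_lat, max_lon) in degrees.
    """
    tiles = list(tiles)
    step = align * DEGREES_PER_UNIT  # grid spacing in degrees

    def down(deg):
        # Round towards smaller values, onto the grid.
        return math.floor(deg / step) * step

    def up(deg):
        # Round towards larger values, onto the grid.
        return math.ceil(deg / step) * step

    return (
        down(min(t[0] for t in tiles)),
        down(min(t[1] for t in tiles)),
        up(max(t[2] for t in tiles)),
        up(max(t[3] for t in tiles)),
    )


# Two hypothetical neighbouring tiles near the German North Sea coast.
tiles = [(53.0, 6.5, 54.0, 7.5), (53.5, 7.5, 54.2, 8.3)]
rect = covering_rectangle(tiles)
print(rect)
```

Each side of the result lies less than one grid step outside the requested tiles, so the extra coastline data needed is small compared to a whole continent.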
I wonder if Geofabrik or some other entity could start providing a little wider coastline extracts that are in sync with the Geofabrik country extracts.
This is exactly what I volunteered to do. The wider regions I would extract coastlines for are Geofabrik Europe (daily) and planet (weekly). We have the server capacity and are happy to do it. Our website is not set up for it yet but I can use my private hosting for now. I am trying to gauge interest. If people are interested in such regular extracts, I will be happy to provide them.
Boundary extracts could be nice too, but I suppose that the Geofabrik country extracts already include all the boundary=administrative lines of the country.
This is an interesting point. The boundary files seem to be updated about once a month right now. If there is interest in such boundary extracts, I could see what I can do on our servers. - Bartosz

On Mon, Aug 29, 2011 at 07:08:44PM +0300, Bartosz Fabianowski wrote:
Absolutely. What I uploaded yesterday were snapshots based on the most recent Geofabrik Europe extract and the most recent full planet. If these prove useful, I will make the generation a batch job so that the coastlines are updated whenever a new Geofabrik Europe extract or full planet is made.
This would be great. If you have the disk space, you might also keep N latest coastline extracts, so that when the coastline gets broken somewhere, a work-around of using an older extract exists.
Boundary extracts could be nice too, but I suppose that the Geofabrik country extracts already include all the boundary=administrative lines of the country.
This is an interesting point. The boundary files seem to be updated about once a month right now. If there is interest in such boundary extracts, I could see what I can do on our servers.
I will probably change my workflow so that I will generate the boundary data directly from the Geofabrik finland.osm.pbf, but I suppose it could be very useful for others. Right now, many boundaries inside Finland are either non-existent or broken. Marko

I have just uploaded today's coastlines_europe.osm.pbf. The time stamp matches the europe.osm.pbf file that this was created from. Right now, I still need to manually upload and fiddle with the timestamps. I am switching to a better hosting setup in a couple of days that will allow me to fully automate this.
If you have the disk space, you might also keep N latest coastline extracts, so that when the coastline gets broken somewhere, a work-around of using an older extract exists.
Good idea. I renamed the old extract and kept it around. I have 50GB of disk space, so I can keep a lot of extracts around. - Bartosz

Thanks for providing the coastline data.
Regards
Klaus

No problem. I have a batch job running that uploads fresh coastlines to [1] every day. I will move this to our company webspace somewhere underneath [2] eventually.
- Bartosz
[1] http://fabianowski.eu/osm/coastlines/
[2] http://labs.dobini.com/

On Oct 1, 2011, at 10:39, Bartosz Fabianowski wrote:
No problem. I have a batch job running that uploads fresh coastlines to [1] every day. I will move this to our company webspace somewhere underneath [2] eventually.
- Bartosz
By the way, I just used your planet extract with the --coastlinefile option for a map of Canada (Geofabrik extract). The results were vastly superior compared to the same map compiled without the coastline file. Thanks very much for your analysis and for providing the coastline data. I hope you continue to do so. :-) Cheers.

Glad I could help. The coastline files will definitely keep getting generated day by day. They will just move to our company site somewhere underneath dobini.com one day. But I will put in a redirect once that happens. - Bartosz

Boundary extracts could be nice too, but I suppose that the Geofabrik country extracts already include all the boundary=administrative lines of the country.
This is an interesting point. The boundary files seem to be updated about once a month right now. If there is interest in such boundary extracts, I could see what I can do on our servers.
- Bartosz
Up to now I have created and uploaded the boundary files manually once a month. An "automatic" service for this would be better. But: Each time I have created these extracts I have found several country borders that have been broken. So I repaired them before creating the same stuff again and ensured that the boundary extracts were uploaded only if the country information was complete.
Regarding Geofabrik's dumps: The Europe dump does not contain the complete Spanish border because some parts are in Africa and they wanted to have continent-specific dumps. Therefore I create my boundary extracts from a planet file, split it into 5 areas (America (North & South), Africa, Europe, Asia and Australia-Oceania), compile the boundary extracts and merge them afterwards.
If you are interested I can send you the steps in more detail (with .poly files).
WanMil

But: Each time I have created these extracts I have found several country borders that have been broken. So I repaired them before creating the same stuff again and ensured that the boundary extracts were uploaded only if the country information was complete.
I am happy to devote server capacity and bandwidth to this but I fear I would not have the time to maintain things manually. So I will stay away from boundaries for now. - Bartosz

Hi,
On 29.08.2011 08:07, michael lohr wrote:
My feeling with the problem of flooded tiles always was that the geofabrik extracts are to blame - they're too "tight" in some places, so the coastline breaks in the extraction process already, not during the splitting. That happens for instance in the northwest of germany around emden and in the northwest of india in gujarat.
I generated poly files for these countries which simply extend roughly 10 km into the neighbouring countries and the broken coastlines vanish, even if the splitter splits the coastline in these places.
+1
I use my own polygon file to extract Germany from the European extract. The Geofabrik people didn't want to use this file, because they don't want to have the little part of the Netherlands in the German extract.

I use my own polygon file to extract germany from the european extract.
It should be sufficient to extract coastlines using that larger polygon only. Of course once you are extracting, you can just use your own extracts throughout. But a combination of the Geofabrik extract for map data and your larger poly for coastlines should work equally well. - Bartosz

Hi,
On 29.08.2011 09:09, Bartosz Fabianowski wrote:
I use my own polygon file to extract germany from the european extract.
It should be sufficient to extract coastlines using that larger polygon only. Of course once you are extracting, you can just use your own extracts throughout. But a combination of the Geofabrik extract for map data and your larger poly for coastlines should work equally well.
That's great. With this procedure I can use the German extract from Geofabrik without the missing sea near Emden. Thanks for the hint.
Greetings
--
PGP key: 311D1055 http://keyserver.pgp.com

That's great. With this procedure I can use the german extract from geofabrik without the missing sea near Emden.
You will still have to download all of Europe to extract the boundaries for your larger polygon. Or alternatively you can use the extracted boundaries I provided. - Bartosz

Hi,
On 29.08.2011 17:16, Bartosz Fabianowski wrote:
That's great. With this procedure I can use the german extract from geofabrik without the missing sea near Emden.
You will still have to download all of Europe to extract the boundaries for your larger polygon. Or alternatively you can use the extracted boundaries I provided.
I would prefer the latter.
Josef

Thanks, that's one person interested in automatically updated coastlines. I will watch the download logs over the next few days to see whether I should start generating coastlines daily. - Bartosz

geofabrik extracts are to blame - they're too "tight" in some places, so the coastline breaks in the extraction process already
In a sense, this is true. However, *no* polygon can ever be guaranteed to contain enough data. Even if Geofabrik used a bounding box instead of a bounding polygon so that data for the entire region containing the extract was available, coastlines would still break. It is the rounding of tile boundaries in the splitter that pushes them beyond the region of the original extract. The only robust solution is to extract coastlines separately, for a larger region of the map.
I generated poly files for these countries which simply extend roughly 10 km into the neighbouring countries and the broken coastlines vanish, even if the splitter splits the coastline in these places.
Do you use these poly files to extract coastlines only or do you use them for all your data? In the former case, you are doing what I found to be a good workaround. In the latter case, you are increasing the region for which coastlines are available while also increasing the region for which tiles are built. If the coastlines happen to work with the new regions, that is great - but it is coincidental, not guaranteed :(. - Bartosz

i'm starting to wonder if we're talking about 2 separate issues that lead to similar outcomes. coastlines break because:
1. they get broken in the extraction process because the poly is too tight
2. they get broken in the splitting process
a separate coastline file would quite likely mend both issues in the same go by creating the coastline out of external data. only drawback: if the poly for extracting the country is too tight, then there'd be an empty area between the coast and the rest of the map data, so all the beach bars are missing.
On 29.08.2011 09:03, Bartosz Fabianowski wrote:
geofabrik extracts are to blame - they're too "tight" in some places, so the coastline breaks in the extraction process already
In a sense, this is true. However, *no* polygon can ever be guaranteed to contain enough data. Even if Geofabrik used a bounding box instead of a bounding polygon so that data for the entire region containing the extract was available, coastlines would still break. It is the rounding of tile boundaries in the splitter that pushes them beyond the region of the original extract. The only robust solution is to extract coastlines separately, for a larger region of the map.
I generated poly files for these countries which simply extend roughly 10 km into the neighbouring countries and the broken coastlines vanish, even if the splitter splits the coastline in these places.
Do you use these poly files to extract coastlines only or do you use them for all your data? In the former case, you are doing what I found to be a good workaround. In the latter case, you are increasing the region for which coastlines are available while also increasing the region for which tiles are built. If the coastlines happen to work with the new regions, that is great - but it is coincidental, not guaranteed :(.
- Bartosz

i'm starting to wonder if we're talking about 2 separate issues
Indeed, it seems so :).
1. they get broken in the extraction process because the poly is too tight
You mean the bounding polygon is so tight that it actually clips away the coastlines? If so, the bounding polygon is simply wrong. It is no longer *bounding* but cutting into the country. This would definitely damage coastlines. And yes, you are right - this is not the problem I was referring to.
2. they get broken in the splitting process
This is what I meant. More specifically, the coastlines stay as they are but the map area increases so that there are suddenly regions in the map for which no coastline data is available, breaking the subsequent sea generation.
a separate coastline file would quite likely mend both issues in the same go by creating the coastline out of external data. only drawback: if the poly for extracting the country is too tight, then there'd be an empty area between the coast and the rest of the map data, so all the beach bars are missing.
Issue 1 described by you is one of missing data. A separate coastline file would be able to restore coastline data but would be unable to fill in the gaps, just as you say. I think that the only way to address issue 1 is by fixing the bounding polygon. If issue 1 happens with Geofabrik extracts, I am rather sure Fred would like to hear so he can fix the polygons used. For issue 2, which I found to be quite common throughout Europe, separate coastlines are a correct solution. This is what I suggested they be used for. - Bartosz

Hi,
On 29.08.2011 01:45, Bartosz Fabianowski wrote:
Right now, the files were generated once-off. If they are found to be useful by others, I am happy to automate the process and make updated coastline files available on a regular basis.
Good idea.
Josef

Bartosz, thanks for your detailed analysis!!
I have never seen the splitter problem you've described. Maybe the overlap parameter of splitter prevented that, because splitter puts every point into a tile that is either contained in the tile's bounding box or contained in an overlap region with a width of "overlap" Garmin units. If the overlap is larger than the expansion, you don't see this problem.
If you go into such details, I propose to make use of the GpxCreator class in the mkgmap source code. It has helped me very much to visualize the different steps during processing. You have to add some code at the places where the GPX files should be created, and decide which data to write, but that's not a big deal.
The --coastlinefile option is a good way to be 100% sure that you get valid coastlines and no flooded tiles. But only if you use "confirmed" coastlines (so a file that is confirmed to be error free). You also pointed to a very big disadvantage: the memory usage is very HIGH, because the coastline file needs to be completely loaded in parallel.
This might be fixed by adapting the bounds file algorithm to the coastline processing. A precompiler (that can be run on a system with very much memory) can compile coast tiles. These tiles can be used by mkgmap to fill the complete OSM tiles with this coastline information. The memory requirements and the processing time are low for this step.
I don't have time to implement that, but if someone likes to do that I can give hints where to start and which part of the boundary precompiler can be reused (with small changes).
WanMil
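Reading the remark about overlap and expansion literally, the condition can be written down as a small check. This is a sketch under stated assumptions, not splitter code: the grid size of 2048 map units, the overlap of 2000 map units, the function names and the sample coordinates are all chosen for illustration.

```python
import math

ALIGN_UNITS = 2048    # assumed rounding grid in Garmin map units
OVERLAP_UNITS = 2000  # assumed overlap width in Garmin map units


def expansion(lo, hi, align=ALIGN_UNITS):
    """Units added on the low and high side when [lo, hi] is rounded outward."""
    low_side = lo - math.floor(lo / align) * align
    high_side = math.ceil(hi / align) * align - hi
    return low_side, high_side


def masked_by_overlap(lo, hi, align=ALIGN_UNITS, overlap=OVERLAP_UNITS):
    """True if the overlap is larger than the expansion on both sides."""
    return max(expansion(lo, hi, align)) < overlap


print(expansion(233017, 279620))          # (1593, 956)
print(masked_by_overlap(233017, 279620))  # True
print(masked_by_overlap(233460, 279620))  # False, low side grows by 2036
```

Under these assumptions the expansion can be up to 2047 units per side, so whether the overlap hides the problem depends on where the requested boundary happens to fall on the grid.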

> Maybe the overlap parameter of splitter prevented that because splitter puts all points in the tile that is either contained in the bounding box or contained in an overlap region with width "overlap" garmin units. If overlap is larger than the expansion you don't see this problem.
I investigated whether the overlap parameter could fix this. The answer, unfortunately, is no. This problem occurs whenever a tile is expanded, due to rounding, beyond the boundary of the extract being used as input data. Even with the overlap parameter set to a large value, there is simply no data in the extract for the offending region. The only way to get this data is to take it from somewhere else, such as a separate coastline file covering a larger area.
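To make the failure concrete, here is a minimal sketch of the rounding effect. It is not the splitter's or mkgmap's actual code. The grid size of 2048 map units, the function names and the coordinates are all assumptions chosen for illustration only.

```python
# Sketch only: the grid size and all names here are assumptions,
# not taken from the splitter or mkgmap sources.
GRID = 2048  # assumed rounding granularity in Garmin map units


def expand_to_grid(bbox, grid=GRID):
    """Round a bbox (min_lat, min_lon, max_lat, max_lon) outward to grid multiples."""
    min_lat, min_lon, max_lat, max_lon = bbox
    return (
        (min_lat // grid) * grid,      # floor the lower edges
        (min_lon // grid) * grid,
        -((-max_lat) // grid) * grid,  # ceil the upper edges
        -((-max_lon) // grid) * grid,
    )


def touches_boundary(point, bbox):
    """True if a (lat, lon) point lies exactly on one of the bbox edges."""
    lat, lon = point
    min_lat, min_lon, max_lat, max_lon = bbox
    return lat in (min_lat, max_lat) or lon in (min_lon, max_lon)


extract = (1000, 1000, 3000, 5000)   # area for which the extract has data
tile = expand_to_grid(extract)       # area the tile claims to cover
coast_end = (2000, 1000)             # coastline clipped at the extract's left edge

print(tile)
print(touches_boundary(coast_end, extract))  # clipped at the extract edge
print(touches_boundary(coast_end, tile))     # but not at the tile edge
```

The coastline end point sits on the extract boundary but strictly inside the rounded tile, so a test against the tile boundary cannot recognise it as clipped. No overlap setting changes this, because the strip between the two boxes contains no data at all.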
> If you go in such details I propose to make use of the GpxCreator class in the mkgmap source code.
Thanks for the pointer. I will use it in future investigations.
> The --coastlinefile option is a good way to be 100% sure that you get valid coastlines and no flooded tiles. But only if you use "confirmed" coastlines (so a file that is confirmed to be error free).
I will be making available daily coastline extracts from now on. I will not be verifying that these are error-free. They will simply be coastlines extracted by osmosis. If the coastlines in Geofabrik's europe.osm.pbf are broken, so will be those in coastlines_europe.osm.pbf. My aim is to make the complete European coastlines available as an easy download. Fixing broken coastlines is another issue which I am not trying to address at this time.
> You also pointed to a very big disadvantage: the memory usage is very HIGH, because the coastline file needs to be completely loaded in parallel.
> This might be fixed by adapting the bounds file algorithm to the coastline processing.
I have an alternative idea: As I described, the problem is that coastlines for part of a tile may be missing. To fix this, there is no need to load a complete coastline file possibly covering the entire planet. Only the part of the coastline file overlapping the tile is needed.

Memory consumption could be reduced to exactly that of normal tile processing by parsing the input data for a tile first and then loading only the ways and nodes from the coastline file that fall within the tile's boundary. This would require a reload of the coastline data for each tile.

Another option would be to preload the coastline data for the combined bounding box of all tiles to be processed.

- Bartosz
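A rough sketch of that filtering idea, again not mkgmap code. The data layout (nodes as a dict of id to (lat, lon), ways as a dict of id to node id list) and all function names are hypothetical. Ways are selected by bounding box overlap rather than by node containment, so that a long segment crossing the tile without a node inside it is still kept.

```python
# Sketch only: data layout and function names are hypothetical.

def way_bbox(refs, nodes):
    """Bounding box (min_lat, min_lon, max_lat, max_lon) of one way."""
    lats = [nodes[n][0] for n in refs]
    lons = [nodes[n][1] for n in refs]
    return (min(lats), min(lons), max(lats), max(lons))


def intersects(a, b):
    """True if two bounding boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def coastlines_for_tile(ways, nodes, tile):
    """Keep only the coastline ways whose bbox overlaps the tile, plus their nodes."""
    kept_ways = {
        wid: refs for wid, refs in ways.items()
        if intersects(way_bbox(refs, nodes), tile)
    }
    kept_nodes = {n: nodes[n] for refs in kept_ways.values() for n in refs}
    return kept_ways, kept_nodes


def combined_bbox(tiles):
    """Smallest bbox covering all tiles, for the preload-once variant."""
    return (
        min(t[0] for t in tiles), min(t[1] for t in tiles),
        max(t[2] for t in tiles), max(t[3] for t in tiles),
    )
```

Calling coastlines_for_tile once per tile corresponds to the reload-per-tile variant. Calling it once with combined_bbox(all_tiles) corresponds to the preload variant, which trades some memory for not having to re-read the coastline file.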
participants (7)
- Bartosz Fabianowski
- Clinton Gladstone
- Josef Latt
- Marko Mäkelä
- michael lohr
- toc-rox
- WanMil