Fun with splitter tile descriptions
I found it somewhat annoying that my tiles always had identical generic descriptions such as "OSM Map". It made it very difficult to recognise which tiles belonged to which areas, in particular when attempting to select specific tiles on my GPSr. Since my maps could have over 200 tiles, it was tedious to add descriptive names manually.

To solve this, I performed the following small hack, which might be interesting to others with a similar predicament:

- I created a Perl script which read the tile boundaries from the areas.list file generated by splitter.
- For each bounding box, I called a webservice from GeoNames to retrieve a list of populated areas (cities, etc.) within the tile.
- For each populated area within the tile, I called another webservice to determine the population of that area.
- I determined the city with the largest population, and then wrote the ISO country code and name of the city to the description parameter in template.args.

To my astonishment, this worked rather well.

Here is an excerpt from the resulting template.args file:

[...]
mapname: 23000223
description: EE-Tartu
input-file: 23000223.osm.gz

mapname: 23000224
description: FI-Espoo
input-file: 23000224.osm.gz

mapname: 23000225
description: FI-Helsinki
input-file: 23000225.osm.gz

mapname: 23000226
description: RU-Saint-Petersburg
input-file: 23000226.osm.gz
[...]

A description of the webservices I used is at http://www.geonames.org/export/

I used the GeoNames Perl client module to parse the webservice results; this may not have been the most efficient.

There are of course many other ways of doing this, and other services and data which can be used. Perhaps this will inspire some of you to create a better solution.

Cheers.
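The last step of the hack (pick the most populous place in a tile, then write a "CC-Name" description) might look roughly like the following in Java. This is only a sketch and not the original Perl script: the class and method names are made up for illustration, the web service lookups are left out entirely, and the population figures are placeholders.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: derive a tile description from the places found inside it. */
final class TileDescriptionSketch {

    /** One populated place as a lookup service might report it (illustrative type). */
    static final class Place {
        final String countryCode;
        final String name;
        final long population;

        Place(String countryCode, String name, long population) {
            this.countryCode = countryCode;
            this.name = name;
            this.population = population;
        }
    }

    /** Returns "CC-Name" for the most populous place, or the fallback for an empty tile. */
    static String describe(List<Place> placesInTile, String fallback) {
        Place largest = null;
        for (Place p : placesInTile) {
            if (largest == null || p.population > largest.population) {
                largest = p;
            }
        }
        return largest == null ? fallback : largest.countryCode + "-" + largest.name;
    }

    /** Formats one entry in the three-line layout shown in the template.args excerpt. */
    static String templateEntry(String mapName, String description) {
        return "mapname: " + mapName + "\n"
                + "description: " + description + "\n"
                + "input-file: " + mapName + ".osm.gz\n";
    }

    public static void main(String[] args) {
        // Population figures are placeholders, not real data.
        List<Place> tile = Arrays.asList(
                new Place("FI", "Espoo", 2000),
                new Place("FI", "Kauniainen", 1000));
        System.out.print(templateEntry("23000224", describe(tile, "OSM Map")));
    }
}
```

Falling back to the generic description keeps tiles without any known place usable, which matters for sparsely populated areas.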
Nice idea! I'll put that on the todo list for incorporating into the splitter, sounds like a very useful feature to add. What might be even better is if I use those webservices to generate a data file that can be distributed with the splitter itself, so people don't need to have an internet connection when splitting. The static lookup could also be customised to suit individual needs if so desired.
On Wed, Sep 9, 2009 at 4:02 PM, Chris Miller<chris.miller@kbcfp.com> wrote:
Nice idea! I'll put that on the todo list for incorporating into the splitter, sounds like a very useful feature to add. What might be even better is if I use those webservices to generate a data file that can be distributed with the splitter itself, so people don't need to have an internet connection when splitting. The static lookup could also be customised to suit individual needs if so desired.
GeoNames does provide downloadable files for offline use. (Many people seem to import the data into a MySQL or PostgreSQL DB for faster access and queries.) See http://download.geonames.org/export/

The complete allCountries.zip file is about 175MB though.

I wonder if the splitter could perform this analysis while splitting the OSM data? The data should all be in the relevant .osm file. Of course if this significantly reduces performance or increases memory requirements, then it's not a good idea.

Cheers.
Well I wasn't planning on bundling the whole thing. My idea was to create a grid that's the same resolution as the splitter's density map (typically 8192x4096). I only need to store some simple summary data (a reference to the predominant country/city, plus perhaps a weighting based on population) for each grid square, and use that to figure out what each tile should be called based on the grid squares it covers. That still works out to be quite a lot of data (70MB+), but it can probably be made coarser still, it'll compress extremely well, and the whole thing can be generated from code as a one-off job rather than shipped with the splitter.

I don't think performance will be a problem. It shouldn't take too long to process and a lot of it could probably be done in background thread(s). Memory-wise, even the most memory-hungry implementation would only add ~100MB to the first stage of the split, which isn't an issue now. Plus of course it would be optional anyway.

It's an interesting point you make about extracting the data from the osm file instead. I can't see that being nearly as reliable however, plus as you suspected it would put a larger burden on the splitter in terms of both complexity and performance. One nice thing about handling it via webservice-derived lookups is that the generated lookup table could also be used by a 3rd party area editing tool, in much the same way as the splitter's density maps could be. Having a good density map and country/city map of the planet together opens up a lot of possibilities for both automated and manual generation of split areas beyond what raw osm files can provide.

Chris
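To make the grid idea a little more concrete: an 8192x4096 grid has 33,554,432 squares, so at an assumed two bytes per square it would need about 67 MB, which is in the same region as the 70MB+ mentioned above. The Java sketch below shows one possible shape for such a lookup. It is not splitter code; the class name, the "summed weight" rule and the tiny demo grid are all assumptions made for illustration, and a real implementation would store compact references rather than strings.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of a coarse lookup grid used to name tiles. */
final class PlaceGridSketch {
    private final int width;
    private final int height;
    private final String[] label;  // predominant place per square, null if none
    private final long[] weight;   // weighting for that place, e.g. its population

    PlaceGridSketch(int width, int height) {
        this.width = width;
        this.height = height;
        this.label = new String[width * height];
        this.weight = new long[width * height];
    }

    /** Stores a candidate for square (x, y), keeping only the heaviest one seen. */
    void put(int x, int y, String placeLabel, long placeWeight) {
        int i = y * width + x;
        if (label[i] == null || placeWeight > weight[i]) {
            label[i] = placeLabel;
            weight[i] = placeWeight;
        }
    }

    /** Names the tile covering squares [x0, x1) x [y0, y1) by largest summed weight. */
    String nameFor(int x0, int y0, int x1, int y1, String fallback) {
        Map<String, Long> totals = new TreeMap<>();  // sorted, so ties resolve predictably
        for (int y = Math.max(0, y0); y < Math.min(height, y1); y++) {
            for (int x = Math.max(0, x0); x < Math.min(width, x1); x++) {
                int i = y * width + x;
                if (label[i] != null) {
                    totals.merge(label[i], weight[i], Long::sum);
                }
            }
        }
        String best = fallback;
        long bestWeight = Long.MIN_VALUE;
        for (Map.Entry<String, Long> e : totals.entrySet()) {
            if (e.getValue() > bestWeight) {
                best = e.getKey();
                bestWeight = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // A tiny 4x2 grid stands in for the full-resolution one; weights are placeholders.
        PlaceGridSketch grid = new PlaceGridSketch(4, 2);
        grid.put(0, 0, "FI-Espoo", 2);
        grid.put(1, 0, "FI-Helsinki", 5);
        grid.put(3, 1, "EE-Tartu", 1);
        System.out.println(grid.nameFor(0, 0, 2, 2, "OSM Map"));
        System.out.println(grid.nameFor(2, 0, 4, 2, "OSM Map"));
    }
}
```

Because the grid is independent of any particular split, the same table could be reused for every tile layout, which is what would make it attractive for third-party area editors as well.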
On Wed, Sep 9, 2009 at 5:07 PM, Chris Miller<chris.miller@kbcfp.com> wrote:
Well I wasn't planning on bundling the whole thing. My idea was to create a grid that's the same resolution as the splitter's density map (typically 8192x4096),
This would be very cool. I'm in favour! ;-) Cheers.
I played with a similar thing some time ago, a year or so. My algorithm did not search for the biggest city, but counted the 'is_in' tags in the OSM data. The idea was that it should scale with different resolutions: with big tiles it should name the tiles after boundaries; if I break the map down into small tiles, it should take the names of the cities and villages. Statistically, a big city contains many more is_in tags than a small village.

This was independent of the GeoNames service and based entirely on the OSM data.

As far as I can remember, the results worked in general, but the code does not fit well into the splitter. I would have to search my hard drive to see whether the code is still lying around somewhere.

Regards,
Johann

_______________________________________________
mkgmap-dev mailing list
mkgmap-dev@lists.mkgmap.org.uk
http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev
Thanks Johann, this sounds like quite a reasonable approach too. It shouldn't be too hard to add in to the splitter (eg by tacking the info on to the density map, or by holding on to the is_in data until the areas are known then calculating the names as a separate step). Looking at a few osm files the is_in tags and their values seem inconsistent at best though so I don't know how easy it would be to get sensible/consistent data from them. Another possible issue is that osm files without the tags wouldn't work at all.

If you do have some code that deals with filtering/sanitising the is_in data I'd be interested to see it however as it sounds like it would be worth investigating further.

The approach I'm currently looking at is using a geonames file (eg cities15000.zip) from http://download.geonames.org/export/dump/ to decide which country and city is the most predominant one in each tile. The advantage is it should be very fast and accurate, won't require an internet connection (aside from the initial download of the file), and can be tailored by people if they use custom input files. The downside is the minor hassle involved in dealing with that extra file.

Depending on how well this approach works in practice and what feedback I get, I might look at making the automatic tile naming more configurable/pluggable at a later date.

Chris
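For illustration, here is a rough Java sketch of choosing a name from a GeoNames dump file such as the one inside cities15000.zip. It is not the splitter's implementation. It assumes the tab-separated column layout described in the GeoNames readme (name in column 2, latitude and longitude in columns 5 and 6, country code in column 9, population in column 15, counting from 1), and the `sampleLine` helper, the coordinates and the populations exist only to produce demo rows.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: pick a tile name from GeoNames dump lines (e.g. cities15000.txt). */
final class CitiesFileSketch {

    /** Builds a 19-column, tab-separated row with only the columns used here filled in. */
    static String sampleLine(String name, String countryCode, double lat, double lon, long population) {
        String[] f = new String[19];
        Arrays.fill(f, "");
        f[1] = name;
        f[4] = Double.toString(lat);
        f[5] = Double.toString(lon);
        f[8] = countryCode;
        f[14] = Long.toString(population);
        return String.join("\t", f);
    }

    /** Returns "CC-Name" of the most populous city inside the bounding box, else the fallback. */
    static String nameTile(List<String> lines, double minLat, double minLon,
                           double maxLat, double maxLon, String fallback) {
        String best = fallback;
        long bestPopulation = -1;
        for (String line : lines) {
            String[] f = line.split("\t", -1);   // -1 keeps trailing empty columns
            if (f.length < 15) {
                continue;                         // skip malformed rows
            }
            double lat = Double.parseDouble(f[4]);
            double lon = Double.parseDouble(f[5]);
            long population = f[14].isEmpty() ? 0 : Long.parseLong(f[14]);
            boolean inside = lat >= minLat && lat <= maxLat && lon >= minLon && lon <= maxLon;
            if (inside && population > bestPopulation) {
                best = f[8] + "-" + f[1];
                bestPopulation = population;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Coordinates are approximate and populations are placeholders.
        List<String> lines = Arrays.asList(
                sampleLine("Helsinki", "FI", 60.17, 24.94, 3),
                sampleLine("Espoo", "FI", 60.21, 24.66, 2),
                sampleLine("Tartu", "EE", 58.38, 26.72, 1));
        System.out.println(nameTile(lines, 59.5, 24.0, 61.0, 26.0, "OSM Map"));
    }
}
```

A linear scan per tile is fine for a file of this size; tiles that cross the 180 degree meridian are not handled in this sketch.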
On Sep 11, 2009, at 23:05, Chris Miller wrote:
The approach I'm currently looking at is using a geonames file (eg cities15000.zip) from http://download.geonames.org/export/dump/ to decide which country and city is the most predominant one in each tile.
I tried this out on an extract of Europe. It worked pretty well; of the 234 tiles in my map, 5 had no entry in the cities15000 file.

Four of the tiles were in Italy (largest centers according to the GeoNames Web service: Codroipo, Maniago, Tavagnacco, Cervignano del Friuli), and one was in Slovenia (Podhom).

This is good enough for my purposes.

Cheers.
I can agree with your findings, it appears to be working pretty well for me here too. I still have a bit of code cleanup and testing to do before I check it in, but hopefully everyone will be reasonably happy with the results. Of course it can always be fine-tuned in the future anyway if so desired.

In your particular case, using cities5000 instead might have caught those last remaining tiles?

Cheers,
Chris
Chris Miller wrote:
Thanks Johann, this sounds like quite a reasonable approach too. It shouldn't be too hard to add in to the splitter (eg by tacking the info on to the density map, or by holding on to the is_in data until the areas are known then calculating the names as a separate step). Looking at a few osm files the is_in tags and their values seem inconsistent at best though so I don't know how easy it would be to get sensible/consistent data from them. Another possible issue is that osm files without the tags wouldn't work at all.

Yes, that's true. As far as I can remember, the is_in tags were very inconsistent. But I had hoped my algorithm would be flexible enough to ignore this inconsistency. The idea was to extract each name at each level of the is_in tag. So if I take, for example, is_in = Germany,Bavaria,Munich,suburb,street name,.... then I count the frequency of all five words. Statistically, Germany will be the most used word in this tile.

Afterwards I try to find unique names for the tiles. The name Germany will occur in nearly all tiles, so it is not unique and will not be used. The region Bavaria will also be in more than one tile and will not be used. If the city of Munich is contained fully in one tile, that name is taken; otherwise I go down to the next level. So I get the most used name which is unique to this tile.
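The counting scheme described above can be sketched in Java roughly as follows. This is an illustration only and not the attached patch: the class and method names are hypothetical, "unique" is interpreted as "occurs in exactly one tile", and no sanitising of inconsistent is_in values is attempted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: name tiles by their most frequent is_in component that no other tile shares. */
final class IsInNamingSketch {

    /** Counts how often each comma-separated is_in component occurs within one tile. */
    static Map<String, Integer> countNames(List<String> isInValues) {
        Map<String, Integer> counts = new TreeMap<>();  // sorted, so ties resolve predictably
        for (String value : isInValues) {
            for (String part : value.split(",")) {
                String name = part.trim();
                if (!name.isEmpty()) {
                    counts.merge(name, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    /** For each tile, picks the most frequent name that occurs in no other tile. */
    static List<String> nameTiles(List<List<String>> isInPerTile, String fallback) {
        List<Map<String, Integer>> perTile = new ArrayList<>();
        Map<String, Integer> tilesContaining = new HashMap<>();
        for (List<String> values : isInPerTile) {
            Map<String, Integer> counts = countNames(values);
            perTile.add(counts);
            for (String name : counts.keySet()) {
                tilesContaining.merge(name, 1, Integer::sum);
            }
        }
        List<String> names = new ArrayList<>();
        for (Map<String, Integer> counts : perTile) {
            String best = fallback;
            int bestCount = 0;
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                boolean unique = tilesContaining.get(e.getKey()) == 1;
                if (unique && e.getValue() > bestCount) {
                    best = e.getKey();
                    bestCount = e.getValue();
                }
            }
            names.add(best);
        }
        return names;
    }

    public static void main(String[] args) {
        // Invented is_in values in the style of the example above.
        List<List<String>> tiles = Arrays.asList(
                Arrays.asList("Germany,Bavaria,Munich", "Germany,Bavaria,Munich", "Germany,Bavaria,Freising"),
                Arrays.asList("Germany,Bavaria,Augsburg", "Germany, Bavaria, Augsburg"));
        System.out.println(nameTiles(tiles, "OSM Map"));
    }
}
```

With larger tiles, city names start to be shared between neighbours less often than region names are, so the same rule naturally moves up or down the hierarchy as the tile size changes.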
If you do have some code that deals with filtering/sanitising the is_in data I'd be interested to see it however as it sounds like it would be worth investigating further.
Find attached a patch, which works against the relatively outdated R37. I've tried to update it to the recent splitter, but it won't work: there were some structural changes from SubArea to Area.

Regards,
Johann
Many thanks Johann, I'll take a look. I don't mind that the patch won't apply to the current splitter, it was more the logic you had used to sanitise the is_in values that I was interested in seeing since I imagine it took a bit of effort to get "right" :)

As an aside for anyone who's interested, here's a copy of the planet with tiles that have been named using the cities15000.zip GeoNames file (and max-nodes=1600000): http://maps.google.co.uk/maps?q=http:%2F%2Fredyeti.net%2Fosm%2Fplanet-named....

Regards,
Chris
Hello,

Currently if maxspeed=45 mph (with a space between the number and mph), mkgmap fails to read the maxspeed correctly. Below is a proposed patch to solve that issue.

Thanks,
N.

Index: src/uk/me/parabola/mkgmap/osmstyle/StyledConverter.java
===================================================================
--- src/uk/me/parabola/mkgmap/osmstyle/StyledConverter.java (revision 1182)
+++ src/uk/me/parabola/mkgmap/osmstyle/StyledConverter.java (working copy)
@@ -1168,7 +1168,7 @@
 if(speedTag.matches(".*mph")) // Check if it is a limit in mph
 {
-    speedTag = speedTag.replaceFirst("mph", "");
+    speedTag = speedTag.replaceFirst(" *mph", "");
     factor = 1.61;
 }
 else
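To see what the one-character change does, the two patterns can be compared in isolation. The Java sketch below is not mkgmap code: the helper names are made up, and since the patch does not show how StyledConverter parses the remaining number, the use of Integer.parseInt here is an assumption. It is a strict parser that rejects a trailing space, which is one way the old pattern could fail.

```java
/** Hypothetical sketch of the maxspeed handling touched by the patch. */
final class MaxSpeedSketch {

    /** Old pattern: removes "mph" only, so a space before it survives. */
    static String stripOld(String speedTag) {
        return speedTag.replaceFirst("mph", "");
    }

    /** Patched pattern: also removes any spaces directly before "mph". */
    static String stripNew(String speedTag) {
        return speedTag.replaceFirst(" *mph", "");
    }

    /** Converts a maxspeed value to km/h, or returns -1 if it is not a plain number. */
    static double toKmh(String speedTag) {
        double factor = 1.0;
        String number = speedTag;
        if (speedTag.matches(".*mph")) {  // same check as in the patch
            number = stripNew(speedTag);
            factor = 1.61;                // conversion factor used in the patch
        }
        try {
            return Integer.parseInt(number) * factor;  // assumed parser, see note above
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        // Brackets make the leftover space visible.
        System.out.println("[" + stripOld("45 mph") + "]");
        System.out.println("[" + stripNew("45 mph") + "]");
        System.out.println(toKmh("45 mph"));
    }
}
```

The ` *` quantifier also matches zero spaces, so values written without a space, such as 45mph, keep working as before.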
Could one also use these lines to replace % with °? Or, phrased better, can mkgmap calculate the tangent?
participants (5)
- Chris Miller
- Clinton Gladstone
- Felix Hartmann
- Johann Gail
- Nakor