multi-word street search

Hello! I have a problem with street search. In Poland (and I suppose a few other countries may have this issue too), streets named after people can have two forms: a long one, "Miroslawa Dudka", or a short one, "Dudka". There is no standard for tagging such names in OSM. The problem is that the Garmin/Mapsource search function matches names from the beginning of the string, so if I enter "Dudka" it will find all streets named "Dudka" but not "Miroslawa Dudka". Do you have any idea how I can deal with this issue? Best regards, Michal Rogala

There is no simple answer to this at the moment. I do plan to implement the possibility of adding names to the index where searching starts in the middle of the string. This is a problem with languages where the word for street is typically at the beginning of the name, such as French. The case for Poland is a bit different, if I understand correctly, in that you would want to search for both "Miroslawa Dutka" and just "Dutka"? Also, if there is no standard for tagging them and no way of recognising them, how would mkgmap know that it should treat them specially? Best wishes ..Steve

On 2013-02-25 22:43, Steve Ratcliffe wrote:
Also if there is no standard for tagging them, and no way of recognising them, then how would mkgmap know that it should treat them specially?
It might require implementing a list of street-type words (the local equivalents of "street") for each country. In many cases, for streets named after a person, people would want to refer to them by both first and last name, but indexing "street" wouldn't be useful at all.
-- Rich

On Mon, Feb 25, 2013 at 10:55:32PM +0200, Rich wrote:
it might require implementing a list of "street" names for each country. in many cases for streets named after some person people would want to refer to it by both first and last name, but "street" wouldn't be useful at all
This sounds like full-text search (inverted index). Each word in the street name would become an index entry. If we have street names "Calle b c" (id 1) and "Calle c d" (id 2) then we would have index entries like this:
Calle -> 1
Calle -> 2
b -> 1
c -> 1
c -> 2
d -> 2
In full-text search, common words such as 'a' or 'the' are often treated as useless garbage and thrown away both when indexing and when searching. These words are commonly referred to as stop-words. mkgmap could implement a stop-word list in country-specific rules, similar to how we set the flags for formatting addresses. This list could be something like 'Street,St,Ave,Lane' for English-speaking countries.
As far as I understand, mkgmap could easily do this splitting and stop-word implementation. Do Garmin devices implement multi-word street search? What happens if the user types "Main Street"? Will it search for either Main or Street, or both?
Marko
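To make the inverted-index idea concrete, here is a minimal Java sketch. It is not mkgmap code: the class and method names are made up, and stop words are assumed to be supplied in lower case. It indexes each lower-cased word, skips stop words both when indexing and when searching, and answers a query with AND semantics, which is only one possible answer to the "Main or Street, or both?" question, not necessarily what Garmin devices do.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch (not mkgmap code): lower-cased word -> ids of the
// street names containing it. Stop words (lower case) are skipped.
final class InvertedIndexSketch {

    static Map<String, List<Integer>> build(List<String> names, Set<String> stopWords) {
        Map<String, List<Integer>> index = new LinkedHashMap<>();
        for (int i = 0; i < names.size(); i++) {
            int id = i + 1; // ids start at 1, as in the "Calle b c" (id 1) example
            for (String word : words(names.get(i), stopWords)) {
                List<Integer> ids = index.computeIfAbsent(word, k -> new ArrayList<>());
                if (!ids.contains(id)) {
                    ids.add(id); // a word repeated within one name is indexed once
                }
            }
        }
        return index;
    }

    // AND semantics: ids of names that contain every non-stop word of the query.
    static Set<Integer> search(Map<String, List<Integer>> index, String query, Set<String> stopWords) {
        Set<Integer> result = null;
        for (String word : words(query, stopWords)) {
            Set<Integer> ids = new TreeSet<>(index.getOrDefault(word, List.of()));
            if (result == null) {
                result = ids;
            } else {
                result.retainAll(ids);
            }
        }
        return result == null ? Set.of() : result;
    }

    // Splits on whitespace, lower-cases, and drops empty tokens and stop words.
    private static List<String> words(String text, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!w.isEmpty() && !stopWords.contains(w)) {
                out.add(w);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> names = List.of("Calle b c", "Calle c d");
        Map<String, List<Integer>> index = build(names, Set.of());
        System.out.println(index);                              // {calle=[1, 2], b=[1], c=[1, 2], d=[2]}
        System.out.println(search(index, "Calle d", Set.of())); // [2]
        System.out.println(build(names, Set.of("calle")));     // {b=[1], c=[1, 2], d=[2]}
    }
}
```

With "calle" as a stop word, the `calle` key disappears from the index, and a query such as "Calle d" reduces to "d".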

Hi
Do Garmin devices implement multi-word street search? What happens if the user types "Main Street"? Will it search for either Main or Street, or both?
In the global index, it depends entirely on how it is built. You have entries that consist of the name and the character offset of the start character. So the entry "Main Street",1 would be sorted with the "M"s and you could have a second entry "Main Street",6 which would be sorted with "S" and allow you to search for "Street". So you can construct the index with as many name,offset pairs as you like, with the only downside being that the index could easily double in size if you just make every word indexable. ..Steve
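Here is a small Java sketch of the (name, offset) idea. It assumes 1-based offsets, as in the "Main Street",1 / "Main Street",6 example, and it is not the real MDR record layout. `Entry`, `entriesFor`, `buildIndex` and `search` are hypothetical names. Making every word indexable produces one entry per word, and that is where the index growth comes from.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: each index entry is a (name, offset) pair, where the
// offset is the 1-based position where searching starts. Entries are sorted
// and matched on the text from that offset onwards.
final class OffsetIndexSketch {

    static final class Entry {
        final String name;
        final int offset; // 1-based start character

        Entry(String name, int offset) {
            this.name = name;
            this.offset = offset;
        }

        // The text the entry is sorted and matched on.
        String key() {
            return name.substring(offset - 1);
        }

        @Override
        public String toString() {
            return "\"" + name + "\"," + offset;
        }
    }

    // One entry for every word start, i.e. every word is made indexable.
    static List<Entry> entriesFor(String name) {
        List<Entry> entries = new ArrayList<>();
        for (int i = 0; i < name.length(); i++) {
            boolean wordStart = (i == 0 || name.charAt(i - 1) == ' ') && name.charAt(i) != ' ';
            if (wordStart) {
                entries.add(new Entry(name, i + 1));
            }
        }
        return entries;
    }

    static List<Entry> buildIndex(List<String> names) {
        List<Entry> index = new ArrayList<>();
        for (String name : names) {
            index.addAll(entriesFor(name));
        }
        index.sort(Comparator.comparing(Entry::key, String.CASE_INSENSITIVE_ORDER));
        return index;
    }

    // Linear prefix scan for clarity; a real index would binary-search the sorted list.
    static List<Entry> search(List<Entry> index, String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        List<Entry> hits = new ArrayList<>();
        for (Entry e : index) {
            if (e.key().toLowerCase(Locale.ROOT).startsWith(p)) {
                hits.add(e);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Entry> index = buildIndex(List.of("Main Street", "Miroslawa Dudka"));
        System.out.println(index);
        // ["Miroslawa Dudka",11, "Main Street",1, "Miroslawa Dudka",1, "Main Street",6]
        System.out.println(search(index, "Dud")); // ["Miroslawa Dudka",11]
    }
}
```

Searching for "Dud" finds "Miroslawa Dudka" through its entry at offset 11, which is the case from the first post.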

On Tue, Feb 26, Steve Ratcliffe wrote:
So you can construct the index with as many name,offset pairs as you like, with the only downside being that the index could easily double in size if you just make every word indexable.
One idea I have here: make all words indexable except those that are "Street", "Avenue", "Road" or whatever else we have in the world that is part of nearly every street name in a country. And ignore all words that contain only 3 or fewer letters (Rd, Ave, Str, ...). I think this would be a great help when you search for a street. Thorsten -- Thorsten Kukuk, Project Manager/Release Manager SLES SUSE LINUX Products GmbH, Maxfeldstr. 5, D-90409 Nuernberg GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 16746 (AG Nürnberg)
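Thorsten's filter can be written as a predicate on words. This Java sketch uses a tiny English stop list, which is an illustrative assumption. It also always keeps offset 1, so the full name stays searchable as before. That is a further assumption not in his mail. Only the start offsets of the other words that pass the filter are added.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the proposed filter: a word gets its own index entry
// unless it is a common street-type word or has 3 letters or fewer.
final class IndexableWordFilter {

    // Illustrative stop list (lower case); a real one would depend on the country.
    static final Set<String> STREET_WORDS = Set.of("street", "avenue", "road");

    static boolean isIndexable(String word) {
        if (word.length() <= 3) {
            return false; // Rd, Ave, Str, ...
        }
        return !STREET_WORDS.contains(word.toLowerCase(Locale.ROOT));
    }

    // 1-based start offsets to index. Offset 1 (the whole name) is always kept,
    // an assumption of this sketch; other words are added only if indexable.
    static List<Integer> indexableOffsets(String name) {
        List<Integer> offsets = new ArrayList<>();
        offsets.add(1);
        int i = 0;
        while (i < name.length()) {
            while (i < name.length() && name.charAt(i) == ' ') i++; // skip spaces
            int start = i;
            while (i < name.length() && name.charAt(i) != ' ') i++; // scan one word
            if (start > 0 && start < i && isIndexable(name.substring(start, i))) {
                offsets.add(start + 1);
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        System.out.println(indexableOffsets("Old Kent Road"));          // [1, 5]
        System.out.println(indexableOffsets("Avenue of the Americas")); // [1, 15]
    }
}
```

For "Old Kent Road", "Road" is on the stop list, so only "Kent" gets an extra entry. A side effect of the length rule is that genuine three-letter names would never be indexed on their own.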

I think it's worth introducing some internal variable, like mkgmap:stop_words, where you can specify your own word list instead of hard-coding it. In my example from the first post there is no problem with the word "street", because it is not included in OSM street names in Poland. The main difficulty with searching is that a street can have two different, yet totally valid, names:
Miroslawa Dutka - first name and surname of a person
Dutka - just the surname
Of course in OSM we enter the full first name and surname, but 99% of people use just the surname when specifying an address. Indexing every word in the street name would solve this problem :). best regards Michal Rogala
2013/2/26 Thorsten Kukuk <kukuk@suse.de>
_______________________________________________ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://lists.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

On Wed, Feb 27, 2013 at 01:01:31AM +0100, Michał Rogala wrote:
I think that its worth introducing some internal variable, like mkgmap:stop_words where you can specify your custom wordlist instead of hard-coding it.
Right, and it should be set in the location rules, because the stop words vary from country to country. 'Ave' or 'St' might be stop words in some countries, but in a different country someone might want to use them to find 'Calle Ave Maria' or 'Via St Peter' :-)
In my example from the first post there is no problem with the "street" word because it is not included in OSM street names in Poland
I guess it is mostly the same in Estonia, but there are some street signs that end in pst (puiestee, avenue), tn (tänav, street) or mnt (maantee, road). In Finland, the 'street' part is often a suffix of the name: -katu, -polku, -rinne, -tie, as in Kisatie (competition way). Only if the way designation is avenue (puistotie), or the way is named after a person, is the way designation a separate word. Marko
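The stop words vary by country, so they would belong in the location rules. This Java sketch keys the stop-word lists by country. The country codes, word lists, and the `define`/`indexableWords` names are illustrative assumptions. mkgmap:stop_words was only a proposal in this thread.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch of per-country stop-word lists, as they might be set in
// location rules. Country codes and word lists are illustrative only.
final class CountryStopWords {

    private final Map<String, Set<String>> byCountry = new HashMap<>();

    // Parses a comma-separated list such as "Street,St,Ave,Lane".
    void define(String countryCode, String commaSeparated) {
        Set<String> words = Arrays.stream(commaSeparated.split(","))
                .map(String::trim)
                .filter(w -> !w.isEmpty())
                .map(w -> w.toLowerCase(Locale.ROOT))
                .collect(Collectors.toSet());
        byCountry.put(countryCode, words);
    }

    // Words of the name that stay indexable in the given country
    // (countries without a list keep every word).
    List<String> indexableWords(String countryCode, String name) {
        Set<String> stop = byCountry.getOrDefault(countryCode, Set.of());
        return Arrays.stream(name.trim().split("\\s+"))
                .filter(w -> !w.isEmpty())
                .filter(w -> !stop.contains(w.toLowerCase(Locale.ROOT)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        CountryStopWords rules = new CountryStopWords();
        rules.define("GBR", "Street,St,Ave,Lane");
        rules.define("ITA", "Via");
        System.out.println(rules.indexableWords("GBR", "High St"));      // [High]
        System.out.println(rules.indexableWords("ITA", "Via St Peter")); // [St, Peter]
    }
}
```

"St" is dropped for "GBR" but kept for "ITA", so 'Via St Peter' stays searchable. A whole-word list cannot remove Finnish suffixes such as -katu, because they are part of the word.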

Hi, I have decided to try a simple solution for this problem. I have modified the addStreet() procedure in the file imgfmt\app\mdr\MDRFile.java. While I don't know all the ramifications of this change, search seems to work as expected; see the attached picture from Mapsource. The attached patch was created half-manually; I hope it will work in your environment. -- Best regards, Andrzej

Andrzej, I hope this will be committed soon; it would be a great improvement for mkgmap search. Even better if it also works for address search.

Andrzej, I couldn't find the description of the "problem" this aims to fix, but if I am guessing right, it will add individual words from street names into the index so they can be found when searching for an address? Will this not add millions of entries for "The" and "Road" and similar articles, prepositions and road types? It will depend on the language as to what is written separately and what is joined to other words. In English/French everything is separate, but in Dutch/German the rules are of course different. To start with, does anyone have any idea how we could implement a language-dependent stop-word list (words to be ignored completely)?
Colin
On 2014-12-06 10:12, Minko wrote:
Andrzej, I hope this will be committed soon, would be a great improvement for mkgmap search. Even better if it also works for address search.

This patch is interesting. Although I don't think I would use it the way it is, it does give me some ideas for adding other road aliases. Many roads have an alt_name, which would be useful to index; e.g. Avenue of the Americas and 6th Avenue are the same road. Ben has some patches for abbreviating street names, which work, but you have to remember to use the abbreviations when using address search. E.g., I'd like both "W 43rd St" and "West 43rd Street" to be in the index.
On Sat Dec 06 2014 at 7:27:35 AM Colin Smale <colin.smale@xs4all.nl> wrote:
Andrzej, I couldn't find the description of the "problem" this aims to fix but if I am guessing right it will add individual words from street names into the index so they can be found when searching for an address?
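As a sketch of the road-alias idea, this Java snippet returns every string a road could be indexed under: its name, its alt_name, and versions of both with abbreviations expanded. The abbreviation table and method names are hypothetical examples, not Ben's patches.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: collect every string a road should be indexed under,
// i.e. name, alt_name, and both with abbreviations expanded.
final class StreetAliases {

    // Illustrative abbreviation table; a real one would be per country.
    static final Map<String, String> EXPANSIONS = Map.of(
            "W", "West", "E", "East", "St", "Street", "Ave", "Avenue");

    // Replaces whole words found in EXPANSIONS, leaving other words untouched.
    static String expand(String name) {
        StringBuilder sb = new StringBuilder();
        for (String word : name.trim().split("\\s+")) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(EXPANSIONS.getOrDefault(word, word));
        }
        return sb.toString();
    }

    // Insertion-ordered and duplicate-free: each distinct string is indexed once.
    static Set<String> indexNames(String name, String altName) {
        Set<String> names = new LinkedHashSet<>();
        for (String n : new String[] {name, altName}) {
            if (n == null || n.isBlank()) {
                continue;
            }
            names.add(n.trim());
            names.add(expand(n));
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(indexNames("W 43rd St", null));
        // [W 43rd St, West 43rd Street]
        System.out.println(indexNames("Avenue of the Americas", "6th Avenue"));
        // [Avenue of the Americas, 6th Avenue]
    }
}
```

Expanding blindly runs into the same ambiguity raised earlier in the thread: "St" can also mean Saint (as in 'Via St Peter'), so the table would need to be per country.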

Hi, the problem is that mkgmap creates the search index based on the full street label. This is not the way people use street names. Usually we use a shortened name; for example, instead of "Allée Benjamin Franklin" or "Rue Benjamin Franklin" we simply remember "Franklin". With the current version of the index, you won't find "Franklin" street unless you know whether it is "Rue" or "Allée", since you have to input the correct full street name for the search index to work. My idea is to add additional entries to the search index, so "Allée Benjamin Franklin" is additionally indexed as "Benjamin Franklin" and "Franklin". This is a different feature from alternative names. Alternative names are already supported: you can have up to 4 names for a street and all are indexed. My patch is very simple and doesn't check for common words like "the" or "and". But it doesn't add each separate word to the search index, only shortened sequences of words. So there is a superfluous index entry for "Street" in England (it is the last word), but not for "The" (it is usually the first word). I have compiled a map of Europe and the resulting index is about 590MB, maybe 30-40% bigger than without the patch. I think this is a reasonable size; for comparison, the index of City Navigator Europe is about 1GB. -- Best regards, Andrzej
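The shortened-label approach adds the shortened labels as separate strings. Here is a Java sketch of generating them. It assumes words are separated by whitespace. `trailingSequences` is a hypothetical name, not the code in the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the shortened-label idea: besides the full label,
// index every trailing sequence of its words as a separate string.
final class TrailingWords {

    static List<String> trailingSequences(String label) {
        List<String> out = new ArrayList<>();
        String trimmed = label.trim();
        if (trimmed.isEmpty()) {
            return out;
        }
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i < words.length; i++) {
            // i == 0 is the full label; each later i drops one more leading word
            out.add(String.join(" ", Arrays.copyOfRange(words, i, words.length)));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(trailingSequences("Rue Benjamin Franklin"));
        // [Rue Benjamin Franklin, Benjamin Franklin, Franklin]
    }
}
```

A label with n words yields n strings. Only trailing sequences are generated, so a final "Street" gets an entry of its own, while a leading "The" never does.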

Hi, I wrote a Garmin-compatible implementation of searching on multiple words last year. It is still on the 'mixed-index' branch and I have just updated it to the latest trunk. Everyone who is interested should take a look and try it out. I was trying to automatically determine which words are not useful to index because they occur too often. This wasn't very successful and I think we just need a per-country configuration. Of course it really has more to do with language than country, but language is not so easily accessible in mkgmap. ..Steve
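One common way to find over-frequent words automatically is document frequency: count how many names each word appears in. This Java sketch is hypothetical. The names and the threshold are arbitrary choices, and it does not reproduce what the mixed-index branch tried. It flags words that occur in more than a given fraction of all names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: words occurring in more than maxFraction of all names
// are treated as "too common to index".
final class FrequentWords {

    static Set<String> frequentWords(List<String> names, double maxFraction) {
        Map<String, Integer> docFreq = new HashMap<>();
        for (String name : names) {
            Set<String> seen = new HashSet<>(); // count each word once per name
            for (String w : name.toLowerCase(Locale.ROOT).split("[ -]+")) {
                if (!w.isEmpty() && seen.add(w)) {
                    docFreq.merge(w, 1, Integer::sum);
                }
            }
        }
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Integer> e : docFreq.entrySet()) {
            if (e.getValue() > maxFraction * names.size()) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = List.of("Calle Mayor", "Calle de la Paz", "Calle Real", "Plaza Mayor");
        System.out.println(frequentWords(names, 0.5)); // [calle]
    }
}
```

This automatic detection wasn't very successful. A frequency cut-off also depends on the data being compiled, which is one argument for the per-country (or per-language) configuration.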

Hi Steve, I have looked at the "mixed-index" branch. I see I have repeated your idea, only in a cruder way. IMHO you could release this version without the optimization for popular words; it should already be usable. I would add some ideas. For example, you could split labels not only on spaces but on hyphens "-" too. This would probably help with German streets and some hyphenated names. As for detecting popular words, I think this should be done for each language separately. It would be easier to analyze the results, and it would help where a word has different meanings in different countries: it could be dropped in one language only. A note: I think the original label should always be indexed. In the current version you won't find a way named Calle in Germany: http://www.openstreetmap.org/way/26258052 -- Best regards, Andrzej
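The two suggestions, splitting at hyphens and always indexing the original label, can be sketched in Java as follows. The stop-word set is an assumed input and `indexEntries` is a hypothetical name. The point is that a label consisting only of a stop word, such as "Calle", still gets its full-label entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: words start after a space or a hyphen. The full label
// is always indexed; a partial entry (text from a word start to the end) is
// added only when that word is not a stop word.
final class HyphenAwareEntries {

    private static boolean isSeparator(char c) {
        return c == ' ' || c == '-';
    }

    // stopWords must be lower case.
    static List<String> indexEntries(String label, Set<String> stopWords) {
        List<String> entries = new ArrayList<>();
        if (label.isBlank()) {
            return entries;
        }
        entries.add(label); // original label, even if it is itself a stop word
        for (int i = 1; i < label.length(); i++) {
            if (!isSeparator(label.charAt(i - 1)) || isSeparator(label.charAt(i))) {
                continue; // not the start of a word
            }
            int end = i;
            while (end < label.length() && !isSeparator(label.charAt(end))) end++;
            String word = label.substring(i, end);
            if (!stopWords.contains(word.toLowerCase(Locale.ROOT))) {
                entries.add(label.substring(i));
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        System.out.println(indexEntries("Karl-Marx-Allee", Set.of()));
        // [Karl-Marx-Allee, Marx-Allee, Allee]
        System.out.println(indexEntries("Calle", Set.of("calle"))); // [Calle]
    }
}
```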

On 08/12/14 at 16:57, Andrzej Popowski wrote:
I have looked at "mixed-index" branch. I see I have repeated your idea, only in a more crude way.
I'm testing the mixed-index jar, but I get the error below, with the default style and with mine, when the index is being created:
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 0
    at java.lang.String.charAt(String.java:658)
    at uk.me.parabola.imgfmt.app.mdr.Mdr7.createPartials(Mdr7.java:162)
    at uk.me.parabola.imgfmt.app.mdr.Mdr7.preWriteImpl(Mdr7.java:108)
    at uk.me.parabola.imgfmt.app.mdr.MdrSection.preWrite(MdrSection.java:129)
    at uk.me.parabola.imgfmt.app.mdr.MDRFile.writeSections(MDRFile.java:308)
    at uk.me.parabola.imgfmt.app.mdr.MDRFile.write(MDRFile.java:247)
    at uk.me.parabola.mkgmap.combiners.MdrBuilder.onFinish(MdrBuilder.java:338)
    at uk.me.parabola.mkgmap.main.Main.endOptions(Main.java:575)
    at uk.me.parabola.mkgmap.CommandArgsReader.readArgs(CommandArgsReader.java:128)
    at uk.me.parabola.mkgmap.main.Main.mainStart(Main.java:134)
    at uk.me.parabola.mkgmap.main.Main.main(Main.java:105)
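The thread never shows what line 162 of Mdr7.createPartials did. For reference only, here is one common way this exception arises. In Java 8, `String.charAt(0)` on an empty string throws exactly `StringIndexOutOfBoundsException: String index out of range: 0`, and `String.split(" ")` produces empty tokens for leading or repeated spaces. The hypothetical guard below illustrates that pattern; it is not a diagnosis of the mkgmap bug.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration (not mkgmap code) of guarding charAt(0) against
// empty tokens produced by String.split(" ").
final class SafeWordSplit {

    // Returns the words of a label that start with a letter, e.g. to skip
    // number-like tokens. Empty tokens are skipped before charAt(0) is called.
    static List<String> wordsStartingWithLetter(String label) {
        List<String> out = new ArrayList<>();
        for (String token : label.split(" ")) {
            if (token.isEmpty()) {
                continue; // " Calle".split(" ") and "A  B".split(" ") both contain ""
            }
            if (Character.isLetter(token.charAt(0))) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(wordsStartingWithLetter(" Calle  3 Mayo")); // [Calle, Mayo]
        try {
            "".charAt(0); // the unguarded call
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("empty token: " + e.getMessage());
        }
    }
}
```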

Hi, OK, I will remove all that code tomorrow, as it is not required. I think that problem mostly happens on small files. Steve
On 8 December 2014 22:12:24 GMT+00:00, "Carlos Dávila" <cdavilam@orangecorreo.es> wrote:
I'm testing mixed-index jar, but get the error below with default style and with mine when the index is being created: Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 0

On 08/12/14 22:12, Carlos Dávila wrote:
I'm testing mixed-index jar, but get the error below with default style and with mine when the index is being created: Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 0
OK this should be fixed now. ..Steve

Hi Steve, I have tested the mixed-index branch and it doesn't work for me. Mkgmap creates a bigger index file, but search works the same in Mapsource: no results when searching for a partial label. I'm not able to find the reason. I think the function writeSectData doesn't save partial labels, but I can't tell if this is the problem. I haven't tested on a device. I have modified my version a bit to test some other ideas; the patch is attached. I have removed text in parentheses from partial search, added a split at hyphens and excluded numbers at the end of the label. -- Best regards, Andrzej
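Here is a rough Java sketch of the three tweaks listed in this message. It reads "excluded numbers at the end of the label" as dropping trailing all-digit words, which is an assumption. The regular expressions and method names are hypothetical and not taken from the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of three label tweaks for partial search: drop text in
// parentheses, drop trailing numbers, split at spaces and hyphens.
final class PartialLabelCleaner {

    private static final Pattern PARENTHESES = Pattern.compile("\\([^)]*\\)");
    private static final Pattern SPACES = Pattern.compile("\\s+");
    private static final Pattern TRAILING_NUMBERS = Pattern.compile("( \\d+)+$");

    // Text used for partial search; the full label would still be indexed as-is.
    static String clean(String label) {
        String s = PARENTHESES.matcher(label).replaceAll(" ");
        s = SPACES.matcher(s.trim()).replaceAll(" ");
        return TRAILING_NUMBERS.matcher(s).replaceAll("");
    }

    // Words after the first one, split at spaces and hyphens: the places
    // where a partial search could start.
    static List<String> partialStarts(String label) {
        String cleaned = clean(label);
        if (cleaned.isEmpty()) {
            return List.of();
        }
        String[] words = cleaned.split("[ -]+");
        return new ArrayList<>(Arrays.asList(words).subList(1, words.length));
    }

    public static void main(String[] args) {
        System.out.println(clean("I Brygady Pancernej Wojska Polskiego(6)"));
        // I Brygady Pancernej Wojska Polskiego
        System.out.println(partialStarts("Kolejowa-Boczna (stara) 5")); // [Boczna]
    }
}
```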

On 10/12/14 at 02:25, Andrzej Popowski wrote:
I have tested mixed-index branch and it doesn't work for me.
It doesn't work for me either. For example I can't find "Calle Las Grullas" searching by "Grullas". Not tested your patch yet...

Hi Andrzej, thanks for testing. I've tried using it again and there are problems with the merged version. Sometimes it works for me and sometimes it does not. Typing letters brings up the expected streets, but when I actually try to find them, it often does not find anything. Sometimes there is an empty list; sometimes I get the pop-up message saying that this street is not present in this map product. I went back to the version before merging with trunk and this works a lot better, as I remember it. So there is an incompatibility with changes that have been made since then, which I will look into: it's been over a year! I uploaded the jar that seems to work for me: http://files.mkgmap.org.uk/download/237/mkgmap.jar ..Steve

With this "old" jar things still don't work for me. Some examples:
- If I type any prefix of "grullas" (from "g" up to the full word) in the street search box, nothing is found. Streets in the selection list don't contain the searched string; most of them begin with cm- plus a number (road refs). If I type "calle las grullas" it is correctly found.
- If I type a full street name such as "calle abelardo", it finds "calle abelardo gallego" and "calle abelardo lopez ayala".
- If I type "abel", the selection list shows some streets containing "abel" and then "calle san abelardo", "calle de abelardo", "camin d'abelardo", which seems correct; but if I add an "a" ("abela"), the selection list begins with streets containing "abeja", then "abel", and finally streets containing "abela", like the ones above.
For some of the streets shown in the selection list, if I click the Find button I get the "no valid in this map" message.
On 10/12/14 at 14:11, Steve Ratcliffe wrote:
I uploaded the jar that seems to work for me: http://files.mkgmap.org.uk/download/237/mkgmap.jar
-- Please do not send me documents with the extensions .doc, .docx, .xls, .xlsx, .ppt, .pptx, .mdb, mdbx. Install LibreOffice from http://es.libreoffice.org/descarga/ LibreOffice is free: it can be copied, modified and redistributed freely. Free of charge and completely legal. LibreOffice is under continuous development and you will not have to pay for new versions.

Hi
With this "old" jar things still don't work for me.
Well, that is interesting but disappointing; it sounds very much like what happens for me with the 'new' version. I do worry that there is a problem when there are very many names beginning the same way (e.g. calle), and that it might not work then. I am going to put together some examples that work (for me). ..Steve

On 10/12/14 01:25, Andrzej Popowski wrote:
I have modified a bit my version to test some other ideas, patch is attached. I have removed text in parenthesis form partial search, added split at hyphen and excluded numbers at the end of label.
Thanks, yes, it is useful to experiment with different ways of splitting the names up. We could always put this patch in and then convert to the Garmin format when it is working. ..Steve

Hi Steve, what is the reason for using "dirty names" for sorting? You have this comment in the code:
// We sort on the dirty name (ie with the Garmin shield codes) although those codes do not
// affect the sort order. The string for mdr15 does not include the shield codes.
I have tried replacing the "dirty names" with "clean names" and search seems to work in Mapsource. Maybe it depends on the label coding? Processing clean names would be easier. -- Best regards, Andrzej

Hi Andrzej
what is the reason for using "dirty names" for sorting? You have this comment in code:
Because I believed that was the correct way to do it.
// We sort on the dirty name (ie with the Garmin shield codes) although those codes do not
// affect the sort order. The string for mdr15 does not include the shield codes.
This is because it shouldn't make much difference: if A sorts before B, then A will also sort before <shield>B, and <shield>A will sort before B. Different kinds of shields do (or can) sort among themselves, so <shieldA>A and <shieldB>A can sort in that order, but this is entirely overridden by what comes next, so <shieldB>A would be before <shieldA>B.
I have tried to swap "dirty names" with "clean names" and search seems to work in Mapsource. Maybe it depends on label coding?
So that is interesting, it really shouldn't make any/much difference in practice (it may have done when I wrote the comment). So we need to find out what difference it makes to the sort order.
Processing of clean names would be easier.
Maybe, but it's not about what is easier ;) If you could work out what the difference is, that would be great. ..Steve
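The shield-code sort argument can be checked with a comparator that orders on the visible text first and uses the shield code only as a tie-breaker. This Java sketch assumes shield codes are leading control characters below 0x20. It uses plain `String.compareTo`, whereas real Garmin sorting follows the map's sort description. Codes in the middle of a label are not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: order "dirty" labels (with a leading shield code) so
// the visible text decides and the shield code only breaks ties.
final class ShieldAwareSort {

    // Assumption: shield codes are control characters below 0x20 at the start.
    static String clean(String dirty) {
        int i = 0;
        while (i < dirty.length() && dirty.charAt(i) < 0x20) i++;
        return dirty.substring(i);
    }

    static int compare(String a, String b) {
        int byText = clean(a).compareTo(clean(b));
        return byText != 0 ? byText : a.compareTo(b);
    }

    public static void main(String[] args) {
        String shieldA = "\u0002", shieldB = "\u0003"; // two hypothetical shield codes
        List<String> labels = new ArrayList<>(List.of(shieldA + "B", shieldB + "A", shieldA + "A", "B"));
        labels.sort(ShieldAwareSort::compare);
        for (String s : labels) {
            System.out.println(s.replace(shieldA, "<shieldA>").replace(shieldB, "<shieldB>"));
        }
        // <shieldA>A, <shieldB>A, <shieldA>B, B (one per line)
    }
}
```

The shield only decides between labels with the same visible text, so <shieldB>A sorts before <shieldA>B, as described above.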

Hi Steve,
This is because it shouldn't make much difference: if A sorts before B then A will also sort before <shield>B and <shield>A will sort before B.
But there are more codes, like the separation code, which can be placed in the middle of a label. Or maybe they aren't available in mkgmap? I have never used them.
So we need to find out what difference it makes to the sort order.
Well, I have done more tests with a nuvi. Devices perform search differently. My findings are the following:
- My patches don't work on the device.
- Mixed-index works on the device.
- Mixed-index behaves similarly to City Navigator maps in Mapsource and on the device.
I have tried the mkgmap.jar that you provided and a compilation of the current mixed-index branch. Both worked correctly on the device. There is no easy partial search in Mapsource like with my patch, but search in Mapsource is usable: if you type a partial string, Mapsource offers valid labels to choose from. Mixed-index works in BaseCamp too. My conclusion is that your solution in mixed-index is correct (maybe it needs some cleaning; there is still test code active). I encourage users to make more tests. -- Best regards, Andrzej

Hi Steve, I tried the branch with a map for Poland. I can't test with the device because it is broken (my own fault, I tried to exchange the display). A few results from MapSource:
The map contains many streets named "Ignacego Paderewskiego" (in different cities), and also some named "Ignacego Jana Paderewskiego". A lot of street names start with "Ignacego". The data is here: http://files.mkgmap.org.uk/
I used the command
java -jar mkgmap.jar --route --housenumbers --index --bounds=f:\osm\bounds-latest.zip --nsis 63240001.osm.pbf
to produce the map. In MapSource I search for an address, leaving all fields besides the street name empty; the result list is limited to 20 entries.
- search for "Ignacego":
  + 1st hit: "Kraszewskiego Józefa Ignacego"
  + followed by a sorted list of roads starting with "Ignacego"
  + followed by a sorted list of roads having "Ignacego" as the 2nd word
  + followed by a list of roads having a word starting with "Ii"
- search for "Paderew":
  + 1st hit: "Ignacego Paderewskiego"
  + followed by results like "Pallacowa", "Palmova", "Pancerna", "6 I Brygady Pancernej Wojska Polskiego", "I Brygady Pancernej Wojska Polskiego(6)", which seems to be a sorted list of roads containing the string "Pa"
    ++ When I select "6 I Brygady Pancernej Wojska Polskiego", the street is not found
    ++ When I select "I Brygady Pancernej Wojska Polskiego(6)", the street is found
    ++ The name in OSM is "I Brygady Pancernej Wojska Polskiego"
  + I DON'T see the one named "Ignacego Jana Paderewskiego" in this list
- search for "Jana":
  + 1st hit: "Swietego Jana"
  + followed by some roads starting with "Jana" or having "Jana" as the 2nd word. Interesting: the list seems to be sorted by the word following "Jana", no matter whether "Jana" is the 1st word or not (see screenshot Jana.png)
- search for "jana pa":
  + 1st hit: "Ignacego Jana Paderewskiego"
  + followed by a list of roads with "jana pa" in the name
  + followed by a list of roads with "jana ??" in the name, where ?? is Ry, Si, Sk, ..., so these seem to be the roads sorted by the word following "Jana"
Conclusion: the MapSource algorithm seems to list
1) all street names where the last word begins with the search string (dashes seem to be treated like blanks)
2) all street names containing the 1st word of the search string as the beginning of a word, sorted by the characters following the search string.
Problem:
- all listed streets starting with a digit don't seem to exist (I guess these are refs?)
I hope it helps somehow. Let me know if I should test other combinations. Gerd
Date: Thu, 11 Dec 2014 01:11:26 +0100
From: popej@poczta.onet.pl
To: mkgmap-dev@lists.mkgmap.org.uk
Subject: Re: [mkgmap-dev] multi-word street search
Hi Steve,
This is because it shouldn't make much difference: if A sorts before B then A will also sort before <shield>B and <shield>A will sort before B.
But there are more codes, like separation code, which can be placed in the middle of the label. Or maybe they aren't available in mkgmap? I have never used them.
So we need to find out what difference it makes to the sort order.
Well, I have done more tests with nuvi. Devices perform search differently. My findings are following: - My patches don't work in device. - Mixed-index works in device. - Mixed-index behaves similarly to City Navigator maps in Mapsource and in device.
I have tried the mkgmap.jar that you provided, and a compilation of the current mixed-index branch. Both worked correctly in the device. There is no easy partial search in MapSource, as there was with my patch, but search in MapSource is usable: if you type a partial string, MapSource offers valid labels to choose from. Mixed-index works in BaseCamp too.
My conclusion is that your solution in mixed-index is correct (maybe it needs some cleaning; there is still test code active). I encourage users to do more testing.
-- Best regards, Andrzej _______________________________________________ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

Hi Gerd
Thanks, let's make this file our test case for the moment. In particular:
- search for "Paderew": + 1st hit: "Ignacego Paderewskiego" + followed by results like "Pallacowa", "Palmova","Pancerna", "6 I Brygady Pancernej Wojska Polskiego", "I Brygady Pancernej Wojska Polskiego(6)" which seems to be a sorted list of roads containing the string "Pa" ++ When I select "6 I Brygady Pancernej Wojska Polskiego", the street is not found ++ When I select "I Brygady Pancernej Wojska Polskiego(6)" the street is found ++ The name in OSM is ""I Brygady Pancernej Wojska Polskiego"" + I DON'T see the one named "Ignacego Jana Paderewskiego" in this list
I tried it with the 'old' version and it was worse... I notice that MdrCheck gives many errors, which might just be because it does not fully understand the new fields, or it may be a genuine problem to look into. So I will start there.
..Steve

Hi
+ I DON'T see the one named "Ignacego Jana Paderewskiego" in this list
I've fixed this problem. It also behaves a bit better for 'Jana': where the whole name is just 'Jana', it now comes first. I haven't started on the 'not in map' problems yet; the problem is probably in mdr20.
..Steve

Hi
The Poland test file now works with r3373 in MapSource for me. All roads are found and the expected names are displayed. It should work on a device if transferred with MapSource, but probably not if the gmapsupp is generated directly by mkgmap (--gmapsupp), at least if you need to enter a city.
..Steve

Hi Steve,
yes, it seems to work well with --latin1, but --unicode still doesn't work.
Gerd

Steve Ratcliffe wrote:
Hi
The Poland test file now works with r3373 in MapSource for me. All roads are found and the expected names are displayed.
It should work on a device if transferred with mapsource, probably not after generating the gmapsupp directly with gmapsupp, at least if you need to enter a city.
..Steve
-- View this message in context: http://gis.19327.n5.nabble.com/multi-word-street-search-tp5750803p5827271.ht... Sent from the Mkgmap Development mailing list archive at Nabble.com.

Hi,
seems to work well with --latin1, but --unicode still doesn't work.
I have assumed that the name offset in MDR7 is in bytes, not in characters. Attached is a patch for the mixed-index branch which recalculates the offset. It seems to work in MapSource, with both cp1250 and UTF-8, but please check on your examples. I haven't tested in a device.
-- Best regards, Andrzej
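A minimal Java sketch of that recalculation, assuming (as Andrzej does, and as Steve confirms below) that the MDR7 offset counts bytes of the encoded label, while mkgmap naturally computes it as a Java char index. ISO-8859-1 stands in here for any single-byte code page such as cp1250 or cp1252; the class and method names are hypothetical and not taken from the patch.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class NameOffset {
    // Converts an offset counted in Java chars into an offset counted in
    // encoded bytes, by encoding the part of the label that precedes the word.
    static int byteOffset(String label, int charOffset, Charset cs) {
        return label.substring(0, charOffset).getBytes(cs).length;
    }

    public static void main(String[] args) {
        String label = "Kraszewskiego J\u00f3zefa Ignacego";
        int chars = label.indexOf("Ignacego");
        System.out.println(chars);                                                 // 21
        System.out.println(byteOffset(label, chars, StandardCharsets.ISO_8859_1)); // 21
        System.out.println(byteOffset(label, chars, StandardCharsets.UTF_8));      // 22
    }
}
```

With a single-byte code page the two offsets coincide, which would explain why the problem only showed up with --unicode: the "ó" takes two bytes in UTF-8 and shifts every later word by one.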

Hi Andrzej
I have assumed, that name offset in MDR7 is in bytes, not in characters. Attached is a patch for mixed-index branch, which recalculate offset.
Great, you've confirmed it.
It seems to work in Mapsource, both cp1250 and UTF-8, but please check on your examples. I haven't tested in device.
Still need to deal with initial shield characters; I've combined everything and will commit the result. Seems to work with all the tests I was using on latin1 and unicode. Even works on ascii, which surprised me a bit! ..Steve

On 15/12/14 at 12:22, Steve Ratcliffe wrote:
Hi
The Poland test file now works with r3373 in MapSource for me. All roads are found and the expected names are displayed.
It should work on a device if transferred with mapsource, probably not after generating the gmapsupp directly with gmapsupp, at least if you need to enter a city.
..Steve
r3373 works much better with my test case; now there are only minor issues. The selection list is not sorted and some streets are repeated in it. See the attached screenshot.

Hi,
Selection list is not sorted and some streets are repeated in it
My next experiment to remove repeated streets; patch attached. And here is mkgmap for tests: http://files.mkgmap.org.uk/download/240/mkgmap.jar
-- Best regards, Andrzej

On 15/12/14 23:44, Andrzej Popowski wrote:
My next experiment to remove repeated streets, patch attached.
How does this remove repeated streets? It does make the sorting nicer, although "Jana" sorts after "Jana Bazynskiego" and so on. It might be better to append just the initial part of the name, not the full name.
..Steve

Hi Steve,
How does this remove repeated streets?
Your code for removing duplicates checks both the name and the partial name, so I have expanded the search key to include both. It seems to help, probably because of cases where the partial name and offset are the same but the full name is different.
-- Best regards, Andrzej
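A possible reading of why the wider key helps: if duplicates are removed by comparing each entry only with its neighbour, then sorting on the partial name alone can leave two copies of the same street separated by a different full name with the same partial name. The hypothetical sketch below (not mkgmap's Mdr7 code; java.text.Collator stands in for the SRT sort order) sorts on partial name, then full name, and then drops adjacent repeats. Records need Java 16 or later.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

final class AdjacentDedup {
    // Hypothetical index entry: full street name plus the char offset of the indexed word.
    record Entry(String fullName, int offset) {
        String partialName() { return fullName.substring(offset); }
    }

    static List<Entry> sortByPartialOnly(List<Entry> in, Collator collator) {
        List<Entry> out = new ArrayList<>(in);
        out.sort(Comparator.comparing(Entry::partialName, collator)); // List.sort is stable
        return out;
    }

    // Partial name first, full name as tie-break, so identical entries become neighbours.
    static List<Entry> sortByPartialThenFull(List<Entry> in, Collator collator) {
        Comparator<Entry> cmp = Comparator.comparing(Entry::partialName, collator)
                .thenComparing(Entry::fullName, collator);
        List<Entry> out = new ArrayList<>(in);
        out.sort(cmp);
        return out;
    }

    // Drops an entry when it repeats the previous one in both full and partial name
    // (exact string equality here).
    static List<Entry> removeAdjacentDuplicates(List<Entry> sorted) {
        List<Entry> out = new ArrayList<>();
        for (Entry e : sorted) {
            Entry prev = out.isEmpty() ? null : out.get(out.size() - 1);
            if (prev != null && prev.fullName().equals(e.fullName())
                    && prev.partialName().equals(e.partialName())) {
                continue;
            }
            out.add(e);
        }
        return out;
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.ROOT);
        List<Entry> in = List.of(
                new Entry("Ignacego Paderewskiego", 9),
                new Entry("Ignacego Jana Paderewskiego", 14),
                new Entry("Ignacego Paderewskiego", 9));
        System.out.println(removeAdjacentDuplicates(sortByPartialOnly(in, collator)).size());     // 3
        System.out.println(removeAdjacentDuplicates(sortByPartialThenFull(in, collator)).size()); // 2
    }
}
```

All three entries share the partial name "Paderewskiego", so the partial-only sort keeps the input order and the two "Ignacego Paderewskiego" entries never meet.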

On 16/12/14 00:43, Andrzej Popowski wrote:
Your code for removing duplicates checks for both: name and partial name, so I have expanded search key to include both. Seems to help, probably because of cases, when partial name and offset is the same but full name is different.
OK, you are right: the initial part of the name would sort randomly. I've sorted on the full name in a different way in the latest commit, which hopefully has the same effect.
Thanks ..Steve

Hi Steve,
I have experienced a crash with the recent version:
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: -126
    at java.lang.String.substring(Unknown Source)
    at uk.me.parabola.imgfmt.app.mdr.Mdr7Record.getPartialName(Mdr7Record.java:114)
    at uk.me.parabola.imgfmt.app.mdr.Mdr7.preWriteImpl(Mdr7.java:162)
    at uk.me.parabola.imgfmt.app.mdr.MdrSection.preWrite(MdrSection.java:129)
    at uk.me.parabola.imgfmt.app.mdr.MDRFile.writeSections(MDRFile.java:308)
    at uk.me.parabola.imgfmt.app.mdr.MDRFile.write(MDRFile.java:247)
    at uk.me.parabola.mkgmap.combiners.MdrBuilder.onFinish(MdrBuilder.java:338)
    at uk.me.parabola.mkgmap.main.Main.endOptions(Main.java:575)
    at uk.me.parabola.mkgmap.CommandArgsReader.readArgs(CommandArgsReader.java:128)
    at uk.me.parabola.mkgmap.main.Main.mainStart(Main.java:134)
    at uk.me.parabola.mkgmap.main.Main.main(Main.java:105)
I guess this is an offset greater than 127 stored in a byte. I hope Garmin treats the offset as an unsigned byte, so that you could support values up to 255. And maybe limit the "end" value in the addStreet function, unless the size of the label is already limited.
-- Best regards, Andrzej
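The "-126" in the trace is exactly what an offset of 130 looks like after being stored in a Java byte and read back without masking, which fits Andrzej's guess (the actual mkgmap code path is not shown in the thread). A minimal sketch with a hypothetical class name showing a signed versus an unsigned read; Byte.toUnsignedInt(b) is the standard-library equivalent of b & 0xFF.

```java
final class SignedOffset {
    // Stores an offset (0..255) in a Java byte, whose range is -128..127.
    static byte store(int offset) {
        if (offset < 0 || offset > 255) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        return (byte) offset;
    }

    // Widening a byte to int sign-extends it, so values above 127 turn negative.
    static int readSigned(byte b) {
        return b;
    }

    // Masking with 0xFF keeps only the low 8 bits: an unsigned value 0..255.
    static int readUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte b = store(130);
        System.out.println(readSigned(b));   // -126
        System.out.println(readUnsigned(b)); // 130
    }
}
```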

On 16/12/14 23:09, Andrzej Popowski wrote:
I guess this is offset greater than 127 coded on byte. I hope Garmin treat offset as unsigned byte, so you could support values up to 255. And maybe limit "end" value in addStreet function, unless size of label is already limited.
You really have names over 127 characters in length? I will add checks assuming that 255 is the maximum. Could you see whether it works, or whether we have to limit it to 127?
Thanks ..Steve

On Wed, Dec 17, 2014 at 10:30:30AM +0000, Steve Ratcliffe wrote:
On 16/12/14 23:09, Andrzej Popowski wrote:
I guess this is offset greater than 127 coded on byte. I hope Garmin treat offset as unsigned byte, so you could support values up to 255. And maybe limit "end" value in addStreet function, unless size of label is already limited.
You really have names over 127 characters in length? I will add checks assuming that 255 is the max. Could you see if it works or if we have to limit to 127.
127 bytes in UTF-8 can be significantly fewer than 127 characters (31 characters if all characters use 4 bytes). Maybe CJK names do not typically use that many characters. But Indian scripts such as Devanagari use 3-byte characters (Arabic and Hebrew need 2 bytes); in that case, 127 bytes would be only 42 characters, which is not impossibly long. I am under the impression that European scripts use at most 2 bytes per character in UTF-8. 63 characters would seem more than long enough for my Finnish taste. And Finnish uses close to 1 UTF-8 byte per character on average. :)
Marko
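Marko's arithmetic can be checked directly: String.getBytes(StandardCharsets.UTF_8) gives the encoded length, and any byte-limited truncation has to stop on a code-point boundary so that no character is cut in half. The sketch below is illustrative only; the 127-byte limit is the figure under discussion and the helper names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class Utf8Limit {
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Longest prefix whose UTF-8 encoding fits in maxBytes, cut only on
    // code-point boundaries (assumes well-formed UTF-16 input).
    static String truncateUtf8(String s, int maxBytes) {
        int bytes = 0;
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            int len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (bytes + len > maxBytes) {
                break;
            }
            bytes += len;
            i += Character.charCount(cp);
        }
        return s.substring(0, i);
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("Nu\u00f1ez"));                 // 6
        System.out.println(truncateUtf8("\u0905".repeat(50), 127).length()); // 42
        String four = truncateUtf8("\uD83D\uDE00".repeat(40), 127);
        System.out.println(four.codePointCount(0, four.length()));   // 31
    }
}
```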

On 17/12/14 at 12:06, Steve Ratcliffe wrote:
Hi Marko
127 bytes in UTF-8 can be significantly less than 127 characters (31 characters if all characters use 4 bytes).
Although the crash was related to a character offset in a Java string, that is a very good point, and I also need to limit the output UTF-8 offset.
..Steve
I have compiled Spain with r3381. On MapSource 6.13.7 (Linux) address search works fine, but on MapSource 6.16.3 some streets are found only if you leave the city field empty. If you enter any city, the streets are not found ("not valid street" error). After sending the map from MapSource to the device (nuvi 300), no street is found, whether entering the full name or a partial name (the city name has to be entered first).

Hi,
I confirm the problems in MapSource when the search includes a city. I have studied the cpreview source a bit. It looks like the "trailingFlags" in MDR7 are more complicated when multi-word sort is used. The lowest bit of the flag is a marker for a new group of streets with the same name. The other bits indicate some sort order, probably calculated from the first part of the name, which is removed to get the partial name. I don't know whether the order is calculated per map or globally. If any trailingFlags value is greater than 254, then the size of the MDR7 record is increased by 1 byte.
-- Best regards, Andrzej
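To make that reading concrete, here is a heavily hedged sketch of the layout Andrzej describes: bit 0 marks the start of a group, the remaining bits hold an order value, and records grow by one byte when any value exceeds 254. This is only his interpretation of cpreview, not a verified MDR7 format (Steve suggests below that the order part may itself be two fields), and all names are hypothetical.

```java
final class TrailingFlagsSketch {
    // HYPOTHETICAL packing: bit 0 set on the first record of a group of
    // streets with the same name; the remaining bits carry an order value.
    static int pack(boolean startsGroup, int order) {
        if (order < 0) {
            throw new IllegalArgumentException("order must be non-negative");
        }
        return (order << 1) | (startsGroup ? 1 : 0);
    }

    static boolean startsGroup(int flags) {
        return (flags & 1) != 0;
    }

    static int order(int flags) {
        return flags >>> 1;
    }

    // Extra bytes per MDR7 record: 1 if any flag value exceeds 254, else 0.
    static int extraRecordBytes(int[] allFlags) {
        for (int f : allFlags) {
            if (f > 254) {
                return 1;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        int f = pack(true, 3);
        System.out.println(f + " " + startsGroup(f) + " " + order(f)); // 7 true 3
        System.out.println(extraRecordBytes(new int[] {1, 254}));       // 0
        System.out.println(extraRecordBytes(new int[] {1, 255}));       // 1
    }
}
```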

On 18/12/14 02:42, Andrzej Popowski wrote:
Hi,
I confirm problems in Mapsource when search include city.
Is this for a map of Poland?
I have studied cpreview source a bit. It looks like "trailingFlags" in MDR7 are more complicated, when multi-word sort is used. The lowest bit of flag is a marker for new group of streets with the same name. Other bits indicate some sort order, probably calculated from first part of name, which is removed to get partial name. I don't know if order is calculated for single map or globally.
I think it is split into two fields. The first part increases sequentially for names with the same stem but different prefixes and the second part increases sequentially for each different suffix. I am less sure about the suffix part. I was hoping that I wouldn't need to implement this, but I can see if the prefix part makes a difference if I can get a test running. ..Steve

Hi Carlos
I have compiled Spain with r3381. On MapSource 6.13.7 (Linux) address search works fine, but on MapSource 6.16.3 some streets are found only if you leave city field empty. If you enter any city, streets are not found (not valid street error).
Do you have a small testcase available?
Sending the map from MapSource to device (nuvi 300) no street is found, neither entering full name nor with partial name (city name has to be entered first).
Does that device usually work with just a street name? ..Steve

Hi,
You really have names over 127 characters in length?
These could be, for example, names of marked trails. When there are multiple trails on a single way, I concatenate their names in the style. I think these names could be cut anyway, and I accept the limits that you set in mkgmap.
-- Best regards, Andrzej

Hi,
I have looked a bit at long names. These are mostly cases where people put comments into name or ref. See examples:
http://www.openstreetmap.org/way/260604190
http://www.openstreetmap.org/way/177018529
http://www.openstreetmap.org/way/162105748
http://www.openstreetmap.org/way/51789800
http://www.openstreetmap.org/way/292953863
http://www.openstreetmap.org/way/160904978
I have compiled Europe (CP1252) with the recent mixed-index branch (3381M). There were no problems with compilation, and search seems to work correctly. I have noticed that MapSource shows names up to 74 characters, while BaseCamp shows up to 150 characters. I haven't tested in a device.
-- Best regards, Andrzej

Hi Andrzej
But there are more codes, like separation code, which can be placed in the middle of the label. Or maybe they aren't available in mkgmap? I have never used them.
They are all ignored for sorting purposes. It is all defined by what is in the SRT file anyway. Thanks for the further testing. ..Steve

Hi,
I have made some minor changes to the mixed-index branch; the patch is attached. As I said before, I think this is a workable solution. See the pictures from MapSource and nuvi. I have created a jar for tests: http://files.mkgmap.org.uk/download/239/mkgmap.jar
The only feature I have added is a break at the opening parenthesis "(", so text after the parenthesis is not added to the multi-word index.
-- Best regards, Andrzej
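A sketch of how the extra index offsets might be generated under two stated assumptions: every word after the first gets its own entry (the first word being covered by the ordinary full-name entry), and only blanks separate words. It stops at the first "(" as in Andrzej's change. The class name is hypothetical and this is not the code from the patch.

```java
import java.util.ArrayList;
import java.util.List;

final class MultiWordOffsets {
    // Char offsets at which each word after the first starts, stopping at the
    // first '(' so that parenthesised text is not indexed.
    static List<Integer> wordStarts(String name) {
        List<Integer> starts = new ArrayList<>();
        boolean prevSep = true;
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c == '(') {
                break;
            }
            boolean sep = c == ' ';
            if (!sep && prevSep && i > 0) {
                starts.add(i);
            }
            prevSep = sep;
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(wordStarts("Avenida de Abelardo Nu\u00f1ez (Za-P-2661)")); // [8, 11, 20]
        System.out.println(wordStarts("Ignacego Jana Paderewskiego"));             // [9, 14]
    }
}
```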

On 11/12/14 at 23:08, Andrzej Popowski wrote:
Hi,
I have made some minor changes to mixed-index branch, patch is attached. Like I said before, I think this is workable solution. See pictures from Mapsource and nuvi.
I have created jar for tests: http://files.mkgmap.org.uk/download/239/mkgmap.jar
The only feature I have added is a break at opening parenthesis "(", so text after parenthesis is not added to multi-word index.
Your patched version works better for me with the same examples as in my previous mail, but there are still some issues. If I search for way 127988768, which is tagged with name=Avenida de Abelardo Nuñez, ref=ZA-P-2661, this is what happens:
1: Typing "Avenida de ab" displays some streets before the searched string. See multiword1.png. If I select "Za-P-2661 Avenida de Abelardo Nuñez" or "Avenida de Abelardo Nuñez (Za-P-2661)", nothing is found ("Not valid street" message).
2: Typing "Abelardo", the selection list only shows streets containing that string. See multiword2.png. If I select "Za-P-2661 Avenida de Abelardo Nuñez" it is correctly found, but if I select "Avenida de Abelardo Nuñez (Za-P-2661)" nothing is found (no error message).
3: Typing any string between "n" and "nuñez" results in a list that has nothing to do with the searched string. See multiword3.png.

Hi Carlos,
are you using the --unicode option (or something similar)? I see very different results with it on the Poland tile; in fact, I'd say the result is not completely wrong.
Gerd

Date: Fri, 12 Dec 2014 11:02:34 +0100 From: cdavilam@orangecorreo.es To: mkgmap-dev@lists.mkgmap.org.uk Subject: Re: [mkgmap-dev] multi-word street search

Hi Carlos,
I use --latin1 --code-page=1252
Okay.
@Steve: Just to make sure, because my previous post was unclear: the map created with
java -jar mkgmap.jar --route --housenumbers --index --bounds=f:\osm\bounds-latest.zip --nsis --unicode 63240001.osm.pbf
from trunk works as expected in MapSource, and I see no messages like "Street not valid in this map" when I select one of the listed roads; the ones starting with numbers work fine too.
The same options used with the branch (r3368) gave a completely wrong result, e.g. a search for "ignacego" didn't show any road starting with "Ignacego", and the same with "paderew".
Much better results with r3369 (and --unicode): searches for "ignacego", "jana" and "jana pa" look good, but "paderew" is wrong. (As you mentioned, the "street not valid" problem remained.)
On the other hand, the results without --unicode look worse now with r3369. Searches for "ignacego", "jana" and "jana pa" show nonsense; only "paderew" looks good. It seems that the search works only when searching for the last word.
Gerd

On 12/12/14 14:14, Gerd Petermann wrote:
Much better results with r3369 (and --unicode) . Search for "ignacego" and "jana" and "jana pa" look good, but "paderew" is wrong. (As you mentioned, the "street not valid problem " remained)
My quick testing environment isn't set up for unicode, so I have not been testing it. It seems like it is worse with unicode.
On the other hand, the results without --unicode look worse now with r3369. Search for "ignacego", "jana", "jana pa" show nonsense, only "paderew" looks good. Seems that the search works only when searching for the last word.
So these are without unicode, but with --latin1 (or code-page=1252) ? For me all your test cases look plausible with --latin1. ..Steve

Hi Steve, Steve Ratcliffe wrote
On the other hand, the results without --unicode look worse now with r3369. Search for "ignacego", "jana", "jana pa" show nonsense, only "paderew" looks good. Seems that the search works only when searching for the last word.
So these are without unicode, but with --latin1 (or code-page=1252) ?
For me all your test cases look plausible with --latin1.
The results are without any option regarding the code page, just the same command without --unicode.
Gerd

Hi,
I don't know the details of the MDR format, but what bothers me is the dual format of labels used in creating the index. A label can be in Garmin format, with special codes for shields and some separators, or in a "clean" format, where the special codes are removed or replaced by spaces. What is unclear to me is how to use an offset inside labels, since the offset to a word in Garmin format could differ from the offset to the same word in clean format. This could be a bigger problem for Unicode or ASCII labels (if these are supported for the index). That's why I'm using code-page=1252 at the moment.
I have moved the changes from mixed-index to trunk and compiled a map of Europe (I like the option drive-on=detect). I'm using my own style, where street names aren't prefixed with the street number in a shield; this doesn't look good on the nuvi. I have tried the searches proposed by Carlos in MapSource:
- Avenida: gives too many results to find the proper street
- Avenida de ab: easy to find, correct search
- Za-P-2661: correct search
- nunez: too many results to find Abelardo Nuñez
I think the index works correctly for code-page=1252. There could be problems, but the current results are encouraging.
-- Best regards, Andrzej

What is unclear for me, is how to use offset inside labels, since offset to a word in Garmin format could be different than offset to the same word in clean format. This could be a bigger problem for Unicode or ASCII labels (if these are supported for index). That's why I'm using code-page=1252 at a moment.
Yes, you are right; this is likely an issue to be examined. The 'clean' form is only used in MapSource, not the device, so mostly it is necessary to use the actual device form of the name. That is after translation, perhaps even after conversion to UTF-8 when using Unicode; I don't know.
..Steve

Hi Gerd
Ah OK. Well, it may be that a multi-word index cannot be created with a format 6 file. But this raises an important point. I may not be working with the translated form of the word. I made changes to deal with that, but they were after mixed-index was branched, I think. The more transliteration there is, the worse the index would be if that is the case.
..steve
On 12 December 2014 14:40:04 GMT+00:00, GerdP <gpetermann_muenchen@hotmail.com> wrote:
Hi Steve,
Steve Ratcliffe wrote
On the other hand, the results without --unicode look worse now with r3369. Search for "ignacego", "jana", "jana pa" show nonsense, only "paderew" looks good. Seems that the search works only when searching for the last word.
So these are without unicode, but with --latin1 (or code-page=1252) ?
For me all your test cases look plausible with --latin1.
The results are without any option regarding codepage. Just the same command without --unicode.
Gerd

Hi
But this raises an important point. I may not be working with the translated form of the word. I made changes to deal with that but they were after mixed index was branched I think. The more transliteration there is the worse the index would be if that is the case.
OK I am talking nonsense here :) As the index is created by reading the names from the compiled tiles, then we do have the exact text of the labels to work with already. So I expect that it should be working for single tiles with single byte character sets. Different rules may apply for unicode and ascii, and with more than one tile there may be bugs. There is still the not-found problem, which I will get back to. ..Steve

Hi Steve,
But this raises an important point. I may not be working with the translated form of the word. I made changes to deal with that but they were after mixed index was branched I think. The more transliteration there is the worse the index would be if that is the case.
OK I am talking nonsense here :) As the index is created by reading the names from the compiled tiles, then we do have the exact text of the labels to work with already.
So I expect that it should be working for single tiles with single byte character sets. Different rules may apply for unicode and ascii, and with more than one tile there may be bugs.
Not sure if this is true. You use a rather complex method to sort the road names before writing the img file. If I got it right, you use a simple sort method which uses String.compareTo() when you sort the index entries. Are you sure that this will always work? I guess a simple test should show if not: If you sort the list of road names with the simple sort method the order should not change.
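Gerd's sanity check can be written down as a small test: sort the names with the collation-based order, re-sort that result with String.compareTo, and see whether anything moves. In this hedged sketch java.text.Collator stands in for the SRT-based sort that mkgmap actually uses, and the class name is hypothetical.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class SortOrderCheck {
    // Sorts with the collator, re-sorts that result with String.compareTo
    // (natural order) and reports whether the order is unchanged.
    static boolean naturalOrderAgrees(List<String> names, Collator collator) {
        List<String> byCollator = new ArrayList<>(names);
        byCollator.sort(collator);
        List<String> byCompareTo = new ArrayList<>(byCollator);
        byCompareTo.sort(null); // null comparator means natural ordering
        return byCollator.equals(byCompareTo);
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.ROOT);
        System.out.println(naturalOrderAgrees(List.of("Jana", "Paderewskiego"), collator)); // true
        System.out.println(naturalOrderAgrees(List.of("ignacego", "Jana"), collator));      // false
    }
}
```

The second case fails because String.compareTo compares UTF-16 code units, so every uppercase ASCII letter sorts before every lowercase one, while a collator compares letters first and case last.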
There is still the not-found problem, which I will get back to.
Good luck! BTW: the display tool r440 uses a method NetHeader.getRoadShift() which is not yet in trunk. Gerd

Hi Gerd
Not sure if this is true. You use a rather complex method to sort the road names before writing the img file. If I got it right, you use a simple sort method which uses String.compareTo() when you sort the index entries.
Where is that? I did see that I have some String.equals() instead of collator.compare() tests in the mdr still. I am changing the ones I find.
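The difference between the two tests is that String.equals() is exact, while a collator can be told to ignore some differences via its strength. A small hedged illustration with java.text.Collator (a stand-in for mkgmap's own SRT-based comparison; the strength chosen here is only an example, and the class name is hypothetical):

```java
import java.text.Collator;
import java.util.Locale;

final class CollatorEquality {
    // Name equality as the sort order sees it, rather than exact char equality.
    static boolean sameName(String a, String b, Collator collator) {
        return collator.equals(a, b); // same as collator.compare(a, b) == 0
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.ROOT);
        collator.setStrength(Collator.SECONDARY); // case is a tertiary difference, so it is ignored
        System.out.println("JANA".equals("Jana"));              // false
        System.out.println(sameName("JANA", "Jana", collator)); // true
    }
}
```

Mixing the two kinds of test in one place can make an equality check disagree with the sort that produced the list.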
There is still the not-found problem, which I will get back to.
I think I have this solved. It is caused by shields. The shield does not count when calculating the offset of the word to be indexed. I believe I have proved that but the code is just a hack, I need to work out how to do this properly.
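A sketch of the offset rule Steve describes, assuming that shield and separator codes are stored as single control characters below 0x20 in the raw label (the exact code values and label layout are assumptions, not taken from mkgmap). The indexed offset is then the number of ordinary characters before the word. Names and the example label are hypothetical.

```java
final class ShieldOffset {
    // ASSUMPTION: shield/separator codes are single chars below 0x20.
    static boolean isCode(char c) {
        return c < 0x20;
    }

    // Offset of rawIndex counted without the codes that precede it.
    static int offsetIgnoringCodes(String rawLabel, int rawIndex) {
        int n = 0;
        for (int i = 0; i < rawIndex; i++) {
            if (!isCode(rawLabel.charAt(i))) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        String raw = "\u0002ZA-P-2661 Avenida de Abelardo Nu\u00f1ez"; // hypothetical shield label
        int rawIndex = raw.indexOf("Abelardo");
        System.out.println(rawIndex);                           // 22
        System.out.println(offsetIgnoringCodes(raw, rawIndex)); // 21
    }
}
```

For a Unicode map the result would presumably still need converting to a byte offset, as discussed earlier in the thread.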
Good luck! BTW: the display tool r440 uses a method NetHeader.getRoadShift() which is not yet in trunk.
Right sorry about that. ..Steve

Hi Steve,
Not sure if this is true. You use a rather complex method to sort the road names before writing the img file. If I got it right, you use a simple sort method which uses String.compareTo() when you sort the index entries.
Where is that? I did see that I have some String.equals() instead of collator.compare() tests in the mdr still. I am changing the ones I find.
Sorry, I thought that in Mdr7.java the line Collections.sort(sortedStreets); is sorting strings, but it is not. So forget that. Gerd
participants (12)
- Andrzej Popowski
- Brian Egge
- Carlos Dávila
- Colin Smale
- Gerd Petermann
- GerdP
- Marko Mäkelä
- Michał Rogala
- Minko
- Rich
- Steve Ratcliffe
- Thorsten Kukuk