Hi Gerd,

First of all, many thanks for your work. I usually just read your posts and enjoy using mkgmap.

On this topic I will try to explain my thoughts, though I may not have understood all of these options:

- the original name should always be in the index, because that is what everybody knows and expects to find
- with --x-mdr7-excl=X we should only avoid the entry "X" itself in the index

For example:

Using --x-mdr7-excl=Road,Street,Chemin,des,de in combination with --x-split-name-index
should put the following into the index:
name="ABC Straße" - in index: "ABC Straße", but not "Straße"
name="Straße des 17. Juni" - in index: "Straße des 17. Juni", "des 17. Juni", "17. Juni", "Juni"
name="Chemin de Pierre Froide" - in index: "Chemin de Pierre Froide", "de Pierre Froide", "Pierre Froide", "Froide"
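
As a rough illustration of this rule, here is a small Python sketch. It is not mkgmap code, and the function names are made up. It assumes that names are split on spaces, that the full name is always kept, and that a split part is dropped only when it is exactly equal to an excluded word. Note that "Straße" is added to the exclude set here, although the option line above does not list it, so that the first example drops it.

```python
def trailing_parts(name):
    """Full name first, then each shorter trailing part of it."""
    words = name.split()
    return [" ".join(words[i:]) for i in range(len(words))]


def index_entries(name, excluded):
    parts = trailing_parts(name)
    # The original name is always indexed; a split part is skipped only
    # if the whole part equals an excluded word (assumed rule).
    return parts[:1] + [p for p in parts[1:] if p not in excluded]


# Assumption: "Straße" added to the list from the example above.
EXCLUDED = {"Road", "Street", "Straße", "Chemin", "des", "de"}

for street in ("ABC Straße", "Straße des 17. Juni", "Chemin de Pierre Froide"):
    print(street, "->", index_entries(street, EXCLUDED))
```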

We should think like the user of the map. They will not necessarily know that street names are split. They just want to find the name, and they know the name or at least a part of it. I agree that nobody will search for "des 17. Juni"; everybody will search for the complete name "Straße des 17. Juni" or for "17. Juni". But it is nearly impossible to describe which parts we need and which we don't.

 
> Another option would be to allow regular expressions in the exclude list, but that would require more
> effort (input file instead of single option) and probably much more run time.

I wouldn't do that.
 
> I'd prefer to have a logic which first analyses all strings added by the --x-split-name-index option so that
> only those are generated which do not appear more than x %.

That idea is good, but I'm not sure it will work. Think of a map of the Alps, which covers several countries with different languages. I haven't checked, but I assume that most of the streets will be in German-speaking countries (Germany, Austria, Switzerland). How should we compute a frequency for strings in France? Or another example: "Straße" is very common in Germany, but on a map of all of France plus a small part of Germany, "Straße" would get a low value and would therefore be included. So if we did something like that, we would need a value for each country.
And think about the street names that nearly every town in Germany has, like "Hauptstraße".
So, I wouldn't do that.
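
To put numbers on the France-plus-a-bit-of-Germany case, here is a tiny Python sketch. The street lists are invented purely for illustration, and the helper name is hypothetical. It only compares the share of names containing the word "Straße" on the whole map with the share inside the German part.

```python
def share_percent(names, word):
    """Percentage of names that contain `word` as a separate word."""
    if not names:
        return 0.0
    hits = sum(1 for name in names if word in name.split())
    return 100.0 * hits / len(names)


# Made-up data: 95 French names and 5 German names on one map.
french = [f"Rue {i}" for i in range(95)]
german = ["Berliner Straße", "Münchner Straße", "Kölner Straße", "Am Markt", "Ringweg"]

whole_map = share_percent(french + german, "Straße")  # 3 of 100 names
germany_only = share_percent(german, "Straße")        # 3 of 5 names
print(whole_map, germany_only)
```

With a single threshold for the whole map, "Straße" looks rare here, although it is in most of the German names.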
 

Overall, I would prefer a rule that is easy to understand.

best regards,
Gert

 
 
Sent: Tuesday, 04 April 2017 at 17:13
From: "Gerd Petermann" <GPetermann_muenchen@hotmail.com>
To: "mkgmap-dev@lists.mkgmap.org.uk" <mkgmap-dev@lists.mkgmap.org.uk>
Subject: [mkgmap-dev] Meaning of option --x-mdr7-excl
Hi all,

I have not documented this option yet because I don't think that it is useful as it is implemented now.
I think it works fine for English-speaking countries with road names like "Abc Street" and "Xyz Road".
Using --x-mdr7-excl=Road,Street in combination with --x-split-name-index will work fine.

The picture is different for a French-speaking country.
Let's look at an example. Assume you have the options --index and --x-split-name-index.
The road name "Chemin de Pierre Froide" is added to the index as
"Chemin de Pierre Froide"
and because of --x-split-name-index the following entries are also added:
"de Pierre Froide"
"Pierre Froide"
"Froide"
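
To make the splitting concrete, here is a small Python sketch. It is not mkgmap's actual code, and the function name split_name_entries is made up for illustration. It assumes that names are split on spaces and that every trailing part of the name becomes an additional entry, which is what the example above suggests.

```python
def split_name_entries(name):
    """Return the full name followed by each shorter trailing part."""
    words = name.split()
    return [" ".join(words[i:]) for i in range(len(words))]


for entry in split_name_entries("Chemin de Pierre Froide"):
    print(entry)
```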
Now, would you expect a change if you use the option --x-mdr7-excl=Chemin,Rue,Avenue?
And what would you expect with --x-mdr7-excl=de,du,la ?

With the current implementation there would be no change in the output, because
the check works in this way:
Build the string that should be added to the index.
Check if that whole string is in the exclude list; if not, add it to the index.
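
A minimal Python sketch of that check, only an illustration with hypothetical names and not the real implementation, shows why nothing changes: none of the generated strings is exactly equal to an excluded word. The comparison is case-sensitive here, which is an assumption.

```python
def candidates(name):
    """Full name plus the trailing parts added by name splitting (assumed)."""
    words = name.split()
    return [" ".join(words[i:]) for i in range(len(words))]


def index_entries_exact(name, excluded):
    # Described behaviour: a string is dropped only if the whole string
    # is in the exclude list.
    return [s for s in candidates(name) if s not in excluded]


name = "Chemin de Pierre Froide"
unchanged_road_types = index_entries_exact(name, {"Chemin", "Rue", "Avenue"}) == candidates(name)
unchanged_articles = index_entries_exact(name, {"de", "du", "la"}) == candidates(name)
print(unchanged_road_types, unchanged_articles)
```

Only a string that consists of nothing but an excluded word, such as a road named just "Rue", would be dropped by this check.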

I might change that like this:
Build the string that should be added to the index
Check if the first word in that string is in the exclude list, if not, add it to the index.

With this change the option --x-mdr7-excl=Chemin,Rue,Avenue
would exclude the entry
"Chemin de Pierre Froide"
and --x-mdr7-excl=de,du,la would exclude
"de Pierre Froide"
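
The changed check could look roughly like this in Python. Again this is only a sketch with made-up function names, assuming that names are split on spaces and that the comparison is case-sensitive.

```python
def candidates(name):
    """Full name plus the trailing parts added by name splitting (assumed)."""
    words = name.split()
    return [" ".join(words[i:]) for i in range(len(words))]


def index_entries_first_word(name, excluded):
    # Proposed behaviour: drop a string if its FIRST word is in the exclude list.
    return [s for s in candidates(name) if s.split()[0] not in excluded]


name = "Chemin de Pierre Froide"
print(index_entries_first_word(name, {"Chemin", "Rue", "Avenue"}))
print(index_entries_first_word(name, {"de", "du", "la"}))
```

Note that with this rule an excluded road type also removes the complete name from the index, not only a split part.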

Another option would be to allow regular expressions in the exclude list, but that would require more
effort (input file instead of single option) and probably much more run time.

I'd prefer to have a logic which first analyses all strings added by the --x-split-name-index option so that
only those are generated which do not appear more than x %.
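
One possible reading of this idea, sketched in Python: count in how many road names each word starts one of the split parts, and skip the parts whose first word occurs in more than x % of all names. Everything here is an assumption made for illustration (the function names, counting per first word, counting each word once per name); the sample names are invented and the real logic might look quite different.

```python
from collections import Counter


def partial_entries(name):
    """Only the parts added by splitting, without the full name."""
    words = name.split()
    return [" ".join(words[i:]) for i in range(1, len(words))]


def too_common_words(names, max_percent):
    counts = Counter()
    for name in names:
        # Count each starting word once per road name.
        counts.update({p.split()[0] for p in partial_entries(name)})
    limit = len(names) * max_percent / 100
    return {word for word, n in counts.items() if n > limit}


def index_entries(names, max_percent):
    common = too_common_words(names, max_percent)
    return {
        name: [name] + [p for p in partial_entries(name) if p.split()[0] not in common]
        for name in names
    }


names = ["Chemin de Pierre Froide", "Rue de la Gare", "Chemin de la Fontaine", "Rue Victor Hugo"]
entries = index_entries(names, 40)
print(entries["Chemin de Pierre Froide"])
```

In this tiny sample "de" and "la" exceed the 40 % limit, so parts starting with them are skipped without listing them in an option.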

Comments?

Gerd
_______________________________________________
mkgmap-dev mailing list
mkgmap-dev@lists.mkgmap.org.uk
http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev