Meaning of option --x-mdr7-excl

4 Apr 2017

Hi all,

I have not documented this option yet because I don't think it is useful as currently implemented.
I think it works fine for English-speaking countries with road names like "Abc Street" and "Xyz Road".
Using --x-mdr7-excl=Road,Street in combination with --x-split-name-index will work fine.

A French-speaking country is a different picture.
Let's look at an example. Assume you use the options --index and --x-split-name-index.
The road name "Chemin de Pierre Froide" is added to the index as 
"Chemin de Pierre Froide" 
and because of --x-split-name-index the following entries are also added:
"de Pierre Froide" 
"Pierre Froide" 
"Froide" 
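
The splitting described above can be sketched as follows. This is only an illustration of the behaviour as described in this mail, not mkgmap's actual code; the function name split_name_entries is hypothetical, and splitting on whitespace is an assumption.

```python
def split_name_entries(name):
    """Return the full name followed by every suffix that starts at a word boundary.

    Hypothetical helper: assumes words are separated by whitespace.
    """
    words = name.split()
    return [" ".join(words[i:]) for i in range(len(words))]


entries = split_name_entries("Chemin de Pierre Froide")
for entry in entries:
    print(entry)
```
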
Now, would you expect a change if you use the option --x-mdr7-excl=Chemin,Rue,Avenue?
And what would you expect with --x-mdr7-excl=de,du,la?

With the current implementation there would be no change in the output, because
the check works in this way:
Build the string that should be added to the index
Check if that string is in the exclude list, if not, add it to the index.
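
A minimal sketch of this whole-string check, assuming an exact, case-sensitive comparison (the mail does not say how the comparison is done). The name filter_whole_string is hypothetical and the entry lists are written out by hand.

```python
def filter_whole_string(entries, excluded):
    """Keep an entry unless the complete string is in the exclude list.

    Assumption: exact, case-sensitive match against the whole entry.
    """
    return [entry for entry in entries if entry not in excluded]


english = ["Abc Street", "Street"]
french = ["Chemin de Pierre Froide", "de Pierre Froide", "Pierre Froide", "Froide"]

# The lone entry "Street" matches the exclude list and is dropped.
print(filter_whole_string(english, {"Road", "Street"}))

# No French entry equals "Chemin" or "de", so nothing is dropped.
print(filter_whole_string(french, {"Chemin", "Rue"}))
print(filter_whole_string(french, {"de", "du", "la"}))
```

This shows why the option helps with "Abc Street" style names, where the split produces the bare word "Street", but changes nothing for the French example.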

I might change that like this:
Build the string that should be added to the index
Check if the first word in that string is in the exclude list, if not, add it to the index.

With this change the option --x-mdr7-excl=Chemin,Rue,Avenue
would exclude the entry
"Chemin de Pierre Froide"
and --x-mdr7-excl=de,du,la would exclude 
"de Pierre Froide" 
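
The proposed first-word check could look roughly like this. Again a sketch under assumptions: filter_first_word is a hypothetical name, words are taken to be whitespace-separated, and the match is exact and case-sensitive.

```python
def filter_first_word(entries, excluded):
    """Keep an entry unless its first word is in the exclude list.

    Assumptions: whitespace-separated words, exact case-sensitive match.
    """
    kept = []
    for entry in entries:
        words = entry.split()
        if words and words[0] in excluded:
            continue  # first word is excluded, so skip this index entry
        kept.append(entry)
    return kept


french = ["Chemin de Pierre Froide", "de Pierre Froide", "Pierre Froide", "Froide"]

# Drops only "Chemin de Pierre Froide".
print(filter_first_word(french, {"Chemin", "Rue"}))

# Drops only "de Pierre Froide".
print(filter_first_word(french, {"de", "du", "la"}))
```
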

Another option would be to allow regular expressions in the exclude list, but that would require more
effort (input file instead of single option) and probably much more run time.

I'd prefer a logic that first analyses all strings added by the --x-split-name-index option, so that
only those strings are generated which do not appear more than x % of the time.
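
One possible reading of this idea, sketched below: count how many road names produce each partial string, and drop a partial string if it comes from more than x % of all names. This interpretation, the function names, and the max_percent parameter are all assumptions for illustration; nothing like this exists in mkgmap as far as this mail says.

```python
from collections import Counter


def split_suffixes(name):
    """Partial entries only: every suffix starting at the second word or later."""
    words = name.split()
    return [" ".join(words[i:]) for i in range(1, len(words))]


def build_index(names, max_percent):
    """Add each full name, plus the partial strings that are not too common.

    A partial string is skipped when it is produced by more than
    max_percent of the road names (hypothetical threshold 'x').
    """
    counts = Counter()
    for name in names:
        # set() so a suffix counts once per road name
        counts.update(set(split_suffixes(name)))
    limit = len(names) * max_percent / 100.0
    index = []
    for name in names:
        index.append(name)
        index.extend(s for s in split_suffixes(name) if counts[s] <= limit)
    return index


names = ["Abc Street", "Xyz Street", "Main Street", "Abc Road"]
print(build_index(names, 50))
```

With this reading no exclude list is needed for the English case, since "Street" is filtered out by its frequency alone. Whether counting whole partial strings is enough for names like the French example is an open question.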

Comments?

Gerd
