On 06/08/13 18:57, Geoff Sherlock wrote:
Hi Steve,

When you collect the data for the index you could also increment a count
for each word. Then only add the word to the index if the count is less
than an optional value (default, say, 10000). This should work for most
languages and reduce the size of the index, although it will require
more memory for compiling the map.
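
A minimal sketch of the counting idea above, in Python rather than mkgmap's own code. The function name build_index, the parameter max_count and the sample names are all assumptions made up for illustration; nothing here is taken from mkgmap.

```python
from collections import Counter


def build_index(names, max_count=10000):
    """Index each word only if it occurs fewer than max_count times.

    Hypothetical helper for illustration; not part of mkgmap.
    """
    # First pass: count every word occurrence across all names.
    counts = Counter(word for name in names for word in name.lower().split())

    # Second pass: add a word to the index only if its count is below the limit.
    index = {}
    for name in names:
        for word in set(name.lower().split()):
            if counts[word] < max_count:
                index.setdefault(word, []).append(name)
    return index, counts


# Made-up sample data with a tiny limit so the effect is visible.
names = ["The Square", "The Old Mill", "Mill Lane", "The Green"]
index, counts = build_index(names, max_count=3)
print(sorted(index))
```

The counter holds one entry per distinct word for the whole run, which is where the extra memory during compilation would come from.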

I was looking into doing something like that. Turns out though that it
is not as easy as it sounds. So for example, in English, the words 'the' 
and 'square' are top words that could be removed. Yet there are
names such as 'The Square' and there are a whole bunch of similar problems.

Ideally we need methods that fail in a safe way by only rejecting a
word if it is (reasonably) certain that it should not be there. At
the moment I am thinking that this will probably require
language-specific rules.
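
One possible shape for a rule that fails safe, purely as an illustration in Python. The STOP_WORDS table, its contents and the function words_to_index are hypothetical, and the fallback rule (keep every word when rejecting would leave nothing) is just one assumption about what "safe" could mean, not a worked-out design.

```python
# Hypothetical per-language lists of words considered too common to index.
STOP_WORDS = {
    "en": {"the", "square", "of"},
}


def words_to_index(name, lang="en"):
    """Return the words of a name that should go into the index.

    Words are rejected only when something else is left to search on.
    Unknown languages have no rules, so nothing is rejected for them.
    """
    words = name.lower().split()
    stop = STOP_WORDS.get(lang, set())
    kept = [w for w in words if w not in stop]
    # Fail safe: if every word would be rejected, keep them all.
    return kept if kept else words


print(words_to_index("The Square"))
print(words_to_index("The Old Mill"))
```

With this rule a name such as 'The Square' stays searchable, while 'the' is still dropped from names that have other words to search on.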

..Steve

mkgmap-dev mailing list
mkgmap-dev@lists.mkgmap.org.uk
http://lists.mkgmap.org.uk/mailman/listinfo/mkgmap-dev