On 01.03.2011 19:38, Johann Gail wrote:
1- The list of regions (state/country field) is much better than the one obtained with trunk. All those included are actual regions (some with two different names, e.g. Castilla la Mancha & Castilla-la Mancha). Trunk includes many names that are not actual regions of Spain, but provinces, cities or even villages.

That's fine! I don't understand why you get two different but similar names. I think this is caused by addr: tags that don't use the same spelling as the boundary multipolygons. Do you know of any algorithm for detecting similar names? Something like a "sounds-like(String cityname)" function? That would be necessary to fix this.
Look for the SOUNDEX algorithm. It is described on at least the German and English Wikipedia. It was originally developed to find similar names in genealogy, but I think it could work well in your situation, maybe with slight modifications.
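To make the idea concrete, here is a minimal sketch of the classic American Soundex encoding in plain Java. The class name SoundexSketch and the method soundex are hypothetical names chosen for illustration, not part of any existing code base, and skipping spaces, hyphens and accented characters is an assumption of this sketch rather than part of the original algorithm.

```java
import java.util.Locale;

// Hypothetical helper class, written only to illustrate the Soundex idea.
final class SoundexSketch {
    // Soundex digit for each letter A..Z; '0' marks letters that are
    // not coded (vowels, H, W and Y).
    private static final String CODES = "01230120022455012623010202";

    static String soundex(String name) {
        StringBuilder out = new StringBuilder();
        char lastCode = '0';
        for (char c : name.toUpperCase(Locale.ROOT).toCharArray()) {
            if (c < 'A' || c > 'Z') {
                continue; // assumption: ignore spaces, hyphens, accented letters
            }
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c);   // the first letter is kept as a letter
                lastCode = code; // but its code still suppresses a following duplicate
            } else if (c == 'H' || c == 'W') {
                continue;        // H and W do not separate equal codes
            } else {
                if (code != '0' && code != lastCode) {
                    out.append(code);
                }
                lastCode = code; // a vowel resets this to '0'
            }
            if (out.length() == 4) {
                break;           // classic Soundex keeps one letter plus three digits
            }
        }
        while (out.length() < 4) {
            out.append('0');     // pad short codes with zeros
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(soundex("Castilla la Mancha"));
        System.out.println(soundex("Castilla-la Mancha"));
    }
}
```

With this sketch the two spellings "Castilla la Mancha" and "Castilla-la Mancha" get the same code. Note that only the first four characters of the code are kept, so "Castilla y Leon" gets that code as well; comparing the names word by word or keeping a longer code might be one of the modifications needed.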
I have searched a little more and found the Metaphone algorithm, a successor to Soundex. A Java class for Metaphone is already available at http://commons.apache.org/codec/apidocs/org/apache/commons/codec/language/Me... I have never used it, but it looks quite reasonable.

Regards,
Johann