Improving mkgmap's Unicode transliteration
data:image/s3,"s3://crabby-images/710c7/710c7b9f877e8d927af5c34ce435ac7300eb535a" alt=""
I'm going to Greece tomorrow and I noticed to my dismay that mkgmap's transliteration tables for the Greek alphabet were totally missing. So I hacked up a small Perl script which uses the Unicode::UCD and the Text::Unidecode modules to fillin the blanks: avar@aoeu:~/src/mkgmap/resources/chars/ascii$ perl re-transliterate.pl < row03.trans > row03.trans.tmp && mv row03.trans.tmp row03.trans The script and a patch to row03.trans which Works For Me are attached. But of course the tool can also be run on the rest of the files to fill in more blanks. And my script can of course be modified a bit further to spit out transliterations for files not yet in mkgmap row* files. I don't know what the row* files were originally based on but there's a lot of prior art for transliterating Unicode and there's no need to redo all this work for mkgmap. The Unicode Consortium has published transliteration tables (which Text::Unidecode is largely based on), it's much easier to use stuff like that rather than doing all the work yourselves. Anyway, off to pack for my flight.
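The Perl script itself isn't reproduced here. As a rough illustration of the same idea, here is a minimal Python sketch that derives a transliteration from each code point's Unicode character name (the standard `unicodedata` module standing in for Unicode::UCD) and fills in only the entries that are still blank. It assumes, hypothetically, that a row file has one `U+XXXX value` entry per line with `?` marking a missing value; check the real row*.trans files before relying on that. The Greek letter table is a simplified letter-by-letter romanization chosen for illustration. It is not Text::Unidecode's table and it ignores digraph rules such as ου → ou.

```python
import re
import sys
import unicodedata

# Hypothetical, simplified letter-by-letter romanization for illustration only.
# Keys are the letter words used in Unicode names (Unicode spells it LAMDA).
GREEK_LETTERS = {
    "ALPHA": "a", "BETA": "v", "GAMMA": "g", "DELTA": "d", "EPSILON": "e",
    "ZETA": "z", "ETA": "i", "THETA": "th", "IOTA": "i", "KAPPA": "k",
    "LAMDA": "l", "MU": "m", "NU": "n", "XI": "x", "OMICRON": "o",
    "PI": "p", "RHO": "r", "SIGMA": "s", "TAU": "t", "UPSILON": "y",
    "PHI": "f", "CHI": "ch", "PSI": "ps", "OMEGA": "o",
}

# Matches e.g. "GREEK SMALL LETTER ALPHA WITH TONOS" or
# "GREEK SMALL LETTER FINAL SIGMA"; anything after the letter word is ignored.
GREEK_NAME = re.compile(r"^GREEK (CAPITAL|SMALL) LETTER (?:FINAL )?([A-Z]+)")

# Assumed row-file entry layout: "U+XXXX value [rest of line]".
ENTRY = re.compile(r"^(U\+([0-9A-Fa-f]{4})\s+)(\S+)(.*)$")


def transliterate(codepoint):
    """Return a Latin guess for a Greek letter, or None if we have none."""
    name = unicodedata.name(chr(codepoint), "")  # "" for unassigned code points
    m = GREEK_NAME.match(name)
    if m is None:
        return None
    latin = GREEK_LETTERS.get(m.group(2))
    if latin is None:  # letter word not in the table, e.g. DIGAMMA
        return None
    return latin.capitalize() if m.group(1) == "CAPITAL" else latin


def fill_blanks(lines):
    """Replace '?' values with a guess; keep every other line unchanged."""
    out = []
    for line in lines:
        m = ENTRY.match(line)
        if m and m.group(3) == "?":
            guess = transliterate(int(m.group(2), 16))
            if guess is not None:
                line = m.group(1) + guess + m.group(4)
        out.append(line)
    return out


def main(argv):
    # Read the row file named on the command line, write the result to stdout.
    with open(argv[1], encoding="utf-8") as f:
        lines = f.read().splitlines()
    for line in fill_blanks(lines):
        print(line)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv)
```

Run it as `python fill_row.py row03.trans > row03.trans.tmp && mv row03.trans.tmp row03.trans` (the script name `fill_row.py` is hypothetical). Entries that already have a value are left untouched, so hand-made values survive and the same script can be run over the other row files.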
data:image/s3,"s3://crabby-images/710c7/710c7b9f877e8d927af5c34ce435ac7300eb535a" alt=""
2009/8/12 Ævar Arnfjörð Bjarmason <avarab@gmail.com>:
I see that my patch has been applied: http://www.mkgmap.org.uk/svn/wsvn/mkgmap/resources/chars/ascii/?op=revision&... But what I'm really more interested in is where mkgmap's transliteration database comes from. Knowing that will help with further contributions. There's a lot more that can be done than just improving row03.trans; that was just a sample improvement that I needed at the time.
data:image/s3,"s3://crabby-images/802f4/802f43eb70afc2c91d48f43edac9b0f56b0ec4a4" alt=""
Hi
The templates are generated and the values were filled in by hand. Nothing sophisticated. Your post did lead me to find a lot of useful tools that I had previously not been aware of, and it would be great to hear any ideas you have for improving it. As I understand it, though, in many cases you really need to know the language and not just the character set. There has already been a posting saying that the Cyrillic transliteration should be different for non-Russian languages. So perhaps a somewhat different approach is needed.
..Steve
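The point above is that one table keyed only by code point can't serve every language written in the same script. A minimal sketch of one possible alternative (an assumption for illustration, not how mkgmap works): keep a default table and layer small per-language overrides on top of it. The tables are deliberately tiny and partial; they follow common romanizations (Russian г → g and щ → shch, Ukrainian г → h and и → y, Bulgarian щ → sht) rather than any mkgmap data, and the language codes are hypothetical keys.

```python
# Hypothetical default table (Russian-style romanization), deliberately partial.
BASE = {
    "а": "a", "в": "v", "г": "g", "и": "i", "к": "k",
    "о": "o", "р": "r", "т": "t", "щ": "shch",
}

# Hypothetical per-language overrides, applied on top of BASE.
OVERRIDES = {
    "uk": {"г": "h", "и": "y"},  # Ukrainian
    "bg": {"щ": "sht"},          # Bulgarian
}


def romanize(text, lang):
    """Transliterate text using BASE plus the overrides for lang (if any)."""
    table = {**BASE, **OVERRIDES.get(lang, {})}
    out = []
    for ch in text:
        lower = ch.lower()
        latin = table.get(lower)
        if latin is None:
            out.append(ch)                  # unknown characters pass through
        elif ch != lower:
            out.append(latin.capitalize())  # "Щ" -> "Shch" (title case, not all caps)
        else:
            out.append(latin)
    return "".join(out)


for lang in ("ru", "uk", "bg"):
    print(lang, romanize("щит", lang), romanize("Гора", lang))
```

With these tables the loop prints `ru shchit Gora`, `uk shchyt Hora` and `bg shtit Gora`. Each language only lists where it differs from the default, so the shared table stays in one place.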
participants (2)
- Steve Ratcliffe
- Ævar Arnfjörð Bjarmason