Re: [mkgmap-dev] StandardCharsets and try (with-resources)

19 Jan 2020

Hi Gerd

Here is a new version of the patch, with line.trim() restored and an
exception thrown.

@mike - It is likely that this will fix your problem with the display
of option text with non-ascii characters; with the previous code, mkgmap
*read* the text incorrectly unless your local charset was utf-8.
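
A minimal sketch of that kind of misreading. It is not mkgmap code; the class and method names are made up for the example. The same UTF-8 bytes are decoded once as UTF-8 and once as ISO-8859-1, which stands in for a non-utf-8 local charset:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetMismatchDemo {
    /** Decodes the bytes with the given charset, as a Reader opened with that charset would. */
    static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        // "Straße" as stored in a UTF-8 file: the ß occupies two bytes (0xC3 0x9F)
        byte[] onDisk = "Stra\u00dfe".getBytes(StandardCharsets.UTF_8);
        // Decoded with the charset the file was written in: text is intact
        System.out.println(decode(onDisk, StandardCharsets.UTF_8));
        // Decoded with a single-byte charset: the two bytes of ß become two wrong characters
        System.out.println(decode(onDisk, StandardCharsets.ISO_8859_1));
    }
}
```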

Ticker

On Fri, 2020-01-17 at 17:04 +0000, Ticker Berkin wrote:
...
Hi Gerd
The line.trim() deletion wasn't intended - I'll put it back.
I think it best to change the sortForCodepage IOException handling to
throw ExitException. Maybe they meant to return some default "Sort",
ie sortForCodepage(1252), but this seems wrong.
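
A rough sketch of the "fail instead of retry" idea. It is not the real SrtTextReader code: FatalConfigException is a hypothetical stand-in for ExitException (assumed here to be unchecked), and SortSource and loadSortName are invented for the example:

```java
import java.io.BufferedReader;
import java.io.IOException;

class SortLoadSketch {
    /** Hypothetical stand-in for mkgmap's ExitException; assumed to be an unchecked exception. */
    static class FatalConfigException extends RuntimeException {
        FatalConfigException(String msg, Throwable cause) {
            super(msg, cause);
        }
    }

    /** Hypothetical source of sort descriptions, one per codepage. */
    interface SortSource {
        BufferedReader open(int codepage) throws IOException;
    }

    /** Reads the first line of the description, or fails loudly; it never calls itself again. */
    static String loadSortName(int codepage, SortSource source) {
        try (BufferedReader r = source.open(codepage)) {
            return r.readLine();
        } catch (IOException e) {
            // No fallback to another codepage and no recursive retry
            throw new FatalConfigException("Cannot load sort description for codepage " + codepage, e);
        }
    }
}
```

Throwing here means a missing or unreadable description stops the run with a clear message rather than silently picking a default sort.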
I started looking at CombinedStyleFileLoader. It does its Input and
Output in the default charset and I don't know if anyone uses it
anymore, but I didn't want to change any of its behaviour, so I
thought best not to touch it.
Regarding a new class for files that use '#' for comments: some of
these already use TokenScanner, which can be configured. The only other
one that a quick grep finds is the character transliteration tables, so
I don't think it is worth it at the moment.
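
For reference, a helper of the kind being discussed might look roughly like this. It is only a sketch: CommentedLines is a made-up name, and it assumes that only whole-line comments need handling and that blank lines can be dropped:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class CommentedLines {
    /** Returns the trimmed, non-empty lines; lines starting with '#' after trimming are skipped. */
    static List<String> read(Reader in) {
        List<String> result = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(in)) {
            String line;
            while ((line = br.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty() || line.startsWith("#"))
                    continue;
                result.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(read(new StringReader("# tags to drop\n  created_by  \n\nsource\n")));
    }
}
```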
Ticker
On Fri, 2020-01-17 at 16:20 +0000, Gerd Petermann wrote:
...
Hi Ticker,
- I think there is a small change in the handling of lines in
OsmMapDataSource.readDeleteTagsFile. The old code used
line = line.trim();
This is missing now. Is that intended?
- I also don't understand the line with your comment "// ??? I don't
understand this". Looks like an endless recursive call?
- You sometimes replaced FileReader, but not in
CombinedStyleFileLoader. Why not?
We have a few places where we read files which use "#" for comment
lines.  Would it help to create a class for that?
I made a few minor mods, see attachment.
Gerd
________________________________________
From: mkgmap-dev <mkgmap-dev-bounces@lists.mkgmap.org.uk> on behalf
of Ticker Berkin <rwb-mkgmap@jagit.co.uk>
Sent: Friday, 17 January 2020 13:53
To: Development list for mkgmap
Subject: [mkgmap-dev] StandardCharsets and try (with-resources)
Hi Gerd
Attached patch
- uses StandardCharsets.* where possible.
- notes some usage of the java local DefaultCharset.
- changed a couple of these to force utf-8 instead.
- if --read-config file gives decoding errors, names the charset used
  to read the file (ie DefaultCharset) instead of 'utf-8' in the error
  message.
- accepts/ignores unicode BOM in more files
- uses try (open...) {} where possible in files changed for the above
  reasons.
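
Several of the points above can be sketched together in a few lines. This is an illustration only, not taken from the patch, and BomAwareReader is a made-up name. It opens the stream with StandardCharsets.UTF_8, uses try-with-resources, and drops a leading BOM, which Java's UTF-8 decoder passes through as the character U+FEFF:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class BomAwareReader {
    /** Reads the first line of a UTF-8 stream, dropping a leading byte order mark if present. */
    static String firstLine(InputStream in) {
        try (BufferedReader br = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line = br.readLine();
            // The BOM, if any, shows up as the first char of the first line
            if (line != null && line.startsWith("\uFEFF"))
                line = line.substring(1);
            return line;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```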
There is some code in mkgmap/srt/SrtTextReader.java:sortForCodepage()
that I don't understand; it would appear to get into a recursive loop
on IOException.
Ticker
On Tue, 2020-01-14 at 09:55 +0000, Gerd Petermann wrote:
...
Hi Ticker,
yes, and every missing close() is a brain teaser ;)
We have a few places where files are opened in one method and closed
in a different one. This is likely to cause trouble in unit tests,
esp. on Windows.
Wherever possible we should use try-with-resources instead of
Utils.closeFile() and add a comment like the one on this SeaGenerator
line
zipFile = new ZipFile(precompSeaDir); // don't close here!
when a file is intentionally kept open.
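
A small sketch of what try-with-resources guarantees; the names are invented for the example and TrackedReader exists only to make the close() call observable. The resource is closed when the try block exits, whether normally or through an exception:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

class CloseTracking {
    /** Reader that records whether close() was called. */
    static class TrackedReader extends BufferedReader {
        boolean closed;

        TrackedReader(String text) {
            super(new StringReader(text));
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Counts lines; the reader is closed automatically when the try block is left. */
    static int countLines(BufferedReader reader) {
        try (BufferedReader br = reader) {
            int n = 0;
            while (br.readLine() != null)
                n++;
            return n;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```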
Gerd
...
________________________________________
From: mkgmap-dev <mkgmap-dev-bounces@lists.mkgmap.org.uk> on behalf
of Ticker Berkin <rwb-mkgmap@jagit.co.uk>
Sent: Tuesday, 14 January 2020 10:43
To: Development list for mkgmap
Subject: Re: [mkgmap-dev] TYP files and character encoding
Hi Gerd
Here is an updated patch that closes the file. I find many files in
mkgmap that don't have an explicit close(), but I presume .finalize()
will close them eventually.
I'll do another patch for other text file handling, using
StandardCharsets where possible and fixing the TokenScanner message for
bad characters if not utf-8 and, if reasonable, allowing a BOM even if
the file is opened as utf-8 anyway.
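
One way to get an error message that names the charset actually in use is a strict decoder, roughly as below. This is a sketch with a made-up class name and message text, not what TokenScanner does:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

class StrictDecode {
    /** Decodes strictly; on bad input the exception message names the charset that was used. */
    static String decode(byte[] bytes, Charset cs) {
        try {
            return cs.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            // Report the charset that was really used, not a hard-coded name
            throw new IllegalArgumentException("Bad character in input read as " + cs.name(), e);
        }
    }
}
```

By default a Java Reader replaces undecodable bytes with U+FFFD without complaint, so strict decoding has to be requested explicitly as above.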
Ticker
On Tue, 2020-01-14 at 08:21 +0000, Gerd Petermann wrote:
Hi Ticker,
thanks for the patch.
Please review TypCompiler.CharsetProbe. BufferedReader br is not
closed. Is that intended?
I see that we have a mix of "utf-8" and "UTF-8" in the mkgmap
sources. I think it would be good to use StandardCharsets.UTF_8 where
possible and unify the rest.
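
A short sketch of why the constant is the tidier choice (CharsetNames is a made-up name). Charset names are matched case-insensitively, so both spellings resolve to the same charset, but a string is only checked at run time while the constant is checked by the compiler:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetNames {
    /** True if the given name resolves to the same charset as StandardCharsets.UTF_8. */
    static boolean isUtf8(String name) {
        // Charset.forName throws an unchecked exception for unknown or illegal names
        return Charset.forName(name).equals(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(isUtf8("utf-8"));
        System.out.println(isUtf8("UTF-8"));
        // With the constant there is no name lookup and nothing to misspell
        byte[] bytes = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(bytes.length);
    }
}
```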

mkgmap-dev mailing list
mkgmap-dev@lists.mkgmap.org.uk
http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev