bad file format error

Hi,

During the run of mkgmap

    java -Xmx768M -jar mkgmap.jar --latin1 --mapname="09030406" cartes/paris.osm

I got this message:

    Error at line 148424, col 84
    Bad file format: cartes/paris.osm

Looking at that line, it seems it's the "é" of "josé" that gives that error.

    <node id='94135255' lat='48.8761986' lon='2.423346' user='nitrix'
    xapi:users='josé,josém,Esperanza36,Charlie Echo,Hardangels,nitrix'
    timestamp='2009-05-17T09:53:48Z' uid='125299' version='28'
    changeset='1219222'>

Francois

0> In article <4A105A9C.3080303@free.fr>,
0> frmas <URL:mailto:frmas@free.fr> ("Frmas") wrote:

Frmas> I got that message :
Frmas>
Frmas> Error at line 148424, col 84
Frmas> Bad file format: cartes/paris.osm
Frmas>
Frmas> Looking at that line, it seems it's the "é" of "josé" that gives
Frmas> that error.
Frmas>
Frmas> <node id='94135255' lat='48.8761986' lon='2.423346' user='nitrix'
Frmas> xapi:users='josé,josém,Esperanza36,Charlie Echo,Hardangels,nitrix'
Frmas> timestamp='2009-05-17T09:53:48Z' uid='125299' version='28'
Frmas> changeset='1219222'>

What encoding is the file? It looks like there's a mix of Latin-1 and UTF-8 in that source file. How was it generated?

Toby Speight wrote:
> What encoding is the file? It looks like there's a mix of Latin-1 and UTF-8 in that source file. How was it generated?

A script I ran in the console. "locale -a" gives me: fr_FR.utf8

But this is the first time I have had such a problem.

Francois

0> In article <4A105A9C.3080303@free.fr>,
0> frmas <URL:mailto:frmas@free.fr> ("Frmas") wrote:

Frmas> Error at line 148424, col 84
Frmas> Bad file format: cartes/paris.osm

0> In article <87ljovy2k0.fsf@balti.rawlyn.homeip.net>,
0> Toby Speight <URL:mailto:T.M.Speight.90@cantab.net> ("Toby") wrote:

Toby> What encoding is the file? It looks like there's a mix of Latin-1 and
Toby> UTF-8 in that source file. How was it generated?

I forgot to say: it may help to see the bytes of the file. Try

/--------
| head cartes/paris.osm | grep -F '<?xml'
\--------

to see what encoding the XML claims to be, and

/--------
| sed -n -e '148424{' -e 'p' -e 'q' -e '}' cartes/paris.osm | od -t x1c
\--------

to view the problematic part.

Toby Speight wrote:
> I forgot to say: it may help to see the bytes of the file. Try
>
> /--------
> | head cartes/paris.osm | grep -F '<?xml'
> \--------

This is it:

    head cartes/paris.osm | grep -F '<?xml'
    <?xml version='1.0' standalone='no'?>

> /--------
> | sed -n -e '148424{' -e 'p' -e 'q' -e '}' cartes/paris.osm | od -t x1c
> \--------
>
> to view the problematic part.

As the result could get mangled in the mail, see the attached file.

Francois

0000000  20  20  3c  6e  6f  64  65  20  69  64  3d  27  39  34  31  33
                  <   n   o   d   e       i   d   =   '   9   4   1   3
0000020  35  32  35  35  27  20  6c  61  74  3d  27  34  38  2e  38  37
          5   2   5   5   '       l   a   t   =   '   4   8   .   8   7
0000040  36  31  39  38  36  27  20  6c  6f  6e  3d  27  32  2e  34  32
          6   1   9   8   6   '       l   o   n   =   '   2   .   4   2
0000060  33  33  34  36  27  20  75  73  65  72  3d  27  6e  69  74  72
          3   3   4   6   '       u   s   e   r   =   '   n   i   t   r
0000100  69  78  27  20  78  61  70  69  3a  75  73  65  72  73  3d  27
          i   x   '       x   a   p   i   :   u   s   e   r   s   =   '
0000120  6a  6f  73  e9  2c  6a  6f  73  c3  a9  6d  2c  45  73  70  65
          j   o   s 351   ,   j   o   s 303 251   m   ,   E   s   p   e
0000140  72  61  6e  7a  61  33  36  2c  43  68  61  72  6c  69  65  20
          r   a   n   z   a   3   6   ,   C   h   a   r   l   i   e
0000160  45  63  68  6f  2c  48  61  72  64  61  6e  67  65  6c  73  2c
          E   c   h   o   ,   H   a   r   d   a   n   g   e   l   s   ,
0000200  6e  69  74  72  69  78  27  20  74  69  6d  65  73  74  61  6d
          n   i   t   r   i   x   '       t   i   m   e   s   t   a   m
0000220  70  3d  27  32  30  30  39  2d  30  35  2d  31  37  54  30  39
          p   =   '   2   0   0   9   -   0   5   -   1   7   T   0   9
0000240  3a  35  33  3a  34  38  5a  27  20  75  69  64  3d  27  31  32
          :   5   3   :   4   8   Z   '       u   i   d   =   '   1   2
0000260  35  32  39  39  27  20  76  65  72  73  69  6f  6e  3d  27  32
          5   2   9   9   '       v   e   r   s   i   o   n   =   '   2
0000300  38  27  20  63  68  61  6e  67  65  73  65  74  3d  27  31  32
          8   '       c   h   a   n   g   e   s   e   t   =   '   1   2
0000320  31  39  32  32  32  27  3e  0d  0a
          1   9   2   2   2   '   >  \r  \n
0000331

On Sun, May 17, 2009 at 09:22:32PM +0200, frmas wrote:
> 0000120  6a  6f  73  e9  2c  6a  6f  73  c3  a9  6d  2c  45  73  70  65
>           j   o   s 351   ,   j   o   s 303 251   m   ,   E   s   p   e

The first é is in ISO 8859-1 and the second is in UTF-8. Furthermore, the <?xml> header lacks the encoding attribute. I don't know XML that well, but I would tend to believe that UTF-8 is the default. Thus, it seems to me that mkgmap rightfully complains about the non-UTF-8 octet 0xe9 (0351 octal).

Marko

0> In article <4A1063F8.8080209@free.fr>,
0> frmas <URL:mailto:frmas@free.fr> ("Frmas") wrote:

Frmas> <?xml version='1.0' standalone='no'?>

Okay, so the XML is UTF-8 (since no encoding is specified).

Frmas> 0000120  6a  6f  73  e9  2c  6a  6f  73  c3  a9  6d  2c  45  73  70
Frmas>           j   o   s 351   ,   j   o   s 303 251   m   ,   E   s   p

That's the problem: E9 2C isn't a valid UTF-8 sequence; it looks like that part has been encoded as Latin-1. The later sequence C3 A9 is okay: a valid UTF-8 2-byte sequence.

So mkgmap is right to reject the file; now the question is, how did that malformed sequence get in there in the first place? Could your script have written it, or has it come from (say) the planet.osm file (or a subset cut from it)?
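
The byte-level rule described here (a lead byte such as E9 must be followed by continuation bytes in the range 80..BF) can be checked mechanically. What follows is a minimal sketch, assuming POSIX od and awk are available. The function name find_bad_utf8 is made up for illustration and is not part of mkgmap. The check is deliberately simplified: it only verifies the lead-byte/continuation-byte structure and does not reject overlong or surrogate encodings.

```shell
# find_bad_utf8 FILE  (hypothetical helper, not part of mkgmap)
# Reports bytes that break the UTF-8 lead/continuation structure.
# Simplified: overlong forms and surrogates are not rejected.
find_bad_utf8() {
    # od prints every byte as an unsigned decimal; awk walks them in order.
    od -An -v -t u1 "$1" | awk '
        BEGIN { pos = 0; need = 0; bad = 0 }
        {
            for (i = 1; i <= NF; i++) {
                b = $i + 0
                if (need > 0) {
                    if (b >= 128 && b <= 191) {   # continuation byte 80..BF
                        need--; pos++
                        continue
                    }
                    # Sequence cut short, e.g. E9 followed by 2C.
                    printf "offset %d: byte 0x%02x starts an incomplete sequence\n", start, lead
                    bad = 1; need = 0
                }
                if (b >= 194 && b <= 223)      { need = 1; start = pos; lead = b }
                else if (b >= 224 && b <= 239) { need = 2; start = pos; lead = b }
                else if (b >= 240 && b <= 244) { need = 3; start = pos; lead = b }
                else if (b >= 128) {
                    printf "offset %d: byte 0x%02x cannot start a sequence\n", pos, b
                    bad = 1
                }
                pos++
            }
        }
        END {
            if (need > 0) {
                printf "offset %d: byte 0x%02x starts an incomplete sequence\n", start, lead
                bad = 1
            }
            exit bad
        }'
}
```

Offsets are counted from 0 in decimal. In the dump of line 148424, the stray e9 sits at octal offset 0000123, which is 83 decimal, so on that line alone the sketch should report offset 83; that agrees with mkgmap's "col 84" if columns are counted from 1. The function exits non-zero when it finds anything, so it can be used as a quick test before running mkgmap.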

Toby Speight wrote:
> That's the problem: E9 2C isn't a valid UTF-8 sequence; it looks like that part has been encoded as Latin-1. The later sequence C3 A9 is okay: a valid UTF-8 2-byte sequence.
>
> So mkgmap is right to reject the file; now the question is, how did that malformed sequence get in there in the first place? Could your script have written it, or has it come from (say) the planet.osm file (or a subset cut from it)?

I checked the file: that "josé" occurs many times, but it is the only malformed sequence in the whole file.

I downloaded 11 files from informationfreeway.org. This "é" is the only problem I got in more than 400 megabytes of data.

This is the part of the script that grabs the data for the Paris area:

    curl -L "http://www.informationfreeway.org/api/0.6/map?bbox=2.110,48.736,2.540,48.995" -o cartes/paris.osm

Francois

Toby Speight wrote:
> So mkgmap is right to reject the file; now the question is, how did that malformed sequence get in there in the first place? Could your script have written it, or has it come from (say) the planet.osm file (or a subset cut from it)?

Hello,

Could someone grab the following data (the resulting .osm file is just 1259178 bytes)? This is the command line I used:

    curl -L "http://www.informationfreeway.org/api/0.6/map?bbox=2.3769,48.8421,2.4218,48...." -o paris.osm

Then:

    java -Xmx768M -jar mkgmap.jar --latin1 --mapname="09030406" paris.osm

If it gives:

    Error at line 2539, col 92
    Bad file format: paris.osm

then the problem comes from "planet osm"; if not, it's a problem with my Linux box.

Thanks.

Francois

Hi!

I just downloaded it and got the same error here! So, it's not your box...

Sending it through xmllint you get the following error:

$ xmllint paris.osm
paris.osm:2539: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xE9 0x2C 0x6A 0x6F
'212612912' lat='48.8643095' lon='2.4127779' user='Charlie Echo' xapi:users='jos
                                                                                ^

There seem to be many of those...

Regards,
Andre

-------- Original Message --------
Date: Mon, 18 May 2009 19:11:03 +0200
From: frmas <frmas@free.fr>
To: Development list for mkgmap <mkgmap-dev@lists.mkgmap.org.uk>
Subject: Re: [mkgmap-dev] Re: bad file format error

> If it gives: Error at line 2539, col 92 Bad file format: paris.osm
> Then the problem is from "planet osm", if not, it's a problem with my Linux box.
_______________________________________________
mkgmap-dev mailing list
mkgmap-dev@lists.mkgmap.org.uk
http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

Andre Hinrichs wrote:
> I just downloaded it and got the same error here! So, it's not your box...

Hi,

Thank you Andre

> Sending it through xmllint you get the following error:
> paris.osm:2539: parser error : Input is not proper UTF-8, indicate encoding !
> There seem to be many of those...

Yep, and if you take a larger area, you will get dozens and dozens of those.

So now the question: how do we deal with that? In such a situation mkgmap rejects the whole file. As this "José" has mapped different parts of Paris, it is now impossible to compile Paris itself and its suburbs :-) .

Francois
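
One possible stopgap is to re-encode the stray bytes before handing the file to mkgmap. The sketch below is only that, a sketch: it assumes every byte that does not fit the UTF-8 lead/continuation structure is really Latin-1 (true for the E9 seen in this thread, but an assumption in general), and it assumes POSIX od and awk. The function name fix_latin1_bytes and the output file name in the usage line are hypothetical, and nobody in this thread has run it against the full Paris extract.

```shell
# fix_latin1_bytes FILE  (hypothetical helper, not part of mkgmap)
# Copies FILE to stdout. Well-formed UTF-8 sequences pass through
# unchanged; bytes that break the structure are assumed to be Latin-1
# and are rewritten as the matching 2-byte UTF-8 sequence.
fix_latin1_bytes() {
    # LC_ALL=C makes awk emit raw bytes for printf "%c".
    od -An -v -t u1 "$1" | LC_ALL=C awk '
        function emit_latin1(b) {
            if (b < 128) printf "%c", b
            else printf "%c%c", 192 + int(b / 64), 128 + b % 64
        }
        function flush_as_latin1(   k) {
            for (k = 1; k <= n; k++) emit_latin1(buf[k])
            n = 0; need = 0
        }
        function flush_as_utf8(   k) {
            for (k = 1; k <= n; k++) printf "%c", buf[k]
            n = 0
        }
        {
            for (i = 1; i <= NF; i++) {
                b = $i + 0
                if (need > 0) {
                    if (b >= 128 && b <= 191) {       # continuation byte
                        buf[++n] = b
                        if (--need == 0) flush_as_utf8()
                        continue
                    }
                    flush_as_latin1()                 # sequence cut short
                }
                if (b < 128) printf "%c", b
                else if (b >= 194 && b <= 223) { buf[++n] = b; need = 1 }
                else if (b >= 224 && b <= 239) { buf[++n] = b; need = 2 }
                else if (b >= 240 && b <= 244) { buf[++n] = b; need = 3 }
                else emit_latin1(b)
            }
        }
        END { flush_as_latin1() }'
}
```

Usage would look like this:

    fix_latin1_bytes cartes/paris.osm > cartes/paris-fixed.osm

Two caveats. Latin-1 text that happens to look like valid UTF-8 is left untouched, so the guess can be wrong. And pushing several hundred megabytes through od and awk is slow; this is meant for getting past the error, not as a permanent step.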

Hi,

I've downloaded the same area from api.openstreetmap.org and everything was fine there. It seems that the new xapi implementation for 0.6 is not stable yet. Thus, I don't think this is a mkgmap issue.

Some people working on xapi should be informed about this. I don't know which mailing list is the right place for that... Maybe someone else knows...

Regards,
Andre

On Monday, 18.05.2009, 20:18 +0200, frmas wrote:
> Yep, and if you take a larger area, you will get dozens and dozens of those.
> So now the question: how do we deal with that? In such a situation mkgmap rejects the whole file.
participants (4)
- Andre Hinrichs
- frmas
- Marko Mäkelä
- Toby Speight