Hi, when there is a CR in the input data (coded as "&#13;" or "&#xD;"), splitter writes a real CR (ASCII 13) to the output file. Although this is still legal XML, it would IMHO be better to keep the code sequence. Chris
Hi On 04/10/12 16:46, Chris66 wrote:
when there is a CR in the input data (coded as "&#13;" or "&#xD;"), splitter writes a real CR (ASCII 13) to the output file.
Although this is still legal XML, it would IMHO be better to keep the code sequence.
After a lot of research I now think there is a bug here. Since most of the information on the net appears to be incorrect or incomplete, it is worth explaining.

The characters 0xA, 0xD and 0x9 are all valid in XML attribute values; however, when written literally they are not preserved on reading: each one is replaced with a space character. So for example:

v="hello
world"

means the same as:

v="hello world"

So to preserve the input data, those three characters must be encoded as character references in attribute values. This is only true of attribute values; if a character reference such as "&#xD;" occurs somewhere else in the file, where it does not need to be, then we cannot preserve that on output (and there would be no advantage in doing so if we could).

See: http://www.w3.org/TR/xml/#AVNormalize

Also: http://recycledknowledge.blogspot.co.uk/2006/03/writing-out-xml.html (but ignore the comment, which I believe is incorrect).

Attached is a patch that implements this.

..Steve
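The normalization described above can be observed with the DOM parser that ships with the JDK. The following is only a minimal sketch: the class name `AttrNormalizeDemo`, the helper `readV` and the element name `t` are made up for illustration and are not part of splitter.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

public class AttrNormalizeDemo {
    /** Parses a one-element document and returns the value of its "v" attribute. */
    public static String readV(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            return root.getAttribute("v");
        } catch (Exception e) {
            // Wrap checked parser exceptions to keep the sketch short.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // A literal line feed inside the attribute value is normalized to a space.
        String literal = readV("<t v=\"hello\nworld\"/>");
        // A character reference to the same character survives parsing.
        String reference = readV("<t v=\"hello&#xA;world\"/>");

        System.out.println(literal.equals("hello world"));    // true
        System.out.println(reference.equals("hello\nworld")); // true
    }
}
```

The same holds for 0xD and 0x9, which is why a writer that wants the value to round-trip has to emit character references for them inside attribute values.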
Hi Steve, On Sat, Oct 13, 2012 at 01:16:35PM +0100, Steve Ratcliffe wrote:
+ case '\n':
+     writeString("&#xD;");
+     break;
+ case '\r':
+     writeString("&#xA;");
+     break;
It looks like you got these twisted around. CR is U+000D and LF is U+000A. Marko
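With the mapping the right way round, an attribute-value escaper might look like the sketch below. This is not the attached patch: the class `AttrEscape` and the method `escapeAttr` are hypothetical names, and a `StringBuilder` stands in for the writer's `writeString` calls so that the example runs on its own.

```java
public class AttrEscape {
    /** Escapes a string for use inside a double-quoted XML attribute value. */
    public static String escapeAttr(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
            case '&':  sb.append("&amp;");  break;
            case '<':  sb.append("&lt;");   break;
            case '>':  sb.append("&gt;");   break;
            case '"':  sb.append("&quot;"); break;
            // Whitespace that attribute-value normalization would turn into a space:
            case '\t': sb.append("&#x9;");  break; // TAB is U+0009
            case '\n': sb.append("&#xA;");  break; // LF is U+000A
            case '\r': sb.append("&#xD;");  break; // CR is U+000D
            default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: line1&#xD;&#xA;line2&#x9;end
        System.out.println(escapeAttr("line1\r\nline2\tend"));
    }
}
```

Only attribute values need this treatment; in element content a literal tab or line feed is passed through to the application unchanged.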
On 13.10.2012 23:51, Steve Ratcliffe wrote:
Hi
It looks like you got these twisted around. CR is U+000D and LF is U+000A.
Oops! well spotted, thanks!
..Steve
Hi, thanks for the patch. From my point of view it can be committed. Chris
participants (3)
- Chris66
- Marko Mäkelä
- Steve Ratcliffe