Splitter: Possible bug: not escaping '>' with '&gt;'

Hi,
Possible bug in the splitter: the '>' character is not escaped in the output, i.e.:
<way id='30408924'>
Original tag:
<tag k="name" v="Zakynthos &lt;&gt; Kyllini"/>
Splitter output:
<tag k='name' v='Zakynthos &lt;> Kyllini'/>
This does not break mkgmap, but breaks my poorly written parser... And it is not according to the XML specs, i.e. it should be escaped with '&gt;'.
Thanks,
Ivan

Hi Ivan,
Yep that sounds like a bug all right. I haven't looked at the code yet but it's interesting that < is OK but > isn't...
Thanks for the report, I've put it on my todo list.
Chris
IK> Hi,
IK>
IK> Possible bug in the splitter, the '>' character is not escaped in
IK> output, i.e.:
IK>
IK> <way id='30408924'>
IK> Original tag:
IK> <tag k="name" v="Zakynthos &lt;&gt; Kyllini"/>
IK> Splitter output:
IK> <tag k='name' v='Zakynthos &lt;> Kyllini'/>
IK> This does not break mkgmap, but breaks my poorly written parser...
IK> And is not according to XML specs, i.e. should be escaped with
IK> '&gt;'
IK>
IK> Thanks,
IK> Ivan

On 22/10/09 10:27, Chris Miller wrote:
Yep that sounds like a bug all right. I haven't looked at the code yet but it's interesting that < is OK but > isn't... Thanks for the report, I've put it on my todo list.
No, the '>' is a valid character inside the attribute value. Only '&', '<' and the enclosing quote character need to be escaped. It would break mkgmap if it was invalid.
IK> <tag k='name' v='Zakynthos &lt;> Kyllini'/>
..Steve
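As an illustration of the rule Steve describes, here is a minimal Python sketch of attribute escaping that touches only '&', '<' and the enclosing quote character. It is not the splitter's actual code (the splitter is written in Java); the helper name `escape_attr` is hypothetical, and the check at the end only shows that one conforming parser accepts a literal '>' inside an attribute value.

```python
import xml.etree.ElementTree as ET


def escape_attr(value: str, quote: str = "'") -> str:
    """Escape only what must be escaped inside an attribute value.

    Hypothetical helper for illustration; not taken from the splitter.
    """
    # Escape '&' first so the entities added below are not escaped twice.
    value = value.replace("&", "&amp;").replace("<", "&lt;")
    if quote == "'":
        return value.replace("'", "&apos;")
    return value.replace('"', "&quot;")


name = "Zakynthos <> Kyllini"
line = "<tag k='name' v='" + escape_attr(name) + "'/>"
print(line)

# A conforming parser accepts the literal '>' and returns the original text.
assert ET.fromstring(line).get("v") == name
```

The printed line has the same shape as the splitter output quoted in this thread: '<' becomes an entity and '>' stays literal.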

On Thu, Oct 22, 2009 at 1:16 PM, Steve Ratcliffe <steve@parabola.me.uk> wrote:
On 22/10/09 10:27, Chris Miller wrote:
Yep that sounds like a bug all right. I haven't looked at the code yet but it's interesting that < is OK but > isn't... Thanks for the report, I've put it on my todo list.
No, the '>' is a valid character inside the attribute value. Only '&', '<' and the enclosing quote character need to be escaped.
Please allow me to disagree, at least on definition...
Quote from http://www.w3.org/TR/REC-xml/
---
The ampersand character (&) and the left angle bracket (<) *MUST NOT* appear in their literal form, except when used as markup delimiters, or within a comment <http://www.w3.org/TR/REC-xml/#dt-comment>, a processing instruction <http://www.w3.org/TR/REC-xml/#dt-pi>, or a CDATA section <http://www.w3.org/TR/REC-xml/#dt-cdsection>. If they are needed elsewhere, they *MUST* be escaped <http://www.w3.org/TR/REC-xml/#dt-escape> using either numeric character references <http://www.w3.org/TR/REC-xml/#dt-charref> or the strings "&amp;" and "&lt;" respectively. The right angle bracket (>) may be represented using the string "&gt;", and *MUST*, for compatibility <http://www.w3.org/TR/REC-xml/#dt-compat>, be escaped using either "&gt;" or a character reference when it appears in the string "]]>" in content, when that string is not marking the end of a CDATA section <http://www.w3.org/TR/REC-xml/#dt-cdsection>.
---
So in any XML document (i.e. OSM data), it MUST be escaped to be valid XML.
As I mentioned, it does not break mkgmap so it is not a big issue... In the meantime, I have worked around it by better tracking quotes. I just reported what I think is a possible bug...
Thank you,
Ivan
It would break mkgmap if it was invalid.
IK> <tag k='name' v='Zakynthos &lt;> Kyllini'/>
..Steve
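The workaround Ivan mentions, tracking quotes, might look roughly like the Python sketch below. His parser is not shown in the thread, so this is only a guess at the idea: the function name `find_tag_end` is hypothetical, and the scan assumes it starts at an ordinary tag rather than a comment, processing instruction, or CDATA section.

```python
def find_tag_end(text: str, start: int = 0) -> int:
    """Return the index of the '>' that closes the tag beginning at `start`.

    A '>' inside a quoted attribute value is skipped. Returns -1 if the
    tag is never closed. Hypothetical helper for illustration only.
    """
    quote = None  # the quote character we are currently inside, if any
    for i in range(start, len(text)):
        ch = text[i]
        if quote is not None:
            # Inside a value: only the matching quote character ends it.
            if ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
        elif ch == ">":
            return i
    return -1


line = "<tag k='name' v='Zakynthos &lt;> Kyllini'/><nd ref='1'/>"
end = find_tag_end(line)
print(line[:end + 1])
```

A scan that simply stops at the first '>' would cut this tag off in the middle of the `v` attribute; remembering whether the scan is inside a quoted value avoids that.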

Note that the use of the word "may" means escaping it is optional. The bit that discusses "must" is only referring to its use within the substring "]]>" within a CDATA section (which makes perfect sense). So, sounds like the splitter behaviour is indeed within spec.
IK> The right angle bracket (>) may be represented using the string
IK> "&gt;", and MUST, for compatibility, be escaped using either "&gt;"
IK> or a character reference when it appears in the string "]]>" in
IK> content, when that string is not marking the end of a CDATA section.
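To make the "may" versus "must" distinction concrete, the Python sketch below escapes character data (element content, not attribute values) so that '>' stays literal everywhere except in the sequence "]]>". The helper name `escape_text` is hypothetical and this is an illustration of the quoted spec passage, not code from the splitter or mkgmap.

```python
import xml.etree.ElementTree as ET


def escape_text(value: str) -> str:
    """Escape character data, leaving '>' literal except in ']]>'.

    Hypothetical helper for illustration only.
    """
    # '&' and '<' must always be escaped in content.
    value = value.replace("&", "&amp;").replace("<", "&lt;")
    # '>' only has to be escaped when it would complete the string ']]>'.
    return value.replace("]]>", "]]&gt;")


text = "a > b and x[y[0]]> z"
doc = "<note>" + escape_text(text) + "</note>"
print(doc)

# The parser returns the original text, so nothing was lost by leaving
# the first '>' unescaped.
assert ET.fromstring(doc).text == text
```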

On Thu, Oct 22, 2009 at 2:34 PM, Chris Miller <chris.miller@kbcfp.com> wrote:
Note that the use of the word "may" means escaping it is optional. The bit that discusses "must" is only referring to its use within the substring "]]>" within a CDATA section (which makes perfect sense). So, sounds like the splitter behaviour is indeed within spec.
Ok. I am sorry, I misunderstood the spec. It would just have made my life easier if it were escaped. Thanks for the clarification. Ivan.

Hi Ivan
No, the '>' is a valid character inside the attribute value. Only '&', '<' and the enclosing quote character need to be escaped.
Please allow me to disagree, at least on definition...
Quote from http://www.w3.org/TR/REC-xml/
---
The ampersand character (&) and the left angle bracket (<) /MUST NOT/ appear in their literal form, except when used as markup delimiters, or within a comment <http://www.w3.org/TR/REC-xml/#dt-comment>, a processing instruction <http://www.w3.org/TR/REC-xml/#dt-pi>, or a CDATA section <http://www.w3.org/TR/REC-xml/#dt-cdsection>. If they are needed elsewhere, they /MUST/ be escaped <http://www.w3.org/TR/REC-xml/#dt-escape> using either numeric character references <http://www.w3.org/TR/REC-xml/#dt-charref> or the strings "&amp;" and "&lt;" respectively. The right angle bracket (>) may be represented using the string "&gt;", and /MUST/, for compatibility <http://www.w3.org/TR/REC-xml/#dt-compat>, be escaped using either "&gt;" or a character reference when it appears in the string "]]>" in content, when that string is not marking the end of a CDATA section <http://www.w3.org/TR/REC-xml/#dt-cdsection>.
How does that differ from what I said? Any & and < must be escaped. The > character does not need to be escaped. The exception for "]]>" does not apply to attribute values because, I believe, they are not "content", and it would not apply to the given example in any case.
<tag k='name' v='Zakynthos &lt;> Kyllini'/>
..Steve
participants (3): Chris Miller, Ivan Kostoski, Steve Ratcliffe