[PATCH] Bug in label encoding

I think there is a bug in label encoding in Format6Encoder. For some string lengths the last encoded byte is not stored. For example, for the string "10007" the encoded byte buffer looks like this:
[0] [0x86] [1] [0x8] [2] [0x20] [3] [0x9f] [4] [0xf0]
The number of stored bytes is 4, so the 0xf0 will not show up in the final image file.

On Mon, Feb 08, 2010 at 12:47:50AM +0100, Ronny Klier wrote:
I think there is a bug in label encoding in Format6Encoder. For some string length the last encoded byte is not stored.
E.g. having a string "10007" the encoded byte buffer looks like this
[0] [0x86] [1] [0x8] [2] [0x20] [3] [0x9f] [4] [0xf0]
The number of stored bytes is 4. So the 0xf0 will not show up in the final image file.
Index: Format6Encoder.java
===================================================================
--- Format6Encoder.java (revision 1541)
+++ Format6Encoder.java (working copy)
@@ -86,7 +86,7 @@
 		buf = put6(buf, off++, 0xff);
-		int len = ((off - 1) * 6) / 8 + 1;
+		int len = (int)Math.ceil((off * 6) / 8.0);
You can do this with integer math, using truncating division. Your example was off=6 (5 chars and the end-of-string code), and I suppose we would get len=4 instead of 5:
    (6-1)*6 / 8 + 1 = 30/8 + 1 = 3 + 1 = 4
If you want to round up to full blocks, the normal trick is to add divisor-1 before dividing, like this:
    int len = ((off - 1) * 6 + 7) / 8 + 1 = 37/8 + 1 = 4 + 1 = 5
I don't know if the off-1 and the +1 are correct. An integer version of your formula would also work in this case:
    int len = (off * 6 + 7) / 8 = 43/8 = 5
This formula is clear to me: it converts the "off" 6-bit chars (including the end-of-string code) to the number of required 8-bit octets.
Best regards,
Marko
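The three length expressions discussed in this message can be compared side by side. The following is only a sketch: the class and method names are made up for illustration and are not part of mkgmap; only the three expressions themselves come from the thread.

```java
// Hypothetical demo class, not part of mkgmap.
class LabelLengthDemo {
    // Expression currently in Format6Encoder: truncating division, then +1.
    static int originalLen(int off) {
        return ((off - 1) * 6) / 8 + 1;
    }

    // Expression from the proposed patch: floating-point ceiling.
    static int ceilLen(int off) {
        return (int) Math.ceil((off * 6) / 8.0);
    }

    // Integer-only ceiling: add divisor-1 before the truncating division.
    static int intCeilLen(int off) {
        return (off * 6 + 7) / 8;
    }

    public static void main(String[] args) {
        int off = 6; // 5 characters plus the end-of-string code
        System.out.println(originalLen(off)); // 30/8 + 1 = 3 + 1 = 4
        System.out.println(ceilLen(off));     // ceil(4.5) = 5
        System.out.println(intCeilLen(off));  // 43/8 = 5
    }
}
```

For non-negative off the floating-point and integer ceilings give the same result, so the integer form can stand in for Math.ceil here.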

0> In article <20100208071528.GA11669@x60s>,
0> Marko Mäkelä <URL:mailto:marko.makela@iki.fi> ("Marko") wrote:
Marko> An integer version of your formula would also work in this case:
Marko>
Marko> int len = (off * 6 + 7) / 8 = 43/8 = 5
Marko>
Marko> This formula is clear to me: it converts the "off"
Marko> 6-bit chars (including the end-of-string code) to the number of
Marko> required 8-bit octets.
I'm with Marko here - this integer version is both computationally efficient and clear in its intent. That's the standard idiom for rounded-up division.

On 07/02/10 23:47, Ronny Klier wrote:
I think there is a bug in label encoding in Format6Encoder. For some string length the last encoded byte is not stored.
E.g. having a string "10007" the encoded byte buffer looks like this
[0] [0x86] [1] [0x8] [2] [0x20] [3] [0x9f] [4] [0xf0]
The number of stored bytes is 4. So the 0xf0 will not show up in the final image file.
I believe the code is correct and the 0xf0 is not required and is omitted on purpose.
A 5-character string where each character is encoded as 6 bits requires 30 bits of storage, which is 3.75 bytes. There is then an end-of-string marker, which is written as the 6-bit value 0x3f. However, the end-of-string marker is actually variable length, and as long as the first two bits are written you can drop the rest. So the whole thing fits into 4 bytes, which is what is written.
The code is this way because originally I did not realise that the string terminator was effectively a two-bit quantity, i.e. if the first two bits are both 1 then the string ends and you stop reading and throw away any remaining part of the byte. You could probably write 2 bits and then round the byte count up instead of writing 6 and rounding down.
..Steve
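The byte counts in this explanation can be checked with a small packing sketch. It is not the mkgmap implementation: the class, the method names and the character mapping (space as 0, A-Z as 1..26, digits as 0x20..0x29) are assumptions made for illustration, although that mapping does reproduce the bytes quoted for "10007".

```java
// Hypothetical sketch, not the actual Format6Encoder.
class SixBitPackDemo {
    static final int END = 0x3f; // end-of-string code written as six 1 bits

    // Assumed mapping: space -> 0, A-Z -> 1..26, 0-9 -> 0x20..0x29.
    static int code(char c) {
        if (c == ' ') return 0;
        if (c >= 'A' && c <= 'Z') return c - 'A' + 1;
        if (c >= '0' && c <= '9') return c - '0' + 0x20;
        throw new IllegalArgumentException("unsupported character: " + c);
    }

    // Packs all characters plus a full 6-bit terminator, most significant bit first.
    static byte[] pack(String s) {
        int n = s.length();
        byte[] buf = new byte[((n + 1) * 6 + 7) / 8];
        int bit = 0;
        for (int i = 0; i <= n; i++) {
            int sym = (i < n) ? code(s.charAt(i)) : END;
            for (int b = 5; b >= 0; b--, bit++) {
                if (((sym >> b) & 1) != 0) {
                    buf[bit / 8] |= 0x80 >> (bit % 8);
                }
            }
        }
        return buf;
    }

    // Bytes needed when only the first two terminator bits must be kept:
    // 6 bits per character plus 2 bits, rounded up to whole bytes.
    static int storedLength(int chars) {
        return (chars * 6 + 2 + 7) / 8;
    }

    public static void main(String[] args) {
        byte[] full = pack("10007");
        for (byte b : full) {
            System.out.printf("0x%02x ", b & 0xff);
        }
        System.out.println();
        System.out.println(storedLength(5)); // 39/8 = 4
    }
}
```

Under these assumptions, "write 2 bits and round up" gives the same count as the existing (off - 1) * 6 / 8 + 1 with off = chars + 1, because 6 * chars modulo 8 is always 0, 2, 4 or 6.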

On 08.02.2010 10:46, Steve Ratcliffe wrote:
On 07/02/10 23:47, Ronny Klier wrote:
I think there is a bug in label encoding in Format6Encoder. For some string length the last encoded byte is not stored.
E.g. having a string "10007" the encoded byte buffer looks like this
[0] [0x86] [1] [0x8] [2] [0x20] [3] [0x9f] [4] [0xf0]
The number of stored bytes is 4. So the 0xf0 will not show up in the final image file.
I believe the code is correct and the 0xf0 is not required and omitted on purpose.
A 5-character string where each character is encoded as 6 bits requires 30 bits of storage, which is 3.75 bytes. There is then an end-of-string marker, which is written as the 6-bit value 0x3f. However, the end-of-string marker is actually variable length, and as long as the first two bits are written you can drop the rest. So the whole thing fits into 4 bytes, which is what is written.
The code is this way because originally I did not realise that the string terminator was effectively a two-bit quantity, i.e. if the first two bits are both 1 then the string ends and you stop reading and throw away any remaining part of the byte. You could probably write 2 bits and then round the byte count up instead of writing 6 and rounding down.
..Steve
OK, I got this wrong. I thought the label section could be read continuously, with every label ending in 0x3f and the next label starting at the next byte boundary.
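The reading rule Steve describes (stop as soon as the next symbol starts with two 1 bits and discard the rest of the byte) might be sketched as follows. This is an illustration only: the class and method names are hypothetical, and the actual reader in any Garmin-compatible software may differ.

```java
import java.util.Arrays;

// Hypothetical sketch of a reader, not code from mkgmap.
class SixBitReadDemo {
    // Returns bit i of the buffer, counting from the most significant bit of byte 0.
    static int getBit(byte[] data, int i) {
        return (data[i / 8] >> (7 - i % 8)) & 1;
    }

    // Reads 6-bit symbols until the next symbol starts with two 1 bits.
    static int[] readSymbols(byte[] data) {
        int totalBits = data.length * 8;
        int[] out = new int[totalBits / 6 + 1];
        int count = 0;
        int bit = 0;
        while (bit + 2 <= totalBits) {
            // A symbol whose first two bits are both 1 ends the string.
            if (getBit(data, bit) == 1 && getBit(data, bit + 1) == 1) {
                break;
            }
            if (bit + 6 > totalBits) {
                break; // truncated input without a terminator
            }
            int sym = 0;
            for (int i = 0; i < 6; i++) {
                sym = (sym << 1) | getBit(data, bit + i);
            }
            out[count++] = sym;
            bit += 6;
        }
        return Arrays.copyOf(out, count);
    }

    public static void main(String[] args) {
        // The four bytes that are actually stored for "10007".
        byte[] stored = {(byte) 0x86, 0x08, 0x20, (byte) 0x9f};
        System.out.println(Arrays.toString(readSymbols(stored)));
    }
}
```

With this rule the four stored bytes and the five-byte buffer that includes 0xf0 yield the same symbols, which is why the trailing byte can be dropped.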
participants (4)
- Marko Mäkelä
- Ronny Klier
- Steve Ratcliffe
- Toby Speight