What's the maximum size of global index MDR size?
Hi all,

does anybody know the actual limits?

The MDR file format encodes offsets in 4 bytes. If we assume those are interpreted as unsigned integers, 0xffffffff (2^32 - 1, just under 4G) would be the highest possible offset. The section length is also encoded in 4 bytes, so maybe we can even write a correct MDR that is close to 8G if the last section is the largest.

The MDR sub file for the PC is stored in the *_mdr.img file, which also contains the SRT sub file. As far as I know the *.img file can grow > 4G, so I hope that our routines to write large *.img files are OK.

As of now, the --gmapi option doesn't work with an MDR sub file > 2G: the corresponding folder will contain a *.MDR file with 0 bytes. The display tool programs also fail to analyse such a file.

I see that both mkgmap and MapSource can handle an *_mdr.img file written by mkgmap that is > 2G, as long as the MDR sub file itself is < 2G. Current mkgmap fails with different errors when an offset in the MDR sub file gets > 2G. I started to fix those errors, but MapSource crashes with such an index file.

My problem: I don't know if it crashes because mkgmap still does something wrong or because MapSource interprets the offset field as a signed int (as mkgmap does so far in some places). With a signed int, values > 2G are interpreted as negative values, and that cannot work.

Is it possible to have an MDR sub file > 2G in an *.img file? If yes, I may invest more time to find out what is wrong in mkgmap. If no, we may skip the writing of sections, e.g. Mdr 21 (streets sorted by region), Mdr 22 (streets sorted by country), or maybe even Mdr 15 (the string table).

Does anybody have an idea how the string compression of the Mdr15 section might work?

Gerd

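The signed-versus-unsigned question can be illustrated with a minimal sketch. It assumes the 4-byte fields are little-endian (the byte order is not stated in the message), and the function names are hypothetical, not mkgmap code.

```python
import struct

def read_offset_signed(raw: bytes) -> int:
    """Interpret 4 bytes as a signed 32-bit integer (what a plain Java int holds)."""
    return struct.unpack("<i", raw)[0]

def read_offset_unsigned(raw: bytes) -> int:
    """Interpret the same 4 bytes as an unsigned 32-bit integer."""
    return struct.unpack("<I", raw)[0]

# An offset just above 2G (2**31), stored in 4 bytes
raw = struct.pack("<I", 2**31 + 16)

signed = read_offset_signed(raw)      # negative, so unusable as a file position
unsigned = read_offset_unsigned(raw)  # the intended offset

# Recovering the intended value from the signed one; in Java the equivalent
# is (value & 0xFFFFFFFFL) or Integer.toUnsignedLong(value)
recovered = signed & 0xFFFFFFFF

# Upper bound on the end of the last section if both its offset and its
# length are unsigned 32-bit values: just under 8G
max_end = 0xFFFFFFFF + 0xFFFFFFFF

print(signed, unsigned, recovered, max_end)
```

Any place that keeps such a value in a signed 32-bit variable without widening it first would see the negative number, which matches the kind of failure described for offsets above 2G.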
Hi all,

for now I've added a check to stop if the MDR subfile gets > 2G.

My findings regarding string compression in MDR 15: if Mdr15 is compressed, I also find an MDR 16 section, which is rather small. My first idea was that MDR15 might be the content of a zipped file, but offsets into MDR15 never exceed the size of the MDR15 section, so it's more likely that something simple like Huffman encoding is used and MDR 16 contains further data (the frequencies of the Huffman tree?).

Gerd

(In reply to Gerd Petermann, Saturday, 11 December 2021 11:06)

_______________________________________________
mkgmap-dev mailing list
mkgmap-dev@lists.mkgmap.org.uk
https://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

Hi Gerd

In the example with the compressed strings, did it have Mdr30/31 and/or Mdr32/33?

Ticker

(In reply to Gerd Petermann, Sun, 2021-12-12 at 11:07 +0000)

Hi Ticker,

I have 3 maps with compressed strings. Only one (Adria Topo) has these sections filled. I think those sections are not related to MDR 15.

Gerd

(In reply to Ticker Berkin, Sunday, 12 December 2021 15:31)

Hi all,

yes, Mdr16 obviously contains some kind of codebook that is used to decode the strings in Mdr15.

I've modified the code to write a copy of the MDR 16 section that I found in the Adria Topo map, and changed the Mdr15 code so that it always returns 1 as the string offset and copies the first 24 bytes of the Mdr15 section. The original index contains offsets 1, 11, 16, 20, 24, 27, ... into Mdr15, so the first string seems quite long compared to the next ones.

I created a map that contains a single city POI with the name abc, and MapSource displays Abc as expected (I did not use --lower-case). Next I started to search for cities. No matter what character I type in the city name field, the string that is displayed is "Baca Pri Podbrdu" (not sure about upper/lower case yet). When I change the code to return offset 11 instead of 1, the string "Bavsica" is displayed instead of "Baca Pri Podbrdu". These strings can be found in the Adria Topo map.

When I modify the content of the MDR 16 table, especially the last byte, I see different strings displayed. E.g. when I change the last byte 0x42 ('B') to 0x58 ('X'), the string "Baca Pri Podbrdu" changes to "Xaca Pri Podxrdu".

So I am now trying to understand the meaning of that table. It is different in the three maps; all use codepage 1252, and the MDR 16 sizes are 165, 177 and 212. It's very likely that the content depends on the frequency of the characters in the uncompressed MDR 15 file.

Gerd

(In reply to Gerd Petermann, Sunday, 12 December 2021 15:36)

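As a rough worked example of the remark that the first string is long compared to the next ones: assuming the compressed strings are stored back to back, so that each one ends where the next offset begins (an assumption, not something confirmed in the thread), the differences between consecutive offsets give the compressed size of each string. The helper name `bits_per_char` is made up for this sketch.

```python
# Offsets into Mdr15 quoted in the message
offsets = [1, 11, 16, 20, 24, 27]

# Assumed layout: string i occupies bytes offsets[i] .. offsets[i + 1] - 1
compressed_sizes = [b - a for a, b in zip(offsets, offsets[1:])]

# The two strings MapSource displayed for offsets 1 and 11
names = {1: "Baca Pri Podbrdu", 11: "Bavsica"}

def bits_per_char(offset: int) -> float:
    """Average bits per displayed character, ignoring any terminator or padding."""
    size = compressed_sizes[offsets.index(offset)]
    return size * 8 / len(names[offset])

for off, name in names.items():
    print(off, name, compressed_sizes[offsets.index(off)], round(bits_per_char(off), 2))
```

Under that assumption the 16-character name takes 10 bytes and the 7-character name takes 5 bytes, so both come out well below 8 bits per character. That is at least consistent with a variable-length code, though it says nothing about the exact scheme.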
Hi Gerd

Does Mdr16 look like some form of a flattened Huffman tree? A common representation is a stream of bits describing the shape and a stream of chars giving the contents.

If you send me an example + the little bit at the start of Mdr15 you've been using, I'll have a look.

Ticker

(In reply to Gerd Petermann, Sun, 2021-12-12 at 16:20 +0000)

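For readers unfamiliar with the representation Ticker describes, here is a minimal sketch of a Huffman tree flattened into a shape bit stream plus a symbol stream. It is a generic textbook illustration with a made-up sample string; nothing here is known to match the actual Mdr16 layout, and all function names are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count

def build_tree(text: bytes):
    """Build a Huffman tree; a leaf is an int symbol, an inner node a (left, right) tuple."""
    tie = count()  # unique tie-breaker so heapq never has to compare tree nodes
    heap = [(freq, next(tie), sym) for sym, freq in Counter(text).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (a, b)))
    return heap[0][2]

def flatten(node, shape, symbols):
    """Pre-order walk: emit 0 for an inner node, 1 for a leaf plus its symbol."""
    if isinstance(node, tuple):
        shape.append(0)
        flatten(node[0], shape, symbols)
        flatten(node[1], shape, symbols)
    else:
        shape.append(1)
        symbols.append(node)

def unflatten(shape, symbols):
    """Rebuild the tree from the shape bits and the symbol list."""
    shape_it, sym_it = iter(shape), iter(symbols)
    def read():
        if next(shape_it) == 1:
            return next(sym_it)
        left = read()
        right = read()
        return (left, right)
    return read()

def code_table(node, prefix="", table=None):
    """Map each symbol to its bit string (left = '0', right = '1')."""
    table = {} if table is None else table
    if isinstance(node, tuple):
        code_table(node[0], prefix + "0", table)
        code_table(node[1], prefix + "1", table)
    else:
        table[node] = prefix
    return table

def decode(bits: str, tree) -> bytes:
    """Walk the tree bit by bit, emitting a symbol at every leaf."""
    out, node = bytearray(), tree
    for bit in bits:
        node = node[int(bit)]
        if not isinstance(node, tuple):
            out.append(node)
            node = tree
    return bytes(out)

text = b"BACA PRI PODBRDU"  # sample input, chosen only for illustration
tree = build_tree(text)
shape, symbols = [], []
flatten(tree, shape, symbols)
codes = code_table(tree)
encoded = "".join(codes[b] for b in text)
restored = decode(encoded, unflatten(shape, symbols))
print(len(shape), len(symbols), restored)
```

With n distinct symbols this layout needs 2n - 1 shape bits and n symbol bytes, so the codebook stays small compared to the string table, which would fit the observation that MDR 16 is rather small.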
Hi Ticker,

attached is my experimental code patch for mkgmap. It changes mkgmap to write the MDR 16 section and the MDR15 section with compression set to 1 in the MDR header. The Mdr15 content is the first 32 bytes of the original table.

I use it with the attached OSM file and options --index --gmapi --code-page=1252

Hope it helps. Maybe you can find the download link to the Adria map somewhere in the archive; my downloaded file is called "AdriaTOPO 2.40 HR.exe". Unfortunately my links to Nabble no longer work.

Gerd

(In reply to Ticker Berkin, Monday, 13 December 2021 11:08)

Hi Gerd

I've built and examined this.

I assume that the table came from a map that used most of the available characters, so not seeing these characters in the table precludes the theory of it being flattened shape/contents lists in any simple representation.

Can you tell if the original map was mixed case? If it wasn't, then it can't be a simple normalized frequency count, as there would be an area of zeros for the lower case letters.

Ticker

(In reply to Gerd Petermann, Mon, 2021-12-13 at 10:25 +0000)

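To illustrate the frequency-count argument, here is a minimal sketch of a 256-entry normalized frequency table built from upper-case-only text. The sample string table, the scaling to 0..255 and the function name are all assumptions made for the example; the real Mdr16 content is not known to look like this.

```python
from collections import Counter

def normalized_frequencies(data: bytes, scale: int = 255) -> list:
    """Return a 256-entry table of byte frequencies scaled to 0..scale."""
    counts = Counter(data)
    top = max(counts.values())
    # Ceiling division, so any byte that occurs at all gets a non-zero entry
    return [-(-counts.get(b, 0) * scale // top) for b in range(256)]

# Hypothetical upper-case-only string table with NUL terminators
sample = b"BACA PRI PODBRDU\0BAVSICA\0"
table = normalized_frequencies(sample)

# The entries for 'a'..'z' form a contiguous run of zeros
lower_case_entries = table[ord("a"):ord("z") + 1]
print(lower_case_entries)
```

A table of this kind would show a visible block of 26 zero entries for an upper-case-only map, which is the pattern Ticker says is missing.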
Hi Ticker,

I am pretty sure that the original data doesn't contain lower case characters. The characters B and b in the name "Baca Pri Podbrdu" both change when the last byte in MDR16 (0x42) is changed.

I've looked at several suggestions for how the tree could be stored; nothing seems to match the pattern in MDR 16 so far. Of course we cannot even be sure that Huffman encoding is used.

Gerd

(In reply to Ticker Berkin, Monday, 13 December 2021 13:14)

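The reasoning about B and b can be sketched as follows, assuming the names are stored in upper case and the display layer capitalises each word. That is consistent with "abc" being shown as "Abc" earlier in the thread, but it remains an assumption about MapSource. `display_name` is a hypothetical helper, and Python's `str.title()` merely stands in for whatever MapSource does.

```python
def display_name(stored: bytes, codepage: str = "cp1252") -> str:
    """Decode a stored name and capitalise each word for display (assumed behaviour)."""
    return stored.decode(codepage).title()

stored = b"BACA PRI PODBRDU"

# Simulate changing the single codebook entry 0x42 ('B') to 0x58 ('X'):
# every occurrence of that symbol in the decoded string changes.
patched = stored.replace(b"\x42", b"\x58")

print(display_name(stored))   # Baca Pri Podbrdu
print(display_name(patched))  # Xaca Pri Podxrdu
```

Under these assumptions a single upper-case symbol accounts for both the leading B and the b inside "Podbrdu", so no separate lower-case entry is needed in the table.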
Participants: Gerd Petermann, Ticker Berkin