Binary OSM file support?

Jeffrey Ollie

8 Sep 2010

4:50 p.m.

Now that Osmosis is getting support for a binary format, is there anyone working on getting binary support into the splitter and mkgmap? I could take a look, but I'm not much of a Java hacker... -- Jeff Ollie

Chris Miller

8 Sep

7:27 p.m.

Hi Jeff, Scott Crosby posted a message to this list back in June asking for comment on his binary format. My thoughts at the time were that it looked rather promising and adding support for it to the splitter would be a good idea, though I was reluctant to use it other than internally if the file format wasn't relatively stable because of the maintenance problems that it might introduce. I haven't been following the discussions on the Osmosis side (and indeed, have had hardly any time to dedicate to the splitter lately either), but it sounds like this file format is now at release candidate status in Osmosis? If so I think that's great news! I can't speak for the other developers on splitter or mkgmap, but personally at least I'd really like to see a revamped splitter and mkgmap that can work with the format directly - there are clearly some huge benefits to be had from going that route. Perhaps Scott, Steve and co can chime in with their thoughts on the topic and where they see things heading from here? Chris

...

Now that Osmosis is getting support for a binary format, is there anyone working on getting binary support into the splitter and mkgmap? I could take a look, but I'm not much of a Java hacker...

Steve Ratcliffe

11:06 p.m.

...

some huge benefits to be had from going that route. Perhaps Scott, Steve and co can chime in with their thoughts on the topic and where they see things heading from here?

I am entirely in favour of supporting the format. Most importantly, it seems to me, in splitter, but in mkgmap too. I'll have more time to spend on development for a while too, so I would like to get it added soon. ..Steve

Scott Crosby

9 Sep

5:55 p.m.

On Wed, Sep 8, 2010 at 2:27 PM, Chris Miller <chris_overseas@hotmail.com> wrote:

...

Hi Jeff,

Scott Crosby posted a message to this list back in June asking for comment on his binary format. My thoughts at the time were that it looked rather promising and adding support for it to the splitter would be a good idea, though I was reluctant to use it other than internally if the file format wasn't relatively stable because of the maintenance problems that it might introduce.

...

I haven't been following the discussions on the Osmosis side (and indeed, have had hardly any time to dedicate to the splitter lately either), but it sounds like this file format is now at release candidate status in Osmosis?

The format is stable, but I want to release one more RC, with a full validation before I declare it stable. I expect no incompatible changes.

...

If so I think that's great news! I can't speak for the other developers on splitter or mkgmap, but personally at least I'd really like to see a revamped splitter and mkgmap that can work with the format directly - there are clearly some huge benefits to be had from going that route. Perhaps Scott, Steve and co can chime in with their thoughts on the topic and where they see things heading from here?

I have a version of the splitter that reads the binary file format sitting in my local git repository, last used a few days ago. I've been keeping that branch up-to-date with respect to the changes I've made to the binary format. However, that branch split off last June and no longer applies cleanly due to the various input handling changes that have happened in the meantime. My repo has a bunch of other stuff that needs to be cleaned up, rebased to trunk, tested, and submitted. I'm not sure when I'll have time, but my repo is public at http://github.com/scrosby/OSM-splitter Scott

12 Sep

1:25 p.m.

Hello Jeffrey, Thanks very much for this. It's likely to be a while before I get a chance to look at and apply these sorry, but hopefully will find some time next weekend. Perhaps in the meantime if you have a current build of the splitter that includes these patches you could make it available for others to test and provide some initial feedback on? The one thing I'm not sure about is the last one with the new thread design - I know it provides a decent performance boost, but I'm a little uneasy with the way it achieves this by using more threads than CPU cores. AFAIK that patch is incidental to the rest of the binary file support though so shouldn't affect this transition. Cheers, Chris

...

Here's a quick rebasing of the patches in Scott's splitter repository to the current splitter trunk. My Java dev system is down for some upgrades at the moment so I haven't even compiled these yet but hopefully I didn't do too bad of a job of resolving the conflicts.

Scott Crosby

2:25 p.m.

My internal repository has several independent development threads. Only one of them is the binary format. The other development threads include patches that are already in, and some that have not been benchmarked or tested thoroughly.

* The binary format.
* Improved double writing for XML output.
* Alternate threading design.
* Avoid sending each node to each osmwriter to do a bbox check. (Depends on some of the binary format refactoring.)
* too-many-areas. (already in)
* Tag representation. (partially in; not benchmarked)

Git knows which patch is in which thread. The two critical ones for the binary format are the binary format patches and the ULP patch in the double-writing thread. The patches where I avoid sending each node to each OSM writer improved the Big-O from O(n) to O(sqrt(n)), but the benchmarks show identical performance. I think things are bottlenecked updating the large shared arrays, which may also be why there's no need to worry about the threading design making one thread per writer; they're all idle waiting for the bottleneck to dribble out stuff to write.

Scott

On Sun, Sep 12, 2010 at 8:25 AM, Chris Miller <chris_overseas@hotmail.com> wrote:

...

Hello Jeffrey,

Thanks very much for this. It's likely to be a while before I get a chance to look at and apply these sorry, but hopefully will find some time next weekend. Perhaps in the meantime if you have a current build of the splitter that includes these patches you could make it available for others to test and provide some initial feedback on?

The one thing I'm not sure about is the last one with the new thread design - I know it provides a decent performance boost, but I'm a little uneasy with the way it achieves this by using more threads than CPU cores. AFAIK that patch is incidental to the rest of the binary file support though so shouldn't affect this transition.

Cheers, Chris

...
Here's a quick rebasing of the patches in Scott's splitter repository to the current splitter trunk. My Java dev system is down for some upgrades at the moment so I haven't even compiled these yet but hopefully I didn't do too bad of a job of resolving the conflicts.

_______________________________________________ mkgmap-dev mailing list mkgmap-dev@lists.mkgmap.org.uk http://www.mkgmap.org.uk/mailman/listinfo/mkgmap-dev

Steve Ratcliffe

13 Sep

11:06 a.m.

Hi Scott

I am adding support for your binary format to mkgmap itself (not the splitter). Now I've done enough refactoring of the XML reader to allow me to start I have a few questions.

1. Is there a final name for the format and the jar file?
2. Is there an 'official' download location for a pre-built jar file?
3. How do I recognise a file in your format? Is there a conventional file extension for it? Are the OSMHeader and OSMData blocks required file block types?
4. Does the osmosis conversion from XML to binary keep the order of the elements the same as they were in the XML file?

..Steve

Scott Crosby

4:55 p.m.

On Mon, Sep 13, 2010 at 6:06 AM, Steve Ratcliffe <steve@parabola.me.uk> wrote:

...

Hi Scott

I am adding support for your binary format to mkgmap itself (not the splitter).

Excellent! Thank you.

...

Now I've done enough refactoring of the XML reader to allow me to start I have a few questions.

1. Is there a final name for the format and the jar file?

Not yet. There are too many 'osm binary formats'. I asked for suggestions on osm-dev yesterday and got 'protobuf binary format'. Do you have any ideas?

...

2. Is there an 'official' download location for a pre-built jar file?

No.

...

3. How do I recognise a file in your format. Is there a conventional file extension for it?

Those are good questions; I wish someone had asked them earlier. I have not put in error checking to detect illegal/wrong inputs. I have fixes in my local tree that throw exceptions on very ill-formed inputs. They will be out in my next RC. I have not thought about how one might detect the format. How important is this? For now, try feeding the data to the parser and see if there is an exception? I can add a more robust check with magic at the start of the file, but I won't have time to implement it for a while. I designed a concatenable and streamable format. Magic at the start of a file needs to also be a legal fileblock. I can specify and define such a magic, as the static serialized contents of a '__Magic' fileblock but implementing this may take a little while. I have used *.bin as an extension, but I am open to suggestions.

...

Are the OSMHeader and OSMData blocks required file block types?

I am not sure what you are asking, but yes, both are required. OSMHeader contains HeaderBlock::required_features, which must be examined to confirm that your implementation can parse the file. You may also want the contents of HeaderBlock::bbox.
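The required_features check described above could be sketched roughly as follows. This is only an illustration, assuming the reader has already decoded the header into a list of feature strings; the class name, method name, and the feature strings "feature-a" and "feature-b" are placeholders and are not taken from the format or from any posted code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RequiredFeatureCheck {
    // Hypothetical set of features this reader understands (placeholder names).
    private static final Set<String> SUPPORTED =
            new HashSet<>(Arrays.asList("feature-a", "feature-b"));

    /**
     * Returns the required features that this reader does not support.
     * An empty result means the file can be parsed; otherwise the reader
     * should refuse the file rather than risk misreading it.
     */
    public static List<String> unsupported(List<String> requiredFeatures) {
        List<String> missing = new ArrayList<>();
        for (String feature : requiredFeatures) {
            if (!SUPPORTED.contains(feature)) {
                missing.add(feature);
            }
        }
        return missing;
    }
}
```

A caller would pass in whatever strings it read from HeaderBlock::required_features and stop with an error message if the returned list is not empty.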

...

4. Does the osmosis conversion from XML to binary keep the order of the elements the same as they were in the XML file?

Yes. Absolutely. Scott

Johann Gail

7:30 p.m.

...

I have used *.bin as an extension, but I am open to suggestions.

I find *.bin a little general. Most files are bin. I would suggest *.osm.bin instead. An abbreviation of the phrase 'protobuf binary format' would give *.pbf, both from 'ProtoBuF' and from 'Protobuf Bin Format'. The more I think about it, the more *.osm.pbf sounds good to me.

Steve Ratcliffe

8:36 p.m.

On 13/09/10 17:55, Scott Crosby wrote:

...

Not yet. There are too many 'osm binary formats'. I asked for suggestions on osm-dev yesterday and got 'protobuf binary format'. Do you have any ideas?

Not really, I'm calling it osmprotobuf at the moment...

...

I have not thought about how one might detect the format. How important is this?

It's needed because mkgmap takes all kinds of input files and I don't want the user to have to say what the file format is; I just want mkgmap to work out the file format and use the correct reader. I do this by extension or by reading the beginning of the file and looking for something distinctive. It doesn't have to be foolproof, just good enough to tell genuine files apart. That is why I was asking if OSMHeader and/or OSMData would always be present near the beginning of the file; I could look for them.

...

...
4. Does the osmosis conversion from XML to binary keep the order of the elements the same as they were in the XML file?

Yes. Absolutely.

Great, should be easy to test it is working by comparing the resulting files. Thanks ..Steve

Scott Crosby

10:21 p.m.

On Mon, Sep 13, 2010 at 3:36 PM, Steve Ratcliffe <steve@parabola.me.uk> wrote:

...

It's needed because mkgmap takes all kinds of input files and I don't want the user to have to say what the file format is; I just want mkgmap to work out the file format and use the correct reader.

I do this by extension or by reading the beginning of the file and looking for something distinctive. It doesn't have to be foolproof, just good enough to tell genuine files apart.

That is why I was asking if OSMHeader and/or OSMData would always be present near the beginning of the file; I could look for them.

Yes, I believe OSMHeader should occur in the first 16 bytes or so. (Just looked at a hexdump. Yes, confirmed, byte offset [6,14].) Furthermore, I believe that byte offsets [4,15] will always be static when the first block is an OSMHeader; however, I cannot guarantee it because I cannot guarantee that a given protocol buffer will always serialize to the same output in the future. Scott
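A "good enough" sniffing check along these lines might look like the sketch below. It assumes only what is stated above, namely that the ASCII string OSMHeader appears somewhere in the first 16 bytes. It searches that whole prefix rather than relying on the exact offset, since the offset is not guaranteed. The class and method names are hypothetical and are not mkgmap code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class OsmBinarySniffer {
    // ASCII bytes of the block type name expected near the start of the file.
    private static final byte[] MARKER =
            "OSMHeader".getBytes(StandardCharsets.US_ASCII);

    // Only the first 16 bytes are inspected, per the observation in the thread.
    private static final int PREFIX_LENGTH = 16;

    /**
     * Returns true if MARKER lies entirely within the first PREFIX_LENGTH
     * bytes of the given data. This is a heuristic, not a validation.
     */
    public static boolean looksLikeOsmBinary(byte[] data) {
        int limit = Math.min(data.length, PREFIX_LENGTH);
        for (int start = 0; start + MARKER.length <= limit; start++) {
            byte[] window = Arrays.copyOfRange(data, start, start + MARKER.length);
            if (Arrays.equals(window, MARKER)) {
                return true;
            }
        }
        return false;
    }
}
```

The caller would read the first 16 bytes of the input file and fall back to the other readers when this returns false.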

Clinton Gladstone

12 Sep

11:39 a.m.

On Sep 9, 2010, at 19:55, Scott Crosby wrote:

...

The format is stable, but I want to release one more RC, with a full validation before I declare it stable. I expect no incompatible changes.

Is this the binary format in discussion: http://wiki.openstreetmap.org/wiki/OSMbin(file_format) I had trouble finding the right description. There seem to be a few other "binary" format projects, apparently primarily intended for mobile applications. (And the link in Scott's original e-mail to this list is sadly broken.) Cheers.

Scott Crosby

1:10 p.m.

On Sun, Sep 12, 2010 at 6:39 AM, Clinton Gladstone <clinton.gladstone@googlemail.com> wrote:

...

On Sep 9, 2010, at 19:55, Scott Crosby wrote:

...
The format is stable, but I want to release one more RC, with a full validation before I declare it stable. I expect no incompatible changes.

Is this the binary format in discussion:

http://wiki.openstreetmap.org/wiki/OSMbin(file_format)

The original email is: http://www.mail-archive.com/dev@openstreetmap.org/msg11392.html I also just posted up some text on the Wiki with a complete description. http://wiki.openstreetmap.org/wiki/APIbin Scott
